How to Test Data Proportions with R
Let’s look at an example to illustrate the basic R tests for data proportions. The following example is based on real research, published by Robert Rutledge, MD, and his colleagues in the Annals of Surgery (1993).
In a hospital in North Carolina, the doctors registered the patients who were involved in a car accident and whether they used seat belts. The following matrix represents the number of survivors and deceased patients in each group:
> survivors <- matrix(c(1781,1443,135,47), ncol=2)
> colnames(survivors) <- c('survived','died')
> rownames(survivors) <- c('no seat belt','seat belt')
> survivors
             survived died
no seat belt     1781  135
seat belt        1443   47
To know whether seat belts made a difference in the chances of surviving, you can carry out a proportion test. This test tells you how probable it is to see a difference at least as large as the observed one if both proportions are really the same. A low p-value tells you that both proportions probably differ from each other. To test this in R, you can use the prop.test() function on the preceding matrix:
> result.prop <- prop.test(survivors)
You also can use the prop.test() function on tables or vectors. If you use it with vectors, remember that the first vector has to be the number of successes, and the second vector has to be the total number of cases.
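The vector form can be sketched as follows, reusing the counts from the matrix above. The names survived, total, and result.vec are illustrative and not part of the original example; the totals are simply the row sums of the matrix.

```r
# Successes: number of survivors without and with a seat belt
survived <- c(1781, 1443)

# Total cases per group: survivors plus deaths (the row sums of the matrix)
total <- c(1781 + 135, 1443 + 47)

# Same test as on the matrix, now with successes and totals as vectors
result.vec <- prop.test(survived, total)
result.vec$p.value
```

This call should report the same test statistic and p-value as the matrix version, because prop.test() treats the first column of a two-column matrix as successes and the second as failures.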
Printing result.prop gives you the following output:
> result.prop

        2-sample test for equality of proportions with continuity correction

data:  survivors
X-squared = 24.3328, df = 1, p-value = 8.105e-07
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.05400606 -0.02382527
sample estimates:
   prop 1    prop 2
0.9295407 0.9684564
This test report is almost identical to the one from t.test() and contains essentially the same information. At the bottom, R prints for you the proportion of people who survived in each group, because the first column of the matrix counts as the successes. The p-value tells you how likely you would be to see a difference at least this large if both proportions were actually equal.
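So, you see that the chance of dying in a hospital after a crash is lower if you’re wearing a seat belt at the time of the crash: about 3.2 percent of the patients with a seat belt died, compared to about 7.0 percent of those without. R also reports the confidence interval of the difference between the proportions. Here the whole interval lies below zero, which agrees with the low p-value.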
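If you want to work with these numbers rather than just read them off the printout, you can pull them out of the result object. The following is a minimal sketch that rebuilds the matrix so it runs on its own. It relies on the standard components (estimate, conf.int, p.value) of the object that prop.test() returns; the name diff.prop is illustrative.

```r
# Rebuild the data and rerun the test so this snippet is self-contained
survivors <- matrix(c(1781, 1443, 135, 47), ncol = 2)
colnames(survivors) <- c('survived', 'died')
rownames(survivors) <- c('no seat belt', 'seat belt')
result.prop <- prop.test(survivors)

# Survival proportion in each group (prop 1 = no seat belt, prop 2 = seat belt)
result.prop$estimate

# 95 percent confidence interval for prop 1 minus prop 2
result.prop$conf.int

# p-value of the test
result.prop$p.value

# Observed difference in survival proportions (no seat belt minus seat belt)
diff.prop <- unname(result.prop$estimate[1] - result.prop$estimate[2])
diff.prop
```

The observed difference is negative and falls inside the reported confidence interval, which reflects the lower survival proportion in the group without seat belts.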