How to Analyze Data in Tables with R
You can use R’s prop.test() function on data stored in matrices and tables. For prop.test(), the table needs two columns that hold the counts for the two possible outcomes.
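As a minimal sketch of the layout prop.test() expects, the following builds a two-column matrix of counts and tests it in two equivalent ways. The name survivors, its labels, and its counts are assumptions for illustration, because this section does not show how the seat-belt matrix was created.

```r
# Hypothetical seat-belt counts (assumed for illustration):
# rows are the groups, columns are the two possible outcomes.
survivors <- matrix(c(1781, 1443, 135, 47), ncol = 2,
                    dimnames = list(c("no seat belt", "seat belt"),
                                    c("survived", "died")))

# Matrix form: column 1 holds the "successes", column 2 the "failures".
from_matrix <- prop.test(survivors)

# Equivalent vector form: successes plus the number of trials per group.
from_vectors <- prop.test(x = survivors[, "survived"],
                          n = rowSums(survivors))

from_matrix$estimate   # proportion that survived in each group
from_matrix$p.value    # p-value for equal proportions
```

The matrix form is usually the more convenient one when the counts already live in a table; the vector form is handy when you only have successes and totals.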
How to test contingency of tables
Alternatively, you can use the chisq.test() function to analyze tables with a chi-squared (χ²) contingency test. To do this on the matrix with the seat-belt data, you do the following:
> chisq.test(survivors)
This returns the following output:
        Pearson's Chi-squared test with Yates' continuity correction
data:  survivors
X-squared = 24.3328, df = 1, p-value = 8.105e-07
The values for the statistic (X-squared), the degrees of freedom, and the p-value are exactly the same as with the prop.test() function. That’s to be expected because, in this case at least, both tests are equivalent.
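To check the equivalence yourself, you can run both tests on the same two-by-two matrix and compare the pieces of the result. This is a sketch under an assumption: the counts below are not given in this section and were chosen because they reproduce the statistic shown above.

```r
# Assumed seat-belt counts (not taken from this section).
survivors <- matrix(c(1781, 1443, 135, 47), ncol = 2,
                    dimnames = list(c("no seat belt", "seat belt"),
                                    c("survived", "died")))

prop_result  <- prop.test(survivors)
chisq_result <- chisq.test(survivors)

# Both functions apply Yates' continuity correction by default
# (correct = TRUE), so the results can be compared directly.
same_statistic <- all.equal(unname(prop_result$statistic),
                            unname(chisq_result$statistic))
same_df        <- all.equal(unname(prop_result$parameter),
                            unname(chisq_result$parameter))
same_p_value   <- all.equal(prop_result$p.value, chisq_result$p.value)

c(same_statistic, same_df, same_p_value)
```

Each comparison should come back TRUE. The unname() calls only strip the labels so that the numbers themselves are compared.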
How to test tables with more than two columns
Unlike the prop.test() function, the chisq.test() function can deal with tables with more than two columns. It doesn’t run a contingency test directly on a table with more than two dimensions, but you can collapse such a table to two dimensions first. To illustrate this, let’s take a look at the table HairEyeColor. You can see its structure with the following code:
> str(HairEyeColor)
 table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
 - attr(*, "dimnames")=List of 3
  ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
  ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
  ..$ Sex : chr [1:2] "Male" "Female"
So, the table HairEyeColor has three dimensions: one for hair color, one for eye color, and one for sex. The table represents the distribution of these three features among 592 students.
The dimension names of a table are stored in an attribute called dimnames. As you can see from the output of the str() function, this is actually a list with the names for the rows/columns in each dimension. If this list is a named list, the names are used to label the dimensions. You can use the dimnames() function to extract or change the dimension names.
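A short sketch of extracting and changing dimension names with dimnames() follows. It works on a copy, and the name hec and the replacement labels "M" and "F" are made up for this example.

```r
# Work on a copy so the built-in dataset stays untouched.
hec <- HairEyeColor

# Extract: a named list with one character vector per dimension.
dimnames(hec)
names(dimnames(hec))        # labels of the dimensions themselves

# Change the labels within one dimension.
dimnames(hec)$Sex <- c("M", "F")

# Change the label of a dimension (here the first one).
names(dimnames(hec))[1] <- "HairColor"

str(hec)
```

Because only the labels change, the counts in hec stay identical to those in HairEyeColor.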
To check whether hair color and eye color are related, you can collapse the table onto its first two dimensions using the margin.table() function, which gives you the hair and eye color counts for both sexes combined. This function sums the values over the dimensions you leave out to give you a summary table with fewer dimensions. For that, you have to specify which margins you want to keep.
So, to get the table of hair and eye color, you use the following:
> HairEyeMargin <- margin.table(HairEyeColor, margin=c(1,2))
> HairEyeMargin
       Eye
Hair    Brown Blue Hazel Green
  Black    68   20    15     5
  Brown   119   84    54    29
  Red      26   17    14    14
  Blond     7   94    10    16
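To make the "margins you keep" idea concrete, here is a small sketch that keeps different margins of the same table and cross-checks one result with apply(). The variable names are chosen for this example only.

```r
# Keep dimensions 1 and 2 (hair, eye); sex is summed out.
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))

# Keep only dimension 3 (sex); hair and eye are summed out.
sex_totals <- margin.table(HairEyeColor, margin = 3)

# Keep nothing: with no margin given, you get the grand total.
grand_total <- margin.table(HairEyeColor)

# The same collapse written with apply(): sum over everything
# except the dimensions listed in the second argument.
hair_eye_apply <- apply(HairEyeColor, c(1, 2), sum)

sex_totals
grand_total
all(hair_eye == hair_eye_apply)
```

The grand total is the 592 students mentioned earlier, and the apply() version holds the same counts as the margin.table() result.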
Now you can simply check whether hair and eye color are related by testing it on this table:
> chisq.test(HairEyeMargin)
        Pearson's Chi-squared test
data:  HairEyeMargin
X-squared = 138.2898, df = 9, p-value < 2.2e-16
As expected, the very small p-value tells you that hair color and eye color are not independent: some combinations occur more often than you would expect if the two were unrelated. Not a big surprise, but you can use these techniques on other, more interesting research questions.
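The test itself only says that the two features are related. To see which combinations drive that result, one option is to look at the expected counts and Pearson residuals stored in the object that chisq.test() returns. The following is a sketch with variable names chosen for this example.

```r
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))
result <- chisq.test(hair_eye)

# Counts you would expect if hair and eye color were independent.
round(result$expected, 1)

# Pearson residuals: (observed - expected) / sqrt(expected).
# Large positive values mark combinations that are more common
# than expected; large negative values mark rarer ones.
round(result$residuals, 1)
```

In this table, for example, the residual for blond hair with blue eyes is positive, so that combination occurs more often than independence would predict.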