How to Analyze Data in Tables with R

By Andrie de Vries, Joris Meys

You can use R’s prop.test() function for data in matrices and tables. For prop.test(), these tables need to have two columns that contain the counts for the two possible outcomes (successes in the first column, failures in the second).
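As a minimal sketch of the layout prop.test() expects, the following code builds a two-column matrix of counts and passes it to the function. The matrix name outcomes, its labels, and the counts are made up for illustration; they are not the seat-belt data.

```r
# Hypothetical counts, invented for illustration only:
# one row per group, one column per outcome
outcomes <- matrix(c(45, 30, 15, 30), ncol = 2,
                   dimnames = list(group = c("A", "B"),
                                   outcome = c("success", "failure")))
outcomes

# prop.test() reads the first column as successes, the second as failures
result <- prop.test(outcomes)
result$estimate   # estimated proportion of successes in each group
result$p.value    # p-value for the null hypothesis of equal proportions
```

With these made-up counts, the estimated proportions are 45/60 for group A and 30/60 for group B.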

How to test contingency of tables

Alternatively, you can use the chisq.test() function to analyze tables with a chi-squared (χ²) contingency test. To do this on the matrix with the seat-belt data, you simply do the following:

> chisq.test(survivors)

This returns the following output:

 Pearson's Chi-squared test with Yates' continuity correction
data: survivors
X-squared = 24.3328, df = 1, p-value = 8.105e-07

The values for the statistic (X-squared), the degrees of freedom, and the p-value are exactly the same as with the prop.test() function. That’s to be expected because, for a table with two rows and two columns, both tests are equivalent.
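You can check this equivalence yourself on any table with two rows and two columns. The sketch below uses a hypothetical matrix called counts with invented numbers, not the seat-belt data, and compares the two results.

```r
# Hypothetical 2 x 2 counts, invented for illustration only
counts <- matrix(c(120, 90, 30, 60), ncol = 2)

by.prop  <- prop.test(counts)
by.chisq <- chisq.test(counts)

# Both functions apply Yates' continuity correction by default,
# so the statistic and the p-value agree
all.equal(unname(by.prop$statistic), unname(by.chisq$statistic))
all.equal(by.prop$p.value, by.chisq$p.value)
```

Both comparisons should return TRUE. If you switch off the correction in one of the functions with correct=FALSE, the results no longer match.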

How to test tables with more than two columns

Unlike the prop.test() function, the chisq.test() function can deal with tables with more than two columns. A table with more than two dimensions has to be collapsed to two dimensions first, because chisq.test() tests independence only on a two-dimensional table. To illustrate this, let’s take a look at the table HairEyeColor. You can see its structure with the following code:

> str(HairEyeColor)
 Table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
 - attr(*, "dimnames")=List of 3
 ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
 ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
 ..$ Sex : chr [1:2] "Male" "Female"

So, the table HairEyeColor has three dimensions: one for hair color, one for eye color, and one for sex. The table represents the distribution of these three features among 592 students.

The dimension names of a table are stored in an attribute called dimnames. As you can see from the output of the str() function, this is actually a list with the names for the rows/columns in each dimension. If this list is a named list, the names are used to label the dimensions. You can use the dimnames() function to extract or change the dimension names.
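A short sketch of how dimnames() can be used on this table follows. The copy named hec and the abbreviated labels "M" and "F" are made up for illustration.

```r
# HairEyeColor ships with R in the datasets package
names(dimnames(HairEyeColor))   # labels of the three dimensions
dimnames(HairEyeColor)$Eye      # category names along the eye-color dimension

# Change names on a copy, so the built-in table stays untouched
hec <- HairEyeColor
dimnames(hec)$Sex <- c("M", "F")
dimnames(hec)$Sex
```

Because dimnames is a named list, you can reach each dimension either by name, as here, or by position with dimnames(hec)[[3]].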

To check whether hair color and eye color are related, you can collapse the table onto the first two dimensions using the margin.table() function, which summarizes hair and eye color for both sexes combined. This function sums the values over the dimensions you leave out to give you a summary table with fewer dimensions. For that, you have to specify which margins you want to keep.

So, to get the table of hair and eye color, you use the following:

> HairEyeMargin <- margin.table(HairEyeColor, margin=c(1,2))
> HairEyeMargin
       Eye
Hair    Brown Blue Hazel Green
  Black    68   20    15     5
  Brown   119   84    54    29
  Red      26   17    14    14
  Blond     7   94    10    16
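To see what margin.table() does here, you can reproduce the sums by hand. This is only a sketch; the name by.hand is made up for illustration.

```r
HairEyeMargin <- margin.table(HairEyeColor, margin = c(1, 2))

# Summing over the third dimension (Sex) by hand gives the same counts
by.hand <- apply(HairEyeColor, c(1, 2), sum)
all(by.hand == HairEyeMargin)

# Collapsing does not lose any observations
sum(HairEyeMargin)   # 592, the number of students

# Keeping a single margin gives the totals per hair color
margin.table(HairEyeColor, margin = 1)
```

The margin argument always names the dimensions you keep, so margin=c(1,2) keeps hair and eye color and sums over sex.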

Now you can simply check whether hair and eye color are related by testing it on this table:

> chisq.test(HairEyeMargin)
       Pearson's Chi-squared test
data: HairEyeMargin
X-squared = 138.2898, df = 9, p-value < 2.2e-16

As expected, the very small p-value tells you that hair color and eye color are not independent: some combinations of hair and eye color are more common than you would expect by chance. Not a big surprise, but you can use these techniques on other, more interesting research questions.
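If you want to know which combinations drive this result, one option is to look at the expected counts and residuals stored in the object that chisq.test() returns. The sketch below does that; the variable name result is made up for illustration.

```r
HairEyeMargin <- margin.table(HairEyeColor, margin = c(1, 2))
result <- chisq.test(HairEyeMargin)

# Counts you would expect if hair and eye color were independent
round(result$expected, 1)

# Pearson residuals: (observed - expected) / sqrt(expected)
# Large positive values mark combinations that occur more often than expected
round(result$residuals, 2)
```

In this table, the combination of blond hair and blue eyes has a positive residual, which means it occurs more often than independence would predict.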