How to Create a Two-Way Data Table with R

A two-way table describes two categorical variables together: it contains the number of cases for each combination of the categories in both variables. R gives you a whole toolset to work with two-way tables. The analysis of categorical data usually starts with tables, but first you have to create them.

How to create an R data table from two variables

For example, suppose you want to know how many cars have three, four, or five gears, split up by cars with automatic gearboxes and cars with manual gearboxes. You can do this, again, by using the table() function, this time with two arguments, like this:

> with(cars, table(am, gear))
        gear
am        3  4  5
  auto    0  8  5
  manual 15  4  0

The levels of the variable you give as the first argument are the row names, and the levels of the variable you give as the second argument are the column names. In the table, you get the counts for every combination. For example, you can count 15 cars with manual gearboxes and three gears.
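
As a minimal sketch of the same idea with data you can type in yourself, the following builds a tiny data frame and tabulates it. The data frame toy and its six rows are made up for illustration; they are not the cars data used above.

```r
# Hypothetical data: six cars with a gearbox type and a number of gears.
toy <- data.frame(
  am   = factor(c('auto', 'auto', 'manual', 'manual', 'manual', 'auto')),
  gear = c(3, 4, 4, 5, 4, 3)
)

# The first argument gives the rows, the second argument gives the columns.
toy.table <- with(toy, table(am, gear))
toy.table
```

Here the rows are the levels of am and the columns are the distinct values of gear, so the cell for auto and 3 holds the two automatic cars with three gears.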

How to create R data tables from a matrix

Researchers also use tables for more serious business, such as finding out whether a certain behavior (like smoking) has an impact on the risk of getting an illness (for example, lung cancer). That gives you four possible cases: risk behavior and sick, risk behavior and healthy, no risk behavior and healthy, or no risk behavior and sick.

Often the result of such a study consists of the counts for every combination. If you have the counts for every case, you can very easily create the table yourself, like this:

> trial <- matrix(c(34,11,9,32), ncol=2)
> colnames(trial) <- c('sick', 'healthy')
> rownames(trial) <- c('risk', 'no_risk')
> trial.table <- as.table(trial)

With this code, you do the following:

  1. Create a matrix with the number of cases for every combination of sick/healthy and risk/no risk behavior.

  2. Add column names and row names to point out which categories the counts are for.

  3. Convert that matrix to a table.

The result looks like this:

> trial.table
        sick healthy
risk      34       9
no_risk   11      32

A table like trial.table can be seen as a summary of two variables. One variable indicates if the person is sick or healthy, and the other variable indicates whether the person shows risky behavior.
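
If you prefer, you can probably save a couple of lines by passing the names to matrix() through its dimnames argument. This is only a sketch of an alternative; the names trial2 and trial2.table are chosen here for illustration so they don't overwrite the objects above.

```r
# Same counts as before, with row and column names set in one call.
# dimnames takes a list: row names first, then column names.
trial2 <- matrix(c(34, 11, 9, 32), ncol = 2,
                 dimnames = list(c('risk', 'no_risk'),
                                 c('sick', 'healthy')))

# Convert the named matrix to a table.
trial2.table <- as.table(trial2)
trial2.table
```

Because matrix() fills the values column by column, the first two numbers end up in the sick column and the last two in the healthy column.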

How to extract the data in R

Although tables and matrices are two different beasts, you can treat a two-way table like a matrix in most situations. This comes in handy if you want to extract values from the table. If you want to know how many people were sick and showed risk behavior, you simply do the following:

> trial.table['risk', 'sick']
[1] 34
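
The same matrix-style indexing should also let you pull out a whole row or column by leaving one index empty. The sketch below rebuilds the table so that it runs on its own; the names risk.row and sick.col are made up for illustration.

```r
# Rebuild the table so this sketch is self-contained.
trial <- matrix(c(34, 11, 9, 32), ncol = 2)
colnames(trial) <- c('sick', 'healthy')
rownames(trial) <- c('risk', 'no_risk')
trial.table <- as.table(trial)

# All counts for people who showed risk behavior (one row).
risk.row <- trial.table['risk', ]

# All counts for sick people (one column).
sick.col <- trial.table[, 'sick']
```

Each result keeps the names of the remaining dimension, so you can still look up a single count by name, for example risk.row['healthy'].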