How to Look at Data Margins and Proportions in R

By Andrie de Vries, Joris Meys

In categorical data analysis, many R techniques use the marginal totals of the table in the calculations. The marginal totals are the total counts of the cases over the categories of interest. For example, the marginal totals for behavior would be the sum over the rows of the table trial.table.
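If you want to follow along, here is a minimal sketch that rebuilds the table from the counts printed in this article and computes its marginal totals directly. The matrix-based construction is an assumption (this section does not show how trial.table was originally created), and the names behavior.totals and health.totals are only illustrative.

```r
# Rebuild the 2 x 2 table from the counts shown in this article.
# Assumption: the original trial.table may have been created differently;
# this is just one way to get the same counts and labels.
trial.table <- as.table(matrix(c(34, 11, 9, 32), ncol = 2,
                               dimnames = list(c("risk", "no_risk"),
                                               c("sick", "healthy"))))

# Marginal totals for behavior: sum over each row
behavior.totals <- margin.table(trial.table, margin = 1)

# Marginal totals for health status: sum over each column
health.totals <- margin.table(trial.table, margin = 2)

print(behavior.totals)
print(health.totals)
```

The margin.table() function returns only the totals, without attaching them to the table itself.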

How to add margins to the table

R allows you to extend a table with the marginal totals of the rows and columns in one simple command. For that, you use the addmargins() function, like this:

> addmargins(trial.table)
        sick healthy Sum
risk      34       9  43
no_risk   11      32  43
Sum       45      41  86

You can also add the margins for only one dimension by specifying the margin argument of the addmargins() function. For example, to get only the marginal counts for the behavior, you do the following:

> addmargins(trial.table,margin=2)
        sick healthy Sum
risk      34       9  43
no_risk   11      32  43

The margin argument takes a number or a vector of numbers, but its meaning can be a bit confusing. The margins are numbered the same way as in the apply() function: 1 stands for rows and 2 for columns. Setting margin to 2 adds an extra column to the table, and that extra column contains the row totals.
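The following sketch may help to untangle the numbering. It rebuilds the table from the counts printed above (an assumed construction, not necessarily the original one) and compares the row totals from apply() with the extra column and row that addmargins() creates. The object names are only illustrative.

```r
# Assumption: same counts and labels as the table printed in this article
trial.table <- as.table(matrix(c(34, 11, 9, 32), ncol = 2,
                               dimnames = list(c("risk", "no_risk"),
                                               c("sick", "healthy"))))

# apply() with margin 1 works row by row, so this gives the row totals
row.totals <- apply(trial.table, 1, sum)

# margin = 2 appends a "Sum" column that holds those same row totals
with.sum.col <- addmargins(trial.table, margin = 2)

# margin = 1 appends a "Sum" row that holds the column totals
with.sum.row <- addmargins(trial.table, margin = 1)

# A vector of margins adds both at once, like the default call
with.both <- addmargins(trial.table, margin = c(1, 2))

print(with.sum.col)
print(with.sum.row)
```

In other words, the number tells addmargins() which dimension to extend, not which totals you get.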

How to calculate proportions

You can convert a table with counts to a table with proportions very easily using the prop.table() function. This also works for multiway tables. If you want to know the proportions of observations in every cell of the table to the total number of cases, you simply do the following:

> prop.table(trial.table)
             sick   healthy
risk    0.3953488 0.1046512
no_risk 0.1279070 0.3720930

This tells you that, for example, 10.5 percent of the people in the study showed risk behavior and still stayed healthy.
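To see what prop.table() does here, you can divide the counts by the grand total yourself. The sketch below assumes the table is rebuilt from the counts printed in this article. The last lines use R's built-in Titanic dataset, which is not part of this article, purely to illustrate that the same call works on a multiway table.

```r
# Assumption: same counts and labels as the table printed in this article
trial.table <- as.table(matrix(c(34, 11, 9, 32), ncol = 2,
                               dimnames = list(c("risk", "no_risk"),
                                               c("sick", "healthy"))))

# Proportions of every cell relative to the total number of cases
overall <- prop.table(trial.table)

# The same calculation by hand: each count divided by the grand total
manual <- trial.table / sum(trial.table)

# Percentages rounded to one decimal are often easier to read
print(round(100 * overall, 1))

# Multiway illustration with the built-in four-way Titanic table
titanic.props <- prop.table(Titanic)
print(sum(titanic.props))
```

Without a margin argument, all the proportions in the table add up to 1, however many dimensions the table has.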

How to calculate proportions over columns and rows

But what if you want to know what fraction of the people with risk behavior got sick? In that case, you don’t divide the counts by the total number of cases in the whole dataset; instead, you divide them by the marginal totals.

R lets you do this very easily using, again, the prop.table() function, but this time specifying the margin argument.

Take a look at the table again. You want to calculate the proportions over each row, because each row represents one category of behavior. So, to get the correct proportions, you specify margin=1 like this:

> prop.table(trial.table, margin=1)
             sick   healthy
risk    0.7906977 0.2093023
no_risk 0.2558140 0.7441860

In every row, the proportions sum up to 1. Now you can see that 79 percent of the people showing risk behavior got sick. Well, it isn’t big news that risky behavior can cause diseases, and the proportions shown in the last result point in that direction.
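If you want to check the row proportions by hand, you can divide each row by its own total. This is a minimal sketch under the assumption that the table is rebuilt from the counts printed in this article; the use of sweep() is just one way to do the division, and the object names are illustrative.

```r
# Assumption: same counts and labels as the table printed in this article
trial.table <- as.table(matrix(c(34, 11, 9, 32), ncol = 2,
                               dimnames = list(c("risk", "no_risk"),
                                               c("sick", "healthy"))))

# Row proportions: every count divided by the total of its row
row.props <- prop.table(trial.table, margin = 1)

# The same calculation by hand: sweep the row totals out of the rows
manual <- sweep(trial.table, 1, rowSums(trial.table), "/")

# Column proportions: every count divided by the total of its column
col.props <- prop.table(trial.table, margin = 2)

print(rowSums(row.props))  # each row adds up to 1
print(colSums(col.props))  # each column adds up to 1
```

With margin=2, you get a different answer to a different question: what fraction of the sick people showed risk behavior.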

Yet, scientists believe you only if you can back it up in a more objective way. That’s the point at which you should consider doing some statistical testing.