How to Look at Data Margins and Proportions in R
In categorical data analysis, many R techniques use the marginal totals of a table in their calculations. The marginal totals are the total counts of cases over the categories of interest. For example, the marginal totals for behavior in the table trial.table are found by adding up each row, which gives one total per behavior category (risk and no_risk).
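If you want the marginal totals themselves rather than an extended table, base R's margin.table() returns them directly. The following is a minimal sketch. Because this section doesn't show how trial.table was created, the first lines rebuild it from the counts printed in this section; that reconstruction is an assumption, not the original code.

```r
# Assumed reconstruction of trial.table from the counts shown in this section
trial <- matrix(c(34, 11, 9, 32), ncol = 2,
                dimnames = list(c("risk", "no_risk"), c("sick", "healthy")))
trial.table <- as.table(trial)

# Marginal totals for behavior: add up each row (sick + healthy)
behavior.totals <- margin.table(trial.table, 1)
print(behavior.totals)   # risk 43, no_risk 43

# Marginal totals for health status: add up each column (risk + no_risk)
status.totals <- margin.table(trial.table, 2)
print(status.totals)     # sick 45, healthy 41

# The same row totals computed with rowSums()
print(rowSums(trial.table))
```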
How to add margins to the table
R allows you to extend a table with the marginal totals of the rows and columns in one simple command. For that, you use the addmargins() function, like this:
> addmargins(trial.table)
        sick healthy Sum
risk      34       9  43
no_risk   11      32  43
Sum       45      41  86
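Because addmargins() returns an ordinary table with an extra Sum row and Sum column, you can pull individual totals out of it by name. A short sketch, again rebuilding trial.table from the counts shown here as an assumption:

```r
# Assumed reconstruction of trial.table from the counts shown in this section
trial <- matrix(c(34, 11, 9, 32), ncol = 2,
                dimnames = list(c("risk", "no_risk"), c("sick", "healthy")))
trial.table <- as.table(trial)

full <- addmargins(trial.table)
print(dim(full))               # 3 3: one extra row and one extra column
print(full["Sum", "Sum"])      # grand total, 86
print(full["risk", "Sum"])     # row total for risk behavior, 43
print(full["Sum", "sick"])     # column total for sick, 45
```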
You also can add the margins for only one dimension by specifying the margin argument for the addmargins() function. For example, to get only the marginal counts for the behavior, you do the following:
> addmargins(trial.table, margin=2)
        sick healthy Sum
risk      34       9  43
no_risk   11      32  43
The margin argument takes a number or a vector of numbers, but it can be a bit confusing. The dimensions are numbered the same way as in the apply() function, so 1 stands for rows and 2 for columns. However, margin names the dimension that gets summed over: margin=2 sums across the columns, so it adds a Sum column, and that column contains the row totals. Likewise, margin=1 adds a Sum row that contains the column totals.
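To see the difference side by side, you can compare both settings of margin with apply(). This sketch assumes the same reconstructed trial.table (built from the counts in this section, not taken from the original code):

```r
# Assumed reconstruction of trial.table from the counts shown in this section
trial <- matrix(c(34, 11, 9, 32), ncol = 2,
                dimnames = list(c("risk", "no_risk"), c("sick", "healthy")))
trial.table <- as.table(trial)

# margin = 1 sums over dimension 1 (the rows): adds a Sum ROW of column totals
by.rows <- addmargins(trial.table, margin = 1)
print(by.rows)

# margin = 2 sums over dimension 2 (the columns): adds a Sum COLUMN of row totals
by.cols <- addmargins(trial.table, margin = 2)
print(by.cols)

# apply() uses the same numbering, but MARGIN = 1 keeps the rows:
# it returns one total per row, the same numbers as by.cols[, "Sum"]
print(apply(trial.table, 1, sum))
```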
How to calculate proportions
You can convert a table of counts into a table of proportions very easily with the prop.table() function, which also works for multiway tables. To get the proportion of all cases that falls in each cell of the table, you simply do the following:
> prop.table(trial.table)
             sick   healthy
risk    0.3953488 0.1046512
no_risk 0.1279070 0.3720930
This tells you, for example, that about 10.5 percent of the people in the study showed risk behavior but stayed healthy.
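Without a margin, prop.table() gives the same result as dividing every cell by the grand total, so you can check it by hand. A minimal sketch, with trial.table rebuilt from the counts in this section as an assumption:

```r
# Assumed reconstruction of trial.table from the counts shown in this section
trial <- matrix(c(34, 11, 9, 32), ncol = 2,
                dimnames = list(c("risk", "no_risk"), c("sick", "healthy")))
trial.table <- as.table(trial)

props <- prop.table(trial.table)
manual <- trial.table / sum(trial.table)   # divide every cell by the grand total (86)

print(all.equal(props, manual))            # TRUE
print(sum(props))                          # all cells together make up 1
print(round(100 * props["risk", "healthy"], 1))   # 10.5 percent
```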
How to calculate proportions over columns and rows
But what if you want to know what fraction of the people with risk behavior got sick? Then you don't divide the counts by the total number of cases in the whole dataset; instead, you divide each count by the matching marginal total.
R lets you do this very easily using, again, the prop.table() function, but this time specifying the margin argument.
Take a look at the table again. You want to calculate the proportions over each row, because each row represents one category of behavior. So, to get the correct proportions, you specify margin=1 like this:
> prop.table(trial.table, margin=1)
             sick   healthy
risk    0.7906977 0.2093023
no_risk 0.2558140 0.7441860
In every row, the proportions sum up to 1. Now you can see that 79 percent of the people showing risk behavior got sick. Well, it isn’t big news that risky behavior can cause diseases, and the proportions shown in the last result point in that direction.
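The row proportions are simply each count divided by its row total, which you can reproduce with base R's sweep(). The sketch below also shows margin=2 for contrast; it answers a different question (what share of the sick people showed risk behavior). As before, trial.table is rebuilt from the printed counts as an assumption:

```r
# Assumed reconstruction of trial.table from the counts shown in this section
trial <- matrix(c(34, 11, 9, 32), ncol = 2,
                dimnames = list(c("risk", "no_risk"), c("sick", "healthy")))
trial.table <- as.table(trial)

row.props <- prop.table(trial.table, margin = 1)

# The same thing by hand: divide each row by its own row total
manual <- sweep(trial.table, 1, rowSums(trial.table), "/")
print(all.equal(row.props, manual))   # TRUE

print(rowSums(row.props))             # every row sums to 1

# margin = 2 divides by column totals instead: among the sick,
# what share showed risk behavior? (34 / 45)
col.props <- prop.table(trial.table, margin = 2)
print(round(col.props["risk", "sick"], 3))   # about 0.756
```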
Yet scientists will believe you only if you can back up that claim in a more objective way. That's the point at which you should consider doing some statistical testing.