How to Calculate Data Proportions and Find the Center in R

By Andrie de Vries, Joris Meys

After you have the table of counts, you can calculate the proportion of each count to the total by dividing the table by the sum of its counts. To calculate the proportions of manual and automatic gearboxes in the dataset cars, you can use the following code:

> amtable/sum(amtable)
  auto manual
0.40625 0.59375

R also provides the prop.table() function, which does the same thing. You can get exactly the same result as the previous line of code by doing the following:

> prop.table(amtable)
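As a quick check that the two approaches agree, here is a minimal sketch. It uses a small made-up vector named gearbox rather than the cars dataset, so the names and numbers are illustrative assumptions only:

```r
# Hypothetical data, not the cars dataset used in the text
gearbox <- c("manual", "auto", "manual", "manual", "auto")
counts <- table(gearbox)

# Proportions by explicit division
by_division <- counts / sum(counts)

# Proportions using prop.table()
by_function <- prop.table(counts)

# Compare the two results element by element
all.equal(as.vector(by_division), as.vector(by_function))
```

Because table() orders its categories alphabetically, the proportions come back in the order auto, manual.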

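You may wonder why you would use an extra function for something that’s as easy as dividing by the sum. The reason is that prop.table() can also calculate marginal proportions: for a two-way table, its margin argument gives the proportions within each row (margin = 1) or within each column (margin = 2).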
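To illustrate marginal proportions, the following sketch builds a small two-way table and passes the margin argument to prop.table(). The vectors gearbox and cylinders are invented for this example and are not taken from the cars dataset:

```r
# Hypothetical data: gearbox type and number of cylinders for six cars
gearbox   <- c("auto", "auto", "auto", "manual", "manual", "manual")
cylinders <- c(4, 8, 8, 4, 4, 6)
tab <- table(gearbox, cylinders)

# Proportions within each row: every gearbox type sums to 1
row_props <- prop.table(tab, margin = 1)

# Proportions within each column: every cylinder count sums to 1
col_props <- prop.table(tab, margin = 2)

rowSums(row_props)
colSums(col_props)
```

Without the margin argument, prop.table() divides every cell by the grand total, which is the one-way behavior shown earlier.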

In statistics, the mode of a categorical variable is the value that occurs most frequently. It isn’t exactly the center of your data, but for a nominal variable, whose values have no order, you can’t really talk about a center anyway.

Although there isn’t a specific function to calculate the mode, you can get it by combining a few tricks:

  1. To get the counts for each value, use table().

  2. To find the location of the maximum number of counts, compare the table with its maximum, which you get from max().

  3. To find the mode of your variable, select the name corresponding with the location in Step 2 from the table in Step 1.

So, to find the mode for the variable am in the dataset cars, you can use the following code:

> id <- amtable == max(amtable)
> names(amtable)[id]
[1] "manual"

The variable id contains a logical vector that has the value TRUE for every value in the table amtable that is equal to the maximum in that table. You select the name from the values in amtable using this logical vector as an index.

You can also use the which.max() function to find the location of the maximum in a vector. This function has one important disadvantage, though: if several values tie for the maximum, which.max() returns the position of the first one only. If you’re interested in all of them, you should use the construct in the previous example.
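The difference shows up as soon as two values tie. The sketch below uses a made-up vector named shade, chosen only so that two categories share the highest count:

```r
# Hypothetical data in which "blue" and "red" both occur twice
shade <- c("red", "blue", "red", "blue", "green")
shade_table <- table(shade)

# which.max() reports only the first maximum
first_mode <- names(shade_table)[which.max(shade_table)]

# Comparing against max() keeps every tied value
all_modes <- names(shade_table)[shade_table == max(shade_table)]

first_mode
all_modes
```

Here first_mode holds a single name, while all_modes holds both tied names, so the logical-index approach is the safer choice when ties are possible.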