How to Use tapply() to Create Tabular Summaries in R
You use tapply() to create tabular summaries of data in R. It makes it easy to summarize subgroups of your data. The function takes three main arguments:
X: A vector
INDEX: A factor or list of factors
FUN: A function
For example, calculate the mean sepal length for each species in the dataset iris:
> tapply(iris$Sepal.Length, iris$Species, mean)
    setosa versicolor  virginica
     5.006      5.936      6.588
With this short line of code, you do some powerful stuff. You tell R to take the Sepal.Length column, split it according to Species, and then calculate the mean for each group.
This is an important idiom for writing code in R, and it usually goes by the name Split, Apply, and Combine (SAC). In this case, you split a vector into groups, apply a function to each group, and then combine the result into a vector.
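To make the three steps concrete, here is a minimal sketch that performs the split, the apply, and the combine by hand with base R functions, and then compares the result with the tapply() one-liner. The object names (pieces, piece_means, by_hand, by_tapply) are illustrative only. This shows the idea behind the idiom, not how tapply() is implemented internally.

```r
# Split: break the Sepal.Length vector into one piece per species
pieces <- split(iris$Sepal.Length, iris$Species)

# Apply: compute the mean of each piece (the result is a list)
piece_means <- lapply(pieces, mean)

# Combine: collapse the list of results into a named numeric vector
by_hand <- unlist(piece_means)

# The tapply() call does all three steps at once
by_tapply <- tapply(iris$Sepal.Length, iris$Species, mean)

# Compare the numbers from both approaches
all.equal(unname(by_hand), as.vector(by_tapply))
```

Both approaches give the same group means. The tapply() call is simply the more compact way to write the same idea.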
Of course, using the with() function, you can write your line of code in a slightly more readable way:
> with(iris, tapply(Sepal.Length, Species, mean))
    setosa versicolor  virginica
     5.006      5.936      6.588
Using tapply(), you also can create more complex tables to summarize your data. You do this by using a list as your INDEX argument.
How to use tapply() to create higher-dimensional tables
For example, try to summarize the data frame mtcars, a built-in data frame with data about motor-car engines and performance. As with any object, you can use str() to inspect its structure:
> str(mtcars)
The variable am is a numeric vector that indicates whether the car has an automatic (0) or manual (1) gearbox. Because this isn’t very descriptive, start by creating a new object, cars, that is a copy of mtcars, and change the column am to be a factor:
> cars <- within(mtcars,
+     am <- factor(am, levels=0:1, labels=c("Automatic", "Manual"))
+ )
Now use tapply() to find the mean miles per gallon (mpg) for each type of gearbox:
> with(cars, tapply(mpg, am, mean))
Automatic    Manual
 17.14737  24.39231
Yes, you’re correct. This is still only a one-dimensional table. Now, try to make a two-dimensional table with the type of gearbox (am) and number of gears (gear):
> with(cars, tapply(mpg, list(gear, am), mean))
  Automatic Manual
3  16.10667     NA
4  21.05000 26.275
5        NA 21.380
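The two-dimensional result is an ordinary matrix, so you can index it by row and column name. The sketch below also names the elements of the INDEX list, an optional refinement that the example above doesn't use, so that the dimensions of the result are labeled. The object name mpg_table is illustrative only, and the definition of cars is repeated so that the block runs on its own.

```r
# Recreate the cars data frame so this block is self-contained
cars <- within(mtcars,
  am <- factor(am, levels = 0:1, labels = c("Automatic", "Manual"))
)

# Naming the list elements labels the two dimensions of the result
mpg_table <- with(cars, tapply(mpg, list(gear = gear, am = am), mean))

# The dimension labels come from the names used in the list
names(dimnames(mpg_table))

# Look up a single cell by row name (gear) and column name (am)
mpg_table["4", "Manual"]

# Combinations with no cars are reported as NA
is.na(mpg_table["3", "Manual"])
```

Note that the row names are character strings ("3", "4", "5"), so you quote them when indexing.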
Like the table() function, tapply() produces a tabular summary of your data. However, table() can create only contingency tables (that is, tables of counts), whereas tapply() accepts any function as the aggregation function. In other words, with tapply() you can calculate counts, means, or any other summary value.
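As a rough illustration of that difference, the following sketch counts the cars per gearbox type twice, once with table() and once with tapply() and length(), and then swaps in median() as a different aggregation function. The object names are illustrative only, and the definition of cars is repeated so that the block runs on its own.

```r
# Recreate the cars data frame so this block is self-contained
cars <- within(mtcars,
  am <- factor(am, levels = 0:1, labels = c("Automatic", "Manual"))
)

# Counts per gearbox type with table()
counts_table <- with(cars, table(am))

# The same counts with tapply(), using length() as the function
counts_tapply <- with(cars, tapply(mpg, am, length))

# Swap in a different function to get a different summary
median_mpg <- with(cars, tapply(mpg, am, median))

# Compare the two sets of counts
all(as.vector(counts_table) == as.vector(counts_tapply))
```

Only the function passed to tapply() changes between the count and the median. The splitting by am stays the same.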
If you want to summarize statistics on a single vector, tapply() is very useful and quick to use.
How to use aggregate()
Another R function that does something very similar is aggregate():
> with(cars, aggregate(mpg, list(gear=gear, am=am), mean))
  gear        am        x
1    3 Automatic 16.10667
2    4 Automatic 21.05000
3    4    Manual 26.27500
4    5    Manual 21.38000
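The two functions return the same means in different shapes. The sketch below, with the illustrative object names wide and long, puts the results side by side. In this example tapply() gives a matrix with a cell for every combination of gear and am, whereas aggregate() gives a data frame with one row for each combination that actually occurs in the data. The definition of cars is repeated so that the block runs on its own.

```r
# Recreate the cars data frame so this block is self-contained
cars <- within(mtcars,
  am <- factor(am, levels = 0:1, labels = c("Automatic", "Manual"))
)

# tapply(): a matrix with gears in rows and gearbox types in columns
wide <- with(cars, tapply(mpg, list(gear, am), mean))

# aggregate(): a data frame with one row per observed combination
long <- with(cars, aggregate(mpg, list(gear = gear, am = am), mean))

is.matrix(wide)      # tapply() result is a matrix here
is.data.frame(long)  # aggregate() result is a data frame

# Empty combinations appear as NA cells in the matrix ...
sum(is.na(wide))

# ... but have no row at all in the data frame
nrow(long)
```

Which shape is more convenient depends on what you do next. A matrix is compact to read, while a data frame is often easier to pass on to other functions.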
Next, you take aggregate() to new heights using the formula interface.