How to Use the apply() Function to Summarize Arrays in R

By Andrie de Vries, Joris Meys

If your data is stored in an array or matrix and you want to summarize it, R’s apply() function is the tool for the job. The apply() function traverses an array or matrix along the dimensions you choose (for example, by row or by column) and applies a summarizing function to each slice.

The apply() function takes four arguments:

  • X: This is your data — an array (or matrix).

  • MARGIN: A numeric vector indicating the dimension or dimensions over which to traverse; for a matrix, 1 means rows and 2 means columns.

  • FUN: The function to apply (for example, sum or mean).

  • ... (dots): If FUN requires any additional arguments, you can add them here.
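
As a minimal sketch of how these four arguments fit together, the following example uses a small made-up matrix. The name scores and its values are assumptions chosen for illustration; they are not part of any built-in dataset. The example summarizes by row and by column, and passes na.rm = TRUE through the dots argument to the summarizing function.

```r
# Hypothetical 2 x 3 matrix with one missing value (illustration only)
scores <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 2,
                 dimnames = list(c("a", "b"), c("x", "y", "z")))

# MARGIN = 1: one result per row; na.rm = TRUE is handed on to sum()
row_totals <- apply(scores, 1, sum, na.rm = TRUE)

# MARGIN = 2: one result per column; na.rm = TRUE is handed on to mean()
col_means <- apply(scores, 2, mean, na.rm = TRUE)

print(row_totals)
print(col_means)
```

Without na.rm = TRUE, any row or column that contains the missing value would return NA.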

To illustrate this, look at the built-in dataset Titanic. This is a four-dimensional table with data on the people aboard the ship Titanic, describing their class (cabin class or crew), sex, age, and whether they survived.

> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"

To find out how many people were in each class, you need to summarize Titanic over its first dimension, Class:

> apply(Titanic, 1, sum)
 1st  2nd  3rd Crew
 325  285  706  885
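
One way to sanity-check a summary like this is to confirm that the per-class counts add up to the grand total of the table. The sketch below does that, and also compares the result with margin.table(), a base R function that sums a table over a margin. The variable names are assumptions introduced for this example.

```r
# Per-class counts, as computed above
class_totals <- apply(Titanic, 1, sum)

# The counts for all classes should add up to the total of the whole table
grand_total <- sum(Titanic)
totals_match <- sum(class_totals) == grand_total

# margin.table() sums a table over the given margin (here, Class)
class_margin <- margin.table(Titanic, 1)

print(totals_match)
print(class_margin)
```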

Similarly, to calculate the number of people in the different age groups, you need to apply the sum() function over the third dimension:

> apply(Titanic, 3, sum)
Child Adult
  109  2092
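
Because Titanic has named dimensions, apply() also accepts a dimension name in place of its position. The following sketch compares the two forms; the variable names are assumptions made for this example.

```r
# Select the third dimension by position ...
by_position <- apply(Titanic, 3, sum)

# ... or by its name, which works because the dimnames of Titanic are named
by_name <- apply(Titanic, "Age", sum)

# Both calls should produce the same counts
print(all.equal(by_position, by_name))
```

Using the name can make code easier to read, because you do not have to remember the order of the dimensions.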

You can also apply a function over two dimensions at the same time. To do this, you need to combine the desired dimensions with the c() function. For example, to get a summary of how many people in each age group did and did not survive, you do the following:

> apply(Titanic, c(3, 4), sum)
       Survived
Age       No Yes
  Child   52  57
  Adult 1438 654
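
A two-dimensional summary like this one is an ordinary matrix, so you can keep working with it. As a rough sketch, the following code derives the share of each age group that survived; the names age_by_survival and survival_rate are assumptions introduced for this example.

```r
# Counts by age group and survival, as computed above
age_by_survival <- apply(Titanic, c(3, 4), sum)

# Share of each age group that survived:
# the "Yes" column divided by the total for each row
survival_rate <- age_by_survival[, "Yes"] / rowSums(age_by_survival)

print(round(survival_rate, 2))
```

With the counts shown above, this works out to 57 of 109 children (about 52 percent) and 654 of 2,092 adults (about 31 percent).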