How to Use the apply() Function to Summarize Arrays in R
If you have data in the form of an array or matrix and you want to summarize it, R’s apply() function does the job. The apply() function traverses an array or matrix by row, by column, or by any other dimension, and applies a summarizing function to each slice.
The apply() function takes four arguments:
X: Your data, as an array (or matrix).
MARGIN: A numeric vector indicating the dimension or dimensions to traverse; for a matrix, 1 means rows and 2 means columns, and higher numbers refer to the further dimensions of an array.
FUN: The function to apply (for example, sum or mean).
... (dots): If your FUN function requires any additional arguments, you can add them here.
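As a minimal sketch of how these arguments fit together, consider a small made-up matrix. The name m and its values are illustrative assumptions, not part of the Titanic data; na.rm = TRUE is handed to sum() through the dots:

```r
# Hypothetical 2 x 3 matrix with one missing value (illustrative data only).
# matrix() fills column by column, so the rows are (1, 3, NA) and (2, 4, 6).
m <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 2)

# MARGIN = 1: apply sum() to each row; na.rm = TRUE is passed on to sum() via ...
row_totals <- apply(m, 1, sum, na.rm = TRUE)

# MARGIN = 2: apply sum() to each column
col_totals <- apply(m, 2, sum, na.rm = TRUE)

print(row_totals)
print(col_totals)
```

Without na.rm = TRUE, any row or column containing NA would sum to NA.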
To illustrate this, look at the built-in dataset Titanic. This is a four-dimensional table of counts of the people aboard the Titanic, cross-classified by class (1st, 2nd, 3rd, or crew), sex, age group, and whether they survived.
> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
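The position of each dimension in this listing is the number you pass as MARGIN. A short sketch for looking up that mapping directly (the variable names dims and dim_names are illustrative):

```r
# Titanic ships with base R (package 'datasets'), so no extra packages are needed
dims <- dim(Titanic)                    # length of each dimension
dim_names <- names(dimnames(Titanic))   # name of each dimension, in MARGIN order

print(dims)
print(dim_names)
```

Class comes first, so it corresponds to MARGIN = 1; Survived comes last, so it corresponds to MARGIN = 4.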
To find out how many people were in each class, you need to summarize Titanic over its first dimension, Class:
> apply(Titanic, 1, sum)
 1st  2nd  3rd Crew 
 325  285  706  885 
Similarly, to calculate the number of people in each age group, you need to apply the sum() function over the third dimension, Age:
> apply(Titanic, 3, sum)
Child Adult 
  109  2092 
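Because both summaries count the same people, their totals should agree with the grand total of the table. The following sketch checks that, and also shows that MARGIN may be given as a dimension name when the array has named dimnames, as Titanic does (the variable names are illustrative):

```r
by_class <- apply(Titanic, 1, sum)
by_age   <- apply(Titanic, 3, sum)

# Both summaries count the same people, so each must add up to the grand total
total <- sum(Titanic)
stopifnot(sum(by_class) == total, sum(by_age) == total)

# MARGIN can also be a dimension name when the array has named dimnames
by_age_named <- apply(Titanic, "Age", sum)
stopifnot(identical(by_age_named, by_age))

print(total)
```

Using the name instead of the number can make the call easier to read, at the cost of depending on the dimnames being present.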
You can also apply a function over two dimensions at the same time. To do this, combine the desired dimensions with the c() function. For example, to get a summary of how many people in each age group survived, you do the following:
> apply(Titanic, c(3, 4), sum)
       Survived
Age       No Yes
  Child   52  57
  Adult 1438 654
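With two margins, the result is a matrix with one row and one column per level of the chosen dimensions. A brief sketch of working with such a result, and of applying the same pattern to another pair of dimensions (the variable names are illustrative):

```r
age_by_survival <- apply(Titanic, c(3, 4), sum)

# The result is an ordinary matrix, so cells can be selected by name
child_survivors <- age_by_survival["Child", "Yes"]

# Same pattern for Class (dimension 1) against Survived (dimension 4)
class_by_survival <- apply(Titanic, c(1, 4), sum)

# Summing each row of the two-way table recovers the one-way summary by class
stopifnot(isTRUE(all.equal(rowSums(class_by_survival),
                           apply(Titanic, 1, sum))))

print(child_survivors)
print(class_by_survival)
```

The order of the numbers inside c() determines which dimension becomes the rows and which becomes the columns of the result.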