How to Use the Apply Family of Functions in R - dummies

By Andrie de Vries, Joris Meys

Using for loops has some very important side effects. The objects you create in a for loop stay in the workspace afterward, and objects you change in a for loop are changed in the workspace. Some people may intend this, but for others it's an unwanted side effect of the way for loops are implemented in R.

Take a look at the following example:

> songline <- 'Get out of my dreams...'
> for(songline in 1:5) print('...Get into my car!')

Contrary to what you may expect, after running this code, the value of songline is not the string ‘Get out of my dreams…’, but the number 5, as shown in the output below:

> songline
[1] 5

Although you never explicitly changed the value of songline anywhere in the code, R does so implicitly when carrying out the for loop. Every iteration, R reassigns the next value from the vector to songline . . . in the workspace!

By choosing the names of your variables and the loop identifier wisely, you can avoid running into trouble. When writing large scripts, though, you need to do some serious bookkeeping on names to avoid mistakes.
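One way to do that bookkeeping is to give the loop identifier a name you use nowhere else. The sketch below reuses the songline example; the name i is simply our choice for illustration. Note that i itself still ends up in the workspace afterward. You've only avoided overwriting songline.

```r
songline <- 'Get out of my dreams...'

# Use a separate, dedicated name (i) for the loop identifier
for (i in 1:5) print('...Get into my car!')

songline  # unchanged: still 'Get out of my dreams...'
i         # the loop identifier itself remains in the workspace, with value 5
```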

To be completely correct, a for loop affects the environment you're working in at that moment. If you use the for loop in a script that you run in the console, the effects take place in the workspace. If you use a for loop in the body of a function, the effects take place within the environment of that function.
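A minimal sketch of that scoping, using a hypothetical function named sing_chorus(): the loop inside the function reassigns a local songline in the function's environment, while the songline in the workspace keeps its value.

```r
songline <- 'Get out of my dreams...'

# Hypothetical helper: its for loop runs in the function's own environment
sing_chorus <- function() {
  for (songline in 1:5) print('...Get into my car!')
  songline  # the local songline now holds the last loop value
}

result <- sing_chorus()
result    # 5, the value of the local songline
songline  # the workspace copy is untouched
```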

Here’s the good news: R has another looping system that’s very powerful, that’s usually about as fast as a for loop (and often much faster than a loop that grows its result one element at a time), and, most important of all, that doesn’t have the side effects of a for loop. This system actually consists of a whole family of related functions, known as the apply family, whose names all end in apply. The base package includes apply(), eapply(), lapply(), mapply(), rapply(), sapply(), tapply(), and vapply().
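To see the difference, here is a sketch of the songline example rewritten with lapply(). The loop identifier becomes the argument of an anonymous function, so it exists only inside each call of that function. The name lyrics is our own choice for storing the result.

```r
songline <- 'Get out of my dreams...'

# songline is now a function argument, local to each call of the anonymous function
lyrics <- lapply(1:5, function(songline) paste0(songline, ': ...Get into my car!'))

lyrics[[1]]  # "1: ...Get into my car!"
songline     # the workspace value is unchanged
```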

The family features

Before you start using any of the functions in the apply family, here are the most important properties of these functions:

  • Every one of the apply functions takes at least two arguments: an object and a function to apply to it. You pass the function itself, not the result of calling it, as an argument.

  • None of these apply functions has side effects. This is the main reason to use them: if you can use an apply function instead of a for loop, use the apply solution. Be aware, however, that the apply family doesn't remove side effects of the function you apply; if that function changes the workspace or prints to the console, it still does so.

  • Every apply function can pass extra arguments on to the function it applies. It does so through the dots (...) argument.

  • Every function of the apply family always returns a result. Using the apply family makes sense only if you need that result. If you just want to print messages to the console with print() or cat(), for example, using the apply family is unnecessary.
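A short sketch of two of these properties. The first part passes digits = 2 through the dots argument of sapply() on to round(). The second part shows why applying cat() only for its printed output is pointless: cat() returns NULL, so the collected result is just a list of NULLs. The variable names are our own.

```r
x <- c(3.14159, 2.71828, 1.41421)

# Arguments after FUN (here digits = 2) are passed on to round() via ...
rounded <- sapply(x, round, digits = 2)
rounded  # 3.14 2.72 1.41

# cat() prints but returns NULL, so the apply result carries no useful information
printed <- lapply(1:3, function(i) cat('Line', i, '\n'))
```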

Meet three of the members

Say hello to apply(), sapply(), and lapply(), the most commonly used members of the apply family. Each of these functions applies another function to every element of an object. What counts as an element depends on the object and the function, as summarized in the following table.

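| Function | Object the function works on | What the function sees as elements | Result type                    |
|----------|------------------------------|------------------------------------|--------------------------------|
| apply    | Matrix                       | Rows or columns                    | Vector, matrix, array, or list |
| apply    | Array                        | Rows, columns, or any dimension    | Vector, matrix, array, or list |
| apply    | Data frame                   | Rows or columns                    | Vector, matrix, array, or list |
| sapply   | Vector                       | Elements                           | Vector, matrix, or list        |
| sapply   | Data frame                   | Variables                          | Vector, matrix, or list        |
| sapply   | List                         | Elements                           | Vector, matrix, or list        |
| lapply   | Vector                       | Elements                           | List                           |
| lapply   | Data frame                   | Variables                          | List                           |
| lapply   | List                         | Elements                           | List                           |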
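The sketch below illustrates a few rows of the table with small, made-up objects (m, df, and the result names are our own). For apply(), the MARGIN argument selects what the function sees: 1 for rows, 2 for columns. sapply() simplifies its result to a vector or matrix when it can, while lapply() always returns a list.

```r
# A 2 x 3 matrix, filled column by column: columns are (1, 2), (3, 4), (5, 6)
m <- matrix(1:6, nrow = 2)

# apply(): MARGIN = 1 works on rows, MARGIN = 2 on columns
row_sums <- apply(m, 1, sum)  # 9 12
col_sums <- apply(m, 2, sum)  # 3 7 11

# sapply() on a data frame sees each variable and simplifies to a named vector
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
col_means <- sapply(df, mean)  # a = 2, b = 3.5

# sapply() simplifies to a matrix when every call returns a vector of the same length
pairs <- sapply(1:3, function(n) c(n, n^2))  # 2 x 3 matrix

# lapply() always returns a list, one element per input element
squares <- lapply(1:3, function(n) n^2)
```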