How to Apply Functions on Rows and Columns in R - dummies

By Andrie de Vries, Joris Meys

In R, you can use the apply() function to apply a function over every row or column of a matrix or data frame. This lets you replace many explicit loops over rows or columns with a single function call.
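Because apply() also accepts data frames, one detail is worth knowing: it first converts a data frame to a matrix with as.matrix(), so every column ends up with one common type. The sketch below uses a hypothetical data frame, birds, that holds the same kind of counts plus a text column named day. Including that text column turns every value into a character string, so the sketch selects the numeric columns before calling apply().

```r
# Hypothetical data frame: one text column plus three numeric count columns
birds <- data.frame(
  day     = c("Mon", "Tue", "Wed"),
  sparrow = c(3, 2, 4),
  dove    = c(6, 5, 1),
  crow    = c(8, 6, 1)
)

# apply() calls as.matrix() on a data frame; with the text column included,
# the whole matrix becomes character
typeof(as.matrix(birds))
# [1] "character"

# Keep only the numeric columns, then sum the counts for each day (row)
apply(birds[, c("sparrow", "dove", "crow")], 1, sum)
# totals per day: 17, 13, 6
```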

Count in R using the apply function

Imagine you counted the birds in your backyard on three different days and stored the counts in a matrix like this:

> counts <- matrix(c(3,2,4,6,5,1,8,6,1), ncol=3)
> colnames(counts) <- c('sparrow','dove','crow')
> counts
     sparrow dove crow
[1,]       3    6    8
[2,]       2    5    6
[3,]       4    1    1

Each column represents a different species, and each row represents a different day. Now you want to know the highest count for each species over the three days. You could write a for loop to do so, but with apply(), you need only one line of code:

> apply(counts, 2, max)
sparrow    dove    crow
      4       6       8
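For comparison, here is a sketch of the for loop that this apply() call replaces. It recreates counts so that it runs on its own; the result name max_per_species is just an illustrative choice.

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c("sparrow", "dove", "crow")

# Preallocate a result vector named after the columns (species)
max_per_species <- numeric(ncol(counts))
names(max_per_species) <- colnames(counts)

# Visit the columns one by one and store each column's maximum
for (j in seq_len(ncol(counts))) {
  max_per_species[j] <- max(counts[, j])
}

max_per_species
# sparrow    dove    crow
#       4       6       8
```

The loop needs a preallocated vector, explicit names, and an index; apply() handles all three for you.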

The apply() function returns a vector with the maximum for each column and conveniently uses the column names as names for this vector as well. If R doesn’t find names for the dimension over which apply() runs, it returns an unnamed object instead.
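A quick way to see this naming rule is to compare a column-wise call with a row-wise one. The sketch recreates counts, which has column names but no row names, so the row-wise result is unnamed until row names are added. The labels day1 to day3 are hypothetical, added only for this illustration.

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c("sparrow", "dove", "crow")

# Row-wise: counts has no row names, so the result is unnamed
by_row <- apply(counts, 1, max)
by_row
# [1] 8 6 4

# After adding (hypothetical) row names, the row-wise result picks them up
rownames(counts) <- c("day1", "day2", "day3")
apply(counts, 1, max)
# day1 day2 day3
#    8    6    4
```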

Let’s take a look at how this apply() function works. In the preceding call, you used three arguments:

  • The object on which the function has to be applied: In this case, it’s the matrix counts.

  • The dimension or index over which the function has to be applied: The number 1 means row-wise, and the number 2 means column-wise. Here, we apply the function over the columns. For arrays with more than two dimensions, this index can be larger than 2.

  • The name of the function that has to be applied: You can put quotation marks around the function name, but you don’t have to. Here, we apply the function max. Note that you don’t put parentheses after the function name.
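The sketch below varies each of these three arguments in turn: a row-wise index, a quoted function name, an anonymous function written in place (not covered in the list above, but accepted by apply()), and an index of 3 on a three-dimensional array. The array arr is a made-up example.

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c("sparrow", "dove", "crow")

apply(counts, 1, max)      # index 1: one maximum per row (day)
apply(counts, 2, "max")    # function name in quotation marks works too

# An anonymous function: the spread (max minus min) per species
apply(counts, 2, function(x) max(x) - min(x))
# sparrow    dove    crow
#       2       5       7

# A made-up 2 x 2 x 2 array: index 3 sums each 2 x 2 slice
arr <- array(1:8, dim = c(2, 2, 2))
apply(arr, 3, sum)
# [1] 10 26
```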

The apply() function splits up the matrix into rows or columns, depending on the index you give it; here, it splits the matrix into columns. Remember that if you select a single row or column, R will, by default, simplify that to a vector. The apply() function then passes these vectors one by one as an argument to the function you specified. So, the applied function needs to be able to deal with vectors.
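To check that the applied function really receives vectors, the sketch below applies is.vector() to each row and then uses sum(), which works on any numeric vector. The last call shows a related behavior: when the function returns more than one value per vector, as range() does, apply() combines the results into a matrix with one column per column (or row) of the input.

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c("sparrow", "dove", "crow")

# Each row is passed to the function as a plain numeric vector
apply(counts, 1, is.vector)
# [1] TRUE TRUE TRUE

# sum() accepts any numeric vector, so it works row by row
# (total birds seen per day)
apply(counts, 1, sum)
# [1] 17 13  6

# range() returns two values (minimum and maximum) per column, so the
# result is a 2 x 3 matrix: one row per returned value, one column per species
apply(counts, 2, range)
```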

Add extra arguments to the apply function

Let’s go back to our example from the preceding section: Imagine you didn’t look for doves on the second day. This means that, for that day, you don’t have any data, so you set that value to NA like this:

> counts[2, 2] <- NA

If you apply the max function on the columns of this matrix, you get the following result:

> apply(counts, 2, max)
sparrow    dove    crow
      4      NA       8

That’s not what you want: because the dove column now contains an NA, max returns NA for that column. To deal with the missing values, you need to pass the argument na.rm=TRUE to the max function in the apply() call (see Chapter 4). Luckily, this is easily done in R: you just add any extra arguments for the function as extra arguments of the apply() call, like this:

> apply(counts, 2, max, na.rm=TRUE)
sparrow    dove    crow
      4       6       8
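To see why na.rm matters, the sketch below calls max() on the dove column directly, with and without na.rm. It then shows an equivalent form that some readers find more explicit: wrapping the call in an anonymous function that sets na.rm itself.

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c("sparrow", "dove", "crow")
counts[2, 2] <- NA

# A single NA makes max() return NA for the whole vector...
max(counts[, "dove"])
# [1] NA

# ...unless you ask it to drop missing values first
max(counts[, "dove"], na.rm = TRUE)
# [1] 6

# Same result as apply(counts, 2, max, na.rm = TRUE)
apply(counts, 2, function(x) max(x, na.rm = TRUE))
# sparrow    dove    crow
#       4       6       8
```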

You can pass any arguments you want to the applied function by adding them to the apply() call after the first three arguments; apply() passes them along every time it calls the function.
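You aren’t limited to one extra argument. In the sketch below, both probs and na.rm are passed on to quantile() for every column, which gives the median count per species while ignoring the missing dove count. quantile() is simply an illustrative choice; any function that accepts the arguments you pass would work the same way.

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c("sparrow", "dove", "crow")
counts[2, 2] <- NA

# probs and na.rm both travel through apply() to quantile()
med <- apply(counts, 2, quantile, probs = 0.5, na.rm = TRUE)
med
# medians: sparrow 3, dove 3.5, crow 6
```

The apply() help page also warns that these extra arguments can’t share a name with apply()’s own arguments, and that care is needed to avoid partial matching to MARGIN or FUN.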