How to Use Loops with Indices in R - dummies

By Andrie de Vries, Joris Meys

Using loops in R is very handy, but you can often write more efficient code if you loop not over the values of a vector but over its indices. To do so, you replace the middle section of the function with the following code:

nclient <- length(client)
VAT <- numeric(nclient)
for(i in seq_along(client)){
  VAT[i] <- switch(client[i], private=1.12, public=1.06, 1)
}

Here are a few differences from looping over the values of the vector:

  • You assign the length of the vector client to the variable nclient.

  • Then you make a numeric vector VAT that is exactly as long as the vector client. This is called pre-allocation of a vector.

  • Then you loop over the indices of client instead of over the vector itself by using the function seq_along(). In the first pass through the loop, the first value in VAT is set to the result of switch() applied to the first value in client. In the second pass, the second value in VAT is set to the result of switch() applied to the second value in client, and so on.
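For comparison, here is a sketch of both styles side by side. The function names vat_by_value() and vat_by_index() and the sample client data are hypothetical. The value-based version starts from an empty vector and grows it with c() in every pass, while the index-based version pre-allocates VAT and fills it in by position:

```r
# Hypothetical value-based version: VAT starts empty and grows in every pass
vat_by_value <- function(client) {
  VAT <- numeric(0)
  for (j in client) {
    # c() builds a new, longer vector each time through the loop
    VAT <- c(VAT, switch(j, private = 1.12, public = 1.06, 1))
  }
  VAT
}

# Hypothetical index-based version: VAT is allocated once, then filled in
vat_by_index <- function(client) {
  VAT <- numeric(length(client))   # pre-allocated vector of zeros
  for (i in seq_along(client)) {
    # 1 is the unnamed default for any client type not listed
    VAT[i] <- switch(client[i], private = 1.12, public = 1.06, 1)
  }
  VAT
}

client <- c("public", "abroad", "private", "public")  # made-up example data
identical(vat_by_value(client), vat_by_index(client))
# [1] TRUE
```

Both functions return the same rates; the difference lies in how often R has to allocate a new vector along the way.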

You may be tempted to replace seq_along(client) with the vector 1:nclient, but that would be a bad idea. If the vector client has a length of 0, seq_along(client) creates an empty vector, and the code in the loop never executes. If you use 1:nclient, R creates the vector c(1, 0) and loops over those two values, giving you a completely wrong result.
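A quick check of the zero-length case makes the difference concrete. This sketch uses an empty client vector and counts how often each loop body runs; the counter variables runs_seq and runs_colon are only for illustration:

```r
client <- character(0)   # an empty vector: length 0
nclient <- length(client)

seq_along(client)        # integer(0): nothing to loop over
1:nclient                # 1:0 counts down and gives c(1, 0)

runs_seq <- 0
for (i in seq_along(client)) runs_seq <- runs_seq + 1

runs_colon <- 0
for (i in 1:nclient) runs_colon <- runs_colon + 1

runs_seq     # 0: the loop body never ran
runs_colon   # 2: the loop body ran for i = 1 and for i = 0
```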

Every time you lengthen an object in R, R has to copy the whole object and move it to a new place in memory. This has two effects:

  • First, it slows down your code, because all the copying takes time.

  • Second, as R continuously moves things around in memory, the memory gets split up into many small pieces.

This is called fragmentation, and it makes memory management less efficient; for example, it can become harder for R to find a single block of memory large enough for a big object. You can avoid this fragmentation by pre-allocating memory, as in the previous example.
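To see the cost of repeated copying, you can time a loop that grows a vector against one that fills a pre-allocated vector. This is a rough sketch: the function names grow() and prealloc() and the size n = 2e4 are arbitrary choices, and the timings depend heavily on your machine and R version, so no numbers are shown here:

```r
n <- 2e4

# Growing: each c() call allocates a new, longer vector and copies the old one
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i^2)
  x
}

# Pre-allocating: one allocation up front, then values are filled in by index
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

system.time(g <- grow(n))      # elapsed time for the growing loop
system.time(p <- prealloc(n))  # elapsed time for the pre-allocated loop
identical(g, p)                # both approaches give the same vector
```

The results are identical; only the amount of allocation and copying differs, which is what the timings reflect.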