How to Use Vectorization with If Statements in R

By Andrie de Vries, Joris Meys

Vectorization is one of the defining attributes of the R language. R wouldn’t be R if it didn’t have some kind of vectorized version of an if…else statement.

The problem

The priceCalculator() function still isn’t very economical to use. If you have 100 clients, you’ll have to calculate the price for every client separately. Check for yourself what happens if you pass, for example, two different amounts of hours as a single argument:

> priceCalculator(c(25,110))
[1] 1060 4664
Warning message:
In if (hours > 100) net.price <- net.price * 0.9 :
  the condition has length > 1 and only the first element will be used

Not only does R warn you that something fishy is going on, but the result you get is plain wrong. Instead of $4,664, the second client should be charged only $4,198:

> priceCalculator(110)
[1] 4198

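The warning message should give you a fair idea about what happened. An if statement can deal only with a single value, but the expression hours > 100 returns two values, so only the first comparison (25 > 100, which is FALSE) decides whether the discount is applied. From R 4.2.0 on, a condition of length greater than one is an error rather than a warning. You can see the two values with the following code: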

> c(25, 110) > 100
[1] FALSE  TRUE
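
A quick way to confirm the mismatch is to look at the length of the condition. The following is only a sketch; the variable names hours and condition are chosen here for illustration and aren't part of the priceCalculator() function:

```r
hours <- c(25, 110)          # hypothetical name for the two clients' hours
condition <- hours > 100     # the comparison is done element by element

length(condition)            # 2, but an if statement expects exactly one value
condition[1]                 # the only element the warning says is used
```

Because the first element is the one for the 25-hour client, the discount is skipped for both clients, which is consistent with the wrong second price shown earlier.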

Choose based on a logical vector in R

The solution you’re looking for is the ifelse() function, which is a vectorized way of choosing values from two vectors. This remarkable function takes three arguments:

  • A test vector with logical values

  • A vector with values that should be returned if the corresponding value in the test vector is TRUE

  • A vector with values that should be returned if the corresponding value in the test vector is FALSE

How it works

Take a look at the following example:

> ifelse(c(1,3) < 2.5 , 1:2 , 3:4)
[1] 1 4

Walk through the steps the function takes:

  1. The conditional expression c(1,3) < 2.5 is evaluated to a logical vector.

  2. The first value of this vector is TRUE, because 1 is smaller than 2.5. So, the first value of the result is the first value of the second argument, which is 1.

  3. The next value is FALSE, because 3 is larger than 2.5. Hence, ifelse() takes the second value of the third argument (which is 4) as the second value of the result.

  4. A vector with the selected values is returned as the result.
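
The steps above can be mimicked by hand with logical indexing. This is only a sketch of the idea, not how ifelse() is implemented internally; the names test, yes, no, and result are chosen here for illustration:

```r
test <- c(1, 3) < 2.5        # step 1: evaluate the condition to a logical vector
yes  <- 1:2                  # values to use where test is TRUE
no   <- 3:4                  # values to use where test is FALSE

result <- no                 # start from the FALSE choices
result[test] <- yes[test]    # overwrite the positions where test is TRUE
result

# The hand-built vector should match what ifelse() returns
identical(result, ifelse(test, yes, no))
```

The point of the sketch is that the choice is made position by position: each element of the result comes from the same position in either the second or the third argument.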

Try it out

To see how this works in the example of the priceCalculator() function, try the function out at the command line in the console. Say you have two clients and you worked 25 and 110 hours for them, respectively. You can calculate the net price with the following code:

> my.hours <- c(25,110)
> my.hours * 40 * ifelse(my.hours > 100, 0.9, 1)
[1] 1000 3960

Remember, the ifelse() function can recycle its arguments. And that’s exactly what it does here. In the preceding ifelse() function call, you translate the logical vector created by the expression my.hours > 100 into a vector containing the numbers 0.9 and 1 in lieu of TRUE and FALSE, respectively.
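
To make the recycling visible, you can compare the call with one where both value vectors are written out in full. The names discount and discount.full below are hypothetical and used only for this illustration:

```r
my.hours <- c(25, 110)

# The single values 0.9 and 1 are recycled to the length of the test vector
discount <- ifelse(my.hours > 100, 0.9, 1)
discount

# The same call with both value vectors spelled out explicitly
discount.full <- ifelse(my.hours > 100, c(0.9, 0.9), c(1, 1))
identical(discount, discount.full)

# Multiply by the discount factors to get the net prices
my.hours * 40 * discount
```

Writing the single values is shorter and works for any number of clients, which is why the recycled form is the one you normally use.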

Adapt the function in R

Of course, you need to adapt the priceCalculator() function in such a way that you also can input a vector with values for the argument public. Otherwise, you wouldn’t be able to calculate the prices for a mixture of public and private clients. The final function looks like this:

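priceCalculator <- function(hours, pph=40, public){
    net.price <- hours * pph
    # 10 percent discount for every client with more than 100 hours
    net.price <- net.price * ifelse(hours > 100 , 0.9, 1)
    # add 6 percent for public clients and 12 percent for private ones
    tot.price <- net.price * ifelse(public, 1.06, 1.12)
    round(tot.price)
}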

Next, create a little data frame to test the function. For example:

> clients <- data.frame(
+  hours = c(25, 110, 125, 40),
+  public = c(TRUE,TRUE,FALSE,FALSE)
+ )

You can use this data frame now as arguments for the priceCalculator() function, like this:

> with(clients, priceCalculator(hours, public = public))
[1] 1060 4198 5040 1792
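
One way to convince yourself that the vectorized function treats every client correctly is to compare it with calling the function once per client. The sketch below assumes the function ends by rounding the total with round(), as the whole-number results suggest; the names vectorized and one.by.one are hypothetical:

```r
priceCalculator <- function(hours, pph = 40, public) {
  net.price <- hours * pph
  net.price <- net.price * ifelse(hours > 100, 0.9, 1)
  tot.price <- net.price * ifelse(public, 1.06, 1.12)
  round(tot.price)           # assumed final step: return the rounded total
}

clients <- data.frame(
  hours  = c(25, 110, 125, 40),
  public = c(TRUE, TRUE, FALSE, FALSE)
)

# One call for all clients at once
vectorized <- with(clients, priceCalculator(hours, public = public))

# One call per client, pairing each hours value with its public flag
one.by.one <- mapply(priceCalculator,
                     hours = clients$hours, public = clients$public)

all.equal(vectorized, one.by.one)

# Store the prices next to the data they were calculated from
clients$price <- vectorized
clients
```

Both approaches should agree, but the vectorized call needs no explicit loop and reads the same whether you have four clients or 100.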