How to Hunt for Bugs in R - dummies

By Andrie de Vries, Joris Meys

Hunting for bugs in R can be tricky. Although the error message usually tells you which piece of code generated the error, that may not be where things started going wrong. This makes bug hunting a complex business, but some simple strategies can help you track down these pesky creatures.

Calculate the logit

Here's a simple example to illustrate some bug-hunting strategies in R. Say your colleague wrote two functions to calculate the logit from both proportions and percentages, but he can't get them to work, so he asks you to help find the bugs. Here's the code he sends you:

# checks input and does logit calculation
logit <- function(x){
  x <- ifelse(x < 0 | x > 1, "NA", x)
  log(x / (1 - x) )
}
# transforms percentage to number and calls logit
logitpercent <- function(x){
  x <- gsub("%", "", x)
  logit(as.numeric(x))
}

Copy and paste this code into your editor and save the file as, for example, logitfunc.R. Then source the file, either with the source() function or with the Source button or command in the editor of your choice. Both functions are now loaded in R, and you're ready to start hunting.
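If you'd rather script this step, here is a minimal sketch. It writes the colleague's code, bug included, to a temporary file that stands in for logitfunc.R. It then sources that file with keep.source = TRUE. That argument keeps the source references traceback() relies on to report file names and line numbers.

```r
# The colleague's code, bug included, as the lines of a file
code <- c(
  "# checks input and does logit calculation",
  "logit <- function(x){",
  "  x <- ifelse(x < 0 | x > 1, \"NA\", x)",
  "  log(x / (1 - x) )",
  "}",
  "# transforms percentage to number and calls logit",
  "logitpercent <- function(x){",
  "  x <- gsub(\"%\", \"\", x)",
  "  logit(as.numeric(x))",
  "}"
)

# A temporary file stands in for logitfunc.R in your working directory
path <- file.path(tempdir(), "logitfunc.R")
writeLines(code, path)

# keep.source = TRUE keeps the source references (file and line numbers)
source(path, keep.source = TRUE)

# Both functions now exist in the global environment
print(exists("logit") && exists("logitpercent"))

# Line of the file on which the definition of logitpercent() starts
print(utils::getSrcLocation(logitpercent, "line"))
```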

The logit is simply the logarithm of the odds: if x is the probability that some event takes place, the logit is log(x / (1 - x)). Statisticians use it when modeling binary data with generalized linear models. If you ever need to calculate a logit yourself, you can use the qlogis() function. To convert logit values back to probabilities, use the plogis() function.
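Here is a quick check of that relationship, using only the stats package that ships with R. For a few example probabilities, the hand-computed logit matches qlogis(), and plogis() undoes qlogis().

```r
# Logit of a few probabilities, by hand and with stats::qlogis()
p <- c(0.1, 0.5, 0.9)
by_hand <- log(p / (1 - p))
builtin <- stats::qlogis(p)
print(all.equal(by_hand, builtin))

# stats::plogis() is the inverse: it turns logits back into probabilities
print(all.equal(stats::plogis(builtin), p))
```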

Know where an error comes from

Your colleague complained that he got an error when trying the following code:

> logitpercent('50%')
Error in 1 - x : non-numeric argument to binary operator
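The part after "Error in" is the expression R was evaluating when the error occurred. In a script you can capture the same information by catching the error object. The following is a sketch of that approach. conditionCall() returns the failing expression, and conditionMessage() returns the text after the colon. The colleague's two functions are repeated, bug included, so the block runs on its own.

```r
# The colleague's functions, bug included
logit <- function(x){
  x <- ifelse(x < 0 | x > 1, "NA", x)
  log(x / (1 - x) )
}
logitpercent <- function(x){
  x <- gsub("%", "", x)
  logit(as.numeric(x))
}

# Catch the error instead of letting it stop the script
err <- tryCatch(logitpercent("50%"), error = function(e) e)

print(conditionCall(err))     # the expression shown after "Error in"
print(conditionMessage(err))  # the text shown after the colon
```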

Sure enough, R throws an error, but the expression 1 - x doesn't appear anywhere in the body of logitpercent(). So the error must come from somewhere else. To find out where, call the traceback() function immediately after the error occurs, like this:

> traceback()
2: logit(as.numeric(x)) at logitfunc.R#9
1: logitpercent("50%")

The traceback() function prints the call stack that led to the last error. The call stack is the sequence of function calls that were active when the error occurred, and traceback() lists it in reverse order. The call at the top is the one inside which the error was actually generated.
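To see that ordering for yourself, here is a small sketch with three hypothetical functions that call each other. sys.calls() returns the active calls with the outermost call first. Reversing that list gives the innermost-first order that traceback() prints.

```r
# Hypothetical chain: outer_fn() calls middle_fn(), which calls inner_fn()
inner_fn  <- function() sys.calls()   # the calls active at this point
middle_fn <- function() inner_fn()
outer_fn  <- function() middle_fn()

stack <- as.list(outer_fn())

# Function name of each call; anything that isn't a plain name is labelled
fn_names <- vapply(stack, function(cl) {
  if (is.symbol(cl[[1]])) as.character(cl[[1]]) else "<other>"
}, character(1))

# sys.calls() lists the outermost call first ...
print(tail(fn_names, 3))
# ... while traceback() shows the same calls innermost first
print(rev(tail(fn_names, 3)))
```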

In your colleague's case, R called the logitpercent() function, which in turn called logit(). The traceback tells you that the error occurred inside the logit() function. What's more, the annotation logitfunc.R#9 in the traceback() output tells you that this call to logit() is on line 9 of the logitfunc.R file.
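As noted at the start, the line in the error message isn't necessarily where things started going wrong. One way to follow the traceback's clue is to run the first line of logit() by hand on the value that logitpercent("50%") passes to it. The sketch below does that. It shows that ifelse() already turns x into a character string, and only the later 1 - x fails.

```r
# What logit() receives from logitpercent("50%"): 50, not 0.5
x <- as.numeric(gsub("%", "", "50%"))
print(x)

# The first line of logit(): 50 lies outside [0, 1], so ifelse() picks "NA"
x <- ifelse(x < 0 | x > 1, "NA", x)
print(x)
print(class(x))  # the string "NA", not R's missing value NA
```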

The call stack gives you a lot of information, sometimes too much. It may point to some obscure internal function as the one that threw the error. If that function doesn't ring a bell, work your way down the traceback() output, toward the earlier calls, until you find a function you recognize. Start debugging from there.
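If you want to automate that search, here is a rough sketch. A calling handler records sys.calls() at the moment the error is signalled, before the stack unwinds. The code then keeps only the calls to functions in a list you supply. The my_functions vector is a hypothetical name for that list, so adjust it to your own code.

```r
# The colleague's functions, bug included, so this block runs on its own
logit <- function(x){
  x <- ifelse(x < 0 | x > 1, "NA", x)
  log(x / (1 - x) )
}
logitpercent <- function(x){
  x <- gsub("%", "", x)
  logit(as.numeric(x))
}

# Hypothetical list of the functions you recognize (your own code)
my_functions <- c("logit", "logitpercent")

calls <- list()
try(
  withCallingHandlers(
    logitpercent("50%"),
    # Runs before the stack unwinds, so every active call is still there,
    # including frames from try() and the error handling machinery
    error = function(e) calls <<- sys.calls()
  ),
  silent = TRUE
)
calls <- as.list(calls)

# Keep only calls whose function is named in my_functions
is_mine <- vapply(calls, function(cl) {
  is.symbol(cl[[1]]) && as.character(cl[[1]]) %in% my_functions
}, logical(1))

# Innermost first, as traceback() prints them
mine <- rev(calls[is_mine])
for (cl in mine) print(cl)
```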