How to Cross the Borders in R - dummies

By Andrie de Vries, Joris Meys

When you work with functions in R, you sometimes use objects that you didn’t first create in the workspace. You use the arguments x, mult, and FUN as if they’re objects, and you create an object percent within the function that you can’t find in the workspace after the function has run. So, what’s going on?

Create a test case

Let’s find out through a small example. First, create an object x and a small test() function like this:

x <- 1:5
test <- function(x){
 cat("This is x:", x, "\n")
 rm(x)
 cat("This is x after removing it:", x, "\n")
}

The test() function doesn’t do much. It takes an argument x, prints it to the console, removes it, and tries to print it again. You may think this function will fail, because x disappears after the line rm(x). But if you try this function, it works just fine, as shown in the following example:

> test(5:1)
This is x: 5 4 3 2 1
This is x after removing it: 1 2 3 4 5

Even after removing x, R still can find another x that it can print. If you look a bit more closely, you see that the x printed in the second line is actually not the one you gave as an argument, but the x you created before in the workspace. How come?

Search the path

When you call a function, R first creates a temporary local environment. This local environment is nested within the global environment, which means that, from that local environment, you can also access any object from the global environment. As soon as the function ends, the local environment is destroyed together with all objects in it.
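The temporary nature of the local environment can be checked with a minimal sketch like the following. The function show_local() and the variable names are hypothetical and used only for illustration; exists() with inherits = FALSE looks in one environment only, without moving up to any other environment.

```r
# Hypothetical function: creates an object in its local environment
show_local <- function() {
  percent <- "50%"                      # lives only in the local environment
  exists("percent", inherits = FALSE)   # look in the local environment only
}

found_inside <- show_local()

# After the call, look for percent in the global environment only
found_outside <- exists("percent", envir = globalenv(), inherits = FALSE)

found_inside    # TRUE
found_outside   # FALSE, assuming the workspace has no object called percent
```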

To be completely correct, the local environment of a function is always nested within the environment in which the function was defined, called the parent (or enclosing) environment, and not within the environment the function happens to be called from. If you define a function in the workspace through a script or using the command line, this parent environment is the global environment.
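One way to see which environment R falls back on is to give the calling function its own variable with the same name. In this small sketch, show_y() and caller() are hypothetical names chosen for illustration. The lookup of y inside show_y() resolves to the environment where show_y() was defined, not to the local environment of caller().

```r
y <- "workspace"

# Defined at the top level, so y is looked up there when not found locally
show_y <- function() y

caller <- function() {
  y <- "caller"   # local to caller(); show_y() does not see this object
  show_y()
}

result <- caller()
result   # "workspace"
```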

When R encounters an object name in the code of the function, it first searches the local environment. Because it finds an object x there, it uses that one for the first cat() statement. In the next line, R removes that object x. So, when R reaches the third line, it can’t find an object x in the local environment anymore. No problem. R moves up the chain of enclosing environments and checks whether it can find an object named x in the global environment. Because it can find an x there, it uses that one in the second cat() statement.
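The search order described here can be traced step by step. The following is a rough sketch, not part of the original example: trace_x() is a hypothetical variant of test() that records where an x can be found before and after rm(x). With inherits = FALSE, exists() checks the local environment only; with the default inherits = TRUE, it also searches the enclosing environments.

```r
x <- 1:5

# Hypothetical variant of test() that reports where x is visible
trace_x <- function(x) {
  before <- exists("x", inherits = FALSE)        # argument x is in the local environment
  rm(x)                                          # removes the local x only
  after_local <- exists("x", inherits = FALSE)   # no local x anymore
  after_any <- exists("x")                       # search continues in enclosing environments
  c(before = before, after_local = after_local, after_any = after_any)
}

res <- trace_x(5:1)
res
```

The first and last values are TRUE and the middle one is FALSE: after rm(x), the only x left is the one created outside the function.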

If you use rm() inside a function, rm() will, by default, delete only objects within that function. This way, you can avoid running out of memory when you write functions that have to work on huge datasets. You can immediately remove big temporary objects instead of waiting for the function to do so at the end.
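As an illustration of that pattern, here is a minimal sketch. The function summarise_big() and the object names are hypothetical. It creates a large temporary vector, removes it as soon as the summary has been computed, and leaves an object with the same name in the workspace untouched.

```r
big <- "workspace copy"          # an object called big in the workspace

# Hypothetical function that works on a large temporary object
summarise_big <- function(n) {
  big <- rnorm(n)                # large temporary vector, local to the function
  result <- mean(big)
  rm(big)                        # removes only the local big
  # ... further work could follow here without big being kept around
  result
}

m <- summarise_big(1e5)
big                              # still "workspace copy"
```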