How to Use Internal Functions in R - dummies

By Andrie de Vries, Joris Meys

At times, it can be very helpful to use internal functions in R, that is, functions defined inside another function. Writing your functions in such a way that they need objects in the global environment doesn’t really make sense, because you use functions to avoid dependency on objects in the global environment in the first place.

In fact, the whole concept behind R strongly opposes sharing global variables among different functions. As a functional programming language, one of the main ideas of R is that the outcome of a function should depend on nothing but the values of the arguments of that function. If you give the arguments the same values, you should always get the same result.
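
As a rough illustration of that idea, the following sketch contrasts a function that reads an object from the global environment with one that receives everything through its arguments. The names base.light, correct.global(), and correct.arg() are made up for this example and are not part of the original text.

```r
# Hypothetical global object, invented for this illustration
base.light <- 2

# Relies on the global object: the result changes whenever base.light changes
correct.global <- function(x) x - base.light

# Relies only on its arguments: same inputs, same result
correct.arg <- function(x, base.light) x - base.light

correct.global(10)   # 8
base.light <- 5
correct.global(10)   # 5, although the argument is unchanged
correct.arg(10, 2)   # 8, regardless of the global base.light
```

The second version is easier to reason about because nothing outside the call can change its outcome.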

If you come from other programming languages like Java, this characteristic may strike you as odd, but it has its merits. Sometimes you need to repeat some calculations a few times within a function, but these calculations only make sense inside that function.

Suppose you want to compare the light production of some lamps at half power and full power. The towels you put in front of the window to block the sun out aren’t really up to the job, so you also measure how much light is still coming through. You want to subtract the mean of this value from the results in order to correct your measurements.

To calculate the efficiency at 50 percent power, you can use the following function:

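calculate.eff <- function(x, y, control){
 # Internal helper: subtract the mean background light from the measurements
 min.base <- function(z) z - mean(control)
 # Ratio of the corrected measurements
 min.base(x) / min.base(y)
}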

Inside the calculate.eff() function, you see another function definition for a min.base() function. Exactly as in the case of other objects, this function is created in the local environment of calculate.eff() and destroyed again when the function is done. You won’t find min.base() in the workspace afterward.

You can use the function as follows:

> half <- c(2.23, 3.23, 1.48)
> full <- c(4.85, 4.95, 4.12)
> nothing <- c(0.14, 0.18, 0.56, 0.23)
> calculate.eff(half, full, nothing)
[1] 0.4270093 0.6318887 0.3129473
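
If you want to confirm that min.base() does not linger in the workspace after the call, a quick check along the following lines should do. This is a minimal sketch that assumes a fresh R session in which no other object named min.base exists. The name eff is chosen only for this illustration.

```r
calculate.eff <- function(x, y, control) {
  min.base <- function(z) z - mean(control)
  min.base(x) / min.base(y)
}

half <- c(2.23, 3.23, 1.48)
full <- c(4.85, 4.95, 4.12)
nothing <- c(0.14, 0.18, 0.56, 0.23)
eff <- calculate.eff(half, full, nothing)

# The helper lived only in the local environment of the call
exists("min.base")        # FALSE in a fresh session
exists("calculate.eff")   # TRUE
```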

If you look a bit more closely at the function definition of min.base(), you notice that it uses an object control but doesn’t have an argument with that name. How does this work then? When you call the function, the following happens:

  1. The function calculate.eff() creates a new local environment that contains the objects x (with the value of half), y (with the value of full), control (with the value of nothing), as well as the function min.base().

  2. The call min.base(x) creates a new local environment, nested within that of calculate.eff(), containing only one object z with the value of x.

  3. Because its own environment contains no object control, min.base() looks for control in the environment of calculate.eff() and subtracts the mean of this vector from every number in z. This value is then returned.

  4. The same thing happens again, but this time z gets the value of y.

  5. The first result is divided by the second, and the outcome is passed on to the global environment again.
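
The steps above can be made visible with a small variation on the function. The version below, called calculate.eff.traced() purely for this illustration, returns the names of the objects in its local environment and checks that min.base() is enclosed by that same environment. Treat it as a sketch for exploring the mechanism rather than something you would use for real calculations.

```r
# Hypothetical traced variant of calculate.eff(), for illustration only
calculate.eff.traced <- function(x, y, control) {
  min.base <- function(z) z - mean(control)

  # Step 1: the local environment of this call holds x, y, control and min.base
  local.objects <- sort(ls())

  # Steps 2 and 3: min.base() is enclosed by this call's local environment,
  # which is where it finds control
  enclosed.here <- identical(environment(min.base), environment())

  # Steps 4 and 5: correct both vectors and divide
  list(objects = local.objects,
       enclosed.here = enclosed.here,
       result = min.base(x) / min.base(y))
}

out <- calculate.eff.traced(c(2.23, 3.23, 1.48),
                            c(4.85, 4.95, 4.12),
                            c(0.14, 0.18, 0.56, 0.23))
out$objects         # "control" "min.base" "x" "y"
out$enclosed.here   # TRUE
```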

The local environment of a function call is embedded in the environment where the function is defined, not in the one where it’s called. Suppose you use addPercent() inside calculate.eff() to format the numbers. The local environment created by addPercent() is then embedded not in the one of calculate.eff() but in the global environment, where addPercent() is defined.
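
A small experiment can make this rule concrete. In the sketch below, subtract.control() and wrapper() are hypothetical functions invented for this illustration. The first is defined at the top level and looks up an object control that is not one of its arguments. The second calls it after creating its own local control.

```r
# Top-level object with the same name as the local one below
control <- 100

# Defined at the top level, so it looks up control there
subtract.control <- function(z) z - mean(control)

wrapper <- function(x) {
  control <- 0          # local to wrapper(), invisible to subtract.control()
  subtract.control(x)
}

wrapper(5)   # -95, not 5
```

The result shows that subtract.control() uses the control from the environment where it was defined, even though it is called from inside wrapper().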