How to Add Calculated Fields to Data in R

By Andrie de Vries, Joris Meys

After you’ve created the appropriate subset of your data, the next step in your analysis is likely to be to perform some calculations with R.

How to do arithmetic on columns of a data frame

R makes it very easy to perform calculations on columns of a data frame because each column is itself a vector. Sticking to the iris data frame, try a few calculations on the columns. For example, calculate the ratio between the length and width of the sepals:

> x <- iris$Sepal.Length / iris$Sepal.Width

Now you can use all the R tools to examine your result. For example, inspect the first six elements of your result with the head() function, which returns six elements by default:

> head(x)
[1] 1.457143 1.633333 1.468750 1.483871 1.388889 1.384615

As you can see, performing calculations on columns of a data frame is straightforward. Just keep in mind that each column is really a vector, so you simply have to remember how to perform operations on vectors.
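
Because the columns are vectors, the same element-wise arithmetic carries over to other columns and to the usual vector functions. The following is a minimal sketch of that idea; the names sepal_area and petal_ratio are made up for illustration and aren’t part of the iris data.

```r
# Each column of iris is a numeric vector with one value per row
is.numeric(iris$Sepal.Length)            # TRUE
length(iris$Sepal.Length) == nrow(iris)  # TRUE

# Arithmetic works element by element, so each result has one value per row
sepal_area  <- iris$Sepal.Length * iris$Sepal.Width   # illustrative name
petal_ratio <- iris$Petal.Length / iris$Petal.Width   # illustrative name

# Any function that accepts a numeric vector works on the result
summary(sepal_area)
round(head(petal_ratio), 2)
```

Nothing here changes iris itself; the results live in separate vectors until you assign them back to the data frame.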

How to use with and within to improve code readability

After a short while of writing subset statements in R, you’ll get tired of typing the dollar sign to extract columns of a data frame. Fortunately, there is a way to reduce the amount of typing and to make your code much more readable at the same time. The trick is to use the with() function. Try this:

> y <- with(iris, Sepal.Length / Sepal.Width)

The with() function allows you to refer to columns inside a data frame without explicitly using the dollar sign or even the name of the data frame itself. In this example, because you use with(iris, …), R evaluates both Sepal.Length and Sepal.Width in the context of iris.

Hopefully, you agree that this is much easier to read and understand. By printing the values of your new variable y, you can confirm that it’s identical to x in the previous example.

> head(y)
[1] 1.457143 1.633333 1.468750 1.483871 1.388889 1.384615

You also can use the identical() function to get R to tell you whether these values are, in fact, the same:

> identical(x, y)
[1] TRUE
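
The expression you pass to with() doesn’t have to be a single calculation. As a rough sketch, assuming you want a summary of the ratio rather than the ratio itself, you can wrap the calculation in a function call or put several statements inside braces. The names mean_ratio, ratio_range, and r are chosen here for illustration only.

```r
# with() returns the value of whatever expression you give it
mean_ratio <- with(iris, mean(Sepal.Length / Sepal.Width))

# Braces let you run several statements; the value of the last one is returned
ratio_range <- with(iris, {
  r <- Sepal.Length / Sepal.Width   # r is a temporary variable inside with()
  range(r)
})

mean_ratio
ratio_range
```

Assignments made inside with() stay inside that call, so r doesn’t appear in your workspace afterward and iris is left unchanged.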

In addition to with(), the helpful within() function allows you to assign values to columns in your data very easily. Say you want to add your calculated ratio of sepal length to width to the original data frame. You’re already familiar with writing it like this:

> iris$ratio <- iris$Sepal.Length / iris$Sepal.Width

Now, using within() it turns into the following:

> iris <- within(iris, ratio <- Sepal.Length / Sepal.Width)

This works much like with(), except that you can use the assignment operator (<-) inside the expression. Note that within() returns a modified copy of the data frame, which is why the result is assigned back to iris. If you now inspect the new column, you’ll see that ratio is part of iris:

> head(iris$ratio)
[1] 1.457143 1.633333 1.468750 1.483871 1.388889 1.384615
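
If you need more than one calculated column, within() accepts several assignments inside braces. Here is a small sketch under that assumption; iris2, sepal_ratio, and petal_ratio are hypothetical names used only for this example.

```r
# Work on a copy so that iris itself is left alone
iris2 <- iris   # hypothetical name

# Braces let you create several columns in a single within() call
iris2 <- within(iris2, {
  sepal_ratio <- Sepal.Length / Sepal.Width
  petal_ratio <- Petal.Length / Petal.Width
})

# within() returns a modified copy, so the result has to be assigned
c("sepal_ratio", "petal_ratio") %in% names(iris2)
head(iris2$sepal_ratio)
```

If you call within() without assigning its result, the new columns are calculated and then discarded, so the assignment on the left-hand side is what actually keeps them.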