You can extend a data frame in R with new variables. You may, for example, get data from another player on Granny’s team. Or you may want to calculate a new variable from the other variables in the dataset, like the total number of baskets made in each game.

Adding a single variable

There are three main ways of adding a variable. As with adding observations, you can use a binding function (here cbind() rather than rbind()) or the indices.

The third way is the dollar sign. Imagine that Granny asked you to add the number of baskets of her friend Gabrielle to the data frame. First, you would create a vector with that data like this:

> baskets.of.Gabrielle <- c(11, 5, 6, 7, 3, 12, 4, 5, 9)

To create an extra variable named Gabrielle with that data, you simply do the following:

> baskets.df$Gabrielle <- baskets.of.Gabrielle

If you want to check whether this worked, but you don’t want to display the complete data frame, you can use the head() function. This function takes two arguments: the object you want to display and, optionally, the number of rows you want to see (six by default). To see the first four rows of the new data frame, baskets.df, use the following code:

> head(baskets.df, 4)
    Granny Geraldine Gabrielle
1st     12         5        11
2nd      4         4         5
3rd      5         2         6
4th      6         4         7
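
The other two approaches mentioned at the start of this section, the indices and cbind(), can be sketched as follows. This is a minimal illustration, not the book's own code: baskets.df here is a cut-down stand-in holding only the four games shown in the output above, and the names by.dollar, by.index, and by.cbind are made up for the comparison.

```r
# Stand-in for baskets.df: only the first four games (an assumption for this sketch)
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)
baskets.of.Gabrielle <- c(11, 5, 6, 7)

# 1. Dollar sign, as in the text
by.dollar <- baskets.df
by.dollar$Gabrielle <- baskets.of.Gabrielle

# 2. Indices: assign to a column name that does not exist yet
by.index <- baskets.df
by.index[["Gabrielle"]] <- baskets.of.Gabrielle

# 3. cbind(): returns a new data frame, so the result has to be assigned
by.cbind <- cbind(baskets.df, Gabrielle = baskets.of.Gabrielle)

# Compare the three results
all.equal(by.dollar, by.index)
all.equal(by.dollar, by.cbind)
```

The dollar sign and the indices modify the data frame in place, whereas cbind() leaves the original untouched unless you assign the result back to it.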

Adding multiple variables using cbind

To add several variables at once, you can pretend your data frame is a matrix and use the cbind() function. Unlike when you use rbind() on data frames, you don’t even need to worry about the row or column names. Let’s create a new data frame with the baskets for Gertrude and Guinevere:

> new.df <- data.frame(
+  Gertrude = c(3, 5, 2, 1, NA, 3, 1, 1, 4),
+  Guinevere = c(6, 9, 7, 3, 3, 6, 2, 10, 6)
+ )

Although the row names of the data frames new.df and baskets.df differ, R will ignore this and just use the row names of the first data frame in the cbind() function, as you can see from the output of the following code:

> head(cbind(baskets.df, new.df), 4)
    Granny Geraldine Gabrielle Gertrude Guinevere
1st     12         5        11        3         6
2nd      4         4         5        5         9
3rd      5         2         6        2         7
4th      6         4         7        1         3

When using a data frame or a matrix with column names, R will use those as the names of the variables. If you use cbind() to add a vector to a data frame, R will use the vector’s name as a variable name unless you specify one yourself, as you did with rbind().
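
To make the naming rule for vectors concrete, here is a small sketch. It again assumes a four-game stand-in for baskets.df rather than the full data set; unnamed and named are illustrative variable names.

```r
# Stand-in for baskets.df: only the first four games (an assumption for this sketch)
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)
baskets.of.Gabrielle <- c(11, 5, 6, 7)

# Vector passed as is: the new column takes the vector's own name
unnamed <- cbind(baskets.df, baskets.of.Gabrielle)
names(unnamed)

# Vector passed as a named argument: the new column takes the name you supply
named <- cbind(baskets.df, Gabrielle = baskets.of.Gabrielle)
names(named)
```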

If you bind a matrix without column names to the data frame, R automatically uses the column numbers as names. That causes a bit of trouble, though, because plain numbers are invalid object names and, hence, more difficult to use as variable names. In this case, you’re better off referring to those columns by their indices.
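
The following sketch shows what that looks like in practice and a few ways to cope with it. The matrix extra and the data frame combined are hypothetical names, and baskets.df is again a four-game stand-in for the real data.

```r
# Stand-in for baskets.df: only the first four games (an assumption for this sketch)
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)

# A matrix without column names (first four values for Gertrude and Guinevere)
extra <- matrix(c(3, 5, 2, 1, 6, 9, 7, 3), ncol = 2)

combined <- cbind(baskets.df, extra)
names(combined)   # inspect the names R gave to the matrix columns

# Option 1: refer to the new columns by index
combined[, 3]

# Option 2: quote the non-syntactic name with backticks
combined$`1`

# Option 3: give the columns proper names right away
names(combined)[3:4] <- c("Gertrude", "Guinevere")
```

Renaming the columns immediately is usually the least error-prone choice, because later code can then use the dollar sign as usual.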

Whenever you want to use a data frame and don’t want to continuously have to type its name followed by $, you can use the functions with() and within(). With the within() function, you also can easily add variables to a data frame.
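
As a rough sketch of both functions, the code below computes the total number of baskets per game. It assumes a four-game stand-in for baskets.df, and the variable name Total is made up for the example.

```r
# Stand-in for baskets.df: only the first four games (an assumption for this sketch)
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)

# with(): evaluate an expression using the columns as if they were variables
totals <- with(baskets.df, Granny + Geraldine)

# within(): same idea, but assignments inside become new columns
# in the returned copy of the data frame
baskets.df <- within(baskets.df, Total <- Granny + Geraldine)
head(baskets.df, 4)
```

Note that within() returns a modified copy, so you need to assign the result if you want to keep the new variable.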

About This Article

About the book authors:

Andrie de Vries is a leading R expert and Business Services Director for Revolution Analytics. With over 20 years of experience, he provides consulting and training services in the use of R. Joris Meys is a statistician, R programmer and R lecturer with the faculty of Bio-Engineering at the University of Ghent.
