How to Subset Data Frames in R

Now that you’ve reviewed the rules for creating subsets, you can apply them to data frames in R. Just remember that a data frame is a two-dimensional object with rows as well as columns, so you need to specify the subset for rows and for columns independently. To do so, you put a row index and a column index inside the same pair of square brackets.

To illustrate subsetting of data frames, have a look at the built-in dataset iris, a data frame of five columns and 150 rows with data about iris flowers.

> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

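When you subset objects with more than one dimension, you specify a subset argument for each dimension and separate the arguments with commas. For a data frame, the row argument comes first and the column argument second; leaving an argument empty selects everything along that dimension.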

For example, to get the first five rows of iris and all the columns, try the following:

> iris[1:5, ]

To get all the rows but only two of the columns, try the following:

> iris[, c("Sepal.Length", "Sepal.Width")]
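
As a quick check of the two calls above, the following sketch uses dim() to confirm the shape of each result. The variable names first_rows and sepal_cols are chosen here purely for illustration.

```r
# Row subset: first five rows, every column
first_rows <- iris[1:5, ]
dim(first_rows)   # 5 rows, 5 columns

# Column subset: every row, two named columns
sepal_cols <- iris[, c("Sepal.Length", "Sepal.Width")]
dim(sepal_cols)   # 150 rows, 2 columns
```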

You need to take special care when subsetting a single column of a data frame. Try the following:

> iris[, 'Sepal.Length']

You’ll see that the result is a vector, not the data frame you might expect.

When your subset operation returns a single column, the default behavior is to return a simplified version. R drops the data frame structure and hands back the column itself, so a numeric column comes back as a numeric vector and a factor column, such as Species, comes back as a factor.

In the example, R simplifies the result to a vector. To override this behavior, you need to specify the argument drop=FALSE in your subset operation:

> iris[, 'Sepal.Length', drop=FALSE]
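
To see the difference drop=FALSE makes, the sketch below stores both results and inspects them. The names as_vector and as_frame are illustrative only.

```r
# Default behavior: the single column is simplified to a vector
as_vector <- iris[, "Sepal.Length"]

# drop = FALSE keeps the data frame structure
as_frame <- iris[, "Sepal.Length", drop = FALSE]

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
class(as_vector)          # "numeric"
dim(as_frame)             # 150 rows, 1 column
```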

Alternatively, you can subset the data frame like a list, using a single index without a comma. The following code also returns a data frame with only one column:

> iris['Sepal.Length']
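
For comparison, here is a minimal sketch that sets list-style single brackets next to double brackets and the $ operator, which extract the column itself. The double-bracket and $ forms are not covered in the text above and are shown only to contrast with the single-bracket result; the variable names are illustrative.

```r
# Single bracket, list-style: result is still a data frame
one_col_frame <- iris["Sepal.Length"]

# Double bracket: result is the column itself (a numeric vector)
one_col_vec <- iris[["Sepal.Length"]]

is.data.frame(one_col_frame)               # TRUE
is.numeric(one_col_vec)                    # TRUE
identical(one_col_vec, iris$Sepal.Length)  # TRUE
```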

Finally, to get a subset of only some columns and some rows:

> iris[1:5, c("Sepal.Length", "Sepal.Width")]
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
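
The row argument does not have to be a range of positions. Assuming the usual rules for logical subsets apply here as well, the following sketch selects rows by a condition and columns by name in a single call. The threshold of 7 and the name long_sepals are arbitrary choices for illustration.

```r
# Rows where sepal length exceeds 7, restricted to two named columns
long_sepals <- iris[iris$Sepal.Length > 7, c("Sepal.Length", "Species")]

all(long_sepals$Sepal.Length > 7)  # TRUE
names(long_sepals)                 # "Sepal.Length" "Species"
```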