How to Subset Data Frames in R

Now that you’ve reviewed the rules for creating subsets, you can try them on some data frames in R. Remember that a data frame is a two-dimensional object that contains rows as well as columns. This means that you need to specify the subset for rows and columns independently. To do so, you combine a row index and a column index inside a single pair of square brackets.

To illustrate subsetting of data frames, have a look at the built-in dataset iris, a data frame of five columns and 150 rows with data about iris flowers.
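Before subsetting, it can help to confirm the shape of the data. This short sketch checks the row and column counts of iris and lists the levels of its Species factor. Only base R functions are used; the variable name dims is illustrative.

```r
# Confirm the shape of the built-in iris data frame before subsetting it
dims <- dim(iris)             # rows first, then columns
print(dims)                   # 150 5
print(nrow(iris))             # 150
print(ncol(iris))             # 5
print(levels(iris$Species))   # "setosa" "versicolor" "virginica"
```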

> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

When you subset objects with more than one dimension, you specify a subset argument for each dimension and separate the arguments with commas.
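Because each dimension gets its own argument, the row index and the column index do not have to be of the same kind. In this sketch the rows are picked with a logical vector and the columns with a character vector of names. The names setosa_rows, cols and setosa_subset are illustrative only.

```r
# Rows are chosen with a logical vector, columns with a character vector;
# the comma inside the brackets separates the two dimensions.
setosa_rows <- iris$Species == "setosa"   # TRUE/FALSE for each of the 150 rows
cols <- c("Sepal.Length", "Species")      # columns picked by name
setosa_subset <- iris[setosa_rows, cols]
print(dim(setosa_subset))                 # 50 2
```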

For example, to get the first five rows of iris and all the columns, try the following:

> iris[1:5, ]
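Leaving the column argument empty after the comma selects every column. The sketch below checks that the result keeps all five columns and compares it with head(), which also returns the first rows of a data frame. The name first_five is illustrative.

```r
# An empty index after the comma means "all columns"
first_five <- iris[1:5, ]
print(dim(first_five))    # 5 5
# head() returns the first n rows of a data frame as well
print(isTRUE(all.equal(first_five, head(iris, 5))))
```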

To get all the rows but only two of the columns, try the following:

> iris[, c("Sepal.Length", "Sepal.Width")]
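Columns can be selected by position as well as by name. Sepal.Length and Sepal.Width are the first two columns of iris, so the two calls below should return the same data frame. The names by_name and by_position are illustrative.

```r
# Select the same two columns by name and by position
by_name <- iris[, c("Sepal.Length", "Sepal.Width")]
by_position <- iris[, 1:2]                 # columns 1 and 2 of iris
print(identical(by_name, by_position))     # TRUE
print(dim(by_name))                        # 150 2
```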

You need to take special care when subsetting a single column of a data frame. Try the following:

> iris[, 'Sepal.Length']

You’ll see that the result is a vector, not a data frame as you would expect.
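You can confirm the simplification by inspecting the result. This sketch stores the single column in an illustrative variable, sepal_length, and checks its class and length.

```r
# With the default behavior, one selected column comes back as a plain vector
sepal_length <- iris[, 'Sepal.Length']
print(class(sepal_length))           # "numeric"
print(is.data.frame(sepal_length))   # FALSE
print(length(sepal_length))          # 150
```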

When your subset operation returns a single column, R by default returns a simplified version. The square-bracket operator has a drop argument, and for a data frame it defaults to TRUE whenever the result has only one column, so R discards the data frame structure and returns that column as a vector. The same kind of simplification happens when you subset matrices and arrays: dimensions of length 1 are dropped.
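The following sketch illustrates the drop behavior on a small matrix and then shows that writing drop = TRUE explicitly on iris gives the same result as the default. The matrix m is a made-up example, not part of the original text.

```r
# A hypothetical 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2)
print(m[, 1])                # 1 2: a vector, not a 2 x 1 matrix
print(is.matrix(m[, 1]))     # FALSE

# For a data frame, drop = TRUE written out matches the default result
print(identical(iris[, 'Sepal.Length', drop = TRUE],
                iris[, 'Sepal.Length']))   # TRUE
```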

In the example, R simplifies the result to a vector. To override this behavior, you need to specify the argument drop=FALSE in your subset operation:

> iris[, 'Sepal.Length', drop=FALSE]
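With drop = FALSE, the result stays a data frame with one column. This sketch checks that, using the illustrative name one_col.

```r
# drop = FALSE keeps the data frame structure even for a single column
one_col <- iris[, 'Sepal.Length', drop = FALSE]
print(is.data.frame(one_col))   # TRUE
print(dim(one_col))             # 150 1
print(names(one_col))           # "Sepal.Length"
```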

Alternatively, you can subset the data frame like a list. The following code also returns a data frame with only one column:

> iris['Sepal.Length']
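Because a data frame is a list of columns, the list operators apply to it too. In this sketch, single brackets with one argument return a one-column data frame, while double brackets and the $ operator extract the column itself as a vector. The name list_style is illustrative.

```r
# Single brackets with one argument select list elements (columns)
list_style <- iris['Sepal.Length']
print(is.data.frame(list_style))                  # TRUE
# Double brackets and $ extract the column itself
print(is.data.frame(iris[['Sepal.Length']]))      # FALSE
print(identical(iris[['Sepal.Length']], iris$Sepal.Length))   # TRUE
```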

Finally, to get a subset of only some columns and some rows:

> iris[1:5, c("Sepal.Length", "Sepal.Width")]
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
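The column index can also be negative, which removes the listed columns instead of naming the ones to keep. In iris, dropping columns 3 to 5 leaves Sepal.Length and Sepal.Width, so both calls in this sketch should give the same five-row data frame. The names kept and dropped are illustrative.

```r
# Keep two columns by name, or drop the other three by negative position
kept <- iris[1:5, c("Sepal.Length", "Sepal.Width")]
dropped <- iris[1:5, -(3:5)]      # removes Petal.Length, Petal.Width, Species
print(identical(kept, dropped))   # TRUE
```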