By Andrie de Vries, Joris Meys

Vectors, lists, and data frames are the main structures for representing data in R, so it pays to be able to specify a subset of your data succinctly and correctly.

There are three main operators that you can use to subset your data:

  • $: Extracts a single element by name from a list or data frame. For example, iris$Sepal.Length extracts the column Sepal.Length from the data frame iris.

  • [[: Extracts a single element by name or position from a list or data frame. For example, iris[["Sepal.Length"]] extracts the column Sepal.Length from the data frame iris; iris[[2]] extracts the second element from iris.

  • [: Extracts multiple elements from a vector, array, list, or data frame. For example, iris[, c("Sepal.Length", "Species")] extracts the columns Sepal.Length and Species from iris; iris[1:10, ] extracts the first ten rows from iris; and iris[1:10, "Species"] extracts the first ten elements of the column Species from iris.
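
The three operators can be tried side by side on iris, which ships with R. This is a minimal sketch; the variable names are illustrative only, and note that column names passed to [[ and [ are quoted strings, whereas $ takes the bare name.

```r
# iris is a built-in data frame, so no setup is needed.

# $ and [[ both pull out one column as a plain vector.
sepal_by_dollar <- iris$Sepal.Length
sepal_by_name   <- iris[["Sepal.Length"]]

# [[ also accepts a position: the second column of iris is Sepal.Width.
second_column <- iris[[2]]

# [ can return several columns or rows at once.
two_columns       <- iris[, c("Sepal.Length", "Species")]  # data frame, 2 columns
first_ten_rows    <- iris[1:10, ]                          # data frame, 10 rows
first_ten_species <- iris[1:10, "Species"]                 # factor of length 10

# Both single-element operators give the same result here.
same_result <- identical(sepal_by_dollar, sepal_by_name)
```

When [ selects a single column from a data frame, as in the last extraction, R simplifies the result to a vector by default rather than returning a one-column data frame.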

The bracket subset operator, [, allows you to return multiple elements. You specify the subset in one of five ways:

  • Blank: Returns everything. For example, iris[] returns all of iris.

  • Positive numeral: Includes only these elements. For example, iris[1:100, 5] extracts the first hundred elements of the fifth column of iris.

  • Negative numeral: Excludes these elements. For example, iris[-(1:100), ] excludes the first hundred rows from iris, while iris[, -5] excludes the fifth column from iris.

  • Logical: Includes if TRUE; excludes if FALSE. For example, iris[iris$Species == "setosa", ] extracts only those rows from iris where the Species value is setosa.

  • Name: Includes all names that match. For example, iris[, c("Species", "Petal.Width")] extracts the columns Species and Petal.Width from iris.
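
The five ways of specifying a subset can be sketched as follows, again using the built-in iris data frame. The variable names are illustrative assumptions, not part of the original text.

```r
# Blank: returns everything.
everything <- iris[]

# Positive numbers: keep only these elements (rows 1 to 100 of column 5).
first_hundred_species <- iris[1:100, 5]

# Negative numbers: drop these elements.
without_first_hundred <- iris[-(1:100), ]  # rows 101 to 150 remain
without_fifth_column  <- iris[, -5]        # Species is dropped

# Logical: keep rows where the condition is TRUE.
setosa_rows <- iris[iris$Species == "setosa", ]

# Name: keep the named columns, in the order given.
by_name <- iris[, c("Species", "Petal.Width")]
```

The parentheses in -(1:100) matter: without them, -1:100 is read as the sequence from -1 to 100, which mixes negative and positive indices and produces an error.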