How to Create Subsets of Your Data in R - dummies

By Andrie de Vries, Joris Meys

Often the first task in data processing is to create subsets of your data in R for further analysis. You’re already familiar with the three subset operators:

  • $: The dollar-sign operator selects a single element of your data (and drops the dimensions of the returned object). When you use this operator with a data frame, the result is always a vector; when you use it with a named list, you get that element.

  • [[: The double-square-brackets operator also returns a single element, but it offers you the flexibility of referring to the elements by position as well as by name. You use it for data frames and lists.

  • [: The single-square-brackets operator can return multiple elements of your data.

This summary is simplified.
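
As a rough illustration of how the three operators differ, here is a minimal sketch. The objects df and lst are hypothetical, invented only for this example, and are not part of the original text.

```r
# Hypothetical example objects, invented for illustration
df  <- data.frame(id = 1:3, score = c(90, 85, 70))
lst <- list(a = 1:3, b = "text")

# $ selects a single element by name
col_dollar  <- df$score    # the score column as a numeric vector
elem_dollar <- lst$b       # the element stored under the name "b"

# [[ selects a single element by name or by position
col_name <- df[["score"]]
col_pos  <- df[[2]]        # same column, referred to by position
elem_pos <- lst[[1]]       # first element of the list

# [ can select several elements and keeps the container type
sub_df  <- df["score"]     # still a data frame, with one column
sub_lst <- lst["a"]        # still a list, with one element
```

Note that df["score"] stays a data frame while df[["score"]] and df$score return the column itself.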

When you use the single-square-brackets operator, you return multiple elements of your data. This means that you need a way of specifying exactly which elements you need.

In the examples that follow, you can try subsetting with the built-in dataset islands, a named numeric vector with 48 elements.

> str(islands)
 Named num [1:48] 11506 5500 16988 2968 16 ...
 - attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
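
If you want to confirm the same facts piece by piece, a short sketch like the following may help. The variable names are chosen here for illustration only.

```r
# islands ships with base R in the datasets package
n_islands   <- length(islands)           # 48, matching the str() output
first_names <- head(names(islands), 4)   # "Africa" "Antarctica" "Asia" "Australia"
is_named    <- !is.null(names(islands))  # TRUE, so elements can be looked up by name
```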

| Subset | Effect | Example |
|---|---|---|
| Blank | Returns all your data | islands[] |
| Positive numerical values | Extracts the elements at these locations | islands[c(8, 1, 1, 42)] |
| Negative numerical values | Extracts all but these elements; in other words, excludes them | |
| Logical values | A logical value of TRUE includes the element; FALSE excludes it | islands[islands < 20] |
| Text strings | Includes elements whose names match | islands[c("Madagascar", "Cuba")] |
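
The five kinds of subscript can be tried side by side, as in the sketch below. The negative index -c(1, 2) is an assumption chosen here for illustration, and the variable names are likewise invented for this example.

```r
all_islands <- islands[]                         # blank: returns everything
by_position <- islands[c(8, 1, 1, 42)]           # positions may repeat and appear in any order
without_two <- islands[-c(1, 2)]                 # negative: drops the first two elements
small       <- islands[islands < 20]             # logical: keeps elements where the test is TRUE
by_name     <- islands[c("Madagascar", "Cuba")]  # text: looks elements up by name

length(by_position)   # 4, because position 1 was requested twice
length(without_two)   # 46
names(by_name)        # "Madagascar" "Cuba"
```

The names must be typed with straight quotes; curly quotes copied from formatted text cause a syntax error in R.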