How to Extract a Subset of a Vector in R - dummies

By Andrie de Vries, Joris Meys

You use the same indexing rules for character vectors that you use for numeric vectors (or for vectors of any type). The process of referring to a subset of a vector through indexing its elements is also called subsetting. In other words, subsetting is the process of extracting a subset of a vector.

To illustrate how to work with vectors, and specifically how to create subsets, use the built-in datasets letters and LETTERS. Both are character vectors consisting of the letters of the alphabet, in lowercase (letters) and uppercase (LETTERS). Try it:

> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k"
[12] "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v"
[23] "w" "x" "y" "z"
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
[12] "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V"
[23] "W" "X" "Y" "Z"

Besides being useful for illustrating subsets, these built-in vectors come in handy whenever you need to make lists of things.

Let’s return to the topic of creating subsets. To extract a specific element from a vector, use square brackets. To get the tenth element of letters, for example, use the following:

> letters[10]
[1] "j"

To get the last three elements of LETTERS, use the following:

> LETTERS[24:26]
[1] "X" "Y" "Z"

The colon operator (:) in R is a handy way of creating sequences, so 24:26 results in 24, 25, 26. When this sequence appears inside the square brackets, R returns elements 24 through 26.
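
One way to convince yourself of this is to store the sequence in a variable before using it as an index. The following is only a small sketch; the variable names idx, last_three, and first_five are made up for illustration:

```r
# Build the index sequence with the colon operator
idx <- 24:26
print(idx)            # 24 25 26

# Indexing with the stored sequence gives the same result as LETTERS[24:26]
last_three <- LETTERS[idx]
print(last_three)     # "X" "Y" "Z"

# The same idea works for any run of consecutive positions
first_five <- letters[1:5]
print(first_five)     # "a" "b" "c" "d" "e"
```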

In our last example, it was easy to extract the last three letters of LETTERS, because you know that the alphabet contains 26 letters. Quite often, you don’t know the length of a vector. You can use the tail() function to display the trailing elements of a vector. To get the last five elements of LETTERS, try the following:

> tail(LETTERS, 5)
[1] "V" "W" "X" "Y" "Z"
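
If you would rather stay with square brackets, you can work out the positions from length() instead. This is a minimal sketch, with x and n as hypothetical variable names. Note the parentheses around the start position: the colon operator is evaluated before subtraction, so n - 4:n would not give the sequence you want.

```r
x <- LETTERS          # pretend the length of x is unknown
n <- length(x)        # 26 for LETTERS

# Positions (n - 4) through n are the last five elements
last_five <- x[(n - 4):n]
print(last_five)      # "V" "W" "X" "Y" "Z"

# tail() gives the same answer with less typing
print(identical(last_five, tail(x, 5)))   # TRUE
```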

Similarly, you can use the head() function to get the first elements of a vector. By default, both head() and tail() return six elements, but you can tell them to return any specific number of elements with the second argument. Try extracting the first ten letters:

> head(letters, 10)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
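
To see the default of six elements for yourself, and to check that head() agrees with plain indexing, you could try something like the following sketch. The variable names first_six and last_six are invented for this example:

```r
# With no second argument, head() and tail() return six elements
first_six <- head(letters)
last_six  <- tail(letters)
print(first_six)   # "a" "b" "c" "d" "e" "f"
print(last_six)    # "u" "v" "w" "x" "y" "z"

# head(letters, 10) matches indexing with the sequence 1:10
print(identical(head(letters, 10), letters[1:10]))   # TRUE
```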