How to Search Text by Pattern in R

By Andrie de Vries, Joris Meys

Like any programming language, R makes it easy to compile lists of sorted and ordered data. To find substrings, you can use the grep() function, which takes two essential arguments:

  • pattern: The pattern you want to find.

  • x: The character vector you want to search.

Suppose you want to find all the states that contain the pattern New. Do it like this:

> grep("New", state.name)
[1] 29 30 31 32

The result of grep() is a numeric vector with the positions of the elements that contain the matching pattern. For example, the 29th element of state.name contains the word New:

> state.name[29]
[1] "New Hampshire"

Phew, that worked! But typing in the position of each matching text is going to be a lot of work. Fortunately, you don't have to subset the original vector yourself. If you add the argument value = TRUE, grep() returns the matching elements themselves instead of their positions. Try this:

> grep("New", state.name, value = TRUE)
[1] "New Hampshire" "New Jersey"
[3] "New Mexico"    "New York"
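
To see how the two approaches relate, here is a minimal sketch that compares subsetting by position with using value = TRUE. The variable names idx, by_index, and by_value are only illustrative and are not part of the original example:

```r
# Positions of the elements that contain "New"
idx <- grep("New", state.name)

# Subset the original vector with those positions
by_index <- state.name[idx]

# Ask grep() to return the matching elements directly
by_value <- grep("New", state.name, value = TRUE)

# Both routes should give the same character vector
identical(by_index, by_value)
```

Keeping the positions can still be handy when you want to use them to index another vector of the same length.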

The grep() function is case sensitive — it only matches text in the same case (uppercase or lowercase) as your search pattern. If you search for the pattern “new” in lowercase, your search results are empty:

> grep("new", state.name, value = TRUE)
character(0)
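
If you want the search to ignore case, one option is the ignore.case argument of grep(). The sketch below assumes that case-insensitive matching is what you are after; the variable name matches is only illustrative:

```r
# Search for lowercase "new", but let grep() ignore case
matches <- grep("new", state.name, ignore.case = TRUE, value = TRUE)

# Show the state names that were found
print(matches)
```

With ignore.case = TRUE, the lowercase pattern finds the same four states as the earlier search for New.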