How to Search Text by Pattern in R

By Andrie de Vries, Joris Meys

Like any programming language, R makes it easy to compile lists of sorted and ordered data. To find substrings, you can use the grep() function, which takes two essential arguments:

  • pattern: The pattern you want to find.

  • x: The character vector you want to search.
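As a quick illustration of those two arguments, here is a minimal sketch. The vector fruits is a made-up example, not something from this article:

```r
# Hypothetical example vector, used only for illustration
fruits <- c("apple", "banana", "pineapple")

# pattern is what to look for; x is the character vector to search
hits <- grep(pattern = "apple", x = fruits)

# hits holds the positions of the elements that contain "apple"
print(hits)
```

Both "apple" and "pineapple" contain the pattern, so grep() reports positions 1 and 3.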

Suppose you want to find all the states that contain the pattern New. Do it like this:

> grep("New", state.name)
[1] 29 30 31 32

The result of grep() is a numeric vector with the positions of the elements that contain the matching pattern. In other words, the 29th element of state.name contains the word New.

> state.name[29]
[1] "New Hampshire"

Phew, that worked! But typing in the position of each matching element is going to be a lot of work. Fortunately, grep() can return the matching elements themselves instead of their positions, which saves you from subsetting the original vector by hand. You can do this by adding the argument value = TRUE. Try this:

> grep("New", state.name, value = TRUE)
[1] "New Hampshire" "New Jersey"
[3] "New Mexico"    "New York"
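If you're curious how the two approaches relate, the sketch below compares them. It assumes the vector being searched is R's built-in state.name; the variable names positions, by_position, and by_value are made up for this example:

```r
# Positions of the matching elements in the built-in state.name vector
positions <- grep("New", state.name)

# Subsetting by those positions...
by_position <- state.name[positions]

# ...should give the same names as asking grep() for the values directly
by_value <- grep("New", state.name, value = TRUE)

# Check whether the two results are the same
same <- identical(by_position, by_value)
print(same)
```

Using value = TRUE is simply the shorter way to write the subsetting step.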

The grep() function is case sensitive: it only matches text in the same case (uppercase or lowercase) as your search pattern. If you search for the pattern "new" in lowercase, your search results are empty:

> grep("new", state.name, value = TRUE)
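One possible way around the case issue is the ignore.case argument of grep(). The sketch below assumes you are searching the built-in state.name vector; the names strict and relaxed are made up for this example:

```r
# Case-sensitive search: lowercase "new" does not match "New"
strict <- grep("new", state.name, value = TRUE)
print(length(strict))

# ignore.case = TRUE tells grep() to match regardless of case
relaxed <- grep("new", state.name, ignore.case = TRUE, value = TRUE)
print(relaxed)
```

The first search comes back empty, while the second finds the same four states as the search for "New".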