Like any programming language, R makes it easy to compile lists of sorted and ordered data. To find substrings, you can use the grep() function, which takes two essential arguments:

  • pattern: The pattern you want to find.

  • x: The character vector you want to search.
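
As a minimal sketch of these two arguments, the call below names both of them explicitly. The vector fruits is a made-up example and is not part of the original text.

```r
# Hypothetical character vector, used only for illustration
fruits <- c("apple", "pineapple", "banana", "grape")

# Name both arguments explicitly: the pattern to find and the vector to search
hits <- grep(pattern = "apple", x = fruits)

# Positions of the elements that contain "apple"
print(hits)
```

Because the arguments are named, their order does not matter here; in everyday use they are usually passed by position.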

Suppose you want to find all the states whose names contain the pattern New, using R's built-in state.name vector. Do it like this:

> grep("New", state.name)
[1] 29 30 31 32

The result of grep() is a numeric vector with the positions of each of the components that contain the matching pattern. In other words, the 29th component of state.name contains the word New.

> state.name[29]
[1] "New Hampshire"

Phew, that worked! But typing in the position of each match by hand is going to be a lot of work. Fortunately, grep() can return the matching elements themselves rather than their positions: just add the argument value = TRUE. Try this:

> grep("New", state.name, value = TRUE)
[1] "New Hampshire" "New Jersey"
[3] "New Mexico"    "New York"
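
To see how value = TRUE relates to the positions returned earlier, the sketch below subsets the vector by hand and compares the two results. It assumes the built-in state.name vector that ships with R; the variable names positions, by_index and by_value are illustrative only.

```r
# Positions of the matching elements (the default return value of grep())
positions <- grep("New", state.name)

# Subset the original vector with those positions
by_index <- state.name[positions]

# Ask grep() to return the matching elements directly
by_value <- grep("New", state.name, value = TRUE)

# Check whether both approaches give the same character vector
print(identical(by_index, by_value))
```

Keeping the positions can still be useful when you need to index a second vector of the same length with the same matches.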

The grep() function is case sensitive: it matches only text in the same case (uppercase or lowercase) as your search pattern. If you search for the pattern “new” in lowercase, the result is empty:

> grep("new", state.name, value = TRUE)
character(0)
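
One way around the case sensitivity is the ignore.case argument of grep(), which the text above does not use. The following is a minimal sketch, again assuming the built-in state.name vector; the variable name matches is illustrative.

```r
# ignore.case = TRUE makes the match case insensitive,
# so the lowercase pattern "new" also matches "New"
matches <- grep("new", state.name, ignore.case = TRUE, value = TRUE)

print(matches)
```

The pattern itself stays unchanged; only the way grep() compares it to the text differs.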

About This Article

About the book authors:

Andrie de Vries is a leading R expert and Business Services Director for Revolution Analytics. With over 20 years of experience, he provides consulting and training services in the use of R. Joris Meys is a statistician, R programmer and R lecturer with the faculty of Bio-Engineering at the University of Ghent.
