How to Search for Multiple Words in R

When working with text in R, you may need to find words or patterns inside text. Imagine you have a list of the states in the United States, and you want to find out which state names consist of two words.

To find substrings, you can use the grep() function, which takes two essential arguments:

  • pattern: The pattern you want to find.

  • x: The character vector you want to search.
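As a minimal sketch of these two arguments, the snippet below searches a small made-up vector (the name fruits and its contents are illustrative assumptions, not part of the example that follows). By default, grep() returns the positions of the matching elements rather than the elements themselves.

```r
# Hypothetical example vector, used only for illustration
fruits <- c("apple pie", "banana", "cherry tart")

# pattern = " " (a single space), x = fruits
idx <- grep(pattern = " ", x = fruits)
idx           # positions of the elements that contain a space: 1 and 3

# Use the positions to pull out the matching elements
fruits[idx]   # "apple pie" "cherry tart"
```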

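So, how do you find the names of all the states with more than one word? Reframe the question: every multi-word name contains a space, so you can search for a space. Because grep() returns the positions of the matching elements rather than the elements themselves, you use those positions to index into state.name: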

> state.name[grep(" ", state.name)]
 [1] "New Hampshire"  "New Jersey"
 [3] "New Mexico"     "New York"
 [5] "North Carolina" "North Dakota"
 [7] "Rhode Island"   "South Carolina"
 [9] "South Dakota"   "West Virginia"

The result contains all ten states with two-word names, including New Jersey, New York, North Carolina, South Dakota, and West Virginia.
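The same search can be written in a couple of other ways. The sketch below uses two standard base R options: the value argument of grep(), which returns the matching elements directly, and grepl(), which returns a logical vector. The variable names are illustrative assumptions.

```r
# value = TRUE returns the matching elements instead of their positions
two_word <- grep(" ", state.name, value = TRUE)

# grepl() returns TRUE or FALSE for every element of x
has_space <- grepl(" ", state.name)

# Logical indexing gives the same names as the position-based version
same <- identical(state.name[has_space], two_word)
same             # TRUE

# Summing a logical vector counts the matches
sum(has_space)   # 10
```

Which form to use is largely a matter of taste: positions are handy when you need to index several related vectors, while value = TRUE is shorter when you only want the matching strings.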

You can see from this list that no state name contains East. You can confirm this by running another search:

> state.name[grep("East", state.name)]
character(0)

When the result of a character operation is an empty vector (that is, there is nothing in it), R represents it as character(0). Similarly, an empty, or zero-length, numeric vector is represented as integer(0) or numeric(0), depending on whether it holds integers or doubles.
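To see where the empty result comes from, it can help to split the search into steps. In this sketch (the variable name hits is an illustrative assumption), grep() finds nothing and returns a zero-length integer vector, and indexing with that vector yields a zero-length character vector.

```r
# No state name contains "East", so grep() returns no positions
hits <- grep("East", state.name)
hits               # integer(0)
length(hits)       # 0

# Indexing with a zero-length vector selects nothing
state.name[hits]   # character(0)

# One way to test whether anything matched at all
any(grepl("East", state.name))   # FALSE
```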

R makes a distinction between NULL and an empty vector. NULL usually means something is undefined. This is subtly different from something that is empty. For example, a character vector that happens to have no elements is still a character vector, represented by character(0).
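The difference can be checked directly. In this short sketch (the variable name empty is an illustrative assumption), both NULL and an empty character vector have length zero, so length() alone cannot tell them apart; is.null() and is.character() can.

```r
empty <- character(0)

is.character(empty)   # TRUE: still a character vector
is.null(empty)        # FALSE: empty, but not NULL
length(empty)         # 0

is.null(NULL)         # TRUE
is.character(NULL)    # FALSE
length(NULL)          # 0
```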
