How to Use Regular Expressions in R
R supports regular expressions, which allow you to search for patterns inside text. You may never have heard of regular expressions, but you’re probably familiar with the broad concept. If you’ve ever used an * or a ? to stand in for any letter in a word, then you’ve used a form of wildcard search. Regular expressions support the idea of wildcards and much more.
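To make the link between wildcards and regular expressions concrete, the sketch below uses glob2rx() from R's utils package, which translates a wildcard pattern into a regular expression. The vector words and the pattern be?ch are illustrative assumptions chosen for this sketch.

```r
# Illustrative word list (an assumption made for this sketch)
words <- c("bach", "beech", "beach", "black")

# Translate the wildcard pattern "be?ch" (? stands for any single character)
# into an equivalent regular expression
rx <- glob2rx("be?ch")
print(rx)

# Use the translated pattern like any other regular expression;
# value = TRUE returns the matching words rather than their positions
found <- grep(rx, words, value = TRUE)
print(found)
```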
Regular expressions allow three ways of making a search pattern more general than a single, fixed expression:
Alternatives: You can search for instances of one pattern or another, indicated by the | symbol. For example, beach|beech matches both beach and beech.
On English and American English keyboards, you can usually find the | on the same key as backslash (\).
Grouping: You group patterns together using parentheses ( ). For example, you write be(a|e)ch to find both beach and beech.
Quantifiers: You specify whether an element in the pattern must be repeated or not by adding * (occurs zero or more times) or + (occurs one or more times). For example, to find either bach or beech (zero or more a's, or zero or more e's, but not a mix of both), you use b(e*|a*)ch.
Try the following examples. First, create a new variable with five words:
> rwords <- c("bach", "back", "beech", "beach", "black")
Find either beach or beech using alternative matching:
> grep("beach|beech", rwords)
[1] 3 4
This means the search string was found in elements 3 and 4 of rwords. To extract the actual elements, you can use subsetting with square brackets:
> rwords[grep("beach|beech", rwords)]
[1] "beech" "beach"
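As an alternative to subsetting with square brackets, grep() can return the matching elements directly through its value argument, and grepl() returns a logical vector that also works for subsetting. A minimal sketch, reusing the same five words:

```r
rwords <- c("bach", "back", "beech", "beach", "black")

# value = TRUE returns the matching elements instead of their positions
matches <- grep("beach|beech", rwords, value = TRUE)
print(matches)

# grepl() returns one TRUE/FALSE per element of rwords
hits <- grepl("beach|beech", rwords)
print(rwords[hits])
```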
Now use the grouping rule to extract the same words:
> rwords[grep("be(a|e)ch", rwords)]
[1] "beech" "beach"
Lastly, use the quantifier modification to extract bach and beech but not beach:
> rwords[grep("b(e*|a*)ch", rwords)]
[1] "bach" "beech"
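The examples above use only the * quantifier. The sketch below contrasts * with +, using a hypothetical vector qwords that includes bch (no vowel at all) to show where the two differ. It also shows that b(e*|a*)ch accepts bch, because both e* and a* are allowed to match nothing.

```r
# Hypothetical test words, chosen to show the difference between * and +
qwords <- c("bch", "bach", "baach", "beech")

# value = TRUE makes grep() return the matching words, not their positions

# a* allows zero or more a's, so "bch" matches too
star <- grep("ba*ch", qwords, value = TRUE)
print(star)

# a+ requires at least one a, so "bch" is excluded
plus <- grep("ba+ch", qwords, value = TRUE)
print(plus)

# The pattern from the text matches every word here, including "bch"
both <- grep("b(e*|a*)ch", qwords, value = TRUE)
print(both)
```

If a match should require at least one vowel, replacing each * with + in the grouped pattern is one way to exclude bch.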
To find more help in R about regular expressions, look at the Help page ?regexp. Some other great resources for learning more about regular expressions are Wikipedia and, where you can find a quick-start guide and tutorials.