How to Extend Text Functionality with Stringr in R

By Andrie de Vries, Joris Meys

If you’ve worked at all with the text manipulation functions of R, you probably wonder why all these functions have such unmemorable names and seemingly diverse syntax. If so, you’re not alone.

Hadley Wickham wrote a package, available from CRAN, that simplifies and standardizes working with text in R. This package is called stringr, and you can install it by running install.packages("stringr") in the R console or by choosing Tools→Install Packages in RStudio.

Although you have to install a package only once, you have to load it with the library() function every time you start a new R session in which you plan to use the functions in that package.
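As a minimal sketch of that install-once, load-every-session routine (the install line is commented out here on the assumption that stringr is already installed; uncomment it on a fresh setup):

```r
# Install once per R installation (needs access to a CRAN mirror)
# install.packages("stringr")

# Load the package at the start of every new session
library(stringr)

# Quick check that the stringr functions are now available
name_length <- str_length("stringr")
print(name_length)
```

If library(stringr) reports that there is no package called 'stringr', the install step has not been run yet on that machine.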


Here are some of the advantages of using stringr rather than the standard R functions:

  • Function names and arguments are consistent and more descriptive. For example, all stringr functions have names starting with str_ (such as str_detect() and str_replace()).

  • stringr deals consistently with missing data and empty values: A missing input (NA) gives a missing output, and a zero-length input gives a zero-length output.

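  • stringr has a more consistent way of ensuring that input and output data are of the same type. For example, it treats factors and character vectors in the same way, and its output can easily be used as input to another stringr function.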
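The difference in how missing and empty values are handled is easiest to see side by side. The following is a small sketch, not an exhaustive comparison; the variable names are made up for illustration:

```r
library(stringr)

# Missing values: base R turns a bare NA into the text "NA",
# whereas stringr keeps the result missing
base_len <- nchar(NA)            # counts the two characters of "NA"
tidy_len <- str_length(NA)       # stays NA

base_join <- paste0("id-", NA)   # glues on the text "NA"
tidy_join <- str_c("id-", NA)    # stays NA

# Factors: stringr works on the labels, just as it would on character data
factor_len <- str_length(factor("abc"))

# Zero-length input gives zero-length output
empty_out <- str_detect(character(0), "a")

print(list(base_len, tidy_len, base_join, tidy_join, factor_len, empty_out))
```

Keeping NA as NA means a missing value cannot quietly turn into the two-letter string "NA" somewhere in the middle of an analysis.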

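The stringr equivalent for grepl() is str_detect(), and the equivalent for gsub() is str_replace_all(). (Like grepl(), str_detect() returns TRUE or FALSE for each element, whereas grep() returns the positions of the matching elements.)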
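To compare the two styles, here is a short sketch using a made-up vector called fruits. It uses grepl(), the variant of grep() that returns TRUE or FALSE for each element, because that is the form of output str_detect() produces:

```r
library(stringr)

fruits <- c("apple", "banana", "cherry")

# Base R: the pattern comes first, the text comes later
base_hit <- grepl("an", fruits)
base_sub <- gsub("a", "A", fruits)

# stringr: the text is always the first argument
tidy_hit <- str_detect(fruits, "an")
tidy_sub <- str_replace_all(fruits, "a", "A")

# Both approaches give the same answers for this input
print(identical(base_hit, tidy_hit))
print(identical(base_sub, tidy_sub))
```

Having the text as the first argument in every function is part of what makes stringr calls easier to remember and to chain together.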

As a starting point to explore stringr, you may find some of these functions useful:

  • str_detect(): Detects the presence or absence of a pattern in a string

  • str_extract(): Extracts the first piece of a string that matches a pattern

  • str_length(): Returns the length of a string (in characters)

  • str_locate(): Locates the position of the first occurrence of a pattern in a string

  • str_match(): Extracts the first match of a pattern, together with each of its capture groups, and returns them as a character matrix

  • str_replace(): Replaces the first occurrence of a matched pattern in a string

  • str_split(): Splits up a string into a variable number of pieces

  • str_sub(): Extracts substrings from a character vector by start and end position

  • str_trim(): Trims white space from the start and end of a string

  • str_wrap(): Wraps strings into nicely formatted paragraphs
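
The functions in this list can be tried out in a few lines. The inputs and variable names below are invented for illustration, and the patterns are ordinary regular expressions:

```r
library(stringr)

detected  <- str_detect(c("apple", "banana"), "an")    # is "an" present?
extracted <- str_extract("order 66 of 99", "[0-9]+")   # first run of digits
n_chars   <- str_length("banana")                      # number of characters
located   <- str_locate("banana", "an")                # start and end of first "an"
matched   <- str_match("key=value", "(\\w+)=(\\w+)")   # full match plus two groups
replaced  <- str_replace("banana", "a", "A")           # only the first "a" changes
pieces    <- str_split("a,b,c", ",")                   # a list with one element per input
sub_text  <- str_sub("banana", 1, 3)                   # characters 1 through 3
trimmed   <- str_trim("  hello  ")                     # outer white space removed
wrapped   <- str_wrap("The quick brown fox jumps over the lazy dog", width = 20)

print(matched)
cat(wrapped, "\n")
```

Notice that str_locate() and str_match() return matrices with one row per input string, and str_split() returns a list, because each input string can produce a different number of pieces.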