How to Substitute Text in R - dummies

By Andrie de Vries, Joris Meys

The sub() function (short for substitute) in R searches for a pattern in text and replaces it with replacement text. You use sub() to replace only the first occurrence of a pattern in each string, and you use its cousin gsub() to replace all occurrences. (The g in gsub() stands for global.)
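
The difference is easiest to see on a string that contains the pattern more than once. The following is a minimal sketch; the input string and the variable names are made up for illustration:

```r
# Hypothetical input with two occurrences of the pattern "cat"
s <- "cat and cat"

# sub() replaces only the first match
first_only <- sub("cat", "dog", s)

# gsub() replaces every match
all_matches <- gsub("cat", "dog", s)

print(first_only)   # [1] "dog and cat"
print(all_matches)  # [1] "dog and dog"
```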

Suppose you have the sentence A wolf in cheap clothing, which is clearly a mistake. You can fix it with a gsub() substitution. The gsub() function takes three arguments: the pattern to find, the replacement text, and the text to modify:

> gsub("cheap", "sheep's", "A wolf in cheap clothing")
[1] "A wolf in sheep's clothing"
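
The three arguments can also be passed by name, which some readers find easier to follow. This sketch uses the argument names pattern, replacement, and x from the base R help page for gsub(); the variable name fixed_sentence is illustrative only:

```r
# Same substitution as above, with the arguments named explicitly
fixed_sentence <- gsub(pattern = "cheap",
                       replacement = "sheep's",
                       x = "A wolf in cheap clothing")

print(fixed_sentence)  # [1] "A wolf in sheep's clothing"
```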

Another common type of problem that can be solved with text substitution is removing substrings. Removing substrings is the same as replacing the substring with empty text (that is, nothing at all).

Imagine a situation in which you have three file names in a vector: file_a.csv, file_b.csv, and file_c.csv. Your task is to extract the a, b, and c from those file names. You can do this in two steps: First, replace the pattern “file_” with nothing, and then replace the “.csv” with nothing. You’ll be left with your desired vector:

> x <- c("file_a.csv", "file_b.csv", "file_c.csv")
> y <- gsub("file_", "", x)
> y
[1] "a.csv" "b.csv" "c.csv"
> gsub(".csv", "", y)
[1] "a" "b" "c"
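
One caveat: by default, gsub() treats the pattern as a regular expression, in which a dot matches any single character. The pattern ".csv" works for these file names, but it can remove more than intended on other input. The sketch below shows the effect on a hypothetical file name (tricky is not part of the original example), along with two possible ways around it: the fixed = TRUE argument, which matches the pattern as literal text, and an escaped, anchored pattern that handles both removals in one call.

```r
# File names from the example, plus one hypothetical name that exposes the issue
x <- c("file_a.csv", "file_b.csv", "file_c.csv")
tricky <- "my_csv_file.csv"

# As a regular expression, "." matches any character, so "_csv" is removed too
loose <- gsub(".csv", "", tricky)

# fixed = TRUE treats the pattern as literal text
literal <- gsub(".csv", "", tricky, fixed = TRUE)

# Escaping the dot and anchoring both parts allows a single call:
# "^file_" matches only at the start, "\\.csv$" only at the end
one_step <- gsub("^file_|\\.csv$", "", x)

print(loose)     # [1] "my_file"
print(literal)   # [1] "my_csv_file"
print(one_step)  # [1] "a" "b" "c"
```

Whether the literal or the anchored form is the better fit depends on the input. The anchored form is stricter because it only strips the prefix and the extension where they are expected to appear.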