How to Concatenate Text Cases in R - dummies

By Andrie de Vries, Joris Meys

You can put together separate data elements in R so that they form a single text string. To concatenate text, you use the paste() function:

paste("The", "quick", "brown", "fox")
[1] "The quick brown fox"

By default, paste() separates the elements with a blank space. The separator is controlled by the sep argument, whose default value is a single space (" "). To join the elements with something else, pass a different value to sep.
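As a small illustration of the sep argument, the sketch below joins the same four words with a hyphen and with no separator at all. The variable names (dashed, squeezed, squeezed0) are made up for this example; paste0() is the base R shorthand for paste() with sep = "".

```r
# Join the same four words with different separators.
dashed <- paste("The", "quick", "brown", "fox", sep = "-")
dashed
# [1] "The-quick-brown-fox"

# An empty string as separator glues the words together.
squeezed <- paste("The", "quick", "brown", "fox", sep = "")
squeezed
# [1] "Thequickbrownfox"

# paste0() gives the same result as sep = "".
squeezed0 <- paste0("The", "quick", "brown", "fox")
squeezed0
# [1] "Thequickbrownfox"
```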

When you use paste(), or any function that accepts multiple arguments, make sure that you pass arguments in the correct format. Take a look at this example, but notice that this time there is a c() function in the code:

paste(c("The", "quick", "brown", "fox"))
[1] "The"  "quick" "brown" "fox"

What’s happening here? Why doesn’t paste() paste the words together? The reason is that, by using c(), you passed a vector as a single argument to paste(). The c() function combines elements into a vector. By default, paste() concatenates separate vectors — it doesn’t collapse elements of a vector.

For the same reason, when words is a character vector containing the nine words shown below, paste(words) results in the following:

[1] "The"  "quick" "brown" "FOX"  "jumps" "over" "the"  "lazy" "DOG"
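To try this yourself, you need a words vector. The sketch below rebuilds one from the output shown above, which is an assumption about how the vector was created. It then checks that paste() with a single vector argument returns one element per word rather than a single string.

```r
# Assumption: 'words' is reconstructed from the printed output above.
words <- c("The", "quick", "brown", "FOX", "jumps",
           "over", "the", "lazy", "DOG")

# A single vector argument is not collapsed.
unchanged <- paste(words)
length(unchanged)
# [1] 9

# The result is the same character vector that went in.
identical(unchanged, words)
# [1] TRUE
```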

The paste() function takes two optional arguments. The separator (sep) argument controls how different vectors get concatenated, and the collapse argument controls how a vector gets collapsed into itself, so to speak.

When you want to concatenate the elements of a vector by using paste(), you use the collapse argument, as follows:

paste(words, collapse=" ")
[1] "The quick brown FOX jumps over the lazy DOG"

The collapse argument of paste() can take any character string. If you want to paste together text by using an underscore, use the following:

paste(words, collapse="_")
[1] "The_quick_brown_FOX_jumps_over_the_lazy_DOG"
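The collapse string can be longer than one character, or empty. As a rough sketch, the example below uses a hypothetical fruit vector (not part of the original example) to build a comma-separated list and a string with no separator.

```r
# Hypothetical vector, used only for illustration.
fruit <- c("apple", "banana", "cherry")

# A two-character collapse string gives a readable list.
listed <- paste(fruit, collapse = ", ")
listed
# [1] "apple, banana, cherry"

# An empty collapse string joins the elements directly.
joined <- paste(fruit, collapse = "")
joined
# [1] "applebananacherry"
```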

You can use sep and collapse in the same paste call. In this case, the vectors are first pasted with sep and then collapsed with collapse. Try this:

paste(LETTERS[1:5], 1:5, sep="_", collapse="---")
[1] "A_1---B_2---C_3---D_4---E_5"

What happens here is that you first concatenate the corresponding elements of the two vectors with an underscore (that is, A_1, B_2, and so on), and then you collapse the results into a single string with --- between each element.
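One way to see the order of operations is to split the call into two steps. The sketch below does that, with pairs and combined as illustrative names, and checks that the two-step result matches the single call.

```r
# Step 1: sep joins the two vectors element by element.
pairs <- paste(LETTERS[1:5], 1:5, sep = "_")
pairs
# [1] "A_1" "B_2" "C_3" "D_4" "E_5"

# Step 2: collapse joins the elements of that result.
combined <- paste(pairs, collapse = "---")
combined
# [1] "A_1---B_2---C_3---D_4---E_5"

# The single call gives the same string.
one_call <- paste(LETTERS[1:5], 1:5, sep = "_", collapse = "---")
identical(combined, one_call)
# [1] TRUE
```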

The paste() function takes vectors as input and joins them together. If one vector is shorter than the other, R recycles (repeats) the shorter vector to match the length of the longer one — a powerful feature.

Suppose that you have five objects, and you want to label them “Sample 1”, “Sample 2”, and so on. You can do this by passing a short vector containing the single value "Sample" and a longer vector with the values 1:5 to paste(). In this example, the shorter vector is recycled, so its one element is used five times:

paste("Sample", 1:5)
[1] "Sample 1" "Sample 2" "Sample 3" "Sample 4" "Sample 5"
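Recycling is not limited to a vector of length one. As a sketch, the example below pairs a hypothetical two-element vector of group names with the numbers 1 to 6, so the group names alternate along the longer vector.

```r
# Hypothetical group names, recycled three times to match 1:6.
labels <- paste(c("Control", "Treated"), 1:6)
labels
# [1] "Control 1" "Treated 2" "Control 3" "Treated 4" "Control 5" "Treated 6"

# The result is as long as the longest input.
length(labels)
# [1] 6
```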