
How to Split Strings in R

A sequence of characters, such as letters combined into words, is called a string. Whenever you work with text, you need to be able to concatenate words (string them together) and split them apart. In R, you use the paste() function to concatenate and the strsplit() function to split. Here, the focus is on splitting text with strsplit().
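For a quick look at the concatenation side, here is a minimal sketch of paste() and its close relative paste0(). The variable names (greeting, dashed, joined, words_vec, sentence) are just for illustration. The sep argument controls what goes between separate arguments, and collapse controls what goes between the elements of a single vector.

```r
# paste() joins its arguments into one string, separated by a space by default
greeting <- paste("Hello", "world")
print(greeting)   # [1] "Hello world"

# The sep argument changes the separator placed between arguments
dashed <- paste("Hello", "world", sep = "-")
print(dashed)     # [1] "Hello-world"

# paste0() is shorthand for paste(..., sep = "")
joined <- paste0("Hello", "world")
print(joined)     # [1] "Helloworld"

# collapse joins the elements of one vector into a single string
words_vec <- c("split", "and", "join")
sentence <- paste(words_vec, collapse = " ")
print(sentence)   # [1] "split and join"
```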

First, create a character vector called pangram, and assign it the value "The quick brown fox jumps over the lazy dog", as follows:

> pangram <- "The quick brown fox jumps over the lazy dog"
> pangram
[1] "The quick brown fox jumps over the lazy dog"

To split this text at the word boundaries (spaces), you can use strsplit() as follows:

> strsplit(pangram, " ")
[[1]]
[1] "The"  "quick" "brown" "fox"  "jumps" "over" "the"  "lazy" "dog"
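The second argument to strsplit() is treated as a regular expression unless you set fixed = TRUE. This sketch, using a made-up string with irregular spacing, shows why that matters: a single literal space leaves empty strings wherever spaces repeat, while the pattern "\\s+" treats any run of whitespace as one separator.

```r
messy <- "The  quick   brown fox"

# Splitting on one literal space produces "" between repeated spaces
by_space <- strsplit(messy, " ")[[1]]
print(by_space)        # "The" "" "quick" "" "" "brown" "fox"

# "\\s+" matches one or more whitespace characters
by_whitespace <- strsplit(messy, "\\s+")[[1]]
print(by_whitespace)   # "The" "quick" "brown" "fox"

# "." is a regex metacharacter (any character), so split on it literally
by_period <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
print(by_period)       # "a" "b" "c"

# An empty split string breaks the text into single characters
chars <- strsplit("dog", "")[[1]]
print(chars)           # "d" "o" "g"
```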

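Notice that the first line of strsplit()'s output is the unusual [[1]]. That's because strsplit() always returns a list, with one element for each string you pass in. Similar to the way that R labels the elements of a vector with [1], [[1]] means that R is showing the first element of a list. Lists are an extremely important concept in R: a single list can hold elements of different types and lengths, including whole vectors.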

In the preceding example, the list has only a single element, because pangram contains only one string. That single element, however, is itself a character vector that holds all nine words.
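To see the one-element-per-string behavior directly, this sketch passes a vector of two strings (an illustrative example, not from the text above) to strsplit(). The result is a list of length 2, and lengths() reports how many words landed in each element.

```r
# Two input strings, so strsplit() returns a list with two elements
phrases <- c("The quick brown fox", "jumps over the lazy dog")
pieces <- strsplit(phrases, " ")

print(length(pieces))    # [1] 2

# Each element is a character vector; lengths() gives the size of each one
print(lengths(pieces))   # [1] 4 5
```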

To extract an element from a list, you have to use double square brackets. Split your pangram into words, and assign the first element to a new variable called words, using double-square-brackets ([[]]) subsetting, as follows:

> words <- strsplit(pangram, " ")[[1]]
> words
[1] "The"  "quick" "brown" "fox"  "jumps" "over" "the"  "lazy" "dog"
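The difference between single and double square brackets is easy to check with class(). The sketch below also uses unlist(), which flattens a whole list into one vector (for a single input string this matches [[1]]), and then reverses the split with paste() and its collapse argument. The name split_result is illustrative.

```r
pangram <- "The quick brown fox jumps over the lazy dog"
split_result <- strsplit(pangram, " ")

# Single brackets return a sub-list, which is still a list
print(class(split_result[1]))    # [1] "list"

# Double brackets return the element itself, here a character vector
print(class(split_result[[1]]))  # [1] "character"

# unlist() flattens the list into one character vector
words <- unlist(split_result)

# paste() with collapse = " " glues the words back into the original string
rebuilt <- paste(words, collapse = " ")
print(identical(rebuilt, pangram))  # [1] TRUE
```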

To find the unique elements of a vector, including a vector of text, you use the unique() function. Because unique() compares strings exactly, it treats "The" and "the" as different words, and in the variable words, "the" appears twice: once in lowercase and once with the first letter capitalized. To get the unique words regardless of case, first convert words to lowercase with tolower() and then use unique():

> unique(tolower(words))
[1] "the"  "quick" "brown" "fox"  "jumps" "over" "lazy"
[8] "dog"
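A short sketch makes the case-sensitivity point concrete and goes one step further: table() counts how often each lowercase word occurs, which is a common next step after splitting text. The name counts is illustrative.

```r
words <- strsplit("The quick brown fox jumps over the lazy dog", " ")[[1]]

# Exact comparison keeps both "The" and "the"
print(length(unique(words)))           # [1] 9

# Lowercasing first merges them
print(length(unique(tolower(words))))  # [1] 8

# table() counts the occurrences of each lowercase word
counts <- table(tolower(words))
print(counts[["the"]])                 # [1] 2
```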