How to Sort Text Cases in R - dummies

How to Sort Text Cases in R

By Andrie de Vries, Joris Meys

Data can be sorted alphabetically or numerically, in ascending or descending order. Like any programming language, R makes it easy to compile lists of sorted and ordered data.

Because text in R is represented as character vectors, you can sort these vectors using the same functions as you use with numeric data. For example, to get R to sort the alphabet in reverse, use the sort() function:

> sort(letters, decreasing=TRUE)
 [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p"
[12] "o" "n" "m" "l" "k" "j" "i" "h" "g" "f" "e"
[23] "d" "c" "b" "a"

Here you used the decreasing argument of sort().

The sort() function sorts a vector. It doesn’t sort the characters of each element of the vector. In other words, sort() doesn’t mangle the word itself. You can still read each of the words in words.

Try it on your vector words that you created in the previous paragraph:

> sort(words)
[1] "brown" "DOG"  "FOX"  "jumps" "lazy"
[6] "over" "quick" "the"  "The"

R performs lexicographic sorting, as opposed to, for example, the C language, which sorts in ASCII order. This means that the sort order will depend on the locale of the machine the code runs on. In other words, the sort order may be different if the machine running R is configured to use Danish than it will if the machine is configured to use English. The R help file contains this description:

Beware of making any assumptions about the collation order: e.g., in Estonian, Z comes between S and T, and collation is not necessarily character-by-character — in Danish aa sorts as a single letter, after z.

In most cases, lexicographic sorting simply means that the sort order is independent of whether the string is in lowercase or uppercase. For more details, read the help text in ?sort as well as ?Comparison.

You can get help on any function by typing a question mark followed by the function name into the console.