R has a special data structure for categorical data, called a factor. Factors are closely related to character vectors because any character vector can be represented by a factor. To peek under the hood of a factor, use the str() function:

> str(state.region)
 Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...

R reports the structure of state.region as a factor with four levels. You can see that the first two levels are "Northeast" and "South". The numbers after the colon show that the values themselves are stored as the integers 1, 2, 3, and 4, each one pointing to a level.

Factors are a convenient way to describe categorical data. Internally, a factor is stored as a vector of integer codes, plus a separate character vector of levels that gives each code its label. This means you can set and investigate the levels of a factor separately from the values of the factor.
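As a minimal sketch of this storage model, the example below builds a factor from a small character vector and then looks at the labels and the integer codes separately. The vector directions and its values are made up for illustration only.

```r
# Hypothetical character vector, used only for illustration
directions <- c("North", "East", "South", "South")

# factor() uses the sorted unique values as the levels
directions.factor <- factor(directions)

# The level labels, stored once
levels(directions.factor)      # "East" "North" "South"

# The integer codes, one per element, each pointing to a level
as.integer(directions.factor)  # 2 1 3 3

# The underlying storage type
typeof(directions.factor)      # "integer"
```

Each code is simply the position of the element's label in the levels vector, which is why "South" appears as 3 both times.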

To look at the levels of a factor, you use the levels() function. For example, to extract the factor levels of state.region, use the following:

> levels(state.region)
[1] "Northeast"     "South"
[3] "North Central" "West"

Because the values of the factor are linked to the levels, when you change the levels, you also indirectly change the values themselves. To make this clear, change the levels of state.region to the values "NE", "S", "NC", and "W":

> levels(state.region) <- c("NE", "S", "NC", "W")
> head(state.region)
[1] S W W S W W
Levels: NE S NC W
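To see why relabeling the levels changes every value at once, it can help to compare the integer codes before and after the change. The following is a sketch that works on a copy of the built-in data; the names region, codes.before, and codes.after are assumptions introduced here.

```r
# Work on a copy taken from the datasets package so the original stays untouched
region <- datasets::state.region

# Record the integer codes before relabeling
codes.before <- as.integer(region)

# Relabel the four levels
levels(region) <- c("NE", "S", "NC", "W")

# Record the integer codes after relabeling
codes.after <- as.integer(region)

# The codes did not change; only the labels they point to did
identical(codes.before, codes.after)  # TRUE
head(as.character(region))            # "S" "W" "W" "S" "W" "W"
```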

Sometimes it’s useful to know the number of levels of a factor. The convenience function nlevels() extracts the number of levels from a factor:

> nlevels(state.region)
[1] 4

Because the levels of a factor are internally stored by R as a vector, you can also extract the number of levels by applying length() to the result of levels():

> length(levels(state.region))
[1] 4
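Note that length() applied to the factor itself counts elements, not levels. A short sketch of the difference, again using a copy named region (a name chosen here for illustration):

```r
# Copy of the built-in factor with one entry per U.S. state
region <- datasets::state.region

length(region)          # 50: the number of elements in the factor
nlevels(region)         # 4: the number of distinct levels
length(levels(region))  # 4: same result as nlevels()
```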

For the very same reason, you can index the levels of a factor using standard vector subsetting rules. For example, to extract the second and third factor levels, use the following:

> levels(state.region)[2:3]
[1] "S"  "NC"
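The same subsetting rules apply to logical indices and to assignment. The sketch below starts from a fresh copy of the original data (so the level names are the unabbreviated ones) and renames a single level in place. The name region and the replacement label "Midwest" are assumptions chosen for illustration.

```r
# Fresh copy of the built-in factor, with its original level names
region <- datasets::state.region

# Logical indexing: keep every level except "West"
levels(region)[levels(region) != "West"]
# "Northeast" "South" "North Central"

# Assignment through an index: rename only the third level
levels(region)[3] <- "Midwest"
levels(region)
# "Northeast" "South" "Midwest" "West"
```

Renaming one level this way leaves the other labels and all the integer codes untouched.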