Grouping the Bars on a Bar Plot with R

By Joseph Schmuller

You’ve probably seen bar plots where each point on the x-axis has more than one bar. The image below shows an example. The bar plot shows the frequency of eye color for four hair colors in 313 female students. The data is from the HairEyeColor data set. This type of plot is called a grouped bar plot.

Grouped bar plot of Eye Color and Hair Color in 313 female students.

How does the base R graphics package deal with that? You begin by isolating the female data in the HairEyeColor data set, which lives in the datasets package:

> library(datasets)
> females <- HairEyeColor[,,2]
> females
       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8
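If you'd rather not rely on remembering that Female is the second level of the Sex dimension, you can select the slice by name instead. Here's a minimal sketch of that idea. The names females.by.name and hair.totals are made up for this illustration and aren't part of the original example:

```r
library(datasets)

# Select the female slice by position (as above) and by level name
females <- HairEyeColor[, , 2]
females.by.name <- HairEyeColor[, , "Female"]

# Both approaches should return the same 4 x 4 set of counts
identical(females, females.by.name)

# The counts add up to the 313 students mentioned earlier
sum(females)   # 313

# Total number of students for each hair color (the row sums)
hair.totals <- rowSums(females)
hair.totals
```

Indexing by name makes the code a little easier to read later, because the line itself tells you which group you picked.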

To begin producing the image above, you have to specify the colors in the bars and in the legend:

> color.names = c("black","grey40","grey80","white")

A word about those names: You can combine grey with any number from 0 to 100 to create a color — “grey0” is equivalent to “black” and “grey100” is equivalent to “white”.
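If you want to check those equivalences for yourself, col2rgb() converts a color name into its red, green, and blue values. This is just a quick sketch, and grey.rgb is a name chosen for the illustration:

```r
# Convert the grey names to red-green-blue values (each from 0 to 255)
grey.rgb <- col2rgb(c("grey0", "grey40", "grey80", "grey100"))
grey.rgb

# The endpoints match the named colors
all(col2rgb("grey0") == col2rgb("black"))
all(col2rgb("grey100") == col2rgb("white"))

# R also accepts the spelling "gray"
all(col2rgb("gray40") == col2rgb("grey40"))
```

Each comparison should come back TRUE. Higher numbers mean lighter shades, which is why the four bars in each group run from black to white.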

Now you turn once again to the barplot() function. Interestingly, if you use females as the first argument for barplot(), R draws a plot with Eye Color on the x-axis (rather than Hair Color). To reverse that, you use t() to interchange (transpose, in other words) the rows and columns:

> t(females)
       Hair
Eye     Black Brown Red Blond
  Brown    36    66  16     4
  Blue      9    34   7    64
  Hazel     5    29   7     5
  Green     2    14   7     8
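The reason t() matters is that barplot() draws one group of bars for each column of the matrix you hand it. A short sketch, using only the data from above, shows which names end up on the x-axis before and after transposing:

```r
library(datasets)
females <- HairEyeColor[, , "Female"]

# barplot() draws one group of bars per column of the matrix
colnames(females)      # eye colors: the x-axis groups before transposing
colnames(t(females))   # hair colors: the x-axis groups after transposing

# Transposing rearranges the counts; it doesn't change them
females["Blond", "Blue"] == t(females)["Blue", "Blond"]
```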

The function that produces the bar plot is

> barplot(t(females),beside=T,ylim=c(0,70),xlab="Hair Color",
          ylab="Frequency of Eye Color",col=color.names,axis.lty="solid")

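beside=T tells R to plot the bars, well, beside each other. (Try it without this argument and watch what happens.) ylim sets the range of the y-axis to run from 0 to 70, which ensures that no bar will rise above the highest value on the y-axis (the tallest bar is 66). col=color.names supplies the colors named in the vector. axis.lty="solid" draws the line for the x-axis, which barplot() leaves out by default.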
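To see what beside changes, you can draw both versions and look at what barplot() hands back. The function returns the midpoints of the bars it drew, and the shape of that result differs between the two layouts. The sketch below is one way to compare them. The names stacked.mids and grouped.mids are invented for this illustration:

```r
library(datasets)
females <- HairEyeColor[, , "Female"]
color.names <- c("black", "grey40", "grey80", "white")

# Without beside, barplot() stacks the eye colors into one bar per hair color
stacked.mids <- barplot(t(females), col = color.names,
                        xlab = "Hair Color", ylab = "Frequency of Eye Color")

# With beside = TRUE, each eye color gets its own bar within the group
grouped.mids <- barplot(t(females), beside = TRUE, ylim = c(0, 70),
                        col = color.names, xlab = "Hair Color",
                        ylab = "Frequency of Eye Color", axis.lty = "solid")

length(stacked.mids)   # one midpoint per stacked bar
dim(grouped.mids)      # one midpoint per individual bar, arranged as a matrix
max(females)           # tallest single bar, which is why ylim stops at 70
```

The stacked version doesn't use ylim=c(0,70), because a stacked bar is as tall as the total for its hair color, and the Brown total is 143.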

The plot isn’t complete without the legend (the box that tells you which plot colors correspond to which eye colors):

> legend("top",rownames(t(females)),cex =0.8,fill=color.names,title="Eye Color")

The first argument puts the legend at the top of the plot, and the second argument provides the names. The third argument, cex, specifies the size of the characters in the legend: 0.8 means “80% of the normal size.” The fourth argument gives the colors for the color swatches, and the fifth, of course, provides the title.
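Putting the pieces together, a complete script might look like the sketch below. It spells out TRUE instead of T and stores the legend labels in a variable called eye.colors. That name is just a convenience for this illustration, and colnames(females) gives the same labels as rownames(t(females)):

```r
library(datasets)

# Isolate the female data and choose one shade per eye color
females <- HairEyeColor[, , "Female"]
color.names <- c("black", "grey40", "grey80", "white")

# The eye colors label the legend; same values as rownames(t(females))
eye.colors <- colnames(females)

# Draw the grouped bar plot with Hair Color on the x-axis
barplot(t(females), beside = TRUE, ylim = c(0, 70),
        xlab = "Hair Color", ylab = "Frequency of Eye Color",
        col = color.names, axis.lty = "solid")

# Add the legend on top of the existing plot
legend("top", legend = eye.colors, cex = 0.8,
       fill = color.names, title = "Eye Color")
```

Note that legend() adds to a plot that already exists, so it has to come after the call to barplot().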