How to Plot Histograms with Your Data in R - dummies

By Andrie de Vries, Joris Meys

To get a clearer visual idea about how your data is distributed within the range, you can plot a histogram using R. To make a histogram for the mileage data, you simply use the hist() function, like this:

> hist(cars$mpg, col='grey')

You see that the hist() function first cuts the range of the data into a number of equal-width intervals and then counts the number of observations in each interval. The height of each bar is proportional to those frequencies, and the y-axis shows the counts.

With the argument col, you give the bars in the histogram a bit of color.
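If you want to see the numbers behind the picture, you can keep the object that hist() returns. The following is a minimal sketch, and it rests on an assumption: cars is taken to be a data frame with an mpg column, rebuilt here from the built-in mtcars data set so the example runs on its own (the choice of columns is only for illustration).

```r
# Assumption: 'cars' is a data frame with an mpg column. Here it is
# rebuilt from the built-in mtcars data set so the sketch is self-contained.
cars <- mtcars[c("mpg", "cyl", "am", "gear")]

# Draw the histogram and keep the object that hist() returns invisibly.
h <- hist(cars$mpg, col = "grey")

h$breaks  # the interval boundaries R chose
h$counts  # the number of observations in each interval
```

The counts always add up to the number of observations, which is a quick way to check that nothing was left out.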


How to play with breaks

R chooses the number of intervals it considers most useful to represent the data, but you can disagree with what R does and choose the breaks yourself. For this, you use the breaks argument of the hist() function.

You can specify the breaks in a couple of different ways:

  • You can tell R the number of bars you want in the histogram by giving a single number as the argument. Just keep in mind that R treats this number as a suggestion only: it still cuts up the range at nice rounded numbers, so you may get more or fewer bars than you asked for.

  • You can tell R exactly where to put the breaks by giving a vector with the break points as a value to the breaks argument.
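To see how R handles a single number, you can ask for the histogram object without drawing it and look at what you actually got. This is a sketch under the same assumption as before: cars is a data frame with an mpg column, built here from mtcars. The value 7 is an arbitrary choice for illustration.

```r
# Assumption: 'cars' is a data frame with an mpg column (built from mtcars).
cars <- mtcars["mpg"]

# Ask for 7 bars, but skip the plot and just inspect the result.
h <- hist(cars$mpg, breaks = 7, plot = FALSE)

length(h$counts)  # the number of bars R actually used
h$breaks          # the rounded break points R settled on
```

If the number of bars differs from what you asked for, that is R preferring tidy break points over your exact request.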

So, if you don’t agree with R and you want to have bars representing the intervals 5 to 15, 15 to 25, and 25 to 35, you can do this with the following code:

> hist(cars$mpg, breaks=c(5,15,25,35))
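When you supply the break points yourself, they have to cover every value in the data; otherwise hist() stops with an error. The sketch below checks that first and then looks at the counts per interval. It again assumes that cars is a data frame with an mpg column, built here from mtcars, and the name my_breaks is made up for this example.

```r
# Assumption: 'cars' is a data frame with an mpg column (built from mtcars).
cars <- mtcars["mpg"]

# Hypothetical name for the break points used in the text.
my_breaks <- c(5, 15, 25, 35)

# The breaks must span the whole range of the data.
range(cars$mpg)
stopifnot(min(my_breaks) <= min(cars$mpg),
          max(my_breaks) >= max(cars$mpg))

# Compute the histogram without plotting and inspect the counts.
h <- hist(cars$mpg, breaks = my_breaks, plot = FALSE)
h$counts  # observations in 5 to 15, 15 to 25, and 25 to 35
```

Drop plot = FALSE to draw the histogram with these intervals.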

You can also give the name of the algorithm R has to use to determine the number of breaks as the value for the breaks argument, for example "Sturges" (the default), "Scott", or "FD". You can find more information on those algorithms on the Help page ?hist. Experiment with those algorithms a bit to see which one works best for your data.
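A quick way to compare the algorithms is to count how many bars each one produces before you draw anything. This is only a sketch, assuming as before that cars is a data frame with an mpg column, built here from mtcars; the names methods and n_bars are made up for the example.

```r
# Assumption: 'cars' is a data frame with an mpg column (built from mtcars).
cars <- mtcars["mpg"]

# Algorithm names accepted by the breaks argument of hist().
methods <- c("Sturges", "Scott", "FD")

# Count the bars each algorithm produces, without plotting.
n_bars <- sapply(methods, function(m) {
  length(hist(cars$mpg, breaks = m, plot = FALSE)$counts)
})

n_bars
```

Once you see a number you like, call hist() again with that algorithm name and without plot = FALSE to draw it.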