How to Use Quantile Plots to Check Data Normality in R - dummies

How to Use Quantile Plots to Check Data Normality in R

By Andrie de Vries, Joris Meys

Histograms leave much to the interpretation of the viewer. A better graphical way in R to tell whether your data is distributed normally is to look at a so-called quantile-quantile (QQ) plot.

With this technique, you plot quantiles against each other. If you compare two samples, for example, you simply compare the quantiles of both samples. Or, to put it a bit differently, R does the following to construct a QQ plot:

  • It sorts the data of both samples.

  • It plots these sorted values against each other.

If both samples don’t contain the same number of values, R calculates extra values by interpolation for the smallest sample to create two samples of the same size.

How to compare two data samples

Of course, you don’t have to do that all by yourself, you can simply use the qqplot() function for that. So, to check whether the temperatures during activity and during rest are distributed equally, you simply do the following:

> qqplot(beaver2$temp[beaver2$activ==1],
+    beaver2$temp[beaver2$activ==0])

This creates a plot where the ordered values are plotted against each other.


Between the square brackets, you can use a logical vector to select the cases you want. Here you select all cases where the variable activ equals 1 for the first sample, and all cases where that variable equals 0 for the second sample.

How to use an R QQ plot to check for data normality

In most cases, you don’t want to compare two samples with each other, but compare a sample with a theoretical sample that comes from a certain distribution (for example, the normal distribution).

To make a QQ plot this way, R has the special qqnorm() function. As the name implies, this function plots your sample against a normal distribution. You simply give the sample you want to plot as a first argument and add any graphical parameters you like.

R then creates a sample with values coming from the standard normal distribution, or a normal distribution with a mean of zero and a standard deviation of one. With this second sample, R creates the QQ plot as explained before.

R also has a qqline() function, which adds a line to your normal QQ plot. This line makes it a lot easier to evaluate whether you see a clear deviation from normality. The closer all points lie to the line, the closer the distribution of your sample comes to the normal distribution. The qqline() function also takes the sample as an argument.

Now you want to do this for the temperatures during both the active and the inactive period of the beaver. You can use the qqnorm() function twice to create both plots. For the inactive periods, you can use the following code:

> qqnorm( beaver2$temp[beaver2$activ==0], main='Inactive')
> qqline( beaver2$temp[beaver2$activ==0] )

You can do the same for the active period by changing the value 0 to 1.