##### Statistics for Big Data For Dummies

One technique you can use to identify the distribution a dataset follows is the QQ-plot (QQ stands for quantile-quantile). You can use the QQ-plot to compare a dataset to a large number of different probability distributions. Often, data is compared to the normal distribution because many statistical tests assume normally distributed data.

This figure shows a normal QQ-plot for ExxonMobil's daily returns in 2013.

QQ-plot of daily returns to ExxonMobil stock in 2013.

The QQ-plot shows the quantiles of the normal distribution on the horizontal axis and the quantiles of the dataset on the vertical axis. If the dataset exactly matches the normal distribution, the points on the graph lie exactly on the upward-sloping reference line. The farther the points stray from that line, the more the data departs from normality.
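To see where the points on a normal QQ-plot come from, here is a minimal Python sketch using only the standard library. It sorts the data to get the sample quantiles and pairs each one with the matching quantile of the standard normal distribution. The function name `normal_qq_points` is hypothetical, and the plotting positions (i - 0.5)/n are assumed here as one common convention among several.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal QQ-plot.

    Assumes the plotting positions p_i = (i - 0.5) / n; other conventions
    (for example Blom's (i - 3/8) / (n + 1/4)) shift the points slightly.
    """
    sample = sorted(data)          # sample quantiles, smallest to largest
    n = len(sample)
    std_normal = NormalDist()      # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Made-up values for illustration, not ExxonMobil returns.
points = normal_qq_points([0.8, -1.2, 0.1, 2.5, -0.4])
for x, y in points:
    print(f"{x:+.3f}  {y:+.2f}")
```

Plotting these pairs with the theoretical quantiles on the horizontal axis and the sample quantiles on the vertical axis reproduces the layout described above.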

In this case, the returns to ExxonMobil stock closely follow the normal distribution, except for small discrepancies in the left and right tails of the distribution. (The upper right-hand corner of the QQ-plot represents the right tail of the distribution followed by ExxonMobil returns and the bottom left-hand corner represents the left tail.) The QQ-plot shows that the distribution of ExxonMobil returns has slightly fatter left and right tails than the normal distribution.
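Judging tail departures like these needs a reference line. One common choice, and the default used by R's `qqline`, is a line through the points at the first and third quartiles, so the most extreme values cannot pull it off course. The sketch below is an illustrative assumption, not the method behind the figure; `qq_reference_line`, `tail_departures`, and the 5% `tail_fraction` are hypothetical names and settings.

```python
from statistics import NormalDist, quantiles


def qq_reference_line(data):
    """Slope and intercept of a QQ reference line through the quartiles.

    Matches the sample's first and third quartiles to the standard normal's,
    so the size of the most extreme values does not move the line.
    """
    q1, _, q3 = quantiles(data, n=4)   # sample quartiles
    z1 = NormalDist().inv_cdf(0.25)    # standard normal quartiles
    z3 = NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept


def tail_departures(data, tail_fraction=0.05):
    """Vertical distances from the reference line for the lowest and highest points.

    Negative values mean a point sits below the line; positive, above it.
    """
    sample = sorted(data)
    n = len(sample)
    z = NormalDist()
    slope, intercept = qq_reference_line(sample)
    residuals = [
        x - (intercept + slope * z.inv_cdf((i - 0.5) / n))
        for i, x in enumerate(sample, start=1)
    ]
    k = max(1, int(n * tail_fraction))  # number of points examined per tail
    return residuals[:k], residuals[-k:]
```

With fatter-than-normal tails, the left-tail residuals come out negative (points below the line) and the right-tail residuals positive (points above it), matching the pattern described for the bottom-left and upper-right corners of the plot.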

A formal statistical test of normality (such as the Shapiro-Wilk or Jarque-Bera test) would be needed to judge whether the ExxonMobil data is normally distributed, but the QQ-plot suggests that the data is approximately normal.
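One formal test often applied to financial returns is the Jarque-Bera test, which checks whether the sample skewness and kurtosis match the normal distribution's values of 0 and 3. The Python sketch below is a swapped-in illustration, not the book's method; `jarque_bera` is a hypothetical helper, and its p-value relies on a large-sample chi-square approximation, so it is unreliable for small samples. In practice, a tested library routine such as SciPy's `scipy.stats.jarque_bera` is the safer choice.

```python
import math


def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for a test of normality.

    JB = n/6 * (S**2 + (K - 3)**2 / 4), where S is the sample skewness and
    K the sample kurtosis (simple moment estimators). Under normality, JB is
    approximately chi-square with 2 degrees of freedom, whose upper-tail
    probability is exp(-JB / 2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                          # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)
```

A small p-value (say, below 0.05) is evidence against normality; a large one only means the test found no strong evidence against it, which is not the same as proving the data is normal.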