Accuracy and Precision in Terms of the Sampling Distribution
The idea of a sampling distribution is at the heart of the concepts of accuracy and precision. Imagine a scenario in which an experiment (like a clinical trial or a survey) is carried out over and over again an enormous number of times, each time on a different random sample of subjects.
Using the “percent of kids who like chocolate” example, each experiment could consist of interviewing 50 randomly chosen children and reporting what percentage of kids in that sample said that they liked chocolate.
Repeating that entire experiment N times (and supposing that N is up in the millions) would require a lot of scientists, take a lot of time, and cost a lot of money, but suppose that you could actually do it.
For each repetition of the experiment, you’d get some particular value for the sample statistic you were interested in (the percent of kids in that sample who like chocolate), and you’d write this number down on a (really big) piece of paper.
After conducting your experiment N times, you’d have a huge set of values for the sample statistic (that is, the percent of kids who like chocolate). You could then calculate the mean of those values by adding them up and dividing by N.
And you could calculate the standard deviation by subtracting the mean from each value, squaring each difference, adding up the squares, dividing by N – 1, and then taking the square root. And you could construct a histogram of the N percentage values to see how they were spread out.
Statisticians describe this more formally: they say that all your replicate results are spread out in something called the sampling distribution of that sample statistic.
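The thought experiment above can be sketched in a few lines of Python. This is only an illustrative simulation, not a real survey: the true proportion of 70 percent is an assumed value (the text doesn't give one), N is cut down to 10,000 so the script runs quickly, and names such as `run_experiment` and `TRUE_PROPORTION` are made up for this example.

```python
import random
from collections import Counter

TRUE_PROPORTION = 0.70   # ASSUMPTION: illustrative value, not taken from the text
SAMPLE_SIZE = 50         # children interviewed in each experiment
N_REPLICATES = 10_000    # stand-in for the "millions" of repetitions


def run_experiment(rng, true_proportion=TRUE_PROPORTION, sample_size=SAMPLE_SIZE):
    """Interview one random sample; return the percent who like chocolate."""
    likes = sum(rng.random() < true_proportion for _ in range(sample_size))
    return 100.0 * likes / sample_size


rng = random.Random(42)  # fixed seed so the run can be repeated
percentages = [run_experiment(rng) for _ in range(N_REPLICATES)]

# Mean: add up all the values and divide by N.
mean_pct = sum(percentages) / N_REPLICATES

# Standard deviation: subtract the mean, square, add up,
# divide by N - 1, and take the square root.
sum_of_squares = sum((p - mean_pct) ** 2 for p in percentages)
sd_pct = (sum_of_squares / (N_REPLICATES - 1)) ** 0.5

print(f"Mean of the replicate percentages: {mean_pct:.2f}")
print(f"Standard deviation of the replicate percentages: {sd_pct:.2f}")

# Crude text histogram: one '#' for every 50 replicates at each value.
counts = Counter(percentages)
for value in sorted(counts):
    print(f"{value:5.1f}%  {'#' * (counts[value] // 50)}")
```

The list `percentages` plays the role of the really big piece of paper, and the histogram printed at the end is a rough picture of the sampling distribution.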
Accuracy refers to how close your observed sample statistic comes to the true population parameter, or more formally, how close the mean of the sampling distribution is to the true value of that population parameter. For example, how close is the mean of all your percentage values to the true percentage of children who like chocolate?
Precision refers to how close your replicate values of the sample statistic are to each other, or more formally, how wide the sampling distribution is, which can be expressed as the standard deviation of the sampling distribution. For example, what is the standard deviation of your big collection of percentage values?
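To make the two definitions concrete, here is a minimal sketch that measures both on simulated data. It assumes a true value of 70 percent purely for illustration, and the helper `simulate_percentages` is a made-up name. Accuracy is summarized as the gap between the mean of the replicate values and the true value, and precision as the standard deviation of the replicate values. The textbook formula for the standard deviation of a sample percentage is included only as a cross-check.

```python
import math
import random
import statistics

TRUE_PERCENT = 70.0    # ASSUMPTION: illustrative true population value
SAMPLE_SIZE = 50       # children interviewed in each experiment
N_REPLICATES = 10_000  # far fewer than "millions", to keep the run short


def simulate_percentages(true_percent, sample_size, n_replicates, seed=0):
    """Return one simulated sample percentage per replicate experiment."""
    rng = random.Random(seed)
    p = true_percent / 100.0
    results = []
    for _ in range(n_replicates):
        likes = sum(rng.random() < p for _ in range(sample_size))
        results.append(100.0 * likes / sample_size)
    return results


replicates = simulate_percentages(TRUE_PERCENT, SAMPLE_SIZE, N_REPLICATES)

# Accuracy: how far the center of the sampling distribution is from the truth.
bias = statistics.mean(replicates) - TRUE_PERCENT

# Precision: how wide the sampling distribution is.
spread = statistics.stdev(replicates)

# Cross-check: sqrt(p * (1 - p) / n), expressed in percentage points.
p = TRUE_PERCENT / 100.0
expected_spread = 100.0 * math.sqrt(p * (1.0 - p) / SAMPLE_SIZE)

print(f"Accuracy (mean minus true value): {bias:+.2f} percentage points")
print(f"Precision (SD of replicates):     {spread:.2f} percentage points")
print(f"Formula value for the SD:         {expected_spread:.2f} percentage points")
```

A gap near zero means the statistic is accurate on average. A small standard deviation means the replicate values cluster tightly, which is what precise means here.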