How Sample Size Affects Standard Error
The standard error shrinks as the size, n, of a statistical sample grows. Because the square root of n is in the denominator of the standard error formula, the standard error decreases as n increases. It makes sense that having more data gives less variation (and more precision) in your results.
Error here doesn’t mean there’s been a mistake; it means there is a gap between the population and sample results. The standard error of the sample mean is denoted by $\sigma_{\bar{X}}$ (sigma sub-x-bar). Its formula is
$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$
where $\sigma_X$ is the population standard deviation (sigma sub-x) and n is the size of each sample.
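As a minimal sketch, the formula (population standard deviation divided by the square root of n) can be written as a small Python function. The function name standard_error and the value 3.0 used for the population standard deviation are illustrative assumptions, not part of the original text.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


# sigma = 3.0 is an assumed, illustrative population standard deviation.
# Each step multiplies n by 4, which divides the standard error by 2.
for n in (1, 4, 16, 64):
    print(f"n = {n:>2}: standard error = {standard_error(3.0, n):.3f}")
```

Because n sits under a square root, cutting the standard error in half takes four times as much data, not twice as much.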
Suppose X is the time it takes for a clerical worker to type and send one letter of recommendation, and say X has a normal distribution with mean 10.5 minutes and standard deviation 3 minutes. The bottom curve in the above figure shows the picture of the distribution of X, the individual times for all clerical workers in the population. According to the Empirical Rule, almost all of the values (about 99.7 percent) are within 3 standard deviations of the mean (10.5), that is, between 1.5 and 19.5 minutes.
Now take a random sample of 10 clerical workers, measure their times, and find the average, $\bar{X}$, each time. Repeat this process over and over, and graph all the possible results for all possible samples. The middle curve in the figure shows the picture of the sampling distribution of $\bar{X}$. Notice that it’s still centered at 10.5 (which you expected) but its variability is smaller; the standard error in this case is $\sigma_{\bar{X}} = 3/\sqrt{10} \approx 0.95$ minutes (quite a bit less than 3 minutes, the standard deviation of the individual times).
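The repeated-sampling idea can be approximated with a short simulation. This is only a sketch: it cannot list all possible samples, so it draws a large number of random ones instead. The repetition count (20,000) and the seed are arbitrary assumptions, and the normal population with mean 10.5 and standard deviation 3 comes from the example.

```python
import random
import statistics

MU, SIGMA = 10.5, 3.0  # population mean and standard deviation from the example
N = 10                 # clerical workers per sample
REPS = 20_000          # number of repeated samples (arbitrary choice)

rng = random.Random(42)  # fixed seed so the run is reproducible


def sample_mean(n: int) -> float:
    """Draw n individual times and return their average."""
    times = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.fmean(times)


means = [sample_mean(N) for _ in range(REPS)]

# The center of the sample means should sit near 10.5, and their spread
# should sit near 3 / sqrt(10), which is roughly 0.95.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std dev of sample means:", round(statistics.stdev(means), 3))
```

The spread of the simulated sample means is an estimate of the standard error, so it should land close to 0.95 without matching it exactly.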
Looking at the figure, the average times for samples of 10 clerical workers are closer to the mean (10.5) than the individual times are. That’s because average times don’t vary as much from sample to sample as individual times vary from person to person.
Now take all possible random samples of 50 clerical workers and find their means; the sampling distribution is shown in the tallest curve in the figure. The standard error of $\bar{X}$ is now $3/\sqrt{50} \approx 0.42$ minutes.
You can see the average times for 50 clerical workers are even closer to 10.5 than the ones for 10 clerical workers. By the Empirical Rule, almost all of the values fall between 10.5 – 3(.42) = 9.24 minutes and 10.5 + 3(.42) = 11.76 minutes. Larger samples tend to be more accurate reflections of the population, so their sample means are more likely to be close to the population mean, which means less variation.
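To compare the three curves side by side, the Empirical Rule ranges can be computed for individual times (n = 1) and for samples of 10 and 50. This is a sketch; the helper name three_sigma_range is a hypothetical choice. The code does not round the standard error to .42 before multiplying, so its endpoints for n = 50 can differ from the figures above in the second decimal place.

```python
import math


def three_sigma_range(mu: float, sigma: float, n: int = 1) -> tuple[float, float]:
    """Return (low, high): the mean plus or minus 3 standard errors."""
    se = sigma / math.sqrt(n)  # equals sigma itself when n = 1
    return mu - 3 * se, mu + 3 * se


# Population values from the example: mean 10.5 minutes, standard deviation 3.
for n in (1, 10, 50):
    low, high = three_sigma_range(10.5, 3.0, n)
    print(f"n = {n:>2}: {low:.2f} to {high:.2f} minutes")
```

The range narrows around 10.5 as n grows, which matches the curves getting taller and thinner in the figure.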
Why is having more precision around the mean important? Because sometimes you don’t know the population mean but want to determine what it is, or at least get as close to it as possible. How can you do that? By taking a large random sample from the population and finding its mean. Your sample mean is likely to be close to the actual population mean if your sample is large, as the above figure shows (assuming your data are collected correctly).