The Bootstrap Method for Standard Errors and Confidence Intervals
You can calculate the standard error (SE) and confidence interval (CI) of the more common sample statistics (means, proportions, event counts and rates, and regression coefficients) with standard formulas. But an SE and CI exist (theoretically, at least) for any number you could possibly wring from your data: medians, centiles, correlation coefficients, and other quantities that might involve complicated calculations, like the area under a concentration-versus-time curve (AUC) or the estimated five-year survival probability derived from a survival analysis. Formulas for the SE and CI around these numbers might not be available or might be hopelessly difficult to evaluate. Also, the formulas that do exist might apply only to normally distributed numbers, and you might not be sure what kind of distribution your data follows.
Consider a very simple problem. Suppose you’ve measured the IQ of 20 subjects and have gotten the following results: 61, 88, 89, 89, 90, 92, 93, 94, 98, 98, 101, 102, 105, 108, 109, 113, 114, 115, 120, and 138. These numbers have a mean of 100.85 and a median of 99.5. Because you’re a good scientist, you know that whenever you report some number you’ve calculated from your data (like a mean or median), you’ll also want to indicate the precision of that value in the form of an SE and CI.
For the mean, if you can assume that the IQ values are approximately normally distributed, things are pretty simple. You can calculate the SE of the mean as 3.54 (the sample SD of 15.83 divided by √20) and the 95% CI around the mean as 93.4 to 108.3 (using the t distribution with 19 degrees of freedom).
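As a minimal sketch, assuming Python 3.8+ and only the standard `statistics` and `math` modules, the formula-based numbers for the mean can be reproduced. The t critical value 2.093 for 19 degrees of freedom is hard-coded from a t table because the standard library has no t quantile function; the name `T_CRIT_19` is chosen purely for illustration.

```python
import math
import statistics

iq = [61, 88, 89, 89, 90, 92, 93, 94, 98, 98,
      101, 102, 105, 108, 109, 113, 114, 115, 120, 138]

n = len(iq)
mean = statistics.mean(iq)       # 100.85
median = statistics.median(iq)   # 99.5
sd = statistics.stdev(iq)        # sample SD (n - 1 denominator), about 15.83
sem = sd / math.sqrt(n)          # standard error of the mean, about 3.54

# Two-sided 95% critical value of Student's t with n - 1 = 19 degrees of freedom,
# copied from a t table (assumption: the standard library has no t quantile function).
T_CRIT_19 = 2.093
ci_low = mean - T_CRIT_19 * sem
ci_high = mean + T_CRIT_19 * sem

print(f"Mean = {mean:.2f}, SE = {sem:.2f}, 95% CI = {ci_low:.1f} to {ci_high:.1f}")
```

Nothing comparable exists for the median: `statistics.median` gives you 99.5, but no formula here tells you how precise that value is.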
But what about the SE and CI for the median, for which there are no simple formulas? And what if you can’t be sure those IQ values come from a normal distribution? Then the simple formulas might not be reliable.
Fortunately, there is a very general method for estimating SEs and CIs for anything you can calculate from your data, and it doesn't require you to assume any particular distribution for your numbers. It rests on two facts. The SE of any sample statistic is the standard deviation (SD) of the sampling distribution for that statistic. And the 95% confidence limits of a sample statistic are often well approximated by the 2.5th and 97.5th centiles of the sampling distribution of that statistic.
So if you could replicate your entire experiment many thousands of times (using a different sample of subjects each time), and each time calculate and save the value of the thing you're interested in (median, AUC, or whatever), this collection of thousands of values would be a very good approximation to the sampling distribution of the quantity of interest. Then you could estimate the SE simply as the SD of that distribution and the confidence limits from its centiles.
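This thought experiment can actually be run on a computer when the population is known. The sketch below assumes a hypothetical population in which IQ is normally distributed with mean 100 and SD 15 (an assumption made purely for illustration; in real research you never know the population), "reruns" a 20-subject study 10,000 times, and then takes the SD and the 2.5th and 97.5th centiles of the collected means and medians. The constant names are illustrative.

```python
import random
import statistics

# Hypothetical population (an assumption for illustration only):
# IQ ~ Normal(mean 100, SD 15). Real studies never know the population.
POP_MEAN, POP_SD = 100, 15
N_SUBJECTS = 20        # subjects per replicated study
N_STUDIES = 10_000     # number of times the whole study is "rerun"

rng = random.Random(42)  # fixed seed for a reproducible run

means, medians = [], []
for _ in range(N_STUDIES):
    # Each replication uses a brand-new sample of subjects from the population.
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N_SUBJECTS)]
    means.append(sum(sample) / N_SUBJECTS)
    medians.append(statistics.median(sample))

for name, values in (("mean", means), ("median", medians)):
    se = statistics.stdev(values)              # SE = SD of the sampling distribution
    cuts = statistics.quantiles(values, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    print(f"{name}: SE = {se:.2f}, 95% limits = {cuts[0]:.1f} to {cuts[-1]:.1f}")
```

For the mean, the simulated SE should land close to the theoretical value 15/√20 ≈ 3.35. The catch, of course, is that this only works because the sketch invents the population it samples from.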
But actually carrying out this scenario isn't feasible: you probably don't have the time, patience, or money to perform your entire study thousands of times. Fortunately, you don't have to repeat the study to get an estimate of the sampling distribution. You can do it by reusing the data from your one actual study, over and over again! This may sound too good to be true, and statisticians were very skeptical of the method when it was first proposed. Its name, bootstrapping, alludes to the impossible feat of pulling yourself up by your own bootstraps.
But it turns out that if you keep reusing the same data in a certain way, this method actually works. Over the years, the bootstrap procedure has become an accepted way to get reliable estimates of SEs and CIs for almost anything you can calculate from your data; in fact, it’s often considered to be the gold standard against which various approximation formulas for SEs and CIs are judged.
To see how the bootstrap method works, here’s how you would use it to estimate the SE and 95% CI of the mean and the median of the 20 IQ values shown earlier. You have to resample your 20 numbers, over and over again, in the following way:
1. Write each of your measurements on a separate slip of paper and put them all into a bag.
In this example, you write the 20 measured IQs on separate slips.
2. Reach in and draw out one slip, write that number down, and put the slip back into the bag.
(That last part is very important!)
3. Repeat Step 2 until you've drawn as many numbers as you have measurements, returning the slip to the bag each time.
This is called resampling with replacement, and it produces a resampled data set. In this example, you repeat Step 2 19 more times, for a total of 20 draws (the number of IQ measurements you have).
4. Calculate the desired sample statistic of the resampled numbers from Steps 2 and 3, and record that number.
In this example, you find the mean and the median of the 20 resampled numbers.
5. Repeat Steps 2 through 4 many thousands of times.
Each time, you generate a new resampled data set from which you calculate and record the desired sample statistics (in this case the mean and median of the resampled data set). You wind up with thousands of values for the mean and thousands of values for the median.
In each resampled data set, some of the original values may occur more than once, and some may not be present at all. Almost every resampled data set will be different from all the others. The bootstrap method relies on the fact that the means and medians from these thousands of resampled data sets form a good estimate of the sampling distributions of the mean and median. Collectively, they resemble the kind of results you would have gotten if you had repeated your actual study over and over again.
6. Calculate the standard deviation of your thousands of values of the sample statistic.
This gives you a bootstrapped estimate of the SE of the sample statistic. In this example, you calculate the SD of the thousands of means to get the SE of the mean, and you calculate the SD of the thousands of medians to get the SE of the median.
7. Obtain the 2.5th and 97.5th centiles of the thousands of values of the sample statistic.
You do this by sorting your thousands of values of the sample statistic into numerical order, and then chopping off the lowest 2.5 percent and the highest 2.5 percent of the sorted set of numbers. The smallest and largest values that remain are the bootstrapped estimates of the lower and upper 95% confidence limits for the sample statistic.
In this example, the 2.5th and 97.5th centiles of the means and medians of the thousands of resampled data sets are the 95% confidence limits for the mean and median, respectively.
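The procedure translates almost line for line into code. Here is a minimal Python sketch using only the standard library; the function name `bootstrap` and its parameters are illustrative and are not taken from Statistics101 or any other package. `random.Random.choices` plays the role of drawing slips with replacement, and the confidence limits are found by sorting and chopping 2.5 percent off each end, as described in the steps.

```python
import random
import statistics

# The 20 measured IQs (Step 1: the "bag" of slips is simply this list).
iq = [61, 88, 89, 89, 90, 92, 93, 94, 98, 98,
      101, 102, 105, 108, 109, 113, 114, 115, 120, 138]


def bootstrap(data, statistic, n_resamples=10_000, seed=None):
    """Percentile bootstrap: return (SE, lower 95% limit, upper 95% limit)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):                   # Step 5: repeat many times
        # Steps 2-3: draw len(data) slips, returning each slip to the bag
        resample = rng.choices(data, k=len(data))
        values.append(statistic(resample))         # Step 4: compute and record
    se = statistics.stdev(values)                  # Step 6: SD of the recorded values
    # Step 7: sort, chop 2.5% off each end, keep the smallest and largest remaining
    values.sort()
    cut = int(0.025 * n_resamples)
    return se, values[cut], values[-cut - 1]


def mean(xs):
    return sum(xs) / len(xs)


for name, stat in (("Mean", mean), ("Median", statistics.median)):
    se, lower, upper = bootstrap(iq, stat, n_resamples=10_000, seed=1)
    print(f"{name} = {stat(iq):.2f} ± {se:.2f} ({lower:.1f} to {upper:.1f})")
```

Because resampling is random, results differ slightly from run to run and from the Statistics101 figures in this section. Passing `n_resamples=100_000`, as in the text, reduces that run-to-run variation at the cost of a longer run.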
Obviously you’d never try to do this bootstrapping process by hand, but it’s quite easy to do with software like the free Statistics101 program. You can enter your observed results and tell it to generate, say, 100,000 resampled data sets, calculate and save the mean and the median from each one, and then calculate the SD and the 2.5th and 97.5th centiles of those 100,000 means and 100,000 medians. Here are a few results from a bootstrap analysis performed on this data:
Actual Data: 61, 88, 89, 89, 90, 92, 93, 94, 98, 98, 101, 102, 105, 108, 109, 113, 114, 115, 120, and 138. Mean = 100.85; Median = 99.5
Resampled Data Set #1: 61, 88, 88, 89, 89, 90, 92, 93, 98, 102, 105, 105, 105, 109, 109, 109, 109, 114, 114, and 120. Mean = 99.45; Median = 103.5
Resampled Data Set #2: 61, 88, 89, 89, 90, 92, 92, 98, 98, 98, 102, 105, 105, 108, 108, 113, 113, 113, 114, and 138. Mean = 100.7; Median = 100.0
(Between Set #2 and the following set, 99,996 more bootstrapped data sets were generated.)
Resampled Data Set #99,999: 61, 61, 88, 89, 92, 93, 93, 94, 98, 98, 98, 101, 102, 105, 109, 114, 115, 120, 120, and 138. Mean = 99.45; Median = 98.0
Resampled Data Set #100,000: 61, 61, 61, 88, 89, 89, 90, 93, 93, 94, 102, 105, 108, 109, 109, 114, 115, 115, 120, and 138. Mean = 97.7; Median = 98.0
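Step 4 is just an ordinary calculation on each resampled set. As a quick check, this short sketch recomputes the mean and median of the four resampled data sets listed above and confirms that each one contains 20 values, every one of them drawn from the original measurements.

```python
import statistics

iq = [61, 88, 89, 89, 90, 92, 93, 94, 98, 98,
      101, 102, 105, 108, 109, 113, 114, 115, 120, 138]

resampled_sets = {
    1: [61, 88, 88, 89, 89, 90, 92, 93, 98, 102,
        105, 105, 105, 109, 109, 109, 109, 114, 114, 120],
    2: [61, 88, 89, 89, 90, 92, 92, 98, 98, 98,
        102, 105, 105, 108, 108, 113, 113, 113, 114, 138],
    99_999: [61, 61, 88, 89, 92, 93, 93, 94, 98, 98,
             98, 101, 102, 105, 109, 114, 115, 120, 120, 138],
    100_000: [61, 61, 61, 88, 89, 89, 90, 93, 93, 94,
              102, 105, 108, 109, 109, 114, 115, 115, 120, 138],
}

for label, values in resampled_sets.items():
    # A resample has as many values as the original data, all taken from it.
    assert len(values) == len(iq) and set(values) <= set(iq)
    print(f"Set #{label:,}: mean = {statistics.mean(values):.2f}, "
          f"median = {statistics.median(values):.2f}")
```

It prints means of 99.45, 100.70, 99.45, and 97.70 and medians of 103.50, 100.00, 98.00, and 98.00, matching the listed values. Note how 61 appears three times in the last set even though it was measured only once.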
Here’s a summary of the 100,000 resamples:
The SD of the 100,000 means = 3.46; this is the bootstrapped SE of the mean (SEM).
The SD of the 100,000 medians = 4.24; this is the bootstrapped SE of the median.
The 2.5th and 97.5th centiles of the 100,000 means = 94.0 and 107.6; these are the bootstrapped 95% confidence limits for the mean.
The 2.5th and 97.5th centiles of the 100,000 medians = 92.5 and 108.5; these are the bootstrapped 95% confidence limits for the median.
So you would report your mean and median, along with their bootstrapped standard errors and 95% confidence intervals, this way:
Mean = 100.85 ± 3.46 (94.0–107.6); Median = 99.5 ± 4.24 (92.5–108.5).
You'll notice that the SE is larger (and the CI is wider) for the median than for the mean. This is generally true for normally distributed data: in large samples, the SE of the median is about 25% larger than the SE of the mean (the ratio approaches √(π/2) ≈ 1.25). But for data with heavy tails or outliers, the median is often more precise than the mean.
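The claim about the median's relative precision can be explored by simulation. This sketch assumes two hypothetical populations: a standard normal, and a "contaminated" normal in which 10% of values come from a much wider normal distribution (SD 10) to mimic outliers. For each, it compares the spread of sample medians with the spread of sample means for samples of 101 values; the population choices and function names are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(7)          # fixed seed for a reproducible run
N, REPS = 101, 2_000            # odd sample size, so the median is one middle value


def normal_draw():
    return rng.gauss(0, 1)


def contaminated_draw():
    # Hypothetical heavy-tailed data: 90% from N(0, 1), 10% "outliers" from N(0, 10)
    return rng.gauss(0, 10) if rng.random() < 0.10 else rng.gauss(0, 1)


def se_ratio(draw):
    """Empirical SE(median) / SE(mean) over REPS samples of size N."""
    means, medians = [], []
    for _ in range(REPS):
        sample = [draw() for _ in range(N)]
        means.append(sum(sample) / N)
        medians.append(statistics.median(sample))
    return statistics.stdev(medians) / statistics.stdev(means)


r_normal = se_ratio(normal_draw)
r_contaminated = se_ratio(contaminated_draw)
print(f"normal:       SE(median)/SE(mean) = {r_normal:.2f}")
print(f"contaminated: SE(median)/SE(mean) = {r_contaminated:.2f}")
```

For the normal population the ratio should come out a little under √(π/2) ≈ 1.25; for the contaminated population it should fall well below 1, because a few wild values pull the mean around far more than they move the median.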
You don’t need to use bootstrapping for something as simple as the SE or CI of a mean because there are simple formulas for that. But the bootstrap method can just as easily calculate the SE or CI for a median, a correlation coefficient, or a pharmacokinetic parameter like the AUC or elimination half-life of a drug, for which there are no simple SE or CI formulas and for which the normality assumptions might not apply.
Bootstrapping is conceptually simple, but it's not foolproof. The method involves certain assumptions and has certain limitations. For example, it assumes your observations are independent and that your sample reasonably represents the population it came from, and it's probably not going to be very useful if you have only a few observed values. Check out Statistics101 for more information on using the bootstrap method (and for the free software that does the bootstrap calculations very easily).
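One way to see why a handful of observations isn't enough: the bootstrap can only recombine the values you actually observed. With n values, the number of distinct resampled data sets (ignoring order) is the binomial coefficient C(2n − 1, n). The sketch below computes that count and shows that, with three hypothetical observations, every possible resampled median is one of the three original values. The function name and the three values are illustrative only.

```python
import itertools
import math
import statistics


def n_distinct_resamples(n):
    """Distinct bootstrap resamples of n values, ignoring order: C(2n - 1, n)."""
    return math.comb(2 * n - 1, n)


print(n_distinct_resamples(3))   # 10
print(n_distinct_resamples(20))  # 68923264410

# With only three (hypothetical) observations, every resampled median
# must equal one of the three original values.
tiny = [90, 100, 130]
medians = {statistics.median(r)
           for r in itertools.combinations_with_replacement(tiny, 3)}
print(sorted(medians))  # [90, 100, 130]
```

These distinct resamples are not equally likely, but the point stands: with 3 observations the bootstrapped sampling distribution of the median has only three possible values, while with 20 observations there are tens of billions of distinct resamples to draw from.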