The Confidence Interval around a Regression Coefficient

By John Pezzullo

This is one time you don’t need any formulas, because you shouldn’t attempt to calculate standard errors (SEs) or confidence intervals (CIs) for regression coefficients yourself. Any good regression program can provide the SE for every parameter (coefficient) it fits to your data. The program may also provide the confidence limits for any confidence level you specify. If it doesn’t, you can easily calculate them from the SE using the simple large-sample formula.

Suppose you’re interested in whether or not blood urea nitrogen (BUN), a measure of kidney performance, tends to increase after age 60 in healthy adults. You can enroll a bunch of generally healthy adults age 60 and above, record their ages, and measure their BUN. Then you can create a scatter plot of BUN versus age and fit a straight line to the data points.

The slope of this line would have units of (mg/dL)/year and would tell you how much, on average, a healthy person’s BUN goes up with every additional year of age after age 60. Suppose the answer you get is that BUN increases 1.4 mg/dL per year. What is the 95 percent CI around that estimate of yearly increase?

The answer depends, in a complicated way, on the number of subjects in the analysis (say, 60 subjects in this example), on how widely their ages are spread out, and on the amount of correlation in the data (how close the points come to the fitted straight line). The actual formulas, especially for models with more than one predictor, are far too complicated for you to evaluate by hand (or with a calculator).
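For the one-predictor case in this example, the standard textbook formula is still manageable, and it shows where those ingredients enter. Below is a minimal Python sketch, assuming ordinary least-squares regression with a single predictor (age). The function name slope_and_se is hypothetical; this only illustrates the arithmetic and is not how any particular regression program is implemented. The SE gets smaller as the points hug the line more closely, as the number of subjects grows, and as the ages become more spread out (a larger sxx).

```python
import math
from statistics import fmean


def slope_and_se(x, y):
    """Least-squares slope of y on x and its standard error (one predictor only)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x and y values with at least 3 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Spread of the predictor values around their mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares: how far the points fall from the fitted line
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual standard deviation, using n - 2 degrees of freedom
    s = math.sqrt(rss / (n - 2))
    se = s / math.sqrt(sxx)
    return slope, se
```

With more than one predictor, the same calculation requires matrix algebra, which is one more reason to let the software do it.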

Fortunately, all but the simplest regression programs display the SE of each regression parameter. Some of them also display the 95 percent confidence limits for each parameter. If they don’t, you can easily calculate the limits as 1.96 SEs below and above the parameter’s value.
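As a minimal sketch of that calculation, the Python function below takes a coefficient and its SE and returns the large-sample limits for any confidence level, using the standard normal critical value (about 1.96 for 95 percent). The function name large_sample_ci is hypothetical, and the SE of 0.5 (mg/dL)/year in the usage line is an assumed value for illustration only; the example gives only the 1.4 (mg/dL)/year slope.

```python
from statistics import NormalDist


def large_sample_ci(estimate, se, level=0.95):
    """Confidence limits estimate -/+ z * SE, using the normal (large-sample) approximation."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    if se < 0:
        raise ValueError("standard error cannot be negative")
    # Two-sided critical value: about 1.96 for 95 percent confidence
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


# Assumed (hypothetical) SE of 0.5 (mg/dL)/year for the 1.4 (mg/dL)/year slope
low, high = large_sample_ci(1.4, 0.5)
print(f"95% CI: {low:.2f} to {high:.2f} (mg/dL)/year")  # prints "95% CI: 0.42 to 2.38 (mg/dL)/year"
```

For smaller samples, many programs use a t critical value with n − 2 degrees of freedom instead of 1.96; with 60 subjects that multiplier is about 2.00, so the interval comes out slightly wider.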