
One of the most important decisions you make when specifying your econometric model is which variables to include as independent variables. Here, you find out what problems can occur if you include too few or too many independent variables in your model, and you see how this misspecification affects your results.

Omitting relevant variables

If a variable that belongs in the model is excluded from the estimated regression function, the model is misspecified, and the omission can bias the estimated coefficients.

You have an omitted variable bias if an excluded variable has some effect (positive or negative) on your dependent variable and it’s correlated with at least one of your independent variables.

The mathematical nature of specification bias can be expressed using a simple model. Suppose the true population model is given by

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon

where X1 and X2 are the two variables that affect Y. But due to ignorance or lack of data, instead you estimate this regression:

\tilde{Y} = \tilde{\beta}_0 + \tilde{\beta}_1 X_1

which omits X2 from the independent variables. The expected value of the estimated coefficient \tilde{\beta}_1 in this situation is

E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1

But this result violates the Gauss-Markov theorem because, whenever \beta_2 \neq 0 and the included and omitted variables are correlated, the estimator is no longer unbiased:

E(\tilde{\beta}_1) \neq \beta_1

The magnitude of the bias can be expressed as

\text{Bias}(\tilde{\beta}_1) = E(\tilde{\beta}_1) - \beta_1 = \beta_2 \tilde{\delta}_1

where \beta_2 is the effect of X2 on Y and \tilde{\delta}_1 is the slope from this auxiliary regression of the omitted variable on the included one:

\tilde{X}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 X_1

which captures the correlation (positive or negative) between the included and excluded variable(s).
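
You can check the bias formula with a short simulation. The sketch below is illustrative only: the use of Python with the statsmodels package, the parameter values, and the sample size are assumptions, not anything from the book.

```python
# Illustrative sketch (assumed setup, not from the book): simulate data where
# X2 affects Y and is correlated with X1, then compare the "long" and "short"
# regressions with the bias formula E(beta1_tilde) = beta1 + beta2 * delta1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)        # omitted variable, correlated with x1
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Correctly specified ("long") regression: Y on X1 and X2.
long_fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Misspecified ("short") regression: Y on X1 only.
short_fit = sm.OLS(y, sm.add_constant(x1)).fit()

# Auxiliary regression of X2 on X1 gives the slope delta1 in the bias formula.
delta1 = sm.OLS(x2, sm.add_constant(x1)).fit().params[1]

print("long-regression beta1:  ", long_fit.params[1])       # close to 2.0
print("short-regression beta1: ", short_fit.params[1])      # close to 2.0 + 3.0 * 0.5 = 3.5
print("beta1 + beta2 * delta1: ", beta1 + beta2 * delta1)   # bias formula prediction
```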

Summary of Omitted Variable Bias
(Rows: impact of the omitted variable on the dependent variable. Columns: correlation between the included and omitted variable.)

                     Positive correlation    Negative correlation
Positive impact      Positive bias           Negative bias
Negative impact      Negative bias           Positive bias
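
The table boils down to a sign rule: the bias takes the sign of the product of the omitted variable's effect on Y and its correlation with the included variable. The helper below is a hypothetical illustration of that rule (the function name and inputs are made up for this sketch).

```python
def omitted_variable_bias_sign(effect_on_y: float, corr_with_included: float) -> str:
    """Direction of omitted variable bias: sign(beta2) * sign(corr(X1, X2))."""
    product = effect_on_y * corr_with_included
    if product > 0:
        return "positive bias"
    if product < 0:
        return "negative bias"
    return "no bias (the excluded variable is irrelevant or uncorrelated)"

print(omitted_variable_bias_sign(3.0, 0.5))    # positive bias
print(omitted_variable_bias_sign(3.0, -0.5))   # negative bias
```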

In practice, you’re likely to have some omitted variable bias because it’s impossible to control for everything that affects your dependent variable. However, you can increase your chances of minimizing omitted variable bias by avoiding simple regression models (with one independent variable) and including the variables that are likely to be the most important theoretically (and possibly, but not necessarily, statistically) in explaining the dependent variable.

Including irrelevant variables

If a variable that doesn’t belong in the model is included in the estimated regression function, the model is overspecified. If you overspecify the regression model by including an irrelevant variable, the estimated coefficients remain unbiased. However, doing so has the undesirable effect of increasing the standard errors of your coefficients.

In a simple regression model (with one independent variable), the estimated standard error of the regression coefficient for X is

se(\hat{\beta}_1) = \sqrt{\dfrac{\hat{\sigma}^2}{\sum_i (X_i - \bar{X})^2}}

where \hat{\sigma}^2 is the estimated variance of the error and \sum_i (X_i - \bar{X})^2 is the total variation in X.
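
As a check, you can compute this formula by hand and compare it with the standard error a regression routine reports. The sketch below assumes Python with statsmodels and simulated data; the variable names and values are illustrative, not from the book.

```python
# Illustrative sketch (assumed setup): the hand-computed standard error
# sqrt(sigma2_hat / SST_x) should match the one statsmodels reports.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit()

sigma2_hat = fit.ssr / (n - 2)               # estimated error variance
sst_x = np.sum((x - x.mean()) ** 2)          # total variation in X
se_by_hand = np.sqrt(sigma2_hat / sst_x)

print(se_by_hand)       # hand computation
print(fit.bse[1])       # standard error statsmodels reports for the slope
```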

If you include additional independent variables in the model, the estimated standard error for any given regression coefficient is given by

se(\hat{\beta}_k) = \sqrt{\dfrac{\hat{\sigma}^2}{\left(1 - R_k^2\right)\sum_i (X_{ik} - \bar{X}_k)^2}}

where R_k^2 is the R-squared from the regression of X_k on the other independent variables (the other Xs). Because 0 \le R_k^2 \le 1, the denominator shrinks whenever X_k is correlated with the other independent variables. An irrelevant variable doesn’t help explain any of the variation in Y, so without an offsetting decrease in \hat{\sigma}^2, the standard error increases.
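
You can see this mechanism directly by regressing Xk on the other regressors to get R_k^2 and watching the standard error on X rise when a correlated but irrelevant variable is added. Again, this is an illustrative sketch assuming Python with statsmodels; the variable names and values are made up.

```python
# Illustrative sketch (assumed setup): z has no effect on y but is correlated
# with x, so including it inflates the standard error on x via 1 / (1 - Rk^2).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)             # irrelevant but correlated with x
y = 1.0 + 2.0 * x + rng.normal(size=n)       # true model: z plays no role

fit_x = sm.OLS(y, sm.add_constant(x)).fit()
fit_xz = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

# R_k^2 from regressing x on the other regressor(s), here just z.
r2_k = sm.OLS(x, sm.add_constant(z)).fit().rsquared

print("se on x, correct model:        ", fit_x.bse[1])
print("se on x, with irrelevant z:    ", fit_xz.bse[1])    # larger
print("variance inflation 1/(1-Rk^2): ", 1 / (1 - r2_k))
```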

Just because an estimated coefficient isn’t statistically significant doesn’t make the variable irrelevant. A well-specified model usually includes some variables that are statistically significant and some that aren’t. Additionally, variables that aren’t statistically significant can contribute enough explained variation, and therefore enough of a reduction in \hat{\sigma}^2, to have no detrimental impact on the standard errors.


About the book author:

Roberto Pedace, PhD, is an associate professor in the Department of Economics at Scripps College. His published work has appeared in Economic Inquiry, Industrial Relations, the Southern Economic Journal, Contemporary Economic Policy, the Journal of Sports Economics, and other outlets.
