Perfect Multicollinearity and Your Econometric Model

By Roberto Pedace

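Perfect multicollinearity is uncommon, and it is easier to grasp if you picture an econometric model that uses two independent variables, such as the following:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$

Suppose that, in this model,

$$X_{i2} = \alpha_0 + \alpha_1 X_{i1}$$

where the alphas are constants. By substitution, you obtain

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2(\alpha_0 + \alpha_1 X_{i1}) + \varepsilon_i = (\beta_0 + \beta_2\alpha_0) + (\beta_1 + \beta_2\alpha_1) X_{i1} + \varepsilon_i$$

which indicates that the model collapses to one with a single independent variable and can’t be estimated as originally specified.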

Perfect multicollinearity occurs when two or more independent variables in a regression model exhibit a deterministic (perfectly predictable or containing no randomness) linear relationship.

The result of perfect multicollinearity is that you can’t obtain any structural inferences about the original model using sample data for estimation. In a model with perfect multicollinearity, your regression coefficients are indeterminate and their standard errors are infinite.
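To see what "indeterminate" means in practice, here is a minimal Python sketch using NumPy. It assumes a second regressor that is an exact linear function of the first, with made-up constants `a0 = 3` and `a1 = 2`. The data and the helper names `fitted` and `shifted` are hypothetical and exist only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)

# Hypothetical alphas: x2 is an exact linear function of x1
a0, a1 = 3.0, 2.0
x2 = a0 + a1 * x1

def fitted(b0, b1, b2):
    """Fitted values from the two-variable model for given coefficients."""
    return b0 + b1 * x1 + b2 * x2

def shifted(b0, b1, b2, c):
    """Move c units of weight onto x2 and offset it in b0 and b1."""
    return b0 - c * a0, b1 - c * a1, b2 + c

coefs_a = (1.0, 5.0, 0.0)
coefs_b = shifted(*coefs_a, c=1.0)    # (-2.0, 3.0, 1.0)
coefs_c = shifted(*coefs_a, c=-4.0)   # (13.0, 13.0, -4.0)

# All three coefficient sets produce the same fitted values
same_fit = (np.allclose(fitted(*coefs_a), fitted(*coefs_b))
            and np.allclose(fitted(*coefs_a), fitted(*coefs_c)))
print(coefs_b, coefs_c, same_fit)
```

Because every choice of `c` gives identical fitted values, the sample data cannot favor one set of coefficients over another, and no unique estimate exists.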

Perfect multicollinearity usually occurs when data has been constructed or manipulated by the researcher. For example, in a model with an intercept, you have perfect multicollinearity if you include a dummy variable for every possible group or category of a qualitative characteristic instead of including a variable for all but one of the groups. The dummies for all groups always sum to one, which exactly duplicates the intercept.
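The dummy variable case can be checked numerically. The following Python sketch uses a made-up three-category variable called `region` (a hypothetical name and data set) and compares the rank of the regressor matrix with every dummy included against the rank with one category left out as the baseline.

```python
import numpy as np

# Hypothetical qualitative characteristic with three categories
region = np.array(["north", "south", "west", "south", "north", "west", "north"])
categories = np.unique(region)  # sorted: north, south, west

# One 0/1 dummy column per category
dummies = (region[:, None] == categories[None, :]).astype(float)
intercept = np.ones((len(region), 1))

X_all = np.hstack([intercept, dummies])          # intercept + every dummy
X_drop = np.hstack([intercept, dummies[:, 1:]])  # first category is the baseline

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)

print(X_all.shape[1], rank_all)    # 4 columns, but rank 3
print(X_drop.shape[1], rank_drop)  # 3 columns, rank 3
```

With all three dummies, the matrix has four columns but only three linearly independent ones, so separate coefficients can't be estimated. Leaving out one category restores full column rank.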

In the following figure, Stata is used to create a variable that is an exact linear function of another variable. The two variables are then plotted against each other, and both are included as independent variables in a regression model. Notice, however, that the results do not contain parameter estimates for both variables. Obtaining individual regression coefficients for every variable is impossible if you have perfect multicollinearity.


Most econometric software programs identify perfect multicollinearity and drop one (or more) variables prior to providing the estimation results, taking care of the problem for you. The good news is that you can avoid perfect multicollinearity by taking care when you create variables and by choosing carefully which ones to include as independent variables.
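The detect-and-drop behavior can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the helper `drop_collinear_columns` is hypothetical, the data are simulated, and the rank-based rule shown here is one simple way to find a redundant column, not necessarily the rule any particular software package uses.

```python
import numpy as np

def drop_collinear_columns(X, names):
    """Keep a column only if it raises the rank of the columns kept so far."""
    kept = []
    for j in range(X.shape[1]):
        trial = kept + [j]
        if np.linalg.matrix_rank(X[:, trial]) == len(trial):
            kept.append(j)
    kept_names = [names[j] for j in kept]
    dropped_names = [names[j] for j in range(X.shape[1]) if j not in kept]
    return X[:, kept], kept_names, dropped_names

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = 3.0 + 2.0 * x1                      # exact linear function of x1
y = 1.0 + 5.0 * x1 + rng.normal(size=n)  # simulated dependent variable

X = np.column_stack([np.ones(n), x1, x2])
X_kept, kept_names, dropped_names = drop_collinear_columns(X, ["const", "x1", "x2"])

# Ordinary least squares on the columns that survive
coef, *_ = np.linalg.lstsq(X_kept, y, rcond=None)
print("dropped:", dropped_names)
print(dict(zip(kept_names, coef)))
```

Which variable gets dropped depends on the column order, so in this sketch `x2` is removed only because it comes after `x1`. The reported coefficients then describe the collapsed model, not the one you originally specified.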