How to Find Average Differences by Using a Dummy Variable - dummies

By Roberto Pedace

You may recall from your statistics course how to conduct a t-test to examine the difference in means between two groups. What you may not know is that you can use dummy variables and regression analysis to obtain the same results as the t-test (specifically, the equal-variance version of the two-sample test, when the regression uses conventional standard errors).


Even though your econometric model is likely to include both quantitative and qualitative characteristics, you can begin with a model that uses only a dummy variable to capture a qualitative characteristic and ignores other potential independent variables. This process amounts to identifying differences in means for the groups identified by the dummy variable(s), and it’s a useful building block for understanding more realistic models that combine qualitative characteristics with quantitative variables.

If the qualitative characteristic that you’d like to use as an independent variable contains only two groups, then an econometric model with a single dummy variable as the only explanatory variable can be expressed as

$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i$$

where Y is the dependent variable, $\beta_0$ is the intercept (or constant) term, $\beta_1$ is the impact of the characteristic represented by the dummy variable (D), and $\varepsilon_i$ is the error term. $D_i = 1$ if the specific qualitative characteristic is present and $D_i = 0$ if not.
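
As a minimal sketch of how such a dummy variable is built in practice, the snippet below codes a two-group characteristic as 0 or 1. The wage figures, the union-membership labels, and the helper name `systematic_part` are all made up for illustration. The helper simply evaluates the intercept plus the dummy's impact times D, which shows that the non-random part of Y can take only two values.

```python
# Hypothetical data: hourly wage (Y) and a two-group qualitative trait.
wages = [12.0, 15.5, 11.0, 18.0, 14.5, 16.0]
is_union = ["no", "yes", "no", "yes", "no", "yes"]

# D = 1 if the characteristic is present, 0 if not.
D = [1 if label == "yes" else 0 for label in is_union]


def systematic_part(intercept, dummy_effect, d):
    """Intercept plus the dummy's impact times D (the error term is left out)."""
    return intercept + dummy_effect * d


print(D)
# With an assumed intercept of 10.0 and an assumed impact of 2.5,
# the model implies one value per group.
print(systematic_part(10.0, 2.5, 0), systematic_part(10.0, 2.5, 1))
```

The numbers 10.0 and 2.5 are placeholders, not estimates; estimating them from data is what the regression does.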

If the qualitative characteristic you’d like to use as an independent variable has J groups, with J greater than two, then the econometric model must include J – 1 dummy variables to fully capture the possibilities. Suppose you’d like to use a variable with a qualitative characteristic containing four possible outcomes {A, B, C, D}. The basic econometric model to capture this qualitative characteristic is expressed as

$$Y_i = \beta_0 + \beta_1 D_{iB} + \beta_2 D_{iC} + \beta_3 D_{iD} + \varepsilon_i$$

where $D_{iB} = 1$ if the observation belongs to group B, $D_{iC} = 1$ if the observation belongs to group C, $D_{iD} = 1$ if the observation belongs to group D, and $D_{iB} = D_{iC} = D_{iD} = 0$ if the observation is in group A. By using this equation, you implicitly assign group A as the reference or base group against which each of the other groups is compared.
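
The coding scheme just described can be sketched as follows. The group labels are hypothetical, and the variable names (`groups`, `base`, `dummies`) are chosen only for this illustration. The code builds one 0/1 column for each group other than the base group A, so four outcomes yield three dummy columns.

```python
# Hypothetical group labels for eight observations.
groups = ["A", "B", "C", "D", "B", "A", "D", "C"]

categories = ["A", "B", "C", "D"]  # J = 4 possible outcomes
base = "A"                         # reference group: no dummy of its own

# One dummy column per non-base group, so J - 1 = 3 columns.
dummy_names = [c for c in categories if c != base]
dummies = {
    name: [1 if g == name else 0 for g in groups]
    for name in dummy_names
}

# Show each observation's group next to its (B, C, D) dummy values.
for i, g in enumerate(groups):
    row = [dummies[name][i] for name in dummy_names]
    print(g, row)
```

Observations in group A show zeros in all three columns, which is what makes A the base group. A fourth dummy for A would be redundant: the four columns would sum to 1 for every observation, duplicating the intercept.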


One useful way of seeing the role of a dummy variable in an econometric model is to interpret the results of a regression using a dummy variable as the only independent variable.

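An estimated regression with a dummy variable is generally written as

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 D_i$$

where the $\hat{\beta}$ terms represent the estimated parameters. Because D can only be 0 or 1 for any given observation,

$$\hat{Y}_i = \hat{\beta}_0 \text{ if } D_i = 0 \quad \text{and} \quad \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{ if } D_i = 1$$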


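The predicted Y value from a regression is an estimate of the conditional mean, E(Y | Di). A dummy variable has only two values, so you get only two predicted Y values, and they equal the sample means of the two groups. The estimated intercept is the sample mean of Y for the group with Di = 0, and the estimated coefficient on the dummy is the difference between the two group means.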
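
To check this numerically, here is a small sketch that uses only the Python standard library and made-up data. It fits the one-regressor least-squares line by the textbook formulas, compares the two predicted values with the group sample means, and compares the regression t-statistic on the dummy with the pooled (equal-variance) two-sample t-statistic. The conventional standard error formula is assumed; robust standard errors would not reproduce the pooled t-test exactly.

```python
import math
from statistics import mean

# Hypothetical sample: outcome y and a 0/1 dummy d (made-up numbers).
y = [12.0, 15.5, 11.0, 18.0, 14.5, 16.0, 13.0, 17.5]
d = [0, 1, 0, 1, 0, 1, 0, 1]
n = len(y)

# OLS with a single regressor:
#   slope = sum((d - d_bar)(y - y_bar)) / sum((d - d_bar)^2)
#   intercept = y_bar - slope * d_bar
y_bar, d_bar = mean(y), mean(d)
sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
sxx = sum((di - d_bar) ** 2 for di in d)
slope = sxy / sxx
intercept = y_bar - slope * d_bar

# Sample means of each group, computed directly.
y0 = [yi for yi, di in zip(y, d) if di == 0]
y1 = [yi for yi, di in zip(y, d) if di == 1]
mean_0, mean_1 = mean(y0), mean(y1)

# Regression t-statistic on the dummy, using the conventional
# (homoskedastic) standard error with n - 2 degrees of freedom.
residuals = [yi - (intercept + slope * di) for di, yi in zip(d, y)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)
t_regression = slope / math.sqrt(s2 / sxx)

# Pooled (equal-variance) two-sample t-statistic.
ss0 = sum((yi - mean_0) ** 2 for yi in y0)
ss1 = sum((yi - mean_1) ** 2 for yi in y1)
pooled_var = (ss0 + ss1) / (n - 2)
t_pooled = (mean_1 - mean_0) / math.sqrt(pooled_var * (1 / len(y0) + 1 / len(y1)))

print("intercept vs. mean of D=0 group:", intercept, mean_0)
print("intercept + slope vs. mean of D=1 group:", intercept + slope, mean_1)
print("t-statistics (regression, pooled t-test):", t_regression, t_pooled)
```

Each printed pair should agree up to floating-point rounding. The agreement of the two t-statistics follows because the regression residuals are just deviations from the group means, so the residual variance equals the pooled variance.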