# How to Find Average Differences by Using a Dummy Variable

You should recall from your statistics course how to conduct the *t*-test to examine the differences in means between two groups. But what you may not know is that you can use dummy variables and regression analysis to obtain the same results as the *t*-test.
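To make that claim concrete, here is a minimal sketch in plain Python. It assumes the equal-variance (pooled) version of the two-sample *t*-test, which is the version a simple regression on a 0/1 dummy reproduces. The data and the function names `pooled_t_stat` and `dummy_slope_t_stat` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def pooled_t_stat(g1, g0):
    """Two-sample t statistic for mean(g1) - mean(g0), assuming equal variances."""
    n1, n0 = len(g1), len(g0)
    ss1 = sum((v - mean(g1)) ** 2 for v in g1)
    ss0 = sum((v - mean(g0)) ** 2 for v in g0)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    return (mean(g1) - mean(g0)) / sqrt(pooled_var * (1 / n1 + 1 / n0))


def dummy_slope_t_stat(y, d):
    """t statistic on the slope from an OLS regression of y on a 0/1 dummy d."""
    n = len(y)
    y_bar, d_bar = mean(y), mean(d)
    s_dd = sum((di - d_bar) ** 2 for di in d)
    b1 = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / s_dd
    b0 = y_bar - b1 * d_bar
    # Residual sum of squares and the usual OLS standard error of the slope
    rss = sum((yi - b0 - b1 * di) ** 2 for yi, di in zip(y, d))
    se_b1 = sqrt(rss / (n - 2) / s_dd)
    return b1 / se_b1


# Hypothetical outcomes for two groups
group0 = [10.0, 12.0, 11.0, 9.0]
group1 = [15.0, 17.0, 16.0, 18.0, 14.0]

# Stack the groups and mark group membership with a dummy variable
y = group0 + group1
d = [0] * len(group0) + [1] * len(group1)

t_from_test = pooled_t_stat(group1, group0)
t_from_regression = dummy_slope_t_stat(y, d)
```

Under these assumptions the two statistics agree up to floating-point rounding. A *t*-test that does not assume equal variances would generally give a different value.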

## Specification

Even though your econometric model is likely to include both quantitative and qualitative characteristics, you can begin with a model that uses only a dummy variable to capture qualitative characteristics and ignores other potential independent variables. This approach amounts only to identifying differences in means between the groups defined by the dummy variable(s), but it’s a useful building block for understanding more realistic models that combine qualitative characteristics with quantitative variables.

If the qualitative characteristic that you’d like to use as an independent variable contains only two groups, then an econometric model with a single dummy variable as the only explanatory variable can be expressed as

$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i$$

where *Y* is the dependent variable, $\beta_0$ is the intercept (or constant) term, $\beta_1$ is the impact of the characteristic represented by the dummy variable (*D*), and $\varepsilon_i$ is the error term. $D_i = 1$ if the specific qualitative characteristic is present and $D_i = 0$ if not.

If the qualitative characteristic you’d like to use as an independent variable has more than two groups, then the econometric model must include *J* – 1 dummy variables, where *J* is the number of groups, to fully capture the possibilities. Suppose you’d like to use a variable with a qualitative characteristic containing four possible outcomes {A, B, C, and D}. The basic econometric model to capture this qualitative characteristic is expressed as

$$Y_i = \beta_0 + \beta_1 D_{iB} + \beta_2 D_{iC} + \beta_3 D_{iD} + \varepsilon_i$$

where $D_{iB} = 1$ if the observation belongs to group B, $D_{iC} = 1$ if the observation belongs to group C, $D_{iD} = 1$ if the observation belongs to group D, and $D_{iB} = D_{iC} = D_{iD} = 0$ if the observation is in group A. By using this equation, you implicitly assign group A as the reference or base group in any two-group comparison.
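The coding scheme for a four-group characteristic can be sketched as follows. This is an illustrative example only: the helper `make_dummies` and the sample observations are hypothetical, and group A is taken as the reference group.

```python
def make_dummies(groups, reference):
    """Build one 0/1 dummy per group, leaving out the reference group.

    Returns a dict mapping each non-reference group label to a list of
    0/1 values, one per observation.
    """
    levels = sorted(set(groups) - {reference})
    return {
        level: [1 if g == level else 0 for g in groups]
        for level in levels
    }


# Hypothetical group membership for six observations
groups = ["A", "C", "B", "D", "A", "B"]

# J = 4 groups, so J - 1 = 3 dummies are created (for B, C, and D)
dummies = make_dummies(groups, reference="A")

# Observations in the reference group A have every dummy equal to 0
rows_for_a = [
    [dummies[level][i] for level in sorted(dummies)]
    for i, g in enumerate(groups)
    if g == "A"
]
```

Leaving out one group is a deliberate design choice: including a dummy for every group together with the intercept would make the regressors perfectly collinear.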

## Interpretation

One useful way of seeing the role of a dummy variable in an econometric model is to interpret the results of a regression using a dummy variable as the only independent variable.

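An estimated regression with a dummy variable is generally written as

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 D_i$$

where the $\hat{\beta}$ terms represent the estimated parameters. Because *D* can only be 0 or 1 for any given observation,

$$\hat{Y}_i = \begin{cases} \hat{\beta}_0 & \text{if } D_i = 0 \\ \hat{\beta}_0 + \hat{\beta}_1 & \text{if } D_i = 1 \end{cases}$$

The predicted *Y* value from a regression represents the estimate of the conditional mean, $E(Y \mid D_i)$. A dummy variable has only two values, so you get two predicted *Y* values. Therefore, the predicted *Y* values are equal to the sample means for each group: $\hat{\beta}_0$ is the sample mean of the group with $D_i = 0$, and $\hat{\beta}_1$ is the difference between the two group means.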
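The relationship between the estimated parameters and the group means can be checked with a short sketch. It uses the textbook closed-form least-squares formulas for a single regressor. The data and the function name `ols_single_dummy` are hypothetical.

```python
from statistics import mean


def ols_single_dummy(y, d):
    """Fit Y = b0 + b1 * D by ordinary least squares and return (b0, b1)."""
    y_bar, d_bar = mean(y), mean(d)
    s_dy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    s_dd = sum((di - d_bar) ** 2 for di in d)
    b1 = s_dy / s_dd
    b0 = y_bar - b1 * d_bar
    return b0, b1


# Hypothetical data: d = 1 marks observations with the characteristic
y = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0, 18.0]
d = [0, 0, 0, 1, 1, 1, 1]

b0, b1 = ols_single_dummy(y, d)

# Sample means computed directly for each group
mean_d0 = mean(yi for yi, di in zip(y, d) if di == 0)
mean_d1 = mean(yi for yi, di in zip(y, d) if di == 1)

# Predicted values: b0 when D = 0 and b0 + b1 when D = 1
predicted_d0 = b0
predicted_d1 = b0 + b1
```

With these numbers, `predicted_d0` matches `mean_d0` and `predicted_d1` matches `mean_d1` up to floating-point rounding, so `b1` is the difference between the two group means.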