How to Find Average Differences by Using a Dummy Variable

You should recall from your statistics course how to conduct the t-test to examine the differences in means between two groups. But what you may not know is that you can use dummy variables and regression analysis to obtain the same results as the t-test.

Specification

Even though your econometric model is likely to include both quantitative and qualitative characteristics, you can begin with a model that uses only a dummy variable to capture qualitative characteristics and ignores other potential independent variables. This process amounts to identifying differences in means for the groups identified by the dummy variable(s), but it’s a useful building block for understanding more realistic models that combine qualitative characteristics with quantitative variables.

If the qualitative characteristic that you’d like to use as an independent variable contains only two groups, then an econometric model with a single dummy variable as the only explanatory variable can be expressed as

$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i$$

where $Y_i$ is the dependent variable, $\beta_0$ is the intercept (or constant) term, $\beta_1$ is the impact of the characteristic represented by the dummy variable ($D_i$), and $\varepsilon_i$ is the error term. $D_i = 1$ if the specific qualitative characteristic is present and $D_i = 0$ if not.

If the qualitative characteristic you’d like to use as an independent variable has J groups, with J greater than two, then the econometric model must include J – 1 dummy variables to fully capture the possibilities. Suppose you’d like to use a variable with a qualitative characteristic containing four possible outcomes {A, B, C, and D}, so that J = 4 and three dummy variables are needed. The basic econometric model to capture this qualitative characteristic is expressed as

$$Y_i = \beta_0 + \beta_1 D_{iB} + \beta_2 D_{iC} + \beta_3 D_{iD} + \varepsilon_i$$

where $D_{iB} = 1$ if the observation belongs to group B, $D_{iC} = 1$ if the observation belongs to group C, $D_{iD} = 1$ if the observation belongs to group D, and $D_{iB} = D_{iC} = D_{iD} = 0$ if the observation is in group A. By using this equation, you implicitly assign group A as the reference or base group in any two-group comparison: the coefficient on each dummy variable measures how the mean of that group differs from the mean of group A.
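The four-group setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values, the variable names (`groups`, `y`, `levels`), and the choice of NumPy's least-squares solver are all hypothetical and are not part of the original article.

```python
import numpy as np

# Hypothetical data: an outcome y and a four-level group label per observation.
groups = np.array(["A", "A", "B", "B", "B", "C", "C", "D", "D", "D"])
y = np.array([10.0, 12.0, 15.0, 17.0, 16.0, 9.0, 11.0, 20.0, 22.0, 21.0])

# J - 1 = 3 dummy variables; group A is left out and acts as the base group.
levels = ["B", "C", "D"]
dummies = np.column_stack([(groups == g).astype(float) for g in levels])

# Design matrix: a column of ones for the intercept, then the three dummies.
X = np.column_stack([np.ones(len(y)), dummies])

# Ordinary least squares estimates of the intercept and the dummy coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sample mean of y within each group, for comparison with the estimates.
group_means = {g: y[groups == g].mean() for g in ["A", "B", "C", "D"]}

print("intercept (should match mean of A):", beta[0], group_means["A"])
for g, b in zip(levels, beta[1:]):
    print(f"coefficient on D_{g}:", b,
          "| mean difference vs. A:", group_means[g] - group_means["A"])
```

In this sketch the estimated intercept reproduces the mean of the base group, and each dummy coefficient reproduces that group's mean minus the mean of group A. Choosing a different base group changes the individual coefficients but not the implied group means.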

Interpretation

One useful way of seeing the role of a dummy variable in an econometric model is to interpret the results of a regression using a dummy variable as the only independent variable.

An estimated regression with a dummy variable is generally written as

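$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 D_i$$

where the $\hat{\beta}$ terms represent the estimated parameters. Because $D_i$ can only be 0 or 1 for any given observation,

$$\hat{Y}_i = \begin{cases} \hat{\beta}_0 & \text{if } D_i = 0 \\ \hat{\beta}_0 + \hat{\beta}_1 & \text{if } D_i = 1 \end{cases}$$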

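The predicted Y value from a regression is an estimate of the conditional mean, E(Y | Di). Because a dummy variable takes only two values, the regression produces only two predicted Y values, and each one equals the sample mean of its group: the estimated intercept is the mean of Y for the group with Di = 0, and the intercept plus the estimated coefficient on the dummy variable is the mean of Y for the group with Di = 1. The estimated coefficient on the dummy variable is therefore the difference between the two group means.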
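The link between the dummy-variable regression and the t-test can be checked numerically. The following is a minimal sketch with made-up data. The variable names are hypothetical, and the comparison assumes the equal-variance (pooled) version of the two-sample t-test, which is the version that matches the usual OLS standard errors.

```python
import numpy as np

# Hypothetical data: D = 1 if the characteristic is present, 0 otherwise.
D = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
y = np.array([4.0, 6.0, 5.0, 5.0, 8.0, 9.0, 7.0, 10.0, 6.0])

# Regress y on a constant and the dummy variable.
X = np.column_stack([np.ones_like(D), D])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = beta

# Sample means of the two groups.
mean0 = y[D == 0].mean()
mean1 = y[D == 1].mean()

# t-statistic for the dummy coefficient from the usual OLS formulas.
n = len(y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)           # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance matrix of the estimates
t_ols = b1 / np.sqrt(cov[1, 1])

# Pooled (equal-variance) two-sample t-statistic computed directly.
n0, n1 = int((D == 0).sum()), int((D == 1).sum())
ss0 = ((y[D == 0] - mean0) ** 2).sum()
ss1 = ((y[D == 1] - mean1) ** 2).sum()
s2_pooled = (ss0 + ss1) / (n0 + n1 - 2)
t_pooled = (mean1 - mean0) / np.sqrt(s2_pooled * (1 / n0 + 1 / n1))

print("intercept vs. mean of D=0 group:", b0, mean0)
print("slope vs. difference in means:  ", b1, mean1 - mean0)
print("OLS t vs. pooled t:             ", t_ols, t_pooled)
```

Each printed pair should agree up to floating-point rounding. If the two groups are allowed to have different variances (Welch's t-test), the test statistic generally differs from the default OLS one.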
