Quantifying Qualitative Information for Econometric Models
Estimating an econometric model requires that all the information be quantified. In other words, numbers must be used to characterize both your quantitative and qualitative variables. Quantitative variables are typically coded with numeric values in the raw data, but qualitative variables usually have to be converted to numbers before you can use them. Here, you find out how to quantify variables when working with two groups or with multiple groups.
Defining a dummy variable when you have only two possible characteristics
In many cases, the qualitative characteristics you want to include in your econometric analysis have two groups (or categories). In general, you have two groups when sample observations have a "this" or "that" option. For example, in most surveys, gender is classified as either male or female.
If a qualitative characteristic has two groups, you need to create one dummy variable in order to quantitatively capture that attribute. The dummy variable takes the value of 1 if one of the two characteristics is present and 0 if the other characteristic is observed. The group that's identified (or assigned) 0 values for the created dummy variable is called your reference or base group.
The table illustrates how you can create a dummy variable from your original data. Column 1 contains the movie title, and Column 2 contains the lead actor's name. Column 3 isn't part of the original data, but you can create the variable Female using the information in Column 2.
The variable Female is a dummy variable equal to 1 if the lead actor is female and equal to 0 if the lead actor is male. Notice that only one dummy variable is needed to capture two possibilities (in this case, male and female).
| Column 1: Title | Column 2: Lead | Column 3: Female |
| The Wrestler | Mickey Rourke | 0 |
| Akeelah and the Bee | Keke Palmer | 1 |
| The Last King of Scotland | James McAvoy | 0 |
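The coding rule behind the Female column can be sketched in a few lines of Python. The titles and lead actors come from the table. The `lead_gender` lookup and the function name `female_dummy` are assumptions added for illustration, because the raw data holds only names.

```python
# Rows from the table: (title, lead actor).
movies = [
    ("The Wrestler", "Mickey Rourke"),
    ("Akeelah and the Bee", "Keke Palmer"),
    ("The Last King of Scotland", "James McAvoy"),
]

# Assumed lookup (not part of the raw data): gender of each lead actor.
lead_gender = {
    "Mickey Rourke": "male",
    "Keke Palmer": "female",
    "James McAvoy": "male",
}

def female_dummy(lead):
    """Return 1 if the lead actor is female, 0 if male (male is the base group)."""
    return 1 if lead_gender[lead] == "female" else 0

# One dummy value per movie, in table order.
female = [female_dummy(lead) for _, lead in movies]
print(female)
```

The resulting list matches Column 3 of the table, with male leads forming the reference group.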
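The substance of your econometric results isn't affected by which group you assign a 1 and which group you assign a 0 in your dummy variable. Switching the coding only reverses the sign of the dummy variable's estimated coefficient and changes which group the intercept describes; the fit of the model and the size of the estimated difference between the two groups stay the same.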
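To see why the choice of coding is harmless, here is a minimal sketch that fits a one-regressor least-squares line twice, once with a Female dummy and once with the reversed (Male) coding. The outcome values in `y` are hypothetical numbers invented for illustration, and `ols_one_regressor` is a helper name assumed here, not something from the original data.

```python
from statistics import mean

def ols_one_regressor(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical outcome values, made up purely for this illustration.
y = [10.0, 14.0, 12.0, 18.0]
female = [0, 1, 0, 1]             # 1 = female, 0 = male (male is the base group)
male = [1 - f for f in female]    # reversed coding (female is the base group)

a_f, b_f = ols_one_regressor(female, y)
a_m, b_m = ols_one_regressor(male, y)

# Fitted values for each observation under the two codings.
fitted_f = [a_f + b_f * f for f in female]
fitted_m = [a_m + b_m * m for m in male]

print(a_f, b_f)
print(a_m, b_m)
print(fitted_f == fitted_m)
```

In this sketch the intercept is the mean outcome of whichever group is coded 0, and the slope is the difference between the two group means. Reversing the coding flips the slope's sign and moves the intercept to the other group's mean, while the fitted values are identical.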
Juggling multiple characteristics with dummy variables
In some cases, the qualitative characteristics you want to include in your econometric analysis have more than two groups (or categories). In general, you work with several groups when sample observations are classified into one of many possibilities. For example, a firm may be located in the West, Midwest, South, or Northeast region of the country.
In order to quantitatively capture a qualitative attribute with numerous groups (or possibilities), you need to create one dummy variable for every group except one. Each dummy variable takes the value of 1 if its particular characteristic is present and 0 otherwise.
In other words, if you have J groups, you need J – 1 dummy variables with 1s and 0s to capture all the qualitative information. The group that does not have a dummy variable is identified when all the other dummy values are 0, and it's called your reference or base group.
The table shows how you can create the dummy variables you need from a qualitative variable with several groups. Column 1 contains the movie title, and Column 2 contains the MPAA rating (G, PG, PG13, or R). Columns 3, 4, and 5 aren't part of the original data, but you can create them using the MPAA rating in Column 2.
Notice that the number of dummy variables you need (three) is one less than the number of possible outcomes for the qualitative characteristic (in this case, four: G, PG, PG13, and R).
| Column 1: Title | Column 2: MPAA Rating | Column 3: PG | Column 4: PG13 | Column 5: R |
| Herbie: Fully Loaded | G | 0 | 0 | 0 |
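The J – 1 rule can be sketched as a small Python function that turns one MPAA rating into its three dummy values. The function name `rating_dummies` and its parameters are assumptions made for illustration; G is treated as the reference group, as in the table.

```python
RATINGS = ["G", "PG", "PG13", "R"]   # all J = 4 groups
BASE = "G"                           # reference group: gets no dummy of its own

def rating_dummies(rating, groups=RATINGS, reference=BASE):
    """Return the J - 1 dummy values for one observation as a dict."""
    if rating not in groups:
        raise ValueError(f"unknown rating: {rating!r}")
    # One dummy per non-reference group; 1 only for the observed rating.
    return {g: int(rating == g) for g in groups if g != reference}

herbie = rating_dummies("G")     # rating taken from the table row
pg13 = rating_dummies("PG13")    # an arbitrary rating, to show a non-base case

print(herbie)
print(pg13)
```

A G-rated movie gets a 0 in every dummy, which is exactly how the reference group is identified. Any other rating switches on exactly one of the three dummies.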
The group you choose as the reference group (the one assigned a 0 all the way across) doesn't affect the substance of your econometric results: the model's fit and predicted values are the same, although each dummy coefficient is interpreted relative to whichever group you chose. The reference-group observations (in this example, G-rated movies like Herbie: Fully Loaded) are still important to include, and they do affect the overall results.
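The following sketch illustrates that point under a simplifying assumption: the model contains only an intercept and the J – 1 rating dummies. In that case least squares reproduces the group means, so the intercept equals the reference group's mean and each dummy coefficient equals that group's mean minus the reference mean. The code uses this shortcut rather than a full regression routine. The outcome values and the names `dummy_coefficients` and `fitted` are hypothetical, invented for illustration.

```python
from statistics import mean

# Hypothetical (rating, outcome) pairs, made up purely for this illustration.
data = [("G", 4.0), ("G", 6.0), ("PG", 7.0), ("PG", 9.0),
        ("PG13", 10.0), ("R", 3.0), ("R", 5.0)]

def dummy_coefficients(data, reference):
    """Intercept and dummy coefficients for an intercept-plus-dummies model.

    Relies on the fact that such a model fits each group's mean exactly.
    """
    groups = sorted({g for g, _ in data})
    means = {g: mean([v for gg, v in data if gg == g]) for g in groups}
    intercept = means[reference]
    coefs = {g: means[g] - intercept for g in groups if g != reference}
    return intercept, coefs

def fitted(rating, intercept, coefs):
    """Predicted outcome: intercept plus the coefficient of the rating's dummy."""
    return intercept + coefs.get(rating, 0.0)

int_g, coef_g = dummy_coefficients(data, "G")   # G as the reference group
int_r, coef_r = dummy_coefficients(data, "R")   # R as the reference group

for rating in ["G", "PG", "PG13", "R"]:
    print(rating, fitted(rating, int_g, coef_g), fitted(rating, int_r, coef_r))
```

The intercept and coefficients differ between the two choices of reference group, but the predicted value for every rating is the same either way.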