The Role of Causality in Econometrics

By Roberto Pedace

Econometrics is typically used for one of two objectives: predicting (forecasting) future events, or explaining how one or more factors affect some outcome of interest. Although some econometrics problems have both objectives, in most cases you use econometric tools for one aim or the other.

Regardless of the objective for using econometrics, econometric studies generally have one characteristic in common: the specification of a model. Model specification consists of selecting an outcome of interest or dependent variable (typically labeled as Y) and one or more independent factors (or explanatory variables, usually labeled with Xs). In addition to variable selection, model specification also refers to choosing an appropriate functional form.
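As a sketch of what a specification looks like on paper, a linear model with k explanatory variables can be written as the first equation below. The coefficient symbols (the betas), the observation index i, and the error term are standard notation assumed here for illustration; they do not appear in the text above. The second equation keeps the same variables but changes the functional form by taking the natural log of the dependent variable.

```latex
\begin{align}
  Y_i     &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i \\
  \ln Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i
\end{align}
```

In the linear form, each slope coefficient measures the change in Y for a one-unit change in that X, holding the other Xs fixed. In the log-linear form, it measures the approximate proportional change in Y.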

In a well-specified model, independent variables are the factors assumed to cause changes in your dependent variable, not the other way around. Because most situations in economics (and in some business fields like marketing and accounting) involve cause-and-effect scenarios, applied work in econometrics pays careful attention to which variables are chosen to be dependent and which independent.

If the relationship between cause variables and effect variables isn’t obvious, you should use your common sense and knowledge of economics to justify the causal assumptions of your model.

Justifying your model means that you should be able to explain why it makes sense to think of your dependent variable as being caused by the independent variables you’ve selected. In some cases, that connection may be obvious, but in other cases you may need to provide a detailed explanation.

For example, if you have state data and your dependent variable is the average amount of time unemployed workers are without a job, you’d want to include independent variables that capture the skill traits of workers and other state characteristics that may influence unemployment spell length. Average education and work experience levels are characteristics that, according to human capital theory, should help workers reduce the amount of time they’re unemployed.

These are justifiable independent variables and won’t require much explanation because of their direct connection with the outcome of interest.

On the other hand, state policies, such as welfare assistance and unemployment insurance, have a less obvious connection. Nevertheless, they’re likely to influence worker decision making and be important causal factors. It’s likely, however, that you’ll need to invest more time explaining how they’re related to the outcome and why their inclusion among the independent variables makes sense.

Keep in mind that regression analysis identifies the direction (sign) and strength (magnitude) of the relationship between the variables in your model. But the strength of the statistical relationship does not imply causality.
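A minimal sketch of this point, using a handful of made-up numbers (the education and unemployment figures below are invented for illustration and are not real state data). It fits a simple least-squares line in both directions: the sign and strength of the relationship are the same either way, so the arithmetic itself cannot tell you which variable is the cause.

```python
import math

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, invented for this example only.
education = [10, 11, 12, 12, 13, 14, 15, 16]          # average years of schooling
weeks_unemployed = [30, 29, 27, 26, 25, 22, 21, 18]   # average unemployment spell

# Regress the outcome on the assumed cause...
a, b = ols_line(education, weeks_unemployed)
# ...and then the other way around.
a_rev, b_rev = ols_line(weeks_unemployed, education)
r = correlation(education, weeks_unemployed)

print(f"weeks on education: slope = {b:.3f}")
print(f"education on weeks: slope = {b_rev:.3f}")
print(f"correlation (same in both directions): r = {r:.3f}")
```

Both slopes are negative, and their product equals the squared correlation, so the reversed regression fits exactly as well as the original one. Choosing which variable belongs on the left-hand side is a modeling decision that has to come from theory.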

The figure shows a scatter plot of monthly ice cream production in the United States against drowning deaths in Florida single-residence pools in 2006. You can see that drowning deaths and ice cream production have a strong positive relationship: the trend line slopes upward, so both variables move in the same direction (as ice cream production increases, so do deaths). But you don’t have a strong case for one causing the other simply because they’re correlated. Does ice cream production really affect drowning?


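It’s simply an example of spurious correlation, which occurs when two variables have a statistical relationship (positive or negative) even though one doesn’t cause the other. The relationship may be a coincidence, or both variables may respond to a third factor that isn’t in the model.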
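A minimal simulation, assuming a shared seasonal driver (temperature) lies behind both series, shows how such a correlation can arise without either variable affecting the other. All numbers below are synthetic and the coefficients are arbitrary; nothing here is taken from the 2006 data in the figure.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

random.seed(0)  # fixed seed so the run is repeatable
n_months = 240

# Assumed common cause: monthly temperature (synthetic).
temperature = [random.uniform(0, 35) for _ in range(n_months)]

# Each series depends on temperature plus its own noise.
# Neither series appears in the other's equation.
ice_cream = [50 + 2.0 * t + random.gauss(0, 5) for t in temperature]
drownings = [1 + 0.3 * t + random.gauss(0, 1.5) for t in temperature]

# Raw correlation between the two series.
raw_r = correlation(ice_cream, drownings)

# Correlation after removing the part of each series explained by temperature.
partial_r = correlation(residuals(ice_cream, temperature),
                        residuals(drownings, temperature))

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation after controlling for temperature: {partial_r:.2f}")
```

In this setup the raw correlation is strongly positive, while the correlation left over after accounting for temperature is close to zero. The statistical link between the two series comes entirely from the omitted third factor.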

Statistical results alone cannot prove causation. Your results can be used to support a hypothesis of causality, but only after you’ve developed a model that is well grounded in economic theory and/or good common sense.