Published: June 24, 2013

Econometrics For Dummies

Overview

Score your highest in econometrics? Easy.

Econometrics can prove challenging for many students unfamiliar with the terms and concepts discussed in a typical econometrics course. Econometrics For Dummies eliminates that confusion with easy-to-understand explanations of important topics in the study of economics.

Econometrics For Dummies breaks down this complex subject and provides you with an easy-to-follow course supplement to further refine your understanding of how econometrics works and how it can be applied in real-world situations.

  • An excellent resource for anyone participating in a college or graduate level econometrics course
  • Provides you with an easy-to-follow introduction to the techniques and applications of econometrics
  • Helps you score high on exam day

If you're seeking a degree in economics and looking for a plain-English guide to this often-intimidating course, Econometrics For Dummies has you covered.


About The Author

Roberto Pedace, PhD, is an associate professor in the Department of Economics at Scripps College. His published work has appeared in Economic Inquiry, Industrial Relations, the Southern Economic Journal, Contemporary Economic Policy, the Journal of Sports Economics, and other outlets.

Sample Chapters


CHEAT SHEET

You can use the statistical tools of econometrics along with economic theory to test hypotheses of economic theories, explain economic phenomena, and derive precise quantitative estimates of the relationship between economic variables. To accurately perform these tasks, you need econometric model-building skills, quality data, and appropriate estimation strategies.


Articles from the book

Avoiding mistakes when you do econometric analysis depends on your ability to apply knowledge you acquired before and during your econometrics class. Following is a rundown of common pitfalls to help you improve your application of econometric analysis. The first is failing to use your common sense and knowledge of economic theory: one of the characteristics that differentiate applied research in econometrics from other applications of statistical analysis is the use of economic theory and common sense to motivate the connection between the independent and dependent variables.
Following are the ten components you need to include in any econometrics research project. No matter what the specifics of your class assignment, you’ll probably be expected to come up with a topic, collect data, use econometrics software to complete the analysis, and interpret your findings. Begin by introducing your topic and posing the primary question of interest: the first paragraphs of your research paper should provide an interesting description of your topic.
Using the ordinary least squares (OLS) technique to estimate a model with a dummy dependent variable is known as creating a linear probability model, or LPM. LPMs aren’t perfect. Three specific problems can arise: non-normality of the error term, heteroskedastic errors, and potentially nonsensical predictions. On the first point, the assumption that the error is normally distributed is critical for performing hypothesis tests after estimating your econometric model.
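
To make the nonsensical-predictions problem concrete, here's a minimal Python sketch; the data, seed, and coefficients are invented for illustration. It fits OLS to a 0/1 outcome and shows that the fitted "probabilities" can stray outside [0, 1]:

```python
# LPM sketch with simulated data; names and numbers are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)                                       # made-up regressor
y = (0.5 + 1.5 * x + rng.normal(size=200) > 0).astype(float)   # dummy outcome

lpm = sm.OLS(y, sm.add_constant(x)).fit()                      # OLS on a 0/1 outcome
fitted = lpm.fittedvalues
print("fitted range:", fitted.min(), "to", fitted.max())       # can fall outside [0, 1]
```
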
Serial correlation in the error term (autocorrelation) is a common problem for OLS regression estimation, especially with time-series and panel data. However, you usually have no way of knowing in advance if it’s going to be present, and theory doesn’t usually help you anticipate its presence. Consequently, you have to inspect your residuals to determine if they’re characterized by autocorrelation.
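
As a quick illustration, the sketch below simulates AR(1) errors (the 0.8 persistence and all other numbers are hypothetical), fits OLS, and computes the Durbin-Watson statistic on the residuals, one common way to inspect for autocorrelation:

```python
# Durbin-Watson inspection sketch on simulated time-series data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                    # AR(1) errors: e_t = 0.8 * e_{t-1} + u_t
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print("Durbin-Watson:", durbin_watson(res.resid))   # values near 2 suggest no autocorrelation
```
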
Econometrics students always appreciate a review of the statistical concepts that are most important to succeeding with econometrics. Specifically, you need to be comfortable with probability distributions, the calculation of descriptive statistics, and hypothesis tests. Your ability to accurately quantify economic relationships depends not only on your econometric model-building skills but also on the quality of the data you’re using for analysis and your capacity to adopt the appropriate strategies for estimating models that are likely to violate a statistical assumption.
Because one primary objective of econometrics is to examine relationships between variables, you need to be familiar with probabilities that combine information on two variables. A bivariate or joint probability density provides the relative frequencies (or chances) that events with more than one random variable will occur.
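
A small worked example (the joint table below is made up) shows how a bivariate probability table yields joint and marginal probabilities:

```python
# Joint and marginal probabilities from a hypothetical bivariate table.
import numpy as np

joint = np.array([[0.10, 0.20],    # rows: X in {0, 1}
                  [0.30, 0.40]])   # cols: Y in {0, 1}
assert np.isclose(joint.sum(), 1.0)        # all probabilities must sum to 1

marginal_X = joint.sum(axis=1)             # f(X): sum over Y
marginal_Y = joint.sum(axis=0)             # f(Y): sum over X
print("P(X=0, Y=1):", joint[0, 1])
print("marginals:", marginal_X, marginal_Y)
```
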
With a cubic function, you allow the effect of the independent variable (X) on the dependent variable (Y) to change. As the value of X increases (or decreases), the impact on the dependent variable may increase or decrease. However, unlike with a quadratic function, the direction of this change can reverse at more than one value of X.
When you need to estimate a sample regression function (SRF), the most common econometric method is the ordinary least squares (OLS) technique, which uses the least squares principle to fit a prespecified regression function through your sample data. The least squares principle states that the SRF should be constructed (with the constant and slope values) so that the sum of the squared distance between the observed values of your dependent variable and the values estimated from your SRF is minimized (the smallest possible value).
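
The least squares principle can be carried out directly with the normal equations, beta-hat = (X'X)^(-1) X'y. Here's a minimal sketch on simulated data; the true intercept of 3 and slope of 1.5 are arbitrary choices:

```python
# Minimal OLS via the normal equations on simulated data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 1.5 * x + rng.normal(size=50)

X = np.column_stack([np.ones_like(x), x])      # constant plus slope regressor
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # minimizes the sum of squared residuals
print("intercept, slope:", beta_hat)           # close to 3.0 and 1.5
```
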
You may want to allow your econometric model to have some flexibility, because economic relationships are rarely linear. Many situations are subject to the "law" of diminishing marginal benefits and/or increasing marginal costs, which implies that the impact of the independent variables won't be constant (linear).
Econometric techniques are used to estimate economic models, which ultimately allow you to explain how various factors affect some outcome of interest or to forecast future events. The ordinary least squares (OLS) technique is the most popular method of performing regression analysis and estimating econometric models, because in standard situations (meaning the model satisfies a series of statistical assumptions) it produces optimal (the best possible) results.
The t distribution is used quite a bit in econometrics. You probably used the t distribution extensively when dealing with means in your statistics class, but in econometrics you also use it for regression coefficients. Before you find out how that works, you should know how the t distribution is derived and its basic properties.
The cumulative distribution function (CDF) of a random variable X is the sum or accrual of probabilities up to some value. It shows how the sum of the probabilities approaches 1, which sometimes occurs at a constant rate and sometimes occurs at a changing rate. For a discrete random variable, the CDF is equivalent to $F(x) = \sum_{X_j \le x} f(X_j)$, where $f(X)$ is the probability density function.
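
For instance, the following sketch (a hypothetical four-outcome distribution) accrues a discrete PDF into its CDF:

```python
# Building a discrete CDF by accruing the PDF; the distribution is made up.
import numpy as np

outcomes = np.array([1, 2, 3, 4])
pdf = np.array([0.1, 0.2, 0.3, 0.4])
cdf = np.cumsum(pdf)                 # F(x) accrues probability up to each outcome
print(dict(zip(outcomes, cdf)))      # final value is 1.0
```
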
If you use natural log values for your dependent variable (Y) and keep your independent variables (X) in their original scale, the econometric specification is called a log-linear model. These models are typically used when you think the variables may have an exponential growth relationship. For example, if you put some cash in a savings account, you expect to see the effect of compounding interest with an exponential growth of your money!
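
A quick sketch of the idea, using a simulated balance that grows at a made-up 5 percent per period: regressing ln(Y) on time recovers the growth rate as the slope:

```python
# Log-linear sketch: ln(Y) is linear in time when Y grows exponentially.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(40)
y = 100 * np.exp(0.05 * t + rng.normal(scale=0.02, size=40))  # hypothetical series

res = sm.OLS(np.log(y), sm.add_constant(t)).fit()
print("estimated growth rate per period:", res.params[1])     # close to 0.05
```
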
Using natural logs for variables on both sides of your econometric specification is called a log-log model. This model is handy when the relationship is nonlinear in parameters, because the log transformation generates the desired linearity in parameters (you may recall that linearity in parameters is one of the OLS assumptions).
A probability density function (PDF) shows the probabilities of a random variable for all its possible values. The probabilities associated with specific values (or events) from a random variable must adhere to the properties $0 \le f(X_j) \le 1$ and $\sum_j f(X_j) = 1$, where $X_j$ represents the possible values (outcomes) of random variable X. In other words, the chances of any random event occurring must be anywhere from impossible (probability of 0) to certain (probability of 1), and the sum of the probabilities for all events must be 1 (or 100 percent).
In econometrics, the standard estimation procedure for the classical linear regression model, ordinary least squares (OLS), can accommodate complex relationships. Therefore, you have a considerable amount of flexibility in developing the theoretical model. You can estimate linear and nonlinear functions, including but not limited to polynomial functions (for example, quadratic and cubic functions), inverse functions, and log functions (log-log, log-linear, and linear-log). In many cases, the dependent variable in a regression model can be influenced by both quantitative variables and qualitative factors.
The regression function is usually expressed mathematically in one of the following ways: basic notation, summation notation, or matrix notation. The Y variable represents the outcome you’re interested in, called the dependent variable, and the Xs represent all the independent (or explanatory) variables. Your objective now is to estimate the population regression function (PRF) using your sample data.
High multicollinearity results from a linear relationship between your independent variables that exhibits a high degree of correlation but isn’t completely deterministic (in other words, the variables don’t have perfect correlation). It’s much more common than its perfect counterpart and can be equally problematic when it comes to estimating an econometric model.
In econometrics, when you collect a random sample of data and calculate a statistic with that data, you’re producing a point estimate, which is a single estimate of a population parameter. Descriptive statistics are measurements that can be used to summarize your sample data and, subsequently, make predictions about your population of interest.
In econometrics, an informal way of checking for heteroskedasticity is with a graphical examination of the residuals. If you want to use graphs for an examination of heteroskedasticity, you first choose an independent variable that’s likely to be responsible for the heteroskedasticity. Then you can construct a scatter diagram with the chosen independent variable and the squared residuals from your OLS regression.
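
The sketch below (simulated data whose error spread is deliberately made to grow with X) builds exactly that scatter diagram of squared residuals against the suspect variable:

```python
# Graphical heteroskedasticity check: squared OLS residuals vs. a suspect regressor.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=200)
y = 2 + 0.5 * x + rng.normal(size=200) * 0.3 * x   # error variance grows with x

res = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(x, res.resid ** 2)                     # a fanning-out pattern suggests heteroskedasticity
plt.xlabel("suspect independent variable")
plt.ylabel("squared OLS residuals")
plt.show()
```
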
In econometrics, the procedure used for forecasting can be quite varied. If historical data is available, forecasting typically involves the use of one or more quantitative techniques. If historical data isn't available, or if it contains significant gaps or is unreliable, then forecasting can actually be qualitative.
In order to begin doing any exploratory data analysis or econometric work, you need a dataset that can be opened by specialized econometric software, such as one in STATA format (*.dta). (STATA is one of the most popular econometrics software programs and makes the application of econometric techniques possible for anyone who’s not a computer programming genius.)
In many cases, seasonal patterns are removed from time-series data when they’re released on public databases. Data that has been stripped of its seasonal patterns is referred to as seasonally adjusted or deseasonalized data. In order to obtain a goodness-of-fit measure that isolates the influence of your independent variables, you must estimate your model with deseasonalized values for both your dependent and independent variables.
Statisticians and econometricians typically require the estimators they use for inference and prediction to have certain desirable properties. For statisticians, unbiasedness and efficiency are the two most-desirable properties an estimator can have. An estimator is unbiased if, in repeated estimations using the method, the mean value of the estimator coincides with the true parameter value.
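
Unbiasedness is easy to see in a Monte Carlo sketch: repeating OLS on fresh samples drawn from a model with a known slope (2.0 here, chosen arbitrarily), the average estimate sits at the true value:

```python
# Monte Carlo sketch of unbiasedness for the OLS slope estimator.
import numpy as np

rng = np.random.default_rng(5)
true_slope, estimates = 2.0, []
for _ in range(5000):
    x = rng.normal(size=30)
    y = 1.0 + true_slope * x + rng.normal(size=30)
    X = np.column_stack([np.ones(30), x])
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y)[1])
print("mean of estimates:", np.mean(estimates))   # very close to 2.0
```
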
The error term is the most important component of the classical linear regression model (CLRM). Most of the CLRM assumptions that allow econometricians to prove the desirable properties of the OLS estimators (the Gauss-Markov theorem) directly involve characteristics about the error term (or disturbances). One of the CLRM assumptions deals with the conditional variance of the error term; namely, that the variance of the error term is constant (homoskedastic).
Seasonality effects can be correlated with both your dependent and independent variables. In order to avoid confounding the seasonality effects with those of your independent variables, you need to explicitly control for the season in which the measurement is observed. If you include dummy variables for seasons along with the other relevant independent variables, you can simultaneously obtain better estimates of both seasonality and the effects of the other independent variables.
You should recall from your statistics course how to conduct the t-test to examine the differences in means between two groups. But what you may not know is that you can use dummy variables and regression analysis to obtain the same results as the t-test. On specification: even though your econometric model is likely to include both quantitative and qualitative characteristics, you can begin with a model that only uses a dummy variable to capture qualitative characteristics and ignores other potential independent variables.
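
The equivalence is easy to verify numerically; in the sketch below (two simulated groups whose means differ by 1), the t statistic on the dummy coefficient matches the pooled two-sample t-test:

```python
# A regression on a single group dummy reproduces the two-sample t-test.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
g0 = rng.normal(0, 1, size=50)                     # hypothetical group 0
g1 = rng.normal(1, 1, size=50)                     # hypothetical group 1

y = np.concatenate([g0, g1])
d = np.concatenate([np.zeros(50), np.ones(50)])    # dummy: 1 if in group 1
res = sm.OLS(y, sm.add_constant(d)).fit()

print(res.tvalues[1])                              # t on the dummy coefficient
print(stats.ttest_ind(g1, g0).statistic)           # identical pooled t statistic
```
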
In econometrics, the expected value (or mean) of a random variable provides a measure of central tendency, which means that it provides one measurement of where the data tends to cluster. The expected value is the average of a random variable. If you have a discrete random variable, you can calculate the expected value with the equation $E(X) = \sum_j X_j f(X_j)$, where $X_j$ represents the different possible values for the random variable, and $f(X_j)$ is the probability that each value will occur.
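
A worked example, using a made-up four-value distribution:

```python
# Expected value of a hypothetical discrete random variable: E(X) = sum of X * f(X).
import numpy as np

values = np.array([0, 1, 2, 3])
probs = np.array([0.1, 0.2, 0.3, 0.4])
print("E(X) =", np.sum(values * probs))   # 0*0.1 + 1*0.2 + 2*0.3 + 3*0.4 = 2.0
```
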
Prediction in econometrics involves some prior knowledge. For example, you may attempt to predict how many “likes” your status update will get on Facebook given the number of “friends” you have and time of day you posted. In order to do so, you’ll want to be familiar with conditional probabilities. Conditional probabilities calculate the chance that a specific value for a random variable will occur given that another random variable has already taken a value.
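
Continuing with a hypothetical joint table, a conditional probability is just a joint probability rescaled by a marginal, P(X | Y) = f(X, Y) / f(Y):

```python
# Conditional probability from a made-up joint probability table.
import numpy as np

joint = np.array([[0.10, 0.20],    # rows: X in {0, 1}
                  [0.30, 0.40]])   # cols: Y in {0, 1}
p_Y1 = joint[:, 1].sum()           # marginal P(Y = 1)
print("P(X=1 | Y=1):", joint[1, 1] / p_Y1)   # 0.40 / 0.60
```
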
One of the most important decisions you make when specifying your econometric model is which variables to include as independent variables. Here, you find out what problems can occur if you include too few or too many independent variables in your model, and you see how this misspecification affects your results.
Before you begin with regression analysis, you need to identify the population regression function (PRF). The PRF defines reality (or your perception of it) as it relates to your topic of interest. To identify it, you need to determine your dependent and independent variables (and how they’ll be measured) as well as the mathematical function describing how the variables are related.
The higher the frequency of an economic time series, the more likely it is to display seasonal patterns. For example, retail sales figures often exhibit a significant increase around the winter holidays. When you’re dealing with quarterly data, this increase is likely to be reflected with larger values in the fourth quarter of each year.
Before you begin doing econometric analysis, make sure you’re familiar with your data and how to view it in the popular STATA software. After all, you don’t want to estimate an econometric model with data that’s mostly incomplete or full of errors. In version 12.1 of STATA, the default setting allows you to open a dataset that’s as large as 64 megabytes (MB) and contains up to 5,000 variables.
If you believe that the outcome (dependent variable) you’re modeling is likely to approach some value asymptotically (as X approaches zero or infinity), then an inverse function may be the way to go. Inverse functions can be useful if you’re trying to estimate a Phillips curve (the inverse relationship between inflation and unemployment rates) or a demand function (the inverse relationship between price and quantity demanded), among other economic phenomena where the variables are related inversely.
Unlike typical cross-section analysis, which imposes a static nature to your models, a pooled cross section allows you to incorporate a dynamic time element. You can do this with a pooled cross section because cross-sectional units are observed in two or more periods. Typically, pooled cross sections contain many more cross-sectional observations than the number of time periods being pooled.
Limited dependent variables arise when some minimum threshold value must be reached before the values of the dependent variable are observed and/or when some maximum threshold value restricts the observed values of the dependent variable. A limited dependent variable causes the standard model to become a latent variable model, $Y_i^* = X_i \beta + \varepsilon_i$, where the restricted values don’t allow you to always observe $Y^*$.
Autocorrelation, also known as serial correlation, may exist in a regression model when the order of the observations in the data is relevant or important. In other words, with time-series (and sometimes panel or longitudinal) data, autocorrelation is a concern. Most of the CLRM assumptions that allow econometricians to prove the desirable properties of the OLS estimators (the Gauss-Markov theorem) directly involve characteristics of the error term.
Getting a grasp on perfect multicollinearity, which is uncommon, is easier if you can picture an econometric model that uses two independent variables, such as the following: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$. Suppose that, in this model, $X_{2i} = \alpha_0 + \alpha_1 X_{1i}$, where the alphas are constants. By substitution, you obtain $Y_i = (\beta_0 + \beta_2 \alpha_0) + (\beta_1 + \beta_2 \alpha_1) X_{1i} + \varepsilon_i$, which indicates that the model collapses and can’t be estimated as originally specified.
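
You can see the collapse numerically. In the sketch below (hypothetical alphas of 3 and 2), the regressor matrix loses a column of rank, so X'X can't be inverted:

```python
# Perfect multicollinearity sketch: x2 is an exact linear function of x1,
# so X loses rank and X'X is singular.
import numpy as np

rng = np.random.default_rng(7)
x1 = rng.normal(size=20)
x2 = 3.0 + 2.0 * x1                              # hypothetical alphas: 3 and 2
X = np.column_stack([np.ones(20), x1, x2])

print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))  # rank 2 < 3
print("cond(X'X):", np.linalg.cond(X.T @ X))     # effectively infinite
```
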
Most economic time series grow over time, but sometimes time series actually decline over time. In either case, you’re looking at a time trend. The most common models capturing time trends are either linear or exponential. If the dependent variable has a relatively steady increase over time, your best bet is to model the relationship with a linear time trend.
In econometrics, a specific version of a normally distributed random variable is the standard normal. A standard normal distribution is a normal distribution with a mean of 0 and a variance of 1. It’s useful because you can convert any normally distributed random variable to the same scale, which allows you to easily and quickly calculate and compare probabilities.
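
The conversion is a one-liner, z = (x − μ)/σ; the sketch below (arbitrary example numbers) checks it against SciPy's normal CDF:

```python
# Standardizing a normal random variable; mu, sigma, and x are arbitrary examples.
from scipy import stats

mu, sigma, x = 50.0, 10.0, 65.0
z = (x - mu) / sigma
print("z:", z)                                    # 1.5
print(stats.norm.cdf(z))                          # P(Z <= 1.5) on the standard scale
print(stats.norm.cdf(x, loc=mu, scale=sigma))     # the same probability, original scale
```
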
Because economic relationships are rarely linear, you may want to allow your econometric model to have some flexibility. With a quadratic function, you allow the effect of the independent variable (X) on the dependent variable to change. As the value of X increases, the impact on the dependent variable increases or decreases.
Estimating an econometric model requires that all the information be quantified. In other words, numbers must be used to characterize both your quantitative and qualitative variables. Quantitative variables are typically coded with numeric values in the raw data, but qualitative variables are likely to require you to perform some quantification manipulation.
In econometrics, a random variable with a normal distribution has a probability density function that is continuous, symmetrical, and bell-shaped. Although many random variables can have a bell-shaped distribution, the density function of a normal distribution is precisely $f(X) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(X-\mu)^2/(2\sigma^2)}$, where $\mu$ represents the mean of the normally distributed random variable X, $\sigma$ is the standard deviation, and $\sigma^2$ represents the variance.
If your outcome of interest is qualitative, you use a dummy dependent variable and estimate the probability that the outcome (Y = 1) occurs using your econometric model. Although OLS can be used to estimate a model with a qualitative dependent variable, doing so would result in an error term that’s heteroskedastic and isn’t normally distributed.
In econometrics, the regression model is a common starting point of an analysis. As you define your regression model, you need to consider several elements: Economic theory, intuition, and common sense should all motivate your regression model. The most common regression estimation technique, ordinary least squares (OLS), obtains the best estimates of your model if the CLRM assumptions hold.
Economists apply econometric tools in a variety of specific fields (such as labor economics, development economics, health economics, and finance) to shed light on theoretical questions. They also use these tools to inform public policy debates, make business decisions, and forecast future events. Following is a list of ten interesting, practical applications of econometric techniques.
The Goldfeld-Quandt (GQ) test in econometrics begins by assuming that a defining point exists and can be used to differentiate the variance of the error term. Sample observations are divided into two groups, and evidence of heteroskedasticity is based on a comparison of the residual sum of squares (RSS) using the F-statistic.
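
statsmodels ships a ready-made version of this test; a minimal sketch on simulated data (the error spread rises with X by construction) looks like this:

```python
# Goldfeld-Quandt sketch via statsmodels on simulated heteroskedastic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(1, 10, size=100))          # sorted by the suspect variable
y = 2 + 0.5 * x + rng.normal(size=100) * 0.3 * x   # error spread grows with x

fstat, pval, _ = het_goldfeldquandt(y, sm.add_constant(x))  # splits sample, compares RSS
print("GQ F-statistic:", fstat, "p-value:", pval)
```
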
In econometrics, an extremely common test for heteroskedasticity is the White test, which begins by allowing the heteroskedasticity process to be a function of one or more of your independent variables. It’s similar to the Breusch-Pagan test, but the White test allows the independent variable to have a nonlinear and interactive effect on the error variance.
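
Here's a short sketch of the White test using statsmodels' het_white on simulated heteroskedastic data (all numbers invented):

```python
# White test sketch: het regressors include x, its square, and interactions.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(9)
x = rng.uniform(1, 10, size=200)
y = 2 + 0.5 * x + rng.normal(size=200) * 0.3 * x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm, lm_pval, fval, f_pval = het_white(resid, X)
print("White LM statistic:", lm, "p-value:", lm_pval)
```
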
Multicollinearity arises when a linear relationship exists between two or more independent variables in a regression model. In practice, you rarely encounter perfect multicollinearity, but high multicollinearity is quite common and can cause substantial problems for your regression analysis. Two types of multicollinearity exist: Perfect multicollinearity occurs when two or more independent variables in a regression model exhibit a deterministic (perfectly predictable or containing no randomness) linear relationship.
When studying economics, you probably used the F distribution in your statistics class to compare variances of two different normal distributions. In econometrics, you have a similar use for the F distribution. You’ll find that the F distribution is easier to use if you’re familiar with some of its characteristics.
In econometrics, you use the chi-squared distribution extensively. The chi-squared distribution is useful for comparing estimated variance values from a sample to those values based on theoretical assumptions. Therefore, it’s typically used to develop confidence intervals and hypothesis tests for population variance.
If you use natural log values for your independent variables (X) and keep your dependent variable (Y) in its original scale, the econometric specification is called a linear-log model (basically the mirror image of the log-linear model). These models are typically used when the impact of your independent variable on your dependent variable decreases as the value of your independent variable increases.
Econometrics is typically used for one of the following objectives: predicting or forecasting future events or explaining how one or more factors affect some outcome of interest. Although some econometrics problems have both objectives, in most cases you use econometric tools for one aim or the other. Regardless of the objective for using econometrics, econometric studies generally have one characteristic in common: the specification of a model.
The Breusch-Pagan (BP) test is one of the most common tests for heteroskedasticity. It begins by allowing the heteroskedasticity process to be a function of one or more of your independent variables, and it’s usually applied by assuming that heteroskedasticity may be a linear function of all the independent variables in the model.
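
A minimal Breusch-Pagan sketch with statsmodels, again on simulated data with an invented heteroskedastic error:

```python
# Breusch-Pagan sketch: het regressors are all the model's independent variables.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(12)
x = rng.uniform(1, 10, size=200)
y = 2 + 0.5 * x + rng.normal(size=200) * 0.3 * x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm, lm_pval, fval, f_pval = het_breuschpagan(resid, X)
print("BP LM statistic:", lm, "p-value:", lm_pval)
```
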
If the classical linear regression model (CLRM) doesn't work for your data because one of its assumptions doesn't hold, then you have to address the problem before you can finalize your analysis. Fortunately, one of the primary contributions of econometrics is the development of techniques to address such problems or other complications with the data that make standard model estimation difficult or unreliable.
The Park test begins by assuming a specific model of the heteroskedastic process. Specifically, it assumes that the heteroskedasticity may be proportional to some power of an independent variable ($X_k$) in the model. This assumption can be expressed as $\sigma_i^2 = \sigma^2 X_{ki}^{\rho} e^{v_i}$. You can obtain a linearized version of the Park model by using a log transformation: $\ln \sigma_i^2 = \ln \sigma^2 + \rho \ln X_{ki} + v_i$. Because the values for $\sigma_i^2$ aren’t known in practice, the squared residuals $\hat{\varepsilon}_i^2$ are calculated and used as proxies for $\sigma_i^2$. Most econometrics software programs don’t have commands that allow you to automatically perform a Park test.
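
Because most packages lack a built-in command, the Park test is easy to run by hand: regress the log of the squared OLS residuals on ln(Xk) and inspect the slope. A sketch on simulated data (all numbers hypothetical):

```python
# By-hand Park test: regress ln(e^2) on ln(x); the slope estimates rho.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(1, 10, size=200)
y = 2 + 0.5 * x + rng.normal(size=200) * 0.3 * x

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
park = sm.OLS(np.log(resid ** 2), sm.add_constant(np.log(x))).fit()
print("rho-hat:", park.params[1], "t-statistic:", park.tvalues[1])
```
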
After you acquire data and choose the best econometric model for the question you want to answer, use formulas to produce the estimated output. In some cases, you have to perform these calculations by hand (sorry). However, even if your problem allows you to use econometric software such as STATA to generate results, it's nice to know what the computer is doing.
Probit and logit functions are both nonlinear in parameters, so ordinary least squares (OLS) can’t be used to estimate the betas. Instead, you have to use a technique known as maximum likelihood (ML) estimation. The objective of ML estimation is to choose values for the estimated parameters (betas) that would maximize the probability of observing the Y values in the sample with the given X values.
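
In practice the maximization is handled by software; the sketch below (simulated data with invented "true" betas of 0.5 and 1.2) fits a logit by ML with statsmodels:

```python
# ML logit fit on simulated dichotomous data; the true betas are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.normal(size=300)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))               # logistic probabilities
y = rng.binomial(1, p)                               # dichotomous outcome

logit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)  # maximizes the log-likelihood
print(logit.params)                                  # estimates near 0.5 and 1.2
```
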
Many economic phenomena are dichotomous in nature; in other words, the outcome either occurs or does not occur. Dichotomous outcomes are the most common type of discrete or qualitative dependent variables analyzed in economics. For example, a student who applies to graduate school will be admitted or not. If you're interested in determining which factors contribute to graduate school admission, then your outcome or dependent variable is dichotomous.