Econometrics For Dummies

In econometrics, when you collect a random sample of data and calculate a statistic with that data, you’re producing a point estimate, which is a single estimate of a population parameter.

Descriptive statistics are measurements that can be used to summarize your sample data and, subsequently, make predictions about your population of interest. When descriptive measures are calculated using population data, those values are called parameters. When you calculate descriptive measures using sample data, the values are called estimators (or statistics).

You could estimate many population parameters with sample data, but here you calculate the most popular statistics: mean, variance, standard deviation, covariance, and correlation. The following list indicates how each parameter and its corresponding estimator are calculated.

  • Mean (average): The mean is the simple average of the random variable, X. The population mean for X is

    $$\mu_X = \frac{\sum_{i=1}^{N} X_i}{N}$$

    where $X_i$ represents the individual measurements and N is the size of the population. The sample mean is

    $$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

    The difference between the sample and population mean is that the sample mean sums only the sampled observations and divides by the sample size n instead of the population size N.

  • Variance: The variance is the average of the squared differences from the mean. The population variance for a random variable X is

    $$\sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \mu_X)^2}{N}$$

    where $X_i$ represents the individual measurements, $\mu_X$ is the population mean, and N is the size of the population. The sample variance is

    $$s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

    Note that the denominator for the sample variance not only uses the sample size n but also subtracts 1 from that number. This change is known as a degrees of freedom adjustment: one degree of freedom is used up in estimating the mean, and dividing by n - 1 rather than n makes the sample variance an unbiased estimator of the population variance.

  • Standard deviation: The standard deviation measures how spread out the random variable is, on average, from the mean. The standard deviation is the square root of the variance, so the population standard deviation for random variable X is

    $$\sigma_X = \sqrt{\sigma_X^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu_X)^2}{N}}$$

    and the sample standard deviation is

    $$s_X = \sqrt{s_X^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

  • Covariance: The covariance measures how much two random variables change together. The population covariance between two random variables X and Y is

    $$\sigma_{XY} = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{N}$$

    where $X_i$ represents the individual X values, $Y_i$ represents the individual Y values, $\mu_X$ and $\mu_Y$ are the population means of X and Y, and N is the total number of measurements in the population. The sample covariance is

    $$s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

    where $\bar{X}$ is the sample mean of X, $\bar{Y}$ is the sample mean of Y, and n is the sample size.

  • Correlation: The correlation measures the strength and direction of the linear relationship between two random variables or sets of data, on a scale from -1 to 1. The population correlation coefficient between two random variables X and Y is

    $$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

    where $\sigma_{XY}$ is the population covariance, $\sigma_X$ is the population standard deviation of X, and $\sigma_Y$ is the population standard deviation of Y. The sample correlation coefficient is

    $$r_{XY} = \frac{s_{XY}}{s_X s_Y}$$

    where $s_{XY}$ is the sample covariance, $s_X$ is the sample standard deviation of X, and $s_Y$ is the sample standard deviation of Y.
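The formulas in the list above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the `sample` flag are assumptions introduced here, with `sample=True` applying the n - 1 degrees of freedom adjustment and `sample=False` using the population denominator N.

```python
import math


def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (degrees of freedom adjustment);
    sample=False divides by N (population formula).
    """
    m = mean(values)
    denom = len(values) - 1 if sample else len(values)
    return sum((v - m) ** 2 for v in values) / denom


def std_dev(values, sample=True):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, sample))


def covariance(xs, ys, sample=True):
    """Average product of paired deviations from each variable's mean."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    denom = len(xs) - 1 if sample else len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom


def correlation(xs, ys, sample=True):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys, sample) / (std_dev(xs, sample) * std_dev(ys, sample))
```

For example, `variance([2, 4, 6])` uses the sample denominator and `variance([2, 4, 6], sample=False)` uses the population denominator. The correlation is the same either way, because the denominators cancel in the ratio.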

Now, try working with some numbers. The table shows five observations of hamburger sales and prices. Use the formulas to calculate the mean, variance, standard deviation, covariance, and correlation.

Hamburger Prices and Sales

Hamburger Sales (in units), Y    Hamburger Price (in $), X
100                              1
80                               2
63                               3
45                               4
21                               5
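As a check on your own arithmetic, here is one way the calculation can go. It assumes the five observations are treated as a sample, so the variance and covariance use the n - 1 = 4 denominator. The deviations of Y from its mean are 38.2, 18.2, 1.2, -16.8, and -40.8, and the deviations of X are -2, -1, 0, 1, and 2.

$$
\begin{aligned}
\bar{X} &= \frac{1 + 2 + 3 + 4 + 5}{5} = 3, \qquad \bar{Y} = \frac{100 + 80 + 63 + 45 + 21}{5} = 61.8 \\
s_X^2 &= \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{4} = \frac{10}{4} = 2.5, \qquad s_X = \sqrt{2.5} \approx 1.581 \\
s_Y^2 &= \frac{38.2^2 + 18.2^2 + 1.2^2 + (-16.8)^2 + (-40.8)^2}{4} = \frac{3738.8}{4} = 934.7, \qquad s_Y = \sqrt{934.7} \approx 30.573 \\
s_{XY} &= \frac{(-2)(38.2) + (-1)(18.2) + (0)(1.2) + (1)(-16.8) + (2)(-40.8)}{4} = \frac{-193}{4} = -48.25 \\
r_{XY} &= \frac{-48.25}{(1.581)(30.573)} \approx -0.998
\end{aligned}
$$

The negative covariance and a correlation close to -1 indicate that, in these five observations, higher prices go with lower sales in an almost perfectly linear way.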

You can use computer software, such as STATA, to calculate descriptive statistics from the data. By typing “sum” on the command line, you get the descriptive statistics for all the variables in your dataset. If you want the correlation between two variables, select Statistics→Summaries, tables, and tests→Summary and descriptive statistics→Correlations and covariances from the menu bar.

Or you can enter “corr variable1 variable2” on the command line. In your command, replace variable1 and variable2 with the actual names you’ve given the variables in your dataset. You can get covariance by adding an option to the correlation command; type “corr variable1 variable2, cov” on the command line.

[Image: STATA output for the commands described above]

You should verify that your manual calculations of these measures are consistent with STATA’s output.
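If you don't have STATA at hand, a short script can serve as a second check on the manual calculations. The sketch below uses Python's standard `statistics` module for the mean, variance, and standard deviation, and computes the covariance and correlation directly from the formulas. The variable names are illustrative assumptions, and the data are treated as a sample (n - 1 denominators).

```python
import statistics

# Data from the Hamburger Prices and Sales table
price = [1, 2, 3, 4, 5]          # X, hamburger price in $
sales = [100, 80, 63, 45, 21]    # Y, hamburger sales in units

mean_x = statistics.mean(price)
mean_y = statistics.mean(sales)

# statistics.variance and statistics.stdev use the n - 1 denominator
var_x = statistics.variance(price)
var_y = statistics.variance(sales)
sd_x = statistics.stdev(price)
sd_y = statistics.stdev(sales)

# Sample covariance: sum of paired deviation products divided by n - 1
n = len(price)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, sales)) / (n - 1)

# Sample correlation: covariance scaled by both standard deviations
corr_xy = cov_xy / (sd_x * sd_y)

print(f"mean:        X = {mean_x:.3f}, Y = {mean_y:.3f}")
print(f"variance:    X = {var_x:.3f}, Y = {var_y:.3f}")
print(f"std dev:     X = {sd_x:.3f}, Y = {sd_y:.3f}")
print(f"covariance:  {cov_xy:.3f}")
print(f"correlation: {corr_xy:.3f}")
```

Rounded to three decimals, the script reports means of 3 and 61.8, variances of 2.5 and 934.7, standard deviations of 1.581 and 30.573, a covariance of -48.25, and a correlation of -0.998.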

Summarizing data with descriptive statistics is a relatively simple procedure, but make sure you examine the values carefully. You can use descriptive measures to ensure that your sample contains measurements that are realistic. For example, if your population of interest is college graduates, you wouldn’t expect your random sample from that group to have an average age of 21.
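A quick plausibility check of this kind can be sketched as follows. The helper name, the age bounds, and the ages themselves are all hypothetical and made up purely for illustration; the point is only that the minimum, maximum, and mean make an unrealistic value easy to spot.

```python
import statistics


def summarize_with_range_check(values, plausible_min, plausible_max):
    """Return basic descriptive statistics plus any values outside a plausible range."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "out_of_range": [v for v in values if not plausible_min <= v <= plausible_max],
    }


# Hypothetical ages for a sample of college graduates (made-up numbers);
# the 2 stands in for a data-entry error.
ages = [24, 31, 45, 52, 29, 38, 2]
report = summarize_with_range_check(ages, plausible_min=18, plausible_max=100)
print(report)
```

Here the minimum of 2 falls outside the assumed plausible range, which signals that the observation should be checked before any inference is made.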

Careful attention to these details lends more credibility to your data and to the subsequent inferences you make.

About the book author:

Roberto Pedace, PhD, is an associate professor in the Department of Economics at Scripps College. His published work has appeared in Economic Inquiry, Industrial Relations, the Southern Economic Journal, Contemporary Economic Policy, the Journal of Sports Economics, and other outlets.

This article can be found in the category: