10 Things You Ought to Know about Statistics to Use Excel - dummies

10 Things You Ought to Know about Statistics to Use Excel

By Stephen L. Nelson, E. C. Nelson

Excel is a wonderful tool when you need to use statistics. If you’ve never been exposed to statistics in school or it’s been a decade or two since you were, let these tips help you use some of the statistical tools that Excel provides.

Descriptive statistics are straightforward

The first thing that you ought to know is that some statistical analysis and some statistical measures are pretty darn straightforward. Descriptive statistics, which include things such as the pivot table cross-tabulations, as well as some of the statistical functions, make sense even to somebody who’s not all that quantitative.

Averages aren’t so simple sometimes

When someone uses the term average, what he usually refers to is the most common average measurement, which is a mean. Understanding that the term average is imprecise makes much of Excel’s statistical functionality more comprehensible.

To make this discussion more concrete, assume that you’re looking at a small set of values: 1, 2, 3, 4, and 5. As you might know, the mean in this small set of values is 3. You can calculate the mean by adding together all the numbers in the set (1+2+3+4+5) and then dividing this sum (15) by the total number of values in the set (5).

The median value is the value that separates the largest values from the smallest values. In the data set 1, 2, 3, 4, and 5, the median is 3. The value 3 separates the largest values (4 and 5) from the smallest values (1 and 2).

You don’t need to understand different average measurements, but you should remember that the term average is pretty imprecise.

Standard deviations describe dispersion

The formula for standard deviation and the logic are pretty easy to understand.

A standard deviation describes how values in a data set vary around the mean. The neat thing about statistical measures like a standard deviation, you often gain real insights into the characteristics of the data that you’re looking at. Another thing is that with these two bits of data, you can often draw inferences about data by looking at samples.

An observation is an observation

Observation is one of the terms that you’ll encounter if you read anything about statistics. An observation is just an observation. One way to define the term observation is like this: Whenever you actually assign a value to one of your random variables, you create an observation.

A sample is a subset of values

A sample is a collection of observations from a population. For example, if you create a data set that records the daily high temperature in your neighborhood, your little collection of observations is a sample.

In comparison, a sample is not a population. A population includes all the possible observations.

Inferential statistics are cool but complicated

If you look at a sample of values from a population and the sample is representative and large enough, you can draw conclusions about the population based on characteristics of the sample.

Inferential statistics, although very powerful, possess two qualities that you need to know:

  • Accuracy issues

  • Steep learning curve

Probability distribution functions aren’t always confusing

Probability distribution function sounds pretty tricky; but you can actually understand intuitively what a probability distribution function is with a couple of useful examples.

One common distribution that you hear about in statistics classes, for example, is a T distribution. A T distribution is essentially a normal distribution except with heavier, fatter tails.

One common probability distribution function is a uniform distribution. In a uniform distribution, every event has the same probability of occurrence. The unique thing about this distribution is that everything is pretty darn level.


Another common type of probability distribution function is the normal distribution, also known as a bell curve or a Gaussian distribution.

A normal distribution occurs naturally in many situations. For example, intelligence quotients (IQs) are distributed normally.


Parameters aren’t so complicated

A parameter is an input to the probability distribution function. In other words, the formula or function or equation that describes a probability distribution curve needs inputs. In statistics, those inputs are called parameters.

Some probability distribution functions need only a single simple parameter. For example, to work with a uniform distribution, all you really need is the number of values in the data set. A six-sided die, for example, has only six possibilities.

Skewness and kurtosis describe a probability distribution’s shape

A couple of other useful statistical terms to know are skewness and kurtosis. Skewness quantifies the lack of symmetry in a probability distribution. In a perfectly symmetrical distribution, like the normal distribution, the skewness equals zero. If a probability distribution leans to the right or the left, however, the skewness equals some value other than zero, and the value quantifies the lack of symmetry.

Kurtosis quantifies the heaviness of the tails in a distribution. In a normal distribution, kurtosis equals zero. The tail is the thing that reaches out to the left or right. However, if a tail in a distribution is heavier than a normal distribution, the kurtosis is a positive number. If the tails in a distribution are skinnier than in a normal distribution, the kurtosis is a negative number.

Confidence intervals seem complicated at first, but are useful

Probabilities often confuse people. An important thing to understand about confidence levels is that they’re linked with the margin of error.

Another important thing to understand about confidence levels is that the bigger you make your sample size, the smaller your margin of error will be using the same confidence level.

As just one example, say you had some Google Analytics data on two different web ads you’re running to promote your small business, and you want to know which ad is more effective. You can use the confidence interval formula to figure out how long your ads need to run before Google’s collected enough data for you to know which ad is really better.