Statistical Significance and p-Values

By Jeff Sauro

When dealing with customer analytics, you’ll encounter the phrase statistically significant. You’ll also run into something called a p-value. There’s a lot packed into that little p, and there are books written on the subject. Here’s what you need to know.

In principle, a statistically significant result (usually a difference) is a result that’s not attributable to chance. More technically, it means that if the Null Hypothesis is true (that is, there really is no difference), there’s a low probability of getting a result that large or larger.
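Written as a conditional probability (the notation is added here for clarity; the article states this only in words), a result is statistically significant when

    P(\text{a result this large or larger} \mid \text{the Null Hypothesis is true}) \;\text{is small}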

Consider these two important factors.

  • Sampling Error. There’s always a chance that the differences we observe when measuring a sample of customers are just the result of random noise, chance fluctuations, happenstance.

  • Probability, never certainty. Statistics is about probability; you cannot buy 100% certainty. Statistics is about managing risk. Can you live with a 10% likelihood that your decision is wrong? A 5% likelihood? 33%? The answer depends on context:

    What does it cost to increase the probability of making the right choice, and what is the consequence (or potential consequence) of making the wrong choice? Most publications suggest a cutoff of 5%: it’s okay to be fooled by randomness 1 time out of 20. That’s a reasonably high standard, and it may match your circumstances. It could just as easily be overkill, or it could expose you to far more risk than you can afford. (A short sketch later in this piece shows how such a cutoff works as a decision rule.)

The p-value is one of the outcomes of a statistical test when you make a comparison, say, between the conversion rates of two marketing campaigns. The p-value stands for probability value. It is the probability of obtaining a difference as large as the one you see in your sample (or a larger one) if there really is no difference for all customers.
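To make that concrete, here is a minimal sketch in Python. The visitor counts, the function name, and the choice of a two-proportion z-test are illustrative assumptions, not something prescribed here; other tests (a chi-squared test, for example) would serve the same purpose.

    # Hypothetical data: campaign A converts 40 of 1,000 visitors, campaign B
    # converts 60 of 1,000. How often would a gap this large appear by chance
    # if the two campaigns really converted at the same rate?
    from statistics import NormalDist

    def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
        """Two-sided p-value from a two-proportion z-test."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)      # shared rate if there is no true difference
        se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
        z = abs(p_b - p_a) / se                       # observed gap, measured in standard errors
        return 2 * (1 - NormalDist().cdf(z))          # chance of a gap at least this large

    print(round(two_proportion_p_value(40, 1000, 60, 1000), 3))  # about 0.04 for these made-up numbers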

Some examples of p-values are .012, .21, and .0001. A p-value of .012 indicates that a difference as large as the one observed would be seen only about 1.2% of the time if there really were no difference in the entire customer population.

Given that this is a pretty low percentage, in most cases, researchers conclude that the difference observed is not due to chance and call it statistically significant. By convention, journals and statisticians say something is statistically significant if the p-value is less than .05. There’s nothing sacred about .05, though; in applied research, the difference between .04 and .06 is usually negligible.
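Here is the decision rule as a minimal sketch; the p-value and the list of cutoffs below are made-up numbers for illustration. The same result can clear the bar at one risk tolerance and miss it at another.

    # Suppose a test returned p = 0.08. Whether you call the result significant
    # depends entirely on how much "fooled by randomness" risk you accept.
    p_value = 0.08
    for cutoff in (0.33, 0.10, 0.05):
        verdict = "statistically significant" if p_value < cutoff else "not significant"
        print(f"cutoff {cutoff:.2f}: {verdict}")
    # At a 33% or 10% cutoff you'd call this significant; at the conventional
    # 5% cutoff you would not.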

Statistical significance doesn’t mean practical significance. Only by considering context can you determine whether a difference is practically significant (that is, whether it requires action).