The Language of Hypothesis Testing

By John Pezzullo

The theory of statistical hypothesis testing was developed in the early 20th century and has been the mainstay of practical statistics ever since. It was designed to apply the scientific method to situations involving data with random fluctuations (and almost all real-world data has random fluctuations). Following are a few terms commonly used in hypothesis testing.

  • Null hypothesis (abbreviated H0): The assertion that any apparent effect you see in your data does not reflect any real effect in the population, but is merely the result of random fluctuations in your sample.

  • Alternate hypothesis (abbreviated H1 or HAlt): The assertion that there is a real effect in the population, over and above whatever is attributable to random fluctuations.

  • Significance test: A calculation designed to determine whether H0 can reasonably explain what you see in your data.

  • Significance: The conclusion that random fluctuations alone can’t reasonably account for the size of the effect you observe in your data, so you reject H0 and accept HAlt.

  • Statistic: A number that you obtain or calculate from your data.

  • Test statistic: A number, calculated from your data, usually for the purpose of testing H0. It’s often, but not always, calculated as the ratio of a number that measures the size of the effect (the signal) to a number that measures the size of the random fluctuations (the noise).

  • p value: The probability that random fluctuations alone, in the absence of any real effect in the population, would produce an effect at least as large as the one you observe in your sample. Equivalently, it’s the probability of random fluctuations making the test statistic at least as large as what you calculate from your data (or, more precisely, at least as far away from H0 in the direction of HAlt).

  • Type I error: Getting a significant result when, in fact, no real effect is present, only random fluctuations.

  • Alpha: The probability of making a Type I error; that is, of getting a significant result when H0 is true.

  • Type II error: Failing to get a significant result when, in fact, some effect really is present.

  • Beta: The probability of making a Type II error.

  • Power: The probability of getting a significant result when some effect is really present. Power equals 1 minus beta.
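
To make these terms concrete, here is a minimal simulation sketch in Python. It is an illustration only, resting on assumptions that are not part of the definitions above: a one-sample z test on normally distributed data with a known standard deviation of 1, a two-sided p value, a sample size of 25, a cutoff of 0.05, and a hypothetical real effect of 0.5. The function names (z_statistic, two_sided_p_value, rejection_rate) are made up for this example.

```python
import math
import random


def z_statistic(sample, sigma=1.0):
    """Test statistic as signal over noise: sample mean / standard error.

    Assumes the population standard deviation (sigma) is known.
    """
    n = len(sample)
    mean = sum(sample) / n
    return mean / (sigma / math.sqrt(n))


def two_sided_p_value(z):
    """Probability, if H0 is true, of a z at least this far from zero."""
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_mean, n=25, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated samples that give a significant result."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if two_sided_p_value(z_statistic(sample)) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # H0 true (no real effect): every significant result is a Type I error.
    type_i_rate = rejection_rate(true_mean=0.0)
    # Real effect present (assumed size 0.5): significant results reflect power.
    power = rejection_rate(true_mean=0.5)
    print(f"Estimated Type I error rate: {type_i_rate:.3f}")
    print(f"Estimated power:             {power:.3f}")
    print(f"Estimated beta (1 - power):  {1 - power:.3f}")
```

Under these assumptions, the first estimate should land close to the chosen alpha of 0.05, because with no real effect only random fluctuations can produce a significant result. The second estimate depends entirely on the assumed effect size and sample size; raising either one should raise the power and lower beta.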