10 Key Concepts in Hypothesis Testing
Hypothesis testing is a statistical technique that is used in a variety of situations. Though the technical details differ from situation to situation, all hypothesis tests use the same core set of terms and concepts. The following descriptions of common terms and concepts refer to a hypothesis test in which the means of two populations are being compared.
The null hypothesis is a clear statement about the relationship between two (or more) statistical objects. These objects may be measurements, distributions, or categories. Typically, the null hypothesis, as the name implies, states that there is no relationship.
In the case of two population means, the null hypothesis might state that the means of the two populations are equal.
Once the null hypothesis has been stated, it is easy to construct the alternative hypothesis. It is essentially the statement that the null hypothesis is false. In our example, the alternative hypothesis would be that the means of the two populations are not equal.
The significance level is a measure of the statistical strength of the hypothesis test. It is often characterized as the probability of incorrectly concluding that the null hypothesis is false.
The significance level is something that you should specify up front. In applications, the significance level is typically one of three values: 10%, 5%, or 1%. A 1% significance level represents the strongest test of the three. For this reason, 1% is a higher significance level than 10%.
Related to significance, the power of a test measures the probability of correctly concluding that the null hypothesis is true. Power is not something that you can choose. It is determined by several factors, including the significance level you select and the size of the difference between the things you are trying to compare.
Unfortunately, significance and power are inversely related. Increasing significance decreases power. This makes it difficult to design experiments that have both very high significance and power.
The test statistic is a single measure that captures the statistical nature of the relationship between observations you are dealing with. The test statistic depends fundamentally on the number of observations that are being evaluated. It differs from situation to situation.
Distribution of the test statistic
The whole notion of hypothesis rests on the ability to specify (exactly or approximately) the distribution that the test statistic follows. In the case of this example, the difference between the means will be approximately normally distributed (assuming there are a relatively large number of observations).
One-tailed vs. two-tailed tests
Depending on the situation, you may want (or need) to employ a one- or two-tailed test. These tails refer to the right and left tails of the distribution of the test statistic. A two-tailed test allows for the possibility that the test statistic is either very large or very small (negative is small). A one-tailed test allows for only one of these possibilities.
In an example where the null hypothesis states that the two population means are equal, you need to allow for the possibility that either one could be larger than the other. The test statistic could be either positive or negative. So, you employ a two-tailed test.
The null hypothesis might have been slightly different, namely that the mean of population 1 is larger than the mean of population 2. In that case, you don’t need to account statistically for the situation where the first mean is smaller than the second. So, you would employ a one-tailed test.
The critical value in a hypothesis test is based on two things: the distribution of the test statistic and the significance level. The critical value(s) refer to the point in the test statistic distribution that give the tails of the distribution an area (meaning probability) exactly equal to the significance level that was chosen.
Your decision to reject or accept the null hypothesis is based on comparing the test statistic to the critical value. If the test statistic exceeds the critical value, you should reject the null hypothesis. In this case, you would say that the difference between the two population means is significant. Otherwise, you accept the null hypothesis.
The p-value of a hypothesis test gives you another way to evaluate the null hypothesis. The p-value represents the highest significance level at which your particular test statistic would justify rejecting the null hypothesis. For example, if you have chosen a significance level of 5%, and the p-value turns out to be .03 (or 3%), you would be justified in rejecting the null hypothesis.