How to Compare Two Independent Population Averages
You can compare numerical data for two statistical populations or groups (such as cholesterol levels in men versus women, or income levels for high school versus college grads) to test a claim about the difference in their averages. (For example, is the difference in the population means equal to zero, indicating their means are equal?) Two independent (totally separate) random samples need to be selected, one from each population, in order to collect the data needed for this test.
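The null hypothesis is that the two population means are the same; in other words, that their difference is equal to 0. The notation for the null hypothesis is
H_{0}: \mu_{1} = \mu_{2}
where \mu_{1} is the mean of the first population and \mu_{2} is the mean of the second population. You can also write the null hypothesis as
H_{0}: \mu_{1} - \mu_{2} = 0
emphasizing the idea that their difference is equal to zero if the means are the same.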
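The formula for the test statistic comparing two means (under certain conditions) is:
z = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}}}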
To calculate it, do the following:

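Calculate the sample means \bar{x}_{1} and \bar{x}_{2}. (Assume the population standard deviations, \sigma_{1} and \sigma_{2}, are given.) Let n_{1} and n_{2} represent the two sample sizes (they need not be equal).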

Find the difference between the two sample means: \bar{x}_{1} - \bar{x}_{2}.
Keep in mind that because \mu_{1} - \mu_{2} is equal to 0 if H_{0} is true, it doesn’t need to be included in the numerator of the test statistic. However, if the difference you are testing is any value other than 0, you subtract that value from \bar{x}_{1} - \bar{x}_{2} in the numerator of the test statistic.

Calculate the standard error using the following equation:
\sqrt{\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}}

Divide your result from Step 2 by your result from Step 3.
To interpret the test statistic, add the following two steps to the list:

Look up your test statistic on the standard normal (Z) distribution (see the Z-table) and calculate the p-value.

Compare the p-value to your significance level, \alpha (such as 0.05). If it’s less than or equal to your significance level, reject H_{0}. Otherwise, fail to reject H_{0}.
The conditions for using this test are that the two population standard deviations are known and either both populations have a normal distribution or both sample sizes are large enough for the Central Limit Theorem to be applied.
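The steps above can be sketched in a few lines of Python. This is only an illustration under the stated conditions (known population standard deviations, and normal populations or large samples). The function names `two_sample_z_test` and `decide` and the `null_diff` parameter are made up for this sketch, and the standard library's `statistics.NormalDist` stands in for the Z-table lookup.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2, null_diff=0.0):
    """Return (z, two-sided p-value) for H0: mu1 - mu2 = null_diff.

    Hypothetical helper: sigma1 and sigma2 are the KNOWN population
    standard deviations, as the conditions for this test require.
    """
    # Standard error of the difference between the two sample means
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Difference in sample means, minus the hypothesized difference
    # (usually 0), divided by the standard error
    z = ((mean1 - mean2) - null_diff) / standard_error
    # Area beyond |z| in one tail of the standard normal, doubled
    # for a not-equal-to alternative
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"
```

Because the sketch uses the unrounded test statistic, its p-value can differ in the last digit or two from one read off a printed Z-table.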
For example, suppose you want to compare the absorbency of two brands of paper towels (call the brands Statsabsorbent and Spongeomatic). You can make this comparison by looking at the average number of ounces each brand can absorb before being saturated. H_{0} says the difference between the average absorbencies is 0 (nonexistent), and H_{a} says the difference is not 0. In other words, one brand is more absorbent than the other. Using statistical notation, you have
H_{0}: \mu_{1} - \mu_{2} = 0 versus H_{a}: \mu_{1} - \mu_{2} \neq 0
Here, you have no indication of which paper towel may be more absorbent, so the not-equal-to alternative is the one to use.
Suppose you select a random sample of 50 paper towels from each brand and measure the absorbency of each paper towel. Suppose the average absorbency of Statsabsorbent (\bar{x}_{1}) for your sample is 3 ounces, and assume the population standard deviation is 0.9 ounces. For Spongeomatic (\bar{x}_{2}), the average absorbency is 3.5 ounces according to your sample; assume the population standard deviation is 1.2 ounces. Carry out this hypothesis test by following the six steps listed above:

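Given the above information, you know \bar{x}_{1} = 3.0, \sigma_{1} = 0.9, \bar{x}_{2} = 3.5, \sigma_{2} = 1.2, n_{1} = 50, and n_{2} = 50.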

The difference between the sample means for (Statsabsorbent – Spongeomatic) is \bar{x}_{1} - \bar{x}_{2} = 3.0 - 3.5 = -0.5.
(A negative difference simply means that the second sample mean was larger than the first.)

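The standard error is
\sqrt{\frac{0.9^{2}}{50} + \frac{1.2^{2}}{50}} = \sqrt{0.0162 + 0.0288} = \sqrt{0.045} \approx 0.2121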

Divide the difference, –0.5, by the standard error, 0.2121, which gives you –2.36. This is your test statistic.

To find the p-value, look up –2.36 on the standard normal (Z) distribution (see the Z-table). The chance of being beyond, in this case to the left of, –2.36 is equal to 0.0091. Because H_{a} is a not-equal-to alternative, you double this probability to get 2 ∗ 0.0091 = 0.0182, your p-value.

This p-value is quite a bit less than 0.05. That means you have fairly strong evidence to reject H_{0}.
Your conclusion is that a statistically significant difference exists between the absorbency levels of these two brands of paper towels, based on your samples. And Spongeomatic comes out on top, because it has a higher average. (Statsabsorbent minus Spongeomatic being negative means Spongeomatic had the higher value.)
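As a check on the arithmetic, the paper towel example can be reproduced step by step in Python. This is a sketch, not part of the original procedure: the variable names are chosen for illustration, and `statistics.NormalDist` replaces the Z-table. To mirror a table lookup, the test statistic is rounded to two decimals and the tail area to four.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: sample information from the example
n1 = n2 = 50
xbar1, sigma1 = 3.0, 0.9   # Statsabsorbent
xbar2, sigma2 = 3.5, 1.2   # Spongeomatic

# Step 2: difference between the sample means
difference = xbar1 - xbar2

# Step 3: standard error of that difference
standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Step 4: test statistic, rounded as you would for a Z-table
z_rounded = round(difference / standard_error, 2)

# Step 5: area to the left of z (rounded to table precision), doubled
# because the alternative is not-equal-to
tail_area = round(NormalDist().cdf(z_rounded), 4)
p_value = 2 * tail_area

# Step 6: compare with the significance level
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

print(f"difference = {difference}")
print(f"standard error = {standard_error:.4f}")
print(f"z = {z_rounded}")
print(f"p-value = {p_value:.4f}")
print(decision)
```

Skipping the two rounding steps changes the p-value only slightly and leaves the decision the same.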
The temptation is to say, “Well, I knew the claim that the absorbency levels were equal was wrong because one brand had a sample mean of 3.5 ounces and the other was 3.0 ounces. Why do I even need a hypothesis test?” All those numbers tell you is something about those 100 paper towels sampled. You also need to factor in variation using the standard error and the normal distribution to be able to say something about the entire population of paper towels.
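To see why the standard error matters, here is a small sketch that keeps the same 0.5-ounce difference and the same population standard deviations but changes the number of towels sampled per brand. The sample sizes other than 50 are hypothetical, and the helper name `p_value_for` is made up for this illustration. For the small samples, the test is valid only if both populations are normally distributed.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for(n, diff=-0.5, sigma1=0.9, sigma2=1.2):
    """Two-sided p-value when each brand contributes n towels.

    Hypothetical helper: diff is the observed difference in sample
    means; sigma1 and sigma2 are the known population standard
    deviations from the example.
    """
    standard_error = sqrt(sigma1**2 / n + sigma2**2 / n)
    z = diff / standard_error
    return 2 * NormalDist().cdf(-abs(z))


# Same observed difference, different amounts of data
for n in (5, 20, 50, 200):
    print(n, round(p_value_for(n), 4))
```

With only 5 or 20 towels per brand, the same half-ounce gap gives a p-value above 0.05, so you would fail to reject H_{0}. The gap becomes statistically significant only once the samples are large enough to shrink the standard error.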