Central Tendency: Beyond the Basics
The mean and median are the two most reliable and commonly reported measures of the center, and they are used in a wide variety of situations. However, if you’re seriously studying statistics, you should be familiar with two other measures of central tendency.
The mode is another measure of center; it identifies which value (or range of values) occurs most frequently. The mean and median can be very effective at describing symmetric and unimodal distributions. The mode is useful for describing situations that the mean and median handle poorly, particularly skewed or multimodal data.
To calculate the mode, you simply create a frequency table of the values in the data set and count the number of times each appears. For example, if the data set contains 10, 20, 20, 20, 30, 30, 40, 50, 50, then the mode is 20 because it appears three times, more often than any other value.
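As a rough sketch of the frequency-table approach, the following Python snippet tallies each value and returns every value tied for the highest count. The function name `modes` is an illustrative choice, not something from the text.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of the value(s) that occur most often."""
    counts = Counter(values)      # frequency table: value -> number of appearances
    if not counts:
        return []                 # an empty data set has no mode
    top = max(counts.values())    # the highest frequency observed
    return sorted(v for v, c in counts.items() if c == top)

data = [10, 20, 20, 20, 30, 30, 40, 50, 50]
print(modes(data))  # [20]
```

Returning a list rather than a single number keeps the function honest when two or more values tie for most frequent.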
If you have a data set whose values are not repeated exactly, you can group them into ranges, much as you would when preparing a histogram. For example, in the following table, two players on the Lakers are making the NBA league minimum, so the mode could be considered to be $959,111. Alternatively, you could split the data into groups of $1 million, in which case the mode would be the range from $5–6 million because four players fall into that group.
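The grouping idea can be sketched in Python as follows. Apart from the league-minimum figure, the salary values here are hypothetical stand-ins chosen for illustration, not the actual table, and the helper name `modal_bin` is likewise an assumption. Each bin is treated as including its lower edge and excluding its upper edge.

```python
from collections import Counter

def modal_bin(values, width):
    """Return (low, high, count) for the fullest bin of the given width."""
    # Map each value to the lower edge of its bin, e.g. 5_400_000 -> 5_000_000.
    counts = Counter((v // width) * width for v in values)
    # Pick the bin with the most members (the first one found wins a tie).
    low, count = max(counts.items(), key=lambda item: item[1])
    return low, low + width, count

# Hypothetical salaries in dollars; only the 959,111 minimum comes from the text.
salaries = [959_111, 959_111, 1_500_000, 2_800_000, 5_100_000,
            5_400_000, 5_600_000, 5_900_000, 12_000_000, 25_000_000]

print(modal_bin(salaries, 1_000_000))  # (5000000, 6000000, 4)
```

With these made-up figures, the exact-value mode is still $959,111 (it appears twice), while the binned mode is the $5–6 million range, which shows how the answer depends on the grouping you choose.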
The mode can be visualized by the peak in a histogram. With data sets that have multiple peaks, it’s not uncommon to report multiple modes because the mean and the median may not accurately reflect where most of the values lie.
You’ve seen that the mean is susceptible to outliers and will be “pulled” toward the most extreme values. The trimmed mean (or truncated mean) tries to eliminate the influence of outliers by trimming off a small number of extreme values so the mean focuses more on the most central values.
To calculate a trimmed mean, you choose a small percentage of your data set (say, 10 percent), split that number in half, remove the corresponding percentage of values from both the low and high ends, and then calculate the mean of the remaining values.
For example, suppose a data set contains the following n = 20 values: 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 9, 500. The outlier value of 500 drives the (traditional) sample mean to be 29.6, which is larger than all but one of the data values and not really indicative of where all the action is.
Instead, you can cut out the most extreme 10 percent, which means removing two values (10% × 20 = 2), and just calculate a mean based on the middle 90 percent of values. Since you have to split those two removals between the two ends, you’ll remove one from the low end (3) and one from the high end (500). The 90 percent trimmed mean based on the remaining 18 data values is about 4.9 (89 ÷ 18 ≈ 4.94) and better reflects the central trend of the data.
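The calculation above can be checked with a short Python sketch. It follows this chapter’s convention that the stated percentage is the total amount trimmed, split evenly between the two ends; the function name `trimmed_mean` and its `total_percent` parameter are illustrative assumptions.

```python
def trimmed_mean(values, total_percent):
    """Mean after dropping total_percent of the data, half from each end."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of values to drop from EACH end, rounded down to a whole number.
    k = (n * total_percent) // 200
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)

data = [3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 9, 500]

print(f"{sum(data) / len(data):.1f}")    # 29.6 (ordinary mean)
print(f"{trimmed_mean(data, 10):.1f}")   # 4.9  (one value dropped from each end)
```

Be aware that some software defines the trimming proportion per end rather than in total, so a "10 percent" setting elsewhere may remove 10 percent from each side. Check the convention before comparing results.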