Describing Your Statistical Data with Numbers
After collecting good statistical data, you can summarize it with descriptive statistics. These are numbers that describe a data set in terms of its important features:
If the data are categorical (where individuals are placed into groups, such as gender or political affiliation), they are typically summarized using the number of individuals in each group (called the frequency) or the percentage of individuals in each group (called the relative frequency).
Numerical data represent measurements or counts, where the actual numbers have meaning (such as height and weight). With numerical data, more features can be summarized besides the number or percentage in each group. Some of these features include
Measures of center (in other words, where is the middle of the data?)
Measures of spread (how diverse or how concentrated are the data around the center?)
If appropriate, numbers that measure the strength of the relationship between two variables (such as height and weight)
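As a rough illustration of the three kinds of numerical summaries listed above, here is a minimal Python sketch. The height and weight values are made up for this example, and `pearson_r` is a hypothetical helper written here, not a library function. It computes the sample correlation coefficient, one common way to measure the strength of a linear relationship between two variables.

```python
import statistics
from math import sqrt

# Hypothetical sample: heights (inches) and weights (pounds) of five people.
heights = [62, 65, 68, 70, 73]
weights = [120, 140, 155, 170, 190]


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


center = statistics.mean(heights)   # measure of center (the average)
spread = statistics.stdev(heights)  # measure of spread (sample standard deviation)
r = pearson_r(heights, weights)     # strength of the height-weight relationship

print(f"mean height: {center}")
print(f"standard deviation of height: {spread:.2f}")
print(f"correlation between height and weight: {r:.3f}")
```

A correlation close to 1 or -1 suggests a strong linear relationship; a value near 0 suggests a weak one.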
Some descriptive statistics are more appropriate than others in certain situations. For example, the average isn’t always the best measure of the center of a data set; when the data are skewed or contain outliers, the median is often a better choice. And the standard deviation isn’t the only measure of variability on the block; the interquartile range has excellent qualities too, including being less affected by outliers. You need to be able to recognize, interpret, and evaluate the descriptive statistics presented to you on a daily basis and to know when a more appropriate statistic is in order.
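To see why the choice of statistic matters, consider this small sketch using Python's standard `statistics` module. The salary figures are invented for illustration, and the quartiles come from `statistics.quantiles` with its default method (Python 3.8 or later), which is only one of several common conventions for computing quartiles.

```python
import statistics

# Hypothetical salaries in thousands of dollars; the last value is an outlier.
salaries = [40, 42, 45, 47, 50, 52, 55, 400]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # stays near the typical salary

# Quartiles split the sorted data into four parts; the interquartile range
# (IQR) is the distance between the first and third quartiles.
q1, q2, q3 = statistics.quantiles(salaries, n=4)
iqr = q3 - q1
sd = statistics.stdev(salaries)              # inflated by the outlier

print(f"mean: {mean_salary}, median: {median_salary}")
print(f"standard deviation: {sd:.1f}, IQR: {iqr}")
```

In this made-up data set, a single extreme salary drags the mean well above what most people earn and inflates the standard deviation, while the median and the IQR still describe the bulk of the data.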
The descriptive statistics you see most often include frequencies (counts) and relative frequencies (percents) for categorical data, and the mean, median, standard deviation, and percentiles for numerical data.
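For categorical data, frequencies and relative frequencies take only a few lines to compute. The sketch below uses `collections.Counter` from Python's standard library; the survey responses are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses on political affiliation.
affiliations = [
    "Democrat", "Republican", "Independent", "Democrat",
    "Republican", "Democrat", "Independent", "Democrat",
]

# Frequency: the number of individuals in each group.
frequency = Counter(affiliations)

# Relative frequency: the percentage of individuals in each group.
total = sum(frequency.values())
relative_frequency = {
    group: 100 * count / total for group, count in frequency.items()
}

for group, count in frequency.items():
    print(f"{group}: {count} ({relative_frequency[group]:.1f}%)")
```

The relative frequencies always add up to 100 percent (apart from rounding), which makes them easier to compare across groups of different sizes than raw counts.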
However, some data that involve numbers are better treated as categorical data: for example, Social Security numbers or other numbers that are used as labels. The general rule is that if the numbers have no mathematical meaning (for example, who cares what the average Social Security number is?), then the data are really categorical.