The Symmetry and Shape of Data Distributions Often Seen in Biostatistics

Biology Essentials For Dummies

Biostatistics can be surprising: Data obtained in biological studies is often distributed in strange ways, as you can see in the following frequency distributions:

Two summary statistics, skewness and kurtosis, are typically used to describe the symmetry and shape of the distribution of numbers in your data.

Skewness

Skewness refers to whether the distribution has left-right symmetry or whether it has a longer tail on one side or the other. Many different skewness coefficients have been proposed over the years. The most common one, often represented by the Greek letter lowercase gamma (γ), is calculated by averaging the cubes (third powers) of the deviations of each point from the mean, and then dividing by the cube of the standard deviation. This gives a dimensionless coefficient (one that is independent of the units of the observed values), which can be positive, negative, or zero.
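The calculation described above can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `skewness` and the sample values are made up for illustration, and it assumes the standard deviation is the population form (dividing by N rather than N - 1), which the text does not specify.

```python
import math

def skewness(values):
    """Average cubed deviation from the mean, divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    # Assumption: population standard deviation (divide by N, not N - 1).
    sd = math.sqrt(sum(d ** 2 for d in deviations) / n)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return (sum(d ** 3 for d in deviations) / n) / sd ** 3

# Hypothetical data: one large value pulls out a long right tail.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]
left_tailed = [-x for x in right_tailed]   # mirror image of the same data

print(skewness(right_tailed))      # positive
print(skewness(left_tailed))       # same size, opposite sign
print(skewness([1, 2, 3, 4, 5]))   # 0.0 for perfectly symmetric data
```

Because both the numerator and the denominator carry the cube of the data's units, rescaling the values (say, from grams to kilograms) leaves the result unchanged.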

A negative skewness coefficient (lowercase gamma) indicates left-skewed data (long left tail); a zero gamma indicates unskewed data; and a positive gamma indicates right-skewed data (long right tail).

Of course, the skewness coefficient for any set of real data almost never comes out to exactly zero because of random sampling fluctuations. So how large does gamma have to be, in either direction, before you suspect real skewness in your data? A very rough rule of thumb for large samples is that if the absolute value of gamma is greater than [threshold formula not reproduced here], your data is probably skewed.
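One way to make a check like this concrete is to compare gamma with its approximate large-sample standard error. The sketch below is an assumption, not the author's rule of thumb: it uses the textbook approximation that, for large samples from a normal distribution, the standard error of the skewness coefficient is about sqrt(6/N), and it flags values more than two standard errors away from zero. The function names, the cutoff of two standard errors, and the data are all illustrative.

```python
import math

def skewness(values):
    """Third central moment divided by the cube of the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def looks_skewed(values, n_standard_errors=2.0):
    """Flag |gamma| larger than a multiple of sqrt(6/N).

    Assumption: sqrt(6/N) approximates the standard error of gamma for
    large samples from a normal distribution. The default multiple of 2
    is an illustrative cutoff, not a value taken from the text.
    """
    threshold = n_standard_errors * math.sqrt(6.0 / len(values))
    return abs(skewness(values)) > threshold

# Hypothetical samples.
symmetric = [i % 21 - 10 for i in range(210)]            # -10..10, each value 10 times
long_right_tail = [(i % 20) ** 3 for i in range(200)]    # cubing stretches the right tail

print(looks_skewed(symmetric))         # False
print(looks_skewed(long_right_tail))   # True
```

The threshold shrinks as N grows, which matches the idea that random sampling fluctuations matter less in larger samples.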

Kurtosis

The three distributions shown below happen to have the same mean and the same standard deviation, and all three have perfect left-right symmetry (that is, they are unskewed). But their shapes are still very different. Kurtosis is a way of quantifying these differences in shape.

If you think of a typical distribution function curve as having a "head" (near the center), "shoulders" (on either side of the head), and "tails" (out at the ends), the term kurtosis refers to whether the distribution curve tends to have:

A pointy head, fat tails, and no shoulders (leptokurtic)
Normal appearance (mesokurtic)
Broad shoulders, small tails, and not much of a head (platykurtic)

The Pearson kurtosis index, often represented by the Greek letter kappa (κ), is calculated by averaging the fourth powers of the deviations of each point from the mean and dividing by the fourth power of the standard deviation. Like the skewness coefficient, it is dimensionless (that is, independent of the units in which the original data was expressed). Its value can range from 1 to infinity and is equal to 3.0 for a normal distribution.
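The kappa calculation can be sketched the same way as the description reads. This is a minimal sketch under stated assumptions: the function name `pearson_kurtosis` and the three data sets are hypothetical, and the standard deviation is taken in its population form (dividing by N), which the text does not specify.

```python
def pearson_kurtosis(values):
    """Average fourth-power deviation from the mean, divided by the squared variance."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population variance (divide by N, not N - 1).
    variance = sum((x - mean) ** 2 for x in values) / n
    if variance == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    # The fourth power of the standard deviation is the variance squared.
    return fourth_moment / variance ** 2

# Hypothetical data sets illustrating the range of kappa.
two_point = [-1, 1] * 50                     # all head-less "shoulders", no tails
flat = list(range(100))                      # evenly spread values
heavy_tails = [0] * 96 + [-10, -10, 10, 10]  # pointy head plus a few extreme values

print(pearson_kurtosis(two_point))     # 1.0, the smallest possible value
print(pearson_kurtosis(flat))          # about 1.8 (platykurtic)
print(pearson_kurtosis(heavy_tails))   # 25.0 (strongly leptokurtic)
```

The two-point data set sits at the lower limit of 1, while adding rarer and more extreme values to the tails can push kappa as high as you like.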

The excess kurtosis is the amount by which kappa exceeds (or falls short of) 3, so it is zero for a normal distribution. A very rough rule of thumb for large samples is that if kappa differs from 3 by more than [threshold formula not reproduced here], your data probably has abnormal kurtosis.
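A check of this kind can be sketched by comparing the excess kurtosis with its approximate large-sample standard error. As an assumption that is not the author's rule of thumb, the sketch uses the textbook approximation that, for large samples from a normal distribution, the standard error of kappa is about sqrt(24/N), and it flags values more than two standard errors away from 3. The function names, the cutoff, and the data are illustrative.

```python
import math
from statistics import NormalDist

def excess_kurtosis(values):
    """Pearson kurtosis (population moments) minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0

def looks_abnormal(values, n_standard_errors=2.0):
    """Flag |kappa - 3| larger than a multiple of sqrt(24/N).

    Assumption: sqrt(24/N) approximates the standard error of kappa for
    large samples from a normal distribution. The default multiple of 2
    is an illustrative cutoff, not a value taken from the text.
    """
    threshold = n_standard_errors * math.sqrt(24.0 / len(values))
    return abs(excess_kurtosis(values)) > threshold

# Hypothetical samples of 200 values each.
# Evenly spaced quantiles of the standard normal curve stand in for bell-shaped data.
bell_shaped = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
flat = list(range(200))   # evenly spread values: broad shoulders, small tails

print(looks_abnormal(bell_shaped))   # False
print(looks_abnormal(flat))          # True
```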

About This Article

About the book author:

John C. Pezzullo, PhD, has held faculty appointments in the departments of biomathematics and biostatistics, pharmacology, nursing, and internal medicine at Georgetown University. He is semi-retired and continues to teach biostatistics and clinical trial design online to Georgetown University students.