# Probability Distributions in Biostatistics

Samples differ from populations because of random fluctuations. Statisticians understand *quantitatively* how random fluctuations behave by developing mathematical equations, called *probability distribution functions*, that describe how likely it is that random fluctuations will exceed any given magnitude.

A probability distribution can be represented in several ways:

- **As a mathematical equation** that gives the chance that a fluctuation will be of a certain magnitude. Using calculus, this function can be *integrated*, that is, turned into another related function that tells the probability that a fluctuation will be at least as large as a certain magnitude.
- **As a graph of the distribution,** which looks and works much like a histogram of observed data.
- **As a table of values** telling how likely it is that random fluctuations will exceed a certain magnitude.
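To make the equation and table representations concrete, here is a minimal Python sketch using the standard normal distribution as the example. The choice of distribution, the helper names `density` and `upper_tail`, and the magnitudes tabulated are illustrative assumptions, not something the text prescribes. The sketch relies on `statistics.NormalDist` from the Python standard library, whose `cdf` method is the integrated form of its `pdf`.

```python
from statistics import NormalDist

# Assumption for illustration: fluctuations follow a standard normal
# distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)


def density(z: float) -> float:
    """The distribution as an equation: relative likelihood of magnitude z."""
    return std_normal.pdf(z)


def upper_tail(z: float) -> float:
    """The integrated form: probability that a fluctuation is at least z."""
    return 1.0 - std_normal.cdf(z)


# The distribution as a small table of values.
print("    z  density  P(at least z)")
for z in (0.0, 1.0, 1.96, 3.0):
    print(f"{z:5.2f}  {density(z):7.4f}  {upper_tail(z):13.4f}")
```

Plotting `density` over a range of magnitudes would give the graph representation: the same information as the table, shown as a curve.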

Over the years, hundreds of different probability distributions have been described, but most practical statistical work utilizes only a few of them.

## Distributions that describe your data

Some distributions describe the random fluctuations you see in your data:

- **Normal:** The familiar, bell-shaped, *normal* distribution describes (at least approximately) an enormous number of variables you encounter in biological research.
- **Log-normal:** The skewed, *log-normal* distribution describes many laboratory results (enzymes and antibody titers, for example), lengths of hospital stays, and related things like costs, utilization of tests, drugs, and so forth.
- **Binomial:** The *binomial* distribution describes proportions, such as the fraction of subjects responding to treatment.
- **Poisson:** The *Poisson* distribution describes the number of occurrences of sporadic random events, such as clicks in a gamma radiation counter or deaths during some period of time.
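The distributions listed above can each be written as a short formula. The following is a rough sketch in Python using only the standard library. The function names and the parameter values in the usage lines are hypothetical, chosen purely for illustration.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n subjects respond, given response rate p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k sporadic events when `mean` events are expected."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def lognormal_cdf(x: float, mu_log: float, sigma_log: float) -> float:
    """Probability that a log-normal value is at most x.

    A variable is log-normal when its logarithm is normally distributed,
    so the calculation is done on log(x) with a normal distribution.
    """
    return NormalDist(mu_log, sigma_log).cdf(math.log(x))


# Hypothetical examples: 20 subjects with a 30% response rate, and a
# counter that averages 4 clicks per interval.
print(binomial_pmf(6, 20, 0.3))
print(poisson_pmf(2, 4.0))
print(lognormal_cdf(1.0, 0.0, 1.0))
```

The log-normal case reuses the normal distribution on the log scale, which is why log-transforming skewed laboratory values often makes them look bell-shaped.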

## Distributions that come up during statistical testing

Some probability distributions don't describe fluctuations in observed data, but rather describe fluctuations in numbers that you calculate as part of a statistical hypothesis test. These distributions include the Student t, chi-square, and Fisher F distributions, which are used to obtain the p values that result from the tests.
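As one small illustration of how a test statistic is turned into a p value, the sketch below handles only the simplest case: a chi-square statistic with 1 degree of freedom. That case has a closed form through the complementary error function, because such a statistic is the square of a standard normal value. The function name is hypothetical, and the general Student t, chi-square, and Fisher F cases need more elaborate numerical routines, typically taken from a statistics library rather than written by hand.

```python
import math


def chi_square_1df_p_value(statistic: float) -> float:
    """Probability that a chi-square value with 1 degree of freedom
    is at least as large as `statistic`.

    Valid only for 1 degree of freedom (an assumption of this sketch).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Larger test statistics correspond to smaller p values.
for statistic in (0.5, 2.0, 3.841, 6.635):
    print(f"{statistic:6.3f}  {chi_square_1df_p_value(statistic):.4f}")
```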