Probability Distributions in Biostatistics - dummies

By John Pezzullo

Samples differ from populations because of random fluctuations. Statisticians describe the behavior of these fluctuations quantitatively with mathematical equations, called probability distribution functions, that tell how likely it is that a random fluctuation will exceed any given magnitude.

A probability distribution can be represented in several ways:

  • As a mathematical equation that gives the chance that a fluctuation will be of a certain magnitude (for a continuous variable, the probability density at that magnitude). Using calculus, this function can be integrated, that is, turned into a related function that tells the probability that a fluctuation will be at least as large as a certain magnitude.

  • As a graph of the distribution, which looks and works much like a histogram of observed data.

  • As a table of values telling how likely it is that random fluctuations will exceed a certain magnitude.
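The link between the equation, the integrated function, and the table can be sketched in a few lines of Python. This is only an illustration, assuming the fluctuations follow a standard normal distribution (mean 0, standard deviation 1); the function names upper_tail and numeric_upper_tail are made up for this example, and only the standard library's statistics.NormalDist is used.

```python
from statistics import NormalDist

# Assumption for illustration: fluctuations follow a standard normal distribution.
z = NormalDist(mu=0.0, sigma=1.0)


def upper_tail(magnitude: float) -> float:
    """Probability that a fluctuation is at least `magnitude` (one-sided)."""
    return 1.0 - z.cdf(magnitude)


def numeric_upper_tail(magnitude: float, upper: float = 10.0, steps: int = 20_000) -> float:
    """Approximate the same probability by integrating the density.

    Uses the midpoint rule from `magnitude` to `upper`; the area beyond
    `upper` is negligible for a standard normal when upper = 10.
    """
    width = (upper - magnitude) / steps
    return sum(z.pdf(magnitude + (i + 0.5) * width) for i in range(steps)) * width


# Print a small table: the density at each magnitude and the tail probability.
for m in (1.0, 1.96, 3.0):
    print(f"{m:4.2f}  density={z.pdf(m):.4f}  P(fluctuation >= m)={upper_tail(m):.4f}")
```

The two functions should agree closely: one reads the tail probability from the integrated function, and the other obtains it by integrating the density numerically.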

Over the years, hundreds of different probability distributions have been described, but most practical statistical work uses only a few of them.

Distributions that describe your data

Some distributions describe the random fluctuations you see in your data:

  • Normal: The familiar, bell-shaped, normal distribution describes (at least approximately) an enormous number of variables you encounter in biological research.

  • Log-normal: The skewed, log-normal distribution describes many laboratory results (enzyme levels and antibody titers, for example), lengths of hospital stays, and related quantities such as costs and the utilization of tests and drugs.

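  • Binomial: The binomial distribution describes how many times an event occurs in a fixed number of independent trials, and therefore describes observed proportions, such as the fraction of subjects responding to treatment.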

  • Poisson: The Poisson distribution describes the number of occurrences of sporadic random events, such as clicks in a gamma radiation counter or deaths during some period of time.
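As a rough sketch of how three of the distributions in the list above turn into numbers, the following Python uses only the standard library. The function names and the example values (10 subjects, a 60 percent response rate, an average of 3 events per period) are assumptions chosen for illustration, not figures from any study.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k responders among n subjects, each responding with probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k sporadic events when the average count per period is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def lognormal_cdf(x: float, mu_log: float, sigma_log: float) -> float:
    """Probability that a log-normal value is at most x.

    A variable is log-normal when its natural logarithm is normally
    distributed with mean mu_log and standard deviation sigma_log.
    """
    return NormalDist(mu_log, sigma_log).cdf(math.log(x))


# Hypothetical example values, for illustration only.
print("P(exactly 7 of 10 respond | p = 0.6):", binomial_pmf(7, 10, 0.6))
print("P(exactly 0 events | mean 3 per period):", poisson_pmf(0, 3.0))
print("P(value <= e | log-mean 1, log-SD 0.5):", lognormal_cdf(math.e, 1.0, 0.5))
```

The log-normal function simply takes the logarithm of the value and hands it to a normal distribution, which is why log-transforming skewed laboratory data often makes it look bell-shaped.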

Distributions that come up during statistical testing

Some probability distributions don’t describe fluctuations in observed data, but rather describe fluctuations in numbers that you calculate as part of a statistical hypothesis test. These distributions include the Student t, chi-square, and Fisher F distributions, which are used to obtain the p values that result from the tests.
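To illustrate how a test statistic becomes a p value, here is a minimal sketch for two special cases of the chi-square distribution that have simple closed forms: 1 and 2 degrees of freedom. The function names are made up for this example. Other degrees of freedom, and the Student t and Fisher F distributions, need special functions that are not in Python's standard library, so in practice a statistics package would normally do this step.

```python
import math


def chi_square_p_value_1df(statistic: float) -> float:
    """P value for a chi-square statistic with 1 degree of freedom.

    A chi-square variable with 1 df is the square of a standard normal
    variable, so the tail probability is P(|Z| > sqrt(statistic)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


def chi_square_p_value_2df(statistic: float) -> float:
    """P value for a chi-square statistic with 2 degrees of freedom.

    With 2 df the chi-square distribution is an exponential distribution
    with mean 2, so the tail probability is exp(-statistic / 2).
    """
    return math.exp(-statistic / 2.0)


# Larger test statistics give smaller p values.
for stat in (1.0, 3.8415, 6.635):
    print(f"chi-square = {stat:6.4f}  p (1 df) = {chi_square_p_value_1df(stat):.4f}")
```

In both cases the p value is the probability that random fluctuations alone would produce a test statistic at least as large as the one observed.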