Commercial Software for Biostatistical Analysis - dummies

By John Pezzullo

Commercial statistical programs usually provide a wide range of capabilities, personal user support (such as a phone help-line), and some reason to believe (or at least to hope) that the software will be around and supported for many years to come.

Prices vary widely, and the array of pricing options may be bewildering, with single-user and site licenses, nonprofit and academic discounts, one-year and permanent licenses, “basic” and “pro” versions, and so on.

Many companies let you download a demo version of their software that’s limited in some way — some features may be disabled, the maximum number of cases or variables may be limited, or the software may run for only a certain number of days.

Demo versions are a great way to see whether a software package is easy to use and meets your needs before you shell out the cash for a full version.


SAS

SAS is one of the most comprehensive statistical packages on the market. It’s widely used in all branches of science and is especially pervasive in the pharmaceutical industry. The current versions run on Windows and some Linux systems.

SAS is designed to be run by user-written programs. A GUI (graphical user interface) module is available that makes the programming task easier, but SAS isn’t designed like a typical personal computer program. It doesn’t use the familiar “document” paradigm that almost all other personal computer software uses.

For example, you don’t create a new data file by going to a File menu and selecting New, nor do you open an existing data file by going to File and selecting Open. Most users need SAS training in order to use the program productively.

SAS is large-scale software, designed for large-scale operations. It comes with a wide variety of analyses built in, and its programming language lets you create modules to perform other, less-common kinds of analyses. Its scope has grown beyond just the statistical analysis of data; SAS is now a complete data acquisition, validation, management, analysis, and presentation system.

SAS is also expensive — depending on the optional modules you want to use, it can cost over $1,000 per year. If your organization has a site license, you may be able to use SAS for relatively little or no money for as long as you’re affiliated with that organization.

If SAS is available at your school or organization, it may be worth your time to learn how to use it (especially if you plan to work in pharmaceutical research).


SPSS

SPSS is another comprehensive program that can perform all the analyses you’re likely to need while remaining quite intuitive and user-friendly. You create and edit data files the same way you’d create and edit word-processing documents and spreadsheets — using the File menu’s commands: New, Save, Open, and so forth.

SPSS contains a programming language that can automate repetitive tasks and perform calculations and analyses beyond those built into the software. SPSS runs on Windows, Macintosh, and some Linux systems.

SPSS pricing is complicated. Depending on the modules you want to use, it can cost many hundreds of dollars per year.

GraphPad Prism and InStat

Unlike most commercial stats packages, these two programs were designed by and for scientists, not by and for statisticians.

GraphPad Prism focuses on the needs of biological and clinical researchers in laboratory settings, and it’s quite capable of handling non-laboratory research as well. It offers a powerful combination of parametric and nonparametric tests, extensive regression and curve-fitting (including nonlinear regression), survival analysis, and scientific graphing. It runs on Windows and Mac systems.

GraphPad InStat carries the “scientist, rather than statistician” theme even further, with a user-friendly interface that guides you through the process of selecting the right test based on the structure of your experiment, verifying that your data meets the assumptions of the test, and interpreting all parts of the output in “plain English” with a minimum of statistical jargon.

It doesn’t have all the capabilities of Prism; its emphasis is on ease of use. If you don’t want to have to become a statistician but just want to get your data analyzed properly with minimal fuss, and without a long learning process, check out InStat. It runs on Windows and some Mac systems.

These programs are reasonably priced; academic and student discounts are available, and you can download trial versions to evaluate.

Excel and other spreadsheet programs

You can use Excel (and similar spreadsheet programs) to store, summarize, and analyze your raw data and to prepare graphs from your analysis. But using Excel for data storage and analysis has been controversial.

Some have argued that Excel is too unstructured to serve as a respectable database (you can put anything into any cell, with no constraints on data types, ranges, and so forth), and you can easily destroy all or parts of your database (by sorting just some columns and not others). Others have said that Excel’s built-in mathematical and statistical functions are inaccurate and unreliable.

Although some of those criticisms were valid years ago, today’s Excel is much improved and is satisfactory for most purposes.
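The partial-sort hazard mentioned above isn't specific to any one spreadsheet program; it happens whenever one column is reordered without the others. Here's a minimal Python sketch of the problem, using made-up subject IDs and weights (the data and variable names are purely illustrative):

```python
# Hypothetical toy data: each position is one subject (ID, weight in kg).
ids = ["A", "B", "C", "D"]
weights = [82.5, 61.0, 95.3, 70.2]

# Hazard: sorting one column by itself breaks the pairing between columns,
# just like sorting a single spreadsheet column and leaving the rest alone.
broken_weights = sorted(weights)
broken_rows = list(zip(ids, broken_weights))

# Safe: keep each row together and sort whole rows by the weight value.
rows = list(zip(ids, weights))
safe_rows = sorted(rows, key=lambda row: row[1])

print("Sorted one column only:", broken_rows)
print("Sorted whole rows:     ", safe_rows)
```

In the first result, subject A ends up paired with a weight that belongs to someone else; in the second, every subject keeps its own value. In a spreadsheet, the equivalent safeguard is to select the entire data range before sorting.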

Excel has built-in functions for summarizing data (means, standard deviations, medians, and so on), for evaluating the common probability distributions and their inverses (normal, Student t, Fisher F, and chi-square), and for performing Student t tests and calculating correlation coefficients and simple linear regression (slope and intercept). If you install the optional Analysis ToolPak add-in provided with Excel, it can do more extensive analyses, such as ANOVA and multiple regression.
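To see what the summary, correlation, and simple-regression calculations involve, here's a short Python sketch that works them out from the textbook formulas. It corresponds roughly to what worksheet functions such as AVERAGE, STDEV.S, MEDIAN, CORREL, SLOPE, and INTERCEPT return. The paired data are invented for illustration, and this isn't a description of how Excel computes these values internally.

```python
import math
import statistics

# Hypothetical paired measurements (for example, dose and response).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Descriptive summaries
mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
sd_x = statistics.stdev(x)      # sample SD (n - 1 in the denominator)
sd_y = statistics.stdev(y)
median_y = statistics.median(y)

# Sums of squares and cross-products about the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient and least-squares straight line
r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"mean of y = {mean_y:.3f}, SD of y = {sd_y:.3f}, median of y = {median_y}")
print(f"r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Running the same numbers through a spreadsheet and comparing the results is a quick way to check that you've set up the worksheet formulas correctly.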

Some other packages to consider

Among the many other commercial statistics packages, you may want to look into one or more of these:

  • Stata: This package provides a broad range of capabilities through user-written routines. It originally used a command-line interface, but recent versions have implemented a graphical shell. It runs on Windows, Mac, Unix, and Linux systems.

  • S Plus: Based on the S programming language, S Plus provides an extensive graphical user interface. It is highly extensible through user-written routines for almost every imaginable statistical procedure.

  • Minitab: Although it emphasizes industrial quality control, this package offers many of the capabilities you need for biological research. It runs on Windows systems.