By Keith McCormick, Jesus Salcedo, Aaron Poh

IBM SPSS Statistics comes in the form of a base system, but you can acquire additional modules to add on to that system. If you’ve installed a full system, you may already have some of these add-ons. Most are integrated and look like integral parts of the base system. Some may be of no interest to you; others could become indispensable.

The Advanced Statistics module

The following is a list of the statistical techniques that are part of the Advanced Statistics module.

  • General Linear Models (GLM)

  • Generalized Linear Models (GENLIN)

  • Linear Mixed Models

  • Generalized Estimating Equations (GEE) Procedures

  • Generalized Linear Mixed Models (GLMM)

  • Survival Analysis Procedures

The Custom Tables module

This has been the most popular module for years, and for good reason. If you need to squeeze a lot of information into a report, you need this module. For instance, if you do survey research and you want to report on the entire survey in tabular form, this module comes to your rescue. Picture your entire dataset summarized in an appendix. It isn’t merely a convenience. If you need this kind of summary, get this module.

The Regression module

The following is a list of the statistical techniques that are part of the Regression module.

  • Multinomial and Binary Logistic Regression

  • Nonlinear Regression (NLR) and Constrained Nonlinear Regression (CNLR)

  • Weighted Least Squares Regression and Two-Stage Least Squares Regression

  • Probit Analysis

The Categories module

The Categories module is designed to enable you to reveal relationships among your categorical data. To help you understand your data, the Categories module uses perceptual mapping, optimal scaling, preference scaling, and dimension reduction. Using these techniques, you can visually interpret the relationships among your rows and columns.

Categories performs its analysis and displays results so you can understand ordinal and nominal data. It uses procedures similar to conventional regression, principal components, and canonical correlation. It performs regression using nominal or ordinal categorical predictor or outcome variables.

The Data Preparation module

Data preparation is no fun. No module will eliminate all the work for the human in this human–computer partnership, but the Data Preparation module is designed to eliminate some of the routine, predictable aspects. It helps you process your rows and columns of data.

For your rows of data, it helps you identify outliers that might distort your data. As for your variables, it helps you identify the best ones, and lets you know that you could improve some by transforming them. It also allows you to create special validation rules to speed up your data checks and avoid a lot of manual work. Finally, it helps you identify patterns in your missing data.

The Decision Trees module

Decision trees are, by far, the most popular and well known of the data mining techniques. In fact, there are entire software products dedicated to this approach. If you aren’t sure if you need to do data mining, but you want to try it out, this would be just about the best way because you already know your way around SPSS Statistics.

The Forecasting module

You can use the Forecasting module to rapidly construct expert time-series forecasts. This module includes statistical algorithms you can use to analyze historical data and predict trends. You can set it up to analyze hundreds of different time series at once instead of running a separate procedure for each one.

The software is designed to handle the special situations that arise in trend analysis. It automatically determines the best-fitting autoregressive integrated moving average (ARIMA) or smoothing model. It automatically tests data for seasonality, intermittency, and missing values. The software detects outliers and prevents them from unduly influencing the results. The graphs generated include confidence intervals and indicate the model’s goodness of fit.

The Missing Values module

The Data Preparation module seems to have missing values covered, but the two modules are actually quite different. The Data Preparation module is really about finding data errors; its validation rules will tell you that a data point just isn’t right. On the other hand, the Missing Values module is focused on when there is no data value at all. It attempts to estimate the missing piece of information using other data that you do have. This process is called imputation, also known as replacing with an educated guess. All kinds of researchers, data miners, and statisticians can benefit, but if you’re a survey researcher, this is really bound to come in handy.
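To make the idea of imputation concrete, here is a minimal Python sketch. It is not how the Missing Values module works internally; it shows only the simplest kind of educated guess, filling a gap with the average of similar cases. The function name `impute_by_group` and the `region` and `income` fields are made up for illustration.

```python
from statistics import mean

def impute_by_group(rows, target, group):
    """Fill missing `target` values with the mean of observed values in the same `group`."""
    # Collect the observed (non-missing) values for each group.
    observed = {}
    for row in rows:
        if row[target] is not None:
            observed.setdefault(row[group], []).append(row[target])
    # Fallback for groups with no observed values (assumes some value is observed).
    overall = mean(v for vals in observed.values() for v in vals)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the original data stays untouched
        if new_row[target] is None:
            vals = observed.get(new_row[group])
            new_row[target] = mean(vals) if vals else overall
        filled.append(new_row)
    return filled

# Hypothetical survey rows; None marks a missing answer.
rows = [
    {"region": "north", "income": 40},
    {"region": "north", "income": 50},
    {"region": "north", "income": None},
    {"region": "south", "income": 30},
    {"region": "south", "income": None},
]
filled = impute_by_group(rows, target="income", group="region")
```

Each missing income is replaced by the average income of respondents in the same region. A group mean like this ignores the uncertainty of the guess, which is one reason dedicated imputation tools go further.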

The Bootstrapping module

Bootstrapping is a technique that involves “resampling” with replacement. The Bootstrapping module allows you to build more stable models using your data by overcoming the effect of outliers and other problems in your data.

Traditional statistics assumes that your data has a particular distribution, but this technique avoids that kind of assumption. The result is a more accurate sense of what’s going on in the population. It is, in a sense, a simple idea, but because it takes a lot of computer horsepower, it’s more popular now than when computers were slower.
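The resampling idea can be sketched in a few lines of Python. This is an illustration of the general technique (a percentile bootstrap), not the Bootstrapping module's own implementation; the function name `bootstrap_ci` and the sample values are assumptions made up for this example.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values from the data *with replacement*,
    # then the statistic is recomputed on that resample.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the middle (1 - alpha) share of the resampled estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Made-up sample with one obvious outlier (95).
sample = [12, 15, 14, 10, 18, 21, 13, 16, 95, 14]
low, high = bootstrap_ci(sample, stat=statistics.median)
```

No distribution is assumed anywhere in the code. The interval comes entirely from recomputing the statistic on thousands of resamples, which is why the technique leans on computer horsepower.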

The Complex Samples module

Sampling is a big part of statistics. A simple random sample is what you usually think of as a sample, like picking names out of a hat. Research is often more complicated than that. The Complex Samples module is about more complicated forms of sampling: two-stage, stratified, and so on.
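As a rough illustration of how a stratified design differs from names out of a hat, the Python sketch below draws the same fraction from each stratum separately. It shows only the sample selection step, not the adjusted analysis that complex designs call for, and the function name `stratified_sample` and the urban/rural population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same sampling fraction independently within each stratum."""
    rng = random.Random(seed)
    # Group the population by stratum.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Sample without replacement inside each stratum (at least one unit each).
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 urban and 20 rural respondents.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
picked = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1)
```

A simple random sample of ten people could easily miss the rural group entirely. Sampling within strata guarantees that each group is represented.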

The Conjoint module

The Conjoint module provides a way for you to determine how each of your product’s attributes affects consumer preference. When you combine conjoint analysis with competitive market product research, it’s easier to zero in on product characteristics that are important to your customers.

With this research, you can determine which product attributes your customers care about, which ones they care about most, and how you can do useful studies of pricing and brand equity. And you can do all this before incurring the expense of bringing new products to market.

The Direct Marketing module

This module is a little different from the others. It’s a bundle of related features in a wizardlike environment. It’s designed to be one-stop shopping for marketers. The main features are recency, frequency, monetary (RFM) analysis, cluster analysis, and profiling.

The Exact Tests module

The Exact Tests module makes it possible to be more accurate in your analysis of small datasets and datasets that contain rare occurrences. It gives you the tools you need for analyzing such data conditions with more accuracy than would otherwise be possible.

When only a small sample size is available, you can use the Exact Tests module to analyze that smaller sample and have more confidence in the results. Here, the idea is to perform more analyses in a shorter period of time. This module allows you to conduct different surveys rather than spend time gathering samples to enlarge the base of the surveys you have.
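To show what “exact” means in practice, here is a Python sketch of one well-known exact test, Fisher’s exact test for a 2×2 table. Rather than relying on a large-sample approximation, it adds up the probabilities of every possible table directly. This is a textbook version for illustration, not the module’s own code, and the function name is an assumption.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )

# A tiny made-up table: 8 cases in total.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With only eight cases, a chi-square approximation would be shaky. The exact calculation stays valid however small the counts are.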

The Neural Networks module

A neural net is a latticelike network of neuronlike nodes, set up within SPSS to act something like the neurons in a living brain. In the Neural Networks module, a training algorithm iteratively adjusts the weights on the connections between nodes so that the network closely matches the actual relationships among the data. The idea is to minimize errors and maximize accurate predictions.
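The phrase “iteratively adjusts the weights” can be illustrated with the smallest possible case: a single neuron trained by gradient descent. The Python sketch below is a teaching example only, far simpler than the multi-node networks the module builds. The function names, the learning rate, and the toy data (the logical OR of two inputs) are all assumptions.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict(weights, bias, x):
    """Output of one sigmoid neuron for input x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_neuron(inputs, targets, epochs=2000, learning_rate=0.5):
    """Fit one sigmoid neuron by batch gradient descent on cross-entropy loss."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            error = predict(weights, bias, x) - y  # prediction minus target
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Nudge each weight a small step in the direction that reduces the error.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Toy data: the neuron should learn the logical OR of its two inputs.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
weights, bias = train_neuron(inputs, targets)
predictions = [round(predict(weights, bias, x)) for x in inputs]
```

Every pass through the data shrinks the gap between prediction and target a little. That is the “minimize errors” loop in miniature.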

Amos

Amos is an interactive interface you can use to build structural equation models. Not a true “module,” it’s standalone software with its own graphical user interface (GUI). Using the diagrams you create with Amos, you can uncover otherwise-hidden relationships and observe graphically how changes in certain values affect other values. You can create a model on nonnumeric data without having to assign numerical scores to the data. You can analyze censored data without having to make assumptions beyond normality.

The Sample Power module

The Sample Power module was developed in conjunction with the late Jacob Cohen. Cohen was a contemporary statistics powerhouse and largely responsible for drawing more attention to Type II error.

The idea is that university training emphasizes avoiding Type I error to such a degree that you forget about the other kind of risk. Type II error is the risk that there is an amazing finding awaiting in the population, but the analysis of the sample data doesn’t reveal it.

The Sample Power module allows you to accurately calculate that risk. The result may prompt you to collect more data to reduce the risk. Or maybe, just maybe, you’ll figure out that you can get by with a little less data and save your organization money during the data collection phase.
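The trade-off between sample size and Type II error can be sketched with a normal approximation. The Python below is a simplified illustration for comparing two group means, not the Sample Power module’s own calculation. The function names are hypothetical, and `effect_size` is assumed to be a standardized mean difference.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the test statistic away from zero.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def required_n_per_group(effect_size, target_power=0.80, alpha=0.05):
    """Smallest per-group sample size that reaches the target power."""
    n = 2
    while power_two_sample(effect_size, n, alpha) < target_power:
        n += 1
    return n

power = power_two_sample(effect_size=0.5, n_per_group=64)
type_ii_risk = 1 - power  # chance of missing a real effect of this size
n_needed = required_n_per_group(effect_size=0.5)
```

Power is simply one minus the Type II risk. Once you can compute it, you can ask the question in reverse and find how many cases you need before you start collecting data.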

The Visualization Designer module

The Visualization Designer module doesn’t get as much attention as it deserves. Even veteran SPSS users don’t seem to know that much about it. Graphboard Template Chooser is one of the graphing methods in SPSS, and this module is actually a sibling product to Graphboard in a sense.

If you want to create really fancy graphs in SPSS, you have two choices: Learn how to program Graphics Production Language (GPL) or use the Visualization Designer module. GPL isn’t really that bad, but for some folks, writing code just isn’t their thing.

The Visualization Designer module allows you to create all kinds of graphics that aren’t possible otherwise. When you’re done, you can add new “templates” to your copy of SPSS and to those of your colleagues, too. The new templates then show up as new chart types in the Graphboard Template Chooser.