If you're using a copy of SPSS at work or in a university setting that someone else installed, you might have some of these add-ons without realizing it because most are so fully integrated into the menus that they look like integral parts of the base system. If you notice that your menus are shorter or longer than someone else’s copy of SPSS, this is probably due to add-on modules.

Some add-ons might be of no interest to you, while others could become indispensable. Note that if you have a trial copy of SPSS, it likely has all the modules, including those that you might lose access to when you acquire your own copy. This article introduces you to the modules that can be added to SPSS and what they do; refer to the documentation that comes with each module for a full tutorial.

You'll likely come across the names *IBM SPSS Amos* and *IBM SPSS Modeler*. Although *SPSS* appears in the names, you purchase these programs separately, not as add-ons. Amos is used for Structural Equation Modeling (SEM) and SPSS Modeler is a predictive analytics and machine learning workbench.

## The Advanced Statistics module

Following is a list of the statistical techniques that are part of the Advanced Statistics module:

- General linear models (GLM)
- Generalized linear models (GENLIN)
- Linear mixed models
- Generalized estimating equations (GEE) procedures
- Generalized linear mixed models (GLMM)
- Survival analysis procedures

The key point is that the Advanced Statistics module contains specialized techniques that you need when your data doesn’t meet the assumptions of plain-vanilla regression and analysis of variance (ANOVA). Most of these techniques have an ANOVA flavor. Survival analysis is so-called time-to-event modeling, such as estimating time to death after diagnosis.

## The Custom Tables module

The Custom Tables module has been the most popular module for years, and for good reason. If you need to squeeze a lot of information into a report, you need this module. For instance, if you do survey research and want to report on the entire survey in tabular form, the Custom Tables module can come to your rescue because it allows you to present a vast amount of information easily.

Get a free trial copy of SPSS Statistics with all the modules, and force yourself to spend a solid day using the modules you don’t have. See if any aspect of reporting you’re already doing could be done faster with the Custom Tables module. Reproduce a recent report, and see how much time you might save.

In the following figure, you see a simple Frequency table displaying two variables. Note that the categories for both variables are the same.

The following table shows the same data, but here the table was created using the SPSS Custom Tables module and is a much better table.

If you’re producing the table for yourself, presentation may not matter. But if you’re putting the table in a report that will be sent to others, you need the SPSS Custom Tables module. By the way, with practice, it takes only a few seconds to make the custom version, and you can use Syntax to further customize the table!

Starting in version 27, the Custom Tables module is part of the standard edition.

## The Regression module

The following is a list of the statistical techniques that are part of the Regression module:

- Multinomial and binary logistic regression
- Nonlinear regression (NLR) and constrained nonlinear regression (CNLR)
- Weighted least squares regression and two-stage least squares regression
- Probit analysis

## The Categories module

The Categories module enables you to reveal relationships among your categorical data. To help you understand your data, the Categories module uses perceptual mapping, optimal scaling, preference scaling, and dimension reduction. Using these techniques, you can visually interpret the relationships among your rows and columns.

The Categories module performs its analysis on ordinal and nominal data. It uses procedures similar to conventional regression, principal components, and canonical correlation. It performs regression using nominal or ordinal categorical predictor or outcome variables.

The procedures of the Categories module make it possible to perform statistical operations on categorical data:

- Using the scaling procedures, you can assign units of measurement and zero-points to your categorical data, which gives you access to new groups of statistical functions because you can analyze variables using mixed measurement levels.
- Using correspondence analysis, you can numerically evaluate similarities among nominal variables and summarize your data according to components you select.
- Using nonlinear canonical correlation analysis, you can collect variables of different measurement levels into sets of their own, and then analyze the sets.

- **Perceptual map:** A high-resolution summary chart that serves as a graphic display of similar variables or categories. A perceptual map gives you insights into relationships among more than two categorical variables.
- **Biplot:** A summary chart that makes it possible to look at the relationships among products, customers, and demographic characteristics.

## The Data Preparation module

Let’s face it: Data preparation is no fun. We’ll take all the help we can get. No module will eliminate all the work for the human in this human–computer partnership, but the Data Preparation module will eliminate some routine, predictable aspects.

This module helps you process rows and columns of data. For rows of data, it helps you identify outliers that might distort your data. As for variables, it helps you identify the best ones, and lets you know that you could improve some by transforming them. It also enables you to create special validation rules to speed up your data checks and avoid a lot of manual work. Finally, it helps you identify patterns in your missing data.
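
To make the idea of validation rules concrete, here is a plain-Python sketch (not SPSS’s actual validation feature or syntax) that encodes two rules as predicates and flags the rows that violate them. The field names and rules are hypothetical examples.

```python
# Illustrative sketch of rule-based data validation: each rule is a named
# predicate, and every (row, rule) violation is collected for review.

def validate(rows, rules):
    """Return a list of (row_index, rule_name) pairs for every violation."""
    violations = []
    for i, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                violations.append((i, name))
    return violations

# Hypothetical survey records: age and a 1-5 satisfaction rating.
rows = [
    {"age": 34, "rating": 4},
    {"age": -2, "rating": 5},   # impossible age
    {"age": 51, "rating": 9},   # rating outside the 1-5 scale
]

rules = {
    "age_in_range": lambda r: 0 <= r["age"] <= 120,
    "rating_on_scale": lambda r: 1 <= r["rating"] <= 5,
}

print(validate(rows, rules))  # → [(1, 'age_in_range'), (2, 'rating_on_scale')]
```

The payoff is the same one the module offers: once a rule is written down, checking a thousand rows costs no more effort than checking three.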

Starting in version 27, the Data Preparation and Bootstrapping modules are part of the base edition.

## The Decision Trees module

Decision trees are, by far, the most popular and well-known data mining technique. In fact, entire software products are dedicated to this approach. If you aren’t sure whether you need to do data mining but you want to try it out, using the Decision Trees module would be one of the best ways to start because you already know your way around SPSS Statistics. The Decision Trees module doesn’t have all the features of the decision trees in SPSS Modeler (an entire software package dedicated to data mining), but there is plenty here to give you a good start.

What are decision trees? Well, the idea is that you have something you want to predict (the target variable) and lots of variables that could possibly help you do that, but you don’t know which ones are most important. SPSS indicates which variables are most important and how the variables interact, and helps you predict the target variable in the future.

SPSS supports four of the most popular decision tree algorithms: CHAID, Exhaustive CHAID, C&RT, and QUEST.
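
The heart of a CART-style tree (what SPSS calls C&RT) is choosing, at each node, the split that most reduces impurity. Here is a minimal Python sketch of that one step, using Gini impurity on a single numeric predictor; the data and variable names are made up for illustration, and real implementations handle many predictors, stopping rules, and pruning.

```python
# Sketch of CART-style split selection: try each midpoint between sorted
# predictor values and keep the split with the lowest weighted Gini impurity.

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best binary split of xs."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2   # candidate threshold
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Toy target: customers "buy" when the predictor exceeds roughly 30.
ages = [22, 25, 28, 31, 35, 40]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, bought))  # → (29.5, 0.0) — a perfect split
```

A full tree simply repeats this search recursively on each side of the chosen split, which is how the module surfaces the most important variables first.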

## The Forecasting module

You can use the Forecasting module to rapidly construct expert time-series forecasts. This module includes statistical algorithms for analyzing historical data and predicting trends. You can set it up to analyze hundreds of different time series at once instead of running a separate procedure for each one.

The software is designed to handle the special situations that arise in trend analysis. It automatically determines the best-fitting autoregressive integrated moving average (ARIMA) or exponential smoothing model. It automatically tests data for seasonality, intermittency, and missing values. The software detects outliers and prevents them from unduly influencing the results. The generated graphs include confidence intervals and indicate the model’s goodness of fit.
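
To give a feel for the exponential smoothing family the expert modeler chooses from, here is a minimal Python sketch of *simple* exponential smoothing. The smoothing weight `alpha` is fixed by hand here and the sales figures are invented; SPSS estimates the parameters from the data and considers far richer models (trend, seasonality, ARIMA).

```python
# Simple exponential smoothing: each new observation nudges the running
# "level" by a fraction alpha; the final level is the one-step forecast.

def ses_forecast(series, alpha=0.5):
    """Return the one-step-ahead forecast after smoothing the whole series."""
    level = series[0]                      # initialize at the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

sales = [100, 110, 105, 115, 120]
print(round(ses_forecast(sales, alpha=0.5), 2))  # → 115.0
```

With `alpha` near 1 the forecast chases the most recent value; near 0 it averages over the whole history, which is exactly the trade-off the module tunes automatically.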

As you gain experience at forecasting, the Forecasting module gives you more control over every parameter when you’re building your data model. You can use the expert modeler in the Forecasting module to recommend starting points or to check calculations you’ve done by hand.

In addition, an algorithm called Temporal Causal Modeling (TCM) attempts to discover key causal relationships in time-series data by including only inputs that have a causal relationship with the target. This differs from traditional time-series modeling, where you must explicitly specify the predictors for a target series.

## The Missing Values module

The Data Preparation module seems to have missing values covered, but the Missing Values module and the Data Preparation module are quite different. The Data Preparation module is about finding data errors; its validation rules will tell you whether a data point just isn’t right. The Missing Values module, on the other hand, is focused on when there is no data value. It attempts to estimate the missing piece of information using other data you do have. This process is called *imputation,* or replacing values with an educated guess. All kinds of data miners, statisticians, and researchers — especially survey researchers — can benefit from the Missing Values module.
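
The simplest flavor of imputation is replacing a missing value with the mean of the observed values, sketched below in plain Python with made-up income figures. The Missing Values module uses far more sophisticated, model-based estimates, but the goal is the same: fill the hole with an educated guess instead of discarding the whole case.

```python
# Mean imputation: replace each missing value (None) with the average of
# the values that were actually observed.

def mean_impute(values):
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

incomes = [42000, None, 51000, 39000, None]
print(mean_impute(incomes))  # → [42000, 44000.0, 51000, 39000, 44000.0]
```

Mean imputation understates the variability in your data, which is one reason the module’s model-based methods exist.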

## The Bootstrapping module

Hang on tight because we’re going to get a little technical. *Bootstrapping* is a technique that involves resampling with replacement. The Bootstrapping module chooses a case at random, makes notes about it, replaces it, and chooses another. In this way, it’s possible to choose a case more than once or not at all. The net result is another version of your data that is similar but not identical. If you do this 1,000 times (the default), you can do some powerful things indeed.
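
The resampling loop just described is short enough to sketch in plain Python. This is a conceptual illustration, not the module’s implementation: it draws 1,000 same-size samples with replacement and collects the mean of each, and the spread of those resampled means is what gives you bootstrap confidence intervals.

```python
# Bootstrap resampling: draw many same-size samples *with replacement* and
# record a statistic (here, the mean) for each resample.

import random

def bootstrap_means(data, n_resamples=1000, seed=42):
    rng = random.Random(seed)               # seeded for reproducibility
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))   # with replacement
        means.append(sum(resample) / len(resample))
    return sorted(means)

scores = [4, 8, 15, 16, 23, 42]
means = bootstrap_means(scores)
# A rough 95% interval for the mean: the 2.5th and 97.5th percentiles.
print(means[25], means[975])
```

Because the interval comes from the resampled means themselves, no assumption about the shape of the population distribution is needed, which is the point made above.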

The Bootstrapping module allows you to build more stable models by overcoming the effect of outliers and other problems in your data. Traditional statistics assumes that your data has a particular distribution, but this technique avoids that assumption. The result is a more accurate sense of what’s going on in the population. Bootstrapping, in a sense, is a simple idea, but because bootstrapping takes a lot of computer horsepower, it’s more popular now than when computers were slower.

Bootstrapping is a popular technique outside SPSS as well, so you can find articles on the web about the concept. The Bootstrapping module lets you apply this powerful concept to your data in SPSS Statistics.

## The Complex Samples module

Sampling is a big part of statistics. A *simple random sample* is what we usually think of as a sample — like choosing names out of a hat. The hat is your population, and the scraps of paper you choose belong to your sample. Each slip of paper has an equal chance of being chosen. Research is often more complicated than that. The Complex Samples module is about more complicated forms of sampling: two-stage, stratified, and so on.

Most often, survey researchers need this module, although many kinds of experimental researchers may benefit from it too. The Complex Samples module helps you design the data collection, and then takes the design into account when calculating your statistics. Nearly all statistics in SPSS are calculated with the assumption that the data is a simple random sample. Your calculations can be distorted when this assumption is not met.
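
One of the designs mentioned above, proportionate stratified sampling, can be sketched in a few lines of plain Python: each stratum contributes units in proportion to its share of the population. The strata and sizes below are hypothetical, and a real design (and the module) would also carry the design weights into the analysis.

```python
# Proportionate stratified sampling: sample within each stratum, sized in
# proportion to that stratum's share of the population.

import random

def stratified_sample(strata, total_n, seed=7):
    """strata: dict mapping stratum name -> list of sampling units."""
    rng = random.Random(seed)
    pop = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(total_n * len(units) / pop)       # proportional allocation
        sample[name] = rng.sample(units, k)          # without replacement
    return sample

strata = {
    "urban": list(range(800)),   # hypothetical sampling frame, 80% of population
    "rural": list(range(200)),   # 20% of population
}
picked = stratified_sample(strata, total_n=50)
print(len(picked["urban"]), len(picked["rural"]))  # → 40 10
```

Stratifying guarantees the rural group is represented at its true 20% share, which a simple random draw of 50 names from the hat would only approximate.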

## The Conjoint module

The Conjoint module provides a way for you to determine how each of your product’s attributes affects consumer preference. When you combine conjoint analysis with competitive market product research, it’s easier to zero in on product characteristics that are important to your customers.

With this research, you can determine which product attributes your customers care about, which ones they care about most, and how you can do useful studies of pricing and brand equity. And you can do all this *before* incurring the expense of bringing new products to market.

## The Direct Marketing module

The Direct Marketing module is a little different from the others. It’s a bundle of related features in a wizardlike environment. The module is designed to be one-stop shopping for marketers. The main features are recency, frequency, and monetary (RFM) analysis, cluster analysis, and profiling:

- **RFM analysis:** RFM analysis reports back to you about how recently, how often, and how much your customers spent on your business. Obviously, customers who are currently active, spend a lot, and spend often are your best customers.
- **Cluster analysis:** Cluster analysis is a way of segmenting your customers into distinct groups. Typically, you use this approach to match different marketing campaigns to different customers. For example, a cruise line may try different covers on the travel catalog going out to customers, with the adventurous types getting Alaska or Norway on the cover, and the umbrella-drink crowd getting pictures of the Caribbean.
- **Profiling:** Profiling helps you see which customer characteristics are associated with specific outcomes. In this way, you can calculate the propensity score that a particular customer will respond to a specific campaign.

Virtually all these features can be found in other areas of SPSS, but the wizardlike environment of the Direct Marketing module makes it easy for marketing analysts to produce useful results when they don’t have extensive training in the statistics behind the techniques.
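
The bookkeeping behind RFM is straightforward enough to sketch in plain Python: from a transaction log, compute each customer’s recency (time since last purchase), frequency, and monetary total. The log, customer names, and day numbering below are invented; the module goes further and bins these values into scores.

```python
# RFM from a raw transaction log: for each customer, track the most recent
# purchase day, the purchase count, and the total amount spent.

def rfm(transactions, today):
    """transactions: list of (customer, day_number, amount) tuples."""
    last, freq, money = {}, {}, {}
    for cust, day, amount in transactions:
        last[cust] = max(last.get(cust, day), day)
        freq[cust] = freq.get(cust, 0) + 1
        money[cust] = money.get(cust, 0) + amount
    # recency = days since last purchase (smaller is better)
    return {c: (today - last[c], freq[c], money[c]) for c in last}

log = [("ann", 90, 40.0), ("ann", 118, 25.0), ("bob", 30, 300.0)]
print(rfm(log, today=120))
# → {'ann': (2, 2, 65.0), 'bob': (90, 1, 300.0)}
```

Ranking customers on these three numbers is what lets you find people like ann, who bought recently and often, even though bob has spent more in total.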

## The Exact Tests module

The Exact Tests module makes it possible to be more accurate in your analysis of small datasets and datasets that contain rare occurrences. It gives you the tools you need to analyze such data conditions with more accuracy than would otherwise be possible.

When only a small sample size is available, you can use the Exact Tests module to analyze the smaller sample and have more confidence in the results. The idea is to perform more analyses in a shorter period of time: you can conduct additional surveys rather than spend time gathering larger samples.

The processes you use, and the forms of the results, are the same as those in the base SPSS system, but the internal algorithms are tuned to work with smaller datasets. The Exact Tests module provides more than 30 tests covering all the nonparametric and categorical tests you normally use for larger datasets. Included are one-sample, two-sample, and k-sample tests with independent or related samples, goodness-of-fit tests, tests of independence, and measures of association.

## The Neural Networks module

A *neural net* is a latticelike network of neuronlike nodes, set up within SPSS to act something like the neurons in a living brain. The connections between these nodes have associated *weights* (degrees of relative effect), which are adjustable. When you adjust the weight of a connection, the network is said to learn.

In the Neural Networks module, a training algorithm iteratively adjusts the weights to closely match the actual relationships among the data. The idea is to minimize errors and maximize accurate predictions. The computational neural network has one layer of neurons for inputs and another for outputs, with one or more hidden layers between them. The neural network can be used with other statistical procedures to provide clearer insight.
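
To make “learning by adjusting weights” concrete, here is a deliberately tiny Python sketch: a single neuron with a sigmoid activation, trained by gradient descent to act as an AND gate. This is not the module’s algorithm — SPSS trains multilayer networks with hidden layers — but the weight-update loop is the same idea in miniature.

```python
# One sigmoid neuron learning AND: each pass nudges the weights in the
# direction that reduces the squared error between output and target.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1 = w2 = b = 0.0        # start with no knowledge: all weights zero
lr = 1.0                 # learning rate

for _ in range(5000):    # training epochs
    for (x1, x2), target in data:
        out = sigmoid(w1 * x1 + w2 * x2 + b)
        grad = (out - target) * out * (1 - out)   # d(error)/d(pre-activation)
        w1 -= lr * grad * x1
        w2 -= lr * grad * x2
        b  -= lr * grad

preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in data]
print(preds)  # → [0, 0, 0, 1]
```

After training, the weights encode the AND relationship; stacking many such units in hidden layers is what lets a full network capture relationships no single neuron could.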

Using the familiar SPSS interface, you can mine your data for relationships. After selecting a procedure, you specify the dependent variables, which may be any combination of continuous and categorical types. To prepare for processing, you lay out the neural network architecture, including the computational resources you want to apply. To complete preparation, you choose what to do with the output:

- List the results in tables.
- Graphically display the results in charts.
- Place the results in temporary variables in the dataset.
- Export models in XML-formatted files.