SPSS Statistics Workbook For Dummies
This Cheat Sheet is a handy reference to some of the most commonly used data preparation techniques in SPSS Statistics. It also describes the types of graphs you can create, given the level of measurement of the variables, and lists some of the questions you should ask yourself when first looking at a data set in SPSS Statistics.

Data preparation in SPSS Statistics

Data preparation is an integral part of every research project and is often the most time-consuming activity in a project. Different projects will require different types of data preparation, so there is no prescribed sequence in which data preparation tasks should be undertaken.

The following table lists some of the most common data preparation tasks, along with the SPSS submenu that will help you with each activity.

Data and Transform Menu Procedures

| Activity | Submenu(s) | Useful For |
| Selecting a subset of cases | Select Cases or Split File | Running an analysis on only a portion of the data (such as customers who live in a particular region) |
| Identifying unusual cases | Identify Unusual Cases or Sort Cases | Sorting cases in ascending or descending order based on the values of one or more variables to view extreme cases |
| Removing duplicate cases | Identify Duplicate Cases | Identifying an individual who appears several times in the same data set |
| Recoding data values | Recode into Different Variables or Recode into Same Variables (not recommended) | Collapsing a 7-point customer satisfaction rating into three responses (negative, neutral, or positive) after data inspection |
| Combining data files | Merge Files > Add Cases or Merge Files > Add Variables | Bringing together data that is kept in different locations before data analysis can begin |
| Creating new variables | Compute Variable | Extracting additional information or insight from the variables originally in the data set |
| Counting occurrences | Count Values within Cases | Counting how often something of interest occurs |
| Calculating with date and time variables | Date and Time Wizard | Calculating the amount of time that has passed between time points |
| Transforming string to numeric values | Automatic Recode | Modifying string variables so they can be used in more analyses |
| Creating groups from continuous data | Visual Binning | Creating groups out of scale variables (for example, income groups from income) |
| Calculating summaries across cases | Aggregate | Creating the appropriate level of analysis for the data (for example, summarizing transactional data so it can be analyzed at the customer level) |
| Changing the structure of the data file | Restructure or Transpose | Making variables into cases or cases into variables |
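To make two of these tasks concrete, here is a minimal Python sketch of the logic behind recoding into a different variable and aggregating to the customer level. It is not SPSS syntax and does not reproduce what the SPSS dialogs do internally. The function names, field names, sample data, and the cut points used for the three satisfaction groups are all assumptions made for illustration.

```python
from collections import defaultdict


def recode_satisfaction(score):
    """Collapse a 1-7 rating into three groups.

    The cut points (1-3, 4, 5-7) are an assumption for illustration;
    choose real cut points only after inspecting the data.
    """
    if score not in range(1, 8):
        raise ValueError(f"expected an integer from 1 to 7, got {score!r}")
    if score <= 3:
        return "negative"
    if score == 4:
        return "neutral"
    return "positive"


def aggregate_by_customer(transactions):
    """Summarize transaction rows into one summary row per customer."""
    totals = defaultdict(lambda: {"n_purchases": 0, "total_spent": 0.0})
    for row in transactions:
        summary = totals[row["customer_id"]]
        summary["n_purchases"] += 1
        summary["total_spent"] += row["amount"]
    return dict(totals)


# Hypothetical survey data. The recoded value goes into a NEW field,
# so the original 7-point rating is preserved.
survey = [
    {"id": 1, "satisfaction": 2},
    {"id": 2, "satisfaction": 4},
    {"id": 3, "satisfaction": 7},
]
for case in survey:
    case["satisfaction_group"] = recode_satisfaction(case["satisfaction"])

# Hypothetical transactional data, one row per purchase.
transactions = [
    {"customer_id": "A", "amount": 10.0},
    {"customer_id": "B", "amount": 20.0},
    {"customer_id": "A", "amount": 5.5},
]
customers = aggregate_by_customer(transactions)
```

Writing the recoded value to a new field mirrors the reason recoding into the same variable is not recommended: overwriting the original values makes a mistake in the recode hard to undo.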

 

Effects of measurement level

The level of measurement of a variable determines the appropriate summary statistics and graphs to describe data. The following table summarizes the most common summary measures and graphs for each measurement level.

Level of Measurement

|  | Nominal | Ordinal | Scale |
| Definition | Unordered categories | Ordered categories | Numeric values |
| Examples | Gender, geographic location, job category | Satisfaction ratings, income groups, ranking of preferences | Number of purchases, cholesterol level, age |
| Measures of central tendency | Mode | Median | Median or mean |
| Measures of dispersion | None | Min/max/range | Min/max/range, standard deviation/variance |
| Graph | Pie or bar | Bar | Histogram or box-and-whisker plot |
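The table above can be read as a rule for choosing summary statistics. A minimal Python sketch of that rule, using only the standard library, might look like the following. The function name `summarize` and the sample values are hypothetical, and ordinal values are assumed to be stored as numeric codes (for example, 1 to 5).

```python
import statistics


def summarize(values, level):
    """Return the summary measures the table lists for a measurement level."""
    if level == "nominal":
        # Unordered categories: only the mode is meaningful.
        return {"mode": statistics.mode(values)}
    if level == "ordinal":
        # Ordered categories: median_low always returns an observed code.
        return {
            "median": statistics.median_low(values),
            "min": min(values),
            "max": max(values),
            "range": max(values) - min(values),
        }
    if level == "scale":
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
            "range": max(values) - min(values),
            "stdev": statistics.stdev(values),        # sample standard deviation
            "variance": statistics.variance(values),  # sample variance
        }
    raise ValueError(f"unknown measurement level: {level!r}")


# Hypothetical examples for each level.
region_summary = summarize(["north", "south", "north"], "nominal")
rating_summary = summarize([1, 2, 2, 3, 5], "ordinal")
purchase_summary = summarize([2, 4, 4, 4, 5, 5, 7, 9], "scale")
```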

 

 

The prior table showed how level of measurement determines the type of graph you can use to display individual variables. The following table shows which types of graphs are appropriate for different variable combinations.

Graphs for Variable Combinations

|  | Categorical Dependent | Scale Dependent |
| Categorical Independent | Clustered bar or paneled pie | Error bar or boxplot |
| Scale Independent | Error bar or boxplot | Scatter plot |
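The same table can be expressed as a small lookup. The sketch below is illustrative only: `CHART_GUIDE` and `suggest_chart` are hypothetical names, and treating nominal and ordinal variables alike as "categorical" is an assumption based on the Level of Measurement table.

```python
# Keys are (independent variable, dependent variable) measurement families.
CHART_GUIDE = {
    ("categorical", "categorical"): "clustered bar or paneled pie",
    ("categorical", "scale"): "error bar or boxplot",
    ("scale", "categorical"): "error bar or boxplot",
    ("scale", "scale"): "scatter plot",
}

# Assumption: nominal and ordinal variables both count as categorical.
CATEGORICAL_LEVELS = {"nominal", "ordinal", "categorical"}


def _family(level):
    """Map a measurement level to the family used in CHART_GUIDE."""
    level = level.lower()
    if level in CATEGORICAL_LEVELS:
        return "categorical"
    if level == "scale":
        return "scale"
    raise ValueError(f"unknown measurement level: {level!r}")


def suggest_chart(independent_level, dependent_level):
    """Return the graph types the table lists for a pair of variables."""
    return CHART_GUIDE[(_family(independent_level), _family(dependent_level))]


# Example: job category (nominal) against cholesterol level (scale).
suggestion = suggest_chart("nominal", "scale")
```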

Reviewing the data file for the first time in SPSS Statistics

After you have your data, you are ready to start exploring it and becoming familiar with its characteristics. Start by reviewing the distribution of each variable and checking the number of valid cases.

When you have a categorical variable, it’s important to know the number of unique values and to make sure there are no more or fewer categories than expected. It’s also important to determine how the cases are distributed among the categories of the variable.

Look for categories that have either very few or very many cases. Either situation could cause problems when analyzing the data, so you may need to exclude those values or combine them with other values (but only if it makes sense) to build a valid analysis.
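The checks described for categorical variables can be sketched in a few lines of Python. This is a rough illustration, not an SPSS procedure: the function name `category_report`, the 5% and 80% thresholds for "very few" and "very many" cases, and the sample data are assumptions. Missing values are assumed to be stored as `None`.

```python
from collections import Counter


def category_report(values, expected, min_share=0.05, max_share=0.80):
    """Check valid cases, category coverage, and category sizes.

    min_share and max_share are arbitrary illustrative thresholds.
    """
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    n_valid = len(valid)
    return {
        "n_valid": n_valid,
        "n_missing": len(values) - n_valid,
        "counts": dict(counts),
        # Categories present in the data but not expected (often typos).
        "unexpected": sorted(set(counts) - set(expected)),
        # Expected categories that never occur in the data.
        "absent": sorted(set(expected) - set(counts)),
        "sparse": sorted(c for c, k in counts.items() if k / n_valid < min_share),
        "dominant": sorted(c for c, k in counts.items() if k / n_valid > max_share),
    }


# Hypothetical region variable with one typo and one missing value.
regions = ["north"] * 12 + ["south"] * 10 + ["nroth", None]
report = category_report(regions, expected={"north", "south", "east"})
```

Here the misspelled category shows up both as unexpected and as sparse, which is the kind of value you would correct or combine with another category before analysis.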

For continuous variables, check for unusual distributions such as bimodality or a high degree of skewness. Also look at summary statistics and note if there are any deviations from what you expect (lower minimums, higher maximums, different means, or more or less variation in the data values).
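For continuous variables, a similar sketch can compute the summary statistics and flag values outside the range you expect. The function name `describe_scale`, its parameters, and the sample ages are hypothetical. The skewness here is the simple moment-based version, which may differ slightly from the bias-adjusted figure a statistics package reports, and the sketch does not attempt to detect bimodality.

```python
import statistics


def describe_scale(values, expected_min=None, expected_max=None):
    """Summarize a scale variable and list values outside the expected range."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    # Second and third central moments (population form).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    # Positive skewness: long right tail; negative: long left tail.
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    report = {
        "n": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "stdev": statistics.stdev(values),
        "skewness": skewness,
        "below_expected_min": [],
        "above_expected_max": [],
    }
    if expected_min is not None:
        report["below_expected_min"] = [x for x in values if x < expected_min]
    if expected_max is not None:
        report["above_expected_max"] = [x for x in values if x > expected_max]
    return report


# Hypothetical ages; 150 is likely a data-entry error.
ages = [23, 31, 35, 42, 150]
age_report = describe_scale(ages, expected_min=18, expected_max=100)
```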

Finally, you can easily spot potential problems in data that otherwise appears valid by asking a series of questions:

  • Does the distribution of the variable make sense?
  • Is this what you were expecting?
  • Do you notice any errors?
  • Do you notice any unusual values?
  • Will you have any potential problems when analyzing this data?

About This Article

This article is from the book SPSS Statistics Workbook For Dummies.

About the book authors:

Jesus Salcedo is an independent statistical and data-mining consultant who has been using SPSS products for more than 25 years. He has written numerous SPSS courses and trained thousands of users. Keith McCormick has been all over the world training and consulting in all things SPSS, statistics, and data mining. He now authors courses on the LinkedIn Learning platform and coaches executives on how to effectively manage their analytics teams.
