Data preparation in SPSS Statistics
Data preparation is an integral part of every research project and is often its most time-consuming activity. Different projects require different types of data preparation, so there is no prescribed sequence in which the tasks should be undertaken.
The following table lists some of the most common data preparation tasks, along with the SPSS submenu that helps with each activity.
Data and Transform Menu Procedures
Activity  Submenu(s)  Useful For 
Selecting a subset of cases  Select Cases or Split File  Running an analysis on only a portion of the data (such as customers who live in a particular region) 
Identifying unusual cases  Identify Unusual Cases or Sort Cases  Sorting cases in ascending or descending order based on the values of one or more variables to view extreme cases 
Removing duplicate cases  Identify Duplicate Cases  Identifying an individual who appears several times in the same dataset 
Recoding data values  Recode into Different Variables or Recode into Same Variables (not recommended, because it overwrites the original values)  Collapsing a 7-point customer satisfaction scale into three responses (negative, neutral, or positive) after data inspection 
Combining data files  Merge Files > Add Cases or Merge Files > Add Variables  Bringing together data that is kept in different locations before data analysis can begin 
Creating new variables  Compute Variable  Extracting additional information or insight from the variables originally in the dataset 
Counting occurrences  Count Values within Cases  Counting how often something of interest occurs 
Calculating with date and time variables  Date and Time Wizard  Calculating the amount of time that has passed between time points 
Transforming string to numeric values  Automatic Recode  Modifying string variables so they can be used in more analyses 
Creating groups from continuous data  Visual Binning  Creating groups out of scale variables (such as income groups from income) 
Calculating summaries across cases  Aggregate  Creating the appropriate level of analysis for the data (such as summarizing transactional data so it can be analyzed at the customer level) 
Changing the structure of the data file  Restructure or Transpose  Making variables into cases or cases into variables 
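Two of the tasks in the table, recoding and aggregating, can be sketched outside SPSS to show what they do to the data. The Python below is only an illustration, not SPSS syntax: the function names, the cut points for the 7-point scale (1 to 3 negative, 4 neutral, 5 to 7 positive), and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def recode_satisfaction(score):
    """Collapse a 7-point rating into three groups.

    The cut points (1-3, 4, 5-7) are an assumption; choose yours
    after inspecting how responses are actually distributed.
    """
    if not 1 <= score <= 7:
        raise ValueError(f"score out of range: {score}")
    if score <= 3:
        return "negative"
    if score == 4:
        return "neutral"
    return "positive"


def aggregate_by_customer(transactions):
    """Sum purchase amounts so there is one total per customer."""
    totals = defaultdict(float)
    for customer_id, amount in transactions:
        totals[customer_id] += amount
    return dict(totals)


# Hypothetical survey ratings, recoded into a new list so the
# original values are kept (the "different variable" approach).
ratings = [1, 4, 7, 5, 2]
recoded = [recode_satisfaction(r) for r in ratings]

# Hypothetical transactional data: (customer id, purchase amount).
transactions = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]
customer_totals = aggregate_by_customer(transactions)
```

Writing the recoded values to a new list rather than over the original mirrors the reason Recode into Different Variables is preferred: the source ratings remain available if the grouping needs to change later.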
Effects of measurement level
The level of measurement of a variable determines the appropriate summary statistics and graphs to describe data. The following table summarizes the most common summary measures and graphs for each measurement level.
Level of Measurement
Nominal  Ordinal  Scale  
Definition  Unordered categories  Ordered categories  Numeric values 
Examples  Gender, geographic location, job category  Satisfaction ratings, income groups, ranking of preferences  Number of purchases, cholesterol level, age 
Measures of central tendency  Mode  Median  Median or mean 
Measures of dispersion  None  Min/max/range  Min/max/range, standard deviation/variance 
Graph  Pie or bar  Bar  Histogram or box and whiskers plot
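
As a rough illustration of the table above, the following Python sketch picks summary measures according to a variable's measurement level. It is not an SPSS feature: the `summarize` function, its `level` argument, and the sample data are hypothetical, and ordinal values are assumed to be stored as numeric codes.

```python
import statistics


def summarize(values, level):
    """Return the summary measures the table suggests for each level."""
    if level == "nominal":
        # Unordered categories: only the most frequent value is meaningful.
        return {"mode": statistics.mode(values)}
    if level == "ordinal":
        # Ordered categories (assumed numerically coded): median and extremes.
        return {
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    if level == "scale":
        # Numeric values: mean and standard deviation are also appropriate.
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
    raise ValueError(f"unknown measurement level: {level}")


job_category = ["clerical", "manager", "clerical"]  # nominal
satisfaction = [1, 2, 3, 4, 5]                      # ordinal codes
purchases = [2, 4, 4, 4, 5, 5, 7, 9]                # scale

nominal_summary = summarize(job_category, "nominal")
ordinal_summary = summarize(satisfaction, "ordinal")
scale_summary = summarize(purchases, "scale")
```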

The prior table showed how level of measurement determines the type of graph you can use to display individual variables. The following table shows which types of graphs are appropriate for different variable combinations.
Graphs for Variable Combinations
Categorical Dependent  Scale Dependent  
Categorical Independent  Clustered bar or paneled pie  Error bar or boxplot 
Scale Independent  Error bar or boxplot  Scatter plot 
Reviewing the data file for the first time in SPSS Statistics
After you have your data, you are ready to start exploring it and becoming familiar with its characteristics. Start by reviewing the distribution of each variable and checking the number of valid cases.
When you have a categorical variable, it’s important to know the number of unique values and to make sure there are no more or fewer categories than expected. It’s also important to determine how the cases are distributed among the categories of the variable.
Look for categories that have either very few or very many cases. Either situation could cause problems when analyzing the data, so you may need to exclude those values or combine them with other values (but only if it makes sense) to build a valid analysis.
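The category checks described above can be sketched as a small frequency report. This Python example is illustrative only and sits outside SPSS: the `check_categories` function, the 5% and 80% thresholds, and the region data are assumptions, not recommended cutoffs.

```python
from collections import Counter


def check_categories(values, expected, low=0.05, high=0.80):
    """Count cases per category and flag anything that looks suspicious.

    `low` and `high` are arbitrary example thresholds for the share of
    cases that makes a category look sparse or dominant.
    """
    n = len(values)
    if n == 0:
        raise ValueError("no cases to check")
    counts = Counter(values)
    return {
        "counts": dict(counts),
        # Categories in the data that were not expected (often typos).
        "unexpected": sorted(set(counts) - set(expected)),
        # Expected categories that never occur in the data.
        "missing": sorted(set(expected) - set(counts)),
        # Categories holding very few or very many of the cases.
        "sparse": sorted(c for c, k in counts.items() if k / n < low),
        "dominant": sorted(c for c, k in counts.items() if k / n > high),
    }


# Hypothetical region variable with one misspelled value.
regions = ["north"] * 45 + ["south"] * 50 + ["west"] * 4 + ["nrth"]
report = check_categories(regions, expected={"north", "south", "east", "west"})
```

Here the misspelled "nrth" shows up as both unexpected and sparse, which is the kind of value you would correct or combine with another category before analysis.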
For continuous variables, check for unusual distributions such as bimodality or a high degree of skewness. Also look at summary statistics and note if there are any deviations from what you expect (lower minimums, higher maximums, different means, or more or less variation in the data values).
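One way to picture the check for skewness is to compute summary statistics alongside a simple skewness coefficient. The Python below is a minimal sketch under stated assumptions: `describe` is a hypothetical helper, the skewness is the plain moment-based coefficient (SPSS may report a slightly different, sample-adjusted value), and the cutoff of 1 for "highly skewed" is only a common rule of thumb.

```python
import statistics


def describe(values, skew_cutoff=1.0):
    """Summarize a continuous variable and flag strong skewness."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        skew = 0.0  # all values identical, so no asymmetry
    else:
        # Moment-based skewness: average cubed deviation over sd cubed.
        skew = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "skewness": skew,
        "highly_skewed": abs(skew) > skew_cutoff,
    }


symmetric = describe([1, 2, 3, 4, 5])
right_skewed = describe([1, 1, 1, 2, 10])
```

In the second example a single large value pulls the mean (3.0) well above the median (1), the same pattern a histogram would reveal as a long right tail.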
Finally, you can easily spot potential problems in data that otherwise appears valid by asking a series of questions:
 Does the distribution of the variable make sense?
 Is this what you were expecting?
 Do you notice any errors?
 Do you notice any unusual values?
 Will you have any potential problems when analyzing this data?