Using the Aggregate Procedure in IBM SPSS Statistics

By Keith McCormick, Jesus Salcedo, Aaron Poh

Often, a data file is not in the format that your analysis in SPSS Statistics requires. For example, purchase data is frequently collected as transactional data, where each row represents a single purchase. To analyze these data at the customer level, you must restructure them so that there is one row (or record) per customer, with that record containing summarized information about the customer's purchases.

The Aggregate procedure in SPSS Statistics lets you:

  • Create a new file so that each record in the new file contains only the summarized information, such as the total amount spent, the average amount spent, and the total number of purchases.

  • Add the aggregated information back to the original file so you can, for example, identify when a customer made an atypical purchase.

The following figure shows a data file that is in a transactional format. Notice that each row represents a transaction and that each customer has data in more than one row.

Transactional data.

The next figure shows the previous data after it has been aggregated. Notice that this file has only one row per customer, and there is summarized information for each customer: the total amount spent, the average amount spent, and the total number of purchases.

Aggregated data.
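To make the restructuring concrete outside of SPSS Statistics, here is a minimal Python sketch of the same idea. It assumes a hypothetical list of transaction records with `customer_id` and `amount` fields (the names and values are illustrative and are not taken from the figures) and collapses them into one summary record per customer holding the total amount spent, the average amount spent, and the number of purchases. It only illustrates the concept; it is not how SPSS Statistics implements the Aggregate procedure.

```python
from collections import defaultdict

# Hypothetical transactional records: one row per purchase (illustrative values only).
transactions = [
    {"customer_id": 101, "amount": 25.00},
    {"customer_id": 101, "amount": 40.00},
    {"customer_id": 102, "amount": 15.50},
    {"customer_id": 101, "amount": 10.00},
    {"customer_id": 102, "amount": 30.50},
    {"customer_id": 103, "amount": 99.99},
]


def aggregate_by_customer(rows):
    """Collapse transactions into one summary record per customer."""
    # Collect every purchase amount under its customer ID.
    amounts = defaultdict(list)
    for row in rows:
        amounts[row["customer_id"]].append(row["amount"])

    # Build one record per customer, ordered by customer ID.
    summary = []
    for customer_id in sorted(amounts):
        values = amounts[customer_id]
        total = sum(values)
        summary.append({
            "customer_id": customer_id,
            "total_spent": total,
            "mean_spent": total / len(values),
            "n_purchases": len(values),
        })
    return summary


for record in aggregate_by_customer(transactions):
    print(record)
```

Sorting the output by customer ID is a choice made in this sketch so that the result is predictable; six transactional rows become three customer rows.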

SPSS Statistics makes it easy to create an aggregated dataset like the one in the preceding figure. You can then analyze the aggregated file directly, or you can merge the aggregated variables back into the original dataset to see how individual transactions relate to them.
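The second use, attaching aggregated values to every transaction, can be sketched the same way. In SPSS Statistics this corresponds to adding the aggregated variables to the active dataset instead of creating a new file. In this minimal Python example, the field names, the sample values, and the rule that a purchase is "atypical" when it exceeds 1.5 times the customer's average are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical transactional records (illustrative values only).
transactions = [
    {"customer_id": 101, "amount": 25.00},
    {"customer_id": 101, "amount": 40.00},
    {"customer_id": 102, "amount": 15.50},
    {"customer_id": 101, "amount": 10.00},
    {"customer_id": 102, "amount": 30.50},
    {"customer_id": 103, "amount": 99.99},
]

# Hypothetical rule: a purchase is atypical if it is more than
# 1.5 times that customer's average purchase amount.
ATYPICAL_FACTOR = 1.5


def add_customer_aggregates(rows):
    """Return a copy of each transaction with customer-level aggregates added."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate per-customer totals and purchase counts.
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
        counts[row["customer_id"]] += 1

    # Second pass: attach the aggregates to every original row, keeping row order.
    enriched = []
    for row in rows:
        cid = row["customer_id"]
        mean_spent = totals[cid] / counts[cid]
        enriched.append({
            **row,
            "mean_spent": mean_spent,
            "n_purchases": counts[cid],
            "atypical": row["amount"] > ATYPICAL_FACTOR * mean_spent,
        })
    return enriched


for record in add_customer_aggregates(transactions):
    print(record)
```

The file keeps one row per transaction; only the 40.00 purchase by customer 101 is flagged, because it exceeds 1.5 times that customer's average of 25.00.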

To create the aggregated dataset from the transactional data, use the Aggregate Data dialog box. The following figure shows a completed example. Notice that Customer ID is the break variable, so the resulting file will have only one row per customer. Also notice that the new summary variables defined in the dialog box will be included in the aggregated dataset. Finally, the dialog box is set to create a new dataset containing only the aggregated variables.

Completed Aggregate Data dialog box.
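The settings in the dialog box come down to a few inputs: a break variable, plus a set of new variables, each defined by a summary function applied to a source variable. The Python sketch below mirrors that structure under stated assumptions: the `aggregate` function, the function names in `SUMMARY_FUNCTIONS`, and the field names are hypothetical, and only a handful of summary functions are supported. As with the new dataset described above, its output contains only the break variable and the aggregated variables.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical summary functions, loosely modeled on the kinds of
# summaries the dialog box offers (sum, mean, min, max, count).
SUMMARY_FUNCTIONS = {
    "sum": sum,
    "mean": mean,
    "min": min,
    "max": max,
    "n": len,
}


def aggregate(rows, break_var, specs):
    """Group rows by break_var and compute one value per spec for each group.

    specs maps a new variable name to (function_name, source_variable).
    The result holds only the break variable and the new aggregated variables.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[break_var]].append(row)

    result = []
    for key in sorted(groups):
        record = {break_var: key}
        for new_name, (func_name, source) in specs.items():
            values = [r[source] for r in groups[key]]
            record[new_name] = SUMMARY_FUNCTIONS[func_name](values)
        result.append(record)
    return result


# Hypothetical transactional records and variable definitions.
transactions = [
    {"customer_id": 101, "amount": 25.00},
    {"customer_id": 101, "amount": 40.00},
    {"customer_id": 102, "amount": 15.50},
    {"customer_id": 101, "amount": 10.00},
    {"customer_id": 102, "amount": 30.50},
    {"customer_id": 103, "amount": 99.99},
]
specs = {
    "total_spent": ("sum", "amount"),
    "mean_spent": ("mean", "amount"),
    "largest_purchase": ("max", "amount"),
    "n_purchases": ("n", "amount"),
}

for record in aggregate(transactions, "customer_id", specs):
    print(record)
```

Keeping the variable definitions in a separate `specs` mapping means that adding another summary only requires a new entry, much as you would add another variable to the dialog box.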