By Keith McCormick, Jesus Salcedo, Aaron Poh

In IBM SPSS Statistics, you will often work with categorical variables that have many distinct values. It isn’t unusual for these data to have been entered as string values, that is, as alphanumeric characters. You should avoid alphanumeric variables when you can, and there is a simple way to convert them: the Automatic Recode command.

Consider the example of a simple list of fruit names entered in a spreadsheet:

apple
banana
cantaloupe
durian
eggfruit
fig
grapefruit
huckleberry

A variable like this is declared as a string in Variable View. In the dialog variable lists, it appears with a small letter a next to its icon as a reminder. Most menus will handle string variables, but some won’t. Common examples of such lists include product names, customer names, and car makes and models.

Large, complex datasets almost always pair these values with numeric codes. Sometimes, however, you don’t have a coding scheme, and you’re tempted to just type in the words. Not a good idea. There are at least four reasons why you should not do this:

  • Some menus in SPSS don’t like alphanumeric variables, and you may wonder where the variable went. These variables won’t even appear in some variable lists.

  • Strings in SPSS are case-sensitive, so “Fig,” “FIG,” and “fig” would be counted as three different fruits. Not good.

  • Perhaps worst of all, spaces before or after the word can cause trouble. So “ Fig ,” “Fig ,” and “ Fig” could all be counted as different fruits. You may not notice these spaces at first, which makes it even worse.

  • Missing data handling with alphanumerics is confusing. For example, a blank string, “ ”, could be counted as its own fruit. Also not good.
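All four pitfalls come from exact string matching. The minimal sketch below uses plain Python, not SPSS, to show how case differences, stray spaces, and blanks multiply the number of categories. The sample values and the `normalize` helper are hypothetical. Trimming spaces and lowercasing is only one possible cleanup rule that you might apply before recoding.

```python
# Illustration only: plain Python string comparison, not SPSS itself.
from collections import Counter

# Hypothetical raw entries for a single fruit, typed inconsistently, plus a blank.
raw = ["Fig", "FIG", "fig", " Fig ", "Fig ", " Fig", " "]

# Exact comparison treats every variant (and the blank) as a separate category.
exact_counts = Counter(raw)
print(len(exact_counts))  # 7 distinct "fruits"


def normalize(value):
    """Hypothetical cleanup rule: trim surrounding spaces, lowercase, blank -> None."""
    cleaned = value.strip().lower()
    return cleaned if cleaned else None


# After normalizing and dropping blanks, all variants collapse into one category.
clean_counts = Counter(v for v in map(normalize, raw) if v is not None)
print(dict(clean_counts))  # {'fig': 6}
```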

This situation doesn’t have to be difficult, even if you have dozens or hundreds of names to deal with. To access Automatic Recode, open the Transform menu and choose Automatic Recode. An example of the completed dialog is shown in the following figure. Note the Treat Blank Strings as User Missing check box. Selecting it is almost always a good idea. Notice also that you must supply a name for the new variable you are about to create.

The Automatic Recode menu

If you were to run the example above, the following would appear in the output window:

fruit into fruit_num
Old Value         New Value  Value Label
apple                     1  apple
banana                    2  banana
cantaloupe                3  cantaloupe
durian                    4  durian
eggfruit                  5  eggfruit
fig                       6  fig
grapefruit                7  grapefruit
huckleberry               8  huckleberry
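The mapping in this output follows a simple rule. Each distinct string gets a whole-number code, starting at 1, in ascending sort order, and the original string becomes the code's value label. The minimal Python sketch below (not SPSS syntax) spells out that rule. The function name `auto_recode` and its `blank_as_missing` flag are hypothetical. In this sketch, blank strings simply become `None` to stand in for a missing value. It does not reproduce the user-missing code that SPSS actually assigns.

```python
def auto_recode(values, blank_as_missing=True):
    """Hypothetical sketch: map distinct strings to 1..n in ascending order.

    Returns (codes, labels): codes is the new numeric variable, and labels maps
    each code back to its original string, like SPSS value labels.
    """

    def is_missing(value):
        # An empty or all-space string counts as missing when requested.
        return blank_as_missing and value.strip() == ""

    # Number the distinct non-missing strings in ascending sort order, starting at 1.
    distinct = sorted({v for v in values if not is_missing(v)})
    code_of = {v: i for i, v in enumerate(distinct, start=1)}
    labels = {i: v for v, i in code_of.items()}

    # Missing strings get None here as a stand-in for a missing code.
    codes = [None if is_missing(v) else code_of[v] for v in values]
    return codes, labels


fruit = ["fig", "apple", "durian", "", "banana", "fig"]
fruit_num, labels = auto_recode(fruit)
print(fruit_num)  # [4, 1, 3, None, 2, 4]
for code in sorted(labels):
    print(code, labels[code])  # 1 apple, 2 banana, 3 durian, 4 fig
```

This sketch uses Python's default string ordering, which places uppercase letters before lowercase ones. SPSS may order mixed-case values differently, so treat the exact ordering here as an assumption.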

SPSS has created a new variable for you that contains no alphanumeric characters. Its values are numeric codes, and its value labels preserve the original strings, so no information is lost. If you have string variables like this, there is really no excuse not to use Automatic Recode. Try to get rid of those alphanumeric variables!