Some of the most common mistakes in R happen while reading data from text files with read.table() or read.csv(). Many of them make R throw an error, but others go unnoticed until you look at the structure of your data. In the latter case, you often find that some or all variables were read as character data (or, in R versions before 4.0.0, converted to factors) when they really shouldn’t be.

When R gives errors or the structure of your data isn’t what you think it should be, check the following:

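  • Did you forget to specify the argument header=TRUE? The read.table() function defaults to header=FALSE, whereas read.csv() defaults to header=TRUE. Without a header, R sees the column names as values and, as a result, reads every variable as character data. In R versions before 4.0.0, it then converts those variables to factors by default.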

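  • Did you have spaces in your column names or data? By default, read.table() treats any whitespace as a separator, so a space in a column name or in unquoted string data splits one field into two. You then get errors telling you 'line x did not have y elements'. If the file uses another separator, name it explicitly with the sep argument, for example sep="," or sep="\t".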

  • Did you have a different decimal separator? In some countries, decimals are separated by a comma. You have to specifically tell R that’s the case by using the argument dec="," in the read.table() function.

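  • Did you forget to specify stringsAsFactors = FALSE? In R versions before 4.0.0, read.table() and read.csv() change character data to factors by default, so you have to add this argument if you want your data to remain character variables. Since R 4.0.0, the default is FALSE and character data stays character.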

  • Did you have another way of specifying missing values? R reads 'NA' in a text file as a missing value, but the file may use a different code (for example, 'missing'). R will see that code as text and read the whole variable as character data (or as a factor in R versions before 4.0.0). You solve this by specifying the argument na.strings in the read.table() function, for example na.strings="missing".
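
The checks above can be combined in a single call. The following is a minimal sketch, not a recipe for any particular file: the data, the column names, and the 'missing' code are made up for illustration, and the text argument stands in for a real file path so that the example runs on its own.

```r
# Hypothetical semicolon-separated data: a header row, spaces inside
# values, comma decimals, and "missing" as the missing-value code.
raw <- c(
  "city name;avg temp;status",
  "New York;12,5;ok",
  "Los Angeles;missing;ok",
  "San Jose;18,2;missing"
)

dat <- read.table(
  text = raw,                       # replace with a file path for real data
  header = TRUE,                    # first line holds the column names
  sep = ";",                        # explicit separator, so spaces stay inside values
  dec = ",",                        # comma as decimal separator
  na.strings = c("NA", "missing"),  # treat both codes as missing values
  stringsAsFactors = FALSE          # keep text as character (the default since R 4.0.0)
)

# Spaces in column names are replaced by dots because check.names = TRUE
# is the default for read.table().
print(names(dat))
print(dat)
```

Leaving out sep=";" here would likely bring back the 'line x did not have y elements' error, because the space in 'New York' would then count as a separator.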

If you always check the structure of your data immediately after you read it in, you can catch errors much earlier and avoid hours of frustration. Your best bet is to use str() for information on the types and head() to see if the values are what you expected.
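
As a rough illustration of that habit, the sketch below reads a few lines of made-up inline data (the column names id, height, and group are hypothetical) and inspects the result right away. The variable col_classes is introduced here only for the example.

```r
# Hypothetical inline data standing in for a real file.
dat <- read.csv(
  text = "id,height,group\n1,1.82,a\n2,1.75,b\n3,NA,a",
  stringsAsFactors = FALSE
)

str(dat)       # one line per column: its type plus the first few values
head(dat, 3)   # the first three rows, to eyeball the values

# The class of every column as a named character vector, which is easy
# to compare against what you expected.
col_classes <- sapply(dat, class)
print(col_classes)
```

If a column you expected to be numeric shows up as character or factor here, one of the problems in the checklist above is the likely cause.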
