Preparing Your Data in Tableau - dummies

Preparing Your Data in Tableau

By Molly Monsey, Paul Sochan

Tableau can work with a number of different data sources, depending on the edition of Tableau that you’ve licensed. Fortunately, true database applications enforce certain rules to ensure that each field in a record contains the proper type of data. For example, users can’t enter arbitrary text in numeric or date fields, because this would make it impossible to properly process the information contained in the database.

Tableau works with data where it lives, whether it’s in a database, files, or even in the cloud. Here, you discover how to work with file-based data sources such as Excel worksheets. Due to the lack of controlled data structure, there are advantages and disadvantages that you should know.

Using the data preparation features

Excel worksheets may include extraneous information or have things like blank columns in the middle of the data. Here’s an example of a worksheet with both of these problems.

This Excel worksheet is missing key information.
This Excel worksheet is missing key information.

Notice that Tableau isn’t showing the correct field names. What appear to be the field names are mostly showing up as data in the first row (although several fields display null to indicate that the contents are invalid based on the data type of the field). In addition, the field that’s labeled F5 appears to be completely filled with invalid data.

The reason that Tableau is having problems with the fields becomes clearer when you take a look at the worksheet back in Excel. You can see extra rows above the data and a blank column within the data. In addition, the worksheet contains text in cell A1 above the data table.

Looking at the worksheet reveals the problems.
Looking at the worksheet reveals the problems.

Tableau wants your Excel spreadsheet to look like a database table. What this means is that it expects the first row to contain column headers and each subsequent row to contain data values for each column. It expects the data type in each column to remain consistent and for there to be no extraneous spaces or text inserted anywhere in the spreadsheet.

If any of these conditions exist, you will either need to manually clean your spreadsheet or use Tableau’s Data Interpreter.

Fortunately, Tableau is pretty good at noticing a problem and will display a prompt asking whether you want its Data Interpreter tool to try to correct any problems that might exist. To see whether the Data Interpreter can fix the problems, just click the prompt’s Turn On button. Check out the results of using this tool on the example worksheet.

The Data Interpreter may be able to fix the problems.
The Data Interpreter may be able to fix the problems.

If you compare the different images above, you see that Tableau now shows the correct field names, no longer has null values in the first record, and has eliminated the empty column from the results.

Always check the data preview area when creating new connections to ensure the data looks as you expect it to.

Targeting data sources for manual corrections

Although Tableau can correct many common types of layout and formatting issues automatically, some problems can only be corrected manually. Check below for an example of this type of issue. In this case, the ship dates in records 27 and 28 were not entered correctly, so they show as null rather than as actual dates. If you use this data in Tableau, these null values might give you invalid results.

The nulls noted are due to empty records in the source data.
The nulls noted are due to empty records in the source data.

You may not realize that your data contains null values until you actually start to perform your analysis. In this case, Tableau reports that the table contains a total of 9,994 records, so you may not be concerned about two null values.

Tableau helps you locate null values, which could also be errors that would need to be corrected in the source data. Although there are only two null values shown above in rows 27 and 28, there could be many more you can’t see without scrolling.

Null values just mean that a row contains an empty cell. This does not always mean there is an error in your data. Tableau is well aware of that fact, which is why it gives you options for dealing with null values. Yes, you can use the information Tableau provides to correct entries in the source data, but you can also have Tableau address the values using one of its many functions or remove values with the help of filters.