Data Mining For Dummies

Using codes for data reduces data entry time, prevents errors, and reduces the memory requirements for storing the data. But the codes aren’t meaningful unless you have documentation, or labels, to explain their meaning.
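As a minimal sketch of this trade-off, the following Python stores a categorical column as one-byte codes and keeps a separate codebook to explain them. The region codebook, the data values, and the helper name `validate_codes` are hypothetical and chosen only for illustration.

```python
# Hypothetical codebook: the documentation that gives each code its meaning.
REGION_LABELS = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}


def validate_codes(codes, codebook):
    """Return the codes as a list, or raise ValueError on an undocumented code."""
    for code in codes:
        if code not in codebook:
            raise ValueError(f"code {code!r} is not in the codebook")
    return list(codes)


# Illustrative data, entered as short codes instead of typed-out names.
region_codes = validate_codes([1, 3, 3, 2, 4, 1, 3], REGION_LABELS)

# Storage comparison: one byte per code versus the full label text.
coded_size = len(bytes(region_codes))
labeled_size = sum(len(REGION_LABELS[c].encode("utf-8")) for c in region_codes)

print(coded_size, labeled_size)
```

Without `REGION_LABELS`, the list `[1, 3, 3, 2, 4, 1, 3]` would tell a reader nothing, which is why the codebook has to travel with the data in some form.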

Some data formats enable you to enjoy the advantages of using codes while keeping the information about the meaning of the codes in the same file. These aren’t typical in data mining — you’re more likely to see them in statistical analysis products — but some data-mining applications can use these labeled data formats. Here’s how they work.

Data appears to contain only numbers, but these numbers are codes for values of categorical variables.

The same dataset with labels instead of numeric codes.

You can switch back and forth between these two display options using the menu.

Although the data is stored as numbers, the labels allow you to see what the data means.

In the figure, you are looking at the data in the data editor. You can also set up an analysis or view the results.
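The idea of a labeled data format can be sketched in a few lines of Python. This is an illustration only, not how any particular statistics product stores its files; the class `LabeledColumn`, its fields, and the marital-status codes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledColumn:
    """A column of numeric codes that carries its own value labels."""
    name: str
    codes: list
    value_labels: dict = field(default_factory=dict)

    def display(self, show_labels=False):
        """Return the values as stored codes or as labels."""
        if not show_labels:
            return list(self.codes)
        # Fall back to the raw code when a value has no label.
        return [self.value_labels.get(code, code) for code in self.codes]


# Hypothetical example column; the codes and labels are illustrative only.
marital = LabeledColumn(
    name="marital_status",
    codes=[1, 2, 2, 3, 1],
    value_labels={1: "Single", 2: "Married", 3: "Divorced"},
)

print(marital.display())                  # stored codes
print(marital.display(show_labels=True))  # same data, shown with labels
```

The stored values never change. Only the display switches between codes and labels, which mirrors the two views of the same dataset described above.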

You can include comments in a dataset.

You may also find other types of data labels in data-mining applications. ARFF, the native data format for Weka, allows you to include comments in a dataset. This gives you a good place to put annotations about the source of the data and other important details.
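In Weka's ARFF format, a comment is a line that begins with a percent sign. The sketch below holds a small ARFF-style file in a string and pulls out its comment lines. The file contents and the helper `split_comments` are made up for illustration, and this is not a full ARFF parser; for real files, Weka's own loader is the safer choice.

```python
# A small ARFF-style file held in a string; the contents are made up.
ARFF_TEXT = """\
% Source: hypothetical customer survey
% Note: region codes were assigned by the survey team
@relation customers

@attribute age numeric
@attribute region {north,south}

@data
34,north
51,south
"""


def split_comments(text):
    """Separate ARFF comment lines (those starting with %) from the rest."""
    comments, content = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("%"):
            # Drop the leading percent sign and surrounding whitespace.
            comments.append(stripped.lstrip("%").strip())
        elif stripped:
            content.append(stripped)
    return comments, content


comments, content = split_comments(ARFF_TEXT)
for note in comments:
    print(note)
```

Keeping source notes in the file header this way means the annotations stay with the data when the file is copied or shared.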

You can annotate data.

RapidMiner also has an option for annotations. You can use the graphical user interface to enter annotations for individual rows of data.

About This Article

This article is from the book Data Mining For Dummies.

About the book author:

Meta S. Brown helps organizations use practical data analysis to solve everyday business problems. A hands-on data miner who has tackled projects with up to $900 million at stake, she is a recognized expert in cutting-edge business analytics.