Data Mining For Dummies

When your data is in more than one place, you need ways to put it all together. When you join two datasets that have different variables, you’re merging data. Merging is a common operation in data mining, used to combine linked data such as

  • Customer records and marketing campaign data

  • Before and after test results

  • Internal and vendor data

To merge datasets, you must have a variable that identifies cases for matching; this is called a key or identifier variable. You may also have to designate one of the datasets as primary; the primary table must have only one case for each value of the key variable.
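As a rough illustration of a key-based merge, here is a minimal sketch using the pandas library in Python. The table names, column names, and values are all hypothetical and chosen only for this example; the point-and-click tools shown in the figures may work differently. The `validate="one_to_many"` argument asks pandas to raise an error if the primary (left) table has more than one case for any key value.

```python
import pandas as pd

# Hypothetical primary table: exactly one row per customer_id (the key variable).
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "name": ["Avery", "Blake", "Casey"],
})

# Hypothetical campaign data: a customer may appear zero, one, or many times.
contacts = pd.DataFrame({
    "customer_id": [101, 101, 103],
    "campaign": ["spring", "fall", "spring"],
})

# Merge on the key. how="left" keeps every customer, even those with no
# campaign contact. validate="one_to_many" raises an error if customer_id
# is duplicated in the primary (left) table.
merged = customers.merge(
    contacts,
    on="customer_id",
    how="left",
    validate="one_to_many",
)

print(merged)
```

In this sketch, customer 102 has no campaign record, so the merged table keeps that customer with a missing value in the campaign column rather than dropping the case.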

Some data-mining applications have more than one tool for merging datasets: The first figure shows the tool for basic merges, and the second figure shows the tool for setting up more complex merge criteria.

image0.jpg
image1.jpg

If your data sources contain the same variables (more or less; the match does not have to be identical) but different cases, joining them is called appending or concatenation. Like merging, this is a common operation. It’s used whenever you have new cases for something that you’ve already been tracking.

image2.jpg
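Appending can be sketched in code as well. The following example uses the pandas function `concat` in Python; the dataset names, columns, and values are hypothetical. It assumes two batches of cases whose variables mostly match but are not identical, which is the situation described above.

```python
import pandas as pd

# Hypothetical existing cases.
january = pd.DataFrame({
    "order_id": [1, 2],
    "amount": [20.0, 35.5],
    "region": ["east", "west"],
})

# Hypothetical new cases; this batch has no "region" variable.
february = pd.DataFrame({
    "order_id": [3, 4],
    "amount": [12.0, 48.25],
})

# Stack the rows. Columns are matched by name, and any variable missing
# from one source is filled with missing values for those cases.
# ignore_index=True renumbers the rows from 0.
combined = pd.concat([january, february], ignore_index=True)

print(combined)
```

Because the match between variables does not have to be identical, the February cases simply end up with missing values for region.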

The tricky part of finding the right tool is often figuring out what it’s called. Look in the menus (or search) for append, concatenate, or merge rows.

About This Article

About the book author:

Meta S. Brown helps organizations use practical data analysis to solve everyday business problems. A hands-on data miner who has tackled projects with up to $900 million at stake, she is a recognized expert in cutting-edge business analytics.