Data Mining for Data Warehousing - dummies

By Thomas C. Hammergren

Data mining for data warehousing is sometimes kept separate from the other forms of business intelligence. This lack of integration occurs for two reasons:

  • Business users don’t have the required knowledge of data mining’s statistical foundations.

  • The mainstream business intelligence vendors don’t provide robust data mining tools, and data mining vendors don’t provide robust business intelligence tools.

Data mining tools perform a level of technical analysis that you can’t use successfully without a basic understanding of statistical algorithms.

Data mining is often presented as a magical technique that you can use to uncover the secrets of the universe from your organization’s data. In reality, data mining is an umbrella term for a series of advanced statistical techniques and models born in the 1980s as part of artificial intelligence research (neural networks, for example).

Data mining as a technique has one or both of these aspects:

  • Predictive: Data mining tools and capabilities search through large volumes of data, look for patterns and other characteristics of the data according to the techniques being used, and try to tell you what might happen based on what the analysis found. Notice the emphasis on the word might: Data mining is a technique of probability, not a fortune-telling service.

  • Discovery-oriented: Both the basic querying-and-reporting and business analysis/OLAP categories of business intelligence tools provide business intelligence based on either questions users explicitly ask (sort of the question of the moment) or “institutionalized” questions that members of the organization regularly ask in the form of regular reports (or both). The key word is question: If no questions are asked, no answers are forthcoming.
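To make the predictive aspect concrete, here is a deliberately tiny sketch, assuming a hypothetical purchase history where each record notes a customer segment and whether that customer bought again. The "model" is nothing more than per-segment counts, and the prediction is a probability rather than a yes/no verdict, which echoes the point about the word might. The data, segment names, and function names are all hypothetical; real data mining tools use far richer models (neural networks, for example).

```python
from collections import defaultdict

# Hypothetical historical records: (customer_segment, bought_again).
history = [
    ("frequent", True), ("frequent", True), ("frequent", False),
    ("occasional", False), ("occasional", True), ("occasional", False),
    ("new", False), ("new", False),
]


def fit_repeat_purchase_model(records):
    """Tally outcomes per segment; these counts are the entire 'model'."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [repeat buyers, total]
    for segment, bought_again in records:
        counts[segment][0] += int(bought_again)
        counts[segment][1] += 1
    return dict(counts)


def predict_repeat_probability(model, segment):
    """Estimate how likely a customer in `segment` is to buy again.

    Returns a probability, not a yes/no answer. Add-one (Laplace) smoothing
    keeps the estimate away from 0 and 1 and gives unseen segments 0.5.
    """
    repeats, total = model.get(segment, (0, 0))
    return (repeats + 1) / (total + 2)


model = fit_repeat_purchase_model(history)
for segment in ("frequent", "occasional", "new", "unknown"):
    print(segment, round(predict_repeat_probability(model, segment), 2))
```

For this sample data the sketch estimates 0.6 for frequent buyers and 0.25 for new ones: a tendency drawn from past data, not a guarantee about any individual customer.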

Data mining’s discovery-oriented nature is intended to provide answers, even if you don’t ask any questions. (You can refer to this model as “tell me something interesting, even if I don’t know what questions to ask.”)

The data mining system typically provides these answers by building complex models that analyze the data, looking for trends or tendencies that might be significant, and then telling you what it found.
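One way to see the "tell me something interesting" model in action is association-rule mining, a common discovery technique (the article itself doesn't single it out). The sketch below assumes a hypothetical list of shopping baskets and never asks about any particular product. Instead it scans every pair of items and reports rules of the form "customers who buy A also buy B" whose support (the share of baskets containing both items) and confidence (the share of baskets containing A that also contain B) clear two thresholds. The data, thresholds, and function name are illustrative assumptions; production algorithms such as Apriori prune the search rather than brute-forcing every pair as this sketch does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; no question is posed about any product.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def find_pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Scan all item pairs and return rules (A, B, support, confidence).

    A rule A -> B is kept when A and B appear together in at least
    `min_support` of all baskets and B appears in at least
    `min_confidence` of the baskets that contain A.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first: highest confidence, then highest support.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


for lhs, rhs, support, confidence in find_pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With this sample data the strongest rules include beer -> diapers and cola -> milk, each with a confidence of 1.0: patterns the user never asked about, surfaced by the scan itself.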