9 Laws of Data Mining
Every profession has its guiding principles, ideas that provide structure and guidance in everyday work. Data mining is no exception. Following are nine fundamental ideas to guide you as you get down to work and become a data miner. These are the 9 Laws of Data Mining as they were originally stated by the pioneering data miner, Thomas Khabaza.
1st law: Business goals
Here’s the 1st Law of Data Mining, or “Business Goals Law”: Business objectives are the origin of every data-mining solution.
You explore data to find information that helps you run the business better. Shouldn’t this be the mantra of all business data analysis? Of course it should! Yet novice data miners often focus on technology and other details, which may be interesting but are not aligned with the needs and goals of executive decision-makers.
You’ve got to develop a habit of identifying business goals before doing anything else, and focusing on those goals at every step in the data-mining process. It’s significant that this law comes first. Everyone should understand that data mining is a process with a purpose.
2nd law: Business knowledge
Here’s the 2nd Law of Data Mining, or “Business Knowledge Law”: Business knowledge is central to every step of the data-mining process.
Data mining gives power to the people — businesspeople — who use their business knowledge, experience, and insight, along with data-mining methods, to find meaning in data.
You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works. Only when you understand the data and the problem that you need to solve can data-mining processes help you to discover useful information and put it to use.
3rd law: Data preparation
Here’s the 3rd Law of Data Mining, or “Data Preparation Law”: Data preparation is more than half of every data-mining process.
Traditional statisticians often have the opportunity to collect new data to address specific research questions. They may use rigorous processes to plan experiments, design survey questionnaires, or otherwise gather high-quality data that is well targeted to specific research goals. Yet even after all that, they still spend a lot of time cleaning and preparing data for analysis.
Data miners, on the other hand, almost always have to work with whatever data is available. They use existing business records, public data, or the data they can buy. Chances are, all that data was gathered for some purpose other than data mining, and without any rigorous plan or careful data-collection process. So data miners spend a lot of time on data preparation.
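To make that work more concrete, here is a minimal Python sketch of a few routine preparation steps: normalizing inconsistent text, converting numbers stored as strings, dropping rows that have no ID or repeat an ID, and filling missing values with the median. The record layout (`customer_id`, `region`, `spend`) and the specific cleaning rules are hypothetical, chosen only for illustration. Real data will need its own rules.

```python
from statistics import median

# Hypothetical record layout: {"customer_id": ..., "region": ..., "spend": ...}
def prepare_records(raw_records):
    """Normalize text, convert types, drop unusable rows, and fill gaps."""
    cleaned = []
    seen_ids = set()
    for rec in raw_records:
        customer_id = str(rec.get("customer_id") or "").strip()
        # Skip rows with no ID, and keep only the first row for each ID
        if not customer_id or customer_id in seen_ids:
            continue
        seen_ids.add(customer_id)
        # Normalize inconsistent text such as " North" vs "north"; None becomes "unknown"
        region = str(rec.get("region") or "unknown").strip().lower()
        # Numbers often arrive as strings with thousands separators
        try:
            spend = float(str(rec.get("spend", "")).replace(",", ""))
        except ValueError:
            spend = None  # blank or unparseable: treat as missing for now
        cleaned.append({"customer_id": customer_id, "region": region, "spend": spend})

    # Fill missing spend with the median of the known values (an assumed rule)
    known = [r["spend"] for r in cleaned if r["spend"] is not None]
    fill_value = median(known) if known else 0.0
    for r in cleaned:
        if r["spend"] is None:
            r["spend"] = fill_value
    return cleaned
```

Each rule encodes a business judgment. For example, treating a repeated customer ID as a duplicate rather than a second purchase is an assumption someone who knows the business has to confirm.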
4th law: Right model
Here’s the 4th Law of Data Mining, or “NFL-DM”: The right model for a given application can only be discovered by experiment.
This law is also known by the shorthand NFL-DM, meaning that there is No Free Lunch for the data miner.
First, what’s a model? It’s an equation that represents a pattern observed in data. At least, it represents the pattern in a rough way. Mathematical models of real things are never perfect! This is a fact of life, and it’s just as true for nuclear physicists as it is for data miners.
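In data mining, there is no way to know in advance which type of model will work best for a particular problem, so models are selected through trial and error. You will experiment with different model types and compare how well each one performs on your data.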
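One common way to run that experiment is k-fold cross-validation. You hold out part of the data, fit each candidate model on the rest, measure its error on the held-out part, and repeat for each fold. The Python sketch below uses only the standard library to compare three hypothetical candidates (an average, a straight line, and a nearest-neighbor lookup) by mean squared error. The candidate set, the interleaved fold assignment, and the error measure are illustrative assumptions.

```python
from statistics import mean

# Each hypothetical "fit" function takes training data and returns a predict function.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m  # always predict the training average

def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # Ordinary least-squares slope; fall back to a flat line if all xs are equal
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    # Predict the outcome of the closest training point (first one wins ties)
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

CANDIDATES = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}

def cross_val_mse(fit, xs, ys, k=5):
    """Average squared error on held-out points across k interleaved folds."""
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        errors.extend((predict(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return mean(errors)

def choose_model(candidates, xs, ys, k=5):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: cross_val_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Which candidate wins depends entirely on the data. On data that follows a straight line, `fit_line` scores best, but on other data a different candidate may win. You find out only by running the experiment.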
5th law: Pattern
Here’s the 5th Law of Data Mining: There are always patterns.
As a data miner, you’ll explore data in search of useful patterns. In other words, you’ll be looking for meaningful relationships among the variables in the data. Understanding these relationships provides better understanding of the business, and better predictions of what will happen in the future. Most importantly, understanding patterns in the data enables you to influence what will happen in the future.
You always find patterns. The data always has something to tell you. Sometimes, it confirms that what you’ve been doing is right. That may not seem exciting, but at least it tells you that you’ve been on the right track. At other times, the data may tell you that your current business practices don’t work. That’s exciting, and though it might not be pleasant in the short run, knowing the truth is an important step toward improvement.
6th law: Amplification
Here’s the 6th Law of Data Mining, or “Insight Law”: Data mining amplifies perception in the business domain.
Data-mining methods enable you to understand your business better than you could without them. Like a magnifier or a microscope, they enable the discovery of effects that would be difficult or impossible to detect through ordinary reporting.
Data mining is not instant. Discovery and learning through data mining is an iterative process. You’ll make discoveries, learn a bit from each of them, and use what you’ve learned to take action. The results of each action you try will produce more data, and that data lets you understand something more. It’s a cycle of discovery, and the cycle continues as long as you continue to explore and experiment.
7th law: Prediction
Here’s the 7th Law of Data Mining, or “Prediction Law”: Prediction increases information locally by generalization.
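Data mining helps you use what you know to make better predictions (or estimates) of things you don’t know. It does this by generalizing: models learn patterns from cases where the outcome is already known and apply those patterns to new cases where it isn’t. In this way, data mining uses data and modeling methods to replace your informal expectations with data-driven, consistent, and more accurate estimates.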
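As a minimal illustration of generalizing from known cases to unknown ones, the Python sketch below learns the average outcome for each customer segment from historical records. It uses that average as the estimate for a new customer in the same segment and falls back to the overall average for segments it has never seen. The segment-average model, the function names, and the sample data are hypothetical. Real data-mining models are usually richer, but they follow the same principle: learn from cases with known outcomes, then apply what was learned to new cases.

```python
from collections import defaultdict
from statistics import mean

def learn_segment_averages(history):
    """history: list of (segment, outcome) pairs whose outcomes are known."""
    by_segment = defaultdict(list)
    for segment, outcome in history:
        by_segment[segment].append(outcome)
    averages = {seg: mean(vals) for seg, vals in by_segment.items()}
    overall = mean([outcome for _, outcome in history])
    return averages, overall

def predict(model, segment):
    averages, overall = model
    # Generalize: assume a new case behaves like past cases in its segment;
    # for a segment never seen before, fall back to the overall average.
    return averages.get(segment, overall)

# Usage with made-up data: estimated spend for new customers
model = learn_segment_averages(
    [("young", 100), ("young", 120), ("senior", 300), ("senior", 340)]
)
estimate_young = predict(model, "young")
estimate_unseen = predict(model, "student")
```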
8th law: Value
Here’s the 8th Law of Data Mining, or “Value Law”: The value of data-mining results is not determined by the accuracy or stability of predictive models.
Data miners don’t fuss over theory. As a data miner, you may never even know the theory behind the statistical models you use. Maybe that’s just as well, because in data mining, you’re going to use those models in ways that don’t necessarily line up with the theory behind them.
You’ll look for models that produce correct predictions (and you’ll use testing, rather than statistical theory, to judge that). But you may be more concerned with other issues, such as whether the model makes business sense, enlightens you about unexpected predictive factors, or is practical to use in your workplace.
9th law: Change
Here’s the 9th Law of Data Mining, or “Law of Change”: All patterns are subject to change.
The world is always changing. The model that gives you great predictions today may be useless tomorrow. This is a fact of life for all data analysts, not just data miners.
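Because patterns change, a model in use should be monitored. The minimal Python sketch below shows one simple check. It assumes you log the model’s prediction error for each period, compares the average error over a recent window with the average over an initial baseline window, and flags the model for review when recent error exceeds the baseline by a set tolerance. The function name, window sizes, and the 1.5 tolerance are arbitrary illustrative assumptions, not established thresholds.

```python
from statistics import mean

def needs_retraining(errors, baseline_window=10, recent_window=10, tolerance=1.5):
    """Flag a model for review when its recent average error exceeds
    `tolerance` times its average error during the baseline period.

    errors: per-period prediction errors, oldest first (hypothetical log).
    """
    if len(errors) < baseline_window + recent_window:
        return False  # not enough history to judge yet
    baseline = mean(errors[:baseline_window])
    recent = mean(errors[-recent_window:])
    return recent > tolerance * baseline
```

A flag from a check like this doesn’t tell you what changed. It tells you that the pattern the model learned may no longer hold and that it’s time to go back and explore the data again.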