10 Common Data Mining Mistakes (That You Won't Make) - dummies

By Meta S. Brown

Data mining proceeds by trial and error, so for data miners, making mistakes is only natural. Mistakes can even be valuable, at least under certain conditions. Not all mistakes are created equal, however, and some are better avoided. The following list offers ten of them. Read through them carefully and commit them to memory, and you just might avoid a few bumps on the learning curve:

  • Skipping data quality checks: Most data miners think developing predictive models is more fun than reviewing data for quality problems. But if you fail to detect and correct data quality problems, you could end up with worthless predictions.

  • Missing the point: You’ve discovered something fascinating! That’s nice, but if it isn’t also relevant to the business problem you set out to solve, well, it isn’t relevant at all. Get back on track.

  • Believing that a pattern in the data proves a cause-and-effect relationship: You explore a dataset and notice that when Variable A increases, Variable B increases, too. This could occur because Variable A influences Variable B, or because Variable B influences Variable A. It could also be that both are influenced by some other variable that you have not considered. Or it could be a one-time coincidence. The pattern alone can’t tell you which explanation is right.

  • Stretching conclusions too far: Don’t presume that the relationships you observe in data will recur in different circumstances. If your data was collected in a cool environment, don’t assume that things will work the same way in a hot factory setting.

  • Betting on results that don’t make sense: Data mining methods are informal and not usually backed up by scientific method and theory, so your results had better at least make business sense. If there’s no commonsense explanation for the results you present, your executive management probably won’t take them seriously, and they shouldn’t.

  • Falling in love with a particular modeling method: There is no single type of data mining model that fits every situation.

  • Putting a model into production without adequate testing: Don’t bet your business on a predictive model until you have tested it with holdout data and on a small scale in the field.

  • Ignoring results you don’t like: If you ignore your data now, it will come back one day and say, “I told you so.”

  • Using data mining to address every data analysis need: Data mining has tremendous value, yet some applications still call for rigorous data collection methods, formal statistical analysis, and scientific method.

  • Presuming that traditional data analysis techniques no longer matter: Refer to the previous bullet.
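
The advice about testing with holdout data can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the data is randomly generated, the "model" is a simple cutoff rule, and the function names (split_holdout, fit_threshold, accuracy) are hypothetical rather than part of any data mining product. The idea is only to show the shape of the test: build the model on one portion of the data, then measure it on rows it has never seen.

```python
import random


def split_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (training, holdout) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(rows, threshold):
    """Fraction of rows where the rule 'x >= threshold' matches the label."""
    correct = sum((x >= threshold) == label for x, label in rows)
    return correct / len(rows)


def fit_threshold(training):
    """Pick the cutoff on x that classifies the training rows best."""
    candidates = sorted({x for x, _ in training})
    return max(candidates, key=lambda t: accuracy(training, t))


# Made-up data: the label tends to be True when x is large, with noise added.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = rng.uniform(0, 100)
    label = (x + rng.gauss(0, 15)) >= 50
    rows.append((x, label))

# Fit on the training rows only, then score on the untouched holdout rows.
training, holdout = split_holdout(rows)
threshold = fit_threshold(training)

print("training accuracy:", accuracy(training, threshold))
print("holdout accuracy: ", accuracy(holdout, threshold))
```

If the holdout accuracy comes out much lower than the training accuracy, the model has probably learned quirks of the training rows rather than a pattern that will recur. A good holdout score is still only the first hurdle; it doesn't replace the small-scale field test.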