Focus on a Problem when Data Mining
A data-mining project begins when you identify a specific business issue to investigate. The narrower and better defined the question, the more effectively it can be answered, and the more clearly you can understand both the data requirements and the limitations of the answer.
If you’re faced with an issue that is very broad (such as “Why are we not selling enough?”), it helps to first break the question down into manageable bits. You don’t have to cover the whole topic at once; just take one narrow part of the big problem and start with that.
Take, for example, one retailer’s initial question, “How much repeat business are we getting?” That sounds at first like a simple, straightforward question, but it’s actually a broad question that encompasses many smaller, more specific questions, like these:
How many new customers come back?
How many second-time customers return a third time?
Do customers who first buy suits come back for shoes?
Do those who make a small purchase return for larger purchases?
Answering these questions doesn’t require a lot more than counting. It’s just not that difficult to calculate how many customers return a second or third time, if you have some way to identify individuals. That’s easy for online stores, where shoppers can be tracked by an account login or email address. Traditional retailers can identify customers by house credit cards or loyalty cards, although not every customer uses those.
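The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase log, the customer IDs, and the function name return_rates are all hypothetical, and each row is assumed to represent one visit by an identifiable customer.

```python
from collections import Counter

# Hypothetical purchase log: one (customer_id, visit_date) row per visit.
purchases = [
    ("alice", "2024-01-05"),
    ("bob", "2024-01-07"),
    ("alice", "2024-02-11"),
    ("carol", "2024-02-15"),
    ("alice", "2024-03-02"),
    ("bob", "2024-03-09"),
]


def return_rates(purchases):
    """Count how many customers made at least 1, 2, and 3 visits."""
    visits = Counter(customer_id for customer_id, _ in purchases)
    at_least = {n: sum(1 for v in visits.values() if v >= n) for n in (1, 2, 3)}
    return {
        "customers": at_least[1],
        "returned_once": at_least[2],
        "returned_twice": at_least[3],
        # Share of new customers who came back a second time.
        "second_visit_rate": at_least[2] / at_least[1] if at_least[1] else None,
        # Share of second-time customers who returned a third time.
        "third_visit_rate": at_least[3] / at_least[2] if at_least[2] else None,
    }


rates = return_rates(purchases)
print(f"New customers who came back: {rates['second_visit_rate']:.0%}")
print(f"Second-time customers who returned again: {rates['third_visit_rate']:.0%}")
```

With this made-up sample, two of three customers came back, and one of those two returned a third time. The same approach applies to real data only to the extent that every visit can be tied to an individual customer.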
From here, as the retailer grew more familiar with data mining and the potential of predictive analytics and data-mining tools, his questions became more sophisticated and action-oriented:
How does the amount spent in the first visit relate to long-term spending?
Are there behaviors or characteristics that indicate high future spending? If so, what are they?
Would additional information (for example, demographic data) improve our ability to predict a customer’s spending behavior?
The object of data mining is to move beyond simply knowing what has already happened and understand how you may influence what will happen in the future.
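As a rough first look at the question of how first-visit spending relates to long-term spending, one option is to compute a simple correlation between the two. The sketch below assumes a hypothetical order log with ISO-formatted dates and at most one order per customer per day; the data, the function names, and the choice of Pearson correlation are illustrative assumptions, not the retailer's actual method.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical order log: (customer_id, order_date, amount_spent).
orders = [
    ("alice", "2024-01-05", 50),
    ("bob", "2024-01-07", 20),
    ("carol", "2024-01-09", 100),
    ("dave", "2024-01-12", 10),
    ("alice", "2024-02-11", 120),
    ("carol", "2024-02-20", 400),
    ("alice", "2024-03-02", 80),
    ("bob", "2024-03-09", 30),
]


def first_and_later_spend(orders):
    """Map each customer to (first-visit amount, total of all later visits)."""
    by_customer = defaultdict(list)
    for customer_id, date, amount in orders:
        by_customer[customer_id].append((date, amount))
    result = {}
    for customer_id, visits in by_customer.items():
        visits.sort(key=lambda v: v[0])  # ISO dates sort chronologically
        first_amount = visits[0][1]
        later_total = sum(amount for _, amount in visits[1:])
        result[customer_id] = (first_amount, later_total)
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


spend = first_and_later_spend(orders)
first = [f for f, _ in spend.values()]
later = [l for _, l in spend.values()]
r = pearson(first, later)
print(f"Correlation between first-visit and later spending: {r:.2f}")
```

A coefficient near 1 suggests that customers who spend more on their first visit tend to spend more later, but it describes an association only. Deciding how to influence future spending would call for a predictive model and more data than this toy example contains.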