How to Choose an Algorithm for a Predictive Analysis Model
Various statistical, data-mining, and machine-learning algorithms are available for use in your predictive analysis model. You’re in a better position to select an algorithm after you’ve defined the objectives of your model and selected the data you’ll work on.
Some of these algorithms were developed to solve specific business problems, enhance existing algorithms, or provide new capabilities — which may make some of them more appropriate for your purposes than others. You can choose from a range of algorithms to address business concerns such as the following:
For customer segmentation and/or community detection in the social sphere, for example, you’d need clustering algorithms.
For customer retention or to develop a recommender system, you’d use classification algorithms.
For credit scoring or predicting the next outcome of time-driven events, you’d use a regression algorithm.
As time and resources permit, you should run as many algorithms of the appropriate type as you can. Comparing different runs of different algorithms can bring surprising findings about the data or the business intelligence embedded in the data. Doing so gives you more detailed insight into the business problem, and helps you identify which variables within your data have predictive power.
Some predictive analytics projects succeed best by building an ensemble model, a group of models that operate on the same data. An ensemble model uses a predefined mechanism to gather outcomes from all its component models and provide a final outcome for the user.
Models can take various forms — a query, a collection of scenarios, a decision tree, or an advanced mathematical analysis. In addition, certain models work best for certain data and analyses. You can (for example) use classification algorithms that employ decision rules to decide the outcome of a given scenario or transaction, addressing questions like these:
Is this customer likely to respond to our marketing campaign?
Is this money-transfer likely to be part of a money-laundering scheme?
Is this loan applicant likely to default on the loan?
You can use unsupervised clustering algorithms to find what relationships exist within your dataset. You can use these algorithms to find different groupings among your customers, determine what services can be grouped together, or decide for example which products can be upsold.
Regression algorithms can be used to forecast continuous data, such as predicting the trend for a stock movement given its past prices.
Data and business objectives aren’t the only factors to consider when you’re selecting an algorithm. The expertise of your data scientists is of tremendous value at this point; picking an algorithm that will get the job done is often a tricky combination of science and art.
The art part comes from experience and proficiency in the business domain, which also plays a critical role in identifying a model that can serve business objectives accurately.