Choosing an Algorithm for Predictive Analytics
Various statistical, data-mining, and machine-learning algorithms are available for use in your predictive analytics model. You’re in a better position to select an algorithm after you’ve defined the objectives of your model and selected the data you’ll work on. Some of these algorithms were developed to solve specific business problems, enhance existing algorithms, or provide new capabilities — which may make some of them more appropriate for your purposes than others. You can choose from a range of algorithms to address business concerns such as the following:
- For customer segmentation and/or community detection in the social sphere, for example, you’d need clustering algorithms.
- For customer retention or to develop a recommender system, you’d use classification algorithms.
- For credit scoring or predicting the next outcome of time-driven events, you’d use a regression algorithm.
As time and resources permit, you should run as many algorithms of the appropriate type as you can. Comparing different runs of different algorithms can bring surprising findings about the data or the business intelligence embedded in the data. Doing so gives you more detailed insight into the business problem, and helps you identify which variables within your data have predictive power.
Some predictive analytics projects succeed best by building an ensemble model, a group of models that operate on the same data. An ensemble model uses a predefined mechanism to gather outcomes from all its component models and provide a final outcome for the user.
Models can take various forms — a query, a collection of scenarios, a decision tree, or an advanced mathematical analysis. In addition, certain models work best for certain data and analyses. You can (for example) use classification algorithms that employ decision rules to decide the outcome of a given scenario or transaction, addressing questions like these:
- Is this customer likely to respond to our marketing campaign?
- Is this money-transfer likely to be part of a money-laundering scheme?
- Is this loan applicant likely to default on the loan?
You can use unsupervised clustering algorithms to find what relationships exist within your dataset. You can use these algorithms to find different groupings among your customers, determine what services can be grouped together, or decide for example which products can be upsold.
Regression algorithms can be used to forecast continuous data, such as predicting the trend for a stock movement given its past prices.
Decision trees, support vector machines, neural networks, logistic, and linear regressions are some of the most common algorithms. Although their mathematical implementations differ, these predictive models generate comparable results. The decision trees are more popular, because they’re easy to understand; you can follow the path to a given decision.
Classification algorithms are great for the type of analysis when the target is known (such as identifying spam emails). On the other hand, when the target variable is unknown, clustering algorithms are your best bet. They allow you to cluster or group your data into meaningful groups based on the similarities among the group members.
These algorithms are widely popular. There are many tools, both commercial and open-source, that implement them. With data accumulation thriving and accelerating (that is, big data), and cost-efficient hardware and platforms (such as cloud computing and Hadoop), predictive analytics tools are experiencing a boom.
Data and business objectives aren’t the only factors to consider when you’re selecting an algorithm. The expertise of your data scientists is of tremendous value at this point; picking an algorithm that will get the job done is often a tricky combination of science and art. The art part comes from experience and proficiency in the business domain, which also plays a critical role in identifying a model that can serve business objectives accurately.