How to Use Assumptions Appropriately in Predictive Analytics
In spite of everything you’ve been told about assumptions causing trouble, a few assumptions remain at the core of any predictive analytics model. Those assumptions show up in the variables selected and considered in the analysis — and those variables directly affect the accuracy of the final model’s output.
Therefore, your wisest precaution at the outset is to identify which assumptions matter most to your model — and to keep them to an absolute minimum.
Creating a predictive model that works well in the real world requires an intimate knowledge of the business. Your model starts out knowing only the sample data — in practical terms, almost nothing. So start small and keep on enhancing the model as necessary.
Probing possible questions and scenarios can lead to key discoveries, or can shed more light on the factors at play in the real world. This process can identify the core variables that could affect the outcome of the analysis.
In a systematic approach to predictive analysis, this phase — exploring “what-if” scenarios — is especially interesting and useful. Here’s where you change the model inputs to measure the effects of one variable or another on the output of the model; what you’re really testing is its forecasting capability.
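A what-if run can be sketched as follows: hold all inputs at their baseline values, vary one input at a time, and watch how the output responds. The model, its inputs, and the numbers below are all invented for illustration; a real scenario would use your own trained model in place of the toy function.

```python
def forecast_revenue(price, units, churn_rate):
    """Toy stand-in for a real predictive model."""
    return price * units * (1.0 - churn_rate)

# Baseline scenario: every input fixed at an assumed "normal" value.
baseline = {"price": 20.0, "units": 1000, "churn_rate": 0.05}

# What-if scenario: sweep one variable (churn_rate) while holding the
# others fixed, and record the effect on the model's output.
for churn in (0.02, 0.05, 0.10, 0.20):
    inputs = dict(baseline, churn_rate=churn)
    print(f"churn_rate={churn:.2f} -> forecast={forecast_revenue(**inputs):,.0f}")
```

If a small change in one input swings the output dramatically, the model is highly sensitive to that variable — which tells you where your assumptions matter most.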
Improving the model’s assumptions — by testing how they affect the model’s output, probing to see how sensitive the model is to them, and paring them down to the minimum — will help you guide the model toward a more reliable predictive capability. Before you can optimize your model, you have to know the predictive variables — features that have a direct impact on its output.
You can derive those decision variables by running multiple simulations of your model — while changing a few parameters with each run — and recording the results, especially the accuracy of the model’s forecasts. Usually you can trace variations in accuracy back to the specific parameters you changed.
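The multiple-simulation loop described above might look like the sketch below: run the model once per parameter setting, record each run's accuracy, and compare. The `run_model` function and its accuracy pattern are entirely hypothetical stand-ins for a real train-and-evaluate cycle.

```python
def run_model(n_features):
    """Hypothetical stand-in for training and scoring a model.

    Returns a mock accuracy that improves as features are added up to a
    point, then degrades — a pattern invented purely for illustration.
    """
    return round(0.70 + 0.05 * min(n_features, 4)
                      - 0.02 * max(0, n_features - 4), 3)

# One simulation per parameter setting, with the result recorded each time.
results = {n: run_model(n) for n in (2, 4, 6, 8)}
best = max(results, key=results.get)

for n, acc in results.items():
    print(f"n_features={n}: accuracy={acc}")
print(f"best parameter setting: n_features={best}")
```

Because only one parameter changes per run, any variation in accuracy can be traced directly back to that parameter.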
At this point, the twenty-first century can turn to the fourteenth for help. William of Ockham, an English Franciscan friar and scholastic philosopher who lived in the 1300s, developed the research principle known as Occam’s Razor: You should cut away unnecessary assumptions until your theory has as few of them as possible. Then it’s likeliest to be true.
Too many assumptions weigh down your model’s forecasts with uncertainties and inaccuracies. Eliminating unnecessary variables leads to a more robust model, but it’s not easy to decide which variables to include in the analysis — and those decisions directly affect the performance of the model.
But here’s where the analyst can run into a dilemma: Including unnecessary factors can skew or distort the output of the model, but excluding a relevant variable leaves the model incomplete.
So when it comes time to select those all-important decision variables, call in your domain knowledge experts. When you have an accurate, reality-based set of decision variables, you don’t have to make too many assumptions — and the result can be fewer errors in your predictive model.