How to Define and Test Predictive Analytics Prototypes
An effective way to state your business objectives for predictive analytics clearly is as a bulleted list of possible user decisions. Then run your prototype to generate a prediction and score for each decision. For instance, for a hypothetical Product X, you could list your objectives as a range of business decisions to be assessed:
Increase the sales volume of Product X
Terminate manufacture of Product X
Change the marketing strategy behind Product X
Increase ads in a specific geographical location
Increase ads for specific customers
The predictive model will evaluate these decisions according to their likelihood of future profitability. The output might indicate, for example, that the company has an 80-percent chance of increasing profit by increasing the sales volume of Product X.
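The idea of scoring each candidate decision can be sketched in a few lines. The decisions and probabilities below are illustrative placeholders; in a real prototype, the scores would come from a trained predictive model.

```python
# Candidate business decisions mapped to a predicted chance of increasing
# profit. These scores are made-up placeholders, not real model output.
decisions = {
    "Increase the sales volume of Product X": 0.80,
    "Terminate manufacture of Product X": 0.15,
    "Change the marketing strategy behind Product X": 0.55,
    "Increase ads in a specific geographical location": 0.40,
    "Increase ads for specific customers": 0.35,
}

def best_decision(scores):
    """Return the decision with the highest predicted chance of profit."""
    return max(scores, key=scores.get)

print(best_decision(decisions))
# -> Increase the sales volume of Product X (the 80-percent option)
```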
How to find the right predictive data
After you’ve clearly stated the business objective and the problem you want to tackle, the next step is to collect the data that your predictive model will use. In this phase, you have to identify your data source(s).
For instance, if you’re developing a prototype for predicting the right decision on a specific product, then you need to gather both internal and external data for that product. You should not restrict the type or source of data, as long as it’s relevant to the business goal.
If (say) your company is considering the introduction of a new hybrid sports car, you can contact the sales department and gather information about the sales data generated by similar products. You can contact the engineering department to find out how much the components cost (how about those longer-lasting batteries?), as well as the resources and time needed to produce the product (any retooling needed?).
You might also include data about previous decisions made about a similar product (say, an overpowered convertible introduced some years ago), and their outcome (market conditions and fuel prices depressed sales).
You might want to consider using big data related to the product in question. For instance, you could download customer reviews of company products, along with tweets or Facebook posts that mention those products. One way to do that is to use the application programming interfaces (APIs) those companies provide.
For instance, if you want to gather tweets that contain a specific word, Twitter provides a set of APIs that you could use to download such tweets. There’s a limit to how much data you can capture free of charge; in some cases, you might have to pay to keep downloading the needed data from Twitter.
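As a rough sketch, here is how such a request could be constructed. The endpoint path and parameter names below follow Twitter's v2 recent-search API as an assumption; check the current developer documentation, since access tiers, limits, and endpoints change over time. No network call is made here; the snippet only builds the request URL.

```python
# Hedged sketch: build a request URL for Twitter's v2 recent-search endpoint.
# The endpoint and parameter names are assumptions based on the v2 API docs.
from urllib.parse import urlencode

BASE_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_search_url(keyword, max_results=10):
    """Build the request URL for tweets containing a specific word."""
    params = {"query": keyword, "max_results": max_results}
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("hybrid sports car")
print(url)
# Actually sending this request also requires an
# "Authorization: Bearer <token>" header tied to a developer account.
```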
When you’ve determined the most relevant data and the most useful source from which to get it, start storing the data you intend to use for your predictive model. Data may need to undergo some preprocessing.
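Typical preprocessing includes filling in missing values and putting numeric fields on a common scale. The sketch below shows both steps on a hypothetical `unit_cost` field; the field name and values are illustrative, not from the text.

```python
# Minimal preprocessing sketch: impute missing numeric values with the
# column mean, then min-max scale the field into the 0-1 range.
# The field name "unit_cost" is a hypothetical example.

def preprocess(rows, field):
    """Fill missing values with the mean, then min-max scale the field."""
    present = [r[field] for r in rows if r[field] is not None]
    mean = sum(present) / len(present)
    for r in rows:
        if r[field] is None:
            r[field] = mean
    lo = min(r[field] for r in rows)
    hi = max(r[field] for r in rows)
    for r in rows:
        r[field] = (r[field] - lo) / (hi - lo)
    return rows

records = [{"unit_cost": 20.0}, {"unit_cost": None}, {"unit_cost": 40.0}]
preprocess(records, "unit_cost")
# The missing cost becomes the mean (30.0), and all values land in [0, 1].
```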
How to design your predictive model
For a prototype, your input could be a data matrix that represents known factors derived from historical data.
Such a data matrix, when analyzed, can produce output that looks something like this:
57.6 percent of customers stated they were unhappy with the product.
The product requires three hours on average to produce.
Positive sentiment on the product is 80 percent.
Inputs to the prototype model could include historical data about similar products, the corresponding decisions made about them, and the impact of those decisions on your business processes. The prototype’s output would be predictions and their corresponding scores as possible actions toward attaining the objectives you’ve set.
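Such a data matrix can be represented simply: each row is a historical product and each column a known factor. The column names and numbers below are illustrative stand-ins (the first row echoes the example output above).

```python
# Sketch of an input data matrix for the prototype: rows are historical
# products, columns are known factors. All values are illustrative.
columns = ["unhappy_customers_pct", "hours_to_produce", "positive_sentiment_pct"]

data_matrix = [
    [57.6, 3.0, 80.0],   # Product A
    [12.4, 1.5, 92.0],   # Product B
    [44.1, 2.2, 61.0],   # Product C
]

# Pair column names with one row for readability.
row = dict(zip(columns, data_matrix[0]))
print(row)
```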
To get a usable prototype, you have to employ a mixture of techniques to build the model. For instance, you could use a clustering algorithm such as K-means to build clusters like these:
Products that were terminated — and that decision’s impact on profit
Products that were increased in volume and that decision’s impact on profit
Products whose marketing strategy was changed — and that decision’s impact on profit
Then you could use a classification algorithm such as a decision tree or Naïve Bayes to classify or predict missing values (such as the sales-profit value) for the product in question (Product X).
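To make the clustering step concrete, here is a compact pure-Python K-means sketch that groups products by the profit impact of past decisions. In practice you would likely use a library such as scikit-learn; the profit figures below are made up for illustration.

```python
# Pure-Python 1-D K-means sketch: cluster past decisions by profit impact.
# The data points are invented for illustration.

def kmeans_1d(points, k, iters=20):
    """Cluster 1-D values into k groups by iterative mean refinement."""
    # Initialize centers by sampling evenly across the sorted values.
    centers = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Profit impact (in $K) of past decisions: losses for terminated products,
# modest gains for marketing changes, large gains for volume increases.
profit_impact = [-120, -90, -100, 35, 40, 55, 180, 210, 195]
centers, clusters = kmeans_1d(profit_impact, k=3)
print(sorted(centers))
```

The three resulting cluster centers roughly correspond to the three decision groups listed above, which a classifier could then use to score Product X.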
How to identify your test data
To evaluate your predictive analytics model, you have to run the model over some test data that it hasn’t seen yet. You could run the model over several historical datasets as input and record how many of the model’s predictions turn out to be correct.
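The holdout idea can be sketched as follows: reserve part of the historical outcomes as unseen test data, predict on them, and record the fraction of correct predictions. The "model" here is a deliberately trivial stand-in (it always predicts the most common outcome in the training portion).

```python
# Sketch of holdout evaluation on historical data. The model is a trivial
# majority-class baseline; the outcome labels are invented.

def majority_class_model(train_labels):
    """A baseline 'model' that always predicts the most common outcome."""
    return max(set(train_labels), key=train_labels.count)

def evaluate(labels, holdout_fraction=0.3):
    """Train on the early rows, test on the held-out rows, return accuracy."""
    split = int(len(labels) * (1 - holdout_fraction))
    train, test = labels[:split], labels[split:]
    prediction = majority_class_model(train)
    correct = sum(1 for actual in test if actual == prediction)
    return correct / len(test)

# Outcomes of past decisions: did profit go up or down?
history = ["up", "up", "down", "up", "up", "down", "up", "up", "down", "up"]
print(evaluate(history))
```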
How to run the model on test data
Evaluating your predictive model is an iterative process — essentially trial and error. Effective models rarely result from a mere first test. If your predictive model produces 100-percent accuracy, consider that result too good to be true; suspect something wrong with your data or your algorithms.
For instance, if the first algorithm you use to build your prototype is the Naïve Bayes classifier and you’re not satisfied with the predictions it gives you when you run the test data, try another algorithm such as the Nearest Neighbor classifier. Keep running other algorithms until you find the one that’s most consistently and reliably predictive.
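That trial-and-error loop amounts to scoring every candidate algorithm on the same test data and keeping the most accurate one. In the sketch below, the "algorithms" are trivial stand-in rules rather than real classifiers such as Naïve Bayes or Nearest Neighbor, and the test records are invented.

```python
# Sketch of comparing candidate algorithms on shared test data.
# Both candidate "models" are illustrative stand-ins for real classifiers.

def always_up(_features):
    """Baseline: always predict that profit goes up."""
    return "up"

def threshold_rule(features):
    """Predict 'up' when product sentiment exceeds 50 percent."""
    return "up" if features["sentiment"] > 50 else "down"

# (features, actual outcome) pairs; values are invented.
test_set = [
    ({"sentiment": 80}, "up"),
    ({"sentiment": 30}, "down"),
    ({"sentiment": 65}, "up"),
    ({"sentiment": 20}, "down"),
]

def accuracy(model, data):
    return sum(model(f) == label for f, label in data) / len(data)

candidates = {"always-up baseline": always_up,
              "sentiment threshold": threshold_rule}
best = max(candidates, key=lambda name: accuracy(candidates[name], test_set))
print(best)
# -> sentiment threshold (it scores 1.0 versus the baseline's 0.5)
```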
During testing, you might find that you need to revisit the initial data you used to build the prototype model, or to gather more relevant data for your analysis.
As a precaution, always verify that the steps involved in building the model are correct. In addition, comparing the output of the model on the test dataset to the actual results will help you evaluate the accuracy of your model.
The higher the confidence in the results of your predictive model, the easier it is for the stakeholders to approve its deployment.
To make sure that your model is accurate, you need to evaluate whether the model meets its business objectives. Domain experts can help you interpret the results of your model.