How to Use Supervised Analytics to Train Predictive Models

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

In supervised analytics, both the input and the desired output are part of the training data. The predictive analytics model is presented with the correct results as part of its learning process. Such supervised learning assumes pre-classified examples: the goal is to get the model to learn from the previously known classifications so that it can correctly label the next unknown data point based on what it has learned.

During training, the model infers a mathematical function by examining the training data. When training is complete, that function is used to label new data points.
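As a rough illustration of what "inferring a function" can look like, the sketch below trains a nearest-centroid classifier in plain Python. The technique, the customer data, and the names fit_centroids and predict are assumptions chosen for this example, not a method prescribed by the text; any supervised algorithm would follow the same pattern of learning from labeled examples and then labeling a new point.

```python
from collections import defaultdict
from math import dist

def fit_centroids(points, labels):
    """Learn one centroid (mean point) per label from pre-classified examples."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in grouped.items()
    }

def predict(centroids, point):
    """The inferred function: return the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Hypothetical training data: (monthly visits, monthly spend) with known labels.
train_points = [(1, 20), (2, 25), (3, 30), (20, 300), (22, 320), (25, 310)]
train_labels = ["casual", "casual", "casual", "loyal", "loyal", "loyal"]

centroids = fit_centroids(train_points, train_labels)

# Label a data point the model has never seen.
new_label = predict(centroids, (21, 290))
print(new_label)
```

Here the "mathematical function" is simply the distance comparison inside predict; the centroids are the only thing the model retains from the training data.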

For this approach to work correctly, the training data and the test data must both be carefully selected. The trained model should be able to predict the correct label for a new data point quickly and accurately, based on the kinds of data the model has seen in the training data.
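One common way to keep training data and test data separate is to hold out a portion of the labeled examples and score the model only on that held-out portion. The following is a minimal sketch under stated assumptions: the data is synthetic, the 25 percent test fraction is arbitrary, and split_train_test, nearest_neighbor_label, and accuracy are hypothetical helper names. It uses a simple 1-nearest-neighbor rule as the model.

```python
import random
from math import dist

def split_train_test(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the labeled examples and hold out a fraction for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def nearest_neighbor_label(train, point):
    """Label a point with the label of its closest training example."""
    _, label = min(train, key=lambda example: dist(example[0], point))
    return label

def accuracy(train, test):
    """Fraction of held-out examples whose predicted label matches the known one."""
    correct = sum(nearest_neighbor_label(train, p) == y for p, y in test)
    return correct / len(test)

# Synthetic, well-separated examples: ((visits, spend), label).
examples = (
    [((x, x * 10), "casual") for x in range(1, 9)]
    + [((x, x * 10), "loyal") for x in range(20, 28)]
)

train, test = split_train_test(examples)
score = accuracy(train, test)
print(len(train), len(test), score)
```

Because the test examples never take part in training, the score gives an estimate of how the model behaves on data it has not seen, which is the situation it will face in practice.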

Supervised analytics offer some distinct advantages:

  • The analyst is in charge of the process.

  • Labeling is based on known classifications.

  • Labeling errors can be easily resolved.

The flip side of these advantages is an equally distinct set of potential disadvantages:

  • Any mistakes at the training phase will be reinforced later on.

  • The classification provided by the analyst may not describe the whole population adequately.

  • The model may be unable to detect classes that deviate from the original training set.

  • The assumption that the classes within the data don’t overlap, and that they can easily be separated, may not prove valid.
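The limitation around unseen classes can be made concrete with a small sketch. It assumes a nearest-centroid model whose centroids were already learned from two classes; the centroid values, the outlier point, and the max_distance threshold are all hypothetical. The distance check in predict_or_flag is one possible mitigation shown for illustration, not a technique described in the text.

```python
from math import dist

# Hypothetical centroids learned from a training set with only two classes.
centroids = {"casual": (2.0, 25.0), "loyal": (22.0, 310.0)}

def predict(point):
    """Always returns one of the known labels, however unusual the point is."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

def predict_or_flag(point, max_distance=50.0):
    """Return "unknown" when the point is far from every known class."""
    label = predict(point)
    if dist(centroids[label], point) > max_distance:
        return "unknown"
    return label

# A point unlike anything in the training set, such as a bulk buyer.
outlier = (200, 5000)

forced = predict(outlier)            # the model can only pick a known class
flagged = predict_or_flag(outlier)   # the threshold exposes the mismatch
print(forced, flagged)
```

The plain model has no way to say "none of the above", so a point from a class missing in the training set is silently assigned to the nearest known class.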