Averaging Different Predictors for Machine Learning
Both averaging and voting can also work well when you combine several different machine learning algorithms. This is the averaging approach, and it’s widely used when you can’t reduce the variance of your estimates in any other way.
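The following minimal Python sketch shows the two ways of combining predictions. It assumes that each model’s predictions are already available as a plain list, with one list per model and all lists of equal length. The names `average_predictions` and `majority_vote` are hypothetical helpers, not part of any library.

```python
from collections import Counter


def average_predictions(predictions):
    """Average numeric predictions example by example.

    predictions: list of per-model prediction lists (hypothetical layout).
    """
    n_models = len(predictions)
    # zip(*predictions) yields, for each example, the values from every model.
    return [sum(values) / n_models for values in zip(*predictions)]


def majority_vote(predictions):
    """For each example, return the class that most models predicted.

    Ties go to the class encountered first, because Counter.most_common
    lists elements with equal counts in first-encountered order.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Three hypothetical regression models, three examples each.
print(average_predictions([[2.0, 3.0, 5.0], [4.0, 3.0, 7.0], [3.0, 6.0, 6.0]]))
# Three hypothetical classifiers, three examples each.
print(majority_vote([["cat", "dog", "dog"], ["cat", "cat", "dog"], ["dog", "dog", "cat"]]))
```

Averaging suits numeric (regression) outputs, while majority voting suits class labels, where an average has no meaning.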
As you learn from data, you usually try different solutions, modeling your data with several different machine learning algorithms. It’s good practice to check whether you can successfully combine some of them into an ensemble by averaging their predictions or by counting the predicted classes. The principle is the same as in bagging noncorrelated predictions: mixing together models whose predictions don’t correlate produces predictions that are less affected by variance. To achieve effective averaging, you have to
- Divide your data into training and test sets.
- Train different machine learning algorithms on the training data.
- Record the predictions from each algorithm and evaluate how viable each result is by using the test set.
- Correlate all the available predictions with each other.
- Pick the predictions that correlate least and average them. If you’re classifying, pick a group of the least correlated predictions instead and, for each example, take as the new class prediction the class that the majority of them predicted.
- Test the new averaged or majority-vote prediction against the test data. If it succeeds, create your final model by averaging the results of the models that make up the successful ensemble.
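Here is a sketch of the final test for a regression problem. It assumes that each model’s test-set predictions are stored in a dictionary keyed by a hypothetical model name, it uses mean squared error as the score, and it treats the ensemble as successful when it beats the best single model it contains. Both the metric and the acceptance rule are illustrative choices, not the only possible ones.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_predictions(predictions):
    """Average the per-model prediction lists example by example."""
    n_models = len(predictions)
    return [sum(values) / n_models for values in zip(*predictions)]


def ensemble_is_better(test_preds, selected, y_test):
    """Compare the averaged ensemble of `selected` models with each model alone.

    test_preds: dict mapping a (hypothetical) model name to its test predictions.
    Returns (ensemble_mse, best_single_mse, accepted). The acceptance rule,
    "beat the best single model", is an assumption made for this sketch.
    """
    single_scores = [mse(y_test, test_preds[name]) for name in selected]
    averaged = average_predictions([test_preds[name] for name in selected])
    ensemble_score = mse(y_test, averaged)
    best_single = min(single_scores)
    return ensemble_score, best_single, ensemble_score < best_single


# Toy data: the two hypothetical models err on different examples.
y_test = [1.0, 2.0, 3.0, 4.0]
test_preds = {
    "model_a": [1.5, 1.5, 3.5, 3.5],
    "model_b": [1.5, 2.5, 2.5, 3.5],
}
print(ensemble_is_better(test_preds, ["model_a", "model_b"], y_test))
```

With these toy numbers, each model alone scores an MSE of 0.25, while their average scores 0.125, because the models’ errors cancel on the middle two examples.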
To find out which models correlate the least, take the predictions one at a time, correlate each one with all the others, and average those correlations to obtain a mean correlation for that model. Then use the mean correlations to rank the predictions: those with the lowest values are the most suitable for averaging.
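A minimal sketch of this ranking follows. It computes the Pearson correlation by hand so that it needs no third-party packages. The dictionary of predictions and the model names are hypothetical; in practice, you would pass in the test-set predictions you recorded earlier.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rank_by_mean_correlation(predictions):
    """Rank models by their mean correlation with every other model.

    predictions: dict mapping a (hypothetical) model name to its predictions.
    Returns (name, mean_correlation) pairs, least correlated first.
    """
    means = {}
    for name, preds in predictions.items():
        # Correlate this model against each of the others.
        others = [pearson(preds, other_preds)
                  for other_name, other_preds in predictions.items()
                  if other_name != name]
        means[name] = sum(others) / len(others)
    return sorted(means.items(), key=lambda item: item[1])


predictions = {
    "model_a": [1.0, 2.0, 3.0, 4.0],
    "model_b": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with model_a
    "model_c": [2.5, 1.5, 1.5, 2.5],   # uncorrelated with both
}
for name, mean_corr in rank_by_mean_correlation(predictions):
    print(name, round(mean_corr, 3))
```

Here `model_c` ranks first with a mean correlation of 0, while `model_a` and `model_b` tie at about 0.5. You would then keep the top few names from the ranking as candidates for averaging.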