Looking for Alternatives in Validation for Machine Learning

By Nikhil Abraham

You have a few alternatives to k-fold cross-validation in machine learning, all of which are derived from statistics. The first one to consider, but only if your in-sample contains few examples, is leave-one-out cross-validation (LOOCV). It is analogous to k-fold cross-validation, the only difference being that k, the number of folds, is exactly n, the number of examples. Therefore, in LOOCV, you build n models (which may turn into a huge number when you have many observations) and test each one on the single observation left out of its training set.

Apart from being computationally intensive, because you must build many models to test your hypothesis, LOOCV has the problem that it tends to be pessimistic (making your error estimate higher). It’s also unstable when n is small, and the variance of the error estimate is much higher. All these drawbacks make comparing models difficult.
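The LOOCV procedure can be sketched in a few lines of plain Python. This is only a minimal illustration: the target values in y and the trivial "model" that always predicts the training mean (fit_mean) are assumptions made for the example, not something taken from the text above.

```python
# Minimal LOOCV sketch. The data and the mean-predictor "model" are
# illustrative assumptions chosen to keep the example self-contained.

def fit_mean(targets):
    """'Train' a trivial model that always predicts the training mean."""
    return sum(targets) / len(targets)

def loocv_mse(targets):
    """Build n models, each tested on the one example it did not see."""
    n = len(targets)
    squared_errors = []
    for i in range(n):
        # Hold out example i and train on the remaining n - 1 examples.
        train = targets[:i] + targets[i + 1:]
        prediction = fit_mean(train)
        squared_errors.append((targets[i] - prediction) ** 2)
    # Average the n single-example errors into one estimate.
    return sum(squared_errors) / n

y = [2.0, 4.0, 6.0, 8.0]  # hypothetical target values
print("LOOCV mean squared error: %0.3f" % loocv_mse(y))
```

Note that the loop runs n times, which is where the computational cost of LOOCV comes from when n is large.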

Another alternative from statistics is bootstrapping, a method long used to estimate the sampling distribution of a statistic when you can't assume that it follows a known distribution. Bootstrapping works by building a number (the more the better) of samples of size n (the original in-sample size) drawn with replacement. Drawing with replacement means that the process can draw the same example multiple times for a single bootstrap sample. Bootstrapping has the advantage of offering a simple and effective way to estimate the true error measure.

In fact, bootstrapped error measurements usually have much less variance than cross-validation ones. On the other hand, validation becomes more complicated because of the sampling with replacement: your validation sample has to come from the out-of-bootstrap examples, that is, the examples that were never drawn. Moreover, using some training examples repeatedly can lead to a certain bias in the models built with bootstrapping.
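As a rough sketch of how bootstrap validation might look in practice, the following plain-Python example draws bootstrap samples with replacement, trains on each one, and tests on the examples that were never drawn. The helper names (bootstrap_split, bootstrap_mse), the data in y, and the mean-predictor model are all hypothetical choices made for illustration.

```python
import random

def bootstrap_split(n, rng):
    """Return (in-bag indices, out-of-bootstrap indices) for n examples."""
    # n draws with replacement: the same index may appear several times.
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    # Examples never drawn form the validation (out-of-bootstrap) set.
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag

def bootstrap_mse(targets, rounds, seed=0):
    """Average out-of-bootstrap squared error of a mean-predictor model."""
    rng = random.Random(seed)
    errors = []
    for _ in range(rounds):
        in_bag, out_of_bag = bootstrap_split(len(targets), rng)
        if not out_of_bag:
            continue  # rare: every example was drawn, nothing to test on
        # "Train" on the bootstrap sample (assumed model: predict the mean).
        prediction = sum(targets[i] for i in in_bag) / len(in_bag)
        mse = sum((targets[i] - prediction) ** 2
                  for i in out_of_bag) / len(out_of_bag)
        errors.append(mse)
    return sum(errors) / len(errors)

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # hypothetical
print("Bootstrap mean squared error: %0.3f" % bootstrap_mse(y, rounds=200))
```

Because the out-of-bootstrap set changes from round to round, each round is scored on a test set of a different size, which is the complication described above.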

If you are using out-of-bootstrap examples for your test, you’ll notice that the test sample can be of various sizes, depending on the number of unique examples drawn into the bootstrap sample; it likely accounts for a little more than a third of your original in-sample size. This simple Python code snippet demonstrates the effect by randomly simulating a certain number of bootstraps:

from random import randint
import numpy as np

n = 1000 # number of examples
# your original set of examples
examples = set(range(n))
results = list()
for j in range(10000):
    # your bootstrapped sample (randint includes both endpoints)
    chosen = [randint(0, n-1) for k in range(n)]
    # out-of-sample: share of examples never drawn
    results.append(len(examples - set(chosen)) / float(n))
print("Out-of-bootstrap: %0.1f %%" %
      (np.mean(results) * 100))

Out-of-bootstrap: 36.8 %

Running the experiment may require some time, and your results may be different due to the random nature of the experiment. However, you should see an output of around 36.8 percent.
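The 36.8 percent figure can also be checked without simulation. Each of the n draws misses a given example with probability 1 - 1/n, so the chance that the example is never drawn is (1 - 1/n) raised to the power n, which approaches 1/e as n grows. The short sketch below evaluates that expression; the function name expected_oob_fraction is a hypothetical helper introduced only for this check.

```python
import math

def expected_oob_fraction(n):
    """Probability that a given example is missed by all n draws."""
    return (1 - 1 / n) ** n

# The fraction climbs toward 1/e as the sample size grows.
for n in (10, 100, 1000):
    print("n = %4d: %0.1f %%" % (n, expected_oob_fraction(n) * 100))
print("limit 1/e: %0.1f %%" % (math.exp(-1) * 100))
```

For n = 1000 the expression evaluates to roughly 0.3677, which agrees with the simulated result.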