Exploring Cost Functions in Machine Learning

By John Paul Mueller, Luca Massaron

The driving force behind optimization in machine learning is the response from a function internal to the algorithm, called the cost function. You may see other terms used in some contexts, such as loss function, objective function, scoring function, or error function, but the cost function is an evaluation function that measures how well the machine learning algorithm maps the target function that it’s striving to guess. In addition, a cost function determines how well a machine learning algorithm performs in a supervised prediction or an unsupervised optimization problem.

The evaluation function works by comparing the algorithm predictions against the actual outcome recorded from the real world. Comparing a prediction against its real value using a cost function determines the algorithm’s error level.

Because it’s a mathematical formulation, the cost function expresses the error level in a numerical form, thereby keeping errors low. The cost function transmits what is actually important and meaningful for your purposes to the learning algorithm. As a result, you must choose, or accurately define, the cost function based on an understanding of the problem you want to solve or the level of achievement you want to reach.

As an example, when considering stock market forecasting, the cost function expresses the importance of avoiding incorrect predictions. In this case, you want to make money by avoiding big losses. In forecasting sales, the concern is different because you need to reduce the error in common and frequent situations, not in the rare and exceptional ones, so you use a different cost function.

When the problem is to predict who will likely become ill from a certain disease, you prize algorithms that can score a high probability of singling out people who have the same characteristics and actually did become ill later. Based on the severity of the illness, you may also prefer that the algorithm wrongly chooses some people who don’t get ill after all rather than miss the people who actually do get ill.

The cost function is what truly drives the success of a machine learning application. It’s as critical to the learning process as representation (the capability to approximate certain mathematical functions) and optimization (how the machine learning algorithms set their internal parameters).

Most algorithms optimize their own cost function, and you have little choice but to apply them as they are. Some algorithms allow you to choose among a certain number of possible functions, providing more flexibility. When an algorithm uses a cost function directly in the optimization process, the cost function is used internally. Given that algorithms are set to work with certain cost functions, the optimization objective may differ from your desired objective.

In such a case, you measure the results using an external cost function that, for clarity of terminology, you call an error function or loss function (if it has to be minimized) or a scoring function (if it has to be maximized).

With respect to your target, a good practice is to define the cost function that works the best in solving your problem, and then to figure out which algorithms work best in optimizing it to define the hypothesis space you want to test.

When you work with algorithms that don’t allow the cost function you want, you can still indirectly influence their optimization process by fixing their hyper-parameters and selecting your input features with respect to your cost function. Finally, when you’ve gathered all the algorithm results, you evaluate them by using your chosen cost function and then decide on the final hypothesis with the best result from your chosen error function.

When an algorithm learns from data, the cost function guides the optimization process by pointing out the changes in the internal parameters that are the most beneficial for making better predictions. The optimization continues as the cost function response improves iteration by iteration. When the response stalls or worsens, it’s time to stop tweaking the algorithm’s parameters because the algorithm isn’t likely to achieve better prediction results from there on. When the algorithm works on new data and makes predictions, the cost function helps you evaluate whether it’s working properly and is indeed effective.

Deciding on the cost function is an underrated activity in machine learning. It’s a fundamental task because it determines how the algorithm behaves after learning and how it handles the problem you want to solve. Never rely on default options, but always ask yourself what you want to achieve using machine learning and check what cost function can best represent the achievement.