The Error Curve and Machine Learning

John Paul Mueller

Luca Massaron

Updated

2016-10-06 13:14:04

From the book

Machine Learning For Dummies


The gradient descent algorithm offers a perfect example of how machine learning works. You can describe it with an intuitive image, not just a mathematical formulation. Moreover, though it is just one of many possible optimization methods, gradient descent is widely used to train machine learning algorithms such as linear models, neural networks, and gradient boosting machines.

Gradient descent works out a solution by starting from a random set of parameter values when given the data (a matrix made of features, plus a response). It then proceeds through a number of iterations, using the feedback from the cost function to adjust the parameters to values that gradually improve on the initial random solution and lower the error.
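The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the book's implementation: the model is a linear one, the cost function is the mean squared error, and the function name, learning rate, iteration count, and toy data are all hypothetical choices made for the example.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_iterations=500, seed=0):
    """Fit linear weights by gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])            # random starting solution
    costs = []
    for _ in range(n_iterations):
        residuals = X @ w - y                  # prediction errors
        costs.append(np.mean(residuals ** 2))  # feedback from the cost function
        gradient = 2 * X.T @ residuals / len(y)
        w -= learning_rate * gradient          # move the parameters downhill
    return w, costs

# Hypothetical toy data: response = 2 + 3 * feature, plus a little noise
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])      # intercept column + one feature
y = 2 + 3 * x + rng.normal(scale=0.1, size=200)

w, costs = gradient_descent(X, y)
print("weights:", w)
print("first cost:", costs[0], "last cost:", costs[-1])
```

The recorded costs shrink as the iterations proceed, and the weights end up close to the intercept and slope used to generate the toy data.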

Even though the optimization may take a large number of iterations before reaching a good mapping, each iteration makes the change that improves the cost function the most (that is, lowers the error the most). Here’s an example of a complex optimization process with many local minima (the minimum points on the curve marked with letters). The process can get stuck in the deep minimum marked with an asterisk and cannot continue its descent from there.

A plotting of parameter data against the output of the cost function.

You can visualize the optimization process as a walk in high mountains, with the parameters being the different paths to descend to the valley. A gradient descent optimization occurs at each step. At each iteration, the algorithm chooses the path that reduces error the most, regardless of the direction taken. The idea is that if the steps aren’t too large (a step that is too large can make the algorithm jump over the target), always following the most downward direction will result in finding the lowest place.
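To see why step size matters, here is a small sketch that assumes a very simple error curve, w squared, whose lowest point is at w = 0. The curve, the function names, and the two learning rates are hypothetical values picked only to contrast a careful step with one that jumps over the target.

```python
def descend(gradient, start, learning_rate, n_steps):
    """Follow the negative gradient and record every position visited."""
    w = start
    path = [w]
    for _ in range(n_steps):
        w = w - learning_rate * gradient(w)
        path.append(w)
    return path

def gradient(w):
    # Derivative of the assumed error curve w**2
    return 2 * w

small_steps = descend(gradient, start=5.0, learning_rate=0.1, n_steps=20)
large_steps = descend(gradient, start=5.0, learning_rate=1.1, n_steps=20)

print("small steps end at:", small_steps[-1])
print("large steps end at:", large_steps[-1])
```

With the small learning rate, each step moves closer to the bottom of the curve. With the large one, each step lands on the opposite slope, farther from the bottom than before, so the walk moves away from the target instead of toward it.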

Unfortunately, this result doesn’t always occur because the algorithm can arrive at intermediate valleys, creating the illusion that it has reached the target. However, in most cases, gradient descent leads the machine learning algorithm to discover the right hypothesis for successfully mapping the problem. A different starting point can make the difference. Starting point x1 leads to a local minimum, whereas points x2 and x3 reach the global minimum.

Visualizing the effect of starting point on outcome.
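The effect of the starting point can be reproduced with a short sketch. The error curve below is a hypothetical one with two valleys, not the curve from the figure, and the three starting values are assumptions chosen so that they behave like x1, x2, and x3 in the description above.

```python
def cost(x):
    # Hypothetical error curve with two valleys (not the curve in the figure)
    return x**4 - 3 * x**2 + x

def cost_gradient(x):
    # Derivative of the hypothetical curve
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start, learning_rate=0.01, n_steps=1000):
    x = start
    for _ in range(n_steps):
        x -= learning_rate * cost_gradient(x)
    return x

# Assumed starting points standing in for x1, x2, and x3
starts = {"x1": 2.0, "x2": -2.0, "x3": 0.0}
results = {name: gradient_descent(start) for name, start in starts.items()}

for name, x_final in results.items():
    print(f"{name}: start={starts[name]:+.1f} -> "
          f"x={x_final:+.4f}, cost={cost(x_final):+.4f}")
```

On this curve, the run that starts at x1 settles in the shallower valley near x = 1.13, while the runs that start at x2 and x3 both reach the deeper valley near x = -1.30.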

In an optimization process, you distinguish between different optimization outcomes. You can have a global minimum that’s truly the minimum error from the cost function, and you can have many local minima: solutions that seem to produce the minimum error but actually don’t (the intermediate valleys where the algorithm gets stuck). As a remedy, given the optimization process’s random initialization, running the optimization many times is good practice. Each run tries a different sequence of descending paths, which makes it less likely that every run gets stuck in the same local minimum.
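One way to sketch the practice of running the optimization many times is to repeat gradient descent from several random starting points and keep the result with the lowest cost. The two-valley error curve, the range of starting values, the number of restarts, and all function names here are hypothetical choices for illustration.

```python
import random

def cost(x):
    # Hypothetical error curve with a local and a global minimum
    return x**4 - 3 * x**2 + x

def cost_gradient(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start, learning_rate=0.01, n_steps=1000):
    x = start
    for _ in range(n_steps):
        x -= learning_rate * cost_gradient(x)
    return x

def best_of_restarts(n_restarts=20, seed=0):
    """Run gradient descent from random starts and keep the lowest-cost result."""
    rng = random.Random(seed)
    best_x, best_cost = None, float("inf")
    for _ in range(n_restarts):
        start = rng.uniform(-2.0, 2.0)   # random initialization
        x = gradient_descent(start)
        if cost(x) < best_cost:
            best_x, best_cost = x, cost(x)
    return best_x, best_cost

best_x, best_cost = best_of_restarts()
print(f"best x={best_x:+.4f}, cost={best_cost:+.4f}")
```

Some of the individual runs end in the shallower valley, but keeping the best of many runs makes it very likely that at least one of them finds the deeper one.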

About This Article

About the book author:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specialized in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has been involved in quantitative data since 2000 with different clients and in various industries, and is one of the top 10 Kaggle data scientists.

This article can be found in the category:

Machine Learning
