Whether the machine learning problem is to guess a number or a class, the idea behind the learning strategy of the k-Nearest Neighbors (kNN) algorithm is always the same: the algorithm finds the observations most similar to the one you have to predict and derives a good guess of the answer from them, either by averaging the neighboring values (for regression) or by picking the most frequent class among them (for classification).
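
To make the idea concrete, here is a minimal from-scratch sketch of that strategy, assuming NumPy is available; the function name knn_predict and the toy data are illustrative, not from the book.

```python
# A minimal sketch of the kNN idea using NumPy only (illustrative, not from the book).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, new_point, k=3, task="classification"):
    """Predict for one new observation by looking at its k nearest neighbors."""
    # Euclidean distance from the new point to every stored observation
    distances = np.sqrt(((X_train - new_point) ** 2).sum(axis=1))
    # Indices of the k closest observations
    nearest = np.argsort(distances)[:k]
    neighbor_values = y_train[nearest]
    if task == "regression":
        return neighbor_values.mean()                          # average the neighboring values
    return Counter(neighbor_values).most_common(1)[0][0]       # pick the most frequent class

# Toy example: two features, two classes
X = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))            # likely prints 0
```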

The learning strategy of kNN is more like memorization than true learning. It's like remembering what the answer should be when a question has certain characteristics (based on circumstances or past examples), rather than really knowing the answer because you understand the question by means of specific classification rules. In this sense, kNN is often described as a lazy algorithm, because no real learning happens at training time, just data recording.

Being a lazy algorithm implies that kNN is quite fast at training but quite slow at predicting, because most of the neighbor searching and distance calculation happens at prediction time. It also implies that the algorithm is quite memory-intensive, because you have to keep the entire data set in memory, which limits its possible applications when you're dealing with big data.
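
As a rough illustration, the following sketch (assuming scikit-learn's KNeighborsClassifier with brute-force neighbor search; the data sizes are arbitrary) shows the lazy trade-off: fitting mostly just records the data, while predicting pays for the neighbor search.

```python
# Illustrative timing of the lazy-learner trade-off (sizes and seed are arbitrary).
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50_000, 20))          # the whole training set stays in memory
y_train = rng.integers(0, 2, size=50_000)
X_test = rng.normal(size=(1_000, 20))

# algorithm="brute" makes the lazy behavior explicit: no index is built at fit time
knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute")

start = time.perf_counter()
knn.fit(X_train, y_train)                        # fast: essentially data recording
fit_time = time.perf_counter() - start

start = time.perf_counter()
knn.predict(X_test)                              # slow: neighbor search happens here
predict_time = time.perf_counter() - start

print(f"fit: {fit_time:.3f}s  predict: {predict_time:.3f}s")
```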

kNN can really make a difference when you're working on classification and you have many labels to deal with (for instance, when a software agent posts a tag on a social network or proposes a selling recommendation). kNN can easily deal with hundreds of labels, whereas many other learning algorithms have to fit a separate model for each label.
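
As a hedged sketch of that point, assuming scikit-learn, the following example fits a single kNN model on a synthetic problem with 100 class labels; a one-vs-rest strategy would instead need one model per label. The data-generation settings are illustrative only.

```python
# A single kNN model handling a problem with many class labels (illustrative settings).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with 100 different class labels
X, y = make_classification(n_samples=5_000, n_features=20, n_informative=10,
                           n_classes=100, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One model covers all 100 labels; prediction is just a vote among neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy on 100-class problem:", knn.score(X_test, y_test))
```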

Usually, kNN works out the neighbors of an observation by using a measure of distance such as Euclidean (the most common choice) or Manhattan (which works better when you have many redundant features in your data). No absolute rule exists about which distance measure is best to use; it really depends on the implementation you have and on your data. You have to test each distance as a distinct hypothesis and verify by cross-validation which measure works best for the problem you're solving.
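
One way to do that, assuming scikit-learn, is to treat the distance measure as a hyperparameter and let cross-validated grid search pick the winner; the dataset and parameter grid below are just an illustration.

```python
# Cross-validating the choice of distance measure (illustrative dataset and grid).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling matters for distance-based methods; compare Euclidean and Manhattan distances
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(
    pipe,
    param_grid={"kneighborsclassifier__metric": ["euclidean", "manhattan"],
                "kneighborsclassifier__n_neighbors": [3, 5, 11]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```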
