Classification by k-Nearest Neighbors for Machine Learning

By Nikhil Abraham

Whether the machine learning problem is to guess a number or a class, the idea behind the learning strategy of the k-Nearest Neighbors (kNN) algorithm is always the same: the algorithm finds the observations most similar to the one you have to predict, and from those neighbors derives a good guess of the answer, either by averaging the neighboring values (for regression) or by picking the most frequent class among them (for classification).
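Both strategies can be sketched in a few lines of Python (a toy illustration, not a production implementation; the function name and data are invented for the example):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    # Compute Euclidean distances from the new point to every stored observation
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest observations
    nearest = np.argsort(distances)[:k]
    if task == "regression":
        # Average the neighboring values
        return float(np.mean(y_train[nearest]))
    # Pick the most frequent class among the neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy data: two well-separated groups on a line
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_value = np.array([10.0, 11.0, 9.0, 50.0, 52.0, 48.0])

print(knn_predict(X, y_class, np.array([1.1]), k=3))                     # → 0
print(knn_predict(X, y_value, np.array([5.1]), k=3, task="regression"))  # → 50.0
```

Note that "training" here amounts to nothing more than keeping `X_train` and `y_train` around; all the work happens when a prediction is requested.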

The learning strategy in kNN is more like memorization: it's like remembering what the answer should be when a question has certain characteristics (based on circumstances or past examples), rather than really knowing the answer because you understand the question by means of specific classification rules. In a sense, kNN is often called a lazy algorithm because no real learning happens at training time, just data recording.

Being a lazy algorithm implies that kNN is quite fast at training but quite slow at predicting, because most of the searching and the calculations on the neighbors happen at prediction time. It also implies that the algorithm is memory-intensive: you have to keep the entire data set in memory, which limits possible applications when dealing with big data.

Ideally, kNN can make a difference when you're working on classification and have many labels to deal with (for instance, when a software agent posts a tag on a social network or proposes a selling recommendation). kNN easily deals with hundreds of labels, whereas many other learning algorithms have to fit a separate model for each label.
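The multilabel case falls out of the same voting idea: each label gets its own vote among the neighbors. A minimal sketch, assuming the labels are encoded as a binary indicator matrix with one column per tag (the function name and toy data are invented for the example):

```python
import numpy as np

def knn_multilabel(X_train, Y_train, x_new, k=3):
    # Y_train is a binary indicator matrix: one column per possible label
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    # Predict a label when more than half of the k neighbors carry it
    votes = Y_train[nearest].mean(axis=0)
    return (votes > 0.5).astype(int)

# Toy data: 4 observations, 3 possible tags per observation
X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 1]])

print(knn_multilabel(X, Y, np.array([0.05, 0.05]), k=3))  # → [1 0 1]
```

A single pass over the neighbors answers all the labels at once, which is why adding labels to a kNN classifier costs almost nothing compared with training one model per label.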

Usually, kNN works out the neighbors of an observation using a distance measure such as Euclidean distance (the most common choice) or Manhattan distance (which tends to work better when your data has many redundant features). No absolute rule exists for which distance measure is best to use; it depends on the implementation and on the problem at hand. You have to test each distance as a distinct hypothesis and verify by cross-validation which measure works better with the problem you're solving.