Natural Language Processing in Machine Learning

By Nikhil Abraham

For human beings, understanding language is one of our first achievements, and associating words with their meanings seems natural. We also automatically handle discourse that is ambiguous, unclear, or strongly tied to the context in which we live or work (such as dialect, jargon, or terms that only family or associates understand). In addition, humans can catch subtle references to feelings and sentiments in text, which lets them recognize irony and polite speech that hides negative feelings.

Computers don’t have this ability but can rely on natural language processing (NLP), a field of computer science concerned with language understanding and language generation between a machine and a human being. Since Alan Turing proposed the Turing Test in 1950, which aims to spot an artificial intelligence based on how it communicates with humans, NLP experts have developed a series of techniques that define the state of the art in text-based computer-human interaction.

A computer powered with NLP can successfully spot spam in your email, tag each word in a sentence as a verb, a noun, or another part of speech (called part-of-speech tagging), and spot an entity like the name of a person or a company (called named entity recognition). These capabilities have found application in tasks such as spam filtering, predicting the stock market using news articles, and de-duplicating redundant information in data storage.
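To make the spam-spotting task above concrete, here is a minimal sketch of one common approach, a naive Bayes classifier with add-one smoothing. The article does not say which technique it has in mind, so treat this as an illustrative assumption. The four training messages, the `spam`/`ham` labels, and the function names are all hypothetical, and a real filter would learn from far more data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log-score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing keeps unseen words from
            # producing a zero probability.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set, invented for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]

word_counts, label_counts = train(TRAINING)
print(classify("claim your free prize", word_counts, label_counts))
print(classify("agenda for the team meeting", word_counts, label_counts))
```

With this toy data, the first message scores higher as spam because "free" and "prize" appear only in the spam examples, while the second scores higher as ham. The classifier treats a message as a bag of words and ignores word order entirely, which is exactly why the context-dependent problems described next are harder.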

Things get more difficult for NLP when translating a text from another language or working out who the subject is in an ambiguous phrase. For example, consider the sentence, “John told Luca he shouldn’t do that again.” In this case, you can’t really tell whether “he” refers to John or Luca. Disambiguating words with many meanings, such as deciding whether the word mouse in a phrase refers to an animal or a computer device, can also prove difficult. In all these problems, the difficulty arises from the context.
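The mouse example can be sketched in code. The snippet below is a simplified take on gloss-overlap disambiguation (in the spirit of the Lesk algorithm): it picks the sense whose description shares the most words with the sentence. The two sense descriptions, the stopword list, and the function names are hand-written assumptions for illustration; a real system would draw sense definitions from a lexical resource and use much richer context.

```python
import re

# Hypothetical, hand-written sense descriptions (not from any dictionary).
SENSES = {
    "animal": "small rodent with a long tail that eats cheese and is chased by a cat",
    "device": "hand held pointing device used to click and move a cursor on a computer screen",
}

# A tiny stopword list so that words like "the" do not count as evidence.
STOPWORDS = {"a", "an", "the", "and", "is", "by", "on", "to", "with", "that", "of", "my", "in"}


def content_words(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose description overlaps most with the sentence.

    Returns (best_sense, overlaps). On a tie, the sense listed first wins,
    which is an arbitrary choice made for this sketch.
    """
    context = content_words(sentence)
    overlaps = {
        sense: len(context & content_words(gloss))
        for sense, gloss in senses.items()
    }
    return max(overlaps, key=overlaps.get), overlaps


print(disambiguate("The cat chased the mouse under the table"))
print(disambiguate("Click the left mouse button to move the cursor on the screen"))
```

The first sentence shares "cat" and "chased" with the animal description, and the second shares "click", "move", "cursor", and "screen" with the device description. A sentence such as "I bought a mouse yesterday" shares nothing with either description, so the sketch falls back on an arbitrary tie-break, which illustrates how much the task depends on context being available.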

As humans, we can easily resolve ambiguity by examining the text for hints about elements like place and time that express the details of the conversation (such as understanding what happened between John and Luca, or whether the conversation is about a computer when mentioning the mouse). Relying on additional information for understanding is part of the human experience. This sort of analysis is somewhat difficult for computers.

Moreover, if the task requires critical contextual knowledge or demands that the listener resort to common sense and general expertise, the task becomes daunting. Simply put, NLP still has a lot of ground to cover before it can effectively extract meaningful summaries from text or fill in information that is missing from it.