Luca Mueller

Articles From Luca Mueller

10 results
10 Applications that Require Deep Learning

Article / Updated 08-16-2022

This article is too short. It can’t even begin to describe the ways in which deep learning will affect you in the future. Consider it a tantalizing tidbit — an appetizer that can whet your appetite for exploring the world of deep learning further. These deep learning applications are already common in some cases. You probably used at least one of them today, and quite likely more than just one. Although the technology has begun to see widespread usage, it’s really just the beginning. We’re at the start of something, and AI is actually quite immature at this point. This article doesn’t discuss killer robots, dystopian futures, AI run amok, or any of the sensational scenarios that you might see in the movies. The information you find here is about real-life, existing AI applications that you can interact with today.

Deep learning can be used to restore color to black-and-white videos and pictures

You probably have some black-and-white videos or pictures of family members or special events that you’d love to see in color. Color consists of three elements: hue (the actual color), value (the darkness or lightness of the color), and saturation (the intensity of the color). Oddly enough, many artists are color-blind and make strong use of color value in their creations. So having hue missing (the element that black-and-white art lacks) isn’t the end of the world. Quite the contrary, some artists view it as an advantage. When viewing something in black and white, you see value and saturation but not hue. Colorization is the process of adding the hue back in. Artists generally perform this process using a painstaking selection of individual colors. However, AI has automated this process using Convolutional Neural Networks (CNNs). The easiest way to use a CNN for colorization is to find a library to help you. The Algorithmia site offers such a library and shows some example code. You can also try the application by pasting a URL into the supplied field. This Petapixel.com article describes just how well this application works. It’s absolutely amazing!

Deep learning can approximate person poses in real time

Person poses don’t tell you who is in a video stream, but rather what elements of a person are in the video stream. For example, using a person pose could tell you whether the person’s elbow appears in the video and where it appears. This article tells you more about how this whole visualization technique works. In fact, you can see how the system works through a short animation of one person in the first case and three people in the second case. Person poses can have all sorts of useful purposes. For example, you could use a person pose to help people improve their form for various kinds of sports — everything from golf to bowling. A person pose could also make new sorts of video games possible. Imagine being able to track a person’s position for a game without the usual assortment of cumbersome gear. Theoretically, you could use person poses to perform crime-scene analysis or to determine the possibility of a person committing a crime. Another interesting application of pose detection is for medical and rehabilitation purposes. Software powered by deep learning could tell you whether you’re doing your exercises correctly and track your improvements. An application of this sort could support the work of a professional rehabilitator by taking care of you when you aren’t in a medical facility (an activity called telerehabilitation).
Fortunately, you can at least start working with person poses today using the tfjs-models (PoseNet) library.

Deep learning can perform real-time behavior analysis

Behavior analysis goes a step beyond what person-pose analysis does. When you perform behavior analysis, the question still isn’t a matter of whom, but how. This particular AI application affects how vendors design products and websites. Articles such as this one from Amplitude go to great lengths to fully define and characterize the use of behavior analysis. In most cases, behavior analysis helps you see how the process the product designer expected you to follow doesn’t match the process you actually use. Behavior analysis has a role to play in other areas of life as well. For example, behavior analysis can help people in the medical profession identify potential issues with people who have specific medical conditions, such as autism, and help the patient overcome those issues. Behavior analysis may also help teachers of physical arts show students how to hone their skills. You might also see it used in the legal profession to help ascertain motive. (The guilt is obvious, but why a person does something is essential to fair remediation of an unwanted behavior.) Fortunately, you can already start performing behavior analysis with Python.

Deep learning can be used to translate languages

The Internet has created an environment that can keep you from knowing whom you’re really talking to, where that person is, or sometimes even when the person is talking to you. One thing hasn’t changed, however: the need to translate one language to another when the two parties don’t speak a common language. In a few cases, mistranslation can be humorous, assuming that both parties have a sense of humor. However, mistranslation has also led to all sorts of serious consequences, including war. Consequently, even though translation software is extremely accessible on the Internet, careful selection of which product to use is important. One of the most popular of these applications is Google Translate, but many other applications are available, such as DeepL. According to Forbes, machine translation is one area in which AI excels. Translation applications generally rely on Bidirectional Recurrent Neural Networks (BRNNs). You don’t have to create your own BRNN because you have many existing APIs to choose from. For example, you can get Python access to the Google Translate API using a library. The point is that translation is possibly one of the more popular deep learning applications and one that many people use without even thinking about it.

Deep learning can be used to estimate solar savings potential

Trying to determine whether solar energy will actually work in your location is difficult unless a lot of other people are also using it. In addition, it’s even harder to know what level of savings you might enjoy. Of course, you don’t want to install solar energy if it won’t satisfy your goals for using it, which may not actually include long-term cost savings (although generally it does). Some deep reinforcement learning projects now help you take the guesswork out of solar energy, including Project Sunroof. Fortunately, you can also get support for this kind of prediction in your Python application.

AI can beat people at computer games

The AI-versus-people competition continues to attract interest. From winning at chess to winning at Go, AI seems to have become unbeatable — at least, unbeatable at one game.
Unlike humans, AI specializes, and an AI that can win at Go is unlikely to do well at chess. Even so, 2017 is often hailed as the beginning of the end for humans over AI in games. Of course, the competition has been going on for some time, and you can likely find competitions that the AI won far earlier than 2017. Indeed, some sources place the date for a Go win as early as October 2015. The article at Interesting Engineering describes 11 other times that the AI won. The trick is to custom create an AI that can win a particular game while realizing that, in specializing at that game, the AI may not do well at other games. The process of building an AI for just one game can look difficult. This article describes how to build a simple chess AI, which won’t defeat a chess master but could do well against an intermediate player. However, it’s actually a bit soon to say that people are out of the game. In the future, people may compete against the AI with more than one game. Examples of this sort of competition already abound, such as people who perform in a triathlon of games, which consists of three sporting events rather than one. The competition would then become one of flexibility: the AI couldn’t simply hunker down and learn only one game, so the human would have a flexibility edge. This sort of AI use demonstrates that humans and AI may have to cooperate in the future, with the AI specializing in specific tasks and the human providing the flexibility needed to perform all required tasks.

Deep learning can be used to generate voices

Your car may already speak to you; many cars speak regularly to people now. Oddly, the voice generation is often so good that it’s hard to tell the generated voice from a real one. Some articles talk about how the experience of encountering computer voices that sound quite real is becoming more common. The issue attracts enough attention now that many call centers tell you that you’re speaking to a computer rather than a person. Although call output relies on scripted responses, making it possible to generate responses with an extremely high level of confidence, voice recognition is a little harder to perform (but it has greatly improved). To work with voice recognition successfully, you often need to limit your input to specific key terms. By using keywords that the voice recognition is designed to understand, you avoid the need for a user to repeat a request. This need for specific terms gives away that you’re talking to a computer — simply ask for something unexpected and the computer won’t know what to do with it. The easy way to implement your own voice system is to rely on an existing API, such as Cloud Speech to Text. Of course, you might need something that you can customize; in this case, using an API will prove helpful. This article tells how to build your own voice-based application using Python.

Deep learning can be used to predict demographics

Demographics, those vital or social statistics that group people by certain characteristics, have always been part art and part science. You can find any number of articles about getting your computer to generate demographics for clients (or potential clients). The use of demographics is wide ranging, but you see them used for things like predicting which product a particular group will buy (versus that of the competition). Demographics are an important means of categorizing people and then predicting some action on their part based on their group associations.
Here are the methods that you often see cited for AIs when gathering demographics:

Historical: Based on previous actions, an AI generalizes which actions you might perform in the future.

Current activity: Based on the action you perform now and perhaps other characteristics, such as gender, a computer predicts your next action.

Characteristics: Based on the properties that define you, such as gender, age, and area where you live, a computer predicts the choices you are likely to make.

You can find articles about AI’s predictive capabilities that seem almost too good to be true. For example, this Medium article says that AI can now predict your demographics based solely on your name. The company in that article, Demografy, claims to provide gender, age, and cultural affinity based solely on name. Even though the site claims that it’s 100 percent accurate, this statistic is highly unlikely because some names are gender ambiguous, such as Renee, and others are assigned to one gender in some countries and another gender in others. Yes, demographic prediction can work, but exercise care before believing everything that these sites tell you. If you want to experiment with demographic prediction, you can find a number of APIs online. For example, the DeepAI API promises to help you predict age, gender, and cultural background based on a person’s appearance in a video. Each of the online APIs does specialize, so you need to choose the API with an eye toward the kind of input data you can provide.

AI can create art from real-world pictures

Deep learning can use the content of a real-world picture and an existing master for style to create a combination of the two. In fact, some pieces of art generated using this approach are commanding high prices on the auction block. You can find all sorts of articles on this particular kind of art generation, such as this Wired article. However, even though pictures are nice for hanging on the wall, you might want to produce other kinds of art. For example, you can create a 3-D version of your picture using products like Smoothie 3-D. It’s not the same as creating a sculpture; rather, you use a 3-D printer to build a 3-D version of your picture. Check out an experiment that you can perform to see how the process works. The output of an AI doesn’t need to consist of something visual, either. For example, deep learning enables you to create music based on the content of a picture. This form of art makes the method used by AI clearer. The AI transforms content that it doesn’t understand from one form to another. As humans, we see and understand the transformation, but all the computer sees are numbers to process using clever algorithms created by other humans.

Deep learning can be used to forecast natural catastrophes

People have been trying to predict natural disasters for as long as there have been people and natural disasters. No one wants to be part of an earthquake, tornado, volcanic eruption, or any other natural disaster. Being able to get away quickly is the prime consideration in this case, given that humans can’t control their environment well enough yet to prevent any natural disaster. Deep learning provides the means to look for extremely subtle patterns that boggle the minds of humans. These patterns can help predict a natural catastrophe. The fact that software can predict any disaster at all is simply amazing. However, this article warns that relying on such software exclusively would be a mistake.
Overreliance on technology is a constant theme, so don’t be surprised that deep learning is less than perfect in predicting natural catastrophes as well.

View Article
Deep Learning For Dummies Cheat Sheet

Cheat Sheet / Updated 04-12-2022

Deep learning affects every area of your life — everything from smartphone use to diagnostics received from your doctor. Python is an incredible programming language that you can use to perform deep learning tasks with a minimum of effort. By combining the huge number of available libraries with Python-friendly frameworks, you can avoid writing the low-level code normally needed to create deep learning applications. All you need to focus on is getting the job done. This cheat sheet presents the most commonly needed reminders for making your programming experience fast and easy.

View Cheat Sheet
How Does Machine Learning Work?

Article / Updated 08-26-2021

Machine learning is an application of AI that can automatically learn and improve from experience without being explicitly programmed to do so. The machine learning occurs as a result of analyzing ever increasing amounts of data, so the basic algorithms don’t change, but the code’s internal weights and biases used to select a particular answer do. Of course, nothing is quite this simple. The following article discusses more about what machine learning is so that you can understand its place within the world of AI and what deep learning acquires from it.

Data scientists often refer to the technology used to implement machine learning as algorithms. An algorithm is a series of step-by-step operations, usually computations, that can solve a defined problem in a finite number of steps. In machine learning, the algorithms use a series of finite steps to solve the problem by learning from data.

Understanding how machine learning works

Machine learning algorithms learn, but it’s often hard to find a precise meaning for the term learning because different ways exist to extract information from data, depending on how the machine learning algorithm is built. Generally, the learning process requires huge amounts of data that provides an expected response given particular inputs. Each input/response pair represents an example, and more examples make it easier for the algorithm to learn. That’s because each input/response pair fits within a line, cluster, or other statistical representation that defines a problem domain. Machine learning is the act of optimizing a model, which is a mathematical, summarized representation of data itself, such that it can predict or otherwise determine an appropriate response even when it receives input that it hasn’t seen before. The more accurately the model can come up with correct responses, the better the model has learned from the data inputs provided. An algorithm fits the model to the data, and this fitting process is training.

The image below shows an extremely simple graph that simulates what occurs in machine learning. In this case, starting with input values of 1, 4, 5, 8, and 10 and pairing them with their corresponding outputs of 7, 13, 15, 21, and 25, the machine learning algorithm determines that the best way to represent the relationship between the input and output is the formula 2x + 5. This formula defines the model used to process the input data — even new, unseen data — to calculate a corresponding output value. The trend line (the model) shows the pattern formed by this algorithm, such that a new input of 3 will produce a predicted output of 11. Even though most machine learning scenarios are much more complicated than this (and the algorithm can’t create rules that accurately map every input to a precise output), the example gives you a basic idea of what happens. Rather than have to individually program a response for an input of 3, the model can compute the correct response based on input/response pairs that it has learned.

Understanding that machine learning is pure math

The central idea behind machine learning is that you can represent reality by using a mathematical function that the algorithm doesn’t know in advance, but which it can guess after seeing some data (always in the form of paired inputs and outputs). You can express reality and all its challenging complexity in terms of unknown mathematical functions that machine learning algorithms find and make available as a modification of their internal mathematical function.
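As a concrete illustration of the 2x + 5 example above, here is a minimal Python sketch (not part of the original article) that fits a straight line to those five input/output pairs and then predicts the output for a new input of 3:

# A minimal sketch (not from the article) that recovers the 2x + 5 model
# from the example input/output pairs by fitting a straight line.
import numpy as np

x = np.array([1, 4, 5, 8, 10])          # inputs
y = np.array([7, 13, 15, 21, 25])       # observed outputs

slope, intercept = np.polyfit(x, y, deg=1)   # fit y = slope * x + intercept
print(slope, intercept)                      # roughly 2.0 and 5.0

new_input = 3
print(slope * new_input + intercept)         # predicts about 11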
That is, every machine learning algorithm is built around a modifiable math function. The function can be modified because it has internal parameters or weights for such a purpose. As a result, the algorithm can tailor the function to specific information taken from data. This concept is the core idea for all kinds of machine learning algorithms. Learning in machine learning is purely mathematical, and it ends by associating certain inputs with certain outputs. It has nothing to do with understanding what the algorithm has learned. (When humans analyze data, we build an understanding of the data to a certain extent.) The learning process is often described as training because the algorithm is trained to match the correct answer (the output) to every question offered (the input). (Machine Learning For Dummies, by John Paul Mueller and Luca Massaron, describes how this process works in detail.) In spite of lacking deliberate understanding and of being a mathematical process, machine learning can prove useful in many tasks. It provides many AI applications the power to mimic rational thinking given a certain context when learning occurs by using the right data.

Different strategies for machine learning

Machine learning offers a number of different ways to learn from data. Depending on your expected output and on the type of input you provide, you can categorize algorithms by learning style. The style you choose depends on the sort of data you have and the result you expect. The four learning styles used to create algorithms are:

Supervised machine learning
Unsupervised machine learning
Self-supervised machine learning
Reinforcement machine learning

The following sections discuss these machine learning styles.

Supervised machine learning

When working with supervised machine learning algorithms, the input data is labeled and has a specific expected result. You use training to create a model that an algorithm fits to the data. As training progresses, the predictions or classifications become more accurate. Here are some examples of supervised machine learning algorithms:

Linear or Logistic regression
Support Vector Machines (SVMs)
Naïve Bayes
K-Nearest Neighbors (KNN)

You need to distinguish between regression problems, whose target is a numeric value, and classification problems, whose target is a qualitative variable, such as a class or tag. A regression task could determine the average prices of houses in the Boston area, while an example of a classification task is distinguishing between kinds of iris flowers based on their sepal and petal measures. Here are some examples of supervised machine learning:

Data Input (X) | Data Output (y) | Real-World Application
History of customers’ purchases | A list of products that customers have never bought | Recommender system
Images | A list of boxes labeled with an object name | Image detection and recognition
English text in the form of questions | English text in the form of answers | Chatbot, a software application that can converse
English text | German text | Machine language translation
Audio | Text transcript | Speech recognition
Image, sensor data | Steering, braking, or accelerating | Behavioral planning for autonomous driving

Unsupervised machine learning

When working with unsupervised machine learning algorithms, the input data isn’t labeled and the results aren’t known. In this case, analysis of structures in the data produces the required model. The structural analysis can have a number of goals, such as to reduce redundancy or to group similar data.
Examples of unsupervised machine learning are:

Clustering
Anomaly detection
Neural networks

Self-supervised machine learning

You’ll find all sorts of learning described online, but self-supervised learning is in a category of its own. Some people describe it as autonomous supervised learning, which gives you the benefits of supervised learning but without all the work required to label data. Theoretically, self-supervised learning could solve issues with other kinds of learning that you may currently use. The following list compares self-supervised learning with other sorts of learning that people use.

Supervised machine learning: The closest form of learning associated with self-supervised learning is supervised machine learning because both kinds of learning rely on pairs of inputs and labeled outputs. In addition, both forms of learning are associated with regression and classification. However, the difference is that self-supervised learning doesn’t require a person to label the output. Instead, it relies on correlations, embedded metadata, or domain knowledge embedded within the input data to contextually discover the output label.

Unsupervised machine learning: Like unsupervised machine learning, self-supervised learning requires no data labeling. However, unsupervised learning focuses on data structure — that is, patterns within the data. Therefore, you don’t use self-supervised learning for tasks such as clustering, grouping, dimensionality reduction, recommendation engines, or the like.

Semi-supervised machine learning: A semi-supervised learning solution works like an unsupervised learning solution in that it looks for data patterns. However, semi-supervised learning relies on a mix of labeled and unlabeled data to perform its tasks faster than is possible using strictly unlabeled data. Self-supervised learning never requires labels and uses context to perform its task, so it would actually ignore the labels when supplied.

Reinforcement machine learning

You can view reinforcement learning as an extension of self-supervised learning because both forms use the same approach to learning with unlabeled data to achieve similar goals. However, reinforcement learning adds a feedback loop to the mix. When a reinforcement learning solution performs a task correctly, it receives positive feedback, which strengthens the model in connecting the target inputs and output. Likewise, it can receive negative feedback for incorrect solutions. In some respects, the system works much the same as working with a dog based on a system of rewards.

Training, validating, and testing data for machine learning

Machine learning is a process, just as everything is a process in the world of computers. To build a successful machine learning solution, you perform these tasks as needed, and as often as needed:

Training: Machine learning begins when you train a model using a particular algorithm against specific data. The training data is separate from any other data, but it must also be representative. If the training data doesn’t truly represent the problem domain, the resulting model can’t provide useful results. During the training process, you see how the model responds to the training data and make changes, as needed, to the algorithms you use and the manner in which you massage the data prior to input to the algorithm.

Validating: Many datasets are large enough to split into a training part and a testing part. You first train the model using the training data, and then you validate it using the testing data.
Of course, the testing data must again represent the problem domain accurately. It must also be statistically compatible with the training data. Otherwise, you won’t see results that reflect how the model will actually work.

Testing: After a model is trained and validated, you still need to test it using real-world data. This step is important because you need to verify that the model will actually work on a larger dataset that you haven’t used for either training or testing. As with the training and validation steps, any data you use during this step must reflect the problem domain you want to interact with using the machine learning model.

Training provides a machine learning algorithm with all sorts of examples of the desired inputs and outputs expected from those inputs. The machine learning algorithm then uses this input to create a math function. In other words, training is the process whereby the algorithm works out how to tailor a function to the data. The output of such a function is typically the probability of a certain output or simply a numeric value as output. To give an idea of what happens in the training process, imagine a child learning to distinguish trees from objects, animals, and people. Before the child can do so in an independent fashion, a teacher presents the child with a certain number of tree images, complete with all the facts that make a tree distinguishable from other objects of the world. Such facts could be features, such as the tree’s material (wood), its parts (trunk, branches, leaves or needles, roots), and location (planted in the soil). The child builds an understanding of what a tree looks like by contrasting the display of tree features with the images of other, different examples, such as pieces of furniture that are made of wood but don’t share other characteristics with a tree.

A machine learning classifier works the same way. A classifier algorithm provides you with a class as output. For instance, it could tell you that the photo you provide as an input matches the tree class (and not an animal or a person). To do so, it builds its cognitive capabilities by creating a mathematical formulation that includes all the given input features in a way that creates a function that can distinguish one class from another.

Looking for generalization in machine learning

To be useful, a machine learning model must represent a general view of the data provided. If the model doesn’t follow the data closely enough, it’s underfitted — that is, not fitted enough because of a lack of training. On the other hand, if the model follows the data too closely, it’s overfitted, following the data points like a glove because of too much training. Underfitting and overfitting both cause problems because the model isn’t generalized enough to produce useful results. Given unknown input data, the resulting predictions or classifications will contain large error values. Only when the model is correctly fitted to the data will it provide results within a reasonable error range.

This whole issue of generalization is also important in deciding when to use machine learning. A machine learning solution always generalizes from specific examples to general examples of the same sort. How it performs this task depends on the orientation of the machine learning solution and the algorithms used to make it work.
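The following short sketch, which isn't part of the original article, illustrates the train/validation split and the kind of fit check described above. It assumes scikit-learn is installed; the dataset and the two model complexities are arbitrary choices made purely for illustration:

# A minimal sketch (assumes scikit-learn is installed) of splitting data,
# fitting models of different complexity, and comparing training performance
# with validation performance to spot under- or overfitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 5 + rng.normal(0, 1.0, size=100)   # noisy 2x + 5 data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for degree in (1, 15):   # a simple model versus a needlessly complex one
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))

# A large gap between the training and test scores suggests overfitting;
# low scores on both suggest underfitting.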
The problem for data scientists and others using machine learning and deep learning techniques is that the computer won’t display a sign telling you that the model correctly fits the data. Often, it’s a matter of human intuition to decide when a model is trained enough to provide a good generalized result. In addition, the solution creator must choose the right algorithm out of the thousands that exist. Without the right algorithm to fit the model to the data, the results will be disappointing. To make the selection process work, the data scientist must possess:

A strong knowledge of the available machine learning algorithms
Experience dealing with the kind of data in question
An understanding of the desired output
A desire to experiment with various machine learning algorithms

The last requirement is the most important because there are no hard-and-fast rules that say a particular algorithm will work with every kind of data in every possible situation. If this were the case, so many algorithms wouldn’t be available. To find the best algorithm, the data scientist often resorts to experimenting with a number of algorithms and comparing the results.

Getting to know the limits of bias

Your computer has no bias. It has no goal of world domination or of making your life difficult. In fact, computers don’t have goals of any kind. The only thing a computer can provide is output based on inputs and processing technique. However, bias still gets into the computer and taints the results it provides in a number of ways:

Data: The data itself can contain mistruths or simply misrepresentations. For example, if a particular value appears twice as often in the data as it does in the real world, the output from a machine learning solution is tainted, even though the data itself is correct.

Algorithm: Using the wrong algorithm will cause the machine learning solution to fit the model to the data incorrectly.

Training: Too much or too little training changes how the model fits the data and therefore the result.

Human interpretation: Even when a machine learning solution outputs a correct result, the human using that output can misinterpret it. The results are every bit as bad as, and perhaps worse than, when the machine learning solution fails to work as anticipated.

You need to consider the effects of bias no matter what sort of machine learning solution you create. It’s important to know what sorts of limits these biases place on your machine learning solution and whether the solution is reliable enough to provide useful output.

Keeping model complexity in mind for machine learning

Simpler is always better when it comes to machine learning. Many different algorithms may provide you with useful output from your machine learning solution, but the best algorithm to use is the one that’s easiest to understand and provides the most straightforward results. Occam’s Razor is generally recognized as the best strategy to follow. Basically, Occam’s Razor tells you to use the simplest solution that will solve a particular problem. As complexity increases, so does the potential for errors. The most important guiding factor when selecting an algorithm should be simplicity.

View Article
Machine Learning vs. Deep Learning: Explaining Deep Learning Differences from Other Forms of AI

Article / Updated 11-14-2019

Given the embarrassment of riches that pertain to AI as a whole, such as large amounts of data, new and powerful computational hardware available to everyone, and plenty of private and public investments, you may be skeptical about the technology behind deep learning, which consists of neural networks that have more neurons and hidden layers than in the past. Deep networks contrast with the simpler, shallower networks of the past, which featured one or two hidden layers at best. Many solutions that render deep learning possible today are not at all new, but deep learning uses them in new ways.

Deep learning isn’t simply a rebranding of an old technology, the perceptron, discovered in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory. Deep learning works better because of the extra sophistication it adds through the full use of powerful computers and the availability of better (not just more) data. Deep learning also implies a profound qualitative change in the capabilities offered by the technology, along with new and astonishing applications. The presence of these capabilities modernizes old but good neural networks, transforming them into something new. The following article describes just how deep learning achieves its task.

Adding more layers for deep learning

You may wonder why deep learning has blossomed only now when the technology used as the foundation of deep learning existed long ago. Computers are more powerful today, and deep learning can access huge amounts of data. However, these answers point only to important problems with deep learning in the past, and lower computing power along with less data weren’t the only insurmountable obstacles. Until recently, deep learning also suffered from a key technical problem that kept neural networks from having enough layers to perform truly complex tasks.

Because it can use many layers, deep learning can solve problems that are out of reach of machine learning, such as image recognition, machine translation, and speech recognition. When fitted with only a few layers, a neural network is a perfect universal function approximator, which is a system that can recreate any possible mathematical function. When fitted with many more layers, a neural network becomes capable of creating, inside its internal chain of matrix multiplications, a sophisticated system of representations to solve complex problems. To understand how a complex task like image recognition works, consider this process:

A deep learning system trained to recognize images (such as a network capable of distinguishing photos of dogs from those featuring cats) defines internal weights that have the capability to recognize a picture topic.

After detecting each single contour and corner in the image, the deep learning network assembles all such basic traits into composite characteristic features.

The network matches such features to an ideal representation that provides the answer.

In other words, a deep learning network can distinguish dogs from cats using its internal weights to define a representation of what, ideally, a dog and a cat should resemble. It then uses these internal weights to match any new image you provide it with. One of the earliest achievements of deep learning that made the public aware of its potential is the cat neuron. The Google Brain team, run at that time by Andrew Ng and Jeff Dean, put together 16,000 computers to calculate a deep learning network with more than a billion weights, thus enabling unsupervised learning from YouTube videos.
The computer network could even determine by itself, without any human intervention, what a cat is, and Google scientists managed to dig out of the network a representation of how the network itself expected a cat should look (see the Wired article discussing neural networks).

During the time that scientists couldn’t stack more layers into a neural network because of the limits of computer hardware, the potential of the technology remained buried, and scientists ignored neural networks. The lack of success added to the profound skepticism that arose around the technology during the last AI winter. However, what really prevented scientists from creating something more sophisticated was the problem of vanishing gradients. A vanishing gradient occurs when you try to transmit a signal through a neural network and the signal quickly fades to near zero values; it can’t get through the activation functions. This happens because neural networks are chained multiplications. Each multiplication by a value smaller than one decreases the incoming values rapidly, and activation functions need large enough values to let the signal pass. The farther neuron layers are from the output, the higher the likelihood that they’ll get locked out of updates because the signals are too small and the activation functions will stop them. Consequently, your network stops learning as a whole, or it learns at an incredibly slow pace. Every attempt at putting together and testing complex networks ended in failure because the backpropagation algorithm couldn’t update the layers nearer the input, thus rendering any learning from complex data, even when such data was available at the time, almost impossible.

Today, deep networks are possible thanks to the studies of scholars from the University of Toronto in Canada, such as Geoffrey Hinton, who insisted on working on neural networks even when they seemed to most people to be an old-fashioned machine learning approach. Professor Hinton, a veteran of the field of neural networks (he contributed to defining the backpropagation algorithm), and his team in Toronto devised a few methods to circumvent the problem of vanishing gradients. He opened the field to rethinking new solutions that made neural networks a crucial tool in machine learning and AI again. Professor Hinton and his team are also memorable for being among the first to test GPU usage in order to accelerate the training of a deep neural network. In 2012, they won an open competition, organized by the pharmaceutical company Merck and Kaggle (the latter a website for data science competitions), using their most recent deep learning discoveries. This event brought great attention to their work. You can read all the details of the Hinton team’s revolutionary achievement with neural network layers in this Geoffrey Hinton interview.

Changing the activations for deep learning

Geoffrey Hinton’s team was able to add more layers to a neural architecture because of two solutions that prevented trouble with backpropagation:

They prevented the exploding gradients problem by using smarter network initialization. An exploding gradient differs from a vanishing gradient because it can make a network blow up as the exploding gradient becomes too large to handle. Your network can explode unless you correctly initialize the network to prevent it from computing large weight numbers.

They solved the problem of vanishing gradients by changing the network activations.
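The following tiny numeric sketch, which isn't from the article, shows the vanishing-gradient effect described above: the sigmoid derivative never exceeds 0.25, so chaining it across layers drives the backpropagated signal toward zero.

# A tiny illustration (not from the article) of the vanishing gradient:
# the sigmoid derivative never exceeds 0.25, so multiplying many of them
# together drives the backpropagated signal toward zero.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

signal = 1.0
for layer in range(1, 11):
    signal *= sigmoid_derivative(0.0)   # 0.25, the largest possible value
    print("after layer %i: %.8f" % (layer, signal))

# After only 10 layers the signal is smaller than 0.000001, which is why the
# layers closest to the input barely receive any update.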
After examining how a sigmoid activation worked, the team realized that passing a signal through various activation layers tended to damp the backpropagation signal until it became too faint to pass anymore. They used a new activation as the solution for this problem. The choice fell on an old activation type, ReLU, which stands for rectified linear unit. A ReLU activation stops the received signal if it is below zero (assuring the non-linearity characteristic of neural networks) and lets the signal pass unchanged if it is above zero. (Using this type of activation is an example of combining old but still good technology with current technology.) The image below shows how this process works.

The ReLU worked incredibly well and let the backpropagation signal arrive at the initial deep network layers. When the signal is positive, its derivative is 1, which you can verify by looking at the graph of the ReLU function. Note that the rate of change is constant and equivalent to a unit when the input signal is positive (whereas when the signal is negative, the derivative is 0, thus preventing the signal from passing). You can calculate the ReLU function using f(x) = max(0, x). The use of this algorithm increased training speed a lot, allowing fast training of even deeper networks without incurring any dead neurons. A dead neuron is one that the network can’t activate because the signals are too faint.

Adding regularization by dropout for deep learning

The other introduction made by Hinton’s team to complete the initial deep learning solution aimed at regularizing the network. A regularized network limits the network weights, which keeps the network from memorizing the input data and forces it to generalize from the witnessed data patterns. Remember, certain neurons memorize specific information and force the other neurons to rely on this stronger neuron, causing the weaker neurons to give up learning anything useful themselves (a situation called co-adaptation). To prevent co-adaptation, the code temporarily switches off the activation of a random portion of neurons in the network. As you see from the left side of the image below, the weights normally operate by multiplying their inputs into outputs for the activations. To switch off activation, the code multiplies a mask made of a random mix of ones and zeros with the results. If the neuron is multiplied by one, the network passes its signal. When a neuron is multiplied by zero, the network stops its signal, forcing other neurons not to rely on it in the process.

Dropout works only during training and doesn’t touch any part of the weights. It simply masks and hides part of the network, forcing the unmasked part to take a more active role in learning data patterns. During prediction time, dropout doesn’t operate, and the weights are numerically rescaled to take into account the fact that they didn’t work all together during training.
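The following minimal NumPy sketch, which isn't from the article, mimics the masking step described above: a random mix of ones and zeros switches off part of a layer's activations during training, and the rescaling by the keep probability stands in for the test-time adjustment (applied here to the activations for simplicity).

# A minimal sketch (not from the article) of the dropout mask described above.
import numpy as np

rng = np.random.RandomState(0)
activations = rng.uniform(0.1, 1.0, size=10)   # outputs of one layer's neurons
drop_probability = 0.5

# Training time: a random mask of ones and zeros switches off some neurons.
mask = (rng.uniform(size=activations.shape) > drop_probability).astype(float)
dropped = activations * mask
print(dropped)        # roughly half the activations are now zero

# Prediction time: no mask is applied; instead, the values are rescaled by the
# keep probability to account for having trained with dropout.
rescaled = activations * (1 - drop_probability)
print(rescaled)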

View Article
Using AI for Sentiment Analysis

Article / Updated 09-06-2019

Sentiment analysis computationally derives from a written text the writer’s attitude (whether positive, negative, or neutral) toward the text topic. This kind of analysis proves useful for people working in marketing and communication because it helps them understand what customers and consumers think of a product or service and thus act appropriately (for instance, trying to recover unsatisfied customers or deciding to use a different sales strategy). Everyone performs sentiment analysis. For example, when reading text, people naturally try to determine the sentiment that moved the person who wrote it. However, when the number of texts to read and understand is too huge and the text constantly accumulates, as in social media and customer e-mails, automating sentiment analysis is important.

The upcoming example is a test run of RNNs using Keras and TensorFlow that builds a sentiment analysis algorithm capable of classifying the attitudes expressed in a film review. The data is a sample of the IMDb dataset that contains 50,000 reviews (split in half between train and test sets) of movies accompanied by a label expressing the sentiment of the review (0 = negative, 1 = positive). IMDb is a large online database containing information about films, TV series, and video games. Originally maintained by a fan base, it’s now run by an Amazon subsidiary. On IMDb, people find the information they need about their favorite show as well as post their comments or write a review for other visitors to read.

Keras offers a downloadable wrapper for IMDb data. You prepare, shuffle, and arrange this data into a train and a test set. In particular, the IMDb textual data offered by Keras is cleansed of punctuation, normalized into lowercase, and transformed into numeric values. Each word is coded into a number representing its ranking in frequency. Most frequent words have low numbers; less frequent words have higher numbers. As a starting point, the code imports the imdb function from Keras and uses it to retrieve the data from the Internet (about a 17.5MB download). The parameters that the example uses encompass just the top 10,000 words, and Keras should shuffle the data using a specific random seed. (Knowing the seed makes it possible to reproduce the shuffle as needed.) The function returns two train and test sets, both made of text sequences and the sentiment outcome.

from keras.datasets import imdb
top_words = 10000
((x_train, y_train), (x_test, y_test)) = imdb.load_data(num_words=top_words, seed=21)

After the previous code completes, you can check the number of examples using the following code:

print("Training examples: %i" % len(x_train))
print("Test examples: %i" % len(x_test))

After inquiring about the number of cases available for use in the training and test phase of the neural network, the code outputs an answer of 25,000 examples for each phase. (This dataset is a relatively small one for a language problem; clearly the dataset is mainly for demonstration purposes.) In addition, the code determines whether the dataset is balanced, which means it has an almost equal number of positive and negative sentiment examples.

import numpy as np
print(np.unique(y_train, return_counts=True))

The result, array([12500, 12500]), confirms that the dataset is split evenly between positive and negative outcomes. Such a balance between the response classes is exclusively because of the demonstrative nature of the dataset. In the real world, you seldom find balanced datasets.
The next step creates some Python dictionaries that can convert between the codes used in the dataset and the real words. In fact, the dataset used in this example is preprocessed and provides sequences of numbers representing the words, not the words themselves. (LSTM and GRU algorithms that you find in Keras expect sequences of numbers as input.)

word_to_id = {w:i+3 for w,i in imdb.get_word_index().items()}
id_to_word = {0:'<PAD>', 1:'<START>', 2:'<UNK>'}
id_to_word.update({i+3:w for w,i in imdb.get_word_index().items()})
def convert_to_text(sequence):
    return ' '.join([id_to_word[s] for s in sequence if s>=3])
print(convert_to_text(x_train[8]))

The previous code snippet defines two conversion dictionaries (from words to numeric codes and vice versa) and a function that translates the dataset examples into readable text. As an example, the code prints the ninth example: “this movie was like a bad train wreck as horrible as it was …”. From this excerpt, you can easily anticipate that the sentiment for this movie isn’t positive. Words such as bad, wreck, and horrible convey a strong negative feeling, and that makes guessing the correct sentiment easy.

In this example, you receive the numeric sequences and turn them back into words, but the opposite is common. Usually, you get phrases made up of words and turn them into sequences of integers to feed to a layer of RNNs. Keras offers a specialized class, Tokenizer (https://keras.io/preprocessing/text/#tokenizer), which can do that for you. It uses the method fit_on_texts, to learn how to map words to integers from training data, and texts_to_sequences, to transform text into sequences of those integers.

However, in other phrases, you may not find such revealing words for the sentiment analysis. The feeling is expressed in a more subtle or indirect way, and understanding the sentiment early in the text may not be possible because revealing phrases and words may appear much later in the discourse. For this reason, you also need to decide how much of the phrase you want to analyze. Conventionally, you take an initial part of the text and use it as representative of the entire review. Sometimes you just need a few initial words — for instance, the first 50 words — to get the sense; sometimes you need more. Especially long texts don’t reveal their orientation early. It is therefore up to you to understand the type of text you are working with and decide how many words to analyze using deep learning. This example considers only the first 200 words, which should suffice.

You may have noticed that the code starts assigning numeric codes to words beginning with the number 3, thus leaving codes from 0 to 2 free. Lower numbers are used for special tags, such as signaling the start of the phrase, filling empty spaces to keep the sequence fixed at a certain length, and marking the words that are excluded because they’re not frequent enough. This example picks up only the most frequent 10,000 words. Using tags to point out start, end, and notable situations is a trick that works with RNNs, especially for machine translation.

from keras.preprocessing.sequence import pad_sequences
max_pad = 200
x_train = pad_sequences(x_train, maxlen=max_pad)
x_test = pad_sequences(x_test, maxlen=max_pad)
print(x_train[0])

By using the pad_sequences function from Keras with max_pad set to 200, the code takes the first two hundred words of each review.
In case the review contains fewer than two hundred words, as many zero values as necessary precede the sequence to reach the required number of sequence elements. Cutting the sequences to a certain length and filling the voids with zero values is called input padding, an important processing activity when using RNNs as deep learning algorithms. Now the code designs the architecture:

from keras.models import Sequential
from keras.layers import Bidirectional, Dense, Dropout
from keras.layers import GlobalMaxPool1D, LSTM
from keras.layers.embeddings import Embedding

embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_pad))
model.add(Bidirectional(LSTM(64, return_sequences=True)))
model.add(GlobalMaxPool1D())
model.add(Dense(16, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

The previous code snippet defines the shape of the deep learning model, where it uses a few specialized layers for natural language processing from Keras. The example also requests a summary of the model (the model.summary() command) to show what is happening in the architecture as it uses the different neural layers.

You have the Embedding layer, which transforms the numeric sequences into a dense word embedding. That type of word embedding is more suitable for being learned by a layer of RNNs. Keras provides an Embedding layer, which, apart from necessarily having to be the first layer of the network, can accomplish two tasks:

Applying pretrained word embedding (such as Word2vec or GloVe) to the sequence input. You just need to pass the matrix containing the embedding to its parameter weights.

Creating a word embedding from scratch, based on the inputs it receives.

In this second case, Embedding just needs to know:

input_dim: The size of the vocabulary expected from the data

output_dim: The size of the embedding space that will be produced (the so-called dimensions)

input_length: The sequence size to expect

After you determine the parameters, Embedding will find the best weights to transform the sequences into a dense matrix during training. The dense matrix size is given by the length of the sequences and the dimensionality of the embedding. If you use the Embedding layer provided by Keras, you have to remember that the function provides only a weight matrix of the size of the vocabulary by the dimension of the desired embedding. It maps the words to the columns of the matrix and then tunes the matrix weights to the provided examples. This solution, although practical for nonstandard language problems, is not analogous to the word embeddings discussed previously, which are trained in a different way and on millions of examples.

The example uses Bidirectional wrapping — an LSTM layer of 64 cells. Bidirectional transforms a normal LSTM layer by doubling it: On the first side, it applies the normal sequence of inputs you provide; on the second, it passes the reverse of the sequence. You use this approach because sometimes you use words in a different order, and building a bidirectional layer will catch any word pattern, no matter the order. The Keras implementation is indeed straightforward: You just apply it as a function on the layer you want to render bidirectional.
The bidirectional LSTM is set to return sequences (return_sequences=True); that is, for each cell, it returns the result provided after seeing each element of the sequence. The result, for each sequence, is an output matrix of 200 x 128, where 200 is the number of sequence elements and 128 is the number of LSTM cells used in the layer. This technique keeps the RNN from considering only the last result of each LSTM cell. Hints about the sentiment of the text could actually appear anywhere in the embedded word sequence. In short, it’s important not to take the last result of each cell, but rather the best result of it.

The code therefore relies on the following layer, GlobalMaxPool1D, to check each sequence of results provided by each LSTM cell and retain only the maximum result. That should ensure that the example picks the strongest signal from each LSTM cell, which is hopefully specialized by its training to pick some meaningful signals. After the neural signals are filtered, the example has a layer of 128 outputs, one for each LSTM cell. The code reduces and mixes the signals using a successive dense layer of 16 neurons with ReLU activation (thus making only positive signals pass through). The architecture ends with a final node using sigmoid activation, which will squeeze the results into the 0–1 range and make them look like probabilities.

Having defined the architecture, you can now train the network to perform sentiment analysis. Three epochs (passing the data three times through the network to have it learn the patterns) will suffice. The code uses batches of 256 reviews each time, which allows the network to see enough variety of words and sentiments each time before updating its weights using backpropagation. Finally, the code focuses on the results provided by the validation data (which isn’t part of the training data). Getting a good result from the validation data means the neural net is processing the input correctly. The code reports on the validation data just after each epoch finishes.

history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=3, batch_size=256)

Getting the results takes a while, but if you are using a GPU, it will complete in the time you take to drink a cup of coffee. At this point, you can evaluate the results, again using the validation data. (The results shouldn’t have any surprises or differences from what the code reported during training.)

loss, metric = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy: %0.3f" % metric)

The final accuracy, which is the percentage of correct answers from the deep neural network, will be a value of around 85 to 86 percent. The result will change slightly each time you run the experiment because of randomization when building your neural network. That’s perfectly normal given the small size of the data you are working with. If you happen to start with lucky weights, the learning will be easier in such a short training session.

In the end, your network is a sentiment analyzer that can guess the sentiment expressed in a movie review correctly about 85 percent of the time. Given even more training data and more sophisticated neural architectures, you can get results that are even more impressive. In marketing, a similar tool is used to automate many processes that require reading text and taking action. Again, you could couple a network like this with a neural network that listens to a voice and turns it into text.
(This is another application of RNNs, now powering Alexa, Siri, Google Voice, and many other personal assistants.) The transition allows the application to understand the sentiment even in vocal expressions, such as a phone call from a customer.
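As a short usage sketch not included in the original article, you could feed the trained model a padded review from the test set and read the predicted probability of positive sentiment (the names model and x_test assume the code above has already run):

# A small usage sketch (not from the article): score one padded review with
# the trained model; values close to 1 mean positive, close to 0 mean negative.
sample = x_test[:1]                        # one already-padded review
probability = model.predict(sample)[0][0]  # sigmoid output in the 0-1 range
label = "positive" if probability >= 0.5 else "negative"
print("Predicted sentiment: %s (%.2f)" % (label, probability))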

View Article
Deep Learning and Recurrent Neural Networks

Article / Updated 07-16-2019

Neural networks provide a transformation of your input into a desired output. Even in deep learning, the process is the same, although the transformation is more complex. In contrast to a simpler neural network made up of few layers, deep learning relies on more layers to perform complex transformations. The output from a data source connects to the input layer of the neural network, and the input layer starts processing the data. The hidden layers map the patterns and relate them to a specific output, which could be a value or a probability. This process works perfectly for any kind of input, and it works especially well for images.

After each layer processes its data, it outputs the transformed data to the next layer. That next layer processes the data with complete independence from the previous layers. The use of this strategy implies that if you are feeding a video to your neural network, the network will process each image singularly, one after the other, and the result won’t change at all even if you shuffle the order of the provided images. When running a network in such a fashion, you won’t get any advantage from the order of the information processing. However, experience also teaches that to understand a process, you sometimes have to observe events in sequence. When you use the experience gained from a previous step to explore a new step, you can reduce the learning curve and lessen the time and effort needed to understand each step.

Recurrent neural networks: Modeling sequences using memory

Some neural architectures don’t allow you to process a sequence of elements simultaneously using a single input. For instance, when you have a series of monthly product sales, you accommodate the sales figures using twelve inputs, one for each month, and let the neural network analyze them at one time. It follows that when you have longer sequences, you need to accommodate them using a larger number of inputs, and your network becomes quite huge because each input should connect with every other input. You end up having a network characterized by a large number of connections (which translates into many weights), too.

Recurrent Neural Networks (RNNs) are an alternative to the perceptron and CNNs. They first appeared in the 1980s, and various researchers have worked to improve them until they recently gained popularity thanks to the developments in deep learning and computational power. The idea behind RNNs is simple: they examine each element of the sequence once and retain memory of it so they can reuse it when examining the next element in the sequence. It’s akin to how the human mind works when reading text: a person reads the text letter by letter but understands words by remembering each letter in the word. In a similar fashion, an RNN can associate a word to a result by remembering the sequence of letters it receives. An extension of this technique makes it possible to ask an RNN to determine whether a phrase is positive or negative — a widely used analysis called sentiment analysis. The network connects a positive or negative answer to certain word sequences it has seen in training examples.

You represent an RNN graphically as a neural unit (also known as a cell) that connects an input to an output but also connects to itself. This self-connection represents the concept of recursion, which is a function applied to itself until it achieves a particular output. One of the most commonly used examples of recursion is computing a factorial.
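For readers who want to see that classic recursion example in code, here is a minimal Python sketch (not part of the original article):

# A classic recursion example (not from the article): the function calls
# itself with a smaller input until it reaches the base case.
def factorial(n):
    if n <= 1:           # base case stops the recursion
        return 1
    return n * factorial(n - 1)

print(factorial(5))      # prints 120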
The image below shows a specific RNN example using a letter sequence to make the word jazz. The left side of the image depicts the recursive cell itself; the right side expands it as an unfolded series of units that receives the single letters of the word jazz, starting with j and followed by the other letters. There is actually only the one unit, reused at every step. As this process occurs, the RNN emits an output and modifies its internal parameters. By modifying its internal parameters, the unit learns from the data it receives and from the memory of the previous data. The sum of this learning is the state of the RNN cell.

When discussing neural networks, you will hear lots of discussion about weights. With RNNs, you also need to know the term state. The weights help process the input into an output in an RNN, but the state contains the traces of the information the RNN has seen so far, so the state affects the functioning of the RNN. The state is a kind of short-term memory that resets after a sequence completes. As an RNN cell gets the pieces of a sequence, it does the following:

Processes them, changing the state with each input.
Emits an output.

After seeing the last output, the RNN learns the best weights for mapping the input into the correct output using backpropagation.

Recurrent neural networks: Recognizing and translating speech

The capability to recognize and translate between languages becomes more important each day as economies everywhere become increasingly globalized. Language translation is an area in which AI has a definite advantage over humans, so much so that articles from Digitalist Magazine and Forbes are beginning to question how long the human translator will remain viable. Of course, you must make the translation process viable using deep learning. From a neural architecture perspective, you have a couple of choices:

Keep all the outputs provided by the RNN cell.
Keep the last RNN cell output.

The last output is the output of the entire RNN because it's produced after completing the sequence examination. However, you can use the previous outputs if you need to predict another sequence or if you intend to stack more RNN cells after the current one, much as you stack layers in Convolutional Neural Networks (CNNs). Stacking RNNs vertically enables the network to learn complex sequence patterns and become more effective at producing predictions.

You can also stack RNNs horizontally in the same layer. Allowing multiple RNNs to learn from a sequence helps the network get more from the data. Using multiple RNNs is similar to CNNs, in which each single layer uses depths of convolutions to learn details and patterns from the image. In the multiple-RNN case, a layer can grasp different nuances of the sequence it examines. Designing grids of RNNs, both horizontally and vertically, improves predictive performance. However, deciding how to use the output determines what a deep learning architecture powered by RNNs can achieve. The key is the number of elements used as inputs and the sequence length expected as output. As the deep learning network synchronizes the RNN outputs, you get your desired outcome. You have a few possibilities when using multiple RNNs, as depicted in the image below:

One to one: When you have one input and expect one output. Such networks take one case, made up of a certain number of informative variables, and provide an estimate, such as a number or probability.
One to many: Here you have one input and you expect a sequence of outputs as a result. Automatic captioning neural networks use this approach: You input a single image and produce a phrase describing the image content.

Many to one: The classic example for RNNs. For example, you input a textual sequence and expect a single result as output. You see this approach used for producing a sentiment analysis estimate or another classification of the text. (A minimal code sketch of this pattern appears at the end of this section.)

Many to many: You provide a sequence as input and expect a resulting sequence as output. This is the core architecture for many of the most impressive deep learning–powered AI applications. This approach is used for machine translation (such as a network that can automatically translate a phrase from English to German), chatbots (a neural network that can answer your questions and argue with you), and sequence labeling (classifying each of the images in a video).

Machine translation is the capability of a machine to translate, correctly and meaningfully, one human language into another one. This capability is something that scientists have striven to achieve for a long time, especially for military purposes. You can read the fascinating story of all the attempts to perform machine translation by U.S. and Russian scientists in the article by Vasily Zubarev. The real breakthrough happened only after Google launched its Google Neural Machine Translation (GNMT), which you can read more about on the Google AI blog. GNMT relies on a series of RNNs (using the many-to-many paradigm) to read the word sequence in the language you want to translate from (called the encoder layer) and return the results to another RNN layer (the decoder layer) that transforms it into translated output.

Neural machine translation needs two layers because the grammar and syntax of one language can differ from another. A single RNN can't grasp two language systems at the same time, so the encoder-decoder couple is needed to handle the two languages. The system isn't perfect, but it's an incredible leap forward from the previous solutions described in Vasily Zubarev's article, greatly reducing errors in word order, lexical mistakes (the chosen translation word), and grammar (how words are used). Moreover, performance depends on the training set, the differences between the languages involved, and their specific characteristics. For instance, because of how sentence structure is built in Japanese, the Japanese government is now investing in a real-time voice translator to help during the Tokyo Olympic Games in 2020 and to boost tourism by developing an advanced neural network solution.

RNNs are the reason your voice assistant can answer you or your automatic translator can give you a foreign language translation. Because an RNN is simply a recurring operation of multiplication and summation, deep learning networks can't really understand any meaning; they simply process words and phrases based on what they learned during training.
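As a rough illustration of the many-to-one pattern from the list above, the following Keras sketch classifies padded sequences of word indexes as positive or negative sentiment. It is a minimal sketch, not code from this article: the vocabulary size, sequence length, and layer sizes are hypothetical placeholders, and a real model would typically use an LSTM or GRU layer rather than a plain SimpleRNN.

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

vocabulary_size = 10000   # hypothetical number of distinct word indexes
sequence_length = 80      # hypothetical padded phrase length

model = Sequential()
# Turn each word index into a dense vector the recurrent layer can process.
model.add(Embedding(vocabulary_size, 32, input_length=sequence_length))
# The recurrent layer reads the phrase one token at a time, keeping a state.
model.add(SimpleRNN(32))
# Many to one: the whole sequence maps to a single probability (positive or negative).
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()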
Recurrent neural networks: Placing the correct caption on pictures

Another possible application of RNNs using the many-to-many approach is caption generation, which involves providing an image to a neural network and receiving a text description that explains what's happening in the image. In contrast to chatbots and machine translators, whose output is consumed by humans, caption generation also works with robotics; it does more than simply generate image or video descriptions. Caption generation can help people with impaired vision perceive their environment using devices like the Horus wearable, or build a bridge between images and knowledge bases (which are text based) for robots, allowing them to understand their surroundings better.

You start from specially devised datasets such as the Pascal Sentence Dataset; the Flickr 30K, which consists of Flickr images annotated by crowdsourcing; or the MS Coco dataset. In all these datasets, each image includes one or more phrases explaining the image content. For example, in the MS Coco dataset sample number 5947, you see four flying airplanes that you could correctly caption as:

Four airplanes in the sky overhead on an overcast day
Four single-engine planes in the air on a cloudy day
A group of four planes flying in formation
A group of airplanes flying through the sky
A fleet of planes flying through the sky

A well-trained neural network should be able to produce analogous phrases if presented with a similar photo. Google first published a paper on the solution for this problem, named the Show and Tell network or Neural Image Caption (NIC), in 2014, and then updated it one year later. Google has since open sourced the NIC and offered it as part of the TensorFlow framework. As a neural network, it consists of a pretrained CNN (such as GoogLeNet, the 2014 winner of the ImageNet competition) that processes images similarly to transfer learning. An image is turned into a sequence of values representing the high-level image features detected by the CNN. During training, the embedded image passes to a layer of RNNs that memorize the image characteristics in their internal state. The network compares the results produced by the RNNs to all the possible descriptions provided for the training image, and an error is computed. The error then backpropagates to the RNN part of the network to adjust the RNN's weights and help it learn how to caption images correctly. After repeating this process many times using different images, the network is ready to see new images and provide its description of these new images. Recurrent neural networks provide opportunities for more advanced innovation and could help to automate some necessary tasks.

Deep Learning and Natural Language Processing

Article / Updated 07-16-2019

As a simplification, you can view language as a sequence of words made of letters (as well as punctuation marks, symbols, emoticons, and so on). Deep learning processes language best by using layers of RNNs, such as LSTM or GRU. However, knowing that you should use RNNs doesn't tell you how to use sequences as inputs; you need to determine the kind of sequences. In fact, deep learning networks accept only numeric input values. Computers encode letter sequences that you understand into numbers according to a protocol, such as Unicode Transformation Format-8 bit (UTF-8). UTF-8 is the most widely used encoding.

Deep learning can also process textual data using Convolutional Neural Networks (CNNs) instead of RNNs by representing sequences as matrices (similar to image processing). Keras supports CNN layers, such as Conv1D, which can operate on ordered features in time, that is, sequences of words or other signals. The 1D convolution output is usually followed by a MaxPooling1D layer that summarizes the outputs. CNNs applied to sequences find a limit in their insensitivity to the global order of the sequence. (They tend to spot local patterns.) For this reason, they're best used in sequence processing in combination with RNNs, not as their replacement.

Natural Language Processing (NLP) consists of a series of procedures that improve the processing of words and phrases for statistical analysis, machine learning algorithms, and deep learning. NLP owes its roots to computational linguistics that powered AI rule-based systems, such as expert systems, which made decisions based on a computer translation of human knowledge, experience, and way of thinking. NLP digested textual information, which is unstructured, into more structured data so that expert systems could easily manipulate and evaluate it. Deep learning has taken the upper hand today, and expert systems are limited to specific applications in which interpretability and control of decision processes are paramount (for instance, in medical applications and driving behavior decision systems on some self-driving cars). Yet the NLP pipeline is still quite relevant for many deep learning applications.

Natural Language Processing: Defining understanding as tokenization

In an NLP pipeline, the first step is to obtain raw text. Usually you store it in memory or access it from disk. When the data is too large to fit in memory, you maintain a pointer to it on disk (such as the directory name and the filename). In the following example, you use three documents (represented by string variables) stored in a list (the document container is called the corpus in natural language processing):

import numpy as np
texts = ["My dog gets along with cats",
         "That cat is vicious",
         "My dog is happy when it is lunch"]

After obtaining the text, you process it. As you process each phrase, you extract the relevant features from the text (you usually create a bag-of-words matrix) and pass everything to a learning model, such as a deep learning algorithm. During text processing, you can use different transformations to manipulate the text (with tokenization being the only mandatory transformation):

Normalization: Remove capitalization.
Cleaning: Remove nontextual elements such as punctuation and numbers.
Tokenization: Split a sentence into individual words.
Stop word removal: Remove common, uninformative words that don't add meaning to the sentence, such as the articles the and a. Removing negations such as not could be detrimental if you want to guess the sentiment.
Stemming: Reduce a word to its stem (which is the word form before adding inflectional affixes). An algorithm, called a stemmer, can do this based on a series of rules.
Lemmatization: Transform a word into its dictionary form (the lemma). It's an alternative to stemming, but it's more complex because you don't use an algorithm. Instead, you use a dictionary to convert every word into its lemma.
POS-tagging: Tag every word in a phrase with its grammatical role in the sentence (such as tagging a word as a verb or as a noun).
N-grams: Associate every word with a certain number (the n in n-gram) of following words and treat them as a unique set. Usually, bi-grams (a series of two adjacent elements or tokens) and tri-grams (a series of three adjacent elements or tokens) work the best for analysis purposes.

To achieve these transformations, you may need a specialized Python package such as NLTK or Scikit-learn. When working with deep learning and a large number of examples, you need only basic transformations: normalization, cleaning, and tokenization. The deep learning layers can determine what information to extract and process. When working with few examples, you do need to provide as much NLP processing as possible to help the deep learning network determine what to do in spite of the little guidance provided by the few examples. Keras offers a function, keras.preprocessing.text.Tokenizer, that normalizes (using the lower parameter set to True), cleans (the filters parameter contains a string of the characters to remove, usually these: '!"#$%&()*+,-./:;<=>?@[\]^_`{|}~ '), and tokenizes.

Natural Language Processing: Putting all the documents into a bag

After processing the text, you have to extract the relevant features, which means transforming the remaining text into numeric information for the neural network to process. This is commonly done using the bag-of-words approach, which is obtained by frequency encoding or binary encoding the text. This process equates to transforming each word into a matrix column as wide as the number of words you need to represent. The following example shows how to achieve this process and what it implies. As a first step, you prepare a basic normalization and tokenization using a few Python commands to determine the word vocabulary size for processing:

unique_words = set(word.lower() for phrase in texts
                   for word in phrase.split(" "))
print(f"There are {len(unique_words)} unique words")

The code reports 14 words. You now proceed to load the Tokenizer function from Keras and set it to process the text by providing the expected vocabulary size:

from keras.preprocessing.text import Tokenizer
vocabulary_size = len(unique_words) + 1
tokenizer = Tokenizer(num_words=vocabulary_size)

Using a vocabulary_size that's too small may exclude important words from the learning process. One that's too large may uselessly consume computer memory. You need to provide Tokenizer with a correct estimate of the number of distinct words contained in the list of texts. You also always add 1 to the vocabulary_size to provide an extra word for the start of a phrase (a term that helps the deep learning network).
At this point, Tokenizer maps the words present in the texts to indexes, which are numeric values representing the words in the text:

tokenizer.fit_on_texts(texts)
print(tokenizer.index_word)

The resulting indexes are as follows:

{1: 'is', 2: 'my', 3: 'dog', 4: 'gets', 5: 'along', 6: 'with', 7: 'cats', 8: 'that', 9: 'cat', 10: 'vicious', 11: 'happy', 12: 'when', 13: 'it', 14: 'lunch'}

The indexes represent the column number that houses the word information:

print(tokenizer.texts_to_matrix(texts))

Here's the resulting matrix:

[[0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1.]]

The matrix consists of 15 columns (14 words plus the start-of-phrase pointer) and three rows, representing the three processed texts. This is the text matrix to process using a shallow neural network (RNNs require a different format, as discussed later), and it is always sized as the number of texts by vocabulary_size. The numbers inside the matrix mark whether a word appears in the phrase (texts_to_matrix uses a binary encoding by default; you can request raw counts with mode='count'). This is only one of the possible representations; here are the common ones:

Frequency encoding: Counts the number of word appearances in the phrase.
One-hot encoding or binary encoding: Notes the presence of a word in a phrase, no matter how many times it appears.
Term Frequency-Inverse Document Frequency (TF-IDF) score: Encodes a measure of how many times a word appears in a document relative to the overall number of words in the matrix. (Words with higher scores are more distinctive; words with lower scores are less informative.)

You can use the TF-IDF transformation from Keras directly. The Tokenizer offers a method, texts_to_matrix, that encodes your text and transforms it into a matrix in which the columns are your words, the rows are your texts, and the values mark the presence of a word within a text. If you apply the transformation by specifying mode='tfidf', the transformation uses TF-IDF instead to fill the matrix values:

print(np.round(tokenizer.texts_to_matrix(texts, mode='tfidf'), 1))

Note that by using a matrix representation, no matter whether you use binary, frequency, or the more sophisticated TF-IDF, you lose any sense of word ordering that exists in the phrase. During processing, the words scatter in different columns, and the neural network can't guess the word order in a phrase. This lack of order is why you call it a bag-of-words approach. The bag-of-words approach is used in many machine learning algorithms, often with results ranging from good to fair, and you can apply it to a neural network using dense architecture layers. Transformations of words encoded into n-grams (discussed previously as an NLP processing transformation) provide some more information, but again, you can't relate the words to each other.

RNNs keep track of sequences, so they still use one-hot encoding, but they don't encode the entire phrase; rather, they individually encode each token (which could be a word, a character, or even a bunch of characters). For this reason, they expect a sequence of indexes representing the phrase:

print(tokenizer.texts_to_sequences(texts))

As each phrase passes to a neural network input as a sequence of index numbers, each number is turned into a one-hot encoded vector. The one-hot encoded vectors are then fed into the RNN's layers one at a time, making them easy to learn. For instance, here's the transformation of the first phrase in the matrix:

[[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]

In this representation, you get a distinct matrix for each piece of text. Each matrix represents the individual texts as distinct words using columns, but now the rows represent the word appearance order. (The first row is the first word, the second row is the second word, and so on.) Using this basic approach, data scientists are able to use deep learning for Natural Language Processing.
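To actually feed these index sequences to an RNN in Keras, you typically pad them to a common length so that they fit in a single tensor, and you can then one-hot encode each index (or, more commonly, pass the indexes to an Embedding layer). The following sketch continues the toy example above; the maximum length of 8 is an arbitrary placeholder, not a value from this article.

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

sequences = tokenizer.texts_to_sequences(texts)
# Pad every phrase to the same (arbitrary) length of eight steps.
padded = pad_sequences(sequences, maxlen=8)
print(padded.shape)       # (3, 8): three texts, eight time steps each
# Optionally turn each index into a one-hot vector, one per time step.
one_hot = to_categorical(padded, num_classes=vocabulary_size)
print(one_hot.shape)      # (3, 8, 15): ready for an RNN layer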

Neural Networks and Deep Learning: Neural Network Differentiation

Article / Updated 07-16-2019

Once you know how neural networks basically work, you need a better understanding of what differentiates them to understand their role in deep learning. Beyond the different neural network architectures, the choice of activation functions, optimizers, and the neural network's learning rate can make the difference. Knowing basic operations isn't enough because you won't get the results you want. Looking under the hood of a neural network helps you understand how you can tune your solution to model specific problems. In addition, understanding the various algorithms used to create a neural network will help you obtain better results with less effort and in a shorter time. The following article focuses on three areas of neural network differentiation.

Choosing the right activation function for your neural network

An activation function is the part of a neural network that simply defines when a neuron fires. Consider it a sort of tipping point: Input of a certain value won't cause the neuron to fire because it's not enough, but just a little more input can cause the neuron to fire. A neuron is defined in a simple manner as follows:

y = ∑ (weight * input) + bias

The output, y, can be any value between –infinity and +infinity. The problem, then, is to decide on what value of y is the firing value, which is where an activation function comes into play in your neural network. The activation function determines which value is high or low enough to reflect a decision point in the neural network for a particular neuron or group of neurons. As with everything else in neural networks, you don't have just one activation function. You use the activation function that works best in a particular scenario. With this in mind, you can break the activation functions into these categories:

Step: A step function (also called a binary function) relies on a specific threshold for making the decision about activating or not. Using a step function means that you know which specific value will cause an activation. However, step functions are limited in that they're either fully activated or fully deactivated; no shades of gray exist. Consequently, when attempting to determine which class is most likely correct based on a given input, a step function won't work.

Linear: A linear function (A = cx) provides a straight-line determination of activation based on input. Using a linear function helps you determine which output to activate based on which output is most correct (as expressed by weighting). However, linear functions work only as a single layer. If you were to stack multiple linear function layers, the output would be the same as using a single layer, which defeats the purpose of using neural networks. Consequently, a linear function may appear as a single layer, but never as multiple layers.

Sigmoid: A sigmoid function (A = 1 / (1 + e^(-x))), which produces an S-shaped curve, is nonlinear. It begins by looking sort of like the step function, except that the values between two points actually exist on a curve, which means that you can stack sigmoid functions to perform classification with multiple outputs. The range of a sigmoid function is between 0 and 1, not –infinity to +infinity as with a linear function, so the activations are bound within a specific range. However, the sigmoid function suffers from a problem called vanishing gradient, which means that the function refuses to learn after a certain point because the propagated error shrinks to zero as it approaches faraway layers.
Tanh: A tanh function (A = (2 / (1 + e^(-2x))) – 1) is actually a scaled sigmoid function. It has a range of –1 to 1, so again, it's a precise method for activating neurons. The big difference between sigmoid functions and tanh functions is that the tanh function gradient is stronger, which means that detecting small differences is easier, making classification more sensitive. Like the sigmoid function, tanh suffers from vanishing gradient issues.

ReLU: A ReLU, or Rectified Linear Unit, function (A(x) = max(0, x)) provides an output in the range of 0 to infinity, so it's similar to the linear function except that it's also nonlinear, enabling you to stack ReLU functions. An advantage of ReLU is that it requires less processing power because fewer neurons fire. The lack of activity as the neuron approaches the 0 part of the line means that there are fewer potential outputs to look at. However, this advantage can also become a disadvantage when you have a problem called the dying ReLU. After a while, the neural network weights don't provide the desired effect any longer (it simply stops learning) and the affected neurons die; they don't respond to any input.

Also, the ReLU has some variants that you should consider:

ELU (Exponential Linear Unit): Differs from ReLU when the inputs are negative. In this case, the outputs don't go to zero but instead slowly decrease to –1 exponentially.
PReLU (Parametric Rectified Linear Unit): Differs from ReLU when the inputs are negative. In this case, the output is a linear function whose parameters are learned using the same technique as any other parameters of the network.
LeakyReLU: Similar to PReLU, but the parameter for the linear side is fixed.

Relying on a smart optimizer for your neural network

An optimizer serves to ensure that your neural network performs fast and correctly models whatever problem you want to solve by modifying the neural network's biases and weights (see this article for more on improving your machine learning models). It turns out that an algorithm performs this task, but you must choose the correct algorithm to obtain the results you expect. As with all neural network scenarios, you have a number of optional algorithm types from which to choose:

Stochastic gradient descent (SGD)
RMSProp
AdaGrad
AdaDelta
AMSGrad
Adam and its variants, Adamax and Nadam

An optimizer works by minimizing or maximizing the output of an objective function (also known as an error function) represented as E(x). This function is dependent on the model's internal learnable parameters used to calculate the target values (Y) from the predictors (X). Two internal learnable parameters are weights (W) and bias (b). The various algorithms have different methods of dealing with the objective function. You can categorize the optimizer functions by the manner in which they deal with the derivative (dy/dx), which is the instantaneous change of y with respect to x. Here are the two levels of derivative handling:

First order: These algorithms minimize or maximize the objective function using gradient values with respect to the parameters.
Second order: These algorithms minimize or maximize the objective function using the second-order derivative values with respect to the parameters. The second-order derivative can give a hint as to whether the first-order derivative is increasing or decreasing, which provides information about the curvature of the line.
You commonly use first-order optimization techniques in neural networks, such as Gradient Descent, because they require fewer computations and tend to converge to a good solution relatively fast when working on large datasets.

Setting a working learning rate in your neural network

Each optimizer has completely different parameters to tune in your neural network. One constant is the need to set the learning rate, which represents the rate at which the code updates the network's weights (such as the alpha parameter). The learning rate can affect both the time the neural network takes to learn a good solution (the number of epochs) and the result. In fact, if the learning rate is too low, your network will take forever to learn. Setting the value too high causes instability when updating the weights, and the network won't ever converge to a good solution. Choosing a learning rate that works when training your neural network is daunting because you can effectively try values in the range from 0.000001 to 100. The best value varies from optimizer to optimizer, and the value you choose depends on what type of data you have. Theory can be of little help here; you have to test different combinations before finding the most suitable learning rate for training your neural network successfully. In spite of all the math surrounding them, tuning neural networks and having them work best is mostly a matter of empirical effort in trying different combinations of architectures and parameters. Take the time to evaluate the learning rate and set it appropriately to ensure your neural network functions optimally.
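To see where these three choices (activation function, optimizer, and learning rate) actually show up in code, here is a minimal Keras sketch, not code from this article. The layer sizes, input shape, and learning rate value are hypothetical placeholders rather than recommendations.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
# Activation functions are chosen per layer.
model.add(Dense(64, activation='relu', input_shape=(20,)))   # hidden layer using ReLU
model.add(Dense(1, activation='sigmoid'))                    # sigmoid output for a binary task
# The optimizer and its learning rate are chosen when compiling the model.
# (Older Keras versions name this parameter lr instead of learning_rate.)
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])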

What is Deep Learning?

Article / Updated 07-16-2019

What is deep learning? Deep learning is a subcategory of machine learning. With both deep learning and machine learning, algorithms seem as though they are learning. This is accomplished when the algorithms analyze huge amounts of data and then take actions or perform a function based on the derived information. An understanding of deep learning begins with a precise definition of terms. Otherwise, you have a hard time separating the media hype from the realities of what deep learning can actually provide. Deep learning is part of both AI and machine learning. To understand deep learning, you must begin at the outside, that is, you start with AI, then work your way through machine learning, and then finally define deep learning. The following guide steps you through this process.

Deep learning starts with artificial intelligence

Saying that AI is an artificial intelligence doesn't really tell you anything meaningful, which is why so many discussions and disagreements arise over this term. Yes, you can argue that what occurs is artificial, not having come from a natural source. However, the intelligence part is, at best, ambiguous. People define intelligence in many different ways. However, you can say that intelligence involves certain mental exercises composed of the following activities:

Learning: Having the ability to obtain and process new information.
Reasoning: Being able to manipulate information in various ways.
Understanding: Considering the result of information manipulation.
Grasping truths: Determining the validity of the manipulated information.
Seeing relationships: Divining how validated data interacts with other data.
Considering meanings: Applying truths to particular situations in a manner consistent with their relationship.
Separating fact from belief: Determining whether the data is adequately supported by provable sources that can be demonstrated to be consistently valid.

The list could easily get quite long, but even this list is prone to interpretation by anyone who accepts it as viable. As you can see from the list, however, intelligence often follows a process that a computer system can mimic as part of a simulation:

1. Set a goal based on needs or wants.
2. Assess the value of any currently known information in support of the goal.
3. Gather additional information that could support the goal.
4. Manipulate the data such that it achieves a form consistent with existing information.
5. Define the relationships and truth values between existing and new information.
6. Determine whether the goal is achieved.
7. Modify the goal in light of the new data and its effect on the probability of success.
8. Repeat Steps 2 through 7 as needed until the goal is achieved (found true) or the possibilities for achieving it are exhausted (found false).

Even though you can create algorithms and provide access to data in support of this process within a computer, a computer's capability to achieve intelligence is severely limited. For example, a computer is incapable of understanding anything because it relies on machine processes to manipulate data using pure math in a strictly mechanical fashion. Likewise, computers can't easily separate truth from mistruth. In fact, no computer can fully implement any of the mental activities described on the intelligence list. When thinking about AI, you must consider the goals of the people who developed it. The goal is to mimic human intelligence, not replicate it. A computer doesn't truly think, but it gives the appearance of thinking.
However, a computer only appears intelligent when it comes to logical/mathematical thinking. Unlike humans, a computer has no way to mimic intrapersonal or creative intelligence.

What is the role of AI in deep learning?

Remember, the first concept that's important to understand is that AI (artificial intelligence) doesn't really have anything to do with human intelligence. Yes, some AI is modeled to simulate human intelligence, but that's what it is: a simulation. When thinking about AI, notice that an interplay exists between goal seeking, data processing used to achieve that goal, and data acquisition used to better understand the goal. AI relies on algorithms to achieve a result that may or may not have anything to do with human goals or methods of achieving those goals. With this in mind, you can categorize AI in four ways:

Acting humanly: When a computer acts like a human, it best reflects the Turing test, in which the computer succeeds when differentiation between the computer and a human isn't possible. This category also reflects what the media would have you believe AI is all about. You see it employed for technologies such as natural language processing, knowledge representation, automated reasoning, and machine learning (all four of which must be present to pass the test). The original Turing test didn't include any physical contact. The newer Total Turing Test does include physical contact in the form of perceptual ability interrogation, which means that the computer must also employ both computer vision and robotics to succeed. Modern techniques include the idea of achieving the goal rather than mimicking humans completely. For example, the Wright brothers didn't succeed in creating an airplane by precisely copying the flight of birds; rather, the birds provided ideas that led to aerodynamics, which in turn eventually led to human flight. The goal is to fly. Both birds and humans achieve this goal, but they use different approaches.

Thinking humanly: When a computer thinks as a human, it performs tasks that require intelligence (as contrasted with rote procedures) from a human to succeed, such as driving a car. To determine whether a program thinks like a human, you must have some method of determining how humans think, which the cognitive modeling approach defines. This model relies on three techniques:

Introspection: Detecting and documenting the techniques used to achieve goals by monitoring one's own thought processes.
Psychological testing: Observing a person's behavior and adding it to a database of similar behaviors from other persons given a similar set of circumstances, goals, resources, and environmental conditions (among other things).
Brain imaging: Monitoring brain activity directly through various mechanical means, such as Computerized Axial Tomography (CAT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and Magnetoencephalography (MEG).

After creating a model, you can write a program that simulates the model. Given the amount of variability among human thought processes and the difficulty of accurately representing these thought processes as part of a program, the results are experimental at best. This category of thinking humanly is often used in psychology and other fields in which modeling the human thought process to create realistic simulations is essential.

Thinking rationally: Studying how humans think using some standard enables the creation of guidelines that describe typical human behaviors.
A person is considered rational when following these behaviors within certain levels of deviation. A computer that thinks rationally relies on the recorded behaviors to create a guide as to how to interact with an environment based on the data at hand. The goal of this approach is to solve problems logically, when possible. In many cases, this approach would enable the creation of a baseline technique for solving a problem, which would then be modified to actually solve the problem. In other words, the solving of a problem in principle is often different from solving it in practice, but you still need a starting point.

Acting rationally: Studying how humans act in given situations under specific constraints enables you to determine which techniques are both efficient and effective. A computer that acts rationally relies on the recorded actions to interact with an environment based on conditions, environmental factors, and existing data. As with rational thought, rational acts depend on a solution in principle, which may not prove useful in practice. However, rational acts do provide a baseline upon which a computer can begin negotiating the successful completion of a goal.

How does machine learning work?

Machine learning is one of a number of subsets of AI. In machine learning, the goal is to create a simulation of human learning so that an application can adapt to uncertain or unexpected conditions. To perform this task, machine learning relies on algorithms to analyze huge datasets. Currently, machine learning can't provide the sort of AI that the movies present (a machine can't intuitively learn as a human can); it can only simulate specific kinds of learning, and only in a narrow range at that. Even the best algorithms can't think, feel, present any form of self-awareness, or exercise free will. Characteristics that are basic to humans are frustratingly difficult for machines to grasp because of these limits in perception. Machines aren't self-aware.

What machine learning can do is perform predictive analytics far faster than any human can. As a result, machine learning can help humans work more efficiently. The current state of AI, then, is one of performing analysis, but humans must still consider the implications of that analysis: making the required moral and ethical decisions. The essence of the matter is that machine learning provides just the learning part of AI, and that part is nowhere near ready to create an AI of the sort you see in films.

The main point of confusion between learning and intelligence is that people assume that simply because a machine gets better at its job (it can learn), it's also aware (has intelligence). Nothing supports this view of machine learning. The same phenomenon occurs when people assume that a computer is purposely causing problems for them. The computer can't assign emotions and therefore acts only upon the input provided and the instruction contained within an application to process that input. A true AI will eventually occur when computers can finally emulate the clever combination used by nature:

Genetics: Slow learning from one generation to the next.
Teaching: Fast learning from organized sources.
Exploration: Spontaneous learning through media and interactions with others.

To keep machine learning concepts in line with what the machine can actually do, you need to consider specific machine learning uses. It's useful to view uses of machine learning outside the normal realm of what many consider the domain of AI.
Here are a few uses for machine learning that you might not associate with an AI:

Access control: In many cases, access control is a yes-or-no proposition. An employee smartcard grants access to a resource in much the same way as people have used keys for centuries. Some locks do offer the capability to set times and dates that access is allowed, but such coarse-grained control doesn't really answer every need. By using machine learning, you can determine whether an employee should gain access to a resource based on role and need. For example, an employee can gain access to a training room when the training reflects an employee role.

Animal protection: The ocean might seem large enough to allow animals and ships to cohabitate without problem. Unfortunately, many animals get hit by ships each year. A machine learning algorithm could allow ships to avoid animals by learning the sounds and characteristics of both the animal and the ship. (The ship would rely on underwater listening gear to track the animals through their sounds, which you can actually hear a long distance from the ship.)

Predicting wait times: Most people don't like waiting when they have no idea of how long the wait will be. Machine learning allows an application to determine waiting times based on staffing levels, staffing load, complexity of the problems the staff is trying to solve, availability of resources, and so on.

Moving from machine learning to deep learning

Deep learning is a subset of machine learning, as previously mentioned. In both cases, algorithms appear to learn by analyzing extremely large amounts of data (however, learning can occur even with tiny datasets in some cases). However, deep learning varies in the depth of its analysis and the kind of automation it provides. You can summarize the differences between machine learning and deep learning like this:

A completely different paradigm: Machine learning is a set of many different techniques that enable a computer to learn from data and to use what it learns to provide an answer, often in the form of a prediction. Machine learning relies on different paradigms such as using statistical analysis, finding analogies in data, using logic, and working with symbols. Contrast the myriad techniques used by machine learning with the single technique used by deep learning, which mimics human brain functionality. It processes data using computing units, called neurons, arranged into ordered sections, called layers. The technique at the foundation of deep learning is the neural network.

Flexible architectures: Machine learning solutions offer many knobs (adjustments) called hyperparameters that you tune to optimize algorithm learning from data. Deep learning solutions use hyperparameters, too, but they also use multiple user-configured layers (the user specifies number and type). In fact, depending on the resulting neural network, the number of layers can be quite large and form unique neural networks capable of specialized learning: Some can learn to recognize images, while others can detect and parse voice commands. The point is that the term deep is appropriate; it refers to the large number of layers potentially used for analysis. The architecture consists of the ensemble of different neurons and their arrangement in layers in a deep learning solution.

Autonomous feature definition: Machine learning solutions require human intervention to succeed. To process data correctly, analysts and scientists use a lot of their own knowledge to develop working algorithms.
For instance, in a machine learning solution that determines the value of a house by relying on data containing the wall measures of different rooms, the machine learning algorithm won't be able to calculate the surface of the house unless the analyst specifies how to calculate it beforehand. Creating the right information for a machine learning algorithm is called feature creation, which is a time-consuming activity. Deep learning doesn't require humans to perform any feature-creation activity because, thanks to its many layers, it defines its own best features. That's also why deep learning outperforms machine learning in otherwise very difficult tasks such as recognizing voice and images, understanding text, or beating a human champion at the Go game (the digital form of the board game in which you capture your opponent's territory).

You need to understand a number of issues with regard to deep learning solutions, the most important of which is that the computer still doesn't understand anything and isn't aware of the solution it has provided. It simply provides a form of feedback loop and automation conjoined to produce desirable outputs in less time than a human could manually produce precisely the same result by manipulating a machine learning solution.

The second issue is that some benighted people have insisted that the deep learning layers are hidden and not accessible to analysis. This isn't the case. Anything a computer can build is ultimately traceable by a human. In fact, the General Data Protection Regulation (GDPR) requires that humans perform such analysis. The requirement to perform this analysis is controversial, but current law says that someone must do it.

The third issue is that self-adjustment goes only so far. Deep learning doesn't always ensure a reliable or correct result. In fact, deep learning solutions can go horribly wrong. Even when the application code doesn't go wrong, the devices used to support the deep learning can be problematic. Even so, with these problems in mind, you can see deep learning used for a number of extremely popular applications.

Distinguishing Classification Tasks with Convolutional Neural Networks

Article / Updated 07-16-2019

Convolutional neural networks (CNNs) are the building blocks of deep learning–based image recognition, yet they answer only a basic classification need: Given a picture, they can determine whether its content can be associated with a specific image class learned through previous examples. Therefore, when you train a deep neural network to recognize dogs and cats, you can feed it a photo and obtain output that tells you whether the photo contains a dog or a cat. If the last network layer is a softmax layer, the network outputs the probability of the photo containing a dog or a cat (the two classes you trained it to recognize), and the output sums to 100 percent. When the last layer is a sigmoid-activated layer, you obtain scores that you can interpret as probabilities of content belonging to each class, independently. The scores won't necessarily sum to 100 percent. In both cases, the classification may fail when the following occurs:

The main object isn't what you trained the network to recognize, such as presenting the example neural network with a photo of a raccoon. In this case, the network will output an incorrect answer of dog or cat.
The main object is partially obstructed. For instance, your cat is playing hide-and-seek in the photo you show the network, and the network can't spot it.
The photo contains many different objects to detect, perhaps including animals other than cats and dogs. In this case, the output from the network will suggest a single class rather than include all the objects.

The picture below shows image 47780 taken from the MS Coco dataset (released as part of the open source Creative Commons Attribution 4.0 License). The series of three outputs shows how a CNN has detected, localized, and segmented the objects appearing in the image (a kitten and a dog standing on a field of grass). A plain CNN can't reproduce these examples because its architecture will output the entire image as being of a certain class. To overcome this limitation, researchers extend the basic CNN capabilities to make them capable of the following:

Detection: Determining when an object is present in an image. Detection is different from classification because it involves just a portion of the image, implying that the network can detect multiple objects of the same and of different types. The capability to spot objects in partial images is called instance spotting.
Localization: Defining exactly where a detected object appears in an image. You can have different types of localizations. Depending on granularity, they distinguish the part of the image that contains the detected object.
Segmentation: Classification of objects at the pixel level. Segmentation takes localization to the extreme. This kind of neural model assigns each pixel of the image to a class or even an entity. For instance, the network marks all the pixels in a picture relative to dogs and distinguishes each one using a different label (called instance segmentation).

Performing localization with convolutional neural networks

Localization is perhaps the easiest extension that you can get from a regular CNN. It requires that you train a regressor model alongside your deep learning classification model. A regressor is a model that guesses numbers. Defining object location in an image is possible using corner pixel coordinates, which means that you can train a neural network to output key measures that make it easy to determine where the classified object appears in the picture using a bounding box. Usually a bounding box uses the x and y coordinates of the lower-left corner, together with the width and the height of the area that encloses the object.
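As a rough sketch of the classifier-plus-regressor idea just described (not code from this article), the Keras model below shares one set of convolutional features between two output heads: one predicts the class, and the other predicts the four bounding-box numbers. The input size, class count, and layer sizes are hypothetical placeholders, and a real localization network would be far deeper.

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

image = Input(shape=(128, 128, 3))                  # hypothetical image size
x = Conv2D(16, (3, 3), activation='relu')(image)
x = MaxPooling2D()(x)
x = Flatten()(x)
# Two heads share the same convolutional features.
class_output = Dense(3, activation='softmax', name='class')(x)   # e.g., dog, cat, other
box_output = Dense(4, activation='linear', name='box')(x)        # x, y, width, height
model = Model(image, [class_output, box_output])
model.compile(optimizer='adam',
              loss={'class': 'categorical_crossentropy', 'box': 'mse'})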
Classifying multiple objects with convolutional neural networks

A CNN can detect (predicting a class) and localize (by providing coordinates) only a single object in an image. If you have multiple objects in an image, you may still use a CNN and locate each object present in the picture by means of two old image-processing solutions:

Sliding window: Analyzes only a portion (called a region of interest) of the image at a time. When the region of interest is small enough, it likely contains only a single object, which allows the CNN to correctly classify it. This technique is called sliding window because the software uses an image window to limit visibility to a particular area (the way a window in a home does) and slowly moves this window around the image. The technique is effective, but it could detect the same object multiple times, or you may find that some objects go undetected based on the window size that you decide to use to analyze the images. (A rough sketch of this loop appears at the end of this section.)
Image pyramids: Solves the problem of using a window of fixed size by generating increasingly smaller resolutions of the image, so that you can apply a small sliding window. In this way, you transform the objects in the image, and one of the reductions may fit exactly into the sliding window used.

These techniques are computationally intensive. To apply them, you have to resize the image multiple times and then split it into chunks. You then process each chunk using your classification CNN. The number of operations for these activities is so large that rendering the output in real time is impossible. The sliding window and image pyramid have inspired deep learning researchers to discover a couple of conceptually similar approaches that are less computationally intensive.

The first approach is one-stage detection. It works by dividing the images into grids, and the neural network makes a prediction for every grid cell, predicting the class of the object inside. The prediction is quite rough, depending on the grid resolution (the higher the resolution, the more complex and slower the deep learning network). One-stage detection is very fast, having almost the same speed as a simple CNN for classification. The results have to be processed to gather the cells representing the same object together, and that may lead to further inaccuracies. Neural architectures based on this approach are Single-Shot Detector (SSD), You Only Look Once (YOLO), and RetinaNet. One-stage detectors are very fast, but not so precise.

The second approach is two-stage detection. This approach uses a second neural network to refine the predictions of the first one. The first stage is the proposal network, which outputs its predictions on a grid. The second stage fine-tunes these proposals and outputs a final detection and localization of the objects. R-CNN, Fast R-CNN, and Faster R-CNN are all two-stage detection models that are much slower than their one-stage equivalents, but more precise in their predictions.
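To make the sliding-window idea concrete, here is a minimal NumPy sketch (not code from this article). It crops fixed-size windows from an image array, moving by a fixed stride; the window size, stride, and the placeholder classification step are hypothetical choices.

import numpy as np

def sliding_window(image, window=64, stride=32):
    # Yield fixed-size crops by moving a window across the image.
    height, width = image.shape[:2]
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left, image[top:top + window, left:left + window]

image = np.zeros((256, 256, 3))        # placeholder image
for top, left, crop in sliding_window(image):
    # A classification CNN would score each crop here, for example:
    # class_probabilities = model.predict(crop[np.newaxis])
    pass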
Annotating multiple objects in images with convolutional neural networks

To train deep learning models to detect multiple objects, you need to provide more information than in simple classification. For each object, you provide both a classification and coordinates within the image using the annotation process, which contrasts with the labeling used in simple image classification.

Labeling images in a dataset is a daunting task even in simple classification. Given a picture, the network must receive a correct classification for the training and test phases. In labeling, someone has to decide on the right label for each picture, and not everyone will perceive the depicted image in the same way. The people who created the ImageNet dataset used the classification provided by multiple users from the Amazon Mechanical Turk crowdsourcing platform. (ImageNet used the Amazon service so much that in 2012, it turned out to be Amazon's most important academic customer.)

In a similar way, you rely on the work of multiple people when annotating an image using bounding boxes. Annotation requires that you not only label each object in a picture but also determine the best box with which to enclose each object. These two tasks make annotation even more complex than labeling and more prone to producing erroneous results. Performing annotation correctly requires the work of more people who can provide a consensus on the accuracy of the annotation. Some open source software can help in annotation for image detection (as well as for image segmentation). These tools are particularly effective:

LabelImg, created by TzuTa Lin, with a YouTube tutorial.
LabelMe, a powerful tool for image segmentation that provides an online service.
FastAnnotationTool, based on the computer vision library OpenCV. The package is less well maintained but still viable.

Segmenting images with convolutional neural networks

Semantic segmentation predicts a class for each pixel in the image, which is a different perspective from either labeling or annotation. Some people also call this task dense prediction because it makes a prediction for every pixel in an image. The task doesn't specifically distinguish different objects in the prediction. For instance, a semantic segmentation can show all the pixels that are of the class cat, but it won't provide any information about what the cat (or cats) are doing in the picture. You can easily get all the objects in a segmented image by post-processing, because after performing the prediction, you can get the object pixel areas and distinguish between different instances of them, if multiple separated areas exist under the same class prediction.

Different deep learning architectures can achieve image segmentation. Fully Convolutional Networks (FCNs) and U-NETs are among the most effective. The first part of an FCN (called the encoder) is built the same way as a CNN. After the initial series of convolutional layers, an FCN ends with another series of layers that operate in reverse fashion to the encoder (making them a decoder). The decoder is constructed to recreate the original input image size and to output, as pixels, the classification of each pixel in the image. In such a fashion, the FCN achieves the semantic segmentation of the image. FCNs are too computationally intensive for most real-time applications. In addition, they require large training sets to learn their tasks well; otherwise, their segmentation results are often coarse. Finding the encoder part of the FCN pretrained on ImageNet, which accelerates training and improves learning performance, is common.
U-NETs are an evolution of FCNs devised by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in 2015 for medical purposes. U-NETs present advantages compared to FCNs. The encoding part (also called contraction) and the decoding part (also referred to as expansion) are perfectly symmetric. In addition, U-NETs use shortcut connections between the encoder and the decoder layers. These shortcuts allow the details of objects to pass easily from the encoding to the decoding parts of the U-NET, and the resulting segmentation is precise and fine-grained. Building a segmentation model from scratch can be a daunting task, but you don't need to do that. You can use some pretrained U-NET architectures and immediately start using this kind of neural network by leveraging a segmentation model zoo (a term used to describe the collection of pretrained models offered by many frameworks), such as the one provided by segmentation_models, a package created by Pavel Yakubovskiy. You can find installation instructions, the source code, and plenty of usage examples at GitHub. The commands from the package seamlessly integrate with Keras.
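For instance, assuming you have installed the segmentation_models package, building a pretrained U-NET can be as short as the following sketch. The backbone, loss, class count, and activation are placeholder choices, and the exact names may vary between package versions, so treat this as an illustration rather than a definitive recipe.

import segmentation_models as sm

# A U-NET with a ResNet-34 encoder pretrained on ImageNet (hypothetical choices).
model = sm.Unet('resnet34', encoder_weights='imagenet',
                classes=1, activation='sigmoid')
model.compile(optimizer='adam',
              loss=sm.losses.bce_jaccard_loss,
              metrics=[sm.metrics.iou_score])
# model.fit(images, masks, epochs=10) would then train on your own annotated data.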
