Envision the World as a Graph with Bayes’ Theorem
Bayes’ theorem can help you deduce how likely something is to happen in a certain context, based on the general probabilities of the fact itself and the evidence you examine, and combined with the probability of the evidence given the fact. Seldom will a single piece of evidence diminish doubts and provide enough certainty in a prediction to ensure that it will happen. As a true detective, to reach certainty, you have to collect more evidence and make the individual pieces work together in your investigation. Noticing that a person has long hair isn’t enough to determine whether person is female or a male. Adding data about height and weight could help increase confidence.
The Naïve Bayes algorithm helps you arrange all the evidence you gather and reach a more solid prediction with a higher likelihood of being correct. Gathered evidence considered singularly couldn’t save you from the risk of predicting incorrectly, but all evidence summed together can reach a more definitive resolution. The following example shows how things work in a Naïve Bayes classification. This is an old, renowned problem, but it represents the kind of capability that you can expect from an AI. The dataset is from the paper “Induction of Decision Trees,” by John Ross Quinlan. Quinlan is a computer scientist who contributed to the development of another machine learning algorithm, decision trees, in a fundamental way, but his example works well with any kind of learning algorithm. The problem requires that the AI guess the best conditions to play tennis given the weather conditions. The set of features described by Quinlan is as follows:
- Outlook: Sunny, overcast, or rainy
- Temperature: Cool, mild, or hot
- Humidity: High or normal
- Windy: True or false
The following table contains the database entries used for the example:
The option of playing tennis depends on the four arguments shown here.
The result of this AI learning example is a decision as to whether to play tennis, given the weather conditions (the evidence). Using just the outlook (sunny, overcast, or rainy) won’t be enough, because the temperature and humidity could be too high or the wind might be strong. These arguments represent real conditions that have multiple causes, or causes that are interconnected. The Naïve Bayes algorithm is skilled at guessing correctly when multiple causes exist.
The algorithm computes a score, based on the probability of making a particular decision and multiplied by the probabilities of the evidence connected to that decision. For instance, to determine whether to play tennis when the outlook is sunny but the wind is strong, the algorithm computes the score for a positive answer by multiplying the general probability of playing (9 played games out of 14 occurrences) by the probability of the day’s being sunny (2 out of 9 played games) and of having windy conditions when playing tennis (3 out of 9 played games). The same rules apply for the negative case (which has different probabilities for not playing given certain conditions):
likelihood of playing: 9/14 * 2/9 * 3/9 = 0.05
likelihood of not playing: 5/14 * 3/5 * 3/5 = 0.13
Because the score for the likelihood is higher, the algorithm decides that it’s safer not to play under such conditions. It computes such likelihood by summing the two scores and dividing both scores by their sum:
probability of playing : 0.05 / (0.05 + 0.13) = 0.278
probability of not playing : 0.13 / (0.05 + 0.13) = 0.722
You can further extend Naïve Bayes to represent relationships that are more complex than a series of factors that hint at the likelihood of an outcome using a Bayesian network, which consists of graphs showing how events affect each other. Bayesian graphs have nodes that represent the events and arcs showing which events affect others, accompanied by a table of conditional probabilities that show how the relationship works in terms of probability. The figure shows a famous example of a Bayesian network taken from a 1988 academic paper, “Local computations with probabilities on graphical structures and their application to expert systems,” by Lauritzen, Steffen L. and David J. Spiegelhalter, published by the Journal of the Royal Statistical Society.
The depicted network is called Asia. It shows possible patient conditions and what causes what. For instance, if a patient has dyspnea, it could be an effect of tuberculosis, lung cancer, or bronchitis. Knowing whether the patient smokes, has been to Asia, or has anomalous x-ray results (thus giving certainty to certain pieces of evidence, a priori in Bayesian language) helps infer the real (posterior) probabilities of having any of the pathologies in the graph.
Bayesian networks, though intuitive, have complex math behind them, and they’re more powerful than a simple Naïve Bayes algorithm because they mimic the world as a sequence of causes and effects based on probability. Bayesian networks are so effective that you can use them to represent any situation. They have varied applications, such as medical diagnoses, the fusing of uncertain data arriving from multiple sensors, economic modeling, and the monitoring of complex systems such as a car. For instance, because driving in highway traffic may involve complex situations with many vehicles, the Analysis of MassIve Data STreams (AMIDST) consortium, in collaboration with the automaker Daimler, devised a Bayesian network that can recognize maneuvers by other vehicles and increase driving safety. Read more about this project and see the complex Bayesian network.