How to Utilize the Markov Model in Predictive Analytics
The Markov Model is a statistical model that relies heavily on probability theory and can be used in predictive analytics. (It’s named after Andrey Markov, a Russian mathematician whose primary research was in probability theory.)
Here’s a practical scenario that illustrates how it works: Imagine you want to predict whether Team X will win tomorrow’s game. The first thing to do is collect previous statistics about Team X. A natural question is how far back in history you should go.
Let’s assume you were able to get the outcomes of Team X’s last 10 games, in order. You want to know the probability of Team X winning the next game, given the outcomes of those 10 games.
The problem is that the further back in history you want to go, the harder and more complex the data collection and probability calculation become.
Believe it or not, the Markov Model simplifies your life by providing you with the Markov Assumption, which looks like this when you write it out in words:
The probability that an event will happen, given n past events, is approximately equal to the probability that such an event will happen given just the last past event.
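Written as a formula, the Markov Assumption looks like this, where E_1, ..., E_n are the n past events in the order they happened and E_{n+1} is the event you want to predict:
$$P(E_{n+1} \mid E_n, E_{n-1}, \ldots, E_1) \approx P(E_{n+1} \mid E_n)$$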
Either way, the Markov Assumption means that you don’t need to go too far back in history to predict tomorrow’s outcome. You can just use the most recent past event. This is called the first-order Markov prediction because you’re considering only the last event to predict the future event.
A second-order Markov prediction uses only the last two events in the sequence. From the first-order assumption, the following widely used equation can also be derived:
$$P(E_1, E_2, \ldots, E_n) \approx P(E_1)\,P(E_2 \mid E_1)\,P(E_3 \mid E_2)\cdots P(E_n \mid E_{n-1})$$
This equation calculates the probability that a series of events happens in sequence: event 2 after event 1, event 3 after event 2, and so on. You multiply the probability of the first event by the probability of each later event given the event immediately before it. For instance, the probability that Team X wins, then loses, and then ties is P(Win) × P(Loss|Win) × P(Tie|Loss).
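The sequence calculation described above translates directly into code. The following is a minimal Python sketch, assuming the probabilities are stored in nested dictionaries; the function name `sequence_probability` is illustrative, and the numbers are hypothetical placeholders (a few echo figures quoted later in this article, the rest are invented so that each row sums to 1), not Team X’s real statistics.

```python
from typing import Dict, Sequence

# Hypothetical numbers for illustration only (not Team X's real statistics).
INITIAL: Dict[str, float] = {"win": 0.6, "loss": 0.3, "tie": 0.1}
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "win":  {"win": 0.60, "loss": 0.20, "tie": 0.20},
    "loss": {"win": 0.40, "loss": 0.35, "tie": 0.25},
    "tie":  {"win": 0.50, "loss": 0.20, "tie": 0.30},
}


def sequence_probability(
    states: Sequence[str],
    initial: Dict[str, float],
    transitions: Dict[str, Dict[str, float]],
) -> float:
    """Approximate P(s1, ..., sn) as P(s1) * P(s2|s1) * ... * P(sn|s(n-1))."""
    if not states:
        raise ValueError("states must contain at least one event")
    probability = initial[states[0]]  # P(first event)
    for previous, current in zip(states, states[1:]):
        probability *= transitions[previous][current]  # P(current | previous)
    return probability


# Win, then loss, then tie: 0.6 * 0.20 * 0.25
print(round(sequence_probability(["win", "loss", "tie"], INITIAL, TRANSITIONS), 4))  # 0.03
```

Each step looks back only one event, which is exactly what the first-order Markov assumption permits.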
Here’s how a typical predictive model based on a Markov Model would work. Consider the same example: Suppose you want to predict the results of a soccer game to be played by Team X. The three possible outcomes — called states — are win, loss, or tie.
Assume that you’ve collected past statistical data on the results of Team X’s soccer games, and that Team X lost its most recent game. You want to predict the outcome of the next soccer game. It’s all about guessing whether Team X will win, lose, or tie — relying only on data from past games. So here’s how you use a Markov Model to make that prediction.
Calculate some probabilities based on past data.
For instance, how many times has Team X won games? How many times has it lost? Suppose Team X won 6 of its last 10 games. Then Team X has won 60 percent of the time; in other words, the estimated probability of winning for Team X is 60 percent.
Calculate the probability of a loss, and then the probability of a tie, in the same way.
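The counting in the last two steps can be sketched in a few lines of Python. The ten-game record below is hypothetical (chosen to contain 6 wins, matching the example, and to end in a loss, as assumed earlier), and `outcome_probabilities` is an illustrative name, not a library function.

```python
from collections import Counter
from typing import Dict, List

STATES = ("win", "loss", "tie")


def outcome_probabilities(results: List[str]) -> Dict[str, float]:
    """Estimate P(win), P(loss), and P(tie) as relative frequencies of past results."""
    if not results:
        raise ValueError("need at least one past result")
    counts = Counter(results)  # missing states count as 0
    return {state: counts[state] / len(results) for state in STATES}


# Hypothetical record of Team X's last 10 games, oldest first.
past_games = ["win", "win", "loss", "win", "tie", "win", "win", "loss", "win", "loss"]
print(outcome_probabilities(past_games))
# {'win': 0.6, 'loss': 0.3, 'tie': 0.1}
```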
Use the conditional probability formula, P(A|B) = P(A and B) / P(B), to calculate probabilities such as the following:
The probability that Team X will win, given that Team X lost the last game.
The probability that Team X will lose, given that Team X won the last game.
Calculate the transition probabilities for each state (win, loss, or tie): the probability of each outcome today, given each possible outcome yesterday.
Assuming that the team plays only one game per day, these probabilities include the following:
P(Win|Loss) is the probability that Team X will win today, given that it lost yesterday.
P(Win|Tie) is the probability that Team X will win today, given that it tied yesterday.
P(Win|Win) is the probability that Team X will win today, given that it won yesterday.
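A common way to estimate conditional probabilities such as P(Win|Loss) from a record of games is to count consecutive pairs: of all the games that came right after a loss, what fraction were wins? The sketch below assumes one game per day, as above; the helper name `transition_probabilities` and the ten-game record are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

STATES = ("win", "loss", "tie")


def transition_probabilities(results: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(today | yesterday) from consecutive pairs of results."""
    followers: Dict[str, Counter] = defaultdict(Counter)
    for yesterday, today in zip(results, results[1:]):
        followers[yesterday][today] += 1
    # Only states that were followed by at least one game get a row.
    return {
        yesterday: {today: counts[today] / sum(counts.values()) for today in STATES}
        for yesterday, counts in followers.items()
    }


# Hypothetical record of Team X's last 10 games, oldest first.
past_games = ["win", "win", "loss", "win", "tie", "win", "win", "loss", "win", "loss"]
table = transition_probabilities(past_games)
print(table["loss"]["win"])  # P(Win|Loss) -> 1.0
print(table["win"]["win"])   # P(Win|Win) -> 0.3333333333333333
```

With only ten games, an estimate like P(Win|Loss) = 1.0 rests on just two observations; a longer record gives more reliable numbers.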
Using the calculated probabilities, create a chart.
A circle in this chart represents a possible state that Team X could attain at any given time (win, loss, tie); the numbers on the arrows represent the probabilities that Team X could move from one state to another.
For instance, if Team X has just won today’s game (its current state = win), the probability that the team will win again is 60 percent; the probability that they’ll lose the next game is 20 percent (in which case they’d move from current state = win to future state = loss).
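Under the first-order Markov assumption, predicting the next game needs only the chart’s row for the current state. A minimal sketch using the win-state probabilities quoted above (60 percent win, 20 percent loss, and the remaining 20 percent tie, since the probabilities leaving a state must sum to 1); the names `WIN_ROW` and `most_likely_next` are illustrative.

```python
from typing import Dict

# Next-game probabilities when Team X's current state is "win",
# as read off the chart: 60% win, 20% loss, 20% tie.
WIN_ROW: Dict[str, float] = {"win": 0.60, "loss": 0.20, "tie": 0.20}


def most_likely_next(row: Dict[str, float]) -> str:
    """Return the next state with the highest transition probability."""
    total = sum(row.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"transition probabilities should sum to 1, got {total}")
    return max(row, key=row.get)


print(most_likely_next(WIN_ROW))  # win
```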
Suppose you want to know the chances that Team X will win two games in a row and lose the third one. As you might imagine, that’s not a straightforward prediction to make.
However, using the chart just created and the Markov assumption, you can easily predict the chances of such an event occurring. You start in the win state, using Team X’s overall 60 percent chance of winning; then you walk from win back to win and record 60 percent; finally, you move from win to loss and record 20 percent.
The chances that Team X will win twice and lose the third game become simple to calculate: 60 percent times 60 percent times 20 percent (0.60 × 0.60 × 0.20), which equals 7.2 percent.
So what are the chances that Team X will win, then tie, and then lose twice after that? Starting again from Team X’s 60 percent chance of winning, the answer is 60 percent times 20 percent (moving from win to tie), times 20 percent (moving from tie to loss), times 35 percent (moving from loss to loss). The result is 0.60 × 0.20 × 0.20 × 0.35 = 0.0084, or 0.84 percent.
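A quick check of the arithmetic for both sequences, assuming each path starts from the 60 percent overall chance of winning and then multiplies in the chart’s transition probabilities (`math.prod` requires Python 3.8 or later):

```python
import math

# Win, win, loss: P(Win) * P(Win|Win) * P(Loss|Win)
win_win_loss = math.prod([0.60, 0.60, 0.20])

# Win, tie, loss, loss: P(Win) * P(Tie|Win) * P(Loss|Tie) * P(Loss|Loss)
win_tie_loss_loss = math.prod([0.60, 0.20, 0.20, 0.35])

print(f"{win_win_loss:.1%}")       # 7.2%
print(f"{win_tie_loss_loss:.2%}")  # 0.84%
```

Multiplying several probabilities below 1 shrinks the result quickly, which is why specific multi-game sequences end up with small chances.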