How to Test for Trends in Sales Forecasting
How do you know whether a forecasting trend is real? If you see a baseline that looks like it’s drifting up or down, does that represent a real trend or is it just random variation? To answer those questions, you have to get into probability and statistics. Fortunately, you don’t have to get into them too far — wrist-deep, maybe.
The basic train of thought goes like this:
- Use Excel to tell you what the correlation is between sales revenues and their associated time periods.
It doesn’t matter if I represent that time period as January 2011, February 2011, March 2011 . . . December 2016, or as 1, 2, 3 . . . 72.
- If there’s no relationship, as measured by the correlation, between revenues and time period, there’s no trend, and you don’t need to worry about it.
- If there is a relationship between revenues and time periods, you have to choose the best way to handle the trend.
- After Excel calculates the correlation, you have to decide whether it represents a real relationship between time period and revenue amount, or whether it’s just a lucky shot.
If the probability that’s it’s just luck is less than 5 percent, it’s a real trend. (Nothing magic about 5 percent, either — it’s conventional. Some people prefer to use 1 percent as their criterion — it’s more conservative than 5 percent, and they feel a little safer.) This raises the issue of statistical significance: What level of probability do you require before you decide that something (here, a correlation) is the real McCoy?
There are various methods for testing the statistical significance of a correlation coefficient. Here are three popular methods:
- Test the correlation directly and compare the result to the normal distribution.
- Test the correlation directly and compare the result to the t-distribution (the t-distribution, although similar to the normal curve, assumes that you’re using a smallish sample rather than an infinitely large population).
- Convert the correlation with the Fisher transformation (which converts a correlation coefficient to a value that fits in the normal curve) and compare the result to the normal distribution.
Other popular methods for testing the statistical significance of a correlation coefficient exist. Each returns a slightly different result. In practice, you’ll almost always make the same decision (the correlation is or is not significantly different from zero), regardless of the method you choose.
If you conclude that the trend the correlation measures is real (and when the probability is less than 1 percent that the correlation is a ghost, you probably should accept that conclusion), you have two more questions to ask yourself:
- Should you use a forecasting approach that handles trends well? You’d think that if you detected a trend, you should use a forecasting approach that handles trends well. That’s often true, but not necessarily. Suppose that instead of using time period as one of the variables in your correlation analysis, you used something such as sales revenues made by the competition.
If the competition’s revenues are slipping as yours are (or if both sets of revenues are growing), you’ll find a likely significant correlation between your revenues and the competition’s. But it’s quite possible — even likely — that there’s no real, causal relationship between their revenues and yours. It may be that both yours and theirs are correlated with the real causal factor: The size of the overall market is changing. In that case, you would probably be much better off using a measure of overall market size as your predictor variable. In this scenario, market size has a direct, causal relationship to your revenue, whereas your competition’s revenue has only an indirect relationship to your revenue.
- Should you detrend the data? A hidden variable, such as a consistent change in the overall size of a market, can lead you to believe that a predictor variable and the variable you want to forecast are directly related, when in fact they’re not. Or the predictor and the forecast may change in similar ways because they’re both related to time.
The way to handle this sort of situation is to detrend both variables first by means of a transformation.
Or you may prefer to make your forecast using an approach that doesn’t necessarily handle trends well, such as moving averages or simple exponential smoothing. One reason for doing this is that you may find the regression approach with your data set isn’t as accurate a forecaster as moving averages or smoothing. Again, see if you can transform the data to remove the trend.