How to Deal with Outliers Caused by Outside Forces
Be sure you check carefully for outliers before they influence your predictive analysis. Outliers can distort both the data and the analysis built on it. For example, a statistical analysis that leaves outliers in place ends up with skewed means and variances.
Unchecked or misinterpreted outliers may lead to false conclusions. Say your data shows that a stock traded for a whole year at a price above $50, but for only a few minutes out of that whole year it traded at $20. The $20 price, an obvious exception, is the outlier in this dataset.
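To make the effect on means and variances concrete, here is a minimal Python sketch. The price series is invented for illustration (99 observations between $50 and $52, plus a single $20 price), and the variable and function names are hypothetical; it is not real market data.

```python
import statistics

# Hypothetical prices: 99 observations cycling between $50.00 and $52.00.
normal_prices = [50.0 + (i % 5) * 0.5 for i in range(99)]

# The same series with one $20 price appended as the outlier.
prices_with_outlier = normal_prices + [20.0]


def summarize(prices):
    """Return the (mean, population variance) of a list of prices."""
    return statistics.mean(prices), statistics.pvariance(prices)


mean_clean, var_clean = summarize(normal_prices)
mean_all, var_all = summarize(prices_with_outlier)

print(f"without outlier: mean={mean_clean:.2f}, variance={var_clean:.2f}")
print(f"with outlier:    mean={mean_all:.2f}, variance={var_all:.2f}")
```

With only one unusual value out of a hundred, the mean moves down a little, but the variance grows many times over, which is why a single outlier can matter so much to a model that relies on those statistics.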
Now you have to decide whether to include the $20 stock price in your analysis; if you do, it has ramifications for the overall model. But what do you consider normal? Was the “flash crash” that took the stock market by surprise on May 6, 2010, a normal event or an exception?
During that brief time, the stock market experienced a sharp decline in prices across the board — which knocked the sample stock price down from $50 to $20, but had less to do with the stock than with wider market conditions. Does your model need to take the larger fluctuations of the stock market into account?
Anyone who has lost money during a brief market free fall considers those few minutes real and normal (even if they felt like an eternity to go through). A portfolio that shrinks in milliseconds because of a rapid decline, however short-lived, is clearly real. Yet the flash crash is an anomaly, an outlier that poses a problem for the model.
Regardless of what’s considered normal (which can change anyway), data sometimes contains values that don’t fit the expected values. This is especially true in the stock market, where virtually any event may send the market flying or plunging. You don’t want your model to fail when the reality changes suddenly — but a model and a reality are two different things.
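One common way to spot values that don't fit the expected range is a modified z-score based on the median and the median absolute deviation (MAD), which a single extreme value barely shifts. The sketch below is only one possible approach, not a method prescribed here; the function name, the sample prices, and the 3.5 cutoff (a widely used rule of thumb) are all assumptions made for illustration.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The threshold of 3.5 is a rule of thumb.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so the score is undefined;
        # this sketch simply flags nothing in that case.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical prices: mostly near $50, with one $20 price.
prices = [50.2, 50.5, 51.0, 50.8, 20.0, 50.4, 51.2]
flagged = flag_outliers(prices)
print(flagged)
```

Flagging a value is not the same as deciding what to do with it. Whether the $20 price stays in the analysis still depends on whether the model is meant to account for events like the flash crash.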