Don’t Turn Your Excel Data Model into a Database
You might have read that measures used on an Excel dashboard absolutely should support the initial purpose of that dashboard. The same concept applies to the back-end data model. You should only import data that’s necessary to fulfill the purpose of your dashboard or report.
In an effort to have as much data as possible at their fingertips, many Excel users bring into their spreadsheets every piece of data they can get their hands on. You can spot these people by the 40-megabyte files they send through email. You’ve seen these spreadsheets — two tabs that contain some reporting or dashboard interface and then six hidden tabs that contain thousands of lines of data (most of which isn’t used). They essentially build a database in their spreadsheet.
What’s wrong with utilizing as much data as possible? Well, here are a few issues:
Aggregating data within Excel increases the number of formulas. If you’re bringing in all raw data, you have to aggregate that data in Excel. This inevitably causes you to exponentially increase the number of formulas you have to employ and maintain.
Remember that your data model is a vehicle for presenting analyses, not processing raw data. The data that works best in reporting mechanisms is what’s already been aggregated and summarized into useful views that can be navigated and fed to dashboard components. Importing data that’s already been aggregated as much as possible is far better.
For example, if you need to report on Revenue by Region and Month, there’s no need to import sales transactions into your data model. Instead, use an aggregated table consisting of Region, Month, and Sum of Revenue.
Your data model will be distributed with your dashboard. In other words, because your dashboard is fed by your data model, you need to maintain the model behind the scenes (likely in hidden tabs) when distributing the dashboard. Besides the fact that it causes the file size to be unwieldy, including too much data in your data model can actually degrade the performance of your dashboard. Why?
When you open an Excel file, the entire file is loaded into memory to ensure quick data processing and access. The drawback to this behavior is that Excel requires a great deal of RAM to process even the smallest change in your spreadsheet. You may have noticed that when you try to perform an action on a large, formula-intensive dataset, Excel is slow to respond, giving you a Calculating indicator on the status bar. The larger your dataset is, the less efficient the data crunching in Excel is.
Large datasets can cause difficulty in scalability. Imagine that you’re working in a small company and you’re using monthly transactions in your data model. Each month holds 80,000 lines of data. As time goes on, you build a robust process complete with all the formulas, pivot tables, and macros you need to analyze the data that’s stored on your neatly maintained tab.
Now what happens after one year? Do you start a new tab? How do you analyze two datasets on two different tabs as one entity? Are your formulas still good? Do you have to write new macros?
These are all issues that can be avoided by importing only aggregated and summarized data that’s useful to the core purpose of your reporting needs.