Separate Data, Analysis, and Presentation in Excel Data Models
A data model provides the foundation upon which your Excel reporting mechanism is built. One of the most important concepts in a data model is the separation of data, analysis, and presentation. The fundamental idea is that you don’t want your data to become too tied into any one particular way of presenting that data.
To wrap your mind around this concept, think about an invoice. When you receive an invoice, you don’t assume that the financial data on the invoice is the true source of your data. It’s merely a presentation of data that’s actually stored in a database. That data can be analyzed and presented to you in many other manners: in charts, in tables, or even on websites. This sounds obvious, but Excel users often fuse data, analysis, and presentation.
For instance, some Excel workbooks contain 12 tabs, each representing a month. On each tab, data for that month is listed along with formulas, pivot tables, and summaries. Now what happens when you’re asked to provide a summary by quarter? Do you add more formulas and tabs to consolidate the data on each of the month tabs? The fundamental problem in this scenario is that the tabs actually represent data values that are fused into the presentation of your analysis.
For an example more in line with reporting, take a look at the following figure. Hard-coded tables like this one are common. This table is an amalgamation of data, analysis, and presentation. Not only does this table tie you to a specific analysis, but there’s little to no transparency into what the analysis exactly consists of. Also, what happens when you need to report by quarter or when another dimension of analysis is needed? Do you import a table that consists of more columns and rows? How does that affect your model?
The alternative is to create three layers in your data model: a data layer, an analysis layer, and a presentation layer. You can think of these layers as three different spreadsheets in an Excel workbook: one sheet to hold the raw data that feeds your report, one sheet to serve as a staging area where the data is analyzed and shaped, and one sheet to serve as the presentation layer. This figure illustrates the three layers of an effective data model:
As you can see, the raw dataset is located on its own sheet. Although the dataset has some level of aggregation applied to keep it manageably small, no further analysis is done on the Data sheet.
The analysis layer consists primarily of formulas that analyze and pull data from the data layer into formatted tables commonly referred to as staging tables. These staging tables ultimately feed the reporting components in your presentation layer. In short, the sheet that contains the analysis layer becomes the staging area where data is summarized and shaped to feed the reporting components. Notice on the Analysis tab, the formula bar illustrates that the table consists of formulas that reference the Data tab.
There are a couple of benefits to this setup. First, the entire reporting model can be refreshed easily by simply replacing the raw data with an updated dataset. The formulas on the Analysis tab continue to work with the latest data. Second, any additional analysis can easily be created by using different combinations of formulas on the Analysis tab. If you need data that doesn’t exist in the Data sheet, you can easily append a column to the end of the raw dataset without disturbing the Analysis or Presentation sheets.
You don’t necessarily have to place your data, analysis, and presentation layers on different spreadsheets. In small data models, you may find it easier to place your data in one area of a spreadsheet while building staging tables in another area of the same spreadsheet.
Along those same lines, remember that you’re not limited to three spreadsheets, either. That is to say, you can have several sheets that provide the raw data, several sheets that analyze, and several that serve as the presentation layer.
Wherever you choose to place the different layers, keep in mind that the idea remains the same. The analysis layer should primarily consist of formulas that pull data from the Data sheets into staging tables used to feed your presentation.