How to Determine the Size for Your Data Warehouse - dummies

By Thomas C. Hammergren

A common misconception among data warehouse aficionados is that the only good data warehouse is a big data warehouse, and an enormously big one at that. Many people even take the stance that unless an astronomically large number of bytes is stored, it isn’t truly a data warehouse.

“Five hundred gigabytes? Okay, that’s a real data warehouse; it would be a better data warehouse, however, if it had at least a terabyte (1 trillion bytes) of data. Twenty-five gigabytes? Sorry, that’s a data mart, not a data warehouse.”

The size of a data warehouse is a characteristic of that warehouse, almost a by-product; it’s not an objective. No one should ever set out with a mission to “build a 500-gigabyte data warehouse that contains (whatever).”

To determine the size you need for your data warehouse, follow these steps:

  1. Determine the mission, or the business objectives, of the data warehouse.

    Ask the question, “Why bother creating this warehouse?”

  2. Determine the functionality that you want the data warehouse to have.

    Figure out what types of questions users will ask.

  3. Determine what contents (types of data) the data warehouse needs to support its functionality.

    Understand what types of answers your users will seek.

  4. Determine, based on the content volume (which is based on the functionality, which in turn is based on the mission), how big you need to make your data warehouse.
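
As a rough illustration of step 4, the sketch below derives a size estimate from the content identified in step 3 rather than starting from a target number. Everything in it is an assumption made for the example: the `ContentEstimate` class, the `estimate_size_bytes` function, the row counts, the row widths, the retention periods, and the `overhead_factor` (a stand-in for indexes, summaries, and similar extras) are hypothetical and should be replaced with figures from your own analysis.

```python
from dataclasses import dataclass


@dataclass
class ContentEstimate:
    """One type of data the warehouse must hold (hypothetical structure)."""
    name: str
    rows_per_year: int
    bytes_per_row: int
    years_retained: int

    def raw_bytes(self) -> int:
        # Raw volume for this content type over its whole retention period.
        return self.rows_per_year * self.bytes_per_row * self.years_retained


def estimate_size_bytes(contents, overhead_factor=1.0):
    """Sum the raw volume of all content types, then apply an overhead multiplier."""
    raw = sum(c.raw_bytes() for c in contents)
    return int(raw * overhead_factor)


# Made-up content list; real entries come out of steps 1 through 3.
contents = [
    ContentEstimate("sales_transactions", 50_000_000, 200, 3),
    ContentEstimate("customers", 1_000_000, 500, 1),
]

# An overhead factor of 2.0 is an arbitrary placeholder, not a recommendation.
total = estimate_size_bytes(contents, overhead_factor=2.0)
print(f"{total / 1e9:.1f} GB")  # prints "61.0 GB" (1 GB taken as 1 billion bytes)
```

The point of the sketch is the direction of the calculation: the size falls out of the content, which falls out of the functionality and the mission. If the result is 61 gigabytes rather than 500, that is simply how big this particular warehouse needs to be.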