The Idea behind Multidimensional Databases - dummies

The Idea behind Multidimensional Databases

By Thomas C. Hammergren

Multidimensional databases (MDDBs) throw out the conventions of their relational ancestors and organize data in a manner that’s highly conducive to multidimensional analysis. To understand multidimensional databases, therefore, you must first understand the basics of the analytical functions performed with the data stored in them.

Multidimensional analysis is built around a few simple data organization concepts — specifically, facts and dimensions:

  • Facts: A fact is an instance of some particular occurrence or event and the properties of the event all stored in a database. Did you sell a watch to a customer last Friday afternoon? That’s a fact. Did your store receive a shipment of 76 class rings yesterday from a particular supplier? That’s another fact.

  • Dimensions: A dimension is a key descriptor, an index, by which you can access facts according to the value (or values) you want. For example, you might organize your sales data according to these dimensions: time, customer, and product.

The basics

In these simple examples, you can organize and view your sales data as a three-dimensional array, indexed by the time, customer, and product dimensions:

  • In October 2008 (the time dimension), Customer A (the customer dimension) bought class rings (the product dimension) — 79 of them for $8,833.

  • In 2007 (the time dimension), Customer A (the customer dimension) bought many different products (the product dimension) — a total of 3,333 units for $55,905 (the facts).

Notice the subtle different between the way the dimensions are used in these two examples. In the first one, the time dimension relates to a month; the customer dimension relates to a specific customer; and the product dimension is for a specific product.

In the second example, however, time is for a year, not a month; customer is still the same (an individual customer); and product is for the entire product line.

Multidimensional analysis supports the notion of hierarchies in dimensions. For example, you can organize time in a hierarchy of year→quarter→month. You can view facts (or the consolidation of facts) in the database at any one of these levels: by year, quarter, or month.

Similarly, you can organize products in a hierarchy of product family→product type→specific products. Class rings might be a product type; “class ring, modern style, onyx stone” might be a specific product. Furthermore, class rings, watches, other rings, and other items all would roll up into the jewelry product family.

Is there a limit to the number of dimensions?

Theoretically, you can have as many dimensions in your multidimensional model as necessary. The question always exists, however, of whether your multidimensional database product can support them. But here’s a more important question — even if a product allows for a certain number of dimensions (15, for example), does it make sense to create a model of that size?

You should work closely with your users to determine whether the number of dimensions makes your solution too complex — and therefore limiting the population of users — or improves the ease of use — and therefore expanding the user population.

You can, for example, add geography to the dimension list that contains time, customer, and product so that you can see and organize facts according to sales territories, states, cities, and specific stores.

How should you choose the levels in a hierarchy?

The levels in a hierarchy enable you to perform drill-down functionality. And by having multiple levels within a hierarchy, you can quickly get answers to your questions because of the information that has been set up at each of those specified levels, so that information is just waiting for your queries.

Because multidimensional databases have fairly rigid structures built around the precalculation of facts (creating and storing aggregates in the database, rather than performing report-time aggregation and calculation), the more dimensions you have and the more levels in each dimension you have, the greater your storage requirements and the longer your build or load times.

Physical database structures in an MDDB

Although nearly all MDDB products are built around the concept of facts, dimensions, and hierarchies, no one has come up with an MDDB standard definition. In the relational world, non-standardization has also been somewhat of a problem, particularly in relation to value-added features, such as constraints and stored procedures.

The basic relational table-row-column structure, however, has been fairly easy to export or unload into a flat file of some type and then reload it into another RDBMS product.

In the MDDB world, vendors have taken a variety of different approaches to their respective products’ physical representations of data. They’re all seeking ways to overcome storage and complexity problems caused by large numbers of dimensions (for example, more than 15) and deep levels of hierarchies (for example, 20 levels deep).

When you’re evaluating products, don’t get caught up in worrying about physical storage techniques: Just make sure the logical representations that come with the products (such as the hierarchies, levels, and facts) can meet your business needs. Eliminate products that seem clunky or that have, for example, a hierarchy model that doesn’t seem quite right for your data.

Then, after you find products that seem to fit your business, kick the tires a little (so to speak) to see how they work inside.