Horizontal versus Vertical Data Storage Management
Most relational database managers have been built on a horizontal storage manager, which places all data in a database by row (or record) when a transaction occurs. A database table is represented as a chain of database pages that contain one or more data rows.
A horizontal storage manager provides fast online transaction processing (OLTP) support because most transactions occur in a record format — for example, inserting a general ledger entry or writing a check. However, when a user requests a record, the database page that contains the data is often moved into memory, which for business intelligence applications is highly inefficient.
Several specialty database products have emerged over the years designed to assist and optimize query-centered applications, such as business intelligence. Such products enable you to more readily develop interactive data warehouses. The goal of these column-wise databases is to increase the speed of decision support queries performed against large amounts of data.
When database administrators are asked if they would ever place an index on a column that contains a person’s area code, a student’s grade point average, or a customer’s total transactions, they respond with a resounding, “No!” They usually give this adamant response for reasons based on technology, such as:
We index based only on standard, well-known paths (such as name) because of the overhead of indexes.
The cardinality, or unique occurrence of data, would force the database to perform a table scan anyway.
Yet, when you ask users what information they need to fulfill their job responsibilities, they respond with these kinds of requirements:
See the number of people by area code in my territory so that I can more effectively manage my promotions.
Identify the top-ranked students in the graduating class so I can arrange the proper interviews.
Figure out which customers do business with my company and spend between $100,000 and $500,000 annually.
Each of these three requirements characterizes a different user request, yet they all perform similar functions: They’re decision support-oriented queries. A need to access data drives user information requirements, but the users’ access patterns aren’t compatible with most RDBMS indexing strategies. In short, the RDBMS technology gets in the way of the applications’ success.
A database table is represented as a chain of database pages that contain one or more data rows, as shown in this figure. A horizontal storage manager provides fast online transaction processing (OLTP) support because most transactions occur in a record format.
These relational databases assist query activity by using indexes. Indexes are built on top of the rows to simplify and accelerate data retrieval on common paths, as shown in the figure.
Data warehousing solutions, such as business intelligence, don’t use many of these indexing techniques because they have been crafted to assist OLTP applications in the frequent need to find and update individual rows within database tables.
To properly support typical user queries found in business intelligence, other storage and indexing techniques are required. Vendors such as Sybase and Vertica have built vertical storage managers. Instead of storing data by row, these products store the data by columns — hence the name vertical storage manager or column-wise storage.
This method of storage effectively solves the problem of user queries against large sets of data because a user often seeks only a few columns, versus the large number of columns managed in a row by a horizontal storage manager. With the data stored as a series of page changes, with each page containing column data, query processing time is reduced by a factor of 500 or more to 1.
Additionally, these products support other optimizations that assist in the speed of query processing, including data compression, parallel query operations, and multiple indexing techniques per column. The challenge in implementing such technologies involves standardization. Therefore, many data management departments refuse to implement such technologies because those technologies require additional support labor.