Data Warehousing and Multimedia Information - dummies

By Thomas C. Hammergren

Every area of technology is constantly changing, and data warehousing is no exception. Because data warehousing is on the brink of a new generation of technologies, you must become familiar with some of the most significant trends.

Data warehouses typically include only a few types of data: numbers, dates, and character-based information (such as names, addresses, product descriptions, and codes). This article describes the next wave of data warehousing, in which unstructured data rich in multimedia content (images, video, audio, and documents) becomes part of the data warehouse.

Fire up your Web browser. Spend a few hours poking around the Internet, checking out all kinds of cool sites. You can find images, video and audio clips, entry forms for filling out information to submit to a site’s database, tabular results based on requests you might make — almost anything.

Today, an organization typically stores a large proportion of its data in documents created by using productivity tools such as Microsoft Excel and Word. Additionally, digitization advances in photography, document scanning, video production, and audio formats have further extended the range of unstructured data formats that you can use for business data.

The lines between structured data (traditional data types that computer applications have been using for years) and unstructured data (such as multimedia documents) have blurred. Not that long ago, if you wanted to create a multimedia environment that included both structured and unstructured data, you loosely followed these steps:

  1. Build a relational database for your structured data.

  2. Use a document-management system or an image-management system for your unstructured data.

  3. To handle logical links across environments, set aside in each relational database row one or more columns that point to related documents or images, as appropriate.

These environments were relatively awkward and prone to problems. A software upgrade to one system, for example, could adversely affect the other by breaking the links between them.
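The three steps above, and the broken-link problem they invite, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `orders` table, the `document_ref` column, the in-memory `document_store` dictionary (standing in for a separate document-management system), and the `find_broken_links` helper are all hypothetical names, not part of any product the article describes.

```python
import sqlite3

# Hypothetical stand-in for a separate document-management system:
# it maps a document ID to the stored content.
document_store = {
    "doc-001": b"scanned invoice",
    "doc-002": b"product photo",
}

# Step 1: a relational database for the structured data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " document_ref TEXT)"  # Step 3: a column that points to a related document
)
conn.executemany(
    "INSERT INTO orders (order_id, customer, document_ref) VALUES (?, ?, ?)",
    [(1, "Acme", "doc-001"), (2, "Globex", "doc-002"), (3, "Initech", None)],
)


def find_broken_links(connection, store):
    """Return the order IDs whose document_ref matches nothing in the store."""
    rows = connection.execute(
        "SELECT order_id, document_ref FROM orders"
        " WHERE document_ref IS NOT NULL ORDER BY order_id"
    ).fetchall()
    return [order_id for order_id, ref in rows if ref not in store]


# Simulate an upgrade of the document system that re-keys one document.
# The database knows nothing about the change, so its pointer goes stale.
document_store["DOC-0002"] = document_store.pop("doc-002")

broken = find_broken_links(conn, document_store)
print(broken)
```

Because the database cannot enforce a reference into a system it does not control, the only way to notice the stale pointer is to scan for it after the fact, as `find_broken_links` does here.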

A new generation of business applications that merge traditional relational data structures with unstructured digital content has already begun to emerge. This profusion of digital content means that organizations now seek to manage both relational (structured) data and unstructured data at the enterprise level.

For example, consider a medical records application. Fifteen years ago, the application would most likely have maintained a list of medical records that were stored as simple rows and columns.

Today, and in the near future, a medical records application is more likely to manage a set of visit records that have reference images, X-rays, CAT scans, prescriptions, and other reference documents. Those records might also include higher-level capabilities such as spatial visualization, reporting, and analysis.
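As a rough sketch of what such a visit record might look like when structured and unstructured data live in one store, the following Python example keeps attachments as BLOB columns next to the relational rows. The article names no specific product, so SQLite is used purely as a stand-in, and the `visits` and `attachments` tables, their columns, and the `record_visit` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Structured part of the record: one row per visit.
conn.execute(
    "CREATE TABLE visits ("
    " visit_id INTEGER PRIMARY KEY,"
    " patient TEXT NOT NULL,"
    " visit_date TEXT NOT NULL)"
)
# Unstructured part: binary content stored in the same database.
conn.execute(
    "CREATE TABLE attachments ("
    " attachment_id INTEGER PRIMARY KEY,"
    " visit_id INTEGER NOT NULL REFERENCES visits(visit_id) ON DELETE CASCADE,"
    " kind TEXT NOT NULL,"
    " content BLOB NOT NULL)"
)


def record_visit(connection, patient, visit_date, attachments):
    """Insert a visit and its attachments in a single transaction."""
    with connection:  # commits on success, rolls back on an exception
        cursor = connection.execute(
            "INSERT INTO visits (patient, visit_date) VALUES (?, ?)",
            (patient, visit_date),
        )
        visit_id = cursor.lastrowid
        connection.executemany(
            "INSERT INTO attachments (visit_id, kind, content) VALUES (?, ?, ?)",
            [(visit_id, kind, content) for kind, content in attachments],
        )
    return visit_id


# Placeholder bytes stand in for real image and document files.
visit_id = record_visit(
    conn,
    "Patient A",
    "2024-01-15",
    [("x-ray", b"placeholder image bytes"), ("prescription", b"placeholder text")],
)

kinds = [
    row[0]
    for row in conn.execute(
        "SELECT kind FROM attachments WHERE visit_id = ? ORDER BY attachment_id",
        (visit_id,),
    )
]
print(kinds)
```

With both kinds of data in one database, the foreign key keeps every attachment tied to a visit that exists, which is the guarantee the older pointer-column arrangement could not offer.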

Many businesses are eager to turn this unstructured data into useful information, but they find that their current data warehousing and business intelligence technology can’t deliver thorough analysis of it. Traditional data warehousing and business intelligence technologies and infrastructure have inherent constraints that limit their ability to handle this kind of data.