Data Warehousing: From Unstructured Information to Structured Data - dummies

By Thomas C. Hammergren

Some data warehousing architectures take a structured-data-first approach, in which a business analyst uses the data warehouse as a gateway to relevant unstructured supporting information.

You can just as easily take the opposite path toward a unified approach to business intelligence. Suppose that you’re browsing the Internet or the company intranet, and a product diagram, blueprint, or some other type of image or document catches your attention.

That image or document can serve as your entry point to an OLAP-generated report posted on the company intranet (which can, in turn, link to other structured or unstructured information), or to a mash-up that projects your data onto a Google Map.

The barriers between structured and unstructured data are breaking down quickly, and the pathways across them are multiplying just as fast. Increasingly, businesses are amassing large volumes of non-relational, unstructured data in the form of digital images, documents, videos, and other multimedia formats. These formats are quickly becoming a key component of formal and informal business processes that integrate with existing business applications, comply with regulatory requirements, or simply provide a richer user experience.

Consider the following business scenarios:

  • A pharmaceutical company needs to access lab documentation compiled over years and generations of clinical trials to gain FDA approval for a new medicine.

  • An insurance company needs to store policy documents and retrieve them for claims processing.

  • A call center company needs to store agent-recorded sessions as audio streams so that they can be retrieved remotely for quality assurance and contract compliance.

  • An industry analyst firm needs to make a searchable library of podcasts available for download from its Web site.

  • A legal practice needs to store electronic copies of documents as images and easily retrieve the documents relating to an individual client or case.

  • An architectural partnership needs to store and retrieve digital plans along with the associated client data.

  • A library needs to convert and archive large volumes of existing paper and analog content for indexing and use in a digital research tool (remember microfiche?).

These are just a few of the ways in which businesses throughout the world can and do use unstructured digital data. Analyzing such information is becoming nearly as routine as creating digital content. Organizations are finding new, innovative ways to use this content to improve or extend their business capabilities, and many of them need data warehousing and business intelligence solutions to make full use of it.

If you rely on traditional collaborative work processes, such as workflow or image management, you can augment those processes with links to data warehousing capabilities that add value. Likewise, the reports and query results you use in traditional analytical processing can serve as a pathway into a world of multimedia information that supplements the data you typically handle.