Enterprise Information Integration Services
Around 1995, vendors began positioning their software as virtual data warehousing tools. The fundamental premise was that sometimes it just doesn’t make sense to copy and manipulate a bunch of data, just in case someone needs it. Why not access data directly from the source on an as-needed basis?
Alas, accessing data over a network at its source has proved to be the least challenging of the problems in trying to provide a kind of in-place data warehousing. The same challenges faced in any data warehousing environment (such as dealing with data quality, deciding what types of transformations must occur, and choosing how to handle those transformations when different sources are inconsistent) are still present.
Just because you can get to data at its source (in almost any database or file structure) doesn’t mean that data provides the necessary business intelligence when it’s in your hands.
To address these issues, many data architects have begun to perform bottom-up data mart construction to develop a component-based data warehouse. Rather than having a single database into which you feed all data (creating your data warehouse), you build a series of components, each of which handles a particular set of functions (such as answering specific business questions) or covers certain subjects. Together, these data marts (or components) make up a data warehousing environment.
This component-based, dynamic access data architecture is the basis for virtual data warehousing and, more specifically, what Enterprise Information Integration (EII) servers are offering to the market.
This figure shows an environment in which individual components are created within the data warehousing environment in a bottom-up manner. Instead of combining the components into one large database (and copying all the data again), EII creates a data warehousing environment in which users can access each component’s contents from a business intelligence tool as if they were all stored together, even though they’re not.
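To make the idea concrete, here’s a minimal Python sketch of an EII-style layer, under some simplifying assumptions: each component is a separate in-memory SQLite database, all components share an identical sales table, and the class name VirtualWarehouse and the helper make_component are hypothetical, not any vendor’s API. The query goes to each component at request time and the results are combined; nothing is copied into a central store.

```python
import sqlite3


def make_component(rows):
    """Create a stand-alone in-memory database acting as one data mart (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


class VirtualWarehouse:
    """Hypothetical EII-style layer: one query interface, data left in place."""

    def __init__(self):
        self.components = {}

    def register(self, name, conn):
        self.components[name] = conn

    def query(self, sql, params=()):
        # Send the same query to every component at request time and
        # concatenate the results; no data is copied into a central store.
        # Assumes every component exposes the same table and columns.
        results = []
        for conn in self.components.values():
            results.extend(conn.execute(sql, params).fetchall())
        return results


east = make_component([("East", 100.0), ("East", 50.0)])
west = make_component([("West", 75.0)])

vw = VirtualWarehouse()
vw.register("east_mart", east)
vw.register("west_mart", west)

rows = vw.query("SELECT region, SUM(amount) FROM sales GROUP BY region")
print(rows)  # [('East', 150.0), ('West', 75.0)]
```

A real EII server does much more, such as planning queries across different kinds of sources and mapping schemas that don’t match. This sketch simply assumes those differences were resolved ahead of time, which is exactly the assumption the rest of this section warns you not to take for granted.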
Think about how you use a Web browser on your desktop. You either click a link or type a specific URL, and the environment, working behind the scenes, takes you to the right place for the content you asked for. Now, imagine the Internet running much faster.
Suppose also that when you visit various sites, you aren’t retrieving ads for the latest four-wheel drive you’ve been coveting, sports scores, Dilbert cartoons, or whatever else it is you do on the Internet. Instead, you’re bringing back pieces of data that are combined and sent back to your browser. That’s virtual data warehousing: it works just like the Internet!
It’s not a good idea to build a virtual data warehousing environment that accesses source data directly, in its native format. Your challenge isn’t figuring out how to join cross-platform databases (combining IMS data with DB2 data, for example) and handle those kinds of system-level transformations; it’s ensuring that the data is of high quality so that users don’t have to cleanse it manually.
Each application should therefore be warehouse-enabled and contain a data publisher that’s responsible for all the middleware services (such as extraction and quality assurance), as specified in the environment’s business rules.
The data publisher could conceivably operate in near-real-time mode, as it would have to in an operational data store, or it could run in a periodic (batch-oriented) mode if instantaneous updates aren’t required. Either way, the data publisher is a mini-middleware product embedded in the application (or a service that the application calls).
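A data publisher’s quality-assurance role can be sketched in a few lines of Python. This is an illustration under assumptions: the DataPublisher class, its on_change and run_batch methods, and the representation of business rules as simple yes/no checks on a record are all hypothetical, and extraction is assumed to have already produced the records. The same checks run whether the publisher is invoked per change (near-real-time) or on a schedule (batch).

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical shape of a business rule: returns True if a record is acceptable.
Rule = Callable[[dict], bool]


@dataclass
class DataPublisher:
    """Hypothetical publisher embedded in (or called by) a source application."""

    rules: list[Rule]
    published: list[dict] = field(default_factory=list)
    rejected: list[dict] = field(default_factory=list)

    def _check(self, record: dict) -> None:
        # Quality assurance: publish only records that pass every rule;
        # keep failures aside so they can be traced back to the source.
        if all(rule(record) for rule in self.rules):
            self.published.append(record)
        else:
            self.rejected.append(record)

    def on_change(self, record: dict) -> None:
        """Near-real-time mode: called for each change as it happens."""
        self._check(record)

    def run_batch(self, records: list[dict]) -> None:
        """Periodic (batch) mode: called on a schedule with accumulated changes."""
        for record in records:
            self._check(record)


rules = [
    lambda r: bool(r.get("customer_id")),  # a customer ID must be present
    lambda r: r.get("amount", 0) >= 0,     # no negative sale amounts
]

pub = DataPublisher(rules)
pub.on_change({"customer_id": "C1", "amount": 20.0})
pub.run_batch([
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "C2", "amount": -3.0},
    {"customer_id": "C3", "amount": 12.5},
])
print(len(pub.published), len(pub.rejected))  # 2 2
```

Keeping rejected records separate, rather than silently dropping them, lets someone fix the problem at the source instead of pushing cleanup onto the people using the warehouse.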
When you think of virtual data warehousing, replace the question “Can I get to the data?” with “Can I get to usable data?” The data publisher plays an important role and shouldn’t be neglected.
You also can’t neglect data architecture. Just because you’re developing components in a bottom-up manner and they’re being accessed in place, rather than being copied into a larger data warehouse database, doesn’t mean you can skip the work of defining how those components’ data fits together.
Say that one component stores customer IDs as five-digit numbers (after transformation) and contains only customers who made purchases within the past six months, while another component, which contains every customer who has ever bought your company’s products, uses seven-character alphanumeric identifiers. In this situation, you face the same kinds of data mismatch problems you would have if you were accessing the source data directly.
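This mismatch can’t be fixed by reformatting alone, because nothing in a five-digit number tells you which seven-character ID it corresponds to. One common approach is a cross-reference table maintained as part of the data architecture. The Python sketch below assumes such a table exists; the IDs, customer names, and the reconcile function are made up for illustration.

```python
# Component A (made-up sample): five-digit numeric IDs, only customers
# with purchases in the past six months.
recent_buyers = [10452, 20981, 33310]

# Component B (made-up sample): seven-character alphanumeric IDs,
# every customer who has ever bought anything.
all_customers = {"CUS0042": "Acme Corp", "CUS0107": "Globex", "CUS0311": "Initech"}

# Hypothetical cross-reference table mapping component A IDs to component B IDs.
id_xref = {10452: "CUS0042", 20981: "CUS0311"}


def reconcile(recent, everyone, xref):
    """Match recent buyers to the full customer list via the cross-reference.

    Returns (matched, unmapped): matched holds (numeric_id, alpha_id, name)
    tuples; unmapped holds numeric IDs with no usable mapping.
    """
    matched, unmapped = [], []
    for num_id in recent:
        alpha_id = xref.get(num_id)
        if alpha_id is None or alpha_id not in everyone:
            unmapped.append(num_id)  # flag for follow-up rather than guessing
        else:
            matched.append((num_id, alpha_id, everyone[alpha_id]))
    return matched, unmapped


matched, unmapped = reconcile(recent_buyers, all_customers, id_xref)
print(matched)   # [(10452, 'CUS0042', 'Acme Corp'), (20981, 'CUS0311', 'Initech')]
print(unmapped)  # [33310]
```

Notice that the sample customer CUS0107 never appears in component A, which covers only six months. A question such as how many customers you have gets a different answer from each component, and that’s the kind of difference you have to understand and manage.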
Although EII allows for differences between component contents, you must understand and manage the differences so that you don’t impede the business intelligence mission.