Middleware Services: Data Movement and Data Loading
In most situations, the two middleware services — selection and extraction, and quality assurance — take place on the same platform (system) on which the data source resides. If your data warehouse will be hosted on a different platform than the data source, though, you have to use a data-movement service to effect the system-to-system transfer of the data.
You can likely use a relatively simple service (handled by a simple file-transfer program, for example). The movement service, if you need it at this point, simply moves the QA’d data into the environment in which you plan to make additional transformations.
If you’re doing your transformation and QA processing on a platform that’s different from the platform on which you run your production data warehouse (on a development server, for example, rather than on the operational server), you must execute one more data-movement service to get the data to the place where you want it to eventually reside. This process usually involves only a relatively simple file transfer.
The data-loading service loads the extracted, QA’d, transformed, and re-QA’d data into your warehouse. You might load data via a customized program, SQL (an INSERT statement, for example), or a utility.
If you need to load a large volume of data, try to use a fast-loading utility, which usually involves much less time than a programmatic or SQL-based approach.
If you use SQL to load your data into a relational database, try to make the loading as efficient as possible by turning off logging (if your DBMS product permits it). If the loading job is abnormally terminated, you just have to use the DROP or TRUNCATE statement to get rid of your partially loaded table, fix the problem that caused the termination, and restart the job.
This process usually is much faster than if you turn on the facilities needed for OLTP-style data and transaction integrity (with accompanying overhead).