How to Catalogue Existing Extract Files - dummies

By Thomas C. Hammergren

When you begin to consider what to do about the extract files and other sort-of data warehouse environments that exist in your organization, you must first find them, which can be difficult given the homegrown nature of these environments.

Here’s a hint: Follow the reports. Through group work sessions and individual meetings, identify and catalog the reports that employees use throughout the organization for which you’re building the data warehouse. Some of those reports probably come directly from the production applications and their respective databases and files.

For now, don’t worry about these production application reports. (Keep track of them, though, because you can use them as an excellent starting point for the “what data do we need?” analysis, which determines what you want to put in the data warehouse.) Other reports come from data extracted from one or more applications and stored somewhere. Those reports are the ones to concentrate on now.

Using the set of reports as your starting point, first determine who’s using them and who’s responsible for generating them. You need to know who uses each report because you might find that nobody uses it anymore. Just by taking inventory of the reports and assessing their current use, you’re halfway to eliminating this don’t-really-use-it functionality from your data warehousing environment (and to managing its complexity).
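The report inventory can live in something as simple as a spreadsheet. As a minimal sketch, assuming a hypothetical record layout (the `Report` fields, the `"production"`/`"extract"` labels, and the sample reports are all invented for illustration), the following Python separates the extract-based reports from the production ones and flags those that nobody uses:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    """One row of the report inventory (hypothetical layout)."""
    name: str
    source: str          # "production" or "extract" (assumed labels)
    generated_by: str    # person or group responsible for producing it
    users: List[str] = field(default_factory=list)


def extract_based(reports):
    """Reports fed by extracted data: the ones to concentrate on now."""
    return [r for r in reports if r.source == "extract"]


def unused(reports):
    """Reports with no current users: candidates for elimination."""
    return [r for r in reports if not r.users]


# Illustrative data only; none of these reports come from the article.
inventory = [
    Report("Daily Orders", "production", "Order entry team", ["sales ops"]),
    Report("Regional Sales Rollup", "extract", "Finance analyst", ["regional VPs"]),
    Report("Legacy Margin Summary", "extract", "IT batch job", []),
]

focus = extract_based(inventory)
retire = unused(focus)

print("Concentrate on:", [r.name for r in focus])
print("Nobody uses:", [r.name for r in retire])
```

The production report stays in the inventory rather than being dropped, since it remains a useful starting point for the later “what data do we need?” analysis.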

Once you understand the report usage patterns, get to know the people responsible for generating the reports. They’re the ones who probably can tell you where they get the data, what processes they use to prepare and load that data before running the reports, and what issues and problems they have with data availability and integrity.

Sometimes, no single individual knows the entire end-to-end sequence of steps used to extract data, prepare and organize that data, and run the reports — especially when these processes cross organizational boundaries. (For example, the IT organization handles the initial extraction of the data and some rudimentary quality assurance, and the business organization handles the merge processes and runs the reports.)

In these situations, get all these people in the same room to discuss and agree on how things work. That way, you avoid spending a great deal of time playing “he said, she said” with people whom, frankly, you’re probably aggravating with your constant questions and requests for meetings.
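One way to decide who belongs in that room is to write down the end-to-end sequence as you currently understand it and mark where responsibility changes hands. The sketch below is only an illustration: the step names and the two organizations are hypothetical, loosely modeled on the IT/business example above.

```python
# Hypothetical end-to-end sequence for one report:
# (step description, responsible organization)
steps = [
    ("extract data from source application", "IT"),
    ("rudimentary quality assurance", "IT"),
    ("merge extracted files", "Business"),
    ("run the report", "Business"),
]


def handoffs(sequence):
    """Return (from_step, to_step) pairs where responsibility changes hands."""
    return [
        (prev[0], curr[0])
        for prev, curr in zip(sequence, sequence[1:])
        if prev[1] != curr[1]
    ]


def participants(sequence):
    """Organizations to invite, in order of first appearance."""
    return list(dict.fromkeys(org for _, org in sequence))


print("Invite:", participants(steps))
print("Handoffs to walk through:", handoffs(steps))
```

The handoff points are where knowledge is most likely to be split between people, so they make a reasonable agenda for the joint session.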

Eventually, through diligence, you get a complete picture of who’s using which data, who’s responsible for making that data available, and what’s going on behind the scenes to make it all happen.
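That complete picture can be recorded as a catalog keyed by extract file. The following is a minimal sketch, assuming interview findings are noted as simple tuples; the file names, owners, and users are hypothetical placeholders, not data from any real environment.

```python
from collections import defaultdict

# Hypothetical interview findings:
# (report, extract file it reads, who makes the extract available, report user)
findings = [
    ("Regional Sales Rollup", "sales_extract.csv", "IT operations", "regional VPs"),
    ("Regional Sales Rollup", "sales_extract.csv", "IT operations", "finance"),
    ("Churn Watchlist", "sales_extract.csv", "IT operations", "marketing"),
    ("Churn Watchlist", "customer_extract.csv", "marketing analyst", "marketing"),
]


def build_catalog(rows):
    """Group findings by extract file: who owns it, what it feeds, who relies on it."""
    catalog = defaultdict(lambda: {"owners": set(), "reports": set(), "users": set()})
    for report, extract_file, owner, user in rows:
        entry = catalog[extract_file]
        entry["owners"].add(owner)
        entry["reports"].add(report)
        entry["users"].add(user)
    return dict(catalog)


catalog = build_catalog(findings)
for extract_file, entry in sorted(catalog.items()):
    print(extract_file)
    print("  made available by:", sorted(entry["owners"]))
    print("  feeds reports:", sorted(entry["reports"]))
    print("  used by:", sorted(entry["users"]))
```

Keying on the extract file rather than the report makes it easy to see which files feed several reports and therefore deserve the most attention.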

Don’t forget the why part of the picture: the business purposes for which the extract files are being used. You have to find out this information anyway as part of your requirements analysis, so while you’re checking out what’s going on today and have the users’ attention, ask questions such as:

  • Why do you need the information in the report?

  • What decisions does this report assist you in making?

  • When you analyze the data on the report, who do you communicate your findings to?

Your line of questioning should follow a traditional approach to understanding the report’s requirements. Getting firm answers to these questions at this point in the analysis will save you time in the long run.