Determine What External Data You Really Need

By Thomas C. Hammergren

Don’t overdo it when you think about what external data you need for your data warehouse. The same rule that applies to internal data in your warehouse is just as applicable to externally sourced data: Make sure that the data will bring true business value to your analysis and decision making before you go through the trouble of analyzing, transforming, storing, and making available all this data.

If your competitors’ sales data helps you get a clear picture of how you’re doing, go get it. If the knowledge that certain city populations are dramatically increasing or decreasing has no bearing on your company’s decision making, why bother acquiring and storing that data?

Suppose that the database service bureau from which you decide to purchase sales data has an extensive catalog of companies, time periods, and types of data elements, with a variety of package prices available.

When you’re considering raw data for your data warehouse, you might be tempted to think that more is obviously better. Just as with internally provided data, however, you have to apply your business-needs analysis before you begin to consider what data to acquire.

Some, perhaps many, of the data warehouse users will probably apply the simple “tell me what happened” style of querying and reporting, not OLAP “help me understand why something happened” or data mining “tell me what might happen” styles. Because simple querying and reporting almost always has an internal focus, you don’t need to consult those types of users about external data needs.

By process of elimination, therefore, you must make the remainder of the user community part of the external data business-needs analysis. Figure out who falls into this group as soon as possible so that you can focus your analysis and design efforts toward externally focused users. Follow these steps:

  1. Revalidate your list of total users.

    This list includes everyone in the company who’s a potential data warehouse user. Is everyone on the list still a candidate to use the data warehouse? (Or, if you’ve already deployed the data warehouse, does everyone on the list actually use it?)

    Do you need to add anyone to the list? If you’re satisfied with the accuracy of your data warehouse user list, continue to Step 2. If not, make sure that you adjust the list until it’s correct.

  2. For each person on the list, answer this question: To perform his or her assigned business functions most effectively, does this person need any data that’s not available from the company’s internal computer systems?

  3. Using the results from your interviews, create a consolidated list of external data needs, the sources from which you can obtain the data, prices and fees, restrictions, and contact information.

  4. Talk to your project sponsor about budget approval.

    Make the request and do whatever else your company requires.
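If you keep your interview notes in a simple structure, you can build the consolidated list from Step 3 with a few lines of code. The following Python sketch shows one possible way to do it. Everything in it is hypothetical and for illustration only: the `ExternalNeed` record, the `consolidate` function, the provider names, the fees, and the contact addresses are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalNeed:
    """One row of the consolidated external-data list (hypothetical layout)."""
    data_item: str
    source: str
    annual_fee: float
    restrictions: str
    contact: str
    requested_by: set = field(default_factory=set)


def consolidate(interviews, catalog):
    """Merge per-person interview answers into one list of external data needs.

    interviews: dict mapping a user name to the data items that user needs.
    catalog: dict mapping a data item to its source details.
    """
    needs = {}
    for user, items in interviews.items():
        for item in items:
            if item not in needs:
                details = catalog[item]
                needs[item] = ExternalNeed(
                    data_item=item,
                    source=details["source"],
                    annual_fee=details["annual_fee"],
                    restrictions=details["restrictions"],
                    contact=details["contact"],
                )
            # Record who asked, so each need traces back to a person.
            needs[item].requested_by.add(user)
    return sorted(needs.values(), key=lambda need: need.data_item)


# Made-up sample data: providers, fees, and contacts are placeholders.
catalog = {
    "bank card credit scores": {
        "source": "Example Credit Bureau",
        "annual_fee": 12000.0,
        "restrictions": "internal use only",
        "contact": "sales@example.com",
    },
    "installment loan market data": {
        "source": "Example Market Data Service",
        "annual_fee": 8000.0,
        "restrictions": "no redistribution",
        "contact": "accounts@example.com",
    },
}

interviews = {
    "Robin": ["bank card credit scores"],
    "Robert": ["bank card credit scores"],
    "Martha": ["installment loan market data"],
    "Karen": [],  # needs internal data only
}

consolidated = consolidate(interviews, catalog)
total_annual_fee = sum(need.annual_fee for need in consolidated)
```

The list in `consolidated` and the `total_annual_fee` figure give you something concrete to bring to your project sponsor in Step 4. Tracking `requested_by` for each item also shows you which purchases serve many users and which serve only one.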

Often, in dealing with large user populations (100 or more people), data warehousing developers have a tendency to take a shortcut and apply the preceding question to groups of users, not to individual people, in the interest of meeting deliverable schedules.

If a bank’s credit-analysis organization, for example, has five people (Martha, Robin, Karen, Robert, and Sidney), all of whom have the same title of credit risk analyst, report to Suellen as peers on the organizational chart, and use the data warehouse, the same data needs apply across the entire group, right?

Don’t make this mistake. In more cases than not, a group of this size has at least two distinct business roles, each of which requires different external data (not to mention internal data). Robin and Robert might focus, for example, on credit card risk, so they need credit scores and market data only for bank cards; others in the group might concentrate on installment loan risk and therefore need external credit-risk data and other market data for different types of installment loans, such as auto, small business, and signature.

If you work with Robin and find out that she needs credit-card-oriented external data but wouldn’t use externally provided installment-loan data even if she had access to it, you absolutely don’t want to assume that no one else in Robin’s organization needs installment-loan data and that you don’t need to pursue that information.
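To see what the shortcut costs, you can compare the needs you would record from one representative with the needs you would record from the whole group. The following Python sketch is a minimal illustration. It assumes that Martha, Karen, and Sidney are the installment-loan analysts (the example above says only that "others in the group" are), and the function names and data item labels are hypothetical.

```python
# Needs gathered person by person. Which analysts cover installment loans
# is an assumption made for this illustration.
individual_needs = {
    "Robin": {"bank card credit scores", "bank card market data"},
    "Robert": {"bank card credit scores", "bank card market data"},
    "Martha": {"installment loan credit-risk data", "installment loan market data"},
    "Karen": {"installment loan credit-risk data", "installment loan market data"},
    "Sidney": {"installment loan credit-risk data", "installment loan market data"},
}


def needs_from_representative(needs_by_person, representative):
    """The shortcut: treat one person's answers as the whole group's answers."""
    return set(needs_by_person[representative])


def needs_from_everyone(needs_by_person):
    """The thorough approach: combine the answers from every individual."""
    return set().union(*needs_by_person.values())


def missed_by_shortcut(needs_by_person, representative):
    """Data items that the group needs but the representative never mentioned."""
    return needs_from_everyone(needs_by_person) - needs_from_representative(
        needs_by_person, representative
    )


missed = missed_by_shortcut(individual_needs, "Robin")
```

With this sample data, interviewing only Robin leaves both installment-loan items in `missed`, which is exactly the gap that the group-level shortcut hides.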

Talk to everyone, even if it takes a little extra time.