Data Mining For Dummies

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant process framework for data mining. In its first phase, business understanding, before you approach data or tools, you define what you’re out to accomplish and why you want to achieve that goal.

The business understanding phase includes four tasks (primary activities, each of which may involve several smaller parts).

Task: Identifying your business goals

The first thing you must do in any project is to find out exactly what you’re trying to accomplish! That’s less obvious than it sounds. Many data miners have invested time in data analysis, only to find that their management wasn’t particularly interested in the issue they were investigating. You must start with a clear understanding of

  • A problem that your management wants to address

  • The business goals

  • Constraints (limitations on what you may do, the kinds of solutions that can be used, when the work must be completed, and so on)

  • Impact (how the problem and possible solutions fit in with the business)

Deliverables for this task include three items (usually brief reports focusing on just the main points):

  • Background: Explain the business situation that drives the project. This item, like many that follow, amounts only to a few paragraphs.

  • Business goals: Define what your organization intends to accomplish with the project. This is usually a broader goal than you, as a data miner, can accomplish independently. For example, the business goal might be to increase sales from a holiday ad campaign by 10 percent year over year.

  • Business success criteria: Define how the results will be measured. Try to get clearly defined quantitative success criteria. If you must use subjective criteria (hint: terms like gain insight or get a handle on imply subjective criteria), at least get agreement on exactly who will judge whether or not those criteria have been fulfilled.

Task: Assessing your situation

This is where you get into more detail on the issues associated with your business goals. Now you go deeper into fact-finding, building a much fuller explanation of the issues outlined in the business goals task.

Deliverables for this task include five in-depth reports:

  • Inventory of resources: A list of all resources available for the project. These may include people (not just data miners, but also those with expert knowledge of the business problem, data managers, technical support, and others), data, hardware, and software.

  • Requirements, assumptions, and constraints: Requirements will include a schedule for completion, legal and security obligations, and requirements for acceptable finished work. This is the point to verify that you’ll have access to appropriate data!

  • Risks and contingencies: Identify causes that could delay completion of the project, and prepare a contingency plan for each of them. For example, if an Internet outage in your office could pose a problem, perhaps your contingency could be to work at another office until the outage has ended.

  • Terminology: Create a list of business terms and data-mining terms that are relevant to your project and write them down in a glossary with definitions (and perhaps examples), so that everyone involved in the project can have a common understanding of those terms.

  • Costs and benefits: Prepare a cost-benefit analysis for the project. Try to state all costs and benefits in dollar (euro, pound, yen, and so on) terms. If the benefits don’t significantly exceed the costs, stop and reconsider this analysis and your project.

Decision makers often feel more comfortable allotting resources to projects that reduce costs than those that aim to increase revenue, so always look for cost-savings potential, and state savings opportunities first in your costs and benefits report.
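
The arithmetic behind a costs and benefits report can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LineItem` class, the `summarize` function, and every figure below are hypothetical, and all amounts are assumed to be in a single currency. The printed summary lists savings first, in line with the advice above.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    """One hypothetical entry in a costs and benefits report."""
    name: str
    amount: float  # all amounts assumed to be in one currency
    kind: str      # "cost", "saving", or "revenue"


def summarize(items):
    """Total the items by kind and compare benefits with costs."""
    costs = sum(i.amount for i in items if i.kind == "cost")
    savings = sum(i.amount for i in items if i.kind == "saving")
    revenue = sum(i.amount for i in items if i.kind == "revenue")
    benefits = savings + revenue
    return {
        "savings": savings,
        "revenue": revenue,
        "costs": costs,
        "net": benefits - costs,
        # Benefit-to-cost ratio; undefined when there are no costs.
        "ratio": benefits / costs if costs else None,
    }


# Illustrative figures only, not taken from any real project.
items = [
    LineItem("Analyst time", 40_000, "cost"),
    LineItem("Software licenses", 10_000, "cost"),
    LineItem("Reduced mailing waste", 60_000, "saving"),
    LineItem("Added campaign sales", 30_000, "revenue"),
]

summary = summarize(items)

# State savings first, then revenue, then costs.
for label in ("savings", "revenue", "costs", "net"):
    print(f"{label:>8}: {summary[label]:,.0f}")
```

If the net figure is small or negative, that is the signal to stop and reconsider the analysis and the project.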

Task: Defining your data-mining goals

Reaching the business goal often requires action from many people, not just the data miner. So now, you must define your little part within the bigger picture. If the business goal is to reduce customer attrition, for example, your data-mining goals might be to identify attrition rates for several customer segments, and develop models to predict which customers are at greatest risk.
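
As a rough illustration of the first of those data-mining goals, attrition rates for customer segments can be computed as the share of customers in each segment who left. The sketch below assumes each customer record is a simple (segment, left) pair; the function name, the segment labels, and the sample records are all hypothetical.

```python
from collections import defaultdict


def attrition_by_segment(customers):
    """Return {segment: share of customers in that segment who left}.

    `customers` is an iterable of (segment, left) pairs, where `left`
    is True if the customer was lost. This record layout is an
    assumption made for the example.
    """
    totals = defaultdict(int)
    lost = defaultdict(int)
    for segment, left in customers:
        totals[segment] += 1
        if left:
            lost[segment] += 1
    return {segment: lost[segment] / totals[segment] for segment in totals}


# Made-up sample records.
customers = [
    ("new", True), ("new", False), ("new", True), ("new", False),
    ("loyal", False), ("loyal", False), ("loyal", False), ("loyal", True),
]

rates = attrition_by_segment(customers)
for segment, rate in rates.items():
    print(f"{segment}: {rate:.0%}")
```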

Deliverables for this task include two reports:

  • Data-mining goals: Define data-mining deliverables, such as models, reports, presentations, and processed datasets.

  • Data-mining success criteria: Define the data-mining technical criteria necessary to support the business success criteria. Try to define these in quantitative terms (such as model accuracy or predictive improvement compared to an existing method). If the criteria must be qualitative, identify the person who makes the assessment.
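
A quantitative success criterion of the kind described above might be written down as a simple check. The sketch below is one possible form, not a prescribed one: it assumes the criterion is "model accuracy must beat the existing method by at least a set margin." The function names, the margin, and the sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Share of predictions that match the actual outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def meets_criterion(model_accuracy, baseline_accuracy, min_improvement):
    """True if the model beats the existing method by the agreed margin."""
    return model_accuracy - baseline_accuracy >= min_improvement


# Made-up outcomes: 1 = customer left, 0 = customer stayed.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
existing_method = [0] * 10                        # always predicts "stayed"
new_model = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]        # wrong on the last two

baseline_accuracy = accuracy(existing_method, actual)
model_accuracy = accuracy(new_model, actual)

# Assumed criterion: at least 20 percentage points better than the baseline.
passed = meets_criterion(model_accuracy, baseline_accuracy, min_improvement=0.20)
print(baseline_accuracy, model_accuracy, passed)
```

Agreeing on the margin before any modeling starts keeps the later evaluation from turning into a judgment call.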

Task: Producing your project plan

Now you specify every step that you, the data miner, intend to take until the project is completed and the results are presented and reviewed.

Deliverables for this task include two reports:

  • Project plan: Outline your step-by-step action plan for the project. For each step, add a schedule for completion, required resources, inputs (such as data or a meeting with a subject matter expert), outputs (such as cleaned data, a model, or a report), and dependencies (steps that can’t begin until this step is completed). Explicitly state that certain steps must be repeated (for example, modeling and evaluation usually call for several back-and-forth repetitions).

  • Initial assessment of tools and techniques: Identify the required capabilities for meeting your data-mining goals and assess the tools and resources that you have. If something is missing, you have to address that concern very early in the process.
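
The dependencies in a project plan can be checked mechanically. The sketch below, using Python's standard `graphlib` module (Python 3.9 or later), derives one workable order of steps from a table of prerequisites. The step names are hypothetical, and the table is written from the other direction than the bullet above: each step lists the steps that must be completed before it can begin.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each step maps to the steps that must finish first.
prerequisites = {
    "collect data": set(),
    "meet subject matter expert": set(),
    "clean data": {"collect data", "meet subject matter expert"},
    "build model": {"clean data"},
    "evaluate model": {"build model"},
    "present results": {"evaluate model"},
}

# static_order() yields every step after all of its prerequisites.
# Steps with no dependency between them may appear in either order.
order = list(TopologicalSorter(prerequisites).static_order())

for number, step in enumerate(order, start=1):
    print(f"{number}. {step}")
```

A table like this has no way to express repetition, so the back-and-forth between modeling and evaluation still needs to be stated in the written plan.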

About This Article

About the book author:

Meta S. Brown helps organizations use practical data analysis to solve everyday business problems. A hands-on data miner who has tackled projects with up to $900 million at stake, she is a recognized expert in cutting-edge business analytics.
