Phases of the Data Mining Process

By Meta S. Brown

Part of Data Mining For Dummies Cheat Sheet

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It’s an open standard; anyone may use it. The following list describes the six phases of the process.

  • Business understanding: Get a clear understanding of the problem you’re out to solve, how it impacts your organization, and your goals for addressing it. Tasks for this phase include:

    • Identifying your business goals

    • Assessing your situation

    • Defining your data mining goals

    • Producing your project plan

  • Data understanding: Review the data that you have, document it, and identify data management and data quality issues. Tasks for this phase include:

    • Gathering data

    • Describing data

    • Exploring data

    • Verifying quality

  • Data preparation: Get your data ready to use for modeling. Tasks for this phase include:

    • Selecting data

    • Cleaning data

    • Constructing data

    • Integrating data

    • Formatting data

  • Modeling: Use mathematical techniques to identify patterns within your data. Tasks for this phase include:

    • Selecting techniques

    • Designing tests

    • Building models

    • Assessing models

  • Evaluation: Review the patterns you have discovered and assess their potential for business use. Tasks for this phase include:

    • Evaluating results

    • Reviewing the process

    • Determining the next steps

  • Deployment: Put your discoveries to work in everyday business. Tasks for this phase include:

    • Planning deployment (how you will integrate data mining discoveries into everyday use)

    • Reporting final results

    • Reviewing final results
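
To make the middle phases more concrete, here is a minimal Python sketch of how data understanding, data preparation, and modeling might look on a tiny example. Everything in it is an assumption made for illustration: the toy customer records, the function names (describe, prepare, split, build_model, assess), and the choice of a nearest-centroid classifier are not part of CRISP-DM, which does not prescribe any particular tool or technique. The features are also left unscaled to keep the sketch short; a real project would likely rescale them.

```python
from statistics import mean

# Hypothetical toy records: (monthly_spend, support_calls, churned).
# None marks a missing value; all numbers are made up for illustration.
RAW = [
    (20.0, 5, 1), (25.0, 4, 1), (None, 6, 1), (30.0, 5, 1),
    (80.0, 1, 0), (75.0, 0, 0), (90.0, None, 0), (85.0, 1, 0),
]


def describe(rows):
    """Data understanding: count rows and missing values per column."""
    n_cols = len(rows[0])
    missing = [sum(r[c] is None for r in rows) for c in range(n_cols)]
    return {"rows": len(rows), "missing_per_column": missing}


def prepare(rows):
    """Data preparation: fill missing feature values with the column mean."""
    n_features = len(rows[0]) - 1  # the last column is the label
    means = [
        mean(r[c] for r in rows if r[c] is not None)
        for c in range(n_features)
    ]
    return [
        tuple(r[c] if r[c] is not None else means[c] for c in range(n_features))
        + (r[-1],)
        for r in rows
    ]


def split(rows):
    """Modeling (designing tests): hold out every fourth row for testing."""
    train = [r for i, r in enumerate(rows) if i % 4 != 3]
    test = [r for i, r in enumerate(rows) if i % 4 == 3]
    return train, test


def build_model(train):
    """Modeling (building models): one centroid per label, predict the nearest."""
    centroids = {}
    for label in {r[-1] for r in train}:
        members = [r[:-1] for r in train if r[-1] == label]
        centroids[label] = tuple(mean(col) for col in zip(*members))

    def predict(features):
        # Squared Euclidean distance to each centroid; the smallest wins.
        return min(
            centroids,
            key=lambda lab: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[lab])
            ),
        )

    return predict


def assess(predict, test):
    """Modeling (assessing models): share of held-out rows predicted correctly."""
    return sum(predict(r[:-1]) == r[-1] for r in test) / len(test)


summary = describe(RAW)          # data understanding
clean = prepare(RAW)             # data preparation
train, test = split(clean)       # modeling: designing tests
model = build_model(train)       # modeling: building models
accuracy = assess(model, test)   # modeling: assessing models
print(summary)
print(accuracy)
```

The sketch stops at model assessment on purpose. Business understanding, evaluation, and deployment are mostly about goals, judgment, and planning, so they do not reduce to a few lines of code in the same way.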