Databases and Data Mining
Data collected by large organizations in the course of everyday business is usually stored in databases. But database administrators may not be willing to allow data miners direct access to these data sources, and direct access may not be the best option from your point of view either. Direct access to operational databases (those used for routine business operations) can be a bad idea because:
Data miners use a lot of data. You could unintentionally tie up resources and interfere with ordinary business operations.
Legal and other business obligations matter. You could unintentionally violate a data privacy law or other data management requirement if your data access is not properly controlled.
Operational databases are not organized for data mining. You could spend a lot of time struggling to get the data you need, and still not be sure of getting it right.
When you need data from an operational database (and you have the appropriate approval to use the data), you should discuss your needs with the administrator responsible for that data. You’ll need to explain exactly what data you need, the format you need for data mining, and whether you need the data just once or on an ongoing basis.
The best approach for one-time requests is often for the administrator to extract the data for you and deliver it in a text file or other acceptable format.
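As a rough illustration, here is how a one-time extract delivered as a comma-delimited text file might be read in Python. The column names (customer_id, region, total_spend) and the sample rows are made up for this sketch; a real extract would arrive as a file from your administrator, with whatever fields the two of you agreed on.

```python
import csv
import io

# Hypothetical extract contents. In practice you would open the file
# the administrator delivered instead of using this in-memory sample.
SAMPLE_EXTRACT = """customer_id,region,total_spend
101,North,250.00
102,South,99.50
103,North,410.25
"""


def load_extract(text_stream):
    """Read a delimited text extract into a list of dictionaries, one per row."""
    reader = csv.DictReader(text_stream)
    rows = []
    for row in reader:
        # Text files carry no type information, so convert fields explicitly.
        row["customer_id"] = int(row["customer_id"])
        row["total_spend"] = float(row["total_spend"])
        rows.append(row)
    return rows


rows = load_extract(io.StringIO(SAMPLE_EXTRACT))
print(len(rows), "rows loaded")
```

With a real file, you would pass an open file object (for example, open("extract.csv", newline="")) in place of the io.StringIO wrapper.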
Ongoing data access is another matter. The administrator may not want to provide data extracts over and over, and giving you direct access to business systems is risky. A common solution is to create an analytic database. This is an ordinary relational database that is separate from conventional business systems. Data is routinely (and automatically) transferred from business systems to the analytic database, and data miners can access it at any time.
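To make the idea concrete, here is a minimal sketch of the kind of routine transfer that keeps an analytic database up to date. It uses Python's built-in sqlite3 module, with two in-memory databases standing in for the business system and the analytic database. The orders table and its columns are hypothetical, and a real transfer would usually be built and scheduled by your administrator with tools suited to the databases involved.

```python
import sqlite3


def refresh_analytic_copy(operational, analytic):
    """Replace the analytic copy of the (hypothetical) orders table
    with the current contents of the operational table."""
    rows = operational.execute(
        "SELECT order_id, customer_id, amount FROM orders"
    ).fetchall()
    with analytic:  # commits on success, rolls back if an error occurs
        analytic.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
        )
        analytic.execute("DELETE FROM orders")
        analytic.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return len(rows)


# Stand-in for the operational system, with a few made-up rows.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, 20.0), (2, 102, 35.5), (3, 101, 12.25)],
)
operational.commit()

# Stand-in for the separate analytic database.
analytic = sqlite3.connect(":memory:")
copied = refresh_analytic_copy(operational, analytic)
print(copied, "rows copied")
```

A full replacement like this is the simplest design and is safe to run repeatedly. For large tables, an administrator would more likely copy only new or changed rows.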
If you use an analytic database, make sure that it is organized properly to support data mining. Help your database administrator by sketching a diagram like the one shown to demonstrate how the data must be organized.
If the database administrator insists that the data can’t be stored this way, ask whether it’s possible to create a view (a stored query that can be queried as if it were a conventional data table) with the organization that you need.
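Here is a small sketch of what such a view might look like, again using Python's built-in sqlite3 module. It assumes, purely for illustration, that you need one row per customer and that the underlying data is stored as one row per order; the table, view, and column names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical underlying table: one row per order.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, 20.0), (2, 102, 35.5), (3, 101, 12.25)],
)

# The view stores the query, not the data. It presents the orders
# reorganized as one row per customer.
conn.execute(
    """
    CREATE VIEW customer_summary AS
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
    """
)

# The view is queried exactly as if it were an ordinary table.
summary = conn.execute(
    "SELECT customer_id, order_count, total_amount "
    "FROM customer_summary ORDER BY customer_id"
).fetchall()
print(summary)
```

Because the view is computed from the underlying table each time it is queried, it reflects new orders without any extra copying step.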
Many data-mining products are able to read data from databases. The steps required vary based on the:
Design of the data-mining application
Structure of the source database
Middleware, usually called a driver (for example, an ODBC or JDBC driver): special software that mediates between the database and application software
Documentation for your data-mining application should tell you whether it can read data from a database and, if so, which tool or function to use and how to use it. The administrator who sets up the analytic database can provide details about accessing it.
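Whatever the product, the underlying pattern is usually the same: connect through a driver, send a query, and receive rows. The sketch below shows that pattern in Python, where database driver modules generally follow a common interface (the DB-API). It uses the built-in sqlite3 module as a stand-in for whatever driver your database requires; the read_table helper, the customers table, and its columns are hypothetical names chosen for this example.

```python
import sqlite3


def read_table(connection, query, params=()):
    """Run a query and return the result as a list of dictionaries
    keyed by column name."""
    cursor = connection.cursor()
    cursor.execute(query, params)
    # cursor.description holds one entry per result column;
    # the first element of each entry is the column name.
    columns = [entry[0] for entry in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# An in-memory database with made-up rows stands in for the analytic database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(101, "North"), (102, "South"), (103, "North")],
)

# sqlite3 uses "?" as its parameter placeholder; other drivers may differ.
records = read_table(
    conn,
    "SELECT customer_id, region FROM customers "
    "WHERE region = ? ORDER BY customer_id",
    ("North",),
)
print(records)
```

With a different database, the connect call and the placeholder style would change to match that database's driver, while the rest of the pattern would stay much the same.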
If you’re already comfortable working with databases and other applications, you’ll find nothing surprising about doing the same things with a data-mining application. If databases are new to you, get a knowledgeable person from your organization to walk you through the process with your own database and data-mining application.