Data Mining For Dummies

By: Meta S. Brown Published: 09-29-2014

Delve into your data for the key to success

Data mining is quickly becoming integral to creating value and business momentum. The ability to detect unseen patterns hidden in the numbers exhaustively generated by day-to-day operations allows savvy decision-makers to exploit every tool at their disposal in the pursuit of better business. By creating models and testing whether patterns hold up, it is possible to discover new intelligence that could change your business's entire paradigm for a more successful outcome.

Data Mining For Dummies shows you why it doesn't take a data scientist to gain this advantage, and empowers average business people to start shaping a process relevant to their business's needs. In this book, you'll learn the hows and whys of mining to the depths of your data, and how to make the case for heavier investment in data mining capabilities. The book explains the details of the knowledge discovery process, including:

  • Model creation, validity testing, and interpretation
  • Effective communication of findings
  • Available tools, both paid and open-source
  • Data selection, transformation, and evaluation

Data Mining For Dummies takes you step by step through a real-world data-mining project using open-source tools that allow you to get immediate hands-on experience working with large amounts of data. You'll gain the confidence you need to start making data mining practices a routine part of your successful business. If you're serious about doing everything you can to push your company to the top, Data Mining For Dummies is your ticket to effective data mining.

Articles From Data Mining For Dummies

68 results
Data Mining For Dummies Cheat Sheet

Cheat Sheet / Updated 02-17-2022

Data mining is the way that ordinary businesspeople use a range of data analysis techniques to uncover useful information from data and put that information into practical use. Data miners don’t fuss over theory and assumptions. They validate their discoveries by testing. And they understand that things change, so when the discovery that worked like a charm yesterday doesn’t hold up today, they adapt.

View Cheat Sheet
How to Get Data from Weka

Step by Step / Updated 03-27-2016

University of Waikato faculty members develop tools as part of their work toward advancement of the field of machine learning. These tools are used in teaching, by scientists, and in industry. Weka is the university's general-purpose data-mining tool, offering a visual programming interface and a wide range of analytics capabilities. MOA is its tool for real-time mining of data streams. To import the sample data into Weka, follow these steps:

View Step by Step
Labeling Data

Step by Step / Updated 03-27-2016

Using codes for data reduces data entry time, prevents errors, and reduces the memory requirements for storing the data. But the codes aren’t meaningful unless you have documentation, or labels, to explain their meaning. Some data formats enable you to enjoy the advantages of using codes while keeping the information about the meaning of the codes in the same file. These aren’t typical in data mining — you’re more likely to see them in statistical analysis products — but some data-mining applications can use these labeled data formats. Here’s how they work.
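As a rough illustration of the idea of codes plus labels (not of any particular labeled file format), the sketch below stores compact codes in the records and keeps their meanings in a separate codebook. The field names, codes, and labels are all made up for the example.

```python
# Hypothetical codebook: the compact codes as stored, with their documented meanings.
REGION_LABELS = {1: "North", 2: "South", 3: "East", 4: "West"}

# Made-up records that store only the compact code.
records = [
    {"customer_id": 101, "region": 2},
    {"customer_id": 102, "region": 4},
    {"customer_id": 103, "region": 9},  # a code that is missing from the codebook
]


def label_for(code, labels):
    """Return the label for a code, flagging codes that lack documentation."""
    return labels.get(code, f"<undocumented code {code}>")


# Attach the human-readable label to each record without changing the stored code.
labeled = [
    {**row, "region_label": label_for(row["region"], REGION_LABELS)}
    for row in records
]

for row in labeled:
    print(row["customer_id"], row["region"], row["region_label"])
```

The third record shows why the documentation matters: a code with no entry in the codebook cannot be interpreted, so the sketch flags it rather than guessing.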

View Step by Step
3 Ways to Work Fast with Graphs Galore

Step by Step / Updated 03-27-2016

Data miners work fast. One way to improve your productivity is to take full advantage of tools that let you do several things at once. It’s time-consuming (and boring) to set up a number of graphs separately, one at a time. So use these alternatives whenever you can:

View Step by Step
5 Ways to Extend Your Graphics Range

Step by Step / Updated 03-27-2016

Because data miners lean heavily on basic graphs, some data-mining applications offer little or nothing more. Others provide a wide range of graph options, from the common to the exotic. It’s not necessary to use all of these, but you may benefit by selecting and using a few that suit your own needs. Data miners often use these graphs:

View Step by Step
How to Get Data from KNIME

Step by Step / Updated 03-27-2016

Your first hands-on step with data is getting it from wherever it is to the place where you need it to be. Text formats are common, and you’re likely to encounter them often. One of the most common is comma-separated value (.csv) text. KNIME.com AG is a small software and services firm focused on data mining. It offers a data-mining product with a visual programming interface. To open the sample data in KNIME:

View Step by Step
How to Get Data from RapidMiner

Step by Step / Updated 03-27-2016

RapidMiner is a small software and services firm focused on data mining. It offers a data-mining product with a visual programming interface. To open the sample data in RapidMiner, follow these steps:

View Step by Step
How to Get Data from Orange

Step by Step / Updated 03-27-2016

The Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia, develops Orange in cooperation with an open source community. To open the sample data in Orange, follow these steps:

View Step by Step
Preventing Data Privacy Disasters

Article / Updated 03-26-2016

Data privacy is a big issue for data miners. News reports outlining the level of personal data in the hands of the US government's National Security Agency and breaches of commercial data sources have raised public awareness and concern.

A central concept in data privacy is personally identifiable information (PII), or any data that can be traced to the individual person it describes. PII includes obvious identifiers such as names, credit card numbers, and Social Security numbers, and most data miners are well aware that this kind of data is private and must be handled with care. But PII refers to more than just these obvious identifiers. Any data that could be used to identify an individual, even if doing so requires using several fields in combination or manipulating the data in some way, is also PII.

It's easy for data miners to overlook this kind of data, the kind that does not appear on the surface to be private, and yet could be sufficient for personal identification if it were manipulated for that purpose. If there is any way that data could be manipulated to identify individuals, it must be handled with the same precautions as you would give a list of credit card numbers. That's where data miners can easily get themselves in trouble.

There are many ways to identify individuals if you make a little effort to do so. In one notable example, AOL Research released user search records for research use. The data was intended to be anonymous and contained no names, but The New York Times reported that it had been able to identify an individual from the search data by cross-referencing with phone listings. Later, Netflix made movie rating data available for use in a competition, and it was soon revealed that this data, too, could be used to identify individuals.

In your work as a data miner, you may encounter prospective clients who share data they claim is anonymous (or even faked, to illustrate a point of discussion), only for you to find that the data is nothing of the kind. Knowingly or not, these people are violating data privacy laws and exhibiting a lack of respect for their own customers.

So, how can you prevent disasters like these? Don't try to do it alone. It's challenging to ensure compliance with all relevant data privacy laws, not to mention other good business practices. Jenny Juliany, Vice President of Solutions Architecture and Co-Founder of Intreis, a solutions integrator specializing in service management and compliance automation, describes the life cycle of data with an analogy to the four seasons:

  • Spring: Inception, the data is created.
  • Summer: Primetime, the data is in active use.
  • Fall: Retirement, the data is no longer relevant or used, but there may be legal or other reasons to retain it.
  • Winter: Removal, the data is destroyed.

Each season has its own characteristics, with distinct requirements surrounding data privacy. Some are grounded in the law, others in common sense, and still others in individual agreements with clients and your own employer's business practices. It's not realistic to believe you can take on all these compliance details in addition to your primary role, so you must partner with your organization's data management professionals.

You don't want to be the center of the next big data privacy scandal. Respect for data privacy and proper data management is the key to minimizing that risk. Don't wait until something goes wrong. Contact the data privacy expert in your own organization today, and start building a working partnership to properly manage sensitive data.

Jenny Juliany covers the data life cycle in more detail in her Four Seasons of Data Management series: 'Spring' Inception, 'Summer' Primetime, 'Fall' Retirement, and 'Winter' Removal.
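The article's point that several fields in combination can identify a person can be illustrated with a small check. The sketch below is only a first-pass screen, not a compliance test: it counts how many records are unique on a chosen combination of fields. The records and the field names (`zip`, `birth_year`, `gender`) are made up for the example.

```python
from collections import Counter

# Made-up records with no names or account numbers in them.
records = [
    {"zip": "30301", "birth_year": 1975, "gender": "F"},
    {"zip": "30301", "birth_year": 1975, "gender": "F"},
    {"zip": "30301", "birth_year": 1982, "gender": "M"},
    {"zip": "30302", "birth_year": 1990, "gender": "F"},
    {"zip": "30302", "birth_year": 1990, "gender": "F"},
    {"zip": "30303", "birth_year": 1961, "gender": "M"},
]


def unique_combinations(rows, fields):
    """Return the field-value combinations that occur in exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Hypothetical choice of fields to test in combination.
risky = unique_combinations(records, ["zip", "birth_year", "gender"])
print(f"{len(risky)} of {len(records)} records are unique on these fields")
for combo in risky:
    print(combo)
```

No single field here singles anyone out, yet two of the six records are unique once the three fields are combined. A record that is not unique within one file may still be identifiable when cross-referenced with outside data, so a clean result from a check like this is not evidence that data is anonymous.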

View Article
10 Common Data Mining Mistakes (That You Won't Make)

Article / Updated 03-26-2016

Data mining is done by trial and error, and so, for data miners, making mistakes is only natural. In other words, mistakes can be valuable, at least under certain conditions. Not all mistakes are created equal, however. Some are just better avoided. The following list offers ten such mistakes. If you read through them carefully, and commit them to memory, you just might avoid a few bumps on the learning curve:

  • Skipping data quality checks: Most data miners think developing predictive models is more fun than reviewing data for quality problems. But if you fail to detect and correct data quality problems, you could end up with worthless predictions.
  • Missing the point: You've discovered something fascinating! That's nice, but if it isn't also relevant to the business problem you set out to solve, well, it isn't relevant at all. Get back on track.
  • Believing that a pattern in the data proves a cause-and-effect relationship: You explore a dataset and notice that when Variable A increases, Variable B increases, too. This could occur because Variable A influences Variable B, or because Variable B influences Variable A. On the other hand, it could be that both are influenced by some other variable that you have not considered. Or it could be a one-time coincidence. Who can say?
  • Stretching conclusions too far: Don't presume that the relationships you observe in data will recur in different circumstances. If your data was collected in a cool environment, don't assume that things will work the same way in a hot factory setting.
  • Betting on results that don't make sense: Data mining methods are informal and not usually backed up by scientific method and theory, so your results had better at least make business sense. If there's no common sense explanation for the results you present, your executive management probably won't take them seriously, and they shouldn't.
  • Falling in love with a particular modeling method: There is no single type of data mining model that fits every situation.
  • Putting a model into production without adequate testing: Don't bet your business on a predictive model until you have tested it with holdout data and on a small scale in the field.
  • Ignoring results you don't like: If you ignore your data now, it will come back one day and say, "I told you so."
  • Using data mining to address every data analysis need: Data mining has tremendous value, yet some applications still call for rigorous data collection methods, formal statistical analysis, and scientific method.
  • Presuming that traditional data analysis techniques no longer matter: Refer to the previous bullet.
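The advice above about testing with holdout data can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended modeling method: the data is synthetic, the "model" is a single cutoff rule, and the helper names (`split_holdout`, `fit_threshold`, `accuracy`) are invented for the example.

```python
import random


def split_holdout(rows, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and set aside a fraction as holdout data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, threshold):
    """Share of rows where the rule 'x >= threshold' matches the label."""
    hits = sum((x >= threshold) == label for x, label in rows)
    return hits / len(rows)


def fit_threshold(train):
    """Pick the cutoff on x that classifies the training rows most accurately."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


# Synthetic data: (x, label) pairs where larger x mostly means True, plus noise.
rng = random.Random(42)
data = [(x, (x + rng.gauss(0, 15)) > 50) for x in range(100)]

# Fit on the training rows only, then score on rows the rule has never seen.
train, holdout = split_holdout(data)
threshold = fit_threshold(train)
print("training accuracy:", round(accuracy(train, threshold), 2))
print("holdout accuracy: ", round(accuracy(holdout, threshold), 2))
```

The cutoff is chosen to look as good as possible on the training rows, so the training figure tends to flatter it. The holdout figure is the more honest estimate, and as the list notes, it still needs to be followed by a small-scale test in the field.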

View Article