Data Mining For Dummies

Your own internal data is often the most relevant data you can get. Government and nonprofit sources offer valuable data for free. Use these sources whenever you can! When those sources don’t meet your needs, you’ll have to turn to commercial data suppliers. But which suppliers?


Acxiom

Acxiom is a major source for consumer marketing data. Acxiom’s data sources include publicly available property transaction records, auto warranty and service records, consumer-reported product registrations, surveys, census neighborhood statistics, and retailers. It provides

  • Demographics, such as age and gender

  • Home information, such as whether the consumer owns or rents

  • Motor vehicle information, such as make, model, and insurance renewal

  • Economic data, including income range and credit card use

  • Purchase data, such as types of products purchased and purchase frequency

  • Interests and indicators of interest, such as sports, arts and crafts, pet ownership, and other such categories


CoreLogic

CoreLogic is a source for property and financial information. Its offerings are aimed primarily at lenders, insurers, and landlords.


Datalogix

Datalogix provides sales data for consumer packaged goods (CPG), a category that covers thousands of consumable products, such as

  • Food and beverages

  • Clothing and shoes

  • Tobacco

  • Cleaning products

  • Pet care items

  • Cosmetics

This type of data is used primarily by marketers promoting CPG brands.


DataSift

DataSift provides social media activity data. If you’d like to know how often a topic is being mentioned in social media, who’s talking, and what they’re saying, you can get this data through DataSift.

DataSift offers data from more than 20 sources, ranging from the well-known Twitter, Facebook public posts, YouTube, and Bitly to rising voices Sina Weibo, Intense Debate, and Yammer. It is one of only two sources for complete Twitter data.


eBureau

eBureau is primarily a provider of scoring services for

  • Fraud detection

  • Credit risk

  • Collections

  • Consumer lead quality evaluation


Equifax

Equifax is one of the three major credit-reporting agencies. It provides information about consumer credit activity and credit scores. It also provides consumer demographics, credit information about businesses, supplier information, and platforms for the management of collections and other business activities.


Experian

Experian is best known as one of the three major credit-reporting agencies. It provides consumer and business credit data and credit scoring. Experian also offers many types of consumer data, including

  • Brand preference and psychographic measures for many population segments, including kids and teens; lesbian, gay, bisexual, and transgender consumers; and Hispanic groups

  • Media behavior, brand preference, and attitudes by location down to the zip code level

  • Consumer online, mobile, and other media behavior


Gnip

Gnip provides social media data. It supplies posts and other data from many social media sources and, like DataSift, offers advantages over collecting this data through the individual social media sites’ APIs.

ID Analytics

ID Analytics focuses on identity fraud risk, providing an identity score to help businesses assess the risk of identity fraud in business transactions. These scores are calculated from a combination of personally identifiable information (such as date of birth, phone number, and Social Security number) and device history (computer, smartphone, and so on).


Intelius

Intelius focuses on information about people and identity, including

  • Verification, such as reverse phone verification and email lookup

  • Information, such as people search and social net search

  • Protection, such as cell phone caller ID and criminal check

  • Marriage, divorce, and death records

  • Business services, such as employment and tenant screening


IRI

IRI provides data on consumer shopping and attitudes, with information of considerable depth for CPG marketers, including unique survey data not available elsewhere. Offerings include

  • Consumer panel data for information about consumer shopping and buying habits, attitudes, and demographics

  • Online health and wellness surveys that provide data to support brand marketing

  • Point-of-sale data for more than 12,500 retail stores


Nielsen

Nielsen is most famous as the company that creates television ratings. Nielsen tracks television viewing, it’s true, but it also tracks audience data for a number of other media platforms, including online, mobile, radio, and social media.

This is not survey data; Nielsen directly measures actual audience behavior. The data is typically used by advertisers to plan their media purchases. Nielsen also provides data about consumer buying behavior, obtained through point-of-sale records in stores and online, and data about shopping behavior and attitudes obtained from panel data from over 250,000 households across 25 countries.


PeekYou

Data miners who want to use information about individuals from online sources often find it challenging to match data sources together. Many names are not unique, and many people use more than one name online.
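
To see why matching on names alone is unreliable, consider the minimal sketch below. It is only an illustration: the records, field names, and the `match_by_name` helper are all made up for this example and do not reflect any supplier's data or method.

```python
from collections import defaultdict

# Hypothetical records from two online sources; every value is invented.
source_a = [
    {"name": "John Smith", "city": "Chicago"},
    {"name": "John Smith", "city": "Denver"},
    {"name": "Maria Garcia", "city": "Austin"},
]
source_b = [
    {"handle": "john smith", "url": "https://example.com/u/1"},
    {"handle": "MariaG_84", "url": "https://example.com/u/2"},
]

def normalize(name):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(name.lower().split())

def match_by_name(people, profiles):
    """Map each profile URL to the list of people whose name matches it."""
    by_name = defaultdict(list)
    for person in people:
        by_name[normalize(person["name"])].append(person)
    return {p["url"]: by_name.get(normalize(p["handle"]), []) for p in profiles}

matches = match_by_name(source_a, source_b)

# A name shared by several people produces more than one candidate.
ambiguous = [url for url, hits in matches.items() if len(hits) > 1]
# A person who uses a different name online produces no candidate at all.
unmatched = [url for url, hits in matches.items() if not hits]

print("Ambiguous:", ambiguous)
print("Unmatched:", unmatched)
```

In this toy data, one profile matches two different people named John Smith, and another profile matches nobody because its owner uses a nickname. Both failures need more evidence than the name itself to resolve.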

PeekYou provides information about web links and how they relate to people. PeekYou uses data from social media, news sites, and other sources, along with its own technology, to create a score to assess the likelihood that individual web pages are associated with particular people — people who may be the creators or the subjects of the page.

If you have URLs and want to know who made them, or who the information is about, this is the source to investigate.



Rapleaf

Rapleaf provides information about individuals based on their email addresses. It offers several types of information, including

  • Demographics, such as age, gender, and zip code

  • Interests, such as health and wellness, arts and crafts, and business

  • Purchase behavior for many categories, including charitable donations
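
When a supplier's data is keyed by email address, you would typically append it to your own records by joining on that address. The sketch below shows one possible way to do that in Python. The records, field names, and the `append_attributes` helper are hypothetical and are not Rapleaf's actual data format or API. It also assumes that trimming whitespace and lowercasing an address is an acceptable way to normalize it.

```python
# All records below are invented for illustration.
customers = [
    {"customer_id": 1, "email": "Pat@Example.com "},
    {"customer_id": 2, "email": "lee@example.org"},
]
supplier_rows = [
    {"email": "pat@example.com", "age_range": "35-44", "interest": "arts and crafts"},
]

def email_key(email):
    """Normalize an address for matching (assumption: case and padding are noise)."""
    return email.strip().lower()

def append_attributes(customers, supplier_rows):
    """Return copies of the customer records with supplier attributes added."""
    lookup = {email_key(row["email"]): row for row in supplier_rows}
    enriched = []
    for customer in customers:
        extra = lookup.get(email_key(customer["email"]), {})
        row = dict(customer)  # copy so the original record is left untouched
        row.update({k: v for k, v in extra.items() if k != "email"})
        enriched.append(row)
    return enriched

enriched = append_attributes(customers, supplier_rows)
for row in enriched:
    print(row)
```

Customers with no match in the supplier data keep only their original fields, so it is easy to measure afterward what share of your list was actually enriched.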

Recorded Future

Recorded Future provides real-time threat intelligence information. It collects information from nearly half a million online sources in seven languages. The scope of this data covers a wide variety of threats as diverse as malware attacks, geopolitical instability, and events that pose a threat to corporations or executives.


TransUnion

TransUnion is one of the three primary credit-reporting agencies. It provides credit information about both consumers and businesses. TransUnion’s offerings also include criminal records, bankruptcies, demographics, and other data useful for risk management and fraud prevention in a number of industries, including insurance, financial services, and healthcare.

About the book author:

Meta S. Brown helps organizations use practical data analysis to solve everyday business problems. A hands-on data miner who has tackled projects with up to $900 million at stake, she is a recognized expert in cutting-edge business analytics.
