How to Find Associations among Predictive Analytics Data Items - dummies

How to Find Associations among Predictive Analytics Data Items

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

The use of predictive analytics as a data-mining tool also seeks to discover hidden relationships among items in your data. These hidden relationships are expressed as association rules, and the process of discovering them is called association rule mining.

Consider a large dataset of customer transactions, where a customer transaction consists of the product(s) purchased by a customer at a given time. In a scenario like this one, the purpose of predictive analytics as a tool is to identify associations between products in the dataset.

An association between two products is a relationship that can help the analyst discern a pattern and derive a rule from the raw data of customer transactions. Grocery-buying patterns offer an example of such a rule: If a customer purchases butter and bread, that customer is also likely to buy milk. The rule discovered in this case can be written as

{butter, bread} → {milk}.

In data-mining terms, {butter, bread} is called an itemset, and each customer transaction is called a basket. A real-world basket contains items, of course, and so does a basket in the data. The discovered rule just described is that if a basket contains the items butter and bread, then it is also very likely to contain milk.
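As a minimal sketch of this idea, the Python snippet below represents each customer transaction as a set of items and counts how many transactions contain butter and bread, and how many of those also contain milk. The sample transactions and the helper name `baskets_containing` are made up for illustration; they do not come from a real dataset.

```python
# Hypothetical sample data: each transaction (basket) is a set of items.
transactions = [
    {"butter", "bread", "milk"},
    {"butter", "bread", "milk", "eggs"},
    {"bread", "jam"},
    {"butter", "bread"},
    {"milk", "cereal"},
]


def baskets_containing(items, baskets):
    """Return the baskets that contain every item in `items`."""
    items = set(items)
    # `items <= basket` is Python's subset test for sets.
    return [basket for basket in baskets if items <= basket]


butter_and_bread = {"butter", "bread"}
matching = baskets_containing(butter_and_bread, transactions)
with_milk = baskets_containing(butter_and_bread | {"milk"}, transactions)

print(len(matching), "baskets contain butter and bread")
print(len(with_milk), "of those baskets also contain milk")
```

With this toy data, three baskets contain butter and bread, and two of those three also contain milk.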

Finding such association rules in a dataset of customer transactions helps a company (in this case, a grocery store) maximize revenue by deciding which products should be on sale, how to position products in the store’s aisles, and how and when to offer promotional pricing.

Analyzing the data generated by past transactions in order to maximize profit is a common practice. Sales data collected regularly (daily, weekly, monthly) from point-of-sale systems at online stores, supermarkets, bookstores, and restaurants is referred to as basket data, which in this case is essentially large-scale data about sales transactions.

Each association rule is generated with a score known as confidence, which indicates how likely the rule is to hold true. For instance, if a generated rule shows that 98% of the customers who purchased butter and bread also purchased milk, that percentage (98%) is the rule’s confidence value.

Other terms associated with a rule are the antecedent (the “if” part of an “if-then” statement) and the consequent (the “then” part of the “if-then”). In the previous example, the antecedent is butter and bread; milk is the consequent.
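To make the confidence score concrete, here is a small Python sketch that computes it as the share of baskets containing the antecedent that also contain the consequent. The function name `confidence` and the five sample transactions are assumptions chosen for illustration, so the resulting value is not the 98% mentioned above.

```python
# Hypothetical sample data: each transaction (basket) is a set of items.
transactions = [
    {"butter", "bread", "milk"},
    {"butter", "bread", "milk", "eggs"},
    {"butter", "bread"},
    {"butter", "bread", "milk", "jam"},
    {"bread", "jam"},
]


def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        # The rule never applies in this data, so report zero confidence.
        return 0.0
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)


rule_confidence = confidence({"butter", "bread"}, {"milk"}, transactions)
print(f"{{butter, bread}} -> {{milk}}: confidence = {rule_confidence:.0%}")
```

In this toy data, four baskets contain butter and bread and three of them also contain milk, so the rule's confidence is 75%.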

In practice, your company will use predictive analytics to retrieve association rules from a customer database. The analyst issues queries whose purpose is to find rules that either involve a given antecedent (what was bought) or lead to a given consequent (what can be expected to be bought).

In another example, consider a coffee shop manager who wants to maximize profit using association rules as a data-mining tool. The store manager would issue requests like these:

  • Generate all rules that have croissant in the antecedent and café latte in the consequent.

    Such rules would help the manager develop recommendations for which products to sell together with croissants; if café latte is prominent as a consequent, it’s highly likely that the recommendation will be to sell café latte with croissants.

  • Generate all rules that have chocolate chip cookie as an antecedent.

    These rules may help outline and design a plan for increasing sales of chocolate chip cookies.

  • Generate all rules that have espresso as an antecedent.

    These rules would determine the products whose sales may be affected if the store runs out of espresso.
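The manager's requests above can be sketched in Python as simple filters over a list of rules. Everything here is an illustrative assumption: the sample transactions, the `Rule` class, the `generate_pair_rules` helper (which only builds one-item-to-one-item rules), the `query` function, and the 0.3 confidence threshold. A production system would typically rely on a dedicated rule-mining algorithm rather than this brute-force loop.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    confidence: float


# Hypothetical coffee shop transactions.
transactions = [
    {"croissant", "café latte"},
    {"croissant", "café latte", "chocolate chip cookie"},
    {"espresso", "croissant"},
    {"espresso", "chocolate chip cookie"},
    {"café latte", "chocolate chip cookie"},
    {"espresso"},
]


def generate_pair_rules(baskets, min_confidence=0.3):
    """Build single-item -> single-item rules that meet a confidence threshold."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, c in permutations(items, 2):
        with_a = [b for b in baskets if a in b]
        if not with_a:
            continue
        conf = sum(1 for b in with_a if c in b) / len(with_a)
        if conf >= min_confidence:
            rules.append(Rule(frozenset({a}), frozenset({c}), conf))
    return rules


def query(rules, antecedent_item=None, consequent_item=None):
    """Keep rules whose antecedent and/or consequent contain the given items."""
    return [
        r for r in rules
        if (antecedent_item is None or antecedent_item in r.antecedent)
        and (consequent_item is None or consequent_item in r.consequent)
    ]


rules = generate_pair_rules(transactions)

croissant_to_latte = query(rules, "croissant", "café latte")
cookie_rules = query(rules, antecedent_item="chocolate chip cookie")
espresso_rules = query(rules, antecedent_item="espresso")

for r in espresso_rules:
    print(set(r.antecedent), "->", set(r.consequent), f"{r.confidence:.0%}")
```

Leaving `antecedent_item` or `consequent_item` as `None` means "match anything," which is how the second and third requests (rules with a given antecedent and any consequent) are expressed.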