Basics of Content-based Predictive Analytics Filters - dummies


By Anasse Bari, Mohamed Chaouchi, Tommy Jung

Content-based predictive analytics recommender systems mostly match features (tagged keywords) between items and the user’s profile to make recommendations. When a user purchases an item that has tagged features, the system recommends other items whose features match those of the original item. The more features that match, the higher the probability that the user will like the recommendation. This degree of probability is called precision.
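The matching idea can be sketched in a few lines of Python. This is only an illustration, not how any particular store implements it: the function names, the catalog, and the tags are all hypothetical, and the score is simply the number of tags two items share.

```python
def match_score(item_tags, reference_tags):
    """Count the tags that two items have in common."""
    return len(set(item_tags) & set(reference_tags))


def recommend_similar(purchased, catalog):
    """Rank the other catalog items by how many tags they share with the purchased item."""
    reference = catalog[purchased]
    scored = [
        (name, match_score(tags, reference))
        for name, tags in catalog.items()
        if name != purchased
    ]
    # Drop items with no shared tag, then sort best match first (ties by name).
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical catalog: the tags are made up for illustration only.
catalog = {
    "Roman Holiday": {"romance", "comedy", "1950s", "audrey-hepburn"},
    "Sabrina": {"romance", "comedy", "1950s", "audrey-hepburn"},
    "Charade": {"romance", "thriller", "1960s", "audrey-hepburn"},
    "Alien": {"sci-fi", "horror", "1970s"},
}

print(recommend_similar("Roman Holiday", catalog))
# [('Sabrina', 4), ('Charade', 2)]
```

A raw count favors items that carry many tags; a real system would likely normalize the score, but a plain count keeps the "more matching features, better recommendation" idea visible.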

Basics of tags to describe items

In general, the company doing the selling (or the manufacturer) tags its items with keywords. On the Amazon website, however, you typically never see the tags for items you’ve purchased or viewed, and you aren’t asked to tag an item. Customers can review the items they’ve purchased, but that’s not the same as tagging.

Tagging items can pose a scale challenge for a store like Amazon that has so many items. Additionally, some attributes are subjective and may be tagged incorrectly, depending on who tags them. One way to address the scaling issue is to allow customers or the general public to tag the items.

To keep tags manageable and accurate, the website may provide an acceptable set of tags. Only when an appropriate number of users agree (that is, use the same tag to describe an item) will the agreed-upon tag be used to describe the item.
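A minimal sketch of that agreement rule might look like the following. The allowed vocabulary, the threshold of three users, and the function name are assumptions made up for this example; the article doesn't say what an "appropriate number" is.

```python
from collections import Counter

ALLOWED_TAGS = {"comedy", "drama", "romance", "thriller"}  # hypothetical vocabulary
MIN_AGREEMENT = 3  # hypothetical threshold for "an appropriate number of users"


def accepted_tags(submissions, allowed=ALLOWED_TAGS, min_agreement=MIN_AGREEMENT):
    """Return the tags for one item that enough distinct users agree on.

    submissions is an iterable of (user_id, tag) pairs for a single item.
    """
    # Lowercase the tag and count each user at most once per tag.
    unique_votes = {(user, tag.strip().lower()) for user, tag in submissions}
    counts = Counter(tag for _, tag in unique_votes if tag in allowed)
    return {tag for tag, n in counts.items() if n >= min_agreement}


submissions = [
    ("u1", "comedy"), ("u2", "Comedy"), ("u3", "comedy"),
    ("u1", "comedy"),                    # repeat vote by u1, counted once
    ("u1", "romance"), ("u2", "romance"),  # only two users agree
    ("u4", "funny"), ("u5", "funny"), ("u6", "funny"),  # not in the allowed set
]

print(accepted_tags(submissions))
# {'comedy'}
```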

User-based tagging, however, turns up other problems for a content-based filtering system (and for collaborative filtering):

  • Credibility: Not all customers tell the truth (especially online), and users who have only a small rating history can skew the data. In addition, some vendors may give (or encourage others to give) positive ratings to their own products while giving negative ratings to their competitors’ products.

  • Sparsity: Not all items will be rated or will have enough ratings to produce useful data.

  • Inconsistency: Not all users use the same keywords to tag an item, even though the meaning may be the same. Additionally, some attributes can be subjective. For example, one viewer of a movie may consider it short while another says it’s too long.
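The keyword side of the inconsistency problem can be reduced by mapping different spellings and synonyms onto one canonical tag before counting them. The sketch below assumes a hand-maintained synonym table; the table contents and the function name are hypothetical.

```python
# Hypothetical synonym table mapping user wording to a canonical tag.
SYNONYMS = {
    "funny": "comedy",
    "humorous": "comedy",
    "rom-com": "romantic comedy",
}


def normalize_tag(raw):
    """Lowercase the tag, collapse whitespace, and apply the synonym table."""
    cleaned = " ".join(raw.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


print(normalize_tag("  Funny "))   # comedy
print(normalize_tag("Humorous"))   # comedy
print(normalize_tag("Drama"))      # drama
```

Normalization like this only handles different words for the same meaning. It does nothing for genuinely subjective attributes, such as one viewer calling a movie short and another calling it too long.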

Attributes need clear definitions. An attribute with too few boundaries is hard to evaluate; imposing too many rules on an attribute may be asking users to do too much work, which will discourage them from tagging items.

Tagging most items in a product catalog can help solve the cold-start problem that plagues collaborative filtering. For a while, however, the precision of the system’s recommendations will be low until it creates or obtains a user profile.

Here’s a sample matrix of items and their features that shows an example of content-based filtering.

Items Feature 1 Feature 2 Feature 3 Feature 4 Feature 5
Item 1 X X
Item 2 X X
Item 3 X X X
Item 4 X X X
Item 5 X X X

Here, if a user likes Feature 2 — and that’s recorded in her profile — the system will recommend all items that have Feature 2 in them: Item 1, Item 2, and Item 4.
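The lookup can be sketched as a simple filter over an item-to-features mapping. In the sketch below, only the placement of Feature 2 (in Item 1, Item 2, and Item 4) comes from the example above; every other feature assignment is an illustrative assumption, as are the names `ITEM_FEATURES` and `recommend_for_profile`.

```python
# Only the Feature 2 memberships follow the text; all other cells are assumed.
ITEM_FEATURES = {
    "Item 1": {"Feature 2", "Feature 5"},
    "Item 2": {"Feature 2", "Feature 4"},
    "Item 3": {"Feature 1", "Feature 3", "Feature 4"},
    "Item 4": {"Feature 2", "Feature 3", "Feature 5"},
    "Item 5": {"Feature 1", "Feature 4", "Feature 5"},
}


def recommend_for_profile(liked_features, item_features=ITEM_FEATURES):
    """Return every item that has at least one feature recorded in the user's profile."""
    liked = set(liked_features)
    return [item for item, features in item_features.items() if features & liked]


print(recommend_for_profile({"Feature 2"}))
# ['Item 1', 'Item 2', 'Item 4']
```

Note that the function needs only the profile and the tagged catalog; no purchase or rating history is involved.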

This approach works even if the user has never purchased or reviewed an item. The system just looks in the product database for any item that has been tagged with Feature 2. If, for example, a user is looking for movies with Audrey Hepburn, and that preference shows up in the user’s profile, the system will recommend all the movies that feature Audrey Hepburn to this user.

This example, however, quickly exposes a limitation of the content-based filtering technique: The user probably already knows about all the movies that Audrey Hepburn has been in, or can easily find out — so, from that user’s point of view, the system hasn’t recommended anything new or of value.

How to improve precision with constant feedback

One way to improve the precision of the system’s recommendations is to ask customers for feedback whenever possible. Collecting customer feedback can be done in many different ways, through multiple channels. Some companies ask the customer to rate an item or service after purchase. Other systems provide social-media-style links so customers can “like” or “dislike” a product. Constant interaction between

How to measure the effectiveness of system recommendations

The success of a system’s recommendations depends on how well it meets two criteria: precision (think of it as a set of perfect matches — usually a small set) and recall (think of it as a set of possible matches — usually a larger set). Here’s a closer look:

  • Precision measures how accurate the system’s recommendation was. Precision is difficult to measure because it can be subjective and hard to quantify. For instance, when a user first visits the Amazon site, can Amazon know for sure whether its recommendations are on target?

    Some recommendations may connect with the customer’s interests but the customer may still not buy. The highest confidence that a recommendation is precise comes from clear evidence: The customer buys the item. Alternatively, the system can explicitly ask the user to rate its recommendations.

  • Recall measures how many of the possible good recommendations your system comes up with. Think of recall as an inventory of possible recommendations, not all of which are perfect recommendations. There is generally an inverse relationship between precision and recall. That is, as recall goes up, precision goes down, and vice versa.
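To make the two measures concrete, the sketch below uses the standard information-retrieval definitions, which are more specific than the informal descriptions above: precision is the share of recommended items that turn out to be relevant, and recall is the share of relevant items that were recommended. The ranked list and the set of relevant items are hypothetical, and "relevant" is assumed to be known (for example, from purchases or explicit ratings).

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one list of recommendations."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


ranked = ["A", "B", "C", "D", "E", "F"]  # hypothetical ranking, best guess first
relevant = {"A", "B", "E"}               # hypothetical items the user actually liked

# Recommending more items (larger k) can raise recall while lowering precision.
for k in (2, 4, 6):
    p, r = precision_recall(ranked[:k], relevant)
    print(k, round(p, 2), round(r, 2))
# 2 1.0 0.67
# 4 0.5 0.67
# 6 0.5 1.0
```

With only two recommendations, every one is a hit but a liked item is missed; with six, nothing is missed but half the list is noise.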

The ideal system would have both high precision and high recall. But realistically, the best outcome is to strike a delicate balance between the two. Emphasizing precision or recall really depends on the problem you’re trying to solve.