How to Use Item-Based Collaborative Filters in Predictive Analysis

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

One of Amazon’s recommender systems for predictive analysis uses item-based collaborative filtering: when a user views a single item on the website, the system surfaces related products from the company’s huge inventory. You know you’re looking at an item-based collaborative filtering system (or, often, a content-based system) if it shows you recommendations at your very first item view, even if you haven’t created a profile.

Looks like magic, but it’s not. Although your profile hasn’t been created yet (because you aren’t logged in or you don’t have any previous browsing history on that site), the system takes what amounts to a guess: It bases its recommendation on the item itself and on what other customers viewed or bought before or after they purchased that item. So you’ll see an onscreen message such as

  • Customers who bought this item also bought . . .

  • Customers who bought items in your recent history also bought . . .

  • What other items do customers buy after viewing this item?

In essence, the recommendation is based on how similar the currently viewed item is to other items, based on the actions of the community of users.

The following shows a sample matrix of customers and the items they purchased. It will be used as an example of item-based collaborative filtering.

Customer  Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
A         X       X       X
B         X       X
C                         X               X
D                         X       X       X
E                 X       X
F         X       X               X       X
G         X               X
H         X
I                                                 X

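Now let’s look at item similarity calculated using the cosine similarity formula. The formula for cosine similarity is (A · B) / (||A|| ||B||), where A and B are the purchase vectors of the two items being compared: each item’s column in the matrix above, with 1 for a purchase and 0 otherwise. (Here A and B stand for items, not for Customers A and B.) To read the following example and find out how similar a pair of items is, just locate the cell where the two items intersect. The number will be between 0 and 1. A value of 1 means the items are perfectly similar (bought by exactly the same customers); 0 means they are not similar (no buyers in common).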

Item 6  0       0       0       0       0
Item 5  0.26    0.29    0.52    0.82            0
Item 4  0.32    0.35    0.32            0.82    0
Item 3  0.40    0.45            0.32    0.52    0
Item 2  0.67            0.45    0.35    0.29    0
Item 1          0.67    0.40    0.32    0.26    0
        Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
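The similarity values in the matrix above can be reproduced from the purchase matrix with a few lines of Python. This is a minimal sketch, not the authors’ implementation: the names `PURCHASES`, `buyers`, and `cosine` are made up for illustration, and the purchase sets are transcribed by hand from the customer table.

```python
from math import sqrt

# Purchases transcribed from the customer/item matrix (customer -> item numbers).
PURCHASES = {
    "A": {1, 2, 3},
    "B": {1, 2},
    "C": {3, 5},
    "D": {3, 4, 5},
    "E": {2, 3},
    "F": {1, 2, 4, 5},
    "G": {1, 3},
    "H": {1},
    "I": {6},
}
ITEMS = range(1, 7)


def buyers(item):
    """Return the set of customers who purchased the given item."""
    return {c for c, items in PURCHASES.items() if item in items}


def cosine(item_a, item_b):
    """Cosine similarity of two binary (0/1) purchase vectors.

    For 0/1 vectors the dot product is the number of shared buyers,
    and each norm is the square root of that item's buyer count.
    """
    a, b = buyers(item_a), buyers(item_b)
    if not a or not b:
        return 0.0
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))


# Item-to-item similarities, rounded to two decimals like the table.
similarity = {
    (i, j): round(cosine(i, j), 2)
    for i in ITEMS for j in ITEMS if i != j
}

for i in ITEMS:
    print(f"Item {i}:", [similarity[(i, j)] for j in ITEMS if j != i])
```

Because the vectors contain only 0s and 1s, the whole calculation reduces to counting shared buyers, which is why no numerical library is needed here.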

The system can provide a list of recommendations whose similarity is above a certain value, or it can recommend the top n items. In this scenario, treat any pair of items with a similarity of 0.40 or higher as similar; the system will recommend those items.
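The two selection strategies just described, a similarity cutoff and a top-n list, might look like the following sketch. The similarity values are copied from the matrix above; the function names `neighbors`, `above_threshold`, and `top_n` are hypothetical, and dropping zero-similarity items from the top-n list is an assumption made here.

```python
# Similarity values copied from the matrix (each pair listed once).
SIMILARITY = {
    (1, 2): 0.67, (1, 3): 0.40, (1, 4): 0.32, (1, 5): 0.26, (1, 6): 0.0,
    (2, 3): 0.45, (2, 4): 0.35, (2, 5): 0.29, (2, 6): 0.0,
    (3, 4): 0.32, (3, 5): 0.52, (3, 6): 0.0,
    (4, 5): 0.82, (4, 6): 0.0,
    (5, 6): 0.0,
}


def neighbors(item):
    """Return (other_item, similarity) pairs, most similar first."""
    pairs = []
    for (a, b), score in SIMILARITY.items():
        if item == a:
            pairs.append((b, score))
        elif item == b:
            pairs.append((a, score))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def above_threshold(item, threshold=0.40):
    """Items whose similarity to `item` is at least `threshold`."""
    return [other for other, score in neighbors(item) if score >= threshold]


def top_n(item, n):
    """The n most similar items, skipping items with zero similarity."""
    return [other for other, score in neighbors(item)[:n] if score > 0]


print(above_threshold(1))  # [2, 3]
print(top_n(4, 2))         # [5, 2]
```

A cutoff can return nothing at all for a poorly connected item, whereas top-n always tries to fill the list, so which one fits depends on whether weak recommendations are worse than none.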

For example, the similarity between Item 1 and Item 2 is 0.67. The similarity between Item 2 and Item 1 is the same. Thus the table is a mirror image of itself across the diagonal that runs from lower-left to upper-right. You can also see that Item 6 is not similar to any other item: its similarity to each of them is 0, because its only buyer, Customer I, bought nothing else.
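As a worked check of the 0.67 value, treat each item as a 0/1 vector over the nine customers. Items 1 and 2 share three buyers (Customers A, B, and F), Item 1 has five buyers in total, and Item 2 has four. The symbols v1 and v2 below are notation introduced here for those two purchase vectors.

```latex
\[
\cos(\text{Item 1}, \text{Item 2})
  = \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}
         {\lVert \mathbf{v}_1 \rVert \, \lVert \mathbf{v}_2 \rVert}
  = \frac{3}{\sqrt{5}\,\sqrt{4}}
  \approx 0.67
\]
```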

This implementation of an item-based recommendation system is simplified to illustrate how it works. For simplicity, it uses only one criterion to determine item similarity: whether the user purchased the item. More complex systems could go into greater detail by

  • Using profiles created by users that represent their tastes

  • Factoring in how much a user likes (or highly rates) an item

  • Weighing how many items the user purchased that are similar to the potential recommended item(s)

  • Making assumptions about whether a user likes an item on the basis of whether the user has simply viewed the item, even though no purchase was made
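One of the refinements listed above, factoring in how much a user likes an item, can be illustrated by running the same cosine formula on rating vectors instead of 0/1 purchase flags. The sketch below is only an illustration: the item names, the four customers, and every rating are hypothetical, and treating an unrated item as 0 is a simplifying assumption.

```python
from math import sqrt

# Hypothetical 1-5 star ratings from four customers, invented for illustration.
# A 0 means the customer has not rated the item (a simplifying assumption).
RATINGS = {
    "item_x": [5, 4, 0, 1],
    "item_y": [4, 5, 0, 2],
    "item_z": [1, 0, 5, 4],
}


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


xy = cosine(RATINGS["item_x"], RATINGS["item_y"])
xz = cosine(RATINGS["item_x"], RATINGS["item_z"])
print(f"item_x vs item_y: {xy:.2f}")
print(f"item_x vs item_z: {xz:.2f}")
```

With ratings, two items score as similar only when the same customers rate them in a similar way, so an item that many people bought but disliked no longer counts as strongly as one they rated highly.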

Here are two common ways you could use this recommender system:

  • Offline via an e-mail marketing campaign or if the user is on the website while logged in.

    The system could send marketing ads or make these recommendations on the website:

    • Item 3 to Customer B

      Recommended because Customer B purchased Items 1 and 2, and both items are similar to Item 3.

    • Item 4, then Item 2, to Customer C

      Recommended because Customer C purchased Items 3 and 5. Item 5 is similar to Item 4 (similarity value: 0.82). Item 2 is similar to Item 3 (similarity value: 0.45).

    • Item 2 to Customer D

      Recommended because Customer D purchased Items 3, 4, and 5. Item 3 is similar to Item 2.

    • Item 1 to Customer E

      Recommended because Customer E purchased Items 2 and 3, both of which are similar to Item 1.

    • Item 3 to Customer F

      Recommended because Customer F purchased Items 1, 2, 4, and 5. Items 1, 2, and 5 are similar to Item 3.

    • Item 2 to Customer G

      Recommended because Customer G purchased Items 1 and 3. They are both similar to Item 2.

    • Item 2, then Item 3, to Customer H

      Recommended because Customer H purchased Item 1. Item 1 is similar to Items 2 and 3.

    • Undetermined item to Customer A

      Ideally, you should have a lot more items and users. And there should be some items that a customer has purchased that are similar to other items that he or she has not yet purchased.

    • Undetermined item to Customer I

      In this case, the data is insufficient to serve as the basis of a recommendation. This is an example of the cold-start problem.

  • Online via a page view while the user is not logged in.
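The per-customer recommendations in the offline case can be approximated with the sketch below. The article does not spell out how similarities to several purchased items are combined, so the rule used here is an assumption: each unpurchased item is scored by its highest similarity to anything the customer bought, and it is kept if that score is at least 0.40. The names `sim` and `recommend` are hypothetical.

```python
# Purchases and similarity values transcribed from the two tables in the article.
PURCHASES = {
    "A": {1, 2, 3}, "B": {1, 2}, "C": {3, 5}, "D": {3, 4, 5}, "E": {2, 3},
    "F": {1, 2, 4, 5}, "G": {1, 3}, "H": {1}, "I": {6},
}
SIMILARITY = {
    (1, 2): 0.67, (1, 3): 0.40, (1, 4): 0.32, (1, 5): 0.26, (1, 6): 0.0,
    (2, 3): 0.45, (2, 4): 0.35, (2, 5): 0.29, (2, 6): 0.0,
    (3, 4): 0.32, (3, 5): 0.52, (3, 6): 0.0,
    (4, 5): 0.82, (4, 6): 0.0,
    (5, 6): 0.0,
}
ITEMS = range(1, 7)


def sim(a, b):
    """Look up the symmetric similarity of two different items."""
    return SIMILARITY[(min(a, b), max(a, b))]


def recommend(customer, threshold=0.40):
    """Rank unpurchased items by their best similarity to any purchased item.

    Assumed scoring rule: score = max similarity to a purchased item.
    """
    bought = PURCHASES[customer]
    scores = {}
    for candidate in ITEMS:
        if candidate in bought:
            continue
        best = max(sim(candidate, item) for item in bought)
        if best >= threshold:
            scores[candidate] = best
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("B"))  # [3]
print(recommend("H"))  # [2, 3]
print(recommend("I"))  # []
```

Under this assumed rule the output need not match the list above item for item. For example, it also surfaces Item 1 for Customer C, because Item 1’s similarity to Item 3 sits exactly at the 0.40 cutoff. For an anonymous page view, the same lookup would start from the single viewed item instead of a purchase history.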