Basics of User-Based Collaborative Filters in Predictive Analysis

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

With a user-based approach to collaborative filtering in predictive analysis, the system can calculate similarity between pairs of users by using the cosine similarity formula, a technique much like the item-based approach. Usually such calculations take longer to do, and may need to be computed more often, than those used in the item-based approach. That’s because

  • You’d have a lot more users than items (ideally anyway).

  • You’d expect items to change less frequently than users.

  • With more users and less change in the items offered, you can use many more attributes than just purchase history when calculating user similarity.
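The cosine similarity calculation mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the user names, the binary purchase vectors, and the helper names `cosine_similarity` and `pairwise_user_similarity` are all assumptions made for this example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no purchases is treated as similar to no one
    return dot / (norm_a * norm_b)


def pairwise_user_similarity(vectors):
    """Compute cosine similarity for every unordered pair of users."""
    return {
        (u, v): cosine_similarity(vectors[u], vectors[v])
        for u, v in combinations(vectors, 2)
    }


# Hypothetical data: one row per user, one column per item (1 = purchased).
purchase_vectors = {
    "ann": [1, 1, 0, 1],
    "bob": [1, 1, 0, 0],
    "cy": [0, 0, 1, 0],
}

similarities = pairwise_user_similarity(purchase_vectors)
for pair, score in similarities.items():
    print(pair, round(score, 3))
```

Because every pair of users is compared, the number of calculations grows roughly with the square of the number of users, which is one reason the user-based calculation tends to take longer when users far outnumber items.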

A user-based system can also use machine-learning algorithms to group all users who have shown that they have the same tastes. The system builds neighborhoods of users who have similar profiles, purchase patterns, or rating patterns. If a person in a neighborhood buys and likes an item, the recommender system can recommend that item to everyone else in the neighborhood.

As with item-based collaborative filtering, the user-based approach requires sufficient data on each user to be effective. Before the system can make recommendations, it must create a user profile, so it also requires that the user create an account and be logged in (or that session information be stored in the browser via cookies) while viewing a website.

Initially the system can ask the user explicitly to create a profile, flesh out the profile by asking questions, and then optimize its suggestions after the user’s purchase data has accumulated.

Netflix is an example of a service that quickly builds a profile for each customer. Here’s the general procedure:

  1. Netflix invites its customers to set up queues of the movies they’d like to watch.

  2. The chosen movies are analyzed to learn about the customer’s tastes in movies.

  3. The predictive model recommends more movies for the customer to watch, based on the movies already in the queue.

The following sample matrix of customers and their purchased items is an example of user-based collaborative filtering. For simplicity, use a rule that a user neighborhood is created from users who bought at least two things in common.

Customer Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
A – N1 X X X
B – N1 X X
C – N2 X X
D – N2 X X X
E – N1 X X
F – N1 X X X X
G – N1 X X
H – N3 X
I – N3 X

Three user neighborhoods are formed: N1, N2, and N3. Every user in neighborhoods N1 and N2 has purchased at least two items in common with someone else in the same neighborhood. The users in N3 have not yet met the criterion and will not receive recommendations until they purchase other items that meet it.
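The "at least two things in common" rule can be sketched as a small graph problem: link any two users who share at least two purchases, then treat each connected group as a neighborhood. This is only one possible reading of the rule. The purchase data, the user names, and the function `build_neighborhoods` are hypothetical, and a real system might use a clustering algorithm instead.

```python
from itertools import combinations


def build_neighborhoods(purchases, min_common=2):
    """Group users who are linked by sharing at least `min_common` items.

    Returns (neighborhoods, unassigned), where each neighborhood is a set of
    users and `unassigned` lists users who are not linked to anyone.
    """
    # Link every pair of users that shares enough purchased items.
    links = {user: set() for user in purchases}
    for u, v in combinations(purchases, 2):
        if len(purchases[u] & purchases[v]) >= min_common:
            links[u].add(v)
            links[v].add(u)

    neighborhoods, unassigned, seen = [], [], set()
    for user in purchases:
        if user in seen:
            continue
        if not links[user]:
            unassigned.append(user)  # not enough overlap with anyone yet
            seen.add(user)
            continue
        # Collect everyone reachable from this user through the links.
        group, stack = set(), [user]
        while stack:
            current = stack.pop()
            if current not in group:
                group.add(current)
                stack.extend(links[current] - group)
        seen |= group
        neighborhoods.append(group)
    return neighborhoods, unassigned


# Hypothetical purchase histories (sets of item names).
purchases = {
    "ann": {"item1", "item2", "item3"},
    "bob": {"item1", "item2"},
    "cy": {"item4", "item5"},
    "dee": {"item4", "item5", "item6"},
    "eve": {"item6"},
}

neighborhoods, unassigned = build_neighborhoods(purchases)
print(neighborhoods)  # two groups: ann with bob, cy with dee
print(unassigned)     # eve has too little overlap to be grouped
```

Note that this sketch merges two groups whenever a single user overlaps with both, so a user who bridges two neighborhoods ends up in one combined group. Other implementations may assign such a user to just one of them.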

Here’s an example of how you could use this recommender system, either offline via an e-mail marketing campaign or on the website while the user is logged in. The system could send marketing ads or make recommendations on the website as follows:

  • Item 3 to Customer B

  • Item 4 to Customer C

  • Item 1 to Customer E

  • Item 3 to Customer F

  • Item 2 to Customer G

  • Undetermined item to Customers A and D

    Ideally you should have a lot more items than six. And there should always be some items in a customer’s neighborhood that the customer hasn’t purchased yet.

  • Undetermined item to Customers H and I

    In this case, there is insufficient data to serve as the basis of a recommendation.
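The logic behind a list like this can be sketched as a set difference: recommend whatever a customer's neighbors have bought that the customer has not. The data and the function name `recommend` below are hypothetical and are not taken from the sample matrix.

```python
def recommend(user, neighborhood, purchases):
    """Return items bought by the user's neighbors that the user lacks."""
    neighbor_items = set()
    for member in neighborhood:
        if member != user:
            neighbor_items |= purchases[member]
    return sorted(neighbor_items - purchases[user])


# Hypothetical neighborhood and purchase histories.
purchases = {
    "ann": {"item1", "item2", "item3"},
    "bob": {"item1", "item2"},
}
neighborhood = {"ann", "bob"}

print(recommend("bob", neighborhood, purchases))  # ann's extra item
print(recommend("ann", neighborhood, purchases))  # nothing new to suggest
```

The second call returns an empty list, which corresponds to the "undetermined item" case: the customer already owns everything the neighbors have bought.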

One very important difference from the item-based approach is that, because each customer belongs to a group, any future purchase that a member makes will be recommended to the other members of the group until the filter is retrained. So Customers A and D will start getting recommendations very quickly: they already belong to a neighborhood, and their neighbors are likely to buy something soon.

For example, if Customer B buys Item 6, then the recommender system will recommend Item 6 to everyone else in N1 (Customers A, E, F, and G).
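A minimal sketch of this propagation step follows, assuming the neighborhoods have already been computed. The function name `users_to_notify`, the user names, and the data are hypothetical.

```python
def users_to_notify(buyer, item, neighborhoods, purchases):
    """Record a purchase and return the neighbors who do not own the item."""
    purchases[buyer].add(item)  # record the new purchase
    for members in neighborhoods:
        if buyer in members:
            return sorted(
                user for user in members
                if user != buyer and item not in purchases[user]
            )
    return []  # the buyer is not in any neighborhood yet


# Hypothetical state: one neighborhood plus one ungrouped user.
neighborhoods = [{"ann", "bob", "eve"}]
purchases = {
    "ann": {"item1", "item2"},
    "bob": {"item1", "item2", "item3"},
    "eve": {"item2", "item3", "item6"},
    "zed": {"item5"},
}

notified = users_to_notify("bob", "item6", neighborhoods, purchases)
print(notified)  # eve is skipped because she already owns item6
```

Skipping neighbors who already own the item is a design choice made for this sketch; the buyer is also excluded, since there is no point recommending an item to the person who just bought it.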

Customer F can potentially belong to either neighborhood N1 or N2, depending on how the collaborative filtering algorithm is implemented.

Customers H and I provide examples of the cold-start problem: The customer just hasn’t generated enough data to be grouped into a user neighborhood. In the absence of a user profile, a new customer with very little or no purchase history — or who only buys obscure items — will always pose the cold-start problem to the system, regardless of which collaborative filtering approach is in use.

Customer I illustrates an aspect of the cold-start problem that’s unique to the user-based approach. The item-based approach would start finding other items similar to the item that the customer bought; then, if other users start purchasing Item 6, the system can start making recommendations.

No further purchases need be made by the user; the item-based approach can start recommending. In a user-based system, however, Customer I has to make additional purchases in order to belong to a neighborhood of users; the system can’t make any recommendations yet.

Okay, there’s an assumption at work in these simple examples: namely, that the customer not only purchased the item but liked it enough to make similar purchases. What if the customer didn’t like the item? The system needs, at the very least, to produce better precision in its recommendations.

You can add a criterion to the recommender system to group people who gave similar ratings to the items they purchased. If the system finds customers who like and dislike the same items, then the assumption of high precision is valid. In other words, there is a high probability that the customers share the same tastes.
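One simple way to sketch such a rating criterion is to measure how often two customers agree on liking or disliking the items they have both rated. The 1-to-5 rating scale, the like threshold of 4, the function name `taste_agreement`, and the data are all assumptions for illustration.

```python
def taste_agreement(ratings_a, ratings_b, like_threshold=4):
    """Fraction of co-rated items on which two users agree (like vs. dislike).

    Returns None when the users have no rated items in common.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return None
    agreements = sum(
        (ratings_a[item] >= like_threshold) == (ratings_b[item] >= like_threshold)
        for item in common
    )
    return agreements / len(common)


# Hypothetical ratings on an assumed 1-to-5 scale.
ratings = {
    "ann": {"item1": 5, "item2": 1, "item3": 4},
    "bob": {"item1": 4, "item2": 2},
    "cy": {"item1": 1, "item2": 5},
}

print(taste_agreement(ratings["ann"], ratings["bob"]))  # same likes and dislikes
print(taste_agreement(ratings["ann"], ratings["cy"]))   # opposite tastes
```

A system could then require both enough purchases in common and a high agreement score before placing two customers in the same neighborhood.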