Recommender Systems — User-Based and Item-Based Collaborative Filtering

This is part 2 of my series on Recommender Systems. The last post was an introduction to RecSys. Today I’ll explain in more detail two types of Collaborative Filtering, User-Based Collaborative Filtering (UB-CF) and Item-Based Collaborative Filtering (IB-CF), plus Slope One, a member of the item-based family.
Let’s begin.

User-Based Collaborative Filtering (UB-CF)

Imagine that we want to recommend a movie to our friend Stanley. We could assume that similar people will have similar taste. Suppose that Stanley and I have seen the same movies and rated them all almost identically, but Stanley hasn’t seen ‘The Godfather: Part II’ and I have. If I love that movie, it sounds logical to think that he will too. With that, we have created an artificial rating based on our similarity.
Well, UB-CF uses that logic and recommends items by finding users similar to the active user (the one to whom we are trying to recommend a movie). A specific application of this is the user-based Nearest Neighbor algorithm, which consists of two tasks:
1. Find the K nearest neighbors (KNN) of user a, using a similarity function w to measure how similar each pair of users is.
2. Predict the rating that user a will give to every item the K neighbors have consumed but a has not, and look for the item j with the best predicted rating.
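The similarity function w can be defined in several ways. One common choice, assumed here (cosine similarity is another), is the Pearson correlation over I_{a,u}, the set of items that users a and u have both rated, where \bar{r}_u is user u's mean rating. Step 2 can then be a mean-centered weighted average over N_k(a, j), the neighbors of a who rated item j:

```latex
w(a,u) = \frac{\sum_{i \in I_{a,u}} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}
              {\sqrt{\sum_{i \in I_{a,u}} (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_{i \in I_{a,u}} (r_{u,i} - \bar{r}_u)^2}}

p_{a,j} = \bar{r}_a + \frac{\sum_{u \in N_k(a,j)} w(a,u)\,(r_{u,j} - \bar{r}_u)}{\sum_{u \in N_k(a,j)} \lvert w(a,u) \rvert}
```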
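In other words, we are filling in a user-item matrix: we predict the ratings of the items the active user has not seen, based on the ratings of similar users. This technique is memory-based, meaning predictions are computed directly from the stored ratings rather than from a trained model.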
Figure: filling in the blanks of the user-item matrix.
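A minimal Python sketch of both steps, assuming a small hypothetical ratings dictionary, Pearson correlation as the similarity w, and a mean-centered weighted average for the prediction. This is one common formulation, not necessarily the one used in the original figure; the helper names (`pearson`, `nearest_neighbors`, `recommend`) are illustrative.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "me":      {"Godfather": 5, "Godfather II": 5, "Casablanca": 4, "Alien": 2},
    "stanley": {"Godfather": 5, "Casablanca": 4, "Alien": 1},
    "ana":     {"Godfather": 2, "Godfather II": 1, "Casablanca": 5, "Alien": 5},
    "bob":     {"Godfather": 4, "Godfather II": 4, "Alien": 2},
}


def user_mean(u):
    """Mean of all ratings given by user u."""
    return sum(ratings[u].values()) / len(ratings[u])


def pearson(a, u):
    """Similarity w(a, u): Pearson correlation over the items both users rated."""
    common = set(ratings[a]) & set(ratings[u])
    ma, mu = user_mean(a), user_mean(u)
    num = sum((ratings[a][i] - ma) * (ratings[u][i] - mu) for i in common)
    den = sqrt(sum((ratings[a][i] - ma) ** 2 for i in common)) * sqrt(
        sum((ratings[u][i] - mu) ** 2 for i in common)
    )
    return num / den if den else 0.0


def nearest_neighbors(a, k):
    """Step 1: the k users most similar to a."""
    others = [u for u in ratings if u != a]
    return sorted(others, key=lambda u: pearson(a, u), reverse=True)[:k]


def recommend(a, k=2):
    """Step 2: predict ratings for items the neighbors rated but a has not, best first."""
    neighbors = nearest_neighbors(a, k)
    unseen = {i for u in neighbors for i in ratings[u]} - set(ratings[a])
    predictions = {}
    for item in unseen:
        raters = [u for u in neighbors if item in ratings[u]]
        num = sum(pearson(a, u) * (ratings[u][item] - user_mean(u)) for u in raters)
        den = sum(abs(pearson(a, u)) for u in raters)
        predictions[item] = user_mean(a) + (num / den if den else 0.0)
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)


print(recommend("stanley"))
```

With this toy data, Stanley's two nearest neighbors are bob and me, and 'Godfather II' is the only unseen item, predicted somewhat above his average rating. Note that every prediction requires a neighbor search over all users, which is exactly the scalability cost listed below.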

PROS:

  • Easy to implement.
  • Context independent.
  • It is often more accurate than other techniques, such as content-based filtering.

CONS:

  • Sparsity: Only a small fraction of users rate any given item, so the user-item matrix is mostly empty.
  • Scalability: Considering more neighbors K (up to a certain threshold) should improve the predictions. However, the more users there are in the system, the more expensive it becomes to find the K nearest neighbors.
  • Cold-start: New users have little or no information about them to compare with other users.
  • New item: Similarly, new items lack the ratings needed to create a solid ranking (more on this in ‘How to sort and rank items’).

Item-Based Collaborative Filtering (IB-CF)

Back to Stanley. Instead of focusing on his friends, we could focus on which items, out of all the options, are most similar to the ones we know he enjoys. This approach is known as Item-Based Collaborative Filtering (IB-CF).
We could divide IB-CF into two subtasks:
1. Calculate the similarity between items:
  • Cosine-Based Similarity
  • Correlation-Based Similarity
  • Adjusted Cosine Similarity
  • 1 - Jaccard distance (i.e., Jaccard similarity)
2. Calculate the prediction:
  • Weighted Sum
  • Regression
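The four similarity measures listed in step 1 can be sketched in Python as follows, treating each item as the vector of ratings it received from the users who rated it. This is a minimal sketch on hypothetical data: the correlation-based version centers each item on its own mean over the co-rating users, the adjusted cosine instead subtracts each user's mean rating (removing per-user rating bias), and Jaccard only looks at who rated each item, not the rating values.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 2, "C": 4},
}


def raters(item):
    """Set of users who rated the item."""
    return {u for u, r in ratings.items() if item in r}


def user_mean(u):
    return sum(ratings[u].values()) / len(ratings[u])


def _cos(x, y):
    """Cosine of the angle between two equal-length vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return num / den if den else 0.0


def cosine_sim(i, j):
    """Cosine between the raw rating vectors of i and j (co-rating users only)."""
    common = sorted(raters(i) & raters(j))
    return _cos([ratings[u][i] for u in common], [ratings[u][j] for u in common])


def correlation_sim(i, j):
    """Pearson correlation: center each item on its mean over the co-rating users."""
    common = sorted(raters(i) & raters(j))
    if not common:
        return 0.0
    mi = sum(ratings[u][i] for u in common) / len(common)
    mj = sum(ratings[u][j] for u in common) / len(common)
    return _cos([ratings[u][i] - mi for u in common], [ratings[u][j] - mj for u in common])


def adjusted_cosine_sim(i, j):
    """Adjusted cosine: center each rating on the user's mean rating."""
    common = sorted(raters(i) & raters(j))
    return _cos([ratings[u][i] - user_mean(u) for u in common],
                [ratings[u][j] - user_mean(u) for u in common])


def jaccard_sim(i, j):
    """1 - Jaccard distance: overlap of the sets of users who rated each item."""
    ui, uj = raters(i), raters(j)
    union = ui | uj
    return len(ui & uj) / len(union) if union else 0.0


for pair in [("A", "B"), ("A", "C")]:
    print(pair, cosine_sim(*pair), correlation_sim(*pair),
          adjusted_cosine_sim(*pair), jaccard_sim(*pair))
```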
The difference between UB-CF and this method is that here we pre-calculate the similarity between co-rated items directly, skipping the K-nearest-neighbor search over users.
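To make the contrast with UB-CF concrete, here is a minimal sketch assuming cosine similarity over co-rating users and a weighted-sum prediction over the k most similar items the user has already rated. The data and helper names are hypothetical. The item-item similarity table is built once, offline, so a prediction is just a few lookups plus a weighted average, with no search over users.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: user -> {movie: rating}
ratings = {
    "me":      {"Godfather": 5, "Godfather II": 5, "Casablanca": 4, "Alien": 2},
    "stanley": {"Godfather": 5, "Casablanca": 4, "Alien": 1},
    "ana":     {"Godfather": 2, "Godfather II": 1, "Casablanca": 5, "Alien": 5},
}
items = sorted({i for r in ratings.values() for i in r})


def cosine(i, j):
    """Cosine similarity between items i and j over users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    num = sum(ratings[u][i] * ratings[u][j] for u in common)
    den = sqrt(sum(ratings[u][i] ** 2 for u in common)) * sqrt(
        sum(ratings[u][j] ** 2 for u in common)
    )
    return num / den if den else 0.0


# Offline step: precompute the symmetric item-item similarity table once.
sim = {}
for i, j in combinations(items, 2):
    sim[i, j] = sim[j, i] = cosine(i, j)


def predict(user, item, k=2):
    """Weighted sum over the k items most similar to `item` that `user` has rated."""
    rated = [j for j in ratings[user] if j != item]
    top = sorted(rated, key=lambda j: sim[item, j], reverse=True)[:k]
    num = sum(sim[item, j] * ratings[user][j] for j in top)
    den = sum(abs(sim[item, j]) for j in top)
    return num / den if den else 0.0


print(predict("stanley", "Godfather II"))
```

Here Stanley's predicted rating for 'Godfather II' is a similarity-weighted average of his ratings for 'Godfather' and 'Casablanca', the two items most similar to it.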

Slope One

Slope One is part of the Item-Based Collaborative Filtering family, introduced in a 2005 paper by Daniel Lemire and Anna Maclachlan called Slope One Predictors for Online Rating-Based Collaborative Filtering.
The main idea behind this model is the following:
Suppose we have two users, A and B, and two items, I and J. User A rated item I with 1 star and item J with 1.5. If user B rated item I with a 2, we can assume that the difference between the two items will be the same for B as it is for A. With this in mind, user B would rate item J as: 2 + (1.5 - 1) = 2.5
Figure: the main idea behind Slope One.
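The same reasoning generalizes to many users and items. The sketch below is a Python version of what the paper calls the basic Slope One scheme, as I understand it: for every item i the user has rated, average the difference between the target item and i over all users who rated both, add that deviation to the user's rating of i, and average the results. The function name and data layout are illustrative.

```python
def slope_one_predict(ratings, user, target):
    """Basic Slope One: predict `user`'s rating of `target` (assumed not yet rated)."""
    terms = []
    for i, r_ui in ratings[user].items():
        if i == target:
            continue
        # Rating differences (target - i) over all users who rated both items
        diffs = [r[target] - r[i] for r in ratings.values() if target in r and i in r]
        if diffs:
            deviation = sum(diffs) / len(diffs)  # average deviation dev(target, i)
            terms.append(r_ui + deviation)
    return sum(terms) / len(terms) if terms else None


# The example from the text: A rated I = 1 and J = 1.5; B rated only I = 2.
example = {"A": {"I": 1, "J": 1.5}, "B": {"I": 2}}
print(slope_one_predict(example, "B", "J"))  # 2 + (1.5 - 1) = 2.5
```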
The authors focus on 5 objectives:
1. Easy to implement and maintain.
2. Updatable online: new ratings should change predictions quickly.
3. Efficient at query time: storage is the main cost.
4. Works with little user feedback: users with only a few ratings should still get recommendations.
5. Reasonably accurate: a small gain in accuracy is not worth a large sacrifice in simplicity and scalability.
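Goals 2 and 3 suggest a natural design: for every pair of items, store the running sum of rating differences and the number of users who rated both. A new rating then only touches the pairs it forms with the items that user has already rated, and a prediction is a handful of lookups. The price is storage that grows with the square of the number of items. The sketch below assumes the weighted Slope One variant (each deviation weighted by how many users it is based on); the class and method names are hypothetical, and changing an existing rating is not handled.

```python
from collections import defaultdict


class SlopeOne:
    """Weighted Slope One with running sums, so ratings can be added online."""

    def __init__(self):
        self.ratings = defaultdict(dict)    # user -> {item: rating}
        self.diff_sum = defaultdict(float)  # (j, i) -> sum over co-raters of r_j - r_i
        self.count = defaultdict(int)       # (j, i) -> number of users who rated both

    def add_rating(self, user, item, rating):
        """Add a new rating; cost grows with how many items this user already rated."""
        if item in self.ratings[user]:
            raise ValueError("changing an existing rating is not handled in this sketch")
        for other, r in self.ratings[user].items():
            self.diff_sum[item, other] += rating - r
            self.diff_sum[other, item] += r - rating
            self.count[item, other] += 1
            self.count[other, item] += 1
        self.ratings[user][item] = rating

    def predict(self, user, target):
        """Average of (r_ui + dev(target, i)), each term weighted by its co-rating count."""
        num = den = 0.0
        for i, r in self.ratings[user].items():
            c = self.count.get((target, i), 0)
            if i != target and c:
                num += (r + self.diff_sum[target, i] / c) * c
                den += c
        return num / den if den else None


model = SlopeOne()
for user, item, rating in [("A", "I", 1), ("A", "J", 1.5), ("B", "I", 2)]:
    model.add_rating(user, item, rating)
print(model.predict("B", "J"))  # 2.5, as in the worked example
model.add_rating("C", "I", 3)
model.add_rating("C", "J", 4)
print(model.predict("B", "J"))  # dev(J, I) is now (0.5 + 1) / 2 = 0.75, so 2.75
```

Adding user C's ratings changes the prediction immediately, without retraining anything, which is the "updatable online" property in practice.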

Recap

We saw User-Based and Item-Based Collaborative Filtering. The first focuses on filling in a user-item matrix and recommending based on the users most similar to the active user. IB-CF, on the other hand, fills in an item-item matrix and recommends based on similar items.
It is hard to explain all these subjects briefly, but understanding them is the first step to getting deeper into RecSys.
