On a recent snowy afternoon, I fell down a YouTube rabbit hole so deep that it took nearly two hours to click my way out. A simple search for a particular “Schitt’s Creek” blooper reel led to binge-viewing vintage “Saturday Night Live” clips and watching Will Ferrell’s speech at the 2011 ceremony for the Mark Twain Prize for American Humor. The experience made me wonder: How does YouTube know what to suggest to me next? And in the same way, how do Amazon, Instagram and Netflix predict products or other content that I’m likely to show interest in?
The answer is usually collaborative filtering.
What is collaborative filtering?
Collaborative filtering is one type of algorithmic “recommender system” that predicts a web user’s preferences based on past behavior. It sounds complicated, but the concept is actually pretty simple. There are two popular types of recommender systems. Here’s how they break down.
The first type is collaborative filtering. It compares many users’ activity and delivers personalized recommendations to your screen based on interests the algorithm predicts you share with other users.
“The process of identifying similar users and recommending what similar users like is called collaborative filtering,” said Nabil Adam, distinguished professor and founding director of Rutgers University’s Institute for Data Science, Learning, and Applications. “[This] approach focuses on the similarity of the ratings of those items by users who have rated both items.”
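To make the idea of “similar users” concrete, here is a minimal Python sketch of user-based collaborative filtering. It is an illustration under stated assumptions, not how YouTube, Amazon or any other service actually works: the user names, the star ratings, the 1-to-5 scale and its midpoint of 3, and the `similarity` and `recommend` helpers are all made up for this example.

```python
from math import sqrt

MIDPOINT = 3  # assumed center of a hypothetical 1-5 star scale

# Made-up ratings for illustration only.
ratings = {
    "you":   {"The Shining": 5, "Chinatown": 4},
    "ana":   {"The Shining": 5, "Chinatown": 4, "The Exorcist": 5},
    "ben":   {"The Shining": 4, "The Exorcist": 4},
    "carol": {"The Shining": 1, "Chinatown": 2, "Notting Hill": 5},
}

def similarity(a, b):
    """Cosine similarity of midpoint-centered ratings on items both users rated."""
    shared = set(a) & set(b)
    dot = sum((a[i] - MIDPOINT) * (b[i] - MIDPOINT) for i in shared)
    norm_a = sqrt(sum((a[i] - MIDPOINT) ** 2 for i in shared))
    norm_b = sqrt(sum((b[i] - MIDPOINT) ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap, or only neutral ratings: no evidence either way
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Predict scores for items the user has not rated, using like-minded users."""
    mine = ratings[user]
    weighted, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        if sim <= 0:
            continue  # skip users with no overlap or opposite taste
        for item, score in theirs.items():
            if item in mine:
                continue  # only suggest things the user has not rated yet
            weighted[item] = weighted.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    predictions = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)

print(recommend("you", ratings))
```

In this toy data, “ana” and “ben” rate “The Shining” the way “you” do, so their shared enthusiasm for “The Exorcist” becomes the suggestion. “carol” disliked what “you” liked, so her favorite is ignored.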
The second type is content-based filtering. A content-based recommender system scans the content a user requests for specific language, such as names. It then takes those inputs and searches for other content that contains the same words. It doesn’t take other users’ activities into account when making recommendations.
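A content-based match can be sketched in a few lines of Python as well. This is a simplified illustration, not a production design: the keyword sets, the `similar_titles` helper, the use of Jaccard overlap as the matching rule and the title “Obscure Cabin Film” are all assumptions invented for the example.

```python
# Hypothetical catalog: each title is described by a set of keywords.
catalog = {
    "The Shining": {"horror", "jack nicholson", "stanley kubrick", "hotel"},
    "Chinatown": {"mystery", "jack nicholson", "roman polanski"},
    "The Exorcist": {"horror", "possession", "william friedkin"},
    "Obscure Cabin Film": {"horror", "hotel", "isolation"},  # made-up title, no ratings
    "Notting Hill": {"romance", "comedy", "hugh grant"},
}

def jaccard(a, b):
    """Share of keywords two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_titles(title, catalog):
    """Rank other titles by keyword overlap; no user ratings are consulted."""
    target = catalog[title]
    scored = [
        (other, jaccard(target, words))
        for other, words in catalog.items()
        if other != title
    ]
    matches = [(other, score) for other, score in scored if score > 0]
    # Highest overlap first; ties are broken alphabetically for a stable order.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))

print(similar_titles("The Shining", catalog))
```

Because the ranking depends only on the words attached to each title, an item nobody has rated can still surface, and so can one that shares keywords but not quality.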
There are pros and cons to both approaches. Collaborative filtering can make finding popular, high-quality items easier, assuming there are enough user ratings to generate reliable predictions. But even then, there’s no guarantee the item’s raters will share your taste. And if there aren’t enough user ratings, there may not be any suitable recommendations at all.
“An advantage of the collaborative filter-based system is that it works for any kind of item and no feature selection is needed,” said Adam. “Limitations of this system, on the other hand, include dealing with a cold start where there are not enough users in the system to find a match, the difficulty of finding users that have rated the same items, the ‘first rater’ problem and handling popularity bias,” he added.
Content-based filtering can be an effective way to find new or rare, high-quality items that others don’t know about yet. But this can also result in recommendations that don’t suit your needs.
Examples of collaborative filtering
Here’s how collaborative filtering and content-based filtering work in practice.
Let’s say you love Jack Nicholson, and “The Shining” is one of your favorite classic horror films. You search Amazon for other scary movies because you want to watch something you haven’t seen. A collaborative-filtering system might recommend other popular classics, such as “The Exorcist,” because other users who rated “The Shining” highly also enjoyed “The Exorcist.” Therefore, according to the algorithm, you’re also likely to be interested in “The Exorcist.”
Now let’s imagine you search for “The Shining” on a website that uses content-based filtering. Your results might include a famous flick like “The Exorcist,” another Jack Nicholson movie such as “Chinatown” or a little-known horror film with no user ratings whatsoever.
Searching for movies online is a relatively safe endeavor. But what happens when collaborative filtering is used on other websites, such as those built for online dating?
“Collaborative filtering is commonly used in e-commerce to provide recommendations for various products,” said Adam. “In this context, there are two major concerns: the privacy of the users and the truthfulness of the ratings.
“As for the truthfulness of the ratings, a possible type of attack is the ‘shilling’ attack, where false profiles are introduced [and] that rate a selected set of items higher, resulting in significant changes,” he said. “[Trust]-based collaborative-filtering systems prevent such attacks [and] preserve the truthfulness of the system. This is achieved by introducing a web of trusted users whose ratings are preferred over the untrusted users,” he added.
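One simple way to picture “preferring” trusted users is to give their ratings more weight when scores are averaged. The Python sketch below is only an illustration of that idea, not the design of any system Adam describes: the profile names, the set of trusted users, the 0.1 weight for untrusted profiles and the `trust_weighted_average` helper are all assumptions.

```python
# Hypothetical ratings for one product: two real users plus five fake
# "shill" profiles that all give it five stars.
item_ratings = {
    "ana": 2,
    "ben": 3,
    "fake_1": 5,
    "fake_2": 5,
    "fake_3": 5,
    "fake_4": 5,
    "fake_5": 5,
}

trusted = {"ana", "ben"}  # assumed to come from some existing web of trust

def plain_average(item_ratings):
    """Every profile counts equally, so fake profiles can swamp the score."""
    return sum(item_ratings.values()) / len(item_ratings)

def trust_weighted_average(item_ratings, trusted, untrusted_weight=0.1):
    """Trusted users count fully; everyone else counts at a reduced weight."""
    total = 0.0
    weight_sum = 0.0
    for user, score in item_ratings.items():
        weight = 1.0 if user in trusted else untrusted_weight
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else None

print(round(plain_average(item_ratings), 2))
print(round(trust_weighted_average(item_ratings, trusted), 2))
```

With these made-up numbers, the fake profiles pull the plain average above four stars, while the trust-weighted score stays close to what the two trusted users actually reported.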
Are collaborative filtering algorithms safe?
Collaborative filtering is a valuable mechanism, but users should be aware of information security and privacy concerns. And especially in the case of online dating, it’s important to take steps to learn more about the identity of anyone you meet online.
“Information collected about users can be transferred, sold or used in a malicious way,” said Adam. “There are several research works addressing the challenge of maximizing the usability of the information provided by users while preserving their privacy. An example of such works is to have user-information and ratings-information databases obfuscated in such a way that clusters of similar data are preserved while hiding the actual values of data,” he added.
Collaborative filtering is here to stay
Searching the web to find the information we need can be time-consuming because the internet is a vast data network that grows bigger every day. Machine learning techniques, such as collaborative filtering, improve the user’s experience because they predict preferences based on past behavior. These “recommender systems” save time and also introduce users to new products or content they may not have found on their own.
But as with anything online, users should keep in mind that dozens or hundreds of five-star ratings on a collaborative-filtering system may not accurately predict shared taste. They may not even represent truthful opinions.
Long story short? Accept ratings and recommendations with a grain of salt, and always dig deeper when making the transition from a digital relationship to a personal one.