Recommender system


A recommender system, or a recommendation system, is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. Such systems are primarily used in commercial applications.
Recommender systems are utilized in a variety of areas and are most commonly recognized as playlist generators for video and music services like Netflix, YouTube and Spotify, product recommenders for services such as Amazon, or content recommenders for social media platforms such as Facebook and Twitter. These systems can operate using a single input, like music, or multiple inputs within and across platforms like news, books, and search queries. There are also popular recommender systems for specific topics like restaurants and online dating. Recommender systems have also been developed to explore research articles and experts, collaborators, and financial services.

Overview

Recommender systems usually make use of collaborative filtering, content-based filtering, or both, as well as other approaches such as knowledge-based systems. Collaborative filtering approaches build a model from a user's past behavior as well as similar decisions made by other users. This model is then used to predict items that the user may have an interest in. Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties. Current recommender systems typically combine two or more approaches into a hybrid system.
The differences between collaborative and content-based filtering can be demonstrated by comparing two early music recommender systems: Last.fm, which takes a collaborative approach, and Pandora Radio, which takes a content-based approach.
Each type of system has its strengths and weaknesses. In the above example, Last.fm requires a large amount of information about a user to make accurate recommendations. This is an example of the cold start problem, and is common in collaborative filtering systems. Whereas Pandora needs very little information to start, it is far more limited in scope.
Recommender systems are a useful alternative to search algorithms since they help users discover items they might not have found otherwise. Of note, recommender systems are often implemented using search engines indexing non-traditional data.
Recommender systems were first mentioned in a technical report as a "digital bookshelf" in 1990 by Jussi Karlgren at Columbia University, and implemented at scale and worked through in technical reports and publications from 1994 onwards by Jussi Karlgren, then at SICS, and research groups led by Pattie Maes at MIT, Will Hill at Bellcore, and Paul Resnick, also at MIT, whose work with GroupLens was awarded the 2010 ACM Software Systems Award.
Montaner provided the first overview of recommender systems from an intelligent agent perspective. Adomavicius provided a new, alternate overview of recommender systems. Herlocker provided an additional overview of evaluation techniques for recommender systems, and Beel et al. discussed the problems of offline evaluations. Beel et al. have also provided literature surveys on available research paper recommender systems and existing challenges.
Recommender systems have been the focus of several granted patents.

Approaches

Collaborative filtering

One widely used approach to the design of recommender systems is collaborative filtering. Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. By locating peer users or items with a rating history similar to that of the current user or item, it generates recommendations from this neighborhood. Collaborative filtering methods are classified as memory-based and model-based. A well-known example of a memory-based approach is the user-based algorithm, while an example of a model-based approach is the Kernel-Mapping Recommender.
A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content and is therefore capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. Many algorithms have been used to measure user similarity or item similarity in recommender systems, for example the k-nearest neighbor approach and the Pearson correlation, as first implemented by Allen.
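The user-based, memory-based idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy ratings, the function names `pearson` and `predict`, and the choice of a mean-centered weighted average over the k most correlated neighbors are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 3, "D": 5},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}

def pearson(u, v, ratings):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def predict(user, item, ratings, k=2):
    """Mean-centered weighted average over the k most correlated raters of `item`."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    neighbors = [
        (pearson(user, other, ratings), other)
        for other in ratings
        if other != user and item in ratings[other]
    ]
    # Keep the k neighbors with the strongest (absolute) correlation.
    neighbors = sorted(neighbors, key=lambda p: abs(p[0]), reverse=True)[:k]
    norm = sum(abs(sim) for sim, _ in neighbors)
    if norm == 0:
        return user_mean  # no usable neighbors: fall back to the user's mean
    offset = 0.0
    for sim, other in neighbors:
        other_mean = sum(ratings[other].values()) / len(ratings[other])
        offset += sim * (ratings[other][item] - other_mean)
    return user_mean + offset / norm
```

Calling `predict("alice", "D", RATINGS)` estimates a rating for an item Alice has not rated. Note that this sketch does not clip predictions to the rating scale, and negatively correlated neighbors contribute with an inverted sign.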
When building a model from a user's behavior, a distinction is often made between explicit and implicit forms of data collection.
Examples of explicit data collection include the following:
Examples of implicit data collection include the following:
Collaborative filtering approaches often suffer from three problems: cold start, scalability, and sparsity.
One of the most famous examples of collaborative filtering is item-to-item collaborative filtering, an algorithm popularized by Amazon.com's recommender system.
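As a rough sketch of the item-to-item idea, the Python below compares items by the overlap of the users who bought them, using cosine similarity on binary purchase vectors. The purchase data and the helper names (`item_vectors`, `cosine`, `similar_items`) are hypothetical, and this is not a description of Amazon's production algorithm.

```python
from math import sqrt

# Hypothetical purchase histories: user -> set of purchased items.
PURCHASES = {
    "u1": {"book", "lamp"},
    "u2": {"book", "lamp", "desk"},
    "u3": {"book", "pen"},
    "u4": {"desk", "lamp"},
}

def item_vectors(purchases):
    """Invert user -> items into item -> set of users who bought it."""
    vectors = {}
    for user, items in purchases.items():
        for item in items:
            vectors.setdefault(item, set()).add(user)
    return vectors

def cosine(a, b):
    """Cosine similarity between two binary vectors stored as user sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def similar_items(item, purchases, top_n=2):
    """Rank the other items by cosine similarity to `item`."""
    vectors = item_vectors(purchases)
    scores = [
        (cosine(vectors[item], users), other)
        for other, users in vectors.items()
        if other != item
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scores.sort(key=lambda p: (-p[0], p[1]))
    return [(other, round(score, 3)) for score, other in scores[:top_n]]
```

Because similarities are computed between items rather than between users, a table of similar items can be built ahead of time and looked up when a user views or buys something, which is one reason this style of filtering is often associated with scalability.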
Many social networks originally used collaborative filtering to recommend new friends, groups, and other social connections by examining the network of connections between a user and their friends. Collaborative filtering is still used as part of hybrid systems.

Content-based filtering

Another common approach when designing recommender systems is content-based filtering. Content-based filtering methods are based on a description of the item and a profile of the user's preferences. These methods are best suited to situations where there is known data on an item, but not on the user. Content-based recommenders treat recommendation as a user-specific classification problem and learn a classifier for the user's likes and dislikes based on an item's features.
In this system, keywords are used to describe the items and a user profile is built to indicate the type of item this user likes. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past, or is examining in the present. This approach does not rely on a user sign-in mechanism to generate this often temporary profile. In particular, various candidate items are compared with items previously rated by the user and the best-matching items are recommended. This approach has its roots in information retrieval and information filtering research.
To create a user profile, the system mostly focuses on two types of information:
1. A model of the user's preference.
2. A history of the user's interaction with the recommender system.
These methods use an item profile characterizing the item within the system. To abstract the features of the items in the system, an item representation algorithm is applied. A widely used algorithm is the tf–idf representation. The system creates a content-based profile of users based on a weighted vector of item features. The weights denote the importance of each feature to the user and can be computed from individually rated content vectors using a variety of techniques. Simple approaches use the average values of the rated item vectors, while more sophisticated methods use machine learning techniques such as Bayesian classifiers, cluster analysis, decision trees, and artificial neural networks in order to estimate the probability that the user is going to like the item.
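The simple averaging approach can be illustrated with a short Python sketch. The item descriptions, the raw-count term frequency with a natural-log idf, and the helper names below are assumptions chosen for brevity; real systems typically use a library implementation and richer features.

```python
from collections import Counter
from math import log, sqrt

# Hypothetical keyword descriptions for four items.
ITEMS = {
    "film1": "space adventure robot",
    "film2": "space war robot",
    "film3": "romance paris comedy",
    "film4": "comedy robot",
}

def tfidf_vectors(items):
    """Return item -> {term: tf * idf}, with raw term counts and log idf."""
    docs = {name: Counter(text.split()) for name, text in items.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for counts in docs.values() for term in counts)
    return {
        name: {t: tf * log(n_docs / doc_freq[t]) for t, tf in counts.items()}
        for name, counts in docs.items()
    }

def user_profile(liked, vectors):
    """Average the tf-idf vectors of the items the user liked."""
    profile = Counter()
    for name in liked:
        for term, weight in vectors[name].items():
            profile[term] += weight / len(liked)
    return dict(profile)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(liked, items):
    """Rank unseen items by similarity to the user's averaged profile."""
    vectors = tfidf_vectors(items)
    profile = user_profile(liked, vectors)
    candidates = [n for n in items if n not in liked]
    return sorted(candidates, key=lambda n: cosine(profile, vectors[n]), reverse=True)
```

With this toy data, a user who liked only `film1` is matched most strongly to `film2`, which shares the relatively rare term "space", rather than to `film4`, which shares only the common term "robot".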
A key issue with content-based filtering is whether the system is able to learn user preferences from users' actions regarding one content source and use them across other content types. When the system is limited to recommending content of the same type as the user is already using, the value from the recommendation system is significantly less than when other content types from other services can be recommended. For example, recommending news articles based on browsing of news is useful, but would be much more useful when music, videos, products, discussions etc. from different services can be recommended based on news browsing. To overcome this, most content-based recommender systems now use some form of hybrid system.
Content-based recommender systems can also include opinion-based recommender systems. In some cases, users are allowed to leave text reviews or feedback on the items. These user-generated texts are implicit data for the recommender system because they are a potentially rich source of both the features or aspects of an item and users' evaluation of, or sentiment toward, the item. Features extracted from user-generated reviews serve as improved metadata for items: like metadata, they reflect aspects of the item, but they are also the aspects that users widely care about. Sentiments extracted from the reviews can be seen as users' rating scores on the corresponding features. Popular approaches to opinion-based recommender systems utilize various techniques including text mining, information retrieval, sentiment analysis and deep learning.

Multi-criteria recommender systems

Multi-criteria recommender systems (MCRS) can be defined as recommender systems that incorporate preference information on multiple criteria. Instead of developing recommendation techniques based on a single criterion value, the overall preference of user u for the item i, these systems try to predict a rating for unexplored items of u by exploiting preference information on multiple criteria that affect this overall preference value. Several researchers approach MCRS as a multi-criteria decision making (MCDM) problem, and apply MCDM methods and techniques to implement MCRS systems.

Risk-aware recommender systems

The majority of existing approaches to recommender systems focus on recommending the most relevant content to users using contextual information, yet do not take into account the risk of disturbing the user with unwanted notifications. It is important to consider the risk of upsetting the user by pushing recommendations in certain circumstances, for instance, during a professional meeting, early morning, or late at night. Therefore, the performance of the recommender system depends in part on the degree to which it has incorporated the risk into the recommendation process. One option to manage this issue is DRARS, a system which models the context-aware recommendation as a bandit problem. This system combines a content-based technique and a contextual bandit algorithm.
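To make the bandit framing concrete, the following Python sketch shows a generic per-context epsilon-greedy bandit. It is not the DRARS algorithm: the class name, the epsilon-greedy strategy, and the use of a string such as "meeting" as the context are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Per-context epsilon-greedy bandit over a fixed set of items (arms)."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, item) -> times shown
        self.values = defaultdict(float)  # (context, item) -> mean reward

    def select(self, context):
        """Explore with probability epsilon, otherwise exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=lambda item: self.values[(context, item)])

    def update(self, context, item, reward):
        """Incrementally update the mean reward observed for (context, item)."""
        key = (context, item)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

In a risk-aware setting, one plausible design choice would be to lower `epsilon` in contexts where an unwanted recommendation is costly, so that the system explores less when the user is likely to be disturbed.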

Mobile recommender systems

Mobile recommender systems make use of internet-accessing smartphones to offer personalized, context-sensitive recommendations. This is a particularly difficult area of research as mobile data is more complex than the data that recommender systems usually have to deal with. It is heterogeneous and noisy, involves spatial and temporal auto-correlation, and has validation and generality problems.
There are three factors that could affect the mobile recommender systems and the accuracy of prediction results: the context, the recommendation method and privacy. Additionally, mobile recommender systems suffer from a transplantation problem – recommendations may not apply in all regions.
One example of a mobile recommender system is the approach taken by companies such as Uber and Lyft to generate driving routes for taxi drivers in a city. This system uses GPS data of the routes that taxi drivers take while working, which includes location, time stamps, and operational status. It uses this data to recommend a list of pickup points along a route, with the goal of optimizing occupancy times and profits.
Mobile recommendation systems have also been successfully built using the "Web of Data" as a source for structured information. A good example of such a system is SMARTMUSEUM. The system uses semantic modelling, information retrieval, and machine learning techniques in order to recommend content matching user interests, even when presented with sparse or minimal user data.

Hybrid recommender systems

Most recommender systems now use a hybrid approach, combining collaborative filtering, content-based filtering, and other approaches. There is no reason why several different techniques of the same type could not be hybridized. Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach; or by unifying the approaches into one model. Several studies have empirically compared the performance of hybrid methods with that of pure collaborative and content-based methods and demonstrated that hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommender systems such as cold start and the sparsity problem, as well as the knowledge engineering bottleneck in knowledge-based approaches.
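The first of these strategies, making separate predictions and then combining them, might be sketched in Python as a simple linear blend. The function name `weighted_hybrid`, the default weight, and the fallback rule for items scored by only one recommender are assumptions for illustration.

```python
def weighted_hybrid(cf_scores, cb_scores, cf_weight=0.7):
    """Blend two score dictionaries (item -> score in [0, 1]) linearly.

    An item missing from one recommender falls back to the other's score,
    which is one simple way to soften cold-start and sparsity gaps.
    """
    blended = {}
    for item in set(cf_scores) | set(cb_scores):
        cf = cf_scores.get(item)
        cb = cb_scores.get(item)
        if cf is None:
            blended[item] = cb
        elif cb is None:
            blended[item] = cf
        else:
            blended[item] = cf_weight * cf + (1 - cf_weight) * cb
    # Highest blended score first; ties broken alphabetically.
    return sorted(blended, key=lambda item: (-blended[item], item))

# Hypothetical scores from a collaborative and a content-based recommender.
ranking = weighted_hybrid({"a": 0.9, "b": 0.2}, {"b": 0.8, "c": 0.4}, cf_weight=0.5)
```

Here item "c" has no collaborative score at all, for example because it is new, yet it can still be ranked using its content-based score alone.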
Netflix is a good example of the use of hybrid recommender systems. The website makes recommendations by comparing the watching and searching habits of similar users as well as by offering movies that share characteristics with films that a user has rated highly.
Some hybridization techniques include:
One of the events that energized research in recommender systems was the Netflix Prize. From 2006 to 2009, Netflix sponsored a competition, offering a grand prize of US$1,000,000 to the team that could take an offered dataset of over 100 million movie ratings and return recommendations that were 10% more accurate than those offered by the company's existing recommender system. The competition spurred the search for new and more accurate algorithms. On 21 September 2009, the grand prize was awarded to the BellKor's Pragmatic Chaos team using tiebreaking rules.
The most accurate algorithm in 2007 used an ensemble method of 107 different algorithmic approaches, blended into a single prediction. As stated by the winners, Bell et al.:

Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods.

Many benefits accrued to the web due to the Netflix project. Some teams have taken their technology and applied it to other markets. Some members of the team that finished in second place founded Gravity R&D, a recommendation engine that is active in the RecSys community. 4-Tell, Inc. created a Netflix project–derived solution for ecommerce websites.
A number of privacy issues arose around the dataset offered by Netflix for the Netflix Prize competition. Although the data sets were anonymized in order to preserve customer privacy, in 2007 two researchers from the University of Texas were able to identify individual users by matching the data sets with film ratings on the Internet Movie Database. As a result, in December 2009, an anonymous Netflix user sued Netflix in Doe v. Netflix, alleging that Netflix had violated United States fair trade laws and the Video Privacy Protection Act by releasing the datasets. This, as well as concerns from the Federal Trade Commission, led to the cancellation of a second Netflix Prize competition in 2010.

Performance measures

Evaluation is important in assessing the effectiveness of recommendation algorithms. To measure the effectiveness of recommender systems, and compare different approaches, three types of evaluations are available: user studies, online evaluations, and offline evaluations.
Commonly used metrics are the mean squared error and root mean squared error, the latter having been used in the Netflix Prize. Information retrieval metrics such as precision and recall or discounted cumulative gain (DCG) are useful to assess the quality of a recommendation method. Diversity, novelty, and coverage are also considered important aspects of evaluation. However, many of the classic evaluation measures have been heavily criticized.
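For reference, the accuracy and ranking metrics named above can be written compactly in Python. This is a minimal sketch with hypothetical function names; the DCG variant shown divides each relevance score by log2(rank + 1), which is one common formulation among several.

```python
from math import log2, sqrt

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommended items."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k, hits / len(relevant)

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(
        rel / log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )
```

RMSE measures how close predicted ratings are to observed ones, whereas precision, recall, and DCG judge the ranked list itself; DCG additionally rewards placing relevant items near the top.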
User studies are rather small in scale. A few dozen or a few hundred users are presented with recommendations created by different recommendation approaches, and the users then judge which recommendations are best. In A/B tests, recommendations are typically shown to thousands of users of a real product, and the recommender system randomly picks between at least two different recommendation approaches to generate recommendations. Effectiveness is measured with implicit measures such as conversion rate or click-through rate. Offline evaluations are based on historic data, e.g. a dataset that contains information about how users previously rated movies.
The effectiveness of a recommendation approach is then measured by how well it can predict the users' ratings in the dataset. While a rating is an explicit expression of whether a user liked a movie, such information is not available in all domains. For instance, in the domain of citation recommender systems, users typically do not rate a citation or recommended article. In such cases, offline evaluations may use implicit measures of effectiveness. For instance, a recommender system may be assumed to be effective if it recommends as many as possible of the articles contained in a research article's reference list. However, this kind of offline evaluation is viewed critically by many researchers. For instance, it has been shown that results of offline evaluations have low correlation with results from user studies or A/B tests. A dataset popular for offline evaluation has been shown to contain duplicate data and thus to lead to wrong conclusions in the evaluation of algorithms. Often, results of offline evaluations do not correlate with actually assessed user satisfaction. This is probably because offline training is highly biased toward the highly reachable items, and offline testing data is highly influenced by the outputs of the online recommendation module. Researchers have concluded that the results of offline evaluations should be viewed critically.

Beyond accuracy

Typically, research on recommender systems is concerned with finding the most accurate recommendation algorithms. However, a number of other factors are also important.
The field of recommender systems has been impacted by the replication crisis as well. A systematic analysis of publications applying deep learning or neural methods to the top-k recommendation problem, published in top conferences, has shown that on average fewer than 40% of articles are reproducible, with as few as 14% in some conferences. Overall, the study identifies 18 articles, of which only 7 could be reproduced, and 6 of those could be outperformed by much older and simpler, properly tuned baselines. The article also highlights a number of potential problems in today's research scholarship and calls for improved scientific practices in that area. Similar issues have also been spotted in sequence-aware recommender systems.
Previous research was also found to have had little impact on the practical application of recommender systems. In 2011, Ekstrand, Konstan, et al. criticized that "it is currently difficult to reproduce and extend recommender systems research results," and that evaluations are "not handled consistently". Konstan and Adomavicius conclude that "the Recommender Systems research community is facing a crisis where a significant number of papers present results that contribute little to collective knowledge often because the research lacks the evaluation to be properly judged and, hence, to provide meaningful contributions." As a consequence, much research about recommender systems can be considered not reproducible. Hence, operators of recommender systems find little guidance in current research for answering the question of which recommendation approaches to use in a recommender system. Said & Bellogín conducted a study of papers published in the field, benchmarked some of the most popular frameworks for recommendation, and found large inconsistencies in results, even when the same algorithms and data sets were used. Some researchers demonstrated that minor variations in the recommendation algorithms or scenarios led to strong changes in the effectiveness of a recommender system. They conclude that seven actions are necessary to improve the current situation: "survey other research fields and learn from them, find a common understanding of reproducibility, identify and understand the determinants that affect reproducibility, conduct more comprehensive experiments, modernize publication practices, foster the development and use of recommendation frameworks, and establish best-practice guidelines for recommender-systems research."