Yehuda Koren’s research while affiliated with Google Inc. and other places


Publications (88)


Revisiting the Performance of iALS on Item Recommendation Benchmarks
  • Conference Paper

September 2022 · 40 Reads · 56 Citations

Steffen Rendle · Walid Krichene · Li Zhang · Yehuda Koren


Revisiting the Performance of iALS on Item Recommendation Benchmarks

October 2021 · 47 Reads

Matrix factorization learned by implicit alternating least squares (iALS) is a popular baseline in recommender system research publications. iALS is known to be one of the most computationally efficient and scalable collaborative filtering methods. However, recent studies suggest that its prediction quality is not competitive with the current state of the art, in particular, autoencoders and other item-based collaborative filtering methods. In this work, we revisit the iALS algorithm and present a bag of tricks that we found useful when applying iALS. We revisit four well-studied benchmarks where iALS was reported to perform poorly and show that with proper tuning, iALS is highly competitive and outperforms any method on at least half of the comparisons. We hope that these high-quality results, together with iALS's known scalability, spark new interest in applying and further improving this decade-old technique.
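To make the iALS update concrete, here is a minimal dense NumPy sketch of the classic confidence-weighted formulation, which we assume here: observed entries get confidence 1 + alpha * r and preference 1, and unobserved entries get confidence 1 and preference 0. The names `solve_side` and `ials`, the default hyperparameters, and the toy matrix are illustrative assumptions; the tuning tricks the paper discusses are not reproduced.

```python
import numpy as np


def solve_side(R, fixed, alpha, reg):
    """Recompute all row factors of R while the other side's factors are held fixed.

    R     : (n_rows, n_cols) nonnegative implicit feedback (dense here for clarity)
    fixed : (n_cols, d) factors of the opposite side
    alpha : confidence scale; observed entries get confidence 1 + alpha * r
    reg   : L2 regularization weight
    """
    d = fixed.shape[1]
    gram = fixed.T @ fixed  # Y^T Y, shared by every row in this half-step
    out = np.empty((R.shape[0], d))
    for row in range(R.shape[0]):
        observed = np.nonzero(R[row])[0]
        Y_obs = fixed[observed]
        conf = 1.0 + alpha * R[row, observed]
        # Y^T C Y = Y^T Y + Y^T (C - I) Y: unobserved cells (confidence 1) are
        # already covered by the Gramian, so no negative sampling is needed.
        A = gram + (Y_obs.T * (conf - 1.0)) @ Y_obs + reg * np.eye(d)
        # Y^T C p, with preference p = 1 on observed cells and 0 elsewhere.
        b = Y_obs.T @ conf
        out[row] = np.linalg.solve(A, b)
    return out


def ials(R, d=4, alpha=10.0, reg=0.1, iters=10, seed=0):
    """Alternate exact least-squares solves for user and item factors."""
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(R.shape[1], d))
    for _ in range(iters):
        U = solve_side(R, V, alpha, reg)
        V = solve_side(R.T, U, alpha, reg)
    return U, V


# Hypothetical toy data: two user groups, each consuming its own item group.
R = np.zeros((6, 6))
R[:3, :3] = 1.0
R[3:, 3:] = 1.0
U, V = ials(R)
scores = U @ V.T
```

The Gramian identity is what keeps the per-row cost proportional to the number of observed items plus one d x d solve, rather than to the full catalogue size.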


iALS++: Speeding up Matrix Factorization with Subspace Optimization

October 2021 · 55 Reads

iALS is a popular algorithm for learning matrix factorization models from implicit feedback with alternating least squares. This algorithm was invented over a decade ago but still shows competitive quality compared to recent approaches like VAE, EASE, SLIM, or NCF. Due to a computational trick that avoids negative sampling, iALS is very efficient, especially for large item catalogues. However, iALS does not scale well with large embedding dimensions, d, due to its cubic runtime dependency on d. Coordinate descent variations, iCD, have been proposed to lower the complexity to quadratic in d. In this work, we show that iCD approaches are not well suited for modern processors: they can be an order of magnitude slower than a careful iALS implementation for small- to mid-scale embedding sizes (d ~ 100) and only outperform iALS for large embeddings (d ~ 1000). We propose a new solver, iALS++, that combines the advantages of iALS in terms of vector processing with a low computational complexity as in iCD. iALS++ is an order of magnitude faster than iCD for both small and large embedding dimensions. It can solve benchmark problems like MovieLens 20M or the Million Song Dataset in a few minutes, even for 1000-dimensional embedding vectors.
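One way to picture subspace optimization: instead of solving one d x d system per user, sweep over blocks of `block` coordinates and exactly minimize the same iALS objective within each block, holding the other coordinates fixed. The sketch below is our own illustration under that assumption (the function name, argument layout, and update order are hypothetical), not the paper's implementation, and it makes no attempt at the vectorization across users that the abstract credits for iALS++'s speed.

```python
import numpy as np


def subspace_update(x, Y_obs, conf, gram, reg, block):
    """Minimize one user's iALS objective by sweeping over coordinate blocks.

    x      : (d,) current user factor (a copy is updated and returned)
    Y_obs  : (n_obs, d) factors of the user's observed items
    conf   : (n_obs,) confidences of those items (1 + alpha * r)
    gram   : (d, d) Gramian Y^T Y over all items
    reg    : L2 regularization weight
    block  : number of coordinates solved jointly per step
    """
    x = np.array(x, dtype=float)
    d = x.shape[0]
    pred = Y_obs @ x  # scores of observed items, kept up to date incrementally
    for start in range(0, d, block):
        S = slice(start, min(start + block, d))
        Ys = Y_obs[:, S]
        k = Ys.shape[1]
        # Curvature of the objective restricted to the subspace S (k x k).
        H = gram[S, S] + (Ys.T * (conf - 1.0)) @ Ys + reg * np.eye(k)
        # Half-gradient restricted to S:
        # (Y^T Y x)_S + Ys^T ((c - 1) * pred) + reg * x_S - Ys^T c
        g = gram[S, :] @ x + Ys.T @ ((conf - 1.0) * pred) + reg * x[S] - Ys.T @ conf
        delta = np.linalg.solve(H, -g)  # exact minimizer within the subspace
        x[S] += delta
        pred += Ys @ delta
    return x
```

With `block` equal to d, a single sweep reproduces the full iALS solve; smaller blocks replace that one cubic solve with several k x k solves, at the price of repeating the sweep.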


On the Difficulty of Evaluating Baselines: A Study on Recommender Systems

May 2019 · 304 Reads

Numerical evaluations with comparisons to baselines play a central role when judging research in recommender systems. In this paper, we show that running baselines properly is difficult. We demonstrate this issue on two extensively studied datasets. First, we show that results for baselines that have been used in numerous publications over the past five years for the MovieLens 10M benchmark are suboptimal. With a careful setup of a vanilla matrix factorization baseline, we are not only able to improve upon the reported results for this baseline but even outperform the reported results of any newly proposed method. Second, we recap the tremendous effort that was required by the community to obtain high-quality results for simple methods on the Netflix Prize. Our results indicate that empirical findings in research papers are questionable unless they were obtained on standardized benchmarks where baselines have been tuned extensively by the research community.


Advances in Collaborative Filtering

January 2015 · 861 Reads · 944 Citations

The collaborative filtering (CF) approach to recommenders has recently enjoyed much interest and progress. The fact that it played a central role within the recently completed Netflix competition has contributed to its popularity. This chapter surveys the recent progress in the field. Matrix factorization techniques, which became a first choice for implementing CF, are described together with recent innovations. We also describe several extensions that bring competitive accuracy into neighborhood methods, which used to dominate the field. The chapter demonstrates how to utilize temporal models and implicit feedback to improve model accuracy. In passing, we include detailed descriptions of some of the central methods developed for tackling the challenge of the Netflix Prize competition.
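The matrix factorization family surveyed here is commonly written as the biased predictor r_hat(u, i) = mu + b_u + b_i + q_i . p_u, fitted by stochastic gradient descent with L2 regularization. Below is a minimal pure-Python sketch of that predictor; the hyperparameters and toy ratings are illustrative assumptions, and the temporal and implicit-feedback extensions the chapter describes are left out.

```python
import random


def train_biased_mf(ratings, n_users, n_items, d=10, lr=0.05, reg=0.02, epochs=200, seed=0):
    """SGD for r_hat(u, i) = mu + b_u + b_i + q_i . p_u on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_user = [0.0] * n_users
    b_item = [0.0] * n_items
    P = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict((mu, b_user, b_item, P, Q), u, i)
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(d):
                p_uf, q_if = P[u][f], Q[i][f]  # use pre-update values for both
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return mu, b_user, b_item, P, Q


def predict(model, u, i):
    mu, b_user, b_item, P, Q = model
    return mu + b_user[u] + b_item[i] + sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(model, ratings):
    sq = [(r - predict(model, u, i)) ** 2 for u, i, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5


# Hypothetical toy ratings: user 0 rates high, user 1 rates low.
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 5), (1, 0, 2), (1, 1, 1), (1, 2, 2), (2, 0, 4), (2, 2, 3)]
model = train_biased_mf(ratings, n_users=3, n_items=3)
```

This reports training error only for brevity; a real comparison would score a held-out split and tune `lr`, `reg`, and `d` on separate validation data.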


Towards scalable and accurate item-oriented recommendations

October 2013 · 88 Reads · 36 Citations

Most recommender systems research aims at personalized systems, which suggest items based on user profiles. However, in reality, many systems deal with item-oriented recommendations. In such setups, given a single item of interest, the system needs to provide other related items, following patterns like "people who liked this also liked...". While item-oriented systems are central in their importance, they have so far been approached using very basic tools. We identify several hurdles faced by standard approaches to the item-oriented task. First, the sparseness of observed activities prevents establishing reliable similarity relations for many item pairs. Second, we address a scalability challenge at the retrieval stage present in many real-world systems: given an item inventory, which may encompass millions of items, it is desirable to identify the most related item pairs in sub-quadratic time. This work addresses these two challenges, thereby improving both the accuracy and scalability of item-oriented recommenders. Additionally, we propose an empirical evaluation scheme for comparing the quality of different solutions, with encouraging results.
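For reference, the kind of basic "people who liked this also liked..." tool the abstract contrasts against can be sketched as cosine similarity between items' user sets. This is a generic baseline, not the method proposed in the paper, and the function name and data are hypothetical. It also shows both hurdles: items with few users get unreliable scores, and running it for every item approaches quadratic cost once co-occurrences are dense.

```python
from collections import defaultdict
from math import sqrt


def related_items(interactions, item, k=3):
    """Rank items co-consumed with `item` by cosine similarity of their user sets.

    interactions : iterable of (user, item) pairs of implicit feedback
    """
    users_of = defaultdict(set)  # item -> users who interacted with it
    items_of = defaultdict(set)  # user -> items they interacted with
    for user, it in interactions:
        users_of[it].add(user)
        items_of[user].add(it)
    # Only items sharing at least one user with `item` can have nonzero cosine,
    # so walk the inverted index instead of scoring every item in the catalogue.
    co_counts = defaultdict(int)
    for user in users_of[item]:
        for other in items_of[user]:
            if other != item:
                co_counts[other] += 1
    scores = {
        other: c / sqrt(len(users_of[item]) * len(users_of[other]))
        for other, c in co_counts.items()
    }
    # Highest similarity first; ties broken by item id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


interactions = [
    ("alice", "a"), ("alice", "b"),
    ("bob", "a"), ("bob", "b"), ("bob", "c"),
    ("carol", "c"), ("carol", "d"),
]
neighbors = related_items(interactions, "a")
```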


Bootstrapping recommender system and method
  • Patent

August 2013 · 23 Reads

Bootstrapping a recommender system that makes item recommendations. The bootstrapping identifies questions for use in interviewing a user (e.g., a new user of the recommender system) to obtain user information (e.g., user profile information). This information is used to predict item ratings for the user, and the predicted ratings are in turn used to identify item recommendations for the user. The bootstrapping uses a cost function to minimize error in the selection of the interview questions, and comprises a static bootstrapping and an adaptive bootstrapping.


Collaborative filtering on ordinal user feedback

August 2013 · 133 Reads · 34 Citations

We propose a collaborative filtering (CF) recommendation framework which is based on viewing user feedback on products as ordinal, rather than the more common numerical view. Such an ordinal view frequently provides a more natural reflection of the user intention when providing qualitative ratings, allowing users to have different internal scoring scales. Moreover, we can address scenarios where assigning numerical scores to different types of user feedback would not be easy. The framework can wrap most collaborative filtering algorithms, enabling algorithms previously designed for numerical values to handle ordinal values. We demonstrate our framework by wrapping a leading matrix factorization CF method. A cornerstone of our method is its ability to predict a full probability distribution of the expected item ratings, rather than only a single score for an item. One of the advantages this brings is a novel approach to estimating the confidence level in each individual prediction. Compared to previous approaches to confidence estimation, ours is more principled and empirically superior in its accuracy. We demonstrate the efficacy of the approach on two of the largest publicly available datasets: the Netflix data and the Yahoo! Music data.
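One common way to turn a single latent score into a full probability distribution over ordered rating levels is a cumulative-logit (ordinal logistic) model with increasing thresholds. The sketch below assumes that form, a single shared threshold vector (per-user vectors would capture different internal scoring scales), and variance as a simple confidence proxy; the paper's exact parameterization and confidence estimator are not reproduced, and all names and numbers are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def rating_distribution(score, thresholds):
    """P(rating = k) for k = 1..len(thresholds)+1 under a cumulative-logit model.

    score      : latent preference from any underlying CF model (e.g., a factorization)
    thresholds : increasing cut points t_1 < t_2 < ...; P(rating <= k) = sigmoid(t_k - score)
    """
    cdf = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cdf:
        probs.append(c - prev)  # P(rating = k) = P(rating <= k) - P(rating <= k - 1)
        prev = c
    return probs


def summarize(probs):
    """Expected rating and its variance; the variance serves as an assumed confidence proxy."""
    mean = sum(k * p for k, p in enumerate(probs, start=1))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs, start=1))
    return mean, var


thresholds = [-2.0, -1.0, 0.0, 1.0]  # hypothetical cut points for a 1-5 star scale
liked = rating_distribution(3.0, thresholds)
disliked = rating_distribution(-3.0, thresholds)
```

Because any real-valued score can be wrapped this way, the same recipe applies on top of models originally designed for numerical ratings.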


Mining global email folders for identifying auto-folder tags

June 2013 · 24 Reads

Vishwanath Tumkur Ramarao · [...]

Embodiments are directed towards identifying auto-folder tags for messages by using a combinatorial optimization approach that bi-clusters folder names and message features based on relationship strengths. Generally, this bi-clustering approach groups a plurality of folder names and a plurality of features into one or more metafolders so as to optimize a cost. The cost is based on an aggregate of cut relationship strengths, where a cut occurs when a folder name and a feature that share a relationship are grouped into separate metafolders. Furthermore, the plurality of folder names and the plurality of features are obtained by monitoring actions of a plurality of users, where the folder names are user-generated and the features are taken from a plurality of messages. The metafolders may be used to tag new user messages with an auto-folder tag.
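The cost described above can be evaluated directly for any candidate grouping: sum the strengths of all folder-feature relationships whose two ends land in different metafolders. The sketch below only scores assignments; the folder names, features, and strengths are hypothetical. Since placing everything in one metafolder trivially yields zero cost, a practical optimizer would need additional constraints (for instance, on the number of metafolders), which the abstract does not specify.

```python
def cut_cost(strengths, folder_group, feature_group):
    """Sum the strengths of folder-feature relationships split across metafolders.

    strengths     : dict mapping (folder_name, feature) -> relationship strength
    folder_group  : dict mapping each folder name to a metafolder id
    feature_group : dict mapping each feature to a metafolder id
    """
    return sum(
        w
        for (folder, feature), w in strengths.items()
        if folder_group[folder] != feature_group[feature]  # relationship is cut
    )


strengths = {
    ("Travel", "flight"): 5.0,
    ("Travel", "hotel"): 4.0,
    ("Bills", "invoice"): 6.0,
    ("Bills", "hotel"): 1.0,
}
folders = {"Travel": 0, "Bills": 1}
good = {"flight": 0, "hotel": 0, "invoice": 1}  # only the weak Bills-hotel link is cut
bad = {"flight": 0, "hotel": 1, "invoice": 1}   # the strong Travel-hotel link is cut
```

Keeping folder names and features in separate dictionaries avoids collisions when a folder name and a feature happen to share the same string.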


Citations (79)


... Collaborative filtering is the most common and widely used method for generating recommendations in music streaming services [22]. This algorithm relies on a set of songs that users preferred in the past to predict which song they would like to listen to. ...

Reference:

Content-Based Filtering Technique using Clustering Method for Music Recommender Systems
Advances in Collaborative Filtering
  • Citing Chapter
  • November 2021

... In this paper, we focus on whole-data models with weighted square loss. Weighted Matrix Factorization (WMF), also called iALS [14,20], pioneered this class of models and is still known to achieve competitive results while having highly scalable learning and prediction routines [22]. After its introduction, many extensions were proposed, among which three variants for context-aware recommender systems (CARS) [5,10,11], where each variant uses a different tensor decomposition method. ...

Revisiting the Performance of iALS on Item Recommendation Benchmarks
  • Citing Conference Paper
  • September 2022

... The analysis of web-based data has seen extensive exploration through graph-based models. Traditional approaches, such as collaborative filtering and matrix factorization [8], focus on pairwise relationships but fail to capture higher-order interactions. Hypergraphs, which extend graphs by allowing edges to connect multiple nodes, offer a richer representation of complex data. ...

Reference:

Neural Networks
Matrix factorization techniques for recommender systems
  • Citing Article
  • August 2009

Computer

... The authors [13] proposed an approach that might fit the identification of library migrations by analyzing large datasets related to software development, specifically, code change histories. This approach searches for migration process patterns and then filters them based on their frequency or associated code changes. ...

Factor in the neighbors
  • Citing Article
  • January 2010

ACM Transactions on Knowledge Discovery from Data

... In [4], semantic change between consecutive queries and the relationship between the changed query and the clicked document is used to infer query context. In addition, query clustering [3], geographical location [15], and association rules [1] are some of the methods used by researchers for better information retrieval. However, we argued that these context extraction methods are confined by the capacity of their employed representation, which is hardly generalizable and not optimal for retrieval tasks. ...

Expediting search trend detection via prediction of query counts
  • Citing Conference Paper
  • February 2013

... We quantitatively evaluate our model in the context of two large datasets containing both numerical and text reviews; the Amazon Review dataset [17] and the Yelp dataset [25]. To avoid the problems frequently highlighted with RMSE-based evaluation [12], we follow the approach of Koren and Sill [31]. The evaluation highlights that our proposed KNN model beats strong baselines for both memory-based and model-based systems. The result is that our model provides both explainability benefits, inherited from memory-based methods, enhanced by now enabling textual-review snippets to be used, as well as competitive performance. ...

Collaborative filtering on ordinal user feedback
  • Citing Conference Paper
  • August 2013

... These methods emphasize the importance of learning item-to-item semantics rather than user-to-item predictions. For example, [14] proposed learning item representations from implicit feedback in a Euclidean space. The I2V model [15] is a popular method for learning static item representations based on CF item cooccurrences [15]. ...

Towards scalable and accurate item-oriented recommendations
  • Citing Conference Paper
  • October 2013

... It has been shown that in collaborative filtering problems, much of the signal lies in simple popularity biases [71]. For example, the winning model in the Netflix Prize competition [10] managed to explain 42.6% of the ratings' variance, i.e., R² = 42.6%, but the vast majority of the learned signal was attributed to popularity biases, which explained a whopping R² = 32.5% of the variance (without any personalization) [72]. ...

Web-Scale Media Recommendation Systems
  • Citing Article
  • September 2012

Proceedings of the IEEE
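The R² figures quoted in the excerpt above follow the usual definition R² = 1 - SS_res / SS_tot, and dividing the two quoted values shows that biases account for roughly 76% of the winning model's explained variance (32.5 / 42.6 ≈ 0.763). The helper below is a generic sketch of that definition with hypothetical toy inputs; it does not reproduce the cited analysis.

```python
def r_squared(y, y_hat):
    """Fraction of the variance of y explained by predictions y_hat."""
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)  # total variance around the mean
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))  # unexplained residual
    return 1.0 - ss_res / ss_tot


# Share of the explained variance attributed to biases alone,
# using the two R^2 values quoted in the excerpt.
bias_share = 32.5 / 42.6
```

Predicting the global mean for every rating gives R² = 0, which is why even a non-personalized bias model can claim a large slice of the total.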

... They streamline access to relevant information by identifying resources aligned with user interests based on historical experiences, aiming to save users time and costs. Originating in e-commerce to combat information overload in the Web 2.0 era, recommender systems quickly expanded into e-learning [2], tourism [27], smart cities [5], music [3], research resources, and television programs. In modern times, platforms like Amazon.com, ...

Recommender Systems Handbook
  • Citing Book
  • October 2010