Walid Krichene’s research while affiliated with Google Inc. and other places

Publications (20)


Private Matrix Factorization with Public Item Features
  • Conference Paper
September 2023 · 8 Reads
Walid Krichene · Li Zhang · Mukund Sundararajan

Multi-Task Differential Privacy Under Distribution Skew

February 2023 · 5 Reads
Walid Krichene · Prateek Jain · [...] · Li Zhang

We study the problem of multi-task learning under user-level differential privacy, in which n users contribute data to m tasks, each involving a subset of users. One important aspect of the problem, which can significantly impact quality, is the distribution skew among tasks. Certain tasks may have much fewer data samples than others, making them more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve the overall utility. We give a systematic analysis of the problem by studying how to optimally allocate a user's privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that when there is task distribution skew, this gives a quantifiable improvement of excess empirical risk. Experimental studies on recommendation problems that exhibit a long tail of small tasks demonstrate that our methods significantly improve utility, achieving the state of the art on two standard benchmarks.
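To make the budget-allocation idea concrete, here is a rough sketch of adaptive per-task reweighting under user-level differential privacy. It illustrates the general mechanism only, not the paper's algorithm; the weighting rule, clip norm, and noise multiplier below are placeholder choices.

```python
import numpy as np

def dp_multitask_update(grads, task_sizes, clip_norm=1.0, noise_mult=1.0,
                        alpha=0.5, seed=0):
    """Illustrative sketch (not the paper's algorithm) of splitting a user's
    privacy budget across tasks via reweighting, then applying user-level
    clipping and Gaussian noise.

    grads: array of shape (n_users, n_tasks, d); each user's per-task
    gradient, zeros where the user has no data for a task.
    task_sizes: number of users per task; smaller tasks get larger weight
    via the hypothetical rule w_t proportional to task_sizes ** (-alpha).
    """
    rng = np.random.default_rng(seed)
    w = task_sizes.astype(float) ** (-alpha)      # up-weight small tasks
    w = w / w.sum()                               # normalize the per-user split
    weighted = grads * w[None, :, None]           # per-task reweighting
    # User-level clipping: bound each user's total (across-task) contribution.
    norms = np.linalg.norm(weighted.reshape(len(grads), -1), axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = weighted * scale[:, None, None]
    # Sum over users and add Gaussian noise calibrated to the clip norm.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_mult * clip_norm, size=clipped.shape[1:])
    return noisy_sum / task_sizes[:, None]        # per-task average update
```

The point mirrored from the abstract is that smaller tasks receive a larger share of each user's clipped contribution, so they are less drowned out by the privacy noise.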


Differentially Private Image Classification from Features

November 2022 · 8 Reads

Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific case of privately learning from features, we observe that the computational burden is low enough to allow for more sophisticated optimization schemes, including second-order methods. To that end, we systematically explore the effect of design parameters such as loss function and optimization algorithm. We find that, while commonly used logistic regression performs better than linear regression in the non-private setting, the situation is reversed in the private setting. We find that linear regression is much more effective than logistic regression from both privacy and computational aspects, especially at stricter epsilon values (ε < 1). On the optimization side, we also explore using Newton's method, and find that second-order information is quite helpful even with privacy, although the benefit significantly diminishes with stricter privacy guarantees. While both methods use second-order information, least squares is effective at lower epsilons while Newton's method is effective at larger epsilon values. To combine the benefits of both, we propose a novel algorithm called DP-FC, which leverages feature covariance instead of the Hessian of the logistic regression loss and performs well across all ε values we tried. With this, we obtain new SOTA results on ImageNet-1k, CIFAR-100 and CIFAR-10 across all values of ε typically considered. Most remarkably, on ImageNet-1K, we obtain top-1 accuracy of 88% under (8, 8·10⁻⁷)-DP and 84.3% under (0.1, 8·10⁻⁷)-DP.
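As background for the feature-covariance idea, the sketch below shows the generic sufficient-statistics recipe for DP linear regression from pre-extracted features: clip each feature vector, noise X^T X and X^T y once, and solve the resulting ridge system. This is a hedged illustration under standard assumptions, not the DP-FC algorithm; the noise scale and clip norm are placeholders rather than calibrated privacy parameters.

```python
import numpy as np

def dp_least_squares(features, labels, clip=1.0, noise_std=1.0,
                     ridge=1e-3, seed=0):
    """Generic sketch of DP linear regression from pre-extracted features
    via perturbed sufficient statistics (not the DP-FC algorithm itself).

    features: (n, d) array; labels: (n, num_classes) one-hot indicators.
    Each feature vector is clipped to norm `clip`, X^T X and X^T y are
    noised once, and the solution is read off the noisy statistics.
    """
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    X = features * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    Y = labels
    d = X.shape[1]
    cov = X.T @ X + rng.normal(scale=noise_std, size=(d, d))
    cov = (cov + cov.T) / 2                    # keep the noisy covariance symmetric
    xty = X.T @ Y + rng.normal(scale=noise_std, size=(d, Y.shape[1]))
    # The ridge term keeps the noisy system well conditioned.
    return np.linalg.solve(cov + ridge * np.eye(d), xty)
```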



On sampled metrics for item recommendation

July 2022 · 4 Reads · 77 Citations · Communications of the ACM

Recommender systems personalize content by recommending items to users. Item recommendation algorithms are evaluated by metrics that compare the positions of truly relevant items among the recommended items. To speed up the computation of metrics, recent work often uses sampled metrics, where only a smaller set of random items and the relevant items are ranked. This paper investigates such sampled metrics in more detail and shows that they are inconsistent with their exact counterpart, in the sense that they do not persist relative statements, for example, recommender A is better than B, not even in expectation. Moreover, the smaller the sample size, the less difference there is between metrics, and for very small sample sizes all metrics collapse to the AUC metric. We show that it is possible to improve the quality of the sampled metrics by applying a correction, obtained by minimizing different criteria. We conclude with an empirical evaluation of the naive sampled metrics and their corrected variants. To summarize, our work suggests that sampling should be avoided for metric calculation; however, if an experimental study needs to sample, the proposed corrections can improve the quality of the estimate.
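For intuition about the inconsistency, here is a small sketch contrasting an exact ranking metric with its sampled variant: with few sampled negatives, the relevant item's rank is computed against a much easier comparison set, which inflates the metric and can reorder recommenders. The function names and the choice of Recall@k are illustrative, not the paper's exact protocol.

```python
import numpy as np

def exact_recall_at_k(scores, relevant, k=10):
    """Recall@k against the full catalog: 1 if the relevant item's rank
    (number of items scored at least as high) is within the top k."""
    rank = np.sum(scores >= scores[relevant])
    return float(rank <= k)

def sampled_recall_at_k(scores, relevant, n_samples=100, k=10, seed=0):
    """Sampled variant: rank the relevant item only against n random
    negatives. For small n this is biased upward and, as the paper shows,
    can reverse the ordering of two recommenders."""
    rng = np.random.default_rng(seed)
    negatives = rng.choice(np.delete(np.arange(len(scores)), relevant),
                           size=n_samples, replace=False)
    rank = 1 + np.sum(scores[negatives] >= scores[relevant])
    return float(rank <= k)
```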


Reciprocity in Machine Learning

February 2022 · 41 Reads

Machine learning is pervasive. It powers recommender systems such as Spotify, Instagram and YouTube, and health-care systems via models that predict sleep patterns, or the risk of disease. Individuals contribute data to these models and benefit from them. Are these contributions (outflows of influence) and benefits (inflows of influence) reciprocal? We propose measures of outflows, inflows and reciprocity building on previously proposed measures of training data influence. Our initial theoretical and empirical results indicate that under certain distributional assumptions, some classes of models are approximately reciprocal. We conclude with several open directions.
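To make the outflow/inflow notions concrete, here is a toy sketch (our own illustration, not the paper's estimator): given a matrix of pairwise training-data influence, a user's outflow is the influence their data has on everyone else's predictions, and their inflow is the influence everyone else's data has on theirs.

```python
import numpy as np

def outflows_inflows(influence):
    """Toy illustration of reciprocity, not the paper's estimator.

    influence[i, j] = influence of user i's training data on the model's
    predictions for user j (any training-data-influence measure could
    supply this matrix). Outflow of i sums row i (excluding i itself);
    inflow of i sums column i.
    """
    off_diag = influence - np.diag(np.diag(influence))
    outflow = off_diag.sum(axis=1)
    inflow = off_diag.sum(axis=0)
    # One crude notion of reciprocity: correlation between the two.
    reciprocity = np.corrcoef(outflow, inflow)[0, 1]
    return outflow, inflow, reciprocity
```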


[Figures from the ALX preprint below. Figure 1: the flow of data and computation through the ALX framework on TPU devices; each TPU core performs identical Sharded Gather, Solve, and Sharded Scatter stages for its batch of data in SPMD fashion. Figure 3: how sparse batches are densified to be XLA compatible. Figure 4: eval metrics with bfloat16 vs. float32 numerics, showing an unrecoverable collapse in metrics with bfloat16. Figure 5: training time per epoch (in seconds) for plausible alternative linear-system solvers on TPU. Figure 6: scaling of per-epoch running time as the number of TPU cores is increased.]

ALX: Large Scale Matrix Factorization on TPUs
  • Preprint
  • File available
December 2021 · 246 Reads

We present ALX, an open-source library for distributed matrix factorization using Alternating Least Squares, written in JAX. Our design allows for efficient use of the TPU architecture and scales well to matrix factorization problems with O(B) (on the order of billions of) rows/columns by scaling the number of available TPU cores. In order to spur future research on large-scale matrix factorization methods and to illustrate the scalability properties of our own implementation, we also built a real-world web link prediction dataset called WebGraph. This dataset can be easily modeled as a matrix factorization problem. We created several variants of this dataset based on locality and sparsity properties of sub-graphs. The largest variant of WebGraph has around 365M nodes, and training a single epoch finishes in about 20 minutes with 256 TPU cores. We include speed and performance numbers of ALX on all variants of WebGraph. Both the framework code and the dataset are open-sourced.
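For readers unfamiliar with the underlying solver, here is a minimal, single-machine, dense sketch of one alternating-least-squares sweep of the kind ALX distributes across TPU cores; the sharding, batching, and densification described in the figures above are omitted.

```python
import numpy as np

def als_sweep(ratings, user_emb, item_emb, reg=0.1):
    """One alternating-least-squares sweep for matrix factorization.

    Minimal dense sketch of the solver that ALX shards across TPU cores;
    `ratings` is a (num_users, num_items) matrix with NaN for unobserved
    entries, and embeddings have shape (num_users, d) and (num_items, d).
    """
    d = user_emb.shape[1]
    observed = ~np.isnan(ratings)
    for u in range(user_emb.shape[0]):      # solve each user's d x d system
        idx = observed[u]
        V = item_emb[idx]
        A = V.T @ V + reg * np.eye(d)
        b = V.T @ ratings[u, idx]
        user_emb[u] = np.linalg.solve(A, b)
    for i in range(item_emb.shape[0]):      # then each item's d x d system
        idx = observed[:, i]
        U = user_emb[idx]
        A = U.T @ U + reg * np.eye(d)
        b = U.T @ ratings[idx, i]
        item_emb[i] = np.linalg.solve(A, b)
    return user_emb, item_emb
```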


Revisiting the Performance of iALS on Item Recommendation Benchmarks

October 2021 · 47 Reads

Matrix factorization learned by implicit alternating least squares (iALS) is a popular baseline in recommender system research publications. iALS is known to be one of the most computationally efficient and scalable collaborative filtering methods. However, recent studies suggest that its prediction quality is not competitive with the current state of the art, in particular autoencoders and other item-based collaborative filtering methods. In this work, we revisit the iALS algorithm and present a bag of tricks that we found useful when applying iALS. We revisit four well-studied benchmarks where iALS was reported to perform poorly and show that with proper tuning, iALS is highly competitive and outperforms any method on at least half of the comparisons. We hope that these high-quality results, together with iALS's known scalability, spark new interest in applying and further improving this decade-old technique.


iALS++: Speeding up Matrix Factorization with Subspace Optimization

October 2021 · 55 Reads

iALS is a popular algorithm for learning matrix factorization models from implicit feedback with alternating least squares. This algorithm was invented over a decade ago but still shows competitive quality compared to recent approaches like VAE, EASE, SLIM, or NCF. Due to a computational trick that avoids negative sampling, iALS is very efficient, especially for large item catalogues. However, iALS does not scale well with large embedding dimensions, d, due to its cubic runtime dependency on d. Coordinate descent variations, iCD, have been proposed to lower the complexity to quadratic in d. In this work, we show that iCD approaches are not well suited for modern processors and can be an order of magnitude slower than a careful iALS implementation for small to mid-scale embedding sizes (d ~ 100), and only perform better than iALS for large embedding sizes (d ~ 1000). We propose a new solver, iALS++, that combines the advantages of iALS in terms of vector processing with a low computational complexity as in iCD. iALS++ is an order of magnitude faster than iCD both for small and large embedding dimensions. It can solve benchmark problems like Movielens 20M or Million Song Dataset even for 1000-dimensional embedding vectors in a few minutes.
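The subspace idea can be sketched as follows: instead of solving one full d x d system per row as iALS does, update a block of b embedding coordinates at a time, so each solve is only b x b. This is a schematic of block coordinate descent over embedding dimensions under simplified (explicit, observed-only) assumptions, not the actual iALS++ implementation.

```python
import numpy as np

def block_user_update(p_u, item_emb, obs_items, obs_ratings,
                      block_size=32, reg=0.1):
    """Schematic block (subspace) update in the spirit of iALS++.

    p_u: (d,) user embedding updated in place; item_emb: (num_items, d);
    obs_items / obs_ratings: indices and values of this user's observations.
    Each block solve costs O(block_size^3) instead of O(d^3).
    """
    Q = item_emb[obs_items]                  # (n_obs, d) observed item rows
    d = p_u.shape[0]
    for start in range(0, d, block_size):
        blk = slice(start, min(start + block_size, d))
        # Residual with the current block's contribution removed.
        resid = obs_ratings - Q @ p_u + Q[:, blk] @ p_u[blk]
        A = Q[:, blk].T @ Q[:, blk] + reg * np.eye(blk.stop - blk.start)
        p_u[blk] = np.linalg.solve(A, Q[:, blk].T @ resid)
    return p_u
```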


On Sampled Metrics for Item Recommendation (Extended Abstract)

August 2021 · 9 Reads · 12 Citations

Recommender systems personalize content by recommending items to users. Item recommendation algorithms are evaluated by metrics that compare the positions of truly relevant items among the recommended items. To speed up the computation of metrics, recent work often uses sampled metrics, where only a smaller set of random items and the relevant items are ranked. This paper investigates such sampled metrics and shows that they are inconsistent with their exact counterpart, in the sense that they do not persist relative statements, e.g., recommender A is better than B, not even in expectation. We show that it is possible to improve the quality of the sampled metrics by applying a correction. We conclude with an empirical evaluation of the naive sampled metrics and their corrected variants. Our work suggests that sampling should be avoided for metric calculation; however, if an experimental study needs to sample, the proposed corrections can improve the estimates.


Citations (6)


... WSL is frequently employed in dual encoders. iALS (Rendle et al. (2022)) utilizes this loss to optimize matrix factorization models, while SAGram (Krichene et al. (2018)) applies it to optimize non-linear encoders. A noteworthy advantage of WSL stems from the applicability of ALS or higher-order gradient descent optimization methods like Newton's method, which involve updating the left and right latent matrices using the closed-form solutions or the second-order Hessian matrix. ...

Reference:

NDCG-Consistent Softmax Approximation with Accelerated Convergence
Revisiting the Performance of iALS on Item Recommendation Benchmarks
  • Citing Conference Paper
  • September 2022

... To ensure fair and unbiased evaluation, we compute these metrics by scoring all items in the catalog. This avoids the bias of sampled metrics, which can favor popular items [3,18]. While computationally intensive, full-catalog evaluation provides more accurate results, as sampled metrics are inconsistent with exact versions and can misrepresent recommender performance. ...

On sampled metrics for item recommendation
  • Citing Article
  • July 2022

Communications of the ACM

... On the other hand, implicit interaction models, such as multi-layer perceptrons (MLPs) [34], excel at capturing complex feature relationships through non-linear transformations. Yet, MLPs lack the inductive biases necessary for efficiently modeling simple operations like inner products [39], which limits their ability to learn explicit patterns. This fundamental trade-off has motivated the development of two-stream models, such as DeepFM [11], DCN [11], and xDeepFM [25], which aim to integrate explicit and implicit interaction modeling into a unified framework. ...

Neural Collaborative Filtering vs. Matrix Factorization Revisited
  • Citing Conference Paper
  • September 2020

... To evaluate the Top-K SDP recommendations for each target region, we adopt an all-ranking strategy (Krichene and Rendle 2020), as opposed to the user subset extraction approach used in previous studies (Wang et al. 2019; Wang et al. 2019). Specifically, we recommend SDPs that have not been previously interacted with for each target region, and select the Top-K SDPs that best align with the region's development needs. ...

On Sampled Metrics for Item Recommendation
  • Citing Conference Paper
  • August 2020

... Our approach builds on a recent line of work in which optimization algorithms are studied via the analysis of their behavior in continuous-time limits Su et al. (2016); Jordan (2018); Shi et al. (2018). Specifically, in the case of SGD, we study stochastic differential equations (SDEs) as surrogates for discrete stochastic optimization methods (see, e.g., Kushner and Yin, 2003; Li et al., 2017; Krichene and Bartlett, 2017; Chaudhari et al., 2018; Diakonikolas and Jordan, 2019). The construction is roughly as follows. ...

Acceleration and Averaging in Stochastic Mirror Descent Dynamics
  • Citing Article
  • July 2017