Joseph Sill’s scientific contributions

What is this page?

This page lists works by an author who doesn't have a ResearchGate profile or hasn't added these works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (4)


Collaborative filtering on ordinal user feedback
  • Conference Paper

August 2013 · 133 Reads · 34 Citations

Yehuda Koren · Joseph Sill

We propose a collaborative filtering (CF) recommendation framework based on viewing user feedback on products as ordinal, rather than taking the more common numerical view. The ordinal view often reflects user intent more naturally when users provide qualitative ratings, allowing different users to have different internal scoring scales. It also covers scenarios where assigning numerical scores to different types of user feedback would not be easy. The framework can wrap most collaborative filtering algorithms, enabling algorithms previously designed for numerical values to handle ordinal values. We demonstrate our framework by wrapping a leading matrix factorization CF method. A cornerstone of our method is its ability to predict a full probability distribution over the expected item ratings, rather than only a single score for an item. One advantage this brings is a novel approach to estimating the confidence level of each individual prediction. Compared with previous approaches to confidence estimation, ours is more principled and empirically more accurate. We demonstrate the efficacy of the approach on two of the largest publicly available datasets: the Netflix data and the Yahoo! Music data.
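The distribution-valued output described above can be sketched with a cumulative-link (ordinal regression) construction: a wrapped matrix-factorization model produces a scalar score, and learned increasing thresholds turn that score into a full probability distribution over the rating levels. Below is a minimal NumPy sketch of that idea; the function name and threshold values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordinal_distribution(score, thresholds):
    """Turn a scalar matrix-factorization score into a probability
    distribution over R ordinal rating levels, using R-1 increasing
    thresholds (a cumulative-link sketch; illustrative, not the
    paper's exact parameterization).

    P(rating <= k) = sigmoid(thresholds[k] - score); per-level
    probabilities are differences of consecutive cumulative terms.
    """
    cdf = np.concatenate(([0.0], sigmoid(thresholds - score), [1.0]))
    return np.diff(cdf)  # P(rating == k) for k = 1..R, sums to 1

# Hypothetical learned thresholds for a 1-5 star scale (must be increasing).
thresholds = np.array([-2.0, -0.5, 0.8, 2.2])
p = ordinal_distribution(score=1.1, thresholds=thresholds)
print(p, p.sum())
```

The spread of the resulting distribution (for example, its variance or entropy) is one natural per-prediction confidence signal of the kind the abstract mentions.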


Feature-Weighted Linear Stacking

[Figure 1: FWLS forms a linear combination of products of model outputs and meta-features]
[Table 2: RMSEs Using Cumulative Meta-Feature Sets]
  • Article
  • Full-text available

November 2009 · 2,407 Reads · 173 Citations

Ensemble methods, such as stacking, are designed to boost predictive accuracy by blending the predictions of multiple machine learning models. Recent work has shown that the use of meta-features, additional inputs describing each example in a dataset, can boost the performance of ensemble methods, but the greatest reported gains have come from nonlinear procedures requiring significant tuning and training time. Here, we present a linear technique, Feature-Weighted Linear Stacking (FWLS), that incorporates meta-features for improved accuracy while retaining the well-known virtues of linear regression regarding speed, stability, and interpretability. FWLS combines model predictions linearly using coefficients that are themselves linear functions of meta-features. This technique was a key facet of the solution of the second-place team in the recently concluded Netflix Prize competition. Significant increases in accuracy over standard linear stacking are demonstrated on the Netflix Prize collaborative filtering dataset.
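The abstract pins down the FWLS model exactly: the blend is b(x) = Σ_{i,j} v_ij · f_j(x) · g_i(x), where the g_i are base-model predictions, the f_j are meta-features, and the coefficients v_ij are linear-regression weights on the pairwise products. A minimal NumPy sketch of that formulation (the toy data and names are hypothetical, not the authors' code):

```python
import numpy as np

def fwls_fit(G, F, y):
    """Feature-Weighted Linear Stacking: b(x) = sum_ij v_ij * f_j(x) * g_i(x).
    G: (n, m) base-model predictions; F: (n, k) meta-features; y: (n,) targets.
    Fitting the v_ij is ordinary least squares on all products g_i * f_j.
    """
    X = np.einsum('ni,nj->nij', G, F).reshape(len(y), -1)
    v, *_ = np.linalg.lstsq(X, y, rcond=None)
    return v

def fwls_predict(G, F, v):
    X = np.einsum('ni,nj->nij', G, F).reshape(G.shape[0], -1)
    return X @ v

# Hypothetical toy setup: 3 base models, 2 meta-features. The constant
# meta-feature means plain linear stacking is nested as a special case.
rng = np.random.default_rng(0)
G = rng.normal(size=(500, 3))                      # base-model predictions
F = np.column_stack([np.ones(500),                 # constant meta-feature
                     rng.normal(size=500)])        # e.g. a per-user statistic
y = 0.5 * G[:, 0] + F[:, 1] * G[:, 1] + 0.1 * rng.normal(size=500)
v = fwls_fit(G, F, y)
print(np.round(v, 2))  # weights on each (model, meta-feature) product
```

Because the model is linear in v, it keeps the speed, stability, and interpretability the abstract attributes to linear regression, and including a constant meta-feature recovers standard linear stacking as a special case.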


A linear fit gets the correct monotonicity directions

January 2008 · 22 Reads · 6 Citations

Machine Learning

Let f be a function on ℝᵈ that is monotonic in every variable. There are 2ᵈ possible assignments to the directions of monotonicity (two per variable). We provide sufficient conditions under which the optimal linear model obtained from a least squares regression on f will identify the monotonicity directions correctly. We show that when the input dimensions are independent, the linear fit correctly identifies the monotonicity directions. We provide an example to illustrate that in the general case, when the input dimensions are not independent, the linear fit may not identify the directions correctly. However, when the inputs are jointly Gaussian, as is often assumed in practice, the linear fit will correctly identify the monotonicity directions, even if the input dimensions are dependent. Gaussian densities are a special case of a more general class of densities (Mahalanobis densities) for which the result holds. Our results hold when f is a classification or regression function. If a finite data set is sampled from the function, we show that if the exact linear regression would have yielded the correct monotonicity directions, then the sample regression will also do so asymptotically (in a probabilistic sense). This result holds even if the data are noisy.
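The independent-inputs case is easy to check numerically. The sketch below (assuming NumPy; the particular monotonic f is an arbitrary illustration, not taken from the paper) fits a least-squares linear model to noisy samples and confirms that the coefficient signs recover the monotonicity directions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 3))  # independent (hence also jointly Gaussian) inputs

# An arbitrary f that is increasing in x1, decreasing in x2, increasing in x3,
# sampled with additive noise.
f = np.tanh(X[:, 0]) - X[:, 1] ** 3 + np.exp(X[:, 2]) + 0.1 * rng.normal(size=n)

# Least-squares linear fit with an intercept column.
A = np.column_stack([np.ones(n), X])
w, *_ = np.linalg.lstsq(A, f, rcond=None)

print(np.sign(w[1:]))  # expect [ 1. -1.  1.]: the monotonicity directions
```

Per the abstract, this recovery can fail when inputs are dependent and non-Gaussian, so the check above exercises only the independent case.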


Using A Linear Fit To Determine Monotonicity Directions

April 2003 · 28 Reads

Lecture Notes in Computer Science

Let f be a function on ℝᵈ that is monotonic in every variable. There are 2ᵈ possible assignments to the directions of monotonicity (two per variable). We provide sufficient conditions under which the optimal linear model obtained from a least squares regression on f will identify the monotonicity directions correctly. We show that when the input dimensions are independent, the linear fit correctly identifies the monotonicity directions. We provide an example to illustrate that in the general case, when the input dimensions are not independent, the linear fit may not identify the directions correctly. However, when the inputs are jointly Gaussian, as is often assumed in practice, the linear fit will correctly identify the monotonicity directions, even if the input dimensions are dependent. Gaussian densities are a special case of a more general class of densities (Mahalanobis densities) for which the result holds. Our results hold when f is a classification or regression function. If a finite data set is sampled from the function, we show that if the exact linear regression would have yielded the correct monotonicity directions, then the sample regression will also do so asymptotically (in a probabilistic sense). This result holds even if the data are noisy.

Citations (3)


... We quantitatively evaluate our model in the context of two large datasets containing both numerical and text reviews: the Amazon Review dataset [17] and the Yelp dataset [25]. To avoid the problems frequently highlighted with RMSE-based evaluation [12], we follow the approach of Koren and Sill [31]. The evaluation highlights that our proposed KNN model beats strong baselines for both memory-based and model-based systems. As a result, our model offers both the explainability benefits inherited from memory-based methods, enhanced by the ability to surface textual-review snippets, and competitive performance. ...

Reference: KNNs of Semantic Encodings for Rating Prediction
Collaborative filtering on ordinal user feedback
  • Citing Conference Paper
  • August 2013

... Monotonicity has been investigated in the context of classification (Cano et al., 2019), including neural networks (Sill, 1997; Magdon-Ismail & Sill, 2008; Bonakdarpour et al., 2018; Sivaraman et al., 2020; Liu et al., 2020), random forests (Bartley et al., 2019) and rule ensembles (Bartley et al., 2018), decision trees (Ben-David et al., 1989; Ben-David, 1995), decision lists (Potharst & Bioch, 2000) and decision rules (Verbeke et al., 2017), support vector machines (Bartley et al., 2016), and nearest-neighbor classifiers (Duivesteijn & Feelders, 2008), among others (Fard et al., 2016; Gupta et al., 2016; You et al., 2017; Bonakdarpour et al., 2018). ...

A linear fit gets the correct monotonicity directions
  • Citing Article
  • January 2008

Machine Learning

... The meta-model, often referred to as a 'stacking' model, is a two-layer model: a first layer of several base learners is trained, and a second layer integrates the base learners' predictions using another model that consolidates and fine-tunes them [50]. By combining multiple models, the meta-model aims to combine the base learners' predictions optimally, gaining predictive power from the ensemble while mitigating the individual models' weaknesses [51]. This technique is particularly pertinent when models exhibit wide variability in performance across different datasets and validation methods, as indicated in the summary of related works (see Table 1). ...

Feature-Weighted Linear Stacking
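The two-layer structure described in the excerpt above is straightforward to reproduce. Here is a minimal sketch using scikit-learn's StackingRegressor, an assumed library choice rather than anything prescribed by the excerpt or by FWLS (which additionally weights the blend by meta-features):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the excerpt's actual datasets are not used here.
X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[('ridge', Ridge()),                                  # base learner 1
                ('forest', RandomForestRegressor(random_state=0))],  # base learner 2
    final_estimator=Ridge(),  # second-layer meta-model that blends the predictions
)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))  # R^2 of the blended predictor on held-out data
```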