Preference-based learning to rank.

Machine Learning (Impact Factor: 1.47). 09/2010; 80:189-211. DOI: 10.1007/s10994-010-5176-9
Source: DBLP

ABSTRACT: This paper presents an efficient preference-based ranking algorithm running in two stages. In the first stage, the algorithm
learns a preference function defined over pairs, as in a standard binary classification problem. In the second stage, it makes
use of that preference function to produce an accurate ranking, thereby reducing the learning problem of ranking to binary
classification. This reduction is based on the familiar QuickSort algorithm and guarantees an expected pairwise misranking loss of at
most twice that of the binary classifier derived in the first stage. Furthermore, in the important special case of bipartite
ranking, the factor of two in loss is reduced to one. This improved bound also applies to the regret achieved by our ranking
and that of the binary classifier obtained.
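A minimal sketch of the second stage described above, assuming a learned pairwise preference function prefer(a, b) (a stand-in for the binary classifier from the first stage) that returns True when a should be ranked ahead of b. Randomized QuickSort turns those pairwise answers into a full ranking; the names here are illustrative, not from the paper:

```python
import random

def quicksort_rank(items, prefer):
    """Rank items using randomized QuickSort driven by a pairwise
    preference function prefer(a, b) -> True if a precedes b."""
    if len(items) <= 1:
        return list(items)
    i = random.randrange(len(items))   # random pivot choice: the
    pivot = items[i]                   # randomization the guarantees rely on
    rest = items[:i] + items[i + 1:]
    ahead = [x for x in rest if prefer(x, pivot)]
    behind = [x for x in rest if not prefer(x, pivot)]
    return quicksort_rank(ahead, prefer) + [pivot] + quicksort_rank(behind, prefer)

# Toy preference standing in for a trained classifier: prefer larger numbers.
print(quicksort_rank([3, 1, 4, 1, 5, 9, 2, 6], lambda a, b: a > b))
```

When the preference function is noisy rather than a total order, the output ranking can vary with the pivot choices; the paper's guarantee bounds the expected pairwise misranking loss of that randomized output.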

Our algorithm is randomized, but we prove a lower bound for any deterministic reduction of ranking to binary classification
showing that randomization is necessary to achieve our guarantees. This, and a recent result by Balcan et al., who show a
regret bound of two for a deterministic algorithm in the bipartite case, suggest a trade-off between achieving low regret
and determinism in this context.

Our reduction also admits an improved running time guarantee with respect to that deterministic algorithm. In particular,
the number of calls to the preference function in the reduction is improved from Ω(n²) to O(n log n). In addition, when only the top k ranked elements are required (k ≪ n), as in many applications in information extraction or search engine design, the time complexity of our algorithm can be further reduced to O(k log k + n). Our algorithm is thus practical for realistic applications where the number of points to rank exceeds several thousand.
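The O(k log k + n) top-k claim can be illustrated with a quickselect-then-sort sketch: first narrow the pool to the k most-preferred items using O(n) expected preference calls, then fully sort only those k. The function names and structure here are an illustrative assumption, not the paper's own pseudocode:

```python
import random

def top_k_rank(items, k, prefer):
    """Return the k highest-ranked items, in order, using a pairwise
    preference function. Expected O(k log k + n) preference calls."""
    def select(pool, k):
        # Quickselect-style: keep the k most-preferred items, unordered.
        if k <= 0:
            return []
        if len(pool) <= k:
            return pool
        i = random.randrange(len(pool))
        pivot = pool[i]
        rest = pool[:i] + pool[i + 1:]
        ahead = [x for x in rest if prefer(x, pivot)]
        if len(ahead) >= k:
            return select(ahead, k)
        behind = [x for x in rest if not prefer(x, pivot)]
        return ahead + [pivot] + select(behind, k - len(ahead) - 1)

    def sort(pool):
        # Randomized QuickSort over the surviving k items only.
        if len(pool) <= 1:
            return pool
        i = random.randrange(len(pool))
        pivot = pool[i]
        rest = pool[:i] + pool[i + 1:]
        return (sort([x for x in rest if prefer(x, pivot)]) + [pivot]
                + sort([x for x in rest if not prefer(x, pivot)]))

    return sort(select(list(items), k))

# Toy preference: prefer larger numbers; only the top 3 of 1000 are sorted.
print(top_k_rank(range(1000), 3, lambda a, b: a > b))
```

The selection pass touches all n items but never orders the discarded ones, which is where the savings over a full O(n log n) ranking come from.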
