ABSTRACT: We propose a new probabilistic graphical model that jointly models the difficulties of questions, the abilities of participants and the correct answers to questions in aptitude testing and crowdsourcing settings. We devise an active learning/adaptive testing scheme based on a greedy minimization of expected model entropy, which allows more efficient resource allocation by dynamically choosing the next question to be asked based on the previous responses. We present experimental results that confirm the ability of our model to infer the required parameters and demonstrate that the adaptive testing scheme requires fewer questions to obtain the same accuracy as a static test scenario.
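The greedy entropy-minimization step can be illustrated in isolation. The Python sketch below is a simplification, not the joint model described above: it tracks only one participant's ability on a hypothetical discrete grid, assumes a logistic response model P(correct) = sigmoid(ability - difficulty), and treats question difficulties as known. It selects the unasked question whose answer is expected to leave the least posterior entropy.

```python
import math

# Hypothetical sketch: one participant, a discrete ability grid, and an assumed
# logistic response model P(correct) = sigmoid(ability - difficulty).


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def entropy(p: list[float]) -> float:
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def posterior(prior, abilities, difficulty, correct):
    """Bayes update of the ability distribution after one observed response."""
    like = [sigmoid(a - difficulty) if correct else 1.0 - sigmoid(a - difficulty)
            for a in abilities]
    unnorm = [p * lk for p, lk in zip(prior, like)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def expected_entropy(prior, abilities, difficulty):
    """Posterior entropy averaged over the two possible responses."""
    p_correct = sum(p * sigmoid(a - difficulty) for p, a in zip(prior, abilities))
    h_right = entropy(posterior(prior, abilities, difficulty, True))
    h_wrong = entropy(posterior(prior, abilities, difficulty, False))
    return p_correct * h_right + (1.0 - p_correct) * h_wrong


def next_question(prior, abilities, difficulties, asked):
    """Greedy step: the unasked question minimising expected posterior entropy."""
    candidates = [q for q in range(len(difficulties)) if q not in asked]
    return min(candidates,
               key=lambda q: expected_entropy(prior, abilities, difficulties[q]))
```

With a uniform prior over abilities -2..2 and questions of difficulty -3, 0 and 3, the sketch picks the question of difficulty 0, since its outcome is the least predictable and therefore the most informative.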
ABSTRACT: Recent advances in click models have positioned them as an attractive method for representing user preferences in web search and online advertising. Yet most existing work focuses on training the click model for individual queries and cannot accurately model tail queries due to the lack of training data. Moreover, most existing work considers only the query, URL and position, neglecting other important attributes in click-log data, such as the local time; the click-through rate, for instance, differs between daytime and midnight. In this paper, we propose a novel click model based on a Bayesian network, which is capable of modeling tail queries because it builds the click model on attribute values, and those values are shared across queries. We call our model the General Click Model (GCM) because most existing click models turn out to be special cases of GCM under particular parameter settings. Experimental results on a large-scale commercial advertisement dataset show that GCM significantly and consistently outperforms state-of-the-art models.
Proceedings of the Third International Conference on Web Search and Web Data Mining, WSDM 2010, New York, NY, USA, February 4-6, 2010; 01/2010
ABSTRACT: Given a terabyte click log, can we build an efficient and effective click model? It is commonly believed that web search click logs are a gold mine for the search business, because they reflect users' preferences over the web documents presented by the search engine. Click models provide a principled approach to inferring the user-perceived relevance of web documents, which can be leveraged in numerous applications in search businesses. Due to the huge volume of click data, scalability is a must. We present the click chain model (CCM), which is based on a solid Bayesian framework. It is both scalable and incremental, perfectly meeting the computational challenges imposed by voluminous, constantly growing click logs. We conduct an extensive experimental study on a data set containing 8.8 million query sessions obtained in July 2008 from a commercial search engine. CCM consistently outperforms two state-of-the-art competitors in a number of metrics, with over 9.7% better log-likelihood, over 6.2% better click perplexity and much more robust (up to 30%) prediction of the first and the last clicked position.
Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009; 01/2009
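Click perplexity, one of the metrics quoted above, is commonly defined per result position i as 2 ** (-(1/N) * sum over sessions n of [C log2(q) + (1 - C) log2(1 - q)]), where C is the observed click (0 or 1) and q the predicted click probability; lower is better and 1 is perfect. The Python sketch below assumes that definition and averages over positions. It only evaluates a model's predictions and does not implement CCM itself; the function name is hypothetical.

```python
import math


def click_perplexity(clicks, probs):
    """Per-position click perplexity and its average over positions.

    clicks[n][i] is 1 if the result at position i was clicked in session n, else 0;
    probs[n][i] is the model's predicted click probability, which must lie in (0, 1).
    Perplexity at position i is 2 ** (-(1/N) * sum_n [c*log2(q) + (1-c)*log2(1-q)]).
    """
    n_sessions = len(clicks)
    n_positions = len(clicks[0])
    per_position = []
    for i in range(n_positions):
        log_lik = 0.0
        for c_row, q_row in zip(clicks, probs):
            c, q = c_row[i], q_row[i]
            log_lik += c * math.log2(q) + (1 - c) * math.log2(1 - q)
        per_position.append(2 ** (-log_lik / n_sessions))
    return sum(per_position) / n_positions, per_position
```

A model that always predicts 0.5 scores exactly 2 at every position, which gives a useful baseline when comparing click models.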
ABSTRACT: We address the problem of learning large complex ranking functions. Most IR applications use evaluation metrics that depend only upon the ranks of documents. However, most ranking functions generate document scores, which are sorted to produce a ranking. Hence IR metrics are innately non-smooth with respect to the scores, due to the sort. Unfortunately, many machine learning algorithms require the gradient of a training objective in order to optimize the model parameters, and because IR metrics are non-smooth, we need to find a smooth proxy objective that can be used for training. We present a new family of training objectives that are derived from the rank distributions of documents, induced by smoothed scores. We call this approach SoftRank. We focus on a smoothed approximation to Normalized Discounted Cumulative Gain (NDCG), called SoftNDCG, and we compare it with three other training objectives in the recent literature. We present two main results. First, SoftRank provides a very effective way of optimizing NDCG. Second, we show that it is possible to achieve state-of-the-art test set NDCG results by optimizing a soft NDCG objective on the training set with a different discount function.
Proceedings of the International Conference on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA, February 11-12, 2008; 01/2008
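The smoothing idea behind SoftRank can be sketched in a few lines of Python. The sketch assumes each document score is treated as Gaussian with mean equal to the model score and a shared standard deviation sigma, so that P(s_i > s_j) = Phi((s_i - s_j) / (sqrt(2) * sigma)); each document's rank distribution is then built by adding the other documents one at a time, and SoftNDCG is the expected NDCG under those distributions. The gain 2**label - 1 and the discount 1 / log2(rank + 2) with 0-based ranks are standard NDCG choices assumed here, not necessarily those of the paper, and the gradient needed for training is omitted.

```python
import math


def _phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rank_distributions(scores, sigma):
    """For each document j, a list p where p[r] = P(document j has 0-based rank r)."""
    n = len(scores)
    dists = []
    for j in range(n):
        p = [1.0]  # before any competitor is added, j is certainly at rank 0
        for i in range(n):
            if i == j:
                continue
            # Probability that document i outscores document j under Gaussian noise.
            pi_ij = _phi((scores[i] - scores[j]) / (math.sqrt(2.0) * sigma))
            new = [0.0] * (len(p) + 1)
            for r, pr in enumerate(p):
                new[r] += pr * (1.0 - pi_ij)  # i lands below j: j's rank unchanged
                new[r + 1] += pr * pi_ij      # i lands above j: j drops one place
            p = new
        dists.append(p)
    return dists


def _gain(label: int) -> float:
    return 2.0 ** label - 1.0


def _discount(rank: int) -> float:
    return 1.0 / math.log2(rank + 2)  # 0-based rank


def soft_ndcg(scores, labels, sigma=1.0):
    """Expected NDCG under the smoothed rank distributions."""
    dists = rank_distributions(scores, sigma)
    expected = sum(_gain(lab) * sum(_discount(r) * p for r, p in enumerate(d))
                   for lab, d in zip(labels, dists))
    ideal = sum(_gain(lab) * _discount(r)
                for r, lab in enumerate(sorted(labels, reverse=True)))
    return expected / ideal if ideal > 0 else 0.0
```

As sigma shrinks, the value approaches ordinary NDCG; as it grows, each document's rank is blurred over many positions, which is what makes the objective a smooth function of the scores.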
ABSTRACT: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates.
Advances in Neural Information Processing Systems 21, Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 8-11, 2008; 01/2008
ABSTRACT: We extend the Bayesian skill rating system TrueSkill to infer entire time series of skills of players by smoothing through time instead of filtering. The skill of each participating player in each time period (say, each year) is represented by a latent skill variable which is affected by the relevant game outcomes in that period, and coupled with the skill variables of the previous and subsequent periods. Inference in the resulting factor graph is carried out by approximate message passing (expectation propagation, EP) along the time series of skills. As in TrueSkill, the system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. We extend the system to estimate player-specific draw margins. Based on these models we present an analysis of the skill curves of important players in the history of chess over the past 150 years. Results include plots of players' lifetime skill development as well as the ability to compare the skills of different players across time. Our results indicate that (a) the overall playing strength has increased over the past 150 years, and (b) modelling a player's ability to force a draw provides significantly better predictive power.
Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007; 01/2007
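The difference between filtering and smoothing can be illustrated with a deliberately simplified Python sketch. It assumes that each year's game results for a single player have already been summarised as a Gaussian message (mean, variance) about that year's skill, and that skill follows a Gaussian random walk with per-year variance tau2. Under those assumptions a forward Kalman filter followed by a backward Rauch-Tung-Striebel pass yields the smoothed skill estimates. The paper instead runs EP over the full factor graph of game outcomes, so this shows only the forward-backward structure along the time series; the function and parameter names are hypothetical.

```python
def smooth_skills(evidence, prior_mu, prior_var, tau2):
    """Smoothed (mean, variance) of one player's skill for each year.

    evidence: list of (mean, variance) Gaussian messages, one per year (assumed given).
    tau2: variance of the per-year random-walk step in skill.
    """
    # Forward pass: Kalman filter, using only past and current evidence.
    filt = []
    mu, var = prior_mu, prior_var
    for t, (m, v) in enumerate(evidence):
        if t > 0:
            var = var + tau2  # skill may drift between consecutive years
        k = var / (var + v)   # weight given to this year's evidence
        mu = mu + k * (m - mu)
        var = (1.0 - k) * var
        filt.append((mu, var))

    # Backward pass: Rauch-Tung-Striebel smoother, propagating later evidence back.
    smoothed = [None] * len(filt)
    smoothed[-1] = filt[-1]
    for t in range(len(filt) - 2, -1, -1):
        mu_f, var_f = filt[t]
        mu_next, var_next = smoothed[t + 1]
        pred_var = var_f + tau2       # predicted variance for year t + 1
        g = var_f / pred_var          # smoother gain
        mu_s = mu_f + g * (mu_next - mu_f)
        var_s = var_f + g * g * (var_next - pred_var)
        smoothed[t] = (mu_s, var_s)
    return smoothed
```

Strong evidence observed only in a later year pulls the smoothed means of earlier years toward it, something a pure filter cannot do because it never looks forward in time.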
ABSTRACT: We investigate the task of learning grid-based CRFs with hierarchical features motivated by the task of territory prediction in Go. We first analyze various independent and grid-based CRF classification models and state-of-the-art training/inference algorithms to determine which offers the best performance across a variety of metrics. Faced with the performance drawbacks of independent models and the computational drawbacks of intractable CRF models, we introduce the BMA-Tree algorithm that uses Bayesian model averaging of tree-structured predictors to exploit hierarchical feature structure. Our results demonstrate that BMA-Tree is superior to other independent classifiers and provides a computationally efficient alternative to intractable grid-based CRF models when training is too slow or approximate inference is inadequate for the task at hand.
ABSTRACT: We introduce the term cosegmentation, which denotes the task of simultaneously segmenting the common parts of an image pair. A generative model for cosegmentation is presented. Inference in the model leads to minimizing an energy with an MRF term encoding spatial coherency and a global constraint which attempts to match the appearance histograms of the common parts. This energy has not been proposed previously, and its optimization is challenging and NP-hard. For this problem a novel optimization scheme, which we call trust region graph cuts, is presented. We demonstrate that this framework has the potential to improve a wide range of research: object-driven image retrieval, video tracking and segmentation, and interactive image editing. The power of the framework lies in its generality: the common part can be a rigid or non-rigid object (or scene), observed from different viewpoints, or even similar objects of the same class.
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006); 07/2006
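The structure of the cosegmentation energy can be made concrete with a small Python sketch that only evaluates it; the trust region graph cuts optimization is not shown. The sketch assumes images are 2-D lists of already-quantised colour-bin indices, that the spatial-coherency term is an Ising penalty counting 4-neighbour label disagreements, and that the histogram-matching term is the L1 distance between the two foreground histograms. The weights lam and mu, and all function names, are hypothetical.

```python
from collections import Counter


def smoothness(labels):
    """Ising-style MRF term: number of 4-neighbour pairs with different labels."""
    h, w = len(labels), len(labels[0])
    cost = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                cost += 1
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                cost += 1
    return cost


def fg_histogram(image, labels):
    """Histogram of colour bins over pixels labelled foreground (1)."""
    return Counter(b for row_i, row_l in zip(image, labels)
                   for b, lab in zip(row_i, row_l) if lab == 1)


def coseg_energy(img1, lab1, img2, lab2, lam=1.0, mu=1.0):
    """Spatial coherency in both images plus a global histogram-matching penalty."""
    h1, h2 = fg_histogram(img1, lab1), fg_histogram(img2, lab2)
    hist_term = sum(abs(h1[b] - h2[b]) for b in set(h1) | set(h2))
    return lam * (smoothness(lab1) + smoothness(lab2)) + mu * hist_term
```

The histogram term couples every pixel of both images at once, which is why the energy cannot be minimised by a single standard graph cut and a dedicated optimization scheme is needed.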
ABSTRACT: We present a new Bayesian skill rating system which can be viewed as a generalisation of the Elo system used in Chess. The new system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. Inference is performed by approximate message passing on a factor graph representation of the model. We present experimental evidence on the increased accuracy and convergence speed of the system compared to Elo and report on our experience with the new rating system running in a large-scale commercial online gaming service under the name of TrueSkill.
Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006; 01/2006
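For the simplest case covered above, two players and no draw, message passing reduces to closed-form corrections of each player's Gaussian skill belief. The Python sketch below shows that special case only. It assumes the commonly cited defaults beta = 25/6 and tau = 25/300 for a prior of mu = 25, sigma = 25/3, and it omits draws, teams and multi-player rankings; the function names are hypothetical.

```python
import math


def _pdf(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def trueskill_win(winner, loser, beta=25.0 / 6.0, tau=25.0 / 300.0):
    """Update (mu, sigma) beliefs of two players after `winner` beats `loser`.

    Two-player, no-draw special case only; beta (performance noise) and tau
    (per-game skill drift) are assumed defaults. Numerically fragile for extreme
    upsets, where _cdf(t) underflows to zero.
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    # Dynamics: inflate each variance slightly so skills can change over time.
    var_w = s_w ** 2 + tau ** 2
    var_l = s_l ** 2 + tau ** 2
    c2 = 2.0 * beta ** 2 + var_w + var_l
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # mean correction, large when the win was a surprise
    w = v * (v + t)        # variance correction factor, between 0 and 1
    new_winner = (mu_w + var_w / c * v, math.sqrt(var_w * (1.0 - var_w / c2 * w)))
    new_loser = (mu_l - var_l / c * v, math.sqrt(var_l * (1.0 - var_l / c2 * w)))
    return new_winner, new_loser
```

When two newcomers at (25, 25/3) play once, their means move apart by equal amounts and both standard deviations shrink, reflecting that a single result carries information about both players.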
ABSTRACT: The LETOR datasets consist of data extracted from traditional IR test corpora. For each of a number of test topics, a set of documents has been extracted, in the form of features of each document-query pair, for use by a ranker. An examination of the ways in which documents were selected for each topic shows that the selection has (for each of the three corpora) a particular bias or skewness. This has some unexpected effects which may considerably influence any learning-to-rank exercise conducted on these datasets. The problems may be resolvable by modifying the datasets.
ABSTRACT: Because maximum-likelihood training is intractable for general factor graphs, an appealing alternative is local training, which approximates the likelihood gradient without performing global propagation on the graph. We discuss two new local training methods: shared-unary piecewise, in which unary factors are shared among every higher-way factor that they neighbor, and the one-step cutout method, which computes exact marginals on overlapping subgraphs. Comparing them to naive piecewise training, we show that just as piecewise training corresponds to using the Bethe pseudomarginals after zero BP iterations, shared-unary piecewise corresponds to the pseudomarginals after one parallel iteration, and the one-step cutout method corresponds to the beliefs after two iterations. We show in simulations that this point of view illuminates the errors made by shared-unary piecewise training.
ABSTRACT: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms for probabilistic inference, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates.