Yoav Freund

University of California, San Diego | UCSD · Department of Computer Science and Engineering (CSE)

About

111
Publications
25,852
Reads
41,769
Citations
Citations since 2017
6 Research Items
15,973 Citations
[Chart: citations per year, 2017–2023]


Publications (111)
Preprint
Full-text available
We describe an application of machine learning to a real-world computer assisted labeling task. Our experimental results expose significant deviations from the IID assumption commonly used in machine learning. These results suggest that the common random split of all data into training and testing can often lead to poor performance.
Preprint
We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of...
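The per-query choice of k can be illustrated with a toy rule that grows the neighborhood until one label clearly dominates. This is a hypothetical sketch of the general idea, not the preprint's algorithm; the `confidence` stopping threshold is an invented parameter.

```python
import numpy as np

def adaptive_knn_predict(X, y, query, confidence=0.6, k_max=None):
    """Toy adaptive k-NN: grow k until one label clearly dominates.

    Illustrative sketch only; the stopping rule (plurality fraction
    >= `confidence`) is a hypothetical choice, not the paper's rule.
    """
    if k_max is None:
        k_max = len(X)
    # Order training points by distance to the query.
    order = np.argsort(np.linalg.norm(X - query, axis=1))
    for k in range(1, k_max + 1):
        labels = y[order[:k]]
        values, counts = np.unique(labels, return_counts=True)
        best = counts.argmax()
        # Stop as soon as the plurality label is confident enough.
        if counts[best] / k >= confidence:
            return values[best], k
    return values[best], k_max
```

In a region where one label dominates locally, the rule stops at a small k; near a noisy boundary it keeps expanding the neighborhood.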
Preprint
The two state-of-the-art implementations of boosted trees, XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires that memory size be sufficient to hold a 2-3 multiple of the training set size. This paper presents an alternative approach to implementing boosted trees, which achieves a significant sp...
Conference Paper
A new global bathymetry and topography grid (SRTM15_PLUS) is presented based on an updated compilation of ship soundings as well as depths predicted from satellite-derived free-air gravity anomalies. The National Geospatial-Intelligence Agency provided 19 million averaged soundings (15 arcsecond averages) in areas not covered by our previous 2014...
Preprint
We present a novel approach for parallel computation in the context of machine learning that we call "Tell Me Something New" (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe "something new". TMSN does not require synchronization or a head node and is highly resilient against faili...
Article
We explore a novel approach to semi-supervised learning. This approach is contrary to the common approach in that the unlabeled examples serve to "muffle," rather than enhance, the guidance provided by the labeled examples. We provide several variants of the basic algorithm and show experimentally that they can achieve significantly higher AUC than...
Article
We develop a worst-case analysis of aggregation of binary classifier ensembles in a transductive setting, for a broad class of losses including but not limited to all convex surrogates. The result is a family of parameter-free ensemble aggregation algorithms, which are as efficient as linear learning and prediction for convex risk minimization but...
Article
We present and empirically evaluate an efficient algorithm that learns to predict using an ensemble of binary classifiers. It uses the structure of the ensemble predictions on unlabeled data to yield classification performance gains without making assumptions on the predictions or their origin, and does this as scalably as linear learning.
Article
We develop a worst-case analysis of aggregation of classifier ensembles for binary classification. The task of predicting to minimize error is formulated as a game played over a given set of unlabeled data (a transductive setting), where prior label information is encoded as constraints on the game. The minimax solution of this game identifies case...
Article
We consider a situation in which we see samples in $\mathbb{R}^d$ drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion - with an algorithm that maintains an estimate of the top eigenvector in O(d) space, and incrementally adjusts the estimate with each new...
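An O(d)-space incremental update in this spirit is Oja's rule, sketched below with an arbitrary constant step size; the paper's actual algorithm and its step-size schedule differ.

```python
import numpy as np

def oja_step(v, x, eta):
    """One incremental update of a top-eigenvector estimate.

    Keeps only the d-dimensional estimate `v` in memory (O(d) space).
    The constant step size `eta` is a hypothetical choice; the paper
    analyzes specific schedules.
    """
    v = v + eta * x * np.dot(x, v)   # push v toward x, scaled by alignment
    return v / np.linalg.norm(v)     # renormalize to unit length

rng = np.random.default_rng(0)
# Samples with covariance diag(4, 1): the top eigenvector is e_1.
v = np.array([1.0, 1.0]) / np.sqrt(2)
for _ in range(2000):
    x = rng.normal(size=2) * np.array([2.0, 1.0])
    v = oja_step(v, x, eta=0.01)
```

After enough samples the estimate aligns with the first coordinate axis, the top eigenvector of this synthetic covariance.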
Article
We consider using an ensemble of binary classifiers for transductive prediction, when unlabeled test data are known in advance. We derive minimax optimal rules for confidence-rated prediction in this setting. By using PAC-Bayes analysis on these rules, we obtain data-dependent performance guarantees without distributional assumptions on the data. O...
Conference Paper
Full-text available
The traditional k-NN classification rule predicts a label based on the most common label of the k nearest neighbors (the plurality rule). It is known that the plurality rule is optimal when the number of examples tends to infinity. In this paper we show that the plurality rule is sub-optimal when the number of labels is large and the number of exam...
Article
Handwriting is a natural and versatile method for human-computer interaction, especially on small mobile devices such as smart phones. However, as handwriting varies significantly from person to person, it is difficult to design handwriting recognizers that perform well for all users. A natural solution is to use machine learning to adapt the recog...
Article
The sensitivity of AdaBoost to random label noise is a well-studied problem. LogitBoost, BrownBoost and RobustBoost are boosting algorithms claimed to be less sensitive to noise than AdaBoost. We present the results of experiments evaluating these algorithms on both synthetic and real datasets. We compare the performance on each of the datasets when th...
Conference Paper
Full-text available
Robust real time tracking is a requirement for many emerging applications. Many of these applications must track objects even as their appearance changes. Training classifiers online has become an effective approach for dealing with variability in object appearance. Classifiers can learn and adapt to changes online at the cost of additional runtime...
Conference Paper
Full-text available
We present RIFFA, a reusable integration framework for FPGA accelerators. RIFFA provides communication and synchronization for FPGA accelerated software using a standard interface. Our goal is to expand the use of FPGAs as an acceleration platform by releasing, as open source, a no cost framework that easily integrates software on traditional CPUs...
Article
Full-text available
We study the tracking problem, namely, estimating the hidden state of an object over time, from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which is the basis of solutions such as the Bayesian algorithm and its approximation, the particle filters. However, these solutions can be ve...
Book
An accessible introduction and essential reference for an approach to machine learning that creates highly accurate prediction rules by combining many weak and inaccurate ones. Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate “rules of thumb.” A remarkably ri...
Conference Paper
Full-text available
In this work, a novel occlusion detection algorithm using online learning is proposed for video applications. Each frame of a video is considered as a time-step for which pixels are classified as being either occluded or non-occluded. The Hedge algorithm is employed to determine weights for a set of experts, each of which is tuned to detect a speci...
Article
Full-text available
VoIP (Voice over IP) services are using the Internet infrastructure to enable new forms of communication and collaboration. A growing number of VoIP service providers such as Skype, Vonage, Broadvoice, as well as many cable services are using the Internet to offer telephone services at much lower costs. However, VoIP services rely on the user's...
Chapter
The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to characteristics of the task can significantly enhance the performance, such as reducing the time required for the operation, without compromising the quality of t...
Article
Full-text available
We present a novel particle filtering algorithm for tracking a moving sound source using a microphone array. If there are N microphones in the array, we track all $N \choose 2$ delays with a single particle filter over time. Since it is known that tracking in high dimensions is rife with difficulties, we instead integrate into our particle filter a...
Article
Full-text available
Fluorescent in situ hybridization (FISH) techniques are becoming extremely sensitive, to the point where individual RNA or DNA molecules can be detected with small probes. At this level of sensitivity, the elimination of ‘off-target’ hybridization is of crucial importance, but typical probes used for RNA and DNA FISH contain sequences repeated else...
Article
The detection and counting of transcripts within single cells via fluorescent in situ hybridization (FISH) has allowed researchers to ask quantitative questions about gene expression at the level of individual cells. This method is often preferable to quantitative RT-PCR, because it does not necessitate destruction of the cells being probed and mai...
Article
Full-text available
Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for ide...
Data
ResBoost dataset. Details on the enzymes in the ResBoost dataset and additional base classifiers.
Article
We present a new boosting algorithm, motivated by the large margins theory for boosting. We give experimental evidence that the new algorithm is significantly more robust against label noise than existing boosting algorithms.
Article
Full-text available
We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood ho...
Article
Full-text available
We study the tracking problem, namely, estimating the hidden state of an object over time, from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which is the basis of solutions such as the Bayesian algorithm and its approximation, the particle filters. However, the problem with these so...
Conference Paper
Full-text available
We have built a system that engages naive users in an audio-visual interaction with a computer in an unconstrained public space. We combine audio source localization techniques with face detection algorithms to detect and track the user throughout a large lobby. The sensors we use are an ad-hoc microphone array and a PTZ camera. To engage the us...
Article
We present a new online learning algorithm for cumulative discounted gain. This learning algorithm does not use exponential weights on the experts. Instead, it uses a weighting scheme that depends on the regret of the master algorithm relative to the experts. In particular, experts whose discounted cumulative gain is smaller (worse) than that of th...
Article
Cell motility proceeds by cycles of edge protrusion, adhesion, and retraction. Whether these functions are coordinated by biochemical or biomechanical processes is unknown. We find that myosin II pulls the rear of the lamellipodial actin network, causing upward bending, edge retraction, and initiation of new adhesion sites. The network then separat...
Conference Paper
Full-text available
We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoid...
Conference Paper
Combinatorial transcriptional fluorescent in situ hybridization (CT-FISH) is a confocal fluorescence imaging technique enabling the detection of multiple active transcription units in individual interphase diploid nuclei. As improved combinatorial labeling methods allow simultaneous measurement of gene activities to expand from five genes in a sing...
Data
Full-text available
Ortholog co-expression performance.
Data
Full-text available
Self-rank performance of phenotypic profiles.
Data
Full-text available
Distribution of the number of orthologs per organism.
Data
Full-text available
Performance of different profile similarity measures.
Data
Performance of different profile similarity measures.
Data
Full-text available
Performance bias due to paralogous metabolic enzymes.
Data
Full-text available
Effects of metabolite weighting and association-rank rescaling corrections.
Data
Full-text available
Paralogs and orthologs among metabolic enzymes.
Data
Full-text available
Performance of protein fusion associations.
Data
Full-text available
Overlap in predictions based on different types of association evidence.
Data
Full-text available
Alternating decision trees and related structures.
Data
Full-text available
Gene coverage of different orthology datasets. Additional datasets, including pair-wise functional association matrices for different types of evidence and BLAST-based orthology datasets, are available on the authors' web site [63].
Data
Full-text available
Performance of predictions based on KEGG pathway membership.
Data
Full-text available
Chromosome clustering using Gene Order vs. Gene Nucleotide Position.
Data
Full-text available
Sensitivity of prediction performance on the choice of excluded metabolites.
Data
Full-text available
Prediction performance with and without paralog exclusion.
Data
predictions.zip. Sample predictions of E. coli orphans. Additional datasets, including pair-wise functional association matrices for different types of evidence and BLAST-based orthology datasets, are available on the authors' web site [63].
Article
Full-text available
We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or dow...
Article
Full-text available
Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which can not be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. We present a novel method for identifying genes encoding...
Article
One of the most labor-intensive aspects of developing accurate visual object detectors using machine learning is to gather a sufficient amount of labeled examples. We develop a selective sampling method, based on boosting, which dramatically reduces the amount of human labor required for this task. We apply this method to the problem of detecting p...
Article
We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding si...
Article
We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction...
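The exponential weighting scheme can be sketched generically; the learning rate `eta` and the stump hypotheses below are illustrative assumptions, not taken from the paper.

```python
import math

def exp_weighted_predict(hypotheses, train_errors, x, eta=2.0):
    """Predict with all hypotheses, weighted exponentially by training error.

    Each hypothesis maps x to +/-1; `eta` is a hypothetical learning rate.
    """
    weights = [math.exp(-eta * e) for e in train_errors]
    score = sum(w * h(x) for w, h in zip(weights, hypotheses))
    return 1 if score >= 0 else -1

# Three decision stumps on scalar inputs; the first two have low training
# error, the third is deliberately bad.
hs = [lambda x: 1 if x > 0 else -1,
      lambda x: 1 if x > 0.1 else -1,
      lambda x: -1]
errors = [0.05, 0.10, 0.60]
```

Because the weights decay exponentially in training error, the bad third hypothesis is nearly ignored, while no single hypothesis dominates the vote outright.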
Article
We study the problem of learning to accurately rank a set of objects by combining a given collection of ranking or preference functions. This problem of combining preferences arises in several applications, such as that of combining the results of different search engines, or the "collaborative filtering" problem of ranking movies for a user based...
Article
Full-text available
Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and N. Vayatis; and "Statistical behavior and consistency of classification methods based on convex risk minimization" [ibid., 56-85] by T. Zhang. Include...
Article
Full-text available
We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact mat...
Article
This discussion concerns the following papers: W. Jiang [Process consistency for AdaBoost. ibid., 13–29 (2004; Zbl 1105.62316)]; G. Lugosi and N. Vayatis [On the Bayes-risk consistency of regularized boosting methods. ibid., 30–55 (2004; Zbl 1105.62319)]; and T. Zhang [Statistical behavior and consistency of classification methods based on convex r...
Article
Full-text available
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called Rank...
Article
In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation...
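A standard approach to the adversarial version of this problem is EXP3-style multiplicative weights with importance-weighted reward estimates. The sketch below is a generic illustration, not the paper's exact parameterization; `gamma` and the reward interface are assumptions.

```python
import math
import random

def exp3(pull, K, T, gamma=0.1, seed=0):
    """EXP3-style bandit play: multiplicative weights over arms.

    `pull(arm)` returns a reward in [0, 1]. A generic sketch of the
    adversarial-bandit idea; constants are illustrative.
    """
    rng = random.Random(seed)
    w = [1.0] * K
    total = 0.0
    for _ in range(T):
        s = sum(w)
        # Mix the weight distribution with uniform exploration.
        p = [(1 - gamma) * w[i] / s + gamma / K for i in range(K)]
        arm = rng.choices(range(K), weights=p)[0]
        r = pull(arm)
        total += r
        # Importance-weighted estimate keeps the update unbiased.
        w[arm] *= math.exp(gamma * (r / p[arm]) / K)
    return total

# Arm 1 always pays 1, arm 0 pays 0: play should concentrate on arm 1.
reward = exp3(lambda a: 1.0 if a == 1 else 0.0, K=2, T=2000)
```

The uniform-exploration floor guarantees every arm is tried, while the multiplicative update exploits the arm that has paid off so far.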
Article
In an earlier paper (9), we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a "pseudo-loss" which is a method...
Article
this paper, games are played by two players. One will be called the adversary and the other the strategy learning algorithm. We think of the adversary as a fixed resource-bounded computational mechanism that may be playing a strategy quite different from the minimax optimal strategy for the game. The strategy learning algorithm will be a polynomial...
Article
We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, this algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction...
Article
We would like to thank Leo Breiman for his interest in our work on boosting, for his extensive experiments with the AdaBoost algorithm (which he calls arc-fs) and for his very generous exposition of our work to the statistics community. Breiman’s experiments and our intensive email communication over the last two years have inspired us to think abo...
Article
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of mar...
Article
We study the close connections between game theory, on-line prediction and boosting. After a brief review of game theory, we describe an algorithm for learning to play repeated games based on the on-line prediction methods of Littlestone and Warmuth. The analysis of this algorithm yields a simple proof of von Neumann's famous minmax theorem, as wel...
Article
We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, this algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction...
Article
In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation...
Article
We describe an efficient algorithm for estimating a mixture of two product distributions over binary vectors. There are two major lines of research in machine learning: supervised learning and unsupervised learning. Supervised learning has been the more appealing to theoretical analysis, since the goal that it sets is very clear. We...
Article
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-updat...
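The multiplicative weight-update rule at the heart of this allocation model (known as Hedge) can be sketched in a few lines; `beta` below is an arbitrary illustrative choice.

```python
def hedge(loss_rounds, n_experts, beta=0.8):
    """Multiplicative-weight (Hedge) allocation over experts.

    Each round: allocate proportionally to the weights, observe per-expert
    losses in [0, 1], and shrink each weight by beta**loss. `beta` is an
    illustrative constant, not a tuned value.
    """
    w = [1.0] * n_experts
    total_loss = 0.0
    for losses in loss_rounds:
        s = sum(w)
        p = [wi / s for wi in w]                       # current allocation
        total_loss += sum(pi * li for pi, li in zip(p, losses))
        w = [wi * beta ** li for wi, li in zip(w, losses)]
    return total_loss, w

# Expert 0 is always right (loss 0), expert 1 always wrong (loss 1).
rounds = [[0.0, 1.0]] * 20
loss, w = hedge(rounds, n_experts=2)
```

With one perfect expert, the allocation concentrates on it geometrically fast, so the cumulative mixture loss stays bounded by a constant rather than growing with the number of rounds.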
Article
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much sim...
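The resulting "voted perceptron" idea — keep every intermediate perceptron vector together with the number of rounds it survived, then take a survival-weighted vote — can be sketched as follows. This is a simplified rendering (no bias term, none of the paper's kernelization or margin analysis).

```python
def train_voted_perceptron(data, epochs=10):
    """Voted-perceptron training sketch.

    `data` is a list of (x, y) with x a tuple of floats and y in {-1, +1}.
    Every time the running perceptron makes a mistake, the old weight
    vector is stored with its survival count before being updated.
    """
    d = len(data[0][0])
    w = [0.0] * d
    c = 1
    history = []                       # list of (weight_vector, count)
    for _ in range(epochs):
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                history.append((w[:], c))
                w = [wi + y * xi for wi, xi in zip(w, x)]
                c = 1
            else:
                c += 1
    history.append((w[:], c))
    return history

def predict_voted(history, x):
    """Survival-count-weighted vote of all stored perceptrons."""
    sign = lambda s: 1 if s > 0 else -1
    vote = sum(c * sign(sum(wi * xi for wi, xi in zip(w, x)))
               for w, c in history)
    return 1 if vote >= 0 else -1

# Trivially separable data on one axis.
history = train_voted_perceptron([((1.0, 0.0), 1), ((-1.0, 0.0), -1)])
```

Vectors that survived many rounds carry proportionally more weight in the vote, which is what stabilizes the prediction relative to the last-vector perceptron.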
Article
We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicative-weight methods of Littlestone and Warmu...
Article
Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting's relationship to support-vector machines....
Article
Full-text available
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much s...
Article
Full-text available
In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation...
Conference Paper
We study the problem of learning to accurately rank a set of objects by combining a given collection of ranking or preference functions. This problem of combining preferences arises in several applications, such as that of combining the results of different search engines, or the "collaborative-filtering" problem of ranking movies for a user based...
Article
Full-text available
This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an average-case setting to model the “typical” labeling of a finite automaton, while retaining a worst-case model for the underlying graph of the automaton, along with (2) a learn...
Article
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-updat...
Article
Full-text available
We study online learning algorithms that predict by combining the predictions of several subordinate prediction algorithms, sometimes called "experts." These simple algorithms belong to the multiplicative weights family of algorithms. The performance of these algorithms degrades only logarithmically with the number of experts, making them particul...
Article
Full-text available
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected n...
Article
Full-text available
We study efficient algorithms for solving the following problem, which we call the switching distributions learning problem. A sequence $S = \sigma_1 \sigma_2 \cdots \sigma_n$ over a finite alphabet $\Sigma$ is generated in the following way. The sequence is a concatenation of K runs, each of which is a consecutive subsequence. Each run is generated by independent ran...
Conference Paper
Full-text available
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated hypothesis usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margi...
Conference Paper
In this paper we study learning algorithms for environments which are changing over time. Unlike most previous work, we are interested in the case where the changes might be rapid but their “direction” is relatively constant. We model this type of change by assuming that the target distribution is changing continuously at a constant rate from one e...