## About

- Publications: 111
- Reads: 25,852


- Citations: 41,769


## Publications

Publications (111)

We describe an application of machine learning to a real-world computer-assisted labeling task. Our experimental results expose significant deviations from the IID assumption commonly used in machine learning. These results suggest that the common random split of all data into training and testing can often lead to poor performance.

We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of...

The two state-of-the-art implementations of boosted trees, XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the training set. This paper presents an alternative approach to implementing boosted trees, which achieves a significant sp...

A new global bathymetry and topography grid (SRTM15_PLUS) is presented based on an updated compilation of ship soundings as well as depths predicted from satellite-derived free-air gravity anomalies. The National Geospatial-Intelligence Agency provided 19 million averaged soundings (15 arcsecond averages) in areas not covered by our previous 2014...

We present a novel approach for parallel computation in the context of machine learning that we call "Tell Me Something New" (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe "something new". TMSN does not require synchronization or a head node and is highly resilient against faili...

We explore a novel approach to semi-supervised learning. This approach is contrary to the common approach in that the unlabeled examples serve to "muffle," rather than enhance, the guidance provided by the labeled examples. We provide several variants of the basic algorithm and show experimentally that they can achieve significantly higher AUC than...

We develop a worst-case analysis of aggregation of binary classifier
ensembles in a transductive setting, for a broad class of losses including but
not limited to all convex surrogates. The result is a family of parameter-free
ensemble aggregation algorithms, which are as efficient as linear learning and
prediction for convex risk minimization but...

We present and empirically evaluate an efficient algorithm that learns to
predict using an ensemble of binary classifiers. It uses the structure of the
ensemble predictions on unlabeled data to yield classification performance
gains without making assumptions on the predictions or their origin, and does
this as scalably as linear learning.

We develop a worst-case analysis of aggregation of classifier ensembles for
binary classification. The task of predicting to minimize error is formulated
as a game played over a given set of unlabeled data (a transductive setting),
where prior label information is encoded as constraints on the game. The
minimax solution of this game identifies case...

We consider a situation in which we see samples in $\mathbb{R}^d$ drawn
i.i.d. from some distribution with mean zero and unknown covariance $A$. We wish
to compute the top eigenvector of $A$ in an incremental fashion, with an
algorithm that maintains an estimate of the top eigenvector in $O(d)$ space and
incrementally adjusts the estimate with each new...
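The incremental setting above can be illustrated with Oja's classic update rule, which stores only a single $d$-dimensional vector and nudges it toward the direction of each new sample before renormalizing. This is a minimal sketch under stated assumptions: the function name `oja_top_eigenvector` and the step-size schedule `c / (t + 10)` are illustrative choices, not necessarily the update or schedule analyzed in the paper.

```python
import math
import random

def oja_top_eigenvector(samples, dim, c=1.0, seed=0):
    """Estimate the top eigenvector of the covariance of zero-mean samples
    with Oja's update; only the current d-dimensional estimate is stored."""
    rng = random.Random(seed)
    # Random unit-norm initial estimate.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for t, x in enumerate(samples, start=1):
        eta = c / (t + 10)  # assumed step-size schedule
        proj = sum(xi * vi for xi, vi in zip(x, v))  # x^T v
        # Move v toward x in proportion to their alignment, then renormalize.
        v = [vi + eta * proj * xi for vi, xi in zip(v, x)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
    return v
```

For example, samples drawn as $(3g_1, g_2)$ with independent standard normal $g_i$ have covariance $\mathrm{diag}(9, 1)$, so the returned unit vector should align, up to sign, with the first coordinate axis.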

We consider using an ensemble of binary classifiers for transductive
prediction, when unlabeled test data are known in advance. We derive minimax
optimal rules for confidence-rated prediction in this setting. By using
PAC-Bayes analysis on these rules, we obtain data-dependent performance
guarantees without distributional assumptions on the data. O...

The traditional k-NN classification rule predicts a label based on the most common label of the k nearest neighbors (the plurality rule). It is known that the plurality rule is optimal when the number of examples tends to infinity. In this paper we show that the plurality rule is sub-optimal when the number of labels is large and the number of exam...
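For reference, the plurality baseline that the abstract argues against can be sketched in a few lines. This is the traditional rule, not the paper's proposed alternative; the function name `knn_plurality`, the Euclidean distance, and the tie-breaking policy are assumptions of this sketch.

```python
import math
from collections import Counter

def knn_plurality(train, query, k):
    """Predict the most common label among the k nearest training points.

    train: list of (point, label) pairs, with points as equal-length tuples.
    Ties in the vote go to the label whose first occurrence is nearest to
    the query, because Counter.most_common keeps first-encountered order
    among equal counts (and sorted() is stable for equal distances).
    """
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```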

Handwriting is a natural and versatile method for human-computer interaction,
especially on small mobile devices such as smart phones. However, as
handwriting varies significantly from person to person, it is difficult to
design handwriting recognizers that perform well for all users. A natural
solution is to use machine learning to adapt the recog...

The sensitivity of AdaBoost to random label noise is a well-studied problem.
LogitBoost, BrownBoost and RobustBoost are boosting algorithms claimed to be
less sensitive to noise than AdaBoost. We present the results of experiments
evaluating these algorithms on both synthetic and real datasets. We compare the
performance on each of the datasets when th...

Robust real time tracking is a requirement for many emerging applications. Many of these applications must track objects even as their appearance changes. Training classifiers online has become an effective approach for dealing with variability in object appearance. Classifiers can learn and adapt to changes online at the cost of additional runtime...

We present RIFFA, a reusable integration framework for FPGA accelerators. RIFFA provides communication and synchronization for FPGA accelerated software using a standard interface. Our goal is to expand the use of FPGAs as an acceleration platform by releasing, as open source, a no cost framework that easily integrates software on traditional CPUs...

We study the tracking problem, namely, estimating the hidden state of an
object over time, from unreliable and noisy measurements. The standard
framework for the tracking problem is the generative framework, which is the
basis of solutions such as the Bayesian algorithm and its approximation, the
particle filters. However, these solutions can be ve...

An accessible introduction and essential reference for an approach to machine learning that creates highly accurate prediction rules by combining many weak and inaccurate ones.
Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate “rules of thumb.” A remarkably ri...

In this work, a novel occlusion detection algorithm using online learning is proposed for video applications. Each frame of a video is considered as a time-step for which pixels are classified as being either occluded or non-occluded. The Hedge algorithm is employed to determine weights for a set of experts, each of which is tuned to detect a speci...

VoIP (Voice over IP) services are using the Internet infrastructure to enable new forms of communication and collaboration. A growing number of VoIP service providers such as Skype, Vonage, Broadvoice, as well as many cable services are using the Internet to offer telephone services at much lower costs. However, VoIP services rely on the user's...

The solution of sparse linear systems, a fundamental and resource-intensive task in scientific computing, can be approached through multiple algorithms. Using an algorithm well adapted to characteristics of the task can significantly enhance the performance, such as reducing the time required for the operation, without compromising the quality of t...

We present a novel particle filtering algorithm for tracking a moving sound source using a microphone array. If there are N microphones in the array, we track all $N \choose 2$ delays with a single particle filter over time. Since it is known that tracking in high dimensions is rife with difficulties, we instead integrate into our particle filter a...

Fluorescent in situ hybridization (FISH) techniques are becoming extremely sensitive, to the point where individual RNA or DNA molecules can
be detected with small probes. At this level of sensitivity, the elimination of ‘off-target’ hybridization is of crucial importance,
but typical probes used for RNA and DNA FISH contain sequences repeated else...

The detection and counting of transcripts within single cells via fluorescent in situ hybridization (FISH) has allowed researchers to ask quantitative questions about gene expression at the level of individual cells. This method is often preferable to quantitative RT-PCR, because it does not necessitate destruction of the cells being probed and mai...

Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for ide...

ResBoost dataset. Details on the enzymes in the ResBoost dataset and additional base classifiers.

We present a new boosting algorithm, motivated by the large margins theory for boosting. We give experimental evidence that the new algorithm is significantly more robust against label noise than existing boosting algorithms.

We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood ho...

We study the tracking problem, namely, estimating the hidden state of an object over time, from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which is the basis of solutions such as the Bayesian algorithm and its approximation, the particle filters. However, the problem with these so...

We have built a system that engages naive users in an audio-visual interaction with a computer in an unconstrained public space. We combine audio source localization techniques with face detection algorithms to detect and track the user throughout a large lobby. The sensors we use are an ad-hoc microphone array and a PTZ camera. To engage the us...

We present a new online learning algorithm for cumulative discounted gain. This learning algorithm does not use exponential weights on the experts. Instead, it uses a weighting scheme that depends on the regret of the master algorithm relative to the experts. In particular, experts whose discounted cumulative gain is smaller (worse) than that of th...

Cell motility proceeds by cycles of edge protrusion, adhesion, and retraction. Whether these functions are coordinated by biochemical or biomechanical processes is unknown. We find that myosin II pulls the rear of the lamellipodial actin network, causing upward bending, edge retraction, and initiation of new adhesion sites. The network then separat...

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoid...

Combinatorial transcriptional fluorescent in situ hybridization (CT-FISH) is a confocal fluorescence imaging technique enabling the detection of multiple active transcription units in individual interphase diploid nuclei. As improved combinatorial labeling methods allow simultaneous measurement of gene activities to expand from five genes in a sing...

Ortholog co-expression performance.

Self-rank performance of phenotypic profiles.

Distribution of the number of orthologs per organism.

Performance of different profile similarity measures.

Performance bias due to paralogous metabolic enzymes.

Effects of metabolite weighting and association-rank rescaling corrections.

Paralogs and orthologs among metabolic enzymes.

Performance of protein fusion associations.

Overlap in predictions based on different types of association evidence.

Alternating decision trees and related structures.

Gene coverage of different orthology datasets. Additional datasets, including pair-wise functional association matrices for different types of evidence and BLAST-based orthology datasets, are available on the authors' web site[63].

Performance of predictions based on KEGG pathway membership.

Chromosome clustering using Gene Order vs. Gene Nucleotide Position.

Sensitivity of prediction performance to the choice of excluded metabolites.

Prediction performance with and without paralog exclusion.

predictions.zip. Sample predictions of E. coli orphans. Additional datasets, including pair-wise functional association matrices for different types of evidence and BLAST-based orthology datasets, are available on the authors' web site[63].

We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or dow...

Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes.
We present a novel method for identifying genes encoding...

One of the most labor-intensive aspects of developing accurate visual object detectors using machine learning is gathering a sufficient number of labeled examples. We develop a selective sampling method, based on boosting, which dramatically reduces the amount of human labor required for this task. We apply this method to the problem of detecting p...

We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding si...

We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction...
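The averaging rule described above can be sketched directly for a small finite hypothesis class. In this minimal sketch, the function name `exp_weighted_prediction`, the use of the raw mistake count as "training error", the learning-rate parameter `eta`, and the class of threshold functions in the usage example are all assumptions; the output lies in $[-1, 1]$ and its sign gives the predicted label.

```python
import math

def exp_weighted_prediction(hypotheses, train, x, eta):
    """Average the +/-1 predictions of all hypotheses at x, each weighted by
    exp(-eta * number of training mistakes)."""
    weights = []
    for h in hypotheses:
        errors = sum(1 for xi, yi in train if h(xi) != yi)
        weights.append(math.exp(-eta * errors))
    total = sum(weights)
    return sum(w * h(x) for w, h in zip(weights, hypotheses)) / total
```

For example, with thresholds `h_t(x) = 1 if x >= t else -1` for `t` in `0..10`, a large `eta` concentrates almost all weight on the hypotheses consistent with the training data, while a small `eta` lets inconsistent hypotheses soften the vote.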

We study the problem of learning to accurately rank a set of objects by combining a given collection of ranking or preference functions. This problem of combining preferences arises in several applications, such as that of combining the results of different search engines, or the "collaborative-filtering" problem of ranking movies for a user based...

Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and N. Vayatis; and "Statistical behavior and consistency of classification methods based on convex risk minimization" [ibid., 56-85] by T. Zhang. Include...

We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact mat...

This discussion concerns the following papers: W. Jiang [Process consistency for AdaBoost. ibid., 13–29 (2004; Zbl 1105.62316)]; G. Lugosi and N. Vayatis [On the Bayes-risk consistency of regularized boosting methods. ibid., 30–55 (2004; Zbl 1105.62319)]; and T. Zhang [Statistical behavior and consistency of classification methods based on convex r...

The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called Rank...

In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation...
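The exploration/exploitation trade-off in the adversarial setting can be illustrated with an Exp3-style algorithm: exponential weights over the arms, mixed with a uniform exploration rate `gamma`, updated with an importance-weighted estimate of the observed reward. This is a minimal sketch, not a transcription of the paper; the function name `exp3`, the `reward_fn(t, arm)` interface, and the deterministic rewards in the usage example are assumptions.

```python
import math
import random

def exp3(reward_fn, n_arms, n_rounds, gamma, seed=0):
    """Run an Exp3-style learner; reward_fn(t, arm) must return a value in [0, 1].
    Returns the total reward collected and the per-arm pull counts."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total_reward = 0.0
    pulls = [0] * n_arms
    for t in range(n_rounds):
        w_sum = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        pulls[arm] += 1
        # Importance-weighted estimate: only the pulled arm gets a nonzero estimate,
        # and dividing by its probability makes the estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale by the maximum weight; this leaves probs unchanged but avoids overflow.
        m = max(weights)
        weights = [w / m for w in weights]
    return total_reward, pulls
```

With three arms paying fixed rewards 0.2, 0.5 and 0.9, the learner should end up pulling the third arm most often while still sampling the others at rate roughly `gamma / n_arms`.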

In an earlier paper (9), we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a "pseudo-loss" which is a method...
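The reweighting loop at the heart of AdaBoost can be sketched for binary labels in $\{-1, +1\}$. In this minimal sketch the weak learner is assumed to be an exhaustive search over one-dimensional decision stumps, and the helper names `stump`, `best_stump`, `adaboost` and `predict` are hypothetical; the multiclass pseudo-loss variant mentioned above is not shown.

```python
import math

def stump(x, thr, sign):
    """Decision stump on the real line: predicts sign if x >= thr, else -sign."""
    return sign if x >= thr else -sign

def best_stump(xs, ys, dist):
    """Pick the stump (threshold at a data point) with least weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(d for x, y, d in zip(xs, ys, dist) if stump(x, thr, sign) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds):
    """Binary AdaBoost; returns a list of (alpha, thr, sign) weak hypotheses."""
    n = len(xs)
    dist = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = best_stump(xs, ys, dist)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        if err >= 0.5:
            break  # weak learner no better than random guessing
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Increase weight on mistakes, decrease it on correct examples, renormalize.
        dist = [d * math.exp(-alpha * y * stump(x, thr, sign))
                for x, y, d in zip(xs, ys, dist)]
        z = sum(dist)
        dist = [d / z for d in dist]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(a * stump(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1
```

On an "interval" target such as labels `-1, -1, +1, +1, -1, -1` at `x = 1..6`, no single stump is correct everywhere, but the weighted vote of three stumps fits the training data.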

In this paper, games are played by two players. One will be called the adversary and the other the strategy learning algorithm. We think of the adversary as a fixed resource-bounded computational mechanism that may be playing a strategy quite different from the minimax optimal strategy for the game. The strategy learning algorithm will be a polynomial...

We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, this algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction...

We would like to thank Leo Breiman for his interest in our work on boosting, for his extensive experiments with the AdaBoost algorithm (which he calls arc-fs) and for his very generous exposition of our work to the statistics community. Breiman’s experiments and our intensive email communication over the last two years have inspired us to think abo...

One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of mar...
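The margin referred to here is, for a weighted vote with nonnegative weights $\alpha_t$ and base classifiers $h_t(x) \in \{-1, +1\}$, the quantity $y \sum_t \alpha_t h_t(x) / \sum_t \alpha_t$, which lies in $[-1, 1]$ and is positive exactly when the vote is correct. A minimal sketch of computing margins and their empirical cumulative distribution follows; the names `margins` and `margin_cdf` and the `(alpha, h)` ensemble representation are assumptions of this sketch.

```python
def margins(ensemble, data):
    """Normalized margins y * sum(alpha * h(x)) / sum(alpha) of a weighted
    majority vote; ensemble is a list of (alpha, h) with alpha >= 0 and
    h(x) in {-1, +1}; data is a list of (x, y) with y in {-1, +1}."""
    total = sum(alpha for alpha, _ in ensemble)
    return [y * sum(alpha * h(x) for alpha, h in ensemble) / total
            for x, y in data]

def margin_cdf(margin_values, theta):
    """Fraction of examples whose margin is at most theta."""
    return sum(1 for m in margin_values if m <= theta) / len(margin_values)
```

Tracking `margin_cdf` for several values of `theta` across boosting rounds shows whether small-margin examples are being pushed to larger margins even after the training error (the fraction with margin at most zero) has stopped changing.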

We study the close connections between game theory, on-line prediction and boosting. After a brief review of game theory, we describe an algorithm for learning to play repeated games based on the on-line prediction methods of Littlestone and Warmuth. The analysis of this algorithm yields a simple proof of von Neumann's famous minmax theorem, as wel...

In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation...

We describe an efficient algorithm for estimating a mixture of two product distributions over binary vectors. There are two major lines of research in machine learning: supervised learning and unsupervised learning. Supervised learning has been more amenable to theoretical analysis, since the goal that it sets is very clear. We...

In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-updat...
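The multiplicative weight-update rule for apportioning resources can be sketched as Hedge($\beta$): keep one weight per option, allocate in proportion to the weights, and multiply each option's weight by $\beta^{\ell}$ after it suffers loss $\ell \in [0, 1]$. This is a minimal sketch; the function name `hedge`, the value of `beta`, and the loss sequence in the usage example are illustrative assumptions.

```python
def hedge(loss_rounds, beta):
    """Run Hedge(beta) on a sequence of loss vectors with entries in [0, 1].

    Returns the algorithm's cumulative loss (of its mixed allocation) and the
    final allocation, a probability vector over the options."""
    n = len(loss_rounds[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in loss_rounds:
        w_sum = sum(weights)
        allocation = [w / w_sum for w in weights]
        # Loss suffered by the mixed allocation this round.
        total_loss += sum(p * l for p, l in zip(allocation, losses))
        # Multiplicative update: each option's weight shrinks by beta ** loss.
        weights = [w * beta ** l for w, l in zip(weights, losses)]
        # Rescaling by the maximum leaves the allocation unchanged but avoids underflow.
        m = max(weights)
        weights = [w / m for w in weights]
    w_sum = sum(weights)
    return total_loss, [w / w_sum for w in weights]
```

With two options, one always losing 0 and the other always losing 1, and `beta = 0.5`, the allocation concentrates on the lossless option and the cumulative loss stays bounded (below $\ln 2 / (1 - \beta)$ in this example) no matter how many rounds are played.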

We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much sim...
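A minimal sketch of the voted-perceptron idea follows: every intermediate perceptron weight vector is stored with the number of examples it survived, and prediction is a survival-weighted vote of their signs. Kernels and the paper's analysis are omitted; the names `train_voted_perceptron` and `predict_voted` are hypothetical, and since no bias term is learned, the usage example appends a constant feature to each input (an assumption of this sketch).

```python
def train_voted_perceptron(data, epochs):
    """data: list of (x, y) with x a tuple of floats and y in {-1, +1}.
    Returns a list of (weight_vector, survival_count) pairs."""
    dim = len(data[0][0])
    w = [0.0] * dim
    count = 0
    vectors = []
    for _ in range(epochs):
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x))
            if y * activation <= 0:
                # Mistake: retire the current vector together with its vote count.
                vectors.append((w, count))
                w = [wi + y * xi for wi, xi in zip(w, x)]
                count = 1
            else:
                count += 1
    vectors.append((w, count))
    return vectors

def predict_voted(vectors, x):
    """Each stored vector votes sign(w . x), weighted by how long it survived."""
    vote = 0
    for w, c in vectors:
        s = sum(wi * xi for wi, xi in zip(w, x))
        vote += c * (1 if s > 0 else -1)
    return 1 if vote > 0 else -1
```

Compared with keeping only the final perceptron, the vote gives most influence to vectors that made few mistakes, which is what allows margin-style guarantees without solving a quadratic program.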

We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicative-weight methods of Littlestone and Warmu...
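To illustrate the repeated-game setting, the sketch below pits a row player using multiplicative weights against a column player that best-responds every round (an assumed adversary). Because the adversary best-responds, the row player's average loss can never fall below the game value, and the regret bound keeps it close to that value; the average of the row player's mixed strategies then approximates a minimax strategy. The function name `mw_vs_best_response`, the learning rate `eta`, and the matching-pennies loss matrix in the usage example are illustrative assumptions.

```python
import math

def mw_vs_best_response(M, rounds, eta):
    """Row player uses multiplicative weights on loss matrix M (entries in [0, 1]);
    the column player best-responds each round. Returns the row player's
    average loss and average mixed strategy."""
    n_rows, n_cols = len(M), len(M[0])
    weights = [1.0] * n_rows
    avg_strategy = [0.0] * n_rows
    total_loss = 0.0
    for _ in range(rounds):
        w_sum = sum(weights)
        p = [w / w_sum for w in weights]
        # Column player picks the column that maximizes the row player's expected loss.
        col = max(range(n_cols),
                  key=lambda j: sum(p[i] * M[i][j] for i in range(n_rows)))
        total_loss += sum(p[i] * M[i][col] for i in range(n_rows))
        avg_strategy = [a + pi / rounds for a, pi in zip(avg_strategy, p)]
        # Each row's weight shrinks exponentially in the loss it just suffered.
        weights = [w * math.exp(-eta * M[i][col]) for i, w in enumerate(weights)]
    return total_loss / rounds, avg_strategy
```

For matching pennies, with loss matrix `[[0, 1], [1, 0]]`, the game value is 0.5 and the minimax strategy is uniform, so the average loss should sit just above 0.5 and the average strategy near `(0.5, 0.5)`.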

Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting's relationship to support-vector machines....

We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much s...

We study the problem of learning to accurately rank a set of objects by combining a given collection of ranking or preference functions. This problem of combining preferences arises in several applications, such as that of combining the results of different search engines, or the "collaborative-filtering" problem of ranking movies for a user based...

This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an average-case setting to model the “typical” labeling of a finite automaton, while retaining a worst-case model for the underlying graph of the automaton, along with (2) a learn...

We study online learning algorithms that predict by combining the predictions of several subordinate prediction algorithms, sometimes called "experts." These simple algorithms belong to the multiplicative weights family of algorithms. The performance of these algorithms degrades only logarithmically with the number of experts, making them particul...
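A simple member of the multiplicative weights family is the deterministic Weighted Majority rule of Littlestone and Warmuth: predict with the weighted vote of the experts, then multiply the weight of every expert that erred by a factor $\beta < 1$. This minimal sketch is for binary predictions; the function name `weighted_majority` and the default `beta = 0.5` are assumptions, and the algorithms analyzed in the paper may differ in detail.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """expert_predictions[t][i] is expert i's bit (0 or 1) at time t;
    outcomes[t] is the true bit. Returns the number of mistakes made by
    the weighted vote."""
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0  # ties go to 1
        mistakes += guess != outcome
        # Penalize every expert that was wrong this round.
        weights = [w * beta if p != outcome else w for w, p in zip(weights, preds)]
    return mistakes
```

Because each mistake of the vote removes a constant fraction of the total weight, the number of mistakes grows with the best expert's mistakes plus a term logarithmic in the number of experts.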

We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected n...

We study efficient algorithms for solving the following problem, which we call the switching distributions learning problem. A sequence $S = \sigma_1 \sigma_2 \cdots \sigma_n$ over a finite alphabet $\Sigma$ is generated in the following way. The sequence is a concatenation of K runs, each of which is a consecutive subsequence. Each run is generated by independent ran...

One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated hypothesis usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margi...

In this paper we study learning algorithms for environments which are changing over time. Unlike most previous work, we are interested in the case where the changes might be rapid but their “direction” is relatively constant. We model this type of change by assuming that the target distribution is changing continuously at a constant rate from one e...