Charles Elkan

  • University of California, San Diego

About

158
Publications
97,234
Reads
25,512
Citations
Current institution
University of California, San Diego

Publications (158)
Article
Full-text available
One-class classification is a common situation in remote sensing, where researchers aim to extract a single land type from remotely sensed data. Learning a classifier from labeled positive and unlabeled background data, which is the case-control sampling scenario, is efficient for one-class remote sensing classification because labeled negative dat...
Article
We consider real world task-oriented dialog settings, where agents need to generate both fluent natural language responses and correct external actions like database queries and updates. We demonstrate that, when applied to customer support chat transcripts, Sequence to Sequence (Seq2Seq) models often generate short, incoherent and ungrammatical na...
Article
Full-text available
Scheduling surgeries is a challenging task due to the fundamental uncertainty of the clinical environment, as well as the risks and costs associated with under- and over-booking. We investigate neural regression algorithms to estimate the parameters of surgery case durations, focusing on the issue of heteroscedasticity. We seek to simultaneously es...
Article
Full-text available
Besides the overpowering bouquet of raspberries in this guy's beer, this review is remarkable for another reason. It was produced by a computer program instructed to hallucinate a review for a "fruit/vegetable beer." Using a powerful artificial-intelligence tool called a recurrent neural network, the software that produced this passage isn't even p...
Article
Full-text available
Clinical medical data, especially in the intensive care unit (ICU), consists of multivariate time series of observations. For each patient visit (or episode), sensor data and lab test results are recorded in the patient's Electronic Health Record (EHR). While potentially containing a wealth of insights, the data is difficult to mine effectively, ow...
Article
Full-text available
We extend previous work on efficiently training linear models by applying stochastic updates to non-zero features only, lazily bringing weights current as needed. To date, only the closed form updates for the $\ell_1$, $\ell_{\infty}$, and the rarely used $\ell_2$ norm have been described. We extend this work by showing the proper closed form updat...
Article
Full-text available
The objective of machine learning is to extract useful information from data, while privacy is preserved by concealing information. Thus it seems hard to reconcile these competing interests. However, they frequently must be balanced when mining sensitive data. For example, medical research represents an important application where it is necessary b...
Conference Paper
Full-text available
This paper provides new insight into maximizing F1 measures in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, the F1 measure is widely used to evaluate the success of a binary classifier when one class is rare. Micro average, macro average, and per instance avera...
Article
Full-text available
This paper investigates the properties of the widely-utilized F1 metric as used to evaluate the performance of multi-label classifiers. We show that given an uninformative binary classifier, F1-optimal thresholding is to predict all instances positive. More surprisingly, we prove a relationship between the optimal threshold and the best achievable...
Article
Full-text available
This paper provides new insight into maximizing F1 scores in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, F1 score is widely used to measure the success of a binary classifier when one class is rare. Micro average, macro average, and per instance average F1 sco...
Article
This paper analyzes a novel method for publishing data while still protecting privacy. The method is based on computing weights that make an existing dataset, for which there are no confidentiality issues, analogous to the dataset that must be kept private. The existing dataset may be genuine but public already, or it may be synthetic. The weights...
Article
Full-text available
Multilabel learning is a machine learning task that is important for applications, but challenging. A recent method for multilabel learning called probabilistic classifier chains (PCCs) has several appealing properties. However, PCCs suffer from the computational issue that inference (i.e., predicting the label of an example) requires time exponent...
Article
This paper investigates the profitability of a trading strategy based on training a model to identify stocks with high or low predicted returns. A tail set is defined to be a group of stocks whose volatility-adjusted price change is in the highest or lowest quantile, for example the highest or lowest 5%. Each stock is represented by a set of techni...
Conference Paper
In this paper, we show how to use the classical technique of beam search for multilabel learning (MLL). A recent method for multilabel learning called probabilistic classifier chains (PCCs) has several appealing properties. However, PCCs suffer from the computational issue that inference (i.e., predicting the label of an example) requires time expo...
Conference Paper
This paper investigates a reinforcement learning method that combines learning a model of the environment with least-squares policy iteration (LSPI). The LSPI algorithm learns a linear approximation of the optimal state-action value function; the idea studied here is to let this value function depend on a learned estimate of the expected next state...
Article
This paper investigates the profitability of a trading strategy based on training a model to identify stocks with high or low predicted returns. A tail set is defined to be a group of stocks whose volatility-adjusted price change is in the highest or lowest quantile, for example the highest or lowest 5%. Each stock is represented by a set of techni...
Article
In many real-world applications of machine learning classifiers, it is essential to predict the probability of an example belonging to a particular class. This paper proposes a simple technique for predicting probabilities based on optimizing a ranking loss, followed by isotonic regression. This semi-parametric technique offers both good ranking an...
Article
Full-text available
The role of inhibition is investigated in a multiclass support vector machine formalism inspired by the brain structure of insects. The so-called mushroom bodies have a set of output neurons, or classification functions, that compete with each other to encode a particular input. Strongly active output neurons depress or inhibit the remaining output...
Article
Full-text available
Suppose that we have n training examples. The training data are a matrix with n rows and p columns, where each example is represented by values for p different features. Assume that each feature value is a real number. Let feature value j for example number i be written $x_{ij}$. The label of example i is $y_i$. For example, $y_i = 1$ if message i is spam and...
Article
We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve with multiple timescales. For example, some words may be used consistently over one hundred years, while other words ...
Article
Full-text available
In ecological studies, it is useful to estimate the probability that a species occurs at given locations. The probability of presence can be modeled by traditional statistical methods, if both presence and absence data are available. However, the challenge is that most species records contain only presence data, without reliable absence data. Previ...
Article
Full-text available
Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects includi...
Data
ACT annotation guidelines. Basic classification criteria for PPI abstracts.
Data
IMT method distribution. Distribution of interaction detection methods across the different IMT data sets.
Data
Full-text available
Evaluation metrics overview. Details on the calculation of the used evaluation scores.
Data
ACT example run. iP/R curve of the best team (73, S. Kim and W. J. Wilbur) in the Article Classification Task. Circle 1: Of the top 2% (130) of all results, approx. 90% (120) are relevant abstracts. Circle 2: To find half (295) of all relevant abstracts (Recall around 50%), a human going over the ranked list only has to look at the first 7% (421) o...
Conference Paper
Many reinforcement learning methods are based on a function Q(s,a) whose value is the discounted total reward expected after performing the action a in the state s. This paper explores the implications of representing the Q function as $Q(s,a) = s^T W a$, where W is a matrix that is learned. In this representation, both s and a are real-valued vectors...
Conference Paper
We propose to solve the link prediction problem in graphs using a supervised matrix factorization approach. The model learns latent features from the topological structure of a (possibly directed) graph, and is shown to make better predictions than popular unsupervised scores. We show how these latent features may be combined with optional explicit...
Article
Convex optimization has emerged as a useful tool for applications that include data analysis and model fitting, resource allocation, engineering design, network design and optimization, finance, and control and signal processing. After an overview, the ...
Article
Full-text available
With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized bi...
Article
Full-text available
What is called supervised learning is the most fundamental task in machine learning. In supervised learning, we have training examples and test examples. A training example is an ordered pair ⟨x, y⟩ where x is an instance and y is a label. A test example is an instance x with unknown label. The goal is to predict labels for test examples. The name...
Article
Full-text available
In remote-sensing classification, there are situations when users are only interested in classifying one specific land-cover type, without considering other classes. These situations are referred to as one-class classification. Traditional supervised learning is inefficient for one-class classification because it requires all classes that occur in...
Article
A low-rank approximation to a matrix A is a matrix with significantly smaller rank than A, and which is close to A according to some norm. Many practical applications involving the use of large matrices focus on low-rank approximations. By reducing the rank or dimensionality of the data, we reduce the complexity of analyzing the data. The singular...
Conference Paper
Full-text available
In dyadic prediction, labels must be predicted for pairs (dyads) whose members possess unique identifiers and, sometimes, additional features called side-information. Special cases of this problem include collaborative filtering and link prediction. We present a new log-linear model for dyadic prediction that is the first to satisfy several importa...
Conference Paper
Full-text available
This paper presents a fundamentally new approach to allowing learning algorithms to be applied to a dataset, while still keeping the records in the dataset confidential. Let D be the set of records to be kept private, and let E be a fixed set of records from a similar domain that is already public. The idea is to compute and publish a weight w(x) f...
Article
Full-text available
In dyadic prediction, the input consists of a pair of items (a dyad), and the goal is to predict the value of an observation related to the dyad. Special cases of dyadic prediction include collaborative filtering, where the goal is to predict ratings associated with (user, movie) pairs, and link prediction, where the goal is to predict the presence...
Article
An important extension of the idea of likelihood is conditional likelihood. The conditional likelihood of θ given data x and y is L(θ; y|x) = f(y|x; θ). Intuitively, y follows a probability distribution that is different for different x, but x itself is never unknown, so there is no need to have a probabilistic model of it. Technically, for each x...
Article
Full-text available
In dyadic prediction, labels must be predicted for pairs (dyads) whose members possess unique identifiers and, sometimes, additional features called side-information. Special cases of this problem include collaborative filtering and link prediction. We present the first model for dyadic prediction that satisfies several important desiderata: (i) la...
Article
Full-text available
Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This appr...
Article
Full-text available
Identifying a subset of features that preserves classification accuracy is a problem of growing importance, because of the increasing size and dimensionality of real-world data sets. We propose a new feature selection method, named Quadratic Programming Feature Selection (QPFS), that reduces the task to a quadratic optimization problem. In order to...
Article
Communications' Virtual Extension brings more quality articles to ACM members. These articles are now available in the ACM Digital Library.
Article
Full-text available
The aim of latent semantic indexing (LSI) is to uncover the relationships between terms, hidden concepts, and documents. LSI uses the matrix factorization technique known as singular value decomposition (SVD). In this paper, we apply LSI to standard benchmark collections. We find that LSI yields poor retrieval accuracy on the TREC 2, 7, 8, and 2004...
Conference Paper
Full-text available
Finding allowable places in words to insert hyphens is an important practical problem. The algorithm that is used most often nowadays has remained essentially unchanged for 25 years. This method is the TeX hyphenation algorithm of Knuth and Liang. We present here a hyphenation method that is clearly more accurate. The new method is an application o...
Conference Paper
Full-text available
Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We...
Article
Full-text available
We apply topic models to financial data to obtain a more accurate view of economic networks than that supplied by traditional economic statistics. The learned topic models can serve as a substitute for or a complement to more complicated network analysis. Initial results on S&P500 stock market data show that topic models are able to obtain meaning...
Conference Paper
Full-text available
Classifiers are traditionally learned using sets of positive and negative training examples. However, often a classifier is required, but for training only an incomplete set of positive examples and a set of unlabeled examples are available. This is the situation, for example, with the Transport Classification Database (TCDB, www.tcdb.org), a rep...
Article
Full-text available
The Transporter Classification Database (TCDB), freely accessible at http://www.tcdb.org, is a relational database containing sequence, structural, functional and evolutionary information about transport systems from a variety of living organisms, based on the International Union of Biochemistry and Molecular Biology-approved transporter classifica...
Conference Paper
Full-text available
The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples, and a set of unlabe...
Conference Paper
Full-text available
Learning a sequence classifier means learning to predict a sequence of output tags based on a set of input data items. For example, recognizing that a handwritten word is "cat", based on three images of handwritten letters and on general knowledge of English letter combinations, is a sequence classification task. This paper describes a new two-st...
Article
Full-text available
The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In 2007, the traditional KDD Cup competition was augmented with a workshop with a focus on the concurrently active Netflix Prize competition [...
Conference Paper
Full-text available
The number of specialized databases in molecular biology is growing fast, as is the availability of molecular data. These trends necessitate the development of automatic methods for finding relevant information to include in specialized databases. We show how to use a comprehensive database (SwissProt) as a source of new entries for a specialized d...
Chapter
Automatically improving the performance of inference engines is a central issue in automated deduction research. This paper describes and evaluates mechanisms for speeding up search in an inference engine used in research on reactive planning. The inference engine is adaptive in the sense that its performance improves with experience. This improvem...
Conference Paper
Full-text available
This paper presents approaches to semi-supervised learning when the labeled training data and test data are differently distributed. Specifically, the samples selected for labeling are a biased subset of some general distribution and the test set consists of samples drawn from either that general distribution or the distribution of the unlabeled sa...
Conference Paper
Full-text available
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact that if a word occurs once in a document, it is likely to occur repeatedly. We derive a new family of distributions that are approximations to DCM distributions and cons...
Conference Paper
Full-text available
The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standard models such as the multinomial distribution. This paper investigates the DCM Fisher kernel, a function for comparing documents derived from the DCM. We show that the...
Article
Full-text available
This paper explores the automatic classification of audio tracks into musical genres. Our goal is to achieve human-level accuracy with fast training and classification. This goal is achieved with radial basis function (RBF) networks by using a combination of unsupervised and supervised initialization methods. These initialization methods yield clas...
Conference Paper
Full-text available
Multinomial distributions are often used to model text documents. However, they do not capture well the phenomenon that words in a document tend to appear in bursts: if a word appears once, it is more likely to appear again. In this paper, we propose the Dirichlet compound multinomial model (DCM) as an alternative to the multinomial. The DCM model...
Article
Full-text available
In this paper, we examine an important recent rule-based information extraction (IE) technique named Boosted Wrapper Induction (BWI) by conducting experiments on a wider variety of tasks than previously studied, including tasks using several collections of natural text documents. We investigate systematically how each algorithmic component of BWI,...
Article
Full-text available
When clustering a dataset, the right number $k$ of clusters to use is often not obvious, and choosing $k$ automatically is a hard algorithmic problem. In this paper we present an improved algorithm for learning $k$ while clustering. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian dist...
Conference Paper
Full-text available
Most learning methods assume that the training set is drawn randomly from the population to which the learned model is to be applied. However, in many applications this assumption is invalid. For example, lending institutions create models of who is likely to repay a loan from training sets consisting of people in their records to whom loans were gi...
Article
An online topic-specific web search requires an intelligent web crawler. To be effective, a crawler must be able to identify and prioritize hyperlinks that are most likely to lead to relevant documents. We propose and evaluate a heuristic scoring method that predicts the utility of a link based on the presence of topic-specific keywords associated in v...
Article
We investigate here the behavior of the standard k-means clustering algorithm and several alternatives to it: the k-harmonic means algorithm due to Zhang and colleagues, fuzzy k-means, Gaussian expectation-maximization, and two new variants of k-harmonic means. Our aim is to find which aspects of these algorithms contribute to finding good clusterings...
Article
Full-text available
Hard disk drive failures are rare but are often costly. The ability to predict failures is important to consumers, drive manufacturers, and computer system manufacturers alike. In this paper we investigate the abilities of two Bayesian methods to predict disk drive failures based on measurements of drive internal conditions. We first view the probl...
Article
When clustering a dataset, the right number k of clusters to use is often not obvious, and choosing k automatically is a hard algorithmic problem.
Conference Paper
Full-text available
In this paper we show how to learn rules to improve the performance of a machine translation system. Given a system consisting of two translation functions (one from language A to language B and one from B to A), training text is translated from A to B and back again to A. Using these two translations, differences in knowledge between the two tra...
Article
Full-text available
An important issue in reinforcement learning is how to incorporate expert knowledge in a principled manner, especially as we scale up to real-world tasks. In this paper, we present a method for incorporating arbitrary advice into the reward structure of a reinforcement learning agent without altering the optimal policy. This method extends the pote...
Article
Full-text available
The k-means algorithm is by far the most widely used method for discovering clusters in data. We show how to accelerate it dramatically, while still always computing exactly the same result as the standard algorithm. The accelerated algorithm avoids unnecessary distance calculations by applying the triangle inequality in two different ways, and by...
Conference Paper
Full-text available
We investigate here the behavior of the standard k-means clustering algorithm and several alternatives to it: the k-harmonic means algorithm due to Zhang and colleagues, fuzzy k-means, Gaussian expectation-maximization, and two new variants of k-harmonic means. Our aim is to find which aspects of these algorithms contribute to finding good clusteri...
Article
Full-text available
Improved methods are proposed for disk-drive failure prediction. The SMART (self monitoring and reporting technology) failure prediction system is currently implemented in disk-drives. Its purpose is to predict the near-term failure of an individual hard disk-drive, and issue a backup warning to prevent data loss. Two experimental tests of SMART sh...
Article
Full-text available
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs of other classifiers, or domain knowledge. Previous calibration methods apply only to two-class p...
Article
Full-text available
We discuss a reinforcement learning framework where learners observe experts interacting with the environment. Our approach is to construct from these observations exploratory policies which favor selection of actions the expert has taken. This imitation strategy can be applied at any stage of learning, and requires neither that information regardi...
Article
Full-text available
In this paper, we examine an important recent rule-based information extraction (IE) technique named Boosted Wrapper Induction (BWI), by conducting experiments on a wider variety of tasks than previously studied, including tasks using several collections of natural text documents. We provide a systematic analysis of how each algorithmic component o...
Article
Information retrieval in the worldwide web environment poses unique challenges. The worldwide web is a distributed, always changing, and ever expanding collection of documents. These features of the web make it difficult to find information about a specific topic. The most common approaches involve indexing, but indexes introduce centr...
Article
Full-text available
To combine information from heterogeneous sources, equivalent data in the multiple sources must be identified.
Article
This paper is a reply to the article entitled Elkan's Theoretical Argument, Reconsidered by Prof. Enric Trillas and Prof. Claudi Alsina. I would like to express my thanks to Dr. Piero Bonissone for inviting me to write this paper and for showing me the article by Trillas and Alsina in advance of its publication. Ever since mathematical studies of f...
Article
Full-text available
This paper presents a first attempt at explaining the relationship between the psychological and artificial intelligence points of view of learning with a special focus on social learning. A two dimensional classification methodology is proposed that classifies learning behaviors in intelligent agents on the basis of agent structure and of informat...
Article
Full-text available
Detecting database records that are approximate duplicates, but not exact duplicates, is an important task. Databases may contain duplicate records concerning the same real-world entity because of data entry errors, because of unstandardized abbreviations, or because of differences in the detailed schemas of records from multiple databases, among ot...
Article
Full-text available
In many data mining domains, misclassification costs are different for different examples, in the same way that class membership probabilities are example-dependent. In these domains, both costs and probabilities are unknown for test examples, so both cost estimators and probability estimators must be learned. After discussing how to make optimal d...
Article
Full-text available
CoIL challenge 2000 was a supervised learning contest that attracted 43 entries. The authors of 29 entries later wrote explanations of their work. This paper discusses these reports and reaches three main conclusions. First, naive Bayesian classifiers remain competitive in practice: they were used by both the winning entry and the next best entry....
Article
Full-text available
This paper revisits the problem of optimal learning and decision-making when different misclassification errors incur different penalties. We characterize precisely but intuitively when a cost matrix is reasonable, and we show how to avoid the mistake of defining a cost matrix that is economically incoherent. For the two-class case, we prove a theo...
Article
Full-text available
Accurate, well-calibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a cost-sensitive decision must be made about examples with example-dependent costs. This paper presents simple but successful methods for obtaining calibrated probability estimates from decision tree and naiv...
Article
This paper presents a simple new algorithm that performs k-means clustering in one scan of a dataset, while using a buffer for points from the dataset of fixed size. Experiments show that the new method is several times faster than standard k-means, and that it produces clusterings of equal or almost equal quality. The new method is a simplificatio...
Article
This paper will try to explain the details of Heckman's procedure and its mathematical justification using language familiar in the field of machine learning.
Article
With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile ground for data mining research to make a difference to the effectiveness of information search. Today, Web surfers access the Web through two dominant ...
Article
Full-text available
Keywords: data mining, machine learning, model fitting, regression, exploratory data analysis, error rate estimation, data modeling, data cleaning, data preparation, predictability. We prove an inequality bound for the variance of the error of a regression function plus its non-smoothness as quantified by the Uniform Lipschitz condition. The coefficients in t...
Article
Full-text available
Protein families are well characterized by a collection of motifs (Sonnhammer & Kahn 1994), sometimes referred to as the "common core" (Chothia & Lesk 1986). These motifs can have structural and functional significance, and they may frequently be operated upon as units by diverse evolutionary mechanisms. The quality of a multiple alignment de...
Article
Motivation: Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately characterize a given family. For families in which there are...
Article
Full-text available
The MEME algorithm extends the expectation maximization (EM) algorithm for identifying motifs in unaligned biopolymer sequences. The aim of MEME is to discover new motifs in a set of biopolymer sequences where little or nothing is known in advance about any motifs that may be present. MEME innovations expand the range of problems which can be solv...
