Chris Drummond

National Research Council Canada | NRC · Information and Communications Technologies

About

60
Publications
13,862
Reads
2,115
Citations
Additional affiliations
July 2003 - present
National Research Council Canada
Position
  • Research Officer


Publications (60)
Article
Full-text available
Producing compressor maps is time consuming, costly and error prone and many data samples must be collected to give sufficient accuracy. Even then, expert input is typically required to fine tune the map to the appropriate shape. In this paper, we take some of that expertise and incorporate it in the smoothing process. The main piece of knowledge u...
Article
Full-text available
Reproducible research, a growing movement within many scientific fields, including machine learning, would require the code, used to generate the experimental results, be published along with any paper. Probably the most compelling argument for this is that it is simply following good scientific practice, established over the years by the greats of...
Article
Full-text available
This paper is a critique of the part played by the reproducible research movement within the scientific community. In particular, it raises concerns about the strong influence the movement is having on which papers are published. The primary effect is through changes to the peer review process. These not only require that the data and software used...
Article
Full-text available
This paper discusses a system that accelerates reinforcement learning by using transfer from related tasks. Without such transfer, even if two tasks are very similar at some abstract level, an extensive re-learning effort is required. The system achieves much of its power by transferring parts of previously learned solutions rather than a single co...
Article
Full-text available
This paper argues that we, in machine learning, have adopted an evaluation procedure which is an impoverished realization of a controversial methodology. I call this an orthodoxy because it is widely accepted; it shows up in our textbooks, we teach it to our graduate students and expect other researchers to abide by it. The attraction of this...
Article
The friction stir welding process can be modelled using a system of heat transfer and Navier-Stokes equations with a shear dependent viscosity. Finding numerical solutions of this system of nonlinear partial differential equations over a set of parameter space, however, is extremely time-consuming. Therefore, it is desirable to find a computational...
Article
Full-text available
Classification domains such as those in medicine, national security and the environment regularly suffer from a lack of training instances for the class of interest. In many cases, classification models induced under these conditions have poor predictive performance on the important minority class. Synthetic oversampling can be applied to mitigate...
Chapter
Full-text available
Chapter
Full-text available
Chapter
Full-text available
Conference Paper
Full-text available
Problems of class imbalance appear in diverse domains, ranging from gene function annotation to spectra and medical classification. On such problems, the classifier becomes biased in favour of the majority class. This leads to inaccuracy on the important minority classes, such as specific diseases and gene functions. Synthetic oversampling mitigate...
Conference Paper
Problems of class imbalance appear in diverse domains, ranging from gene function annotation to spectra and medical classification. On such problems, the classifier becomes biased in favour of the majority class. This leads to inaccuracy on the important minority classes, such as specific diseases and gene functions. Synthetic oversampling mitigate...
Conference Paper
Full-text available
Gamma-ray spectral classification requires the automatic identification of a large background class and a small minority class composed of instances that may pose a risk to humans and the environment. Accurate classification of such instances is required in a variety of domains, spanning event and port security to national monitoring for failures a...
Conference Paper
Ensemble Methods represent an important research area within machine learning. Here, we argue that the use of such methods can be generalized and applied in many more situations than they have been previously. Instead of using them only to combine the output of an algorithm, we can apply them to the decisions made inside the learning algorithm, its...
Chapter
Full-text available
Reproducible Research, the de facto title of a growing movement within many scientific fields, would require the code, used to generate the experimental results, be published along with any paper. Probably the most compelling argument for this is that it is simply following good scientific practice, established over the years by the greats of scien...
Conference Paper
Full-text available
Both intensional and extensional background knowledge have previously been used in inductive problems to complement the training set used for a task. In this research, we propose to explore the usefulness, for inductive learning, of a new kind of intensional background knowledge: the inter-relationships or conditional probability distributions between su...
Conference Paper
Full-text available
In this paper, we test some of the most commonly used classifiers to identify which ones are the most robust to changing environments. The environment may change over time due to some contextual or definitional changes. The environment may change with location. It would be surprising if the performance of common classifiers did not degrade with the...
Article
Statistically based metrics for gas turbine engine diagnostic systems are required to evaluate competing products fairly and to establish a convincing business case. Diagnostic algorithm validation often includes engine testing with implanted faults. The implantation rate is rarely, if ever, representative of the true fault occurrence rate. A techn...
Article
Full-text available
Algorithm performance evaluation is so entrenched in the machine learning community that one could call it an addiction. Like most addictions, it is harmful and very difficult to give up. It is harmful because it has serious limitations. Yet, we have great faith in practicing it in a ritualistic manner: we follow a fixed set of rules telling us the...
Conference Paper
Semi-supervised learning (SSL) is classification where additional unlabeled data can be used to improve accuracy. Generative approaches are appealing in this situation, as a model of the data's probability density can assist in identifying clusters. ...
Article
Full-text available
At various machine learning conferences, at various times, there have been discussions arising from the inability to replicate the experimental results published in a paper. There seems to be a widespread view that we need to do something to address this problem, as it is essential to the advancement of our field. The most compelling argument wou...
Article
Full-text available
The production of accurate compressor maps is an essential, but time consuming, step in gas turbine engine modeling. Insight into how the shape of a map depends on the compressor type, and design point characteristics, should accelerate this exercise. It should also serve as the basis of a more accurate scaling procedure than is currently available...
Conference Paper
Full-text available
The evaluation of classifier performance in a cost-sensitive setting is straightforward if the operating conditions (misclassification costs and class distributions) are fixed and known. When this is not the case, evaluation requires a method of visualizing classifier performance across the full range of possible operating conditions. This talk outlin...
Conference Paper
Full-text available
This paper shows how multi-dimensional functions, describing the operation of complex equipment, can be learned. The functions are points in a shape space, each produced by morphing a prototypical function located at its origin. The prototypical function and the space's dimensions, which define morphological operations, are learned from a set of...
Article
Full-text available
Generalization is at the core of evaluation: we estimate the performance of a model on data we have never seen but expect to encounter later on. Our current evaluation procedures assume that the data already seen is a random sample of the domain from which all future data will be drawn. Unfortunately, in practical situations this is rarely the...
Article
This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because th...
Conference Paper
Full-text available
This paper experimentally compares the performance of discriminative and generative classifiers for cost sensitive learning. There is some evidence that learning a discriminative classifier is more effective for a traditional classification task. This paper explores the advantages, and disadvantages, of using a generative classifier when the miscla...
Article
Full-text available
This paper shows how machine learning can help in analyzing and understanding historical change. Using data from the Canadian census of 1901, we discover the influences on bilingualism in Canada at the beginning of the last century. The discovered theories partly agree with, and partly complement, the existing views of historians on this question....
Article
In 1988, Langley wrote an influential editorial in the journal Machine Learning titled “Machine Learning as an Experimental Science”, arguing persuasively for a greater focus on performance testing. Since that time the emphasis has become progressively stronger. Nowadays, to be accepted to one of our major conferences or journals, a paper must typi...
Conference Paper
Full-text available
This paper argues that severe class imbalance is not just an interesting technical challenge that improved learning algorithms will address; it is much more serious. To be useful, a classifier must appreciably outperform a trivial solution, such as choosing the majority class. Any application that is inherently noisy limits the error rate, and co...
Article
Evaluating classifier performance in a cost-sensitive setting is straightforward if the operating conditions (misclassification costs and class distributions) are fixed and known. When this is not the case, evaluation requires a method of visualizing classifier performance across the full range of possible operating conditions. This paper reviews t...
Article
Full-text available
Anomalies are rare events. For anomaly detection, severe class imbalance is the norm. Although there has been much research into imbalanced classes, there are surprisingly few examples of dealing with severe imbalance. Alternative performance measures have superseded error rate, or accuracy, for algorithm comparison. But whatever their other me...
Article
This technical report discusses the experimental comparison of commonly used algorithms both in their traditional discriminative form and as generative classifiers. The performance is compared using cost curves to see what benefits might be gained by using a generative classifier when the misclassification costs, and class frequencies, are unknown....
Article
Full-text available
This paper shows that ROC curves, as a method of visualizing classifier performance, are inadequate for the needs of Artificial Intelligence researchers in several significant respects, and demonstrates that a different way of visualizing performance -- the cost curves introduced by Drummond and Holte at KDD'2000 -- overcomes these deficiencies.
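As a rough illustration of the cost-curve idea this abstract refers to, here is a minimal Python sketch. It assumes the usual definitions from the cost-curve literature: a classifier with false positive rate FPR and false negative rate FNR is drawn as the straight line FNR * PC(+) + FPR * (1 - PC(+)), where PC(+) is the probability-cost value that combines the positive-class prior with the two misclassification costs. The function names and the example numbers are hypothetical and not taken from the papers.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """Probability-cost value PC(+) for a positive-class prior and two error costs.

    p_pos   : prior probability of the positive class
    cost_fn : cost of a false negative
    cost_fp : cost of a false positive
    """
    weighted_pos = p_pos * cost_fn
    weighted_neg = (1.0 - p_pos) * cost_fp
    return weighted_pos / (weighted_pos + weighted_neg)


def normalized_expected_cost(fpr, fnr, pc_plus):
    """Height of a classifier's cost line at a given PC(+) in [0, 1]."""
    return fnr * pc_plus + fpr * (1.0 - pc_plus)


def lower_envelope(classifiers, pc_plus):
    """Best achievable cost at pc_plus among the given (fpr, fnr) pairs.

    The two trivial classifiers are always included: "always negative"
    (fpr=0, fnr=1) and "always positive" (fpr=1, fnr=0).
    """
    candidates = list(classifiers) + [(0.0, 1.0), (1.0, 0.0)]
    return min(normalized_expected_cost(fpr, fnr, pc_plus)
               for fpr, fnr in candidates)


if __name__ == "__main__":
    # Hypothetical classifier: 20% false positives, 40% false negatives.
    clf = (0.2, 0.4)
    for step in range(0, 11):
        pc = step / 10.0
        print(pc, normalized_expected_cost(*clf, pc), lower_envelope([clf], pc))
```

Reading the printed values from left to right shows where the hypothetical classifier beats the trivial "always negative" and "always positive" baselines and where it does not, which is the comparison an ROC plot leaves implicit.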
Conference Paper
Full-text available
This paper proposes extending semi-supervised learning by allowing an ongoing interaction between a user and the system. The extension is intended not only to speed up search for relevant aircraft engine maintenance records but also to help in improving the user's understanding of the problem domain. After the user has identified a small number...
Article
Full-text available
This paper shows how machine learning can help historians analyze and understand important social phenomena. Using data from the Canadian census of 1901, we discover the influences on bilingualism in Canada at the beginning of the last century. The discovered theories partly agree with, and partly complement, the existing views of historians on this que...
Article
Full-text available
This paper takes a new look at two sampling schemes commonly used to adapt machine algorithms to imbalanced classes and misclassification costs. It uses a performance analysis technique called cost curves to explore the interaction of over and under-sampling with the decision tree learner C4.5. C4.5 was chosen as, when combined with one of the...
Article
Full-text available
Locating software items is difficult, even for knowledgeable software designers, when searching in large, complex and continuously growing libraries. This paper describes a technique we term “active browsing”. An active browser suggests to the designer items it estimates to be close to the target of the search. The novel aspect of active browsing i...
Article
This research addresses the problem of locating software items in extensive libraries. It aims to increase the speed and accuracy with which a user may browse software libraries for reusable code. The method proposed for this is called active browsing. The system monitors user actions, made within a normal browser, to infer an analogue representing...
Article
Full-text available
This paper proposes an alternative to ROC representation, in which the expected cost of a classifier is represented explicitly. This expected cost representation maintains many of the advantages of ROC representation, but is easier to understand. It allows the experimenter to immediately see the range of costs and class frequencies where a particul...
Article
Full-text available
This paper investigates how the splitting criteria and pruning methods of decision tree learning algorithms are influenced by misclassification costs or changes to the class distribution. Splitting criteria that are relatively insensitive to costs (class distributions) are found to perform as well as or better than, in terms of expected misclassifi...
Article
This thesis demonstrates how the power of symbolic processing can be exploited in the learning of low level control functions. It proposes a novel hybrid architecture with a tight coupling between a variant of symbolic planning and reinforcement learning. This architecture combines the strengths of the function approximation of subsymbolic learning...
Article
This paper presents a system that transfers the results of prior learning to speed up reinforcement learning in a changing world. Often, even when the change to the world is relatively small, an extensive relearning effort is required. The new system exploits strong features in the multi-dimensional function produced by reinforcement learning. The...
Article
This paper demonstrates the exploitation of certain vision processing techniques to index into a case base of surfaces. The surfaces are the result of reinforcement learning and represent the optimum choice of actions to achieve some goal from anywhere in the state space. This paper shows how strong features that occur in the interaction of the s...
Article
Overshoot is caused by the interaction of a function approximator with rapid changes in the input data. In many applications this produces a visually confusing result. In reinforcement learning it can cause the iterative value estimator to diverge. This research investigates b-splines, a very general family of function approximators with a wide ran...
Conference Paper
Describes a technique which we term “intelligent browsing”; the software built upon this technique assists the users in their search through a multimedia database. Searching for an item of interest in a large multimedia database is a complex task, even for users who have knowledge about it. An active (intelligent) browser suggests to the user items...
Article
Full-text available
Locating software items is difficult, even for knowledgeable software designers, when searching in large, complex and continuously growing libraries. This paper describes a technique we term active browsing. An active browser suggests to the designer items it estimates to be close to the target of the search. The novel aspect of active browsing is...
Article
Full-text available
This paper presents a common algorithmic framework encompassing the two main methods for using an abstract solution to guide search. It identifies certain key issues in the design of techniques for using abstraction to guide search. New approaches to these issues give rise to new search techniques. Two of these are described in detail and compared...
Conference Paper
Full-text available
Discusses a novel method called `active browsing' which increases the speed and accuracy with which a user may browse libraries for reusable software. Information inferred solely from the user's normal actions is employed by the system to locate software items relevant to the user's search goal. This paper describes our active browsing system and i...
Article
Full-text available
One effective way to evaluate a prognostic algorithm is to estimate its potential cost savings. Unfortunately, the final cost is dependent on individual costs, seldom easy to obtain and changing over time. It is dependent on component failure rates, also subject to change. This paper shows a way of representing cost savings over a wide range of cos...
Article
Full-text available
This paper describes the task of browsing and an agent we have developed to improve the speed and success rate of browsing. The agent is a learning apprentice: it monitors the user's normal browsing actions and learns a measure of "relevance" to the user interests. It searches the library being browsed, uses the learned measure to evaluate items an...
Article
Full-text available
This paper shows how machine learning can help historians analyze and understand important social phenomena. Using data from the Canadian census of 1901, we discover the influences on bilingualism in Canada at the beginning of the last century. The discovered theories partly agree with, and partly complement, the existing views of historians on this que...
Article
Full-text available
Modern operation of complex systems such as trains and aircraft generates vast amounts of data. This data can be used to help predict component failures which may lead to considerable savings, reduce the number of delays, increase the overall throughput of the organization, and augment safety. Many data mining algorithms, such as neural networks, d...
Article
Over the life time of any piece of complex equipment, the likelihood of a failure and the cost of its repair will change. The best machine learning classifier, for predicting failures, is dependent on these values. This paper presents a way of visualizing expected cost which gives a clear picture as to when a particular classifier is the right one...


Projects (2)
Project
Reproducible research, a growing movement within many scientific fields, including machine learning, would require the code, used to generate the experimental results, be published along with any paper. Probably the most compelling argument for this is that it is simply following good scientific practice, established over the years by the greats of science. The implication is that failure to follow such a practice is unscientific, not a label any machine learning researchers would like to carry. It is further claimed that misconduct is causing a growing crisis of confidence in science. That, without this practice being enforced, science would inevitably fall into disrepute. This viewpoint is becoming ubiquitous but here I offer a differing opinion. I argue that far from being central to science, what is being promulgated is a narrow interpretation of how science works. I contend that the consequences are somewhat overstated. I …