R. Wayne Oldford

University of Waterloo | UWaterloo · Statistics & Actuarial Science

Ph.D.

About

74
Publications
25,929
Reads
670
Citations
Additional affiliations
September 1986 - present
University of Waterloo
Position
  • Professor (Full)
September 1982 - August 1986
Massachusetts Institute of Technology
Position
  • Principal Research Associate
Description
  • Research in CCREMS, taught grad stats courses for the Math department, some internal consulting (Speech recognition lab).
May 1977 - January 1980
Statistics Canada
Position
  • Junior Survey Methodologist
Description
  • Dates: May 1977 to August 1978, August 1979. On unpaid educational leave otherwise. Did not return in 1980. Worked on the Agricultural Enumerative Survey.
Education
September 1979 - August 1982
University of Toronto
Field of study
  • Statistics
September 1978 - August 1979
University of Toronto
Field of study
  • Statistics
September 1973 - April 1977
University of Waterloo
Field of study
  • Double major: 1. Statistics; 2. Combinatorics & Optimization.

Publications

Publications (74)
Article
Full-text available
We propose using graph theoretic results to develop an infrastructure that tracks movement from a display of one set of variables to another. The illustrative example throughout is the real-time morphing of one scatterplot into another. Hurley and Oldford (J Comput Graph Stat 2010) made extensive use of the graph having variables as nodes and edges...
Conference Paper
Full-text available
The structure of a set of high dimensional data objects (e.g. images, documents, molecules, genetic expressions, etc.) is notoriously difficult to visualize. In contrast, lower dimensional structure (esp. 3 or fewer dimensions) is natural to us and easy to visualize. A not unreasonable approach, then, is to explore one low dimensional visualization a...
Article
Full-text available
What is “statistical method”? Is it the same as “scientific method”? This paper answers the first question by specifying the elements and procedures common to all statistical investigations and organizing these into a single structure. This structure is illustrated by careful examination of the first scientific study on the speed of light carried o...
Article
Full-text available
Students of statistics should be challenged to discover the possibilities computational technology has to offer empirical investigation. This paper describes recent experience with a course in statistical computing that tries to do just that. Features of the course and the computing environment which would allow it to be replicated elsewhere are de...
Article
Full-text available
A graph theoretic approach is taken to the component order problem in the layout of statistical graphics. Eulerian tours and Hamiltonian decompositions of complete graphs are used to ameliorate order effects in statistical graphics. Similar traversals of edge weighted graphs are used to amplify the visual effect of selected salient features in the data...
Article
Full-text available
This work examines the problem of clique enumeration on a graph by exploiting its clique covers. The principle of inclusion/exclusion is applied to determine the number of cliques of size $r$ in the graph union of a set $\mathcal{C} = \{c_1, \ldots, c_m\}$ of $m$ cliques. This leads to a deeper examination of the sets involved and to an orbit parti...
Preprint
Full-text available
Motivated by an approach to visualization of high dimensional statistical data given in Hurley and Oldford (2011), this work examines the clique structure of $J_n(m, m-1)$ Johnson graphs. Cliques and maximal cliques are characterized and proved to be of one of only two types. These types are characterized by features of the intersection and of the...
Preprint
Full-text available
This work examines the problem of clique enumeration on a graph by exploiting its clique covers. The principle of inclusion/exclusion is applied to determine the number of cliques of size $r$ in the graph union of a set $\mathcal{C} = \{c_1, \ldots, c_m\}$ of $m$ cliques. This leads to a deeper examination of the sets involved and to an orbit parti...
Preprint
Full-text available
A novel multinomial theorem for commutative idempotents is shown to lead to new results about the moments, central moments, factorial moments, and their generating functions for any random variable $X = \sum_{i} Y_i $ expressible as a sum of Bernoulli indicator random variables $Y_i$. The resulting expressions are functions of the expectation of pr...
Article
Full-text available
We describe the features and implementation of the R package zenplots (zigzag expanded navigation plots) for displaying high-dimensional data according to the recently proposed zenplots. By default, zenplots lay out alternating one-and two-dimensional plots in a zigzag-like pattern where adjacent axes share the same variate. Zenplots are especially...
Article
A framework for quantifying dependence between random vectors is introduced. Using the notion of a collapsing function, random vectors are summarized by single random variables, referred to as collapsed random variables. Measures of association computed from the collapsed random variables are then used to measure the dependence between random vecto...
Article
Full-text available
During the 2016 US presidential election campaign, Hillary Clinton's emails were the source of much debate and controversy – and may have played a decisive role in her defeat by Donald Trump. Christopher Salahub and Wayne Oldford have built a tool to analyse the contents of her private server. What more can we learn? During the 2016 US presidential...
Article
Full-text available
A framework for quantifying dependence between random vectors is introduced. With the notion of a collapsing function, random vectors are summarized by single random variables, called collapsed random variables in the framework. Using this framework, a general graphical assessment of independence between groups of random variables for arbitrary col...
Working Paper
Full-text available
We present a web-based visualization that allows the user to interactively filter and display characteristics of 32,795 of Hillary Clinton’s emails as provided by Wikileaks. The visualization focuses on the meta-data of each email, including its senders, receivers, and the timestamp the email appeared on the Clinton server (from the Wikileaks sourc...
Article
Full-text available
White balancing is a fundamental step in the image processing pipeline. The process involves estimating the chromaticity of the illuminant or light source and using the estimate to correct the image to remove any color cast. Given the importance of the problem, there has been much previous work on illuminant estimation. Recently, an approach based...
Article
Full-text available
The paper introduces a special case of the Euclidean distance matrix completion problem (edmcp) of interest in statistical data analysis where only the minimal spanning tree distances are given and the matrix completion must preserve the minimal spanning tree. Two solutions are proposed, one an adaptation of a more general method based on a dissimi...
Article
The notion of a zenpath and a zenplot is introduced to search and detect dependence in high-dimensional data for model building and statistical inference. By using any measure of dependence between two random variables (such as correlation, Spearman's rho, Kendall's tau, tail dependence etc.), a zenpath can construct paths through pairs of variable...
Article
Full-text available
Quantile-quantile plots, or qqplots, are an important visual tool for many applications but their interpretation requires some care and often more experience. This apparent subjectivity is unnecessary. By drawing on the computational and display facilities now widely available, qqplots are easily enriched to help with their interpretation. An overv...
Research
Full-text available
Student Poster by Waddell and Huang, supervised by Oldford for Statistical Society of Canada 2013 data analysis competition.
Patent
Full-text available
Techniques for analyzing and synthesizing complex knowledge representations (KRs) may utilize an atomic knowledge representation model including both an elemental data structure and knowledge processing rules stored as machine-readable data and/or programming instructions. One or more of the knowledge processing rules may be applied to analyze an i...
Article
Full-text available
Compilers perform instruction scheduling to improve the performance of code on modern computer architectures. Superblocks—a straight-line sequence of code with a single entry point and multiple possible exit points—are a commonly used scheduling region within compilers. Superblock scheduling is NP-complete, and is done suboptimally in production co...
Conference Paper
Full-text available
The structure of a set of high dimensional data objects (e.g. images, documents, molecules, genetic expressions, etc.) is notoriously difficult to visualize. In contrast, lower dimensional structures (esp. 3 or fewer dimensions) are natural to us and easy to visualize. A not unreasonable approach then is to explore one low dimensional visualization...
Conference Paper
Full-text available
Automated and purely visual methods for cluster detection are complementary in the circumstances in which they have most value. Automated methods may be routinely applied to data of more than three dimensions, where our visual experience and ability necessarily end. Unfortunately, automated methods rely (implicitly) on pre-defined data patterns and...
Article
Full-text available
PairViz is an R package that produces orderings of statistical objects for visualization purposes. We abstract the ordering problem to one of constructing edge-traversals of (possibly weighted) graphs. PairViz implements various edge traversal algorithms which are based on Eulerian tours and Hamiltonian decompositions. We describe these algorithms,...
Conference Paper
Full-text available
We discuss the implementation of the RnavGraph R package and demonstrate its functionality on some high dimensional data. RnavGraph facilitates controlled exploration of high dimensional data space via (user determined) low dimensional trajectories through that space. The trajectories are paths on a navigation graph (navGraph), a graph whose nodes...
Article
Full-text available
Two million marine containers arrive each year at Canadian ports, representing a significant percentage of Canada's trade with its overseas partners. While the majority of these commercial shipments are perfectly legitimate, some marine containers are used by criminals to smuggle drugs and weapons. To address this risk, the Canada Border Services...
Book
Statistics, science and public policy IX: government, science and politics : proceedings of the Conference on Statistics, Science and Public Policy held at Herstmonceux Castle, Hailsham, U.K., April 21-24, 2004
Book
Statistics, science and public policy VII: environment, health and globalization : proceedings of the Conference on Statistics, Science and Public Policy held at Herstmonceux Castle, Hailsham, U.K., April 17-20, 2002 by A. M. Herzberg, R. W. Oldford Hardcover, 254 Pages, Published 2003
Book
Statistics, science and public policy VIII: science, ethics and the law : proceedings of the Conference on Statistics, Science and Public Policy held at Herstmonceux Castle, Hailsham, U.K., April 23-26, 2003
Article
Full-text available
All possible independence structures available between three variables are explored via a simple visual display called an eikosogram (see Cherry and Oldford, 2002). Formal mathematical development is complementary rather than necessary. If well understood, independence structures provide a solid basis for discussion of study design issues and stati...
Article
This book is essentially the author's doctoral thesis wherein computational structures (data and programs) are described as implemented by the author. The book is perhaps mistitled for it suggests that one might have found a substantive review and exploration of the novel computational structures which have appeared in the statistical computing lit...
Article
Full-text available
It has been suggested (Wainer, 1989) that the system first proposed by C. S. Peirce to organise knowledge is particularly suited to describing statistical graphics. Peirce felt that all information could be broken down into three different types: monadic information, which describes something in and of itself, dyadic information, which describes a r...
Article
Full-text available
Every software project has reasons for its existence and continued development. Quail extends the ANSI standard language COMMON LISP to facilitate data analysis and statistical modelling. Important extensions include a rich object-oriented statistical graphics and general interface building system, multi-way array manipulation, in addition t...
Article
Full-text available
A physical device is described that can be used by students in a laboratory setting to discover the potential effects of confounding, the important role played by randomization in experimental design, and the value of good blocking. If the self-discovery approach of a physical laboratory is not possible, the instructor can use the device on an over...
Technical Report
Full-text available
A history of the determination of the speed of light in vacuo is presented as a case study for a general approach to empirical problem solving. One 1879 study by Michelson is presented in detail. All of his reported measurements are reproduced here. Important studies which preceded it and which followed it are presented in a historical order and ar...
Article
This volume presents a selection of papers from the Fourth International Workshop on Artificial Intelligence and Statistics. This biennial workshop brings together researchers from both fields to discuss problems of mutual interest and to compare approaches to their solution. The fourth workshop focused on the topic of selecting models from data. A...
Book
Full-text available
This volume is a selection of papers presented at the Fourth International Workshop on Artificial Intelligence and Statistics held in January 1993. These biennial workshops have succeeded in bringing together researchers from Artificial Intelligence and from Statistics to discuss problems of mutual interest. The exchange has broadened research in b...
Chapter
Full-text available
We describe our software design and implementation of a wide variety of response models, which model the values of a response variable as an interpretable function of explanatory variables. A distinguishing characteristic of our approach is the attention given to building software abstractions which closely mimic their statistical counterparts.
Book
Selecting models from data: artificial intelligence and statistics IV by P. Cheeseman and R. W. Oldford (Editors), Paperback, 487 Pages, Published 1994 http://link.springer.com/chapter/10.1007/978-1-4612-2660-4_42
Article
Full-text available
This is a 1992 University of Waterloo technical report on the statistical graphics model used in Quail. The model is that known as the "Views" system in Quail and is based on a heavily object-oriented design that follows a natural hierarchy of plot elements in statistical graphics. The model is designed for interactive statistical graphics (e.g....
Conference Paper
Full-text available
Constraint-oriented programming has been a research topic in Computer Science since at least 1963. Advances in computer technology have allowed it to become a more active area in the last decade. In this paper an introduction to constraint-oriented programming is given. A general software model for constraint-oriented programming in an object-orient...
Article
Full-text available
This paper presents some general approaches to building software representations of statistical strategies. In statistics, strategy is the skilful management of data, probability models, experimental designs, and other statistical concepts. This paper addresses the representation of these concepts separately from the representation of the actions t...
Article
Full-text available
This paper presents new software designs for statistical data. These are implemented using an object-oriented programming paradigm. The implementations are built in a layered fashion from independent representations for the individual, variate, and datum components of a statistical observation to representations for univariate samples and multivari...
Article
Full-text available
A prototype statistical system the authors call DINDE is described. DINDE is aimed at the professional statistician and provides a statistical analysis environment that is more sophisticated than the current generation of systems. In particular, it allows the analyst to keep careful track of the entire analysis as it progresses. General design phil...
Technical Report
Full-text available
This paper presents software designed to aid the interactive management of a statistical analysis. A graphical interface is proposed which allows the analyst to keep track of the analysis and manage it as it is being carried out. The implementation is in an experimental statistical system, but the design principles apply more generally. Interactive...
Conference Paper
Full-text available
A new paradigm for statistical computing that has been evolving is identified. The paradigm of abstract statistical computing is to use software to model (possibly abstract) statistical concepts. Some examples from recent statistical computing research are discussed to illustrate the paradigm.
Article
Full-text available
The n-dimensional geometry of collinearity and data that are influential in least-squares linear regression is explored. A generalization of vector space dimensionality is introduced to provide an intuitive description of these problems. It is also noted that this new measure of dimensionality plays the role of the usual dimension in a James-Stein...
Conference Paper
Full-text available
DINDE is a highly interactive display oriented system where the user carries out data analysis by building and maintaining a network representation of it. The network links statistically meaningful objects (e.g. scatterplots, regression results, etc.) and is displayed in a mouse-sensitive window. We describe this network model for a statistical ana...
Chapter
Full-text available
We explore the possibility of using software like that developed by the AI community as a medium in which some of the strategies useful in practice may be recorded and examined. Particular attention will be paid to discerning the kinds of strategies, and which of their properties, might reasonably be studied in this medium. Examples of the implemen...
Conference Paper
Full-text available
We describe a prototype system, which we call DINDE, and the directed network model of statistical analysis on which it is currently based. DINDE is a highly interactive display oriented system where the user carries out the analysis by building and maintaining a network representation of it. An example analysis is used to describe this interaction...
Article
The notion of a conditioning analysis of a general, nonlinear set of relations is defined along with an associated definition of ill conditioning. From these, one may identify at least three different kinds of conditioning analyses of interest in statistics and econometrics: data, estimator, and criterion conditioning. While these three coincide in...
Article
Full-text available
We discuss the design and implementation of object-oriented datatypes for a sophisticated statistical analysis environment. The discussion draws on our experience with an experimental statistical analysis system, called DINDE. DINDE resides in the integrated programming environment of a Xerox Interlisp-D machine running LOOPS. The discussion begins...
Article
Full-text available
Approximations to the bootstrap estimate of bias and variance may be obtained by replacing the estimate to be bootstrapped by one which is linear or quadratic in the resampling vector p. The bootstrap bias and variance of these linear and quadratic approximations may then be evaluated analytically. These estimators are discussed and then investigated via a Monte Carlo experimen...
Technical Report
Full-text available
A mathematical theory is presented which extends the geometric theory of vector spaces to deal particularly with finite collections of vectors. This theory is then exploited in the case of the linear model to describe the geometry of certain practically relevant issues such as least-squares regression diagnostics.
Article
Full-text available
The wide availability of statistical software which could be easily misused by naive users, together with the demonstrated success of 'knowledge-based' or 'expert' systems in other domains, prompted several statisticians to explore the possibility of introducing expert systems technology into statistical software. This paper reports on our progre...
Article
Full-text available
It is argued here that the essential phenomenon of import which C.P. Snow described in 1959 as that of two distinct non-communicating cultures, one of 'literary intellectuals', one of 'scientific intellectuals', is better described as a shift in emphasis within the university culture from a humanities dominated one to a science dominated one. Soci...
Article
Full-text available
Eikosograms are diagrams which embed the rules of probability and can be used to understand and to explore the probabilistic structure involving one or more categorical variables. Rectangular areas correspond to probabilities and can be used to calculate their numerical value and to determine the Bayes relation. Eikosograms are used here to resol...
Article
Full-text available
Diagrams convey information, some intended some not. A history of the information content of ringed diagrams and their use by Euler and Venn is given. It is argued that for the purposes of teaching introductory probability, Venn diagrams are either inappropriate or inferior to other diagrams. A diagram we call an eikosogram is shown to be coinciden...
Article
Full-text available
High breakdown, without other measures of estimator resistance, is an inadequate goal for regression estimators. This is shown by constructing an easily computed regression estimator with 50% breakdown. The estimator is essentially least squares.
Article
Full-text available
Diagrams convey information, some intended some not. A history of ringed diagrams including their use by Euler and Venn shows that the information content of these diagrams is consistent and inescapable - they describe abstract interrelations between different entities. This historical consistency predates and would have been known to both Euler and...
Article
Full-text available
In 1999, approximately 40 leading scientists, statisticians, public science administrators, and journalists were invited to Herstmonceux Castle in Hailsham, England for the fourth conference on Statistics, Science, and Public Policy. The theme of this conference was "The Two Cultures?" in recognition of the fortieth anniversary of C.P. Snow's famo...
Article
Full-text available
May 9, 2002 "This is a time of great opportunity. What some call globalization is in fact the triumph of human liberty across national borders. We have today the chance to prove that freedom can work not just in the new world or old world, but in the whole world. Our great challenge is to include all the world's poor in an expanding circle of devel...
Article
Full-text available
This is the paper where we look at the use of eikosograms to show independence.
Article
Full-text available
We present improved graphical displays for two classical data analysis problems - the comparisons of treatments in a one-way layout, and the assessment of interaction between factors in a two-factor experiment. In both settings the key to the improved display is that all n choose 2 pairs of factor levels are compared, whereas conventional displays...
Article
Full-text available
The debate about the appropriate computer-human interface, direct manipulation versus command line, is an old one and a false one. That it regularly arises is a consequence of inappropriate software design. An ideally designed system would freely mix the two. D.A. Norman's 1988 Design of Everyday Things is condensed to essential principles which are...

Questions

Question (1)
Question
When recording eye tracks, the two eye tracks are recorded at once and they seem always to be separated.  One imagines that the gaze should be focused and that they should agree most of the time.
