
R. Wayne Oldford, Ph.D.
University of Waterloo | UWaterloo · Statistics & Actuarial Science
About
74 Publications · 25,929 Reads · 670 Citations
Introduction
Additional affiliations
September 1986 - present
Education
September 1979 - August 1982
September 1978 - August 1979
September 1973 - April 1977
Publications (74)
We propose using graph theoretic results to develop an infrastructure that tracks movement from a display of one set of variables
to another. The illustrative example throughout is the real-time morphing of one scatterplot into another. Hurley and Oldford
(J Comput Graph Stat 2010) made extensive use of the graph having variables as nodes and edges...
The structure of a set of high dimensional data objects (e.g. images, documents, molecules,
genetic expressions, etc.) is notoriously difficult to visualize. In contrast, lower dimensional structure
(esp. 3 or fewer dimensions) is natural to us and easy to visualize. A not unreasonable approach,
then, is to explore one low dimensional visualization a...
What is “statistical method”? Is it the same as “scientific method”? This paper answers the first question by specifying the elements and procedures common to all statistical investigations and organizing these into a single structure. This structure is illustrated by careful examination of the first scientific study on the speed of light carried o...
Students of statistics should be challenged to discover the possibilities computational technology has to offer empirical investigation. This paper describes recent experience with a course in statistical computing that tries to do just that. Features of the course and the computing environment which would allow it to be replicated elsewhere are de...
A graph theoretic approach is taken to the component order problem in the layout of statistical graphics. Eulerian tours and Hamiltonian decompositions of complete graphs are used to ameliorate order effects in statistical graphics. Similar traversals of edge weighted graphs are used to amplify the visual effect of selected salient features in the data...
This work examines the problem of clique enumeration on a graph by exploiting its clique covers. The principle of inclusion/exclusion is applied to determine the number of cliques of size $r$ in the graph union of a set $\mathcal{C} = \{c_1, \ldots, c_m\}$ of $m$ cliques. This leads to a deeper examination of the sets involved and to an orbit parti...
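Setting aside the paper's deeper orbit analysis, the basic inclusion/exclusion idea can be sketched: count the $r$-subsets of vertices that lie inside at least one cover clique by alternating sums over intersections of the covers. This is an illustrative sketch only (the function name is hypothetical, and cliques of the union graph that straddle covers are part of what the paper itself addresses):

```python
from itertools import combinations
from math import comb

def count_r_subsets_covered(cliques, r):
    """Count r-subsets of vertices contained in at least one cover
    clique, via inclusion/exclusion over intersections of the covers.
    Works because the r-subsets common to two cliques are exactly the
    r-subsets of their intersection."""
    m = len(cliques)
    total = 0
    for k in range(1, m + 1):
        sign = (-1) ** (k + 1)
        for idx in combinations(range(m), k):
            inter = set.intersection(*(set(cliques[i]) for i in idx))
            total += sign * comb(len(inter), r)
    return total

# Two triangles sharing an edge: covers {1,2,3} and {2,3,4}.
cliques = [{1, 2, 3}, {2, 3, 4}]
print(count_r_subsets_covered(cliques, 2))  # edges: 3 + 3 - 1 = 5
print(count_r_subsets_covered(cliques, 3))  # triangles: 1 + 1 - 0 = 2
```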
Motivated by an approach to visualization of high dimensional statistical data given in Hurley and Oldford (2011), this work examines the clique structure of $J_n(m, m-1)$ Johnson graphs. Cliques and maximal cliques are characterized and proved to be of one of only two types. These types are characterized by features of the intersection and of the...
A novel multinomial theorem for commutative idempotents is shown to lead to new results about the moments, central moments, factorial moments, and their generating functions for any random variable $X = \sum_{i} Y_i $ expressible as a sum of Bernoulli indicator random variables $Y_i$. The resulting expressions are functions of the expectation of pr...
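The driving identity is easy to check numerically: idempotence ($Y_i^2 = Y_i$) gives, for $X = \sum_i Y_i$, $E[X^2] = \sum_i E[Y_i] + \sum_{i \neq j} E[Y_i Y_j]$. The sketch below verifies this in the special (hypothetical) case of independent indicators, where $E[Y_i Y_j] = p_i p_j$; the paper's results cover arbitrary dependence:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.7])

# Idempotence Y_i^2 = Y_i gives E[X^2] = sum_i p_i + sum_{i!=j} E[Y_i Y_j];
# with independent indicators E[Y_i Y_j] = p_i p_j for i != j.
second_moment = p.sum() + (np.outer(p, p).sum() - (p ** 2).sum())

# Monte Carlo check of the identity in the independent case.
Y = rng.random((1_000_000, 3)) < p
X = Y.sum(axis=1)
print(second_moment)                  # 1.4 + (1.96 - 0.78) = 2.58
print((X.astype(float) ** 2).mean())  # close to 2.58
```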
We describe the features and implementation of the R package zenplots (zigzag expanded navigation plots) for displaying high-dimensional data according to the recently proposed zenplots. By default, zenplots lay out alternating one-and two-dimensional plots in a zigzag-like pattern where adjacent axes share the same variate. Zenplots are especially...
A framework for quantifying dependence between random vectors is introduced. Using the notion of a collapsing function, random vectors are summarized by single random variables, referred to as collapsed random variables. Measures of association computed from the collapsed random variables are then used to measure the dependence between random vecto...
During the 2016 US presidential election campaign, Hillary Clinton's emails were the source of much debate and controversy – and may have played a decisive role in her defeat by Donald Trump. Christopher Salahub and Wayne Oldford have built a tool to analyse the contents of her private server. What more can we learn? During the 2016 US presidential...
A framework for quantifying dependence between random vectors is introduced. With the notion of a collapsing function, random vectors are summarized by single random variables, called collapsed random variables in the framework. Using this framework, a general graphical assessment of independence between groups of random variables for arbitrary col...
We present a web-based visualization that allows the user to interactively filter and display characteristics of 32,795 of Hillary Clinton’s emails as provided by Wikileaks. The visualization focuses on the meta-data of each email, including its senders, receivers, and the timestamp the email appeared on the Clinton server (from the Wikileaks sourc...
White balancing is a fundamental step in the image processing pipeline. The process involves estimating the chromaticity of the illuminant or light source and using the estimate to correct the image to remove any color cast. Given the importance of the problem, there has been much previous work on illuminant estimation. Recently, an approach based...
The paper introduces a special case of the Euclidean distance matrix completion problem (edmcp) of interest in statistical data analysis where only the minimal spanning tree distances are given and the matrix completion must preserve the minimal spanning tree. Two solutions are proposed, one an adaptation of a more general method based on a dissimi...
The notion of a zenpath and a zenplot is introduced to search and detect dependence in high-dimensional data for model building and statistical inference. By using any measure of dependence between two random variables (such as correlation, Spearman's rho, Kendall's tau, tail dependence etc.), a zenpath can construct paths through pairs of variable...
Quantile-quantile plots, or qqplots, are an important visual tool for many applications but their interpretation requires some care and often more experience. This apparent subjectivity is unnecessary. By drawing on the computational and display facilities now widely available, qqplots are easily enriched to help with their interpretation. An overv...
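One common form of such enrichment is a simulated pointwise envelope around the qqplot: simulate many standardized normal samples of the same size, sort each, and take per-order-statistic quantiles as a band. The sketch below (hypothetical function name; not necessarily the paper's exact construction) computes the coordinates:

```python
import numpy as np
from scipy import stats

def qq_envelope(x, n_sim=999, seed=0):
    """Normal qqplot coordinates plus a simulated pointwise envelope.
    Each simulated sample is standardized and sorted; quantiles taken
    per order statistic give the band."""
    rng = np.random.default_rng(seed)
    n = len(x)
    z = (np.sort(x) - np.mean(x)) / np.std(x, ddof=1)
    theo = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    sims = rng.standard_normal((n_sim, n))
    sims = (sims - sims.mean(axis=1, keepdims=True)) / sims.std(axis=1, ddof=1, keepdims=True)
    sims.sort(axis=1)
    lo, hi = np.quantile(sims, [0.025, 0.975], axis=0)
    return theo, z, lo, hi

theo, z, lo, hi = qq_envelope(np.random.default_rng(1).standard_normal(100))
print(((z >= lo) & (z <= hi)).mean())  # most points inside the band
```

Plotting `z` against `theo` with the `(lo, hi)` band then turns the usual subjective reading of a qqplot into a direct visual comparison.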
Student Poster by Waddell and Huang, supervised by Oldford for Statistical Society of Canada 2013 data analysis competition.
Techniques for analyzing and synthesizing complex knowledge representations (KRs) may utilize an atomic knowledge representation model including both an elemental data structure and knowledge processing rules stored as machine-readable data and/or programming instructions. One or more of the knowledge processing rules may be applied to analyze an i...
Compilers perform instruction scheduling to improve the performance of code on modern computer architectures. Superblocks—a straight-line sequence of code with a single entry point and multiple possible exit points—are a commonly used scheduling region within compilers. Superblock scheduling is NP-complete, and is done suboptimally in production co...
Automated and purely visual methods for cluster detection are complementary in the circumstances in which they have most value. Automated methods may be routinely applied to data of more than three dimensions, where our visual experience and ability necessarily end. Unfortunately, automated methods rely (implicitly) on pre-defined data patterns and...
PairViz is an R package that produces orderings of statistical objects for visualization purposes. We abstract the ordering problem
to one of constructing edge-traversals of (possibly weighted) graphs. PairViz implements various edge traversal algorithms which are based on Eulerian tours and Hamiltonian decompositions. We describe
these algorithms,...
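The core edge-traversal idea can be illustrated apart from PairViz itself (which is an R package): for odd n, the complete graph K_n has an Eulerian tour, so a single ordering of variables can show every pair as adjacent plot axes exactly once. A minimal Hierholzer's algorithm, written here in Python as a sketch rather than PairViz's own code:

```python
def eulerian_circuit(n):
    """Hierholzer's algorithm on the complete graph K_n (n odd, so
    every vertex has even degree). Returns a vertex sequence that
    uses each edge of K_n exactly once."""
    adj = {v: set(range(n)) - {v} for v in range(n)}
    stack, tour = [0], []
    while stack:
        v = stack[-1]
        if adj[v]:
            w = adj[v].pop()       # take any unused edge (v, w)
            adj[w].discard(v)      # remove it from both endpoints
            stack.append(w)
        else:
            tour.append(stack.pop())
    return tour

tour = eulerian_circuit(5)
print(tour)
# All 10 edges of K5 appear exactly once among adjacent pairs:
edges = {frozenset(e) for e in zip(tour, tour[1:])}
print(len(edges) == len(tour) - 1 == 10)  # True
```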
We discuss the implementation of the RnavGraph R package and demonstrate its functionality
on some high dimensional data. RnavGraph facilitates controlled exploration of high dimensional data
space via (user determined) low dimensional trajectories through that space. The trajectories are paths
on a navigation graph (navGraph), a graph whose nodes...
Two million marine containers arrive each year at Canadian ports, representing a significant percentage of Canada's trade with its overseas partners. While the majority of these commercial shipments are perfectly legitimate, some marine containers are used by criminals to smuggle drugs and weapons. To address this risk, the Canada Border Services...
Statistics, science and public policy IX: government, science and politics : proceedings of the Conference on Statistics, Science and Public Policy held at Herstmonceux Castle, Hailsham, U.K., April 21-24, 2004
Statistics, science and public policy VII: environment, health and globalization : proceedings of the Conference on Statistics, Science and Public Policy held at Herstmonceux Castle, Hailsham, U.K., April 17-20, 2002
by A. M. Herzberg, R. W. Oldford
Hardcover, 254 Pages, Published 2003
Statistics, science and public policy VIII: science, ethics and the law : proceedings of the Conference on Statistics, Science and Public Policy held at Herstmonceux Castle, Hailsham, U.K., April 23-26, 2003
All possible independence structures available between three variables are explored via a simple visual display called an eikosogram (see Cherry and Oldford, 2002). Formal mathematical development is complementary rather than necessary. If well understood, independence structures provide a solid basis for discussion of study design issues and stati...
This book is essentially the author's doctoral thesis wherein computational structures (data and programs) are described as implemented by the author. The book is perhaps mistitled for it suggests that one might have found a substantive review and exploration of the novel computational structures which have appeared in the statistical computing lit...
It has been suggested (Wainer, 1989) that the system first proposed by C. S. Peirce to organise knowledge is particularly suited to describing statistical graphics. Peirce felt that all information could be broken down into three different types: monadic information, which describes something in and of itself, dyadic information, which describes a r...
Every software project has reasons for its existence and continued development. Quail extends the ANSI standard language COMMON LISP to facilitate data analysis and statistical modelling. Important extensions include a rich object-oriented statistical graphics and general interface building system, multi-way array manipulation, in addition t...
A physical device is described that can be used by students in a laboratory setting to discover the potential effects of confounding, the important role played by randomization in experimental design, and the value of good blocking. If the self-discovery approach of a physical laboratory is not possible, the instructor can use the device on an over...
A history of the determination of the speed of light in vacuo is presented as a case study for a general approach to empirical problem solving. One 1879 study by Michelson is presented in detail.
All of his reported measurements are reproduced here. Important studies which preceded it and which followed it are presented in a historical order and ar...
This volume presents a selection of papers from the Fourth International Workshop on Artificial Intelligence and Statistics. This biennial workshop brings together researchers from both fields to discuss problems of mutual interest and to compare approaches to their solution. The fourth workshop focused on the topic of selecting models from data. A...
This volume is a selection of papers presented at the Fourth International Workshop on Artificial Intelligence and Statistics held in January 1993. These biennial workshops have succeeded in bringing together researchers from Artificial Intelligence and from Statistics to discuss problems of mutual interest. The exchange has broadened research in b...
We describe our software design and implementation of a wide variety of response models, which model the values of a response variable as an interpretable function of explanatory variables. A distinguishing characteristic of our approach is the attention given to building software abstractions which closely mimic their statistical counterparts.
Selecting models from data: artificial intelligence and statistics IV
by P. Cheeseman and R. W. Oldford (Editors),
Paperback, 487 Pages, Published 1994
http://link.springer.com/chapter/10.1007/978-1-4612-2660-4_42
This is a 1992 University of Waterloo technical report on the statistical graphics model used in Quail.
The model is the one known as the "Views" system in Quail and is based on a heavily object-oriented design that follows a natural hierarchy of plot elements in statistical graphics. The model is designed for interactive statistical graphics (e.g....
Constraint-oriented programming has been a research topic in Computer Science since at least 1963. Advances in computer technology have allowed it to become a more active area in the last decade. In this paper an introduction to constraint-oriented programming is given. A general software model for constraint-oriented programming in an object-orient...
This paper presents some general approaches to building software representations of statistical strategies. In statistics, strategy is the skilful management of data, probability models, experimental designs, and other statistical concepts. This paper addresses the representation of these concepts separately from the representation of the actions t...
This paper presents new software designs for statistical data. These are implemented using an object-oriented programming paradigm. The implementations are built in a layered fashion from independent representations for the individual, variate, and datum components of a statistical observation to representations for univariate samples and multivari...
A prototype statistical system the authors call DINDE is described. DINDE is aimed at the professional statistician and provides a statistical analysis environment that is more sophisticated than the current generation of systems. In particular, it allows the analyst to keep careful track of the entire analysis as it progresses. General design phil...
This paper presents software designed to aid the interactive management of a statistical analysis. A graphical interface is proposed which allows the analyst to keep track of the analysis and manage it as it is being carried out. The implementation is in an experimental statistical system, but the design principles apply more generally. Interactive...
A new paradigm for statistical computing that has been evolving is identified. The paradigm of abstract statistical computing is to use software to model (possibly abstract) statistical concepts. Some examples from recent statistical computing research are discussed to illustrate the paradigm.
The n-dimensional geometry of collinearity and data that are influential in least-squares linear regression is explored. A generalization of vector space dimensionality is introduced to provide an intuitive description of these problems. It is also noted that this new measure of dimensionality plays the role of the usual dimension in a James-Stein...
DINDE is a highly interactive display oriented system where the user carries out data analysis by building and maintaining a network representation of it. The network links statistically meaningful objects (e.g. scatterplots, regression results, etc.) and is displayed in a mouse-sensitive window. We describe this network model for a statistical ana...
We explore the possibility of using software like that developed by the AI community as a medium in which some of the strategies useful in practice may be recorded and examined. Particular attention will be paid to discerning the kinds of strategies, and which of their properties, might reasonably be studied in this medium. Examples of the implemen...
We describe a prototype system, which we call DINDE, and the directed network model of statistical analysis on which it is currently based. DINDE is a highly interactive display oriented system where the user carries out the analysis by building and maintaining a network representation of it. An example analysis is used to describe this interaction...
The notion of a conditioning analysis of a general, nonlinear set of relations is defined along with an associated definition of ill conditioning. From these, one may identify at least three different kinds of conditioning analyses of interest in statistics and econometrics: data, estimator, and criterion conditioning. While these three coincide in...
We discuss the design and implementation of object-oriented datatypes for a sophisticated statistical analysis environment. The discussion draws on our experience with an experimental statistical analysis system, called DINDE. DINDE resides in the integrated programming environment of a Xerox Interlisp-D machine running LOOPS. The discussion begins...
Approximations to the bootstrap estimate of bias and variance may be obtained by replacing the estimate to be bootstrapped by one which is linear, or quadratic, in the resampling vector p. The bootstrap bias and variance of these approximations may then be evaluated analytically. These estimators are discussed and then investigated via a Monte Carlo experimen...
A mathematical theory is presented which extends the geometric theory of vector spaces to deal particularly with finite collections of vectors. This theory is then exploited in the case of the linear model to describe the geometry of certain practically relevant issues such as least-squares regression diagnostics.
The wide availability of statistical software which could be easily misused by naive users, together with the demonstrated success of `knowledge-based' or `expert' systems in other domains, prompted several statisticians to explore the possibility of introducing expert systems technology into statistical software.
This paper reports on our progre...
It is argued here that the essential phenomenon of import which C.P. Snow described in 1959 as that of two distinct non-communicating cultures (one of 'literary intellectuals', one of 'scientific intellectuals') is better described as a shift in emphasis within the university culture from a humanities dominated one to a science dominated one. Soci...
Eikosograms are diagrams which embed the rules of probability and can be used to understand and to explore the probabilistic structure involving one or more categorical variables. Rectangular areas correspond to probabilities and can be used to calculate their numerical value and to determine the Bayes relation. Eikosograms are used here to resol...
Diagrams convey information, some intended, some not. A history of the information content of ringed diagrams and their use by Euler and Venn is given. It is argued that for the purposes of teaching introductory probability, Venn diagrams are either inappropriate or inferior to other diagrams. A diagram we call an eikosogram is shown to be coinciden...
High breakdown, without other measures of estimator resistance, is an inadequate goal for regression estimators. This is shown by constructing an easily computed regression estimator with 50% breakdown. The estimator is essentially least squares.
Diagrams convey information, some intended, some not. A history of ringed diagrams including their use by Euler and Venn shows that the information content of these diagrams is consistent and inescapable - they describe abstract interrelations between different entities. This historical consistency predates and would have been known to both Euler and...
In 1999, approximately 40 leading scientists, statisticians, public science administrators, and journalists were invited to Herstmonceux Castle in Hailsham, England for the fourth conference on Statistics, Science, and Public Policy. The theme of this conference was "The Two Cultures?" in recognition of the fortieth anniversary of C.P. Snow's famo...
May 9, 2002 "This is a time of great opportunity. What some call globalization is in fact the triumph of human liberty across national borders. We have today the chance to prove that freedom can work not just in the new world or old world, but in the whole world. Our great challenge is to include all the world's poor in an expanding circle of devel...
This is the paper where we look at the use of eikosograms to show independence.
We present improved graphical displays for two classical data analysis problems - the comparisons of treatments in a one-way layout, and the assessment of interaction between factors in a two-factor experiment.
In both settings the key to the improved display is that all n choose 2 pairs of factor levels are compared, whereas conventional displays...
The debate about the appropriate computer human interface (direct manipulation versus command line) is an old one and a false one. That it regularly arises is a consequence of inappropriate software design. An ideally designed system would freely mix the two. D.A. Norman's 1988 Design of Everyday Things is condensed to essential principles which are...
Questions
Question (1)
When recording eye tracks, the tracks of the two eyes are recorded at once, and they always seem to be separated. One imagines that the gaze should be focused and that the two tracks should agree most of the time.