## About

Publications: 163

Reads: 23,425

Citations: 13,768

## Publications

Cross-validation is an important evaluation strategy in behavioral predictive modeling; without it, a predictive model is likely to be overly optimistic. Statistical methods have been developed that allow researchers to straightforwardly cross-validate predictive models by using the same data employed to construct the model. In the present study, c...

The ten most frequently cited articles appearing in Psychometrika since its establishment in 1936 are highlighted in a series of ten commentaries. They are grouped in three themes, with chronological ordering within these groups. The first group regards some characteristic problems in factor analysis, the second group is about the analysis of proxi...

A review is provided for the creation of the Psychometric Society in 1935, and the establishment of its journal, Psychometrika, in 1936. This document is part of the 80th anniversary celebration for Psychometrika's founding, held during the annual meeting of the Psychometric Society in July of 2016 in Asheville, NC.

For 30 years, the adjusted Rand index has been the preferred method for comparing 2 partitions (e.g., clusterings) of a set of observations. Although the index is widely used, little is known about its variability. Herein, the variance of the adjusted Rand index (Hubert & Arabie, 1985) is provided and its properties are explored. It is shown that a...
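As an illustration of the index this abstract refers to, the following is a minimal sketch of the adjusted Rand index computed from the contingency table of two partitions. The function name `adjusted_rand_index` and the convention of returning 1.0 for degenerate inputs are assumptions made for this example; the variance result described in the abstract is not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions given as label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed")

    # Contingency table cells and its row and column margins.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Numbers of object pairs placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_a = sum(comb(c, 2) for c in rows.values())
    together_b = sum(comb(c, 2) for c in cols.values())

    # Chance-expected value and maximum of the unadjusted pair count.
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2

    if maximum == expected:
        # Degenerate case (e.g. both partitions trivial); assumed convention.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

Relabeling the classes of either partition leaves the value unchanged, since only the pattern of class overlap enters the computation.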

When prediction using a diagnostic test outperforms simple prediction using base rates, the test is said to be "clinically efficient," a term first introduced into the literature by Meehl and Rosen (1955) in Psychological Bulletin. This article provides three equivalent conditions for determining the clinical efficiency of a diagnostic test: (a) Me...

Individual differences on spatial tasks were examined relative to differences in free-androgen levels. A spatial test battery was administered to 91 males and females who differed in free-androgen levels as determined by a radioimmunoassay. Polynomial regression analyses yielded significant curvilinear functions relating spatial scores and androgen...

Keep on taking the tablets – but for how long? New drugs can be tested for only a short time, but Vioxx and Avandia proved dangerous over the long term, and patients may take such drugs for years. So how can we establish the safety of long-term use? Howard Wainer finds that makers of light bulbs and door hinges can guide us.

The data analysis problem of ordering or sequencing a set of objects using an asymmetric proximity function is reviewed, with an emphasis on literature not generally referenced in psychology. In particular, an attempt is made to present some of the more important approaches to the asymmetric seriation task that have been developed independently und...

The concept of a spanning tree for a weighted graph is used to characterize several methods of clustering a set of objects. In particular, most of the paper is devoted to stating relationships between spanning trees, single-link and complete-link hierarchical clustering, network flow and two divisive clustering procedures. Several related topics us...

A further discussion of Brennan & Light's (1974) measure of nominal scale response agreement between two raters is given. Specifically, a monotonic function of Brennan & Light's statistic is obtained in terms of a generalized correlation coefficient and provided with a descriptive probabilistic interpretation. Under the different assumptions of fix...

Using the notion of a generalized proximity function comparison discussed by Hubert (1978), alternative indices are proposed for evaluating correlational pattern within a multitrait-multimethod matrix. The measures that are presented and the inference paradigm based on random sampling only require the ordering of the correlations in the given multi...

Using an occupancy model developed from combinatorics, the prototypic single-link and complete-link hierarchical clustering methods are considered to be at the two extremes of a space distortion clustering continuum. Two approaches for attacking the space distortion problem are suggested: (i) using an intermediate r-diameter criterion that includes...

Generalizations are given for a simple cross-product statistic that has been used to compare two proximity functions defined on the Cartesian product of a given set S. The extensions developed here are based on a similar cross-product form but rely on two functions defined either on S × S × S or on S × S × S × S. Typically, the latter are obtained...

A randomization model appropriate for evaluating priority effects in free recall (i.e. whether ‘new’ items are recalled prior to ‘old’ items) is discussed and related to well-known non-parametric significance tests. In particular, the bases for the measures that have been suggested in the psychological literature may be interpreted either in terms...

A least-squares strategy is developed for representing a symmetric proximity matrix containing similarity or dissimilarity values between each pair of objects from some given set, as an approximate sum of a small number of symmetric matrices having the same size as the original but which satisfy certain simple order constraints on their entries. Th...

Based on a given asymmetric proximity function, a two-stage computational heuristic for sequencing a set of objects along a continuum is presented and illustrated with the type of example common in the paired-comparison literature. The first stage, defined by the pairwise interchange of objects, is intended to generate reasonably good orderings fro...

This paper selectively reviews some of the recent developments in sequencing objects along a continuum that rely upon a symmetric proximity measure defined between the objects to be seriated. Most of the discussion is from a graph-theoretical point of view and emphasizes the notion of a maximal spanning tree and related ‘crude’ seriation strategies...

A least squares optimization strategy is first reviewed and applied to the task of fitting a given collection of symmetric proximity values defined between the objects from one set by a collection of reconstructed proximity values, satisfying a fixed set of constraints, generated from some specified graph-theoretic structure, such as an ultrametric...

A technique for evaluating the non-randomness of a proximity matrix prior to a cluster analysis has been suggested by Ogilvie which incorporates an asymptotic result developed by Erdős & Rényi for the size of the largest connected component in a random graph. On the basis of Ogilvie's Monte Carlo comparisons, the use of Erdős & Rényi's asymptotic f...

An integrated iterative method is presented for the optimal ordering and scaling of objects in multivariate data, where the variables themselves may be transformed in the process of optimizing the objective function. Given an ordering of objects, optimal transformation of variables is guaranteed by the combined use of majorization (a particular (su...

Most published measures of spatial autocorrelation (SA) can be recast as a (normalized) cross-product statistic that indexes the degree of relationship between corresponding entries from two matrices—one specifying the spatial connections among a set of n locations, and the other reflecting a very explicit definition of similarity between the set o...

Research on group differences in interests has often focused on structural hypotheses and mean-score differences in Holland’s (1997) theory, with comparatively little research on basic interest measures. Group differences in interest profiles were examined using statistical methods for matching individuals with occupations, the C-index, Q correlati...

A fundamental concept encountered in the field of classification is that of an ultrametric which serves as a mechanism for characterizing collections of hierarchically organized partitions for some given object set. This chapter discusses the imposition of a given fixed order, in constructing and displaying an ultrametric. Then the extensions of th...

Noteworthy progress has been made in the development of statistical models for evaluating the structure of vocational interests over the past three decades. It is proposed that historically significant interest datasets, when combined with modern structural methods of data analysis, provide an opportunity to re-examine the underlying assumptions of...

This paper proposes an order-constrained K-means cluster analysis strategy, and implements that strategy through an auxiliary quadratic assignment optimization heuristic that identifies an initial object order. A subsequent dynamic programming recursion is applied to optimally subdivide the object set subject to the order constraint. We show that a...

Keywords; Weighting Schemes for the Fixed (Target) Matrix Q; Single Cluster Statistics; Partition Statistics; Partition Hierarchy Statistics; Alternative Assignment Indices; Modifications of the Target Matrix Q; See also; References

Edwin Diday, some two decades ago, was among the first few individuals to recognize the importance of the (anti-)Robinson form for representing a proximity matrix, and was the leader in suggesting how such matrices might be depicted graphically (as pyramids). We characterize the notions of an anti-Robinson (AR) and strongly anti-Robinson (SAR) matr...

Preface Part I. (Multi- and Unidimensional) City-Block Scaling: 1. Linear unidimensional scaling 2. Linear multidimensional scaling 3. Circular scaling 4. LUS for two-mode proximity data Part II. The Representation of Proximity Matrices by Tree Structures: 5. Ultrametrics for symmetric proximity data 6. Additive trees for symmetric proximity data 7...

In this paper we present a methodology for comparing the adequacy of two or more models in terms of how well they represent a given data set. A set of interregional migration models is tested, including several variations of push-pull models, Wilson's entropy maximizing model, a quadratic programming solution, and an ANOVA model. Testing is undertak...

On Saturday, July 2, the lead headline in The New York Times read as follows: "O'Connor to Retire, Touching Off Battle Over Court." Opening the story attached to the headline, Richard W. Stevenson wrote, "Justice Sandra Day O'Connor, the first woman to serve on the United States Supreme Court and a critical swing vote on abortion and a host of oth...

This paper is divided into two main parts: The first concerns the history of Multidimensional Scaling (MDS), focusing especially on Paul Green’s role in defining the role of MDS in marketing. The second concerns Clustering, again emphasizing Green’s role in defining the theory and practice of clustering methodology applied to marketing, both in the...

Combinatorial data analysis (CDA) as a generic term can be discussed within the framework of two distinct tasks of data analysis: exploratory and confirmatory. A confirmatory CDA approach compares some given data set to a specific structure that is conjectured for it a priori; the empirically observed degree of correspondence is evaluated by refere...

The fit of J. L. Holland's (1959, 1997) RIASEC model to U.S. racial-ethnic groups was assessed using circular unidimensional scaling. Samples of African American, Asian American, Caucasian American, and Hispanic American high school students and employed adults who completed either the UNIACT Interest Inventory (K. B. Swaney, 1995) or the Strong In...

The heritability of nociception and antinociception has been well established in the mouse. The pharmacogenetics of morphine analgesia are fairly well characterized, but far less is known about other analgesics. The purpose of this work was to begin the systematic genetic study of non-μ-opioid analgesics. We tested mice of 12 inbred mouse strains f...

-norm: (1) dynamic programming; (2) an iterative quadratic assignment improvement heuristic; (3) the Guttman update strategy as modified by Pliner's technique of smoothing; (4) a nonlinear programming reformulation by Lau, Leung, and Tse. The methods are all implemented through (freely downloadable) MATLAB m-files; their use is illustrated by a com...

Methods for the hierarchical clustering of an object set produce a sequence of nested partitions such that object classes within each successive partition are constructed from the union of object classes present at the previous level. Any such sequence of nested partitions can in turn be characterized by an ultrametric. An approach to generalizing...
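The ultrametric mentioned in this abstract can be made concrete with a small check. The sketch below tests whether a square dissimilarity matrix satisfies the standard ultrametric inequality d(i, k) ≤ max(d(i, j), d(j, k)) for all triples of objects. The function name `is_ultrametric` and the tolerance parameter `tol` are assumptions for this example, and the generalization discussed in the abstract is not attempted.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Return True if the square matrix d (list of lists) is an ultrametric."""
    n = len(d)
    for i in range(n):
        # Zero self-dissimilarities on the main diagonal.
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            # Symmetry and nonnegativity of every entry.
            if abs(d[i][j] - d[j][i]) > tol or d[i][j] < -tol:
                return False
    # Ultrametric inequality for every ordered triple of distinct objects.
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )
```

For example, merging objects 0 and 1 at level 1 and then joining object 2 at level 2 gives the matrix `[[0, 1, 2], [1, 0, 2], [2, 2, 0]]`, which passes the check, whereas distances among three equally spaced points on a line do not.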

The first part of this monograph's title, Combinatorial Data Analysis (CDA), refers to a wide class of methods for the study of relevant data sets in which the arrangement of a collection of objects is absolutely central. Characteristically, CDA is involved either with the identification of arrangements that are optimal for a specific representatio...

In response to Tinsley (2000) we dispute his conclusions that congruence is a myth and the Holland hexagonal model lacks validity. We suggest that existing meta-analyses on the congruence–satisfaction relationship fail to account for significant sources of error, resulting in inaccurate conclusions. Tinsley's assertions concerning Holland's model a...

Matrix factorization in numerical linear algebra (NLA) typically serves the purpose of restating some given problem in such a way that it can be solved more readily; for example, one major application is in the solution of a linear system of equations. In contrast, within applied statistics/psychometrics (AS/P), a much more common use for matrix fa...

It is generally acknowledged that humans display highly variable sensitivity to pain, including variable responses to identical injuries or pathologies. The possible contribution of genetic factors has, however, been largely overlooked. An emerging rodent literature documents the importance of genotype in mediating basal nociceptive sensitivity, in...

Clinical pain syndromes, and experimental assays of nociception, are differentially affected by manipulations such as drug administration and exposure to environmental stress. This suggests that there are different 'types' of pain. We exploited genetic differences among inbred strains of mice in an attempt to define these primary 'types'; that is,...

Acknowledgments: The O*NET Computerized Interest Profiler was produced and funded by the O*NET project of the U.S. Department of Labor, Employment and Training Administration, Office of Policy and Research (OPR) under the direction of Gerard F. Fiala, Administrator. The O*NET project is directed by Jim Woods, Office of Policy and Research, and Donna...

This report summarizes a study designed to further the development of the O*NET Interest Profiler. The purpose of the study was to examine the psychometric properties — reliability and validity — of the final form of the O*NET Interest Profiler, and to evaluate the self-scoring aspect of the instrument.

There are various optimization strategies for approximating, through the minimization of a least-squares loss function, a given symmetric proximity matrix by a sum of matrices each subject to some collection of order constraints on its entries. We extend these approaches to include components in the approximating sum that satisfy what are called th...

A review is given for the data analysis task of representing a symmetric proximity matrix, defined for some object set, by a sum of matrices each having the restrictive anti-Robinson (AR) form. An emphasis is placed on the inclusion of an optimal monotonic transformation of the given proximity matrix and what each AR component of an additive decomp...

Optimal scaling is reviewed in the framework of a comprehensive multidimensional data analysis strategy, called the Data Theory Scaling System (DTSS). The optimal scaling methodology has a lot of potential for the analysis of multivariate data that are qualitative, non-normally distributed, or incomplete. Also, the system can easily deal with nonli...

The classification task discussed is one of constructing for an object set S an optimal partition into a given number of ordered classes based on symmetric or skew-symmetric proximity information among the objects. Several measures of merit for how well a given ordered partition reflects the proximity data are defined, and a dynamic programming str...

The tasks of linear and circular unidimensional scaling can be characterized by the attempt to represent the entries in a symmetric proximity matrix through distances among a set of object locations defined either along a linear continuum or around a closed, circular continuum. These two scaling tasks are approached through a least-squares optimiza...

The classification task of hierarchical clustering can be characterized as one of constructing for an object set S a sequence of successively less-refined partitions that attempts to represent the pattern of entries in a given symmetric proximity matrix defined between the objects. We discuss this process of constructing a partition hierarchy by th...

We review the current methodological and practical state of cluster analysis in marketing. Topics covered include segmentation, market structure analysis, a taxonomy based on overlap, connections to conjoint analysis, and validation.

The additive clustering approach is applied to the problem of two-mode clustering and compared with the recent error-variance approach of Eckes and Orlik (1993). Although the schemes of the computational algorithms look very similar in both of the approaches, the additive clustering has been shown to have several advantages. Specifically, two techn...

A least-squares strategy is proposed for representing a two-mode proximity matrix as an approximate sum of a small number of matrices that satisfy certain simple order constraints on their entries. The primary class of constraints considered define Q-forms (or anti-Q-forms) for a two-mode matrix, where after suitable and separate row and column reo...

To be useful to breeders, classification of genotypes based on cluster analysis must provide meaningful groupings of the genotypes clustered. We evaluated a classification of 148 U.S. maize [Zea mays L.] inbreds resulting from cluster analysis based on restriction fragment length polymorphisms (RFLPs) to determine if it represented the true associa...

We critically review Holland's structural hypotheses to provide a framework for discussing the task of evaluating vocational interest models. Two forms of Holland's RIASEC model are proposed and the predictions from these models are specified. Holland's circular order model and circumplex structure are then evaluated to demonstrate Hubert and Arabi...

We present an approach, independent of the common gradient-based necessary conditions for obtaining a (locally) optimal solution, to multidimensional scaling using the city-block distance function, and implementable in either a metric or nonmetric context. The difficulties encountered in relying on a gradient-based strategy are first reviewed: the...

Many well-known measures for the comparison of distinct partitions of the same set of n objects are based on the structure of class overlap presented in the form of a contingency table (e.g., Pearson's chi-square statistic, Rand's measure, or Goodman-Kruskal's τ<sub>b</sub>), but they all can be rephrased through the use of a simple cross-product index define...

This paper considers the use of the Bond Energy approach of McCormick, Schweitzer, and White as an alternative to CONCOR and other methods for producing blockmodels, and to Baker's approach to three-way blockmodels. Results of analyses using artificial data and the Roethlisberger-Dickson Bank Wiring Room data are presented, where algorithms alterna...

The bond energy algorithm of W.T. McCormick, P.J. Schweitzer, and T.W. White (Oper. Res., vol.20, p.993-1009, 1972) is examined in the context of related strategies of data analysis that seek to solve problems in production research, imaging, and related engineering problems. A taxonomy of types of input data and forms of matrix structure, adopted...

The applicability of Simulated Annealing (SA) (Kirkpatrick et al. (1983)) is studied in the context of the data analysis scheme of the “bond energy algorithm” originally proposed by McCormick et al. (1972) for permuting rows and columns of data matrices into visually interpretable forms. To evaluate the performance of three variations of SA, they w...

The task of assessing the similarity of pattern between the entries of two square matrices has been discussed extensively over the last decade, as a unifying strategy for approaching a variety of seemingly disparate statistical problems. As typically defined, the comparison depends on a measure of matrix correspondence, usually a normalized cross-p...

Simulated annealing was compared with a (locally optimal) pairwise interchange algorithm on two combinatorial data analysis tasks, viz., least-squares unidimensional scaling of symmetric proximity data and a particular unidimensional scaling method of dominance data. These two tasks are representative of a larger class of combinatorial data analysi...

Although various authors have provided improvements to the bond energy algorithm for seriation originally proposed by McCormick, Schweitzer, and White (1972), most of these approaches have limited the types of data that can be considered (e.g., by assuming only binary input). We return to the original algorithm, free of such restrictions, and demon...

The authors present a heuristic evaluation method which involves no explicit assumptions regarding population distributions. The method is an extension of Wolfe's techniques for testing related correlation coefficients. Basically, the method consists of calculating the correlations (rA,B-C) between the corresponding off-diagonal entries of two stan...

Although counseling psychologists conduct a great deal of research that attempts to reveal the structure of a given data set, rarely if ever do they utilize scaling procedures, preferring instead to rely on factor analytic strategies. In this article, we give a short introduction to the use of multidimensional scaling (MDS), with specific emphasis...

We discuss the task of evaluating conjectures of order within proximity matrices in connection with an inappropriate inference procedure proposed by Wakefield and Doughtie and subsequently adopted by other authors. This strategy is criticized and a different method based on randomization is proposed. An example from Holland's theory of personality...

The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Mor...

A combinatorial data analysis strategy is reviewed that is designed to compare two arbitrary measures of proximity defined between the objects from some set. Based on a particular cross-product definition of correspondence between these two numerically specified notions of proximity (typically represented in the form of matrices), extensions are th...
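A minimal sketch of the kind of cross-product comparison this abstract describes: the raw index sums products of corresponding off-diagonal entries of two proximity matrices, and its significance is judged against a reference distribution obtained by randomly relabeling the objects of one matrix. The function names, the Monte Carlo sample size `n_perm`, and the one-sided (upper-tail) p-value are assumptions for this example; the extensions mentioned in the abstract are not covered.

```python
import random


def cross_product(a, b, perm):
    """Sum of a[i][j] * b[perm[i]][perm[j]] over all off-diagonal cells."""
    n = len(a)
    return sum(
        a[i][j] * b[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def permutation_test(a, b, n_perm=999, seed=0):
    """Observed index and Monte Carlo upper-tail p-value under relabeling."""
    rng = random.Random(seed)
    n = len(a)
    identity = list(range(n))
    observed = cross_product(a, b, identity)

    # Count the observed arrangement itself as one member of the reference set.
    at_least_as_large = 1
    perm = identity[:]
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabel rows and columns of b simultaneously
        if cross_product(a, b, perm) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (n_perm + 1)
```

Permuting rows and columns together keeps each object's row and column tied to the same label, which is what distinguishes this reference distribution from one that shuffles matrix entries independently.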

A variety of tests for randomness are reviewed based on simple product-moment statistics defined between two matrices, { a<sub>ij</sub> } and { b<sub>ij</sub> }. Typically, the first matrix, { a<sub>ij</sub> }, contains proximity data on the spatial placement of n observations, { x<sub>1</sub>, ..., x<sub>n</sub> }; the second matrix, { b<sub>ij</sub>...

A comprehensive statistical framework is presented which encompasses a wide range of existing nonparametric methods. The basic strategy, referred to as linear assignment (LA), depends on a simple index of correspondence defined between two object sets that have been matched in some a priori manner. In this broad sense, LA can be interpreted as a gen...

A number of statistical problems are considered using a very general inference strategy justified by randomisation: association between matched sets of directions, serial and cyclic correlation, spatial autocorrelation, and the relation of directional observations to a univariate variable or a discrete combinatorial structure. A variety of numerica...

A general strategy is proposed for measuring and testing the agreement between two raters who use some common scheme for assessing each of n objects. As long as a numerical measure of correspondence or proximity can be obtained between any two assessments, an agreement index can be defined and tested for significance under a null conjecture of rate...

In this paper we present a methodology for comparing the adequacy of two or more models in terms of how well they represent a given data set. A set of interregional migration models is tested, including several variations of push-pull models, Wilson's entropy maximizing model, a quadratic programming solution, and an ANOVA model. Testing is undert...

Exploratory multidimensional scaling and confirmatory nonparametric procedures (Hubert and Levin, 1976) were used to represent data from similarity rating and sorting tasks performed on nine animal names. Confirmatory procedures demonstrated that the organization of the data from the two tasks was similar. Analyses of data from sorting tasks perfor...

As the title suggests, this paper is concerned with proximity matrices, or more explicitly, with any matrix that contains numerical values indexing the relationship between objects from some given set. Since the methods of evaluation to be discussed have a number of uses, our review could be appropriately placed under a particular subtopic such as...

Tjostheim's Index measures the spatial association between ranked observations on two variables. Extensions are discussed and a measure is proposed which incorporates features of both Tjostheim's Index and Spearman's rank correlation.-R.Haynes

A procedure is discussed for comparing two rectangular n x m data matrices. The two matrices would typically represent data on the same n objects (for example, cities or subjects) and the same m attributes (for example, crime rates or attitudinal variables). An index that measures the degree to which both matrices are similar is presented along wit...

The Euclidean metric is perhaps the most commonly used and most convenient one for representing mapped phenomena. In this paper we examine the suitability of representing cognitive phenomena via the Euclidean metric. Some general properties of spaces are examined with particular emphasis on the properties of isotropy, incompleteness, and curvature,...

Connections between hierarchical clustering and the seriation of objects along a continuum that depend on the patterning of entries in a proximity matrix are pointed out. Based on the similarity between the central notion of an ultrametric in hierarchical clustering and what is called an anti-Robinson property in seriation, it is suggested that the...

Replies to D. Moshman's critique of the paper by the present authors, who point out (a) the implicit logical fallacy of affirming the consequent in Moshman's argument and (b) the counterintuitive contention that statistical independence does not rule out developmental priority. (2 ref)

By extending a technique for testing the difference between two dependent correlations developed by Wolfe, a strategy is proposed in a more general matrix context for evaluating a variety of data analysis schemes that are supposed to clarify the structure underlying a set of proximity measures. In the applications considered, a data analysis scheme...