Article

Quantitative science and the definition of measurement in Psychology

Authors:
Joel Michell

Abstract

It is argued that establishing quantitative science involves two research tasks: the scientific one of showing that the relevant attribute is quantitative; and the instrumental one of constructing procedures for numerically estimating magnitudes. In proposing quantitative theories and claiming to measure the attributes involved, psychologists are logically committed to both tasks. However, they have adopted their own, special, definition of measurement, one that deflects attention away from the scientific task. It is argued that this is not accidental. From Fechner onwards, the dominant tradition in quantitative psychology ignored this task. Stevens' definition rationalized this neglect. The widespread acceptance of this definition within psychology made this neglect systemic, with the consequence that the implications of contemporary research in measurement theory for undertaking the scientific task are not appreciated. It is argued further that when the ideological support structures of a science sustain serious blind spots like this, then that science is in the grip of some kind of thought disorder.

"…unluckily our professors of psychology in general are not up to quantitative logic…" (E. L. Thorndike to J. McK. Cattell, 1904)


... This can be expressed as a derived ratio (or even a percentage, given how common it is to think of effort in this manner, e.g., "they gave X% in that attempt") given that I conceptualise the primitives, capacity and demands, as being quantitative attributes having natural origins and hypothesised to relate to magnitudes of one another ordinally and additively (Michell, 1997). The effort $E_{A_{pi}}$ for person $p$ and task $i$ expressed as a percentage then is: ...
... However, in practice it is not easy to apply the definition to all kinds of tasks in terms of some measurement operation. For attempted performances of cognitive tasks in particular, which are thought to be behavioural reflections of relations between unobservable psychological dispositions of the actor and unobservable aspects of the task, operations of measurement are fraught with difficulty (Michell, 1997). In essence, it is difficult to derive the effort for a given individual attempting a given task because we haven't got good operationalisations of the magnitudes of $C_A$ and $D_A$. ...
... if the load were halved it would be 25%, etc. The task of deriving a measurement of effort here is simple because, in the case of physical tasks such as lifting a weight, we have knowledge of the quantitative nature of the attributes of capacity and demands given that they are physical variables and are amenable to classical measurement, and we have available to us operations for measuring them, i.e., the numerical estimation of the ratio of their magnitudes to a unit of the same attribute (e.g., kilograms; Michell, 1997). ...
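As a purely illustrative sketch of that arithmetic (the 50 kg capacity and the helper name effort_percent are assumptions of this example, not taken from the cited work):

```python
def effort_percent(demand, capacity):
    """Actual effort as the ratio of task demands to the capacity to meet them,
    expressed as a percentage (demand and capacity in the same unit, e.g. kg)."""
    if capacity <= 0:
        raise ValueError("capacity must be a positive magnitude")
    return 100.0 * demand / capacity

# Hypothetical lifter whose maximal capacity is 50 kg:
print(effort_percent(25.0, 50.0))   # 50.0 -> a 25 kg load demands 50% effort
print(effort_percent(12.5, 50.0))   # 25.0 -> halving the load halves the effort
```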
Preprint
Full-text available
'Effort' is a concept of interest in cognitive psychology and neuroscience where many theories include it as a postulate. Despite its intuitiveness it is difficult to define such that its operationalisation follows a logical derivation chain. Recently I have proposed conceptual definitions of both actual effort, and the perception of effort, as the ratio of task demands to capacity to meet task demands, both actual and perceived respectively. Clear conceptual definitions are key for determining whether a given operationalisation meets the necessary and sufficient conditions adequately. For physical tasks valid operationalisation, and indeed measurement, of actual effort is often trivial. But a problem arises for operationalisation of actual effort in cognitive tasks where the underlying capacity that disposes an individual to be able to attempt, and perhaps complete, the task is not directly observable nor is the demand the task presents. However, a solution may lie in applications of Additive Conjoint Measurement to determine conditions where classical measurement may be possible, and the Rasch model as a measurement operation of capacity and demands to derive effort. A key aspect of the Rasch model is that it posits and estimates from data two latent constructs that I accept here as conceptually equivalent to capacity and demands respectively in my definition of effort; first a characteristic of the individual (ability), and second a characteristic of the test or item (difficulty). As such, applications of these methods might provide a measurement operation of actual effort in cognitive tasks to enable more precise formulations and testing of theories that employ the concept. In this work I explore these ideas using simulation and analogical abduction of a task where effort is known and examine an empirical dataset. Finally, I discuss the conditions under which these methods may be suitable and their inherent limitations.
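A minimal sketch of the Rasch model referred to above, with the mapping of ability to capacity and difficulty to demands treated as an assumption of the illustration rather than an established result (all names and values here are hypothetical):

```python
import numpy as np

def rasch_success_probability(ability, difficulty):
    """Rasch model: probability of success for a person of the given ability
    attempting an item of the given difficulty (both on the logit scale)."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

# Illustrative values only. Treating exponentiated difficulty as 'demands' and
# exponentiated ability as 'capacity' (the conceptual mapping assumed in this
# sketch, not a result of the preprint), effort is then their ratio.
ability, difficulty = 1.2, 0.4                         # logits
capacity, demands = np.exp(ability), np.exp(difficulty)
print(rasch_success_probability(ability, difficulty))  # ~0.69
print(100.0 * demands / capacity)                      # ~44.9, an 'effort' of ~45%
```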
... Although criticism has accompanied psychological measurement from the beginning (Michell, 1997, 2000), especially in the last 25 years, established measurement practices have been criticized from a variety of different theoretical perspectives. In the following, I will briefly summarize the most important challenges to psychological measurement. ...
... In the following, I will briefly summarize the most important challenges to psychological measurement. Joel Michell has repeatedly argued that measuring an attribute presupposes the attribute to be quantitative (Michell, 1997, 2000, 2003, 2011, 2020). However, according to Michell, psychologists have never delivered empirical evidence that psychological attributes are actually quantitative. ...
... However, according to Michell, psychologists have never delivered empirical evidence that psychological attributes are actually quantitative. Moreover, Michell has claimed that, with the exception of conjoint measurement, the quantitative methods currently used in psychology are incapable of delivering this evidence (Michell, 1997, 2000). Beyond that, Michell has argued that the widespread inference from order to the alleged quantity of psychological phenomena (e.g., pain can be more or less intense, therefore, pain must be quantitative and, consequently, measurable) is a logical fallacy (Michell, 2012). ...
Article
Established measurement practices have been criticized from various theoretical perspectives. The purpose of this article is to argue that quantitative research could be more defensible if contested assumptions about measurement were abandoned, and to illustrate this thesis with the example of the better-than-average effect (BTAE). If research on the BTAE is conceptualized as an interpretive endeavor, one can provide arguments that do not rely on psychological measurement for the claim that the BTAE is evidence for self-delusion in people. I outline these arguments and elaborate them by discussing a typical study on the BTAE. Furthermore, I show how a measurement-free characterization of the BTAE reveals an important research gap and points to the specific scientific value of research on the BTAE. Finally, I offer three general suggestions for conducting future interpretive quantitative research: justifying why a quantitative method is suitable for investigating a certain phenomenon, providing a comprehensive interpretation of the numerical results, and exploring participants’ understanding of the study material.
... Psychology is in crisis, again and anew. Continued debates about replicability (Regenwetter and Robinson, 2017), validity (Newton and Baird, 2016), generalisability (Yarkoni, 2022), robust results (Nosek et al., 2022), preregistration (Szollosi et al., 2020), measurement theories (Trendler, 2019; Uher, 2021c,d) and measurability (Michell, 1997; Trendler, 2009), amongst others, indicate profound problems still unsolved. Astonishingly, however, the widespread use of rating 'scales' for quantitative investigations of the complex phenomena of behaviour, psyche and society seems largely unchallenged, even by critics of contemporary practices (e.g., Michell, 2013). ...
... that were feasible in their field and that they considered to be analogous to physical measurement, yet without checking if these adaptations actually met (1) the criteria of measurement and (2) the peculiarities of their study phenomena (Strauch, 1976; Valsiner, 2017b; Uher, 2018a, 2020b). Specifically, when operationalists defined a study phenomenon's meaning primarily by the operational procedures enabling its investigation (Problem complex §6 Operationalism), application of quantitative methods implied the invalid a priori answer that "Regardless of what it is, it can be measured-it is a continuous quantity" (Hibberd, 2019, p. 46; Strauch, 1976; Michell, 1997). Operational procedures yielding convergent numerical results were now interpreted as valid instruments for "measuring constructs. ...
Article
Full-text available
This article explores in-depth the metatheoretical and methodological foundations on which rating scales—by their very conception, design and application—are built and traces their historical origins. It brings together independent lines of critique from different scholars and disciplines to map out the problem landscape, which centres on the failed distinction between psychology’s study phenomena (e.g., experiences, everyday constructs) and the means of their exploration (e.g., terms, data, scientific constructs)—psychologists’ cardinal error. Rigorous analyses reveal a dense network of 12 complexes of problematic concepts, misconceived assumptions and fallacies that support each other, making it difficult to be identified and recognised by those (unwittingly) relying on them (e.g., various forms of reductionism, logical errors of operationalism, constructification, naïve use of language, quantificationism, statisticism, result-based data generation, misconceived nomotheticism). Through the popularity of rating scales for efficient quantitative data generation, uncritically interpreted as psychological measurement, these problems have become institutionalised in a wide range of research practices and perpetuate psychology’s crises (e.g., replication, confidence, validation, generalizability). The article provides an in-depth understanding that is needed to get to the root of these problems, which preclude not just measurement but also the scientific exploration of psychology’s study phenomena and thus its development as a science. From each of the 12 problem complexes, specific theoretical concepts, methodologies and methods are derived as well as key directions of development. The analyses—based on three central axioms for transdisciplinary research on individuals, (1) complexity, (2) complementarity and (3) anthropogenicity—highlight that psychologists must (further) develop an explicit metatheory and unambiguous terminology as well as concepts and theories that conceive individuals as living beings, open self-organising systems with complementary phenomena and dynamic interrelations across their multi-layered systemic contexts—thus, theories not simply of elemental properties and structures but of processes, relations, dynamicity, subjectivity, emergence, catalysis and transformation. Philosophical and theoretical foundations of approaches suited for exploring these phenomena must be developed together with methods of data generation and methods of data analysis that are appropriately adapted to the peculiarities of psychologists’ study phenomena (e.g., intra-individual variation, momentariness, contextuality). Psychology can profit greatly from its unique position at the intersection of many other disciplines and can learn from their advancements to develop research practices that are suited to tackle its crises holistically.
... Scientific measurement is a clearly defined concept: it is the inference of a magnitude, an amount of a property supposed to be quantitative [59]. The natural sciences have made unprecedented progress based on quantitative theories and models [60]. ...
... How should psychological concepts, or constructs, be measured, and how should their being quantitative be demonstrated? Michell [59] disentangled the problems of inferring measurement by suggesting two tasks to be fulfilled: the scientific task of measurement and the instrumental task. The scientific task is concerned with whether the variable to be assessed is actually quantitative. ...
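Michell's [59] sense of measurement, the numerical estimation of the ratio of a magnitude to a unit of the same attribute, can be restated compactly (a worked restatement for orientation, not a quotation):

```latex
% Measurement of a magnitude a of attribute Q, relative to a unit u of Q:
% the measure r is the (positive real) ratio of a to u, i.e.
a = r \cdot u, \qquad r = \frac{a}{u} \in \mathbb{R}^{+}
% e.g. a mass of 8.3 kg stands in the ratio 8.3 : 1 to the kilogram.
```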
Chapter
Full-text available
Measurement in the social sciences is typically characterized by a multitude of instruments that are assumed to measure the same concept but lack comparability. Underdeveloped conceptual theories that fail to expose a measurement mechanism are one reason for the incommensurable measurements. Without such a mechanism measurements cannot be linked to a fundamental reference as required by metrological traceability. However, traditional metrological concepts can be extended by allowing for direct links between different instruments, so-called crosswalks. In this regard, Rasch Measurement Theory proves particularly useful as it facilitates a co-calibration of different instruments onto a common metric. The example of the measurement of nicotine dependence through self-report instruments serves as a showcase of the problems in social measurement and how they can be overcome contributing to metrological traceability in the social sciences.
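One standard way such a crosswalk can be realised is common-item linking; the following is only a hedged illustration (mean-mean linking with hypothetical anchor values, not the chapter's actual procedure):

```python
import numpy as np

def mean_mean_linking_constant(anchor_difficulties_ref, anchor_difficulties_new):
    """Shift placing a newly calibrated instrument on the reference metric,
    computed from anchor items common to both calibrations (mean-mean linking)."""
    return float(np.mean(anchor_difficulties_ref) - np.mean(anchor_difficulties_new))

# Hypothetical Rasch item difficulties (logits) for three anchor items:
ref_scale = [-0.8, 0.1, 0.9]    # calibrated on the reference instrument's metric
new_scale = [-1.3, -0.4, 0.4]   # calibrated separately on the new instrument's metric
shift = mean_mean_linking_constant(ref_scale, new_scale)
print(shift)  # 0.5 -> add 0.5 logits to the new instrument's estimates to co-calibrate
```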
... All individuals provided written informed consent and were remunerated for participation. ... the specific structure of them, particularly that they are quantitative attributes, meaning their specific magnitudes are related to one another ordinally and additively and thus sustain measurement via numerical estimation of ratios of magnitudes (Michell, 1997), as yet remains untested. Thus, the reader is forewarned to at present consider the realist interpretation of our findings with particular tentativeness and perhaps instead to consider them primarily with respect to the specific operations undertaken until such time as the quantity assumption is tested. ...
... Here both actual capacity as a measurable ratio quantity, and the self-reported perception of capacity, were on the 0-100% interval allowing us to explore the extent of their identity at least with respect to the operationalisations themselves. However, whilst actual capacity was measured, it is not entirely clear that perception of capacity is itself a measurable ratio quantity (Michell, 1997). It is possible to test empirically the assumption of quantitative structure for even intensional constructs such as the phenomenology considered here thanks to advances such as conjoint measurement theory (Luce & Tukey, 1964). ...
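The central testable condition of additive conjoint measurement (Luce & Tukey, 1964) invoked here is double cancellation, which for two factors with an ordered joint effect can be stated as follows (a standard formulation, given for orientation):

```latex
% Double cancellation for factors A (levels a, b, c) and X (levels x, y, z)
% acting on an ordered outcome P:
\text{if } P(a,y) \succeq P(b,x) \ \text{ and } \ P(b,z) \succeq P(c,y),
\ \text{ then } \ P(a,z) \succeq P(c,x).
% Together with independence (single cancellation), satisfaction of this
% condition throughout the data matrix supports an additive representation.
```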
Preprint
Full-text available
The actual capacity to perform tasks, and actual fatigue, are concepts that have been thought of as inherently linked. These considerations also extend to their phenomenology, meaning the perception(s) of capacity or fatigue. The phenomenology of capacity or fatigue thus may be capturing the same underlying latent construct. Further, it is speculated that the actual capacity of a person to perform a given task, and their perception of that capacity, have a psychophysical relationship. The aim of this study was therefore twofold: 1) to explore the extensional equivalence of perceptions of capacity and fatigue, and 2) to explore the relationship between actual capacity and the perception of that capacity. We analysed secondary outcomes from two experiments where 21 participants performed various elbow flexion tasks with either a dumbbell, or a connected adaptive resistance exercise (CARE) machine enabling measurement of actual capacity (i.e., maximal force). Mixed effects ordered beta regression models estimated the latent constructs during conditions from self-reports of perceptions of capacity and fatigue comparing the two operationalisations, and the relationship between actual capacity (i.e., % maximal force) and self-reports of perceptions of capacity. We hypothesised that, given their theoretical extensional equivalence, the latent constructs captured by self-report ratings as operationalisations of perceptions of capacity and fatigue would exhibit a strong negative relationship with each other, reflecting strong identity, and a positive association albeit with weak as opposed to strong identity between actual capacity and perception of capacity. Our results appear to broadly corroborate both hypotheses. There was a very strong relationship indicating strong identity and thus extensional equivalence of perceptions of capacity and fatigue latent constructs (r = -0.989 [95% CI -0.994 to -0.981]). Further, a coarse-grained directional relationship between actual capacity and the perception of capacity was present suggesting only weak identity at best. Future research should endeavour to identify conditions permitting testing of assumptions of the present work (i.e., the quantity assumption) and explore further possible psychophysical models relating actual, and perceptions of, demands, capacity, and effort to understand the impact of the former upon the latter given the conceptual relationships between them.
... If the predicted success rates are observed, then 100 L means the same thing up and down the scale in the substantive terms of real texts encountered by real readers. Is not this capacity to infer the repetition of a qualitatively meaningful constant amount precisely what we mean by "equal interval"?

Quantity Versus Heterogeneous Orders

Michell (1997, 2000, 2007) made the case for what measurement is and, in his earlier writings, exhorted psychologists to adopt this "standard model" in their practice. Over time, this eminently sensible call morphed into a diagnosis of so-called pathological behavior on the part of the field of psychology and its high priests and priestesses, called psychometricians. ...
... Note that there is a crucial difference between the trade-off property and order-restricted inference tests such as double and triple cancellation (Kyngdon, 2008; Michell, 1997). The trade-off relation is a claim about what will happen in the individual case or a token test of a causal model. ...
Chapter
Full-text available
We argue that a goal of measurement is general objectivity: point estimates of a person’s measure (height, temperature, and reader ability) should be independent of the instrument and independent of the sample in which the person happens to find herself. In contrast, Rasch’s concept of specific objectivity requires only differences (i.e., comparisons) between person measures to be independent of the instrument. We present a canonical case in which there is no overlap between instruments and persons: each person is measured by a unique instrument. We then show what is required to estimate measures in this degenerate case. The canonical case encourages a simplification and reconceptualization of validity and reliability. Not surprisingly, this reconceptualization looks a lot like the way physicists and chemometricians think about validity and measurement error. We animate this presentation with a technology that blurs the distinction between instruction, assessment, and generally objective measurement of reader ability. We encourage adaptation of this model to health outcomes measurement.
Chapter
Full-text available
In an argument whereby, “… individual-centered statistical techniques require models in which each individual is characterized separately and from which, given adequate data, the individual parameters can be estimated”.
Chapter
Full-text available
The field of career education measurement is in disarray. Evidence mounts that today’s career education instruments are verbal ability measures in disguise. A plethora of trait names such as career maturity, career development, career planning, career awareness, and career decision making have, in the last decade, appeared as labels to scales comprised of multiple choice items. Many of these scales appear to be measuring similar underlying traits and certainly the labels have a similar sound or “jingle” to them. Other scale names are attached to clusters of items that appear to measure different traits and at first glance appear deserving of their unique trait names, e.g., occupational information, resources for exploration, work conditions, personal economics. The items of these scales look different and the labels correspondingly are dissimilar or have a different “jangle” to them.
Chapter
Full-text available
Growth in reading ability varies across individuals in terms of starting points, velocities, and decelerations. Reading assessments vary in the texts they include, the questions asked about those texts, and in the way responses are scored. Complex conceptual and operational challenges must be addressed if we are to coherently assess reading ability, so that learning outcomes are comparable within students over time, across classrooms, and across formative, interim, and accountability assessments. A philosophical and historical context in which to situate the problems emerges via analogies from scientific, aesthetic, and democratic values. In a work now over 100 years old, Cook's study of the geometry of proportions in art, architecture, and nature focuses more on individual variation than on average general patterns. Cook anticipates the point made by Kuhn and Rasch that the goal of research is the discovery of anomalies—not the discovery of scientific laws. Bluecher extends Cook’s points by drawing an analogy between the beauty of individual variations in the Parthenon’s pillars and the democratic resilience of unique citizen soldiers in Pericles’ Athenian army. Lessons for how to approach reading measurement follow from the beauty and strength of stochastically integrated variations and uniformities in architectural, natural, and democratic principles.
Chapter
Full-text available
One must provide information about the conditions under which [the measurement outcome] would change or be different. It follows that the generalizations that figure in explanations [of measurement outcomes] must be change-relating… Both explainers [e.g., person parameters and item parameters] and what is explained [measurement outcomes] must be capable of change, and such changes must be connected in the right way (Woodward, 2003). Rasch’s unidimensional models for measurement tell us how to connect object measures, instrument calibrations, and measurement outcomes. Substantive theory tells us what interventions or changes to the instrument must offset a change to the measure for an object of measurement to hold the measurement outcome constant. Integrating a Rasch model with a substantive theory dictates the form and substance of permissible conjoint interventions. Rasch analysis absent construct theory and an associated specification equation is a black box in which understanding may be more illusory than not. The mere availability of numbers to analyze and statistics to report is often accepted as methodologically satisfactory in the social sciences, but falls far short of what is needed for a science.
Chapter
Full-text available
A metrological infrastructure for the social, behavioral, and economic sciences has foundational and transformative potentials relating to education, health care, human and natural resource management, organizational performance assessment, and the economy at large. The traceability of universally uniform metrics to reference standard metrics is a taken-for-granted essential component of the infrastructure of the natural sciences and engineering. Advanced measurement methods and models capable of supporting similar metrics, standards, and traceability for intangible forms of capital have been available for decades but have yet to be implemented in ways that take full advantage of their capacities. The economy, education, health care reform, and the environment are all now top national priorities. There is nothing more essential to succeeding in these efforts than the quality of the measures we develop and deploy. Even so, few, if any, of these efforts are taking systematic advantage of longstanding, proven measurement technologies that may be crucial to the scientific and economic successes we seek. Bringing these technologies to the attention of the academic and business communities for use, further testing, and development in new directions is an area of critical national need.
Chapter
Full-text available
There is nothing wrong with the NAEP reading exercises, the sampling design, or the NAEP Reading Proficiency Scale, these authors maintain. But adding a rich criterion-based frame of reference to the scale should yield an even more useful tool for shaping U.S. educational policy.
Chapter
Full-text available
Measurement plays a vital role in the creation of markets, one that hinges on efficiencies gained via universal availability of precise and accurate information on product quantity and quality. Fulfilling the potential of these ideals requires close attention to measurement and the role of technology in science and the economy. The practical value of a strong theory of instrument calibration and metrological traceability stems from the capacity to mediate relationships in ways that align, coordinate, and integrate different firms’ expectations, investments, and capital budgeting decisions over the long term. Improvements in the measurement of reading ability exhibit patterns analogous to Moore’s Law, which has guided expectations in the micro-processor industry for almost 50 years. The state of the art in reading measurement serves as a model for generalizing the mediating role of instruments in making markets for other forms of intangible assets. These remarks provide only a preliminary sketch of the kinds of information that are both available and needed for making more efficient markets for human, social, and natural capital. Nevertheless, these initial steps project new horizons in the arts and sciences of measuring and managing intangible assets.
Chapter
Full-text available
Implicit in the idea of measurement is the concept of objectivity. When we measure the temperature using a thermometer, we assume that the measurement we obtain is not dependent on the conditions of measurement, such as which thermometer we use. Any functioning thermometer should give us the same reading of, for example, 75 °F. If one thermometer measured 40°, another 250°, and a third 150°, then the lack of objectivity would invalidate the very idea of accurately measuring temperature.
Chapter
Full-text available
This paper presents and illustrates a novel methodology, construct-specification equations, for examining the construct validity of a psychological instrument. Whereas traditional approaches have focused on the study of between-person variation on the construct, the suggested methodology emphasizes study of the relationships between item characteristics and item scores. The major thesis of the construct-specification-equation approach is that until developers of a psychological instrument understand what item characteristics are determining the item difficulties, the understanding of what is being measured is unsatisfyingly primitive. This method is illustrated with data from the Knox Cube Test which purports to be a measure of visual attention and short-term memory.
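A schematic illustration of a construct-specification equation (item features, difficulties, and coefficients are invented for the sketch and are not from the Knox Cube Test data): item difficulties are regressed on theory-relevant item characteristics, and theoretical calibrations are compared with empirical ones.

```python
import numpy as np

# Hypothetical item features, e.g. number of taps and path reversals per item
# (invented for illustration; not the study's actual item characteristics).
features = np.array([
    [3, 0],
    [4, 1],
    [5, 1],
    [6, 2],
    [7, 3],
], dtype=float)
difficulties = np.array([-1.2, -0.3, 0.2, 0.9, 1.8])  # empirical calibrations (logits)

# Specification equation: difficulty ~ b0 + b1*taps + b2*reversals (least squares).
X = np.column_stack([np.ones(len(features)), features])
coefs, *_ = np.linalg.lstsq(X, difficulties, rcond=None)
theoretical = X @ coefs
print(coefs)                                          # fitted b0, b1, b2
print(np.corrcoef(theoretical, difficulties)[0, 1])   # theory-data agreement
```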
Chapter
Full-text available
The last 50 years of human and social science measurement theory and practice have witnessed a steady retreat from physical science as the canonical model. Humphry (2011) unapologetically draws on metrology and physical science analogies to reformulate the relationship between discrimination and the unit. This brief note focuses on why this reformulation is important and on how these ideas can improve measurement theory and practice.
Chapter
Full-text available
Huge resources are invested in metrology and standards in the natural sciences, engineering, and across a wide range of commercial technologies. Significant positive returns of human, social, environmental, and economic value on these investments have been sustained for decades. Proven methods for calibrating test and survey instruments in linear units are readily available, as are data- and theory-based methods for equating those instruments to a shared unit. Using these methods, metrological traceability is obtained in a variety of commercially available elementary and secondary English and Spanish language reading education programs in the U.S., Canada, Mexico, and Australia. Given established historical patterns, widespread routine reproduction of predicted text-based and instructional effects expressed in a common language and shared frame of reference may lead to significant developments in theory and practice. Opportunities for systematic implementations of teacher-driven lean thinking and continuous quality improvement methods may be of particular interest and value.
Chapter
Full-text available
In his classic paper entitled “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” Eugene Wigner addresses the question of why the language of Mathematics should prove so remarkably effective in the physical [natural] sciences. He marvels that “the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and that there is no rational explanation for it.” We have been similarly struck by the outsized benefits that theory based instrument calibrations convey on the natural sciences, in contrast with the almost universal practice in the social sciences of using data to calibrate instrumentation.
Chapter
Full-text available
Several concepts from Georg Rasch's last papers are discussed. The key one is comparison because Rasch considered the method of comparison fundamental to science. From the role of comparison stems scientific inference made operational by a properly developed frame of reference producing specific objectivity. The exact specifications Rasch outlined for making comparisons are explicated from quotes, and the role of causality derived from making comparisons is also examined. Understanding causality has implications for what can and cannot be produced via Rasch measurement. His simple examples were instructive, but the implications are far reaching upon first establishing the key role of comparison.
Chapter
Full-text available
The purpose of this paper is to review some assumptions underlying the use of norm-referenced tests in educational evaluations and to provide a prospectus for research on these assumptions as well as other questions related to norm-referenced tests. Specifically, the assumptions which will be examined are (1) expressing treatment effects in a standard score metric permits aggregation of effects across grades, (2) commonly used standardized tests are sufficiently comparable to permit aggregation of results across tests, and (3) the summer loss observed in Title I projects is due to an actual loss in achievement skills and knowledge. We wish to emphasize at the outset that our intent in this paper is to raise questions and not to present a coherent set of answers.
Chapter
Full-text available
Rasch’s unidimensional models for measurement show how to connect object measures (e.g., reader abilities), measurement mechanisms (e.g., machine-generated cloze reading items), and observational outcomes (e.g., counts correct on reading instruments). Substantive theory shows what interventions or manipulations to the measurement mechanism can be traded off against a change to the object measure to hold the observed outcome constant. A Rasch model integrated with a substantive theory dictates the form and substance of permissible interventions. Rasch analysis, absent construct theory and an associated specification equation, is a black box in which understanding may be more illusory than not. Finally, the quantitative hypothesis can be tested by comparing theory-based trade-off relations with observed trade-off relations. Only quantitative variables (as measured) support such trade-offs. Note that to test the quantitative hypothesis requires more than manipulation of the algebraic equivalencies in the Rasch model or descriptively fitting data to the model. A causal Rasch model involves experimental intervention/manipulation on either reader ability or text complexity or a conjoint intervention on both simultaneously to yield a successful prediction of the resultant observed outcome (count correct). We conjecture that when this type of manipulation is introduced for individual reader text encounters and model predictions are consistent with observations, the quantitative hypothesis is sustained.
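A small numerical sketch of the trade-off described above, under the generic Rasch form (values are illustrative assumptions; no particular reading framework's scaling is implied):

```python
import math

def expected_count_correct(reader_ability, text_complexity, n_items=40):
    """Expected count correct under a Rasch model: n_items times a logistic
    function of the reader-text difference (in logits)."""
    p = 1.0 / (1.0 + math.exp(-(reader_ability - text_complexity)))
    return n_items * p

baseline = expected_count_correct(1.0, 0.2)              # reader 1.0, text 0.2 logits
offset = expected_count_correct(1.0 + 0.5, 0.2 + 0.5)    # intervene on both by +0.5
print(baseline, offset)  # equal expected counts: the conjoint trade-off holds
```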
Chapter
Full-text available
Does the reader comprehend the text because the reader is able or because the text is easy? Localizing the cause of comprehension in either the reader or the text is fraught with contradictions. A proposed solution uses a Rasch equation to model comprehension as the difference between a reader measure and a text measure. Computing such a difference requires that reader and text are measured on a common scale. Thus, the puzzle is solved by positing a single continuum along which texts and readers can be conjointly ordered. A reader’s comprehension of a text is a function of the difference between reader ability and text readability. This solution forces recognition that generalizations about reader performance can be text independent (reader ability) or text dependent (comprehension). The article explores how reader ability and text readability can be measured on a single continuum, and the implications that this formulation holds for reading theory, the teaching of reading, and the testing of reading.
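The functional form implied by that solution can be written generically as follows (the scaling conventions of any specific reading framework are not assumed):

```latex
% Comprehension as a function of the reader-text difference on a common scale:
P(\text{success} \mid \theta_{\text{reader}}, \delta_{\text{text}})
  = \frac{\exp(\theta_{\text{reader}} - \delta_{\text{text}})}
         {1 + \exp(\theta_{\text{reader}} - \delta_{\text{text}})}
```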
Chapter
Full-text available
This paper describes Mapping Variables, the principal technique for planning and constructing a test or rating instrument. A variable map is also useful for interpreting results. Modest reference is made to the history of mapping leading to its importance in psychometrics. Several maps are given to show the importance and value of mapping a variable by person and item data. The need for critical appraisal of maps is also stressed.
Chapter
Full-text available
A construct theory is the story we tell about what it means to move up and down the scale for a variable of interest (e.g., temperature, reading ability, short term memory). Why is it, for example, that items are ordered as they are on the item map? The story evolves as knowledge regarding the construct increases. We call both the process and the product of this evolutionary unfolding "construct definition" (Stenner et al., Journal of Educational Measurement 20:305–316, 1983). Advanced stages of construct definition are characterized by calibration equations (or specification equations) that operationalize and formalize a construct theory. These equations, make point predictions about item behavior or item ensemble distributions. The more closely theoretical calibrations coincide with empirical item difficulties, the more useful the construct theory and the more interesting the story. Twenty-five years of experience in developing the Lexile Framework for Reading enable us to distinguish five stages of thinking. Each subsequent stage can be characterized by an increasingly sophisticated use of substantive theory. Evidence that a construct theory and its associated technologies have reached a given stage or level can be found in the artifacts, instruments, and social networks that are realized at each level.
Chapter
Full-text available
The process of ascribing meaning to scores produced by a measurement procedure is generally recognized as the most important task in developing an educational or psychological measure, be it an achievement test, interest inventory, or personality scale. This process, which is commonly referred to as construct validation (Cronbach, 1971; Cronbach & Meehl, 1955; ETS, 1979; Messick, 1975, 1980), involves a family of methods and procedures for assessing the degree to which a test measures a trait or theoretical construct.
Chapter
Full-text available
Teachers make use of these two premises to match readers to text. Knowing a lot about text is helpful because “text matters” (Hiebert, 1998). But ordering or leveling text is only half the equation. We must also assess the level of the readers. These two activities are necessary so that the right books can be matched to the right reader at the right time. When teachers achieve this match intuitively, they are rewarded with students choosing to read more.
Chapter
Full-text available
The International Vocabulary of Measurement (VIM) and the Guide to Uncertainty in Measurement (GUM) shift the terms and concepts of measurement information quality away from an Error Approach toward a model-based Uncertainty Approach. An analogous shift has taken place in psychometrics with the decreasing use of True Score Theory and increasing attention to probabilistic models for unidimensional measurement. These corresponding shifts emerge from shared roots in cognitive processes common across the sciences and they point toward new opportunities for an art and science of living complex adaptive systems. The psychology of model-based reasoning sets the stage for not just a new consensus on measurement and uncertainty, and not just for a new valuation of the scientific status of psychology and the social sciences, but for an appreciation of how to harness the energy of self-organizing processes in ways that harmonize human relationships.
Chapter
Full-text available
Psychometric models typically represent encounters between persons and dichotomous items as a random variable with two possible outcomes, one of which can be labeled success. For a given item, the stipulation that each person has a probability of success defines a construct on persons. This model specification defines the construct, but measurement is not yet achieved. The path to measurement must involve replication; unlike coin-tossing, this cannot be attained by repeating the encounter between the same person and the same item. Such replication can only be achieved with more items whose features are included in the model specifications. That is, the model must incorporate multiple items. This chapter examines multi-item model specifications that support the goal of measurement. The objective is to select the model that best facilitates the development of reliable measuring instruments. From this perspective, the Rasch model has important features compared to other models.
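As a minimal sketch of the replication point made in this abstract, the code below treats item difficulties as known and estimates a person parameter by maximum likelihood from responses to several dichotomous items under the Rasch model. The difficulties and the response pattern are invented for illustration.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iters=50):
    """Newton-Raphson maximum-likelihood estimate of ability from several items."""
    theta = 0.0
    for _ in range(iters):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1 - p) for p in probs)
        theta += gradient / information
    return theta

difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item difficulties (logits)
responses = [1, 1, 1, 0, 0]                 # one person's pattern across the items
print(round(estimate_ability(responses, difficulties), 2))
```

Replication here comes from the multiple items, not from repeating the same encounter; a mixed response pattern is needed for the maximum-likelihood estimate to exist.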
... This assumption raises the question whether psychological constructs actually exist prior to and independently of the test that is used to measure them, whether they have a quantitative structure, and whether variations in them actually produce variations in the outcomes of the measurement procedure [28]. The debate is more than a century old and reporting on it is beyond the scope of this work, but several works tackle the issue from different perspectives [29][30][31]. From a methodological point of view, the measurement process defined as 'the assignment of numerals to objects or events according to rule' (p. 677) [32] applies firstly to the observable behaviors that are consistent with the theoretical definition of the construct, i.e., the so-called 'operationalizations' (operations that anyone can perform [33]) of the construct. ...
... Unlike classical psychometric models such as factor analysis and item response theory, which assume that an unobservable latent variable or construct causes the observable behaviors and their pattern of covariation, network psychometrics considers psychological characteristics as states or stable organizations of dynamic components that can interact with and cause one another. This approach has important theoretical implications, since it addresses the long-standing issue of the actual existence of psychological constructs (see, e.g., [29]) and appears to be more consistent with observations of the role played by observable behaviors. For instance, in psychopathology there appear to be many direct causal relationships between symptoms, and these sorts of associations can play an important generative role in the etiology of a disorder, beyond accounting for the empirical covariance between symptoms (see, e.g., [98]). ...
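A minimal sketch of the contrast drawn in the second excerpt, assuming a partial-correlation network as the summary of item-level dependencies (this is not necessarily the estimator used in the cited study): with made-up data in which symptom A drives B and B drives C, the direct A–C edge largely disappears once B is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Made-up "symptom" scores: A influences B, B influences C (no direct A -> C path).
a = rng.normal(size=n)
b = 0.6 * a + rng.normal(scale=0.8, size=n)
c = 0.6 * b + rng.normal(scale=0.8, size=n)
X = np.column_stack([a, b, c])

# Partial correlations from the precision (inverse covariance) matrix:
# the edge between two variables controlling for all remaining variables.
precision = np.linalg.inv(np.cov(X, rowvar=False))
scale = np.sqrt(np.diag(precision))
partial = -precision / np.outer(scale, scale)
np.fill_diagonal(partial, 1.0)

print(np.round(partial, 2))  # the A-C entry should be near zero
```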
Article
Full-text available
Existing measures of the impact of job characteristics on workers’ well-being do not directly assess the extent to which such characteristics (e.g., opportunity to learn new skills) are perceived as positive or negative. We developed a measure, the Work Annoyance Scale (WAS), of the level of annoyance that workers feel about certain aspects of the job and evaluated its psychometric properties. Using archival data from two cohorts (n = 2226 and 655) of workers who had undergone an annual medical examination for occupational hazards, we show the usefulness of the network psychometric approach to scale validation and its similarities to, and differences from, a traditional factor analytic approach. The results revealed a two-dimensional structure (working conditions and cognitive demands) that was replicable across cohorts and bootstrapped samples. The two dimensions had adequate structural consistency and discriminant validity with respect to other questionnaires commonly used in organizational assessment, and showed a consistent pattern of association with relevant background variables. Despite the need for more extensive tests of its content and construct validity in light of the organizational changes due to the COVID-19 pandemic, and for an evaluation of the generalizability of the results to cultural contexts other than the Italian one, the WAS appears to be a psychometrically sound tool for assessment and research in organizational contexts.
... For many psychological variables, measurement is a foundational element of theory development, and vice versa (Loevinger, 1957). To assess latent or unobservable variables, such as depression or extraversion, one usually develops indicators (i.e., items on a self-report measure) that are, in theory, caused by the unobservable variable (Michell, 1997). To use an example from physics, heat cannot be directly observed, but heat causes mercury to expand, so one can assess temperature using a mercury thermometer. ...
Chapter
In this chapter, I review key conceptual and methodological sources of bias in psychological measurement, emphasizing those with particular relevance to political phenomena and providing relevant examples of measurement bias in political psychological research. I then review the case of authoritarianism, which until recently was predominantly assessed among political conservatives. This emphasis on right-wing authoritarianism and the paucity of research concerning left-wing authoritarianism have led to widespread conceptual obstacles to understanding the psychological underpinnings of authoritarianism, illustrating the degree to which measurement bias has key implications for theory development and testing. In closing, I provide several recommendations for reducing political bias in psychological measurement.
... For instance, genetics studies on major depression and intelligence take into account standardised measures such as the Hamilton scale or various IQ tests, respectively. Although such measures provide practical tools to assess individual differences in psychology, framing a trait's variation as 'quantitative' is by no means neutral from an ontological perspective (e.g., Michell 1997, 2012; Hibberd 2014; Serpico 2018; Ward 2022). Moreover, the conceptualisation of many psychological traits varies widely across research areas, so that their definition is typically surrounded by major disagreements. ...
Article
Full-text available
The aim of this collection is to bring together philosophical and historical perspectives to address long-standing issues in the interpretation, utility, and impacts of quantitative genetics methods and findings. Methodological approaches and the underlying scientific understanding of genetics and heredity have transformed since the field’s inception. These advances have brought with them new philosophical issues regarding the interpretation and understanding of quantitative genetic results. The contributions in this issue demonstrate that there is still work to be done integrating old and new methodological and conceptual frameworks. In some cases, new results are interpreted using assumptions based on old concepts and methodologies that need to be explicitly recognised and updated. In other cases, new philosophical tools can be employed to synthesise historical quantitative genetics work with modern methodologies and findings. This introductory article surveys three general themes that have dominated philosophical discussion of quantitative genetics throughout history: 1. How methodologies have changed and transformed our knowledge and interpretations; 2. Whether or not quantitative genetics can offer explanations relating to causation and prediction; 3. The importance of defining the phenotypes under study. We situate the contributions in this special issue within a historical framework addressing these three themes.
... In IRT, measurement is understood as a logistic, probabilistic relationship between item difficulty and person ability. The underlying idea is that increasing a person's ability leads to a higher probability of solving a single item (Bond & Fox, 2013; Michell, 1997). IRT models may also account for varying item discriminability, guessing, or a clustered person structure. ...
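The logistic relation described in this excerpt can be written out directly. The sketch below uses the standard three-parameter logistic form (discrimination a, difficulty b, guessing c) as an illustration; the parameter values are arbitrary.

```python
import math

def irt_3pl(theta: float, a: float = 1.0, b: float = 0.0, c: float = 0.0) -> float:
    """Three-parameter logistic item response function: P(correct | ability theta)
    for an item with discrimination a, difficulty b, and guessing floor c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Probability of success rises with ability; the guessing parameter sets the floor.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(irt_3pl(theta, a=1.2, b=0.5, c=0.2), 3))
```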
Preprint
In this study, we compared different models of reading comprehension on a large dataset of more than 6,500 students. We examine the Simple View of Reading, the Progress in International Reading Literacy Study (PIRLS) four-process model, as well as the influence of text difficulty, by applying cross-validated psychometric modeling in the frameworks of classical test theory and item response theory to the newly developed reading comprehension test BYLET. Results demonstrate the best fit for a four-process model and a negligible influence of text difficulty measured by word and sentence length. The psychometric models were robust toward new samples and the test showed good reliability and validity. We conclude that theories of reading comprehension processes also apply to the measurement of reading comprehension as a trainable skill. The study is preregistered. Analysis code is available on https://osf.io/ywrks/?view_only=ce6ea36a465e4e959a17be90234ea0c7. Materials can be sent to interested researchers on request.
... One of the most prominent issues discussed in this context is that the present modus operandi builds on several untested assumptions about the nature of psychological attributes and their relation to the behavior that is actually assessed. For example, Michell (1997, 1999, 2009) argues that the quantitative nature of psychological attributes cannot be taken for granted but requires empirical justification. In his view, the claim that psychological attributes are quantitative can only be justified if the empirical structure of psychological assessments conforms to the axioms of a corresponding measurement model (Krantz et al., 1971). ...
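One of the empirical checks alluded to here is the double cancellation condition of conjoint measurement (Krantz et al., 1971). The sketch below tests the condition exhaustively on small matrices of cell values; both matrices are invented for illustration.

```python
from itertools import product

def double_cancellation_holds(M) -> bool:
    """Check double cancellation: for rows a, b, c and columns x, y, z,
    if M[a][y] >= M[b][x] and M[b][z] >= M[c][y], then M[a][z] >= M[c][x]."""
    rows, cols = range(len(M)), range(len(M[0]))
    for a, b, c in product(rows, repeat=3):
        for x, y, z in product(cols, repeat=3):
            if M[a][y] >= M[b][x] and M[b][z] >= M[c][y] and M[a][z] < M[c][x]:
                return False
    return True

additive = [[0, 1, 3],   # each cell = row effect + column effect, so the condition holds
            [1, 2, 4],
            [3, 4, 6]]
violating = [[3, 5, 1],  # constructed so the antecedent pair holds but the consequent fails
             [4, 5, 7],
             [9, 6, 8]]
print(double_cancellation_holds(additive))   # True
print(double_cancellation_holds(violating))  # False
```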
Article
Full-text available
Psychometrics builds on the fundamental premise that psychological attributes are unobservable and need to be inferred from observable behavior. Consequently, psychometric procedures consist primarily in applying latent variable modeling, which statistically relates latent variables to manifest variables. However, latent variable modeling falls short of providing a theoretically sound definition of psychological attributes. Whereas in a pragmatic interpretation of latent variable modeling latent variables cannot represent psychological attributes at all, a realist interpretation of latent variable modeling implies that latent variables are empty placeholders for unknown attributes. The authors argue that psychological attributes can only be identified if they are defined within the context of substantive formal theory. Building on the structuralist view of scientific theories, they show that any successful application of such a theory necessarily produces specific values for the theoretical terms that are defined within the theory. Therefore, substantive formal theory is both necessary and sufficient for psychological measurement.
... The CTT paradigm regarding measurement focuses on assessment grades qua grades and relies on a form of operationalism, at least with respect to the concepts purported to be measured (Bridgman, 1927); though, strictly speaking, CTT (or IRT) is not an approach to 'measurement' in the fundamental sense of classical scientific measurement (i.e., "the estimation or discovery of the ratio of some magnitude of a quantitative attribute to a unit of the same attribute"; Michell, 1997), instead falling more within representational theories of measurement. CTT, or 'true score theory' as it is also known, starts from the simple assumption that each individual has a true score for any given measurement operation, which would be observed if there were no errors in measurement. ...
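A minimal sketch of the true-score premise mentioned in this excerpt: simulate observed scores as true score plus independent error on two parallel forms, and the parallel-forms correlation approximates the reliability (true-score variance over observed-score variance). All numbers are invented.

```python
import random

random.seed(1)
N, TRUE_SD, ERROR_SD = 5000, 10.0, 5.0

# Classical test theory premise: observed = true + error, error independent of true score.
true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
form_a = [t + random.gauss(0, ERROR_SD) for t in true_scores]
form_b = [t + random.gauss(0, ERROR_SD) for t in true_scores]

def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

print(round(correlation(form_a, form_b), 3))                    # empirical estimate
print(round(TRUE_SD ** 2 / (TRUE_SD ** 2 + ERROR_SD ** 2), 3))  # theoretical reliability 0.8
```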
Preprint
Full-text available
Extended constructed response assessment methods such as the dissertation are common in higher education assessment and are typically afforded considerable weight in overall student assessment. Thus, quality assurance processes such as double marking are commonly implemented. However, the value of such processes has been questioned. Further, the measurement properties of the dissertation assessment method have received little attention. As such, we explored the dissertation assessment method through both Classical Test Theory (CTT) and Item Response Theory (IRT) approaches using a historical dataset of first and second marker grades. Under CTT we found poor agreement between markers, which could threaten the validity of the true grades assigned to students. However, under IRT models we found that markers showed greater agreement regarding the underlying latent abilities thought to give rise to the extended constructed response that is the dissertation. We conclude by questioning the value of double marking processes. Grades qua grades (i.e., true grades) typically show poor agreement between markers, suggesting double marking may be a waste of resources. Instead, the determination of grades from latent ability scores using an IRT measurement model, which showed greater agreement between markers, might enable a single marker to provide a valid grade to students.
... Evidence for instrument validity commences with an adept conceptualisation phase that requires identifying a suitable conceptual or theoretical framework, relevant extensive scholarly literature and a panel of experts to scrutinise the construct(s) and provide feedback on the content validity (Du Preez & De Klerk, 2019;Michell, 1997;Zhou, 2019). Existing scholarly theories were utilised to generate items for the SCTQ to measure sensory ergonomics as a latent construct. ...
Article
Full-text available
Physical classrooms provide immense sensory stimulation to children and inform behaviour, cognitive processes and psychological state of mind. Children diagnosed with any subtype of attention-deficit hyperactivity disorder (ADHD) are more likely to exhibit sensory integration/processing impairments that contribute to inappropriate behavioural and learning responses. Teachers need good information and user-friendly psycho-educational instruments to meet the needs of children diagnosed with any ADHD subtype. The Sensory Classroom Teacher Questionnaire (SCTQ) utilises ADHD symptomatology to evaluate learning spaces that support children in regulating their response to sensory input. We report on the piloted design and refinement of the SCTQ based on best practices. A convenience sample of South African early childhood teachers administered the first (n = 313) and second (n = 72) versions of the SCTQ at various primary schools. Cross-disciplinary specialists appraised the SCTQ for content validity, while the Rasch rating scale model was applied to assess internal construct reliability and validity. The structure of the latent constructs was assessed using Bayesian confirmatory factor analysis. Following the first pilot, we refined the SCTQ by combining or deleting unnecessary items and reducing the five-point Likert scale to a three-point scale. Revising the Likert scale in version one was necessary to improve category functioning. Adjusting the three-point scale in the revised SCTQ indicated good item and scale functioning. We show the conceptual framework, refinement process, all results and the most recent version of the SCTQ for teachers to use and educational researchers to adapt further.
... For example, a gain of 10 points for a low scoring student is taken to indicate the same amount of increase in achievement as a score increase of 10 points for a high scoring student, and likewise for a mid-scoring student. There is a thoughtful and ongoing debate in the field about whether test scores can have this property (see, e.g., Ballou, 2009; Borsboom, 2005; Briggs, 2013; Domingue, 2014; Michell, 1997, 2008; Mislevy, 2006; Mosteller, 1958; Yen, 1986). But to use vertical scales to measure cross-grade performance, testing programs are tasked with substantiating this claim with supporting evidence. ...
... The collapse of the craniological research program and its failure at quantifying intelligence by means of physical measures did not prevent the assumption of quantitativity of psychological attributes from making its way into the development of early psychological testing, most importantly intelligence testing (Boring, 1961; Carson, 2007, 2014; Gould, 1981). Even when measures of intelligence by means of standardized testing started to appear, independent evidence of its quantitative structure proved far from easy to obtain (Michell, 1997). ...
Article
Full-text available
Craniology – the practice of inferring intelligence differences from the measurement of human skulls – survived the dismissal of phrenology and remained a widely popular research program until the end of the nineteenth century. From the 1970s, historians and sociologists of science extensively focused on the explicit and implicit socio-cultural biases invalidating the evidence and claims that craniology produced. Building on this literature, I reassess the history of craniological practice from a different but complementary perspective that relies on recent developments in the epistemology of measurement. More precisely, I identify two aspects of the measurement culture of nineteenth-century craniologists that are crucial to understanding the lack of validity of craniological inference: their neglect of the problem of coordination for their presupposed quantification of intelligence and their narrow view of calibration. Based on my analysis, I claim that these methodological shortcomings amplified the impact of the socio-cultural biases of craniologists, which had a pervasive role in their evidential use of measurement. Finally, my argument shows how the epistemology of measurement perspective can offer useful tools in debates concerning the use of biological evidence to foster social discourse and for analyzing the relationship between theory, evidence, and measurement.
... That is, it promotes the illusion of a cumulative scientific process, where researchers respond to criticism iteratively with "improved" methods, when in reality the same errors of directional confirmation are repeated. Rather than being a feature of a rigorous scientific process, the prevalence of so much methodological criticism can be viewed as a symptom of a poorly functioning scientific system, one that focuses superficially on task design at the expense of test development, theory, measurement and construct validity (Allen, 2014; Borsboom, 2006, 2014; Eronen & Bringmann, 2021; Farrell & Lewandowsky, 2010; Flake & Fried, 2019; Michell, 1997). A full discussion of these issues in animal cognition research is overdue and beyond the scope of this thesis, but is touched upon again briefly in the discussion of modelling in Chapter 10. ...
Thesis
Full-text available
In this thesis I explore the extent to which researchers of animal cognition should be concerned about the reliability of its scientific results and the presence of theoretical biases across research programmes. To do so I apply and develop arguments borne in human psychology’s “replication crisis” to animal cognition research and assess a range of secondary data analysis methods to detect bias across heterogeneous research programmes. After introducing these topics in Chapter 1, Chapter 2 makes the argument that areas of animal cognition research likely contain many findings that will struggle to replicate in direct replication studies. In Chapter 3, I combine two definitions of replication to outline the relationship between replication and theory testing, generalisability, representative sampling, and between-group comparisons in animal cognition. Chapter 4 then explores deeper issues in animal cognition research, examining how the academic systems that might select for research with low replicability might also select for theoretical bias across the research process. I use this argument to suggest that much of the vociferous methodological criticism in animal cognition research will be ineffective without considering how the academic incentive structure shapes animal cognition research. Chapter 5 then begins my attempt to develop methods to detect bias and to critically and quantitatively synthesise evidence in animal cognition research. In Chapter 5, I led a team examining publication bias and the robustness of statistical inference in studies of animal physical cognition. Chapter 6 was a systematic review and a quantitative risk-of-bias assessment of the entire corvid social cognition literature. And in Chapter 7, I led a team assessing how researchers in animal cognition report and interpret non-significant statistical results, as well as the p-value distributions of non-significant results across a manually extracted dataset and an automatically extracted dataset from the animal cognition literature. Chapter 8 then reflects on the difficulties of synthesising evidence and detecting bias in animal cognition research. In Chapter 9, I present survey data from over 200 animal cognition researchers whom I questioned on the topics of this thesis. Finally, Chapter 10 summarises the findings of this thesis and discusses potential next steps for research in animal cognition.
Article
Psychological science constructs much of the knowledge that we consume in our everyday lives. This book is a systematic analysis of this process, and of the nature of the knowledge it produces. The authors show how mainstream scientific activity treats psychological properties as being fundamentally stable, universal, and isolable. They then challenge this status quo by inviting readers to recognize that dynamics, context-specificity, interconnectedness, and uncertainty, are a natural and exciting part of human psychology – these are not things to be avoided and feared, but instead embraced. This requires a shift toward a process-based approach that recognizes the situated, time-dependent, and fundamentally processual nature of psychological phenomena. With complex dynamic systems as a framework, this book sketches out how we might move toward a process-based praxis that is more suitable and effective for understanding human functioning.
Article
Full-text available
There is much social scientific research dedicated to measuring and studying public trust. I examine the ways in which the notion of trust is implicitly conceptualised in such studies. I argue that there is a common ontological foundation in most research on the topic: trust is viewed as a phenomenon that is an attribute of individuals, intrapsychic, directed toward specific targets, and that is quantitative and measurable. I criticise this conceptualisation of trust and argue that it: (1) fails to consider the trustworthiness of individuals and institutions, (2) fails to recognise trust as a relational phenomenon and overlooks historical and material conditions that characterise relationships between people and institutions, and (3) lends itself to bureaucratic manipulation of publics rather than fostering authentic relationships of trust. I conclude that studies on trust need to be situated in larger frameworks that attend to the trustworthiness of actors and to relationships between them.
Chapter
This chapter considers the methodological principles of studying proculturation. It highlights the importance of taking an emic approach and “methodological cycle”. Fundamental qualitative features of proculturation are discussed that are considered as the basis for the elaboration of an appropriate phenomena-focused methodological framework. The developmental and constructive/imaginative nature of proculturation is particularly highlighted.
Article
Most mainstream psychologists consider philosophy irrelevant to their work, but see themselves as realists. Various opposition movements embrace philosophy but reject realism, either completely or partially, despite upholding ideas consistent with a realist philosophy. Many on both sides see the Tower of Babel that constitutes psychology as a sign of healthy diversity, not fragmentation. We argue that relations among the three factors – philosophy, realism and fragmentation – deserve closer scrutiny. With philosophy’s core method of conceptual analysis deprioritized, both mainstream psychology and the opposition fracture into an array of “partial realisms”, falling away from a realism that is thoroughly consistent. These are the source of psychology’s fragmentation. The conceptual neglect and resulting confusions can be seen in psychology’s recent replicability crisis, and in the widespread adoption of a representationalist approach to the mind/brain. We argue that the metatheoretical coherence and methodological maturity required for genuine scientific progress involves a consistent adherence to realism. We spell out what that involves and consider several reasons why this has been so difficult for psychology.
Chapter
Understanding in psychological and social domains is different from classical accounts in the relatively simple physical domains, where we search for laws of nature, explanation of particulars (point predictions), and explanation as hypothetico-deductive inference from “covering” laws. The quest for an analogous “social physics” is chimerical: the essential complexity of the “moral sciences” requires explanation by rules governing behavior (not laws), explanation of the underlying principles rather than the indefinite particulars (models as abstract structures behind appearances), and an understanding of historical uniqueness in all living subjects (as opposed to timeless identical and interchangeable objects in physics). These domains are empirical, but not experimental—the mathematical and scaling properties which experimentation presupposes are not available. Spontaneous orders require study of the superior power of disequilibrating forces and negative rules of order as constraints, and giving up the quest for positive prescriptions of particulars. Keywords: Explanation, Laws, Rules, Empirical versus experimental
Article
Full-text available
The current predicament of British science is but one consequence of a deep and widespread malaise. In response, scientists must reassert the pre-eminence of the concepts of objectivity and truth.
Chapter
Stevens proposed that measurement in psychology should employ one of four scales, nominal, ordinal, interval and ratio, each characterized by a mathematical operation that defines the group of which the scale is an example. Isomorphic with the mathematical scale, there is an appropriate psychological operation which if employed warrants the use of the corresponding scale. The development of this proposal is traced.
Article
The modern viewpoint on quantities goes back at least to Newton’s Universal Arithmetick. Newton asserts that the relation between any two quantities of the same kind can be expressed by a real, positive number. In 1901, O. Hoelder gave a set of ‘Axiome der Quantitaet’, which are sufficient to establish an isomorphism between any realization of his axioms and the additive semigroup of all positive real numbers. Related work of Hilbert, Veronese and others is indicative of a general interest in the subject of quantities in the abstract on the part of mathematicians of this period. During the last thirty years, from another direction, philosophers of science have become interested in the logical analysis of empirical procedures of measurement. The interests of these two groups overlap insofar as the philosophers have been concerned to state the formal conditions which must be satisfied by empirical operations measuring some characteristic of physical objects (or other entities). Philosophers have divided quantities (that is, entities or objects considered relatively to a given characteristic, such as mass, length or hardness) into two kinds. Intensive quantities are those which can merely be arranged in a serial order; extensive quantities are those for which a “natural” operation of addition or combination can also be specified. Another, more exact, way of making a distinction of this order is to say that intensive quantities are quantities to which numbers can be assigned uniquely up to a monotone transformation, and extensive quantities are quantities to which numbers can be assigned uniquely up to a similarity transformation (that is, multiplication by a positive constant). This last condition may be said to be the criterion of formal adequacy for a system of extensive quantities.
Article
Cliff's (1992) commentary on the failure of axiomatic measurement theory (AMT) to generate as much impact on cognitive psychology and psychometrics as he had once anticipated invites further commentary. For the most part, we do not disagree with his observations, but we believe that some amplification and clarification may be helpful. We attempt to establish three major points: 1. There are areas of psychology (e.g., decision making and psychophysics) in which AMT has had considerably more impact than Cliff acknowledges, and in these areas it assumes the form of theory, not scale, construction. 2. There are results of a new type (described in Luce, Krantz, Suppes, & Tversky, 1990; Narens, 1985), less well known than those of Krantz, Luce, Suppes, and Tversky (1971), about which Cliff makes no comment. These results should be of broad interest in psychology for two reasons: They provide a shelf of nonadditive representations that can be drawn upon along with the traditional additive and multiplicative ones, and they give better understanding about how to apply meaningfulness and invariance arguments. 3. The failure of measurement to "take" in cognition and psychometrics is related to a deep conceptual question concerning the relationship between statistics, as a way of describing randomness, and measurement, as a way of describing structure. The lack of an adequate theory for this relationship is, in reality, a weakness of both fields. Our observations do not undercut Cliff's charge that a possibly major reason for the limited impact lies with the researchers themselves. There is no question that our published works tend to be mathematically accessible only to persons having some exposure to abstract algebra, geometry, and topology. Except for Roberts, whose 1979 book describes some of the major additive models and their applications, the field is still awaiting someone willing and able to write a suitable bridging work. We suspect this is a major reason why Cliff and others seem unaware of some of the important applications and de-
Article
The thesis that numbers are ratios of quantities has recently been advanced by a number of philosophers. While adequate as a definition of the natural numbers, it is not clear that this view suffices for our understanding of the reals. These require continuous quantity and relative to any such quantity an infinite number of additive relations exist. Hence, for any two magnitudes of a continuous quantity there exists no unique ratio. This problem is overcome by defining ratios, and hence real numbers, as binary relations between infinite standard sequences. This definition leads smoothly into a new definition of measurement consonant with the traditional view of measurement as the discovery or estimation of numerical relations. The traditional view is further strengthened by allowing that the additive relations internal to a quantity are distinct from relations observed in the behaviour of objects manifesting quantities. In this way the traditional theory can accommodate the theory of conjoint measurement. This is worth doing because the traditional theory has one great strength lacked by its rivals: measurement statements and quantitative laws are able to be understood literally.
Article
Current researches on how we arrive at decisions concentrate on utility functions. This article deals with individuals' choices among pairs of alternatives involving only two components, in situations where no risks are incurred. For instance, Alice's father announces he will buy her a Ford coupé for graduation. She is permitted to select between blue and yellow, and to decide whether or not it is to be convertible. A model for choices of this type is detailed here and a test of it reported.
Article
Various axiomatic theories of magnitude estimation are presented. The axioms are divided into the following categories: behavioral, in which the primitive relationships are in principle observable by the experimenter; cognitive, in which the primitive relationships are theoretical in nature and deal with subjective relationships that the subject is supposedly using in making his or her magnitude estimations; and psychobehavioral, in which the relationships are theoretical and describe a supposed relationship between the experiment's stimuli and the subject's sensations of those stimuli. The goal of these axiomatizations is to understand from various perspectives what must be observed by the experimenter and assumed about the subject so that the results from an experiment in which the subject is asked to estimate or produce ratios are consistent with the proposition that the subject is, in a scientific sense, “computing ratios” in making his or her magnitude responses.
Article
This book is written as a text for a one semester course introducing students to the foundational issues involved in psychological measurement. It is not intended to compete with traditional textbooks on psychometrics, but rather to supplement them, by bringing to students in a critical way some of the recent advances made in our understanding of measurement by theorists such as S. S. Stevens, P. Suppes and R. D. Luce. The theory of conjoint measurement shows how non-extensive forms of measurement can be incorporated within a single, neo-traditional conception, one which has number as part of the empirical realm. In this book I have sought to present conjoint measurement as part of such a theory, one consonant with the development of quantitative science and consonant with an empirical realist theory of number. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Scitation is the online home of leading journals and conference proceedings from AIP Publishing and AIP Member Societies
Article
Professor Terman compares the mental test and the psychological experiment, basing his comparison upon a questionary sent to twenty-two psychologists. The following differences were noted: "1. Tests are intended to throw light upon individual differences; the experiment, to establish general principles. 2. The test, in contrast to the experiment, is characterized by simplicity, or brevity, or less elaborateness, or the use of paper and pencil instead of apparatus. 3. The test has a practical aim, usually individual diagnosis and guidance; it has to do with technology rather than with science." Five correspondents point out the methodological identity of test and experiment. The main thesis of the paper is that the mental test and the psychological experiment are essentially alike, and this is shown by examining the alleged grounds of distinction: (1) Use of tests in individual psychology; (2) pencil and paper character; (3) omission of introspection; (4) exactness, verifiability of results, control of conditions, and possibilities of analysis; (5) practical vs. theoretical aim. An historical survey shows that the mental test and the psychological experiment have grown up together. The psychologists questioned voted that the test method compares well in importance with other accepted psychological methods. From Psych Bulletin 22:01:00163. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Sensory psychology, although it quantitatively measures temporal, spatial, and intensive differences, is considered a qualitative science, because sensations show qualitative and modality differences. These must be accounted for by some selective process, which is usually assigned to the end organ rather than to the afferent nerve terminal. A study of the nature of stimulation, however, shows that afferent nerves and their impulses are qualitatively alike, and a quantitative description of the differences in volleys of nerve discharge constitutes a complete description of neural differences. What, then, is the basis of differentiation between sensations? An examination of hunger and nausea, cold and warmth, pain and pressure, indicates that they are differentiated in terms of the patterns of movement in given tissues which arouse corresponding patterns of nerve discharge. The word quality refers only to such a moment-by-moment representation of events at the periphery. The spatial and temporal pattern of the sensation is determined by the pattern of the volley of nerve impulses, but the modality is determined by the central region which furnishes the intrinsic nervous activity upon which the peripheral pattern is impressed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
The author asserts that experimental psychology has added not a single quantitative law since the formulation of Weber's law. He believes that a promising approach to the problem of mental measurement may lie in the possibility of correlating experiences of "more" or "less" with extensive changes of linear magnitude. The program which he suggests for carrying out the problem is as follows: "Compare two or more linear magnitudes, note the errors resulting from the different comparisons, and plot one against the other. If the resulting series of points forms a smooth curve it will be possible to express differences in terms of errors, or errors in terms of differences. The meanings will be identical. The unit distance may be defined as that distance which is reacted to correctly in 75% of the cases, or any other percentage preferred to 75." It appears to the author to make little difference whether the units be equal in the psychological sense or in the physical sense. In selecting a method we must provide at least two samples differing from each other, if possible, only in the one feature to be compared. The method of paired comparisons and the method of rank differences are two methods which satisfy these requirements. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
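The program sketched in this abstract (define the unit as the difference judged correctly in 75% of comparisons) can be illustrated by reading the 75% point off empirical proportions correct. The data below are invented and the interpolation is deliberately crude; it is a sketch, not the author's procedure.

```python
# Minimal sketch with invented comparison data: estimate the "unit distance" as the
# physical difference that is judged correctly in 75% of trials, by linear interpolation.
differences = [1, 2, 3, 4, 5, 6]                      # physical difference between magnitudes
prop_correct = [0.54, 0.61, 0.70, 0.78, 0.86, 0.93]   # hypothetical proportions correct

def unit_distance(xs, ps, target=0.75):
    """Interpolate the difference at which the target proportion correct is reached."""
    for (x0, p0), (x1, p1) in zip(zip(xs, ps), zip(xs[1:], ps[1:])):
        if p0 <= target <= p1:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("target proportion is not bracketed by the data")

print(round(unit_distance(differences, prop_correct), 2))  # about 3.6
```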
Article
A nonrelational theory of cross-modal matching, including magnitude estimation, is proposed for the general class of 1-dimensional measurement structures that have real-unit (ratio and interval scale) representations. A key feature of these structures is that each point of the structure can be mapped into each other point by a translation, which is the structural analogue of ratio scale transformations of the representation. Let M denote a 1:1 matching relation between 2 (not necessarily distinct) unit structures. The major assumption is that for each translation τ of one structure, there is a translation ςτ of the other such that if x M s, then τ(x) M ςτ(s). This property is shown to be equivalent to a power law holding between the unit representations. A concept of similar relations is taken from dimensional analysis, and 2 matching relations are shown to be similar if and only if their power laws differ only in the unit (modulus), not the exponent. A relation R between pairs in each system is said to be a ratio relation relative to a matching relation M satisfying the above condition provided that (x, y) R (s, t) obtains if and only if for some translation τ both τ(x) M s and τ(y) M t. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
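The central property described here can be checked numerically: if a matching relation follows a power law, composing it with a translation of a ratio-scale structure (multiplication by a constant) changes only the modulus, never the exponent. The exponent and rescaling factor below are arbitrary.

```python
# Minimal numeric illustration with arbitrary constants: for a power-law match
# s = k * x**beta, rescaling x by a constant c multiplies the match by c**beta,
# i.e., only the modulus changes while the exponent is preserved.
beta, k, c = 0.6, 2.0, 3.0

def match(x: float) -> float:
    return k * x ** beta

stimuli = [1.0, 2.0, 5.0, 10.0]
ratios = [match(c * x) / match(x) for x in stimuli]
print([round(r, 6) for r in ratios])  # the same ratio at every stimulus level
print(round(c ** beta, 6))            # equals c**beta: the change in modulus only
```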
Article
Justification, in the vernacular language of philosophy of science, refers to the evaluation, defense, and confirmation of claims of truth. In this article, we examine some aspects of the rhetoric of justification, which in part draws on statistical data analysis to shore up facts and inductive inferences. There are a number of problems of methodological spirit and substance that in the past have been resistant to attempts to correct them. The major problems are discussed, and readers are reminded of ways to clear away these obstacles to justification. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Discussions about the adequacy of psychological measurement and assessment can quickly become controversial. Debates about the usefulness of criticism of psychological testing are longstanding. My 1st purpose, then, is to provide a historical survey of relevant measurement and assessment concepts. I do not delve into intimate details and complexities, but trace measurement and assessment controversies over time and across psychological domains. My 2nd goal is to expand discussion of the possible directions of measurement and assessment beyond those typically considered. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
A methodological analysis of certain problems which arise in connection with psychophysical measurement is given. An attempt is made to discourage the use of terms such as 'intensive,' 'extensive,' and 'additive' when referring to scales in psychophysics. Measurement in psychophysics is set up as a "technique in its own right which one cannot without violence subsume under any of the customary classifications of physical measurement." Attention is directed to the awkwardness of the fact "that a field of research as significant as psychophysics has been disguised as a search for scales." In setting the psychophysical scales aside from physical measurement, no disparagement of these scales is intended. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
The purpose of this study was to examine the possibility of measuring all those discriminable characteristics that exist in discriminable degrees. The logical criteria for measurement were critically examined and the psychological scaling methods were analyzed in the light of these criteria. The conclusion was drawn that none of the attempts at measurement, used so far by psychologists, meet the necessary criteria for fundamental measurement. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
The distinctions often drawn between applied and general psychology are seen to break down under analysis. Since everything ordinarily included by psychologists in the term applied psychology is shown to belong under general (experimental) psychology, applied psychology has no real scientific connotation (apart from the careful work of psychologists in the applied field, so-called, which is truly scientific) and hence may be turned over to pseudo-scientists with little regret. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Explores the relationship between measurement scales and statistical procedures in 3 theories of measurement within psychology—the representational, the operational, and the classical. It is asserted that the representational theory implies a relation between measurement scales and statistics, although not the one mentioned by S. S. Stevens (1946) or his followers. The operational and classical theories, for different reasons, imply no relation between measurement scales and statistics, contradicting Stevens's prescriptions. It is concluded that a resolution of this permissible-statistics controversy depends on a critical evaluation of these different theories. (36 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Two major traditions regarding the theory and practice of measurement in psychology can be identified: the classical approach associated with the work of S. S. Stevens, and the more recently developed axiomatic approach. These two approaches are compared and related to the historical development of the logical analysis of measurement. Some practical consequences of the choice of a measurement theory paradigm are discussed, and the appropriateness of the two major approaches to guide the theory and practice of psychological measurement are evaluated.
Article
The paper focuses on three problems of generalizing properties of concatenation structures (ordered structures with a monotonic operation) to ordered structures lacking any operation. (1) What is the natural generalization of the idea of Archimedeaness, of commensurability between large and small? (2) What is the natural generalization of the concept of a unit concatenation structure in which the translations (automorphisms with no fixed point) can be represented by multiplication by a constant? (3) What is the natural generalization of a ratio scale concatenation structure being distributive in a conjoint one, which has been shown to force a multiplicative representation of the latter and the product-of-powers representation of units found in physics? It is established (Theorems 5.1 and 5.2) that for homogeneous structures, the latter two questions are equivalent to it having the property that the set of all translations forms a homogeneous Archimedean ordered group. A sufficient condition for Archimedeaness of the translations is that they form a group, which is equivalent to their being 1-point unique, and the structure be Dedekind complete and order dense (Theorems 2.1 and 2.2). It is suggested that Archimedean order of the translations is, indeed, also the answer to the first question. As a lead into that conclusion, a number of results are reported in Section 3 on Archimedeaness in concatenation structures, including for positive structures sufficient conditions for several different notions of Archimedeaness to be equivalent. The results about idempotent structures are fragmentary.
Article
The paper presents a translation of excerpts from Fechner's (1887) paper On the principles of mental measurement and on Weber's law, which was his last and most perfect (Wundt) statement of the assumptions underlying his outer psychophysics. Fechner maintains that all measurement, including mental measurement, rests on the principle that n magnitudes that are judged equal may be added and result in a magnitude n times as large as the individual magnitudes. He concedes that bisection methods fulfill this principle as well as just noticeable differences. Weber's law is not a necessary precondition of mental measurement; its validity is an empirical question rather than a matter of principle. The differential threshold is not an inherent property of sensation or attention, but depends on the unavoidable spatio-temporal noncoincidence of stimuli and of the sensations corresponding to them. Given this presupposition and assigning a value of zero to the absolute threshold, it is possible to arrive at a scale of sensation differences, and thus of sensations, from a scale of difference sensations.
Chapter
The twentieth century has witnessed an unprecedented 'crisis in the foundations of mathematics', featuring a world-famous paradox (Russell's Paradox), a challenge to 'classical' mathematics from a world-famous mathematician (the 'mathematical intuitionism' of Brouwer), a new foundational school (Hilbert's Formalism), and the profound incompleteness results of Kurt Gödel. In the same period, the cross-fertilization of mathematics and philosophy resulted in a new sort of 'mathematical philosophy', associated most notably (but in different ways) with Bertrand Russell, W. V. Quine, and Gödel himself, and which remains at the focus of Anglo-Saxon philosophical discussion. The present collection brings together in a convenient form the seminal articles in the philosophy of mathematics by these and other major thinkers. It is a substantially revised version of the edition first published in 1964 and includes a revised bibliography. The volume will be welcomed as a major work of reference at this level in the field.
Article
The study deals mainly with absolute magnitude estimation (AME) of the component loudnesses and the total loudness of pairs of heterofrequency, sequential tone bursts. Two kinds of relations are derived from the obtained group and individual data on the assumption of loudness additivity and a two-stage scaling model. They refer to numerical loudness estimates versus derived loudness magnitudes and to the loudness magnitudes versus tone sensation levels. The relations are validated by means of indirect and direct loudness matches. In an auxiliary experiment, the same subjects performed AMEs of subjective line lengths. The resulting group and individual relations between the numerical estimates and the underlying physical line lengths were found to be nearly the same as those between the numerical loudness estimates and the derived loudness magnitudes. The mutual consistency among the several sets of empirical and derived data strongly supports the assumptions of loudness additivity and the two-stage model.
Article
This article presents a discussion of the coordinate relationship of mathematical models and empirical observations of the real world. Scales of measurement are taken as examples of the application of mathematical models and the point is made that if the axioms of these scales are not satisfied by that segment of the real world which is mapped into them, then the interpretations of the mathematical conclusions may have no meaning or reality. It is for this reason that it is possible to impose on the real world an abstract theory which may be invalid. A partial ordering of various alternative mathematical systems available for measurement is presented with illustrations in order to reveal the relative strengths of these scales to which the real world must conform to permit their application.
Article
A refutation of Brower's criticisms of the users of statistics, the "atomistic fallacy," and the use of "complicated" statistics. (See 24: 3494.)
Article
It can be maintained that the application of any but ordinal statistics to the results of psychological measurement is unjustified, since there is no operation of addition, and no transitive symmetrical relation of equality. However, the meanings given numbers in measurement can be considered as varying with the operations employed in the measurement. Psychologists can develop measurement operations which will allow the application of statistical methods without a process of addition; e.g. equal-unit and ratio scales.
Adler, H. E. (1980). Vicissitudes of Fechnerian psychophysics in America. In R. W. Rieber & K. Salzinger (Eds.), Psychology: Theoretical-historical Perspectives (pp. 11-23). New York: Academic Press.
Ferguson, A., Myers, C. S., Bartlett, R. J., Banister, H., Bartlett, F. C., Brown, W., Campbell, N. R., Craik, K. J. W., Drever, J., Guild, J., Houstoun, R. A., Irwin, J. O., Kaye, G. W. C., Philpott, S. J. F., Richardson, L. F., Shaxby, J. H., Smith, T., Thouless, R. H., & Tucker, W. S. (1940). Final report of the committee appointed to consider and report upon the possibility of quantitative estimates of sensory events. Report of the British Association for the Advancement of Science, 2, 331-349.
Lorge, I. (1951). The fundamental nature of measurement. In E. F. Lindquist (Ed.), Educational Measurement (pp. 533-559). Washington, DC: American Council on Education.
Luce, R. D. (1987). Measurement structures with Archimedean ordered translation groups. Order, 4.
Luce, R. D. (1990). 'On the possible psychophysical laws' revisited: Remarks on cross-modal matching. Psychological Review, 97, 66-77.
Luce, R. D., Krantz, D. H., Suppes, P., & Tversky, A. (1990). Foundations of Measurement, Vol. 3. San Diego, CA: Academic Press.
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1-27.
McCormack, T. J. (1922). A critique of mental measurements. School and Society, 15, 686-692.
Massey, B. S. (1986). Measures in Science and Engineering: Their Expression, Relation and Interpretation.