Is Construct Validation Valid?

Authors: Anna Alexandrova and Daniel M. Haybron*†

Abstract

What makes a measure of well-being valid? The dominant approach today, construct validation, uses psychometric tests to ensure that questionnaires behave in accordance with background knowledge. Our first claim is interpretive: construct validation obeys a coherentist logic that seeks to balance diverse sources of evidence about the construct in question. Our second claim is critical: while in theory this logic is defensible, in practice it does not secure valid measures. We argue that the practice of construct validation in well-being research is theory avoidant, favoring a narrow focus on statistical tests while largely ignoring relevant philosophical considerations.
*To contact the authors, please write to: Anna Alexandrova, Department of History
and Philosophy of Science, University of Cambridge, Free School Lane, Cambridge
CB2 3RH, UK; e-mail: a.a.alexandrova@gmail.com. Daniel M. Haybron, College
of Arts and Sciences, Department of Philosophy, Saint Louis University, Verhaegen
Hall, 3634 Lindell Blvd., St. Louis, MO 63108; e-mail: haybrond@slu.edu.
†The authors are equally and jointly responsible for the contents. They thank the
anonymous referees, Valerie Tiberius, Colin DeYoung, and Elina Vessonen for
valuable comments.
Philosophy of Science, 83 (December 2016) pp. 1098–1109. 0031-8248/2016/8305-0038$10.00
Copyright 2016 by the Philosophy of Science Association. All rights reserved.
1. Introduction. What makes a measure of well-being valid? A major project
in today's social and medical sciences is measurement of happiness, life
satisfaction, and perceived quality of life using self-reports. When
questionnaires used to elicit these reports obey the principles of
psychometrics, they are considered to be valid measurement tools. Central to
this project is construct validation, a method for checking the consilience of
questionnaires with the background knowledge about the property in question.
In this article we focus on construct validation of measures of self-reported
states relevant to well-being. There is perhaps more to well-being than
subjective states such as happiness or satisfaction, but we put this concern
aside. How an agent feels and judges their life is undoubtedly relevant to
their overall well-being; any theorist accepts that much. So evaluating
standard measurement tools for detecting these feelings and judgments is
important regardless of our philosophical persuasion on the nature of
well-being. The real question is whether construct validation evaluates these
questionnaires in a fair way.
Our first claim is an explicit statement of the logic of the process,
something philosophers have not done so far. Construct validation, we argue,
follows a coherentist spirit according to which measures are valid to the extent
that they cohere with theoretical and empirical knowledge about the states
being measured. In theory this is a defensible approach to measurement, but
in practice the current procedures of validation do not respect all sources of
knowledge about well-being, and this is our second claim. Construct vali-
dation is in fact dangerously theory avoidant, failing to respect a core com-
mitment of any plausible theory of well-being, namely, that well-being is a
normative category. This constraint implies that measures of subjective states
relevant to well-being need to be judged on their normative validity in addi-
tion to other characteristics. The current almost exclusive attention to the sta-
tistical correlations between questionnaires and questionnaire items does not
provide sufcient constraints to weed out weak measures. We close with a sug-
gestion for how construct validation can be improved.
2. What Is Construct Validation? The first order of business is to get
clear on the logic behind the procedure. The psychometric tradition in
the social sciences has historically specialized in tests and questionnaires
for detecting unobservable attributes such as intelligence and personality
traits. Today for virtually all researchers who wish to measure any attribute
on the basis of self-reports or performances in tests, psychometric valida-
tion remains the obligatory procedure. The practitioners of the new science
of well-being (psychologists, sociologists, clinical scientists) have also
embraced questionnaires and, with that, psychometric validation.
Questionnaires used in well-being research range from gauging a person's
feeling ("How anxious do you feel?") to gauging their judgments ("Is your life
going well according to your priorities?") to gauging their perception of facts
deemed important ("Do you feel in control of your circumstances?"). They can be
longer or shorter and administered through various
media. Some well-known questionnaires include the Satisfaction with Life
Scale (SWLS; Diener et al. 1985), the Positive and Negative Affect Scale
(PANAS; Watson, Clark, and Tellegen 1988), and the Nottingham Health
Profile (Hunt et al. 1981), which measure life satisfaction, happiness, and
health-related quality of life, respectively.
Validation of these scales follows a typical pattern described in measure-
ment textbooks and articles on validation (Simms 2008; de Vet et al. 2011).
First, researchers define the construct to be measured by elaborating its
scope and limits. This is the conceptual stage in which the meaning of
the concepts in question is discussed, invoking anything from philosophical
theories to untutored intuitions to dictionary definitions. For example, the
scope of happiness is often deemed to be positive and negative affect, while
the scope of satisfaction with life is deemed a cognitive judgment about
one's conditions and goals. In the second stage, researchers choose a
measurement method (a questionnaire, a test, or a task), select the items (what
questions? what tasks?), and settle on the scoring method. In the third and
final stage, the instrument is tested for its validity. We focus on this last step,
because it is supposed to discipline all the free philosophizing that happens
in the earlier stages with the hard tools of psychometrics. What are those
tools?
It is hard to speak of a psychometric method in general because the methods
are numerous and constantly evolving.¹ But in the case of well-being
measures, validation frequently involves factor analysis: when hundreds
of subjects fill out the same questionnaire, perhaps several times over a
period, it is possible to observe the correlations between responses to
different items. These correlations are then used to show that there are one or
more clusters of items called "factors" that account for the total information.
Scientists speak of factor analysis as "extracting a manageable number of latent
dimensions that explain the covariation among a larger set of manifest
variables" (Simms 2008, 421).² "Explanation" is here used in an entirely
phenomenological sense as saving the phenomena (the phenomena being the
total data generated by administering the questionnaire in question), rather
than stating the causes of the phenomena. For example, the SWLS is a popular
five-item Likert scale for measuring the cognitive aspect of subjective
well-being, that is, the extent to which subjects judge their life to be
satisfactory. Factor analysis identified all five items to be measuring the same
latent variable because a single factor accounted for 66% of the variance
in the data (Diener et al. 1985). Other scales may turn out to gauge more
than one dimension.
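As a rough illustration of what it means for a single factor to account for most of the variance, the following Python sketch simulates five items that all depend on one latent variable and computes the share of total variance captured by the first principal component of their correlation matrix. This is principal-component arithmetic standing in for a full factor analysis. The sample size, the noise level, and every name in the code are assumptions chosen for illustration, and none of the numbers come from the SWLS data.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def largest_eigenvalue(matrix, iterations=200):
    """Power iteration for the largest eigenvalue of a symmetric matrix."""
    k = len(matrix)
    v = [1.0 / sqrt(k)] * k  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        eigenvalue = sqrt(sum(c * c for c in w))
        v = [c / eigenvalue for c in w]
    return eigenvalue


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_subjects, n_items = 500, 5  # hypothetical sample and scale size

# Assumption: each item response = shared latent state + item-specific noise.
latent = [rng.gauss(0, 1) for _ in range(n_subjects)]
items = [[z + rng.gauss(0, 0.7) for z in latent] for _ in range(n_items)]

# Item-by-item correlation matrix.
corr = [[pearson(items[i], items[j]) for j in range(n_items)]
        for i in range(n_items)]

# The trace of a correlation matrix equals the number of items, so the
# largest eigenvalue divided by n_items is the first component's share.
share = largest_eigenvalue(corr) / n_items
print(f"first component accounts for {share:.0%} of the variance")
```

When the simulated items are driven by two or more independent latent variables instead, the first component's share drops, which is the pattern that would lead a researcher to report more than one dimension.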
The next step of the testing stage is to check that the behavior of these
factors accords with other things scientists know about the object in
question. In the case of subjective well-being, this knowledge includes how
people evaluate their lives and surroundings, what behavior results from these
evaluations, and what other people who know the subjects say about them.
For example, the aforementioned SWLS, according to its authors, earned
construct validity when Diener and his colleagues compared responses on
the SWLS to responses on other existing measures of subjective well-being
and related constructs such as affect intensity, happiness, and domain
satisfaction. The findings confirmed their expectation that SWLS scores
correlate highly with those measures that also elicit a judgment on subjective
well-being and less so with measures that focus only on affect or
self-esteem or other related but distinct notions. One piece of evidence in favor
of SWLS was that the scores of 53 elderly people from Illinois correlated
well to the ratings this same population received in an extended interview
about "the extent to which they remained active and were oriented toward
self-directed learning" (Diener et al. 1985, 73). How strong was the
correlation? It was r = 0.43, which is adequate by the standards of the discipline.
1. Sawilowsky (2007) summarizes the state of the art.
2. There is a difference between exploratory and confirmatory factor analysis (see de
Vet et al. 2011, 169–72, among other places). The former is used to reduce the number
of items in a questionnaire by identifying the one(s) that best predict the overall ratings.
The latter, on the other hand, tests that the factors that best summarize the data also
conform with a theory of the underlying phenomenon if there is one. This distinction is
not important for the present argument.
Since 1985, SWLS has continued to be scrutinized for its agreement with
the growing data about subjective well-being. Individual judgments of life
satisfaction have been checked against the reports of informants close to the
subjects (Schneider and Schimmack 2009). Proponents of SWLS argue that
it exhibits a plausible relationship with money, relationships, suicide, and
satisfaction with various domains of life, such as work and living
conditions.³
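The comparison pattern described above, in which a scale should correlate highly with measures of the same construct and less so with measures of related but distinct notions, can be sketched in a few lines of Python. The data are simulated, and the variable names, effect sizes, and sample size are all assumptions made for illustration; they are not taken from Diener et al. (1985) or any other study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 300  # hypothetical number of respondents

# Assumed latent states: life satisfaction, plus self-esteem that is only
# weakly tied to it (a related but distinct notion).
life_satisfaction = [rng.gauss(0, 1) for _ in range(n)]
self_esteem = [0.3 * s + rng.gauss(0, 1) for s in life_satisfaction]

# Three hypothetical questionnaires, each a noisy reading of one latent state.
new_scale = [s + rng.gauss(0, 0.6) for s in life_satisfaction]
existing_scale = [s + rng.gauss(0, 0.6) for s in life_satisfaction]
esteem_scale = [e + rng.gauss(0, 0.6) for e in self_esteem]

# Convergent check: agreement with another measure of the same construct.
r_convergent = pearson(new_scale, existing_scale)
# Discriminant check: weaker agreement with a measure of a distinct construct.
r_discriminant = pearson(new_scale, esteem_scale)

print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```

Note that the sketch fixes in advance which correlation ought to be the higher one. In practice that expectation has to come from background knowledge about the construct, which is the point the surrounding discussion turns on.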
Now we are in a position to formulate a logic for psychometric validation
that we believe captures these practices:
Implicit Logic. A measure M of a construct C is validated to the extent that
M behaves in a way that respects three sources of evidence:
1. M is inspired by a plausible theory of C specified in stage 1.
2. Subjects reveal M to track C through their questionnaire-answering
behavior.
3. Other knowledge about C is consistent with variations in values of
M across contexts.
The first condition captures the role of philosophizing about the nature of C
in the first stage of measure development. There are no strong criteria for
what makes a conception of C plausible and how elaborate it should be.
The second condition specifies the assumption behind factor analysis.⁴ The
third acknowledges that scientists go beyond the merely internal analysis of the
scale: a valid measure correlates with indicators that our background
knowledge says it should and does not correlate with indicators that it
shouldn't. Together the three conditions capture what it takes for a measure to
be declared valid,⁵ but they do not explain the reasons why this inference
works. So the next step is to evaluate Implicit Logic.
3. See Diener et al. (2008, 74–93) for summary and references.
4. For convenience we are focusing on the practice of factor analysis, even though not all
construct validation procedures that concern us involve it (e.g., the validation of
single-item measures).
3. Construct Validation Is Good in Theory. Construct validation as de-
scribed above conceives of measurement as part of theory development and
validation as part of theory testing. On the original proposal formulated in
the classic 1955 article by Lee Cronbach and Paul Meehl, construct validation
consists in testing the nomological network of hypotheses in the neighborhood
of the construct in question (Cronbach and Meehl 1955). To measure x, we need
to know how x behaves in relation to other properties and processes that are
systematically connected with x by lawlike regularities. Something like this
view is still the consensus: "To determine whether a measure is useful, one must
conduct empirical tests that examine whether the measure behaves as would be
expected given the theory of the underlying construct" (Diener et al. 2008, 67).
We believe that this vision of measure validation is defensible. Its spirit is
remarkably similar to the coherentist vision that characterizes recent work
on measurement of physical quantities (Chang 2004; van Fraassen 2008;
Tal 2013). These philosophers emphasize that the outlines of the concept
in question, be it temperature or time, and the procedure for detecting it
are settled not separately but iteratively, checking and correcting one against
another. Similarly in our case, the initial philosophical judgment about the
nature of happiness or quality of life is coordinated with other constraints
such as the statistical features of the questionnaires and the background knowl-
edge about behavior, related indicators, and ratings of informants. The result-
ing measurement tools can be deemed valid to the extent that they accommo-
date all evidence.
The above vision appears to contrast with cases where measurement
starts with a set of observable relations (e.g., rigid rods of different lengths,
or choices of different goods by an agent) and proceeds via axioms to nu-
merical structures (such as a sequence of real numbers to represent length or
utility function). The latter picture is often associated with the representa-
tional theory of measurement. According to this, a measure is valid if there
is a demonstrated homomorphism between an observable relation and a nu-
merical relational structure (Krantz et al. 1971). The economic approach to
welfare measurement via gross domestic product and other economic indi-
cators seems to follow this logic because it relies, in part, on axioms that
relate preferences to utility. Some commentators conclude that since the
psychometric approach does not rely on axioms, it is therefore not in keeping
with the representational theory (Angner 2009).
5. There are, of course, other kinds of validity. We concentrate on construct because
among measurement theorists the consensus seems to be that construct validity
encompasses all other types of validity, such as criterion, predictive, discriminant,
and content validity (Strauss and Smith 2009).
We make no such claims. It may well be that the psychometric approach
is not a tradition of its own and that it too needs something that has played
the role of axioms in the representational theory.⁶ Perhaps step 1 of our
Implicit Logic aims at this goal by delineating the bounds of the concept in
question. All we claim is that the ideal behind construct validation is to for-
mulate reliable scales that accord with background knowledge. If this pro-
cess works, it should be enough for measurement. But does it?
4. Construct Validation in Practice. Things look worse in practice than
in theory. Although questionnaires are validated against a broad range of
evidence, psychometricians are selective about what counts as evidence
in favor of or against construct validity. We see two problems that illustrate
this selectivity. First, the existing data used to validate questionnaires do not
provide sufficient constraints to weed out the poor ones. Second, a legitimate
source of evidence about the nature of states relevant to well-being,
philosophical theorizing, is either never used or else overridden by
statistical considerations. These are the two senses in which construct
validation is theory avoidant, sacrificing valid theoretical knowledge for
statistics for no good reason.
As step 3 of our Implicit Logic shows, researchers base judgments of va-
lidity mainly on whether the measure in question exhibits plausible-seeming
correlations with relevant-seeming variables. This is not unreasonable, since
correlational data are the main source of empirical evidence at hand, and there
is something of a chicken-and-egg problem in that, if we already knew exactly
what correlations a measure should exhibit, we might not have much need for
the measure. One piece of evidence that a well-being measure is valid, for
instance, might be that it correlates to some significant degree with money. But
then, on the other hand, the correlation between well-being and money may be
precisely one of the things we hope to find out using the measure.
Psychometricians have their work cut out for them.
It makes sense, then, that validation procedures should be flexible and
holistic: we see whether, on balance, the measure behaves in a way that
makes sense. While correlations with any given variable might prove to
be surprising, the overall pattern of correlations should not, in general, be
too much of a surprise. When we do get broadly unexpected results – as
might have been the case, for instance, when research seemed to indicate
that happiness was so strongly prone to adaptation as to be nearly immutable
(Lykken and Tellegen 1996) – then either we need some theoretical
framework to make sense of it, say, that happiness is strongly governed
by homeostatic mechanisms that keep the individual hovering around a given
"set point," or we should suspect that the measures are not in fact valid (or
that the results are otherwise spurious).
6. See Cartwright and Bradburn (2011) on the importance of representation in social
measurement, where concepts are often fuzzy and multitudinous.
The trouble is that what counts as a "plausible correlation" is a rather
elastic quantity, both vague and open to the interpretive predilections of the
investigator, whose judgment in the matter may be less than impartial. The
problem is particularly acute in well-being research, where it can seem as
if nearly everything correlates substantially with nearly everything else.
Moreover, commonsense views of well-being tend to be both expansive
and incoherent; it is only somewhat exaggerated to say that just about
anything one might care to venture about well-being (money buys happiness,
money doesn't buy happiness) is already part of the folklore.
Take a long list of variables that seem like they might be related to
well-being: money, relationships, health, education, work, and so on. Imagine
two measures, A and B, each of which correlates substantially with nearly
all of these variables, while also differing greatly in what those correlations
are. One suggests that relationships are more strongly related to well-being
than money, while the other has the reverse implication, and so forth. It
seems entirely possible that both measures could reasonably be deemed
to exhibit "plausible correlations" and generally pass as valid measures of
well-being. It is also possible that one of those measures is in fact valid, while
the other is not: A gets the correlations essentially right, while B gets them
wrong.
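The scenario with measures A and B can be made concrete with a small simulation. In the Python sketch below, two hypothetical measures both clear the same bar for a "plausible" correlation with money and with relationships, yet they rank the two variables in opposite order. The weights, the threshold of 0.15, the sample size, and all names are assumptions introduced purely for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(2)  # fixed seed so the sketch is reproducible
n = 2000  # hypothetical number of respondents

money = [rng.gauss(0, 1) for _ in range(n)]
relationships = [rng.gauss(0, 1) for _ in range(n)]

# Assumption: measure A is driven more by relationships, measure B more by
# money; both also carry unrelated noise.
measure_a = [0.3 * m + 0.6 * r + rng.gauss(0, 1)
             for m, r in zip(money, relationships)]
measure_b = [0.6 * m + 0.3 * r + rng.gauss(0, 1)
             for m, r in zip(money, relationships)]

threshold = 0.15  # hypothetical bar for counting a correlation as "plausible"

results = {}
for name, scores in (("A", measure_a), ("B", measure_b)):
    results[name] = {
        "money": pearson(scores, money),
        "relationships": pearson(scores, relationships),
    }
    print(name, {k: round(v, 2) for k, v in results[name].items()})

# Both measures pass the correlational bar for both variables...
both_pass = all(r > threshold for per in results.values() for r in per.values())
# ...but they disagree about which variable matters more.
a_favors_relationships = results["A"]["relationships"] > results["A"]["money"]
b_favors_money = results["B"]["money"] > results["B"]["relationships"]
print(both_pass, a_favors_relationships, b_favors_money)
```

Nothing in the correlations themselves indicates which set of weights is closer to the truth about well-being. The simulation knows only because the weights were stipulated, which is the role a theory of the construct would have to play.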
This sort of scenario is not merely a theoretical possibility. Recent
studies have found that life evaluation and affect measures of well-being give
importantly different results, and some researchers have taken the
differences to indicate that life evaluation metrics (such as the SWLS) are
superior on the grounds that they are, or are claimed to be, more sensitive to life
circumstances – generally, correlating more strongly with quantities that
have traditionally interested policymakers such as income, governance,
freedom, and so on (Helliwell, Layard, and Sachs 2012). One question,
to which we will return shortly, is whether the affect measures in question
are themselves well designed. More pertinent for current purposes is this:
why should we assume that the better measure must correlate more strongly
with those variables? Suppose that hedonism, one of the main theories of
well-being in the literature, is in fact correct. In that case, perhaps the best
interpretation of the data is that well-being isn't very sensitive to life
circumstances. (Of course, those variables, like good governance, might matter a
great deal for reasons of justice, or some other reason.)
Alternatively, perhaps the "life circumstances" on which these researchers
are focusing just aren't the ones that matter most for well-being. An
important article discussing data from the same global survey, for example,
reports that while life evaluation metrics do indeed track "material
prosperity" more strongly, the affect measures better correlate with what the
authors call "psychosocial prosperity": whether people reported being treated
with respect in the previous day, had friends to count on, learned something
new, did what they do best, or chose how their time was spent (Diener et al.
2010). It would not be eccentric to suggest that these are just the sorts of
variables that seem most obviously to matter for well-being, and to which
good measures of well-being ought to be sensitive. Perhaps, then, it is the
affect measure, and not life evaluation, that offers a more meaningful
picture of well-being.
Or perhaps not. Our point is not to endorse or critique either sort of
measure.⁷ There may be other reasons to favor life evaluation measures, and
there are differences in the data sets being used by these investigators that
we cannot assess here. The point is just to illustrate how two prominent
measures could both be deemed valid measures of well-being by prevailing
standards, though they have very different statistical properties, and,
crucially, statistical tests alone cannot tell us which is the superior instrument.
We need to appeal to theoretical considerations as well: what conception of
well-being is relevant here? Given our best understanding of human
well-being, what sorts of factors should a good measure correlate most strongly
with? Is the measure that more closely tracks money and stuff likely to be a
better indicator of well-being than one that tracks relationships and
meaningful work? If we do not take these theoretical questions seriously, ideally
before testing our instruments, we risk settling on whatever measures are
most convenient, most congenial to our personal views, or simply ours,
and not someone else's.
One form of theory avoidance, then, can lead us to focus on the wrong
correlations, or have the wrong ideas about what the right correlations are:
the statistical data alone do not provide sufficient constraints to allow us to
assess the validity of a measure. In a second form, theory avoidance can
have us measuring the wrong variables altogether, because our instruments
are insufficiently grounded in theoretical considerations that might provide
a rationale for their design. We illustrate with an example of a popular
affect questionnaire known as PANAS. PANAS assesses the relative
prevalence of positive over negative mood and is commonly used to measure the
affective dimensions of subjective well-being. This 20-item questionnaire
asks subjects to rate themselves on whether they feel enthusiastic,
interested, excited, strong, alert, proud, active, determined, attentive, inspired,
and so on (Watson et al. 1988). All these items have passed factor analysis
and other standard psychometric tests. But note that absent from this list are
cheerfulness, joy, laughter, sadness, depression, tranquillity, anxiety, stress,
weariness: emotions that are intuitively far more central to a happy
psychological state and to well-being. This is because the authors of PANAS
arrived at the list of items by testing a long list of English mood terms and
paring it down via factor analysis, so that a longer list would not yield
appreciably different results.
7. We may be seeming to mix apples and oranges here, as life evaluation and affect
measures aren't even supposed to be measures of the same construct. In fact, however, this is
not entirely true: while their proximal concerns are quite distinct, both are often posited
and deployed more fundamentally as general metrics of well-being, aimed at giving a
rough snapshot of overall welfare.
Such a procedure allows investigators to avoid hard theoretical questions
about which taxonomy of emotional states to employ, or which states are
most relevant to well-being. But for the same reason, there is little reason
to expect such a method to yield a sound measure of well-being, or even
of emotional well-being. Rather, what is being assessed, roughly, is the
number of English mood terms that apply to the respondent, or rather,
the number of terms from a list of words that survived factor analysis.
But, first, this leaves the measure prey to the vagaries of common English
usage and folk psychology: potentially important emotional phenomena
may not be prominent in the vocabulary of a given language, or may not
be correctly classified as emotional, and so may be omitted from the
measure. Of particular concern here are relatively diffuse background states –
anxiety, stress, peace of mind (not on the list) – that are quite important for
well-being yet easily overlooked, resulting in a kind of "streetlight"
problem where we end up looking where the light is best, rather than where
the keys are.
Second, some states are presumably more important for well-being than
others; feelings of serenity or joy (not on the list) probably count for more
than feeling "attentive" or "alert" (on the list), and indeed some of the
PANAS items might barely deserve inclusion at all, if our interest is in
assessing well-being. Yet a term like "attentive" might exhibit quite distinctive
correlations and thus make it on the list, while other more salient terms are
left by the wayside.
The worries here essentially amount to saying that you can't get the right
measure without attending to theoretical considerations, namely, what do
our best theories tell us are the emotional states that might matter for
well-being? For example, one of the authors recently proposed an account of
emotional well-being, or happiness, that divides emotional states into three
broad types (representing functional responses to different types of
well-being-relevant information regarding matters of security, opportunity, and
success) and further posits emotional well-being as a central element in
an account of well-being (Haybron 2008, 2013). Whether or not that taxonomy
is the right one to employ in well-being measures, some such account
could provide a theoretically motivated basis for developing affect-based
well-being instruments.
We do not deny that PANAS is useful or exhibits some desirable statis-
tical properties, and perhaps it does provide a reasonable, if somewhat
opaque, metric of well-being. As before, our purpose is not to critique a par-
ticular measure so much as to illustrate how practices of construct valida-
tion can be seriously inadequate given the ease with which they can fail
to attend seriously to theoretical concerns. While we have not tried to doc-
ument the extent of the problem and have focused mainly on illustrating the
risks, that there is some problem should be uncontroversial. The risks, we
think, are not infrequently realized if only because the examples discussed
here, the SWLS and PANAS, are very popular. The problem here resembles
a complaint often lodged against philosophers' conceptual or linguistic
analyses, namely, a heavy reliance on the investigators' hunches or intuitions,
without adequate attention to the theoretical motivation, or lack thereof, for
reaching a certain view. This is not just a hazard for philosophers.
5. What Is to Be Done? It is understandable that social scientists, like
other researchers, will want to focus their efforts where their competence
and interests are greatest. The theory-avoidant status quo has developed in
psychology owing to its operationalist heritage, which was key to its
establishment as a "hard" science; even today, psychologists insist that although
building substantive theories of subjective well-being is a worthy enterprise,
they are not trained to do so and it is safer to tread close to the easily observ-
able and reproducible results of psychometrics. Any proposal for reform
should respect the fact that this status quo is unlikely to change in any deep
ways. But correlation mongering is no substitute for theory, and so we urge
that construct validation protocols also assess the normative validity of mea-
sures. The normative validity of a measure of, say, happiness, is the extent to
which this measure respects the importance of happiness for well-being,
since well-being is the ultimate object of concern for the scientific project
in question. We conceive of normative validity as a fourth condition on
Implicit Logic in addition to the three existing ones: a measure M must respect
what is important about construct C. Just as philosophers relying on empirical
assumptions are increasingly expected to engage with the relevant scientific
literatures, so too should empirical researchers attend to the literature
that bears on the key philosophical assumptions they are making.
We are under no illusions: this is a lot to ask of scientists whose
identity often enough consists in not being philosophers. Besides, the very
question of normative validity can be a genuinely difficult one: philosophers do
disagree about the importance of, say, life satisfaction for well-being.
Nevertheless, a scholarly convention to discuss normative validity at least
briefly in articles on validation would go some way toward flagging this issue.
At
the very least, if there is no theory of well-being according to which the con-
struct in question is important, that should count against a measure.
The science of well-being makes no pretense of being value-free in one
clear sense: well-being is a value worth understanding and pursuing. The
eager and successful policy engagement of the prominent figures in this
field attests to this therapeutic mission. From this point of view our proposal
is quite tame: we merely try to show how the measurement and validation
practices of the science of well-being can catch up to the already-existing
normative ambition.