Is Construct Validation Valid?



What makes a measure of well-being valid? The dominant approach today, construct validation, uses psychometric tests to ensure that questionnaires behave in accordance with background knowledge. Our first claim is interpretive: construct validation obeys a coherentist logic that seeks to balance diverse sources of evidence about the construct in question. Our second claim is critical: while in theory this logic is defensible, in practice it does not secure valid measures. We argue that the practice of construct validation in well-being research is theory-avoidant, favoring a narrow focus on statistical tests while largely ignoring relevant philosophical considerations.
Anna Alexandrova and Daniel M. Haybron*†
1. Introduction. What makes a measure of well-being valid? A major project in today's social and medical sciences is the measurement of happiness, life satisfaction, and perceived quality of life using self-reports. When questionnaires used to elicit these reports obey the principles of psychometrics, they are considered to be valid measurement tools. Central to this project is construct validation: a method for checking the consilience of questionnaires with background knowledge about the property in question.
In this article we focus on construct validation of measures of self-reported states relevant to well-being. There is perhaps more to well-being than subjective states such as happiness or satisfaction, but we put this concern aside. How an agent feels and judges their life is undoubtedly relevant to their overall well-being; any theorist accepts that much. So evaluating standard measurement tools for detecting these feelings and judgments is important regardless of our philosophical persuasion on the nature of well-being. The real question is whether construct validation evaluates these questionnaires in a fair way.

*To contact the authors, please write to: Anna Alexandrova, Department of History and Philosophy of Science, University of Cambridge, Free School Lane, Cambridge CB2 3RH, UK (e-mail: ); and Daniel M. Haybron, College of Arts and Sciences, Department of Philosophy, Saint Louis University, Verhaegen Hall, 3634 Lindell Blvd., St. Louis, MO 63108 (e-mail: ).

†The authors are equally and jointly responsible for the contents. They thank the anonymous referees, Valerie Tiberius, Colin DeYoung, and Elina Vessonen for valuable comments.

Philosophy of Science, 83 (December 2016), pp. 1098–1109. 0031-8248/2016/8305-0038$10.00
Copyright 2016 by the Philosophy of Science Association. All rights reserved.
Our first claim is an explicit statement of the logic of the process, something philosophers have not provided so far. Construct validation, we argue, follows a coherentist spirit according to which measures are valid to the extent that they cohere with theoretical and empirical knowledge about the states being measured. In theory this is a defensible approach to measurement, but in practice the current procedures of validation do not respect all sources of knowledge about well-being, and this is our second claim. Construct validation is in fact dangerously theory-avoidant, failing to respect a core commitment of any plausible theory of well-being, namely, that well-being is a normative category. This constraint implies that measures of subjective states relevant to well-being need to be judged on their normative validity in addition to other characteristics. The current almost exclusive attention to the statistical correlations between questionnaires and questionnaire items does not provide sufficient constraints to weed out weak measures. We close with a suggestion for how construct validation can be improved.
2. What Is Construct Validation? The first order of business is to get clear on the logic behind the procedure. The psychometric tradition in the social sciences has historically specialized in tests and questionnaires for detecting unobservable attributes such as intelligence and personality traits. Today, for virtually all researchers who wish to measure any attribute on the basis of self-reports or performances in tests, psychometric validation remains the obligatory procedure. The practitioners of the new science of well-being (psychologists, sociologists, clinical scientists) have also embraced questionnaires and, with that, psychometric validation.

Questionnaires used in well-being research range from gauging a person's feelings ("How anxious do you feel?") to gauging their judgments ("Is your life going well according to your priorities?") to gauging their perception of facts deemed important ("Do you feel in control of your circumstances?"). They can be longer or shorter and administered through various media. Some well-known questionnaires include the Satisfaction with Life Scale (SWLS; Diener et al. 1985), the Positive and Negative Affect Scale (PANAS; Watson, Clark, and Tellegen 1988), and the Nottingham Health Profile (Hunt et al. 1981), which measure life satisfaction, happiness, and health-related quality of life, respectively.

Validation of these scales follows a typical pattern described in measurement textbooks and articles on validation (Simms 2008; de Vet et al. 2011). First, researchers define the construct to be measured by elaborating its scope and limits. This is the conceptual stage in which the meaning of the concepts in question is discussed, invoking anything from philosophical theories to untutored intuitions to dictionary definitions. For example, the scope of happiness is often deemed to be positive and negative affect, while the scope of satisfaction with life is deemed a cognitive judgment about one's conditions and goals. In the second stage, researchers choose a measurement method (a questionnaire, a test, or a task), select the items (what questions? what tasks?), and settle on the scoring method. In the third and final stage, the instrument is tested for its validity. We focus on this last step, because it is supposed to discipline all the free philosophizing that happens in the earlier stages with the hard tools of psychometrics. What are those tools?
It is hard to speak of a psychometric method in general because the methods are numerous and constantly evolving. But in the case of well-being measures, validation frequently involves factor analysis: when hundreds of subjects fill out the same questionnaire, perhaps several times over a period, it is possible to observe the correlations between responses to different items. These correlations are then used to show that there are one or more clusters of items, called "factors," that account for the total information. Scientists speak of factor analysis as extracting "a manageable number of latent dimensions that explain the covariation among a larger set of manifest variables" (Simms 2008, 421). "Explanation" is here used in an entirely phenomenological sense, as saving the phenomena (the phenomena being the total data generated by administering the questionnaire in question), rather than stating the causes of the phenomena. For example, the SWLS is a popular five-item Likert scale for measuring the cognitive aspect of subjective well-being, that is, the extent to which subjects judge their life to be satisfactory. Factor analysis identified all five items to be measuring the same latent variable because a single factor accounted for 66% of the variance in the data (Diener et al. 1985). Other scales may turn out to gauge more than one dimension.
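As a purely illustrative sketch of this step (the subjects, item loadings, and noise level below are invented, not taken from any validation study), one can simulate a five-item scale driven by a single latent variable and check how much of the total variance one factor accounts for:

```python
import numpy as np

# Simulate 300 subjects answering a hypothetical five-item scale whose
# responses are all driven by one latent variable plus item-specific noise.
rng = np.random.default_rng(0)
n_subjects, n_items = 300, 5

latent = rng.normal(size=n_subjects)              # latent trait (e.g., satisfaction)
loadings = np.array([0.8, 0.85, 0.75, 0.8, 0.7])  # invented item loadings
noise = 0.5 * rng.normal(size=(n_subjects, n_items))
responses = latent[:, None] * loadings + noise    # manifest item scores

# One-factor summary via the correlation matrix: the share of total variance
# carried by the largest eigenvalue plays the role of "variance explained."
corr = np.corrcoef(responses, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
explained = eigvals[0] / eigvals.sum()
print(f"Variance explained by a single factor: {explained:.0%}")
```

On data generated this way a single factor accounts for the bulk of the variance, which is the kind of result that licensed treating the five SWLS items as one scale.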
The next step of the testing stage is to check that the behavior of these factors accords with other things scientists know about the object in question. In the case of subjective well-being, this knowledge includes how people evaluate their lives and surroundings, what behavior results from these evaluations, and what other people who know the subjects say about them. For example, the aforementioned SWLS, according to its authors, earned construct validity when Diener and his colleagues compared responses on the SWLS to responses on other existing measures of subjective well-being and related constructs such as affect intensity, happiness, and domain satisfaction. The findings confirmed their expectation that SWLS scores correlate highly with those measures that also elicit a judgment on subjective well-being and less so with measures that focus only on affect or self-esteem or other related but distinct notions. One piece of evidence in favor of SWLS was that the scores of 53 elderly people from Illinois correlated well with the ratings this same population received in an extended interview about the extent to which they "remained active and were oriented toward self-directed learning" (Diener et al. 1985, 73). How strong was the correlation? It was r = 0.43, which is adequate by the standards of the discipline.

1. Sawilowsky (2007) summarizes the state of the art.

2. There is a difference between exploratory and confirmatory factor analysis (see de Vet et al. 2011, 169–72, among other places). The former is used to reduce the number of items in a questionnaire by identifying the one(s) that best predict the overall ratings. The latter, on the other hand, tests that the factors that best summarize the data also conform with a theory of the underlying phenomenon if there is one. This distinction is not important for the present argument.
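A convergent check of this kind can be mimicked in a few lines. The following is a hypothetical sketch: the scale scores and interviewer ratings are simulated, not Diener et al.'s data.

```python
import numpy as np

# Simulate 53 subjects (mirroring the Illinois sample size) whose scale
# scores and interviewer ratings both noisily track the same underlying state.
rng = np.random.default_rng(1)
n = 53

true_state = rng.normal(size=n)                                 # underlying satisfaction
scale_scores = true_state + rng.normal(scale=1.0, size=n)       # self-report scale
interview_ratings = true_state + rng.normal(scale=1.2, size=n)  # informant ratings

# Convergent validity evidence: the Pearson correlation between the two.
r = np.corrcoef(scale_scores, interview_ratings)[0, 1]
print(f"Convergent correlation: r = {r:.2f}")
```

The point is only to show concretely that the validity verdict rests on a single correlation coefficient of this kind.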
Since 1985, SWLS has continued to be scrutinized for its agreement with
the growing data about subjective well-being. Individual judgments of life
satisfaction have been checked against the reports of informants close to the
subjects (Schneider and Schimmack 2009). Proponents of SWLS argue that
it exhibits a plausible relationship with money, relationships, suicide, and
satisfaction with various domains of life, such as work and living conditions.
Now we are in a position to formulate a logic for psychometric validation that we believe captures these practices:

Implicit Logic. A measure M of a construct C is validated to the extent that M behaves in a way that respects three sources of evidence:

1. M is inspired by a plausible theory of C specified in stage 1.
2. Subjects reveal M to track C through their questionnaire-answering behavior.
3. Other knowledge about C is consistent with variations in values of M across contexts.

The first condition captures the role of philosophizing about the nature of C in the first stage of measure development. There are no strong criteria for what makes a conception of C plausible and how elaborate it should be. The second condition specifies the assumption behind factor analysis. The third acknowledges that scientists go beyond the merely internal analysis of the scale: a valid measure correlates with indicators that our background knowledge says it should and does not correlate with indicators that it shouldn't. Together the three conditions capture what it takes for a measure to be declared valid, but they do not explain the reasons why this inference works. So the next step is to evaluate Implicit Logic.

3. See Diener et al. (2008, 74–93) for summary and references.

4. For convenience we are focusing on the practice of factor analysis, even though not all construct validation procedures that concern us involve it, e.g., the validation of single-item measures.
3. Construct Validation Is Good in Theory. Construct validation as described above conceives of measurement as part of theory development and validation as part of theory testing. On the original proposal formulated in the classic 1955 article by Lee Cronbach and Paul Meehl, construct validation consists in testing the nomological network of hypotheses in the neighborhood of the construct in question (Cronbach and Meehl 1955). To measure x, we need to know how x behaves in relation to other properties and processes that are systematically connected with x by lawlike regularities. Something like this view is still the consensus: "To determine whether a measure is useful, one must conduct empirical tests that examine whether the measure behaves as would be expected given the theory of the underlying construct" (Diener et al. 2008, 67).
We believe that this vision of measure validation is defensible. Its spirit is
remarkably similar to the coherentist vision that characterizes recent work
on measurement of physical quantities (Chang 2004; van Fraassen 2008;
Tal 2013). These philosophers emphasize that the outlines of the concept
in question, be it temperature or time, and the procedure for detecting it
are settled not separately but iteratively, checking and correcting one against
another. Similarly in our case, the initial philosophical judgment about the
nature of happiness or quality of life is coordinated with other constraints
such as the statistical features of the questionnaires and the background knowl-
edge about behavior, related indicators, and ratings of informants. The result-
ing measurement tools can be deemed valid to the extent that they accommo-
date all evidence.
The above vision appears to contrast with cases where measurement starts with a set of observable relations (e.g., rigid rods of different lengths, or choices of different goods by an agent) and proceeds via axioms to numerical structures (such as a sequence of real numbers to represent length or a utility function). The latter picture is often associated with the representational theory of measurement. According to this theory, a measure is valid if there is a demonstrated homomorphism between an observable relation and a numerical relational structure (Krantz et al. 1971). The economic approach to welfare measurement via gross domestic product and other economic indicators seems to follow this logic because it relies, in part, on axioms that relate preferences to utility. Some commentators conclude that since the psychometric approach does not rely on axioms, it is therefore not in keeping with the representational theory (Angner 2009).

We make no such claims. It may well be that the psychometric approach is not a tradition of its own and that it too needs something that has played the role of axioms in the representational theory. Perhaps step 1 of our Implicit Logic aims at this goal by delineating the bounds of the concept in question. All we claim is that the ideal behind construct validation is to formulate reliable scales that accord with background knowledge. If this process works, it should be enough for measurement. But does it?

5. There are, of course, other kinds of validity. We concentrate on construct validity because among measurement theorists the consensus seems to be that it encompasses all other types, such as criterion, predictive, discriminant, and content validity (Strauss and Smith 2009).
4. Construct Validation in Practice. Things look worse in practice than in theory. Although questionnaires are validated against a broad range of evidence, psychometricians are selective about what counts as evidence in favor of or against construct validity. We see two problems that illustrate this selectivity. First, the existing data used to validate questionnaires do not provide sufficient constraints to weed out the poor ones. Second, a legitimate source of evidence about the nature of states relevant to well-being, philosophical theorizing, is either never used or else overridden by statistical considerations. These are the two senses in which construct validation is theory-avoidant, sacrificing valid theoretical knowledge for statistics for no good reason.
As step 3 of our Implicit Logic shows, researchers base judgments of validity mainly on whether the measure in question exhibits plausible-seeming correlations with relevant-seeming variables. This is not unreasonable, since correlational data are the main source of empirical evidence at hand, and there is something of a chicken-and-egg problem in that, if we already knew exactly what correlations a measure should exhibit, we might not have much need for the measure. One piece of evidence that a well-being measure is valid, for instance, might be that it correlates to some significant degree with money. But then, on the other hand, the correlation between well-being and money may be precisely one of the things we hope to find out using the measure. Psychometricians have their work cut out for them.
It makes sense, then, that validation procedures should be flexible and holistic: we see whether, on balance, the measure behaves in a way that makes sense. While correlations with any given variable might prove to be surprising, the overall pattern of correlations should not, in general, be too much of a surprise. When we do get broadly unexpected results, as might have been the case, for instance, when research seemed to indicate that happiness was so strongly prone to adaptation as to be nearly immutable (Lykken and Tellegen 1996), then either we need some theoretical framework to make sense of it, say, that happiness is strongly governed by homeostatic mechanisms that keep the individual hovering around a given "set point," or we should suspect that the measures are not in fact valid (or that the results are otherwise spurious).

6. See Cartwright and Bradburn (2011) on the importance of representation in social measurement, where concepts are often fuzzy and multitudinous.
The trouble is that what counts as a "plausible correlation" is rather elastic, both vague and open to the interpretive predilections of the investigator, whose judgment in the matter may be less than impartial. The problem is particularly acute in well-being research, where it can seem as if nearly everything correlates substantially with nearly everything else. Moreover, commonsense views of well-being tend to be both expansive and incoherent; it is only somewhat exaggerated to say that just about anything one might care to venture about well-being ("money buys happiness," "money doesn't buy happiness") is already part of the folklore.
Take a long list of variables that seem like they might be related to well-being: money, relationships, health, education, work, and so on. Imagine two measures, A and B, each of which correlates substantially with nearly all of these variables, while also differing greatly in what those correlations are. One suggests that relationships are more strongly related to well-being than money, while the other has the reverse implication, and so forth. It seems entirely possible that both measures could reasonably be deemed to exhibit "plausible correlations" and generally pass as valid measures of well-being. It is also possible that one of those measures is in fact valid, while the other is not: A gets the correlations essentially right, while B gets them wrong.
This sort of scenario is not merely a theoretical possibility. Recent stud-
ies have found that life evaluation and affect measures of well-being give
importantly different results, and some researchers have taken the differ-
ences to indicate that life evaluation metrics (such as the SWLS) are supe-
rior on the grounds that they are, or are claimed to be, more sensitive to life
circumstances, generally correlating more strongly with quantities that
have traditionally interested policymakers such as income, governance,
freedom, and so on (Helliwell, Layard, and Sachs 2012). One question,
to which we will return shortly, is whether the affect measures in question
are themselves well designed. More pertinent for current purposes is this:
why should we assume that the better measure must correlate more strongly
with those variables? Suppose that hedonism, one of the main theories of
well-being in the literature, is in fact correct. In that case, perhaps the best in-
terpretation of the data is that well-being isn't very sensitive to life circum-
stances. (Of course, those variables, like good governance, might matter a
great deal for reasons of justice, or some other reason.)
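The two-measures scenario described above is easy to exhibit numerically. In the following hypothetical simulation (the variables, weights, and noise are all invented for illustration), measures A and B both correlate substantially with money and relationships, yet imply opposite verdicts about which matters more:

```python
import numpy as np

# Simulate covariates that are themselves correlated, as in real survey data.
rng = np.random.default_rng(2)
n = 1000
money = rng.normal(size=n)
relationships = 0.3 * money + rng.normal(size=n)

# Measure A weights relationships heavily; measure B weights money heavily.
measure_a = 0.8 * relationships + 0.3 * money + rng.normal(size=n)
measure_b = 0.3 * relationships + 0.8 * money + rng.normal(size=n)

def corr(x, y):
    """Pearson correlation between two samples."""
    return float(np.corrcoef(x, y)[0, 1])

for name, m in [("A", measure_a), ("B", measure_b)]:
    print(f"Measure {name}: r(money) = {corr(m, money):.2f}, "
          f"r(relationships) = {corr(m, relationships):.2f}")
```

Both patterns of correlations would likely pass as "plausible," yet the two measures disagree about whether money or relationships matter more for well-being, and the statistics alone cannot adjudicate.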
Alternatively, perhaps the "life circumstances" on which these researchers are focusing just aren't the ones that matter most for well-being. An important article discussing data from the same global survey, for example, reports that while life evaluation metrics do indeed track "material prosperity" more strongly, the affect measures better correlate with what the authors call "psychosocial prosperity": whether people reported being treated with respect in the previous day, had friends to count on, learned something new, did what they do best, or chose how their time was spent (Diener et al. 2010).
2010). It would not be eccentric to suggest that these are just the sorts of
variables that seem most obviously to matter for well-being, and to which
good measures of well-being ought to be sensitive. Perhaps, then, it is the
affect measure, and not life evaluation, that offers a more meaningful pic-
ture of well-being.
Or perhaps not. Our point is not to endorse or critique either sort of measure. There may be other reasons to favor life evaluation measures, and
there are differences in the data sets being used by these investigators that
we cannot assess here. The point is just to illustrate how two prominent
measures could both be deemed valid measures of well-being by prevailing
standards, though they have very different statistical properties; and, crucially, statistical tests alone cannot tell us which is the superior instrument.
We need to appeal to theoretical considerations as well: what conception of
well-being is relevant here? Given our best understanding of human well-
being, what sorts of factors should a good measure correlate most strongly
with? Is the measure that more closely tracks money and stuff likely to be a
better indicator of well-being than one that tracks relationships and mean-
ingful work? If we do not take these theoretical questions seriously, ideally
before testing our instruments, we risk settling on whatever measures are
most convenient, most congenial to our personal views, or simply ours,
and not someone else's.
One form of theory avoidance, then, can lead us to focus on the wrong correlations, or have the wrong ideas about what the right correlations are: the statistical data alone do not provide sufficient constraints to allow us to assess the validity of a measure. In a second form, theory avoidance can have us measuring the wrong variables altogether, because our instruments are insufficiently grounded in theoretical considerations that might provide a rationale for their design. We illustrate with an example of a popular affect questionnaire known as PANAS. PANAS assesses the relative prevalence of positive over negative mood and is commonly used to measure the affective dimensions of subjective well-being. This 20-item questionnaire asks subjects to rate themselves on whether they feel enthusiastic, interested, excited, strong, alert, proud, active, determined, attentive, inspired, and so on (Watson et al. 1988). All these items have passed factor analysis and other standard psychometric tests. But note that absent from this list are cheerfulness, joy, laughter, sadness, depression, tranquillity, anxiety, stress, weariness: emotions that are intuitively far more central to a happy psychological state and to well-being. This is because the authors of PANAS arrived at the list of items by testing a long list of English mood terms and paring it down via factor analysis, so that a longer list would not yield appreciably different results.

7. We may seem to be mixing apples and oranges here, as life evaluation and affect measures aren't even supposed to be measures of the same construct. In fact, however, this is not entirely true: while their proximal concerns are quite distinct, both are often posited and deployed more fundamentally as general metrics of well-being, aimed at giving a rough snapshot of overall welfare.
Such a procedure allows investigators to avoid hard theoretical questions
about which taxonomy of emotional states to employ, or which states are
most relevant to well-being. But for the same reason, there is little reason
to expect such a method to yield a sound measure of well-being, or even
of emotional well-being. Rather, what is being assessed, roughly, is the
number of English mood terms that apply to the respondent, or rather,
the number of terms from a list of words that survived factor analysis.
But, first, this leaves the measure prey to the vagaries of common English usage and folk psychology: potentially important emotional phenomena may not be prominent in the vocabulary of a given language, or may not be correctly classified as emotional, and so may be omitted from the measure. Of particular concern here are relatively diffuse background states, such as anxiety, stress, and peace of mind (not on the list), that are quite important for well-being yet easily overlooked, resulting in a kind of "streetlight" problem where we end up looking where the light is best, rather than where the keys are.
Second, some states are presumably more important for well-being than
others; feelings of serenity or joy (not on the list) probably count for more
than feeling "attentive" or "alert" (on the list), and indeed some of the
PANAS items might barely deserve inclusion at all, if our interest is in as-
sessing well-being. Yet a term like "attentive" might exhibit quite distinctive
correlations and thus make it on the list, while other more salient terms are
left by the wayside.
The worries here essentially amount to saying that you can't get the right measure without attending to theoretical considerations, namely: what do our best theories tell us are the emotional states that might matter for well-
being? For example, one of the authors recently proposed an account of
emotional well-being, or happiness, that divides emotional states into three
broad types, representing functional responses to different types of well-
being-relevant information regarding matters of security, opportunity, and
success, and further posits emotional well-being as a central element in
an account of well-being (Haybron 2008, 2013). Whether or not that taxon-
omy is the right one to employ in well-being measures, some such account
could provide a theoretically motivated basis for developing affect-based
well-being instruments.
We do not deny that PANAS is useful or exhibits some desirable statis-
tical properties, and perhaps it does provide a reasonable, if somewhat
opaque, metric of well-being. As before, our purpose is not to critique a par-
ticular measure so much as to illustrate how practices of construct valida-
tion can be seriously inadequate given the ease with which they can fail
to attend seriously to theoretical concerns. While we have not tried to doc-
ument the extent of the problem and have focused mainly on illustrating the
risks, that there is some problem should be uncontroversial. The risks, we
think, are not infrequently realized if only because the examples discussed
here, the SWLS and PANAS, are very popular. The problem here resembles
a complaint often lodged against philosophers' conceptual or linguistic analyses, namely, a heavy reliance on the investigators' hunches or intuitions,
without adequate attention to the theoretical motivation, or lack thereof, for
reaching a certain view. This is not just a hazard for philosophers.
5. What Is to Be Done? It is understandable that social scientists, like
other researchers, will want to focus their efforts where their competence
and interests are greatest. The theory-avoidant status quo has developed in
psychology owing to its operationalist heritage, which was key to its establishment as a "hard" science; even today, psychologists insist that although
building substantive theories of subjective well-being is a worthy enterprise,
they are not trained to do so and it is safer to tread close to the easily observ-
able and reproducible results of psychometrics. Any proposal for reform
should respect the fact that this status quo is unlikely to change in any deep
ways. But correlation mongering is no substitute for theory, and so we urge
that construct validation protocols also assess the normative validity of mea-
sures. The normative validity of a measure of, say, happiness, is the extent to
which this measure respects the importance of happiness for well-being,
since well-being is the ultimate object of concern for the scientic project
in question. We conceive of normative validity as a fourth condition on Im-
plicit Logic in addition to the three existing ones: a measure M must respect
what is important about construct C. Just as philosophers relying on empir-
ical assumptions are increasingly expected to engage with the relevant sci-
entic literatures, so too should empirical researchers attend to the literature
that bears on the key philosophical assumptions they are making.
We are under no illusions that this is a lot to ask of scientists whose iden-
tity often enough consists in not being philosophers. Besides, the very ques-
tion of normative validity can be a genuinely difficult one: philosophers do
disagree about the importance of, say, life satisfaction for well-being. Never-
theless, a scholarly convention to discuss normative validity at least briefly
in articles on validation would go some way toward flagging this issue. At
the very least, if there is no theory of well-being according to which the con-
struct in question is important, that should count against a measure.
The science of well-being makes no pretense of being value-free in one
clear sense: well-being is a value worth understanding and pursuing. The
eager and successful policy engagement of the prominent figures in this field attests to this therapeutic mission. From this point of view our proposal is quite tame: we merely try to show how the measurement and validation
practices of the science of well-being can catch up to the already-existing
normative ambition.
Angner, Erik. 2009. "Subjective Measures of Well-Being: Philosophical Perspectives." In The Oxford Handbook of Philosophy of Economics, ed. Harold Kincaid and Don Ross, 560–79. Oxford: Oxford University Press.

Cartwright, Nancy, and Norman Bradburn. 2011. "A Theory of Measurement." In The Importance of Common Metrics for Advancing Social Science Theory and Research: Proceedings of the National Research Council Committee on Common Metrics, 53–70. Washington, DC: National Academies Press.

Chang, Hasok. 2004. Inventing Temperature: Measurement and Scientific Progress. Oxford: Oxford University Press.

Cronbach, Lee J., and Paul E. Meehl. 1955. "Construct Validity in Psychological Tests." Psychological Bulletin 52 (4): 281–302.

de Vet, Henrica C. W., Caroline B. Terwee, Lidwine B. Mokkink, and Dirk L. Knol. 2011. Measurement in Medicine: A Practical Guide. Cambridge: Cambridge University Press.

Diener, Ed, Robert A. Emmons, Randy J. Larsen, and Sharon Griffin. 1985. "The Satisfaction with Life Scale." Journal of Personality Assessment 49 (1): 71–75.

Diener, Ed, Richard E. Lucas, Ulrich Schimmack, and John Helliwell. 2008. Well-Being for Public Policy. New York: Oxford University Press.

Diener, Ed, Weiting Ng, James Harter, and Raksha Arora. 2010. "Wealth and Happiness across the World: Material Prosperity Predicts Life Evaluation, Whereas Psychosocial Prosperity Predicts Positive Feeling." Journal of Personality and Social Psychology 99 (1): 52.

Haybron, Daniel M. 2008. The Pursuit of Unhappiness: The Elusive Psychology of Well-Being. New York: Oxford University Press.

———. 2013. Happiness: A Very Short Introduction. New York: Oxford University Press.

Helliwell, John, Richard Layard, and Jeffrey Sachs. 2012. World Happiness Report. New York: Earth Institute, Columbia University.

Hunt, Sonja M., S. P. McKenna, J. McEwen, Jan Williams, and Evelyn Papp. 1981. "The Nottingham Health Profile: Subjective Health Status and Medical Consultations." Social Science and Medicine. Part A: Medical Psychology and Medical Sociology 15 (3): 221–29.

Krantz, David, Duncan Luce, Patrick Suppes, and Amos Tversky. 1971. Foundations of Measurement. Vol. 1, Additive and Polynomial Representations. New York: Academic Press.

Lykken, David, and Auke Tellegen. 1996. "Happiness Is a Stochastic Phenomenon." Psychological Science 7 (3): 186–89.

Sawilowsky, Shlomo. 2007. "Construct Validity." In Encyclopedia of Measurement and Statistics, ed. Neil J. Salkind and K. Rasmussen, 179–82. Thousand Oaks, CA: Sage.

Schneider, Leann, and Ulrich Schimmack. 2009. "Self-Informant Agreement in Well-Being Ratings: A Meta-analysis." Social Indicators Research 94 (3): 363–76.

Simms, Leonard J. 2008. "Classical and Modern Methods of Psychological Scale Construction." Social and Personality Psychology Compass 2 (1): 414–33.

Strauss, Milton E., and Gregory T. Smith. 2009. "Construct Validity: Advances in Theory and Methodology." Annual Review of Clinical Psychology 5:1–25.

Tal, Eran. 2013. "Old and New Problems in Philosophy of Measurement." Philosophy Compass 8 (12): 1159–73.

van Fraassen, Bas C. 2008. Scientific Representation: Paradoxes of Perspective. Oxford: Oxford University Press.

Watson, David, Lee A. Clark, and Auke Tellegen. 1988. "Development and Validation of Brief Measures of Positive and Negative Affect: The PANAS Scales." Journal of Personality and Social Psychology 54 (6): 1063.
What is temperature, and how can we measure it correctly? These may seem like simple questions, but the most renowned scientists struggled with them throughout the 18th and 19th centuries. In Inventing Temperature , Chang examines how scientists first created thermometers; how they measured temperature beyond the reach of standard thermometers; and how they managed to assess the reliability and accuracy of these instruments without a circular reliance on the instruments themselves. In a discussion that brings together the history of science with the philosophy of science, Chang presents the simple yet challenging epistemic and technical questions about these instruments, and the complex web of abstract philosophical issues surrounding them. Chang's book shows that many items of knowledge that we take for granted now are in fact spectacular achievements, obtained only after a great deal of innovative thinking, painstaking experiments, bold conjectures, and controversy. Lurking behind these achievements are some very important philosophical questions about how and when people accept the authority of science.