Why the level-free forced-choice binary measure of
brand-image beliefs works so well
Cite as:
Rossiter, J.R., Dolnicar, S. & Grün, B. (2015) Why Level-Free Forced Choice Binary
Measures of Brand Benefit Beliefs Work Well. International Journal of Market Research,
57(2), 1-9.
ABSTRACT
The level-free version of the Forced-Choice Binary measure of brand-image beliefs was
introduced in a recent article in IJMR (Dolnicar, Rossiter, and Grün, 2012) and was shown to
yield more stable brand-image results than the shorter ‘Pick-Any’ measure and the longer ‘7-
Point Scale’ measure. The aim of the present article is to explain how the Level-Free
Forced-Choice Binary measure works and to identify its advantages, the most important of
which is the prevention of all major forms of response bias.
INTRODUCTION
Beliefs about the degree to which certain products and services possess particular
attributes represent a highly prevalent and important construct in marketing research. Beliefs
form the empirical basis for many commonly used marketing analyses: multidimensional
scaling (e.g., Green and Rao, 1972); conjoint analysis (e.g., Green and Srinivasan, 1978); and
multi-attribute attitude models, which in marketing science are called subjective expected
utility models (e.g., Fishbein, 1963; Fishbein and Ajzen, 1975, 2010; McFadden, 1980).
These belief-based models are used for most strategic (e.g. positioning) and operational (e.g.
advertising) marketing decisions. A recent practitioner example is the pairing of tennis ace
Roger Federer with Moët & Chandon champagne, an alliance made because of the M&C
CEO’s reasoning that consumers share three key beliefs about both brands: “elegance,
glamor, and success” (Eggleton, 2012, p. 24). In particular, beliefs are the basis of brand-
image research in academia and of brand-image tracking studies conducted by marketing
practitioners.
When deciding how to measure brand-image beliefs, most academic researchers opt
for a multipoint measure, most often some form of ‘7-Point Scale.’ The well-known
SERVQUAL service brand-image beliefs, for instance, are measured on Likert 7-point
‘agree-disagree’ answer scales (see Parasuraman, Zeithaml, and Berry, 1988). However,
measurement of beliefs with a ‘7-Point Scale’ is problematic, for two main reasons. One
reason is that researchers use the same 7-point measure regardless of whether the attribute is
unipolar (e.g., a laundry detergent’s cleaning performance) or bipolar (e.g., evaluation of the
extent to which a fast-food restaurant’s offerings are healthy, on the positive side, or
unhealthy, on the negative side). The SERVQUAL instrument, for example, mixes unipolar
attributes (e.g., degree of responsiveness) with bipolar attributes (e.g., politeness of service
personnel, with strong disagreement implying that the service personnel are extremely
impolite, not just lacking politeness). The main problem with using the same 7-point answer
scale for both is that its seven degrees of attribute intensity tend to ‘overmeasure’ unipolar
attributes, because ‘too much’ discrimination is offered (see Viswanathan, Sudman, and
Johnson, 2004), while at the same time comparatively ‘undermeasuring’ bipolar attributes,
because only three levels of attribute intensity are offered on either side of the neutral
midpoint.
A second serious problem is that the data obtained from 7-point and other multipoint
rating measures are notoriously subject to distortion from raters’ response sets (Cronbach,
1946, 1950). The most prevalent response sets are ‘acquiescent’ responding, ‘extreme’
responding, and ‘midpoint’ responding (Baumgartner and Steenkamp, 2001;
Diamantopoulos, Reynolds, and Simintiras, 2006; Dolnicar and Grün, 2007; Tellis and
Chandrasekaran, 2010; Weijters, Geuens, and Schillewaert, 2010). These response sets
produce biased ratings of belief intensity; they also lead to common-measure bias in the
correlations between the belief ratings measured on the same answer scale and can inflate or
deflate correlations depending on the type of response set adopted by the rater.
Despite these problems, multipoint belief measures continue to dominate in market
research. The single exception is in practitioners’ brand and advertising tracking studies,
where shorter binary measures are employed to measure brand-image beliefs because they
are much quicker for respondents to answer. The Forced-Choice Binary measure of brand-
image beliefs was introduced by the British Market Research Bureau (BMRB)
approximately 50 years ago (see Joyce, 1963, and also McDonald, 2000). Introduced at the
same time was the Free-Choice Binary, or ‘Pick-Any,’ measure – used today by nearly all
brand-image tracking companies, including the worldwide market leader, Millward Brown
plc. The Pick-Any measure’s popularity is due to its being more efficient to administer: it
requires only a simple brand-by-attribute (rows-by-columns) matrix on the questionnaire,
and respondents have only to answer ‘yes’ to, that is, to ‘pick,’ those attributes they believe
the brand has, omitting all others. An earlier study by Driesener and Romaniuk (2006) compared the
speed of completion of the Pick-Any measure versus brands’ rankings on the attributes and
brands’ ratings on a 5-point Likert measure, finding that the Pick-Any measure required just
half the amount of time to complete as the other two measures. However, Dolnicar, Rossiter,
and Grün (2012) raised concerns about the validity of the Pick-Any measure (see also
Dolnicar and Rossiter, 2008). The ‘free choosing’ in Pick Any allows respondents to omit,
and therefore underreport, the image incidence for the brands; the proposed Level-Free
Forced-Choice Binary measure is not affected by this problem because it encourages
respondents to carefully consider those attributes that are presented on the questionnaire.
Also, those image attributes that are ‘picked’ using a Pick-Any measure are reported
unstably, such that they too often fail to be picked again, even after a short time interval, and
many new ones are picked on the second occasion. A low two-way repeat rate of 50% –
equivalent to “chance” responding by the raters – was observed in an earlier large-scale study
with Pick-Any measures of beliefs (Dall’Olmo Riley, Ehrenberg, Castleberry, Barwise, and
Barnard, 1997) and in Dolnicar et al.’s (2012) study the Pick-Any repeat rate was even worse
than chance, at 41%. Underreporting and unstable responding pose great problems for brand-
image-tracking studies because the results, when averaged over respondents, particularly if
‘rolled’ as a moving average as in most tracking studies, will imply falsely that brand images
are highly stable.
Another problem with brand-image measures, which has not been identified to date,
is that the image attributes frequently are worded with a fixed attribute level (e.g., ‘cleans
very well’ for a brand of laundry detergent; or being ‘extremely convenient’ for a fast-food
restaurant brand). The fixed level of the attribute in the item is confounded with the level
chosen in the answer (see Rossiter, 2002). In the laundry detergent example, consumers
would be more likely to say ‘yes’ if the item were worded as ‘cleans well’ rather than ‘cleans
very well.’ In the fast-food restaurant example, consumers would be more likely to ‘agree’ if
the item were worded as ‘convenient’ rather than ‘extremely convenient.’
This last point about levels in measures led to the present authors’ innovation in
designing Forced-Choice Binary measures: the new measures are designed to be level-free in
the item wording, noting that they are already level-free in the binary answer options (‘yes,
no’ for unipolar attributes and ‘agree, disagree’ for bipolar attributes, with no levels between
the two answer options – see examples in Appendix A). This means that the overall measure
is doubly level-free. Indeed, the technical name given to the new version of Forced-Choice
Binary measures (see chronologically Rossiter, 2011; Dolnicar et al., 2012; Dolnicar and
Grün, 2013; and Dolnicar, in press) is DLF IIST, which stands for Doubly Level-Free
Individually-Inferred Satisfaction Threshold. The second part of the name – the IIST part –
refers to the information-processing mechanism through which the new version of the
Forced-Choice Binary measure is theorized to work.
THE THEORY UNDERLYING THE LEVEL-FREE
FORCED-CHOICE BINARY MEASURE
All multipoint brand-image rating measures – Likert, Semantic Differential, and so
forth – seek an absolute judgment. The Level-Free version of the Forced-Choice Binary
measure, in a fundamental departure from this, seeks a comparative judgment (see Thurstone,
1927). The comparative judgment requires the respondent to form, or retrieve from memory,
a rough estimate of the brand’s believed degree of possession of the attribute, which is
followed automatically by an easy determination as to whether this roughly believed degree
of attribute possession meets the respondent’s previously learned threshold of satisfactory
attribute possession (based on the respondent’s experience with brands previously
encountered in the product category). Attributes, as noted above, are of two kinds: unipolar
or bipolar. The upper panel of Figure 1 illustrates how unipolar attribute judgments are
theorized to operate via the Level-Free Forced-Choice Binary measure, and the lower panel
shows this for bipolar attributes. Note that there is one threshold for a unipolar attribute; but
theoretically there are two thresholds for a bipolar attribute, one for the negative pole and
another for the positive pole. However, Binary measures – like Likert measures – are always
worded unipolar (see examples in Appendices A and B), which means that for a bipolar
attribute either one pole must be chosen or else the two poles must be measured as separate
items. Accordingly, there is only one threshold to be considered in any single brand-image
item.
Figure 1 about here
An alternative way of presenting the theory of how the Level-Free Forced-Choice
Binary measures operate as comparative judgments is to express the process symbolically. In
general, belief judgments can be denoted symbolically as $B_{io.a.k}$, where $B$ is the belief, $i$ the
individual judge or rater, $o$ the belief object (a brand of a product or service, for instance), $a$
the attribute of judgment, and $k$ the judged absolute level of the attribute. Level-Free
Forced-Choice Binary judgments are different. The belief, $B$, is now only a rough judgment
rather than an attempted precise one, and can be denoted as $B_{io.a.rk}$, where the final subscript,
$rk$, signifies ‘rough $k$’ and allows for a range of $k$ rather than a precise value. Cowley and
Rossiter’s (2005) range model of judgment provides the evidence that consumers do in fact
have a range in mind around the attribute level even though they are asked by the researcher
or by the instructions in the questionnaire to make a precise judgment by marking a single
point on the rating scale. (For the earliest demonstration of the ‘fuzzy’ nature of consumers’
belief ratings, see Woodruff, 1972.) When presented with a Level-Free Forced-Choice
Binary measure, such as ‘Omo gets stains out: □ Yes □ No,’ or ‘McDonald’s is unhealthy:
□ Agree □ Disagree,’ the supposition is that the consumer automatically brings to mind a
‘standard,’ or ‘threshold,’ that represents a learned average level of the attribute for that
category of objects (laundry detergents or fast-food restaurants in the two examples). This
learned average belief level can be denoted as $B_{ic.a.avk}$, where the symbol $c$, for category,
replaces the symbol $o$, for object, and $avk$ is now the consumer’s perceived average level of
the attribute for brands the consumer knows in that category. It is $avk$ that functions as the
standard for comparison. Thus:
If $B_{io.a.rk} \ge B_{ic.a.avk}$, the consumer answers ‘yes,’ or ‘agree,’
whereas
if $B_{io.a.rk} < B_{ic.a.avk}$, the consumer answers ‘no,’ or ‘disagree.’
The consumer’s comparative-judgment ‘decision rule’ is: if $rk$ meets or exceeds $avk$, give an
affirmative answer; if $rk$ is below $avk$, give a negative answer. The comparison is easy to
make because it does not require an absolute point judgment. The comparison is so quick
(see Dolnicar and Grün, 2013) that it can be presumed to be automatic.
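The decision rule can be stated compactly in code. The following is a minimal illustrative sketch, not the authors’ implementation; the function and argument names are assumptions:

```python
def fcb_answer(rough_belief: float, category_threshold: float) -> str:
    """Theorized comparative judgment for a Level-Free Forced-Choice Binary
    item: affirmative if the rough belief level (rk) meets or exceeds the
    learned category-average threshold (avk)."""
    return "yes" if rough_belief >= category_threshold else "no"

# A rough belief of 5.2 against a learned threshold of 4.8 yields 'yes';
# the same rough belief against a stricter threshold of 6.0 yields 'no'.
print(fcb_answer(5.2, 4.8))  # -> yes
print(fcb_answer(5.2, 6.0))  # -> no
```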
The theorized comparative judgment process is very different from that involved in
answering on a 7-point (or other multipoint) rating scale. For instance, when making belief
ratings on a 1-to-7 Unipolar Numerical rating scale, the consumer has to first make a
judgment about what the anchor words at each end of the scale mean (is one end the ‘zero’
level of the attribute or merely the ‘low’ level? and is the other end the ‘high’ level or is it the
‘extremely high’ level?) and also about what the numbers mean (the number ‘4’ in the middle
of a 1-to-7 numbered scale in particular). All this decision-making takes time and, more
importantly, it leads to undetectable individual variation in scale-level interpretation.
Varying scale-level interpretation is considered by psychometricians to be ‘random error,’ or
simply ‘noise.’
The new Forced-Choice Binary measure substantially reduces ‘noise.’ This is
because the between-individual variation – known as heterogeneity – is captured in the
threshold term (in $avk$). The threshold is specific to the individual because it is based on his
or her personal learning history. Figure 2 illustrates the consequences of this threshold.
Suppose that two consumers, person A and person B, are asked to rate a brand’s image belief
on a typical ‘7-Point Scale’ with possible scores ranging from 1 to 7, and that both decide that
the closest numerical rating that corresponds with their judgment is a 5. Does this mean that
the two consumers regard the object (a brand of laundry detergent, say) as performing equally
well on the attribute (cleaning performance, say)? The answer is: not at all if they have
different standards – different thresholds – for what constitutes satisfactory performance.
Person A’s threshold is 5 in the figure, so the brand’s performance would be seen by person A
as satisfactory. Person B’s threshold, on the other hand, is higher, namely a 6, so person B,
despite giving the same absolute rating, would not see the brand’s performance as
satisfactory. The problem for multipoint rating measures, such as the one shown, is that no
threshold is invoked. With the Level-Free Forced-Choice Binary measure the threshold is
inevitably invoked because the forced-answer measure could not be answered without it. Try
answering, for example, the item ‘Omo gets stains out: □ Yes □ No’ or the item
‘McDonald’s is convenient: □ Agree □ Disagree.’ You will realize that neither item could
be answered without having a standard or threshold for, respectively, ‘stain removal’ and
‘convenience’ implicitly in mind.
Figure 2 about here
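The consequence illustrated in Figure 2 can also be expressed in a short sketch, assuming only that each rater applies a personal threshold to the same underlying judgment (names and values are illustrative):

```python
# Figure 2 scenario: identical 7-point ratings, different personal thresholds.
raters = {"Person A": {"rating": 5, "threshold": 5},
          "Person B": {"rating": 5, "threshold": 6}}

for name, r in raters.items():
    answer = "Yes" if r["rating"] >= r["threshold"] else "No"
    print(f"{name}: rating {r['rating']}, threshold {r['threshold']}, "
          f"FC Binary answer: {answer}")
# Person A answers 'Yes'; Person B, despite the identical rating, answers 'No'.
```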
The Level-Free Forced-Choice Binary measure’s unique ability to capture
heterogeneity across respondents means that, despite having only a ‘2-point answer scale,’ it
does not suffer from the ‘restricted variance’ problem (a problem often alleged against
binary measures; see, e.g., Lehmann and Hulbert, 1972, Nunnally, 1978, and Gleason,
Devlin, and Brown, 2003). Binary-answered measures are of course restricted arithmetically
to 0’s and 1’s, or –1’s and +1’s if bipolar, whereas the answers on a ‘7-Point Scale’ measure
can range more widely over the numbers 1 to 7.¹ But if much of the multipoint variation is in
fact ‘noise,’ caused by the respondent’s inability to make precise intensity judgments and by
the operation of response sets, this noise will distort the true rating. With the threshold-based
Level-Free Forced-Choice Binary measure, the variation is not ‘noise’ but is true between-
person variation in the easy-to-make judgments of whether the brand meets or falls below the
threshold for the particular attribute.
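Footnote 1 below notes that binary data remain amenable to the same statistical procedures as Pick-Any data, including polychoric correlation (Olsson, 1979), which for two binary items is the tetrachoric correlation. As an illustration, here is a minimal, generic maximum-likelihood sketch for a 2 × 2 table of binary belief answers; it is not code from the article, and the counts are hypothetical:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

def tetrachoric(table):
    """ML estimate of the latent bivariate-normal correlation from a 2x2
    table of counts, table[i][j] = count of (item1 = i, item2 = j),
    where 0 = 'no' and 1 = 'yes'."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    t1 = norm.ppf(table[0].sum() / n)     # threshold from item 1's 'no' share
    t2 = norm.ppf(table[:, 0].sum() / n)  # threshold from item 2's 'no' share

    def neg_loglik(rho):
        p00 = multivariate_normal.cdf([t1, t2], mean=[0, 0],
                                      cov=[[1, rho], [rho, 1]])  # both 'no'
        p01 = norm.cdf(t1) - p00   # item 1 'no', item 2 'yes'
        p10 = norm.cdf(t2) - p00   # item 1 'yes', item 2 'no'
        p11 = 1 - p00 - p01 - p10  # both 'yes'
        probs = np.array([[p00, p01], [p10, p11]]).clip(1e-12)
        return -(table * np.log(probs)).sum()

    return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99),
                           method="bounded").x

# Hypothetical counts for two brand-image belief items (yes/no answers):
print(round(tetrachoric([[40, 15], [10, 35]]), 2))
```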
¹ The arithmetic restriction of data resulting from Level-Free Forced-Choice Binary measures is identical to that
of the Pick-Any measures commonly used in brand tracking studies. Therefore, all statistical methods that can
be used for Pick-Any data can also be used for data collected with the Level-Free Forced-Choice Binary
measure. Data from a 7-point answer scale contain seven discrete values; such a scale does not measure brand-
image beliefs on a continuous scale, thus also precluding the use of most standard statistical procedures
developed for continuous data. Instead, procedures that take the discrete nature of the data into account have to
be used. Such procedures are often also applicable to binary data, because binary data are a special case of
multi-category data. For example, polychoric correlations (Olsson, 1979), which estimate the correlation
between two theorized normally distributed continuous latent variables from two ordinally measured variables,
can also be estimated for binary data.

MULTIPOINT MEASURES OF BELIEFS
SUFFER FROM RESPONSE BIAS
Level-Free Forced-Choice Binary measures of brand-image attributes do not require
precise ratings of attribute intensity; they require only estimates of ‘rough k,’ rk, not ‘exact k,’
k. There is good evidence that people cannot make precise ratings of typical consumer
beliefs (see Woodruff, 1972; and Cowley and Rossiter, 2005) and this indecision leaves
multipoint rating measures open to response sets, which produce what are known as biased
scores. The analysis in Table 1 summarizes how the most common multipoint measures of
beliefs – Unipolar, Likert, and Semantic Differential – compare with the Level-Free Forced-
Choice Binary measure for susceptibility to response sets. The major response sets are
acquiescence, extreme responding, and midpoint responding, and they each lead to a
particular form of bias in the ratings.
Table 1 about here
Acquiescence Bias
Acquiescent responding, or ‘yea-saying,’ and its opposite, disacquiescent responding,
or ‘nay-saying,’ occur most often with political or socially sensitive topics, where many
people tend to give socially desirable answers. The vast majority of consumer topics –
products, services, or in-ad presenters – are not sensitive topics. Any apparent yea-saying or
nay-saying is much more likely to be halo responding caused by the respondent’s favorable
(positive halo) or unfavorable (negative halo) preexisting overall attitude toward the rated
object. Halo responding is therefore a true response rather than an erroneous ‘bias’ (see
Holbrook, 1983; J. Park, K. Park, and Dubinsky, 2011; and Rossiter, 2011).
Acquiescent responding (as distinct from positive halo responding) can, however,
show up as a response-order effect. Response-order bias is a more recently identified response set
that occurs mainly with orally administered measures, as in face-to-face interviews or on the
telephone, where the last-mentioned response option tends to be retained better in working
memory and is thus chosen more often than it would normally be chosen on a self-
administered written or online questionnaire (see Krosnick and Alwin, 1987). However, the
present authors’ unpublished experiments varying the order of the two answer options for
Level-Free Forced-Choice Binary measures have shown no evidence of an order effect (with
the online administration that was used, a first-response bias would be expected). And with
orally delivered questions on the phone or in person, the two answer alternatives are easily
kept simultaneously in working memory, thus precluding the response-order effect.
Practically speaking, this means that with Level-Free Forced-Choice Binary measures it
makes no difference with a unipolar attribute whether you place the ‘yes’ answer-box first or
second, and the same goes for the ‘agree’ answer-box with a bipolar attribute.
Extreme Bias
Extreme responding, unlike acquiescent responding, does pose a serious problem for
all multipoint belief measures. Extreme responding is detectable only by examining
individuals’ response patterns across multiple similar items, wherein extreme responding is
likely to have occurred if the respondent has ‘straight-lined’ down one or the other extreme
side for all items. Although it is becoming routine for the better fieldwork companies to check
for ‘straight-liners,’ these respondents are often retained in the data to keep up the sample
size, given the well-known decline in respondent recruitment rates (see Menictas,
Wang, and Fine, 2011). Academic researchers hardly ever report checking for such biases, so
their data are almost always contaminated by extreme responding.
Extreme responding by any substantial proportion of raters will tend to artificially
inflate correlations between the image ratings. As the findings in Table 2 reveal, the inflation
is worse for unipolar attribute measures, because they offer more answer categories on
which to respond extremely (six beyond the left-hand zero category), than for bipolar
attribute measures (three on either side of the
neutral midpoint). Numbering bipolar answer scales as unipolar (e.g., 1 to 7) will also inflate
correlations because respondents see six categories over which to stretch their responses
when in fact there are only three.
Table 2 about here
The big advantage of Forced-Choice Binary measures is that there are no extreme
options: the individual has only to answer ‘yes’ (or ‘agree’) on one side of the internal
threshold and ‘no’ (or ‘disagree’) on the other, and cannot answer extremely.
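The inflation mechanism can be illustrated with a small simulation sketch. The setup below is an assumption for demonstration only (two unrelated latent beliefs, with 20% of respondents straight-lining at the top category); it does not reproduce the article’s data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_extreme = 300, 60  # 60 of 300 respondents straight-line (illustrative)

# Two unrelated latent beliefs expressed as 1-7 ratings.
latent = rng.normal(4.0, 1.2, size=(n, 2))
ratings = np.clip(np.rint(latent), 1, 7)
ratings[:n_extreme] = 7  # extreme responders straight-line at '7'
print("7-point inter-item r:", round(np.corrcoef(ratings.T)[0, 1], 2))

# Level-Free FC Binary answers: each latent belief is compared with the
# rater's own attribute-specific threshold, so no extreme option exists.
thresholds = rng.normal(4.0, 0.8, size=(n, 2))
binary = (latent >= thresholds).astype(int)
print("FC Binary inter-item r:", round(np.corrcoef(binary.T)[0, 1], 2))
# Expected pattern: a clearly positive 7-point correlation created purely by
# the straight-liners, versus a near-zero correlation for the binary answers.
```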
Midpoint Bias
Midpoint responding is another common way for respondents to ‘opt out’ from
carefully answering survey questions (Dolnicar and Grün, in press) and, like extreme
responding, it is detectable only by inspection of individuals’ response patterns on multiple
belief items. Midpoint responding as a ‘response set’ is most likely to be found with bipolar
answer scales, where the midpoint is supposed to mean ‘neutral’ or ‘neither’ but is often
resorted to when the respondent ‘can’t be bothered’ answering properly (see Dolnicar and
Grün, in press). Erroneous midpoint responding may occur also with unipolar answer scales
when bipolar attributes are mixed in with unipolar ones. This happens frequently in Semantic
Differential item batteries (see Osgood, Suci, and Tannenbaum, 1957) that mix unipolar items
like ‘low quality…high quality’ with bipolar items like ‘bad…good.’
Midpoint responding as a response bias, like extreme responding, will affect
correlations between belief ratings, but in the opposite manner. Midpoint response bias will
tend to deflate the correlation. With ratings entered directly to the computer these days,
‘midpoint straight-lining,’ just like straight-lining on extreme answers, is easily detected but
rarely corrected for by removing the offending respondents. Deflation of the correlation
occurs because the between-attribute rating variance will tend toward zero if too many
respondents opt out via the midpoint. It should be noted that omitting the midpoint answer
option (such as using –2, –1, +1, +2 answer options instead of –2, –1, 0, +1, +2) does not
solve the midpoint opting-out problem. Respondents are likely to distribute their would-be-
zero opting-out answers at random to one or the other of the two near-midpoint categories (–1 or
+1 in this example). Consistent near-midpoint responding will still tend to deflate the
correlation between brand-image belief ratings.
Level-Free Forced-Choice Binary measures have no midpoint and thus they prevent
midpoint response bias.
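Because straight-lining ‘is easily detected but rarely corrected for,’ a screening step costs little to add in practice. A minimal sketch, assuming a respondents-by-items matrix of 7-point ratings with 4 as the midpoint (the function name is illustrative):

```python
import numpy as np

def flag_straight_liners(ratings, value=None):
    """Flag respondents who gave the identical answer to every item.
    If value is given (e.g., 4 = the midpoint of a 7-point scale),
    flag only straight-lining on that particular answer."""
    ratings = np.asarray(ratings)
    constant = (ratings == ratings[:, [0]]).all(axis=1)
    if value is not None:
        constant &= ratings[:, 0] == value
    return constant

responses = np.array([[4, 4, 4, 4, 4],   # midpoint straight-liner
                      [7, 7, 7, 7, 7],   # extreme straight-liner
                      [3, 5, 4, 6, 2]])  # ordinary respondent
print(flag_straight_liners(responses, value=4))  # [ True False False]
```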
MULTIPOINT BELIEF RATINGS ARE UNSTABLE
An important consequence of the response biases inherent in multipoint measures of
beliefs is individual-level instability of brand image ratings, even over short periods where no
actual change in the brand’s image has taken place. (Note that aggregate stability – total-
sample average stability – is not relevant, because it masks individual-level instability.)
Individual-level stability can be assessed in the ‘test-retest’ reliability paradigm by calculating
the proportion of respondents who exactly repeat their initial rating (or initial binary
judgment) on a short-interval, one- or two-week later, retest. Perfect stability (a proportion of
1.0) can be expected only among consumers who are familiar with the product category, the
brand to be rated, and the attributes used in the measures (see Dolnicar et al., 2012). Some
degree of stability can occur by chance and depends on the number of answer categories:
for a 7-point answer scale, where each category is equally likely to be chosen, the chance
stability proportion is 7 × (1/7 × 1/7) = 1/7 ≈ .14, and for a binary answer scale it is
2 × (1/2 × 1/2) = .50. Table 3 shows the exact stability proportions for 7-Point Scale ratings and Level-Free
Forced-Choice Binary judgments (for the same two data sets as in the previous table). While
both stability proportions are well above their respective chance proportions, the exact
stability for 7-Point Scale ratings is very low, averaging about .45, compared with the exact
stability for the Forced-Choice Binary measures, which in both cases is above .80.
Table 3 about here
The 7-Point Scale measure’s low exact repeatability provides empirical support for the
present authors’ presumption that precise intensity ratings on multipoint answer scales are
difficult to make. The Level-Free Forced-Choice Binary measure’s attribute judgments of
whether the brand ‘meets’ or ‘doesn’t meet’ the individual’s established threshold are easier
to make and therefore are more stable.
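The exact-stability proportion used here is simple to compute from test and retest answers. A minimal sketch with illustrative data, not the article’s data sets:

```python
import numpy as np

def exact_stability(test, retest):
    """Proportion of respondents giving exactly the same answer at retest."""
    return float(np.mean(np.asarray(test) == np.asarray(retest)))

def chance_stability(n_categories):
    """Exact-repeat probability under uniform random responding."""
    return n_categories * (1 / n_categories) ** 2  # = 1/n_categories

test   = [1, 0, 1, 1, 0, 1, 0, 1]  # binary answers, wave 1 (1 = yes)
retest = [1, 0, 1, 0, 0, 1, 0, 1]  # same respondents, wave 2
print(exact_stability(test, retest))             # 0.875
print(chance_stability(2), chance_stability(7))  # 0.5 and about .14
```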
NO DIAGNOSTIC LIMITATION WITH LEVEL-FREE
FORCED-CHOICE BINARY MEASURES
A seeming limitation in ‘switching’ to Level-Free Forced-Choice Binary measures is
researchers’ feared loss of diagnostic capability. With multipoint belief ratings, in theory, the
marketer can use multiple regression to relate the brand’s attribute belief ratings to a relevant
dependent variable such as Overall Attitude, Overall Satisfaction, or Purchase Intention.
From the regression coefficients for the belief ratings, the market researcher can compute the
‘elasticity’ of each attribute and estimate the incremental gain on the dependent variable if the
belief were to be increased for a positive attribute or decreased for a negative attribute. But if
multipoint belief ratings are fuzzy, biased in an unknown direction, and unstable, then the
regression weights, which are in effect partial correlation coefficients, will also be unstable.
Diagnosis by regression analysis then becomes untrustworthy and, worse, misleading.
Switching to Level-Free Forced-Choice Binary measures of brand-image attributes
would, at first, seem not to be the solution to the diagnostic problem because the binary
judgments are seen as too blunt and not sensitive enough to record marketing-induced shifts
in belief ratings. However, as some advertising theorists have pointed out (specifically
Moran, 1985, and Rossiter and Bellman, 2005), the ultimate purpose of marketing is to get as
many individual consumers as possible up to the ‘go, no go’ binary action threshold on the
brand’s targeted attribute or attributes. Unlike the Pick-Any measure, which understates the
incidence of brand-image beliefs, the Level-Free Forced-Choice Binary measure records the
brand’s threshold-meeting incidence for each attribute belief exactly. And unlike 7-Point or
other multipoint measures, the Level-Free Forced-Choice Binary measure avoids
confounding the incidence of consumers who believe the brand has the attribute with the
intensity of their belief (a completely neglected problem with multipoint belief
ratings). For example, a service company might be rated overall 6 out of 7 for satisfying
customers on an attribute such as ‘response time’ but this could be due to most customers
being perfectly satisfied and giving the company a 7 rating while others are disaffected and
give it a much lower rating. Of course, this distribution of ratings could easily be checked by
inspecting individual-level responses instead of only the group-average response, but the
researcher would still not know at which number to make the “cutoff” for a truly satisfactory
rating. With the Level-Free Forced-Choice Binary measure there are no intensity differences
in the binary answers. Consumers merely indicate whether the brand is at or above, or else
below, their individual thresholds, and this reveals pure incidence, namely the proportion of
customers who are satisfactorily satisfied with regard to the targeted attribute or attributes.
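A sketch of this incidence-based diagnosis follows, assuming hypothetical binary belief answers and a binary intention measure; the attribute names, weights, data, and the use of a linear-probability regression are all assumptions for demonstration, not the authors’ procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical FC Binary beliefs (1 = brand meets the rater's threshold)
# on three attributes, e.g., cleans, whitens, freshens.
beliefs = rng.integers(0, 2, size=(n, 3))
intention = (beliefs @ np.array([0.5, 0.3, 0.1])  # illustrative true weights
             + rng.normal(0, 0.3, n) > 0.45).astype(int)

# Incidence: proportion of consumers at or above threshold per attribute.
print("Threshold-meeting incidence:", beliefs.mean(axis=0).round(2))

# Linear-probability regression: each coefficient estimates the gain in
# intention incidence from moving one more consumer to 'yes' on an attribute.
X = np.column_stack([np.ones(n), beliefs])
coefs, *_ = np.linalg.lstsq(X, intention, rcond=None)
print("Estimated attribute effects:", coefs[1:].round(2))
```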
In the hope of encouraging adoption of the Level-Free Forced-Choice Binary
measure, two examples of prototype questionnaires are provided in Appendix A. The first
example covers unipolar attributes for laundry detergents (scored 1, 0). The second example
covers bipolar attributes for fast-food restaurants (scored +1, –1). A third example is shown
for a modified SERVQUAL-type instrument in Appendix B, with Level-Free Forced-Choice
Binary measures replacing the usual 7-point Likert answer scales used in service quality
research. The attributes in this example are all unipolar as in Likert items, but worded level-
free unlike typical Likert items, and should be scored ‘yes’ = 1 and ‘no’ = 0 to reveal
individual ‘at-threshold’ incidence. Customized variations of these questionnaires can easily
be constructed from appropriate qualitative research.
A final, radical recommendation is that researchers drop the general term ‘brand
image.’ The term has long outlived its original meaning of an emotions-based profile of the
brand – correctly measured only by using a set of Osgood et al.’s (1957) 7-point, correctly
bipolar, Semantic Differential items. For efficiency reasons, professional tracking companies
do not use semantic scales when tracking ‘image attributes,’ and academic researchers now
rarely use them either (see, for instance, Driesener and Romaniuk’s 2006 article on ‘brand
image measurement,’ and indeed the present article), so ‘brand image’ is rarely in fact
measured! ‘Brand-attribute beliefs,’ the term used in the Rossiter and Percy (1987, 1997) and
Rossiter and Bellman (2005) advertising management and research textbooks, would better
describe what brand-tracking market researchers actually measure.
APPENDIX A
Examples of Level-Free Forced-Choice Binary measures for unipolar attributes and bipolar
attributes (one brand shown in each case)
Laundry detergents
Omo:
Cleans □ Yes □ No
Removes stains □ Yes □ No
Whitens whites □ Yes □ No
Brightens colors □ Yes □ No
Freshens clothes □ Yes □ No
Fast-food restaurants
McDonald’s:
Yummy □ Agree □ Disagree
Quick service □ Agree □ Disagree
Value for money □ Agree □ Disagree
Unhealthy □ Agree □ Disagree
Convenient □ Agree □ Disagree
APPENDIX B
SERVQUAL-type questionnaire modified from the Likert format to the Level-Free Forced-
Choice Binary format (service category: retail banks)
Banks’ previous or current customers as raters
Barclays Bank:
1. Welcoming-looking branches □ Yes □ No
2. Branch convenient to work or home □ Yes □ No
3. Well laid-out interior facilities □ Yes □ No
4. Short waiting times □ Yes □ No
5. Privacy for important transactions □ Yes □ No
6. Polite tellers □ Yes □ No
7. Competent desk personnel □ Yes □ No
8. Competitive interest rates □ Yes □ No
9. Account statements sent frequently □ Yes □ No
10. Account statements clear and accurate □ Yes □ No
11. Good online banking □ Yes □ No
REFERENCES
Baumgartner H, Steenkamp J-BEM. 2001. Response styles in marketing research: a cross-
national investigation. Journal of Marketing Research, 38(2), 143-156.
Cowley E, Rossiter JR. 2005. Range model of judgments. Journal of Consumer Psychology,
15(3), 250-262.
Cronbach LJ. 1946. Response sets and test validity. Educational and Psychological
Measurement, 6(4), 475-494.
Cronbach LJ. 1950. Further evidence on response sets and test design. Educational and
Psychological Measurement, 10(1), 3-31.
Dall’Olmo Riley F, Ehrenberg ASC, Castleberry SB, Barwise TP, Barnard NR. 1997. The
variability of attitudinal repeat rates. International Journal of Research in
Marketing, 14(5), 437-450.
Diamantopoulos A, Reynolds NL, Simintiras AC. 2006. The impact of response styles on the
stability of cross-national comparisons. Journal of Business Research, 59(August),
925-935.
Dolnicar S. In press. Asking good survey questions. Journal of Travel Research.
Dolnicar S, Grün B. 2007. Cross-cultural differences in survey response patterns.
International Marketing Review, 24(2), 127-143.
Dolnicar S, Grün B. 2013. Validly measuring destination images in survey
studies. Journal of Travel Research, 52(1), 3-13.
Dolnicar S, Grün B. In press. Including don’t know answer options in brand image surveys
improves data quality. International Journal of Market Research.
Dolnicar S, Rossiter JR. 2008. The low stability of brand-attribute associations is partly due
to market research methodology. International Journal of Research in Marketing,
25(2), 104-108.
Dolnicar S, Rossiter JR, Grün B. 2012. ‘Pick any’ measures contaminate brand image studies.
International Journal of Market Research, 54(6), 821-834.
Driesener C, Romaniuk J. 2006. Comparing methods of brand image measurement.
International Journal of Market Research, 48(6), 681-698.
Eggleton J. 2012. The brand racquet: why Federer, Moët are a champagne union. The
Australian, December 3, pp. 24, 28.
Fishbein M. 1963. An investigation of the relationships between belief about the object and
attitude toward the object. Human Relations, 16(3), 233-240.
Fishbein M, Ajzen I. 1975. Belief, Attitude, Intention, and Behavior: An Introduction to
Theory and Research. Addison-Wesley: Reading, MA.
Fishbein M, Ajzen I. 2010. Predicting and Changing Behavior: The Reasoned Action
Approach. Psychology Press: New York.
Gleason TC, Devlin SJ, Brown M. 2003. In search of the optimum scale. Marketing
Research, 15(3), 25-29.
Green PE, Rao VR. 1972. Applied Multidimensional Scaling: Comparison of Approaches
and Algorithms. Holt, Rinehart and Winston: New York.
Green PE, Srinivasan V. 1978. Conjoint analysis in consumer research: issues and outlook.
Journal of Consumer Research, 5(2), 103-123.
Holbrook MB. 1983. Using a structural model of halo effect to assess perceptual distortion
due to affective overtones. Journal of Consumer Research, 10(2), 247-252.
Joyce T. 1963. Techniques of brand image measurement. New Developments in Research –
6th Annual Conference of the Market Research Society. London: Market Research
Society, pp. 45-63.
Krosnick JA, Alwin DF. 1987. An evaluation of a cognitive theory of response order effects
in survey measurement. Public Opinion Quarterly, 51(2), 201-219.
21
Lehmann DR, Hulbert J. 1972. Are three-point scales always good enough? Journal of
Marketing Research, 9(4), 444-446.
McDonald C. 2000. Tracking advertising and monitoring brands. Admap Monograph No. 6.
Admap Publications: Henley-on-Thames, England.
McFadden D. 1980. Econometric models for probabilistic choice among products. Journal of
Business, 53(3), 513-530.
Menictas C, Wang P, Fine B. 2011. Assessing flat-lining response style bias in online
research. Australasian Journal of Market & Social Research, 19(2), 34-44.
Moran WT. 1985. The circuit of effects in tracking advertising profitability. Journal of
Advertising Research, 25(1), 25-29.
Nunnally JC. 1978. Psychometric Theory, 2nd edn. McGraw-Hill: New York.
Olsson U. 1979. Maximum likelihood estimation of the polychoric correlation coefficient.
Psychometrika, 44(4), 443-460.
Osgood CE, Suci GJ, Tannenbaum PH. 1957. The Measurement of Meaning. University of
Illinois Press: Urbana, IL.
Parasuraman A, Zeithaml V, Berry LL. 1988. SERVQUAL: a multiple-item scale for
measuring consumer perceptions of service quality. Journal of Retailing, 64(1), 12-40.
Park JY, Park K, Dubinsky AJ. 2011. Impact of retailer image on private brand attitude: halo
effect and summary construct. Australian Journal of Psychology, 63(3), 173-184.
Rossiter JR. 2002. The C-OAR-SE procedure for scale development in marketing.
International Journal of Research in Marketing, 19(4), 305-335.
Rossiter JR. 2011. Measurement for the Social Sciences: The C-OAR-SE Method and Why it
Must Replace Psychometrics. Springer: New York.
Rossiter JR, Bellman S. 2005. Marketing Communications: Theory and Application.
Pearson: Sydney, Australia.
Rossiter JR, Percy L. 1987. Advertising & Promotion Management. McGraw-Hill: New York.
Rossiter JR, Percy L. 1997. Advertising Communications & Promotion Management, 2nd
edn. McGraw-Hill: New York.
Tellis GJ, Chandrasekaran D. 2010. Extent and impact of response biases in cross-national
survey research. International Journal of Research in Marketing, 27(4), 321-341.
Thurstone LL. 1927. A law of comparative judgment. Psychological Review, 34(4), 273-286.
Viswanathan M, Sudman S, Johnson M. 2004. Maximum versus meaningful discrimination in
scale response: implications for validity of measurement of consumer perceptions
about products. Journal of Business Research, 57(February), 108-125.
Weijters B, Geuens M, Schillewaert N. 2010. The stability of individual response styles.
Psychological Methods, 15(1), 96-110.
Woodruff RB. 1972. Measurement of consumers’ prior brand information. Journal of
Marketing Research, 9(3), 258-263.
FIGURE 1
HOW THE LEVEL-FREE FORCED-CHOICE BINARY MEASURE IS THEORIZED TO
OPERATE FOR UNIPOLAR ATTRIBUTES AND BIPOLAR ATTRIBUTES

[Upper panel: unipolar performance attribute. A continuum runs from zero performance to
maximum performance; the category threshold of satisfactory performance divides the ‘No’
region (below the threshold) from the ‘Yes’ region (at or above it).]

[Lower panel: bipolar evaluative attribute measured as two unipolar attributes. One
continuum runs from neutral to the negative maximum, with a category threshold for
negativity dividing ‘Disagree’ from ‘Agree’; the other runs from neutral to the positive
maximum, with a category threshold for positivity dividing ‘Disagree’ from ‘Agree.’]
FIGURE 2
HOW THE THRESHOLD ADDS TO VALIDITY BY ACCOUNTING FOR INDIVIDUAL-
LEVEL HETEROGENEITY

With 7-point rating measures of belief, two individuals could have identical scores of, say, 5:

Person A: 1 2 3 4 [5] 6 7
Person B: 1 2 3 4 [5] 6 7

But if their category satisfaction thresholds for the attribute differ, the result differs. If, for
example, Person A’s threshold is 5 or lower, the FC Binary answer is ‘Yes’; if Person B’s
threshold is 6 or higher, the FC Binary answer is ‘No.’
Table 1. Multipoint measures of beliefs are susceptible to all major forms of response set
(response bias) whereas Level-Free Forced-Choice Binary measures prevent them

Belief measure | Acquiescence | Extremes | Midpoint
Unipolar (‘not at all’ … ‘maximum’) | YES | YES | No (unless the unipolar scale is wrongly interpreted as bipolar)
Likert (‘strongly disagree’ … ‘strongly agree’) | YES | YES | YES
Semantic differential (e.g., ‘dislike’ … ‘like’) | YES | YES | YES
Level-Free FC Binary (‘yes, no’; ‘disagree, agree’) | No (empirical tests show no yea-saying effect) | No (no extreme options) | No (no midpoint)
Table 2. Correlations between brand-attribute belief ratings show inflation (compared with
Level-Free Forced-Choice Binary judgments) when there are more scale points

Correlations for laundry detergent brand-attribute beliefs (a):
Level-Free FC Binary (yes, no): .40 | Unipolar (4-point one-sided): .74 | Unipolar (7-point one-sided): .86

Correlations for fast-food restaurant brand-attribute beliefs (b):
Level-Free FC Binary (agree, disagree): .22 | Bipolar (2-point each side of midpoint): .26 | Bipolar (3-point each side of midpoint): .29

(a) Six brands rated on seven laundry performance attributes by approximately n = 300
respondents per measure.
(b) Five brands rated on five evaluative attributes by approximately n = 200 respondents
per measure.
Table 3. Absolute belief intensity ratings are ‘fuzzy,’ as indicated by their much lower
test-retest stability compared with Level-Free Forced-Choice Binary belief judgments

Exact stability proportion | 7-Point Scale | Level-Free FC Binary
Laundry detergent performance beliefs (a) | .44 | .82
Fast-food restaurant evaluative beliefs (b) | .46 | .85
Chance stability proportion | (.14) | (.50)

(a) Same data set as in the previous table.
(b) Same data set as in the previous table.