
Why the Level-Free Forced-Choice Binary Measure of Brand Benefit Beliefs Works So Well



The level-free version of the Forced-Choice Binary measure of brand benefit beliefs was introduced in a recent article in IJMR (Dolnicar et al. 2012) and was shown to yield more stable, hence more reliable and trustworthy, results than the shorter 'Pick-Any' measure and the longer '7-Point Scale' measure. The aims of the present article are (1) to explain how and why the Level-Free Forced-Choice Binary measure works so well, and (2) to point out its advantages over other belief measure formats, advantages that, importantly, include prevention of all forms of response bias.
Why the level-free forced-choice binary measure of
brand-image beliefs works so well
Cite as:
Rossiter, J.R., Dolnicar, S. & Grün, B. (2015) Why Level-Free Forced Choice Binary
Measures of Brand Benefit Beliefs Work Well. International Journal of Market Research,
57(2), 1-9.
The level-free version of the Forced-Choice Binary measure of brand-image beliefs was
introduced in a recent article in IJMR (Dolnicar, Rossiter, and Grün, 2012) and was shown to
yield more stable brand-image results than the shorter ‘Pick-Any’ measure and the longer ‘7-
Point Scale’ measure. The aim of the present article is to investigate how the Level-Free
Forced-Choice Binary measure works and identify a number of advantages, which most
importantly include prevention of all forms of response bias.
Beliefs about the degree to which certain products and services possess particular
attributes represent a highly prevalent and important construct in marketing research. Beliefs
form the empirical basis for many commonly used marketing analyses: multidimensional
scaling (e.g., Green and Rao, 1972); conjoint analysis (e.g., Green and Srinivasan, 1978); and
multi-attribute attitude models, which in marketing science are called subjective expected
utility models (e.g., Fishbein, 1963; Fishbein and Ajzen, 1975, 2010; McFadden, 1980).
These belief-based models are used for most strategic (e.g. positioning) and operational (e.g.
advertising) marketing decisions. A recent practitioner example is the pairing of tennis ace
Roger Federer with Moët & Chandon champagne, an alliance made because of the M&C
CEO’s reasoning that consumers share three key beliefs about both brands: “elegance,
glamor, and success” (Eggleton, 2012, p. 24). In particular, beliefs are the basis of brand-
image research in academia and of brand-image tracking studies conducted by marketing
When deciding how to measure brand-image beliefs, most academic researchers opt
for a multipoint measure, most often some form of ‘7-Point Scale.’ The well-known
SERVQUAL service brand-image beliefs, for instance, are measured on Likert 7-point
‘agree-disagree’ answer scales (see Parasuraman, Zeithaml, and Berry, 1988). However,
measurement of beliefs with a ‘7-Point Scale’ is problematic, for two main reasons. One
reason is that researchers use the same 7-point measure regardless of whether the attribute is
unipolar (e.g., a laundry detergent’s cleaning performance) or bipolar (e.g., evaluation of the
extent to which a fast-food restaurant’s offerings are healthy, on the positive side, or
unhealthy, on the negative side). The SERVQUAL instrument, for example, mixes unipolar
attributes (e.g., degree of responsiveness) with bipolar attributes (e.g., politeness of service
personnel, with strong disagreement implying that the service personnel are extremely
impolite, not just lacking politeness). The main problem resulting from use of the same 7-
point answer scale is that the seven degrees of attribute intensity in a 7-point scale will tend
to ‘overmeasure’ unipolar attributes because ‘too much’ discrimination is offered (see
Viswanathan, Sudman, and Johnson, 2004) and at the same time may in comparison
‘undermeasure’ bipolar attributes if ‘too little’ discrimination is offered with only three levels
of attribute intensity on either side of the neutral midpoint.
A second serious problem is that the data obtained from 7-point and other multipoint
rating measures are notoriously subject to distortion from raters’ response sets (Cronbach,
1946, 1950). The most prevalent response sets are ‘acquiescent’ responding, ‘extreme’
responding, and ‘midpoint’ responding (Baumgartner and Steenkamp, 2001;
Diamantopoulos, Reynolds, and Simintiras, 2006; Dolnicar and Grün, 2007; Tellis and
Chandrasekaran, 2010; Weijters, Geuens, and Schillewaert, 2010). These response sets
produce biased ratings of belief intensity; they also lead to common-measure bias in the
correlations between the belief ratings measured on the same answer scale and can inflate or
deflate correlations depending on the type of response set adopted by the rater.
Despite these problems, multipoint belief measures continue to dominate in market
research. The single exception is in practitioners’ brand and advertising tracking studies,
where shorter binary measures are employed to measure brand-image beliefs because they
are much quicker for respondents to answer. The Forced-Choice Binary measure of brand-
image beliefs was introduced by the British Audience Research Bureau, BARB,
approximately 50 years ago (see Joyce, 1963, and also McDonald, 2000). Introduced at the
same time was the Free-Choice Binary, or ‘Pick-Any,’ measure – used today by nearly all
brand-image tracking companies, including the worldwide market leader, Millward Brown
plc. The Pick-Any measure’s popularity is due to it being more efficient to administer (it
requires only a simple brand-by-attribute, rows-by-columns, matrix on the questionnaire and
respondents only have to answer ‘yes’ to, i.e., to ‘pick,’ those attributes they believe the brand
has, omitting all others). An earlier study by Driesener and Romaniuk (2006) compared the
speed of completion of the Pick-Any measure versus brands’ rankings on the attributes and
brands’ ratings on a 5-point Likert measure, finding that the Pick-Any measure required just
half as much time to complete as the other two measures. However, Dolnicar, Rossiter,
and Grün (2012) raised concerns about the validity of the Pick-Any measure (see also
Dolnicar and Rossiter, 2008). The ‘free choosing’ in Pick Any allows respondents to omit,
and therefore underreport, the image incidence for the brands; the proposed Level-Free
Forced-Choice Binary measure is not affected by this problem because it encourages
respondents to carefully consider those attributes that are presented on the questionnaire.
Also, those image attributes that are ‘picked’ using a Pick-Any measure are reported
unstably, such that they too often fail to be picked again, even after a short time interval, and
many new ones are picked on the second occasion. A low two-way repeat rate of 50% –
equivalent to “chance” responding by the raters – was observed in an earlier large-scale study
with Pick-Any measures of beliefs (Dall’Olmo Riley, Ehrenberg, Castleberry, Barwise, and
Barnard, 1997) and in Dolnicar et al.’s (2012) study the Pick-Any repeat rate was even worse
than chance, at 41%. Underreporting and unstable responding pose great problems for brand-
image-tracking studies because the results, when averaged over respondents, particularly if
‘rolled’ as a moving average as in most tracking studies, will imply falsely that brand images
are highly stable.
Another problem with brand-image measures, which has not been identified to date,
is that the image attributes frequently are worded with a fixed attribute level (e.g., ‘cleans
very well’ for a brand of laundry detergent, or ‘extremely convenient’ for a fast-food
restaurant brand). The fixed level of the attribute in the item gets confounded with the level
chosen in the answer (see Rossiter, 2002). In the laundry detergent example, consumers
would be more likely to say ‘yes’ if the item were worded as ‘cleans well’ rather than ‘cleans
very well.’ In the fast-food restaurant example, consumers would be more likely to ‘agree’ if
the item were worded as ‘convenient’ rather than ‘extremely convenient.’
This last point about levels in measures led to the present authors’ innovation in
designing Forced-Choice Binary measures: the new measures are designed to be level-free in
the item wording, noting that they are already level-free in the binary answer options (‘yes,
no’ for unipolar attributes and ‘agree, disagree’ for bipolar attributes, with no levels between
the two answer options – see examples in Appendix A). This means that the overall measure
is doubly level-free. Indeed, the technical name given to the new version of Forced-Choice
Binary measures (see chronologically Rossiter, 2011; Dolnicar et al., 2012; Dolnicar and
Grün, 2013; and Dolnicar, in press) is DLF IIST, which stands for Doubly Level-Free
Individually-Inferred Satisfaction Threshold. The second part of the name – the IIST part –
refers to the information-processing mechanism through which the new version of the
Forced-Choice Binary measure is theorized to work.
All multipoint brand-image rating measures – Likert, Semantic Differential, and so
forth – seek an absolute judgment. The Level-Free version of the Forced-Choice Binary
measure, in a fundamental departure from this, seeks a comparative judgment (see Thurstone,
1927). The comparative judgment requires the respondent to form, or retrieve from memory,
a rough estimate of the brand’s believed degree of possession of the attribute, which is
followed automatically by an easy determination as to whether this roughly believed degree
of attribute possession meets the respondent’s previously learned threshold of satisfactory
attribute possession (based on the respondent’s experience with brands previously
encountered in the product category). Attributes, as noted above, are of two kinds: unipolar
or bipolar. The upper panel of Figure 1 illustrates how unipolar attribute judgments are
theorized to operate via the Level-Free Forced-Choice Binary measure, and the lower panel
shows this for bipolar attributes. Note that there is one threshold for a unipolar attribute; but
theoretically there are two thresholds for a bipolar attribute, one for the negative pole and
another for the positive pole. However, Binary measures – like Likert measures – are always
worded unipolar (see examples in Appendices A and B), which means that for a bipolar
attribute either one pole must be chosen or else the two poles must be measured as separate
items. Accordingly, there is only one threshold to be considered in any single brand-image
Figure 1 about here
An alternative way of presenting the theory of how the Level-Free Forced-Choice
Binary measures operate as comparative judgments is to express the process symbolically. In
general, belief judgments can be denoted symbolically as $B_{io.a.k}$, where B = the belief, i = the
individual judge or rater, o = the belief object (a brand of a product or service, for instance), a
= the attribute of judgment, and k = the judged absolute level of the attribute. Level-Free
Forced-Choice Binary judgments are different. The belief, B, is now only a rough judgment
rather than an attempted precise one, and can be denoted as $B_{io.a.rk}$, where the final subscript,
rk, signifies ‘rough k’ and allows for a range of k rather than a precise value. Cowley and
Rossiter’s (2005) range model of judgment provides the evidence that consumers do in fact
have a range in mind around the attribute level even though they are asked by the researcher
or by the instructions in the questionnaire to make a precise judgment by marking a single
point on the rating scale. (For the earliest demonstration of the ‘fuzzy’ nature of consumers’
belief ratings, see Woodruff, 1972.) When presented with a Level-Free Forced-Choice
Binary measure, such as ‘Omo gets stains out: Yes No,’ or ‘McDonald’s is unhealthy:
Agree Disagree,’ the supposition is that the consumer automatically brings to mind a
‘standard,’ or ‘threshold,’ that represents a learned average level of the attribute for that
category of objects (laundry detergents or fast-food restaurants in the two examples). This
learned average belief level can be denoted as $B_{ic.a.avk}$, where the symbol, c, for category,
replaces the symbol, o, for object, and avk is now the consumer’s perceived average level of
the attribute for brands the consumer knows in that category. It is avk that functions as the
standard for comparison. Thus:
If $B_{io.a.rk} \geq B_{ic.a.avk}$, the consumer answers ‘yes,’ or ‘agree’
If $B_{io.a.rk} < B_{ic.a.avk}$, the consumer answers ‘no,’ or ‘disagree’
The consumer’s comparative-judgment ‘decision rule’ is: if rk meets or exceeds avk, give an
affirmative answer; but if rk is below avk, give a negative answer. The comparison is easy to
make because it does not require an absolute point judgment. The comparison is so quick
(see Dolnicar and Grün, 2013) that it can be presumed to be automatic.
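The decision rule can also be sketched in code. The following Python fragment is an illustration only: it assumes, for simplicity, that the rough belief rk and the category threshold avk can each be represented by a single number on a common but arbitrary scale (the theory itself treats rk as a range), and the function name `binary_answer` is hypothetical rather than part of any published procedure.

```python
def binary_answer(rough_level: float, threshold: float, bipolar: bool = False) -> str:
    """Apply the comparative decision rule: affirmative if rk meets or exceeds avk.

    rough_level -- assumed numeric stand-in for rk (the rough believed level)
    threshold   -- assumed numeric stand-in for avk (the learned category average)
    bipolar     -- use 'agree'/'disagree' wording instead of 'yes'/'no'
    """
    meets_threshold = rough_level >= threshold
    if bipolar:
        return "agree" if meets_threshold else "disagree"
    return "yes" if meets_threshold else "no"


# rk exactly at the threshold still yields an affirmative answer.
print(binary_answer(5.0, 5.0))                # yes
# rk below the threshold yields a negative answer.
print(binary_answer(4.0, 5.0, bipolar=True))  # disagree
```

Note that no absolute point on an answer scale is ever reported; only the outcome of the comparison is recorded.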
The theorized comparative judgment process is very different from that involved in
answering on a 7-point (or other multipoint) rating scale. For instance, when making belief
ratings on a 1-to-7 Unipolar Numerical rating scale, the consumer has to first make a
judgment about what the anchor words at each end of the scale mean (is one end the ‘zero’
level of the attribute or merely the ‘low’ level? and is the other end the ‘high’ level or is it the
‘extremely high’ level?) and also about what the numbers mean (the number ‘4’ in the middle
of a 1-to-7 numbered scale in particular). All this decision-making takes time and, more
importantly, it leads to undetectable individual variation in scale-level interpretation.
Varying scale-level interpretation is considered by psychometricians to be ‘random error,’ or
simply ‘noise.’
The new Forced-Choice Binary measure substantively reduces ‘noise.’ This is
because the between-individual variation – known as heterogeneity – is captured in the
threshold term (in avk). The threshold is specific to the individual because it is based on his
or her personal learning history. Figure 2 illustrates the consequences of this threshold.
Suppose that two consumers, person A and person B, are asked to rate a brand’s image belief
on a typical ‘7-Point Scale’ with possible scores ranging from 1 to 7, and that both decide that
the closest numerical rating that corresponds with their judgment is a 5. Does this mean that
the two consumers regard the object (a brand of laundry detergent, say) as performing equally
well on the attribute (cleaning performance, say)? The answer is: not at all if they have
different standards – different thresholds – for what constitutes satisfactory performance.
Person A’s threshold is 5 in the figure, so the brand’s performance would be seen by person A
as satisfactory. Person B’s threshold, on the other hand, is higher, namely a 6, so person B,
despite giving the same absolute rating, would not see the brand’s performance as
satisfactory. The problem for multipoint rating measures, such as the one shown, is that no
threshold is invoked. With the Level-Free Forced-Choice Binary measure the threshold is
inevitably invoked because the forced-answer measure could not be answered without it. Try
answering, for example, the item ‘Omo gets out stains: Yes No’ or the item
‘McDonald’s is convenient: Agree Disagree.’ You will realize that neither item could
be answered without having a standard or threshold for, respectively, ‘stain removal’ and
‘convenience’ implicitly in mind.
Figure 2 about here
The Level-Free Forced-Choice Binary measure’s unique ability to capture
heterogeneity across respondents means that, despite having only a ‘2-point answer scale,’ it
does not suffer from the ‘restricted variance’ problem (the problem often alleged against
binary measures; see, e.g., Lehmann and Hulbert, 1972, Nunnally, 1978, and Gleason,
Devlin, and Brown, 2003). Binary-answered measures are of course restricted arithmetically
to 0’s and 1’s, or –1’s and +1’s if bipolar, whereas the answers on a ‘7-Point Scale’ measure
can range more widely over the numbers 1 to 7.¹ But if much of the multipoint variation is in
fact ‘noise,’ caused by the respondent’s inability to make precise intensity judgments and by
the operation of response sets, this noise will distort the true rating. With the threshold-based
Level-Free Forced-Choice Binary measure, the variation is not ‘noise’ but is true between-
person variation in the easy-to-make judgments of whether the brand meets or falls below the
threshold for the particular attribute.
Footnote 1. The arithmetic restriction of data resulting from Level-Free Forced-Choice Binary measures is identical to that
of the Pick-Any measures commonly used in brand tracking studies. Therefore, all statistical methods that can
be used for Pick-Any data can also be used for data collected using the Level-Free Forced-Choice Binary
measure. Data from a 7-point answer scale contains seven discrete values; it does not measure brand-image
beliefs on a continuous scale, thus also precluding the use of most standard statistical procedures developed for
continuous data. Instead, procedures which take the discrete nature of the data into account have to be used.
Such procedures are often also applicable to binary data, because binary data is a special case of multi-category
data. For example, polychoric correlations (Olsson, 1979), which estimate the correlation between two theorized
normally distributed continuous latent variables from two ordinally measured variables, can also be estimated
for binary data.
Level-Free Forced-Choice Binary measures of brand-image attributes do not require
precise ratings of attribute intensity; they require only estimates of ‘rough k,’ rk, not ‘exact k,’
k. There is good evidence that people cannot make precise ratings of typical consumer
beliefs (see Woodruff, 1972; and Cowley and Rossiter, 2005) and this indecision leaves
multipoint rating measures open to response sets, which produce what are known as biased
scores. The analysis in Table 1 summarizes how the most common multipoint measures of
beliefs – Unipolar, Likert, and Semantic Differential – compare with the Level-Free Forced-
Choice Binary measure for susceptibility to response sets. The major response sets are
acquiescence, extreme responding, and midpoint responding, and they each lead to a
particular form of bias in the ratings.
Table 1 about here
Acquiescence Bias
Acquiescent responding, or ‘yea-saying,’ and its opposite, disacquiescent responding,
or ‘nay-saying,’ occur most often with political or socially sensitive topics, where many
people tend to give socially desirable answers. The vast majority of consumer topics –
products, services, or in-ad presenters – are not sensitive topics. Any apparent yea-saying or
nay-saying is much more likely to be halo responding caused by the respondent’s favorable
(positive halo) or unfavorable (negative halo) preexisting overall attitude toward the rated
object. Halo responding is therefore a true response rather than an erroneous ‘bias’ (see
Holbrook, 1983; J. Park, K. Park, and Dubinsky, 2011; and Rossiter, 2011).
Acquiescent responding (as distinct from positive halo responding) can, however,
show up as a response-order effect. Response-order is a more recently identified response set
that occurs mainly with orally administered measures, as in face-to-face interviews or on the
telephone, where the last-mentioned response option tends to be retained better in working
memory and is thus chosen more often than it would normally be chosen on a self-
administered written or online questionnaire (see Krosnick and Alwin, 1987). However, the
present authors’ unpublished experiments varying the order of the two answer options for
Level-Free Forced-Choice Binary measures have shown no evidence of an order effect (with
the online administration that was used, a first-response bias would be expected). And with
orally delivered questions on the phone or in person, the two answer alternatives are easily
kept simultaneously in working memory, thus precluding the response-order effect.
Practically speaking, this means that with Level-Free Forced-Choice Binary measures it
makes no difference with a unipolar attribute whether you place the ‘yes’ answer-box first or
second, and the same goes for the ‘agree’ answer-box with a bipolar attribute.
Extreme Bias
Extreme responding, unlike acquiescent responding, does pose a serious problem for
all multipoint belief measures. Extreme responding is detectable only by examining
individuals’ response patterns across multiple similar items, wherein extreme responding is
likely to have occurred if the respondent has ‘straight-lined’ down one or the other extreme
side for all items. Whereas it is becoming routine for the better fieldwork companies to check
for ‘straight-liners,’ these respondents are often retained in the data to keep up the sample
size, because of the well-known worsening respondent recruitment incidence (see Menictas,
Wang, and Fine, 2011). Academic researchers hardly ever report checking for such biases, so
their data are almost always contaminated by extreme responding.
Extreme responding by any substantial proportion of raters will tend to artificially
inflate correlations between the image ratings. As the findings in Table 2 reveal, the inflation
will be worse for unipolar attribute measures because there is more likelihood of
overdiscrimination with more answer categories on which to be ‘extreme’ (six beyond
the left-hand zero category) than with bipolar attribute measures (three on either side of the
neutral midpoint). Numbering bipolar answer scales as unipolar (e.g., 1 to 7) will also inflate
correlations because respondents see six categories over which to stretch their responses
when in fact there are only three.
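The basic inflation effect can be illustrated with a small numerical sketch. The ratings below are invented for illustration and are not taken from Table 2: six hypothetical ‘genuine’ raters give uncorrelated 7-point ratings on two attributes, and four hypothetical extreme responders then straight-line at 1 or at 7 on both. The helper `pearson` is a plain implementation of the product-moment correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical genuine raters: the two attribute ratings are unrelated.
genuine_a = [3, 3, 5, 5, 4, 4]
genuine_b = [3, 5, 3, 5, 4, 4]

# Hypothetical extreme responders: two straight-line at 7, two at 1.
extreme_a = [7, 7, 1, 1]
extreme_b = [7, 7, 1, 1]

print(pearson(genuine_a, genuine_b))                          # 0.0
print(pearson(genuine_a + extreme_a, genuine_b + extreme_b))  # 0.9
```

In this constructed case the correlation rises from zero to .9 purely because of the four straight-liners; the size of the effect in real data will depend on the proportion of extreme responders and on the number of answer categories.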
Table 2 about here
The big advantage of Forced-Choice Binary measures is that there are no extreme
options: the individual has only to answer ‘yes’ (or ‘agree’) on one side of the internal
threshold and ‘no’ (or ‘disagree’) on the other, and cannot answer extremely.
Midpoint Bias
Midpoint responding is another common way for respondents to ‘opt out’ from
carefully answering survey questions (Dolnicar and Grün, in press) and, like extreme
responding, it is detectable only by inspection of individuals’ response patterns on multiple
belief items. Midpoint responding as a ‘response set’ is most likely to be found with bipolar
answer scales, where the midpoint is supposed to mean ‘neutral’ or ‘neither’ but is often
resorted to when the respondent ‘can’t be bothered’ answering properly (see Dolnicar and
Grün, in press). Erroneous midpoint responding may occur also with unipolar answer scales
when bipolar attributes are mixed in with unipolar ones. This happens frequently in Semantic
Differential item batteries (see Osgood, Suci, and Tannenbaum, 1957) that mix unipolar items
like ‘low quality…high quality’ with bipolar items like ‘bad…good.’
Midpoint responding as a response bias, like extreme responding, will affect
correlations between belief ratings, but in the opposing manner. Midpoint response bias will
tend to deflate the correlation. With ratings entered directly to the computer these days,
‘midpoint straight-lining,’ just like straight-lining on extreme answers, is easily detected but
rarely corrected for by removing the offending respondents. Deflation of the correlation
occurs because the between-attribute rating variance will tend toward zero if too many
respondents opt out via the midpoint. It should be noted that omitting the midpoint answer
option (such as using –2, –1, +1, +2 answer options instead of –2, –1, 0, +1, +2) does not
solve the midpoint opting-out problem. Respondents are likely to distribute their would-be-
zero opting out answers at random to one or other of the two near-midpoint categories (–1 or
+1 in this example). Consistent near-midpoint responding will still tend to deflate the
correlation between brand-image belief ratings.
Level-Free Forced-Choice Binary measures have no midpoint and thus they prevent
midpoint response bias.
An important consequence of the response biases inherent in multipoint measures of
beliefs is individual-level instability of brand image ratings, even over short periods where no
actual change in the brand’s image has taken place. (Note that aggregate stability – total-
sample average stability – is not relevant, because it masks individual-level instability.)
Individual-level stability can be assessed in the ‘test-retest’ reliability paradigm by calculating
the proportion of respondents who exactly repeat their initial rating (or initial binary
judgment) on a short-interval, one- or two-week later, retest. Perfect stability (a proportion of
1.0) can be expected only among consumers who are familiar with the product category, the
brand to be rated, and the attributes used in the measures (see Dolnicar et al., 2012). Some
degree of stability can occur by chance and is dependent on the number of answer categories:
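for a 7-point answer scale, where each category is equally likely to be chosen, the chance
of repeating one particular category is 1/7 × 1/7 = 1/49 = .02, so the chance stability
proportion summed over the seven categories is 7 × 1/49 = 1/7 ≈ .14; for a binary answer
scale it is 2 × (1/2 × 1/2) = 1/2 = .50, the 50% ‘chance’ repeat rate noted earlier for
Pick-Any measures. Table 3 shows the exact stability proportions for 7-Point Scale ratings and Level-Free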
Forced-Choice Binary judgments (for the same two data sets as in the previous table). While
both stability proportions are well above their respective chance proportions, the exact
stability for 7-Point Scale ratings is very low, averaging about .45, compared with the exact
stability for the Forced-Choice Binary measures, which in both cases is above .80.
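The stability calculation can be sketched as follows. The function names and the five test-retest answers are hypothetical. Under the equal-likelihood assumption stated above, the sketch separates two chance quantities: the probability that one specific category is chosen on both occasions (1/k × 1/k), and the probability that the two answers coincide in any of the k categories (k times that value, i.e., 1/k).

```python
from fractions import Fraction


def exact_stability(test, retest):
    """Proportion of respondents whose retest answer exactly repeats the first answer."""
    if not test or len(test) != len(retest):
        raise ValueError("test and retest must be non-empty and of equal length")
    return sum(a == b for a, b in zip(test, retest)) / len(test)


def chance_same_specific_category(k):
    """Chance that one given category is picked on both occasions (k equally likely categories)."""
    return Fraction(1, k) ** 2


def chance_match_any_category(k):
    """Chance that test and retest answers coincide in any of the k categories."""
    return k * chance_same_specific_category(k)


# Hypothetical binary answers from five respondents on two occasions.
first = ["yes", "no", "yes", "yes", "no"]
second = ["yes", "no", "no", "yes", "no"]

print(exact_stability(first, second))    # 0.8
print(chance_same_specific_category(7))  # 1/49
print(chance_match_any_category(7))      # 1/7
print(chance_match_any_category(2))      # 1/2
```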
Table 3 about here
The 7-Point Scale measure’s low exact repeatability provides empirical proof of the
present authors’ presumption that precise intensity ratings on multipoint answer scales are
difficult to make. The Level-Free Forced-Choice Binary measure’s attribute judgments of
whether the brand ‘meets’ or ‘doesn’t meet’ the individual’s established threshold are easier
to make and therefore are more stable.
A seeming limitation in ‘switching’ to Level-Free Forced-Choice Binary measures is
researchers’ feared loss of diagnostic capability. With multipoint belief ratings, in theory, the
marketer can use multiple regression to relate the brand’s attribute belief ratings to a relevant
dependent variable such as Overall Attitude, Overall Satisfaction, or Purchase Intention.
From the regression coefficients for the belief ratings, the market researcher can compute the
‘elasticity’ of each attribute and estimate the incremental gain on the dependent variable if the
belief were to be increased for a positive attribute or decreased for a negative attribute. But if
multipoint belief ratings are fuzzy, biased in an unknown direction, and unstable, then the
regression weights, which are in effect partial correlation coefficients, will also be unstable.
Diagnosis by regression analysis then becomes untrustworthy and, worse, misleading.
Switching to Level-Free Forced-Choice Binary measures of brand-image attributes
would, at first, seem not to be the solution to the diagnostic problem because the binary
judgments are seen as too blunt and not sensitive enough to record marketing-induced shifts
in belief ratings. However, as some advertising theorists have pointed out (specifically
Moran, 1985, and Rossiter and Bellman, 2005), the ultimate purpose of marketing is to get as
many individual consumers as possible up to the ‘go, no go’ binary action threshold on the
brand’s targeted attribute or attributes. Unlike the Pick-Any measure, which understates the
incidence of brand-image beliefs, the Level-Free Forced-Choice Binary measure records the
brand’s threshold-meeting incidence for each attribute belief exactly. And unlike 7-Point or
other multipoint measures, the Level-Free Forced-Choice Binary measure avoids the
confound of the incidence of consumers who believe the brand has the attribute with the
intensity of their belief (which is a completely neglected problem with multipoint belief
ratings). For example, a service company might be rated overall 6 out of 7 for satisfying
customers on an attribute such as ‘response time’ but this could be due to most customers
being perfectly satisfied and giving the company a 7 rating while others are disaffected and
give it a much lower rating. Of course, this distribution of ratings could easily be checked by
inspecting individual-level responses instead of only the group-average response, but the
researcher would still not know at which number to make the “cutoff” for a truly satisfactory
rating. With the Level-Free Forced-Choice Binary measure there are no intensity differences
in the binary answers. Consumers are merely answering either at-or-above or below their individual
thresholds and this reveals pure incidence, namely: the proportion of customers who are
satisfactorily satisfied with regard to the targeted attribute or attributes.
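The ‘response time’ example can be made concrete with invented numbers. In the sketch below, ten hypothetical customers produce an average rating of 6 out of 7 even though two of them are clearly disaffected. The binary answers are likewise hypothetical and assume that the eight customers who gave a 7 would answer ‘yes’ and the other two ‘no’.

```python
from statistics import mean

# Hypothetical 7-point ratings: eight fully satisfied customers, two disaffected ones.
ratings = [7] * 8 + [3, 1]

# Hypothetical Level-Free Forced-Choice Binary answers from the same ten customers
# (assumption: the eight who rated 7 answer 'yes', the other two answer 'no').
answers = ["yes"] * 8 + ["no"] * 2


def incidence(binary_answers):
    """Proportion of respondents for whom the brand meets their own threshold."""
    return binary_answers.count("yes") / len(binary_answers)


print(mean(ratings))       # average rating of 6 out of 7
print(incidence(answers))  # 0.8
```

The average of 6 says nothing about where to place a ‘satisfactory’ cutoff, whereas the binary incidence of .8 is read directly from the answers.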
In the hope of encouraging adoption of the Level-Free Forced-Choice Binary
measure, two examples of prototype questionnaires are provided in Appendix A. The first
example covers unipolar attributes for laundry detergents (scored 1, 0). The second example
covers bipolar attributes for fast-food restaurants (scored +1, –1). A third example is shown
for a modified SERVQUAL-type instrument in Appendix B, with Level-Free Forced-Choice
Binary measures replacing the usual 7-point Likert answer scales used in service quality
research. The attributes in this example are all unipolar as in Likert items, but worded level-
free unlike typical Likert items, and should be scored ‘yes’ = 1 and ‘no’ = 0 to reveal
individual ‘at-threshold’ incidence. Customized variations of these questionnaires can easily
be constructed from appropriate qualitative research.
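A minimal sketch of the scoring conventions just described is given below. The attribute names are taken from Appendix A, but the answers, the dictionary layout, and the function name `score_answers` are hypothetical.

```python
# Scoring conventions described in the text:
# unipolar items are scored yes = 1, no = 0; bipolar items agree = +1, disagree = -1.
UNIPOLAR_CODES = {"yes": 1, "no": 0}
BIPOLAR_CODES = {"agree": 1, "disagree": -1}


def score_answers(answers, bipolar=False):
    """Convert one respondent's answers ({attribute: answer}) into numeric scores."""
    codes = BIPOLAR_CODES if bipolar else UNIPOLAR_CODES
    return {attribute: codes[answer.strip().lower()]
            for attribute, answer in answers.items()}


# Hypothetical answers from one respondent for one brand in each category.
detergent = {"Cleans": "Yes", "Removes stains": "No", "Whitens whites": "Yes"}
fast_food = {"Quick service": "Agree", "Unhealthy": "Disagree"}

print(score_answers(detergent))
print(score_answers(fast_food, bipolar=True))
```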
A final, radical recommendation is that researchers drop the general term ‘brand
image.’ The term has long outlived its original meaning of an emotions-based profile of the
brand – correctly measured only by using a set of Osgood et al.’s (1957) 7-point, correctly
bipolar, Semantic Differential items. For efficiency reasons, professional tracking companies
do not use semantic scales when tracking ‘image attributes’ and academic researchers rarely
now use them either (see, for instance, Driesener and Romaniuk’s, 2006, article on ‘brand
image measurement’ and indeed the present article) and so ‘brand image’ is rarely in fact
measured! Brand-Attribute Beliefs, the term used in the Rossiter and Percy (1987, 1997) or
Rossiter and Bellman (2005) advertising management and research textbooks, would be more
fitting of what brand-tracking market researchers actually measure.
Appendix A
Examples of Level-Free Forced-Choice Binary measures for unipolar attributes and bipolar
attributes (one brand shown in each case)

Laundry detergents
    Cleans              Yes    No
    Removes stains      Yes    No
    Whitens whites      Yes    No
    Brightens colors    Yes    No
    Freshens clothes    Yes    No

Fast-food restaurants
    Yummy               Agree    Disagree
    Quick service       Agree    Disagree
    Value for money     Agree    Disagree
    Unhealthy           Agree    Disagree
    Convenient          Agree    Disagree

Appendix B
SERVQUAL-type questionnaire modified from the Likert format to the Level-Free
Forced-Choice Binary format (service category: retail banks)
Banks’ previous or current customers as raters

Barclays Bank:
     1. Welcoming-looking branches               Yes    No
     2. Branch convenient to work or home        Yes    No
     3. Well laid-out interior facilities        Yes    No
     4. Short waiting times                      Yes    No
     5. Privacy for important transactions       Yes    No
     6. Polite tellers                           Yes    No
     7. Competent desk personnel                 Yes    No
     8. Competitive interest rates               Yes    No
     9. Account statements sent frequently       Yes    No
    10. Account statements clear and accurate    Yes    No
    11. Good online banking                      Yes    No

References
Baumgartner H, Steenkamp J-BEM. 2001. Response styles in marketing research: a cross-
national investigation. Journal of Marketing Research, 38(2), 143-156.
Cowley E, Rossiter JR. 2005. Range model of judgments. Journal of Consumer Psychology,
15(3), 250-262.
Cronbach LJ. 1946. Response sets and test validity. Educational and Psychological
Measurement, 6(4), 475-494.
Cronbach LJ. 1950. Further evidence on response sets and test design. Educational and
Psychological Measurement, 10(1), 3-31.
Dall’Olmo Riley F, Ehrenberg ASC, Castleberry SB, Barwise TP, Barnard NR. 1997. The
variability of attitudinal repeat rates. International Journal of Research in
Marketing, 14(5), 437-450.
Diamantopoulos A, Reynolds NL, Simintiras AC. 2006. The impact of response styles on the
stability of cross-national comparisons. Journal of Business Research, 59(August),
Dolnicar S. In press. Asking good survey questions. Journal of Travel Research.
Dolnicar S, Grün B. 2007. Cross-cultural differences in survey response patterns.
International Marketing Review, 24(2), 127-143.
Dolnicar S, Grün B. 2013. Validly measuring destination images in survey
studies. Journal of Travel Research, 52(1), 3-13.
Dolnicar S, Grün B. In press. Including don’t know answer options in brand image surveys
improves data quality. International Journal of Market Research.
Dolnicar S, Rossiter JR. 2008. The low stability of brand-attribute associations is partly due
to market research methodology. International Journal of Research in Marketing,
25(2), 104-108.
Dolnicar S, Rossiter JR, Grün B. 2012. ‘Pick any’ measures contaminate brand image studies.
International Journal of Market Research, 54(6), 821-834.
Driesener C, Romaniuk J. 2006. Comparing methods of brand image measurement.
International Journal of Market Research, 48(6), 681-698.
Eggleton J. 2012. The brand raquet: why Federer, Moet are a champagne union. The
Australian, December 3, pp. 24, 28.
Fishbein M. 1963. An investigation of the relationships between belief about the object and
attitude toward the object. Human Relations, 16(3), 233-240.
Fishbein M, Ajzen I. 1975. Belief, Attitude, Intention, and Behavior: An Introduction to
Theory and Research. Addison-Wesley: Reading, MA.
Fishbein M, Ajzen I. 2010. Predicting and Changing Behavior: The Reasoned Action
Approach. Psychology Press: New York.
Gleason TC, Devlin SJ, Brown M. 2003. In search of the optimum scale. Marketing
Research, 15(3), 25-29.
Green PE, Rao VR. 1972. Applied Multidimensional Scaling: Comparison of Approaches
and Algorithms. Holt, Rinehart and Winston: New York.
Green PE, Srinivasan V. 1978. Conjoint analysis in consumer research: issues and outlook.
Journal of Consumer Research, 5(2), 103-123.
Holbrook MB. 1983. Using a structural model of halo effect to assess perceptual distortion
due to affective overtones. Journal of Consumer Research, 10(2), 247-252.
Joyce T. 1963. Techniques of brand image measurement. New Developments in Research –
6th Annual Conference of the Market Research Society. London: Market Research
Society, pp. 45-63.
Krosnick JA, Alwin DF. 1987. An evaluation of a cognitive theory of response order effects
in survey measurement. Public Opinion Quarterly, 51(2), 201-219.
Lehmann DR, Hulbert J. 1972. Are three-point scales always good enough? Journal of
Marketing Research, 9(4), 444-446.
McDonald C. 2000. Tracking advertising and monitoring brands. Admap Monograph No. 6.
Henley-on-Thames, England: Admap Publications.
McFadden D. 1980. Econometric models for probabilistic choice among products. Journal of
Business, 53(3), 513-530.
Menictas C, Wang P, Fine B. 2011. Assessing flat-lining response style bias in online
research. Australasian Journal of Market & Social Research, 19(2), 34-44.
Moran WT. 1985. The circuit of effects in tracking advertising profitability. Journal of
Advertising Research, 25(1), 25-29.
Nunnally JC. 1978. Psychometric Theory, 2nd edn. McGraw-Hill: New York.
Olsson U. 1979. Maximum likelihood estimation of the polychoric correlation coefficient.
Psychometrika, 44(4), 443-460.
Osgood CE, Suci GJ, Tannenbaum P. 1957. The Measurement of Meaning. Urbana, IL:
University of Illinois Press.
Parasuraman A, Zeithaml V, Berry LL. 1988. SERVQUAL: a multiple-item scale for
measuring consumer perceptions of service quality. Journal of Retailing, 64(1), 12-40.
Park JY, Park K, Dubinsky AJ. 2011. Impact of retailer image on private brand attitude: halo
effect and summary construct. Australian Journal of Psychology, 63(3), 173-184.
Rossiter JR. 2002. The C-OAR-SE procedure for scale development in marketing.
International Journal of Research in Marketing, 19(4), 305-335.
Rossiter JR. 2011. Measurement for the Social Sciences: The C-OAR-SE Method and Why it
Must Replace Psychometrics. Springer: New York.
Rossiter JR, Bellman S. 2005. Marketing Communications: Theory and Application. Sydney,
Australia: Pearson.
Rossiter JR, Percy L. 1987. Advertising & Promotion Management. McGraw-Hill: New York.
Rossiter JR, Percy L. 1997. Advertising Communications & Promotion Management, 2nd
edn. McGraw-Hill: New York.
Tellis GJ, Chandrasekaran D. 2010. Extent and impact of response biases in cross-national
survey research. International Journal of Research in Marketing, 27(4), 321-341.
Thurstone LL. 1927. A law of comparative judgment. Psychological Review, 34(4), 273-286.
Viswanathan M, Sudman S, Johnson M. 2004. Maximum versus meaningful discrimination in
scale response: implications for validity of measurement of consumer perceptions
about products. Journal of Business Research, 57(February), 108-125.
Weijters B, Geuens M, Schillewaert N. 2010. The stability of individual response styles.
Psychological Methods, 15(1), 96-110.
Woodruff RB. 1972. Measurement of consumers’ prior brand information. Journal of
Marketing Research, 9(3), 258-263.
[Figure. Panel 1: Unipolar performance attribute, with ‘No’ and ‘Yes’ answers on either side of
the category threshold. Panel 2: Bipolar evaluative attribute measured as two unipolar
attributes (negative and positive), each with ‘Disagree’ and ‘Agree’ answers on either side of
a neutral threshold.]
With 7-point rating measures of belief, two individuals could have identical scores of, say, 5:
Person A 1 2 3 4 [5] 6 7
Person B 1 2 3 4 [5] 6 7
But if their category satisfaction thresholds for the attribute differ, their Forced-Choice
Binary answers would differ. For example:
If Person A’s threshold is 5 or lower, then A’s FC Binary answer is ‘Yes.’
If Person B’s threshold is 6 or higher, then B’s FC Binary answer is ‘No.’
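The threshold logic in this example can be sketched in a few lines of Python. This is only an illustration, assuming that each person holds a private threshold on the same 7-point intensity scale and answers ‘Yes’ when the perceived level reaches that threshold. The function name fc_binary_answer and the particular threshold values (5 for Person A, 6 for Person B) are hypothetical choices consistent with the example, not part of the original study.

```python
def fc_binary_answer(rating: int, threshold: int) -> str:
    """Map a perceived intensity (1-7) to a Forced-Choice Binary answer.

    Assumption: the respondent answers 'Yes' when the perceived level
    meets or exceeds his or her personal category threshold.
    """
    if not (1 <= rating <= 7 and 1 <= threshold <= 7):
        raise ValueError("rating and threshold must lie on the 1-7 scale")
    return "Yes" if rating >= threshold else "No"


# Both people would circle 5 on a 7-point scale ...
person_a = fc_binary_answer(rating=5, threshold=5)  # threshold is 5 or lower
person_b = fc_binary_answer(rating=5, threshold=6)  # threshold is 6 or higher

# ... yet their binary answers differ.
print(person_a, person_b)  # Yes No
```

The identical rating of 5 hides the difference between the two people, whereas the binary answer reflects each person's own threshold.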
Table 1. Multipoint measures of beliefs are susceptible to all major forms of response set
(response bias) whereas Level-Free Forced-Choice Binary measures prevent them
Belief measure type | Response set: Acquiescence | Extremes | Midpoint
(not at all to maximum) | YES | YES | No (unless the unipolar scale is interpreted as …)
(strongly disagree to strongly agree) | … | … | …
Semantic differential (e.g., dislike to like) | … | … | …
Level-Free FC Binary (yes, no; disagree, agree) | No (empirical tests show no …) | No (no extreme …) | No (no …)
Table 2. Correlations between brand-attribute belief ratings show inflation (compared with
Level-Free Forced-Choice Binary judgments) when there are more scale points
Measure | FC Binary (yes, no) | … | …
Correlations for laundry detergent brand-attribute … (a) | .40 | .74 | .86
Measure | FC Binary (agree, disagree) | … (2-point each side of midpoint) | … (3-point each side of midpoint)
Correlations for fast-food restaurant brand-attribute … (b) | .22 | .26 | .29
a Six brands rated on seven laundry performance attributes by approximately n = 300
respondents per measure
b Five brands rated on five evaluative attributes by approximately n = 200 respondents
per measure
Table 3. Absolute belief intensity ratings are “fuzzy,” as indicated by their much lower
test-retest stability compared with Level-Free Forced-Choice Binary belief judgments
Exact stability proportion | … | FC Binary
Laundry detergent performance beliefs (a) | .44 | .82
Fast-food restaurant evaluative beliefs (b) | .46 | .85
Chance stability proportion | (.02) | (.25)
a Same data set as in previous table
b Same data set as in previous table
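As a rough illustration of the quantity reported in Table 3, the sketch below computes an exact stability proportion, assuming the term means the share of brand-attribute judgments that a respondent answers identically at test and at retest. That definition, the function name exact_stability, and the small data sets are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def exact_stability(test: Sequence, retest: Sequence) -> float:
    """Share of judgments answered identically at test and retest.

    Assumption: 'exact' means the same scale point (or the same binary
    answer) was chosen on both occasions.
    """
    if len(test) != len(retest) or len(test) == 0:
        raise ValueError("test and retest must be non-empty and equally long")
    matches = sum(1 for first, second in zip(test, retest) if first == second)
    return matches / len(test)


# Made-up answers from one respondent, for illustration only.
binary_test = ["Yes", "No", "Yes", "Yes", "No"]
binary_retest = ["Yes", "No", "No", "Yes", "No"]

seven_point_test = [5, 3, 6, 2, 7]
seven_point_retest = [5, 4, 6, 3, 6]

print(exact_stability(binary_test, binary_retest))            # 0.8
print(exact_stability(seven_point_test, seven_point_retest))  # 0.4
```

With more scale points there are more ways for a retest answer to miss the original answer by a small amount, which is one way of reading the lower stability of intensity ratings in the table.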