Article

Re-Examining the Middle Means Typical and the Left and Top Means First Heuristics Using Eye-Tracking Methodology


Abstract

Web surveys are a common self-administered mode of data collection that uses written language to convey information. This language is usually accompanied by visual design elements, such as numbers, symbols, and graphics. As previous research has shown, such elements of survey questions can affect response behavior because respondents sometimes use interpretive heuristics, such as the “middle means typical” and the “left and top means first” heuristics, when answering survey questions. In this study, we adopted the designs and survey questions of two experiments reported in Tourangeau, Couper, and Conrad (2004). One experiment varied the position of nonsubstantive response options in relation to the substantive response options, and the second varied the order of the response options. We implemented both experiments in an eye-tracking study. By recording respondents’ eye movements, we can observe how they read question stems and response options and draw conclusions about the survey response process the questions initiate. This enables us to investigate the mechanisms underlying the two interpretive heuristics and to test the assumptions of Tourangeau et al. (2004) about the ways in which interpretive heuristics influence survey responding. The eye-tracking data reveal mixed results for the two heuristics. For the middle means typical heuristic, it remains somewhat unclear whether respondents seize on the conceptual or the visual midpoint of a response scale when answering survey questions. For the left and top means first heuristic, we found that violations of the heuristic increase response effort in terms of eye fixations. These results are discussed in the context of the findings of the original studies.
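
To make the fixation-based effort measure concrete, the following minimal sketch shows how such a measure could be computed from standard eye-tracking output; the file name, column names (respondent, condition, aoi, duration_ms), and AOI labels are hypothetical assumptions for illustration, not details taken from the study.

    # Sketch: response effort as fixation counts and dwell time per
    # area of interest (AOI); all names here are hypothetical.
    import pandas as pd

    fix = pd.read_csv("fixations.csv")  # one row per recorded fixation

    effort = (
        fix.groupby(["respondent", "condition", "aoi"])
           .agg(n_fixations=("duration_ms", "size"),
                dwell_ms=("duration_ms", "sum"))
           .reset_index()
    )

    # Mean number of fixations on the response options per condition,
    # e.g., heuristic-consistent vs. heuristic-violating option order.
    options = effort[effort["aoi"] == "response_options"]
    print(options.groupby("condition")["n_fixations"].mean())

Under such a setup, more fixations and longer dwell time on the response options in the heuristic-violating condition would correspond to the increased response effort reported above.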


Article
The measurement of respondents’ attitudes is key in social science research and many adjacent research fields. A common method of measuring this information is Likert-type questions, which consist of a statement that is evaluated on a rating scale. As previous research has shown, the scale design of Likert-type questions can have a profound impact on respondents’ answer behavior. In this study, we therefore investigate the measurement properties of scales that systematically vary with respect to polarity (i.e., unipolar and bipolar) and labeling (i.e., complete and end labeling). We conducted a survey experiment in a probability-based online panel (N = 4851) and used questions on income (in)equality adopted from the European Social Survey (ESS). The results reveal considerable differences between the scales under investigation. They show that end-labeled unipolar and bipolar scales best meet the criterion of equidistance, whereas completely labeled bipolar scales perform poorly in this respect; completely labeled unipolar scales fall somewhere in between. Overall, our findings suggest that researchers should be careful when using survey data measured with (slightly) different scales because the results might not be comparable.
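
As a point of reference, the equidistance criterion discussed here can be stated in standard ordered-response notation (a generic formulation, not the authors' exact model): a K-point scale is equidistant when the latent thresholds \tau_1 < \dots < \tau_{K-1} separating adjacent response categories are equally spaced,

    \tau_{k+1} - \tau_k = c \qquad \text{for all } k = 1, \dots, K-2,

so the deviations (\tau_{k+1} - \tau_k) - c quantify how far a given scale design departs from equidistance.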
Article
In social science research, unipolar and bipolar scales are commonly used to measure respondents’ attitudes and opinions. Compared to other rating scale characteristics, scale polarity (unipolar vs. bipolar) and its effects on response behavior have rarely been addressed in previous research. To fill this gap in the literature, we investigate whether and to what extent fully verbalized unipolar and bipolar scales influence response behavior by analyzing observed and latent response distributions and the latent thresholds of response categories. For this purpose, we conducted a survey experiment in a probability-based online panel and randomly assigned respondents to a unipolar or a bipolar scale condition. The results reveal substantial differences between the two rating scales: significantly different response distributions and measurement non-invariance. In addition, the response categories (and latent thresholds) of unipolar and bipolar scales are not equally distributed. The findings show that responses to unipolar and bipolar scales differ not only at the observational level but also at the latent level. Both rating scales vary with respect to their measurement properties, so the responses obtained using each scale are not easily comparable. We recommend not treating unipolar and bipolar scales as interchangeable.
Article
Web surveys are an established data collection mode that uses written language to provide information. The written language is accompanied by visual elements, such as presentation formats and shapes. Research has shown that these visual elements influence response behavior because respondents sometimes use interpretive heuristics to make sense of them. One such heuristic is the ‘left and top means first’ (LTMF) heuristic, which suggests that respondents tend to believe that a response scale consistently runs from left to right or from top to bottom. We conducted a web survey experiment to investigate how violations of the LTMF heuristic affect response behavior and data quality. For this purpose, a random half of the respondents received response options in a consistent order and the other half received response options in an inconsistent order. The results reveal significantly different response distributions between the two groups. We also found that inconsistently ordered response options significantly increase response times and decrease data quality in terms of criterion validity. We therefore recommend using response options ordered consistently with the LTMF heuristic.
Chapter
One of the most extensively discussed (though not most extensively researched, see Converse, 1984) issues in the literature on survey methodology is the choice between an open and a closed response format. Researchers are usually advised to use open-ended questions sparingly because they are time consuming, expensive, and difficult to analyze (e.g., Sheatsley, 1985; Sudman & Bradburn, 1983). In fact, “despite a few exceptions, the results of social surveys today are based mainly on what are varyingly called closed, fixed-choice, or precoded questions” (Schuman & Presser, 1981, p. 79). According to textbook recommendations, the construction of precoded questions should be based on the responses to open-ended questions obtained during pilot studies.
Article
Response order effects are a well-known phenomenon that can occur when answering survey questions with multiple response categories. Although various theoretical explanations exist, the empirical evidence is contradictory. Moreover, different scale types produce different effect sizes. In the current study, we investigate the occurrence and causes of response order effects in horizontal and vertical rating scales by means of eye tracking. We conducted an experiment (n = 84) with two groups and varied the scale direction so that the response scales either ran from agree to disagree or vice versa. The results indicate that response order effects in rating scales are relatively small and are more likely to occur in vertical than in horizontal rating scales. Moreover, our eye-tracking data reveal that respondents do not read all categories, nor do they pay equal attention to all categories; these data support the survey satisficing theory of response order effects (Krosnick, 1991).
Article
We present the results of six experiments that demonstrate the impact of visual features of survey questions on the responses they elicit, the response process they initiate, or both. All six experiments were embedded in Web surveys. Experiments 1 and 2 investigate the effects of the placement of nonsubstantive response options (for example, "No opinion" and "Don't know" answer options) in relation to the substantive options. The results suggest that when these options are not differentiated visually (by a line or a space) from the substantive options, respondents may be misled about the midpoint of the scale; respondents seemed to use the visual rather than the conceptual midpoint of the scale as a reference point for responding. Experiment 3, which varied the spacing of the substantive options, showed a similar result. Responses were pushed in the direction of the visual midpoint when it fell to one side of the conceptual midpoint of the response scale. Experiment 4 examined the effects of varying whether the response options, which were arrayed vertically, followed a logical progression from top to bottom. Respondents answered more quickly when the options followed a logical order. Experiment 5 examined the effects of the placement of an unfamiliar item among a series of similar items. For example, one set of items asked respondents to say whether several makes and models of cars were expensive or not. The answers for the unfamiliar items depended on the items that were nearby on the list. Our last experiment varied whether a battery of related items was administered on a single screen, across two screens, or with each item on its own screen. The intercorrelations among the items were highest when they were all on the same screen. Respondents seem to apply interpretive heuristics in assigning meaning to visual cues in questionnaires. They see the visual midpoint of a scale as representing the typical or middle response; they expect options to be arrayed in a progression beginning with the leftmost or topmost item; and they expect items that are physically close to be related to each other conceptually.
Article
This article reports results from 14 experimental comparisons designed to test 7 hypotheses about the effects of two types of nonverbal languages (symbols and graphics) on responses to self-administered questionnaires. The experiments were included in a survey of 1,042 university students. Significant differences were observed for most comparisons, providing support for all seven hypotheses. These results support the view that respondents' answers to questions in self-administered surveys are influenced by more than words. Thus, the visual presentation of questions must be taken into consideration when designing such surveys and, especially, when comparing results across surveys in which the visual presentation of questions is varied.
Article
Survey designers have long assumed that respondents who disagree with a negative question (“This policy is bad.”: Yes or No; 2-point scale) will agree with an equivalent positive question (“This policy is good.”: Yes or No; 2-point scale). However, experimental evidence has proven otherwise: Respondents are more likely to disagree with negative questions than to agree with positive ones. To explain these response effects for contrastive questions, the cognitive processes underlying question answering were examined. Using eye tracking, the authors show that the first reading of the question and the answers takes the same amount of time for contrastive questions. This suggests that the wording effect does not arise in the cognitive stages of question comprehension and attitude retrieval. Rereading a question and its answering options also takes the same amount of time, but happens more often for negative questions. This effect is likely to indicate a mapping difference: Fitting an opinion to the response options is more difficult for negative questions.
Article
We carried out two experiments to investigate how the shading of the options in a response scale affected the answers to the survey questions. The experiments were embedded in two web surveys, and they varied whether the two ends of the scale were represented by shades of the same or different hues. The experiments also varied the numerical labels for the scale points and examined responses to both unipolar scales (assessing frequency) and bipolar scales (assessing favorability). We predicted that the use of different hues would affect how respondents viewed the low end of the scale, making responses to that end seem more extreme than when the two ends were shades of the same hue. This hypothesis was based on the notion that respondents use various interpretive heuristics in assigning meaning to the visual features of survey questions. One such cue is visual similarity. When two options are similar in appearance, respondents will see them as conceptually closer than when they are dissimilar in appearance. The results were generally consistent with this prediction. When the end points of the scale were shaded in different hues, the responses tended to shift toward the high end of the scale, as compared to scales in which both ends of the scale were shaded in the same hue. Though noticeable, this shift was less extreme than the similar shift produced when negative numbers were used to label one end of the scale; moreover, the effect of color was eliminated when each scale point had a verbal label. These findings suggest that respondents have difficulty using scales and pay attention even to incidental features of the response scales in interpreting the scale points.
Article
Survey researchers since Cannell have worried that respondents may take various shortcuts to reduce the effort needed to complete a survey. The evidence for such shortcuts is often indirect. For instance, preferences for earlier versus later response options have been interpreted as evidence that respondents do not read beyond the first few options. This is really only a hypothesis, however, that is not supported by direct evidence regarding the allocation of respondent attention. In the current study, we used a new method to more directly observe what respondents do and do not look at by recording their eye movements while they answered questions in a Web survey. The eye-tracking data indicate that respondents do in fact spend more time looking at the first few options in a list of response options than those at the end of the list; this helps explain their tendency to select the options presented first regardless of their content. In addition, the eye-tracking data reveal that respondents are reluctant to invest effort in reading definitions of survey concepts that are only a mouse click away or paying attention to initially hidden response options. It is clear from the eye-tracking data that some respondents are more prone to these and other cognitive shortcuts than others, providing relatively direct evidence for what had been suspected based on more conventional measures.
Article
In social research, the use of agree/disagree (A/D) questions is a popular method for measuring attitudes. Research has shown that A/D questions require complex cognitive processing and are susceptible to response bias. Thus, some researchers recommend the use of item-specific (IS) questions. This study examines the processing of A/D and IS questions, using eye-tracking methodology. By recording respondents’ eye movements, how respondents process survey questions can be evaluated. The results reveal that IS questions cause more and longer fixations. However, this only applies to the response categories. There are no differences regarding the question stems. Altogether, it seems that IS response categories trigger deeper cognitive processing than A/D response categories.
Article
In empirical social research, using questions with an agreement scale, also known as agree/disagree (A/D) questions, is a popular technique for measuring attitudes and opinions. Methodological considerations, however, suggest that such questions require effortful cognitive processing and are prone to response bias, such as acquiescence. Therefore, many researchers recommend the use of item-specific (IS) questions, which are based on tailored response categories and seem to impose less response burden. In this study, we investigate the cognitive processing of A/D and IS questions in web surveys, using eye-tracking methodology. On the basis of recordings of respondents’ eye movements, we are able to draw conclusions about how respondents process survey questions and to evaluate how they process information. Our results indicate that IS questions require deeper processing than A/D questions. Interestingly, the eye-tracking data reveal that this phenomenon is observable only for the response categories and not for the question stems, indicating that the stems do not differ in terms of cognitive effort. We therefore argue that the observed differences are directly attributable to more intensive processing of the IS response categories. Practically speaking, this indicates more thoughtful processing of the response categories and thus might lead to more well-considered and appropriate responses.
Article
Previous studies show that respondents are generally more likely to disagree with negative survey questions (e.g., This is a bad book. Yes/No) than to agree with positive ones (e.g., This is a good book. Yes/No). In the current research, we related this effect to the cognitive processes underlying question answering. Using eye-tracking, we show that during the initial reading of the question, negative evaluative terms (e.g., bad) require more processing time than their positive counterparts (e.g., good). In addition to these small differences in the initial stages of question answering, large processing differences occur later in the question answering process: negative questions are reread longer and more often than their positive counterparts. This is particularly true when respondents answer no rather than yes to negative questions. Hence, wording effects for contrastive questions probably occur because response categories such as Yes and No do not carry an absolute meaning, but are given meaning relative to the evaluative term in the question (e.g., good/bad). As answering no to negative questions requires more processing effort in particular, a likely explanation for the occurrence of the wording effect is that no answers to a negative question convey a mitigated meaning. The activation of this additional pragmatic meaning causes additional processing effort and also causes respondents to pick a no answer to negative questions relatively easily.
Article
Previous research has shown that check-all-that-apply (CATA) and forced-choice (FC) question formats do not produce comparable results. The cognitive processes underlying respondents’ answers to both formats still require clarification. This study contributes to filling this gap by using eye-tracking data. The two formats are compared by analyzing attention processes and the cognitive effort respondents expend while answering one factual and one opinion question. No difference in cognitive effort was found for the factual question, whereas for the opinion question, respondents invested more cognitive effort in the FC than in the CATA condition. The findings indicate that the higher endorsement in FC questions cannot be explained by question format alone. Other possible causes are discussed.
Article
In this study, we investigated whether incorporating eye tracking into cognitive interviewing is effective when pretesting survey questions. In the control condition, a cognitive interview was conducted using a standardized interview protocol that included pre-defined probing questions for about one-quarter of the questions in a 52-item questionnaire. In the experimental condition, participants’ eye movements were tracked while they completed an online version of the questionnaire. Simultaneously, their reading patterns were monitored for evidence of response problems. Afterward, a cognitive interview was conducted using an interview protocol identical to that in the control condition. We compared both approaches with regard to the number and types of problems they detected. We found support for our hypothesis that cognitive interviewing and eye tracking complement each other effectively. As expected, the hybrid method was more productive in identifying both questionnaire problems and problematic questions than applying cognitive interviewing alone.
Article
This chapter presents analytic methods for matched studies with multiple risk factors of interest. We consider matched sample designs of two types, prospective (cohort or randomized) and retrospective (case-control) studies. We discuss direct and indirect parametric modeling of matched sample data and then focus on conditional logistic regression in matched case-control studies. Next, we describe the general case for matched samples including polytomous outcomes. An illustration of matched sample case-control analysis is presented. A problem solving section appears at the end of the chapter.
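
For reference, the conditional likelihood that this method maximizes in a 1:M matched case-control design is the standard textbook form (not a formula quoted from the chapter):

    L(\beta) = \prod_{i=1}^{n} \frac{\exp(\beta^{\top} x_{i0})}{\sum_{j=0}^{M} \exp(\beta^{\top} x_{ij})},

where x_{i0} denotes the covariates of the case and x_{i1}, \dots, x_{iM} those of the matched controls in set i. Conditioning on each set containing exactly one case eliminates the set-specific intercepts \alpha_i of the underlying model \operatorname{logit} P(y_{ij} = 1) = \alpha_i + \beta^{\top} x_{ij}.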
Book
1. Introduction
2. Respondents' understanding of survey questions
3. The role of memory in survey responding
4. Answering questions about dates and durations
5. Attitude questions
6. Factual judgments and numerical estimates
7. Attitude judgments and context effects
8. Mapping and formatting
9. Survey reporting of sensitive topics
10. Mode of data collection
11. Impact of the application of cognitive models to survey measurement
Article
Effects of the range of response categories provided in a closed answer format on behavioral reports and subsequent judgments were explored. Respondents reported their daily use of television along a scale that ranged either from “up to a half hour” to “more than two and a half hours” or from “up to two and a half hours” to “more than four and a half hours.” The former subjects reported less use of television than the latter and estimated the average use of TV to be lower. Moreover, the former subjects evaluated TV to be more important in their lives (Experiment 1) and reported less satisfaction with the variety of their leisure-time activities (Experiment 2). These results indicate that subjects inferred the average amount of television watching from the response alternatives provided to them and used it as a standard of comparison in evaluating their behavior and its implications.
Article
This research distinguishes conversational norms from conversational conventions and tests the notion that violation of conversational conventions in attitude questions disrupts processing and reduces data quality. Our first study showed that in questions with simple, dichotomous affirmative and negative response alternatives, respondents expect the affirmative response alternative to be offered before the negative one.
Article
Three experiments indicate that the numeric values provided as part of a rating scale may influence respondents' interpretation of the endpoint labels. In experiment 1, a representative sample of German adults rated their success in life along an 11-point rating scale, with the endpoints labeled “not at all successful” and “extremely successful.” When the numeric values ranged from 0 (“not at all successful”) to 10 (“extremely successful”), 34 percent of the respondents endorsed values between 0 and 5. However, only 13 percent endorsed formally equivalent values between −5 and 0, when the scale ranged from −5 (“not at all successful”) to +5 (“extremely successful”). Experiment 2 provided an extended conceptual replication of this finding, and experiment 3 demonstrates that recipients of a respondent's report draw different inferences from formally equivalent but numerically different values. In combination, the findings indicate that respondents use the numeric values to disambiguate the meaning of scale labels, resulting in different interpretations and, accordingly, different subjective scale anchors.
Article
Presents a model of reading comprehension that accounts for the allocation of eye fixations of 14 college students reading scientific passages. The model deals with processing at the level of words, clauses, and text units. Readers made longer pauses at points where processing loads were greater. Greater loads occurred while readers were accessing infrequent words, integrating information from important clauses, and making inferences at the ends of sentences. The model accounts for the gaze duration on each word of text as a function of the involvement of the various levels of processing. The model is embedded in a theoretical framework capable of accommodating the flexibility of reading. (70 ref)
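
Schematically, the model's account of gaze duration as a function of processing load can be written as an additive decomposition; the predictor set below is an illustrative simplification, not the authors' exact equation:

    \text{gaze}(w) = \beta_0 + \beta_1 \bigl(-\log f(w)\bigr) + \beta_2\, I_{\text{clause}}(w) + \beta_3\, I_{\text{sentence}}(w) + \varepsilon,

where f(w) is the word's frequency (infrequent words take longer to access), I_{\text{clause}} marks words at which important clauses are integrated, and I_{\text{sentence}} marks sentence-final words at which inferences are made.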
Effects of Survey Question Comprehensibility on Response Quality
  • Lenzner
Words, Numbers, and Visual Heuristics in Web Surveys: Is There a Hierarchy of Importance?
  • Toepoel
Eye Movements in Reading and Information Processing: 20 Years of Research
  • Rayner