Article

Estimating Survey Fatigue in Time Use Study

Authors:

Abstract

Efficient study of time use should balance the level of detail required from respondents (the granularity of time units, the number of activities reported in each time unit, and the level of detail in which each activity is described) against the burden respondents bear in answering the survey (the number of questions asked and the time it takes to complete the survey). Filling out a time diary can be a tedious and time-consuming chore, and data quality typically deteriorates as respondents progress through the diary. To overcome some of these disadvantages, we use an innovative sampled-hours time-diary methodology, in which respondents list chronologically all activities they performed during six sampled hours of the previous day. Compared to other time-diary methods, the sampled-hours method minimizes respondent burden. In addition, online administration enables us to provide respondents with memory recall assistance, such as a checklist of possible activities and a cumulative activity list for the day. Such memory cues cannot be provided in phone surveys and may be cumbersomely long in printed surveys administered by mail. We estimate the improved accuracy of this time-use survey method by randomly assigning each respondent an additional (seventh) hour. The effects of fatigue on survey response are estimated by comparing the answers of respondents who described a certain hour as an "early" hour (e.g., the 4th hour in the survey) with the answers of respondents who described their activities during that same hour as a "late" one (e.g., the 5th hour in the survey). We use several criteria to gauge how survey fatigue affects reporting, such as the number of activities reported during the hour and the tendency to avoid reporting activities that call for follow-up questions.
We find that respondents' survey fatigue at late stages of the survey, which results in under-reporting of activities, is significant both statistically and substantively. These findings have implications for the optimal design of time use surveys: the number of hours respondents are asked about has to be very limited in order to maintain a reasonable level of accuracy in the responses.

The Stanford Institute for the Quantitative Study of Society (SIQSS) is a multi-disciplinary independent research center at Stanford University devoted to the pursuit and sponsorship of high-quality empirical social science research about the nature of society and social change.
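The early-versus-late comparison described in the abstract can be sketched as a difference in means for a single clock hour. This is a minimal illustration, not the authors' estimator: it assumes a hypothetical flat record layout (respondent id, clock hour, position of that hour in the survey, number of activities reported) and ignores any controls the study may use. Because each hour's position is randomly assigned, a negative difference suggests under-reporting when the same hour is asked about later.

```python
from statistics import mean

# Hypothetical record layout (an assumption, not the study's data format):
# (respondent_id, clock_hour, position_in_survey, n_activities)


def position_effect(records, clock_hour, early_pos, late_pos):
    """Mean activities reported for `clock_hour` at `late_pos` minus the mean at `early_pos`."""
    early = [n for _, h, p, n in records if h == clock_hour and p == early_pos]
    late = [n for _, h, p, n in records if h == clock_hour and p == late_pos]
    if not early or not late:
        raise ValueError("need respondents who saw the hour at both positions")
    return mean(late) - mean(early)


# Illustrative numbers only.
records = [
    (1, 9, 4, 3), (2, 9, 4, 4),  # 9 a.m. asked as the 4th hour
    (3, 9, 5, 2), (4, 9, 5, 3),  # 9 a.m. asked as the 5th hour
    (5, 14, 4, 5),
]
print(position_effect(records, clock_hour=9, early_pos=4, late_pos=5))  # -1.0
```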

... Ultimately, survey fatigue may even prompt respondents to drop out of the survey entirely. As a result, survey response fatigue may be a threat to panel surveys and to surveys that use diaries to gather information [23,26]; these effects can influence the validity of responses and affect statistical analysis. ...
Preprint
Background: During the COVID-19 pandemic, the CoMix study, a longitudinal behavioral survey, was designed to monitor social contacts and public awareness in multiple countries, including Belgium. As a longitudinal survey, it is vulnerable to participants’ “survey fatigue”, which may impact inferences. Methods: A negative binomial generalized additive model for location, scale, and shape (NBI GAMLSS) was adopted to estimate the number of contacts reported between age groups and to deal with under-reporting due to fatigue within the study. The dropout process was analyzed with first-order auto-regressive logistic regression to identify factors that influence dropout. Using the so-called next generation principle, we calculated the effect of under-reporting due to fatigue on estimating the reproduction number. Results: Fewer contacts were reported as people participated longer in the survey, which suggests under-reporting due to survey fatigue. Participant dropout is significantly affected by household size and age categories, but not significantly affected by the number of contacts reported in either of the two latest waves. This indicates covariate-dependent missing completely at random (MCAR) in the dropout pattern, when missing at random (MAR) is the alternative. However, we cannot rule out more complex mechanisms such as missing not at random (MNAR). Moreover, under-reporting due to fatigue is found to be consistent over time and implies a 15-30% reduction in both the number of contacts and the reproduction number (R0) ratio between correcting and not correcting for under-reporting. Lastly, we found that correcting for fatigue did not change the pattern of relative incidence between age groups, even when considering age-specific heterogeneity in susceptibility and infectivity.
Conclusions: CoMix data highlights the variability of contact patterns across age groups and time, revealing the mechanisms governing the spread/transmission of COVID-19/airborne diseases in the population. Although such longitudinal contact surveys are prone to bias, participant fatigue, and drop-out, we showed that these factors can be identified and corrected using NBI GAMLSS. This information can be used to improve the design of similar, future surveys.
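The point that under-reporting lowers the estimated reproduction number can be illustrated with the next generation principle in its simplest form. This sketch assumes R0 = q * rho(C), where C is an age-by-age contact matrix, rho(C) is its dominant eigenvalue and q is a hypothetical transmission probability; it omits the age-specific susceptibility and infectivity the study also considered and is not the CoMix/NBI GAMLSS pipeline. Because rho(cC) = c * rho(C), inflating every reported contact by the same correction factor c scales R0 by exactly that factor; a correction that differs by age group would not.

```python
def spectral_radius(m, iters=1000, tol=1e-12):
    """Dominant eigenvalue of a positive square matrix via power iteration."""
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    new_lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(abs(x) for x in w)  # max-norm of m @ v
        v = [x / new_lam for x in w]      # renormalize so max(v) == 1
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam


def r0(contacts, q):
    """R0 = q * rho(C) under homogeneous susceptibility and infectivity (assumption)."""
    return q * spectral_radius(contacts)


# Hypothetical two-age-group matrix of mean daily contacts.
reported = [[2.0, 1.0], [1.0, 2.0]]
c = 1.25  # hypothetical correction factor for under-reported contacts
corrected = [[c * x for x in row] for row in reported]
print(r0(corrected, q=0.05) / r0(reported, q=0.05))  # approximately 1.25
```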
... Like us, they find large effects: a 15-minute increase in survey time before the module leads to an 8-17% decline in reported dietary diversity. Finally, in a similar but different design and context, Backor et al. (2007) conduct a web-based time-use survey in the US in which an extra question is included at a random position, creating variation in how many hours had already been asked about when a particular question appeared in the survey. Similar to these other papers, the authors find that an additional hour lowers the number of activities reported in each subsequent hour by 5 percentage points. ...
... However, the burden of that many questions increases costs to researchers and may increase respondent fatigue (e.g. Backor, Golde, and Nie 2007; Peytchev and Peytcheva 2017). A less arduous method able to evaluate a respondent's environmental behavior and beliefs would thus make scientific efforts to mitigate climate change more efficient. ...
Article
Various measures have been proposed and validated to assess environmental motivation and explain peoples’ consumer behavior. However, most of the measures are rather complex, sometimes comprising dozens of items. In order to overcome the associated response burden, the goal of our research is to validate a much simpler measure of environmental motivation, namely the measure of Climate Change-Stage of Change. To do so we analyze data from a discrete choice experiment in which drivers decide to purchase a car with different levels of CO2 emissions and we also measure their environmental motivation with three alternative measures. The results show that environmental motivation assessed with Climate Change-Stage of Change explains the choices in the experiment as well as with more complex measures. Our findings have substantial implications for researchers as they may be able to assess climate-relevant motivation – a significant factor for many consumer choices – with a single question.
... Third, in steps 2-4, we used a three-round Delphi process (n=25) to validate and iterate the model before consolidating the collected information into a final step. ... minimizing survey fatigue [5]. Experts were randomly assigned to one bucket in the first round and then rotated in the subsequent rounds to collect a broad set of opinions. ...
Conference Paper
Cryptocurrencies have gained popularity in recent years. However, for many users, keeping ownership of their cryptocurrency is a complex task. News reports frequently bear witness to scams, hacked exchanges, and fortunes beyond retrieval. However, we lack a systematic understanding of user-centered cryptocurrency threats, as causes leading to loss are scattered across publications. To address this gap, we conducted a focus group (n=6) and an expert elicitation study (n=25) following a three-round Delphi process with a heterogeneous group of blockchain and security experts from academia and industry. We contribute the first systematic overview of threats cryptocurrency users are exposed to and propose six overarching categories. Our work is complemented by a discussion on how the human-computer-interaction community can address these threats and how practitioners can use the model to understand situations in which users might find themselves under the pressure of an attack to ultimately engineer more secure systems.
... Weisberg, 2005, p. 129). Measurement error is difficult to quantify, yet two types of evidence support a link between survey length and measurement error: (1) when faced with a long interview, respondents become increasingly likely to say "no" to questions that allow them to skip out of additional questions, both during the course of the interview and across reinterviews (Biemer, 2000; Shields & To, 2005; Silberstein & Jacobs, 1989), and in self-administered surveys (Backor, Golde, & Nie, 2007; Peytchev, Couper, McCabe, & Crawford, 2006); and (2) the predictive ability of the questions decreases as survey length increases, apparently because of increased random measurement error (Peytchev, 2007). Less direct evidence is based on indicators that can only be asserted to be related to measurement error, such as finding that questions placed toward the end of a self-administered survey receive shorter answers (Galesic & Bosnjak, 2009), faster responding and less variability across questions (Galesic & Bosnjak, 2009; Peytchev, 2007), and extreme straight-lining across items, i.e., selecting the same response option (Herzog & Bachman, 1981). ... (Contact information: Andy Peytchev, RTI International, 3040 Cornwallis Rd., Research Triangle Park, NC 27709; e-mail: andrey@umich.edu)
Article
Long survey instruments can be taxing to respondents, which may result in greater measurement error. There is little empirical evidence on the relationship between length and measurement error, possibly leading to longer surveys than desirable. At least equally important is the need for methods to reduce survey length while meeting the survey’s objectives. This study tests the ability to reduce measurement error related to survey length through split questionnaire design, in which the survey is modularized and respondents are randomly assigned to receive subsets of the survey modules. The omitted questions are then multiply imputed for all respondents. The imputation variance, however, may overwhelm any benefits to survey estimates from the reduction of survey length. We use an experimental design to further evaluate the effect of survey length on measurement error and to examine the degree to which a split questionnaire design can yield estimates with less measurement error. We found strong evidence for greater measurement error when the questions were asked late in the survey. We also found that a split questionnaire design retained lower measurement error without compromising total error from the additional imputation variance. This is the first study with an experimental design used to evaluate split questionnaire design, demonstrating substantial benefits in reduction of measurement error. Future experimental designs are needed to empirically evaluate the approach’s ability to reduce nonresponse bias.
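The split questionnaire design described above can be sketched as a random assignment of modules to respondents. This minimal example assumes one core module that everyone answers plus k randomly drawn split modules per respondent; the module names, k and the seed are hypothetical, and the subsequent multiple imputation of omitted questions is not shown.

```python
import random


def assign_modules(respondent_ids, core, split_modules, k, seed=None):
    """Map each respondent to the core module plus k distinct, randomly drawn split modules."""
    rng = random.Random(seed)  # seeded for a reproducible assignment
    return {
        rid: [core] + sorted(rng.sample(split_modules, k))
        for rid in respondent_ids
    }


# Hypothetical modules: each respondent answers the core plus 2 of 4 split modules.
plan = assign_modules(range(1, 4), "core", ["A", "B", "C", "D"], k=2, seed=7)
for rid, modules in plan.items():
    print(rid, modules)
```

Drawing modules independently per respondent keeps every pair of split modules jointly observed for some respondents, which is what later makes imputing the omitted modules feasible.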
... Non-differentiation occurs when a participant is asked to answer a series of questions using the same set of closed-ended answer choices (e.g., a rating scale from "strongly agree" to "strongly disagree") and provides highly similar responses across items without putting much thought into answering, rather than thinking carefully and answering the different questions differently. The theory of satisficing predicts that reduced participant motivation should be especially likely to yield increased satisficing as participant fatigue grows toward the end of a long questionnaire, as evidenced by numerous studies (Backor, Golde, & Nie, 2007; Herzog and Bachman, 1981; Johnson, Sieveking, & Clanton, 1974; Kraut et al., 1975). Therefore, if complete anonymity reduces participant motivation to provide accurate reports, then complete anonymity may yield more evidence of satisficing at the end of a questionnaire than at the beginning. ...
Article
Studies have shown that allowing people to answer questionnaires completely anonymously yields more reports of socially inappropriate attitudes, beliefs, and behaviors, and researchers have often assumed that this is evidence of increased honesty. But such evidence does not demonstrate that reports gathered under completely anonymous conditions are more accurate. Although complete anonymity may decrease a person's motivation to distort reports in socially desirable directions, complete anonymity may also decrease accountability, thereby decreasing motivation to answer thoughtfully and precisely. Three studies reported in this paper demonstrate that allowing college student participants to answer questions completely anonymously sometimes increased reports of socially undesirable attributes, but consistently reduced reporting accuracy and increased survey satisficing. These studies suggest that complete anonymity may compromise measurement accuracy rather than improve it.
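Non-differentiation, as described in the snippet above, is often measured per respondent on a grid of items that share one rating scale. The sketch below assumes two simple indicators: a straight-lining flag when every answer is identical, and the within-respondent standard deviation of answers (lower means less differentiation). These are illustrative choices, not the measures used in the cited studies; comparing them for the same grid placed early versus late in a questionnaire would mirror the fatigue designs discussed here.

```python
from collections import Counter
from statistics import pstdev


def nondifferentiation(responses):
    """Return (straight_lined, modal_share, sd) for one respondent's grid answers.

    responses: numeric codes on a shared scale, e.g. 1 = strongly disagree ... 5 = strongly agree.
    """
    if not responses:
        raise ValueError("no responses")
    modal_count = Counter(responses).most_common(1)[0][1]
    modal_share = modal_count / len(responses)  # share of items given the most common answer
    return modal_share == 1.0, modal_share, pstdev(responses)


print(nondifferentiation([3, 3, 3, 3]))  # (True, 1.0, 0.0)
print(nondifferentiation([1, 2, 2, 5]))  # (False, 0.5, 1.5)
```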
... This possibility has been examined in a variety of experiments assessing the impact on data quality of earlier versus later item placement. Consistent with expectations about fatigue and satisficing, several studies have found higher missing data levels, greater agreement, less detailed answers, or less differentiation among items when they appear later in a questionnaire compared to the same items placed earlier (Johnson et al., 1974; Kraut et al., 1975; Herzog and Bachman, 1981; Backor, Golde, and Nie, 2007). Most of the studies reporting such effects involved self-administered questionnaires. ...
... Some point out that longer periods of observation cause fatigue or diminished motivation (Szalai, 1972; Brög and Meyburg, 1980; Axhausen et al., 2002; Backor et al., 2007). Increasing the number of diary days will therefore lead to more inaccuracies. ...
Article
The article examines methodological and substantive problems faced by Canadian time use research. It assesses the gains and limitations of this research from historical and comparative perspectives.
... Some point out that longer periods of observation cause fatigue or diminished motivation (Szalai, 1972; Brög and Meyburg, 1980; Axhausen et al., 2002; Backor et al., 2007). Increasing the number of diary days will therefore lead to more inaccuracies. ...
Article
Time budget studies differ in the number of diary days. The ‘Guidelines on Harmonized European Time-Use Surveys (HETUS)’ issued by EUROSTAT recommend a two-day diary with one weekday and one weekend day. In this contribution we examine whether the number of diary days affects the quality of time-use indicators. Many time-use researchers argue for a longer period of observation; some even argue that one- or two-day diaries are of little value, since the high demands of scientific research cannot be met unless multi-day cycles are captured. Longer periods of observation offer better prospects for analysis, especially for the study of rhythms and activity patterns that typically follow multi-day cycles and are part of daily life. Other authors, however, point out that longer periods of observation cause fatigue or diminished motivation and thus lead to more inaccuracies. In this contribution we use the pooled Flemish time budget data from 1999 and 2004 to compare 7-day diaries with the 2-day diaries recommended by the EUROSTAT guidelines. The respondents of the Flemish time use surveys all filled in diaries for 7 consecutive days. To simulate the 2-day registration, we randomly selected one weekday and one weekend day for each respondent. The 2-day selection was compared with the original 7-day registration. The aim of this comparison is to inventory the advantages and disadvantages of the 2-day and 7-day registration methods. To do so, we compare different indicators, such as the means and standard deviations of the duration of several activities. We further examine whether certain types of activities are more affected by the registration method than others. Finally, we examine whether a longer registration period negatively affects data quality (less detailed and less accurate data).
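The 2-day simulation described above (one randomly chosen weekday and one randomly chosen weekend day per respondent, taken from a 7-day diary) can be sketched as follows. This assumes each respondent's diary is keyed by calendar date with the minutes spent on one activity; the data are hypothetical, and weighting the weekday 5/7 and the weekend day 2/7 to estimate a daily mean is an illustrative assumption, not necessarily the authors' indicator.

```python
import random
from datetime import date, timedelta


def simulate_two_day(diary, rng):
    """diary: {date: minutes of an activity} for 7 consecutive days.

    Returns the sub-diary for one random weekday and one random weekend day.
    """
    weekdays = [d for d in diary if d.weekday() < 5]  # Monday=0 ... Friday=4
    weekend = [d for d in diary if d.weekday() >= 5]  # Saturday=5, Sunday=6
    picked = (rng.choice(weekdays), rng.choice(weekend))
    return {d: diary[d] for d in picked}


def weekly_mean_two_day(two_day):
    """Daily mean estimated by weighting the weekday 5/7 and the weekend day 2/7 (assumption)."""
    wd = next(v for d, v in two_day.items() if d.weekday() < 5)
    we = next(v for d, v in two_day.items() if d.weekday() >= 5)
    return (5 * wd + 2 * we) / 7


# Hypothetical 7-day diary of daily TV minutes, starting Monday 1 March 2004.
start = date(2004, 3, 1)
minutes = [90, 60, 75, 80, 120, 200, 180]
diary = {start + timedelta(days=i): m for i, m in enumerate(minutes)}
two_day = simulate_two_day(diary, random.Random(1))
print("7-day mean:", sum(diary.values()) / 7)
print("2-day estimate:", weekly_mean_two_day(two_day))
```

Repeating the draw over all respondents and comparing the two sets of means and standard deviations corresponds to the comparison the abstract describes.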
Article
Ian Cullen and his research colleagues long ago suggested that people form habits in daily life that suboptimize behavior in view of constraints. Such rational suboptimization is posited here to apply to trips between home and work and to vary by time of day. Previous research suggests that afternoons prove more difficult for people than mornings, with rush-hour traffic patterns shown as one aspect. Using episode-level data from Statistics Canada’s 2005 time-use survey, this paper contrasts the morning and afternoon temporal patterns (shown as a “travel pulse”) of weekday commutes between home and job by full-time workers with external workplaces. The mean trip duration in the morning is less than in the afternoon, as is its standard deviation. This is rooted in a visibly greater dispersion of rational starting times from home in the morning, with arrival at work at various times in advance of the start of the formal work day, while in the afternoon people typically depart from work directly at externally determined closing times and in concentrated peaks. The result is that nearly twice as many commuters set out at the same time in the afternoon as in the morning. The less than individually rational intensity of the afternoon commuting context is compounded by the concentration of everyday shopping stops during the afternoon commute. Mode of travel accounts for significantly different mean trip times, but differences in trip duration by time of day transcend travel mode. Differences by gender interact with mode of travel but are not generally significant. The rich legacy established by Andrew Harvey is apparent, as he has been an influential shaper and advocate of Statistics Canada’s time-use surveys, the use of such data for transportation analyses, and a focus on episode-level analysis.
Article
In the 1980s, Harvey originated the key concept for representing multiple simultaneous activities without violating the constraint of the 24-hour day: the "hypercode". This paper implements his conceptual innovation in the context of childcare and suggests a means of graphical representation.
Article
Media and other accounts of life after retirement portray it as “The Golden Years” of life, when the elderly have true leisure in the classic sense of freedom from the responsibilities of work. However, as in earlier time-diary studies, data from the 2003-07 American Time Use Survey (ATUS) indicate that the great majority of seniors’ extra 20+ hours of free time is concentrated on three activities: TV, reading and rest. Only a few more hours are spent on sleep. Despite reports of increased work time among seniors, relatively few of those in Andy’s new age bracket remain in the labor force, and they work fewer hours.
Article
Living standards measurement surveys require sustained attention for several hours. We quantify survey fatigue by randomizing the order of questions in 2-3 hour-long in-person surveys. An additional hour of survey time increases the probability that a respondent skips a question by 10%–64%. Because skips are more common, the total monetary value of aggregated categories such as assets or expenditures declines as the survey goes on, and this effect is sizeable for some categories: for example, an extra hour of survey time lowers food expenditures by 25%. We find similar effect sizes within phone surveys in which respondents were already familiar with questions, suggesting that cognitive burden may be a key driver of survey fatigue.
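Because question order is randomized in this design, the survey time elapsed before a question is as good as randomly assigned, so a linear probability model of skipping on elapsed hours gives a rough fatigue estimate. The sketch below is a minimal OLS version under that assumption, with hypothetical data; it omits any controls or fixed effects the authors may use, and expressing the effect relative to the hour-zero baseline (slope divided by intercept) is only one illustrative reading of a "percent increase in skip probability".

```python
def skip_fatigue(hours_elapsed, skipped):
    """OLS fit of skipped (0/1) on hours of survey time elapsed before the question.

    Returns (intercept, slope, relative_effect), where relative_effect is the
    change in skip probability per extra hour relative to the hour-zero baseline.
    """
    n = len(hours_elapsed)
    mx = sum(hours_elapsed) / n
    my = sum(skipped) / n
    sxx = sum((x - mx) ** 2 for x in hours_elapsed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(hours_elapsed, skipped))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, slope / intercept


# Hypothetical question-level data: 1 = question skipped.
hours = [0, 0, 0, 0, 1, 1, 1, 1]
skips = [0, 0, 0, 1, 0, 0, 1, 1]
print(skip_fatigue(hours, skips))  # (0.25, 0.25, 1.0)
```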
Article
With the ongoing need for water conservation, the American Southwest has worked to increase harvested rainwater efforts to meet municipal needs. Concomitantly, environmental pollution is prevalent, leading to concerns regarding the quality of harvested rainwater. Project Harvest, a co-created community science project, was initiated with communities that neighbor sources of pollution. To better understand how a participant’s socio-demographic factors affect home characteristics and rainwater harvesting infrastructure, pinpoint gardening practices, and determine participant perception of environmental pollution, a 145-question “Home Description Survey” was administered to Project Harvest participants (n = 167) by project promotoras (community health workers). Race/ethnicity and community were significantly associated (p < 0.05) with participant responses regarding proximity to potential sources of pollution, roof material, water harvesting device material, harvesting device capacity, harvesting device age, garden amendments, supplemental irrigation, and previous contaminant testing. Further, the study has illuminated the idiosyncratic differences in how underserved communities perceive environmental pollution and historical past land uses in their community. We propose that the collection of such data will inform the field on how to tailor environmental monitoring efforts and results for constituent use, how community members may alter activities to reduce environmental hazard exposure, and how future studies can be designed to meet the needs of environmentally disadvantaged communities.
Article
Contingent valuation (CV) methods are used in many contexts to estimate non-tangible costs, despite some indications that they may not be reliable. In criminal justice, CV has been used to generate “costs of crime” for street, violent, and white-collar crimes. This article explores respondent fatigue using both quantitative and qualitative indicators from an open-ended CV survey where respondents were asked how much they would be willing to pay to reduce certain crimes. Our findings reveal that willingness to pay (WTP) to reduce crime increases when both problematic response patterns and fatigue effects are accounted for in the calculation, indicating that fatigued respondents who also engage in straight-lining are driving the WTP estimates down. We conclude by discussing the implications of our results for policymakers and other consumers of CV studies.
Article
Administrative data sources are increasingly used by National Statistical Institutes to compile statistics. These sources may be based on decentralised autonomous administrations, for instance municipalities that deliver data on their inhabitants. One issue that may arise when using these decentralised administrative data is that categorical variables are underreported by some of the data suppliers, for instance to avoid administrative burden. Under certain conditions overreporting may also occur. When statistical output on changes is estimated from decentralised administrative data, the question may arise whether those changes are affected by shifts in reporting frequencies. For instance, in a case study on hospital data, the values from certain data suppliers may have been affected by changes in reporting frequencies. We present an automatic procedure to detect suspicious data suppliers in decentralised administrative data in which shifts in reporting behaviour are likely to have affected the estimated output. The procedure is based on a predictive mean matching approach, where part of the original data values are replaced by imputed values obtained from a selected reference group. The method is successfully applied to a case study with administrative hospital data.
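The predictive mean matching step mentioned above can be sketched with a single covariate. This minimal version assumes a simple linear regression fitted on a reference (donor) group; each recipient receives the observed value of a donor drawn at random from the k donors whose predicted values are closest to the recipient's own prediction. The covariate, data and k are hypothetical, and how the paper selects the reference group and flags suspicious data suppliers is not described here, so that part is not shown.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for one covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def pmm_impute(donor_x, donor_y, recipient_x, k=3, rng=None):
    """Impute each recipient with an observed donor value via predictive mean matching."""
    rng = rng or random.Random()
    a, b = fit_line(donor_x, donor_y)
    donor_pred = [a + b * x for x in donor_x]
    imputed = []
    for x in recipient_x:
        pred = a + b * x
        # indices of the k donors whose predicted values are closest to this recipient's
        nearest = sorted(range(len(donor_x)), key=lambda i: abs(donor_pred[i] - pred))[:k]
        imputed.append(donor_y[rng.choice(nearest)])  # always an actually observed value
    return imputed


# Hypothetical reference group: a size covariate and reported category counts.
donor_x = [1, 2, 3, 4, 5, 6]
donor_y = [12, 19, 33, 38, 52, 61]
print(pmm_impute(donor_x, donor_y, [2.5, 5.5], k=2, rng=random.Random(0)))
```

Imputing observed donor values, rather than regression predictions, keeps imputed data within the range and granularity of the reference group.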
Conference Paper
Classroom response devices, such as clickers, have proved effective in improving student engagement during class time. We performed a study to investigate how much of this improvement was due to heightened accountability, either because students were required to take and pass a pre-quiz over the lecture material, or because students were given credit for each answer submitted. We found that the presence of a pre-quiz was associated with a much higher response rate, 38.5% vs. 29.3%. Giving credit for answering questions also boosted the response rate, from 30.3% to 43.2%. We also found that asking more questions during class was associated with a lower response rate. When only one question was asked, the response rate was above 60%, but if more than five questions were asked, the response rate was barely 30%. These findings suggest that accountability is important in making effective use of classroom response devices.
First, we would like to know about all the things you did between [START time] and [END time] YESTERDAY. What was the MAIN thing you were doing at [START time] YESTERDAY? People often do several activities at the same time, but please select what you consider to be your PRIMARY activity.