Article

Handle with Care: Implementation of the List Experiment and Crosswise Model in a Large-scale Survey on Academic Misconduct

Abstract

This research analyzes the effectiveness of the list experiment and crosswise model in measuring self-plagiarism and data manipulation. Both methods were implemented in a large-scale survey of academics on social norms and academic misconduct. Because the results lend little confidence in the effectiveness of either method, researchers are best advised to avoid them or, at the very least, to handle them with care.


... In order not to discard these items from our analysis, and considering it is not unrealistic to assume that the prevalence in the population is not exactly 0, we set the z score for these items to −3.5, slightly below the z score of −3.1 for the items with a DQ and CM prevalence estimate of 1%. Additionally, four items with negative CM prevalence estimates (Jerke et al., 2021) were truncated at 0. To account for the nesting of items within studies, we performed a multilevel analysis on the difference scores. To examine the dependence of the d score on the sensitivity of the item, we calculated a proxy for sensitivity as the absolute value of Z_DQ. ...
... Of the 45 included studies, publication years range from 2011 to 2021 (Canan et al., 2021; Jerke et al., 2021; Mieth et al., 2021). After a 3-year hiatus, on average 4-6 papers have been published each year (Figure 3). ...
... Studies originated from Germany (k = 16), Iran (k = 12), the US (k = 4), Switzerland (k = 3), Austria (k = 2), Costa Rica (k = 2), and one study each from Serbia, Turkey, and the UK. There were three international studies with samples from Germany and Switzerland; Germany, Switzerland and the UK (Jerke et al., 2019); and Austria, Germany, and Switzerland (Jerke et al., 2021). ...
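The preprocessing described in the first excerpt above is easy to make concrete. Below is a minimal Python sketch with hypothetical column names and values (not the authors' data): it floors the z scores of zero-prevalence items at −3.5, truncates negative CM estimates at 0, and uses |Z_DQ| as the sensitivity proxy.

```python
import numpy as np
import pandas as pd

# Hypothetical item-level data; column names and values are illustrative only.
items = pd.DataFrame({
    "pi_dq": [0.00, 0.01, 0.20, 0.35],    # DQ prevalence estimates
    "z_dq":  [-np.inf, -3.1, -0.9, 0.4],  # z scores; the 0% item is degenerate
    "pi_cm": [0.05, -0.02, 0.28, 0.40],   # CM prevalence estimates
})

# Items with a prevalence estimate of exactly 0 get z = -3.5, just below the
# -3.1 observed for items with a 1% prevalence estimate.
items.loc[items["pi_dq"] == 0, "z_dq"] = -3.5

# Negative CM prevalence estimates are truncated at 0.
items["pi_cm"] = items["pi_cm"].clip(lower=0)

# Proxy for item sensitivity: the absolute value of Z_DQ.
items["sensitivity"] = items["z_dq"].abs()
```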
Article
Full-text available
Tools for reliable assessment of socially sensitive or transgressive behavior warrant constant development. Among them, the Crosswise Model (CM) has gained considerable attention. We systematically reviewed and meta-analyzed empirical applications of CM and addressed a gap for quality assessment of indirect estimation models. Guided by the PRISMA protocol, we identified 45 empirical studies from electronic database and reference searches. Thirty of these were comparative validation studies (CVS) comparing CM and direct question (DQ) estimates. Six prevalence studies exclusively used CM. One was a qualitative study. Behaviors investigated included substance use and misuse (k = 13), academic misconduct (k = 8), and corruption, tax evasion, and theft (k = 7), among others. The majority of studies (k = 39) applied the "more is better" hypothesis. Thirty-five studies relied on the birthday distribution, and 22 of these used P = 0.25 for the non-sensitive item. Overall, 11 studies were assessed as high, 31 as moderate, and two as low quality (excluding the qualitative study). The effect of non-compliance was assessed in eight studies. From mixed CVS results, the meta-analysis indicates that CM outperforms DQ on the "more is better" validation criterion, and increasingly so with higher behavior sensitivity. However, little difference was observed between DQ and CM estimates for items with a DQ prevalence estimate around 50%. Based on the empirical evidence available to date, our study provides support for the superiority of CM over DQ in assessing sensitive/transgressive behavior. Despite some limitations, CM is a valuable and promising tool for population-level investigation.
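The "more is better" comparisons above rest on the standard one-group crosswise estimator, which converts the observed share of "same" answers into a prevalence using the known probability of the non-sensitive baseline item. A minimal Python sketch, using the birthday randomizer with p = 0.25 mentioned in the abstract:

```python
def cm_prevalence(lambda_hat: float, p: float) -> float:
    """Standard one-group crosswise-model moment estimator.

    lambda_hat: observed share of 'my answers are the same' responses.
    p: known prevalence of the non-sensitive baseline item (p != 0.5).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 leaves the sensitive prevalence unidentified")
    return (lambda_hat + p - 1) / (2 * p - 1)

# Example with the birthday randomizer (p = 0.25): if 70% of respondents
# answer 'same', the implied prevalence of the sensitive attribute is 10%.
print(cm_prevalence(0.70, 0.25))  # 0.10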
... Fourth, we discarded the list experiment results because the implementation was deficient in several respects, which we cannot adequately explain. The decision is supported by similar challenges faced in other surveys on integrity [99]. Fifth, we changed the reported regression model to one not preregistered to ease interpretation, noting that results under the preregistered model are similar. ...
... While the former is among the highest for comparable studies, it seems futile to defend it as representative of the population of researchers in Denmark when it is almost certain that self-selection is an issue. We did implement a list experiment in the questionnaire, but unfortunately, the implementation of the instrument did not work out as planned (see also Jerke et al. [99]), so we do not have a measure of possible social desirability in the self-reported response patterns. Nevertheless, we have two surveys that differ considerably in response rates, are biased by self-selection and social desirability to some unknown extent, and show remarkably similar response patterns. ...
Article
Full-text available
Questionable research practices (QRP) are believed to be widespread, but empirical assessments are generally restricted to a few types of practices. Furthermore, conceptual confusion is rife, with the use and the prevalence of QRPs often being conflated. We present the hitherto most comprehensive study examining QRPs across scholarly fields and knowledge production modes. We survey perception, use, prevalence and predictors of QRPs among 3,402 researchers in Denmark and 1,307 in the UK, USA, Croatia and Austria. Results reveal remarkably similar response patterns among Danish and international respondents (τ = 0.85). Self-reported use indicates whether respondents have used a QRP in recent publications. Nine out of 10 respondents admitted using at least one QRP. Median use is three out of nine QRP items. Self-reported prevalence reflects the frequency of use. On average, prevalence rates were roughly three times lower than self-reported use. Findings indicated that the perceived social acceptability of QRPs influenced self-report patterns. Results suggest that most researchers use different types of QRPs within a restricted time period. The prevalence estimates, however, do not suggest outright systematic use of specific QRPs. Perceived pressure was the strongest systemic predictor of prevalence. Conversely, more local attention to research cultures and academic age were negatively related to prevalence. Finally, the personality traits conscientiousness and, to a lesser degree, agreeableness were also inversely associated with self-reported prevalence. Findings suggest that engagement with QRPs is not only attributable to systemic factors, as hitherto suggested, but to a complicated mixture of experience, systemic and individual factors, and motivated reasoning.
... Inference relies again on likelihood (2). Model CM2 is rarely applied for the CM (e.g., Jerke et al., 2021). Although it allows a larger variety of non-threatening and non-salient baseline items, it is typically less efficient than Model CM1, because the control group is only used to infer the prevalence of the baseline item. ...
... 4. We recommend selecting baseline items that are substantively (topic) and semantically (word meaning and form) distant from the target item. For instance, we would not recommend administering target items about academic misconduct with baseline items about conference attendance (as in the study by Jerke et al. (2021)), nor personality traits related to honesty. Distant personality traits may be suitable, such as extraversion and openness to experience. ...
Article
Full-text available
When surveys contain direct questions about sensitive topics, participants may not provide their true answers. Indirect question techniques incentivize truthful answers by concealing participants’ responses in various ways. The Crosswise Model aims to do this by pairing a sensitive target item with a non-sensitive baseline item, and only asking participants to indicate whether their responses to the two items are the same or different. Selection of the baseline item is crucial to guarantee participants’ perceived and actual privacy and to enable reliable estimates of the sensitive trait. This research makes the following contributions. First, it describes an integrated methodology to select the baseline item, based on conceptual and statistical considerations. The resulting methodology distinguishes four statistical models. Second, it proposes novel Bayesian estimation methods to implement these models. Third, it shows that the new models introduced here improve efficiency over common applications of the Crosswise Model and may relax the required statistical assumptions. These three contributions facilitate applying the methodology in a variety of settings. An empirical application on attitudes toward LGBT issues shows the potential of the Crosswise Model. An interactive app, together with Python and MATLAB code, supports broader adoption of the model.
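The paper's own four models and Bayesian estimators are not reproduced here, but the basic idea of Bayesian estimation for the one-group Crosswise Model can be sketched with a grid approximation under a uniform prior. This is a generic illustration with assumed inputs, not the article's method:

```python
import numpy as np

def cm_posterior(n_same: int, n_total: int, p: float, grid_size: int = 1001):
    """Grid-approximate posterior for the sensitive prevalence pi under the
    basic one-group crosswise model with a uniform prior (illustration only)."""
    pi = np.linspace(0.0, 1.0, grid_size)
    lam = pi * p + (1 - pi) * (1 - p)  # P('same answers') given pi
    # Binomial log-likelihood of observing n_same 'same' answers out of n_total.
    log_lik = n_same * np.log(lam) + (n_total - n_same) * np.log(1 - lam)
    post = np.exp(log_lik - log_lik.max())
    post /= post.sum()
    return pi, post

pi, post = cm_posterior(n_same=350, n_total=500, p=0.25)
print(pi[np.argmax(post)])  # posterior mode, about 0.10 here
```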
... This is partly by design: these techniques deliberately add noise to the signal from participant responses to protect individual privacy. Additionally, some indirect questions are more difficult for the respondent to understand than direct questions (Jerke et al., 2022), which adds noise if respondents answer the questions incorrectly. Considering whether direct or indirect question techniques are more appropriate therefore requires a trade-off between bias and noise. ...
... A number of studies compare these three indirect question techniques. CM has been shown to perform better than list experiments (Jerke et al., 2022) and RRT (Höglinger and Jann, 2018). RRT has been shown to outperform list experiments in one study (Rosenfeld et al., 2016) but perform less well than list experiments in another study (Coutts and Jann, 2011). ...
Technical Report
Full-text available
Gambling is a large and growing industry. With that growth, there has also been growing concern about the potential harms that can arise from problem gambling. In late 2022, new legislation was introduced in Ireland to provide for more stringent regulation of the gambling industry and to establish an independent regulator, the Gambling Regulatory Authority of Ireland (GRAI). This review summarises and evaluates evidence from international research that is relevant to a number of policy questions. In doing so, it also identifies where evidence is deficient or lacking, to highlight some important and fruitful avenues for future research.
... Nonetheless, the crosswise model is still implemented by researchers who believe in its attenuating effects on social desirability bias (Canan et al., 2021; Hopp & Speil, 2021; Mieth et al., 2021) or do not account for potential problems in their design (Jerke et al., 2022). Even two recent meta-analyses paint a rather positive picture: While one admittedly at least suspects problems with less educated respondents and with publication bias favouring significant results (Schnell & Thomas, 2021), the other considers the crosswise model a "promising" method (Sagoe et al., 2021). ...
... As argued above, most applications of the crosswise model have assessed socially undesirable behaviour with low prevalence rates, such as plagiarism, xenophobia, tax evasion, and drug consumption (Coutts et al., 2011; Hoffmann & Musch, 2016; Höglinger et al., 2016; Jann et al., 2012; Jerke et al., 2022; Korndörfer et al., 2014; Shamsipour et al., 2014). Typically, the crosswise estimate is compared to an experimental condition with a direct question, and a higher CM than DQ prevalence is interpreted as a successful reduction in social desirability bias. ...
Article
Full-text available
This validation study on the crosswise model (CM) examines five survey experiments that were implemented in a general population survey. Our first crucial result is that in none of these experiments was the crosswise model able to verifiably reduce social desirability bias. In contrast to most previous CM applications, we use an experimental design that allows us to distinguish a reduction in social desirability bias from heuristic response behaviour, such as random ticking, leading to false positive or false negative answers. In addition, we provide insights into two potential explanatory mechanisms that have not yet received attention in empirical studies: response order effects and learning via repeated exposure. We do not find consistent response order effects, nor does response quality improve due to learning when respondents have had experiences with crosswise models in past survey waves. We interpret our results as evidence that the crosswise model does not work in general population surveys.
... The parameter π represents the unknown prevalence of the sensitive attribute, which has to be estimated; p represents the known randomization probability. Validation studies have used the known prevalence of experimentally induced cheating behavior (Hoffmann et al., 2015) as well as the prevalence of a non-sensitive control attribute (Hoffmann et al., 2020; Hoffmann & Musch, 2016, 2019). However, in some studies, the CWM provided prevalence estimates that did not differ from DQ estimates or were even negative (Hoffmann et al., 2020; Höglinger et al., 2016; Jerke et al., 2022). Moreover, in recent studies with a known status of individual respondents, the CWM was found to sometimes produce false positives because some non-carriers of the sensitive attribute were falsely classified as carriers. ...
... The exact reasons for the different outcomes have yet to be understood and are the subject of current scientific debate, but some potential candidates identified in previous studies and the current work are: sample characteristics, especially with regard to respondent education, since lower education has been linked to lower instruction comprehension (Meisters et al., 2020a); sample size, since estimates based on small samples are more susceptible to the influence of random error and response bias; mode of administration (e.g., online vs. offline; Sagoe et al., 2021); the exact wording of the sensitive statement under investigation, since the wording may cause attributes to be perceived as too sensitive, not sensitive enough, or ambiguous (Hoffmann et al., 2020; Jerke et al., 2022; Sagoe et al., 2021); the choice of the non-sensitive statement used for randomization and the respective randomization probability, since the current study has shown that different randomization probabilities can sometimes result in different prevalence estimates for the same sensitive attribute; and the number of groups used when employing the CWM or the ECWM as an indirect questioning technique. Most previous studies opted for the one-group design of the original CWM, which applies only one randomization probability. ...
Article
Full-text available
The Randomized Response Technique (Warner, Journal of the American Statistical Association, 60, 63-69, 1965) has been developed to control for socially desirable responses in surveys on sensitive attributes. The Crosswise Model (CWM; Yu et al., Metrika, 67, 251-263, 2008) and its extension, the Extended Crosswise Model (ECWM; Heck et al., Behavior Research Methods, 50, 1895-1905, 2018), are advancements of the Randomized Response Technique that have provided promising results in terms of improved validity of the obtained prevalence estimates compared to estimates based on conventional direct questions. However, recent studies have raised the question as to whether these promising results might have been primarily driven by a methodological artifact in terms of random responses rather than a successful control of socially desirable responding. The current study was designed to disentangle the influence of successful control of socially desirable responding and random answer behavior on the validity of (E)CWM estimates. To this end, we orthogonally manipulated the direction of social desirability (undesirable vs. desirable) and the prevalence (high vs. low) of sensitive attributes. Our results generally support the notion that the ECWM successfully controls social desirability bias and is inconsistent with the alternative account that ECWM estimates are distorted by a substantial influence of random responding. The results do not rule out a small proportion of random answers, especially when socially undesirable attributes with high prevalence are studied, or when high randomization probabilities are applied. Our results however do rule out that random responding is a major factor that can account for the findings attesting to the improved validity of (E)CWM as compared with DQ estimates.
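For orientation, the ECWM's two-group design can be sketched with a simple moment-based illustration (not the ML estimator used in the literature): each group, asked with complementary randomization probabilities p and 1 − p, identifies the prevalence on its own, and the model implies that the two "same"-answer probabilities sum to one, which is the observable restriction a goodness-of-fit test can exploit. Function and variable names below are illustrative.

```python
def ecwm_estimates(lam1: float, n1: int, lam2: float, n2: int, p: float):
    """Moment estimates for the Extended Crosswise Model: group 1 uses
    randomization probability p, group 2 uses 1 - p (p != 0.5).
    A sketch of the design logic, not the authors' exact estimator."""
    pi1 = (lam1 + p - 1) / (2 * p - 1)   # estimate from group 1 alone
    pi2 = (lam2 - p) / (1 - 2 * p)       # estimate from group 2 alone
    # Under the model, lam1 + lam2 = 1, so the discrepancy (lam1 + lam2) - 1
    # is a simple goodness-of-fit diagnostic.
    pooled = (n1 * pi1 + n2 * pi2) / (n1 + n2)
    fit_gap = (lam1 + lam2) - 1
    return pi1, pi2, pooled, fit_gap

# Example: true prevalence 0.10 with p = 0.25 implies lam1 = 0.70, lam2 = 0.30.
print(ecwm_estimates(0.70, 500, 0.30, 500, 0.25))  # (0.1, 0.1, 0.1, 0.0)
```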
... ICT is generally rated as preferable to other unobtrusive survey procedures such as the randomized response technique, which guarantees privacy by requesting a score for either the sensitive item or an unrelated one, for example petitions, which might confuse or even irritate some participants (Coutts & Jann, 2011; Hox & Lensvelt-Mulders, 2008; Rosenfeld et al., 2016; Wolter & Diekmann, 2021). Although list experiments are comparatively straightforward, a growing number of papers have voiced concerns about various kinds of non-strategic response error and the ensuing instability (Tsuchiya & Hirai, 2010; Kiewiet de Jonge & Nickerson, 2014; Ahlquist, 2018; Gosen et al., 2019; Kramon & Weghorst, 2019; Jerke et al., 2019; Ehler et al., 2021; Kuhn & Vivyan, 2021; Riambau & Ostwald, 2021; Jerke et al., 2022). ...
Article
Full-text available
This Research Note reports on a list experiment regarding anti-immigrant sentiment (n = 1,965) that was fielded in Spain in 2020. Among participants with left-of-center ideology, the experiment produced a negative difference-in-means between treatment and control. Drawing on Zigerell's (2011) deflation hypothesis, we assess the possibility that leftist treatment group respondents may have altered their scores by more than one to distance themselves unmistakably from the sensitive item. We consider this possibility plausible in a context of intense polarization where immigration attitudes are closely associated with political ideology. This study's data speak to the results of recent meta-analyses that have revealed list experiments to fail when applied to prejudiced attitudes and other highly sensitive issues, i.e., precisely the kind of issues with regard to which the technique ought to work best. We conclude that the possibility of strategic response error in specific respondent categories needs to be considered when staging and interpreting list experiments.
... Moreover, in "strong" validation studies, i.e., studies of a sensitive attribute with a known prevalence, the CWM yielded estimates that were close to the known prevalence [41,42]. However, some validation studies of the CWM yielded results that were not in agreement with the "more-is-better" criterion [18,42,43]. A potential explanation of these latter results is the experimental character of the sensitive attributes, which either had a zero prevalence [44] or were experimentally induced [20,22]. ...
Article
Full-text available
The Extended Crosswise Model (ECWM) is a randomized response model with neutral response categories, relatively simple instructions, and the availability of a goodness-of-fit test. This paper refines this model with a number sequence randomizer that virtually precludes the possibility to give evasive responses. The motivation for developing this model stems from a strategic priority of WADA (World Anti-Doping Agency) to monitor the prevalence of doping use by elite athletes. For this model we derived a maximum likelihood estimator that allows for binary logistic regression analysis. Three studies were conducted on online platforms with a total of over 6,000 respondents; two on controlled substance use and one on compliance with COVID-19 regulations in the UK during the first lockdown. The results of these studies are promising. The goodness-of-fit tests showed little to no evidence for response biases, and the ECWM yielded higher prevalence estimates than direct questions for sensitive questions, and similar ones for non-sensitive questions. Furthermore, the randomizer with the shortest number sequences yielded the smallest response error rates on a control question with known prevalence.
... Unfortunately, the true value of an incidence remains unknown in the social-political world, so validation studies are often not an option (Landsheer et al. 1999). While this is circumvented by asking an additional direct question for comparison, assuming that the experimental condition will result in a better estimate, an increasing number of studies raise concerns about the validity and reliability of the results (Höglinger and Diekmann 2017; Jerke et al. 2022; Schnell and Thomas 2021). While careful pretesting of the question wording, items, and scenarios will help in designing valid and reliable measures for an experimental design, simply adopting experimental designs from another context is risky. ...
Article
Full-text available
Intended to combine the best of two worlds – the ability to estimate causal effects and to generalize to a wider population – survey experiments are increasingly used as a method of data collection in politics and international relations. This article examines their popularity over the past decades in social science research, discusses the core logic of survey experiments, and reviews the method against the principles of the total survey error paradigm.
... Since the crosswise model was developed to outperform randomized response (Yu et al. 2008), in principle, the crosswise model is expected to better elicit candid answers from survey respondents than any other technique. To date, several validation studies appear to confirm this expectation (Hoffmann et al. 2015; Hoffmann and Musch 2016; Höglinger and Jann 2018; Höglinger, Jann, and Diekmann 2016; Jann, Jerke, and Krumpal 2012; Jerke et al. 2020; Meisters et al. 2020a). ...
Article
Full-text available
The crosswise model is an increasingly popular survey technique to elicit candid answers from respondents on sensitive questions. Recent studies, however, point out that in the presence of inattentive respondents, the conventional estimator of the prevalence of a sensitive attribute is biased toward 0.5. To remedy this problem, we propose a simple design-based bias correction using an anchor question that has a sensitive item with known prevalence. We demonstrate that we can easily estimate and correct for the bias arising from inattentive respondents without measuring individual-level attentiveness. We also offer several useful extensions of our estimator, including a sensitivity analysis for the conventional estimator, a strategy for weighting, a framework for multivariate regressions in which a latent sensitive trait is used as an outcome or a predictor, and tools for power analysis and parameter selection. Our method can be easily implemented through our open-source software cWise.
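The logic of such an anchor-based correction can be sketched as follows, under the assumption (made explicit in this illustration) that inattentive respondents pick "same"/"different" uniformly at random, which is what pulls the naive estimate toward 0.5. Variable names are illustrative; the abstract's actual method is implemented in the authors' cWise software.

```python
def attentive_share(lam_anchor: float, pi_anchor: float, p_anchor: float) -> float:
    """Estimate the share of attentive respondents from an anchor item whose
    true prevalence pi_anchor is known, assuming inattentive respondents
    answer 'same'/'different' with probability 0.5 (sketch assumption)."""
    lam_model = pi_anchor * (2 * p_anchor - 1) + 1 - p_anchor  # P('same') if attentive
    return (lam_anchor - 0.5) / (lam_model - 0.5)

def corrected_cm_prevalence(lam: float, p: float, gamma: float) -> float:
    """Strip out the random-answer component, then apply the usual estimator."""
    lam_attentive = (lam - (1 - gamma) / 2) / gamma
    return (lam_attentive + p - 1) / (2 * p - 1)

# Example: anchor with known prevalence 0.15, both items with p = 0.25.
gamma = attentive_share(lam_anchor=0.64, pi_anchor=0.15, p_anchor=0.25)  # 0.8
print(corrected_cm_prevalence(lam=0.62, p=0.25, gamma=gamma))            # 0.2
```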
Article
Full-text available
This article provides a meta-analysis of studies using the crosswise model (CM) in estimating the prevalence of sensitive characteristics in different samples and populations. On a data set of 141 items published in 33 articles or books, we compare the difference (Δ) between estimates based on the CM and a direct question (DQ). The overall effect size of Δ is 4.88; 95% CI [4.56, 5.21]. The results of a meta-regression indicate that Δ is smaller when general populations and nonprobability samples are considered. The population effect suggests an education effect: Differences between the CM and DQ estimates are more likely to occur when highly educated populations, such as students, are studied. Our findings raise concerns to what extent the CM is able to improve estimates of sensitive behavior in general population samples.
Article
Full-text available
Researchers and practitioners are increasingly using methods from the social sciences to address complex conservation challenges. This brings benefits but also the responsibility to understand the suitability and limitations of these methods in different contexts. After years of use in other disciplines, the unmatched count technique (UCT) has recently been adopted by conservation scientists to investigate illegal and socially undesirable human behaviours. Here we provide guidance for practitioners and researchers on how to apply UCT effectively, and outline situations where it will be the most and least appropriate. We reviewed 101 publications in refereed journals that used UCT to draw conclusions on its use to date and provide recommendations on when and how to use the method effectively in conservation. In particular, we explored: type of studies undertaken (e.g. disciplines; behaviour being studied; rationale for using UCT); survey administration (e.g. sample size, pilot studies, administration mode); UCT outcomes (e.g. type of analyses, estimates, comparison with other methods); and type of recommendations. We show that UCT has been used across multiple disciplines and contexts, with 10 studies that focus on conservation and natural resource use. The UCT has been used to investigate topics falling into five categories: socially undesirable behaviours, socially undesirable views, illegal or non‐compliant behaviours, socially desirable behaviours, and personal topics (e.g., being HIV positive). It has been used in 51 countries and is suitable for several situations, but limitations do exist, and the method does not always improve reporting of sensitive topics. We provide best‐practice guidance to researchers and practitioners considering using UCT. We highlight that alternate methods should be considered if sample sizes are likely to be small, the behaviour in question is likely to be extremely rare, or if the behaviour is not particularly sensitive. UCT can be a useful tool for estimating the extent of non‐compliance within a conservation context, but as with all scientific investigation, careful study design, robust sampling and consistent implementation are required in order for it to be effective.
Article
Full-text available
Social desirability and the fear of sanctions can deter survey respondents from responding truthfully to sensitive questions. Self-reports on norm breaking behavior such as shoplifting, non-voting, or tax evasion may thus be subject to considerable misreporting. To mitigate such response bias, various indirect question techniques, such as the randomized response technique (RRT), have been proposed. We evaluate the viability of several popular variants of the RRT, including the recently proposed crosswise-model RRT, by comparing respondents’ self-reports on cheating in dice games to actual cheating behavior, thereby distinguishing between false negatives (underreporting) and false positives (overreporting). The study has been implemented as an online survey on Amazon Mechanical Turk (N = 6,505). Our results from two validation designs indicate that the forced-response RRT and the unrelated-question RRT, as implemented in our survey, fail to reduce the level of misreporting compared to conventional direct questioning. For the crosswise-model RRT we do observe a reduction of false negatives. At the same time, however, there is a non-ignorable increase in false positives; a flaw that previous evaluation studies relying on comparative or aggregate-level validation could not detect. Overall, none of the evaluated indirect techniques outperformed conventional direct questioning. Furthermore, our study demonstrates the importance of identifying false negatives as well as false positives to avoid false conclusions about the validity of indirect sensitive question techniques.
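As context for the forced-response design evaluated above, its prevalence estimator is a one-line moment inversion. The sketch below uses generic, assumed design parameters rather than those of the study:

```python
def forced_response_prevalence(lam_hat: float, p_truth: float, p_yes: float) -> float:
    """Forced-response RRT: with probability p_truth the respondent answers
    truthfully, with probability p_yes a 'yes' is forced (a 'no' is forced
    otherwise). Illustrative parameters, not the study's exact design.
    P(yes) = p_truth * pi + p_yes, so we invert for pi."""
    return (lam_hat - p_yes) / p_truth

# Dice example: answer truthfully on 1-4 (p_truth = 2/3), forced 'yes' on 5
# (p_yes = 1/6), forced 'no' on 6. A 30% observed 'yes' rate implies pi = 0.2.
print(forced_response_prevalence(0.30, 2 / 3, 1 / 6))  # 0.2
```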
Article
Full-text available
Validly measuring sensitive issues such as norm violations or stigmatizing traits through self-reports in surveys is often problematic. Special techniques for sensitive questions like the Randomized Response Technique (RRT) and, among its variants, the recent crosswise model should generate more honest answers by providing full response privacy. Different types of validation studies have examined whether these techniques actually improve data validity, with varying results. Yet, most of these studies did not consider the possibility of false positives, i.e., that respondents are misclassified as having a sensitive trait even though they actually do not. Assuming that respondents only falsely deny but never falsely admit possessing a sensitive trait, higher prevalence estimates have typically been interpreted as more valid estimates. If false positives occur, however, conclusions drawn under this assumption might be misleading. We present a comparative validation design that is able to detect false positives without the need for an individual-level validation criterion — which is often unavailable. Results show that the most widely used crosswise-model implementation produced false positives to a nonignorable extent. This defect was not revealed by several previous validation studies that did not consider false positives — apparently a blind spot in past sensitive question research.
Article
Full-text available
Self-administered online surveys may provide a higher level of privacy protection to respondents than surveys administered by an interviewer. Yet, studies indicate that asking sensitive questions is problematic also in self-administered surveys. Because respondents might not be willing to reveal the truth and provide answers that are subject to social desirability bias, the validity of prevalence estimates of sensitive behaviors from online surveys can be challenged. A well-known method to overcome these problems is the Randomized Response Technique (RRT). However, convincing evidence that the RRT provides more valid estimates than direct questioning in online surveys is still lacking. We therefore conducted an experimental study in which different implementations of the RRT, including two implementations of the so-called crosswise model, were tested and compared to direct questioning. Our study is an online survey (N = 6,037) on sensitive behaviors by students such as cheating in exams and plagiarism. Results vary considerably between different implementations, indicating that practical details have a strong effect on the performance of the RRT. Among all tested implementations, including direct questioning, the unrelated-question crosswise-model RRT yielded the highest estimates of student misconduct.
Article
Full-text available
On surveys that assess sensitive personal attributes, indirect questioning aims at increasing respondents’ willingness to answer truthfully by protecting confidentiality. However, the assumption that subjects understand questioning procedures fully and trust them to protect their privacy is tested rarely. In a scenario-based design, we compared four indirect questioning procedures in terms of comprehensibility and perceived privacy protection. All indirect questioning techniques were found less comprehensible for respondents than a conventional direct question used for comparison. Less-educated respondents experienced more difficulties when confronted with any indirect questioning technique. Regardless of education, the Crosswise Model was found most comprehensible among the four indirect methods. Indirect questioning was perceived to increase privacy protection in comparison to a direct question. Unexpectedly, comprehension and perceived privacy protection did not correlate. We recommend assessing these factors separately in future evaluations of indirect questioning.
Article
Full-text available
In this article, "Benford's law" is applied to the "randomized response technique" (RRT) to increase the validity of answers to sensitive questions. Using the Newcomb-Benford distribution as a randomizing device has several advantages. It is easy to explain and follow the procedure, as no physical device such as a coin or a die is necessary, and the method guarantees full anonymity. As is well known, the price for the anonymity of the RRT is a decrease in the efficiency of the estimator. However, because of the subjective overestimation of certain numbers (Benford illusion), the conflict between the variance of the estimates and the degree of anonymity is less pronounced compared to other RRT methods. The suggested RRT variant has the potential to improve the efficiency of the estimator. Moreover, the assumption is that this method works well with self-administered questionnaires.
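The randomizing distribution itself is standard: under the Newcomb-Benford law, the probability that a number's first significant digit equals d is log10(1 + 1/d). A short sketch of just the distribution (the article's concrete survey protocol is not reproduced here):

```python
import math

# Newcomb-Benford first-digit distribution: P(d) = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

print(benford[1])             # ~0.301: about 30% of leading digits are 1
print(sum(benford.values()))  # sums to 1 (up to floating-point error)

# In an RRT variant like the one described, a self-chosen number (e.g., the
# first digit of a number the respondent knows) plays the role of the
# randomizing device, with these probabilities treated as known.
```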
Article
Full-text available
Misconduct in academic research is undoubtedly increasing, but studies estimating the prevalence of such behaviour suffer from biases inherent in researching sensitive topics. We compared the unmatched-count technique (UCT) and the crosswise-model (CM), two methods specifically designed to increase honest reporting to sensitive questions, with direct questioning (DQ) for five types of misconduct in the biological sciences. UCT performed better than CM and either outperformed or produced similar estimates to DQ depending on the question. Estimates of academic misconduct increased with decreasing seriousness of the behaviour, from c. 0% for data fabrication to >68% for inappropriate co-authorship. Results show that research into even minor issues of misconduct is sensitive, suggesting that future studies should consider using specialised questioning techniques as they are more likely to yield accurate figures.
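The UCT estimates compared above come from the technique's basic difference-in-means logic: the treatment list adds the sensitive item to the control list, so the gap in mean item counts estimates prevalence. A minimal sketch with a conventional standard error (illustrative function names):

```python
import numpy as np

def uct_prevalence(counts_treatment, counts_control):
    """Unmatched-count (item count / list experiment) estimator: the control
    list omits the sensitive item, so the difference in mean item counts
    estimates the prevalence of the sensitive behavior."""
    t = np.asarray(counts_treatment, dtype=float)
    c = np.asarray(counts_control, dtype=float)
    est = t.mean() - c.mean()
    se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
    return est, se

est, se = uct_prevalence([3, 2, 4, 3, 2, 3], [2, 2, 3, 3, 2, 2])
print(round(est, 2), round(se, 2))
```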
Article
Full-text available
Survey respondents may give untruthful answers to sensitive questions when asked directly. In recent years, researchers have turned to the list experiment (also known as the item count technique) to overcome this difficulty. While list experiments are arguably less prone to bias than direct questioning, list experiments are also more susceptible to sampling variability. We show that researchers need not abandon direct questioning altogether in order to gain the advantages of list experimentation. We develop a nonparametric estimator of the prevalence of sensitive behaviors that combines list experimentation and direct questioning. We prove that this estimator is asymptotically more efficient than the standard difference-in-means estimator, and we provide a basis for inference using Wald-type confidence intervals. Additionally, leveraging information from the direct questioning, we derive two nonparametric placebo tests for assessing identifying assumptions underlying list experiments. We demonstrate the effectiveness of our combined estimator and placebo tests with an original survey experiment.
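The paper develops the exact combined estimator and placebo tests; the sketch below shows one simple way of combining the two sources in the same spirit (not necessarily the authors' estimator): respondents who admit the behavior under direct questioning are counted as carriers, and the list experiment is used only to estimate prevalence among deniers.

```python
import numpy as np

def combined_prevalence(admit_direct, counts_t_deniers, counts_c_deniers):
    """Illustrative combination of direct questioning with a list experiment.
    admit_direct: 0/1 indicators of admitting under the direct question.
    counts_*_deniers: item counts among deniers in treatment/control arms."""
    admit = np.asarray(admit_direct, dtype=float)
    share_admit = admit.mean()
    t = np.asarray(counts_t_deniers, dtype=float)
    c = np.asarray(counts_c_deniers, dtype=float)
    prev_among_deniers = t.mean() - c.mean()  # list-experiment estimate
    return share_admit + (1 - share_admit) * prev_among_deniers
```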
Article
Full-text available
Warner (1965) introduced the Randomized Response Method (RRM) 20 years ago. This method has been advocated as a useful tool in eliciting sensitive information. Much of the early research focused on various parameter estimation methods. This article provides a comprehensive review of applications of the randomized response method with emphasis on recent publications, identifies issues now being studied, and suggests future research directions. As such, the article addresses method validation, respondent jeopardy, and new applications. In so doing, the method is considered in the broad context of obtaining responses to sensitive questions. Judging by the number of articles published, randomized response method continues to be of interest in numerous and diverse disciplines.
Article
Full-text available
Surveys usually yield reported rates of voting in elections that are higher than official turnout figures, a phenomenon often attributed to intentional misrepresentation by respondents who did not vote and would be embarrassed to admit that. The experiments reported here tested a procedure for reducing social desirability response bias by allowing respondents to report secretly whether they voted: the “randomized response technique.” In a national telephone survey of a sample of American adults and eight national surveys of American adults conducted via the Internet, respondents were either unable or unwilling to implement the randomized response technique properly, raising questions about whether this technique has ever worked properly to achieve its goals.
Chapter
Full-text available
Chapter contents: Methods of Indirect Estimation; The Item Count Method; The National Household Seroprevalence Survey Pretest; Preliminary Testing: Focus Groups and Individual Cognitive Interviews; Controlled Experiments; Final Version of Item Count Used in the NHSS Pretest; Results from the NHSS Pretest; Conclusions; The Simplest Form of the Item Count/Paired Estimator; Showcard with Sample Item
Article
Full-text available
This study examines two different Randomized Response methods to see whether they evoke sufficient understanding and trust, and ensure fewer evasive answers to socially sensitive questions. Two Randomized Response methods were employed by trained interviewers to study fraud: the Forced Response method, using dice, and Kuk's method, using playing cards. Respondents were selected from the files of the social security offices of three Dutch cities. A total of 334 respondents participated voluntarily in this study of two Randomized Response methods. Most respondents were known to have committed some form of fraud, and their answer on the Randomized Response question is validated with this information. The results indicate that subjects who have a better understanding of the Forced Response technique give more socially undesirable answers. The interviewer has a most important role establishing trust and understanding. Respondents who are less able to understand the instructions, e.g., have limited language abilities, develop less trust in the method.
Article
Full-text available
The frequency with which scientists fabricate and falsify data, or commit other forms of scientific misconduct, is a matter of controversy. Many surveys have asked scientists directly whether they have committed or know of a colleague who committed research misconduct, but their results appeared difficult to compare and synthesize. This is the first meta-analysis of these surveys. To standardize outcomes, the number of respondents who recalled at least one incident of misconduct was calculated for each question, and the analysis was limited to behaviours that distort scientific knowledge: fabrication, falsification, "cooking" of data, etc. Survey questions on plagiarism and other forms of professional misconduct were excluded. The final sample consisted of 21 surveys that were included in the systematic review, and 18 in the meta-analysis. A pooled weighted average of 1.97% (N = 7, 95% CI: 0.86–4.45) of scientists admitted to having fabricated, falsified or modified data or results at least once (a serious form of misconduct by any standard) and up to 33.7% admitted other questionable research practices. In surveys asking about the behaviour of colleagues, admission rates were 14.12% (N = 12, 95% CI: 9.91–19.72) for falsification, and up to 72% for other questionable research practices. Meta-regression showed that self-report surveys, surveys using the words "falsification" or "fabrication", and mailed surveys yielded lower percentages of misconduct. When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than by others. Considering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct.
Article
Full-text available
Psychologists have worried about the distortions introduced into standardized personality measures by social desirability bias. Survey researchers have had similar concerns about the accuracy of survey reports about such topics as illicit drug use, abortion, and sexual behavior. The article reviews the research done by survey methodologists on reporting errors in surveys on sensitive topics, noting parallels and differences from the psychological literature on social desirability. The findings from the survey studies suggest that misreporting about sensitive topics is quite common and that it is largely situational. The extent of misreporting depends on whether the respondent has anything embarrassing to report and on design features of the survey. The survey evidence also indicates that misreporting on sensitive topics is a more or less motivated process in which respondents edit the information they report to avoid embarrassing themselves in the presence of an interviewer or to avoid repercussions from third parties.
Article
The Zurich Survey of Academics is a large-scale and representative web survey among scientists at universities in Switzerland, Germany, and Austria (DACH region). The survey was conducted in 2020 and includes N=15,972 scientists from 236 universities. The survey is motivated by recent developments, such as the significant increase of teamwork in science and problems of how to organize fair and sustainable collaborations. It also reflects recent discussions around the replication crisis, problems of scientific integrity, and the apparently increasing pressures in scientific work. The aim of the survey is to obtain in-depth insights from researchers in Europe. The survey includes a number of new measurements, such as vignettes, factorial surveys, behavioral games, an Implicit Association Test on misconduct, indirect questioning techniques for eliciting scientific misconduct, randomized survey experiments on selective publishing behavior, and more. These measurements are applied to elicit, among others, selfish versus prosocial behavior of scientists, authorship norms, and provisions of collective goods in science. This document describes the most innovative elements of the survey and the core item batteries, questions, games, and behavioral tasks, and how permission for record linkage with individual bibliometric data was obtained. In addition, the specifics of the sampling and data cleaning are described. The document serves as a companion informing data analysts, interested researchers, reviewers, and anyone wishing to learn more about the specifics of the survey contents and the data structure. The document further contains links to additional material and documents, such as the codebook, ethics approval, and data protection. The survey is part of the larger-scale SNF/ERC Starting grant project “Social Norms, Cooperation and Conflict in Scientific Collaborations”.
Article
Misreporting of sensitive characteristics in surveys is a major concern among survey methodologists and social scientists across disciplines. Indirect question formats, such as the Item Count Technique (ICT) and the Randomized Response Techniques (RRT), including the Crosswise Model (CM) and the Triangular Model (TM), have been developed to protect respondents’ privacy by design to elicit more truthful answers. These methods have also been praised to produce more valid estimates than direct questions. However, recent research has revealed a number of problems, such as the occurrence of false negatives, false positives, and dependencies on socioeconomic characteristics, indicating that at least some respondents may still cheat or lie when asked indirectly. This article systematically investigates (1) how well respondents comprehend and (2) to what extent they trust the ICT, CM and TM. We conducted cognitive interviews with academics across disciplines, investigating how respondents perceive, think about and answer questions on academic misconduct using these indirect methods. The results indicate that most respondents comprehend the basic instructions, but many fail to understand the logic and principles of these techniques. Furthermore, the findings suggest that comprehension and honest self-reports are unrelated, thus violating core assumptions about the effectiveness of these techniques.
Article
This article presents an updated meta-analysis of survey experiments comparing the performance of the item count technique (ICT) and the direct questioning method. After synthesizing 246 effect sizes from 54 studies, we find that the probability that a sensitive item will be selected is .089 higher when using ICT compared to direct questioning. In recognition of the heterogeneity across studies, we seek to explain this variation by means of moderator analyses. We find that the relative effectiveness of ICT is moderated by cultural orientation in the context in which ICT is conducted (collectivism vs. individualism), the valence of topics involved in the applications (socially desirable vs. socially undesirable), and the number of nonkey items. In the Discussion section, we elaborate on the methodological implications of the main findings.
Article
We demonstrate that widely used measures of antigay sentiment and the size of the lesbian, gay, bisexual, and transgender (LGBT) population are misestimated, likely substantially. In a series of online experiments using a large and diverse but nonrepresentative sample, we compare estimates from the standard methodology of asking sensitive questions to measures from a “veiled” methodology that precludes inference about an individual but provides population estimates. The veiled method increased self-reports of antigay sentiment, particularly in the workplace: respondents were 67% more likely to disapprove of an openly gay manager when asked with a veil, and 71% more likely to say it should be legal to discriminate in hiring on the basis of sexual orientation. The veiled methodology also produces larger estimates of the fraction of the population that identifies as LGBT or has had a sexual experience with a member of the same sex. Self-reports of nonheterosexual identity rose by 65%, and same-sex sexual experiences by 59%. We conduct a “placebo test” and show that for nonsensitive placebo items, the veiled methodology produces effects that are small in magnitude and not significantly different from zero in seven out of eight items. Taken together, the results suggest antigay discrimination might be a more significant issue than formerly considered, as the nonheterosexual population and antigay workplace-related sentiment are both larger than previously measured. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2016.2503 . This paper was accepted by Uri Gneezy, behavioral economics.
Article
Yu, Tian, and Tang (2008) proposed two new techniques for asking questions on sensitive topics in population surveys: the triangular model (TM) and the crosswise model (CM). The two models can be used as alternatives to the well-known randomized response technique (RRT) and are meant to overcome some of the drawbacks of the RRT. Although Yu, Tian, and Tang provide a promising theoretical analysis of the proposed models, they did not test them. We therefore provide results from an experimental survey in which the crosswise model was implemented and compared to direct questioning. To our knowledge, this is the first empirical evaluation of the crosswise model. We focused on the crosswise model because it seems better suited than the triangular model to overcome the self-protective "no" bias observed for the RRT. This paper-and-pencil survey on plagiarism was administered to Swiss and German students in university classrooms. Results suggest that the CM is a promising data-collection instrument eliciting more socially undesirable answers than direct questioning.
Article
Estimates of the prevalence of sensitive attributes obtained through direct questions are prone to being distorted by untruthful responding. Indirect questioning procedures such as the Randomized Response Technique (RRT) aim to control for the influence of social desirability bias. However, even on RRT surveys, some participants may disobey the instructions in an attempt to conceal their true status. In the present study, we experimentally compared the validity of two competing indirect questioning techniques that presumably offer a solution to the problem of nonadherent respondents: the Stochastic Lie Detector and the Crosswise Model. For two sensitive attributes, both techniques met the "more is better" criterion. Their application resulted in higher, and thus presumably more valid, prevalence estimates than a direct question. Only the Crosswise Model, however, adequately estimated the known prevalence of a nonsensitive control attribute.
Article
While political campaigns commonly employ clientelistic mobilization tactics during elections in developing countries, studying vote buying with mass surveys has proven difficult since respondents often will not admit to receiving a gift or favor in exchange for their votes. This study explores the degree to which respondents vary in their reporting of the receipt of goods or favors. Analysis of list experiments included in 10 surveys conducted in eight Latin American countries demonstrates the widespread prevalence of underreporting and shows that it is best predicted by three different sources of question sensitivity. First, bias is greater among respondents with higher levels of education, likely due to greater understanding and awareness of democratic norms about vote buying. Second, since vote buying is often stigmatized as resulting from poverty, those who are particularly sensitive to questions about income also prove to be much more likely to edit their answers. Finally, bias is positively associated with the degree to which the goods distributed violate democratic norms, as bias is smallest in countries in which the gifts consist largely of innocuous campaign materials and items such as clothing and food. The results not only point to probable biases in analyses conducted using direct measures of gift dispensation, but also illuminate how social attitudes about vote buying have spread in different countries in Latin America.
Article
Surveys often contain sensitive questions, i.e., questions about private, illegal, or socially undesirable behavior. When asked directly in standard survey formats, respondents tend to underreport these behaviors, yielding biased results. One method that promises more valid estimates than direct questioning (DQ) is the item count technique (ICT). In this paper, the methodological pros and cons of ICT, as compared to DQ, are weighed up empirically with regard to questions eliciting self-reported delinquency. We present findings from a face-to-face survey of 552 respondents who had all been previously convicted under criminal law prior to the survey. The results show, first, that subjective measures of survey quality such as trust in anonymity or willingness to respond are not affected positively by ICT with the exception that interviewers feel less uncomfortable asking sensitive questions in ICT format than in DQ format. Second, all prevalence estimates of self-reported delinquent behaviors are significantly higher in ICT than in DQ format. Third, a regression model on determinants of response behavior indicates that the effect of ICT on response validity varies by gender. Overall, our results are in support of ICT. This technique is a promising alternative to other specialized questioning techniques such as the much more complicated randomized response technique (RRT).
Article
Due to its sensitive nature, tax compliance is difficult to study empirically, and valid information on tax evasion is rare. More specifically, when directly asked on surveys, respondents are likely to underreport their evasion behavior. Such invalid responses not only bias prevalence estimates but may also obscure associations with individual predictors. To generate more valid estimates of tax evasion, we used a new method of data collection for sensitive questions, the crosswise model (CM). The CM is conceptually based on the randomized response technique (RRT), but due to its advanced design, it is better suited for large surveys than classical RRTs. In an experimental online survey, we compared the CM (N = 862) to standard direct questioning (DQ; N = 305). First, our results showed that the CM was able to elicit a higher proportion of self-stigmatizing reports of tax evasion by increasing privacy in the data collection process. Second, on average, we found stronger effects of our predictor variables on tax evasion in the CM condition compared with the DQ condition such that an egoistic personality and the opportunity for tax evasion predicted actual tax evasion only in the CM condition.
Book
1. Introduction
2. Respondents' understanding of survey questions
3. The role of memory in survey responding
4. Answering questions about dates and durations
5. Attitude questions
6. Factual judgments and numerical estimates
7. Attitude judgments and context effects
8. Mapping and formatting
9. Survey reporting of sensitive topics
10. Mode of data collection
11. Impact of the application of cognitive models to survey measurement
Article
Survey questions asking about taboo topics such as sexual activities, illegal behaviour such as social fraud, or unsocial attitudes such as racism, often generate inaccurate survey estimates which are distorted by social desirability bias. Due to self-presentation concerns, survey respondents underreport socially undesirable activities and overreport socially desirable ones. This article reviews theoretical explanations of socially motivated misreporting in sensitive surveys and provides an overview of the empirical evidence on the effectiveness of specific survey methods designed to encourage the respondents to answer more honestly. Besides psychological aspects, like a stable need for social approval and the preference for not getting involved into embarrassing social interactions, aspects of the survey design, the interviewer’s characteristics and the survey situation determine the occurrence and the degree of social desirability bias. The review shows that survey designers could generate more valid data by selecting appropriate data collection strategies that reduce respondents’ discomfort when answering to a sensitive question.
Article
Over the past thirty years, the Democratic party has carried the mantle of racial liberalism. The party's endorsements of equal rights, fair housing laws and school busing have cost it the support of some whites, but these losses have been concentrated at the periphery of the party, among those least committed to its guiding principles or most unsympathetic to its efforts on behalf of racial equality. We argue that with the rise of affirmative action as the primary vehicle to advance racial equality, racial politics have become divisive in a new way, and that opposition to affirmative action now encompasses whites within the liberal core of the Democratic party.
Article
Due to the inherent sensitivity of many survey questions, a number of researchers have adopted an indirect questioning technique known as the list experiment (or the item-count technique) in order to reduce dishonest or evasive responses. However, standard practice with the list experiment requires a large sample size, utilizes only a difference-in-means estimator, and does not provide a measure of the sensitive item for each respondent. This paper addresses all of these issues. First, the paper presents design principles for the standard list experiment (and the double list experiment) for the reduction of bias and variance as well as providing sample-size formulas for the planning of studies. Second, this paper proves that a respondent-level probabilistic measure for the sensitive item can be derived. This provides a basis for diagnostics, improved estimation, and regression analysis. The techniques in this paper are illustrated with a list experiment from the 2008–2009 American National Election Studies (ANES) Panel Study and an adaptation of this experiment.
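The paper's own sample-size formulas are not reproduced here, but the planning logic it addresses can be illustrated with the generic variance of a two-arm difference in means (assumed equal group sizes; illustrative numbers):

```python
import math

def n_per_arm(var_treatment: float, var_control: float, target_se: float) -> int:
    """Generic planning arithmetic for a two-arm list experiment with equal
    group sizes (not the paper's exact formulas): choose n per arm so that
    SE = sqrt(var_T/n + var_C/n) hits the target."""
    return math.ceil((var_treatment + var_control) / target_se ** 2)

# With item-count variances of about 1.0 in each arm, a target standard
# error of 0.05 on the prevalence requires roughly 800 respondents per arm.
print(n_per_arm(1.0, 1.0, 0.05))  # 800
```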
Article
The item count technique is an indirect questioning technique that is used to estimate the proportion of people who have engaged in stigmatizing behavior. This technique is expected to yield a more appropriate estimate than the ordinary direct questioning technique because it requests respondents to indicate, based on a list of several items, simply the number of items that are applicable to them, including the target key item. An experimental web survey was conducted in an attempt to compare the direct questioning technique and the item count technique. Compared with the direct questioning technique, the item count technique yielded higher estimates of the proportion of shoplifters by nearly 10 percentage points, whereas the difference between the estimates using these two techniques was mostly insignificant with respect to innocuous blood donation. The survey results suggest that in the item count technique respondents tend to report fewer total behaviors compared to the direct question case. This tendency is more pronounced in the case of longer item lists. Three domain estimators for the item count technique were compared, and the cross-based method appeared to be the most appropriate method. Large differences in domain estimates for shoplifting between the item count and direct questioning techniques were found among female respondents, middle-aged respondents, respondents living in urban areas, and highly-educated respondents.
Article
Sensitive topics or highly personal questions are often asked in medical, psychological and sociological surveys. This paper proposes two new models (namely, the triangular and crosswise models) for survey sampling with sensitive characteristics. We derive the maximum likelihood estimates (MLEs) and large-sample confidence intervals for the proportion of persons with the sensitive characteristic. The modified MLEs and their asymptotic properties are developed. Under certain optimality criteria, the designs for the cooperative parameter are provided and sample size formulas are given. We compare the efficiency of the two models based on the variance criterion. The proposed models have four advantages: neither model requires a randomizing device, the models are easy to implement for both interviewer and interviewee, the interviewee does not face any sensitive questions, and both models can be applied to both face-to-face personal interviews and mail questionnaires.
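For the crosswise model, the MLE and its large-sample interval follow directly from the binomial likelihood of the "same" answers (these are the standard results for the basic model; the paper's modified MLEs and the triangular-model results are not shown here):

```latex
\[
\hat{\pi} = \frac{\hat{\lambda} + p - 1}{2p - 1}, \quad p \neq \tfrac{1}{2},
\qquad
\widehat{\operatorname{Var}}(\hat{\pi}) = \frac{\hat{\lambda}\,(1 - \hat{\lambda})}{n\,(2p - 1)^{2}},
\]
\[
\text{with Wald interval } \hat{\pi} \pm z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{\pi})},
\quad \hat{\lambda} = \text{observed share of ``same'' answers among } n \text{ respondents.}
\]
```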
Article
Gaining valid answers to so-called sensitive questions is an age-old problem in survey research. Various techniques have been developed to guarantee anonymity and minimize the respondent's feelings of jeopardy. Two such techniques are the randomized response technique (RRT) and the unmatched count technique (UCT). In this study we evaluate the effectiveness of different implementations of the RRT (using a forced-response design) in a computer-assisted setting and also compare the use of the RRT to that of the UCT. The techniques are evaluated according to various quality criteria, such as the prevalence estimates they provide, the ease of their use, and respondent trust in the techniques. Our results indicate that the RRTs are problematic with respect to several domains, such as the limited trust they inspire and non-response, and that the RRT estimates are unreliable due to a strong false "no" bias, especially for the more sensitive questions. The UCT, however, performed well compared to the RRTs on all the evaluated measures. The UCT estimates also had more face validity than the RRT estimates. We conclude that the UCT is a promising alternative to RRT in self-administered surveys and that future research should be directed towards evaluating and improving the technique.