Abstract
The rise of smartphone surveys, coupled with technological advancements, provides new ways of measuring respondents’ political attitudes. The use of open questions with requests for voice answers instead of text answers may simplify the answer process and provide more nuanced information. So far, research comparing the measurement quality of text and voice answers is scarce. We therefore conducted an experiment in a smartphone survey (N = 2,402) to investigate the criterion validity of text and voice answers. Voice answers were collected using a JavaScript- and PHP-based voice recording tool that resembles the voice messaging function of instant-messaging services. The results show that open questions with requests for text and voice answers differ in terms of criterion validity. More specifically, the findings indicate that voice answers result in somewhat higher criterion validity than their text counterparts. More refined research on the measurement quality of text and voice answers is required to draw robust conclusions.
... For instance, built-in microphones enable researchers to collect oral instead of written answers. In particular, smartphones facilitate the recording of oral answers to collect rich information about respondents' political attitudes by triggering an open narration (Gavras & Höhne, 2020; Revilla & Couper, 2019; Revilla et al., 2018). Respondents are able to express their attitudes with almost no further (technical) burden; they only need to press a recording button on the respective survey page and record their answer. ...
... Thus, we only find partial evidence for our third hypothesis. Gavras and Höhne (2020) found that sentiment scores of written and oral answers to open-ended questions on attitudes towards German political parties can be used to predict voting behaviour. In order to provide further descriptive evidence, we estimated the correlation matrix of the sentiment scores (see Appendix D in the Supplementary Materials). ...
... Third, future research may investigate the measurement quality of written and oral answers by using respondents' predicted sentiment scores to determine the association between these scores and criterion measures. In doing so, one would be able to estimate the criterion validity of open-ended questions with requests for written and oral answers (see Gavras & Höhne, 2020). Unfortunately, this analysis was beyond the scope of this paper. ...
The rapid increase in smartphone surveys and technological developments open novel opportunities for collecting survey answers. One of these opportunities is the use of open‐ended questions with requests for oral instead of written answers, which may facilitate the answer process and result in more in‐depth and unfiltered information. While it is now possible to collect oral answers on smartphones, we still lack studies on the impact of this novel answer format on the characteristics of respondents' answers. In this study, we compare the linguistic and content characteristics of written versus oral answers to political attitude questions. For this purpose, we conducted an experiment in a smartphone survey (N = 2,402) and randomly assigned respondents to an answer format (written or oral). Oral answers were collected via the open source ‘SurveyVoice (SVoice)’ tool, whereas written answers were typed in via the smartphone keypad. Applying length analysis, lexical structure analysis, sentiment analysis and structural topic models, our results reveal that written and oral answers differ substantially from each other in terms of length, structure, sentiment and topics. We find evidence that written answers are characterized by intentional and conscious answering, whereas oral answers are characterized by intuitive and spontaneous answering.
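As a rough illustration of the kind of length and lexical structure analysis described above, the following Python sketch compares word counts and type-token ratios between written and oral answers. The data frame, the column names, the example answers, and the simple whitespace tokenizer are illustrative assumptions, not the authors' actual pipeline.

```python
# A minimal sketch, assuming a data frame with one transcribed answer per row
# and a 'format' column indicating the experimental condition ('written'/'oral').
import pandas as pd

def word_count(text: str) -> int:
    """Number of whitespace-separated tokens in an answer."""
    return len(text.split())

def type_token_ratio(text: str) -> float:
    """Share of unique tokens, a simple indicator of lexical structure."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Hypothetical example data; a real analysis would use the survey transcripts.
answers = pd.DataFrame({
    "format": ["written", "written", "oral", "oral"],
    "answer_text": [
        "Die Partei vertritt meine Interessen nicht.",
        "Gute Wirtschaftspolitik, aber schwache Sozialpolitik.",
        "Also ich finde die machen eigentlich ganz gute Arbeit, so insgesamt.",
        "Ja, ich weiß nicht, die reden viel, aber es passiert wenig.",
    ],
})

answers["n_words"] = answers["answer_text"].apply(word_count)
answers["ttr"] = answers["answer_text"].apply(type_token_ratio)

# Compare average length and lexical diversity by answer format.
print(answers.groupby("format")[["n_words", "ttr"]].mean())
```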
... In particular, smartphone sensors and apps allow researchers to collect new types of data, which can improve and expand survey measurement (Link et al., 2014), and offer the potential to reduce measurement errors, respondent burden and data collection costs (Jäckle et al., 2018). For example, GPS (McCool et al., 2021), accelerometers (Höhne & Schlosser, 2019; Höhne, Revilla, et al., 2020), web tracking applications and plug-ins (Bosch & Revilla, 2021b; Revilla et al., 2017) and microphones (Gavras & Höhne, 2022; Revilla & Couper, 2021; Revilla et al., 2020) have already been used in (mobile) web survey research. ...
Images might provide richer and more objective information than text answers to open‐ended survey questions. Little is known, nonetheless, about the consequences for data quality of asking participants to answer open‐ended questions with images. Therefore, this paper addresses three research questions: (1) What is the effect of answering web survey questions with images instead of text on breakoff, noncompliance with the task, completion time and question evaluation? (2) What is the effect of including a motivational message on these four aspects? (3) Does the impact of asking to answer with images instead of text vary across device types? To answer these questions, we implemented a 2 × 3 between‐subject web survey experiment (N = 3043) in Germany. Half of the sample was required to answer using PCs and the other half with smartphones. Within each device group, respondents were randomly assigned to (1) a control group answering open‐ended questions with text; (2) a treatment group answering open‐ended questions with images; and (3) another treatment group answering open‐ended questions with images but prompted with a motivational message. Results show that asking participants to answer with images significantly increases participants' likelihood of noncompliance as well as their completion times, while worsening their overall survey experience. Including motivational messages, moreover, moderately reduces the likelihood of noncompliance. Finally, the likelihood of noncompliance is similar across devices.
Web surveys completed on smartphones open novel ways for measuring respondents’ attitudes, behaviors, and beliefs that are crucial for social science research and many adjacent research fields. In this study, we make use of the built-in microphones of smartphones to record voice answers in a smartphone survey and extract non-verbal cues, such as amplitudes and pitches, from the collected voice data. This allows us to predict respondents’ level of interest (i.e., disinterest, neutral, and high interest) based on their voice answers, which expands the opportunities for researching respondents’ engagement and answer behavior. We conducted a smartphone survey in a German online access panel and asked respondents four open-ended questions on political parties with requests for voice answers. In addition, we measured respondents’ self-reported survey interest using a closed-ended question with an end-labeled, seven-point rating scale. The results show a non-linear association between respondents’ predicted level of interest and answer length. Respondents with a predicted medium level of interest provide longer answers in terms of number of words and response times. However, respondents’ predicted level of interest and their self-reported interest are weakly associated. Finally, we argue that voice answers contain rich meta-information about respondents’ affective states, which are yet to be utilized in survey research.
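To illustrate what extracting non-verbal cues such as amplitude and pitch from voice answers can look like in practice, here is a minimal Python sketch using the librosa library. The file name and the particular feature choices (RMS amplitude, pYIN-estimated fundamental frequency) are illustrative assumptions; the study's actual feature extraction and interest classifier are not reproduced here.

```python
# A minimal sketch of extracting amplitude and pitch features from one voice answer.
# 'voice_answer.wav' is a placeholder file name.
import numpy as np
import librosa

# Load the recording at its native sampling rate.
y, sr = librosa.load("voice_answer.wav", sr=None)

# Amplitude: root-mean-square energy per frame.
rms = librosa.feature.rms(y=y)[0]

# Pitch: frame-wise fundamental frequency via the pYIN algorithm
# (unvoiced frames are returned as NaN).
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

features = {
    "mean_rms": float(np.mean(rms)),
    "mean_f0_hz": float(np.nanmean(f0)),
    "f0_sd_hz": float(np.nanstd(f0)),
    "voiced_share": float(np.mean(voiced_flag)),
}
print(features)
# Summary features like these could then feed a classifier predicting
# disinterest, neutral, or high interest, as described in the abstract.
```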
With surveys increasingly being conducted via smartphones and tablets, the option of providing open-ended responses as audio recordings has become more popular in research. This approach aims to improve the user experience of respondents and the quality of their answers. To circumvent the tedious task of transcribing each audio recording for analysis, many previous studies have used the Google Cloud Automatic Speech Recognition (ASR) service to convert audio data to text. Extending previous research, we benchmark the Google Cloud ASR service against state-of-the-art ASR systems from Meta (wav2vec 2.0), Nvidia (NeMo), and OpenAI (Whisper). To do so, we use 100 randomly selected and recorded open-ended responses to popular social science survey questions. Additionally, we provide a basic, easy-to-understand introduction to how the Whisper ASR system works as well as code for implementation. By comparing Word Error Rates, we show that for our data the Google Cloud ASR service is outperformed by almost all ASR systems, highlighting the need to also consider other ASR systems.
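As an illustration of how such a benchmark can be run for one of the compared systems, the sketch below transcribes a recorded answer with OpenAI's Whisper (via the openai-whisper package) and computes the Word Error Rate against a human reference transcript using the jiwer package. The file name, model size, language setting, and reference text are placeholder assumptions.

```python
# A minimal sketch: transcribe one open-ended voice answer with Whisper and
# compute the Word Error Rate against a human reference transcript.
# Requires: pip install openai-whisper jiwer
import whisper
import jiwer

# Placeholder inputs.
audio_file = "response_001.wav"
reference = "ich finde die partei kümmert sich zu wenig um soziale gerechtigkeit"

# Load a Whisper model (larger models are slower but usually more accurate).
model = whisper.load_model("base")
result = model.transcribe(audio_file, language="de")
hypothesis = result["text"].lower().strip()

# Word Error Rate: (substitutions + deletions + insertions) / reference words.
wer = jiwer.wer(reference, hypothesis)
print(f"Hypothesis: {hypothesis}")
print(f"WER: {wer:.3f}")
```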
Probes are follow-ups to survey questions used to gain insights on respondents’ understanding of and responses to these questions. They are usually administered as open-ended questions, primarily in the context of questionnaire pretesting. Due to the decreased cost of data collection for open-ended questions in web surveys, researchers have argued for embedding more open-ended probes in large-scale web surveys. However, there are concerns that this may cause reactivity and impact survey data. The study presents a randomized experiment in which identical survey questions were run with and without open-ended probes. Embedding open-ended probes resulted in higher levels of survey break off, as well as increased backtracking and answer changes to previous questions. In most cases, there was no impact of open-ended probes on the cognitive processing of and response to survey questions. Implications for embedding open-ended probes into web surveys are discussed.
The method of web probing integrates cognitive interviewing techniques into web surveys and is increasingly used to evaluate survey questions. In a usual web probing scenario, probes are administered immediately after the question to be tested (concurrent probing), typically as open-ended questions. A second possibility of administering probes is in a closed format, whereby the response categories for the closed probes are developed during previously conducted qualitative cognitive interviews. Using closed probes has several benefits, such as reduced costs and time efficiency, because this method does not require manual coding of open-ended responses. In this article, we investigate whether the insights gained into item functioning when implementing closed probes are comparable to the insights gained when asking open-ended probes and whether closed probes are equally suitable to capture the cognitive processes for which traditionally open-ended probes are intended. The findings reveal statistically significant differences with regard to the variety of themes, the patterns of interpretation, the number of themes per respondent, and nonresponse. No differences in number of themes across formats by sex and educational level were found.
The ever-growing number of respondents completing web surveys via smartphones is paving the way for leveraging technological advances to improve respondents’ survey experience and, in turn, the quality of their answers. Smartphone surveys enable researchers to incorporate audio and voice features into web surveys, that is, having questions read aloud to respondents using pre-recorded audio files and collecting voice answers via the smartphone’s microphone. Moving from written to audio and voice communication channels might be associated with several benefits, such as humanizing the communication process between researchers and respondents. However, little is known about respondents’ willingness to undergo this change in communication channels. Replicating and extending earlier research, we examine the extent to which respondents are willing to use audio and voice channels in web surveys, the reasons for their (non)willingness, and respondent characteristics associated with (non)willingness. The results of a web survey conducted in a nonprobability online panel in Germany ( N = 2146) reveal that more than 50% of respondents would be willing to have the questions read aloud (audio channel) and about 40% would also be willing to give answers via voice input (voice channel). While respondents mostly name a general openness to new technologies for their willingness, they mostly name preference for written communication for their nonwillingness. Finally, audio and voice channels in smartphone surveys appeal primarily to frequent and competent smartphone users as well as younger and tech-savvy respondents.
Technological advancements and changes in online survey participation pave the way for new data collection methods. Particularly, the increasing smartphone rate in online surveys facilitates a re-consideration of prevailing communication channels to, for instance, naturalize the communication process between researchers and respondents and to collect more in-depth and high-quality data. However, so far, there is a lack of information on whether respondents are willing to undergo a change in communication channels. In this study, I therefore investigate respondents’ willingness to participate in online surveys with a smartphone to have the survey questions read out loud (audio channel) and to give oral answers via voice input (voice channel). For this purpose, I employed two willingness questions – one on audio and one on voice channels – in the probability-based German Internet Panel (N = 4,426). The results reveal that a substantial minority of respondents is willing to participate in online surveys with a smartphone to have the survey questions read out loud and to give oral answers via voice input. They also show that the device used for survey participation and personality traits, such as conscientiousness and extraversion, play a role when it comes to respondents’ willingness.
Multidimensional concepts are non-compensatory when higher values on one component cannot offset lower values on another. Thinking of the components of a multidimensional phenomenon as non-compensatory rather than substitutable can have wide-ranging implications, both conceptually and empirically. To demonstrate this point, we focus on populist attitudes that feature prominently in contemporary debates about liberal democracy. Given similar established public opinion constructs, the conceptual value of populist attitudes hinges on its unique specification as an attitudinal syndrome, which is characterized by the concurrent presence of its non-compensatory concept subdimensions. Yet this concept attribute is rarely considered in existing empirical research. We propose operationalization strategies that seek to take the distinct properties of non-compensatory multidimensional concepts seriously. Evidence on five populism scales in 12 countries reveals the presence and consequences of measurement-concept inconsistencies. Importantly, in some cases, using conceptually sound operationalization strategies upsets previous findings on the substantive role of populist attitudes.
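To make the distinction between compensatory and non-compensatory operationalization concrete, the sketch below contrasts a simple mean score, where a high value on one subdimension can offset a low value on another, with a minimum-based aggregation in which the lowest subdimension caps the overall score. The subdimension labels and example values are illustrative assumptions, not the scales analysed in the article.

```python
# A minimal sketch contrasting compensatory (mean) and non-compensatory (minimum)
# aggregation of subdimension scores, here for three hypothetical subdimensions
# of populist attitudes scored on a 1-5 scale.
import statistics

respondents = {
    # anti-elitism, people-centrism, Manichaean outlook (hypothetical values)
    "A": [5.0, 5.0, 1.0],   # high on two subdimensions, very low on one
    "B": [3.5, 3.5, 3.5],   # moderately high on all three
}

for rid, scores in respondents.items():
    compensatory = statistics.mean(scores)   # low score is offset by high ones
    non_compensatory = min(scores)           # lowest subdimension is decisive
    print(f"Respondent {rid}: mean = {compensatory:.2f}, min = {non_compensatory:.2f}")

# Respondent A outscores B on the mean (3.67 vs 3.50) but not on the minimum
# (1.00 vs 3.50), illustrating how the two operationalizations can reverse
# substantive conclusions.
```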
As people increasingly communicate via asynchronous non-spoken modes on mobile devices, particularly text messaging (e.g., SMS), longstanding assumptions and practices of social measurement via telephone survey interviewing are being challenged. In the study reported here, 634 people who had agreed to participate in an interview on their iPhone were randomly assigned to answer 32 questions from US social surveys via text messaging or speech, administered either by a human interviewer or by an automated interviewing system. Ten interviewers from the University of Michigan Survey Research Center administered voice and text interviews; automated systems launched parallel text and voice interviews at the same time as the human interviews were launched. The key question was how the interview mode affected the quality of the response data, in particular the precision of numerical answers (how many were not rounded), variation in answers to multiple questions with the same response scale (differentiation), and disclosure of socially undesirable information. Texting led to higher quality data than voice interviews, both with human and automated interviewers: fewer rounded numerical answers, more differentiated answers to a battery of questions, and more disclosure of sensitive information. Text respondents also reported a strong preference for future interviews by text. The findings suggest that people interviewed on mobile devices at a time and place that is convenient for them, even when they are multitasking, can give more trustworthy and accurate answers than those in more traditional spoken interviews. The findings also suggest that answers from text interviews, when aggregated across a sample, can tell a different story about a population than answers from voice interviews, potentially altering the policy implications from a survey.
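The following sketch illustrates, under simple assumptions, how two of the data quality indicators described above might be operationalized: a rounding indicator for numerical answers (here, multiples of 10) and a differentiation measure for a battery of items sharing the same response scale (here, the share of distinct scale points used). These definitions are illustrative; the study's exact coding rules are not reproduced.

```python
# A minimal sketch of two data quality indicators discussed in the abstract:
# rounding of numerical answers and differentiation across a response battery.

def is_rounded(value: int, base: int = 10) -> bool:
    """Treat multiples of `base` as (potentially) rounded answers."""
    return value % base == 0

def differentiation(ratings: list[int]) -> float:
    """Share of distinct scale points used across a battery of items."""
    return len(set(ratings)) / len(ratings)

# Hypothetical answers from one respondent.
numeric_answers = [30, 47, 120, 55]       # e.g., "how many times..." questions
battery_ratings = [4, 4, 5, 4, 3, 4]      # e.g., six items on a 1-5 scale

share_rounded = sum(is_rounded(v) for v in numeric_answers) / len(numeric_answers)
print(f"Share of rounded numerical answers: {share_rounded:.2f}")
print(f"Differentiation across the battery: {differentiation(battery_ratings):.2f}")
```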
Although the purpose of questionnaire items is to obtain a person’s opinion on a certain matter, a respondent’s registered opinion may not reflect his or her “true” opinion because of random and systematic errors. Response styles (RSs) are a respondent’s tendency to respond to survey questions in certain ways regardless of the content, and they contribute to systematic error. They affect univariate and multivariate distributions of data collected by rating scales and are alternative explanations for many research results. Despite this, RS are often not controlled in research. This article provides a comprehensive summary of the types of RS, lists their potential sources, and discusses ways to diagnose and control for them. Finally, areas for further research on RS are proposed.
In this paper, we investigate whether there are differences in the effect of instrument design between trained and fresh respondents. In three experiments, we varied the number of items on a screen, the choice of response categories, and the layout of a five-point rating scale. In general, effects of design carry over between trained and fresh respondents. We found little evidence that survey experience influences the question-answering process. Trained respondents seem to be more sensitive to satisficing. The shorter completion time, higher interitem correlations for multiple-item-per-screen formats, and the fact that they select the first response options more often indicate that trained respondents tend to take shortcuts in the response process and study the questions less carefully.
SentimentWortschatz, or SentiWS for short, is a publicly available German-language resource for sentiment analysis, opinion mining, etc. It lists positive and negative sentiment-bearing words weighted within the interval of (−1; 1), plus their part-of-speech tag and, if applicable, their inflections. The current version of SentiWS (v1.8b) contains 1,650 negative and 1,818 positive words, which sum up to 16,406 positive and 16,328 negative word forms, respectively. It not only contains adjectives and adverbs explicitly expressing a sentiment, but also nouns and verbs implicitly containing one. The present work describes the resource's structure, the three sources utilised to assemble it, and the semi-supervised method incorporated to weight the strength of its entries. Furthermore, the resource's contents are extensively evaluated using a German-language evaluation set we constructed. The evaluation set is verified to be reliable, and it is shown that SentiWS provides a beneficial lexical resource for German-language sentiment analysis related tasks to build on.
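A minimal sketch of how a lexicon such as SentiWS can be used to score German text follows, assuming the commonly distributed tab-separated file layout (word|POS, weight, optional comma-separated inflections). The file names, the layout, and the whitespace tokenizer are assumptions and should be checked against the downloaded version of the resource.

```python
# A minimal sketch of lexicon-based sentiment scoring with SentiWS.
# Assumes tab-separated lines like "gut|ADJX<TAB>0.3716<TAB>gute,gutem,..."
# (check the layout of the version you download).

def load_sentiws(path: str) -> dict[str, float]:
    """Map each word form (base form and inflections) to its sentiment weight."""
    lexicon: dict[str, float] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 2:
                continue
            base = parts[0].split("|")[0].lower()
            weight = float(parts[1])
            lexicon[base] = weight
            if len(parts) > 2 and parts[2]:
                for inflection in parts[2].split(","):
                    lexicon[inflection.lower()] = weight
    return lexicon

# Placeholder file names for the positive and negative word lists.
lexicon = {**load_sentiws("SentiWS_Positive.txt"),
           **load_sentiws("SentiWS_Negative.txt")}

def sentiment_score(text: str) -> float:
    """Sum of SentiWS weights over all matched tokens in an answer."""
    tokens = text.lower().split()
    return sum(lexicon.get(tok, 0.0) for tok in tokens)

print(sentiment_score("die partei macht eine gute arbeit"))
```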
Previous research has revealed techniques to improve response quality in open-ended questions in both paper and interviewer-administered survey modes. The purpose of this paper is to test the effectiveness of similar techniques in web surveys. Using data from a series of three random sample web surveys of Washington State University undergraduates, we examine the effects of visual and verbal answer-box manipulations (i.e., altering the size of the answer box and including an explanation that answers could exceed the size of the box) and the inclusion of clarifying and motivating introductions in the question stem. We gauge response quality by the amount and type of information contained in responses as well as response time and item nonresponse. The results indicate that increasing the size of the answer box had little effect on early responders to the survey but substantially improved response quality among late responders. Including any sort of explanation or introduction that made response quality and length salient also improved response quality for both early and late responders. In addition to discussing these techniques, we also address the potential of the web survey mode to revitalize the use of open-ended questions in self-administered surveys.
More and more respondents are answering web surveys using mobile devices. Mobile respondents tend to provide shorter responses to open questions than PC respondents. Using voice recording to answer open-ended questions could increase data quality and help engage groups usually underrepresented in web surveys. Revilla, Couper, Bosch, and Asensio showed that the use of voice recording in particular still presents many challenges, even though it could be a promising tool. This article reports results from a follow-up experiment in which the main goals were to (1) test whether different instructions on how to use the voice recording tool reduce technical and understanding problems, and thereby reduce item nonresponse while preserving data quality and the evaluation of the tool; (2) test whether nonresponse due to context can be reduced by using a filter question, and how this affects data quality and the tool evaluation; and (3) understand which factors affect nonresponse to open-ended questions using voice recording, and whether these factors also affect data quality and the evaluation of the tool. The experiment was implemented within a smartphone web survey in Spain focused on Android devices. The results suggest that different instructions did not affect nonresponse to the open questions and had little effect on data quality for those who did answer. Introducing a filter to ensure that people were in a setting that permits voice recording seems useful. Despite efforts to reduce problems, a substantial proportion of respondents are still unwilling or unable to answer open questions using voice recording.
The analysis of political texts from parliamentary speeches, party manifestos, social media, or press releases forms the basis of major and growing fields in political science, not least since advances in “text-as-data” methods have rendered the analysis of large text corpora straightforward. However, many sources of political speech are not regularly transcribed, and their on-demand transcription by humans is prohibitively expensive for research purposes. This class includes political speech in certain legislatures, during political party conferences, as well as television interviews and talk shows. We showcase how scholars can use automatic speech recognition systems to analyze such speech with quantitative text analysis models of the “bag-of-words” variety. To probe results for robustness to transcription error, we present an original “word error rate simulation” (WERSIM) procedure implemented in R. We demonstrate the potential of automatic speech recognition to address open questions in political science with two substantive applications and discuss its limitations and practical challenges.
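The WERSIM procedure itself is distributed by the authors as R code; as a rough Python analogue of the underlying idea, the sketch below randomly perturbs a transcript (word deletions and substitutions) up to a target word error rate so that a downstream bag-of-words analysis can be re-run on the noisier text. The perturbation scheme, the example transcript, and the substitution vocabulary are simplified assumptions, not the authors' implementation.

```python
# A rough sketch of a word-error-rate simulation: inject random word deletions
# and substitutions into a transcript at a target error rate, then re-run the
# downstream text analysis on the perturbed transcript.
import random

def perturb_transcript(tokens: list[str], target_wer: float,
                       vocabulary: list[str], seed: int = 0) -> list[str]:
    """Randomly delete or substitute roughly target_wer of the tokens."""
    rng = random.Random(seed)
    perturbed = []
    for tok in tokens:
        if rng.random() < target_wer:
            if rng.random() < 0.5:
                continue                              # simulate a deletion
            perturbed.append(rng.choice(vocabulary))  # simulate a substitution
        else:
            perturbed.append(tok)
    return perturbed

transcript = "we will invest in schools and cut taxes for working families".split()
vocab = ["policy", "budget", "reform", "growth", "security"]

for wer in (0.05, 0.15, 0.30):
    noisy = perturb_transcript(transcript, wer, vocab, seed=42)
    print(f"WER ~{wer:.2f}: {' '.join(noisy)}")
# Comparing model estimates across such perturbation levels indicates how
# robust substantive conclusions are to transcription error.
```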
We implemented an experiment within a smartphone web survey to explore the feasibility of using voice input (VI) options. Based on device used, participants were randomly assigned to a treatment or control group. Respondents in the iPhone operating system (iOS) treatment group were asked to use the dictation button, in which the voice was translated automatically into text by the device. Respondents with Android devices were asked to use a VI button which recorded the voice and transmitted the audio file. Both control groups were asked to answer open-ended questions using standard text entry. We found that the use of VI still presents a number of challenges for respondents. Voice recording (Android) led to substantially higher nonresponse, whereas dictation (iOS) led to slightly higher nonresponse, relative to text input. However, completion time was significantly reduced using VI. Among those who provided an answer, when dictation was used, we found fewer valid answers and less information provided, whereas for voice recording, longer and more elaborated answers were obtained. Voice recording (Android) led to significantly lower survey evaluations, but not dictation (iOS).
Spatial models of issue voting generally assume that citizens have a single “vote function”. A given voter is expected to evaluate all parties using the same issue criteria. The impact of issues can vary between citizens and contexts, but is normally considered to be constant across parties. This paper reassesses this central assumption, by suggesting that party characteristics influence the salience of issue considerations in voters’ evaluations. Voters should rely more strongly on issues which are frequently associated with a given party and for which its issue stances are better known. Our analysis of the 2014 European elections supports these hypotheses by showing that the impact of voter-party issue distances on party evaluations is systematically related to the clarity and extremism of parties’ issue positions, as well as to their size and governmental status. These findings imply an important modification of standard proximity models of electoral competition and party preferences.
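As a stylized illustration of the proximity logic the article builds on and modifies, the sketch below computes a voter's evaluation of each party as a salience-weighted negative distance between voter and party issue positions; letting the weights vary by party, as the article argues, would amount to indexing them by party as well. All positions, weights, and party names are hypothetical.

```python
# A stylized proximity model: a voter's evaluation of a party is the negative,
# salience-weighted absolute distance between voter and party issue positions.
# Allowing weights to differ across parties would capture the article's argument
# that issue impact varies with party characteristics.

voter_position = {"economy": 3.0, "immigration": 7.0}    # hypothetical 0-10 scales
party_positions = {
    "Party A": {"economy": 2.0, "immigration": 8.0},
    "Party B": {"economy": 8.0, "immigration": 3.0},
}
issue_salience = {"economy": 0.6, "immigration": 0.4}     # hypothetical weights

def evaluation(voter, party, weights):
    """Salience-weighted negative absolute distance across issues."""
    return -sum(weights[k] * abs(voter[k] - party[k]) for k in voter)

for name, position in party_positions.items():
    print(name, round(evaluation(voter_position, position, issue_salience), 2))
```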
Mobile coverage recently has reached an all-time high, and in most countries, high-speed Internet connections are widely available. Due to technological development, smartphones and tablets have become increasingly popular. Accordingly, we have observed an increasing use of mobile devices to complete web surveys and, hence, survey methodologists have shifted their attention to the challenges that stem from this development. The present study investigated whether the growing use of smartphones has decreased how systematically this choice of device varies between groups of respondents (i.e., how selective smartphone usage for completing web surveys is). We collected a data set of 18,520 respondents from 18 web surveys that were fielded in Germany between 2012 and 2016. Based on these data, we show that while the use of smartphones to complete web surveys has considerably increased over time, selectivity with respect to using this device has remained stable.
Scholars estimating policy positions from political texts typically code words or sentences and then build left-right policy scales based on the relative frequencies of text units coded into different categories. Here we reexamine such scales and propose a theoretically and linguistically superior alternative based on the logarithm of odds-ratios. We contrast this scale with the current approach of the Comparative Manifesto Project (CMP), showing that our proposed logit scale avoids widely acknowledged flaws in previous approaches. We validate the new scale using independent expert surveys. Using existing CMP data, we show how to estimate more distinct policy dimensions, for more years, than has been possible before, and make this dataset publicly available. Finally, we draw some conclusions about the future design of coding schemes for political texts.
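The log odds-ratio scaling the abstract refers to can be illustrated with a small calculation. The smoothing constant of 0.5 and the example counts used here are illustrative assumptions and should be checked against the article itself.

```python
# A minimal sketch of a left-right position built from the logarithm of the
# odds-ratio of right- versus left-coded text units, with 0.5 added to each
# count to avoid log(0). Counts and the smoothing constant are illustrative.
import math

def logit_scale(right_count: int, left_count: int) -> float:
    """Log odds-ratio of right- vs left-coded text units."""
    return math.log(right_count + 0.5) - math.log(left_count + 0.5)

# A hypothetical manifesto with 120 right-coded and 80 left-coded quasi-sentences.
print(round(logit_scale(120, 80), 3))  # positive values indicate a right-leaning position

# Unlike a relative-frequency difference such as (R - L) / (R + L), the logit
# scale is unbounded and stretches out differences near the extremes.
```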