Conference Paper

Organize, Then Vote: Exploring Cognitive Load in Quadratic Survey Interfaces

Article
Full-text available
How well do existing survey instruments differentiate between opinions that affect individual behavior and opinions that don't? To answer this question, we randomly assigned U.S. respondents to one of three survey instruments: Likert items (Likert), Likert items followed by personal importance items (Likert+) and Quadratic Voting for Survey Research (QVSR), which gives respondents a fixed budget to buy “favor” or “oppose” votes, with the price for each vote increasing quadratically. We find that, relative to Likert, both Likert+ and QVSR better identify people who care enough about an issue to act in opinion-congruent ways, with QVSR offering the most consistent improvement overall. Building on these results, we show how conclusions regarding the relationship between policy opinions and self-interest can differ across measurement strategies.
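The quadratic pricing rule described in this abstract is easy to make concrete. The sketch below is illustrative only; the function names and the 100-credit budget are assumptions for the example, not details of the QVSR instrument itself.

```python
def vote_cost(votes: int) -> int:
    """Cost of casting `votes` favor (positive) or oppose (negative)
    votes on one item: the price grows quadratically."""
    return votes ** 2

def remaining_budget(budget: int, allocation: dict) -> int:
    """Credits left after paying the quadratic price for each item's votes."""
    return budget - sum(vote_cost(v) for v in allocation.values())

# With a budget of 100 credits, 3 favor votes on one issue cost 9 credits
# and 4 oppose votes on another cost 16, so intense preferences are
# expensive to express and respondents must prioritize.
print(vote_cost(3))                                           # 9
print(remaining_budget(100, {"issue_a": 3, "issue_b": -4}))   # 75
```

The convexity is the point of the design: doubling the votes on one issue quadruples its cost, which pushes respondents to spread credits in proportion to how much they actually care.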
Article
Full-text available
Quadratic funding is a public good provision mechanism that satisfies desirable theoretical properties, such as efficiency under complete information, and has been gaining popularity in practical applications. We evaluate this mechanism in a setting of incomplete information regarding individual preferences and show that efficiency holds only under knife-edge conditions. We also estimate the inefficiency of the mechanism in a variety of settings and characterize circumstances in which inefficiency increases with population size. We show how these findings can be used to estimate the mechanism’s inefficiency in a wide range of situations under incomplete information.
Article
Full-text available
Information overload resulting from the ever faster growing mass of digital data makes knowledge work more and more complex. Being able to not get distracted and focus on what is currently relevant consumes valuable cognitive resources. Support by intelligent assistance software might alleviate this problem. We report two experiments that addressed this challenge by examining how context-based assistance may provide more available cognitive resources. Experiment 1 focused on work within a single context. Results indicate that external relevance classification can improve memory for content classified as currently more relevant. Experiment 2 focused on switching between two different contexts and shows that cognitive performance after context switches can be enhanced by context-specific structuring and saving a previous task status. Taken together, these results clearly demonstrate that automatic external information structuring by intelligent assistance software can protect knowledge workers from information overload by lightening their cognitive load and, thus, help improve cognitive performance.
Article
Full-text available
Surveys are a common instrument to gauge self-reported opinions from the crowd for scholars in the CSCW community, the social sciences, and many other research areas. Researchers often use surveys to prioritize a subset of given options when there are resource constraints. Over the past century, researchers have developed a wide range of surveying techniques, including one of the most popular instruments, the Likert ordinal scale [49], to elicit individual preferences. However, the challenge of eliciting accurate and rich self-reported responses with surveys in a resource-constrained context still persists today. In this study, we examine Quadratic Voting (QV), a voting mechanism that is powered by the affordances of a modern computer and straddles rating and ranking approaches [64], as an alternative online survey technique. We argue that QV could elicit more accurate self-reported responses than the Likert scale when the goal is to understand relative preferences under resource constraints. We conducted two randomized controlled experiments on Amazon Mechanical Turk, one in the context of public opinion polling and the other in a human-computer interaction user study. Based on our Bayesian analysis results, a QV survey with a sufficient amount of voice credits aligned significantly closer to participants’ incentive-compatible behaviors than a Likert scale survey did, with a medium to high effect size. In addition, we extended QV’s application scenario from typical public policy and education research to a problem setting familiar to the CSCW community: a prototypical HCI user study. Our experiment results, QV survey design, and QV interface serve as a stepping stone for CSCW researchers to further explore this surveying methodology in their studies and encourage decision-makers from other communities to consider QV as a promising alternative.
Article
Full-text available
As the amount of information online continues to grow, a correspondingly important opportunity is for individuals to reuse knowledge which has been summarized by others rather than starting from scratch. However, appropriate reuse requires judging the relevance, trustworthiness, and thoroughness of others' knowledge in relation to an individual's goals and context. In this work, we explore augmenting judgements of the appropriateness of reusing knowledge in the domain of programming, specifically of reusing artifacts that result from other developers' searching and decision making. Through an analysis of prior research on sensemaking and trust, along with new interviews with developers, we synthesized a framework for reuse judgements. The interviews also validated that developers express a desire for help with judging whether to reuse an existing decision. From this framework, we developed a set of techniques for capturing the initial decision maker's behavior and visualizing signals calculated based on the behavior, to facilitate subsequent consumers' reuse decisions, instantiated in a prototype system called Strata. Results of a user study suggest that the system significantly improves the accuracy, depth, and speed of reusing decisions. These results have implications for systems involving user-generated content in which other users need to evaluate the relevance and trustworthiness of that content.
Article
Full-text available
In four survey experiments we show that people generally answer more extremely to survey items presented in vertical versus horizontal Likert formats. Our findings suggest that this effect may be at least partly driven by differences in the visual range spanned by the response scale (i.e. the visual distance between endpoint response categories is larger in horizontal than in a vertical format). In addition, compared to traditional horizontal Likert data, vertical Likert data contain more variance, which is mainly non-substantive. As a result, data obtained with scale formats that have different distances between response categories (as is typically the case for vertical vs. horizontal formats) may lead to differences in measurement model parameter estimates like residual terms, and in some cases factor loadings and construct correlations. Based on these results, we provide recommendations on the use of response scale formats in online surveys, bearing in mind that several online survey tool providers promote the use of vertical Likert formats and even automatically change traditional horizontal formats of Likert-type items to vertical Likert formats when viewed on small screens (e.g., on mobile phones).
Conference Paper
Full-text available
This study aims to explore the feasibility of a text-based virtual agent as a new survey method to overcome the web survey's common response quality problems, which are caused by respondents' inattention. To this end, we conducted a 2 (platform: web vs. chatbot) × 2 (conversational style: formal vs. casual) experiment. We used satisficing theory to compare the responses' data quality. We found that the participants in the chatbot survey, as compared to those in the web survey, were more likely to produce differentiated responses and were less likely to satisfice; the chatbot survey thus resulted in higher-quality data. Moreover, when a casual conversational style was used, the participants were less likely to satisfice, although such effects were only found in the chatbot condition. These results imply that conversational interactivity occurs when a chat interface is accompanied by messages with effective tone. Based on an analysis of the qualitative responses, we also showed that a chatbot could perform part of a human interviewer's role by applying effective communication strategies.
Article
Full-text available
Studies of the processes underlying question answering in surveys suggest that the choice of (layout for) response categories can have a significant effect on respondent answers. In recent years, pictures, such as emojis or stars, have often been used in online communication. It is unclear if pictorial answer categories can replace traditional verbal formats as measurement instruments in surveys. In this article we investigate different versions of a Likert scale to see if they generate similar results and user experiences. Data come from the non-probability based Flitspanel in the Netherlands. The hearts and stars designs received lower average scores compared to the other formats. Smileys produced average answer scores in line with traditional radio buttons. Respondents evaluated the smiley design most positively. Grid designs were evaluated more negatively. People wanting to compare survey outcomes should be aware of these effects and only compare results when similar response formats are used.
Article
Full-text available
Cognitive load theory (CLT) applies what is known about human cognitive architecture to the study of learning and instruction, to generate insights into the characteristics and conditions of effective instruction and learning. Recent developments in CLT suggest that the human motor system plays an important role in cognition and learning; however, it is unclear whether models of working memory (WM) that are typically espoused by CLT researchers can reconcile these novel findings. For instance, often-cited WM models envision separate information processing systems—such as Baddeley and Hitch’s (1974) multicomponent model of WM—as a means to interpret modality-specific findings, although possible interactions with the human motor system remain under-explained. In this article, we examine the viability of these models to theoretically integrate recent research findings regarding the human motor system, as well as their ability to explain established CLT effects and other findings. We argue that it is important to explore alternate models of WM that focus on a single, integrated control-of-attention system that is applied to visual, phonological, embodied, and other sensory and nonsensory information. An integrated model such as this may better account for individual differences in experience and expertise and, parsimoniously, explain both recent and historical CLT findings across domains. To advance this aim, we propose an integrated model of WM that envisions a common and finite attentional resource that can be distributed across multiple modalities. How attention is mobilized and distributed across domains is interdependent, co-reinforcing, and ever-changing based on learners’ prior experience and their immediate cognitive demands. As a consequence, the distribution of attentional focus and WM resources will vary across individuals and tasks, depending on the nature of the specific task being performed; the neurological, developmental, and experiential abilities of the individual; and the current availability of internal and external cognitive resources.
Article
Full-text available
In an experiment dealing with the use of a personal computer, tablet, or mobile phone, scale points (up to 5, 7, or 11) and response formats (bars or buttons) are varied to examine differences in mean scores and nonresponse. The total number of “not applicable” answers does not vary significantly. The personal computer has the lowest item nonresponse, followed by mobile and tablet, and a lower mean score than mobile. Slider bars showed lower mean scores and more nonresponses than buttons, indicating that they are more prone to bias and more difficult to use. Slider bars, which work on a drag-and-drop principle, perform worse than visual analogue scales, which work on a point-and-click principle, and buttons. Five-point scales have more nonresponses than eleven-point scales. Respondents evaluate 11-point scales more positively than shorter scales.
Article
Full-text available
Success at university is a complex idea, with evidence that what “counts” as success is conceived differently by students and academics. This study contrasts two methodologies (“Likert-type” ordered response and quadratic voting, which does not appear to have been applied to education research previously) to identify which factors are important in university success to first year health science students. Completion (passing subjects and obtaining qualifications) and achievement (getting good grades) were the most important factors in both methodologies, but important differences were found between the two in the relative importance of four factors, particularly in the importance of a sense of belonging and personalisation of study options. Contrasting data from the two methods potentially separates factors students think are vital from those that are important but not essential—a distinction which is concealed using Likert-type instruments alone.
Article
Full-text available
Since their introduction in 1932, Likert and other continuous, independent rating scales have become the de facto toolset for survey research. Scholars have raised significant reliability and validity problems with these types of scales, and alternative methods for capturing perceptions and preferences have gained traction within specific domains. In this paper, we evaluate a new, broadly applicable approach to opinion measurement based on quadratic voting (QV), a method in which respondents express preferences by ‘buying’ votes for options using a fixed budget from which they pay quadratic prices for votes. Comparable QV-based and Likert-based survey instruments designed by Collective Decision Engines LLC were evaluated experimentally by assigning potential respondents randomly to one or the other method. Using a host of metrics, including respondent engagement and process-based metrics, we provide some initial evidence that the QV-based instrument provides a clearer measure of the preferences of the most intensely motivated respondents than the Likert-based instrument does. We consider the implications for survey satisficing, a key threat to the continued value of survey research, and discuss the mechanisms by which QV differentiates itself from Likert-based scales, thus establishing QV as a promising alternative survey tool for political and commercial research. We also explore key design issues within QV-based surveys to extend these promising results.
Article
Full-text available
We present a taxonomy of choice architecture techniques that focus on intervention design, as opposed to the underlying cognitive processes that make an intervention work. We argue that this distinction will facilitate further empirical testing and will assist practitioners in designing interventions. The framework is inductively derived from empirically tested examples of choice architecture and consists of nine techniques targeting decision information, decision structure, and decision assistance. An inter-rater reliability test demonstrates that these techniques can be used in an intersubjectively replicable way to describe sample choice architectures. We conclude by discussing limitations of the framework and key issues concerning the use of the techniques in the development of new choice architectures.
Conference Paper
Full-text available
A core tradition of HCI lies in the experimental evaluation of the effects of techniques and interfaces to determine if they are useful for achieving their purpose. However, our individual analyses tend to stand alone, and study results rarely accrue in more precise estimates via meta-analysis: in a literature search, we found only 56 meta-analyses in HCI in the ACM Digital Library, 3 of which were published at CHI (often called the top HCI venue). Yet meta-analysis is the gold standard for demonstrating robust quantitative knowledge. We treat this as a user-centered design problem: the failure to accrue quantitative knowledge is not the users' (i.e. researchers') failure, but a failure to consider those users' needs when designing statistical practice. Using simulation, we compare hypothetical publication worlds following existing frequentist practice against Bayesian practice. We show that Bayesian analysis yields more precise effects with each new study, facilitating knowledge accrual without traditional meta-analyses. Bayesian practices also allow more principled conclusions from small-n studies of novel techniques. These advantages make Bayesian practices a likely better fit for the culture and incentives of the field. Instead of admonishing ourselves to spend resources on larger studies, we propose using tools that more appropriately analyze small studies and encourage knowledge accrual from one study to the next. We also believe Bayesian methods can be adopted from the bottom up without the need for new incentives for replication or meta-analysis. These techniques offer the potential for a more user- (i.e. researcher-)centered approach to statistical analysis in HCI.
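The accrual idea above, in which each study's posterior becomes the prior for the next, can be illustrated with a minimal normal-normal conjugate update on an effect size. This is a sketch under simplifying assumptions (known sampling variance, a hypothetical starting prior), not the authors' simulation code:

```python
def update(prior_mean, prior_var, obs_mean, obs_var):
    """Conjugate normal-normal update: combine a prior belief about an
    effect size with a new study's estimate (known sampling variance).
    Precisions (inverse variances) add; means are precision-weighted."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs_mean / obs_var)
    return post_mean, post_var

# A vague prior sharpened by two small studies of the same effect;
# the study means and variances here are made-up illustration values.
mean, var = 0.0, 100.0  # weakly informative starting prior
for study_mean, study_var in [(0.5, 0.04), (0.3, 0.09)]:
    mean, var = update(mean, var, study_mean, study_var)

print(mean, var)  # the posterior narrows with each study
```

Each pass through the loop plays the role of a newly published study: no separate meta-analysis step is needed, because the posterior already pools all evidence seen so far.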
Conference Paper
Full-text available
This study explored various user interface designs to transition a two dimensional (2D) questionnaire from its paper-and-pencil testing format to the mobile platform. The current administration of the test limits its usage beyond the lab environment. Creating a mobile version would facilitate ubiquitous administration of the test. Yet, the mobile design must be at least as good as its paper-based counterpart in terms of input accuracy and user interaction efforts. We developed four user interface designs, each of which featured a specific interaction approach. These approaches included displaying the 2D space of the questionnaire in its original form (M1), inputting one variable at a time on the 2D space (M2), dissolving the 2D space into two one-dimensional ordinal scales (M3), and orienting the input selections to the diagonal axes (M4). The designs were tested by a total of 34 participants, aged 18 to 52 years. The study results find the first three interaction approaches (M1-M3) effective but the fourth approach inefficient. Furthermore, the results indicate that the two-tap designs (M2 and M3) are equally as good as the one-tap design (M1).
Article
Full-text available
This research began with a question about addressing a broader range of accessibility issues in voting than the standards in the Voluntary Voting System Guidelines (VVSG) require. The VVSG standards cover accessibility for low vision, blindness, and cognitive disabilities. But what if anyone could mark their ballot anywhere, any time, on any device? While the likelihood of voters voting on their own devices may be remote in the current elections environment, it is likely that election jurisdictions will begin to use consumer off-the-shelf devices as the voter-facing part of voting systems soon. Thus, we narrowed the scope of our research to prototyping an accessible, responsive, Web standards-compliant front end for ballot marking that would be accessible to voters with low literacy (a previously ignored voter audience) or who had mild cognitive disabilities. The final ballot interface is based on principles of “plain language” and “plain interaction.” The ballot interface is available under a Creative Commons license at anywhereballot.com. This paper reports on the rapid iterative testing and evaluation (RITE; Medlock et al., 2002) we conducted and the lessons we learned about designing a digital ballot interface for people with low literacy or mild cognitive disabilities.
Article
Full-text available
Despite the voluminous evidence in support of the paradoxical finding that providing individuals with more options can be detrimental to choice, the question of whether and when large assortments impede choice remains open. Even though extant research has identified a variety of antecedents and consequences of choice overload, the findings of the individual studies fail to come together into a cohesive understanding of when large assortments can benefit choice and when they can be detrimental to choice. In a meta-analysis of 99 observations (N = 7,202) reported by prior research, we identify four key factors—choice set complexity, decision task difficulty, preference uncertainty, and decision goal—that moderate the impact of assortment size on choice overload. We further show that each of these four factors has a reliable and significant impact on choice overload, whereby higher levels of decision task difficulty, greater choice set complexity, higher preference uncertainty, and a more prominent, effort-minimizing goal facilitate choice overload. We also find that four of the measures of choice overload used in prior research—satisfaction/confidence, regret, choice deferral, and switching likelihood—are equally powerful measures of choice overload and can be used interchangeably. Finally, we document that when moderating variables are taken into account the overall effect of assortment size on choice overload is significant—a finding counter to the data reported by prior meta-analytic research.
Article
Full-text available
1: Overview In recent years there has been an increased focus on the role of education and training, and on the effectiveness and efficiency of various instructional design strategies. Some of the most important breakthroughs in this regard have come from the discipline of Cognitive Science, which deals with the mental processes of learning, memory and problem solving. Cognitive load theory (e.g. Sweller, 1988; 1994) is an instructional theory generated by this field of research. It describes learning structures in terms of an information processing system involving long term memory, which effectively stores all of our knowledge and skills on a more-or-less permanent basis, and working memory, which performs the intellectual tasks associated with consciousness. Information may only be stored in long term memory after first being attended to, and processed by, working memory. Working memory, however, is extremely limited in both capacity and duration. These limitations will, under some conditions, impede learning. The fundamental tenet of cognitive load theory is that the quality of instructional design will be raised if greater consideration is given to the role and limitations of working memory. Since its conception in the early 1980s, cognitive load theory has been used to develop several instructional strategies which have been demonstrated empirically to be superior to those used conventionally.
Article
Full-text available
Rating interfaces are widely used on the Internet to elicit people's opinions. Little is known, however, about the effectiveness of these interfaces and their design space is relatively unexplored. We provide a taxonomy for the design space by identifying two axes: Measurement Scale for absolute rating vs. relative ranking, and Recall Support for the amount of information provided about previously recorded opinions. We present an exploration of the design space through iterative prototyping of three alternative interfaces and their evaluation. Among many findings, the study showed that users do take advantage of recall support in interfaces, preferring those that provide it. Moreover, we found that designing ranking systems is challenging; there may be a mismatch between a ranking interface that forces people to specify a total ordering for a set of items, and their mental model that some items are not directly comparable to each other.
Article
Full-text available
Studies concerning the impact of the length of response scales on the measurement of attitudes have primarily focused on the method bias associated with question format. At the same time another line of research has focused on the issue of response styles that affect how respondents answer attitude questions. So far, research has paid less attention to the issue of whether the length of the response scale is related to response styles. In this study, we explore if differences in the length of the response scale (i.e., method factor) have differential effects in evoking extreme and midpoint response style behavior (i.e., style factor). Our hypotheses read as follows. As the number of response categories increases, we expect subjects to be more likely to exert extreme response style. Furthermore, we expect subjects to be more likely to adopt a midpoint response style when they are offered a middle response category. To investigate these hypotheses we developed a split ballot experiment in which the number of response categories is manipulated from 5 to 11 categories. Data are collected by a random sample, large-scale web survey which allows for random assignment to the experimental conditions. The results show clear evidence of extreme response style and moderate evidence of midpoint response style. Extreme response style is not affected by the length of response scales, whereas midpoint response style emerged only in the longer scale versions.
Conference Paper
Scale questionnaires are psychometric tools that capture perspectives and experiences. Consequently, these tools need to be reliable and valid. In this paper, we investigate the impact of response widgets - the UI elements that allow users to answer scale items - on the overall scale reliability and construct validity of three varied length scale questionnaires in a user study (N=30). Our results reveal that optimum reliability was achieved using radio buttons and dropdowns in all varied-length questionnaires. Further, valid results were produced utilising the slider and dropdown. No significant differences were found in time consumption, but click count was significantly higher with dropdown. Radio buttons scored lower in format satisfaction than others, and dropdown was the least effective in ease of selection and quick completion. In light of these results, we conclude that response widgets are more than just aesthetics and should be selected as per the researcher’s aims.
Article
Smart speakers have become exceedingly popular and entered many people's homes due to their ability to engage users with natural conversations. Researchers have also looked into using smart speakers as an interface to collect self-reported health data through conversations. Responding to surveys prompted by smart speakers requires users to listen to questions and answer in voice without any visual stimuli. Compared to traditional web-based surveys, where users can see questions and answers visually, voice surveys may be more cognitively challenging. Therefore, to collect reliable survey data, it is important to understand what types of questions are suitable to be administered by smart speakers. We selected five common survey questionnaires and deployed them as voice surveys and web surveys in a within-subject study. Our 24 participants answered questions using voice and web questionnaires in one session. They then repeated the same study session after 1 week to provide a "retest" response. Our results suggest that voice surveys have comparable reliability to web surveys. We find that, when using 5-point or 7-point scales, voice surveys take about twice as long as web surveys. Based on objective measurements, such as response agreement and test-retest reliability, and subjective evaluations of user experience, we recommend that researchers consider adopting the binary scale and 5-point numerical scales for voice surveys on smart speakers.
Article
Online chat functions as a discussion channel for diverse social issues. However, deliberative discussion and consensus-reaching can be difficult in online chats in part because of the lack of structure. To explore the feasibility of a conversational agent that enables deliberative discussion, we designed and developed DebateBot, a chatbot that structures discussion and encourages reticent participants to contribute. We conducted a 2 (discussion structure: unstructured vs. structured) × 2 (discussant facilitation: unfacilitated vs. facilitated) between-subjects experiment (N = 64, 12 groups). Our findings are as follows: (1) Structured discussion positively affects discussion quality by generating diverse opinions within a group and resulting in a high level of perceived deliberative quality. (2) Facilitation drives a high level of opinion alignment between group consensus and independent individual opinions, resulting in authentic consensus reaching. Facilitation also drives more even contribution and a higher level of task cohesion and communication fairness. Our results suggest that a chatbot agent could partially substitute for a human moderator in deliberative discussions.
Book
US federalism grants state legislators the authority to design many aspects of election administration, including ballot features that mediate how citizens understand and engage with the choices available to them when casting their votes. Seemingly innocuous features in the physical design of ballots, such as the option to cast a straight ticket with a single checkmark, can have significant aggregate effects. Drawing on theoretical insights from behavioral economics and extensive data on state ballot laws from 1888 to the present, as well as in-depth case studies, this book shows how strategic politicians use ballot design to influence voting and elections, drawing comparisons across different periods in American history with varying levels of partisanship and contention. Engstrom and Roberts demonstrate the sweeping impact of ballot design on voting, elections, and democratic representation.
Book
The Adaptive Decision Maker argues that people use a variety of strategies to make judgments and choices. The authors introduce a model that shows how decision makers balance effort and accuracy considerations and predicts which strategy a person will use in a given situation. A series of experiments testing the model are presented, and the authors analyse how the model can lead to improved decisions and opportunities for further research.
Article
The increasing number and complexity of advanced driver assistance systems (ADAS) pave the way for fully automated driving. Automated vehicles are said to increase road safety and prevent human-caused (fatal) accidents, among other benefits. In the lower levels of automation, however, the driver is still responsible as a fallback authority. As a consequence, systems that reliably monitor the driver's state, especially the risk factor drowsiness, become increasingly essential to ensure the driver's ability to take over control from the vehicle on time. In research, the use of supervised machine learning for drowsiness detection is the prevalent method. As the ground truth for drowsiness is both application- and user-dependent, and no gold standard exists for its definition, measures are usually applied in the form of observer ratings. Also, in this work, observer ratings were investigated with regard to the required level of detail/complexity. To this end, video data, recorded within a simulator study (N = 30) comprising a 45-minute manual and a 45-minute automated driving session each, were evaluated by trained raters. Correlation analysis results show that - depending on the number of drowsiness levels - a comparable ground truth can be generated by reducing the rating frequency and thus the rating complexity by a factor of five. The knowledge gained can be used in future studies in this research area, the collection of a reliable and valid ground truth of drowsiness, as well as for improving the process in developing interactive drowsiness detection systems.
Article
An online survey, the Understanding Emoji Survey, was conducted to assess how English-speaking social media users interpret the pragmatic functions of emoji in examples adapted from public Facebook comments, based on a modified version of [15]'s taxonomy of functions. Of the responses received (N = 519; 351 females, 120 males, 48 “other”; 354 under 30, 165 over 30, age range 18--70+), tone modification was the preferred interpretation overall, followed by virtual action, although interpretations varied significantly by emoji type. Female and male interpretations were generally similar, while “other” gender respondents differed significantly in dispreferring tone and preferring multiple functions. Respondents over 30 often did not understand the functions or interpreted the emoji literally, while younger users interpreted them in more conventionalized ways. Older males were most likely, and younger females were least likely, to not understand emoji functions and to find emoji confusing or annoying, consistent with previously reported gender and age differences in attitudes toward, and frequency of, emoji use.
Article
We propose a design for philanthropic or publicly funded seeding to allow (near) optimal provision of a decentralized, self-organizing ecosystem of public goods. The concept extends ideas from quadratic voting to a funding mechanism for endogenous community formation. Citizens make contributions to public goods of value to them. The amount received by the public good is (proportional to) the square of the sum of the square roots of contributions received. Under the “standard model,” this mechanism yields first best public goods provision. Variations can limit the cost, help protect against collusion, and aid coordination. We discuss applications to campaign finance and highlight directions for future analysis and experimentation. This paper was accepted by Joshua Gans, business strategy.
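The allocation rule above has a direct arithmetic form: each public good receives, up to a proportionality constant, the square of the sum of the square roots of the contributions it attracts. A minimal sketch of that rule, assuming a proportionality constant of 1 (the function name is illustrative, not taken from the paper):

```python
from math import sqrt

def quadratic_funding_amount(contributions):
    """Amount a public good receives under quadratic funding:
    the square of the sum of the square roots of individual
    contributions (proportionality constant assumed to be 1)."""
    return sum(sqrt(c) for c in contributions) ** 2

# The rule favors broad support: 100 donors giving 1 each yield a
# larger matched amount than a single donor giving 100.
broad = quadratic_funding_amount([1.0] * 100)     # (100 * sqrt(1))^2 = 10000
concentrated = quadratic_funding_amount([100.0])  # sqrt(100)^2 = 100
```

Total individual contributions are 100 in both cases; the gap between the matched amounts (10000 vs. 100) is what an external subsidy must cover, which is one reason the paper discusses variations that limit the mechanism's cost.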
Book
Statistical Rethinking: A Bayesian Course with Examples in R and Stan builds readers’ knowledge of and confidence in statistical modeling. Reflecting the need for even minor programming in today’s model-based statistics, the book pushes readers to perform step-by-step calculations that are usually automated. This unique computational approach ensures that readers understand enough of the details to make reasonable choices and interpretations in their own modeling work. The text presents generalized linear multilevel models from a Bayesian perspective, relying on a simple logical interpretation of Bayesian probability and maximum entropy. It covers everything from the basics of regression to multilevel models. The author also discusses measurement error, missing data, and Gaussian process models for spatial and network autocorrelation. By using complete R code examples throughout, this book provides a practical foundation for performing statistical inference. Designed for both PhD students and seasoned professionals in the natural and social sciences, it prepares them for more advanced or specialized statistical modeling. Web Resource: The book is accompanied by an R package (rethinking) that is available on the author’s website and GitHub. The two core functions (map and map2stan) of this package allow a variety of statistical models to be constructed from standard model formulas.
Article
While group chat is becoming increasingly popular for team collaboration, these systems generate long streams of unstructured back-and-forth discussion that are difficult to comprehend. In this work, we investigate ways to enrich the representation of chat conversations, using techniques such as tagging and summarization, to enable users to better make sense of chat. Through needfinding interviews with 15 active group chat users, who were shown mock-up alternative chat designs, we found the importance of structured representations, including signals such as discourse acts. We then developed Tilda, a prototype system that enables people to collaboratively enrich their chat conversation while conversing. From lab evaluations, we examined the ease of marking up chat using Tilda as well as the effectiveness of Tilda-enabled summaries for getting an overview. From a field deployment, we found that teams actively engaged with Tilda both for marking up their chat as well as catching up on chat.
Thesis
This study investigated the differences between a modified version of the traditional ranking format (MTF) and a novel ranking format called the BINS format. The BINS format utilizes drag-and-drop technology to rank alternatives, allows respondents to indicate the distance between ranks, and allows respondents to assign the same rank to multiple alternatives (ties). Seventy-two participants completed two ranking tasks: a ranking of items from the Rokeach Value Survey – Form D (RVS) and a ranking of aspects according to how important they were in a participant’s decision to attend the University of Dayton (UD). Participants used the MTF to complete one ranking task, and the BINS format for the other. Four variables were examined for each ranking format: Completion Time (as recorded by a computer control system and as self-reported by participants), Usability on the System Usability Scale (SUS), Format Preference, and Number of Repositionings (as recorded by a computer control system and as self-reported by participants). Participants completed the RVS ranking task more quickly using the MTF when compared to the BINS format. There were no significant differences in completion time when participants ranked aspects related to UD. However, for both the RVS and aspects related to UD, significantly more participants self-reported that the BINS format allowed them to complete their ranking task faster than the MTF. Participants rated the BINS format as significantly more usable than the MTF. The majority of participants (78%) preferred the BINS format over the MTF. Participants reported repositioning alternatives (ranking an alternative and then re-ranking the same alternative) significantly more often using the BINS format than the MTF. There was not a significant difference in actual repositionings between the MTF and the BINS format as recorded by the computer control system.
Overall, the results of this study established that the BINS format is a clear improvement over the MTF. The BINS format outperformed the MTF on measures of usability, preference, and reported number of repositionings. Furthermore, the BINS format reduces respondent burden by displaying an ordered list of ranked alternatives throughout a ranking task. By capturing information on the distance between ranks and by permitting ties between alternatives, the BINS format allows researchers to collect rich ranking data that is also compatible with factor analytic techniques. These unique features of the BINS format make it an ideal tool for implementation in the field of electronic survey research.
Conference Paper
Voting is a global event across countries, states, and municipalities in which individuals of all abilities want to participate. To enable people with disabilities to participate, accessible voting is typically implemented by adding assistive technologies to electronic voting machines. To overcome the complexities and inequities in this practice, two interfaces were designed to provide one system for all voters: EZ Ballot, which uses a linear yes/no input system for all selections, and QUICK Ballot, which provides random-access voting through direct selection. This paper reports efficacy testing of both interfaces. The study demonstrated that voters with a range of visual abilities were able to use both ballots independently. While non-sighted voters made fewer errors on the linear ballot (EZ Ballot), partially-sighted and sighted voters completed the random-access ballot (QUICK Ballot) in less time. In addition, a higher percentage of non-sighted participants preferred the linear ballot, and a higher percentage of sighted participants preferred the random-access ballot.
Book
This textbook brings together both new and traditional research methods in Human Computer Interaction (HCI). Research methods include interviews and observations, ethnography, grounded theory and analysis of digital traces of behavior. Readers will gain an understanding of the type of knowledge each method provides, its disciplinary roots and how each contributes to understanding users, user behavior and the context of use. The background context, clear explanations and sample exercises make this an ideal textbook for graduate students, as well as a valuable reference for researchers and practitioners. 'It is an impressive collection in terms of the level of detail and variety.' (M. Sasikumar, ACM Computing Reviews #CR144066)
Article
Feature-rich software can be difficult to learn and use, and current approaches to organizing functionality do little to help users with performing unfamiliar tasks. In this paper, we investigate the potential for alternative task-centric interface designs that organize functionality around specific tasks. To understand the potential of this approach, we developed and studied Workflows, a prototype task-centric interface design. Our findings suggest that task-centric interfaces scaffold and guide the user's exploration of a subset of application functionality, and thereby help them to avoid common difficulties and inefficiencies caused by self-directed exploration of the full interface. We also found evidence that task-centric interfaces enable a different kind of application learning, in which users associate tasks with relevant keywords as opposed to low-level commands and procedures. This has potential benefits for memorability, because the keywords themselves describe the task, and scalability, because a few keywords can map to an arbitrarily large procedure.
Article
Cognitive load theory uses evolutionary theory to consider human cognitive architecture and uses that architecture to devise novel instructional procedures. The theory assumes that knowledge can be divided into biologically primary knowledge that we have evolved to acquire and biologically secondary knowledge that is important for cultural reasons. Secondary knowledge, unlike primary knowledge, is the subject of instruction. It is processed in a manner that is analogous to the manner in which biological evolution processes information. When dealing with secondary knowledge, human cognition requires a very large information store, the contents of which are acquired largely by obtaining information from other information stores. Novel information is generated by a random generate-and-test procedure, with only very limited amounts of novel information able to be processed at any given time. In contrast, very large amounts of organized information stored in the information store can be processed in order to generate complex action. This architecture has been used to generate instructional procedures, summarized in this chapter.
Article
Many claims are being made about the advantages of conducting surveys on the Web. However, there has been little research on the effects of format or design on the levels of unit and item response or on data quality. In a study conducted at the University of Michigan, a number of experiments were added to a survey of the student population to assess the impact of design features on resulting data quality. A sample of 1,602 students was sent an e-mail invitation to participate in a Web survey on attitudes toward affirmative action. Three experiments on design approaches were added to the survey application. One experiment varied whether respondents were reminded of their progress through the instrument. In a second experiment, one version presented several related items on one screen, while the other version presented one question per screen. In a third experiment, for one series of questions a random half of the sample clicked radio buttons to indicate their answers, while the other half entered a numeric response in a box. This article discusses the overall implementation and outcome of the survey, and it describes the results of the embedded design experiments.
Article
Economic models of decision making assume that people have a stable way of thinking about value. In contrast, psychology has shown that people's preferences are often malleable and influenced by normatively irrelevant contextual features. Whereas economics derives its predictions from the assumption that people navigate a world of scarce resources, recent psychological work has shown that people often do not attend to scarcity. In this article, we show that when scarcity does influence cognition, it renders people less susceptible to classic context effects. Under conditions of scarcity, people focus on pressing needs and recognize the trade-offs that must be made against those needs. Those trade-offs frame perception more consistently than irrelevant contextual cues, which exert less influence. The results suggest that scarcity can align certain behaviors more closely with traditional economic predictions.
Article
Studies have shown that voting error remains a problem with Direct Recording Electronic (DRE) voting machines. DREs have an advantage over other voting technologies by facilitating ballot verification through review screens. However, results from ballot verification studies have shown that no more than half of study participants notice review screen anomalies (Campbell & Byrne, 2009). This research replicated previous studies on anomaly detection on review screens using a multimodal voting system called Prime III. The results suggest that Prime III facilitates ballot verification, yielding a detection rate of 90% even without informing participants of the importance of ballot verification.
Article
Social values are most commonly measured using ranking techniques, but there is a scarcity of systematic comparisons between rankings and other approaches to measuring values in survey research. On the basis of data from the 1980 General Social Survey, this article evaluates the comparability of results obtained using rankings and ratings of valued qualities. The comparison focuses on (1) the ordering of aggregate value preferences and (2) the measurement of individual differences in latent value preferences. The two methods are judged to be similar with respect to ordering the aggregate preferences of the sample, but dissimilar with regard to the latent variable structure underlying the measures.