Conference Paper

Will You Accept an Imperfect AI?: Exploring Designs for Adjusting End-user Expectations of AI Systems

Authors: Rafal Kocielnik, Saleema Amershi, Paul N. Bennett

Abstract

AI technologies have been incorporated into many end-user applications. However, expectations of the capabilities of such systems vary among people. Furthermore, inflated expectations have been identified as negatively affecting the perception and acceptance of such systems. Although the intelligibility of ML algorithms has been well studied, there has been little work on methods for setting appropriate expectations before the initial use of an AI-based system. In this work, we use a Scheduling Assistant, an AI system for automated meeting request detection in free-text email, to study the impact of several methods of expectation setting. We explore two versions of this system with the same 50% accuracy of the AI component, but each designed with a different focus on the types of errors to avoid (avoiding False Positives vs. False Negatives). We show that this difference in focus can lead to vastly different subjective perceptions of accuracy and acceptance. Further, we design expectation adjustment techniques that prepare users for AI imperfections and result in a significant increase in acceptance.
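The two system versions described above can be thought of as one underlying meeting-detection classifier operated at two different decision thresholds: a conservative threshold that rarely flags an email (few False Positives, more missed meeting requests) and a liberal one that flags readily (few False Negatives, more false alarms). The sketch below illustrates that trade-off on synthetic data; it is not the authors' implementation, and the scores, thresholds, and resulting accuracies are made up rather than reproducing the paper's 50% setting.

# Illustrative sketch (not the authors' implementation): tuning one
# meeting-request classifier into a version that avoids False Positives
# or a version that avoids False Negatives by moving its decision
# threshold. All data below is synthetic and the thresholds are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: 1 = email contains a meeting request.
y_true = rng.integers(0, 2, size=1000)
# Synthetic classifier scores, loosely correlated with the labels.
scores = np.clip(0.5 * y_true + rng.normal(0.25, 0.25, size=1000), 0, 1)

def error_profile(threshold):
    y_pred = (scores >= threshold).astype(int)
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # non-meeting emails flagged
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # meeting requests missed
    acc = float(np.mean(y_pred == y_true))
    return acc, fp, fn

# A high threshold flags fewer emails -> fewer False Positives.
# A low threshold flags more emails  -> fewer False Negatives.
for name, t in [("avoid False Positives", 0.75),
                ("avoid False Negatives", 0.25)]:
    acc, fp, fn = error_profile(t)
    print(f"{name}: accuracy={acc:.2f}  FP={fp}  FN={fn}")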


... Our study isolates false positive errors, instances where something harmless is incorrectly flagged as dangerous, to examine their specific impact on trust and to maintain experimental clarity. Including false negatives in this study would have required additional conditions and potentially introduced confounds obscuring the effects of timing and severity, as both prior studies [11,30] and our initial stimuli validation showed that users responded differently to false positives and false negatives. ...
... As AI systems grow more persuasive and influential in shaping human decisions [53], error distinction becomes increasingly important. Kocielnik et al. [30] found that users respond differently to false alarms versus missed detections, suggesting that the acceptability of false positives depends heavily on the perceived cost of ignoring them. By focusing exclusively on false positives, our study aligns with high-impact, real-world use cases where these errors are both common and consequential, e.g., where operators must sift through alerts, many of which may be false and undermine trust in the system, leading to severe consequences [1,18,23,30,67]. ...
... Kocielnik et al. [30] found that users respond differently to false alarms versus missed detections, suggesting that the acceptability of false positives depends heavily on the perceived cost of ignoring them. By focusing exclusively on false positives, our study aligns with high-impact, real-world use cases where these errors are both common and consequential, e.g., where operators must sift through alerts, many of which may be false and undermine trust in the system, leading to severe consequences [1,18,23,30,67]. In high-stakes settings, false positives can lead to operational disruption or undue censorship [29], and users may respond differently to flagged harmless content than to missed threats. ...
Article
Full-text available
AI systems are increasingly used to assist decision making in high- and low-stakes domains, yet little is known about how the timing and severity of their errors affect user trust. In this mixed-methods study (n = 364), we examine how users respond to AI misclassifications in two real-world contexts: military security and social media moderation. Participants evaluated a classifier that made high- or low-severity errors at different points in a sequence (beginning, end, random, or never). We find that trust is not simply a function of accuracy, but shaped by the timing and severity of errors. Even subtle output sequencing can influence perception, especially in 'low-risk' contexts. We discuss how certain interaction design patterns, such as sequencing outputs to end on a high note, could inadvertently or deliberately shape user trust. We propose a set of preliminary design patterns and oversight strategies to help identify when user perceptions might be unintentionally distorted. By pinpointing how severity and timing shape willingness to rely on AI, this work provides practical guidance for building systems that better align with human expectations, foster user trust, promote transparency, and support regulatory oversight.
... The paper by Kocielnik et al. [7] investigates how to set appropriate expectations for AI systems among end-users, particularly focusing on an AI-powered Scheduling Assistant. It explores various techniques for expectation setting and their impact on user acceptance and satisfaction, emphasizing the importance of managing user perceptions of AI imperfections. ...
... Yang et al. [6], Kocielnik et al. [7] Managing User Expectations Misaligned expectations between user perception and actual AI capabilities lead to user dissatisfaction. Clear communication is essential. ...
... Studies by Kocielnik et al. [7] emphasize the challenge of managing user expectations in AI-driven systems. Users often have inflated expectations of what AI can achieve, leading to frustration when the system fails to meet those expectations. ...
Preprint
Full-text available
The integration of machine learning (ML) technologies into various fields necessitates user-centered design (UCD) approaches to enhance system usability and effectiveness. This scoping review explores recent literature (2019-2024) to examine how UCD principles are applied to ML systems, identifying key challenges, methodologies, gaps, and future research directions. By synthesizing insights from relevant studies, the review emphasizes the importance of aligning ML systems with user needs, managing expectations, and fostering interdisciplinary collaboration. Gaps identified include limited transparency in AI systems, interdisciplinary knowledge gaps, and a lack of domain-specific UCD methodologies. The review also suggests refining UCD processes and educational resources to ensure ML systems remain user-centric across diverse application domains.
... Kocielnik et al. [17] investigate the role of uncertainty in AI-driven assistants, proposing that systems should openly communicate confidence levels to users. Their findings suggest that interfaces capable of expressing uncertainty, rather than providing overconfident but inaccurate responses, can enhance user trust. ...
... -Intermediate (Learning): Users begin to recognize affordances and expect system responsiveness. At this stage, the interface dynamically suggests interactions based on prior user behavior [8,17]. Example: A chatbot offering reminders for frequently checked account balances. ...
... All previous considerations shared the implicit assumption that humans actually evaluate how the AI is performing its assigned task, and nothing but that. However, mismatches between AI capabilities and human expectations are common (Bach et al., 2024; Kocielnik et al., 2019). These expectations can be shaped by the knowledge, experiences, and attitudes humans hold about a particular domain. ...
... More specifically, they might shift the task definition, not evaluating performance on the assigned subtask (e.g., person detection) but on a more global task (e.g., collision avoidance). This is in line with previous findings that AI capabilities and human expectations often are misaligned (Bach et al., 2024; Kocielnik et al., 2019). ...
Preprint
Full-text available
If artificial intelligence (AI) is to be applied in safety-critical domains, its performance needs to be evaluated reliably. The present study aimed to understand how humans evaluate AI systems for person detection in automatic train operation. In three experiments, participants saw image sequences of people moving in the vicinity of railway tracks. A simulated AI had highlighted all detected people, sometimes correctly and sometimes not. Participants had to provide a numerical rating of the AI's performance and then verbally explain their rating. The experiments varied several factors that might influence human ratings: the types and plausibility of AI mistakes, the number of affected images, the number of people present in an image, the position of people relative to the tracks, and the methods used to elicit human evaluations. While all these factors influenced human ratings, some effects were unexpected or deviated from normative standards. For instance, the factor with the strongest impact was people's position relative to the tracks, although participants had explicitly been instructed that the AI could not process such information. Taken together, the results suggest that humans may sometimes evaluate more than the AI's performance on the assigned task. Such mismatches between AI capabilities and human expectations should be taken into consideration when conducting safety audits of AI systems.
... Therefore, inline suggestions, while streamlining coding, call for transparent AI behavior and meaningful explanations to mitigate "automation bias" (Al Madi, 2022) and the acceptance of incorrect code. Effective trust calibration can involve real-time feedback loops (e.g., automated test suites) or explicit disclaimers about the AI's uncertainty (Kocielnik et al., 2019). In line with HCI work on explainable AI, embedding contextual clues (e.g., highlighting AI confidence levels or providing direct links to relevant training data) may calibrate trust more effectively (Kocielnik et al., 2019). ...
... Effective trust calibration can involve real-time feedback loops (e.g., automated test suites) or explicit disclaimers about the AI's uncertainty (Kocielnik et al., 2019). In line with HCI work on explainable AI, embedding contextual clues (e.g., highlighting AI confidence levels or providing direct links to relevant training data) may calibrate trust more effectively (Kocielnik et al., 2019). Such design elements should also encourage user control, letting developers accept or reject suggestions with a clear sense of the trade-offs (Prather et al., 2024). ...
Preprint
Full-text available
The integration of Artificial Intelligence (AI) into Integrated Development Environments (IDEs) is reshaping software development, fundamentally altering how developers interact with their tools. This shift marks the emergence of Human-AI Experience in Integrated Development Environments (in-IDE HAX), a field that explores the evolving dynamics of Human-Computer Interaction in AI-assisted coding environments. Despite rapid adoption, research on in-IDE HAX remains fragmented, which highlights the need for a unified overview of current practices, challenges, and opportunities. To provide a structured overview of existing research, we conduct a systematic literature review of 89 studies, summarizing current findings and outlining areas for further investigation. Our findings reveal that AI-assisted coding enhances developer productivity but also introduces challenges, such as verification overhead, automation bias, and over-reliance, particularly among novice developers. Furthermore, concerns about code correctness, security, and maintainability highlight the urgent need for explainability, verification mechanisms, and adaptive user control. Although recent advances have driven the field forward, significant research gaps remain, including a lack of longitudinal studies, personalization strategies, and AI governance frameworks. This review provides a foundation for advancing in-IDE HAX research and offers guidance for responsibly integrating AI into software development.
... They posit that AI models must have the ability to be simulated, be decomposable, and have explainable algorithms. Kocielnik et al. [40] explore design thinking approaches to make AI-based scheduling assistants more usable and comprehensible, e.g., they [40] design example-based explanations for a few representative predictions. Liu et al. [8] visualise examples of Decision Trees, LIME, GNNExplainer, and Gradient Visualisations to explain the working principles of AI models. ...
Preprint
Full-text available
The rapid integration of AI in products, services, and innovation processes has enabled transformative applications, raising global concerns about the trustworthiness of AI features and the corresponding development processes. In this paper, we provide a perspective on how design and innovation processes can be adapted to ensure the trustworthiness of AI-centric artefacts. We review generic recommendations for trustworthy AI provided by various organisations and scholars. By leveraging the “double-hump” model of data-driven innovation, we explain and illustrate how trustworthy AI could be integrated into the design and innovation processes. We then propose research directions, data, and methods that could help gather an empirical understanding of trustworthiness and thus lead to an assessment of existing AI artefacts for trustworthiness. Since there is a disparity among domains and organisations in terms of AI-related risk and maturity, we expect that the proposed process model and the assessment methods could contribute towards a reliable road map for the development and assessment of trustworthy AI.
... It highlights the importance of forming a correct mental model of the AI's limitations, so users know when to trust or distrust the AI [13]. However, studies found that flawed mental models exist and could result in unexpected harmful consequences and negatively affect human-AI collaboration [11,45]. ...
... For example, Kocielnik et al. [45] found that most people do not expect AI to behave inconsistently or imperfectly, which can prevent them from accurately evaluating and overseeing the system's performance. In contrast, Alexander et al. [11] found that people tend to have lower tolerance for mistakes made by AI and are less likely to continue using the system if it falters. ...
Preprint
Full-text available
Language model (LM) agents that act on users' behalf for personal tasks (e.g., replying emails) can boost productivity, but are also susceptible to unintended privacy leakage risks. We present the first study on people's capacity to oversee the privacy implications of the LM agents. By conducting a task-based survey (N=300), we investigate how people react to and assess the response generated by LM agents for asynchronous interpersonal communication tasks, compared with a response they wrote. We found that people may favor the agent response with more privacy leakage over the response they drafted or consider both good, leading to an increased harmful disclosure from 15.7% to 55.0%. We further identified six privacy profiles to characterize distinct patterns of concerns, trust, and privacy preferences in LM agents. Our findings shed light on designing agentic systems that enable privacy-preserving interactions and achieve bidirectional alignment on privacy preferences to help users calibrate trust.
... Additionally, this work emphasizes the importance of reducing time and physical labor to improve the efficiency and ergonomics of the scanning application, showcasing the potential for robotic systems to ease physical strain and streamline workflows. This is especially important in an environment where innovative features often gain more attention than their actual functionality [7]. ...
Article
Full-text available
This study evaluated the usability and effectiveness of robotic platforms working together with foresters in the wild on forest inventory tasks using LiDAR scanning. Emphasis was on the Universal Access principle, ensuring that robotic solutions are not only effective but also environmentally responsible and accessible for diverse users. Three robotic platforms were tested: Boston Dynamics Spot, AgileX Scout, and Bunker Mini. Spot’s quadrupedal locomotion struggled in dense undergrowth, leading to frequent mobility failures and a System Usability Scale (SUS) score of 78 ± 10. Its short battery life and complex recovery processes further limited its suitability for forest operations without substantial modifications. In contrast, the wheeled AgileX Scout and tracked Bunker Mini demonstrated superior usability, each achieving a high SUS score of 88 ± 5. However, environmental impact varied: Scout’s wheeled design caused minimal disturbance, whereas Bunker Mini’s tracks occasionally damaged young vegetation, highlighting the importance of gentle interaction with natural ecosystems in robotic forestry. All platforms enhanced worker safety, reduced physical effort, and improved LiDAR workflows by eliminating the need for human presence during scans. Additionally, the study engaged forest engineering students, equipping them with hands-on experience in emerging robotic technologies and fostering discussions on their responsible integration into forestry practices. This study lays a crucial foundation for the integration of Artificial Intelligence (AI) into forest robotics, enabling future advancements in autonomous perception, decision-making, and adaptive navigation. By systematically evaluating robotic platforms in real-world forest environments, this research provides valuable empirical data that will inform AI-driven enhancements, such as machine learning-based terrain adaptation, intelligent path planning, and autonomous fault recovery. Furthermore, the study holds high value for the international research community, serving as a benchmark for future developments in forestry robotics and AI applications. Moving forward, future research will build on these findings to explore adaptive remote operation, AI-powered terrain-aware navigation, and sustainable deployment strategies, ensuring that robotic solutions enhance both operational efficiency and ecological responsibility in forest management worldwide.
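For context on the usability figures reported above (SUS scores of 78 ± 10 and 88 ± 5), the System Usability Scale is computed from a standard 10-item questionnaire answered on a 1-5 Likert scale. The sketch below shows that standard scoring procedure; the example responses are invented and are not data from this study.

# Standard System Usability Scale (SUS) scoring, shown for context on the
# scores reported above. The example responses are made up.

def sus_score(responses):
    """responses: list of 10 integers in [1, 5], questionnaire item 1 first."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered negatively.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

# One hypothetical participant rating a platform quite favourably.
print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 4, 1]))  # -> 85.0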
... Several authors refer to this type of explanation as human-centered explanations [49,41,28]. The use of domain knowledge to increase the interpretability of intelligent systems has been studied in different works [50,51,52,53,54]. ...
Preprint
Full-text available
The growing application of artificial intelligence in sensitive domains has intensified the demand for systems that are not only accurate but also explainable and trustworthy. Although explainable AI (XAI) methods have proliferated, many do not consider the diverse audiences that interact with AI systems: from developers and domain experts to end-users and society. This paper addresses how trust in AI is influenced by the design and delivery of explanations and proposes a multilevel framework that aligns explanations with the epistemic, contextual, and ethical expectations of different stakeholders. The framework consists of three layers: algorithmic and domain-based, human-centered, and social explainability. We highlight the emerging role of Large Language Models (LLMs) in enhancing the social layer by generating accessible, natural language explanations. Through illustrative case studies, we demonstrate how this approach facilitates technical fidelity, user engagement, and societal accountability, reframing XAI as a dynamic, trust-building process.
... HCI research has explored how to improve ideation of AI concepts that are technically feasible and desired by users [31,36,65]. This addresses two of the four main reasons AI projects fail. ...
Conference Paper
Full-text available
Innovators transform the world by understanding where services are successfully meeting customers' needs and then using this knowledge to identify failsafe opportunities for innovation. Pre-trained models have changed the AI innovation landscape, making it faster and easier to create new AI products and services. Understanding where pre-trained models are successful is critical for supporting AI innovation. Unfortunately, the hype cycle surrounding pre-trained models makes it hard to know where AI can really be successful. To address this, we investigated pre-trained model applications developed by HCI researchers as a proxy for commercially successful applications. The research applications demonstrate technical capabilities, address real user needs, and avoid ethical challenges. Using an artifact analysis approach, we categorized capabilities, opportunity domains, data types, and emerging interaction design patterns, uncovering some of the opportunity space for innovation with pre-trained models.
... For example, Kocielnik et al. [100] found that presenting AI explanations upfront could lower user expectations of AI performance, thereby reducing dissatisfaction during AI breakdowns. However, this emotional remedy may not be effective when explanations are not perceived as practically helpful. ...
Preprint
Full-text available
This paper explores potential benefits of incorporating Rhetorical Design into the design of Explainable Artificial Intelligence (XAI) systems. While XAI is traditionally framed around explaining individual predictions or overall system behavior, explanations also function as a form of argumentation, shaping how users evaluate system perceived usefulness, credibility, and foster appropriate trust. Rhetorical Design offers a useful framework to analyze the communicative role of explanations between AI systems and users, focusing on: (1) logical reasoning conveyed through different types of explanations, (2) credibility projected by the system and its developers, and (3) emotional resonance elicited in users. Together, these rhetorical appeals help us understand how explanations influence user perceptions and facilitate AI adoption. This paper synthesizes design strategies from prior XAI work that align with these three rhetorical appeals and highlights both opportunities and challenges of integrating rhetorical design into XAI design.
... • Contextual awareness: The option to consider details surrounding an ad rejection, including the advertiser's past history, the type of business model and vertical, and the specific policies violated in the given situation [31]. ...
Article
Full-text available
The combination of Artificial Intelligence (AI) and advances in technology in advertising platforms has changed the way policy enforcement works. AI offers scalability and improves efficiency in preventing bad actors from encroaching on systems while protecting users from harmful content; however, this brings a new challenge in terms of opacity for legitimate advertisers. Explainable AI (XAI) offers a potential solution by providing some level of transparency into AI-driven enforcement and decision making. This paper discusses the benefits of XAI in ad enforcement: helping advertisers understand and correct their policy violations, and improving user trust through better explanations of ad disapprovals and recommendations to fix them. Alongside the advantage of increased transparency, there are potential risks to consider, such as the exploitation of this information by malicious actors. The paper proposes a tiered transparency framework for balancing transparency and protection in the implementation of XAI in the digital advertising landscape.
... Additionally, people might have clear expectations for how AIs should behave (e.g., as helpful and competent assistants), such that this schema collapses if one AI deviates. People also expect AIs to be more competent and to act less immorally than humans [1,31,50,59], which could make the negative action more surprising. This might especially be the case in organizational contexts if AI systems are expected to perform to a minimum competency standard in order for the organization to opt into using them. ...
Article
Full-text available
Robots and other artificial intelligence (AI) systems are widely perceived as moral agents responsible for their actions. As AI proliferates, these perceptions may become entangled via the moral spillover of attitudes towards one AI to attitudes towards other AIs. We tested how the seemingly harmful and immoral actions of an AI or human agent spill over to attitudes towards other AIs or humans in two preregistered experiments. In Study 1 (N = 720), we established the moral spillover effect in human-AI interaction by showing that immoral actions increased attributions of negative moral agency (i.e., acting immorally) and decreased attributions of positive moral agency (i.e., acting morally) and moral patiency (i.e., deserving moral concern) to both the agent (a chatbot or human assistant) and the group to which they belong (all chatbot or human assistants). There was no significant difference in the spillover effects between the AI and human contexts. In Study 2 (N = 684), we tested whether spillover persisted when the agent was individuated with a name and described as an AI or human, rather than specifically as a chatbot or personal assistant. We found that spillover persisted in the AI context but not in the human context, possibly because AIs were perceived as more homogeneous due to their outgroup status relative to humans. This asymmetry suggests a double standard whereby AIs are judged more harshly than humans when one agent morally transgresses. With the proliferation of diverse, autonomous AI systems, HCI research and design should account for the fact that experiences with one AI could easily generalize to perceptions of all AIs and negative HCI outcomes, such as reduced trust.
... To evaluate team effectiveness (Hackman, 1978;O'Neill, Flathmann, McNeese, & Salas, 2023), we measure the HATs' performance in terms of their ability in a cyberdefense scenario that requires agents to prevent attacks and resolve network issues. In addition to performance in this task, we also measure human perception of agents in terms of trustworthiness and cooperativeness in a post-experiment questionnaire (Kocielnik, Amershi, & Bennett, 2019;Ragot, Martin, & Cojean, 2020). To better understand how different types of autonomous agents lead to different team outcomes, we measure three collaborative metrics during the teamwork process based on the dynamics of the cyberdefense task. ...
Article
Full-text available
Autonomous agents are becoming increasingly prevalent and capable of collaborating with humans on interdependent tasks as teammates. There is increasing recognition that human-like agents might be natural human collaborators. However, there has been limited work on designing agents according to the principles of human cognition or in empirically testing their teamwork effectiveness. In this study, we introduce the Team Defense Game (TDG), a novel experimental platform for investigating human-autonomy teaming in cyber defense scenarios. We design an agent that relies on episodic memory to determine its actions (Cognitive agent) and compare its effectiveness with two types of autonomous agents: one that relies on heuristic reasoning (Heuristic agent) and one that behaves randomly (Random agent). These agents are compared in a human-autonomy team (HAT) performing a cyber-protection task in the TDG. We systematically evaluate how autonomous teammates’ abilities and competence impact the team’s interaction and outcomes. The results revealed that teams with Cognitive agents are the most effective partners, followed by teams with Heuristic and Random agents. Evaluation of collaborative team process metrics suggests that the cognitive agent is more adaptive to individual play styles of human teammates, but it is also inconsistent and less predictable than the Heuristic agent. Competent agents (Cognitive and Heuristic agents) require less human effort but might cause over-reliance. A post-experiment questionnaire showed that competent agents are rated more trustworthy and cooperative than Random agents. We also found that human participants’ subjective ratings correlate with their team performance, and humans tend to take the credit or responsibility for the team. Our work advances HAT research by providing empirical evidence of how the design of different autonomous agents (cognitive, heuristic, and random) affect team performance and dynamics in cybersecurity contexts. We propose that autonomous agents for HATs should possess both competence and human-like cognition while also ensuring predictable behavior or clear explanations to maintain human trust. Additionally, they should proactively seek human input to enhance teamwork effectiveness.
... However, it has been shown empirically that people perceive information from AI differently (Gelles et al., 2018;Kaur et al., 2022;Kocielnik et al., 2019;Lee & See, 2004;Meyer & Remisch, 2021;Ribeiro et al., 2016;Veletsianos et al., 2007;Zuboff, 1989). Combined with tendencies to anthropomorphize computers in general (Nass et al., 1994), and AI systems particularly (Proudfoot, 2011), humans may treat AI as more intelligent, which could lead to suboptimal decision outcomes. ...
Thesis
Full-text available
Within the decision space that has traditionally been reserved for humans, artificial intelligence (AI) is set to take over a number of tasks. In response, human decision-makers interacting with AI systems may have difficulty forming trust around such AI-generated information. Decision-making is currently conceptualized as a constructive process of evidence accumulation. However, this constructive process may evolve differently depending on how such interactions are engineered. The purpose of this study is to investigate how trust evolves temporally through intermediate judgments on AI-provided advice. In an online experiment (N=192), trust was found to oscillate over time, and eliciting an intermediate judgment on AI-provided advice exhibited a bolstering effect. Additionally, the study revealed that participants exhibited violations of total probability that current modeling techniques are unable to capture. Therefore, an approach using quantum open system modeling, representing trust as a function of time with a single probability distribution, is shown to improve modeling of trust in an AI system over traditional Markovian techniques. The results of this study should improve AI system behaviors that may help steer a human’s preference toward more Bayesian-optimal rationality, which is useful in time-critical decision-making scenarios in complex task environments.
... However, the fact that a system is actually trustworthy does not seem to be sufficient to develop high levels of perceived trustworthiness [34,39,102]. Conversely, systems that actually lack trustworthiness may still be perceived as trustworthy [84,112,122]. Relatedly, different ... [Footnote 1: In the remainder of this paper, we use the term (AI) system to refer to any kind of semi-automated and automated decision systems. This includes systems that produce software-based decisions such as embedded and cyber-physical systems (e.g., robots), algorithmic systems, and data-based systems (e.g., machine learning or deep neural networks).] ...
... However, human-centered AI design presents two key challenges. The first involves the unique properties of AI as a design material, such as explainability [21], interpretability [1], trust [80], user control [44], and managing user expectations [41]. The second challenge is designers' understanding of AI capabilities. ...
Preprint
Full-text available
AI offers key advantages such as instant generation, multi-modal support, and personalized adaptability - potential that can address the highly heterogeneous communication barriers faced by people with aphasia (PWAs). We designed AI-enhanced communication tools and used them as design probes to explore how AI's real-time processing and generation capabilities - across text, image, and audio - can align with PWAs' needs in real-time communication and preparation for future conversations respectively. Through a two-phase "Research through Design" approach, eleven PWAs contributed design insights and evaluated four AI-enhanced prototypes. These prototypes aimed to improve communication grounding and conversational agency through visual verification, grammar construction support, error correction, and reduced language processing load. Despite some challenges, such as occasional mismatches with user intent, findings demonstrate how AI's specific capabilities can be advantageous in addressing PWAs' complex needs. Our work contributes design insights for future Augmentative and Alternative Communication (AAC) systems.
... II. XAI: EXPLAINABLE ARTIFICIAL INTELLIGENCE XAI has been applied across various fields [6,52,87,92] to interpret the black-box nature of AI models, including areas like Affective Computing [48], HCI [2,15,63,73,74,129,149,152,159], NLP [7,95,117,147,156], recommender systems [32,50,53,62], and social and cognitive sciences [18,24,104,131,138]. XAI has also been explored in philosophy [13,67,105,130], folk-psychological approaches [28,29,140,141], journalism [126], and across different modalities [137]. ...
Conference Paper
Figure: A robot arm explaining to a human how it detected an error using social signals, employing post-hoc explanations to communicate the predictions of an error detection deep-learning model. The study explored three explanation types: why-explanations (global feature importance), how-explanations (local feature contributions), and what-if-explanations (counterfactual scenarios). The figure shows a what-if explanation, illustrating how changes in the user's facial expressions could alter the prediction.

Abstract: As black-box AI systems become increasingly complex, understanding when and how to provide explanations to users is crucial. Multimodal signals, such as facial expressions, offer novel insights into how frequently explanations should be given. This paper explores whether users' facial features can help estimate the need for explanations in a collaborative robot task. We applied three state-of-the-art eXplainable AI (XAI) methods, addressing how, why, and what-if questions, explaining the robot's failure detection model. Each explanation type conveyed information differently: how-explanations described how the model functions, why-explanations provided personalised insights into input-feature-related cues, and what-if-explanations explored alternative scenarios. In a mixed-design study (N = 33), participants performed a robot-assisted pick-and-place task, receiving different explanation types. Our results show that users responded differently to these explanations, with why-explanations being the most preferred and prompting closer alignment in facial expressions with the robot. Contrary to expectations, what-if explanations led to the least alignment and required greater vocal effort. These findings demonstrate how non-verbal cues can guide the frequency and type of explanations (personalised or general) and further highlight the importance of model transparency in human-robot collaboration.
... AI can be wrong in explaining religious content because AI itself is a product that was created to think like humans, but fundamentally cannot think like humans. AI is just a technology that collects data based on certain algorithms and then provides material based on the data stored and the algorithms used [57]. Given this, it does not rule out that AI could pose a threat to students' misconceptions about Islamic religious education material. ...
Article
Full-text available
Artificial Intelligence (AI) has emerged as a transformative force in education, including Islamic religious education, where it offers new opportunities to enhance learning methodologies. This study aims to analyze the integration of AI in Islamic education by evaluating its strengths, weaknesses, opportunities, and threats (SWOT analysis) while categorizing its impact on cognitive, affective, and psychomotor domains based on Bloom’s Taxonomy. A qualitative approach using library research methodology was employed, with data collected from academic journals, books, and research reports, analyzed through a SWOT framework. The findings indicate that AI significantly enhances cognitive and psychomotor learning in Islamic education. AI-based tools, such as ClassPoint AI, AI Chatbots, and Squirrel AI, contribute to knowledge retention, adaptive learning, and skill-based training in areas like Quranic recitation, prayer practices, and Islamic jurisprudence. However, AI remains limited in fostering affective learning, as it lacks human emotional intelligence and the ability to provide moral and ethical guidance, which are essential in Islamic education. The study also reveals challenges such as ethical concerns, technological disparities, and socio-cultural resistance in integrating AI into religious studies. Despite these limitations, AI presents significant opportunities, particularly in remote learning, personalized education, and accessibility for underserved communities. This research provides a structured evaluation of AI’s role within Bloom’s Taxonomy, offering insights into AI’s potential and limitations in Islamic education. The study contributes theoretically by linking AI-driven education with pedagogical principles, while practically, it guides educators and policymakers in strategically implementing AI while preserving Islamic ethical values. The study concludes that while AI enhances knowledge acquisition and skill-based learning, human educators remain essential for moral and ethical development. Future research should focus on developing ethical AI models, hybrid AI-human teaching approaches, and AI-driven affective learning systems to bridge gaps in AI-assisted moral and spiritual education.
... While our study did not surface such occurrences-likely because CodeA11y's suggestions were tied directly to verified issues from an accessibility checker, rather than the tool identifying issues on its own-this risk becomes more salient as AI assistants evolve to more proactively identify and address accessibility problems. Still, prior research suggests that developers often tolerate false positives more readily than false negatives [39], reasoning that overly cautious guidance from an assistant is less harmful than failing to flag genuine accessibility issues. Indeed, even standalone accessibility checkers, which are also known for producing false positives [34], have been widely adopted due to their overall beneficial effect on UI quality. ...
Preprint
Full-text available
A persistent challenge in accessible computing is ensuring developers produce web UI code that supports assistive technologies. Despite numerous specialized accessibility tools, novice developers often remain unaware of them, leading to ~96% of web pages containing accessibility violations. AI coding assistants, such as GitHub Copilot, could offer potential by generating accessibility-compliant code, but their impact remains uncertain. Our formative study with 16 developers without accessibility training revealed three key issues in AI-assisted coding: failure to prompt AI for accessibility, omitting crucial manual steps like replacing placeholder attributes, and the inability to verify compliance. To address these issues, we developed CodeA11y, a GitHub Copilot Extension that suggests accessibility-compliant code and displays manual validation reminders. We evaluated it through a controlled study with another 20 novice developers. Our findings demonstrate its effectiveness in guiding novice developers by reinforcing accessibility practices throughout interactions, representing a significant step towards integrating accessibility into AI coding assistants.
... Additionally, the degree of imperfection must be carefully balanced, as people might naturally expect AI agents to display intelligence, and continuous errors could decrease people's acceptance of them. To address this, a possible approach is to adjust people's expectations: preparing them for AI imperfections and enhancing their acceptance of such AI [29]. ...
Preprint
Full-text available
Prosocial behaviors, such as helping others, are well-known to enhance human well-being. While there is a growing trend of humans helping AI agents, it remains unclear whether the well-being benefits of helping others extend to interactions with non-human entities. To address this, we conducted an experiment (N = 295) to explore how helping AI agents impacts human well-being, especially when the agents fulfill human basic psychological needs--relatedness, competence, and autonomy--during the interaction. Our findings showed that helping AI agents reduced participants' feelings of loneliness. When AI met participants' needs for competence and autonomy during the helping process, there was a further decrease in loneliness and an increase in positive affect. However, when AI did not meet participants' need for relatedness, participants experienced an increase in positive affect. We discuss the implications of these findings for understanding how AI can support human well-being.
... While predictive AI systems are powerful, they are seldom perfect [58]. Transparency and accountability issues prevent deep learning-based AI systems from automation in high-stakes applications like medical diagnosis [21]. ...
Preprint
Full-text available
Explainable artificial intelligence (XAI) methods are being proposed to help interpret and understand how AI systems reach specific predictions. Inspired by prior work on conversational user interfaces, we argue that augmenting existing XAI methods with conversational user interfaces can increase user engagement and boost user understanding of the AI system. In this paper, we explored the impact of a conversational XAI interface on users' understanding of the AI system, their trust, and reliance on the AI system. In comparison to an XAI dashboard, we found that the conversational XAI interface can bring about a better understanding of the AI system among users and higher user trust. However, users of both the XAI dashboard and conversational XAI interfaces showed clear overreliance on the AI system. Enhanced conversations powered by large language model (LLM) agents amplified over-reliance. Based on our findings, we reason that the potential cause of such overreliance is the illusion of explanatory depth that is concomitant with both XAI interfaces. Our findings have important implications for designing effective conversational XAI interfaces to facilitate appropriate reliance and improve human-AI collaboration. Code can be found at https://github.com/delftcrowd/IUI2025_ConvXAI
... Machine learning algorithms have facilitated innovation in business efficiency and sustainability [1]. However, many of these algorithms require complex training and validation pipelines, which can lead to AI models, interchangeably referred to as cognitive services, that are not suitable for production or real environments [2]. For example, in medical imaging, satellite, and precision agriculture applications, researchers have proposed numerous methods for constructing AI models capable of making inferences [3] [4] without considering the fact that these trained models can later be useful to other individuals or organizations. ...
... Additionally, the Human-AI collaboration approach we propose is likely to improve acceptance among medical professionals compared to a fully automated system without human input 33 . Human supervision and refinement enhance interpretability and empower human raters by incorporating their expert judgment, ensuring that the AI complements rather than replaces the human element in surgical training. ...
Article
Full-text available
Formative verbal feedback during live surgery is essential for adjusting trainee behavior and accelerating skill acquisition. Despite its importance, understanding optimal feedback is challenging due to the difficulty of capturing and categorizing feedback at scale. We propose a Human-AI Collaborative Refinement Process that uses unsupervised machine learning (Topic Modeling) with human refinement to discover feedback categories from surgical transcripts. Our discovered categories are rated highly for clinical clarity and are relevant to practice, including topics like “Handling and Positioning of (tissue)” and “(Tissue) Layer Depth Assessment and Correction [during tissue dissection].” These AI-generated topics significantly enhance predictions of trainee behavioral change, providing insights beyond traditional manual categorization. For example, feedback on “Handling Bleeding” is linked to improved behavioral change. This work demonstrates the potential of AI to analyze surgical feedback at scale, informing better training guidelines and paving the way for automated feedback and cueing systems in surgery.
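As a rough illustration of the unsupervised half of such a pipeline (the step that precedes human refinement), the sketch below fits a small topic model to a handful of invented feedback sentences and prints the top words per topic for expert review. It is a generic scikit-learn example, not the authors' implementation; the transcripts, the number of topics, and all parameter values are placeholders.

# Generic sketch: fit a topic model to feedback transcripts, then surface
# the top words per topic for human experts to review, merge, and relabel.
# Not the authors' pipeline; transcripts and parameters are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

transcripts = [
    "grab the tissue a little more gently and reposition your grasper",
    "go a bit deeper with the dissection, you are in the wrong layer",
    "watch the bleeding here, apply pressure before you continue",
    "slow down and position the tissue so you can see the correct plane",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(transcripts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top_words)}")  # handed to raters for refinement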
Article
AI increasingly assumes complex roles in Human-AI Teaming (HAT). However, communication and trust issues between humans and AI often hinder effective collaboration within HATs, highlighting a need for effective human-centered team training, an area significantly understudied. To address this gap, we interviewed eSports athletes and team-based, competitive gamers (N=22), a group experienced in HATs and team training, about their HAT team training needs and desires. Through the lens of Quantitative Ethnography (QE), we analyzed their insights to understand preferred team training strategies and the desired roles of AI within these strategies, considering the varying levels of human expertise. Our findings reveal a strong preference across all expertise levels for cross-training, which is training in other teammate roles, to improve perspective taking and coordination in HATs. Less experienced participants prefer structured procedural training, while experts favor self-correction methods for growth. Additionally, participants desired that AI act as a companion, with beginners and intermediates valuing AI's functional roles, and experts seeking AI in a coaching role. Among the first to emphasize human-centered team training in HATs, this study contributes to CSCW/HCI research by revealing varied preferences for training and AI roles, emphasizing the need to tailor these aspects to team dynamics and individual skills for better outcomes in HATs.
Article
The advancements of Large Language Models (LLMs) have decentralized the responsibility for the transparency of AI usage. Specifically, LLM users are now encouraged or required to disclose the use of LLM-generated content for varied types of real-world tasks. However, an emerging phenomenon, users' secret use of LLMs, raises challenges in ensuring end users adhere to the transparency requirement. Our study used a mixed-methods approach, with an exploratory survey (125 real-world secret use cases reported) and a controlled experiment among 300 users, to investigate the contexts and causes behind the secret use of LLMs. We found that such secretive behavior is often triggered by certain tasks, transcending demographic and personality differences among users. Task types were found to affect users' intentions to engage in secretive behavior, primarily through influencing perceived external judgment regarding LLM usage. Our results yield important insights for future work on designing interventions to encourage more transparent disclosure of the use of LLMs or other AI technologies.
Chapter
In this chapter, I lay out the ten most significant causes of the machine penalty, along with research evidence supporting each. First, there are the five human–AI comparisons: appearance, identity, behavior, mind, and essence. The penalty increases or is more likely when AI look less like humans, are compared to human experts, make mistakes, have lower levels of perceived mind, or lack some essential human capacity. Second, there are five situations which also influence the penalty: whether the situation is controllable, personal, important, subjective, or moral. The penalty increases or is more likely in situations where the perceiver desires more autonomy, the outcome is personal or personalized, the outcome is less important in general, or the situation involves subjective, moral, or human well-being decisions.
Article
Full-text available
The practical use of AI technologies with user interactions (e.g., in the form of self-service applications in consulting) requires users to be able to understand and comprehend the results generated. A knowledge graph-based approach to process analyses with interactive machine learning methods identifies weaknesses and suitable improvement measures in business processes. In order to present the analysis results in a user-understandable way, e.g., for consulting clients, and to enable verification and corrections by expert users, an explainable and user-friendly interface is required. While many explainable AI researchers deal with computational aspects of generating explanations, there is less research on the design of eXplanation User Interfaces (XUI). In this paper, a systematic literature review identifies 41 XUIs for interactive machine learning, deriving design components and summarizing them in a design catalog, which forms the basis for specifying requirements on these interfaces. For evaluation purposes, requirements objectives regarding an XUI for the knowledge graph-based approach of process analysis were defined and specified with the help of selected design components from the design catalog. The requirements specifications were afterwards implemented and demonstrated using an example process. An evaluation with process analysts and consultants shows that it depends not on a high number of implemented design components, but rather on a careful selection of different forms of explanation (e.g., visual, textual) for both local and global explanation content in order to present analysis results in a comprehensible and understandable way. XUIs with interaction functions for verifying and correcting analysis results increase the willingness to use AI systems. This can help to improve the acceptance of AI technologies in day-to-day consulting for both consultants and their clients.
Article
Full-text available
Objectives: This study aims to explore the role of artificial intelligence (AI) in alleviating consumer anxiety and enhancing customer experience across various industries. It seeks to analyze AI-driven tools and their effectiveness in mitigating consumer concerns while addressing the ethical and legal dimensions of their application. Theoretical Framework: The research is grounded in theoretical principles related to AI, consumer behavior, and trust-building mechanisms in customer relationships. It examines key conceptual frameworks governing AI's role in reducing consumer stress and fostering positive interactions. Method: This study employs a theoretical investigation, drawing from existing literature on AI applications in customer service. It critically analyzes AI-powered tools such as chatbots, sentiment analysis, and predictive analytics, evaluating their impact on consumer anxiety and trust. Results and Discussion: The findings highlight AI's potential in addressing consumer anxiety through personalized interactions and predictive solutions. AI-driven tools enhance customer support efficiency and responsiveness, ultimately improving consumer confidence. However, the study also underscores ethical and legal challenges, including consumer rights protection, corporate accountability, and compliance with ethical guidelines. Research Implications: This research provides valuable insights for businesses seeking to integrate AI into customer relations strategies. It offers a framework for developing AI-based solutions that foster trust, reduce stress, and ensure ethical compliance. Originality/Value: By combining AI-driven consumer anxiety reduction with ethical and legal considerations, this study presents a comprehensive approach to responsible AI deployment in customer relations. It serves as a guide for businesses aiming to balance innovation with consumer protection.
Article
Full-text available
This paper investigates the impact of personalized AI communication on clinical outcomes in breast cancer diagnosis. Our study examines how different AI communication styles influence diagnostic performance, workload, and trust among clinicians, focusing on imaging diagnosis. We engaged 52 clinicians, categorized as interns, juniors, middles, and seniors, who diagnosed patient cases using conventional and assertiveness-based AI communication. Results show that personalized AI communication reduced diagnostic time by a factor of 1.38 for interns and juniors, and by a factor of 1.37 for middle and senior clinicians, without compromising accuracy. Interns and juniors reduced their diagnostic errors by 39.2% with a more authoritative agent, while middle and senior clinicians saw a 5.5% reduction with a more suggestive agent. Clinicians preferred assertiveness-based AI agents for their clarity and competence, valuing detailed, contextual explanations over numerical outputs. These findings underscore the need for adaptable AI communication to build trust, reduce cognitive load, and streamline clinical workflows. This work offers valuable insights for designing effective AI systems in high-stakes domains, contributing significantly to the Human-Computer Interaction community by enhancing our understanding of AI-mediated clinical support.
Article
Selecting an effective utterance among countless possibilities that match a user's intention poses a challenge when using natural language interfaces. To address the challenge, we leveraged the principle of least collaborative effort in communication grounding theory and designed three grounded conversational interactions: 1) a grounding interface allows users to start with a provisional input and then invite a conversational agent to complete their input, 2) a multiple grounding interface presents multiple inputs for the user to select from, and 3) a structured grounding interface guides users to write inputs in a structure best understood by the system. We compared our three grounding interfaces to an ungrounded control interface in a crowdsourced study (N=80) using a natural language system that generates small programs. We found that the grounding interfaces reduced cognitive load and improved task performance. The structured grounding interface further reduced speaker change costs and improved technology acceptance, without sacrificing the perception of control. We discuss the implications of designing grounded conversational interactions in natural language systems.
Article
Large-scale generative models have enabled the development of AI-powered code completion tools to assist programmers in writing code. Like all AI-powered tools, these code completion tools are not always accurate and can introduce bugs or even security vulnerabilities into code if not properly detected and corrected by a human programmer. One technique that has been proposed and implemented to help programmers locate potential errors is to highlight uncertain tokens. However, little is known about the effectiveness of this technique. Through a mixed-methods study with 30 programmers, we compare three conditions: providing the AI system's code completion alone, highlighting tokens with the lowest likelihood of being generated by the underlying generative model, and highlighting tokens with the highest predicted likelihood of being edited by a programmer. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits, and is subjectively preferred by study participants. In contrast, highlighting tokens according to their probability of being generated does not provide any benefit over the baseline with no highlighting. We further explore the design space of how to convey uncertainty in AI-powered code completion tools and find that programmers prefer highlights that are granular, informative, interpretable, and not overwhelming. This work contributes to building an understanding of what uncertainty means for generative models and how to convey it effectively.
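For illustration, a minimal sketch (an assumption-laden toy, not the study's implementation) of the underlying idea of likelihood-based highlighting: given per-token log-probabilities from a generative model, flag tokens whose probability falls below a chosen threshold so a UI could highlight them. The Token class, the example completion, and the 0.3 threshold are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: flag completion tokens the model itself considered unlikely.
@dataclass
class Token:
    text: str
    logprob: float  # log-probability assigned by the generative model (assumed available)

def uncertain_token_indices(tokens, prob_threshold=0.3):
    """Indices of tokens whose generation probability is below the threshold."""
    return [i for i, t in enumerate(tokens) if math.exp(t.logprob) < prob_threshold]

completion = [Token("return", -0.05), Token("total", -1.9), Token("+", -0.1), Token("1", -2.3)]
print(uncertain_token_indices(completion))  # -> [1, 3]: candidates for highlighting in the editor
```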
Article
Full-text available
Donation-based support for open, peer production projects such as Wikipedia is an important mechanism for preserving their integrity and independence. For this reason, understanding donation behavior and incentives is crucial in this context. In this work, using a dataset of aggregated donation information from Wikimedia's 2015 fundraising campaign, representing nearly 1 million pages from the English and French language versions of Wikipedia, we explore the relationship between the properties of a page's content and the number of donations made on that page. Our results suggest the existence of a reciprocity mechanism, meaning that articles that provide more utility value attract a higher rate of donation. We discuss these and other findings, focusing on the impact they may have on the design of banner-based fundraising campaigns. Our findings shed more light on the mechanisms that lead people to donate to Wikipedia and the relation between content properties and donations.
Article
Full-text available
Mobile, wearable and other connected devices allow people to collect and explore large amounts of data about their own activities, behavior, and well-being. Yet learning from, and acting upon, such data remain a challenge. The process of reflection has been identified as a key component of such learning. However, most tools do not explicitly design for reflection, carrying an implicit assumption that providing access to self-tracking data is sufficient. In this paper, we present Reflection Companion, a mobile conversational system that supports engaging reflection on personal sensed data, specifically physical activity data collected with fitness trackers. Reflection Companion delivers daily adaptive mini-dialogues and graphs to users' mobile phones to promote reflection. To generate our system's mini-dialogues, we conducted a set of workshops with fitness tracker users, producing a diverse corpus of 275 reflection questions synthesized into a set of 25 reflection mini-dialogues. In a 2-week field deployment with 33 active Fitbit users, we examined our system's ability to engage users in reflection through dialogue. Results suggest that the mini-dialogues were successful in triggering reflection and that this reflection led to increased motivation, empowerment, and adoption of new behaviors. As a strong indicator of our system's value, 16 of the 33 participants elected to continue using the system for two additional weeks without compensation. We present our findings and describe implications for the design of technology-supported dialogue systems for reflection on data.
Article
Full-text available
Machine learning (ML) has become increasingly influential to human society, yet the primary advancements and applications of ML are driven by research in only a few computational disciplines. Even applications that affect or analyze human behaviors and social structures are often developed with limited input from experts outside of computational fields. Social scientists—experts trained to examine and explain the complexity of human behavior and interactions in the world—have considerable expertise to contribute to the development of ML applications for human-generated data, and their analytic practices could benefit from more human-centered ML methods. Although a few researchers have highlighted some gaps between ML and social sciences [51, 57, 70], most discussions only focus on quantitative methods. Yet many social science disciplines rely heavily on qualitative methods to distill patterns that are challenging to discover through quantitative data. One common analysis method for qualitative data is qualitative coding. In this article, we highlight three challenges of applying ML to qualitative coding. Additionally, we utilize our experience of designing a visual analytics tool for collaborative qualitative coding to demonstrate the potential in using ML to support qualitative coding by shifting the focus to identifying ambiguity. We illustrate dimensions of ambiguity and discuss the relationship between disagreement and ambiguity. Finally, we propose three research directions to ground ML applications for social science as part of the progression toward human-centered machine learning.
Conference Paper
Full-text available
Machine learning (ML) is now a fairly established technology, and user experience (UX) designers appear regularly to integrate ML services in new apps, devices, and systems. Interestingly, this technology has not experienced a wealth of design innovation that other technologies have, and this might be because it is a new and difficult design material. To better understand why we have witnessed little design innovation, we conducted a survey of current UX practitioners with regards to how new ML services are envisioned and developed in UX practice. Our survey probed on how ML may or may not have been a part of their UX design education, on how they work to create new things with developers, and on the challenges they have faced working with this material. We use the findings from this survey and our review of related literature to present a series of challenges for UX and interaction design research and education. Finally, we discuss areas where new research and new curriculum might help our community unlock the power of design thinking to re-imagine what ML might be and might do.
Article
Full-text available
Our ultimate goal is to efficiently enable end-users to correctly anticipate a robot's behavior in novel situations. This behavior is often a direct result of the robot's underlying objective function. Our insight is that end-users need to have an accurate mental model of this objective function in order to understand and predict what the robot will do. While people naturally develop such a mental model over time through observing the robot act, this familiarization process may be lengthy. Our approach reduces this time by having the robot model how people infer objectives from observed behavior, and then selecting those behaviors that are maximally informative. The problem of computing a posterior over objectives from observed behavior is known as Inverse Reinforcement Learning (IRL), and has been applied to robots learning human objectives. We consider the problem where the roles of human and robot are swapped. Our main contribution is to recognize that unlike robots, humans will not be exact in their IRL inference. We thus introduce two factors to define candidate approximate-inference models for human learning in this setting, and analyze them in a user study in the autonomous driving domain. We show that certain approximate-inference models lead to the robot generating example behaviors that better enable users to anticipate what the robot will do in test situations. Our results also suggest, however, that additional research is needed in modeling how humans extrapolate from examples of robot behavior.
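As a toy illustration of the exact IRL-style inference that, per this work, humans only approximate, the sketch below computes a Bayesian posterior over a small set of candidate objective weights after observing which trajectory a robot chose, under an assumed Boltzmann-rational choice model. The weights, trajectory features, and rationality coefficient are all made up.

```python
import numpy as np

# Toy sketch (not the paper's model): exact Bayesian inference over candidate objective
# weights given an observed robot choice, assuming a Boltzmann-rational choice model
# P(trajectory | w) proportional to exp(beta * w . features(trajectory)).
candidate_weights = np.array([[1.0, 0.0],    # cares only about speed
                              [0.0, 1.0],    # cares only about comfort
                              [0.5, 0.5]])   # balanced objective
trajectory_features = np.array([[0.9, 0.1],  # trajectory 0: fast, uncomfortable
                                [0.2, 0.8],  # trajectory 1: slow, comfortable
                                [0.5, 0.5]]) # trajectory 2: in between
beta = 5.0                                   # assumed rationality coefficient

def posterior_over_objectives(chosen_trajectory):
    prior = np.full(len(candidate_weights), 1.0 / len(candidate_weights))
    utilities = candidate_weights @ trajectory_features.T        # shape: (weights, trajectories)
    choice_probs = np.exp(beta * utilities)
    choice_probs /= choice_probs.sum(axis=1, keepdims=True)      # softmax over trajectories
    unnormalized = prior * choice_probs[:, chosen_trajectory]
    return unnormalized / unnormalized.sum()

print(posterior_over_objectives(0))  # observing trajectory 0 shifts belief toward the "speed" objective
```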
Article
Full-text available
We develop and test a model that suggests that expectations influence subjective usability and emotional experiences and, thereby, behavioral intentions to continue use and to recommend the service to others. A longitudinal study of 165 real-life users examined the proposed model in a proximity mobile payment domain at three time points: before use, after three weeks of use, and after six weeks of use. The results confirm the short-term influence of expectations on users' evaluations of both usability and enjoyment of the service after three weeks of real-life use. Users' evaluations of their experiences mediated the influence of expectations on behavioral intentions. However, after six weeks, users' cumulative experiences of the mobile payment service had the strongest impact on their evaluations and the effect of pre-use expectations decreased. The research clarifies the role of expectations and highlights the importance of viewing expectations through a temporal perspective when evaluating user experience.
Conference Paper
Full-text available
Robots have the potential to save lives in emergency scenarios, but could have an equally disastrous effect if participants overtrust them. To explore this concept, we performed an experiment where a participant interacts with a robot in a non-emergency task to experience its behavior and then chooses whether to follow the robot's instructions in an emergency or not. Artificial smoke and fire alarms were used to add a sense of urgency. To our surprise, all 26 participants followed the robot in the emergency, despite half observing the same robot perform poorly in a navigation guidance task just minutes before. We performed additional exploratory studies investigating different failure modes. Even when the robot pointed to a dark room with no discernible exit the majority of people did not choose to safely exit the way they entered.
Article
Full-text available
Before using an interactive product, people form expectations about what the experience of use will be like. These expectations may affect both the use of the product and users’ attitudes toward it. This article briefly reviews existing theories of expectations to design and perform two crowdsourced experiments that investigate how expectations affect user experience measures. In the experiments, participants saw a primed or neutral review of a simple online game, played it, and rated it on various user experience measures. Results suggest that when expectations are confirmed, users tend to assimilate their ratings with their expectations; conversely, if the product quality is inconsistent with expectations, users tend to contrast their ratings with expectations and give ratings correlated with the level of disconfirmation. Results also suggest that expectation disconfirmation can be used more widely in analyses of user experience, even when the analyses are not specifically concerned with expectation disconfirmation.
Article
Full-text available
Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples selected by the machine, this problem is an instance of active learning. When the teacher can provide additional information to the machine (e.g., suggestions on what examples or predictive features should be used) as the learning task progresses, then the problem becomes one of interactive learning. To accommodate the two-way communication channel needed for efficient interactive learning, the teacher and the machine need an environment that supports an interaction language. The machine can access, process, and summarize more examples than the teacher can see in a lifetime. Based on the machine's output, the teacher can revise the definition of the task or make it more precise. Both the teacher and the machine continuously learn and benefit from the interaction. We have built a platform to (1) produce valuable and deployable models and (2) support research on both the machine learning and user interface challenges of the interactive learning problem. The platform relies on a dedicated, low-latency, distributed, in-memory architecture that allows us to construct web-scale learning machines with quick interaction speed. The purpose of this paper is to describe this architecture and demonstrate how it supports our research efforts. Preliminary results are presented as illustrations of the architecture but are not the primary focus of the paper.
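The sketch below is an illustrative, scaled-down active learning loop in scikit-learn, not the web-scale platform described above: the machine repeatedly queries the unlabeled example it is least certain about, and the teacher is simulated by a synthetic oracle. All data and names are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative, scaled-down active learning loop (not the paper's platform): the machine
# queries the unlabeled example it is least certain about; a synthetic oracle plays the teacher.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
oracle_labels = (X[:, 0] + X[:, 1] > 0).astype(int)          # ground truth only the "teacher" knows

labeled = [int(np.flatnonzero(oracle_labels == 0)[0]),       # seed with one example per class
           int(np.flatnonzero(oracle_labels == 1)[0])]
unlabeled = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression()
for _ in range(20):                                          # 20 rounds of teacher interaction
    model.fit(X[labeled], oracle_labels[labeled])
    probs = model.predict_proba(X[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]   # most uncertain example
    labeled.append(query)                                    # teacher provides its label
    unlabeled.remove(query)

print("accuracy on all data:", round(model.score(X, oracle_labels), 3))
```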
Article
Full-text available
Intelligent interactive systems (IIS) have great potential to improve users' experience with technology by tailoring their behaviour and appearance to users' individual needs; however, these systems, with their complex algorithms and dynamic behaviour, can also suffer from a lack of comprehensibility and transparency. We present the results of two studies examining the comprehensibility of, and desire for explanations with deployed, low-cost IIS. The first study, a set of interviews with 21 participants, reveals that i) comprehensibility is not always dependent on explanations, and ii) the perceived cost of viewing explanations tends to outweigh the anticipated benefits. Our second study, a two-week diary study with 14 participants, confirms these findings in the context of daily use, with participants indicating a desire for an explanation in only 7% of diary entries. We discuss the implications of our findings for the design of explanation facilities.
Article
Full-text available
Reviews interpretations of the effect of expectation and disconfirmation on perceived product performance. At issue is the relative effect of the initial expectation level and the degree of positive or negative disconfirmation on affective judgments following product exposure. Although the results of prior studies suggest a dominant expectation effect, it is argued that detection of the disconfirmation phenomenon may have been clouded by a conceptual and methodological overdetermination problem. To test this notion, 243 college students responded to expectation and disconfirmation measures in a 3-stage field study of reactions to a recently introduced automobile model. These measures were later related to postexposure affect and intention variables in a hierarchical analysis of variance design. Although the results support earlier conclusions that level of expectation is related to postexposure judgments, it is also shown that the disconfirmation experience may have an independent and equally significant impact. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
The present research develops and tests a theoretical extension of the Technology Acceptance Model (TAM) that explains perceived usefulness and usage intentions in terms of social influence and cognitive instrumental processes. The extended model, referred to as TAM2, was tested using longitudinal data collected regarding four different systems at four organizations (N = 156), two involving voluntary usage and two involving mandatory usage. Model constructs were measured at three points in time at each organization: preimplementation, one month postimplementation, and three months postimplementation. The extended model was strongly supported for all four organizations at all three points of measurement, accounting for 40%-60% of the variance in usefulness perceptions and 34%-52% of the variance in usage intentions. Both social influence processes (subjective norm, voluntariness, and image) and cognitive instrumental processes (job relevance, output quality, result demonstrability, and perceived ease of use) significantly influenced user acceptance. These findings advance theory and contribute to the foundation for future research aimed at improving our understanding of user adoption behavior.
Conference Paper
Full-text available
Understanding the complexities of users' judgements and user experience is a prerequisite for informing HCI design. Current user experience (UX) research emphasises that, beyond usability, non-instrumental aspects of system quality contribute to overall judgement and that the user experience is subjective and variable. Based on judgement and decision-making theory, we have previously demonstrated that judgement of websites can be influenced by contextual factors. This paper explores the strength of such contextual influence by investigating framing effects on user judgement of website quality. Two experimental studies investigate how the presentation of information about a website influences the user experience and the relative importance of individual quality attributes for overall judgement. Theoretical implications for the emerging field of UX research and practical implications for design are discussed.
Conference Paper
Full-text available
For automatic or context-aware systems a major issue is user trust, which is to a large extent determined by system reliability. For systems based on sensor input, which is inherently uncertain or even incomplete, there is little hope that they will ever be perfectly reliable. In this paper we test the hypothesis that explicitly displaying the current confidence of the system increases the usability of such systems. For the example of a context-aware mobile phone, the experiments show that displaying confidence information increases the user's trust in the system.
Conference Paper
Full-text available
Context-aware mobile applications and systems have been extensively explored in the last decade, and in the last few years we have already seen promising products on the market. Most of these applications assume that context data is highly accurate, but in practice this information is often unreliable, especially when gathered from sensors or external sources. Previous research has argued that system usability can be improved by displaying this uncertainty to the user. The research presented in this paper shows that it is not always an advantage to show the confidence of a context-aware application to the user. We developed a system for automatic form filling on mobile devices which fills in any web form with user data stored on the mobile device. The underlying algorithm generates rules indicating the probability with which each input field of a form should be filled with a given value. Based on this, we developed two versions of our system: one that shows the uncertainty of the system and one that does not. We then conducted a user study which shows that users need slightly more time and produce slightly more errors when the confidence of the system is visualized.
Conference Paper
Full-text available
The aim of this project is the detection, analysis and recognition of facial features. The system operates on grayscale images. For the analysis, a Haar-like face detector was used along with an anthropometric face model and a hybrid feature detection approach. The system localizes 17 characteristic points of the analyzed face and, based on their displacements, certain emotions can be automatically recognized. The system was tested on the publicly available Japanese Female Facial Expression (JAFFE) database with ca. 77% accuracy for 7 basic emotions using various classifiers. Thanks to its open structure, the system can cooperate well with any HCI system.
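As a rough illustration of the first stage only (face localization with a Haar-like detector), the sketch below uses OpenCV's stock frontal-face cascade on a grayscale image; the anthropometric model, 17-point localization, and emotion classification from the paper are not reproduced, and the input filename is hypothetical.

```python
import cv2

# Illustration of the first stage only: Haar-cascade face detection on a grayscale image.
# The anthropometric model, 17 landmark points, and emotion classifiers are not shown here.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
faces = face_detector.detectMultiScale(image, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face_roi = image[y:y + h, x:x + w]                # region a landmark detector would process next
    print("face at", (x, y, w, h), "roi shape", face_roi.shape)
```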
Article
Full-text available
Individual-level information systems adoption research has recently seen the introduction of expectation-disconfirmation theory (EDT) to explain how and why user reactions change over time. This prior research has produced valuable insights into the phenomenon of technology adoption beyond traditional models, such as the technology acceptance model. First, we identify gaps in EDT research that present potential opportunities for advances; specifically, we discuss methodological and analytical limitations in EDT research in information systems and present polynomial modeling and response surface methodology as solutions. Second, we draw from research on cognitive dissonance, realistic job preview, and prospect theory to present a polynomial model of expectation-disconfirmation in information systems. Finally, we test our model using data gathered over a period of 6 months among 1,143 employees being introduced to a new technology. The results confirmed our hypotheses that disconfirmation in general was bad, as evidenced by low behavioral intention to continue using a system for both positive and negative disconfirmation, thus supporting the need for a polynomial model to understand expectation disconfirmation in information systems.
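For readers unfamiliar with the method named above, the sketch below shows the general shape of a second-order polynomial (response surface) model relating continuance intention to expectation and experienced performance; the data are synthetic and the coefficients are not the study's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic sketch of a second-order polynomial (response surface) model of
# expectation-disconfirmation: intention regressed on expectation E, experienced
# performance P, and the quadratic terms E^2, E*P, P^2. Not the study's data or model.
rng = np.random.default_rng(1)
E = rng.uniform(1, 7, size=300)                              # pre-use expectation (1-7 scale)
P = rng.uniform(1, 7, size=300)                              # post-use perceived performance
intention = 4 + 0.5 * P - 0.4 * np.abs(P - E) + rng.normal(0, 0.3, size=300)

features = PolynomialFeatures(degree=2, include_bias=False).fit_transform(np.column_stack([E, P]))
model = LinearRegression().fit(features, intention)
print(dict(zip(["E", "P", "E^2", "E*P", "P^2"], model.coef_.round(2))))
```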
Article
The authors investigate whether it is necessary to include disconfirmation as an intervening variable affecting satisfaction as is commonly argued, or whether the effect of disconfirmation is adequately captured by expectation and perceived performance. Further, they model the process for two types of products, a durable and a nondurable good, using experimental procedures in which three levels of expectations and three levels of performance are manipulated for each product in a factorial design. Each subject's perceived expectations, performance evaluations, disconfirmation, and satisfaction are subsequently measured by using multiple measures for each construct. The results suggest the effects are different for the two products. For the nondurable good, the relationships are as typically hypothesized. The results for the durable good are different in important respects. First, neither the disconfirmation experience nor subjects’ initial expectations affected subjects’ satisfaction with it. Rather, their satisfaction was determined solely by the performance of the durable good. Expectations did combine with performance to affect disconfirmation, though the magnitude of the disconfirmation experience did not translate into an impact on satisfaction. Finally, the direct performance-satisfaction link accounts for most of the variation in satisfaction.
Article
This study experimentally investigated the effects on product ratings of both overstatement and understatement of product quality. Results support common marketing practice in that overstatement resulted in more favorable ratings and understatement resulted in less favorable ratings.
Article
Results of a laboratory experiment indicate that customer satisfaction with a product is influenced by the effort expended to acquire the product, and the expectations concerning the product. Specifically, the experiment suggests that satisfaction with the product may be higher when customers expend considerable effort to obtain the product than when they use only modest effort. This finding is opposed to usual notions of marketing efficiency and customer convenience. The research also suggests that customer satisfaction is lower when the product does not come up to expectations than when the product meets expectations.
Article
Four psychological theories are considered in determining the effects of disconfirmed expectations on perceived product performance and consumer satisfaction. Results reveal that too great a gap between high consumer expectations and actual product performance may cause a less favorable evaluation of a product than a somewhat lower level of disparity.
Conference Paper
Algorithmic prioritization is a growing focus for social media users. Control settings are one way for users to adjust the prioritization of their news feeds, but they prioritize feed content in a way that can be difficult to judge objectively. In this work, we study how users engage with difficult-to-validate controls. Via two paired studies using an experimental system -- one interview and one online study -- we found that control settings functioned as placebos. Viewers felt more satisfied with their feed when controls were present, whether they worked or not. We also examine how people engage in sensemaking around control settings, finding that users often take responsibility for violated expectations -- for both real and randomly functioning controls. Finally, we studied how users controlled their social media feeds in the wild. The use of existing social media controls had little impact on users' satisfaction with the feed; instead, users often turned to improvised solutions, like scrolling quickly, to see what they want.
Conference Paper
Advances in artificial intelligence, sensors and big data management have far-reaching societal impacts. As these systems augment our everyday lives, it becomes increasingly important for people to understand them and remain in control. We investigate how HCI researchers can help to develop accountable systems by performing a literature analysis of 289 core papers on explanations and explainable systems, as well as 12,412 citing papers. Using topic modeling, co-occurrence and network analysis, we mapped the research space from diverse domains, such as algorithmic accountability, interpretable machine learning, context-awareness, cognitive psychology, and software learnability. We reveal fading and burgeoning trends in explainable systems, and identify domains that are closely connected or mostly isolated. The time is ripe for the HCI community to ensure that the powerful new autonomous systems have intelligible interfaces built-in. From our results, we propose several implications and directions for future research towards this goal.
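A toy version of the topic-modeling step described above (not the paper's corpus, vocabulary, or settings) might look like the following, fitting LDA over a handful of abstract-like snippets and listing the top words per topic.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy illustration of the topic-modeling step (not the paper's corpus or settings).
abstracts = [
    "explanations improve trust in intelligent context aware systems",
    "interpretable machine learning models and feature attribution methods",
    "software learnability help documentation and interactive tutorials",
    "algorithmic accountability transparency and explanation interfaces",
]
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(doc_term)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top_words)}")
```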
Conference Paper
Everyday predictive systems typically present point predictions, making it hard for people to account for uncertainty when making decisions. Evaluations of uncertainty displays for transit prediction have assessed people's ability to extract probabilities, but not the quality of their decisions. In a controlled, incentivized experiment, we had subjects decide when to catch a bus using displays with textual uncertainty, uncertainty visualizations, or no uncertainty (control). Frequency-based visualizations previously shown to allow people to better extract probabilities (quantile dotplots) yielded better decisions. Decisions with quantile dotplots with 50 outcomes were (1) better on average, having expected payoffs 97% of optimal (95% CI: [95%, 98%]), 5 percentage points more than control (95% CI: [2, 8]); and (2) more consistent, having within-subject standard deviation of 3 percentage points (95% CI: [2, 4]), 4 percentage points less than control (95% CI: [2, 6]). Cumulative distribution function plots performed nearly as well, and both outperformed textual uncertainty, which was sensitive to the probability interval communicated. We discuss implications for realtime transit predictions and possible generalization to other domains.
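To make the decision framing concrete, the sketch below computes expected payoffs of candidate departure times from a sampled predictive distribution of bus arrival times; the distribution, rewards, and penalties are invented for illustration and are not the experiment's parameters.

```python
import numpy as np

# Hypothetical payoff computation in the spirit of the incentivized bus task: given
# samples from a predictive distribution of arrival time (minutes from now), compare
# the expected payoff of candidate departure times. All numbers are invented.
rng = np.random.default_rng(7)
arrival_samples = rng.lognormal(mean=np.log(12), sigma=0.25, size=10_000)

def expected_payoff(leave_at, catch_reward=10.0, wait_cost_per_min=0.2, miss_penalty=15.0):
    caught = arrival_samples >= leave_at                      # bus arrives after we get there
    waiting = np.maximum(arrival_samples - leave_at, 0.0)
    payoff = np.where(caught, catch_reward - wait_cost_per_min * waiting, -miss_penalty)
    return payoff.mean()

for leave_at in (8, 10, 12, 14):
    print(f"leave at t={leave_at:>2} min -> expected payoff {expected_payoff(leave_at):.2f}")
```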
Article
Although information workers may complain about meetings, they are an essential part of their work life. Consequently, busy people spend a significant amount of time scheduling meetings. We present Calendar.help, a system that provides fast, efficient scheduling through structured workflows. Users interact with the system via email, delegating their scheduling needs to the system as if it were a human personal assistant. Common scheduling scenarios are broken down using well-defined workflows and completed as a series of microtasks that are automated when possible and executed by a human otherwise. Unusual scenarios fall back to a trained human assistant who executes them as unstructured macrotasks. We describe the iterative approach we used to develop Calendar.help, and share the lessons learned from scheduling thousands of meetings during a year of real-world deployments. Our findings provide insight into how complex information tasks can be broken down into repeatable components that can be executed efficiently to improve productivity.
Conference Paper
Shared expectations and mutual understanding are critical facets of teamwork. Achieving these in human-robot collaborative contexts can be especially challenging, as humans and robots are unlikely to share a common language to convey intentions, plans, or justifications. Even in cases where human co-workers can inspect a robot's control code, and particularly when statistical methods are used to encode control policies, there is no guarantee that meaningful insights into a robot's behavior can be derived or that a human will be able to efficiently isolate the behaviors relevant to the interaction. We present a series of algorithms and an accompanying system that enables robots to autonomously synthesize policy descriptions and respond to both general and targeted queries by human collaborators. We demonstrate applicability to a variety of robot controller types including those that utilize conditional logic, tabular reinforcement learning, and deep reinforcement learning, synthesizing informative policy descriptions for collaborators and facilitating fault diagnosis by non-experts.
Conference Paper
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
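A bare-bones sketch of the core idea (perturb the input, weight perturbations by proximity, fit a local linear surrogate) is shown below for a toy text classifier; it is not the authors' released implementation, and the stand-in classifier and keyword weights are invented.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Bare-bones illustration of the local-surrogate idea (not the authors' released library).
def black_box_predict(texts):
    """Stand-in classifier: P(meeting request) grows with invented scheduling keywords."""
    keywords = {"meet": 0.4, "tomorrow": 0.2, "schedule": 0.3, "lunch": 0.1}
    return np.array([min(0.95, 0.05 + sum(w for k, w in keywords.items() if k in t.split()))
                     for t in texts])

def local_explanation(text, n_samples=500, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    words = text.split()
    masks = rng.integers(0, 2, size=(n_samples, len(words)))      # which words are kept
    masks[0] = 1                                                  # keep the original instance
    perturbed = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
    predictions = black_box_predict(perturbed)
    distances = 1.0 - masks.mean(axis=1)                          # fraction of words removed
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)       # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(masks, predictions, sample_weight=weights)
    return sorted(zip(words, surrogate.coef_.round(3)), key=lambda wc: -abs(wc[1]))

print(local_explanation("can we meet tomorrow for lunch"))        # word-level contributions
```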
Conference Paper
Autonomous systems are designed to take actions on behalf of users, acting autonomously upon data from sensors or online sources. As such, the design of interaction mechanisms that enable users to understand the operation of autonomous systems and flexibly delegate or regain control is an open challenge for HCI. Against this background, in this paper we report on a lab study designed to investigate whether displaying the confidence of an autonomous system about the quality of its work, which we call its confidence information, can improve user acceptance and interaction with autonomous systems. The results demonstrate that confidence information encourages the usage of the autonomous system we tested, compared to a situation where such information is not available. Furthermore, an additional contribution of our work is the methodology we employ to study users' incentives to do work in collaboration with the autonomous system. In experiments comparing different incentive strategies, our results indicate that our translation of behavioural economics research methods to HCI can support the study of interactions with autonomous systems in the lab.
Conference Paper
Users often rely on realtime predictions in everyday contexts like riding the bus, but may not grasp that such predictions are subject to uncertainty. Existing uncertainty visualizations may not align with user needs or how they naturally reason about probability. We present a novel mobile interface design and visualization of uncertainty for transit predictions on mobile phones based on discrete outcomes. To develop it, we identified domain specific design requirements for visualizing uncertainty in transit prediction through: 1) a literature review, 2) a large survey of users of a popular realtime transit application, and 3) an iterative design process. We present several candidate visualizations of uncertainty for realtime transit predictions in a mobile context, and we propose a novel discrete representation of continuous outcomes designed for small screens, quantile dotplots. In a controlled experiment we find that quantile dotplots reduce the variance of probabilistic estimates by ~1.15 times compared to density plots and facilitate more confident estimation by end-users in the context of realtime transit prediction scenarios.
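A minimal construction of a quantile dotplot from an assumed continuous predictive distribution (a lognormal over minutes until the bus arrives, with made-up parameters) could look like this, with each dot standing for a 1-in-20 chance.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Minimal quantile dotplot sketch: summarize an assumed continuous predictive
# distribution by 20 equally likely outcomes and stack them as dots (1/20 each).
predictive = stats.lognorm(s=0.3, scale=12)                  # illustrative distribution
quantiles = predictive.ppf((np.arange(20) + 0.5) / 20)       # 20 equally probable outcomes
binned = np.round(quantiles).astype(int)                     # bin to whole minutes for stacking

fig, ax = plt.subplots(figsize=(6, 2))
for minute in np.unique(binned):
    count = int(np.sum(binned == minute))
    ax.plot([minute] * count, np.arange(1, count + 1), "o", color="steelblue")
ax.set_xlabel("predicted minutes until the bus arrives")
ax.set_yticks([])
plt.tight_layout()
plt.savefig("quantile_dotplot.png")
```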
Article
This study performed a survey-based assessment of patients' knowledge of radiologic imaging examinations, including patients' perspectives regarding communication of such information. Adult patients were given a voluntary survey before undergoing an outpatient imaging examination at our institution. Survey questions addressed knowledge of various aspects of the examination, as well as experiences, satisfaction, and preferences regarding communication of such knowledge. A total of 176 surveys were completed by patients awaiting CT (n = 45), MRI (n = 41), ultrasound (n = 46), and nuclear medicine (n = 44) examinations. A total of 97.1% and 97.8% of patients correctly identified the examination modality and the body part being imaged, respectively. A total of 45.8% correctly identified whether the examination entailed radiation; 51.1% and 71.4% of patients receiving intravenous or oral contrast, respectively, correctly indicated its administration. A total of 78.6% indicated that the ordering physician explained the examination in advance; among these, 72.1% indicated satisfaction with the explanation. A total of 21.8% and 20.5% indicated consulting the Internet, or friends and family, respectively, to learn about the examination. An overall understanding of the examination was reported by 70.8%. A total of 18.8% had unanswered questions about the examination, most commonly regarding examination logistics, contrast-agent usage, and when results would be available. A total of 52.9% were interested in discussing the examination with a radiologist in advance. Level of understanding was greatest for CT and least for nuclear medicine examinations, and lower when patients had not previously undergone the given examination. Patients' knowledge of their imaging examinations is frequently incomplete. The findings may motivate initiatives to improve patients' understanding of their imaging examinations, enhancing patient empowerment and contributing to patient-centered care. Copyright © 2015 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Conference Paper
Nowadays, people are overwhelmed with multiple tasks and responsibilities, resulting in increasing stress levels. At the same time, it becomes harder to find time for self-reflection and diagnosis of problems that can be a source of stress. In this paper, we propose a tool that supports a person in self-reflection by providing views on life events in relation to the person's well-being in a concise and intuitive form. The tool, called LifelogExplorer, takes sensor data (like skin conductance and accelerometer measurements) and data obtained from digital sources (like personal calendars) as input, and generates views on this data which are comprehensible and meaningful for the user due to filtering and aggregation options that help to cope with the data explosion. We evaluate our approach on data collected from two case studies focused on addressing stress at work: 1) with academic staff of a university, and 2) with teachers from a vocational school.
Article
Nine pictorial displays for communicating quantitative information about the value of an uncertain quantity, x, were evaluated for their ability to communicate x̄, p(x > a) and p(b > x > a) to well-educated semi- and nontechnical subjects. Different displays performed best in different applications. Cumulative distribution functions alone can severely mislead some subjects in estimating the mean. A “rusty” knowledge of statistics did not improve performance, and even people with a good basic knowledge of statistics did not perform as well as one would like. Until further experiments are performed, the authors recommend the use of a cumulative distribution function plotted directly above a probability density function with the same horizontal scale, and with the location of the mean clearly marked on both curves.
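Following this recommendation, a minimal sketch of the suggested display (an illustrative normal distribution, not the study's stimuli) plots the CDF directly above the PDF on a shared horizontal scale with the mean marked on both curves.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Sketch of the recommended display: CDF plotted directly above a PDF with the same
# horizontal scale and the mean marked on both curves (illustrative normal distribution).
dist = stats.norm(loc=50, scale=10)
x = np.linspace(10, 90, 400)

fig, (ax_cdf, ax_pdf) = plt.subplots(2, 1, sharex=True, figsize=(5, 4))
ax_cdf.plot(x, dist.cdf(x))
ax_cdf.set_ylabel("P(X <= x)")
ax_pdf.plot(x, dist.pdf(x))
ax_pdf.set_ylabel("density")
ax_pdf.set_xlabel("x")
for ax in (ax_cdf, ax_pdf):
    ax.axvline(dist.mean(), linestyle="--", color="gray")    # mark the mean on both curves
plt.tight_layout()
plt.savefig("cdf_above_pdf.png")
```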
Article
Context-aware intelligent systems employ implicit inputs, and make decisions based on complex rules and machine learning models that are rarely clear to users. Such lack of system intelligibility can lead to loss of user trust, satisfaction and acceptance of these systems. However, automatically providing explanations about a system's decision process can help mitigate this problem. In this paper we present results from a controlled study with over 200 participants in which the effectiveness of different types of explanations was examined. Participants were shown examples of a system's operation along with various automatically generated explanations, and then tested on their understanding of the system. We show, for example, that explanations describing why the system behaved a certain way resulted in better understanding and stronger feelings of trust. Explanations describing why the system did not behave a certain way resulted in lower understanding yet adequate performance. We discuss implications for the use of our findings in real-world context-aware applications.
Article
A recent and dramatic increase in the use of automation has not yielded comparable improvements in performance. Researchers have found human operators often underutilize (disuse) and overly rely on (misuse) automated aids (Parasuraman and Riley, 1997). Three studies were performed with Cameron University students to explore the relationship among automation reliability, trust, and reliance. With the assistance of an automated decision aid, participants viewed slides of Fort Sill terrain and indicated the presence or absence of a camouflaged soldier. Results from the three studies indicate that trust is an important factor in understanding automation reliance decisions. Participants initially considered the automated decision aid trustworthy and reliable. After observing the automated aid make errors, participants distrusted even reliable aids, unless an explanation was provided regarding why the aid might err. Knowing why the aid might err increased trust in the decision aid and increased automation reliance, even when the trust was unwarranted. Our studies suggest a need for future research focused on understanding automation use, examining individual differences in automation reliance, and developing valid and reliable self-report measures of trust in automation.
Article
Although machine learning is becoming commonly used in today's software, there has been little research into how end users might interact with machine learning systems, beyond communicating simple “right/wrong” judgments. If the users themselves could work hand-in-hand with machine learning systems, the users’ understanding and trust of the system could improve and the accuracy of learning systems could be improved as well. We conducted three experiments to understand the potential for rich interactions between users and machine learning systems. The first experiment was a think-aloud study that investigated users’ willingness to interact with machine learning reasoning, and what kinds of feedback users might give to machine learning systems. We then investigated the viability of introducing such feedback into machine learning systems, specifically, how to incorporate some of these types of user feedback into machine learning systems, and what their impact was on the accuracy of the system. Taken together, the results of our experiments show that supporting rich interactions between users and machine learning systems is feasible for both user and machine. This shows the potential of rich human–computer collaboration via on-the-spot interactions as a promising direction for machine learning systems and users to collaboratively share intelligence.
Conference Paper
Context-aware applications should be intelligible so users can better understand how they work and improve their trust in them. However, providing intelligibility is non-trivial and requires the developer to understand how to generate explanations from application decision models. Furthermore, users need different types of explanations and this complicates the implementation of intelligibility. We have developed the Intelligibility Toolkit that makes it easy for application developers to obtain eight types of explanations from the most popular decision models of context-aware applications. We describe its extensible architecture, and the explanation generation algorithms we developed. We validate the usefulness of the toolkit with three canonical applications that use the toolkit to generate explanations for end-users.
Conference Paper
Intelligibility can help expose the inner workings and inputs of context-aware applications that tend to be opaque to users due to their implicit sensing and actions. However, users may not be interested in all the information that the applications can produce. Using scenarios of four real-world applications that span the design space of context-aware computing, we conducted two experiments to discover what information users are interested in. In the first experiment, we elicit types of information demands that users have and under what moderating circumstances they have them. In the second experiment, we verify the findings by soliciting users about which types they would want to know and establish whether receiving such information would satisfy them. We discuss why users demand certain types of information, and provide design implications on how to provide different intelligibility types to make context-aware applications intelligible and acceptable to users.
Article
Information systems with an "intelligent" or "knowledge" component are now prevalent and include knowledge-based systems, decision support systems, intelligent agents and knowledge management systems. These systems are in principle capable of explaining their reasoning or justifying their behavior. There appears to be a lack of understanding, however, of the benefits that can flow from explanation use, and how an explanation function should be constructed. Work with newer types of intelligent systems and help functions for everyday systems, such as word-processors, appears in many cases to neglect lessons learned in the past. This paper attempts to rectify this situation by drawing together the considerable body of work on the nature and use of explanations. Empirical studies, mainly with knowledge-based systems, are reviewed and linked to a sound theoretical base. The theoretical base combines a cognitive effort perspective, cognitive learning theory, and Toulmin's model of argumentation. Conclusions drawn from the review have both practical and theoretical significance. Explanations are important to users in a number of circumstances - when the user perceives an anomaly, when they want to learn, or when they need a specific piece of knowledge to participate properly in problem solving. Explanations, when suitably designed, have been shown to improve performance and learning and result in more positive user perceptions of a system. The design is important, however, because it appears that explanations will not be used if the user has to exert "too much" effort to get them. Explanations should be provided automatically if this can be done relatively unobtrusively, or by hypertext links, and should be context-specific rather than generic. Explanations that conform to Toulmin's model of argumentation, in that they provide adequate justification for the knowledge offered, should be more persuasive and lead to greater trust, agreement, satisfaction, and acceptance - of the explanation and possibly also of the system as a whole.
Article
This paper examines cognitive beliefs and affect influencing one's intention to continue using (continuance) information systems (IS). Expectation-confirmation theory is adapted from the consumer behavior literature and integrated with theoretical and empirical findings from prior IS usage research to theorize a model of IS continuance. Five research hypotheses derived from this model are empirically validated using a field survey of online banking users. The results suggest that users' continuance intention is determined by their satisfaction with IS use and perceived usefulness of continued IS use. User satisfaction, in turn, is influenced by their confirmation of expectation from prior IS use and perceived usefulness. Post-acceptance perceived usefulness is influenced by users' confirmation level. This study draws attention to the substantive differences between acceptance and continuance behaviors, theorizes and validates one of the earliest theoretical models of IS continuance, integrates confirmation and user satisfaction constructs within our current understanding of IS use, conceptualizes and creates an initial scale for measuring IS continuance, and offers an initial explanation for the acceptance-discontinuance anomaly.
Article
In an experiment conducted to study the effects of product expectations on subjective usability ratings, participants (N = 36) read a positive or a negative product review for a novel mobile device before a usability test, while the control group read nothing. In the test, half of the users performed easy tasks, and the other half hard ones, with the device. A standard usability test procedure was utilized in which objective task performance measurements as well as subjective post-task and post-experiment usability questionnaires were deployed. The study revealed a surprisingly strong effect of positive expectations on subjective post-experiment ratings: the participants who had read the positive review gave the device significantly better post-experiment ratings than did the negative-prime and no-prime groups. This boosting effect of the positive prime held even in the hard task condition where the users failed in most of the tasks. This finding highlights the importance of understanding: (1) what kinds of product expectations participants bring with them to the test, (2) how well these expectations represent those of the intended user population, and (3) how the test situation itself influences and may bias these expectations.
Article
Thesis, University of Florida. Bibliography: leaves 123-130.