Article (PDF available)

Automation Bias: Decision Making and Performance in High-Tech Cockpits


Abstract

Automated aids and decision support tools are rapidly becoming indispensable in high-technology cockpits and are assuming increasing control of "cognitive" flight tasks, such as calculating fuel-efficient routes, navigating, or detecting and diagnosing system malfunctions and abnormalities. This study was designed to investigate automation bias, a recently documented factor in the use of automated aids and decision support systems. The term refers to omission and commission errors resulting from the use of automated cues as a heuristic replacement for vigilant information seeking and processing. Glass-cockpit pilots flew flight scenarios involving automation events, or opportunities for automation-related omission and commission errors. Although experimentally manipulated accountability demands did not significantly impact performance, post hoc analyses revealed that those pilots who reported an internalized perception of "accountability" for their performance and strategies of interaction with the automation were significantly more likely to double-check automated functioning against other cues and less likely to commit errors than those who did not share this perception. Pilots were also likely to erroneously "remember" the presence of expected cues when describing their decision-making processes.
... We hypothesized that accountability pressures would enhance both individual and team performance by increasing resource-sharing, resource utilization, responsiveness to partner needs, and effort. This expectation aligns with prior research showing that people engage in more complex cognitive strategies and maintain greater task engagement when they feel responsible for justifying their decisions (Chang et al. 2017; Mosier et al. 1998; Skitka et al. 2000a, b). ...
Article
Full-text available
Accountability pressures on human operators supervising automation have been shown to reduce automation bias, but with increasingly autonomous automation enabled by artificial intelligence, the work structure between people and automated agents may be less supervisory and more interactive or team-like. We thus tested the effects of accountability pressures in supervisory and interactive work structures, recruiting 60 participants to interact with an automated agent in a resource management task. High versus low accountability pressures were manipulated, following previous studies, by changing the task environment (i.e., the task instructions and the researcher’s dress code). Results show that participants in an interactive control structure achieved higher throughput, shared fewer resources, and obtained lower resource utility than participants in a supervisory control structure. Higher accountability pressures resulted in lower throughput, more resources shared, and lower resource utility than lower accountability pressures. Although the complexity of the task environment makes it difficult to draw clean conclusions, our results indicate that with more interactive structures and higher outcome accountability pressures, people will engage in the most available actions to maximize individual performance even when suboptimal individual performance is needed to achieve the highest joint outcome.
... After demographic data on age and gender were collected, information about the service provider was requested, in particular the frequency of use of travel consultations over the last five years as well as (Mosier et al. 1997). ...
Article
Full-text available
The service industry is now characterised by the use of artificial intelligence (AI). AI solutions can improve various services—not only as part of online services, but also offline as part of physical interaction. For example, AI can be used to enhance the personal interaction between customers and advisors in traditional bricks-and-mortar travel agencies. However, this requires an appropriate work design and the willingness of both parties to use it. In this article, we use a case study to investigate how customers assess the service quality of travel advice in different AI application scenarios. To this end, we described four AI functions of a possible future travel counter, which are intended to support the advisor in his or her work and to change the interaction between advisor and customer in different ways. We then asked travellers interested in a bricks-and-mortar travel agency how they would rate the quality of service they would receive at this future travel counter. From this survey, 42 textual responses (a sub-sample of a more comprehensive survey) are qualitatively analysed with regard to assessments of various dimensions of perceived service quality. The results make it clear that personal interaction is the most important unique feature of offline travel advice, which is valued by customers and continues to be desired and necessary. It is therefore crucial that travel advisors are not replaced by an AI solution, but rather supported in their work in order to improve the framework for interpersonal interaction in terms of service quality. Practical Relevance: The results of the survey can help to develop AI solutions in the context of work, and more specifically travel advice, in a way that best suits the needs of the people involved.
... Similarities to the environment of a SOC can be found in Air Traffic Control (ATC) rooms, pilot cockpits, and military Command and Control (C&C) centers (De-Arteaga et al., 2020; Mosier et al., 1998) due to their use of automation. Human-automation research in these fields has investigated how automation is being employed and what its effects on the human operator are. ...
Article
Full-text available
Security Operation Centers (SOCs) comprise people, processes, and technology and are responsible for protecting their respective organizations against any form of cyber incident. These teams consist of SOC analysts, ranging from Tier 1 to Tier 3. In defending against cyber-attacks, SOCs monitor and respond to alert traffic from numerous sources. However, a commonly discussed challenge is the volume of alerts that need to be assessed. To aid SOC analysts in the alert triage process, SOCs integrate automation and automated decision aids (ADAs). Research in the human-automation field has demonstrated that automation has the potential to degrade cognitive skills, because human operators can become over-reliant on automated systems despite the presence of contradictory information. This cognitive bias is known as automation bias. The result of this study is the development of four critical success factors (CSFs) for the adoption of automation within SOCs in an attempt to mitigate automation bias: (1) Task-based Automation; (2) Process-based Automation; (3) Automation Performance Appraisal; and (4) SOC Analyst Training of Automated Systems. In applying these CSFs, a beneficial balance between the SOC analyst and the use of automation is achieved. This study promotes the human-in-the-loop approach whereby experienced and cognitively aware SOC analysts remain at the core of SOC processes.
... While AI significantly improves accuracy and speed in detecting structural anomalies and predicting maintenance needs, it can also diminish the role of human intuition and expertise in decision-making. Research has shown that excessive trust in automated systems may lead to automation bias, where users are more likely to overlook their own assessments or fail to challenge AI-generated decisions [115,116]. Striking a balance between AI-driven automation and human oversight is essential to ensure responsible and safe infrastructure management [117]. ...
Article
Full-text available
This study explores the growing influence of artificial intelligence (AI) on structural health monitoring (SHM), a critical aspect of infrastructure maintenance and safety. This study begins with a bibliometric analysis to identify current research trends, key contributing countries, and emerging topics in AI-integrated SHM. We examine seven core areas where AI significantly advances SHM capabilities: (1) data acquisition and sensor networks, highlighting improvements in sensor technology and data collection; (2) data processing and signal analysis, where AI techniques enhance feature extraction and noise reduction; (3) anomaly detection and damage identification using machine learning (ML) and deep learning (DL) for precise diagnostics; (4) predictive maintenance, using AI to optimize maintenance scheduling and prevent failures; (5) reliability and risk assessment, integrating diverse datasets for real-time risk analysis; (6) visual inspection and remote monitoring, showcasing the role of AI-powered drones and imaging systems; and (7) resilient and adaptive infrastructure, where AI enables systems to respond dynamically to changing conditions. This review also addresses the ethical considerations and societal impacts of AI in SHM, such as data privacy, equity, and transparency. We conclude by discussing future research directions and challenges, emphasizing the potential of AI to enhance the efficiency, safety, and sustainability of infrastructure systems.
... For example, accountability could influence the attention and monitoring behavior of overseers in the search for evidence of errors (McBride et al., 2014; Skitka et al., 2000). In line with this, the imperfect-automation literature has shown that perceived accountability can improve the rate at which people detect inaccurate outputs (Mosier et al., 1998; Skitka et al., 2000). However, high accountability could also increase stress in oversight jobs, which may negatively affect the sensitivity to detect errors (Hall et al., 2017). ...
Article
Full-text available
Legislation and ethical guidelines around the globe call for effective human oversight of AI-based systems in high-risk contexts – that is, oversight that reliably reduces the risks otherwise associated with the use of AI-based systems. Such risks may relate to the imperfect accuracy of systems (e.g., inaccurate classifications) or to ethical concerns (e.g., unfairness of outputs). Given the significant role that human oversight is expected to play in the operation of AI-based systems, it is crucial to better understand the conditions for effective human oversight. We argue that the reliable detection of errors (as an umbrella term for inaccuracies and unfairness) is crucial for effective human oversight. We then propose that Signal Detection Theory (SDT) offers a promising framework for better understanding what affects people’s sensitivity (i.e., how well they are able to detect errors) and response bias (i.e., the tendency to report errors given perceived evidence of an error) in detecting errors. Whereas an SDT perspective on the detection of inaccuracies is straightforward, we demonstrate its broader applicability by detailing the specifics for an SDT perspective on unfairness detection, including the need to choose a standard for (un)fairness. Additionally, we illustrate that an SDT perspective helps to better understand the conditions for effective error detection by showing examples of task-, system-, and person-related factors that may affect the sensitivity and response bias of humans tasked with detecting unfairness associated with the use of AI-based systems. Finally, we discuss future research directions for an SDT perspective on error detection.
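As a rough illustration of the SDT quantities discussed above, the sketch below computes sensitivity (d') and response bias (c) from an overseer's hit and false-alarm counts when flagging erroneous or unfair outputs. The function name, the counts, and the log-linear correction are illustrative assumptions, not material from the article.

    # Minimal sketch (assumed, not from the article): SDT sensitivity (d') and
    # response bias (c) for an overseer flagging erroneous or unfair AI outputs.
    from statistics import NormalDist

    def sdt_measures(hits, misses, false_alarms, correct_rejections):
        # Log-linear correction avoids infinite z-scores when a rate is 0 or 1.
        hit_rate = (hits + 0.5) / (hits + misses + 1)
        fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
        z = NormalDist().inv_cdf
        d_prime = z(hit_rate) - z(fa_rate)             # sensitivity: how well errors are detected
        criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias: tendency to report errors
        return d_prime, criterion

    # Hypothetical counts: 100 flawed and 100 sound outputs reviewed.
    d, c = sdt_measures(hits=70, misses=30, false_alarms=20, correct_rejections=80)
    print(f"d' = {d:.2f}, c = {c:.2f}")

With these made-up counts the overseer shows moderate sensitivity and a mildly conservative response bias; task-, system-, and person-related factors of the kind the authors discuss would be expected to shift one or both of these quantities.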
Conference Paper
Labels like "AI-powered" or "Human-Expert" activate mental models and shape user decisions. Yet, the transferability of these labels to performance in complex, realistic tasks needs investigation. This study examines how recommender labeling and human factors (mindset, expertise) impact performance in a complex business management scenario. We conducted an online experiment employing a management dashboard, where participants (N = 395) received recommendations labeled as either Artificial Intelligence (AI) or Human-Expert-generated. Unlike in previous research, labeling did not significantly influence task performance. Instead, graph literacy and cognitive load were key predictors of performance. Participants with positive attitudes toward AI found the recommendations helpful, but their performance did not improve with their use. Expertise appears to outweigh labeling effects in this context. These findings highlight the interaction between expertise, mindset, and labeling, and advocate for further research into the contexts in which labeling and human factors critically influence performance when using AI recommendations.
Preprint
Full-text available
Recently, a growing number of experts in artificial intelligence (AI) and medicine have begun to suggest that the use of AI systems, particularly machine learning (ML) systems, is likely to humanise the practice of medicine by substantially improving the quality of clinician-patient relationships. In this thesis, however, I argue that medical ML systems are more likely to negatively impact these relationships than to improve them. In particular, I argue that the use of medical ML systems is likely to compromise the quality of trust, care, empathy, understanding, and communication between clinicians and patients.
Article
Full-text available
Machine learning (ML) systems are vulnerable to performance decline over time due to dataset shift. To address this problem, experts often suggest that ML systems should be regularly updated to ensure ongoing performance stability. Some scholarly literature has begun to address the epistemic and ethical challenges associated with different updating methodologies. Thus far, however, little attention has been paid to the impact of model updating on the ML-assisted decision-making process itself. This article aims to address this gap. It argues that model updating introduces a new sub-type of opacity into ML-assisted decision-making—update opacity—that occurs when users cannot understand how or why an update has changed the reasoning or behaviour of an ML system. This type of opacity presents a variety of distinctive epistemic and safety concerns that available solutions to the black box problem in ML are largely ill-equipped to address. A variety of alternative strategies may be developed or pursued to address the problem of update opacity more directly, including bi-factual explanations, dynamic model reporting, and update compatibility. However, each of these strategies presents its own risks or carries significant limitations. Further research will be needed to address the epistemic and safety concerns associated with model updating and update opacity going forward.
Article
With the emergence of artificial intelligence (AI) as a possible solution to the current workforce crises in the National Health Service, there has been an exponential increase in research showcasing its diagnostic performance in imaging-based screening programmes. It is likely to be implemented in screening in the near future. Surgeons play an integral role in any screening multidisciplinary team. However, there is a lack of awareness among the large majority regarding the evidence for AI in imaging-based diagnostics. It is of paramount importance that the surgeons who will be treating screen-detected cancers understand the nature of the AI models that will diagnose the disease. This review article outlines some of the flaws and gaps in the current AI-related evidence resulting in its lack of use at present. It describes four key stages to AI development, along with their respective barriers to implementation using examples from the breast, lung and orthopaedic fields. Furthermore, it explores algorithmic choice, big data, commonly reported outcome metrics and the novel human–AI relationship, together with possible solutions to ensure that AI is implemented safely in clinical practice.
Article
Algorithms are capable of advising human decision‐makers in an increasing number of management accounting tasks such as business forecasts. Due to the expected potential of these (intelligent) algorithms, there are growing research efforts to explore ways to boost algorithmic advice usage in forecasting tasks. However, algorithmic advice can also be erroneous, and the risk of using relatively bad advice is largely ignored in this research stream. Therefore, we conduct two online experiments to examine this risk of using relatively bad advice in a forecasting task. In Experiment 1, we examine the influence of performance feedback (revealing previous relative advice quality) and source of advice on advice usage in business forecasts. The results indicate that the provision of performance feedback increases subsequent advice usage but also the usage of subsequent relatively bad advice. In Experiment 2, we investigate whether advice representation, that is, displaying forecast intervals instead of a point estimate, helps to calibrate advice usage towards relative advice quality. The results suggest that advice representation might be a potential countermeasure to the usage of relatively bad advice. However, the effect of this antidote weakens when forecast intervals become less informative.
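The abstract does not specify how advice usage was measured; a common operationalization in the advice-taking literature is the weight of advice (WOA), sketched below purely as an assumed illustration of how reliance on a (possibly bad) algorithmic forecast can be quantified.

    # Hypothetical sketch: weight of advice (WOA), a common index of advice usage.
    # WOA = (final - initial) / (advice - initial); 0 = advice ignored, 1 = advice fully adopted.
    def weight_of_advice(initial_forecast, advice, final_forecast):
        if advice == initial_forecast:
            return None                    # WOA is undefined when advice equals the initial estimate
        woa = (final_forecast - initial_forecast) / (advice - initial_forecast)
        return max(0.0, min(1.0, woa))     # clip to [0, 1], as is commonly done

    # Example: the forecaster moves 62.5% of the way toward the algorithmic advice.
    print(weight_of_advice(initial_forecast=100.0, advice=140.0, final_forecast=125.0))  # 0.625

On such a measure, well-calibrated users would show high WOA when the advice is relatively good and low WOA when it is relatively bad.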
Article
Full-text available
Bogus resumes were evaluated by 212 business professionals to discover what mediates sex discrimination in hiring decisions. We hypothesized that discrimination against women and men who applied for stereotypically "masculine" and "feminine" jobs, respectively, could be reduced by providing individuating information suggesting that the applicant was an exception to his or her gender stereotype and possessed traits usually associated with the opposite gender. We also hypothesized that individuating information consistent with stereotypes about an applicant's gender would decrease the probability that an applicant would be evaluated favorably for a job usually considered appropriate for the other gender. We found that individuating information eliminated sex-typed personality inferences about male and female applicants and affected applicants' perceived job suitability; however, sex discrimination was not eliminated. We suggest that sex discrimination is mediated by occupation stereotypes that specify both the personality traits and the gender appropriate for each occupation. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Examined the process leading to the confirmation of a perceiver's expectancies about another when the social label that created the expectancy provides poor or tentative evidence about another's true dispositions or capabilities. Ss were 67 undergraduates. One group was led to believe that a child came from a high SES background; the other group, that the child came from a low SES background. Nothing in the SES data conveyed information directly relevant to the child's ability level, and when asked, both groups reluctantly rated the child's ability level to be approximately at grade level. Two other groups received the SES information and then witnessed a videotape of the child taking an academic test. Although the videotaped series was identical for all Ss, those who had information that the child came from a high SES rated her abilities well above grade level, whereas those for whom the child was identified as coming from a lower-class background rated her abilities as below grade level. Both groups cited evidence from the ability test to support their conclusions. Findings are interpreted as suggesting that some "stereotype" information creates not certainties but hypotheses about the stereotyped individual. However, these hypotheses are often tested in a biased fashion that leads to their false confirmation. (33 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
The effects of justification on subjects' application of judgment policies in a multiple-cue probability learning task under conditions of high versus low task predictability and provision versus no provision of feedback were investigated. The results showed that having to justify one's judgments will lead to higher consistency in the judgment policy when task predictability is low and no feedback is provided. The results are interpreted as indicating that justification may lead to an analytical mode of functioning in judgment behavior. Implications for research in cognitive conflict are also discussed.
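"Consistency" of a judgment policy in a multiple-cue probability learning task is usually indexed by how well a linear model of the cues reproduces the judge's own ratings. The sketch below shows one such index; the function, the data, and the choice of the multiple correlation R are illustrative assumptions rather than the article's method.

    # Hypothetical sketch: indexing judgment-policy consistency as the correlation
    # between a judge's ratings and a linear cue model fitted to those ratings.
    import numpy as np

    def policy_consistency(cues, judgments):
        # cues: (n_trials, n_cues) array; judgments: (n_trials,) array.
        X = np.column_stack([np.ones(len(cues)), cues])        # add intercept term
        beta, *_ = np.linalg.lstsq(X, judgments, rcond=None)   # fit linear judgment policy
        predicted = X @ beta
        return np.corrcoef(predicted, judgments)[0, 1]         # higher R = more consistent policy

    # Made-up data: 20 trials, 3 cues, a noisy but roughly linear judge.
    rng = np.random.default_rng(0)
    cues = rng.normal(size=(20, 3))
    judgments = cues @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=20)
    print(round(policy_consistency(cues, judgments), 2))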
Article
Previous research indicates that our initial impressions of events frequently influence how we interpret later information. This experiment explored whether accountability (pressures to justify one's impressions to others) leads people to process information more vigilantly and, as a result, reduces the undue influence of early-formed impressions on final judgments. Subjects viewed evidence from a criminal case and then assessed the guilt of the defendant. The study varied (1) the order of presentation of pro- vs. anti-defendant information, and (2) whether subjects expected to justify their decisions and, if so, whether subjects realized that they were accountable prior to or only after viewing the evidence. The results indicated that subjects given the anti/pro-defendant order of information were more likely to perceive the defendant as guilty than subjects given the pro/anti-defendant order of information, but only when subjects did not expect to justify their decisions or expected to justify their decisions only after viewing the evidence. Order of presentation of evidence had no impact when subjects expected to justify their decisions before viewing the evidence. Accountability prior to the evidence also substantially improved free recall of the case material. The results suggest that accountability reduces primacy effects by affecting how people initially encode and process stimulus information.
Article
Checklists are a way of life on the flight deck, and, undoubtedly, are indispensable decision aids due to the volume of technical knowledge that must be readily accessible. The improper use of checklists, however, has been cited as a factor in several recent aircraft accidents (National Transportation Safety Board, 1988, 1989, 1990). Solutions to checklist problems, including the creation of electronic checklist systems which keep track of skipped items, may solve some problems but create others. In this paper, results from a simulation involving an engine shutdown are presented, and implications of the electronic checklist and 'memory' checklist are discussed, in terms of potential errors and effects on decision making. Performance using two types of electronic checklist systems is compared with performance using the traditional paper checklist. Additionally, a 'performing from memory' condition is compared with a 'performing from the checklist' condition. Results suggest that making checklist procedures more automatic, either by asking crews to accomplish steps from memory, or by checklists that encourage crews to rely on system state as indicated by the checklist, rather than as indicated by the system itself, will discourage information gathering, and may lead to dangerous operational errors.
Article
Previous attitude-attribution studies indicate that people are often quick to draw conclusions about the attitudes and personalities of others, even when plausible external or situational causes for behavior exist (an effect known as the overattribution effect or fundamental attribution error). This experiment explores whether accountability (pressures to justify one's causal interpretations of behavior to others) reduces or eliminates this bias. Subjects were exposed to an essay that supported or opposed affirmative action. They were informed that the essay writer had freely chosen or had been assigned the position he took. Finally, subjects either did not expect to justify their impressions of the essay writer or expected to justify their impressions either before or after exposure to the stimulus information. The results replicated previous findings when subjects did not feel accountable for their impressions of the essay writer or learned of being accountable only after viewing the stimulus information. Subjects attributed essay-consistent attitudes to the writer even when the writer had been assigned the task of advocating a particular position. Subjects were, however, significantly more sensitive to situational determinants of the essay writer's behavior when they felt accountable for their impressions prior to viewing the stimulus information. The results suggest that accountability eliminated the overattribution effect by affecting how subjects initially encoded and analyzed stimulus information.