Article

Comparing Two Decision Support Modes Using the Cognitive Shadow Online Policy-Capturing System

Authors:
  • Thales Research and Technology Canada

Abstract

The Cognitive Shadow is a prototype tool intended to support decision making by autonomously modeling human operators' response patterns and providing online notifications about the decisions they are expected to make in new situations. Since the system can be configured in either a reactive "shadowing" mode or a proactive "recommendation" mode, this study aimed to determine which mode is most effective in terms of human and model accuracy, workload, and trust. Subjects participated in an aircraft threat evaluation simulation either without decision support or while using one of the two modes of the Cognitive Shadow. Whereas the recommendation mode had no advantage over the control condition, the shadowing mode led to higher human and model accuracy. These benefits were maintained even when the tool was unexpectedly removed. Neither mode influenced workload, and the initially lower trust rating in the shadowing mode faded quickly, making shadowing the best overall configuration for the cognitive assistant.
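
The contrast between the two modes can be made concrete with a minimal sketch. This is not the authors' implementation; the class and method names are illustrative assumptions, and any classifier exposing fit/predict could stand in for the learned model.

```python
# Minimal sketch (not the authors' implementation) contrasting the two
# decision-support modes; names and interfaces are assumptions.

class CognitiveAssistant:
    def __init__(self, model, mode):
        self.model = model          # any classifier with fit/predict
        self.mode = mode            # "shadowing" or "recommendation"
        self.history = []           # (situation, decision) pairs seen so far

    def before_decision(self, situation):
        """Proactive mode: suggest a decision before the operator commits."""
        if self.mode == "recommendation" and self.history:
            return self.model.predict([situation])[0]
        return None                 # shadowing mode stays silent here

    def after_decision(self, situation, decision):
        """Reactive mode: warn only when the decision deviates from the
        learned pattern, then keep learning online."""
        warning = None
        if self.mode == "shadowing" and self.history:
            predicted = self.model.predict([situation])[0]
            if predicted != decision:
                warning = f"Expected {predicted!r}, got {decision!r}"
        self.history.append((situation, decision))
        X, y = zip(*self.history)
        self.model.fit(list(X), list(y))   # retrain on all labeled examples
        return warning
```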

... In active learning application contexts, queries are considered to have a prohibitive cost (each labeled example costs time, resources, or effort), so it is important that the queried instance give the model as much information as possible. The Cognitive Shadow system, developed by Thales Research and Technology Canada [2][3][4][5][6], is a data-driven tool that uses frugal learning techniques to capture the judgement policies of human experts and provide online decision support. In the current paper, we aim to assess the potential of such a technique under specific constraints, such as the need to operate in real time, and propose a new hybrid data exploration/exploitation method to balance potential tradeoffs. ...
... The Cognitive Shadow [2][3][4][5][6][7] is an AI-based knowledge capture and decision support system that can be integrated into various mission systems. It automatically learns an operator's decision pattern and provides real-time warnings to prevent potential errors when a mismatch is detected between the predicted decision and the user's. ...
Chapter
Full-text available
Modeling human expert decision patterns can potentially help create training and decision support systems when no ground truth data is available. A cognitive modeling approach presented herein uses a combination of supervised learning methods to mimic expert strategies. Yet without historical logs of human expert judgments in a given domain, training machine learning algorithms on new examples labelled one by one by human experts can be time-consuming and costly. This paper investigates the use of active learning methods for example selection in policy capturing sessions with an oracle in order to optimize frugal learning efficiency. It also introduces a new hybrid method aimed at improving predictive accuracy through better management of the exploration/exploitation tradeoff. Analyses on three datasets evaluated data exploration, data exploitation, and hybrid methods. Results highlight the different tradeoffs of these methods and show the benefits of using a hybrid approach.
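
As a rough illustration of the exploration/exploitation tradeoff such a hybrid method must manage, the following sketch mixes uncertainty sampling (exploitation) with random selection (exploration) via an epsilon parameter. The paper's actual hybrid method is not specified in this abstract; the epsilon-greedy scheme below is an assumption.

```python
# Hedged sketch of a hybrid exploration/exploitation query strategy for
# active learning; the paper's actual method may differ.
import numpy as np

def hybrid_query(model, X_pool, epsilon=0.3, rng=None):
    """Pick one pool index to label: with probability epsilon explore at
    random, otherwise exploit by choosing the least-confident instance."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:                  # exploration
        return int(rng.integers(len(X_pool)))
    proba = model.predict_proba(X_pool)         # exploitation
    confidence = proba.max(axis=1)              # top-class probability
    return int(np.argmin(confidence))           # most uncertain instance
```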
... Policy capturing has mainly been studied and performed as an offline, asynchronous analysis strategy, in which a predictive model is computed from a set of previously made decisions taken to be representative of a decision-maker's judgement policies. Recently, Thales Research and Technology Canada developed the Cognitive Shadow system, a prototype tool that automatically learns a user's decision pattern from past decisions and can provide advisory warnings in real time when the decision of the user does not match the system prediction [7,11]. ...
... The goal of this study was to test the effectiveness of integrating three types of evolutionary computation models derived from neuroevolution (NE) and genetic programming (GP) methods (namely NEAT, GPL, and ZGP) to augment the predictive capacity of the Cognitive Shadow, a tool developed to support frugal policy capturing to model and support human decision making [7,11]. To do so, we generated datasets of different sizes across three use cases that differed in complexity. ...
Conference Paper
Full-text available
Decision making can be modeled in various ways for the design of decision-support systems. One strategy privileged for this purpose is policy capturing, i.e. using statistical techniques (and more recently machine learning) to model judgement policies. The Cognitive Shadow is a prototype tool suited for frugal learning that automatically learns a user's decision pattern in real time based on an ensemble of seven supervised learning algorithms. This tool can provide advisory warnings when the user's decision is inconsistent with the predicted outcome. Evolutionary computation methods could reinforce the system's efficiency because of their ability to deal with computational complexity via evolution-inspired optimization mechanisms. The goal of this study was to assess the potential of evolutionary algorithms for frugal learning in an online policy capturing context. To do so, we tested three evolutionary algorithms on three different datasets (each split into three sizes), and compared both their prediction performance and training time with those of the other modeling techniques already implemented in the Cognitive Shadow system. Although all three evolutionary models were generally outperformed by non-evolutionary learning algorithms, one genetic programming method showed good prediction performance for the more complex use cases with the smaller datasets.
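
The ensemble idea can be sketched with off-the-shelf components. The seven algorithms actually used in the Cognitive Shadow are not listed in this abstract, so the three members below are placeholders, not the system's configuration.

```python
# Illustrative sketch of a majority-vote ensemble of supervised learners,
# in the spirit of the multi-algorithm design described above; the member
# models shown here are placeholder assumptions.
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",   # majority vote across member predictions
)
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_new)
```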
Article
The Cognitive Shadow is a prototype decision support tool that can notify users when they deviate from their usual judgment pattern. Expert decision policies are learned automatically online, while the user performs the task, using a combination of machine learning algorithms. This study investigated whether combining this system with a process tracing technique could improve its ability to model human decision policies. Participants played the role of anti-submarine warfare commanders and rated the likelihood of detecting a submarine in different ocean areas based on their environmental characteristics. In the process tracing condition, participants were asked to reveal only the information deemed necessary, and only that information was sent to the system for model training. In the control condition, all the available information was sent to the system with each decision. Results showed that process tracing data improved the model's ability to predict human decisions compared to the control condition.
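
One plausible way to realize the process-tracing condition in code is to mask the cues a participant never revealed before sending the example to the learner. The feature names and the NaN-for-missing convention below are illustrative assumptions, not the study's implementation.

```python
# Hedged sketch: train only on the cues the participant actually revealed,
# marking the rest as missing. Cue names are hypothetical.
import numpy as np

def build_training_row(all_cues, revealed_keys):
    """Keep revealed cue values; mark unrevealed cues as missing (NaN)."""
    return np.array([all_cues[k] if k in revealed_keys else np.nan
                     for k in sorted(all_cues)])

row = build_training_row(
    {"depth": 120.0, "salinity": 35.1, "temperature": 8.4},
    revealed_keys={"depth", "temperature"},   # participant opened only these
)
# -> array([120. ,  nan,  8.4]); a NaN-tolerant learner such as
#    sklearn's HistGradientBoostingClassifier can train on such rows.
```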
Conference Paper
While non-player opponents in commercial video games often rely on simple artificial intelligence techniques, machine learning techniques that capture human strategies could make them more engaging. Cognitive Shadow is a prototype tool that combines several artificial intelligence techniques to continuously model human decision-making patterns during tasks that require categorical decision-making. The present study aims to assess the potential of Cognitive Shadow to create learning opponents that will counter the player's decisions in a strategy game, making it more challenging and engaging. The game developed to this end is a more complex version of rock-paper-scissors, set within the context of a wizards’ duel. Each participant (Player 1) took part in three game sessions of 12 battles (each including five rounds), only being told that they would face a non-player opponent. During Session 1, Cognitive Shadow was in learning mode, thus the non-player opponent (Player 2) chose its plays at random. During Session 2, Cognitive Shadow was active and helped counter participants’ decisions without their knowledge. Before Session 3, participants were informed that their opponent was using machine learning to anticipate and counter their strategy. The results showed that Player 2 was more effective with the help of Cognitive Shadow, having won significantly more battles in Sessions 2 and 3 than in Session 1. In addition, the level of engagement reported by human players increased significantly in Session 3. These results indicate that cognitive shadowing can be used in a strategy game to increase engagement when players are aware of the learning behavior.
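
The counter-play loop can be sketched in a few lines. The rock-paper-scissors payoff table below stands in for the wizards'-duel rules, which are not detailed in the abstract, and the model interface is an assumption.

```python
# Minimal sketch of the counter-play idea: predict the human's next move
# from recent history, then answer with the move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def counter_move(model, recent_moves):
    """Counter the predicted human move; `model` maps a history window to
    a predicted next move (any fitted classifier would do)."""
    predicted = model.predict([recent_moves])[0]
    return BEATS[predicted]        # play whatever beats the prediction
```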
Article
Full-text available
One component in the successful use of automated systems is the extent to which people trust the automation to perform effectively. In order to understand the relationship between trust in computerized systems and the use of those systems, we need to be able to effectively measure trust. Although questionnaires regarding trust have been used in prior studies, these questionnaires were theoretically rather than empirically generated and did not distinguish between three potentially different types of trust: human-human trust, human-machine trust, and trust in general. A 3-phased experiment, comprising a word elicitation study, a questionnaire study, and a paired comparison study, was performed to better understand similarities and differences in the concepts of trust and distrust, and among the different types of trust. Results indicated that trust and distrust can be considered opposites, rather than different concepts. Components of trust, in terms of words related to trust, were similar across the three types of trust. Results obtained from a cluster analysis were used to identify 12 potential factors of trust between people and automated systems. These 12 factors were then used to develop a proposed scale to measure trust in automation.
Article
Full-text available
The out-of-the-loop performance problem, a major potential consequence of automation, leaves operators of automated systems handicapped in their ability to take over manual operations in the event of automation failure. This is attributed to a possible loss of skills and of situation awareness (SA) arising from vigilance and complacency problems, a shift from active to passive information processing, and change in feedback provided to the operator. We studied the automation of a navigation task using an expert system and demonstrated that low SA corresponded with out-of-the-loop performance decrements in decision time following a failure of the expert system. Level of operator control in interacting with automation is a major factor in moderating this loss of SA. Results indicated that the shift from active to passive processing was most likely responsible for decreased SA under automated conditions.
Article
Full-text available
The aim of this study was to evaluate display formats for an automated combat identification (CID) aid. Verbally informing users of automation reliability improves reliance on automated CID systems. A display can provide reliability information in real time. We developed and tested four visual displays that showed both target identity and system reliability information. Display type (pie, random mesh) and display proximity (integrated, separated) of identity and reliability information were manipulated. In Experiment 1, participants used the displays while engaging targets in a simulated combat environment. In Experiment 2, participants briefly viewed still scenes from the simulation. Participants relied on the automation more appropriately with the integrated display than with the separated display. Participants using the random mesh display showed greater sensitivity than those using a pie chart. However, in Experiment 2, the sensitivity effects were limited to lower reliability levels. The integrated display format and the random mesh display were the most effective displays tested. We recommend the use of the integrated format and a random mesh display to indicate identity and reliability information with an automated CID system.
Article
Full-text available
The mathematical representation of E. Brunswik's (1952) lens model has been used extensively to study human judgment and provides a unique opportunity to conduct a meta-analysis of studies that covers roughly 5 decades. Specifically, the authors analyzed statistics of the "lens model equation" (L. R. Tucker, 1964) associated with 249 different task environments obtained from 86 articles. On average, fairly high levels of judgmental achievement were found, and people were seen to be capable of achieving similar levels of cognitive performance in noisy and predictable environments. Further, the effects of task characteristics that influence judgment (numbers and types of cues, inter-cue redundancy, function forms and cue weights in the ecology, laboratory versus field studies, and experience with the task) were identified and estimated. A detailed analysis of learning studies revealed that the most effective form of feedback was information about the task. The authors also analyzed empirically under what conditions the application of bootstrapping (replacing judges by their linear models) is advantageous. Finally, the authors note shortcomings of the kinds of studies conducted to date, limitations in the lens model methodology, and possibilities for future research.
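
For reference, the lens model equation (Tucker, 1964) analyzed in this meta-analysis decomposes judgmental achievement into a modeled and an unmodeled component:

```latex
% Lens model equation (Tucker, 1964): judgmental achievement decomposed
% into a modeled (linear) component and an unmodeled component.
r_a = G \, R_e \, R_s + C \sqrt{1 - R_e^2} \, \sqrt{1 - R_s^2}
```

Here r_a is the judge-criterion correlation (achievement), R_e the predictability of the environment, R_s the consistency of the judge, G the correlation between the predictions of the environmental and judge-side linear models, and C the correlation between their residuals.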
Article
Full-text available
This is the first of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if the empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your generalizer will generalize well.
Article
Full-text available
This paper uses off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. It is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the generalizer with largest cross-validation error). On the other hand, for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However, even for such loss functions, any algorithm is equivalent on average to its "randomized" version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, it may be that (for example) cross-validation has better minimax properties than anti-cross-validation, even for zero-one loss. This paper also analyzes averages over hypotheses rather than targets.
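
A schematic statement of the core "no free lunch" result, in notation assumed here rather than copied from the paper: uniformly averaged over all target functions, expected off-training-set error does not distinguish between learning algorithms.

```latex
% Schematic no-free-lunch statement (notation assumed): averaged uniformly
% over all target functions f, the expected off-training-set error given
% training data d is the same for any two learning algorithms A_1, A_2.
\sum_{f} P(\mathrm{error} \mid f, d, A_1)
  = \sum_{f} P(\mathrm{error} \mid f, d, A_2)
```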
Chapter
Thales Research and Technology Canada is developing a decision support system consisting of multiple classification models trained simultaneously online to capture experts' decision policies based on their previous decisions. The system learns decision patterns from examples annotated by a human expert during a training phase of knowledge capture. Because of the small volume of labeled data, we investigated a machine learning technique called active learning, which copes with the dilemma of learning with minimal resources and aims at requesting the most informative samples in a pool given the current models. The current study evaluates the impact of using active learning over an uninformed strategy (e.g., random sampling) in the context of policy capturing to reduce the annotation cost during the knowledge capture phase. This work shows that active learning has potential over random sampling for capturing human decision policies with a minimal number of examples and for reducing annotation cost significantly.
Chapter
The present work introduces a prototype intelligent cognitive assistant that continuously learns the decision pattern of the user online, instantly recognizes deviations from that pattern as potential errors and then alerts the user accordingly. We investigated the potential of this prototype system using a human-in-the-loop experiment designed to assess impacts on decision making performance, workload and trust in the decision support capability. Study participants interacted with a naval air-defence testbed to classify radar contacts as friendly, uncertain or hostile based on track parameters displayed on screen. The between-group experimental design included a control condition and two decision support conditions (with system reliability provided either offline or online). Results showed that both decision support conditions significantly improved task accuracy compared to the control condition. The advisory system was successful at improving human performance without burdening the user with excessive additional workload, even when providing reliability information in real time.
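
The online-reliability condition suggests a simple mechanism: track the running agreement rate between the model's predictions and the user's actual decisions. The sketch below is an assumed realization of that idea, not the study's implementation.

```python
# Hedged sketch of an online reliability estimate: the rolling agreement
# rate between model predictions and user decisions, which could be
# surfaced to the user in real time.
from collections import deque

class OnlineReliability:
    def __init__(self, window=50):
        self.hits = deque(maxlen=window)   # 1 = prediction matched the user

    def update(self, predicted, actual):
        self.hits.append(int(predicted == actual))

    @property
    def value(self):
        """Agreement rate over the recent window; None until data arrives."""
        return sum(self.hits) / len(self.hits) if self.hits else None
```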
Article
Policy capturing is a judgment analysis method that typically uses linear statistical modeling to estimate expert judgments. A variant to this technique is to capture decision policies using data-mining algorithms designed to handle nonlinear decision rules, missing attributes, and noisy data. In the current study, we tested the effectiveness of a decision-tree induction algorithm and an instance-based classification method for policy capturing in comparison to the standard linear approach. Decision trees are relevant in naturalistic decision-making contexts since they can be used to represent “fast-and-frugal” judgment heuristics, which are well suited to describe human cognition under time pressure. We examined human classification behavior using a simulated naval air defense task in order to empirically compare the C4.5 decision-tree algorithm, the k-nearest neighbors algorithm, and linear regression on their ability to capture individual decision policies. Results show that C4.5 outperformed the other methods in terms of goodness of fit and cross-validation accuracy. Decision-tree models of individuals’ judgment policies actually classified contacts more accurately than their human counterparts, resulting in a threefold reduction in error rates. We conclude that a decision-tree induction algorithm can yield useful models for training and decision support applications, and we discuss the application of judgmental bootstrapping in real time in dynamic environments.
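
The comparison can be approximated with scikit-learn stand-ins: DecisionTreeClassifier in place of C4.5 (scikit-learn implements the closely related CART algorithm), k-nearest neighbors, and logistic regression as the linear baseline for classification. This is a hedged re-creation, not the study's original code.

```python
# Hedged sketch comparing the three model families via cross-validation.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def compare_policies(X, y, folds=5):
    """Return mean cross-validation accuracy per policy-capturing model."""
    models = {
        "decision_tree": DecisionTreeClassifier(),        # C4.5 stand-in
        "knn": KNeighborsClassifier(n_neighbors=5),       # instance-based
        "logistic_regression": LogisticRegression(max_iter=1000),  # linear
    }
    return {name: cross_val_score(m, X, y, cv=folds).mean()
            for name, m in models.items()}
```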
Article
This paper describes an experiment that was undertaken to compare three levels of automation in rail signalling: a high level in which an automated agent set routes for trains using timetable information, a medium level in which trains were routed along pre-defined paths, and a low level in which the operator (signaller) was responsible for the movement of all trains. These levels are described in terms of a Rail Automation Model based on previous automation theory (Parasuraman et al., 2000). Performance, subjective workload, and signaller activity were measured for each level of automation running under both normal operating conditions and abnormal, or disrupted, conditions. The results indicate that perceived workload, during both normal and disrupted phases of the experiment, decreased as the level of automation increased, and performance was most consistent (i.e. showed the least variation between participants) with the highest level of automation. The results give a strong case in favour of automation, particularly in terms of demonstrating the potential for automation to reduce workload, but also suggest much benefit can be achieved from a mid-level of automation, potentially at a lower cost and complexity.
Article
A prototype decision support system (DSS) was developed to enhance Navy tactical decision making based on naturalistic decision processes. Displays were developed to support critical decision making tasks through recognition-primed and explanation-based reasoning processes, and cognitive analysis was conducted of the decision making problems faced by Navy tactical officers in a shipboard Combat Information Center. Baseline testing in simulations of high intensity, peace keeping, littoral missions indicated that experienced decision makers were not well served by current systems, and their performance revealed periodic loss of situation awareness. A study is described with eight expert Navy tactical decision making teams who used either their current system alone or in conjunction with the prototype DSS. When the teams had the prototype DSS available, we observed significantly fewer communications to clarify the tactical situation, significantly more critical contacts identified early in the scenario, and a significantly greater number of defensive actions taken against imminent threats. These findings suggest that the prototype DSS enhanced the commanders' awareness of the tactical situation, which in turn contributed to greater confidence, lower workload, and more effective performance. Significant work remains to be done in learning how to optimally design and train users of such systems.
Article
The knowledge elicitation problem arises from the need to acquire the knowledge of human experts in an explicit form suitable for encoding in a computer program such as an expert system. This is very difficult to perform successfully because of the size and complexity of knowledge structures in the human brain, and because much procedural knowledge is tacit and unavailable to conscious verbal report via interview methods. The present paper draws upon an extensive review of research in the field of cognitive psychology in an attempt to offer a practical approach to this problem. First, a wide range of cognitive theories concerning the nature of knowledge representation in humans is considered, and a synthesis of the current state of theory is provided. Second, attention is drawn to a number of performance factors which may constrain the exhibition of a person's underlying cognitive competence. There then follows a review and discussion of a number of alternative psychological methodologies that might be applied to the elicitation of different types of human knowledge. Finally, some suggestions are made for the application of the psychological work discussed to the practical problem of knowledge elicitation.
Article
"Judgment Analysis" provides [a] theoretical and methodological summary of judgment analysis that integrates a diverse range of issues, guiding principles, and applications. Key features concern capturing, comparing, and aggregating judgment policies. The book is a training guide for new researchers and postgraduate students and a handbook for more experienced researchers and consultants. Key features: [includes] a complete methodological guide to the conduct of "Judgment Analysis" studies; traces the history of the Judgment Analysis paradigm from E. Brunswik to present day professionals [and] reviews areas of application for Judgment Analysis including medical, social policy, education, clinical, and consumer management. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Recognising the existence of different forms of knowledge is a first step towards effective knowledge elicitation. This article takes a brief look at some of the different types of knowledge which human experts possess and then focusses on the problem of implicit knowledge. The fact that much of an expert's knowledge is implicit or tacit in nature is a major problem for those working in the area of knowledge elicitation. Despite this, the topic has attracted little discussion or research. The present article reviews some of the limited literature on the topic and attempts to settle some of the confusion over what implicit knowledge is, or might be. Relevant experiments from the psychological literature are discussed. The paper also looks at possible ways of assessing implicit knowledge and makes recommendations for future research in this area.
Article
The present study investigates automation misuse based on complacency and automation bias in interacting with a decision aid in a process control system. The effect of a preventive training intervention which includes exposing participants to rare automation failures is examined. Complacency is reflected in an inappropriate checking and monitoring of automated functions. In interaction with automated decision aids, complacency might result in commission errors, i.e., following automatically generated recommendations even though they are false. Yet, empirical evidence proving this kind of relationship is still lacking. A laboratory experiment (N=24) was conducted using a process control simulation. An automated decision aid provided advice for fault diagnosis and management. Complacency was directly measured by the participants' information sampling behavior, i.e., the amount of information sampled in order to verify the automated recommendations. Possible commission errors were assessed when the aid provided false recommendations. The results provide clear evidence for complacency, reflected in an insufficient verification of the automation, while commission errors were associated with high levels of complacency. Hence, commission errors seem to be a possible, albeit not inevitable, consequence of complacency. Furthermore, exposing operators to automation failures during training significantly decreased complacency and thus represents a suitable means to reduce this risk, even though it might not avoid it completely. Potential applications of this research include the design of training protocols in order to prevent automation misuse in interaction with automated decision aids.
Chapter
The results of a multi-year research program to identify the factors associated with variations in subjective workload within and between different types of tasks are reviewed. Subjective evaluations of 10 workload-related factors were obtained from 16 different experiments. The experimental tasks included simple cognitive and manual control tasks, complex laboratory and supervisory control tasks, and aircraft simulation. Task-, behavior-, and subject-related correlates of subjective workload experiences varied as a function of difficulty manipulations within experiments, different sources of workload between experiments, and individual differences in workload definition. A multi-dimensional rating scale is proposed in which information about the magnitude and sources of six workload-related factors are combined to derive a sensitive and reliable estimate of workload.
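
The combination scheme described here was later standardized as NASA-TLX: six subscale ratings weighted by how often each factor is chosen across 15 pairwise comparisons. The sketch below assumes that standard weighting procedure; the example numbers are hypothetical.

```python
# Hedged sketch of the weighted-workload computation associated with the
# multi-dimensional scale described above (later standardized as NASA-TLX).
def weighted_workload(ratings, weights):
    """ratings: subscale -> 0..100; weights: times each subscale was chosen
    in the 15 pairwise comparisons (weights sum to 15)."""
    assert sum(weights.values()) == 15
    return sum(ratings[s] * weights[s] for s in ratings) / 15.0

score = weighted_workload(
    ratings={"mental": 70, "physical": 10, "temporal": 55,
             "performance": 40, "effort": 65, "frustration": 30},
    weights={"mental": 4, "physical": 0, "temporal": 3,
             "performance": 2, "effort": 4, "frustration": 2},
)   # -> a single 0..100 workload estimate
```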
Article
Data from 2 previous studies were reanalyzed, one on judgments regarding drug treatment of hyperlipidemia and the other on diagnosing heart failure. The original matching heuristic (MH) model and the extended MH model were compared with logistic regression (LR) in terms of fit to actual judgments, number of cues, and the extent to which the cues were consistent with clinical guidelines. There was a slightly better fit with LR compared with MH. The extended MH model gave a significantly better fit than the original MH model in the drug treatment task. In the diagnostic task, the number of cues was significantly lower in the MH models compared to LR, whereas in the therapeutic task, LR could be less or more frugal than the matching heuristic models depending on the significance level chosen for inclusion of cues. For the original MH model, but not for the extended MH model or LR, the most important cues in the drug treatment task were often used in a direction contrary to treatment guidelines. The extended MH model represents an improvement in that the prevalence of cue values is adequately taken into account, which in turn may result in better fit and in better agreement with medical guidelines in the evaluation of cues.
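
The matching heuristic family discussed above can be sketched as an ordered cue search that stops at the first cue found at its critical value. The cue names, critical values, and decision labels below are hypothetical illustrations, not the study's fitted models.

```python
# Hedged sketch of a matching heuristic (MH): check cues one at a time in
# a fixed order and decide as soon as a cue takes its critical value.
def matching_heuristic(case, ordered_cues, match_decision="treat",
                       default="do_not_treat"):
    """ordered_cues: (cue_name, critical_value) pairs searched in order of
    assumed importance; the first match triggers match_decision."""
    for cue, critical_value in ordered_cues:
        if case.get(cue) == critical_value:
            return match_decision   # first matching cue decides
    return default                  # no cue matched its critical value

decision = matching_heuristic(
    {"ldl_high": True, "diabetic": False},
    ordered_cues=[("ldl_high", True), ("diabetic", True)],
)   # -> "treat", because the first cue is at its critical value
```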