Patrick Gontar · DFS Deutsche Flugsicherung · Safety Intelligence
Research Items (19)
Objective: The aim of this study was to analyze influences on interrater reliability and within-group agreement within a highly experienced rater group when assessing pilots’ nontechnical skills. Background: Nontechnical skills of pilots are crucial for the conduct of safe flight operations. To train and assess these skills, reliable expert ratings are required. The literature shows, to some degree, that interrater reliability is influenced by factors related to the targets, scenarios, rating tools, or the raters themselves. Method: Thirty-seven type-rating examiners from a European airline assessed the performance of 4 flight crews based on video recordings using LOSA and adapted NOTECHS tools. We calculated rwg and ICC(3) to measure within-group agreement and interrater reliability. Results: The findings indicated that within-group agreement and interrater reliability were not always acceptable. The performance of outstanding pilots was rated with the highest within-group agreement. For cognitive aspects of performance, interrater reliability was higher than for social aspects of performance. Agreement was lower on the pass–fail level than for the distinguished performance scales. Conclusion: These results suggest pass–fail decisions should not be based exclusively on nontechnical skill ratings. We furthermore recommend that regulatory authorities more systematically address interrater reliability in airline instructor training. Airlines as well as training facilities should be encouraged to demonstrate sufficient interrater reliability when using their rating tools.
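The two statistics named in the Method can be illustrated with a minimal sketch. It assumes the single-item form of rwg with a uniform null distribution (James, Demaree, and Wolf) and the single-rater consistency form ICC(3,1) (Shrout and Fleiss); the abstract does not say which variants were used. The function names and the example ratings are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def rwg(ratings, scale_points):
    """Single-item r_wg for one target rated by several raters.

    Assumption: the null distribution is uniform over `scale_points`
    discrete scale steps, so its variance is (A^2 - 1) / 12.
    """
    expected_var = (scale_points ** 2 - 1) / 12
    observed_var = variance(ratings)  # sample variance across raters
    return 1 - observed_var / expected_var


def icc3_1(matrix):
    """ICC(3,1): two-way mixed model, consistency, single rater.

    `matrix` holds one row per target and one column per rater.
    """
    n, k = len(matrix), len(matrix[0])
    grand = mean(v for row in matrix for v in row)
    row_means = [mean(row) for row in matrix]
    col_means = [mean(row[j] for row in matrix) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in matrix for v in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_raters
    bms = ss_targets / (n - 1)             # between-targets mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical ratings: 4 crews (rows) rated by 3 examiners (columns), 5-point scale.
example = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 5, 4]]
agreement_per_crew = [rwg(row, scale_points=5) for row in example]
reliability = icc3_1(example)
```

Computing rwg per crew and ICC across crews mirrors the distinction the abstract draws: rwg indicates how closely raters agree on one target, while the ICC indicates how consistently raters rank the targets relative to each other.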
The objective of this study is to analyse pilots' decision-making behaviour in terms of naturalistic decision-making. In line with the highly experienced group of pilots (n = 120), recognition-primed decisions are expected to dominate. In a full-flight simulator experiment, with two groups of pilots (short-haul and long-haul pilots) with different levels of practice and training, we were able to show that only about one-third of the pilots make recognition-primed decisions. Results may indicate that the current training practice helps pilots to handle foreseeable problems very well, yet does not support pilots in ambivalent and new decision-making situations. Based on these findings, we recommend the incorporation of more unforeseen events in recurrent training simulator missions to train pilots in handling unknown situations. Practitioner Summary: The results from a flight-simulator study showed that pilots' decision-making is more analytical than recognition-primed. A possible reason for this could be the pressure for justification, or simply that pilots cannot use their experience in unforeseen situations. Hence, training should include more unforeseen events.
This paper presents the results of different methods to assess reliability when instructor pilots rate pilots regarding their non-technical skills (NOTECHS). In preparation for a major inter-rater reliability study, this pretest analyzes the rating behavior of two instructor pilots during a full-flight simulator mission. Besides inter-rater reliability and test-retest reliability, the pilots' self-rating (n = 12) and the instructors' point of view are analyzed. Results indicate a wide spread from poor to excellent reliabilities as a function of the different rating dimensions. Regarding inter-rater reliability, it is found that non-technical skills are rated more reliably under high workload conditions than under low workload conditions, and social aspects of non-technical skills are rated more reliably than cognitive aspects. Test-retest reliability is found to be .6 on average, whereas self-rating / instructor rating reliability is .5 on average. Based on these findings, implications for the major inter-rater reliability study will be derived and incorporated. The importance of effective Crew Resource Management (CRM) has been known since the late 1970s, when NASA held its workshop on "Resource Management on the Flightdeck" and came to the conclusion that a majority of accidents are directly linked to interpersonal skills (Helmreich, Merritt, & Wilhelm, 1999; Dietrich, 2004; Gontar, Hoermann, Deischl, & Haslbeck, 2014). Consequently, adequate training methods and corresponding evaluation metrics were developed (O'Connor, Hoermann, Flin, Lodge, & Goeters, 2002). Although huge efforts were undertaken to train the raters, inter-rater reliability (IRR) is still an issue to be discussed. For example, Flin and Martin (2001), Law and Sherman (1995), Law and Wilhelm (1995), and Seamster, Edens, and Holt (1995) found different influencing factors that result in reduced inter-rater reliability in the aviation context. Sevdalis et al. (2008) and Yule et al. (2008) showed similarly reduced reliabilities within the medical domain. In current airline practice, the trainer has to operate the simulator, simulate the air traffic controller, and assess the pilots during the mission, all at the same time. These circumstances make it worth analyzing the current evaluation practice in an airline to develop general recommendations to improve the reliability of CRM ratings during training.
Objective: This study evaluated airline pilots’ inceptor input patterns and flightpath control strategies during a manual instrument approach as a function of recent flight practice. Background: Manual flying skills erode due to an extensive use of automation and rare opportunity to practice these skills. Method: 126 randomly selected pilots of a European airline took part in this experiment, performing a simulated manual raw data precision approach. All of the pilots were allocated to one of four groups according to their fleet and rank: first officers and captains on short-haul, as well as first officers and captains on long-haul. A new method to analyze flightpath control strategies by differentiating between constant and variable flightpath errors was proposed. Time-domain measures were taken into account to evaluate sidestick inputs. Results: We distinguished between two different flightpath control strategies; both differed in the deviations achieved. In addition, the pilots who predominantly used one-dimensional sidestick inputs also had smaller deviations from the ideal flightpath. Conclusion: Pilots showed a relationship between manual fine-motor flying skills and recent flight practice, especially in long-haul fleets.
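The abstract does not define constant and variable flightpath errors. One common reading, assumed here, treats the constant error as the mean signed deviation from the ideal flightpath and the variable error as the spread of the deviations around that mean. The sketch below follows that assumption. The function name and sample values are hypothetical, and the paper's actual method may differ.

```python
from math import sqrt
from statistics import mean, pstdev


def decompose_flightpath_error(deviations):
    """Split signed deviations from the ideal flightpath into two parts.

    Assumed definitions (not taken from the paper):
      constant error = mean signed deviation (a steady offset)
      variable error = population standard deviation around that mean
    The root-mean-square error satisfies rms^2 = constant^2 + variable^2.
    """
    constant = mean(deviations)
    variable = pstdev(deviations)
    rms = sqrt(mean(d * d for d in deviations))
    return constant, variable, rms


# Hypothetical glideslope deviations sampled along an approach (arbitrary units).
samples = [0.4, 0.6, 0.5, 0.3, 0.7]
constant_err, variable_err, rms_err = decompose_flightpath_error(samples)
```

Under this reading, a pilot who flies steadily beside the ideal path shows a large constant but small variable error, whereas a pilot who oscillates around it shows the opposite pattern, even when both have the same overall RMS deviation.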
- Feb 2018
The safe and efficient operation of air traffic is highly dependent on the performance of the Air Traffic Control Officer (ATCO). ATCOs control the traffic within defined areas by monitoring the traffic and granting clearances. A key element in analyzing ATCOs is their interaction with the environment through their workplace. In particular, the influence of task load on their situation awareness (SA) and applied control strategy provides information on the quality of the workplace. As task load increases, controllers are able to maintain performance by using different management or compensation strategies. This article supports the evaluation of ATCOs’ workplaces by focusing on whether probe techniques for assessing SA are applicable for tower control operation and for measuring the influences of increased task load on the control strategy. An experiment with nine ATCOs was conducted in a simulated real-time air traffic control environment. Different measurements for SA were applied and compared regarding their efficiency and validity. The manipulation of task load and visibility influenced the SA and control strategy at the same time. Performance metrics were selected in advance to evaluate the participants’ efficiency. SA was measured with a probe technique and an offline self-assessment method. Findings suggest that probe techniques increase the insight into the understanding of SA in comparison to self-assessment and that they are applicable to the air traffic control environment. Control strategies were derived from the information-gathering process via the eye-movement behavior and connected to task load. The results imply that SA is part of the individual performance and that increasing demand through task load is handled with an adaptation of the control strategy.
We present the results of flight simulator experiments (60 runs) with randomly selected airline pilots under realistic operational conditions and discuss them in light of current fuel regulations and potential fuel starvation. The experiments were conducted to assess flight crew performance in handling complex technical malfunctions including decision-making in fourth-generation jet aircraft. Our analysis shows that the current fuel requirements of the European Aviation Safety Agency (EASA) are not sufficient to guarantee the safety target of the Advisory Council for Aviation Research and Innovation in Europe (ACARE), which is less than one accident in 10 million flights. To comply with this safety target, we recommend increasing the Final Reserve Fuel from 30 min to 45 min for jet aircraft. The minimum dispatched fuel upon landing should be at least 1 h.
- Sep 2017
This study aimed to analyze aircraft ground operation processes from a human factors perspective with special emphasis on the occurrence and influence of interruptions on pilots’ workload. Interruptions have been shown to increase workload and error probability as well as to contribute to fatal accidents in various fields. Countermeasures have been initiated especially in high-risk environments such as those involving medical issues. In aviation, more explicitly during turn-around processes, interruptions might occur frequently and impair flight safety. One hundred and sixty fully certified pilots working for a European airline were observed during their turn-around while performing real operations. Pilots’ interruptions were documented and classified in order to predict subjectively perceived workload by use of multiple linear regression analysis. External factors such as weather conditions, technical problems, and time pressure were considered as covariates. On average, a pilot experienced about eight interruptions during a turn-around. Overall workload estimates showed a level comparable to that of manual flying in a simulator. Interruptions from colleagues or from outside the cockpit were found to predict pilots’ workload; however, further external factors such as poor weather conditions impacted workload even more strongly. Based on our results, we suggest two approaches to handling the high rate of interruptions. We first recommend procedural changes to diminish the interruption rate; second, we recommend comprehensive, line-oriented flight training for airline and ground staff to raise awareness about the negative influence of interruptions.
- Jun 2017
The training and evaluation of the crew resource management skills of pilots play an essential role in increasing flight safety, as they aim to reduce human error in aviation operations. Communication between pilots is a critical crew resource management skill, as flying an airplane requires coordinated action and collaboration by the flight deck crew. However, research that studied flight instructors’ agreement in (and, thus, the accuracy of) their evaluation of pilots’ communication behavior found little consistency in their judgments. As such, the present research explores the feasibility of a content-free approach—cross-recurrence analysis—to assess crew communication, in contrast to commonly employed content-based approaches that are grounded in speech act analysis. Results indicate that cross-recurrence analysis can identify communication patterns associated with high and low crew performance. We discuss the implications that these results may have for future research and communication assessment in pilot training.
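As a rough illustration of what a content-free analysis can look like, the sketch below applies categorical cross-recurrence to two coded time series. The coding scheme (1 = speaking, 0 = silent per time step), the function names, and the sample sequences are assumptions for illustration only. The abstract does not describe the coding or the parameters actually used in the study.

```python
def cross_recurrence_matrix(a, b):
    """Cell (i, j) is 1 when series a at time i matches series b at time j."""
    return [[1 if x == y else 0 for y in b] for x in a]


def recurrence_rate(matrix):
    """Share of recurrent cells in the whole cross-recurrence matrix."""
    cells = [c for row in matrix for c in row]
    return sum(cells) / len(cells)


def diagonal_profile(a, b, max_lag):
    """Recurrence rate per lag: how often a[t] matches b[t + lag].

    `max_lag` must be smaller than the length of both series so that
    every lag has at least one pair to compare.
    """
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[t], b[t + lag]) for t in range(len(a)) if 0 <= t + lag < len(b)]
        profile[lag] = sum(x == y for x, y in pairs) / len(pairs)
    return profile


# Hypothetical speech activity of captain and first officer (1 = speaking).
captain = [1, 0, 1, 0]
first_officer = [0, 1, 0, 1]
rate = recurrence_rate(cross_recurrence_matrix(captain, first_officer))
profile = diagonal_profile(captain, first_officer, max_lag=1)
```

Because only the timing of matching states enters the computation, no transcript or speech act coding is needed. In this toy example the two speakers never match at lag 0 but always match at a lag of one step, which is the kind of turn-taking structure a diagonal profile is meant to expose.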
N = 60 commercial airline pilots holding valid ATPLs flew a manual ILS approach following a weather-induced missed approach during a night mission in full flight simulators. Measures of subjective fatigue, sustained attention, and the NASA Task Load Index were collected before and after the mission. In addition, sleep history data were available covering the three days prior to the simulator session. Both subjective and objective measures of fatigue showed a significant increase over the three hours of the experimental procedure. While sleep history data and roster information were related to both the overall level of fatigue and to reaction times, pilots who experienced a higher degree of workload during the simulator exercise showed a significant increase in subjective fatigue scores after the mission. The findings provide some evidence for lasting effects of a sleep deficit as well as for a multifactorial model of fatigue risk. Most models of fatigue risk in aviation can be traced back to the classical two-process model of sleep regulation (Borbély, 1982), which explains sleepiness through the interaction of homeostatic (sleep pressure by time awake) and circadian influences (circadian phase). In order to achieve more accurate predictions, some fatigue risk management systems (FRMS) additionally consider sleep inertia (Åkerstedt & Folkard, 1990), task-related factors (e.g., time-zone transitions, workload, work schedule), individual factors (e.g., lifestyle, chronotype), or cumulative effects (VanDongen et al., 2003). However, empirical validation data for task-related and individual factors with cognitive effectiveness within the aviation environment are rare and contradictory (Tritschler & Bond, 2010; Williamson et al., 2011). This study aimed to explore the relationship between subjective and objective measures of fatigue with factors of workload and work scheduling.
Our data were gathered before and after a simulator night mission with a sample of long- and short-haul pilots who had been awake for more than 16 hours. It was expected that individual sleep history and scheduling factors would be equally related to the overall level of fatigue before and after the simulator mission. In addition, we analyzed whether workload as experienced by the individual pilot during the simulator mission can be identified as a moderator variable for an increase in fatigue after the mission.
This paper describes an experimental design for investigating human information processing, using airline pilots during manual flying as an example. In a short landing scenario (raw data ILS), information acquisition and information implementation were disrupted in alternation, and the effects on the respective other area of human information processing were observed. The results show that the chosen disruptions were adequate.
This paper introduces a flight simulator study to evaluate procedures and checklists for use in abnormal situations. These utilities are meant to support pilots in hazardous situations, but for extreme situations, they might be inappropriate. Sixty crews (A340, A320) were required to fly in an abnormal flight scenario with a technical defect (loss of one main hydraulic system), which leads to several subsequent events. The aircrews had to mitigate these situations in order to safely land. The decreasing remaining fuel intensified time pressure while overloaded procedures had to be performed. In this study, the provided procedures were tested in this simulated crisis scenario.
Can professional pilots immediately apply and implement their flying skills from daily flight operations in a flight simulator? Based on this question, the process of adapting to a static simulator during manual flying was investigated in a flight simulator experiment. Eleven first officers from a partner airline each flew ten manual landing approaches under varying environmental conditions. To assess adaptation, manual flight performance, control behavior, and subjective strain were measured. The results of the contrast analyses show a clear adaptation process with respect to ILS deviations and control movements. In addition, subjective strain decreased over the course of the experiment.
Fifty-seven randomly selected pilots with differing levels of practice took part in this flight simulator experiment. One group consisted of long-haul captains (low level of practice, high experience), while short-haul first officers (high level of practice, low experience) were drawn for the other group as a random sample from a cooperating airline. Gaze distribution and the reliability of specific check glances were measured as dependent variables. During the scenario, the pilots had to fly a manual approach to the Munich runway following a go-around maneuver. The results of the experiment show differences between the two groups: both the general gaze distribution and the reliability of specific check glances differ significantly.
This paper presents an experimental evaluation of pilots’ ability to support their manual flying skills through visual behavior. To this end, two groups of pilots with different levels of practice and training are compared in a full flight simulator. Different visual information acquisition strategies are used during the flight phases. In flight, pilots must direct their attention towards monitoring, while in a manual flying phase (approach and landing), a more frequent and accurate panel scan is imperative. The gaze data collected during this high-taskload flight period makes it possible to detect the differences between these two groups.