Structuring the Safety Argumentation for Deep Neural Network Based Perception in Automotive Applications

Deep neural networks (DNNs) are widely considered as a key technology for perception in high and full driving automation. However, their safety assessment remains challenging, as they exhibit specific insufficiencies: black-box nature, simple performance issues, incorrect internal logic, and instability. These are not sufficiently considered in existing standards on safety argumentation. In this paper, we systematically establish and break down safety requirements to argue the sufficient absence of risk arising from such insufficiencies. We furthermore argue why diverse evidence is highly relevant for a safety argument involving DNNs, and classify available sources of evidence. Together, this yields a generic approach and template to thoroughly respect DNN specifics within a safety argumentation structure. Its applicability is shown by providing examples of methods and measures following an example use case based on pedestrian detection.


... We have completed and adapted this approach to integrate the specific properties (safe behaviour reference given by the LUT) and the safety net. [26] proposed a template to structure the safety argumentation part specific to DNNs. Their work is illustrated with an example use case based on pedestrian detection. ...
The latest generation of safety standards applicable to automated driving systems require both qualitative and quantitative safety acceptance criteria to be defined and argued. At the same time, the use of machine learning (ML) functions is increasingly seen as a prerequisite to achieving the necessary levels of perception performance in the complex operating environments of these functions. This inevitably leads to the question of which supporting evidence must be presented to demonstrate the safety of ML-based automated driving systems. This chapter discusses the challenge of deriving suitable acceptance criteria for the ML function and describes how such evidence can be structured in order to support a convincing safety assurance case for the system. In particular, we show how a combination of methods can be used to estimate the overall machine learning performance, as well as to evaluate and reduce the impact of ML-specific insufficiencies, both during design and operation.
Integration of Machine Learning (ML) components in critical applications introduces novel challenges for software certification and verification. New safety standards and technical guidelines are under development to support the safety of ML-based systems, e.g., ISO 21448 SOTIF for the automotive domain and the Assurance of Machine Learning for use in Autonomous Systems (AMLAS) framework. SOTIF and AMLAS provide high-level guidance but the details must be chiseled out for each specific case. We report results from an industry-academia collaboration on safety assurance of SMIRK, an ML-based pedestrian automatic emergency braking demonstrator running in an industry-grade simulator. We present the outcome of applying AMLAS on SMIRK for a minimalistic operational design domain, i.e., a complete safety case for its integrated ML-based component. Finally, we report lessons learned and provide both SMIRK and the safety case under an open-source licence for the research community to reuse.
The shift to alternative drive systems and the growing complexity of automotive software confront the automotive industry with major challenges. This holds above all when novel development paradigms profoundly change the established tradition of product development. AI is one such driver of change, and in this chapter we examine its influence on the technical development of future vehicle platforms and mobility products. We address the product side as well as the development process and the company perspective.
Approximating while compressing lookup tables (LUT) with a set of neural networks (NN) is an emerging trend in safety critical systems, such as control/command or navigation systems. Recently, as an example, many research papers have focused on the ACAS Xu LUT compression. In this work, we explore how to make such a compression while preserving the system safety and offering adequate means of certification.
Label noise is a primary point of interest for safety concerns in previous works, as it considerably affects the robustness of a machine learning system. This paper studies the sensitivity of object detection loss functions to label noise in bounding box detection tasks. Although label noise has been widely studied in the classification context, less attention has been paid to its effect on object detection. We characterize different types of label noise and concentrate on the most common type of annotation error, which is missing labels. We simulate missing labels by deliberately removing bounding boxes at training time and study the effect on different deep learning object detection architectures and their loss functions. Our primary focus is on comparing two particular loss functions: cross-entropy loss and focal loss. We also examine the effect of different focal loss hyperparameter values with varying amounts of noise in the datasets and discover that even up to 50% missing labels can be tolerated with an appropriate selection of hyperparameters. The results suggest that focal loss is more sensitive to label noise, but increasing the gamma value can boost its robustness.
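The focal loss discussed above has a compact closed form. A minimal single-example sketch in plain Python (the function name and example probabilities are illustrative, not from the paper) shows how a growing gamma down-weights easy examples while hard examples keep most of their loss:

```python
import math

def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss for one example, where p_true is the predicted
    probability of the true class; gamma = 0 recovers plain
    cross-entropy, larger gamma down-weights easy examples."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# An "easy" example (p_true = 0.95) loses almost all of its loss as
# gamma grows, while a "hard" example (p_true = 0.3) keeps most of it.
for gamma in (0.0, 2.0, 5.0):
    print(f"gamma={gamma}: easy={focal_loss(0.95, gamma):.4f}, "
          f"hard={focal_loss(0.3, gamma):.4f}")
```

The modulating factor (1 - p)^gamma is what the gamma hyperparameter in the abstract controls; the paper's observation is that tuning it trades off robustness to missing labels.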
Developing a stringent safety argumentation for AI-based perception functions requires a complete methodology to systematically organize the complex interplay between specifications, data and training of AI functions, safety measures and metrics, risk analysis, safety goals, and safety requirements. The paper presents the overall approach of the German research project “KI-Absicherung” for developing a stringent safety argumentation for AI-based perception functions. It is a risk-based approach in which an assurance case is constructed via an evidence-based safety argumentation.
Deep neural networks (DNNs) are vulnerable to adversarial examples and other data perturbations. Especially in safety-critical applications of DNNs, it is therefore crucial to detect misclassified samples. Current state-of-the-art detection methods require either significantly more runtime or more parameters than the original network itself. This paper therefore proposes GraN, a time- and parameter-efficient method that is easily adaptable to any DNN. GraN is based on the layer-wise norm of the DNN's gradient with respect to the loss of the current input-output combination, which can be computed via backpropagation. GraN achieves state-of-the-art performance on numerous problem setups.
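The gradient-norm idea behind GraN can be illustrated on a toy single-layer model (a hypothetical simplification for illustration, not the GraN implementation itself): for logistic regression with cross-entropy loss, the weight gradient is (p - y)·x, so its norm has a closed form, and using the predicted label as pseudo-label makes inputs near the decision boundary score high:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def gradient_norm_score(x, w) -> float:
    """L2 norm of the cross-entropy gradient w.r.t. the weights of a
    logistic model, |p - y| * ||x||, with the predicted label used as
    pseudo-label y (mirroring how GraN scores the network's own
    input-output combination)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    pseudo_label = 1.0 if p >= 0.5 else 0.0
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    return abs(p - pseudo_label) * x_norm

# A confidently classified input scores low; an input near the
# decision boundary scores high and would be flagged for inspection.
print(gradient_norm_score([3.0, 0.0], [2.0, 0.0]))  # confident -> small
print(gradient_norm_score([0.1, 0.1], [2.0, 0.0]))  # uncertain -> larger
```

In a deep network the same quantity is computed per layer via backpropagation rather than in closed form.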
Neural networks (NNs) have become a key technology for solving highly complex tasks and require integration into future safety argumentations. New safety-relevant aspects introduced by NN-based algorithms are: representativity of test cases, robustness, inner representation and logic, and failure detection for NNs. In this paper, a general argumentation structure for safety cases respecting these four aspects is proposed, together with possible sources of evidence.
Methods for safety assurance suggested by the ISO 26262 automotive functional safety standard are not sufficient for applications based on machine learning (ML). We provide a structured, certification-oriented overview of available methods supporting the safety argumentation of an ML-based system. It is sorted into life-cycle phases, and the maturity of each approach as well as its applicability to different ML types are recorded. From this we deduce the current open challenges: powerful solvers, inclusion of expert knowledge, validation of data representativity and model diversity, and model introspection with provable guarantees.
We consider the problem of detecting out-of-distribution examples in neural networks. We propose ODIN, a simple and effective out-of-distribution detector for neural networks that does not require any change to a pre-trained model. Our method is based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions of in- and out-of-distribution samples, allowing for more effective detection. We show in a series of experiments that our approach is compatible with diverse network architectures and datasets. It consistently outperforms the baseline approach [1] by a large margin, establishing a new state-of-the-art performance on this task. For example, ODIN reduces the false positive rate from the baseline 34.7% to 4.3% on the DenseNet (applied to CIFAR-10) when the true positive rate is 95%. We theoretically analyze the method and prove that performance improvement is guaranteed under mild conditions on the image distributions.
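The temperature-scaled softmax score at the core of ODIN can be sketched as follows (the gradient-based input pre-processing of the full method is omitted, and the logits used here are illustrative):

```python
import math

def max_softmax_score(logits, temperature=1000.0):
    """Maximum softmax probability after temperature scaling; ODIN
    flags an input as out-of-distribution when this score falls below
    a threshold chosen on validation data (the default T reflects the
    large temperatures used in the paper)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    return max(exps) / sum(exps)

# Peaked in-distribution logits score near 1.0; flat logits, as often
# produced by out-of-distribution inputs, score near 1/num_classes.
print(max_softmax_score([12.0, 1.0, 0.5], temperature=1.0))
print(max_softmax_score([2.1, 2.0, 1.9], temperature=1.0))
```

Temperature scaling and the input perturbation both serve to widen the gap between the two score distributions before thresholding.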
The introduction of automated vehicles without permanent human supervision demands a functional system description, including functional system boundaries, and a comprehensive safety analysis. These inputs to the technical development can be identified and analyzed by a scenario-based approach. Furthermore, to establish an economical test and release process, a large number of scenarios must be identified to obtain meaningful test results. Experts are good at identifying scenarios that are difficult to handle or unlikely to happen; however, they are unlikely to identify all possible scenarios based on the knowledge they have at hand. Expert knowledge modeled for computer-aided processing may help to provide a wide range of scenarios. This contribution reviews ontologies as knowledge-based systems in the field of automated vehicles and proposes the generation of traffic scenes in natural language as a basis for scenario creation.
There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations. On the other hand, epistemic uncertainty accounts for uncertainty in the model -- uncertainty which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty together with epistemic uncertainty. We study models under the framework with per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks.
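The learned loss attenuation mentioned above has, for a single regression target, a simple per-sample form (a sketch assuming the network predicts the log-variance log σ² alongside its output; the function name is illustrative):

```python
import math

def attenuated_regression_loss(y_true, y_pred, log_var):
    """Regression loss with learned aleatoric uncertainty: the squared
    residual is weighted by the predicted precision exp(-log_var), and
    the 0.5 * log_var term penalises simply predicting large noise
    everywhere. Having the network output log(sigma^2) keeps the loss
    numerically stable, since the variance is never inverted directly."""
    precision = math.exp(-log_var)
    return 0.5 * precision * (y_true - y_pred) ** 2 + 0.5 * log_var
```

For a residual of 1, the loss is minimised at log_var = 0 (σ = 1); larger residuals move the optimum toward larger predicted variance, which is exactly the attenuation effect that makes the loss robust to noisy data.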
Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods.
Despite achieving the highest classification accuracy across a wide variety of application areas, artificial neural networks have one disadvantage: the way such a network arrives at a decision is not easily comprehensible. This lack of explanatory ability reduces the acceptability of neural networks in data mining and decision systems, and is the reason why researchers have proposed many rule extraction algorithms. Recently, deep neural networks (DNNs) have been achieving profound results over standard neural networks for classification and recognition problems; this is an active machine learning area that has proven both useful and innovative. This paper thoroughly reviews various rule extraction algorithms according to the classification scheme of decompositional, pedagogical, and eclectic approaches. It also evaluates these algorithms based on the neural network structure with which each algorithm is intended to work. The main contribution of this review is to show that there are only a limited number of studies on rule extraction from DNNs.
Deep learning methods are widely regarded as indispensable when it comes to designing perception pipelines for autonomous agents such as robots, drones, or automated vehicles. The main reason, however, that deep learning is not yet used for autonomous agents at large scale is safety concerns. Deep learning approaches typically exhibit black-box behavior, which makes it hard to evaluate them with respect to safety-critical aspects. While there has been some work on safety in deep learning, most papers focus on high-level safety concerns. In this work, we examine the safety concerns of deep learning methods on a deeply technical level. Additionally, we present extensive discussions of possible mitigation methods and give an outlook on which mitigation methods are still missing in order to facilitate an argumentation for the safety of a deep learning method.
Deep learning (DL) represents the current state of the art in image recognition and is successfully applied in many fields, e.g., autonomous driving, face recognition, and speech recognition [1]. Deep learning is a method of artificial intelligence (AI) based on deep artificial neural networks. Instead of a human expert laboriously specifying hard-coded rules, a deep learning image recognition system learns statistical patterns from example data. Deep learning makes many image recognition tasks solvable on a computer that previously could not be handled satisfactorily with classical image processing methods. One of the most frequent applications of deep learning is the recognition and analysis of images. This includes, e.g., the detection of objects (car, pedestrian), the segmentation of image regions (normal tissue, tumour tissue) [2], and the generation of image captions (“a large white bird stands in the forest”) [3]. The high recognition accuracy of deep learning image recognition systems offers promising opportunities for automating tedious, monotonous tasks and for opening up new application areas in image recognition.
Due to their ability to efficiently process unstructured and highly dimensional input data, machine learning algorithms are being applied to perception tasks for highly automated driving functions. The consequences of failures and insufficiencies in such algorithms are severe and a convincing assurance case that the algorithms meet certain safety requirements is therefore required. However, the task of demonstrating the performance of such algorithms is non-trivial, and as yet, no consensus has formed regarding an appropriate set of verification measures. This paper provides a framework for reasoning about the contribution of performance evidence to the assurance case for machine learning in an automated driving context and applies the evaluation criteria to a pedestrian recognition case study.
Concolic testing combines program execution and symbolic analysis to explore the execution paths of a software program. In this paper, we develop the first concolic testing approach for Deep Neural Networks (DNNs). More specifically, we utilise quantified linear arithmetic over rationals to express test requirements that have been studied in the literature, and then develop a coherent method to perform concolic testing with the aim of better coverage. Our experimental results show the effectiveness of the concolic testing approach in both achieving high coverage and finding adversarial examples.
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
Machine learning (ML) plays an ever-increasing role in advanced automotive functionality for driver assistance and autonomous operation; however, its adequacy from the perspective of safety certification remains controversial. In this paper, we analyze the impacts that the use of ML as an implementation approach has on the ISO 26262 safety lifecycle and ask what could be done to address them. We then provide a set of recommendations on how to adapt the standard to accommodate ML.
Cluzeau, J.M., Henriquel, X., Rebender, G., et al.: Concepts of design assurance for neural networks. Technical report, European Union Aviation Safety Agency (EASA) (2020)

Deutsches Institut für Normung e.V.: DIN SPEC 13266:2020-04: Guideline for the development of deep learning image recognition systems. Beuth Verlag, April 2020

Henne, M., Schwaiger, A., Roscher, K., Weiss, G.: Benchmarking uncertainty estimation methods for deep learning with safety-related metrics. In: Proceedings of the Workshop on Artificial Intelligence Safety, vol. 2560, pp. 83–90 (2020)

Voget, S., Rudolph, A., Mottok, J.: A consistent safety case argumentation for artificial intelligence in safety related automotive systems. In: Proceedings of the 9th European Congress Embedded Real Time Systems (2018)

Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: Proceedings of the 7th International Conference on Learning Representations (2019)

Sämann, T., Schlicht, P., Hüger, F.: Strategy to increase the safety of a DNN-based perception for HAD systems. CoRR abs/2002.08935 (2020)