June 2024
·
15 Reads
June 2024
·
23 Reads
We introduce Situation Monitor, a novel zero-shot Out-of-Distribution (OOD) detection approach for transformer-based object detection models to enhance reliability in safety-critical machine learning applications such as autonomous driving. The Situation Monitor utilizes the Diversity-based Budding Ensemble Architecture (DBEA) and increases the OOD performance by integrating a diversity loss into the training process on top of the budding ensemble architecture, detecting Far-OOD samples and minimizing false positives on Near-OOD samples. Moreover, utilizing the resulting DBEA increases the model's OOD performance and improves the calibration of confidence scores, particularly concerning the intersection over union of the detected objects. The DBEA model achieves these advancements with a 14% reduction in trainable parameters compared to the vanilla model. This signifies a substantial improvement in efficiency without compromising the model's ability to detect OOD instances and calibrate the confidence scores accurately.
June 2024
·
21 Reads
As transformer-based object detection models progress, their impact in critical sectors like autonomous vehicles and aviation is expected to grow. Soft errors causing bit flips during inference have significantly impacted DNN performance, altering predictions. Traditional range restriction solutions for CNNs fall short for transformers. This study introduces the Global Clipper and Global Hybrid Clipper, effective mitigation strategies specifically designed for transformer-based models. They significantly enhance the models' resilience to soft errors and reduce faulty inferences to ~0%. We also detail extensive testing across more than 64 scenarios involving two transformer models (DINO-DETR and Lite-DETR) and two CNN models (YOLOv3 and SSD) using three datasets, totalling approximately 3.3 million inferences, to assess model robustness comprehensively. Moreover, the paper explores unique aspects of attention blocks in transformers and their operational differences from CNNs.
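To make the terms in this abstract concrete, the following is a minimal sketch of a single bit flip in a float32 value and of generic range restriction. It is not the Global Clipper or Global Hybrid Clipper described in the paper. The function names, the bounds, and the policy of zeroing out-of-range values are assumptions chosen for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of value's float32 encoding."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the chosen bit and reinterpret the result as float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def clip_to_range(values, low, high, fill=0.0):
    """Hypothetical range restriction: replace out-of-range or NaN values with fill.

    In practice the bounds would be profiled from fault-free runs;
    here they are simply passed in by the caller.
    """
    return [v if low <= v <= high else fill for v in values]


# A flip in the highest exponent bit turns 0.5 into 2**127.
corrupted = flip_bit(0.5, 30)
activations = [0.1, corrupted, -0.3]
restricted = clip_to_range(activations, low=-1.0, high=1.0)
```

The example shows why exponent-bit flips matter: one flipped bit changes a small activation into a value near the float32 maximum, which a bound check can catch before it propagates.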
June 2023
·
57 Reads
March 2023
·
41 Reads
·
7 Citations
IEEE Transactions on Parallel and Distributed Systems
Chips pack ever more, ever smaller transistors. Fault rates increase in turn and become more concerning, particularly at the scale of High-Performance Computing (HPC) systems: on one hand, hardware fault protection is costly - more than 10% silicon area for floating-point units; on the other, HPC users expect correct application output after the anticipated time of computation, but workloads are seldom bit-reproducible and tolerances in output are allowed for. Benign hardware faults causing errors within these tolerances are therefore acceptable: however, with abstract reliability targets such as 'undetected failures per time', current HPC system design does not allow for pursuing trade-offs between reliability and performance with respect to faults. To address the above, we propose a user-centric reliability benchmark to specify HPC system reliability targets, allowing for better performance optimizations in hardware design, while meeting HPC user expectations. Our open-source Hardware Design Fault Injection Toolkit (HDFIT) enables - for the first time - end-to-end hardware design reliability experiments: from netlist-level fault injection to application output error. In a proof of concept we present an HPC general matrix multiply (GEMM) reliability study, targeting a series of popular applications, and using HDFIT to benchmark an open-source GEMM accelerator.
January 2023
·
87 Reads
·
8 Citations
IEEE Transactions on Intelligent Vehicles
Navigating safely through occlusion scenarios remains challenging for Autonomous Vehicles (AVs) due to onboard sensors with obstructed Fields of View (FoVs). Integrating Vehicle-to-Everything (V2X) communication with AVs is beneficial since it provides information beyond the onboard sensors' FoVs. To achieve safe driving behaviors in occlusion scenarios, we present a Partially Observable Markov Decision Process (POMDP) behavior planner enhanced with V2X communication. Our approach leverages the perception data from onboard sensors and V2X communications independently, eliminating the need for fusing them. The planner first employs onboard sensors to identify the occlusion areas. Then, it generates phantom road users within those areas to represent and consider the collision risk of potentially occluded real road users. Following this, we introduce a V2X communication module to provide the most promising detection result in the occluded area, taking factors like observation area coverage, communication latency, and sensor reliability into account. The detection result is subsequently applied to enhance presence and movement estimations for the phantom road users. Lastly, the detected real objects and phantom road users are integrated into the state space of a POMDP planner to provide safe driving policies. Various qualitative and quantitative evaluations demonstrate that our approach delivers safer, more efficient, and more comfortable driving policies in challenging occlusion scenarios when compared to the baseline method, which uses only onboard sensors, and the method that fuses onboard and V2X perceptions.
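The abstract says a V2X detection result is used to refine the presence estimate of a phantom road user. One simple way to picture such a refinement is a Bayesian update of a presence probability. This is an illustrative sketch only, not the paper's planner. The function name and the detection and false-alarm rates are hypothetical stand-ins for the "sensor reliability" factor mentioned above.

```python
def update_presence(prior: float, detected: bool,
                    p_detect_if_present: float,
                    p_detect_if_absent: float) -> float:
    """Posterior probability that a road user is present in an occluded area.

    prior: belief before the V2X report (0..1).
    detected: whether the V2X sensor reported an object.
    p_detect_if_present / p_detect_if_absent: assumed sensor reliability figures.
    """
    if detected:
        like_present, like_absent = p_detect_if_present, p_detect_if_absent
    else:
        like_present, like_absent = 1.0 - p_detect_if_present, 1.0 - p_detect_if_absent
    evidence = like_present * prior + like_absent * (1.0 - prior)
    if evidence == 0.0:
        # The observation is impossible under both hypotheses; keep the prior.
        return prior
    return like_present * prior / evidence


# With an uninformed prior, a reliable sensor moves the belief strongly.
belief_after_hit = update_presence(0.5, True, 0.9, 0.1)
belief_after_miss = update_presence(0.5, False, 0.9, 0.1)
```

A planner could then treat a phantom with low posterior presence less conservatively than one that a reliable V2X sensor has confirmed.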
October 2022
·
32 Reads
·
12 Citations
October 2022
·
32 Reads
·
13 Citations
September 2022
·
91 Reads
Object detection neural network models need to perform reliably in highly dynamic and safety-critical environments like automated driving or robotics. Therefore, it is paramount to verify the robustness of the detection under unexpected hardware faults like soft errors that can impact a system's perception module. Standard metrics based on average precision produce model vulnerability estimates at the object level rather than at an image level. As we show in this paper, this does not provide an intuitive or representative indicator of the safety-related impact of silent data corruption caused by bit flips in the underlying memory but can lead to an over- or underestimation of typical fault-induced hazards. With an eye towards safety-related real-time applications, we propose a new metric IVMOD (Image-wise Vulnerability Metric for Object Detection) to quantify vulnerability based on an incorrect image-wise object detection due to false positive (FPs) or false negative (FNs) objects, combined with a severity analysis. The evaluation of several representative object detection models shows that even a single bit flip can lead to a severe silent data corruption event with potentially critical safety implications, with e.g., far more than 100 FPs generated, or up to approx. 90% of true positives (TPs) lost in an image. Furthermore, with a single stuck-at-1 fault, an entire sequence of images can be affected, causing temporally persistent ghost detections that can be mistaken for actual objects (covering up to approx. 83% of the image). Furthermore, actual objects in the scene are continuously missed (up to approx. 64% of TPs are lost). Our work establishes a detailed understanding of the safety-related vulnerability of such critical workloads against hardware faults.
August 2022
·
28 Reads
·
6 Citations
Lecture Notes in Computer Science
Object detection neural network models need to perform reliably in highly dynamic and safety-critical environments like automated driving or robotics. Therefore, it is paramount to verify the robustness of the detection under unexpected hardware faults like soft errors that can impact a system’s perception module. Standard metrics based on average precision produce model vulnerability estimates at the object level rather than at an image level. As we show in this paper, this does not provide an intuitive or representative indicator of the safety-related impact of silent data corruption caused by bit flips in the underlying memory but can lead to an over- or underestimation of typical fault-induced hazards. With an eye towards safety-related real-time applications, we propose a new metric IVMOD (Image-wise Vulnerability Metric for Object Detection) to quantify vulnerability based on an incorrect image-wise object detection due to false positive (FPs) or false negative (FNs) objects, combined with a severity analysis. The evaluation of several representative object detection models shows that even a single bit flip can lead to a severe silent data corruption event with potentially critical safety implications, with e.g., up to 100 FPs generated, or up to 90% of true positives (TPs) lost in an image. Furthermore, with a single stuck-at-1 fault, an entire sequence of images can be affected, causing temporally persistent ghost detections that can be mistaken for actual objects (covering up to 83% of the image). Furthermore, actual objects in the scene are continuously missed (up to 64% of TPs are lost). Our work establishes a detailed understanding of the safety-related vulnerability of such critical workloads against hardware faults.
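To illustrate what counting at the image level rather than the object level can mean, here is a rough sketch that flags an image as corrupted when a fault adds false positives or false negatives relative to the fault-free run. It is not the IVMOD definition from the paper, which also includes a severity analysis. The function name and the comparison rule are assumptions.

```python
from typing import List, Tuple

# (false positives, false negatives) counted for one image.
Counts = Tuple[int, int]


def image_wise_corruption_rate(clean: List[Counts], faulty: List[Counts]) -> float:
    """Fraction of images whose FP or FN count grows under an injected fault.

    clean[i] and faulty[i] must describe the same image, evaluated
    without and with the fault respectively.
    """
    if len(clean) != len(faulty):
        raise ValueError("both runs must cover the same images")
    if not clean:
        return 0.0
    corrupted = sum(
        1 for (c_fp, c_fn), (f_fp, f_fn) in zip(clean, faulty)
        if f_fp > c_fp or f_fn > c_fn
    )
    return corrupted / len(clean)


# Four images: the second gains false positives, the fourth misses an object.
clean_run = [(0, 0), (1, 0), (0, 2), (0, 0)]
faulty_run = [(0, 0), (5, 0), (0, 2), (0, 1)]
rate = image_wise_corruption_rate(clean_run, faulty_run)
```

Each image counts once regardless of how many objects are affected, so a single image with many ghost detections does not dominate the estimate the way it can in an object-level average.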
... For example, Ref. [9] demonstrates the performance of collaborative perception algorithms in simulated environments but rarely applies them to real-world driving tasks. Ref. [32] assumes accurate agent position data, which is often impractical in real-world scenarios. Our approach bridges this gap by integrating theoretical strategies with higher-fidelity implementations, which utilize perception data directly from emulated raw sensor inputs for more realistic analysis. ...
January 2023
IEEE Transactions on Intelligent Vehicles
... So far, most of the efforts to evaluate the reliability of GEMM accelerators have concentrated on assessing the effect of transient faults on different SA topologies [25][26][27]. The authors of [28] analyzed the impact of soft errors on machine learning accelerators (e.g., NVDLA). ...
March 2023
IEEE Transactions on Parallel and Distributed Systems
... Zhang et al. proposed a safety reinforcement learning method for autonomous vehicles based on Barrier Lyapunov Function (BLF), which reasonably organized and incorporated BLF items into the optimized inverse control method, and constrained the state variables within the designed safety region during the learning process [25]. Zhang [26]. Cao et al. proposed a decision-making framework called Trustworthy Improvement Reinforcement Learning (TiRL), which combines reinforcement learning and rule-based algorithms to allow self-improvement while maintaining better system performance [27]. ...
October 2022
... One class of solutions employs Partially Observable Markov Decision Processes (POMDPs) to model the uncertainties associated with occluded areas. These methods reason about hidden traffic participants based on their potential trajectories and interactions [7], [8]. While POMDP-based approaches provide robust theoretical frameworks, their practical application often involves high computational complexity, limiting real-time feasibility. ...
October 2022
... In this study, our primary focus is on faults manifesting in the model weights. As highlighted in [35], it has been established that the neurons within a convolutional neural network (CNN) exhibit a resilience 50 times greater than that of their associated weights, a conclusion verified through the Ares framework [36]. It is worth noting that addressing faults in neurons will be a subject of future investigations. ...
August 2022
Lecture Notes in Computer Science
... Additionally, previous works [1], [8], [25] formulate this problem as a Partially Observable Markov Decision Process (POMDP) with carefully designed observation models for occlusion in the belief state. Active perception behavior has been modeled in recent game theory work. ...
June 2022
... Over the past ten years, much research effort has gone into optimizing the placement of visual sensors in different applications, including enlarging the coverage of indoor surveillance cameras [10,11], widening the FOV of sensors mounted on autonomous vehicles [12][13][14][15][16], and improving the sensing capability of roadside monitoring LiDAR [17][18][19][20][21][22][23][24][25]. Because of the very different problem context, we mainly focus on works that are aimed at improving the perception of traffic scenes by optimally placing roadside or infrastructure sensors. ...
May 2022
... • We enable HW designers to profit from these targets and optimize their designs accordingly. By extending our previous work on AI accelerators [7], we introduce a new methodology enabling HW designers to execute a given HPC reliability benchmark while simulating faults at the netlist level (Section 2). ...
January 2022
IEEE Design and Test
... Perception and prediction are two important modules of autonomous driving to improve safety [25] and reliability [26,27]. Traditional methods [28][29][30] conduct these two tasks in a cascade manner which first estimates the object detection and tracking results and predicts the object trajectory. ...
November 2021
... It leverages a probability distribution in the prediction of the agents' future states for effective planning. For instance, works [31], [32] leverage the POMDP framework to address navigation and decision-making problems in dynamic environments with partial occlusions. [16] further incorporates a belief state updating module to predict the scenarios more precisely for effective planning. ...
September 2021