Conference Paper

Multimodal Failure Prediction for Vision-based Manipulation Tasks with Camera Faults

Authors:
  • Proximity Robotics & Automation GmbH
Article
Despite the rapid advancement of navigation algorithms, mobile robots often produce anomalous behaviors that can lead to navigation failures. The ability to detect such anomalous behaviors is a key component in modern robots to achieve high levels of autonomy. Reactive anomaly detection methods identify anomalous task executions based on the current robot state and thus lack the ability to alert the robot before an actual failure occurs. Such an alert delay is undesirable due to the potential damage to both the robot and the surrounding objects. We propose a proactive anomaly detection network (PAAD) for robot navigation in unstructured and uncertain environments. PAAD predicts the probability of future failure based on the planned motions from the predictive controller and the current observation from the perception module. Multi-sensor signals are fused effectively to provide robust anomaly detection in the presence of sensor occlusion as seen in field environments. Our experiments on field robot data demonstrate superior failure identification performance compared with previous methods, and show that our model can capture anomalous behaviors in real time while maintaining a low false detection rate in cluttered fields. Code is available at https://github.com/tianchenji/PAAD .
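As a rough illustration of the proactive formulation described above, the sketch below fuses a perception embedding with a planned action sequence and predicts per-step failure probabilities. All layer sizes, tensor shapes, and the simple concatenation fusion are assumptions for illustration, not the published PAAD architecture.

```python
import torch
import torch.nn as nn

class ProactiveFailurePredictor(nn.Module):
    """Toy proactive detector: fuses a perception embedding with a planned
    action sequence and predicts per-step failure probabilities.
    Shapes and layer sizes are illustrative, not those of PAAD."""

    def __init__(self, obs_dim=64, action_dim=2, horizon=10, hidden=128):
        super().__init__()
        self.obs_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.act_encoder = nn.Sequential(nn.Linear(action_dim * horizon, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),  # one failure logit per future step
        )

    def forward(self, obs_feat, planned_actions):
        fused = torch.cat([self.obs_encoder(obs_feat),
                           self.act_encoder(planned_actions.flatten(1))], dim=-1)
        return torch.sigmoid(self.head(fused))  # failure probability per step

# Example: one observation feature vector and a 10-step planned trajectory.
probs = ProactiveFailurePredictor()(torch.randn(1, 64), torch.randn(1, 10, 2))
print(probs.shape)  # torch.Size([1, 10])
```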
Article
We investigate whether a robot arm can learn to pick and throw arbitrary rigid objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring objects in grasps suitable for reliable throwing, to handling varying object-centric properties (e.g., mass distribution, friction, shape) and complex aerodynamics. In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (RGB-D images of arbitrary objects in a bin) through trial and error. Within this formulation, we investigate the synergies between grasping and throwing (i.e., learning grasps that enable more accurate throws) and between simulation and deep learning (i.e., using deep networks to predict residuals on top of control parameters predicted by a physics simulator). The resulting system, TossingBot, is able to grasp and successfully throw arbitrary objects into boxes located outside its maximum reach range at 500+ mean picks per hour (600+ grasps per hour with 85% throwing accuracy); and generalizes to new objects and target locations.
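The residual-physics idea can be illustrated with a minimal sketch: an analytic projectile model supplies a first estimate of the release speed, and a learned network adds a correction for effects the model ignores. The ballistic formula here is a simplification (no drag, fixed release angle), and `residual_net` is a hypothetical stand-in for a trained predictor, not the paper's actual pipeline.

```python
import numpy as np

def ballistic_release_speed(distance, height_drop, theta, g=9.81):
    """Projectile-motion estimate of the release speed needed to land a
    `distance` away, `height_drop` below the release point (no drag)."""
    # From y = x*tan(theta) - g*x^2 / (2*v^2*cos^2(theta)) at y = -height_drop
    denom = 2 * np.cos(theta) ** 2 * (distance * np.tan(theta) + height_drop)
    return distance * np.sqrt(g / denom)

def residual_throw_speed(distance, height_drop, theta, residual_net):
    """Residual physics: a learned correction on top of the analytic
    estimate accounts for mass distribution, drag, grasp pose, etc."""
    v_physics = ballistic_release_speed(distance, height_drop, theta)
    return v_physics + residual_net(distance, v_physics)

# Hypothetical stand-in for a trained residual predictor:
dummy_net = lambda d, v: 0.05 * v
print(residual_throw_speed(1.0, 0.3, np.pi / 4, dummy_net))
```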
Article
Detecting when something unusual has happened could help assistive robots operate more safely and effectively around people. However, the variability associated with people and objects in human environments can make anomaly detection difficult. We previously introduced an algorithm that uses a hidden Markov model (HMM) with a log-likelihood detection threshold that varies based on execution progress. We now present an improved version of our previous algorithm (HMM-D) and introduce a new algorithm based on Gaussian process regression (HMM-GP). We also present a new and more thorough evaluation of 8 anomaly detection algorithms with force, sound, and kinematic signals collected from a robot closing microwave doors, latching a toolbox, scooping yogurt, and feeding yogurt to able-bodied participants. Overall, HMM-GP had the highest performance in terms of area under the curve for these real-world tasks, and multiple modalities improved performance with some anomalies being better detected with particular modalities. With synthetic anomalies, HMM-D exhibited shorter detection delays and outperformed HMM-GP with high-magnitude anomalies. In general, higher-magnitude synthetic anomalies tended to be detected more rapidly.
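A minimal sketch of the progress-varying threshold idea follows, assuming per-progress-bin log-likelihood statistics (`mu`, `sigma`) estimated from successful executions; the binning, the constant `c`, and all values are illustrative, not the published HMM-D implementation.

```python
import numpy as np

def detect_anomaly(log_likelihood, progress, mu, sigma, c=2.0):
    """Flag an anomaly when the observation log-likelihood falls below the
    expected log-likelihood at this execution progress minus c standard
    deviations. mu/sigma are per-progress-bin statistics from nominal runs."""
    i = min(int(progress * len(mu)), len(mu) - 1)
    threshold = mu[i] - c * sigma[i]
    return log_likelihood < threshold

# Toy statistics over 10 progress bins estimated from successful trials:
mu = np.linspace(-5.0, -8.0, 10)   # expected log-likelihood per bin
sigma = np.full(10, 0.8)
print(detect_anomaly(-11.2, progress=0.45, mu=mu, sigma=sigma))  # True
```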
Article
This paper gives the main definitions relating to dependability, a generic concept that includes as special cases such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic definitions are given first. They are then commented upon and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting). The aim is to explicate a set of general concepts, of relevance across a wide range of situations, thereby helping communication and cooperation among a number of scientific and technical communities, including ones that concentrate on particular types of system, of system failures, or of causes of system failures.
Article
An autonomous service robot should be able to interact with its environment safely and robustly without requiring human assistance. Unstructured environments are challenging for robots since the exact prediction of outcomes is not always possible. Even when the robot behaviors are well-designed, the unpredictable nature of the physical robot-object interaction may lead to failures in object manipulation. In this work, we focus on detecting and classifying both manipulation and post-manipulation phase failures using the same exteroception setup. We cover a diverse set of failure types for primary tabletop manipulation actions. In order to detect these failures, we propose FINO-Net [1], a deep multimodal sensor fusion-based classifier network architecture. FINO-Net accurately detects and classifies failures from raw sensory data without any additional information on task description and scene state. In this work, we use our extended FAILURE dataset [1] with 99 new multimodal manipulation recordings and annotate them with their corresponding failure types. FINO-Net achieves 0.87 failure detection and 0.80 failure classification F1 scores. Experimental results show that FINO-Net is also appropriate for real-time use.
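A toy sketch of a multimodal detect-and-classify head in the spirit of FINO-Net: per-modality encoders feed a fused representation with one output for failure detection and one for failure-type classification. The encoders, feature sizes, and two-modality setup are placeholders, not the published network.

```python
import torch
import torch.nn as nn

class MultimodalFailureClassifier(nn.Module):
    """Toy two-head fusion network: encode each modality separately,
    fuse, then both detect and classify failures. All sizes are
    placeholders, not the published FINO-Net architecture."""

    def __init__(self, vision_dim=128, audio_dim=32, n_failure_types=5, hidden=64):
        super().__init__()
        self.vision = nn.Sequential(nn.Linear(vision_dim, hidden), nn.ReLU())
        self.audio = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.detect = nn.Linear(2 * hidden, 1)                   # failure vs. success
        self.classify = nn.Linear(2 * hidden, n_failure_types)   # failure type

    def forward(self, v, a):
        z = torch.cat([self.vision(v), self.audio(a)], dim=-1)
        return torch.sigmoid(self.detect(z)), self.classify(z).softmax(dim=-1)

p_fail, type_probs = MultimodalFailureClassifier()(torch.randn(1, 128), torch.randn(1, 32))
```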
Conference Paper
Online error detection helps to reduce the risk of failure of safety-critical systems. However, due to the increasing complexity of modern Cyber-Physical Systems and the sophisticated interaction of their heterogeneous components, it becomes harder to apply traditional error detection methods. Deep Learning-based error detection is rapidly gaining popularity, and DL-based methods have achieved significant progress and better results. This paper introduces the KrakenBox, a deep learning-based error detector for industrial Cyber-Physical Systems (CPS). It provides conceptual and technical details of the KrakenBox hardware and software, along with a case study. The KrakenBox hardware is based on NVIDIA Jetson AGX Xavier, designed to support the deep learning-based application and the extended alarm module. The KrakenBox software consists of several programs capable of collecting, processing, storing, and analyzing time-series data. The KrakenBox can be connected to the networked automation system either via Ethernet or wirelessly. The paper presents the KrakenBox architecture and the results of experiments that evaluate its error detection performance for varying error magnitudes. These results demonstrate that the KrakenBox is able to improve the safety of a networked automation system.
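A deliberately simple stand-in for this kind of time-series error detector is sketched below: flag samples that deviate from a rolling-window baseline by more than a few standard deviations. The KrakenBox itself uses deep learning models; this statistical sketch only illustrates the detect-by-deviation workflow, and the window size and multiplier are illustrative.

```python
import numpy as np

def residual_error_detector(signal, window=20, c=3.0):
    """Illustrative time-series error detector (not the KrakenBox model):
    flag samples whose deviation from a rolling-window mean exceeds
    c times the rolling standard deviation."""
    flags = np.zeros(len(signal), dtype=bool)
    for t in range(window, len(signal)):
        ref = signal[t - window:t]
        flags[t] = abs(signal[t] - ref.mean()) > c * ref.std()
    return flags

# Clean sine wave with one injected error of moderate magnitude:
t = np.linspace(0, 10, 500)
x = np.sin(t) + 0.02 * np.random.randn(500)
x[300] += 1.5                                      # injected error
print(np.nonzero(residual_error_detector(x))[0])   # includes index 300
```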
Article
Anomaly detection is crucial to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of CPSs and more sophisticated attacks, conventional anomaly detection methods, which struggle with the growing volume of data and require domain-specific knowledge, cannot be directly applied to address these challenges. To this end, deep learning-based anomaly detection (DLAD) methods have been proposed. In this article, we review state-of-the-art DLAD methods in CPSs. We propose a taxonomy in terms of the type of anomalies, strategies, implementation, and evaluation metrics to understand the essential properties of current methods. Further, we utilize this taxonomy to identify and highlight new characteristics and designs in each CPS domain. We also discuss the limitations and open problems of these methods. Moreover, to give users insights into choosing proper DLAD methods in practice, we experimentally explore the characteristics of typical neural models, the workflow of DLAD methods, and the running performance of DL models. Finally, we discuss the deficiencies of DL approaches, our findings, and possible directions to improve DLAD methods and motivate future research.
Article
Consistently testing autonomous mobile robots in real world scenarios is a necessary aspect of developing autonomous navigation systems. Each time the human safety monitor disengages the robot's autonomy system due to the robot performing an undesirable maneuver, the autonomy developers gain insight into how to improve the autonomy system. However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate. We present a reinforcement learning approach for learning to navigate from disengagements, or LaND. LaND learns a neural network model that predicts which actions lead to disengagements given the current sensory observation, and then at test time plans and executes actions that avoid disengagements. Our results demonstrate LaND can successfully learn to navigate in diverse, real world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches. Videos, code, and other material are available on our website https://sites.google.com/view/sidewalk-learning .
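The test-time planning loop described above can be sketched as scoring candidate action sequences with the learned disengagement predictor and executing the safest one; `diseng_model` and the toy candidates below are hypothetical stand-ins, not the authors' planner.

```python
import numpy as np

def plan_avoiding_disengagement(obs, candidate_action_seqs, diseng_model):
    """Planning step in the spirit of LaND: score each candidate action
    sequence by its predicted probability of a human disengagement and
    execute the safest one. `diseng_model` is a hypothetical stand-in
    for the learned neural predictor."""
    scores = [diseng_model(obs, seq) for seq in candidate_action_seqs]
    return candidate_action_seqs[int(np.argmin(scores))]

# Toy model: smaller steering commands are "safer" in this fake example.
toy_model = lambda obs, seq: float(np.abs(seq).sum())
candidates = [np.array([0.4, 0.4]), np.array([0.0, 0.1]), np.array([-0.3, 0.2])]
print(plan_avoiding_disengagement(None, candidates, toy_model))  # [0.  0.1]
```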
Article
Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal. However, a purely geometric view of the world can be insufficient for many navigation problems. For example, a robot navigating based on geometry may avoid a field of tall grass because it believes it is untraversable, and will therefore fail to reach its desired goal. In this work, we investigate how to move beyond these purely geometric-based approaches using a method that learns about physical navigational affordances from experience. Our reinforcement learning approach, which we call BADGR, is an end-to-end learning-based mobile robot navigation system that can be trained with autonomously-labeled off-policy data gathered in real-world environments, without any simulation or human supervision. BADGR can navigate in real-world urban and off-road environments with geometrically distracting obstacles. It can also incorporate terrain preferences, generalize to novel environments, and continue to improve autonomously by gathering more data. Videos, code, and other supplemental material are available on our website https://sites.google.com/view/badgr .
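The autonomous-labeling step can be sketched as deriving event labels directly from onboard signals, with no human annotation; the choice of signals, the bumpiness threshold, and the label format below are illustrative assumptions, not BADGR's actual pipeline.

```python
import numpy as np

def autolabel_events(imu_z, collision_flags, bumpy_thresh=2.0):
    """Self-supervised labeling sketch in the spirit of BADGR: derive event
    labels (bumpiness, collision) from the robot's own onboard signals, so
    no human annotation is needed. The threshold is an illustrative value."""
    bumpy = np.abs(imu_z) > bumpy_thresh  # per-timestep bumpiness label
    return np.stack([bumpy, collision_flags], axis=-1).astype(np.float32)

labels = autolabel_events(np.array([0.1, 3.5, 0.2]), np.array([0, 0, 1]))
print(labels)  # [[0. 0.] [1. 0.] [0. 1.]]
```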
Article
Intelligent manipulation benefits from the capacity to flexibly control an end-effector with high degrees of freedom (DoF) and dynamically react to the environment. However, due to the challenges of collecting effective training data and learning efficiently, most grasping algorithms today are limited to top-down movements and open-loop execution. In this work, we propose a new low-cost hardware interface for collecting grasping demonstrations by people in diverse environments. This data makes it possible to train a robust end-to-end 6DoF closed-loop grasping model with reinforcement learning that transfers to real robots. A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions. By evaluating these states using a learned value function (e.g., Q-function), our method is able to better select corresponding actions that maximize total rewards (i.e., grasping success). Our final grasping system is able to achieve reliable 6DoF closed-loop grasping of novel objects across various scene configurations, as well as in dynamic scenes with moving objects.
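The "action-view" selection loop can be sketched as follows: render the state each candidate action would produce, score it with a learned value function, and execute the argmax. Here `render_future_state` and `q_function` are hypothetical stand-ins for the paper's rendering and learned Q-function.

```python
import torch

def select_grasp_action(candidate_actions, render_future_state, q_function):
    """Action-view selection sketch: render the future state each candidate
    6DoF action would lead to, score it with a learned value function, and
    pick the action with the highest predicted grasp success."""
    best_action, best_q = None, -float("inf")
    for a in candidate_actions:
        q = q_function(render_future_state(a))
        if q > best_q:
            best_action, best_q = a, q
    return best_action, best_q

# Toy stand-ins: "rendering" returns the action itself; Q prefers small motions.
acts = [torch.tensor([0.2, 0.0, 0.1]), torch.tensor([0.8, 0.5, 0.3])]
a, q = select_grasp_action(acts, lambda a: a, lambda s: -s.norm().item())
```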
Article
Just like many other topics in computer vision, image classification has achieved significant progress recently by using deep learning neural networks, especially Convolutional Neural Networks (CNNs). Most existing works have focused on classifying very clear natural images, evidenced by the widely used image databases such as Caltech-256, PASCAL VOCs and ImageNet. However, in many real applications, the acquired images may contain certain degradations that lead to various kinds of blurring, noise, and distortions. One important and interesting problem is the effect of such degradations on the performance of CNN-based image classification, and whether degradation removal helps CNN-based image classification. More specifically, we ask whether image classification performance drops with each kind of degradation, whether this drop can be avoided by including degraded images in training, and whether existing computer vision algorithms that attempt to remove such degradations can help improve the image classification performance. In this paper, we empirically study these problems for nine kinds of degraded images: hazy images, motion-blurred images, fish-eye images, underwater images, low resolution images, salt-and-peppered images, images with white Gaussian noise, Gaussian-blurred images, and out-of-focus images. We expect this work to draw more interest from the community in studying the classification of degraded images.
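Two of the degradations studied are easy to reproduce for a quick sanity check of one's own classifier. The noise levels below are illustrative, and the evaluation protocol (compare accuracy on clean vs. degraded copies of a test set) reflects the general idea rather than the paper's exact setup.

```python
import numpy as np

def salt_and_pepper(img, amount=0.05, rng=np.random.default_rng(0)):
    """Corrupt a fraction `amount` of pixels with salt (1.0) or pepper (0.0).
    One of the nine degradation types studied; values are illustrative."""
    out = img.copy()
    mask = rng.random(img.shape) < amount
    out[mask] = rng.choice([0.0, 1.0], size=mask.sum())
    return out

def gaussian_noise(img, sigma=0.1, rng=np.random.default_rng(0)):
    """Additive white Gaussian noise, clipped back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

# Typical protocol: evaluate a fixed classifier on clean vs. degraded copies
# of the test set and compare accuracies (classifier omitted here).
clean = np.random.rand(32, 32)
print(salt_and_pepper(clean).shape, gaussian_noise(clean).shape)
```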
Article
The detection of anomalous executions is valuable for reducing potential hazards in assistive manipulation. Multimodal sensory signals can be helpful for detecting a wide range of anomalies. However, the fusion of high-dimensional and heterogeneous modalities is a challenging problem. We introduce a long short-term memory based variational autoencoder (LSTM-VAE) that fuses signals and reconstructs their expected distribution. We also introduce an LSTM-VAE-based detector using a reconstruction-based anomaly score and a state-based threshold. In evaluations with 1,555 robot-assisted feeding executions, including 12 representative types of anomalies, our detector achieved a higher area under the receiver operating characteristic curve (AUC) of 0.8710 than 5 other baseline detectors from the literature. We also show that multimodal fusion through the LSTM-VAE is effective by comparing our detector operating on 17 raw sensory signals against a variant using 4 hand-engineered features.
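The detector's decision rule can be sketched as a reconstruction-based anomaly score compared against a state-dependent threshold; the squared-error score below approximates the paper's likelihood-based score, and the execution states and threshold values are illustrative.

```python
import numpy as np

def reconstruction_anomaly_score(x, x_hat):
    """Anomaly score as reconstruction error; in the paper this is derived
    from the LSTM-VAE's reconstruction log-likelihood, approximated here
    by a squared error for brevity."""
    return float(np.mean((x - x_hat) ** 2))

def is_anomalous(score, state, state_thresholds):
    """State-based threshold: compare the score against a limit associated
    with the current execution state (threshold values are illustrative)."""
    return score > state_thresholds[state]

thresholds = {"approach": 0.10, "scoop": 0.25, "transfer": 0.15}
x, x_hat = np.ones(17), np.ones(17) * 0.8          # 17 sensory channels
score = reconstruction_anomaly_score(x, x_hat)     # 0.04
print(is_anomalous(score, "scoop", thresholds))    # False
```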