Fig. 12
Mapping. Qualitative comparison of mapping results (depth estimation) on several sequences using various stereo algorithms. The first column shows intensity frames from the DAVIS camera (not used, shown only for visualization). Columns 2 to 5 show inverse depth estimation results of GTS [26], SGM [45], CopNet [62] and our method, respectively. Depth maps are color-coded, from red (close) to blue (far) over a black background, in the range 0.55-6.25 m for the top four rows (sequences from [21]) and the range 1-6.25 m for the bottom two rows (sequences from [55]).
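For illustration, a color coding like the one described in the caption can be reproduced with a few lines of Python (a minimal sketch; the 'jet' colormap and the handling of invalid pixels are assumptions, since the figure does not specify them):

```python
import numpy as np
import matplotlib.cm as cm

def colorize_inverse_depth(depth_m, d_min=0.55, d_max=6.25):
    """Color-code a depth map (in meters): red = close, blue = far,
    black where no depth estimate is available (NaN or non-positive)."""
    valid = np.isfinite(depth_m) & (depth_m > 0)
    inv = np.zeros_like(depth_m, dtype=float)
    inv[valid] = 1.0 / np.clip(depth_m[valid], d_min, d_max)
    # Normalize inverse depth to [0, 1]: 1/d_max (far) -> 0, 1/d_min (close) -> 1.
    norm = (inv - 1.0 / d_max) / (1.0 / d_min - 1.0 / d_max)
    rgb = cm.jet(np.clip(norm, 0.0, 1.0))[..., :3]  # 'jet': low -> blue, high -> red
    rgb[~valid] = 0.0  # black background for pixels without estimates
    return rgb
```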


Source publication
Article
Full-text available
Event-based cameras are bioinspired vision sensors whose pixels work independently from each other and respond asynchronously to brightness changes, with microsecond resolution. Their advantages make it possible to tackle challenging scenarios in robotics, such as high-speed and high dynamic range scenes. We present a solution to the problem of vis...

Contexts in source publication

Context 1
... in time so that the evaluation does not depend on the tracking module. Due to software incompatibility, propagation was not applied to CopNet. Therefore, CopNet is called only at evaluation time; however, the density of its resulting inverse depth map is satisfactory when fed with a sufficient number of events (15 000 events [63]). b) Results: Fig. 12 compares the inverse depth maps produced by the above stereo methods. The first column shows the raw grayscale frames from the DAVIS [64], which only illustrate the appearance of the scenes because the methods do not use intensity information. The second to last columns show inverse depth maps produced by GTS, SGM, CopNet and our ...
Context 2
... Table IV quantifies the depth errors for the last two sequences of Fig. 12, which are the ones where ground truth depth is available (acquired using a LiDAR [55]). Our method outperforms the baseline methods in all criteria: mean, median and relative error (with respect to the depth ...
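The error criteria referred to here (mean, median and relative depth error) can be computed per depth map roughly as follows; a minimal Python sketch, assuming the relative error is |estimate - ground truth| / ground truth averaged over valid pixels (the exact masking protocol behind Table IV may differ):

```python
import numpy as np

def depth_errors(est, gt):
    """Mean, median and relative depth error over pixels where both maps are valid."""
    valid = np.isfinite(est) & np.isfinite(gt) & (gt > 0)
    abs_err = np.abs(est[valid] - gt[valid])
    return {
        "mean": abs_err.mean(),
        "median": np.median(abs_err),
        "relative": (abs_err / gt[valid]).mean(),  # error relative to true depth
    }
```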
Context 3
... we note an effect that appears in some reconstructions, even when computed using ground truth poses (Section VI-C). We observe that edges that are parallel to the baseline of the stereo rig, such as the upper edge of the monitor in rpg reader and the hoops on the barrel in upenn flying3 (Fig. 12), are difficult to recover regardless of the motion. All stereo methods suffer from this: although GTS, SGM and CopNet can return depth estimates for those parallel structures, they are typically unreliable; our method is able to reason about uncertainty and therefore rejects such estimates. In this respect, Fig. 17 shows two horizontal ...

Citations

... Visual-Inertial Odometry (VIO) pipelines such as Ultimate SLAM [149] or ESVIO [150] fuse event and inertial data, often using continuous-time trajectory models. Stereo event cameras have also been employed to recover depth through temporal and spatial consistency [151,152], while RGB-D setups like DEVO [153] combine event streams with depth sensors to enhance mapping fidelity. ...
Preprint
Full-text available
Neuromorphic, or event, cameras represent a transformation of the classical approach to visual sensing: they encode detected instantaneous per-pixel illumination changes into an asynchronous stream of event packets. Their novelty compared to standard cameras lies in the transition from capturing full picture frames at fixed time intervals to a sparse data format which, with its distinctive qualities, offers potential improvements in various applications. However, these advantages come at the cost of reinventing algorithmic procedures or adapting them to effectively process the new data format. In this survey, we systematically examine neuromorphic vision along three main dimensions. First, we highlight the technological evolution and distinctive hardware features of neuromorphic cameras from their inception to recent models. Second, we review image processing algorithms developed explicitly for event-based data, covering key works on feature detection, tracking, and optical flow (which form the basis for analyzing image elements and transformations), as well as depth and pose estimation and object recognition, which interpret more complex scene structures and components. These techniques, drawn from classical computer vision and modern data-driven approaches, are examined to illustrate the breadth of applications for event-based cameras. Third, we present practical application case studies demonstrating how event cameras have been successfully used across various industries and scenarios. Finally, we analyze the challenges limiting widespread adoption, identify significant research gaps compared to standard imaging techniques, and outline promising future directions and opportunities that neuromorphic vision offers.
... Some methods combine depth maps or standard cameras with event cameras to reconstruct 3D scenes, sacrificing the advantages of high temporal resolution offered by event cameras. Other approaches use stereo visual odometry (VO) (Zhou, Gallego, and Shen 2021) or SLAM to address these issues, but they can only reconstruct sparse 3D models such as point clouds. This sparsity limits their broader applicability. ...
Article
Compared to frame-based methods, computational neuromorphic imaging using event cameras offers significant advantages, such as minimal motion blur, enhanced temporal resolution, and high dynamic range. The multi-view consistency of Neural Radiance Fields (NeRF), combined with the unique benefits of event cameras, has spurred recent research into reconstructing NeRF from data captured by moving event cameras. While showing impressive performance, existing methods rely on ideal conditions, with the availability of uniform and high-quality event sequences and accurate camera poses, and mainly focus on object-level reconstruction, thus limiting their practical applications. In this work, we propose AE-NeRF to address the challenges of learning event-based NeRF under non-ideal conditions, including non-uniform event sequences, noisy poses, and various scales of scenes. Our method exploits the density of event streams and jointly learns a pose correction module with an event-based NeRF (e-NeRF) framework for robust 3D reconstruction from inaccurate camera poses. To generalize to larger scenes, we propose hierarchical event distillation with a proposal e-NeRF network and a vanilla e-NeRF network to resample and refine the reconstruction process. We further propose an event reconstruction loss and a temporal loss to improve the view consistency of the reconstructed scene. We establish a comprehensive benchmark that includes large-scale scenes to simulate practical non-ideal conditions, incorporating both synthetic and challenging real-world event datasets. The experimental results show that our method achieves a new state of the art in event-based 3D reconstruction.
... The reference SLAM algorithm is ESVO2 [39], which uses a stereo event-camera setup to generate synchronized time surfaces and applies stereo semi-global matching to construct the camera trajectory along with a sparse global map. The performance is evaluated by comparing the computed trajectories against the ground-truth trajectories. ...
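Trajectory comparisons of this kind are typically reported as the absolute trajectory error (ATE) after rigid alignment of the estimated and ground-truth positions. A minimal sketch, assuming time-associated position samples and no scale correction (the citing work's exact evaluation protocol may differ):

```python
import numpy as np

def ate_rmse(est_pos, gt_pos):
    """Absolute trajectory error (RMSE) after least-squares rigid alignment
    (Horn/Umeyama without scale). est_pos, gt_pos: (N, 3) associated positions."""
    mu_e, mu_g = est_pos.mean(0), gt_pos.mean(0)
    E, G = est_pos - mu_e, gt_pos - mu_g
    U, _, Vt = np.linalg.svd(E.T @ G)            # cross-covariance SVD
    S = np.diag([1, 1, np.sign(np.linalg.det(U @ Vt))])
    R = Vt.T @ S @ U.T                           # rotation aligning est to GT frame
    t = mu_g - R @ mu_e
    aligned = est_pos @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt_pos) ** 2, axis=1)))
```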
Preprint
Full-text available
Events offer a novel paradigm for capturing scene dynamics via asynchronous sensing, but their inherent randomness often leads to degraded signal quality. Event signal filtering is thus essential for enhancing fidelity by reducing this internal randomness and ensuring consistent outputs across diverse acquisition conditions. Unlike traditional time series that rely on fixed temporal sampling to capture steady-state behaviors, events encode transient dynamics through polarity and event intervals, making signal modeling significantly more complex. To address this, the theoretical foundation of event generation is revisited through the lens of diffusion processes. The state and process information within events is modeled as continuous probability flux at threshold boundaries of the underlying irradiance diffusion. Building on this insight, a generative, online filtering framework called Event Density Flow Filter (EDFilter) is introduced. EDFilter estimates event correlation by reconstructing the continuous probability flux from discrete events using nonparametric kernel smoothing, and then resamples filtered events from this flux. To optimize fidelity over time, spatial and temporal kernels are employed in a time-varying optimization framework. A fast recursive solver with O(1) complexity is proposed, leveraging state-space models and lookup tables for efficient likelihood computation. Furthermore, a new real-world benchmark Rotary Event Dataset (RED) is released, offering microsecond-level ground truth irradiance for full-reference event filtering evaluation. Extensive experiments validate EDFilter's performance across tasks like event filtering, super-resolution, and direct event-based blob tracking. Significant gains in downstream applications such as SLAM and video reconstruction underscore its robustness and effectiveness.
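As a rough illustration of the nonparametric kernel smoothing idea described above (not EDFilter itself), the event timestamps at a pixel can be smoothed into a continuous rate estimate with a Gaussian kernel; the kernel choice and bandwidth below are assumptions:

```python
import numpy as np

def event_rate_kde(t_events, t_query, bandwidth=1e-3):
    """Gaussian-kernel estimate of the instantaneous event rate (events/s) at a
    pixel, from its event timestamps t_events (1D array, seconds), evaluated at
    the query times t_query. bandwidth is the kernel width in seconds."""
    d = (t_query[:, None] - t_events[None, :]) / bandwidth
    k = np.exp(-0.5 * d ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return k.sum(axis=1)  # sum of kernels = smoothed event density over time
```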
... We use the above sequences with different noise rates, 1, 3, 5, 7, 10 Hz per pixel, following prior work [28]. The ECD dataset [39] is a standard dataset for various tasks including camera ego-motion estimation [15,40,46,48,59,60]. Using a DAVIS240C camera (240×180 px [7]), each sequence provides events, frames, calibration information, IMU data, and ground truth (GT) camera poses (at 200 Hz). ...
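The per-pixel noise rates quoted above can be simulated, for example, by injecting uniformly distributed noise events at a fixed rate; a hedged sketch (the actual protocol of [28] may differ):

```python
import numpy as np

def add_noise_events(width, height, duration_s, rate_hz):
    """Generate uniformly distributed noise events at rate_hz per pixel for a
    sensor of size width x height. Returns (x, y, t, polarity) arrays."""
    n = np.random.poisson(rate_hz * width * height * duration_s)
    x = np.random.randint(0, width, n)
    y = np.random.randint(0, height, n)
    t = np.sort(np.random.uniform(0.0, duration_s, n))  # x, y, p are i.i.d.,
    p = np.random.choice([-1, 1], n)                     # so only t needs sorting
    return x, y, t, p
```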
Preprint
Full-text available
Event cameras are emerging vision sensors, whose noise is challenging to characterize. Existing denoising methods for event cameras consider other tasks such as motion estimation separately (i.e., sequentially after denoising). However, motion is an intrinsic part of event data, since scene edges cannot be sensed without motion. This work proposes, to the best of our knowledge, the first method that simultaneously estimates motion in its various forms (e.g., ego-motion, optical flow) and noise. The method is flexible, as it allows replacing the 1-step motion estimation of the widely-used Contrast Maximization framework with any other motion estimator, such as deep neural networks. The experiments show that the proposed method achieves state-of-the-art results on the E-MLB denoising benchmark and competitive results on the DND21 benchmark, while showing its efficacy on motion estimation and intensity reconstruction tasks. We believe that the proposed approach contributes to strengthening the theory of event-data denoising, as well as impacting practical denoising use-cases, as we release the code upon acceptance. Project page: https://github.com/tub-rip/ESMD
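For context, the Contrast Maximization framework mentioned in this abstract scores a candidate motion by warping events to a reference time and measuring the sharpness (e.g., variance) of the resulting image of warped events. A generic single-parameter sketch, assuming a constant image-plane velocity and a DAVIS-like 180x240 sensor (not the paper's joint motion-and-noise estimator):

```python
import numpy as np

def contrast(events, velocity, shape=(180, 240)):
    """Contrast of the image of warped events for a candidate constant image-plane
    velocity (vx, vy) in px/s. events: array of (x, y, t) rows, time-ordered.
    Sharper (higher-variance) images indicate better motion compensation."""
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    t_ref = t[0]
    xw = np.round(x - velocity[0] * (t - t_ref)).astype(int)
    yw = np.round(y - velocity[1] * (t - t_ref)).astype(int)
    valid = (xw >= 0) & (xw < shape[1]) & (yw >= 0) & (yw < shape[0])
    iwe = np.zeros(shape)
    np.add.at(iwe, (yw[valid], xw[valid]), 1.0)  # accumulate warped events
    return iwe.var()
```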
... Event-based Odometry: Existing Event Odometry (EO) approaches are developed specifically for event processing. While some approaches combine events with frames [8,24,25,46,67], event-only approaches can be classified as monocular EO [35,55], monocular EO with IMU [23,26], stereo EO [20,68], and stereo EO with IMU [44,53,54]. Due to the short history of event cameras, these systems require extensive research and development efforts to work reliably in practice. ...
Preprint
Event-based keypoint detection and matching hold significant potential, enabling the integration of event sensors into highly optimized Visual SLAM systems developed for frame cameras over decades of research. Unfortunately, existing approaches struggle with the motion-dependent appearance of keypoints and the complex noise prevalent in event streams, resulting in severely limited feature matching capabilities and poor performance on downstream tasks. To mitigate this problem, we propose SuperEvent, a data-driven approach to predict stable keypoints with expressive descriptors. Due to the absence of event datasets with ground truth keypoint labels, we leverage existing frame-based keypoint detectors on readily available event-aligned and synchronized grayscale frames for self-supervision: we generate temporally sparse keypoint pseudo-labels, considering that events are a product of both scene appearance and camera motion. Combined with our novel, information-rich event representation, we enable SuperEvent to effectively learn robust keypoint detection and description in event streams. Finally, we demonstrate the usefulness of SuperEvent by its integration into a modern sparse keypoint and descriptor-based SLAM framework originally developed for traditional cameras, surpassing the state of the art in event-based SLAM by a wide margin. Source code and multimedia material are available at smartroboticslab.github.io/SuperEvent.
... EVO [2] integrated a novel event-based tracking pipeline using image-to-model alignment with an event-based 3D reconstruction approach [15] in parallel. ESVO [16] tackled the problem of purely event-based stereo odometry in a parallel tracking and mapping pipeline, which includes a novel mapping method optimized for spatio-temporal consistency across event streams and a tracking approach using 3D-2D registration. [17] utilized a geometry-based approach for event-only stereo feature detection and matching. ...
Preprint
Event cameras asynchronously output low-latency event streams, which are promising for state estimation under high-speed motion and challenging lighting conditions. In contrast to frame-based cameras, the motion-dependent appearance of event data presents persistent challenges in achieving robust event feature detection and matching. In recent years, learning-based approaches have demonstrated superior robustness over traditional handcrafted methods in feature detection and matching, particularly under aggressive motion and HDR scenarios. In this paper, we propose SuperEIO, a novel framework that leverages learning-based event-only detection and IMU measurements to achieve event-inertial odometry. Our event-only feature detection employs a convolutional neural network on continuous event streams. Moreover, our system adopts a graph neural network to achieve event descriptor matching for loop closure. The proposed system utilizes TensorRT to accelerate the inference speed of the deep networks, which ensures low-latency processing and robust real-time operation on resource-limited platforms. Besides, we evaluate our method extensively on multiple public datasets, demonstrating its superior accuracy and robustness compared to other state-of-the-art event-based methods. We have also open-sourced our pipeline to facilitate research in the field: https://github.com/arclab-hku/SuperEIO.
... In 2018, Zhou et al. [4] proposed a forward-projection-based depth estimation method that directly optimizes a temporal consistency energy across stereo time surfaces, without requiring disparity computation. Later, in 2021, Zhou et al. [6] extended this concept by integrating stereo time surfaces with a stereo visual odometry framework, optimizing a spatio-temporal consistency objective for real-time semi-dense reconstruction. In 2022, Ghosh et al. [8] proposed a robust stereo depth estimation framework that fuses multi-view event ray densities (via a Disparity Space Image, DSI), achieving high-quality depth estimation without explicit disparity matching. ...
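These works build on the time-surface representation: each pixel stores an exponentially decayed function of the time elapsed since its last event, so recently active edges stand out. A minimal sketch (the decay constant tau and the sensor size are assumptions):

```python
import numpy as np

def time_surface(events, t_now, tau=0.03, shape=(180, 240)):
    """Exponentially decayed time surface: each pixel holds exp(-(t_now - t_last)/tau),
    where t_last is the timestamp of the most recent event at that pixel.
    events: iterable of (x, y, t) rows, time-ordered; tau is in seconds."""
    t_last = np.full(shape, -np.inf)
    for x, y, t in events:
        t_last[int(y), int(x)] = t              # keep only the latest event per pixel
    ts = np.exp(-(t_now - t_last) / tau)
    ts[~np.isfinite(t_last)] = 0.0              # pixels with no events stay at 0
    return ts
```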
Preprint
Full-text available
Event cameras have gained increasing attention for 3D reconstruction due to their high temporal resolution, low latency, and high dynamic range. They capture per-pixel brightness changes asynchronously, allowing accurate reconstruction under fast motion and challenging lighting conditions. In this survey, we provide a comprehensive review of event-driven 3D reconstruction methods, including stereo, monocular, and multimodal systems. We further categorize recent developments into geometric, learning-based, and hybrid approaches. Emerging trends, such as neural radiance fields and 3D Gaussian splatting with event data, are also covered. The related works are structured chronologically to illustrate the innovations and progression within the field. To support future research, we also highlight key research gaps and future research directions in datasets, experiments, evaluation, event representation, etc.
... Their method utilizes the events within a maximum-likelihood framework to estimate the camera motion in a known environment, employing non-linear optimization to minimize the photometric error. An event-based stereo visual odometry system was first proposed by Zhou et al. [43]; it essentially employs a parallel tracking-and-mapping philosophy. The mapping module builds a semi-dense 3D scene map, and the tracking module determines the camera pose by addressing the 3D-2D registration problem. ...
... For comparison, we adopt the key techniques of the method and apply them to object pose estimation and tracking. • ESVO: ESVO [43] is the pioneering event-based stereo visual odometry method, employing a parallel tracking-and-mapping paradigm. We primarily utilize the tracking module of ESVO for spacecraft tracking. ...
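In rough terms, ESVO-style tracking projects the semi-dense map into the current view with a candidate pose and evaluates a per-pixel cost field built from recent events (e.g., a negated time surface, which is small near recently active edges), minimizing the summed cost over the pose. The sketch below is a simplification under those assumptions, not ESVO's exact formulation:

```python
import numpy as np

def registration_cost(points_3d, R, t, K, cost_field):
    """3D-2D registration cost sketch: transform map points (N, 3) with candidate
    pose (R, t), project with intrinsics K (3x3), and sum the values of a
    precomputed per-pixel cost field (e.g., a negated time surface) at the
    projected locations. Minimizing over (R, t) aligns the map with recent events."""
    p_cam = (R @ points_3d.T).T + t
    z = p_cam[:, 2]
    uv = (K @ p_cam.T).T
    u, v = uv[:, 0] / z, uv[:, 1] / z
    h, w = cost_field.shape
    valid = (z > 0) & (u >= 0) & (u < w - 1) & (v >= 0) & (v < h - 1)
    return cost_field[v[valid].astype(int), u[valid].astype(int)].sum()
```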
Preprint
Full-text available
Pose tracking of uncooperative spacecraft is an essential technology for space exploration and on-orbit servicing, which remains an open problem. Event cameras possess numerous advantages, such as high dynamic range, high temporal resolution, and low power consumption. These attributes hold the promise of overcoming challenges encountered by conventional cameras, including motion blur and extreme illumination, among others. To address the standard on-orbit observation missions, we propose a line-based pose tracking method for uncooperative spacecraft utilizing a stereo event camera. To begin with, we estimate the wireframe model of uncooperative spacecraft, leveraging the spatio-temporal consistency of stereo event streams for line-based reconstruction. Then, we develop an effective strategy to establish correspondences between events and projected lines of uncooperative spacecraft. Using these correspondences, we formulate the pose tracking as a continuous optimization process over 6-DOF motion parameters, achieved by minimizing event-line distances. Moreover, we construct a stereo event-based uncooperative spacecraft motion dataset, encompassing both simulated and real events. The proposed method is quantitatively evaluated through experiments conducted on our self-collected dataset, demonstrating an improvement in terms of effectiveness and accuracy over competing methods. The code will be open-sourced at https://github.com/Zibin6/SE6PT.
... Similarly, event-based methods for unsupervised depth estimation and egomotion estimation utilize the high temporal resolution of DVS outputs to generate real-time depth maps and motion trajectories [43,44]. Event-based SLAM frameworks [45,46,47] and visual odometry solutions [48,49] highlight the robustness of neuromorphic perception for localization and mapping under resource-constrained conditions. Techniques such as contrast maximization [50] and reward-based refinements [51] have further improved feature extraction and motion estimation, showcasing the flexibility of neuromorphic vision systems. ...
Preprint
Neuromorphic vision, inspired by biological neural systems, has recently gained significant attention for its potential in enhancing robotic autonomy. This paper presents a systematic exploration of a proposed Neuromorphic Navigation framework that uses event-based neuromorphic vision to enable efficient, real-time navigation in robotic systems. We discuss the core concepts of neuromorphic vision and navigation, highlighting their impact on improving robotic perception and decision-making. The proposed reconfigurable Neuromorphic Navigation framework adapts to the specific needs of both ground robots (Turtlebot) and aerial robots (Bebop2 quadrotor), addressing the task-specific design requirements (algorithms) for optimal performance across the autonomous navigation stack -- Perception, Planning, and Control. We demonstrate the versatility and the effectiveness of the framework through two case studies: a Turtlebot performing local replanning for real-time navigation and a Bebop2 quadrotor navigating through moving gates. Our work provides a scalable approach to task-specific, real-time robot autonomy leveraging neuromorphic systems, paving the way for energy-efficient autonomous navigation.
... Typically, event-based estimators require the accumulation of asynchronous event data to process [1]- [4]. Various methods have been explored for the event stream representation. ...
... Event-based visual odometry has gained significant attention due to its low latency, low power consumption, and high accuracy in fast-motion systems [1]- [4]. Unlike traditional cameras that rely on external constant-frequency triggers, event cameras capture asynchronous data, mimicking human vision. ...
... Building on these event representations, various odometry algorithms have been developed. In [1], Zhou et al. first proposed performing the mapping and tracking processes simultaneously in the classic event-based stereo visual odometry (ESVO) system. The mapping module estimates depth through stereo disparity, generating reference frames used for pose estimation in the tracking module. ...
Preprint
Full-text available
Event-based visual odometry has recently gained attention for its high accuracy and real-time performance in fast-motion systems. Unlike traditional synchronous estimators that rely on constant-frequency (zero-order) triggers, event-based visual odometry can actively accumulate information to generate temporally high-order estimation triggers. However, existing methods primarily focus on adaptive event representation after estimation triggers, neglecting the decision-making process for efficient temporal triggering itself. This oversight leads to computational redundancy and noise accumulation. In this paper, we introduce a temporally high-order event-based visual odometry with spiking event accumulation networks (THE-SEAN). To the best of our knowledge, it is the first event-based visual odometry capable of dynamically adjusting its estimation trigger decision in response to motion and environmental changes. Inspired by biological systems that regulate hormone secretion to modulate heart rate, a self-supervised spiking neural network is designed to generate estimation triggers. This spiking network extracts temporal features to produce triggers, with rewards based on block-matching points and the Fisher information matrix (FIM) trace acquired from the estimator itself. Finally, THE-SEAN is evaluated on several open datasets, demonstrating average improvements of 13% in estimation accuracy, 9% in smoothness, and 38% in triggering efficiency compared to state-of-the-art methods.