Conference Paper · PDF available

Collaborative Monocular SLAM with Crowd-Sourced Data

... This centralized architecture builds on the most recent developments for sharing anchors between users and platforms. While SLAM algorithms from the research community have already been optimized for collaborative SLAM [29], commercial implementations are now also adding support for sharing environment data between multiple clients (Google Cloud Anchors [30], Apple Shared Experiences [31]). These new APIs will be a key enabler for future improvements to our system. ...
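To make the host/resolve pattern behind such anchor-sharing APIs concrete, here is a minimal sketch of a centralized anchor registry. All names and the message shape are hypothetical; commercial APIs such as Cloud Anchors additionally upload visual feature data and resolve anchors by relocalization, which is elided here.

```python
# Minimal sketch of a centralized anchor registry, assuming a host/resolve
# pattern similar to commercial cloud-anchor APIs. All names are hypothetical.
import uuid

class AnchorRegistry:
    """Server-side store mapping anchor IDs to serialized environment data."""
    def __init__(self):
        self._anchors = {}

    def host(self, feature_blob: bytes, pose: tuple) -> str:
        """A client uploads local feature data plus the anchor pose; the
        registry returns an ID that other clients can resolve."""
        anchor_id = str(uuid.uuid4())
        self._anchors[anchor_id] = {"features": feature_blob, "pose": pose}
        return anchor_id

    def resolve(self, anchor_id: str):
        """Another client fetches the shared anchor to localize against it."""
        return self._anchors.get(anchor_id)

registry = AnchorRegistry()
aid = registry.host(b"\x01\x02", pose=(0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
print(registry.resolve(aid)["pose"])
```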
Conference Paper
Full-text available
For Industry 4.0 – the Internet of Things (IoT) applied in an industrial setting – new methodologies for the support and collaboration of employees are needed. One of these methodologies combines existing work practices with support through technologies such as Augmented Reality (AR). Therefore, usability concepts for appropriate hardware as well as the data transfer need to be analyzed and designed within applicable industry standards. In this paper, we present two different use cases ("Real-Time Machine Data Overlay" and "Web-Based AR Remote Support") in the context of collaboration and support of employees. Both use cases focus on three main requirements: 1) effective data transmission; 2) devices certified for industrial environments; and 3) usability targeted towards industrial users. Additionally, we present an architecture recommendation for combining both use cases, as well as a discussion of the benefits and limitations of our approaches, leading to future directions.
... The content of this chapter is largely based on [69]. To better estimate these parameters of a camera-IMU system commonly found on mobile devices, this chapter presents an online calibration method developed from the multi-state constraint Kalman filter (MSCKF) algorithm [70]; online roughly meaning that calibrating parameters and estimating the motion of the system are done simultaneously. ...
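The core idea of such online calibration, estimating calibration parameters jointly with motion in one filter, can be illustrated with a toy Kalman filter whose state holds both a position and a constant sensor offset. This is a deliberately simplified stand-in for the MSCKF, which augments the state with full camera-IMU extrinsics and time offsets; all numbers here are illustrative.

```python
# Toy sketch of "online" calibration: one Kalman filter estimates the
# platform position and a constant sensor offset simultaneously.
import numpy as np

x = np.array([0.0, 0.0])                   # state: [position, sensor_offset]
P = np.diag([1.0, 1.0])                    # state covariance
Q = np.diag([0.01, 1e-8])                  # offset is (nearly) constant
H = np.array([[1.0, 0.0],                  # unbiased position fix
              [1.0, 1.0]])                 # biased sensor: position + offset
R = np.diag([0.04, 0.04])                  # measurement noise (std 0.2)

rng = np.random.default_rng(0)
true_pos, true_offset = 0.0, 0.35
for _ in range(200):
    u = 0.1                                # known motion increment
    true_pos += u
    x[0] += u                              # propagate position
    P += Q                                 # F = I, so P <- P + Q
    z = np.array([true_pos + rng.normal(0, 0.2),
                  true_pos + true_offset + rng.normal(0, 0.2)])
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P

print(f"estimated offset: {x[1]:.3f} (true value {true_offset})")
```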
Conference Paper
Full-text available
For large-scale and long-term simultaneous localization and mapping (SLAM), a robot has to deal with unknown initial positioning caused by either the kidnapped-robot problem or multi-session mapping. This paper addresses these problems by tying the SLAM system to a global loop closure detection approach, which intrinsically handles these situations. However, online processing for global loop closure detection approaches is generally influenced by the size of the environment. The proposed graph-based SLAM system uses a memory management approach that considers only portions of the map to satisfy online processing requirements. The approach is tested and demonstrated using five indoor mapping sessions of a building with a robot equipped with a laser rangefinder and a Kinect.
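A minimal sketch of the working-memory/long-term-memory split behind such memory management is shown below. The names and the eviction policy are simplified for illustration; the actual system weighs nodes by how often they are re-observed rather than by pure recency.

```python
# Minimal sketch of graph memory management in the spirit of a
# working/long-term memory split (policy simplified to LRU eviction).
from collections import OrderedDict

class MapMemory:
    def __init__(self, wm_capacity: int):
        self.wm = OrderedDict()   # working memory: id -> node data (LRU order)
        self.ltm = {}             # long-term memory (e.g., database on disk)
        self.capacity = wm_capacity

    def add(self, node_id, data):
        self.wm[node_id] = data
        self.wm.move_to_end(node_id)
        while len(self.wm) > self.capacity:      # transfer oldest node to LTM
            old_id, old_data = self.wm.popitem(last=False)
            self.ltm[old_id] = old_data

    def retrieve(self, node_id):
        """On a loop-closure hypothesis, bring a node back into WM."""
        if node_id in self.ltm:
            self.add(node_id, self.ltm.pop(node_id))
        return self.wm.get(node_id)

mem = MapMemory(wm_capacity=3)
for i in range(5):
    mem.add(i, f"keyframe-{i}")
print(sorted(mem.wm), sorted(mem.ltm))   # WM holds the 3 most recent nodes
```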
Article
Full-text available
This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.
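The "survival of the fittest" strategy can be illustrated with a small sketch of a redundancy-based keyframe-culling rule modeled on the one ORB-SLAM describes: a keyframe is discarded when most of its map points are also observed by enough other keyframes. The exact thresholds below (90%, three other views) follow the paper's description, but the data layout is invented for illustration.

```python
# Sketch of a redundancy-based keyframe-culling rule: discard a keyframe if
# at least 90% of its map points are seen by three or more other keyframes.
def redundant(keyframe_points, observations, kf_id,
              min_other_views=3, ratio=0.9):
    """observations: dict point_id -> set of keyframe ids observing it."""
    pts = keyframe_points[kf_id]
    if not pts:
        return True
    covered = sum(
        1 for p in pts
        if len(observations[p] - {kf_id}) >= min_other_views
    )
    return covered / len(pts) >= ratio

# Toy map: keyframe 2's points are all seen by keyframes 0, 1 and 3 as well.
keyframe_points = {2: [10, 11, 12]}
observations = {10: {0, 1, 2, 3}, 11: {0, 1, 2, 3}, 12: {0, 1, 2, 3}}
print(redundant(keyframe_points, observations, 2))  # True -> cull keyframe 2
```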
Article
Full-text available
Global Positioning System (GPS) has been used as a primary source of navigation in land and airborne applications. However, challenging environments cause GPS signal blockage or degradation, and prevent reliable and seamless positioning and navigation using GPS only. Therefore, multi-sensor based navigation systems have been developed to overcome the limitations of GPS by adding some forms of augmentation. The next step towards assured robust navigation is to combine information from multiple ground-users, to further improve the chance of obtaining reliable navigation and positioning information. Collaborative (or cooperative) navigation can improve the individual navigation solution in terms of both accuracy and coverage, and may reduce the system's design cost, as equipping all users with high performance multi-sensor positioning systems is not cost effective. Generally, ‘Collaborative Navigation’ uses inter-nodal range measurements between platforms (users) to strengthen the navigation solution. In the collaborative navigation approach, the inter-nodal distance vectors from the known or more accurate positions to the unknown locations can be established. Therefore, the collaborative navigation technique has the advantage in that errors at the user's position can be compensated by other known (or more accurate) positions of other platforms, and may result in the improvement of the navigation solutions for the entire group of users. In this paper, three statistical network-based collaborative navigation algorithms, the Restricted Least-Squares Solution (RLESS), the Stochastic Constrained Least-Squares Solution (SCLESS) and the Best Linear Minimum Partial Bias Estimation (BLIMPBE) are proposed and compared to the Kalman filter. The proposed statistical collaborative navigation algorithms for network solution show better performance than the Kalman filter.
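The core mechanism, refining an unknown user position from inter-nodal range measurements to collaborators at known (or better-known) positions, can be sketched with plain Gauss-Newton least squares. This is a stand-in for the RLESS/SCLESS/BLIMPBE estimators compared in the paper; the geometry and noise values are illustrative.

```python
# Minimal sketch of network-based collaborative positioning: estimate an
# unknown position from ranges to collaborators at known positions.
import numpy as np

anchors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])  # known platforms
true_user = np.array([40.0, 55.0])
rng = np.random.default_rng(1)
ranges = np.linalg.norm(anchors - true_user, axis=1) + rng.normal(0, 0.5, 3)

x = np.array([10.0, 10.0])                 # initial guess of user position
for _ in range(10):                        # Gauss-Newton iterations
    diff = x - anchors                     # (3, 2)
    pred = np.linalg.norm(diff, axis=1)    # predicted ranges
    J = diff / pred[:, None]               # Jacobian d(range)/d(position)
    r = ranges - pred                      # residuals
    x += np.linalg.lstsq(J, r, rcond=None)[0]

print(np.round(x, 2))                      # close to the true user position
```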
Conference Paper
We present a framework for large-scale crowdsourcing of first-person viewpoint videos recorded on mobile devices. Collecting videos at a massive scale poses a number of major issues in terms of network planning. To improve scalability with regard to the number of users, videos and geographical area, and to better cope with restrictions on storage, bandwidth and processing power, the framework is distributed and based on a two-layer cloudlet architecture. To mitigate the limited bandwidth in the access network, a set of decision algorithms is constructed and evaluated that are able to filter out irrelevant videos based on their metadata and given selection criteria. To illustrate the crowdsourcing framework, we present Street View Live, an application for presenting videos based on location, similar to the popular Google Street View but with up-to-date videos covering the location instead of possibly outdated images. In order to have an up-to-date view of every location, the video collection is continuously extended and updated by crowdsourcing videos from mobile devices.
Article
This paper presents an architecture, protocol, and parallel algorithms for collaborative 3D mapping in the cloud with low-cost robots. The robots run a dense visual odometry algorithm on a smartphone-class processor. Key-frames from the visual odometry are sent to the cloud for parallel optimization and merging with maps produced by other robots. After optimization the cloud pushes the updated poses of the local key-frames back to the robots. All processes are managed by Rapyuta, a cloud robotics framework that runs in a commercial data center. This paper includes qualitative visualization of collaboratively built maps, as well as quantitative evaluation of localization accuracy, bandwidth usage, processing speeds, and map storage.
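The keyframe round trip described above (robot uploads visual-odometry keyframes, cloud merges and optimizes, updated poses are pushed back) can be sketched schematically as below. The interfaces are invented for illustration; the paper's system runs these processes under the Rapyuta cloud robotics framework, and the optimization step here is a placeholder.

```python
# Schematic sketch of the client/cloud keyframe round trip (names invented).
class CloudMapper:
    def __init__(self):
        self.keyframes = {}                 # (robot_id, kf_id) -> pose

    def submit(self, robot_id, kf_id, pose):
        """Robot uploads a visual-odometry keyframe pose to the cloud."""
        self.keyframes[(robot_id, kf_id)] = pose

    def optimize(self):
        """Placeholder for parallel map merging and pose-graph optimization;
        here the (unchanged) poses are simply returned for push-back."""
        return dict(self.keyframes)

class Robot:
    def __init__(self, robot_id, cloud):
        self.robot_id, self.cloud, self.local = robot_id, cloud, {}

    def track(self, kf_id, pose):
        self.local[kf_id] = pose
        self.cloud.submit(self.robot_id, kf_id, pose)

    def receive(self, updated):
        for (rid, kf_id), pose in updated.items():
            if rid == self.robot_id:
                self.local[kf_id] = pose    # apply cloud-optimized pose

cloud = CloudMapper()
r1 = Robot("r1", cloud)
r1.track(0, (0.0, 0.0, 0.0))
r1.receive(cloud.optimize())
```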
Article
In this document we describe a C++ framework for performing the optimization of nonlinear least squares problems that can be embedded as a graph or in a hyper-graph. A hyper-graph is an extension of a graph where an edge can connect multiple nodes, not only two. Several problems in robotics and in computer vision require finding the optimum of an error function with respect to a set of parameters; popular examples include SLAM and bundle adjustment. In the literature, many approaches have been proposed to address this class of problems. A naive implementation of standard methods, like Levenberg-Marquardt or Gauss-Newton, can lead to acceptable results for most applications when the correct parameterization is chosen. However, to achieve maximum performance, substantial effort might be required. g2o stands for General (Hyper) Graph Optimization. The purposes of this framework are the following:
• To provide an easy-to-extend and easy-to-use general library for graph optimization that can be easily applied to different problems;
• To provide people who want to understand SLAM or BA with an easy-to-read implementation that focuses on the relevant details of the problem specification;
• To achieve state-of-the-art performance while being as general as possible.
In the remainder of this document we first characterize the (hyper) graph-embeddable problems, and we give an introduction to their solution via the popular Levenberg-Marquardt or Gauss-Newton algorithms implemented in this library. Subsequently, we describe the high-level behavior of the library and its basic structures. Finally, we show how to implement 2D SLAM as a simple example. This document is not a replacement for the in-line documentation; instead, it is a digest to help the user/reader read, browse, and extend the code.
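A tiny instance of the problem class g2o addresses, least squares over a graph of poses and constraints, can be worked in a few lines of numpy. Three 1D poses with a prior, odometry edges, and a loop closure are solved with one Gauss-Newton step (exact here because the edges are linear); this is a didactic stand-in, not g2o's API.

```python
# Toy graph optimization: prior + odometry + loop-closure edges on 1D poses.
import numpy as np

x = np.zeros(3)                 # poses x0, x1, x2
measurements = [                # residual rows of the linear system A x = b
    ([1, 0, 0], 0.0),           # prior: x0 = 0
    ([-1, 1, 0], 1.0),          # odometry: x1 - x0 = 1.0
    ([0, -1, 1], 1.1),          # odometry: x2 - x1 = 1.1
    ([-1, 0, 1], 2.0),          # loop closure: x2 - x0 = 2.0
]
A = np.array([m[0] for m in measurements], dtype=float)
b = np.array([m[1] for m in measurements])

# Gauss-Newton normal equations: (A^T A) dx = A^T (b - A x)
dx = np.linalg.solve(A.T @ A, A.T @ (b - A @ x))
x += dx
print(np.round(x, 3))           # poses reconciling odometry and loop closure
```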
Article
This paper describes a system for performing real-time multi-session visual mapping in large-scale environments. Multi-session mapping considers the problem of combining the results of multiple simultaneous localisation and mapping (SLAM) missions performed repeatedly over time in the same environment. The goal is to robustly combine multiple maps in a common metrical coordinate system, with consistent estimates of uncertainty. Our work employs incremental smoothing and mapping (iSAM) as the underlying SLAM state estimator and uses an improved appearance-based method for detecting loop closures within single mapping sessions and across multiple sessions. To stitch together pose graph maps from multiple visual mapping sessions, we employ spatial separator variables, called anchor nodes, to link together multiple relative pose graphs. The system architecture consists of a separate front-end for computing visual odometry and windowed bundle adjustment on individual sessions, in conjunction with a back-end for performing the place recognition and multi-session mapping. We provide experimental results for real-time multi-session visual mapping on wheeled and handheld datasets in the MIT Stata Center. These results demonstrate key capabilities that will serve as a foundation for future work in large-scale persistent visual mapping.
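The anchor-node idea, each session keeps its poses in its own relative frame while a per-session anchor variable maps them into a common frame, reduces to pose composition. Below is a minimal SE(2) sketch with invented numbers; in the actual system the anchors are variables estimated jointly with the pose graphs from inter-session loop closures.

```python
# Sketch of anchor nodes: a per-session anchor maps session-relative poses
# into a common global frame. SE(2) poses are (x, y, theta).
import math

def compose(a, b):
    """Standard 2D pose composition: apply pose b in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + bx * math.cos(at) - by * math.sin(at),
            ay + bx * math.sin(at) + by * math.cos(at),
            at + bt)

# Two sessions mapped the same corridor; session 2's frame is shifted/rotated.
anchor_s1 = (0.0, 0.0, 0.0)          # session 1 defines the global frame
anchor_s2 = (5.0, 2.0, math.pi / 2)  # found via inter-session loop closures

local_pose_s2 = (1.0, 0.0, 0.0)            # a pose in session 2's own frame
print(compose(anchor_s2, local_pose_s2))   # its location in the global frame
```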
Conference Paper
Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti.
Article
We propose a novel method for visual place recognition using a bag of words obtained from features from accelerated segment test (FAST) + BRIEF features. For the first time, we build a vocabulary tree that discretizes a binary descriptor space and use the tree to speed up correspondences for geometrical verification. We present competitive results with no false positives on very different datasets, using exactly the same vocabulary and settings. The whole technique, including feature extraction, requires 22 ms per frame on a sequence with 26,300 images, which is one order of magnitude faster than previous approaches.
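Quantizing a binary descriptor with such a vocabulary tree amounts to descending the tree by Hamming distance until a leaf (visual word) is reached. The sketch below uses fixed toy centers over 8-bit descriptors; in practice the tree is learned by hierarchical k-medians clustering of real 256-bit BRIEF descriptors.

```python
# Minimal vocabulary-tree quantization of binary descriptors: at each level,
# descend to the child whose center is closest in Hamming distance.
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# A depth-2 tree with branching factor 2 over 8-bit toy descriptors:
# each node is (center, children); leaves carry an integer word id.
tree = (None, [
    (0b00000000, [(0b00000001, 0), (0b00001111, 1)]),
    (0b11111111, [(0b11110000, 2), (0b11111110, 3)]),
])

def quantize(descriptor: int) -> int:
    """Descend the tree, returning the leaf word id for this descriptor."""
    _, children = tree
    while True:
        best = min(children, key=lambda c: hamming(descriptor, c[0]))
        if isinstance(best[1], int):       # reached a leaf
            return best[1]
        _, children = best

print(quantize(0b00000011))   # -> 0 (nearest leaf center is 0b00000001)
```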