Figure 3 - uploaded by Kevin Doherty


**Figure 3.** 3D dynamic scene graphs (129). These are a recent application of scene graphs (previously common in the computer graphics community) to the SLAM problem and provide a substantial step toward linking scene understanding and spatial perception methods. Figure courtesy of A. Rosinol.

## Source publication

Simultaneous localization and mapping (SLAM) is the process of constructing a global model of an environment from local observations of it; this is a foundational capability for mobile robots, supporting such core functions as planning, navigation, and control. This article reviews recent progress in SLAM, focusing on advances in the expressive cap...

## Context in source publication

**Context 1**

... particular, 3D scene graph models (130,129) present a promising representational direction toward capturing object-level semantics, environment dynamics, and multiple spatial and semantic layers of abstraction (from the connectedness of unoccupied space, to rooms and buildings, and beyond). Scene graphs model the environment in terms of a directed graph where nodes can be entities such as objects or places and edges represent relationships between entities (depicted in Figure 3). The relationships modeled by a scene graph may be spatial or logical. ...
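The graph structure described in this passage can be sketched in a few lines. The node kinds and relation names below are purely illustrative assumptions, not the API of any particular scene-graph implementation:

```python
# Minimal sketch of a scene graph as a directed graph: nodes are entities
# (objects, places, rooms, buildings) and edges carry spatial or logical
# relations. All names here are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                      # e.g. "object", "place", "room", "building"
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src, dst, relation) triples

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, dst, relation):
        # Relations may be spatial ("inside", "on-top-of") or logical ("part-of").
        self.edges.append((src, dst, relation))

    def neighbors(self, node_id, relation=None):
        return [dst for (src, dst, rel) in self.edges
                if src == node_id and (relation is None or rel == relation)]

# Usage: a chair inside a room, the room part of a building -- three layers
# of abstraction linked by directed edges.
g = SceneGraph()
g.add_node(Node("chair1", "object"))
g.add_node(Node("room_a", "room"))
g.add_node(Node("bldg", "building"))
g.add_edge("chair1", "room_a", "inside")
g.add_edge("room_a", "bldg", "part-of")
```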

## Similar publications

It is essential to promote the intelligence and autonomy of Maritime Autonomous Surface Ships (MASSs). This study proposed an automatic collision-avoidance method based on an improved Artificial Potential Field (APF) with the formation of MASSs (F-MASSs). Firstly, the navigation environment model was constructed by the S-57 Electronic Navigation Ch...


## Citations

... In this process, reducing the uncertainty caused by sensor errors is very important. However, the neural network model itself carries its own uncertainty, so when deep learning is introduced into SLAM, the uncertainty it brings is a factor that must be dealt with [6]. Sünderhauf et al. [7] argue that the perception, decision-making, and action of robots all depend on incomplete and uncertain a priori knowledge. ...

In recent years, some researchers have combined deep learning methods such as semantic segmentation with visual SLAM to improve the performance of classical visual SLAM. However, this introduces the uncertainty of the neural network model. To address this problem, an improved feature selection method based on information entropy and feature semantic uncertainty is proposed in this paper. The former is used to obtain fewer, higher-quality feature points, while the latter is used to correct the uncertainty of the network during feature selection. In the initial stage of feature point selection, this paper first filters out feature points on absolutely dynamic objects using the a priori information provided by the feature points' semantic labels. Secondly, potentially static objects are detected using the principle of epipolar geometric constraints. Finally, the semantic uncertainty of features is corrected according to the semantic context. Experiments on the KITTI odometry dataset show that, compared with SIVO, the translation error is reduced by 12.63% and the rotation error by 22.09%, indicating that our method has better tracking performance than the baseline method.
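The entropy-based filtering step in this abstract can be illustrated with a minimal sketch. The class labels, probability vectors, and threshold below are hypothetical, and the epipolar-geometry check for potentially static objects is omitted for brevity:

```python
# Sketch of semantic feature selection: discard features on a-priori dynamic
# objects, then discard features whose semantic prediction is uncertain
# (high Shannon entropy). Labels and threshold are illustrative assumptions.
import math

def semantic_entropy(class_probs):
    """Shannon entropy (nats) of a per-feature semantic class distribution."""
    return -sum(p * math.log(p) for p in class_probs if p > 0.0)

DYNAMIC_LABELS = {"person", "car", "bicycle"}   # assumed a-priori dynamic classes

def select_features(features, entropy_threshold=0.3):
    """features: list of (semantic_label, class_probabilities) pairs."""
    kept = []
    for label, class_probs in features:
        if label in DYNAMIC_LABELS:
            continue                      # absolute dynamic object: discard
        if semantic_entropy(class_probs) > entropy_threshold:
            continue                      # uncertain network prediction: discard
        kept.append((label, class_probs))
    return kept

# Usage: the person is dropped by label, the ambiguous wall point by entropy.
features = [("person", [0.9, 0.1]), ("wall", [0.99, 0.01]), ("wall", [0.5, 0.5])]
kept = select_features(features, entropy_threshold=0.3)
```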

... We address this issue by leveraging a recent body of work in the robotics and vision communities that deals with so-called certifiably correct methods [58]. These methods use convex semidefinite relaxations of non-convex polynomial optimization problems (POPs) to either directly find a global optimum or provide a certificate of global optimality for a given solution. ...

... In this section, we review the well-known procedure for deriving convex SDP relaxations of a standard-form POP. This procedure was pioneered by Shor [59] and has become the cornerstone of certifiably correct methods in robotics and computer vision [58,10]. ...
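In outline, Shor's relaxation for a quadratically constrained problem (the standard form such procedures reduce to) proceeds as follows; this is a generic sketch, not the specific formulation of [59]:

```latex
% Shor relaxation of a quadratically constrained quadratic program (QCQP).
\begin{align*}
\text{(QCQP)}\quad
  \min_{x \in \mathbb{R}^n} \;\; & x^\top C x
  \quad \text{s.t.} \quad x^\top A_i x = b_i, \;\; i = 1,\dots,m.
\end{align*}
Introducing $X = xx^\top$ (so that $x^\top C x = \operatorname{tr}(CX)$)
and dropping the nonconvex constraint $\operatorname{rank}(X) = 1$ yields
the convex semidefinite relaxation
\begin{align*}
\text{(SDP)}\quad
  \min_{X \succeq 0} \;\; & \operatorname{tr}(CX)
  \quad \text{s.t.} \quad \operatorname{tr}(A_i X) = b_i, \;\; i = 1,\dots,m.
\end{align*}
```

If the SDP optimum happens to have rank one, it factors as $X^\star = x^\star x^{\star\top}$ and $x^\star$ is a certified global optimum of the original QCQP; otherwise the SDP value still provides a lower bound usable as an optimality certificate.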

Differentiable optimization is a powerful new paradigm capable of reconciling model-based and learning-based approaches in robotics. However, the majority of robotics optimization problems are non-convex, and current differentiable optimization techniques are therefore prone to convergence to local minima. When this occurs, the gradients provided by these existing solvers can be wildly inaccurate and will ultimately corrupt the training process. On the other hand, many non-convex robotics problems can be framed as polynomial optimization problems and, in turn, admit convex relaxations that can be used to recover a global solution via so-called certifiably correct methods. We present SDPRLayers, an approach that leverages these methods as well as state-of-the-art convex implicit differentiation techniques to provide certifiably correct gradients throughout the training process. We introduce this approach and showcase theoretical results that provide conditions under which correctness of the gradients is guaranteed. We demonstrate our approach on two simple-but-demonstrative simulated examples, which expose the potential pitfalls of existing, state-of-the-art, differentiable optimization methods. We apply our method in a real-world application: we train a deep neural network to detect image keypoints for robot localization in challenging lighting conditions. An open-source, PyTorch implementation of SDPRLayers will be made available upon paper acceptance.

... Products of orthogonal matrices appear in applications of orthogonal group synchronisation [29]. The simultaneous localization and mapping problem in robotics involves optimization over a product of Stiefel manifolds [34]. ...

We address the problem of minimizing a smooth function under smooth equality constraints. Under regularity assumptions on these constraints, we propose a notion of approximate first- and second-order critical point which relies on the geometric formalism of Riemannian optimization. Using a smooth exact penalty function known as Fletcher's augmented Lagrangian, we propose an algorithm to minimize the penalized cost function which reaches $\varepsilon$-approximate second-order critical points of the original optimization problem in at most $\mathcal{O}(\varepsilon^{-3})$ iterations. This improves on the current best theoretical bounds. Along the way, we show new properties of Fletcher's augmented Lagrangian, which may be of independent interest.

... Terrestrial laser scanning (TLS) can capture the structural elements of a forest understory, provide a three-dimensional image of the assessed space, and quantify structural elements at a high resolution (Eichhorn et al., 2017) for use in ecological applications. Mobile laser scanning (MLS) is an emerging alternative to TLS which allows virtual reconstruction of the forest understory's key structural parameters through simultaneous localization and mapping, so-called SLAM, using automatic feature recognition to render a spatially accurate point cloud of the scanned area (Rosen et al., 2021). ...

Forest understory complexity is important for many species, from large herbivores such as deer to small mammals such as mice and voles. For species that utilize the forest understory on a very small scale, it is often impractical to conduct correspondingly fine‐grained manual surveys of the understory, and thus few studies consider this small‐scale variation in understory complexity and instead work with average values on a larger scale. We explored the use of a mobile laser scanning derived understory complexity measure—understory roughness—to predict the capture probability of two representative small mammal species, the yellow‐necked mouse ( Apodemus flavicollis ) and the bank vole ( Clethrionomys glareolus ). We found a positive relationship between capture probability and understory roughness for both bank voles and yellow‐necked mice. Our results suggest that mobile laser scanning is a promising technology for measuring understory complexity in an ecologically meaningful way.

... Existing surveys on SLAM have reviewed the fundamental challenges for accurate and robust large-scale applications [15], [16], [17], from early probabilistic approaches and data association [18], [19] to the potential use of deep learning [20]. SLAM components, from sensors to embedded localization [21], have been intensively studied to provide robust solutions to many applications, such as autonomous driving [22], search-and-rescue tasks, and infrastructure inspection and 3D reconstruction in static and dynamic environments [23] under challenging conditions [24]. ...

... II. SLAM Pipeline from Sensors to 3D Reconstruction: The SLAM community has made remarkable improvements in the accuracy and robustness of large-scale applications in recent years [16], [17], [15]. Figure 1 illustrates a conventional 3D scene reconstruction pipeline using imaging sensors and inertial measurements as inputs. ...

The 3D reconstruction side of simultaneous localization and mapping (SLAM) is an important topic for transport systems such as drones, service robots, and mobile AR/VR devices. Compared to a point cloud representation, 3D reconstruction based on meshes and voxels is particularly useful for high-level functions, like obstacle avoidance or interaction with the physical environment. This article reviews the implementation of a visual 3D scene reconstruction pipeline on resource-constrained hardware platforms. Real-time performance, memory management, and low power consumption are critical for embedded systems. A conventional SLAM pipeline from sensors to 3D reconstruction is described, including the potential use of deep learning. The implementation of advanced functions with limited resources is detailed. Recent systems propose embedded implementations of 3D reconstruction methods with different granularities. The trade-off between required accuracy and resource consumption for real-time localization and reconstruction is one of the open research questions identified and discussed in this paper.

... For applications like autonomous mobile robots (AMRs), the WM is often a geometric map, which can be provided a priori or created by the robot itself using Simultaneous Localization and Mapping (SLAM) [3]. New sensor data are compared to the geometric map for localization and object tracking, e.g., using particle filtering [4]. ...
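The map-based localization loop mentioned here can be sketched as one predict/update/resample cycle of a particle filter. The 1-D world, noise parameters, and landmark map below are illustrative assumptions only:

```python
# Sketch of a single particle-filter step for localization against a known
# map. The robot measures range to the nearest landmark; the 1-D state,
# noise levels, and landmark positions are hypothetical.
import math
import random

def particle_filter_step(particles, weights, motion, measurement, world_map):
    # Predict: propagate each particle through a noisy motion model.
    particles = [p + motion + random.gauss(0.0, 0.1) for p in particles]

    # Update: reweight each particle by its measurement likelihood.
    def likelihood(p):
        expected = min(abs(lm - p) for lm in world_map)  # range to nearest landmark
        return math.exp(-0.5 * ((measurement - expected) / 0.5) ** 2)

    weights = [w * likelihood(p) for w, p in zip(weights, particles)]
    total = sum(weights) or 1e-12
    weights = [w / total for w in weights]

    # Resample proportionally to weight, then reset to uniform weights.
    particles = random.choices(particles, weights=weights, k=len(particles))
    n = len(particles)
    return particles, [1.0 / n] * n
```

Running one step with particles spread over the workspace concentrates them near the poses consistent with the range measurement.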

Robots that have to robustly execute their task in an environment containing many variations need situational awareness to adapt at run-time. This work proposes a knowledge-centered software architecture with a world model (WM) as a first-class citizen, from which other software components can query information in order to infer predictions, configure skills, and monitor the progress of the task. This approach is demonstrated on the task of detecting tomato trusses hanging from a plant, with possible occlusions from leaves. A Labeled Property Graph is used to model a tomato plant, which can be queried to create predictions of truss locations. This information is used to configure two tomato detection skills. First the plant is passively scanned for trusses. Association of the obtained information with the semantic objects in the model leads to multiple semantic hypotheses, which are explicitly modeled in the graph world model. If trusses are missing according to a hypothesis, the second skill actively looks at the inferred positions of the undetected trusses. Tests show that this approach of context-aware active perception allows the robot to decide when to look for missing trusses, which improves the detection of occluded trusses. Moreover, by keeping the task, skill, and semantic association functionalities agnostic to the context, and relying instead on answers to queries to the world model, the approach is composable and flexible. This is shown by a qualitative test on a different tomato plant.

... State estimation is an integral component of modern robotics systems. Workhorse algorithms for state estimation, such as localization and simultaneous localization and mapping (SLAM), are now capable of estimating hundreds of thousands of states on a single processor in real time [48] and are far from the computational bottleneck of robotic systems. To obtain such levels of performance, these algorithms typically rely on local optimization methods (e.g., Gauss-Newton), which often exhibit super-linear convergence. ...

... low-rank nature of its semidefinite program (SDP) relaxation. A series of extensions to this method have been and continue to be developed [48]. ...

... Some of these methods boast runtimes that even rival state-of-the-art local methods (e.g., Gauss-Newton-based methods [21]), with the added guarantee of a global certificate [10], [36]. An excellent review of the current state of certifiable methods is provided in [48]. ...

In recent years, there has been remarkable progress in the development of so-called certifiable perception methods, which leverage semidefinite, convex relaxations to find global optima of perception problems in robotics. However, many of these relaxations rely on simplifying assumptions that facilitate the problem formulation, such as an isotropic measurement noise distribution. In this paper, we explore the tightness of the semidefinite relaxations of matrix-weighted (anisotropic) state-estimation problems and reveal the limitations lurking therein: matrix-weighted factors can cause convex relaxations to lose tightness. In particular, we show that the semidefinite relaxations of localization problems with matrix weights may be tight only for low noise levels. We empirically explore the factors that contribute to this loss of tightness and demonstrate that redundant constraints can be used to regain tightness, albeit at the expense of real-time performance. As a second technical contribution of this paper, we show that the state-of-the-art relaxation of scalar-weighted SLAM cannot be used when matrix weights are considered. We provide an alternate formulation and show that its SDP relaxation is not tight (even for very low noise levels) unless specific redundant constraints are used. We demonstrate the tightness of our formulations on both simulated and real-world data.

... The literature on orthogonal synchronization is vast, appearing in multiple communities such as robotics, image processing, signal processing, and dynamical systems. We highlight a few salient references here; see also [22,23] for partial surveys. Many of the tools we use in our analysis have been used before. ...

... The second equality uses (19) and (23). Next, $\mathrm{SBD}(E\dot{Y}\dot{Y}^{\top}) = (p + r - 2)I_{rn}$ implies ...

Orthogonal group synchronization is the problem of estimating $n$ elements $Z_1, \ldots, Z_n$ from the orthogonal group $\mathrm{O}(r)$ given some relative measurements $R_{ij} \approx Z_i^{}Z_j^{-1}$. The least-squares formulation is nonconvex. To avoid its local minima, a Shor-type convex relaxation squares the dimension of the optimization problem from $O(n)$ to $O(n^2)$. Burer--Monteiro-type nonconvex relaxations have generic landscape guarantees at dimension $O(n^{3/2})$. For smaller relaxations, the problem structure matters. It has been observed in the robotics literature that nonconvex relaxations of only slightly increased dimension seem sufficient for SLAM problems. We partially explain this. This also has implications for Kuramoto oscillators. Specifically, we minimize the least-squares cost function in terms of estimators $Y_1, \ldots, Y_n$. Each $Y_i$ is relaxed to the Stiefel manifold $\mathrm{St}(r, p)$ of $r \times p$ matrices with orthonormal rows. The available measurements implicitly define a (connected) graph $G$ on $n$ vertices. In the noiseless case, we show that second-order critical points are globally optimal as soon as $p \geq r+2$ for all connected graphs $G$. (This implies that Kuramoto oscillators on $\mathrm{St}(r, p)$ synchronize for all $p \geq r + 2$.) This result is the best possible for general graphs; the previous best known result requires $2p \geq 3(r + 1)$. For $p > r + 2$, our result is robust to modest amounts of noise (depending on $p$ and $G$). When local minima remain, they still achieve minimax-optimal error rates. Our proof uses a novel randomized choice of tangent direction to prove (near-)optimality of second-order critical points. Finally, we partially extend our noiseless landscape results to the complex case (unitary group), showing that there are no spurious local minima when $2p \geq 3r$.
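As a concrete illustration of the least-squares formulation above (not the authors' code), the cost below vanishes for noiseless measurements and is invariant to a global gauge transformation; the small cycle graph and the use of $\mathrm{O}(2)$ are assumptions made for the example:

```python
# Sketch of the orthogonal synchronization least-squares cost
#   sum_{(i,j)} || R_ij - Z_i Z_j^T ||_F^2
# (for orthogonal Z_j, Z_j^{-1} = Z_j^T). Graph and group are illustrative.
import numpy as np

def sync_cost(Z, measurements):
    """Z: list of orthogonal matrices; measurements: {(i, j): R_ij}."""
    return sum(np.linalg.norm(R - Z[i] @ Z[j].T, "fro") ** 2
               for (i, j), R in measurements.items())

def rot2(theta):
    """A planar rotation, i.e. an element of SO(2) ⊂ O(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Ground-truth group elements and the noiseless relative measurements they
# induce on a 4-cycle graph.
rng = np.random.default_rng(0)
Z = [rot2(t) for t in rng.uniform(0.0, 2.0 * np.pi, size=4)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
measurements = {(i, j): Z[i] @ Z[j].T for (i, j) in edges}
```

Right-multiplying every $Z_i$ by a fixed rotation leaves every $Z_i Z_j^\top$ unchanged, which is the global gauge symmetry that makes the estimate recoverable only up to an orthogonal transformation.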

... Visual localization and mapping can be defined as the tasks of recovering the position and attitude of objects from images and building a virtual representation of the world from multiple scenes relative to a reference frame [1]. ...

... On the other hand, vision-based methods use cameras, which are highly available and low-cost, have low power consumption, can be easily mounted on other devices, and provide a large amount of data about the scene. Due to these advantages, vision-based methods have become a hot topic in recent years, especially for applications in virtual and augmented reality, robotics, and autonomous driving [1][2][3]. ...

... In vision-based localization and mapping methods, keyframes can be selected to achieve sufficient visual coverage of the environment while keeping its representation simple for computational efficiency. Moreover, by carefully selecting this subset of keyframes, we can prevent useless or wrong information from being considered during the optimization, thus avoiding degenerate configurations arising from ill-conditioned systems (Figure 2). Despite being considered an essential factor for the performance of visual localization and mapping algorithms [3], keyframe selection does not receive as much attention as other key techniques such as feature detection and matching [9], uncertainty modeling [1], and map modeling [10]. Executing pilot searches on the most popular digital libraries in the areas of computer science and robotics, we could not find any comprehensive review of this topic. ...

Visual localization and mapping algorithms attempt to estimate, from images, geometrical models that explain ego motion and the positions of objects in a real scene. The success of these tasks depends directly on the quality and availability of visual data, since the information is recovered from visual changes in images. Keyframe selection is a commonly used approach to reduce the amount of data to be processed, as well as to prevent useless or wrong information from being considered during the optimization. This study aims to identify, analyze, and summarize the methods present in the literature for keyframe selection within the context of visual localization and mapping. We adopt a systematic literature review (SLR) as the basis of our work, built on top of a well-defined methodology. To the best of our knowledge, this is the first review related to this topic. The results show that there is a lack of studies in the literature that directly address the keyframe selection problem in this application context, and a deficiency in the testing and validation of the proposed methods. In addition to these findings, we also propose an updated categorization of the proposed methods on top of the well-discussed categories present in the literature. We believe that this SLR is a step toward developing a body of knowledge in keyframe selection within the context of visual localization and mapping tasks, by encouraging the development of more theoretical and less heuristic methods and a systematic testing and validation process.
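One of the simplest keyframe selection strategies covered by such reviews is a motion threshold: accept a frame as a keyframe only when its motion relative to the last keyframe is large enough. The pose format and threshold values below are illustrative assumptions:

```python
# Sketch of greedy, motion-threshold keyframe selection. Poses are
# hypothetical (x, y, heading) tuples; thresholds are illustrative.
import math

def select_keyframes(poses, min_translation=0.5, min_rotation=0.35):
    """Return the indices of frames kept as keyframes."""
    keyframes = [0]                       # always keep the first frame
    for idx in range(1, len(poses)):
        kx, ky, kh = poses[keyframes[-1]]
        x, y, h = poses[idx]
        translation = math.hypot(x - kx, y - ky)
        # Wrap the heading difference into (-pi, pi] before thresholding.
        rotation = abs((h - kh + math.pi) % (2.0 * math.pi) - math.pi)
        if translation >= min_translation or rotation >= min_rotation:
            keyframes.append(idx)
    return keyframes

# Usage: frame 1 barely moves and is skipped; frame 2 exceeds the
# translation threshold; frame 3 exceeds the rotation threshold.
poses = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.6, 0.0, 0.0), (0.7, 0.0, 0.5)]
keyframes = select_keyframes(poses)
```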