Conference Paper

Semantic SLAM with Autonomous Object-Level Data Association

... YOLOv3 [24] draws on residual network structures and multi-scale detection to build a deeper network, improving mean average precision (mAP) and small-object detection. Consequently, a large number of semantic vSLAM works [43]-[48] use this detector to meet the accuracy requirements of object detection and localization in dynamic environments. Among these methods, Nicholson et al. [43] specifically design a sensor model for an object detector based on YOLOv3. ...
... where $L_{di}$ represents the measurement information of the $i$-th object in keyframe $F_d$, which usually consists of the object category $L^{c}_{di} \in C$, the detection confidence $L^{s}_{di}$, and the object detection bounding box $L^{b}_{di}$ in works [36]-[38], [40], [41], [45], [48], [63], [64]. These pieces of information can be obtained from the semantic object extraction methods [21], [31], [33] in Section II-A. ...
... However, the proposed method needs to improve its real-time performance and tracking accuracy in the face of object switches and missed detections. Unlike the hierarchical object association strategy [64], the two-step object association strategy proposed by [48] relies mainly on object category and appearance similarity to match landmarks, which is not applicable in outdoor environments. ...
Preprint
Full-text available
Visual Simultaneous Localization and Mapping (vSLAM) has achieved great progress in the computer vision and robotics communities, and has been successfully used in many fields such as autonomous robot navigation and AR/VR. However, vSLAM cannot achieve good localization in dynamic and complex environments. Numerous publications have reported that, by combining semantic information with vSLAM, semantic vSLAM systems have in recent years become capable of solving the above problems. Nevertheless, there is no comprehensive survey of semantic vSLAM. To fill this gap, this paper first reviews the development of semantic vSLAM, explicitly focusing on its strengths and differences. Secondly, we explore three main issues of semantic vSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic vSLAM. Then, we collect and analyze the state-of-the-art SLAM datasets that have been widely used in semantic vSLAM systems. Finally, we discuss future directions that will provide a blueprint for the future development of semantic vSLAM.
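The measurement tuple described in the excerpt above (category, confidence, and bounding box) maps naturally onto a small record type. A minimal Python sketch follows; the field names and conventions are illustrative assumptions, not taken from any cited system:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObjectMeasurement:
    """One detection L_di of the i-th object in keyframe F_d, mirroring
    the (category, confidence, bounding box) tuple in the excerpt above.
    Names and units are illustrative, not from any cited system."""
    category: str                        # L^c_di, drawn from a fixed label set C
    confidence: float                    # L^s_di, detector score in [0, 1]
    bbox: Tuple[int, int, int, int]      # L^b_di as (x_min, y_min, x_max, y_max) pixels
```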
... In these methods, the objects are inserted into the map with six degrees of freedom (6DoF). In this category, methods have represented the objects in the map using ellipsoids [28][29][30], cuboids [31], and other forms. The geometrical forms of representation, such as ellipsoids, have the advantage that they can provide a simple observation model (defined as the mathematical model of the observations and the parameters to be estimated). ...
... The geometrical forms of representation, such as ellipsoids, have the advantage that they can provide a simple observation model (defined as the mathematical model of the observations and the parameters to be estimated). Such models can be easily integrated into the process of FG [30] and GBA [29,31]. However, most of the current methods in the second group rely on an initially estimated position using classical solutions such as ORB SLAM 2 [32]. ...
Article
Full-text available
Object-level simultaneous localization and mapping (SLAM) has gained popularity in recent years since it can provide a means for intelligent robot-to-environment interactions. However, most of these methods assume that the distribution of the errors is Gaussian. This assumption is not valid under many circumstances. Further, these methods use a delayed initialization of the objects in the map. During this delayed period, the solution relies on the motion model provided by an inertial measurement unit (IMU). Unfortunately, the errors tend to accumulate quickly due to the dead-reckoning nature of these motion models. Finally, the current solutions depend on a set of salient features on the object’s surface and not the object’s shape. This research proposes an accurate object-level solution to the SLAM problem with a 4.1 to 13.1 cm error in the position (0.005 to 0.021 of the total path). The developed solution is based on Rao–Blackwellized Particle Filtering (RBPF) that does not assume any predefined error distribution for the parameters. Further, the solution relies on the shape and thus can be used for objects that lack texture on their surface. Finally, the developed tightly coupled IMU/camera solution is based on an undelayed initialization of the objects in the map.
... Wu Y [94] and Qian Z [95] proposed different semantic data association methods that were extended to ORB-SLAM2, which yielded satisfactory results in their respective systems. Wu Y et al. proposed an efficient and integrated data association strategy as the most significant contribution of EAO-SLAM. ...
... The IMU primarily improves the precision of the pose transformation matrix computed from the image's features. Similar to [95], image data is used as the primary data and LiDAR data as the secondary data to construct 3D shapes. Since processing raw 3D point clouds is computationally intensive, a common multi-sensor fusion approach is to filter and extract features from the point cloud data [149] and then generate 2D depth images before applying them in Visual SLAM. ...
Article
Simultaneous Localization and Mapping (SLAM) technology is essential for robots to navigate unfamiliar environments. It utilizes the sensors the robot carries to answer the question "Where am I?" Of the available sensors, cameras are commonly used. Compared to other sensors like LiDAR, the camera-based method, known as Visual SLAM, has been extensively explored by researchers due to the affordability and rich image data cameras provide. Although conventional Visual SLAM algorithms can accurately build a map in static environments, dynamic environments present a significant challenge for Visual SLAM in practical robotics scenarios. While efforts have been made to address this issue, such as adding semantic segmentation to conventional algorithms, a comprehensive literature review is still lacking. This paper discusses the challenges and approaches of Visual SLAM with a focus on dynamic objects and their impact on feature extraction and mapping accuracy. First, two classical approaches of conventional Visual SLAM are reviewed; then the paper explores the application of deep learning in the front end and back end of Visual SLAM. Next, Visual SLAM in dynamic environments is analyzed and summarized, and insights into future developments are elaborated upon. This paper provides effective inspiration for researchers on how to combine deep learning and semantic segmentation with Visual SLAM to promote its development.
... This method depends on the geometric information of point clouds and achieves good performance when enough feature points on objects are observed. However, object appearance, which can compensate for a lack of feature points, is not considered in [8]. Qian et al. [36] use a BoW (Bag of Words) [37] to describe object appearance for association. A BoW of an object is generated from feature points extracted from the bounding box. ...
... The BoW is a vector generated from low-level key points and a pre-trained visual vocabulary [37], and is widely used in place recognition and loop detection. Some methods also use it for object-level data association [36]. The ReID vector is generated from deep feature maps extracted by a CNN for re-identification, and is widely used in multiple object tracking, e.g., [31]. ...
Article
Full-text available
Traditional vSLAM methods extract feature points from images for tracking, and the data association of points is based on low-level geometric cues. When points are observed from varying viewpoints, these cues are not robust for matching. In contrast, semantic information remains consistent across changes of viewpoint and observed scale. Therefore, semantic vSLAM methods have gained more attention in recent years. In particular, object-level semantic information can be utilized to model the environment as object landmarks, and it has been fused into many vSLAM methods, which are called object-level vSLAM methods. How to associate objects over consecutive images and how to utilize object information in pose estimation are two key problems for object-level vSLAM methods. In this work, we propose an object-level vSLAM method that aims to solve object-level data association and to estimate camera poses using object semantic constraints. We present an object-level data association scheme that considers object appearance and the geometry of point landmarks, matching both objects and points for mutual improvement. We propose a semantic re-projection error function based on object-level semantic information and integrate it into the pose optimization, establishing longer-term constraints. We performed experiments on public datasets including both indoor and outdoor scenes. The evaluation results demonstrate that our method achieves high accuracy in object-level data association and outperforms the baseline method in pose estimation. An open-source version of the code is also available.
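As a rough illustration of the BoW-based appearance description discussed in the excerpts above, the sketch below builds a visual-word histogram from ORB descriptors inside a detection box and compares candidates by cosine similarity. The binarized vocabulary and the brute-force word lookup are assumptions for illustration; the cited systems typically rely on DBoW-style vocabulary trees:

```python
import numpy as np
import cv2

def bow_histogram(image, bbox, vocabulary):
    """Visual-word histogram for key points inside one detection box.
    `vocabulary` is a (K, 32) uint8 array of binarized ORB cluster
    centers from a pre-trained visual dictionary (assumed given)."""
    x0, y0, x1, y1 = bbox
    orb = cv2.ORB_create()
    _, desc = orb.detectAndCompute(image[y0:y1, x0:x1], None)
    hist = np.zeros(len(vocabulary))
    if desc is None:
        return hist
    for d in desc:
        # nearest visual word under bitwise Hamming distance
        dists = np.unpackbits(vocabulary ^ d, axis=1).sum(axis=1)
        hist[np.argmin(dists)] += 1
    return hist / max(hist.sum(), 1.0)

def appearance_similarity(h1, h2):
    """Cosine similarity between two BoW histograms."""
    denom = np.linalg.norm(h1) * np.linalg.norm(h2)
    return float(h1 @ h2 / denom) if denom > 0 else 0.0
```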
... In QuadricSLAM, they overlooked the data association problem and focused on ellipsoid initialization. In the following work, [14] used the BoW model as object representation and formulated the data association as a linear assignment problem. In CubeSLAM, the data association relied on feature point matching, and the bounding box that shared the most feature points was selected. ...
... Metric: The idea is to find the correspondences between ground-truth object IDs and object IDs in the data association algorithm. Similar to [14], we reformulate this problem as a linear assignment problem. The first partite set S consists of the object IDs in the data association algorithm. ...
Preprint
Loop closure can effectively correct the accumulated error in robot localization, which plays a critical role in the long-term navigation of the robot. Traditional appearance-based methods rely on local features and are prone to failure in ambiguous environments. On the other hand, object recognition can infer objects' category, pose, and extent. These objects can serve as stable semantic landmarks for viewpoint-independent and non-ambiguous loop closure. However, there is a critical object-level data association problem due to the lack of efficient and robust algorithms. We introduce a novel object-level data association algorithm, which incorporates IoU, instance-level embedding, and detection uncertainty, formulated as a linear assignment problem. Then, we model the objects as TSDF volumes and represent the environment as a 3D graph with semantics and topology. Next, we propose a graph matching-based loop detection based on the reconstructed 3D semantic graphs and correct the accumulated error by aligning the matched objects. Finally, we refine the object poses and camera trajectory in an object-level pose graph optimization. Experimental results show that the proposed object-level data association method significantly outperforms the commonly used nearest-neighbor method in accuracy. Our graph matching-based loop closure is more robust to environmental appearance changes than existing appearance-based methods.
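The linear-assignment formulation mentioned in the excerpts and the abstract above can be sketched with an off-the-shelf solver. The following Python sketch uses SciPy's Hungarian-style solver on a cost mixing box IoU and embedding similarity; the weights and acceptance gate are illustrative assumptions, and the detection-uncertainty term the paper also incorporates is omitted:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(iou, emb_sim, w_iou=0.5, w_emb=0.5, gate=0.3):
    """Match detections (rows) to object landmarks (columns) as a linear
    assignment problem. iou[i, j] is the IoU between detection i and the
    projection of landmark j; emb_sim[i, j] is an embedding similarity
    in [0, 1]. Weights and gate are illustrative, not from the paper."""
    cost = -(w_iou * iou + w_emb * emb_sim)        # solver minimizes cost
    rows, cols = linear_sum_assignment(cost)
    matches, new_objects = [], []
    for i, j in zip(rows, cols):
        if -cost[i, j] >= gate:                    # accept confident pairs
            matches.append((i, j))
        else:
            new_objects.append(i)                  # spawn a new landmark
    # detections the solver never paired also start new landmarks
    new_objects.extend(sorted(set(range(iou.shape[0])) - set(rows.tolist())))
    return matches, new_objects
```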
... Therefore, some works have proposed object-oriented semantic mapping approaches. In contrast, object-oriented semantic maps [41,42] contain the semantic information of object instances, and that semantic information is stored independently of the map in a clustered manner. Robots can therefore operate on and maintain the semantics of each entity in the map. ...
Article
Full-text available
With the continuous development of robotics and computer vision technology, mobile robots have been widely applied in various fields. In this process, semantic maps for robots have attracted considerable attention because they provide a comprehensive and anthropomorphic representation of the environment. On the one hand, semantic maps are a tool for robots to depict the environment, which can enhance the robot’s cognitive expression of space and build a communication bond between robots and humans. On the other hand, semantic maps contain the spatial locations and semantic properties of entities, which helps robots realize intelligent decision-making in human-centered indoor environments. In this paper, we review the primary approaches to semantic mapping proposed over the last few decades and group them according to the type of information used to extract semantics. First, we give a formal definition of the semantic map and describe techniques for semantic extraction. Then, the characteristics of different solutions are comprehensively analyzed from different perspectives. Finally, the open issues and future trends regarding semantic maps are discussed in detail. We hope this review provides a comprehensive reference for researchers and drives future research in related fields.
... Incorporating semantic information into SLAM-generated maps can be categorized into three types of methods: (i) Object detection-based: Methods implement object-level detection (e.g., [21,38]) on RGB images to output 2D bounding boxes. After further processing, they either use a parameterized way to represent the detected object, such as Quadrics [37] and the pose of a pre-modeled object [42], or further perform geometric segmentation on the depth map [9,48]. (ii) Semantic segmentation-based: Methods process semantic segmentation on 2D RGB images and build 3D geometric maps separately. ...
Preprint
Full-text available
Creating 3D semantic reconstructions of environments is fundamental to many applications, especially when related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene in the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained and high-resolution map, particularly if the objects to interact are of small size or intricate geometry. In recent practice, this leads to the entire map being in the same high-quality resolution, which results in increased computational and storage costs. To address this challenge, we propose MAP-ADAPT, a real-time method for quality-adaptive semantic 3D reconstruction using RGBD frames. MAP-ADAPT is the first adaptive semantic 3D mapping algorithm that, unlike prior work, generates directly a single map with regions of different quality based on both the semantic information and the geometric complexity of the scene. Leveraging a semantic SLAM pipeline for pose and semantic estimation, we achieve comparable or superior results to state-of-the-art methods on synthetic and real-world data, while significantly reducing storage and computation requirements.
... Object detection is also important for generating objects' position, orientation, and category information, which is useful for semantic constraints and data association [35][36][37]. Despite the growing attention given to semantic SLAM, there has been limited research on object-level data association, although Qian et al. developed a maximum weighted bipartite matching method for this purpose [38]. Meanwhile, semantic maps can effectively represent where the robot has visited and what it has seen. ...
Article
Full-text available
Traditional simultaneous localization and mapping (SLAM) systems tend to operate in small static environments, and their performance might degrade when moving objects appear in a highly dynamic environment. To address this issue, this paper proposes a dynamic-object-aware visual SLAM algorithm specifically designed for dynamic indoor environments. The proposed method leverages a semantic segmentation architecture called U-Net, which is utilized in the tracking thread to detect potentially moving targets. The output of the semantic segmentation is tightly coupled with the geometric information extracted by the corresponding SLAM system, associating the feature points captured in images with the potentially moving targets. Finally, filtering out the moving feature points greatly enhances localization accuracy in dynamic indoor environments. Quantitative and qualitative experiments were carried out on both the Technical University of Munich (TUM) public dataset and a real-scenario dataset to verify the effectiveness and robustness of the proposed method. Results demonstrate that the semantics-guided approach significantly outperforms the ORB-SLAM2 framework in dynamic indoor environments, which is crucial for improving the robustness and reliability of the SLAM system.
... Yang et al. [13] associated objects with feature points located in 2D detection boxes, and then inferred object associations through inter-frame feature matching. Similarly, [29] calculates the Bag of Words (BoW) vector of feature points as the appearance description of objects, and then performs BoW matching on the candidate objects that satisfy the reprojection relationship. Chen et al. [30] proposed a hierarchical object association strategy and applied multi-object tracking for short-term object association. ...
Article
Full-text available
Object SLAM uses additional semantic information to detect and map objects in the scene, in order to improve the system’s perception and map representation capabilities. Previous methods often use quadrics and cuboids to represent objects, especially in monocular systems. However, their simplistic shapes are insufficient for effectively representing various types of objects, leading to a limitation in the accuracy of object maps and consequently impacting downstream task performance. In this paper, we propose a novel approach for representing objects in monocular SLAM using superquadrics (SQ) with shape parameters. Our method utilizes object appearance and geometry information comprehensively, enabling accurate estimation of object poses and adaptation to various object shapes. Additionally, we propose a lightweight data association strategy to accurately associate semantic observations across multiple views with object landmarks. We implement a monocular semantic SLAM system with real-time performance and conduct comprehensive experiments on public datasets. The results show that our method is able to build accurate object maps and outperforms state-of-the-art methods on object representation.
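For reference, the canonical superquadric inside-outside function behind such shape-parameterized object models is given below; this is the textbook form, and the paper's exact parameterization may differ:

```latex
F(x,y,z) = \left[ \left(\frac{x}{a_1}\right)^{\frac{2}{\varepsilon_2}}
         + \left(\frac{y}{a_2}\right)^{\frac{2}{\varepsilon_2}} \right]^{\frac{\varepsilon_2}{\varepsilon_1}}
         + \left(\frac{z}{a_3}\right)^{\frac{2}{\varepsilon_1}}
```

Here $a_1, a_2, a_3$ are the semi-axis lengths and $\varepsilon_1, \varepsilon_2$ are shape exponents; $F = 1$ on the surface, $F < 1$ inside, and $\varepsilon_1 = \varepsilon_2 = 1$ recovers an ellipsoid, which is why superquadrics strictly generalize the quadric representations discussed above.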
... The geometrical forms of representation, such as ellipsoids, have the advantage that they can provide a simple observation model (defined as the mathematical model of the observations and the parameters to be estimated). Such models can be easily integrated into the process of FG [16] and GBA [17,18]. However, most of the current methods in the second group rely on an initially estimated position using classical solutions such as ORB SLAM [19]. ...
Preprint
Full-text available
Object-level Simultaneous Localization and Mapping (SLAM) has gained popularity in recent years since it can provide a means for intelligent robot-to-environment interactions. However, most of these methods assume that the distribution of the errors is Gaussian. This assumption is not valid under many circumstances. Further, these methods use a delayed initialization of the objects in the map. During this delayed period, the solution relies on the motion model provided by an Inertial Measurement Unit (IMU). Unfortunately, the errors tend to accumulate quickly due to the dead-reckoning nature of these motion models. Finally, the current solutions depend on a set of salient features on the object’s surface and not the object’s shape. This research proposes an accurate object-level solution to the SLAM problem with a 4.1 to 13.1 cm error in the position (0.005 to 0.021 of the total path). The developed solution is based on Rao–Blackwellized Particle Filtering (RBPF) that does not assume any predefined error distribution for the parameters. Further, the solution relies on the shape and thus can be used for objects that lack texture on their surface. Finally, the developed tightly coupled IMU/camera solution is based on an undelayed initialization of the objects in the map.
... The pioneering work SLAM++ [30] performs object-level SLAM using a depth camera. [31] develops a quadratic-programming-based semantic object initialization scheme to achieve high-accuracy object-level data association and real-time semantic mapping. [32] integrates object detection and localization modules to obtain semantic maps of the environment and improve localization. ...
Preprint
Full-text available
Indoor relocalization is vital for both robotic tasks like autonomous exploration and civil applications such as navigation with a cell phone in a shopping mall. Some previous approaches adopt geometrical information such as key-point features or local textures to carry out indoor relocalization, but they either fail easily in environments with visually similar scenes or require many database images. Inspired by the fact that humans often remember places by recognizing unique landmarks, we resort to objects, which are more informative than geometric elements. In this work, we propose a simple yet effective object-based indoor relocalization approach, dubbed AirLoc. To overcome the critical challenges of object re-identification and remembering object relationships, we extract object-wise appearance embeddings and inter-object geometric relationships. The geometry and appearance features are integrated to generate cumulative scene features. This results in a robust, accurate, and portable indoor relocalization system, which outperforms the state-of-the-art methods in room-level relocalization by 9.5% in PR-AUC and 7% in accuracy. In addition to exhaustive evaluation, we also carry out real-world tests, where AirLoc shows robustness to challenges like severe occlusion, perceptual aliasing, viewpoint shift, and deformation.
... We adapt the system introduced in [31] for semantic SLAM. At time instance $t-1$, the position estimate $m_{i,t-1} \in \mathbb{R}^{2}$ for the semantic object $o_i$ is given by an update equation (omitted in this excerpt), where $bel(\cdot)$ stands for the belief over a variable. ...
Preprint
Full-text available
This paper addresses the problem of enabling a robot to search for a semantic object in an unknown and GPS-denied environment. For the robot in the unknown environment to detect and find the target object, it must perform simultaneous localization and mapping (SLAM) at both geometric and semantic levels using its onboard sensors while planning and executing its motion based on the ever-updated SLAM results. In other words, the robot must be able to conduct simultaneous localization, semantic mapping, motion planning, and execution in real-time in the presence of sensing and motion uncertainty. This is an open problem as it combines semantic SLAM based on perception and real-time motion planning and execution under uncertainty. Moreover, the goals of robot motion change on the fly depending on whether and how the robot can detect the target object. We propose a novel approach to tackle the problem, leveraging semantic SLAM, Bayesian Networks, Markov Decision Process, and real-time dynamic planning. The results demonstrate the effectiveness and efficiency of our approach.
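The excerpt preceding this abstract references a belief $bel(\cdot)$ over the object position but elides the update itself. For orientation only, the generic Bayes-filter recursion that such semantic-object position estimates typically follow is (not the paper's exact equation):

```latex
\overline{bel}(m_{i,t}) = \int p\!\left(m_{i,t} \mid m_{i,t-1}, u_t\right) bel(m_{i,t-1})\, \mathrm{d}m_{i,t-1},
\qquad
bel(m_{i,t}) = \eta\, p\!\left(z_t \mid m_{i,t}\right) \overline{bel}(m_{i,t})
```

where $u_t$ is the motion input, $z_t$ the semantic measurement, and $\eta$ a normalizing constant.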
... However, these semantic methods cannot distinguish whether dynamic objects are currently in motion or at rest, so they can only detect potential dynamic objects. Some objects, such as people, cars, and animals, are usually in motion, so removing them does not cause adverse consequences [30]. However, in scenes that lack texture, such as parking lots, if all potential dynamic objects are treated as moving objects and removed, the system may not obtain enough reference objects, resulting in inaccurate positioning or even a loss of positioning. ...
Article
Full-text available
A static environment is a prerequisite for the stable operation of most visual SLAM systems, which limits the practical use of most existing systems. The robustness and accuracy of visual SLAM systems in dynamic environments still face many complex challenges. Relying on semantic information or geometric methods alone cannot filter out dynamic feature points well. Considering that dynamic objects easily interfere with the localization accuracy of SLAM systems, this paper proposes a new monocular SLAM algorithm for use in dynamic environments. The improved algorithm combines semantic information and geometric methods to filter out dynamic feature points. First, an adjusted Mask R-CNN removes prior highly dynamic objects. The remaining feature-point pairs are matched via the optical-flow method, and a fundamental matrix is calculated from those matched feature-point pairs. Then, the environment’s actual dynamic feature points are filtered out using the epipolar geometric constraint. The improved system can effectively filter out the feature points of dynamic targets. Finally, our experimental results on the TUM RGB-D and Bonn RGB-D Dynamic datasets showed that the proposed method could improve the pose estimation accuracy of a SLAM system in a dynamic environment, especially in the case of highly dynamic indoor scenes. Its performance was better than that of the existing ORB-SLAM2, and it ran faster than DynaSLAM, a similar dynamic visual SLAM algorithm.
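The epipolar-constraint filtering step in this abstract can be sketched with OpenCV. Assuming matched point pairs are already available, the sketch below estimates a fundamental matrix with RANSAC and flags points whose distance to their epipolar line exceeds a pixel threshold; the threshold is an illustrative choice, not a value from the paper:

```python
import numpy as np
import cv2

def static_point_mask(pts_prev, pts_curr, thresh_px=1.0):
    """Return a boolean mask that is True for (likely) static points.
    pts_prev, pts_curr: (N, 2) float32 arrays of matched pixel coords."""
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 1.0, 0.99)
    # epipolar lines in the current image for points from the previous one
    lines = cv2.computeCorrespondEpilines(
        pts_prev.reshape(-1, 1, 2), 1, F).reshape(-1, 3)
    a, b, c = lines[:, 0], lines[:, 1], lines[:, 2]
    x, y = pts_curr[:, 0], pts_curr[:, 1]
    dist = np.abs(a * x + b * y + c) / np.sqrt(a ** 2 + b ** 2)
    return dist < thresh_px   # large residuals suggest dynamic points
```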
... Data association can also be solved directly from object observations. [23] calculates the Bag of Words (BoW) vector of feature points as the appearance description of objects, and then performs BoW matching on the candidate objects that satisfy the reprojection relationship. Chen et al. [24] proposed a hierarchical object association strategy and applied multi-object tracking for short-term object association. ...
Preprint
Object SLAM uses additional semantic information to detect and map objects in the scene, in order to improve the system's perception and map representation capabilities. Quadrics and cubes are often used to represent objects, but their single shape limits the accuracy of the object map and thus affects downstream tasks. In this paper, we introduce superquadrics (SQ) with shape parameters into SLAM for representing objects, and propose a separate parameter estimation method that can accurately estimate object pose and adapt to different shapes. Furthermore, we present a lightweight data association strategy for correctly associating semantic observations in multiple views with object landmarks. We implement a monocular semantic SLAM system with real-time performance and conduct comprehensive experiments on public datasets. The results show that our method is able to build accurate object maps and has advantages in object representation. Code will be released upon acceptance.
... The mapping and localization of traditional SLAM are mostly based on pixel-level geometric matching. With semantic information, we can upgrade the data association from the traditional pixel level to the object level, improving accuracy in complex scenes [202]. On the other hand, by using SLAM technology to compute the positional constraints between objects, consistency constraints can be applied to the recognition results of the same object from different angles and at different times, thus improving the accuracy of semantic understanding. ...
Article
Full-text available
Visual SLAM (VSLAM) has been developing rapidly due to its advantages of low-cost sensors, the easy fusion of other sensors, and richer environmental information. Traditional vision-based SLAM research has made many achievements, but it may fail to achieve the desired results in challenging environments. Deep learning has promoted the development of computer vision, and the combination of deep learning and SLAM has attracted more and more attention. Semantic information, as high-level environmental information, can enable robots to better understand the surrounding environment. This paper introduces the development of VSLAM technology from two aspects: traditional VSLAM, and semantic VSLAM combined with deep learning. For traditional VSLAM, we summarize the advantages and disadvantages of indirect and direct methods in detail and give some classical VSLAM open-source algorithms. In addition, we focus on the development of semantic VSLAM based on deep learning. Starting with the typical neural networks CNN and RNN, we summarize the improvements neural networks bring to the VSLAM system in detail. Later, we focus on how object detection and semantic segmentation help introduce semantic information into VSLAM. We believe that the development of the future intelligent era cannot proceed without semantic technology. Introducing deep learning into the VSLAM system to provide semantic information can help robots better perceive the surrounding environment and provide people with higher-level help.
... In addition, in prior work such as [6], [15], data association methods have been proposed although they are typically not robust to outdoor scenes. Dynamic objects in outdoor scenes like moving cars and persons are a challenge for quadric estimation since false object associations will lead to false quadric initialization results. ...
Preprint
Full-text available
Object-oriented SLAM is a popular technology in autonomous driving and robotics. In this paper, we propose a stereo visual SLAM with a robust quadric landmark representation method. The system consists of four components: deep learning detection, object-oriented data association, dual quadric landmark initialization, and object-based pose optimization. State-of-the-art quadric-based SLAM algorithms always face observation-related problems and are sensitive to observation noise, which limits their application in outdoor scenes. To solve this problem, we propose a quadric initialization method based on the decoupling of the quadric parameters, which improves robustness to observation noise. The sufficient object data association algorithm and object-oriented optimization with multiple cues enable highly accurate object pose estimation that is robust to local observations. Experimental results show that the proposed system is more robust to observation noise and significantly outperforms current state-of-the-art methods in outdoor environments. In addition, the proposed system demonstrates real-time performance.
... However, this association can only be used in very limited environments. Qian et al. [34] use a novel object-level object association algorithm based on the bag-of-words algorithm. In dealing with the object association problem, they use the geometry and appearance information of the object. ...
Preprint
Full-text available
Nowadays in the field of semantic SLAM, how to correctly use semantic information for data association is still a problem worth studying. The key to solving this problem is to correctly associate multiple object measurements with one object landmark and to refine the pose of the object landmark. However, different objects located close together are prone to being associated as one object landmark, and it is difficult to pick the best pose from the multiple object measurements associated with one landmark. To tackle these problems, we propose a hierarchical object association strategy by means of multiple object tracking, through which close objects are correctly associated to different object landmarks, and an approach to refine the pose of an object landmark from multiple object measurements. The proposed method is evaluated on a simulated sequence and several sequences from the KITTI dataset. Experimental results show a very impressive improvement with respect to traditional SLAM and a state-of-the-art semantic SLAM method.
Article
To ensure long-term space missions, an autonomous localization and mapping system for lunar rovers is in demand. While the target-oriented localization and mapping problem can be solved with state-of-the-art methods, these greatly rely on human-in-the-loop remote operations, posing several challenges for the visual system of a rover operating in a distant, unknown, and feature-sparse lunar environment. This paper presents a SAM-augmented target-oriented SLAM framework that enables rovers to estimate the relative distance to the target on the lunar surface, thus ensuring the safety of the exploration task. Based on the proposed point-prompted object instance extraction pipeline, object correspondences are first predicted in the middle-end of Lo-SLAM, where reliable semantic constraints are robustly associated across image frames. We then maintain the relative positioning between the camera and the target in the visual odometry of Lo-SLAM. Meanwhile, a multi-level mapping and representation framework is proposed to keep the target explicit and characterized in different subtasks. Extensive experiments are conducted on our dataset, SePT (i.e., Stereo Planetary Tracks). Results show that the proposed Lo-SLAM is validated on challenging lunar scenarios with dramatic viewpoint and object-scale changes. The average pose errors are 0.36 meters in centroid and 0.27 meters in scale, and the average object-centric trajectory error is approximately 0.49%. An open-source dataset has been released at https://github.com/miaTian99/SePT_Stereo-Planetary-Tracks.
Article
For a robot in an unknown environment to find a target semantic object, it must perform simultaneous localization and mapping (SLAM) at both geometric and semantic levels using its onboard sensors while planning and executing its motion based on the ever-updated SLAM results. In other words, the robot must simultaneously conduct localization, semantic mapping, motion planning, and execution online in the presence of sensing and motion uncertainty. This is an open problem as it intertwines semantic SLAM and adaptive online motion planning and execution under uncertainty based on perception. Moreover, the goals of the robot's motion change on the fly depending on whether and how the robot can detect the target object. We propose a novel approach to tackle the problem, leveraging semantic SLAM, Bayesian Networks, and online probabilistic motion planning. The results demonstrate our approach's effectiveness and efficiency.
Article
Simultaneous localization and mapping (SLAM) in robotics is a fundamental problem. The use of visual odometry (VO) enhances scene recognition in the task of ego-localization within an unknown environment. Semantically meaningful information permits data association and dense mapping to be conducted based on entities representing landmarks rather than manually designed, low-level geometric clues and has inspired various feature descriptors for semantically ensembled SLAM applications. This article illuminates the insights into the measure for semantics and the semantically constrained pose optimization. The concept of semantic extractor and the matched framework are initially presented. As the latest advances in computer vision and the learning-based deep feature acquisition are closely related, the semantic extractor is especially described in a deep learning paradigm. The methodologies pertinent to our explorations for object association and semantics-fused constraining that is amenable for use in a least-squares framework are summarized in a systematic way. By a collection of problem formulations and principle analyses, our review exhibits a fairly unique perspective in semantic SLAM. We further discuss the challenges of semantic uncertainty and explicitly introduce the term ‘semantic reasoning’. Some technology outlooks regarding semantic reasoning are simultaneously given. We argue that for intelligent tasks of robots such as object grasping, dynamic obstacle avoidance, and object-target navigation, semantic reasoning might guide the complex scene understanding under the framework of semantic SLAM directly to a solution.
Conference Paper
Full-text available
Aerial robots play a vital role in various applications where the situational awareness of the robots concerning the environment is a fundamental demand. As one such use case, drones in GPS-denied environments require equipping with different sensors (e.g., vision sensors) that provide reliable sensing results while performing pose estimation and localization. In this paper, reconstructing the maps of indoor environments alongside generating 3D scene graphs for a high-level representation using a camera mounted on a drone is targeted. Accordingly, an aerial robot equipped with a companion computer and an RGB-D camera was built and employed to be appropriately integrated with a Visual Simultaneous Localization and Mapping (VSLAM) framework proposed by the authors. To enhance the situational awareness of the robot while reconstructing maps, various structural elements, including doors and walls, were labeled with printed fiducial markers, and a dictionary of the topological relations among them was fed to the system. The VSLAM system detects markers and reconstructs the map of the indoor areas enriched with higher-level semantic entities, including corridors and rooms. Another achievement is generating multi-layered vision-based situational graphs containing enhanced hierarchical representations of the indoor environment. In this regard, integrating VSLAM into the employed drone is the primary target of this paper to provide an end-to-end robot application for GPS-denied environments. To show the practicality of the system, various real-world condition experiments have been conducted in indoor scenarios with dissimilar structural layouts. Evaluations show the proposed drone application can perform adequately w.r.t. the ground-truth data and its baseline.
Preprint
Full-text available
Aerial robots play a vital role in various applications where the situational awareness of the robots concerning the environment is a fundamental demand. As one such use case, drones in GPS-denied environments require equipping with different sensors (e.g., vision sensors) that provide reliable sensing results while performing pose estimation and localization. In this paper, reconstructing the maps of indoor environments alongside generating 3D scene graphs for a high-level representation using a camera mounted on a drone is targeted. Accordingly, an aerial robot equipped with a companion computer and an RGB-D camera was built and employed to be appropriately integrated with a Visual Simultaneous Localization and Mapping (VSLAM) framework proposed by the authors. To enhance the situational awareness of the robot while reconstructing maps, various structural elements, including doors and walls, were labeled with printed fiducial markers, and a dictionary of the topological relations among them was fed to the system. The VSLAM system detects markers and reconstructs the map of the indoor areas, enriched with higher-level semantic entities, including corridors and rooms. Another achievement is generating multi-layered vision-based situational graphs containing enhanced hierarchical representations of the indoor environment. In this regard, integrating VSLAM into the employed drone is the primary target of this paper to provide an end-to-end robot application for GPS-denied environments. To show the practicality of the system, various real-world condition experiments have been conducted in indoor scenarios with dissimilar structural layouts. Evaluations show the proposed drone application can perform adequately w.r.t. the ground-truth data and its baseline.
Article
Matching landmark patches from a real-time image captured by an on-vehicle camera with landmark patches in an image database plays an important role in various computer perception tasks for autonomous driving. Current methods focus on local matching for regions of interest and do not take into account spatial neighborhood relationships among the image patches, which typically correspond to objects in the environment. In this paper, we construct a spatial graph with the graph vertices corresponding to patches and edges capturing the spatial neighborhood information. We propose a joint feature and metric learning model with graph-based learning. We provide a theoretical basis for the graph-based loss by showing that the information distance between the distributions conditioned on matched and unmatched pairs is maximized under our framework. We evaluate our model using several street-scene datasets and demonstrate that our approach achieves state-of-the-art matching results.
Article
Object-level Simultaneous Localization and Mapping (SLAM) is critical for mobile robot localization and navigation. Wrong observations due to monocular camera noise and object detection errors hinder accurate object perception. Most existing work adopts simple hand-crafted rules to prevent the construction of object outliers. These strategies are difficult to generalize to different challenging scenarios. Eliminating object outliers remains a challenge for object SLAM. In this paper, we propose a spatio-temporal consistency model for removing object outliers. Our approach takes only a low-cost monocular camera as the image sensor of the system. We use a graph model to construct spatial consistency as a means to constrain the semantic spatial relationships among multiple objects. Only the objects that satisfy the spatial consistency constraints are constructed. In addition, outliers are detected based on the regularity of object measurements appearing on the time axis. We eliminate objects whose observations in consecutive frames do not satisfy the temporal consistency constraint. Finally, we couple normal objects to SLAM for pose optimization to improve camera localization accuracy. Experiments on public datasets and a real scenario demonstrate the performance of the proposed approach.
Article
Robust and accurate object association is essential for precise 3D object landmark inference in semantic Simultaneous Localization and Mapping (SLAM), yet it remains challenging due to detection deficiencies caused by high missed-detection rates, false alarms, occlusion, and a limited field of view. The 2D location of an object is a crucial complementary cue to the appearance feature, especially when associating objects across frames under large viewpoint changes. However, motion-model or trajectory-pattern based methods struggle to infer object motion reliably with a moving camera. In this paper, by exploiting local projective warping consistency, a local-homography-based 2D motion inference method is proposed to sequentially estimate the object location along with its uncertainty. By integrating the deep appearance feature and semantic information, an object association method named HOA, which is robust to detection deficiency, is proposed. Experimental evaluations suggest that the proposed motion prediction method is capable of maintaining a low cumulative error over a long duration, which enhances object association performance in both accuracy and robustness. Note to Practitioners: This work aims to consistently associate 2D detection boxes corresponding to the same 3D object across images. In tasks of landmark-based navigation, collision avoidance, grasping, and manipulation, objects in the task space are commonly simplified into 3D enveloping surfaces (e.g., cuboids or ellipsoids) by using 2D object detection boxes from multiple image views, and accurate data association is a prerequisite for precise enveloping surface reconstruction. This problem remains challenging considering imperfect object detections, the appearance similarity of objects, and the unpredictable trajectory of the moving camera. This work proposes a long-term reliable 2D location prediction algorithm capable of handling complex target motion. Together with an appearance feature extracted by a retrain-free deep learning model, this work proposes an object association method that can simultaneously deal with multiple objects of unknown categories under a moving camera.
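The local-homography motion inference described here can be approximated in a few lines. The sketch below estimates a homography from feature matches around the object and warps the previous box center forward; the paper's full method also propagates uncertainty, which is omitted here:

```python
import numpy as np
import cv2

def predict_box_center(prev_pts, curr_pts, prev_center):
    """Predict an object's 2D box center in the current frame via a
    local homography fitted to matched key points near the object.
    prev_pts, curr_pts: (N, 2) float32 matched pixel coordinates;
    prev_center: (2,) box center in the previous frame."""
    H, _ = cv2.findHomography(prev_pts, curr_pts, cv2.RANSAC, 3.0)
    if H is None:
        return prev_center                            # fall back to no motion
    c = np.array([[prev_center]], dtype=np.float32)   # shape (1, 1, 2)
    return cv2.perspectiveTransform(c, H).reshape(2)
```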
Article
Loop closure can effectively correct the accumulated error in robot localization, which plays a critical role in the long-term navigation of the robot. Traditional appearance-based methods rely on local features and are prone to failure in ambiguous environments. On the other hand, object recognition can infer objects' category, pose, and extent. These objects can serve as stable semantic landmarks for viewpoint-independent and non-ambiguous loop closure. However, there is a critical object-level data association problem due to the lack of efficient and robust algorithms. We introduce a novel object-level data association algorithm, which incorporates IoU, instance-level embedding, and detection uncertainty, formulated as a linear assignment problem. Then, we model the objects as TSDF volumes and represent the environment as a 3D graph with semantics and topology. Next, we propose a graph matching-based loop detection based on the reconstructed 3D semantic graphs and correct the accumulated error by aligning the matched objects. Finally, we refine the object poses and camera trajectory in an object-level pose graph optimization. Experimental results show that the proposed object-level data association method significantly outperforms the commonly used nearest neighbor method in accuracy. Our graph matching-based loop closure is more robust to environmental appearance changes than existing appearance-based methods.
Article
In semantic visual Simultaneous Localization and Mapping (SLAM), accurate object-level reconstruction of the environment based on deep learning techniques is crucial for high-level scene recognition and semantic object association. However, existing work handles this problem under the assumption of a simple world. Improving the accuracy of object reconstruction from images with complicated environment backgrounds remains a challenge. In this work, we propose an improved object recovery method that applies the DBSCAN algorithm to geometric features. Outlier points and abnormal clusters can be identified by combining the clustering algorithm with a nonparametric test. In addition, we develop an adaptive sampling strategy based on line features with varying step intervals, which achieves a more accurate estimation of the object orientation. The proposed method is integrated with the ORB-SLAM2 framework to construct a real-time image-based reconstruction system. Qualitative and quantitative evaluations on public datasets and real-world scenarios demonstrate the robustness and effectiveness of our approach compared to related work.
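As a rough sketch of the DBSCAN-based outlier rejection this abstract describes, the snippet below clusters an object's 3D points and keeps the dominant cluster; the parameters are illustrative, and the paper's combination with a nonparametric test is omitted:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def reject_outlier_points(points, eps=0.05, min_samples=10):
    """Cluster an object's 3D points (N, 3) with DBSCAN and keep the
    largest cluster, discarding noise (label -1) and stray clusters.
    eps/min_samples are illustrative values, not the paper's."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return points                 # nothing confident enough to keep
    return points[labels == np.bincount(valid).argmax()]
```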
Article
The current pandemic has caused the medical system to operate under high load. To relieve it, robots with high autonomy can be used to effectively execute contactless operations in hospitals and reduce cross-infection between medical staff and patients. Although semantic Simultaneous Localization and Mapping (SLAM) technology can improve the autonomy of robots, semantic object association is still a problem worth studying. The key to solving this problem is to correctly associate multiple object measurements of one object landmark using semantic information, and to refine the pose of the object landmark in real time. To this end, we propose a hierarchical object association strategy and a pose-refinement approach. The former consists of two levels, i.e., short-term object association and global object association. In the first level, we employ multiple-object tracking for short-term object association, through which incorrect associations among objects with close locations and similar appearances can be avoided. Moreover, the short-term object association provides more abundant object appearance information and a more robust estimation of object pose for the global object association in the second level. To refine the object pose in the map, we develop an approach to choose the optimal object pose from all object measurements associated with an object landmark. The proposed method is comprehensively evaluated on seven simulated hospital sequences, a real hospital environment, and the KITTI dataset. Experimental results show that our method obviously improves the robustness and accuracy of object association and trajectory estimation in semantic SLAM.
Article
Loop closure is necessary for correcting errors accumulated in simultaneous localization and mapping (SLAM) in unknown environments. However, conventional loop closure methods based on low-level geometric or image features may cause high ambiguity by not distinguishing similar scenarios. Thus, incorrect loop closures can occur. Though semantic 2D image information is considered in some literature to detect loop closures, there is little work that compares 3D scenes as an integral part of a semantic SLAM system. This letter introduces an approach, called SmSLAM+LCD, integrated into a semantic SLAM system to combine high-level 3D semantic information and low-level feature information to conduct accurate loop closure detection and effective drift reduction. The effectiveness of our approach is demonstrated in testing results.
Article
Full-text available
Augmented reality (AR) is an emerging technology that is applied in many fields. One of the limitations that still prevents AR from being even more widely used relates to the accessibility of devices. Indeed, the devices currently used are usually high-end, expensive glasses or mobile devices. vSLAM (visual simultaneous localization and mapping) algorithms circumvent this problem by requiring relatively cheap cameras for AR. vSLAM algorithms can be classified as direct or indirect methods based on the type of data used. Each class of algorithms works optimally on a type of scene (e.g., textured or untextured), but unfortunately with little overlap. In this work, a method is proposed to fuse a direct and an indirect method in order to achieve higher robustness and to offer the possibility for AR to move seamlessly between different types of scenes. Our method is tested on three datasets against state-of-the-art direct (LSD-SLAM), semi-direct (LCSD), and indirect (ORB-SLAM2) algorithms in two different scenarios: a trajectory-planning scenario and an AR scenario where a virtual object is displayed on top of the video feed; furthermore, a similar method (LCSD SLAM) is also compared to our proposal. Results show that our fusion algorithm is generally as efficient as the best algorithm both in terms of trajectory (mean errors with respect to ground-truth trajectory measurements) and in terms of quality of the augmentation (robustness and stability). In short, we propose a fusion algorithm that, in our tests, takes the best of both the direct and indirect methods.
Article
Full-text available
Due to the development of computer vision, machine learning, and deep learning technologies, the research community focuses not only on traditional SLAM problems, such as geometric mapping and localization, but also on semantic SLAM. In this paper we propose a Semantic SLAM system which builds semantic maps with object-level entities and is integrated into the RGB-D SLAM framework. The system seamlessly combines an object detection module, realized by a deep-learning method, with a localization module based on RGB-D SLAM. In the proposed system, the object detection module performs object detection and recognition, and the localization module obtains the exact location of the camera. The two modules are integrated to obtain semantic maps of the environment. Furthermore, to improve the computational efficiency of the framework, an improved Octomap based on a Fast Line Rasterization Algorithm is constructed. Meanwhile, for the sake of accuracy and robustness of the semantic map, a Conditional Random Field (CRF) is employed for optimization. Finally, we evaluate our Semantic SLAM on three different tasks, i.e., localization, object detection, and mapping. Specifically, the accuracy of localization and the mapping speed are evaluated on the TUM dataset. Compared with ORB-SLAM2 and the original RGB-D SLAM, our system achieves 72.9% and 91.2% improvements, respectively, in dynamic-environment localization as evaluated by root-mean-square error (RMSE). With the improved Octomap, the proposed Semantic SLAM is 66.5% faster than the original RGB-D SLAM. We also demonstrate the efficiency of object detection through quantitative evaluation in an automated inventory management task on a real-world dataset recorded in a realistic office.
Conference Paper
Full-text available
Traditional approaches for simultaneous localization and mapping (SLAM) rely on geometric features such as points, lines, and planes to infer the environment structure. They make hard decisions about the (data) association between observed features and mapped landmarks to update the environment model. This paper makes two contributions to the state of the art in SLAM. First, it generalizes the purely geometric model by introducing semantically meaningful objects, represented as structured models of mid-level part features. Second, instead of making hard, potentially wrong associations between semantic features and objects, it shows that SLAM inference can be performed efficiently with probabilistic data association. The approach not only allows building meaningful maps (containing doors, chairs, cars, etc.) but also offers significant advantages in ambiguous environments.
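Schematically, the probabilistic data association advocated here replaces a hard landmark choice with an expectation over association hypotheses. A generic form (the paper derives the exact weights within an expectation-maximization framework) is:

```latex
p\!\left(z_k \mid x\right) = \sum_{j} w_{kj}\, p\!\left(z_k \mid x,\, a_k = j\right),
\qquad \sum_{j} w_{kj} = 1
```

where $a_k = j$ denotes assigning measurement $z_k$ to landmark $j$ and $w_{kj}$ is the posterior weight of that hypothesis.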
Article
Full-text available
We provide a large dataset containing RGB-D image sequences and the ground-truth camera trajectories with the goal to establish a benchmark for the evaluation of visual SLAM systems. Our dataset contains the color and depth images of a Microsoft Kinect sensor and the ground-truth trajectory of camera poses. The data was recorded at full frame rate (30 Hz) and sensor resolution (640x480). The ground-truth trajectory was obtained from a high-accuracy motion-capture system with eight high-speed tracking cameras (100 Hz). Further, we provide the accelerometer data from the Kinect. Finally, we propose an evaluation criterion for measuring the quality of the estimated camera trajectory of visual SLAM systems.
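The benchmark's evaluation criterion is commonly computed as the root-mean-square of the absolute trajectory error after rigid alignment. A minimal numpy sketch, assuming position pairs are already timestamp-associated (the official tooling also handles association and, for monocular input, scale):

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE of the absolute trajectory error after Horn-style rigid
    alignment. est, gt: (N, 3) arrays of associated camera positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(E.T @ G)               # cross-covariance SVD
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = (U @ S @ Vt).T                              # est frame -> gt frame
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```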
Conference Paper
Full-text available
Many popular problems in robotics and computer vision, including various types of simultaneous localization and mapping (SLAM) or bundle adjustment (BA), can be phrased as the least-squares optimization of an error function that can be represented by a graph. This paper describes the general structure of such problems and presents g2o, an open-source C++ framework for optimizing graph-based nonlinear error functions. Our system has been designed to be easily extensible to a wide range of problems, and a new problem can typically be specified in a few lines of code. The current implementation provides solutions to several variants of SLAM and BA. We provide evaluations on a wide range of real-world and simulated datasets. The results demonstrate that, while being general, g2o offers performance comparable to implementations of state-of-the-art approaches for the specific problems.
Article
Full-text available
The cost scaling push-relabel method has been shown to be efficient for solving minimum-cost flow problems. In this paper we apply the method to the assignment problem. We investigate implementations of the method that take advantage of the problem structure. The results show that the method is very promising for practical use; it outperforms all other codes on almost all problems in our study.
Article
In this paper, we use 2D object detections from multiple views to simultaneously estimate a 3D quadric surface for each object and localize the camera position. We derive a SLAM formulation that uses dual quadrics as 3D landmark representations, exploiting their ability to compactly represent the size, position and orientation of an object, and show how 2D object detections can directly constrain the quadric parameters via a novel geometric error formulation. We develop a sensor model for object detectors that addresses the challenge of partially visible objects, and demonstrate how to jointly estimate the camera pose and constrained dual quadric parameters in factor graph based SLAM with a general perspective camera.
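The geometric core of this formulation is the standard dual-quadric projection: a dual quadric $Q^{*} \in \mathbb{R}^{4\times 4}$ observed by a camera with projection matrix $P = K[R \mid t]$ projects to the dual conic

```latex
C^{*} = P\, Q^{*} P^{\top}
```

and the geometric error then compares the axis-aligned box enclosing the conic $C^{*}$ with the detector's 2D bounding box.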
Article
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/
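Pipelines built on this detector often consume it through OpenCV's DNN module rather than the original Darknet code. A hedged sketch of that route (the file names are placeholders for the standard config and weights downloads):

```python
import cv2

# Load YOLOv3 from the standard Darknet config/weights files
# (placeholder paths; the files come from the YOLO website).
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

img = cv2.imread("frame.png")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (320, 320),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
# Each output row holds [cx, cy, w, h, objectness, per-class scores...].
```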
Article
In this work we present a novel approach to recover objects' 3D position and occupancy in a generic scene using only 2D object detections from multiple view images. The method reformulates the problem as the estimation of a quadric (ellipsoid) in 3D given a set of 2D ellipses fitted to the object detection bounding boxes in multiple views. We show that a closed-form solution exists in the dual space using a minimum of three views, while a solution with two views is possible through the use of non-linear optimisation and constraints on the size of the object shape. In order to make the solution robust to inaccurate bounding boxes, a likely occurrence in object detection methods, we introduce a data preconditioning technique and a non-linear refinement of the closed-form solution based on implicit subspace constraints. Results on synthetic tests and on different real datasets, involving challenging scenarios, demonstrate the applicability and potential of our method in several realistic scenarios.
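In practice, the 2D input to such a method is obtained by inscribing an ellipse in each detection bounding box and taking its dual conic. A small sketch of that construction (my own, not the paper's code):

```python
import numpy as np

def bbox_to_dual_conic(x1, y1, x2, y2):
    """Dual conic of the ellipse inscribed in an axis-aligned box:
    C* = T diag(a^2, b^2, -1) T^T, where T translates the unit frame
    to the box centre and (a, b) are the semi-axes."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    a, b = (x2 - x1) / 2.0, (y2 - y1) / 2.0
    T = np.array([[1.0, 0.0, cx],
                  [0.0, 1.0, cy],
                  [0.0, 0.0, 1.0]])
    return T @ np.diag([a * a, b * b, -1.0]) @ T.T
```

Stacking the constraint that each view's dual conic is proportional to P_i Q* P_i^T over three or more views yields the linear system behind the closed-form dual quadric described above.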
Article
We present ORB-SLAM2, a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real time on standard CPUs in a wide variety of environments, from small hand-held indoor sequences to drones flying in industrial environments and cars driving around a city. Our back-end, based on bundle adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches to map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.
Article
Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need to extend beyond geometry and appearance: they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map, but we also show on the NYUv2 dataset that fusing multiple predictions improves even the 2D semantic labelling over baseline single-frame predictions. We also show that for a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single-frame segmentation increases. Our system is efficient enough to allow real-time interactive use at frame rates of approximately 25 Hz.
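The probabilistic fusion step is essentially a recursive Bayesian update of a per-element class distribution. A minimal sketch for a single map element, assuming independent per-frame CNN outputs (an illustrative simplification, not the authors' exact scheme):

```python
import numpy as np

def fuse(prior, cnn_probs):
    """Multiply the stored class distribution by the new frame's CNN
    softmax output and renormalize (naive Bayes across viewpoints)."""
    post = prior * cnn_probs
    return post / post.sum()

p = np.full(3, 1 / 3)                      # uniform prior over 3 classes
for frame in [np.array([0.5, 0.3, 0.2]),   # noisy single-frame predictions
              np.array([0.6, 0.3, 0.1]),
              np.array([0.4, 0.5, 0.1])]:
    p = fuse(p, frame)
print(p)  # fused distribution is more peaked than any single frame
```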
Conference Paper
We present an approach for joint inference of 3D scene structure and semantic labeling for monocular video. Starting from a monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is much more useful than the series of 2D semantic label images or the sparse point cloud produced by traditional semantic segmentation and Structure from Motion (SfM) pipelines, respectively. We derive a Conditional Random Field (CRF) model defined in 3D space that jointly infers the semantic category and occupancy of each voxel. Such joint inference in the 3D CRF paves the way for more informed priors and constraints, which is otherwise not possible if the problems are solved separately in their traditional frameworks. We make use of class-specific semantic cues that constrain the 3D structure in areas where multi-view constraints are weak. Our model comprises higher-order factors, which help when the depth is unobservable, and we use class-specific semantic cues to reduce the degree of such higher-order factors or to approximately model them with unaries where possible. We demonstrate improved 3D structure and temporally consistent semantic segmentation for difficult, large-scale, forward-moving monocular image sequences.
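To make the joint-inference idea concrete, here is a deliberately tiny stand-in: a unary-plus-pairwise (Potts) energy over a 1D run of voxels, minimized with iterated conditional modes. The real model is a full 3D CRF with higher-order factors; everything below is my own illustrative construction:

```python
import numpy as np

def icm(unary, pair_weight=0.3, iters=10):
    """Iterated conditional modes on a chain of voxels.
    unary: (N, L) cost of assigning each of L labels to each voxel;
    a Potts penalty `pair_weight` is paid for disagreeing neighbours."""
    labels = unary.argmin(axis=1)
    N, L = unary.shape
    for _ in range(iters):
        for i in range(N):
            cost = unary[i].copy()
            for j in (i - 1, i + 1):          # chain neighbours
                if 0 <= j < N:
                    cost += pair_weight * (np.arange(L) != labels[j])
            labels[i] = cost.argmin()
    return labels

unary = np.array([[0.1, 0.9], [0.6, 0.4], [0.2, 0.8]])  # noisy middle voxel
print(icm(unary))  # smoothing flips the middle voxel to agree: [0 0 0]
```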
Article
We present a real-time object-based SLAM system that leverages the largest object database to date. Our approach comprises two main components: 1) a monocular SLAM algorithm that exploits object rigidity constraints to improve the map and find its real scale, and 2) a novel object recognition algorithm based on bags of binary words, which provides live detections with a database of 500 3D objects. The two components work together and benefit each other: the SLAM algorithm accumulates information from the observations of the objects, anchors object features to special map landmarks, and sets constraints on the optimization. At the same time, objects partially or fully located within the map are used as a prior to guide the recognition algorithm, achieving higher recall. We evaluate our proposal in five real environments, showing improvements in the accuracy of the map and in efficiency with respect to other state-of-the-art techniques.
Conference Paper
We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in understanding an object's precise 2D location. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old, along with per-instance segmentation masks. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd-worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
Conference Paper
We present the major advantages of a new 'object oriented' 3D SLAM paradigm, which takes full advantage, in the loop, of the prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, real-time 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera-to-model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object-level scene description with the potential to enable interaction.
Article
We propose a novel method for visual place recognition using bags of binary words obtained from BRIEF descriptors computed at FAST keypoints. For the first time, we build a vocabulary tree that discretizes a binary descriptor space, and use the tree to speed up correspondences for geometrical verification. We present competitive results with no false positives on very different datasets, using exactly the same vocabulary and settings. The whole technique, including feature extraction, requires 22 ms per frame on a sequence with 26,300 images, one order of magnitude faster than previous approaches.
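The speed of such binary-word approaches comes from Hamming-distance comparison of packed descriptors, which reduces to XOR and popcount. A small numpy illustration (the descriptors here are random stand-ins for real BRIEF descriptors):

```python
import numpy as np

rng = np.random.default_rng(0)
# 256-bit binary descriptors packed into 32 uint8 bytes each.
desc_db = rng.integers(0, 256, size=(1000, 32), dtype=np.uint8)
query = desc_db[42]            # query is a copy of database entry 42

# Hamming distance = popcount of XOR, vectorized over the database.
dists = np.unpackbits(desc_db ^ query, axis=1).sum(axis=1)
print(dists.argmin(), dists.min())  # -> 42 0
```

A vocabulary tree replaces this linear scan with a logarithmic descent through the discretized descriptor space, which is what makes the reported per-frame times possible.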
Article
Abstract: "The push-relabel method has been shown to be efficient for solving maximum flow and minimum cost flow problems in practice, and periodic global updates of dual variables have played an important role in the best implementations. Nevertheless, global updates had not been known to yield any theoretical improvement in running time. In this work, we study techniques for implementing push-relabel algorithms to solve bipartite matching and assignment problems. We show that global updates yield a theoretical improvement in the bipartite matching and assignment contexts, and we develop a suite of efficient cost-scaling push-relabel implementations to solve assignment problems. For bipartite matching, we show that a push-relabel algorithm using global updates runs in [formula] time (matching the best bound known) and performs worse by a factor of [square root of n] without the updates. We present a similar result for the assignment problem, for which an algorithm that assumes integer costs in the range [ -C, ..., C] runs in time O([square root of nm] log(nC)) (matching the best cost-scaling bound known). We develop cost-scaling push-relabel implementations that take advantage of the assignment problem's special structure, and compare our codes against the best codes from the literature. The results show that the push-relabel method is very promising for practical use." Cover title. "August 1995." Thesis (Ph. D.)--Stanford University, 1995. Includes bibliographical references.
YOLO ROS: Real-time object detection for ROS
  • M. Bjelonic
Pulse code modulation techniques.
  • W. N. Waggener
CGAL User and Reference Manual
  • The CGAL Project
Assignment Problems: Revised Reprint.
  • R. Burkard
  • M. Dell'Amico
  • S. Martello