Article

A comprehensive review of 3D point cloud descriptors

Authors: Han, Jin, Xie, Wang, and Jiang

Abstract

The introduction of inexpensive 3D data acquisition devices has greatly facilitated the wide availability and popularity of 3D point clouds, which in turn has drawn attention to the effective extraction of novel 3D point cloud descriptors for accurate and efficient 3D computer vision tasks. However, developing discriminative and robust feature descriptors from various point clouds remains a challenging task. This paper comprehensively investigates the existing approaches for extracting 3D point cloud descriptors, which are categorized into three major classes: local-based descriptors, global-based descriptors and hybrid-based descriptors. Furthermore, experiments are carried out to present a thorough evaluation of several state-of-the-art 3D point cloud descriptors widely used in practice, in terms of descriptiveness, robustness and efficiency.


... Algorithms for detecting and extracting three-dimensional objects operate on criteria specific to each class of object of interest, which in computer vision are called 3D descriptors. According to Han et al. (2018), descriptors enable 3D visualization tasks to be performed more accurately and efficiently, and can be categorized as global, local or hybrid. Global descriptors are computed as a single feature vector for the entire point cloud. ...
... In the literature, descriptors with different characteristics dedicated to object recognition in a point cloud can be found. One of the oldest is the Spin Image, created by Johnson (1997) for correspondence between objects, which uses a local basis defined at a given 3D point oriented by the surface normal and accumulates, in a 2D matrix, the positions of the other points of the object surface relative to that basis (Han et al., 2018). Recently, an optimization of this descriptor, named Salient Spin Image, was proposed by H'roura et al. (2018) with the goal of reducing the computational complexity of the algorithm and improving its performance in occluded and cluttered areas. ...
... A variety of other descriptors can be found, especially those based on analytical relations among the eigenvalues of the structure formed by the points of a neighborhood. The eigenvalues (λ1, λ2, λ3) are obtained by decomposing a local variance-covariance matrix defined by the points of a support region and are ordered by magnitude (λ1 > λ2 > λ3) (Han et al., 2018). Different studies on 3D scene analysis have explored the concept of eigenvalues to describe the characteristics of objects (Demantké et al., 2012; Dittrich and Hinz, 2017; dos Santos et al., 2019; Weinmann et al., 2015, 2017). ...
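The eigenvalue-based features mentioned in these contexts are straightforward to compute. Below is a minimal Python/NumPy sketch following the covariance-eigenvalue definitions cited above (Weinmann et al., 2015; Han et al., 2018); the neighborhood input and the exact feature set are illustrative:

    import numpy as np

    def eigen_features(neighborhood):
        # neighborhood: (k, 3) array of points in the local support region
        centered = neighborhood - neighborhood.mean(axis=0)
        cov = centered.T @ centered / len(neighborhood)  # local variance-covariance matrix
        l3, l2, l1 = np.linalg.eigvalsh(cov)             # ascending order, so l1 >= l2 >= l3
        e1, e2, e3 = np.array([l1, l2, l3]) / (l1 + l2 + l3)
        return {
            "linearity": (e1 - e2) / e1,
            "planarity": (e2 - e3) / e1,
            "omnivariance": (e1 * e2 * e3) ** (1.0 / 3.0),
            "eigenentropy": -sum(e * np.log(e) for e in (e1, e2, e3) if e > 0),
            "curvature": e3,  # change of curvature, l3 / (l1 + l2 + l3)
        }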
Article
Full-text available
The results derived from tree detection using LiDAR data can be used in different applications such as forest management and preservation, urban planning, and detection of objects occluded by tree crowns, among others. In this sense, this work aims to evaluate the applicability of 3D geometric descriptors based on eigenvalues and LiDAR intensity for tree detection. In the experiments, the influence of the neighborhood (sphere and cylinder) on the calculation of the geometric attributes was analyzed. From visual analyses, it was possible to notice that some geometric attributes, such as omnivariance, curvature, planarity and eigenentropy, showed more potential for detecting trees. Considering the four selected attributes (omnivariance, curvature, planarity, eigenentropy) and intensity, the K-Means algorithm was executed for each attribute to separate the point cloud into tree and non-tree. The classification results were compared with reference data using quality parameters such as completeness, correctness and F-score. Completeness values obtained for omnivariance, eigenentropy, planarity and intensity reached values above 90%, indicating that these descriptors have high reliability for tree detection. The average correctness was around 62%, with a large number of false positives. When analyzing F-score values, it was possible to verify the potential of omnivariance, computed in a spherical neighborhood, for detecting trees from LiDAR data (F-score above 80%).
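As a sketch of the per-attribute K-Means step described in this abstract, the snippet below (assuming scikit-learn) splits a cloud into tree/non-tree from a single geometric attribute; taking the higher-mean cluster as "tree" is an illustrative assumption, not the paper's stated rule:

    import numpy as np
    from sklearn.cluster import KMeans

    def split_tree_nontree(attr):
        # attr: (n_points,) array of one attribute, e.g. omnivariance
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(attr.reshape(-1, 1))
        means = [attr[labels == c].mean() for c in (0, 1)]
        return labels == int(np.argmax(means))  # boolean mask of putative tree points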
... 2. If 3D objects are to be detected within a larger set, pipelines based on 3D point cloud descriptors have become established [5][6]. A distinction is made here between the following types of 3D descriptors [6]: a) local 3D descriptors, which take into account the local geometric information of a feature point (e.g. surface normal and curvature). ...
Conference Paper
Full-text available
The research project SparePartAssist aims at supporting service technicians, who need to identify and order a spare part when working in the field. Existing image-based approaches typically fail to handle occluded situations. Therefore, SparePartAssist opted for using 3D sensor data and a variant of geometric similarity search coping with incomplete model data. When comparing available dToF and iToF sensors for smartphones, multipath problems of iToF sensors were identified as a major obstacle for their application in this use case. Consequently, the recently released LiDAR sensor of the iPad was adopted for delivering the input data for a 3D reconstruction pipeline employing segmentation algorithms and a tailor-made fusion library. Ongoing evaluations show the high potential of the chosen approach.
... Many 3D descriptors for point clouds have been introduced and applied to the task of point cloud registration. Meanwhile, comprehensive reviews for point cloud descriptors and comparison in terms of computational efficiency and accuracy were provided by [138][139][140][141][142] as well. According to the way of descriptor extraction, the descriptors can be approximately classified as local, global, and hybrid descriptors. ...
... For the feature extractor module, in addition to PFH and SHOT, what we describe in this section are local-based features. Various methods for extracting features and descriptions of point clouds have also been studied and summarized [140]. Feature-based methods usually extract pairwise or higher-order relationships into histograms in a handcrafted way. ...
Article
Full-text available
A point cloud, as a collection of points, is poised to bring about a revolution in acquiring and generating three-dimensional (3D) surface information of an object in 3D reconstruction, industrial inspection, and robotic manipulation. In this revolution, the most challenging but imperative process is point cloud registration, i.e., obtaining a spatial transformation that aligns and matches two point clouds acquired in two different coordinate systems. In this survey paper, we present the overview and basic principles, give a systematic classification and comparison of various methods, and address existing technical problems in point cloud registration. This review attempts to serve as a tutorial for academic researchers and engineers outside this field and to promote discussion of a unified vision of point cloud registration. The goal is to help readers quickly get into the problems of interest related to point cloud registration and to provide them with insights and guidance in finding appropriate strategies and solutions.
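Descriptor-driven coarse registration of the kind surveyed above can be sketched with Open3D's registration pipeline (assuming Open3D >= 0.13; the source/target point clouds, voxel size, and thresholds are illustrative):

    import open3d as o3d

    def preprocess(pcd, voxel=0.05):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(voxel * 2, 30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(voxel * 5, 100))
        return down, fpfh

    src, src_f = preprocess(source)   # source, target: o3d.geometry.PointCloud
    tgt, tgt_f = preprocess(target)
    result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, tgt, src_f, tgt_f, True, 0.075,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(0.075)],
        o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    print(result.transformation)      # 4x4 coarse alignment, typically refined with ICP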
... As another important task, 3D keypoint matching is closely related to 3D point cloud registration and 3D point cloud recognition. To tackle this task, many statistics-based methods have been developed in a hand-crafted fashion and aim to describe the geometric structures around 3D keypoints or objects; see a more comprehensive discussion in [10]. ...
... Most 3D object detection algorithms are validated on KITTI; however, KITTI is a relatively small dataset and does not provide detailed map information. Several autonomous-driving companies have recently released their datasets, such as nuScenes, Argoverse, and the Lyft Level 5 AV dataset. Evaluation metrics: to evaluate the detection performance, standard evaluation metrics in academia are the precision-recall (PR) curve and average precision (AP); however, there is no standard platform to evaluate the running speed of each model. ...
Article
Full-text available
We present a review of 3D point cloud processing and learning for autonomous driving. As one of the most important sensors in autonomous vehicles (AVs), lidar sensors collect 3D point clouds that precisely record the external surfaces of objects and scenes. The tools for 3D point cloud processing and learning are critical to the map creation, localization, and perception modules in an AV. Although much attention has been paid to data collected from cameras, such as images and videos, an increasing number of researchers have recognized the importance and significance of lidar in autonomous driving and have proposed processing and learning algorithms that exploit 3D point clouds. We review the recent progress in this research area and summarize what has been tried and what is needed for practical and safe AVs. We also offer perspectives on open issues that are needed to be solved in the future.
... Examples of local descriptors include spin image (Johnson 1997), signature of histograms of orientations (SHOT) (Tombari, Salti, and Di Stefano 2010), point feature histogram (PFH) (Rusu et al. 2008), fast point feature histogram (FPFH) (Rusu, Blodow, and Beetz 2009), scale invariant feature transform (SIFT), radial surface descriptor (RSD) (Marton et al. 2010), and height gradient histogram (HGH) (Zhao, Yuan, and Dang 2015). A comprehensive review of point cloud descriptors is presented by Han et al. (2018). ...
... For every pair of a query point with one of its neighbours, a sphere is determined by fitting the points using their spatial locations and point normals. Finally, from all the spheres, the ones with the minimum and maximum radii are kept and stored as the descriptor (Han et al. 2018). For 'n' points in a point cloud, the total number of RSD descriptors is n × 2 × number_of_scales. ...
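A minimal NumPy sketch of the RSD idea: for a query point and each neighbour, the radius of the sphere consistent with their distance d and normal angle alpha is r = d / (2·sin(alpha/2)), and the minimum and maximum radii are kept. Unit normals and the r_max cap for near-planar pairs are illustrative assumptions:

    import numpy as np

    def rsd_min_max(p, n_p, neighbors, normals, r_max=0.5):
        radii = []
        for q, n_q in zip(neighbors, normals):
            d = np.linalg.norm(q - p)
            alpha = np.arccos(np.clip(n_p @ n_q, -1.0, 1.0))
            # near-parallel normals indicate a near-planar patch: clamp to r_max
            r = d / (2.0 * np.sin(alpha / 2.0)) if alpha > 1e-6 else r_max
            radii.append(min(r, r_max))
        return min(radii), max(radii)   # the two values stored per scale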
Article
Full-text available
Roof bolts are commonly used to provide structural support in underground mines. Frequent and automated assessment of roof bolts is critical to closely monitor any change in roof conditions while preventing major hazards such as roof falls. However, due to challenging conditions at mine sites, such as sub-optimal lighting and restrictive access, it is difficult to routinely assess roof bolts by visual inspection or traditional surveying. To overcome these challenges, this study presents an automated method of roof bolt identification from 3D point cloud data, to assist in spatio-temporal monitoring efforts at mine sites. An artificial neural network was used to classify roof bolts and extract them from the 3D point cloud using local point descriptors such as the proportion of variance (POV) over multiple scales, the radial surface descriptor (RSD) over multiple scales, and the fast point feature histogram (FPFH). Accuracy was evaluated in terms of precision, recall and the quality metric generally used in classification studies. The generated results were compared against other machine learning algorithms such as weighted k-nearest neighbours (k-NN), ensemble subspace k-NN, support vector machine (SVM) and random forest (RF), and the approach was found to be superior by up to 8% in terms of the achieved quality metric.
... As a result, studies aimed at developing 3D descriptors have increased in recent years. Han et al. [2] reviewed 3D point cloud descriptors, categorizing them into three major classes: local-based, global-based and hybrid-based approaches. ...
... The SDC measures distances and encodes information about the distribution of points around the centroid. In this way, SDC allows distinguishing objects with similar characteristics according to their size and normal distribution [2]. By combining viewpoint components, FPFH components and the histograms resulting from the SDC, the VFH descriptor is constructed. ...
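One plausible reading of the SDC component described above is a normalized histogram over point-to-centroid distances; the following sketch conveys the idea (the bin count and fixed range are illustrative choices, not the VFH defaults):

    import numpy as np

    def sdc_histogram(points, bins=45, r_max=1.0):
        c = points.mean(axis=0)                          # cloud centroid
        d = np.linalg.norm(points - c, axis=1)           # distances to centroid
        hist, _ = np.histogram(d, bins=bins, range=(0.0, r_max))
        return hist / len(points)                        # size-sensitive distribution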
... This paper presents a computational pipeline for accurate transformation and alignment to register two different 3D point clouds. Specifically, a 3D point cloud descriptor based on geometric features (Hana et al. 2018) and a false correspondence rejection approach based on RANSAC (Fischler and Bolles 1981) are proposed for accurate point cloud registration in indoor scenes, where objects such as doors, windows, and walls can potentially cause false correspondences due to their similar shapes. The proposed pipeline is tested through a lab experiment in which point clouds of hallways and rooms in a building are extracted for registration. Based on the results, discussion and recommendations are also presented for future research directions in this area. ...
... A computational pipeline that extracts key features from point clouds and eliminates false correspondences between the features is presented to align two different 3D point clouds representing an as-planned and an as-built model (Figure 1). Specifically, local feature descriptors, which define the local geometry of a point cloud model and are thus often used for the alignment of point clouds (Yang et al. 2016), are adopted in this study, instead of global feature descriptors that define the complete geometry of a model as one feature (Hana et al. 2018). In the framework, point clouds of the two models are first generated, and then the keypoints, which describe the features of the point cloud models, are extracted using uniform sampling (Xu et al. 2013). During uniform sampling, a point cloud is treated as a voxel grid, and one point is selected in each voxel to represent the surface. ...
Conference Paper
Full-text available
For interactive visualization in AR devices, feature descriptors of point clouds (as-designed model and as-built model) are corresponded and registered. However, point clouds of indoor environments have many similar feature descriptors (e.g., indoor scenes with similar doors and windows), which leads to many false correspondences and affects registration accuracy. This paper proposes a random sample consensus (RANSAC)-based false correspondence rejection to compute an accurate transformation for the registration of such 3D point clouds. Point cloud data were collected from rooms and a hallway of a campus building, and the transformation accuracy for the registration of those point clouds was tested. The results show that RANSAC-based false correspondence rejection gives a transformation accuracy of 0.017 radians and 0.1924 meters in aligning two point cloud models; hence, the proposed approach for registering a model point cloud with a scene point cloud may provide a foundation for accurately implementing AR on a construction jobsite.
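The RANSAC-based rejection described here can be sketched as a minimal-sample loop over putative keypoint correspondences, with a closed-form SVD (Kabsch) rigid fit; the threshold and iteration count are illustrative:

    import numpy as np

    def kabsch(src, dst):
        # least-squares rigid transform (R, t) with dst ≈ src @ R.T + t
        cs, cd = src.mean(0), dst.mean(0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        if np.linalg.det(Vt.T @ U.T) < 0:   # avoid reflections
            Vt[-1] *= -1
        R = Vt.T @ U.T
        return R, cd - R @ cs

    def ransac_reject(src, dst, thresh=0.05, iters=2000, seed=0):
        rng = np.random.default_rng(seed)
        best = np.zeros(len(src), dtype=bool)
        for _ in range(iters):
            idx = rng.choice(len(src), 3, replace=False)
            R, t = kabsch(src[idx], dst[idx])
            inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < thresh
            if inliers.sum() > best.sum():
                best = inliers
        return kabsch(src[best], dst[best]), best   # refit on all inliers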
... Shape Descriptors: The methods widely used for extracting 3D point cloud descriptors have been extensively studied by Hana et al. [9]. They can be broadly divided into local-based descriptors, global-based descriptors, and hybrid descriptors. ...
... In conventional global registration approaches, features are extracted using manually defined rules to form handcrafted feature descriptors. For those based on handcrafted features, a deeper review can be found in the paper by Han et al. [22]. Fast point feature histograms (FPFHs) [23] appear to be the basis of a lot of research, and they have been used in various works claiming state-of-the-art results. ...
Article
Full-text available
A point cloud is a set of data points in space. Point cloud registration is the process of aligning two or more 3D point clouds collected from different locations of the same scene. Registration enables point cloud data to be transformed into a common coordinate system, forming an integrated dataset representing the scene surveyed. In addition to those reliant on targets being placed in the scene before data capture, there are various registration methods available that are based on using only the point cloud data captured. Until recently, cloud-to-cloud registration methods have generally been centered upon the use of a coarse-to-fine optimization strategy. The challenges and limitations inherent in this process have shaped the development of point cloud registration and the associated software tools over the past three decades. Based on the success of deep learning methods applied to imagery data, attempts at applying these approaches to point cloud datasets have received much attention. This study reviews and comments on more recent developments in point cloud registration without using any targets and explores remaining issues, based on which recommendations on potential future studies in this topic are made.
... 3D point cloud descriptors [4][5] may serve for the detection of 3D objects within a larger set. Here, the following types of 3D descriptors may be distinguished [5]: a) local 3D descriptors, taking into account the local geometric information of a feature point (e.g. surface normal and curvature); b) global 3D descriptors, considering the geometric information of an object's entire point cloud; c) hybrid methods, combining different 3D descriptors ...
Conference Paper
Full-text available
The presented augmented reality app supports service technicians in scenarios where the article number of an urgently needed spare part cannot be determined ad hoc because the system documentation is either not available or not up to date. The app therefore offers means for identifying and ordering suitable spare parts based on a 3D scan of the visible and preserved component geometry. To this end, the project opted to use the recently released LiDAR sensor of the iPad Pro and iPhone Pro to provide the input data of the 3D reconstruction and recognition pipeline. Apple's ARKit supplies the input depth images, the trajectories, and pose data. Within this pipeline, a TSDF-based fusion approach is employed for 3D reconstruction. Regarding segmentation, extensive experiments were conducted to identify segmentation methods offering both reliable results and sufficient performance. For object recognition, different approaches were developed that can cope with incomplete model data. Within the app's user interface, augmented reality features are used to blend data from different sources. When conducting experiments, especially in facility management, the proposed approach shows promising results.
... Inspired by descriptors in image space, 3-D local descriptors in 3-D space were presented. Despite the high importance of 3-D local feature descriptors and the introduction of many methods in recent decades, achieving accurate, fast, and highly reliable performance in this process still faces major problems [19], such as maintaining appropriate performance against noise, varying densities, clutter, occlusions, and missing data (e.g., due to different viewpoints). ...
Article
Full-text available
Three-dimensional (3-D) point clouds are widely considered for applications in different fields. Various methods have been proposed to generate point cloud data: LIDAR and image matching from static and mobile platforms, including, e.g., terrestrial laser scanning. With multiple point clouds from stationary platforms, point cloud registration is a crucial and fundamental issue. A standard approach is point-based registration, which relies on pairs of corresponding points in two point clouds. Therefore, a necessary step in point-based registration is the construction of 3-D local descriptors. One of the (many) challenges that specifically affects the performance of local descriptors with local spatial information is the point displacement error. This error is caused by the difference in the distributions of points surrounding a (potentially) corresponding center point in the two point clouds. It can occur for various reasons, such as 1) distortions caused by the sensors recording the data, 2) moving objects, 3) varying density of the point cloud, 4) change of viewing angle, and 5) differences between the sensors. The purpose of this article is to develop a new 3-D local descriptor reducing the effect of this type of error in point cloud coarse registration. The approach includes an improved local reference frame and a new geometric arrangement in point cloud space for the 3-D local descriptor. Inspired by the 2-D DAISY descriptor, a geometric arrangement is created to reduce the effect of the point displacement error. In addition, directional histograms are considered as features. Investigations are performed for point clouds from challenging environments, which are publicly available. The results of this study show the high performance of the proposed approach for point cloud registration, especially in more challenging and noisy environments.
... Among these 3D feature descriptors, Point Feature Histogram (PFH) [40], Fast Point Feature Histogram (FPFH) [41] and Color Point Feature Histogram (PFHRGB) that was developed by the PCL community [42] are suited for 3D registration of point clouds. In that regard, a comprehensive review of 3D point cloud descriptors [38] showed that PFHRGB outperforms other descriptors for our registration task. In addition, we empirically evaluated these feature descriptors on the entire African violet dataset by varying the matching threshold, and PFHRGB which provided the best overall alignment results was retained in our methodology. ...
Conference Paper
Full-text available
Recently, there has been increasing interest in applying spatio-temporal registration for phenotyping of both individual plants and groups of plants in large agricultural fields. However, 3D non-rigid registration methods are still a research topic and present numerous particular challenges in plant phenotyping due to overlaps and self-occlusions in dense phyllotaxies, deformations caused by plant growth over time, changes in outdoor environmental settings, etc. In this paper, we address the problem of registering spatio-temporal 3D models of plants by proposing a bundle registration approach that can handle transformations with up to three additional degrees of freedom (DoF) to capture the growth of the plant. Besides, we offer to the research community a new multi-view stereo dataset consisting of 2D images and 3D point clouds of an African violet plant observed over a period of ten days. We evaluate the proposed algorithm on the new African violet dataset using the usual 6 DoF (three rotations and three translations) and compare it with 7 DoF (three rotations, three translations, and one scale) and 9 DoF (three rotations, three translations, and three scales). We also compare the proposed approach with two other registration approaches: pairwise and incremental. We show that the proposed algorithm achieves an average registration error of less than 2 mm on the African violet dataset. Also, we used VisND, an N-dimensional spatio-temporal visualization tool, to perform a visual assessment of the aligned time-varying 3D models of the plants.
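The extra scale DoF mentioned above has a closed-form estimate via the Umeyama method; this sketch covers the 7 DoF case (one uniform scale), while the 9 DoF case with per-axis scales requires a further generalization:

    import numpy as np

    def umeyama(src, dst, with_scale=True):
        # similarity transform (s, R, t) with dst ≈ s * src @ R.T + t
        mu_s, mu_d = src.mean(0), dst.mean(0)
        xs, xd = src - mu_s, dst - mu_d
        U, D, Vt = np.linalg.svd(xd.T @ xs / len(src))
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:
            S[2, 2] = -1                       # keep a proper rotation
        R = U @ S @ Vt
        s = np.trace(np.diag(D) @ S) / xs.var(axis=0).sum() if with_scale else 1.0
        return s, R, mu_d - s * R @ mu_s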
... These three challenges are essentially difficult cases in which common solutions used for 3D data co-registration often fail (Cheng et al., 2018; Pomerleau et al., 2015). For example, for challenge 1), none of the existing image feature extraction and matching methods is capable of processing cross-view images without overlapping texture (Morel and Yu, 2009), and 3D feature extraction and matching methods are rather immature even for well-captured and high-quality 3D data (Hana et al., 2018). ...
Article
Wide-area 3D data generation for complex urban environments often needs to leverage a mixed use of data collected from both air and ground platforms, such as aerial surveys, satellites, and mobile vehicles. On one hand, such data with information from drastically different views (ca. 90° and more), forming cross-view data, is difficult to register without significant manual effort because the drastically different lines of sight of the sensors leave only a very limited overlapping region. On the other hand, the registration of such data often suffers from non-rigid distortion of the street-view data (e.g., non-rigid trajectory drift), which cannot be simply rectified by a similarity transformation. In this paper, based on the assumption that object boundaries (e.g., buildings) from the over-view data should coincide with footprints of façade 3D points generated from street-view photogrammetric images, we address this problem by proposing a fully automated geo-registration method for cross-view data, which utilizes semantically segmented object boundaries as view-invariant features under a global optimization framework through graph matching: taking the over-view point clouds generated from stereo/multi-stereo satellite images and the street-view point clouds generated from monocular video images as the inputs, the proposed method models segments of buildings, detected from both the satellite-based and street-view based point clouds, as nodes of graphs, thus formulating the registration as a graph-matching problem that allows non-rigid matches; to enable a robust solution and fully utilize the topological relations between these segments, we propose to address the graph-matching problem on its conjugate graph, solved through a belief-propagation algorithm. The matched nodes are subject to a further optimization to allow precise registration, followed by a constrained bundle adjustment on the street-view images to keep 2D-3D consistencies, which yields street-view images and point clouds well registered to the satellite point clouds. Our proposed method assumes no or little prior pose information (e.g., very sparse locations from consumer-grade GPS (global positioning system)) for the street-view data and has been applied to a large cross-view dataset with significant scale difference containing 0.5 m GSD (ground sampling distance) satellite data and 0.005 m GSD street-view data, 1.5 km in length and involving 12 GB of data. The experiment shows that the proposed method has achieved promising results (1.27 m accuracy in 3D), evaluated using collected LiDAR point clouds. Furthermore, we include additional experiments to demonstrate that this method can be generalized to process different types of over-view and street-view data sources, e.g., open street view maps and semantic labeling maps.
... Many authors use this approach, with various choices for the descriptors (Huang et al. 2008; Gelfand et al. 2005; Yang et al. 2016). Han et al. (2018) provide a comprehensive review of such descriptors. Landmarks can also be used to specify key points of the surfaces that are known to match because of a priori knowledge by the operator (Allen, Curless, and Popovic 2003). ...
Thesis
The prevalence of obesity has been increasing for several decades, and it is now estimated to affect close to 30% of the population worldwide. The increased mass of adipose tissue of obese vehicle occupants can negatively affect the risk and severity of injuries sustained during a car crash. Human Body Models (HBM) are a useful tool for studying crash scenarios in order to design better safety systems, but they mainly target the non-obese population. However, morphing has been suggested as an alternative to the costly development of new models. This thesis presents enhancements to existing morphing methods that allow using very detailed morphing targets to personalize not only the external shape of the body, but also the inside. The methods were implemented as open source software in the PIPER framework. They were then applied to create detailed personalized HBMs that describe both the subcutaneous fat and the abdominal fold of three obese Post Mortem Human Surrogates (PMHS). Tests performed on the same PMHS to characterize the interaction of an obese abdomen with the safety belt were then simulated using the morphed models. The results show the importance of the abdominal fold, which was not accounted for in previous studies, for correctly capturing the behaviour of the abdomen. Overall, these results bring new knowledge and methods allowing to better understand and simulate the restraint conditions of obese subjects in an automotive environment. These should help in the future to design more efficient restraint systems.
... The mobile mapping system can obtain street-view data with high positional accuracy and high resolution, providing rich facade information and a better three-dimensional description of the scene. However, the 3D point cloud data are usually noisy and lack texture information from the top view (Haala et al. 2008; Hana et al. 2018). We argue that data collected from multiple platforms can be complementary. ...
Article
Full-text available
In a complex urban scene, observation from a single sensor unavoidably leads to voids in the observations, failing to describe urban objects in a comprehensive manner. In this paper, we propose a spatio-temporal-spectral-angular observation model to integrate observations from UAV and mobile mapping vehicle platforms, realizing a joint, coordinated observation operation from both air and ground. We develop a multi-source remote sensing data acquisition system to effectively acquire multi-angle data of complex urban scenes. Multi-source data fusion solves the missing data problem caused by occlusion and achieves accurate, rapid, and complete collection of holographic spatial and temporal information in complex urban scenes. We carried out an experiment on Baisha Town, Chongqing, China and obtained multi-sensor, multi-angle data from the UAV and the mobile mapping vehicle. We first extracted the point cloud from the UAV data and then integrated the UAV and mobile mapping vehicle point clouds. The integrated results combined the characteristics of both point clouds, confirming the practicability of the proposed joint data acquisition platform and the effectiveness of the spatio-temporal-spectral-angular observation model. Compared with observation from the UAV or the mobile mapping vehicle alone, the integrated system provides an effective data acquisition solution toward comprehensive urban monitoring.
... Three-dimensional object recognition has been under investigation for a long time in various research fields, such as pattern recognition, computer graphics, and robotics [4,14,20,28]. Although an exhaustive survey of 3D object descriptors is beyond the scope of this paper [2,6,26], we review the main efforts. Object representations based on RGB data alone are sensitive to illumination and shadows. ...
Article
Full-text available
Despite the recent success of state-of-the-art 3D object recognition approaches, service robots still frequently fail to recognize many objects in real human-centric environments. For these robots, object recognition is a challenging task due to the high demand for accurate and real-time response under changing and unpredictable environmental conditions. Most of the recent approaches use either shape information only, ignoring the role of color information, or vice versa. Furthermore, they mainly utilize the L_n Minkowski family of functions to measure the similarity of two object views, while there are various distance measures applicable to comparing two object views. In this paper, we explore the importance of shape information, color constancy, color spaces, and various similarity measures in open-ended 3D object recognition. Toward this goal, we extensively evaluate the performance of object recognition approaches in three different configurations, including color-only, shape-only, and combinations of color and shape, in both offline and online settings. Experimental results concerning scalability, memory usage, and object recognition performance show that all of the combinations of color and shape yield significant improvements over the shape-only and color-only approaches. The underlying reason is that color information is an important feature to distinguish objects that have very similar geometric properties with different colors and vice versa. Moreover, by combining color and shape information, we demonstrate that the robot can learn new object categories from very few training examples in a real-world setting.
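As a sketch of similarity measures beyond the L_n Minkowski family discussed above, the snippet below implements two common histogram dissimilarities alongside the Minkowski distance, plus a simple concatenation of shape and color histograms; the weighting is an illustrative choice, not the paper's scheme:

    import numpy as np

    def minkowski(h1, h2, n=2):
        return np.sum(np.abs(h1 - h2) ** n) ** (1.0 / n)

    def chi_square(h1, h2, eps=1e-10):
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

    def intersection_dist(h1, h2):
        return 1.0 - np.minimum(h1, h2).sum()   # for normalized histograms

    def combine(shape_h, color_h, w=0.5):
        # weighted concatenation of normalized shape and color histograms
        return np.concatenate([(1.0 - w) * shape_h, w * color_h])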
... Objects with irregular shapes are often detected with feature-based methods, which require features with high distinctiveness and descriptiveness. Based on their scope, features can be divided into local features, which are associated with individual points, and global features, which are associated with the whole point set [4,30]. For local descriptors, detection consists of four steps: keypoint extraction, description and matching, correspondence grouping, and RANSAC orientation derivation [4]. ...
Article
Full-text available
As-built building information modeling (BIM) has gained much attention in mechanical, electrical and plumbing (MEP) systems for better facility management. To create as-built BIMs, laser scanning technology is widely used to collect raw data due to its high measurement speed and accuracy. Currently, as-built models are mostly drawn by experienced personnel in BIM modeling software with point cloud data as reference, which is labor-intensive and time-consuming. This study presents a fully automated approach to converting terrestrial laser scanning data into well-connected as-built BIMs for MEP scenes. According to geometry complexity, MEP components are divided into regular shaped components and irregular shaped components. A 2D to 3D analysis framework is developed to detect objects and extract accurate geometry information for the two categories of MEP components. Firstly, the MEP scene is divided into slices from which rough geometry information of components' cross sections is extracted. Then, the extracted information from different slices is integrated and analyzed in 3D space to verify the existence of MEP components and obtain refined geometry information used for modeling. Following the detection stage, an MEP network construction approach is developed for MEP component connection and position fine-tuning. Finally, the extracted geometry information and connection relationships are imported into Dynamo to automatically generate the parametric BIM model. To validate the feasibility of the proposed technique, experiments were conducted with point clouds acquired from three scenes in Hong Kong. A comprehensive assessment is presented to evaluate the as-built model quality with three indices: retrieval rate, geometry parameter accuracy, and deviation from point clouds to as-built model. The experiment results show that the proposed technique can successfully transform laser scanning data of MEP scenes into as-built BIMs with sufficient accuracy for facility management purposes.
... The classifier uses principal component analysis (PCA) to evaluate the dimensionality of the object at defined scales by computing eigenvalues. The proportion of variance (POV) shown by each eigenvalue is given by p_i = k_i / (k_1 + k_2 + k_3), where k_1, k_2, and k_3 are the eigenvalues obtained for the 3D point cloud using PCA and i denotes the eigenvalue number at a particular scale; this leads to three important dimensional parameters: pointness, curveness, and surfaceness (see Eq. (1)) [51], where p_1 > p_2 > p_3. ...
Article
Full-text available
Roof bolts such as rock bolts and cable bolts provide structural support in underground mines. Frequent assessment of these support structures is critical to maintain roof stability and minimise safety risks in underground environments. This study proposes a robust workflow to classify roof bolts in 3D point cloud data and to generate maps of roof bolt density and spacing. The workflow was evaluated for identifying roof bolts in an underground coal mine with suboptimal lighting and global navigation satellite system (GNSS) signals not available. The approach is based on supervised classification using the multi-scale Canupo classifier coupled with a random sample consensus (RANSAC) shape detection algorithm to provide robust roof bolt identification. The issue of sparseness in point cloud data has been addressed through upsampling by using a moving least squares method. The accuracy of roof bolt identification was measured by correct identification of roof bolts (true positives), unidentified roof bolts (false negatives), and falsely identified roof bolts (false positives) using correctness, completeness, and quality metrics. The proposed workflow achieved correct identification of 89.27% of the roof bolts present in the test area. However, considering the false positives and false negatives, the overall quality metric was reduced to 78.54%.
... The second, mainly used for smaller datasets, foresees the selection of specific handcrafted features, thus facilitating the learning task and improving the overall performance, though at increased computational cost. In this case, most of the features are usually handcrafted for specific tasks (Zhang et al. 2019) and can be subdivided into intrinsic and extrinsic, or used for local and global descriptors (Han et al., 2018; Weinmann et al., 2015). The local features define the statistical properties of the local neighbourhood's geometric information, while the global features describe the whole geometry of the point cloud. ...
Article
Full-text available
The lack of benchmarking data for the semantic segmentation of digital heritage scenarios is hampering the development of automatic classification solutions in this field. Heritage 3D data feature complex structures and uncommon classes that prevent the simple deployment of available methods developed in other fields and for other types of data. The semantic classification of heritage 3D data would support the community in better understanding and analysing digital twins, facilitate restoration and conservation work, etc. In this paper, we present the first benchmark with millions of manually labelled 3D points belonging to heritage scenarios, realised to facilitate the development, training, testing and evaluation of machine and deep learning methods and algorithms in the heritage field. The proposed benchmark, available at http://archdataset.polito.it/, comprises datasets and classification results for better comparisons and insights into the strengths and weaknesses of different machine and deep learning approaches for heritage point cloud semantic segmentation, in addition to promoting a form of crowdsourcing to enrich the already annotated database.
... 3D Feature Extraction: Many techniques have been developed in order to obtain global feature descriptors for 3D point sets [13,22,14,8]. Johnson et al. [14] developed a ... [Figure 2 caption: Network Architecture. The network takes as input N points with coordinates (x, y, z). The input is passed to a graph signal processing module to generate a re-scaled normalized graph vector and to deep convolutional feature extraction layers to output an N × D global feature vector. Both the normalized weighted graph and the global features go as input to a graph convolutional network that outputs a global feature signature, which is passed to a fully connected layer that scales down the features and assigns one of k output classes to each point.] ...
Conference Paper
Full-text available
Directly processing 3D point clouds using convolutional neural networks (CNNs) is a highly challenging task, primarily due to the lack of explicit neighborhood relationships between points in 3D space. Several researchers have tried to cope with this problem using a preprocessing step of voxelization. Although this allows translating existing CNN architectures to process 3D point clouds, in addition to computational and memory constraints it introduces quantization artifacts, which limit the accurate inference of the underlying object's structure in the illuminated scene. In this paper, we introduce a more stable and effective end-to-end architecture to classify raw 3D point clouds from indoor and outdoor scenes. In the proposed methodology, we encode the spatial arrangement of neighbouring 3D points inside an undirected symmetrical graph, which is passed along with features extracted from a 2D CNN to a Graph Convolutional Network (GCN) that contains three layers of localized graph convolutions to generate a complete segmentation map. The proposed network achieves on par or even better than state-of-the-art results on tasks like semantic scene parsing, part segmentation and urban classification on three standard benchmark datasets.
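The undirected symmetric graph over raw points that this architecture relies on can be sketched as a normalized k-NN adjacency (dense here for clarity; SciPy's cKDTree does the neighbor search, and the GCN layers themselves are omitted):

    import numpy as np
    from scipy.spatial import cKDTree

    def knn_graph(points, k=16):
        n = len(points)
        _, idx = cKDTree(points).query(points, k=k + 1)   # first hit is the point itself
        A = np.zeros((n, n), dtype=np.float32)
        A[np.repeat(np.arange(n), k), idx[:, 1:].ravel()] = 1.0
        A = np.maximum(A, A.T)                            # symmetrize (undirected)
        d = A.sum(1)
        return A / (np.sqrt(np.outer(d, d)) + 1e-12)      # degree-normalized adjacency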
... Several approaches have been proposed in the literature for local and global 3D surface description. We refer the reader to the literature by Hana, Jin, Xie, Wang, and Jiang [6] and by Bayramoglu and Alatan [7], which provides more comprehensive and up-to-date reviews of 3D shape descriptors. However, we briefly review the literature summarized in Table I. ...
... Coarse registration is based on the identification and matching of common landmarks both in the measured point cloud and in the triangle mesh. Landmarks can be identified through computation of local feature descriptors [3][4][5]. In this work, local curvatures are used. ...
... For this work, we use the Ensemble of Shape Functions (ESF) descriptors [9]. The reason for choosing ESF descriptors for our system is their robustness to incomplete surfaces [5], as these are very common in the given challenge dataset. Furthermore, ESF descriptors can be efficiently computed from the point cloud without the need for preprocessing steps such as point normal calculation, which reduces the execution time. ...
Conference Paper
In this paper, we present our approach to solve the DEBS Grand challenge 2019 which consists of classifying urban objects in different scenes that originate from a LiDAR sensor. In general, at any point in time, LiDAR data can be considered as a point cloud where a reliable feature extractor and a classification model are required to be able to recognize 3-D objects in such scenes. Herein, we propose and describe an implementation of a 3-D point cloud object detection and classification system based on a 3-D global feature called Ensemble of Shape Functions (ESF) and a random forest object classifier.
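A rough sketch of such a pipeline: a single D2-style shape-function histogram (distances between random point pairs) stands in for the full 640-D ESF descriptor, and scikit-learn's random forest does the classification; train_clouds, train_labels and segments are assumed inputs:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def d2_histogram(points, bins=64, n_pairs=20000, seed=0):
        rng = np.random.default_rng(seed)
        i = rng.integers(0, len(points), n_pairs)
        j = rng.integers(0, len(points), n_pairs)
        d = np.linalg.norm(points[i] - points[j], axis=1)
        hist, _ = np.histogram(d / (d.max() + 1e-12), bins=bins, range=(0.0, 1.0))
        return hist / n_pairs

    X = np.stack([d2_histogram(c) for c in train_clouds])
    clf = RandomForestClassifier(n_estimators=200).fit(X, train_labels)
    pred = clf.predict(np.stack([d2_histogram(s) for s in segments]))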
... Pose Estimation: Common approaches to retrieving a 3 DoF pose from LiDAR data employ either local feature extraction such as FPFH [24] and feature matching using RANSAC [25], or handcrafted rotation-variant global features such as VFH [19] or GASD [18]. An overview of recent research on 3D pose estimation and recognition is given by Han et al. [26]. Velas et al. [27] propose to use a CNN to estimate both translation and rotation between successive LiDAR scans for local motion estimation. ...
Preprint
Full-text available
We introduce a novel method for oriented place recognition with 3D LiDAR scans. A Convolutional Neural Network is trained to extract compact descriptors from single 3D LiDAR scans. These can be used both to retrieve nearby place candidates from a map, and to estimate the yaw discrepancy needed for bootstrapping local registration methods. We employ a triplet loss function for training and use a hard-negative mining strategy to further increase the performance of our descriptor extractor. In an evaluation on the NCLT and KITTI datasets, we demonstrate that our method outperforms related state-of-the-art approaches based on both data-driven and handcrafted data representation in challenging long-term outdoor conditions.
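The triplet loss with hard-negative mining used to train such descriptor extractors can be sketched in PyTorch (the margin and tensor shapes are illustrative):

    import torch
    import torch.nn.functional as F

    def triplet_loss_hard(anchor, positive, negatives, margin=0.5):
        # anchor, positive: (B, D) descriptors; negatives: (B, K, D) candidates
        d_pos = F.pairwise_distance(anchor, positive)                   # (B,)
        d_neg = torch.cdist(anchor.unsqueeze(1), negatives).squeeze(1)  # (B, K)
        hard = d_neg.min(dim=1).values      # hardest (closest) negative per anchor
        return F.relu(d_pos - hard + margin).mean()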
Article
Full-text available
Point cloud registration is a fundamental problem in computer vision. The problem encompasses critical tasks such as feature estimation, correspondence matching, and transformation estimation. The point cloud registration problem can be cast as a quantile matching problem. We refined the quantile assignment algorithm by integrating prevalent feature descriptors and transformation estimation methods to enhance the correspondence between the source and target point clouds. We evaluated the performances of these descriptors and methods with our approach through controlled experiments on a dataset we constructed using well-known 3D models. This systematic investigation led us to identify the most suitable methods for complementing our approach. Subsequently, we devised a new end-to-end, coarse-to-fine pairwise point cloud registration framework. Finally, we tested our framework on indoor and outdoor benchmark datasets and compared our results with state-of-the-art point cloud registration methods.
Article
Full-text available
As technology evolves, the cost of 3D scanners is falling, which makes 3D computer vision for industrial applications increasingly popular, and more and more researchers have started to study it. Point cloud feature description is a fundamental task in 3D computer vision, and descriptors that use spatial features tend to perform better than those without them. Point cloud descriptors can generally be divided into local reference frame-based (LRF-based) and local reference frame-free (LRF-free). The former use LRFs to provide spatial features to the descriptors, while the latter use point pair features to provide spatial features. However, the performance of LRF-based descriptors is strongly affected by the quality of the local reference frames, while LRF-free descriptors with spatial information tend to be more computationally intensive because of their point pair combination strategies. Therefore, we propose a strategy named Multi-scale Point Pair Combination Strategy (MSPPCS) that reduces the computation of point pair-based feature descriptors by nearly 70% while ensuring that the performance of the descriptor is almost unaffected. We also propose a new descriptor, Spatial Feature Point Pair Histograms (SFPPH), which achieves excellent performance and robustness due to the diverse spatial features used. We critically evaluate the performance of our descriptor on the Bologna, Kinect, and UWA datasets. The experimental results show that our descriptor is the most robust and best-performing point cloud feature descriptor.
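The point pair features that LRF-free descriptors build on are classically the four-component feature of Drost et al. (a distance and three angles); a minimal sketch follows, noting that SFPPH's actual feature set and the MSPPCS sampling strategy differ in detail:

    import numpy as np

    def point_pair_feature(p1, n1, p2, n2):
        # p1, p2: 3D points; n1, n2: their unit normals
        d = p2 - p1
        dist = np.linalg.norm(d)
        u = d / (dist + 1e-12)
        ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
        return np.array([dist, ang(n1, u), ang(n2, u), ang(n1, n2)])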
Article
Full-text available
6D pose estimation of rigid objects from RGB-D images is crucial for object grasping and manipulation in robotics. Although RGB channels and the depth (D) channel are often complementary, providing respectively the appearance and geometry information, it is still non-trivial to fully benefit from the two cross-modal data. We start from the simple yet new observation that when an object rotates, its semantic label is invariant to the pose while its keypoint offset direction varies with the pose. To this end, we present SO(3)-Pose, a new representation learning network to explore SO(3)-equivariant and SO(3)-invariant features from the depth channel for pose estimation. The SO(3)-invariant features facilitate learning more distinctive representations for segmenting objects with similar appearance from the RGB channels. The SO(3)-equivariant features communicate with RGB features to deduce the (missed) geometry for detecting keypoints of an object with a reflective surface from the depth channel. Unlike most existing pose estimation methods, our SO(3)-Pose not only implements information communication between the RGB and depth channels, but also naturally absorbs SO(3)-equivariance geometry knowledge from depth images, leading to better appearance and geometry representation learning. Comprehensive experiments show that our method achieves state-of-the-art performance on three benchmarks. Code is available at https://github.com/phaoran9999/SO3-Pose.
Article
Feature descriptors, as abstractions of the critical information in lidar point clouds, are often used for global pose initialization over large areas to provide a pose reference for intelligent driving systems. The current state-of-the-art Scan Context descriptor is generated based on points' maximum height and is, therefore, designed especially for outdoor scenarios without a ceiling. This study proposes a generic descriptor for both outdoor and indoor scenarios based on the point cloud's Gridded Gaussian Distribution (GGD). Wasserstein distance is introduced to this field to evaluate the proposed GGD descriptors' matching performance because it not only has a solid mathematical foundation for comparing two Gaussian distributions but also shows excellent time efficiency via a straightforward analytical solution without enumeration or iteration. We construct a multi-step error function to initialize the vehicle pose using conventional cosine similarity and Wasserstein distance. Two experiments are designed for this study. The first compares pose initialization performance under various multi-frame superimposition distances in space to find an efficient GGD descriptor extraction setting. The second verifies that the proposed method achieves a better pose initialization success rate than mainstream methods.
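The "straightforward analytical solution" referred to above is the closed-form 2-Wasserstein distance between two Gaussians; a sketch with SciPy:

    import numpy as np
    from scipy.linalg import sqrtm

    def w2_squared(mu1, S1, mu2, S2):
        # ||mu1 - mu2||^2 + tr(S1 + S2 - 2 (S2^1/2 S1 S2^1/2)^1/2)
        r = sqrtm(S2)
        cross = np.real(sqrtm(r @ S1 @ r))   # real part guards against numerical noise
        return np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)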
Article
Full-text available
In the case of simultaneous localization and mapping, route planning and navigation are based on data captured by multiple sensors, including built-in cameras. Nowadays, mobile devices frequently have more than one camera with overlapping fields of view, leading to solutions where depth information can be gathered along with ordinary RGB color data. Using these RGB-D sensors, two- and three-dimensional point clouds can be recorded from the mobile devices, providing additional information for localization and mapping. The method of matching point clouds during the movement of the device is essential: reducing noise while keeping an acceptable processing time is crucial for a real-life application. In this paper, we present a novel ISVD-based method for displacement estimation, using key points detected by SURF and ORB feature detectors. The ISVD algorithm is a fitting procedure based on SVD decomposition that removes outliers from the point clouds to be fitted in several steps: each iteration examines the relative error of the point pairs and then progressively reduces the maximum error for the next matching step. An advantage over related methods is that this method always gives the same result, as no random steps are included.
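The ISVD loop described in this abstract, i.e., an SVD-based rigid fit that progressively trims the worst point pairs, can be sketched as follows (the shrink factor and round count are illustrative; the paper's exact error schedule may differ):

    import numpy as np

    def isvd_fit(src, dst, shrink=0.8, rounds=10):
        keep = np.ones(len(src), dtype=bool)
        for _ in range(rounds):
            s, d = src[keep], dst[keep]
            cs, cd = s.mean(0), d.mean(0)
            U, _, Vt = np.linalg.svd((s - cs).T @ (d - cd))
            if np.linalg.det(Vt.T @ U.T) < 0:
                Vt[-1] *= -1                 # keep a proper rotation
            R = Vt.T @ U.T
            t = cd - R @ cs
            err = np.linalg.norm(src @ R.T + t - dst, axis=1)
            keep = err < shrink * err[keep].max()   # progressively tighten the bound
        return R, t, keep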
Article
Full-text available
The classic pipeline of 3D point cloud registration involves two steps: point feature matching and globally consistent refinement. We focus on the first step, which can be further divided into three parts: keypoint detection, feature descriptor extraction, and pairwise-point correspondence estimation. In practical applications, point feature matching is ambiguous and challenging due to the low overlap of multiple scans, inconsistency of point density, and unstructured properties. In this paper, we construct a Kd-Octree hybrid index structure to organize the point cloud and generate patch-based feature descriptors at its leaf nodes, and further propose a simple yet effective convolutional neural network, termed KdO-Net, with Kd-Octree based descriptors as input for 3D pairwise point cloud matching. In particular, we present a novel nearest neighbor searching strategy to address the computation problem. Thereafter, our method is evaluated on the indoor BundleFusion benchmark, generalized to the challenging outdoor ETH dataset, and extended to our own complicated, low-overlap TUM-lab dataset. The empirical results demonstrate that our method achieves superior precision and comparable feature matching recall with respect to prior state-of-the-art deep learning-based methods, even though the overlap is less than 30 percent. Finally, we conduct quantitative and qualitative ablation experiments and visualization interpretations to further illustrate the behavior and insights of our network.
Article
Full-text available
Due to their advantages, robotic systems are increasingly applied to small batch production as well as to complex manufacturing processes. However, programming and configuring robots is time- and resource-consuming and entails high costs that are especially challenging for small- and medium-sized enterprises. The current way of programming industrial robots by using teach-in control devices and/or vendor-specific programming languages is in general a complex activity that requires extensive knowledge in the robotics domain. It is therefore important to offer new practical methods for programming industrial robots that provide flexibility and versatility in order to achieve feasible robotics solutions for small lot size production. This paper focuses on the development of a knowledge-driven framework that should overcome the limitations of state-of-the-art robotics solutions and enhance the agility and autonomy of industrial robotics systems using ontologies as a knowledge source. The framework includes reasoning and perception abilities as well as the ability to generate plans, select appropriate actions, and finally execute these actions. In this context, a challenge is the fusion of vision system information with the decision-making component, which can use this information for generating the assembly tasks and executable programs. The introduced product model, in the form of an ontology, enables the framework to semantically link perception data to product models and consequently derive handling operations and required tools. Besides, the framework enables easier adaptation of robot-based production systems for individualized production, which requires swift configuration and efficient planning. The presented approach is demonstrated in a laboratory environment with an industrial pilot test case. Our application shows the potential to reduce the effort needed to program robots in an automated production environment. In this context, the benefits as well as the shortcomings of the approach are also discussed in the paper.
Article
Full-text available
Addressing intra-class variation among highly similar shapes is a challenging task in shape representation because such shapes share many local and global characteristics. This paper therefore proposes a new set of hand-crafted features for shape recognition that exploits the spectral properties of an adaptively connected graph formed from the shape's characteristics. To achieve this, the paper proposes a new method for constructing an adaptively connected graph on the nodes of the shape outline. The graph is analysed in terms of its spectral bases, from which hand-crafted adaptive graph spectral features (AGSF) are extracted to represent both global and local characteristics of the shape. Experimental evaluation using five 2D shape datasets and four challenging 3D shape datasets shows improvements over existing hand-crafted feature methods of up to 9.14% for 2D shapes and up to 14.02% for 3D shapes. On the 2D datasets, the proposed AGSF also outperformed deep learning methods by 17.3%.
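As a rough illustration of graph spectral features on an outline, the sketch below builds a k-nearest-neighbor graph over outline points and uses the low end of the normalized Laplacian spectrum as a toy signature. The k-NN connectivity and Gaussian edge weights are assumptions standing in for the paper's adaptive connectivity; this is not the AGSF construction itself.

```python
# Minimal sketch (illustrative, not AGSF): low-frequency Laplacian spectrum
# of a k-NN graph built on 2D shape-outline points as a toy spectral feature.
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse.csgraph import laplacian

def spectral_signature(points, k=5, n_feats=8):
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)   # first neighbour is the point itself
    n = len(points)
    w = np.zeros((n, n))
    for i in range(n):
        for d, j in zip(dists[i, 1:], idx[i, 1:]):
            w[i, j] = w[j, i] = np.exp(-d)     # symmetric Gaussian-weighted edges
    lap = laplacian(w, normed=True)
    return np.linalg.eigvalsh(lap)[:n_feats]   # smallest eigenvalues as signature
```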
Article
Full-text available
In the Digital Cultural Heritage (DCH) domain, the semantic segmentation of 3D point clouds with Deep Learning (DL) techniques can help to recognize historical architectural elements at an adequate level of detail, and thus speed up the modeling of historical buildings for developing BIM models from survey data, referred to as HBIM (Historical Building Information Modeling). In this paper, we propose a DL framework for point cloud segmentation that employs an improved DGCNN (Dynamic Graph Convolutional Neural Network) by adding meaningful features such as normals and colour. The approach has been applied to a newly collected, publicly available DCH dataset: the ArCH (Architectural Cultural Heritage) Dataset. This dataset comprises 11 labeled point clouds, derived from the union of several single scans or from the integration of the latter with photogrammetric surveys. The scenes are both indoor and outdoor, with churches, chapels, cloisters, porticoes and loggias covered by a variety of vaults and borne by many different types of columns. They belong to different historical periods and styles, in order to make the dataset as little uniform and homogeneous as possible (in the repetition of architectural elements) and the results as general as possible. The experiments yield high accuracy, demonstrating the effectiveness and suitability of the proposed approach.
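The input-feature augmentation described above can be sketched as follows: estimate normals, then stack position, normal and colour into a per-point feature matrix for the network. The file name, search radius and the assumption that the cloud stores per-point colour are all illustrative; this is not the authors' pipeline.

```python
# Minimal sketch (illustrative): stacking position, estimated normals and
# colour into a per-point feature matrix, in the spirit of the improved
# DGCNN input described above. Assumes "scan.ply" stores per-point colour.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")        # hypothetical file
pcd.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
features = np.hstack([
    np.asarray(pcd.points),                      # x, y, z
    np.asarray(pcd.normals),                     # nx, ny, nz
    np.asarray(pcd.colors),                      # r, g, b
])                                               # (N, 9) per-point network input
```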
Preprint
Full-text available
Despite the recent success of state-of-the-art 3D object recognition approaches, service robots frequently fail to recognize many objects in real human-centric environments. For these robots, object recognition is a challenging task due to the high demand for accurate and real-time responses under changing and unpredictable environmental conditions. Most recent approaches use only shape information and ignore the role of color information, or vice versa. Furthermore, they mainly rely on the L_n Minkowski family of distance functions to measure the similarity of two object views, while a variety of other distance measures are applicable to the task. In this paper, we explore the importance of shape information, color constancy, color spaces, and various similarity measures in open-ended 3D object recognition. Towards this goal, we extensively evaluate the performance of object recognition approaches in three different configurations, namely color-only, shape-only, and combined color and shape, in both offline and online settings. Experimental results concerning scalability, memory usage, and object recognition performance show that all combinations of color and shape yield significant improvements over the shape-only and color-only approaches. The underlying reason is that color information is an important cue for distinguishing objects that have very similar geometric properties but different colors, and vice versa. Moreover, by combining color and shape information, we demonstrate that the robot can learn new object categories from very few training examples in a real-world setting.
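A minimal sketch of one way to combine colour and shape similarity is shown below: a coarse RGB histogram distance blended with a descriptor distance by a weight alpha. The histogram resolution, the L2 distances and the weight are assumptions for illustration, not the configurations evaluated in the paper.

```python
# Minimal sketch (illustrative): blending a coarse RGB-histogram distance
# with a shape-descriptor distance. Bins, metrics and alpha are assumptions.
import numpy as np

def colour_histogram(rgb, bins=8):
    """rgb: (N, 3) values in [0, 1]."""
    h, _ = np.histogramdd(rgb, bins=(bins,) * 3, range=[(0, 1)] * 3)
    return h.ravel() / max(h.sum(), 1)

def combined_distance(shape_a, shape_b, rgb_a, rgb_b, alpha=0.5):
    d_shape = np.linalg.norm(shape_a - shape_b)
    d_colour = np.linalg.norm(colour_histogram(rgb_a) - colour_histogram(rgb_b))
    return alpha * d_shape + (1 - alpha) * d_colour
```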
Article
Full-text available
In the recent decade, the development of 3D scanners has brought an expansion of 3D models, which increases the demand for effective 3D point cloud retrieval methods that use only unorganized point clouds instead of mesh data. In this paper, we propose a meshing-free framework for point cloud retrieval that exploits a bidirectional similarity measurement on local features. Specifically, we first introduce an effective pipeline for keypoint selection by applying principal component analysis for pose normalization and thresholding the local similarity of normals. Then, a point cloud based feature descriptor is employed to compute local feature descriptors directly from point clouds. Finally, we propose a bidirectional feature match strategy to measure similarity. Experimental evaluation on a publicly available benchmark demonstrates the effectiveness of our framework and shows that it outperforms alternatives built on state-of-the-art techniques.
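The bidirectional idea can be sketched as a symmetric set-to-set distance: the mean best-match distance from A to B averaged with that from B to A. This is a simple stand-in under an assumed L2 matching, not the paper's exact measure.

```python
# Minimal sketch (illustrative): a symmetric set-to-set distance between two
# local-feature sets: the mean best-match distance in both directions.
import numpy as np
from scipy.spatial import cKDTree

def bidirectional_distance(feats_a, feats_b):
    d_ab, _ = cKDTree(feats_b).query(feats_a)    # best match in B for each A
    d_ba, _ = cKDTree(feats_a).query(feats_b)    # best match in A for each B
    return 0.5 * (d_ab.mean() + d_ba.mean())
```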
Article
Full-text available
3D point clouds are important for reconstructing the environment. However, compared with artificial VR scene representation methods, 3D point clouds are more difficult to put into correspondence with real scenes. In this paper, a method for detecting keypoints and describing scale-invariant point features of 3D point clouds is proposed. For detection, we select as keypoints the salient points that change fastest along all principal directions of the search area of the point cloud. The search area is a key scale that represents the unique scale of the point cloud. The descriptor is then encoded from the shape of the border or silhouette of the object to be detected or recognized. We also introduce a vote-casting-based 3D multi-scale object detection method. Experimental results on synthetic data, real data and the vote-casting scheme show that the method handles these different tasks without additional information.
Conference Paper
Full-text available
This paper presents a global point cloud descriptor for efficient object recognition and pose estimation. The proposed method estimates a reference frame for the whole point cloud representing an object instance, which is used to align it with the canonical coordinate system. A descriptor is then computed for the aligned point cloud based on how its 3D points are spatially distributed, and is further extended with the color distribution throughout the aligned cloud. The global alignment transforms of matched point clouds are used to compute the object pose. The proposed approach was evaluated on a publicly available dataset, showing that it outperforms major state-of-the-art global descriptors in recognition rate and computational performance, and that it allows precise pose estimation.
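A minimal sketch of the alignment step, assuming a plain PCA (SVD) frame: center the cloud and rotate it so its axes are ordered by variance. The sign/orientation handling that a full reference-frame method requires is deliberately omitted here.

```python
# Minimal sketch (illustrative): PCA/SVD alignment of an object cloud with a
# canonical frame; axis-sign disambiguation is deliberately omitted.
import numpy as np

def align_to_canonical_frame(points):
    centred = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt.T                        # axes ordered by variance
```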
Article
Full-text available
In this paper, we present a novel RGB-D feature, RISAS, which is robust to Rotation, Illumination and Scale variations through fusing Appearance and Shape information. We propose a keypoint detector that extracts information-rich regions in both appearance and shape using a novel 3D information representation method in combination with grayscale information. We extend our recent work on the Local Ordinal Intensity and Normal Descriptor (LOIND) to further improve its illumination, scale and rotation invariance using 1) a precise neighbourhood region selection method and 2) a more robust dominant orientation estimation. We also present a dataset for the evaluation of RGB-D features, together with comprehensive experiments that illustrate the effectiveness of the proposed feature compared with SIFT, C-SHOT and LOIND. Finally, we show the use of RISAS for point cloud alignment, a task relevant to many robotics applications, and demonstrate its effectiveness in a poorly illuminated environment compared with SIFT and ORB.
Conference Paper
Full-text available
In this paper, we present a novel 3D descriptor that bridges the gap between global and local approaches. While local descriptors have proved to be the more attractive choice for object recognition within cluttered scenes, they remain less discriminating precisely because of the limited scope of the local neighborhood. Global descriptors, on the other hand, can better capture relationships between distant points, but are generally affected by occlusions and clutter. We therefore propose the Local-to-Global Signature (LGS) descriptor, which relies on surface point classification together with signature-based features to overcome the drawbacks of both local and global approaches. As our tests demonstrate, LGS captures the structure of objects more robustly while remaining resilient to clutter and occlusion and avoiding sensitive, low-level features such as point normals. Tests performed on four different datasets demonstrate the robustness of the proposed LGS descriptor compared with three state-of-the-art descriptors: SHOT, Spin Images and FPFH. In general, LGS outperformed all three, on some datasets with a 50–70% increase in recall.
Article
Full-text available
We provide new insights into the problem of shape feature description and matching, techniques that are often applied within 3D object recognition pipelines. We subject several state-of-the-art features to systematic evaluations based on multiple datasets from different sources in a uniform manner. We have carefully prepared and performed a neutral test on the datasets for which the descriptors have shown good recognition performance. Our results expose an important fallacy in previous results, namely that the performance of the recognition system does not correlate well with the performance of the descriptor it employs. In addition, we evaluate several aspects of the matching task, including the efficiency of the different features and the potential of dimension reduction. To arrive at better generalization properties, we introduce a method for fusing several feature matches with limited processing overhead. Our fused feature matches provide a significant increase in matching accuracy, consistent over all tested datasets. Finally, we benchmark all features in a 3D object recognition setting, providing further evidence of the advantage of fused features, both in terms of accuracy and efficiency.
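One simple way to realize such fusion with little overhead is to scale-normalize each descriptor's distance matrix and sum them before taking the best match, as sketched below. The median normalization is an assumption for illustration, not the fusion rule proposed in the paper.

```python
# Minimal sketch (illustrative): fusing several descriptors' match scores by
# summing scale-normalised distance matrices before taking the best match.
import numpy as np

def fused_best_match(dist_matrices):
    """dist_matrices: list of (n_query, n_model) arrays, one per descriptor."""
    fused = sum(d / np.median(d) for d in dist_matrices)
    return fused.argmin(axis=1)                  # fused best model per query
```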
Preprint
Full-text available
In this paper, we introduce a dictionary learning framework that uses RGB-D covariance descriptors on point cloud data for object classification. Dictionary learning in combination with RGB-D covariance descriptors provides a compact and flexible description of point cloud data. Furthermore, the proposed framework is well suited for updating and sharing dictionaries among robots in a decentralized or cloud network. This work demonstrates the increased performance of 3D object classification using covariance descriptors and dictionary learning over previous results, with experiments performed on a publicly available RGB-D database.
Preprint
Full-text available
In this paper, we introduce a new covariance-based feature descriptor for "colored" point clouds gathered by a mobile robot equipped with an RGB-D camera. Although many recent descriptors provide adequate results, there is not yet a clear consensus on how best to tackle "colored" point clouds. We present the notion of a covariance on RGB-D data. Covariances have proven successful not only in image processing but in other domains as well. Their main advantage is that they provide a compact and flexible description of point clouds. Our work is a first step towards demonstrating the usability of covariances in conjunction with RGB-D data. Experiments performed on an RGB-D database and compared with previous results show the increased performance of our method.
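The core idea is compact enough to sketch directly: give each point a feature vector mixing geometry and colour, and describe the region by the covariance of those vectors. The choice of raw (x, y, z, r, g, b) features is an illustrative assumption; richer per-point features (normals, gradients) work the same way.

```python
# Minimal sketch (illustrative): a covariance descriptor for a coloured
# point cloud region; the raw (x, y, z, r, g, b) feature choice is an
# assumption, richer per-point features are equally possible.
import numpy as np

def covariance_descriptor(xyz, rgb):
    feats = np.hstack([xyz, rgb])                # (N, 6) per-point features
    return np.cov(feats, rowvar=False)           # compact 6x6 region descriptor
```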
Article
Full-text available
3D object recognition from point clouds is a fast-growing field of research. Based on the types of features used to represent an object, 3D object recognition approaches can be classified into two broad categories: local and global feature-based techniques. Local feature-based techniques are more robust to the clutter and partial occlusions frequently present in real-world scenes, whereas global feature-based techniques are suitable for model retrieval and 3D shape classification, especially for shapes with weak geometric structure. Most systems for 3D object recognition use either local or global feature-based techniques, because of the difficulty of appropriately integrating a set of local features with a single global feature vector. In this paper, a 3D object recognition system based on both local and global features of objects, built with the Point Cloud Library (PCL), is proposed. The proposed system uses a hybrid technique based on the Viewpoint Feature Histogram (VFH) and the Fast Point Feature Histogram (FPFH). VFH is used as a global descriptor to recognize the object, whereas FPFH is used as a local descriptor to estimate the position of the object in the real-world scene. The performance of the proposed system is evaluated by the accuracy of the recognition process. The experimental results reveal that the system performs well on the tested objects compared with some state-of-the-art techniques.
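As a small illustration of computing such local histogram features in Python, the sketch below uses Open3D's FPFH implementation (the paper itself uses PCL; Open3D is chosen here only for its compact Python API). The file name and radii are illustrative.

```python
# Minimal sketch (illustrative): FPFH local features via Open3D's Python API
# (the paper uses PCL). File name and radii are placeholder values.
import open3d as o3d

pcd = o3d.io.read_point_cloud("object.pcd")      # hypothetical file
pcd.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=100))
print(fpfh.data.shape)                           # (33, N): one 33-D FPFH per point
```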
Conference Paper
Full-text available
Object recognition is an essential component of Autonomous Land Vehicle (ALV) navigation in urban environments. This paper presents a thorough evaluation of the performance of several state-of-the-art global descriptors on the public Sydney Urban Objects Dataset, which was collected in the Central Business District of Sydney: the Bounding Box descriptor, the Histogram of Local Point Level descriptor, the Hierarchy descriptor, and Spin Image (SI). We also propose a novel Global Fourier Histogram (GFH) descriptor. Experimental results on the public dataset show that the GFH descriptor is one of the best global descriptors for object recognition in urban environments, and results on data collected by our own ALV in urban environments also demonstrate its usefulness.
Conference Paper
Full-text available
Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images, which change significantly with viewpoint. In contrast, we directly process the point clouds and propose a new technique for action recognition that is more robust to noise, action speed and viewpoint variations. Our technique consists of a novel descriptor and a keypoint detection algorithm. The proposed descriptor is extracted at a point by encoding the Histogram of Oriented Principal Components (HOPC) within an adaptive spatio-temporal support volume around that point. Based on this descriptor, we present a novel method to detect Spatio-Temporal Key-Points (STKPs) in 3D point cloud sequences. Experimental results show that the proposed descriptor and STKP detector outperform state-of-the-art algorithms on three benchmark human activity datasets. We also introduce a new multiview public dataset and show the robustness of our proposed method to viewpoint variations.
Conference Paper
Full-text available
This paper describes a study and analysis of surface-normal-based descriptors for 3D object recognition. Specifically, we evaluate the behaviour of descriptors in the recognition process using virtual models of objects created from CAD software. Later, we test them in real scenes using synthetic objects created with a 3D printer from the virtual models. In both cases, the same virtual models are used in the matching process to find similarity; the difference between the two experiments lies in the type of views used in the tests. Our analysis evaluates three aspects: the effectiveness of 3D descriptors depending on the camera viewpoint, the geometric complexity of the model, and the runtime of the recognition process together with the success rate in recognizing a view of an object among the models stored in the database.
Article
Full-text available
Recent hardware technologies have enabled the acquisition of 3D point clouds from real-world scenes in real time, and a variety of interactive applications with the 3D world can be developed on top of this new technological scenario. However, a main problem remains: most processing techniques for such 3D point clouds are computationally intensive, requiring optimized approaches, especially when real-time performance is required. As a possible solution, we propose the use of a 3D moving fovea based on a multiresolution technique that processes parts of the acquired scene at multiple levels of resolution. This approach can be used to identify objects in point clouds efficiently. Experiments show that the moving fovea yields a sevenfold gain in processing time while keeping a 91.6% true recognition rate in comparison with state-of-the-art 3D object recognition methods.
Conference Paper
Full-text available
Point clouds are one of the primitive representations of 3D data today. Although much work has been done on 2D image matching, matching 3D points acquired from different perspectives or at different times remains a challenging problem. This paper proposes a 3D local descriptor based on 3D self-similarities. We not only extend the concept of 2D self-similarity [1] to 3D space, but also establish a similarity measurement based on the combination of geometric and photometric information. The matching process is fully automatic, i.e., it needs no manually selected landmarks. Results on LiDAR and model datasets show that our method performs robustly on 3D data under various transformations and noise.
Conference Paper
Full-text available
Object detection is a fundamental task in computer vision. As 3D scanning techniques become popular, directly detecting objects in the 3D point cloud of a scene becomes an immediate need. We propose an object detection framework combining learning-based classification, a local descriptor, a new variant of RANSAC imposing a rigid-body constraint, and an iterative process for multi-object detection in continuous point clouds. The framework not only takes global and local information into account, but also benefits from both learning and empirical methods. Experiments performed on a challenging ground LiDAR dataset show the effectiveness of our method.
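A minimal sketch of RANSAC with a rigid-body constraint, assuming putative 3D correspondences are given: minimal 3-point samples are fitted with the SVD (Kabsch) rigid solution and scored by inlier count. The iteration count and inlier threshold are illustrative assumptions, not the paper's variant.

```python
# Minimal sketch (illustrative): RANSAC over putative 3D correspondences with
# a rigid-body constraint enforced by the SVD (Kabsch) solution.
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rotation r and translation t with dst ~ src @ r.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - cs).T @ (dst - cd))
    r = vt.T @ u.T
    if np.linalg.det(r) < 0:                     # reject reflections
        vt[-1] *= -1
        r = vt.T @ u.T
    return r, cd - r @ cs

def ransac_rigid(src, dst, iters=500, thresh=0.02, seed=0):
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        r, t = rigid_fit(src[idx], dst[idx])
        inliers = np.linalg.norm(src @ r.T + t - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return rigid_fit(src[best], dst[best]), best
```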
Article
Full-text available
We present our findings regarding a novel method for interest point detection and feature descriptor calculation in 3D range data, called NARF (Normal Aligned Radial Feature). The method makes explicit use of object boundary information and extracts features in areas where the surface is stable but has substantial change in the vicinity.
Article
Full-text available
Recent advances in computer vision on the one hand, and in imaging technologies on the other, have opened up a number of interesting possibilities for robust 3D scene labeling. This paper presents contributions in several directions to improve the state of the art in RGB-D scene labeling. First, we present a novel combination of depth and color features to recognize different object categories in isolation. Then, we use a context model that exploits detection results of other objects in the scene to jointly optimize the labels of co-occurring objects. Finally, we investigate the use of social media mining to develop the context model, and provide an investigation of its convergence. We perform thorough experimentation on both the publicly available RGB-D dataset from the University of Washington and the NYU scene dataset. An analysis of the results shows interesting insights about contextual object category recognition and its benefits.
Conference Paper
Full-text available
We propose a new object descriptor for three-dimensional data named the Global Structure Histogram (GSH). The GSH encodes the structure of a local feature response on a coarse global scale, providing a beneficial trade-off between generalization and discrimination. Encoding the structural characteristics of an object allows us to retain low local variations while keeping the benefit of global representativeness. In an extensive experimental evaluation, we applied the framework to category-based object classification in realistic scenarios. We show results obtained by combining the GSH with several different local shape representations, and demonstrate significant improvements over other state-of-the-art global descriptors.
Conference Paper
Full-text available
In this paper, we present a variant of SURE, an interest point detector and descriptor for 3D point clouds and depth images, and use it for recognizing semantically distinct places in indoor environments. The SURE interest operator selects distinctive points on surfaces by measuring the variation in surface orientation, based on surface normals in the local vicinity of a point. Furthermore, SURE includes a view-pose-invariant descriptor that captures local surface properties and incorporates colored texture information. In experiments, we compare our approach to a state-of-the-art feature detector for depth images (NARF). Finally, we evaluate the use of SURE features for recognizing places and demonstrate its advantages.
Article
Full-text available
The selection of suitable features and their parameters for the classification of three-dimensional laser range data is a crucial issue for high-quality results. In this paper we compare the performance of different histogram descriptors and their parameters on three urban datasets recorded with various sensors: sweeping SICK lasers, tilting SICK lasers and a Velodyne 3D laser range scanner. These descriptors are 1D, 2D, and 3D histograms capturing the distribution of normals or points around a query point. We also propose a novel histogram descriptor that relies on spectral values at different scales. We argue that choosing a larger support radius and a z-axis-based global reference frame/axis can significantly boost the performance of all investigated classification models. The 3D histograms relying on the point distribution, normal orientations, or spectral values turned out to be the best choice for classification in urban environments.
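The eigenvalue-based ("spectral") values such descriptors bin can be sketched as follows: for each query point, take the covariance of its support-radius neighborhood and derive linearity, planarity and scattering from the sorted eigenvalues. The radius and the exact feature set are illustrative assumptions.

```python
# Minimal sketch (illustrative): eigenvalue-based "spectral" features
# (linearity, planarity, scattering) from the local covariance at one scale.
import numpy as np
from scipy.spatial import cKDTree

def spectral_features(points, radius=0.5):
    tree = cKDTree(points)
    out = np.zeros((len(points), 3))
    for i, nbr in enumerate(tree.query_ball_point(points, radius)):
        if len(nbr) < 3:
            continue                             # not enough support points
        lam = np.linalg.eigvalsh(np.cov(points[nbr], rowvar=False))[::-1]
        l1, l2, l3 = lam + 1e-12                 # l1 >= l2 >= l3
        out[i] = [(l1 - l2) / l1, (l2 - l3) / l1, l3 / l1]
    return out
```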
Article
Full-text available
One of the most important tasks for mobile robots is to sense their environment; further tasks might include recognizing objects in the surroundings. Three-dimensional range finders have become the sensors of choice for mapping the environment of a robot, yet recognizing objects in the point clouds provided by such sensors is a difficult task. The main contribution of this paper is the introduction of a new covariance-based point cloud descriptor for such object recognition. Covariance-based descriptors have been very successful in image processing. One of their main advantages is their relatively small size, and comparisons between different covariance matrices can be made very efficiently. Experiments with real-world and synthetic data show the superior performance of covariance descriptors on point clouds compared with state-of-the-art methods.
Article
Full-text available
Recognizing 3D objects in the presence of noise, varying mesh resolution, occlusion and clutter is a very challenging task. This paper presents a novel method named Rotational Projection Statistics (RoPS). It has three major modules: local reference frame (LRF) definition, RoPS feature description and 3D object recognition. We propose a novel technique to define the LRF by calculating the scatter matrix of all points lying on the local surface. RoPS feature descriptors are obtained by rotationally projecting the neighboring points of a feature point onto 2D planes and calculating a set of statistics (including low-order central moments and entropy) of the distribution of these projected points. Using the proposed LRF and RoPS descriptor, we present a hierarchical 3D object recognition algorithm. The performance of the proposed LRF, RoPS descriptor and object recognition algorithm was rigorously tested on a number of popular and publicly available datasets, where our techniques exhibited superior performance compared to existing ones. We also show that our method is robust to noise and varying mesh resolution. The RoPS-based algorithm achieved recognition rates of 100%, 98.9%, 95.4% and 96.0%, respectively, on the Bologna, UWA, Queen's and Ca' Foscari Venezia datasets.
Conference Paper
Full-text available
In this paper we present a method for building grasping models from a single 3D snapshot of a scene composed of objects of daily use in human living environments. We employ fast shape estimation, probabilistic model fitting and verification methods capable of dealing with different kinds of symmetries, and combine these with a triangular mesh of the parts that have no other representation, in order to model previously unseen objects of arbitrary shape. Our approach is enhanced by geometric clues about the different parts of objects, which serve as prior information for selecting the appropriate reconstruction method. While we designed our system for grasping based on single-view 3D data, its generality also allows the combination of multiple views. We present two application scenarios that require complete geometric models: grasp planning and locating objects in camera images.
Conference Paper
Full-text available
This paper investigates the design of a system for recognizing objects in 3D point clouds of urban environments. The system is decomposed into four steps: locating, segmenting, characterizing, and classifying clusters of 3D points. Specifically, we first cluster nearby points to form a set of potential object locations (with hierarchical clustering). Then, we segment points near those locations into foreground and background sets (with a graph-cut algorithm). Next, we build a feature vector for each point cluster (based on both its shape and its context). Finally, we label the feature vectors using a classifier trained on a set of manually labeled objects. The paper presents several alternative methods for each step. We quantitatively evaluate the system and the trade-offs between the alternatives on a ground-truthed part of a scan of Ottawa that contains approximately 100 million points and 1000 objects of interest. We then use this ground truth as a training set to recognize objects amidst the approximately 1 billion points of the remainder of the Ottawa scan.
Conference Paper
Full-text available
This paper proposes a novel 3D scene interpretation approach for robots in mobile manipulation scenarios using a set of 3D point features (Fast Point Feature Histograms) and probabilistic graphical methods (Conditional Random Fields). Our system uses real-time stereo with textured light to obtain dense depth maps in the working space of the robot's manipulators. For the purposes of manipulation, we want to interpret the planar supporting surfaces of the scene and recognize and segment object classes into their primitive parts in 6 degrees of freedom (6DOF), so that the robot knows what it is attempting to use and where it may be handled. The scene interpretation algorithm uses a two-layer classification scheme: (i) we estimate Fast Point Feature Histograms (FPFH) as local 3D point features to segment the objects of interest into geometric primitives; and (ii) we learn and categorize object classes using a novel Global Fast Point Feature Histogram (GFPFH) scheme that uses the previously estimated primitives at each point. To show the validity of our approach, we analyze the proposed system on the problem of recognizing the object class of 20 objects in 500 table-setting scenarios. Our algorithm identifies the planar surfaces, decomposes the scene and objects into geometric primitives with 98.27% accuracy, and uses the geometric primitives to identify each object's class with an accuracy of 96.69%.
Conference Paper
Full-text available
This paper presents a new approach for recognizing 3D objects that are represented as 3D point clouds. We introduce a new 3D shape descriptor called the Intrinsic Shape Signature (ISS) to characterize a local/semi-local region of a point cloud. An intrinsic shape signature uses a view-independent representation of the 3D shape to match shape patches from different views directly, and a view-dependent transform encoding the viewing geometry to facilitate fast pose estimation. In addition, we present a highly efficient indexing scheme for the high-dimensional ISS shape descriptors, allowing fast and accurate search of large model databases. We evaluate the performance of the proposed algorithm on the very challenging task of recognizing different vehicle types using a database of 72 models in the presence of sensor noise, obscuration and scene clutter.
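Open3D ships an implementation of the ISS detector, so detecting ISS keypoints can be sketched in a few lines; the file name and radii below are illustrative and should be tuned to the cloud resolution.

```python
# Minimal sketch (illustrative): ISS keypoint detection with Open3D's
# built-in implementation; radii are placeholders to tune per dataset.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.pcd")       # hypothetical file
keypoints = o3d.geometry.keypoint.compute_iss_keypoints(
    pcd, salient_radius=0.05, non_max_radius=0.05)
print(len(keypoints.points), "ISS keypoints")
```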
Conference Paper
Full-text available
This paper addresses the problem of recognizing free-form 3D objects in point clouds. Compared with traditional approaches based on point descriptors, which depend on local information around points, we propose a novel method that creates a global model description based on oriented point pair features and matches that model locally using a fast voting scheme. The global model description consists of all model point pair features and represents a mapping from the point pair feature space to the model, where similar features on the model are grouped together. Such a representation allows the use of much sparser object and scene point clouds, resulting in very fast performance. Recognition is done locally using an efficient voting scheme on a reduced two-dimensional search space. We demonstrate the efficiency of our approach and show its high recognition performance in the presence of noise, clutter and partial occlusions. Compared with state-of-the-art approaches, we achieve better recognition rates and demonstrate that, with slight or even no sacrifice in recognition performance, our method is much faster than the current state of the art.
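The oriented point pair feature at the heart of such methods is a four-vector of the pair distance and three angles, which can be written down directly; the helper names below are illustrative.

```python
# The classic four-component oriented point pair feature: pair distance plus
# three angles between the normals and the connecting vector.
import numpy as np

def _angle(u, v):
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    d = p2 - p1
    return np.array([np.linalg.norm(d),
                     _angle(n1, d), _angle(n2, d), _angle(n1, n2)])
```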
Conference Paper
Full-text available
This paper deals with local 3D descriptors for surface matching. First, we categorize existing methods into two classes: Signatures and Histograms. Then, by discussion and experiments alike, we point out the key issues of uniqueness and repeatability of the local reference frame. Based on these observations, we formulate a novel comprehensive proposal for surface representation, which encompasses a new unique and repeatable local reference frame as well as a new 3D descriptor. The latter lies at the intersection between Signatures and Histograms, so as to achieve a better balance between descriptiveness and robustness. Experiments on publicly available datasets as well as on range scans obtained with Spacetime Stereo provide a thorough validation of our proposal.
Conference Paper
Full-text available
We propose an approach for detecting objects in large-scale range datasets that combines bottom-up and top-down processes. In the bottom-up stage, fast-to-compute local descriptors are used to detect potential target objects. The object hypotheses are verified after alignment in a top-down stage using global descriptors that capture larger-scale structure information. We have found that the combination of spin images and Extended Gaussian Images, as local and global descriptors respectively, provides a good trade-off between efficiency and accuracy. We present results on real outdoor scenes containing millions of scanned points and hundreds of targets. Our results compare favorably to the state of the art by being applicable to much larger scenes captured under less controlled conditions, by detecting object classes rather than specific instances, and by accurately aligning the query with the best matching model, thus obtaining precise segmentation.
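For reference, the spin image accumulation at an oriented point can be sketched as follows: each neighbor maps to (alpha, beta) cylindrical coordinates about the normal axis and is binned into a 2D histogram. The bin count and support radius are illustrative assumptions.

```python
# Minimal sketch (illustrative): spin image accumulation at an oriented
# point (p, n); bins and support radius are placeholder values.
import numpy as np

def spin_image(p, n, neighbours, bins=16, radius=0.2):
    n = n / np.linalg.norm(n)
    rel = neighbours - p
    beta = rel @ n                               # height along the normal
    alpha = np.sqrt(np.maximum((rel ** 2).sum(axis=1) - beta ** 2, 0.0))
    img, _, _ = np.histogram2d(alpha, beta, bins=bins,
                               range=[[0, radius], [-radius, radius]])
    return img
```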
Article
In recent years, the 3D point cloud has gained increasing attention as a new representation for objects. However, raw point clouds are often noisy and contain outliers, so it is crucial to remove the noise and outliers while preserving the features, in particular the fine details. This paper presents a comprehensive analysis of the state-of-the-art methods for filtering point clouds. The existing methods are categorized into seven classes based on their common and salient traits. An experimental evaluation is also performed to demonstrate the robustness, effectiveness and computational efficiency of several methods used widely in practice.
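As a quick example of one widely used filtering family surveyed here, statistical outlier removal, the sketch below uses Open3D's built-in filter; the file name and parameters are illustrative.

```python
# Minimal sketch (illustrative): statistical outlier removal with Open3D;
# file name and parameters are placeholders.
import open3d as o3d

pcd = o3d.io.read_point_cloud("noisy.pcd")       # hypothetical file
filtered, kept_idx = pcd.remove_statistical_outlier(
    nb_neighbors=20, std_ratio=2.0)              # drop points far from neighbours
```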
Conference Paper
Depth scans acquired from different views may contain nuisances such as noise, occlusion, and varying point density. We propose a novel Signature of Geometric Centroids descriptor that supports direct shape matching on the scans, without requiring any preprocessing such as scan denoising or conversion into a mesh. First, we construct the descriptor by voxelizing the local shape within a uniquely defined local reference frame and concatenating geometric centroid and point density features extracted from each voxel. Second, we compare two descriptors using only corresponding voxels that are both non-empty, thus supporting the matching of incomplete local shapes such as those close to the scan boundary. Third, we propose a descriptor saliency measure, computed from a descriptor graph, to improve shape matching performance. We demonstrate the descriptor's robustness and effectiveness for shape matching by comparing it with three state-of-the-art descriptors, and by applying it to object/scene reconstruction and 3D object recognition.
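A minimal sketch of the voxel features, assuming the local patch is already expressed in its local reference frame: each voxel stores the centroid of its points and a point count. Grid size and patch extent are illustrative, and the saliency measure and non-empty-voxel comparison are omitted.

```python
# Minimal sketch (illustrative): per-voxel centroid and point density for a
# local patch already expressed in its local reference frame.
import numpy as np

def centroid_signature(patch, extent=0.1, grid=4):
    """patch: (N, 3) points with coordinates roughly in [-extent/2, extent/2]."""
    cell = np.floor((patch / extent + 0.5) * grid).astype(int)
    valid = np.all((cell >= 0) & (cell < grid), axis=1)
    sig = np.zeros((grid ** 3, 4))               # columns: centroid xyz + count
    for c, p in zip(cell[valid], patch[valid]):
        k = (c[0] * grid + c[1]) * grid + c[2]
        sig[k, :3] += p
        sig[k, 3] += 1
    filled = sig[:, 3] > 0
    sig[filled, :3] /= sig[filled, 3:]           # sums become per-voxel centroids
    return sig
```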
Article
An algorithm for pairwise non-rigid registration of 3D point clouds is presented in the specific context of isometric deformations. The critical step is the registration, within overlapping regions, of point clouds captured at different epochs from an isometrically deforming surface. Based on characteristics invariant under isometric deformation, a variant of the four-point congruent sets algorithm is applied to generate correspondences between the two deformed point clouds, and a RANSAC framework is subsequently used to complete cluster extraction in preparation for globally optimal alignment. Examples are presented and the results compared with existing approaches to demonstrate the two main contributions of the technique: a 90% success rate in generating true correspondences and a root mean square error after final registration of 2–3 mm.
Article
Object representation is one of the most challenging tasks in robotics because it must provide reliable information in real time to enable the robot to physically interact with the objects in its environment. To ensure robustness, a global object descriptor must be computed based on a unique and repeatable object reference frame, and it should contain enough information to enable recognizing the same or similar objects seen from different perspectives. This paper presents a new object descriptor named the Global Orthographic Object Descriptor (GOOD), designed to be robust, descriptive, and efficient to compute and use. We propose a novel sign disambiguation method for computing a unique reference frame from the eigenvectors obtained through Principal Component Analysis of the point cloud of the target object view captured by a 3D sensor. Three principal orthographic projections and their distribution matrices are computed by exploiting the object reference frame. The descriptor is finally obtained by concatenating the distribution matrices in a sequence determined by the entropy and variance of the projections. Experimental results show that the overall classification performance obtained with GOOD is comparable to the best performances obtained with state-of-the-art descriptors, while in memory usage and computation time GOOD clearly outperforms them, making it especially suited for real-time applications. The estimated object pose is precise enough for real-time object manipulation tasks.
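The projection idea can be sketched, under simplifying assumptions, as PCA alignment followed by 2D histograms of the three principal-plane projections; GOOD's sign disambiguation and entropy/variance-based ordering are omitted here, and the bin count is illustrative.

```python
# Minimal sketch (illustrative): PCA alignment plus 2D histograms of the
# three principal-plane projections; GOOD's sign disambiguation and
# entropy/variance ordering are omitted.
import numpy as np

def orthographic_histograms(points, bins=15):
    centred = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    aligned = centred @ vt.T                     # object reference frame
    half = np.abs(aligned).max()
    hists = [np.histogram2d(aligned[:, i], aligned[:, j], bins=bins,
                            range=[[-half, half]] * 2)[0]
             for i, j in [(0, 1), (0, 2), (1, 2)]]
    return np.concatenate([(h / h.sum()).ravel() for h in hists])
```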