Article

Classification of urban LiDAR point clouds based on point-voxel consistency constraints (基于点‑体素一致性约束的城市激光雷达点云分类)

Article
Full-text available
Point cloud semantic segmentation, which contributes to scene understanding at different scales, is crucial for three-dimensional reconstruction and digital twin cities. However, current semantic segmentation methods mostly extract multi-scale features through down-sampling operations, so the feature maps at a given scale have only a single receptive field, resulting in the misclassification of objects with spatial similarity. To effectively capture the geometric features and the semantic information of different receptive fields, a multi-scale voxel-point adaptive fusion network (MVP-Net) is proposed for point cloud semantic segmentation in urban scenes. First, a multi-scale voxel fusion module with a gating mechanism is designed to explore the semantic representation ability of different receptive fields. Then, a geometric self-attention module is constructed to deeply fuse fine-grained point features with coarse-grained voxel features. Finally, a pyramid decoder is introduced to aggregate context information at different scales to enhance feature representation. The proposed MVP-Net was evaluated on three datasets, Toronto3D, WHU-MLS, and SensatUrban, and achieved superior performance compared with state-of-the-art (SOTA) methods. On the public Toronto3D and SensatUrban datasets, MVP-Net achieved mIoU scores of 84.14% and 59.40%, and overall accuracies of 98.12% and 93.30%, respectively.
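A minimal sketch of the gated multi-scale fusion idea (hypothetical module names and shapes, not the authors' MVP-Net code): features from several voxel receptive fields are weighted by learned per-channel sigmoid gates and summed.

```python
# Minimal sketch of gated multi-scale feature fusion (hypothetical,
# not the authors' MVP-Net code). Each scale's voxel features are
# weighted by a learned sigmoid gate and summed.
import torch
import torch.nn as nn

class GatedMultiScaleFusion(nn.Module):
    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        # One gate generator per scale: features -> per-channel weight in (0, 1)
        self.gates = nn.ModuleList(
            nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
            for _ in range(num_scales)
        )

    def forward(self, feats_per_scale):
        # feats_per_scale: list of (N, C) tensors, one per receptive field
        fused = 0
        for gate, feats in zip(self.gates, feats_per_scale):
            fused = fused + gate(feats) * feats
        return fused

# Example: fuse features from three voxel scales for 1024 voxels
fusion = GatedMultiScaleFusion(channels=64, num_scales=3)
out = fusion([torch.randn(1024, 64) for _ in range(3)])
print(out.shape)  # torch.Size([1024, 64])
```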
Article
Full-text available
In this paper, a Backward Attentive Fusing Network with Local Aggregation Classifier (BAF-LAC) is proposed to improve the performance of 3D point cloud semantic segmentation. It consists of a Backward Attentive Fusing Encoder-Decoder (BAF-ED) to learn semantic features and a Local Aggregation Classifier (LAC) to maintain the context-awareness of points. BAF-ED narrows the semantic gap between the encoder and the decoder by fusing multi-layer encoder features with the decoder features. High-level encoder features are transformed into an attention map to modulate low-level encoder features backward. LAC adaptively enhances the intermediate features in point-wise MLPs by aggregating the features of neighboring points into the center point. It takes the place of commonly used post-processing techniques and builds context consistency into the classifier. Equipped with these modules, BAF-LAC can extract discriminative semantic features and predict smoother results. Extensive experiments on Semantic3D, SemanticKITTI, and S3DIS demonstrate that the proposed method achieves competitive results against state-of-the-art methods.
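The backward attentive fusing step can be sketched as follows, assuming the high-level features have already been upsampled to the same point count; this is an illustrative reading, not the BAF-LAC release.

```python
# Minimal sketch of backward attentive fusing (hypothetical, not the
# BAF-LAC release): a high-level encoder feature map is turned into an
# attention map that modulates the lower-level encoder features before
# they are fused with the decoder stream.
import torch
import torch.nn as nn

class BackwardAttentiveFusion(nn.Module):
    def __init__(self, high_ch: int, low_ch: int):
        super().__init__()
        # 1x1 conv maps high-level features to per-point attention in (0, 1)
        self.to_attn = nn.Sequential(nn.Conv1d(high_ch, low_ch, 1), nn.Sigmoid())

    def forward(self, high_feats, low_feats):
        # high_feats: (B, high_ch, N), low_feats: (B, low_ch, N)
        # (assumes high-level features were already upsampled to N points)
        attn = self.to_attn(high_feats)
        return low_feats * attn  # modulated low-level features

B, N = 2, 4096
fuse = BackwardAttentiveFusion(high_ch=128, low_ch=64)
out = fuse(torch.randn(B, 128, N), torch.randn(B, 64, N))
print(out.shape)  # torch.Size([2, 64, 4096])
```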
Conference Paper
Full-text available
Semantic segmentation of large-scale outdoor point clouds is essential for urban scene understanding in various applications, especially autonomous driving and urban high-definition (HD) mapping. With the rapid development of mobile laser scanning (MLS) systems, massive point clouds are available for scene understanding, but publicly accessible large-scale labeled datasets, which are essential for developing learning-based methods, are still limited. This paper introduces Toronto-3D, a large-scale urban outdoor point cloud dataset acquired by an MLS system in Toronto, Canada for semantic segmentation. This dataset covers approximately 1 km of point clouds and consists of about 78.3 million points with 8 labeled object classes. Baseline experiments for semantic segmentation were conducted and the results confirmed the capability of this dataset to train deep learning models effectively. Toronto-3D is released to encourage new research, and the labels will be improved and updated with feedback from the research community.
Article
Full-text available
In this article we describe a new convolutional neural network (CNN) to classify 3D point clouds of urban or indoor scenes. Solutions are given to the problems encountered when working on scene point clouds, and a network is described that allows for point classification using only the position of points in a multi-scale neighborhood. On the reduced-8 Semantic3D benchmark [Hackel et al., 2017], this network, ranked second overall, beats the state of the art among point classification methods (those not using a regularization step).
Article
Full-text available
Point clouds provide a flexible and scalable geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. Hence, the design of intelligent computational models that act directly on point clouds is critical, especially when efficiency considerations or noise preclude the possibility of expensive denoising and meshing procedures. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insights from CNNs to the point cloud world. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds including classification and segmentation. EdgeConv is differentiable and can be plugged into existing architectures. Compared to existing modules operating largely in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked or recurrently applied to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. Beyond proposing this module, we provide extensive evaluation and analysis revealing that EdgeConv captures and exploits fine-grained geometric properties of point clouds. The proposed approach achieves state-of-the-art performance on standard benchmarks including ModelNet40 and S3DIS.
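EdgeConv is compact enough to sketch in plain PyTorch; the official DGCNN implementation differs in details, but the core is the edge feature h([x_i, x_j - x_i]) max-pooled over a k-NN graph.

```python
# Minimal EdgeConv sketch (the official DGCNN code differs in details):
# edge features h([x_i, x_j - x_i]) over a k-NN graph, max-pooled
# over neighbors.
import torch
import torch.nn as nn

def knn(x, k):
    # x: (B, N, C). Returns indices of the k nearest neighbors per point.
    dist = torch.cdist(x, x)  # (B, N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]  # drop self

class EdgeConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_ch, out_ch), nn.ReLU())

    def forward(self, x):
        # x: (B, N, C) point features (or xyz coordinates at the first layer)
        idx = knn(x, self.k)                                  # (B, N, k)
        neighbors = torch.gather(
            x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))  # (B, N, k, C)
        center = x.unsqueeze(2).expand_as(neighbors)
        edge = torch.cat([center, neighbors - center], dim=-1)  # (B, N, k, 2C)
        return self.mlp(edge).max(dim=2).values               # (B, N, out_ch)

out = EdgeConv(3, 64)(torch.randn(2, 1024, 3))
print(out.shape)  # torch.Size([2, 1024, 64])
```

Because the graph is rebuilt on the current features at every layer, affinity in feature space (rather than only xyz space) drives later-layer neighborhoods, which is what lets EdgeConv capture long-range semantic similarity.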
Article
Full-text available
Few prior works study deep learning on point sets. PointNet by Qi et al. is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. With further observation that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. Experiments show that our network called PointNet++ is able to learn deep point set features efficiently and robustly. In particular, results significantly better than state-of-the-art have been obtained on challenging benchmarks of 3D point clouds.
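One set-abstraction level of PointNet++ can be sketched as farthest point sampling, neighborhood grouping, and a shared mini-PointNet (MLP + max pool); this simplified single-scale version omits the paper's multi-scale and multi-resolution grouping.

```python
# Minimal sketch of one PointNet++ set-abstraction level (simplified
# from the paper): farthest point sampling picks centroids, a k-NN query
# groups neighbors, and a shared MLP + max pool summarizes each group.
import torch
import torch.nn as nn

def farthest_point_sampling(xyz, m):
    # xyz: (N, 3). Greedily picks m points maximizing mutual distance.
    n = xyz.size(0)
    idx = torch.zeros(m, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    for i in range(1, m):
        dist = torch.minimum(dist, (xyz - xyz[idx[i - 1]]).pow(2).sum(-1))
        idx[i] = dist.argmax()
    return idx

class SetAbstraction(nn.Module):
    def __init__(self, in_ch, out_ch, m=256, k=32):
        super().__init__()
        self.m, self.k = m, k
        self.mlp = nn.Sequential(nn.Linear(in_ch + 3, out_ch), nn.ReLU())

    def forward(self, xyz, feats):
        # xyz: (N, 3), feats: (N, C)
        centroids = farthest_point_sampling(xyz, self.m)            # (m,)
        nn_idx = torch.cdist(xyz[centroids], xyz).topk(
            self.k, largest=False).indices                          # (m, k)
        local_xyz = xyz[nn_idx] - xyz[centroids].unsqueeze(1)       # relative coords
        grouped = torch.cat([local_xyz, feats[nn_idx]], dim=-1)     # (m, k, 3+C)
        return xyz[centroids], self.mlp(grouped).max(dim=1).values  # (m, 3), (m, out)

xyz, feats = torch.randn(1024, 3), torch.randn(1024, 16)
new_xyz, new_feats = SetAbstraction(16, 64)(xyz, feats)
print(new_xyz.shape, new_feats.shape)  # torch.Size([256, 3]) torch.Size([256, 64])
```

Stacking such levels shrinks the point set while enlarging the contextual scale, which is the hierarchy the abstract describes.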
Article
Full-text available
This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a work horse, which already show remarkable performance improvements over the state of the art. CNNs have become the de-facto standard for many tasks in computer vision and machine learning, like semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to the lack of training data. With the massive data set presented in this paper, we aim at closing this data gap to help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides more dense and complete point clouds with a much higher overall number of labelled points compared to those already available to the research community. We further provide baseline method descriptions and a comparison between methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.
Article
Given the prominence of 3D sensors in recent years, 3D point cloud scene data are worth investigating further. Point cloud scene understanding is a challenging task because such data are large-scale and discrete. In this study, we propose a network called LEARD-Net, which focuses on semantic segmentation of large-scale point cloud scenes with color information. The proposed network contains three main components: (1) To fully utilize the color information of point clouds rather than treating it merely as an initial input feature, we propose a robust local feature extraction module (LFE) that helps the network focus on spatial geometric structure, color information, and semantic features. (2) We propose a local feature aggregation module (LFA) that helps the network focus on locally significant features while still attending to the entire local neighborhood. (3) To allow the network to focus on both local and comprehensive features, we use residual and dense connections (ResiDense) to connect different-level LFE and LFA modules. Comparing our network with state-of-the-art networks on several large-scale benchmark datasets, including S3DIS, Toronto3D and Semantic3D, we demonstrate the effectiveness of LEARD-Net.
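A heavily simplified sketch of treating color as a learned local feature rather than a fixed input (hypothetical function, not the LEARD-Net LFE module): for each neighbor we stack relative position, neighbor color, and color difference, then apply a shared MLP.

```python
# Minimal sketch of color-aware local feature extraction (hypothetical;
# the LEARD-Net LFE/LFA modules are more elaborate). Geometry and color
# contrast are encoded per neighbor before pooling.
import torch
import torch.nn as nn

def local_color_geometry_features(xyz, rgb, nn_idx):
    # xyz: (N, 3), rgb: (N, 3), nn_idx: (N, k) neighbor indices
    rel_pos = xyz[nn_idx] - xyz.unsqueeze(1)   # (N, k, 3) local geometry
    rel_rgb = rgb[nn_idx] - rgb.unsqueeze(1)   # (N, k, 3) color contrast
    return torch.cat([rel_pos, rgb[nn_idx], rel_rgb], dim=-1)  # (N, k, 9)

N, k = 2048, 16
xyz, rgb = torch.randn(N, 3), torch.rand(N, 3)
nn_idx = torch.cdist(xyz, xyz).topk(k, largest=False).indices
mlp = nn.Sequential(nn.Linear(9, 32), nn.ReLU())
feats = mlp(local_color_geometry_features(xyz, rgb, nn_idx)).max(dim=1).values
print(feats.shape)  # torch.Size([2048, 32])
```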
Article
Although novel point cloud semantic segmentation schemes that continuously surpass state-of-the-art results exist, the success of learning an effective model typically relies on the availability of abundant labeled data. However, data annotation is a time-consuming and labor-intensive task, particularly for large-scale airborne laser scanning (ALS) point clouds involving multiple classes in urban areas. Therefore, simultaneously obtaining promising results while significantly reducing labeling is crucial. In this study, we propose a deep-learning-based weakly supervised framework for the semantic segmentation of ALS point clouds, which exploits implicit information from unlabeled data under incomplete and sparse labels. Entropy regularization is introduced to penalize class overlap in the predictive probability. Additionally, a consistency constraint is designed to improve the robustness of the predictions by minimizing the difference between the current and ensemble predictions. Finally, we propose an online soft pseudo-labeling strategy to create additional supervisory sources in an efficient and nonparametric manner. Extensive experimental analysis using three benchmark datasets demonstrates that our proposed method significantly boosts the classification performance without compromising the computational efficiency, considering the sparse point annotations. It outperforms current weakly supervised methods and achieves results comparable to those of fully supervised competitors. On the ISPRS Vaihingen 3D data, using only 1‰ of the labels, our method achieved an overall accuracy of 83.0% and an average F1 score of 70.0%, increases of 6.9% and 12.8%, respectively, over the model trained using only the sparse label information.
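The three loss terms named in the abstract can be combined as sketched below; the weights and the ensemble construction (for example an exponential moving average of past predictions) are assumptions, not the authors' exact formulation.

```python
# Minimal sketch of the weak-supervision objective (hedged reading, not
# the authors' code): cross-entropy on the few labeled points, entropy
# regularization on all predictions, and a consistency term against an
# ensemble (e.g., EMA) prediction.
import torch
import torch.nn.functional as F

def weak_supervision_loss(logits, ensemble_probs, labels, labeled_mask,
                          w_ent=0.1, w_cons=1.0):
    # logits: (N, C) current predictions; ensemble_probs: (N, C) running
    # ensemble; labels: (N,); labeled_mask: (N,) bool, True for the ~1 per
    # mille of points that carry annotations.
    probs = logits.softmax(dim=-1)
    ce = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    consistency = F.mse_loss(probs, ensemble_probs)
    return ce + w_ent * entropy + w_cons * consistency

N, C = 10000, 9
logits = torch.randn(N, C, requires_grad=True)
mask = torch.zeros(N, dtype=torch.bool)
mask[::1000] = True  # sparse labels: 1 in 1000 points annotated
loss = weak_supervision_loss(logits, torch.full((N, C), 1 / C),
                             torch.randint(0, C, (N,)), mask)
loss.backward()
```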
Article
This work presents FG-Net, a general deep learning framework for large-scale point cloud understanding without voxelization, which achieves accurate and real-time performance with a single NVIDIA GTX 1080 8G GPU and an i7 CPU. First, a novel noise and outlier filtering method is designed to facilitate subsequent high-level understanding tasks. For effective understanding, we propose a novel plug-and-play module consisting of correlated feature mining and deformable convolution-based geometric-aware modeling, in which local feature relationships and point cloud geometric structures can be fully extracted and exploited. To address efficiency, we put forward a composite operation combining inverse density sampling (IDS) and learned sampling, together with a feature pyramid-based residual learning strategy, to save computational cost and memory consumption, respectively. Compared with current methods, which are often validated only on limited datasets, we have done extensive experiments on eight real-world challenging benchmarks, demonstrating that our approaches outperform state-of-the-art (SOTA) approaches in terms of accuracy, speed, and memory efficiency. Moreover, weakly supervised transfer learning is also conducted to demonstrate the generalization capacity of our method.
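Inverse density sampling admits a short sketch under one common reading: keep points with probability proportional to the mean distance to their k nearest neighbors, so sparse structures survive downsampling. FG-Net's composite IDS plus learned sampling is more involved.

```python
# Minimal sketch of inverse density sampling (one reading of the IDS
# idea, not FG-Net's implementation): points in dense regions are kept
# with lower probability, so sparse structures survive downsampling.
import torch

def inverse_density_sample(xyz, m, k=16):
    # xyz: (N, 3); returns indices of m points sampled with probability
    # proportional to the mean distance to their k nearest neighbors
    # (large mean distance = low local density = more likely kept).
    knn_dist = torch.cdist(xyz, xyz).topk(k + 1, largest=False).values[:, 1:]
    inv_density = knn_dist.mean(dim=1)          # (N,)
    probs = inv_density / inv_density.sum()
    return torch.multinomial(probs, m, replacement=False)

xyz = torch.randn(5000, 3)
idx = inverse_density_sample(xyz, m=1024)
print(idx.shape)  # torch.Size([1024])
```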
Article
3D vision has numerous applications in various areas, such as autonomous vehicles, robotics, digital city, virtual/mixed reality, human-machine interaction, entertainment, and sports. It covers a broad variety of research topics, ranging from 3D data acquisition, 3D modeling, shape analysis, rendering, to interaction. With the rapid development of 3D acquisition sensors (such as low-cost LiDARs, depth cameras, and 3D scanners), 3D data have become even more accessible and available. Moreover, the advances in deep learning techniques further boost the development of 3D vision, with a large number of algorithms being proposed recently. We provide a comprehensive review of the progress of 3D vision algorithms over the past few years, mostly the last year. This survey covers seven topics: stereo matching, monocular depth estimation, visual localization in large-scale scenes, simultaneous localization and mapping (SLAM), 3D geometric modeling, dynamic human modeling, and point cloud understanding. Although several surveys are already available in the area of 3D vision, this survey differs in a few aspects. First, this study covers a wide range of topics in 3D vision and can therefore benefit a broad research community, whereas most existing works mainly focus on a specific topic, such as depth estimation or point cloud learning. Second, this study mainly focuses on progress in very recent years and can therefore provide readers with up-to-date information. Third, this paper presents a direct comparison between progress in China and abroad. The recent progress in depth image acquisition, including stereo matching and monocular depth estimation, is reviewed first. The stereo matching algorithms are divided into non-end-to-end, end-to-end, and unsupervised stereo matching algorithms. The monocular depth estimation algorithms are categorized into depth regression networks and depth completion networks; the depth regression networks are further divided into encoder-decoder networks and composite networks. Then, the recent progress in visual localization, including visual localization in large-scale scenes and SLAM, is reviewed. The visual localization algorithms for large-scale scenes are divided into end-to-end and non-end-to-end algorithms, and the non-end-to-end algorithms are further categorized into deep learning-based feature description algorithms, 2D image retrieval-based visual localization algorithms, 2D-3D matching-based visual localization algorithms, and visual localization algorithms based on the fusion of 2D image retrieval and 2D-3D matching. SLAM algorithms are divided into visual SLAM algorithms and multisensor-fusion-based SLAM algorithms. The recent progress in 3D modeling and understanding, including 3D geometric modeling, dynamic human modeling, and point cloud understanding, is then reviewed. 3D geometric modeling algorithms consist of several components, including deep 3D representation learning, deep 3D generative models, structured representation learning and generative models, and deep learning-based 3D modeling. Dynamic human modeling algorithms are divided into multiview RGB modeling algorithms, single-depth-camera-based and multiple-depth-camera-based algorithms, and single-view RGB modeling methods. Point cloud understanding algorithms are further categorized into semantic segmentation methods and instance segmentation methods for point clouds. The paper is organized as follows.
In Section 1, we present the progress in 3D vision outside China. In Section 2, we introduce the progress of 3D vision in China. In Section 3, the 3D vision techniques developed in China and abroad are compared and analyzed. In Section 4, we point out several future research directions in the area. © 2021, Editorial Office of Journal of Image and Graphics. All rights reserved.
Article
In recent years, the point cloud has become an important type of 3D spatial data. How to use artificial intelligence to improve point cloud understanding for correct semantic labeling and accurate object detection is an urgent and difficult problem. This paper hence proposes an end-to-end 3D point cloud deep learning network, which ensures efficient point cloud sampling, accurate feature extraction, and well-optimized overall network performance through an up-down sampling strategy for irregularly distributed point clouds, multi-layer aggregation and propagation of features, and a loss function for imbalanced samples. Studies on large-scale 3D point cloud benchmark data show that it achieves excellent performance in semantic labeling of large-scale outdoor point cloud scenes, better than state-of-the-art point cloud deep learning networks, providing strong support for the high-performance extraction of 3D geospatial information.
Article
3D neural networks are widely used in real-world applications (e.g., AR/VR headsets, self-driving cars). They are required to be fast and accurate; however, limited hardware resources on edge devices make these requirements rather challenging. Previous work processes 3D data using either voxel-based or point-based neural networks, but neither type of 3D model is hardware-efficient due to the large memory footprint and random memory access. In this paper, we study 3D deep learning from the efficiency perspective. We first systematically analyze the bottlenecks of previous 3D methods. We then combine the best of point-based and voxel-based models and propose a novel hardware-efficient 3D primitive, Point-Voxel Convolution (PVConv). We further enhance this primitive with sparse convolution to make it more effective for processing large (outdoor) scenes. Based on our designed 3D primitive, we introduce 3D Neural Architecture Search (3D-NAS) to explore the best 3D network architecture given a resource constraint. We evaluate our proposed method on six representative benchmark datasets, achieving state-of-the-art performance with a 1.8-23.7× measured speedup. Furthermore, our method has been deployed on the autonomous racing vehicle of MIT Driverless, achieving a larger detection range, higher accuracy, and lower latency.
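The point-voxel split can be sketched as below: a coarse voxel branch aggregates neighborhoods with a 3D convolution and is trilinearly devoxelized back onto the points, where it is added to a fine-grained point-wise MLP branch. Resolution and layer sizes here are illustrative, not the paper's.

```python
# Minimal sketch of a Point-Voxel Convolution block (simplified from the
# PVConv idea): voxelize -> 3D conv -> devoxelize -> add point branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPVConv(nn.Module):
    def __init__(self, in_ch, out_ch, resolution=16):
        super().__init__()
        self.r = resolution
        self.voxel_conv = nn.Conv3d(in_ch, out_ch, 3, padding=1)
        self.point_mlp = nn.Linear(in_ch, out_ch)

    def forward(self, xyz, feats):
        # xyz: (N, 3) normalized to [0, 1]; feats: (N, C)
        r, (n, c) = self.r, feats.shape
        idx = (xyz.clamp(0, 1 - 1e-6) * r).long()           # voxel coords
        flat = (idx[:, 0] * r + idx[:, 1]) * r + idx[:, 2]  # (N,)
        # Average-pool point features into voxels (scatter mean)
        grid = feats.new_zeros(r ** 3, c).index_add_(0, flat, feats)
        cnt = feats.new_zeros(r ** 3).index_add_(0, flat, feats.new_ones(n))
        grid = (grid / cnt.clamp_min(1)[:, None]).T.reshape(1, c, r, r, r)
        vox = self.voxel_conv(grid)                         # (1, out, r, r, r)
        # Trilinear devoxelization: grid_sample expects (x->W, y->H, z->D),
        # so coordinates are flipped relative to our (D=x, H=y, W=z) layout.
        coords = (xyz.flip(-1) * 2 - 1).view(1, n, 1, 1, 3)
        dev = F.grid_sample(vox, coords, align_corners=False)
        dev = dev.view(-1, n).T                             # (N, out_ch)
        return dev + self.point_mlp(feats)                  # fuse both branches

xyz, feats = torch.rand(2048, 3), torch.randn(2048, 8)
out = TinyPVConv(8, 32)(xyz, feats)
print(out.shape)  # torch.Size([2048, 32])
```

In the paper's large-scene variant, the dense 3D convolution is replaced by sparse convolution so that empty voxels cost nothing.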
Article
In this letter, we introduce MappingConvSeg, a continuous convolution network for semantic segmentation of large-scale point clouds. In particular, a conceptually simple, end-to-end learnable, and continuous convolution operator is proposed for learning spatial correlation of unstructured 3-D point clouds. For each local point set, the unstructured point features are first mapped onto a series of learned kernel points based on the spatial relationship, and the continuous convolution is then applied to capture specific local geometrical patterns. Taking the proposed mapping convolution operation as the building block, a hierarchical network is then built for large-scale point cloud semantic segmentation. Experimental results conducted on two public benchmarks, including Toronto-3D and Stanford large-scale 3-D Indoor Spaces (S3DIS) dataset, demonstrate the superiority of the proposed method.
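One reading of the mapping convolution can be sketched as a soft assignment of neighbor features onto learned kernel points, followed by per-kernel weight matrices; the actual MappingConvSeg operator may differ in its mapping function and normalization.

```python
# Minimal sketch of mapping unstructured neighbor features onto learned
# kernel points (illustrative, not the MappingConvSeg release): each
# neighbor's feature is distributed to kernel points by a softmax over
# negative offset-to-kernel distances, then a per-kernel linear map acts
# like a conventional convolution kernel.
import torch
import torch.nn as nn

class KernelPointMapping(nn.Module):
    def __init__(self, in_ch, out_ch, num_kernels=8):
        super().__init__()
        self.kernel_pts = nn.Parameter(torch.randn(num_kernels, 3) * 0.1)
        self.weights = nn.Parameter(torch.randn(num_kernels, in_ch, out_ch) * 0.05)

    def forward(self, offsets, feats):
        # offsets: (N, k, 3) neighbor positions relative to each center
        # feats:   (N, k, C) neighbor features
        d = torch.cdist(offsets, self.kernel_pts.expand(offsets.size(0), -1, -1))
        assign = torch.softmax(-d, dim=-1)                    # (N, k, K) soft mapping
        mapped = torch.einsum("nkm,nkc->nmc", assign, feats)  # features at kernels
        return torch.einsum("nmc,mco->no", mapped, self.weights)  # (N, out_ch)

N, k = 1024, 16
conv = KernelPointMapping(in_ch=32, out_ch=64)
out = conv(torch.randn(N, k, 3), torch.randn(N, k, 32))
print(out.shape)  # torch.Size([1024, 64])
```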
Article
In large-scale road environments, point-based methods require dynamic computation, and voxel-based methods often lose substantial information when balancing resolution against performance. To overcome the drawbacks of these two classical approaches, this paper proposes a general network architecture that combines bi-level convolution with dynamic graph edge convolution optimization for multi-object recognition in large-scale road scenes. The framework integrates convolution operations in two different domains, points and supervoxels, to avoid redundant computation and storage of spatial information in the network. Coupled with dynamic graph edge convolution optimization, our model can process large-scale point clouds end-to-end in a single pass. Our method was tested and evaluated on different datasets. The experimental results show that our method achieves higher accuracy in complex road scenes, outperforming existing advanced methods.
Article
In this work, we describe a new, general, and efficient method for unstructured point cloud labeling. As the question of efficiently using deep Convolutional Neural Networks (CNNs) on 3D data is still a pending issue, we propose a framework which applies CNNs on multiple 2D image views (or snapshots) of the point cloud. The approach consists of three core ideas. (i) We pick many suitable snapshots of the point cloud. We generate two types of images: a Red-Green-Blue (RGB) view and a depth composite view containing geometric features. (ii) We then perform a pixel-wise labeling of each pair of 2D snapshots using fully convolutional networks. Different architectures are tested to achieve a profitable fusion of our heterogeneous inputs. (iii) Finally, we perform fast back-projection of the label predictions into 3D space using efficient buffering to label every 3D point. Experiments show that our method is suitable for various types of point clouds such as Lidar or photogrammetric data.
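The back-projection step (iii) can be sketched with an orthographic view and a z-buffer: each pixel's predicted label is written back to the closest point that projects into it. Camera model, resolution, and tie handling here are simplifications of the paper's buffering scheme.

```python
# Minimal sketch of label back-projection with a z-buffer (hedged; the
# paper renders RGB and depth-composite views and fuses FCN outputs).
import torch

def backproject_labels(xyz, pixel_labels, res=64):
    # xyz: (N, 3) normalized to [0, 1]^3, viewed along +z
    # pixel_labels: (res, res) long tensor from a 2D segmentation network
    u = (xyz[:, 0] * (res - 1)).long()
    v = (xyz[:, 1] * (res - 1)).long()
    pix = v * res + u                                # flat pixel index per point
    # z-buffer: minimum depth per pixel decides visibility
    zbuf = torch.full((res * res,), float("inf")).scatter_reduce(
        0, pix, xyz[:, 2], reduce="amin")
    visible = xyz[:, 2] <= zbuf[pix] + 1e-9
    labels = torch.full((xyz.size(0),), -1, dtype=torch.long)
    labels[visible] = pixel_labels.view(-1)[pix[visible]]
    return labels  # (N,), -1 where the point was occluded in this view

xyz = torch.rand(5000, 3)
pixel_labels = torch.randint(0, 8, (64, 64))
labels = backproject_labels(xyz, pixel_labels)
print((labels >= 0).float().mean())  # fraction of points labeled by this view
```

Across many snapshots, the per-view labels would then be accumulated (for example by majority vote) so that every 3D point receives a label.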
Jing Z W, Guan H Y, Zang Y F, et al. Survey of point cloud semantic segmentation based on deep learning[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(1): 1-26.
Yang B S, Chen C, Dong Z. 3D geospatial information extraction of urban objects for smart surveying and mapping[J].
Qu Y L, Wang Y, Zhang Q, et al. Point cloud analysis method based on spatial feature attention mechanism[J].
Liu Y Q, Ao J F, Pan Z T. DGPoint: a dynamic graph convolution network for 3D point cloud semantic segmentation[J].
Fang L N, et al. A joint network of point cloud and multiple views for roadside objects recognition from mobile laser point clouds[J].
Lin Y J, et al. Point-voxel CNN for efficient 3D deep learning.
Ye M S, Wan R, Xu S J, et al. DRINet++: efficient voxel-as-point point cloud segmentation[EB/OL]. (2021-11-16) [2023-10-
Chen K, Lei X D, et al. A multispectral LiDAR point cloud classification method based on enhanced features kernel point convolutional network[J].
Dong Z, et al. A deep learning network for semantic labeling of large-scale point clouds in urban scenes[J].
Zhou H, Zhu X G, Song X, et al. Cylinder3D: an effective 3D framework for driving-scene LiDAR semantic segmentation[EB/OL]. (2020-08-04) [2023-10-26]. https://arxiv.org/abs/2008.01550.