Figure (from Remote Sensing)
The three-dimensional (3D) convolutional network (ConvNet) $F_s$ with $L = 6$ layers. The input of the network is a patch with $q_0$ feature channels. A sequence of convolution kernels is applied for multi-layer feature learning (without padding, for computational efficiency), and the size of the output at the $l$-th layer is $n_l = n_{l-1} - f_l + 1$, $l = 1, 2, \ldots, L$, with $n_0 = n$. The final output is a feature vector ($n_6 = 1$); to this end, the kernel size of the last layer equals the output size of the previous layer ($f_6 = n_5$).
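To make the recursion concrete, here is a small sketch that steps the caption's formula through six layers; the patch size and kernel sizes are hypothetical, only the recursion itself comes from the figure:

```python
# A worked check of the size recursion n_l = n_{l-1} - f_l + 1 from the caption.
# The kernel sizes below are hypothetical; only the recursion is from the figure.
def layer_sizes(n0, kernel_sizes):
    sizes = [n0]
    for f in kernel_sizes:
        sizes.append(sizes[-1] - f + 1)  # unpadded ("valid") convolution
    return sizes

n0 = 16                   # hypothetical input patch size n
f = [3, 3, 3, 3, 3, 6]    # six layers; f_6 = n_5 so that n_6 = 1
sizes = layer_sizes(n0, f)
print(sizes)              # [16, 14, 12, 10, 8, 6, 1]: final output is a vector
assert f[-1] == sizes[-2] and sizes[-1] == 1
```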
Source publication
Point cloud classification is quite challenging due to the influence of noise, occlusion, and the variety of object types and sizes. Currently, most methods focus on manually designing and extracting features. However, such features rely on prior knowledge, and it is also difficult to accurately characterize the complex objects of poi...
Similar publications
Urban object segmentation and classification tasks are critical data processing steps in scene understanding, intelligent vehicles and 3D high-precision maps. Semantic segmentation of 3D point clouds is the foundational step in object recognition. To identify the intersecting objects and improve the accuracy of classification, this paper proposes a...
Citations
... One approach is to regularize the disordered point cloud first, for example by projecting the 3D point cloud onto a 2D plane [91][92][93][94][95]. Alternatively, the disordered point cloud is rasterized and voxelized [96][97][98][99], and then processed by convolutional neural networks (CNNs). Such algorithms are easy to understand and can effectively reuse the successful experience of 2D image semantic segmentation, transplanting image segmentation algorithms to 3D point cloud semantic segmentation. ...
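As a rough illustration of the rasterize-and-voxelize route (a generic sketch, not the specific method of any of the cited works [96][97][98][99]), points can be binned into a regular occupancy grid that a 3D CNN can consume:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Bin an (N, 3) point cloud into a dense occupancy grid (generic sketch)."""
    mins = points.min(axis=0)
    idx = np.floor((points - mins) / voxel_size).astype(int)  # per-point voxel index
    dims = idx.max(axis=0) + 1
    grid = np.zeros(dims, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0               # mark occupied voxels
    return grid

pts = np.random.rand(1000, 3) * 4.0      # synthetic cloud in a 4 m cube
occ = voxelize(pts, voxel_size=0.5)
print(occ.shape, occ.sum())              # regular grid ready for a 3D CNN
```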
With the urgent needs of industry and the continuous development of artificial intelligence, research into intelligent excavators has made definite progress. However, intelligent excavators often face strong vibrations, dense dust, and complex targets. These pose severe challenges to environmental perception and are key research difficulties that must be overcome before intelligent excavators can reach practical engineering applications. Many researchers have studied these problems, including vibration and dust-noise reduction for light detection and ranging (LiDAR) scanners, multi-sensor information fusion, and the segmentation and recognition of 3D scenes. This paper reviews the state of research on these key technologies and discusses their development trends.
... However, raw point clouds are typically irregularly sampled, unstructured, and unordered (Cui et al., 2021), so models designed for image processing, such as convolutional neural networks (CNNs), cannot be applied to point cloud data directly. Subsequently, projection-based methods (Lawin et al., 2017; Boulch et al., 2017; Tatarchenko et al., 2018; Lyu et al., 2022), which project point clouds into diverse two-dimensional (2D) representations, and voxel-based methods (Huang and You, 2016; Çiçek et al., 2016; Riegler et al., 2017; Wang et al., 2018), which partition point clouds into fixed-size 3D grids, were developed. However, these methods entail considerable computational costs and may lose geometric structure. ...
... To mitigate this, OctNet (Riegler et al., 2017) used an octree to partition the point cloud into non-uniform voxels, allowing memory allocation and computation to concentrate on the dense areas of the point cloud. MSNet (Wang et al., 2018) captured point cloud features across multiple scales and employed Conditional Random Fields (CRFs) to ensure spatial consistency. Graham et al. (2018) introduced the submanifold sparse convolutional network (SSCN), which demonstrated remarkable efficacy in handling high-dimensional, sparse data, significantly enhancing precision and processing efficiency in 3D semantic segmentation. ...
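The efficiency argument behind octrees and submanifold sparse convolution is that only occupied voxels should cost memory and compute. A toy hash-map voxel store conveys the idea (a generic sketch, not OctNet's octree or SSCN's actual data structure):

```python
import numpy as np
from collections import defaultdict

def sparse_voxelize(points, voxel_size):
    """Map occupied voxel coordinates to the points they contain (toy sketch)."""
    grid = defaultdict(list)
    for i, key in enumerate(np.floor(points / voxel_size).astype(int)):
        grid[tuple(key)].append(i)       # only active voxels get an entry
    return grid

pts = np.random.rand(1000, 3)            # real scans are far sparser than this
active = sparse_voxelize(pts, 0.1)
print(len(active), "active voxels; a dense grid would allocate every cell")
```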
Expanding the receptive field in a deep learning model for large-scale 3D point cloud segmentation is an effective technique for capturing rich contextual information, which consequently enhances the network's ability to learn meaningful features. However, this often leads to increased computational complexity and risk of overfitting, challenging the efficiency and effectiveness of the learning paradigm. To address these limitations, we propose the Local Split Attention Pooling (LSAP) mechanism to effectively expand the receptive field through a series of local split operations, thus facilitating the acquisition of broader contextual knowledge. Concurrently, it optimizes the computational workload associated with attention-pooling layers to ensure a more streamlined processing workflow. Based on LSAP, a Parallel Aggregation Enhancement (PAE) module is introduced to enable parallel processing of data using both 2D and 3D neighboring information to further enhance contextual representations within the network. In light of the aforementioned designs, we put forth a novel framework, designated as LSNet, for large-scale point cloud semantic segmentation. Extensive evaluations demonstrated the efficacy of seamlessly integrating the proposed PAE module into existing frameworks, yielding significant improvements in mean intersection over union (mIoU) metrics, with a notable increase of up to 11%. Furthermore, LSNet demonstrated superior performance compared to state-of-the-art semantic segmentation networks on three benchmark datasets, including S3DIS, Toronto3D, and SensatUrban. It is noteworthy that our method achieved a substantial speedup of approximately 38.8% compared to those employing similar-sized receptive fields, which serves to highlight both its computational efficiency and practical utility in real-world large-scale scenes.
... Meanwhile, PointNet++ builds on this basis with layer-by-layer feature extraction and aggregation, which better captures local and global geometric information. In addition, convolutional neural network (CNN)-based approaches have also achieved some success, such as using 3D convolutions to process point clouds [18][19][20]. In particular, PointNet++, an advanced deep learning architecture, copes better with segmentation tasks in complex environments by extracting local and global features from point cloud data. ...
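PointNet++'s hierarchical extraction typically begins by subsampling the cloud with farthest point sampling (FPS) and grouping neighborhoods around the sampled centroids. A minimal numpy sketch of FPS (a generic illustration, not the exact implementation of any work cited here):

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Pick k points that greedily maximize mutual distance (generic FPS sketch)."""
    chosen = [0]                                   # arbitrary seed point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())                   # farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

pts = np.random.rand(2048, 3)
centroids = pts[farthest_point_sampling(pts, 512)]  # centers for local grouping
```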
In animal husbandry applications, segmenting live pigs in complex farming environments faces many challenges, such as pigs licking railings or defecating within the acquisition environment. These behaviors make point cloud segmentation more complex, because dynamic animal behavior and environmental changes must be considered, which in turn demands stronger feature-capture capability from segmentation algorithms. To tackle the challenges of accurately segmenting point cloud data collected in complex real-world scenarios, such as pig occlusion and posture changes, this study utilizes PointNet++. The SoftPool pooling method is employed to implement a PointNet++ model that achieves accurate point cloud segmentation of live pigs in complex environments. First, the PointNet++ model is modified to make it more suitable for pigs by adjusting its parameters related to feature extraction and receptive fields. Then, the model's ability to capture point cloud feature details is further improved by using SoftPool as the point cloud feature pooling method. Finally, registration, filtering, and extraction are used to preprocess the point clouds before integrating them into a dataset for manual annotation. The improved PointNet++ model's segmentation ability was validated on the pig point cloud dataset. Experiments showed that the improved model learns better across the 529 pig point clouds in the dataset. The optimal mean Intersection over Union (mIoU) was 96.52% and the accuracy 98.33%. This study achieved the automatic segmentation of highly overlapping pig and pen point clouds, enabling future animal husbandry applications such as estimating body weight and size from 3D point clouds.
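SoftPool, as used above in place of max pooling, is commonly described as a softmax-weighted sum of activations, so every point contributes in proportion to its activation rather than only the maximum. A one-function numpy sketch of that reading (illustrative, not the authors' code):

```python
import numpy as np

def soft_pool(features, axis=0):
    """Softmax-weighted pooling: smoother than max, keeps gradient to all points."""
    w = np.exp(features - features.max(axis=axis, keepdims=True))  # stable softmax
    w /= w.sum(axis=axis, keepdims=True)
    return (w * features).sum(axis=axis)

neighborhood = np.random.randn(32, 64)   # 32 points, 64 feature channels
pooled = soft_pool(neighborhood)         # one 64-d descriptor per neighborhood
print(pooled.shape)                      # (64,)
```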
... MinkowskiNet [26] provides a 4D convolutional network for extracting spatiotemporal features from 3D point cloud videos, and can also be used in point cloud extraction tasks. However, 3D reconstruction methods suffer from spatial information loss and a large amount of "empty data" [27], which easily increases the computational burden. Moreover, the significant loss of original point cloud data limits their application scenarios. ...
The semantic segmentation of vegetation point clouds has very important application value in the geosciences. It can distinguish vegetation regions from other regions, further classify and analyze the vegetation, and help us better understand the distribution and characteristics of vegetation in order to protect and manage natural resources. The PointNet and PointNet++ models use max pooling as the aggregation function, allowing deep neural networks to classify unordered point clouds directly with high classification accuracy. However, their ability to extract spatial correlations and local features from point clouds is insufficient, which restricts the improvement of point cloud semantic segmentation accuracy and results in poor processing of vegetation point clouds. To resolve this problem, this research designs a novel hierarchical point cloud transformer (HPCT) model suitable for the semantic segmentation of multisource vegetation point clouds. Combined with deep learning techniques, different levels of features are processed hierarchically based on a hierarchical structure, and a Transformer module is incorporated in the feature extraction part, so as to obtain a larger receptive field and stronger semantic feature extraction capability. At the same time, we also propose a unified spatial-scale sampling method for heterogeneous point cloud input, which can be used not only to train and predict independent HPCT models on single-source data, but also to train and predict a unified HPCT model on multisource data. Semantic segmentation experiments are carried out on self-collected three-source data sets. The results show that, under both independent and unified training on the three-source data, the evaluation indicators (such as Recall, Pre, IoU, and OA) of the proposed HPCT model exceed those of the PointNet, PointNet++, and PCT models, and even exceed some newly emerging models, such as PointCNN and DGCNN. The unified HPCT model has better segmentation performance than the independent HPCT models, with average Recall, Pre, IoU, and OA indicators increasing by 1.07%, 1.73%, 4.33%, and 1.03%, respectively. We attribute this superior accuracy to the unified training with the three-source data. The average Recall, Pre, IoU, and OA indicators of the unified HPCT model on the entire three-source data set exceed 96%, 98%, 95%, and 98%, respectively.
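For reference, all four indicators reported above (Recall, Pre, IoU, OA) can be computed from a per-class confusion matrix; a compact sketch with made-up counts:

```python
import numpy as np

def segmentation_metrics(cm):
    """Per-class Recall/Precision/IoU and overall accuracy from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    recall = tp / cm.sum(axis=1)                # TP / (TP + FN), rows = ground truth
    precision = tp / cm.sum(axis=0)             # TP / (TP + FP), cols = prediction
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)
    oa = tp.sum() / cm.sum()                    # overall accuracy
    return recall, precision, iou, oa

cm = np.array([[50, 2, 1],                      # illustrative 3-class counts
               [3, 45, 2],
               [1, 1, 48]])
print(segmentation_metrics(cm))
```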
... Point clouds also represent the digital mapping of the real world, providing a comprehensive understanding of the state of a large and complex environment. This has led to a significant shift in focus to 3D point clouds in areas such as smart cities [1][2][3][4][5], autonomous driving [6][7][8], and land monitoring [9][10][11]. The challenge, however, is that point clouds exist in the form of discrete point collections, making their effective processing a complex task. ...
The semantic segmentation of point clouds is a crucial undertaking in 3D reconstruction. However, achieving precise semantic segmentation remains a significant hurdle. In this paper, we present BEMF-Net, an efficient method for large-scale environments. It starts with an effective feature extraction method. Unlike images, 3D data comprise not only geometric relations but also texture information. To depict the scene accurately, it is crucial to account for the impact of both texture and geometry on the task and to modify the feature description accordingly. Additionally, we present a multi-scale feature fusion technique that effectively promotes interaction between features at different resolutions. The approach mitigates the smoothing of detailed information caused by downsampling, while ensuring the integrity of features across layers, allowing a more comprehensive representation of the point cloud. We confirmed the effectiveness of this method on benchmark datasets such as S3DIS, SensatUrban, and Toronto3D.
... Tchapmi et al. [15] generated coarse voxel labels through a 3D fully convolutional neural network based on the voxelization of point clouds, and then refined the predictions by combining trilinear interpolation with fully-connected CRF learning of fine granularity. Wang et al. [16] implemented multi-scale voxelization of point clouds to extract features, adaptively learned local geometric features, and globally optimized the predicted class probabilities using a CRF that fully considers the spatial consistency of point clouds. The above multi-view- and voxel-based semantic segmentation methods solve the structural problem and are practical to some extent. ...
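Trilinear interpolation, used above to transfer coarse voxel predictions back to the raw points, blends the eight surrounding voxel values by their distance weights. A minimal sketch for one query point in a dense score grid (illustrative, not the cited implementation):

```python
import numpy as np

def trilinear(grid, p):
    """Interpolate grid values at continuous coordinate p (x, y, z in voxel units)."""
    i = np.floor(p).astype(int)                # lower-corner voxel
    f = p - i                                  # fractional offsets in [0, 1)
    out = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1 - f[0]) * \
                    (f[1] if dy else 1 - f[1]) * \
                    (f[2] if dz else 1 - f[2])  # weight of this corner
                out += w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return out

scores = np.random.rand(8, 8, 8)               # coarse per-voxel class scores
print(trilinear(scores, np.array([2.3, 4.7, 1.5])))
```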
Semantic segmentation of point clouds provided by airborne LiDAR surveys of urban scenes is a great challenge, because point clouds at the boundaries between different object types are easily mixed and geometrically similar. In addition, 3D descriptions of the same object type occur at different scales. To address these problems, a fusion attention convolutional network (SMAnet) is proposed in this study. The fusion attention module includes a self-attention module (SAM) and a multi-head attention module (MAM). The SAM captures feature information according to the correlation of adjacent points and can effectively distinguish mixed point clouds with similar geometric features. The MAM strengthens connections among point clouds according to different subspace features, which helps distinguish point clouds at different scales. In feature extraction, lightweight multi-scale feature extraction layers effectively utilize local information from different neighborhoods. Additionally, to address the feature externalization problem and expand the network's receptive field, a SoftMax-stochastic pooling (SSP) algorithm is proposed to extract global features. The ISPRS 3D Semantic Labeling Contest dataset was chosen for the point cloud segmentation experiments. The overall accuracy and average F1-score of SMAnet reach 85.7% and 75.1%, respectively, making it superior to current common algorithms. The model also achieved good results on the GML(B) dataset, demonstrating good generalization ability.
... In the field of point clouds and DL techniques, significant progress has been made since the introduction of PointNet (Qi et al., 2017a). Various approaches have been developed, such as projection-based (B. Wu et al., 2019), voxel-based (Alexandru Rosu et al., 2020; Wang et al., 2018), point-based (Qi et al., 2017b), pointwise multi-layer perceptron (MLP) (Chen et al., 2019; Jiang et al., 2018), point convolution (Boulch, 2020; Mao et al., 2019), and graph-based (L. Y. Wang et al., 2019) methods. ...
Accurate landslide segmentation is crucial for obtaining damage information in disaster mitigation and relief efforts. This study develops a deep learning network for accurate point cloud landslide segmentation. The proposed dynamic graph attention network (DGA-Net) has four steps. First, down-sampling and neighbor search generate samples that effectively represent the relevant landslide information. Second, the edge features of neighboring points are constructed based on a graph structure to extract and enhance point cloud features. Third, an attention mechanism assigns adaptive weights to edge features and aggregates them into new point features. Fourth, the graph structure, edge features, and attention weights are dynamically updated through hierarchical structures, which expands the receptive field. In the upper reaches of the Jinsha River, point clouds were prepared for landslide segmentation, and controlled experiments were designed to evaluate effectiveness. The proposed DGA-Net achieved the highest mean Intersection over Union (mIoU) of 0.743 and F1-score of 0.786, with mIoU over 6.7% higher than shallow machine learning models and 3.6% higher than other deep learning models. Besides, we analyzed the effect of the hyperparameters of the sampling strategy and the segmentation threshold in the prediction stage on model performance. The results showed that samples with suitable sampling diameters and an appropriate number of neighboring points benefit landslide segmentation, and that using optimal thresholds to segment stacked multiple prediction values can improve mIoU by 6%. Furthermore, the visualized feature maps revealed that the proposed model can index landslide points in feature space, which is beneficial for constructing graph structures and using attention to enhance features. Comparative studies across these experiments proved the superiority of the proposed method for landslide segmentation. We hope that our method and results can contribute to post-disaster relief efforts.
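Edge features of the kind built in the second step are often formed, DGCNN-style, by pairing each point's feature with the offsets to its k nearest neighbors; a small sketch of that construction (a generic reading, not necessarily DGA-Net's exact formulation):

```python
import numpy as np
from scipy.spatial import cKDTree

def edge_features(points, feats, k=16):
    """For each point i and neighbor j: concat(f_i, f_j - f_i), shape (N, k, 2C)."""
    _, nbr = cKDTree(points).query(points, k=k + 1)
    nbr = nbr[:, 1:]                             # drop the self-match in column 0
    center = np.repeat(feats[:, None, :], k, axis=1)
    return np.concatenate([center, feats[nbr] - center], axis=-1)

pts = np.random.rand(1024, 3)
e = edge_features(pts, pts, k=16)                # xyz doubles as initial features
print(e.shape)                                   # (1024, 16, 6)
```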
... Voxel-based methods: these approaches discretize the original point cloud uniformly on a regular 3D grid, generating voxel data in which each voxel contains a group of corresponding points. Multi-scale deep convolutions are then used to extract local features [64] and model relationships among voxels for classification and segmentation. Nevertheless, factors such as the choice of voxel grid size, potentially empty areas of the scene, and the varying scales of 3D shapes greatly affect the results, making this method unsuitable for large-scale point cloud processing. ...
To ensure efficient railroad operation and maintenance management, the accurate reconstruction of railroad BIM models is a crucial step. This paper proposes a workflow for automated segmentation and reconstruction of railroad structures using point cloud data, without relying on intensity or trajectory information. The workflow consists of four main components: point cloud adaptive denoising, scene segmentation, structure segmentation combined with deep learning, and model reconstruction. The proposed workflow was validated using two datasets with significant differences in railroad line point cloud data. The results demonstrated significant improvements in both efficiency and accuracy compared to existing methods. The techniques enable direct automated processing from raw data to segmentation results, providing data support for parameterized modeling and greatly reducing manual processing time. The proposed algorithms achieved an intersection over union (IoU) of over 0.9 for various structures in a 450-m-long railroad line. Furthermore, for single-track railroads, the automated segmentation time was within 1 min per kilometer, with an average mean intersection over union (MIoU) and accuracy of 0.9518 and 1.0000, respectively.
... Accordingly, early researchers projected three-dimensional (3D) point clouds onto two-dimensional (2D) planes (Su et al., 2015) or spheres (Milioto et al., 2019; Lyu et al., 2022; Aksoy et al., 2020; Wu et al., 2018), and then used 2D convolutional neural networks (CNNs) to process point clouds. Moreover, some studies converted point clouds into regular voxels (Riegler et al., 2017; Wang et al., 2018; Zhou et al., 2020) and used 3D CNNs for normalization. However, all these methods inevitably lost detailed geometric information in projection or voxelization, which affected the semantic segmentation accuracy of point clouds. ...
... However, the computational complexity grows cubically with voxel resolution (Tang et al., 2020). To alleviate this problem, OctNet used octrees to construct non-uniform voxels, reducing spatial redundancy (Riegler et al., 2017), and MSNet used coarse-grained multi-scale voxels to fuse context information (Wang et al., 2018). Simultaneously, submanifold sparse convolution directly processed the voxel activation region through hash mapping, which greatly improved both efficiency and accuracy (Graham et al., 2018). ...
Point cloud semantic segmentation, which contributes to scene understanding at different scales, is crucial for three-dimensional reconstruction and digital twin cities. However, current semantic segmentation methods mostly extract multi-scale features by down-sampling operations, but the feature maps only have a single receptive field at the same scale, resulting in the misclassification of objects with spatial similarity. To effectively capture the geometric features and the semantic information of different receptive fields, a multi-scale voxel-point adaptive fusion network (MVP-Net) is proposed for point cloud semantic segmentation in urban scenes. First, a multi-scale voxel fusion module with gating mechanism is designed to explore the semantic representation ability of different receptive fields. Then, a geometric self-attention module is constructed to deeply fuse fine-grained point features with coarse-grained voxel features. Finally, a pyramid decoder is introduced to aggregate context information at different scales for enhancing feature representation. The proposed MVP-Net was evaluated on three datasets, Toronto3D, WHU-MLS, and SensatUrban, and achieved superior performance in comparison to the state-of-the-art (SOTA) methods. For the public Toronto3D and SensatUrban datasets, our MVP-Net achieved a mIoU of 84.14% and 59.40%, and an overall accuracy of 98.12% and 93.30%, respectively.
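A gating mechanism such as the one mentioned for the multi-scale voxel fusion module typically learns a per-channel blend between two feature branches. A toy numpy sketch of the general pattern (hypothetical weights and shapes, not MVP-Net's actual module):

```python
import numpy as np

def gated_fusion(fine, coarse, W, b):
    """Blend two feature branches with a learned sigmoid gate (generic sketch)."""
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([fine, coarse], -1) @ W + b)))
    return gate * fine + (1.0 - gate) * coarse   # per-channel soft selection

rng = np.random.default_rng(0)
fine, coarse = rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
W, b = rng.normal(size=(128, 64)) * 0.1, np.zeros(64)  # stand-in learned parameters
print(gated_fusion(fine, coarse, W, b).shape)          # (128, 64)
```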
... The categorization of deep learning models for 3D point clouds is based on the form in which the point cloud is represented, giving a few families of models. (i) Projection- and voxelization-based approaches: these methods convert the point cloud into a regular form, either multiview 2D images [32][33][34] or 3D voxels [35][36][37], and then apply convolutional neural networks to the converted data. The advantage of these models is that 3D point clouds can be used with regular deep learning models, but they lose important information in the conversion, and in the 3D case the memory cost is computationally burdensome. ...
Automatic point cloud classification (PCC) is a challenging task for large-scale urban point clouds due to the heterogeneous density of points, the high number of points, and the incomplete set of objects. Although recent PCC studies rely on automatic feature extraction through deep learning (DL), there is still room for traditional machine learning (ML) models with hand-crafted features, particularly after the emergence of gradient boosting machine (GBM) methods. In this study, we use a traditional ML framework for PCC in large-scale datasets, following the steps of neighborhood definition, multi-scale feature extraction, and classification. Different from others, our framework takes advantage of fast feature calculation with multi-scale radius neighborhoods and a recent state-of-the-art GBM classifier, LightGBM. We tested our framework on three mobile urban datasets, Paris–Rue–Madame, Paris–Rue–Cassette, and Toronto3D. According to the results, our framework outperforms traditional machine learning models and competes with DL-based methods.
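The pipeline described here, multi-scale radius neighborhoods feeding hand-crafted features into LightGBM, can be sketched in a few lines; the density and height-spread features and the labels below are illustrative stand-ins, not the study's actual feature set (assumes scipy and lightgbm are installed):

```python
import numpy as np
from scipy.spatial import cKDTree
import lightgbm as lgb

def multi_scale_features(points, radii=(0.5, 1.0, 2.0)):
    """Per-point neighbor count and height spread at several radii (toy features)."""
    tree = cKDTree(points)
    cols = []
    for r in radii:
        nbrs = tree.query_ball_point(points, r)
        cols.append([len(n) for n in nbrs])                    # local density
        cols.append([np.ptp(points[n, 2]) if len(n) > 1 else 0.0 for n in nbrs])
    return np.array(cols).T                                    # (N, 2 * len(radii))

pts = np.random.rand(5000, 3) * 10
X = multi_scale_features(pts)
y = (pts[:, 2] > 5).astype(int)                                # dummy labels
clf = lgb.LGBMClassifier(n_estimators=100).fit(X, y)           # GBM classification step
```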