SEMANTIC LABELING OF ALS POINT CLOUDS FOR TREE SPECIES MAPPING
USING THE DEEP NEURAL NETWORK POINTNET++
S. Briechle1, P. Krzystek1, G. Vosselman2
1Munich University of Applied Sciences, Munich, Germany - (sebastian.briechle, peter.krzystek)@hm.edu
2Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede,
the Netherlands - george.vosselman@utwente.nl
KEY WORDS: semantic labeling, ALS point clouds, tree species mapping, deep neural network, PointNet++
ABSTRACT:
Most methods for the mapping of tree species are based on the segmentation of single trees that are subsequently classified using
a set of hand-crafted features and an appropriate classifier. The classification accuracy for coniferous and deciduous trees using airborne laser scanning (ALS) data is only around 90% if just the geometric information of the point cloud is used. As
deep neural networks (DNNs) have the ability to adaptively learn features from the underlying data, they have outperformed classic
machine learning (ML) approaches on well-known benchmark datasets provided by the robotics, computer vision and remote
sensing community. However, tree species classification using deep learning (DL) procedures has received only minor research interest so far. Some studies have been conducted, based on an extensive prior generation of images or voxels from the 3D raw data.
Since innovative DNNs directly operate on irregular and unordered 3D point clouds on a large scale, the objective of this study
is to use PointNet++, as an example of such a network, for the semantic labeling of ALS point clouds to map deciduous and coniferous trees. The
dataset for our experiments consists of ALS data from the Bavarian Forest National Park (366 trees/ha), only including spruces
(coniferous) and beeches (deciduous). First, the training data were generated automatically using a classic feature-based Random
Forest (RF) approach classifying coniferous trees (precision = 93%, recall = 80%) and deciduous trees (precision = 82%, recall =
Second, PointNet++ was trained and subsequently evaluated using 80 randomly chosen test batches of 400 m² each. The achieved
per-point classification results after 163 training epochs for coniferous trees (precision = 90%, recall = 79%) and deciduous trees
(precision = 81%, recall = 91%) are fairly high considering that only the geometry was included. Nevertheless, the classification
results using PointNet++ are slightly lower than those of the baseline method using an RF classifier. Errors in the training data and edge effects at the block boundaries prevented a better performance. Our first results demonstrate that the architecture of the 3D DNN PointNet++
can successfully be adapted to the semantic labeling of large ALS point clouds to map deciduous and coniferous trees. Future work
will focus on the integration of additional features such as the laser intensity, surface normals and multispectral features into
the DNN. Thus, a further improvement of the accuracy of the proposed approach is to be expected. Furthermore, the classification
of numerous individual tree species based on pre-segmented single trees should be investigated.
1. INTRODUCTION
1.1 2D and 3D DNNs
Recently, DNNs have gained huge interest as a segmentation
and classification method for 2D and 3D data. Examples
for well-known deep convolutional neural networks (CNNs)
are VGG-16 (Simonyan et al., 2014), ResNet-50 (He et al.,
2016) and Mask R-CNN (He et al., 2017). In the past,
benchmark datasets have been published to verify and to
compare the performance of neural networks. For 2D datasets,
very popular benchmarks are the MNIST database (LeCun et
al., 1998), the CIFAR-10 dataset (Krizhevsky et al., 2009) and
the ImageNet dataset (Deng et al., 2009). In the remote sensing
community, state-of-the-art DL methods have been modified
for various use-cases: Gevaert et al. (2018) adjusted a Fully
Convolutional Network to the application of Digital Terrain
Model (DTM) extraction in challenging areas. The method
includes an automatic labeling strategy and outperformed two
reference DTM extraction algorithms. Vetrivel et al. (2018)
successfully detected severe building damages by combining
CNN features from oblique aerial images and 3D features from
dense photogrammetric point clouds. Since sensors capable
of generating 3D data (e.g. stereo camera systems, LiDAR
systems) have gained more and more attention and are now
widespread in numerous technical fields, the semantic labeling
and classification of 3D irregular and unordered point clouds
using DNNs is of major research interest. Tchapmi et al. (2017)
introduced a framework called SegCloud to obtain semantic
scene labeling on point-level using a 3D fully CNN. Based
on a voxelization of the 3D point cloud, their approach was
evaluated on indoor and outdoor datasets (e.g. KITTI (Geiger
et al., 2013)) and a performance comparable or superior to the
state-of-the-art was achieved. Zhou et al. (2018) presented the
neural network VoxelNet to detect objects (e.g. pedestrians,
cyclists) in 3D point clouds based on the encoding of point
clouds into equally spaced 3D voxels. Zhao et al. (2018)
classified ALS point clouds via deep features learned by a
multi-scale CNN. The method creates a group of multi-scale
contextual images for each 3D point and is ranked first on the
ISPRS benchmark dataset (ISPRS, 2019). All the mentioned
3D approaches transform the irregular 3D data into regular 3D
voxel grids or accumulations of 2D images to advantageously
utilize neural networks. In contrast, algorithms have been developed that operate directly on the original dataset, using a sequence of layers to learn the best mapping between the input data and the target predictions. These point-based
DNNs directly operate on the point cloud without the need
for a prior rasterization or voxelization (Figure 1). Qi et al.
(2016) developed a highly efficient and effective type of neural
network (PointNet) that shows, for example, high performance on the
shape classification benchmarks ShapeNet (Chang et al., 2015)
and ModelNet40 (Wu et al., 2015). The offered applications
include object classification, part segmentation and semantic
labeling. Since PointNet showed limited ability to recognize fine-grained patterns and to generalize to complex scenes, Qi et al. (2017) introduced an enhanced
version called PointNet++. This hierarchical neural network
recursively applies PointNet and learns local features from
multiple contextual scales. It enables an even more accurate
classification of single objects as well as the semantic labeling
of large-scale point clouds. PointNet++ outperforms PointNet
especially on point sets with varying densities like ScanNet
(Dai et al., 2017). For object classification, PointNet++
reaches a classification accuracy of 90.7% on the ModelNet40
dataset (40 object categories) and 84.5% on the ScanNet
dataset (20 object categories). As robotics and applications
like virtual reality and autonomous driving boost the interest
in 3D data, innovative approaches for the semantic labeling
and the classification of 3D point clouds are being published at a high frequency. Landrieu et al. (2018) proposed a DL-based
framework for the semantic segmentation of large-scale point
clouds and set a new state-of-the-art for outdoor LiDAR scans
(e.g. S3DIS (Armeni et al., 2016) and SEMANTIC3D.NET
(Hackel et al., 2017)). After efficiently pre-organizing 3D
point clouds into geometrically homogeneous elements called
superpoint graphs (SPG), a graph convolutional network
manages to learn contextual relationships between object
parts. In the same year, Li et al. (2018) presented the
neural network PointCNN for feature learning from 3D point
clouds by generalizing typical CNNs and achieved on par or
better performance on multiple challenging 2D (e.g. MNIST, CIFAR-10) and 3D (e.g. S3DIS) benchmark datasets and tasks.
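To make the hierarchical feature learning of PointNet++ concrete, the following minimal NumPy sketch shows one "set abstraction" level: farthest point sampling selects well-spread centroids, a ball query groups neighboring points into local regions, and a shared pointwise MLP with max pooling summarizes each region. All layer sizes, radii and point counts here are illustrative assumptions, not the settings of any published implementation.

```python
# Minimal sketch of one PointNet++ "set abstraction" level (illustrative only).
import numpy as np

def farthest_point_sampling(xyz, n_centroids):
    """Pick n_centroids points that are maximally spread out."""
    chosen = np.zeros(n_centroids, dtype=int)
    dist = np.full(xyz.shape[0], np.inf)
    idx = 0                                    # start from the first point
    for i in range(n_centroids):
        chosen[i] = idx
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[idx], axis=1))
        idx = int(dist.argmax())               # farthest from all chosen so far
    return chosen

def ball_query(xyz, centroids, radius, n_samples):
    """For each centroid, gather up to n_samples neighbors within radius."""
    groups = []
    for c in centroids:
        d = np.linalg.norm(xyz - xyz[c], axis=1)
        nbr = np.where(d < radius)[0][:n_samples]
        nbr = np.resize(nbr, n_samples)        # pad by repetition if too few
        groups.append(xyz[nbr] - xyz[c])       # local coordinates per region
    return np.stack(groups)                    # (n_centroids, n_samples, 3)

def set_abstraction(xyz, n_centroids=128, radius=2.0, n_samples=32, out_dim=64):
    """FPS -> grouping -> shared pointwise MLP -> max pooling."""
    rng = np.random.default_rng(0)
    w = rng.normal(0, 0.1, (3, out_dim))       # one shared "MLP" layer (untrained)
    centroids = farthest_point_sampling(xyz, n_centroids)
    grouped = ball_query(xyz, centroids, radius, n_samples)
    feats = np.maximum(grouped @ w, 0.0)       # pointwise linear layer + ReLU
    return xyz[centroids], feats.max(axis=1)   # max pool over each local region

cloud = np.random.rand(8192, 3) * 20.0         # stand-in for one ALS batch
new_xyz, new_feats = set_abstraction(cloud)
print(new_xyz.shape, new_feats.shape)          # (128, 3) (128, 64)
```

Stacking several such levels yields features at increasing contextual scales, which is the core idea behind the hierarchy described above.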
Figure 1. Basic principle of 3D DNNs like PointNet++,
operating directly on 3D point clouds without a prior
transformation into 2D images or 3D voxel grids
1.2 3D vegetation mapping
Many methods for tree species classification based on
segmented single trees use a set of hand-crafted features in
combination with an appropriate classifier like RF, Support
Vector Machine or logistic regression (Fassnacht et al., 2016).
Using only the geometric information of the ALS point cloud,
the classification accuracy for coniferous and deciduous trees
is around 90%. By extending the feature set with the laser
intensity, the accuracy increases to around 95% (Reitberger
et al., 2009). DL methods have the ability to automatically
learn features and mostly generate more accurate classification
results than classic ML approaches using hand-crafted features
(see section 1.1). Yet, tree species classification using neural
networks has been of minor research interest. Presumably,
one reason is the lack of large training datasets. Just recently,
Hamraz et al. (2018) used a CNN along with 2D images
generated from ALS point clouds to classify coniferous and
deciduous trees with 92% and 86% accuracy, respectively. So
far, the direct usage of 3D data in DNNs for 3D vegetation
mapping is more uncommon.
In this paper, we demonstrate that the architecture of
PointNet++ can successfully be adapted to the semantic
labeling of large ALS point clouds to map the two tree species
spruce and beech.
2. MATERIAL
Airborne full waveform data were acquired in June 2017
(leaf-on condition) using a Riegl LMS-Q 680i instrument which
was carried by a plane at a flying altitude of 550 m. The
resulting point density was on average 54 points/m². The
mission area is located in the Bavarian Forest National Park
where mainly spruces and beeches are present (95%). Single
trees were segmented via the well-known normalized cut (Ncut)
segmentation (Reitberger et al., 2009) for the entire area of
the National Park (Figure 2). As a baseline method, a RF
classifier was trained with 918 manually labeled reference
trees (380 coniferous, 538 deciduous) using only geometric
features (height dependent and density dependent features,
crown shape) to classify the segments with respect to the tree
species. Next, the classifier was evaluated using a test dataset
comprising 529 trees (293 coniferous, 236 deciduous). For this
standard method, the classification results for coniferous trees
(precision = 93%, recall = 80%) and deciduous trees (precision
= 82%, recall = 92%) were, as expected, fairly good. Finally,
the classifier was used to predict the tree species of all single
tree segments. Compared to classic ML approaches like RF,
deep learning models require extremely large training datasets
in order to capture the essential features in a multilayer structure
(Ioannidou et al., 2017). Hence, a study area of 5270 m x
500 m (2.64 km²) was extracted from the park area comprising
approximately 97000 tree segments (48.5% coniferous, 51.5%
deciduous) with around 1500 points per tree and a tree density
of 366 trees/ha. This dataset, containing 143.3 million points labeled by the classified tree segments, was used to train PointNet++ and was fully balanced with respect to the two object categories. Potentially misclassified tree segments were not removed by visual inspection.
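As a hedged illustration of the baseline, the following sketch shows how such a feature-plus-RF classifier could be set up. The concrete feature set (height percentiles, vertical density, crown shape) only loosely follows the descriptors named above; all names, dimensions and bin counts are assumptions, not the actual feature definitions of Reitberger et al. (2009).

```python
# Illustrative sketch of a geometric-feature + Random Forest baseline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def geometric_features(tree_xyz):
    """Hand-crafted descriptors for one segmented tree (N x 3 array of X, Y, Z)."""
    z = tree_xyz[:, 2] - tree_xyz[:, 2].min()                 # heights above tree base
    h = max(z.max(), 1e-6)
    height_pct = np.percentile(z, [10, 25, 50, 75, 90]) / h   # height-dependent
    density, _ = np.histogram(z, bins=10, range=(0, h), density=True)  # density-dep.
    radius = np.linalg.norm(tree_xyz[:, :2] - tree_xyz[:, :2].mean(axis=0), axis=1)
    crown = [radius.mean(), radius.max(), radius.max() / h]   # crude crown shape
    return np.concatenate([[h], height_pct, density, crown])

# trees: list of (N_i x 3) point arrays; labels: 0 = coniferous, 1 = deciduous
def train_baseline(trees, labels):
    X = np.stack([geometric_features(t) for t in trees])
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X, labels)
    return clf
```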
Figure 2. Exemplary single tree segments resulting from
the Ncut segmentation; random color rendering
3. METHODOLOGY
The neural network PointNet++ (Qi et al., 2017) operates
on unordered 3D data without initially generating images or
voxels from the point clouds and calculates per-point scores.
It includes a hierarchical feature learning technique as well as
special layers that are able to aggregate multi-scale information
according to local point densities. For our task, the decisive
hyperparameters of the semantic segmentation implementation
of PointNet++ (Qi et al., 2019) were adjusted to get a
well-performing network (Table 1). Since the dataset comprises
two class labels, the activation function of the network was
changed from "softmax" to "sigmoid". Next, the training
dataset was divided into cubic blocks with an edge length
of 60 m (Figure 3). In each training epoch (batch size =
16), smaller batches of 20 m x 20 m x 60 m were extracted
and preprocessed including zero centering. Basically, zero
centering defines the origin of a local coordinate system as the
center of gravity of the selected batches by subtracting the mean
X, mean Y, and mean Z values from the absolute coordinates.
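As a minimal sketch of this preprocessing (the exact sampling strategy of the PointNet++ code is not restated in the paper and is assumed here), batch extraction and zero centering could look as follows:

```python
import numpy as np

def extract_batch(cube_xyz, x0, y0, n_points=8192, extent=20.0):
    """Cut a vertical 20 m x 20 m column out of a 60 m cube and zero-center it."""
    inside = ((cube_xyz[:, 0] >= x0) & (cube_xyz[:, 0] < x0 + extent) &
              (cube_xyz[:, 1] >= y0) & (cube_xyz[:, 1] < y0 + extent))
    pts = cube_xyz[inside]
    # Sample a fixed number of points (with replacement if the column is sparse).
    idx = np.random.choice(len(pts), n_points, replace=len(pts) < n_points)
    batch = pts[idx]
    # Zero centering: subtract mean X, mean Y, mean Z from the absolute coordinates.
    return batch - batch.mean(axis=0)
```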
After each training epoch, the network with the updated weights was evaluated on randomly chosen test batches. To
estimate the performance of the network, standard metrics
(precision, recall) were calculated on a single point scale. As
the graphics processing unit (GPU), we used an NVIDIA Titan V (NVIDIA Corporation, 2019).
Hyperparameter Value Declaration
BATCH SIZE 16 Number of batches per epoch.
NUM POINT 8192 Number of points per batch.
NUM CLASSES 2 Number of object categories.
MAX EPOCH 200 Number of training epochs.
BASE LR 0.001 Initial learning rate.
OPTIMIZER ”adam” Optimization algorithm.
MOMENTUM 0.9 Momentum value for stochastic gradient descent.
DECAY STEP 200000 Increment for the reduction of the learning rate.
DECAY RATE 0.7 Decay rate for the learning rate.
MAX DROPOUT 0.875 Maximal dropout rate.
CUBE DIM 60 Edge length of the cubic training blocks in [m].
Table 1. Hyperparameters and default / optimized values
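For reference, the Table 1 settings can be collected in a single configuration. The staircase exponential decay shown below is an assumption about how BASE_LR, DECAY_STEP and DECAY_RATE typically interact in the reference PointNet++ code; the paper does not spell the schedule out.

```python
# Table 1 collected as a configuration (values from the paper).
CONFIG = {
    "BATCH_SIZE": 16,        # number of batches per epoch
    "NUM_POINT": 8192,       # points per batch
    "NUM_CLASSES": 2,        # coniferous vs. deciduous
    "MAX_EPOCH": 200,
    "BASE_LR": 0.001,
    "OPTIMIZER": "adam",
    "MOMENTUM": 0.9,
    "DECAY_STEP": 200000,
    "DECAY_RATE": 0.7,
    "MAX_DROPOUT": 0.875,
    "CUBE_DIM": 60,          # edge length of the cubic training blocks [m]
}

def learning_rate(global_step, cfg=CONFIG):
    """Assumed schedule: lr = BASE_LR * DECAY_RATE ** (global_step // DECAY_STEP)."""
    return cfg["BASE_LR"] * cfg["DECAY_RATE"] ** (global_step // cfg["DECAY_STEP"])
```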
                     prediction
reference     decid.     conif.    recall
decid.        294623      30516      0.91
conif.         68707     261514      0.79
precision       0.81       0.90
OA = 0.85, kappa = 0.70
Table 2. Classification result (epoch 163) for applying the
trained neural network on randomly chosen test data; OA
= overall accuracy, decid. = deciduous, conif. =
coniferous
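The per-point metrics in Table 2 can be reproduced directly from the raw confusion counts, as the following check shows:

```python
import numpy as np

# Confusion counts from Table 2 (rows = reference, columns = prediction).
cm = np.array([[294623,  30516],    # reference deciduous
               [ 68707, 261514]])   # reference coniferous

recall = cm.diagonal() / cm.sum(axis=1)      # [0.91, 0.79]
precision = cm.diagonal() / cm.sum(axis=0)   # [0.81, 0.90]
oa = cm.diagonal().sum() / cm.sum()          # 0.85
pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / cm.sum() ** 2  # chance agreement
kappa = (oa - pe) / (1 - pe)                 # 0.70
print(recall, precision, oa, kappa)
```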
4. RESULTS AND DISCUSSION
The neural network was trained using 8192 points per batch.
Hence, the number of training points per epoch added up
to 131072. Assuming approximately 1500 points per tree,
this equates to around 87 tree segments per training epoch.
After each epoch, the weights of the network were updated.
Subsequently, the performance of the network was evaluated
on 80 randomly chosen test batches (655360 points, corresponding to about 437 trees or 1.19 ha). After 163 of 200 training epochs (21.4 million points, corresponding to about 14243 trees or 38.9 ha), the classification result for
coniferous trees (precision = 90%, recall = 79%) and deciduous
trees (precision = 81%, recall = 91%) reached its maximum.
Clearly, some experiments were needed to find an optimized set
of hyperparameters. The point-based results (Table 2) are fully
comparable to the classification results provided by the standard
method based on a single tree segmentation and a tree species
classification with RF. Evidently, appropriate features were automatically learned from the point cloud that are able to discriminate the complex 3D tree structure of both tree species. Nevertheless, it needs to be pointed out that the given precision
and recall values for the DL approach were calculated on a
single-point level. In contrast, the performance measures for
the RF classifier given in section 2 are per-tree scores. Figure
4 exemplarily shows the predicted class labels compared to the
reference data (”ground truth”) for an area with a size of 6.4
ha. Like the standard method, PointNet++ performed better for
coniferous trees than for deciduous trees. One reason for this
is the superior representation of the crown shape of coniferous
trees in leaf-on condition at a point density of 54 points/m².
So far, we have used only the geometric information of the ALS point cloud; the laser intensity has not been included yet. We encountered misclassification effects
at the edges of the blocks, since no inter-block neighborhood
information was provided to the model. One promising
approach to solve this issue is the utilization of Superpoint
Graphs (Landrieu et al., 2018) for the semantic labeling of the
point cloud. Admittedly, we had expected the neural network to outperform the RF classifier. Very likely, the errors in the training data and the mentioned edge effects prevented a better performance.
Figure 3. Cubic blocks used for training of PointNet++;
3D points colored in dependence of the class labels
”coniferous” (blue) and ”deciduous” (red)
Figure 4. Semantic labeling result (left) and reference data (right) for coniferous trees (blue) and deciduous trees (red);
area size 200 m x 320 m
5. CONCLUSION AND OUTLOOK
The conducted experiments demonstrate that the architecture of the 3D
DNN PointNet++ can successfully be adapted to the semantic
labeling of large ALS point clouds to map the two tree species
spruce and beech. The neural network was trained using ALS
data with a point density of 54 points/m². The training dataset
was generated automatically using a classical feature-based RF
classifier that distinguished single coniferous trees (precision
= 93%, recall = 80%) and deciduous trees (precision = 82%,
recall = 92%). Using the architecture of PointNet++, the
achieved classification results for single points belonging to
either coniferous trees (precision = 90%, recall = 79%) or
deciduous trees (precision = 81%, recall = 91%) are fairly
high considering that only the geometry was included. We
want to emphasize that, depending on the point density and the extent of the objects, the decisive hyperparameters of
the neural network need to be adjusted in order to get a
well-performing network for the particular classification task.
Moreover, the classification of individual tree species remains an unsolved and challenging task. For instance, a recent study by Amiri et al. (2019) reported an accuracy
of around 78% by fusing multispectral data from LiDAR and
optical imagery to classify four tree species (spruce, fir, beech,
dead tree). Hence, future work will focus on the mapping of
individual tree species using PointNet++ or comparable DNNs
(e.g. SPG, PointCNN). Furthermore, laser intensity values,
surface normals and multispectral features should be integrated
to further improve the accuracy of the proposed DL approach.
ACKNOWLEDGEMENTS
We would like to thank PD Dr. Marco Heurich and his team
from the Bavarian Forest National Park for the supply of the
ALS data.
References
Amiri, N., Krzystek, P., Heurich, M., Skidmore, A. K.,
2019. Tree species classification by fusing multispectral lidar
and aerial imagery. ISPRS Journal of Photogrammetry and
Remote Sensing. In review.
Armeni, I., Sener, O., Zamir, A. R., Jiang, H., Brilakis,
I., Fischer, M., Savarese, S., 2016. 3d semantic parsing
of large-scale indoor spaces. Proceedings of the IEEE
International Conference on Computer Vision and Pattern
Recognition.
Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang,
Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H. et al.,
2015. Shapenet: An information-rich 3d model repository.
arXiv preprint arXiv:1512.03012.
Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser,
T., Nießner, M., 2017. Scannet: Richly-annotated 3d
reconstructions of indoor scenes. Proc. Computer Vision and
Pattern Recognition (CVPR), IEEE.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L., 2009.
Imagenet: A large-scale hierarchical image database. Proceedings of the IEEE conference on computer vision and pattern recognition.
Fassnacht, F. E., Latifi, H., Stereńczak, K., Modzelewska,
A., Lefsky, M., Waser, L. T., Straub, C., Ghosh, A.,
2016. Review of studies on tree species classification from
remotely sensed data. Remote Sensing of Environment, 186,
64–87.
Geiger, A., Lenz, P., Stiller, C., Urtasun, R., 2013. Vision
meets Robotics: The KITTI Dataset. International Journal
of Robotics Research (IJRR).
Gevaert, C. M., Persello, C., Nex, F., Vosselman, G.,
2018. A deep learning approach to DTM extraction from
imagery using rule-based training labels. ISPRS journal of
photogrammetry and remote sensing, 142, 106–123.
Hackel, T., Savinov, N., Ladicky, L., Wegner, J. D., Schindler,
K., Pollefeys, M., 2017. SEMANTIC3D.NET: A new
large-scale point cloud classification benchmark. ISPRS
Annals of the Photogrammetry, Remote Sensing and Spatial
Information Sciences, IV-1-W1, 91–98.
Hamraz, H., Jacobs, N. B., Contreras, M. A., Clark, C. H.,
2018. Deep learning for conifer/deciduous classification of
airborne LiDAR 3D point clouds representing individual
trees. arXiv preprint arXiv:1802.08872.
He, K., Gkioxari, G., Dollár, P., Girshick, R., 2017. Mask R-CNN. Proceedings of the IEEE international conference on
computer vision, 2961–2969.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual
learning for image recognition. Proceedings of the IEEE
conference on computer vision and pattern recognition,
770–778.
Ioannidou, A., Chatzilari, E., Nikolopoulos, S., Kompatsiaris,
I., 2017. Deep learning advances in computer vision with 3D
data: A survey. ACM Computing Surveys, 50.
ISPRS, 2019. ISPRS 3D semantic labeling contest. http://www2.isprs.org/commissions/comm2/wg4/vaihingen-3d-semantic-labeling.html. Accessed: 2019-03-11.
Krizhevsky, A., Hinton, G., 2009. Learning multiple
layers of features from tiny images. Technical report,
Citeseer.
Landrieu, L., Simonovsky, M., 2018. Large-scale
point cloud semantic segmentation with superpoint graphs.
Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 4558–4567.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998.
Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86, 2278–2324.
Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B., 2018.
Pointcnn: Convolution on x-transformed points. Advances in
Neural Information Processing Systems, 828–838.
NVIDIA Corporation, 2019. NVIDIA Titan V: NVIDIA's supercomputing GPU architecture. https://www.nvidia.com/en-us/titan/titan-v/. Accessed: 2019-03-11.
Qi, C. R., Su, H., Mo, K., Guibas, L. J., 2016. PointNet:
Deep Learning on Point Sets for 3D Classification and
Segmentation. arXiv preprint arXiv:1612.00593.
Qi, C. R., Yi, L., Su, H., Guibas, L. J., 2017. Pointnet++:
Deep hierarchical feature learning on point sets in a metric
space. Advances in Neural Information Processing Systems,
5099–5108.
Qi, C. R., Yi, L., Su, H., Guibas, L. J., 2019.
Pointnet++: Deep hierarchical feature learning on point
sets in a metric space. https://github.com/charlesq34/pointnet2. Accessed: 2019-03-11.
Reitberger, J., Schnörr, C., Krzystek, P., Stilla, U., 2009.
3D segmentation of single trees exploiting full waveform
LIDAR data. ISPRS Journal of Photogrammetry and Remote
Sensing, 64, 561–574.
Simonyan, K., Zisserman, A., 2014. Very deep
convolutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Tchapmi, L., Choy, C., Armeni, I., Gwak, J., Savarese, S.,
2017. Segcloud: Semantic segmentation of 3d point clouds.
2017 International Conference on 3D Vision (3DV), IEEE,
537–547.
Vetrivel, A., Gerke, M., Kerle, N., Nex, F., Vosselman,
G., 2018. Disaster damage detection through synergistic
use of deep learning and 3D point cloud features derived
from very high resolution oblique aerial images, and
multiple-kernel-learning. ISPRS journal of photogrammetry
and remote sensing, 140, 45–59.
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao,
J., 2015. 3d shapenets: A deep representation for volumetric
shapes. Proceedings of the IEEE conference on computer
vision and pattern recognition, 1912–1920.
Zhao, R., Pang, M., Wang, J., 2018. Classifying airborne
LiDAR point clouds via deep features learned by a
multi-scale convolutional neural network. International
Journal of Geographical Information Science, 32, 960–979.
Zhou, Y., Tuzel, O., 2018. Voxelnet: End-to-end learning
for point cloud based 3d object detection. Proceedings
of the IEEE Conference on Computer Vision and Pattern
Recognition, 4490–4499.