Comparative Analysis of Classification Techniques for Building Block Extraction using
Aerial Imagery and LiDAR data
E. Bratsolis1,3, S. Gyftakis1,2, E. Charou2, N. Vassilas1
1Department of Informatics,
Technological Educational Institute of Athens,
12210 Aigaleo, Greece
2Inst. of Informatics and Telecommunications
NCSR Demokritos, 15310 Agia Paraskevi, Greece
3Department of Physics
National University of Athens, 15784 Athens, Greece
Abstract: Building detection has been a prominent topic in the area of image classification. Most of the research effort is adapted to
the specific application requirements and available datasets. In this
paper we present a comparative analysis of different classification
techniques for building block extraction. Our dataset includes aerial
orthophotos (with spatial resolution 20cm), a DSM generated from
LiDAR (with spatial resolution 1m and elevation resolution 20 cm)
and DTM (spatial resolution 2m) from an area of Athens, Greece.
The classification methods tested are unsupervised (K-Means, Mean
Shift), and supervised (Feed Forward Neural Net, Radial-Basis
Functions, Support Vector Machines). We evaluated the performance
of each method using a subset of the test area. We present the
classified images, and statistical measures (confusion matrix, kappa
coefficient and overall accuracy). Our results demonstrate that the best unsupervised method is Mean Shift, which performs comparably to the best supervised methods.
Keywords: remote sensing, image classification algorithms, LiDAR.
I. INTRODUCTION
Recently there has been an increasing demand for detailed 3D
models of buildings, monuments, urban planning units and
cities from elevation (x, y, z) data such as those acquired from
LiDAR airborne scanners [1]. A critical step towards 3D
modeling is the segmentation of the spatial data into
homogeneous regions (e.g. buildings, parts of urban planning
units, roads, etc.) and then the extraction of their boundaries.
The algorithms and methods of image segmentation constitute,
even nowadays, an open research domain in the fields of image
analysis, computer vision and digital remote sensing. The
various techniques that have been developed for this purpose
are based either exclusively on LiDAR data [2] or additionally utilize complementary data such as digital maps [3], high resolution satellite images [3] or aerial orthophotographs [4].
Following the progressive increase in sensor accuracy over the last decades, small- and medium-scale research in the mature fields of land-use and land-cover applications has expanded to include large-scale detection of buildings and other urban objects that can be used for the generation of 3D models. For example, conventional multispectral unsupervised classification methods followed by co-occurrence-matrix-based filtering have been used for building classification in urban regions [5]. Because the low-resolution TM-SPOT satellite images represented medium-size buildings (12-20 m in width) with fewer than five pixels, the aim of this semi-automatic classification method was to manually determine rough building classes following a rather detailed clustering. On the
other hand, 3D building reconstruction in [6] is assisted by
accurate 2D digital maps used to locate buildings in laser
scanning data, thus, bypassing the need for an automatic
building detection phase. Building roofs can then be
reconstructed from point clouds through a model based or data
driven approach.
Ahlberg et al. [7] present a method using high resolution
LiDAR and image data for 3D building reconstruction which is
based on a series of preprocessing steps that include generation
of DTM using active contours, ground classification via simple
height thresholding, segmentation and building versus non-
building classification of each segment using an artificial
neural network. The classification is based on measures of
shape, curvature and maximum slope. The 3D reconstruction is
completed by extracting planar roof faces from the elevation
data.
Chen et al. [8] also proposed a scheme towards building
detection and 3D building reconstruction by integrating
LiDAR data, multispectral satellite images and aerial
photography. Building detection is performed through region-
based segmentation and knowledge-based classification. Then,
3D reconstruction is achieved in four steps: (1) 3D planar patch
forming, (2) initial building edge detection, (3) straight line
extraction, and (4) split-merge-shape patented method for
building modeling.
In our case, the available LiDAR data are not in the form of
point clouds. Instead, they have been provided in the form of a
Digital Surface Model (DSM) and a Digital Terrain Model (DTM)
at relatively low resolutions. On the other hand, high resolution
aerial imagery from the same urban region is at a spatial
resolution five and ten times higher than DSM and DTM
respectively. The inaccuracies of the low resolution LiDAR
data as well as the arrangement and layout of the buildings in
the pilot area of Kallithea – characteristic of most of Athens’s
urban areas – make automatic building detection a challenging
application. It is our belief that an attempt to automatically
detect buildings under these circumstances should follow
careful and well designed steps, similar to [7]. To this end, a
number of unsupervised and supervised classification methods
are implemented and tested for building block segmentation
using a fusion of aerial and LiDAR data. In particular, all
classifiers have been trained in a 4-D input space (i.e. the RGB
bands of the optical image augmented by an upsampled
LiDAR depth map).
Sections II and III present the unsupervised and supervised
classification methods tested in this work respectively. Data
description is in Section IV, classification results are presented
in Section V and, finally, Section VI presents the conclusions
and future work.
II. UNSUPERVISED IMAGE CLASSIFICATION
TECHNIQUES
Although in most cases the accuracy of the 3D building models is determined by the spatial resolution of the LiDAR data, a challenge often encountered with multiple independent data sources and preprocessing steps is to fuse low resolution LiDAR data with high resolution aerial images so as to improve the accuracy of 3D building reconstruction. As argued in the Introduction, automatic building detection under these circumstances should follow careful and well designed steps. To take into account the added problem complexity due to touching buildings, which in many cases form block-scale connected regions, we suggest first performing a building block image segmentation through building vs. non-building pixel classification prior to the detection of building footprints.
Since the aim is to develop an automatic system for 3D
building reconstruction, it is our intention to select an
unsupervised technique with a classification accuracy
comparable to that of well known supervised classifiers. In this
respect, two of the most popular unsupervised classification
techniques have been examined in this work and presented in
the sequel.
A. K-means
K-Means is an unsupervised classification technique in which the user initiates the algorithm by specifying the desired number of classes. It starts with a set of clusters of pixels in the feature space, each defined by its center. The initial clusters are created by associating each pixel in the image with the nearest centroid; the mean values of the cluster members are then computed and the centroids are replaced by them. These steps are iterated until the cluster assignments no longer change [10].
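As an illustration only (the experiments in Section V were run in the Matlab and Monteverdi environments, not in Python), a minimal pixel-wise K-Means sketch with two classes on the fused 4-channel data of Section IV could look as follows:

import numpy as np
from sklearn.cluster import KMeans

def kmeans_building_mask(img4, n_classes=2, seed=0):
    # img4: H x W x 4 array of normalized (R, G, B, n-DSM) values
    h, w, c = img4.shape
    X = img4.reshape(-1, c).astype(np.float64)  # one 4-D point per pixel
    labels = KMeans(n_clusters=n_classes, n_init=10,
                    random_state=seed).fit_predict(X)
    # Which cluster corresponds to "building" must be decided afterwards,
    # e.g. as the cluster with the higher mean n-DSM value.
    return labels.reshape(h, w)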
B. Mean Shift
Mean Shift was first proposed by Fukunaga and Hostetler
[11], later adapted by Cheng [12] for the purpose of image
analysis and more recently extended by Comaniciu, Meer and
Ramesh to low-level vision problems, including segmentation [13], adaptive smoothing [13] and tracking [14].
The main idea behind mean shift is to treat the points in a
multidimensional feature space as an empirical probability
density function where dense regions in the feature space
correspond to the local maxima or modes of the underlying
distribution. For each data point in the feature space, one
performs a gradient ascent procedure on the local estimated
density until convergence. The stationary points of this
procedure represent the modes of the distribution. The data
points associated with the same stationary point are considered
members of the same cluster.
For a given pixel, the mean shift algorithm builds a set of
neighboring pixels within a given spatial radius and a color
range. The spatial and color center of this set is then computed
and the algorithm iterates with this new spatial and color
center.
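A hedged sketch of this procedure (our illustration, not the Monteverdi/OTB implementation used in Section V): scikit-learn's MeanShift supports only a single flat-kernel bandwidth, so the spatial and color coordinates are rescaled here to emulate separate spatial and range radii. This naive formulation is slow on full-size images:

import numpy as np
from sklearn.cluster import MeanShift

def mean_shift_segment(img, spatial_radius=5.0, color_range=15.0):
    # img: H x W x C image; returns a per-pixel cluster label map
    h, w, c = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Rescale so that one bandwidth unit corresponds to the desired radius
    # in each part of the joint spatial-color feature space.
    feats = np.column_stack([
        yy.ravel() / spatial_radius,
        xx.ravel() / spatial_radius,
        img.reshape(-1, c) / color_range,
    ])
    labels = MeanShift(bandwidth=1.0, bin_seeding=True).fit_predict(feats)
    return labels.reshape(h, w)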
III. SUPERVISED IMAGE CLASSIFICATION
TECHNIQUES
A. Feed Forward Neural Net (FFNN)
Artificial neural networks (ANNs) are connectionist
systems consisting of many primitive units (artificial neurons)
which are working in parallel and are connected via directed
links. The general neural unit has several inputs and each input
is weighted with a weight factor. The main processing principle
of these units is the distribution of activation patterns across the
links similarly to the basic mechanism of a biological neural
network. The knowledge is stored in the structure of the links, i.e., their topology and weights, which are organized by supervised training procedures. The link connecting two units is directed, fixing a source and a target unit. The weight attributed to a link transforms the output of the source unit into an input of the target unit. Depending on the weight, the transmitted signal can range from highly activating to highly inhibiting.
The basic function of a unit is to accept inputs from units
acting as sources, to activate itself, and to produce one output
that is directed to units-targets. Based on their topology and
functionality, the units are arranged in layers. The layers can be
generally divided into three types: input, hidden, and output.
The input layer consists of units that are directly activated by
the input pattern. The output layer consists of the units that produce the output pattern of the network. All the other layers
are hidden and directly inaccessible. Supervised learning
proceeds by minimizing a cost (or error) function with respect
to all of the network weights. The activation function of the
unit is given by the sigmoid function [15].
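A minimal sketch of such a network (an assumption on our part: the paper's classifier was implemented in Matlab, and scikit-learn offers no Levenberg-Marquardt solver, so the quasi-Newton L-BFGS solver is substituted). The 20/10 hidden-layer sizes anticipate the configuration reported in Section V:

from sklearn.neural_network import MLPClassifier

ffnn = MLPClassifier(hidden_layer_sizes=(20, 10),
                     activation="logistic",  # sigmoid units, as in [15]
                     solver="lbfgs",
                     max_iter=500,
                     random_state=0)
# X_train: N x 4 array of (R, G, B, n-DSM) pixel values; y_train: 0/1 labels
# ffnn.fit(X_train, y_train); mask = ffnn.predict(X_test)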
B. Radial-Basis Functions
Radial-basis functions (RBFs) were first introduced for the solution of real multivariable interpolation problems; the early work on this subject is surveyed by Powell [16]. In the RBF neural
networks, radial basis functions are embedded into a two layer
feed-forward neural network. The network has a set of inputs
and a set of outputs. Between the inputs and outputs there is a
layer of processing units referred to as hidden units. Each
hidden unit is implemented with a radial basis function. In the
RBF neural networks, the nodes of the hidden layer generate a local response to the input through the radial basis functions, and the output layer realizes a linear weighted combination of the hidden-unit outputs. There is a large class of radial-basis functions.
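One common construction is sketched below, under stated assumptions (Gaussian hidden units with centers chosen by K-Means, a fixed spread, and output weights fitted by linear least squares; this is not necessarily the Matlab routine used in Section V, whose spread of 70 and 60 hidden nodes are anticipated here):

import numpy as np
from sklearn.cluster import KMeans

class RBFNet:
    def __init__(self, n_hidden=60, spread=70.0):
        self.n_hidden, self.spread = n_hidden, spread

    def _phi(self, X):
        # Gaussian response of each hidden unit to each input point
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.spread ** 2))

    def fit(self, X, y):
        self.centers = KMeans(n_clusters=self.n_hidden, n_init=5,
                              random_state=0).fit(X).cluster_centers_
        H = self._phi(X)  # N x n_hidden design matrix
        self.w, *_ = np.linalg.lstsq(H, y.astype(float), rcond=None)
        return self

    def predict(self, X):
        # Linear weighted combination of hidden-unit outputs, thresholded
        return (self._phi(X) @ self.w > 0.5).astype(int)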
C. Support Vector Machines
Support Vector Machines (SVM) were introduced within the framework of Statistical Learning Theory [17]-[19] developed by V. Vapnik and co-workers. The approach consists of searching for the separating surface between two classes by determining the subset of training samples that best describes the boundary between the two classes. These samples are called support vectors and
completely define the classification system. In the case where
the two classes are nonlinearly separable, the method uses a
kernel expansion in order to make projections of the feature
space onto higher dimensionality spaces where the separation
of the classes becomes linear.
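A minimal sketch (the paper does not report the kernel or hyperparameters, so the choices below are illustrative assumptions) of a soft-margin SVM with a Gaussian kernel for the two-class building / non-building problem:

from sklearn.svm import SVC

svm = SVC(kernel="rbf", C=1.0, gamma="scale")  # illustrative hyperparameters
# X_train: N x 4 pixel feature vectors; y_train: 0/1 labels
# svm.fit(X_train, y_train)
# After fitting, svm.support_vectors_ holds the training samples that
# define the decision boundary, i.e. the support vectors.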
IV. DATA DESCRIPTION
For our experiments we have used the following dataset
from the Kallithea neighborhood of Athens, Greece:
Orthophotos from color (R, G, B channels) aerial imagery acquired in 2007 by the National Cadastre and Mapping Agency of Greece, with a spatial resolution of 20 cm (Fig. 1); LiDAR (DSM) data acquired in 2003 by GeoIntelligence SA over the same area, with a spatial resolution of 1 m and a vertical resolution of 20 cm; and a DTM of the same area, acquired by GeoIntelligence SA, with a spatial resolution of 2 m.
From the combination of the LiDAR (DSM) and DTM datasets we produced the normalized DSM (n-DSM) of the above area (Figs. 2 and 3). The n-DSM is the difference between DSM and DTM
and represents the net building heights rather than the absolute
elevations. In our experiments we used the 3 channels of the
orthophoto augmented by the n-DSM as one additional
channel. The values of all these 4 channels were normalized in
the same range [0, 255].
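A hedged sketch of this preprocessing step (the nearest-neighbor upsampling of the n-DSM to the 20 cm orthophoto grid and the prior co-registration of DSM and DTM on a common grid are our assumptions; the paper does not specify the resampling method):

import numpy as np

def rescale_0_255(a):
    a = a.astype(np.float64)
    return 255.0 * (a - a.min()) / (a.max() - a.min())

def build_feature_image(rgb, dsm, dtm):
    # rgb: H x W x 3 orthophoto; dsm, dtm: elevation grids assumed already
    # resampled to a common (lower-resolution) grid
    ndsm = dsm - dtm  # net building heights above the terrain
    h, w = rgb.shape[:2]
    # Nearest-neighbor upsampling of the n-DSM to the orthophoto grid
    rows = np.arange(h) * ndsm.shape[0] // h
    cols = np.arange(w) * ndsm.shape[1] // w
    ndsm_up = ndsm[rows][:, cols]
    channels = [rescale_0_255(rgb[..., i]) for i in range(3)]
    channels.append(rescale_0_255(ndsm_up))
    return np.dstack(channels)  # H x W x 4, all channels in [0, 255]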
V. CLASSIFICATION RESULTS
All experiments have been performed using the Matlab and
Monteverdi environments. Following training of the supervised
or unsupervised classifiers, test results have been obtained
regarding the central urban block of Fig. 1 (including patio
pixels in the middle of the block). The corresponding prototype building block mask with which all results are compared is shown in Fig. 4.
The number of classes selected for the K-Means algorithm
was set to 2. The training set included the 4-D points from all
pixels of the whole region and the algorithm converged in less
than 10 iterations. The corresponding classification result is shown in Fig. 5.
As far as the Mean Shift algorithm is concerned, the spatial
radius was set to 5 pixels and the spectral range to 15
(expressed in radiometric units). The resulting amplitude image was then thresholded using Otsu's optimal threshold selection method in order to produce the final binary image. The Mean Shift classification result is shown in Fig. 6.
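As a rough illustration of this pipeline (OpenCV's pyrMeanShiftFiltering exposes a spatial radius and a color range analogous to the parameters above, but it is not the Monteverdi/OTB implementation actually used; the input file name is hypothetical):

import cv2

img = cv2.imread("fused_image.png")  # hypothetical 8-bit, 3-channel input
smoothed = cv2.pyrMeanShiftFiltering(img, 5, 15)  # spatial radius 5, range 15
grey = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
# Otsu's method selects the binarization threshold automatically
_, binary = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)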
The FFNN classifier consists of four layers: an input layer with four nodes, a first hidden layer with 20 nodes, a second hidden layer with 10 nodes, and an output layer with one node. Our classifier uses the Levenberg-Marquardt training algorithm and converges in fewer than 30 epochs with a training set consisting of pixels from representative areas of a broad region, balanced in terms of category representation and corresponding to roughly 1.5% of the overall image. The FFNN classification result is shown in Fig. 7.
The radial-basis function used in the RBF classifier is the Gaussian. The spread constant was set to 70 and the number of hidden nodes to 60. Using the same training set as before, the classification result is shown in Fig. 8. Finally, the classification result for the SVM is shown in Fig. 9.
In all classification results buildings are shown as white and
non-buildings as black pixels. For the evaluation and
quantitative comparison of all classification methods we
compared all results with the prototype mask of Fig. 3. In
particular, in order to quantify the quality of the classification
results, we have computed the following statistical measures on
the above subsets.
a) Confusion matrix: A = [aij], where aij is the number of pixels from the jth class that have been classified as belonging to the ith class, divided by arj, the overall number of pixels from class j.
b) Overall accuracy: OA = ∑aii / at, where at is the total number of pixels of the evaluation subset.
c) Kappa coefficient: κ = (P0 − Pe) / (1 − Pe), where P0 is the probability of correct pixel classification, i.e. P0 = ∑aii / at, and Pe = ∑ari aci / at², with acj being the overall number of pixels assigned to class j. Values of κ exceeding 0.75 suggest strong, non-accidental classification performance [16].
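A minimal sketch of these three measures, computed from a ground-truth mask and a predicted mask (both H x W arrays of 0/1 class labels):

import numpy as np

def evaluate(truth, pred, n_classes=2):
    a_t = truth.size  # total number of pixels in the evaluation subset
    A = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        for j in range(n_classes):
            # raw count: pixels of true class j classified as class i
            A[i, j] = np.sum((pred == i) & (truth == j))
    A_norm = A / A.sum(axis=0, keepdims=True)  # per-class normalized matrix
    p0 = np.trace(A) / a_t  # overall accuracy
    p_e = np.sum(A.sum(axis=1) * A.sum(axis=0)) / a_t ** 2  # chance agreement
    kappa = (p0 - p_e) / (1.0 - p_e)
    return A_norm, p0, kappa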
The corresponding confusion matrices, Cohen’s kappa
coefficient and overall accuracy are shown in Table I.
TABLE I. CLASSIFICATION RESULTS

Unsupervised Methods
Method       Confusion matrix    Kappa coefficient   Overall accuracy
K-Means      0.8191  0.1809      0.6400              0.8298
             0.1498  0.8502
Mean Shift   0.9166  0.0834      0.7529              0.8885
             0.1650  0.8350

Supervised Methods
Method       Confusion matrix    Kappa coefficient   Overall accuracy
FFNN         0.9230  0.0770      0.7770              0.8992
             0.1460  0.8540
RBF          0.9166  0.0834      0.7829              0.9012
             0.1281  0.8719
SVM          0.8192  0.1808      0.7151              0.8625
             0.0554  0.9446
VI. CONCLUSIONS AND FUTURE RESEARCH
Comparing the measures in Table I, we conclude that the best unsupervised method is Mean Shift, while the FFNN and RBF supervised methods produce the best, and mutually similar, results. We intend to use the Mean Shift classification algorithm as a step prior to building detection in densely built areas such as the one in Fig. 1. Accurate building detection is a necessary step
towards the development of a system for automatic 3D
reconstruction of buildings.
ACKNOWLEDGMENT
This research has been co-funded by the European Union
(European Social Fund) and Greek national resources under the
framework of the Archimedes III: Funding of Research Groups
in T.E.I. of Athens project of the Education & Lifelong
Learning Operational Programme. We would also like to thank
GeoIntelligence for providing the DSM and DTM elevation
data as well as the National Cadastre & Mapping Agency of
Greece for providing us with the high resolution aerial
photographs.
REFERENCES
[1] C. Poullis, S. You, and U. Neumann, “Rapid creation of large-scale
photorealistic virtual environments”, IEEE Virtual Reality, pp. 153–160,
2008.
[2] J. Giglierano, Lidar Basics for Mapping Applications. US Geological
Survey, http://pubs.usgs.gov/of/2007/1285/pdf/Giglierano.pdf, Open-
File Report 2007-1285.
[3] Y. Zhang, Z. Zhang, J. Zhang and J. Wu, “3D Building Modeling With
Digital Map, LiDAR Data and Video Image Sequences”, The
Photogrammetric Record, Vol. 20(111), pp. 285-302, 2005.
[4] G. Sohn and I. Dowman, “Data fusion of high resolution satellite
imagery and LiDAR data for automatic building extraction”, ISPRS
Journal of Photogrammetry and Remote Sensing, Vol. 62 , pp. 43–63,
2007.
[5] Y. Zhang, “Optimisation of building detection in satellite images by
combining multispectral classification and texture filtering”, ISPRS
Journal of Photogrammetry and Remote Sensing, Vol. 54, pp. 50–60,
1999.
[6] G. Vosselman, “Fusion of Laser Scanning Data, Maps, and Aerial
Photographs for Building Reconstruction”, in Proc. IEEE International
Geoscience and Remote Sensing Symposium and the 24th Canadian
Symposium on Remote Sensing, IGARSS'02, Toronto, Canada, 2002.
[7] S. Ahlberg, U. Söderman, M. Elmqvist and Å. Persson, “On Modelling
and Visualisation of High Resolution Virtual Environments Using
LiDAR Data”, in Proc. 12th International Conference on
Geoinformatics Geospatial Information Research: Bridging the
Pacific and Atlantic, Geoinformatics 2004, University of Gävle,
Sweden, 7-9 June 2004.
[8] L.-C. Chen , T.-A. Teo, Y.-C. Shao, Y.-C. Lai, and J.-Y. Rau, “Fusion of
LiDAR Data and Optical Imagery for Building Modeling”, International
Archives of Photogrammetry, Remote Sensing and Spatial Information
Sciences, Vol. 35, pp.732–737, 2004.
[9] H. Park and S. Lim. “Data Fusion of Laser Scanned Data and Aerial
Ortho-Imagery for Digital Surface Modeling”, in Proc. 3rd Int.
Workshop on 3D Geo-Information, Seoul, Korea, pp. 65-72, 13-14
November, 2008.
[10] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley, pp. 526-
527, 2001.
[11] K. Fukunaga and L. Hostetler, “The estimation of the gradient of a
density function, with applications in pattern recognition”, IEEE
Transactions on Information Theory, 21(1), pp. 32–40, 1975.
[12] Y. Cheng, “Mean shift, mode seeking, and clustering”, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 17(8), pp.
790–799, 1995.
[13] D. Comaniciu, and P. Meer, “Mean shift: A robust approach toward
feature space analysis”, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 24(5), pp. 603–619, 2002.
[14] D. Comaniciu., V. Ramesh, and P. Meer “Kernel-based object tracking”,
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.
25(5), pp. 564–577, 2003.
[15] J.D. Paola, and R.A. Schowengerdt, “A Detailed Comparison of
Backpropagation Neural Network and Maximum Likelihood
Classifiers for Urban Land Use Classification”, IEEE Transactions
Geosci. Remote Sens., Vol. 33, No. 4, pp. 981-996, July 1995.
[16] M.J.D. Powell, “Radial basis functions for multivariable interpolation: A
review”, in Proc. IMA Conference on Algorithms for the Approximation
of Functions and Data, pp. 143-167, RCMS, Shrivenham, England,
1985.
[17] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag,
New York, 1995.
[18] V.N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[19] C. Cortes, and V.N. Vapnik, “Support vector networks”, Machine
Learning, Vol. 20, pp. 273-297, 1995.
[20] T. Kubik, W. Paluszynski, A. Iwaniak and P. Tymkow, “Supervised
Classification of Multi-Spectral Satellite Images Using Neural
Networks”, in Proc. of the 10th IEEE International Conference on
Methods and Models in Automation and Robotics (MMAR 2004), Eds.
S. Domek, R. Kaszyński , pp. 769-744, Miedzyzdroje, Poland, 2004.
Fig. 1. Orthophoto image of building block for test.
Fig. 2. The n-DSM elevation data of the same region as that of Fig. 1.
Fig. 3. A 3-D representation of n-DSM.
Fig. 4. Binary image subset for evaluation of classification.
Fig. 5. Classification result of K-Means.
Fig. 6. Classification result of Mean Shift.
Fig. 7. Classification result of FFNN.
Fig. 8. Classification result of RBF.
Fig. 9. Classification result of SVM.