Conference Paper

Building reconstruction from Worldview DEM using image information


Abstract

In this paper an algorithm is proposed for the extraction of 3D building models from Digital Elevation Models (DEM) produced from Worldview–2 stereo satellite images. The edge information extracted from orthorectified images is used as additional information for the 3D reconstruction algorithm. In particular, complex buildings containing several smaller building parts are discussed. For this purpose, a model driven approach based on the analysis of the 3D points of the DEM in a 2D projection plane is proposed. Accordingly, a building region is divided into smaller parts according to the direction and the number of ridge lines for parametric building reconstruction. The 3D model is derived for each building part and finally, a complete parametric building model is formed by merging the 3D models of the building parts and adjusting the nodes after the merging process. For remaining areas which do not contain ridge lines, a prismatic model is derived by approximation of the corresponding polygon and merged with the parametric models to shape the final model of the building.
Hossein Arefi and Peter Reinartz
Remote Sensing Technology Institute, German Aerospace Center – DLR, D-82234 Wessling, Germany
KEY WORDS: Digital Elevation Models (DEM), Worldview–2, orthophoto, ridge line, local maxima, projection, approximation
Automatic building reconstruction from Digital Elevation Models
(DEM), with or without additional data sources, is still an active
research area in photogrammetry and GIS institutions around the
world. In this context, providing a 3D CAD model which represents
the overall shape of the building and contains its most significant
parts has boosted many applications in the GIS area such as urban
planning and mobile navigation systems.
In the past few years, several algorithms have been proposed for
automated 3D building reconstruction. The algorithms comprise
methods that only employ elevation data such as high resolution
airborne LIDAR for model generation, while other methods use
additional sources of data. An additional data source besides the
DEM is usually employed when the quality or resolution of the
elevation data is not appropriate for model generation.
Segmentation based approaches for 3D building model generation from
grid data are proposed by Geibel and Stilla (2000) and Rottensteiner
and Jansa (2002) to find planar regions which determine a
polyhedral model. Gorte (2002) employed another segmentation
approach using a TIN structure for the data, in which the segments
are generated by iteratively merging the triangles based on
similarity measurements. Rottensteiner (2006) described a model for
the consistent estimation of building parameters, which is part of
the 3D building reconstruction. Geometric regularities were
included as soft constraints in the adjustment of the model. Robust
estimation can then be used to eliminate false hypotheses about
geometric regularities. A comparison between data- and model-driven
approaches for building reconstruction has been made, which states
that the model-driven approach is faster and does not visually
deform the building model. In contrast, the data-driven approach
tends to model each building detail to obtain the nearest
polyhedral model, but it usually visually deforms the real shape
of the building (Tarsha Kurdi et al., 2007).
A projection based approach for 3D model generation of buildings
from high resolution airborne LIDAR data has been proposed by
Arefi et al. (2008). The building blocks are divided into smaller
parts according to the location and direction of the ridge lines.
A projection based method is then applied to generate a CAD model
of each building part.
Kada and McKinley (2009) utilized a library of parameterized
standard shapes of models to reconstruct building blocks. The
buildings are partitioned into non-intersecting sections, for which
roof shapes are then determined from the normal directions of the
LIDAR points.
In this paper we propose a method which aims at simplifying
the 3D reconstruction of building blocks by decomposing the
overall model into several smaller ones corresponding to each
building part. A similar method has already been reported by
the author (Arefi, 2009) for reconstruction from high resolution
LIDAR data. In this paper, due to the lower quality of the DEM
produced by stereo matching of satellite data (Worldview–2)
compared to LIDAR data, an additional data source is employed.
Accordingly, the Worldview orthorectified image is employed for
a better extraction of the ridge lines. For each ridge line a
projection-based algorithm is employed to transfer the 3D
points into 2D space by projecting the corresponding pixels of
each building part onto a 2D plane that is defined based on the
orientation of the ridge line. According to the type of the roof,
a predefined 2D model is fitted to the data and in the next step,
the 2D model is extended to 3D by analyzing the third dimension
of the points. A final model of the parametric roof structures of
the building block is defined by merging all the individual models
and applying some post processing refinements to the coinciding
nodes and corners to shape the appropriate model. Additionally,
prismatic models with flat roofs are provided for the remaining
areas that do not contain ridge lines. Finally, all parametric and
prismatic models are merged to form a final 3D model of the
building.
In this section a new method is proposed for the reconstruction of
buildings by integrating Digital Elevation Models (DEM) produced
from Worldview–2 stereo satellite images and orthorectified image
information.
Worldview–2 provides panchromatic images with 50 cm ground
sampling distance (GSD) as well as eight-band multispectral images
with 1.8 m GSD. A DEM is produced from the panchromatic
Worldview–2 images with 50 cm image resolution using a fully
automated method (d’Angelo et al., 2009) based on the semiglobal
stereo matching algorithm using mutual information proposed by
Hirschmüller (2008).
The automatic 3D building reconstruction algorithm proposed in
this paper comprises the following four major steps:
1. Ridge-based decomposition of building parts
2. Projection-based reconstruction of parametric roofs
3. Approximation of the polygons relating to flat roof segments
4. Merge parametric and prismatic models
Figure 1 presents the proposed work flow for automatic genera-
tion of building models using a projection based algorithm. De-
tailed explanations are given in the following chapters.
[Figure 1 flowchart; box labels: Building Segment, Height Pixels, Ortho Image, Surface Normal, Regional Maxima, Edge Information, Ridge Lines, Projection to 2D, 2D Model, Extension to 3D, Merge Parametric Models, Ground Plane, Approximation of Remaining Parts, Prismatic Model, Merge to Final 3D Model]
Figure 1: Work flow for projection based 3D building reconstruction
2.1 Ridge-based decomposition of building parts
The idea of 3D building reconstruction proposed in this paper
is to simplify the modeling process by decomposing the overall
building model into smaller tiles based on the location of
the ridge lines. Accordingly, the location of the ridge lines in
buildings with tilted roof structures should be carefully
extracted. The quality of the final model is directly related to
the quality of the extracted ridge lines, i.e., a high quality
ridge line leads to a higher quality 3D model. The location of the
ridge line has two major roles in this modeling approach:
• Ridge lines are the basis for decomposing the building block into
smaller tiles.
• Ridge lines are the basis for projection based model generation
of each part.
Therefore, the first and most important part of generating
3D models of building parts containing tilted roof structures is
extracting the ridge lines. Arefi (2009) proposed an algorithm to
extract the ridge location from high resolution airborne LIDAR
data using morphological geodesic reconstruction (Gonzalez and
Woods, 2008). Due to the lower quality of the DEM created from
Worldview stereo images compared to LIDAR data, a method
relying only on height data does not produce appropriate ridge
pixels. In this paper, a method integrating orthorectified image
and DEM information is applied for high quality ridge line
extraction (cf. Figure 1). The procedure to extract all the ridge
lines corresponding to a building with tilted roofs begins with
feature extraction. For this purpose, three feature descriptors are
extracted from the DEM and the ortho image as follows (cf. Figure 2):
(a) Worldview DEM (b) Ortho photo
(c) Surface normals (d) Regional maxima (e) Canny edges
Figure 2: Feature extraction from DEM and orthorectified images
1. Surface normals on DEM: The surface normal is a vector
perpendicular to a surface which represents the orientation
of the surface at a pixel. It can be estimated by determining
the best fitting plane over a small neighborhood. A normal
vector can also be computed by means of the cross product
of any two non-collinear vectors that are tangent to the
surface at the desired pixel (Jain and Dubes, 1988). Figure
2(c) shows the surface normals generated from the Worldview
DEM. This feature descriptor is employed to separate
the pixels with a sharp height discontinuity, e.g., eaves, from
the other edge pixels.
2. Regional maxima from DEM: Here, an algorithm based on
image reconstruction using geodesic morphological dilation
(Arefi and Hahn, 2005) is employed to extract the regional
maxima regions. Geodesic dilation differs from basic dilation,
in which only an image and a structuring element are involved
in the filtering process. In geodesic dilation the dilated image
is additionally “masked” with a predefined “mask” image.
Equation 1 shows the geodesic dilation of image J (marker)
using mask I. In most applications, the marker image is
defined by a height offset to the mask image, which generally
represents the original DEM. Figure 3 illustrates the
difference between geodesic and basic image dilation as well as
reconstruction based on geodesic dilation in a profile view
of a simple building with gable roof. The input image 3(a),
called marker, is enlarged by dilation 3(b), and limited by
the mask image (I) (cf. Figure 3(c)). The result of geodesic
dilation is shown in Figure 3(d) and a dashed line around
it depicts the mask image. If this process, i.e., dilation and
limitation by the mask, is iteratively continued, it stops after n
iterations (here four), when stability is reached. The result
provided by this step is called reconstruction of marker (J) by
mask (I) using geodesic dilation (cf. Figure 3(g)). The number of
iterations, i.e., n in Equation 2, needed to create the
reconstructed image varies from one sample to another. In the
example presented in Figure 3 the reconstruction procedure stops
after four iterations.
Figure 3: Applying geodesic reconstruction to extract the top pixels of a sample building
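Accordingly, geodesic dilation (δI) and image reconstruction
are defined as
$$\delta_I^{(1)}(J) = (J \oplus B) \wedge I, \tag{1}$$
$$\delta_I^{(n)}(J) = \underbrace{\delta_I^{(1)} \circ \delta_I^{(1)} \circ \cdots \circ \delta_I^{(1)}}_{n\ \text{times}}(J), \tag{2}$$
where B denotes the structuring element.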
Equation (2) defines the morphological reconstruction of the
marker image (J) based on geodesic dilation (δI) (cf. Equation 1).
The basic dilation (δ) of the marker and the point-wise minimum
(∧) between the dilated image and the mask (I) are applied
iteratively until stability. The reconstructed image of the
example depicted in Figure 3 shows that the upper part of the
object, i.e., the difference between marker and mask, is
suppressed during image reconstruction. Therefore, the result of
gray scale reconstruction depends on the height offset between
the marker and the mask images and, accordingly, different height
offsets suppress different parts of the object. More information
regarding the segmentation of DEMs by gray scale reconstruction
using geodesic dilation can be found in Arefi (2009), where
similar algorithms are employed for extracting the 3D objects as
well as the ridge lines from high resolution LIDAR DSMs. In a
segmentation algorithm based on geodesic reconstruction,
selecting an appropriate “marker” image plays the main role and
has a direct effect on the quality of the final reconstructed
image. A “marker” image with a small offset, e.g., a few meters,
from the “mask” mainly suppresses local maxima regions, similar
to the artifacts above the ground.
3. Canny edges from orthorectified image: Figure 2(e) shows
the result of applying the Canny edge detector to the
orthorectified image of the selected building. The Canny edge
extraction method looks for local maxima of the gradient of
the image.
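The reconstruction by geodesic dilation described in item 2 can be sketched on a one-dimensional roof profile. This is a minimal illustration, not the authors' implementation: the flat three-sample structuring element, the function names and the sample profile are assumptions made for the example.

```python
def dilate(profile):
    """Basic dilation with a flat 3-sample structuring element (assumed)."""
    n = len(profile)
    return [max(profile[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]


def geodesic_dilation(marker, mask):
    """One step of Equation 1: dilate the marker, then take the
    point-wise minimum with the mask."""
    return [min(d, m) for d, m in zip(dilate(marker), mask)]


def reconstruct(marker, mask):
    """Equation 2: repeat geodesic dilation until nothing changes.
    Returns the reconstructed profile and the number of effective steps."""
    current, steps = list(marker), 0
    while True:
        nxt = geodesic_dilation(current, mask)
        if nxt == current:
            return current, steps
        current, steps = nxt, steps + 1


def regional_maxima(dem, offset):
    """Flag samples that rise above the reconstruction of (dem - offset)."""
    marker = [z - offset for z in dem]
    reconstructed, _ = reconstruct(marker, dem)
    return [z - r > 0 for z, r in zip(dem, reconstructed)]


# Hypothetical gable roof profile in metres (ground at 0, ridge at 8).
profile = [0, 0, 5, 6, 7, 8, 7, 6, 5, 0, 0]
tops = regional_maxima(profile, offset=1.0)
```

With an offset of one metre only the ridge sample is flagged; a larger offset flags a wider band around the ridge, which mirrors the remark that different height offsets suppress different parts of the object.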
The three feature descriptors mentioned above are employed to
classify edge pixels extracted from the orthorectified image into
ridge and non-ridge classes. Figure 4(a) illustrates the pixels
which are classified as ridge pixels, plotted as red points. As
shown, not all red pixels correspond to ridges and therefore, an
additional procedure is included to separate horizontal pixels
from the other pixels. For this purpose, the pixels located at
almost the same height are extracted (cf. Figure 4(b)).
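One ingredient of this classification, the surface normal of item 1, can be sketched as follows. The normal is obtained as the cross product of two tangent vectors estimated by central differences; the 60 degree tilt threshold used to flag sharp height discontinuities and all function names are assumptions for illustration only, since the paper does not state the decision rule.

```python
import math


def surface_normal(dem, row, col, gsd=0.5):
    """Unit normal at an interior DEM pixel.

    Tangents t_x = (1, 0, dz/dx) and t_y = (0, 1, dz/dy) are estimated by
    central differences; their cross product is (-dz/dx, -dz/dy, 1).
    """
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * gsd)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * gsd)
    nx, ny, nz = -dz_dx, -dz_dy, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return nx / length, ny / length, nz / length


def tilt_degrees(normal):
    """Angle between the normal and the vertical axis."""
    return math.degrees(math.acos(normal[2]))


def is_height_jump(dem, row, col, gsd=0.5, max_tilt=60.0):
    """Assumed rule: a very steep normal indicates an eave-like jump."""
    return tilt_degrees(surface_normal(dem, row, col, gsd)) > max_tilt


# Hypothetical 3 x 3 patches: a 45 degree ramp and a 10 m wall.
ramp = [[0.0, 1.0, 2.0]] * 3
wall = [[0.0, 0.0, 10.0]] * 3
```

Here `is_height_jump(wall, 1, 1)` is true while the same test on `ramp` with `gsd=1.0` is false, so roof planes survive and eave pixels can be set apart.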
Next, the RAndom Sample Consensus (RANSAC) algorithm (Fischler
and Bolles, 1981) is employed to extract the corresponding ridge
lines from the classified pixels (cf. Figure 4(c)).
(a) Potential ridge points (b) Classification of heights (c) RANSAC lines
Figure 4: Ridge extraction
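The RANSAC step for a single ridge line can be sketched as below. This is a generic two-point RANSAC line fit, not the authors' code; the distance threshold, the iteration count, the fixed seed and the sample points are assumptions chosen for the example.

```python
import math
import random


def fit_line_ransac(points, threshold=0.5, iterations=200, seed=0):
    """Return the largest set of points lying within `threshold` of a line
    through two randomly sampled points."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue  # coincident sample, no line defined
        # Perpendicular distance of each point to the sampled line.
        inliers = [
            (x, y)
            for x, y in points
            if abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / length
            <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Hypothetical ridge candidates: 20 pixels on a straight ridge plus 3 outliers.
ridge_pixels = [(float(x), 10.0) for x in range(20)]
outliers = [(3.0, 14.0), (8.0, 5.0), (15.0, 17.0)]
inliers = fit_line_ransac(ridge_pixels + outliers)
```

For a building with several ridges, the fit would be repeated on the points that remain after removing the inliers of each extracted line.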
2.2 Projection-based reconstruction of parametric roofs
In the proposed algorithm for 3D reconstruction of buildings
containing tilted roofs it is assumed that an individual building
part exists for each ridge line. Therefore, for each ridge line
and the pixels located in its buffer zone, a 3D model is fitted.
In order to extract the pixels corresponding to each ridge line, a
buffer zone around each ridge line is considered and the localized
pixels in that buffer zone are analyzed for model generation.
In Figure 5(a) the red points represent the localized points
corresponding to the blue ridge line.
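Selecting the pixels in the buffer zone of a ridge line amounts to a point-to-segment distance test. The sketch below assumes that the buffer is measured from the ridge segment including its end points and that points are given as (x, y, z) tuples; the buffer width and the function names are illustrative only.

```python
import math


def distance_to_segment(p, a, b):
    """Planimetric distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def localize(points, ridge_start, ridge_end, buffer_width):
    """Keep the (x, y, z) points whose footprint lies inside the buffer."""
    return [
        p for p in points
        if distance_to_segment(p[:2], ridge_start, ridge_end) <= buffer_width
    ]


# Hypothetical height points around a ridge from (0, 0) to (10, 0).
points = [(5.0, 2.0, 7.0), (5.0, 4.0, 6.0), (12.0, 0.0, 5.0), (14.0, 0.0, 5.0)]
selected = localize(points, (0.0, 0.0), (10.0, 0.0), buffer_width=3.0)
```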
The procedure continues by projecting the localized points onto a
2D plane perpendicular to the ridge direction (cf. Figure 5(b)).
The overall aim in this step is to look at the building part from
the front view defined by the ridge direction and to extract the
2D model related to the front and back side of the building part
that has maximum support from the pixels. Therefore, two vertical
lines relating to the walls and two inclined lines relating to the
roof faces are defined (cf. Figure 5(b)). The quality of the 2D
model in this step depends on the existence of a sufficient number
of height points relating to each side of the wall. It is common
in complex buildings that the number of supporting height points
for at least one side of the building part is not sufficient to
extract the corresponding vertical line. To cope with this problem
a vertical line is defined which is located symmetrically to the
side with more supporting points. Hence, the algorithm in this
step only extracts side walls having equal distances to the ridge
line.
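The projection and the symmetric wall placement can be sketched as follows. Each point is reduced to its signed horizontal offset from the ridge and its height. Taking the outermost offset of the better supported side as the wall position is a simplifying assumption of this sketch; the paper fits lines with maximum support instead.

```python
import math


def project_to_profile(points, ridge_start, ridge_end):
    """Map (x, y, z) points to (offset, z) in the plane perpendicular
    to the ridge direction."""
    dx = ridge_end[0] - ridge_start[0]
    dy = ridge_end[1] - ridge_start[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length  # horizontal unit vector across the ridge
    return [
        ((x - ridge_start[0]) * nx + (y - ridge_start[1]) * ny, z)
        for x, y, z in points
    ]


def symmetric_walls(profile):
    """Place both walls at the same distance from the ridge, using the
    side that is supported by more points (needs one off-ridge point)."""
    left = [u for u, _ in profile if u < 0]
    right = [u for u, _ in profile if u > 0]
    better = left if len(left) >= len(right) else right
    half_width = max(abs(u) for u in better)
    return -half_width, half_width


# Hypothetical points around a ridge from (0, 0) to (10, 0).
cloud = [(5.0, 3.0, 6.0), (6.0, 2.0, 7.0), (2.0, -4.0, 5.0)]
profile_2d = project_to_profile(cloud, (0.0, 0.0), (10.0, 0.0))
walls = symmetric_walls(profile_2d)
```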
In order to shape the final 3D model relating to the building part,
the 2D model is converted back to 3D by extruding it orthog-
onally to the projection plane. The 3D model consists of four
walls plus one to four roof planes: two inclined planes in addi-
tion to two vertical triangular planes for a gable roof, and four
inclined planes for a hipped roof (cf. Figure 5(c)).
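For a gable roof, the extrusion of the 2D profile along the ridge can be sketched as below. The profile is assumed to be summarised by a half width, an eave height and a ridge height; these parameter names and the returned dictionary layout are hypothetical.

```python
import math


def extrude_gable(ridge_start, ridge_end, half_width, eave_z, ridge_z):
    """Sweep a symmetric gable profile along the ridge and return the
    roof vertices as (x, y, z) tuples."""
    (x1, y1), (x2, y2) = ridge_start, ridge_end
    length = math.hypot(x2 - x1, y2 - y1)
    # Horizontal unit vector across the ridge, i.e. inside the projection plane.
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length
    eaves = [
        (x + side * half_width * nx, y + side * half_width * ny, eave_z)
        for x, y in (ridge_start, ridge_end)
        for side in (-1, 1)
    ]
    ridge = [(x1, y1, ridge_z), (x2, y2, ridge_z)]
    return {"ridge": ridge, "eaves": eaves}


part = extrude_gable((0.0, 0.0), (10.0, 0.0),
                     half_width=4.0, eave_z=6.0, ridge_z=9.0)
```

The four eave vertices, dropped to the ground height, also give the footprint of the four walls of the building part.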
After reconstructing 3D models for all building parts, they are
merged to form the overall 3D model of the building. Figure
5(d) displays a building model produced by merging three build-
ing parts. The three ridge lines lead to three parametric building
models with hipped roofs. The method contains some extra pro-
cesses to refine the nodes which represent the same corner. If the
nodes are close to each other an average location is determined.
(a) Localized pixels (b) Fitting 2D model
(c) 3D model of building part (d) Merge parametric models
Figure 5: Projection based model generation
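The node refinement after merging, in which nodes that represent the same corner are replaced by an average location, can be sketched with a simple greedy grouping. The one metre tolerance and the greedy strategy are assumptions of this sketch; the paper only states that close nodes are averaged.

```python
import math


def merge_close_nodes(nodes, tolerance=1.0):
    """Group nodes lying within `tolerance` of a group's current mean and
    return one averaged node per group."""
    clusters = []
    for node in nodes:
        for cluster in clusters:
            centre = [sum(c) / len(cluster) for c in zip(*cluster)]
            if math.dist(node, centre) <= tolerance:
                cluster.append(node)
                break
        else:
            clusters.append([node])
    return [
        tuple(sum(c) / len(cluster) for c in zip(*cluster))
        for cluster in clusters
    ]


# Hypothetical corner nodes of two adjacent building parts.
nodes = [(0.0, 0.0, 10.0), (0.4, 0.2, 10.0), (5.0, 5.0, 12.0)]
merged = merge_close_nodes(nodes)
```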
2.3 Approximation of the remaining polygons and generating prismatic models
Two algorithms are proposed for approximation of the building
polygons based on the main orientation of the buildings (Arefi et
al., 2007). The algorithms are selected according to the number
of main orientations of the buildings and implemented as follows:
• If the building is formed by a rectilinear polygon, i.e., its sides
are perpendicular to each other in the top view, a method
based on the Minimum Bounding Rectangle (MBR) is applied
for approximation. This method is a top-down, model-based
approach that hierarchically optimizes the initial rectilinear
model by fitting MBRs to all details of the data set. The
principles of MBR based polygon approximation are presented in
Figure 6.
• If the building is not rectilinear, i.e., at least one side is
not perpendicular to the other sides, the RANSAC based
method is applied for approximation. In this algorithm
straight lines are repeatedly extracted using the RANSAC
algorithm and merged to form the final polygon. Figure 7
shows the RANSAC based approximation of the same building
represented in Figure 6.
(e) New regions produced by subtraction of (c) and (d)
(f) Superimposed final rectilinear polygons (red) on DEM
Figure 6: MBR based polygon approximation
Figure 7: Approximation of polygon obtained using RANSAC
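The first level of the MBR based approximation can be sketched as a bounding rectangle aligned with the main orientation of the building. The sketch assumes that this orientation is already known as an angle in degrees; the hierarchical refinement, in which further rectangles are fitted to the regions left after subtraction, is not shown.

```python
import math


def oriented_mbr(points, angle_deg):
    """Bounding rectangle of 2D points whose sides follow `angle_deg`.
    Returns the four corners in the original coordinate frame."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Rotate by -angle so that the main orientation aligns with the x axis.
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in points]
    us = [u for u, _ in rotated]
    vs = [v for _, v in rotated]
    box = [(min(us), min(vs)), (max(us), min(vs)),
           (max(us), max(vs)), (min(us), max(vs))]
    # Rotate the rectangle corners back by +angle.
    return [(u * c - v * s, u * s + v * c) for u, v in box]


# Hypothetical boundary points of a building segment.
axis_aligned = oriented_mbr([(1.0, 1.0), (4.0, 2.0), (2.0, 5.0)], 0.0)
diamond = oriented_mbr([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)], 45.0)
```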
In order to include the other structures (here, with flat roof) in
the merged parametric model generated in Section 2.2, the ground
plan of the merged model is compared with the approximated
polygon. In Figure 8(a) the area related to the parametric models
is plotted with blue lines and the polygon approximated by the MBR
based method is illustrated with red lines. The overall area of
the approximated polygon is subtracted from the corresponding
area for the parametric models. The positive pixels belong to
protrusions and the negative pixels are related to indentations.
The areas corresponding to the protrusions and indentations are
again approximated. The average of the heights of the internal
points of a protrusion area is used as the height of the building
part. This does not mean that the protrusion parts always have
flat roofs; however, since their roof types cannot be
distinguished by the proposed algorithm, a prismatic model is
fitted to the points.
(a) MBR based approximation (red) and parametric models
(b) Merged parametric and prismatic models
Figure 8: Generating final 3D model of a building containing
parametric and prismatic roof structures
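The comparison of the two ground plans and the height of the prismatic part can be sketched on binary raster masks. The sketch assumes the sign convention in which pixels covered only by the approximated polygon are positive (protrusions) and pixels covered only by the parametric models are negative (indentations), in line with Section 2.4; the mask layout and function names are illustrative.

```python
def compare_footprints(polygon_mask, parametric_mask):
    """Return the (row, col) cells of protrusions and indentations."""
    protrusions, indentations = [], []
    for r, (poly_row, par_row) in enumerate(zip(polygon_mask, parametric_mask)):
        for c, (in_poly, in_par) in enumerate(zip(poly_row, par_row)):
            diff = int(in_poly) - int(in_par)
            if diff > 0:
                protrusions.append((r, c))
            elif diff < 0:
                indentations.append((r, c))
    return protrusions, indentations


def prismatic_height(dem, cells):
    """Mean DEM height over the cells of one protrusion (cells non-empty)."""
    return sum(dem[r][c] for r, c in cells) / len(cells)


# Hypothetical 2 x 3 masks and DEM heights in metres.
polygon_mask = [[1, 1, 1], [1, 1, 0]]
parametric_mask = [[1, 1, 0], [1, 1, 1]]
dem = [[9.0, 9.0, 6.0], [9.0, 9.0, 0.0]]
protrusions, indentations = compare_footprints(polygon_mask, parametric_mask)
flat_roof_z = prismatic_height(dem, protrusions)
```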
2.4 Merge parametric and prismatic models
A final model of the building block is provided by adding the
prismatic model corresponding to the protrusion area to the
parametric models and excluding the indentation area from them.
The corresponding polygon nodes of the indentation and protrusion
regions are included in the overall 3D model. Finally, the
inclinations of the building roofs are adapted after including the
indentation nodes. Figure 8(b) shows the final 3D reconstruction
model of the building block after merging the parametric and
prismatic models. As shown, the building contains a dome shaped
part which is not properly modeled.
The proposed algorithm for 3D reconstruction of buildings
from the Worldview-2 DEM by integrating image information has
been tested in an area located in the city center of Munich,
Germany.
The area contains 7 buildings with different shapes that are all
modeled using the projection based approach. Figure 9 illustrates
the vector polygons corresponding to the 3D models plotted on the
orthorectified image 9(a) as well as on the Digital Elevation
Model 9(b). The visual interpretation of the models from the top
view (2D), compared to the orthorectified image and the DEM, shows
that almost all the extracted eave and ridge lines of the
buildings are at their correct locations. As mentioned, the model
can still be refined to generate coinciding corners.
Additionally, the comparison can be extended to 3D by
superimposing the representation of the parametric models on a 3D
surface generated from the DEM (cf. Figure 10). In this figure the
roof and wall polygons are filled with green and red colors,
respectively.
Accordingly, the quality of the model can be evaluated by the rate
of visible colors against gray (height) pixels. In areas where the
green color is visible, the produced roof model is higher than
the height pixels in the DEM. In contrast, visible gray pixels
on the roofs show that the roof model is located below the DEM
at those points. A similar conclusion describes the quality of the
walls against the DEM pixels. Figure 11 shows a picture provided
by “Google Earth” corresponding to the test area. It is captured
from a 3D view, which also indicates the quality of the produced
3D model. Comparison of the model represented in Figures 9 and 10
with this figure shows that there are still some small 3D
structures such as dormers and cone shaped objects that are not
modeled. This is because the DEM does not contain enough pixels
in those regions for model generation.
(a) 3D models superimposed on orthorectified image
(b) 3D models superimposed on DEM
Figure 9: Automatically generated 3D building models superimposed
on (a) orthorectified Worldview image and (b) DEM
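The visual check of roof colors against gray DEM pixels described above could be turned into numbers along the following lines. This is only a sketch of one possible measure, not an evaluation performed in the paper: the half metre tolerance, the use of `None` for pixels outside any roof polygon and the function name are assumptions.

```python
def roof_agreement(model_heights, dem_heights, tolerance=0.5):
    """Fractions of roof pixels where the model lies above, below or
    within `tolerance` of the DEM (needs at least one roof pixel)."""
    above = below = close = 0
    for model_row, dem_row in zip(model_heights, dem_heights):
        for m, d in zip(model_row, dem_row):
            if m is None:  # pixel not covered by a roof polygon
                continue
            diff = m - d
            if diff > tolerance:
                above += 1  # model visible above the DEM surface
            elif diff < -tolerance:
                below += 1  # DEM surface hides the model
            else:
                close += 1
    total = above + below + close
    return {"above": above / total, "below": below / total,
            "close": close / total}


# Hypothetical 2 x 2 rasters of roof model and DEM heights in metres.
model = [[10.0, 10.0], [None, 12.0]]
dem_patch = [[9.0, 10.2], [5.0, 13.0]]
rates = roof_agreement(model, dem_patch)
```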
An algorithm for automatic 3D reconstruction of buildings
from a Worldview-2 DEM is proposed which also uses edge
information from the orthorectified image. According to the ridge
information the building block is decomposed into several parts
depending on the number of ridge lines. For each ridge, a
projection plane is defined and all the points located in the
buffer zone of the ridge line are projected onto that plane. Next,
a 2D model which is supported by the maximum number of projected
points is derived and then extended to 3D to shape a hipped or
gabled roof (parametric model). Integrating all 3D models
corresponding to each ridge line produces the parametric model of
the building block. Additionally, prismatic models with flat roofs
are provided for the remaining areas that are not already modeled
by the projection based method. Finally, all parametric and
prismatic models are merged to form the final 3D model of the
buildings.
Figure 10: 3D representation of parametric models superimposed
on DEM
Figure 11: Google Earth view corresponding to the test area
The example used in the previous section to illustrate the
developed algorithms shows that the concept for building
reconstruction works quite well. Strengths of this projection
based approach are its robustness and its speed, because
projection into 2D space reduces the algorithmic complexity
significantly.
Arefi, H., 2009. From LIDAR Point Clouds to 3D Building Mod-
els. PhD thesis, Bundeswehr University Munich.
Arefi, H. and Hahn, M., 2005. A morphological reconstruction al-
gorithm for separating off-terrain points from terrain points in
laser scanning data. In: International Archives of Photogram-
metry, Remote Sensing and Spatial Information Sciences, Vol.
36 (3/W19).
Arefi, H., Engels, J., Hahn, M. and Mayer, H., 2007. Approxi-
mation of building boundaries. In: Urban Data Management
Systems (UDMS) Workshop, Stuttgart, pp. 25 – 33.
Arefi, H., Engels, J., Hahn, M. and Mayer, H., 2008. Levels of
Detail in 3D Building Reconstruction from LIDAR Data. In:
International Archives of the Photogrammetry, Remote Sens-
ing and Spatial Information Sciences, Vol. 37 (B3b), pp. 485 –
d’Angelo, P., Schwind, P., Krauß, T., Barner, F. and Reinartz,
P., 2009. Automated DSM based georeferencing of Cartosat-1
stereo scenes. In: HighRes09, pp. xx–yy.
Fischler, M. and Bolles, R., 1981. RAndom Sample Consensus:
A paradigm for model fitting with applications to image anal-
ysis and automated cartography. Communications of the ACM
24(6), pp. 381–395.
Geibel, R. and Stilla, U., 2000. Segmentation of laser-altimeter
data for building reconstruction: Comparison of different pro-
cedures. In: International Archives of Photogrammetry and
Remote Sensing and Spatial Information Sciences, Vol. 33
(B3), pp. 326 – 334.
Gonzalez, R. C. and Woods, R. E., 2008. Digital Image Processing.
Prentice Hall, Upper Saddle River, NJ.
Gorte, B., 2002. Segmentation of TIN-structured surface mod-
els. In: International Archives of Photogrammetry and Remote
Sensing and Spatial Information Sciences, Vol. 34 (4).
Hirschmüller, H., 2008. Stereo processing by semiglobal matching
and mutual information. IEEE Trans. Pattern Anal. Mach.
Intell. 30(2), pp. 328–341.
Jain, A. and Dubes, R. C., 1988. Algorithms for Clustering Data.
Prentice Hall, Englewood Cliffs, NJ.
Kada, M. and McKinley, L., 2009. 3D Building Reconstruction
from Lidar Based on a Cell Decomposition Approach. pp. 47–
Rottensteiner, F., 2006. Consistent estimation of building pa-
rameters considering geometric regularities by soft constraints.
In: International Archives of Photogrammetry, Remote Sens-
ing and Spatial Information Sciences, Vol. 36 (3), pp. 13 – 18.
Rottensteiner, F. and Jansa, J., 2002. Automatic extraction of
buildings from LIDAR data and aerial images. In: International
Archives of Photogrammetry, Remote Sensing and Spatial
Information Sciences, Vol. 34 (4), pp. 569–574.
Tarsha Kurdi, F., Landes, T., Grussenmeyer, P. and Koehl, M.,
2007. Model-driven and data-driven approaches using LIDAR
data: Analysis and comparison. In: International Archives
of Photogrammetry, Remote Sensing and Spatial Information
Sciences, Vol. 36 (3-W49A), pp. 87 – 92.
... The Classical morphological approach [2] for example just searches for each point in a DSM the lowest neighbour in a given radius around this point. Newer approaches include e.g. the geodesic dilation as described in [3] or the multi-directional slope dependent (MSD) DTM generation method by [4] which is an extension to the directional filtering concept of [5]. The Normalized Volume above Ground (NVAG) method proposed in [6] is an extension to [4] including also the volume of the objects on the scanline. ...
In this paper we will present a simplified approach for extracting the ground level – a digital terrain model (DTM) – from the surface provided in a digital surface model (DSM). Most existing algorithms try to find the ground values in a digital surface model. Our approach works the opposite direction by detecting probable above ground areas. The main advantage of our approach is the possibility to use it with incomplete DSMs containing much no data values which can be e.g. occlusions in the calculated DSM. A smoothing or filling of such original derived DSMs will destroy much information which is very useful for deriving a ground surface from the DSM. Since the presented approach needs steep edges to detect potential high objects it will fail on smoothed and filled DSMs. After presenting the algorithm it will be applied to a test area in Salzburg and compared to a terrain model freely available from the Austrian government.
... Locating and determining roof ridges is significant, as the resulting quality of the building model depends on the quality of this detection. Ridge lines are the basis for the division of buildings into separate parts and form the basis of modelling these parts (Arefi and Reinartz, 2011). ...
Full-text available
The results of Remote Piloted Aircraft System (RPAS) photogrammetry are digital surface models and orthophotos. The main problem of the digital surface models obtained is that buildings are not perpendicular and the shape of roofs is deformed. The task of this paper is to obtain a more accurate digital surface model using building reconstructions. The paper discusses the problem of obtaining and approximating building footprints, reconstructing the final spatial vector digital building model, and modifying the buildings on the digital surface model.
... Spaceborne VHR optical sensors are able to scan any point of the Earth surface with a ground spatial resolution around half meter and a revisit time in the order of few days; through their very agile manoeuvring they can also acquire stereo data within the same orbit just using the CCD line combination, by pointing at the same area from two or more orbit positions (Poli and Toutin, 2012). Thanks to the improved acquisition technology, this new class of VHR sensors allows surface modelling up to building level of detail, in 2.5D (DSM) or even in 3D (object extraction) (Poli et al. 2009;Arefi and Reinartz, 2011;d'Angelo and Reinartz, 2011;Capalbo et al., 2012;Poli and Caravaggi, 2012). The paper introduces the project, with information on the testfield characteristics and activity plan, and describes the processing carried out on WorldView-2 and GeoEye-1 stereopairs. ...
Full-text available
Today the use of spaceborne Very High Spatial Resolution (VHSR) optical sensors for automatic 3D information extraction is increasing in the scientific and civil communities. The 3D Optical Metrology (3DOM) Unit of the Bruno Kessler Foundation (FBK) in Trento (Italy) has collected stereo VHSR satellite imagery, as well as aerial and terrestrial data over Trento, with the aim to create a complete data collection with state-of-the-art datasets for investigations on image analysis, automatic digital surface model (DSM) generation, 2D/3D feature extraction, city modelling and data fusion. The testfield region covers the city of Trento, characterised by very dense urban (historical centre), residential and industrial areas, and the surrounding hills and steep mountains (approximate height range 200-2100 m) with cultivations, forests and bare soil. This paper reports the analysis conducted in FBK on the VHSR spaceborne imagery of Trento testfield for 3D information extraction. The data include two stereo-pairs acquired by WorldView-2 in August 2010 and by GeoEye-1 in September 2011 in panchromatic and multispectral mode, together with their original Rational Polynomial Coefficients (RPC), and the position and description of well distributed ground points. For reference and validation, a DSM from airborne LiDAR acquisition is used. The paper gives details on the project and the dataset characteristics. The results achieved by 3DOM on DSM extraction from WorldView-2 and GeoEye-1 stereo-pairs are shown and commented.
Conference Paper
Full-text available
A Digital Terrain Model (DTM) is a representation of the bare earth with elevations at regularly spaced intervals. This data is captured via aerial imagery or airborne laser scanning. Prior to use, all above-ground natural (trees, bushes, etc.) and man-made (houses, cars, etc.) structures need to be identified and removed so that the surface of the earth can be interpolated from the remaining points. Elevation data that includes above-ground objects is called a Digital Surface Model (DSM). A DTM is mostly generated by cleaning the objects from a DSM with the help of a human operator. Automating this workflow is an opportunity to reduce manual work, and this study aims to solve the problem using conditional adversarial networks. In theory, enough pairs of raw and cleaned data (DSM and DTM) provide suitable input for a machine learning system that translates raw (DSM) data into cleaned (DTM) data. Recent progress in topics like 'Image-to-Image Translation with Conditional Adversarial Networks' makes a solution to this problem possible. In this study, a specific conditional adversarial network implementation, "pix2pix", is adapted to this domain. Data for "elevations at regularly spaced intervals" is similar to image data: both can be represented as two-dimensional arrays (in other words, matrices). Every elevation point maps to exactly one image pixel, and even with 1-millimeter precision on the z-axis, any real-world elevation value can be safely stored in a data cell that holds 24-bit RGB pixel data. The total pixel count of the image therefore equals the total count of elevation points in the elevation data. Thus, elevation data for large areas results in sub-optimal input for "pix2pix" and requires tiling. Consequently, the challenge becomes finding the most appropriate image representation of elevation data to feed into the "pix2pix" training cycle. This involves iterating over "elevation-to-pixel-value-mapping functions" and dividing elevation data into sub-regions to obtain images that perform better in pix2pix.
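As a rough illustration of the 24-bit argument above, the following sketch packs an elevation into an RGB triple at 1 mm resolution. It is only one possible "elevation-to-pixel-value-mapping function", not the mapping used in the study: the function names, the 1000 m offset and the byte order are assumptions made for this example. A 24-bit value covers 2^24 mm (about 16.8 km), which comfortably spans terrestrial elevations.

```python
# Hypothetical helpers; the offset and the 1 mm step are illustrative choices.
MM_PER_M = 1000
OFFSET_M = 1000.0      # assumed shift so elevations below sea level stay non-negative
MAX_CODE = 2**24 - 1   # largest integer a 24-bit RGB pixel can hold


def encode_elevation(z_m: float) -> tuple[int, int, int]:
    """Pack an elevation in metres into an (R, G, B) triple at 1 mm resolution."""
    code = round((z_m + OFFSET_M) * MM_PER_M)
    if not 0 <= code <= MAX_CODE:
        raise ValueError(f"elevation {z_m} m is outside the encodable range")
    # Split the 24-bit integer into three 8-bit channels, most significant first.
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


def decode_elevation(rgb: tuple[int, int, int]) -> float:
    """Invert encode_elevation and return the elevation in metres."""
    r, g, b = rgb
    code = (r << 16) | (g << 8) | b
    return code / MM_PER_M - OFFSET_M
```

With these assumptions the encodable range runs from -1000 m to roughly 15777 m. A mapping like this is lossless, but neighbouring elevations can differ sharply in the red and green channels, which is one reason why comparing several mapping functions, as the abstract describes, may be worthwhile.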
The work presented in this paper shows the possibility of automatically extracting three-dimensional urban objects from very high resolution (VHR) satellite scenes from anywhere in the world. Current VHR satellites like GeoEye, WorldView-2 or -3, or the Pléiades system have ground sampling distances (GSD, "pixel sizes") of 0.3 to 0.7 metres. All these systems also allow the acquisition of in-orbit stereo images. These are two or more images of the same location on the ground, acquired in the same orbit of the satellite from different viewing angles, mostly only some seconds apart. From such stereo images (or multi-stereo images, if more than two were acquired), a high resolution digital surface model (DSM) with the same GSD as the stereo imagery can be extracted in a first step. In the second step, the inevitable errors and holes in the generated DSM are corrected and filled using the multispectral imagery. Besides the very high resolution panchromatic images that are used for the generation of the DSM, lower resolution multispectral images (normally about 1/4 of the resolution of the panchromatic bands) are also acquired. These contain at least the four visible/NIR (VNIR) bands blue, green, red and near-infrared (NIR). Some sensors have more VNIR bands, like WorldView-2 (coastal, blue, green, yellow, red, red-edge and two NIR bands), or additional short-wave-infrared (SWIR) bands, like WorldView-3. From these multispectral bands, a spectral classification can be derived in a third step. This classification is used mainly for discriminating vegetation from non-vegetation areas and for detecting water areas. The last step in this pre-processing comprises the correct orthorectification of the DSM and the pan-sharpened multispectral image. After this pre-processing of the stereo imagery, urban objects like buildings, trees, roads and bridges can be detected, and in a last step these objects are modelled to produce a final object model of the satellite scene or parts of it. In this paper the method is described and applied to example satellite imagery.
Using the capability of WorldView-2 to acquire very high resolution (VHR) stereo imagery together with as many as eight spectral channels allows the worldwide monitoring of any built-up areas, like cities in evolving states. In this paper we show the benefit of generating a high resolution digital surface model (DSM) from multi-view stereo data (PAN) and fusing it with pan-sharpened multispectral data to arrive at very detailed information in city areas. The fused data allow accurate object detection and extraction and thereby also automated object-oriented classification and future change detection applications. The methods proposed in this paper exploit the full range of capacities provided by WorldView-2: the high agility to acquire a minimum of two, but also more, in-orbit images with small stereo angles, the very fine ground sampling distance (GSD) of about 0.5 m, and the full usage of the standard four multispectral channels blue, green, red and near infrared together with the additional channels specific to WorldView-2: coastal blue, yellow, red-edge and a second near infrared channel. From the very high resolution stereo panchromatic imagery a so-called height map is derived using the semi-global matching (SGM) method developed at DLR. This height map fits exactly on one of the original pan-sharpened images. This in turn is used for an advanced rule-based fuzzy spectral classification. Using these classification results the height map is corrected, and finally a terrain model and an improved normalized digital elevation model (nDEM) are generated. Fusing the nDEM with the classified multispectral imagery allows the extraction of urban objects like buildings or trees. If such datasets are generated for different times, expert object-based change detection (in quasi-3D space) and automatic surveillance become possible.
A hybrid model based on Toutin's 3D deterministic model, developed at the Canada Centre for Remote Sensing and taking full advantage of the image metadata, was used to geometrically process WorldView-1 and -2 stereo images without collecting in situ ground control points (GCPs). Elevations were thus extracted and compared to 0.2-m accurate lidar elevation data to assess the impact of omitting GCPs. Elevation linear errors with 68% confidence level (LE68) computed over bare surfaces were 2.6 m and 2.1 m for WorldView-1 and -2, respectively, with small biases. The difference in LE68 compared to the solution using accurate GCPs is not significant (around 10–20 cm), which offers strong advantages for operation without control point collection, mainly in remote and harsh environments or when cartographic or ground control data do not exist.
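To make the two reported quantities concrete, here is a minimal sketch of how a bias and an LE68 value could be computed from paired elevations. The function name is hypothetical, and the nearest-rank percentile of the absolute errors is only one common convention; the abstract does not state which estimator the authors used.

```python
import math


def elevation_error_stats(extracted, reference):
    """Return (bias, le68) for paired elevations in metres.

    bias is the mean signed error (extracted minus reference).
    le68 is taken here as the 68th percentile (nearest-rank method) of the
    absolute errors, which is an assumption of this sketch.
    """
    if len(extracted) != len(reference) or not extracted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [e - r for e, r in zip(extracted, reference)]
    bias = sum(errors) / len(errors)
    abs_sorted = sorted(abs(err) for err in errors)
    rank = math.ceil(0.68 * len(abs_sorted))  # nearest-rank percentile index
    return bias, abs_sorted[rank - 1]
```

For example, `elevation_error_stats(stereo_z, lidar_z)` would be called with stereo-derived and lidar elevations sampled at the same bare-surface locations; both argument names are placeholders.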
During the last decade various techniques have been proposed to extract the ground surface from airborne LIDAR data. The basic problem is the separation of terrain points from off-terrain points, both of which are recorded by the LIDAR sensor. In particular, geometry-driven filtering, detection or classification procedures have been developed which use prior knowledge to find points, e.g. on buildings or vegetation. Depending on the application, the off-terrain points are excluded from further processing, e.g. for DTM generation, or used, e.g. for building reconstruction. In this paper a new method is proposed to separate 3D off-terrain points from the terrain points. Morphological grayscale reconstruction plays the key role in the proposed algorithm to produce the bare ground. After a short description of morphological reconstruction, an algorithm based on this technique is presented. Issues of the implementation of the morphological reconstruction algorithm are discussed and illustrated. Experiments are carried out with different LIDAR data sets, which demonstrate the capability of the process.
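The core operation named above, grayscale reconstruction by dilation, can be sketched in a few lines. This is a generic textbook formulation, not the algorithm of the paper: the function names are hypothetical, and the choice of marker (DSM values on the grid border, the global minimum elsewhere) is an assumption made for this example. With that marker, raised objects that do not touch the border are flattened to the level of their surroundings.

```python
def reconstruct_by_dilation(marker, mask):
    """Grayscale reconstruction of `mask` from `marker` (4-connectivity).

    Both arguments are equally sized lists of lists with marker <= mask
    everywhere. The marker is dilated and clipped by the mask repeatedly
    until nothing changes.
    """
    rows, cols = len(mask), len(mask[0])
    current = [row[:] for row in marker]
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                best = current[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        best = max(best, current[rr][cc])
                best = min(best, mask[r][c])  # geodesic constraint: stay under the mask
                if best != current[r][c]:
                    current[r][c] = best
                    changed = True
    return current


def ground_from_dsm(dsm):
    """Estimate a ground surface; the border-based marker is an assumed choice."""
    rows, cols = len(dsm), len(dsm[0])
    floor = min(min(row) for row in dsm)
    marker = [
        [dsm[r][c] if r in (0, rows - 1) or c in (0, cols - 1) else floor
         for c in range(cols)]
        for r in range(rows)
    ]
    return reconstruct_by_dilation(marker, dsm)
```

Subtracting the result of `ground_from_dsm` from the DSM gives the heights of off-terrain objects. The sketch ignores objects that touch the grid border and can overestimate the ground under objects on sloped terrain, and the simple raster-scan loop is far slower than the queue-based implementations used in practice.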
This paper is focused on two topics: first, it deals with a technique for the automated generation of 3D building models from directly observed LIDAR point clouds and digital aerial images, and second, it describes an object-relational technique for handling hybrid topographic data in a topographic information system. Automatic building extraction combining the mentioned data sources consists of three steps. First, candidate regions for buildings have to be detected. After that, initial polyhedral building models have to be created in these candidate regions in a bottom-up procedure. Third, these initial polyhedral building models have to be verified in the images to improve the accuracy of their geometric parameters. This paper describes the current state of development, the overall work flow, and the algorithms used in its individual stages. Intermediate results are presented.
The reconstruction of 3D city models has matured in recent years from a research topic and niche market to commercial products and services. When constructing models on a large scale, it is essential to have reconstruction tools available that offer a high level of automation and reliably produce valid models within the required accuracy. In this paper, we present a 3D building reconstruction approach which produces LOD2 models from existing ground plans and airborne LIDAR data. As well-formed roof structures are of high priority to us, we developed an approach that constructs models by assembling building blocks from a library of parameterized standard shapes. The basis of our work is a 2D partitioning algorithm that splits a building's footprint into nonintersecting, mostly quadrangular sections. A particular challenge is to generate a partitioning of the footprint that approximates the general shape of the outline with as few pieces as possible. Once the partitioning is at hand, each piece is given a roof shape that best fits the LIDAR points in its area and integrates well with the neighbouring pieces. An implementation of the approach has been used for quite some time in a production environment, and many commercial projects have been successfully completed. The second part of this paper reflects on the experience we have gained with this approach while working on the 3D reconstruction of the entire cities of East Berlin and Cologne.
This paper describes a model for the consistent estimation of building parameters that is a part of a method for automatic building reconstruction from airborne laser scanner (ALS) data. The adjustment model considers the building topology by GESTALT observations, i.e. observations of points being situated in planes. Geometric regularities are considered by "soft constraints" linking neighbouring vertices or planes. Robust estimation can be used to eliminate false hypotheses about such geometric regularities. Sensor data provide the observations to determine the parameters of the building planes. The adjustment model can handle a variety of sensor data and is shown to be also applicable for semi-automatic building reconstruction from image and/or ALS data. A test project is presented in order to evaluate the accuracy that can be achieved using our technique for building reconstruction from ALS data, along with the improvement caused by adjustment and regularisation. The planimetric accuracy of the building walls is in the range of or better than the ALS point distance, whereas the height accuracy is in the range of a few centimetres. Regularisation was found to improve the planimetric accuracy by 5-45%.
Segmentation is an important step during 3-D building reconstruction from laser altimetry data. The objective is to group laser points into segments that correspond to planar surfaces, such as facets of building roofs or the (flat) terrain between buildings. A segmentation method is presented that was inspired by a raster-based algorithm in the literature, but works on the original (triangulated) laser points. It iteratively merges triangles and already formed segments into larger segments. The algorithm is controlled by a single parameter: the maximum dissimilarity between adjacent segments at which merging them is still allowed. The resulting TIN segmentation method is compared with the 3-D Hough transform.
High quality and dense sampling are two major properties of recent airborne LIDAR data which are still improving. In this thesis a novel approach for generating 3D building models from LIDAR data is presented. It consists of four major parts: filtering of non-ground regions, segmentation and classification, building outline approximation, and 3D modeling. With filtering, non-ground structures are eliminated from the laser data. Image reconstruction by means of geodesic morphology is at the core of the proposed algorithm. Structures which do not comply in size or shape are suppressed. By interpolating the bald earth produced by filtering, Digital Terrain Models (DTM) are generated. Image segmentation creates the potential non-ground regions, which are subject to rule-based classification. Geometric feature descriptors based on surface normals, the local height variation, and a vegetation index are employed to classify data into buildings, trees, and other objects such as power lines and cranes. After building classification, the building outlines are extracted and unnecessary points are eliminated by two approximation procedures. One fits rectilinear polygons to the building outlines by a hierarchical adaptation of Minimum Bounding Rectangles (MBR). This works fast and reliably, but is restricted to rectangular shapes. For non-rectangular polygons, a Random Sample Consensus (RANSAC) based procedure is employed to fit straight lines. Lines are then intersected or joined. The automatic generation of 3D building models follows the definitions of the Levels of Detail (LOD) in the CityGML standard. Three LOD are considered in this thesis. The first LOD (LOD0) consists of the DTM extracted from the LIDAR data. A prismatic model containing the major walls of the building forms the LOD1. For it, the building roof is approximated by a horizontal plane. LOD2 includes the roof structures in the model. A model-driven approach based on the analysis of the 3D points in 2D projection planes is proposed to analyze the roof structure. Building regions are divided into smaller parts according to the direction and the number of ridge lines, the latter extracted using geodesic morphology. A 3D model is derived for each building part. Finally, a complete building model is formed by merging the 3D models of the building parts and adjusting the nodes after merging. Results for test data show the potential but also the shortcomings of the approach, including in comparison to related work.
High resolution stereo satellite imagery is well suited for the creation of digital surface models (DSM). A system for highly automated and operational DSM and orthoimage generation based on CARTOSAT-1 imagery is presented, with emphasis on fully automated georeferencing. The proposed system processes level-1 stereo scenes using the rational polynomial coefficients (RPC) universal sensor model. The RPC are derived from orbit and attitude information and have a much lower accuracy than the ground resolution of approximately 2.5 m. In order to use the images for orthorectification or DSM generation, an affine RPC correction is required. This normally requires expensive and cumbersome GCP acquisition. In this paper, GCPs are automatically derived from lower resolution reference datasets (Landsat ETM+ Geocover and SRTM DSM). The traditional method of collecting the lateral position from a reference image and interpolating the corresponding height from the DEM ignores the higher lateral accuracy of the SRTM dataset. Our method avoids this drawback by using an RPC correction based on DSM alignment, resulting in improved geolocation of both DSM and ortho images. The proposed method is part of an operational CARTOSAT-1 processor at Euromap GmbH for the generation of a high resolution European DSM. Checks against independent ground truth indicate a lateral error of 5-6 meters and a height accuracy of 1-3 meters.