ESTIMATING SPATIO-TEMPORAL URBAN DEVELOPMENT USING AI

M. A. Waseem, M. A. Basheer, M. Uppal, M. Tahir
Department of Electrical Engineering, Lahore University of Management Sciences, Lahore 54792, Pakistan
(m waseem, muhammad.basheer, momin.uppal, tahir)@lums.edu.pk
Commission IV, WG IV/9
KEY WORDS: Building Counts, Spatio-temporal profile, Digital Maps, Semantic Segmentation, Satellite Imagery, Remote Sensing, Urban data
ABSTRACT:
Estimating the spatio-temporal profile of building construction from high-resolution satellite images is a critical problem, since such profiles can support a variety of data-driven urban initiatives. One strategy is to extract building footprints and track them in multi-temporal data, as in the SpaceNet challenges. Although several solutions have been presented for this problem, the task becomes extremely difficult for partially obscured buildings with densely overlapping boundaries, such as those found in developing countries like Pakistan. In this paper, we therefore propose a framework that addresses this problem by merging built-up area segmentation with digital maps. In the first step, a satellite image is passed to a deep learning model that predicts segmentation masks of the built-up area; building construction profiles are then generated by overlaying digital maps on these predicted masks. We compare the results with ground truth profiles, and our results show that the proposed method extracts building counts and construction profiles with an accuracy of 95%.
1. INTRODUCTION
Urban planning has become more crucial than ever because of the rapidly changing urban environment and human development patterns. To meet the demands of present and future communities, urban planners will need to be more data-driven in their planning to enable optimal land and infrastructure solutions. In this regard, constructing a spatio-temporal profile of building development is vital, since it is used in a variety of applications such as urban sprawl analysis, population estimation, mobile targeting, management of infrastructure deployment, and improving citizens' access to services. Traditional methods for these tasks are usually based on onsite measurements and surveys that require considerable human effort, time, and resources. With advances in remote sensing technologies, it is now possible to extract these spatio-temporal profiles from high-resolution satellite images.
The task of extracting the spatio-temporal construction profile of buildings has been tackled primarily by three types of methods: classical methods, building footprint extraction methods, and regression methods. Each type is briefly reviewed in Section 2. Although these methods achieve excellent results and have been used effectively in a variety of applications, their performance remains uncertain for developing countries like Pakistan. The main reason is that these countries have diverse development patterns, including many partially occluded buildings that are densely packed together, so state-of-the-art building extraction and object detection methods fail to perform well. This problem is illustrated in Figure 1, where similar segmentation models have been applied to very high resolution (VHR) images of the USA and to relatively low-resolution images of densely packed areas of Pakistan. Building footprints are extracted easily and accurately for the USA, while
the model is unable to detect building boundaries accurately in the densely packed areas of Pakistan. In addition, the lack of publicly available very high to ultra-high resolution images (0.1 m to 0.01 m per pixel) makes it difficult to extract building footprints with high accuracy.
Figure 1. A comparison of building footprint extraction in different areas. On the left, building extraction has been performed successfully using deep learning models with high-resolution imagery of the USA (ESRI, 2020). On the right, similar methods have been applied to much lower-resolution imagery of densely packed areas of Pakistan.
To overcome these problems, we propose a novel two-step approach that combines digital maps with state-of-the-art deep learning methods to establish spatio-temporal building profiles at any given location. First, we train a deep learning model based on the DeepLabV3plus architecture (Chen et al., 2018a) for semantic segmentation of built-up areas in satellite imagery. The built-up segmentation masks are then overlaid with digital maps to extract the construction profile of each building in the area. The details of our methodology are given in Section 3. To obtain better results on such areas, we prepare a small dataset of 730 satellite images that includes nearly 25,000 buildings spanning around 57 sq. km of Lahore, the second largest city of Pakistan.
The main contributions of our work are: (1) we create a dataset that captures building patterns in complex urban scenarios, and (2) we develop a tool for extracting and tracking building construction profiles in time series. This technique could be employed by policy-makers and urban planners for evidence-based, data-driven policy making.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-4/W5-2022
7th International Conference on Smart Data and Smart Cities (SDSC), 19–21 October 2022, Sydney, Australia
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLVIII-4-W5-2022-197-2022 | © Author(s) 2022. CC BY 4.0 License.
The remainder of this paper is organized as follows. Section 2 reviews the building extraction and object detection methods used by other researchers. Section 3 presents the methodological framework and the end-to-end pipeline used in this research for developing building construction profiles. Section 4 quantifies the efficacy of the proposed method by comparing the generated results with ground truth data. Section 5 presents the main findings of this research and shows how the proposed pipeline can be used to estimate the growth in the number of buildings in different regions of Pakistan over time. Finally, Section 6 concludes the paper by outlining our contributions and providing future research directions.
2. RELATED WORK
In this section, we discuss the approaches employed in the existing literature for obtaining building profiles or estimating urban sprawl. As discussed earlier, three major types of methods have been used to extract and count buildings from satellite data: classical methods, building footprint extraction methods, and regression methods. We review each briefly.
2.1 Classical Methods
Several classical methods that do not use deep models have been proposed for estimating urban sprawl and population density. Most of these methods use Landsat images to perform land use/land cover (LULC) classification, and the number of pixels per class gives a rough estimate of urban growth (El Garouani et al., 2017; Shah et al., 2021; Sahana et al., 2018). However, such methods cannot give a quality estimate at the building level because of the low resolution of Landsat images (at best 15 m per pixel). Instead of Landsat images, some methods estimate urban sprawl through population density estimates (Zhang, 2003; Terzi and Kaya, 2008), which are rarely available for many underdeveloped and developing countries. Moreover, linear modeling has been applied to fine-resolution (1 m per pixel) LiDAR data along with satellite images to estimate building counts (Silvan-Cardenas et al., 2010), but again the approach is expensive and does not scale.
2.2 Building Footprint Extraction Methods
With the advent of remote sensing technology, researchers have been employing computer vision techniques to extract building footprints. Several mathematical models have been proposed for this purpose (Ok et al., 2013; Huang et al., 2014; Chen et al., 2018b), but they require very high resolution images (0.01 m per pixel) to give good results. To overcome this issue, researchers began to use deep learning approaches for building detection and footprint extraction. Freely available datasets and open challenges (Wang et al., 2016; Maggiori et al., 2017; Ji et al., 2018; ISPRS 2D Semantic Labeling Contest, n.d.; Van Etten et al., 2018; Gupta et al., 2019) have also boosted interest in this area. The SpaceNet challenges (Van Etten et al., 2018), in particular, have demonstrated the feasibility of extracting buildings from medium-resolution satellite images (1–4 m per pixel). Hence, many solutions have been proposed in recent years, including segmentation with post-processing (Yuan, 2018; Liu et al., 2018), instance segmentation (Wen et al., 2019; Zhao et al., 2018), generative adversarial networks (GANs) (Li et al., 2018; Shi et al., 2018), customized networks (Hui et al., 2019; Liu et al., 2019), and graph-based networks (Qin et al., 2018). Moreover, trained models are available on ArcGIS (ESRI, 2020) that can be used directly to extract building footprints. However, as explained in Section 1, these methods still struggle to deliver effective results for areas with varying architectural designs and tightly packed buildings with no visible gaps between them.
2.3 Regression Methods
Unlike the two types of methods above, which count buildings in an area only implicitly, several approaches in the literature count the number of buildings directly from satellite images and other GIS data. For example, micro-scale data have been used for spatio-temporal modeling of building population in highly urbanized areas (Greger, 2015). A mathematical model has been formulated that counts buildings by counting the objects of a certain class in a desired region (Meng et al., 2021). A deep learning-based regression model has been proposed for counting built-up areas in satellite imagery (Shakeel et al., 2019). To adapt building counting models pre-trained on developed countries to developing countries with unlabelled data, counting consistencies have been used (Zakria et al., 2021). Although these methods have proved quite effective, most of them provide counts for a whole region (either an image or an entire area) and do not provide within-region densities, i.e., estimates of the construction profile at the level of individual buildings.
3. METHODOLOGY
From the preceding section, it is clear that the explicit or implicit aim of the existing literature is to generate building profiles. However, each proposed solution has certain limitations. For instance, some methods need high-quality data, which prevents them from being extended to other areas, while others detect urban growth only at coarser urban scales and do not offer estimates at the individual building level. In this section, we present our proposed method for addressing these challenges. First, we discuss the data requirements and show how easily accessible data can be used to solve the problem. Next, we describe model training, where we use the collected data to train a deep learning model for semantic segmentation of built-up areas. Finally, we describe the end-to-end pipeline used to predict the construction profile of each individual building; the pipeline uses the trained model together with additional data extracted from digitized maps.
3.1 Data Requirements
Two key data sources are required by our proposed pipeline: (i) RGB satellite images of the region or area of interest, and (ii) geo-tagged digitized map data with the locations of all available plots in that area. We used satellite imagery from Google's repository, which is freely available for academic use. To create a semantic segmentation dataset for our region, we labelled a fraction of the images using GIS tools; training on these local labels is what allows our deep model to perform well on our areas. The digital map information was taken from the openly downloadable housing plans of more than 350 societies in Lahore, Pakistan. We now discuss each step in detail.
3.1.1 Labelling As highlighted in the preceding sections, models trained on data from the developed world do not perform well in segmenting built-up regions in developing nations. It is therefore necessary to gather segmentation data for these regions. To this end, we developed a dataset by manually marking building footprints for a portion of Lahore, a metropolitan city in Pakistan. We selected the Defence Housing Authority (DHA) region of Lahore, which is roughly 57 sq. km in area. Using satellite imagery, we manually mapped the footprints of the 24,928 building structures located in this region. Figure 2 shows an overview of the markings in this location. Since we are working with multi-temporal data, we also marked a portion of the historical images.
Figure 2. Details of the marked area. The complete marked area of Defence Housing Authority (DHA), Lahore, along with a zoomed-in view of the marked footprints.
3.1.2 Extracting Segmentation Data The marked/digitized polygons, together with the satellite image, are then used to extract segmentation data. Next, this data is transformed into small images and masks so that they can be fed directly to the deep model. For this, the entire region is divided into 300 m x 300 m tiles, from which the images and their respective masks are extracted. Additionally, we mask off the parts of each tile that lie outside the marked zone. Figure 3 depicts the tile extraction and masking procedure. Using this procedure, we prepared a dataset of 740 images for the DHA region.
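The tiling step can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code. It assumes the mosaic is north-up (no rotation terms), stored in a projected coordinate system with metre units, and described by a GDAL-style geotransform `(origin_x, pixel_width, 0, origin_y, 0, -pixel_height)`. The function names `tile_windows` and `mask_outside`, the 0.5 m pixel size in the example, and the choice to keep smaller partial tiles at the right and bottom edges are all assumptions; the paper does not state how edge tiles were handled.

```python
from typing import List, Tuple

# GDAL-style geotransform for a north-up raster (assumption):
# (origin_x, pixel_width, 0.0, origin_y, 0.0, -pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]
Window = Tuple[int, int, int, int, GeoTransform]


def tile_windows(width_px: int, height_px: int, gt: GeoTransform,
                 tile_m: float = 300.0) -> List[Window]:
    """Split a raster into tile_m x tile_m windows.

    Returns (col_off, row_off, cols, rows, tile_gt) per tile. tile_gt is the
    geotransform of the tile itself, so each predicted mask can later be
    geo-referenced from it. Edge tiles are kept but may be smaller.
    """
    px_w, px_h = gt[1], -gt[5]
    tile_cols = max(1, round(tile_m / px_w))
    tile_rows = max(1, round(tile_m / px_h))
    windows: List[Window] = []
    for row_off in range(0, height_px, tile_rows):
        for col_off in range(0, width_px, tile_cols):
            cols = min(tile_cols, width_px - col_off)
            rows = min(tile_rows, height_px - row_off)
            # Shift the origin to the tile's top-left corner.
            tile_gt = (gt[0] + col_off * px_w, gt[1], 0.0,
                       gt[3] - row_off * px_h, 0.0, gt[5])
            windows.append((col_off, row_off, cols, rows, tile_gt))
    return windows


def mask_outside(tile: List[List[int]], zone: List[List[bool]],
                 fill: int = 0) -> List[List[int]]:
    """Set pixels outside the marked zone (zone[r][c] is False) to `fill`."""
    return [[v if inside else fill for v, inside in zip(row, zrow)]
            for row, zrow in zip(tile, zone)]


# Example: a 1000 x 700 px mosaic at 0.5 m/pixel gives 600 px (300 m) tiles.
gt = (500000.0, 0.5, 0.0, 3500000.0, 0.0, -0.5)
for window in tile_windows(1000, 700, gt):
    print(window[:4])
```

For a multi-temporal analysis, the same windows can be reused for every date as long as the images share one grid, which keeps tiles from different dates aligned.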
3.1.3 Collecting Digitized Map Data The data gathered so far can be used to train the deep model and, in turn, to predict built-up regions. However, our pipeline needs another form of data, the geo-locations of the available plots in the specified region of interest, to make inferences at the scale of individual buildings. To obtain this, we first gathered the society maps (as images) for more than 350 societies in Lahore. We then vectorized these maps using the raster-to-vector conversion tools of the GDAL package. Finally, as shown in Figure 4, we extracted the center point of each polygon in the vectorized map, which gives us the geo-locations of the available plots. Note that the map includes information about plots of land only; it does not indicate whether a particular plot has construction on it.
Figure 3. Tiling of images. Our process of extracting images from a GeoTIFF file. On the left, a grid of 300 m x 300 m tiles is overlaid on the marked region. On the right, the image extracted from one of the tiles (upper right) and its corresponding segmentation mask (lower right) are shown. The black area shows the unmarked region.
Figure 4. Extracting geo-locations. An example of one of the geo-referenced digitized maps, along with the extraction of center points using raster-to-vector conversion and other GIS tools.
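Extracting a center point from each vectorized plot polygon can be sketched with the area-weighted (shoelace) centroid below. This is an illustrative pure-Python version that assumes "center point" means the polygon centroid; the paper does not name the exact GIS operation, and in practice the polygons would come from GDAL's raster-to-vector step rather than being typed in. For strongly concave plots the centroid can fall outside the polygon, in which case a point guaranteed to lie inside (for example, Shapely's `representative_point()`) would be a safer choice. The names `polygon_centroid` and `plot_locations` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    `ring` lists the vertices in order (either orientation); a repeated
    closing vertex is allowed and ignored.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    n = len(ring)
    area2 = cx = cy = 0.0  # twice the signed area, centroid accumulators
    for k in range(n):
        x0, y0 = ring[k]
        x1, y1 = ring[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("degenerate polygon (zero area)")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def plot_locations(plots: List[List[Point]]) -> List[Point]:
    """One location x_i per digitized plot polygon."""
    return [polygon_centroid(p) for p in plots]


# Two adjacent rectangular plots in projected (metre) coordinates.
plots = [[(0, 0), (20, 0), (20, 30), (0, 30)],
         [(20, 0), (40, 0), (40, 30), (20, 30)]]
print(plot_locations(plots))  # [(10.0, 15.0), (30.0, 15.0)]
```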
3.2 Model Training
Once the data is obtained, we train a deep learning model based on the DeepLabV3plus architecture (Chen et al., 2018a) with a ResNet-50 encoder for semantic segmentation of built-up regions. Since our dataset is very small, training a deep network like DeepLabV3plus, with millions of parameters, from scratch may lead to over-fitting. To prevent this, we initialize the model with weights pre-trained on the ImageNet dataset. After initialization, we set the number of classes of the last convolutional layer to 2 (building or non-building). We then unfreeze all the layers and fine-tune the model on our small dataset with a batch size of 4 and a learning rate of 0.00008 for 80 epochs.
3.3 Proposed Pipeline
Once the deep model is trained, every component of the proposed pipeline is in place, and we can combine them end to end to predict the construction profile of each individual plot in a given region. The pipeline is shown in Figure 5. We first pass images of the area of interest to the deep model, which returns a segmentation mask for each image. In a post-processing stage, we merge these masks, geo-tag the merged file, and digitize the merged raster to obtain vectors/polygons over the built-up regions. Finally, we overlay the plot locations extracted from the digitized society maps on these predicted polygons and develop the construction profile of each building. We next describe each part of the pipeline in detail.
Figure 5. The proposed pipeline. The first step is to select the AOI and pass its satellite image to the trained model through tiling. The generated masks are then merged, geo-referenced, and digitized into a vector file. Finally, the vector file is overlaid with the extracted geo-locations for that AOI, and profiles are developed. Green dots in the final image mark plots with a constructed building, while red dots mark plots without construction.
3.3.1 Selection of AOI and tiling The first step in our pipeline is to select an area of interest (AOI) and provide the satellite image of that AOI. We extract 300 m x 300 m tiles from this large image using the procedure described in Section 3.1.2. For a multi-temporal analysis such as ours, images are collected for each of the specified dates, and the process is repeated for each image.
3.3.2 Inferences from the Model Once the data is in the form of images, we pass these images to the trained deep model. The model predicts a segmentation mask for each image, assigning 1 to pixels labelled built-up and 0 otherwise. These binary masks are saved for each image.
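With two output classes, turning the model's per-pixel scores into the binary mask described above amounts to a per-pixel argmax. The sketch assumes each pixel's output is a pair of scores ordered (non-building, building); the class order and the function name `to_binary_mask` are illustrative assumptions, not details from the paper.

```python
from typing import List, Tuple

Scores = Tuple[float, float]  # (non-building score, building score), assumed order


def to_binary_mask(logits: List[List[Scores]]) -> List[List[int]]:
    """Per-pixel argmax over 2 classes: 1 = built-up, 0 = otherwise.

    Ties are resolved in favour of the non-building class.
    """
    return [[1 if building > non_building else 0
             for non_building, building in row]
            for row in logits]


scores = [[(0.2, 1.5), (2.0, -1.0)],
          [(0.1, 0.1), (-0.3, 0.4)]]
print(to_binary_mask(scores))  # [[1, 0], [0, 1]]
```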
3.3.3 Merging, Geo-Referencing, and Digitizing Once masks have been predicted for all images, the next task is to post-process them into vectors/polygons that can serve as an overlay layer for the plots' geo-locations. For this purpose, we use tools from the Geospatial Data Abstraction Library (GDAL). We geo-reference each tile using the coordinates from which it was extracted, and then merge the geo-referenced tiles. The merged raster is then vectorized using the value assigned to built-up pixels, which gives a single vector file for the AOI containing the "built-up regions". One more step, described next, is needed to turn this information into the construction profile of each building.
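The merging step can be sketched as pasting each tile's mask into a single grid at the pixel offset implied by its geotransform. This pure-Python stand-in assumes north-up, GDAL-style geotransforms that share one pixel size and grid alignment; the authors used GDAL tools, and the merged raster would then be vectorized with a GDAL utility such as its `Polygonize` function. The function name `mosaic` and the rule that a built-up pixel wins where tiles overlap are assumptions.

```python
from typing import List, Sequence, Tuple

GeoTransform = Tuple[float, float, float, float, float, float]


def mosaic(tiles: Sequence[Tuple[List[List[int]], GeoTransform]],
           mosaic_gt: GeoTransform, width: int, height: int) -> List[List[int]]:
    """Merge geo-referenced binary tile masks into one width x height mask."""
    out = [[0] * width for _ in range(height)]
    for mask, gt in tiles:
        # Pixel offset of the tile's top-left corner within the mosaic grid.
        col_off = round((gt[0] - mosaic_gt[0]) / mosaic_gt[1])
        row_off = round((gt[3] - mosaic_gt[3]) / mosaic_gt[5])
        for r, row in enumerate(mask):
            for c, value in enumerate(row):
                rr, cc = row_off + r, col_off + c
                if 0 <= rr < height and 0 <= cc < width:
                    out[rr][cc] = max(out[rr][cc], value)  # built-up wins on overlap
    return out


grid = (0.0, 1.0, 0.0, 4.0, 0.0, -1.0)  # 4 x 4 mosaic with 1 m pixels
tiles = [([[1, 1], [0, 1]], (0.0, 1.0, 0.0, 4.0, 0.0, -1.0)),
         ([[1, 0], [0, 0]], (2.0, 1.0, 0.0, 2.0, 0.0, -1.0))]
for row in mosaic(tiles, grid, 4, 4):
    print(row)
```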
3.3.4 Extracting building profiles The built-up regions extracted in the previous step can by themselves inform many policies. However, they do not provide granular information about whether each individual building in the area has been constructed. Many footprint extraction methods add classical post-processing stages, but the results are not good enough. We propose to avoid this by solving an easier problem than footprint extraction. We take the plots' geo-locations extracted from the digitized maps and label them based on their position with respect to the generated built-up regions: a plot is considered to have construction on it if it lies within any of the built-up regions, and unbuilt otherwise. Mathematically, this amounts to assigning a binary label $y_i$ to each plot such that
$$
y_i = I(x_i, R) =
\begin{cases}
1, & \text{if } x_i \in r \text{ for some } r \in R,\\
0, & \text{otherwise,}
\end{cases}
\qquad (1)
$$
Here, $i$ indexes the plots, $x_i \in L$ is the location of the $i$-th plot in the area, and $R$ is the set of built-up regions $r$ extracted by our model from the image of a given date. $I(a, b)$ is an indicator function showing whether the point $a$ lies within any of the regions in $b$; in particular, $I(a, b) = 1$ if $a \in r$ for some $r \in b$, and 0 otherwise. Thus, $y_i \in \{0, 1\}$ is the estimated construction profile of that building at a specific time. Using this straightforward comparison, built-up labels can be predicted for every plot in a given area at a specific time, and repeating it for each date produces spatio-temporal development profiles of building construction for any specified metropolitan region.
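Equation (1) reduces to a point-in-polygon test. The sketch below implements it with even-odd ray casting in pure Python, assuming the plot locations and built-up polygons share one projected coordinate system and that the polygons have no holes. It illustrates the labelling rule rather than reproducing the authors' implementation, which relied on GIS overlay tools. Points lying exactly on a polygon boundary may be classified either way; in practice, a spatial join in a GIS library with a spatial index would replace the inner loop.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(p: Point, ring: Polygon) -> bool:
    """Even-odd rule: count crossings of a ray cast from p towards +x."""
    x, y = p
    inside = False
    n = len(ring)
    for k in range(n):
        x0, y0 = ring[k]
        x1, y1 = ring[(k + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through p
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def indicator(a: Point, b: Sequence[Polygon]) -> int:
    """I(a, b) from Eq. (1): 1 if point a lies within any region in b, else 0."""
    return int(any(point_in_polygon(a, r) for r in b))


def construction_profile(plots: Sequence[Point],
                         regions: Sequence[Polygon]) -> List[int]:
    """Binary label y_i for every plot location x_i at one date."""
    return [indicator(x, regions) for x in plots]


built_up = [[(0, 0), (10, 0), (10, 10), (0, 10)],
            [(20, 20), (30, 20), (30, 30), (20, 30)]]
plots = [(5, 5), (15, 15), (25, 25), (35, 5)]
print(construction_profile(plots, built_up))  # [1, 0, 1, 0]
```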
4. ACCURACY ASSESSMENT
In this section, we measure the performance of our pipeline using different metrics, comparing the profiles it generates with ground truth profiles. The ground truth profiles are obtained using (1), but with the ground truth building footprints as the overlay layer instead of the predicted regions. We first evaluate the model using a confusion matrix and then examine how well it estimates total building counts at different spatial and temporal locations. This lets us assess both the accuracy of the model and its consistency across spatial and temporal data.
4.1 Confusion Matrix
We use our ground truth footprints in place of the predicted regions R in (1) to establish the ground truth profiles of the buildings. We then compare the predicted and ground truth profiles to build the confusion matrix. For example, if a building's profile is labelled built-up in the ground truth but unbuilt by our pipeline, it is counted as a false negative, and so on. The resulting confusion matrix is shown in Table 1.
                     Ground truth label
Predicted label      P̂          N̂          Total
P                    25893      2144       28037
N                    2047       57292      59339
Total                27940      59436      87376
Table 1. The confusion matrix for building construction profiles. P̂ and N̂ are the ground truth labels for built-up and unbuilt, respectively; P and N are the predicted labels for built-up and unbuilt, respectively.
Table 1 shows that our pipeline predicts construction profiles with an accuracy of 95% ((25893 + 57292)/87376 ≈ 0.952). Precision (25893/28037 ≈ 0.924) and recall (25893/27940 ≈ 0.927) are both approximately 92%.
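These figures can be reproduced from Table 1 with the standard definitions, reading the rows as predicted labels and the columns as ground truth labels, so that TP = 25893, FP = 2144, FN = 2047 and TN = 57292. The helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of plots predicted built-up that are built-up
    recall = tp / (tp + fn)     # share of built-up plots the pipeline finds
    return accuracy, precision, recall


# Counts from Table 1 (rows: predicted P/N; columns: ground truth P-hat/N-hat).
acc, prec, rec = classification_metrics(tp=25893, fp=2144, fn=2047, tn=57292)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# accuracy=0.952 precision=0.924 recall=0.927
```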
4.2 Spatial Consistency
For spatial consistency, we count the total number of constructed buildings in different areas and compare these counts with the ground truth counts. This evaluates how well the proposed pipeline estimates building counts for a given area. For this purpose, we subdivide the marked area into 12 regions and compare the total counts from the model-generated profiles with those from the ground truth. To make the comparison independent of region size, we divide each count by the total number of plots in that region; we call this quantity the built-up ratio, defined in (2).
$$
BR = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad y_i \in Y \qquad (2)
$$
Here, BR is the built-up ratio, Y is the set of building profiles in the area, and N is the size of Y. In this way, areas with 10,000 plots and with 1,000 plots can be evaluated on the same scale. Using this equation, built-up ratios were calculated for both the ground truth and the predicted profiles in each of the 12 regions. The results are shown in Figure 6.
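Equation (2) is simply the mean of the binary labels in a region. The sketch below (function name hypothetical) shows why regions of different sizes become comparable: a region of 10,000 plots with 3,200 built-up plots and one of 1,000 plots with 320 built-up plots both have BR = 0.32. These numbers are illustrative only.

```python
from typing import Sequence


def built_up_ratio(profiles: Sequence[int]) -> float:
    """Eq. (2): BR = (1/N) * sum of y_i over the N plot labels in Y."""
    if not profiles:
        raise ValueError("region contains no plots")
    return sum(profiles) / len(profiles)


large = [1] * 3200 + [0] * 6800  # 10,000 plots
small = [1] * 320 + [0] * 680    # 1,000 plots
print(built_up_ratio(large), built_up_ratio(small))  # 0.32 0.32
```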
Figure 6. Spatial consistency of the model: results of the spatial consistency evaluation. The x-axis shows the different regions of DHA, and the y-axis shows the estimated and ground truth BRs for each region.
The model performs well in almost all regions, with an average deviation of ±0.01 (1%) from the ground truth built-up ratios.
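The paper does not define "average deviation" precisely. One natural reading, assumed here, is the mean absolute difference between the estimated and ground truth BRs over the evaluated regions (or dates). The function name is hypothetical, and the values in the example are made up for illustration; they are not the paper's results.

```python
from typing import Sequence


def mean_abs_deviation(estimated: Sequence[float],
                       ground_truth: Sequence[float]) -> float:
    """Mean |BR_estimated - BR_ground_truth| over paired regions or dates."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - g) for e, g in zip(estimated, ground_truth)) / len(estimated)


# Hypothetical BRs for three regions (illustration only).
print(round(mean_abs_deviation([0.30, 0.52, 0.41], [0.32, 0.51, 0.41]), 3))  # 0.01
```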
4.3 Temporal Consistency
For temporal consistency, we perform the same analysis as for spatial consistency, but vary the temporal dimension while keeping the area fixed. This evaluates the model's performance on the same area across different dates. Since we had marked a portion of past images to incorporate historical data in training, we used those markings to develop ground truth profiles for that portion. We applied our pipeline to 17 different dates between 2010 and 2020. The results are shown in Figure 7.
Figure 7. Temporal consistency of the model: results of the temporal consistency evaluation. The x-axis shows images from different dates (dd/mm/yyyy format) for the same region (Phase 1) of DHA, and the y-axis shows the estimated and ground truth BRs for each date.
The average deviation from the ground truth built-up ratios is approximately ±0.007 (0.7%), lower than the deviation observed in the spatial analysis.
5. ANALYSES AND RESULTS
In this section, we use our proposed pipeline to analyze and quantify the urban sprawl of DHA from 2010 to 2021 by counting the growth in the number of buildings over the years. Figure 8 shows the change in one of the regions of DHA over these years. Such visualizations could be produced with a segmentation model alone, or even with Landsat images. However, quantifying the growth requires further post-processing of the results using (1) and (2). We now present the quantified growth in the number of buildings, as well as in the built-up ratios, for DHA.
Figure 8. Growth in DHA Phase 5: analysis results for one of the regions (Phase 5) of DHA. The leftmost image is for 2010, the center image for 2014, and the rightmost image for 2020.
5.1 Estimating urban sprawl
We collected satellite images of the Defence Housing Authority (DHA) for 17 different dates to perform the urban sprawl analysis. We passed these images through our proposed pipeline and estimated building construction profiles for each date. The analysis shows that the building count for DHA was 14,233 (built-up ratio 16.2%) in 2010 and increased to 27,967 (built-up ratio 32%) in 2020. In only ten years, therefore, the number of buildings in DHA nearly doubled (a factor of about 1.96), a significant rise. These results are shown in Figure 9.
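The growth figures can be checked with simple arithmetic. The ratio 27967/14233 ≈ 1.96 is the basis for "nearly doubled". Because BR = count / N by (2), dividing each count by its reported BR recovers the approximate number of plots N; both years give a similar figure (about 87,400 to 87,900 plots), as expected if the set of plots stays fixed over time. The small mismatch is expected because the reported BRs are rounded. The dictionary layout is only for illustration.

```python
counts = {2010: (14233, 0.162), 2020: (27967, 0.32)}  # (building count, BR) from Section 5.1

growth = counts[2020][0] / counts[2010][0]
print(f"growth factor 2010->2020: {growth:.2f}")  # 1.96

for year, (count, br) in counts.items():
    # Eq. (2): BR = count / N, so N is approximately count / BR.
    print(year, round(count / br))
# 2010 87858
# 2020 87397
```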
Figure 9. Urban Sprawl in DHA: The results of building counts
(left) and built-up ratios (BRs) (right) with x-axis representing
the date (dd/mm/yyyy format).
6. CONCLUSION AND FUTURE DIRECTIONS
Establishing construction profiles of buildings for a region is an essential part of collecting large-scale urban data; nonetheless, relatively few works have tackled this problem directly. The proposed framework relies on publicly available data. The accuracy assessments of our pipeline indicate strong performance across a variety of evaluation metrics. Consequently, the findings are relevant to the work of both researchers and urban planners. In addition, the framework can be used to carry out spatio-temporal sprawl assessments, as shown in the preceding sections. However, the availability of society maps can become a bottleneck for applying our pipeline in many places, such as the slums (katchi abadis) of South Asia. Hence, in future work we intend to focus on extracting such information without relying on society maps or plot geo-locations.
ACKNOWLEDGEMENTS
This work was supported financially by the Higher Education
Commission (HEC) of Pakistan through a Grand Challenge Fund
Grant No. GCF-521.
REFERENCES
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.,
2018a. Encoder-decoder with atrous separable convolution for
semantic image segmentation. Proceedings of the European
conference on computer vision (ECCV), 801–818.
Chen, R., Li, X., Li, J., 2018b. Object-Based Features for House
Detection from RGB High-Resolution Images. Remote Sensing,
10(3). https://www.mdpi.com/2072-4292/10/3/451.
El Garouani, A., Mulla, D. J., El Garouani, S., Knight, J., 2017. Analysis of urban growth and sprawl from remote sensing data: Case of Fez, Morocco. International Journal of Sustainable Built Environment, 6(1), 160–169. https://www.sciencedirect.com/science/article/pii/S2212609016300668.
ESRI, 2020. Building Footprint Extraction. https://www.arcgis.com/home/item.html?id=a6857359a1cd44839781a4f113cd5934.
Greger, K., 2015. Spatio-Temporal Building Population Estimation for Highly Urbanized Areas Using GIS. Transactions in GIS, 19(1), 129–150.
Gupta, R., Hosfelt, R., Sajeev, S., Patel, N., Goodman, B., Doshi, J., Heim, E., Choset, H., Gaston, M., 2019. xBD: A dataset for assessing building damage from satellite imagery.
Huang, X., Zhang, L., Zhu, T., 2014. Building Change Detection From Multitemporal High-Resolution Remotely Sensed Images Based on a Morphological Building Index. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(1), 105–115.
Hui, J., Du, M., Ye, X., Qin, Q., Sui, J., 2019. Effective Building Extraction From High-Resolution Remote Sensing Images With Multitask Driven Deep Neural Network. IEEE Geoscience and Remote Sensing Letters, 16(5), 786–790.
ISPRS 2D Semantic Labeling Contest, n.d. Accessed: May 27, 2022 [online]. Available: http://www2.isprs.org/commissions/comm3/wg4/semantic-labeling.html.
Ji, S., Wei, S., Lu, M., 2018. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Transactions on Geoscience and Remote Sensing, 57(1), 574–586.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-4/W5-2022
7th International Conference on Smart Data and Smart Cities (SDSC), 19–21 October 2022, Sydney, Australia
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLVIII-4-W5-2022-197-2022 | © Author(s) 2022. CC BY 4.0 License.
Li, X., Yao, X., Fang, Y., 2018. Building-A-Nets: Robust Building Extraction From High-Resolution Remote Sensing Images With Adversarial Networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(10), 3680–3687.
Liu, P., Liu, X., Liu, M., Shi, Q., Yang, J., Xu, X., Zhang, Y., 2019. Building Footprint Extraction from High-Resolution Images via Spatial Residual Inception Convolutional Neural Network. Remote Sensing, 11(7). https://www.mdpi.com/2072-4292/11/7/830.
Liu, Y., Zhang, Z., Zhong, R., Chen, D., Ke, Y., Peethambaran, J., Chen, C., Sun, L., 2018. Multilevel Building Detection Framework in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(10), 3688–3700.
Maggiori, E., Tarabalka, Y., Charpiat, G., Alliez, P., 2017. Can semantic labeling methods generalize to any city? The Inria aerial image labeling benchmark. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), IEEE, 3226–3229.
Meng, C., Liu, E., Neiswanger, W., Song, J., Burke, M., Lobell, D., Ermon, S., 2021. IS-COUNT: Large-scale Object Counting from Satellite Images with Covariate-based Importance Sampling. arXiv preprint arXiv:2112.09126.
Ok, A., Senaras, C., Yuksel, B., 2013. Automated Detection of Arbitrarily Shaped Buildings in Complex Environments From Monocular VHR Optical Satellite Imagery. IEEE Transactions on Geoscience and Remote Sensing, 51, 1701–1717.
Qin, X., He, S., Yang, X., Dehghan, M., Qin, Q., Martin, J., 2018. Accurate Outline Extraction of Individual Building From Very High-Resolution Optical Images. IEEE Geoscience and Remote Sensing Letters, 15(11), 1775–1779.
Sahana, M., Hong, H., Sajjad, H., 2018. Analyzing urban spatial patterns and trend of urban growth using urban sprawl matrix: A study on Kolkata urban agglomeration, India. Science of The Total Environment, 628–629, 1557–1566. https://www.sciencedirect.com/science/article/pii/S0048969718305631.
Shah, A., Ali, K., Nizami, S. M., 2021. Spatio-temporal analysis of urban sprawl in Islamabad, Pakistan during 1979–2019, using remote sensing. GeoJournal. https://doi.org/10.1007/s10708-021-10413-6.
Shakeel, A., Sultani, W., Ali, M., 2019. Deep built-structure counting in satellite imagery using attention based re-weighting. ISPRS Journal of Photogrammetry and Remote Sensing, 151, 313–321.
Shi, Y., Li, Q., Zhu, X. X., 2018. Building footprint generation using improved generative adversarial networks. IEEE Geoscience and Remote Sensing Letters, 16(4), 603–607.
Silvan-Cardenas, J. L., Wang, L., Rogerson, P., Wu, C., Feng, T., Kamphaus, B. D., 2010. Assessing fine-spatial-resolution remote sensing for small-area population estimation. International Journal of Remote Sensing, 31(21), 5605–5634.
Terzi, F., Kaya, H. S., 2008. Analyzing urban sprawl patterns through fractal geometry: The case of Istanbul metropolitan area.
Van Etten, A., Lindenbaum, D., Bacastow, T. M., 2018. SpaceNet: A remote sensing dataset and challenge series. arXiv preprint arXiv:1807.01232.
Wang, S., Bai, M., Mattyus, G., Chu, H., Luo, W., Yang, B., Liang, J., Cheverie, J., Fidler, S., Urtasun, R., 2016. TorontoCity: Seeing the world with a million eyes. arXiv preprint arXiv:1612.00423.
Wen, Q., Jiang, K., Wang, W., Liu, Q., Guo, Q., Li, L., Wang, P., 2019. Automatic building extraction from Google Earth images under complex backgrounds based on deep instance segmentation network. Sensors, 19(2), 333.
Yuan, J., 2018. Learning Building Extraction in Aerial Scenes with Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(11), 2793–2798.
Zakria, M., Rawal, H., Sultani, W., Ali, M., 2021. Cross-Region Building Counting in Satellite Imagery using Counting Consistency. arXiv preprint arXiv:2110.13558.
Zhang, B.-g., 2003. Application of remote sensing technology to population estimation. Chinese Geographical Science, 13(3), 267–271.
Zhao, K., Kang, J., Jung, J., Sohn, G., 2018. Building extraction from satellite images using Mask R-CNN with building boundary regularization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 247–251.