Integrating Zhuhai-1 Hyperspectral Imagery
With Sentinel-2 Multispectral Imagery to Improve
High-Resolution Impervious Surface Area Mapping
Xiaoxiao Feng, Zhenfeng Shao, Xiao Huang, Luxiao He, Xianwei Lv, and Qingwei Zhuang
Abstract—Mapping impervious surface area (ISA) in an accu-
rate and timely manner is essential for a variety of fields and
applications, such as urban heat islands, hydrology, waterlogging,
and urban planning and management. However, the large and
complex urban landscapes pose great challenges in retrieving ISA
information. Spaceborne hyperspectral (HS) remote sensing im-
agery provides rich spectral information with short revisit cycles,
making it an ideal data source for ISA extraction from complex
urban scenes. Nevertheless, insufficient single-band energy, the
involvement of modulation transfer function (MTF), and the low
signal-to-noise ratio (SNR) of spaceborne HS imagery usually re-
sult in poor image clarity and noises, leading to inaccurate ISA
extraction. To address this challenge, we propose a new deep feature
fusion-based classification method to improve 10 m resolution ISA
mapping by integrating Zhuhai-1 HS imagery with Sentinel-2 mul-
tispectral (MS) imagery. We extract deep spectral and spatial features, respectively, from HS and MS imagery via a 2-D convolutional neural network (CNN), aiming to increase feature diversity and improve the model's recognition capability.
The Sentinel-2 imagery is used to enhance the spatial information
of the Zhuhai-1 HS image, improving the urban ISA retrieval
by reducing the impact of noises. By combining the deep spatial
features and deep spectral features, we obtain joint spatial-spectral
features, leading to high classification accuracy and robustness. We
test the proposed method in two highly urbanized study areas that
cover Foshan city and Wuhan city, China. The results reveal that
the proposed method obtains an overall accuracy of 96.72% and
96.75% in the two study areas, 18.78% and 8.66% higher than
classification results with only HS imagery as input. The final ISA
extraction overall accuracy is 95.42% and 95.50% in the two study
areas, the highest among the comparison methods.
Index Terms—Convolutional neural network (CNN), feature fusion, impervious surface area (ISA) mapping, Sentinel-2 imagery, Zhuhai-1 spaceborne hyperspectral (HS) imagery.
Manuscript received August 9, 2021; revised October 30, 2021 and January
4, 2022; accepted March 5, 2022. Date of publication March 8, 2022; date
of current version March 25, 2022. This work was supported in part by the
National Natural Science Foundation of China under Grant 42090012, in part
by 03 Special Research and 5G Project of Jiangxi Province in China under
Grant 20212ABC03A09, in part by the Zhuhai Industry University Research
Cooperation Project of China under Grant ZH22017001210098PWC, and in
part by the Key R&D project of Sichuan Science and Technology Plan under
Grant 2022YFN0031. (Corresponding author: Zhenfeng Shao.)
Xiaoxiao Feng, Zhenfeng Shao, Luxiao He, Xianwei Lv, and Qingwei Zhuang
are with the State Key Laboratory of Information Engineering in Survey-
ing, Mapping, Remote Sensing, Wuhan University, Wuhan 430079, China
(e-mail: fengxxalice2018@gmail.com; shaozhenfeng@whu.edu.cn; heluxiao@foxmail.com; xianweilv@whu.edu.cn; zhuangqingwei@whu.edu.cn).
Xiao Huang is with the Department of Geosciences, University of Arkansas,
Fayetteville, AR 72701 USA (e-mail: xh010@uark.edu).
Digital Object Identifier 10.1109/JSTARS.2022.3157755
I. INTRODUCTION
IMPERVIOUS surface area (ISA) is usually defined as natural or artificial surfaces in cities (e.g., roads, parking lots, and roofs made of cement concrete, glass, asphalt, plastic, tiles, or metal) that prevent water from penetrating into the
ground [1]. The rapid progress of urbanization inevitably leads
to tremendous changes in land use and land cover types. ISA is
a key indicator in evaluating the urban ecological environment
and usually poses notable negative impacts on the urban environ-
ment [2], climate [3], [4], and hydrology [5]–[7]. Therefore, the
evaluation of ISA distribution should focus on not only its spatial
expansion, but also its environmental consequences. Further-
more, it is of great significance for the sustainable development
strategy of urban planning and management to obtain accurate
ISA information in a timely manner and investigate the impact
of its dynamic changes on the environment.
Remote sensing technology has been widely used in ISA
monitoring, thanks to its extensive spatial coverage and high
temporal frequency. Early studies on ISA were mostly based on
medium-resolution multispectral (MS) satellites such as Landsat
Thematic Mapper (TM) [8], [9] and Enhanced Thematic Mapper
(ETM+) [10]. However, the complexity of urban landscapes and
broadband reflectance data pose great challenges in ISA classi-
fication, as many urban materials cannot be distinguished ac-
curately. Besides, ISA maps with coarse resolutions are limited
in potential applications, e.g., distinguishing urban functional
areas [1]. In comparison, fine-resolution ISA maps allow for
more spatial-explicit studies such as investigating the impact of
urbanization on energy, water, carbon cycles, vegetation phenol-
ogy, and surface climate [11]. Hyperspectral (HS) imagery can
provide not only spatial information of features, but also rich
spectral information that can accurately reflect heterogeneous
spectral characteristics of features, leading to fine identification
and classification. Most of the existing ISA studies utilized
classic HS data captured by the airborne HS sensors [12],
such as the simulated Environmental Mapping and Analysis
Program (EnMAP) [13], the Hyperspectral Digital Imagery
Collection Experiment (HYDICE) [14], and Reflective Optics
System Imaging Spectrometer (ROSIS) [15]. Signal-to-noise
ratio (SNR) describes the quality of a measurement. In charge-
coupled device (CCD) imaging, SNR refers to the ratio of the
measured signal to the overall measured noise (frame-to-frame)
at that pixel. High SNR is particularly important in applications
requiring precise measurement. The advantage gained from the
fine spectral information obtained from HS sensors can be offset
by the lower SNR when compared to MS sensors because of
the fewer number of photons captured by each detector due
to the narrower width of the spectral channels. Compared to
the spaceborne HS data, images from airborne HS sensors are
characterized by high spatial resolution and high SNR. However,
the airborne HS data is limited in synoptic coverage at urban
scales, which limits their use for systematically mapping urban
land cover of arbitrary cities around the world [13]. This study
marks a pioneering effort to integrate the spaceborne HS data and
spaceborne MS data (Sentinel-2) for accurate and fine-grained
(10 m) ISA mapping.
Classification-based ISA extraction methods first extract spatial and spectral features and then feed them into classifiers to obtain the ISA distribution map. Traditional classification
methods include maximum likelihood estimation (MLE) [16],
support vector machine (SVM) [17], [18], random forest
(RF) [19], and their derivations [20]–[22]. Among them, SVM is
superior to MLE, as it can solve the nonlinear classification prob-
lem. Further, parallel SVM (PSVM) [23] has been developed to
solve the computational complexity problem, and the hierar-
chical PSVM method is designed based on sequential minimal
optimization (SMO) [24] and SVM. In addition, kernel methods
combined with SVM are widely used in HS image classifica-
tion to improve separability [25]. Recently, improved sparse
representation, e.g., synchronous orthogonal matching pursuit
(SOMP) [26] and synchronous subspace pursuit (SSP) [27],
was applied to HS image classification and achieved great
classification results. In the aforementioned methods, training
samples are used to learn the sparse representation dictionary,
where the test samples in HS images are sparsely represented.
The representation residuals are further compared to find the
best representation to determine the label of samples.
However, traditional classification methods largely rely on
expertise and are dependent on parameter settings, leading to
their low automation and low generalization. The deep learning
networks, such as stacked automatic encoder (SAE) [28], deep
belief network (DBN) [29], [30], and deep convolutional neural
network (DCNN) [31], [32], are different from traditional feature
extraction methods. Compared with other networks, CNN uses
local connections to extract features with shared weights. Such
a design facilitates effective information retrieval and reduces
the number of parameters needed to be trained. Chen et al. [33]
applied an autoencoder network to classify dimensionality-reduced HS images
and achieved decent results. They further found that CNN can
extract the spatial and spectral features of the objects in images
in a more effective manner, thus leading to better classification
results.
After reviewing relevant literature, we identified the following
challenges in ISA retrieval based on HS images:
1) The low SNR and modulation transfer function (MTF)
of spaceborne HS data lead to defective spatial informa-
tion, evidenced by the low-quality spectral information of
ground objects.
2) The spectral-based CNN methods fail to integrate the
spatial information of ground objects, which results in
salt-and-pepper noises in classified results, thus leading
to reduced classification accuracy.
To address these challenges, we propose a novel approach to
improving the ISA extraction accuracy by integrating Sentinel-2
MS data and Zhuhai-1 HS data. The first strategy is to first
fuse HS and MS images and then obtain the ISA results using
classifiers. Commonly used HS-MS image fusion methods can
be roughly classified into pan-sharpening and subspace-based
methods. Pan-sharpening-based methods include component
substitution, multiresolution analysis, and sparse representa-
tion [34]. The latter category, e.g., Bayesian-based methods and spectral unmixing-based methods, focuses on the
inherent spectral characteristics of scenes. The other strategy
is to first fuse the features extracted from HS and MS, and
further obtain the classification results. Comparing these two
strategies, the former one highly relies on the fused image, and
the accuracy of classification results based on pixel-level fusion
images depends on the spectral fidelity of the fusion algorithm.
Therefore, in this article, we use CNNs to extract the features
from HS and MS images, respectively, and fuse the features to
obtain the classification results.
The proposed integration process is achieved by fusing the
spectral and spatial deep features extracted from HS and MS
images, thus potentially improving the accuracy of the final ISA
map. As HS imagery contains abundant spectral information
while MS data contains detailed spatial information, we extract
spectral and spatial deep features from HS imagery and MS
imagery, respectively. In this study, we utilize two-dimensional
(2-D) CNN to extract the deep features and further enhance
features by fusing extracted spectral and spatial deep features.
To deal with salt-and-pepper noises of classification results, the
object-based image analysis (OBIA) [35] method is a commonly
used approach. However, the OBIA classification method is
mainly for images with very high spatial resolution. For this
study, the spatial resolution of HS satellite images used in this
article is 10 m, which is not ideal for the application of OBIA.
Therefore, we use a 2-D CNN network to extract the spatial
information of images and further perform impervious surface
classification.
The main contributions of this article are summarized as
follows.
1) The extraction of the spectral and spatial deep features
from HS and MS images, respectively, and their fusion
contribute to better feature retrieval from the ground ob-
jects in images, thus leading to improved classification
accuracy.
2) The fusion of spectral and spatial deep features improves
the model’s robustness and reduces noises in classified
results assisted by the supplement of spatial information.
3) Zhuhai-1 HS data (2-day revisiting cycle) and Sentinel-2
MS data (5-day revisiting cycle) have a considerably high
temporal resolution. Therefore, their combination real-
izes a high-temporal fine-grained ISA mapping, providing
the basis for future time series ISA analysis and timely
supports in urban land management and construction
planning.
The rest of this article is organized as follows. Section II
introduces related works and the proposed method. Section III
describes the study areas and experimental datasets. Section IV
presents and analyzes the experimental results. Section V
discusses the effectiveness of the proposed method compared
with a single feature classification network and the effect of
different patch sizes on the ISA extraction. Finally, Section VI
concludes this article.
II. METHODOLOGY
A. Convolutional Neural Network
CNN has received wide attention in recent years and achieved
great performances in classification, detection, and many other
tasks. CNN has two characteristics: local connection and shared
weights. In each convolution layer, feature maps are generated
by multiple learnable filters, which can be expressed as
$$y_j^l = f\Big(\sum_{i=1}^{d} x_i^{l-1} * w_{ij}^l + b_j^l\Big) \tag{1}$$
where $x_i^{l-1}$ denotes the $i$th feature map of the $(l-1)$th layer, $y_j^l$ denotes the $j$th feature map of the $l$th layer, and $d$ is the number of input feature maps. $w_{ij}^l$ and $b_j^l$ are the randomly initialized weights and bias, respectively. $*$ denotes the convolution operator, and $f$ denotes a nonlinear activation function, such as Sigmoid, Tanh, or the rectified linear unit (ReLU) [36]. In this article, we use the parametric ReLU (PReLU) [37], which can be formulated as
$$\mathrm{PReLU}(y_i) = \begin{cases} y_i & \text{if } y_i > 0 \\ a_i y_i & \text{if } y_i \le 0 \end{cases} \tag{2}$$
where $y_i$ is the input of the $i$th channel, and $a_i$ is a learnable coefficient that controls the slope of the negative part. After the convolution operations, a max-pooling layer is used to downsample the feature maps [38]. In this way, the output size and the number of parameters can be reduced, effectively avoiding overfitting. The max-pooling operation can be formulated as
$$y_{r,c} = \max_{0 \le g,\, n \le h} \big( x_{r+g,\, c+n} \big) \tag{3}$$
where $y_{r,c}$ is the neuron value at $(r, c)$ in the output layer, and $g$ and $n$ index the pixel positions around the center neuron at $(r, c)$ within the pooling window of size $h$.
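For concreteness, the following is a minimal PyTorch sketch of one convolution block following (1)-(3): learnable filters, a PReLU activation, and max pooling. The channel counts and kernel sizes are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of one convolution block following (1)-(3):
# learnable filters (1), per-channel PReLU (2), then max pooling (3).
# Channel and kernel sizes here are illustrative, not the paper's settings.
block = nn.Sequential(
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4),  # (1)
    nn.PReLU(num_parameters=64),   # per-channel slope a_i as in (2)
    nn.MaxPool2d(kernel_size=2),   # downsampling as in (3)
)

x = torch.rand(1, 32, 27, 27)      # a batch with one 27x27 image patch
y = block(x)
print(y.shape)                     # torch.Size([1, 64, 12, 12])
```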
The network training procedure consists of forward and
backward propagations, aiming to reduce the gap between the
predicted labels and the ground truth labels by updating model
parameters. The loss/cost is calculated by the differences be-
tween the predicted values and the ground-truthing values in
the forward propagation. The purpose of backpropagation is to
reduce loss by adjusting the parameters. In this study, we use
the softmax cross-entropy loss
$$c = -\frac{1}{m}\sum_{i=1}^{m}\big[x_i \log z_i + (1 - x_i)\log(1 - z_i)\big] + \frac{\lambda}{2m}\sum_{j=1}^{N} w_j^2 \tag{4}$$
where $m$ is the size of the image batch, and $x_i$ and $z_i$ denote the $i$th ground-truth label and the $i$th predicted value, respectively. $N$ is the number of weights. $\lambda$ is the parameter that adjusts the proportion between the former term (the original loss function) and the regularization term (the latter term) in (4). Besides, we set $\lambda$ to $\frac{1}{2}$ to simplify the derivation. Studies have shown that the l2 regularization term can prevent models from overfitting [31], [39].

Fig. 1. Workflow of ISA extraction by fusing spectral-spatial deep features using 2-D CNN.

Fig. 2. Workflow of the 2-D CNN-based deep feature extraction. (a) Workflow of patchwise feature extraction. (b) Workflow of pixelwise feature extraction.
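Returning to (4): below is a minimal sketch of this regularized loss, assuming binary labels and sigmoid-style probability outputs; the function and variable names are ours, not from the paper's code.

```python
import torch

def regularized_bce_loss(z, x, weights, lam=0.5):
    """Sketch of (4): binary cross-entropy plus an l2 weight penalty.

    z: predicted probabilities in (0, 1); x: ground-truth labels in {0, 1};
    weights: iterable of weight tensors; lam: the trade-off parameter lambda.
    """
    m = x.shape[0]
    bce = -(x * torch.log(z) + (1 - x) * torch.log(1 - z)).mean()
    l2 = sum((w ** 2).sum() for w in weights)   # sum over all N weights
    return bce + lam / (2 * m) * l2
```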
The classification approaches of HS images based on CNN
can be grouped into three categories, i.e., spectral feature-based,
spatial feature-based, and spatial-spectral feature-based meth-
ods [40]. Spectral feature-based classification methods apply
one-dimensional (1-D) CNN to extract the deep spectral features
for classification [41], [42]. In comparison, spatial feature-based
classification methods apply 2-D CNN to extract the spatial
information for classification [43]. The main difference between
1-D CNN and 2-D CNN is the dimensionality of the convolution
operation. In this study, we use 2-D CNN to extract the spectral
features from HS images and spatial features from MS images
and further fuse the obtained spectral-spatial deep features to
improve the classification accuracy of ISA.
B. Extraction and Fusion of Spectral and Spatial Features via
2-D CNN
HS images contain rich spectral information that benefits
accurate descriptions of the spectral characteristics of ground
objects. Given their spaceborne nature, HS images have limited SNR and spatial resolution. In contrast, MS images of the same spatial resolution are characterized by high SNR.
Therefore, to simultaneously obtain the spectral and spatial
information, we use 2-D CNN to extract the deep spectral
features from HS images and the deep spatial features from MS
images. We further fuse the extracted deep features for land
cover classification and eventually map the fine-grained ISA
distribution. The specific workflow is shown in Fig. 1.
1) Deep Features Extraction From HS and MS Datasets: In
this article, the spatial and spectral deep features of images are
extracted by three convolution layers and a fully connected layer
[see Fig. 2(a)]. The first layer takes the 27 × 27 image patch with $N_1$ channels and calculates 64 feature maps using a 4 × 4 receptive field and a nonlinear PReLU activation. The second layer takes the 12 × 12 feature maps with 64 channels and calculates 128 feature maps using a 5 × 5 receptive field and PReLU. The third layer takes the 4 × 4 feature maps with 128 channels and calculates 256 feature maps using a 4 × 4 receptive field and PReLU. The computation of these three convolution layers is expressed in (5). Finally, the fully connected layer takes the 1 × 1 vector with 256 feature maps to derive the classification results. The workflow of patchwise feature extraction is shown in Fig. 2(a), and the workflow of pixelwise feature extraction is shown in Fig. 2(b).
$$\begin{cases} f_1(x) = \max(0,\ b_1 + w_1 * x), & w_1: 64 \times (4 \times 4 \times N_1),\ \ b_1: 64 \times 1 \\ f_2(x) = \max(0,\ b_2 + w_2 * f_1(x)), & w_2: 128 \times (5 \times 5 \times 64),\ \ b_2: 128 \times 1 \\ f_3(x) = \max(0,\ b_3 + w_3 * f_2(x)), & w_3: 256 \times (4 \times 4 \times 128),\ \ b_3: 256 \times 1. \end{cases} \tag{5}$$
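A minimal PyTorch sketch of this feature extractor follows. The paper does not state the convolution strides; strides of 2, 2, and 1 are our assumption, chosen so that a 27 × 27 input patch reproduces the 12 × 12, 4 × 4, and 1 × 1 feature-map sizes stated above, and the class count is a placeholder.

```python
import torch
import torch.nn as nn

class DeepFeatureExtractor(nn.Module):
    """Sketch of the 3-conv + fully connected extractor of Fig. 2(a) and (5).

    Strides (2, 2, 1) are assumptions chosen so that a 27x27 input patch
    yields 12x12, 4x4, and 1x1 feature maps as stated in the text.
    """
    def __init__(self, n1_channels, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n1_channels, 64, kernel_size=4, stride=2),   # 27 -> 12
            nn.PReLU(64),
            nn.Conv2d(64, 128, kernel_size=5, stride=2),           # 12 -> 4
            nn.PReLU(128),
            nn.Conv2d(128, 256, kernel_size=4, stride=1),          # 4 -> 1
            nn.PReLU(256),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)   # (batch, 256) deep features
        return self.classifier(f)

net = DeepFeatureExtractor(n1_channels=32)   # band count is a placeholder
out = net(torch.rand(2, 32, 27, 27))
print(out.shape)  # torch.Size([2, 8])
```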
To compare the classification performance between the pixel-
based 1-D CNN and image patch-based 2-D CNN, we use two
workflows to obtain the results [see Fig. 2(a) and (b)]. The pixel-
based 1-D CNN extracts only spectral features, while the patch-
based 2-D CNN extracts both spectral and spatial features from
images. We further conduct experiments to analyze how such
workflow selection influences classification accuracy.
2) Spectral and Spatial Deep Features Fusion: Multisource
images contain diverse information, while single-source images
may not achieve the best classification performance due to
their lack of feature diversity. We use concatenation to fuse the features to enhance their discrimination ability; we denote this strategy as FF-C:
$$H_C^{(l+1)} = \Big[ H_{\mathrm{HS}}^{[l]},\ H_{\mathrm{MS}}^{[l]} \Big] \tag{6}$$
where $[\bullet, \bullet]$ can in general denote elementwise addition, elementwise multiplication, or concatenation; here it denotes concatenation. $H_{\mathrm{HS}}^{[l]}$ and $H_{\mathrm{MS}}^{[l]}$ denote the $l$th-layer features extracted from the HS and MS images, respectively.
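Under the same stride assumptions as the sketch above, the following illustrates the two-branch concatenation fusion of (6); the fused feature dimension (256 + 256) follows the extractor sketch, and the band counts are placeholders.

```python
import torch
import torch.nn as nn

def conv_branch(in_channels):
    """One 2-D CNN branch producing a 256-D deep feature from a 27x27 patch."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, 4, stride=2), nn.PReLU(64),
        nn.Conv2d(64, 128, 5, stride=2), nn.PReLU(128),
        nn.Conv2d(128, 256, 4, stride=1), nn.PReLU(256),
        nn.Flatten(),
    )

class FFC(nn.Module):
    """Sketch of FF-C: concatenate HS and MS deep features as in (6)."""
    def __init__(self, hs_bands, ms_bands, num_classes=8):
        super().__init__()
        self.hs_branch = conv_branch(hs_bands)   # deep spectral features
        self.ms_branch = conv_branch(ms_bands)   # deep spatial features
        self.classifier = nn.Linear(256 + 256, num_classes)

    def forward(self, hs_patch, ms_patch):
        fused = torch.cat([self.hs_branch(hs_patch),
                           self.ms_branch(ms_patch)], dim=1)  # (6)
        return self.classifier(fused)

model = FFC(hs_bands=32, ms_bands=4)   # band counts are placeholders
logits = model(torch.rand(2, 32, 27, 27), torch.rand(2, 4, 27, 27))
print(logits.shape)  # torch.Size([2, 8])
```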
III. STUDY AREAS AND DATASETS
This study includes two study areas that cover parts of Foshan
city in Guangdong Province, China and Wuhan city in Hubei
Province, China, respectively. The HS dataset is derived from
the Zhuhai-1 Orbita HS satellite, while the MS dataset is derived
from the Sentinel-2 satellite.
A. Zhuhai-1 OHS HS Datasets
The second batch of Zhuhai-1 microsatellites was successfully launched on April 26, 2018, including four Orbita HS satellites (referred to as OHS-A, OHS-B, OHS-C, and OHS-D) and one video satellite (OVS-2A). The spatial resolution of OHS data is 10 m, with a swath width of 150 km, a spectral resolution of 2.5 nm, and a spectral range from 400 to 1000 nm (see Table I). A single HS satellite completes 15–16 orbits daily, and the data acquisition time of each orbit is less than 8 min. At present, the revisiting cycle of the four satellites is two days.
TABLE I
CENTER WAVELENGTH OF OHS HS DATA

TABLE II
CENTER WAVELENGTH AND SPATIAL RESOLUTION OF SENTINEL-2 IMAGERY
The OHS satellite is characterized by its small size, high spatial
resolution, large breadth, and short revisit period. It is expected to
benefit various tasks that include ecological environment moni-
toring, urban construction management, agricultural production,
disaster prediction, and assessment.
B. Sentinel-2 MS Datasets
Sentinel-2 is an Earth observation mission from the Coperni-
cus Programme (operated by the European Space Agency) that
systematically acquires optical imagery at high spatial resolution
(10–60 m) over land and coastal waters. The mission supports
a broad range of services and applications such as agricultural
monitoring, emergency management, land cover classification,
and water quality monitoring. Sentinel-2 has a 5-day revisiting
cycle. We select the bands of 10 m spatial resolution for feature
extraction (see Table II). The dataset is downloaded from the USGS website from the collection of Level-1C products.
C. Study Area
The first study area (344.84 km²), located at 113°4′–113°15′E, 22°48′–22°59′N, covers part of Foshan city in South China's Guangdong Province (see Fig. 3). Lying in the middle of the Pearl River Delta plain, Foshan city has a high degree of urbanization and contains a large number of scattered hills, rivers, and water networks, including navigation, irrigation, aquaculture, and other functional areas. The second study area (388.09 km²), located at 114°6′–114°20′E, 30°22′–30°34′N, covers part of Wuhan city in Hubei Province (see Fig. 4). Wuhan city lies in the east of the Jianghan Plain, on the middle reaches of the Yangtze River at the intersection of the Yangtze and Han rivers.
Fig. 3. Study area in Foshan city (1855 × 1855 pixels). (a) Zhuhai-1 HS image (shown in bands 12, 6, 1 as RGB) acquired on November 9, 2019. (b) Sentinel-2 MS image (shown in bands 4, 3, 2 as RGB) acquired on November 11, 2019.

Fig. 4. Study area in Wuhan city (1970 × 1970 pixels). (a) Zhuhai-1 HS image (shown in bands 12, 6, 1 as RGB) acquired on September 21, 2019. (b) Sentinel-2 MS image (shown in bands 4, 3, 2 as RGB) acquired on September 22, 2019.
Fig. 5. Examples of eight different land cover types in the Foshan study area. (a) Google Earth image. (b) Zhuhai-1 HS image. (c) Sentinel-2 MS image.
Both study areas are characterized by high-level urbanization
and dense water networks, making them prone to waterlog-
ging issues, especially after frequent and intensive rain. The
fine-resolution ISA distribution can provide the basis for the
investigation of urban resilience. Monitoring of ISA distribution
dynamics plays a vital role in urban environmental impact
analysis and planning management. Both Zhuhai-1 HS image
and Sentinel-2 MS image were captured under clear-sky condi-
tions to illustrate the effectiveness of the proposed classification
workflow that integrates these two images for an improved
10 m ISA mapping. Given the short time intervals between HS
and MS images in the two study areas, we believe there exist
no significant changes in ground features. The image SNR is
estimated using the method of [44], based on the number of object types (endmembers) in the study area.
The SNR values of the HS and MS images in the Foshan study
area are 28.45 and 149.69 dB, respectively. The SNR values
of HS and MS images in the Wuhan study area are 35.36 and
149.66 dB, respectively.
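For reference, the following is a minimal sketch of how a dB-scale SNR of this kind is computed once the signal and noise power have been estimated; the power-estimation step itself (e.g., via [44]) is omitted, and the numbers are illustrative only.

```python
import numpy as np

def snr_db(signal_power, noise_power):
    """Convert estimated signal and noise power to a dB-scale SNR."""
    return 10.0 * np.log10(signal_power / noise_power)

# Illustrative values only, not the paper's actual power estimates:
print(snr_db(signal_power=700.0, noise_power=1.0))  # ~28.45 dB
```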
IV. EXPERIMENTS AND RESULTS
In this section, we detail our experimental settings and present
the results along with the analysis. Section IV-A details the sample selection procedure, and Section IV-B describes the experimental setup. Section IV-C shows the effectiveness of the proposed fusion algorithm that integrates HS and MS deep features, and Section IV-D compares classification results obtained from the proposed method and other state-of-the-art classification methods.
A. Sample Selection
Before the sample selection, the HS and MS images were geometrically registered manually by selecting corresponding points.
The training, validation, and testing samples used in this study
were all selected from Sentinel-2 images via human interpreta-
tion against Google Earth imagery. We first randomly select the
sample points (pixels) from the image, making sure the sampling
points are evenly distributed on the image. Then, we classify the
points into their corresponding types.

TABLE III
NUMBER OF TRAINING, VALIDATION, AND TESTING SAMPLES USED IN THE FOSHAN STUDY AREA

TABLE IV
NUMBER OF TRAINING, VALIDATION, AND TESTING SAMPLES USED IN THE WUHAN STUDY AREA

Finally, we divide the sample points into training, validation, and testing samples according to the ratio of 8:1:1. For the pixelwise input, samples are the central pixels, while for the patchwise input, samples are patches of different sizes centered around the central pixels.
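As an illustration of this sampling scheme, the sketch below splits labeled sample points 8:1:1 and extracts patches centered on each sample pixel; the array layout and the reflect-padding at image edges are our assumptions.

```python
import numpy as np

def split_and_extract(image, points, labels, patch=27, seed=0):
    """Split labeled pixels 8:1:1 and cut patch x patch windows around them.

    image: (H, W, bands) array; points: (n, 2) row/col indices; labels: (n,).
    Edge pixels are handled by reflect-padding (our choice, not the paper's).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(points))
    n_train, n_val = int(0.8 * len(order)), int(0.1 * len(order))
    splits = {"train": order[:n_train],
              "val": order[n_train:n_train + n_val],
              "test": order[n_train + n_val:]}

    half = patch // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)),
                    mode="reflect")
    out = {}
    for name, idx in splits.items():
        # Each window is centered on the original pixel (r, c).
        patches = np.stack([padded[r:r + patch, c:c + patch]
                            for r, c in points[idx]])
        out[name] = (patches, labels[idx])
    return out
```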
The Foshan study area has eight land cover types, i.e., vegetation, roof, asphalt road, river, dense building, bright ISA, pond, and soil (see Fig. 5). The number of samples for each land cover type can be found in Table III. The total samples of the Foshan study area contain 473,302 pixels. Among the eight derived land cover types, roof, asphalt road, dense building, and bright ISA are classified as ISA.
The Wuhan study area has 10 land cover types, including soil, bright ISA, concrete road, vegetation, dense building, lake, asphalt road, algae, roof, and river (see Fig. 6). The number of samples for each classified land cover type is listed in Table IV. The total samples of the Wuhan study area contain 294,449 pixels. Land cover types that include bright ISA, concrete road, dense building, asphalt road, and roof are classified as ISA.

Fig. 6. Examples of 10 different land cover types in the Wuhan study area. (a) Google Earth image. (b) Zhuhai-1 HS image. (c) Sentinel-2 MS image.

Fig. 7. Spectral reflectance curves of each land cover type from the Zhuhai-1 HS image for the training samples in the Wuhan study area. Each curve is obtained by calculating the average value and standard deviation of the spectral reflectance of all training samples for each land cover type and each band.
To illustrate that different land cover types have distinct spectral characteristics in HS images, which benefits the classification, we present the spectral reflectance curves in Fig. 7, using the Wuhan study area as an example. For each land cover type in the Wuhan study area, we calculate the average and standard deviation of all training samples for each band, following [45]. Fig. 8(a) and (b) shows the ground-truth distribution of land cover types in the Foshan and Wuhan study areas.
B. Experimental Setup
1) Implementation Details: The proposed network in this study is implemented on the PyTorch platform with the Adam optimizer [46]. In the network training, we set the maximum number of epochs to 30, the batch size in the training phase to 100, and the learning rate to 0.001. The input data is normalized into [0, 1]. According to the number of training samples in the two study areas, we set the number of training batches to 3786 and 2355 in the Foshan and Wuhan study areas, respectively. To reduce overfitting and to stabilize the network during the training phase, we set the l2-norm regularization to 0.01.
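A minimal sketch of this training configuration follows, assuming the FFC model sketched in Section II; realizing the l2 term through Adam's weight decay is our simplification.

```python
import torch

# Sketch of the stated training setup: Adam, lr 0.001, batch size 100,
# 30 epochs, l2 regularization 0.01 (realized here as Adam weight decay).
# `model` and `train_loader` are assumed to exist (see the FFC sketch above).
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):
    for hs_patch, ms_patch, label in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(hs_patch, ms_patch), label)
        loss.backward()
        optimizer.step()
```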
2) Comparison With Baseline Methods: The competing
methods are the classic classification methods with the following
parameter settings.
1) RF: 200 decision trees are used in the classifier.
2) SVM: The kernel is the radial basis function with two optimal hyperparameters σ and λ, set to 0.1 and 0.01, respectively.
3) Multinomial Logistic Regression (MLR): We choose the l2 regularization as the penalty (set to 0.01) and "lbfgs" as the solver.
4) Multilayer Perceptron (MLP): We set the batch size to 100, the max epoch to 30, the l2-norm regularization to 0.01, the activation function to ReLU, and the optimizer to Adam.
5) Vanilla Recurrent Neural Network (RNN).
6) RNN with gated recurrent units (GRU).
7) RNN with long short-term memory (LSTM) units.
The code for these competing methods is available in [47]. All methods use the same training, validation, and testing samples. A sketch of two of these baselines follows.
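For illustration, a scikit-learn sketch of two of these baselines under the stated settings; mapping the paper's σ and λ onto scikit-learn's gamma and C parameters is our assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# RF baseline: 200 decision trees, as stated above.
rf = RandomForestClassifier(n_estimators=200)

# SVM baseline: RBF kernel. Interpreting the paper's sigma and lambda as
# scikit-learn's gamma and inverse regularization C is our assumption.
svm = SVC(kernel="rbf", gamma=0.1, C=100.0)

# X_train, y_train are assumed to hold the shared training samples:
# rf.fit(X_train, y_train); svm.fit(X_train, y_train)
```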
The performances of the classification results are assessed
based on three indicators that include the overall accuracy (OA),
average accuracy (AA), and Kappa coefficient (Kappa). The OA
measures the ratio between correctly classified testing samples
and the total number of testing samples. The AA measures
the average percentage of correctly classified samples for an
individual class. The Kappa measures the percentage agreement
corrected by the level of agreement that can be expected by
chance alone.

Fig. 8. (a1) and (a2) show the land cover and ISA in the Foshan study area, respectively; (b1) and (b2) show the land cover and ISA in the Wuhan study area, respectively.

TABLE V
LAND COVER TYPE CLASSIFICATION RESULTS (UA, PA, AND AA) IN THE FOSHAN STUDY AREA BASED ON FF-C

Each land cover type is assessed based on two indicators: the User's Accuracy (UA) and the Producer's Accuracy (PA). UA represents the number of correctly classified samples divided by the total number of samples classified as that land cover type, while PA represents the number of correctly classified samples divided by the total number of ground-truth samples of that type.
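A compact sketch of how these metrics follow from a confusion matrix, with rows as ground truth and columns as predictions (our convention):

```python
import numpy as np

def accuracy_metrics(cm):
    """OA, AA, Kappa, UA, and PA from a confusion matrix.

    cm[i, j] counts ground-truth class i predicted as class j (our convention).
    """
    total = cm.sum()
    oa = np.trace(cm) / total                      # overall accuracy
    pa = np.diag(cm) / cm.sum(axis=1)              # producer's accuracy
    ua = np.diag(cm) / cm.sum(axis=0)              # user's accuracy
    aa = pa.mean()                                 # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2  # chance level
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa, ua, pa
```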
C. Classification Results
In this section, we compare the classification results of the proposed method and other state-of-the-art methods.
1) Performance of the Feature Fusion-Based Classification: The feature fusion-based method is denoted as FF-C. We extract image patches of 27 × 27 pixels, a size small enough to ensure a homogeneous land cover type within each patch. From Tables V and VI, it can be seen that the classification results of FF-C yield high accuracy for each type. In Table V, bright ISA in the Foshan study area is identified accurately with the highest PA. From Table VI, we notice that the land cover classification results from FF-C are satisfactory, which proves the strong capability of the feature fusion method and greatly benefits fine-grained ISA extraction.

TABLE VI
LAND COVER TYPE CLASSIFICATION RESULTS (UA, PA, AND AA) IN THE WUHAN STUDY AREA BASED ON FF-C (HS AND MS IMAGES)
2) Classification Results of FF-C and Comparison Methods: In this section, we compare the classification accuracy of each land cover type and the OA, AA, and Kappa obtained by different methods for the two study areas. Quantitative results are shown in Tables VII and VIII. Figs. 9 and 10 present classification maps of the two study areas for visual comparison among different methods. Table VII indicates that the proposed FF-C method obtains the best AA, OA, and Kappa in the Foshan study area, higher than the best results among the comparison methods (obtained from RF) by 5.78%, 2.76%, and 3.55%, respectively. The proposed FF-C method presents the best classification accuracy for all land cover types, except for vegetation, roof, and river.
From Table VIII, we notice that FF-C also yields the best AA, OA, and Kappa in the Wuhan study area, higher than GRU, which obtains the second-highest OA, by 2.05%, 0.97%, and 1.21%, respectively. Comparing
TABLE VII
QUANTITATIVE COMPARISONS OF DIFFERENT METHODS IN TERMS OF OA, AA, AND KAPPA IN THE FOSHAN STUDY AREA
The bold numbers indicate the best values for accuracy assessment.

TABLE VIII
QUANTITATIVE COMPARISONS OF DIFFERENT METHODS IN TERMS OF OA, AA, AND KAPPA IN THE WUHAN STUDY AREA
The bold numbers indicate the best values for accuracy assessment.

TABLE IX
OVERALL ACCURACY OF ISA DISTRIBUTION IN THE TWO STUDY AREAS FROM DIFFERENT METHODS
The bold numbers indicate the best accuracy of ISA extraction.
Tables VII and VIII, it can be seen that for both study areas, the
performance of FF-C is generally superior to other classification
methods. Due to the higher HS image quality in the Wuhan
study area, land cover classification results in the Wuhan study
area are better than those in the Foshan study area. The above
results demonstrate that the integration of MS and HS data via
feature fusion improves the classification accuracy of land cover
types.
Figs. 9 and 10 present the land cover classification maps
obtained by different methods in the Foshan and Wuhan study
areas, respectively. A visual comparison reveals that pixelwise
classification methods result in salt-and-pepper noises in classi-
fied land use types. In comparison, the proposed FF-C method
yields smoother classification maps due to the combination of
deep features from MS and HS images that further enhance
the model's identification ability. Table IX shows the overall accuracy of the extracted ISA in the two study areas. The results suggest that FF-C obtains the highest ISA accuracy. For the Foshan study area, the OA from FF-C is 6.19% higher than that of RF, which obtains the second-highest OA. For the Wuhan study area, the OA from FF-C is 1.36% higher than that of GRU, which obtains the second-highest OA.
The final ISA extraction results obtained by the proposed FF-C method in the Foshan and Wuhan study areas are shown in Fig. 11. Table X shows the proportion of each land cover type and ISA in the two study areas. From Fig. 11 and Table X, we notice that, compared to the Foshan study area, pervious surfaces (e.g., green space and lakes) in the Wuhan study area have more extensive coverage. Even though the proportion of water in the Foshan study area is larger, it is mostly used for aquaculture. The Foshan study area is located in Guangdong Province, one of the fastest-growing provinces in China, so its urbanization process is considerably faster than that of the Wuhan study area.
V. DISCUSSION
In this section, we analyze the effectiveness of the feature
fusion strategy by comparing the 1-D CNN and 2-D CNN
classification without performing HS and MS feature fusion in
Section V-A. Section V-B presents a visual comparison of classification results obtained from different methods. Section V-C discusses the impact of different patch sizes on classification results.
A. Effectiveness of HS and MS Data Feature Fusion
To verify the effectiveness of deep features fusion in
land cover classification, we analyze the classification results
FENG et al.: INTEGRATING ZHUHAI-1 HS IMAGERY WITH SENTINEL-2 MS IMAGERY 2419
Fig. 9. MS image and classification maps from different methods in the Foshan study area, with one demarcated area zoomed in two times for easy observation.
TABLE X
PROPORTION OF EACH LAND COVER AND ISA IN THE TWO STUDY AREAS
Fig. 10. MS image and classification maps from different methods in the Wuhan study area, with one demarcated area zoomed in two times for easy observation.
obtained by spectral and spatial deep features separately, as well
as by the feature fusion method based on the same training, vali-
dation, and testing samples. For the deep feature fusion method,
we first use 2-D CNN to extract the deep features from the HS image and the MS image, respectively, and then fuse the features to explore whether spectral-spatial deep feature fusion can lead to better land cover classification accuracy. We present the
classification results using 1-D CNN-based methods (HS-1-D
CNN and MS-1-D CNN) and 2-D CNN-based methods (HS-2-D
CNN and MS-2-D CNN) for HS and MS image classification,
respectively. 1-D CNN-based methods take pixelwise input,
while 2-D CNN-based methods take patchwise input.
Fig. 12 presents the land cover type classification accuracy
(AA, OA, and kappa) in the Foshan study area on HS and MS
data under the FF-C fusion strategy. The results suggest that
2-D CNN-based methods obtain higher accuracy compared to
1-D CNN-based methods. Furthermore, the FF-C obtains the
best classification results for all land cover types except bright
ISA and soil. Although the accuracies of bright ISA and soil from FF-C fail to achieve the best results, they are very close to the optimal values. In addition, FF-C obtains the highest AA, OA, and kappa
in the Foshan study area. The improvement curve (red lines) in
Fig. 12 shows the notable improvement of FF-C in all land cover
types, especially in the land cover type of road (an improvement
of 0.79).
From Table XI, we observe that the integration of the deep
features extracted from MS data leads to improved classification
accuracy in all land cover types from the Foshan study area. This
not only verifies the effectiveness of the feature fusion strategy
on enhancing the feature representation, but also indicates that
such an integration of MS and HS data can compensate for the
quality deficiency in HS data.
Fig. 11. ISA distribution (from FF-C) in two study areas.
Fig. 12. Land cover type classification accuracy (AA, OA, and kappa) in the Foshan study area on HS and MS data under the FF-C fusion strategy. The red curve shows the improvement of the method that fuses HS and MS data over the method that uses HS data alone.
Fig. 13. Land cover type classification accuracy (AA, OA, and kappa) in the Wuhan study area on HS and MS data under the FF-C fusion strategy. The red curve shows the improvement of the method that fuses HS and MS data over the method that uses HS data alone.
Fig. 13 presents the land cover type classification accuracy
(AA, OA, and kappa) in the Wuhan study area on HS and MS data
under the FF-C fusion strategy. Table XII shows the land cover
type classification results (UA and PA) in the Wuhan study area
based on 1-D/2-D CNN (HS images alone) and feature fusion
strategies (HS and MS images). The results reveal that the land
cover type of concrete road achieves the greatest improvement in
accuracy, by 0.4561. The experimental results from the two study areas demonstrate the effectiveness of integrating MS data with HS data when performing land cover classification.
The deep features from MS data might enhance the spatial
information from HS data, thus leading to better classification
performance when MS and HS data are fused.
TABLE XI
LAND COVER TYPE CLASSIFICATION RESULTS (UA AND PA) IN THE FOSHAN STUDY AREA BASED ON 1-D/2-D CNN (HS IMAGES ALONE) AND FF-C (HS AND MS IMAGES)

TABLE XII
LAND COVER TYPE CLASSIFICATION RESULTS (UA AND PA) IN THE WUHAN STUDY AREA BASED ON 1-D/2-D CNN (HS IMAGES ALONE) AND FF-C (HS AND MS IMAGES)
Comparing results from these two study areas, we notice that
the improvement in the Wuhan study area is not as notable as the
one in the Foshan study area. This is because, with similar SNRs of the MS data, the Wuhan study area has a higher quality HS image (SNR = 35.36 dB) than the Foshan study area (SNR = 28.45 dB). Thus, the designed feature enhancement model has a less notable impact in the Wuhan study area. We notice that the SNRs of the MS images in both study areas are around 150 dB, far higher than those of the HS data. This means the MS data contains more reliable spatial information of ground objects than the HS data. Given
that the 2-D CNN can extract the deep features from images
by considering the contextual information in both spatial and
spectral domains, fusing the spectral and spatial deep features
extracted from HS image and MS image is able to improve
classification performance.
B. Visual Comparison
This section examines details of the classification maps from different methods in the two study areas. For the Foshan study area (see Fig. 14), the highlighted black rectangle is dominated by bare soil and vegetation, which belong to the pervious surface, whereas the comparison methods classify this region as dense building, which belongs to the impervious surface. Such misclassification leads to reduced ISA extraction accuracy and overestimation of ISA. For the Wuhan study area (see Fig. 15), the highlighted black rectangle is dominated by bare soil, which is wrongly classified as dense building and concrete road by the comparison methods. Overall, it can be seen that the classification from the proposed FF-C method improves the accuracy of feature recognition, thus obtaining more accurate ISA distribution information.
Fig. 14. Selected classification results of the Foshan study area.
Fig. 15. Selected classification results of the Wuhan study area.
TABLE XIII
LAND COVER TYPE CLASSIFICATION RESULTS IN THE FOSHAN STUDY AREA BASED ON FF-C WITH DIFFERENT PATCH SIZES
C. Patch Size Analysis
The size of input patches is an important parameter that
determines, to a certain degree, the classification accuracy of
the model. To explore the influence of patch sizes on the classi-
fication performance, we conduct additional experiments in the
Foshan study area. Table XIII shows the land cover classification
results in the Foshan study area based on FF-C with different
patch sizes. We observe improved classification accuracy as the patch size increases from 13 to 27, especially for soil (an improvement of 16.23%).
Fig. 16 presents the OA of the classification results corresponding to different patch sizes. It can be seen that the accuracy is highest when the patch size reaches 27 pixels. Therefore, the patch size is set to 27 in our experiments.
Fig. 16. Overall accuracy (%) with different patch sizes in the Foshan study
area.
VI. CONCLUSION
In this study, we propose a 2-D CNN-based method to improve
the accuracy of ISA extraction at 10 m spatial resolution by
combining Sentinel-2 MS data and Zhuhai-1 HS data. We test
our proposed approach in two study areas that cover Foshan
and Wuhan city, China. We first utilize 2-D CNN to extract
the spatial and spectral deep features of MS data and HS data,
then fuse the extracted deep features via a fully connected layer
for the final classification. To investigate the influence of the
fusion method on the final results, we compare the feature
fusion strategies with other comparison methods. The results
prove the superiority of feature fusion methods compared to
nonfusion methods. In the future, we plan to explore the im-
pact of model depths on image feature extraction and develop
more advanced fusion modules to take full advantage of the
detailed spectral information from HS images and detailed spa-
tial information from MS images. In addition, we plan to test
the proposed method in other regions to further investigate its
generalizability.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their valuable suggestions and comments that helped us improve this article significantly.
REFERENCES
[1] Q. Weng, “Remote sensing of impervious surfaces in the urban areas:
Requirements, methods, and trends,” Remote Sens. Environ., vol. 117,
pp. 34–49, 2012.
[2] C. Li, Z. Shao, L. Zhang, X. Huang, and M. Zhang, “A comparative
analysis of index-based methods for impervious surface mapping using
multiseasonal sentinel-2 satellite data,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 14, pp. 3682–3694, Mar. 2021.
[3] A. J. Arnfield, “Two decades of urban climate research: A review of
turbulence, exchanges of energy and water, and the urban heat island,”
Int. J. Climatol., A J. Roy. Meteorological Soc., vol. 23, no. 1, pp. 1–26,
2003.
[4] P. Coseo and L. Larsen, “How factors of land use/land cover, building
configuration, and adjacent heat sources and sinks explain urban heat
islands in Chicago,” Landscape Urban Plan., vol. 125, pp. 117–129,
2014.
[5] K. Conway, J. Barrie, P. Hill, W. Austin, and K. Picard, "Mapping sensitive benthic habitats in the Strait of Georgia, coastal British Columbia: Deep-water sponge and coral reefs," Geol. Surv. Can., vol. 2, pp. 1–6, 2007.
[6] H. Du et al., “Influences of land cover types, meteorological conditions,
anthropogenic heat and urban area on surface urban heat island in the
Yangtze River Delta urban agglomeration," Sci. Total Environ., vol. 571,
pp. 461–470, 2016.
[7] Z. Shao, H. Fu, D. Li, O. Altan, and T. Cheng, “Remote sensing mon-
itoring of multi-scale watersheds impermeability for urban hydrological
evaluation,” Remote Sens. Environ., vol. 232, 2019, Art. no. 111338.
[8] X.-P. Song, J. O. Sexton, C. Huang, S. Channan, and J. R. Townshend,
“Characterizing the magnitude, timing and duration of urban growth from
time series of Landsat-based estimates of impervious cover," Remote Sens.
Environ., vol. 175, pp. 1–13, 2016.
[9] L. Zhang, Q. Weng, and Z. Shao, “An evaluation of monthly impervious
surface dynamics by fusing Landsat and MODIS time series in the Pearl River Delta, China from 2000 to 2015," Remote Sens. Environ., vol. 201,
pp. 99–114, 2017.
[10] D. Lu and Q. Weng, “Spectral mixture analysis of the urban landscape in
Indianapolis with Landsat ETM+ imagery," Photogrammetric Eng. Remote
Sens., vol. 70, no. 9, pp. 1053–1062, 2004.
[11] X. Huang, D. Wen, J. Li, and R. Qin, “Multi-level monitoring of subtle
urban changes for the megacities of China using high-resolution multi-
view satellite imagery,” Remote Sens. Environ., vol. 196, pp. 56–75, 2017.
[12] S. Roessner, K. Segl, U. Heiden, and H. Kaufmann, “Automated differen-
tiation of urban surfaces based on airborne hyperspectral imagery,” IEEE
Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1525–1532, Jul. 2001.
[13] A. Okujeni, S. van der Linden, and P. Hostert, “Extending the vegetation-
impervious-soil model using simulated EnMAP data and machine learning,"
Remote Sens. Environ., vol. 158, pp. 69–80, 2015.
[14] B. Feng and J. Wang, “Constrained nonnegative tensor factorization for
spectral unmixing of hyperspectral images: A case study of urban imper-
vious surface extraction,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 4,
pp. 583–587, Apr. 2019.
[15] F. Chen, K. Wang, T. Van de Voorde, and T. F. Tang, “Mapping urban land
cover from high spatial resolution hyperspectral data: An approach based
on simultaneously unmixing similar pixels with jointly sparse spectral
mixture analysis,” Remote Sens. Environ., vol. 196, pp. 324–342, 2017.
[16] A. H. Strahler, “The use of prior probabilities in maximum likelihood
classification of remotely sensed data,” Remote Sens. Environ., vol. 10,
no. 2, pp. 135–163, 1980.
[17] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sens-
ing images with support vector machines,” IEEE Trans. Geosci. Remote
Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[18] V. Vapnik, The Nature of Statistical Learning Theory. Cham, Switzerland:
Springer, 2013.
[19] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32,
2001.
[20] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, "Spectral
and spatial classification of hyperspectral data using SVMs and mor-
phological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11,
pp. 3804–3814, Nov. 2008.
[21] S. Schulter, P. Wohlhart, C. Leistner, A. Saffari, P. M. Roth, and H. Bischof,
“Alternating decision forests,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit., 2013, pp. 508–515.
[22] E. Tuv, A. Borisov, G. Runger, and K. Torkkola, “Feature selection with
ensembles, artificial variables, and redundancy elimination,” J. Mach.
Learn. Res., vol. 10, pp. 1341–1366, 2009.
[23] P. Peng, Q.-L. Ma, and L.-M. Hong, “The research of the parallel SMO
algorithm for solving SVM,” in Proc. Int. Conf. Mach. Learn. Cybern.,
2009, vol. 3, pp. 1271–1274.
[24] P.-H. Chen, R.-E. Fan, and C.-J. Lin, “A study on SMO-type decomposition
methods for support vector machines,” IEEE Trans. Neural Netw., vol. 17,
no. 4, pp. 893–908, Jul. 2006.
[25] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Marí, J. Vila-Francés, and J.
Calpe-Maravilla, “Composite kernels for hyperspectral image classifica-
tion,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[26] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classifi-
cation using dictionary-based sparse representation,” IEEE Trans. Geosci.
Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.
[27] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image clas-
sification via kernel sparse representation,” IEEE Trans. Geosci. Remote
Sens., vol. 51, no. 1, pp. 217–231, Jan. 2013.
[28] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data:
A technical tutorial on the state of the art,” IEEE Geosci. Remote Sens.
Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016.
[29] Y. Chen, X. Zhao, and X. Jia, “Spectral-spatial classification of hyperspec-
tral data based on deep belief network,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 8, no. 6, pp. 2381–2392, Jun. 2015.
[30] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief
networks for scalable unsupervised learning of hierarchical representa-
tions,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 609–616.
[31] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extrac-
tion and classification of hyperspectral images based on convolutional
neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10,
pp. 6232–6251, Oct. 2016.
[32] P. Ghamisi, Y. Chen, and X. X. Zhu, “A self-improving convolution neural
network for the classification of hyperspectral data,” IEEE Geosci. Remote
Sens. Lett., vol. 13, no. 10, pp. 1537–1541, Oct. 2016.
[33] Y. Chen, Y. Wang, Y. Gu, X. He, P. Ghamisi, and X. Jia, “Deep learning
ensemble for hyperspectral image classification,” IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens., vol. 12, no. 6, pp. 1882–1897, Jun. 2019.
[34] N. Yokoya, C. Grohnfeldt, and J. Chanussot, “Hyperspectral and mul-
tispectral data fusion: A comparative review of the recent litera-
ture,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 2, pp. 29–56,
Jun. 2017.
[35] T. Blaschke, “Object based image analysis for remote sensing,” ISPRS J.
Photogramm. Remote Sens., vol. 65, no. 1, pp. 2–16, 2010.
[36] V. Nair and G. E. Hinton, “Rectified linear units improve restricted
boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp.
807–814.
[37] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers:
Surpassing human-level performance on imagenet classification,” in Proc.
IEEE Int. Conf. Comput. Vis., 2015, pp. 1026–1034.
[38] Z. Zuo et al., “Learning contextual dependence with convolutional hier-
archical recurrent neural networks,” IEEE Trans. Image Process., vol. 25,
no. 7, pp. 2983–2996, Jul. 2016.
[39] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for
nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
[40] S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson, "Deep
learning for hyperspectral image classification: An overview,” IEEE Trans.
Geosci. Remote Sens., vol. 57, no. 9, pp. 6690–6709, Sep. 2019.
[41] J. M. Haut, M. E. Paoletti, J. Plaza, J. Li, and A. Plaza, "Active learning with
convolutional neural networks for hyperspectral image classification using
a new Bayesian approach,” IEEE Trans. Geosci. Remote Sens., vol. 56,
no. 11, pp. 6440–6461, Nov. 2018.
[42] X. Yang, Y. Ye, X. Li, R. Y. Lau, X. Zhang, and X. Huang, “Hyperspectral
image classification with deep learning models,” IEEE Trans. Geosci.
Remote Sens., vol. 56, no. 9, pp. 5408–5423, Sep. 2018.
[43] L. Jiao, M. Liang, H. Chen, S. Yang, H. Liu, and X. Cao, “Deep fully con-
volutional network-based spatial distribution prediction for hyperspectral
image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 10,
pp. 5585–5599, Oct. 2017.
[44] J. M. Nascimento and J. M. Dias, “Vertex component analysis: A fast
algorithm to unmix hyperspectral data," IEEE Trans. Geosci. Remote Sens.,
vol. 43, no. 4, pp. 898–910, Apr. 2005.
[45] W. Li, R. Dong, H. Fu, J. Wang, L. Yu, and P. Gong, "Integrating Google Earth imagery with Landsat data to improve 30-m resolution land cover
mapping,” Remote Sens. Environ., vol. 237, 2020, Art. no. 111563.
[46] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
2014, arXiv:1412.6980.
[47] M. Paoletti, J. Haut, J. Plaza, and A. Plaza, “Deep learning classifiers for
hyperspectral imaging: A review,” ISPRS J. Photogramm. Remote Sens.,
vol. 158, pp. 279–317, 2019.
Xiaoxiao Feng received the bachelor’s degree in surveying and mapping from
Southeast University, Nanjing, China, in 2014, and the master’s degree in
earth exploration and information technology from the China University of
Geosciences, Wuhan, China, in 2017, and the Ph.D. degree in photogrammetry
and remote sensing from Wuhan University, Wuhan, China, in 2021.
She is currently with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University. Her research interests include hyperspectral image processing and urban impervious surface extraction.
Zhenfeng Shao received the bachelor's degree in surveying engineering and the master's degree in cartography and geographical information system from the Wuhan Technical University of Surveying and Mapping, Wuhan, China, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China.
He is currently a Professor with the State Key Laboratory of Information
Engineering in Surveying, Mapping and Remote Sensing, Wuhan University,
Wuhan, China. His research interest mainly focuses on urban remote sensing
applications. The specific research directions include high-resolution remote
sensing image processing and analysis, key technologies and applications from
digital cities to smart cities and sponge cities.
Xiao Huang received the bachelor's degree in remote sensing and information engineering from Wuhan University, Wuhan, China, in 2015, the master's degree in city planning and architecture from the Georgia Institute of Technology, Shenzhen, China, in 2016, and the Ph.D. degree in geography from the University of South Carolina, Columbia, SC, USA, in 2020.
He is currently an Assistant Professor with the Department of Geosciences,
University of Arkansas, Fayetteville, AR, USA. His research interests include
remote sensing and GIS in natural hazards, data-driven visualization and ad-
vanced data fusion flood models, big social data mining, regional geospatial
analysis, remote sensing, and GeoAI.
Luxiao He received the bachelor's degree in geo-information science and technology and the master's degree in earth exploration and information technology from the China University of Geosciences, Wuhan, China, in 2014 and 2017, respectively, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2021.
He is currently with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, China. His research interests include high spatial resolution image processing and application.
Xianwei Lv received the bachelor’s degree in geographic information science
from the East China University of Science and Technology, Nanchang, China,
in 2016 and the master’s degree in surveying and mapping from the China
University of Geosciences, Beijing, China, in 2019. He is currently working
toward the Ph.D. degree in photogrammetry and remote sensing with the State
Key Laboratory of Information Engineering in Surveying, Mapping and Remote
Sensing, Wuhan University, Wuhan, China.
He does research in deep learning for very high-resolution image processing
and applications.
Qingwei Zhuang received the bachelor’s degree in surveying and mapping
from Henan Polytechnic University, Henan, China, in 2017, and the master’s
degree in surveying and mapping from the University of Chinese Academy of
Sciences, Beijing, China, in 2020. He is currently working toward the Ph.D.
degree in photogrammetry and remote sensing with the State Key Laboratory of
Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan
University, Wuhan, China.
His research interest mainly focuses on remote sensing applications. The specific research directions include remote sensing image processing and analysis, and key technologies and applications in urban ecosystems.