Integrating Zhuhai-1 Hyperspectral Imagery
With Sentinel-2 Multispectral Imagery to Improve
High-Resolution Impervious Surface Area Mapping
Xiaoxiao Feng, Zhenfeng Shao, Xiao Huang, Luxiao He, Xianwei Lv, and Qingwei Zhuang
Abstract—Mapping impervious surface area (ISA) in an accu-
rate and timely manner is essential for a variety of fields and
applications, such as urban heat islands, hydrology, waterlogging,
and urban planning and management. However, the large and
complex urban landscapes pose great challenges in retrieving ISA
information. Spaceborne hyperspectral (HS) remote sensing im-
agery provides rich spectral information with short revisit cycles,
making it an ideal data source for ISA extraction from complex
urban scenes. Nevertheless, insufficient single-band energy, the
involvement of modulation transfer function (MTF), and the low
signal-to-noise ratio (SNR) of spaceborne HS imagery usually re-
sult in poor image clarity and noises, leading to inaccurate ISA
extraction. To address this challenge, we propose a new deep feature
fusion-based classification method to improve 10 m resolution ISA
mapping by integrating Zhuhai-1 HS imagery with Sentinel-2 mul-
tispectral (MS) imagery. We extract deep features that include spectral and spatial features, respectively, from HS and MS imagery
via a 2-D convolutional neural network (CNN), aiming to increase
feature diversity and improve the model’s recognition capability.
The Sentinel-2 imagery is used to enhance the spatial information
of the Zhuhai-1 HS image, improving the urban ISA retrieval
by reducing the impact of noises. By combining the deep spatial
features and deep spectral features, we obtain joint spatial-spectral
features, leading to high classification accuracy and robustness. We
test the proposed method in two highly urbanized study areas that
cover Foshan city and Wuhan city, China. The results reveal that
the proposed method obtains an overall accuracy of 96.72% and
96.75% in the two study areas, 18.78% and 8.66% higher than
classification results with only HS imagery as input. The final ISA
extraction overall accuracy is 95.42% and 95.50% in the two study
areas, the highest among the comparison methods.
Index Terms—Convolutional neural network (CNN), feature
fusion, impervious surface area (ISA) mapping, Sentinel-2 imagery,
Zhuhai-1 spaceborne hyperspectral (HS) imagery.
Manuscript received August 9, 2021; revised October 30, 2021 and January
4, 2022; accepted March 5, 2022. Date of publication March 8, 2022; date
of current version March 25, 2022. This work was supported in part by the
National Natural Science Foundation of China under Grant 42090012, in part
by 03 Special Research and 5G Project of Jiangxi Province in China under
Grant 20212ABC03A09, in part by the Zhuhai Industry University Research
Cooperation Project of China under Grant ZH22017001210098PWC, and in
part by the Key R&D project of Sichuan Science and Technology Plan under
Grant 2022YFN0031. (Corresponding author: Zhenfeng Shao.)
Xiaoxiao Feng, Zhenfeng Shao, Luxiao He, Xianwei Lv, and Qingwei Zhuang
are with the State Key Laboratory of Information Engineering in Survey-
ing, Mapping, Remote Sensing, Wuhan University, Wuhan 430079, China
(e-mail: fengxxalice2018@gmail.com; shaozhenfeng@whu.edu.cn; heluxiao@foxmail.com; xianweilv@whu.edu.cn; zhuangqingwei@whu.edu.cn).
Xiao Huang is with the Department of Geosciences, University of Arkansas,
Fayetteville, AR 72701 USA (e-mail: xh010@uark.edu).
Digital Object Identifier 10.1109/JSTARS.2022.3157755
I. INTRODUCTION
IMPERVIOUS surface area (ISA) is usually defined as natural or artificial surfaces in cities (e.g., roads, parking lots, and roofs made of cement concrete, glass, asphalt, plastic, tiles, or metal) that prevent water from penetrating into the ground [1].
ground [1]. The rapid progress of urbanization inevitably leads
to tremendous changes in land use and land cover types. ISA is
a key indicator in evaluating the urban ecological environment
and usually poses notable negative impacts on the urban environ-
ment [2], climate [3], [4], and hydrology [5]–[7]. Therefore, the
evaluation of ISA distribution should focus on not only its spatial
expansion, but also its environmental consequences. Further-
more, it is of great significance for the sustainable development
strategy of urban planning and management to obtain accurate
ISA information in a timely manner and investigate the impact
of its dynamic changes on the environment.
Remote sensing technology has been widely used in ISA
monitoring, thanks to its extensive spatial coverage and high
temporal frequency. Early studies on ISA were mostly based on
medium-resolution multispectral (MS) satellites such as Landsat
Thematic Mapper (TM) [8], [9] and Enhanced Thematic Mapper
(ETM+) [10]. However, the complexity of urban landscapes and
broadband reflectance data pose great challenges in ISA classi-
fication, as many urban materials cannot be distinguished ac-
curately. Besides, ISA maps with coarse resolutions are limited
in potential applications, e.g., distinguishing urban functional
areas [1]. In comparison, fine-resolution ISA maps allow for
more spatial-explicit studies such as investigating the impact of
urbanization on energy, water, carbon cycles, vegetation phenol-
ogy, and surface climate [11]. Hyperspectral (HS) imagery can
provide not only spatial information of features, but also rich
spectral information that can accurately reflect heterogeneous
spectral characteristics of features, leading to fine identification
and classification. Most of the existing ISA studies utilized
classic HS data captured by the airborne HS sensors [12],
such as the simulated Environmental Mapping and Analysis
Program (EnMAP) [13], the Hyperspectral Digital Imagery
Collection Experiment (HYDICE) [14], and Reflective Optics
System Imaging Spectrometer (ROSIS) [15]. Signal-to-noise
ratio (SNR) describes the quality of a measurement. In charge-
coupled device (CCD) imaging, SNR refers to the ratio of the
measured signal to the overall measured noise (frame-to-frame)
at that pixel. High SNR is particularly important in applications
requiring precise measurement. The advantage gained from the
fine spectral information obtained from HS sensors can be offset
by the lower SNR when compared to MS sensors because of
the fewer number of photons captured by each detector due
to the narrower width of the spectral channels. Compared to
the spaceborne HS data, images from airborne HS sensors are
characterized by high spatial resolution and high SNR. However,
the airborne HS data is limited in synoptic coverage at urban
scales, which limits their use for systematically mapping urban
land cover of arbitrary cities around the world [13]. This study
marks a pioneering effort to integrate the spaceborne HS data and
spaceborne MS data (Sentinel-2) for accurate and fine-grained
(10 m) ISA mapping.
The classification-based ISA extraction methods aim to first
extract spatial and spectral features and feed them into classifiers
to obtain the ISA distribution map. Traditional classification
methods include maximum likelihood estimation (MLE) [16],
support vector machine (SVM) [17], [18], random forest
(RF) [19], and their derivations [20]–[22]. Among them, SVM is
superior to MLE, as it can solve the nonlinear classification prob-
lem. Further, parallel SVM (PSVM) [23] has been developed to
solve the computational complexity problem, and the hierar-
chical PSVM method is designed based on sequential minimal
optimization (SMO) [24] and SVM. In addition, kernel methods
combined with SVM are widely used in HS image classifica-
tion to improve separability [25]. Recently, improved sparse
representation, e.g., synchronous orthogonal matching pursuit
(SOMP) [26] and synchronous subspace pursuit (SSP) [27],
was applied to HS image classification and achieved great
classification results. In the aforementioned methods, training
samples are used to learn the sparse representation dictionary,
where the test samples in HS images are sparsely represented.
The representation residuals are further compared to find the
best representation to determine the label of samples.
However, traditional classification methods largely rely on
expertise and are dependent on parameter settings, leading to
their low automation and low generalization. The deep learning
networks, such as stacked automatic encoder (SAE) [28], deep
belief network (DBN) [29], [30], and deep convolutional neural
network (DCNN) [31], [32], are different from traditional feature
extraction methods. Compared with other networks, CNN uses
local connections to extract features with shared weights. Such
a design facilitates effective information retrieval and reduces
the number of parameters needed to be trained. Chen et al. [33]
applied a self-coding network to classify the reduced HS images
and achieved decent results. They further found that CNN can
extract the spatial and spectral features of the objects in images
in a more effective manner, thus leading to better classification
results.
After reviewing relevant literature, we identified the following
challenges in ISA retrieval based on HS images:
1) The low SNR and modulation transfer function (MTF)
of spaceborne HS data lead to defective spatial informa-
tion, evidenced by the low-quality spectral information of
ground objects.
2) The spectral-based CNN methods fail to integrate the
spatial information of ground objects, which results in
salt-and-pepper noises in classified results, thus leading
to reduced classification accuracy.
To address these challenges, we propose a novel approach to
improving the ISA extraction accuracy by integrating Sentinel-2
MS data and Zhuhai-1 HS data. The first strategy is to first
fuse HS and MS images and then obtain the ISA results using
classifiers. Commonly used HS-MS image fusion methods can
be roughly classified into pan-sharpening and subspace-based
methods. Pan-sharpening-based methods include component
substitution, multiresolution analysis, and sparse representa-
tion [34]. The latter category, e.g., Bayesian method-based
methods and spectral unmixing-based methods, focuses on the
inherent spectral characteristics of scenes. The other strategy
is to first fuse the features extracted from HS and MS, and
further obtain the classification results. Comparing these two
strategies, the former one highly relies on the fused image, and
the accuracy of classification results based on pixel-level fusion
images depends on the spectral fidelity of the fusion algorithm.
Therefore, in this article, we use CNNs to extract the features
from HS and MS images, respectively, and fuse the features to
obtain the classification results.
The proposed integration process is achieved by fusing the
spectral and spatial deep features extracted from HS and MS
images, thus potentially improving the accuracy of the final ISA
map. As HS imagery contains abundant spectral information
while MS data contains detailed spatial information, we extract
spectral and spatial deep features from HS imagery and MS
imagery, respectively. In this study, we utilize two-dimensional
(2-D) CNN to extract the deep features and further enhance
features by fusing extracted spectral and spatial deep features.
To deal with salt-and-pepper noises of classification results, the
object-based image analysis (OBIA) [35] method is a commonly
used approach. However, the OBIA classification method is
mainly for images with very high spatial resolution. For this
study, the spatial resolution of HS satellite images used in this
article is 10 m, which is not ideal for the application of OBIA.
Therefore, we use a 2-D CNN to extract the spatial
information of images and further perform impervious surface
classification.
The main contributions of this article are summarized as
follows.
1) The extraction of the spectral and spatial deep features
from HS and MS images, respectively, and their fusion
contribute to better feature retrieval from the ground ob-
jects in images, thus leading to improved classification
accuracy.
2) The fusion of spectral and spatial deep features improves
the model’s robustness and reduces noises in classified
results assisted by the supplement of spatial information.
3) Zhuhai-1 HS data (2-day revisiting cycle) and Sentinel-2
MS data (5-day revisiting cycle) have a considerably high
temporal resolution. Therefore, their combination real-
izes a high-temporal fine-grained ISA mapping, providing
the basis for future time series ISA analysis and timely
supports in urban land management and construction
planning.
The rest of this article is organized as follows. Section II
introduces related works and the proposed method. Section III
describes the study areas and experimental datasets. Section IV
presents and analyzes the experimental results. Section V
discusses the effectiveness of the proposed method compared
with a single feature classification network and the effect of
different patch sizes on the ISA extraction. Finally, Section VI
concludes this article.
II. METHODOLOGY
A. Convolutional Neural Network
CNN has received wide attention in recent years and achieved
great performances in classification, detection, and many other
tasks. CNN has two characteristics: local connection and shared
weights. In each convolution layer, feature maps are generated
by multiple learnable filters, which can be expressed as
$$y_j^l = f\left(\sum_{i=1}^{d} x_i^{l-1} * w_{ij}^l + b_j^l\right) \tag{1}$$

where $x_i^{l-1}$ denotes the $i$th feature map of layer $l-1$, $y_j^l$ denotes the $j$th feature map of layer $l$, and $d$ is the number of input feature maps. $w_{ij}^l$ and $b_j^l$ are the randomly initialized weights and bias, respectively, $*$ denotes the convolutional operator, and $f$ denotes a nonlinear activation function, such as Sigmoid, Tanh, or the rectified linear unit (ReLU) [36]. In this article, we use the parametric ReLU (PReLU) [37], which can be formulated as

$$\mathrm{PReLU}(y_i) = \begin{cases} y_i, & \text{if } y_i > 0 \\ a_i y_i, & \text{if } y_i \le 0 \end{cases} \tag{2}$$
where $y_i$ is the input of the $i$th channel and $a_i$ is a coefficient that controls the slope. After the convolution operations, a max-pooling layer is used to downsample the feature maps [38]. In this way, the output size and the number of parameters can be reduced, effectively avoiding overfitting. The max-pooling operation can be formulated as

$$y_{r,c} = \max_{0 \le g,\, h \le n} \left(x_{r+g,\, c+h}\right) \tag{3}$$

where $y_{r,c}$ is the neuron value at $(r, c)$ in the output layer, and $g$ and $h$ index the pixel positions around the center neuron at $(r, c)$ within the image patch.
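To make (1)–(3) concrete, the following is a minimal PyTorch sketch of one convolution block: a learnable convolution as in (1), a PReLU activation with per-channel slopes $a_i$ as in (2), and a 2 × 2 max pooling as in (3). The 32-band input is an illustrative stand-in for the actual number of channels.

```python
import torch
import torch.nn as nn

# One convolution block: convolution (1), PReLU (2), 2x2 max pooling (3).
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4),  # y = w * x + b
    nn.PReLU(num_parameters=64),   # one learnable slope a_i per channel
    nn.MaxPool2d(kernel_size=2),   # max over each 2x2 window
)

x = torch.randn(8, 32, 27, 27)     # a batch of 27x27 patches with 32 bands
y = conv_block(x)
print(y.shape)                     # torch.Size([8, 64, 12, 12])
```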
The network training procedure consists of forward and
backward propagations, aiming to reduce the gap between the
predicted labels and the ground truth labels by updating model
parameters. The loss/cost is calculated by the differences be-
tween the predicted values and the ground-truthing values in
the forward propagation. The purpose of backpropagation is to
reduce loss by adjusting the parameters. In this study, we use
the softmax cross-entropy loss
$$c = -\frac{1}{m}\sum_{i=1}^{m}\Big[x_i \log z_i + (1 - x_i)\log(1 - z_i)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{N} w_j^2 \tag{4}$$
where $m$ is the size of the image batch, $x_i$ and $z_i$ denote the $i$th ground-truth label and predicted value, respectively, and $N$ is the number of weights. $\lambda$ is the parameter that adjusts the proportion between the original loss (the former term) and the regularization term (the latter term) in (4). We set $\lambda$ to $\frac{1}{2}$ to simplify the derivation. Studies have proved that the $l_2$ regularization term can keep models from overfitting [31], [39].

Fig. 1. Workflow of ISA extraction by fusing spectral-spatial deep features using 2-D CNN.

Fig. 2. Workflow of the 2-D CNN-based deep feature extraction. (a) The workflow of patchwise feature extraction; (b) the workflow of pixelwise feature extraction.
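As a concrete reading of (4), which as written has the binary cross-entropy form, the following is a minimal sketch of the regularized loss: cross-entropy over a batch plus an $l_2$ penalty on the weights scaled by $\lambda/(2m)$. The names `regularized_loss` and `lam` are illustrative, and the tiny linear model is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def regularized_loss(z, x, model, lam=0.01):
    """Cross-entropy of predictions z against labels x plus an l2 penalty, as in (4)."""
    m = x.shape[0]                                # batch size m
    bce = F.binary_cross_entropy(z, x)            # -(1/m) sum[x log z + (1-x) log(1-z)]
    l2 = sum((p ** 2).sum() for p in model.parameters() if p.dim() > 1)  # weights only
    return bce + lam / (2 * m) * l2

model = nn.Linear(8, 1)                                  # placeholder model
z = torch.sigmoid(model(torch.randn(4, 8))).squeeze(1)   # predictions in (0, 1)
x = torch.randint(0, 2, (4,)).float()                    # ground-truth labels
print(regularized_loss(z, x, model))
```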
The classification approaches of HS images based on CNN
can be grouped into three categories, i.e., spectral feature-based,
spatial feature-based, and spatial-spectral feature-based meth-
ods [40]. Spectral feature-based classification methods apply
one-dimension (1-D) CNN to extract the deep spectral features
for classification [41], [42]. In comparison, spatial feature-based
classification methods apply 2-D CNN to extract the spatial
information for classification [43]. The main difference between
1-D CNN and 2-D CNN is the dimensionality of the convolution
operation. In this study, we use 2-D CNN to extract the spectral
features from HS images and spatial features from MS images
and further fuse the obtained spectral-spatial deep features to
improve the classification accuracy of ISA.
B. Extraction and Fusion of Spectral and Spatial Features via
2-D CNN
HS images contain rich spectral information that benefits
accurate descriptions of the spectral characteristics of ground
objects. Given the spaceborne nature, HS images are with lim-
ited SNR and spatial resolution. In contrast, MS images with
the same spatial resolution are characterized by high SNR.
Therefore, to simultaneously obtain the spectral and spatial
information, we use 2-D CNN to extract the deep spectral
features from HS images and the deep spatial features from MS
images. We further fuse the extracted deep features for land
cover classification and eventually map the fine-grained ISA
distribution. The specific workflow is shown in Fig. 1.
1) Deep Features Extraction From HS and MS Datasets: In
this article, the spatial and spectral deep features of images are
extracted by three convolution layers and a fully connected layer
[see Fig. 2(a)]. The first layer takes the 27 × 27 image patch with $N_1$ channels and calculates 64 feature maps using a 4 × 4 receptive field and the nonlinear activation PReLU. The second layer takes the resulting 12 × 12 feature maps with 64 channels and calculates 128 feature maps using a 5 × 5 receptive field and PReLU. The third layer takes the 4 × 4 feature maps with 128 channels and calculates 256 feature maps using a 4 × 4 receptive field and PReLU. The calculation process of these three convolution layers is expressed in (5). Finally, the fully connected layer takes the 1 × 1 vector with 256 feature maps to derive the classification results. The workflow of the patchwise feature extraction is shown in Fig. 2(a), and the workflow of pixelwise feature extraction is shown in Fig. 2(b).
$$\begin{aligned}
f_1(x) &= \max\left(0,\; b_1 + w_1 * x\right), & w_1&: 64 \times (4 \times 4 \times N_1), & b_1&: 64 \times 1\\
f_2(x) &= \max\left(0,\; b_2 + w_2 * f_1(x)\right), & w_2&: 128 \times (5 \times 5 \times 64), & b_2&: 128 \times 1\\
f_3(x) &= \max\left(0,\; b_3 + w_3 * f_2(x)\right), & w_3&: 256 \times (4 \times 4 \times 128), & b_3&: 256 \times 1
\end{aligned} \tag{5}$$
To compare the classification performance between the pixel-
based 1-D CNN and image patch-based 2-D CNN, we use two
workflows to obtain the results [see Fig. 2(a) and (b)]. The pixel-
based 1-D CNN extracts only spectral features, while the patch-
based 2-D CNN extracts both spectral and spatial features from
images. We further conduct experiments to analyze how such
workflow selection influences classification accuracy.
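To make the patchwise branch in (5) concrete, below is a minimal PyTorch sketch of the three convolution layers and the fully connected layer, assuming each stride-1 convolution is followed by a 2 × 2 max pooling, which reproduces the 27 → 12 → 4 → 1 progression of spatial sizes described above. The 32-band input and eight classes are illustrative stand-ins.

```python
import torch
import torch.nn as nn

class PatchBranch(nn.Module):
    """Patchwise feature extractor sketched from (5): 64, 128, and 256 feature
    maps with 4x4, 5x5, and 4x4 receptive fields, each followed by PReLU."""
    def __init__(self, n1_bands: int, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n1_bands, 64, kernel_size=4), nn.PReLU(64),  # 27 -> 24
            nn.MaxPool2d(2),                                       # 24 -> 12
            nn.Conv2d(64, 128, kernel_size=5), nn.PReLU(128),      # 12 -> 8
            nn.MaxPool2d(2),                                       # 8 -> 4
            nn.Conv2d(128, 256, kernel_size=4), nn.PReLU(256),     # 4 -> 1
        )
        self.classifier = nn.Linear(256, n_classes)                # fully connected layer

    def forward(self, x):
        f = self.features(x).flatten(1)    # 256-dim deep feature per patch
        return self.classifier(f), f       # class logits and the deep feature

branch = PatchBranch(n1_bands=32, n_classes=8)
logits, feats = branch(torch.randn(4, 32, 27, 27))
print(logits.shape, feats.shape)           # torch.Size([4, 8]) torch.Size([4, 256])
```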
2) Spectral and Spatial Deep Features Fusion: Multisource
images contain diverse information, while single-source images
may not achieve the best classification performance due to
their lack of feature diversity. We use concatenation to fuse the
features to enhance the feature discrimination ability, which we denote as FF-C:

$$H_C^{(l+1)} = \left[H_{\mathrm{HS}}^{[l]},\; H_{\mathrm{MS}}^{[l]}\right] \tag{6}$$

where $[\cdot,\cdot]$ denotes the concatenation operation, and $H_{\mathrm{HS}}^{[l]}$ and $H_{\mathrm{MS}}^{[l]}$ denote the $l$th-layer features extracted from the HS and MS images, respectively.
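A minimal sketch of the FF-C fusion in (6): the deep features from the HS and MS branches are concatenated along the feature dimension and fed to a final fully connected classifier. The 256-dim features and eight classes are illustrative, and the branch outputs here are random placeholders.

```python
import torch
import torch.nn as nn

h_hs = torch.randn(4, 256)               # deep spectral features from the HS branch
h_ms = torch.randn(4, 256)               # deep spatial features from the MS branch
h_c = torch.cat([h_hs, h_ms], dim=1)     # joint spatial-spectral features, (4, 512)

classifier = nn.Linear(512, 8)           # final fully connected classification layer
print(classifier(h_c).shape)             # torch.Size([4, 8])
```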
III. STUDY AREAS AND DATASETS
This study includes two study areas that cover parts of Foshan
city in Guangdong Province, China and Wuhan city in Hubei
Province, China, respectively. The HS dataset is derived from
the Zhuhai-1 Orbita HS satellite, while the MS dataset is derived
from the Sentinel-2 satellite.
A. Zhuhai-1 OHS HS Datasets
The second batch of Zhuhai-1 microsatellites was success-
fully launched on April 26, 2018, including four Orbita HS
satellites (referred to as OHS-A, OHS-B, OHS-C, and OHS-D)
and one video satellite (OVS-2A). The spatial resolution of OHS data is 10 m, with an imaging swath of 150 km, a spectral resolution of 2.5 nm, and spectral coverage from 400 to 1000 nm
(see Table I). A single HS satellite has 15–16 daily orbits, and
the single data acquisition time of each orbit is less than 8 min.
At present, the revisiting cycle of the four satellites is two days.
TABLE I
CENTER WAVELENGTH OF OHS HS DATA

TABLE II
CENTER WAVELENGTH AND SPATIAL RESOLUTION OF SENTINEL-2 IMAGERY
The OHS satellite is characterized by its small size, high spatial
resolution, large breadth, and short revisit period. It is expected to
benefit various tasks that include ecological environment moni-
toring, urban construction management, agricultural production,
disaster prediction, and assessment.
B. Sentinel-2 MS Datasets
Sentinel-2 is an Earth observation mission from the Coperni-
cus Programme (operated by the European Space Agency) that
systematically acquires optical imagery at high spatial resolution
(10–60 m) over land and coastal waters. The mission supports
a broad range of services and applications such as agricultural
monitoring, emergency management, land cover classification,
and water quality monitoring. Sentinel-2 has a 5-day revisiting
cycle. We select the bands with 10 m spatial resolution for feature extraction (see Table II). The dataset is downloaded from the USGS website from the collection of Level-1C products.
C. Study Area
The first study area (344.84 km²), located at 113°4′–113°15′E, 22°48′–22°59′N, covers part of Foshan city in South
China’s Guangdong province (see Fig. 3). Lying in the middle
of the Pearl River delta plain, Foshan city has a high degree
of urbanization and owns a large number of scattered hills,
rivers, and water networks, including navigation, irrigation,
aquaculture, and other functional areas. The second study area
(388.09 km²), located at 114°6′–114°20′E, 30°22′–30°34′N, covers part of Wuhan city in Hubei province (see Fig. 4). Wuhan city is located in the east of the Jianghan Plain, on the middle reaches of the Yangtze River at the confluence of the Yangtze and Han rivers.
Fig. 3. Study area in Foshan city (1855 × 1855 pixels). (a) Zhuhai-1 HS image (shown in bands 12, 6, 1 as RGB) acquired on November 9, 2019; (b) Sentinel-2 MS image (shown in bands 4, 3, 2 as RGB) acquired on November 11, 2019.
Fig. 4. Study area in Wuhan city (1970 × 1970 pixels). (a) Zhuhai-1 HS image (shown in bands 12, 6, 1 as RGB) acquired on September 21, 2019; (b) Sentinel-2 MS image (shown in bands 4, 3, 2 as RGB) acquired on September 22, 2019.
Fig. 5. Examples of 8 different land cover types in the Foshan study area. (a) Google Earth image; (b) Zhuhai-1 HS image; (c) Sentinel-2 MS image.
Both study areas are characterized by high-level urbanization
and dense water networks, making them prone to waterlog-
ging issues, especially after frequent and intensive rain. The
fine-resolution ISA distribution can provide the basis for the
investigation of urban resilience. Monitoring of ISA distribution
dynamics plays a vital role in urban environmental impact
analysis and planning management. Both Zhuhai-1 HS image
and Sentinel-2 MS image were captured under clear-sky condi-
tions to illustrate the effectiveness of the proposed classification
workflow that integrates these two images for an improved
10 m ISA mapping. Given the short time intervals between HS
and MS images in the two study areas, we believe there exist
no significant changes in ground features. The image SNR is
estimated based on the number of object types in the study area, following [44].
The SNR values of the HS and MS images in the Foshan study
area are 28.45 and 149.69 dB, respectively. The SNR values
of HS and MS images in the Wuhan study area are 35.36 and
149.66 dB, respectively.
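For reference, SNR values like those above are reported in decibels; a hedged sketch of the conversion is shown below, assuming signal- and noise-power estimates are already available (the estimator used in this article follows [44] and is not reproduced here).

```python
import numpy as np

def snr_db(signal_power: float, noise_power: float) -> float:
    """Convert a signal-to-noise power ratio to decibels."""
    return 10.0 * np.log10(signal_power / noise_power)

print(snr_db(signal_power=1.0, noise_power=1e-15))  # 150.0, the order reported for MS
```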
IV. EXPERIMENTS AND RESULTS
In this section, we detail our experimental settings and present
the results along with the analysis. Section IV-A details the sam-
ple selection procedure, Section IV-B shows the experimental
setup. Section IV-C shows the effectiveness of the proposed
fusion algorithm that integrates HS and MS deep features, and
Section IV-D shows the comparison of classification results
obtained from the proposed method and other state-of-art clas-
sification methods.
A. Sample Selection
Before the sample selection, the HS and MS images were geometrically registered manually by selecting corresponding points.
The training, validation, and testing samples used in this study
were all selected from Sentinel-2 images via human interpreta-
tion against Google Earth imagery. We first randomly select the
sample points (pixels) from the image, making sure the sampling
points are evenly distributed on the image. Then, we classify the
points into their corresponding types. Finally, we divide sample
points into training, validation, and testing samples according to the ratio of 8:1:1.

TABLE III
NUMBER OF TRAINING, VALIDATION, AND TESTING SAMPLES USED IN THE FOSHAN STUDY AREA

TABLE IV
NUMBER OF TRAINING, VALIDATION, AND TESTING SAMPLES USED IN THE WUHAN STUDY AREA

For the pixelwise input, samples are the
central pixels, while for the patchwise input, samples are the
patches with different sizes centered around the central pixels.
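A minimal sketch of this sampling scheme, assuming the image is a (bands, height, width) array: the labeled pixels are shuffled and split 8:1:1, and patchwise inputs are cut out centered on each sampled pixel (border handling and per-class stratification are omitted for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)

def split_811(n_samples):
    """Shuffle sample indices and split them into train/val/test at 8:1:1."""
    idx = rng.permutation(n_samples)
    n_train, n_val = int(0.8 * n_samples), int(0.1 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def extract_patch(image, row, col, size=27):
    """Cut a size x size patch centered on (row, col); shape (bands, size, size)."""
    r = size // 2
    return image[:, row - r:row + r + 1, col - r:col + r + 1]

train_idx, val_idx, test_idx = split_811(1000)
patch = extract_patch(np.zeros((32, 256, 256)), row=100, col=100)
print(len(train_idx), len(val_idx), len(test_idx), patch.shape)  # 800 100 100 (32, 27, 27)
```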
The Foshan study area has eight land cover types, i.e., vegetation,
roof, asphalt road, river, dense building, bright ISA, pond, and
soil (see Fig. 5). The number of samples for each land cover type
can be found in Table III. The total samples of the Foshan study
area contain 473 302 pixels. For the eight derived land cover
types, the land cover types that include roof, asphalt road, dense
building, and bright ISA are classified as ISA.
The Wuhan study area has 10 land cover types, including
soil, bright ISA, concrete road, vegetation, dense building, lake,
asphalt road, algae, roof, and river (see Fig. 6). The number
of samples for each classified land cover type is listed in Ta-
ble IV. The total samples of the Wuhan study area contain 294 449 pixels. Land cover types that include bright ISA, concrete road, dense building, asphalt road, and roof are classified as ISA.

Fig. 6. Examples of the ten different land cover types in the Wuhan study area. (a) Google Earth image; (b) Zhuhai-1 HS image; (c) Sentinel-2 MS image.

Fig. 7. Spectral reflectance curves of each land cover type from the Zhuhai-1 HS image for the training samples in the Wuhan study area. Each curve is obtained by calculating the average value and standard deviation of the spectral reflectance of all training samples for the given land cover type and band.
To illustrate that different land cover types have distinct spectral characteristics in HS images, which benefits the classification, we present the spectral reflectance curves in Fig. 7, using the Wuhan study area as an example. For each land cover type in the Wuhan study area, we calculate the average and standard deviation over all training samples for each band, following [45]. Fig. 8(a) and (b) shows the ground-truth distribution of land cover types in the Foshan and Wuhan study areas.
B. Experimental Setup
1) Implementation Details: The proposed network is implemented on the PyTorch platform with the Adam optimizer [46]. In the network training, we set the maximum number of epochs to 30, the batch size in the training phase to 100, and the learning rate to 0.001. The input data are normalized into [0, 1]. According to the number of training samples in the two study areas, we set the number of training batches to 3786 and 2355 in the Foshan and Wuhan study areas, respectively. To reduce overfitting and to stabilize the network during the training phase, we set the l2-norm regularization to 0.01.
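A hedged sketch of this training setup is shown below: Adam with a learning rate of 0.001, a batch size of 100, 30 epochs, inputs normalized to [0, 1], and the l2 regularization of 0.01 (here assumed to be applied via Adam's weight decay). The random dataset and the linear model are placeholders for the actual patches and the 2-D CNN.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

x = torch.rand(1000, 32 * 27 * 27)         # placeholder inputs, normalized to [0, 1]
y = torch.randint(0, 8, (1000,))           # placeholder land cover labels
loader = DataLoader(TensorDataset(x, y), batch_size=100, shuffle=True)

model = nn.Linear(32 * 27 * 27, 8)         # stand-in for the 2-D CNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):                    # maximum number of epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```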
2) Comparison With Baseline Methods: The competing
methods are the classic classification methods with the following
parameter settings.
1) RF: 200 decision trees are used in the classifier.
2) SVM: The kernel is the radial basis function with two optimal hyperparameters σ and λ, set to 0.1 and 0.01, respectively.
3) Multinomial Logistic Regression (MLR): We choose the l2 regularization as the penalty (set to 0.01) and "lbfgs" as the solver.
4) Multilayer Perceptron (MLP): We set the batch size to 100, the max epoch to 30, the l2-norm regularization to 0.01, the activation function to ReLU, and the optimizer to Adam.
5) Vanilla Recurrent Neural Network (RNN).
6) RNN with gated recurrent units (GRU).
7) RNN with long short-term memory (LSTM). The code for
these competing methods is available in [47]. All methods use
the same training, validation, and testing samples.
The performances of the classification results are assessed
based on three indicators that include the overall accuracy (OA),
average accuracy (AA), and Kappa coefficient (Kappa). The OA
measures the ratio between correctly classified testing samples
and the total number of testing samples. The AA measures
the average percentage of correctly classified samples for an
individual class. The Kappa measures the percentage agreement
corrected by the level of agreement that can be expected by
chance alone. Each land cover type is assessed based on two indicators: the user's accuracy (UA) and the producer's accuracy (PA). UA represents the number of correctly classified samples divided by the total number of samples classified as that land cover type, while PA represents the number of correctly classified samples divided by the total number of ground-truth samples of that type.

Fig. 8. (a1) and (a2), respectively, show the land cover and ISA in the Foshan study area; (b1) and (b2), respectively, show the land cover and ISA in the Wuhan study area.

TABLE V
LAND COVER TYPE CLASSIFICATION RESULTS (UA, PA, AND AA) IN THE FOSHAN STUDY AREA BASED ON FF-C
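A minimal sketch of these metrics, computed from a confusion matrix C in which C[i, j] counts ground-truth class i predicted as class j; the function name is illustrative.

```python
import numpy as np

def metrics(C: np.ndarray):
    """OA, AA, Kappa, UA, and PA from a confusion matrix (rows = ground truth)."""
    n = C.sum()
    oa = np.trace(C) / n                                 # overall accuracy
    pa = np.diag(C) / C.sum(axis=1)                      # producer's accuracy per class
    ua = np.diag(C) / C.sum(axis=0)                      # user's accuracy per class
    aa = pa.mean()                                       # average accuracy
    pe = (C.sum(axis=1) * C.sum(axis=0)).sum() / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa, ua, pa

C = np.array([[50, 2], [3, 45]])
print(metrics(C))
```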
C. Classification Results
In this section, we compare the classification results of the
proposed method and other state-of-art methods.
1) The Performance of the Feature Fusion Methods Classi-
fication: The feature fusion-based method is denoted as FF-C.
We extract the image patches in size of 27 ×27 pixels, a small
patch size to ensure a homogenous land cover type in each patch.
From Tables V and VI, it can be seen that the classification
results of FF-C yields high accuracy in each type. In Table V,
Bright ISA in the Foshan study area can be identified accurately
with the highest PA. From Table VI, we notice that the land cover
classification results from FF-C are satisfactory, which proves the strong capability of the feature fusion method. This is very helpful for the fine ISA distribution extraction.

TABLE VI
LAND COVER TYPE CLASSIFICATION RESULTS (UA, PA, AND AA) IN THE WUHAN STUDY AREA BASED ON FF-C (HS AND MS IMAGES)
2) Classification Results of FF-C and Comparison Methods:
In this section, we compare the classification accuracy of each
land cover type and OA, AA, Kappa obtained by different
methods for the two study areas. Quantitative results are shown
in Tables VII and VIII. Figs. 9 and 10 present classification
maps in two study areas for visual comparison among different
methods. Table VII indicates that the proposed FF-C method
obtains the best AA, OA, and Kappa in the Foshan study area,
higher than the best results among the comparison methods
(obtained from RF) by 5.78%, 2.76%, and 3.55%, respectively.
The proposed FF-C method presents the best classification ac-
curacy for all land cover types, except for vegetation, roof, and
river.
From Table VIII, we notice that FF-C also yields the best AA, OA, and Kappa, higher than GRU, which obtains the second-highest OA, by 2.05%, 0.97%, and 1.21%, respectively. Comparing
TABLE VII
QUANTITATIVE COMPARISONS OF DIFFERENT METHODS IN TERMS OF OA, AA, AND KAPPA IN THE FOSHAN STUDY AREA
The bold numbers indicate the best values for accuracy assessment.
TABLE VIII
QUANTITATIVE COMPARISONS OF DIFFERENT METHODS IN TERMS OF OA, AA, AND KAPPA IN THE WUHAN STUDY AREA
The bold numbers indicate the best values for accuracy assessment.
TABLE IX
OVERALL ACCURACY OF ISA DISTRIBUTION OF TWO STUDY AREAS FROM
DIFFERENT METHODS
The bold numbers indicate the best accuracy of ISA extraction.
Tables VII and VIII, it can be seen that for both study areas, the
performance of FF-C is generally superior to other classification
methods. Due to the higher HS image quality in the Wuhan
study area, land cover classification results in the Wuhan study
area are better than those in the Foshan study area. The above
results demonstrate that the integration of MS and HS data via
feature fusion improves the classification accuracy of land cover
types.
Figs. 9 and 10 present the land cover classification maps
obtained by different methods in the Foshan and Wuhan study
areas, respectively. A visual comparison reveals that pixelwise
classification methods result in salt-and-pepper noises in classi-
fied land use types. In comparison, the proposed FF-C method
yields smoother classification maps due to the combination of
deep features from MS and HS images that further enhance
the model’s identification ability. Table IX shows the overall
accuracy of extracted ISA from two study areas. The results
suggest that FF-C obtains the highest accuracy of ISA. For the Foshan study area, the OA from FF-C is 6.19% higher than that of RF, which obtains the second-highest OA. For the Wuhan study area, the OA from FF-C is 1.36% higher than that of GRU, which obtains the second-highest OA.
The final ISA extraction results obtained by the proposed FF-C method in the Foshan and Wuhan study areas are shown in Fig. 11. Table X shows the proportion of each land cover and ISA in the two study areas. From Fig. 11 and Table X, we notice that, compared to the Foshan study area, pervious surfaces (e.g., green space and lakes) in the Wuhan study area have more extensive coverage.
Even though the proportion of water in the Foshan study area is
larger, it is mostly used for aquaculture. The Foshan study area
is located in Guangdong Province, one of the fastest-growing
provinces in China, so its urbanization process is considerably
faster than that of the Wuhan study area.
V. D ISCUSSION
In this section, we analyze the effectiveness of the feature
fusion strategy by comparing the 1-D CNN and 2-D CNN
classification without performing HS and MS feature fusion in
Section V-A. Section V-B shows the visual comparison of clas-
sification results obtained from different methods. Section V-C
discusses the impact of different patch size on classification
results.
A. Effectiveness of HS and MS Data Feature Fusion
To verify the effectiveness of deep features fusion in
land cover classification, we analyze the classification results
Fig. 9. MS image and classification maps from different methods in the Foshan study area, with one demarcated area zoomed in twice for easy observation.
TABLE X
PROPORTION OF EACH LAND COVER AND ISA IN TWO STUDY AREAS.
Fig. 10. MS image and classification maps from different methods in the Wuhan study area, with one demarcated area zoomed in twice for easy observation.
obtained by spectral and spatial deep features separately, as well
as by the feature fusion method based on the same training, vali-
dation, and testing samples. For the deep feature fusion method,
we first use 2-D CNN to extract the deep features from the HS image and the MS image, respectively, and then fuse the features
to explore whether spectral-spatial deep feature fusion can lead
to a better land cover classification accuracy. We present the
classification results using 1-D CNN-based methods (HS-1-D
CNN and MS-1-D CNN) and 2-D CNN-based methods (HS-2-D
CNN and MS-2-D CNN) for HS and MS image classification,
respectively. 1-D CNN-based methods take pixelwise input,
while 2-D CNN-based methods take patchwise input.
Fig. 12 presents the land cover type classification accuracy
(AA, OA, and kappa) in the Foshan study area on HS and MS
data under the FF-C fusion strategy. The results suggest that
2-D CNN-based methods obtain higher accuracy compared to
1-D CNN-based methods. Furthermore, the FF-C obtains the
best classification results for all land cover types except bright
ISA and soil. Although the accuracy of bright ISA and soil from
FF-C fails to achieve the best results, it is very close to the optimal
value. In addition, FF-C obtains the highest AA, OA, and kappa
in the Foshan study area. The improvement curve (red lines) in
Fig. 12 shows the notable improvement of FF-C in all land cover
types, especially in the land cover type of road (an improvement
of 0.79).
From Table XI, we observe that the integration of the deep
features extracted from MS data leads to improved classification
accuracy in all land cover types from the Foshan study area. This
not only verifies the effectiveness of the feature fusion strategy
on enhancing the feature representation, but also indicates that
such an integration of MS and HS data can compensate for the
quality deficiency in HS data.
Fig. 11. ISA distribution (from FF-C) in two study areas.
Fig. 12. Land cover type classification accuracy (AA, OA, and kappa) in the
Foshan study area on HS and MS data under FF-C fusion strategy. The red curve
reveals the improvement comparing the method that fuses HS and MS data to
the method that uses HS data alone.
Fig. 13. Land cover type classification accuracy (AA, OA, and kappa) in the
Wuhan study area on HS and MS data under FF-C fusion strategy. The red curve
reveals the improvement comparing the method that fuses HS and MS data to
the method that uses HS data alone.
Fig. 13 presents the land cover type classification accuracy
(AA, OA, and kappa) in the Wuhan study area on HS and MS data
under the FF-C fusion strategy. Table XII shows the land cover
type classification results (UA and PA) in the Wuhan study area
based on 1-D/2-D CNN (HS images alone) and feature fusion
strategies (HS and MS images). The results reveal that the land
cover type of concrete road achieves the greatest improvement in
accuracy by 0.4561. The experimental results from our two study
areas classification demonstrate the effectiveness of integrating
MS data with HS data when performing land cover classification.
The deep features from MS data might enhance the spatial
information from HS data, thus leading to better classification
performance when MS and HS data are fused.
TABLE XI
LAND COVER TYPE CLASSIFICATION RESULTS (UA AND PA) IN THE FOSHAN STUDY AREA BASED ON 1-D/2-D CNN (HS IMAGES ALONE) AND FF-C (HS AND MS IMAGES)
TABLE XII
LAND COVER TYPE CLASSIFICATION RESULTS (UA AND PA) IN THE WUHAN STUDY AREA BASED ON 1-D/2-D CNN (HS IMAGES ALONE) AND FF-C (HS AND MS IMAGES)
Comparing results from these two study areas, we notice that
the improvement in the Wuhan study area is not as notable as the
one in the Foshan study area. This is because, with similar SNRs of the MS data, the Wuhan study area has a higher quality HS image (SNR = 35.36 dB) than the Foshan study area (SNR = 28.45 dB). Thus, the designed feature enhancement model has a less notable impact in the Wuhan study area. We notice that the MS SNRs of both study areas are around 150 dB, more than four times the SNR of the HS data. This means the MS data contains more spatial information of ground objects than the HS data. Given
that the 2-D CNN can extract the deep features from images
by considering the contextual information in both spatial and
spectral domains, fusing the spectral and spatial deep features
extracted from HS image and MS image is able to improve
classification performance.
B. Visual Comparison
This section presents the details in classification maps from
different methods in the two study areas. For the Foshan study
area (see Fig. 14), the highlighted black rectangle is dominated
by bare soil and vegetation, belonging to the pervious surface, while the comparison methods classify it as dense building, belonging to the impervious surface. Such misclassification leads to reduced ISA extraction accuracy and
overestimation of ISA. For the Wuhan study area (see Fig. 15),
the highlighted black rectangle is dominated by bare soil, which
is wrongly classified into dense buildings and concrete roads
in the comparison method. Overall, it can be seen that the
classification from the proposed FF-C method can improve the
accuracy of features recognition, thus obtaining more accurate
ISA distribution information.
Fig. 14. Selected classification results of the Foshan study area.
Fig. 15. Selected classification results of the Wuhan study area.
TABLE XIII
LAND COVER TYPE CLASSIFICATION RESULTS IN THE FOSHAN STUDY AREA BASED ON FF-C WITH DIFFERENT PATCH SIZES
C. Patch Size Analysis
The size of input patches is an important parameter that
determines, to a certain degree, the classification accuracy of
the model. To explore the influence of patch sizes on the classi-
fication performance, we conduct additional experiments in the
Foshan study area. Table XIII shows the land cover classification
results in the Foshan study area based on FF-C with different
patch sizes. We observe improved classification accuracy with
the increase in the patch sizes from 13 to 27, especially for soil
(an improvement by 16.23%).
Fig. 16 presents the OA of classification results corresponding
to different patch sizes. It can be seen that when the patch size
reaches 27 pixels, the accuracy is the highest. Therefore, the patch
size is set to 27 in our experiments.
Fig. 16. Overall accuracy (%) with different patch sizes in the Foshan study
area.
VI. CONCLUSION
In this study, we propose a 2-D CNN-based method to improve
the accuracy of ISA extraction at 10 m spatial resolution by
combining Sentinel-2 MS data and Zhuhai-1 HS data. We test
our proposed approach in two study areas that cover Foshan
and Wuhan city, China. We first utilize 2-D CNN to extract
the spatial and spectral deep features of MS data and HS data,
then fuse the extracted deep features via a fully connected layer
for the final classification. To investigate the influence of the
fusion method on the final results, we compare the feature
fusion strategies with other comparison methods. The results
prove the superiority of feature fusion methods compared to
nonfusion methods. In the future, we plan to explore the im-
pact of model depths on image feature extraction and develop
more advanced fusion modules to take full advantage of the
detailed spectral information from HS images and detailed spa-
tial information from MS images. In addition, we plan to test
the proposed method in other regions to further investigate its
generalizability.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their valuable suggestions and comments that helped us improve this article significantly.
REFERENCES
[1] Q. Weng, “Remote sensing of impervious surfaces in the urban areas:
Requirements, methods, and trends,” Remote Sens. Environ., vol. 117,
pp. 34–49, 2012.
[2] C. Li, Z. Shao, L. Zhang, X. Huang, and M. Zhang, “A comparative analysis of index-based methods for impervious surface mapping using multiseasonal Sentinel-2 satellite data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 3682–3694, Mar. 2021.
[3] A. J. Arnfield, “Two decades of urban climate research: A review of
turbulence, exchanges of energy and water, and the urban heat island,”
Int. J. Climatol., A J. Roy. Meteorological Soc., vol. 23, no. 1, pp. 1–26,
2003.
[4] P. Coseo and L. Larsen, “How factors of land use/land cover, building
configuration, and adjacent heat sources and sinks explain urban heat
islands in Chicago,” Landscape Urban Plan., vol. 125, pp. 117–129,
2014.
[5] K. Conway, J. Barrie, P. Hill, W. Austin, and K. Picard, “Mapping sensitive benthic habitats in the Strait of Georgia, coastal British Columbia: Deep-water sponge and coral reefs,” Geol. Surv. Can., vol. 2, pp. 1–6, 2007.
[6] H. Du et al., “Influences of land cover types, meteorological conditions, anthropogenic heat and urban area on surface urban heat island in the Yangtze River Delta urban agglomeration,” Sci. Total Environ., vol. 571, pp. 461–470, 2016.
[7] Z. Shao, H. Fu, D. Li, O. Altan, and T. Cheng, “Remote sensing monitoring of multi-scale watersheds impermeability for urban hydrological evaluation,” Remote Sens. Environ., vol. 232, 2019, Art. no. 111338.
[8] X.-P. Song, J. O. Sexton, C. Huang, S. Channan, and J. R. Townshend, “Characterizing the magnitude, timing and duration of urban growth from time series of Landsat-based estimates of impervious cover,” Remote Sens. Environ., vol. 175, pp. 1–13, 2016.
[9] L. Zhang, Q. Weng, and Z. Shao, “An evaluation of monthly impervious surface dynamics by fusing Landsat and MODIS time series in the Pearl River Delta, China from 2000 to 2015,” Remote Sens. Environ., vol. 201, pp. 99–114, 2017.
[10] D. Lu and Q. Weng, “Spectral mixture analysis of the urban landscape in Indianapolis with Landsat ETM+ imagery,” Photogrammetric Eng. Remote Sens., vol. 70, no. 9, pp. 1053–1062, 2004.
[11] X. Huang, D. Wen, J. Li, and R. Qin, “Multi-level monitoring of subtle
urban changes for the megacities of China using high-resolution multi-
view satellite imagery,” Remote Sens. Environ., vol. 196, pp. 56–75, 2017.
[12] S. Roessner, K. Segl, U. Heiden, and H. Kaufmann, “Automated differentiation of urban surfaces based on airborne hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1525–1532, Jul. 2001.
[13] A. Okujeni, S. van der Linden, and P. Hostert, “Extending the vegetation-impervious-soil model using simulated EnMAP data and machine learning,” Remote Sens. Environ., vol. 158, pp. 69–80, 2015.
[14] B. Feng and J. Wang, “Constrained nonnegative tensor factorization for
spectral unmixing of hyperspectral images: A case study of urban imper-
vious surface extraction,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 4,
pp. 583–587, Apr. 2019.
[15] F. Chen, K. Wang, T. Van de Voorde, and T. F. Tang, “Mapping urban land
cover from high spatial resolution hyperspectral data: An approach based
on simultaneously unmixing similar pixels with jointly sparse spectral
mixture analysis,” Remote Sens. Environ., vol. 196, pp. 324–342, 2017.
[16] A. H. Strahler, “The use of prior probabilities in maximum likelihood
classification of remotely sensed data,” Remote Sens. Environ., vol. 10,
no. 2, pp. 135–163, 1980.
[17] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sens-
ing images with support vector machines,” IEEE Trans. Geosci. Remote
Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[18] V. Vapnik, The Nature of Statistical Learning Theory. Cham, Switzerland:
Springer, 2013.
[19] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[20] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.
[21] S. Schulter, P. Wohlhart, C. Leistner, A. Saffari, P. M. Roth, and H. Bischof, “Alternating decision forests,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2013, pp. 508–515.
[22] E. Tuv, A. Borisov, G. Runger, and K. Torkkola, “Feature selection with ensembles, artificial variables, and redundancy elimination,” J. Mach. Learn. Res., vol. 10, pp. 1341–1366, 2009.
[23] P. Peng, Q.-L. Ma, and L.-M. Hong, “The research of the parallel SMO
algorithm for solving SVM,” in Proc. Int. Conf. Mach. Learn. Cybern.,
2009, vol. 3, pp. 1271–1274.
[24] P.-H. Chen, R.-E. Fan, and C.-J. Lin, “A study on SMO-type decomposition
methods for support vector machines,” IEEE Trans. Neural Netw., vol. 17,
no. 4, pp. 893–908, Jul. 2006.
[25] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Marí, J. Vila-Francés, and J.
Calpe-Maravilla, “Composite kernels for hyperspectral image classifica-
tion,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[26] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classifi-
cation using dictionary-based sparse representation,” IEEE Trans. Geosci.
Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.
[27] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image clas-
sification via kernel sparse representation,” IEEE Trans. Geosci. Remote
Sens., vol. 51, no. 1, pp. 217–231, Jan. 2013.
[28] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data:
A technical tutorial on the state of the art,” IEEE Geosci. Remote Sens.
Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016.
[29] Y. Chen, X. Zhao, and X. Jia, “Spectral-spatial classification of hyperspec-
tral data based on deep belief network,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 8, no. 6, pp. 2381–2392, Jun. 2015.
[30] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief
networks for scalable unsupervised learning of hierarchical representa-
tions,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 609–616.
[31] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extrac-
tion and classification of hyperspectral images based on convolutional
neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10,
pp. 6232–6251, Oct. 2016.
[32] P. Ghamisi, Y. Chen, and X. X. Zhu, “A self-improving convolution neural
network for the classification of hyperspectral data,” IEEE Geosci. Remote
Sens. Lett., vol. 13, no. 10, pp. 1537–1541, Oct. 2016.
[33] Y. Chen, Y. Wang, Y. Gu, X. He, P. Ghamisi, and X. Jia, “Deep learning
ensemble for hyperspectral image classification,” IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens., vol. 12, no. 6, pp. 1882–1897, Jun. 2019.
[34] N. Yokoya, C. Grohnfeldt, and J. Chanussot, “Hyperspectral and mul-
tispectral data fusion: A comparative review of the recent litera-
ture,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 2, pp. 29–56,
Jun. 2017.
[35] T. Blaschke, “Object based image analysis for remote sensing,” ISPRS J.
Photogramm. Remote Sens., vol. 65, no. 1, pp. 2–16, 2010.
[36] V. Nair and G. E. Hinton, “Rectified linear units improve restricted
boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp.
807–814.
[37] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers:
Surpassing human-level performance on imagenet classification,” in Proc.
IEEE Int. Conf. Comput. Vis., 2015, pp. 1026–1034.
[38] Z. Zuo et al., “Learning contextual dependence with convolutional hier-
archical recurrent neural networks,” IEEE Trans. Image Process., vol. 25,
no. 7, pp. 2983–2996, Jul. 2016.
[39] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for
nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
[40] S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Deep learning for hyperspectral image classification: An overview,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 6690–6709, Sep. 2019.
[41] J. M. Haut, M. E. Paoletti, J. Plaza, J. Li, and A. Plaza, “Active learning with convolutional neural networks for hyperspectral image classification using a new Bayesian approach,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 11, pp. 6440–6461, Nov. 2018.
[42] X. Yang, Y. Ye, X. Li, R. Y. Lau, X. Zhang, and X. Huang, “Hyperspectral
image classification with deep learning models,” IEEE Trans. Geosci.
Remote Sens., vol. 56, no. 9, pp. 5408–5423, Sep. 2018.
[43] L. Jiao, M. Liang, H. Chen, S. Yang, H. Liu, and X. Cao, “Deep fully con-
volutional network-based spatial distribution prediction for hyperspectral
image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 10,
pp. 5585–5599, Oct. 2017.
[44] J. M. Nascimento and J. M. Dias, “Vertex component analysis: A fast algorithm to unmix hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 4, pp. 898–910, Apr. 2005.
[45] W. Li, R. Dong, H. Fu, J. Wang, L. Yu, and P. Gong, “Integrating Google Earth imagery with Landsat data to improve 30-m resolution land cover mapping,” Remote Sens. Environ., vol. 237, 2020, Art. no. 111563.
[46] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
2014, arXiv:1412.6980.
[47] M. Paoletti, J. Haut, J. Plaza, and A. Plaza, “Deep learning classifiers for hyperspectral imaging: A review,” ISPRS J. Photogramm. Remote Sens., vol. 158, pp. 279–317, 2019.
Xiaoxiao Feng received the bachelor's degree in surveying and mapping from Southeast University, Nanjing, China, in 2014, the master's degree in earth exploration and information technology from the China University of Geosciences, Wuhan, China, in 2017, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2021.
She is currently with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University. Her research interests include hyperspectral image processing and urban impervious surface extraction.
Zhenfeng Shao received the bachelor's degree in surveying engineering and the master's degree in cartography and geographical information systems from the Wuhan Technical University of Surveying and Mapping, Wuhan, China, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China.
He is currently a Professor with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University. His research interest mainly focuses on urban remote sensing applications. The specific research directions include high-resolution remote sensing image processing and analysis, and key technologies and applications from digital cities to smart cities and sponge cities.
Xiao Huang received the bachelor’s degree in remote sensing and information engineering from Wuhan University, Wuhan, China, in 2015, the master’s degree in city planning and architecture from the Georgia Institute of Technology, Atlanta, GA, USA, in 2016, and the Ph.D. degree in geography from the University of South Carolina, Columbia, SC, USA, in 2020.
He is currently an Assistant Professor with the Department of Geosciences, University of Arkansas, Fayetteville, AR, USA. His research interests include remote sensing and GIS for natural hazards, data-driven visualization, advanced data-fusion flood models, big social data mining, regional geospatial analysis, and GeoAI.
Luxiao He received the bachelor’s degree in geo-information science and technology and the master’s degree in earth exploration and information technology from the China University of Geosciences, Wuhan, China, in 2014 and 2017, respectively, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2021.
He is currently with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, China. His research interests include high-spatial-resolution image processing and applications.
Xianwei Lv received the bachelor’s degree in geographic information science from the East China University of Technology, Nanchang, China, in 2016, and the master’s degree in surveying and mapping from the China University of Geosciences, Beijing, China, in 2019. He is currently working toward the Ph.D. degree in photogrammetry and remote sensing with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.
His research interests include deep learning for very high-resolution image processing and applications.
Qingwei Zhuang received the bachelor’s degree in surveying and mapping from Henan Polytechnic University, Jiaozuo, China, in 2017, and the master’s degree in surveying and mapping from the University of Chinese Academy of Sciences, Beijing, China, in 2020. He is currently working toward the Ph.D. degree in photogrammetry and remote sensing with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.
His research interests mainly focus on remote sensing applications, including remote sensing image processing and analysis, and key technologies and applications in urban ecosystems.