Conference Paper

The Influence of Sampling Methods on Pixel-Wise Hyperspectral Image Classification with 3D Convolutional Neural Networks


Abstract

Supervised image classification is one of the essential techniques for generating semantic maps from remotely sensed images. The lack of labeled ground truth datasets, due to the inherent time effort and cost involved in collecting training samples, has led to the practice of training and validating new classifiers within a single image. In line with that, the dominant approach for the division of the available ground truth into disjoint training and test sets is random sampling. This paper discusses the problems that arise when this strategy is adopted in conjunction with spectral-spatial and pixel-wise classifiers such as 3D Convolutional Neural Networks (3D CNN). It is shown that a random sampling scheme leads to a violation of the independence assumption and to the illusion that global knowledge is extracted from the training set. To tackle this issue, two improved sampling strategies based on the Density-Based Clustering Algorithm (DBSCAN) are proposed. They minimize the violation of the train and test samples independence assumption and thus ensure an honest estimation of the generalization capabilities of the classifier.
Julius Lange1, Gabriele Cavallaro2, Markus Götz2,3, Ernir Erlingsson3, Morris Riedel2,3
1Humboldt University of Berlin, Germany
2Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany
3School of Engineering and Natural Sciences, University of Iceland, Iceland
Index Terms: Hyperspectral image classification, sampling strategies, clustering, DBSCAN, deep learning, Convolutional Neural Networks (CNNs)
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Grant Agreement No. 754304 DEEP-EST. The results of this research were achieved through the Human Brain Project PCP Pilot Systems at the Juelich Supercomputing Centre, which received co-funding from the European Union’s Horizon 2020 research and innovation programme under the Grant Agreement No. 604102.

During the past few decades, the processing of Earth observation data through remote sensing techniques has benefited from advancements in instruments on board spaceborne and airborne platforms. Among all the products that can be derived from remote sensing data, classification maps are perhaps the most widely used. Classification algorithms distinguish between different types of land-cover classes in order to interpret processes such as urban growth or the impact of natural disasters, and to support tasks such as object detection. When training samples are available, the model parameters of the classifier are learned in a supervised way. Once the training is completed, the main challenge is to obtain accurate and reliable semantic maps from previously unseen data. This capability is usually influenced more by the amount and quality of the training samples than by the model complexity, since classifiers are based on the assumption that training and test samples are generated from the same feature space and distribution [1]. Remote sensing data usually present heterogeneous feature spaces and distributions due to differences in acquisition or changes in the nature of the observed object. As a consequence, most statistical models are likely to fail when predicting new samples. A straightforward solution to this problem is to rebuild the predictive model from scratch using new training data. However, these samples are usually either collected manually with ground surveys or automatically generated through image photo interpretation [2]. As a consequence, there is a lack of appropriate benchmark datasets within the community, and the practice of benchmarking new classification algorithms over a single image remains dominant.
Similarly to Liang et al. [3] and Hänsch et al. [4], this paper aims to show that extracting disjoint training and test sets through a random sampling approach cannot guarantee unbiased samples. However, this study considers two novel aspects. On the one hand, 3D CNNs are investigated as the spectral-spatial classifier. Because convolutional neurons process a training sample within a receptive field, the overlap between the training and test samples is artificially enhanced. On the other hand, to alleviate this overlapping effect, two alternative sampling strategies based on the DBSCAN [5] algorithm are proposed. The experiments, conducted on the full-site hyperspectral Indian Pines dataset, confirm previous findings regarding random sampling techniques [3, 4] and show that the proposed sampling schemes lead to less biased error estimates. The inflation of the accuracy caused by the random sampling approach is attenuated, i.e., the accuracy decreases for each class, while the performance evaluation can be considered a fair, unbiased and reasonable estimate of the generalization capabilities of the classifier.
In order to reduce the need for and effort in recollecting train-
ing data, recent works have considered solutions based on
transfer learning, domain adaptation and active learning ap-
proaches [6]. These solutions offer the capability of exploit-
ing the knowledge acquired by the available ground reference
samples for classifying new images acquired over heteroge-
neous geographical locations at diverse times with different
sensors. However, Ball et al. [7] provide a summary of the
common open-source hyperspectral datasets that are used for
validating new deep learning classifiers. These comprise
four datasets, i.e., Indian Pines (small test site), Pavia University,
Pavia City Center, and Salinas, and they are saturated in
terms of classification accuracy.
The standard procedure for estimating the generalization
error is to divide the ground truth samples into two disjoint
sets, one for training and one for testing. The error obtained
on the training data should not be considered since it is not
difficult to decrease it to zero given a sufficiently complex
method that can easily memorize the training data. Therefore,
the sampling strategy that is adopted for producing these
disjoint sets has a large influence on the validation phase.
The random sampling strategy has always been considered
the natural choice, especially for classifiers that ignore the
spatial information. Since spectral classifiers are less effec-
tive when dealing with very high spatial resolution images,
modern classification pipelines include both spectral and spatial
information. Recently, deep learning has brought
major advances in many applications, including the
processing of remote sensing images [7]. Remarkable results
have been achieved with CNNs due to their hierarchical structure,
which is able to extract deeper and more abstract features.
Novel supervised CNNs have also been proposed for hyperspectral
image classification [8, 9]. These include three-dimensional
models that utilize receptive fields in both the spectral and
spatial domains. The majority of these studies
have carried out their experiments on standard hyperspectral
datasets [7] by adopting random sampling strategies.
Researchers usually focus on improving the classifica-
tion performance, while the above discussed problems are
mostly neglected. The increase of spatially correlated data
by spectral-spatial features and its influence on the quality
of the estimate of the generalization error was already dis-
cussed by Zhou et al. [10]. A more recent work proposed a
sampling scheme that minimizes the spatial overlap between
train and test data [3]. The method aims to capture the full
spectral variation of the image by globally sampling compact
regions. Finally, Hänsch et al. [4] evaluated different sampling
approaches and proposed a new strategy that simulates
a realistic gap of data variation between the training and application
phase. The method proposed in this work is a more flexible
generalization of these two.
3.1. Sampling Approaches
The idea of the proposed sampling approach is to minimize
the number of biased samples. Bias occurs when directly
neighboring or nearby pixels are present in both training and
test sets. Due to their spatial closeness, information from one
set may leak into the other, violating the independence
assumption. When a central pixel is estimated from a
surrounding window, for example, the spatial receptive
fields in training and test data may overlap and be nearly
identical. Correctly classifying a pixel in the
test set based on a previously seen, nearly identical instance in the
training data is then very likely. In effect, the classification problem
degrades from actual pattern recognition to simple memorization.
The proposed clustering-based method attempts to
overcome this problem by, first, extracting larger contiguous
regions using the class labels, e.g. buildings, fields, etc., and
then distributing these disjointly between the training and test
set. A bias, if present at all, would then only be relevant at the
outer edges of such a region, but not for the inner pixels.
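To make the overlap argument concrete, the following minimal sketch computes how much of a w x w window is shared with the window of a nearby pixel. The function name and the example shifts are illustrative assumptions, not part of the original work.

```python
def window_overlap_fraction(w: int, dy: int, dx: int) -> float:
    """Fraction of a w x w window shared with the same window shifted by (dy, dx)."""
    shared_rows = max(0, w - abs(dy))
    shared_cols = max(0, w - abs(dx))
    return (shared_rows * shared_cols) / (w * w)


if __name__ == "__main__":
    # Hypothetical shifts between a training pixel and a test pixel, for w = 9.
    for shift in [(0, 1), (1, 1), (0, 4), (0, 9)]:
        print(shift, round(window_overlap_fraction(9, *shift), 3))
```

For w = 9, two directly adjacent pixels share 72 of their 81 window pixels (about 89%), and the windows become disjoint only once the pixels are at least w pixels apart along one axis.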
The extraction of the contiguous regions is achieved with
the DBSCAN [5] clustering algorithm. It detects subgroups
within a set by recursively evaluating a neighbor-point
density threshold (minPoints) within a search radius (ε)
around a sample. Independent regions can thereby be
determined by clustering the coordinates
of pixels of a particular class. Each resulting cluster directly
corresponds to a region. The distribution of the identified
regions between the training and test set is the next logical
problem to address. In principle, these regions could now be
randomly sampled and assigned to either of the two sets.
However, the number of extracted regions is significantly lower
(on the order of a few dozen) than the number of pixels.
For this reason, the likelihood of selecting an imbalanced
training set rises strongly, e.g., one that does not contain
patterns that are present in the test data, such as cloud coverage.
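The region extraction step can be sketched with scikit-learn's DBSCAN implementation applied to the pixel coordinates of one class. The helper name `extract_regions`, the toy mask, and the parameter values (`eps=1.5`, `min_points=4`) are assumptions chosen for illustration; the paper does not state the values it used.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def extract_regions(class_mask, eps=1.5, min_points=4):
    """Cluster the (row, col) coordinates of one class into contiguous regions.

    eps and min_points are illustrative values, not those of the paper.
    """
    coords = np.argwhere(class_mask)
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(coords)
    regions = {}
    for label in np.unique(labels):
        if label == -1:  # DBSCAN marks noise points with -1
            continue
        regions[int(label)] = coords[labels == label]
    return regions


# Toy ground truth for a single class: two separate rectangular fields.
mask = np.zeros((20, 20), dtype=bool)
mask[0:5, 0:5] = True      # 25 pixels
mask[10:14, 10:16] = True  # 24 pixels
regions = extract_regions(mask)
print({label: len(pixels) for label, pixels in regions.items()})
```

With `eps=1.5` the search radius covers the eight direct neighbors of a pixel, so each spatially connected field ends up as one cluster and each cluster can be assigned as a whole to either the training or the test set.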
Instead, an approach should be selected that maximizes the variability in the training set, so that a large number of potential patterns is covered. This requires a metric that evaluates said variety. The first two, proposed as part of this work, are the region area size and the statistical variance (σ²). Sorting the regions by one of these metrics in ascending or descending order, respectively, and assigning them to the training set until the selected split percentage is reached should result in a less biased but highly variable pattern distribution. An example is depicted in Figure 1. Admittedly, applying the metric to all clustered regions before splitting them into training and test data introduces a bias of its own, because information from both supposedly independent sets is used to form them. Since both sets come from the same feature space and distribution [1], the training set, as proposed, is treated favorably. Therefore, an overestimated out-of-sample accuracy on the test set should be the result, overstating the true generalization capabilities. In the worst case of overestimation, the prediction accuracy would be higher than for a randomly sampled dataset. For practical applications, though, this bias is negligible, as the experimental evaluation in Section 3.3 shows.

Fig. 1: Visualization of different sampling strategies exemplified using the class “forest” of the Indian Pines dataset: (a) random sampling, (b) cluster sampling with area, (c) cluster sampling with variance. Black pixels are background, white training (10% of the available labeled samples) and purple test samples.
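The sorting-based assignment described above can be sketched as follows. The function name `assign_regions`, the toy numbers, and the stopping rule (stop before a whole region would exceed the training budget) are assumptions; the paper does not specify how the boundary region is handled or how the variance of a region is computed.

```python
def assign_regions(sizes, scores, train_fraction, descending=False):
    """Assign whole regions to the training set in sorted order of a metric.

    sizes: number of pixels per region.
    scores: metric value per region (e.g. area or variance).
    Assumption: stop as soon as the next region would exceed the budget.
    """
    order = sorted(range(len(sizes)), key=lambda i: scores[i], reverse=descending)
    budget = train_fraction * sum(sizes)
    train, used = [], 0
    for i in order:
        if used + sizes[i] > budget:
            break
        train.append(i)
        used += sizes[i]
    test = [i for i in range(len(sizes)) if i not in train]
    return train, test


sizes = [50, 10, 30, 10]            # hypothetical region sizes in pixels
variances = [0.1, 0.9, 0.5, 0.2]    # hypothetical per-region variances

# Area metric, ascending order: the smallest regions go to training first.
print(assign_regions(sizes, sizes, train_fraction=0.3))
# Variance metric, descending order: the most variable regions go first.
print(assign_regions(sizes, variances, train_fraction=0.3, descending=True))
```

Because whole regions are assigned, the realized training share can fall below the requested split percentage; a variant that splits the boundary region would hit the percentage exactly at the cost of reintroducing some neighboring train and test pixels.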
3.2. 3D CNN and Dataset
The proposed 3D CNN is designed to perform pixel-wise classification of hyperspectral images. As input it accepts spatial-spectral tensors of size (w, w, c), where w is the window size and c the number of spectral bands, exploits the spectral information and the correlation between neighboring pixels, and predicts the class of the center pixel. The network is summarized in Table 1 and includes convolutional, max-pooling, fully connected and softmax layers. The triple alternation of convolutional and max-pooling layers (the pooling is applied only to the c dimension) allows the network to reduce the number of channels and learn spectral features at different levels of abstraction. The output tensor of these layers is then flattened into a one-dimensional feature vector and passed to two fully connected layers for the class probability prediction. A softmax layer with a vector length corresponding to the total number of classes votes for the likeliest option. The experiments have been performed on the JURON pilot system at Jülich Supercomputing Centre, and the network was developed with the Keras library (2.0.8) and the TensorFlow (1.3.0) back-end.

Table 1: Complete set of specifications for the 3D CNN (with 583,962 trainable parameters).
Feature                   Representation / Value
Conv. Layer Filters       48, 32, 32
Conv. Layer Filter Size   (3,3,5), (3,3,5), (3,3,5)
Pooling Size              (1,1,3), (1,1,3), (1,1,2)
Dense Layer Neurons       128, 128
Activation Functions      rectified linear unit (ReLU)
Loss Function             mean-squared error (MSE)
Optimization              stochastic gradient descent (SGD)
Training Epochs           600
Batch Size                50
Learning Rate             1.0
Learning Rate Decay       5×10^-6
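As a plausibility check on the layer specification, the number of trainable parameters can be counted by hand. The sketch below assumes details that the text does not state: 'same' padding in the convolutions, a single input channel, bias terms in every layer, a window size of w = 3, c = 220 bands and a 58-way output layer. All function names are illustrative.

```python
def conv3d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter of a 3D convolution."""
    kh, kw, kd = kernel
    return (kh * kw * kd * in_channels + 1) * filters


def dense_params(n_in, n_out):
    """Weights plus one bias per neuron of a fully connected layer."""
    return (n_in + 1) * n_out


def count_parameters(w=3, c=220, n_classes=58):
    filters = [48, 32, 32]
    kernels = [(3, 3, 5)] * 3
    spectral_pools = [3, 3, 2]  # pooling acts on the spectral axis only
    total, in_channels, bands = 0, 1, c
    for f, k, p in zip(filters, kernels, spectral_pools):
        total += conv3d_params(k, in_channels, f)
        in_channels = f
        # Assumed 'same' padding: the convolution keeps the size,
        # and the pooling divides the band count (rounding down).
        bands //= p
    flat = w * w * bands * in_channels
    for n_out in (128, 128, n_classes):
        total += dense_params(flat, n_out)
        flat = n_out
    return total


print(count_parameters())
```

Under these assumptions the count comes to 583,962, which agrees with the figure given in Table 1. Other settings, such as a larger window, give different totals, so this is one consistent reading of the specification rather than a confirmed configuration.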
The dataset is the Indian Pines hyperspectral image ac-
quired by the AVIRIS sensor in 1992 over an agricultural site
composed of fields with regular geometry and with a variety
of crops. It consists of 614×2166 pixels and 220 spectral
bands, with a spatial resolution of 20 m. The ground truth
encompasses 58 different land-cover classes with a highly
imbalanced distribution. In the few works that have
considered this full-size dataset [11], it has been common practice
to exclude the under-represented classes (e.g., those with fewer than
100 samples) and to discard noisy spectral bands. However, this
work considers all the channels and classes in order to test the
robustness of the classifier.
Table 2: Classification results (overall accuracy, OA, and kappa) of the sampling strategies with different % of training samples.
Strategy   Metric   10%     30%     60%     90%
Random     OA       0.846   0.932   0.971   0.974
Random     kappa    0.834   0.926   0.966   0.972
Area       OA       0.289   0.323   0.381   0.615
Area       kappa    0.231   0.245   0.318   0.581
Variance   OA       0.251   0.334   0.358   0.389
Variance   kappa    0.207   0.267   0.285   0.322
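Both metrics in Table 2 can be derived from a confusion matrix. The sketch below uses the standard definitions of overall accuracy and Cohen's kappa; the function name and the 2×2 example matrix are made up for illustration and are not taken from the experiments.

```python
import numpy as np


def overall_accuracy_and_kappa(confusion):
    """Return (OA, kappa) for a square confusion matrix (rows: truth, cols: prediction)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n
    # Agreement expected by chance from the row and column marginals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, kappa


# Hypothetical two-class confusion matrix.
oa, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
print(oa, kappa)
```

In this toy example 85 of 100 samples are classified correctly and the chance agreement is 0.5, so kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7. Kappa is always at most as large as OA, which is consistent with every pair of rows in Table 2.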
3.3. Experimental Results
Most of the proposed CNN classifiers that considered the Indian Pines dataset as a benchmark used the small test site (16 classes in an area of 145×145 pixels) with random sampling strategies for dividing training and test samples. This has led to a vast number of classifiers that report near-perfect classification accuracy. For the full test site of Indian Pines, the state-of-the-art classification accuracy (i.e., κ = 0.84 with 30% of the data used for training, and with 20 classes and 20 channels excluded) was achieved by Romero et al. [11]. They proposed a greedy layer-wise unsupervised pre-training of deep CNNs coupled with an algorithm for unsupervised learning of sparse features (i.e., Enforcing Lifetime and Population Sparsity, EPLS). The classification results achieved by the proposed 3D CNN are listed in Table 2 and show that with only 10% of the available samples for training it was already possible to obtain κ = 0.83. These results confirm that random sampling approaches consistently achieve the best results. However, the plot in Figure 2 gives a clear explanation for these achievements. With the random strategy, the share of independent samples (i.e., not seen during the learning phase) is already less than 1% for a training set of 10%. In contrast, the proposed sampling strategies maintain an acceptable level of independence even for training sets with a larger share of samples. On the one hand, this leads to worse classification results, as shown in Table 2. On the other hand, these numbers are a more trustworthy representation of how the resulting model could perform in real-world applications, e.g., as a usable transferable classifier. The classifier is unable to learn all the possible data variations, such as the same image acquired in different seasons. The gap of data variation between the training and application phase remains in place.

Fig. 2: Percentage of unbiased samples for the different sampling strategies with window tensor size w equal to nine (x-axis: percentage of training samples; y-axis: percentage of never seen test samples).
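The quantity plotted in Figure 2 can be approximated on a toy grid. The sketch below makes an assumption about its definition: a test pixel counts as "seen" if it lies inside the w x w window of at least one training pixel. The fully labeled 200×200 grid and the contiguous block split are stand-ins for illustration, not the Indian Pines data or the DBSCAN-based strategies.

```python
import numpy as np


def never_seen_fraction(train_mask, test_mask, w):
    """Share of test pixels that lie outside the w x w window of every training pixel."""
    r = w // 2
    h, wd = train_mask.shape
    padded = np.pad(train_mask, r, mode="constant", constant_values=False)
    covered = np.zeros_like(train_mask, dtype=bool)
    # OR together all shifted copies: covered[y, x] is True if any
    # training pixel lies within r rows and r columns of (y, x).
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            covered |= padded[dy:dy + h, dx:dx + wd]
    unseen = test_mask & ~covered
    return unseen.sum() / test_mask.sum()


h = wd = 200
rng = np.random.default_rng(0)

# Random sampling: 10% of all pixels, scattered over the grid.
flat = np.zeros(h * wd, dtype=bool)
flat[rng.choice(h * wd, size=h * wd // 10, replace=False)] = True
random_train = flat.reshape(h, wd)
random_frac = never_seen_fraction(random_train, ~random_train, w=9)

# Contiguous split: the leftmost 10% of the columns form the training set.
block_train = np.zeros((h, wd), dtype=bool)
block_train[:, :20] = True
block_frac = never_seen_fraction(block_train, ~block_train, w=9)

print(random_frac, block_frac)
```

With the contiguous split only the four test columns next to the training block are touched by training windows, so 176 of the 180 test columns remain unseen. With scattered training pixels almost every test pixel falls inside some training window, which mirrors the effect the plot attributes to random sampling.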
The influence of different sampling strategies on the performance of pixel-wise image classification has been evaluated. Confirming previous research, the widely used random sampling approach violates the independence assumption by introducing a systematic bias. This is particularly true for current state-of-the-art CNNs because of the spatial overlaps in their receptive fields. The proposed sampling approaches based on the DBSCAN clustering algorithm minimize said bias and result in a classification accuracy on unseen test data that is closer to the actual out-of-sample performance.
In line with this observation, a more widespread adoption of non-random sampling approaches for remote sensing classification problems stands to reason. The relevance of the presented findings is particularly apparent for transfer learning and concept drift problems. In future work, other datasets will be investigated using the proposed method; a similar degradation of the classification accuracy is to be expected. It will also be of interest to further investigate how the proposed sorting metrics for the regions, and additional ones, influence the classifier performance.
[1] Q. Yang and X. Wu, “10 Challenging Problems in Data
Mining,” International Journal of Information Technol-
ogy and Decision Making, vol. 05, no. 04, pp. 597–604,
[2] B. Demir, C. Persello, and L. Bruzzone, “Batch-Mode
Active-Learning Methods for the Interactive Classifica-
tion of Remote Sensing Images,” IEEE Trans. Geosci.
Remote Sens., vol. 49, no. 3, pp. 1014–1031, 2011.
[3] J. Liang, J. Zhou, Y. Qian, L. Wen, X. Bai, and Y. Gao,
“On the Sampling Strategy for Evaluation of Spectral-
Spatial Methods in Hyperspectral Image Classification,”
IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp.
862–880, 2017.
[4] R. Hänsch, A. Ley, and O. Hellwich, “Correct and Still
Wrong: The Relationship Between Sampling Strategies
and the Estimation of the Generalization Error,” in
Proceedings of the IEEE IGARSS, 2017.
[5] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A
Density-based Algorithm for Discovering Clusters in
Large Spatial Databases with Noise,” in Proceedings
of the SIGKDD, 1996, pp. 226–231.
[6] S. J. Pan and Q. Yang, “A survey on transfer learn-
ing,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10,
pp. 1345–1359, 2010.
[7] J. E. Ball, D. T. Anderson, and C. S. Chan, “A
Comprehensive Survey of Deep Learning in Remote Sensing:
Theories, Tools and Challenges for the Community,”
in Proceedings of the SPIE Journal of Applied
Remote Sensing, 2017.
[8] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi,
“Deep Feature Extraction and Classification of Hyperspectral
Images Based on Convolutional Neural Networks,”
IEEE Trans. Geosci. Remote Sens., vol. 54, no.
10, pp. 6232–6251, 2016.
[9] M. He, B. Li, and H. Chen, “Multi-Scale 3D Deep
Convolutional Neural Network for Hyperspectral Image
Classification,” in Proceedings of the IEEE ICIP, 2017.
[10] J. Zhou, J. Liang, Y. Qian, Y. Gao, and L. Tong, “On
the Sampling Strategies for Evaluation of Joint Spectral-
spatial Information based Classifiers,” in Proceedings of
the 7th WHISPERS, June 2015, pp. 1–4.
[11] A. Romero, C. Gatta, and G. Camps-Valls, “Unsupervised
Deep Feature Extraction for Remote Sensing Image
Classification,” IEEE Trans. Geosci. Remote Sens.,
vol. 54, no. 3, pp. 1–14, 2015.
... More recently, we observe a discussion in the community that mentions the problem but mainly comes with heuristic sample procedures that still select pixels in a random way according to criteria and then applies suggested procedures on the standard benchmarks in literature, see [5,7,8]. It appears that no lesson is learned from the strategy followed by statisticians to create criteria that relate the quality of the estimated parameters to the design of experiments by using a deterministic selection strategy. ...
... As mentioned, [3] were one of the first to point at the difficulty of having independent observations if windows overlap. Therefore, [5,7,8] also discussed this point and came with alternative heuristic sampling techniques. ...
... The intensity of the grey scale provides the number of times data of a pixel is used in the selection. Lange et al. [7] mentioned for some case that a 10% training sample leaves only 1% of the pixels as not seen. This depends of course on the size of the window w. ...
The careful design of experiments in spatial statistics aims at estimating models in an accurate way. In the field of spatial deep learning to classify spatial observations, the training set used to calibrate a model or network is usually determined in a random way in order to obtain a representative sample. This chapter will sketch with examples that this is not necessarily the best way to proceed. Moreover, as in some cases windows are used to smooth signals, overlap may occur in the spatial data. On the one hand, this implies auto-correlation in the training set and, on the other hand, a correlation among pixels used for training and for testing. Our question is how to measure such an overlap and how to steer the selection of training sets. We describe an optimization problem to model and minimize the auto-correlation. A simple example is used to capture the concepts of design of experiments versus training set selection and the measurement of the overlap.
... For each class, it is clustered into two clusters according to its spatial coordinates. The training samples are randomly selected from a cluster, and the remaining classes are used as the test set; • Density-Based Clustering Algorithm Sampling Strategy: Lange et al. [66] detect subgroups in a set by recursively evaluating the density threshold of neighbor points around the sample with parameter as the search radius. Therefore, independent regions can be determined by clustering the coordinates of pixels of a particular class. ...
Full-text available
In deep learning-based hyperspectral remote sensing image classification tasks, random sampling strategies are typically used to train model parameters for testing and evaluation. However, this approach leads to strong spatial autocorrelation between the training set samples and the surrounding test set samples, and some unlabeled test set data directly participate in the training of the network. This leaked information makes the model overly optimistic. Models trained under these conditions tend to overfit to a single dataset, which limits the range of practical applications. This paper analyzes the causes and effects of information leakage and summarizes the methods from existing models to mitigate the effects of information leakage. Specifically, this paper states the main issues in this area, where the issue of information leakage is addressed in detail. Second, some algorithms and related models used to mitigate information leakage are categorized, including reducing the number of training samples, using spatially disjoint sampling strategies, few-shot learning, and unsupervised learning. These models and methods are classified according to the sample-related phase and the feature extraction phase. Finally, several representative hyperspectral image classification models experiments are conducted on the common datasets and their effectiveness in mitigating information leakage is analyzed.
... However, increasing the spatial context increases trainable parameters and consequently increases training time. Also, a higher value of S can lead to overlapping problem [17]. So we settled on S = 11. ...
Full-text available
Hyperspectral image sensors can provide valuable data for land covers, oceans, and the earth atmosphere at various spatial and spectral scales. Rich spectral and spatial information of a location makes hyperspectral image (HSI) an excellent way to work with materials, identify them, or define their properties. However, computer-automated analysis and classification of hyperspectral image is a challenging problem. Most of the spectral information in hyperspectral image is correlated, containing redundant information. High number of bands in input image contributes to the curse of dimensionality problem that reduces classifier performance. In many applications, the amount of labelled hyperspectral data that can be acquired is minimal. The complexities associated with HSI motivate us to propose a method named FA-CNN. We have used factor analysis (FA) dimension reduction technique to remove band correlation while maintaining useful spectral information in a lower number of bands. Then, we have applied convolutional neural network (CNN) for combining spectral and spatial features of HSI. Finally, multilayer perceptron classifier is used for classifying each of the input pixels in HSI. Our proposed method achieved 99.59% overall accuracy and 99.75% average accuracy on Indian Pines dataset; 99.95% overall accuracy and 99.90% average accuracy on Pavia University dataset while requiring a lower number of trainable parameters and training data compared to other methods.
The high-resolution soil moisture inversion under vegetation from remote-sensing data is a challenging task. The key issue is to separate or eliminate the influence of vegetation. Therefore, the difficult problem is to select appropriate vegetation descriptors and accurately deal with them. With the great success of convolutional neural network (CNN) in PolSAR image classification, this paper intends to use CNN and introduce an adaptive weighted learning mechanism to calculate the new vegetation descriptors and eliminate the influence of vegetation on soil moisture inversion under vegetation by modifying the network structure. First, we propose an adaptive weighted learning module that can learn the adaptive weights of vegetation contribution which are the volume scattering components and the double-bounce scattering components in the Freeman-Durden decomposition to obtain new vegetation parameters. Second, we transform the inversion problem into a classification-regression problem, and use CNN to design two corresponding models to complete the inversion of soil moisture. In the proposed strategy, the adaptive weighted learning can effectively extract salient features from different components and combine them to obtain new vegetation descriptors and, the CNN can effectively utilize the spatial distribution of PolSAR data to automatically extract features that are useful for soil moisture inversion. The experimental results show that both classification network and retrospective network can achieve high inversion accuracy, whose inversion accuracy reaches up 96.66%, and root mean square error and the determination coefficient are 2.32% and 0.93, respectively. It suggests the great potential of combining the deep learning technique with traditional inversion model for soil moisture inversion from PolSAR data.
Full-text available
The spectral and spatial resolutions of modern optical Earth observation data are continuously increasing. To fully utilize the data, integrate them with other information sources and create applications relevant to real-world problems, extensive training data are required. We present TAIGA, an open dataset including continuous and categorical forestry data, accompanied by airborne hyperspectral imagery with a pixel size of 0.7 m. The dataset contains over 70 million labeled pixels belonging to more than 600 forest stands. To establish a baseline on TAIGA dataset for multitask learning, we train and validate a convolutional neural network to simultaneously retrieve 13 forest variables. Due to the size of the imagery, the training and testing sets were independent, with strictly no overlap for patches up to 45×45 pixels. Our retrieval results show that including both spectral and textural information improves the accuracy of mapping key boreal forest structural characteristics, compared with an earlier study including only spectral information from the same image. TAIGA responds to the increased availability of hyperspectral and very high resolution imagery, and includes the forestry variables relevant for forestry and environmental applications. We propose the dataset as a new benchmark for spatial-spectral methods that overcomes limitations of widely used small-scale hyperspectral datasets.
Full-text available
Convolutional neural networks (CNN) provide state-of-the-art performance in many computer vision tasks, including those related to remote-sensing image analysis. Successfully training a CNN to generalize well to unseen data, however, requires training on samples that represent the full distribution of variation of both the target classes and their surrounding contexts. With remote sensing data, acquiring a sufficiently representative training set is a challenge due to both the inherent multi-modal variability of satellite or aerial imagery and the general high cost of labeling data. To address this challenge, we have developed ISOSCELES, an Iterative Self-Organizing SCEne LEvel Sampling method for hierarchical sampling of large image sets. Using affinity propagation, ISOSCELES automates the selection of highly representative training images. Compared to random sampling or using available reference data, the distribution of the training is principally data driven, reducing the chance of oversampling uninformative areas or undersampling informative ones. In comparison to manual sample selection by an analyst, ISOSCELES exploits descriptive features, spectral and/or textural, and eliminates human bias in sample selection. Using a hierarchical sampling approach, ISOSCELES can obtain a training set that reflects both between-scene variability, such as in viewing angle and time of day, and within-scene variability at the level of individual training samples. We verify the method by demonstrating its superiority to stratified random sampling in the challenging task of adapting a pre-trained model to a new image and spatial domain for country-scale building extraction. Using a pair of hand-labeled training sets comprising 1,987 sample image chips, a total of 496,000,000 individually labeled pixels, we show, across three distinct model architectures, an increase in accuracy, as measured by F1-score, of 2.2–4.2%.
Although optical remote sensing can capture the Earth's environment with visible and infrared sensors, it is limited by weather conditions. Often, only a few sets of cloud-free optical imagery are available in cloudy regions, where many agricultural towns are located. On the other hand, radar remote sensing can capture imagery even under heavy cloud cover. In this study, we examined the capability of Sentinel-1 multitemporal dual-polarized SAR imagery over a whole year, obtained from Google Earth Engine, for crop mapping at two study sites in Chongqing, China, and Landivisiau, France. Results show that it is possible to produce better crop classification maps using multitemporal SAR imagery, but the performance is limited by local terrain. Flat agricultural regions, such as Western Europe, are expected to benefit from the multitemporal SAR information. Mountainous agricultural regions, such as Southwestern China, will encounter difficulties due to the undulating terrain. We also tested two sampling strategies, i.e., random sampling and regional sampling, and observed high variation in overall accuracy: the former led to higher accuracy. The gap is explained by the diversity of the training sets, which we examined using t-SNE visualization. The training set collected via random sampling has higher diversity and therefore helps the classifier to capture the data's real-world distribution. Regional sampling collects a more compact set of training samples. 3D CNNs achieved results similar to those of 2D CNNs, but at a much higher computational cost. Based on the experiments, we recommend using a lightweight 2D CNN that can run on a CPU for real-world crop mapping with SAR data.
Land cover classification has played an important role in remote sensing applications. However, most classification methods were designed based on the pixel features or local spatial features of the remote sensing image, which limits the classification accuracy and generalization. In order to further utilize the spatial information, this letter proposes a dual-branch neural network (NN) inspired by the conditional random field (CRF) model, namely CRF-Net, which takes into account the global spatial features of the image, i.e., geographic latitude-longitude information. First, a dual-branch NN is designed to extract the pixel features and coordinate features. Then, the two kinds of features are fused to perform the remote sensing image classification. In the experiments, randomly selected samples and spatially disjoint samples are employed to verify the effectiveness of the proposed method for hyperspectral image (HSI) and polarimetric synthetic aperture radar (PolSAR) image classification. The experimental results show that the proposed method is superior to the traditional supervised classification methods under the spatially disjoint sampling strategy, and can achieve the same level of accuracy under the random sampling condition.
This paper introduces the use of single-layer and deep convolutional networks for remote sensing data analysis. Direct application to multi- and hyper-spectral imagery of supervised (shallow or deep) convolutional networks is very challenging given the high input data dimensionality and the relatively small amount of available labeled data. Therefore, we propose the use of greedy layer-wise unsupervised pre-training coupled with a highly efficient algorithm for unsupervised learning of sparse features. The algorithm is rooted in sparse representations and enforces both population and lifetime sparsity of the extracted features simultaneously. We successfully illustrate the expressive power of the extracted representations in several scenarios: classification of aerial scenes, as well as land-use classification in very high resolution (VHR) images, or land-cover classification from multi- and hyper-spectral images. The proposed algorithm clearly outperforms standard Principal Component Analysis (PCA) and its kernel counterpart (kPCA), as well as current state-of-the-art algorithms of aerial classification, while being extremely computationally efficient at learning representations of data. Results show that single-layer convolutional networks can extract powerful discriminative features only when the receptive field accounts for neighboring pixels, and are preferred when the classification requires high resolution and detailed results. However, deep architectures significantly outperform single-layer variants, capturing increasing levels of abstraction and complexity throughout the feature hierarchy.
This paper investigates different batch-mode active-learning (AL) techniques for the classification of remote sensing (RS) images with support vector machines. This is done by generalizing to multiclass problems techniques defined for binary classifiers. The investigated techniques exploit different query functions, which are based on the evaluation of two criteria: uncertainty and diversity. The uncertainty criterion is associated with the confidence of the supervised algorithm in correctly classifying the considered sample, while the diversity criterion aims at selecting a set of unlabeled samples that are as diverse (distant from one another) as possible, thus reducing the redundancy among the selected samples. The combination of the two criteria results in the selection of the potentially most informative set of samples at each iteration of the AL process. Moreover, we propose a novel query function that is based on a kernel-clustering technique for assessing the diversity of samples and a new strategy for selecting the most informative representative sample from each cluster. The investigated and proposed techniques are theoretically and experimentally compared with state-of-the-art methods adopted for RS applications. This is accomplished by considering very high resolution multispectral and hyperspectral images. From this comparison, we observed that the proposed method resulted in better accuracy than the other investigated and state-of-the-art methods on both of the considered data sets. Furthermore, we derived some guidelines on the design of AL systems for the classification of different types of RS images.
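The interplay of the uncertainty and diversity criteria described in the abstract above can be illustrated with a minimal sketch. It is not the kernel-clustering query function proposed in that paper: as a simplifying assumption, diversity is enforced here by a greedy farthest-point rule in Euclidean feature space, and the names `select_batch`, `features`, `margins`, `n_candidates` and `batch_size` are hypothetical.

```python
import numpy as np


def select_batch(features, margins, n_candidates, batch_size):
    """Return indices of `batch_size` samples that are uncertain and spread out.

    features : (n_samples, n_features) array of unlabeled samples
    margins  : (n_samples,) classifier confidence; small value = uncertain
    """
    # Uncertainty step: keep the n_candidates samples with the smallest margin.
    candidates = np.argsort(margins)[:n_candidates]
    cand_feats = features[candidates]

    # Diversity step (assumed stand-in): greedy farthest-point selection,
    # starting from the single most uncertain candidate.
    chosen = [0]
    min_dist = np.linalg.norm(cand_feats - cand_feats[0], axis=1)
    while len(chosen) < batch_size:
        nxt = int(np.argmax(min_dist))  # farthest from everything chosen so far
        chosen.append(nxt)
        dist = np.linalg.norm(cand_feats - cand_feats[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)

    return [int(candidates[i]) for i in chosen]
```

Samples with a large margin never enter the candidate pool, and among the uncertain candidates, near-duplicates of an already chosen sample are skipped in favor of more distant ones.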
In recent years, Deep Learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely Computer Vision (CV), speech recognition, natural language processing, etc. Although remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should not only be aware of advancements like DL, but also be leading researchers in this area. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.
Spectral-spatial processing has been increasingly explored in remote sensing hyperspectral image classification. While extensive studies have focused on developing methods to improve the classification accuracy, experimental setting and design for method evaluation have drawn little attention. In the scope of supervised classification, we find that traditional experimental designs for spectral processing are often improperly used in the spectral-spatial processing context, leading to unfair or biased performance evaluation. This is especially the case when training and testing samples are randomly drawn from the same image, a practice that has been commonly adopted in the experiments. Under such a setting, the dependence caused by overlap between the training and testing samples may be artificially enhanced by some spatial information processing methods such as spatial filtering and morphological operations. Such interaction between training and testing sets violates the data independence assumption on which supervised learning theory and performance evaluation rely. Therefore, the widely adopted pixel-based random sampling strategy is not always suitable to evaluate spectral-spatial classification algorithms because it is difficult to determine whether the improvement of classification accuracy is caused by incorporating spatial information into the classifier or by increasing the overlap between training and testing samples. To partially solve this problem, we propose a novel controlled random sampling strategy for spectral-spatial methods. It can greatly reduce the overlap between training and testing samples and provides a more objective and accurate evaluation.
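The overlap problem described in the abstract above can be made concrete with a small sketch. It assumes a patch-based classifier that sees a square `patch_size` × `patch_size` window around each pixel, and it compares pixel-wise random sampling with a simple contiguous block split. The block split is only an illustrative stand-in, not the controlled random sampling strategy of that paper, and all function names are hypothetical.

```python
import numpy as np


def dilate_square(mask, radius):
    """Mark every pixel within Chebyshev distance `radius` of a True pixel."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out |= padded[dr:dr + h, dc:dc + w]
    return out


def overlapping_test_fraction(train_mask, patch_size):
    """Fraction of test pixels whose patch shares pixels with a training patch.

    Two patches of side `patch_size` overlap exactly when their centers are
    at Chebyshev distance <= patch_size - 1.
    """
    near_train = dilate_square(train_mask, patch_size - 1)
    test_mask = ~train_mask
    return float((near_train & test_mask).sum() / test_mask.sum())


def random_split(shape, train_ratio, rng):
    """Pixel-wise random sampling: each pixel is a training pixel with prob. train_ratio."""
    return rng.random(shape) < train_ratio


def block_split(shape, train_ratio):
    """Contiguous split: the top rows form the training set (illustrative only)."""
    mask = np.zeros(shape, dtype=bool)
    mask[: int(round(shape[0] * train_ratio)), :] = True
    return mask


rng = np.random.default_rng(0)
shape, ratio, patch = (100, 100), 0.1, 5
print("random split:", overlapping_test_fraction(random_split(shape, ratio, rng), patch))
print("block split: ", overlapping_test_fraction(block_split(shape, ratio), patch))
```

With 10% of the pixels drawn at random, nearly every test patch shares pixels with some training patch, whereas with the block split only the test pixels in a thin band along the boundary are affected.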
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.