Article

Abstract

Modern Earth Observation systems continuously generate huge amounts of data. A notable example is the Sentinel-2 mission, which provides images at high spatial resolution (up to 10 m) with a short revisit period (every 5 days); these images can be organized into Satellite Image Time Series (SITS). While the use of SITS has proven beneficial for Land Use/Land Cover (LULC) map generation, most machine learning approaches commonly used in remote sensing fail to take advantage of the spatio-temporal dependencies present in such data. Recently, a new generation of deep learning methods has significantly advanced research in this field. These approaches have generally focused on a single type of neural network, i.e., Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), which model different but complementary information: spatial autocorrelation (CNNs) and temporal dependencies (RNNs). In this work, we propose the first deep learning architecture for the analysis of SITS data, namely DuPLO (DUal view Point deep Learning architecture for time series classificatiOn), that combines Convolutional and Recurrent neural networks to exploit their complementarity. Our hypothesis is that, since CNNs and RNNs capture different aspects of the data, a combination of both models would produce a more diverse and complete representation of the information for the underlying land cover classification task. Experiments carried out on two study sites with different land cover characteristics (i.e., the Gard site in mainland France and Reunion Island, an overseas department of France in the Indian Ocean) demonstrate the significance of our proposal.

... Various architectures that combine recurrent and convolutional layers into one hybrid model have been proposed to overcome this limitation (Table 5). These architectures use the recurrent components to perform the temporal processing, and the convolutional components to perform the spatial processing [244]. Highlighting the importance of these methods, Garnot et al. [230] compared a straight 2D-CNN model (thus ignoring the temporal aspect), a straight GRU model (thus ignoring the spatial aspect) and a combined 2D-CNN and GRU model (thus using both spatial and temporal information) and found the combined model gave the best results, demonstrating that both the spatial and temporal dimensions provide useful information for land cover mapping and crop classification. ...
... Other hybrid CNN-RNN models have separate convolutional and recurrent units, either stacking them sequentially or using them in parallel processing streams [244], fusing the outputs in the final layers of the model. A common approach is to first use CNN layers to extract spatial features from each time step, then process the resulting feature vectors using a GRU [230], [240], [245] or LSTM [131], [155], [246], [247]. ...
... Adding temporal attention to a hybrid 2D-CNN and GRU network resulted in a small performance improvement to crop classification models [245]. DuPLO [244] incorporates attention with parallel CNN and hybrid CNN/GRU branches. For the CNN branch, a 5×5 patch centered on the pixel was extracted for each pixel and the bands and time series were stacked together. ...
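The patch preparation described in the snippet above (a 5×5 patch centered on each pixel, with bands and time steps stacked together) can be sketched as follows. This is a minimal NumPy illustration, not the DuPLO authors' code: the function name `extract_patch_stack`, the `(T, H, W, B)` array layout, and the reflect padding at image borders are all assumptions made for the example.

```python
import numpy as np


def extract_patch_stack(sits, row, col, patch_size=5):
    """Cut a patch centered on (row, col) and stack time and bands as channels.

    sits: array of shape (T, H, W, B) = (dates, height, width, bands).
    Returns an array of shape (patch_size, patch_size, T * B).
    """
    half = patch_size // 2
    # Assumption: reflect-pad the spatial axes so border pixels get a full patch.
    padded = np.pad(
        sits, ((0, 0), (half, half), (half, half), (0, 0)), mode="reflect"
    )
    # In padded coordinates the original pixel (row, col) sits at
    # (row + half, col + half), so the window starts at (row, col).
    patch = padded[:, row:row + patch_size, col:col + patch_size, :]
    t, p, _, b = patch.shape
    # Reorder to (p, p, T, B), then merge dates and bands into one channel axis.
    return patch.transpose(1, 2, 0, 3).reshape(p, p, t * b)


# Usage with synthetic data: 3 dates, an 8x8 image, 4 bands.
sits = np.arange(3 * 8 * 8 * 4, dtype=np.float32).reshape(3, 8, 8, 4)
stacked = extract_patch_stack(sits, row=4, col=4)
print(stacked.shape)
```

With this layout, channel index `t * B + b` of the stacked patch holds band `b` at date `t`, so a 2D CNN sees the whole time series of every neighbouring pixel as input channels.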
Preprint
Earth observation (EO) satellite missions have been providing detailed images about the state of the Earth and its land cover for over 50 years. Long term missions, such as NASA's Landsat, Terra, and Aqua satellites, and more recently, the ESA's Sentinel missions, record images of the entire world every few days. Although single images provide point-in-time data, repeated images of the same area, or satellite image time series (SITS) provide information about the changing state of vegetation and land use. These SITS are useful for modeling dynamic processes and seasonal changes such as plant phenology. They have potential benefits for many aspects of land and natural resource management, including applications in agricultural, forest, water, and disaster management, urban planning, and mining. However, the resulting satellite image time series (SITS) are complex, incorporating information from the temporal, spatial, and spectral dimensions. Therefore, deep learning methods are often deployed as they can analyze these complex relationships. This review presents a summary of the state-of-the-art methods of modelling environmental, agricultural, and other Earth observation variables from SITS data using deep learning methods. We aim to provide a resource for remote sensing experts interested in using deep learning techniques to enhance Earth observation models with temporal information. In this review, we are primarily concerned with methods of estimating or predicting EO variables from SITS data. These can be divided into two types of tasks depending on the nature of the variable being estimated. If the variable can take one of two or more discrete values, then the task is classification. Examples of classification tasks include land cover mapping [1], crop type identification [2], and burnt area mapping [3]. If the variable can take continuous numeric values, then the task is regression. 
In the context of time series, regression tasks can be further categorized as extrinsic regression tasks, which estimate the value of a variable external to those represented by the time series [4], or forecasting tasks, which predict future values of a time series based on its historical values. While classification and extrinsic regression tasks are technically distinct, in practice many of the deep learning methods used are very similar. Many architectures that were originally designed for a classification task can easily be adapted for extrinsic regression tasks (and vice versa) [5], for example, by modifying the last layer and the loss function. A more important consideration when choosing deep learning architectures for SITS tasks is the quantity of labeled data available for training models. Many deep learning models have thousands or even millions of parameters that need estimating, and thus training these models requires large quantities of labeled data. Smaller architectures with fewer parameters are likely to be more suitable when labeled data are limited. In particular, techniques such as semi-supervised and unsupervised learning are designed for situations with few or no labeled samples, respectively. In related work, Gómez et al. [1] provided a comprehensive review of using optical SITS data for land cover classification. However, there have been developments in EO data and machine learning since that review that have led to a substantial increase in SITS research and its potential applications for EO monitoring. One reason for these recent developments is the availability of data from the ESA Sentinel missions that provide both optical and synthetic aperture radar (SAR) data at higher temporal and spatial resolution than many of the previously readily available sources.
Another reason is the wide variety of machine learning methods, especially deep learning methods, that can model the complex relationships that exist between the observed electromagnetic radiation and the variable of interest. Both these advances mean there are a wider variety of techniques available for EO modelling and a wider variety of tasks that can be performed using these models. The current review, which covers the use of deep learning methods for SITS, therefore provides an update to review [1]. It focuses on deep learning analysis of SITS for classification and extrinsic regression problems and examines a broad range of applications of SITS data, thus filling the gap left by these other recent reviews. However, the review excludes DL forecasting applications of SITS, as these have been extensively covered by Moskolaï et al. [6]. As we are interested in modelling of temporal features, we limit the study to time series longer than two. Thus, we exclude methods such as bitemporal change detection, which identifies differences in two images obtained at separate times. A recent review of change detection in remote sensing is provided in [7].

SELECTION PRINCIPLES AND RELATED SURVEYS

SELECTION PRINCIPLES

There are more studies using deep learning for SITS than can feasibly be included in a single review, thus this review is not an exhaustive survey. However, we aim to provide coverage of a broad range of studies that show both the deep learning methods applied to SITS and the tasks for which SITS have been used. We have therefore included studies that: 1) are the key works developing DL techniques for SITS tasks, 2) show how the various DL methods have been applied to SITS, 3) provide insight into extracting temporal and/or spatial features from SITS, and 4) highlight the wide range of tasks for which SITS can be used. Papers were mainly found by searching on
... For instance, TempCNN [22], which applies convolution along the temporal dimension, and Recurrent Neural Networks [23], [24], [25], [26], [27], which retain information from past timestamps in memory, have been proposed. Although these architectures can outperform traditional approaches such as Random Forest [28], existing literature [29], [30], [31] has corroborated that better results can be obtained by also considering the spatial dimension. This is because high-level spatio-temporal features allow the detection and discrimination of closely resembling spectral signatures. ...
... The proposed network marries CNNs and RNNs as separate layers, and the CNN output is injected as the input to an RNN. Other CNN and RNN combinations are proposed in [29], [30]. Both studies propose an architecture composed of two parallel branches aiming to independently extract spatial and temporal features. ...
... Features from temporal data are extracted by applying an RNN architecture, whereas spatial features are learned by a CNN applied to a high-spatial-resolution 25×25 image patch. Although two parallel branches are also proposed in DuPLO [30], this architecture exploits temporal S2 patches with a spatial dimension of 5 × 5 on both branches. The temporal branch uses a shallow CNN to reduce the spatial dimension to 1 before applying Gated Recurrent Units. ...
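The temporal branch described in the snippet above (a shallow CNN that collapses each 5 × 5 patch to a single spatial position, followed by Gated Recurrent Units over the dates) can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions: two unpadded 3×3 convolutions (5 → 3 → 1) stand in for the "shallow CNN", biases and attention are omitted, the weights are random, and the names `conv3x3_valid`, `init_gru`, `gru_step`, and `temporal_branch` are hypothetical. The source does not specify kernel sizes or layer counts.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3_valid(x, kernel):
    """Unpadded 3x3 convolution with ReLU.

    x: (H, W, C); kernel: (3, 3, C, O). Returns (H - 2, W - 2, O).
    """
    windows = sliding_window_view(x, (3, 3), axis=(0, 1))  # (H-2, W-2, C, 3, 3)
    return np.maximum(np.einsum("hwcij,ijco->hwo", windows, kernel), 0.0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_gru(n_in, n_hidden, rng):
    """Random GRU weights (biases omitted for brevity)."""
    return {
        name: rng.normal(scale=0.1, size=(n_in if name[0] == "W" else n_hidden, n_hidden))
        for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")
    }


def gru_step(x, h, p):
    """One GRU update: gates z (update) and r (reset), then the new state."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])
    candidate = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])
    return (1.0 - z) * h + z * candidate


def temporal_branch(patches, k1, k2, gru_params):
    """patches: (T, 5, 5, B). Returns the final GRU hidden state."""
    h = np.zeros(gru_params["Uz"].shape[0])
    for patch in patches:
        # Two valid 3x3 convolutions shrink 5x5 -> 3x3 -> 1x1.
        features = conv3x3_valid(conv3x3_valid(patch, k1), k2)
        h = gru_step(features.reshape(-1), h, gru_params)
    return h


# Usage with synthetic data: 7 dates, 5x5 patches, 4 bands.
rng = np.random.default_rng(0)
patches = rng.normal(size=(7, 5, 5, 4))
k1 = rng.normal(scale=0.1, size=(3, 3, 4, 8))
k2 = rng.normal(scale=0.1, size=(3, 3, 8, 16))
state = temporal_branch(patches, k1, k2, init_gru(16, 6, rng))
print(state.shape)
```

The point of the sketch is the data flow: each date is first summarised spatially into one feature vector, and only then does the recurrent unit model the sequence of those vectors.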
Article
In this paper, a new self-supervised strategy for learning meaningful representations of complex optical Satellite Image Time Series (SITS) is presented. The proposed methodology, named U-BARN (Unet-BERT spAtio-temporal Representation eNcoder), exploits irregularly sampled SITS. The designed architecture allows learning rich and discriminative features from unlabeled data, enhancing the synergy between the spatio-spectral and the temporal dimensions. To train on unlabeled data, a time series reconstruction pretext task inspired by the BERT strategy but adapted to SITS is proposed. A large-scale unlabeled Sentinel-2 data-set is used to pre-train U-BARN. During pre-training, U-BARN processes annual time series composed of a maximum of 100 dates. To demonstrate its feature learning capability, representations of SITS encoded by U-BARN are then fed into a shallow classifier to generate semantic segmentation maps. Experiments are conducted on a labeled crop data-set (PASTIS) as well as a dense land cover data-set (MultiSenGE). Two ways of exploiting U-BARN pre-training are considered: either U-BARN weights are frozen or fine-tuned. The obtained results demonstrate that representations of SITS given by the frozen U-BARN are more efficient for land cover and crop classification than those of a supervised-trained linear layer. We then observe that fine-tuning boosts U-BARN performance on the MultiSenGE data-set. Additionally, we observe on PASTIS, in scenarios with scarce reference data, that fine-tuning brings a significant performance gain compared to fully-supervised approaches. We also investigate the influence of the percentage of elements masked during pre-training on the quality of the SITS representation. Finally, semantic segmentation results show that the fully supervised U-BARN architecture reaches better performance than the spatio-temporal baseline (U-TAE) on both downstream tasks: crop and dense land cover segmentation.
... In particular, support vector machines (SVM) and random forests have been used for this purpose (Zheng et al., 2015; Saini and Ghosh, 2018). In recent years, the growing availability of SITS data, combined with advances in deep learning, has led to more advanced crop classifiers utilizing temporal neural architectures such as convolutions (Pelletier et al., 2019; Zhong et al., 2019), recurrent units (Rußwurm and Körner, 2017; Ndikumana et al., 2018), self-attention (Rußwurm and Körner, 2020; Sainte Fare Garnot et al., 2020), or hybrids (Interdonato et al., 2019; Rußwurm and Körner, 2018). These models are exceptionally good at capturing temporal patterns; they, however, depend heavily on large amounts of labeled training data. ...
... Adding to this, Fig. 5 provides a visualization of the target feature separability using t-SNE (van der Maaten and Hinton, 2008), demonstrating the enhanced clustering and hence separability of target features based on embeddings from the RAINCOAT-SRS encoder compared to source- and target-trained models, and the default RAINCOAT. These findings align with existing research in SITS, which suggests that incorporating state-of-the-art encoder designs for classifier models can lead to improved results (Pelletier et al., 2019; Ndikumana et al., 2018; Interdonato et al., 2019; Rußwurm and Körner, 2020). However, contrary to what might have been expected, the LSTM-based configurations performed worse than the default CNN setup. ...
Article
With the increasing availability of Earth observation data in recent years, the development of deep learning algorithms for the classification of satellite image time series (SITS) has substantially progressed. Yet, when encountering settings of lacking target labels and distinct feature variations, even the latest classification algorithms may perform poorly in transferring knowledge from a trained dataset to an unknown target dataset, despite similar or even identical label sets. The research field of unsupervised domain adaptation (UDA) focuses on methods to overcome these challenges by providing algorithms that explicitly learn domain shifts between different data domains in the absence of target-labeled data. Building upon recent advances in generic UDA research in time series settings, we propose RAINCOAT-SRS, an enhancement of the frequency-augmented UDA algorithm RAINCOAT specifically for the SITS domain. To evaluate the default and adjusted model variants, we designed several closed-label set, cross-regional and multi-temporal crop type mapping experiments, which represent common sub-problems of UDA in SITS. We first benchmark RAINCOAT against TimeMatch as a leading algorithm in this application context. Subsequently, we explored different encoder-to-decoder constellations as architectural enhancements. These analyses revealed that combining a self-attention-based encoder with the default decoder yields a performance increase over the standard algorithm of up to 6 % in average F1-score, and over TimeMatch of up to 24 %. Beyond this, we assessed the impact of the frequency feature and of SITS-specific feature extensions integrating weather data; both improved classification accuracy only in individual sub-experiments, not consistently across the entire scope of scenarios.
Finally, we outline key factors influencing the transferability, thereby emphasizing the major importance of domain-overarching stability of class-relative, structural patterns rather than of collective, linear shifts between domains. Through this research, we introduce RAINCOAT-SRS, a novel model for UDA in SITS, designed to advance generalization in remote sensing by enabling more comprehensive cross-regional and multi-temporal SITS experiments in the face of lacking target-labeled data.
... Existing techniques for satellite image time series classification include pixel-wise and whole-image methods. Pixel-wise methods [61,40,68,30,83] process each pixel of the SITS independently, discarding spatial information. In practice, these methods are outperformed by whole-image approaches, like PSE+LTAE [32] or TSViT [75]. ...
... Pixel-wise multi-frame methods. We can also view DAFA-LS as a set of pixel time series and evaluate several methods designed for pixel-wise satellite image time series classification, including DuPLO [40], TempCNN [61], a self-attention approach referred to as Transformer [68], and LTAE [30]. ...
Preprint
Archaeological sites are the physical remains of past human activity and one of the main sources of information about past societies and cultures. However, they are also the target of malevolent human actions, especially in countries having experienced inner turmoil and conflicts. Because monitoring these sites from space is a key step towards their preservation, we introduce the DAFA Looted Sites dataset, DAFA-LS, a labeled multi-temporal remote sensing dataset containing 55,480 images acquired monthly over 8 years across 675 Afghan archaeological sites, including 135 sites looted during the acquisition period. DAFA-LS is particularly challenging because of the limited number of training samples, the class imbalance, the weak binary annotations only available at the level of the time series, and the subtlety of relevant changes coupled with important irrelevant ones over a long time period. It is also an interesting playground to assess the performance of satellite image time series (SITS) classification methods on a real and important use case. We evaluate a large set of baselines, outline the substantial benefits of using foundation models and show the additional boost that can be provided by using complete time series instead of using a single image.
... A common approach has been to combine CNN and RNN models. Convolutional neural network (CNN)-RNN integration has manifested through multi-temporal architectures (Interdonato et al., 2019; Pelletier et al., 2019; Rußwurm and Körner, 2018) and parallel processing (Interdonato et al., 2019; Thorp and Drajat, 2021). Hybrid CNN-RNN fusion of Landsat, NAIP, climate and terrain datasets for land cover typing has shown promising results (Chang et al., 2019), while an evaluation of diverse spatiotemporal CNN-RNN designs for assessing post-deforestation land use transitions from Landsat data further confirmed the superiority of the combined models (Masolele et al., 2021). ...
Article
Sustainable natural resources management relies on effective and timely assessment of conservation and land management practices. Using satellite imagery for Earth observation has become essential for monitoring land cover/land use (LCLU) changes and identifying critical areas for conserving biodiversity. Remote Sensing (RS) datasets are often quite large and require tremendous computing power to process. The emergence of cloud-based computing techniques presents a powerful avenue to overcome computing limitations by allowing machine-learning algorithms to process and analyze large RS datasets on the cloud. Our study aimed to classify LCLU for the Talassemtane National Park (TNP) using a Deep Neural Network (DNN) model incorporating five spectral indices to differentiate six land use classes using Sentinel-2 satellite imagery. Optimization of the DNN model was conducted using a comparative analysis of three optimization algorithms: Random Search, Hyperband, and Bayesian optimization. Results indicated that the spectral indices improved classification between classes with similar reflectance. The Hyperband method had the best performance, improving the classification accuracy by 12.5% and achieving an overall accuracy of 94.5% with a kappa coefficient of 93.4%. The dropout regularization method prevented overfitting and mitigated over-activation of hidden nodes. Our initial results show that machine learning (ML) applications can be effective tools for improving natural resources management.
... Following the trend in computer vision and machine learning, recent works have shown that neural networks are well suited to process large quantities of data. Spatio-temporal dimensions of SITS have been successfully handled by recurrent cells and convolutions [170,171,172] and attention-based architectures [77]. However, those methods work with the entire datacube and do not exploit geographical entities and their relationships, which is one of the reasons for using graphs to process SITS data. ...
Preprint
The Earth's surface is subject to complex and dynamic processes, ranging from large-scale phenomena such as tectonic plate movements to localized changes associated with ecosystems, agriculture, or human activity. Satellite images enable global monitoring of these processes with extensive spatial and temporal coverage, offering advantages over in-situ methods. In particular, resulting satellite image time series (SITS) datasets contain valuable information. To handle their large volume and complexity, some recent works focus on the use of graph-based techniques that abandon the regular Euclidean structure of satellite data to work at an object level. Besides, graphs enable modelling spatial and temporal interactions between identified objects, which are crucial for pattern detection, classification and regression tasks. This paper is an effort to examine the integration of graph-based methods in spatio-temporal remote-sensing analysis. In particular, it aims to present a versatile graph-based pipeline to tackle SITS analysis. It focuses on the construction of spatio-temporal graphs from SITS and their application to downstream tasks. The paper includes a comprehensive review and two case studies, which highlight the potential of graph-based approaches for land cover mapping and water resource forecasting. It also discusses numerous perspectives to resolve current limitations and encourage future developments.
... It is common to simply exclude single and multitemporal pixels that are occluded by clouds [63] when operating at a pixel level (type P or type T), or interpolate the colour information from earlier and later images (e.g. [48,5,49]) when using temporal data (type T or ST). Another common strategy is to create single image composites from images taken at different times (e.g. ...
Article
Agricultural research is essential for increasing food production to meet the needs of a rapidly growing human population. Collecting large quantities of agricultural data helps to improve decision making for better food security at various levels: from international trade and policy decisions, down to individual farmers. At the same time, deep learning has seen a wave of popularity across many different research areas and data modalities. And satellite imagery has become available in unprecedented quantities, driving much research from the wider remote sensing community. The data hungry nature of deep learning models and this huge data volume seem like a perfect match. But has deep learning been adopted for agricultural tasks using satellite images? This systematic review of 193 studies analyses the tasks that have reaped benefits from deep learning algorithms, and those that have not. It was found that while Land Use / Land Cover research has embraced deep learning algorithms, research on other agricultural tasks has not. This poor adoption appears to be due to a critical lack of labelled datasets for these other tasks. Thus, we give suggestions for collecting larger datasets. Additionally, satellite images differ from ground-based images in a number of ways, resulting in a proliferation of interesting data interpretations unique to satellite images. So, this review also introduces a taxonomy of data input shapes and how they are interpreted in order to facilitate easier communication of algorithm types and enable quantitative analysis.
... To leverage spatial and temporal information of satellite image time series, we developed a novel patch-based classification model, PSTIN, for identifying different crop types (Fig. 1). The PSTIN processes time-series image patches centered on the target pixel, with a patch size of 5 × 5, following previous studies (Interdonato et al., 2019; Yuan et al., 2022). For each time-series image patch, the PSTIN first uses the PSAE to aggregate the spectral features of all pixels within the patch under the guidance of the central pixel. ...
Article
Accurate and timely crop type classification is essential for effective agricultural monitoring, cropland management, and yield estimation. Unfortunately, the complicated temporal patterns of different crops, combined with gaps and noise in satellite observations caused by clouds and rain, restrict crop classification accuracy, particularly during early seasons with limited temporal information. Although deep learning-based methods have exhibited great potential for improving crop type mapping, insufficient and noisy training data may lead them to overlook more generalizable features and derive inferior classification performance. To address these challenges, we developed a Mask Pixel-set SpatioTemporal Integration Network (Mask-PSTIN), which integrates a temporal random masking technique and a novel PSTIN model. Temporal random masking augments the training data by selectively removing certain temporal information to improve data variability, forcing the model to learn more generalized features. The PSTIN, comprising a pixel-set aggregation encoder (PSAE) and a long short-term memory (LSTM) module, effectively captures comprehensive spatiotemporal features from time-series satellite images. The effectiveness of Mask-PSTIN was evaluated across three regions with different landscapes and cropping systems. Results demonstrated that the addition of PSAE in PSTIN significantly improved crop classification accuracy compared to a basic LSTM, with average overall accuracy (OA) increasing from 80.9% to 83.9%, and the mean F1-Score (mF1) rising from 0.781 to 0.818. Incorporating temporal random masking in training led to further improvements, increasing average OA and mF1 to 87.4% and 0.865, respectively. The Mask-PSTIN significantly outperformed traditional machine learning and deep learning methods (i.e., RF, SVM, Transformer, and CNN-LSTM) in crop type mapping across all three regions.
Furthermore, Mask-PSTIN enabled earlier and more accurate crop type identification before or during their developing stages compared to machine learning models. Feature importance analysis based on the gradient backpropagation algorithm revealed that Mask-PSTIN effectively leveraged multi-temporal features, exhibiting broader attention across various time steps and capturing critical crop phenological characteristics. These results suggest that Mask-PSTIN is a promising approach for improving both post-harvest and early-season crop type classification, with potential applications in agricultural management and monitoring.
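The temporal random masking idea mentioned in this abstract (removing some temporal information from training samples to increase variability) can be sketched as a simple augmentation. The abstract does not give the masking rule, so everything specific here is an assumption: whole time steps are dropped independently with probability `mask_ratio`, masked steps are filled with zeros, and the function name `temporal_random_mask` is hypothetical. It is not the Mask-PSTIN implementation.

```python
import numpy as np


def temporal_random_mask(batch, mask_ratio=0.2, rng=None, fill_value=0.0):
    """Randomly blank out whole time steps of each series in a batch.

    batch: array of shape (N, T, F) = (samples, time steps, features).
    Returns (masked_batch, keep), where keep is a boolean (N, T) array that is
    True for time steps left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, t, _ = batch.shape
    # Assumption: each time step is masked independently with prob. mask_ratio.
    keep = rng.random((n, t)) >= mask_ratio
    # Broadcast the (N, T) mask over the feature axis.
    masked = np.where(keep[:, :, None], batch, fill_value)
    return masked, keep


# Usage: a fresh mask is drawn on every call, so each epoch sees a different
# subset of dates for the same training sample.
rng = np.random.default_rng(42)
batch = rng.normal(size=(4, 10, 3))
masked, keep = temporal_random_mask(batch, mask_ratio=0.3, rng=rng)
print(masked.shape, keep.shape)
```

Returning the `keep` mask alongside the data lets a downstream model ignore the blanked steps explicitly, if it supports masking, rather than treating the fill value as a real observation.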
... Compared with SVM and RF, DL algorithms are better able to extract a suitable representation of the input data, optimized for the particular task at hand. The majority of the DL approaches in the field of SITS analysis are dedicated to Land Use Land Cover (LULC) mapping and discriminate LULC classes by exploiting the evolution of the radiometric information, for applications such as agricultural mapping and yield estimation [11], land management planning [12], and natural resource mapping [13]. ...
Article
Remote sensing time series research and applications are advancing rapidly in land, ocean, and atmosphere science, demonstrating emerging capabilities in space-based monitoring methodologies and diverse application prospects. This prompts a comprehensive review of remote sensing time series observations, time series data reconstruction, derived products, and the current progress, challenges, and future directions in their applications. The high-frequency new data, i.e., a constellation strategy, increasing computing power and advancing deep learning algorithms, are driving a paradigm shift from traditional point-in-time mapping to near-real-time monitoring tasks, and even to modeling integration of parameter inversion and prediction in land, water, and air science. Correspondingly, the 3 main projects, namely, the Global Climate Observing System, the United States Geological Survey/National Aeronautics and Space Administration (USGS/NASA) Landsat Science team, and the China Global Land Surface Satellite (GLASS) team, along with other time series-derived products, have found widespread applications in the research of Earth’s radiation balance and human–land systems. They have also been utilized for tasks such as land use change detection, assessing coastal effects, ocean environment monitoring, and supporting carbon neutrality strategies. Moreover, the 3 critical challenges and future directions were highlighted including multimode time series data fusion, deep learning modeling for task-specific domain adaptation, and fine-scale remote sensing applications by using dense time series. This review distills historical and current developments spanning the last several decades, providing an insightful understanding into the advancements in remote sensing time series data and applications.
... Additionally, the proposed approach uses an auxiliary classifier for each time series encoder, a strategy that has been suggested in previous multi-source land cover mapping studies [8]. This approach involves adding an extra output layer to the encoder, which helps improve the mapping accuracy and performance. ...
... The advent of deep learning technology has revolutionized the field of remote sensing, achieving significant success in areas such as land classification [10], change detection [11], time-series applications [12], [13], and extraction of roads and buildings [14], [15]. Recent studies have extended deep learning techniques from natural image detection to remote sensing imagery, yielding promising results in vehicle detection tasks [16]. ...
Article
Vehicle detection is vital for urban planning and traffic management. Optical remote sensing imagery, known for its high resolution and extensive coverage, is ideal for this task. Traditional horizontal bounding box (HBB) annotations often include excessive background, leading to reduced accuracy, whereas oriented bounding box (OBB) annotations are more precise but costly and prone to human error. To address these issues, we propose the OGR-SM (Oriented Group R-CNN + Student Model) framework, a weakly semi-supervised oriented vehicle detection method based on single-point annotations. It leverages a small amount of OBB annotations along with a large quantity of single-point annotations for training, achieving performance comparable to fully-supervised learning with 100% complete annotations. Specifically, we train the teacher model (OGR) using a small set of accurately annotated OBBs and their corresponding single-point annotations. This model employs an instance-point driven proposal grouping strategy and a group-based proposal assignment strategy enhanced with point location to generate pseudo-OBBs from a large set of weakly annotated single-point data. We then utilize conventional oriented detectors as student models to perform the vehicle detection task in a standard manner. Extensive experiments on two datasets show that our framework, using limited accurate OBBs and many pseudo-OBBs, can achieve or surpass the accuracy of fully-supervised models. Our method balances high-quality annotations with data availability, enhancing scalability and robustness for vehicle detection in remote sensing applications.
... To enhance the discriminative power of the information at different scales, we have leveraged auxiliary classifiers, a widely adopted technique in multi-sensor data fusion for Earth observation data analysis [37,38]. These classifiers serve as output layers for each encoder, aiming to extract complementary and discriminative information from both the pixel and object scales. ...
Article
Full-text available
With the advent of modern Earth observation (EO) systems, the opportunity of collecting satellite image time series (SITS) provides valuable insights to monitor spatiotemporal dynamics. Within this context, accurate land use/land cover (LULC) mapping plays a pivotal role in supporting territorial management and facilitating informed decision-making processes. However, traditional pixel-based and object-based classification methods often struggle to exploit spectral and spatial information effectively. In this study, we propose Orthrus, a novel approach that fuses multi-scale information for enhanced LULC mapping. The proposed approach exploits several 2D encoding techniques to encode time series information into imagery. The resulting image is leveraged as input to a standard convolutional neural network (CNN) image classifier to cope with the downstream classification task. The evaluations on two real-world benchmarks, namely Dordogne and Reunion-Island, demonstrated the quality of Orthrus over state-of-the-art techniques from the field of land cover mapping based on SITS data. More precisely, Orthrus exhibits an enhancement of more than 3.5 accuracy points compared to the best competing approach on the Dordogne benchmark, and surpasses the best competing approach on the Reunion-Island dataset by over 3 accuracy points.
... Regarding the modeling, we rely on recurrent neural networks, which have successfully been used to analyze temporal satellite data [10,11,12,13]. We opt for the Gated Recurrent Unit (GRU), introduced in [14], due to its moderate number of parameters and its proven effectiveness in remote sensing applications [12,15,16]. The time series of each pixel are pre-padded to a fixed sequence length of 228, to account for the longest time series in the dataset, before being supplied to the model. ...
Preprint
Full-text available
Vegetation indices make it possible to monitor vegetation growth and agricultural activities efficiently. Previous generations of satellites captured a limited number of spectral bands, and a few expert-designed vegetation indices were sufficient to harness their potential. New generations of multi- and hyperspectral satellites can capture additional bands, but these are not yet efficiently exploited. In this work, we propose an explainable-AI-based method to select and design suitable vegetation indices. We first train a deep neural network using multispectral satellite data, then extract feature importance to identify the most influential bands. We subsequently select suitable existing vegetation indices or modify them to incorporate the identified bands and retrain our model. We validate our approach on a crop classification task. Our results indicate that models trained on individual indices achieve comparable results to the baseline model trained on all bands, while the combination of two indices surpasses the baseline in certain cases.
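The excerpt citing this preprint mentions that each pixel's time series is pre-padded to a fixed sequence length of 228 before being passed to the GRU. A minimal sketch of such a pre-padding step is given below. The function name `pre_pad`, the zero pad value, and the list-of-lists layout (time steps by bands) are assumptions made for illustration; the source does not show the authors' implementation.

```python
from typing import List


def pre_pad(series: List[List[float]], target_len: int = 228,
            pad_value: float = 0.0) -> List[List[float]]:
    """Left-pad a (time steps x bands) series so it has target_len rows.

    Padding rows are placed BEFORE the observations ("pre-padding"), so the
    most recent acquisitions stay at the end of the sequence.
    The pad value of 0.0 is an assumption, not taken from the source.
    """
    if not series:
        raise ValueError("series must contain at least one time step")
    if len(series) > target_len:
        raise ValueError("series is longer than target_len")
    n_bands = len(series[0])
    n_missing = target_len - len(series)
    padding = [[pad_value] * n_bands for _ in range(n_missing)]
    return padding + [list(row) for row in series]


# Hypothetical pixel with 3 acquisitions and 2 spectral bands.
pixel = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
padded = pre_pad(pixel)
print(len(padded), padded[0], padded[-1])
```

In a recurrent model, padded steps are usually masked so they do not influence the hidden state; whether and how that is done here is not stated in the excerpt.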
... SITS form complex data cubes structured by their spatial and temporal dimensions, requiring the development of specific architectures. In the context of SITS semantic segmentation (i.e., one prediction for each time-series pixel), recent strategies developed architectures that make the most of both dimensions through dual-branch architectures (Interdonato et al., 2019), ConvLSTM (Rußwurm and Körner, 2018), U-Net with temporal attention encoder (TAE) (Sainte Fare Garnot and Landrieu, 2021), or variants of Vision Transformers (Tarasiou et al., 2023; Voelsen et al., 2023). Similar approaches can also be exploited for extrinsic regression, e.g., yield estimation (Sun et al., 2020), and forecasting tasks (Moskolaï et al., 2021). ...
Article
Full-text available
In the context of climate change, it is important to monitor the dynamics of the Earth’s surface in order to prevent extreme weather phenomena such as floods and droughts. To this end, global meteorological forecasting is constantly being improved, with a recent breakthrough in deep learning methods. In this paper, we propose to adapt a recent weather forecasting architecture, called GraphCast, to a water resources forecasting task using high-resolution satellite image time series (SITS). Based on an intermediate mesh, the data geometry used within the network is adapted to match high spatial resolution data acquired in two-dimensional space. In particular, we introduce a predefined irregular mesh based on a segmentation map to guide the network’s predictions and bring more detail to specific areas. We conduct experiments to forecast water resources index two months ahead on lakes and rivers in Italy and Spain. We demonstrate that our adaptation of GraphCast outperforms the existing frameworks designed for SITS analysis. It also showed stable results for the main hyperparameter, i.e., the number of superpixels. We conclude that adapting global meteorological forecasting methods to SITS settings can be beneficial for high spatial resolution predictions.
... compiles the model accuracy results from studies employing both ML and DL for LULC mapping. Results indicate that while DL can lead to accuracy improvements, these are not always substantial (Interdonato et al., 2019; Xie and Niculescu, 2021; Magalhães et al., 2022). Performance varied widely, with accuracy ranging from 51.56% to 99.59% for ML and 61.45%-99.79% ...
Article
Full-text available
This review explores the comparative utility of machine learning (ML) and deep learning (DL) in land system science (LSS) classification tasks. Through a comprehensive assessment, the study reveals that while DL techniques have emerged with transformative potential, their application in LSS often faces challenges related to data availability, computational demands, model interpretability, and overfitting. In many instances, traditional ML models currently present more effective solutions, as illustrated in our decision-making framework. Integrative opportunities for enhancing classification accuracy include data integration from diverse sources, the development of advanced DL architectures, leveraging unsupervised learning, and infusing domain-specific knowledge. The research also emphasizes the need for regular model evaluation, the creation of diversified training datasets, and fostering interdisciplinary collaborations. Furthermore, while the promise of DL for future advancements in LSS is undeniable, present considerations often tip the balance in favor of ML models for many classification schemes. This review serves as a guide for researchers, emphasizing the importance of choosing the right computational tools in the evolving landscape of LSS, to achieve reliable and nuanced land-use change data.
... The necessity and efficiency of linear interpolation for handling missing values in land cover classification are unknown, given that RNN or Transformer models, unlike traditional machine learning, can directly handle missing values in the input time series. Most research adapting RNN and Transformer models to classify satellite time series still used the linear interpolation method to reconstruct missing values (Rußwurm and Körner, 2017; Zhu et al., 2017; Ienco et al., 2019; Interdonato et al., 2019; Rußwurm and Körner, 2018; Tang et al., 2022; Xu et al., 2020). A few other studies have directly used RNN or Transformer to handle time series data with missing values for land cover classification without linear interpolation (Metzger et al., 2021; Zhao et al., 2021; Rußwurm et al., 2023). ...
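The linear interpolation mentioned in this excerpt can be illustrated with a short sketch. The version below fills missing observations (for example, cloud-masked dates) from the nearest valid observations before and after them. The function name `fill_gaps_linear`, the use of `None` as the missing-value marker, and the choice to repeat the nearest valid value at the edges of the series are assumptions for illustration, not details taken from the cited studies.

```python
from typing import List, Optional


def fill_gaps_linear(days: List[float],
                     values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation along the time axis.

    `days` must be sorted in ascending order. Gaps at the start or end of
    the series are filled with the nearest valid observation (an assumption).
    """
    known = [(d, v) for d, v in zip(days, values) if v is not None]
    if not known:
        raise ValueError("at least one valid observation is required")
    filled = []
    for d, v in zip(days, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(kd, kv) for kd, kv in known if kd < d]
        after = [(kd, kv) for kd, kv in known if kd > d]
        if before and after:
            (d0, v0), (d1, v1) = before[-1], after[0]
            # Straight line between the two surrounding valid observations.
            filled.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
        else:
            # Edge gap: no observation on one side, repeat the nearest one.
            filled.append(before[-1][1] if before else after[0][1])
    return filled


# Hypothetical acquisition days and one band with three missing dates.
days = [0, 5, 10, 15, 20]
band = [None, 0.2, None, 0.6, None]
filled = fill_gaps_linear(days, band)
print(filled)
```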
... It is common to combine multi-source time series data for classification, regression, and forecasting tasks by using a model of multiple branches (Interdonato et al. 2019). We designed a two-branch network (named CGNet) to integrate different data organizations (1DTS and recurrence plot). ...
Article
Full-text available
Preparing regular time series of optical remote sensing data is difficult because of frequent cloudy and rainy days. The irregular data and their forms severely limit the data’s ability to be analysed and modelled for vegetation classification. However, how irregular time series data affect vegetation classification in deep learning models is poorly understood. To address this question, this research preprocessed the 2019-2021 time series of Sentinel-2 at both unequal and equal intervals, and transformed them into an image through a recurrence plot for each pixel. The initial one-dimensional time series (1DTS) and recurrence plot data were then used as input data for three deep learning methods (i.e., the Conv1D model based on one-dimensional convolution, the GoogLeNet model based on two-dimensional convolution, and the CGNet model, which fused Conv1D and GoogLeNet) for vegetation classification, respectively. The class separability of the features generated by each model was evaluated and the importance of spectral and temporal features was further examined through gradient backpropagation. The equal-interval time series data significantly improved the classification accuracy, by 0.04, 0.13, and 0.09 for Conv1D, GoogLeNet, and CGNet, respectively. CGNet achieved the highest classification accuracy, indicating that the information from 1DTS and the recurrence plot is complementary for vegetation classification. The importance of spectral bands and time showed that the Sentinel-2 red edge-1 spectral band played a critical role in the identification of eucalyptus, loquat, and honey pomelo, but the importance order of bands varied across vegetation types in GoogLeNet. The time importance varied across vegetation types but was similar across these deep learning models. This study quantified the impacts of the organizational form (1DTS and recurrence plot) of time series data on different models. This research helps in choosing appropriate data structures and efficient deep learning models for vegetation classification.
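The abstract above describes turning each pixel's time series into an image through a recurrence plot. A minimal sketch of one common variant follows: a thresholded recurrence plot of a single-band series, where entry (i, j) is 1 when the values at times i and j differ by at most a threshold. The function name `recurrence_plot`, the threshold `eps`, and the binary (rather than raw-distance) form are assumptions; the abstract does not specify which variant the authors used.

```python
from typing import List


def recurrence_plot(series: List[float], eps: float) -> List[List[int]]:
    """Return a binary recurrence matrix for a univariate time series.

    R[i][j] = 1 if |x_i - x_j| <= eps, else 0. The result is a square,
    symmetric matrix that can be treated as a single-channel image.
    """
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]


# Hypothetical seasonal profile: rise to a peak, then return to baseline.
series = [0.1, 0.5, 0.9, 0.5, 0.1]
rp = recurrence_plot(series, eps=0.05)
for row in rp:
    print(row)
```

Because time steps with similar values light up off the diagonal, a symmetric seasonal profile such as this one produces an anti-diagonal pattern, which is the kind of structure a 2D convolutional model can pick up.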
... The spatio-temporal nature of the input needs to be reflected in the model architecture to learn feature representations useful for the recognition of forest loss drivers. This is an active research topic in satellite image processing and many approaches have been proposed, mostly for crop type classification, using 1D or 3D convolution [22,23], combining convolutional and recurrent modules [24,25,26] or utilizing the attention mechanism [27,28,29]. The adoption of spatiotemporal DL for mapping forest loss drivers is in its infancy, with a single study showing a significant increase in accuracy when compared to single-image approaches [17]. ...
Article
Full-text available
The rates of tropical deforestation remain high, resulting in carbon emissions, biodiversity loss, and impacts on local communities. To design effective policies to tackle this, it is necessary to know what the drivers behind deforestation are. Since drivers vary in space and time, producing accurate spatially explicit maps with regular temporal updates is essential. Drivers can be recognized from satellite imagery but the scale of tropical deforestation makes it unfeasible to do so manually. Machine learning opens up possibilities for automating and scaling up this process. In this study, we developed and trained a deep learning model to classify the drivers of any forest loss—including deforestation—from satellite image time series. Our model architecture allows understanding of how the input time series is used to make a prediction, showing the model learns different patterns for recognizing each driver and highlighting the need for temporal data. We used our model to classify over 588′000 sites to produce a map detailing the drivers behind tropical forest loss. The results confirm that the majority of it is driven by agriculture, but also show significant regional differences. Such data is a crucial source of information to enable targeting specific drivers locally and can be updated in the future using free satellite data.
... Different architectures have proven to be competitive for one-year range crop classifications on agricultural land, compared to traditional machine-learning methods (e.g., random forest, support vector machines). These include 1D-CNN (Pelletier et al., 2019; Simoes et al., 2021; Zhong et al., 2019), 2D CNN (Debella-Gilo & Gjertsen, 2021), combinations of convolutional and recurrent models (Interdonato et al., 2019), self-attention (Rußwurm & Körner, 2020), and combinations of Transformer and CNN. Like crops, forest disturbances are characterized by temporal changes and, therefore, similar methods are applicable for disturbance classifications. ...
Article
Full-text available
The monitoring of forest ecosystems is significantly affected by the lack of consistent historical data on low-severity disturbances (forest partially disturbed) or gradual disturbances (e.g. eastern spruce budworm epidemic). The goal of this paper is to explore the use of a subset of Landsat time series and deep learning models to identify both the type and the year of disturbances, including low-severity and gradual disturbances, in the boreal forest of eastern Canada at the pixel level. Remote sensing data such as the spectral information from Landsat time series are the best available option for large scale observations of disturbances that go back decades. Traditional modeling approaches, like LandTrendr, require substantial handcrafted pre-processing to remove noise and to extract temporal features from the image sequences before using them as input to a classical machine-learning model. Deep-learning models can autonomously discern which features are relevant within the coarse temporal and spectral information from the Landsat annual dense time series. We evaluated the performance of the TempCNN and Transformer models in detecting and classifying the type and the year of forest disturbances using Landsat time series subsequences. Our findings resulted in the generation of four disturbance maps outlining the forest history from 1986 to 2021 within the eastern Canadian boreal forest. Our experimental outcomes demonstrate several significant benefits of employing deep learning models. Firstly, using noisy Landsat time series, they achieve accuracy comparable to that of existing publicly available disturbance maps for classifying fire and total harvesting. Secondly, the use of shorter time series subsequences with deep learning models makes it possible to adequately map different overlapping disturbances occurring in the complete time series. Finally, they increase the number of distinguishable disturbance classes by adding partial harvesting, gradual disturbances, and forest recovery from older events, making them useful approaches for obtaining the first remote sensing-based map for areas affected by the eastern spruce budworm.
... In recent years, the field of intelligent monitoring of land resources has rapidly developed, due to advancements in deep learning. Deep learning methods, which are primarily based on convolutional neural networks [14], recurrent neural networks [15], deep neural networks [16], and hybrid approaches such as attention-guided mechanisms [17], multitype network nesting [18], [19], and knowledge description [20], have been extensively applied in various tasks including the identification of cultivated land parcels, delineation of cultivated land boundaries, and crop classification. Despite the progress made by researchers in cultivated land parcel extraction using deep learning methods, achieving high-quality parcel segmentation results remains a challenging task, which is primarily due to the following reasons: 1) Temporal Variability: Crops on cultivated land exhibit diverse characteristics that vary over time and demonstrate a strong temporal correlation. ...
Article
The accurate acquisition of farmland information holds paramount importance for effective agricultural resource monitoring and production management. Traditional semantic segmentation methods struggle to perform precise segmentation at the parcel level. Many existing contour extraction methods tend to generate ambiguous and inaccurate outcomes. To overcome these challenges, this paper proposes a multiscale edge-guided network for accurate cultivated land parcel boundary extraction, which consists of: (1) Multi-scale guided Transformer module: This module is designed to encode parcel features by combining the Shunted Transformer and atrous convolution modules. It allows for modeling long-distance context and refining features, enabling accurate representation of farmland parcels. (2) Edge enhancement module: This module operates on multiscale features during the feature extraction stage, improving the ability to capture fine details and boundaries of farmland parcels. (3) Dual pyramid structure: This structure consists of a bottom-up pyramid that incorporates deformable convolutions. It enhances the accuracy of multiscale object detection, enabling the network to capture features at different levels of detail. (4) Deoverlap operation: This module is designed to reduce the ambiguity caused by contour overlap in the extracted results. Specifically, the method effectively mitigates the impact of regional, temporal, and background features. The experimental results showcase the attainment of AP and MIoU scores of 0.3421 and 85.28, respectively, for the extracted farmland parcels. Moreover, the method yields competitive results across different farmland parcel types, shapes, and temporal intervals. This positions it as a valuable tool for optimizing agricultural production and enhancing resource monitoring capabilities.
... To this end, we borrow ideas from the TempCNN architecture proposed in (Pelletier et al., 2019), a supervised deep learning method for land cover mapping that employs a one-dimensional convolution to manage the intrinsic temporal information that characterizes SITS data. The main reason is that, unlike other recent models for SITS classification (Interdonato et al., 2019; Rußwurm and Körner, 2020; Garnot et al., 2020), this model achieves high-level performance while keeping network complexity low for the task of pixel-level SITS-based land cover classification. The proposed architecture is visually depicted in Figure 1, and consists of an encoder model, supported by two different heads: i) a Classification Head (CLHead) devoted to the supervised task and ii) a Reconstruction Head (RecoHead) dedicated to reconstructing the original input multivariate time series. ...
... High spatiotemporal resolution remote sensing (RS) images play a pivotal role in various applications, including, but not limited to, crop growth monitoring [1,2], land cover change detection [3–8], and land cover classification [9–11]. However, due to technical and budgetary constraints, obtaining RS data with high spatial and temporal resolutions is challenging [12], thereby limiting the utilization of advanced RS applications. ...
Article
Full-text available
In recent years, convolutional neural network (CNN)-based spatiotemporal fusion (STF) models for remote sensing images have made significant progress. However, existing STF models may suffer from two main drawbacks. Firstly, multi-band prediction often generates a hybrid feature representation that includes information from all bands. This blending of features can lead to the loss or blurring of high-frequency details, making it challenging to reconstruct multi-spectral remote sensing images with significant spectral differences between bands. Another challenge in many STF models is the limited preservation of spectral information during 2D convolution operations. Combining all input channels’ convolution results into a single-channel output feature map can lead to the degradation of spectral dimension information. To address these issues and to strike a balance between avoiding hybrid features and fully utilizing spectral information, we propose a remote sensing image STF model that combines single-band and multi-band prediction (SMSTFM). The SMSTFM initially performs single-band prediction, generating separate predicted images for each band, which are then stacked together to form a preliminary fused image. Subsequently, the multi-band prediction module leverages the spectral dimension information of the input images to further enhance the preliminary predictions. We employ the modern ConvNeXt convolutional module as the primary feature extraction component. During the multi-band prediction phase, we enhance the spatial and channel information captures by replacing the 2D convolutions within ConvNeXt with 3D convolutions. In the experimental section, we evaluate our proposed algorithm on two public datasets with 16x resolution differences and one dataset with a 3x resolution difference. The results demonstrate that our SMSTFM achieves state-of-the-art performance on these datasets and is proven effective and reasonable through ablation studies.
... An LSTM model was used to investigate the effect of the time component on wheat lodging intensity and changes. Since CNN and RNN look at data from different aspects, a combination of both approaches creates a more diverse and complete representation of information for the classification of agricultural products (Sainte Fare Garnot et al., 2019;Mazzia et al., 2019;Interdonato et al., 2019). ...
Article
Crop lodging in agricultural fields is one of the major factors that limit cereal crop yields. Wheat, the most popular cereal crop in most countries, is also affected by this phenomenon, which may result in a significant decrease in both yield and quality. Therefore, addressing wheat lodging is crucial for producers. This study aims to detect and identify wheat lodging through aerial images and classify its severity based on ratio, angle and location of the lodging. To achieve this goal, a multi-task approach was proposed involving three phases. First, automatic dataset generation methodology was conducted on orthomosaic imagery of three dates. Next, a comprehensive assessment of wheat lodging (ratio, angle and location) was performed, which has received little research attention. Third, applying and improving selected classification models for classifying image datasets was conducted. Combining convolutional neural networks and temporal sequences in a single model provided an opportunity to use spatiotemporal information extracted from the wheat image datasets. Time dependency and individual dates were both considered in the classification task. The limited number of data and imbalanced classes challenges, resulting from real field conditions data collection, were overcome by applying a new loss function to the classifier models. The overall accuracy of wheat lodging classification reached over 91% in these two states using the proposed approach. Based on this research, wheat lodging was detected more accurately by the proposed models despite the small and imbalanced dataset. The developed methodology paves the way for comprehensive and automatic wheat lodging detection, and the methodology can be adapted for similar crops that suffer lodging issues with suitable modifications.
... Zhao et al., 2022a; Zhao et al., 2020b). RNNs, such as Long Short-Term Memory (LSTM), link adjacent input variables to learn time dependencies in sequential data and extract accumulated change features over time, but their high complexity and recurrent structure easily accumulate noise or inefficient features, reducing the ability to extract separable complex features from high-dimensional SITS (Interdonato et al., 2019; Xu et al., 2021; W. Zhao et al., 2022b; Zhao et al., 2021b; Zhong et al., 2019). ...
Article
Full-text available
Plantation forests provide critical ecosystem services and have experienced worldwide expansion during the past few decades. Accurate mapping of tree species through remote sensing is critical for managing plantation forests. The typical temporal behaviors and traits of tree species in satellite image time series (SITS) generate temporal and spectral features in multiple phenological stages that are critical to improve tree species mapping. However, the diverse input features, sequential relations and complex structures in SITS drastically increase the dimension and difficulty of spectral-temporal feature extraction, which challenges the capacity of many general classifiers not explicitly adapted for spectral-temporal learning. As a result, there is still a lack of a method that could automatically extract spectral-temporal features with high separability and regional adaptability from high-dimensional SITS for tree species mapping of plantation forests. Moreover, the effects of varying temporal resolution and feature combination on plantation tree species mapping are under-explored. Here, we developed a multi-head attention-based method for automatically extracting spectral-temporal features with high separability based on a modified Transformer network (Transformer4SITS) for improved plantation tree species mapping. The end-to-end network model consists of a feature extraction module to learn deep spectral-temporal features from SITS and a fusion module to combine multiple features for improving mapping accuracy. We applied this method to two representative plantation forests in southern and northern China for tree species mapping. The results show that: (1) the Transformer4SITS method could self-adaptively extract typical spectral-temporal features of key phenological stages (e.g., greenness rising and falling) from SITS, and achieved significantly improved accuracies by at most 15% in comparison with all four baseline methods (i.e., long short-term memory, harmonic analysis, time-weighted dynamic time warping, linear discriminant analysis); (2) time series with higher temporal resolution tended to produce more accurate species maps consistently across two sites, with their overall accuracies (OA) respectively increasing from 91.05% and 84.33% (60-day) to 94.88% and 88.72% (5-day), but the effect of high temporal resolution respectively leveled off around 90-day and 50-day resolution across two sites; (3) the mapping results using all available bands and two-band spectral indices outperformed the results using a subset of them, but with only a modest increase in accuracy (i.e., OA increased from 93.63% and 86.01% to 94.88% and 88.72%). This study thus provides a state-of-the-art deep learning-based method for improved tree species mapping, which is critical for sustainable management and biodiversity monitoring of plantation forests across large scales.
Article
Accurate spatio-temporal information on rice cropping patterns is vital for predicting grain production, managing water resources and assessing greenhouse gas emissions. However, current automated mapping of rice cropping patterns at regional scale is heavily constrained by insufficient training samples and frequent cloudy weather in major rice-producing areas. To tackle this challenge, we proposed a Phenology domain Optical-SAR feature inTegration method to Automatically generate single (SC-Rice) and double cropping Rice (DC-Rice) sample (POSTAR) for efficient and refined rice mapping. POSTAR includes three major steps: (1) generating a potential rice map using a phenology- and object-based classification method with optical data (Sentinel-2 MSI) to select candidate rice samples; (2) employing K-means to identify SC- and DC-Rice candidate samples according to unique SAR-based (Sentinel-1 SAR) phenological features; (3) implementing a two-step refinement strategy to filter high-confidence SC- and DC-Rice samples, maintaining a balance between intraclass phenological variance and sample purity. Test areas selected for validation include the Dongting Lake plain and Poyang Lake plain in South China, as well as Fujin county located in the Sanjiang plain of North China. POSTAR proved effective in producing reliable SC- and DC-Rice samples, achieving a high spectral correlation similarity (>0.85) and low dynamic time warping distance (<8.5) with field samples. Applying POSTAR-derived samples to a random forest classifier yielded an overall accuracy of 89.6%, with an F1 score of 0.899 for SC-Rice and 0.938 for DC-Rice in the Dongting Lake plain. Owing to the incorporation of knowledge-based optical and SAR phenological features, POSTAR exhibited strong spatial transferability, achieving an overall accuracy of 96.0% in the Poyang Lake plain and 97.8% in Fujin county. These results demonstrated the effectiveness of the POSTAR method in accurately mapping rice cropping patterns without extensive field visits, providing valuable insights for crop monitoring in large, diverse, and cloudy regions through the integration of optical and SAR data.
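The abstract above compares generated samples with field samples using a dynamic time warping (DTW) distance. As a rough illustration of what that measure computes, the sketch below implements the textbook DTW recurrence for two univariate series, using the absolute difference as the local cost. The function name `dtw_distance` and the choice of local cost are assumptions; the abstract does not say which DTW variant, band, or normalization produced the reported threshold.

```python
from typing import List


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Classic dynamic time warping distance between two series.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    values of `a` with the first j values of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Same shape, one series stretched in time: DTW aligns them at no cost.
same_shape = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
# Constant offset of 1 over two steps: every alignment costs at least 2.
offset = dtw_distance([0.0, 0.0], [1.0, 1.0])
print(same_shape, offset)
```

Unlike a point-by-point distance, DTW tolerates shifts in timing, which is presumably why it suits phenological profiles whose key stages occur on slightly different dates from field to field.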
Article
Full-text available
Accurate and timely information on the spatial distribution and areas of crop types is critical for yield estimation, agricultural management, and sustainable development. However, traditional crop classification methods often struggle to identify various crop types effectively due to their intricate spatiotemporal patterns and high training data demands. To address this challenge, we developed a Structure-aware Label eXpansion segmentation Network (StructLabX-Net) for diverse crop type mapping using limited point-annotated samples. StructLabX-Net features a backbone U-TempoNet, which combines CNNs and LSTM to explore intricate spatiotemporal patterns. It also incorporates multi-task weak supervision heads for edge detection and pseudo-label expansion, adding crucial structure and contextual insights. We tested the StructLabX-Net across three distinct regions in China, assessing over 10 crop types and comparing its performance against five popular classifiers based on multi-temporal Sentinel-2 images. The results showed that StructLabX-Net significantly outperformed RF, SVM, DeepCropMapping, Transformer, and patch-based CNN in identifying various crop types across three regions with sparse training samples. It achieved the highest overall accuracy and mean F1-score: 91.0% and 89.1% in Jianghan Plain, 91.5% and 90.7% in Songnen Plain, as well as 91.0% and 90.8% in Sanjiang Plain. StructLabX-Net demonstrated a particular advantage for those “hard types” characterized by limited samples and complex phenological features. Furthermore, ablation experiments highlight the crucial role of the “edge” head in guiding the model to accurately differentiate between various crop types with clearer class boundaries, and the “expansion” head in refining the understanding of target crops by providing extra details in pseudo-labels. Meanwhile, combining our backbone U-TempoNet with multi-task weak supervision heads yielded better crop type mapping results than other segmentation models. Overall, StructLabX-Net maximizes the utilization of limited sparse samples from field surveys, offering a simple, cost-effective, and robust solution for accurately mapping various crop types at large scales.
Conference Paper
Internet of Things (IoT) sensor data or readings evince variations in timestamp range, sampling frequency, geographical location, unit of measurement, etc. Such presented sequence data heterogeneity makes it difficult for traditional time series classification algorithms to perform well. Therefore, addressing the heterogeneity challenge demands learning not only the sub-patterns (local features) but also the overall pattern (global feature). To address the challenge of classifying heterogeneous IoT sensor data (e.g., categorizing sensor data types like temperature and humidity), we propose a novel deep learning model that incorporates both Convolutional Neural Network and Bi-directional Gated Recurrent Unit to learn local and global features respectively, in an end-to-end manner. Through rigorous experimentation on heterogeneous IoT sensor datasets, we validate the effectiveness of our proposed model, which outperforms recent state-of-the-art classification methods as well as several machine learning and deep learning baselines. In particular, the model achieves an average absolute improvement of 3.37% in Accuracy and 2.85% in F1-Score across datasets.
Article
Earth observation (EO) satellite missions have been providing detailed images about the state of Earth and its land cover for over 50 years. Long-term missions, such as those of NASA’s Landsat, Terra, and Aqua satellites, and more recently, the European Space Agency’s (ESA’s) Sentinel missions, record images of the entire world every few days. Although single images provide point-in-time data, repeated images of the same area, or satellite image time series (SITS), provide information about the changing state of vegetation and land use. These SITS are useful for modeling dynamic processes and seasonal changes, such as plant phenology. They have potential benefits for many aspects of land and natural resource management, including applications in agricultural, forest, water, and disaster management; urban planning; and mining. However, the resulting SITS are complex, incorporating information from the temporal, spatial, and spectral dimensions. Therefore, deep learning (DL) methods are often deployed, as they can analyze these complex relationships. This review article presents a summary of the state-of-the-art methods of modeling environmental, agricultural, and other EO variables from SITS data using DL methods. We aim to provide a resource for remote sensing experts interested in using DL techniques to enhance EO models with temporal information.
Article
Full-text available
Mapping of highly dynamic changes in land use and land cover (LULC) can be hindered by various cloudy conditions with optical satellite images. These conditions result in discontinuities in high-temporal-density LULC mapping. In this paper, we developed an integrated time series mapping method to enhance the LULC mapping accuracy and frequency in cloud-prone areas by incorporating spectral-indices-fused deep models and time series reconstruction techniques. The proposed method first reconstructed cloud-contaminated pixels through time series filtering, during which the cloud masks initialized by a deep model were refined and updated during the reconstruction process. Then, the reconstructed time series images were fed into a spectral-indices-fused deep model trained on samples collected worldwide for classification. Finally, post-classification processing, including spatio-temporal majority filtering and time series refinement considering land–water interactions, was conducted to enhance the LULC mapping accuracy and consistency. We applied the proposed method to the cloud- and rain-prone Pearl River Delta (i.e., Guangdong–Hong Kong–Macao Greater Bay Area, GBA) and used time series Sentinel-2 images as the experimental data. The proposed method enabled seamless LULC mapping at a temporal frequency of 2–5 days, and the production of 10 m resolution annual LULC products in the GBA. The assessment yielded a mean overall accuracy of 87.01% for annual mapping in the four consecutive years of 2019–2022 and outperformed existing mainstream LULC products, including ESA WorldCover (83.98%), Esri Land Cover (85.26%), and Google Dynamic World (85.06%). Our assessment also reveals significant variations in LULC mapping accuracies with different cloud masks, thus underscoring their critical role in time series LULC mapping. 
The proposed method has the potential to generate seamless and near real-time maps for other regions in the world by using deep models trained on datasets collected globally. This method can provide high-quality LULC data sets at different time intervals for various land and water dynamics in cloud- and rain-prone regions. Notwithstanding the difficulties of obtaining high-quality LULC maps in cloud-prone areas, this paper provides a novel approach for the mapping of LULC dynamics and the provision of reliable annual LULC products.
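The temporal half of the majority filtering mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a per-pixel sequence of class labels and a sliding window of odd length. The function name `temporal_majority_filter`, the window size, and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def temporal_majority_filter(labels, window=3):
    """Smooth one pixel's label sequence with a sliding majority vote.

    Assumption: ties keep the pixel's current label when it is among the
    most frequent labels in the window.
    """
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        # The window is truncated at both ends of the series.
        neighbourhood = labels[max(0, i - half): i + half + 1]
        counts = Counter(neighbourhood)
        top = max(counts.values())
        winners = [label for label, count in counts.items() if count == top]
        smoothed.append(current if current in winners else winners[0])
    return smoothed


# Hypothetical label sequence for a single pixel over seven dates.
series = ["water", "water", "crop", "water", "water", "urban", "urban"]
filtered = temporal_majority_filter(series, window=3)
```

In this toy series the isolated "crop" label is replaced by its neighbours' label, while the sustained switch to "urban" is preserved.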
Article
Recently, time series remote sensing images (TSRSIs) have been reported to be an effective resource for mapping fine land use/land cover (LULC), and deep learning, in particular, has been gaining growing attention in this field. However, existing deep learning methods often learn features from only the temporal or the spatial domain and do not fully consider the intercorrelation between temporal features, which may provide more information for classification. In order to make full use of the relations between temporal features and to explore more objective features for improving classification accuracy, we proposed a feature relationship-based classification method. The method leverages the angles between features on the temporal curve to establish relationships between every pair and triplet of features, resulting in the creation of feature relationship matrices (FRMs) and feature relationship tensors (FRTs). Afterwards, a 2D-3D multi-scale convolutional neural network (2D-3D MSCNN) was designed to learn deep features from FRMs and FRTs, improving TSRSI classification. Our experiment was conducted on TSRSIs located in two counties, Sutter and Kings, in California, United States. The experimental results indicate that, compared to both deep learning and non-deep learning methods, the proposed approach achieves significant improvements in accuracy and LULC mapping, validating the effectiveness and feasibility of enhancing TSRSI classification accuracy through feature relationship learning.
Article
Full-text available
Time Series Classification and Extrinsic Regression are important and challenging machine learning tasks. Deep learning has revolutionized natural language processing and computer vision and holds great promise in other fields such as time series analysis where the relevant features must often be abstracted from the raw data but are not known a priori. This paper surveys the current state of the art in the fast-moving field of deep learning for time series classification and extrinsic regression. We review different network architectures and training methods used for these tasks and discuss the challenges and opportunities when applying deep learning to time series data. We also summarize two critical applications of time series classification and extrinsic regression, human activity recognition and satellite earth observation.
Article
Full-text available
Semantic change detection (SCD) holds a critical place in remote sensing image interpretation, as it aims to locate changing regions and identify their associated land cover classes. Presently, post-classification techniques stand as the predominant strategy for SCD due to their simplicity and efficacy. However, these methods often overlook the intricate relationships between alterations in land cover. In this paper, we argue that comprehending the interplay of changes within land cover maps holds the key to enhancing SCD’s performance. With this insight, a Temporal-Transform Module (TTM) is designed to capture change relationships across temporal dimensions. TTM selectively aggregates features across all temporal images, enhancing the unique features of each temporal image at distinct pixels. Moreover, we build a Temporal-Transform Network (TTNet) for SCD, comprising two semantic segmentation branches and a binary change detection branch. TTM is embedded into the decoder of each semantic segmentation branch, thus enabling TTNet to obtain better land cover classification results. Experimental results on the SECOND dataset show that TTNet achieves enhanced performance when compared to other benchmark methods in the SCD task. In particular, TTNet elevates mIoU accuracy by a minimum of 1.5% in the SCD task and 3.1% in the semantic segmentation task.
Article
Due to the complex and highly heterogeneous land cover in urban areas, single-temporal pixel-wise and parcel-wise classification cannot achieve high-precision recognition of ground objects. Semantic segmentation of satellite image time series (SITS) can distinguish objects with similar spectral reflection and temporal evolution. However, optical SITS suffer from uneven time-frequency distribution and incompleteness, which makes it impossible to directly use existing models to carry out time series semantic segmentation. This study proposes a semantic segmentation network that combines optical and radar SITS, named Multi-Source Temporal Attention Fusion-Based Temporal-Spatial Transformer (MTAF-TST), to achieve high-precision land cover classification in urban areas. Firstly, MTAF-TST uses the Transformer spatial semantic segmentation module to extract the spatial context information of ground objects to realize pixel-level land cover classification, which relieves the salt-and-pepper phenomenon that easily occurs in traditional pixel-by-pixel classification in complex scenes. Secondly, MTAF-TST uses the Transformer time feature extraction module to mine long-range time-dependent and high-level semantic information, overcoming the drawbacks of traditional convolutional and recurrent neural networks that cannot mine long-range time-dependent features of SITS. Finally, MTAF-TST uses a multi-source temporal attention fusion module to fuse the depth features of optical and radar SITS, which overcomes the shortcomings of traditional direct feature stitching methods that cannot make full use of time-correlated features, achieving high-precision land cover classification. The experimental results show that MTAF-TST can realize the complementarity of radar and optical SITS in terms of timing integrity, color, texture, etc., and effectively improve the accuracy of SITS classification.
Article
Full-text available
Modern Earth Observation systems provide remote sensing data at different temporal and spatial resolutions. Among the available space missions, the Sentinel-2 program today supplies high temporal (every five days) and high spatial resolution (HSR) (10 m) images that can be useful to monitor land cover dynamics. On the other hand, very HSR (VHSR) imagery is still essential for land cover mapping characterized by fine spatial patterns. Understanding how to jointly leverage these complementary sources in an efficient way when dealing with land cover mapping is a current challenge in remote sensing. With the aim of providing land cover mapping through the fusion of multitemporal HSR and VHSR satellite images, we propose a suitable end-to-end deep learning framework, namely M3Fusion, which is able to simultaneously leverage the temporal knowledge contained in time series data as well as the fine spatial information available in VHSR images. Experiments carried out on the Reunion Island study area confirm the quality of our proposal considering both quantitative and qualitative aspects.
Article
Full-text available
The emergence of the Sentinel-1A and 1B satellites now offers freely available and widely accessible Synthetic Aperture Radar (SAR) data. Near-global coverage and rapid repeat time (6-12 days) gives Sentinel-1 data the potential to be widely used for monitoring the Earth’s surface. Subtle land-cover and land surface changes can affect the phase and amplitude of the C-band SAR signal, and thus the coherence between two images collected before and after such changes. Analysis of SAR coherence therefore serves as a rapidly-deployable and powerful tool to track both seasonal changes, and rapid surface disturbances following natural disasters. An advantage of using Sentinel-1 C-band radar data is the ability to easily construct time series of coherence for a region of interest at low cost. In this paper, we propose a new method for Potentially Affected Area (PAA) detection following a natural hazard event. Based on the coherence time series, the proposed method (1) determines the natural variability of coherence within each pixel in the region of interest, accounting for factors such as seasonality and the inherent noise of variable surfaces; (2) compares pixel-by-pixel syn-event coherence to temporal coherence distributions to determine where statistically significant coherence loss has occurred. The user can determine what degree the syn-event coherence value (e.g., 1st, 5th percentile of pre-event distribution) constitutes a PAA, and integrate pertinent regional data, such as population density, to rank and prioritise PAAs. We apply the method to two case studies, Sarpol-e, Iran following the 2017 Iran-Iraq earthquake, and a landslide-prone region of NW Argentina, to demonstrate how rapid identification and interpretation of potentially affected areas can be performed shortly following a natural hazard event.
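The per-pixel comparison in step (2) of the abstract above can be sketched as follows. This is a minimal illustration, assuming each pixel has a list of pre-event coherence values and one syn-event value. The helper names `percentile` and `flag_affected_pixels`, the linear-interpolation percentile, and the sample numbers are assumptions made for the example, not the authors' implementation.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (q / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def flag_affected_pixels(pre_event, syn_event, q=5.0):
    """Return one boolean per pixel.

    A pixel is flagged when its syn-event coherence falls below the q-th
    percentile of that pixel's own pre-event coherence series.
    """
    return [
        syn < percentile(history, q)
        for history, syn in zip(pre_event, syn_event)
    ]


# Hypothetical data: two pixels, five pre-event coherence values each.
pre_event = [
    [0.80, 0.82, 0.78, 0.85, 0.81],  # stable, high-coherence surface
    [0.30, 0.60, 0.45, 0.20, 0.55],  # naturally variable surface
]
syn_event = [0.40, 0.35]
flags = flag_affected_pixels(pre_event, syn_event, q=5.0)
```

With these numbers only the first pixel is flagged: a coherence of 0.40 is far below its usual range, whereas 0.35 is unremarkable for the naturally variable second pixel.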
Article
Full-text available
The development and improvement of methods to map agricultural land cover are currently major challenges, especially for radar images. This is due to the speckle noise nature of radar, leading to a less intensive use of radar than of optical images. The European Space Agency Sentinel-1 constellation, which recently became operational, is a satellite system providing global coverage of Synthetic Aperture Radar (SAR) with a 6-day revisit period at a high spatial resolution of about 20 m. These data are valuable, as they provide spatial information on agricultural crops. The aim of this paper is to provide a better understanding of the capabilities of Sentinel-1 radar images for agricultural land cover mapping through the use of deep learning techniques. The analysis is carried out on multitemporal Sentinel-1 data over an area in Camargue, France. The data set was processed in order to produce an intensity radar data stack from May 2017 to September 2017. We improved this radar time series dataset by exploiting temporal filtering to reduce noise, while retaining as much as possible the fine structures present in the images. We revealed that even with classical machine learning approaches (K nearest neighbors, random forest, and support vector machines), good classification performance could be achieved, with F-measure/Accuracy greater than 86% and Kappa coefficient better than 0.82. We found that the results of the two deep recurrent neural network (RNN)-based classifiers clearly outperformed the classical approaches. Finally, our analyses of the Camargue area results show that the same performance was obtained with two different RNN-based classifiers on the Rice class, which is the most dominant crop of this region, with an F-measure of 96%. These results thus highlight that in the near future these RNN-based techniques will play an important role in the analysis of remote sensing time series.
Article
Full-text available
Grassland use intensity is a topic of growing interest worldwide, as grasslands are integral in supporting biodiversity, food production, and regulation of the global carbon cycle. Data available for characterizing grassland management are largely descriptive and collected from laborious field campaigns or questionnaires. The recent launch of the Sentinel-2 earth monitoring constellation provides new possibilities for high temporal and spatial resolution remote sensing data covering large areas. This study aims to evaluate the potential of a time series of Sentinel-2 data for mapping of mowing frequency in the region of Canton Aargau, Switzerland. We tested two cloud masking processes and three spatial mapping units (pixels, parcel polygons and shrunken parcel polygons), and investigated how missing data influence the ability to accurately detect and map grassland management activity. We found that more than 40% of the study area was mown before 15 June, while the remaining part was either mown later, or was not mown at all. The highest accuracy for detection of mowing events was achieved using additional cloud masking and size reduction of parcels, which allowed correct detection of 77% of mowing events. Additionally, we found that using only standard cloud masking leads to significant overestimation of mowing events, and that the detection based on sparse time series does not fully correspond to key events in the grass growth season.
Article
Full-text available
Relation classification plays an important role in the field of natural language processing (NLP). Previous research on relation classification has verified the effectiveness of using convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In this paper, we propose a model that combines the RNN and CNN (RCNN) and gives full play to their respective advantages: the RNN can learn temporal and context features, especially long-term dependency between two entities, while the CNN is capable of capturing more potential features. We evaluate our model on the SemEval-2010 Task 8 dataset, and the results show that our method is superior to most of the existing methods.
Article
Full-text available
A satellite image time series (SITS) contains a significant amount of temporal information. By analysing this type of data, the pattern of the changes in the object of concern can be explored. The natural change in the Earth’s surface is relatively slow and exhibits a pronounced pattern. Some natural events (for example, fires, floods, plant diseases, and insect pests) and human activities (for example, deforestation and urbanisation) will disturb this pattern and cause a relatively profound change on the Earth’s surface. These events are usually referred to as disturbances. However, disturbances in ecosystems are not easy to detect from SITS data, because SITS contain combined information on disturbances, phenological variations and noise in remote sensing data. In this paper, a novel framework is proposed for online disturbance detection from SITS. The framework is based on long short-term memory (LSTM) networks. First, LSTM networks are trained by historical SITS. The trained LSTM networks are then used to predict new time series data. Last, the predicted data are compared with real data, and the noticeable deviations reveal disturbances. Experimental results using 16-day compositions of the moderate resolution imaging spectroradiometer (MOD13Q1) illustrate the effectiveness and stability of the proposed approach for online disturbance detection.
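The predict-then-compare logic described in the abstract above can be sketched in a few lines. To keep the example self-contained, the LSTM predictor is swapped for a much simpler stand-in, a per-phase seasonal mean, so this is not the paper's model. The function names, the fixed `period`, and the `threshold` value are all illustrative assumptions.

```python
def seasonal_forecast(history, period):
    """Stand-in predictor (not an LSTM): mean of past values at each phase.

    Assumes len(history) >= period so every phase has at least one sample.
    """
    sums = [0.0] * period
    counts = [0] * period
    for i, value in enumerate(history):
        sums[i % period] += value
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def detect_disturbances(history, new_values, period, threshold):
    """Return indices in new_values whose deviation from the prediction
    exceeds the threshold."""
    profile = seasonal_forecast(history, period)
    start = len(history)  # phase offset of the first new observation
    return [
        i
        for i, observed in enumerate(new_values)
        if abs(observed - profile[(start + i) % period]) > threshold
    ]


# Hypothetical vegetation-index series: three regular cycles of four steps,
# followed by a new cycle in which the third value drops sharply.
history = [0.2, 0.5, 0.8, 0.4] * 3
new_values = [0.21, 0.52, 0.30, 0.41]
disturbed = detect_disturbances(history, new_values, period=4, threshold=0.2)
```

Only the third new observation deviates strongly from the expected seasonal value, so it is the only index reported. In the framework described above, a trained LSTM would take the place of `seasonal_forecast`.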
Article
Full-text available
Modern Earth Observation systems provide sensing data at different temporal and spatial resolutions. Among optical sensors, the Sentinel-2 program today supplies high temporal resolution (every 5 days) and high spatial resolution (10 m) images that can be useful to monitor land cover dynamics. On the other hand, Very High Spatial Resolution (VHSR) images are still an essential tool for land cover mapping characterized by fine spatial patterns. Understanding how to efficiently leverage these complementary sources of information together to deal with land cover mapping is still challenging. With the aim of tackling land cover mapping through the fusion of multi-temporal High Spatial Resolution and Very High Spatial Resolution satellite images, we propose an End-to-End Deep Learning framework, named M3Fusion, able to leverage simultaneously the temporal knowledge contained in time series data as well as the fine spatial information available in VHSR images. Experiments carried out on the Reunion Island study area assess the quality of our proposal considering both quantitative and qualitative aspects.
Article
Full-text available
Earth observation (EO) sensors deliver data at daily or weekly intervals. Most land use and land cover classification (LULC) approaches, however, are designed for cloud-free and mono-temporal observations. The increasing temporal capabilities of today’s sensors enable the use of temporal, along with spectral and spatial features. Domains such as speech recognition or neural machine translation work with inherently temporal data and, today, achieve impressive results by using sequential encoder-decoder structures. Inspired by these sequence-to-sequence models, we adapt an encoder structure with convolutional recurrent layers in order to approximate a phenological model for vegetation classes based on a temporal sequence of Sentinel 2 (S2) images. In our experiments, we visualize internal activations over a sequence of cloudy and non-cloudy images and find several recurrent cells that reduce the input activity for cloudy observations. Hence, we assume that our network has learned cloud-filtering schemes solely from input data, which could alleviate the need for tedious cloud-filtering as a preprocessing step for many EO approaches. Moreover, using unfiltered temporal series of top-of-atmosphere (TOA) reflectance data, our experiments achieved state-of-the-art classification accuracies on a large number of crop classes with minimal preprocessing, compared to other classification approaches.
Article
Full-text available
Mapping winter vegetation quality is a challenging problem in remote sensing. This is due to cloud coverage in winter periods, leading to a more intensive use of radar than of optical images. The aim of this letter is to provide a better understanding of the capabilities of Sentinel-1 radar images for winter vegetation quality mapping through the use of deep learning techniques. Analysis is carried out on multitemporal Sentinel-1 data over an area around Charentes-Maritimes, France. This data set was processed in order to produce an intensity radar data stack from October 2016 to February 2017. Two deep recurrent neural network (RNN)-based classifiers were employed. Our work revealed that the results of the proposed RNN models clearly outperformed classical machine learning approaches (support vector machine and random forest).
Article
Full-text available
For agronomic, environmental, and economic reasons, the need for spatialized information about agricultural practices is expected to rapidly increase. In this context, we reviewed the literature on remote sensing for mapping cropping practices. The reviewed studies were grouped into three categories of practices: crop succession (crop rotation and fallowing), cropping pattern (single tree crop planting pattern, sequential cropping, and intercropping/agroforestry), and cropping techniques (irrigation, soil tillage, harvest and post-harvest practices, crop varieties, and agro-ecological infrastructures). We observed that the majority of the studies were exploratory investigations, tested on a local scale with a high dependence on ground data, and used only one type of remote sensing sensor. Furthermore, to be correctly implemented, most of the methods relied heavily on local knowledge on the management practices, the environment, and the biological material. These limitations point to future research directions, such as the use of land stratification, multi-sensor data combination, and expert knowledge-driven methods. Finally, the new spatial technologies, and particularly the Sentinel constellation, are expected to improve the monitoring of cropping practices in the challenging context of food security and better management of agro-environmental issues.
Article
Full-text available
Nowadays, remote sensing technologies produce huge amounts of satellite images that can be helpful to monitor geographical areas over time. A satellite image time series (SITS) usually contains spatio-temporal phenomena that are complex and difficult to understand. Conceiving new data mining tools for SITS analysis is challenging since we need to manage the spatial and the temporal dimensions simultaneously. In this work, we propose a new clustering framework specifically designed for SITS data. Our method first detects spatio-temporal entities, then characterizes their evolutions by means of a graph-based representation, and finally produces clusters of spatio-temporal entities sharing similar temporal behaviors. Unlike previous approaches, which mainly work at pixel-level, our framework exploits a purely object-based representation to perform the clustering task. Object-based analysis involves a segmentation step where segments (objects) are extracted from an image and constitute the element of analysis. We experimentally validate our method on two real world SITS datasets by comparing it with standard techniques employed in remote sensing analysis. We also use a qualitative analysis to highlight the interpretability of the results obtained.
Article
Full-text available
Panchromatic (PAN) and multispectral (MS) imagery classification is one of the hottest topics in the field of remote sensing. In recent years, deep learning techniques have been widely applied in many areas of image processing. In this paper, an end-to-end learning framework based on deep multiple instance learning (DMIL) is proposed for MS and PAN images' classification using the joint spectral and spatial information based on feature fusion. There are two instances in the proposed framework: one instance is used to capture the spatial information of PAN and the other is used to describe the spectral information of MS. The features obtained by the two instances are concatenated directly, which can be treated as simple fusion features. To fully fuse the spatial-spectral information for further classification, the simple fusion features are fed into a fusion network with three fully connected layers to learn the high-level fusion features. Classification experiments carried out on four different airborne MS and PAN images indicate that the classifier provides a feasible and efficient solution. They demonstrate that DMIL performs better than using a convolutional neural network and a stacked autoencoder network separately. In addition, this paper shows that the DMIL model can learn and fuse spectral and spatial information effectively, and has huge potential for MS and PAN imagery classification.
Article
Full-text available
The standard content-based attention mechanism typically used in sequence-to-sequence models is computationally expensive as it requires the comparison of large encoder and decoder states at each time step. In this work, we propose an alternative attention mechanism based on a fixed size memory representation that is more efficient. Our technique predicts a compact set of K attention contexts during encoding and lets the decoder compute an efficient lookup that does not need to consult the memory. We show that our approach performs on-par with the standard attention mechanism while yielding inference speedups of 20% for real-world translation tasks and more for tasks with longer sequences. By visualizing attention scores we demonstrate that our models learn distinct, meaningful alignments.
Article
Full-text available
In response to the need for generic remote sensing tools to support large-scale agricultural monitoring, we present a new approach for regional-scale mapping of agricultural land-use systems (ALUS) based on object-based Normalized Difference Vegetation Index (NDVI) time series analysis. The approach consists of two main steps. First, to obtain relatively homogeneous land units in terms of phenological patterns, a principal component analysis (PCA) is applied to an annual MODIS NDVI time series, and an automatic segmentation is performed on the resulting high-order principal component images. Second, the resulting land units are classified into the crop agriculture domain or the livestock domain based on their land-cover characteristics. The crop agriculture domain land units are further classified into different cropping systems based on the correspondence of their NDVI temporal profiles with the phenological patterns associated with the cropping systems of the study area. A map of the main ALUS of the Brazilian state of Tocantins was produced for the 2013-2014 growing season with the new approach, and a significant coherence was observed between the spatial distribution of the cropping systems in the final ALUS map and in a reference map extracted from the official agricultural statistics of the Brazilian Institute of Geography and Statistics (IBGE). This study shows the potential of remote sensing techniques to provide valuable baseline spatial information for supporting agricultural monitoring and for large-scale land-use systems analysis.
Article
Full-text available
Deep learning (DL) is a powerful state-of-the-art technique for image processing including remote sensing (RS) images. This letter describes a multilevel DL architecture that targets land cover and crop type classification from multitemporal multisource satellite imagery. The pillars of the architecture are an unsupervised neural network (NN), which is used for optical imagery segmentation and restoration of data missing due to clouds and shadows, and an ensemble of supervised NNs. As the basic supervised NN architecture, we use a traditional fully connected multilayer perceptron (MLP) and the approach most commonly used in the RS community, random forest, and compare them with convolutional NNs (CNNs). Experiments are carried out for the joint experiment of crop assessment and monitoring test site in Ukraine for classification of crops in a heterogeneous environment using nineteen multitemporal scenes acquired by Landsat-8 and Sentinel-1A RS satellites. The architecture with an ensemble of CNNs outperforms the one with MLPs, allowing us to better discriminate certain summer crop types, in particular maize and soybeans, and yielding target accuracies of more than 85% for all major crops (wheat, maize, sunflower, soybeans, and sugar beet).
Article
Full-text available
The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular related to 3D video, the methodologies to process the data are still subject to research. We demonstrate superior results by a system which combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well-suited to distinguish actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM which then classifies the movement. In this architecture, our method improves recognition rates of state-of-the-art methods by 14% on standard data sets.
Article
Full-text available
One of the challenging issues in high-resolution remote sensing images is classifying land-use scenes with high quality and accuracy. An effective feature extractor and classifier can boost classification accuracy in scene classification. This letter proposes a deep-learning-based classification method, which combines convolutional neural networks (CNNs) and extreme learning machine (ELM) to improve classification performance. A pretrained CNN is initially used to learn deep and robust features. However, the generalization ability is finite and suboptimal, because the traditional CNN adopts fully connected layers as classifier. We use an ELM classifier with the CNN-learned features instead of the fully connected layers of CNN to obtain excellent results. The effectiveness of the proposed method is tested on the UC-Merced data set that has 2100 remotely sensed land-use-scene images with 21 categories. Experimental results show that the proposed CNN-ELM classification method achieves satisfactory results.
Article
Full-text available
Sentinel-2 images are expected to improve global crop monitoring even in challenging tropical small agricultural systems that are characterized by high intra- and inter-field spatial variability and where satellite observations are disturbed by the presence of clouds. To overcome these constraints, we analyzed and optimized the performance of a combined Random Forest (RF) classifier/object-based approach and applied it to multisource satellite data to produce land use maps of a smallholder agricultural zone in Madagascar at five different nomenclature levels. The RF classifier was first optimized by reducing the number of input variables. Experiments were then carried out to (i) test cropland masking prior to the classification of more detailed nomenclature levels, (ii) analyze the importance of each data source (a high spatial resolution (HSR) time series, a very high spatial resolution (VHSR) coverage and a digital elevation model (DEM)) and data type (spectral, textural or other), and (iii) quantify their contributions to classification accuracy levels. The results show that RF classifier optimization allowed for a reduction in the number of variables by 1.5- to 6-fold (depending on the classification level) and thus a reduction in the data processing time. Classification results were improved via the hierarchical approach at all classification levels, achieving an overall accuracy of 91.7% and 64.4% for the cropland and crop subclass levels, respectively. Spectral variables derived from an HSR time series were shown to be the most discriminating, with a better score for spectral indices over the reflectances. VHSR data were only found to be essential when implementing the segmentation of the area into objects and not for the spectral or textural features they can provide during classification.
Article
A detailed and accurate knowledge of land cover is crucial for many scientific and operational applications, and as such, it has been identified as an Essential Climate Variable. This accurate knowledge needs frequent updates. This paper presents a methodology for the fully automatic production of land cover maps at country scale using high resolution optical image time series which is based on supervised classification and uses existing databases as reference data for training and validation. The originality of the approach resides in the use of all available image data, a simple pre-processing step leading to a homogeneous set of acquisition dates over the whole area and the use of a supervised classifier which is robust to errors in the reference data. The produced maps have a kappa coefficient of 0.86 with 17 land cover classes. The processing is efficient, allowing a fast delivery of the maps after the acquisition of the image data, does not need expensive field surveys for model calibration and validation, nor human operators for decision making, and uses open and freely available imagery. The land cover maps are provided with a confidence map which gives information at the pixel level about the expected quality of the result.
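The kappa coefficient quoted above summarizes agreement between a map and its reference data after discounting chance agreement. A minimal sketch of the standard Cohen's kappa computation follows; the function name `cohen_kappa` and the row/column convention are assumptions made for illustration and are not taken from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    Assumed layout (illustrative): rows are reference classes,
    columns are predicted classes, entries are sample counts.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: sum over classes of (row share * column share).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    return (observed - expected) / (1.0 - expected)
```

With 17 classes, as in the abstract, the matrix would be 17 by 17; the computation is unchanged.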
Article
Semantic labeling (or pixel-level land-cover classification) in ultra-high resolution imagery (< 10cm) requires statistical models able to learn high level concepts from spatial data, with large appearance variations. Convolutional Neural Networks (CNNs) achieve this goal by learning discriminatively a hierarchy of representations of increasing abstraction. In this paper we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This results in many advantages, including i) state-of-the-art numerical accuracy, ii) improved geometric accuracy of predictions and iii) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam sub-decimeter resolution datasets, involving semantic labeling of aerial images of 9cm and 5cm resolution, respectively. These datasets are composed of many large and fully annotated tiles allowing an unbiased evaluation of models making use of spatial information. We do so by comparing two standard CNN architectures to the proposed one: standard patch classification, prediction of local label patches by employing only convolutions and full patch labeling by employing deconvolutions. All the systems compare favorably with or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin, also showing a very appealing inference time. Code to run pretrained models and to fine-tune them is available at https://sites.google.com/site/michelevolpiresearch/codes/dense-labeling.
Article
Synthetic aperture radar polarimetry (PolSAR) and polarimetric decomposition techniques have proven to be useful tools for wetland mapping. In this study we classify reed belts and monitor their phenological changes at a natural lake in northeastern Germany using dual-co-polarized (HH, VV) TerraSAR-X time series. The time series comprises 19 images, acquired between August 2014 and May 2015, in ascending and descending orbit. We calculated different polarimetric indices using the HH and VV intensities, the dual-polarimetric coherency matrix including dominant and mean alpha scattering angles, and entropy and anisotropy (normalized eigenvalue difference) as well as combinations of entropy and anisotropy for the analysis of the scattering scenarios. The image classifications were performed with the random forest classifier and validated with high-resolution digital orthophotos. The time series analysis of the reed belts revealed significant seasonal changes for the double-bounce–sensitive parameters (intensity ratio HH/VV and intensity difference HH-VV, the co-polarimetric coherence phase and the dominant and mean alpha scattering angles) and in the dual-polarimetric coherence (amplitude), anisotropy, entropy, and anisotropy-entropy combinations; whereas in summer dense leaves cause volume scattering, in winter, after leaves have fallen, the reed stems cause predominately double-bounce scattering. Our study showed that the five most important parameters for the classification of reed are the intensity difference HH-VV, the mean alpha scattering angle, intensity ratio HH/VV, and the coherence (phase). Due to the better separation of reed and other vegetation (deciduous forest, coniferous forest, meadow), winter acquisitions are preferred for the mapping of reed. Multi-temporal stacks of winter images performed better than summer ones. The combination of ascending and descending images also improved the result as it reduces the influence of the sensor look direction. 
However, in this study, an accuracy of only ~50% correctly classified reed areas was reached. Whereas the shorelines with reed areas (>10 m broad) could be detected correctly, the actual reed areas were significantly overestimated. The main source of error is probably the challenging data geocoding causing geolocation inaccuracies, which need to be solved in future studies.
Article
When exploited in remote sensing analysis, a reliable change rule with transfer ability can detect changes accurately and be applied widely. However, in practice, the complexity of land cover changes makes it difficult to use only one change rule or change feature learned from a given multi-temporal dataset to detect any other new target images without applying other learning processes. In this study, we consider the design of an efficient change rule having transferability to detect both binary and multi-class changes. The proposed method relies on an improved Long Short-Term Memory (LSTM) model to acquire and record the change information of long-term sequence remote sensing data. In particular, a core memory cell is utilized to learn the change rule from the information concerning binary changes or multi-class changes. Three gates are utilized to control the input, output and update of the LSTM model for optimization. In addition, the learned rule can be applied to detect changes and transfer the change rule from one learned image to another new target multi-temporal image. In this study, binary experiments, transfer experiments and multi-class change experiments are exploited to demonstrate the superiority of our method. Three contributions of this work can be summarized as follows: (1) the proposed method can learn an effective change rule to provide reliable change information for multi-temporal images; (2) the learned change rule has good transferability for detecting changes in new target images without any extra learning process, and the new target images should have a multi-spectral distribution similar to that of the training images; and (3) to the authors’ best knowledge, this is the first time that deep learning in recurrent neural networks is exploited for change detection. In addition, under the framework of the proposed method, changes can be detected under both binary detection and multi-class change detection.
Article
We have mapped the primary native and exotic vegetation that occurs in the Cerrado-Caatinga transition zone in Central Brazil using MODIS-NDVI time series (product MOD09Q1) data over a two-year period (2011-2013). Our methodology consists of the following steps: (a) the development of a three-dimensional cube composed of the NDVI-MODIS time series; (b) the removal of noise; (c) the selection of reference temporal curves and classification using similarity and distance measures; and (d) classification using support vector machines (SVMs). We evaluated different temporal classifications using similarity and distance measures of land use and land cover considering several combinations of attributes. Among the classification using distance and similarity measures, the best result employed the Euclidean distance with the NDVI-MODIS data by considering more than one reference temporal curve per class and adopting six mapping classes. In the majority of tests, the SVM classifications yielded better results than other methods. The best result among all the tested methods was obtained using the SVM classifier with a fourth-degree polynomial kernel; an overall accuracy of 80.75% and a Kappa coefficient of 0.76 were obtained. Our results demonstrate the potential of vegetation studies in semiarid ecosystems using time-series data.
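Step (c) above, classification against reference temporal curves with a distance measure, can be sketched as a nearest-reference rule. This is only an illustration under stated assumptions: the function names, the class labels, and the NDVI values in the usage note are hypothetical, and the sketch omits the noise removal and the SVM stage described in the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long temporal profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of dates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_profile(profile, references):
    """Assign the label of the closest reference curve.

    `references` maps a class label to a list of reference curves, so a
    class may be described by more than one temporal curve.
    Returns (label, distance) for the best match.
    """
    best_label, best_dist = None, math.inf
    for label, curves in references.items():
        for curve in curves:
            dist = euclidean(profile, curve)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist
```

For example, with made-up four-date curves `{"forest": [[0.8, 0.8, 0.8, 0.8]], "cropland": [[0.2, 0.7, 0.3, 0.2], [0.3, 0.8, 0.4, 0.2]]}`, the profile `[0.25, 0.75, 0.35, 0.2]` is assigned to `"cropland"`, because one of the cropland curves lies closest to it.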
Article
This paper introduces the use of single layer and deep convolutional networks for remote sensing data analysis. Direct application to multi- and hyper-spectral imagery of supervised (shallow or deep) convolutional networks is very challenging given the high input data dimensionality and the relatively small amount of available labeled data. Therefore, we propose the use of greedy layer-wise unsupervised pre-training coupled with a highly efficient algorithm for unsupervised learning of sparse features. The algorithm is rooted in sparse representations and enforces both population and lifetime sparsity of the extracted features, simultaneously. We successfully illustrate the expressive power of the extracted representations in several scenarios: classification of aerial scenes, as well as land-use classification in very high resolution (VHR), or land-cover classification from multi- and hyper-spectral images. The proposed algorithm clearly outperforms standard Principal Component Analysis (PCA) and its kernel counterpart (kPCA), as well as current state-of-the-art algorithms of aerial classification, while being extremely computationally efficient at learning representations of data. Results show that single layer convolutional networks can extract powerful discriminative features only when the receptive field accounts for neighboring pixels, and are preferred when the classification requires high resolution and detailed results. However, deep architectures significantly outperform single-layer variants, capturing increasing levels of abstraction and complexity throughout the feature hierarchy.
Article
The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. By extending the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions, we propose the convolutional LSTM (ConvLSTM) and use it to build an end-to-end trainable model for the precipitation nowcasting problem. Experiments show that our ConvLSTM network captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm for precipitation nowcasting.
Article
The correction of atmospheric effects is one of the preliminary steps required to make quantitative use of time series of high resolution images from optical remote sensing satellites. An accurate atmospheric correction requires good knowledge of the aerosol optical thickness (AOT) and of the aerosol type. As a first step, this study compares the performances of two kinds of AOT estimation methods applied to FormoSat-2 and LandSat time series of images: a multi-spectral method that assumes a constant relationship between surface reflectance measurements and a multi-temporal method that assumes that the surface reflectances are stable with time. In a second step, these methods are combined to obtain more accurate and robust estimates. The estimated AOTs are compared to in situ measurements on several sites of the AERONET (Aerosol Robotic Network). The methods, based on either spectral or temporal criteria, provide accuracies better than 0.07 in most cases, but show degraded accuracies in some special cases, such as the absence of vegetation for the spectral method or a very quick variation of landscape for the temporal method. The combination of both methods in a new spectro-temporal method increases the robustness of the results in all cases.
Article
We propose a neural network model that demonstrates the phenomenon of signal transfer between separated neuron groups via other chaotic neurons that show no apparent correlations with the input signal. The model is a recurrent neural network in which it is supposed that synchronous behavior between small groups of input and output neurons has been learned as fragments of high-dimensional memory patterns, and depletion of neural connections results in chaotic wandering dynamics. Computer experiments show that when a strong oscillatory signal is applied to an input group in the chaotic regime, the signal is successfully transferred to the corresponding output group, although no correlation is observed between the input signal and the intermediary neurons. Signal transfer is also observed when multiple signals are applied simultaneously to separate input groups belonging to different memory attractors. In this sense simultaneous multichannel communications are realized, and the chaotic neural dynamics acts as a signal transfer medium in which the signal appears to be hidden.
Article
The classification of an annual time series by using data from past years is investigated in this letter. Several classification schemes based on data fusion, sparse learning, and semisupervised learning are proposed to address the problem. Numerical experiments are performed on a Moderate Resolution Imaging Spectroradiometer image time series and show that while several approaches have statistically equivalent performances, a support vector machine with ℓ1 regularization leads to a better interpretation of the results due to their inherent sparsity in the temporal domain.
Article
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNNs). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
Article
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Article
Change detection is one of the central problems in earth observation and was extensively investigated over recent decades. In this paper, we propose a novel recurrent convolutional neural network (ReCNN) architecture, which is trained to learn a joint spectral-spatial-temporal feature representation in a unified framework for change detection in multispectral images. To this end, we bring together a convolutional neural network (CNN) and a recurrent neural network (RNN) into one end-to-end network. The former is able to generate rich spectral-spatial feature representations, while the latter effectively analyzes temporal dependency in bi-temporal images. In comparison with previous approaches to change detection, the proposed network architecture possesses three distinctive properties: 1) it is end-to-end trainable, in contrast to most existing methods whose components are separately trained or computed; 2) it naturally harnesses spatial information that has been proven to be beneficial to the change detection task; 3) it is capable of adaptively learning the temporal dependency between multitemporal images, unlike most algorithms, which use fairly simple operations such as image differencing or stacking. As far as we know, this is the first time that a recurrent convolutional network architecture has been proposed for multitemporal remote sensing image analysis. The proposed network is validated on real multispectral data sets. Both visual and quantitative analyses of the experimental results demonstrate the competitive performance of the proposed model.
Article
Central to the looming paradigm shift toward data-intensive science, machine-learning techniques are becoming increasingly important. In particular, deep learning has proven to be both a major breakthrough and an extremely powerful tool in many fields. Shall we embrace deep learning as the key to everything? Or should we resist a black-box solution? These are controversial issues within the remote-sensing community. In this article, we analyze the challenges of using deep learning for remote-sensing data analysis, review recent advances, and provide resources we hope will make deep learning in remote sensing seem ridiculously simple. More importantly, we encourage remote-sensing scientists to bring their expertise into deep learning and use it as an implicit general model to tackle unprecedented, large-scale, influential challenges, such as climate change and urbanization.
Article
Hyperspectral image classification has become a research focus in recent literature. However, the design of good features is still an open issue that affects the performance of classifiers. In this paper, a novel supervised deep feature extraction method based on a siamese convolutional neural network (S-CNN) is proposed to improve the performance of hyperspectral image classification. First, a CNN with five layers is designed to directly extract deep features from the hyperspectral cube, where the CNN can be regarded as a nonlinear transformation function. Then, the siamese network composed of two CNNs is trained to learn features that show a low intraclass and high interclass variability. The important characteristic of the presented approach is that the S-CNN is supervised with a margin ranking loss function, which can extract more discriminative features for classification tasks. To demonstrate the effectiveness of the proposed feature extraction method, the features extracted from three widely used hyperspectral data sets are fed into a linear support vector machine (SVM) classifier. The experimental results demonstrate that the proposed feature extraction method in conjunction with a linear SVM classifier can obtain better classification performance than that of the conventional methods.
Technical Report
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Article
Enhancing the frequency of satellite acquisitions represents a key issue for the Earth Observation community nowadays. Repeated observations are crucial for monitoring purposes, particularly when intra-annual processes should be taken into account. Time series of images constitute a valuable source of information in these cases. The goal of this paper is to propose a new methodological framework to automatically detect and extract spatiotemporal information from satellite image time series (SITS). Existing methods dealing with this kind of data are usually classification-oriented and cannot provide information about evolutions and temporal behaviors. In this paper we propose a graph-based strategy that combines object-based image analysis (OBIA) with data mining techniques. Image objects computed at each individual timestamp are connected across the time series and generate a set of evolution graphs. Each evolution graph is associated with a particular area within the study site and stores information about its temporal evolution. Such information can be deeply explored at the evolution graph scale or used to compare the graphs and supply a general picture at the study site scale. We validated our framework on two study sites located in the South of France and involving different types of natural, semi-natural and agricultural areas. The results obtained from a Landsat SITS support the quality of the methodological approach and illustrate how the framework can be employed to extract and characterize spatiotemporal dynamics.
Article
In recent years, vector-based machine learning algorithms, such as random forests, support vector machines, and 1-D convolutional neural networks, have shown promising results in hyperspectral image classification. Such methodologies, nevertheless, can lead to information loss in representing hyperspectral pixels, which intrinsically have a sequence-based data structure. A recurrent neural network (RNN), an important branch of the deep learning family, is mainly designed to handle sequential data. Can a sequence-based RNN be an effective method of hyperspectral image classification? In this paper, we propose a novel RNN model that can effectively analyze hyperspectral pixels as sequential data and then determine information categories via network reasoning. As far as we know, this is the first time that an RNN framework has been proposed for hyperspectral image classification. Specifically, our RNN makes use of a newly proposed activation function, parametric rectified tanh (PRetanh), for hyperspectral sequential data analysis instead of the popular tanh or rectified linear unit. The proposed activation function makes it possible to use fairly high learning rates without the risk of divergence during the training procedure. Moreover, a modified gated recurrent unit, which uses PRetanh for hidden representation, is adopted to construct the recurrent layer in our network to efficiently process hyperspectral data and reduce the total number of parameters. Experimental results on three airborne hyperspectral images suggest competitive performance of the proposed model. In addition, the proposed network architecture opens a new window for future research, showcasing the huge potential of deep recurrent networks for hyperspectral data analysis.
Article
Nowadays, modern earth observation programs produce huge volumes of satellite image time series (SITS) that can be useful to monitor geographical areas through time. How to efficiently analyze this kind of information is still an open question in the remote sensing field. Recently, deep learning methods proved suitable to deal with remote sensing data, mainly for scene classification (i.e., Convolutional Neural Networks, CNNs, on single images), while only very few studies exist involving temporal deep learning approaches (i.e., Recurrent Neural Networks, RNNs) to deal with remote sensing time series. In this letter we evaluate the ability of Recurrent Neural Networks, in particular the Long Short-Term Memory (LSTM) model, to perform land cover classification considering multi-temporal spatial data derived from a time series of satellite images. We carried out experiments on two different datasets considering both pixel-based and object-based classification. The obtained results show that Recurrent Neural Networks are competitive compared to state-of-the-art classifiers, and may outperform classical approaches in the presence of poorly represented and/or highly mixed classes. We also show that using the alternative feature representation generated by LSTM can improve the performance of standard classifiers.
Article
In this paper, we propose a multi-scale deep feature learning method for high-resolution satellite image classification. Specifically, we first warp the original satellite image into multiple different scales. The images in each scale are employed to train a deep convolutional neural network (DCNN). However, simultaneously training multiple DCNNs is time-consuming. To address this issue, we explore DCNN with spatial pyramid pooling (SPP-net). Because the different SPP-nets have the same number of parameters and share identical initial values, fine-tuning only the parameters of the fully-connected layers is enough to ensure the effectiveness of each network, which greatly accelerates the training process. Then, the multi-scale satellite images are fed into their corresponding SPP-nets respectively to extract multi-scale deep features. Finally, a multiple kernel learning method is developed to automatically learn the optimal combination of such features. Experiments on two difficult datasets show that the proposed method achieves favorable performance compared to other state-of-the-art methods.
Article
The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture's grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
Article
This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder, creating. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
Article
Deep-learning (DL) algorithms, which learn the representative and discriminative features in a hierarchical manner from the data, have recently become a hotspot in the machine-learning area and have been introduced into the geoscience and remote sensing (RS) community for RS big data analysis. Considering the low-level features (e.g., spectral and texture) as the bottom level, the output feature representation from the top level of the network can be directly fed into a subsequent classifier for pixel-based classification. As a matter of fact, by carefully addressing the practical demands in RS applications and designing the input-output levels of the whole network, we have found that DL is actually everywhere in RS data analysis: from the traditional topics of image preprocessing, pixel-based classification, and target recognition, to the recent challenging tasks of high-level semantic feature extraction and RS scene understanding.
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
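The per-mini-batch normalization described above can be sketched for a single scalar feature as follows. This is a simplified illustration, not the authors' implementation: the function name `batch_norm` and the default `eps` value are assumptions, and only the training-time transform is shown (running statistics for inference are omitted).

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift.

    `batch` is a list of activations of the same feature, one per sample.
    `gamma` and `beta` stand for the learnable scale and shift; here they
    are plain arguments rather than trained parameters.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in batch]
```

With `gamma=1` and `beta=0` the output has approximately zero mean and unit variance over the batch; the learnable scale and shift let the layer recover other distributions when that helps training.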
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has little memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
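The "adaptive estimates of lower-order moments" can be made concrete with a scalar sketch of the update rule as it is commonly stated: exponential moving averages of the gradient and of its square, each bias-corrected, drive the step. The function name `adam_minimize`, the toy objective in the usage note, and the learning rate of 0.1 are assumptions chosen for illustration; the decay rates 0.9 and 0.999 and `eps=1e-8` are the commonly used defaults.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function given its gradient, using Adam updates."""
    x = x0
    m = 0.0  # moving average of the gradient (first moment)
    v = 0.0  # moving average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for initializing m and v at zero.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

For instance, `adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)` follows the gradient of the toy objective (x - 3)^2 toward its minimum at x = 3. Because the step divides the first-moment estimate by the root of the second-moment estimate, rescaling the gradient by a constant leaves the step size essentially unchanged, which is the rescaling invariance mentioned in the abstract.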
Article
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Article
Time-series remote sensing data, such as Moderate Resolution Imaging Spectroradiometer (MODIS) data, hold considerable promise for investigating long-term dynamics of land use/cover change (LUCC), given their significant advantages of frequent temporal coverage and free cost. However, because of the complex ecological environment of wetlands, the applicability of these data for studying temporal dynamics of wetland-related land-cover types is limited. This is especially so for Poyang Lake, China's largest freshwater lake, which has active seasonal and inter-annual dynamics. The primary objective of this study is to investigate the suitability of MODIS 250-m maximum value composite (MVC) vegetation indexes (VIs) for dynamics monitoring of Poyang Lake. We applied time-series 16-day MODIS NDVI from 2000 to 2012 and developed a method to classify wetland cover types based on timing of inundation. We combined techniques of applying iterative self-organizing data analysis (ISODATA) with varying numbers of clusters and a transformed divergence (TD) statistic, to implement annual classification of smoothed time-series NDVI. In addition, we propose a decision tree based on features derived from NDVI profiles, to characterize phenological differences among clusters. Supported by randomly generated validation samples from TM images and daily water level records, we obtained a satisfactory accuracy assessment report. Classification results showed various change patterns for four dominant land cover types. Water area showed a non-significant declining trend with average annual change rate 33.25 km², indicating a drier Poyang Lake, and emergent vegetation area had weak change over the past 13 years. Areas of submerged vegetation and mudflat expanded, with significant average annual change rate 23.51 km² for the former. The results suggest that MODIS' 250-m spatial resolution is appropriate and the classification method based on timing of inundation useful for mapping general land cover patterns of Poyang Lake.
Conference Paper
Recently, pre-trained deep neural networks (DNNs) have outperformed traditional acoustic models based on Gaussian mixture models (GMMs) on a variety of large vocabulary speech recognition benchmarks. Deep neural nets have also achieved excellent results on various computer vision tasks using a random “dropout” procedure that drastically improves generalization error by randomly omitting a fraction of the hidden units in all layers. Since dropout helps avoid over-fitting, it has also been successful on a small-scale phone recognition task using larger neural nets. However, training deep neural net acoustic models for large vocabulary speech recognition takes a very long time and dropout is likely to only increase training time. Neural networks with rectified linear unit (ReLU) non-linearities have been highly successful for computer vision tasks and proved faster to train than standard sigmoid units, sometimes also improving discriminative performance. In this work, we show on a 50-hour English Broadcast News task that modified deep neural networks using ReLUs trained with dropout during frame level training provide a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system. We were able to obtain our results with minimal human hyper-parameter tuning using publicly available Bayesian optimization code.
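To make the combination of ReLU units and dropout concrete, here is a minimal NumPy sketch. It uses the common "inverted dropout" variant, which rescales the surviving units during training; this is an assumption for the example and not necessarily the exact procedure used in the paper. The function names, the drop probability `p` and the array sizes are likewise hypothetical.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: pass positive values, zero out the rest."""
    return np.maximum(x, 0.0)


def dropout(h, p, rng, training=True):
    """Inverted dropout (assumed variant): drop units with probability p.

    Surviving activations are divided by (1 - p) so that the expected
    activation is unchanged; at test time the input is returned as-is.
    """
    if not training or p == 0.0:
        return h
    keep_mask = rng.random(h.shape) >= p      # True where the unit is kept
    return h * keep_mask / (1.0 - p)


# Usage: apply ReLU and then dropout to a batch of hidden pre-activations.
rng = np.random.default_rng(0)
pre_activations = rng.normal(size=(8, 16))
hidden = relu(pre_activations)
dropped = dropout(hidden, p=0.5, rng=rng)
```

At evaluation time, calling `dropout(hidden, p=0.5, rng=rng, training=False)` returns the activations unchanged, so no extra rescaling is needed.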
Article
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However, RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.