Article

Abstract

Drainage network extraction is essential for many research topics and applications. However, traditional methods are inefficient, perform poorly in flat regions, and have difficulty detecting channel heads. Although deep learning techniques have been applied to these problems, several challenges remain. We therefore introduced distributed representations of aspect features to help the deep learning model infer flow directions; adopted a semantic segmentation model, U-Net, to improve the accuracy and efficiency of flow-direction prediction and pixel classification; and used postprocessing to delineate the flowlines. Our proposed framework achieved state-of-the-art results compared with traditional methods and published deep-learning-based methods. Case study results further demonstrated that the framework can extract drainage networks with high accuracy for rivers of different widths flowing through terrains with different characteristics. The framework requires no user-supplied parameters, can also produce waterbody polygons, and allows cyclic graphs in the drainage network.
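As a rough illustration of the aspect features mentioned in the abstract, the sketch below derives a per-cell downslope direction from a DEM with NumPy. The sign convention and the way the authors embed such features into distributed representations are assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code): per-cell downslope direction from a DEM.
import numpy as np

def downslope_direction(dem: np.ndarray, cellsize: float = 1.0) -> np.ndarray:
    """Angle of steepest descent in array (row, col) coordinates, in radians.

    Converting this to a compass aspect is a further, convention-dependent step.
    """
    d_row, d_col = np.gradient(dem.astype(float), cellsize)  # gradient along rows, cols
    return np.arctan2(-d_row, -d_col)  # negative gradient points downslope
```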


... Yanan et al. [6] fused a digital orthophoto model (DOM) with slope data to extract terrace edges using the Canny edge operator. Mao et al. [7] used the U-Net model as an intermediate step in the extraction of drainage networks. As an important topographic line in the Loess Plateau, the shoulder-line separates the complicated landform of the Loess Plateau into inter-gully and inner-gully areas, also known as positive and negative terrains (P-N terrain), respectively, in terms of the formation of landforms [8,9]. ...
... The growth-standard equation considered feature factors of the terrain slope and the topographic position index (TPI) of the elevations and grew the region into the final candidate region. The topographic position index is the difference between the centre-grid elevation and the average elevation of the local window around the pixel, and it reflects variations in elevation [7,29]. The extraction steps for the candidate regions of the shoulder-lines are shown in Algorithm 1. ...
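The topographic position index described in the excerpt above (centre-cell elevation minus the mean elevation of a local window) can be sketched as follows; the window size and edge handling are illustrative assumptions.

```python
# Minimal TPI sketch: centre elevation minus the mean of a square neighbourhood.
import numpy as np
from scipy.ndimage import uniform_filter

def tpi(dem: np.ndarray, window: int = 9) -> np.ndarray:
    """TPI for each cell using a `window` x `window` moving mean (edge mode assumed)."""
    mean_elev = uniform_filter(dem.astype(float), size=window, mode="nearest")
    return dem - mean_elev
```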
Article
Full-text available
The positive and negative terrains (P–N terrains) of the Loess Plateau of China are important geographical topography elements for measuring the degree of surface erosion and distinguishing the types of landforms. Loess shoulder-lines are an important terrain feature in the Loess Plateau and are often used as a criterion for distinguishing P–N terrains. The extraction of shoulder lines is important for predicting erosion and recognising a gully head. However, existing extraction algorithms for loess shoulder-lines in areas with insignificant slopes need to be improved. This study proposes a regional fusion (RF) method that integrates the slope variation-based method and region-growing algorithm to extract loess shoulder-lines based on a Digital Elevation Model (DEM) at a spatial resolution of 5 m. The RF method introduces different terrain factors into the growth standards of the region-growing algorithm to extract loess-shoulder lines. First, we employed a slope-variation-based method to build the initial set of loess shoulder-lines and used the difference between the smoothed and real DEMs to extract the initial set for the N terrain. Second, the region-growing algorithm with improved growth standards was used to generate a complete area of the candidate region of the loess shoulder-lines and the N terrain, which were fused to generate and integrate contours to eliminate the discontinuity. Finally, loess shoulder-lines were identified by detecting the edge of the integrated contour, with results exhibiting congregate points or spurs, eliminated via a hit-or-miss transform to optimise the final results. Validation of the experimental area of loess ridges and hills in Shaanxi Province showed that the accuracy of the RF method based on the Euclidean distance offset percentage within a 10-m deviation range reached 96.9% compared to the manual digitalisation method. Based on the mean absolute error and standard absolute deviation values, compared with Zhou’s improved snake model and the bidirectional DEM relief-shading methods, the proposed RF method extracted the loess shoulder-lines highly accurately.
... Geomorphology is a scientific discipline that examines the formation, distribution, and evolutionary patterns of Earth's surface features (Xiong et al. 2021). A digital elevation model (DEM) is a digital representation of the Earth's topographical surface, so it can be utilised in various fields, such as terrain analysis, feature identification, landform mapping and classification, landslide analysis, and the study of landform evolution (Graff and Usery 1993, Drăguţ and Eisank 2012, Xiong et al. 2014, Iwahashi et al. 2018, Mao et al. 2021, Ruiz-Lendínez et al. 2023). The use of DEMs has advanced geomorphology into a more quantitative field, improving the accuracy and automation of studies in these areas (Xiong et al. 2022). ...
... While deep learning was first applied in geosciences decades ago (Zhao and Mendel 1988, Dowla et al. 1990), its applications are recently reemerging in a variety of areas, including predicting discharge (e.g., Kratzert et al. 2019, Worland et al. 2019, Tennant et al. 2020), rainfall (Pan et al. 2019, Gauch et al. 2021, Adewoyin et al. 2021), landslide susceptibility (Ermini et al. 2005), river width (e.g., Ling et al. 2019), forecasting (e.g., Fleming et al. 2015) or reconstructing floods (e.g., Bomers et al. 2019), detecting sediment grain size (Chen et al. 2022), and mapping topographic features (Valentine et al. 2013), drainage networks (Mao et al. 2021), riverscapes (Alfredsen et al. 2022), and riverbed sediment size (Marchetti et al. 2022). An artificial neural network consists of a succession of layers of connected neurons. ...
Article
Full-text available
Clustering and machine learning-based predictions are increasingly used for environmental data analysis and management. In fluvial geomorphology, examples include predicting channel types throughout a river network and segmenting river networks into a series of channel types, or groups of channel forms. However, when relevant information is unevenly distributed throughout a river network, the discrepancy between data-rich and data-poor locations creates an information gap. Combining clustering and predictions addresses this information gap, but challenges and limitations remain poorly documented. This is especially true when considering that predictions are often achieved with two approaches that are meaningfully different in terms of information processing: decision trees (e.g., RF: random forest) and deep learning (e.g., DNN: deep neural networks). This presents challenges for downstream management decisions and when comparing clusters and predictions within or across study areas. To address this, we investigate the performance of RF and DNN with respect to the information gap between clustering data and prediction data. We use nine regional examples of clustering and predicting river channel types, stemming from a single clustering methodology applied in California, USA. Our results show that prediction performance decreases when the information gap between field-measured data and geospatial predictors increases. Furthermore, RF outperforms DNN, and their difference in performance decreases when the information gap between field-measured and geospatial data decreases. This suggests that mismatched scales between field-derived channel types and geospatial predictors hinder sequential information processing in DNN. Finally, our results highlight a sampling trade-off between uniformly capturing geomorphic variability and ensuring robust generalization.
... As the availability of remote sensing data increases and further development of empirical models takes place, the associated increase in complexity may require new approaches. In this context, machine learning advancements have significantly enhanced the utilization of extensive data [25] to create new topographic datasets based on existing remote sensing sources [21,[26][27][28][29]. Consequently, machine learning has emerged as a powerful tool for generating high-accuracy DTMs comprising large amounts of remote sensing data, including altimetry measurements acquired using LiDAR instruments, such as ICESat and GEDI. ...
Article
Full-text available
Digital elevation models (DEMs) have a wide range of applications and play a crucial role in many studies. Numerous public DEMs, frequently acquired using radar and optical satellite imagery, are currently available; however, DEM datasets tend to exhibit elevation values influenced by vegetation height and coverage, compromising the accuracy of models in representing terrain elevation. In this study, we developed a digital terrain model for South America using a novel methodology to remove vegetation bias in the Copernicus DEM GLO-30 (COPDEM) model using machine learning, Global Ecosystem Dynamics Investigation (GEDI) elevation data, and multispectral remote sensing products. Our results indicate considerable improvements compared to COPDEM in representing terrain elevation, reducing average errors (BIAS) from 9.6 m to 1.5 m. Furthermore, we evaluated our product (ANADEM) by comparison with other global DEMs, obtaining more accurate results for different conditions of vegetation fraction cover and land use. As a publicly available and open-source dataset, ANADEM will play a crucial role in advancing studies that demand accurate terrain elevation representations at large scales.
... As the availability of remote sensing data increases and further development of empirical models takes place, the associated increase in complexity may require new approaches. In this context, machine learning advancements have significantly enhanced the utilization of extensive data to create new topographic datasets based on existing remote sensing sources [20,[24][25][26][27]. Consequently, machine learning has emerged as a powerful tool for generating high-accuracy DTMs comprising large amounts of remote sensing data, including altimetry measurements acquired using LiDAR instruments, such as ICESat and GEDI. ...
Preprint
Full-text available
Digital elevation models (DEMs) have a wide range of applications and play a crucial role in many studies. Numerous public DEMs, frequently acquired using radar and optical satellite imagery, are currently available; however, DEM datasets tend to exhibit elevation values influenced by vegetation height and coverage, which can compromise the accuracy of models in representing terrain elevation. In this study, we developed a digital terrain model for South America using a novel methodology to remove vegetation bias in the Copernicus DEM GLO-30 (COPDEM) model using machine learning, Global Ecosystem Dynamics Investigation (GEDI) elevation data, and multispectral remote sensing products. Our results indicate considerable improvements compared to COPDEM in representing terrain elevation, reducing average errors (BIAS) from 9.6 m to 1.5 m. Furthermore, we evaluated our product (ANADEM) by comparison with other global DEMs, obtaining more accurate results for different conditions of vegetation fraction cover and land use. As a publicly available and open-source dataset, ANADEM will play a crucial role in advancing studies that demand accurate terrain elevation representations at large scales.
... Pit removal is typically applied to the DEM before the flow accumulation algorithm to avoid the flow ending up in small depressions, caused by errors from elevation underestimation, which results in flat surfaces. However, flow accumulation algorithms in their basic form do not work well on flat surfaces [18]. For example, pit removal assumes that depressions are formed from an underestimation of elevation, when they can also be formed from an overestimation of elevation [19]. ...
Article
Full-text available
Vector datasets of small watercourses, such as rivulets, streams, and ditches, are important for many visualization and analysis use cases. Mapping small watercourses with traditional methods is laborious and costly. Convolutional neural networks (CNNs) are state-of-the-art computer vision methods that have been shown to be effective for extracting geospatial features, including small watercourses, from LiDAR point clouds, digital elevation models (DEMs), and aerial images. However, the cause of false predictions by machine-learning models is often not thoroughly explored, and thus the impact of the results on the process of producing accurate datasets is not well understood. We digitized a highly accurate and complete dataset of small watercourses from a study area in Finland. We then developed a process based on a CNN that can be used to extract small watercourses from DEMs. We tested and validated the performance of the network with different input data layers and their combinations to determine the best-performing layer. We analyzed the false predictions to gain an understanding of their nature. We also trained models where watercourses with high levels of uncertainty were removed from the training sets and compared the results to models trained with all watercourses in the training set. The results show that the DEM was the best-performing layer and that combinations of layers provided worse results. Major causes of false predictions were shown to be boundary errors with an offset between the prediction and labeled data, as well as errors of omission for watercourses with high levels of uncertainty. Removing features with the highest level of uncertainty from the labeled dataset increased the overall F1-score but reduced the recall of the remaining features. Additional research is required to determine whether similar results are obtained with other CNN methods.
... Based on the correctness of the detected object, true positive (TP; correct classification to a class), false positive (FP; incorrect classification to a class), and false negative (FN; incorrect classification to other classes) cases are counted. Next, Eq. 3 and Eq. 4 are used to calculate precision (the model's ability to detect only relevant objects) and recall (the model's ability to detect all relevant classes) based on TP, FP and FN for each class (Guo et al., 2021; Mao et al., 2021; Padilla et al., 2020; Xu et al., 2021). It must be noted that true negative cases are not considered in object detection when measuring model performance, as there are countless objects (belonging to a large number of classes) that should not be detected in the input image (Padilla et al., 2020). ...
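A minimal sketch of the precision and recall computation referred to as Eq. 3 and Eq. 4 in the excerpt (the equations themselves are not reproduced there): precision = TP / (TP + FP) and recall = TP / (TP + FN), counted per class.

```python
# Per-class precision and recall from detection counts (zero-division guarded).
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # only relevant detections
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # all relevant objects found
    return precision, recall

print(precision_recall(tp=80, fp=20, fn=10))  # (0.8, 0.888...)
```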
Article
Full-text available
Floods are one of the most prevalent and costliest natural hazards globally. The safe transit of people and goods during a flood event requires fast and reliable access to flood depth information with spatial granularity comparable to the road network. In this research, we propose to use crowdsourced photos of submerged traffic signs for street-level flood depth estimation and mapping. To this end, a deep convolutional neural network (CNN) is utilized to detect traffic signs in user-contributed photos, followed by comparing the lengths of the visible part of detected sign poles before and after the flood event. A tilt correction approach is also designed and implemented to rectify potential inaccuracy in pole length estimation caused by tilted stop signs in floodwaters. The mean absolute error (MAE) achieved for pole length estimation in pre- and post-flood photos is 1.723 and 2.846 in., respectively, leading to an MAE of 4.710 in. for flood depth estimation. The presented approach provides people and first responders with a reliable and geographically scalable solution for estimating and communicating real-time flood depth data at their locations.
... Thus, the overall time for the described raster setup is about 2.3 h using TATOO. (Mao et al., 2021). ...
Article
As models include more detailed processes, the model setup becomes costly. Delineating distributed parameters requires advanced GIS processing techniques and programming expertise, limiting models' usage to researchers and practitioners with sufficient resources. Although high-resolution input data are increasingly available, only a few preprocessing algorithms of integrated hydrological models can handle these in a reasonable time and for both subcatchment and raster model architectures. Here the collaborative open-source Python-3.6 Topographic Analysis Tool (TATOO) library is presented, integrating the preprocessing of different models into one processing environment and combining model-independent topographic preprocessing functions with model-specific parameter calculation functions. Utilising high-resolution DEMs and flow network shapefiles, TATOO offers algorithms to delineate (1) raster- or subcatchment-based model networks, (2) runoff generation, concentration and routing parameters, and (3) channel and foreland cross-section geometries including bankfull water levels. TATOO's capabilities and time requirements are demonstrated for the Large Area Runoff Simulation Model's preprocessing.
Article
Effective monitoring and forecasting of urban flooding are crucial for climate change adaptation and resilience around the world. We proposed a novel and automatic system for urban flood detection and quantification. Our software takes image/video data of flooding as inputs because such data sources are easy to obtain and widely available compared with conventional water level sensors or flood gauges. First, the kernel of our system is a robust water region segmentation module that detects flooded regions together with surrounding reference objects from the scene. We combine image and video segmentation technologies to make the system reliable under varying weather and illumination conditions. Second, our system uses the detected situated objects to determine the inundation depth. Field experiments demonstrate that our segmentation results are accurate and reliable, and our system can detect flooding and estimate inundation depths from images and time-lapse videos. Our code is available at https://github.com/xmlyqing00/V-FloodNet.
Article
Being a fundamental type of hydrogeomorphic parameter, drainage networks have been widely applied in the fields of geographic cartography, hydrologic and hydraulic modeling, water resource management, flood risk analysis, and so on. The rapid development of modern remote sensing technologies has greatly improved the resolution and precision of spatial data, providing plenty of microtopographic information that can significantly affect surface runoff; however, most drainage network extraction methods have paid little attention to it, especially in flat terrains. To deal with this problem, we propose a novel approach for high-quality drainage network extraction in flat terrains, in which hydrogeomorphic features that obstruct, guide or collect runoff are treated as a priori knowledge used to extend digital elevation models (DEMs). Firstly, the hydrogeomorphic features that have obvious influences on surface runoff are classified and used as a priori knowledge to extract drainage networks. After being mathematically described by reorganizing their data into raster form and semanticizing their responses to runoff, this prior knowledge is then incorporated into a DEM to construct a Digital elevation-eXtended Model (DXM). Secondly, to solve the problem of erroneous or indeterminate flow directions in drainage network extraction based on the DEM alone, a method of drainage network extraction based on the DXM is developed. Finally, the proposed approach is applied and validated in a case study in the Yanhu Lake area of Hoh Xil in the Qinghai–Tibet Plateau. The DEMs and digital orthophoto maps (DOMs) with sub-meter resolution were collected by employing unmanned aerial vehicle-based (UAV-based) structure-from-motion (SfM) photogrammetry. The hydrogeomorphic features with runoff responses were extracted from the DOMs and used as a priori knowledge. Various DXMs were constructed based on the extracted hydrogeomorphic features and DEMs to extract the drainage networks of the study area. The extraction results were qualitatively and quantitatively evaluated by visual comparison and statistical analysis to validate the proposed approach. Additionally, the effects of DEM resolution and the presence of a priori knowledge on the quality of drainage network extraction were investigated. It is found that the DXM can clearly determine the flow directions by adding useful auxiliary information into the conventional D8 algorithm. This approach provides a new and highly efficient way to extract drainage networks in flat terrains.
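For orientation, the sketch below shows the conventional D8 flow-direction step that the DXM approach extends; it is not the authors' DXM implementation, and flat-cell and boundary handling are simplified assumptions.

```python
# Plain D8 sketch: each interior cell drains to its steepest downslope neighbour.
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_flow_direction(dem: np.ndarray) -> np.ndarray:
    """Return the index (0-7) of the steepest downslope neighbour, or -1 for flats/pits."""
    rows, cols = dem.shape
    direction = np.full((rows, cols), -1, dtype=int)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            best_drop, best_k = 0.0, -1
            for k, (dr, dc) in enumerate(OFFSETS):
                dist = np.hypot(dr, dc)                      # 1 or sqrt(2)
                drop = (dem[r, c] - dem[r + dr, c + dc]) / dist
                if drop > best_drop:
                    best_drop, best_k = drop, k
            direction[r, c] = best_k
    return direction
```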
Article
Overland flow is an important hydrological pathway in many forests of the humid tropics. Its generation is subject to topographic controls at differing spatial scales. Our objective was to identify such controls on the occurrence of overland flow in a lowland tropical rainforest. To this end, we installed 95 overland flow detectors (OFDs) in four nested subcatchments of the Lutzito catchment on Barro Colorado Island, Panama, and monitored the frequency of overland flow occurrence during 18 rainfall events at each OFD location (the temporal frequency). For each such location, we derived three non-digital terrain attributes and 17 digital ones, of which 15 were based on Digital Elevation Models (DEMs) of three different resolutions. These attributes then served as input into a Random Forest ensemble tree model to elucidate the importance and partial and joint dependencies of topographic controls for overland flow occurrence. Lutzito features a high median temporal frequency in overland flow occurrence of 0.421 among OFD locations. However, temporal frequencies of overland flow occurrence vary strongly among these locations and the subcatchments of Lutzito catchment. This variability is best explained by (1) microtopography, (2) coarse terrain sloping and (3) various measures of distance-to-channel, with the contribution of all other terrain attributes being small. Microtopographic features such as concentrated flowlines and wash areas produce the highest temporal frequencies, whereas the occurrence of overland flow drops sharply for flow distances and terrain sloping beyond certain threshold values. Our study contributes to understanding both the spatial controls on overland flow generation and the limitations of terrain attributes for the spatially explicit prediction of overland flow frequencies.
Article
Full-text available
The life cycle of leaves, from sprouting to senescence, consists of regular changes such as budding, branching, leaf spreading, flowering, fruiting, leaf fall, and dormancy driven by seasonal climate changes. Because temperature and moisture affect the physiological changes in this cycle, the detection of newly grown leaves (NGL) is helpful for estimating tree growth and even climate change. This study focused on the detection of NGL based on deep learning convolutional neural network (CNN) models with sparse enhancement (SE). As the NGL areas found in forest images have similar sparse characteristics, we used a sparse image to enhance the signal of the NGL, so the difference between the NGL and the background could be further enhanced. We then proposed hybrid CNN models that combined U-net and SegNet features to perform image segmentation. As the NGL in the images were small targets, they also presented an imbalanced-data problem. Therefore, this paper further proposed 3-Layer SegNet, 3-Layer U-SegNet, 2-Layer U-SegNet, and 2-Layer Conv-U-SegNet architectures to reduce the pooling degree of traditional semantic segmentation models, and used a loss function to increase the weight of the NGL. According to the experimental results, our proposed algorithms were indeed helpful for the image segmentation of NGL and achieved better kappa results, reaching 0.743.
Article
Full-text available
Image semantic segmentation has been applied increasingly widely in the fields of satellite remote sensing, medical treatment, intelligent transportation, and virtual reality. However, in the medical field, the study of cerebral vessel and cranial nerve segmentation based on true-color medical images is urgently needed and has good research and development prospects. We have extended the current state-of-the-art semantic-segmentation network DeepLabv3+ and used it as the basic framework. First, the feature distillation block (FDB) was introduced into the encoder structure to refine the extracted features. In addition, the atrous spatial pyramid pooling (ASPP) module was added to the decoder structure to enhance the retention of feature and boundary information. The proposed model was trained by fine-tuning and optimizing the relevant parameters. Experimental results show that the encoder structure has better performance in feature refinement processing, improving target boundary segmentation precision, and retaining more feature information. Our method has a segmentation accuracy of 75.73%, which is 3% better than DeepLabv3+.
Article
Full-text available
The GaoFen-7 (GF-7) satellite, which was launched on November 3, 2019, is China's first civilian submeter stereo mapping satellite. The satellite is equipped with the first laser altimeter officially used in China for Earth observation. In addition to the laser altimeter, the GF-7 spaceborne laser altimeter system also includes two laser footprint cameras and a laser optical axis surveillance camera. The laser altimeter system is designed and used to help improve the elevation accuracy of the two line-array stereo mapping cameras without Ground Control Points (GCPs). This paper details the design of the GF-7 spaceborne laser altimeter system, its ranging performance in the laboratory, and its data processing method. The types of data products are also presented. These data will play a vital role in applications in geography, glaciology, forestry, and other fields.
Article
Full-text available
Automated and accurate wetland identification algorithms are increasingly important for wetland conservation and environmental planning. Deep learning for wetland identification is an emerging field that shows promise for advancing these efforts. Deep learning differs from traditional machine learning techniques in its ability to consider the spatial context of object characteristics within a landscape scene. However, applying deep learning typically requires very large datasets for training the algorithms, which limits their use in many environmental applications, including wetland identification. Using four study sites across Virginia with field-delineated wetlands, we provide insight into the potential of deep learning for wetland detection from limited, but typical, wetland delineation training data. Our proposed workflow performs a wetland semantic segmentation using DeepNets, a deep learning architecture for remote sensing data, and an input dataset consisting of high-resolution topographic indices and the Normalized Difference Vegetation Index. Results show that models trained and evaluated for a single site were able to achieve high accuracy (up to 91% recall and 56% precision), and similar accuracy can be obtained for models trained across multiple sites (up to 91% recall and 57% precision). Through this analysis we found that, across all sites, input data configurations taking advantage of hydrologic properties derived from elevation data consistently outperformed models using the elevation data directly, showing the benefit of physically informed inputs in deep learning training for wetland identification. By refining the wetland identification workflow presented in this paper and collecting additional training data across landscapes, there is potential for deep learning algorithms to support a range of wetland conservation efforts.
Article
Full-text available
As the resolution of DEMs becomes higher, both the efficiency and accuracy of channel head recognition are important for drainage network extraction. In this paper, a D8-compatible, highly efficient channel head recognition method is proposed. For each potential channel, this method first calculates a geomorphologic parameter series along the flow path and then determines the channel head by detecting the change point in the series. Instead of directly using one threshold value for a whole region, as is still common in D8-based methods, the proposed method recognizes channel heads one by one from their local geomorphology. The proposed method is applied with high-resolution DEMs for different terrains, and the identified channel heads and extracted drainage networks show good agreement with observations. In comparison with popular software implementing a state-of-the-art channel head recognition method, the proposed one shows similar accuracy but much higher computational efficiency.
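A minimal sketch of the general idea of locating a channel head as a change point in a geomorphologic parameter series along a flow path; the two-segment variance criterion below is an illustrative assumption, not the paper's exact detector.

```python
# Change-point sketch: split the series where the two-segment fit is best.
import numpy as np

def change_point(series: np.ndarray) -> int:
    """Return the split index minimising the summed within-segment squared deviation."""
    best_i, best_cost = 1, np.inf
    for i in range(1, len(series) - 1):
        left, right = series[:i], series[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i

# Example: a parameter that jumps partway along the flow path.
print(change_point(np.array([1.0, 1.1, 0.9, 1.0, 4.2, 4.0, 4.3])))  # -> 4
```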
Article
Full-text available
Drainage network extraction plays an important role in geomorphologic analyses, hydrologic modeling, and non-point source pollutant simulation, among others. Flow enforcement, by imposing information of known river maps onto digital elevation models (DEMs), leads to improved drainage network extraction. However, the existing flow enforcement methods (e.g., the elevation-based stream-burning method) have certain limitations, as they may cause unreal longitudinal profiles, lead to unintended topological errors, and even misinterpret the overall drainage patterns. The present study proposes an enhanced flow enforcement method without elevation modification toward accurate and efficient drainage network extraction. In addition to preserving the Boolean-value information as to whether a DEM pixel belongs to a stream, the proposed method can also preserve and fully utilize the topological relations among mapped streamlines and the morphological information of each mapped streamline. The method involves two important steps: (1) an improved rasterization algorithm of mapped streamlines to yield continuous, unambiguous, and collision-free raster equivalents of stream vectors for flow enforcement; and (2) realization of the enhanced flow enforcement in a modified Priority-Flood procedure, in which flows are enforced to completely follow the mapped streamlines, and hence channel short-circuits and spurious confluences of adjacent streams are avoided. An efficient implementation of the method is made based on a size-balanced binary search tree. The method is also tested over the Rogue River Basin in the United States, using DEMs with various resolutions. Visual and statistical analyses of the results indicate three major advantages of the proposed method: (1) significant reduction in the misinterpretation of drainage patterns; (2) maximum channel displacement of one pixel from the river map at various resolutions; and (3) high computational efficiency.
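For context, here is a minimal sketch of the plain Priority-Flood depression-filling procedure that the paper modifies for flow enforcement; the streamline-aware modifications are not reproduced here.

```python
# Plain Priority-Flood sketch: grow inward from the DEM edge, always processing the
# lowest frontier cell, and raise any lower unvisited neighbour to the spill level.
import heapq
import numpy as np

def priority_flood_fill(dem: np.ndarray) -> np.ndarray:
    filled = dem.astype(float).copy()
    rows, cols = filled.shape
    visited = np.zeros((rows, cols), dtype=bool)
    heap = []
    for r in range(rows):                      # seed the heap with all edge cells
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r, c], r, c))
                visited[r, c] = True
    neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]
    while heap:
        elev, r, c = heapq.heappop(heap)
        for dr, dc in neighbours:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr, nc]:
                visited[nr, nc] = True
                filled[nr, nc] = max(filled[nr, nc], elev)  # lift depressions to spill level
                heapq.heappush(heap, (filled[nr, nc], nr, nc))
    return filled
```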
Article
Full-text available
Background: Corneal endothelium (CE) images provide valuable clinical information regarding the health state of the cornea. Computation of the clinical morphometric parameters requires the segmentation of endothelial cell images. Current techniques to image the endothelium in vivo deliver low-quality images, which makes automatic segmentation a complicated task. Here, we present two convolutional neural networks (CNNs) to segment CE images: a global fully convolutional approach based on U-net, and a local sliding-window network (SW-net). We propose to use probabilistic labels instead of binary ones, we evaluate a preprocessing method to enhance the contrast of images, and we introduce a postprocessing method based on Fourier analysis and watershed to convert the CNN output images into the final cell segmentation. Both methods are applied to 50 images acquired with an SP-1P Topcon specular microscope. Estimates are compared against a manual delineation made by a trained observer. Results: U-net (AUC=0.9938) yields slightly sharper, clearer images than SW-net (AUC=0.9921). After postprocessing, U-net obtains a DICE=0.981 and a MHD=0.22 (modified Hausdorff distance), whereas SW-net yields a DICE=0.978 and a MHD=0.30. U-net generates a wrong cell segmentation in only 0.48% of the cells, versus 0.92% for SW-net. U-net achieves statistically significantly better precision and accuracy than both Topcon and SW-net for the estimates of three clinical parameters: cell density (ECD), polymegethism (CV), and pleomorphism (HEX). The mean relative error in U-net for the parameters is 0.4% in ECD, 2.8% in CV, and 1.3% in HEX. The computation time to segment an image and estimate the parameters is barely a few seconds. Conclusions: Both methods presented here provide a statistically significant improvement over the state of the art. U-net has reached the smallest error rate. We suggest a segmentation refinement based on our previous work to further improve the performance.
Article
Full-text available
High-resolution (HR) digital elevation models (DEMs), such as those at resolutions of 1 and 3 meters, have increasingly become more widely available, along with lidar point cloud data. In a natural environment, a detailed surface water drainage network can be extracted from a HR DEM using flow-direction and flow-accumulation modeling. However, elevation details captured in HR DEMs, such as roads and overpasses, can form barriers that incorrectly alter flow accumulation models, and hinder the extraction of accurate surface water drainage networks. This study tests a deep learning approach to identify the intersections of roads and stream valleys, whereby valley channels can be burned through road embankments in a HR DEM for subsequent flow accumulation modeling, and proper natural drainage network extraction.
Article
Full-text available
Runoff and river routing schemes play important roles in land surface models (LSMs) because they regulate the soil moisture, heat flux and vegetation dynamics of land surface processes and account for the carbon and nutrient transport in river channels. However, these schemes are often simplistic and conceptual. Hence, in this study, we focused on (1) evaluating these schemes in the Community Land Model (CLM) and (2) altering the model representations using physically based schemes based on a study of the Yellow River basin. For runoff simulation, CLM exhibits limitations in water-limited areas, especially areas with aridity index values greater than 2. Additionally, CLM greatly overestimates runoff, and produces negative Nash-Sutcliffe efficiency (NSE) coefficients, few quick flow variations, high quick flow index values and large root mean square errors of total water storage. These issues were greatly improved by implementing schemes of physically based overland flow, lateral soil water flow and interchange between groundwater and river water. Furthermore, for the river routing scheme, the use of 1D kinematic wave routing improved previously unrealistic aspects of the flood hydrographs (e.g., incorrect flood peak values and times) of the original CLM. Moreover, compared with using the grid element-based river routing approach in CLM, the computational costs were reduced by up to 45% by implementing the flow interval element method. Finally, our results indicate that parameter calibration in CLM minimally improved the simulation performance in water-limited areas, while physically based schemes generated better results.
Article
Full-text available
The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-2) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-(n+4) ReLU networks, where n is the input dimension, are universal approximators. Moreover, except for a measure zero set, all functions cannot be approximated by width-n ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth-efficiency of neural networks. That is, there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an exponential bound. Here we pose the dual question on the width-efficiency of ReLU networks: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceed the polynomial bound by a constant factor can approximate wide and shallow network with high accuracy. Our results provide more comprehensive evidence that depth is more effective than width for the expressiveness of ReLU networks.
Article
Full-text available
Digital terrain model (DTM) is the base for calculation of the surface runoff under the influence of the gravity (gravity flow) in hydrological analysis. It is important to produce hydrologically corrected DTM with the removed natural and artificial depressions to avoid numerical problems in algorithms of the gravity flow. The pit removal procedure changes geomorphometry of the DTM. GIS software packages use pit removal algorithm independently of geomorphmetric features of the analyzed area. In need of minimally modified DTM after the pit removal areas, the carving method (deepen drainage routes) and the filling method (fill sink) were analyzed on three different geomorphometric areas (bare mountain range, hilly wooded area and the plain area intersected with the network of the drainage canals). The recommendation is given for the choice of geomorphometric least changing DTM algorithm. The input data are raster data of elevation points created by stereoscopic photogrammetry method in 5x5 and 25x25 meter resolution. Differences have been noticed during the process of creating raster data. The recommendation is given for the choice of the most acceptable method for each type of area on the basis of comparison of the original elevation points with the elevation points in created DTM.
Article
Full-text available
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network and a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low-resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the fully convolutional network (FCN) architecture and its variants. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. The design of SegNet was primarily motivated by road scene understanding applications. Hence, it is efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than competing architectures and can be trained end-to-end using stochastic gradient descent. We also benchmark the performance of SegNet on Pascal VOC12 salient object segmentation and the recent SUN RGB-D indoor scene understanding challenge. We show that SegNet provides competitive performance although it is significantly smaller than other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/
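A hedged PyTorch sketch of the mechanism described above: the decoder unpools with the max-pooling indices saved by the encoder instead of learning to upsample. The depth and channel sizes are illustrative assumptions, not the full SegNet.

```python
# One encoder/decoder stage using pooling indices for non-linear upsampling.
import torch
import torch.nn as nn

class TinySegNetBlock(nn.Module):
    def __init__(self, in_ch: int = 3, mid_ch: int = 16, n_classes: int = 2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, padding=1),
                                 nn.BatchNorm2d(mid_ch), nn.ReLU())
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # keep the indices
        self.unpool = nn.MaxUnpool2d(2, stride=2)                   # reuse them here
        self.dec = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
                                 nn.BatchNorm2d(mid_ch), nn.ReLU())
        self.classify = nn.Conv2d(mid_ch, n_classes, 1)

    def forward(self, x):
        feats = self.enc(x)
        pooled, indices = self.pool(feats)
        unpooled = self.unpool(pooled, indices, output_size=feats.size())
        return self.classify(self.dec(unpooled))

out = TinySegNetBlock()(torch.randn(1, 3, 64, 64))  # -> (1, 2, 64, 64)
```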
Article
Full-text available
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
Article
Full-text available
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
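A hedged PyTorch sketch of the U-Net pattern described above: a contracting path, an expanding path, and a skip connection that concatenates encoder features with upsampled decoder features. The depth and channel counts are illustrative assumptions, far smaller than the original network.

```python
# One-level U-Net-style model with a single skip connection.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.down = double_conv(in_ch, 16)          # contracting path
        self.pool = nn.MaxPool2d(2)
        self.bottom = double_conv(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # expanding path
        self.out_conv = double_conv(32, 16)         # 32 = 16 (skip) + 16 (upsampled)
        self.head = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):
        skip = self.down(x)
        x = self.bottom(self.pool(skip))
        x = self.up(x)
        x = torch.cat([skip, x], dim=1)             # skip connection by concatenation
        return self.head(self.out_conv(x))

out = TinyUNet()(torch.randn(1, 1, 64, 64))         # -> (1, 2, 64, 64)
```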
Technical Report
Full-text available
In 1996, the U.S. Geological Survey (USGS) developed a global topographic elevation model designated as GTOPO30 at a horizontal resolution of 30 arc-seconds for the entire Earth. Because no single source of topographic information covered the entire land surface, GTOPO30 was derived from eight raster and vector sources that included a substantial amount of U.S. Defense Mapping Agency data. The quality of the elevation data in GTOPO30 varies widely; there are no spatially-referenced metadata, and the major topographic features such as ridgelines and valleys are not well represented. Despite its coarse resolution and limited attributes, GTOPO30 has been widely used for a variety of hydrological, climatological, and geomorphological applications as well as military applications, where a regional, continental, or global scale topographic model is required. These applications have ranged from delineating drainage networks and watersheds to using digital elevation data for the extraction of topographic structure and three-dimensional (3D) visualization exercises (Jenson and Domingue, 1988; Verdin and Greenlee, 1996; Lehner and others, 2008). Many of the fundamental geophysical processes active at the Earth's surface are controlled or strongly influenced by topography, thus the critical need for high-quality terrain data (Gesch, 1994). U.S. Department of Defense requirements for mission planning, geographic registration of remotely sensed imagery, terrain visualization, and map production are similarly dependent on global topographic data. Since the time GTOPO30 was completed, the availability of higher-quality elevation data over large geographic areas has improved markedly. New data sources include global Digital Terrain Elevation Data (DTED®) from the Shuttle Radar Topography Mission (SRTM), Canadian elevation data, and data from the Ice, Cloud, and land Elevation Satellite (ICESat). Given the widespread use of GTOPO30 and the equivalent 30-arc-second DTED® level 0, the USGS and the National Geospatial-Intelligence Agency (NGA) have collaborated to produce an enhanced replacement for GTOPO30, the Global Land One-km Base Elevation (GLOBE) model and other comparable 30-arc-second-resolution global models, using the best available data. The new model is called the Global Multi-resolution Terrain Elevation Data 2010, or GMTED2010 for short. This suite of products at three different resolutions (approximately 1,000, 500, and 250 meters) is designed to support many applications directly by providing users with generic products (for example, maximum, minimum, and median elevations) that have been derived directly from the raw input data that would not be available to the general user or would be very costly and time-consuming to produce for individual applications. The source of all the elevation data is captured in metadata for reference purposes. It is also hoped that as better data become available in the future, the GMTED2010 model will be updated.
Article
Full-text available
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
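A minimal sketch of the negative-sampling objective mentioned above, for one (centre, context) pair with k sampled negatives; vector shapes are illustrative assumptions.

```python
# Skip-gram negative-sampling loss: -log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c).
import numpy as np

def sgns_loss(center_vec, context_vec, negative_vecs):
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(context_vec @ center_vec))          # true context pair
    neg = np.sum(np.log(sigmoid(-negative_vecs @ center_vec)))  # sampled negatives
    return -(pos + neg)                                       # minimise this

rng = np.random.default_rng(0)
print(sgns_loss(rng.normal(size=8), rng.normal(size=8), rng.normal(size=(5, 8))))
```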
Conference Paper
The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-2) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-(n + 4) ReLU networks, where n is the input dimension, are universal approximators. Moreover, except for a measure zero set, all functions cannot be approximated by width-n ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth-efficiency of neural networks. That is, there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an exponential bound. Here we pose the dual question on the width-efficiency of ReLU networks: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceed the polynomial bound by a constant factor can approximate wide and shallow network with high accuracy. Our results provide more comprehensive evidence that depth may be more effective than width for the expressiveness of ReLU networks.
Article
Spatial pyramid pooling modules and encoder-decoder structures are used in deep neural networks for semantic segmentation tasks. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages of both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 semantic image segmentation dataset and achieve a performance of 89% on the test set without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in Tensorflow.
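A hedged PyTorch sketch of an atrous spatial pyramid pooling (ASPP) head in the spirit described above: parallel convolutions at several dilation rates whose outputs are concatenated and projected. The rates and channel counts are illustrative assumptions, not the DeepLab configuration.

```python
# ASPP-style head: parallel dilated convolutions capture multi-scale context.
import torch
import torch.nn as nn

class TinyASPP(nn.Module):
    def __init__(self, in_ch=64, out_ch=32, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

out = TinyASPP()(torch.randn(1, 64, 32, 32))  # -> (1, 32, 32, 32)
```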
Article
High-resolution Digital Elevation Models (DEMs) can be used to extract high-accuracy prerequisite drainage networks. A higher resolution represents a larger number of grids. With an increase in the number of grids, the flow direction determination will require substantial computer resources and computing time. Parallel computing is a feasible method with which to resolve this problem. In this paper, we proposed a parallel programming method within the .NET Framework with a C# compiler in a Windows environment. The basin is divided into sub-basins, and subsequently the different sub-basins operate on multiple threads concurrently to calculate flow directions. The method was applied to calculate the flow direction of the Yellow River basin from the 3 arc-second resolution SRTM DEM. Drainage networks were extracted and compared with the HydroSHEDS river network to assess their accuracy. The results demonstrate that this method can calculate the flow direction from high-resolution DEMs efficiently and extract high-precision continuous drainage networks.
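A rough illustration of the partitioning idea described above: split a DEM into blocks and compute flow directions for the blocks on concurrent threads. The original work uses C#/.NET threads over sub-basins; this Python sketch uses a placeholder per-block routine and ignores block-boundary handling, which a real implementation must address.

```python
# Thread-parallel per-block processing sketch (boundary handling omitted).
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def flow_direction_block(block: np.ndarray) -> np.ndarray:
    """Placeholder per-block flow-direction routine (e.g., a D8 pass)."""
    return np.zeros_like(block, dtype=np.int8)  # stand-in result

def parallel_flow_directions(dem: np.ndarray, n_blocks: int = 4) -> np.ndarray:
    blocks = np.array_split(dem, n_blocks, axis=0)       # split by rows
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        results = list(pool.map(flow_direction_block, blocks))
    return np.vstack(results)

print(parallel_flow_directions(np.random.rand(100, 100)).shape)  # (100, 100)
```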
Conference Paper
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
Article
In this paper, we show a phenomenon where residual networks can be trained using an order of magnitude fewer iterations than is used with standard training methods, which we named "super-convergence". One of the key elements of super-convergence is training with cyclical learning rates and a large maximum learning rate. Furthermore, we present evidence that training with large learning rates improves performance by regularizing the network. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. We also provide an explanation for the benefits of a large learning rate using a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. The architectures and code to replicate the figures in this paper are available at github.com/lnsmith54/super-convergence.
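A brief sketch of the cyclical "1cycle" learning-rate policy associated with super-convergence, using PyTorch's built-in scheduler; the model, peak learning rate, and step count below are placeholder assumptions.

```python
# One-cycle schedule: the LR ramps up to a large peak and is then annealed.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1.0, total_steps=1000)

for step in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy objective for illustration
    loss.backward()
    optimizer.step()
    scheduler.step()                                # advance the cyclical schedule
```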
Article
In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed `DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
Article
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
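A minimal sketch of the scaled dot-product attention at the core of the Transformer, softmax(QK^T / sqrt(d_k)) V; the tensor shapes are illustrative.

```python
# Scaled dot-product attention over a batch of sequences.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # similarity of queries and keys
    return F.softmax(scores, dim=-1) @ v           # weighted sum of values

q = k = v = torch.randn(2, 5, 8)                   # batch 2, length 5, dim 8
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 8])
```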
Article
Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible. Such loss of spatial acuity can limit image classification accuracy and complicate the transfer of the model to downstream applications that require detailed scene understanding. These problems can be alleviated by dilation, which increases the resolution of output feature maps without reducing the receptive field of individual neurons. We show that dilated residual networks (DRNs) outperform their non-dilated counterparts in image classification without increasing the model's depth or complexity. We then study gridding artifacts introduced by dilation, develop an approach to removing these artifacts (`degridding'), and show that this further increases the performance of DRNs. In addition, we show that the accuracy advantage of DRNs is further magnified in downstream applications such as object localization and semantic segmentation.
Article
Batch Normalization is quite effective at accelerating and improving the training of deep models. However, its effectiveness diminishes when the training minibatches are small, or do not consist of independent samples. We hypothesize that this is due to the dependence of model layer inputs on all the examples in the minibatch, and different activations being produced between training and inference. We propose Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch. Models trained with Batch Renormalization perform substantially better than batchnorm when training with small or non-i.i.d. minibatches. At the same time, Batch Renormalization retains the benefits of batchnorm such as insensitivity to initialization and training efficiency.
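A minimal sketch of the Batch Renormalization correction is given below, assuming illustrative clipping limits r_max and d_max; it shows only the forward computation for a 2-D batch and omits the moving-average updates and the backward pass.

```python
import numpy as np

def batch_renorm_forward(x, running_mean, running_std,
                         gamma, beta, r_max=3.0, d_max=5.0, eps=1e-5):
    """Sketch of the Batch Renormalization forward pass for an (N, C) batch.

    The batch statistics are corrected by r and d (treated as constants in the
    backward pass) so that training-time outputs match what the moving
    statistics would produce at inference. r_max/d_max are illustrative limits.
    """
    mu_b = x.mean(axis=0)
    sigma_b = x.std(axis=0) + eps
    r = np.clip(sigma_b / running_std, 1.0 / r_max, r_max)
    d = np.clip((mu_b - running_mean) / running_std, -d_max, d_max)
    x_hat = (x - mu_b) / sigma_b * r + d
    return gamma * x_hat + beta

# Toy usage: batch with mean ~1 and std ~2, moving statistics of 0 and 1.
x = np.random.randn(32, 2) * 2.0 + 1.0
y = batch_renorm_forward(x, running_mean=np.zeros(2), running_std=np.ones(2),
                         gamma=np.ones(2), beta=np.zeros(2))
# Output mean ~1 and std ~2: it matches normalization with the running statistics.
print(y.mean(axis=0), y.std(axis=0))
```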
Conference Paper
There is broad consensus that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC), we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast: segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
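A heavily reduced sketch of the contracting/expanding structure with skip connections is shown below; it uses only two resolution levels and small channel widths for brevity, so it illustrates the idea rather than the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """A two-level sketch of the U-Net idea: a contracting path, a symmetric
    expanding path, and skip connections that concatenate encoder features
    with the upsampled decoder features. Channel widths are illustrative."""

    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec2 = double_conv(128, 64)          # 64 (skip) + 64 (upsampled)
        self.up1 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = double_conv(64, 32)           # 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

print(TinyUNet()(torch.randn(1, 1, 128, 128)).shape)  # torch.Size([1, 2, 128, 128])
```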
Article
In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-the-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
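The small check below illustrates the field-of-view argument: a 3x3 atrous convolution with rate 2 has exactly the same number of parameters as a standard 3x3 convolution but covers a 5x5 neighbourhood; the layer widths here are arbitrary.

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions: one standard, one atrous with rate 2. Both have the
# same number of parameters, but the atrous kernel covers a 5x5 neighbourhood.
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)
atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(standard), n_params(atrous))            # identical parameter counts

# Effective kernel extent: k_eff = k + (k - 1) * (rate - 1)
k, rate = 3, 2
print("effective extent:", k + (k - 1) * (rate - 1))   # 5

x = torch.randn(1, 64, 65, 65)
print(standard(x).shape, atrous(x).shape)              # both preserve the 65x65 resolution
```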
Article
Extracting hydrologic and geomorphic features from high resolution topography data is a challenging and computationally demanding task. We illustrate the new capabilities and features of GeoNet, an open source software for the extraction of channel heads, channel networks, and channel morphology from high resolution topography data. The method has been further developed and includes a median filtering operation to remove roads in engineered landscapes and the calculation of hillslope lengths to inform the channel head identification procedure. The software is now available in both MATLAB and Python, allowing it to handle datasets larger than the ones previously analyzed. We present the workflow of GeoNet using three different test cases; natural high relief, engineered low relief, and urban landscapes. We analyze default and user-defined parameters, provide guidance on setting parameter values, and discuss the parameter effect on the extraction results. Metrics on computational time versus dataset size are also presented. We show the ability of GeoNet to objectively and accurately extract channel features in terrains of various characteristics.
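As an illustration of the kind of preprocessing mentioned above, the sketch below median-filters a synthetic DEM to suppress a narrow road-like feature; the window size and the synthetic surface are assumptions for demonstration, not GeoNet's defaults or workflow.

```python
import numpy as np
from scipy.ndimage import median_filter

# Illustrative sketch: median-filter a DEM grid to suppress a narrow, road-like
# artifact before feature extraction. The 5x5 window and the synthetic DEM are
# assumptions for demonstration only.
rows, cols = np.mgrid[0:200, 0:200]
dem = 0.05 * rows + 0.02 * cols                 # smooth, gently sloping surface
dem[:, 100:102] += 3.0                          # a 2-cell-wide raised "road"

smoothed = median_filter(dem, size=5)

print("profile before:", np.round(dem[100, 98:104], 2))
print("profile after: ", np.round(smoothed[100, 98:104], 2))
# The narrow embankment is removed while the broad slope is preserved.
```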
Article
Besides the store of height points, the most important and useful information about topographic relief is the system of valley and ridge lines of the terrain. They are, in fact, the skeleton of the earth's surface and can be regarded as the structure lines of the relief. Geometrically speaking, these lines are the trajectories of the minima and maxima of the relief. Ridge lines are watersheds, on both sides of which the terrain descends, while on both sides of a valley line the ground ascends. The article demonstrates the feasibility of using a computer to find the valley and ridge line system of topographical relief. This may also prove to be a valuable tool in the field of analytical geomorphology. -from Author
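A crude NumPy sketch of the underlying idea is given below: cells that are higher (or lower) than both neighbours along a grid axis are flagged as ridge (or valley) candidates. This is a simplification for illustration only, not the article's algorithm.

```python
import numpy as np

def ridge_valley_candidates(dem):
    """Flag candidate ridge and valley cells in a DEM.

    A cell is a ridge candidate if it is higher than both neighbours along
    the row or the column direction, and a valley candidate if it is lower.
    This is only meant to illustrate the notion of local maxima/minima
    trajectories, not to trace connected structure lines.
    """
    c = dem[1:-1, 1:-1]
    up, down = dem[:-2, 1:-1], dem[2:, 1:-1]
    left, right = dem[1:-1, :-2], dem[1:-1, 2:]
    ridge = ((c > up) & (c > down)) | ((c > left) & (c > right))
    valley = ((c < up) & (c < down)) | ((c < left) & (c < right))
    return ridge, valley

# A small synthetic valley running down the middle column.
x = np.abs(np.arange(-5, 6))
dem = np.tile(x, (11, 1)).astype(float)
ridge, valley = ridge_valley_candidates(dem)
print(valley[:, 4])   # True along the central column of the interior
```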
Article
Digital Design and Computer Architecture takes a unique and modern approach to digital design. Beginning with digital logic gates and progressing to the design of combinational and sequential circuits, Harris and Harris use these fundamental building blocks as the basis for what follows: the design of an actual MIPS processor. SystemVerilog and VHDL are integrated throughout the text in examples illustrating the methods and techniques for CAD-based circuit design. By the end of this book, readers will be able to build their own microprocessor and will have a top-to-bottom understanding of how it works. Harris and Harris have combined an engaging and humorous writing style with an updated and hands-on approach to digital design. This second edition has been updated with new content on I/O systems in the context of general purpose processors found in a PC as well as microcontrollers found almost everywhere. The new edition provides practical examples of how to interface with peripherals using RS232, SPI, motor control, interrupts, wireless, and analog-to-digital conversion. High-level descriptions of I/O interfaces found in PCs include USB, SDRAM, WiFi, PCI Express, and others. In addition to expanded and updated material throughout, SystemVerilog is now featured in the programming and code examples (replacing Verilog), alongside VHDL. This new edition also provides additional exercises and a new appendix on C programming to strengthen the connection between programming and processor architecture.
Article
Stream burning is a common flow enforcement technique used to correct surface drainage patterns derived from digital elevation models (DEM). The technique involves adjusting the elevations of grid cells that are coincident with the features of a vector hydrography layer. This paper focuses on the problematic issues with common stream burning practices, particularly the topological errors resulting from the mismatched scales of the hydrography and DEM data sets. A novel alternative stream burning method is described and tested using five DEMs of varying resolutions (1 to 30 arc-seconds) for an extensive area of southwestern Ontario, Canada. This TopologicalBreachBurn method uses total upstream channel length (TUCL) to prune the vector hydrography layer to a level of detail that matches the raster DEM grid resolution. Network pruning reduces the occurrence of erroneous stream piracy caused by the rasterization of multiple stream links to the same DEM grid cell. The algorithm also restricts flow within individual stream reaches, further reducing erroneous stream piracy. In situations where two vector stream features occupy the same grid cell, the new tool ensures that the larger stream, designated by higher TUCL, is given priority. TUCL-based priority minimizes the impact of the topological errors that occur during the stream rasterization process on modeled regional drainage patterns. The test data demonstrated that TopologicalBreachBurn produces highly accurate and scale-insensitive drainage patterns and watershed boundaries. The drainage divides of four large watersheds within the study region that were delineated from the TopologicalBreachBurn-processed DEMs were found to be highly accurate when compared with the official watershed boundaries, even at the coarsest grid resolutions, with Kappa index of agreement values ranging from 0.952 to 0.921. The corresponding Kappa coefficient values for a traditional stream burning method (FillBurn) ranged from 0.953 to 0.490, demonstrating a significant decrease in mapping accuracy at coarser DEM grid resolutions.
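For orientation, the sketch below shows the basic stream-burning operation the paper builds on: lowering DEM cells coincident with rasterized hydrography by a fixed decrement. The TUCL-based pruning and reach-wise flow restriction of TopologicalBreachBurn are not reproduced here.

```python
import numpy as np

def burn_streams(dem, stream_mask, decrement=10.0):
    """Sketch of conventional stream burning: lower every DEM cell that is
    coincident with a rasterized vector stream by a fixed decrement so that
    derived flow paths follow the mapped hydrography. The decrement value is
    an illustrative assumption."""
    burned = dem.copy()
    burned[stream_mask] -= decrement
    return burned

# Toy 5x5 DEM with a stream running along the middle row.
dem = np.arange(25, dtype=float).reshape(5, 5)
stream_mask = np.zeros((5, 5), dtype=bool)
stream_mask[2, :] = True
print(burn_streams(dem, stream_mask))
```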
Article
In this paper we investigate the performance of different types of rectified activation functions in convolutional neural networks: the standard rectified linear unit (ReLU), the leaky rectified linear unit (Leaky ReLU), the parametric rectified linear unit (PReLU), and a new randomized leaky rectified linear unit (RReLU). We evaluate these activation functions on standard image classification tasks. Our experiments suggest that incorporating a non-zero slope for the negative part in rectified activation units could consistently improve the results. Our findings thus contradict the common belief that sparsity is the key to good performance in ReLU. Moreover, on small-scale datasets, using a deterministic negative slope or learning it are both prone to overfitting; they are not as effective as their randomized counterpart.
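The four activation variants compared in the paper can be summarized in a few lines of NumPy, as sketched below; the RReLU sampling is simplified to one slope per call rather than per activation, and the slope range is an illustrative choice.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    return np.where(x >= 0, x, slope * x)

def prelu(x, learned_slope):
    # In PReLU the negative slope is a learned parameter; here it is passed in.
    return np.where(x >= 0, x, learned_slope * x)

def rrelu(x, lower=1/8, upper=1/3, rng=None, training=True):
    # Randomized leaky ReLU: during training the negative slope is sampled
    # uniformly (here once per call, as a simplification); at test time the
    # midpoint of the range is used.
    if rng is None:
        rng = np.random.default_rng()
    slope = rng.uniform(lower, upper) if training else (lower + upper) / 2
    return np.where(x >= 0, x, slope * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), leaky_relu(x), prelu(x, 0.25),
      rrelu(x, rng=np.random.default_rng(0)), sep="\n")
```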
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
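For reference, a minimal NumPy sketch of the batch-normalizing transform (training-mode statistics only, without the running averages used at inference) is given below.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Sketch of the Batch Normalization transform for an (N, C) mini-batch:
    normalize each feature with the batch mean and variance, then apply the
    learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Features with very different scales are brought to zero mean, unit variance.
x = np.random.randn(64, 3) * np.array([0.1, 10.0, 100.0]) + np.array([5.0, -3.0, 50.0])
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(np.round(y.mean(axis=0), 3), np.round(y.std(axis=0), 3))
```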
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has low memory requirements, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
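A compact NumPy sketch of a single Adam update, following the moment estimates and bias corrections described above, is given below; the toy quadratic objective is only for demonstration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and of
    its element-wise square (v), bias-corrected, then a per-parameter step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)          # bias correction for the second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = ||theta||^2; the gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
print(np.round(theta, 3))   # both coordinates end up close to the minimum at the origin
```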
Article
Deep Convolutional Neural Networks (DCNNs) have recently shown state-of-the-art performance in high-level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high-level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy that is beyond previous methods. Quantitatively, our method sets the new state-of-the-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 66.4% IOU accuracy on the test set. We show how these results can be obtained efficiently: careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
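The sketch below mirrors the overall layout described above (five convolutional layers with max pooling, then three fully connected layers with dropout and a 1000-way output) in PyTorch; the single-branch design omits the original two-GPU split and local response normalization, and the filter sizes follow common re-implementations rather than the paper verbatim.

```python
import torch
import torch.nn as nn

# A single-branch sketch of the AlexNet layout: five convolutional layers
# (some followed by max pooling), then three fully connected layers with
# dropout and a 1000-way output. Details of the original model are omitted.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),
)

logits = alexnet_like(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```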
Article
Key Points: Drainage network extraction methods should not rely on contributing area. Drainage networks can be extracted from digital data using just two parameters. The method works well on different landscapes using the same parameter values.