Article

Flood hazard risk assessment model based on random forest


... According to the data, samples in which floods did not occur are treated as the areas with the lowest risk level. This selection of risk categories follows previous investigations, in which areas flooded at the time of the event were classified as medium- and high-risk areas [17,57,58]. The number and distribution of the sample data in the study area are shown in Figure 3. ...
... In the RF method, each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [59]. The algorithm is currently one of the most widely used classifiers for high-dimensional as well as multi-modal data [57,60]. The RF algorithm is also highly effective at reducing the overfitting frequently reported for the Decision Tree (DT) model. ...
... In this study, to reduce the possibility of overfitting, we used five folds for the RF cross-validation, chosen among other candidate k-fold values. Similar investigations using ML approaches have shown that five folds are generally sufficient to limit overfitting [17,43,44,57]. After the training and testing phases, the RF model was applied to all pixels (206,216). ...
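The five-fold setup described in this excerpt can be reproduced in a few lines. The sketch below is illustrative only, not the authors' code: it assumes scikit-learn, and the feature matrix and flood labels are synthetic placeholders.

```python
# Minimal sketch of five-fold cross-validation for a random forest
# flood classifier (synthetic data; illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.random((500, 11))          # hypothetical flood-risk indices per sample
y = rng.integers(0, 2, size=500)   # 1 = flooded sample, 0 = non-flooded sample

rf = RandomForestClassifier(n_estimators=500, random_state=42)
scores = cross_val_score(rf, X, y, cv=5)   # five folds, as in the studies cited
print(f"Mean CV accuracy: {scores.mean():.3f}")
```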
Article
Full-text available
Detecting effective parameters in flood occurrence is one of the most important issues that have drawn increasing attention in recent years. Remote Sensing (RS) and Geographical Information System (GIS) are two efficient tools for spatial Flood Risk Mapping (FRM). In this study, a web-based platform called the Google Earth Engine (GEE) (Google Company, Mountain View, CA, USA) was used to obtain flood risk indices for the Galikesh River basin, Northern Iran. With the aid of Landsat 8 satellite imagery and the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), 11 risk indices (Elevation (El), Slope (Sl), Slope Aspect (SA), Land Use (LU), Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI), Topographic Wetness Index (TWI), River Distance (RD), Waterway and River Density (WRD), Soil Texture (ST), and Maximum One-Day Precipitation (M1DP)) were provided. In the next step, all of these indices were imported into ArcMap 10.8 (Esri, West Redlands, CA, USA) software for index normalization and to better visualize the graphical output. Afterward, an intelligent learning machine (Random Forest (RF)), which is a robust data mining technique, was used to compute the importance degree of each index and to obtain the flood hazard map. According to the results, the indices of WRD, RD, M1DP, and El accounted for about 68.27 percent of the total flood risk. Among these indices, the WRD index, accounting for about 23.8 percent of the total risk, has the greatest impact on floods. According to the FRM, about 21 and 18 percent of the total area fell within the high-risk and highest-risk classes, respectively.
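For readers unfamiliar with how such per-index importance degrees are derived, the following is a hedged sketch (assuming scikit-learn; the index names follow the abstract, while the training data are synthetic placeholders, not the study's):

```python
# Illustrative sketch: impurity-based importance of flood-risk indices
# from a fitted random forest (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

indices = ["El", "Sl", "SA", "LU", "NDVI", "NDWI", "TWI", "RD", "WRD", "ST", "M1DP"]
rng = np.random.default_rng(0)
X = rng.random((1000, len(indices)))
y = rng.integers(0, 2, size=1000)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
for name, imp in sorted(zip(indices, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")   # importances sum to 1 across all indices
```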
... Commonly, machine-learning (ML; Breiman, 1984) models are used, often ensembled with multi-criteria decision-making techniques (Triantaphyllou et al., 2000; Ho et al., 2010). Some authors (Degiorgis et al., 2012; Gnecco et al., 2017) have tested a blend of GDs, while others mixed these indices with information on land use, soil geology, and climate and compared different combination strategies (e.g., Wang et al., 2015; Lee et al., 2017; Khosravi et al., 2018; Arabameri et al., 2019; Janizadeh et al., 2019; Costache et al., 2020). These studies suggest that data-driven flood hazard mapping has remarkable potential. ...
... First, previous studies (e.g., Manfreda et al., 2015; Samela et al., 2017) clearly showed that D and HAND are the most descriptive single-feature indices for flood hazard mapping, sufficiently accurate in mountainous regions but still inadequate over predominantly flat areas, whereas, among composite-feature indices, GFI and LGFI show good performance in both geographical contexts. Also, in several studies (e.g., Wang et al., 2015; Lee et al., 2017; Khosravi et al., 2018; Janizadeh et al., 2019; Costache et al., 2020), elevation retrieved from DEMs is shown to have a strong influence on flood occurrence. Slope appears to be the most important index in Khosravi et al. (2018) and Costache et al. (2020) and among the most influential ones in Arabameri et al. (2019). ...
... The second research question of the present study is whether it is possible to obtain a good estimation of flood hazard by combining multiple GDs with low-complexity machine-learning models. Unlike several other contributions in the literature, we focus neither on model complexity nor on the comparison of different models (Wang et al., 2015; Khosravi et al., 2018; Mosavi et al., 2018; Arabameri et al., 2019; Costache et al., 2020). Instead, we select one simple model type (i.e., decision trees, DTs) and focus on the combination of the five innovative elements listed in the "Introduction" section; in this way, we can analyse how the preliminary data pre-processing steps (i.e., selection and manipulation of input features, target maps, training set, and test set) influence the multivariate DEM-based approach. ...
Article
Full-text available
Recent literature shows several examples of simplified approaches that perform flood hazard (FH) assessment and mapping across large geographical areas on the basis of fast-computing geomorphic descriptors. These approaches may consider a single index (univariate) or use a set of indices simultaneously (multivariate). What are the potential and accuracy of multivariate approaches relative to univariate ones? Can we effectively use these methods for extrapolation purposes, i.e., FH assessment outside the region used for setting up the model? Our study addresses these open problems by considering two separate issues: (1) mapping flood-prone areas and (2) predicting the expected water depth for a given inundation scenario. We blend seven geomorphic descriptors through decision tree models trained on target FH maps, referring to a large study area (∼ 10⁵ km²). We discuss the potential of multivariate approaches relative to the performance of a selected univariate model and on the basis of multiple extrapolation experiments, where models are tested outside their training region. Our results show that multivariate approaches may (a) significantly enhance flood-prone area delineation (accuracy: 92 %) relative to univariate ones (accuracy: 84 %), (b) provide accurate predictions of expected inundation depths (determination coefficient ∼ 0.7), and (c) produce encouraging results in extrapolation.
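The two tasks named in this abstract map onto a classifier and a regressor. As a hedged illustration only (not the authors' implementation), the sketch below assumes scikit-learn and uses synthetic stand-ins for the seven geomorphic descriptors:

```python
# Illustrative sketch of the paper's two tasks with low-complexity
# decision trees: (1) classify flood-prone pixels, (2) regress the
# expected water depth on the flood-prone subset (synthetic data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.random((2000, 7))                  # seven geomorphic descriptors
flood_prone = rng.integers(0, 2, 2000)     # task 1 target: flood-prone yes/no
depth = rng.random(2000) * 3.0             # task 2 target: inundation depth (m)

clf = DecisionTreeClassifier(max_depth=8).fit(X, flood_prone)
reg = DecisionTreeRegressor(max_depth=8).fit(X[flood_prone == 1],
                                             depth[flood_prone == 1])
```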
... A key concern in flood risk assessment is to explore flood-inducing factors, as floods generally arise from multiple drivers. The selection of factors is related to flood type, spatial scale, characteristics of the study area, etc. [20][21][22]. Another key issue involved in the multi-criteria analysis is how to determine weights of flood-inducing factors. ...
... In general, meteorological factors such as rainfall intensity, rainfall frequency, etc. are the main causes of flood disasters, while topographical and geomorphic factors such as river network area, slope, infiltration capacity, etc. also have a great impact on flood disasters [53][54][55]. After a detailed analysis of the literature [20,22,56,57] and of the climate, topography, and socio-economic conditions of the Poyang Lake Basin, this study collected ten relevant indices covering both the hazard of flood-inducing factors and the vulnerability of the hazard-bearing body. These ten factors were then divided into six hazard factors and four vulnerability factors according to how they relate to flooding. ...
... RF refers to the annual number of rainstorm days; ARA refers to the annual average total rainstorm amount. The DEM contains topographic information for the basin; the lower the elevation, the more flood-prone the area [22]. The elevation (Fig. 2d) was divided into five classes. ...
Article
China suffers the world's most frequent floods and the most serious associated losses of life and property. In this study, a multi-criteria analysis model combining the analytic hierarchy process and the entropy weight method (AHP-Entropy) was proposed to assess the long- and short-term flood risk in the Poyang Lake basin, and the results were verified against several flood events that happened in July 2020. Considering the multiple factors of flood risk, six flood hazard factors (namely, maximum three-day rainfall (RMAX3), annual average rainstorm frequency (RF), annual average rainstorm amount (ARA), drainage density (DD), slope, and elevation (DEM)) and four flood vulnerability factors (namely, population density (PD), land use pattern (LUP), GDP, and normalized difference vegetation index (NDVI)) were selected, and their weights were derived from the AHP-Entropy method. Results show that PD (0.168), RMAX3 (0.163), LUP (0.146), GDP (0.129), and RF (0.111) play a vital role in the flood risk assessment results. Spatially, the long- and short-term flood risk maps show similar characteristics, with a correlation coefficient of 0.9056. Areas with high and very high risk account for 19.6% of the total area in the long-term flood risk map and increase to 22.2% in the short-term flood risk map. Overall, the northeastern parts of the Poyang Lake basin are more prone to floods, and the flood risk gradually decreases from Poyang Lake towards the surrounding areas. Verification of the results with Sentinel-1 synthetic aperture radar data shows that the model attains an accuracy of more than 50% in very-high-risk zones and more than 90% across high- and very-high-risk zones, indicating that the presented model is reliable for flood risk assessment.
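The entropy-weight half of the AHP-Entropy scheme is purely mechanical and can be sketched compactly. The code below is a hedged illustration of the standard entropy weight method, not the study's implementation; the factor matrix is a synthetic placeholder:

```python
# Entropy weight method sketch: column-normalize the factor matrix,
# compute each factor's information entropy, convert to a weight.
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((100, 10)) + 1e-9       # 100 spatial units x 10 risk factors (>0)
P = X / X.sum(axis=0)                  # share of each unit within a factor
k = 1.0 / np.log(X.shape[0])
E = -k * (P * np.log(P)).sum(axis=0)   # information entropy of each factor
w = (1 - E) / (1 - E).sum()            # entropy weights, summing to 1
print(w)
```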
... The authors used the International River Interface Cooperative software (iRIC) model with the FaSTMECH (Flow and Sediment Transport with Morphological Evolution of Channel) solver [114] for hydrodynamic modeling, which was calibrated and used to train the AI models. Seven events with different flow magnitudes (10, 50, 95, 120, 150, 300, and 400 m³/s) were used for training, and five events with different flow magnitudes (20, 30, 45, 225, and 350 m³/s) were used for testing. This approach was evaluated on the Green River in Utah, USA and reduced the simulation time by a factor of 60 with satisfactory prediction performance. ...
... AI models have also been used for regional-scale flood hazard risks. The RF model was used for regional-scale categorical flood hazard risk assessments over 27,363 km² with 5000 sample points in the Dongjiang River Basin in China [120]. The predictors included disaster-inducing factors (M3DP, TF, RD) and disaster-breeding environmental factors (SL, DEM, DTR, NDVI, LUP, ST, TWI, SPI). ...
... Similarly, optimized GBoost, XGBoost, RF, SVM, MLP, and Convolutional Neural Network (CNN) models were used to develop a flood risk map identifying regions with low, moderate, high, and highest risk in the Pearl River Delta in China, based on information obtained from flood risk inventory maps [121]. Unlike [120], the study in [121] also included disaster-bearing body factors in the AI-based decisions. Using GBoost, XGBoost, RF, SVM, MLP, and CNN, the authors evaluated flood risk using disaster-inducing factors (M3HP, M1DP, DPE25, TF), disaster-breeding environmental factors (DEM, SL, DTR, RD, TWI, CN), and disaster-bearing body factors (PD, GDPD). ...
Article
This review focuses on the use of Interpretable Artificial Intelligence (IAI) and eXplainable Artificial Intelligence (XAI) models for data imputations and numerical or categorical hydroclimatic predictions from nonlinearly combined multidimensional predictors. The AI models considered in this paper involve Extreme Gradient Boosting, Light Gradient Boosting, Categorical Boosting, Extremely Randomized Trees, and Random Forest. These AI models can transform into XAI models when they are coupled with explanatory methods such as Shapley additive explanations and local interpretable model-agnostic explanations. The review highlights that IAI models are capable of unveiling the rationale behind the predictions, while XAI models are capable of discovering new knowledge and justifying AI-based results, which are critical for enhanced accountability of AI-driven predictions. The review also elaborates on the importance of domain knowledge and interventional IAI modeling, the potential advantages and disadvantages of hybrid IAI and non-IAI predictive modeling, the unequivocal importance of balanced data in categorical decisions, and the choice and performance of IAI versus physics-based modeling. The review concludes with a proposed XAI framework to enhance the interpretability and explainability of AI models for hydroclimatic applications.
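The IAI-to-XAI coupling described here is typically a one-line step in practice. The sketch below is a hedged illustration assuming the `shap` package and scikit-learn; the model and data are synthetic placeholders, not examples from the review:

```python
# Coupling a tree ensemble with Shapley additive explanations (SHAP).
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.random((300, 6))
y = X[:, 0] * 2 + X[:, 1] ** 2 + rng.normal(0, 0.1, 300)

model = RandomForestRegressor(n_estimators=200, random_state=3).fit(X, y)
explainer = shap.TreeExplainer(model)     # efficient explainer for tree ensembles
shap_values = explainer.shap_values(X)    # per-sample, per-feature attributions
print(np.abs(shap_values).mean(axis=0))   # mean |SHAP| as a global importance ranking
```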
... Similarly, another study, which investigated inpatient satisfaction at different public hospitals in China by considering the relationship between different influencing factors and overall satisfaction, found that the prediction accuracy of RF is much higher than that of multiple regression and naïve Bayesian models [13]. Wang et al. [24] found RF to be highly efficient over a large database when analyzing flood risk through various risk index variables. However, the error rate for both training and testing data can be improved by increasing the sample size and the number of classification trees. ...
... According to previous studies, RF-based prediction models can provide better predictive results over large databases than other machine learning models [9,13,24]. The results revealed that the RF simulation predicted household evacuation preparation time with an accuracy of approximately 86%, an efficient accuracy compared with traditional methods, supporting the use of a similar simulation approach. ...
Article
Full-text available
Household evacuation preparation time is important to ensure safe and successful evacuations and is essential for the estimation of the total evacuation time during a disaster. Previous research has shown that machine learning can provide a higher prediction accuracy, especially using the random forest model. However, no studies have investigated predictions of household evacuation preparation time considering the safe evacuation of coastal communities during cyclone disasters. This study proposes a methodology to predict household evacuation preparation time from demographic and behavioral input variables using a random forest algorithm, focusing on cyclones. In addition, this research analyzes variable importance and partial dependence plots to identify the key influential factors that affect household evacuation preparation time. A case study was conducted in Gabura Union, Shaymnagar Upzila in Bangladesh, regarding cyclone Bulbul in 2019, to gather demographic and behavioral data for a preparation-time simulation. The prediction results demonstrated an efficient assessment of household evacuation preparation time, meriting application to future disasters. Our results show that the most important factors that impact household evacuation preparation time are evacuation companions and age, followed by shelter distance, income, and shelter type. The results of the prediction model can assist emergency response and evacuation planners and national disaster management authorities in developing and improving effective evacuation plans that take household evacuation preparation time into consideration for future disasters.
... Some studies have indicated that machine-learning models can be a reliable method for hydrological studies [25,26]. For example, the random forest model, as one of the machine-learning methods, has been identified as a useful tool to analyze the impact of different drivers on hydrological processes and variable importance in flood forecasting [27]. However, quantifying the impact of climate change and human activities on runoff changes with machine-learning methods remains little used and challenging. ...
... Although it requires relatively simple, long-term hydro-climatological data, the elasticity-based method has been widely employed due to its physically realistic features [19,21]. The random forest model, as one of the machine-learning methods, is good at solving nonlinear problems and is increasingly employed in hydrology research [24,27]. Furthermore, it can flexibly assemble more kinds of drivers of runoff changes. ...
Article
Full-text available
Quantifying the impact of climate change and human activities on runoff changes is beneficial for developing sustainable water-management strategies within the local ecosystem. Machine-learning models are widely used in scientific research; yet whether they are applicable for quantifying the contribution of climate change and human activities to runoff changes is not well understood. To provide a new pathway, we quantified the contribution of climate change and human activities to runoff changes using a machine-learning method (the random forest model) in two semi-humid basins in this study. Results show that the random forest model provides good performance for runoff simulation; the contributions of climate change and human activities to runoff changes from 1982 to 2014 were found to be between 6-9% and 91-94% in the Zijinguan basin, and 31-44% and 56-69% in the Daomaguan basin, respectively. Furthermore, the model performances were also compared with those of the well-known elasticity-based and double-mass curve methods, and the results of these models are approximate in the investigated basins, which implies that the random forest model has the potential for runoff simulation and for quantifying the impact of climate change and human activities on runoff changes. This study provides a new methodology for studying the impact of climate change and human activities on runoff changes, and the limited number of parameters makes this methodology important for further applications to other basins elsewhere. Nevertheless, the physical interpretation should be made with caution, and more comprehensive comparison work must be performed to assess the model's applicability.
... Machine learning-based methods have shown advantages for the analysis of rainfall-runoff pollution, and with the development of data science, various machine learning methods have been explored and developed to predict rainfall-runoff pollution in urban rivers (Jeung et al., 2019). These methods include random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost), which have been applied to analyze the relationships between rainfall characteristics and runoff pollution (Wu et al., 2014; Wang et al., 2015). RF algorithms have been used to rank the importance of multiple rainfall characteristics affecting the initial scouring effect of river runoff, revealing the following six most important characteristics: total rainfall amount; maximum rainfall intensity in 5 min; rainfall duration; total amount of runoff; peak runoff; and average rainfall intensity (Alias et al., 2014; Perera et al., 2019). ...
... Although several previous studies have analyzed rainfall characteristics and their qualitative effects on rainfall-runoff pollution, few studies have quantitatively analyzed the effects of rainfall characteristics on runoff pollution. Neural network models, including artificial neural network (ANN), convolutional neural network, and back-propagation neural network models, can consider multiple rainfall features together to predict rainfall-runoff pollution (Wu et al., 2014; Wang et al., 2015; Chau, 2017; Fotovatikhah et al., 2018). Some researchers have used neural network models to generalize the complex relationships between rainfall characteristics and water quality parameters to enhance the accuracy of rainfall-runoff simulation and prediction (Fernandes et al., 2020). ...
Article
Full-text available
Climate change and rapid urbanization have made it difficult to predict the risk of pollution in cities under different types of rainfall. In this study, a data-driven approach to quantify the effects of rainfall characteristics on river pollution was proposed and applied in a case study of the Shiyan River, Shenzhen, China. The results indicate that the most important factor affecting river pollution is the dry period, followed by average rainfall intensity, maximum rainfall in 10 min, total amount of rainfall, and initial runoff intensity. In addition, an artificial neural network model was developed to predict the event mean concentration (EMC) of COD in the river based on the correlations between rainfall characteristics and EMC. Compared with light rain (< 10 mm/day), the predicted EMC was five times lower under heavy rain (25–49.9 mm/day) and two times lower under moderate rain (10–24.9 mm/day). By converting the EMC to chemical oxygen demand in the river, the pollution load under non-point-source runoff was estimated to be 497.6 t/year (with an accuracy of 95.98%) in the Shiyan River under typical rainfall characteristics. The results of this study can be used to guide urban rainwater utilization and engineering design in Shenzhen. The findings also provide insights for predicting the risk of rainfall-runoff pollution and developing related policies in other cities.
... Point-based data-driven models such as logistic regression (Al-Juaidi et al. 2018), the statistical index (Wi) method (Tehrany et al. 2019), random forests (RF) (Wang et al. 2015;Lee et al. 2017;Chen et al. 2020), support vector machines (SVM) (Tehrany et al. 2014;Tehrany et al. 2015), and artificial neural networks (ANN) (Bui et al. 2020) have been used as alternatives to map flood susceptibility for large areas. They can incrementally create high-level features from a raw dataset, and capture complex patterns in the dataset (Bui et al. 2020). ...
... Therefore, flood susceptibility mapping should consider both the spatial and temporal precipitation patterns (Wang et al. 2015; Zhao et al. 2018). To that end, we selected the annual maximum daily precipitation (AP) and the frequency of ...
Preprint
Full-text available
Identifying urban pluvial flood-prone areas is necessary, but the application of two-dimensional hydrodynamic models is limited to small areas. Data-driven models have been showing their ability to map flood susceptibility, but their application in urban pluvial flooding is still rare. A flood inventory (4333 flooded locations) and 11 factors which potentially indicate an increased hazard for pluvial flooding were used to implement convolutional neural network (CNN), artificial neural network (ANN), random forest (RF) and support vector machine (SVM) models to: (1) Map flood susceptibility in Berlin at 30, 10, 5, and 2 m spatial resolutions. (2) Evaluate the trained models' transferability in space. (3) Estimate the most useful factors for flood susceptibility mapping. The models' performance was validated using the Kappa statistic and the area under the receiver operating characteristic curve (AUC). The results indicated that all models perform very well (minimum AUC = 0.87 for the testing dataset). The RF models outperformed all other models at all spatial resolutions, and the RF model at 2 m spatial resolution was superior for the present flood inventory and predictor variables. The majority of the models had a moderate performance for predictions outside the training area based on the Kappa evaluation (minimum AUC = 0.8). Aspect and altitude were the most influential factors on the image-based and point-based models, respectively. Data-driven models can be a reliable tool for urban pluvial flood susceptibility mapping wherever a reliable flood inventory is available.
... SOM is utilized for clustering, dimension reduction in various applications, and visualization of high-dimensional data (Kohonen, 2001). SOM comprises an input layer and an output layer, with each neuron in the output layer arranged on a hexagonal grid and linked by a weight vector (Wang et al., 2015). An unsupervised network learns a nonlinear mapping from a high-dimensional input layer to a low-dimensional grid pattern in the output layer (Olkowska et al., 2014; Farmaki et al., 2013). ...
... Prediction performance can be enhanced by combining many classification trees into a forest. Here, a randomized subset of the variables of interest is used for splitting each tree, and the final result of RF is the average outcome of all trees (Pal 2005; Wang et al. 2015). The random selection of features at each node decreases the correlation among trees in the forest, consequently reducing the error rate of the forest. ...
Article
Full-text available
Efficient methods are necessary for the interpolation of precipitation data in geospatial systems. In recent years, there has been an increasing need to complete rainfall data networks. Reliable missing-data estimation is significant for hydrologists, meteorologists, and environmentalists. A study was conducted in the Cachar watershed, Assam state (India), on the imputation of missing precipitation data for nineteen rain gauging stations using K-nearest neighbor (KNN), self-organizing maps (SOM), random forest (RF), and feed-forward neural network (FNN) models. Performance indices such as root mean squared error (RMSE), determination coefficient (R²), and mean absolute error (MAE) were used to assess model efficacy. The performance indices show MAE (0.043), R² (0.999), and RMSE (0.066) for FNN, demonstrating its effectiveness in imputing missing precipitation data compared with KNN, SOM, and RF, especially in regions with extreme missingness. The results of this research are valuable for selecting the best techniques to estimate precipitation data and reduce data gaps in complex watersheds like Cachar. For all stations, the performance indices of the proposed models fell within the standard range for hydrological modeling. This study can be well utilized for water resources management and hydrological modeling.
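Gap-filling a gauge network of this kind is a standard imputation task. As a hedged sketch in the spirit of the KNN variant compared above (not the study's code), scikit-learn's KNNImputer fills each missing record from the k most similar records; the station matrix here is a synthetic placeholder:

```python
# KNN imputation of a rain-gauge matrix (rows = days, columns = gauges).
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(4)
rain = rng.gamma(2.0, 5.0, size=(365, 19))   # 19 hypothetical gauging stations
mask = rng.random(rain.shape) < 0.05         # ~5% of records go missing
rain[mask] = np.nan

filled = KNNImputer(n_neighbors=5).fit_transform(rain)  # NaNs replaced
```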
... The following lessons are learned from the feature importance assessment: 1) As the primary flood driver, precipitation shows notable importance for all levels of flood events, which is consistent with Wang et al. (2015). 2) It is important to increase flood insurance penetration (Horn and Webel 2019) in vulnerable areas because the contribution of effective numbers of policies is stable at all damage levels. ...
Article
Full-text available
Each year throughout the contiguous United States (CONUS), flood hazards cause damage amounting to billions of dollars in homeowner insurance claims. As climate change threatens to raise the frequency and severity of flooding in vulnerable areas, the ability to predict the number of property insurance claims resulting from flood events becomes increasingly important to flood resilience. Based on random forest, we develop a flood property Insurance Claims model (iClaim) by fusing records from the National Flood Insurance Program (NFIP), including building locations, topography, basin morphometry, and land cover, with data from multiple sources of hydrometeorological variables, including flood extent, precipitation, and operational river-stage and oceanic water-level measurements. The model utilizes two steps (damage level classification and claim number regression) and subsampling strategies designed accordingly to reduce overfitting and underfitting caused by the flood claim samples, which are unevenly distributed and widely ranged. We evaluate the model using 446,446 grid samples identified from 589 flood events occurring from 2016 to 2019 over CONUS, overlapping 258,159 claims out of a total of 287,439 NFIP records of the same period. Our rigorous validation yields acceptable performance at the grid/event, county/event, and event accumulative level, with R² over 0.5, 0.9, and 0.95, respectively. We conclude that the iClaim model can be used in many application scenarios, including assessing flood impact and improving flood resilience.
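The two-step design named in the abstract (classify the damage level first, then regress claim counts) can be sketched generically. This is an illustrative sketch only, not the iClaim implementation: the features, labels, and claim counts are synthetic, and the paper's tailored subsampling is omitted.

```python
# Two-stage sketch: damage-level classification, then claim regression
# on the damaged subset (synthetic data; illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.random((5000, 12))                   # hydrometeorological features
level = rng.integers(0, 3, 5000)             # 0 = none, 1 = low, 2 = high damage
claims = np.where(level > 0, rng.poisson(4, 5000), 0)

clf = RandomForestClassifier(n_estimators=300, random_state=5).fit(X, level)
damaged = level > 0                          # regress only where damage occurred
reg = RandomForestRegressor(n_estimators=300, random_state=5).fit(
    X[damaged], claims[damaged])
```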
... Therefore, they are widely used in regional-scale flood analysis. At present, as sampling models, machine learning models (i.e., support vector machine [19], random forest [20], artificial neural network [21], and decision tree [22]) and ...
Article
Full-text available
Climate change, population increase, and urban expansion have increased the risk of flooding. Therefore, accurately identifying future changing patterns in flood risk is essential. For this purpose, this study elaborated a new basin-scale framework that employs a future land-use simulation model, a factor spatialization technique, and a novel hybrid model for scenario-based flood risk assessment in 2030 and 2050. Three land-use scenarios (i.e., natural growth scenario, cropland protection scenario, and ecological protection scenario) were set and applied in the Jinjiang Basin to explore the changes in future flood risk under these scenarios. The results indicate different degrees of increase in flood risk under the three scenarios. Under the natural growth (NG) scenario, the city will expand rapidly with the growth of population and economy, and the total area with high and very high flood risk will increase by 371.30 km² by 2050, as compared to 2020. However, under the ecological protection (EP) scenario, woodlands will be protected, and the growth in population, economy, and built-up lands will slow down with a slightly increased risk of flooding. In this scenario, the total area with high and very high flood risk will increase by 113.75 km² by 2050. Under the cropland protection (CP) scenario, the loss of croplands will have been effectively stopped, and the flood risk will not show a significant increase, growing by only 90.96 km² by 2050, similar to the EP scenario. Spatially, these increased flood risks are mainly located at the periphery of existing built-up lands, and the high-flood-risk zones are mainly distributed in the southeast of the Jinjiang Basin. The information about increasing flood risk determined by the framework provides insight into the spatio-temporal characteristics of future flood-prone areas, which facilitates reasonable flood mitigation measures to be developed at the most critical locations in the region.
... The recent success of machine learning (ML) models lies in their ability to not only account for nonlinearity issues related to physical processes, but also make it easier to model them at reduced costs [17]. Recent advances in ML techniques have made a considerable contribution to the enhancement of predictive flood hazard mapping [18]. Using ML algorithms, the limitations of traditional approaches can be addressed and the accuracy of predictions greatly improved [19]. ...
Article
Full-text available
Purpose The purpose of the paper is to predict and map areas vulnerable to flooding in the Ourika watershed in the High Atlas of Morocco, with the aim of providing a useful tool capable of helping in the mitigation and management of floods in the associated region, as well as in Morocco as a whole. Design/methodology/approach Four machine learning (ML) algorithms, including k-nearest neighbors (KNN), artificial neural network, random forest (RF) and x-gradient boost (XGB), are adopted for modeling. Additionally, 16 predictors, divided into categorical and numerical variables, are used as inputs for modeling. Findings The results showed that RF and XGB were the best performing algorithms, with AUC scores of 99.1 and 99.2%, respectively. Conversely, KNN had the lowest predictive power, scoring 94.4%. Overall, the algorithms predicted that over 60% of the watershed was in the very low flood risk class, while the high flood risk class accounted for less than 15% of the area. Originality/value Studies on predictive flood modeling using AI tools, including ML, are limited, if not non-existent, in the region, making this study intriguing.
... This model is known as the ensemble machine learning model because RF combines forecasting results from each sample to get the final prediction (the general description of the RF model is provided in section 2g). RF has emerged as an alternative forecasting technique in many fields, especially for hydroclimatologic variables, for example, regional flood hazard risk forecasting (Wang et al. 2015), drought forecasting (Chen et al. 2012), Australian winter rainfall forecasting (Firth et al. 2005), reservoir inflow forecasting (Nguyen 2016), daily water levels forecasting (Nguyen et al. 2016), and daily and monthly rainfall forecasting (Monira et al. 2010;Taksande and Mohod 2015). ...
Article
Globally, extreme rainfall has intense impacts on ecosystems and human livelihoods. However, no effort has yet been made to forecast extreme rainfall indices through machine learning techniques. In this paper, a new extreme rainfall indices forecasting model is proposed using the Random Forest (RF) model to provide effective forecasts of monthly extreme rainfall indices. In addition, RF feature importance is proposed in this study to identify the most and least important features for the proposed model. This study forecasts only statistically significant extreme rainfall indices over Bangladesh, including consecutive dry days (CDD), the number of heavy rain days (R10mm; rainfall ≥ 10 mm), and the number of heavy rain days (R20mm; rainfall ≥ 20 mm), within a 1–3 month lead time. The proposed model uses monthly antecedent CDD, R10mm, and R20mm together with atmospheric parameters and ocean-atmospheric teleconnections, namely convective available potential energy (CAPE), relative humidity (RH), air temperature (TEM), El Niño-Southern Oscillation (ENSO), Indian Ocean Dipole (IOD), and North Atlantic Oscillation (NAO), as the inputs to the model. Results show that the proposed model yields the best performance in forecasting CDD, R10mm, and R20mm with only the antecedents of these indices as input. Ocean-atmospheric teleconnections (IOD, ENSO, and NAO) are useful for CDD forecasting, and local atmospheric parameters (CAPE, RH, and TEM) are useful for R10mm and R20mm forecasting. The results suggest that adding atmospheric parameters and ocean-atmospheric teleconnections is useful for forecasting extreme rainfall indices.
... Besides, it helps to take the appropriate measures to prevent a disaster or reduce its possible effects. Wang et al. [217] evaluate the flood risk in Dongjiang River Basin, China. They introduce a risk assessment method based on a machine learning algorithm called "random forest". ...
Thesis
This doctoral thesis introduces the capacitated vehicle routing problem with time-dependent demands (CVRP-TDD), the robust CVRP-TDD, and the cumulative CVRP-TDD in humanitarian logistics. These problems support the planning of relief-kit distribution after a disaster, which arises when victims flee shelters due to a lack of aid, spreading chaos into other territories. In the CVRP-TDD, the distribution must guarantee a maximum number of satisfied people at the shelters before their departure. An Integer Linear Program (ILP), a Branch-and-Price (B&P) algorithm, and an upper bound are developed. Two dominance rules and an acceleration technique are also proposed. In addition, a metaheuristic framework exploring two solution spaces is implemented, introducing the following methods: GRASP, ILS, ELS, GRASPxELS, and GRASPxILS. The robust CVRP-TDD assumes that the number of people fleeing the shelters is uncertain. The ILP and B&P used previously are adapted to maximize the number of survivors served at their shelter in the worst-case scenario. The cumulative CVRP-TDD considers the outflow from critical shelters as well as the inflow into non-critical shelters. In this case, the objective is to minimize the sum of arrival times at the critical shelters. An ILP, a two-phase heuristic, and a lower bound are presented.
... Simply put, random forest combines many decision trees to produce a more accurate and stable prediction [28]. ...
Conference Paper
Full-text available
Machine Learning (ML) models for flood prediction can be beneficial for flood alerts and flood reduction or prevention. To that end, machine-learning (ML) techniques have gained popularity due to their low computational requirements and reliance mostly on observational data. This study aimed to create a machine learning model that can predict floods in Kebbi state based on a historical rainfall dataset of thirty-three (33) years, so that it can be used in other Nigerian states with high flood risk. In this article, the Accuracy, Recall, and Receiver Operating Characteristic (ROC) scores of three machine learning algorithms, namely Decision Tree, Logistic Regression, and Support Vector Classification (SVC), were evaluated and compared. Logistic Regression, when compared with the other two algorithms, gives more accurate results and provides high accuracy and recall. In addition, the Decision Tree outperformed the Support Vector Classifier and performed reasonably well due to its above-average accuracy and below-average recall scores. We discovered that Support Vector Classification performed poorly with a small dataset, with a recall score of 0, a below-average accuracy score, and a distinctly average ROC score.
... Many studies have used images from the Moderate-Resolution Imaging Spectroradiometer (MODIS) sensors for large-scale crop extraction [8], but the identified crop maps often had large uncertainties because of the mixture of land cover types within coarse- to medium-resolution imagery pixels. Since the launch of Landsat and HJ-1A/1B, fine-spatial-resolution images have been used in crop monitoring [9,10]. Thanks to their high spatial and temporal resolution, the Sentinel data series have also been applied in crop extraction. ...
Article
Full-text available
Timely and accurate information on cotton planting areas is essential for monitoring and managing cotton fields. However, there is no large-scale, high-resolution method suitable for mapping cotton fields, and the problems associated with low resolution and poor timeliness need to be solved. Here, we propose a new framework for mapping cotton fields based on Sentinel-1/2 data for different phenological periods, random forest classifiers, and the multi-scale image segmentation method. A cotton field map for 2019 at a spatial resolution of 10 m was generated for northern Xinjiang, a dominant cotton planting region in China. The overall accuracy and kappa coefficient of the map were 0.932 and 0.813, respectively. The results showed that the boll opening stage was the best phenological phase for mapping cotton fields, and cotton fields were identified most accurately at the early boll opening stage, about 40 days before harvest. Additionally, Sentinel-1 and the red edge bands in Sentinel-2 are important for cotton field mapping, and there is great potential for the fusion of optical and microwave images in crop mapping. This study provides an effective approach for high-resolution and high-accuracy cotton field mapping, which is vital for the sustainable monitoring and management of cotton planting.
... Existing studies have demonstrated that large-scale atmospheric circulation factors have a strong influence on weather regimes, climate change, and hydrological variations (Hoerling et al., 2016; Xiao et al., 2016; Deng et al., 2018). South China has been recognized to be mainly influenced by the El Niño-Southern Oscillation. Random forest (RF) has been verified to be applicable for measuring variable importance and has performed well compared with many methods (Strobl et al., 2007; Wang et al., 2015). Due to its reliability for variable selection and determination of variable importance, RF was used in this study to identify the contributions of five associated large-scale circulation factors to the changing properties of the precipitation concentration. ...
Article
Full-text available
Rainfall pattern (RP) and precipitation concentration (PC) are two critical indices for measuring rainfall. Detecting their changes under global warming helps to better understand rainfall variability and flood formation. Using Guangdong Province, China, as a study case, five criteria were used to determine independent precipitation events from hourly precipitation data for 1967-2012. The RP and PC of the events were identified, and their spatiotemporal variability was then investigated. The results show that 1) the occurrence frequency over the 46 years, the average rainfall amount, and the duration of independent rainfall in the coastal areas were higher than in other regions; 2) the dominant RPs in Guangdong are unimodal (more than 70%), especially the pattern with an early peak (Mode I; 39.5%); however, the number of stations with Mode I as the dominant rainfall pattern decreased over time, while those with Mode III (the pattern with a late peak) increased; 3) the PCI can be used for measuring the concentration of independent rainfall events, and its fixed minimum inter-event time (MIT) has significant impacts on RP and PC. The PC of events with an early peak is higher, and the concentration in the west is generally higher than in the east under different RPs.
... While non-stationarity can be a concern as a result of climate change in these modelling applications, the developed model will be built on a combination of hydrologic and climatic variables, which addresses this potential issue (Li et al., 2016;Ghaith et al., 2020). Random forest models have historically been successful in classifying flood hazard risks (Wang et al., 2015;Sadler et al., 2018). ...
Article
Mid-winter breakups (MWBs), consisting of the early breakup of the winter river ice cover before the typical spring breakup season, are becoming increasingly common events in cold-region rivers. These events can lead to potentially severe flooding, while also altering the expected spring flow regime, yet data on these events are limited. In this study, a newly released Canadian River Ice Database (CRID), containing river ice data from 196 rivers across Canada obtained from time series analysis, was used to analyse these MWBs on a previously impossible national scale. The CRID data were combined with the Natural Resources Canada (NRCan) gridded daily climate dataset to identify a list of potential hydrologic and climatic drivers for MWB events. Techniques such as correlation analysis, Least Absolute Shrinkage and Selection Operator (LASSO) regression, and input omission were combined to select 20 key drivers of the severity of MWB events. A random forest model trained with these drivers using data-driven modelling techniques successfully classified the MWBs as either low, medium, or high severity, achieving an overall accuracy of 80%. A new threshold for the prediction of MWB initiation based on climatic conditions was subsequently proposed through optimization via an exhaustive grid search, and its accuracy in identifying MWBs exceeded those proposed by previous studies. The new threshold, used in conjunction with the random forest model, provides valuable tools for both the prediction of MWBs and the assessment of their potential severity.
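The driver-selection-then-classification pipeline described here can be sketched generically. The code below is a hedged illustration only (not the study's workflow, which combined LASSO with correlation analysis and input omission); all variables and data are synthetic placeholders.

```python
# LASSO-based driver selection feeding a random forest severity classifier.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.standard_normal((400, 40))         # candidate hydro-climatic drivers
y = X[:, :5] @ rng.random(5) + rng.normal(0, 0.5, 400)  # proxy response

lasso = LassoCV(cv=5).fit(X, y)            # shrinks weak coefficients to zero
keep = np.flatnonzero(lasso.coef_ != 0)    # retained driver columns

severity = rng.integers(0, 3, 400)         # low / medium / high MWB severity
rf = RandomForestClassifier(random_state=6).fit(X[:, keep], severity)
```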
... Bagging ensemble and logistic model tree (LMT) are used for flood susceptibility assessment (Chapi et al., 2017). Random forests (RF) are used for identifying the high-inundation risk areas (Wang et al., 2015) and real-time rainfall forecasting (Yu et al., 2017). K-nearest neighborhood (KNN) is adopted to predict the areas with high-inundation risk in the coastal region considering the risk of sea-level rise (Park and Lee, 2020). ...
Article
Full-text available
This paper presents a deep learning model based on the integration of physical and social sensor data for predictive watershed flood monitoring. Data from flood sensors and 3-1-1 reports are mapped and fused through a multi-variate time series approach. This data format increases data availability (otherwise limited by sparsely installed physical sensors and fewer reported flood incidents in less urbanized areas) and captures both spatial and temporal interactions between different watersheds and historical events. We use Harris County, TX as the study site and obtained data from 7 historical flood events for training, validating, and testing the flood prediction model. The model predicts the flood probability of each watershed in the next 24 hours. By comparing the flood prediction performance of three different datasets (i.e., flood sensor only, 3-1-1 reports only, and integrated dataset), we conclude that the integrated dataset achieves the best flood prediction performance with an accuracy of 0.825, Area Under the Receiver operating characteristics Curve (AURC) of 0.902, Area Under the Precision-Recall Curve (AUPRC) of 0.883, Area Under the F-measure Curve (AUFC) of 0.762, and Max. F-measure of 0.788.
... • the impact parameter W has a physical foundation, as it ranges between the concepts of energy and momentum, and it can be fitted to available experimental and field data effectively without the need for additional thresholds. This is expected to enhance the robustness of damage predictions, also with respect to high-dimensional multivariate models that can be ill-conditioned in the case of scarce data (Carisi et al., 2018; Merz et al., 2013; Wang et al., 2015). ...
Article
Full-text available
Direct flood damage is commonly assessed using damage models (i.e. vulnerability functions and fragility curves), which describe the relationship between hazard, vulnerability, and the (probability of) damage for items exposed to floods. In this paper, we introduce a non-dimensional impact parameter that, according to the physics of damage mechanisms and/or tuned on field or lab data, combines water depth and flow velocity in a general and flexible form. We then suggest a general approach to assess relative damage functions for items of different nature, subject to either progressive or on-off damage processes. The proposed method enhances traditional tools that use inundation depth as the main (or only) explicative variable, and allows recasting the results from previous studies in an elegant, flexible and unique form. Compared to multivariate models that link flow variables to damage directly, the physics-based approach allows for an intelligible assessment of flood hazard and the associated damage, even in case of scarce or sparse data. The proposed impact parameter and the related procedure to assess the relative damage functions are applied to different kinds of exposed items (people, vehicles, and buildings), demonstrating the general applicability and validity of the proposed method.
... Floods cause billions of dollars of damage each year [1][2][3]. Between 1980 and 2013, the global direct economic losses from floods exceeded $1 trillion (2013 values), and more than 220,000 people lost their lives [4]. In recent years, urban areas have faced global flood risk challenges due to extreme weather, rapid urbanization, and climate change; flooding has been increasing in both severity and frequency worldwide, raising risks to human lives, health, properties, infrastructure, and the environment [5,6]. ...
Article
Full-text available
Metro systems have become high-risk entities due to the increased frequency and severity of urban flooding. Therefore, understanding the flood risk of metro systems is a prerequisite for mega-cities' flood protection and risk management. This study proposes a method for accurately assessing the flood risk of metro systems based on an improved trapezoidal fuzzy analytic hierarchy process (AHP). We applied this method to assess the flood risk of 14 lines and 268 stations of the Guangzhou Metro. Validation of the risk results showed that the accuracy of the improved trapezoidal fuzzy AHP (90% match) outperformed the traditional trapezoidal AHP (70% match). The distribution of different flood risk levels across Guangzhou metro lines exhibited a polarization signature. About 69% (155 km²) of the very-high- and high-risk zones were concentrated in the central urban areas (Yuexiu, Liwan, Tianhe, and Haizhu); the three metro lines with the highest overall risk level were lines 3, 6, and 5; and the metro stations at very high risk were mainly located on metro lines 6, 3, 5, 1, and 2. Based on fieldwork, we suggest raising exits, installing watertight doors, and using early warning strategies to resist metro floods. This study can provide scientific data for decision-makers to reasonably allocate flood prevention resources, which is significant for reducing flood losses and promoting Guangzhou's sustainable development.
... Among the key variables affecting flood forecasting, rainfall and the spatial examination of the hydrologic cycle have the most significant effects (Nourani and Komasi, 2013). Although many studies have predicted the risk of flooded areas (Sampson et al., 2015; Tehrany et al., 2015; Wang et al., 2015; Darabi et al., 2019), such prediction still cannot fundamentally eliminate the impact of flood disasters, and predicting flood disasters remains relatively difficult. ...
Article
Full-text available
Floods, as one of the most common disasters in the natural environment, have caused huge losses to human life and property. Predicting the flood resistance of poplar can effectively help researchers select seedlings scientifically and resist floods precisely. Using machine learning algorithms, models of poplar's waterlogging tolerance were established and evaluated. First of all, the evaluation indexes of poplar's waterlogging tolerance were analyzed and determined. Then, significance testing, correlation analysis, and three feature selection algorithms (Hierarchical clustering, Lasso, and Stepwise regression) were used to screen photosynthesis, chlorophyll fluorescence, and environmental parameters. Based on this, four machine learning methods, BP neural network regression (BPR), extreme learning machine regression (ELMR), support vector regression (SVR), and random forest regression (RFR), were used to predict the flood resistance of poplar. The results show that random forest regression (RFR) and support vector regression (SVR) have high precision. On the test set, the coefficient of determination (R²) is 0.8351 and 0.6864, the root mean square error (RMSE) is 0.2016 and 0.2780, and the mean absolute error (MAE) is 0.1782 and 0.2031, respectively. Therefore, random forest regression (RFR) and support vector regression (SVR) can be given priority to predict poplar flood resistance.
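The three scores used in this comparison (R², RMSE, MAE) are standard regression metrics. A minimal scikit-learn sketch, with hypothetical predictions rather than the study's values:

```python
# Computing R², RMSE, and MAE for a regression model's predictions.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([0.80, 0.60, 0.90, 0.40, 0.70])   # observed values
y_pred = np.array([0.75, 0.65, 0.85, 0.50, 0.68])   # model predictions

print("R2  :", r2_score(y_true, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
print("MAE :", mean_absolute_error(y_true, y_pred))
```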
... ML models have been explored because they can detect flood-prone locations based on historical events without necessarily understanding the physical processes behind them (Wang et al. 2015; Bera et al. 2022). Guyon et al. (2002) established the Recursive Feature Elimination (RFE) algorithm to improve feature selection while utilizing SVMs. ...
... Tree-based machine learning models are extremely popular nonlinear models for the prediction and attribution analysis of ecosystem dynamics (Wang et al., 2015; Yuan et al., 2019), and they are usually more accurate than neural networks in many applications and outperform standard deep-learning models on tabular-style data sets (Lundberg et al., 2020). Extreme gradient boosting (XGB) is an ensemble learning algorithm that iteratively builds many decision trees and is typically used in classification and regression (Chen et al., 2016; Yan et al., 2020). ...
Article
Full-text available
The dominance of vapor pressure deficit (VPD) and soil water content (SWC) for plant water stress is still under debate. These two variables are strongly coupled and influenced by climatic drivers. The impacts of climatic drivers on the relationships between gross primary production (GPP) and water stress from VPD/SWC, and on the interaction between VPD and SWC, are not fully understood. Here, applying statistical methods and an extreme gradient boosting model with a Shapley additive explanations framework to eddy-covariance observations from the global FLUXNET2015 dataset, we found that the VPD-GPP relationship was strongly influenced by climatic interactions and that VPD was more important for plant water stress than SWC across most plant functional types when we removed the effect of the main climatic drivers, e.g., air temperature, incoming shortwave radiation and wind speed. However, we found no evidence for a significant influence of elevated CO2 on stress alleviation, possibly because of the short duration of the records (approximately one decade). Additionally, the interactive effect between VPD and SWC differed from their individual effects. When SWC was high, the SHAP interaction value of SWC and VPD on GPP decreased with increasing VPD, but when SWC was low, the trend was the opposite. We also revealed a threshold effect for VPD stress on GPP loss; above the threshold value, the stress on GPP flattened off. Our results have important implications for independently identifying VPD and SWC limitations on plant productivity, which is meaningful for capturing the magnitude of ecosystem responses to water stress in dynamic global vegetation models.
... The random forest (RF) algorithm has been applied in many fields of remote sensing and has achieved corresponding results [44][45][46][47]. It randomly selects classification features and randomly combines them at the nodes of each decision tree. ...
Article
Full-text available
Urban vegetation can regulate ecological balance, reduce the influence of urban heat islands, and improve human beings' mental state. Accordingly, classification of urban vegetation types plays a significant role in urban vegetation research. This paper evaluates various window sizes of completed local binary pattern (CLBP) texture features for classifying urban vegetation based on high-spatial-resolution WorldView-2 images in areas of Shanghai (China) and Lianyungang (Jiangsu province, China). To demonstrate the stability and universality of different CLBP window textures, two study areas were selected. Using spectral information alone and spectral information combined with texture information, the imagery was classified by vegetation type using the random forest (RF) method, showing that spectral information combined with CLBP window textures achieves 7.28% greater accuracy than spectral information alone for urban vegetation type classification, with accuracy greater for single vegetation types than for mixed ones. The optimal window sizes of CLBP textures for grass, shrub, arbor, shrub-grass, arbor-grass, and arbor-shrub-grass are 3 × 3, 3 × 3, 11 × 11, 9 × 9, 9 × 9, and 7 × 7, respectively, for urban vegetation type classification. Furthermore, the optimal CLBP window size is determined by the roughness of the vegetation texture.
... The study of flood risk mapping based on RF, on the other hand, is limited and still leaves scope for exploration in large river basins. Lai et al. (2015) and Wang et al. (2015) developed a flood risk mapping and assessment framework based on the RF algorithm for Jiangxi Province's river basin, China, and concluded that the RF-based outcomes are more reliable than those of the support vector machine (SVM) algorithm. Similarly, Feng et al. (2015) also concluded that the RF algorithm outperformed the artificial neural network (ANN) and the maximum likelihood method for flood risk assessment. ...
Article
Full-text available
Floods have a significant economic, social, and environmental impact in developing countries like India. Settlements in flood hazard zones increase flood risk due to a lack of information and awareness. The present study proposed a machine learning-based framework to identify such flood risk zones for the lower Narmada basin in India. Flood hazard factors like the elevation and slope of the terrain, distance from the main river network, drainage density, annual average rainfall of the area, and land-use land-cover (LULC) characteristics, as well as flood vulnerability factors like population density, agricultural production, and road-river intersections, were used as predictors in the random forest algorithm to predict the flood depth in the region. Initially, the flood depth obtained from a hydrodynamic model was used as the predictand to train the model and determine the weightage of each predictor. The RandomizedSearchCV technique was used to optimize the hyperparameters of the random forest algorithm. The variable importance results of the random forest show that the elevation of the terrain, LULC characteristics, distance from the main river network, and rainfall are the major contributors to flood risk in the area. Furthermore, the possibility of using IoT-based sensors to develop a real-time flood risk mapping framework is described. The developed flood risk map can assist policymakers, stakeholders, and citizens in developing guidelines, taking preventive measures, and avoiding unnecessary settlements in flood risk zones.
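The hyperparameter step named in this abstract is scikit-learn's RandomizedSearchCV, which samples parameter combinations rather than exhaustively enumerating them. The sketch below is illustrative only; the parameter ranges and data are assumptions, not the study's settings:

```python
# Randomized hyperparameter search over a random forest regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(7)
X = rng.random((800, 9))        # hypothetical hazard + vulnerability predictors
y = rng.random(800) * 5.0       # hypothetical flood depth (m)

param_dist = {
    "n_estimators": [100, 300, 500, 800],
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2", None],
}
search = RandomizedSearchCV(RandomForestRegressor(random_state=7),
                            param_dist, n_iter=20, cv=5, random_state=7)
search.fit(X, y)                # samples 20 of the possible combinations
print(search.best_params_)
```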
... The random forests model is an ensemble-learning algorithm (Breiman 2001). This approach has been used in developing susceptibility models (Ismail et al., 2010; Wang et al., 2015; Youssef et al., 2016; Chen et al., 2017) and has been applied in numerous predictive mapping applications across multiple disciplines (Prasad et al., 2006; Belgiu and Drăguţ 2016; Biau and Scornet 2016). We used random forests to predict the probability of low-density juniper presence given the set of observed environmental variables. ...
Article
Full-text available
Expanding distributions of native juniper species have had significant ecological and economic impacts on prairie ecosystems of the Great Plains. Juniper encroachment reduces rangeland production by decreasing herbaceous biomass and affecting natural ecosystem functions as it alters other native plant communities, microclimates, and soils. Juniper distribution maps are needed to support proactive management, but they often underestimate the extent of low-density juniper stands. Our objectives were to extend a previous juniper mapping study by 1) fitting a predictive ecological model for low-density (< 15% fractional cover) juniper stands and assessing the classification accuracy, 2) determining the habitat variables that had the strongest associations with low-density juniper, and 3) applying the model to map low-density juniper stands, where proactive management has the greatest potential for stopping further juniper encroachment. The study area included counties bordering the Missouri River in southeastern South Dakota and northeastern Nebraska, covering approximately 23 000 km². Environmental predictors included seed source distance and density, as well as topography, climate, soils, and land use variables. Areas of low-density juniper were identified by visual interpretation of sample plots from digital aerial photography. We used a machine-learning approach to classify low-density juniper with the random forests algorithm. Model accuracy was high, with an area under the receiver operating characteristic curve of 0.884. Variables related to seed sources were the most important predictors, and precipitation, slope angle, and the local intensity of human land use also had substantial influences. A previous map based on Landsat imagery identified 209 968 acres (84 971 ha) as juniper within the study area, and this study found an additional 430 648 acres (174 277 ha) classified as low-density juniper stands. These results can provide agencies and land managers with more accurate information about the distribution of juniper, and the underlying techniques can be extended to map woody plant encroachment in other areas.
... It improves prediction accuracy relative to a single decision tree. Additional benefits are that random forests are good at solving nonlinear problems, require no normalization or scaling of data, and are insensitive to multicollinearity [25][26][27][28][29]. ...
Article
Full-text available
Ammonium is one of the main inorganic pollutants in groundwater, mainly due to agricultural, industrial and domestic pollution. Excessive ammonium can cause human health risks and environmental consequences. Its temporal and spatial distribution is affected by factors such as meteorology, hydrology, hydrogeology and land use type. Thus, a groundwater ammonium analysis based on limited sampling points produces large uncertainties. In this study, organic matter content, groundwater depth, clay thickness, total nitrogen content (TN), cation exchange capacity (CEC), pH and land-use type were selected as potential contributing factors to establish a machine learning model for fitting the ammonium concentration. The Shapley Additive exPlanations (SHAP) method, which explains the machine learning model, was applied to identify the most significant influencing factors. Finally, the machine learning model established according to the most significant influencing factors was used to impute point data in the study area. From the results, the soil organic matter feature was found to have a substantial impact on the concentration of ammonium in the model, followed by soil pH, clay thickness and groundwater depth. The ammonium concentration generally decreased from northwest to southeast. The highest values were concentrated in the northwest and northeast. The lowest values were concentrated in the southeast, southwest and parts of the east and north. The spatial interpolation based on the machine learning imputation model established according to the influencing factors provides a reliable groundwater quality assessment and is not limited by the number and geographical locations of the samplings.
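A minimal sketch of SHAP-based attribution for a tree ensemble, in the spirit of the workflow above; the feature names and synthetic data are placeholders, not the study's measurements:

```python
# Sketch of SHAP attribution for a tree model. Feature names are
# illustrative placeholders for the study's predictors.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "organic_matter": rng.random(300),
    "groundwater_depth": rng.random(300),
    "clay_thickness": rng.random(300),
    "soil_pH": 5 + 3 * rng.random(300),
})
y = 2 * X["organic_matter"] - 0.1 * X["soil_pH"] + rng.normal(0, 0.1, 300)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature gives a global importance ranking.
importance = np.abs(shap_values).mean(axis=0)
print(dict(zip(X.columns, np.round(importance, 3))))
```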
... Meanwhile, non-parametric statistical ML models, such as long short-term memory (LSTM), can model any nonlinear component (they are universal approximators). For the last method, RF was selected because of its popularity as an ML algorithm in hydrology applications [45][46][47]. All three selected methods are discussed in the following section. ...
Article
Full-text available
The Red River of the North is vulnerable to floods, which have caused significant damage and economic loss to inhabitants. A better capability in flood-event prediction is essential to decision-makers for planning flood-loss-reduction strategies. Over the last decades, classical statistical methods and Machine Learning (ML) algorithms have greatly contributed to the growth of data-driven forecasting systems that provide cost-effective solutions and improved performance in simulating the complex physical processes of floods using mathematical expressions. To improve flood prediction for the Red River of the North, this paper presents effective approaches that make use of a classical statistical method, a classical ML algorithm, and a state-of-the-art Deep Learning method: respectively, seasonal autoregressive integrated moving average (SARIMA), Random Forest (RF), and Long Short-Term Memory (LSTM). We used hourly level records from three U.S. Geological Survey (USGS) stations, at Pembina, Drayton, and Grand Forks, with twelve years of data (2007–2019), to predict the water level six hours, twelve hours, one day, three days, and one week in advance. Pembina, at the downstream location, has a water level gauge but not a flow-gauging station, unlike the others. The flood-water-level prediction results show that the LSTM method outperforms the SARIMA and RF methods. For the one-week-ahead prediction, the RMSE values for Pembina, Drayton, and Grand Forks are 0.190, 0.151, and 0.107, respectively. These results demonstrate the high precision of the Deep Learning algorithm as a reliable choice for flood-water-level prediction.
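A compact Keras sketch of an LSTM water-level predictor of the kind compared above, assuming hourly stage records; the lookback window, horizon, and architecture are illustrative assumptions:

```python
# Minimal Keras LSTM sketch for multi-hour-ahead water-level prediction.
# The synthetic stage series, lookback, and horizon are assumptions.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
levels = np.cumsum(rng.normal(0, 0.01, 5000))  # synthetic hourly stages

lookback, horizon = 72, 24  # use 72 h of history to predict 24 h ahead
X = np.array([levels[i:i + lookback]
              for i in range(len(levels) - lookback - horizon)])
y = levels[lookback + horizon:]
X = X[..., None]  # shape: (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(lookback, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)
print("RMSE:", float(np.sqrt(model.evaluate(X, y, verbose=0))))
```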
... Rapid global urbanization as well as the increasing intensity and frequency of extreme precipitation induced by climate change has increased the severity of urban floods across many cities (Zhang et al. 2017;Wu et al. 2017;Wang et al. 2017;Zhang et al. 2018b;Swain et al. 2018;Lai et al. 2020;Hu et al. 2021). Such floods have not only caused huge economic losses, but also have led to great destruction to the environment and ecosystem (Wang, Lai, et al. 2015;Lai et al. 2016;Donat et al. 2017;Zhang et al. 2018a;Chen et al. 2020;Li et al. 2020b;Zhang et al. 2021). It is predicted that approximately 40% of the cities across the globe will be located in flood-prone zones by 2030 (Güneralp et al. 2015). ...
Article
Full-text available
Urban floods are becoming increasingly frequent, which has led to tremendous economic losses. The application of inundation modeling to predict and simulate urban flooding is an effective approach for disaster prevention and risk reduction, but addressing the uncertainty in such models is always a challenging task. In this study, a cellular automaton (CA)-based model combining a storm water management model (SWMM) and a weighted cellular automata 2D inundation model was applied, and a physically based model (LISFLOOD-FP) was also coupled with SWMM for comparison. The simulation performance and the uncertainty factors of the coupled model were systematically discussed. The results show that the CA-based model can achieve sufficient accuracy and higher computational efficiency than a physically based model. The resolution of terrain and rainstorm data had a strong influence on the performance of the CA-based model, and the simulations became less credible when using input data with a terrain resolution coarser than 15 m or a recorded rainfall interval greater than 30 min. The roughness value and model type showed limited impacts on the change of inundation depth and the occurrence of the peak inundation area. Generally, the CA-based coupled model demonstrated laudable applicability and can be recommended for fast simulation of urban flood episodes. This study can also provide references and implications for reducing uncertainty when constructing a CA-based coupled model.
... The hilly regions of Uttarakhand are mostly affected by flash-flood situations because of steep slopes and high drainage density. Thus, implementing flood reduction measures requires a holistic approach throughout the pre-flood, during-flood, and post-flood stages [66,67]. During the pre-flood stage, the recommendations are as follows: implementation of disaster contingency planning and flood risk management for all causes of flooding, preventing inappropriate development within the flood plains, constructing physical flood defence infrastructure, implementing proper warning and forecast systems, proper land use planning, and public communications. ...
Article
Full-text available
Uttarakhand, an Indian Himalayan state, is famous for its natural environment, health rejuvenation, adventure, and as a pilgrimage centre for various religions. It is categorised into two major regions, i.e., the Garhwal and the Kumaon, and geographically, the Bhabar and the Terai. Floods, cloudbursts, glacier lake outbursts, and landslides are the major natural hazards that cause the highest number of mortalities and property damage in this state. After it became the full 27th state of India in 2000, developmental activities have increased manyfold, which has added to such calamities. This study briefly summarises the major incidents of flood damage, describes the fragile geology of this Himalayan state, and identifies the natural as well as the anthropogenic causes of floods as a disaster. It also highlights the issue of climate change in the state and its adverse impact in the form of extreme precipitation. Besides these, it reviews the challenges involved in flood management and highlights the effective flood risk management plan that may be adopted to alleviate its adverse impacts.
Article
Full-text available
Considering the large number of natural disasters on the planet, many areas in the world are at risk of these hazards; therefore, an integrated map serving as a guide for multiple natural hazards can be applied to save human lives and reduce financial losses. This study designed a multi-hazard map for three important hazards (earthquakes, floods, and landslides) to identify endangered areas in Kermanshah province, located in western Iran, using ensemble SWARA-ANFIS-PSO and SWARA-ANFIS-GWO models. In the first step, flood and landslide inventory maps were generated to identify at-risk areas. Then, the occurrence locations for each hazard were divided into two groups, for training the susceptibility models (70%) and testing the applied models (30%). Factors affecting these hazards, including altitude, slope aspect, slope degree, plan curvature, distance to rivers, distance to roads, distance to faults, rainfall, lithology, and land use, were used to generate susceptibility maps. The SWARA method was used to weigh the subclasses of the influencing factors for floods and landslides. In addition, a peak ground acceleration (PGA) map was generated to investigate earthquakes in the study area. In the next step, the ANFIS machine learning algorithm was used in combination with the PSO and GWO meta-heuristic algorithms to train the data, and SWARA-ANFIS-PSO and SWARA-ANFIS-GWO susceptibility maps were separately generated for the flood and landslide hazards. The predictive ability of the implemented models was validated using the receiver operating characteristics (ROC), root mean square error (RMSE), and mean square error (MSE) methods. The results showed that the SWARA-ANFIS-PSO ensemble model had the best performance in generating flood susceptibility maps, with ROC = 0.936, RMSE = 0.346, and MSE = 0.120. Furthermore, this model showed excellent results (ROC = 0.894, RMSE = 0.410, and MSE = 0.168) for generating a landslide map. Finally, the best maps and the PGA map were combined into a multi-hazard map (MHM) for Kermanshah Province. This map can be used by managers and planners as a practical guide for sustainable development.
Article
Full-text available
The aggregation of the same type of socio-economic activities in urban space generates urban functional zones, each of which has one main function (e.g., residential, educational or commercial) and is an important part of the city. With the development of deep learning technology in the field of remote sensing, the accuracy of land-use decoding has greatly improved. However, even fine remote sensing imagery cannot directly capture economic and social information, and it has a long revisit cycle (low temporal resolution), while urban flooding often lasts only a few hours. Cities contain a large amount of "social sensing" data that records human socio-economic activities, and GIS is a discipline with naturally strong socio-economic ties. We propose a new GeoSemantic2vec algorithm for urban function recognition based on the latest advances in natural language processing technology (the BERT model), which utilizes the rich semantic information in urban POI data to portray urban functions. Taking the Wuhan flooding event in summer 2020 as an example, we identified 84.55% of the flooding locations in social media. We also used the new algorithm proposed in this paper to divide the main urban area of Wuhan into 8 types of urban functional zones (kappa coefficient = 0.615) and construct a "City Portrait" of flooding locations. This paper summarizes the progress of existing research on urban function identification using natural language processing techniques and proposes a better algorithm, which is of great value for urban flood location detection and risk assessment.
Article
Watershed models are robust tools that inform management and policy in a variety of sectors, but these models are often neglected over time due to economic or technical constraints. Additionally, they are not readily accessible tools for key decision makers. Conversely, machine learning models are robust alternatives to common hydrologic modeling frameworks; the random forest algorithm in particular is an interpretable predictive tool. We couple Annualized Agricultural Non-Point Source (AnnAGNPS) model output with an abstract, anthropogenic flood-risk metric and develop a random forest model to provide an empirical tool that benefits decision makers in the Des Moines Lobe of the Prairie Pothole Region in north-central Iowa. The developed model has the capacity to predict our flood-risk metric (calibration: R² > 0.9, validation: R² > 0.7) for individual farmed prairie potholes across a variety of morphologic and management conditions and can be used iteratively to assess alternative actions.
Article
In a changing environment, changes in terrestrial water storage (TWS) in basins have a significant impact on potential floods and affect flood risk assessment. Therefore, we aimed to study the impact of TWS on potential floods. In this study, we reconstructed the TWS based on precipitation and temperature, evaluated the reconstructed TWS data based on Gravity Recovery and Climate Experiment (GRACE)-TWS data, and analyzed and calculated the flood potential index (FPI) in the Yangtze River Basin (YRB). The related influencing factors were analyzed based on the Global Land Data Assimilation System (GLDAS) data and Granger’s causality test. The main conclusions are as follows: (1) although the GRACE-TWS anomaly (GRACE-TWSA) in the YRB showed an increasing trend for the averaged TWSA over all grids in the whole basin (i.e., 0.31 cm/a, p < 0.05), the variable infiltration capacity-soil moisture anomalies (VIC-SMA) showed a decreasing trend (i.e., −0.048 cm/a, p > 0.05) during April 2002–December 2019; (2) a larger relative contribution of detrended precipitation to FPI was found in the Jialingjiang River Basin (JRB), Wujiang River Basin (WRB), Dongting Lake Rivers Basin (DLRB), YinBin-Yichang reaches (YB-YC), and Yichang-Hukou reaches (YC-HK), while the contribution of detrended TWS to FPI in the Poyang Lake Rivers Basin (PLRB) was larger than that in other basins; and (3) the original and detrended soil moisture (SM) and TWS in the YRB showed a significant positive correlation (p < 0.05), while the significant effect of SM on TWS caused a change in FPI in the YRB and its sub-basins. This study is of great significance for the correct understanding of the FPI and the accurate assessment of flood risk.
Article
Rainfall-runoff simulation is vital for planning and controlling flood events. Hydrological modeling using the Hydrological Engineering Center—Hydrologic Modeling System (HEC-HMS) is accepted globally for event-based or continuous simulation of the rainfall-runoff process. Similarly, machine learning is a fast-growing discipline that offers numerous alternatives suitable for hydrology research's high demands and limitations. Conventional, process-based models such as HEC-HMS are typically created at specific spatiotemporal scales and do not easily fit diversified and complex input parameters. Therefore, in this research, the effectiveness of Random Forest, a machine learning model, was compared with HEC-HMS for the rainfall-runoff process. Furthermore, we also performed a hydraulic simulation in the Hydrological Engineering Center—Geospatial River Analysis System (HEC-RAS) using the input discharge obtained from the Random Forest model. The reliability of the Random Forest model and the HEC-HMS model was evaluated using different statistical indexes. The coefficient of determination (R²), standard deviation ratio (RSR), and normalized root mean square error (NRMSE) were 0.94, 0.23, and 0.17 for the training data and 0.72, 0.56, and 0.26 for the testing data, respectively, for the Random Forest model. Similarly, the R², RSR, and NRMSE were 0.99, 0.16, and 0.06 for the calibration period and 0.96, 0.35, and 0.10 for the validation period, respectively, for the HEC-HMS model. The Random Forest model slightly underestimated peak discharge values, whereas the HEC-HMS model slightly overestimated the peak discharge value. The statistical index values illustrated the good performance of the Random Forest and HEC-HMS models, which reveals the suitability of both models for hydrology analysis. In addition, the flood depth generated by HEC-RAS using the Random Forest predicted discharge underestimated the flood depth during the peak flooding event. This result suggests that HEC-HMS could compensate for Random Forest's underestimation of peak discharge and flood depth during extreme events. In conclusion, an integrated machine learning and physically based model can provide more confidence in rainfall-runoff and flood depth prediction.
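The evaluation indices used above can be computed directly. In the sketch below, R² is taken as the coefficient of determination, RSR as RMSE divided by the standard deviation of observations, and NRMSE as RMSE divided by the observed mean; the exact normalizations in the paper may differ, so treat these definitions as assumptions:

```python
# Sketch of common rainfall-runoff evaluation indices. The NRMSE
# normalization (by the observed mean) is an assumption; conventions vary.
import numpy as np

def evaluate(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "R2": 1 - ss_res / ss_tot,   # coefficient-of-determination form
        "RSR": rmse / obs.std(),     # RMSE / std of observations
        "NRMSE": rmse / obs.mean(),  # RMSE / mean of observations (assumed)
    }

obs = [120.0, 150.0, 300.0, 220.0, 180.0]   # observed discharge (m^3/s)
sim = [110.0, 160.0, 280.0, 230.0, 170.0]   # simulated discharge (m^3/s)
print({k: round(v, 3) for k, v in evaluate(obs, sim).items()})
```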
Preprint
Full-text available
Extreme runoff modeling is hindered by the lack of sufficient and relevant ground information and the low reliability of physically based models. The authors propose to combine precipitation Remote Sensing (RS) products, Machine Learning (ML) modeling, and hydrometeorological knowledge to improve extreme runoff modeling. The approach applied to improve the representation of precipitation is object-based Connected Component Analysis (CCA), a method that enables classifying and associating precipitation with extreme runoff events. Random Forest (RF) is employed as the ML model. We used 2.5 years of near-real-time hourly RS precipitation from the PERSIANN-CCS and IMERG-early run databases (spatial resolutions of 0.04° and 0.1°, respectively), and runoff at the outlet of a 3391 km² basin located in the tropical Andes of Ecuador. The developed models show the ability to simulate extreme runoff for long-duration precipitation events regardless of their spatial extent, obtaining Nash-Sutcliffe efficiencies (NSE) above 0.72. On the contrary, we found unacceptable model performance for the combination of short-duration and spatially extensive precipitation events. The strengths and weaknesses of the developed ML models are attributed to their ability, or difficulty, in representing complex precipitation-runoff responses.
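The object-based Connected Component Analysis step can be illustrated with scipy's labeling of contiguous rain areas; the rain threshold, minimum object size, and synthetic field below are assumptions:

```python
# Sketch of object-based Connected Component Analysis: label contiguous
# rain areas in a gridded precipitation field and summarize each object.
# The threshold and the synthetic gamma-distributed field are assumptions.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.3, scale=4.0, size=(100, 100))  # mm/h, synthetic

mask = precip > 1.0                      # rain / no-rain threshold (assumed)
labels, n_objects = ndimage.label(mask)  # 4-connectivity by default

print(f"{n_objects} precipitation objects found")
for obj_id in range(1, n_objects + 1):
    cells = labels == obj_id
    area = int(cells.sum())
    if area >= 20:  # keep only spatially extensive systems (assumed cutoff)
        print(f"object {obj_id}: area={area} cells, "
              f"mean intensity={precip[cells].mean():.2f} mm/h")
```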
Article
Floods have occurred frequently all over the world. During 2000-2020, nearly half (44.9%) of global floods occurred in the Belt and Road region because of its complex geology, topography, and climate. Therefore, providing an insight into the spatial distribution characteristics of flood susceptibility in this region is essential. Here, a database was established with 11 flood conditioning factors, 1500 flooded points, and 1500 non-flooded points selected by an improved method. Subsequently, a rare combination of logistic regression and support vector machine, integrated in a heterogeneous framework, was applied to generate an ensemble flood susceptibility map. Based on it, the concept of the ecological vulnerability synthesis index from the ecological field was introduced into this study, and the flood susceptibility comprehensive index (FSCI) was proposed to quantify the degree of flood susceptibility of each country and sub-region. The results show that the ensemble model has excellent accuracy, with a highest AUC value of 0.9342. The highest and high flood susceptibility zones are mainly located in the southeastern part of Eastern Asia and most of Southeast Asia and South Asia, accounting for 12.22% and 9.57% of the total study area, respectively. From the regional perspective, Southeast Asia had the highest flood susceptibility, with the highest FSCI of 4.69, while East Asia and Central and Eastern Europe showed the most significant spatial distribution characteristics. From the national perspective, of the 66 countries in this region, 20 have the highest flood susceptibility level (FSCIn > 0.8) and face the greatest threat of flooding. These results can help develop reasonable flood mitigation measures at the most critical locations in the Belt and Road region and lay a theoretical basis for quantifying flood susceptibility at the national or regional scale.
Article
The excessive application of agricultural irrigation water and chemical fertilizer has increased crop yields to help meet the demand for food, but it has also led to a major water environment problem, i.e., non-point source (NPS) pollution, which needs to be addressed to achieve sustainable development targets. Although numerous studies have focused on the control and reduction of agricultural NPS pollution from the perspective of irrigation and fertilizer, the effects of different cropping systems on NPS pollution (ammonia nitrogen, NH3-N) in the Dongjiang River Basin (DRB) have seldom been assessed. Specifically, variation in the NH3-N load was simulated and analyzed at the annual and semi-annual scales under ten different cropping systems using the Soil and Water Assessment Tool (SWAT) model, which was calibrated and validated with satisfactory statistical index values in the DRB. The results indicated that the NH3-N load decreased, increased distinctly, and decreased slightly when sweet potato, peanut, and rice were planted, respectively. Compared with mono-cropping, crop rotation could reduce the NH3-N load, and the planting sequence of crops could affect the NH3-N load to a certain extent. Planting peanuts in spring would dramatically increase the NH3-N load. To evaluate NH3-N pollution, a critical threshold of NH3-N emission (5.1 kg·ha⁻¹·year⁻¹) was proposed. Meeting the NH3-N emission threshold cannot be achieved by altering the cropping system alone; additional measures are needed to reduce agricultural NPS pollution. This study facilitates the development of cropping systems and provides relevant information to aid the sustainable development of agriculture in the DRB.
Article
Climate models are critical tools for developing strategies to manage the risks posed by sea-level rise to coastal communities. While these models are necessary for understanding climate risks, there is a level of uncertainty inherent in each parameter in the models. This model parametric uncertainty leads to uncertainty in future climate risks. Consequently, there is a need to understand how those parameter uncertainties impact our assessment of future climate risks and the efficacy of strategies to manage them. Here, we use random forests to examine the parametric drivers of future climate risk and how the relative importances of those drivers change over time. In this work, we use the Building blocks for Relevant Ice and Climate Knowledge (BRICK) semi-empirical model for sea-level rise. We selected this model because of its balance of computational efficiency and representation of the many different processes that contribute to sea-level rise. We find that the equilibrium climate sensitivity and a factor that scales the effect of aerosols on radiative forcing are consistently the most important climate model parametric uncertainties throughout the 2020 to 2150 interval for both low and high radiative forcing scenarios. The near-term hazards of high-end sea-level rise are driven primarily by thermal expansion, while the longer-term hazards are associated with mass loss from the Antarctic and Greenland ice sheets. Our results highlight the practical importance of considering time-evolving parametric uncertainties when developing strategies to manage future climate risks.
Article
Fire susceptibility modeling is crucial for sustaining and managing forests among many other valuable land resources. With 56% of its area covered by forests, Arkansas is known as the “natural state”. About 1000 wildfires occurred and burned more than 10,000 acres each year during 1981–2018. In this paper, we use remote-sensing-based machine learning methods to address the natural and anthropogenic factors influencing wildfires and model fire susceptibility in Arkansas. Among the 15 explored variables, potential evapotranspiration, soil moisture, Palmer drought severity index, and dry season precipitation were recognized as the most significant factors contributing to the fire density. The obtained R-squared values are significant, with 0.99 for training the model and 0.92 for the validation. The results show that the Ouachita National Forest and the Ozark Forest, in west-central and west Arkansas, respectively, have the highest susceptibility to wildfires. The southern part of Arkansas has low-to-moderate fire susceptibility, while the eastern part of the state has the lowest fire susceptibility. These new results for Arkansas demonstrate the potency of remote-sensing-based random forest in predicting fire susceptibility at the state level that can be adapted to study fires in other states and help with fire preparedness to reduce loss and save the precious environment.
Thesis
Full-text available
GEOTECHNICAL MAPPING FOR PHOTOVOLTAIC POWER PLANT FOUNDATIONS USING A MACHINE LEARNING ALGORITHM Photovoltaic solar energy is a renewable source that does not emit carbon during energy production and is expected to become the main source of the Brazilian energy matrix by 2040. The assembly of a photovoltaic power plant's equipment is a repetitive and fast process. To extend this fast pace to the whole construction, driven steel piles have become the preferred foundation for the panels. Standard Penetration Tests (SPT) assist in the foundation design and in estimating which areas of the power plant site are amenable to pile driving. Due to the large dimensions of the plants, it is economically unfeasible to perform SPT at the same density recommended for buildings. Through Scorpan, a framework derived from pedology, 29 environmental covariates were identified - derived from SRTM data and a Landsat 8 image - that represent soil formation factors. The random forest algorithm was then used to build a model associating the covariates with the blow counts of 74 SPT on the terrain of a 10.2 km² plant in the Brazilian semiarid region, at depths of 1, 2 and 3 meters. The model was used to create NSPT maps, which, in turn, were used to create drivability maps based on an NSPT limit for driving piles. The most important variables were Altitude, Clay Index, Annual Precipitation and Isothermality. For a depth of 1 meter, the model showed R² = 0.43 and RMSE = 4.93; for a depth of 2 meters, R² = 0.41 and RMSE = 18.44; and for a depth of 3 meters, R² = 0.25 and RMSE = 17.66. In the absence of other NSPT mappings evaluated through the coefficient of determination (R²), the results were compared with soil texture mappings, especially one also located in the Brazilian semiarid region. It is concluded that the mapping performance is consistent with the available literature and that the maps give a good visualization of the behavior and distribution of NSPT values across the terrain, assisting the foundation decision-making process.
Article
Flood disaster is one of the most frequent and damaging disasters in the world. In the context of global climate change, urban flooding is occurring more frequently and has more severe consequences. Risk assessment plays an important role in flood management and risk reduction. To facilitate the formulation of better flood prevention and disaster reduction strategies, a new urban flooding risk assessment model is proposed, in which six evaluation indicators were selected covering flood hazard and vulnerability. The model was established based on the combination weighting method of game theory, and the risk map was generated on a GIS platform. The model was validated by comparing the evaluation result with disaster information from Zhengzhou city. The result showed that AHP-CRITIC·game theory combination weighting determines weights more reasonably than the AHP and AHP-EWM weighting methods. The high-risk and very-high-risk areas are mainly distributed in the Jinshui district, which has lower elevation and the highest economic development. The model framework proposed in this paper can be applied to the rapid assessment of urban flooding risk.
Article
Full-text available
Floods are the leading cause of natural disaster damages in the United States, with billions of dollars incurred every year in the form of government payouts, property damages, and agricultural losses. The Federal Emergency Management Agency oversees the delineation of floodplains to mitigate damages, but disparities exist between locations designated as high risk and where flood damages occur due to land use and climate changes and incomplete floodplain mapping. We harnessed publicly available geospatial datasets and random forest algorithms to analyze the spatial distribution and underlying drivers of flood damage probability (FDP) caused by excessive rainfall and overflowing water bodies across the conterminous United States. From this, we produced the first spatially complete map of FDP for the nation, along with spatially explicit standard errors for four selected cities. We trained models using the locations of historical reported flood damage events (n = 71 434) and a suite of geospatial predictors (e.g. flood severity, climate, socioeconomic exposure, topographic variables, soil properties, and hydrologic characteristics). We developed independent models for each hydrologic unit code level 2 watershed and generated a FDP for each 100 m pixel. Our model classified damage or no damage with an average area under the curve accuracy of 0.75; however, model performance varied by environmental conditions, with certain land cover classes (e.g. forest) resulting in higher error rates than others (e.g. wetlands). Our results identified FDP hotspots across multiple spatial and regional scales, with high probabilities common in both inland and coastal regions. The highest flood damage probabilities tended to be in areas of low elevation, in close proximity to streams, with extreme precipitation, and with high urban road density. Given rapid environmental changes, our study demonstrates an efficient approach for updating FDP estimates across the nation.
Article
Full-text available
Floods are the most frequent natural hazard globally, and incidences have been increasing in recent years as a result of human activity and global warming, making significant impacts on people's livelihoods and wider socio-economic activities. In terms of the management of the environment and water resources, precise identification of areas susceptible to flooding is required to support planners in implementing effective prevention strategies. The objective of this study is to develop a novel hybrid approach based on Bald Eagle Search (BES), Support Vector Machine (SVM), Random Forest (RF), Bagging (BA) and Multi-Layer Perceptron (MLP) to generate a flood susceptibility map in Thua Thien Hue province, Vietnam. In total, 1621 flood points and 14 predictor variables were used in this study. These data were divided into 60% for model training, 20% for model validation and 20% for testing. In addition, various statistical indices were used to evaluate the performance of the model, such as Root Mean Square Error (RMSE), Receiver Operation Characteristics (ROC), and Mean Absolute Error (MAE). The results show that, for the first time, BES successfully improved the performance of the individual SVM, RF, BA and MLP models in building a flood susceptibility map for Thua Thien Hue, Vietnam, with high accuracy (AUC > 0.9). Among the models proposed, BA-BES was most effective with AUC = 0.998, followed by RF-BES (AUC = 0.998), MLP-BES (AUC = 0.998), and SVM-BES (AUC = 0.99). The findings of this research can support the decisions of local and regional authorities in Vietnam and other countries regarding the construction of appropriate strategies to reduce damage to property and human life, particularly in the context of climate change.
Article
Machine learning algorithms have been widely applied in mineral prospectivity mapping (MPM). In this study, we implemented ensemble learning of extreme gradient boosting (XGBoost) and random forest (RF) models to create MPM for magmatic hydrothermal tin polymetallic deposits in Xianghualing District, southern Hunan Province, China. Machine-learning models often require careful adjustment of learning parameters and model hyperparameters for optimal global performance. However, parameter tuning often entails tedious calculations and considerable expert experience, making it a time-consuming and labor-intensive process. To obtain the globally optimal performance of the XGBoost and RF models, a Bayesian optimization algorithm (BOA) was employed, with the aid of 5-fold cross validation, to search for the most appropriate hyperparameters of the XGBoost and RF models. After Bayesian optimization, the AUC values of both models improved significantly, indicating that the BOA is a powerful optimization tool. The optimization results provide a reference for the empirical hyperparameter setting of ensemble learning models. Through a comparative study, the XGBoost model was shown to be superior to the RF model in terms of accuracy, precision, recall, F1 score, and kappa coefficient. In addition, the receiver operating characteristic curves and prediction–area curves showed that the XGBoost model outperformed the RF model, indicating that the XGBoost model had better prediction ability and stability in the case area. In this study, the BOA-optimized XGBoost model shows great potential for MPM.
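A hedged sketch of Bayesian hyperparameter search with 5-fold cross-validation, using Optuna as a stand-in Bayesian optimizer and a random forest as the tuned model; the search space and data are illustrative, not the study's configuration:

```python
# Sketch of BOA-style hyperparameter search with 5-fold CV, using Optuna
# as a stand-in Bayesian optimizer; spaces and data are illustrative.
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=12, random_state=0)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 800),
        "max_depth": trial.suggest_int("max_depth", 3, 25),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
    }
    model = RandomForestClassifier(**params, random_state=0, n_jobs=-1)
    # 5-fold cross-validated AUC is the optimization target.
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print("Best AUC:", round(study.best_value, 4), "with", study.best_params)
```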
Article
Full-text available
Drought imposes serious challenges to ecosystems and societies and has plagued mankind throughout the ages. To understand the long-term trend of drought in China, a series of annual self-calibrating Palmer drought severity index (scPDSI) values, a semi-physical drought index based on the land-surface water balance, was reconstructed for AD 56~2000. Multi-proxy records of tree-ring width and stalagmite oxygen isotope δ18O were used for this reconstruction, along with random forest regression. The spatiotemporal characteristics of the reconstruction results were analyzed, and comparisons were made with previous studies. Results showed that (1) China witnessed a drought-dominated state during the past 2000 years (mean scPDSI of −0.3151), with an average annual drought area of 85,000 km²; 4 wetting periods, i.e., the Han Dynasty (AD 56~220), the Tang Dynasty (AD 618~907), the Ming Dynasty (AD 1368~1644), and the Qing Dynasty (AD 1644~1912); and 2 drying periods, i.e., the Era of Disunity (AD 221~580) and the Song Dynasty (AD 960~1279). (2) Three different alternating dry-wet fluctuation modes in China (i.e., at interannual, multidecadal, and centennial scales) were all significantly (p-value < 0.001) correlated with the amplitude and frequency of temperature in the Northern Hemisphere. (3) According to the spatial modes extracted with the rotated empirical orthogonal function, China was divided into nine dry-wet regions: northwestern China, Xinjiang, southwestern China, southeastern China, the Loess Plateau, central China, southwestern Tibet, eastern China, and northeastern China. (4) The random forest (RF) was found to be accurate and stable for reconstructing drought variability in China compared with linear regression.
Article
Full-text available
Although machine learning (ML) techniques are increasingly used in rainfall-runoff models, most are based on one-dimensional datasets. In this study, a rainfall-runoff model with deep learning algorithms (CNN-LSTM) was proposed to compute watershed runoff directly from two-dimensional rainfall radar maps. The model uses a convolutional neural network (CNN) to process the two-dimensional rainfall maps and long short-term memory (LSTM) to process the one-dimensional output of the CNN together with the upstream runoff, in order to calculate the downstream runoff. The Elbe River basin in Sachsen, Germany, was selected as the study area, and the high-water periods of 2006, 2011, and 2013 and the low-water periods of 2015 and 2018 were used as the study periods. Via fivefold validation, we found that the Nash–Sutcliffe efficiency (NSE) and Kling–Gupta efficiency (KGE) ranged from 0.46 to 0.97 and from 0.47 to 0.92 for the high-water period, where the optimal fold achieved 0.97 and 0.92, respectively. For the low-water period, the NSE and KGE ranged from 0.63 to 0.86 and from 0.68 to 0.93, where the optimal fold achieved 0.86 and 0.93, respectively. Our results demonstrate that CNN-LSTM would be useful for estimating water availability and flood alerts for river basin management.
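A minimal Keras sketch of the CNN-LSTM idea: a small CNN encodes each radar frame via TimeDistributed, and an LSTM consumes the per-frame encodings; all shapes and layer sizes are assumptions, not the paper's architecture:

```python
# Sketch of a CNN-LSTM: a CNN encodes each 2D rainfall frame, and an LSTM
# consumes the sequence of encodings. Shapes and sizes are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

T, H, W = 12, 64, 64  # 12 hourly radar maps of 64x64 cells (assumed)
X = np.random.rand(200, T, H, W, 1).astype("float32")
y = np.random.rand(200, 1).astype("float32")  # downstream runoff target

# Per-frame encoder: small CNN ending in a fixed-length feature vector.
frame_encoder = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])
model = tf.keras.Sequential([
    layers.TimeDistributed(frame_encoder, input_shape=(T, H, W, 1)),
    layers.LSTM(64),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
print(model.predict(X[:1]).shape)  # -> (1, 1)
```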
Article
Floods result in substantial damage throughout the world every year. Accurate flood predictions can significantly alleviate casualties and property losses. However, due to the complexity of the hydrological process, especially in a city with a complicated pipe network, the accuracy of traditional flood forecasting models degrades as the required prediction horizon increases. In this work, based on historical data of Xixian City, Henan Province, China, collected using an Internet of Things (IoT) system during 2011-2018, a Bidirectional Gated Recurrent Unit (BiGRU) multi-step flood prediction model with an attention mechanism is proposed. In our model, the attention mechanism automatically adjusts the matching degree between the input features and the output. Moreover, the bidirectional GRU processes the input sequence in both time directions (chronologically and antichronologically) and then merges the two representations. Compared with a prediction model using Long Short-Term Memory (LSTM), our method generates better prediction results, as can be seen from the arrival-time error and peak error of floods during multi-step predictions.
Article
Full-text available
As a result of global warming, floods have increased in frequency and severity. Flooding often occurs near rivers and low-lying areas, which makes such areas higher-risk locations. Flood-risk evaluation represents an essential analytic step in preventing floods and reducing losses. However, the uncertainty and nonlinear relation between evaluation indices and risk levels are always difficult points in the evaluation process. Fuzzy comprehensive evaluation (FCE), an effective method for solving random, fuzzy and multi-index problems, has led to progress in understanding this relation. Thus, in this study, an assessment model based on FCE is adopted to evaluate flood risk in the Dongjiang River Basin. To correct the one-sidedness of a single weighting method, a combination weight integrating subjective and objective weights is adopted based on game theory. The evaluation results show that high-risk areas are mainly located in regions with unfavorable terrain, developed industries and dense population. These high-risk areas coincide well with the integrated risk zoning map and the inundation areas of historical floods, proving that the evaluation model is feasible and rational. The results can also serve as references for flood prevention and reduction and other applications in the Dongjiang River Basin.
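The game-theory combination weighting used here and in similar studies is commonly formulated as a small linear system; the numpy sketch below follows that common formulation, with invented subjective and objective weight vectors for illustration:

```python
# Sketch of game-theory combination weighting: find coefficients a that
# minimize the deviation of the combined weights from each source vector
# (subjective w1, objective w2), following the common linear-system
# formulation. The example weight vectors are made up.
import numpy as np

w1 = np.array([0.30, 0.25, 0.15, 0.10, 0.12, 0.08])  # e.g., AHP (subjective)
w2 = np.array([0.22, 0.18, 0.20, 0.14, 0.16, 0.10])  # e.g., entropy (objective)

W = np.vstack([w1, w2])
# Solve (W W^T) a = diag(W W^T) for the combination coefficients,
# then normalize them so they sum to one.
A = W @ W.T
b = np.diag(A)
a = np.linalg.solve(A, b)
a = a / a.sum()

w_combined = a[0] * w1 + a[1] * w2
print("coefficients:", np.round(a, 3))
print("combined weights:", np.round(w_combined, 3))
```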
Article
Full-text available
This study examines the performance of the Random Forest (RF) and Maximum Likelihood Classification (MLC) methods for crop classification through pixel-based and parcel-based approaches. Analyses are performed on a multispectral SPOT 5 image. First, the SPOT 5 image is classified with both methods in a pixel-based manner. Next, the produced thematic maps are overlaid with the original agricultural parcels and the frequencies of the pixels within each parcel are computed. Then, the majority class of the pixels is assigned as the label of the parcel. Results indicate that the overall accuracy of the parcel-based approach with the Random Forest method is 85.89%, about 8% better than the corresponding result of MLC.
Article
Full-text available
This study presents a methodology and procedure for risk assessment of flood disasters in central Liaoning Province, supported by geographical information systems (GIS) and natural disaster risk assessment techniques. On the basis of the standard formulation of natural disaster risk and a flood disaster risk index, whose weights were developed using entropy-based combined weighting, the relative membership degree functions of variable fuzzy set (VFS) theory were calculated using improved set pair analysis, level values, including hazard, exposure, vulnerability and restorability levels, were calculated using VFSs, and the flood risk level for each assessment unit was obtained using the natural disaster index method. An integrated flood risk map was then produced using GIS spatial analysis techniques. The results show that the southwestern and central parts of the study area carry higher risk, while the northwestern and southeastern parts carry lower risk. The results of the assessment model fit historical flood data for the area; this study offers new insights and an efficient approach to flood disaster prevention and mitigation. The study also provides a scientific reference for flood risk management by local and national governmental agencies.
Article
Full-text available
Floods are a serious hazard to life and property. The traditional probability statistical method is acceptable for analysing flood risk but requires a large sample of hydrological data. This paper puts forward a composite method based on an artificial neural network (ANN) and the information diffusion method (IDM) for flood analysis. Information diffusion theory helps extract as much useful information as possible from a sample and thus improves the accuracy of system recognition. Meanwhile, an artificial neural network model, the back-propagation (BP) neural network, is used to map the multidimensional space of a disaster situation to a one-dimensional disaster space and to resolve the grade of flood disaster loss. These techniques all contribute to a reasonable prediction of natural disaster risk. As an example, the method is verified in a flood risk analysis in China, and the risks of different flood grades are determined. Our model yielded very good results and suggests that the methodology is effective and practical, with the potential to forecast flood risk for use in flood risk management. It is also hoped that by conducting such analyses lessons can be learned so that the impact of natural disasters such as floods can be mitigated in the future.
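Information diffusion can be sketched in a few lines: each observation spreads its information over a discrete universe of monitoring points through a Gaussian kernel, which smooths probability estimates from small samples. The bandwidth rule and sample values below are simple illustrative assumptions (the IDM literature tabulates sample-size-dependent diffusion coefficients):

```python
# Sketch of normal information diffusion: each sample diffuses its
# information over a discrete universe of monitoring points through a
# Gaussian kernel. The bandwidth heuristic and values are assumptions;
# the IDM literature tabulates n-dependent diffusion coefficients.
import numpy as np

samples = np.array([3.2, 4.1, 4.8, 5.5, 6.9])  # observed flood magnitudes
u = np.linspace(2.0, 8.0, 13)                  # monitoring points

# Simple heuristic bandwidth tied to sample range and size (assumed).
h = (samples.max() - samples.min()) / (len(samples) - 1)

# Diffuse each observation, normalize per sample, then aggregate.
kernel = np.exp(-(samples[:, None] - u[None, :]) ** 2 / (2 * h ** 2))
kernel /= kernel.sum(axis=1, keepdims=True)
freq = kernel.sum(axis=0)
prob = freq / freq.sum()

exceed = prob[::-1].cumsum()[::-1]  # exceedance probability at each point
print(np.round(exceed, 3))
```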
Article
Full-text available
Flood exposure is increasing in coastal cities owing to growing populations and assets, the changing climate, and subsidence. Here we provide a quantification of present and future flood losses in the 136 largest coastal cities. Using a new database of urban protection and different assumptions on adaptation, we account for existing and future flood defences. Average global flood losses in 2005 are estimated to be approximately US$6 billion per year, increasing to US$52 billion by 2050 with projected socio-economic change alone. With climate change and subsidence, present protection will need to be upgraded to avoid unacceptable losses of US$1 trillion or more per year. Even if adaptation investments maintain constant flood probability, subsidence and sea-level rise will increase global flood losses to US$60-63 billion per year in 2050. To maintain present flood risk, adaptation will need to reduce flood probabilities below present values. In this case, the magnitude of losses when floods do occur would increase, often by more than 50%, making it critical to also prepare for larger disasters than we experience today. The analysis identifies the cities that seem most vulnerable to these trends, that is, where the largest increase in losses can be expected.
Article
Full-text available
The risk of flood disasters is increasing for many coastal societies owing to global and regional changes in climate conditions, sea-level rise, land subsidence and sediment supply. At the same time, in many locations, conventional coastal engineering solutions such as sea walls are increasingly challenged by these changes and their maintenance may become unsustainable. We argue that flood protection by ecosystem creation and restoration can provide a more sustainable, cost-effective and ecologically sound alternative to conventional coastal engineering and that, in suitable locations, it should be implemented globally and on a large scale.
Article
Full-text available
The future impacts of climate change on landfalling tropical cyclones are unclear. Regardless of this uncertainty, flooding by tropical cyclones will increase as a result of accelerated sea-level rise. Under similar rates of rapid sea-level rise during the early Holocene epoch most low-lying sedimentary coastlines were generally much less resilient to storm impacts. Society must learn to live with a rapidly evolving shoreline that is increasingly prone to flooding from tropical cyclones. These impacts can be mitigated partly with adaptive strategies, which include careful stewardship of sediments and reductions in human-induced land subsidence.
Article
Full-text available
The predictive analysis of natural disasters and their consequences is challenging because of uncertainties and incomplete data. The present article studies the use of variable fuzzy sets (VFS) and improved information diffusion method (IIDM) to construct a composite method. The proposed method aims to integrate multiple factors and quantification of uncertainties within a consistent system for catastrophic risk assessment. The fuzzy methodology is proposed in the area of flood disaster risk assessment to improve probability estimation. The purpose of the current study is to establish a fuzzy model to evaluate flood risk with incomplete data sets. The results of the example indicate that the methodology is effective and practical; thus, it has the potential to forecast the flood risk in flood risk management.
Article
Full-text available
Tree species diversity is a key parameter to describe forest ecosystems. It is, for example, important for issues such as wildlife habitat modeling and close-to-nature forest management. We examined the suitability of 8-band WorldView-2 satellite data for the identification of 10 tree species in a temperate forest in Austria. We performed a Random Forest (RF) classification (object-based and pixel-based) using spectra of manually delineated sunlit regions of tree crowns. The overall accuracy for classifying 10 tree species was around 82% (8 bands, object-based). The class-specific producer's accuracies ranged between 33% (European hornbeam) and 94% (European beech), and the user's accuracies between 57% (European hornbeam) and 92% (Lawson's cypress). The object-based approach outperformed the pixel-based approach. We showed that the 4 new WorldView-2 bands (Coastal, Yellow, Red Edge, and Near Infrared 2) have only limited impact on classification accuracy if only the 4 main tree species (Norway spruce, Scots pine, European beech, and English oak) are to be separated. However, classification accuracy increased significantly using the full spectral resolution if further tree species were included. Beside the impact on overall classification accuracy, the importance of the spectral bands was evaluated with two measures provided by RF. An in-depth analysis of the RF output was carried out to evaluate the impact of reference data quality and the resulting reliability of final class assignments. Finally, an extensive literature review on tree species classification comprising about 20 studies is presented.
Article
Full-text available
The Random Forest (RF) method was used to classify whether rockburst will occur, and its intensity, in underground rock projects. Main control factors of rockburst, such as the values of in-situ stresses, the uniaxial compressive strength and tensile strength of rock, and the elastic energy index of rock, were selected in the analysis. The traditional indicators were summarized and divided into indexes I and II. A Random Forest model and criterion were obtained by training 36 sets of rockburst samples from underground rock projects in China and abroad. Another 10 samples were tested and evaluated with the model. The evaluated results agree well with the practical records. Comparing the support vector machine (SVM) and artificial neural network (ANN) methods with the random forest method, the corresponding misjudgment ratios are 10%, 20%, and 0, respectively. The misjudgment ratio using index I is smaller than that using index II. It is suggested that using index I with the RF model can accurately classify rockburst grade.
Article
Full-text available
Floods often take place around rivers and plains, which indicates a higher risk of flooding in these areas. This paper adopts fuzzy comprehensive assessment (FCA), simple fuzzy classification (SFC), and the fuzzy similarity method (FSM) to assess flood disaster risk in Kelantan, Malaysia. Validation data, such as the flooded area, paddy area, urban area, residential area, and refuges, were overlaid to validate and analyze the accuracy of the flood disaster risk. The results show that (1) 70–75% of flooded areas lie within the higher and highest risk zones, which indicates effective assessment accuracy; (2) paddy, built-up, and residential areas concentrated in the higher and highest risk zones are more likely to be destroyed by flood disasters; (3) the 200–225 refuges in the higher and highest risk zones account for around 50% of all refuges, which means that more refuges should be built in these zones to meet accommodation requirements; and (4) all three methods proved feasible and effective in evaluating flood disaster risk, among which FCA is more suitable for the study area than the other two methods.
Article
Full-text available
The usual approach for flood damage assessment consists of stage-damage functions which relate the relative or absolute damage for a certain class of objects to the inundation depth. Other characteristics of the flooding situation and of the flooded object are rarely taken into account, although flood damage is influenced by a variety of factors. We apply a group of data-mining techniques, known as tree-structured models, to flood damage assessment. A very comprehensive data set of more than 1000 records of direct building damage of private households in Germany is used. Each record contains details about a large variety of potential damage-influencing characteristics, such as hydrological and hydraulic aspects of the flooding situation, early warning and emergency measures undertaken, state of precaution of the household, building characteristics and socio-economic status of the household. Regression trees and bagging decision trees are used to select the more important damage-influencing variables and to derive multi-variate flood damage models. It is shown that these models outperform existing models, and that tree-structured models are a promising alternative to traditional damage models.
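A hedged sketch of the bagged regression trees described above, using scikit-learn; the damage-influencing variables and data-generating process are placeholders for the German damage records:

```python
# Sketch of bagged regression trees for multi-variate flood damage
# modeling. The variables and synthetic data are placeholders for the
# German building-damage records described above.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0, 3, n),    # inundation depth (m)
    rng.integers(0, 2, n),   # early warning received (0/1)
    rng.uniform(0, 1, n),    # precaution indicator
])
damage = (0.2 * X[:, 0] - 0.05 * X[:, 1] - 0.08 * X[:, 2]
          + rng.normal(0, 0.02, n)).clip(0, 1)  # relative damage in [0, 1]

model = BaggingRegressor(estimator=DecisionTreeRegressor(max_depth=6),
                         n_estimators=100, oob_score=True, random_state=0)
model.fit(X, damage)
print("Out-of-bag R2:", round(model.oob_score_, 3))
```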
Article
Full-text available
Flood risk management decisions require the rational assessment of mitigation strategies. This is a complex decision-making process involving many uncertainties. This paper presents a case study where a cost-benefit-based methodology is used to define the best intervention measures for flood-risk mitigation in central Spain. Based on different flood hazard scenarios, several structural measures considered by the local Basin Water Authority and others defined by engineering criteria were checked for operability. Non-systematic data derived from dendrogeomorphological analysis of riparian trees were included in the flood frequency analysis. Flood damage was assessed by means of depth-damage functions, and flooded urban areas were obtained by applying a hydraulic model. The best defense strategies were obtained by a cost-benefit procedure in which the uncertainties derived from each analytical step were incorporated through a stochastic approach to estimate expected economic losses. The results showed that large structural solutions are not economically viable when compared with other, smaller structural measures, presumably because the pre-established locations of dams in the upper part of the basin do not attenuate the flow that the surrounding catchment contributes toward Navaluenga.
Article
Full-text available
Flooding is the most common natural hazard in Greece, and most low-lying urban centers are flood-prone areas. Assessment of flood hazard zones is a necessity for the rational management of watersheds. In this study, the analytical hierarchy process was coupled with geographical information systems in order to assess flood hazard based on both natural and anthropogenic factors. The proposed method was applied to the Kassandra Peninsula in Northern Greece. The morphometric and hydrographic characteristics of the watersheds were calculated. Moreover, the natural flood genesis factors were examined, and, subsequently, the anthropogenic interventions within stream beds were recorded. On the basis of the above elements, two flood hazard indexes were defined, separately for natural and anthropogenic factors. According to the results of these indexes, the watersheds of the study area were grouped into hazard classes. For the majority of watersheds, the derived hazard class was medium (according to the classification) due to natural factors and very high due to anthropogenic factors. The results were found to converge with historical data on flood events, showing that the resulting flood hazard maps represent hazard realistically.
Article
Full-text available
Flooding is one of the most destructive natural hazards, causing damage to both life and property every year; the development of flood models to determine inundation areas in watersheds is therefore important for decision makers. In recent years, data-mining approaches such as artificial neural network (ANN) techniques have been increasingly used for flood modeling. Previously, the ANN method was frequently used for hydrological and flood modeling by taking rainfall as input and runoff data as output, usually without taking other flood-causative factors into consideration. The specific objective of this study is to develop a flood model from various flood-causative factors, using ANN techniques and a geographic information system (GIS) to model and simulate flood-prone areas in the southern part of Peninsular Malaysia. The ANN model for this study was developed in MATLAB using seven flood-causative factors. Relevant thematic layers (including rainfall, slope, elevation, flow accumulation, soil, land use, and geology) were generated using GIS, remote sensing data, and field surveys. In the context of objective weight assignment, the ANN is used to directly produce water levels, and the flood map is then constructed in GIS. To measure the performance of the model, four performance criteria are used: the coefficient of determination (R²), the sum of squared errors, the mean squared error, and the root mean squared error. The verification results showed satisfactory agreement between the predicted and the real hydrological records. The results of this study could be used to help local and national governments plan for the future and develop new infrastructure, appropriate to the local environmental conditions, to protect the lives and property of the people of Johor.
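A minimal sketch of such an ANN model with the four stated performance criteria, using scikit-learn rather than MATLAB and random placeholder inputs in place of the seven Johor thematic layers, might look like this:

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(2)
n = 2000
# Placeholder columns for rainfall, slope, elevation, flow accumulation,
# soil, land use, and geology (all scaled to [0, 1] here).
X = rng.uniform(0, 1, (n, 7))
# Hypothetical water level driven by a few of the factors plus noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 3] + rng.normal(0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X_tr, y_tr)
pred = ann.predict(X_te)
mse = mean_squared_error(y_te, pred)
print("R2  =", r2_score(y_te, pred))
print("SSE =", mse * len(y_te), " MSE =", mse, " RMSE =", mse ** 0.5)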
Article
Full-text available
This study contributes to the comprehensive assessment of flood hazard and risk for the Phrae flood plain of the Yom River basin in northern Thailand. The study was carried out using a hydrologic–hydrodynamic model in conjunction with a geographic information system (GIS). The model was calibrated and verified using observed rainfall and river flood data from the flood seasons of 1994 and 2001, respectively. Flooding scenarios were evaluated in terms of flooding depth for events of 25-, 50-, 100- and 200-year return periods. An impact-based hazard estimation technique was applied to assess the degree of hazard across the flood plain. The results showed that 78% of the Phrae flood-plain area of 476 km² in the upper Yom River basin lies in the hazard zone of the 100-year return-period flood. Risk analyses were performed by incorporating flood hazard and the vulnerability of elements at risk. Based on the relative magnitude of risk, flood-prone areas were divided into low-, moderate-, high- and severe-risk zones. For the 100-year return-period flood, the risk-free area was found to be 22% of the total flood plain, while the areas under low, medium, high and severe risk were 33, 11, 28 and 6%, respectively. The outcomes are consistent with overall property damage recorded in the past. The study identifies risk areas for priority-based flood management, which is crucial when there is a limited budget to protect the entire risk zone simultaneously. Citation: Tingsanchali, T. & Karim, F. (2010) Flood-hazard assessment and risk-based zoning of a tropical flood plain: case study of the Yom River, Thailand. Hydrol. Sci. J. 55(2), 145–161.
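The hazard-by-vulnerability zoning idea can be illustrated with a small lookup-table sketch; the depth thresholds and the class matrix below are assumptions for illustration, not the study's calibrated values.

import numpy as np

def hazard_class(depth_m):
    # 0 = none, 1 = low, 2 = moderate, 3 = high (illustrative thresholds)
    return np.digitize(depth_m, [0.0, 0.5, 1.5], right=True)

# risk_matrix[hazard, vulnerability]: 0 none, 1 low, 2 medium, 3 high, 4 severe
risk_matrix = np.array([[0, 0, 0],
                        [1, 1, 2],
                        [2, 3, 3],
                        [3, 4, 4]])

depth = np.array([0.0, 0.3, 0.9, 2.4])  # 100-yr flood depths (m) at four cells
vuln = np.array([1, 2, 0, 2])           # 0 low, 1 medium, 2 high vulnerability
risk = risk_matrix[hazard_class(depth), vuln]
print(risk)  # -> [0 2 2 4]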
Article
Full-text available
Observed rainfall and flow data from the Dongjiang River basin in humid southern China were used to investigate runoff changes during low-flow and flooding periods and in annual flows over the past 45 years. We first applied the non-parametric Mann–Kendall rank statistic method to analyze the change trend in precipitation, surface runoff and pan evaporation in those three periods. Findings showed that only the surface runoff in the low-flow period increased significantly, which was due to a combination of increased precipitation and decreased pan evaporation. The Pettitt–Mann–Whitney statistical test results showed that 1973 and 1978 were the change points for the low-flow period runoff in the Boluo sub-catchment and in the Qilinzui sub-catchment, respectively. Most importantly, we have developed a framework to separate the effects of climate change and human activities on the changes in surface runoff based on the back-propagation artificial neural network (BP-ANN) method from this research. Analyses from this study indicated that climate variabilities such as changes in precipitation and evaporation, and human activities such as reservoir operations, each accounted for about 50% of the runoff change in the low-flow period in the study basin. Copyright © 2010 John Wiley & Sons, Ltd.
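For reference, the non-parametric Mann–Kendall test mentioned above can be implemented compactly; the sketch below uses the standard no-ties variance formula and a synthetic series rather than the Dongjiang observations.

import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S statistic: concordant minus discordant pairs over all i < j.
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0   # variance assuming no ties
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - norm.cdf(abs(z)))             # two-sided p-value
    return z, p

rng = np.random.default_rng(3)
runoff = 100 + 0.8 * np.arange(45) + rng.normal(0, 10, 45)  # trending series
z, p = mann_kendall(runoff)
print(f"Z = {z:.2f}, p = {p:.4f}")  # |Z| > 1.96 -> significant trend at the 5% level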
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
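A minimal modern sketch of this idea, using scikit-learn's soft-margin SVC with a polynomial kernel on a toy data set (a stand-in for the OCR benchmark), might look like this:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Degree-3 polynomial kernel; C controls the soft-margin penalty that
# handles non-separable training data.
clf = SVC(kernel="poly", degree=3, C=1.0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))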
Book
Digital Terrain Analysis in Soil Science and Geology provides soil scientists and geologists with an integrated view of the principles and methods of digital terrain analysis (DTA). Its attention to first principles and focus on error analysis makes it a useful resource for scientists to uncover the method applications particular to their needs. Digital Terrain Analysis in Soil Science and Geology covers a wide range of geological applications in the context of multi-scale problems of soil science and geology. Presents a mathematical approach from a single author who is actively researching in the field and has published a number of fundamental papers. Outlines principles, methods, then follows with examples in a simple set-up that builds on content. Provides an integrated view of the principles and methods of DTA.
Article
Hierarchical comprehensive evaluation methods are widely used in flood disaster loss assessment and in predicting the severity of flood risk. How to improve their performance and speed is still an ongoing research problem. The support vector machine (SVM) has proven to be one of the most effective methods for classification problems with small samples. However, the traditional SVM does not reflect the differing importance of different indexes, which leads to error. We therefore propose the combined weighted SVM (CWSVM) to evaluate flood disaster grade. The model modifies the kernel function of the SVM and addresses the requirement that the samples' Euclidean distances truly embody the differences between features. In addition, the model adjusts the distance using both objective value differences and subjective human judgments. In a comparative analysis of assessment results on flood disaster data for China from 1950 to 2009, the CWSVM obtained higher classification precision. The research offers a new and efficient way to solve the multi-index comprehensive evaluation problem.
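One simple way to realize an index-weighted kernel of this flavor: weighting index k by w_k inside a Gaussian kernel, exp(-gamma * sum_k w_k (x_k - z_k)^2), is equivalent to scaling each feature by sqrt(w_k) before applying a standard RBF SVM. The sketch below uses invented weights and synthetic data, not the paper's CWSVM formulation.

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
w = np.array([0.4, 0.3, 0.2, 0.1])  # per-index importance weights, sum to 1
Xw = X * np.sqrt(w)                  # weighted feature space

for name, data in [("plain SVM", X), ("weighted SVM", Xw)]:
    acc = cross_val_score(SVC(kernel="rbf", gamma="scale"), data, y, cv=5).mean()
    print(f"{name}: CV accuracy = {acc:.3f}")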
Chapter
In this chapter we consider bounds on the rate of uniform convergence. We consider upper bounds (there exist lower bounds as well (Vapnik and Chervonenkis, 1974); however, they are not as important for controlling the learning processes as the upper bounds).
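For orientation, a representative upper bound of this kind (stated here from general knowledge of the VC framework, not from this chapter's exact notation) says that with probability at least $1-\eta$, for every function in a class of VC dimension $h$ trained on $\ell$ examples,

$$R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2\ell}{h}+1\right)-\ln\frac{\eta}{4}}{\ell}},$$

where $R$ denotes the expected risk and $R_{\mathrm{emp}}$ the empirical risk on the training sample.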
Chapter
In the history of research of the learning problem one can extract four periods that can be characterized by four bright events: (i) Constructing the first learning machines, (ii) constructing the fundamentals of the theory, (iii) constructing neural networks, (iv) constructing the alternatives to neural networks.
Article
To ascertain the optimal reserve of professional relief workers in China, the author defines and optimizes the reserve cycle of professional relief workers. Based on an analysis of the optimal personnel reserve in a single cycle, the author obtains the short-term and long-term security personnel reserves as well as the corresponding optimal reserve levels, and then analyzes the factors that influence the optimal reserve and the corresponding adjustment methods.
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
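The two internal estimates mentioned in the abstract, out-of-bag (OOB) error and variable importance, are directly exposed by common implementations; a minimal scikit-learn sketch on a synthetic stand-in data set:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X, y)
print("OOB error:", 1 - rf.oob_score_)            # internal generalization estimate
print("importances:", rf.feature_importances_.round(3))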
Article
Effectively predicting corporate financial distress is an important and challenging issue for companies. This research aims to predict financial distress using an integrated model of rough set theory (RST) and the support vector machine (SVM), in order to find a better early-warning method and enhance prediction accuracy. In several comparative experiments with a dataset of Chinese listed companies, rough set theory proved to be an effective approach for reducing redundant information. Our results indicate that the SVM performs better than the BPNN when used for corporate financial distress prediction.
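The reduce-then-classify pipeline can be sketched as follows; since scikit-learn has no rough-set reduction, a mutual-information filter stands in for the RST step here (a plainly substituted technique), and the data are synthetic.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
pipe = make_pipeline(StandardScaler(),
                     SelectKBest(mutual_info_classif, k=5),  # stands in for the RST reduct
                     SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(3))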
Article
Corporate going-concern opinions are not only useful in predicting bankruptcy but also provide some explanatory power in predicting bankruptcy resolution. The prediction of a firm's ability to remain a going concern is an important and challenging issue that has served as the impetus for many academic studies over the last few decades. Although intellectual capital (IC) is generally acknowledged as the key factor contributing to a corporation's ability to remain a going concern, it has not been considered in early prediction models. The objective of this study is to increase the accuracy of going-concern prediction by using a hybrid random forest (RF) and rough set theory (RST) approach, while adopting IC as a predictive variable. The results show that this proposed hybrid approach has the best classification rate and the lowest occurrence of Types I and II errors, and that IC is indeed valuable for going-concern prediction.
Article
The T-lymphocyte (T-cell) is a very important component of the human immune system. T-cell epitopes can be used to accurately monitor immune responses activated through the major histocompatibility complex (MHC) and to rationally design vaccines. Accurate prediction of T-cell epitopes is therefore crucial for vaccine development and clinical immunology. In the current study, two types of peptide features, i.e., amino acid properties and chemical molecular features, were used to represent T-cell epitope peptides. Based on these features, the random forest (RF) algorithm, a powerful machine-learning algorithm, was used to classify T-cell epitopes and non-T-cell epitopes. The classification accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC) values for the proposed method are 97.54%, 97.22%, 97.60%, 0.9193, and 0.9868, respectively. These results indicate that the current method, based on the combined features and RF, is effective for T-cell epitope prediction.
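The five reported metrics can be computed as below for a binary RF classifier; the synthetic data stand in for the peptide feature vectors.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             matthews_corrcoef, roc_auc_score)

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

pred = rf.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("accuracy   :", accuracy_score(y_te, pred))
print("sensitivity:", tp / (tp + fn))   # true-positive rate
print("specificity:", tn / (tn + fp))   # true-negative rate
print("MCC        :", matthews_corrcoef(y_te, pred))
print("AUC        :", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))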