Article

Comprehensive evaluation of machine learning algorithms for flood susceptibility mapping in Wardha River sub-basin, India


Abstract

Machine learning offers a powerful and versatile approach to flood susceptibility mapping, enabling us to leverage complex data and improve prediction accuracy. Given the plethora of available techniques and the challenges in selecting the optimal approach, this study investigates prominent ML algorithms for flood susceptibility mapping (FSM) in the Wardha River sub-basin, India. Seven machine learning algorithms, viz. support vector machine (SVM), extreme gradient boosting (XGB), artificial neural network (ANN), generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and linear discriminant analysis (LDA), were evaluated at varying spatial resolutions (30 m, 50 m, 100 m, and 200 m). Seven flood-inducing factors (elevation, flow accumulation, topographic wetness index, slope, rainfall, land use, and drain density) were considered. Model performance was assessed using sensitivity, specificity, area under the curve (AUC), overall correlation, overall standard deviation ratio, and overall root mean square difference (RMSD). The impact of spatial resolution on models’ accuracy was analysed. SVM, GBM, and RF were significantly affected, while ANN, GLM, and XGB were less sensitive. LDA excelled in execution time and spatial resolution resilience. The overall ranking of models was executed based on their accuracy, AUC, and execution time. XGB outperformed GBM and RF, securing first place, while SVM ranked last. GLM, ANN, and LDA ranked third to fifth. The results highlighted the importance of algorithm selection in accurately mapping flood susceptibility, particularly when working with varying spatial resolution data. The study findings can inform the decision-making process for implementing FSM using these machine learning algorithms.
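The sensitivity, specificity, and AUC metrics named in the abstract can be illustrated with a minimal sketch. The labels and scores below are made-up values, the 0.5 threshold is an assumption, and the function names are hypothetical; this is not the study's own evaluation code.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels and predicted scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random flooded point outscores a random
    non-flooded point (ties count one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative data only: 1 = flooded, 0 = non-flooded.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(sens, spec, auc)
```

The pairwise formulation of AUC is quadratic in the sample size, which is acceptable for a small inventory; a rank-based computation would be preferable for large rasters.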


... Machine learning (ML) methods, on the other hand, are emerging as a promising alternative in flood risk management and disaster reduction [19,20]. Unlike physically-based models, ML approaches leverage algorithms to automatically learn patterns from historical data, such as rainfall, land use, and flood records [21]. ...
Article
Full-text available
Urban flooding threatens urban resilience and challenges SDGs 11 and 13. This study assesses urban building flood risk in Guangzhou by integrating flood susceptibility with building function vulnerability. Using a Random Forest (RF) model, it predicts flood susceptibility based on flood records, hydrological, topographical, and anthropogenic features. The Categorical Boosting (CatBoost) model identifies building functions using POI and AOI data. Results reveal significant spatial variations: central districts exhibit higher flood susceptibility, while peripheral areas remain less affected. Over half of the buildings are moderately vulnerable, with only a small fraction highly vulnerable. Based on flood susceptibility and functional vulnerability, Guangzhou is classified into three district types: central urban (Type I), intermediate urban (Type II), and suburban/rural (Type III). The study underscores the need for tailored flood risk management strategies to address these disparities and mitigate climate change-induced water hazards.
Article
Full-text available
This study investigates sediment removal efficiency in river systems for optimizing hydroelectric projects using a novel Deep Neural Network (DNN) model tailored for river basin sediment transport. Aimed at evaluating sedimentation dynamics under various discharge conditions, the research explores sediment removal efficiency over extended run times (0–180 h) across discharge rates of 250 to 500 cumec. The DNN model employs river discharge and sediment concentration values as key inputs to predict sediment removal efficiency, capturing the complex interplay between discharge conditions and sediment transport patterns. Differential Evolution Optimization was applied to refine operational parameters, achieving an optimal discharge rate of 374.40 cumec and a sediment concentration of 2497.61 ppm, ensuring efficient operation while minimizing turbine wear due to sediment deposition. The model also identified optimal conditions for the diversion tunnel, set at 130 cumec and 2714.87 ppm, sustained for a runtime of 97.12 h to maintain a balanced sediment flow. Performance evaluation metrics, including R-squared (99.49%), MSE (0.1847), RMSE (0.4297), and MAE (0.2409), indicate superior model accuracy and predictive reliability. Additionally, Taylor diagrams were used to validate the model’s generalization capability across training and testing phases. This research highlights the DNN model’s potential in sedimentation forecasting, contributing to river management and environmental conservation, and suggests future integration with environmental variables for enhanced predictive capacity. The study demonstrates significant advancements in data-driven sediment analysis and underscores the critical role of optimization for sustainable hydroelectric operation.
Article
Full-text available
Flooding poses a significant threat as a prevalent natural disaster. To mitigate its impact, identifying flood-prone areas through susceptibility mapping is essential for effective flood risk management. This study conducted flood susceptibility mapping (FSM) in Chandrapur district, Maharashtra, India, using geographic information system (GIS)-based frequency ratio (FR) and Shannon’s entropy index (SEI) models. Seven flood-contributing factors were considered, and historical flood data were utilized for model training and testing. Model performance was evaluated using the area under the curve (AUC) metric. The AUC values of 0.982 for the SEI model and 0.966 for the FR model in the test dataset underscore the robust performance of both models. The results revealed that 5.4% and 8.1% (FR model) and 3.8% and 7.6% (SEI model) of the study area face very high and high risks of flooding, respectively. Comparative analysis indicated the superiority of the SEI model. The key limitations of the models are discussed. This study attempted to simplify the process for the easy and straightforward implementation of FR and SEI statistical flood susceptibility models along with key insights into the flood vulnerability of the study region.
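The frequency ratio (FR) model mentioned above is commonly computed per factor class as the share of flood locations falling in that class divided by the share of the study area the class occupies. The sketch below assumes that standard definition; the slope classes and pixel counts are invented for illustration and are not taken from the study.

```python
def frequency_ratio(class_pixels, flood_pixels):
    """Frequency ratio per class: (flood share in class) / (area share of class).

    class_pixels: dict mapping class name -> number of pixels in that class
    flood_pixels: dict mapping class name -> number of flood pixels in that class
    """
    total_pixels = sum(class_pixels.values())
    total_floods = sum(flood_pixels.values())
    ratios = {}
    for cls, n_pixels in class_pixels.items():
        flood_share = flood_pixels.get(cls, 0) / total_floods
        area_share = n_pixels / total_pixels
        ratios[cls] = flood_share / area_share
    return ratios


# Hypothetical slope classes and counts (illustration only).
slope_pixels = {"0-5 deg": 5000, "5-15 deg": 3000, ">15 deg": 2000}
slope_floods = {"0-5 deg": 80, "5-15 deg": 15, ">15 deg": 5}

fr = frequency_ratio(slope_pixels, slope_floods)
for cls, value in fr.items():
    print(cls, round(value, 3))
```

A ratio above 1 indicates that floods are over-represented in the class relative to its area, and a ratio below 1 indicates the opposite.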
Article
Full-text available
Flash floods stand as a substantial peril linked to climate change, imposing a severe menace to both human existence and built structures. This study aims to assess and compare the effectiveness of four distinct machine learning (ML) methodologies in the production of flood susceptibility maps (FSMs) in Ibaraki prefecture, Japan. Additionally, the investigation aims to examine the influence of excluding plan and profile curvature factors on the accuracy of the resulting maps. The dataset comprised 224 spots, consisting of 112 flooded and 112 non-flooded locations, and 11 environmental factors. The models were trained using 70% of the dataset, while the remaining 30% was utilized for model evaluation using the ROC curve method. The results indicated that both the ANN-MLP and SVR models achieved notable accuracy, with area under curve values of 95.23% and 95.83% respectively. An intriguing observation was made when the plan and profile curvature factors were excluded, as it led to an improvement in the accuracy of the ANN-MLP model, resulting in an accuracy of 96.7%. Furthermore, the generated FSMs were classified into five distinct hazard levels. The northern region of the maps predominantly exhibited very low and low hazard levels, while areas located in the southern region, closer to main streams, demonstrated considerably higher hazard levels categorized as very high and high. Ultimately, this study marks a novel endeavor to investigate the impact of the curvature factor on the precision of machine learning algorithms in the creation of FSMs, which serve as fundamental tools for subsequent investigations.
Article
Full-text available
Urban flooding can differ significantly from rural flooding due to the influence of rapidly changing land use and rainfall patterns on runoff in urban areas. Consequently, understanding and managing urban flooding necessitate a comprehensive grasp of these influential factors. This study focuses on assessing the impact of land use and rainfall changes on runoff and flood resilience in urban areas of Chennai, India, utilizing the InVEST-UFRM model. The research includes an evaluation of flood risk and potential damage to building infrastructure, examining 14 sub-basins within the study area with diverse land use and rainfall depths for 2015, 2020, and 2025. Observed rainfall and land use data were employed for 2015 and 2020, while future rainfall data relied on Global Circulation Models (GCMs) of the Coupled Model Inter-comparison Project-6 (CMIP6) outputs and QGIS MOLUSCE plugin predicted land use for 2025. The study identified that the change in land use had a more significant impact on runoff than the temporal change in rainfall amount. Notably, the reduction of water bodies in the study area emerged as a major contributing factor to excessive runoff. The estimated maximum potential damage to built infrastructure in the study area reached approximately 10 billion USD. This research provides valuable insights into urban flood resilience and the impact of land use and rainfall changes and proposes effective measures for flood adaptation and mitigation. The study findings can serve as essential tools for urban planners in an effective management of urban floods in similar regions as investigated here.
Article
Full-text available
Identifying flood-prone areas is essential for preventing floods, reducing risks, and making informed decisions. A spatial database with 595 flood inventory records and 13 flood predictors was used to implement five boosting algorithms, namely gradient boosting machine (GBM), extreme gradient boosting, categorical boosting, logit boost, and light gradient boosting machine (LGBM), to map flood susceptibility in Rathnapura while evaluating the trained models' generalization ability and assessing feature importance in flood susceptibility mapping (FSM). The model performance was evaluated using the F1-score, kappa index, and area under curve (AUC) method. The findings revealed that all the models were effective in identifying the overall flood susceptibility trends, while the LightGBM model had superior results (F1-score = 0.907, Kappa value = 0.813 and AUC = 0.970), securing the top scores across all performance metrics compared to the other models (for the testing dataset). Based on kappa evaluation, most of the models had finer performance (AUC min = 0.737) while LightGBM had moderate performance for predictions beyond the training region. According to the results, regions with lower altitudes and topographic roughness values, moderate rainfall, and proximity to rivers are more susceptible to flooding. This framework can be adapted for rapid FSM in data-deficient regions.
Article
Full-text available
A flood is a common and highly destructive natural disaster. Recently, machine learning methods have been widely used in flood susceptibility analysis. This paper proposes a NHAND (New Height Above the Nearest Drainage) model as a framework to evaluate the effectiveness of both individual learners and ensemble models in addressing intricate flood-related challenges. The evaluation process encompasses critical dimensions such as prediction accuracy, model training duration, and stability. Research findings reveal that, compared to Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Lasso, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), Stacked Generalization (Stacking) outperforms in terms of predictive accuracy and stability. Meanwhile, XGBoost exhibits notable efficiency in terms of training duration. Additionally, the Shapley Additive Explanations (SHAP) method is employed to explain the predictions made by the XGBoost.
Article
Full-text available
A flood is a natural catastrophe that causes heavy damage not only to people but also to properties. To prevent and mitigate flood damage, an accurate flood susceptibility map that reveals highly potential flood-prone areas is essential. This study aims to construct flood susceptibility maps in the Huong Khe district using three machine learning algorithms, namely the K-Nearest Neighbour (KNN), the Support Vector Machine (SVM) and the Artificial Neural Network (ANN). Training and testing datasets were extracted from Sentinel-1 SAR images. Seven causative factors were selected as input for predictive models after removing high-correlation factors and unimportant factors through a rigorous screening process by analyzing the Pearson correlation coefficient (PCC) and calculating the information gain ratio (InGR). The models' hyperparameters were found by a grid search algorithm integrated with 5-fold cross-validation. The three optimal flood susceptibility models showed excellent performance, with very high accuracy indices in the training and testing phases, over 90% of overall accuracy and AUC values. High and very high susceptibility classes on flood susceptibility maps accounted for around 18% of the total study area and were mainly located in residential and agricultural areas. Thus, there is a need to make proper land use planning for these areas to reduce damage in flood seasons.
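Grid search combined with 5-fold cross-validation, as described above, can be sketched in plain Python. The example below is an assumption-laden illustration: it tunes only the neighbour count of a toy KNN classifier on synthetic two-feature data, and the function names, grid values, and data are hypothetical rather than drawn from the study.

```python
import random


def knn_predict(train, query, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query))
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def k_fold_indices(n, n_folds, rng):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(data, k, folds):
    """Mean held-out accuracy of KNN with k neighbours over the given folds."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        correct = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def grid_search(data, k_grid, n_folds=5, seed=0):
    """Evaluate every k in k_grid on the same folds and return the best one."""
    folds = k_fold_indices(len(data), n_folds, random.Random(seed))
    results = {k: cv_accuracy(data, k, folds) for k in k_grid}
    best_k = max(results, key=results.get)
    return best_k, results


# Synthetic stand-in for flood (1) and non-flood (0) samples with two features.
rng = random.Random(42)
data = []
for _ in range(50):
    data.append(((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0))
    data.append(((rng.gauss(2.5, 1.0), rng.gauss(2.5, 1.0)), 1))

best_k, results = grid_search(data, k_grid=[1, 3, 5, 7, 9])
print(best_k, results)
```

Reusing the same folds for every candidate value keeps the comparison between grid points fair; odd neighbour counts avoid tied votes in the binary case.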
Article
Full-text available
The drainage characteristics of the Pagla river basin cause flooding that harms the socioeconomic environment. The main purpose of this study is to compare six machine learning algorithms for flood susceptibility mapping and to assess the capability of ensemble techniques to elucidate the underlying patterns of floods and predict flood susceptibility more accurately in the Pagla river basin. Floods in the study area have become more frequent with heavy and sudden rainfall, so studying flood mitigation measures is essential. First, a spatial flood database with 200 flood locations and sixteen flood-influencing factors was built and processed in a Geographic Information System (GIS) environment, and different models were built using machine learning techniques. Flood susceptibility zones were delineated using Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), Reduced Error Pruning Tree (REPTree), Logistic Regression (LR), and Bagging in the GIS environment, and the models were validated using the Receiver Operating Characteristic (ROC) curve. Afterward, all the models were ensembled to obtain a comparative accuracy of the flood zones. The calculated areas under the very high flood susceptibility zone are 8.69%, 14.92%, 14.17%, 12.98%, 14.65%, 13.24% and 13.41% for ANN, SVM, RF, REPTree, LR and Bagging, respectively. Finally, the ROC curve, the Standard Error (SE), and the Confidence Interval (CI) at 95 per cent were used to assess and compare the performance of the models. The results indicate that all models reach a highly acceptable Area Under Curve (AUC) of ROC, ranging from 0.889 (LR) to 0.926 (Ensemble). The ROC-based accuracy assessment shows that the Ensemble model has a higher capability than the other applied models in projecting flood susceptibility in the study area. It has the highest AUC values (0.918 and 0.926), with SE values of 0.023 and 0.034 and the narrowest 95 per cent CI (0.873–0.962 and 0.859–0.993), whereas the Bagging model has AUC values of 0.914 and 0.919, for the training and validation datasets. The ensemble result shows that the highly flood-susceptible area is located in the lower part of the study area. In this area, the very high flood susceptibility zone values lie between 4.46 and 6.00 in the ensemble result. These areas are low-lying and belong to the Murarai I, Murarai II, Suti I and Suti II C.D. blocks of West Bengal. The current study will help policymakers and researchers determine the flood conditioning problems for prospects.
Article
Full-text available
A correct understanding of the parameters and methods used in flood susceptibility mapping (FSM) is critical for identifying the strengths and limitations of different mapping approaches, as well as for developing methodologies. In this study, we examined scientific publications in the literature using WoS. Although the number of methods used is quite high, the number of parameters used in these methods varies, with a maximum of 21 and a minimum of 5 parameters preferred. It was found that the most commonly used parameter has a preference rate of 97%, but there is no common parameter in 100% of the studies. The methods used for determining flood susceptibility include multi-criteria decision-making (MCDM) methods, physically based hydrological models, statistical methods, and various soft computing methods. Although the use of traditional statistical methods and MCDM methods is already high among researchers, the methods used in flood susceptibility analysis have evolved over the years from traditional human judgments to statistical methods based on big data and machine learning. In the reviewed studies, it was observed that machine learning, fuzzy logic, metaheuristic optimization algorithms, and heuristic search algorithms, which are soft computing methods, have been widely used in FSM in recent years.
HIGHLIGHTS
Determination of the methods used in the literature for susceptibility mapping used to identify flood-prone areas for sustainable flood management (more than 150 methods have been found to be used). Also, creating master classes for these methods.
Interpreting the interchangeability of the parameters used in the FSM methods in the literature and creating master classes for these parameters for researchers.
Article
Full-text available
Flood, a distinctive natural calamity, has occurred more frequently in the last few decades all over the world, which is often an unexpected and inevitable natural hazard, but the losses and damages can be managed and controlled by adopting effective measures. In recent times, flood hazard susceptibility mapping has become a prime concern in minimizing the worst impact of this global threat; but the nonlinear relationship between several flood causative factors and the dynamicity of risk levels makes it complicated and confronted with substantial challenges to reliable assessment. Therefore, we have considered SVM, RF, and ANN—three distinctive ML algorithms in the GIS platform—to delineate the flood hazard risk zones of the subtropical Kangsabati river basin, West Bengal, India; which experienced frequent flood events because of intense rainfall throughout the monsoon season. In our study, all adopted ML algorithms are more efficient in solving all the non-linear problems in flood hazard risk assessment; multi-collinearity analysis and Pearson’s correlation coefficient techniques have been used to identify the collinearity issues among all fifteen adopted flood causative factors. In this research, the predicted results are evaluated through six prominent and reliable statistical (“AUC-ROC, specificity, sensitivity, PPV, NPV, F-score”) and one graphical (Taylor diagram) technique and shows that ANN is the most reliable modeling approach followed by RF and SVM models. The values of AUC in the ANN model for the training and validation datasets are 0.901 and 0.891, respectively. The derived result states that about 7.54% and 10.41% of areas accordingly lie under the high and extremely high flood danger risk zones. Thus, this study can help the decision-makers in constructing the proper strategy at the regional and national levels to mitigate the flood hazard in a particular region. 
This type of information may help the various authorities apply this outcome in various spheres of decision making. Apart from this, future researchers can also conduct their research by considering this methodology in flood susceptibility assessment.
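A Taylor diagram, used above as the graphical evaluation technique, summarizes three statistics per model: the correlation with the reference, the ratio of standard deviations, and the centered root mean square difference. The sketch below computes them under the usual definitions; the reference and predicted values are invented, and the function name is hypothetical.

```python
import math


def taylor_statistics(reference, predicted):
    """Correlation, standard deviation ratio, and centered RMSD of two series."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    dev_r = [r - mean_r for r in reference]
    dev_p = [p - mean_p for p in predicted]
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_r, dev_p)) / (n * std_r * std_p)
    # Centered RMSD: means are removed before differencing.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_r, dev_p)) / n)
    return {
        "std_reference": std_r,
        "std_predicted": std_p,
        "correlation": corr,
        "std_ratio": std_p / std_r,
        "centered_rmsd": crmsd,
    }


# Illustrative values only: observed flood labels and model susceptibility scores.
reference = [0, 1, 0, 1, 1, 0]
predicted = [0.1, 0.8, 0.3, 0.9, 0.6, 0.2]

stats = taylor_statistics(reference, predicted)
print(stats)
```

The three statistics are tied together by the law of cosines, centered RMSD squared equals the sum of the two variances minus twice the product of the standard deviations and the correlation, which is what lets a single diagram display all of them at once.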
Article
Full-text available
Floods are one of the most destructive natural disasters, causing financial and human losses every year. As a result, reliable Flood Susceptibility Mapping (FSM) is required for effective flood management and reducing its harmful effects. In this study, a new machine learning model based on the Cascade Forest Model (CFM) was developed for FSM. Satellite imagery, historical reports, and field data were used to determine flood-inundated areas. The database included 21 flood-conditioning factors obtained from different sources. The performance of the proposed CFM was evaluated over two study areas, and the results were compared with those of other six machine learning methods, including Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Deep Neural Network (DNN), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost). The result showed CFM produced the highest accuracy compared to other models over both study areas. The Overall Accuracy (AC), Kappa Coefficient (KC), and Area Under the Receiver Operating Characteristic Curve (AUC) of the proposed model were more than 95%, 0.8, 0.95, respectively. Most of these models recognized the southwestern part of the Karun basin, northern and northwestern regions of the Gorganrud basin as susceptible areas.
Article
Full-text available
Floods are among the most devastating environmental hazards that directly and indirectly affect people’s lives and activities. In many countries, sustainable environmental management requires the assessment of floods and the likely flood-prone areas to avoid potential hazards. In this study, the performance and capabilities of seven machine learning algorithms (MLAs) for flood susceptibility mapping were tested, evaluated, and compared. These MLAs, including support vector machine (SVM), random forest (RF), multivariate adaptive regression spline (MARS), boosted regression tree (BRT), functional data analysis (FDA), general linear model (GLM), and multivariate discriminant analysis (MDA), were tested for the area between Safaga and Ras Gharib cities, Red Sea, Egypt. A geospatial database was developed with eleven flood-related factors, namely altitude, slope aspect, lithology, land use/land cover (LULC), slope length (LS), topographic wetness index (TWI), slope angle, profile curvature, plan curvature, stream power index (SPI), and hydrolithology units. In addition, 420 actual flooded areas were recorded from the study area to create a flood inventory map. The inventory data were randomly divided into a training group (70%) and a validation group (30%). The flood-related factors were tested with a multicollinearity test (the variance inflation factor (VIF) was less than 2.135 and the tolerance (TOL) was more than 0.468), and their importance was evaluated with a partial least squares (PLS) method. The results show that RF performed the best with the highest AUC (area under curve) value of 0.813, followed by GLM with 0.802, MARS with 0.801, BRT with 0.777, MDA with 0.768, FDA with 0.763, and SVM with 0.733. The results of this study and the flood susceptibility maps could be useful for environmental mitigation, future development activities in the area, and flood control areas.
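The two multicollinearity thresholds quoted above are linked by the standard definitions of VIF and tolerance. As a short worked check, with $R_j^2$ denoting (as an assumed notation) the coefficient of determination obtained when factor $j$ is regressed on the remaining factors:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{TOL}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j},
\qquad
\frac{1}{2.135} \approx 0.468 .
```

The reported bounds are therefore two expressions of the same worst-case factor, corresponding to $R_j^2 \approx 0.532$.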
Article
Full-text available
Historical exploration of flash flood events and producing flash flood susceptibility maps are crucial steps for decision makers in disaster management. In this paper, classification and regression tree (CART) methodology and its ensemble models of random forest (RF), boosted regression trees (BRT), and extreme gradient boosting (XGBoost) were implemented to create a flash flood susceptibility map of the Bâsca Chiojdului River Basin, one of the areas in Romania that is constantly exposed to flash floods. The torrential areas including 962 flash flood events were delineated from orthophotomaps and field observations. Furthermore, a set of conditioning forces to explain the flash floods was constructed which included aspect, land use and land cover (LULC), hydrological soil groups, lithology, slope, topographic wetness index (TWI), topographic position index (TPI), profile curvature, convergence index, and stream power index (SPI). All models indicated the slope as the most important factor triggering the flash flood occurrence. The highest area under the curve (AUC) was achieved by the RF model (AUC = 0.956), followed by the BRT model (AUC = 0.899), XGBoost model (AUC = 0.892), and CART model (AUC = 0.868), respectively. The results showed that the central part of the Bâsca Chiojdului river basin, which covers approximately 30% of the study area, is more susceptible to flash flooding.
Article
Full-text available
Flood occurs as a result of high intensity and long-term rainfalls accompanied by snowmelt which flow out of the main river channel onto the flood prone areas and damage the buildings, roads, and facilities and cause life losses. This study aims to implement extreme gradient boosting (EGB) method for the first time in flood susceptibility modelling and compare its performance with three advanced benchmark models including Frequency Ratio (FR), Random Forest (RF), and Generalized Additive Model (GAM). Flood susceptibility map is an efficient tool to make decision for flood control. To do this, the altitude, slope degree, profile curvature, topographic wetness index (TWI), distance from rivers, normalized difference vegetation index, plan curvature, rainfall, land use, stream power index, and lithology were fed to the models. To run the models, 243 flood locations were detected by field surveys and national reports. The same number of locations were randomly created in the study regions and considered as non-flood locations. The flood and non-flood locations were split in 70% ratio for the training dataset and 30% ratio for the testing dataset. Both flood and non-flood locations were fed into the models and output flood susceptibility maps were produced. In order to evaluate the performance of the algorithms, receiver operating characteristics (ROC) curve was implemented. The results of the current research show that the RF model and EGB have the best performances with the area under ROC curve (AUC) of 0.985, and 0.980, followed by the GAM and FR algorithms with AUC values of 0.97, and 0.953, respectively. The results of variable importance by the RF model show that distance from rivers has an important influence on flood susceptibility mapping (FSM), followed by profile curvature, slope, TWI, and altitude. Considering the high performances of the RF and EGB models in flood susceptibility modelling, application of these models is recommended for such studies.
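The sampling design described above (equal numbers of flood and non-flood locations, split 70% for training and 30% for testing) can be sketched as a stratified split. The point identifiers, the rounding rule, and the seed below are assumptions made for illustration; the abstract does not state how the split was implemented.

```python
import random


def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (item, label) pairs so each label keeps the same train fraction."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({lab for _, lab in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Placeholder identifiers standing in for 243 flood and 243 non-flood locations.
flood = [(("flood", i), 1) for i in range(243)]
non_flood = [(("non_flood", i), 0) for i in range(243)]

train, test = stratified_split(flood + non_flood)
print(len(train), len(test))
```

Splitting within each class keeps the training and testing sets balanced, which matters when the evaluation relies on threshold-based metrics.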
Article
Full-text available
Floods represent catastrophic environmental hazards that have a significant impact on the environment and human life and their activities. Environmental and water management in many countries require modeling of flood susceptibility to help in reducing the damages and impact of floods. The objective of the current work is to employ four data mining/machine learning models to generate flood susceptibility maps, namely boosted regression tree (BRT), functional data analysis (FDA), general linear model (GLM), and multivariate discriminant analysis (MDA). This study was done in Wadi Qena Basin in Egypt. Flood inundated locations were determined and extracted from the interpretation of different datasets, including high-resolution satellite images (Sentinel-2 and Astro digital) (after flood events), historical records, and intensive field works. In total, 342 flood inundated locations were mapped using ArcGIS 10.5 and separated into two groups: training (239 flood locations, 70%) and validation (103 flood locations, 30%). Nine themes of flood-influencing factors were prepared, including slope-angle, slope length, altitude, distance from main wadis, landuse/landcover, lithological units, curvature, slope-aspect, and topographic wetness index. The relationships between the flood-influencing factors and the flood inventory map were evaluated using the mentioned models (BRT, FDA, GLM, and MDA). The results were compared with flood inundating locations (validating flood sites), which were not used in constructing the models. The accuracy of the models was calculated through the success (training data) and prediction (validation data) rate curves according to the receiver operating characteristics (ROC) and the area under the curve (AUC). The results showed that the AUC for success and prediction rates are 0.783, 0.958, 0.816, 0.821 and 0.812, 0.856, 0.862, 0.769 for BRT, FDA, GLM, and MDA models, respectively.
Subsequently, flood susceptibility maps were divided into five classes, including very low, low, moderate, high, and very high susceptibility. The results revealed that the BRT, FDA, GLM, and MDA models provide reasonable accuracy in flood susceptibility mapping. The produced susceptibility maps might be vitally important for future development activities in the area, especially in choosing new urban areas, infrastructural activities, and flood mitigation areas.
Article
Full-text available
Neurocomputing methods have contributed significantly to the advancement of modelling techniques in surface water hydrology and hydraulics in the last couple of decades, primarily due to their vast performance advantages and usage amenity. This comprehensive review considers the research progress in the past two decades, the current state-of-the-art, and future prospects of the application of neurocomputing to different aspects of hydrological sciences, i.e., quantitative surface hydrology and hydraulics. An extensive literature survey, by running over more than 800 peer-reviewed papers, outlines and concisely explores the past and recent tendencies in the application of conventional neural-based approaches and modern neurocomputing models in relevant topics of hydrological and hydraulic sciences. Apart from segregated descriptions and analyses of the main facets of surface hydrology and hydraulics, this review offers a practical summary of prevailing neurocomputing methods used in different subfields of hydrology and water engineering. Six relevant topics to modelling hydrological and hydraulic sciences are articulated and analysed, including modelling of water level in surface water bodies, flood and risk assessment, sediment transport in river systems, urban water demand prediction, modelling flow through hydro-structures, and hydraulics of sewers. This review is meant to be a mainstream guideline for researchers and practitioners whose work is associated with data mining and machine learning methods in various areas of water engineering and hydrological sciences to assist them to decide on suitable methods, network structures and modelling strategies for a given problem.
Article
Full-text available
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are decision trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of decision trees produces competitive, highly robust, interpretable procedures for regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire 1996, and Fr...
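The stagewise idea in this abstract can be illustrated for the least-squares case, where the negative gradient of the loss is simply the current residual. The sketch below is a simplified illustration, not the paper's TreeBoost procedure: it uses one-split regression stumps on a single feature, a fixed shrinkage factor, and synthetic data, all of which are assumptions.

```python
import math


def fit_stump(x, residuals):
    """Find the single threshold split on x that minimizes squared error."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1], best[2], best[3]


def stump_predict(stump, xi):
    thr, left_mean, right_mean = stump
    return left_mean if xi <= thr else right_mean


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Least-squares boosting: each stump is fit to the current residuals."""
    f0 = sum(y) / len(y)  # initial constant model
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient equals the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump_predict(stump, xi) for xi, pi in zip(x, preds)]
    return f0, learning_rate, stumps


def boosted_predict(model, xi):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_predict(s, xi) for s in stumps)


def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


# Synthetic one-dimensional regression problem (illustration only).
x = [i / 10 for i in range(30)]
y = [math.sin(xi) for xi in x]

model = gradient_boost(x, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [boosted_predict(model, xi) for xi in x])
print(baseline_mse, boosted_mse)
```

Other loss functions change only the quantity the base learner is fit to; the additive update with shrinkage stays the same.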
Article
Although the effect of the digital elevation model (DEM) and its spatial resolution on flood simulation modeling has been well studied, the effect of coarse- and fine-resolution image and DEM data on machine learning ensemble flood susceptibility prediction has not been investigated, particularly in data-sparse conditions. The present work therefore investigated how coarse-resolution (Landsat and SRTM) and high-resolution (Sentinel-2 and ALOS PALSAR) data affect the performance of flood susceptibility models. Another aim of this study was to construct highly precise and robust flood susceptibility models using standalone and ensemble machine learning algorithms. Fifteen flood conditioning parameters were generated from both the coarse- and high-resolution datasets. The ANN-multilayer perceptron (MLP), random forest (RF), bagging (B)-MLP, B-Gaussian processes (B-GP) and B-SMOreg algorithms were then used to integrate the flood conditioning parameters and generate the flood susceptibility models. Furthermore, the influence of the flood conditioning parameters on flood susceptibility modelling was investigated through a proposed ROC-based sensitivity analysis. Validating flood susceptibility models is a further challenge. In the present study, we proposed an index of flood vulnerability (IFV) model to validate the flood susceptibility models alongside conventional statistical techniques such as the ROC curve. Results showed that, among the coarse-resolution models, the MLP model performed best (area under curve: 0.94) and predicted 11.65 % of the area as very high flood susceptible zones (FSz), followed by RF, B-MLP, B-GP, and B-SMOreg. Similarly, the high-resolution MLP model predicted 19.34 % of the area as very high flood susceptible zones, followed by RF (14.32 %), B-MLP (14.88 %), B-GP, and B-SMOreg.
On the other hand, the ROC-based sensitivity analysis showed that elevation has the largest influence on flood susceptibility in both the coarse- and high-resolution models, followed by drainage density and flow accumulation. In addition, the accuracy assessment using the IFV model revealed that the MLP model outperformed all other models in the case of a high-resolution image. The performance of the coarser-resolution image is acceptable but quite low. The study therefore recommended the use of high-resolution images for developing machine learning based flood susceptibility models. Because the study clearly identified the areas of higher flood susceptibility and the dominant factors influencing flooding, its results could serve as a useful database for flood management.
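The ROC-based validation reported here rests on the area under the curve (AUC), which can be read as the probability that a randomly chosen flooded location receives a higher susceptibility score than a randomly chosen non-flooded one. The sketch below computes that quantity directly from pairwise comparisons, together with sensitivity and specificity at a fixed cut-off. The labels, scores and the 0.5 threshold are invented for illustration and are not taken from the cited study.

```python
# Minimal sketch: AUC as a pairwise ranking probability, plus
# sensitivity/specificity at a threshold. Data below are made up.

def roc_auc(labels, scores):
    """Fraction of (flooded, non-flooded) pairs ranked correctly; ties count half."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """True-positive rate and true-negative rate at the given cut-off."""
    tp = sum(1 for lab, s in zip(labels, scores) if lab == 1 and s >= threshold)
    fn = sum(1 for lab, s in zip(labels, scores) if lab == 1 and s < threshold)
    tn = sum(1 for lab, s in zip(labels, scores) if lab == 0 and s < threshold)
    fp = sum(1 for lab, s in zip(labels, scores) if lab == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# 1 = flooded, 0 = not flooded; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
sensitivity, specificity = sensitivity_specificity(labels, scores)
```

The pairwise form is quadratic in the number of samples, so it suits small validation sets. For raster-scale data a rank-based or library implementation would be the more practical choice.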
Article
Recently, there has been a notable tendency towards employing ensemble learning methodologies for simulation and prediction in various areas of engineering, including hydrology. The diversity of ensemble techniques available to the hydrological sciences has led to the development and use of different implementation strategies. This review paper explores the advancement of ensemble methods, including the resampling ensemble methods (e.g., bagging, boosting, and dagging), model averaging, and stacking (viz. generalized stacking), in different application fields of hydrology. The main hydrological topics covered in this review include surface hydrology, river water quality, rainfall-runoff, debris flow, river icing, sediment transport, groundwater, flooding, and drought modeling and forecasting. The general findings of this survey demonstrate the absolute superiority of ensemble strategies over regular (individual) model learning in hydrology. In addition, the boosting techniques (e.g., boosting, AdaBoost, and extreme gradient boosting) have been implemented more frequently and more successfully in hydrological problems than the bagging, stacking, and dagging approaches.
Article
Despite massive investments and continuous flood-control efforts in India, the socioeconomic damages and death toll continue to remain high. Undoubtedly, the process of flood management in India is very complex due to the influence of several socio-hydroclimatological factors, such as climate change, sea level rise, and socioeconomic dynamics. While these factors influence the intensity and frequency of flood events, factors explicitly related to the process of flood management, such as the improper execution of traditional structural measures, the lack of the proper implementation of schemes and the faulty end-to-end management of the flood management programs/practices, ensure only partial protection. This review article identifies the region-specific flood problems in India and discusses the initiatives undertaken by major Indian flood management agencies, with an emphasis on the current ongoing flood management practices. The effectiveness of these practices in the long term is discussed, and specific gaps are identified. The recommendations provided in this article may be useful to guide stakeholders and policymakers in formulating and implementing sustainable flood management plans for improved flood resilience.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
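The two ingredients named in this abstract, random feature selection at each split and internal (out-of-bag) error estimates, can be sketched in a few lines. The example below is a deliberately reduced illustration, assuming binary labels and single-split trees (stumps) rather than full trees. The feature layout (elevation, slope, an uninformative third column), the data values and all function names are hypothetical.

```python
import random

# Minimal sketch of a random forest of stumps with an out-of-bag error estimate.
# Simplified for illustration; real forests grow deeper trees.

def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) pair with the fewest training errors."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            left_class = int(2 * sum(left) >= len(left))
            # An empty right branch simply inherits the left prediction.
            right_class = int(2 * sum(right) >= len(right)) if right else left_class
            errors = (sum(lab != left_class for lab in left)
                      + sum(lab != right_class for lab in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_class, right_class)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_class, right_class = stump
    return left_class if row[feature] <= threshold else right_class


def fit_forest(rows, labels, n_trees=25, n_candidate_features=2, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        # Random subset of features considered for this tree's split.
        features = rng.sample(range(len(rows[0])), n_candidate_features)
        stump = fit_stump([rows[i] for i in sample],
                          [labels[i] for i in sample], features)
        out_of_bag = set(range(n)) - set(sample)        # rows this tree never saw
        forest.append((stump, out_of_bag))
    return forest


def predict_forest(forest, row):
    votes = sum(predict_stump(stump, row) for stump, _ in forest)
    return int(2 * votes >= len(forest))                # majority vote


def oob_error(forest, rows, labels):
    """Error rate using, for each row, only the trees that did not train on it."""
    wrong = counted = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        votes = [predict_stump(stump, row) for stump, oob in forest if i in oob]
        if not votes:
            continue
        counted += 1
        if int(2 * sum(votes) >= len(votes)) != label:
            wrong += 1
    return wrong / counted if counted else float("nan")


# Toy data (made up): [elevation, slope, uninformative value]; 1 = flooded.
rows = [[201.0, 0.5, 7.0], [203.0, 0.8, 2.0], [205.0, 1.0, 9.0],
        [207.0, 1.2, 4.0], [209.0, 1.5, 1.0], [211.0, 1.8, 6.0],
        [230.0, 4.0, 3.0], [233.0, 4.5, 8.0], [236.0, 5.0, 5.0],
        [239.0, 5.5, 0.0], [242.0, 6.0, 10.0], [245.0, 6.5, 11.0]]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

forest = fit_forest(rows, labels)
predictions = [predict_forest(forest, row) for row in rows]
error_estimate = oob_error(forest, rows, labels)
```

Because each tree is evaluated only on rows left out of its bootstrap sample, `error_estimate` approximates generalization error without a separate validation set, which is the role the abstract assigns to the internal estimates.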
Article
The articles published by the Annals of Eugenics (1925–1954) have been made available online as an historical archive intended for scholarly use. The work of eugenicists was often pervaded by prejudice against racial, ethnic and disabled groups. The online publication of this material for scholarly research purposes is not an endorsement of those views nor a promotion of eugenics in any way.
Maharashtra: Over 53,000 evacuated from rain, flood-hit Vidarbha region
  • India Today
Heavy rains claim 25 lives in Chandrapur this monsoon
  • Hindustan Times
Impact of extreme weather events in relation to floods over Maharashtra in recent years
  • D M Rase
  • P S Narayanan
  • K N Mohan
Artificial neural networks for flood susceptibility mapping in data-scarce urban areas. Spatial modeling in GIS and R for earth and environmental sciences
  • F Falah
  • O Rahmati
  • M Rostami