Evaluation of Geospatial Interpolation Techniques for Enhancing Spatiotemporal Rainfall Distribution and Filling Data Gaps in Asir Region, Saudi Arabia

MDPI
Sustainability
Citation: Helmi, A.M.; Elgamal, M.; Farouk, M.I.; Abdelhamed, M.S.; Essawy, B.T. Evaluation of Geospatial Interpolation Techniques for Enhancing Spatiotemporal Rainfall Distribution and Filling Data Gaps in Asir Region, Saudi Arabia. Sustainability 2023, 15, 14028. https://doi.org/10.3390/su151814028
Academic Editors: Vahid Nourani and Aida H. Baghanam
Received: 23 August 2023
Revised: 17 September 2023
Accepted: 20 September 2023
Published: 21 September 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
sustainability
Article
Evaluation of Geospatial Interpolation Techniques for Enhancing Spatiotemporal Rainfall Distribution and Filling Data Gaps in Asir Region, Saudi Arabia
Ahmed M. Helmi 1,*, Mohamed Elgamal 2, Mohamed I. Farouk 2,3, Mohamed S. Abdelhamed 4 and Bakinam T. Essawy 5
1 Irrigation and Hydraulics Department, Faculty of Engineering, Cairo University, Giza 12613, Egypt
2 Civil Engineering Department, College of Engineering, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia; mhelgamal@imamu.edu.sa (M.E.); miradi@imamu.edu.sa (M.I.F.)
3 Irrigation and Hydraulics Department, Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt
4 Global Institute for Water Security, University of Saskatchewan, Saskatoon, SK S7N 3H5, Canada; mohamed.abdelhamed@usask.ca
5 Civil and Infrastructure Engineering and Management Department, Nile University, Sheikh Zayed City 12588, Egypt; btaessawy@nu.edu.eg
* Correspondence: ahmed.helmi@eng.cu.edu.eg
Abstract:
Providing an accurate spatiotemporal distribution of rainfall and filling data gaps are
pivotal for effective water resource management. This study focuses on the Asir region in the
southwest of Saudi Arabia. Given the limited accuracy of satellite data in this arid/mountain-
dominated study area, geospatial interpolation has emerged as a viable alternative approach for
filling terrestrial records data gaps. Furthermore, the irregularity in rain gauge data and the yearly
spatial variation in data gaps hinder the creation of a coherent distribution pattern. To address this, the
Centered Root Mean Square Error (CRMSE) is employed as a criterion to select the most appropriate
geospatial interpolation technique among 51 evaluated methods for maximum and total yearly
precipitation data. This study produced gap-free maps of total and maximum yearly precipitation
from 1966 to 2013. Beyond 2013, it is recommended to utilize ordinary Kriging with a J-Bessel
semivariogram and simple Kriging with a K-Bessel semivariogram to estimate the spatial distribution
of maximum and total yearly rainfall depth, respectively. Additionally, a proposed methodology for
allocating additional rain gauges to improve the accuracy of rainfall spatial distribution is introduced
based on a cross-validation error (CVE) assessment. Newly proposed gauges in the study area
resulted in a significant 21% CVE reduction.
Keywords: geospatial interpolation; data gaps; kriging; optimum rainfall locations; arid region
1. Introduction
Arid lands cover 41% of the Earth’s surface and sustain one-third of the global population, half the world’s livestock, and around 44% of its food production [1]. These regions, characterized by low rainfall and high evaporation rates, have witnessed heightened water demand in recent decades, intensifying pressure on freshwater resources and amplifying water scarcity concerns [2,3]. The irregular occurrence of droughts and floods in these areas presents substantial challenges to water resource management [4]. Moreover, optimizing irrigation practices and enhancing crop yields, especially in water-scarce arid regions, relies on a comprehensive understanding of rainfall distribution [5]. Analyzing these patterns facilitates the identification of drought- and flood-prone zones, enabling proactive mitigation strategies. Predicting rainfall in arid regions poses challenges due to intricate interactions among meteorological factors such as air temperature, humidity, and wind, further compounded by the scarcity of comprehensive meteorological networks, which hinders precise assessment efforts [6–9]. Navigating the sporadic and unpredictable nature of rainfall patterns necessitates the inclusion of additional environmental determinants, such as the topography and elevation of different locations [10,11]. Hence, accurate rainfall measurement and prediction are vital for effective water management in such strategic regions.
The emergence of spatial data and distributed models, coupled with advanced computing capabilities, has collectively led to the integration of these models into scientific research due to their ability to simulate intricate flow regimes within complex study domains [12–14]. Factors including spatial rain gauge distribution, elevation, and historical data length contribute to the precision of spatial rainfall estimation at targeted sites. These factors affect the simulation of various hydrological processes like runoff, recharge, soil moisture, and evaporation [4,15,16]. Therefore, ensuring accurate gridded rainfall data is crucial to meet the demands of these models.
Ground truth rainfall records provide valuable insights for calibration, identifying trends and patterns, analyzing the impact of climate changes, and correcting biases in current climatic models and their future projections [17,18]. The number of rain gauges required to obtain accurate rainfall estimates depends on the size of the area under consideration and the desired level of accuracy [6]. Inadequate gauge numbers can result in underestimation or overestimation of rainfall amounts, leading to erroneous conclusions about the water balance within the studied region [19,20]. Challenges are also apparent in data accessibility and limitations, such as brief availability or data gaps, while data reliability may be compromised due to measurement errors or inhomogeneities and potential unavailability [19]. Even with a well-distributed gauge network, some dysfunction during storm events exacerbates the challenges. One potential issue when examining rainfall data involves missing records due to various factors, including discontinued measurements and damaged/displaced rainfall gauges following flood events, among others. These data gaps can yield false trends or spatial correlations that inaccurately represent natural variations [21,22]. Researchers employ advanced modeling techniques and satellite data to address these challenges and enhance the precision of rainfall prediction in arid regions. These techniques facilitate a comprehensive understanding of the factors influencing rainfall patterns and provide details about spatial distribution [12,14,20,21]. From a hydrological standpoint, assessing the reliability of existing ground truth rain gauges in estimating rainfall data in areas lacking gauge information is pivotal.
The ground rain gauge station network in arid regions exhibits a dual nature, characterized by coarse spacing adhering to World Meteorological Organization (WMO) guidelines for minimum density and historical rainfall data often containing gaps that need filling before use [23]. Several techniques are detailed in the literature to address these gaps, including using satellite data [24–27], implementing geostatistical interpolation techniques between terrestrial rain gauges to conduct regional rainfall analyses from available data measurements as detailed in Table 1, or using machine learning and statistical analysis [28]. Although satellite-based rainfall estimates can provide valuable information, their limitations in accuracy and spatial resolution emphasize the necessity of ground truth rain gauges and appropriate interpolation methods to improve the accuracy of rainfall estimates. For instance, satellite data have not yet provided a satisfying accuracy for total and maximum yearly precipitation information in the Arabian Peninsula, making geostatistical interpolation between terrestrial rain gauges crucial for obtaining reliable rainfall data [29]. Therefore, it is essential to have an adequate density of rain gauges over the area under consideration to obtain accurate rainfall estimates.
Spatial interpolation techniques are widely utilized to enhance the spatial resolution of data through the estimation of values in areas without sampling, applied in geosciences, water resources, environmental sciences, agriculture, and civil engineering [30,31]. However, their accuracy is influenced by factors such as sampling density, sample spatial distribution, sample clustering, surface type, data variance, data normality, quality of secondary information, stratification, and grid size or resolution [32,33], sometimes resulting in underestimations or overestimations. To enhance accuracy, ground truth gauges are used for calibration in these techniques [18]. The selection of an interpolation method, whether deterministic or geostatistical, depends on the trade-off between accuracy and computational efficiency. Deterministic techniques are more straightforward and faster but do not account for the uncertainty associated with the estimation process [34], while geostatistical techniques provide a comprehensive view of the uncertainty associated with the estimation process but are computationally more intensive [35,36]. As a result, several spatial interpolation techniques have been developed that are appropriate for the rapid estimation process [20,37–41].
To evaluate the efficiency of spatial interpolation techniques, it is imperative to consider the following key factors: (1) the design of the sampling process; (2) the mean and coefficient of variation of the primary variable for either the estimation dataset or validation dataset; (3) the sample size for both the estimation and validation datasets; (4) the geographical extent of the study area; and (5) the use of appropriate accuracy measurements [31]. Studies have analyzed spatial interpolation techniques (see Table 1); however, comprehensive conclusions remain elusive. Despite this, the three most frequently compared techniques are Ordinary Kriging (OK), Inverse Distance Weighting (IDW), and inverse distance squared (IDS), while different types of Kriging geostatistical techniques have also been evaluated: Ordinary Kriging, Cokriging, and Empirical Bayesian Kriging.
Table 1. Sample of available comparative studies in the literature.
Interpolation Method | Validation Method | Recommended Method | Ref.
Thiessen Polygon (TB), Inverse Distance Weighting (IDW), Linear Regression (LG), Kriging with External Drift (KED), Ordinary Kriging (OK) | Root Mean Squared Error (RMSE) | Ordinary Kriging (OK) | [42]
Inverse Distance Weighting (IDW), Local Polynomial Interpolation (LPI), Global Polynomial Interpolation (GPI), Simple Kriging (SK), Universal Kriging (UK), Ordinary Kriging (OK), Radial Basis Function (RBF) | Mean Error (ME), Root Mean Squared Error (RMSE) | Ordinary Kriging (OK) | [43]
Natural Neighbor Interpolation (NNI), Ordinary Kriging (OK), Cokriging (CK) | Root Mean Squared Error (RMSE) | Cokriging (CK) | [44]
Kriging with External Drift (KED), Optimal Interpolation Method (OIM), Thiessen Polygons (TB) | Root Mean Squared Error (RMSE) | Optimal Interpolation Method (OIM) | [45]
Inverse Distance Weighting (IDW), Radial Basis Function (RBF), Diffusion Interpolation with Barrier (DIB), Kernel Interpolation with Barrier (KIB), Ordinary Kriging (OK), Empirical Bayesian Kriging (EBK) | Leave-One-Out Cross-Validation (LOOCV), Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Symmetric Mean Absolute Percentage Error (SMAPE), Nash–Sutcliffe Efficiency Coefficient (NSE) | Kernel Interpolation with Barrier (KIB) | [46]
Inverse Distance Weighting (IDW), Radial Basis Function (RBF), Local Polynomial Interpolation (LPI), Global Polynomial Interpolation (GPI), Simple Kriging (SK), Universal Kriging (UK), Ordinary Kriging (OK), Empirical Bayesian Kriging (EBK), Empirical Bayesian Kriging Regression Prediction (EBKRP) | Mean Error (ME), Root Mean Square Error (RMSE), Pearson R2 (R2), Mean Standardized Error (MSE), Root Mean Square Standardized Error (RMSSE), Average Standard Error (ASE) | Empirical Bayesian Kriging Regression Prediction (EBKRP) | [47]
Inverse Distance Weighting (IDW), Kriging, ANUDEM, Spline | Mean Absolute Error (MAE), Mean Relative Error (MRE), Root Mean Squared Error (RMSE), Spatial and Temporal Distributions | Inverse Distance Weighting (IDW) | [48]
Inverse Distance Weighting (IDW), Natural Neighbor (NN), Regularized Spline (RS), Tension Spline (TS), Ordinary Kriging (OK), Universal Kriging (UK) | Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Bias Error (MBE), Coefficient of Determination (R2) | Ordinary Kriging (OK) | [49]
Inverse Distance Weighting (IDW), Ordinary Kriging (OK), Ordinary Cokriging (OCK), Linear Regression (LR), Simple Kriging with varying Local Means (SKLM), Kriging with an External Drift (KED) | Mean Error (ME), Root Mean-Square Error (RMSE) | Ordinary Cokriging (OCK) | [50]
Circular Ordinary Kriging (COK), Spherical Ordinary Kriging (SOK), Exponential Ordinary Kriging (EOK), Gaussian Ordinary Kriging (GOK), Empirical Bayesian Kriging (EBK) | Mean Error (ME), Mean Standardized Error (MSDE), Root Mean Square Standardized Error (RMSSDE), Mean Standard Error (MSE), Root Mean Square Error (RMSE) | Exponential Ordinary Kriging (EOK), Empirical Bayesian Kriging (EBK) | [51]
Saudi Arabia ranks among the arid countries confronting challenges in accurately predicting rainfall patterns [4]. Therefore, this study aims to rectify the lack of rainfall data in an arid region, the Asir area in Saudi Arabia, to provide a reliable and accurate spatial distribution of rainfall. The objectives of this study encompass:
1. Assessing various spatial interpolation techniques to ascertain the optimal approach for accurate rainfall prediction across diverse arid regions.
2. Analyzing the sufficiency of rainfall station distribution and pinpointing optimal sites for installing new rain gauges within the study area.
3. Providing an illustrative example elucidating the practical utilization of study outcomes in filling data gaps at any location and time within the study area for end-users.
In this study, we collect rain gauge data for the Asir Province; however, some gauges lack rainfall records for certain years. As such, we first apply various spatial interpolation techniques to fill in the gaps of the missing data for these gauges. Subsequently, we review the distribution of rain gauge stations to determine their sufficiency and whether more meteorological stations are necessary at specific spots. Ultimately, we suggest the optimal locations for installing additional meteorological stations to improve data collection. The remainder of the article is organized as follows: Section 2 outlines the study area and its characteristics. Section 3 presents the materials and methods. Section 4 reports the main results and discussion of the findings. Section 5 demonstrates a use case to help end-users apply the findings in practical situations, and the article ends with a summary and conclusions in Section 6.
2. The Study Area
The Kingdom of Saudi Arabia (KSA) is the world’s 13th largest country, with an area of about 2,150,000 square kilometers [52,53]. The KSA extends between coordinates 33.75:56.25 E and 16.5:32.5 N, WGS84, and covers roughly 80% of the Arabian Peninsula [54]. The KSA overlooks both the Arabian Gulf and the Red Sea, boasting 550 and 2250 km of shoreline, respectively. These water bodies serve as the kingdom’s primary sources of water vapor [55]. The KSA has a complex topography that can be classified into four categories: (1) coastal plains, (2) central and northern plateaus, (3) the central Tuwayq mountains, and (4) the western Asir mountains. Topographical elevations ascend steeply from the Red Sea shoreline towards the southwestern Asir mountains, reaching the kingdom’s highest value at 2990 m above mean sea level [56]. The location and topography of the KSA are shown in Figure 1.
Figure 1. The KSA’s location and topography.
The KSA is predominantly categorized into arid and semi-arid regions [57,58]. Given its prevailing arid climate and limited water resources, only five percent of cultivable land is utilized [59]. Notably, the majority of rainfall events occur between October and April [60]. Figure 2 shows the spatial distribution of the average annual rainfall over the KSA, obtained from available rain gauge records [29]. The southwestern KSA region has the highest potential of harnessing rainfall as a water resource [60]. This region’s mountainous terrain and complex topography generate orographic (convectional) rains, making it an ideal location for water resource exploitation [57,61,62]. Aligned with the World Bank’s recommendations for addressing physical water resource scarcity within the MENA region [63], the area highlighted in Figure 2 was selected as the study area for this study.
Figure 2. Average annual rainfall over the KSA, adopted from Helmi and Abdelhamed (2023) [29].
3. Materials and Methods
3.1. Rain Gauges
Assessing the adequacy of the number and distribution of rain gauges is the backbone of accurate quantification of rainfall volume. This evaluation is particularly important in mountainous terrain, owing to the pronounced spatial disparities in rainfall patterns. The available terrestrial rainfall data used in the current study were collected from official KSA authorities: the Ministry of Environment of the KSA (ME), the Ministry of Water and Agriculture of the KSA (MEWA), and the Presidency of Meteorology and Environment of the KSA (PME). Different types of rainfall gauges are available in the KSA: (1) meteorological (recording) gauges, (2) daily rainfall gauges, (3) recording rainfall gauges, and (4) totalizing rainfall gauges [64]. Within the geographical scope of this study, data from a total of 128 rain gauges were acquired, as shown in Figure 3A. Notably, the available records obtained from totalizing gauges exhibit non-uniform temporal intervals, which can restrict their effectiveness in providing accurate estimates for interpreting/analyzing daily or yearly records. Five totalizing gauges are located in the study area, as highlighted in Figure 3B. Because of this limitation, they were excluded, and the remaining 123 gauges were selected for the current study, as shown in Figure 3C. The study area has significant variability in altitude among its rain gauges, from 0 to 2603 m above mean sea level, as illustrated in Figure 3D. The properties of the selected rain gauges are summarized in Table 2, including the number of records available between the first and last available dates, as well as any missing yearly records (data gaps) during that period. Inconsistency can be noted in the available rain gauge data due to the randomness in rainfall patterns and the spatial variability in the missing data locations in the available time frame. Figure 4 shows the temporal variation in the 123 selected gauges. To perform a geostatistical analysis in the study area, a threshold of at least 50 rain gauge data points was chosen. This threshold was met, allowing for a 48-year analysis period from 1966 to 2013. However, there are gaps in the rainfall records that need to be predicted with an appropriate interpolation method.
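The gauge-count threshold described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the threshold is applied per year to the number of gauges with a record, and that the analysis period is the longest run of consecutive years meeting it. The function name `longest_period_meeting_threshold` and the sample counts are hypothetical and are not taken from the study's data.

```python
def longest_period_meeting_threshold(gauge_counts, threshold=50):
    """Return (first_year, last_year) of the longest run of consecutive
    years whose gauge count is at least `threshold`, or None if no year
    qualifies. `gauge_counts` maps year -> number of gauges with a record."""
    best = None
    run_start = None
    previous_year = None
    for year in sorted(gauge_counts):
        meets = gauge_counts[year] >= threshold
        if meets and run_start is not None and previous_year == year - 1:
            pass  # the current run simply continues
        elif meets:
            run_start = year  # a new run starts here
        else:
            run_start = None  # the run is broken
        if meets and (best is None or year - run_start > best[1] - best[0]):
            best = (run_start, year)
        previous_year = year
    return best


# Hypothetical yearly counts of gauges with available records
sample_counts = {1964: 30, 1965: 45, 1966: 52, 1967: 60,
                 1968: 55, 1969: 49, 1970: 70}
print(longest_period_meeting_threshold(sample_counts))
```

With real data, the counts would come from the per-gauge availability summarized in Figure 4.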
Figure 3. (A) The collected 128 rainfall gauges’ locations, (B) types of available rainfall gauges, (C) the selected rainfall gauges for the current study, and (D) elevations of the selected rain gauges.
Table 2. Characteristics of the selected rain gauges.
Columns (two gauges per row): Gauge | Long (WGS 84) | Lat (WGS 84) | Altitude (m) | Mean Max. Rainfall (mm) | Mean Total Rainfall (mm) | Gauge | Long (WGS 84) | Lat (WGS 84) | Altitude (m) | Mean Max. Rainfall (mm) | Mean Total Rainfall (mm)
A004 43.10 18.17 2280 31.9 97.9 J141 39.92 22.50 661 49.7 96.4
A005 42.48 18.20 2186 57.3 282.8 J204 40.20 21.35 685 39.7 154.0
A006 42.60 18.25 2121 40.3 238.5 J205 40.22 21.35 750 46.3 171.0
A007 42.15 19.10 2376 61.8 344.4 J214 39.98 21.98 641 33.5 97.0
A103 42.78 18.10 2224 39.5 237.9 J219 39.43 22.20 192 22.2 38.4
A104 43.37 17.93 2281 32.1 127.2 J220 39.82 22.37 473 23.4 62.2
A105 43.18 18.23 2206 23.9 70.1 J221 39.35 21.92 88 27.5 45.0
A106 42.48 18.27 2356 38.5 253.2 J239 39.68 21.97 269 23.9 64.1
A107 42.57 18.60 2016 31.9 109.8 N001 44.23 17.55 1278 19.8 44.5
A108 42.38 18.52 2516 26.4 85.0 N103 43.63 17.68 2191 34.2 145.9
A110 42.98 18.68 1802 33.9 98.3 N203 43.62 17.67 2036 26.9 94.3
A112 42.57 18.37 2096 36.2 112.4 SA001 42.95 17.05 169 36.4 203.0
A113 42.68 18.63 1840 17.5 69.3 SA002 42.62 17.17 32 28.2 80.2
A117 42.27 18.62 2489 38.2 181.7 SA003 41.88 19.00 355 37.5 202.7
A118 42.37 18.25 2197 56.7 333.7 SA004 41.40 18.73 41 27.4 61.2
A120 42.17 18.88 2308 56.0 213.9 SA005 41.88 19.56 2263 27.2 172.5
A121 42.75 18.03 2269 44.5 292.7 SA101 42.83 16.97 68 41.6 173.0
A123 42.87 18.32 2039 34.3 128.9 SA102 42.23 17.70 51 30.1 65.2
A124 42.33 18.42 2577 41.0 185.0 SA104 43.08 17.04 574 50.3 403.9
A126 43.22 18.53 1850 12.9 26.5 SA105 41.97 18.93 458 48.6 364.1
A127 42.25 18.78 2531 48.3 247.3 SA106 42.53 17.37 73 37.2 137.5
A128 42.70 18.47 1927 16.5 60.2 SA107 42.78 17.12 78 44.6 170.0
A130 42.32 18.33 2470 36.1 216.5 SA108 41.92 18.33 533 41.8 255.8
A201 42.52 18.42 2083 25.9 103.0 SA110 43.13 17.27 1210 47.1 401.0
A206 42.25 18.68 2603 39.1 206.6 SA111 43.12 17.05 574 54.5 444.1
A213 42.83 18.17 2114 30.4 132.0 SA113 42.03 18.53 455 41.3 298.1
B001 41.29 20.18 1932 58.9 294.0 SA115 41.67 18.00 0 28.1 54.6
B004 42.60 20.02 1155 21.3 71.8 SA116 42.20 18.25 1421 49.0 321.7
B005 42.53 19.87 1202 18.6 73.6 SA120 41.83 19.43 611 46.2 243.5
B006 43.52 19.53 1090 21.8 48.9 SA122 41.83 19.12 377 31.1 208.6
B007 41.57 19.87 2047 52.6 269.3 SA125 42.45 17.13 7 25.8 72.5
B008 42.67 20.08 1139 24.3 80.1 SA126 42.88 17.45 559 45.9 502.3
B009 41.90 19.53 2279 46.0 272.6 SA129 42.90 17.17 163 49.9 313.7
B101 41.58 19.90 2040 50.8 224.9 SA132 42.78 17.02 61 29.5 127.0
B103 41.65 20.25 1571 20.0 51.5 SA135 43.23 16.80 287 43.7 330.6
B110 42.88 18.80 1742 22.4 73.2 SA136 43.13 16.68 159 45.6 251.7
B111 42.85 21.25 922 18.4 48.8 SA137 42.95 16.60 61 29.5 116.3
B114 42.23 20.02 1286 25.5 98.2 SA138 42.02 18.63 393 39.7 267.1
B208 42.73 19.02 1717 20.3 69.6 SA139 42.03 19.05 900 26.2 293.3
B216 41.98 19.47 2239 42.9 234.8 SA140 43.03 17.32 688 49.7 448.6
B217 41.93 19.75 1756 43.0 170.4 SA142 41.58 18.77 104 32.3 101.9
B219 42.80 19.33 1475 18.4 53.1 SA143 43.13 16.90 259 46.3 420.7
B220 42.04 19.20 1571 36.0 142.6 SA144 42.25 18.17 1013 46.9 366.6
J001 41.05 19.53 53 36.8 77.3 SA145 42.80 17.62 699 46.0 306.8
J002 39.34 22.16 72 18.3 33.3 SA147 41.47 19.03 115 30.9 66.3
J102 39.70 21.43 355 27.8 50.3 SA148 42.53 16.92 0 35.5 79.0
J106 39.33 22.15 66 18.8 35.8 SA204 42.60 17.57 188 43.3 160.2
J107 40.45 20.32 95 34.1 63.6 TA002 40.50 21.30 1590 33.0 148.2
J108 40.28 20.15 7 32.8 59.8 TA004 40.45 21.40 1553 28.6 119.4
J111 38.83 23.10 12 17.4 33.7 TA005 41.67 21.18 1148 14.0 49.4
J113 40.12 21.37 455 47.8 146.0 TA006 41.28 20.62 1389 52.6 117.9
J114 39.82 21.43 298 37.9 78.6 TA007 41.47 19.98 2256 48.6 194.0
J116 39.63 22.58 394 28.2 63.3 TA104 40.80 21.32 1394 31.4 97.3
J121 41.05 20.23 338 40.9 162.1 TA106 40.32 21.33 1888 44.8 185.5
J124 41.28 19.95 586 30.0 114.4 TA109 40.37 21.07 2145 45.4 269.6
J126 41.43 19.77 353 37.3 236.8 TA125 40.42 21.26 1713 29.7 51.7
J127 41.53 19.67 657 52.7 184.0 TA206 40.40 21.28 1675 35.7 165.8
J131 41.60 19.47 474 41.9 182.6 TA233 40.65 21.13 1691 48.5 205.6
J134 39.20 21.50 15 28.7 51.4 TA250 40.45 21.67 1241 22.7 91.2
J137 41.33 19.97 632 36.4 222.3 TA251 40.37 21.37 1822 30.1 120.1
J139 41.03 19.73 93 38.6 63.9 TA255 40.36 21.24 1730 34.1 118.5
J140 39.03 22.82 10 21.1 39.4
Figure 4. Temporal variation in available rain gauge data in the study area.
3.2. Interpolation Techniques
The characteristics of available datasets influence the spatial interpolation outcomes, so selecting an accurate interpolation method is a crucial part of generating accurate rainfall distribution maps. Until recently, most of the studies in this field provided a limited comparison of interpolation techniques, relying on statistical errors to select the best method. The choice of interpolation method depends on the nature of the data, the size of the study area, and the desired level of accuracy. Each method has its pros and cons, and it is important to choose the most appropriate one for a specific application [65].
The selected spatial interpolation techniques were systematically analyzed to determine their accuracy in estimating unknown rainfall values and predicting the spatial rainfall distribution pattern. In the current study, the authors chose to evaluate two different interpolation categories, i.e., deterministic and geostatistical. Nearly all commonly known interpolation techniques and various models within these techniques were analyzed to determine the most suitable method for interpolating rainfall data in the study area.
Four interpolation techniques were evaluated under the deterministic technique umbrella: Inverse Distance Weighting (IDW), Global Polynomial Interpolation (GPI), Radial Basis Functions (RBF), and Local Polynomial Interpolation (LPI). Additionally, three interpolation techniques under the geostatistical technique umbrella were evaluated, namely Kriging, Cokriging, and Empirical Bayesian Kriging (EBK), with seven different semivariogram models: Circular, Spherical, Exponential, Gaussian, K-Bessel, J-Bessel, and Stable. The interpolation techniques’ mathematical equations are given in Appendix A.
In total, 51 interpolation techniques (43 geostatistical and 8 deterministic) were selected to be assessed in the current study to determine the most suitable interpolation technique for the data available, as shown in Table 3.
Table 3. Codes of the 51 interpolation techniques used in the current study.
Geostatistical Interpolation Techniques
Ordinary Kriging—Circular Variogram (KOC) Ordinary Cokriging—Circular Variogram (CKOC)
Ordinary Kriging—Spherical Variogram (KOS) Ordinary Cokriging—Spherical Variogram (CKOS)
Ordinary Kriging—Exponential Variogram (KOE) Ordinary Cokriging—Exponential Variogram (CKOE)
Ordinary Kriging—Gaussian Variogram (KOG) Ordinary Cokriging—Gaussian Variogram (CKOG)
Ordinary Kriging—K-Bessel Variogram (KOK) Ordinary Cokriging—K-Bessel Variogram (CKOK)
Ordinary Kriging—J-Bessel Variogram (KOJ) Ordinary Cokriging—J-Bessel Variogram (CKOJ)
Ordinary Kriging—Stable Variogram (KOT) Ordinary Cokriging—Stable Variogram (CKOT)
Simple Kriging—Circular Variogram (KSC) Simple Cokriging—Circular Variogram (CKSC)
Simple Kriging—Spherical Variogram (KSS) Simple Cokriging—Spherical Variogram (CKSS)
Simple Kriging—Exponential Variogram (KSE) Simple Cokriging—Exponential Variogram (CKSE)
Simple Kriging—Gaussian Variogram (KSG) Simple Cokriging—Gaussian Variogram (CKSG)
Simple Kriging—K-Bessel Variogram (KSK) Simple Cokriging—K-Bessel Variogram (CKSK)
Simple Kriging—J-Bessel Variogram (KSJ) Simple Cokriging—J-Bessel Variogram (CKSJ)
Simple Kriging—Stable Variogram (KST) Simple Cokriging—Stable Variogram (CKST)
Universal Kriging—Circular Variogram (KUC) Universal Cokriging—Circular Variogram (CKUC)
Universal Kriging—Spherical Variogram (KUS) Universal Cokriging—Spherical Variogram (CKUS)
Universal Kriging—Exponential Variogram (KUE) Universal Cokriging—Exponential Variogram (CKUE)
Universal Kriging—Gaussian Variogram (KUG) Universal Cokriging—Gaussian Variogram (CKUG)
Universal Kriging—K-Bessel Variogram (KUK) Universal Cokriging—K-Bessel Variogram (CKUK)
Universal Kriging—J-Bessel Variogram (KUJ) Universal Cokriging—J-Bessel Variogram (CKUJ)
Universal Kriging—Stable Variogram (KUT) Universal Cokriging—Stable Variogram (CKUT)
Empirical Bayesian Kriging (EBK)
Deterministic interpolation techniques
Global Polynomial Interpolation (GPI) Inverse Distance Weighting, P = 2 (IDWP2)
Local Polynomial Interpolation (LPI) Inverse Distance Weighting, P = 3 (IDWP3)
Radial Basis Function (RBF) Inverse Distance Weighting, P = 4 (IDWP4)
Inverse Distance Weighting, P = 1 (IDWP1) Inverse Distance Weighting, P = 5 (IDWP5)
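The coding scheme of Table 3 follows a regular pattern, which the short Python sketch below reproduces as a sanity check on the counts (43 geostatistical plus 8 deterministic). The function name `technique_codes` and the dictionary names are illustrative assumptions; only the codes themselves come from the table.

```python
from itertools import product

# Letter used for each Kriging type and each semivariogram model in Table 3
KRIGING_TYPES = {"O": "Ordinary", "S": "Simple", "U": "Universal"}
VARIOGRAMS = {"C": "Circular", "S": "Spherical", "E": "Exponential",
              "G": "Gaussian", "K": "K-Bessel", "J": "J-Bessel", "T": "Stable"}


def technique_codes():
    """Build the geostatistical and deterministic code lists of Table 3."""
    geostatistical = []
    for prefix in ("K", "CK"):  # K = Kriging, CK = Cokriging
        for kriging_type, variogram in product(KRIGING_TYPES, VARIOGRAMS):
            geostatistical.append(prefix + kriging_type + variogram)
    geostatistical.append("EBK")  # Empirical Bayesian Kriging
    deterministic = ["GPI", "LPI", "RBF"] + [f"IDWP{p}" for p in range(1, 6)]
    return geostatistical, deterministic


geostatistical, deterministic = technique_codes()
print(len(geostatistical), len(deterministic))
```

For example, KOJ reads as Kriging, Ordinary, J-Bessel, and KSK as Kriging, Simple, K-Bessel.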
3.2.1. Deterministic Techniques
The Inverse Distance Weighting (IDW) method estimates the value of the target point
based on the values of neighboring points, with more weight given to points closer to the
target point. The IDW method uses the Cartesian coordinates of the target station and
uses the inverse distance raised to a power P as a weighting factor for the adjacent points
values, as shown in Equation (1) [31,33,34,36,42]. According to this method, the influence
of a variable diminishes with the distance from its sampled location. Using a high power
in IDW emphasizes the nearest points and creates a more detailed surface with lower
smoothness [66]. Five IDW interpolation powers were tested in the current study, namely
IDWP1, IDWP2, IDWP3, IDWP4, and IDWP5, for P = 1, 2, 3, 4, and 5, respectively.
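$$z_p = \frac{\sum_{i=1}^{n} z_i / d_i^{p}}{\sum_{i=1}^{n} 1 / d_i^{p}} \tag{1}$$
where $z_i$ is the value at known point $i$, $d_i$ is the distance from the unknown point to known point $i$, $z_p$ is the estimated value at the unknown point, $n$ is the number of known points, and $p$ is the exponent power (1, 2, 3, 4, or 5) [67].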
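The weighting in Equation (1) can be illustrated with a minimal Python sketch. The function name `idw_predict`, the gauge coordinates, and the rainfall values are hypothetical and chosen only for illustration; the study itself used ArcGIS rather than this code.

```python
import numpy as np

def idw_predict(xy_known, z_known, xy_target, power=2.0):
    """Inverse-distance-weighted estimate at xy_target, following Equation (1)."""
    xy_known = np.asarray(xy_known, dtype=float)
    z_known = np.asarray(z_known, dtype=float)
    d = np.linalg.norm(xy_known - np.asarray(xy_target, dtype=float), axis=1)
    # A target that coincides with a gauge returns that gauge's value,
    # which also avoids a division by zero.
    hit = d == 0.0
    if hit.any():
        return float(z_known[hit][0])
    w = 1.0 / d**power
    return float(np.sum(w * z_known) / np.sum(w))

# Hypothetical gauges (coordinates in km, rainfall in mm).
gauges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rain = [100.0, 200.0, 300.0]

# Higher powers pull the estimate toward the nearest gauge, here (0, 0).
for p in (1, 2, 5):
    print(p, round(idw_predict(gauges, rain, (2.0, 2.0), power=p), 1))
```

Comparing the printed values for increasing powers shows the behavior described above: the estimate near the gauge at the origin moves toward that gauge's value as the power grows.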
Global Polynomial Interpolation (GPI) is a mathematical technique that can calculate
a polynomial function to pass through a set of given points. GPI is a highly accurate
tool in numerical analysis and can be used to estimate any continuous function. The key
advantage of GPI over other interpolation techniques is that it provides a single polynomial
function to approximate the entire curve instead of several piecewise functions. To perform
GPI, a set of points that lie on the curve to be estimated must be chosen, and a polynomial
function is then constructed to pass through these points. The degree of the polynomial
is determined by the number of points selected. Higher-degree polynomials can provide
more accurate approximations, but they may also be more prone to rounding errors and
other factors. GPI is a fast and global method of predicting rainfall based on the polynomial
function [66,68]. The second-degree polynomial is selected for the current study.
The Radial Basis Function (RBF), also known as the Spline method, uses artificial
neural networks (ANN) to accurately interpolate data. The RBF is a popular tool for
interpolating multi-dimensional scattered data and can be easily generalized to various
space dimensions, making it useful in natural resource management. In addition to acting
as a class of interpolation techniques for georeferenced data, the RBF is a deterministic
interpolator that utilizes the level of smoothing to determine the appropriate interpolation
technique [68,69].
Local Polynomial Interpolation (LPI) is a method for approximating a function by
fitting a polynomial to small subsets of the data. This technique involves constructing a
locally best-fitting polynomial to a set of points, rather than fitting a single polynomial
to all the points globally. LPI is particularly useful when dealing with large datasets or
highly variable functions. By considering the local behavior of the data, it allows for a
more accurate representation of the function. In addition, LPI can estimate derivatives and
integrals of the function at specific points [68].
3.2.2. Geostatistical Techniques
Geostatistics is a statistical technique that deals with spatially distributed variables. Its
original purpose was to predict the likelihood of finding gold ore in the Witwatersrand mine
in South Africa [70]. This technique employs a semivariogram, which describes the spatial
correlation between data points, to estimate values at locations where no samples exist.
The semivariogram, which plots the variance of the differences between pairs of points
against their spatial separation, is vital to geostatistical interpolation. It is used to model
the spatial dependence of the estimated variables and to interpolate values at unsampled
locations. The semivariogram provides information about the spatial correlation between
data points and is used to estimate the weights that are applied to the measured values of
the variable being estimated at unsampled locations [68].
Kriging is a widely used technique in the fields of geology, environmental science, and
engineering to estimate the value of a variable at an unobserved location based on nearby
observed locations. The technique relies on spatial autocorrelation, which refers to the
tendency of nearby locations to exhibit similar values of a given variable. Kriging utilizes this
principle to estimate the variable’s value at an unobserved location by analyzing the spatial
correlation between nearby observed locations. This involves fitting a mathematical model
to the observed data, which describes the spatial correlation between data points. The
resulting model is then used to estimate the variable’s value at the unobserved location by
taking a weighted average of the values of the variable at nearby observed locations, with
the weights assigned based on the spatial correlation between the locations [21,31,33,42].
Kriging offers several benefits over other interpolation techniques, including more accurate
estimates and a measure of uncertainty for each estimate. This can be particularly useful in
decision-making processes, where a clear understanding of the accuracy and reliability of
the estimates is critical. The semivariogram is a mathematical function used to describe
the spatial autocorrelation of a variable and is used to measure the degree to which the
values of the variable at two locations are similar as a function of the distance between
these locations.
Cokriging is a variation of Kriging that enables the inclusion of additional spatially
correlated variables in the estimation process. This method enhances the precision of
rainfall estimations by considering the connection between rainfall and other factors such
as elevation, temperature, and land use. It is particularly beneficial when the rainfall data
are limited or unreliable. The Cokriging method assesses the cross-covariance function
between two variables, describing their spatial correlation. It then uses this function to
determine the weights to be assigned to the measured values of the related variable to
predict the rainfall values at unsampled locations. Cokriging is a valuable tool for spatially
analyzing rainfall, improving our understanding and prediction of rainfall patterns [71,72].
Empirical Bayesian Kriging (EBK) is a geostatistical interpolation method used to
estimate rainfall distribution in spatial domains. It tackles some of the challenges of
building a valid Kriging model, such as the manual adjustment of parameters required
by other Kriging techniques. Instead, EBK automates the parameter calculation process
by combining subsetting and simulations [73–75]. EBK stands out from other Kriging
techniques due to its ability to account for errors arising from semivariogram estimation. It
employs a hierarchical Bayesian model that incorporates prior semivariogram knowledge,
enhancing the accuracy of interpolated values by reducing the estimation errors. This
flexible approach yields accurate estimates, especially with limited data. The EBK process
includes estimating a semivariogram model from available data, simulating new values at
input locations based on this model, and iteratively estimating new semivariogram models
from simulated data. This produces a spectrum of semivariograms, enabling prediction
and standard error computation for unsampled locations. In contrast to classical Kriging,
EBK optimizes parameters differently. It automatically refines parameters using diverse
semivariogram models, resulting in enhanced adaptability and estimate accuracy [73–75].
4. Results and Discussion
In ArcGIS, geostatistical analysis can be automated via template parameters for the
deterministic and geostatistical approaches that do not require semivariograms.
Seven types of semivariograms (i.e., Circular, Spherical, Exponential, Gaussian, K-Bessel,
J-Bessel, and Stable) were selected to be evaluated within Kriging and Cokriging techniques
and their sub-types (i.e., ordinary, simple, and universal). A semivariogram relates
the distance between rain gauge locations and the variance between rain gauge records.
Consequently, the semivariogram varies from year to year based on available records.
This year-to-year variation and the inability to automate the process necessitated the manual
generation of 4032 geostatistical parameters for Kriging and Cokriging through the 48-year
study period (2 (Kriging and Cokriging) × 3 (Kriging types) × 7 (semivariogram types) ×
48 (number of years) × 2 (total yearly and maximum daily rainfall)). For other interpolation
techniques, parameter values (e.g., the power of the IDW) were defined once and repeated
for each year of the study period.
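As a quick arithmetic check of the parameter count, the short sketch below multiplies the factors named in the text. The final factor of two, taken here to represent the two rainfall series (total yearly and maximum daily), is an assumption used to reconcile the product with the reported 4032.

```python
from math import prod

kriging_families = 2     # Kriging and Cokriging
kriging_types = 3        # ordinary, simple, universal
semivariograms = 7       # Circular, Spherical, Exponential, Gaussian, K-Bessel, J-Bessel, Stable
years = 48               # 1966 to 2013 inclusive
rainfall_variables = 2   # ASSUMPTION: total yearly and maximum daily series

per_variable = prod([kriging_families, kriging_types, semivariograms, years])
total = per_variable * rainfall_variables
print(per_variable, total)
```

Under this assumption, each rainfall series accounts for 2016 manually generated parameter sets, and the two series together account for 4032.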
Several quantitative statistical metrics are found in the literature, offering different
lenses for evaluating the accuracy of interpolation techniques. The Mean Absolute Error
(MAE: Equation (2)) measures the average absolute error between predicted ($P_i$) and measured
rainfall values ($M_i$). The Root Mean Square Error (RMSE: Equation (3)) calculates the standard
deviation of the errors. The Centered Root Mean Square Error (CRMSE: Equation (4))
disregards the mean values in the error evaluation. Taylor’s graphical plot (Taylor, 2001)
provides valuable information about the standard deviation, the correlation coefficient
(CC: Equation (5)), and CRMSE for ($M_i$) and ($P_i$).
In this study, the cross-validation CRMSE serves as the criterion for comparing the
prediction accuracy across different geostatistical approaches, guiding the selection of
the appropriate technique with the lowest CRMSE. This cross-validation procedure was
sequentially conducted across all the points used to generate a geostatistical layer. After
omitting each measured value from the dataset, the error at each point was determined by
the difference between the predicted and the measured values at the respective location.
Across the study period, distinct geostatistical approaches were assigned to each year,
addressing total yearly and maximum daily rainfall data. This variation in the appropriate
geostatistical approaches is due to the differences in the available rain gauges used to create
the dataset from one year to another. Figure 5 shows the Taylor diagram for total yearly
rainfall in the study area for the year 1998. Among the approaches, ordinary Cokriging with
an exponential variogram (CKOE) emerged with the least centered Root Mean Square Error
and thus was selected to generate the total yearly surface for the study area. Similarly, for
the year 1999, the spatial distribution of the total annual and maximum daily rainfall depths
over the study area based on the selected (CKSG and CKOC) interpolation techniques
is given in Figures 6 and 7, respectively. Table 4 shows the yearly selected interpolation
techniques for the maximum daily and total annual rainfall over the study area.
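The leave-one-out procedure and the selection by lowest CRMSE can be sketched as follows. This is an illustration only: the candidate set is limited to the five IDW powers as a stand-in for the 51 techniques evaluated in the study, the gauge data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def idw(xy, z, target, power):
    """Inverse-distance-weighted estimate at one target point."""
    d = np.linalg.norm(xy - target, axis=1)
    w = 1.0 / d**power
    return np.sum(w * z) / np.sum(w)

def loo_predictions(xy, z, power):
    """Predict each gauge from all the others (leave-one-out cross-validation)."""
    n = len(z)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # omit gauge i from the dataset
        preds[i] = idw(xy[keep], z[keep], xy[i], power)
    return preds

def crmse(measured, predicted):
    """Centered RMSE: both series have their means removed before comparison."""
    dm = measured - measured.mean()
    dp = predicted - predicted.mean()
    return float(np.sqrt(np.mean((dm - dp) ** 2)))

# Synthetic gauges: 30 random locations with a west-to-east rainfall gradient.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(30, 2))
z = 50 + 2.0 * xy[:, 0] + rng.normal(0, 10, size=30)

# Score every candidate and keep the one with the lowest cross-validation CRMSE.
scores = {f"IDWP{p}": crmse(z, loo_predictions(xy, z, p)) for p in range(1, 6)}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Repeating this selection for each year and each rainfall variable would yield a per-year table of chosen techniques in the manner of Table 4.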
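$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} \left| M_i - P_i \right|}{N} \tag{2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( M_i - P_i \right)^2} \tag{3}$$
$$\mathrm{CRMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ \left( M_i - \bar{M} \right) - \left( P_i - \bar{P} \right) \right]^2} = \sqrt{\sigma_M^2 + \sigma_P^2 - 2 \cdot \sigma_M \cdot \sigma_P \cdot \mathrm{CC}} \tag{4}$$
$$\mathrm{CC} = \frac{\sum_{i=1}^{N} \left( M_i - \bar{M} \right) \cdot \left( P_i - \bar{P} \right)}{\sqrt{\sum_{i=1}^{N} \left( M_i - \bar{M} \right)^2} \cdot \sqrt{\sum_{i=1}^{N} \left( P_i - \bar{P} \right)^2}} \tag{5}$$
where $\bar{M}$ and $\bar{P}$ are the means, and $\sigma_M$ and $\sigma_P$ the standard deviations, of the measured and predicted rainfall values, respectively, and $N$ is the number of cross-validated points.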
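The four metrics in Equations (2) to (5) can be written compactly in Python, which also makes it easy to confirm numerically that the two forms of the CRMSE in Equation (4) agree. The measured and predicted values below are invented for illustration, and the standard deviations are population values (divisor N), which is an assumption that matches the 1/N factor in the equations.

```python
import numpy as np

def mae(m, p):
    return float(np.mean(np.abs(m - p)))

def rmse(m, p):
    return float(np.sqrt(np.mean((m - p) ** 2)))

def crmse(m, p):
    return float(np.sqrt(np.mean(((m - m.mean()) - (p - p.mean())) ** 2)))

def cc(m, p):
    dm, dp = m - m.mean(), p - p.mean()
    return float(np.sum(dm * dp) / np.sqrt(np.sum(dm**2) * np.sum(dp**2)))

# Hypothetical measured (m) and cross-validation predicted (p) rainfall, in mm.
m = np.array([120.0, 80.0, 200.0, 150.0, 60.0])
p = np.array([110.0, 95.0, 180.0, 160.0, 75.0])

# Right-hand side of Equation (4), using population standard deviations (ddof=0).
sigma_m, sigma_p = m.std(), p.std()
taylor = float(np.sqrt(sigma_m**2 + sigma_p**2 - 2 * sigma_m * sigma_p * cc(m, p)))

print(mae(m, p), rmse(m, p), crmse(m, p), taylor)
```

Because the CRMSE removes the mean of each series, RMSE² equals CRMSE² plus the squared difference of the means, so the CRMSE ranks techniques by pattern error rather than by overall bias.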
Figure 5. Taylor diagram depicting total rainfall interpolation models for the year 1998.
Figure 6. Spatial distribution of total annual rainfall depth for the year 1999 (generated by CKSG geostatistical interpolation).
Figure 7. Spatial distribution of maximum daily rainfall for the year 1999 (generated by CKOC geostatistical interpolation).
Table 4. Yearly selected interpolation techniques for maximum daily and total annual rainfall over the study area.
Year Total Yearly Maximum Daily Year Total Yearly Maximum Daily
1966 KOJ CKOK 1990 CKUK CKOJ
1967 KUJ KUJ 1991 CKSJ KOJ
1968 KSJ CKUC 1992 CKOJ KSG
1969 LPI CKOJ 1993 CKUJ CKSK
1970 KSJ KUJ 1994 CKOS KOJ
1971 CKOC CKOC 1995 CKOC CKUC
1972 CKOC KOJ 1996 CKSE CKOG
1973 CKOJ KOJ 1997 CKUT CKOT
1974 KOJ CKOC 1998 CKUK CKOK
1975 CKOJ CKUJ 1999 CKSG CKOC
1976 CKOJ LPI 2000 KSG KOE
1977 CKOG CKUC 2001 KOT KSJ
1978 CKOJ KUJ 2002 CKSC CKOG
1979 CKOT CKOS 2003 KUJ CKSJ
1980 KOJ CKOJ 2004 KOJ CKOJ
1981 CKSG CKSJ 2005 CKSC GPI
1982 CKSG CKUJ 2006 KOG CKOK
1983 CKUC KOJ 2007 KOG IDWP1
1984 CKOC CKSK 2008 CKSJ CKSG
1985 CKOE CKOK 2009 CKSE KUJ
1986 CKOC KUJ 2010 KSG CKOC
1987 KOJ CKSK 2011 KSJ KST
1988 KUK CKOC 2012 CKOS CKUJ
1989 CKUE LPI 2013 CKOJ CKOC
The yearly spatial distribution for maximum and total precipitation over the study
area with gaps filled from 1966 to 2013 is given in Figure S1 through Figure S12 in the
Supplementary Materials. These rasters can be utilized to directly select any missing data
at a specific gauge within the study area boundaries and during the study period (as shown
in the use case section). The average values of the total and maximum yearly precipitation
were calculated based on the filled data from 1966 to 2013, as given in Figure 8. These
average values were utilized to select the optimal interpolation technique in light of the
limited availability of rainfall data beyond 2013 due to the limited rain gauge records in
this period. In such cases, the Ordinary Kriging with J-Bessel semivariogram (KOJ) and
Simple Kriging with K-Bessel (KSK) semivariogram techniques are recommended to fill
data gaps for maximum and total yearly precipitation, respectively, for the years after 2013.
The World Meteorological Organization (WMO) recommends certain densities of
rain gauge stations to be followed for different types of catchments. For mountainous
regions with irregular rainfall, a density of 250 km² per station is recommended for
daily-recording gauges [23]. The density of the rain gauges in the study area is far beyond
these recommended limits. Additionally, several publications have discussed different
techniques for assessing and optimizing the locations of rain gauge stations and the spatial
distribution of the network. These techniques include statistical covariance [76], the Kriging
interpolation method, and Kriging and entropy techniques for rainfall network design [77].
In this study, the cross-validation error (CVE) is used as a quality indicator to evaluate
the density of a given rain gauge network. The spatial distribution of the CVE for the
global proposed maximum daily and total yearly interpolation techniques (KOJ, KSK) is
displayed in Figures S13A and S14A in the Supplementary Materials, with the sum of both
errors shown in Figure 9A. The maximum combined CVE resulting from the sum of the
total rainfall and the maximum rainfall is greater than 50%. The highest CVE is observed in
the southeastern part of the study area, likely due to orographic effects. The average of the
combined CVE is 42%, as highlighted in Table 5.
Figure 8. The spatial distribution of the maximum and total yearly precipitation (mm).
Figure 9. Cross-validation error summation for total yearly and maximum daily rainfall for (A) existing rainfall stations, (B) with additional proposed stations (I), (C) with additional proposed stations (I) and (II), and (D) with additional proposed stations (I), (II), and (III).
Table 5. Study area areal average cross-validation error (%).
Station Network Maximum Rainfall Error (%) Total Rainfall Error (%) Error Summation (%)
Existing stations 14.60% 27.08% 41.68%
Existing stations + Proposed stations (I) 12.71% 22.34% 35.05%
Existing stations + Proposed stations (I) and (II) 11.65% 20.95% 32.61%
Existing stations + Proposed stations (I), (II), and (III) 11.51% 20.79% 32.31%
Three additional groups of rain gauges (I, II, and III) are proposed to reduce the CVE,
resulting in a total of 135, 148, and 153 stations, respectively, as shown in Figure 10. The
proposed locations of the additional rain gauges are identified based on three criteria:
(1) areas with the highest cross-validation error, (2) areas with a low rain gauge density,
and (3) placement at the centroid of the triangle formed by the three closest gauges, so
that the new station does not screen other stations. Figure 11 demonstrates the
proposed methodology for locating new stations.
Figure 10. Proposed additional stations.
Figure 11. Proposed methodology to select the location of additional stations.
Table 5 shows that adding new candidate stations generally helped to decrease the average
CVE. It is interesting to note that this reduction in CVE is not linear, as shown in Figure 12.
Increasing the existing rain gauge stations by 10% (group I) resulted in a decrease in the average
CVE by 16%. However, a further 10% increase in the number of rain gauge stations
from group I to group II resulted in a decrease of less than 6% in the average CVE.
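The diminishing return can be checked directly from the error summation column of Table 5. The sketch below expresses each reduction as a percentage of the existing-network CVE; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Error summation (%) from Table 5 for each station network.
cve_sum = {
    "existing": 41.68,
    "existing + I": 35.05,
    "existing + I + II": 32.61,
    "existing + I + II + III": 32.31,
}

base = cve_sum["existing"]
# Cumulative reduction relative to the existing network, in percent.
reduction_pct = {name: 100.0 * (base - value) / base for name, value in cve_sum.items()}

# Incremental gain of group II over group I, also relative to the existing network.
step_i_to_ii = reduction_pct["existing + I + II"] - reduction_pct["existing + I"]

for name, value in reduction_pct.items():
    print(f"{name}: {value:.1f}%")
print(f"group I to group II: {step_i_to_ii:.1f}%")
```

The cumulative reductions come out at about 15.9%, 21.8%, and 22.5% for groups I, II, and III, so group III contributes well under one percentage point beyond group II.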
Figure 12. Effect of increasing the number of rain gauge stations on the average CVE.
Assessing and categorizing the aridity in different climatic regions have significant
implications for agricultural productivity, water resource management, and environmental
conservation. The United Nations Environment Programme (UNEP) introduced the Aridity
Index (AI) as a quantitative measure of aridity, involving the evaluation of the ratio of
annual rainfall (P) to potential evapotranspiration (PET) [78]. The aridity is classified as
desert, hyper-arid, arid, semi-arid, dry subhumid, or subhumid based on the AI values
shown in Table 6 [79]. Hyper-arid regions receive minimal rainfall. Arid regions have
insufficient rainfall to sustain vegetation and semi-arid regions experience slightly higher
but still inadequate rainfall, while dry sub-humid regions receive enough rainfall to support
continuous harvesting without requiring irrigation.
Table 6. Aridity level classification.
Aridity Level Aridity Index (AI)
Desert AI < 0.03
Hyper arid 0.03 < AI < 0.05
Arid 0.05 < AI < 0.20
Semi-arid 0.20 < AI < 0.50
Dry 0.50 < AI < 0.65
Sub-humid 0.65 < AI < 0.75
Humid AI > 0.75
Cold PET < 400 mm
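The classification in Table 6 can be applied to a pair of annual rainfall and PET values with a few lines of Python. This is a sketch under two stated assumptions: the Cold class is taken to mean PET below 400 mm, and each upper bound is treated as exclusive, since the table does not assign the boundary values themselves. The function name `aridity_class` is hypothetical.

```python
def aridity_class(p_mm, pet_mm):
    """Classify aridity from annual rainfall P and PET (both in mm), following Table 6."""
    if pet_mm <= 0:
        raise ValueError("PET must be positive")
    if pet_mm < 400:      # ASSUMPTION: 'Cold' means PET below 400 mm
        return "Cold"
    ai = p_mm / pet_mm    # Aridity Index, AI = P / PET
    # ASSUMPTION: each upper bound is exclusive; Table 6 leaves boundary values unassigned.
    bounds = [
        (0.03, "Desert"),
        (0.05, "Hyper arid"),
        (0.20, "Arid"),
        (0.50, "Semi-arid"),
        (0.65, "Dry"),
        (0.75, "Sub-humid"),
    ]
    for upper, label in bounds:
        if ai < upper:
            return label
    return "Humid"

# Illustrative inputs: a very dry cell and a wetter mountain cell.
print(aridity_class(17, 3000))
print(aridity_class(390, 2000))
```

Applied cell by cell to the time-averaged rainfall and PET rasters, the same rule would produce an aridity map of the kind the study presents.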
To determine the AI within the study area, it was necessary to determine the average
annual rainfall and PET values. Figure 8 shows that the time-averaged annual rainfall
varies from a minimum of 17 mm/yr (at the northwestern boundary of the study area)
to just under 400 mm/yr (in the mountainous southern region). Annual PET data were
acquired from the publicly accessible WaPOR portal, developed by the Food and Agriculture
Organization [80]. The WaPOR portal offers a near real-time database utilizing satellite
data, facilitating the monitoring of agricultural water productivity across various spatial
scales. The dataset obtained from the WaPOR database comprised multiple resolution
levels: continental (250 m ground resolution), country (100 m), and sub-national (30 m).
This dataset encompasses several variables, including water productivity, land productivity,
actual and potential evapotranspiration, land cover and use, biomass, rainfall, carbon
dioxide uptake, yields, harvest index, and crop calendar. For this study, raster data of
the annual PET were extracted at the continental level from the year 2009 to the present,
with a ground resolution of 250 m, using WaPOR portal data [81]. The raster calculation
algorithm was employed to derive the time-averaged annual PET, as depicted in Figure 13.
Figure 13 shows that PET varies from a minimum of 2000 mm/yr in the mountainous
southern region to a peak of slightly below 3000 mm/yr elsewhere.
Figure 14 exhibits the spatial distribution of the average Aridity Index within the
study area. The figure highlights that the majority of the study area falls within the hyper-
arid classification, except the mountainous southern and southwestern regions, which are
classified as arid zones.
Figure 13. Time-averaged annual reference evapotranspiration.
Figure 14. Aridity classification in the study area.
5. Use Case
This section showcases a detailed example of how to deal with missing rainfall data in
a network of stations for specific non-consecutive years (as indicated in Table 7) by using
the findings outlined earlier.
Table 7. Maximum rainfall records with missing entries.
Year J124 B001 J137 J126 TA007 J127 B007 B101
2008 23.0 32.0 missing 26.78991 26.0 missing 46.0 19.0
2009 34.0 63.0 missing 33.0 54.5 missing 42.5 25.0
2010 60.0 56.0 missing 41.0 168.0 missing 54.0 45.0
2011 28.0 82.5 missing 22.0 60.0 missing 44.0 27.0
2012 40.0 missing missing 17.0 34.5 missing 62.0 56.0
2013 26.5 missing missing 28.0 24.5 missing 54.0 40.0
2014 7.3 30.8 70.4 35.5 37.8 40.5 36.1 41.5
2015 6.5 89.0 51.3 51.5 missing 50.8 59.0 missing
2016 32.0 69.0 50.5 50.8 missing 50.5 26.0 52.0
2017 1.3 78.8 24.5 21.6 missing 45.3 26.2 15.0
2018 5.0 34.4 42.5 21.5 missing 40.5 35.8 34.5
The missing data for the years 2008 to 2010 can be directly retrieved from Figure S5,
while the data for the years 2011 to 2013 can be acquired from Figure S6 in the Supplementary
Materials, since these years fall within the 1966–2013 study period. For the years 2015 to 2018,
missing data can be derived by employing ordinary Kriging interpolation with a J-Bessel
semivariogram to generate geostatistical layers using the available data from adjacent stations,
as demonstrated in Figure 15. The filled records are given in Table 8.
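The decision rule used in this use case can be summarized in a small helper. The function name `gap_fill_method` and its return labels are hypothetical; the rule itself restates the recommendation above, with the supplementary rasters covering 1966 to 2013 and the KOJ and KSK techniques applied to later years.

```python
def gap_fill_method(year, variable):
    """Return how a missing record would be filled for a given year and rainfall variable."""
    if variable not in ("maximum", "total"):
        raise ValueError("variable must be 'maximum' or 'total'")
    if year < 1966:
        raise ValueError("years before 1966 are outside the study period")
    if year <= 2013:
        # Read the value from the gap-free raster of that year.
        return "raster"
    # Beyond 2013: interpolate from the available gauges of that year.
    return "KOJ" if variable == "maximum" else "KSK"

for yr in (2008, 2013, 2015):
    print(yr, gap_fill_method(yr, "maximum"), gap_fill_method(yr, "total"))
```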
Figure 15. Geostatistical surfaces for the years from 2015 to 2018.
Table 8. Interpolated filled data.
Year J124 B001 J137 J126 TA007 J127 B007 B101
2008 23.0 32.0 27.2 26.8 26.0 28.2 46.0 19.0
2009 34.0 63.0 46.4 33.0 54.5 51.4 42.5 25.0
2010 60.0 56.0 46.4 41.0 168.0 42.6 54.0 45.0
2011 28.0 82.5 30.4 22.0 60.0 22.9 44.0 27.0
2012 40.0 19.4 26.8 17.0 34.5 32.4 62.0 56.0
2013 26.5 30.0 30.9 28.0 24.5 32.5 54.0 40.0
2014 7.3 30.8 70.4 35.5 37.8 40.5 36.1 41.5
2015 6.5 89.0 51.3 51.5 50.4 50.8 59.0 52.9
2016 32.0 69.0 50.5 50.8 47.2 50.5 26.0 52.0
2017 16.0 78.8 24.5 21.6 24.4 45.3 26.2 15.0
2018 5.0 34.4 42.5 21.5 30.6 40.5 35.8 34.5
6. Summary and Conclusions
Rainfall depth and spatial distribution play a pivotal role in hydrological and water
resources studies. While rain gauges are the conventional and most accurate data source,
obtaining adequate records with accurate spatial distributions can be challenging, especially
in regions with complex topography and security concerns, such as arid areas. In such
contexts, satellite data and geostatistical interpolation techniques can be employed to
estimate the spatial distribution between rain gauges and fill data gaps. However, the
accuracy of satellite data in our study area was inadequate, underscoring the critical role of
geostatistical interpolation in ensuring reliable rainfall spatial distribution.
This study focused on the southwest part of the Kingdom of Saudi Arabia, covering
123 rain gauges and spanning from 1966 to 2013. Throughout the study period, a total of
51 interpolation techniques were evaluated for total and maximum yearly precipitation
records. Each year was assigned an optimal interpolation technique for maximum and
total yearly precipitation data, determined by cross-validation to find the least centered
Root Mean Square Error among the tested techniques. This optimal technique was then
utilized to generate maximum and total yearly rainfall spatial distributions over the study
area for each year from 1966 to 2013.
Similarly, interpolation techniques were chosen for mean annual values of the gap-
filled data for total and maximum yearly rainfall, intended for years beyond 2013. For
maximum yearly rainfall interpolation, ordinary Kriging with a J-Bessel semivariogram is
recommended, while for total yearly rainfall, simple Kriging with a K-Bessel semivariogram
is recommended.
In practical engineering applications within the study area where the use of rainfall
time series is advised, the rasters provided in the Supplementary Materials can serve as
valuable tools to address data gaps from 1966 to 2013. For data gaps extending beyond
2013, the use of KOJ and KSK techniques is recommended for interpolating maximum daily
and total yearly rainfall values, respectively.
The average annual interpolated total rainfall depth and average annual potential
evapotranspiration were utilized to develop the spatial distribution of the study area’s
Aridity Index. Consequently, the study area was classified into three zones based on the
Aridity Index scale: desert, hyper-arid, and arid zones.
A methodology was introduced, suggesting the incorporation of additional stations
to enhance accuracy and reduce cross-validation errors. Intriguingly, the integration of
another 23 stations resulted in a 21% improvement in the sum of the cross-validation error
for maximum and total yearly precipitation. A practical use case clearly illustrates the
proposed methodology, effectively showcasing its utility for engineering purposes.
Supplementary Materials: The following supporting information can be downloaded at:
https://www.mdpi.com/article/10.3390/su151814028/s1, Figure S1: Distribution of maximum annual
rainfall from 1966 to 1974, Figure S2: Distribution of maximum annual rainfall from 1975 to 1983,
Figure S3: Distribution of maximum annual rainfall from 1984 to 1992, Figure S4: Distribution of
maximum annual rainfall from 1993 to 2001, Figure S5: Distribution of maximum annual rainfall
from 2002 to 2010, Figure S6: Distribution of maximum annual rainfall from 2011 to 2013, Figure S7:
Distribution of total annual rainfall from 1966 to 1974, Figure S8: Distribution of total annual rainfall
from 1975 to 1983, Figure S9: Distribution of total annual rainfall from 1984 to 1992, Figure S10:
Distribution of total annual rainfall from 1993 to 2001, Figure S11: Distribution of total annual rainfall
from 2002 to 2010, Figure S12: Distribution of total annual rainfall from 2011 to 2013, Figure S13: Total
yearly rainfall cross-validation error for (A) existing rainfall stations, (B) with additional proposed
stations (I), (C) with additional proposed stations (I) and (II), and (D) with additional proposed
stations (I), (II), and (III), Figure S14: Maximum daily rainfall cross-validation error for (A) existing
rainfall stations, (B) with additional proposed stations (I), (C) with additional proposed stations (I)
and (II), and (D) with additional proposed stations (I), (II), and (III).
Author Contributions:
A.M.H.: Conceptualization, Data Collection, Data Analysis, Methodology,
Visualization, Writing—Review and Editing. M.E.: Data Analysis, Methodology, Visualization,
Writing—Review and Editing, Supervision. M.I.F.: Methodology, Visualization, Writing—Review and
Editing. M.S.A.: Data Analysis, Writing—Review and Editing. B.T.E.: Data Collection, Methodology,
Visualization, Writing—Original Draft. All authors have read and agreed to the published version of
the manuscript.
Funding:
This work was supported and funded by the Deanship of Scientific Research at Imam
Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-RG23109).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data will be made available on request.
Conflicts of Interest:
The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Spatial Interpolation Equations and Main Characteristics
Spatial interpolation is the estimation of missing measured parameters at specific loca-
tions over the study area by utilizing available data. In the current study, two approaches of
spatial interpolation were used: (A) deterministic and (B) geostatistical. In the deterministic
approach, the relation between points is presented in the form of a mathematical equation.
On the other hand, the geostatistical approach is a probabilistic approach considering
the spatial variance. Table A1 shows a summary of the advantages and disadvantages of
interpolation techniques.
(A) Deterministic Approaches
(A.1.) Inverse Distance Weighting (IDW)
Inverse Distance Weighting is widely used due to its simplicity. It assigns weights to
the surrounding measured data that are inversely proportional to each point's separation
distance raised to a power. The power exponent (P) in the function varies between 0
and 6. With an increase in the power exponent, the weight of each point decreases faster
with distance.
$$Z_x = \sum_{i=1}^{N} \lambda_i Z_i \tag{A1}$$
$$\lambda_i = \frac{d_{xi}^{-P}}{\sum_{i=1}^{N} d_{xi}^{-P}} \tag{A2}$$
where:
$Z_x$: the predicted unknown value at point $(x)$.
$\lambda_i$: the weight value of the sampled point $(i)$.
$Z_i$: the value of the sampled point $(i)$.
$d_{xi}$: the distance between the sampled point $(i)$ and the predicted point $(x)$.
$P$: the power of decreasing weight with distance.
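(A.2.) Global Polynomial Interpolation
The global polynomial is a multiple regression between all measured data over the
study area. It starts from the first order up to the tenth.
$$Z(x_i, y_i) = \beta_0 + \beta_1 \cdot x_i + \beta_2 \cdot y_i + \varepsilon(x_i, y_i) \tag{A3}$$
$$Z(x_i, y_i) = \beta_0 + \beta_1 \cdot x_i + \beta_2 \cdot y_i + \beta_3 \cdot x_i^2 + \beta_4 \cdot y_i^2 + \beta_5 \cdot x_i \cdot y_i + \beta_6 \cdot x_i^3 + \beta_7 \cdot y_i^3 + \beta_8 \cdot x_i^2 \cdot y_i + \beta_9 \cdot x_i \cdot y_i^2 + \varepsilon(x_i, y_i) \tag{A4}$$
where:
$Z(x_i, y_i)$: location $(x_i, y_i)$ value.
$\varepsilon(x_i, y_i)$: random error.
$\beta$: parameter.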
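A global polynomial surface of this kind can be fitted by ordinary least squares. The sketch below fits the second-order part of Equation (A4), the degree selected for this study, using NumPy; the function names and the synthetic coordinates are hypothetical, and the data are generated without noise so that the fit should recover the generating coefficients.

```python
import numpy as np

def design_matrix(x, y):
    """Columns 1, x, y, x^2, y^2, x*y: the terms of a second-order trend surface."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])

def fit_trend_surface(x, y, z):
    """Least-squares estimate of beta_0 ... beta_5 from all gauges at once."""
    beta, *_ = np.linalg.lstsq(design_matrix(x, y), np.asarray(z, dtype=float), rcond=None)
    return beta

def predict_trend_surface(beta, x, y):
    return design_matrix(np.atleast_1d(x), np.atleast_1d(y)) @ beta

# Synthetic, noise-free data generated from known coefficients.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 25)
y = rng.uniform(0, 10, 25)
true_beta = np.array([5.0, 2.0, -1.0, 0.5, 0.1, -0.3])
z = design_matrix(x, y) @ true_beta

beta = fit_trend_surface(x, y, z)
print(np.round(beta, 3))
```

Because a single set of coefficients describes the whole area, the fitted surface is smooth and cannot follow local departures from the trend, in line with the characteristics summarized in Table A1.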
(A.3.) Local Polynomial Interpolation
Local Polynomial Interpolation has the same procedure as Global Polynomial Interpolation.
Local Polynomial Interpolation is conducted over individual windows of the study
area, not the whole area, as for Global Polynomial Interpolation.
(A.4.) Radial Basis Function
The Radial Basis Function generates an interpolated surface over the whole study area
by utilizing one of the following five spline equations.
Regularized spline function:
$$\phi(r) = \ln\left(\frac{\sigma \cdot r}{2}\right)^2 + E_1(\sigma \cdot r)^2 + C_E \tag{A5}$$
where:
$\phi(r)$: Radial Basis Function.
$r$: distance between the predicted point and each data location.
$E_1$: exponential integral function [82].
$C_E$: Euler constant [82].
Spline with tension function:
$$\phi(r) = \ln\left(\frac{\sigma \cdot r}{2}\right) + K_0\left(\frac{\sigma \cdot r}{2}\right) + C_E \tag{A6}$$
where:
$K_0$: modified Bessel function.
Multiquadric function:
$$\phi(r) = \left(r^2 + \sigma^2\right)^{1/2} \tag{A7}$$
Inverse multiquadric function:
$$\phi(r) = \left(r^2 + \sigma^2\right)^{-1/2} \tag{A8}$$
Thin-plate spline function:
$$\phi(r) = (\sigma \cdot r)^2 \ln(\sigma \cdot r) \tag{A9}$$
(B) Geostatistical Approaches
The main difference between geostatistical and deterministic interpolation techniques
is the consideration of the parameters’ spatial variability in the geostatistical approach. The
observed variance with distance is transferred to a theoretical one that can be represented
mathematically in the interpolation process. Equation (A10) illustrates the calculation of
the semivariance. The variable ($\gamma$) is named semivariance because it calculates half of the
variance, not the entire variance.
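$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left( Z(u_i) - Z(u_i + h) \right)^2 \tag{A10}$$
where $Z(u_i)$ and $Z(u_i + h)$ are the values of the variable at the locations $(u_i)$ and $(u_i + h)$, respectively, and $N(h)$ is the number of data pairs separated by the lag distance $h$.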
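Equation (A10) can be evaluated on scattered gauges by grouping station pairs into distance bins. The sketch below is one simple way to do this with NumPy; the function name, the binning scheme, and the three-point dataset are hypothetical and serve only to make the calculation concrete.

```python
import numpy as np

def empirical_semivariogram(xy, z, bin_edges):
    """Semivariance per lag bin: sum of squared differences / (2 * number of pairs)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # every pair of gauges once
    h = np.linalg.norm(xy[i] - xy[j], axis=1)      # separation distance of each pair
    sq_diff = (z[i] - z[j]) ** 2
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)                # NaN marks bins without pairs
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = (h >= bin_edges[k]) & (h < bin_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = sq_diff[in_bin].sum() / (2 * counts[k])
    return gamma, counts

# Three collinear gauges spaced 1 km apart (rainfall in mm).
xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
z = [0.0, 1.0, 3.0]
gamma, counts = empirical_semivariogram(xy, z, bin_edges=[0.5, 1.5, 2.5])
print(gamma, counts)
```

In this small case, the two pairs at 1 km give a semivariance of (1 + 4) / (2 × 2) = 1.25 and the single pair at 2 km gives 9 / 2 = 4.5. A theoretical model such as those listed above would then be fitted to these binned values.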
Figure A1 shows a schematic diagram of a semivariogram. Several mathematical
equations can be utilized to represent the theoretical semivariogram. In the current study,
seven mathematical models were evaluated (Circular, Spherical, Exponential, Gaussian,
K-Bessel, J-Bessel, and Stable).
Figure A1. Schematic semivariogram.
The geostatistical analysis estimates the missing values at location $(u_0)$ by assigning
weights to the measured values, as shown in Equation (A11).
$$\hat{z}(u_0) = \sum_{i=1}^{n} \lambda_i z(u_i) \tag{A11}$$
where:
$\hat{z}(u_0)$: an estimate of the variable of interest at the location $(u_0)$.
$z(u_i)$: the measured value of the variable of interest at the location $(u_i)$.
$\lambda_i$: the Kriging weight of $z(u_i)$.
$n$: the total number of data locations.
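Geostatistical prediction is subjected to two main constraints: (A) the unbiasedness
as given in Equation (A12), and (B) minimization of the variance of the estimated value
($\hat{z}(u_0)$), shown in Equation (A13).
$$E\left[\hat{z}(u_0) - z(u_0)\right] = 0, \qquad \sum_{i=1}^{n} \lambda_i = 1 \tag{A12}$$
$$E\left[\hat{z}(u_0) - z(u_0)\right]^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \gamma_{ij} - 2 \sum_{i=1}^{n} \lambda_i \gamma_{i0} + C(0) \tag{A13}$$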
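(B.1.) Simple Kriging
In simple Kriging, the stationary random parameter mean ($\mu$) is assumed constant
and known before Kriging. The model prediction parameters are given by Equation (A14).
$$\begin{bmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} & \cdots & \gamma_{1n} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} & \cdots & \gamma_{2n} \\ \gamma_{31} & \gamma_{32} & \gamma_{33} & \cdots & \gamma_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{n1} & \gamma_{n2} & \gamma_{n3} & \cdots & \gamma_{nn} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \vdots \\ \lambda_n \end{bmatrix} = \begin{bmatrix} \gamma_{01} \\ \gamma_{02} \\ \gamma_{03} \\ \vdots \\ \gamma_{0n} \end{bmatrix} \tag{A14}$$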
(B.2.)
Ordinary Kriging
In ordinary Kriging, the stationary random parameter mean
(µ)
is assumed constant
and known before Kriging. The prediction parameters are given by Equation (A15)
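$$
\begin{bmatrix}
\gamma_{11} & \gamma_{12} & \gamma_{13} & \cdots & \gamma_{1n} & 1 \\
\gamma_{21} & \gamma_{22} & \gamma_{23} & \cdots & \gamma_{2n} & 1 \\
\gamma_{31} & \gamma_{32} & \gamma_{33} & \cdots & \gamma_{3n} & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma_{n1} & \gamma_{n2} & \gamma_{n3} & \cdots & \gamma_{nn} & 1 \\
1 & 1 & 1 & \cdots & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \vdots \\ \lambda_n \\ m
\end{bmatrix}
=
\begin{bmatrix}
\gamma_{01} \\ \gamma_{02} \\ \gamma_{03} \\ \vdots \\ \gamma_{0n} \\ 1
\end{bmatrix}
\tag{A15}
$$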
where:
$m$: the Lagrange multiplier that enforces the unbiasedness condition of Equation (A12).
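The bordered system of Equation (A15) can be assembled and solved directly, as in the following sketch. The exponential semivariogram (without nugget), the function names, and the four-gauge layout are hypothetical choices for illustration only. The last unknown of the solved system is the extra variable that accompanies the sum-to-one constraint.

```python
import numpy as np

def exp_semivariogram(h, sill, rng):
    """Exponential semivariogram without nugget (practical-range convention)."""
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / rng))

def ordinary_kriging(coords, values, target, sill, rng):
    """Ordinary Kriging estimate, variance, and weights at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_target = np.linalg.norm(coords - target, axis=1)

    # Left-hand matrix: semivariances bordered by a row and column of ones.
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = exp_semivariogram(d_data, sill, rng)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    # Right-hand side: semivariances to the target, then the constraint value 1.
    b = np.append(exp_semivariogram(d_target, sill, rng), 1.0)

    solution = np.linalg.solve(A, b)
    weights, m = solution[:n], solution[n]
    estimate = weights @ values
    variance = weights @ b[:n] + m
    return estimate, variance, weights

# Hypothetical gauges on the corners of a 10 km square; target at the centre.
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
values = [100.0, 120.0, 90.0, 110.0]
estimate, variance, weights = ordinary_kriging(coords, values, (5.0, 5.0), 400.0, 20.0)
exact, _, _ = ordinary_kriging(coords, values, coords[2], 400.0, 20.0)
```

Because the target is equidistant from all four gauges, symmetry forces equal weights of 0.25, and the estimate is the plain average of the four values.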
(B.3.) Universal Kriging
If the concerned parameters exhibit a trend leading to nonstationary mean behavior, the sampling domain can be limited [83]. Kriging with a trend is another name for universal Kriging. In universal Kriging, an additional set of unbiasedness conditions is required. The prediction parameters are given by Equation (A16).
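$$
\begin{bmatrix}
\gamma_{11} & \gamma_{12} & \gamma_{13} & \cdots & \gamma_{1n} & 1 & f_{11} & f_{12} & \cdots & f_{1L} \\
\gamma_{21} & \gamma_{22} & \gamma_{23} & \cdots & \gamma_{2n} & 1 & f_{21} & f_{22} & \cdots & f_{2L} \\
\gamma_{31} & \gamma_{32} & \gamma_{33} & \cdots & \gamma_{3n} & 1 & f_{31} & f_{32} & \cdots & f_{3L} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{n1} & \gamma_{n2} & \gamma_{n3} & \cdots & \gamma_{nn} & 1 & f_{n1} & f_{n2} & \cdots & f_{nL} \\
1 & 1 & 1 & \cdots & 1 & 0 & 0 & 0 & \cdots & 0 \\
f_{11} & f_{21} & f_{31} & \cdots & f_{n1} & 0 & 0 & 0 & \cdots & 0 \\
f_{12} & f_{22} & f_{32} & \cdots & f_{n2} & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
f_{1L} & f_{2L} & f_{3L} & \cdots & f_{nL} & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix}
\lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \vdots \\ \lambda_n \\ \mu_0 \\ \mu_1 \\ \mu_2 \\ \vdots \\ \mu_L
\end{bmatrix}
=
\begin{bmatrix}
\gamma_{01} \\ \gamma_{02} \\ \gamma_{03} \\ \vdots \\ \gamma_{0n} \\ 1 \\ f_{10} \\ f_{20} \\ \vdots \\ f_{L0}
\end{bmatrix}
\tag{A16}
$$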
where:
$L$: the number of unbiasedness conditions.
$f_P$: the $P$th basis function ($P = 1, \dots, L$).
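As a sketch of the block structure in Equation (A16), the example below uses a first-order (planar) trend, so the basis functions are the two coordinates ($L = 2$) alongside the constant term. The choice of basis functions, the exponential semivariogram, the function names, and the synthetic data are all assumptions made for illustration, not the configuration used in the study.

```python
import numpy as np

def exp_semivariogram(h, sill, rng):
    """Exponential semivariogram without nugget (practical-range convention)."""
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / rng))

def universal_kriging(coords, values, target, sill, rng):
    """Universal Kriging with a planar drift (basis functions 1, x, y)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_target = np.linalg.norm(coords - target, axis=1)

    # Drift matrix: a column of ones followed by the basis functions x and y.
    drift = np.column_stack([np.ones(n), coords])
    k = drift.shape[1]  # 1 + L constraint rows

    A = np.zeros((n + k, n + k))
    A[:n, :n] = exp_semivariogram(d_data, sill, rng)
    A[:n, n:] = drift
    A[n:, :n] = drift.T
    # Right-hand side: semivariances to the target, then 1 and the basis values there.
    b = np.concatenate([exp_semivariogram(d_target, sill, rng), [1.0], target])

    solution = np.linalg.solve(A, b)
    weights = solution[:n]
    return weights @ values, weights

# Synthetic data lying exactly on the plane z = 50 + 2x - 3y (hypothetical values).
coords = np.array([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (4.0, 7.0)])
values = 50.0 + 2.0 * coords[:, 0] - 3.0 * coords[:, 1]
estimate, weights = universal_kriging(coords, values, (3.0, 6.0), 400.0, 20.0)
```

The additional unbiasedness conditions force the weights to reproduce any trend spanned by the basis functions, so data lying exactly on a plane are interpolated exactly regardless of the semivariogram parameters.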
(B.4.) Cokriging
Cokriging is used when there are several variables. In the current study, we have two variables: (A) the rainfall depth and (B) the elevation of the rain gauges. The mathematical formulas of Cokriging can be found in [84–88].
(B.5.) Empirical Bayesian Kriging
There is a shortage in the literature regarding detailed descriptions of Empirical Bayesian Kriging (EBK). Most of the available documents focus on the use of computer packages [68,89].
Table A1. Summary of advantages and disadvantages of interpolation techniques.
| Interpolation Technique | Advantages | Disadvantages |
|---|---|---|
| Inverse Distance Weighting | Simple and easy to apply. Fast computing time. | Generates “bull's-eyes” at measurement locations. Sensitive to the selection of the power factor. Sensitive to outliers. |
| Global Polynomial Interpolation | Simple and easy to apply. Fast computing time. Provides a smooth interpolated surface over the whole study area. | Does not capture local trends. Sensitive to the polynomial degree. Large influence of the edge points. |
| Local Polynomial Interpolation | Flexibility (allows the selection between several polynomial degrees). Captures local trends. Can handle complex patterns. | High computational demand. Sensitive to the polynomial degree. May generate discontinuity at local region edges. |
| Radial Basis Function | Flexibility (allows the selection between five basis functions). Captures local trends. Can capture complex spatial patterns. | High computational demand. Sensitive to basis function selection. |
| Kriging | Provides Best Linear Unbiased Prediction (BLUP). Several types of Kriging provide flexibility to deal with stationary and non-stationary variables. | Requires preparing a variogram before starting the computation. Sensitive to model assumptions. Computationally demanding. |
| Cokriging | Provides Best Linear Unbiased Prediction (BLUP). Several types of Kriging provide flexibility to deal with stationary and non-stationary variables. Can handle more than one variable simultaneously. Incorporation of auxiliary variables can improve prediction accuracy. | Requires preparing a variogram before starting the computation. Sensitive to model assumptions. The most computationally demanding approach. |
| Empirical Bayesian Kriging | Automated variogram generation. | Limited theoretical justification. Sensitive to outliers. |
References
1. Gaur, M.K.; Squires, V.R. Geographic Extent and Characteristics of the World’s Arid Zones and Their Peoples. In Climate Variability Impacts on Land Use and Livelihoods in Drylands; Gaur, M.K., Squires, V.R., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–20. ISBN 978-3-319-56681-8.
2. Khan, M.Y.A.; ElKashouty, M.; Subyani, A.M.; Tian, F.; Gusti, W. GIS and RS intelligence in delineating the groundwater potential zones in Arid Regions: A case study of southern Aseer, southwestern Saudi Arabia. Appl. Water Sci. 2022, 12, 3. [CrossRef]
3. Wang, R.; Chen, J.; Chen, X.; Wang, Y. Variability of precipitation extremes and dryness/wetness over the southeast coastal region of China, 1960–2014. Int. J. Climatol. 2017, 37, 4656–4669. [CrossRef]
4. Mallick, J.; Talukdar, S.; Alsubih, M.; Salam, R.; Ahmed, M.; Ben Kahla, N.; Shamimuzzaman, M. Analysing the trend of rainfall in Asir region of Saudi Arabia using the family of Mann-Kendall tests, innovative trend analysis, and detrended fluctuation analysis. Theor. Appl. Climatol. 2021, 143, 823–841. [CrossRef]
5. Kharrou, M.H.; Le Page, M.; Chehbouni, A.; Simonneaux, V.; Er-Raki, S.; Jarlan, L.; Ouzine, L.; Khabba, S.; Chehbouni, G. Assessment of Equity and Adequacy of Water Delivery in Irrigation Systems Using Remote Sensing-Based Indicators in Semi-Arid Region, Morocco. Water Resour. Manag. 2013, 27, 4697–4714. [CrossRef]
6. Brunner, M.I.; Slater, L.; Tallaksen, L.M.; Clark, M. Challenges in modeling and predicting floods and droughts: A review. Wiley Interdiscip. Rev. Water 2021, 8, e1520. [CrossRef]
7. Hao, Z.; Singh, V.P.; Xia, Y. Seasonal Drought Prediction: Advances, Challenges, and Future Prospects. Rev. Geophys. 2018, 56, 108–141. [CrossRef]
8. Pachauri, K.; Meyer, L. Climate Change 2014 Synthesis Report; IPCC: Geneva, Switzerland, 2000; ISBN 9789291691432.
9. Al Mamoon, A.; Rahman, A. Rainfall in Qatar: Is it changing? Nat. Hazards 2017, 85, 453–470. [CrossRef]
10. Donat, M.G.; Lowry, A.L.; Alexander, L.V.; O’Gorman, P.A.; Maher, N. More extreme precipitation in the world’s dry and wet regions. Nat. Clim. Chang. 2016, 6, 508–513. [CrossRef]
11. Gebregiorgis, A.S.; Hossain, F. Understanding the dependence of satellite rainfall uncertainty on topography and climate for hydrologic model simulation. IEEE Trans. Geosci. Remote Sens. 2013, 51, 704–718. [CrossRef]
12. Arora, M.; Singh, P.; Goel, N.K.; Singh, R.D. Spatial distribution and seasonal variability of rainfall in a mountainous basin in the Himalayan region. Water Resour. Manag. 2006, 20, 489–508. [CrossRef]
13. Buytaert, W.; Celleri, R.; Willems, P.; De Bièvre, B.; Wyseure, G. Spatial and temporal rainfall variability in mountainous areas: A case study from the south Ecuadorian Andes. J. Hydrol. 2006, 329, 413–421. [CrossRef]
14. Hu, Q.; Li, Z.; Wang, L.; Huang, Y.; Wang, Y.; Li, L. Rainfall spatial estimations: A review from spatial interpolation to multi-source data merging. Water 2019, 11, 579. [CrossRef]
15. Kundzewicz, Z.W. Climate change impacts on the hydrological cycle. Ecohydrol. Hydrobiol. 2008, 8, 195–203. [CrossRef]
16. Lorenz, C.; Kunstmann, H. The hydrological cycle in three state-of-the-art reanalyses: Intercomparison and performance analysis. J. Hydrometeorol. 2012, 13, 1397–1420. [CrossRef]
17. Besha, K.Z.; Demissie, T.A.; Feyessa, F.F. Comparative analysis of long-term precipitation trends and its implication in the Modjo catchment, central Ethiopia. J. Water Clim. Chang. 2022, 13, 3883–3905. [CrossRef]
18. Kessabi, R.; Hanchane, M.; Caloiero, T.; Pellicone, G.; Addou, R.; Krakauer, N.Y. Analyzing Spatial Trends of Precipitation Using Gridded Data in the Fez-Meknes Region, Morocco. Hydrology 2023, 10, 37. [CrossRef]
19. Haggag, M.; Elsayed, A.A.; Awadallah, A.G. Evaluation of rain gauge network in arid regions using geostatistical approach: Case study in northern Oman. Arab. J. Geosci. 2016, 9, 552. [CrossRef]
20. Noori, M.J.; Hassan, H.H.; Mustafa, Y.T. Spatial Estimation of Rainfall Distribution and Its Classification in Duhok Governorate Using GIS. J. Water Resour. Prot. 2014, 6, 75–82. [CrossRef]
21. Keblouti, M.; Ouerdachi, L.; Boutaghane, H. Spatial interpolation of annual precipitation in Annaba-Algeria-Comparison and evaluation of methods. Energy Procedia 2012, 18, 468–475. [CrossRef]
22. Radi, N.F.A.; Zakaria, R.; Azman, M.A.Z. Estimation of missing rainfall data using spatial interpolation and imputation methods. AIP Conf. Proc. 2015, 1643, 42–48. [CrossRef]
23. WMO. Volume I: Hydrology—From Measurement to Hydrological Information; WMO: Geneva, Switzerland, 2008; Volume I, ISBN 9789263101686.
24. De Moraes Cordeiro, A.L.; Blanco, C.J.C. Assessment of satellite products for filling rainfall data gaps in the Amazon region. Nat. Resour. Model. 2021, 34, 12298. [CrossRef]
25. Al-Areeq, A.M.; Al-Zahrani, M.A.; Sharif, H.O. Assessment of the performance of satellite rainfall products over Makkah watershed using a physically based hydrologic model. Appl. Water Sci. 2022, 12, 246. [CrossRef]
26. Hobouchian, M.P.; Salio, P.; García Skabar, Y.; Vila, D.; Garreaud, R. Assessment of satellite precipitation estimates over the slopes of the subtropical Andes. Atmos. Res. 2017, 190, 43–54. [CrossRef]
27. Hsu, K.L.; Gao, X.; Sorooshian, S.; Gupta, H.V. Precipitation estimation from remotely sensed information using artificial neural networks. J. Appl. Meteorol. 1997, 36, 1176–1190. [CrossRef]
28. Portuguez-maurtua, M.; Arumi, J.L.; Lagos, O.; Stehr, A.; Arquiñigo, N.M. Filling Gaps in Daily Precipitation Series Using Regression and Machine Learning in Inter-Andean Watersheds. Water 2022, 14, 1799. [CrossRef]
29. Helmi, A.M.; Abdelhamed, M.S. Evaluation of CMORPH, PERSIANN-CDR, CHIRPS V2.0, TMPA 3B42 V7, and GPM IMERG V6 Satellite Precipitation Datasets in Arabian Arid Regions. Water 2023, 15, 92. [CrossRef]
30. Zhou, F.; Guo, H.C.; Ho, Y.S.; Wu, C.Z. Scientometric analysis of geostatistics using multivariate methods. Scientometrics 2007, 73, 265–279. [CrossRef]
31. Li, J.; Heap, A.D. A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. Ecol. Inform. 2011, 6, 228–241. [CrossRef]
32. Li, J.; Heap, A.D. Spatial interpolation methods applied in the environmental sciences: A review. Environ. Model. Softw. 2014, 53, 173–189. [CrossRef]
33. Wagner, P.D.; Fiener, P.; Wilken, F.; Kumar, S.; Schneider, K. Comparison and evaluation of spatial interpolation schemes for daily rainfall in data scarce regions. J. Hydrol. 2012, 464–465, 388–400. [CrossRef]
34. Sahu, B.; Ghosh, A.K. Seema Deterministic and geostatistical models for predicting soil organic carbon in a 60 ha farm on Inceptisol in Varanasi, India. Geoderma Reg. 2021, 26, e00413. [CrossRef]
35. Zhang, Y.; Hou, J.; Huang, C. Integration of Satellite-Derived and Ground-Based Soil Moisture Observations for a Precipitation Product over the Upper Heihe River Basin, China. Remote Sens. 2022, 14, 5355. [CrossRef]
36. Brindha, K.; Taie Semiromi, M.; Boumaiza, L.; Mukherjee, S. Comparing Deterministic and Stochastic Methods in Geospatial Analysis of Groundwater Fluoride Concentration. Water 2023, 15, 1707. [CrossRef]
37. Li, J.; Heap, A.D. A Review of Spatial Interpolation Methods for Environmental Scientists; Geoscience Australia: Canberra, Australia, 2008; Volume 68, ISBN 9781921498305.
38. Chinchorkar, S.; Sayyad, F.; Patel, G.; Chinchorkar, S.S.; Patel, G.R.; Sayyad, F.G. Development of monsoon model for long range forecast rainfall explored for Anand (Gujarat-India). Int. J. Water Resour. Environ. Eng. 2012, 4, 322–326. [CrossRef]
39. Lu, B.; Charlton, M.; Harris, P.; Fotheringham, A.S. Geographically weighted regression with a non-Euclidean distance metric: A case study using hedonic house price data. Int. J. Geogr. Inf. Sci. 2014, 28, 660–681. [CrossRef]
40. Naoum, S.; Tsanis, I.K. A multiple linear regression GIS module using spatial variables to model orographic rainfall. J. Hydroinformatics 2004, 6, 39–56. [CrossRef]
41. Yeh, H.C.; Chen, Y.C.; Wei, C.; Chen, R.H. Entropy and kriging approach to rainfall network design. Paddy Water Environ. 2011, 9, 343–355. [CrossRef]
42. Mair, A.; Fares, A. Comparison of Rainfall Interpolation Methods in a Mountainous Region of a Tropical Island. J. Hydrol. Eng. 2011, 16, 371–383. [CrossRef]
43. Getahun, Y.S. Spatial-Temporal Analysis of Climate Elements, Vegetation Characteristics, and Sea Surface Temperature Anomaly—A Case Study in Gojam, Ethiopia, Erasmus Mundus Program-European Union. Master’s Thesis, Universidade NOVA de Lisboa, Lisbon, Portugal, 2012.
44. Page, T.; Beven, K.J.; Hankin, B.; Chappell, N.A. Interpolation of rainfall observations during extreme rainfall events in complex mountainous terrain. Hydrol. Process. 2022, 36, 14758. [CrossRef]
45. Jacquin, A. Interpolation of daily precipitation in mountain catchments with limited data availability. In Proceedings of the EGU General Assembly Conference Abstracts; European Geosciences Union: Munich, Germany, 2014; p. 13973.
46. Yang, R.; Xing, B. A comparison of the performance of different interpolation methods in replicating rainfall magnitudes under different climatic conditions in chongqing province (China). Atmosphere 2021, 12, 1318. [CrossRef]
47. Ali, G.; Sajjad, M.; Kanwal, S.; Xiao, T.; Khalid, S.; Shoaib, F.; Gul, H.N. Spatial–temporal characterization of rainfall in Pakistan during the past half-century (1961–2020). Sci. Rep. 2021, 11, 6935. [CrossRef] [PubMed]
48. Yang, X.; Xie, X.; Liu, D.L.; Ji, F.; Wang, L. Spatial Interpolation of Daily Rainfall Data for Local Climate Impact Assessment over Greater Sydney Region. Adv. Meteorol. 2015, 2015, 563629. [CrossRef]
49. Amini, M.A.; Torkan, G.; Eslamian, S.; Zareian, M.J.; Adamowski, J.F. Analysis of deterministic and geostatistical interpolation techniques for mapping meteorological variables at large watershed scales. Acta Geophys. 2019, 67, 191–203. [CrossRef]
50. Delbari, M.; Afrasiab, P.; Jahani, S. Spatial interpolation of monthly and annual rainfall in northeast of Iran. Meteorol. Atmos. Phys. 2013, 122, 103–113. [CrossRef]
51. Gupta, A.; Kamble, T.; Machiwal, D. Comparison of ordinary and Bayesian kriging techniques in depicting rainfall variability in arid and semi-arid regions of north-west India. Environ. Earth Sci. 2017, 76, 512. [CrossRef]
52. Hasanean, H.; Almazroui, M. Rainfall: Features and variations over Saudi Arabia, a review. Climate 2015, 3, 578–626. [CrossRef]
53. OBG. The Report: Saudi Arabia; Oxford Business Group: Oxford, UK, 2019.
54. Chowdhury, S.; Al-Zahrani, M. Characterizing water resources and trends of sector wise water consumptions in Saudi Arabia. J. King Saud Univ. Eng. Sci. 2015, 27, 68–82. [CrossRef]
55. Al-Ahmadi, K.; Al-Ahmadi, S. Rainfall-altitude relationship in Saudi Arabia. Adv. Meteorol. 2013, 2013, 363029. [CrossRef]
56. Sultana, R.; Nasrollahi, N. Evaluation of remote sensing precipitation estimates over Saudi Arabia. J. Arid Environ. 2018, 151, 90–103. [CrossRef]
57. Al-Zahrani, M.; Husain, T. An algorithm for designing a precipitation network in the south-western region of Saudi Arabia. J. Hydrol. 1998, 205, 205–216. [CrossRef]
58. Hag-Elsafi, S.; El-Tayib, M. Spatial and statistical analysis of rainfall in the Kingdom of Saudi Arabia from 1979 to 2008. Weather 2016, 71, 262–266. [CrossRef]
59. Frenken, K. Irrigation in the Middle East Region in Figures, FAO Water Report—Aquastat Survey; FAO: Rome, Italy, 2009; Volume 34.
60. Almazroui, M. Calibration of TRMM rainfall climatology over Saudi Arabia during 1998–2009. Atmos. Res. 2011, 99, 400–414. [CrossRef]
61. Abdullah, M.A.; Almazroui, M. Climatological study of the southwestern region of Saudi Arabia. I. Rainfall analysis. Clim. Res. 1998, 9, 213–223. [CrossRef]
62. Subyani, A.M. Geostatistical study of annual and seasonal mean rainfall patterns in southwest Saudi Arabia/Distribution géostatistique de la pluie moyenne annuelle et saisonnière dans le Sud-Ouest de l’Arabie Saoudite. Hydrol. Sci. J. 2004, 49, 55137. [CrossRef]
63. World Bank. Making the Most of Scarcity Accountability for Better Water Management in the Middle East and North Africa; MENA Development Report; World Bank: Washington, DC, USA, 2007.
64. KSA-MWA. Hydrological Publications No. 98 Vol. 4 Years (1963–1980); Kingdom of Saudi Arabia, Ministry of Agriculture and Water, Hydrology Division, Department of Water Resources and Development: Riyadh, Saudi Arabia, 1980.
65. Igaz, D.; Šinka, K.; Varga, P.; Vrbičanová, G.; Aydın, E.; Tárník, A. The evaluation of the accuracy of interpolation methods in crafting maps of physical and hydro-physical soil properties. Water 2021, 13, 212. [CrossRef]
66. Robinson, T.P.; Metternicht, G. Testing the performance of spatial interpolation techniques for mapping soil properties. Comput. Electron. Agric. 2006, 50, 97–108. [CrossRef]
67. Burrough, P.A.; McDonnell, R.A.; Lloyd, C.D. Principles of Geographical Information Systems; Oxford University Press: Oxford, UK, 2015; ISBN 0198742843.
68. Johnston, K.; Ver Hoef, J.M.; Krivoruchko, K.; Lucas, N. Using ArcGIS Geostatistical Analyst; ESRI: Redlands, CA, USA, 2001; ISBN 1589480066.
69. Bhunia, G.S.; Shit, P.K.; Maiti, R. Comparison of GIS-based interpolation methods for spatial distribution of soil organic carbon (SOC). J. Saudi Soc. Agric. Sci. 2018, 17, 114–126. [CrossRef]
70. Krige, D.G. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. S. Afr. Inst. Min. Metall. 1951, 52, 119–139.
71. Adhikary, S.K.; Muttil, N.; Yilmaz, A.G. Cokriging for enhanced spatial interpolation of rainfall in two Australian catchments. Hydrol. Process. 2017, 31, 2143–2161. [CrossRef]
72. Liu, D.; Zhao, Q.; Fu, D.; Guo, S.; Liu, P.; Zeng, Y. Comparison of spatial interpolation methods for the estimation of precipitation patterns at different time scales to improve the accuracy of discharge simulations. Hydrol. Res. 2020, 51, 583–601. [CrossRef]
73. Javari, M. Comparison of interpolation methods for modeling spatial variations of Precipitation in Iran. Int. J. Environ. Sci. Educ. 2016, 11, 349–358.
74. Mirzaei, R.; Sakizadeh, M. Comparison of interpolation methods for the estimation of groundwater contamination in Andimeshk-Shush Plain, Southwest of Iran. Environ. Sci. Pollut. Res. 2016, 23, 2758–2769. [CrossRef] [PubMed]
75. Samsonova, V.P.; Blagoveshchenskii, Y.N.; Meshalkina, Y.L. Use of empirical Bayesian kriging for revealing heterogeneities in the distribution of organic carbon on agricultural lands. Eurasian Soil Sci. 2017, 50, 305–311. [CrossRef]
76. Shih, S.F. Rainfall variation analysis and optimization of gaging systems. Water Resour. Res. 1982, 18, 1269–1277. [CrossRef]
77. Chen, Y.-C.; Wei, C.; Yeh, H.-C. Rainfall network design using kriging and entropy. Hydrol. Process. 2008, 22, 340–346. [CrossRef]
78. Middleton, N.; Thomas, D. World Atlas of Desertification; United Nations Digital Library: Geneva, Switzerland, 1992; Volume ix, 69p.
79. Spinoni, J.; Vogt, J.; Naumann, G.; Carrao, H.; Barbosa, P. Towards identifying areas at climatological risk of desertification using the Köppen-Geiger classification and FAO aridity index. Int. J. Climatol. 2015, 35, 2210–2222. [CrossRef]
80. FAO. WaPOR Database Methodology, Version 2 Release, April 2020; FAO: Rome, Italy, 2020; ISBN 978-92-5-132981-8.
81. WaPOR. WaPOR Portal. Available online: https://wapor.apps.fao.org/home/WAPOR_2/1 (accessed on 7 September 2023).
82. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions; Department of Commerce—United States of America: Dover, NY, USA, 1975.
83. Rojimol, J. Development of Optimal Geostatistical Model for Geotechnical Applications; Indian Institute of Technology Hyderabad: Sangareddy, India, 2013.
84. Journel, A.G.; Huijbregts, C. Mining Geostatistics; Academic Press: London, UK; New York, NY, USA, 1978; ISBN 0123910501.
85. Isaaks, E.H.; Srivastava, R.M. An Introduction to Applied Geostatistics; Oxford University Press: Oxford, UK, 1990; ISBN 978-0195050134.
86. Cressie, N. Fitting variogram models by weighted least squares. J. Int. Assoc. Math. Geol. 1985, 17, 563–586. [CrossRef]
87. Goovaerts, P. Geostatistics for Natural Resources Evaluation; Oxford University Press: New York, NY, USA, 1997; ISBN 0-19-511538-4.
88. Chilès, J.-P.; Delfiner, P. Geostatistics-Modeling Spatial Uncertainty, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012; ISBN 978-0-470-18315-1.
89. Krivoruchko, K.; Gribov, A. Evaluation of empirical Bayesian kriging. Spat. Stat. 2019, 32, 100368. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
... This requires a careful analysis to choose the suitable sites for wind farms and study the factors affecting them to obtain the highest productivity [5]. Türkiye is one of the countries that seek to invest heavily in the field of sustainable energy, and its wind energy production increased from 1375.80 MW in 2010 to 11,101.82 MW by January 2022, indicating significant growth, especially after adding 1797 MW between 2020 and 2021, which made Türkiye take its place in the global renewable energy ranking. ...
... Integrating machine learning algorithms with geographic information systems can be used to classify of vegetation, urban areas, and water bodies [10]. Regression analysis and pattern prediction are missing data in spatial data, such as in predicting air pollution levels in an area [11]. Object detection and feature extraction from remote sensing data and satellite images such as identifying buildings, roads and vehicles [12]. ...
Article
Full-text available
This research highlights the importance of integrating machine learning algorithms with Geographical Information Systems (GIS) applications in the field of renewable energy by finding a suitable site for wind farms due to their importance in preserving the environment to achieve efficiency and cost-effectiveness and reduce the environmental impact of fossil fuel energy sources. Using GIS various factors affecting wind energy localization were processed and analyzed including natural, socio-economic and environmental criteria. Ensemble learning of four supervised machine learning algorithms (Random Forest, K-Nearest Neighbor, Support Vector Machines, Naive Bayes) was used to classify suitable and unsuitable data representing geo-referenced points on the ground with three criteria for each site (wind speed, elevation and slope). The results of the algorithms varied in terms of accuracy and variance, then the results were collected, and the intersection between them was found so that the location classification would be agreed upon in the results of the algorithms used. The aim of using this technique is to reduce the error, increase the accuracy and avoid the bias or variance present in individual models. Accuracy of the algorithms result was respectively (K-Nearest Neighbor, Random Forest, Support Vector Machines, Naive Bayes) (93.022%, 93.018%, 95.095%, 89.553%). The final result is a map using GIS showing the suitable and unsuitable sites of wind farms in the study area (Türkiye) has been chosen as a study area in the research due to several factors that make it suitable for wind energy projects, including its geographical location, which gives it great climatic and terrain diversity, as it is surrounded by seas (Black Sea, Aegean Sea, and Mediterranean Sea), which leads to the activity of seasonal and continuous winds, which contributes to the activity of seasonal and permanent winds. 
Its drive to develop investment in renewable energy due to economic and population growth has increased the demand for energy and consequently the development of renewable and sustainable energy sources. This research contributes to supporting the global transition to sustainable energy by providing a new methodology for integrating multiple technologies to support a sustainable energy future.
... Traditional precipitation observation methods primarily rely on rain gauges. However, in many parts of the world, especially in remote and less developed regions, the density of rain gauges is low, resulting in significant gaps in pluviometric data [8][9][10]. In recent years, with the development of remote sensing technology, precipitation observation methods based on remote sensing have been gradually applied and developed. ...
... MWRI-RM observation data from 23 October 2023 to the present are available from the National Satellite Meteorological Center. Table 1 summarizes the channel settings of FY-3G/MWRI-RM [9]. ...
Article
Full-text available
Using the FY-3G/MWRI-RM observations, this paper proposes a precipitation retrieval method that combines the Synthetic Minority Over-sampling Technique with Light Gradient Boosting Machine (SMOTE-LGBM) and analyzes the impact of MWRI-RM channel settings on precipitation retrieval. The SMOTE-LGBM-based model consists of two LGBM models for precipitation identification and estimation, respectively. The SMOTE method is used to address the imbalance between precipitation and non-precipitation samples. Using the Integrated Multi-Satellite Retrievals for the Global Precipitation Measurement (IMERG) product as a reference, we validate the retrieved precipitation by the SMOTE-LGBM-based model with an independent testing dataset. The critical success indexes are 0.483 and 0.526, and the Pearson correlation coefficients are 0.611 and 0.645 for the ocean and land regions, respectively. The spatial distributions of the retrieved and IMERG accumulated precipitation in the testing dataset are similar. In addition, we visualize and analyze the cases of Meiyu and two typhoons. The results indicate that the SMOTE-LGBM-based model effectively represents the spatial distribution characteristics of precipitation and achieves high agreement with IMERG precipitation products. Overall, the SMOTE-LGBM-based model successfully retrieves precipitation from MWRI-RM and provides accurate precipitation products for FY-3G/MWRI-RM for the first time.
... Figure 3 shows the average rainfall from 2002 to 2020, where most of the basin is located in the humid region. For more information about the rainfall system in this region, readers can refer to the studies conducted by Sulaiman et al. [22] and Helmi et al. [23]. Overview of the study region: on the left side is a map of Saudi Arabia while on the right side is the study site location (Wadi Hail), which is located in Asir Province. ...
... Figure 3 shows the average rainfall from 2002 to 2020, where most of the basin is located in the humid region. For more information about the rainfall system in this region, readers can refer to the studies conducted by Sulaiman et al. [22] and Helmi et al. [23]. ...
Article
Full-text available
Floods in southwestern Saudi Arabia, especially in the Asir region, are among the major natural disasters caused by natural and human factors. In this region, flash floods that occur in the Wadi Hail Basin greatly affect human life and activities, damaging property, the built environment, infrastructure, landscapes, and facilities. A previous study carried out for the same basin has effectively revealed zones of flood risk using such an approach. However, the utilization of the HEC–HMS (Hydrologic Engineering Center–Hydrologic Modeling System) model and IMERG data for delineating areas prone to flash floods remain unexplored. In response to this advantage, this work primarily focused on flood generation assessment in the Wadi Hail Basin, one of the major basins in the region that is frequently prone to severe flash flood damage, from a single extreme rainfall event. We employed a fully physical-based, distributed hydrological model run with HEC–HMS software version 4.11 and Integrated Multi-satellite Retrievals of Global Precipitation Measurement (IMERG V.06) data, as well as other geo-environmental variables, to simulate the water flow within the Wadi Basin, and predict flash flood hazard. Discharge from the wadi and its sub-basins was predicted using 1 mm rainfall over an 8-h occurrence time. Significant peak discharge (3.6 m³/s) was found in eastern and southern upstream sub-basins and crossing points, rather than those downstream, due to their high-density drainage network (0.12) and CNs (88.4). Generally, four flood hazard levels were identified in the study basin: ‘low risk’, ‘moderate risk’, ‘high risk’, and ‘very high risk’. It was found that 43.8% of the total area of the Wadi Hail Basin is highly prone to flooding. Furthermore, medium- and low-hazard areas make up 4.5–11.2% of the total area, respectively. 
We found that the peak discharge value of sub-basin 11 (1.8 m³/s) covers 13.2% of the total Wadi Hail area; so, it poses more flood risk than other Wadi Hail sub-basins. The obtained results demonstrated the usefulness of the methods used to develop useful hydrological information in a region lacking ungagged data. This study will play a useful role in identifying the impact of extreme rainfall events on locations that may be susceptible to flash flooding, which will help authorities to develop flood management strategies, particularly in response to extreme events. The study results have potential and valuable policy implications for planners and decision-makers regarding infrastructural development and ensuring environmental stability. The study recommends further research to understand how flash flood hazards correlate with changes at different land use/cover (LULC) classes. This could refine flash flood hazards results and maximize its effectiveness.
... Traditional deterministic and geostatistical interpolation methods, such as nearest neighbor interpolation, inverse distance weighted interpolation, kriging interpolation, trigonometric interpolation, and spline interpolation, have historically been utilized for soil attribute data interpolation. However, the neural networks has ushered in a new era, with these models increasingly shaping various interpolation fields, including image [5,6], soil [7][8][9] and atmospheric sciences [10,11], marking a significant advancement in the realm of spatial interpolation techniques. ...
Article
Full-text available
Petroleum hydrocarbon pollution causes significant damage to soil, so accurate prediction and early intervention are crucial for sustainable soil management. However, traditional soil analysis methods often rely on statistical methods, which means they always rely on specific assumptions and are sensitive to outliers. Existing machine learning based methods convert features containing spatial information into one-dimensional vectors, resulting in the loss of some spatial features of the data. This study explores the application of Three-Dimensional Convolutional Neural Networks (3DCNN) in spatial interpolation to evaluate soil pollution. By introducing Channel Attention Mechanism (CAM), the model assigns different weights to auxiliary variables, improving the prediction accuracy of soil hydrocarbon content. We collected soil pollution data and validated the spatial distribution map generated using this method based on the drilling dataset. The results indicate that compared with traditional Kriging3D methods (R² = 0.318) and other machine learning methods such as support vector regression (R² = 0.582), the proposed 3DCNN based method can achieve better accuracy (R² = 0.954). This approach provides a sustainable tool for soil pollution management, supports decision-makers in developing effective remediation strategies, and promotes the sustainable development of spatial interpolation techniques in environmental science.
... Based on this data, the GIS (3.2.1) software's spline interpolation method was used to depict the spatial distribution of the MFI values. To generate this map, the spline interpolation approach was chosen to smoothly model shifting surfaces of phenomena like rainfall, notably in cases with a diminished number of measurement points [59]. The fuzzification function used for rainfall variability is "fuzzy small". ...
Article
Full-text available
Due to the multiple pressures from human activities, many freshwater ecosystems are facing degradation. To address this issue, a new approach for assessing stream water quality and ecological (WQE) risk using a multi-criteria analysis through a GIS-based policy tool has been developed. The suggested methodology integrates eight different factors along the contaminant pathway from source to streams, including: (a) rainfall variability, (b) soil texture, (c) soil erodibility, (d) slope, (e) river buffer zone, (f) point source contamination buffer zone, (g) non-point source contamination of NO3, and (h) non-point source contamination of PO4. Utilizing fuzzy GIS tools, the above factors and their related maps were spatially overlaid (raster-based suitability for raster reclassification) to obtain the final stream WQE risk map. The final map depicts the spatial distribution of streams concerning their water quality risk and is represented by two classes of WQE risk. The first class is characterized as “appropriate”, in which there is no need for any further actions, while the other one is characterized as “non-appropriate”, indicating that actions should be taken to ensure the sustainability of streams’ water quality. The proposed approach was implemented for the island of Crete, which is located in the Southeast Mediterranean region. The developed methodology was validated using the Hellenic evaluation system (HESY2), an especially established and adapted to the Mediterranean river systems ecological quality metric method, obtained by in situ measurements that were conducted during different monitoring programs (1989–2015). Moreover, this study summarizes appropriate measures and practices that ensure the sustainable management of Mediterranean river basins. These practices can be adopted by local authorities, owners of polluting units, and farmers/breeders to improve the resiliency of streams’ water quality issues in the Mediterranean region.
... CRMSE will be computed by employing Eq. (34) [146]. A model with the lowest CRMSE value will be better [147]. ...
Article
Full-text available
In this study, a landslide susceptibility assessment is performed by combining two machine learning regression algorithms (MLRA), such as support vector regression (SVR) and categorical boosting (CatBoost), with two population-based optimization algorithms, such as grey wolf optimizer (GWO) and particle swarm optimization (PSO), to evaluate the potential of a relatively new algorithm and the impact that optimization algorithms can have on the performance of regression models. The Kerala state in India has been chosen as the test site due to the large number of recorded incidents in the recent past. The study started with 18 potential predisposing factors, which were reduced to 14 after a multi-approach feature selection technique. Six susceptibility models were implemented and compared using the machine learning algorithms alone and combining each of them with the two optimization algorithms: SVR, CatBoost, SVR-PSO, CatBoost-PSO, SVR-GWO, and CatBoost-GWO. The resulting maps were validated with an independent dataset. The performance rankings, based on the area under the receiver operating characteristic curve (AUC) metric, are as follows: CatBoost-GWO (AUC = 0.910) had the highest performance, followed by CatBoost-PSO (AUC = 0.909), CatBoost (AUC = 0.899), SVR-GWO (AUC = 0.868), SVR-PSO (AUC = 0.858), and SVR (AUC = 0.840). Other validation statistics corroborated these outcomes, and the Friedman and Wilcoxon-signed rank tests verified the statistical significance of the models. Our case study showed that CatBoost outperformed SVR both in case the models were optimized or not; the introduction of optimization algorithms significantly improves the results of machine learning models, with GWO being slightly more effective than PSO. However, optimization cannot drastically alter the results of the model, highlighting the importance of setting up of a rigorous susceptibility model since the early steps of any research.
... Monitoring and analyzing rainfall distribution is therefore highly relevant to sound water management, especially given the changes observed in climate behavior, as mentioned earlier. Accordingly, a growing number of studies investigate the spatiotemporal behavior of rainfall and seek to understand possible trends (Borges et al., 2021; Furtunato et al., 2017; Furtunato et al., 2019; Guptsa et al., 2017; Helmi et al., 2023; Jaman & Adhikary, 2020; Lucas et al., 2022; Sousa et al., 2023). ...
Article
Full-text available
The spatial and temporal variability of rainfall can generate water imbalances, which can cause social and economic impacts. For this reason, the study of precipitation variability is extremely important for several fields of knowledge, especially in large river basins such as that of the Tietê River. One of the factors that can influence the spatial and temporal variability of precipitation is the occurrence of El Niño–Southern Oscillation (ENSO) events, which alter the temperature of the Pacific Ocean. This work aims to study the spatiotemporal variability of rainfall in the Tietê River basin between 1988 and 2017 and to evaluate these variations against the ENSO events of the period. From public hydrological databases, 560 rain gauge stations within the area of interest were obtained and subjected to descriptive statistical and geostatistical analysis. Maps were then generated using the kriging technique. The results showed that the basin has higher precipitation in its southeastern portion (near the coast) and lower precipitation in the west, and that the sub-basins do not necessarily exhibit analogous hydrological behavior. The temporal variability of rainfall across the basin was analyzed, with the highest rainfall volumes observed in 1991, 1995, 2009, and 2015 and the lowest in 1994, 2002, 2003, and 2014. Regarding the influence of ENSO events on rainfall in the basin, the data did not reveal a direct influence on the region as a whole or on the sub-basins.
... Deterministic interpolation techniques create surfaces based on similarity, whereas stochastic methods consider spatial autocorrelation among measured points [24]. These methods are essential for estimating values at unobserved locations using known values from sampled locations [25]. The predictions of these methods are influenced by various factors, making it challenging to select the most appropriate input data method [26]. ...
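To make "spatial autocorrelation among measured points" concrete, the following is a minimal sketch of an empirical semivariogram, the quantity that stochastic (kriging-type) methods model before predicting. It assumes samples are `(x, y, value)` tuples in planar coordinates and uses the classical estimator, half the mean squared difference of all pairs in each distance bin. The function name and the binning scheme are illustrative assumptions.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, lag_width):
    """Map lag-bin index b (distances in [b*lag_width, (b+1)*lag_width)) to semivariance."""
    sums = defaultdict(float)   # sum of squared value differences per bin
    counts = defaultdict(int)   # number of point pairs per bin
    for i in range(len(samples) - 1):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            b = int(h // lag_width)
            sums[b] += (zi - zj) ** 2
            counts[b] += 1
    # Classical estimator: gamma(h) = sum of squared differences / (2 * number of pairs)
    return {b: sums[b] / (2 * counts[b]) for b in sorted(counts)}
```

A semivariance that grows with lag distance indicates that nearby points are more alike than distant ones, which is the structure a fitted semivariogram model would then describe.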
Article
Full-text available
Earthquake hazard mapping assesses and visualizes seismic hazards in a region using data from specific points. Conducting a seismic hazard analysis for each point is essential, while continuous assessment for all points is impractical. The practical approach involves identifying hazards at specific points and utilizing interpolation for the rest. This method considers grid point spacing and chooses the right interpolation technique for estimating hazards at other points. This article examines different point distances and interpolation methods through a case study. To gauge accuracy, it tests 15 point distances and employs two interpolation methods, inverse distance weighted and ordinary kriging. Point distances are chosen as a percentage of longitude and latitude, ranging from 0.02 to 0.3. A baseline distance of 0.02 is set, and other distances and interpolation methods are compared with it. Five statistical indicators assess the methods. Ordinary kriging interpolation shows greater accuracy. With error rates and hazard map similarities in mind, a distance of 0.14 points seems optimal, balancing computational time and accuracy needs. Based on the research findings, this approach offers a cost-effective method for creating seismic hazard maps. It enables informed risk assessments for structures spanning various geographic areas, like linear infrastructures.
Preprint
Full-text available
Groundwater is a commodity we depend on for diverse needs, and maintaining its quality must be considered vital. We considered Machine Learning (ML) operations and Explainable Artificial Intelligence (XAI) to predict the nitrate concentration levels in the groundwater of India for the years 2019 and 2023. We prepared GIS surface maps using interpolation supported by the Empirical Bayesian Kriging method. We investigated the model efficiency and feature importance in the presence and absence of location attributes. We considered 19 ML models and filtered the Light Gradient Boosting Machine (LightGBM) and Linear Regression (LR) models, which exhibited relatively better accuracy. We first trained these models and fed them to XAI via SHAP (SHapley Additive exPlanations), which is based on game theory. We obtained a 28.23% and 24.88% increase in accuracy when comparing the 2019 and 2023 datasets with location attributes, respectively. We also observed a 28.3% increase in accuracy when the 2023 dataset without a location attribute was used. We conclude that ML can be integrated with XAI to improve the accuracy of the prediction of nitrate in groundwater studies. Novelty statement: The works cited did not consider, or only partially considered, the role of XAI integrated with ML in enhancing prediction capabilities. We made a wider search using Google Scholar and Google search to check whether a work like this had already been published. Our search did not yield any results, which suggests that this work may be novel, considering the intricacies presented here. We first considered machine learning frameworks to identify an appropriate model and train it. The saved ML model was used as a starting point in the XAI framework (SHAP). We observed that there was an increase in the accuracy metric when the trained ML model was passed onto XAI.
Article
Full-text available
Dental and skeletal fluorosis caused by consuming high-fluoride groundwater has been reported globally over several decades. Prediction maps that estimate the fluoride-contaminated area rely on interpolation methods. This study presents a comparison of the accuracy of nine spatial interpolation methods in predicting fluoride in groundwater. Leave-one-out cross-validation (LOOCV), hold-out validation, and validation with an independent dataset were used to assess the precision of the interpolation methods. This is the first study on fluoride with a large dataset (N = 13,585) applied at the regional level in India. Our findings showed that the inverse distance weighted (IDW) algorithm outperformed the other methods, with less discrepancy between measured and predicted fluoride. IDW and local polynomial interpolation (LPI) were the only methods to predict contaminated areas (fluoride > 1.5 mg/L). However, the area estimated by the typical assessment of the percentage of unsuitable samples was much higher (6.1%) than that estimated by IDW (0.2%) and LPI (0.2%). LOOCV provided more viable results than the other two validation methods. Interpolation methods carry uncertainty, which is governed by the sample size, sample density, sample distribution, minimum and maximum measured concentrations, smoothing, and border effects. Comparing a variety of interpolation methods that capture a wide range of prediction uncertainty is suggested rather than relying on one method exclusively. The high-fluoride areas identified in this study can be used by the Government in planning remediation actions.
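As a rough illustration of how IDW and leave-one-out cross-validation fit together, the sketch below predicts each sample from all the others and summarizes the errors as an RMSE. It is not the study's implementation: the `(x, y, value)` tuple layout, the default power of 2, and the use of all remaining points as neighbors are assumptions made for brevity.

```python
import math


def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (x, y, value) samples."""
    num = den = 0.0
    for sx, sy, sv in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sv  # exact hit on a sample location
        w = 1.0 / d ** power
        num += w * sv
        den += w
    return num / den


def loocv_rmse(samples, power=2.0):
    """Leave-one-out RMSE: predict each sample from all remaining samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    squared_errors = []
    for i, (x, y, v) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        squared_errors.append((idw_predict(x, y, rest, power) - v) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Running `loocv_rmse` for several values of `power` (or for different interpolators) and comparing the resulting errors is one simple way to compare methods in the spirit of the abstract.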
Article
Full-text available
The aim of this paper was to present a precipitation trend analysis using gridded data at annual, seasonal and monthly time scales over the Fez-Meknes region (northern Morocco) for the period 1961–2019. Our results showed a general decreasing trend at an annual scale, especially over the mountain and the wetter parts of the region, which was statistically significant in 72% of the grid points, ranging down to −30 mm per decade. A general upward trend during autumn, but still non-significant in 95% of the grid points, was detected, while during winter, significant negative trends were observed in the southwest (−10 to −20 mm per decade) and northeast areas (more than −20 mm per decade) of the region. Spring rainfall significantly decreased in 86% of the grid points, with values of this trend ranging between 0 and −5 mm per decade in the upper Moulouya and −5 to −10 mm per decade over the rest of the region (except the northwest). At a monthly time scale, significant negative trends were recorded during December, February, March and April, primarily over the northeast Middle Atlas and the northwest tip of the region, while a significant upward trend was observed during the month of August, especially in the Middle Atlas. These results could help decision makers understand rainfall variability within the region and work out proper plans while taking into account the effects of climate change.
Article
Full-text available
Rainfall depth is a crucial parameter in water resources and hydrological studies. Rain gauges provide the most reliable point-based rainfall estimates. However, they do not have a proper density/distribution to provide sufficient rainfall measurements in many areas, especially in arid regions. To evaluate the adequacy of satellite datasets as an alternative to the rain gauges, the Kingdom of Saudi Arabia (KSA) is selected for the current study as a representative of the arid regions. KSA occupies most of the Arabian Peninsula and is characterized by high variability in topographic and climatic conditions. Five satellite precipitation datasets (SPDSs)—CMORPH, PERSIANN-CDR, CHIRPS V2.0, TMPA 3B42 V7, and GPM IMERG V6—are evaluated versus 324 conventional rain-gauges’ daily precipitation measures. The evaluation is conducted based on nine quantitative and categorical metrics. The evaluation analysis is carried out for daily, monthly, yearly, and maximum yearly records. The daily analysis revealed a low correlation for all SPDSs (<0.31), slightly improved in the yearly and maximum yearly analysis and reached its highest value (0.58) in the monthly analysis. The GPM IMERG V6 and PERSIANN-CDR have the highest probability of detection (0.55) but with a high false alarm ratio (>0.8). Accordingly, in arid regions, the use of daily SPDSs in rainfall estimation will lead to high uncertainty in the obtained results. The best performance for all statistical metrics was found at 500–750 m altitudes in the central and northern parts of the study area for all satellites except minor anomalies. CMORPH dataset has the lowest centered root mean square error (RMSEc) for all analysis periods with the best results in the monthly analyses.
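The two categorical metrics quoted above can be sketched from a contingency count of rain/no-rain days. The snippet assumes the standard definitions, POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms). The 1.0 mm rain/no-rain threshold and the function name `pod_far` are illustrative assumptions, not values taken from the study.

```python
def pod_far(satellite, gauge, threshold=1.0):
    """Probability of detection and false alarm ratio for paired daily rainfall."""
    hits = misses = false_alarms = 0
    for s, g in zip(satellite, gauge):
        sat_rain = s >= threshold      # satellite reports rain (assumed threshold)
        gauge_rain = g >= threshold    # gauge reports rain (assumed threshold)
        if sat_rain and gauge_rain:
            hits += 1
        elif gauge_rain:
            misses += 1
        elif sat_rain:
            false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far
```

A product can score well on POD while still having a high FAR, which is the combination the abstract reports for GPM IMERG V6 and PERSIANN-CDR.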
Article
Full-text available
Precipitation monitoring is important for earth system modeling and environmental management. Low spatial representativeness limits gauge measurements of rainfall and low spatial resolution limits satellite-derived rainfall. SM2RAIN-based products, which exploit the inversion of the water balance equation to derive rainfall from soil moisture (SM) observations, can be an alternative. However, the quality of SM data limits the accuracy of rainfall. The goal of this work was to improve the accuracy of rainfall estimation through merging multiple soil moisture (SM) datasets. This study proposed an integration framework, which consists of multiple machine learning methods, to use satellite and ground-based soil moisture observations to derive a precipitation product. First, three machine learning (ML) methods (random forest (RF), long short-term memory (LSTM), and convolutional neural network (CNN)) were used, respectively to generate three SM datasets (RF-SM, LSTM-SM, and CNN-SM) by merging satellite (SMOS, SMAP, and ASCAT) and ground-based SM observations. Then, these SM datasets were merged using the Bayesian model averaging method and validated by wireless sensor network (WSN) observations. Finally, the merged SM data were used to produce a rainfall dataset (SM2R) using SM2RAIN. The SM2R dataset was validated using automatic meteorological station (AMS) rainfall observations recorded throughout the Upper Heihe River Basin (China) during 2014–2015 and compared with other rainfall datasets. Our results revealed that the quality of the SM2R data outperforms that of GPM-SM2RAIN, Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), ERA5-Land (ERA5) and multi-source weighted-ensemble Precipitation (MSWEP). Triple-collocation analysis revealed that SM2R outperformed China Meteorological Data and the China Meteorological Forcing Dataset. 
Ultimately, the SM2R rainfall product was considered successful with acceptably low spatiotemporal errors (RMSE = 3.5 mm, R = 0.59, and bias = −1.6 mm).
Article
Full-text available
Understanding trends and variability of precipitation is essential to improve water resources utilization as well as agricultural activities. This study aims to investigate the spatiotemporal trends and variability of rainfall in the Modjo watershed, central Ethiopia. The Mann-Kendall (M–K) trend test, innovative trend analysis (ITA), and Sen's slope estimator were used to determine temporal trends, while the inverse distance weighted interpolation technique was adopted to visualize the spatial trends in the time series. The results showed complex patterns of rainfall variability over the study watershed, ranging from 16 to 59%, 18 to 63%, and 50 to 90% for the annual, summer, and spring seasons, respectively. The results also indicated that a significant trend (p < 0.05) in annual rainfall was detected in only 28.6% and 42.9% of the stations under the M–K test and the ITA method, respectively, which indicates that the ITA method displays relatively more significant trends than the M–K test. At the seasonal scale, positive trends were more dominant in the summer season (Z > 0, SITA > 0), whereas negative trends (Z < 0, SITA < 0) were detected in the spring season. Comparatively, the ITA method is found to be robust and allows more detailed trend analysis results using graphical illustrations for extreme events. The study concludes that the increasing and decreasing trends in summer and spring rainfall patterns could have implications leading to an increase in extreme events and lower agricultural productivity, respectively. The results suggest the need for planning effective adaptation strategies at the regional and local scales.
Highlights: A comparative trend analysis approach was followed using the classical M–K test and the ITA method. The spatiotemporal trend is analyzed at station level, which is not a common approach. The study indicates that the ITA method displays more significant trends than the M–K test. Increasing extreme events and lower agricultural productivity were identified as the major future implications. Effective adaptation strategies are needed in such a rain-fed agricultural watershed.
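For readers unfamiliar with the two classical tools named above, here is a minimal sketch of the Mann-Kendall Z statistic and Sen's slope for an evenly spaced series. It assumes no correction for tied values in the variance and one observation per time step. The function names are illustrative, and the ITA method is not covered.

```python
import math
from statistics import median


def mann_kendall_z(series):
    """Mann-Kendall Z statistic (variance without tie correction, an assumption)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    # S counts later-minus-earlier sign agreements over all pairs
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sens_slope(series):
    """Sen's slope: median of all pairwise slopes, in value units per time step."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    slopes = [
        (series[j] - series[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)
```

Under the usual two-sided test, |Z| > 1.96 corresponds to significance at p < 0.05, and the sign of Z (or of Sen's slope) gives the direction of the trend.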
Article
Full-text available
The Makkah region is one of the most flash flood-prone areas of Saudi Arabia due to terrain characteristics and synoptic-scale weather conditions that intensify through interaction with the local topography, causing highly convective, short-lived rainfall events, although these conditions are quite infrequent. Most of these events last for less than two hours. This study aims to assess the performance of five satellite precipitation products over a 1725 km² sparsely gauged, arid basin. A fully distributed, physically based hydrologic model was forced by the five satellite precipitation products, and the evaluation included the hydrographs and runoff maps predicted by the model. Moreover, the propagation of the satellite rainfall errors into runoff predictions was quantified. Large variations and significant biases were found in the satellite precipitation estimates compared to the available ground rainfall measurements. The Early IMERG product showed the best agreement with the reported total rainfall accumulations, followed by Late IMERG, while the other products significantly underestimated precipitation accumulations. Comparison with estimated runoff peaks showed that the Early IMERG product has the lowest errors in runoff peaks. Therefore, the hydrographs produced by the Early IMERG product were used as a reference to quantify the propagation of satellite precipitation errors into runoff predictions over the Makkah watershed. The results clearly indicated that both systematic and random rainfall errors were significantly amplified in runoff predictions.
Article
Full-text available
Rainfall plays an important role in agricultural production. It has a profound influence on the growth, development, and yield of a crop; on the incidence of pests and diseases; on water needs and fertilizer requirements, through differences in nutrient mobilization due to water stress; and on the timeliness and effectiveness of prophylactic and cultural operations on crops. Occurrences of erratic rainfall are beyond human control. However, it is possible to adapt to or mitigate the adverse effects if a rainfall forecast is available in time. Accurate information on rainfall is essential for the planning and management of agricultural operations. Nevertheless, rainfall is one of the most complex and difficult elements of the hydrological cycle to understand and to model, due to the complexity of the atmospheric processes that generate rainfall and the tremendous range of variation over a wide range of scales in both space and time. Thus, accurate rainfall forecasting is one of the greatest challenges in operational hydrology. An attempt has been made here to develop a final long-range forecast of the seasonal monsoon, using a multiple regression technique to predict monsoon rainfall from 16 parameters related to Anand (Gujarat). The 16 parameters for Anand for the last 25 years (1980-2005) were used for development of the model. The operational forecast was taken as the mean of the two models, which has less error than either model alone (average error of 3.5% for MRM-I and 5.1% for MRM-II, respectively): 0.6% for all data (1980-2009) and 2.9% for independent data (2006 to 2009), so it can be used for giving the final forecast.
Article
Full-text available
As precipitation is a fundamental component of the global hydrological cycle that governs water resource distribution, the understanding of its temporal and spatial behavior is of great interest, and exact estimates of it are crucial in multiple lines of research. Meteorological data provide input for hydroclimatic models and predictions, which generally lack complete series. Many studies have addressed techniques to fill gaps in precipitation series at annual and monthly scales, but few have provided results at a daily scale due to the complexity of orographic characteristics and in some cases the non-linearity of precipitation. The objective of this study was to assess different methods of filling gaps in daily precipitation data using regression model (RM) and machine learning (ML) techniques. RM included linear regression (LRM) and multiple regression (MRM) algorithms, while ML included multiple regression algorithms (ML-MRM), K-nearest neighbors (ML-KNN), gradient boosting trees (ML-GBT), and random forest (ML-RF). This study covered the Malas, Omas, and Cañete River (MOC) watersheds, which are located on the Pacific Slope of central Peru, and a nineteen-year period of records (2001–2019). To assess model performance, different statistical metrics were applied. The results showed that the optimized machine learning (OML) models presented the least variability in estimation errors and the best approximation of the actual data from the study zone. In addition, this investigation shows that ML interprets and analyzes non-linear relationships between rain gauges at a daily scale and can be used as an efficient method of filling gaps in daily precipitation series.
Book
This text fulfils a need for an advanced-level work covering both the theory and application of geostatistics. It covers the most important areas of geostatistical methodology, introducing tools for description, quantitative modelling of spatial continuity, spatial prediction, and assessment of local uncertainty and stochastic simulation. It also details the theoretical background underlying most GSLIB programs. The tools are applied to an environmental data set, but the book includes a general presentation of algorithms intended for students and practitioners in such diverse fields as soil science, mining, petroleum, remote sensing, hydrogeology, and the environmental sciences.
Article
The representation of rainfall in space is important for hydrological modelling. Accurate estimation of rainfall is particularly challenging in mountainous regions where observations are often sparse relative to the spatial variability of rainfall. In these regions, orographic processes lead to complex patterns of rainfall enhancement and rain shadow depletion. This study tests Natural Neighbour Interpolation (NNI), ordinary kriging (OK) and ordinary cokriging (CK), to determine if CK improves rainfall interpolation during three extreme rainfall events. Three different elevation indices were considered as secondary variables for CK. Preliminary analysis using long‐term annual average rainfall totals, including additional high elevation rainfall observations, showed that CK with an effective elevation index (a directionally smoothed elevation, corrected for degree of ‘orographic processing’ and shifted to account for ‘wind‐drift’ of rainfall) as a secondary variable performed better than NNI and OK with an overall improvement of around 40%. Using rainfall totals for long‐term wind direction and wind speed rainfall classes, CK performance was variable but provided an improvement of approximately 15% for wind direction classes without an easterly wind component. For 15‐minute timesteps during extreme rainfall events, there were comparatively small differences (cross‐validation using RMSE) between interpolation methods, partly attributed to having only relatively low elevation rainfall observations, providing weak constraint. Using cross‐validation and mean bias did, however, show an improvement for both high and low elevation observation classes. Importantly, cross‐variogram estimation provided differing cross‐validation results when estimated for different rainfall accumulation periods: 15‐minutes, hourly, daily and long‐term. 
Variograms and cross variograms estimated at a 15‐minute timestep frequency were robust for many timesteps, but were difficult to fit automatically for others. Variograms estimated from longer periods were more reliably estimated, but tended to have lower variance and cross‐variance and longer correlation ranges producing a smoother interpolated rainfall field. Given the weak cross‐validation constraint, care must be taken in identifying the most appropriate method and variogram estimation period.