The study of environmental and human factors affecting aquifer depth changes using tree algorithm

Authors:
  • hassan.mirhashemi@yahoo.com
  • College of agricultural and natural resources

International Journal of Environmental Science and Technology
https://doi.org/10.1007/s13762-019-02504-2
ORIGINAL PAPER
S.H.Mirhashemi1· P.Haghighatjou1· F.Mirzaei2· M.Panahi3
Received: 20 March 2019 / Revised: 26 June 2019 / Accepted: 3 August 2019
© Islamic Azad University (IAU) 2019
Abstract
In recent years, water resources have received more attention because of agricultural development and the need for proper planning and better management of aquifers. Because many factors influence aquifer depth changes, this study examined the human and environmental factors affecting aquifer depth changes in the Qazvin plain. The Classification And Regression Tree (CART) algorithm was adopted to investigate and predict changes in aquifer depth. According to the results, the highest probability of an aquifer drop, 86.5%, was observed in July, August and September, and the highest probability of an aquifer rise, 71.2%, occurred in December, January, February and March. According to the sensitivity analysis of the CART algorithm, the most important human and environmental factors affecting the magnitude of aquifer depth changes in the Qazvin plain were groundwater withdrawal from agricultural abstraction wells and air temperature, respectively. Therefore, predicting the amount of aquifer depth change with the CART model makes it possible to plan and manage groundwater resources.
Keywords Air temperature · Aquifer management · Aquifer drop · Qazvin plain
Introduction
Groundwater is an essential natural resource that plays an important role in water supply. Finding and predicting the spatial distribution of potential groundwater locations is therefore an important topic for private, governmental and research institutes (Nampak et al. 2014a, b). However, over-discharge of groundwater, together with low recharge due to low rainfall and aridity, leads to a decline in groundwater level and, as a result, to a shortage of freshwater (Li et al. 2013). All over the world, groundwater is an important source of freshwater, especially in areas where surface water is scarce (Moreaux and Reynaud 2006). Methods for modeling the groundwater surface mainly focus on physically based numerical simulation models (Feng et al. 2011; Praveena et al. 2012; Li et al. 2013) and on wavelet linear regression (WLR), artificial neural network (ANN) and dynamic autoregressive (DAR) models (Adamowski and Chan 2011; Taormina et al. 2012; Maheswaran and Khosa 2013). In addition, some scientists have used different methods for assessing the risk of groundwater exploration; however, these studies emphasize the assessment of groundwater pollution risk (Wang et al. 2012; Sener and Davraz 2013). Studies on risk assessment methods for groundwater levels are rare. For example, Reddy and Ganguli (2012) presented a bivariate-copula-based approach for assessing the risk that hydroclimatic variability poses to groundwater levels in an unconfined aquifer in the Manjara watershed in India. Dong et al. (2013) divided groundwater in Tianjin into seven task areas and applied a comprehensive evaluation model of groundwater risk to forecast the level of the third aquifer in 2015, 2020 and 2030.
Liu et al. (2018) showed that a Gaussian process regression (GPR) model performed better than partial least squares, back-propagation artificial neural networks and least squares support vector regression (LSSVR). A data mining classification technique has also been used for landslide susceptibility mapping. The decision tree is a popular classification algorithm, although it had hardly been used previously to analyze landslide susceptibility, because the results assume a uniform class distribution whereas landslide spatial event data are sometimes highly class-imbalanced (Yeon et al. 2010). The classification tree technique was applied to develop a prediction model of groundwater arsenic contamination risk, and the model proved very helpful in prediction (Hossain and Piantanakulchai 2013). Decision trees were relatively more accurate than neural networks and support vector machines, but they produced more rule nodes than desired; adjusting the minimum support led to a more tractable rule set (Olson et al. 2012). The decision tree, one of the most popular classification techniques, is applied in the process of data mining. It is represented as a tree that summarizes the classification method, and it is used to predict the membership of items in different classes. The ease with which this technique can be configured makes it more usable than other data mining methods. The decision tree methodology consists of two main phases. (A) Constructing the primary tree: using the training data, the tree is grown until each leaf becomes homogeneous. (B) Pruning: in this phase, the grown tree is pruned on the basis of the test data to increase the accuracy of the model (Cichosz 2015). Lee and Lee (2015) investigated groundwater productivity potential using the decision tree technique and a geographical information system in the cities of Boryeong and Pohang, Korea. Their results showed that decision tree models can be useful for the development and study of groundwater resources. Data mining and machine learning techniques are two effective tools for studying the similarity of watersheds from a hydrological viewpoint (Di Prinzio et al. 2011; Ley et al. 2011; Toth 2013).

Communicated by Parveen Fatemeh Rupani.
* P. Haghighatjou, phjou40@gmail.com
1 Department of Water Engineering, Faculty of Water and Soil, University of Zabol, Zabol, Iran
2 Department of Irrigation and Drainage, Faculty of Agriculture and Natural Resources, University of Tehran, Tehran, Iran
3 Department of Water Engineering, Faculty of Agriculture, University of Zanjan, Zanjan, Iran

CART is a powerful prediction tool that gives accurate results. The Chi-square Automatic Interaction Detector (CHAID) algorithm, developed from a method called AID, uses the chi-squared test as its tree-splitting strategy, which is why it is called chi-squared AID. This algorithm supports both continuous and discrete input variables and can perform regression and classification on the target variable (Bozkir and Sezer 2011). Decision trees also support groundwater managers and decision makers in implementing programs to conserve groundwater resources (Stumpp et al. 2016).
Tien Bui et al. (2018) also assessed the likelihood of land subsidence in South Korea using four machine learning models: support vector machine (SVM), logistic model tree (LMT), Bayesian logistic regression (BLR) and alternating decision tree (ADTree). They found the BLR model to be more precise than the other methods, although the others also showed reasonable precision.
Most of Iran’s plains are arid and face water scarcity, so managing water resources in these plains is very important. In this research, the effects of various factors on aquifer depth changes were therefore studied with the aim of better aquifer management. The questions addressed are: which human and environmental factors affect aquifer depth changes, and how do the depth changes vary across the months of the year? The results of the CART algorithm were compared with those of the CHAID, SVM, Reduced Error Pruning Tree (REPTree) and neural net algorithms, and CART performed best among them.
Materials and methods
Study area
The Qazvin plain, with an area of 440 thousand hectares, is located on the central plateau of Iran. Its climate is semi-arid, with hot summers and cold winters. One of the most important rivers supplying water to the plain is the Taleghan River, on which the Taleghan dam was built. The dam has a capacity of 460 million m³, but its reservoir has never stored more than 210 million m³ of water because of a severe reduction in inflow (Mohammadrezapour et al. 2019). The location of the Qazvin plain is shown in Fig. 1.
The total agricultural area of the Qazvin plain is about 313,608 hectares, of which approximately 82,070 hectares lie within the irrigation network. The water entering the irrigation network of the Qazvin plain is supplied from the Taleghan dam reservoir. The agricultural area of the plain contains about 1900 agricultural wells with an average flow rate of about 30 L/s. To determine the volume of agricultural water demand, the water requirement of each agricultural product grown in the agricultural area of the plain was multiplied by its cultivated area. The volume of precipitation was obtained by multiplying the precipitation depth by the area over which it fell.
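The two volume calculations described above both multiply a water depth by an area. A minimal Python sketch follows, assuming depths are given in millimeters and areas in hectares (the units and the function names are illustrative assumptions; only the 440 thousand hectare plain area comes from the text, and the depths and crop areas are hypothetical).

```python
def volume_mcm(depth_mm, area_ha):
    """Volume in million cubic meters of a water depth (mm) spread over an area (ha)."""
    # 1 mm over 1 ha = 0.001 m x 10,000 m^2 = 10 m^3
    return depth_mm * area_ha * 10.0 / 1e6


def agricultural_demand_mcm(crops):
    """Sum of requirement x cultivated area over (requirement_mm, area_ha) pairs."""
    return sum(volume_mcm(req_mm, area_ha) for req_mm, area_ha in crops)


# Hypothetical monthly values, for illustration only
precip = volume_mcm(depth_mm=20.0, area_ha=440_000)  # plain area taken from the text
demand = agricultural_demand_mcm([(120.0, 50_000), (80.0, 30_000)])
print(precip, demand)
```

Expressing both quantities in million cubic meters keeps them directly comparable with the well discharge and irrigation network volumes used as model inputs.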
Fig. 1 Location of the Qazvin plain in Iran

The monthly potential evapotranspiration of the Qazvin plain was calculated with the Penman–Monteith formula. The FAO Penman–Monteith method is currently recommended as the sole standard method for defining and calculating the reference evapotranspiration, following the expert consultation held in May 1990 (Allen et al. 1998) (Eq. 1):

$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \tag{1}$$

where ETo is the reference evapotranspiration (mm day⁻¹), T the mean daily air temperature at 2 m height (°C), Rn the net radiation at the crop surface (MJ m⁻² day⁻¹), u2 the wind speed at 2 m height (m s⁻¹), es the saturation vapor pressure (kPa), ea the actual vapor pressure (kPa), es − ea the saturation vapor pressure deficit (kPa), ∆ the slope of the vapor pressure curve (kPa °C⁻¹), γ the psychrometric constant (kPa °C⁻¹) and G the soil heat flux density (MJ m⁻² day⁻¹).
The data used in this study are monthly records for the Qazvin plain from 2001 to 2015. The human factors were the volume of water discharged from agricultural wells (million cubic meters), the volume of water delivered to the irrigation network (million cubic meters) and the volume of agricultural water demand (million cubic meters); the environmental factors were the volume of precipitation (million cubic meters), air temperature (°C), air humidity (percent) and potential evapotranspiration (millimeters per day). The aquifer depth fluctuations were supplied to the model as target data.
Figure 2a, b shows the aquifer depth changes (fluctuations) over the 15-year period. Negative changes correspond to a fall in aquifer depth and positive changes to a rise. As shown in the figure, the largest fall and the largest rise of the aquifer occurred in August and April, respectively.

Fig. 2 a, b The aquifer depth changes in the 15-year period

Data mining algorithms
The CART decision tree algorithm is a method for constructing prediction models from data. The algorithm partitions its input data recursively and can handle categorical predictor and target variables (Loh 2011).
CART stands for Classification And Regression Trees (Breiman et al. 1984). It is characterized by the fact that it builds binary trees, that is, each internal node has exactly two outgoing edges. The splits are selected using the twoing criterion, and the resulting tree is pruned by cost–complexity pruning. When they are provided, CART can take misclassification costs into account during tree induction. It also allows users to supply a prior probability distribution. An important feature of CART is its ability to produce regression trees, whose leaves predict a real number rather than a class. For regression, CART searches for splits that minimize the squared prediction error, and the prediction in each leaf is based on the weighted mean of the node (El Seddawy et al. 2013). The advantages and disadvantages of this method are as follows (Timofeev 2004):
Advantages CART can easily handle both numerical and categorical variables. The algorithm itself identifies the most significant variables and eliminates the non-significant ones. CART can also easily handle extreme values (outliers).
Disadvantages CART may produce unstable decision trees. A minor modification of the learning sample, such as removing several observations, can radically change the decision tree by increasing or decreasing its complexity or by changing the splitting variables and values. CART splits on only one variable at a time. Another problem of the CART model is its bias in variable selection. In addition, for qualitative variables with more than two levels the results may be confusing: several levels of a variable can fall into the same node, which makes the results hard to interpret (Breiman et al. 1984).
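The regression case described above, in which CART searches for the binary split that minimizes the squared prediction error, can be illustrated with a minimal sketch for a single numeric predictor. This is not the software used in the study; the function names and the temperature and depth-change values are hypothetical, and a full CART implementation would repeat the search over all predictors, recurse on each child node and then prune.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the best binary split 'x <= threshold'.

    The threshold is None when no split lowers the squared error.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((lo + hi) / 2, total)
    return best


# Hypothetical data: air temperature (deg C) and aquifer depth change (m)
temperature = [5, 8, 12, 20, 25, 30]
depth_change = [0.6, 0.5, 0.4, -0.5, -0.6, -0.7]
threshold, error = best_split(temperature, depth_change)
print(threshold, round(error, 4))
```

Each leaf of the resulting split would predict the mean depth change of the observations that fall into it, which is why a single split on temperature already separates the rising months from the falling months in this toy data set.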
The CHAID algorithm finds the differences between samples and then generates the tree; the tree is pruned by finding similar differences (Chattamvelli 2011). The SVM method is a supervised nonparametric statistical approach based on the assumption that no information is available about the data distribution. Its main characteristics are the ability to work with less training data and a higher accuracy than the methods mentioned above (Mantero et al. 2005; Mountrakis et al. 2011). Neural networks have a layered structure in which each layer consists of several nodes (neurons); processing begins with the data input and ends with the output (Richards 1999). The REPTree algorithm is a fast decision tree learner that builds a decision/regression tree using information gain/variance and prunes it by reduced error pruning. The algorithm sorts the values of numeric attributes only once (Daud and Corne 2007).

Table 1 Model denomination in data classification
Model name   Data range (m)      Data average (m)   Percentage of data
A            ··· < −1            −1.66              10.23
B            −1 ≤ ··· < −0.3     −0.56              25.79
C            −0.3 ≤ ··· < 0      −0.15              24.14
D            0 < ··· < 0.3       0.15               17.66
E            0.3 ≤ ··· < 1       0.55               16.78
F            1 ≤ ···             1.73               5.4

Classification of aquifer depth change data
To make the results more usable for managers and to reduce the effects of errors, the values of the groundwater depth fluctuations were categorized. Categorization reduces the effect of reading errors smaller than 1 m and also mitigates the problem of outliers. The purpose of the method is to place the data in categories according to rules while avoiding categories that are very small. Categorization is a process that converts continuous attributes into discrete ones. In this study, the data on aquifer depth fluctuations were divided into six classes, referred to as models, and the boundaries between the discrete values were chosen so that each model is sufficiently represented in the data sample. After the complete data set was categorized, each range was given a model name from A to F (Table 1), such that the groundwater drop decreases from model A to model C and the groundwater rise increases from model D to model F. A negative sign in the data range denotes aquifer drawdown.
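The categorization can be written as a small lookup over the class boundaries of Table 1. The sketch below is illustrative only: the function name is hypothetical, and returning no class for a change of exactly zero is an assumption, since the printed ranges for models C and D both exclude zero.

```python
def classify_change(delta_m):
    """Map a monthly aquifer depth change (m) to a Table 1 model name."""
    if delta_m < -1:
        return "A"   # drop of more than 1 m
    if delta_m < -0.3:
        return "B"   # drop of 0.3 to 1 m
    if delta_m < 0:
        return "C"   # drop of less than 0.3 m
    if delta_m == 0:
        return None  # assumption: Table 1 assigns no model to exactly zero
    if delta_m < 0.3:
        return "D"   # rise of less than 0.3 m
    if delta_m < 1:
        return "E"   # rise of 0.3 to 1 m
    return "F"       # rise of 1 m or more


# The class averages listed in Table 1 fall in their own classes
print([classify_change(v) for v in (-1.66, -0.56, -0.15, 0.15, 0.55, 1.73)])
```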
As shown in Table 1, the highest and lowest percentages of aquifer depth changes belong to model B (a drop of 0.3 to 1 m) and model F (a rise of more than 1 m), with 25.8% and 5.4% of the data, respectively. The discrete values of the aquifer depth changes were introduced into the CART tree algorithm as the target function. When the target function is discrete, the software reports, for each branch of the tree, the frequency of each range in addition to the primary rules. In this research, the frequency of each model in different months and some of the related primary rules were used.
To validate the five algorithms, the data were divided into two parts, training data and test data. Each model was built on the training data and then evaluated on the test data. The accuracy of a model is the percentage of test samples whose target class the model identifies correctly (Gupta 2011). For all models, 70% of the data were randomly selected as training data and the remaining 30% were used as test data. The statistical indices of Eqs. 2–7 were used to evaluate the results of the CART, CHAID, SVM, REPTree and neural net algorithms.

Table 2 Evaluation of the results of the models by statistical indicators
Statistical indicator   CART   Neural net   CHAID   SVM    REPTree
TPR                     0.62   0.30         0.28    0.30   0.22
TNR                     0.78   0.62         0.52    0.53   0.41
PPV                     0.45   0.40         0.35    0.37   0.35
ACC                     0.75   0.71         0.67    0.74   0.73
FM                      0.30   0.25         0.13    0.25   0.14
GM                      0.69   0.43         0.38    0.39   0.30

Fig. 3 Tree graph derived from the CART algorithm

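A random 70/30 partition of the kind described above can be sketched as follows. The function name, the fixed seed and the 100 placeholder records are assumptions made for illustration; the text does not state how the random selection was implemented.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical record identifiers standing in for the monthly observations
train, test = train_test_split(range(100))
print(len(train), len(test))
```

Using the same split for all five algorithms would keep their evaluation indices comparable, because every model is then scored on identical test samples.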
To evaluate the models and select the best one, the true positive rate (TPR), true negative rate (TNR), accuracy (ACC), positive predictive value or precision (PPV), F-measure (FM) and geometric mean (GM) were used. These indices are defined by Eqs. 2–7.
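$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{2}$$
$$\mathrm{TNR} = \frac{TN}{FP + TN} \tag{3}$$
$$\mathrm{ACC} = \frac{TP + TN}{TP + FN + FP + TN} \tag{4}$$
$$\mathrm{FM} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{5}$$
$$\mathrm{GM} = \sqrt{\mathrm{TPR} \times \mathrm{TNR}} \tag{6}$$
$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{7}$$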
Table 3 The frequencies of models in December, January, February and March
Model name   Model frequency   Percentage of model frequency
A            57                2.2
B            219               8.3
C            480               18.3
D            779               29.6
E            837               31.9
F            255               9.7

Table 4 Frequency probability of models in April and November
Model name   Model frequency   Percentage of model frequency
A            93                7.1
B            234               17.9
C            342               26.3
D            287               22.2
E            257               19.7
F            89                6.8

Table 5 Frequency probability of models in July, August, and September
Model name   Model frequency   Percentage of model frequency
A            167               15.3
B            482               44.0
C            298               27.2
D            79                7.2
E            50                4.6
F            19                1.7

Table 6 Initial rules for the dominant state of model B in July, August and September
Initial   Month name in [“August” “July” “June” “May” “October” “September”]
1         The temperature of air > 14
2         Potential evapotranspiration > 3.24
3         Percent of air humidity 45.5
4         The volume of agricultural water demand > 0.03
5         The volume of water entering irrigation network > 0.07

Table 7 Frequency probability of models in May, June, and October
Model name   Model frequency   Percentage of model frequency
A            118               9.28
B            436               28.27
C            440               40.59
D            139               10.93
E            101               7.94
F            38                2.99

Table 8 The initial rules for the dominant state of model C in May, June and October
Initial   Month name in [“June” “May” “October”]
1         The temperature of air 19.5
2         Potential evapotranspiration ≤ 3.2
3         Percent air humidity > 45.5
4         Volume of precipitation > 0.002
5         The volume of water entering to irrigation network ≤ 0.2

In Eqs. 2–7, TP is the number of positive-label samples classified correctly, FP the number of negative-label samples incorrectly classified as positive, FN the number of positive-label samples classified incorrectly and TN the number of negative-label samples classified correctly (Han et al. 2011).
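The six indices can be computed directly from the four confusion counts. The sketch below is a hedged illustration with hypothetical counts and a hypothetical function name. Two readings are assumed: GM is taken as the square root of TPR × TNR, which is consistent with Table 2 (for CART, the square root of 0.62 × 0.78 is about 0.69), and FM is taken as the standard F-measure 2TP / (2TP + FP + FN), the harmonic mean of PPV and TPR.

```python
from math import sqrt


def evaluation_indices(tp, fp, fn, tn):
    """Return the indices of Eqs. 2-7 for one confusion matrix."""
    tpr = tp / (tp + fn)
    tnr = tn / (fp + tn)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "FM": 2 * tp / (2 * tp + fp + fn),  # standard F-measure (assumed reading)
        "GM": sqrt(tpr * tnr),              # square-root reading, matches Table 2
        "PPV": tp / (tp + fp),
    }


# Hypothetical counts chosen so that TPR = 0.62 and TNR = 0.78
for name, value in evaluation_indices(tp=62, fp=22, fn=38, tn=78).items():
    print(f"{name}: {value:.2f}")
```

With more than two classes, as in the six models A to F, such counts are usually obtained per class in a one-versus-rest manner and then averaged; how the averaging was done is not stated in the text.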
Results and discussion
Among the algorithms used, CART gave the best results, with the highest true positive rate (0.62), true negative rate (0.78), accuracy (0.75), positive predictive value or precision (0.45), F-measure (0.30) and geometric mean (0.69) (Table 2).
The CART tree diagram is shown in Fig. 3. The diagram is divided into two main branches: May, June, July, August, September and October fall in the first main branch, and November, December, January, February, March and April fall in the second. At the end of each branch, the predicted aquifer depth changes are shown. Details of the tree graph are listed in Tables 3, 4, 5, 6, 7 and 8.
The first category
In December, January, February and March
Table 3 gives the probability of occurrence of the aquifer depth changes, determined from their frequency. Models E and A, with 31.9% and 2.2%, have the highest and lowest probabilities, respectively, in December, January, February and March. According to Table 3, the total probabilities of aquifer rise and drop are 71.2% and 28.8%, respectively. Therefore, during these four months, a rise is more probable than a drop in aquifer depth.
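The rise and drop totals follow from the model frequencies by summing models D to F (rise) and A to C (drop). A short sketch using the frequencies printed in Table 3 is given below; the function name is an illustrative assumption.

```python
def rise_drop_probabilities(frequencies):
    """Return (rise %, drop %) from a mapping of model name to frequency."""
    total = sum(frequencies.values())
    drop = sum(frequencies[m] for m in "ABC") / total * 100
    rise = sum(frequencies[m] for m in "DEF") / total * 100
    return rise, drop


# Model frequencies for December, January, February and March (Table 3)
table3 = {"A": 57, "B": 219, "C": 480, "D": 779, "E": 837, "F": 255}
rise, drop = rise_drop_probabilities(table3)
print(f"rise {rise:.1f}%, drop {drop:.1f}%")  # rise 71.2%, drop 28.8%
```

The same function applies to the frequency tables of the other month groups.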
In April and November
According to the frequency table (Table 4), models C and F have the highest and lowest probabilities, 26.3% and 6.8%, respectively, in April and November. According to Table 4, the total probabilities of aquifer drop and rise are 51.3% and 48.7%, respectively. Therefore, the probabilities of drop and rise in these two months do not differ greatly.
Fig. 4 The importance of input
parameters in predicting the
amount of aquifer depth changes
The second category
In July, August and September
As can be seen in Table 5, models B and F have the highest and lowest probabilities, 44% and 1.7%, respectively, in July, August and September. According to Table 5, the total probabilities of aquifer drop and rise are 86.5% and 13.5%, respectively. Therefore, a drop is much more probable than a rise in these three months.
The conditions leading to model B were investigated using the initial rules in order to suggest ways of reducing the drop. The initial rules are specified for a state that ends in B (Table 6).
According to the initial rules in Table 6, the aquifer depth changes are affected by increases in two human factors: the volume of water entering the irrigation network and the volume of agricultural water demand. More advanced irrigation systems, with higher efficiency and lower losses, can therefore be recommended; they reduce the volume of water demanded from water resources and ultimately the volume of water entering the irrigation network. Among the environmental factors, an increase in air temperature and a reduction in air humidity increase the aquifer drop, so when the air temperature rises and the air humidity falls, measures should be taken to prevent undesirable conditions in the aquifer.
In May, June and October
According to the frequency table (Table 7), models C and F, with 40.6% and 3%, have the highest and lowest probabilities, respectively, in May, June and October. The total probabilities of aquifer drop and rise are 78.1% and 21.9%, respectively. Therefore, a drop in aquifer depth is more probable than a rise during these three months.
The conditions leading to model C were investigated using the initial rules in order to suggest ways of reducing the drop. The initial rules are determined for a state that leads to C (Table 8).
According to Table 8, the human factors affecting the amount of drop are the volume of water entering the irrigation network and the volume of agricultural water demand. Because the volume of precipitation and the volume of water entering the irrigation network increase simultaneously, it can be assumed that an increase in precipitation (on the assumption that the aquifer is in a desirable condition) leads managers and farmers to choose, wrongly, a cropping pattern with high water consumption. The resulting increase in agricultural water consumption appears as an increase in the volume of water entering the irrigation network and eventually as a drop in the aquifer.
Figure 4 shows the results of the sensitivity analysis of the CART algorithm, which ranks the importance of the human and environmental factors in predicting aquifer depth changes. The volume of water withdrawn from agricultural wells and the air temperature were identified as the most important human and environmental factors, respectively, affecting the prediction of groundwater depth changes.
Conclusion
Predicting aquifer depth changes is important for better aquifer management. In this study, a model for predicting aquifer depth changes with the CART tree algorithm was presented. Based on the results, air temperature is the environmental factor with the greatest effect on aquifer depth changes. According to the tree diagram, the maximum drop and rise of aquifer depth occur mainly in the hottest and coolest months of the year, respectively. It is therefore advisable to supervise the use of irrigation water and to avoid growing crops with high water requirements in warm months. Among the human factors, the water discharged from agricultural wells has the greatest effect on aquifer depth changes. For better aquifer management, it is thus recommended that water supplied through irrigation networks be used instead of groundwater. The use of water supplied by irrigation networks leads to efficient application of water, so irrigation networks play an important role in optimizing the use of water resources (Burt 2007).
Finally, more attention should be paid to water resources management. Aquifer depth fluctuations should be measured regularly at more appropriate intervals, and all the human factors that affect them should be properly managed.
Acknowledgements The authors would like to thank the Coordinatorship of the Scientific Research Projects of the University of Zabol.
References
Adamowski J, Chan HF (2011) A wavelet neural network conjunction
model for groundwater level forecasting. J Hydrol 407(1–4):28–40
Allen RG, Pereira LS, Raes D, Smith M (1998) Crop evapotranspi-
ration-guidelines for computing crop water requirements-FAO
irrigation and drainage paper 56. Fao, Rome, vol 300, no 9,
pp D05109
Bozkir AS, Sezer EA (2011) Predicting food demand in food courts
by decision tree approaches. Proc Comput Sci 3:759–763
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classifica-
tion and regression trees, 1st edn. Chapman and Hall/CRC,
New York
Burt CM (2007) Volumetric irrigation water pricing considerations.
Irrigat Drain Syst 21(2):133–144
Chattamvelli R (2011) Data mining algorithms, 1st edn. Alpha Sci-
ence International, Oxford, pp 274–290
Cichosz P (2015) Data mining algorithms: explained using R.
Wiley, New York
Daud MNR, Corne DW (2007) Human readable rule induction
in medical data mining: a survey of existing algorithms. In:
WSEAS European computing conference, Athens, Greece
Dong D, Sun W, Zhu Z, Xi S, Lin G (2013) Groundwater risk
assessment of the third aquifer in Tianjin city, China. Water
Resour Manag 27(8):3179–3190
El Seddawy AB, Sultan T, Khedr A (2013) Applying classifica-
tion technique using DID3 algorithm to improve decision
support under uncertain situations. Department of Business
Information System, Arab Academy for Science and Technol-
ogy and Department of Information System, Helwan Univer-
sity, Egypt. Int J Mod Eng Res 3(4):2139–2146
Feng S, Huo Z, Kang S, Tang Z, Wang F (2011) Groundwater
simulation using a numerical model under different water
resources management scenarios in an arid region of China.
Environ Earth Sci 62(5):961–971
Gupta GK (2011) Introduction to data mining with case studies,
2nd edn. Prentice Hall, Upper Saddle River, pp 526–534
Han J, Kamber M, Pei J (2011) Data mining concepts and tech-
niques, 3rd edn. Morgan Kaufmann, Burlington, pp 365–369
Hossain MM, Piantanakulchai M (2013) Groundwater arsenic
contamination risk prediction using GIS and classification
tree method. Eng Geol 156:37–45
Lee S, Lee CW (2015) Application of decision-tree model to
groundwater productivity-potential mapping. Sustainability
7(10):13416–13432
Ley R, Casper MC, Hellebrand H, Merz R (2011) Catchment
classification by runoff behaviour with self-organizing maps
(SOM). Hydrol Earth Syst Sci 15(9):2947–2962
Li F, Feng P, Zhang W, Zhang T (2013) An integrated groundwa-
ter management mode based on control indexes of groundwa-
ter quantity and level. Water Resour Manag 27(9):3273–3292
Liu H, Yang C, Huang M, Wang D, Yoo C (2018) Modeling of
subway indoor air quality using Gaussian process regression.
J Hazard Mater 359:266–273
Loh WY (2011) Classification and regression trees. Wiley Inter-
discip Rev Data Min Knowl Discov 1(1):14–23
Maheswaran R, Khosa R (2013) Long term forecasting of ground-
water levels with evidence of non-stationary and nonlinear
characteristics. Comput Geosci 52:422–436
Mantero P, Moser G, Serpico SB (2005) Partially supervised clas-
sification of remote sensing images through SVM-based prob-
ability density estimation. IEEE Trans Geosci Remote Sens
43:559–570
Mohammadrezapour O, Yoosefdoost I, Ebrahimi M (2019) Cuckoo
optimization algorithm in optimal water allocation and crop
planning under various weather conditions (case study: Qazvin
plain, Iran). Neural Comput Appl 31(6):1879–1892
Moreaux M, Reynaud A (2006) Urban freshwater needs and spatial
cost externalities for coastal aquifers: a theoretical approach.
Reg Sci Urban Econ 36(2):163–186
Mountrakis G, Im J, Ogole C (2011) Support vector machines in
remote sensing: a review. ISPRS J Photogramm Remote Sens
13:247–259
Nampak H, Pradhan B, Manap MA (2014) Application of GIS based
data driven evidential belief function model to predict ground-
water potential zonation. J Hydrol 513:283–300
Olson DL, Delen D, Meng Y (2012) Comparative analysis of data
mining methods for bankruptcy prediction. Decis Support Syst
52(2):464–473
Praveena SM, Abdullah MH, Bidin K, Aris AZ (2012) Sustainable
groundwater management on the small island of Manukan,
Malaysia. Environ Earth Sci 66(3):719–728
Prinzio MD, Castellarin A, Toth E (2011) Data-driven catchment
classification: application to the pub problem. Hydrol Earth
Syst Sci 15(6):1921–1935
Reddy MJ, Ganguli P (2012) Risk assessment of hydroclimatic variability on groundwater levels in the Manjara basin aquifer in India using Archimedean copulas. J Hydrol Eng 17(12):1345–1357
Richards JA (1999) Remote sensing digital image analysis. Springer, Berlin, p 240
Sener E, Davraz A (2013) Assessment of groundwater vulnerability based on a modified DRASTIC model, GIS and an analytic hierarchy process (AHP) method: the case of Egirdir Lake basin (Isparta, Turkey). Hydrogeol J 21(3):701–714
Stumpp C, Żurek AJ, Wachniew P, Gargini A, Gemitzi A, Filippini M, Witczak S (2016) A decision tree tool supporting the assessment of groundwater vulnerability. Environ Earth Sci 75(13):1057
Taormina R, Chau KW, Sethi R (2012) Artificial neural network simulation of hourly groundwater levels in a coastal aquifer system of the Venice lagoon. Eng Appl Artif Intell 25(8):1670–1676
Tien Bui D, Shahabi H, Shirzadi A, Chapi K, Pradhan B, Chen W, Khosravi K, Panahi M, Ahmad B, Saro L (2018) Land subsidence susceptibility mapping in South Korea using machine learning algorithms. Sensors 18:2464
Timofeev R (2004) Classification and regression trees (CART) theory and applications. Humboldt University, Berlin
Toth E (2013) Catchment classification based on characterisation of streamflow and precipitation time series. Hydrol Earth Syst Sci 17(3):1149–1159
Wang J, He J, Chen H (2012) Assessment of groundwater contamination risk using hazard quantification, a modified DRASTIC model and groundwater value, Beijing Plain, China. Sci Total Environ 432:216–226
Yeon YK, Han JG, Ryu KH (2010) Landslide susceptibility mapping in Injae, Korea, using a decision tree. Eng Geol 116(3–4):274–283
... Mirhashemi and Mirzaei used the Apriori method to examine variations in the Qazvin plain's aquifer depth. It is a great tool for figuring out what is causing fluctuations in aquifer drawdown so that the aquifer may be better managed (Mirhashemi and Mirzaei 2020). The decision tree algorithm is one of the prediction algorithms that may be used. ...
... Many industries, including hydrology, meteorology, and agriculture, now make use of data mining methods. Groundwater management and planning may be improved by using the CART algorithm to forecast changes in aquifer depth, according to the research findings of Mirhashemi et al. (2020). Methods for clustering are part of data mining science, which aims to discover information from databases via exploration and processing. ...
Article
Using a combination of clustering and tree methods, this research examined the prediction of changes in groundwater depth within the irrigation network of the Abyek plain. Groundwater depth variations in different regions of the plain were first grouped geographically with the K-means technique. The K-means results were then passed to a tree algorithm, which was used to forecast changes in aquifer depth across all clusters. The K-means algorithm produced five clusters of groundwater change, and aquifer decline increased from cluster 1 to cluster 5. Clusters 1 and 5 showed the greatest increase and the greatest decrease in aquifer depth, respectively. The K-means and classification and regression tree findings show that human factors were influential in locations where the largest aquifer decline was recorded, while natural factors were influential in regions where the highest groundwater depth rise was found. Precipitation, agricultural water demand (million cubic meters), and water delivered to the irrigation network (million cubic meters) had the greatest impact in regions with high aquifer drawdown; volume of precipitation, water delivered to the irrigation network, and air humidity percentage had the greatest impact in regions with increased groundwater depth. Rainfall volume appeared in most tree diagrams for the different groundwater depth variations in the Abyek plain.
... In the process of obtaining the laparoscope, the 3D scene of the objective world needs to be projected onto the 2D image plane of the camera [22], and this projection can be explained by image conversion. Figure 1 shows a coordinate system related to 3D space scene imaging. ...
Article
The development of new technologies based on electronic intelligent images is a very active research and promotion of new technologies in recent years. This article mainly summarizes the basic concept, development, and technology of electronic intelligent imaging technology, as well as the research, promotion, and application of electronic intelligent imaging technology in clinical treatment. It especially emphasizes the practicality and application of electronic intelligent imaging technology in the current clinical operation process and conducts a meta-analysis of the current mesorectal excision, so as to provide more scientific and professional guidance for clinical surgery. The results of the meta-analysis showed that 3291 documents were initially obtained and duplicate documents were deleted by searching for keywords in mesocolon excision. We excluded 2399 subjects and articles whose interventions did not meet the inclusion criteria of this study after reading the title and abstract. Then, we obtained 892 papers that may meet the inclusion criteria through preliminary screening. We further optimized the search strategy based on selection criteria and data integrity filtering principles and finally determined 111 references. 100 articles that did not meet the requirements were excluded, and 11 articles were finally included for meta-analysis. Medical imaging can effectively improve the therapeutic effect of mesocolon excision and reduce the occurrence of complications. Therefore, it is very important to combine medical intelligent images for preoperative evaluation, and the development of the combination of surgical treatment and medical images should not be underestimated in the future.
... Some researchers now use data mining in different fields such as agriculture, healthcare, banking, sports, and hydrology. The tree data mining algorithm was used previously to predict variations in the depth of the aquifer (Mirhashemi et al. 2020). ...
Article
Nowadays, deficit irrigation is of particular importance in areas facing the water shortage and drought. This study focused on the investigation and prediction of the values of five quantitative traits of guar beans under different deficit irrigation methods. Deficit irrigation methods were carried out at the initial, development, mid, and late plant growth stages. The experiment was carried out in 25 treatments each with four replications in 2018 and 2019. Initially, the values of five quantitative traits of guar beans were divided into three categories, the values of which were clustered using the K-means algorithm. Then, clusters were predicted using a combination of K-means and CART algorithms. Finally, the relationship between different deficit irrigation methods and clusters was investigated by a combination of K-means and Apriori algorithms. The results of two hybrid algorithms determined that the amount of irrigation in the mid-stage of plant growth significantly affected the five quantitative traits of guar beans. After the mid-stage of the plant growth, the amount of irrigation in the development, initial, and late growth stages had the greatest effect on the quantitative traits of guar beans. Among the deficit irrigation methods, irrigation rates of 60% in the primary stage, 80% in the development stage, 100% in the mid-stage, and 40% in the late stage of the plant growth were the best deficit irrigation methods in the four stages of growth.
... The findings suggest that CART, RF, and BRT approaches may be employed for water resource planning, management, and land use planning (Naghibi et al. 2016). The use of data mining methods for aquifer management is advocated in their findings (Mirhashemi et al. 2020; Mirhashemi and Mirzaei 2021). This strategy is considered to lead to improved groundwater management in the current article. ...
Article
Groundwater drawdown is a damaging and destructive component in agriculture, demonstrating the necessity for a pattern to assist managers and farmers in predicting the amount of water decrease for proper groundwater planning and management. Because a variety of variables influence the quantity of groundwater drawn down, the current study focuses on human and environmental factors that are useful in forecasting the amount of groundwater levels change within the irrigation network range in the Alborz plain region from 2004 to 2018. Four tree algorithms were employed to forecast changes in groundwater levels. The outcomes of four algorithms in forecasting groundwater level change were evaluated: C5.0 and Classification and Regression Tree (CART), Chi-square Automatic Interaction Detector (CHAID), Quick, Unbiased, Efficient, and Statistical Tree (QUEST). The results for various indices reveal that the performance of the C5.0 algorithm is superior to that of the other methods. The findings of the C5.0 algorithm demonstrate that the volume of agricultural water demand, air humidity, and the amount of water provided to the irrigation network are the three most critical elements determining groundwater level fluctuations. As a result, the proposed method can estimate the amount of change in groundwater levels, which may aid in improved groundwater management and reduce the negative consequences of groundwater drawdown.
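CART, one of the four tree algorithms compared above, grows a tree by choosing at each node the split that most reduces class impurity, commonly measured with the Gini index. The sketch below illustrates that single step on invented data. The variable names, the withdrawal values, and the `rise`/`drop` labels are hypothetical and do not come from the study; this is an illustration of the splitting criterion, not the authors' implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric factor that minimises the
    size-weighted Gini impurity of the two child nodes (value <= threshold
    goes left, the rest goes right). Returns (threshold, weighted_gini)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical monthly records: withdrawal volume and observed level change.
withdrawal = [10, 12, 15, 30, 35, 40]
change = ["rise", "rise", "rise", "drop", "drop", "drop"]

threshold, impurity = best_split(withdrawal, change)
print(threshold, impurity)
```

A full CART implementation repeats this search over every candidate factor, applies it recursively to each child node, and then prunes the tree; those steps are omitted here.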
... Today, different data mining algorithms have been considered by various researchers (Mirhashemi et al. 2020; Mirhashemi and Mirzaei 2021). Association rules are one of the most important tasks in the field of data mining, which can be used in different contexts. ...
Article
In recent years, knowledge production from the massive amount of data using data mining techniques has attracted attention. Meanwhile, prediction of precipitation in various hydrological issues such as runoff, flood, and drought as well as watershed management is of great importance. Accordingly, the purpose of this research is to extract association rules using data mining techniques to verify and predict the amount of precipitation. The monthly data of precipitation and effective factors related to it were used in this study. This research was carried out in Qazvin Plain for 30 years from 1988 to 2018. Different factors affecting the amount of precipitation with different intervals, including time without delay and delay of 1 to 3 months, were used. Four scenarios were defined based on the four timescales of the influential factors. For each scenario, rules on precipitation and its influential factors were extracted by the Apriori algorithm. The extracted rules were evaluated by the indicators of confidence, support, and lift. The accuracy of the rules was evaluated for all four scenarios according to the three indicators and the best scenario was chosen. According to the results of the evaluation indicators, it was determined that effective factors with the 2-month delay had the most substantial effect on predicting the amount of precipitation. In the last step, the independent relationship between precipitation and factors affecting the 2-month delay was examined. Finally, it was determined that the average pressure level factor of the station with a 2-month delay had the most significant relationship with precipitation in Qazvin Plain.
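The three rule-quality indicators named in the abstract above (support, confidence, and lift) have standard definitions, which a short sketch can make concrete. The code below is a minimal illustration, not the authors' implementation: the toy records and the item labels (`pressure_low_lag2`, `precip_high`, and so on) are invented for the example, and only the textbook formulas are assumed.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent).
    Assumes the antecedent occurs in at least one transaction."""
    both = support(transactions, antecedent | consequent)
    return both / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's baseline frequency.
    Values above 1 suggest a positive association."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical monthly records; each is a set of discretised factor levels.
months = [
    frozenset({"pressure_low_lag2", "precip_high"}),
    frozenset({"pressure_low_lag2", "precip_high"}),
    frozenset({"pressure_low_lag2", "precip_low"}),
    frozenset({"pressure_high_lag2", "precip_low"}),
    frozenset({"pressure_high_lag2", "precip_low"}),
]

rule_if = frozenset({"pressure_low_lag2"})
rule_then = frozenset({"precip_high"})

rule_support = support(months, rule_if | rule_then)
rule_confidence = confidence(months, rule_if, rule_then)
rule_lift = lift(months, rule_if, rule_then)
print(rule_support, rule_confidence, rule_lift)
```

Apriori itself adds a search step on top of these measures: it enumerates itemsets level by level and discards any whose support falls below a chosen minimum before rules are scored.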
... Use of the radial basis function to solve high-dimensional models could bring many advantages to the numerical method [15,16]. Mirhashemi et al. [21] used the CART tree algorithm to investigate the aquifer status of the Qazvin plain. Occurrence of water deficit crises in Iran has necessitated multilateral decisions in water management [3]. ...
Article
This study aimed to improve the management of the water delivered to the irrigation network. For this purpose, a combination of the K-Means and Apriori algorithms was used to evaluate the impact of various factors on the management of water delivered to the irrigation network. Initially, the amount of water entering the irrigation network and its various influential factors were clustered by the K-Means algorithm. Then, the output of the K-Means algorithm was used as the input of the Apriori algorithm. Accordingly, six optimal clusters were formed by the K-Means algorithm, and 18 association rules related to the six clusters were extracted by the Apriori algorithm. The water requirement of crops had the greatest impact on managers' decisions about the amount of water delivered to the irrigation network. In some cases, although precipitation satisfied the water requirements of crops, this did not reduce the amount of water delivered to the irrigation network. Further, air temperature and air humidity percentage had not been considered in the managers’ decisions about the amount of water delivered to this network. Since water deficit and lack of precipitation are problems in the Abyek plain, it is suggested that the positive effects of environmental factors on the amount of water delivered to the irrigation network be considered to prevent water wastage.
Article
The largest share of water consumption in Iran is related to agriculture. Considering that Iran has suffered a drought in recent years, the optimal use of water is necessary, especially in the agricultural sector. For this reason, this research investigated deficit irrigation of alfalfa. Alfalfa was cultivated over three years with four replications. Deficit irrigation levels of 40%, 70% and 100% of full irrigation (FI) were applied at different stages of alfalfa growth. First, the crop per drop (CPD), benefit per drop (BPD) and net benefit per drop (NBPD) values for alfalfa were clustered. The clustering results were introduced to the tree algorithm as target data. The tree algorithm predicted the target data according to three factors: 1) the type of cropping year, 2) the time of harvest and 3) the level of deficit irrigation at different stages of alfalfa growth. According to the results of the combination of the two algorithms, the lowest values of CPD, BPD and NBPD, equal to 1.2 kg/m³, 125 (1000 Rial/m³) and 115 (1000 Rial/m³) respectively, were predicted for the second harvest with deficit irrigation at the regrowth (RG) and budding (BU) stages. The highest values of CPD, BPD and NBPD, equal to 1.8 kg/m³, 185 (1000 Rial/m³) and 175 (1000 Rial/m³) respectively, were predicted for the third harvest with deficit irrigation at the branching (BR) stage. Thus, to increase the yield, it is better not to apply severe deficit irrigation at the RG and BU stages. Also, according to the predictor importance reported by the software, the three factors of harvest time, BU and the type of cropping year had the greatest effect on the prediction, in that order.
Article
The Jilango and Shabel-Dulla areas are located in the Laghdera Sub-County of Garissa County, in North Eastern Kenya, on the fringes of the distal Merti aquifer. Water scarcity has been the number one factor contributing to the immense levels of poverty in the pastoralist centre. The residents lead a nomadic-pastoralist lifestyle, raising livestock such as camels, goats and cows for upkeep. The settlements are six kilometers apart, a deliberate effort by the villagers to live near the seasonal River Jilango, within whose beds the residents have sunk several sand wells to get water for their domestic use and, to a limited extent, a little water for their livestock, mainly sheep and goats. The water resources quality, ease of availability and priority ranking in the study area were assessed using field traverses along the river channels and geophysical mapping at selected sites in the neighborhood of the riverbed. Priority ranking was based on the ready availability and cost-effectiveness of getting water into the villages for both domestic and livestock-watering purposes. In areas where groundwater potential was inferred using geophysical mapping, it became paramount that the study estimate the water quality expected from the proposed, yet-to-be-drilled well points. Groundwater potential estimation was also undertaken using field geophysics and GIS, as well as Decision Tree algorithms in R software packages. Precision and sensitivity analysis of the data was also undertaken using Python software and returned favorable results for the techniques employed. The feasibility of wells, earthpans, springs and piped water from Baraki Centre was considered in terms of how much each option would cost the community. This involved physical transects and meetings with members of the public in the proposed locality.
From the table so prepared, it was decided to rank groundwater as the number two priority source, mainly for use by livestock, and an earthpan as number one for human domestic use, based on water quality and other factors considered. The study shows the priority ranking of water resources in the study area as well as the fact that predicted groundwater quality is of inferior order, and recommends the design and construction of a large storage dam to aid domestic water supply. This would complement the available aquifer water which, though saline and of low discharge, can be tolerated by the livestock. The study also shows that the decision tree algorithm is a useful hydrological assessment tool, as it predicted, with over 90 percent precision, the water quality that will be encountered in the locality once the new well is drilled.
Article
Because the water requirement of maize differs across growth stages, prediction and extraction of association rules related to the plant's water requirements were performed separately for the initial, development, mid, and late season growth stages. Accordingly, information on the water requirement of maize over 20 years (2000-2019) in Qazvin plain was used. First, the results of the C5.0, CART, CHAID, and QUEST algorithms for forecasting maize water demand were evaluated. According to the results, the CART algorithm at the initial growth stage and the C5.0 algorithm at the development, mid and late season stages of growth had the best performance in predicting water requirements of maize. Air humidity and precipitation were the most important factors in predicting water requirements of maize at the initial stage of growth by the CART tree algorithm. Also, according to the results of the C5.0 algorithm, precipitation and air temperature were the most important factors at the development and mid-season growth stages, while at the late season stage of growth, sunshine hours and wind speed were most important in predicting plant water requirements. Finally, using the Apriori algorithm, association rules between water requirements and the factors affecting them were extracted at the four growth stages of maize. The association rules were evaluated by the indicators of confidence, support, and lift. According to the results of the Apriori algorithm, precipitation at the initial and development growth stages, and air temperature and wind speed at the mid and late season stages of growth, respectively, had the greatest relationship with water requirements of maize.
Article
In this study, land subsidence susceptibility was assessed for a study area in South Korea by using four machine learning models including Bayesian Logistic Regression (BLR), Support Vector Machine (SVM), Logistic Model Tree (LMT) and Alternate Decision Tree (ADTree). Eight conditioning factors were distinguished as the most important affecting factors on land subsidence of Jeong-am area, including slope angle, distance to drift, drift density, geology, distance to lineament, lineament density, land use and rock-mass rating (RMR) were applied to modelling. About 24 previously occurred land subsidence were surveyed and used as training dataset (70% of data) and validation dataset (30% of data) in the modelling process. Each studied model generated a land subsidence susceptibility map (LSSM). The maps were verified using several appropriate tools including statistical indices, the area under the receiver operating characteristic (AUROC) and success rate (SR) and prediction rate (PR) curves. The results of this study indicated that the BLR model produced LSSM with higher acceptable accuracy and reliability compared to the other applied models, even though the other models also had reasonable results.
Article
As inferred from its biological nature, agriculture is a key consumer of water resources in many countries. Hence, today, water management plays an important role in the use of water resources of these countries. The present study aimed to optimize cultivation area, to manage irrigation water, and to optimize total income gained from the cultivation area of special crops in Qazvin plain (the central plateau of Iran) under various weather conditions using the cuckoo optimization algorithm (COA). Under the same objective function, the performance of the COA was assessed through comparison with the genetic algorithm (GA). The results of the two models showed that, because of its high water requirement and low yield, the cultivation area of sugar beet was reduced (by over 80%) under all four conditions; that is, it is not wise to plant it under any of the weather conditions of the study area. Comparison of the model results indicates that the COA can provide better and more reliable optimal results in relative crop yield and farm income, while allocating less water than the GA. Following the new cropping pattern delivered by the COA model, the water volume stored in the dam reservoir at the end of the operation under wet, normal, dry, and hot–dry conditions rose, respectively, by 264,745.3, 2,865,387, 275,789, and 655,918 m³. Meanwhile, the farmers’ profit increased, respectively, by 6.2, 2.6, 1.27, and 1.48% compared to the previous optimization at the end of the operation. To conclude, COA is quite promising for the crop cultivation area optimization problem in terms of its simple structure, excellent search efficiency, and strong robustness.
Article
Objective criteria for catchment classification are identified by the scientific community among the key research topics for improving the interpretation and representation of the spatiotemporal variability of streamflow. A promising approach to catchment classification makes use of unsupervised neural networks (Self Organising Maps, SOM's), which organise input data through non-linear techniques depending on the intrinsic similarity of the data themselves. Our study considers ~300 Italian catchments scattered nationwide, for which several descriptors of the streamflow regime and geomorphoclimatic characteristics are available. We qualitatively and quantitatively compare in the context of PUB (Prediction in Ungauged Basins) a reference classification, RC, with four alternative classifications, AC's. RC was identified by using indices of the streamflow regime as input to SOM, whereas AC's were identified on the basis of catchment descriptors that can be derived for ungauged basins. One AC directly adopts the available catchment descriptors as input to SOM. The remaining AC's are identified by applying SOM to two sets of derived variables obtained by applying Principal Component Analysis (PCA, second AC) and Canonical Correlation Analysis (CCA, third and fourth ACs) to the available catchment descriptors. First, we measure the similarity between each AC and RC. Second, we use AC's and RC to regionalize several streamflow indices and we compare AC's with RC in terms of accuracy of streamflow prediction. In particular, we perform an extensive cross-validation to quantify nationwide the accuracy of predictions in ungauged basins of mean annual runoff, mean annual flood, and flood quantiles associated with given exceedance probabilities. Results of the study show that CCA can significantly improve the effectiveness of SOM classifications for the PUB problem.
Article
Catchments show a wide range of response behaviour, even if they are adjacent. For many purposes it is necessary to characterise and classify them, e.g. for regionalisation, prediction in ungauged catchments, model parameterisation. In this study, we investigate hydrological similarity of catchments with respect to their response behaviour. We analyse more than 8200 event runoff coefficients (ERCs) and flow duration curves of 53 gauged catchments in Rhineland-Palatinate, Germany, for the period from 1993 to 2008, covering a huge variability of weather and runoff conditions. The spatio-temporal variability of event-runoff coefficients and flow duration curves are assumed to represent how different catchments "transform" rainfall into runoff. From the runoff coefficients and flow duration curves we derive 12 signature indices describing various aspects of catchment response behaviour to characterise each catchment. Hydrological similarity of catchments is defined by high similarities of their indices. We identify, analyse and describe hydrologically similar catchments by cluster analysis using Self-Organizing Maps (SOM). As a result of the cluster analysis we get five clusters of similarly behaving catchments where each cluster represents one differentiated class of catchments. As catchment response behaviour is supposed to be dependent on its physiographic and climatic characteristics, we compare groups of catchments clustered by response behaviour with clusters of catchments based on catchment properties. Results show an overlap of 67% between these two pools of clustered catchments which can be improved using the topologic correctness of SOMs.
Article
The Water Framework Directive and Groundwater Directive aim at preserving and improving the groundwater status. Groundwater bodies are classified as being or not being at risk of failing to meet these objectives. Those at risk are subject to more precise risk assessment where the concept of vulnerability is considered in the pathway part of the source–pathway–receptor scheme. However, no further details on implementation strategies are provided. In order to support groundwater managers and decision-makers in implementation of programs protecting groundwater, a systematic operational approach based on a decision tree is proposed, which leads the user through the stages of vulnerability assessment. First, a problem has to be formulated related to a threatening of the quantitative and/or qualitative status of a groundwater body. Next, the stated problem needs to be related to the intrinsic or specific vulnerability. Methods used for the intrinsic vulnerability assessment belong to two categories: subjective rating and objective methods. Method selection depends primarily on: data availability, knowledge and available resources. A key issue is the lag time associated with transport between a source/event of contamination and the water body. This lag time is primarily controlled by the temporal scale of water flow. It provides information about flow processes and at the same time also about timescales required for the implementation of strategies. Effects of any measures taken cannot be observed immediately but at the earliest after these estimated lag times emphasizing the need to also proactively safeguard groundwater resources and preserve their good status.
Article
For the sustainable use of groundwater, this study analyzed groundwater productivity-potential using a decision-tree approach in a geographic information system (GIS) in Boryeong and Pohang cities, Korea. The model was based on the relationship between groundwater-productivity data, including specific capacity (SPC), and its related hydrogeological factors. SPC data which is measured and calculated for groundwater productivity and data about related factors, including topography, lineament, geology, forest and soil data, were collected and input into a spatial database. A decision-tree model was applied and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The resulting groundwater-productivity-potential (GPP) maps were validated using area-under-the-curve (AUC) analysis with the well data that had not been used for training the model. In the Boryeong city, the CHAID and QUEST algorithms had accuracies of 83.31% and 79.47%, and in the Pohang city, the CHAID and QUEST algorithms had accuracies of 86.18% and 80.00%. As another validation, the GPP maps were validated by comparing the actual SPC data. As the result, in the Boryeong city, the CHAID and QUEST algorithms had accuracies of 96.55% and 94.92% and in the Pohang city, the CHAID and QUEST algorithms had accuracies of 87.88% and 87.50%. These results indicate that decision-tree models can be useful for development of groundwater resources.
Book
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Article
Soft sensor modeling of indoor air quality (IAQ) in subway stations is essential for public health. Gaussian process regression (GPR), as an efficient nonlinear modeling method, can effectively interpret the complicated features of industrial data by using composite covariance functions derived from base kernels. In this work, an accurate GPR soft sensor with the sum of squared-exponential covariance function and periodic covariance function is proposed to capture the time varying and periodic characteristics in the subway IAQ data. The results demonstrate that the prediction performance of the proposed GPR model is superior to that of the traditional soft sensors consisting of partial least squares, back propagation artificial neural networks, and least squares support vector regression (LSSVR). More specifically, the values of root mean square error, mean absolute percentage error, and coefficient of determination are improved by 12.35%, 9.53%, and 40.05%, respectively, in comparison with LSSVR.