Journal of Environmental Management 362 (2024) 121378
0301-4797/© 2024 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Research article
Ensemble machine learning using hydrometeorological information to
improve modeling of quality parameter of raw water supplying
treatment plants
Christian Ortiz-Lopez a,*, Christian Bouchard a, Manuel J. Rodriguez b
a Centre de Recherche en Aménagement et Développement (CRAD), Université Laval, 2325 Allée des Bibliothèques, Québec City, QC, G1V 0A6, Canada
b École Supérieure d’Aménagement du Territoire et de Développement Régional (ESAD), Université Laval, 2325 Allée des Bibliothèques, Québec City, QC, G1V 0A6, Canada
ARTICLE INFO
Handling editor: Lixiao Zhang
Keywords:
Ensemble machine learning
Drinking water
Source water
Raw water quality modeling
River flow events
Rainfall events
ABSTRACT
Source and raw water quality may deteriorate due to rainfall and river flow events that occur in watersheds. The
effects on raw water quality are normally detected in drinking water treatment plants (DWTPs) with a time-lag
after these events in the watersheds. Early warning systems (EWSs) in DWTPs require models with high accuracy
in order to anticipate changes in raw water quality parameters. Ensemble machine learning (EML) techniques
have recently been used for water quality modeling to improve accuracy and decrease variance in the outcomes.
We used three decision-tree-based EML models (random forest [RF], gradient boosting [GB], and eXtreme
Gradient Boosting [XGB]) to predict two critical parameters for DWTPs, raw water turbidity and UV absorbance
(UV254), using rainfall and river flow time series as predictors. When modeling raw water turbidity, the three
EML models ($r^2_{\mathrm{RF-Tu}} = 0.87$, $r^2_{\mathrm{GB-Tu}} = 0.80$, and $r^2_{\mathrm{XGB-Tu}} = 0.81$) showed very good performance metrics. For raw
water UV254, the three models ($r^2_{\mathrm{RF-UV}} = 0.89$, $r^2_{\mathrm{GB-UV}} = 0.85$, and $r^2_{\mathrm{XGB-UV}} = 0.88$) again showed very good
performance metrics. Results from this study suggest that EML approaches could be used in EWSs to anticipate
changes in the quality parameters of raw water and enhance decision-making in DWTPs.
1. Introduction
Surface water is a primary source of water for human consumption. It
contains elements such as pathogenic microorganisms, particles, and
organic matter. These elements must be removed or inactivated in drinking
water treatment plants (DWTPs) to deliver safe drinking water and protect
public health (World Health Organization, 2017). The quality of raw water
from surface sources, such as rivers and lakes, is prone to variation due to
meteorological events that occur in the watersheds (Khan et al., 2015).
During precipitation events, the hydrological response of the watershed
takes some time to develop. This response depends on specific
characteristics of rainfall events such as intensity, duration, and amount,
as well as soil characteristics such as soil saturation and soil conditions
preceding the event. Rainfall can lead to a deterioration in the quality of
raw water due to the increased transport of contaminants through surface
and subsurface runoff to rivers and lakes, and thus to DWTP intakes
(Delpla et al., 2023). Such deterioration in raw water quality, especially
when there are large peaks in contaminant concentrations, may require
prompt adjustments to treatment conditions in the DWTP. For example,
coagulant and disinfectant dosages may need to be modified after
rainfall events if there are significant increases in the concentrations of
fine particles and natural organic matter (Edzwald, 2011).
There is a time lag between the moment rain falls in the watershed
and the moment when variations in raw water quality can be detected in
the DWTP through online monitoring or grab sampling (Ortiz-Lopez
et al., 2023). There is an additional time lag between the detection of
water quality degradation and the implementation of the operational
adjustments needed to respond to these situations. A tool that could
anticipate raw water degradation, especially the peak concentrations,
would help DWTP operators react in an appropriate and timely way.
Early Warning Systems (EWSs) are critical for DWTPs and include
predictive models. Deterministic (i.e., physics-based) modeling of raw
water quality variations is difficult due to the complex and numerous
underlying phenomena (Bui et al., 2020). Some researchers therefore opt
for an empirical (i.e., non-physical) approach using artificial intelligence
* Corresponding author.
E-mail address: christian.ortiz-lopez.1@ulaval.ca (C. Ortiz-Lopez).
Contents lists available at ScienceDirect
Journal of Environmental Management
journal homepage: www.elsevier.com/locate/jenvman
https://doi.org/10.1016/j.jenvman.2024.121378
Received 13 February 2024; Received in revised form 3 May 2024; Accepted 2 June 2024