Pulling the Carpet Below the Learner’s Feet: Genetic Algorithm To Learn
Ensemble Machine Learning Model During Concept Drift
Teddy Lazebnik1,2,
1Department of Mathematics, Ariel University, Ariel, Israel
2Department of Cancer Biology, Cancer Institute, University College London, London, UK
Corresponding author: lazebnik.teddy@gmail.com
Abstract
Data-driven models, in general, and machine learning (ML) models, in particular, have gained popularity over recent years with an increased usage of such models across the scientific and engineering domains. When using ML models in realistic and dynamic environments, users often need to handle the challenge of concept drift (CD). In this study, we explore the application of genetic algorithms (GAs) to address the challenges posed by CD in such settings. We propose a novel two-level ensemble ML model, which combines a global ML model with a CD detector, operating as an aggregator for a population of ML pipeline models, each with its own adjusted CD detector responsible for re-training its ML model. In addition, we show one can further improve the proposed model by utilizing off-the-shelf automatic ML methods. Through extensive synthetic dataset analysis, we show that the proposed model outperforms a single ML pipeline with a CD algorithm, particularly in scenarios with unknown CD characteristics. Overall, this study highlights the potential of ensemble ML and CD models obtained through a heuristic and adaptive optimization process, such as the GA, to handle complex CD events.
Keywords: automatic machine learning; heuristic optimization; concept drift; ensemble machine learning.
1 Introduction
Data-driven models, in general, and machine learning (ML) models, in particular, have gained popularity over
recent years with increased usage of such models across the scientific and engineering domains [1, 2, 3, 4, 5, 6].
While ML models show promising results in the lab as well as in realistic scenarios, deployed ML models ex-
perience a wide range of challenges in production settings. A common and important challenge these models
encounter is adapting to dynamic and evolving environments [7, 8]. In particular, concept drift (CD), the phe-
nomenon wherein the statistical properties of the target variable change over time, poses a significant hurdle to
the stability and performance of learning-based models [9]. The dynamic nature of real-world data introduces
uncertainties, necessitating the continuous adaptation of models to maintain their relevance and accuracy. Recent
research in the CD domain focuses on addressing three main challenges: precisely identifying CD within unstructured and noisy datasets [10, 11, 12], comprehending CD in a quantifiable and interpretable manner [13, 14], and responding effectively to CD [15, 16].
A genetic algorithm (GA) is a search and optimization technique inspired by the principles of natural selection
and genetic inheritance [17, 18]. It operates by iteratively evolving a population of potential “solutions” to a
problem through mechanisms such as selection, crossover, and mutation, mimicking the process of biological
evolution to find optimal or near-optimal solutions [19, 20]. The motivation behind employing genetic algorithms
lies in their ability to efficiently explore large solution spaces, enabling the discovery of diverse and effective
solutions that may be elusive through traditional optimization methods [21, 22, 23]. In particular, GAs have
been adapted to the realm of ML to find a well-performing ML pipeline [24], solve a symbolic regression task
[25, 26, 27], or even as part of ML ensemble-based model [28, 29].
Generally speaking, GAs hold significance in ML for three primary reasons. First, they operate in discrete
spaces, making them applicable in scenarios where gradient-based methods are impractical [30]. Second, GAs
function as reinforcement learning algorithms, evaluating the performance of a learning system based on a singular
metric, commonly referred to as the “fitness” function, in contrast to approaches like back-propagation where
different parts of the model receive different optimization signals. This characteristic makes them particularly
useful in situations where performance measurement is the sole available information [31, 32]. Third, GAs involve
a population, making them suitable for scenarios where the desired outcome is not a single model but a set of
models, as exemplified in learning within multi-agent systems [33].
To this end, GAs can be used to overcome CD [34, 35]. In this study, we focus on the last point: in the context of a deployed ML-based solution that is trained on initial data while more (tagged) data is gathered over time, GAs can be used to respond to changes in the dynamics. Intuitively, one can think of CD as the change between the source ($x$) and target ($y$) features when some model ($f$) is applied at two points in time ($t_1, t_2$), such that $\|f_{t_1}(x) - y\| \neq \|f_{t_2}(x) - y\|$. Hence, as new data is introduced to the learner, a mechanism should alter the model $f$, aiming to obtain $\|f^1_{t_1}(x) - y\| = \|f^2_{t_2}(x) - y\|$, where $f^2$ originates from $f^1$ and is altered to achieve the above condition. Following this line of thought, we propose a two-level ML model obtained using a GA, which has a global ML model with a CD detector that operates as an ensemble model for a population of ML models, each governing a subset of the data and self-adaptive according to its own CD detector model.
The rest of the paper is organized as follows. Section 2 presents an overview of CD properties, challenges, and
previous solutions as well as the recent developments in the field of GAs and ensemble ML. Section 3 provides
a technical background later used in the model definition. Section 4 formally outlines the task definition, the
proposed algorithm based on GA, and its applicative improvement in the form of using automatic ML models
with a divide-and-conquer approach. Section 5 introduces the experimental setup used to explore the proposed
algorithm. Section 6 shows the obtained results. Finally, Section 7 discusses the results with their potential
applicative usage as well as the limitations of this study and possible future work.
2 Related Work
In this section, we present approaches in the field of CD adaptation followed by an overview of the GA models used
in dynamical systems. Afterward, we review several cases where GA is used in the context of CD and ensemble
ML models.
2.1 Concept drift
A discussion about CD involves two interconnected aspects - the theoretical and the applied aspects of defining, detecting, and tackling CD. It is more often than not the applied aspect that governs the broad interest in
CD as ML users experience first-hand the challenges that come with CD in their respective tasks [36]. Indeed, in
a dynamic world, nothing is constant. For instance, let us consider a supply chain distribution system responsible
for distributing a company’s products between its physical stores. Should one expect that a model that was trained before COVID-19 [37] would work equally well during, or even after, the COVID-19 pandemic? It is
reasonably easy to assume that due to these kinds of unforeseen circumstances, user behavior would change a lot,
as indeed happened in practice [38, 39].
As such, a field of repetitive ML model adaptation has been proposed where new data is used to re-train ML models to detect, capture, and utilize the changes in the data over time, practically addressing CD [40, 41]. One of the most direct approaches to address CD involves retraining a new model with the latest data to replace the outdated model and the dynamics that constructed it [42, 43]. This method necessitates an explicit CD detector to
determine when model retraining is required. This approach does not work well when the change over time is
relatively slow and “smooth” and excels in hard shifts in the system’s dynamics. The complementary approach
is to use a “window” strategy where the model is retrained on the latest data with some fixed size. A more
sophisticated example of this approach is employed by Paired Learners which utilizes two learners - the stable
learner and the reactive learner [44]. If the stable learner consistently misclassifies instances correctly identified
by the reactive learner, signaling a new concept, the stable learner is replaced with the reactive learner.
These two approaches cover the basic ideas of overcoming CD in ML. That said, each approach raises a new
computational challenge one needs to tackle. The first approach requires the accurate detection of CD while the
latter challenges the user to find and use the optimal window size. On top of that, as each approach is appropriate
to different types of CD, choosing the appropriate one for each case is a challenge in itself. This fertile soil was
the base of multiple solutions.
Initially, attempts to find the optimal window size have been conducted as a compromise must be made in
determining the suitable window size. A smaller window effectively mirrors the most recent data distribution,
while a larger window affords more data for training a new model. Consequently, [45] proposed ADWIN, an al-
gorithm that dynamically adjusts subwindow sizes based on the rate of change between sub-windows, eliminating
the necessity for users to predefine a fixed window size. After determining the optimal window cut, the window
containing outdated data is discarded, facilitating the training of a new model with the latest window data.
Moving beyond mere model retraining, researchers have delved into the integration of the drift detection
process with the retraining mechanism tailored for specific ML algorithms rather than a “one method to rule
them all” approach. For example, [46] proposed DELM which extends the conventional ELM algorithm to handle
concept drift by adaptively modifying the number of hidden layer nodes. Moreover, instance-based lazy learners
also show promising results for CD handling [47]. For example, [48] proposed NEFCS, a kNN-based adaptive
model, that utilizes a competence model-based drift detection algorithm to identify drift instances in the case base
and distinguish them from noise instances.
2.2 Genetic algorithm
GAs belong to a category of approaches commonly known as evolutionary computation methods, employed in
adaptive aspects of computation such as search, optimization, machine learning, and parameter adjustment [49].
What distinguishes these approaches is their characteristic reliance on a population of potential solutions. Unlike
most search algorithms that focus on modifying a single candidate solution to enhance its performance, evolu-
tionary algorithms dynamically adapt entire populations of candidate solutions to address the problem at hand.
Drawing inspiration from biological populations, these algorithms incorporate selection operators to amplify the
number of superior solutions within the population while diminishing the presence of inferior ones [50, 51]. Ad-
ditionally, they employ other operators to generate novel solutions. The variability among these algorithms lies in
the standard representation of problems and the nature and relative significance of the operations introducing new
solutions [52, 53, 54].
GAs have found application across diverse domains, including engineering [19], medicine [20], and economics
[55]. For instance, [51] addressed the challenge of optimizing sequences of machines and their corresponding
operations for process planning optimization. Employing GAs, the authors derived feasible processes initially
and subsequently identified the optimal process from this set of viable alternatives. Similarly, [50] investigated
and assessed the utilization of GAs under various constraints in process route sequencing and astringency. The
authors revamped the GA, encompassing the development of coding strategies, the evaluation operator, and the
fitness function. Their findings demonstrated that these modified GAs could effectively fulfill the requirements of
sequencing tasks and meet the criteria for astringency. In another study, [55] introduced an optimization scheme
based on GAs to enhance Atkinson fuel engine models, specifically targeting fuel consumption reduction [55, 56].
GAs were chosen due to the high-dimensionality and non-linearity of the optimization system, rendering classical
methods time and resource-intensive. Furthermore, the authors proposed GAs as a financial model, illustrating
their applicability in learning signal utilization, making inferences from market-clearing prices, and assessing
the worthiness of acquiring a signal [57]. In the economic domain, [58] presented an agent-based model with a
heterogeneous population and genetic algorithm-based decision-making to model and simulate an economy with
taxation policy dynamics. Furthermore, for the clinical domain, GAs have been widely used as well. For example,
[59] used GA to obtain the parameters of a partial differential equations-based model describing an immunotherapy
treatment for bladder cancer.
2.3 Genetic algorithm for concept drift
A growing body of work finds the usage of GA to detect and adapt to CD promising in a wide range of tasks and
data settings [60]. In particular, in realistic applications, CD is of interest when new data is obtained once an initial
ML model is obtained [61, 62].
[63] introduces an innovative ensemble learning approach relying on evolutionary algorithms to address di-
verse forms of concept drifts in non-stationary data stream classification tasks. The authors employ random fea-
ture subspaces drawn from a feature pool to construct distinct classification types within the ensemble. Each type
comprises a finite set of classifiers (decision trees) constructed at various instances throughout the data stream.
Utilizing an evolutionary algorithm, specifically replicator dynamics, the system adapts to varying concept drifts
by enabling types with superior performance to expand and those with inferior performance to diminish in size.
[64] proposed a novel Density-based method for Clustering Data streams employing Genetic Algorithm (DCDGA).
This approach leverages a GA to optimize parameters, specifically the cluster radius and minimum density thresh-
old, ensuring more precise coverage of density clusters. Additionally, a Chebychev distance function is introduced
to compute the distance between the center of Core Micro-Clusters (CMCs) and the incoming data points. The
authors evaluated DCDGA on both artificial and real datasets and compared the experimental results with another online density-based clustering algorithm in the field.
[65] introduces a predictive model for temporal data with a numerical target, utilizing GA to capture changes
in a dataset caused by concept drift. In the presence of environmental changes, which stands for the CD in these
settings, the author’s proposed algorithm responds by clustering the data and subsequently creating nonlinear
models that characterize the formed clusters. These nonlinear models serve as terminal nodes within the GA
model trees.
[66] developed a spam detection system that examines the evolution of features. The author’s proposed method
encompasses three key steps. First, training a spam classification model; second, detecting CD using a new strategy
that analyzes feature evolution based on the similarity between feature vectors extracted from training and test data;
and finally, knowledge transfer learning. In the last step, the focus is on determining what knowledge to transfer,
how to transfer it, and when to execute the knowledge transfer process.
2.4 Ensemble machine learning models
Ensemble ML methods leverage multiple ML algorithms to generate weak predictive results by extracting features
through diverse projections on data. These results are then fused using various voting mechanisms to achieve supe-
rior performance compared to that obtained from any individual algorithm in isolation [67]. Indeed, ensemble ML
models show superior results on a wide range of tasks [68]. The fundamental concept of a standard ensemble ML
model involves two stages: producing prediction outcomes through numerous weak classifiers and consolidating
these multiple results into a consistency function to obtain the ultimate result using voting schemes. The voting
scheme can range in complexity, from a simple average or majority vote for regression and classification tasks,
respectively, such as for the case of the Random Forest model [69] which is based on a set (forest) of Decision
Tree models [70] to being an ML or deep learning model by itself [71].
The weak prediction ML models in the set of an ensemble method differ from one another by one or more of the following three properties: the samples provided to the model, the features provided to the model, and the ML model itself [72]. These changes allow each weak prediction model to focus on a less complex pattern in the data and excel in capturing it. Hence, the more weak models an ensemble model includes, the more complex patterns it can capture. However, it also increases the bias given the training data [72]. Importantly,
this scheme can be generalized where weak models in an ensemble model can be ensemble models by themselves.
For example, imagine a Random Forest model in which each Decision Tree is replaced with a Random Forest as
well. This example will result in three levels of models.
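To make the nesting idea concrete, the following minimal Python sketch (using Scikit-learn's stacking API, an illustrative choice rather than a construction prescribed by the works cited above) builds a two-level ensemble in which each base learner is itself a Random Forest:

```python
from sklearn.ensemble import RandomForestRegressor, StackingRegressor

# Level 1: each "weak" model is itself an ensemble (a Random Forest).
base_models = [(f"rf_{i}", RandomForestRegressor(n_estimators=50, random_state=i))
               for i in range(3)]

# Level 2: a global aggregator is trained on the level-1 predictions, yielding
# three levels of models overall (trees -> forests -> stacked aggregator).
two_level_ensemble = StackingRegressor(
    estimators=base_models,
    final_estimator=RandomForestRegressor(random_state=0),
)
```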
3 Technical Background
In this section, the necessary technical background, later used by the proposed model, is presented. Initially, a
formal definition of CD with its two main types is provided. Next, an introduction to automatic ML (AutoML)
models is presented.
3.1 Concept drift definition
A learning algorithm, $A$, observing samples with a stationary distribution would observe the training cohort in the form $(x_i, y_i)$ such that $x_i$ is the feature vector and $y_i$ is the target feature. A class prediction at a specific point in time $t$ would be given as $y_t$ based on the feature vector $x_t$. Opposed to this, a data stream may produce samples with a non-stationary distribution. In such a scenario, $(x_i, y_i)$ is obtained from a distribution that explicitly depends on time or on previous samples measured from the distribution. Formally, a CD between two points in time $t_0$ and $t_1$ can be defined as $p_{t_0}(x_i, y_i) \neq p_{t_1}(x_i, y_i)$, where $p_t$ is the joint distribution at time $t$ between the feature vector $x_i$ and the target feature $y_i$ [8]. Following this definition, CD can occur due to three main reasons: the distribution of samples in the target feature can change; the distribution of samples in the target feature can change with respect to the samples of the source features; and the source features' distribution can change while the target feature does not.
In addition to the fact that CD occurred, the rate at which it happens is also of interest. Simply put, the rate at which CD takes place can be roughly divided into two main forms: shift and moving CD. The shift drift is associated with sudden changes in the distribution, while the moving drift occurs at a much slower rate, usually with multiple phases in between [73]. Formally, let us assume two distributions $p_t(x_i, y_i)$ and $p_{t+\Delta t}(x_i, y_i)$ of the data associated with two points in time $t$ and $t + \Delta t$, respectively, such that $\Delta t \in \mathbb{R}^+$ is the time passed since the initial point in time, $t$. In addition, let us assume a threshold value $\psi \in \mathbb{R}^+$. A drifting rate is defined to be

$$\big(1 - KS(p_t, p_{t+\Delta t})\big) / \Delta t, \quad (1)$$

such that $KS(a, b)$ is the p-value of a Kolmogorov-Smirnov test [74] between the two distributions $a$ and $b$. Fig. 1 presents a schematic view of shift and moving CD, where the shift CD moves from one two-dimensional distribution $(x, y)$ to another drastically while the moving CD gradually alters from the same source distribution to the other distribution.
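To make the drifting-rate definition of Eq. (1) concrete, the following minimal Python sketch estimates it from two samples of the same variable observed at times $t$ and $t + \Delta t$, using SciPy's two-sample Kolmogorov-Smirnov test; the toy data and the use of a single one-dimensional variable are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_rate(sample_t, sample_t_dt, delta_t):
    """Eq. (1): (1 - KS p-value between the two samples) / delta_t."""
    p_value = ks_2samp(sample_t, sample_t_dt).pvalue
    return (1.0 - p_value) / delta_t

# Toy illustration: an abrupt change (shift CD) vs. a small change (moving CD).
rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=1000)
shifted = rng.normal(3.0, 1.0, size=1000)   # abrupt change in the distribution
moved = rng.normal(0.2, 1.0, size=1000)     # mild change in the distribution
print(drift_rate(before, shifted, delta_t=1.0))  # close to 1 -> shift-like drift
print(drift_rate(before, moved, delta_t=1.0))    # smaller value -> moving-like drift
```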
3.2 Automatic Machine Learning
The process of ML model development is time-consuming, requires substantial expertise, and is susceptible to hu-
man errors [75]. To this end, AutoML has emerged as a promising approach that automates many steps in the ML
development process, including data pre-processing, feature engineering, model selection, and hyperparameter
tuning, thereby mitigating the challenges associated with using ML models [76, 77, 78, 79].
Multiple models have been proposed in recent years for automatic machine learning [80]. For instance, the
Tree-based Pipeline Optimization Tool (TPOT) library utilizes a GA search process to identify ML pipelines
based on the popular Scikit-learn library [81]. TPOT uses a tree-based representation to evolve and optimize these
pipelines based on their performance, aiming to find the most effective combination for a given dataset. The library
represents machine learning pipelines as tree structures, providing a flexible and hierarchical way to organize and
evolve complex combinations of data preprocessing and modeling steps. The AutoSklearn library [82, 83] employs
various search methods to construct an ML pipeline, also based on the Scikit-learn library. It employs meta-
learning and Bayesian optimization techniques to efficiently search through various preprocessing steps, feature
engineering methods, and model configurations. AutoSklearn incorporates meta-learning, leveraging information
from previous ML tasks to guide the search for effective pipelines. The AutoGluon library [84] strategy is based
on the idea of ensembling multiple models and stacking them in multiple layers. AutoGluon uses a fixed defaults
(set adaptively) strategy for the search process of ML models in each layer and then combines multiple layers
using the stacking and repeated bagging methods. The PyCaret library [85] is a Python wrapper around several
popular ML libraries and frameworks which uses a multi-metric comparison of these models, in a brute-force
manner, to find the best model for a given dataset and task.
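As an illustration of how such an off-the-shelf AutoML tool is typically invoked, the following sketch fits a TPOT regressor on a training split and scores it on a held-out split; the toy data, the tiny search budget, and the exact constructor arguments (which may differ between TPOT versions) are assumptions made for brevity.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor

# Toy regression data standing in for the available dataset.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small GA-driven pipeline search (budget kept tiny for illustration).
automl = TPOTRegressor(generations=3, population_size=10, random_state=0, verbosity=0)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))   # R^2 of the best evolved pipeline
automl.export("best_pipeline.py")     # export the winning Scikit-learn pipeline
```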
Figure 1: A schematic view of shift and moving CD. One can notice that the shift CD moves from one two-
dimensional distribution (x, y) to another distribution drastically. On the other hand, the moving CD gradually
alters from the same source distribution to the other distribution.
4 Ensemble Machine Learning Model For Concept Drift Data
In this section, we outline the proposed GA-based solution for CD in data streams based on ensemble ma-
chine learning models. The proposed model is based on the fact that there are existing feasible, and even well-performing, solutions for CD types when these occur individually or under some assumptions. Furthermore, as there is no single solution that covers all CD types, one is required to find an appropriate solution for each case.
However, using multiple models, it is possible to activate one or a subset of these models as the dynamics of the
system alter over time. Fig. 2 presents a schematic view of the learning problem during different CD types and
possible remedy with an ensemble ML model obtained using an initial search process. Intuitively, one can perform
a two-step optimization process where the first step is responsible for the ensemble configuration in terms of the
model and the data it obtained during the training phase, and the second step is to find and train an ML model with
the CD-handling method that best suits the data it obtained.
4.1 Task definition
Initially, to measure how well a population of ML models handles a CD scenario, a metric needs to be defined. Intuitively, the population of ML models should perform well on the initial dataset at some time, $t$, and not lose this performance over some fixed duration of time. Hence, let us consider a population of ML models, $M := [M_1, \dots, M_k]$, and a dataset that increases over time, $D(t) \in \mathbb{R}^{n \times m}$, such that each model $M_i \in M$ obtains a subset $D_i \subset D(t)$ at some point in time $t$. For a given event horizon $\tau \in \mathbb{R}^+$ and a performance metric, $\psi$, the CD handling performance of the ML models population, $L(M, D(t))_{\psi,\tau}$, is defined as follows:

$$L(M, D(t))_{\psi,\tau} := \omega_1 \psi(M, D(t)) - \omega_2 \big(\psi(M, D(t)) - \psi(M, D(t+\tau))\big), \quad (2)$$

where $\omega_1 \in \mathbb{R}^+$ and $\omega_2 \in \mathbb{R}^+$ are the weight of the model's performance at the end of the training phase and the weight of the CD's influence on the models' performance, respectively. Based on this definition, one can formalize an optimization task to find the population of ML models as follows:

$$\max_{M} L(M, D(t))_{\psi,\tau}. \quad (3)$$
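A minimal sketch of the fitness computation in Eq. (2), assuming that $\psi$ is a "higher is better" score (e.g., $R^2$) and that the second term penalizes the performance drop over the event horizon; the ensemble object, the weights, and the evaluation helper are illustrative placeholders rather than the exact implementation.

```python
def cd_fitness(ensemble, data_now, data_horizon, psi, w1=1.0, w2=1.0):
    """Fitness of Eq. (2): reward current performance and penalize its decay
    between D(t) (data_now) and D(t + tau) (data_horizon)."""
    score_now = psi(ensemble, data_now)        # psi(M, D(t))
    score_later = psi(ensemble, data_horizon)  # psi(M, D(t + tau))
    return w1 * score_now - w2 * (score_now - score_later)
```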
Figure 2: A schematic view of the learning problem during different CD types and possible remedy with ensemble
ML model obtained using an initial search process. The distributions over time are shown as mean ±standard
deviation of some random variable, as reflected on the y-axis. The x-axis indicates steps in time. In this example, a
shift CD has occurred between the third and fourth steps in time. In addition, a moving CD has occurred between
the sixth and tenth steps in time. A possible ensemble model to tackle this condition would detect the shift and
moving CDs and use three models, one to capture the original data between the CDs, a second model that takes
into account the recent, seemingly stable, data with some ”tail” of the moving CD, and a third model based only on the recent time without CD.
4.2 Genetic algorithm based solution
One can solve Eq. (3) in multiple ways. Naively, assuming a finite number of ML models and a maximum number
of models in the population, one can theoretically brute force the optimization task. Nonetheless, due to the
extremely large set of possible solutions, such an approach is infeasible in practice [86]. In an opposite manner,
one can try this optimization task analytically, however, without further assumptions over either M,D(t), or ψit
seems infeasible to obtain such a solution. Thus, one can use a heuristic approach to solve Eq. (3). In this study,
we suggest an adoption of the classical GA for this task.
Formally, we assume that the population of ML models, $M$, is defined by both the number of models, $n$, as well as the models themselves. Each ML model, $M_i \in M$, is constructed from a feature engineering algorithm, a supervised ML algorithm, and a hyperparameters tuning algorithm, each of which is chosen from a pre-defined and finite set of algorithms. In addition, each ML pipeline model is affected by the data it is provided with during the training and testing phases. As such, each model should be provided with a non-empty subset, $d_i \subset D(t)$, that is used to train the ML pipeline model. Based on this representation, solving Eq. (3) would optimize $\psi(M, D(t))$ while not considering $\psi(M, D(t)) - \psi(M, D(t+\tau))$, as no remedy for the change of $D(t)$ over time is considered. Consequently, one should include a CD detection algorithm for each ML pipeline model as well as for the entire population. As such, $M$ can be represented by $M := \{(f_g, m_g, h_g, D(t), cd_g), n, (f_1, m_1, h_1, d_1, cd_1), \dots, (f_{n-1}, m_{n-1}, h_{n-1}, d_{n-1}, cd_{n-1}), (f_n, m_n, h_n, d_n, cd_n)\}$ such that $f_i \in F$ is the feature engineering algorithm, $m_i \in M$ is the supervised ML algorithm, $h_i \in H$ is the hyperparameters tuning algorithm, $d_i \subset D(t)$ is the subset of the data used to train $(f_i, m_i, h_i)$, and $cd_i \in CD$ is the CD detection algorithm. In addition, $(f_g, m_g, h_g, cd_g)$ is the global ML pipeline model, which gets as input the output of $(f_i, m_i, h_i)$ and $cd_i$ for each $i \in [1, \dots, n]$ and makes the final prediction of the $M$ model.
To this end, we propose a genetic-based algorithm for finding a population of ML models to handle CD. The algorithm works as follows. First, a population of $M$ solutions (i.e., a population of ML pipeline model populations) is generated at random such that each ML pipeline is chosen at random with a uniform distribution. Specifically, the ML pipelines as well as the CD detector algorithms are chosen at random with a uniform distribution. However, the $d_i$ are chosen by picking indexes $t_1$ and $t_2$ with a Poissonian distribution decaying from the latest sample in $D(t)$ to the first one. In addition, the performance of the best solution from $P_0$, as defined by Eq. (2), is computed. Now, for $\phi \in \mathbb{N}$ generations, three operations take place, the mutation, crossover, and selection operators, to generate the next-generation population $P_{i+1}$. If some $M \in P_i$ is found to be better than the best-performing $M$ so far, it becomes the best $M$. The best-performing $M$ during the entire process is returned as the answer of the model. The mutation operator is stochastically employed, for each $M \in P_i$, with probability $\xi \in [0, 1]$. First, we randomly decide whether to mutate the global ML model or one of the inner ML models in the population, with probability $\zeta \in [0, 1]$. Either way, we replace one of the components of the ML pipeline model with another one, in a uniformly distributed manner. For the case of $d_i$, the start index ($t_1$) or the end index ($t_2$) is altered: first, the start or end index is randomly picked (with equal probability), and then a rounded value $x$, distributed normally with $\mu \in \mathbb{R}^+$ and $\sigma \in \mathbb{R}^+$ as the mean and standard deviation of the distribution, is added to it. Cross-over is employed for two $M$ - $M_a$ and $M_b$ - in population $P_i$ with the goal of creating two next-generation $M_a$ and $M_b$. We randomly choose a split size $1 < s < \min(|M_a|, |M_b|)$, and use it to split both ML populations, each into two random subsets - one of size $s$ and one of size $|M_a| - s$ and $|M_b| - s$, respectively, i.e., $M_a = M_a^s \cup M_a^{|M_a|-s}$ and $M_b = M_b^s \cup M_b^{|M_b|-s}$. The cross-over then unifies complementing subsets from $a$ and $b$, creating $M_{ab}$ and $M_{ba}$ as follows: $M_{ab} = M_a^s \cup M_b^{|M_b|-s}$, $M_{ba} = M_b^s \cup M_a^{|M_a|-s}$. The cross-over operation is performed over the entire population $P_i$: $P_i$ is first split into disjoint pairs of $M$, and then the cross-over is performed on each such pair. Last, after employing mutation and cross-over, we employ the selection operator which forms the next-generation population $P_{i+1}$. We use the royalty tournament operator [50], which selects the best $\alpha \in [0, 1]$ fraction of the ML populations from $P_i$ according to the fitness function $L(M, D)$. The rest of the ML populations are sampled (with repetitions) according to their fitness score, i.e., with probability $p_{select}(M) = \frac{L(M, D)}{\sum_{M' \in P_i} L(M', D)}$.
A pseudo-code representation of the proposed algorithm is presented in Algo 1. Fig. 3 provides a schematic view
of Algorithm 1.
Algorithm 1 Genetic algorithm for a population of ML models during concept drift
1: Input: dataset ($D$), performance measuring metric ($\psi$), event horizon ($\tau$)
2: Output: population of ML models ($M$)
3: $P_1 \leftarrow$ generate ($n_t \in [2, N]$) ML pipelines at random
4: $L_{best} \leftarrow M \in P_1 : \max(L(M, D))$
5: for generation $i \in [1, \dots, \phi]$ do
6:     $P_i \leftarrow$ MutationOperator($P_i, D, \psi, \tau$)
7:     $P_i \leftarrow$ CrossoverOperator($P_i, D, \psi, \tau$)
8:     $P_{i+1} \leftarrow$ SelectionOperator($P_i, D, \psi, \tau$)
9:     if $\exists M \in P_i : \max(L(M, D)) > L_{best}$ then
10:        $L_{best} \leftarrow \max_M(L(M, D))$
11:    end if
12: end for
13: return $\arg\max_{M \in P_i} \max(L(M, D))$
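The following Python skeleton sketches the main loop of Algorithm 1 under simplifying assumptions: candidate solutions are opaque objects manipulated by user-supplied mutate, crossover, and select callbacks, and the bookkeeping mirrors the population-level description above; it is a structural sketch rather than the full implementation.

```python
import random

def genetic_search(init_population, fitness, mutate, crossover, select,
                   generations=50, mutation_prob=0.2):
    """Skeleton of Algorithm 1: evolve a population of ML-model populations (M)
    and return the best candidate found according to the fitness function L."""
    population = list(init_population)
    best = max(population, key=fitness)
    best_score = fitness(best)

    for _ in range(generations):
        # Mutation: each candidate M is perturbed with probability mutation_prob.
        population = [mutate(m) if random.random() < mutation_prob else m
                      for m in population]

        # Crossover: split the population into disjoint pairs and recombine each pair.
        random.shuffle(population)
        offspring = []
        for a, b in zip(population[0::2], population[1::2]):
            offspring.extend(crossover(a, b))   # returns the two children M_ab, M_ba

        # Selection: keep the top candidates and resample the rest by fitness.
        population = select(offspring, fitness)

        # Track the best candidate seen so far.
        candidate = max(population, key=fitness)
        candidate_score = fitness(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score

    return best
```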
[Figure 3 schematic: the inputs (dataset D, performance metric ψ, event horizon τ) feed an initial population generated from a pre-defined library of F, M, H, D, and CD components; the mutation, crossover, and selection (royalty plus fitness-based sampling) operators then produce P_{i+1} from P_i, and the best solution is returned as the result.]
Figure 3: A schematic view of Algorithm 1.
4.3 Improved version using the divide and conquer approach
One can notice that the proposed model is a mixture of finding the optimal ML pipeline, $(f_i, m_i, h_i)$, for each ML model in the population given a subset of the dataset, $d_i$; finding the optimal subset of data and CD model, $(d_i, cd_i)$, for each ML pipeline; and finding the global ML model as well as the global CD model. Technically, the proposed model seeks to solve three dependent optimization tasks.

Notably, the CD model for each ML pipeline in the ensemble does not have an effect on the rest of the components in the ML pipeline and only plays a role as an input for the global ML model. As such, once a subset of the dataset, $d_i$, is determined, the optimization task of each ML pipeline in the ensemble is independent. Moreover, once all ML models in the ensemble are obtained, it defines a single optimization task to find the global ML pipeline. To this end, one can treat the finding of the CD models in the ensemble ($\forall i: cd_i$) as a constraint as part of the feature engineering component in the global model ($f_g$). Hence, the proposed model can be re-arranged as follows. First, divide the dataset, $D(t)$, into subsets provided to the ML pipeline models in the ensemble. For each data subset ($d_i$), find the optimal ML pipeline model $(f_i, m_i, h_i)$. Finally, use these models, repeating the same process to find an optimal global ML pipeline model together with an adjusted CD model.

In order to solve this representation of the model, the second and third tasks can be solved using an AutoML model designed especially for this task. For the data splitting task, we use Algo 1 with a slight modification in which the algorithm assumes each model in the ensemble is defined only by its $d_i$ component.
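A sketch of the re-arranged, divide-and-conquer procedure under illustrative assumptions: the data subsets $d_i$ are taken as given (they would be produced by the modified Algo 1 splitter), each ensemble member is fitted independently with an AutoML search (TPOT is used here as an example), and the global model is fitted on the column-stacked member predictions.

```python
import numpy as np
from tpot import TPOTRegressor

def fit_divide_and_conquer(subsets, X_global, y_global):
    """subsets: list of (X_i, y_i) pairs chosen by the data-splitting step;
    returns the fitted ensemble members and the global aggregator model."""
    members = []
    for X_i, y_i in subsets:
        member = TPOTRegressor(generations=3, population_size=10, verbosity=0)
        member.fit(X_i, y_i)   # independent AutoML search per data subset d_i
        members.append(member)

    # The global model learns from the members' predictions on shared data.
    member_predictions = np.column_stack([m.predict(X_global) for m in members])
    global_model = TPOTRegressor(generations=3, population_size=10, verbosity=0)
    global_model.fit(member_predictions, y_global)
    return members, global_model
```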
5 Experimental Setup
Evaluating the proposed algorithm requires three components: dataset, baseline comparison, and performance
evaluation metric. In this section, we outline these components, which will be used in the following section to assess
the proposed algorithm.
5.1 Dataset curation
In order to evaluate the proposed model, one is required to obtain a statistically large set of cases with CD in
various levels and combinations. Moreover, as different CD can happen in multiple ways in parallel for different
subsets (usually features) of the dataset, one should include this representation in the model. Due to the multiple
moving parts and the challenge of detecting CD accurately [87], we used synthetic datasets for our analysis.
Intuitively, when developing an ML model some dataset is already established based on data gathered, marked by
D(0). This can be considered the initial (or first) phase of the data curation process. At this point, we can assume
no CD is present and there is some connection between the target and source features. After this point, as part
of the second phase, more data streams to the dataset over time with discrete steps. At each step in time, there is
either a CD event or not. If no CD is present, more data, according to some distribution (which is unknown for the
ML model) is generated at random and added to the dataset, $D(t)$, at time $t$. Otherwise, a concept drift event is associated with the change of a subset (or all) of the features' distributions of the dataset during a period of $\Delta t \in \mathbb{N}$. In addition, CD can also introduce a change to the connection between the source and target features which, for the ML model, is reflected as noise. Fig. 4 shows a schematic view of the dataset generation process for the experiments.
[Figure 4 schematic: Phase 1 builds the initial dataset D(t=0) by sampling each feature f_1, ..., f_m for samples S_1, ..., S_n from randomly chosen distributions and computing the target f_t with a random formula; Phase 2 maps D(t=i) to D(t=i+1), either by sampling new rows from the current distributions (regular step in time) or by sampling from drifted distributions with added noise on the target (concept drift event).]
Figure 4: A schematic view of a dataset generation process for the experiments.
Formally, a dataset ($D(t)$) is generated as follows. First, the initial data, $D(0)$, is generated, followed by a function $F: D(t) \to D(t+1)$ which accepts the dataset at a point in time, $t$, and returns the same dataset for the next point in time, $t+1$. For the initial data, a random number of samples, $s \in \mathbb{N}$, and features, $f \in \mathbb{N}$, is picked at random from some pre-defined distribution. Then, for each feature, a random distribution is chosen from a pre-defined set of known distributions, and $s$ samples are obtained from it. In order to obtain a meaningful regression task, the target feature of each dataset is obtained by defining a random function (constructed from a combination of polynomial, exponential, trigonometric, logarithmic, and step-wise functions) which gets as an input the other features of the dataset. To be exact, the function is constructed by randomly picking a topology for an expression tree where each leaf node is a feature from the dataset and each decision node is a function from a pre-defined set of functions [88]. For the function, $F$, a sequence of CD events is set alongside a default behavior. For the default behavior, which is used when there are no CD events for a specific step in time, a random number $\zeta \in \mathbb{N}$ of new samples is generated for each feature using the current distribution associated with each feature, and then the target feature is computed for these samples using the current formula. In a complementary manner, a CD event at some time changes the distribution of a subset (or all) of the features in the dataset over a period of $\Delta t$ steps in time as well as changing the formula used to calculate the target feature.
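A minimal sketch of the second phase of this generation process: at each step, new samples are either drawn from the current feature distributions and passed through the current target formula (regular step), or drawn from gradually drifting distributions with a drifting formula (CD event). The specific distributions, the two hand-written target formulas, and the step counts are illustrative stand-ins for the randomly drawn ones described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def formula_before(X):  # stand-in for the random expression-tree formula
    return X[:, 0] ** 2 + np.sin(X[:, 1])

def formula_after(X):   # the formula after the CD event
    return np.exp(0.3 * X[:, 0]) - X[:, 1]

def step(t, drift_start=20, drift_len=10, samples_per_step=50):
    """Samples added to D(t) at step t, with a moving CD occurring
    between drift_start and drift_start + drift_len."""
    alpha = np.clip((t - drift_start) / drift_len, 0.0, 1.0)  # 0 before the CD, 1 after it
    # Feature distributions interpolate from Normal(0, 1) to Normal(2, 1.5) during the CD.
    X = rng.normal(loc=2.0 * alpha, scale=1.0 + 0.5 * alpha, size=(samples_per_step, 2))
    y = (1 - alpha) * formula_before(X) + alpha * formula_after(X)
    return X, y
```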
In order to obtain a robust representation of an algorithm’s performance, we generated multiple datasets for
each explored configuration. The hyperparameters used as part of the data generation process are summarized in
Table 1.
| Hyperparameter | Description | Value range |
| --- | --- | --- |
| $n, m$ | The number of samples and features in the initial dataset | $n \in [10^2, 10^5]$, $m \in [3, 50]$ |
| $dist_i$ | The distribution of a feature in the dataset | Normal, Exponential, Binomial, Geometric, Benford, Uniform |
| $\eta$ | Formula topology's size | $[n, 10n]$ |
| $\tau$ | Formula functions | $+$, $-$, $\times$, exp, log, inv, sin, scalar, step-wise |

Table 1: The hyperparameters used as part of the data generation with their value ranges.
For the experiments, we used the ML components available in the Scikit-learn library [81]. In addition, we
used the CD algorithms provided by the Frouros library [89].
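The following sketch shows how a single ensemble member (a Scikit-learn regressor paired with its own CD detector) can be retrained when drift is flagged. For brevity, a simple error-threshold detector replaces the Frouros detectors used in the experiments; the threshold factor, window size, and retraining window are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class SimpleDriftDetector:
    """Illustrative stand-in for a streaming CD detector: flags drift when the
    recent mean error exceeds the baseline error by a fixed factor."""
    def __init__(self, baseline_error, factor=2.0, window_size=100):
        self.baseline_error = baseline_error
        self.factor = factor
        self.window_size = window_size
        self.errors = []

    def update(self, error):
        self.errors.append(error)
        self.errors = self.errors[-self.window_size:]
        return np.mean(self.errors) > self.factor * self.baseline_error  # True -> drift

def run_stream(stream, X0, y0):
    """Retrain-on-drift loop for one ensemble member and its CD detector."""
    model = RandomForestRegressor(random_state=0).fit(X0, y0)
    detector = SimpleDriftDetector(np.mean((model.predict(X0) - y0) ** 2))
    X_hist, y_hist = X0, y0
    for X_new, y_new in stream:  # (X, y) batches arriving over time
        error = np.mean((model.predict(X_new) - y_new) ** 2)
        X_hist = np.vstack([X_hist, X_new])
        y_hist = np.concatenate([y_hist, y_new])
        if detector.update(error):  # CD flagged -> retrain on the most recent data
            model = RandomForestRegressor(random_state=0).fit(X_hist[-1000:], y_hist[-1000:])
            detector = SimpleDriftDetector(baseline_error=error)
    return model
```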
5.2 Baseline algorithms
For the comparison of the proposed algorithm with other configurations, we establish three baseline models. First, a single-random model where only a single ML pipeline and CD model are chosen at random. Second, a multi-
random model where the proposed ML population with global ML model is used but each of them is picked
at random. Third, a single ML pipeline with a CD algorithm which is obtained using TPOT and brute force,
respectively. In addition, the proposed model and its improvement are considered in the evaluation as the fourth
and fifth candidates, respectively.
5.3 Performance metric
Assessing the performance of the overall model requires measuring the performance of the model on various tasks
over time. To this end, for a specific dataset, $D(t)$, and a time frame of interest, $[t_0, t_1]$ ($t_1 > t_0$), the performance of the model is defined to be the average performance over all steps in time. Formally, the ML pipeline with CD adaptation model's performance is defined to be [90]:

$$\theta(M) := \frac{1}{(t_1 - t_0)N} \sum_{i=0}^{N} \sum_{j=t_0}^{t_1} \psi\big(M, D_j(i)\big) \quad (4)$$
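A minimal reading of Eq. (4) as code, assuming that the normalization simply turns the double sum into an average over all evaluated time steps of all $N$ generated datasets; the snapshot container is an illustrative assumption.

```python
import numpy as np

def overall_performance(model, dataset_snapshots, psi):
    """Eq. (4): average psi over every time step of every generated dataset.
    dataset_snapshots[i][j] holds dataset i at time step j within [t0, t1]."""
    scores = [psi(model, snapshot)
              for snapshots in dataset_snapshots
              for snapshot in snapshots]
    return float(np.mean(scores))
```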
6 Results
In this section, we outline the results of the experiments. First, we compare the proposed model to other single-
and multi- ML pipeline solutions. Afterward, we explore the robustness and sensitivity of the best solution to the
concept drift rate, dataset’s size, and dataset’s complexity.
6.1 Performance comparison
Table 2 presents the comparison between the five models with the metric presented in Eq. (2) for four cases - shift,
moving, mixed, and random CD. For the shift concept drift, the datasets are generated with a single CD starting
at a random point in time and have a random drift rate between 0.1 and 0.2, chosen in a uniformly distributed manner. Similarly, the moving CD case is identical to the shift case but with a drift rate between 0.01 and 0.02. For the mixed case, a random number of CDs can occur, ranging between 2 and 10, with the constraint that at least one would be a shift CD and another a moving CD. Finally, the random case is like the mixed case but without any constraint on the CD type. For each case, we used $n = 1000$ datasets.
| Model | Shift | Moving | Mixed | Random |
| --- | --- | --- | --- | --- |
| Single-random | 0.59 (0.13) | 0.65 (0.11) | 0.47 (0.18) | 0.48 (0.18) |
| Multi-random | 0.63 (0.10) | 0.69 (0.08) | 0.56 (0.15) | 0.58 (0.16) |
| TPOT | 0.68 (0.09) | 0.75 (0.06) | 0.52 (0.16) | 0.51 (0.16) |
| Proposed | 0.74 (0.07) | 0.80 (0.06) | 0.64 (0.11) | 0.64 (0.11) |
| Proposed improved | 0.76 (0.06) | 0.81 (0.05) | 0.67 (0.10) | 0.67 (0.10) |

Table 2: Comparison of different ML pipelines with CD detection models for four different CD cases - shift, moving, mixed, and random. The results are shown as the mean with the standard deviation in brackets of Eq. (2).
6.2 Sensitivity analysis
In order to evaluate the proposed model’s performance in different settings, we explore the performance of both the
proposed model and its improved version on different concept drift rates, dataset sizes, and dataset’s complexity.
For the dataset's complexity, we adopted the metric proposed by [91], which associates the dataset's complexity with its non-linearity, measured by using the $1 - R^2$ value obtained from a linear regression model trained on the dataset.
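For concreteness, a minimal sketch of this complexity proxy, assuming a standard Scikit-learn linear regression whose $R^2$ is computed on the same data it was fitted on:

```python
from sklearn.linear_model import LinearRegression

def dataset_complexity(X, y):
    """Non-linearity proxy: 1 - R^2 of a linear regression fitted on the
    dataset (higher values indicate a more 'complex' dataset)."""
    return 1.0 - LinearRegression().fit(X, y).score(X, y)
```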
6.2.1 Drift rate
Table 3 summarizes the performance, in terms of Eq. (2), as the mean of $n = 1000$ cases for each drift rate. One
can notice an average decrease in the performance of the model as the drift rate increases. Nonetheless, for a drift
rate of 0.2, the performance of both models slightly increases as the shift CD is clearer and easier to detect by the
CD algorithms in the ensemble. Importantly, we allow between 2 and 10 CDs for each sample such that all of
them have the same drift rate.
| Model \ Drift rate | 0.01 | 0.025 | 0.05 | 0.075 | 0.1 | 0.15 | 0.2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Proposed | 0.81 (0.06) | 0.80 (0.06) | 0.79 (0.06) | 0.77 (0.07) | 0.73 (0.07) | 0.74 (0.07) | 0.75 (0.06) |
| Proposed improved | 0.81 (0.05) | 0.81 (0.05) | 0.80 (0.06) | 0.77 (0.06) | 0.75 (0.06) | 0.76 (0.06) | 0.77 (0.06) |

Table 3: Sensitivity analysis for the proposed model and its improved version in terms of drift rate.
6.2.2 Dataset size
Table 4 summarizes the performance as the mean of $n = 1000$ cases for datasets with different initial sizes and growth rates. The growth rate is the number of new samples added to the dataset in each step in time. For this analysis, we used the random CD case (see Section 6.1). The analysis shows that for datasets with relatively large initial sizes (above $10^4$), the performance over the growth rate is more stable for both models. However, for relatively small initial sizes (below $10^3$), the growth rate has the dominant effect on the model's performance. Overall, more data tends to be more beneficial for the proposed model.
6.2.3 Dataset’s complexity
Table 5 summarizes the performance as the mean of $n = 1000$ cases for datasets with different complexity levels. For this analysis, we used the random CD case (see Section 6.1). This analysis shows that while more "complex" datasets have lower results in absolute terms, due to the poorer performance of ML in general, the proposed model is only slightly negatively affected by the dataset's complexity.
| Model | Initial size \ growth rate | $10^0$ | $10^1$ | $10^2$ | $10^3$ |
| --- | --- | --- | --- | --- | --- |
| Proposed | $10^2$ | 0.42 (0.18) | 0.44 (0.16) | 0.50 (0.14) | 0.57 (0.13) |
| Proposed | $10^3$ | 0.49 (0.17) | 0.49 (0.15) | 0.53 (0.14) | 0.57 (0.13) |
| Proposed | $10^4$ | 0.62 (0.12) | 0.62 (0.12) | 0.62 (0.12) | 0.63 (0.11) |
| Proposed | $10^5$ | 0.64 (0.11) | 0.64 (0.11) | 0.64 (0.11) | 0.64 (0.11) |
| Proposed improved | $10^2$ | 0.44 (0.19) | 0.45 (0.18) | 0.49 (0.17) | 0.59 (0.12) |
| Proposed improved | $10^3$ | 0.49 (0.17) | 0.49 (0.17) | 0.50 (0.16) | 0.59 (0.12) |
| Proposed improved | $10^4$ | 0.65 (0.12) | 0.65 (0.12) | 0.65 (0.11) | 0.67 (0.10) |
| Proposed improved | $10^5$ | 0.67 (0.10) | 0.67 (0.10) | 0.67 (0.10) | 0.67 (0.10) |

Table 4: Sensitivity analysis for the proposed model and its improved version for different initial size and growth rates.
| Model | Comparison | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
| --- | --- | --- | --- | --- | --- | --- |
| Proposed | Absolute | 0.71 (0.11) | 0.78 (0.07) | 0.81 (0.06) | 0.80 (0.05) | 0.79 (0.06) |
| Proposed | Relative | 0.89 (0.04) | 0.88 (0.04) | 0.90 (0.03) | 0.89 (0.04) | 0.90 (0.04) |
| Proposed improved | Absolute | 0.73 (0.09) | 0.81 (0.05) | 0.84 (0.05) | 0.82 (0.06) | 0.81 (0.05) |
| Proposed improved | Relative | 0.92 (0.03) | 0.90 (0.04) | 0.90 (0.04) | 0.90 (0.04) | 0.91 (0.04) |
Table 5: Sensitivity analysis for the proposed model and its improved version in terms of the dataset's complexity. The absolute comparison does not account for the ML pipeline performance in the absence of CD, while the relative one divides the result of Eq. (2) by $\psi(M, D(0))$.
6.2.4 AutoML library usage
Table 6 summarizes the performance as the mean of $n = 1000$ cases where the improved proposed model is utilized with different AutoML libraries. For this analysis, we used the random CD case (see Section 6.1). One can notice that all four examined libraries produced similar results, with TPOT and AutoGluon being the best and worst, respectively, on average.
| Model \ AutoML library | TPOT | AutoSklearn | AutoGluon | PyCaret |
| --- | --- | --- | --- | --- |
| Proposed improved | 0.67 (0.10) | 0.65 (0.09) | 0.64 (0.11) | 0.64 (0.09) |

Table 6: Sensitivity analysis for the improved version of the proposed model in terms of the AutoML library used.
7 Discussion
In this study, we investigated the usage of ensemble ML models to handle CD in datasets that increase over time.
In particular, we proposed a novel instance of the GA approach for an ML pipeline with CD detection algorithm
ensemble. Using this structure, the overall model is more robust for different CD events. Moreover, we show that
one can further improve the proposed algorithm by integrating existing off-the-shelf automatic ML approaches.
Overall, the proposed model allows the use of existing ML and CD algorithms to bolster the resilience of ML models in the face of CD in realistic scenarios.
To be exact, the two-level ML model proposed in this study holds promise in addressing the dynamic challenges posed by CD by introducing a global ML model with a CD detector, operating as an ensemble model for a population of ML pipeline models, each of which can also be adapted by its own adjusted CD detection algorithm. Hence, this study takes a novel approach compared to the one-model-to-rule-them-all approach currently governing the field [92, 93]. Simply put, the ensemble structure utilized by the proposed model allows individual ML models to autonomously adapt to subset-specific changes while feeding forward the prediction of each model and an indication of how well it is operating, as provided by its adjusted CD detector, showcasing the adaptability of GAs in governing diverse data subsets.
Indeed, Table 2 shows that the proposed solution provides a more robust solution, on average, over a large number of datasets compared to a single ML pipeline with a CD algorithm obtained using an AutoML library (such as TPOT) and a brute-force search over multiple (12) CD algorithms. For the most difficult and realistic settings, where the number as well as the nature of the CD are unknown (i.e., the random case), the proposed model improves the performance of the ML model by up to 0.13 while also reducing the diversity in the results between datasets from 0.16 to 0.11, indicating more robust and consistent results across different datasets.
Moreover, we explore the performance of the proposed model over different settings. First, Table 3 shows that the proposed model performs better on moving CD than on shift CD. This outcome aligns with the behavior of other solutions [94, 95]. Second, Table 4 reveals that the proposed model performs worse on small-size datasets, which is also a well-known phenomenon in the realm of ML [96, 97]. However, once enough data is available, the proposed model performs similarly. To this end, Table 5 continues the same line where more “complex” datasets resulted in worse performance in absolute terms, as the underlying ML pipeline models perform worse. Nonetheless, in relative terms, the complexity of the dataset does not play much of a role in the context of handling CD. Finally, Table 6 shows that the improved version of the proposed model is only slightly affected by the AutoML library used, at least among the popular AutoML libraries currently available, and one can choose one's preferred AutoML library.
This research is not without limitations. First, the proposed solution is evaluated on synthetic data due to
the challenge and resources required to find real-world cases of CD in large numbers and for a wide range of
CD behaviors. Hence, the proposed results should be taken with caution and future work should re-evaluate the
proposed method using a large number of real-world CD datasets. Second, the proposed method provides each ML model in the population with a subset of the dataset, $D(t)$, which is continuous, ignoring the more generic cases where one can take the union of several continuous subsets of $D(t)$, which can improve the performance of the ML model alone and the performance of the ensemble as a whole. One can investigate the contribution of such an extension to the proposed method's performance. Third, GAs in general, and for the discussed case in particular, often require tuning several hyperparameters, such as the population size, crossover rate, and mutation rate, to achieve their optimal performance [98, 99]. In this study, we chose such hyperparameter values through a manual trial-and-error approach which probably does not result in the optimal values. Future studies may explore the sensitivity of the proposed GA-based
approach to variations in these parameters and find the optimal hyperparameter values with respect to assumptions
on the CD dynamics or the data itself. Fourth, the presented experiments focused on regression tasks, ignoring
classification tasks. Extending the evaluation to classification tasks can shed more light on the performance of
the proposed model in a wider context. Finally, following a more general trend [100, 101, 102, 103, 104], one
can reduce the search space of the proposed task and therefore improve the proposed algorithm by introducing
knowledge about which ML models are appropriate to which types of CD and the best matching between an ML
model and a CD detector.
This study marks a significant stride in fortifying ML models against the persistent challenges posed by CD through a two-level ensemble of ML models working together with CD detectors, obtained via the application of genetic algorithms (GAs). The adaptability demonstrated, particularly in discrete spaces where traditional optimization
methods face limitations, highlights the promising role of GAs in addressing the nuanced demands of evolving
data distributions. While the research outcomes are encouraging, avenues for refinement and future exploration are
recognized. Extending the generalizability across diverse domains, conducting a more comprehensive sensitivity
analysis, and optimizing computational complexity for broader applicability should be key considerations. In
essence, this research contributes to the ongoing discourse on adaptive learning systems, leveraging GAs in an
ensemble ML context to navigate the challenges presented by dynamic data landscapes.
Declarations
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit
sectors.
Conflicts of interest/Competing interests
None.
Materials availability
The materials used for this study are available from the author upon reasonable request.
References
[1] T. Lazebnik. Data-driven hospitals staff and resources allocation using agent-based simulation and deep
reinforcement learning. Engineering Applications of Artificial Intelligence, 126:106783, 2023.
[2] T. Lazebnik, Z. Bahouth, S. Bunimovich-Mendrazitsky, and S. Halachmi. Predicting acute kidney injury
following open partial nephrectomy treatment using sat-pruned explainable machine learning model. BMC
Medical Informatics and Decision Making, 22:133, 2022.
[3] E. Savchenko and T. Lazebnik. Computer aided functional style identification and correction in modern
russian texts. Journal of Data, Information and Management, 4:25–32, 2022.
[4] L. Shami and T. Lazebnik. Implementing machine learning methods in estimating the size of the non-
observed economy. Computational Economics, 2023.
[5] A. Oren, J. D. Turkcu, S. Meller, T. Lazebnik, P. Wiegel, R. Mach, H. A. Volk, and A. Zamansky.
Brachysound: machine learning based assessment of respiratory sounds in dogs. Scientific Reports, 4:25–
32, 2022.
[6] E. Savchenko, A. Rosenfeld, and S. Bunimovich-Mendrazitsky. Mathematical modeling of bcg-based blad-
der cancer treatment using socio-demographics. Scientific Reports, 13:18754, 2023.
[7] I. Žliobaitė, M. Pechenizkiy, and J. Gama. Big Data Analysis: New Algorithms for a New Society, volume 16. Springer, 2016.
[8] J. M. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation.
ACM Computing Surveys, 46(4):1–37, 2014.
[9] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang. Learning under concept drift: A review. IEEE
Transactions on Knowledge and Data Engineering, 31(12):2346–2363, 2019.
[10] P. M. Goncalves, S. G. T. de Carvalho Santos, R. S. M. Barros, and D. C. L. Vieira. A comparative study
on concept drift detectors. Expert Systems with Applications, 41(18):8144–8156, 2014.
[11] M. Harel, S. Mannor, R. El-Yaniv, and K. Crammer. Concept drift detection through resampling. In
E. P. Xing and T. Jebara, editors, Proceedings of the 31st International Conference on Machine Learning,
volume 32, pages 1009–1017, 2014.
[12] S. Wang, L. L. Minku, D. Ghezzi, D. Caltabiano, P. Tino, and X. Yao. Concept drift detection for online
class imbalance learning. In The 2013 International Joint Conference on Neural Networks (IJCNN), pages
1–10, 2013.
[13] P. Wang, H. Yu, N. Jin, D. Davies, and W. L. Woo. Quadcdd: A quadruple-based approach for understanding
concept drift in data streams. Expert Systems with Applications, 238:122114, 2024.
[14] Q. Xiang, L. Zi, X. Cong, and Y. Wang. Concept drift adaptation methods under the deep learning frame-
work: A literature review. Applied Sciences, 13(11), 2023.
[15] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation.
ACM Computing Surveys (CSUR), 46, 2014.
[16] L. Yang, S. McClean, M. Donnelly, K. Burke, and K. Khan. Detecting and responding to concept drift in
business processes. Algorithms, 15(5), 2022.
[17] J. H. Holland. Genetic algorithms. Scientific American, 267(1):66–73, 1992.
[18] M. Kumar, M. Husain, N. Upreti, and D. Gupta. Genetic algorithm: Review and application. International
Journal of Information Technology and Knowledge Management, 2(2):451–454, 2010.
[19] L. Bo and L. Rein. Comparison of the luus–jaakola optimization procedure and the genetic algorithm.
Engineering Optimization, 37(4):381–396, 2005.
[20] A. Ghaheri, S. Shoar, M. Naderan, and S. S. Hoseini. The applications of genetic algorithms in medicine.
Oman Med J., 30(6):406–416, 2005.
[21] J. Zhao and M. Xu. Fuel economy optimization of an atkinson cycle engine using genetic algorithm. Applied
Energy, 105:335–348, 2013.
[22] B. R. Routledge. Genetic algorithm learning to choose and use information. Macroeconomic Dynamics,
5(2):303–325, 2001.
[23] A. E. Drake and R. Marks. Genetic algorithms in economics and finance: Forecasting stock market prices
and foreign exchange—a review. In SH. Chen, editor, Genetic Algorithms and Genetic Programming in
Computational Finance, pages 29–54. Springer, Boston, MA, 2002.
[24] R. S. Olson and J. H. Moore. Tpot: A tree-based pipeline optimization tool for automating machine learning.
In Workshop on Automatic Machine Learning, pages 66–74. PMLR, 2016.
[25] T. Worm and K. Chiu. Prioritized grammar enumeration: symbolic regression by dynamic programming.
In Proceedings of the 15th annual conference on Genetic and evolutionary computation, pages 1021–1028,
2013.
[26] L. Kammerer, G. Kronberger, B. Burlacu, S. M. Winkler, M. Kommenda, and M. Affenzeller. Symbolic
regression by exhaustive search: reducing the search space using syntactical constraints and efficient se-
mantic structure deduplication. In Genetic Programming Theory and Practice XVII, pages 79–99. Springer,
2020.
[27] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
[28] J. Shapiro. Genetic Algorithms in Machine Learning, pages 146–168. 2001.
[29] K. De Jong. Learning with genetic algorithms: An overview. Machine Learning, 3:121–138, 1988.
[30] J. W. Herrmann. A genetic algorithm for minimax optimization problems. In Proceedings of the 1999
Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), volume 2, pages 1099–1103, 1999.
[31] A. Sehgal, H. La, S. Louis, and H. Nguyen. Deep reinforcement learning using genetic algorithm for
parameter optimization. In 2019 Third IEEE International Conference on Robotic Computing (IRC), pages
596–601, 2019.
[32] R. Chen, B. Yang, S. Li, and S. Wang. A self-learning genetic algorithm based on reinforcement learning
for flexible job-shop scheduling problem. Computers & Industrial Engineering, 149:106778, 2020.
[33] A. J. Heppenstall, A. J. Evans, and M. H. Birkin. Genetic algorithm optimisation of an agent-based model
for simulating a retail market. Environment and Planning B: Urban Analytics and City Science, 2007.
[34] E. Padmalatha, C. R. K. Reddy, and B. P. Rani. Classification of concept-drifting data streams using
optimized genetic algorithm. International Journal of Computer Applications, 125(15), 2015.
[35] M. Smith and V. Ciesielski. Adapting to concept drift with genetic programming for classifying streaming
data. In 2016 IEEE Congress on Evolutionary Computation (CEC), pages 5026–5033, 2016.
[36] I. Zliobaite, M. Pechenizkiy, and J. Gama. An Overview of Concept Drift Applications, pages 91–114. Springer International Publishing, 2016.
[37] T. Lazebnik and S. Bunimovich-Mendrazitsky. The signature features of covid-19 pandemic in a hybrid mathematical model—implications for optimal work–school lockdown policy. Advanced Theory and Simulations, 4(5):e2000298, 2021.
[38] P. Chowdhury, S. K. Paul, S. Kaisar, and A. Moktadir. Covid-19 pandemic related supply chain studies:
A systematic review. Transportation Research Part E: Logistics and Transportation Review, 148:102271,
2021.
[39] I. N. Pujawan and A. U. Bah. Supply chains under covid-19 disruptions: literature review and research
agenda. Supply Chain Forum: An International Journal, 23(1):81–95, 2022.
[40] F. Maggi, W. Robertson, C. Kruegel, and G. Vigna. Protecting a moving target: Addressing web application
concept drift. In E. Kirda, S. Jha, and D. Balzarotti, editors, Recent Advances in Intrusion Detection, pages
21–40. Springer Berlin Heidelberg, 2009.
[41] S. Madireddy, P. Balaprakash, P. Carns, R. Latham, G. K. Lockwood, R. Ross, S. Snyder, and S. M. Wild.
Adaptive learning for concept drift in application performance modeling. In Proceedings of the 48th Inter-
national Conference on Parallel Processing, 2019.
[42] P. Vivekanandan and R. Nedunchezhian. Mining data streams with concept drifts using genetic algorithm.
Artificial Intelligence Review, 36:163–178, 2011.
[43] T. Buchgraber, D. Shutin, and H. V. Poor. A sliding-window online fast variational sparse bayesian learning
algorithm. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
pages 2128–2131, 2011.
[44] S. H. Bach and M. A. Maloof. Paired learners for concept drift. In 2008 Eighth IEEE International
Conference on Data Mining, pages 23–32, 2008.
[45] A. Bifet and R. Gavalda. Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining, pages 443–448, 2007.
[46] G-B. Huang, Q-Y. Zhu, and C-K. Siew. Extreme learning machine: Theory and applications. Neurocomputing, 70(1):489–501, 2006.
[47] F. Fdez-Riverola, E. L. Iglesias, F. Diaz, J. R. Mendez, and J. M. Corchado. Applying lazy learning algo-
rithms to tackle concept drift in spam filtering. Expert Systems with Applications, 33(1):36–48, 2007.
[48] N. Lu, J. Lu, G. Zhang, and R. Lopez de Mantaras. A concept drift-tolerant case-base editing technique.
Artificial Intelligence, 230:108–133, 2016.
[49] A. Sohail. Genetic algorithms in the fields of artificial intelligence and data sciences. Annals of Data
Science, 10:1007–1018, 2023.
[50] Z. W. Bo, L. Z. Hua, and Z. G. Yu. Optimization of process route by genetic algorithms. Robotics and
Computer-Integrated Manufacturing, 22:180–188, 2006.
[51] M. Salehi and A. Bahreininejad. Optimization process planning using hybrid genetic algorithm and intelli-
gent search for job shop machining. Journal of Intelligent Manufacturing, 22(4):643–652, 2011.
[52] L. Davis. Applying adaptive algorithms to epistatic domains. Proceedings of the international joint confer-
ence on artificial intelligence, pages 162–164, 1985.
[53] A. B. A. Hassanat and E. Alkafaween. On enhancing genetic algorithms using new crossovers. International
Journal of Computer Applications in Technology, 55(3), 2017.
[54] Y. Kaya, M. Uyar, and R. Tekin. A novel crossover operator for genetic algorithms: ring crossover. arXiv, 2011.
[55] J. Zhao and M. Xu. Fuel economy optimization of an atkinson cycle engine using genetic algorithm. Applied
Energy, 105:335–348, 2013.
[56] J. Zhao, M. Xu, M. Li, B. Wang, and S. Liu. Design and optimization of an atkinson cycle engine with the
artificial neural network method. Applied Energy, 92:492–502, 2012.
[57] B. R. Routledge. Genetic algorithm learning to choose and use information. Macroeconomic Dynamics,
5(2):303–325, 2001.
[58] A. Ariel, T. Lazebnik, and L. Shami. Microfounded tax revenue forecast model with heterogeneous popu-
lation and genetic algorithm approach. Computational Economics, 2023.
[59] T. Lazebnik. Cell-level spatio-temporal model for a bacillus calmette–guérin-based immunotherapy treatment protocol of superficial bladder cancer. Cells, 11(15), 2022.
[60] A. S. Iwashita and J. P. Papa. An overview on concept drift learning. IEEE Access, 7:1532–1547, 2019.
[61] G. H. F. M. Oliveira, R. C. Cavalcante, G. G. Cabral, L. L. Minku, and A. L. I. Oliveira. Time series
forecasting in the presence of concept drift: A pso-based approach. In 2017 IEEE 29th International
Conference on Tools with Artificial Intelligence (ICTAI), pages 239–246, 2017.
[62] S. Agrahari and A. K. Singh. Concept drift detection in data stream mining: A literature review. Journal of King Saud University - Computer and Information Sciences, 34(10):9523–9540, 2022.
[63] H. Ghomeshi, M. M. Gaber, and Y. Kovalchuk. Eacd: evolutionary adaptation to concept drifts in data
streams. Data Mining and Knowledge Discovery, 33:663–694, 2019.
[64] M. Tareq and E. A. Sundararajan. A new density-based method for clustering data stream using genetic
algorithm. Technology Reports of Kansai University, 62(11):6557–6572, 2020.
[65] C. Kuranga and N. Pillay. Genetic programming-based regression for temporal data. Genetic Programming
and Evolvable Machines, 22:297–324, 2021.
[66] M. Henke, E. Santos, E. Souto, and A. O. Santin. Spam detection based on feature evolution to deal with
concept drift. Journal of Universal Computer Science, 27(4):364–386, 2021.
[67] B. V. Dasarathy and B. V. Sheela. A composite classifier system design: concepts and methodology. Pro-
ceedings of the IEEE, 67(5):708–713, 1979.
[68] R. E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, 1990.
[69] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
[70] P. H. Swain and H. Hauska. The decision tree classifier: Design and potential. IEEE Transactions on
Geoscience Electronics, 15(3):142–147, 1977.
[71] I. Drori, Y. Krishnamurthy, R. Rampin, R. de Paula Lourenco, J. P. Ono, K. Cho, C. Silva, and J. Freire.
Alphad3m: Machine learning pipeline synthesis. arXiv, 2021.
[72] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma. A survey on ensemble learning. Front. Comput. Sci., 14(2):251–
258, 2020.
[73] D. Brzezinski and J. Stefanowski. Reacting to different types of concept drift: The accuracy updated
ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems, 25(1):81–94, 2014.
[74] V. W. Berger and Y. Zhou. Kolmogorov–Smirnov Test: Overview. John Wiley & Sons, Ltd, 2014.
[75] T. Lazebnik, T. Fleischer, and A. Yaniv-Rosenfeld. Benchmarking biologically-inspired automatic machine
learning for economic tasks. Front. Comput. Sci., 15(14):11232, 2023.
[76] Q. Yao, M. Wang, Y. Chen, W. Dai, Y-F. Li, W. W. Tu, Q. Yang, and Y. Yu. Taking human out of learning
applications: A survey on automated machine learning. arXiv, 2019.
[77] E. Nisioti, K. C. Chatzidimitriou, and A. L. Symeonidis. Predicting hyperparameters from meta-features in
binary classification problems. ICML 2018 AutoML Workshop, 2018.
[78] F. Pinto, V. Cerqueira, C. Soares, and J. Mendes-Moreira. autobagging: Learning to rank bagging workflows
with metalearning. arXiv, 2017.
[79] P. Molino, Y. Dudin, and S. S. Miryala. Ludwig: a type-based declarative deep learning toolbox. arXiv,
2019.
[80] T. Lazebnik and A. Rosenfeld. Fspl: A meta–learning approach for a filter and embedded feature selection
pipeline. International Journal of Applied Mathematics and Computer Science, 33(1):103–115, 2023.
[81] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[82] M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter. Auto-sklearn: Efficient and Robust Automated Machine Learning. In Automated Machine Learning: Methods, Systems, Challenges. Springer, 2019.
[83] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. Auto-sklearn 2.0: Hands-free automl
via meta-learning. arXiv, 2020.
[84] N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, and A. Smola. Autogluon-tabular: Robust
and accurate automl for structured data. arXiv, 2020.
[85] M. Ali. PyCaret: An open source, low-code machine learning library in Python, April 2020. PyCaret
version 1.0.
[86] K. Chauhan, S. Jani, D. Thakkar, R. Dave, J. Bhatia, S. Tanwar, and M. S. Obaidat. Automated machine
learning: The new wave of machine learning. In 2020 2nd International Conference on Innovative Mecha-
nisms for Industry Applications (ICIMIA), pages 205–212, 2020.
[87] J. Demsar and Z. Bosnic. Detecting concept drift in data streams using model explanation. Expert Systems
with Applications, 92:546–559, 2018.
[88] L. S. Keren, A. Liberzon, and T. Lazebnik. A computational framework for physics-informed symbolic
regression with straightforward integration of domain knowledge. Scientific Reports, 13:1249, 2023.
[89] J. Céspedes-Sisniega and A. López-García. Frouros: A python library for drift detection in machine learning systems. arXiv, 2022.
[90] F. Bayram, B. S. Ahmed, and A. Kassler. From concept drift to model degradation: An overview on
performance-aware drift detectors. Knowledge-Based Systems, 245:108632, 2022.
[91] A. Shmuel, O. Glockman, and T. Lazebnik. Symbolic regression as feature engineering method for machine
and deep learning regression tasks. arXiv, 2023.
[92] I. Zliobaite. Learning under concept drift: an overview. arXiv, 2010.
[93] H. Hu, M. Kantardzic, and T. S. Sethi. No free lunch theorem for concept drift detection in streaming data
classification: A review. WIREs Data Mining and Knowledge Discovery, 10(2):e1327, 2019.
[94] I. Khamassi, M. Sayed-Mouchaweh, M. Hammami, and K. Ghedira. Discussion and review on evolving
data streams and concept drift adapting. Evolving Systems, 9:1–23, 2018.
[95] E. Yu, Y. Song, G. Zhang, and J. Lu. Learn-to-adapt: Concept drift adaptation for hybrid multiple streams.
Neurocomputing, 496:121–130, 2022.
[96] G-J. Qi and J. Luo. Small data challenges in big data era: A survey of recent progress on unsupervised and
semi-supervised methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(4):2168–
2187, 2022.
[97] M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential
equations. Journal of Computational Physics, 357:125–141, 2018.
[98] M. Angelova and T. Pencheva. Tuning genetic algorithm parameters to improve convergence time. Inter-
national Journal of Chemical Engineering, 2011:646917, 2011.
[99] M. Mosayebi and M. Sodhi. Tuning genetic algorithm parameters using design of experiments. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, pages 1937–1944, 2020.
[100] G. Kronberger, F. O. de França, B. Burlacu, C. Haider, and M. Kommenda. Shape-constrained symbolic regression—improving extrapolation with prior knowledge. Evolutionary Computation, 30(1):75–98, 2022.
[101] R. Wu, Y. Fujita, and K. Soga. Integrating domain knowledge with deep learning models: An interpretable
ai system for automatic work progress identification of natm tunnels. Tunnelling and Underground Space
Technology, 105:103558, 2020.
[102] X. Pan and H. B. Shen. Rna-protein binding motifs mining with a new hybrid deep learning based cross-
domain knowledge integration approach. BMC Bioinformatics, 18:136, 2017.
[103] O. L. Liu, H-S. Lee, C. Hofstetter, and M. C. Linn. Assessing knowledge integration in science: Construct,
measures, and evidence. Educational Assessment, 13(1):33–55, 2008.
[104] A. Best, J. L. Terpstra, G. Moor, B. Riley, C. D. Norman, and R. E. Glasgow. Building knowledge in-
tegration systems for evidence-informed decisions. Journal of Health Organization and Management,
23(6):627–641, 2009.