This paper has been published in the SIGKDD Explorations Newsletter (December 2022). More and more applications require early decisions, i.e. decisions taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem at hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classification. This paper introduces a more general problem, called Machine Learning based Early Decision Making (ML-EDM), which consists in optimizing the decision times of models in a wide range of settings where data is collected over time. After defining the ML-EDM problem, ten challenges are identified and proposed to the scientific community to further research in this area. These challenges open important application perspectives, discussed in this paper.
Open challenges for Machine Learning based
Early Decision-Making research
Alexis Bondu1, Youssef Achenchabe1,2, Albert Bifet3,5, Fabrice Clérot1, Antoine Cornuéjols2, João Gama4, Georges Hébrail3, Vincent Lemaire1, and Pierre-François Marteau6
1Orange Labs, France
2Paris-Saclay University, AgroParisTech, France
3IPP, Télécom ParisTech, France
4University of Porto, Portugal
5University of Waikato, New Zealand
6Bretagne-Sud University, France
In numerous real situations, we have to make early decisions
in the absence of complete knowledge of the problem at hand.
For example, such decisions are necessary in medicine [54]
when a physician must make a diagnosis, possibly leading
to an urgent surgical operation, before having the results
of all medical tests. In such situations, the issue facing the
decision makers is that, most of the time, the longer the
decision is delayed, the clearer is the likely outcome (e.g.
the critical or not critical state of the patient) but, also,
the higher the cost that will be incurred if only because
decisions taken earlier allow one to be better prepared. We
thus seek to make decisions at times that seem to be the
best compromises between the earliness and the accuracy
of our decisions.
Similarly in Machine Learning, when the input data is ac-
quired over time, there can be situations with a trade-off be-
tween the earliness and accuracy of decisions. For instance,
this is the case for anomaly detection, predictive mainte-
nance, patient health monitoring, self-driving vehicles (see
Section 10). In each case, the decisions are time-sensitive
(e.g. in an autonomous car, it is critical to detect obstacles
on the road as early as possible and at the same time as reli-
ably as possible, in order to plan safe avoidance trajectories
if needed). In general, it is assumed that there is a gain of
information over time, i.e. delaying decisions tends to make
them more reliable (e.g. the certainty about the existence or
absence of an obstacle on the road becomes more and more
accurate as the car gets closer).
This earliness vs. accuracy dilemma is part of many decision
making scenarios, and is particularly involved in the problem
of Early Classification of Time Series (ECTS). But, as we
will see, it takes place in a larger perspective.
Early Classification of Time Series: a particular case
The ECTS problem consists in finding the optimal time to
trigger the class prediction of an input time series observed
over time. As successive measurements provide more and
more information about the incoming time series, ECTS al-
gorithms aim to optimize online the trade-off between the
earliness and the accuracy of their decisions.
More formally, the individuals¹ considered are time series of finite length $T$, during which a decision must be made. At testing time, the measurements of the incoming time series are received over time, and the history of measurements available at time $t$ is denoted by $\mathbf{x}_t = \langle x_1, \ldots, x_t \rangle$. It is assumed that each time series can be ascribed to some class $y \in \mathcal{Y}$, and the task is to make a prediction about the class of each incoming time series as early as possible, because a time-increasing cost must be paid when the decision is triggered. In the ECTS problem, a single decision is triggered for each incoming time series, which is irrevocable and final.
An ECTS approach is generally made of two main components: (i) a hypothesis² $h \in \mathcal{H}$ capable of predicting the class $y \in \mathcal{Y}$ of the incoming series at any time, such that $h(\mathbf{x}_t) = \hat{y}$ with $t \in [1, T]$; (ii) a triggering strategy, denoted by $Trigger$, capable of making decisions at the right moments. Both the hypothesis and the triggering strategy are learned in batch mode (i.e. offline), by using a training set made of complete time series with their associated labels.
¹ The term individual refers to any type of statistical unit.
² A hypothesis is a candidate predictor which approximates the concept $P(y \mid \mathbf{x}_t)$.
A short overview of Early Classification of Time Series
This paragraph provides an overview of the ECTS approaches.
For a recent and more complete survey, the reader can refer
to [2, 34]. Doubts about the relevance of the ECTS frame-
work and questions about the definition of the problem have
been raised in [82]. The present paper sets up a formal
framework and explores new directions to enrich this area
and underlines its practical significance.
The pioneering approaches were based on some form of confidence criterion and waited until a predefined threshold was reached before triggering their decisions. For instance, in [32, 36, 62], a classifier is learned for each time step and various stopping rules are used (e.g. a threshold on the confidence level). In [83], such a threshold is set indirectly, since the best time step to trigger the decision is estimated by determining the earliest time step for which the predicted label does not change, based on a 1NN classifier. Similarly, [58] proposes a method where the accuracy of a set of probabilistic classifiers is monitored over time, which allows the identification of time steps from which it seems safe to make predictions.
Then, more informed approaches appeared which explicitly take into account the cost of delaying the decisions. A notable example is [59], where the conflict between earliness and accuracy is explicitly addressed. Moreover, instead of setting the trade-off in a single objective optimization criterion as in [57], the authors keep it as a multi-objective criterion and explore the Pareto front of the multiple dominating trade-offs.
The Economy approach [2, 18] goes one step further by
casting the ECTS problem as searching to optimize a loss
function which combines the expected cost of misclassifica-
tion at the time of decision, plus the cost of having delayed
the decision thus far. This well-founded approach is non-
myopic, as it is able to anticipate measurements which are
not yet visible at decision time by estimating the expected
costs for future time steps. This approach leads to the best
performances observed to date, and [2] shows that the non-
myopic feature of this approach explains its strong perfor-
mances through an ablation study.
Limitations of the ECTS problem
While ECTS covers a wide range of applications, it does not
exhaust all cases where a Machine Learning model can be
applied on data acquired over time, and where the trade-off
between the earliness and the accuracy of decisions must be
optimized. Indeed, ECTS, as defined above, is limited to:
• a classification problem;
• an available training set which contains completely and properly labeled time series;
• a decision deadline that is finite, fixed and known;
• unique decisions for each incoming time series;
• decisions that, once made, can never be reconsidered;
• fixed decision costs which do not depend on the triggering time and the decisions made.
All of these assumptions might be questioned and point
to research issues. The purpose of this paper is to pro-
pose research directions for extending ECTS toward a more
generic problem, that we call Machine Learning based Early
Decision-Making (ML-EDM).
This position paper is organised as follows. Section 2 first
defines the ML-EDM problem, shows how a triggering strat-
egy can be learned, and positions ML-EDM with respect to
Reinforcement Learning. A series of ten challenges is then
proposed in order to develop ML-EDM approaches for a
wide range of problems. Section 3 explains the deep origin of
the delay costs involved in the ML-EDM problems. Section
4 considers a variety of learning tasks, and Section 5, a vari-
ety of data types. Section 6 gives some leads to address the
problem of online ML-EDM. Section 7 extends ML-EDM
to revocable decisions. Section 8 specifies the other deci-
sion costs involved, and shows how they can vary depending
on the triggering time and the decisions made. Section 9
gives an overview on the proposed challenges, and makes a
synthesis of long and short term application perspectives.
Then, Section 10 provides some examples of applications of
the ML-EDM techniques and Section 11 illustrates how the
loss function can be defined and how to use it for evaluation
purposes. Finally, Section 12 concludes with perspectives for
the development of the ML-EDM field in the coming years.
This section defines what ML-EDM is by answering the following questions:
A - What is an early decision?
B - How to learn a triggering strategy from training data?
C - Can a triggering strategy be learned by Reinforcement Learning?
Question A - What is an early decision?
Two types of problems can be distinguished [35]. Decision-
making under ignorance refers to a category of problems
where the set of possible outcomes is known, but no in-
formation about their probabilities is available. By con-
trast, decision-making under uncertainty deals with prob-
lems where the probabilities of the possible outcomes are
known, or partially known.
Basically, Early Decision Making consists in: (i) observing pieces of information over time; (ii) deciding when to make a decision; and (iii) making the decision itself. In the following, increasingly complex decision-making problems are considered, numbered from (I) to (IV), in order to progressively lead to a general definition of ML-EDM.
(I) - Optimal Stopping Problem [73] is a canonical case of interest, where the decision to make is simply to stop receiving new pieces of information. More formally, $\{X_i\}$ is a sequence of random variables observed successively, whose joint distribution is known. Let $\{r_i\}$ be a sequence of reward functions, such that $r_i$ is a function of the observed values $x_1, \ldots, x_i$. The objective is to maximize the reward by deciding, after observing the value of the random variable $X_i$, either to stop and accept the reward $r_i$, or to observe the value of the next random variable $X_{i+1}$. A number of optimal stopping problems have been extensively studied in the literature, such as:
• The Shepp’s urn [73], which is filled with a known number of $1 bills and a known number of anti-bills of -$1. Here, the reward is the sum of the bills gathered until the end of the game. The objective is to maximize our payoff by stopping drawing objects from this urn at the best time.
• The secretary problem [24], which consists in selecting the largest possible value (which is unknown) among a sequence of values of known size observed in a uniform random order. At each step, the choice is either to stop and keep the last observed value, or to continue.
These two problems involve decision making under uncer-
tainty, since the system under study is perfectly known and
the probability of the possible outcomes can be estimated.
For instance, in the Shepp’s urn the probability of getting
a bill or an anti-bill in the next draw is available, since the
content of the urn is known at any time. In the secretary
problem, the rank of the last value among the previously
observed values approximates the rank in the entire set of
values, since the observed values constitute a uniform sam-
ple of all values.
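As a rough illustration of the Shepp’s urn problem, the following sketch computes the expected payoff of an optimal stopping rule by dynamic programming over the urn contents. It relies on the fact, stated above, that the content of the urn is known at any time. The function names `urn_value` and `should_draw` are hypothetical and are not taken from [73]; the recursion is a standard backward-induction formulation, given here only as an example.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def urn_value(bills: int, anti_bills: int) -> Fraction:
    """Expected additional payoff of optimal play with the given urn content."""
    if bills == 0:
        return Fraction(0)      # only losses remain: stop immediately
    if anti_bills == 0:
        return Fraction(bills)  # only gains remain: draw everything
    total = bills + anti_bills
    # Expected payoff of drawing once more, then continuing optimally.
    keep_drawing = (
        Fraction(bills, total) * (1 + urn_value(bills - 1, anti_bills))
        + Fraction(anti_bills, total) * (-1 + urn_value(bills, anti_bills - 1))
    )
    # Stopping now brings no additional payoff, hence the comparison with 0.
    return max(Fraction(0), keep_drawing)


def should_draw(bills: int, anti_bills: int) -> bool:
    """Draw again only if continuing has a strictly positive expected payoff."""
    return urn_value(bills, anti_bills) > 0
```

For an urn holding two bills and two anti-bills, `urn_value(2, 2)` evaluates to 2/3, so drawing is worthwhile, whereas with one bill and two anti-bills the value is 0 and the best choice is to stop.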
As in Early Decision Making problems, the Shepp’s urn and the secretary problem imply a trade-off between early and accurate decisions.
On the one hand, there is a time pressure which pushes to
trigger early decisions. In a Shepp’s urn, the number of
objects is finite and if all of them are drawn, our payoff is
bad, i.e. equal to the number of bills minus the number of
anti-bills. In the secretary problem, the number of values
is known. The more values are observed, the less future
opportunity remains to select a high value.
On the other hand, there is a gain of information (about
what’s left in the urn) over time which tends to delay the
decisions. In the Shepp’s urn problem, the sample of al-
ready drawn objects grows over time, which provides useful
information to be compared to the known quantities of bills
and anti-bills. For the secretary’s problem, the sample of
already drawn values grows over time, and the last observed
value can be compared to this sample.
Note: from here on, the decision-making problems
presented in the following are part of supervised
learning. A set of labeled examples, which takes dif-
ferent forms depending on the problem, is assumed
to be available.
(II) - The ECTS problem can be considered as a particular instance of optimal stopping, where the decision to be made consists in: (i) stopping receiving new measurements; and (ii) predicting the class of the incoming time series. The hypothesis $h \in \mathcal{H}$ is assumed to be available, allowing the class $y \in \mathcal{Y}$ of the incoming series to be predicted at any time, such that $h(\mathbf{x}_t) = \hat{y}$. In this case, the reward function $r(\mathbf{x}_t, t, \hat{y}, y)$ depends on the observed measurements $\mathbf{x}_t = \langle x_1, \ldots, x_t \rangle$; the decision time $t$; the predicted class $\hat{y}$; and the true class $y$. The following loss function can be defined:
$$\mathcal{L}(h(\mathbf{x}_t), t, y) = \mathcal{L}_{prediction}(h(\mathbf{x}_t), y) + \mathcal{L}_{delay}(t) \quad (1)$$
where $\mathcal{L}_{prediction}(\cdot)$ is the cost of making a potentially bad prediction, which can be expressed as a cost matrix, and $\mathcal{L}_{delay}(t)$ is a monotonically increasing function of $t$ representing the cost of delaying the decision until $t$³. The best decision time $t^*$ is given by the optimal triggering strategy $Trigger^*$ defined as:
$$Trigger^*(h(\mathbf{x}_t)) = \begin{cases} 1 & \text{if } t = t^* = \arg\min_{t \in [1,T]} \mathcal{L}(h(\mathbf{x}_t), t, y) \text{ or } t = T \\ 0 & \text{otherwise} \end{cases} \quad (2)$$
where the decision is forced at $t = T$ if it was not taken before.
Here, the trade-off between early and accurate decisions takes the following form. On the one hand, the delay cost $\mathcal{L}_{delay}(t)$ incurred in making a decision pushes towards an early decision. On the other hand, the cost of making a bad prediction $\mathcal{L}_{prediction}$ is assumed to decrease over time, as the description of the incoming time series becomes richer. This decision-making problem is under uncertainty, since the hypothesis $h$ is capable of estimating the distribution of the possible outcomes $P(y \mid \mathbf{x}_t)$ at any time.
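To make Equations 1 and 2 concrete, here is a minimal sketch assuming a 0/1 misclassification cost and a linear delay cost; both cost choices, as well as the names `total_loss` and `optimal_trigger_time`, are illustrative assumptions and not part of the ECTS formulation itself. Note that the optimal time can only be computed in hindsight, since it requires the true class.

```python
from typing import Sequence


def total_loss(predicted: str, true: str, t: int,
               misclassification_cost: float = 1.0,
               delay_cost_per_step: float = 0.1) -> float:
    """Loss of Equation 1: prediction cost plus delay cost (both assumed forms)."""
    prediction_loss = 0.0 if predicted == true else misclassification_cost
    delay_loss = delay_cost_per_step * t  # monotonically increasing in t
    return prediction_loss + delay_loss


def optimal_trigger_time(predictions: Sequence[str], true: str, **costs) -> int:
    """Return the time step in [1, T] minimizing the total loss.

    predictions[t - 1] plays the role of h(x_t); the true label is needed,
    so this is an oracle usable only for evaluation or training.
    """
    T = len(predictions)
    return min(range(1, T + 1),
               key=lambda t: total_loss(predictions[t - 1], true, t, **costs))
```

With predictions `["B", "B", "A", "A", "A"]` and true class `"A"`, the oracle waits until the third time step under the default costs, but triggers at the first step when the delay cost per step is raised to 0.6, because waiting then costs more than a misclassification.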
In practice, ECTS approaches trigger decisions at $\hat{t}$, hopefully as close as possible to the optimal time $t^*$, at least in terms of cost: $\mathcal{L}(h(\mathbf{x}_{\hat{t}}), \hat{t}, y) - \mathcal{L}(h(\mathbf{x}_{t^*}), t^*, y)$ must be small. Triggering such a decision is an online optimization problem, since $\hat{t}$ must be chosen based on a partial description $\mathbf{x}_t$ of the incoming time series $\mathbf{x}_T$ (with $t \leq T$), and the reward function can be defined as:
$$r(\mathbf{x}_t, t, h(\mathbf{x}_t), y) = \begin{cases} -\mathcal{L}(h(\mathbf{x}_t), t, y) & \text{if } t = \hat{t} \text{ or } t = T \\ 0 & \text{otherwise} \end{cases} \quad (3)$$
where the risk equals 0 when no decision is made, given that the decision is forced at $t = T$, resulting in a high risk due to the delay cost.
Note: in the rest of this section, and for readability reasons, $T$ (the end of the considered time period) is still considered as finite and known, as in the ECTS problem. In Section 7, another setting is studied where $T$ is indeterminate, i.e. where the successive measurements are observed as a data stream.
(III) - Early decisions to be located in time constitute a more challenging problem, which consists not only in making a decision for each incoming time series, but also in predicting a time period associated with the decision. For example, maintenance operations on hydroelectric dam turbines can only be performed when the electricity demand is at a low enough level. There are therefore periods where maintenance is possible and periods where it is not desirable. The objective here is to determine as early as possible whether and during which period it will be possible to shut down the turbines, within the day (if $[1, T]$ corresponds to one day). In this case, the ground truth $(y, (s, e))$ consists of a class $y \in \mathcal{Y}$, associated with a certain time period $[s, e]$, defined by a start timestamp $s \in [1, T]$ and an end timestamp $e \in [s, T]$.
³ Note that the delay cost $\mathcal{L}_{delay}(t)$ could depend on the class $y$ of the time series (see Section 3). For instance, in the emergency department of a hospital, the cost of delaying a decision when there is internal bleeding is not the same as in a case of gastroenteritis, even though the early symptoms could look the same. Here, for reasons of readability, we make $\mathcal{L}_{delay}$ depend only on $t$.
At testing time, the objective is twofold: triggering the decision as early as possible, while also predicting the associated time period $[s, e]$. Let us consider a decision denoted by $(h(\mathbf{x}_{\hat{t}}), (\hat{s}, \hat{e}))$, where $h(\mathbf{x}_{\hat{t}})$ is the class predicted at $\hat{t}$ (the triggering time), and $[\hat{s}, \hat{e}]$ is the associated predicted time period. The loss function $\mathcal{L}$ has to be redefined as a function of the following parameters:
$$\mathcal{L}\Big( (h(\mathbf{x}_{\hat{t}}), (\hat{s}, \hat{e})),\ \underbrace{\hat{t}}_{\text{triggering time}},\ \underbrace{(y, (s, e))}_{\text{ground truth}} \Big) \quad (4)$$
The loss function $\mathcal{L}$ needs to be specified further, depending on the considered application. In general, this loss function should account for two aspects: (i) the quality of the predictions; (ii) the time overlap between the decisions made and the true decisions. For example, Figure 1 shows a situation where the decision made is correct, since the predicted class (see the second line) matches the ground truth (see the first line). But these two decisions do not coincide exactly in time, as the predicted time period is earlier than the ground truth.
Figure 1: Example of a time-lagged decision.
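The loss of Equation 4 is left application-dependent. One possible instantiation, given purely as an assumption, adds a class error term, a time-overlap term based on the intersection-over-union of the predicted and true periods, and a linear delay cost. The weights and function names below are hypothetical.

```python
from typing import Tuple

Period = Tuple[float, float]  # (start, end) timestamps


def interval_overlap(pred: Period, true: Period) -> float:
    """Intersection-over-union of two time intervals, in [0, 1]."""
    (ps, pe), (ts, te) = pred, true
    intersection = max(0.0, min(pe, te) - max(ps, ts))
    # When the intervals intersect, their union is the enclosing span;
    # when they do not, the intersection is 0 and so is the ratio.
    span = max(pe, te) - min(ps, ts)
    return intersection / span if span > 0 else 1.0


def located_decision_loss(pred_class: str, pred_period: Period, trigger_time: float,
                          true_class: str, true_period: Period,
                          class_cost: float = 1.0, overlap_cost: float = 1.0,
                          delay_cost_per_step: float = 0.05) -> float:
    """Assumed loss for one decision to be located in time."""
    prediction_loss = 0.0 if pred_class == true_class else class_cost
    overlap_loss = overlap_cost * (1.0 - interval_overlap(pred_period, true_period))
    delay_loss = delay_cost_per_step * trigger_time
    return prediction_loss + overlap_loss + delay_loss
```

In the situation of Figure 1, the class term would be zero while the overlap term would be positive, so a time-lagged but otherwise correct decision is penalized less than a wrong one of the same duration.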
(IV) - ML-EDM⁴ considers, by extension, multiple early decisions to be located in time (i.e. in the time period $[1, T]$), which is necessary in numerous applications. For example, consider a set of servers used to trade on a stock exchange platform (where $[1, T]$ corresponds to the platform’s hours of operation during the day). For each server, key performance indices (e.g., CPU, RAM, network) are recorded over time. The ground truth consists of a sequence of states (e.g., overload or nominal) associated with the corresponding time periods. In this application, the task is to detect overload periods as early as possible.
Thus, in this problem, the true decisions $\{y_i, (s_i, e_i)\}_{i=1}^{k_x}$ consist of a sequence of varying length $k_x$, which is specific to each individual $\mathbf{x}$. Each element of this sequence is a decision to be located in time, which consists of a class $y_i \in \mathcal{Y}$ associated with a certain time period $[s_i, e_i]$. For a given individual $\mathbf{x}$, the time periods $\{(s_i, e_i)\}_{i=1}^{k_x}$ constitute a time partition, each interval $[s_i, e_i]$ being associated with the true class $y_i$ (e.g. in predictive monitoring, this time partition would correspond to the successive states, up or down, of a given device).
⁴ ML-EDM is a supervised problem; its extension to unsupervised problems is discussed in challenge #1, Section 4.
Here, the online optimization problem to be addressed is more complex than the previous one, since it consists in triggering a sequence of decisions as soon as possible, without knowing the number of true decisions $k_x$, nor the time periods associated with each true decision. Let us consider that an ML-EDM approach triggers a sequence of decisions $\{h(\mathbf{x}_{\hat{t}_{i'}})\}_{i'=1}^{\hat{k}_x}$; where $\hat{k}_x$ is the number of decisions made; where $\{\hat{t}_{i'}\}_{i'=1}^{\hat{k}_x}$ represents the associated triggering times; and where $\{(\hat{s}_{i'}, \hat{e}_{i'})\}_{i'=1}^{\hat{k}_x}$ represents the predicted time periods associated with the decisions, which form a partition of the time period $[1, T]$. In the scenario of multiple early decisions to be located in time, a loss function $\mathcal{LL}$ needs to be defined as a function of the following parameters:
$$\mathcal{LL}\Big( \{h(\mathbf{x}_{\hat{t}_{i'}}), (\hat{s}_{i'}, \hat{e}_{i'})\}_{i'=1}^{\hat{k}_x},\ \underbrace{\{\hat{t}_{i'}\}_{i'=1}^{\hat{k}_x}}_{\text{triggering times}},\ \underbrace{\{y_i, (s_i, e_i)\}_{i=1}^{k_x}}_{\text{ground truth}} \Big) \quad (5)$$
This equation shows the loss function used to evaluate an
approach after the end timestamp T, when predictions have
been taken for all instants in the time period [1, T ] (see Sec-
tion 11.2 for more details). The loss function LL can be
expressed in many different ways, depending on the appli-
cation considered. In practice, mapping rules need to be
defined to match the decisions made to the true ones (see
Section 11.1).
Note that the problem of making predictions for all instants in $[1, T]$ raises the issue of whether the decisions made can be revoked, or not, before $T$. In case decisions are irrevocable, once a decision has been made, let us say $(y, (s, e))$, it is no longer possible to change the prediction of the class for all times $t \in (s, e)$. This renders the optimization problem dependent upon previous decisions, and it becomes more constraining for application cases. Revocable decisions are studied in Section 7.
The deadline after which decisions are forced is an important component that takes different forms depending on the problem. In the simple case of ECTS, only one decision needs to be made before the incoming time series is complete. Thus, the deadline coincides with the maximum size $T$ of the input series, which is known in advance during training. By contrast, in the more complex case of ML-EDM, where multiple decisions to be located in time must be taken before the end timestamp $T$, the deadline is defined as a maximum delay allowed to detect the start of a true decision. In practice, two situations can be distinguished:
• Some applications do not support the absence of decision, and the entire considered time period must be partitioned by the successive decisions. This is the case, for instance, when moderating content on social networks, where discussions are continuously going on between users and where each part of these discussions must be classified as appropriate or not (see Section 10.3). In this case, the absence of decision is not allowed, and all decisions are subject to the cost $\mathcal{L}_{delay}$ and thus constrained by the deadline.
• By contrast, in some applications, a nominal operating state exists which is almost permanent, and for which there is no decision deadline. This is, for instance, the case in predictive maintenance applications, where there is no urgency, nor even a deadline, to detect the absence of failure. In this case, the delay cost $\mathcal{L}_{delay}$ and the deadline apply only to the other decisions (e.g. failures categorized by severity level), excluding the nominal state.
In the end, ML-EDM aims to develop approaches which allow for easy adaptation to all cases, whether the deadline is applicable to all decisions, or whether there exists a nominal operating state which bypasses this deadline.
Question B - How to learn a triggering strategy from data?
In summary: this section shows that learning a triggering strategy follows the usual general principles of a Machine Learning approach, with the particularity of considering time-sensitive loss functions (i.e. which depend on when decisions are triggered, as in Equations 1, 4 and 5).
In practice, the optimal triggering strategy is not available and it must be approximated by a learned function, such that $Trigger_\gamma \approx Trigger^*$, where $\gamma \in \Gamma$ is a set of parameters to be optimized within the space of parameters $\Gamma$ of a chosen family of triggering strategies.
In addition, the hypothesis $h$ is supposed to be learned previously during the training phase, making the system capable of predicting $y$ at any time $t \in [1, T]$. This hypothesis is defined by a set of parameters $\theta \in \Theta$.
To illustrate what a triggering strategy is, let us consider an example from the ECTS literature. The SR approach, described in [57], involves 3 parameters $(\gamma_1, \gamma_2, \gamma_3)$ to decide if the current prediction $h(\mathbf{x}_t)$ must be chosen (output 1) or if it is preferable to wait for more data (output 0):
$$Trigger_\gamma(h(\mathbf{x}_t)) = \begin{cases} 0 & \text{if } \gamma_1 p_1 + \gamma_2 p_2 + \gamma_3 \frac{t}{T} \leq 0 \\ 1 & \text{otherwise} \end{cases} \quad (6)$$
where $p_1$ is the largest posterior probability estimated by $h$, $p_2$ is the difference between the two largest posterior probabilities, and the last term $\frac{t}{T}$ represents the proportion of the incoming time series that is visible at time $t$. The parameters $\gamma_1, \gamma_2, \gamma_3$ are real values in $[-1, 1]$ to be optimized, as described more generally in the following.
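The stopping rule of Equation 6 can be sketched in a few lines. This is only an illustrative reading of the rule: the function name `sr_trigger` and the representation of the posteriors as a plain sequence of at least two probabilities are assumptions, not the reference implementation of [57].

```python
from typing import Sequence


def sr_trigger(posteriors: Sequence[float], t: int, T: int,
               gamma: Sequence[float]) -> int:
    """Return 1 to trigger the prediction now, 0 to wait for more data."""
    ranked = sorted(posteriors, reverse=True)
    p1 = ranked[0]              # largest posterior probability
    p2 = ranked[0] - ranked[1]  # margin between the two largest posteriors
    score = gamma[0] * p1 + gamma[1] * p2 + gamma[2] * t / T
    return 0 if score <= 0 else 1
```

With the arbitrary parameters `gamma = (-0.5, 1.0, 0.3)`, an ambiguous posterior such as `[0.55, 0.45]` early in the series leads to waiting, while a confident posterior such as `[0.9, 0.1]` halfway through the series triggers the decision.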
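In the simple case of ECTS, a single decision has to be made for each time series $\mathbf{x} \in \mathcal{X}$ (see Equation 1). Thus, the risk associated with any triggering strategy $Trigger_\gamma$ belonging to any family $\Gamma$ is defined as follows, given the previously learned hypothesis $h_\theta$ within the family $\Theta$:
$$R(Trigger_\gamma \mid h_\theta) = \mathbb{E}_{(\mathbf{x}, y) \in \mathcal{X} \times \mathcal{Y}} \left[ \mathcal{L}(h_\theta(\mathbf{x}_{\hat{t}}), \hat{t}, y) \right] \quad (7)$$
where $\hat{t}$ is determined by $\gamma$, the parameters of the triggering strategy.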
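Similarly, the risk can be defined in the more complex case of multiple early decisions to be located in time. Let $\mathcal{T}_{part}$ be the set of all possible partitions of the time domain $[1, T]$, having a varying number of time intervals $k$. The risk can be defined as:
$$R(Trigger_\gamma \mid h_\theta) = \mathbb{E}_{\mathbf{x} \in \mathcal{X},\ \{(s_i, e_i)\} \in \mathcal{T}_{part},\ \{y_i\} \in \mathcal{Y}^k} \left[ \mathcal{LL}\big( \{h_\theta(\mathbf{x}_{\hat{t}_{i'}}), (\hat{s}_{i'}, \hat{e}_{i'})\}, \{\hat{t}_{i'}\}, \{y_i, (s_i, e_i)\} \big) \right] \quad (8)$$
where $\{\hat{t}_{i'}\}$ and $\{(\hat{s}_{i'}, \hat{e}_{i'})\}$ are determined by $\gamma$, given $h_\theta$. In Equation 8, the risk is an expectation over three random variables, drawing triplets from the joint distribution $P(\mathbf{x}, \{(s_i, e_i)\}, \{y_i\})$. The first element corresponds to the input data⁵, which is an individual $\mathbf{x} \in \mathcal{X}$. The two others constitute the ground truth, which is composed of: (i) a partition of the time domain $\{(s_i, e_i)\} \in \mathcal{T}_{part}$ with a particular number of time intervals, denoted by $k \in [1, T]$; (ii) a set of class labels $\{y_i\} \in \mathcal{Y}^k$, one for each time interval.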
Now, the objective is to approximate the optimal triggering strategy $Trigger^*$ by finding $\gamma^* \in \Gamma$ which minimizes the risk, such that:
$$\gamma^* = \arg\min_{\gamma \in \Gamma} R(Trigger_\gamma \mid h_\theta) \quad (9)$$
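The joint distribution $P(\mathbf{x}, \{(s_i, e_i)\}, \{y_i\})$ is unknown, thus Equation 8 cannot be calculated; however, a training set $S$ which samples this distribution is supposed to be available. The risk can be approximated by the empirical risk calculated on the training set $S = \{\mathbf{x}^j, \{y_i^j, (s_i^j, e_i^j)\}\}_{j=1}^{|S|}$ as follows:
$$R_{emp}(Trigger_\gamma \mid h_\theta) = \frac{1}{|S|} \sum_{j=1}^{|S|} \mathcal{LL}\big( \{h_\theta(\mathbf{x}^j_{\hat{t}^j_{i'}}), (\hat{s}^j_{i'}, \hat{e}^j_{i'})\}, \{\hat{t}^j_{i'}\}, \{y_i^j, (s_i^j, e_i^j)\} \big) \quad (10)$$
where $\hat{t}^j_{i'}$ is the triggering time of the $i'$-th decision made for the $j$-th individual.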
In the end, training an ML-EDM approach can be viewed as a two-step Machine Learning problem: (i) first, the hypothesis $h_\theta$ must be learned in order to predict the most appropriate decision $h_\theta(\mathbf{x}_t)$ at any time $t \in [1, T]$; (ii) second, the best triggering strategy, defined by $\gamma^*$, must be learned, given the hypothesis $h_\theta$ and given the family $\Gamma$, such that:
$$\gamma^* = \arg\min_{\gamma \in \Gamma} R_{emp}(Trigger_\gamma \mid h_\theta) \quad (11)$$
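As a toy illustration of Equation 11 in the single-decision (ECTS) case, the sketch below selects the triggering parameters by exhaustive search over a coarse grid, minimizing the empirical risk on a tiny hand-made training set. The grid, the 0/1 prediction cost, the linear delay cost, the two-class toy data and all function names are assumptions made for the example; other optimizers could be used instead. Each training series is represented directly by the posteriors output by an already learned hypothesis at each time step.

```python
from itertools import product

CLASSES = ("A", "B")  # toy label set (assumption)


def sr_trigger(posteriors, t, T, gamma):
    """Stopping rule of Equation 6: 1 = trigger now, 0 = wait."""
    ranked = sorted(posteriors, reverse=True)
    score = (gamma[0] * ranked[0]
             + gamma[1] * (ranked[0] - ranked[1])
             + gamma[2] * t / T)
    return 0 if score <= 0 else 1


def loss(predicted, true, t, delay_cost_per_step=0.1):
    """Assumed loss: 0/1 prediction cost plus linear delay cost."""
    return (0.0 if predicted == true else 1.0) + delay_cost_per_step * t


def triggered_loss(series, true, gamma):
    """Loss at the first step where the rule fires (decision forced at t = T)."""
    T = len(series)
    for t, posteriors in enumerate(series, start=1):
        if sr_trigger(posteriors, t, T, gamma) == 1 or t == T:
            predicted = CLASSES[posteriors.index(max(posteriors))]
            return loss(predicted, true, t)


def empirical_risk(training_set, gamma):
    """Mean triggered loss over the training set."""
    return sum(triggered_loss(s, y, gamma) for s, y in training_set) / len(training_set)


def fit_gamma(training_set, grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Grid search over [-1, 1]^3 for the parameters minimizing the empirical risk."""
    return min(product(grid, repeat=3),
               key=lambda g: empirical_risk(training_set, g))


# Each item: (posteriors over CLASSES at t = 1..T, true label).
TOY_SET = [
    ([(0.5, 0.5), (0.6, 0.4), (0.9, 0.1), (0.95, 0.05)], "A"),
    ([(0.6, 0.4), (0.4, 0.6), (0.2, 0.8), (0.1, 0.9)], "B"),
]
```

By construction, the selected parameters do at least as well on the training set as the two trivial strategies contained in the grid, namely always triggering at the first step and always waiting until the forced decision.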
Question C - Can a triggering strategy be learned by Re-
inforcement Learning?
In summary: this section shows that learning a triggering strategy of an ECTS approach can be cast as a Reinforcement Learning (RL) problem with well-chosen rewards, and it might be expected that, provided with sufficient training, RL may end up with a good approximation of an efficient decision function.
⁵ Notice that the notation $\mathbf{x} \in \mathcal{X}$ in Equations 7 and 8 is an abuse that we use to simplify our purpose. In all mathematical rigor, the measurements observed successively constitute a family of time-indexed random variables $\mathbf{x} = (x_t)_{t \in [1,T]}$. This stochastic process $\mathbf{x}$ is not generated, as is common, by a distribution, but by a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in [1,T]}$, which is defined as a collection of nested σ-algebras [43] allowing time dependencies to be considered. Therefore, the distribution $P(\mathbf{x}, \{(s_i, e_i)\}, \{y_i\})$ should also be rewritten as a filtration.
Reinforcement learning [75] aims at learning a function, called a policy $\pi$, from states to actions: $\pi: \mathcal{S} \to \mathcal{A}$. Rewards can be associated with transitions from states $s_t \in \mathcal{S}$ to states $s_{t+1} \in \mathcal{S}$ under an action $a \in \mathcal{A}$. Rewards are classically denoted $r(s_t, a, s_{t+1}) \in \mathbb{R}$. In all generality, the result of an action $a$ in state $s_t$ may be non-deterministic and one among a set (or space) of states. The optimal policy $\pi^*$ is the one that maximizes the expected gain from any state $s_t \in \mathcal{S}$. This gain, denoted $R_t$ when starting from the state $s_t$, is defined as a function of the rewards from that state (e.g. a discounted sum of the rewards received). In order to learn a policy, value functions can be considered, such as the state-value function $v_\pi(s)$ classically defined as:
$$v_\pi(s_t) = \mathbb{E}_\pi[R_t \mid s_t] = \sum_{a} \pi(a \mid s_t) \sum_{s_{t+1}, r} p(s_{t+1}, r \mid s_t, a) \left[ r(s_t, a, s_{t+1}) + \gamma\, v_\pi(s_{t+1}) \right] \quad (12)$$
where $\mathbb{E}_\pi[\cdot]$ denotes the expected value of a random variable given that the agent follows the policy $\pi$, and $t$ is any time step. In the case of a non-deterministic policy, $\pi(a \mid s_t)$ denotes the probability of choosing action $a$ in state $s_t$, and $p(s_{t+1}, r \mid s_t, a)$ the probability of reaching state $s_{t+1}$ and receiving the reward $r$ given that the action $a$ has been chosen in state $s_t$. Finally, $\gamma$ is a discounting factor: $\gamma < 1$.
In our case, the agent aims to learn a triggering strategy given the previously learned classifier $h_\theta$, and the state $s_t = (t, \mathbf{x}_t)$ is made of the current time $t$ and the data observed up to the current time. The instantaneous reward $r(s_t, a)$ only depends on the current state $s_t$ and the action taken $a$ (i.e. predicting now, or postponing to a later time). Finally, the discount factor $\gamma$, usually present in RL for reasons of convergence over infinite episodes, is equal to 1 in our case, since we always deal with finite episodes with forced decisions after a maximum delay. Equation (12) therefore simplifies to:
$$v_\pi(s_t) = \sum_{a} \pi(a \mid s_t) \left[ r(s_t, a) + v_\pi(s_{t+1}) \right]$$
When, during learning, the agent takes a decision, it updates the value of the state $s_t$ using:
$$v_\pi(s_t) = r(s_t, a) + v_\pi(s_{t+1})$$
where $s_{t+1}$ is the state after having taken the action $a$ in state $s_t$.
As the equation above shows, the core observation in RL is
that the value function for a state st(i.e. an estimation of
the expected gain from that state) is related to the value
function of states st+1 that may be reached from st. In that
way, information gathered further down a followed path can
be back-propagated to previous states thus allowing increas-
ingly better decisions from those states to be made.
For instance, in game playing, rewards may happen both
during play (e.g. the player just lost a pawn) and at the end
of the game (e.g. the player is chess mate). Similarly, one
could cast the ECTS problem as a RL problem where, at
each time step, the “player” is in state st= (t, h(xtk),xt)
and should choose between making a prediction (e.g. h(xt))
with an associated reward:
rt=−L(h(xt), t, y) = −Lprediction (h(xt), y ) Ldelay(t)
or postpone the decision, with no immediate associated reward,
that is $r_t = 0$. If no decision has been made before the
term of the episode (e.g. when $t = T$) a decision is forced
(see Figure 2). Provided with enough time series to train on,
and sufficient training in the form of “playing” these time
series, a reinforcement learning agent may end up with a
policy $\hat{\pi}$ that approximates a good early triggering strategy,
one that would converge over time, after a very large number
of “plays” on the training time series, to the optimal
decision function $\pi^\star$ (see Equations 2 and 11).
Figure 2: A part of an ECTS “game” when learning an optimal
policy while “playing” a training time series. When
a prediction is made, the game stops, otherwise it continues
until a prediction is made or the term of the episode is
reached.
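As a minimal sketch of the episode structure described above, the following Python function plays one training time series: waiting yields a reward of 0, triggering yields the negated sum of the prediction and delay costs, and a decision is forced at the end of the episode. The names `play_episode`, `classifier`, `policy` and `delay_cost`, the 0/1 misclassification cost, and the use of time steps 1 to T are assumptions made for illustration only.

```python
from typing import Callable, Sequence, Tuple


def play_episode(
    series: Sequence[float],
    label: int,
    classifier: Callable[[Sequence[float]], int],
    policy: Callable[[int, Sequence[float]], bool],
    delay_cost: Callable[[int], float],
    misclassification_cost: float = 1.0,
) -> Tuple[int, float]:
    """Play one ECTS episode and return (decision time, reward)."""
    T = len(series)
    for t in range(1, T + 1):
        prefix = series[:t]  # partially observed series at time t
        forced = t == T      # a decision is forced at the term of the episode
        if forced or policy(t, prefix):
            prediction = classifier(prefix)
            l_pred = 0.0 if prediction == label else misclassification_cost
            # Reward of a triggered decision: r_t = -(L_prediction + L_delay).
            return t, -(l_pred + delay_cost(t))
        # Postponing the decision gives no immediate reward (r_t = 0).
    raise ValueError("series must contain at least one measurement")


if __name__ == "__main__":
    toy_series = [0.1, 0.2, 0.9, 1.1]
    toy_classifier = lambda prefix: int(prefix[-1] > 0.5)  # hypothetical classifier
    wait_until_forced = lambda t, prefix: False            # hypothetical policy
    print(play_episode(toy_series, 1, toy_classifier, wait_until_forced,
                       lambda t: 0.1 * t))
```

A learning agent would replace the fixed `policy` with one that is updated from the rewards collected over many such episodes; that learning loop is deliberately left out of this sketch.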
The RL framework is very general. It uses immediate and
delayed rewards. As shown in this section, there is in principle
no obstacle to applying RL to the learning of a good triggering
strategy. However, if used directly, the generality
of RL is paid for by the need for a large number of “experiments”.
In addition, the state space is continuous in the case
of the ECTS problem, thus an interpolating function must
be used in order to represent values such as $v_\pi(s)$, and
this entails choosing a family of functions and setting
their associated parameters.
Another approach, the one favored in the current literature
for ECTS [2], is to choose functions for representing the ex-
pected values of decision times, and thus providing a ground
for the triggering strategy. This has the merit of incorporat-
ing prior knowledge of the trade-off between earliness and
accuracy, at the cost of making modelling choices that may
bias the method of estimating the expected future cost.
The respective performances, merits and limits of both ap-
proaches should be studied empirically by a comparison of
RL based ECTS approaches, such as [53], with approaches
that explicitly exploit the form of the optimization criterion
designed for ECTS as in [2].
ML-EDM approaches aim to trigger decisions at the right
time, by reaching a good trade-off between the earliness and
the accuracy of their decisions. To achieve this, a balance
must be found between penalizing late decisions and penalizing
prediction errors. Decision costs are key to making this
antagonistic trade-off choice, as they allow us to evaluate the
cost of waiting for new measurements vs. the cost of making a
decision now. In Section 2, decision costs are involved starting
from Equation 2 in the loss function $L$, and they have
an important impact on the entire description
of the ML-EDM problem. The objective of this section is to
understand the deep origin of the delay cost.
The delay cost represents the cost of postponing a decision
(see the function $L_{delay}$ in Equation 1). In the particular
case of ECTS problems, the delay cost is present in all the
works described in the scientific literature. It can, however, be
defined explicitly, as in [2, 56], or implicitly, as in most approaches.
For instance, the authors in [83] trigger all the decisions
at the minimum prediction length, which corresponds to the
earliest moment such that no prediction differs from those applied
to the full-length training time series (based on a KNN
classifier). This approach thus implicitly assumes that the
delay cost is very low, by favoring the accuracy of decisions
at the expense of their earliness. In [59], the authors propose
to model the trade-off between earliness and accuracy
as a multi-objective criterion and explore the Pareto front
of multiple dominant solutions. This approach is useful in
applications where earliness and accuracy cannot be evaluated
in a commensurable way, and it provides a collection of
optimal solutions, each corresponding to a particular value
of the delay cost.
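To make the multi-objective view concrete, the sketch below extracts the Pareto front from a set of candidate solutions, each summarized by a pair (average decision time, error rate), both to be minimized. This is only an illustration of the notion of non-dominated solutions; it is not the method of [59], and the function name and the toy values are assumptions.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (decision_time, error_rate) pairs.

    Both coordinates are minimized: a point is dominated if another point
    is at least as early and at least as accurate, and differs from it.
    """
    front = []
    best_error = float("inf")
    # Sweep by increasing decision time (ties broken by increasing error)
    # and keep only the points that strictly improve the error rate.
    for time, error in sorted(set(points)):
        if error < best_error:
            front.append((time, error))
            best_error = error
    return front


if __name__ == "__main__":
    candidates = [(1, 0.40), (2, 0.30), (3, 0.35), (4, 0.10), (2, 0.50)]
    print(pareto_front(candidates))
```

Each point kept on the front corresponds to a different compromise between earliness and accuracy, which is why choosing one of them amounts to fixing the delay cost implicitly.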
For a better understanding, let us examine what happens
once a decision is triggered in the simple ECTS problem.
Figure 3 represents a classifier and a triggering strategy. At
each time step $t \in [0, T]$, the classifier predicts the conditional
distribution $P(y \mid \mathbf{x}_t)$ based on the input incomplete
time series $\mathbf{x}_t = \langle x_0, x_1, \ldots, x_t \rangle$. Then, the triggering strategy
either decides to postpone the decision until a new measurement
$x_{t+1}$ is available, or to trigger the decision by predicting
the class value. In this first scenario, let us consider
that triggering a decision at time $t$ implies performing a
given task (namely $\alpha$ or $\beta$) which depends on the predicted
class (respectively $A$ or $B$).
Figure 3: Tasks to be performed after the triggering of a decision.
Given that this task ($\alpha$ or $\beta$) must be completed before the
deadline $T$, the problem is to determine how the cost of performing
this task evolves depending on the trigger time $t$. In
practice, the delay cost $L_{delay}$ takes the form of a parametric
function (e.g., a constant [83], linear [2] or exponential [9]
function), whose form characterizes the additional cost of
delaying the execution of the tasks.
A constant cost, one where there is no penalty associated
with delaying the decision, would mean that these tasks are
achievable in an arbitrarily short time $T - t < \epsilon$. In practice,
an irreducible amount of time is needed to perform the tasks
using a single worker. To reduce this time, the tasks need to
be parallelized using several workers, incurring an extra cost
when building the global result from sub-tasks. Formally, a
constant delay cost would mean that the tasks are infinitely
parallelizable, i.e. they can be divided into independent and
arbitrarily small sub-tasks, and that there is no extra cost
in building the global result.
More generally in ML-EDM problems, the delay cost $L_{delay}$
is necessarily an increasing function (monotonic or piecewise)
depending on the time remaining before the decision
deadline, and it may depend on the decision made (i.e. the
predicted label). In addition, it should tend to $+\infty$ when
the time remaining to perform these tasks, $T - t$, tends to
zero [9]. For example, this delay cost may be modeled by
$L_{delay}(t) = 1/(T - t)^\alpha$, with a single parameter $\alpha$ which
influences the increase in cost when $(T - t) \to 0$.
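The parametric forms mentioned in this section can be written down in a few lines. In the sketch below, only `deadline_delay_cost` follows a formula given in the text, $1/(T - t)^\alpha$; the constant, linear and exponential parametrizations (and all function and parameter names) are assumptions chosen for illustration, not the exact forms used in [83], [2] or [9].

```python
import math


def constant_delay_cost(t: float, c: float = 0.0) -> float:
    """No penalty for waiting: the cost does not depend on t."""
    return c


def linear_delay_cost(t: float, slope: float = 1.0) -> float:
    """Cost growing proportionally to the decision time t."""
    return slope * t


def exponential_delay_cost(t: float, rate: float = 1.0) -> float:
    """Cost growing exponentially with t, equal to 0 at t = 0."""
    return math.exp(rate * t) - 1.0


def deadline_delay_cost(t: float, T: float, alpha: float = 1.0) -> float:
    """L_delay(t) = 1 / (T - t)^alpha, diverging as t approaches the deadline T."""
    remaining = T - t
    if remaining <= 0:
        return math.inf  # no time left to perform the tasks
    return 1.0 / remaining ** alpha


if __name__ == "__main__":
    T = 10
    for t in (0, 5, 9, 9.9):
        print(t, deadline_delay_cost(t, T, alpha=2.0))
```

Larger values of `alpha` make the cost rise more sharply once less than one time unit remains, and more slowly before that point.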
As in ECTS, the formal definition of ML-EDM provided
in Section 2 is limited to classification problems, and involves
ground truth. However, in many applications, ground truth is
extremely hard or costly to obtain, especially in the case
of anomaly detection (e.g. fraud, cyber-attacks, predictive
maintenance). In these application domains, there are several
issues: (i) labels can be extremely expensive to obtain
as they each require an examination by an expert; (ii) the
labels provided by experts can be uncertain; and (iii) the
class of anomalous observations is often poorly represented
and drifts over time. For example, cyber-attack techniques
are very diverse and change with time. Faced with these
difficulties, anomaly detection is often addressed using un-
supervised approaches, by assuming that the anomalies are
outliers. In this case, the problem comes down to modeling
the normal behavior of the system, if possible using histori-
cal data that are cleaned of anomalies. Then, it is necessary
to define the notion of outlier to be able to assign an eccen-
tricity score to the new observations. Note that this type of
modeling can be considered as a first step to manage non-
stationarity, since in this case the stationarity assumption
only concerns the normal behavior of the system (this as-
sumption could be removed in future work).
Challenge #1:
extending non-myopia to unsupervised approaches
A variety of unsupervised early decision problems could be
studied, of which two examples are listed below: (i) the
problem could be to decide, as soon as possible, whether
a partially observed time series $\langle x_1, x_2, \ldots, x_t \rangle$ will be an
outlier (or not) when fully observed at time $T$ (i.e. with a
single decision triggered for each incoming time series, as in
the ECTS problem described in Section 2); (ii) another unsupervised
problem could be to detect, online and as early as
possible, the chunks of the input data stream which deviate
from the nominal learned behavior (i.e. with multiple early
decisions to be located in time, as in the ML-EDM prob-
lem). In both cases, the accuracy vs. earliness trade-off still
exists. On the one hand, an early detection is inaccurate by
nature because the outlier series (resp. chunk) is unreliably
detected, based on few (resp. poorly informative) observed
measurements. On the other hand, delaying the detection of
anomalies can be very costly. For instance, a cyber-attack
which is not detected immediately gives the hackers time
to exploit the security hole found. Designing ML-EDM approaches
to tackle unsupervised learning tasks is challenging
in several respects: (i) learning a triggering strategy with
the goal of achieving a good trade-off between earliness and
accuracy of its decisions cannot be achieved in the Machine
Learning framework as described in Section 2 and should
be formalized in another way without labels (in particular,
the model evaluation described in Equation 5 should be
reconsidered); (ii) developing unsupervised non-myopic ap-
proaches is very difficult, as the training set does not contain
anomalous series, thus the triggering strategy cannot learn
from their continuations.
The extension of ML-EDM both to online scenarios (see
Section 6) and to unsupervised tasks is of particular interest,
because combined they would enable a new generation of
monitoring systems [1] to be developed. In this case, the
learning task would consist in detecting online the start and
end of the outlier chunks: (i) without requiring labels to
learn the model; (ii) by considering the trade-off between
accuracy and earliness to trigger the decisions at the right
time.
Challenge #2:
addressing other supervised learning tasks
The formal description of ML-EDM proposed in Section 2
is generic, in the sense that the type of the target variable y
can easily be changed. By definition, the ECTS approaches
in the literature are limited to classification problems, but
they could naturally be extended to other supervised learn-
ing tasks. For instance, predicting a numerical target vari-
able from a time series is a problem known as Time Series
Extrinsic Regression (TSER) [76]. In some domains, TSER
approaches are very useful and allow applications such as the
prediction of the daily energy consumption of a house, based
on the last week’s consumption, temperature and humidity
measurements. Early TSER would consist of predicting the
value of the numerical target variable as soon as possible,
while ensuring proper reliability. Another example of a supervised
task for which ML-EDM approaches could be developed
is time series forecasting [16]. Basically, a forecasting
model aims to predict the next measurements of a time series
up to a horizon $\nu$, $Y = \langle x_{t+1}, x_{t+2}, \ldots, x_{t+\nu} \rangle$, from the
recent past measurements $X = \langle x_{t-w}, \ldots, x_{t-1}, x_t \rangle$. Using
a forecasting model in an online and early way would
consist of adapting the forecast horizon $t + \nu$ according to
the observed values in $X$, by modeling the trade-off between
the accuracy and the earliness of these predicted values.
The ML-EDM problem described in Section 2 should also be
adapted to semi-supervised learning, which is of great help
when the ground truth is only partially available. More
generally, the collected ground truth may be imperfect for
various practical reasons, such as the labeling cost, the avail-
ability of experts, the difficulty of defining each label with
certainty, etc. This problem has recently gained attention in
the literature through the field of Weakly Supervised Learn-
ing (WSL) [86] which aims to list these problems and pro-
vide solutions.
Challenge #3: early weakly-supervised learning
The extension of ML-EDM to weakly-supervised learning is
an interesting challenge, as it would make it possible to better address
applications where the ground truth is corrupted or
incomplete (which includes semi-supervised learning). However,
weakly-supervised learning is a very large domain
with many types of supervision deficiencies to be studied.
From a practical point of view, the priority is probably to
extend ML-EDM to label noise, and more specifically to bi-
quality learning [60], where the model is trained from two
training sets: (i) one trusted with few labels ; (ii) the other,
untrusted, with a large number of potentially corrupted labels.
This would allow interesting applications, such as in
cyber security, where few labels are investigated by an expert,
and the majority of labels are provided by rule-based
systems. The major difficulty in designing bi-quality learning
ML-EDM approaches is to learn a triggering strategy
from these two training sets, which models the compromise
between accuracy and earliness in a way that is robust to label
noise. Another interesting avenue would be to adapt Active
Learning [71] approaches to ML-EDM, with the goal of
labeling examples which improve both the accuracy and the earliness
of the decisions. Such approaches would be particularly
helpful when early decisions have to be made, and when labeling
examples is very costly, as is again the case in
cyber security applications.
The ML-EDM definition proposed in Section 2 involves mea-
surements (i.e. scalar values) acquired over time. However,
this is only for reasons of simplicity of exposition. Ideally,
ML-EDM approaches should be data type agnostic, i.e. they
should operate for any data type as long as measurements
are made over time and decisions are online.
Below, we outline data types that are present in applications
where ML-EDM could be used.
i) Multivariate time series consist of successive measure-
ments each containing more than one numerical value.
ii) More complex signals exist, such as video streams, which
involve a higher dimension.
iii) Data streams are another type of data, which can contain
both numeric and categorical variables [10]. Successive
measurements are received in an uncontrolled order
and speed.
iv) Another type of data is evolving graphs which con-
sist of graphs whose structure changes over time [46].
Several types of learning tasks can be considered, such
as predicting the next changes in the graph structure,
or the classification of parts of the graph (e.g. nodes,
arcs, sub-graphs).
v) Successive snapshots of relational data [19] should be
considered when designing new ML-EDM approaches. More
precisely, relational data consists of a collection of tables
having logical connections between them. Like
other types, relational data can evolve over time: (i)
the connections between tables can change; (ii) as
well as the structure of the tables; (iii) or even the
values of the information stored in the tables.
vi) Text is another widespread type of data. An appli-
cation example is the moderation of social networking
platforms, with early deletion of inappropriate con-
tents and automatic closure of fraudulent accounts (see
Section 10.3).
Challenge #4: data type agnostic ML-EDM
Ideally, newly developed ML-EDM approaches should be
data type agnostic, i.e. they should operate on any of the data
types presented above. To do so, a pivotal format needs to
be defined in order to learn the triggering strategies in a
generic way. For instance, each learning example could be
characterized by a series of $T$ predictions indexed by time
(corresponding to the output of the learned hypothesis $h(\mathbf{x}_t)$
for each time step $t \in [1, T]$), as well as by $\{y_i, (s_i, e_i)\}^{k_x}$,
the ground truth composed of the true decisions to be made
over time for this individual. In the particular case of ECTS,
some approaches can easily be adapted to become agnostic
to data type [2, 58, 59]. In contrast, others have been designed
to be very specific to time series [31, 37, 83, 84], especially
through the search for features (e.g. shapelets) occurring
early in the time series and helping to discriminate between
classes. More generally, future work in ML-EDM should
definitely promote data type agnostic approaches, to allow
the use of these techniques in a wide range of application
In the specific case of Early Classification of Time Series
(ECTS), an important limitation is that the training time
series: (i) have the same length $T$; (ii) correspond to different
i.i.d. individuals; (iii) have a label which characterizes
the whole time period of length $T$. There are obviously
applications where this formulation of the problem is rele-
vant [7,17,23,34,50,67,72,78], especially in cases where the
start and end of the time series are naturally defined (e.g. a
day of trading takes place from 9:30am to 4pm, during the
opening hours of the stock exchange).
The development of online ML-EDM approaches could overcome
these limitations and enable a new range of applications.
For this purpose, let us consider that the input measurements
are observed without interruption, in the form of
a data stream [29]. In the case of a classification problem,
an online ML-EDM approach would consist in identifying
chunks in the input data stream (i.e. fixed time-windows
defined by their start and end timestamps) and categorizing
them according to a predefined set of classes. For example,
in a predictive maintenance scenario [64] such an approach
would operate on a continuous basis to detect periods of
system malfunction as soon as possible.
Figure 4: Example of a data stream labeled by chunks over
a time period
Challenge #5:
online and early predictions to be located in time
In the case of a classification problem, the training data
consist of the measurements observed from the stream during
the training period, denoted by $\mathbf{x} = \langle x_1, x_2, \ldots, x_{|\mathbf{x}|} \rangle$,
associated with their labels $\mathbf{y} = \langle y_1, y_2, \ldots, y_{|\mathbf{x}|} \rangle$. A labeled
chunk is formed by the consecutive measurements between
the timestamps $t_a$ and $t_b$ if their labels share the
same value (i.e. if $\{y_i\}_{i \in [t_a, t_b]}$ is a singleton). As shown in
Figure 4, the data stream defined over the training period
is labeled by chunks of variable size. For example, these
chunks could represent the periods of failure and nominal
operation in a predictive maintenance scenario. During the
deployment phase, the model is applied online on a data
stream whose measurements are observed progressively over
time. This model is expected to provide predictions located
in time, since it needs to predict the beginning and the end
of each chunk, associated with the predicted class which
characterizes the state of the system during this chunk.
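The definition of a labeled chunk given above can be illustrated by a short Python sketch that recovers chunks from a per-measurement label sequence. The function name is hypothetical, and the choice of returning maximal runs of identical labels, with 1-based inclusive timestamps, is an assumption made for illustration.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def labeled_chunks(labels: Sequence[Hashable]) -> List[Tuple[Hashable, int, int]]:
    """Group consecutive identical labels into (label, t_a, t_b) chunks.

    Timestamps are 1-based and inclusive, so that the labels y_i for
    i in [t_a, t_b] all share the same value within a chunk.
    """
    chunks = []
    position = 1
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of measurements in this run
        chunks.append((label, position, position + length - 1))
        position += length
    return chunks


if __name__ == "__main__":
    stream_labels = ["nominal", "nominal", "failure", "failure", "failure", "nominal"]
    for chunk in labeled_chunks(stream_labels):
        print(chunk)
```

At deployment time the labels are of course unknown; an online ML-EDM model would have to predict the boundaries `t_a` and `t_b` of each chunk as early as possible, which this sketch does not attempt.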
Challenge #6:
online accuracy vs. earliness trade-off
Designing online ML-EDM approaches requires redefining
the accuracy vs. earliness trade-off for online decisions. The
main issue is that a data stream is of indeterminate length:
(i) its beginning may be too old to be considered explicitly,
or can even be indeterminate; (ii) its end is never
reached, since it is constantly postponed by the new measurements
which arrive. In the particular case of ECTS, it
is precisely the fact that the input series has a maximum
length $T$, known in advance, that forces the triggering of
the decision when the current time $t$ becomes close to the
deadline $T$.
The rest of this paragraph presents an example of adapting
the accuracy vs. earliness trade-off to online decisions,
developed in [4]. Let us consider a predictive maintenance
problem for which a classifier has been trained in batch in order
to detect the beginning and the end of abnormal chunks
(see Figure 5). The prediction of the classifier focuses on a
fixed timestamp $s$, and the question is to determine whether this
timestamp corresponds (or not) to the beginning of an abnormal
section. The input features used by the classifier are
extracted from a sliding window $\mathbf{x}_t = \langle x_{t-w}, \ldots, x_{t-1}, x_t \rangle$
of length $w$. As shown in Figure 5, the sliding window $\mathbf{x}_t$
moves over time as it gets closer to $s$. At first, the timestamp
$s$ is located in the future ($s > t$). Making a good prediction
is difficult since the potentially anomalous part of the stream
is not yet visible in $\mathbf{x}_t$. In this case, the classifier has to
detect the early signs of an anomaly. Then, the timestamp
$s$ enters the $\mathbf{x}_t$ window (at time $t = 4$). The prediction
becomes easier to perform, since a part of the potentially
abnormal chunk is visible in $\mathbf{x}_t$. The last possible moment
to trigger the decision is reached when the timestamp $s$ is
getting ready to exit the sliding window $\mathbf{x}_t$.
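The successive positions of the timestamp $s$ relative to the sliding window can be summarized by a small helper. This is only a sketch of the situation described above, not code from [4]; the function name and the phase names are hypothetical, and the window is assumed to cover the timestamps $t - w$ to $t$ inclusive, as in the notation $\langle x_{t-w}, \ldots, x_t \rangle$.

```python
def window_phase(t: int, s: int, w: int) -> str:
    """Position of the scored timestamp s relative to the window <x_{t-w}, ..., x_t>."""
    if s > t:
        # s is not observed yet: only early signs of an anomaly are visible.
        return "future"
    if s > t - w:
        # s is inside the window: waiting is still possible.
        return "inside"
    if s == t - w:
        # s is the oldest point of the window: last moment to trigger.
        return "last_chance"
    # s has left the window: the decision can no longer be made from x_t.
    return "exited"


if __name__ == "__main__":
    s, w = 4, 3
    for t in range(2, 9):
        print(t, window_phase(t, s, w))
```

A triggering strategy for this setting would have to choose a decision time somewhere between the "future" phase, where predictions are early but unreliable, and the "last_chance" phase, where they are most reliable but latest.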
Finally, the accuracy vs. earliness trade-off occurs as follows:
(i) on the one hand, the accuracy of the decisions
increases over time because the classification task becomes
easier as the $\mathbf{x}_t$ window shifts; (ii) on the other
hand, predictive maintenance applications require early decisions
which make it possible to anticipate breakdowns, or at least to
detect them early. Ultimately, this proposal consists of changing
the definition of what is predicted as normal or abnormal.
Here, the observation to be scored is no longer a time series
of finite length, but a particular measurement of the input
data stream identified by its timestamp. This proposal only
partially addresses the problem, as the predictions for each
timestamp would have to be consolidated in order to predict
the start and end of each chunk. There are certainly other
ways to adapt the accuracy vs. earliness trade-off to online
decisions that would be valuable to investigate.
Figure 5: Illustration of the earliness vs. accuracy trade-off
for online decisions
Challenge #7:
management of non-stationarity in ML-EDM
It is not always realistic to assume stationarity of the data.
In practice, data collected from a stream may suffer from
several types of drift: (i) the distribution of the measurements
within the sliding window $\mathbf{x}_t$ can vary over time, which
is called covariate shift [63]; (ii) the prior distribution of
the classes $P(y)$ can be subject to such drifts; and (iii) the
concept to be learned $P(y \mid \mathbf{x})$ can also change when concept
drift occurs [28].
To manage these non-stationarities, a first family of ap-
proaches maintains a decision model trained using a sliding
window of most recent examples. This is a blind approach,
in the sense that there is no explicit drift detection. The
main problem is deciding the appropriate window size.
A second family of approaches explicitly detects the drifts
[30, 48] and triggers actions when necessary, such as retraining
the model from scratch, or using a collection of
models in the case of ensembles. In this case, detecting
concept drift can be considered as similar to the anomaly
detection problem, and ML-EDM approaches could be used
to tackle it in future work. A popular idea is to train the
decision model using a growing window while data is sta-
tionary, and shrink the window when a drift is detected.
These kinds of approaches can easily be adapted to on-
line ML-EDM, since they decouple model training and non-
stationarity management.
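The idea of a training window that grows while the data is stationary and shrinks when a drift is detected can be sketched as follows. The drift detector used here is a deliberately naive mean-shift test, introduced only as a placeholder assumption; it is not one of the detectors of [30, 48], and all names and thresholds are hypothetical.

```python
from typing import Callable, List, Sequence


def mean_shift_detector(threshold: float, recent: int) -> Callable[[Sequence[float]], bool]:
    """Build a naive detector: drift if the mean of the last `recent` values
    differs from the mean of the older values by more than `threshold`."""
    def detect(window: Sequence[float]) -> bool:
        if len(window) <= recent:
            return False  # not enough history to compare
        old, new = window[:-recent], window[-recent:]
        return abs(sum(new) / len(new) - sum(old) / len(old)) > threshold
    return detect


def update_window(
    window: List[float],
    new_value: float,
    drift_detected: Callable[[Sequence[float]], bool],
    keep_after_drift: int,
) -> List[float]:
    """Grow the window by one value; shrink it to the most recent
    `keep_after_drift` values (at least 1) when a drift is detected."""
    window = window + [new_value]
    if drift_detected(window):
        window = window[-keep_after_drift:]
    return window


if __name__ == "__main__":
    detector = mean_shift_detector(threshold=2.0, recent=3)
    window: List[float] = []
    for value in [0.0] * 10 + [5.0] * 5:
        window = update_window(window, value, detector, keep_after_drift=3)
        print(len(window), window[-1])
```

The decision model would be retrained on `window` after each update; because window management is separate from training, the same scheme could in principle wrap both the predictive model and the triggering strategy.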
In the case of incremental concept drift, a third family of
approaches consists in continuously adapting the model by
training it online from recent data. This kind of adaptive
approach is much more challenging to adapt to online ML-
EDM. Indeed, as in ML-EDM problems (see Figure 3), two
kinds of models are used: (i) the predictive model(s), which
can categorize the input data stream at any time ; (ii) the
triggering strategy which makes the decisions at the ap-
propriate time. The main challenge in developing adaptive
drift management methods for the online ML-EDM prob-
lem is that the parameters of the predictive models and of
the triggering strategy must be updated jointly. These two
kinds of models are highly dependent: updating the parameters
of one has an impact on the optimal parameters of the
other.
By contrast, in standard ML-EDM approaches which oper-
ate in batch mode, the parameters of the predictive models
are first optimized, and then the parameters of the trigger-
ing strategy are optimized in turn given the parameters of
the classifiers (see paragraph B in Section 2). This two-step
Machine Learning scheme is definitely not valid for managing
drift online [45]. Adaptive drift management for the
online ML-EDM problem has not yet been addressed in
the literature and constitutes an interesting research direction.
In drift detection systems, there is a trade-off between
fast detection and the number of false alarms. Moreover,
problems where the target (e.g. the labels) is not always
available, or is available only with a delay, require unsupervised
or semi-supervised drift detection mechanisms. The
ML-EDM framework, by improving the compromise between
earliness and accuracy, can provide new approaches for drift
detection.
In many situations, one can take a decision and then decide
to change it after some new pieces of information become
available. The change may be burdensome but nevertheless
justified because it seems likely to lead to a much better
outcome. This can be the case when a doctor revises what
now seems a misdiagnosis.
Similarly, ML-EDM should be extended to consider such a
revocation mechanism. In the classical ML-EDM problem as
described in Section 2, a prediction $h(\mathbf{x}_{\hat{t}})$ cannot be changed
once the decision is triggered at time $\hat{t} \le T$. The cost of
such an irrevocable decision is given by the loss function
described by Equation 5. By contrast, the extension of ML-EDM
to revocable decisions [3] allows a prediction to be
modified several times before the end $T$ of the considered
time period. On the one hand, the revocation of a decision
generates a higher delay cost $L_{delay}$, as well as a cost of
changing the decision $L_{revoke}$. On the other hand, new data
observed in the meantime provide information that makes
the prediction more reliable, thus tending to decrease the
misclassification cost $L_{prediction}$. Ultimately, the main issue
is to identify the appropriate decisions to revoke, in order to
minimize the global cost, given by Equation 13.
Such an extension to revocable decisions could be of great
interest: (i) in applications where the cost of changing de-
cisions is low, i.e. the DAGs associated with each possible
decision share reusable tasks (see Section 3) ; (ii) in appli-
cations involving online early decision making (see Section
6). There are many use cases where the need to revoke de-
cisions appears clearly. For instance, the emergency stop
system of an autonomous car brakes as soon as an obstacle
is suspected on the highway, and releases the brake when
it realizes, as it gets closer, that the suspected obstacle is a
false positive (e.g. a dark spot on the road).
Developing ML-EDM approaches capable of appropriately
revoking their decisions involves solving the two following challenges.
Challenge #8: reactivity vs. stability dilemma for
revocable decisions
The first issue is to ensure that a decision change is driven
by the information provided by the recently acquired mea-
surements, and not caused by the inability of the system to
produce a stable decision over time. This problem is not
trivial. On the one hand, the system needs to be reactive
by changing its decision promptly when necessary. On the
other hand, the system is required to provide stable deci-
sions over time by avoiding excessively frequent and undue
changes. Thus, a trade-off exists between the reactivity of
the system and its stability over time. One way to formalize
this trade-off is to associate a cost with decision changes, as
proposed in part (iii) of Equation 13. To our knowledge,
only one approach uses such a cost of decision change [3],
in order to penalize the revocation of too many decisions. The
reactivity vs. stability dilemma of revocable decisions is un-
derstudied in the literature, and it would be interesting for
the scientific community to work on this question.
Challenge #9:
extending non-myopia to revocation risk
Non-myopic ML-EDM approaches are capable of estimating
the information gain that will be provided by future mea-
surements, based on the currently visible ones. In other
words, these approaches are able to predict the reliability
improvement of a decision in the future. Thus, a decision is
triggered when the expected gain in misclassification cost
at the next time steps does not compensate for the cost of delaying
the decision [2]. In the case of revocable decisions,
an important challenge is to estimate the future information
gain by taking into account the risk of revocation itself.
Specifically, a decision that will probably be revoked
afterward should be delayed due to this risk. Conversely, a
decision which promises to be sustainable should be anticipated.
Designing approaches that are non-myopic to revocation risk
could be an important step forward to (i) optimize the first
trigger moment, and (ii) reduce the number of undue decision
changes. The approach proposed in [3] constitutes
a first step in this direction, by assigning a cost to decision
changes and considering it in the expectation of future costs.
To the best of our knowledge, this is the only approach which
provides this interesting property. It is not clear whether al-
ternative methods are possible. This is an interesting topic
for further studies by the scientific community.
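The non-myopic triggering rule recalled above (trigger when the expected gain in misclassification cost at future time steps does not compensate for the cost of delaying) can be sketched in a few lines. The sketch assumes that estimates of the expected misclassification cost and of the delay cost are already available for the current and future time steps; how these estimates are obtained, and how a revocation cost would enter them, is precisely what the approaches of [2] and [3] address and is not modeled here. The function name and the toy values are hypothetical.

```python
from typing import Sequence


def should_trigger(
    expected_misclassification_cost: Sequence[float],
    delay_cost: Sequence[float],
) -> bool:
    """Decide whether to trigger now.

    Index 0 is the current time step and later indices are future steps.
    The decision is triggered if no future step has a strictly lower
    expected total cost than the current one.
    """
    totals = [m + d for m, d in zip(expected_misclassification_cost, delay_cost)]
    if not totals:
        raise ValueError("at least the current time step is required")
    return totals[0] <= min(totals)


if __name__ == "__main__":
    # Waiting one step is expected to pay off: total cost 0.35 < 0.40.
    print(should_trigger([0.4, 0.3, 0.1], [0.0, 0.05, 0.5]))
    # The delay cost outweighs the expected accuracy gain: trigger now.
    print(should_trigger([0.4, 0.3, 0.1], [0.0, 0.2, 0.5]))
```

Extending this rule to revocation risk would amount to adding, to each future total, an estimate of the cost of the decision changes that an early decision is likely to cause.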
The origin of the delay cost has been studied in Section
3, however it is necessary to further specify the operating
scenario in order to understand the other decision costs
involved in ML-EDM. Figure 6 describes a binary ECTS
problem, where the actions to be performed depend on the
predicted class and are described by two Directed Acyclic
Graphs (DAG). These DAGs characterize the sequence and
the relationships between the unit tasks which compose them
(e.g. task 1 must be completed before starting task 2). Here,
the DAGs of tasks are fixed, they do not depend on the de-
cision time.
The total cost of a decision can be decomposed into:
(i) the delay cost, denoted by $L_{delay}$, which reflects the
need to execute the DAG of actions corresponding to
the new decision in a constrained time, and in a parallel
way (already detailed in Section 3);
(ii) the decision cost, which corresponds to the consequences
of a bad decision, or the gains of a good decision (denoted
by $L_{prediction}$);
(iii) the revocation cost, which is the cumulative cost of
the mistakenly performed tasks belonging to the DAG
of previously made bad decisions, and which are not
reusable for the new decision (denoted by $L_{revoke}$).
Figure 6: DAGs of tasks to be performed after the triggering
of a decision.
When expressed in the same unit, these different types of
costs can be summed up in order to reflect the quality of
the decisions made, and their timing. Thus, Equation 1
becomes:
$$L(h(\mathbf{x}_t), t, y) = \overbrace{L_{delay}(t)}^{(i)} + \overbrace{L_{prediction}(h(\mathbf{x}_t), y)}^{(ii)} + \underbrace{L_{revoke}\bigl(h(\mathbf{x}_t) \mid \{(h(\mathbf{x}_{\hat{t}_i}), \hat{t}_i)\}_{i \in [1, D_x]}\bigr)}_{(iii)} \tag{13}$$
where $\{(h(\mathbf{x}_{\hat{t}_i}), \hat{t}_i)\}_{i \in [1, D_x]}$ represents the sequence of the
previously made decisions and their associated triggering
times, with $\hat{t}_i < t$, $\forall i \in [1, D_x]$.
Term (ii): Taking into account the decision cost is a very
common feature in the literature, particularly in the field of
cost-sensitive learning [20]. These techniques take as input a
function $L_{prediction}(\hat{y} \mid y) : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ which defines the cost
of predicting $\hat{y}$ when the true class is $y$. The aim is to learn
a classifier which minimizes these costs on new data.
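One common way to use such a cost function at prediction time, given an estimate of the class posterior, is to output the label with the lowest expected cost. The sketch below shows this decision rule only; it is a generic illustration rather than a specific method from [20], and the function name, the dictionary layout `cost[y_hat][y]` and the toy costs are assumptions.

```python
from typing import Dict, Hashable


def min_expected_cost_label(
    posterior: Dict[Hashable, float],
    cost: Dict[Hashable, Dict[Hashable, float]],
) -> Hashable:
    """Return the label y_hat minimizing sum_y P(y | x) * L_prediction(y_hat | y).

    `cost[y_hat][y]` is the cost of predicting y_hat when the true class is y.
    """
    def expected_cost(y_hat: Hashable) -> float:
        return sum(p * cost[y_hat][y] for y, p in posterior.items())

    return min(cost, key=expected_cost)


if __name__ == "__main__":
    posterior = {"A": 0.7, "B": 0.3}
    # Predicting A when the truth is B is assumed to be ten times more costly
    # than the opposite error.
    asymmetric_cost = {"A": {"A": 0.0, "B": 10.0}, "B": {"A": 1.0, "B": 0.0}}
    print(min_expected_cost_label(posterior, asymmetric_cost))
```

With asymmetric costs the selected label can differ from the most probable class, which is the reason the decision cost has to be handled explicitly alongside the delay cost.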
Term (iii): By contrast, the study of the revocation cost
is very limited in the literature. To our knowledge, [3] is
the only one article article that considers this problem, and
this work shows that assigning a cost to decision changes is
a first lead to manage the reactivity vs. stability dilemma,
and to design non-myopic to revocation risk approaches (i.e.
discussed later in challenges #8 and #9). The origin of
this cost can be explained in the light of the tasks to be
performed once a decision is triggered (see Figure 6). For
instance, let us consider the first decision noted by (A, ˆ
in which the system predicts at time ˆ
t1that the input time
series belongs to the class A. This decision is then revoked
in favor of a new decision (B, ˆ
t2). The cost of changing this
decision, denoted by Lrevoke ((B, ˆ
t2)|(A, ˆ
t1)), can be defined
as the cost of the actions already performed between ˆ
t2which turn out to be useless for the new decision, i.e.
which cannot be reused in the DAG of tasks corresponding
to the new predicted class B. In order to define the costs of
decision changes, it is necessary to identify the common tasks
between the DAGs of the different classes and to evaluate
their execution time. In addition, the entire sequence of
the past decisions must be taken into account to identify
the already completed tasks which are now useful for the
achievement of the current DAG of tasks. For instance,
the cost $L_{revoke}((A, \hat{t}_3)|\{(A, \hat{t}_1), (B, \hat{t}_2)\})$ can be reduced by
the tasks executed between $\hat{t}_1$ and $\hat{t}_2$, if these tasks are not
perishable, i.e. the results are identical to those that would
be obtained by re-executing these tasks at $\hat{t}_3$.
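As an illustration of this definition, the sketch below computes a revocation cost as the time already spent on tasks that do not belong to the task set of the newly predicted class. It relies on simplifying assumptions that are not part of the paper: the tasks of the revoked decision are executed sequentially in a fixed order, the cost equals the execution time wasted, and time spent on a shared task is fully reusable. All task names and durations are hypothetical.

```python
def executed_time(order, durations, elapsed):
    """Time spent on each task when `order` is run sequentially
    for `elapsed` time units (the last task may be partial)."""
    spent = {}
    remaining = elapsed
    for task in order:
        if remaining <= 0:
            break
        spent[task] = min(durations[task], remaining)
        remaining -= spent[task]
    return spent


def revoke_cost(old_order, new_tasks, durations, t_old, t_new):
    """Time spent between t_old and t_new on tasks that cannot be
    reused because they are absent from the new class's tasks."""
    spent = executed_time(old_order, durations, t_new - t_old)
    return sum(time for task, time in spent.items() if task not in new_tasks)


# Hypothetical tasks: "prep" is common to both classes.
durations = {"prep": 2, "a1": 3, "a2": 4, "b1": 5}
order_a = ["prep", "a1", "a2"]   # tasks triggered by decision A
tasks_b = {"prep", "b1"}         # tasks required by decision B

late_revocation = revoke_cost(order_a, tasks_b, durations, t_old=10, t_new=16)
early_revocation = revoke_cost(order_a, tasks_b, durations, t_old=10, t_new=12)
```

In this toy setting, revoking after 6 time units wastes the work done on `a1` and part of `a2`, whereas revoking after 2 time units costs nothing because only the shared task `prep` has been executed.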
Challenge #10:
scheduling strategy and time-dependent costs
In this paper, the DAGs of tasks are supposed to be fixed,
i.e. not depending on the decision time. However, a more
general problem could be considered (see Figure 7) where
the DAGs of tasks are generated by a scheduling strategy
depending on: (i) the decision made; and (ii) the decision
time. Such a scheduling strategy is useful in applications
where the actions to be performed after a decision
can be adapted to the time budget available to perform them.
Two situations may occur: (i) ideally, a decision is triggered
early enough to allow the scheduling strategy to generate a
complete DAG of tasks which is optimal given the decision
made (as in Figure 6); (ii) on the contrary, when a decision
is triggered too late, the scheduling strategy needs to build the
DAG so that it can be completed in the remaining time (e.g.
by parallelizing some tasks, or by changing or removing some
of them). For instance, when flying an airplane, the tasks
to be performed for an emergency landing are not the same
as for a normal landing, and there is a range of situations
with different emergency levels, each corresponding
to a different time budget.
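The two situations above can be sketched with a toy scheduling strategy. This is only an illustrative heuristic under stated assumptions, not a method proposed in the paper: tasks have known durations, parallelism is unlimited when needed, and tasks flagged as optional can be skipped without blocking their successors. The task names and the `plan` function are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_length(durations, deps):
    """Shortest possible makespan: the longest path through the DAG,
    assuming unlimited parallelism. Every dependency must itself be
    a key of `durations`."""
    graph = {task: list(deps.get(task, ())) for task in durations}
    finish = {}
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0)


def plan(durations, deps, optional, budget):
    """Return (mode, kept_tasks) for a given time budget."""
    kept = set(durations)
    # Situation (i): enough time to run the complete DAG task by task.
    if sum(durations.values()) <= budget:
        return "sequential", kept
    # Situation (ii): parallelize, then drop optional tasks (longest first).
    droppable = sorted(optional, key=lambda t: durations[t], reverse=True)
    while True:
        sub_durations = {t: durations[t] for t in kept}
        sub_deps = {t: [d for d in deps.get(t, ()) if d in kept] for t in kept}
        if critical_path_length(sub_durations, sub_deps) <= budget:
            return "parallel", kept
        if not droppable:
            return "infeasible", kept
        kept.discard(droppable.pop(0))


# Hypothetical landing tasks, durations in arbitrary time units.
durations = {"extended_checks": 4, "descent": 6, "cabin_prep": 3, "landing": 2}
deps = {"descent": ["extended_checks"], "landing": ["descent", "cabin_prep"]}
optional = ["extended_checks"]
```

Calling `plan(durations, deps, optional, budget)` with a decreasing budget moves from the complete sequential plan, to a parallelized plan, to a reduced DAG without the optional task, and finally to an infeasible case when even the reduced DAG does not fit.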
Such a time-dependent scheduling strategy radically trans-
forms the ML-EDM problem and the way it can be formu-
lated. In particular, the triggering and scheduling strategies
become mutually dependent:
1. Decision costs depend on the generated DAG of tasks:
all the previously discussed costs result from the structure
of the DAG to be performed conditional on the
decision made: (i) the relationships between the tasks;
(ii) their execution time; (iii) the conditions of their
reuse when they are common to several DAGs. Since
the structure of the DAG to be performed now depends
on the decision time, the decision costs can no
longer be considered as fixed, and they are available
only after scheduling.
2. The optimal decision time depends on the cost values:
conversely, the triggering strategy aims to optimize
the decision time based on the cost values. As
described in Equation 11, the triggering strategy is
learned by minimizing the empirical risk, which is itself
estimated using a loss function based on the costs.
Figure 7: DAG of tasks to be performed after the triggering
of a decision, generated by a scheduling strategy.
This mutual dependency between the triggering and the
scheduling strategies has strong impacts on the ML-EDM
problem. In particular, the optimal decision time $t^*$
described in Equation 2 must be redefined as a fixed point,
i.e. the function to be optimized takes the optimal
solution as an input parameter, in such a way that
$t^* = \arg\min_{t \in [1,T]} L(h(x_t), t, y)$. This leads to a much more
difficult class of optimization problems, for which the mere
existence of a solution is difficult to ensure.
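The fixed-point condition, and the fact that a solution may not exist, can be illustrated on a discrete horizon. In the sketch below, the dependence of the costs on the decision time is encoded by a hypothetical second argument `t_star` of the loss, which is one possible reading of the formulation above. The two toy losses and their numerical values are assumptions chosen only to produce one case with a fixed point and one without.

```python
def best_time(loss, horizon, t_star):
    """Minimizer of the loss over [1, horizon] when the costs are
    those induced by an assumed decision time t_star."""
    return min(range(1, horizon + 1), key=lambda t: loss(t, t_star))


def fixed_points(loss, horizon):
    """All decision times that are optimal under their own costs."""
    return [t for t in range(1, horizon + 1) if best_time(loss, horizon, t) == t]


def loss_with_solution(t, t_star):
    # Toy loss: the delay slope rises when the assumed decision is late.
    slope = 1.0 if t_star <= 5 else 3.0
    return slope * t + 16.0 / t


def loss_without_solution(t, t_star):
    # Same form, but the slope switches earlier: no time is consistent.
    slope = 1.0 if t_star <= 3 else 3.0
    return slope * t + 16.0 / t


horizon = 10
solutions = fixed_points(loss_with_solution, horizon)
no_solutions = fixed_points(loss_without_solution, horizon)
```

Exhaustive enumeration is only tractable here because the horizon is small and the loss is known in closed form; it is meant to show the structure of the problem, not to solve it in realistic settings.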
Finding an optimal triggering strategy when the scheduling
strategy is itself time-dependent makes ML-EDM a quite
difficult challenge as the scheduling strategy is only known
through its interactions with the triggering strategy. In this
case, Reinforcement Learning seems to be a possible option
to address the problem. The scheduling strategy could then
be considered as part of the environment, and a contributor
to the reward signal by determining the decision costs for
each decision taken at a particular time. However, this line
of attack remains to be investigated in order to assess its
In many applications, fortunately, the implementation of a
scheduling strategy is much simpler, especially when the
variation of decision costs over time is known in advance
(or modeled, and thus partially known). The preceding
remarks are a reminder that, if considered in all its complexity,
ML-EDM becomes a very difficult problem. Addressing
the case where the costs are assumed to be time-dependent
but with a known form already offers interesting challenges
and corresponds to a variety of applications.
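When the form of the time-dependent costs is known, the decision time can be optimized directly, without any fixed-point formulation. The sketch below assumes, purely for illustration, a linear delay cost, a misclassification cost that grows linearly with time, and an error rate that decays geometrically as more data is observed. None of these forms or values comes from the paper.

```python
def expected_cost(t, error_rate, delay_cost, error_cost):
    """Expected cost of deciding at time t: delay cost plus the
    time-dependent cost of an error weighted by its probability."""
    return delay_cost(t) + error_rate(t) * error_cost(t)


def best_trigger_time(horizon, error_rate, delay_cost, error_cost):
    """Time in [1, horizon] with the lowest expected cost."""
    return min(
        range(1, horizon + 1),
        key=lambda t: expected_cost(t, error_rate, delay_cost, error_cost),
    )


# Hypothetical, known cost forms.
def delay_cost(t):
    return 0.5 * t                     # waiting is linearly penalized


def error_cost(t):
    return 10.0 * (1.0 + 0.2 * t)      # late errors are harder to repair


def error_rate(t):
    return 0.5 * 0.6 ** (t - 1)        # accuracy improves with more data


t_opt = best_trigger_time(10, error_rate, delay_cost, error_cost)
```

In this toy setting the minimum is reached at an intermediate time step: waiting further reduces the error rate too slowly to compensate for the growing delay and error costs.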
This section provides an overview of the previously pre-
sented challenges, indicating references which address part
of these challenges (see the second column of Table 1), and
summarizing the main prospects for applications in the short
and long term (see the last column of Table 1). Table 1 or-
ganizes the proposed challenges by category, using colors to
identify: (i) those related to changing the learning task ; (ii)
those related to online ML-EDM ; (iii) and those related to
revocable decisions.
| ML-EDM challenges | SOTA | Main application perspectives |
|---|---|---|
| #1 (Section 4): Extending non-myopia to unsupervised approaches | | In anomaly detection applications, anticipate the deviation of an observed individual from a normal behavior. |
| #2 (Section 4): Addressing other supervised learning tasks | | Adapt ECTS approaches to extrinsic regression problems. Develop forecasting methods whose prediction horizon can adapt. |
| #3 (Section 4): Early weakly supervised learning (WSL) | | Adapt ECTS approaches to the different WSL classification scenarios. |
| #4 (Section 5): Data type agnostic ML-EDM | [2, 18, 56, | Identify agnostic approaches in the literature and promote this. Define a pivotal format allowing to develop an ML-EDM library. |
| #5 (Section 6): Online predictions to be located in time | | Applications where the arrival of an event (e.g. a failure) must be predicted in advance, as well as its duration. |
| #6 (Section 6): Online accuracy vs. earliness trade-off | [4] | Optimize decision time in online predictive maintenance applications. |
| #7 (Section 6): Management of non-stationarity in ML-EDM | | Properly manage the potentially long life of ML-EDM models. |
| #8 (Section 7): Reactivity vs. stability dilemma for revocable decisions | [3] | Applications where undue and excessive decision changes must be |
| #9 (Section 7): Non-myopia to revocation risk | [3] | Applications where it is necessary to delay decisions which are likely to be changed later. |
| #10 (Section 8): Scheduling strategy and time-dependent decision costs | | Applications where the variation of the decision costs over time is known or can be modeled. Applications where the scheduling strategy is only known through its interactions with the triggering strategy. |
Table 1: Overview of the proposed challenges by category: in blue those related to the learning task, in green those related
to online ML-EDM, in purple those related to revoking decisions, and in white the others.
ML-EDM approaches can be applied to a wide range of applications,
such as cyber security [87], medicine [41] and surgery
[69]. This section develops some key use cases and identifies
possible advances in the near future, if the proposed challenges
are met.
10.1 Early classification of fetal heart rates
There are no precise figures on the number of deaths in childbirth
due to poor oxygenation. According to the Portuguese
Directorate-General for Health, 192 fetuses died due to
hypoxia in 2013. This is a striking
example where making informed early decisions is critical.
Cardiotocography techniques are used to assess fetal well-
being through continuous monitoring of fetal heart rate and
uterine contractions [51]. Labor is a potentially threatening
situation to fetal well-being, as strong uterine contractions
stop the flow of maternal blood to the placenta, compromis-
ing fetal oxygenation [61].
In this field, ML-EDM techniques could be of great help to
detect the early warning signs of complications during child-
birth. This application can be addressed as an ECTS prob-
lem, as a fetal heart rate signal constitutes a time series.
The extension of ECTS techniques to revocable decisions
would be very relevant (see challenges #8 and #9), allowing
for active monitoring of the child's well-being on a
continuous basis, until delivery. In addition, two particular
aspects need to be taken into account in developing an efficient
approach: (i) the prediction cost $L_{prediction}$ is highly
asymmetrical since a false negative can mean the death of
the baby or the mother; (ii) the deadline $T$, which represents
the moment of delivery, is uncertain and varying. Thus, the
deadline $T$ corresponds to the occurrence of an event (i.e.
the birth) which can be modeled as a random variable, as
in [26, 44].
10.2 Digital twin in production systems
Digital Twin (DT) is an important and active concept in the
area of Industry 4.0. With the development of low-cost
sensors and efficient IoT communication facilities, almost all
production systems are now equipped with several sensors
enabling real-time monitoring and helping in decisions about
maintenance, or when failures occur. In this section, we
consider digital twins of cyber-physical systems (CBS)
which are in operation.
The main digital twin applications [27] are related to smart
cities, manufacturing, healthcare and industry. The role of
the DT is thus to use the data streams coming from the sen-
sors of the CBS in order to constantly calibrate simulation
models of different components of the system. Indeed, this
offers several opportunities, namely (1) detection of anomalies
when the system deviates from the simulation model;
(2) diagnosis of dysfunctions when they occur; (3) exploration
of different scenarios for system evolution in case of
dysfunction; (4) recommendation of repair actions.
Effective maintenance management methods are vital, and
industries seek to minimize the number of operational failures.
The availability of large volumes of data coming from
the sensors of a CBS makes the use of Machine Learning
techniques, supervised or unsupervised, very appealing. Typical
unsupervised ML approaches are related to anomaly detection
[66], where an alarm should be triggered when the behavior
of the CBS differs from normal running. Typical
supervised ML approaches in the context of manufacturing
and industry are related to predictive maintenance [14, 64].
Predictive Maintenance (PdM) is a data-driven approach
that emerged in Industry 4.0. It uses statistical analysis and
Machine Learning (ML) models to model the behavior of
complex systems, identify trends and predict failures.
We review below some challenges of the paper in light of
this domain. Challenge #1 (extending non-myopia to unsu-
pervised approaches) is relevant, since an efficient anomaly
detection system requires unsupervised approaches which
can be combined with physics-based simulations of the dif-
ferent components. Challenge #2 (other supervised tasks)
is also appropriate since both classification and regression
problems appear (e.g. breakdown occurrence, prediction of
energy consumption). Challenge #4 (data type agnostic)
is especially relevant for DTs, since a system is always
composed of several heterogeneous components. In this situation,
updating one component or one or several sensors
would be much easier and cheaper if ML-EDM were data
type agnostic. DTs operating at a system level lead to
complex prediction models and complex decisions since the
different components operate differently but in interaction
(cf. challenge #5). The ability to manage non-stationarity
(cf. challenge #7) is obviously central in DTs: aging and
wear of equipment lead to covariate and concept drifts
which must be taken into account.
10.3 Social networks: societal and psychological risks
Online social networking platforms are more popular than
ever. They radically transform the way we communicate
with each other. However, this transformation comes with
many problems on both sides, for users and platforms.
For example, fake news spread widely during the COVID
pandemic. [5] tackled this issue as a binary classification
problem where the classes are “fake” and “real” news. Fake
accounts are also considered a major problem, as they
are among the main culprits in spreading false information.
For instance, [8, 21, 25, 74] use Machine Learning techniques
to detect these fake accounts based on interactions between
users. Fake accounts can also be used for harassment and
can induce major psychological risks [81]. The detection
of depression and risk of suicide has been addressed using
Machine Learning techniques in [15, 39].
Decisions taken by Machine Learning models to prevent such
risks on social networks are clearly time-sensitive:
Fake news must be