
Open challenges for Machine Learning based Early Decision-Making research

Alexis Bondu1, Youssef Achenchabe1,2, Albert Bifet3,5, Fabrice Clérot1, Antoine Cornuéjols2, João Gama4, Georges Hébrail3, Vincent Lemaire1, and Pierre-François Marteau6

1Orange Labs, France

2Paris-Saclay University, Agroparistech, France

3IPP, Télécom Paristech, France

4University of Porto, Portugal

5University of Waikato, New Zealand

6Bretagne-Sud University, France

ABSTRACT

More and more applications require early decisions, i.e. decisions made as soon as possible from partially observed data. However, the later a decision is made, the more accurate it tends to be, since the description of the problem at hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classification. This paper introduces a more general problem, called Machine Learning based Early Decision Making (ML-EDM), which consists in optimizing the decision times of models in a wide range of settings where data is collected over time. After defining the ML-EDM problem, ten challenges are identified and proposed to the scientific community to further research in this area. These challenges open important application perspectives, discussed in this paper.

1. INTRODUCTION

In numerous real situations, we have to make early decisions in the absence of complete knowledge of the problem at hand. For example, such decisions are necessary in medicine [54] when a physician must make a diagnosis, possibly leading to an urgent surgical operation, before having the results of all medical tests. In such situations, the issue facing the decision makers is that, most of the time, the longer the decision is delayed, the clearer the likely outcome becomes (e.g. the critical or not critical state of the patient) but, also, the higher the cost that will be incurred, if only because decisions taken earlier allow one to be better prepared. We thus seek to make decisions at times that seem to be the best compromises between the earliness and the accuracy of our decisions.

Similarly in Machine Learning, when the input data is acquired over time, there can be situations with a trade-off between the earliness and accuracy of decisions. For instance, this is the case for anomaly detection, predictive maintenance, patient health monitoring, and self-driving vehicles (see Section 10). In each case, the decisions are time-sensitive (e.g. in an autonomous car, it is critical to detect obstacles on the road as early as possible and at the same time as reliably as possible, in order to plan safe avoidance trajectories if needed). In general, it is assumed that there is a gain of information over time, i.e. delaying decisions tends to make them more reliable (e.g. the certainty about the existence or absence of an obstacle on the road becomes more and more accurate as the car gets closer).

This earliness vs. accuracy dilemma is part of many decision making scenarios, and is particularly involved in the problem of Early Classification of Time Series (ECTS). But, as we will see, it takes place in a larger perspective.

Early Classification of Time Series: a particular case

The ECTS problem consists in finding the optimal time to trigger the class prediction of an input time series observed over time. As successive measurements provide more and more information about the incoming time series, ECTS algorithms aim to optimize online the trade-off between the earliness and the accuracy of their decisions.

More formally, the individuals¹ considered are time series of finite length $T$, during which a decision must be made. At testing time, the measurements of the incoming time series are received over time, and the history of measurements available at time $t$ is denoted by $\mathbf{x}_t = \langle x_1, \ldots, x_t \rangle$. It is assumed that each time series can be ascribed to some class $y \in \mathcal{Y}$, and the task is to make a prediction about the class of each incoming time series as early as possible, because a time-increasing cost must be paid when the decision is triggered. In the ECTS problem, a single decision is triggered for each incoming time series, and it is irrevocable and final.

An ECTS approach is generally made of two main components: (i) a hypothesis² $h \in \mathcal{H}$ capable of predicting the class $y \in \mathcal{Y}$ of the incoming series at any time, such that $h(\mathbf{x}_t) = \hat{y}$ with $t \in [1, T]$; (ii) a triggering strategy capable of making decisions at the right moments, denoted by $Trigger$. Both the hypothesis and the triggering strategy are learned in batch mode (i.e. offline), using a training set made of complete time series with their associated labels.

¹ The term individual refers to any type of statistical unit studied.
² A hypothesis is a candidate predictor which approximates the concept $P(y|\mathbf{x}_t)$.

A short overview of Early Classification of Time Series

This paragraph provides an overview of ECTS approaches. For a recent and more complete survey, the reader can refer to [2, 34]. Doubts about the relevance of the ECTS framework and questions about the definition of the problem have been raised in [82]. The present paper sets up a formal framework, explores new directions to enrich this area, and underlines its practical significance.

The pioneering approaches were based on some form of confidence criterion and waited until a predefined threshold was reached before triggering their decisions. For instance, in [32, 36, 62], a classifier is learned for each time step and various stopping rules are used (e.g. a threshold on the confidence level). In [83], such a threshold is indirectly set, since the best time step to trigger the decision is estimated by determining the earliest time step for which the predicted label does not change, based on a 1NN classifier. Similarly, [58] proposes a method where the accuracy of a set of probabilistic classifiers is monitored over time, which allows the identification of time steps from which it seems safe to make predictions.

Then, more informed approaches appeared which explicitly take into account the cost of delaying the decisions. A notable example is [59], where the conflict between earliness and accuracy is explicitly addressed. Moreover, instead of setting the trade-off in a single objective optimization criterion as in [57], the authors keep it as a multi-objective criterion and explore the Pareto front of the multiple dominating trade-offs.

The Economy approach [2, 18] goes one step further by casting the ECTS problem as the optimization of a loss function which combines the expected cost of misclassification at the time of decision, plus the cost of having delayed the decision thus far. This well-founded approach is non-myopic, as it is able to anticipate measurements which are not yet visible at decision time by estimating the expected costs for future time steps. This approach leads to the best performance observed to date, and [2] shows through an ablation study that its non-myopic feature explains its strong performance.

Limitations of the ECTS problem

While ECTS covers a wide range of applications, it does not exhaust all cases where a Machine Learning model can be applied on data acquired over time, and where the trade-off between the earliness and the accuracy of decisions must be optimized. Indeed, ECTS, as defined above, is limited to:

• a classification problem;
• an available training set which contains completely and properly labeled time series;
• a decision deadline that is finite, fixed and known;
• unique decisions for each incoming time series;
• decisions that once made can never be reconsidered;
• fixed decision costs which do not depend on the triggering time and the decisions made.

All of these assumptions might be questioned and point to research issues. The purpose of this paper is to propose research directions for extending ECTS toward a more generic problem, which we call Machine Learning based Early Decision-Making (ML-EDM).

This position paper is organised as follows. Section 2 first defines the ML-EDM problem, shows how a triggering strategy can be learned, and positions ML-EDM with respect to Reinforcement Learning. A series of ten challenges is then proposed in order to develop ML-EDM approaches for a wide range of problems. Section 3 explains the deep origin of the delay costs involved in ML-EDM problems. Section 4 considers a variety of learning tasks, and Section 5 a variety of data types. Section 6 gives some leads to address the problem of online ML-EDM. Section 7 extends ML-EDM to revocable decisions. Section 8 specifies the other decision costs involved, and shows how they can vary depending on the triggering time and the decisions made. Section 9 gives an overview of the proposed challenges, and makes a synthesis of long and short term application perspectives. Then, Section 10 provides some examples of applications of ML-EDM techniques, and Section 11 illustrates how the loss function can be defined and how to use it for evaluation purposes. At last, Section 12 concludes with perspectives for the development of the ML-EDM field in the coming years.

2. DEFINITION OF ML-EDM

This section defines what ML-EDM is by answering the following questions:

A- What is an early decision?
B- How to learn a triggering strategy from training data?
C- Can a triggering strategy be learned by Reinforcement Learning?

Question A - What is an early decision?

Two types of problems can be distinguished [35]. Decision-making under ignorance refers to a category of problems where the set of possible outcomes is known, but no information about their probabilities is available. By contrast, decision-making under uncertainty deals with problems where the probabilities of the possible outcomes are known, or partially known.

Basically, Early Decision Making consists in: (i) observing pieces of information over time; (ii) deciding when to make a decision; and (iii) making the decision itself. In the following, increasingly complex decision-making problems are considered, numbered from (I) to (IV), in order to progressively lead to a general definition of ML-EDM.

(I) - The Optimal Stopping Problem [73] is a canonical case of interest, where the decision to make is simply to stop receiving new pieces of information. More formally, $\{X_i\}$ is a sequence of random variables observed successively, whose joint distribution is known. Let $\{r_i\}$ be a sequence of reward functions, such that $r_i$ is a function of the observed values $x_1, \ldots, x_i$. The objective is to maximize the reward, deciding after observing the value of the random variable $X_i$ either to stop and accept the reward $r_i$, or to observe the value of the next random variable $X_{i+1}$. A number of optimal stopping problems have been extensively studied in the literature, such as:

• Shepp's urn [73], which is filled with a known number of $1 bills and a known number of anti-bills of -$1. Here, the reward is the sum of the bills gathered until the end of the game. The objective is to maximize our payoff by stopping drawing objects from this urn at the best time.

• The secretary problem [24], which consists in selecting the largest possible value (which is unknown), among a sequence of values of known size observed in a uniform random order. At each step the choice is either to stop and keep the last observed value, or to continue.

These two problems involve decision making under uncertainty, since the system under study is perfectly known and the probability of the possible outcomes can be estimated. For instance, in Shepp's urn the probability of getting a bill or an anti-bill in the next draw is available, since the content of the urn is known at any time. In the secretary problem, the rank of the last value among the previously observed values approximates its rank in the entire set of values, since the observed values constitute a uniform sample of all values.

As in the Early Decision Making problem, Shepp's urn and the secretary problem imply a trade-off between early and accurate decisions.

On the one hand, there is a time pressure which pushes to trigger early decisions. In Shepp's urn, the number of objects is finite and if all of them are drawn, our payoff is fixed and potentially bad, i.e. equal to the number of bills minus the number of anti-bills. In the secretary problem, the number of values is known: the more values are observed, the less future opportunity remains to select a high value.

On the other hand, there is a gain of information (about what is left in the urn) over time which tends to delay the decisions. In Shepp's urn, the sample of already drawn objects grows over time, which provides useful information to be compared to the known quantities of bills and anti-bills. In the secretary problem, the sample of already observed values grows over time, and the last observed value can be compared to this sample.
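To make the optimal stopping setting concrete, the following sketch simulates the classical cutoff rule for the secretary problem (observe the first $n/e$ values without committing, then accept the first value that beats all of them); this well-known rule and the helper names are illustrative and not taken from the referenced works.

```python
import math
import random

def secretary_cutoff_rule(values):
    """Classical 1/e stopping rule: observe the first n/e values without
    committing, then accept the first value exceeding all of them."""
    n = len(values)
    cutoff = max(1, round(n / math.e))
    best_seen = max(values[:cutoff])
    for i in range(cutoff, n):
        if values[i] > best_seen:
            return i          # early decision: stop here
    return n - 1              # forced decision on the last observed value

# Monte Carlo estimate of the success probability (close to 1/e ~ 37%):
random.seed(0)
trials, wins = 10_000, 0
for _ in range(trials):
    vals = random.sample(range(10_000), 50)  # 50 values in uniform random order
    wins += vals[secretary_cutoff_rule(vals)] == max(vals)
print(f"picked the maximum in {wins / trials:.1%} of runs")
```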

Note: from here on, the decision-making problems presented in the following are part of supervised learning. A set of labeled examples, which takes different forms depending on the problem, is assumed to be available.

(II) - The ECTS problem can be considered as a particular instance of optimal stopping, where the decision to be made consists in: (i) stopping receiving new measurements; and (ii) predicting the class of the incoming time series. The hypothesis $h \in \mathcal{H}$ is assumed to be available, allowing to predict the class $y \in \mathcal{Y}$ of the incoming series at any time, such that $h(\mathbf{x}_t) = \hat{y}$. In this case, the reward function $r(\mathbf{x}_t, t, \hat{y}, y)$ depends on the observed measurements $\mathbf{x}_t = \langle x_1, \ldots, x_t \rangle$; the decision time $t$; the predicted class $\hat{y}$; and the true class $y$. The following loss function can be defined:

$$\mathcal{L}(h(\mathbf{x}_t), t, y) = \mathcal{L}_{prediction}(h(\mathbf{x}_t), y) + \mathcal{L}_{delay}(t) \qquad (1)$$

where $\mathcal{L}_{prediction}(\cdot)$ is the cost of making a potentially bad prediction, which can be expressed as a cost matrix, and $\mathcal{L}_{delay}(t)$ is a monotonically increasing function of $t$ representing the cost of delaying the decision until $t$³. The best decision time $t^*$ is given by the optimal triggering strategy $Trigger^*$, defined as:

$$Trigger^*(h(\mathbf{x}_t)) = \begin{cases} 1 & \text{if } t = t^* = \arg\min_{t \in [1,T]} \mathcal{L}(h(\mathbf{x}_t), t, y) \text{ or } t = T \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where the decision is forced at $t = T$ if it was not taken before.

Here, the trade-off between early and accurate decisions takes the following form. On the one hand, the delay cost $\mathcal{L}_{delay}(t)$ incurred in making a decision urges to make an early decision. On the other hand, the cost of making a bad prediction $\mathcal{L}_{prediction}$ is assumed to decrease over time, as the description of the incoming time series becomes richer. This decision making problem is under uncertainty, since the hypothesis $h$ is capable of estimating the distribution of the possible outcomes $P(y|\mathbf{x}_t)$ at any time.
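As a minimal illustration of Equations 1 and 2, the sketch below computes, with hindsight (playing the role of the oracle $Trigger^*$, which knows the true class $y$), the loss of triggering at each time step for a toy example; the 0/1 misclassification cost and the linear delay cost are assumptions made for the example, not prescriptions of the paper.

```python
def ects_loss(y_pred, y_true, t, T, delay_rate=0.01):
    """Equation 1: L = L_prediction + L_delay, with an assumed 0/1
    misclassification cost and an assumed linear delay cost."""
    l_prediction = 0.0 if y_pred == y_true else 1.0
    return l_prediction + delay_rate * t   # L_delay increases with t

# Toy classifier whose prediction stabilizes on the true class over time:
T, y_true = 50, "A"
preds = ["B"] * 10 + ["A"] * 40            # wrong early, correct from t=11 on
losses = [ects_loss(preds[t - 1], y_true, t, T) for t in range(1, T + 1)]
t_star = min(range(1, T + 1), key=lambda t: losses[t - 1])
print(f"optimal trigger time t* = {t_star}, loss = {losses[t_star - 1]:.2f}")
# -> t* = 11: waiting past the first correct prediction only adds delay cost
```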

In practice, ECTS approaches trigger decisions at $\hat{t}$, hopefully as close as possible to the optimal time $t^*$, at least in terms of cost: $\mathcal{L}(h(\mathbf{x}_{\hat{t}}), \hat{t}, y) - \mathcal{L}(h(\mathbf{x}_{t^*}), t^*, y)$ must be small. Triggering such a decision is an online optimization problem, since $\hat{t}$ must be chosen based on a partial description $\mathbf{x}_t$ of the incoming time series $\mathbf{x}_T$ (with $t \leq T$), and the reward function can be defined as:

$$r(\mathbf{x}_t, t, h(\mathbf{x}_t), y) = \begin{cases} -\mathcal{L}(h(\mathbf{x}_t), t, y) & \text{if } t = \hat{t} \text{ or } t = T \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where the reward equals $0$ when no decision is made, given that the decision is forced at $t = T$, resulting in a high risk due to the delay cost.

Note: in the rest of this section, and for readability reasons, $T$ (the end of the considered time period) is still considered as finite and known, as in the ECTS problem. In Section 7, another setting is studied where $T$ is indeterminate, i.e. where the successive measurements are observed as a data stream.

(III) - Early decisions to be located in time constitute a more challenging problem, which consists of both making a decision for each incoming time series and predicting a time period associated with the decision. For example, maintenance operations on hydroelectric dam turbines can only be performed when the electricity demand is at a low enough level. There are therefore periods where maintenance is possible and periods where it is not desirable. The objective here is to determine as early as possible whether and during which period it will be possible to shut down the turbines, within the day (if $[1, T]$ corresponds to one day).

³ Note that the delay cost $\mathcal{L}_{delay}(t)$ could depend on the class $y$ of the time series (see Section 3). For instance, in the emergency department of a hospital, the cost of delaying a decision when there is internal bleeding is not the same as in the case of gastroenteritis, where the early symptoms could look the same. Here, for reasons of readability, we make $\mathcal{L}_{delay}$ depend only on $t$.

In this case, the ground truth $(y, (s, e))$ consists of a class $y \in \mathcal{Y}$, associated with a certain time period $[s, e]$, defined by a start timestamp $s \in [1, T]$ and an end timestamp $e \in [s, T]$.

At testing time, the objective is twofold: triggering the decision as early as possible, while also predicting the associated time period $[s, e]$. Let us consider a decision denoted by $(h(\mathbf{x}_{\hat{t}}), (\hat{s}, \hat{e}))$, where $h(\mathbf{x}_{\hat{t}})$ is the class predicted at $\hat{t}$ (the triggering time), and $[\hat{s}, \hat{e}]$ is the associated predicted time period. The loss function $\mathcal{L}$ has to be redefined as a function of the following parameters:

$$\mathcal{L}\big(\underbrace{(h(\mathbf{x}_{\hat{t}}), (\hat{s}, \hat{e}))}_{\text{predictions}},\ \underbrace{\hat{t}}_{\text{triggering time}},\ \underbrace{(y, (s, e))}_{\text{ground truth}}\big) \qquad (4)$$

The loss function $\mathcal{L}$ needs to be specified further, depending on the considered application. In general, this loss function should account for two aspects: (i) the quality of the predictions; (ii) the time overlap between the decisions made and the true decisions. For example, Figure 1 shows a situation where the decision made is correct, since the predicted class (see the second line) matches the ground truth (see the first line). But these two decisions do not coincide exactly in time, as the predicted time period is earlier than the ground truth.

Figure 1: Example of a time-lagged decision.
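Since the paper leaves the exact form of $\mathcal{L}$ application-dependent, one illustrative way to instantiate Equation 4 is to combine a misclassification cost with a temporal intersection-over-union penalty between the predicted and true periods, as sketched below; both choices are ours, for illustration only.

```python
def temporal_iou(pred_period, true_period):
    """Intersection-over-union of two time intervals (s, e)."""
    (s1, e1), (s2, e2) = pred_period, true_period
    inter = max(0, min(e1, e2) - max(s1, s2))
    union = max(e1, e2) - min(s1, s2)
    return inter / union if union > 0 else 0.0

def located_decision_loss(y_pred, pred_period, y_true, true_period):
    """Illustrative instance of Equation 4: prediction quality plus a
    penalty for poor time overlap with the ground truth."""
    l_prediction = 0.0 if y_pred == y_true else 1.0
    l_overlap = 1.0 - temporal_iou(pred_period, true_period)
    return l_prediction + l_overlap

# The situation of Figure 1: correct class, but a time-lagged period.
print(located_decision_loss("A", (10, 30), "A", (18, 38)))  # ~0.57
```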

(IV) - ML-EDM⁴ considers, by extension, multiple early decisions to be located in time (i.e. in the time period $[1, T]$), which is necessary in numerous applications. For example, consider a set of servers used to trade on a stock exchange platform (where $[1, T]$ corresponds to the platform's hours of operation during the day). For each server, key performance indices (e.g., CPU, RAM, network) are recorded over time. The ground truth consists of a sequence of states (e.g., overload or nominal) associated with the corresponding time periods. In this application, the task is to detect overload periods as early as possible.

Thus, in this problem, the true decisions $\{y_i, (s_i, e_i)\}_{i=1}^{k_x}$ consist of a sequence of varying length $k_x$, which is specific to each individual $x$. Each element of this sequence is a decision to be located in time, which consists of a class $y_i \in \mathcal{Y}$ associated with a certain time period $[s_i, e_i]$. For a given individual $x$, the time periods $\{(s_i, e_i)\}_{i=1}^{k_x}$ constitute a time partition, each interval $[s_i, e_i]$ being associated with the true class $y_i$ (e.g. in predictive monitoring, this time partition would correspond to the successive states, up or down, of a given device).

Here, the online optimization problem to be addressed is more complex than the previous one, since it consists in triggering a sequence of decisions as soon as possible, without knowing the number of true decisions $k_x$, and also ignoring the time periods associated with each true decision. Let us consider that an ML-EDM approach triggers a sequence of decisions $\{h(\mathbf{x}_{\hat{t}_{i'}}), (\hat{s}_{i'}, \hat{e}_{i'})\}_{i'=1}^{\hat{k}_x}$, where $\hat{k}_x$ is the number of decisions made, $\{\hat{t}_{i'}\}_{i'=1}^{\hat{k}_x}$ represents the associated triggering times, and $\{(\hat{s}_{i'}, \hat{e}_{i'})\}_{i'=1}^{\hat{k}_x}$ represents the predicted time periods associated with the decisions, which form a partition of the time period $[1, T]$. In the scenario of multiple early decisions to be located in time, a loss function $\mathcal{LL}$ needs to be defined as a function of the following parameters:

$$\mathcal{LL}\big(\underbrace{\{h(\mathbf{x}_{\hat{t}_{i'}}), (\hat{s}_{i'}, \hat{e}_{i'})\}_{i'=1}^{\hat{k}_x}}_{\text{predictions}},\ \underbrace{\{\hat{t}_{i'}\}_{i'=1}^{\hat{k}_x}}_{\text{triggering times}},\ \underbrace{\{y_i, (s_i, e_i)\}_{i=1}^{k_x}}_{\text{ground truth}}\big) \qquad (5)$$

⁴ ML-EDM is a supervised problem; its extension to unsupervised problems is discussed in challenge #1, Section 4.

This equation shows the loss function used to evaluate an approach after the end timestamp $T$, when predictions have been made for all instants in the time period $[1, T]$ (see Section 11.2 for more details). The loss function $\mathcal{LL}$ can be expressed in many different ways, depending on the application considered. In practice, mapping rules need to be defined to match the decisions made to the true ones (see Section 11.1).

Note that the problem of making predictions for all instants in $[1, T]$ points to the issue of whether decisions made can be revoked, or not, before $T$. In case decisions are irrevocable, once a decision has been made, say $(y, (s, e))$, it is no longer possible to change the prediction of the class for all times $t \in (s, e)$. This renders the optimization problem dependent upon previous decisions, and it becomes more constraining for application cases. Revocable decisions are studied in Section 7.

The deadline, denoted by $\mathcal{T}$, after which decisions are forced is an important component that takes different forms depending on the problem. In the simple case of ECTS, only one decision needs to be made before the incoming time series is complete. Thus, the deadline is defined as the maximum size of the input series ($\mathcal{T} = T$), which is known in advance during training. By contrast, in the more complex case of ML-EDM, where multiple decisions to be located in time must be made before the end timestamp $T$, the deadline $\mathcal{T}$ is defined as the maximum delay allowed to detect the start of a true decision. In practice, two situations can be distinguished:

• Some applications do not support the absence of decision, and the entire considered time period must be partitioned by the successive decisions. This is the case for instance when moderating content on social networks, where discussions are continuously going on between users and where each part of these discussions must be classified as appropriate or not (see Section 10.3). In this case, making no decision is not allowed, and all decisions are subject to the cost $\mathcal{L}_{delay}$ and thus constrained by the deadline $\mathcal{T}$.

• By contrast, in some applications, a nominal operating state exists which is almost permanent, and for which there is no decision deadline. This is for instance the case in predictive maintenance applications, where there is no urgency, or even a deadline, to detect the absence of failure. In this case, the delay cost $\mathcal{L}_{delay}$ and the deadline $\mathcal{T}$ apply only to the other decisions (e.g. failures categorized by severity level), excluding the nominal state.

Ultimately, ML-EDM aims to develop approaches which can easily adapt to both cases, whether the deadline $\mathcal{T}$ is applicable to all decisions, or whether there exists a nominal operating state which bypasses this deadline.

Question B - How to learn a triggering strategy from data?

In summary: this section shows that learning a triggering strategy follows the usual general principles of a Machine Learning approach, with the particularity of considering time-sensitive loss functions (i.e. which depend on when decisions are triggered, as in Equations 1, 4 and 5).

In practice, the optimal triggering strategy is not available and must be approximated by a learned function, such that $Trigger_\gamma \approx Trigger^*$, where $\gamma \in \Gamma$ is a set of parameters to be optimized within the space of parameters $\Gamma$ of a chosen family of triggering strategies.

In addition, the hypothesis $h$ is supposed to have been learned previously, during the training phase, making the system capable of predicting $y$ at any time $t \in [1, T]$. This hypothesis is defined by a set of parameters $\theta \in \Theta$.

To illustrate what a triggering strategy is, let us consider an example from the ECTS literature. The SR approach, described in [57], involves 3 parameters $(\gamma_1, \gamma_2, \gamma_3)$ to decide if the current prediction $h(\mathbf{x}_t)$ must be chosen (output 1) or if it is preferable to wait for more data (output 0):

$$Trigger_\gamma(h(\mathbf{x}_t)) = \begin{cases} 0 & \text{if } \gamma_1 p_1 + \gamma_2 p_2 + \gamma_3 \frac{t}{T} \leq 0 \\ 1 & \text{otherwise} \end{cases} \qquad (6)$$

where $p_1$ is the largest posterior probability estimated by $h$, $p_2$ is the difference between the two largest posterior probabilities, and the last term $\frac{t}{T}$ represents the proportion of the incoming time series that is visible at time $t$. The parameters $\gamma_1, \gamma_2, \gamma_3$ are real values in $[-1, 1]$ to be optimized, as described more generally in the following.
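A direct reading of Equation 6 in code, as a minimal sketch:

```python
import numpy as np

def sr_trigger(probas, t, T, gamma1, gamma2, gamma3):
    """SR triggering strategy (Equation 6): 1 = trigger the decision now,
    0 = wait for more data."""
    sorted_p = np.sort(probas)[::-1]
    p1 = sorted_p[0]                # largest posterior probability
    p2 = sorted_p[0] - sorted_p[1]  # margin between the two largest
    return 0 if gamma1 * p1 + gamma2 * p2 + gamma3 * (t / T) <= 0 else 1

# At t=20 out of T=50, with posteriors h(x_t) = (0.7, 0.2, 0.1):
print(sr_trigger(np.array([0.7, 0.2, 0.1]), 20, 50, -0.5, 0.8, 0.6))  # -> 1
```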

In the simple case of ECTS, a single decision has to be made for each time series $x \in \mathcal{X}$ (see Equation 1). Thus, the risk associated with any triggering strategy $Trigger_\gamma$ belonging to any family $\Gamma$ is defined as follows, given the previously learned hypothesis $h_\theta$ within the family $\Theta$:

$$R(Trigger_\gamma | h_\theta) = \mathbb{E}_{X,Y}\big[\mathcal{L}(h_\theta(\mathbf{x}_{\hat{t}}), \hat{t}, y)\big] \qquad (7)$$

where $\hat{t}$ is determined by $\gamma$, the parameters of the triggering strategy.

Similarly, the risk can be defined in the more complex case of multiple early decisions to be located in time. Let $\mathcal{T}_{part}$ be the set of all possible partitions of the time domain $[1, T]$, having a varying number of time intervals $k$. The risk can be defined as:

$$R(Trigger_\gamma | h_\theta) = \mathbb{E}_{X, \mathcal{T}_{part}, Y^k}\big[\mathcal{LL}(\{h_\theta(\mathbf{x}_{\hat{t}_{i'}}), (\hat{s}_{i'}, \hat{e}_{i'})\}, \{\hat{t}_{i'}\}, \{y_i, (s_i, e_i)\})\big] \qquad (8)$$

where $\{\hat{t}_{i'}\}$ and $\{(\hat{s}_{i'}, \hat{e}_{i'})\}$ are determined by $\gamma$, given $h_\theta$. In Equation 8, the risk is an expectation over three random variables, drawing triplets from the joint distribution $P(x, \{(s_i, e_i)\}, \{y_i\})$. The first element corresponds to the input data⁵, which is an individual $x \in \mathcal{X}$. The two others constitute the ground truth, which is composed of: (i) a partition of the time domain $\{(s_i, e_i)\} \in \mathcal{T}_{part}$ with a particular number of time intervals, denoted by $k \in [1, T]$; (ii) a set of class labels $\{y_i\} \in \mathcal{Y}^k$, one for each time interval.

Now, the objective is to approximate the optimal triggering strategy $Trigger^*$ by finding $\gamma^* \in \Gamma$ which minimizes the risk, such that:

$$\gamma^* = \arg\min_{\gamma \in \Gamma} R(Trigger_\gamma | h_\theta) \qquad (9)$$

The joint distribution $P(x, \{(s_i, e_i)\}, \{y_i\})$ is unknown, thus Equation 8 cannot be computed; however, a training set $S$ which samples this distribution is supposed to be available. The risk can be approximated by the empirical risk calculated on the training set $S = \{\mathbf{x}^j, \{y_i^j, (s_i^j, e_i^j)\}\}_{j \in [1,n],\, i \in [1, k_{x^j}]}$, as follows:

$$R_{emp}(Trigger_\gamma | h_\theta) = \frac{1}{n} \sum_{j=1}^{n} \mathcal{LL}\big(\{h(\mathbf{x}^j_{\hat{t}_{i'}}), (\hat{s}^j_{i'}, \hat{e}^j_{i'})\}, \{\hat{t}^j_{i'}\}, \{y^j_i, (s^j_i, e^j_i)\}\big) \qquad (10)$$

where $\hat{t}^j_{i'}$ is the triggering time of the $i'$-th decision made for the $j$-th individual.

In the end, training an ML-EDM approach can be viewed as a two-step Machine Learning problem: (i) first, the hypothesis $h_\theta$ must be learned in order to predict the most appropriate decision $h_\theta(\mathbf{x}_t)$ at any time $t \in [1, T]$; (ii) second, the best triggering strategy, defined by $\gamma^*$, must be learned given the hypothesis $h_\theta$ and the family $\Gamma$, such that:

$$\gamma^* = \arg\min_{\gamma \in \Gamma} R_{emp}(Trigger_\gamma | h_\theta) \qquad (11)$$
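In the simple ECTS case, this second step can be approximated by an exhaustive search over a discretized $\Gamma$. The sketch below grid-searches the three SR parameters against an estimate of the empirical risk, reusing the illustrative `sr_trigger` and `ects_loss` helpers sketched earlier; the two-class episode format (true label plus per-step posteriors) is an assumption of the sketch.

```python
import itertools
import numpy as np

def empirical_risk(gammas, episodes, T):
    """Average loss of the SR strategy with parameters `gammas` over a
    training set (an estimate of Equation 7). Each episode holds the true
    label and the classifier's posteriors for every time step."""
    total = 0.0
    for y_true, posteriors in episodes:
        t_hat = T                                # decision forced at T
        for t in range(1, T + 1):
            if sr_trigger(posteriors[t - 1], t, T, *gammas):
                t_hat = t
                break
        y_pred = ["A", "B"][int(np.argmax(posteriors[t_hat - 1]))]
        total += ects_loss(y_pred, y_true, t_hat, T)
    return total / len(episodes)

def fit_trigger(episodes, T, grid=np.linspace(-1, 1, 5)):
    """Exhaustive grid search for gamma* minimizing the empirical risk."""
    return min(itertools.product(grid, grid, grid),
               key=lambda g: empirical_risk(g, episodes, T))
```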

Question C - Can a triggering strategy be learned by Reinforcement Learning?

In summary: this section shows that learning the triggering strategy of an ECTS approach can be cast as a Reinforcement Learning (RL) problem with well-chosen rewards, and it might be expected that, provided with sufficient training, RL may end up with a good approximation of an efficient decision function.

⁵ Notice that the notation $x \in \mathcal{X}$ in Equations 7 and 8 is an abuse that we use to simplify our purpose. In all mathematical rigor, the measurements observed successively constitute a family of time-indexed random variables $x = (x_t)_{t \in [1,T]}$. This stochastic process $x$ is not generated, as commonly, by a distribution, but by a filtration $\mathcal{F} = (\mathcal{F}_t)_{t \in [1,T]}$, which is defined as a collection of nested $\sigma$-algebras [43] allowing to consider time dependencies. Therefore, the distribution $P(x, \{(s_i, e_i)\}, \{y_i\})$ should also be rewritten as a filtration.

Reinforcement learning [75] aims at learning a function, called a policy $\pi$, from states to actions: $\pi: \mathcal{S} \rightarrow \mathcal{A}$. Rewards can be associated with transitions from states $s_t \in \mathcal{S}$ to states $s_{t+1} \in \mathcal{S}$ under an action $a \in \mathcal{A}$. Rewards are classically denoted $r(s_t, a, s_{t+1}) \in \mathbb{R}$. In all generality, the result of an action $a$ in state $s_t$ may be non-deterministic and one among a set (or space) of states. The optimal policy $\pi^*$ is the one that maximizes the expected gain from any state $s_t \in \mathcal{S}$. This gain, denoted $R_t$ starting from the state $s_t$, is defined as a function of the rewards from that state (e.g. a discounted sum of the rewards received). In order to learn a policy, value functions can be considered, such as the state-value function $v_\pi(s)$ classically defined as:

$$v_\pi(s_t) \doteq \mathbb{E}_\pi[R_t | s_t] = \sum_{a \in \mathcal{A}} \pi(a|s_t) \sum_{s_{t+1}, r} p(s_{t+1}, r | s_t, a)\,\big[r(s_t, a, s_{t+1}) + \gamma\, v_\pi(s_{t+1})\big] \qquad (12)$$

where $\mathbb{E}_\pi[\cdot]$ denotes the expected value of a random variable given that the agent follows the policy $\pi$, and $t$ is any time step. In the case of a non-deterministic policy, $\pi(a|s_t)$ denotes the probability of choosing action $a$ in state $s_t$, and $p(s_{t+1}, r | s_t, a)$ the probability of reaching state $s_{t+1}$ and receiving the reward $r$ given that the action $a$ has been chosen in state $s_t$. And $\gamma$ is a discounting factor: $\gamma < 1$.

In our case, the agent aims to learn a triggering strategy given the previously learned classifier $h_\theta$, and the state $s_t = (t, \mathbf{x}_t)$ is the current time $t$ and the data observed at that time. The instantaneous reward $r(s_t, a)$ only depends on the current state $s_t$ and the action taken $a$ (i.e. predicting now, or postponing to a later time). Finally, the discount factor $\gamma$, usually present in RL for reasons of convergence over infinite episodes, is equal to 1 in our case, since we always deal with finite episodes with forced decisions after a maximum delay. So Equation (12) simplifies to:

$$v_\pi(s_t) = \sum_{a \in \mathcal{A}} \pi(a|s_t)\,\big[r(s_t, a) + v_\pi(s_{t+1})\big]$$

When, during learning, the agent takes a decision, it updates the value of the state $s_t$ using:

$$v_\pi(s_t) = r(s_t, a) + v_\pi(s_{t+1})$$

where $s_{t+1}$ is the state reached after taking action $a$ in state $s_t$.

As the equation above shows, the core observation in RL is that the value function for a state $s_t$ (i.e. an estimation of the expected gain from that state) is related to the value function of the states $s_{t+1}$ that may be reached from $s_t$. In that way, information gathered further down a followed path can be back-propagated to previous states, thus allowing increasingly better decisions from those states to be made.

For instance, in game playing, rewards may happen both during play (e.g. the player just lost a pawn) and at the end of the game (e.g. the player is checkmated). Similarly, one could cast the ECTS problem as an RL problem where, at each time step, the “player” is in state $s_t = (t, h(\mathbf{x}_t), \mathbf{x}_t)$ and should choose between making a prediction (e.g. $h(\mathbf{x}_t)$) with an associated reward:

$$r_t = -\mathcal{L}(h(\mathbf{x}_t), t, y) = -\mathcal{L}_{prediction}(h(\mathbf{x}_t), y) - \mathcal{L}_{delay}(t)$$

or postponing the decision, with no immediate associated reward, that is $r_t = 0$. If no decision has been made before the term of the episode (e.g. when $t = T$), a decision is forced (see Figure 2).

Figure 2: A part of an ECTS “game” when learning an optimal policy while “playing” a training time series. When a prediction is made, the game stops; otherwise it continues until a prediction is made or the term of the episode is reached.

Provided with enough time series to train on, and sufficient training in the form of “playing” these time series, a reinforcement learning agent may end up with a policy $\hat{\pi}$ that approximates a good early triggering strategy, one that would converge over time, after a very large number of “plays” on the training time series, to the optimal decision function $\pi^*$ (see Equations 2 and 11).
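As an illustration of this casting, the sketch below runs tabular value updates over repeated “plays” of training time series, reusing the illustrative `ects_loss` helper sketched earlier; for readability the state is reduced to the current time step, whereas the full state $(t, h(\mathbf{x}_t), \mathbf{x}_t)$ is continuous and would require function approximation, as discussed next.

```python
import random

def train_trigger_policy(episodes, T, n_plays=5000, lr=0.1, eps=0.1):
    """Tabular RL sketch for ECTS. State: the current time step. Actions:
    0 = wait, 1 = predict now. Rewards as in Figure 2: -L(h(x_t), t, y)
    when predicting (forced at t = T), 0 while waiting; gamma = 1."""
    Q = {(t, a): 0.0 for t in range(1, T + 1) for a in (0, 1)}
    for _ in range(n_plays):
        y_true, preds = random.choice(episodes)   # "play" one training series
        for t in range(1, T + 1):
            greedy = max((0, 1), key=lambda a: Q[(t, a)])
            a = random.choice((0, 1)) if random.random() < eps else greedy
            if a == 1 or t == T:                  # predict, or forced at T
                r = -ects_loss(preds[t - 1], y_true, t, T)
                Q[(t, 1)] += lr * (r - Q[(t, 1)])
                break
            # waiting: zero reward, bootstrap on the best next-state value
            target = max(Q[(t + 1, 0)], Q[(t + 1, 1)])
            Q[(t, 0)] += lr * (target - Q[(t, 0)])
    return Q  # deployment: trigger at the first t where Q[(t,1)] >= Q[(t,0)]
```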

The RL framework is very general. It uses immediate and delayed rewards. As shown in this section, there is in principle no obstacle to applying RL to the learning of a good triggering strategy. However, if used directly, the generality of RL is paid for by the need for a large number of “experiments”. In addition, the state space is continuous in the case of the ECTS problem, thus an interpolating function must be used in order to represent values such as $v_\pi(s)$, and this entails the choice of a family of functions and the setting of their associated parameters.

Another approach, the one favored in the current literature for ECTS [2], is to choose functions for representing the expected values of decision times, and thus providing a ground for the triggering strategy. This has the merit of incorporating prior knowledge of the trade-off between earliness and accuracy, at the cost of making modelling choices that may bias the method of estimating the expected future cost.

The respective performance, merits and limits of both approaches should be studied empirically by comparing RL-based ECTS approaches, such as [53], with approaches that explicitly exploit the form of the optimization criterion designed for ECTS, as in [2].

3. ORIGIN OF THE DELAY COST

ML-EDM approaches aim to trigger decisions at the right time, by reaching a good trade-off between the earliness and the accuracy of their decisions. To achieve this, a balance must be found between penalizing late decisions and penalizing prediction errors. Decision costs are key to making this antagonistic trade-off, as they allow us to evaluate the cost of waiting for new measurements vs. the cost of making a decision now. In Section 2, decision costs are involved starting from Equation 2 in the loss function $\mathcal{L}$, and they have an important impact throughout the formulation of the ML-EDM problem. The objective of this section is to understand the deep origin of the delay cost.

The delay cost represents the cost of postponing a decision (see the function $\mathcal{L}_{delay}$ in Equation 1). In the particular case of ECTS problems, the delay cost is present in all the works described in the scientific literature, but it can be explicitly defined, as in [2, 56], or implicit, as in most approaches. For instance, the authors in [83] trigger all the decisions at the minimum prediction length, which corresponds to the earliest moment such that no prediction differs from those applied to the full-length training time series (based on a KNN classifier). This approach thus implicitly assumes that the delay cost is very low, by favoring the accuracy of decisions at the expense of their earliness. In [59], the authors propose to model the trade-off between earliness and accuracy as a multi-objective criterion and explore the Pareto front of multiple dominant solutions. This approach is useful in applications where earliness and accuracy cannot be evaluated in a commensurable way, and it provides a collection of optimal solutions, each corresponding to a particular value of the delay cost.

For a better understanding, let us examine what happens once a decision is triggered in the simple ECTS problem. Figure 3 represents a classifier and a triggering strategy. At each time step $t \in [0, T]$, the classifier predicts the conditional distribution $P(y|\mathbf{x}_t)$ based on the incomplete input time series $\mathbf{x}_t = \langle x_0, x_1, \ldots, x_t \rangle$. Then, the triggering strategy either decides to postpone the decision until a new measurement $x_{t+1}$ is available, or to trigger the decision by predicting the class value. In this first scenario, let us consider that triggering a decision at time $t$ implies performing a given task (namely $\alpha$ or $\beta$) which depends on the predicted class (respectively $A$ or $B$).

Figure 3: Tasks to be performed after the triggering of a decision.

Given that this task ($\alpha$ or $\beta$) must be completed before the deadline $\mathcal{T}$, the problem is to determine how the cost of performing this task evolves depending on the trigger time $t$. In practice, the delay cost $\mathcal{L}_{delay}$ takes the form of a parametric function (e.g., a constant [83], linear [2] or exponential [9] function), whose form characterizes the additional cost of delaying the execution of the tasks.

A constant cost, one where there is no penalty associated with delaying the decision, would mean that these tasks are achievable in an arbitrarily short time $\mathcal{T} - t < \epsilon$. In practice, an irreducible amount of time is needed to perform the tasks using a single worker. To reduce this time, the tasks need to be parallelized over several workers, incurring an extra cost when building the global result from sub-tasks. Formally, a constant delay cost would mean that the tasks are infinitely parallelizable, i.e. that they can be divided into independent and arbitrarily small sub-tasks, and that there is no extra cost in building the global result.

More generally in ML-EDM problems, the delay cost $\mathcal{L}_{delay}$ is necessarily an increasing function (monotonic or piecewise) of the time remaining before the decision deadline, and it may depend on the decision made (i.e. the predicted label). In addition, it should tend to $+\infty$ when the time remaining to perform these tasks, $\mathcal{T} - t$, tends to zero [9]. For example, this delay cost may be modeled by $\mathcal{L}_{delay}(t) = 1/(\mathcal{T} - t)^\alpha$, with a single parameter $\alpha$ which influences the increase in cost as $(\mathcal{T} - t) \rightarrow 0$.
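A minimal sketch of this parametric family, showing how $\alpha$ controls the blow-up near the deadline:

```python
def l_delay(t, deadline, alpha=1.0):
    """L_delay(t) = 1 / (deadline - t)^alpha: increasing in t, tending to
    +infinity as the remaining time (deadline - t) tends to zero."""
    assert t < deadline, "no time left to perform the post-decision tasks"
    return 1.0 / (deadline - t) ** alpha

# The larger alpha, the more sharply the cost blows up near the deadline:
for t in (0, 50, 90, 99):
    print(t, [round(l_delay(t, 100, a), 4) for a in (0.5, 1.0, 2.0)])
```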

4. LEARNING TASKS

As in ECTS, the formal definition of ML-EDM provided in Section 2 is limited to classification problems, and involves ground truth. However, in many applications, ground truth is extremely hard or costly to obtain, especially in the case of anomaly detection (e.g. fraud, cyber-attacks, predictive maintenance). In these application domains, there are several issues: (i) labels can be extremely expensive to obtain, as each requires an examination by an expert; (ii) the labels provided by experts can be uncertain; and (iii) the class of anomalous observations is often poorly represented and drifts over time. For example, cyber-attack techniques are very diverse and change with time. Faced with these difficulties, anomaly detection is often addressed using unsupervised approaches, by assuming that the anomalies are outliers. In this case, the problem comes down to modeling the normal behavior of the system, if possible using historical data that are cleaned of anomalies. Then, it is necessary to define the notion of outlier to be able to assign an eccentricity score to new observations. Note that this type of modeling can be considered as a first step in managing non-stationarity, since in this case the stationarity assumption only concerns the normal behavior of the system (this assumption could be removed in future work).

Challenge #1: extending non-myopia to unsupervised approaches

A variety of unsupervised early decision problems could be studied, of which two examples are listed below: (i) the problem could be to decide, as soon as possible, whether a partially observed time series $\langle x_1, x_2, \ldots, x_t \rangle$ will be an outlier (or not) when fully observed at time $T$ (i.e. with a single decision triggered for each incoming time series, as in the ECTS problem described in Section 2); (ii) another unsupervised problem could be to detect, online and as early as possible, the chunks of the input data stream which deviate from the nominal learned behavior (i.e. with multiple early decisions to be located in time, as in the ML-EDM problem). In both cases, the accuracy vs. earliness trade-off still exists. On the one hand, an early detection is inaccurate by nature, because the outlier series (resp. chunk) is unreliably detected, based on few (resp. poorly informative) observed measurements. On the other hand, delaying the detection of anomalies can be very costly. For instance, a cyber-attack which is not detected immediately gives time to the hackers to exploit the security hole found. Designing ML-EDM approaches to tackle unsupervised learning tasks is challenging in several respects: (i) learning a triggering strategy with the goal of achieving a good trade-off between earliness and accuracy of its decisions cannot be achieved in the Machine Learning framework as described in Section 2, and should be formalized in another way, without labels (in particular, the model evaluation described in Equation 5 should be reconsidered); (ii) developing unsupervised non-myopic approaches is very difficult, as the training set does not contain anomalous series, thus the triggering strategy cannot learn from their continuations.

The extension of ML-EDM both to online scenarios (see Section 6) and to unsupervised tasks is of particular interest, because combined they would enable a new generation of monitoring systems [1] to be developed. In this case, the learning task would consist in detecting online the start and end of the outlier chunks: (i) without requiring labels to learn the model; (ii) by considering the trade-off between accuracy and earliness to trigger the decisions at the right time.

Challenge #2: addressing other supervised learning tasks

The formal description of ML-EDM proposed in Section 2 is generic, in the sense that the type of the target variable $y$ can easily be changed. By definition, the ECTS approaches in the literature are limited to classification problems, but they could naturally be extended to other supervised learning tasks. For instance, predicting a numerical target variable from a time series is a problem known as Time Series Extrinsic Regression (TSER) [76]. In some domains, TSER approaches are very useful and allow applications such as the prediction of the daily energy consumption of a house, based on the last week's consumption, temperature and humidity measurements. Early TSER would consist of predicting the value of the numerical target variable as soon as possible, while ensuring proper reliability. Another example of a supervised task for which ML-EDM approaches could be developed is time series forecasting [16]. Basically, a forecasting model aims to predict the next measurements of a time series up to a horizon $\nu$, i.e. $Y = \langle x_{t+1}, x_{t+2}, \ldots, x_{t+\nu} \rangle$, from the recent past measurements $X = \langle x_{t-w}, \ldots, x_{t-1}, x_t \rangle$. Using a forecasting model in an online and early way would consist of adapting the forecast horizon $t + \nu$ according to the observed values in $X$, by modeling the trade-off between the accuracy and the earliness of these predicted values.

The ML-EDM problem described in Section 2 should also be adapted to semi-supervised learning, which is of great help when the ground truth is only partially available. More generally, the collected ground truth may be imperfect for various practical reasons, such as the labeling cost, the availability of experts, the difficulty of defining each label with certainty, etc. This problem has recently gained attention in the literature through the field of Weakly Supervised Learning (WSL) [86], which aims to list these problems and provide solutions.

Challenge #3: early weakly-supervised learning

The extension of ML-EDM to weakly-supervised learning is an interesting challenge, as it would allow to better address applications where the ground truth has corruptions or is incomplete (which includes semi-supervised learning). However, weakly-supervised learning is a very large domain, with many types of supervision deficiencies to be studied. From a practical point of view, the priority is probably to extend ML-EDM to label noise, and more specifically to bi-quality learning [60], where the model is trained from two training sets: (i) one trusted, with few labels; (ii) the other untrusted, with a large number of potentially corrupted labels. This would allow interesting applications, such as in cyber security, where few labels are investigated by an expert and the majority of labels are provided by rule-based systems. The major difficulty in designing bi-quality learning ML-EDM approaches is to learn a triggering strategy from these two training sets which models the compromise between accuracy and earliness in a way robust to label noise. Another interesting avenue would be to adapt Active Learning [71] approaches to ML-EDM, with the goal of labeling examples which improve both the accuracy and the earliness of the decisions. Such approaches would be particularly helpful when early decisions have to be made and when labeling examples is very costly as, again, is the case in cyber security applications.

5. TYPES OF DATA

The ML-EDM definition proposed in Section 2 involves measurements (i.e. scalar values) acquired over time. However, this is only for reasons of simplicity of exposition. Ideally, ML-EDM approaches should be data type agnostic, i.e. they should operate for any data type, as long as measurements are made over time and decisions are online.

Below, we outline data types that are present in applications where ML-EDM could be used.

i) Multivariate time series consist of successive measurements each containing more than one numerical value.

ii) More complex signals exist, such as video streams, which involve higher dimensions.

iii) Data streams are another type of data, which can contain both numeric and categorical variables [10]. Successive measurements are received in an uncontrolled order and at an uncontrolled speed.

iv) Another type of data is evolving graphs, which consist of graphs whose structure changes over time [46]. Several types of learning tasks can be considered, such as predicting the next changes in the graph structure, or the classification of parts of the graph (e.g. nodes, arcs, sub-graphs).

v) Successive snapshots of relational data [19] should be considered to design new ML-EDM approaches. More precisely, relational data consist of a collection of tables having logical connections between them. Like other types, relational data can evolve over time: (i) the connections between tables can change; (ii) as well as the structure of the tables; (iii) or even the values of the information stored in the tables.

vi) Text is another widespread type of data. An application example is the moderation of social networking platforms, with early deletion of inappropriate contents and automatic closure of fraudulent accounts (see Section 10.3).

Challenge #4: data type agnostic ML-EDM

Ideally, newly developed ML-EDM approaches should be data type agnostic, i.e. they should operate for any data type presented above. To do so, a pivotal format needs to be defined in order to learn the triggering strategies in a generic way. For instance, each learning example could be characterized by a series of $T$ predictions indexed by time (corresponding to the output of the learned hypothesis $h(\mathbf{x}_t)$ for each time step $t \in [1, T]$), as well as by the ground truth $\{y_i, (s_i, e_i)\}_{i=1}^{k_x}$ composed of the true decisions to be made over time for this individual. In the particular case of ECTS, some approaches can easily be adapted to become agnostic to data type [2, 58, 59]. In contrast, others have been designed to be very specific to time series [31, 37, 83, 84], especially with the search for features (e.g. shapelets) occurring early in the time series and helping to discriminate between classes. More generally, future work in ML-EDM should definitely promote data type agnostic approaches, to allow the use of these techniques in a wide range of application conditions.
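A possible pivotal format, in the spirit described above, could be a plain container holding the per-step predictions and the true decisions located in time, whatever data type produced them; the class and field names below are illustrative assumptions, not an established interface.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class EDMExample:
    """Illustrative data-type-agnostic pivotal format: whatever the raw
    input (time series, stream, graph, text...), each individual is
    reduced to the hypothesis outputs over time plus the true decisions
    located in time."""
    predictions: List[Any]                    # h(x_t) for t = 1..T
    ground_truth: List[Tuple[Any, int, int]]  # [(y_i, s_i, e_i), ...]

# Example with T = 6 steps and one true "fail" period over [3, 5]:
ex = EDMExample(
    predictions=["ok", "ok", "fail", "fail", "fail", "ok"],
    ground_truth=[("ok", 1, 2), ("fail", 3, 5), ("ok", 6, 6)],
)
```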

6. ONLINE EARLY DECISION MAKING

In the specific case of Early Classification of Time Series (ECTS), an important limitation is that the training time series: (i) have the same length $T$; (ii) correspond to different i.i.d. individuals; (iii) have a label which characterizes the whole time period of length $T$. There are obviously applications where this formulation of the problem is relevant [7, 17, 23, 34, 50, 67, 72, 78], especially in cases where the start and end of the time series are naturally defined (e.g. a day of trading takes place from 9:30am to 4pm, during the opening hours of the stock exchange).

The development of online ML-EDM approaches could overcome these limitations and enable a new range of applications. For this purpose, let us consider that the input measurements are observed without interruption, in the form of a data stream [29]. In the case of a classification problem, an online ML-EDM approach would consist in identifying chunks in the input data stream (i.e. fixed time-windows defined by their start and end timestamps) and categorizing them according to a predefined set of classes. For example, in a predictive maintenance scenario [64], such an approach would operate on a continuous basis to detect periods of system malfunction as soon as possible.

Figure 4: Example of a data stream labeled by chunks over a time period.

Challenge #5: online and early predictions to be located in time

In the case of a classification problem, the training data consist of the measurements observed from the stream during the training period, denoted by $\mathbf{x} = \langle x_1, x_2, \ldots, x_{|\mathbf{x}|} \rangle$, associated with their labels $\mathbf{y} = \langle y_1, y_2, \ldots, y_{|\mathbf{x}|} \rangle$. A labeled chunk is formed by the consecutive measurements between the timestamps $t_a$ and $t_b$ whose labels share the same value (i.e. such that $\{y_i\}_{i \in [t_a, t_b]}$ is a singleton). As shown in Figure 4, the data stream defined over the training period is labeled by chunks of variable size. For example, these chunks could represent the periods of failure and nominal operation in a predictive maintenance scenario. During the deployment phase, the model is applied online on a data stream whose measurements are observed progressively over time. This model is expected to provide predictions located in time, since it needs to predict the beginning and the end of each chunk, associated with the predicted class which characterizes the state of the system during this chunk.
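Building the labeled chunks of Figure 4 from a per-measurement label sequence is mechanical; a minimal sketch, with timestamps starting at 1 as above:

```python
def extract_chunks(labels):
    """Group consecutive identical labels y_1..y_n into labeled chunks
    (y, t_a, t_b), with timestamps starting at 1 as in Section 6."""
    chunks, start = [], 1
    for t in range(2, len(labels) + 1):
        if labels[t - 1] != labels[start - 1]:       # label change at t
            chunks.append((labels[start - 1], start, t - 1))
            start = t
    chunks.append((labels[start - 1], start, len(labels)))
    return chunks

# e.g. a predictive-maintenance stream labeled over the training period:
print(extract_chunks(["nominal"] * 4 + ["failure"] * 2 + ["nominal"] * 3))
# -> [('nominal', 1, 4), ('failure', 5, 6), ('nominal', 7, 9)]
```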

Challenge #6: online accuracy vs. earliness trade-off

Designing online ML-EDM approaches requires redefining the accuracy vs. earliness trade-off for online decisions. The main issue is that a data stream is of indeterminate length: (i) its beginning may be too old to be considered explicitly, or can even be indeterminate; (ii) its end is never reached, since it is constantly postponed by the new measurements which arrive. In the particular case of ECTS, it is precisely the fact that the input series has a maximum length $T$, known in advance, that leads to forcing the decision when the current time $t$ becomes close to the deadline $T$.

The rest of this paragraph presents an example of adapting the accuracy vs. earliness trade-off to online decisions, developed in [4]. Let us consider a predictive maintenance problem for which a classifier has been trained in batch in order to detect the beginning and the end of abnormal chunks (see Figure 5). The prediction of the classifier focuses on a fixed timestamp $s$, and the question is to determine whether this timestamp corresponds (or not) to the beginning of an abnormal section. The input features used by the classifier are extracted from a sliding window $\mathbf{x}_t = \langle x_{t-w}, \ldots, x_{t-1}, x_t \rangle$ of length $w$. As shown in Figure 5, the sliding window $\mathbf{x}_t$ moves over time as it gets closer to $s$. At first, the timestamp $s$ is located in the future ($s > t$). Making a good prediction is difficult, since the potentially anomalous part of the stream is not yet visible in $\mathbf{x}_t$; in this case, the classifier has to detect the early signs of an anomaly. Then, the timestamp $s$ enters the $\mathbf{x}_t$ window (at time $t = 4$ in the figure). The prediction becomes easier to perform, since a part of the potentially abnormal chunk is visible in $\mathbf{x}_t$. The last possible moment to trigger the decision is reached when the timestamp $s$ is about to exit the sliding window $\mathbf{x}_t$.

Finally, the accuracy vs. earliness trade-off occurs as follows: (i) on the one hand, the accuracy of the decisions increases over time, as the classification task becomes easier while the $\mathbf{x}_t$ window shifts; (ii) on the other hand, predictive maintenance applications require early decisions which allow breakdowns to be anticipated, or at least detected early. Ultimately, this proposal consists of changing the definition of what is predicted as normal or abnormal. Here, the observation to be scored is no longer a time series of finite length, but a particular measurement of the input data stream identified by its timestamp. This proposal only partially addresses the problem, as the predictions for each timestamp would have to be consolidated in order to predict the start and end of each chunk. There are certainly other ways to adapt the accuracy vs. earliness trade-off to online decisions that would be valuable to investigate.

Figure 5: Illustration of the earliness vs. accuracy trade-off for online decisions.
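The sliding-window reading of Figure 5 can be sketched as follows: a fixed target timestamp $s$ is re-scored as the window slides, and the triggering strategy may commit at any point until $s$ exits the window; `score_start_of_anomaly` stands in for the batch-trained classifier and is an assumption of this sketch.

```python
def online_scores(stream, s, w, score_start_of_anomaly):
    """Score a fixed timestamp s as 'start of an abnormal chunk' from each
    relevant sliding window x_t = <x_{t-w}, ..., x_t> (0-indexed stream).

    Early windows end before s (only early signs are visible); the last
    chance to decide is just before s exits the window (t = s + w)."""
    scores = {}
    last_t = min(s + w, len(stream) - 1)
    for t in range(w, last_t + 1):
        window = stream[t - w: t + 1]                      # w+1 values up to t
        scores[t] = score_start_of_anomaly(window, s - t)  # s - t: lead time
    return scores  # the triggering strategy decides when to commit
```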

Challenge #7: management of non-stationarity in ML-EDM

It is not always realistic to assume stationarity of the data. In practice, data collected from a stream may suffer from several types of drifts: (i) the distribution of the measurements within the sliding window $\mathbf{x}_t$ can vary over time, which is called covariate shift [63]; (ii) the prior distribution of the classes $P(y)$ can be subject to such drifts; (iii) and the concept to be learned $P(y|\mathbf{x})$ can also change when concept drift occurs [28].

To manage these non-stationarities, a first family of approaches maintains a decision model trained on a sliding window of the most recent examples. This is a blind approach, in the sense that there is no explicit drift detection. The main problem is deciding the appropriate window size.

A second family of approaches explicitly detects the drifts [30, 48] and triggers actions when necessary, such as retraining the model from scratch, or updating a collection of models in the case of ensembles. In this case, detecting concept drift can be considered as similar to the anomaly detection problem, and ML-EDM approaches could be used to tackle it in future work. A popular idea is to train the decision model using a growing window while data is stationary, and to shrink the window when a drift is detected (see the sketch below). These kinds of approaches can easily be adapted to online ML-EDM, since they decouple model training and non-stationarity management.
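A minimal sketch of this growing/shrinking-window scheme, with a deliberately naive mean-shift test standing in for a proper drift detector such as those of [30, 48]:

```python
from statistics import mean

def adapt_training_window(history, recent, threshold=0.5):
    """Grow the training window while the stream looks stationary; shrink
    it to the recent data when a drift is detected. The mean-shift test
    below is a naive placeholder for a real drift detector."""
    drift = abs(mean(recent) - mean(history)) > threshold
    if drift:
        return list(recent), True           # shrink: retrain on recent data
    return history + list(recent), False    # grow: keep accumulating

window, _ = adapt_training_window([0.1, 0.2, 0.1, 0.15], [0.9, 1.1, 1.0])
print(len(window))  # -> 3: a drift was detected, the window shrank
```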

In the case of incremental concept drift, a third family of approaches consists in continuously adapting the model by training it online on recent data. This kind of adaptive approach is much more challenging to adapt to online ML-EDM. Indeed, as in ML-EDM problems (see Figure 3), two kinds of models are used: (i) the predictive model(s), which can categorize the input data stream at any time; (ii) the triggering strategy, which makes the decisions at the appropriate time. The main challenge in developing adaptive drift management methods for the online ML-EDM problem is that the parameters of the predictive models and of the triggering strategy must be updated jointly. These two kinds of models are highly dependent: updating the parameters of one has an impact on the optimal parameters of the other.

By contrast, in standard ML-EDM approaches which operate in batch mode, the parameters of the predictive models are first optimized, and then the parameters of the triggering strategy are optimized in turn, given the parameters of the classifiers (see paragraph B in Section 2). This two-step Machine Learning scheme is definitely not valid for managing drift online [45]. Adaptive drift management for the online ML-EDM problem has not yet been addressed in the literature and constitutes an interesting research direction. In drift detection systems, there is a trade-off between fast detection and the number of false alarms. Moreover, problems where the target (e.g. the labels) is not always available, or only available with a delay, require unsupervised or semi-supervised drift detection mechanisms. The ML-EDM framework, by improving the compromise between earliness and accuracy, can provide new approaches for drift detection.

7. REVOCABLE DECISIONS

In many situations, one can take a decision and then decide to change it after new pieces of information become available. The change may be burdensome but nevertheless justified, because it seems likely to lead to a much better outcome. This is the case, for example, when a doctor revises what now seems to be a misdiagnosis.

Similarly, ML-EDM should be extended to consider such a revocation mechanism. In the classical ML-EDM problem as described in Section 2, a prediction $h(x_{\hat{t}})$ cannot be changed once the decision is triggered at time $\hat{t} \leq T$. The cost of such an irrevocable decision is given by the loss function described by Equation 5. By contrast, the extension of ML-EDM to revocable decisions [3] allows a prediction to be modified several times before the end $T$ of the considered time period. On the one hand, the revocation of a decision generates a higher delay cost $L_{delay}$, as well as a cost of changing the decision, $L_{revoke}$. On the other hand, new data observed in the meantime provide information that makes the prediction more reliable, thus tending to decrease the misclassification cost $L_{prediction}$. Ultimately, the main issue is to identify the appropriate decisions to revoke, in order to minimize the global cost given by Equation 13.

Such an extension to revocable decisions could be of great interest: (i) in applications where the cost of changing decisions is low, i.e. the DAGs associated with each possible decision share reusable tasks (see Section 3); (ii) in applications involving online early decision making (see Section 6). There are many use cases where the need to revoke decisions appears clearly. For instance, the emergency stop system of an autonomous car brakes as soon as an obstacle is suspected on the highway, and releases the brake when it realizes, as it gets closer, that the suspected obstacle is a false positive (e.g. a dark spot on the road).

Developing ML-EDM approaches capable of appropriately revoking their decisions involves solving the two following challenges:

Challenge #8: reactivity vs. stability dilemma for revocable decisions

The first issue is to ensure that a decision change is driven by the information provided by the recently acquired measurements, and not caused by the inability of the system to produce a stable decision over time. This problem is not trivial. On the one hand, the system needs to be reactive, changing its decision promptly when necessary. On the other hand, the system is required to provide stable decisions over time, avoiding excessively frequent and undue changes. Thus, a trade-off exists between the reactivity of the system and its stability over time. One way to formalize this trade-off is to associate a cost with decision changes, as proposed in part (iii) of Equation 13. To our knowledge, only one approach uses such a cost of decision change [3], in order to penalize the revocation of too many decisions. The reactivity vs. stability dilemma of revocable decisions is understudied in the literature, and it would be interesting for the scientific community to work on this question.
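One simple way to operationalize such a penalty, sketched below with assumed scalar costs and posterior probabilities (a toy hysteresis-style rule, not the method of [3]), is to revoke the current decision only when the expected reduction in misclassification cost outweighs the cost of changing the decision:

```python
def should_revoke(p_current: float, p_candidate: float,
                  l_prediction: float, l_revoke: float) -> bool:
    """Revoke only if the expected gain beats the change cost.

    `p_current` / `p_candidate` are the posterior probabilities of the
    currently decided class and of the best alternative; `l_prediction`
    is the misclassification cost and `l_revoke` the revocation cost
    (toy scalar costs, assumed symmetric for simplicity).
    """
    expected_gain = (p_candidate - p_current) * l_prediction
    return expected_gain > l_revoke

# A small improvement (0.55 vs 0.45) does not justify a costly change:
print(should_revoke(0.45, 0.55, l_prediction=1.0, l_revoke=0.3))  # False
print(should_revoke(0.20, 0.80, l_prediction=1.0, l_revoke=0.3))  # True
```

The `l_revoke` term acts as a hysteresis band: the larger it is, the more stable (and the less reactive) the resulting sequence of decisions.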

Challenge #9: extending non-myopia to revocation risk

Non-myopic ML-EDM approaches are capable of estimating the information gain that will be provided by future measurements, based on the currently visible ones. In other words, these approaches are able to predict the reliability improvement of a decision in the future. Thus, a decision is triggered when the expected gain in misclassification cost at the next time steps does not compensate for the cost of delaying the decision [2]. In the case of revocable decisions, an important challenge is to estimate the future information gain by taking into account the risk of revocation itself. Specifically, a decision that will probably be revoked afterward should be delayed due to this risk. Conversely, a decision which promises to be sustainable should be anticipated. Designing approaches that are non-myopic to revocation risk could be an important step forward to (i) optimize the first trigger moment, and (ii) reduce the number of undue decision changes. The approach proposed in [3] constitutes a first step in this direction, by assigning a cost to decision changes and considering it in the expectation of future costs. To the best of our knowledge, this is the only approach which provides this interesting property. It is not clear whether alternative methods are possible. This is an interesting topic for further studies by the scientific community.
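In the same toy setting as above, a trigger rule that is non-myopic to revocation risk could compare the cost of deciding now, inflated by an estimated probability of later revocation, with the expected cost of waiting one more step; all quantities below (the probability and cost forecasts in particular) are assumed to be estimated upstream by some model, and this is a sketch rather than the approach of [2] or [3]:

```python
def trigger_now(cost_now: float, p_revoke: float, l_revoke: float,
                expected_cost_later: float, delay_cost: float) -> bool:
    """Trigger the decision now only if its risk-adjusted cost is lower.

    `cost_now` is the expected misclassification cost of deciding now,
    `p_revoke` the estimated probability that this decision will later
    be revoked, and `expected_cost_later` the forecast misclassification
    cost if we wait one more step (all assumed estimated upstream).
    """
    risk_adjusted_now = cost_now + p_revoke * l_revoke
    cost_of_waiting = expected_cost_later + delay_cost
    return risk_adjusted_now <= cost_of_waiting

# A decision that is likely to be revoked (p_revoke=0.6) is delayed:
print(trigger_now(0.30, 0.6, l_revoke=0.5, expected_cost_later=0.25,
                  delay_cost=0.05))  # False: 0.60 > 0.30
```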

8. ORIGIN OF THE DECISION COSTS

The origin of the delay cost has been studied in Section 3; however, it is necessary to further specify the operating scenario in order to understand the other decision costs involved in ML-EDM. Figure 6 describes a binary ECTS problem, where the actions to be performed depend on the predicted class and are described by two Directed Acyclic Graphs (DAGs). These DAGs characterize the sequence of, and the relationships between, the unit tasks which compose them (e.g. task 1 must be completed before starting task 2). Here, the DAGs of tasks are fixed: they do not depend on the decision time.

The total cost of a decision can be decomposed into:

(i) the delay cost, denoted by $L_{delay}$, which reflects the need to execute the DAG of actions corresponding to the new decision in a constrained time, and in a parallel way (already detailed in Section 3);

(ii) the decision cost, denoted by $L_{prediction}$, which corresponds to the consequences of a bad decision, or the gains of a good one;

(iii) the revocation cost, denoted by $L_{revoke}$, which is the cumulative cost of the mistakenly performed tasks belonging to the DAGs of previously made bad decisions, and which are not reusable for the new decision.

Figure 6: DAGs of tasks to be performed after the triggering of a decision.

When expressed in the same unit, these different types of costs can be summed in order to reflect the quality of the decisions made, and their timing. Thus, Equation 1 becomes:

$$L(h(x_t), t, y) = \underbrace{L_{delay}(t)}_{(i)} + \underbrace{L_{prediction}(h(x_t), y)}_{(ii)} + \underbrace{L_{revoke}\big(h(x_t) \mid \{(h(x_{\hat{t}_i}), \hat{t}_i)\}_{i \in [1, D^x_t]}\big)}_{(iii)} \qquad (13)$$

where $\{(h(x_{\hat{t}_i}), \hat{t}_i)\}_{i \in [1, D^x_t]}$ represents the sequence of the previously made decisions and their associated triggering times, with $\hat{t}_i < t$, $\forall i \in [1, D^x_t]$.

Term (ii): Taking the decision cost into account is a very common feature in the literature, particularly in the field of cost-sensitive learning [20]. These techniques take as input a function $L_{prediction}(\hat{y}|y): \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ which defines the cost of predicting $\hat{y}$ when the true class is $y$. The aim is to learn a classifier which minimizes these costs on new data.
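As a minimal reminder of how such a cost function is used at prediction time (a textbook recipe, not specific to [20]), one predicts the class that minimizes the expected cost under the classifier's posterior:

```python
import numpy as np

def min_expected_cost_class(posterior: np.ndarray,
                            cost: np.ndarray) -> int:
    """Return the class minimizing the expected prediction cost.

    `posterior[y]` is P(y | x) and `cost[y_hat, y]` is the cost of
    predicting y_hat when the true class is y.
    """
    expected = cost @ posterior   # expected cost of each possible prediction
    return int(np.argmin(expected))

# Asymmetric costs: a false negative (predict 0 when truth is 1) costs 10.
cost = np.array([[0.0, 10.0],
                 [1.0, 0.0]])
print(min_expected_cost_class(np.array([0.7, 0.3]), cost))  # 1, despite P(0)=0.7
```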

Term (iii): By contrast, the study of the revocation cost is very limited in the literature. To our knowledge, [3] is the only article that considers this problem, and this work shows that assigning a cost to decision changes is a first lever for managing the reactivity vs. stability dilemma, and for designing approaches that are non-myopic to revocation risk (as discussed in challenges #8 and #9). The origin of this cost can be explained in the light of the tasks to be performed once a decision is triggered (see Figure 6). For instance, let us consider the first decision, denoted by $(A, \hat{t}_1)$, in which the system predicts at time $\hat{t}_1$ that the input time series belongs to class A. This decision is then revoked in favor of a new decision $(B, \hat{t}_2)$. The cost of changing this decision, denoted by $L_{revoke}((B, \hat{t}_2) \mid (A, \hat{t}_1))$, can be defined as the cost of the actions already performed between $\hat{t}_1$ and $\hat{t}_2$ which turn out to be useless for the new decision, i.e. which cannot be reused in the DAG of tasks corresponding to the newly predicted class B. In order to define the costs of decision changes, it is necessary to identify the common tasks between the DAGs of the different classes and to evaluate their execution time. In addition, the entire sequence of past decisions must be taken into account to identify the already completed tasks which are now useful for the achievement of the current DAG of tasks. For instance, the cost $L_{revoke}((A, \hat{t}_3) \mid \{(A, \hat{t}_1), (B, \hat{t}_2)\})$ can be reduced by the tasks executed between $\hat{t}_1$ and $\hat{t}_2$, if these tasks are not perishable, i.e. their results are identical to those that would be obtained by re-executing them at $\hat{t}_3$.
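Under the simplifying assumptions that each task has a fixed cost and that reusability reduces to set membership (the non-perishability condition above is ignored here), this revocation cost can be sketched as:

```python
def revocation_cost(executed_tasks: set,
                    new_dag_tasks: set,
                    task_cost: dict) -> float:
    """Cost of already-executed tasks that the new decision cannot reuse.

    `executed_tasks` are tasks completed under previous decisions,
    `new_dag_tasks` the tasks of the DAG of the newly predicted class,
    and `task_cost` maps each task to its (assumed fixed) cost.
    """
    wasted = executed_tasks - new_dag_tasks   # not reusable in the new DAG
    return sum(task_cost[task] for task in wasted)

# Tasks 1-2 were run for class A; class B's DAG reuses only task 1:
print(revocation_cost({"task1", "task2"}, {"task1", "task3", "task4"},
                      {"task1": 2.0, "task2": 5.0}))  # 5.0 (task2 is wasted)
```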

Challenge #10: scheduling strategy and time-dependent costs

In this paper, the DAGs of tasks are assumed to be fixed, i.e. not depending on the decision time. However, a more general problem could be considered (see Figure 7) where the DAGs of tasks are generated by a scheduling strategy depending on: (i) the decision made; and (ii) the decision time. Such a scheduling strategy is useful in applications where the actions to be performed after a decision can be adapted to the time budget available to perform them. Two situations may occur: (i) ideally, a decision is triggered early enough to allow the scheduling strategy to generate a complete DAG of tasks which is optimal given the decision made (as in Figure 6); (ii) on the contrary, when the decision comes too late, the scheduling strategy needs to build the DAG so that it can be achieved in the remaining time (e.g. by parallelizing some tasks, or by changing or removing some of them). For instance, when flying an airplane, the tasks to be performed for an emergency landing are not the same as for a normal landing, and there is a range of situations with different emergency levels, and therefore corresponding to different time budgets.

Such a time-dependent scheduling strategy radically transforms the ML-EDM problem and the way it can be formulated. In particular, the triggering and scheduling strategies become mutually dependent:

1. Decision costs depend on the generated DAG of tasks: all the previously discussed costs result from the structure of the DAG to be performed conditionally on the decision made: (i) the relationships between the tasks; (ii) their execution time; (iii) the conditions of their reuse when they are common to several DAGs. Since the structure of the DAG to be performed now depends on the decision time, the decision costs can no longer be considered as fixed, and they are available only after scheduling.

2. The optimal decision time depends on the cost values: conversely, the triggering strategy aims to optimize the decision time based on the cost values. As described in Equation 11, the triggering strategy is learned by minimizing the empirical risk, which is itself estimated using a loss function based on the costs.

Figure 7: DAG of tasks to be performed after the triggering of a decision, generated by a scheduling strategy.

This mutual dependency between the triggering and the scheduling strategies has strong impacts on the ML-EDM problem. In particular, the optimal decision time $t^*$ described in Equation 2 must be redefined as a fixed point, i.e. the function to be optimized takes the optimal solution as an input parameter, in such a way that $t^* = \arg\min_{t^* \in [1,T]} L(h(x_{t^*}), t^*, y)$. This leads to a much more difficult class of optimization problems, for which the mere existence of a solution is difficult to ensure.
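A naive way to approach this fixed-point formulation is to alternate between scheduling at the current candidate time and re-optimizing the decision time under the resulting costs; the sketch below assumes a black-box `loss_given_schedule` function and offers no convergence guarantee, consistent with the difficulty just noted:

```python
def fixed_point_decision_time(loss_given_schedule, T: int,
                              max_iter: int = 50) -> int:
    """Alternate scheduling and time optimization until t* stabilizes.

    `loss_given_schedule(t_sched, t)` returns the loss of deciding at
    time t when the DAG of tasks was scheduled for a decision at time
    t_sched (an assumed black-box interface); convergence to a true
    fixed point is not guaranteed in general.
    """
    t_star = T  # start from the latest possible decision time
    for _ in range(max_iter):
        candidate = min(range(1, T + 1),
                        key=lambda t: loss_given_schedule(t_star, t))
        if candidate == t_star:   # fixed point reached
            break
        t_star = candidate
    return t_star
```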

Finding an optimal triggering strategy when the scheduling strategy is itself time-dependent makes ML-EDM a particularly difficult challenge, as the scheduling strategy is only known through its interactions with the triggering strategy. In this case, Reinforcement Learning seems to be a possible option to address the problem. The scheduling strategy could then be considered as part of the environment, and as a contributor to the reward signal, since it determines the decision costs for each decision taken at a particular time. However, this line of attack remains to be investigated in order to assess its merit.

In many applications, fortunately, the implementation of a scheduling strategy is much simpler, especially when the variation of decision costs over time is known in advance (or modeled, and thus partially known). The preceding remarks are reminders that, considered in all its complexity, ML-EDM becomes a very difficult problem. Addressing the case where the costs are assumed to be time-dependent but of a known form already offers interesting challenges and corresponds to a variety of applications.

9. OVERVIEW OF CHALLENGES

This section provides an overview of the previously presented challenges, indicating references which address part of these challenges (see the second column of Table 1), and summarizing the main prospects for applications in the short and long term (see the last column of Table 1). Table 1 organizes the proposed challenges by category, using colors to identify: (i) those related to changing the learning task; (ii) those related to online ML-EDM; (iii) and those related to revocable decisions.

| ML-EDM challenges | SOTA | Main application perspectives |
|---|---|---|
| #1 (Section 4) Extending non-myopia to unsupervised approaches | | In anomaly detection applications, anticipate the deviation of an observed individual from a normal behavior. |
| #2 (Section 4) Addressing other supervised learning tasks | | Adapt ECTS approaches to extrinsic regression problems. Develop forecasting methods whose prediction horizon can adapt. |
| #3 (Section 4) Early weakly supervised learning (WSL) | | Adapt ECTS approaches to the different WSL classification scenarios. |
| #4 (Section 5) Data type agnostic ML-EDM | [2, 18, 56, 57] | Identify agnostic approaches in the literature and promote this feature. Define a pivotal format allowing the development of an ML-EDM library. |
| #5 (Section 6) Online predictions to be located in time | | Applications where the arrival of an event (e.g. a failure) must be predicted in advance, as well as its duration. |
| #6 (Section 6) Online accuracy vs. earliness trade-off | [4] | Optimize decision time in online predictive maintenance applications. |
| #7 (Section 6) Management of non-stationarity in ML-EDM | | Properly manage the potentially long life of ML-EDM models. |
| #8 (Section 7) Reactivity vs. stability dilemma for revocable decisions | [3] | Applications where undue and excessive decision changes must be avoided. |
| #9 (Section 7) Non-myopia to revocation risk | [3] | Applications where it is necessary to delay decisions which are likely to be changed later. |
| #10 (Section 8) Scheduling strategy and time-dependent decision costs | | Applications where the variation of the decision costs over time is known or can be modeled. Applications where the scheduling strategy is only known through its interactions with the triggering strategy. |

Table 1: Overview of the proposed challenges by category: in blue those related to the learning task, in green those related to online ML-EDM, in purple those related to revoking decisions, and in white the others.

10. USE CASES

ML-EDM approaches can be applied to a wide range of applications, such as cyber security [87], medicine [41], and surgery [69]. This section develops some key use cases and identifies possible advances in the near future, if the proposed challenges are met.

10.1 Early classification of fetal heart rates

There are no precise figures on the number of deaths in childbirth due to poor oxygenation. According to the Portuguese Directorate-General for Health, 192 fetuses died due to hypoxia in 2013. This is a prime example of a setting where making informed early decisions is critical. Cardiotocography techniques are used to assess fetal well-being through continuous monitoring of the fetal heart rate and uterine contractions [51]. Labor is a potentially threatening situation for fetal well-being, as strong uterine contractions stop the flow of maternal blood to the placenta, compromising fetal oxygenation [61].

In this field, ML-EDM techniques could be of great help to detect the early warning signs of complications during childbirth. This application can be addressed as an ECTS problem, as a fetal heart rate signal constitutes a time series. The extension of ECTS techniques to revocable decisions would be very relevant here (see challenges #8 and #9), allowing for active monitoring of the child's well-being on a continuous basis, until delivery. In addition, two particular aspects need to be taken into account in developing an efficient approach: (i) the prediction cost $L_{prediction}$ is highly asymmetrical, since a false negative can mean the death of the baby or the mother; (ii) the deadline $T$, which represents the moment of delivery, is uncertain and varying. Thus, the deadline $T$ corresponds to the occurrence of an event (i.e. the birth) which can be modeled as a random variable, as in [26, 44].
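As a toy illustration of handling such an uncertain deadline (a sketch under the assumption that the distribution of $T$ is known, not a reproduction of [26, 44]), one can at least quantify the risk that a chosen decision time falls after the deadline:

```python
import numpy as np

def late_decision_risk(decision_time: int,
                       deadline_pmf: np.ndarray) -> float:
    """Probability that the deadline occurs before the decision is made.

    `deadline_pmf[t]` is the (assumed known) probability that the
    deadline T equals timestamp t; deciding at `decision_time` is too
    late whenever T < decision_time.
    """
    return float(deadline_pmf[:decision_time].sum())

# Deadline uniformly distributed over 10 timestamps:
pmf = np.full(10, 0.1)
print(late_decision_risk(4, pmf))  # 0.4
```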

10.2 Digital twin in production systems

Digital Twin (DT) is an important and active concept in the area of Industry 4.0. With the development of low-cost sensors and efficient IoT communication facilities, almost all production systems are now equipped with several sensors enabling real-time monitoring and helping with decisions about maintenance, or when failures occur. In this section, we consider digital twins of cyber-physical systems (CPS) which are in operation.

The main digital twin applications [27] are related to smart cities, manufacturing, healthcare and industry. The role of the DT is thus to use the data streams coming from the sensors of the CPS in order to constantly calibrate simulation models of the different components of the system. Indeed, this offers several opportunities, namely: (1) detection of anomalies when the system deviates from the simulation model; (2) diagnosis of dysfunctions when they occur; (3) exploration of different scenarios for system evolution in case of dysfunction; (4) recommendation of repair actions.

Effective maintenance management methods are vital, and industries seek to minimize the number of operational failures. The availability of large volumes of data coming from the sensors of a CPS makes the use of Machine Learning techniques, supervised or unsupervised, very appealing. Typical unsupervised ML approaches are related to anomaly detection [66], where an alarm should be triggered when the behavior of the CPS differs from normal running. Typical supervised ML approaches in the context of manufacturing and industry are related to predictive maintenance [14, 64]. Predictive Maintenance (PdM) is a data-driven approach that emerged with Industry 4.0. It uses statistical analysis and Machine Learning (ML) models to model the behavior of complex systems, identify trends, and predict failures.

We review below some challenges of the paper in light of this domain. Challenge #1 (extending non-myopia to unsupervised approaches) is relevant, since an efficient anomaly detection system requires unsupervised approaches, which can be combined with physics-based simulations of the different components. Challenge #2 (other supervised tasks) is also appropriate, since both classification and regression problems appear (e.g. breakdown occurrence, prediction of energy consumption). Challenge #4 (data type agnostic) is especially relevant for DTs, since a system is always composed of several heterogeneous components. In this situation, the update of one component, or of one or several sensors, would be much easier and cheaper if ML-EDM were data type agnostic. DTs operating at the system level lead to complex prediction models and complex decisions, since the different components operate differently but in interaction (cf. challenge #5). The ability to manage non-stationarity (cf. challenge #7) is obviously central in DTs: aging and wear of equipment lead to covariate and concept drifts which must be taken into account.

10.3 Social networks: societal and psychological risks

Online social networking platforms are more popular than ever. They have radically transformed the way we communicate with each other. However, this transformation comes with many problems on both sides, for users and platforms.

For example, fake news spread widely during the COVID-19 pandemic. [5] tackled this problem as a binary classification problem where the classes are "fake" and "real" news. Fake accounts are also considered a major problem, as they are among the main culprits in spreading false information. For instance, [8, 21, 25, 74] use Machine Learning techniques to detect these fake accounts based on interactions between users. Fake accounts can also be used for harassment and can induce major psychological risks [81]. The detection of depression and risk of suicide has been addressed using Machine Learning techniques in [15, 39].

Decisions taken by Machine Learning models to prevent such risks on social networks are clearly time-sensitive:

• Fake news must be