
The Thirty-Third AAAI Conference on Artiﬁcial Intelligence (AAAI-19)

Congestion Graphs for Automated Time Predictions

Arik Senderovich, J. Christopher Beck
Mechanical and Industrial Engineering, University of Toronto, Canada
sariks@mie.utoronto.ca, jcb@mie.utoronto.ca

Avigdor Gal
Industrial Engineering and Management, Technion-Israel Institute of Technology, Israel
avigal@technion.ac.il

Matthias Weidlich
Dept. of Computer Science, Humboldt-Universität zu Berlin, Germany
matthias.weidlich@hu-berlin.de

Abstract

Time prediction is an essential component of decision making in various Artificial Intelligence application areas, including transportation systems, healthcare, and manufacturing. Predictions are required for efficient resource allocation and scheduling, optimized routing, and temporal action planning. In this work, we focus on time prediction in congested systems, where entities share scarce resources. To achieve accurate and explainable time prediction in this setting, features describing system congestion (e.g., workload and resource availability) must be considered. These features are typically gathered using process knowledge (i.e., insights on the interplay of a system's entities). Such knowledge is expensive to gather and may be completely unavailable. In order to automatically extract such features from data without prior process knowledge, we propose the model of congestion graphs, which are grounded in queueing theory. We show how congestion graphs are mined from raw event data using queueing theory based assumptions on the information contained in these logs. We evaluate our approach on two real-world datasets from healthcare systems where scarce resources prevail: an emergency department and an outpatient cancer clinic. Our experimental results show that using automatic generation of congestion features, we get an up to 23% improvement in terms of relative error in time prediction, compared to common baseline methods. We also detail how congestion graphs can be used to explain delays in the system.

Introduction

Accurate time prediction is important in domains where having an accurate estimate of resource availability and the duration of tasks is critical for planning, scheduling, resource allocation, and coordination. In healthcare, the time until a patient sees a provider in an emergency department is crucial for ambulance routing and provider scheduling (Ang et al. 2015). Similarly, in smart cities, predicted travel and arrival times of public transportation feed directly into routing and dispatching (Botea, Nikolova, and Berlingerio 2013; Wilkie et al. 2011). In manufacturing, in turn, predictions of cycle times for a product are used to set customer due dates and anticipate job completion times (Backus et al. 2006).

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

An effective approach to solve a time prediction problem is to formulate it as a supervised learning task, where future time points are predicted based on raw event data (Senderovich et al. 2015). This data is commonly available in the form of event logs, recordings of the behavior of a system, which contain temporal information. For example, every visit to the emergency department is associated with a sequence of timestamped events that the patient experienced (e.g., start of triage and end of treatment).

Previous work has shown that congestion has a substantial impact on the total time spent in a system (Gal et al. 2017) and hence on the quality of time prediction. However, event logs lack explicit information on the load imposed by arriving entities that are processed by shared (and scarce) resources. State-of-the-art methods, therefore, consider additional features that capture congestion of shared resources. These features are elicited by gathering extensive knowledge about the underlying process (e.g., by conducting interviews with stakeholders) and subsequently computed from the event logs (Ang et al. 2015). However, process knowledge is expensive to gather and not always easy to elicit, as stakeholders often lack a global view of the process. It is well-known that elicitation of process knowledge is hindered by its fragmentation across stakeholders, their focus on individual entities, and a general lack of conceptualization capabilities (Rosemann 2006; Frederiks and van der Weide 2006; Dumas et al. 2018). In addition, manual feature elicitation is often time consuming and prone to biases and errors. The process of feature generation is considered an art, making it difficult to automate (Khurana, Samulowitz, and Turaga 2018).

In this work, we address the challenge of automatically generating congestion features based on the information available in event logs, thus removing the need for prior process knowledge. To this end, we propose a data-driven method rooted in queueing theory, a sub-field of Operations Research that analyzes the impact of congestion on a system's performance (Bolch et al. 2006). Our contribution is threefold.

1. We introduce congestion graphs, dynamic networks that capture queueing information.

2. We present a declarative mining procedure that automatically constructs congestion graphs from event data without the need for process knowledge.

3. We show how to extract congestion-related features from congestion graphs.

We empirically test our approach using event logs from two real-world healthcare systems, predicting the time to meet the first physician in an emergency department and the total time spent in an outpatient cancer clinic. Incorporating our congestion features improves the relative error of prediction by up to 23% and 14%, respectively, compared to baseline prediction methods using the same process knowledge.

Data-Driven Time Predictions

In this section, we define our data model in the form of event logs and then pose the problem of automated time prediction via supervised learning. We conclude the section with an overview of our approach to generate congestion-related features from an event log in order to solve the time prediction problem.

Event Logs

As our data model, we consider event data as collected by modern information systems (i.e., event logs) that trace the events that occur in the underlying system (van der Aalst 2016). For example, in a hospital setting, an event log will comprise patient pathways, represented by a sequence of timestamped services that denote treatment steps (e.g., XRAY ordering, start of physical examination, etc.). Table 1 is a sample from the event log of an emergency department. Here, the handling of a specific entity (i.e., a patient) is represented by the notion of a case that is encoded by a case identifier present in all log entries. Event logs represent raw data for individual cases, but do not contain explicit system-level information (e.g., the number of available resources and the number of cases waiting for a service).

Table 1: An event log of an emergency department.

Case Id | Event Name              | Timestamp
11      | Registration            | 7:30:04
11      | Nurse Admission Start   | 7:35:52
13      | Additional Vitals End   | 7:36:07
13      | Lab Tests Results Start | 7:40:32
11      | Nurse Admission End     | 7:47:12
13      | Lab Tests Results End   | 7:51:02
12      | Additional Vitals Start | 7:52:48
11      | Order Blood Test        | 8:05:10
11      | Additional Vitals Start | 8:36:22
11      | Additional Vitals End   | 8:48:37
12      | Additional Vitals End   | 8:57:45
13      | Doctor Admission Start  | 8:59:08
11      | Doctor Admission Start  | 9:12:45

To formalize the notions of cases and their traces in event logs, let $\mathcal{I}$, $\mathcal{E}$, and $\mathcal{T}$ be the finite universes of case identifiers, event names, and timestamps, respectively. Then, an event log $L \subseteq (\mathcal{I} \times \mathcal{E} \times \mathcal{T})$ is a set of log entries, triplets that combine a case, an event name, and a timestamp.

We define some short-hand notation to refer to the log entries of a single case. Given a log $L$, the trace $\sigma_i$ of case $i \in \mathcal{I}$ comprises all the related log entries in order:

$$\sigma_i = \langle (i, e^i_1, t^i_1), (i, e^i_2, t^i_2), \ldots, (i, e^i_n, t^i_n) \rangle$$

with $\sigma_i(q) = (i, e^i_q, t^i_q) \in L$, $1 \le q \le n$, such that $t_q < t_r$ for $1 \le q < r \le n$ (log entries are ordered by their timestamp) and $\bigcup_{1 \le q \le n} \{(i, e^i_q, t^i_q)\} = \{(i_r, e_r, t_r) \in L \mid i_r = i\}$ (the trace contains all log entries of case $i$). As such, $e^i_q$ and $t^i_q$ denote the event name and timestamp of the $q$-th log entry of the trace of case $i$, respectively. We assume that the first event $e^i_1$ is an arrival event of case $i$ into the system. In what follows, we shall omit sub- and superscripts whenever clear from the context. Moreover, we write $|\sigma_i| = n_i$ to denote the length of the $i$-th trace.
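As an illustration, the definitions above can be sketched in plain Python: log entries as triplets and traces as per-case, timestamp-ordered lists. The log fragment below is hypothetical (timestamps as seconds since midnight), not the paper's dataset.

```python
from collections import defaultdict

# A hypothetical fragment of an event log: triplets (case id, event name,
# timestamp), with timestamps given as seconds since midnight. Entries need
# not arrive sorted; each trace is ordered by timestamp, as in the definition.
log = [
    (11, "Registration", 27004),           # 7:30:04
    (13, "Additional Vitals End", 27367),  # 7:36:07
    (11, "Nurse Admission Start", 27352),  # 7:35:52
    (11, "Nurse Admission End", 28032),    # 7:47:12
]

def traces(log):
    """Group log entries by case id and sort each trace sigma_i by timestamp."""
    by_case = defaultdict(list)
    for case, event, ts in log:
        by_case[case].append((case, event, ts))
    return {case: sorted(entries, key=lambda entry: entry[2])
            for case, entries in by_case.items()}

sigma = traces(log)
print([event for _, event, _ in sigma[11]])
# ['Registration', 'Nurse Admission Start', 'Nurse Admission End']
```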

Time Prediction with Supervised Learning

We are interested in predicting the timestamp of an event related to a specific case. For example, in an emergency department, we are interested in the time that a patient sees a physician for the first time, as it is crucial information for online ambulance routing (for acute patients) and for a patient's choice of an emergency department (for low-acuity patients) (Ang et al. 2015). In other contexts, such as the treatment of cancer patients, the time until the end of the last treatment step is an important indicator for quality-of-service.

The prediction target is therefore the time $t_e$ when a patient first reaches a specific event $e \in \mathcal{E}$, conditioned on the time $t_1$ of arrival ($e_1$). Using supervised learning, every log entry in the training set is given a label $y = t_e - t_1$ for the prediction target, denoting the universe of such labels by $\mathcal{Y}$. Consequently, the input for the learning algorithm is a labeled event log denoted by $L_y \subseteq L \times \mathcal{Y}$. We aim to obtain a function $h : L \to \mathcal{Y}$, which maps log entries to corresponding labels (Shalev-Shwartz and Ben-David 2014).
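The labeling step can be sketched as follows: for every case that reaches the target event, the arrival entry is paired with the label $y = t_e - t_1$. The traces, event names, and times (in minutes) below are invented for illustration.

```python
# Hypothetical traces: case id -> time-ordered (case, event, timestamp) entries.
traces = {
    1: [(1, "Arrival", 0), (1, "Triage", 12), (1, "Doctor Admission Start", 55)],
    2: [(2, "Arrival", 3), (2, "Triage", 20)],  # never reaches the target event
}

def label_log(traces, target_event):
    """Build the labeled log L_y: pairs of an arrival entry and its label y = t_e - t_1."""
    labeled = []
    for entries in traces.values():
        t1 = entries[0][2]  # arrival timestamp t_1
        te = next((ts for _, e, ts in entries if e == target_event), None)
        if te is not None:  # cases that never reach the target carry no label
            labeled.append((entries[0], te - t1))
    return labeled

print(label_log(traces, "Doctor Admission Start"))  # [((1, 'Arrival', 0), 55)]
```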

A main challenge when applying supervised learning to solve prediction problems is to obtain features that explain the target variable. However, in systems with shared and scarce resources, raw event recordings do not contain congestion-related information, such as system load and the number of available resources. In this work, we therefore propose a feature transformation function $\phi : L \times \mathcal{Y} \to \mathcal{X} \times \mathcal{Y}$ that maps raw labeled event recordings into a set of features ($\mathcal{X}$) with the following two capabilities:

(i) The proposed method is automatically applicable without prior knowledge of the system under investigation or the specific semantics of events recorded in the log;

(ii) The proposed method is grounded in well-established results from queueing theory, thereby guiding the feature generation procedure with insights on the impact of congestion on the system's temporal behavior.

Approach Overview

We use a model-driven approach to automatically generate congestion-related features, as illustrated in Figure 1. Given an event log, we first mine congestion graphs, graphical representations of the dynamics observed in the system. These dynamic graphs represent the flow of entities in terms of events and are labeled with performance information that is extracted from the event log. Extraction of such performance information is grounded in some general assumptions on the system dynamics: in this work, on a state representation of an underlying queueing system. Lastly, we create a transformation function $\phi_G$ that encodes the labels of a congestion graph into respective features. This feature creation yields an enriched event log, which can be used as input for a supervised learning method.

[Figure 1 depicts the pipeline: Event Log → Congestion Graph Mining → Congestion Graph → Feature Extraction → Enriched Event Log.]

Figure 1: Our solution to generate congestion features.

Congestion Graphs

We start the section with an overview of a general queueing model that serves as our theoretical basis, before introducing the model of congestion graphs. Then, we demonstrate mining of congestion graphs and show how these graphs are used for feature extraction.

Generalized Jackson Networks

For time prediction in queueing networks, we consider the model of a Generalized Jackson Network (GJN), the most general model in single-server queueing theory (Gamarnik and Zeevi 2006). A GJN describes a network of queueing stations, where entities wait for a particular service (e.g., a treatment step) that is conducted by or uses shared resources.

The entities are assumed to be non-distinguishable and may arrive exogenously into any of the queueing stations according to a renewal process. Upon arrival, entities are served in a First-Come First-Served order by a single resource, with service times being independent and identically distributed. Hence, the length-of-stay (or sojourn time) at a queueing station is the sum of waiting time and service time. When entities complete service at a station, they are either routed to the next station or depart the system. Routing is assumed to be Bernoulli distributed: a coin is flipped at the end of service to decide on the next station (or departure).

As a GJN model postulates that each station has a single resource, multiple resources are modeled by an increased processing rate of a station: the service rate is multiplied by the number of resources.

The state of the GJN corresponds to a Markov process, known as the Markov state representation (MSR), that comprises three components: the queue length, the elapsed time since the most recent arrival, and the time since the start of the most recent service (Gamarnik and Zeevi 2006; Chen and Yao 2013). To capture the state at time $t$, the three components must be measured just prior to time $t$.
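To build intuition for how congestion drives waiting in such networks, the waiting times at a single FCFS station can be sketched with the classical Lindley recursion. This is a generic single-station illustration, not the paper's mining procedure; the exponential interarrival and service times are an assumption of the sketch.

```python
import random

# Waiting times in a FCFS single-server queue via the Lindley recursion:
# W[k+1] = max(0, W[k] + S[k] - A[k+1]), with i.i.d. interarrival and
# service times (here exponential, i.e., an M/M/1 station).
def fcfs_waits(n, mean_interarrival, mean_service, seed=1):
    rng = random.Random(seed)
    w, waits = 0.0, []
    for _ in range(n):
        waits.append(w)
        s = rng.expovariate(1.0 / mean_service)       # service of current entity
        a = rng.expovariate(1.0 / mean_interarrival)  # gap to the next arrival
        w = max(0.0, w + s - a)                       # wait of the next entity
    return waits

waits = fcfs_waits(10000, mean_interarrival=2.0, mean_service=1.0)
print(round(sum(waits) / len(waits), 2))  # near 1.0 for this station (rho = 0.5)
```

The sojourn time of each entity is then its wait plus its service time, matching the decomposition in the text.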

The Model of Congestion Graphs

A congestion graph is a fully-connected, vertex-labeled, directed graph $G = (V, F, \omega)$ with $V$ being the vertices and $F = V \times V$ being the edges. The labeling is based on a universe $\Omega$ of vertex labels and is time-varying. With $\mathcal{T}$ as the universe of timestamps (as introduced above for event logs), function $\omega : V \times \mathcal{T} \to \Omega$ assigns a label to vertices at particular points in time. We denote by $\omega_t(v)$ the label of vertex $v \in V$ at time $t \in \mathcal{T}$.

In our work, we define congestion graph labels using the MSR of a GJN. Specifically, a congestion graph can be thought of as a GJN where each edge represents a queueing station. The time that cases spend on edges of the congestion graph represents service times, while events (in the event log) correspond to congestion graph vertices. Hence, given a point in time $t$ and an edge $(v, v')$ of the congestion graph, its MSR is given by a triplet that consists of: (1) the number of cases traveling on edge $(v, v')$; (2) the time elapsed since the most recent arrival of a case into edge $(v, v')$; and (3) the time elapsed since the start of the most recent service at $(v, v')$.

However, we cannot determine the edge of an ongoing case at time $t$, as this information is not directly accessible in event logs. At a time point $t$, we only know the last event observed for each case ($v$), without knowing the next event ($v'$). Thus, we label the vertices of the congestion graph rather than its edges.

Following this idea, we construct the congestion graph $G = (V, F, \omega)$ by setting the vertices $V$ to be the set of all events observed in the log and by assigning time-dependent vertex labels as approximations of the MSR. Specifically, we set $\omega_t(v)$ to be a tuple $(n(v,t), \ell(v,t), \tau(v,t))$, where $n(v,t)$ is the number of cases for which $v$ is the most recent event (i.e., the number of cases that are in transition to the service after $v$); $\ell(v,t)$ is the total time since these cases visited $v$ (i.e., the accumulated partial transition delays); and $\tau(v,t)$ is the time between the two most recent occurrences of the respective event $v$.

Feature Extraction from Mined Congestion Graphs

We conclude this section by providing the declarative procedure to derive the approximated MSR from an event log $L$ and demonstrating how to extract features from the mined congestion graph.

Given an event log $L$, mining of a congestion graph involves the extraction of events that yield the vertices of the graph, $V = \{e \in \mathcal{E} \mid (i, e, t) \in L\}$, the identification of dependencies between the events that yield the edges, $F = \{(e^i_q, e^i_{q+1}) \in (\mathcal{E} \times \mathcal{E}) \mid i \in \mathcal{I}, 1 \le q < |\sigma_i|\}$, and the definition of the labeling function. As explained above, these labels are defined for particular points in time. However, in practice, the labeling function does not need to be defined for every timestamp in $\mathcal{T}$, but may be limited to the timestamps that appear in the event log ($T = \{t \in \mathcal{T} \mid (i, e, t) \in L\}$).
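The structural part of the mining step (vertices from observed events, edges from directly-following event pairs) can be sketched as follows; the trace fragment is hypothetical.

```python
# A sketch of extracting the graph structure from traces (case id ->
# time-ordered (event, timestamp) pairs); the fragment is invented.
traces = {
    11: [("Registration", 27004), ("Nurse Admission Start", 27352),
         ("Nurse Admission End", 28032)],
    13: [("Additional Vitals End", 27367), ("Lab Tests Results Start", 27632)],
}

def graph_structure(traces):
    """V: all events observed in the log; F: pairs of consecutive events of some trace."""
    V = {e for entries in traces.values() for e, _ in entries}
    F = {(entries[q][0], entries[q + 1][0])
         for entries in traces.values() for q in range(len(entries) - 1)}
    return V, F

V, F = graph_structure(traces)
print(len(V), len(F))  # 5 observed events, 3 observed dependencies
```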

We derive the labels in terms of the approximated MSR as follows. The number of cases in transition at time $t$, for which the last event was $v$, is given by:

$$n(v,t) = \left|\left\{\, i \in \mathcal{I} \;\middle|\; \exists\, 1 \le q \le |\sigma_i| : e^i_q = v \wedge t^i_q < t < t^i_{q+1} \right\}\right|.$$

The total elapsed time $\ell(v,t)$ for cases, for which event $v$ has just been observed, is calculated as:

$$\ell(v,t) = \sum_{i \in \mathcal{I}} \left\{\, t - t^i_q \;\middle|\; \exists\, 1 \le q \le |\sigma_i| : e^i_q = v \wedge t^i_q < t < t^i_{q+1} \right\}.$$

Finally, the time between the two most recent occurrences of event $v$ prior to time $t$ is defined as:

$$\tau(v,t) = t' - t'', \quad \text{with} \quad t' = \max_{\substack{i \in \mathcal{I},\, 1 \le q \le |\sigma_i| \\ e^i_q = v \,\wedge\, t^i_q < t < t^i_{q+1}}} t^i_q, \qquad t'' = \max_{\substack{i \in \mathcal{I},\, 1 \le q \le |\sigma_i| \\ e^i_q = v \,\wedge\, t^i_q < t' < t^i_{q+1}}} t^i_q.$$

Note that the mining procedure for label derivation has a complexity that is linear in the number of events recorded in the event log: the algorithm makes a single pass over the log to compute the labels.

[Figure 2 shows part of the mined congestion graph: six numbered vertices for the events Registration, Nurse admission, Order blood test, Additional vitals, Lab tests results, and Doctor admission, connected by directed edges between consecutive events.]

Figure 2: A part of the congestion graph constructed using the event log of Table 1.

We illustrate mining a congestion graph using the event log of Table 1. The general structure of the congestion graph is shown in Figure 2, which maps out all the events and their dependencies as recorded in the event log. Note that, for clarity, the figure presents only edges that appear in Table 1, rather than showing the fully-connected congestion graph. We further illustrate the MSR of one of the graph's vertices. Consider the fourth event, referring to the additional vitals. The MSR $\omega_t(4)$ of this event is estimated for time 9:00:00 as follows: two patients are in transition (patients 11 and 12), their accumulated delay is 13m38s, and the delay between the respective treatment events is 9m8s. Hence, the MSR for the fourth event at time 9:00:00 is given as $\omega_t(4) = (2, 13\text{m}38\text{s}, 9\text{m}8\text{s})$.
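The worked example can be checked mechanically. The sketch below is a direct, unoptimized transcription of the three label definitions (not the single-pass procedure described in the text), applied to the Table 1 log with timestamps converted to seconds since midnight.

```python
# The Table 1 log as traces (case id -> time-ordered (event, timestamp) pairs),
# queried for the "Additional Vitals End" event (vertex 4) at 9:00:00.
def hms(h, m, s=0):
    return 3600 * h + 60 * m + s

traces = {
    11: [("Registration", hms(7, 30, 4)), ("Nurse Admission Start", hms(7, 35, 52)),
         ("Nurse Admission End", hms(7, 47, 12)), ("Order Blood Test", hms(8, 5, 10)),
         ("Additional Vitals Start", hms(8, 36, 22)), ("Additional Vitals End", hms(8, 48, 37)),
         ("Doctor Admission Start", hms(9, 12, 45))],
    12: [("Additional Vitals Start", hms(7, 52, 48)), ("Additional Vitals End", hms(8, 57, 45))],
    13: [("Additional Vitals End", hms(7, 36, 7)), ("Lab Tests Results Start", hms(7, 40, 32)),
         ("Lab Tests Results End", hms(7, 51, 2)), ("Doctor Admission Start", hms(8, 59, 8))],
}

def occurrences(traces, v, t):
    """Timestamps t_q with e_q = v and t_q < t < t_{q+1}: cases 'on' v at time t."""
    out = []
    for entries in traces.values():
        for q, (e, ts) in enumerate(entries):
            nxt = entries[q + 1][1] if q + 1 < len(entries) else float("inf")
            if e == v and ts < t < nxt:
                out.append(ts)
    return out

def msr(traces, v, t):
    """Approximated MSR label (n(v,t), l(v,t), tau(v,t)) of vertex v at time t."""
    starts = occurrences(traces, v, t)
    n, ell = len(starts), sum(t - ts for ts in starts)
    t1 = max(starts, default=None)                       # most recent occurrence t'
    prev = occurrences(traces, v, t1) if t1 is not None else []
    t2 = max(prev, default=None)                         # second most recent, t''
    tau = t1 - t2 if t1 is not None and t2 is not None else None
    return n, ell, tau

print(msr(traces, "Additional Vitals End", hms(9, 0)))  # (2, 818, 548)
```

The result (2, 818, 548) in seconds is exactly the (2, 13m38s, 9m8s) of the worked example.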

The vertex labels of the congestion graph induce a set of congestion features. For a graph $G = (V, F, \omega)$, the transformation applied to the event log to extract these features at time $t$, denoted by $\phi_G$, is simply:

$$\phi_G : L \times \mathcal{Y} \to L \times \Omega \times \mathcal{Y}, \qquad \phi_G(i, e^i_q, t^i_q, y) = (i, e^i_q, t^i_q, \omega_t(e^i_q), y),$$

with $q$ being the most recent event with respect to $t$.

Evaluation

In this section, we present the main findings of evaluating our congestion mining technique against real-world event logs from two healthcare systems, namely an emergency department and a large outpatient cancer clinic. Our main results are summarized as follows:

• Extracting features from congestion graphs increases the accuracy of time prediction by up to 8% with respect to the best benchmark.

• In terms of relative error (i.e., the ratio between the error and the actually observed time), we achieve improvements of up to 23%.

• Congestion graphs are able to provide insights into causes of delay via feature ranking.

Experimental Setup and Procedure

We first describe the experimental setup in terms of the two real-world datasets and the benchmarks used to assess the applicability of our approach. We then outline the overall experimental procedure and implementation and define our accuracy evaluation measures.

Datasets and Time Prediction Queries
Our experiments use two real-world event logs:

• ED: The event log from the Electronic Health Records of an Israeli emergency department that serves approximately 100 patients per day. Every patient that enters the emergency department receives a bar-code that is scanned at the start and end of every medical procedure. A subset of the patient treatment events was illustrated in Table 1 and Figure 2. The actual treatment procedures, however, are more complex, as there are 13 different types of treatments. The dataset covers April 2014 to December 2014 and includes approximately 42,000 patient visits.

• CC: An outpatient cancer clinic (the Dana-Farber Cancer Institute in Boston, MA), in which 250 health providers serve 1,000 patients per day. The dataset is based on a track log that comes from a Real-Time Locating System (RTLS). The resulting event log is based on nearly 1,000 RTLS sensors that track patients, physicians, nurses, and equipment with a resolution of 3 seconds, thereby monitoring the system in real-time. The recordings contain sensor (location) description and the floor number where the tracked entity was observed. The dataset contains recordings between April 2014 and December 2014.

Comparing the two datasets, we observe that the average length-of-stay for ED is 300 minutes with a standard deviation of 307 minutes, while for CC patients the stay is approximately 150 minutes with a standard deviation of 120 minutes. ED patients wait for a physician an average of 60 minutes. Furthermore, the emergency department (ED) operates on a 24/7 basis, while the outpatient clinic (CC) opens at 6:00 and closes for new arrivals at 18:00. Both healthcare systems experience high load during morning hours: for ED the load peaks between 10:00 and noon, while for CC, the high load period spans 9:00 to noon.

For each healthcare system, we chose the query that is most relevant given the specific application context. That is, for ED, we predict the time-to-physician upon a patient's arrival. For CC, our prediction target is the length-of-stay of an arriving patient.

Baseline Techniques
We compare our approach for time prediction based on features extracted from mined congestion graphs against several baseline techniques. First, we consider the long-term average (LongTerm) based on the training set. This technique should perform poorly as it does not account for varying congestion levels. However, it is often used for time prediction in hospitals across the United States (Dong, Yom-Tov, and Yom-Tov 2015; Ang et al. 2015). Second, a refined version of LongTerm is a rolling horizon predictor that is based on the moving average of H periods (e.g., hours) (Ang et al. 2015). We denote it by Rolling(H) and cross-validate the optimal H using the training data. Third, we use an hourly average (HourAvg) to accommodate for seasonal effects, deriving time-of-day information from the timestamps assigned to log entries. Fourth, we use the snapshot predictor, which predicts time-to-physician and length-of-stay, respectively, based on the wait time of the most recent patient that finished waiting. This predictor is considered the state of the art in delay prediction for single-station queues (Senderovich et al. 2015).
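Schematic versions of three of these baselines might look as follows. The representation (a stream of completion-timestamp/observed-wait pairs, hours and minutes) and the data are invented for illustration; the paper's exact implementations may differ.

```python
# Stream of (completion timestamp in hours, observed wait in minutes) pairs.
history = [(8.0, 40.0), (9.0, 55.0), (10.5, 70.0), (11.0, 65.0)]

def long_term(history):
    """LongTerm: the overall average of the training set."""
    return sum(w for _, w in history) / len(history)

def rolling(history, t, H):
    """Rolling(H): the moving average over the last H hours before t."""
    recent = [w for ts, w in history if t - H <= ts <= t]
    return sum(recent) / len(recent) if recent else long_term(history)

def snapshot(history, t):
    """Snapshot: the wait of the most recent entity that finished waiting."""
    past = [(ts, w) for ts, w in history if ts <= t]
    return max(past)[1] if past else None

print(long_term(history), snapshot(history, 11.0))  # 57.5 65.0
```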

Experimental Procedure and Implementation
We follow the training-validation-test paradigm (Friedman, Hastie, and Tibshirani 2001) to evaluate our approach and randomly partition the two datasets into training data and test data. Specifically, for each dataset we make the following four partitions:

• Single month training: We use patients that arrived during April 2014 as training data and patients that were admitted during May 2014 as test data. This reduces the possibility of concept drift, at the expense of reducing the size of the training set.

• Summer months: We use April 2014 - June 2014 for training and test the technique on patients that arrived during July 2014. We leave out winter months as they are known to be heavily loaded (concept drift).

• Entire year: We use April 2014 - October 2014 for training and November 2014 - December 2014 for testing. This increases the variability due to concept drift, yet provides the learning algorithm with much more training data.

• Peak hours: We choose the heavily loaded hours for each of the healthcare systems, as measured by the arrival rates of patients. As in the entire year scenario, we use April 2014 - October 2014 for training and November 2014 - December 2014 for testing.

In our experiments, we rely on a state-of-the-art supervised learning algorithm, XGBoost (Chen and Guestrin 2016), implemented in Python. It is employed to learn a function h (see Motivation) based on the training set, validate its hyperparameters using cross-validation on the validation set (the training data is partitioned 80/20 chronologically for this purpose), and evaluate prediction accuracy on the test set.

All algorithms for congestion graph mining and feature extraction are implemented in Python and are publicly available.¹ Our experiments were conducted on an 8-core server, Intel Xeon CPU E5-2660 v4 @ 2.00GHz, each core being equipped with 32GB main memory, running on Linux CentOS 7.3 OS.

Evaluation Measures
We measure the accuracy of prediction with three empirical measures. First, the Root Mean Squared Error (RMSE) is based on the squared difference between the actual time and the predicted value. Let $y^*_l$ be the actual value of $y_l$, the time of interest for a log entry of the test set $l \in L_{test}$. With $\hat{y}_l$ being the predicted value, the RMSE is defined as:

$$RMSE = \sqrt{\frac{1}{|L_{test}|} \sum_{l \in L_{test}} [\hat{y}_l - y^*_l]^2}.$$

RMSE quantifies the error in the time units of the original measurements, in our case, seconds (which are converted to minutes below for convenience).

The RMSE is sensitive to outliers (Friedman, Hastie, and Tibshirani 2001). Therefore, in addition, we consider the absolute error, which is known to be more robust (Friedman, Hastie, and Tibshirani 2001). Specifically, we use the following two measures. The Mean Absolute Error (MAE) is defined as:

$$MAE = \frac{1}{|L_{test}|} \sum_{l \in L_{test}} |\hat{y}_l - y^*_l|,$$

and quantifies the absolute deviation between the predicted value and the real value. The Mean Absolute Relative Error (MARE), in turn, is defined as:

$$MARE = \frac{1}{|L_{test}|} \sum_{l \in L_{test}} \frac{|\hat{y}_l - y^*_l|}{y^*_l},$$

and quantifies the ratio between the absolute error and the actual value. The latter is used to provide a relative measure for the absolute error, as an error of 10 minutes in a 100-minute length-of-stay is tolerable, while the same error in a 5-minute length-of-stay points toward a significant problem in the method.
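The three measures are direct to compute; a transcription with an invented prediction/actual pair for illustration:

```python
import math

def rmse(pred, actual):
    """Root Mean Squared Error over paired predictions and actuals."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def mae(pred, actual):
    """Mean Absolute Error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def mare(pred, actual):
    """Mean Absolute Relative Error (absolute error divided by the actual value)."""
    return sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)

pred, actual = [110.0, 90.0], [100.0, 100.0]
print(rmse(pred, actual), mae(pred, actual), mare(pred, actual))  # 10.0 10.0 0.1
```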

Results

The main results of our experiments are summarized in Table 2. The rows correspond to all combinations of dataset (ED and CC), the training period, and the method ('LongTerm', 'Rolling(H)', 'HourAvg', 'Snapshot', and 'CG' for congestion graph). To denote the training and test periods, we use the numeric values of months (e.g., 4 for April). Further, we add the relevant hours for the high load scenario (e.g., 9-12 corresponds to 9:00-noon). The boldfaced values are the dominating methods in terms of the three measures. The values of the first two accuracy measures (RMSE and MAE) correspond to the prediction error in minutes. The third accuracy measure, namely MARE, is a ratio between the absolute error and the actual time that we wish to predict.

Table 2: Prediction accuracy based on the test set.

DS | Time Period                           | Method     | RMSE | MAE | MARE
ED | Tr=5, Test=6                          | LongTerm   | 46   | 33  | 0.79
ED | Tr=5, Test=6                          | Rolling(H) | 47   | 33  | 0.73
ED | Tr=5, Test=6                          | HourAvg    | 47   | 33  | 0.72
ED | Tr=5, Test=6                          | Snapshot   | 47   | 34  | 0.74
ED | Tr=5, Test=6                          | CG         | 45   | 32  | 0.70
ED | Tr=4,5,6, Test=7                      | LongTerm   | 43   | 33  | 0.77
ED | Tr=4,5,6, Test=7                      | Rolling(H) | 42   | 31  | 0.74
ED | Tr=4,5,6, Test=7                      | HourAvg    | 43   | 32  | 0.76
ED | Tr=4,5,6, Test=7                      | Snapshot   | 43   | 31  | 0.74
ED | Tr=4,5,6, Test=7                      | CG         | 41   | 29  | 0.69
ED | Tr=4:10, Test=11,12                   | LongTerm   | 99   | 38  | 1.48
ED | Tr=4:10, Test=11,12                   | Rolling(H) | 98   | 35  | 1.37
ED | Tr=4:10, Test=11,12                   | HourAvg    | 100  | 38  | 1.46
ED | Tr=4:10, Test=11,12                   | Snapshot   | 101  | 42  | 1.65
ED | Tr=4:10, Test=11,12                   | CG         | 97   | 32  | 1.27
ED | Tr=4,5,6, Test=7, High Load (10-12)   | LongTerm   | 39   | 28  | 0.67
ED | Tr=4,5,6, Test=7, High Load (10-12)   | Rolling(H) | 39   | 27  | 0.65
ED | Tr=4,5,6, Test=7, High Load (10-12)   | HourAvg    | 38   | 28  | 0.64
ED | Tr=4,5,6, Test=7, High Load (10-12)   | Snapshot   | 38   | 27  | 0.64
ED | Tr=4,5,6, Test=7, High Load (10-12)   | CG         | 36   | 25  | 0.60
CC | Tr=5, Test=6                          | LongTerm   | 118  | 95  | 1.35
CC | Tr=5, Test=6                          | Rolling(H) | 115  | 90  | 1.28
CC | Tr=5, Test=6                          | HourAvg    | 112  | 89  | 1.26
CC | Tr=5, Test=6                          | Snapshot   | 120  | 96  | 1.36
CC | Tr=5, Test=6                          | CG         | 106  | 82  | 1.17
CC | Tr=4,5,6, Test=7                      | LongTerm   | 123  | 96  | 1.30
CC | Tr=4,5,6, Test=7                      | Rolling(H) | 117  | 90  | 1.22
CC | Tr=4,5,6, Test=7                      | HourAvg    | 115  | 89  | 1.20
CC | Tr=4,5,6, Test=7                      | Snapshot   | 122  | 94  | 1.27
CC | Tr=4,5,6, Test=7                      | CG         | 108  | 81  | 1.09
CC | Tr=4:10, Test=11,12                   | LongTerm   | 123  | 97  | 1.36
CC | Tr=4:10, Test=11,12                   | Rolling(H) | 119  | 92  | 1.30
CC | Tr=4:10, Test=11,12                   | HourAvg    | 117  | 93  | 1.28
CC | Tr=4:10, Test=11,12                   | Snapshot   | 123  | 95  | 1.33
CC | Tr=4:10, Test=11,12                   | CG         | 110  | 83  | 1.16
CC | Tr=4,5,6, Test=7, High Load (9-12)    | LongTerm   | 114  | 93  | 1.37
CC | Tr=4,5,6, Test=7, High Load (9-12)    | Rolling(H) | 113  | 92  | 1.36
CC | Tr=4,5,6, Test=7, High Load (9-12)    | HourAvg    | 114  | 93  | 1.34
CC | Tr=4,5,6, Test=7, High Load (9-12)    | Snapshot   | 114  | 93  | 1.36
CC | Tr=4,5,6, Test=7, High Load (9-12)    | CG         | 104  | 82  | 1.20

As shown in Table 2, considering inter-patient dependencies in the data, by means of features extracted from congestion graphs, improves prediction accuracy beyond the baselines ('LongTerm', 'Rolling(H)', 'HourAvg', 'Snapshot'), especially when considering the MARE measure. When considering the time-to-physician in the emergency department (ED), congestion features increase prediction accuracy by up to 6%. As for relative error (ratio between the error and the actual time), we observe an improvement of 23%. This general trend is mirrored for the second dataset. For the cancer clinic (CC), congestion features improve the accuracy of length-of-stay prediction by up to 8%, while the relative error is improved by up to 14%. The consistent results for both datasets provide evidence that the automatic extraction of congestion features indeed improves the accuracy of time prediction significantly.

¹ http://bit.ly/2lcq37s

Table 3: Importance of congestion features (ED dataset).

Ranking | Feature | Description
1       | n(1)    | # of Patients in Reception
2       | ℓ(5)    | Elapsed Time: Lab Results
3       | ℓ(4)    | Elapsed Time: Additional Vitals

When observing the difference between entire year prediction and shorter periods, we encounter a noticeable, yet expected, concept drift. Predicting winter months using the beginning of the year is expected to perform worse than short-term predictions, as winter behavior is different due to higher arrival rates into the emergency department and an increased number of cancellations in the outpatient hospital. Specifically, the error grows by a factor of 2 compared to summer-time prediction. Testing the different predictors for their robustness to concept drift, we discover that congestion features deteriorate less than other prediction methods across the different selections of time periods for training and test.

Insights using Feature Importance

Providing insights as to the most important features and root-

causes for delays in the system is a crucial step when opti-

mizing systems. We now take the dataset of the emergency

department (ED) as an example to show how the features

obtained from congestion graphs provide insights into the

root-causes of delays.

We evaluate feature importance by ranking features according to their role in the prediction task. Specifically, gradient boosting enables the ranking of features in correspondence to their predictive power (Pan et al. 2009). Table 3 presents the top-3 features given as an output by the cross-validated XGBoost method during heavily loaded hours.
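As an illustrative sketch of this ranking step (not the paper's actual pipeline), the following uses scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost, on synthetic data with hypothetical feature names; only the ranking mechanism is the point.

```python
# Sketch: ranking congestion features by predictive power with gradient
# boosting, mirroring how a table like Table 3 is produced. Data and
# feature names are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
feature_names = ["n(1)", "l(5)", "l(4)", "tau(2)"]
X = rng.rand(500, len(feature_names))
# Synthetic target: remaining time driven mostly by the first two features.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.randn(500)

model = GradientBoostingRegressor(n_estimators=100, max_depth=3,
                                  random_state=0)
model.fit(X, y)

# Rank features by impurity-based importance (normalized to sum to 1).
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda p: p[1], reverse=True)
for rank, (name, score) in enumerate(ranking, start=1):
    print(rank, name, round(float(score), 3))
```

With XGBoost itself, the analogous call would be `model.feature_importances_` or `get_booster().get_score()`; the ranked list is then read off exactly as above.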

The extracted features (over all times in the event log, hence time index t is omitted) are denoted by (n(v), ℓ(v), τ(v)), with n(v) being the number of cases for which v is the most recent event, ℓ(v) being the total time since these cases visited v, and τ(v) being the time between the two most recent occurrences of the respective event v.
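As a minimal sketch of these definitions (assuming a hypothetical event-log layout of (case_id, event, timestamp) triples; the elapsed-time symbol ℓ(v) is rendered as `ell` in code), the three feature families can be computed as follows:

```python
# Sketch: computing congestion features (n(v), ell(v), tau(v)) from a
# simple event log of (case_id, event, timestamp) triples at query time t.
# The log layout and event names are hypothetical illustrations.
from collections import defaultdict

def congestion_features(log, t):
    """log: list of (case_id, event, timestamp) with float timestamps."""
    last = {}                         # most recent event (and time) per case
    occurrences = defaultdict(list)   # all occurrence times per event
    for case, event, ts in sorted(log, key=lambda r: r[2]):
        if ts <= t:
            last[case] = (event, ts)
            occurrences[event].append(ts)

    n, ell, tau = defaultdict(int), defaultdict(float), {}
    for event, ts in last.values():
        n[event] += 1                 # cases whose most recent event is v
        ell[event] += t - ts          # total time since those cases visited v
    for event, times in occurrences.items():
        if len(times) >= 2:           # gap between two most recent occurrences
            tau[event] = times[-1] - times[-2]
    return dict(n), dict(ell), tau

log = [("c1", "reception", 1.0), ("c2", "reception", 2.0),
       ("c1", "lab", 4.0)]
n, ell, tau = congestion_features(log, t=5.0)
print(n)    # → {'lab': 1, 'reception': 1}
print(ell)  # → {'lab': 1.0, 'reception': 3.0}
print(tau)  # → {'reception': 1.0}
```

Each feature is a single pass over the sorted log, consistent with the linear-time complexity claimed later in the paper.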

Also, for illustration purposes, Figure 3 shows the full congestion graph, which represents the pathway of a patient in the Emergency Department. The vertices and outgoing edges that correspond to the highest-ranked congestion features are highlighted. Recall that the congestion graph was created automatically from the event log and that, as noted in the Introduction, such a system view is exactly what is expensive and difficult to obtain through traditional methods. The dominant feature for the emergency department based on the congestion graph is n(1), the number of patients who entered reception. This implies that a greater arrival volume has an impact on time prediction, as it results in delays. The second feature, ℓ(5), corresponds to the elapsed time since lab results are ready (i.e., blood work). This feature is highly predictive, as the next step after the lab is typically the visit to the physician, the prediction target. Hence, an important feature is the time in queue for the physician (which is ℓ(5)). For the same reason, feature ℓ(4) turned out to be of high predictive power, as some patients pass immediately from checking vitals to the physician.

Figure 3: Main treatment events and flows; events and flows of important features are highlighted. (The graph's nodes include Registration, Nurse admission, Order blood test, Additional vitals, Lab test results, Doctor admission, Order imaging test, Order consultant, Imaging, Imaging decrypting, Consultancy, Order external exams, and External exams.)

To summarize, an interpretation of feature importance yields insights on root causes of delays in patient treatment. As such, mined congestion graphs provide a means for analysis and understanding of the process beyond time prediction.

Related Work

The importance of extracting features that account for congestion has been recognized in the literature. A recent work proposes the Q-Lasso method for predicting waiting times in emergency departments (Ang et al. 2015). The authors assume full knowledge of the patient flow process and use this knowledge to manually define queueing features (e.g., the number of patients waiting for a physician) that are inserted into a Lasso regression model for feature selection. Similarly, Senderovich et al. (2015) proposed a single-station queueing model that is heavily based on process knowledge to generate predictive features. In our work, we do not assume a-priori knowledge of the process and the events that we observe in the event log. Furthermore, compared to Senderovich et al. (2015), our approach handles event logs from multi-stage processes.

Liu et al. (2014; 2016) propose a method for discovering a stochastic workflow model from event data that is emitted by a real-time location system. Applied in healthcare, the developed model considers dependencies between patients that stay in the hospital at the same time. However, in these works, the authors assume known relations between sensor locations and activities. This information is used to enrich the data with additional knowledge, while our method does not require a data enrichment step.

Congestion estimation and prediction have been the subject of numerous works in traffic analysis (Liu, Yue, and Krishnan 2013; Van Lint 2008). Most works in this area aim to learn a generative model of dynamic traffic conditions. In contrast, our work is based on discriminative machine learning and formalizes this idea using congestion graphs.

Automated feature generation has been a popular research topic (see Khurana, Samulowitz, and Turaga (2018) and references within for a review). Specifically, given a pre-defined set of generic feature transformation functions (e.g., sine, square root, logarithm), a wide range of techniques has been applied to elicit optimal transformation sequences, including reinforcement learning (Khurana, Samulowitz, and Turaga 2018), local search (Markovitch and Rosenstein 2002), and deep neural networks (Bengio, Courville, and Vincent 2013). However, these methods are either computationally expensive (e.g., training a deep neural network) and/or lack the capability to discover complex features. In our work, we provide an approach that generates predictive features that come from queueing theory and cannot be easily derived using generic transformation functions. Furthermore, unlike generic features, our congestion graph based features can be used for a root-cause analysis of delays. Importantly, our method has a complexity that is linear in the number of events recorded in the event log.

In addition, temporal point processes have been fitted from data to provide accurate time prediction (Lian et al. 2015; Trivedi et al. 2017). These methods learn features from node representations of a temporal graph, which can then be used to predict times. An important distinction between our paper and these two papers is that our congestion graphs are based on Generalized Jackson Networks, a queueing model that does not assume prior knowledge on the distributions of its building blocks (e.g., arrival rates and service times). In contrast, the two papers assume that the underlying model is either a temporal point process (Trivedi et al. 2017) or a Gaussian renewal process (Lian et al. 2015) with parametric or non-parametric structures.

Lastly, our work also relates to the task of activity prediction, an established problem in the data mining field (Minor, Doppa, and Cook 2015). However, our setting is oriented towards cold-start queries, where information about the progress of a specific patient is unavailable. Specifically, Minor, Doppa, and Cook capture inter-entity dependencies via pre-defined features, such as the most frequent event type in a time window. Our method, in contrast, automatically generates these features using congestion-based reasoning rooted in queueing theory.

Conclusion

We presented a novel approach for automated feature extraction for time prediction in congested systems, based on the notion of congestion graphs: dynamic representations of event data that are grounded in queueing theory. Specifically, our notion of congestion graphs is based on a Markovian state representation of queueing systems. Empirical evaluation confirms that the features that come from these congestion graphs improve prediction performance. In addition, we observe that our approach goes beyond accurate time prediction by providing insights into the root causes of system behavior.


Future work involves extending our methods to support changes in the underlying system. Specifically, our techniques are prone to failure when the mapping of states to predictions is unstable. Therefore, we aim at developing an adaptive online component to compensate for such changes. Furthermore, congestion graphs result in O(|V|) features, with |V| being the number of events in the data, which hampers scalability. Specifically, in large systems with thousands of events, this can lead to feature explosion. Hence, in the future, we shall provide techniques for regularizing congestion graphs, e.g., by considering only edges that have significant predictive power.

Acknowledgements

Research partially funded by the German Research Foundation (DFG) under grant agreement number WE 4891/1-1. We are also grateful to the Stiftung Industrieforschung for supporting this work (grant S0234/10220/2017).

References

Ang, E.; Kwasnick, S.; Bayati, M.; Plambeck, E. L.; and Aratow, M. 2015. Accurate emergency department wait time prediction. Manufacturing & Service Operations Management 18(1):141–156.

Backus, P.; Janakiram, M.; Mowzoon, S.; Runger, C.; and Bhargava, A. 2006. Factory cycle-time prediction with a data-mining approach. IEEE Transactions on Semiconductor Manufacturing 19(2):252–258.

Bengio, Y.; Courville, A. C.; and Vincent, P. 2013. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8):1798–1828.

Bolch, G.; Greiner, S.; de Meer, H.; and Trivedi, K. S. 2006. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications, 2nd Edition. Wiley.

Botea, A.; Nikolova, E.; and Berlingerio, M. 2013. Multi-modal journey planning in the presence of uncertainty. In ICAPS.

Chen, T., and Guestrin, C. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. ACM.

Chen, H., and Yao, D. D. 2013. Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization, volume 46. Springer Science & Business Media.

Dong, J.; Yom-Tov, E.; and Yom-Tov, G. B. 2015. The impact of delay announcements on hospital network coordination and waiting times. Technical report, working paper.

Dumas, M.; Rosa, M. L.; Mendling, J.; and Reijers, H. A. 2018. Fundamentals of Business Process Management, Second Edition. Springer.

Frederiks, P. J. M., and van der Weide, T. P. 2006. Information modeling: The process and the required competencies of its participants. Data Knowl. Eng. 58(1):4–20.

Friedman, J.; Hastie, T.; and Tibshirani, R. 2001. The Elements of Statistical Learning, volume 1. Springer Series in Statistics. Springer, Berlin.

Gal, A.; Mandelbaum, A.; Schnitzler, F.; Senderovich, A.; and Weidlich, M. 2017. Traveling time prediction in scheduled transportation with journey segments. Inf. Syst. 64:266–280.

Gamarnik, D., and Zeevi, A. 2006. Validity of heavy traffic steady-state approximations in generalized Jackson networks. The Annals of Applied Probability 56–90.

Khurana, U.; Samulowitz, H.; and Turaga, D. S. 2018. Feature engineering for predictive modeling using reinforcement learning. In McIlraith, S. A., and Weinberger, K. Q., eds., Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018. AAAI Press.

Lian, W.; Henao, R.; Rao, V.; Lucas, J. E.; and Carin, L. 2015. A multitask point process predictive model. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, 2030–2038.

Liu, C.; Ge, Y.; Xiong, H.; Xiao, K.; Geng, W.; and Perkins, M. 2014. Proactive workflow modeling by stochastic processes with application to healthcare operation and management. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1593–1602. ACM.

Liu, C.; Xiong, H.; Papadimitriou, S.; Ge, Y.; and Xiao, K. 2016. A proactive workflow model for healthcare operation and management. IEEE Transactions on Knowledge and Data Engineering.

Liu, S.; Yue, Y.; and Krishnan, R. 2013. Adaptive collective routing using Gaussian process dynamic congestion models. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 704–712. ACM.

Markovitch, S., and Rosenstein, D. 2002. Feature generation using general constructor functions. Machine Learning 49(1):59–98.

Minor, B.; Doppa, J. R.; and Cook, D. J. 2015. Data-driven activity prediction: Algorithms, evaluation methodology, and applications. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 805–814. ACM.

Pan, F.; Converse, T.; Ahn, D.; Salvetti, F.; and Donato, G. 2009. Feature selection for ranking using boosted trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2025–2028. ACM.

Rosemann, M. 2006. Potential pitfalls of process modeling: part A. Business Proc. Manag. Journal 12(2):249–254.

Senderovich, A.; Weidlich, M.; Gal, A.; and Mandelbaum, A. 2015. Queue mining for delay prediction in multi-class service processes. Information Systems 53:278–295.

Shalev-Shwartz, S., and Ben-David, S. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.

Trivedi, R.; Dai, H.; Wang, Y.; and Song, L. 2017. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, 3462–3471.

van der Aalst, W. M. P. 2016. Process Mining: Data Science in Action, Second Edition. Springer.

Van Lint, J. 2008. Online learning solutions for freeway travel time prediction. IEEE Transactions on Intelligent Transportation Systems 9(1):38–47.

Wilkie, D.; van den Berg, J. P.; Lin, M. C.; and Manocha, D. 2011. Self-aware traffic route planning. In AAAI, volume 11, 1521–1527.
