The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
Congestion Graphs for Automated Time Predictions
Arik Senderovich, J. Christopher Beck
Mechanical and Industrial Engineering
University of Toronto
Canada
sariks@mie.utoronto.ca
jcb@mie.utoronto.ca
Avigdor Gal
Industrial Engineering and Management
Technion-Israel Institute of Technology
Israel
avigal@technion.ac.il
Matthias Weidlich
Dept. of Computer Science
Humboldt-Universität zu Berlin
Germany
matthias.weidlich@hu-berlin.de
Abstract
Time prediction is an essential component of decision making in various Artificial Intelligence application areas, including transportation systems, healthcare, and manufacturing. Predictions are required for efficient resource allocation and scheduling, optimized routing, and temporal action planning. In this work, we focus on time prediction in congested systems, where entities share scarce resources. To achieve accurate and explainable time prediction in this setting, features describing system congestion (e.g., workload and resource availability) must be considered. These features are typically gathered using process knowledge (i.e., insights into the interplay of a system's entities). Such knowledge is expensive to gather and may be completely unavailable. In order to automatically extract such features from data without prior process knowledge, we propose the model of congestion graphs, which is grounded in queueing theory. We show how congestion graphs are mined from raw event data using queueing-theoretic assumptions on the information contained in these logs. We evaluate our approach on two real-world datasets from healthcare systems where scarce resources prevail: an emergency department and an outpatient cancer clinic. Our experimental results show that the automatic generation of congestion features yields up to a 23% improvement in relative prediction error over common baseline methods. We also detail how congestion graphs can be used to explain delays in the system.
Introduction
Accurate time prediction is important in domains where having an accurate estimate of resource availability and the duration of tasks is critical for planning, scheduling, resource allocation, and coordination. In healthcare, the time until a patient sees a provider in an emergency department is crucial for ambulance routing and provider scheduling (Ang et al. 2015). Similarly, in smart cities, predicted travel and arrival times of public transportation feed directly into routing and dispatching (Botea, Nikolova, and Berlingerio 2013; Wilkie et al. 2011). In manufacturing, in turn, predictions of cycle times for a product are used to set customer due dates and anticipate job completion times (Backus et al. 2006).
An effective approach to solve a time prediction problem is to formulate it as a supervised learning task, where future time points are predicted based on raw event data (Senderovich et al. 2015). This data is commonly available in the form of event logs, recordings of the behavior of a system, which contain temporal information. For example, every visit to the emergency department is associated with a sequence of timestamped events that the patient experienced (e.g., start of triage and end of treatment).
Previous work has shown that congestion has a substantial impact on the total time spent in a system (Gal et al. 2017) and hence on the quality of time prediction. However, event logs lack explicit information on the load imposed by arriving entities that are processed by shared (and scarce) resources. State-of-the-art methods, therefore, consider additional features that capture congestion of shared resources. These features are elicited by gathering extensive knowledge about the underlying process (e.g., by conducting interviews with stakeholders) and subsequently computed from the event logs (Ang et al. 2015). However, process knowledge is expensive to gather and not always easy to elicit, as stakeholders often lack a global view of the process. It is well-known that elicitation of process knowledge is hindered by its fragmentation across stakeholders, their focus on individual entities, and a general lack of conceptualization capabilities (Rosemann 2006; Frederiks and van der Weide 2006; Dumas et al. 2018). In addition, manual feature elicitation is often time consuming and prone to biases and errors. The process of feature generation is considered an art, making it difficult to automate (Khurana, Samulowitz, and Turaga 2018).
In this work, we address the challenge of automatically generating congestion features based on the information available in event logs, thus removing the need for prior process knowledge. To this end, we propose a data-driven method rooted in queueing theory, a sub-field of Operations Research that analyzes the impact of congestion on a system's performance (Bolch et al. 2006). Our contribution is threefold.
1. We introduce congestion graphs, dynamic networks that capture queueing information.
2. We present a declarative mining procedure that automatically constructs congestion graphs from event data without the need for process knowledge.
3. We show how to extract congestion-related features from congestion graphs.
We empirically test our approach using event logs from two real-world healthcare systems, predicting the time to meet the first physician in an emergency department and the total time spent in an outpatient cancer clinic. Incorporating our congestion features improves the relative error of prediction by up to 23% and 14%, respectively, compared to baseline prediction methods using the same process knowledge.
Data-Driven Time Predictions
In this section, we define our data model in the form of event logs and then pose the problem of automated time prediction via supervised learning. We conclude the section with an overview of our approach to generate congestion-related features from an event log in order to solve the time prediction problem.
Event Logs
As our data model, we consider event data as collected by modern information systems (i.e., event logs) that trace the events that occur in the underlying system (van der Aalst 2016). For example, in a hospital setting, an event log will comprise patient pathways, represented by a sequence of timestamped services that denote treatment steps (e.g., X-ray ordering, start of physical examination, etc.). Table 1 shows a sample from the event log of an emergency department. Here, the handling of a specific entity (i.e., a patient) is represented by the notion of a case that is encoded by a case identifier present in all log entries. Event logs represent raw data for individual cases, but do not contain explicit system-level information (e.g., the number of available resources and the number of cases waiting for a service).
Table 1: An event log of an emergency department.
Case Id Event Name Timestamp
11 Registration 7:30:04
11 Nurse Admission Start 7:35:52
13 Additional Vitals End 7:36:07
13 Lab Tests Results Start 7:40:32
11 Nurse Admission End 7:47:12
13 Lab Tests Results End 7:51:02
12 Additional Vitals Start 7:52:48
11 Order Blood Test 8:05:10
11 Additional Vitals Start 8:36:22
11 Additional Vitals End 8:48:37
12 Additional Vitals End 8:57:45
13 Doctor Admission Start 8:59:08
11 Doctor Admission Start 9:12:45
To formalize the notions of cases and their traces in event logs, let $\mathcal{I}$, $\mathcal{E}$, and $\mathcal{T}$ be the finite universes of case identifiers, event names, and timestamps, respectively. Then, an event log $L \subseteq (\mathcal{I} \times \mathcal{E} \times \mathcal{T})$ is a set of log entries, triplets that combine a case, an event name, and a timestamp.

We define some short-hand notation to refer to the log entries of a single case. Given a log $L$, the trace $\sigma_i$ of case $i \in \mathcal{I}$ comprises all the related log entries in order:

$$\sigma_i = \langle (i, e^i_1, t^i_1), (i, e^i_2, t^i_2), \ldots, (i, e^i_n, t^i_n) \rangle$$

with $\sigma_i(q) = (i, e^i_q, t^i_q) \in L$, $1 \le q \le n$, such that $t_q < t_r$ for $1 \le q < r \le n$ (log entries are ordered by their timestamp) and $\bigcup_{1 \le q \le n} \{(i, e^i_q, t^i_q)\} = \{(i_r, e_r, t_r) \in L \mid i_r = i\}$ (the trace contains all log entries of case $i$). As such, $e^i_q$ and $t^i_q$ denote the event name and timestamp of the $q$-th log entry of the trace of case $i$, respectively. We assume that the first event $e^i_1$ is an arrival event of case $i$ into the system. In what follows, we shall omit sub- and superscripts whenever clear from the context. Moreover, we write $|\sigma_i| = n_i$ to denote the length of the $i$-th trace.
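To make the data model concrete, the following Python sketch (our illustration with hypothetical helper names, not the paper's released code) represents an event log as a collection of $(i, e, t)$ triplets and groups it into timestamp-ordered traces:

```python
from collections import defaultdict
from datetime import datetime

# A log entry is a triplet (case_id, event_name, timestamp), as in Table 1.
LOG = [
    (11, "Registration", "7:30:04"),
    (11, "Nurse Admission Start", "7:35:52"),
    (13, "Additional Vitals End", "7:36:07"),
    # ... remaining rows of Table 1
]

def ts(s):
    """Parse a clock time from Table 1 (the date itself is irrelevant here)."""
    return datetime.strptime(s, "%H:%M:%S")

def traces(log):
    """Group log entries by case and order them by timestamp (the trace sigma_i)."""
    by_case = defaultdict(list)
    for case, event, t in log:
        by_case[case].append((event, ts(t)))
    return {case: sorted(entries, key=lambda e: e[1])
            for case, entries in by_case.items()}
```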
Time Prediction with Supervised Learning
We are interested in predicting the timestamp of an event related to a specific case. For example, in an emergency department, we are interested in the time that a patient sees a physician for the first time, as it is crucial information for online ambulance routing (for acute patients) and for a patient's choice of an emergency department (for low-acuity patients) (Ang et al. 2015). In other contexts, such as the treatment of cancer patients, the time until the end of the last treatment step is an important indicator for quality-of-service.

The prediction target is therefore the time $t_e$ when a patient first reaches a specific event $e \in \mathcal{E}$, conditioned on the time $t_1$ of arrival ($e_1$). Using supervised learning, every log entry in the training set is given a label $y = t_e - t_1$ for the prediction target, denoting the universe of such labels by $\mathcal{Y}$. Consequently, the input for the learning algorithm is a labeled event log denoted by $L_y \subseteq L \times \mathcal{Y}$. We aim to obtain a function $h : L \to \mathcal{Y}$, which maps log entries to corresponding labels (Shalev-Shwartz and Ben-David 2014).
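As an illustration, a label can be derived for each trace by pairing its arrival entry with $y = t_e - t_1$. The sketch below builds on the trace helper above and uses the first occurrence of the target event:

```python
def label_trace(trace, target_event):
    """Return y = t_e - t_1 in seconds for the first occurrence of
    target_event, or None if the case never reaches it."""
    _, t1 = trace[0]  # the first entry is the arrival event e_1
    for event, t in trace:
        if event == target_event:
            return (t - t1).total_seconds()
    return None

# Example: time-to-physician for case 11 in Table 1 (9:12:45 - 7:30:04).
# label_trace(traces(LOG)[11], "Doctor Admission Start")
```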
A main challenge when applying supervised learning to solve prediction problems is to obtain features that explain the target variable. However, in systems with shared and scarce resources, raw event recordings do not contain congestion-related information, such as system load and the number of available resources. In this work, we therefore propose a feature transformation function $\phi : L \times \mathcal{Y} \to \mathcal{X} \times \mathcal{Y}$ that maps raw labeled event recordings into a set of features ($\mathcal{X}$) with the following two capabilities:

(i) The proposed method is automatically applicable without prior knowledge of the system under investigation or the specific semantics of events recorded in the log;

(ii) The proposed method is grounded in well-established results from queueing theory, thereby guiding the feature generation procedure with insights on the impact of congestion on the system's temporal behavior.
Approach Overview
We use a model-driven approach to automatically generate congestion-related features, as illustrated in Figure 1. Given an event log, we first mine congestion graphs, graphical representations of the dynamics observed in the system. These dynamic graphs represent the flow of entities in terms of events and are labeled with performance information that is extracted from the event log. Extraction of such performance information is grounded in general assumptions on the system dynamics: in this work, on a state representation of an underlying queueing system. Lastly, we create a transformation function $\phi_G$ that encodes the labels of a congestion graph into respective features. This feature creation yields an enriched event log, which can be used as input for a supervised learning method.

Figure 1: Our solution to generate congestion features. [Pipeline: Event Log → Congestion Graph Mining → Congestion Graph (Infinite Resources, Snapshot, Markovian) → Feature Extraction → Enriched Event Log.]
Congestion Graphs
We start the section with an overview of a general queueing model that serves as our theoretical basis, before introducing the model of congestion graphs. Then, we demonstrate mining of congestion graphs and show how these graphs are used for feature extraction.
Generalized Jackson Networks
For time prediction in queueing networks, we consider the model of a Generalized Jackson Network (GJN), the most general model in single-server queueing theory (Gamarnik and Zeevi 2006). A GJN describes a network of queueing stations, where entities wait for a particular service (e.g., a treatment step) that is conducted by or uses shared resources.

The entities are assumed to be non-distinguishable and may arrive exogenously into any of the queueing stations according to a renewal process. Upon arrival, entities are served in First-Come First-Served order by a single resource, with service times being independent and identically distributed. Hence, the length-of-stay (or sojourn time) at a queueing station is the sum of waiting time and service time. When entities complete service at a station, they are either routed to the next station or depart the system. Routing is assumed to be Bernoulli distributed: a coin is flipped at the end of service to decide on the next station (or departure).

As a GJN model postulates that each station has a single resource, multiple resources are modeled by an increased processing rate of a station: the service rate is multiplied by the number of resources.

The state of the GJN corresponds to a Markov process, known as the Markov state representation (MSR), that comprises three components: the queue length, the elapsed time since the most recent arrival, and the time since the start of the most recent service (Gamarnik and Zeevi 2006; Chen and Yao 2013). To capture the state at time $t$, the three components must be measured just prior to time $t$.
The Model of Congestion Graphs
A congestion graph is a fully-connected, vertex-labeled, directed graph $G = (V, F, \omega)$, with $V$ being the vertices and $F = V \times V$ being the edges. The labeling is based on a universe $\Omega$ of vertex labels and is time-varying. With $\mathcal{T}$ as the universe of timestamps (as introduced above for event logs), the function $\omega : V \times \mathcal{T} \to \Omega$ assigns a label to vertices at particular points in time. We denote by $\omega_t(v)$ the label of vertex $v \in V$ at time $t \in \mathcal{T}$.
In our work, we define congestion graph labels using the MSR of a GJN. Specifically, a congestion graph can be thought of as a GJN where each edge represents a queueing station. The times that cases spend on edges of the congestion graph represent service times, while events (in the event log) correspond to congestion graph vertices. Hence, given a point in time $t$ and an edge $(v, v')$ of the congestion graph, its MSR is given by a triplet that consists of: (1) the number of cases traveling on edge $(v, v')$; (2) the time elapsed since the most recent arrival of a case into edge $(v, v')$; and (3) the time elapsed since the start of the most recent service at $(v, v')$.

However, we cannot determine the edge of an ongoing case at time $t$, as this information is not directly accessible in event logs. At a time point $t$, we only know the last event observed for each case ($v$), without knowing the next event ($v'$). Thus, we label the vertices of the congestion graph rather than its edges.
Following this idea, we construct the congestion graph $G = (V, F, \omega)$ by setting the vertices $V$ to be the set of all events observed in the log and by assigning time-dependent vertex labels as approximations of the MSR. Specifically, we set $\omega_t(v)$ to be a tuple $(n(v,t), \ell(v,t), \tau(v,t))$, where $n(v,t)$ is the number of cases for which $v$ is the most recent event (i.e., the number of cases that are in transition to the service after $v$); $\ell(v,t)$ is the total time since these cases visited $v$ (i.e., the accumulated partial transition delays); and $\tau(v,t)$ is the time between the two most recent occurrences of the respective event $v$.
Feature Extraction from Mined Congestion Graphs
We conclude this section by providing the declarative procedure to derive the approximated MSR from an event log $L$ and demonstrating how to extract features from the mined congestion graph.
Given an event log $L$, mining a congestion graph involves the extraction of the events that yield the vertices of the graph, $V = \{e \in \mathcal{E} \mid (i, e, t) \in L\}$, the identification of dependencies between events that yield the edges, $F = \{(e^i_q, e^i_{q+1}) \in (\mathcal{E} \times \mathcal{E}) \mid i \in \mathcal{I},\, 1 \le q < |\sigma_i|\}$, and the definition of the labeling function. As explained above, these labels are defined for particular points in time. However, in practice, the labeling function does not need to be defined for every timestamp in $\mathcal{T}$, but may be limited to the timestamps that appear in the event log ($T = \{t \in \mathcal{T} \mid (i, e, t) \in L\}$).
We derive the labels in terms of the approximated MSR as follows. The number of cases in transition at time $t$, for which the last event was $v$, is given by:

$$n(v,t) = \left|\left\{ i \in \mathcal{I} \mid \exists\, 1 \le q \le |\sigma_i| : e^i_q = v \wedge t^i_q < t < t^i_{q+1} \right\}\right|.$$

The total elapsed time $\ell(v,t)$ for cases, for which event $v$ has just been observed, is calculated as:

$$\ell(v,t) = \sum_{i \in \mathcal{I}} \left\{ t - t^i_q \mid \exists\, 1 \le q \le |\sigma_i| : e^i_q = v \wedge t^i_q < t < t^i_{q+1} \right\}.$$

Finally, the time between the two most recent occurrences of event $v$ prior to time $t$ is defined as:

$$\tau(v,t) = t' - t'', \quad \text{with} \quad t' = \max_{\substack{i \in \mathcal{I},\, 1 \le q \le |\sigma_i| \\ e^i_q = v \,\wedge\, t^i_q < t < t^i_{q+1}}} t^i_q, \qquad t'' = \max_{\substack{i \in \mathcal{I},\, 1 \le q \le |\sigma_i| \\ e^i_q = v \,\wedge\, t^i_q < t' < t^i_{q+1}}} t^i_q.$$

Note that the mining procedure for label derivation has a complexity that is linear in the number of events recorded in the event log: the algorithm makes a single pass over the log to compute the labels.
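The label derivation can be read directly off these definitions. The following sketch (our illustration, not the authors' released implementation) computes the approximated MSR $(n(v,t), \ell(v,t), \tau(v,t))$ for every vertex at a query time $t$, in a single pass over the traces built by the earlier helper:

```python
from collections import defaultdict

def msr_labels(traces_by_case, t):
    """Approximated MSR labels (n, ell, tau), in seconds, per vertex at time t."""
    n = defaultdict(int)        # cases in transition out of each vertex
    ell = defaultdict(float)    # accumulated partial transition delays
    occ = defaultdict(list)     # occurrence times t_q with t_q < t < t_{q+1}
    for trace in traces_by_case.values():
        for q, (event, tq) in enumerate(trace):
            if tq >= t:
                break  # traces are time-ordered; later entries lie after t
            t_next = trace[q + 1][1] if q + 1 < len(trace) else None
            if t_next is None or t < t_next:
                # `event` is the most recent event of this case at time t
                n[event] += 1
                ell[event] += (t - tq).total_seconds()
                occ[event].append(tq)
    labels = {}
    for v, times in occ.items():
        times.sort()
        # tau: gap between the two most recent qualifying occurrences of v
        tau = (times[-1] - times[-2]).total_seconds() if len(times) > 1 else 0.0
        labels[v] = (n[v], ell[v], tau)
    return labels
```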
Figure 2: A part of the congestion graph constructed from the event log of Table 1. [Vertices: (1) Registration, (2) Nurse admission, (3) Order blood test, (4) Additional vitals, (5) Lab tests results, (6) Doctor admission.]

We illustrate mining a congestion graph using the event log of Table 1. The general structure of the congestion graph is shown in Figure 2, which maps out all the events and their dependencies as recorded in the event log. Note that, for clarity, the figure presents only the edges that appear in Table 1 rather than showing the fully-connected congestion graph. We further illustrate the MSR of one of the graph's vertices. Consider the fourth event, referring to the additional vitals. The MSR $\omega_t(4)$ of this event is estimated for time 9:00:00 as follows: two patients are in transition (patients 11 and 12), their accumulated delay is 13m38s, and the delay between the respective treatment events is 9m8s. Hence, the MSR for the fourth event at time 9:00:00 is given as $\omega_t(4) = (2, \text{13m38s}, \text{9m8s})$.
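Running the earlier sketch reproduces this example, assuming all rows of Table 1 are loaded into LOG; vertex 4 of Figure 2 corresponds to the "Additional Vitals End" event in the raw log:

```python
labels = msr_labels(traces(LOG), ts("9:00:00"))
n4, ell4, tau4 = labels["Additional Vitals End"]
print(n4, ell4, tau4)  # -> 2, 818.0 (i.e., 13m38s), 548.0 (i.e., 9m8s)
```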
The vertex labels of the congestion graph induce a set of congestion features. For a graph $G = (V, F, \omega)$, the transformation applied to the event log to extract these features at time $t$, denoted by $\phi_G$, is simply:

$$\phi_G : L \times \mathcal{Y} \to L \times \Omega \times \mathcal{Y}, \qquad \phi_G(i, e^i_q, t^i_q, y) = (i, e^i_q, t^i_q, \omega_t(e^i_q), y),$$

with $q$ being the most recent event with respect to $t$.
Evaluation
In this section, we present the main findings of evaluating our congestion mining technique against real-world event logs from two healthcare systems, namely an emergency department and a large outpatient cancer clinic. Our main results are summarized as follows:

- Extracting features from congestion graphs increases the accuracy of time prediction by up to 8% with respect to the best benchmark.
- In terms of relative error (i.e., the ratio between the error and the actually observed time), we achieve improvements of up to 23%.
- Congestion graphs are able to provide insights into causes of delay via feature ranking.
Experimental Setup and Procedure
We first describe the experimental setup in terms of the two real-world datasets and the benchmarks used to assess the applicability of our approach. We then outline the overall experimental procedure and implementation, and define our accuracy evaluation measures.
Datasets and Time Prediction Queries. Our experiments use two real-world event logs:
- ED: The event log from the Electronic Health Records of an Israeli emergency department that serves approximately 100 patients per day. Every patient that enters the emergency department receives a bar-code that is scanned at the start and end of every medical procedure. A subset of the patient treatment events was illustrated in Table 1 and Figure 2. The actual treatment procedures, however, are more complex, as there are 13 different types of treatments. The dataset covers April 2014 to December 2014 and includes approximately 42,000 patient visits.
- CC: An outpatient cancer clinic (the Dana-Farber Cancer Institute in Boston, MA), in which 250 health providers serve 1,000 patients per day. The dataset is based on a track log that comes from a Real-Time Locating System (RTLS). The resulting event log is based on nearly 1,000 RTLS sensors that track patients, physicians, nurses, and equipment with a resolution of 3 seconds, thereby monitoring the system in real time. The recordings contain the sensor (location) description and the floor number where the tracked entity was observed. The dataset contains recordings between April 2014 and December 2014.
Comparing the two datasets, we observe that the average length-of-stay for ED is 300 minutes with a standard deviation of 307 minutes, while for CC patients the stay is approximately 150 minutes with a standard deviation of 120 minutes. ED patients wait for a physician an average of 60 minutes. Furthermore, the emergency department (ED) operates on a 24/7 basis, while the outpatient clinic (CC) opens at 6:00 and closes for new arrivals at 18:00. Both healthcare systems experience high load during morning hours: for ED the load peaks between 10:00 and noon, while for CC, the high-load period spans 9:00 to noon.
For each healthcare system, we chose the query that is most relevant given the specific application context. That is, for ED, we predict the time-to-physician upon a patient's arrival. For CC, our prediction target is the length-of-stay of an arriving patient.
Baseline Techniques. We compare our approach for time prediction based on features extracted from mined congestion graphs against several baseline techniques. First, we consider the long-term average (LongTerm) based on the training set. This technique should perform poorly, as it does not account for varying congestion levels; however, it is often used for time prediction in hospitals across the United States (Dong, Yom-Tov, and Yom-Tov 2015; Ang et al. 2015). Second, a refined version of LongTerm is a rolling-horizon predictor based on the moving average of H periods (e.g., hours) (Ang et al. 2015). We denote it by Rolling(H) and cross-validate the optimal H using the training data. Third, we use an hourly average (HourAvg) to accommodate seasonal effects, deriving time-of-day information from the timestamps assigned to log entries. Fourth, we use the snapshot predictor, which predicts time-to-physician and length-of-stay, respectively, based on the wait time of the most recent patient that finished waiting. This predictor is considered the state of the art in delay prediction for single-station queues (Senderovich et al. 2015).
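For concreteness, the rolling-horizon and snapshot baselines can be sketched as follows; this is a simplified illustration in which Rolling(H) is approximated by the mean of the last H completed observations, whereas the paper averages over H time periods and cross-validates H:

```python
def rolling_predict(history, H):
    """Rolling(H), approximated: mean outcome over the H most recent cases.
    `history` is a list of (completion_time, observed_y), sorted by time."""
    recent = history[-H:]
    return sum(y for _, y in recent) / len(recent)

def snapshot_predict(history):
    """Snapshot: the outcome of the most recent case that finished waiting."""
    return history[-1][1]
```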
Experimental Procedure and Implementation. We follow the training-validation-test paradigm (Friedman, Hastie, and Tibshirani 2001) to evaluate our approach and partition the two datasets into training data and test data. Specifically, for each dataset we make the following four partitions:

- Single month training: We use patients that arrived during April 2014 as training data and patients that were admitted during May 2014 as test data. This reduces the possibility of concept drift, at the expense of reducing the size of the training set.
- Summer months: We use April 2014 to June 2014 for training and test the technique on patients that arrived during July 2014. We leave out winter months as they are known to be heavily loaded (concept drift).
- Entire year: We use April 2014 to October 2014 for training and November 2014 to December 2014 for testing. This increases the variability due to concept drift, yet provides the learning algorithm with much more training data.
- Peak hours: We choose the heavily loaded hours for each of the healthcare systems, as measured by the arrival rates of patients. As in the entire-year scenario, we use April 2014 to October 2014 for training and November 2014 to December 2014 for testing.
In our experiments, we rely on a state-of-the-art supervised learning algorithm, XGBoost (Chen and Guestrin 2016), implemented in Python. It is employed to learn a function $h$ (see Data-Driven Time Predictions) based on the training set, validate its hyperparameters using cross-validation on the validation set (the training data is partitioned 80/20 chronologically for this purpose), and evaluate prediction accuracy on the test set.
All algorithms for congestion graph mining and feature extraction are implemented in Python and are publicly available at http://bit.ly/2lcq37s. Our experiments were conducted on an 8-core server, Intel Xeon CPU E5-2660 v4 @ 2.00GHz, each core equipped with 32GB main memory, running Linux CentOS 7.3.
Evaluation Measures. We measure the accuracy of prediction with three empirical measures. First, the Root Mean Squared Error (RMSE) is based on the squared difference between the actual time and the predicted value. Let $y^*_l$ be the actual value of $y_l$, the time of interest for a log entry of the test set $l \in L_{test}$. With $\hat{y}_l$ being the predicted value, the RMSE is defined as:

$$RMSE = \sqrt{\frac{1}{|L_{test}|} \sum_{l \in L_{test}} \left[\hat{y}_l - y^*_l\right]^2}.$$

RMSE quantifies the error in the time units of the original measurements, in our case, seconds (which are converted to minutes below for convenience).

The RMSE is sensitive to outliers (Friedman, Hastie, and Tibshirani 2001). Therefore, in addition, we consider the absolute error, which is known to be more robust (Friedman, Hastie, and Tibshirani 2001). Specifically, we use the following two measures. The Mean Absolute Error (MAE) is defined as:

$$MAE = \frac{1}{|L_{test}|} \sum_{l \in L_{test}} \left|\hat{y}_l - y^*_l\right|,$$

and quantifies the absolute deviation between the predicted value and the real value. The Mean Absolute Relative Error (MARE), in turn, is defined as:

$$MARE = \frac{1}{|L_{test}|} \sum_{l \in L_{test}} \frac{\left|\hat{y}_l - y^*_l\right|}{y^*_l},$$

and quantifies the ratio between the absolute error and the actual value. The latter provides a relative measure for the absolute error: an error of 10 minutes in a 100-minute length-of-stay is tolerable, while the same error in a 5-minute length-of-stay points toward a significant problem in the method.
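These three measures translate directly into code; a small sketch over NumPy arrays of predictions y_hat and actual values y_star:

```python
import numpy as np

def rmse(y_hat, y_star):
    return np.sqrt(np.mean((y_hat - y_star) ** 2))

def mae(y_hat, y_star):
    return np.mean(np.abs(y_hat - y_star))

def mare(y_hat, y_star):
    # Relative error; assumes y_star > 0 (times of interest are positive).
    return np.mean(np.abs(y_hat - y_star) / y_star)
```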
Results
The main results of our experiments are summarized in Table 2. The rows correspond to all combinations of dataset (ED and CC), training period, and method ('LongTerm', 'Rolling(H)', 'HourAvg', 'Snapshot', and 'CG' for congestion graph). To denote the training and test periods, we use the numeric values of months (e.g., 4 for April). Further, we add the relevant hours for the high-load scenario (e.g., 9-12 corresponds to 9:00-noon). Values marked with an asterisk belong to the dominating method in terms of the three measures. The values of the first two accuracy measures (RMSE and MAE) correspond to the prediction error in minutes. The third accuracy measure, MARE, is the ratio between the absolute error and the actual time that we wish to predict.
Table 2: Prediction accuracy based on the test set (asterisks mark the dominating method per scenario).

DS  Time Period                Method      RMSE  MAE   MARE
ED  Tr=5, Test=6               LongTerm      46   33   0.79
                               Rolling(H)    47   33   0.73
                               HourAvg       47   33   0.72
                               Snapshot      47   34   0.74
                               CG           *45  *32  *0.70
ED  Tr=4,5,6, Test=7           LongTerm      43   33   0.77
                               Rolling(H)    42   31   0.74
                               HourAvg       43   32   0.76
                               Snapshot      43   31   0.74
                               CG           *41  *29  *0.69
ED  Tr=4:10, Test=11,12        LongTerm      99   38   1.48
                               Rolling(H)    98   35   1.37
                               HourAvg      100   38   1.46
                               Snapshot     101   42   1.65
                               CG           *97  *32  *1.27
ED  Tr=4,5,6, Test=7,          LongTerm      39   28   0.67
    High Load (10-12)          Rolling(H)    39   27   0.65
                               HourAvg       38   28   0.64
                               Snapshot      38   27   0.64
                               CG           *36  *25  *0.60
CC  Tr=5, Test=6               LongTerm     118   95   1.35
                               Rolling(H)   115   90   1.28
                               HourAvg      112   89   1.26
                               Snapshot     120   96   1.36
                               CG          *106  *82  *1.17
CC  Tr=4,5,6, Test=7           LongTerm     123   96   1.30
                               Rolling(H)   117   90   1.22
                               HourAvg      115   89   1.20
                               Snapshot     122   94   1.27
                               CG          *108  *81  *1.09
CC  Tr=4:10, Test=11,12        LongTerm     123   97   1.36
                               Rolling(H)   119   92   1.30
                               HourAvg      117   93   1.28
                               Snapshot     123   95   1.33
                               CG          *110  *83  *1.16
CC  Tr=4,5,6, Test=7,          LongTerm     114   93   1.37
    High Load (9-12)           Rolling(H)   113   92   1.36
                               HourAvg      114   93   1.34
                               Snapshot     114   93   1.36
                               CG          *104  *82  *1.20
As shown in Table 2, considering inter-patient dependencies in the data, by means of features extracted from congestion graphs, improves prediction accuracy beyond the baselines ('LongTerm', 'Rolling(H)', 'HourAvg', 'Snapshot'), especially when considering the MARE measure. When considering the time-to-physician in the emergency department (ED), congestion features increase prediction accuracy by up to 6%. As for relative error (the ratio between the error and the actual time), we observe an improvement of 23%. This general trend is mirrored for the second dataset. For the cancer clinic (CC), congestion features improve the accuracy of length-of-stay prediction by up to 8%, while the relative error is improved by up to 14%. The consistent results for both datasets provide evidence that the automatic extraction of congestion features indeed improves the accuracy of time prediction significantly.
Table 3: Importance of congestion features (ED dataset).

Ranking  Feature  Description
1        n(1)     # of patients in Reception
2        ℓ(5)     Elapsed time: Lab Results
3        ℓ(4)     Elapsed time: Additional Vitals
When observing the difference between entire-year prediction and shorter periods, we encounter a noticeable, yet expected, concept drift. Predicting winter months using the beginning of the year is expected to perform worse than short-term predictions, as winter behavior is different due to higher arrival rates into the emergency department and an increased number of cancellations in the outpatient hospital. Specifically, the error grows by a factor of 2 compared to summer-time prediction. Testing the different predictors for their robustness to concept drift, we discover that congestion features deteriorate less than the other prediction methods across the different selections of time periods for training and test.
Insights using Feature Importance
Providing insights into the most important features and root-causes for delays in the system is a crucial step when optimizing systems. We now take the dataset of the emergency department (ED) as an example to show how the features obtained from congestion graphs provide insights into the root-causes of delays.

We evaluate feature importance by ranking features according to their role in the prediction task. Specifically, gradient boosting enables the ranking of features in correspondence to their predictive power (Pan et al. 2009). Table 3 presents the top-3 features given as output by the cross-validated XGBoost method during heavily loaded hours.
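Such a ranking can be obtained directly from a trained booster's importance scores; a minimal sketch, reusing the model fitted in the earlier training sketch and assuming feature names are those produced by the congestion-graph transformation:

```python
# Rank congestion features by gain-based importance of the trained booster.
importance = best.get_booster().get_score(importance_type="gain")
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
for name, gain in ranked[:3]:
    print(name, round(gain, 2))  # e.g., n(1), ell(5), ell(4) for the ED log
```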
The extracted features (over all times in the event log, hence the time index $t$ is omitted) are denoted by $(n(v), \ell(v), \tau(v))$, with $n(v)$ being the number of cases for which $v$ is the most recent event, $\ell(v)$ being the total time since these cases visited $v$, and $\tau(v)$ being the time between the two most recent occurrences of the respective event $v$.
Also, for illustration purposes, Figure 3 shows the full congestion graph, which represents the pathway of a patient in the emergency department. The vertices and outgoing edges that correspond to the highest-ranked congestion features are highlighted. Recall that the congestion graph was created automatically from the event log and that, as noted in the Introduction, such a system view is exactly what is expensive and difficult to obtain through traditional methods. The dominant feature for the emergency department based on the congestion graph is $n(1)$, the number of patients who entered reception. This implies that a greater arrival volume has an impact on time prediction, as it results in delays. The second feature, $\ell(5)$, corresponds to the elapsed time since lab results are ready (i.e., blood work). This feature is highly predictive, as the next step after the lab is typically the visit to the physician, the prediction target. Hence, an important feature is the time in queue for the physician (which is $\ell(5)$). For the same reason, feature $\ell(4)$ turned out to be of high predictive power, as some patients pass immediately from the checking of vitals to the physician.

Figure 3: Main treatment events and flows; events and flows of important features are highlighted.

To summarize, an interpretation of feature importance yields insights on root causes of delays in patient treatment. As such, mined congestion graphs provide a means for analysis and understanding of the process beyond time prediction.
Related Work
The importance of extracting features that account for congestion has been recognized in the literature. A recent work proposes the Q-Lasso method for predicting waiting times in emergency departments (Ang et al. 2015). The authors assume full knowledge of the patient-flow process and use this knowledge to manually define queueing features (e.g., the number of patients waiting for a physician) that are inserted into a Lasso regression model for feature selection. Similarly, Senderovich et al. (2015) proposed a single-station queueing model that is heavily based on process knowledge to generate predictive features. In our work, we do not assume a-priori knowledge of the process and the events that we observe in the event log. Furthermore, compared to Senderovich et al. (2015), our approach handles event logs from multi-stage processes.
Liu et al. (2014; 2016) propose a method for discovering a stochastic workflow model from event data that is emitted by a real-time location system. Applied in healthcare, the developed model considers dependencies between patients that stay in the hospital at the same time. However, in these works, the authors assume known relations between sensor locations and activities. This information is used to enrich the data with additional knowledge, while our method does not require a data enrichment step.
Congestion estimation and prediction have been the subject of numerous works in traffic analysis (Liu, Yue, and Krishnan 2013; Van Lint 2008). Most works in this area aim to learn a generative model of dynamic traffic conditions. In contrast, our work is based on discriminative machine learning and formalizes this idea using congestion graphs.
Automated feature generation has been a popular research topic (see Khurana, Samulowitz, and Turaga (2018) and references within for a review). Specifically, given a pre-defined set of generic feature transformation functions (e.g., sine, square root, logarithm), a wide range of techniques has been applied to elicit optimal transformation sequences, including reinforcement learning (Khurana, Samulowitz, and Turaga 2018), local search (Markovitch and Rosenstein 2002), and deep neural networks (Bengio, Courville, and Vincent 2013). However, these methods are either computationally expensive (e.g., training a deep neural network) and/or lack the capability to discover complex features. In our work, we provide an approach that generates predictive features that come from queueing theory and cannot be easily derived using generic transformation functions. Furthermore, unlike generic features, our congestion-graph-based features can be used for a root-cause analysis of delays. Importantly, our method has a complexity that is linear in the number of events recorded in the event log.
In addition, temporal point processes have been fitted from data to provide accurate time prediction (Lian et al. 2015; Trivedi et al. 2017). These methods learn features from node representations of a temporal graph, which can then be used to predict times. An important distinction between our paper and these two papers is that our congestion graphs are based on Generalized Jackson Networks, a queueing model that does not assume prior knowledge on the distributions of its building blocks (e.g., arrival rates and service times). In contrast, the two papers assume that the underlying model is either a temporal point process (Trivedi et al. 2017) or a Gaussian renewal process (Lian et al. 2015) with parametric or non-parametric structures.
Lastly, our work also relates to the task of activity prediction, an established problem in the data mining field (Minor, Doppa, and Cook 2015). However, our setting is oriented towards cold-start queries, where information about the progress of a specific patient is unavailable. Specifically, Minor, Doppa, and Cook capture inter-entity dependencies via pre-defined features, such as the most frequent event type in a time window. Our method, in contrast, automatically generates these features using congestion-based reasoning rooted in queueing theory.
Conclusion
We presented a novel approach for automated feature extraction for time prediction in congested systems, based on the notion of congestion graphs, dynamic representations of event data that are grounded in queueing theory. Specifically, our notion of congestion graphs is based on a Markovian state representation of queueing systems. Empirical evaluation confirms that the features that come from these congestion graphs improve prediction performance. In addition, we observe that our approach goes beyond accurate time prediction by providing insights into the root-causes of system behavior.
Future work involves extending our methods to support changes in the underlying system. Specifically, our techniques are prone to failure when the mapping of states to predictions is unstable. Therefore, we aim at developing an adaptive online component to compensate for such changes. Furthermore, congestion graphs result in $O(|V|)$ features, with $|V|$ being the number of events in the data, which hampers scalability. Specifically, in large systems with thousands of events, this can lead to feature explosion. Hence, in the future, we shall provide techniques for regularizing congestion graphs, e.g., by considering only edges that have significant predictive power.
Acknowledgements
Research partially funded by the German Research Foundation (DFG) under grant agreement number WE 4891/1-1. We are also grateful to the Stiftung Industrieforschung for supporting this work (grant S0234/10220/2017).
References
Ang, E.; Kwasnick, S.; Bayati, M.; Plambeck, E. L.; and Aratow, M. 2015. Accurate emergency department wait time prediction. Manufacturing & Service Operations Management 18(1):141-156.

Backus, P.; Janakiram, M.; Mowzoon, S.; Runger, C.; and Bhargava, A. 2006. Factory cycle-time prediction with a data-mining approach. IEEE Transactions on Semiconductor Manufacturing 19(2):252-258.

Bengio, Y.; Courville, A. C.; and Vincent, P. 2013. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8):1798-1828.

Bolch, G.; Greiner, S.; de Meer, H.; and Trivedi, K. S. 2006. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications, 2nd Edition. Wiley.

Botea, A.; Nikolova, E.; and Berlingerio, M. 2013. Multi-modal journey planning in the presence of uncertainty. In ICAPS.

Chen, T., and Guestrin, C. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. ACM.

Chen, H., and Yao, D. D. 2013. Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization, volume 46. Springer Science & Business Media.

Dong, J.; Yom-Tov, E.; and Yom-Tov, G. B. 2015. The impact of delay announcements on hospital network coordination and waiting times. Technical report, Working paper.

Dumas, M.; Rosa, M. L.; Mendling, J.; and Reijers, H. A. 2018. Fundamentals of Business Process Management, Second Edition. Springer.

Frederiks, P. J. M., and van der Weide, T. P. 2006. Information modeling: The process and the required competencies of its participants. Data Knowl. Eng. 58(1):4-20.

Friedman, J.; Hastie, T.; and Tibshirani, R. 2001. The Elements of Statistical Learning, volume 1. Springer Series in Statistics. Springer, Berlin.

Gal, A.; Mandelbaum, A.; Schnitzler, F.; Senderovich, A.; and Weidlich, M. 2017. Traveling time prediction in scheduled transportation with journey segments. Inf. Syst. 64:266-280.

Gamarnik, D., and Zeevi, A. 2006. Validity of heavy traffic steady-state approximations in generalized Jackson networks. The Annals of Applied Probability 56-90.

Khurana, U.; Samulowitz, H.; and Turaga, D. S. 2018. Feature engineering for predictive modeling using reinforcement learning. In McIlraith, S. A., and Weinberger, K. Q., eds., Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018. AAAI Press.

Lian, W.; Henao, R.; Rao, V.; Lucas, J. E.; and Carin, L. 2015. A multitask point process predictive model. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, 2030-2038.

Liu, C.; Ge, Y.; Xiong, H.; Xiao, K.; Geng, W.; and Perkins, M. 2014. Proactive workflow modeling by stochastic processes with application to healthcare operation and management. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1593-1602. ACM.

Liu, C.; Xiong, H.; Papadimitriou, S.; Ge, Y.; and Xiao, K. 2016. A proactive workflow model for healthcare operation and management. IEEE Transactions on Knowledge and Data Engineering.

Liu, S.; Yue, Y.; and Krishnan, R. 2013. Adaptive collective routing using Gaussian process dynamic congestion models. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 704-712. ACM.

Markovitch, S., and Rosenstein, D. 2002. Feature generation using general constructor functions. Machine Learning 49(1):59-98.

Minor, B.; Doppa, J. R.; and Cook, D. J. 2015. Data-driven activity prediction: Algorithms, evaluation methodology, and applications. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 805-814. ACM.

Pan, F.; Converse, T.; Ahn, D.; Salvetti, F.; and Donato, G. 2009. Feature selection for ranking using boosted trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2025-2028. ACM.

Rosemann, M. 2006. Potential pitfalls of process modeling: part A. Business Proc. Manag. Journal 12(2):249-254.

Senderovich, A.; Weidlich, M.; Gal, A.; and Mandelbaum, A. 2015. Queue mining for delay prediction in multi-class service processes. Information Systems 53:278-295.

Shalev-Shwartz, S., and Ben-David, S. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.

Trivedi, R.; Dai, H.; Wang, Y.; and Song, L. 2017. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, 3462-3471.

van der Aalst, W. M. P. 2016. Process Mining: Data Science in Action, Second Edition. Springer.

Van Lint, J. 2008. Online learning solutions for freeway travel time prediction. IEEE Transactions on Intelligent Transportation Systems 9(1):38-47.

Wilkie, D.; van den Berg, J. P.; Lin, M. C.; and Manocha, D. 2011. Self-aware traffic route planning. In AAAI, volume 11, 1521-1527.
... 9,10 Some work reports models for emergency wait times (door to physician), but not for ambulance off-load. [11][12][13][14] ...
... Model building and recalibration. Guided by door-toprovider wait-time prediction literature, [11][12][13]17 we used 3 statistical and machine-learning techniques (ie, linear regression, random forests, and elastic net regression) and a rolling average approach (ie, the average calculation of the outcome of previous "k" observations). We included all predictor variables in the model construction and undertook a post hoc variable importance analysis. ...
Article
Study objective To derive and internally and externally validate machine-learning models to predict emergency ambulance patient door–to–off-stretcher wait times that are applicable to a wide variety of emergency departments. Methods Nine emergency departments provided 3 years (2017 to 2019) of retrospective administrative data from Australia. Descriptive and exploratory analyses were undertaken on the datasets. Statistical and machine-learning models were developed to predict wait times at each site and were internally and externally validated. Results There were 421,894 episodes analyzed, and median site off-load times varied from 13 (interquartile range [IQR], 9 to 20) to 29 (IQR, 16 to 48) minutes. The global site prediction model median absolute errors were 11.7 minutes (95% confidence interval [CI], 11.7 to 11.8) using linear regression and 12.8 minutes (95% CI, 12.7 to 12.9) using elastic net. The individual site model prediction median absolute errors varied from the most accurate at 6.3 minutes (95% CI, 6.2 to 6.4) to the least accurate at 16.1 minutes (95% CI, 15.8 to 16.3). The model technique performance was the same for linear regression, random forests, elastic net, and rolling average. The important variables were the last k-patient average waits, triage category, and patient age. The global model performed at the lower end of the accuracy range compared with models for the individual sites but was within tolerable limits. Conclusion Electronic emergency demographic and flow information can be used to estimate emergency ambulance patient off-stretcher times. Models can be built with reasonable accuracy for multiple hospitals using a small number of point-of-care variables.
... In call-center processes, thoroughly studied in [11], queueing theory models can be used for load predictions under assumptions about distributions of unobserved parameters, such as customer patience duration [12], while assuming high load snapshot principle predictors show better accuracy [13]. For time predictions in congested systems, the required features are extracted using congestion graphs [14] mined using queuing theory. ...
... Sect. 2) approaches the problem of analyzing the performance of systems with shared resources primarily either from the control-flow perspective [17,19,20,10,5] or the resource/queuing perspective [11,12,13,14], leading to information loss about the other perspective. In the following, we show how to conceptualize the problem from both perspectives at once using synchronous proclets [6] extended with a few concepts of coloured Petri Nets [7]. ...
Preprint
Full-text available
To identify the causes of performance problems or to predict process behavior, it is essential to have correct and complete event data. This is particularly important for distributed systems with shared resources, e.g., one case can block another case competing for the same machine, leading to inter-case dependencies in performance. However, due to a variety of reasons, real-life systems often record only a subset of all events taking place. For example, to reduce costs, the number of sensors is minimized or parts of the system are not connected. To understand and analyze the behavior of processes with shared resources, we aim to reconstruct bounds for timestamps of events that must have happened but were not recorded. We present a novel approach that decomposes system runs into entity traces of cases and resources that may need to synchronize in the presence of many-to-many relationships. Such relationships occur, for example, in warehouses where packages for N incoming orders are not handled in a single delivery but in M different deliveries. We use linear programming over entity traces to derive the timestamps of unobserved events in an efficient manner. This helps to complete the event logs and facilitates analysis. We focus on material handling systems like baggage handling systems in airports to illustrate our approach. However, the approach can be applied to other settings where recording is incomplete. The ideas have been implemented in ProM and were evaluated using both synthetic and real-life event logs.
... Focusing on process performance analysis, a process may be approximated by a single-station queue to derive predictive features [4]. Beyond single-station models, discovery of congestion graphs enables the construction of dynamic queueing features [35]. In [36], the authors construct inter-case features including a count of the number of cases and inter-event durations for cases that share similar prefixes. ...
... In a queueing system, these features translate to the number of customers in queue and in service, and waiting and service times. Yet, in [4], [35], [36], full life cycle information is assumed, which renders the approaches inapplicable in our setting. ...
... This research led to integrating queueing models and process models [90] and the detection of complex performance patterns when considering all process executions together [17,47]. Integrated treatment of these perspectives allowed to increase accuracy in process prediction [48,89], inferring otherwise unobservable behavior [30], and allows detecting emergent system-level phenomena of cascades of increased workload and processing delays [94]. Further, integrating explicit behavioral descriptions of process executions and actors allows to detect complex task execution patterns describing organizational routines and individual habits that involve multiple actors and process executions and that evolve over time [49]. ...
Preprint
Full-text available
Augmented Business Process Management Systems (ABPMSs) are an emerging class of process-aware information systems that draws upon trustworthy AI technology. An ABPMS enhances the execution of business processes with the aim of making these processes more adaptable, proactive, explainable, and context-sensitive. This manifesto presents a vision for ABPMSs and discusses research challenges that need to be surmounted to realize this vision. To this end, we define the concept of ABPMS, we outline the lifecycle of processes within an ABPMS, we discuss core characteristics of an ABPMS, and we derive a set of challenges to realize systems with these characteristics.
... To illustrate a possible approach that combines process enhancement with expert knowledge for creating a focused set of features to create a perspective, consider queue mining [11], which enhanced a log with a queueing perspective to extract congestions from process data. Feature engineering using a model-driven approach to automatically generate congestion-related features, as illustrated in Figure 1, was proposed by Senderovich et al. [30]. Given an event log, the first step mines congestion graphs, graphical representations of the dynamics observed in a system. ...
Chapter
The discipline of process mining was inaugurated in the BPM community. It flourished in a world of small(er) data, with roots in the communities of software engineering and databases and applications mainly in organizational and management settings. The introduction of big data, with its volume, velocity, variety, and veracity, and the big strides in data science research and practice pose new challenges to this research field. The paper positions process mining along modern data life cycle, highlighting the challenges and suggesting directions in which data science disciplines (e.g., machine learning) may interact with a renewed process mining agenda.
Article
Quantitative business process analysis is a powerful approach for analysing timing properties of a business process, such as the expected waiting time of customers or the utilization rate of resources. Multiple techniques are available for quantitative business process analysis, which all have their own advantages and disadvantages. This paper presents a novel technique, based on queueing models, that combines the advantages of existing techniques, in that it leads to accurate analyses, is computationally inexpensive, and feature complete with respect to its support for basic process modelling constructs. An extensive quantitative evaluation has been performed that compares the presented queueing model to existing queueing models from literature. This evaluation shows that the presented model outperforms existing models with one order of magnitude on accuracy. The resulting queueing model can be used for fast and accurate timing predictions of business process models. These properties are useful in optimization scenarios.
Article
Objective Patients, families and community members would like emergency department wait time visibility. This would improve patient journeys through emergency medicine. The study objective was to derive, internally and externally validate machine learning models to predict emergency patient wait times that are applicable to a wide variety of emergency departments. Methods Twelve emergency departments provided 3 years of retrospective administrative data from Australia (2017–2019). Descriptive and exploratory analyses were undertaken on the datasets. Statistical and machine learning models were developed to predict wait times at each site and were internally and externally validated. Model performance was tested on COVID-19 period data (January to June 2020). Results There were 1 930 609 patient episodes analysed and median site wait times varied from 24 to 54 min. Individual site model prediction median absolute errors varied from±22.6 min (95% CI 22.4 to 22.9) to ±44.0 min (95% CI 43.4 to 44.4). Global model prediction median absolute errors varied from ±33.9 min (95% CI 33.4 to 34.0) to ±43.8 min (95% CI 43.7 to 43.9). Random forest and linear regression models performed the best, rolling average models underestimated wait times. Important variables were triage category, last-k patient average wait time and arrival time. Wait time prediction models are not transferable across hospitals. Models performed well during the COVID-19 lockdown period. Conclusions Electronic emergency demographic and flow information can be used to approximate emergency patient wait times. A general model is less accurate if applied without site-specific factors.
Article
Full-text available
Knowledge Graphs are important tools to model multi-relational data that serves as information pool for various applications. Traditionally, these graphs are considered to be static in nature. However, recent availability of large scale event-based interaction data has given rise to dynamically evolving knowledge graphs that contain temporal information for each edge. Reasoning over time in such graphs is not yet well understood. In this paper, we present a novel deep evolutionary knowledge network architecture to learn entity embeddings that can dynamically and non-linearly evolve over time. We further propose a multivariate point process framework to model the occurrence of a fact (edge) in continuous time. To facilitate temporal reasoning, the learned embeddings are used to compute relationship score that further parametrizes intensity function of the point process. We demonstrate improved performance over various existing relational learning models on two large scale real-world datasets. Further, our method effectively predicts occurrence or recurrence time of a fact which is novel compared to any prior reasoning approaches in multi-relational setting.
Conference Paper
Point process data are commonly observed in fields like healthcare and the social sciences. Designing predictive models for such event streams is an under-explored problem, because training data are often scarce. In this work we propose a multitask point process model, leveraging information from all tasks via a hierarchical Gaussian process (GP). Nonparametric learning functions implemented by a GP, which map from past events to future rates, allow analysis of flexible arrival patterns. To facilitate efficient inference, we propose a sparse construction for this hierarchical model, and derive a variational Bayes method for learning and inference. Experimental results are shown on both synthetic data and real electronic health-record data.
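The paper's hierarchical multitask construction with variational inference is beyond a short example, but the following single-task simplification illustrates the underlying idea of a GP mapping past events to a future rate; the synthetic data and kernel choices are ours:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic hourly event stream with a daily seasonal pattern.
rng = np.random.default_rng(2)
hours = np.arange(200)
counts = rng.poisson(5 + 3 * np.sin(hours / 24 * 2 * np.pi))

# Features: counts in the previous three hours; target: the next hour.
X = np.column_stack([counts[i:i + 197] for i in range(3)])
y = counts[3:]
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0), alpha=1.0)
gp.fit(X, y)
rate, std = gp.predict(X[-1:], return_std=True)
print(f"predicted next-hour rate: {rate[0]:.1f} +/- {std[0]:.1f}")
```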
Article
We investigate the impact of delay announcements on the coordination within hospital networks using a combination of empirical observations and numerical experiments. We offer empirical evidence that suggests that patients take delay information into account when choosing emergency service providers and that such information can help increase coordination in the network, leading to improvements in the performance of the network, as measured by emergency department wait times. Our numerical results indicate that the level of coordination that can be achieved is limited by the patients’ sensitivity to waiting, the load of the system, the heterogeneity among hospitals, and, importantly, the method hospitals use to estimate delays. We show that delay estimators that are based on historical averages may cause oscillation in the system and lead to higher average wait times when patients are sensitive to delay. We provide empirical evidence that suggests that such oscillations occur in hospital networks in the United States.
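To make the oscillation mechanism concrete, here is a deliberately stylized two-hospital simulation (all parameters are our own assumptions, not the paper's model): delay-sensitive arrivals route to the hospital announcing the shorter wait, while announcements lag behind the true backlogs because they are rolling historical averages:

```python
import numpy as np

queue = np.array([20.0, 20.0])      # current backlog (minutes of work)
announced = np.array([20.0, 20.0])  # rolling-average announcements
history = []
for t in range(60):
    # Most arriving work joins the hospital with the lower announced delay.
    arrivals = np.where(announced.argmin() == np.arange(2), 18.0, 6.0)
    queue = np.maximum(queue + arrivals - 12.0, 0.0)  # each serves 12/step
    announced = 0.8 * announced + 0.2 * queue         # lagged average
    history.append(queue.copy())
# The lag makes the two queues alternate being congested.
print(np.array(history)[-10:])
```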
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
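A minimal usage sketch of the system's scikit-learn-style Python interface, on a toy regression task (not code from the paper), might look as follows:

```python
import numpy as np
from xgboost import XGBRegressor  # pip install xgboost

# Small synthetic regression problem.
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 8))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

model = XGBRegressor(
    n_estimators=200,   # number of boosting rounds
    max_depth=4,        # depth of each tree
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
)
model.fit(X[:400], y[:400])
rmse = np.sqrt(np.mean((model.predict(X[400:]) - y[400:]) ** 2))
print("held-out RMSE:", round(float(rmse), 3))
```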
Article
Advances in real-time location systems have enabled us to collect massive amounts of fine-grained semantically rich location traces, which provide unparalleled opportunities for understanding human activities and generating useful knowledge. This, in turn, delivers intelligence for real-time decision making in various fields, such as workflow management. Indeed, it is a new paradigm to model workflows through knowledge discovery in location traces. To that end, in this paper, we provide a focused study of workflow modeling by integrated analysis of indoor location traces in the hospital environment. In particular, we develop a workflow modeling framework that automatically constructs the workflow states and estimates the parameters describing the workflow transition patterns. More specifically, we propose effective and efficient regularizations for modeling the indoor location traces as stochastic processes. First, to improve the interpretability of the workflow states, we use the geography relationship between the indoor rooms to define a prior of the workflow state distribution. This prior encourages each workflow state to be a contiguous region in the building. Second, to further improve the modeling performance, we show how to use the correlation between related types of medical devices to reinforce the parameter estimation for multiple workflow models. In comparison with our preliminary work [11], we not only develop an integrated workflow modeling framework applicable to general indoor environments, but also improve the modeling accuracy significantly. We reduce the average log-loss by up to 11%.
Article
Urban mobility impacts urban life to a great extent. To enhance urban mobility, much research has been invested in travel time prediction: given an origin and a destination, provide a passenger with an accurate estimate of how long the journey will last. In this work, we investigate a novel combination of methods from Queueing Theory and Machine Learning in the prediction process. We propose a prediction engine that, given a scheduled bus journey (route) and a 'source/destination' pair, provides an estimate of the travel time, while considering both historical data and real-time streams of information transmitted by buses. We propose a model that uses a natural segmentation of the data according to bus stops and a set of predictors, some learning-based and others learning-free, to compute travel time. Our empirical evaluation, using data from the bus network in the city of Dublin, demonstrates that the snapshot principle, taken from Queueing Theory, works well yet suffers from outliers. To overcome the outlier problem, we use Machine Learning techniques as a regulator that assists in identifying outliers, and propose predictions based on historical data.
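A sketch of how a snapshot-principle predictor could work, with hypothetical stop identifiers and a historical-average fallback of our own choosing: the prediction for each stop-to-stop segment is the traversal time most recently reported on that segment by any bus.

```python
from collections import defaultdict

latest_segment_time = defaultdict(lambda: None)  # most recent report per segment

def observe(segment, seconds):
    """Record a real-time traversal report for a stop-to-stop segment."""
    latest_segment_time[segment] = seconds

def predict(stops, historical_avg):
    """Sum the latest snapshots along the route; fall back to a historical
    average when no recent report exists for a segment."""
    total = 0.0
    for seg in zip(stops, stops[1:]):
        snap = latest_segment_time[seg]
        total += snap if snap is not None else historical_avg[seg]
    return total

observe(("A", "B"), 120)
observe(("B", "C"), 300)  # congestion currently observed on this segment
hist = {("A", "B"): 100.0, ("B", "C"): 180.0, ("C", "D"): 150.0}
print(predict(["A", "B", "C", "D"], hist))  # 120 + 300 + 150 = 570 seconds
```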
Article
This paper proposes the Q-Lasso method for wait time prediction, which combines statistical learning with fluid model estimators. In historical data from four remarkably different hospitals, Q-Lasso predicts the emergency department (ED) wait time for low-acuity patients with greater accuracy than rolling average methods (currently used by hospitals), fluid model estimators (from the service operations management literature), and quantile regression methods (from the emergency medicine literature). Q-Lasso achieves greater accuracy largely by correcting errors of underestimation in which a patient waits for longer than predicted. Implemented on the external website and in the triage room of the San Mateo Medical Center (SMMC), Q-Lasso achieves over 30% lower mean squared prediction error than would occur with the best rolling average method. The paper describes challenges and insights from the implementation at SMMC.
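The following sketch illustrates the Q-Lasso idea under stated assumptions: a Lasso regression over candidate features, one of which is a fluid-model wait estimate (queue length divided by service capacity). The feature set and synthetic data are ours, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic triage-time observations with hypothetical congestion features.
rng = np.random.default_rng(5)
n = 2000
queue_len = rng.integers(0, 40, n)
staffed_servers = rng.integers(2, 8, n)
hour = rng.integers(0, 24, n)
# Fluid-model estimate: backlog divided by capacity, at ~15 min/patient.
fluid_estimate = queue_len / staffed_servers * 15.0
X = np.column_stack([fluid_estimate, queue_len, staffed_servers, hour])
wait = fluid_estimate + rng.normal(0, 10, n).clip(-fluid_estimate)

# Lasso shrinks uninformative features toward zero, typically keeping the
# fluid estimator as a dominant predictor in this toy setup.
model = Lasso(alpha=0.5).fit(X, wait)
print("coefficients:", np.round(model.coef_, 3))
```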
Article
Information systems have been widely adopted to support service processes in various domains, e.g., in the telecommunication, finance, and health sectors. Information recorded by such systems during the operation of these processes provides an angle for operational process analysis, commonly referred to as process mining. In this work, we establish a queueing perspective in process mining to address the online delay prediction problem, which refers to the time by which the execution of an activity for a running instance of a service process is delayed due to queueing effects. We present predictors that treat queues as first-class citizens and either enhance existing regression-based techniques for process mining or are directly grounded in queueing theory. In particular, our predictors target multi-class service processes, in which requests are classified by a type that influences their processing. Further, we introduce queue mining techniques that derive the predictors from event logs recorded by an information system during process execution. Our evaluation, based on large real-world datasets from the telecommunications and financial sectors, shows that our techniques yield accurate online predictions of case delay and drastically improve over predictors neglecting the queueing perspective.
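One concrete queueing-grounded predictor in this spirit is the last-to-enter-service (LES) heuristic, sketched below with hypothetical event-log fields: a new arrival's delay is predicted by the delay of the most recent same-class customer to have started service.

```python
# Toy event log; field names are hypothetical stand-ins for what a queue
# mining technique would extract from a real log.
events = [
    {"case": 1, "class": "gold", "arrived": 0.0, "started": 4.0},
    {"case": 2, "class": "silver", "arrived": 1.0, "started": 7.0},
    {"case": 3, "class": "gold", "arrived": 3.0, "started": 9.0},
]

def les_prediction(customer_class, now):
    """Delay of the last same-class customer to have entered service."""
    served = [e for e in events
              if e["class"] == customer_class and e["started"] <= now]
    if not served:
        return None  # no history yet for this class
    last = max(served, key=lambda e: e["started"])
    return last["started"] - last["arrived"]

print(les_prediction("gold", now=10.0))  # 9.0 - 3.0 = 6.0
```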
Article
Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.