Towards Safer Heuristics With XPlain

Pantea Karimi¹*, Solal Pirelli²*, Siva Kesava Reddy Kakarla³, Ryan Beckett³,
Santiago Segarra⁴, Beibin Li³, Pooria Namyar⁵, Behnaz Arzani³
¹MIT; ²EPFL, Sonar; ³Microsoft Research; ⁴Rice University; ⁵University of Southern California
Abstract
Many problems that cloud operators solve are computationally expensive, and operators often use heuristic algorithms (that are faster and scale better than optimal) to solve them more efficiently. Heuristic analyzers enable operators to find when and by how much their heuristics underperform. However, these tools do not provide enough detail for operators to mitigate the heuristic’s impact in practice: they only discover a single input instance that causes the heuristic to underperform (and not the full set) and they do not explain why.
We propose XPlain, a tool that extends these analyzers and helps operators understand when and why their heuristics underperform. We present promising initial results that show such an extension is viable.
CCS Concepts
Networks → Network performance analysis; Network performance modeling; Network management; Network reliability.
Keywords
Heuristic Analysis, Explainable Analysis, Domain-Specific
Language
*Equal contribution: Work was done partly as an intern at Microsoft Research.
HOTNETS ’24, November 18–19, 2024, Irvine, CA, USA
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-1272-2/24/11
https://doi.org/10.1145/3696348.3696884
1 Introduction
Operators use heuristics (approximate algorithms that are faster or scale better than their optimal counterparts) in production systems to solve computationally difficult or expensive problems. These heuristics perform well across many typical instances, but they can break in unexpected ways when network conditions change [5, 6, 16, 35]. Our community has developed tools that enable operators to identify such situations [1, 2, 6, 16, 35]. These tools find the “performance gap” of one heuristic algorithm compared to another heuristic or the optimal: they identify an example instance of an input which causes a given heuristic to underperform.
For example, MetaOpt [35] describes a heuristic deployed in Microsoft’s wide area traffic engineering solution and shows it could underperform by 30% (see §2). This means the company would either have to overprovision their networks to support 30% more traffic, drop that traffic, or delay it.
The potential benefit of heuristic analyzers is clear: they
allow operators to quantify the risk of heuristics they want
to deploy. Although these heuristic analyzers have already
shed light on the performance gap of many deployed heuris-
tics, they are still in their nascent stage and have limited use
for operators who do not have sufficient expertise in formal
methods and/or optimization theory. There are crucial features missing: operators have to (1) model the heuristics they want to analyze in terms of mathematical constructs these tools can support and (2) manually analyze the outputs from these tools to understand how to fix their heuristics or their scenarios, because the tools only provide a performance gap and an example input that caused it. They do not produce the full space of inputs that can cause large gaps nor describe why the heuristic underperformed in these instances.
The latter problem limits the operator’s ability to use the output of these tools to fix the problem and to either improve the heuristic, create an alternative solution for when it underperforms, or cache the optimal solution for those instances. In our earlier example, the operator has to look at the tool’s example demand matrix to understand why the heuristic routes 30% less traffic than the optimal.
The state of these heuristic analyzers today is reminiscent of the early days of our community’s exploration of network verifiers and their potential to help network operators configure and manage their networks. In the same way that network verifiers enabled operators to identify bugs in their configurations [10, 14, 15, 19, 22, 24, 27, 28, 30, 32, 39, 47, 49], a heuristic analyzer can help them find the performance gap of the algorithms they deploy. Tools that allow operators to leverage heuristic analyzers more easily, identify why the heuristics underperform, and devise solutions to remediate the issue serve a similar purpose to the tools our community crafted that explained the impact of configuration bugs [23, 25, 39, 40] (by producing all sets of packets that the bug impacted and the configuration lines that caused the impact).
arXiv:2410.15086v1 [cs.AI] 19 Oct 2024
We propose XPlain, our vision for a “generalizer” that can augment existing heuristic analyzers and help operators either improve their heuristics (by helping them find why the heuristics underperform) or use them more safely (by finding all regions where they underperform).
We propose a domain-specific language (§5.1), which allows us to concretely describe the heuristic’s behavior and that of a benchmark we want to compare it to for automated analysis. It is rooted in network flow abstraction, which allows us to model the behavior of many heuristics that operators use in today’s networks, including all those from [16, 35]. Our compiler converts inputs in this language into an existing heuristic analyzer. Our efficient iterative algorithm uses the analyzer, extrapolates from the adversarial inputs it finds, and finds all adversarial subspaces where the heuristic underperforms. We then use our language again and visualize why (i.e., the different decisions the heuristic made compared to the optimal that caused it to underperform) the heuristic underperforms in these cases.
We also discuss open questions and a possible approach
built on the solutions we propose in this work to uncover
what properties in the input or the problem instance cause the
heuristic to underperform (§5.4).
Our proof-of-concept implementation of this idea uses MetaOpt [35] as the underlying heuristic analyzer because it is open source. But our proposal applies to other heuristic analyzers such as [1, 2, 16] as well.
2 What is heuristic analysis?
Heuristic analyzers [1, 16, 35] take a heuristic model and a benchmark model (e.g., the optimal) as input. Their goal is to characterize the performance gap of the heuristic compared to the benchmark. Recent tools [16, 35] use optimization theory or first-order logic to solve this problem and return a single input instance that causes the heuristic to underperform.
Example heuristics from these works include:
Demand Pinning (DP) was deployed in Microsoft’s wide area
network. DP is a heuristic for the traffic engineering problem.
The optimal algorithm assigns traffic (demands) to paths and
maximizes the total flow it routes through the network without
exceeding the network capacity. Operators use DP to reduce
the size of the optimization problem they solve. DP first filters
all demands below a pre-defined threshold and routes them
through (pins them to) their shortest path. It then routes the
remaining demands optimally using the available capacity
(see Fig. 1).
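The two steps of DP described above (pin small demands, then route the rest) can be sketched on the example of Fig. 1a. This is a minimal illustration, not MetaOpt's encoding: the assignment of the capacity labels to links is our reading of the figure, only the three demands listed in the figure's table are included, and the "optimal" step is a brute-force search over flows in multiples of a hypothetical `step` rather than a real solver.

```python
from itertools import product

# Link capacities as read from Fig. 1a (assumption of this sketch).
CAPACITY = {(1, 2): 100, (2, 3): 100, (1, 4): 50, (4, 5): 50, (5, 3): 50}
# The three demands listed in the table of Fig. 1a.
DEMANDS = {(1, 3): 50, (1, 2): 100, (2, 3): 100}
# Candidate paths per demand; the first entry is the shortest path.
PATHS = {(1, 3): [(1, 2, 3), (1, 4, 5, 3)], (1, 2): [(1, 2)], (2, 3): [(2, 3)]}


def links(path):
    """Return the links a path traverses."""
    return list(zip(path, path[1:]))


def max_flow(demands, capacity, step=50):
    """Brute-force the largest total flow over per-path flows on a grid."""
    variables = [(k, p) for k in demands for p in PATHS[k]]
    levels = [range(0, demands[k] + 1, step) for k, _ in variables]
    best = 0
    for flows in product(*levels):
        routed = {k: 0 for k in demands}
        load = {e: 0 for e in capacity}
        for (k, p), f in zip(variables, flows):
            routed[k] += f
            for e in links(p):
                load[e] += f
        feasible = all(routed[k] <= demands[k] for k in demands) and all(
            load[e] <= capacity[e] for e in capacity
        )
        if feasible:
            best = max(best, sum(flows))
    return best


def demand_pinning(demands, capacity, threshold, step=50):
    """Pin demands at or below the threshold, then route the rest."""
    residual = dict(capacity)
    pinned, rest = 0, {}
    for k, d in demands.items():
        if d <= threshold:
            for e in links(PATHS[k][0]):  # shortest path only
                residual[e] -= d
            pinned += d
        else:
            rest[k] = d
    return pinned + max_flow(rest, residual, step)


opt_total = max_flow(DEMANDS, CAPACITY)
dp_total = demand_pinning(DEMANDS, CAPACITY, threshold=50)
print(dp_total, opt_total)
```

Under these assumptions the sketch reproduces the totals shown in Fig. 1a (150 for DP and 250 for the optimal).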
MetaOpt authors modeled DP directly as an optimization
problem. They also provided a number of helper functions that
allow operators to model it more easily (Fig. 1b). MetaOpt
solves a bi-level optimization that produces the performance
gap and demand that causes it (the flow in Fig. 1a). It is easy to
see what is missing: it is up to the operator to examine the single output and find why DP underperformed. DP is amenable to such manual analysis (see [35]), but not all heuristics are.
It is also hard for operators to extrapolate from this example
adversarial input and find all other regions of the input space
where DP may underperform. These limitations are exacer-
bated as we move to larger problems with more demands,
where it is harder to pinpoint how a heuristic’s decision to
route a particular demand interferes with its ability to route
others.
Vector bin packing (VBP) places multi-dimensional balls
into multi-dimensional bins and minimizes the number of
bins in use. Operators use VBP in many production systems,
such as to place VMs onto servers [9].
The VBP problem is APX-hard [45]. One heuristic that solves VBP is first-fit (FF), which greedily places an incoming ball in the first bin it fits in. Fig. 1c shows how we can encode it in MetaOpt.
MetaOpt produces the adversarial ball sizes 1%, 49%, 51%, 51% (as a percentage of the bin size) for an example with 4 balls and 3 equal-sized bins (we use single-dimensional balls): the optimal uses 2 bins while FF uses 3 (we show a more complex version in Fig. 2). Once again, operators have to reason through this example to identify why FF underperforms and what other inputs cause the same problem. This is harder in FF and other VBP heuristics, such as best fit or first fit decreasing, as evidenced by the years of research by theoreticians in this space [36].
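The adversarial instance above is small enough to check by hand or with a few lines of code. The following sketch, with hypothetical function names and integer percentages to avoid floating-point rounding, compares first fit against a brute-force optimal for single-dimensional balls; the brute force is only practical for tiny instances.

```python
from itertools import product


def first_fit(balls, capacity, num_bins):
    """Place each ball in the first bin with enough room; return bins used."""
    loads = [0] * num_bins
    for size in balls:
        for j in range(num_bins):
            if loads[j] + size <= capacity:
                loads[j] += size
                break
        else:
            raise ValueError("ball does not fit in any bin")
    return sum(1 for load in loads if load > 0)


def optimal_bins(balls, capacity, num_bins):
    """Try every ball-to-bin assignment; return the fewest bins used."""
    best = None
    for assignment in product(range(num_bins), repeat=len(balls)):
        loads = [0] * num_bins
        for size, j in zip(balls, assignment):
            loads[j] += size
        if max(loads) <= capacity:
            used = len(set(assignment))
            best = used if best is None else min(best, used)
    return best


balls = [1, 49, 51, 51]  # sizes as a percentage of the bin size
ff_bins = first_fit(balls, capacity=100, num_bins=3)
opt_bins = optimal_bins(balls, capacity=100, num_bins=3)
print(ff_bins, opt_bins)
```

FF fills the first bin with the 1% and 49% balls, so neither 51% ball fits there and each needs its own bin; the optimal pairs 49% with 51% and 1% with 51%.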
In this paper, we use DP and VBP as running examples. These examples are representative of the heuristics prior work has studied [16, 35] (the scheduling examples Virley studies are conceptually similar to VBP, and we think our discussions directly translate to those use-cases).
Prior work [5] shows that, using a single adversarial instance, it is difficult to understand why a heuristic underperformed. It is even harder to generalize from why an adversarial input causes the heuristic to underperform on a single problem instance (or a few instances) to what properties in the input and the problem instance cause it to underperform.
3 The case for comprehensive analysis
Prior work [2, 5, 35] shows that explaining adversarial inputs can have benefits: we can improve DP’s performance gap by an order of magnitude and produce congestion control algorithms that meet pre-specified requirements [2]. But these results require manual analysis [35] or problem-specific models [2, 5].
(a) DP from [35]. (left) Topology: five nodes (1 to 5) with link capacity labels 100, 100, 50, 50, and 50. (right) A set of demands and their flow allocations using the DP heuristic and the optimal (OPT) solution.

Demand              DP (thresh = 50)     OPT
src-dest   value    path      value      path       value
1→3        50       1-2-3     50         1-4-5-3    50
1→2        100      1-2       50         1-2        100
2→3        100      2-3       50         2-3        100
Total DP: 150       Total OPT: 250

(b) DP in MetaOpt.
  OuterVar: $d_k$, requested rate of demand $k$
  Input: $P_k$, paths for demand $k$
  Input: $\hat{p}_k$, shortest path
  Input: $T_d$, demand pinning threshold
  for all demand $k \in D$ do
    ForceToZeroIfLeq($d_k - f_k^{\hat{p}_k}$, $d_k$, $T_d$)
  end for
  MaxFlow()

(c) Heuristic for VBP in MetaOpt.
  OuterVar: $Y$ (size of balls)
  Input: $C$ (capacity of bins)
  for all ball $i$ and bin $j$ do
    $r_{ij} = C_j - Y_i - \sum_{\text{ball } u < i} x^d_{uj}$
    $f_{ij}$ = AllLeq($[-r^d_{ij}]_d$, 0)
    $\gamma_{ij}$ = AllEq($[x^d_{ik}]_{d,\,k<j}$, 0)
    $\alpha_{ij}$ = AND($f_{ij}$, $\gamma_{ij}$)
    IfThenElse($\alpha_{ij}$, [($x_{ij}$, $Y_i$)], [($x_{ij}$, 0)])
  end for

Figure 1: Example heuristics and their encoding in MetaOpt (sub-figures (b) and (c)). The heuristic in sub-figure (b) forces the demands less than a threshold to be pinned and then solves a flow maximization problem; the heuristic in sub-figure (c) assigns the first bin that can fit the ball.

[Figure 2 image: balls packed into unit-size bins; ball sizes shown: 0.3, 0.8, 0.2, 0.4, 0.7, 0.7, 0.15, 0.85, 0.25, 0.25, 0.3, 0.75, 0.75, 0.6, 0.12, 0.4, 0.4.]
Figure 2: Example adversarial instance for FF with equal-sized bins with size of 1; the optimal uses 8 bins and the heuristic 9.

We see an opportunity for a new tool that enables operators to identify the full risk surface of the heuristic (the set of inputs where the heuristic underperforms) and to identify why the heuristic underperforms automatically. It can produce (1) a description of the entire area(s) where a heuristic has a high performance gap; or (2) a description of what choices the heuristic makes that cause it to underperform (the difference in the actions of the heuristic and the optimal can point us to why the heuristic underperforms). Through these outputs, such tools can make it safer for operators to use heuristics in practice, as operators can mitigate the cases where the heuristics underperform and maybe even design safer heuristics.
There are three levels of information we can provide: (1)
for a given problem instance, the sets of inputs that cause the
heuristic to underperform; (2) for a given problem instance,
a reason as to why the heuristic underperforms in each con-
tiguous region of the adversarial input space; and (3) for the
general case, the characteristics of the inputs and problem
instances that cause the heuristic to underperform.
Take DP as an example. The ideal tool would produce:
Type 1. For a given topology, the adversarial input sets are of the form $\bigcup_i D_i$, where each $D_i \subset \mathbb{R}^n_+$ represents a contiguous subspace of the n-dimensional (8-dimensional in Fig. 1a for 8 demands) space.
For a given $D_i$: (a) an entry $d_{ij} = T - \epsilon$ (here $T$ is the demand pinning threshold and $\epsilon$ is a small positive value) if there are multiple paths between the nodes $i$ and $j$ (we call a demand $d : d \le T$ a pinnable demand); (b) for all other $uv$ where a portion of the path between the nodes $u$ and $v$ intersects with the shortest path of a pinnable demand we have $d_{uv} \ge \min(\mathcal{C}_{uv} - T)$. Here, the set $\mathcal{C}_{uv}$ contains the capacity of all links on the path between $u$ and $v$. The adversarial instance in our example in Fig. 1a fits this behavior.
Type 2. For a given topology, DP routes pinned demands on
their shortest paths, but the optimal routes them through alter-
nate paths. We expect the pinned demands in each contiguous
subspace would all have a common pattern where they have
the same shortest path, and DP does shortest-path routing for
these demands, whereas the optimal does not.
Type 3. The heuristic’s performance is worse when the length of the shortest path of the pinned demands is longer or the capacity of the links along these paths is lower: pinned demands limit the heuristic’s ability to route other demands.
4 Challenges
It is hard to arrive at low-level models of a heuristic in order to use existing analyzers [2, 16, 35], and operators need to have expertise in either formal methods [2, 16] or optimization theory [12, 35] to do so. We see an analogy with writing imperative programs in assembly code: we can write any program in assembly but it takes time, has a high risk of being buggy, and makes code reviews (i.e., explanations) difficult.
Low-level models operate over variables and constructs that are often hard to connect to the original problem (“Greek letters” and “auxiliary variables” instead of “human-readable” text). To model the first fit behavior, MetaOpt uses an auxiliary, binary variable $\alpha_{ij}$ that captures whether bin $j$ is the first bin where ball $i$ fits in, and sets its value through:
$$\alpha_{ij} \le \frac{f_{ij} + \sum_{\{k \in \text{BINS} \mid k < j\}} (1 - f_{ik})}{j} \quad \forall i \in \text{BALLS},\ \forall j \in \text{BINS},$$
$$\sum_{j \in \text{BINS}} \alpha_{ij} = 1 \quad \forall i \in \text{BALLS}.$$
It is hard to derive an explanation from such a model and
harder still to connect it to how the heuristic works to explain
its behavior. We need a better and more descriptive language
to encode the behavior of the heuristic. We also need to:
Find adversarial subspaces and validate them. These are subspaces of the input space where the inputs that fall in those subspaces cause the heuristic to underperform. To find them, we need a search algorithm that iterates and extrapolates from the adversarial inputs existing analyzers find (similar to the all-SAT problem [17, 34, 48], the input space is large, and we cannot blindly search it to find adversarial inputs [35]). Once we find a potential “adversarial subspace,” we should validate it: we need to check, with statistical significance, whether the heuristic’s performance gap is higher for inputs that belong to the adversarial subspace than for those that do not.
Find why the inputs in each subspace cause bad perfor-
mance. It is reasonable to assume the inputs in the same
contiguous adversarial subspace trigger the same “bad behav-
ior” in the heuristic. To find and explain these behaviors, we
need to automatically reason through the heuristic’s actions
and compare them to those of the benchmark: we need to
concretely encode the heuristic and benchmark’s choices as
part of the language we design for our solution. The challenge
is to ensure this language applies to a broad range of problems
and is amenable to the types of automation we desire.
Generalize beyond a single instance. Perhaps the hardest
challenge is to generalize from the instance-based explana-
tions to one that applies to the heuristic’s behavior in the
general case: we have to find a valid extrapolation from these
instance-based examples and discover patterns that apply to
the heuristic’s behavior across different problem instances.
5 The XPlain proposal
We propose XPlain (Fig. 3). Users describe the heuristic and benchmark through its domain-specific language (§5.1). The main purpose of this domain-specific language (DSL) is to concretely define the behavior of the heuristic and benchmark, which allows automated systems to analyze, compare, and explain their behavior. The compiler translates the DSL into low-level optimization constructs.
The adversarial subspace generator (§5.2) generates a set of contiguous subspaces where the inputs in each subspace cause the heuristic to underperform, and the significance checker filters the outputs and ensures the subspaces are statistically significant: it checks, with statistical significance, that the inputs that fall into these subspaces produce higher gaps than those that do not.
The explainer (§5.3) describes how the heuristic’s actions differ from the benchmark in each contiguous subspace for a given problem instance. The generalizer (§5.4) extrapolates from these instance-based observations to produce the properties of the inputs and the instance that cause the heuristic to underperform. It uses instance-based explanations across many instances to do so: we use the instance generator to create such instances.
5.1 The domain-specific language
To auto-generate the information we described in §3, we need a DSL to concretely encode the heuristic and benchmark algorithms. We need a DSL that: (1) can represent diverse heuristics; (2) we can automatically compile into optimizations that we can efficiently solve (those that existing solvers support and that do not introduce too many additional constraints and variables compared to hand-written models); and (3) is easy and intuitive to use.
We design an abstraction based on network flow problems [11]. Network flow problems are optimizations that, given a set of sources and destinations, optimize how to route traffic to respect capacity constraints, maximize link utilization, etc. Network flow problems impose two key constraints: the total flow on each link should be below the link capacity, and what comes into a node should go out (flow conservation). There are advantages to using network flow problems: they have an intuitive graph representation [11] (operators know how to reason about the flow of traffic through such graphs); we can easily translate them into convex optimization or feasibility problems [11]; and they have many variants which we can use and build upon.
We can use the network flow model and extend it through
a set of new “node behaviors” to ensure we can apply it to
a broad class of heuristics. Node behaviors are a set of con-
straints that operate on the flows coming in and going out
of each node: “split nodes” (enforce flow conservation con-
straints); “pick nodes” (enforce flow conservation constraints
but only allow flow on a single outgoing edge); “copy nodes”
(copy the flow that comes in onto all of their outgoing edges);
“source” and “sink” nodes (produce or consume traffic); etc.
A node can enforce multiple behaviors simultaneously. We
include node behaviors that do not enforce flow conservation
constraints (such as the “copy nodes”) or capacity constraints
by default so that we can model a broad set of heuristics.
Users can also add metadata to each node or edge, which we
can use later to improve the explanations we produce.
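To make the node behaviors above concrete, the sketch below writes three of them as feasibility checks on a node's incoming and outgoing flows. The function names and the tolerance are hypothetical; in XPlain the behaviors are compiled into optimization constraints rather than checked after the fact, so this only illustrates what each behavior requires.

```python
TOL = 1e-9  # numerical tolerance (assumption of this sketch)


def split_ok(inflows, outflows):
    """Split node: flow conservation, total in equals total out."""
    return abs(sum(inflows) - sum(outflows)) <= TOL


def pick_ok(inflows, outflows):
    """Pick node: flow conservation, with flow on at most one outgoing edge."""
    carrying = sum(1 for f in outflows if abs(f) > TOL)
    return split_ok(inflows, outflows) and carrying <= 1


def copy_ok(inflows, outflows):
    """Copy node: every outgoing edge carries the full incoming flow."""
    total_in = sum(inflows)
    return all(abs(f - total_in) <= TOL for f in outflows)


# A ball of size 51 modeled as a pick node over three bins.
whole_ball = pick_ok([51], [0, 51, 0])      # the ball goes to one bin
divided_ball = pick_ok([51], [25, 26, 0])   # the ball is divided across bins
copied = copy_ok([51], [51, 51])            # a copy node duplicates its input
print(whole_ball, divided_ball, copied)
```

A copy node with more than one outgoing edge violates flow conservation by design, which is why the DSL needs behaviors that do not enforce it.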
Users encode the problem, the heuristic, and the benchmark
in the DSL in abstract terms. For example, to model VBP they
specify that the problem operates over (abstract) sequences of
different node types that correspond to the balls and bins in the
VBP problem. Users also encode the actions the heuristic and
the optimal can make in terms of the relationship between the
different sequences of nodes and the edges that connect them
and rules that govern how flow can traverse from one node to
the next. To analyze a specific instance of the VBP problem, users input the number of balls and bins and then XPlain concretizes the encoding (we show a concretized example with 4 one-dimensional balls and 3 bins in Fig. 4b).
Our DSL allows us to model the examples from prior work. We can model DP with split, source, and sink nodes (Fig. 4a), and we use “pick nodes” with limited capacity that only allow a ball to be assigned to a single bin (Fig. 4b) to model FF.

[Figure 3 diagram. Components: DSL, Instance Generator, Compiler, Heuristic Analyzer, Adversarial Subspace Generator, Significance Checker, Explainer, Generalizer. Labels: Encode, Adversarial Sample, Exclude Subspace. Outputs: Type 1, Type 2, Type 3.]
Figure 3: XPlain: the system architecture we propose to extend existing heuristic analyzers.

(a) How we model our DP example (Fig. 1a) in the DSL. [Node layers: DEMANDS (1→2, 1→3, 1→4, 1→5, 2→3, 4→3, 4→5, 5→3); PATHS (1-2, 1-2-3, 1-4-5-3, 1-4, 1-4-5, 2-3, 4-5-3, 4-5, 5-3); EDGES (1→2, 1→4, 2→3, 4→5, 5→3); Met Demand; Unmet Demand.]
(b) How we model FF in the DSL. [Node layers: BALLS ($B_0$, $B_1$, $B_2$, $B_3$); BINS ($Bin_0$, $Bin_1$, $Bin_2$); Occupancy.]
Figure 4: Encoding heuristics in our DSL. Distinct node markers denote sink nodes; source nodes enforcing the behavior of split nodes; source nodes enforcing the behavior of pick nodes; copy nodes; and split nodes with limited outgoing capacity. The edge colors show type 2 explanations: more intense red (blue) edges show there are more samples in the subspace that only the heuristic (optimal) uses. In (a), DP uses the shortest path for the demand between 1 and 3 and the optimal does not. In (b), we see FF places a large ball ($B_0$) in the first bin, causing it to have to place the last ball differently, too. We used 3000 samples for each explanation. XPlain took 20 minutes to produce each figure.
We prove that we can represent any linear or mixed in-
teger problem through a small set of node behaviors (our
abstraction is sufficient) in App. A.
We can easily compile node behaviors into efficient optimizations. Our encoding allows us to solve the optimization faster compared to the hand-coded optimization: our DSL allows us to find redundant constraints and variables. This, in turn, reduces the number of variables and constraints MetaOpt adds in its re-writes¹. We have implemented a complete DSL in a LINQ [41]-style language: compared to the original MetaOpt implementation, the compiled DSL analyzes our DP example 4.3× faster. MetaOpt does not re-write FF, and we do not provide any run-time gains in that case.
Open questions. We can describe any heuristic that MetaOpt can analyze in our DSL. To support other analyzers (e.g., [16]) we may need to change our compiler and add other node behaviors. We also need to understand what metadata the user can (or should) provide to enable XPlain. This may require a co-design with XPlain’s other components.
¹Gurobi’s pre-solve can also do this, but it changes the variable names, making it hard to connect them back to the original problem.
Although we have proved that any mixed integer program
can be mapped to our DSL (App. A), that does not mean such
a mapping is the most efficient representation of the heuristic
in the DSL: we may achieve better performance if we model
the heuristic directly in the DSL. We need further research to
formalize and guide users in how to do so and optimize their
representations.
5.2 The adversarial subspace generator
Random search cannot find adversarial subspaces (it may not even find an adversarial point [35]). We propose an algorithm
where we extrapolate from the heuristic analyzer’s output and:
(1) use the analyzer to find an adversarial example; (2) find
the adversarial subspace around that example; (3) exclude that
subspace and repeat until we can no longer find an adversarial
example (where the heuristic significantly underperforms)
outside all of the subspaces we have found so far.
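Steps (1) to (3) form a simple loop, which the toy sketch below illustrates on a one-dimensional integer input. Everything in it is hypothetical: `gap` is a made-up performance gap, `analyzer` is an exhaustive stand-in for a real heuristic analyzer such as MetaOpt, and `grow` expands an interval point by point instead of using the density-based sampling described next.

```python
def gap(x):
    """Toy performance gap with two adversarial regions (made up)."""
    return 0.3 if 20 <= x <= 30 or 70 <= x <= 80 else 0.0


def analyzer(excluded, lo=0, hi=100):
    """Stand-in analyzer: the worst input outside the excluded intervals."""
    candidates = [
        x for x in range(lo, hi + 1)
        if not any(a <= x <= b for a, b in excluded)
    ]
    return max(candidates, key=gap, default=None)


def grow(point, threshold, lo=0, hi=100):
    """Step (2): expand an interval around the point while the gap stays high."""
    a = b = point
    while a - 1 >= lo and gap(a - 1) >= threshold:
        a -= 1
    while b + 1 <= hi and gap(b + 1) >= threshold:
        b += 1
    return a, b


def adversarial_subspaces(threshold=0.1):
    """Repeat steps (1) to (3) until no adversarial example remains."""
    found = []
    while True:
        worst = analyzer(found)                   # step (1)
        if worst is None or gap(worst) < threshold:
            return found
        found.append(grow(worst, threshold))      # step (2)
        # Step (3): `found` is passed back to the analyzer as exclusions.


subspaces = adversarial_subspaces()
print(subspaces)
```

The loop terminates because each iteration excludes at least the adversarial point it just found.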
To find each adversarial subspace, we first find a rough
candidate region: we sample in a cubic area around the initial
adversarial point given by a heuristic analyzer and expand
our sampling area based on the density of adversarial (bad)
samples we find in each direction. We define these “directions”
based on where the sub-cube (slice) lies with respect to the
initial adversarial point that MetaOpt found. We stop when
the density of bad samples drops in all possible expansion
directions (Fig. 5a).
We go “slice by slice” when we investigate the cubic region
around the initial bad sample because the adversarial subspace
may not be uniformly spread around the initial point. We
extend our sampling regions only around the slices where
the density of bad samples is high. We pick the number of
samples we use based on the DKW inequality [33].
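For reference, the DKW inequality bounds the deviation between the empirical and the true distribution function by $P(\sup_x |F_n(x) - F(x)| > \epsilon) \le 2e^{-2n\epsilon^2}$, so $n \ge \ln(2/\delta) / (2\epsilon^2)$ samples suffice for error at most $\epsilon$ with probability at least $1 - \delta$. The sketch below computes this bound; the values of $\epsilon$ and $\delta$ in the example are hypothetical, since the text does not state which ones XPlain uses.

```python
import math


def dkw_sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


# Hypothetical settings: 5% error with 95% confidence.
n_samples = dkw_sample_size(epsilon=0.05, delta=0.05)
print(n_samples)
```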
The subspace boundaries we have so far are not exact: how big we pick our slices and how much we expand them in each iteration influence how many false positives fall into the subspace. We refine the subspace based on an idea from prior work in diagnosis [13]. We train a regression tree that predicts the performance gap on samples in our rough subspace. The predicates that form the path that starts at the root of this tree and reaches the leaf that contains the initial bad sample more accurately describe the subspace (Fig. 5b).
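The refinement step can be sketched without a machine-learning library. The code below grows only the root-to-leaf path that contains the initial bad sample, choosing at each node the threshold split that minimizes the squared error of the gap (the usual regression-tree criterion). The data, the function names, and the depth limit are hypothetical, and a real implementation would more likely train a full tree with an existing library.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(samples, gaps):
    """Return the (feature, threshold) pair with the lowest two-sided SSE."""
    best = None
    for f in range(len(samples[0])):
        levels = sorted({s[f] for s in samples})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [g for s, g in zip(samples, gaps) if s[f] <= t]
            right = [g for s, g in zip(samples, gaps) if s[f] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    return None if best is None else best[1:]


def path_predicates(samples, gaps, point, max_depth=3):
    """Predicates on the tree path from the root to the leaf holding `point`."""
    predicates = []
    for _ in range(max_depth):
        if sse(gaps) <= 1e-12:  # the node is pure, so it is a leaf
            break
        split = best_split(samples, gaps)
        if split is None:
            break
        f, t = split
        below = point[f] <= t
        predicates.append((f, "<=" if below else ">", t))
        kept = [(s, g) for s, g in zip(samples, gaps) if (s[f] <= t) == below]
        samples = [s for s, _ in kept]
        gaps = [g for _, g in kept]
    return predicates


# Made-up samples: the gap (in percent) is high when x0 >= 5 and x1 < 3.
samples = [(a, b) for a in range(10) for b in range(10)]
gaps = [25.0 if a >= 5 and b < 3 else 1.0 for a, b in samples]
predicates = path_predicates(samples, gaps, point=(7, 1))
print(predicates)
```

On this data the path recovers the two conditions that define the high-gap region, feature 1 at most 2.5 and feature 0 above 4.5.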
The significance checker ensures the subspaces we find are
statistically significant: the points in a subspace cause a higher
performance gap compared to those immediately outside it.
We only report those subspaces with a low p-value (less than 0.05) as adversarial.
We use the Wilcoxon signed-rank test [44], which allows for dependent samples: the subspace fully describes what points are inside and what points are not (the samples in the two pools are dependent). We find subspaces for DP and VBP with p-values $2 \times 10^{-60}$ and $8 \times 10^{-11}$, respectively.
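A minimal sketch of such a check follows. It implements a one-sided Wilcoxon signed-rank test with the normal approximation and without continuity or tie-variance corrections, so its p-values are approximate for small samples. The gap values and the pairing of each sample inside the subspace with one just outside it are made up for illustration; the text does not specify how XPlain forms the pairs.

```python
import math


def wilcoxon_signed_rank(inside, outside):
    """One-sided test that paired `inside` values exceed `outside` values.

    Returns (w_plus, p_value) using the normal approximation.
    """
    diffs = [a - b for a, b in zip(inside, outside) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / std
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return w_plus, p_value


# Made-up performance gaps for paired samples inside and outside a subspace.
inside = [0.25, 0.30, 0.22, 0.28, 0.35, 0.27, 0.31, 0.24, 0.29, 0.33, 0.26, 0.32]
outside = [0.02, 0.05, 0.01, 0.03, 0.04, 0.02, 0.06, 0.01, 0.03, 0.05, 0.02, 0.04]
w_plus, p_value = wilcoxon_signed_rank(inside, outside)
is_adversarial = p_value < 0.05
print(w_plus, is_adversarial)
```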
Our approach allows us to find all statistically significant subspaces that meet our exploration granularity. If we do not include an adversarial input in a subspace (if it is outside of the region we explored), the analyzer will find it in the next iteration. Users can control XPlain’s ability to find all adversarial scenarios: they can use smaller cube sizes to explore the space in more detail, but this comes at the cost of a slower runtime. They can also elect to include those parts of the initial subspaces XPlain finds (before we apply the decision tree) as part of MetaOpt’s decision space. If they do so, they need to include the number of times they are willing to re-examine an area to avoid an infinite cycle: there may be regions that are not statistically significant, and XPlain would revisit them if they contain an input instance that produces a high gap.
Open questions. The decision tree helps us identify predicates (of the form $f \le t$ where $f$ is a feature and $t$ a threshold) that describe a subspace. What features we train the tree on influence what predicates we can get. On small instances we can use raw inputs, but on larger instances this would require a deep decision tree to fully describe the space: the output becomes computationally more difficult to use in the next step (step (3) above). We need to define functions $\mathcal{F}(\mathcal{I})$ of the input $\mathcal{I}$ that allow us to describe these subspaces efficiently and which we can use in the analyzers to execute step (3) (i.e., where we exclude a subspace and re-run the analyzer).
It may be better if we apply the adversarial subspace generator (steps (1)-(3) above) directly to the “projected” input space: where each function $\mathcal{F}(\mathcal{I})$ describes one dimension of the $m$-dimensional space (note, $m$ need not be the same as the dimensions of the input space $\mathcal{I}$). If the space defined by the adversarial subspaces is sparse this approach may allow us to find these adversarial subspaces more efficiently.
We may need additional mechanisms to help scale XPlain: it may take a long time to find adversarial subspaces if we analyze a large problem instance or if there are many disjoint subspaces.
5.3 The explainer
We hypothesize that the inputs in a contiguous subspace share
the same root cause for why they cause the heuristic to under-
perform. This is where a network-flow-based DSL explicitly
encoding the decisions of the heuristic and the benchmark
algorithm proves useful. We run samples from within each
contiguous subspace through the DSL and score edges based
on if: (1) both the benchmark and the heuristic send flow
on that edge (score = 0); (2) only the benchmark sends flow
(score = 1); or (3) only the heuristic sends flow (score = -1).
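The scoring rule can be sketched in a few lines. The representation below is an assumption: each sample is given as a pair of edge sets (the edges that carry flow under the heuristic and under the benchmark), the edge names are made up to resemble the DP example, and scores are averaged over samples to obtain one value per edge. Edges that neither algorithm uses in any sample do not appear in the output.

```python
from collections import defaultdict


def edge_scores(samples):
    """Average per-edge score: +1 benchmark only, -1 heuristic only, 0 both."""
    totals = defaultdict(int)
    for heuristic_edges, benchmark_edges in samples:
        for edge in benchmark_edges - heuristic_edges:
            totals[edge] += 1
        for edge in heuristic_edges - benchmark_edges:
            totals[edge] -= 1
        for edge in heuristic_edges & benchmark_edges:
            totals[edge] += 0  # record the edge with a neutral score
    return {edge: total / len(samples) for edge, total in totals.items()}


# Two made-up samples from one subspace of the DP example: the heuristic
# sends demand 1->3 over path 1-2-3, the benchmark over path 1-4-5-3.
samples = [
    ({"d13->p123", "d12->p12"}, {"d13->p1453", "d12->p12"}),
    ({"d13->p123", "d12->p12"}, {"d13->p1453", "d12->p12"}),
]
scores = edge_scores(samples)
print(scores)
```

Averages near +1 or -1 mark edges where the two algorithms consistently disagree across the subspace, which is what the colored edges in Fig. 4 convey.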
Such a “heatmap” of the differences between the bench-
mark and the heuristic shows how inputs in the subspace
interfere with the heuristic. In Fig. 4a, in a given subspace with 3000 samples, all pinnable demands share the same shortest path (red arrows in 1-2-3 path), and the optimal routes them through alternative paths (blue arrows in 1-4-5-3 path).
Open questions. As the instance size (the scale of the prob-
lem we want to analyze) grows, the above heatmap may be-
come harder to interpret. We need mechanisms that allow us
to summarize the information in this heatmap in a way that
the user can interpret and use to improve their heuristic.
The heuristic and benchmark also differ in how much flow they route on each edge. We need to define the appropriate data structure to represent this information to a user so that it is interpretable and actionable.
5.4 The generalizer and instance generator
We can enable operators to improve their heuristics or know
when to apply mitigations if we can extrapolate from the
type 1 and 2 explanations to form type 3: what properties
in the adversarial inputs cause the heuristic to underperform
and what aspects of the problem instance exacerbate it? We
need to find trends across instance-based information and
find an instance-agnostic explanation for why the heuristic
underperformed.

Figure 5: The adversarial subspace generator: (a) finds a rough subspace and separates bad samples from good ones; (b) it trains a regression tree on these samples and uses it to refine the subspace and produces (c). We show the first subspace ($D_0$) for our FF example in (c). Here, $C^i_j$ encodes the rough subspace and $T_i$ and $V_i$ the path in the regression tree. Panels: (a) Identifying dense adversarial slices. (b) Refinement by regression tree. (c) The adversarial subspaces for FF.
To discover patterns, we need to consider a diverse set of
instances and identify trends in the outputs of the subspace
generator and the explainer. We build an instance generator
that uses the problem description in the DSL to create such
instances and feeds them into the pipeline.
We imagine the generalizer would contain a “grammar”
that uses the metadata the user provides through the DSL
along with the network flow structure to describe trends in the
instance-based explanations. For example, one may consider
this predicate from a hypothetical grammar:
$$\texttt{increasing}(P):\ \forall a, b \mid a, b \in P \ \&\ |a| \ge |b| \rightarrow gap(a) \ge gap(b)$$
With such a grammar, a generalizer can go through the
observations on the samples the instance generator produced
and check if the predicates in the grammar are statistically
significant. For example, if $P$ describes the set of shortest
paths of pinnable demands in DP, the generalizer might produce
increasing(P) for why DP underperforms: this
predicate suggests that the gap is larger when the shortest
path of the pinnable demands is longer.
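One rough way to check a predicate such as increasing(P) against observations is to count how often it holds over pairs of samples. The sketch below is hypothetical: the function name `increasing_support`, the encoding of each observation as a `(path_length, gap)` pair, and the use of a simple fraction are assumptions for illustration. A real generalizer would presumably replace the fraction with a proper statistical test.

```python
from itertools import combinations


def increasing_support(observations):
    """Fraction of comparable pairs for which the longer path has a gap
    at least as large as the shorter one.

    observations: list of (path_length, gap) pairs (assumed encoding).
    Pairs with equal path length are skipped, which is an assumption.
    Returns None when no pair is comparable.
    """
    satisfied = 0
    comparable = 0
    for (len_a, gap_a), (len_b, gap_b) in combinations(observations, 2):
        if len_a == len_b:
            continue
        if len_a < len_b:
            # Reorder so that `a` always denotes the longer path.
            len_a, gap_a, len_b, gap_b = len_b, gap_b, len_a, gap_a
        comparable += 1
        if gap_a >= gap_b:
            satisfied += 1
    return satisfied / comparable if comparable else None


# Illustrative, made-up observations: gap grows with path length.
support = increasing_support([(2, 0.05), (3, 0.10), (4, 0.25)])
```

A support value close to 1 would suggest that the predicate is worth reporting as part of a Type 3 explanation; a value near 0.5 would suggest no trend.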
Open questions. One may envision a solution similar to enumerative
synthesis [3, 18, 20], which searches through the
grammar, finds all predicates that hold for a particular heuristic,
and forms clauses that explain the heuristic’s behavior.
We need more work to define the generalizer’s grammar and
how to build valid clauses from them.
6 Related work
To our knowledge, this is the first work that focuses on a general
framework to provide more insights into the outputs of
heuristic analysis tools [16, 35] and provides an explainability
feature for these tools. We build on prior work:
Domain-customized performance analyzers. The work we
do in XPlain also applies to custom performance analyzers,
which only apply to specific heuristics [5, 7].
Explainable AI. XPlain resembles prior work in explainable
AI, which provided more context around what different ML
models predict [31, 38, 42]. Parts of our solution (including
the three types) are inspired by these works [4, 8, 37].
Enumerative Synthesis. This field generates programs that
meet a specification through systematic enumeration of possible
program candidates [3, 18, 20]. We believe these ideas
can help us design the generalizer.
Large Language Models (LLMs). We may be able to use
LLMs [46] for various parts of our design. These include
generating the DSL, summarizing Type 2 explanations, and
generating the grammar we need to produce Type 3 explanations.
But LLMs are prone to hallucination [21, 29] and also require
additional step-by-step mechanisms to guide them [26, 43].
We may be able to build a natural language interface that can
help us automatically generate the DSL. Such an interface
will enable non-experts to more easily use XPlain. This, too,
is an interesting topic for future work.
7 Acknowledgements
We would like to thank Besmira Nushi, Ishai Menache, Konstantina
Mellou, Luke Marshall, Amin Khodaverdian, Chenning
Li, Joe Chandler, and Weiyang Wang for their valuable
comments. We also thank the HotNets program committee for
their valuable feedback.
References
[1]
Anup Agarwal, Venkat Arun, Devdeep Ray, Ruben Martins, and Srini-
vasan Seshan. 2022. Automating network heuristic design and analysis.
In Proceedings of the 21st ACM Workshop on Hot Topics in Networks.
8–16.
[2]
Anup Agarwal, Venkat Arun, Devdeep Ray, Ruben Martins, and Srini-
vasan Seshan. 2024. Towards provably performant congestion control.
In 21st USENIX Symposium on Networked Systems Design and Imple-
mentation (NSDI 24). USENIX Association, Santa Clara, CA, 951–978.
https://www.usenix.org/conference/nsdi24/presentation/agarwal-anup
[3]
Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo M.K. Martin,
Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando
Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013.
Syntax-Guided Synthesis. Proceedings of the International Conference on
Formal Methods in Computer-Aided Design (2013).
https://doi.org/10.1109/FMCAD.2013.6679385
[4]
Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Har-
ald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and
Thomas Zimmermann. 2019. Software engineering for machine learn-
ing: A case study. In 2019 IEEE/ACM 41st International Conference on
Software Engineering: Software Engineering in Practice (ICSE-SEIP).
IEEE, 291–300.
[5]
Mina Tahmasbi Arashloo, Ryan Beckett, and Rachit Agarwal. 2023.
Formal Methods for Network Performance Analysis. In 20th USENIX
Symposium on Networked Systems Design and Implementation (NSDI
23). 645–661.
[6]
Venkat Arun, Mohammad Alizadeh, and Hari Balakrishnan. 2022. Star-
vation in end-to-end congestion control. In Proceedings of the ACM
SIGCOMM 2022 Conference (Amsterdam, Netherlands) (SIGCOMM
’22). Association for Computing Machinery, New York, NY, USA,
177–192. https://doi.org/10.1145/3544216.3544223
[7]
Venkat Arun, Mina Tahmasbi Arashloo, Ahmed Saeed, Mohammad
Alizadeh, and Hari Balakrishnan. 2021. Toward formally verifying con-
gestion control behavior. In Proceedings of the 2021 ACM SIGCOMM
2021 Conference. 1–16.
[8]
Behnaz Arzani, Kevin Hsieh, and Haoxian Chen. 2021. Interpretable
feedback for AutoML and a proposal for domain-customized AutoML
for networking. In Proceedings of the 20th ACM Workshop on Hot
Topics in Networks. 53–60.
[9]
Hugo Barbalho, Patricia Kovaleski, Beibin Li, Luke Marshall, Marco
Molinaro, Abhisek Pan, Eli Cortez, Matheus Leao, Harsh Patwari, Zuzu
Tang, et al. 2023. Virtual machine allocation with lifetime predictions.
Proceedings of Machine Learning and Systems 5 (2023).
[10]
Ryan Beckett, Aarti Gupta, Ratul Mahajan, and David Walker. 2017.
A General Approach to Network Configuration Verification. In Pro-
ceedings of the Conference of the ACM Special Interest Group on Data
Communication (Los Angeles, CA, USA) (SIGCOMM ’17). ACM, New
York, NY, USA, 155–168. https://doi.org/10.1145/3098822.3098834
[11]
Dimitris Bertsimas and John N Tsitsiklis. 1997. Introduction to linear
optimization. Vol. 6. Athena Scientific Belmont, MA.
[12]
Stephen P Boyd and Lieven Vandenberghe. 2004. Convex optimization.
Cambridge university press.
[13]
Mike Chen, Alice X Zheng, Jim Lloyd, Michael I Jordan, and Eric
Brewer. 2004. Failure diagnosis using decision trees. In International
Conference on Autonomic Computing, 2004. Proceedings. IEEE, 36–
43.
[14]
Seyed K. Fayaz, Tushar Sharma, Ari Fogel, Ratul Mahajan, Todd
Millstein, Vyas Sekar, and George Varghese. 2016. Efficient Network
Reachability Analysis Using a Succinct Control Plane Representation.
In 12th USENIX Symposium on Operating Systems Design
and Implementation (OSDI 16). USENIX Association, Savannah,
GA, 217–232.
https://www.usenix.org/conference/osdi16/technical-sessions/presentation/fayaz
[15]
Aaron Gember-Jacobson, Raajay Viswanathan, Aditya Akella, and
Ratul Mahajan. 2016. Fast Control Plane Analysis Using an Abstract
Representation. In Proceedings of the 2016 ACM SIGCOMM Confer-
ence (Florianopolis, Brazil) (SIGCOMM ’16). ACM, New York, NY,
USA, 300–313. https://doi.org/10.1145/2934872.2934876
[16]
Saksham Goel, Benjamin Mikek, Jehad Aly, Venkat Arun, Ahmed
Saeed, and Aditya Akella. 2023. Quantitative verification of scheduling
heuristics. arXiv preprint arXiv:2301.04205 (2023).
[17]
Orna Grumberg, Assaf Schuster, and Avi Yadgar. 2004. Memory Ef-
ficient All-Solutions SAT Solver and Its Application for Reachability
Analysis. In Formal Methods in Computer-Aided Design, Alan J. Hu
and Andrew K. Martin (Eds.). Springer Berlin Heidelberg, Berlin, Hei-
delberg, 275–289.
[18]
Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkate-
san. 2011. Synthesis of Loop-free Programs. In Proceedings of the
32nd ACM SIGPLAN Conference on Programming Language Design
and Implementation (PLDI). ACM, 62–73.
[19]
Alex Horn, Ali Kheradmand, and Mukul Prasad. 2017. Delta-net:
Real-time Network Verification Using Atoms. In 14th USENIX Symposium
on Networked Systems Design and Implementation (NSDI 17).
USENIX Association, Boston, MA, 735–749.
https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/horn-alex
[20]
Kangjing Huang, Xiaokang Qiu, Peiyuan Shen, and Yanjun Wang.
2020. Reconciling enumerative and deductive program synthesis. In
Proceedings of the 41st ACM SIGPLAN Conference on Programming
Language Design and Implementation. 1159–1174.
[21]
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng,
Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing
Qin, et al. 2023. A survey on hallucination in large language models:
Principles, taxonomy, challenges, and open questions. arXiv preprint
arXiv:2311.05232 (2023).
[22]
Karthick Jayaraman, Nikolaj Bjorner, Jitu Padhye, Amar Agrawal,
Ashish Bhargava, Paul-Andre C Bissonnette, Shane Foster, Andrew
Helwer, Mark Kasten, Ivan Lee, Anup Namdhari, Haseeb Niaz, Anirud-
dha Parkhi, Hanukumar Pinnamraju, Adrian Power, Neha Milind Raje,
and Parag Sharma. 2019. Validating Datacenters at Scale. In Proceed-
ings of the ACM Special Interest Group on Data Communication (Bei-
jing, China) (SIGCOMM ’19). ACM, New York, NY, USA, 200–213.
https://doi.org/10.1145/3341302.3342094
[23]
Karthick Jayaraman, Nikolaj Bjørner, Geoff Outhred, and Charlie Kauf-
man. 2014. Automated Analysis and Debugging of Network Connectiv-
ity Policies. Technical Report MSR-TR-2014-102. Microsoft.
[24]
Siva Kesava Reddy Kakarla, Ryan Beckett, Behnaz Arzani, Todd
Millstein, and George Varghese. 2020. GRoot: Proactive Verifica-
tion of DNS Configurations. In Proceedings of the Annual Confer-
ence of the ACM Special Interest Group on Data Communication
on the Applications, Technologies, Architectures, and Protocols for
Computer Communication (Virtual Event, USA) (SIGCOMM ’20). As-
sociation for Computing Machinery, New York, NY, USA, 310–328.
https://doi.org/10.1145/3387514.3405871
[25]
Siva Kesava Reddy Kakarla, Alan Tang, Ryan Beckett, Karthick Jayara-
man, Todd Millstein, Yuval Tamir, and George Varghese. 2020. Finding
Network Misconfigurations by Automatic Template Inference. In 17th
USENIX Symposium on Networked Systems Design and Implementa-
tion (NSDI 20). USENIX Association, Santa Clara, CA, 999–1013.
https://www.usenix.org/conference/nsdi20/presentation/kakarla
[26]
Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Kaya Stechly,
Mudit Verma, Siddhant Bhambri, Lucas Saldyt, and Anil Murthy. 2024.
LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks.
arXiv preprint arXiv:2402.01817 (2024).
[27]
Peyman Kazemian, George Varghese, and Nick McKeown. 2012.
Header Space Analysis: Static Checking for Networks. In 9th
USENIX Symposium on Networked Systems Design and Implementation
(NSDI 12). USENIX Association, San Jose, CA, 113–126.
https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/kazemian
[28]
Ahmed Khurshid, Xuan Zou, Wenxuan Zhou, Matthew Caesar, and
P. Brighten Godfrey. 2013. VeriFlow: Verifying Network-Wide Invariants
in Real Time. In Presented as part of the 10th USENIX Symposium
on Networked Systems Design and Implementation (NSDI 13).
USENIX, Lombard, IL, 15–27.
https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/khurshid
[29]
Fang Liu, Yang Liu, Lin Shi, Houkun Huang, Ruifeng Wang, Zhen
Yang, and Li Zhang. 2024. Exploring and evaluating hallucinations
in llm-powered code generation. arXiv preprint arXiv:2404.00971
(2024).
[30]
Nuno P. Lopes, Nikolaj Bjørner, Patrice Godefroid, Karthick Jayaraman,
and George Varghese. 2015. Checking Beliefs in Dynamic Networks.
In Proceedings of the 12th USENIX Conference on Networked Sys-
tems Design and Implementation (Oakland, CA) (NSDI’15). USENIX
Association, USA, 499–512.
[31]
Scott M Lundberg and Su-In Lee. 2017. A unified approach to inter-
preting model predictions. Advances in neural information processing
systems 30 (2017).
[32]
Haohui Mai, Ahmed Khurshid, Rachit Agarwal, Matthew Caesar,
P. Brighten Godfrey, and Samuel Talmadge King. 2011. Debugging the
Data Plane with Anteater. SIGCOMM Comput. Commun. Rev. 41, 4
(aug 2011), 290–301. https://doi.org/10.1145/2043164.2018470
[33]
P. Massart. 1990. The Tight Constant in the Dvoretzky-Kiefer-
Wolfowitz Inequality. The Annals of Probability 18, 3 (1990), 1269–
1283. https://doi.org/10.1214/aop/1176990746
[34]
Ken L. McMillan. 2002. Applying SAT Methods in Unbounded Sym-
bolic Model Checking. In Computer Aided Verification, Ed Brinksma
and Kim Guldstrand Larsen (Eds.). Springer Berlin Heidelberg, Berlin,
Heidelberg, 250–264.
[35]
Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra,
Himanshu Raj, Umesh Krishnaswamy, Ramesh Govindan, and Srikanth
Kandula. 2024. Finding Adversarial Inputs for Heuristics using
Multi-level Optimization. In 21st USENIX Symposium on Networked Systems
Design and Implementation (NSDI 24). USENIX Association,
Santa Clara, CA, 927–949.
https://www.usenix.org/conference/nsdi24/presentation/namyar-finding
[36]
Rina Panigrahy, Kunal Talwar, Lincoln Uyeda, and Udi Wieder. 2011.
Heuristics for vector bin packing. research.microsoft.com (2011).
[37]
P Jonathon Phillips, Carina A Hahn, Peter C
Fontana, Amy N Yates, Kristen Greene, David A Broniatowski, and
Mark A Przybocki. 2021. Four principles of explainable artificial
intelligence. (2021).
[38]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why
should I trust you?" Explaining the predictions of any classifier. In
Proceedings of the 22nd ACM SIGKDD international conference on
knowledge discovery and data mining. 1135–1144.
[39]
Alan Tang, Siva Kesava Reddy Kakarla, Ryan Beckett, Ennan Zhai,
Matt Brown, Todd Millstein, Yuval Tamir, and George Varghese. 2021.
Campion: Debugging Router Configuration Differences. In Proceedings
of the 2021 ACM SIGCOMM 2021 Conference (Virtual Event, USA)
(SIGCOMM ’21). Association for Computing Machinery, New York,
NY, USA, 748–761. https://doi.org/10.1145/3452296.3472925
[40]
Bingchuan Tian, Xinyi Zhang, Ennan Zhai, Hongqiang Harry Liu,
Qiaobo Ye, Chunsheng Wang, Xin Wu, Zhiming Ji, Yihong Sang, Ming
Zhang, Da Yu, Chen Tian, Haitao Zheng, and Ben Y. Zhao. 2019.
Safely and Automatically Updating In-Network ACL Configurations
with Intent Language. In Proceedings of the ACM Special Interest
Group on Data Communication (Beijing, China) (SIGCOMM ’19).
Association for Computing Machinery, New York, NY, USA, 214–226.
https://doi.org/10.1145/3341302.3342088
[41]
Mads Torgersen. 2007. Querying in C# how language integrated query
(LINQ) works. In Companion to the 22nd ACM SIGPLAN conference
on Object-oriented programming systems and applications companion.
852–853.
[42]
Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfac-
tual explanations without opening the black box: Automated decisions
and the GDPR. Harv. JL & Tech. 31 (2017), 841.
[43]
Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-
Wei Lee, and Ee-Peng Lim. 2023. Plan-and-solve prompting: Improving
zero-shot chain-of-thought reasoning by large language models. arXiv
preprint arXiv:2305.04091 (2023).
[44]
Frank Wilcoxon. 1945. Individual comparisons by ranking methods.
Biometrics bulletin 1, 6 (1945), 80–83.
[45]
Gerhard J Woeginger. 1997. There is no asymptotic PTAS for two-
dimensional vector packing. Inform. Process. Lett. 64, 6 (1997), 293–
297.
[46]
Zheyu Yan, Yifan Qin, Xiaobo Sharon Hu, and Yiyu Shi. 2023. On
the Viability of Using LLMs for SW/HW Co-Design: An Example in
Designing CiM DNN Accelerators. In 2023 IEEE 36th International
System-on-Chip Conference (SOCC). 1–6.
https://doi.org/10.1109/SOCC58585.2023.10256783
[47]
Hongkun Yang and Simon S. Lam. 2016. Real-time Verification
of Network Properties Using Atomic Predicates. IEEE/ACM
Trans. Netw. 24, 2 (April 2016), 887–900.
https://doi.org/10.1109/TNET.2015.2398197
[48]
Yinlei Yu, Pramod Subramanyan, Nestan Tsiskaridze, and Sharad Malik.
2014. All-SAT Using Minimal Blocking Clauses. In 2014 27th
International Conference on VLSI Design and 2014 13th International
Conference on Embedded Systems. 86–91.
https://doi.org/10.1109/VLSID.2014.22
[49]
Peng Zhang, Xu Liu, Hongkun Yang, Ning Kang, Zhengchang Gu, and
Hao Li. 2020. APKeep: Realtime Verification for Real Networks. In
17th USENIX Symposium on Networked Systems Design and Imple-
mentation (NSDI 20). USENIX Association, Santa Clara, CA, 241–255.
https://www.usenix.org/conference/nsdi20/presentation/zhang-peng
Figure 6: Different node types in XPlain’s DSL.
(a) SPLIT NODE: $f(i_0,n)+f(i_1,n)+f(i_2,n)=f(n,j_0)+f(n,j_1)$.
(b) PICK NODE: $f(i_0,n)=f(n,j_k)$, $f(n,j_{1-k})=0$, $k\in\{0,1\}$.
(c) MULTIPLY NODE: $f(n,j)=C\cdot f(i,n)$.
(d) ALL EQUAL NODE: $f(n,*)=f(*,n)$.
(e) COPY NODE: $f(n,*)=f(i_0,n)+f(i_1,n)$.
(f) SINK NODE: objective $\sum_i f(i,n)$.
A Formalizing XPlain’s DSL
We prove that we can model any linear optimization in XPlain.
A.1 XPlain’s node description
PRELIMINARIES. Our network-flow-based DSL is a directed
graph where we denote the set of nodes with $\mathcal{N}$ and the set
of directed edges as $\mathcal{E}$. We treat each edge $(i,j)\in\mathcal{E}$ as
a variable with a non-negative flow value $f(i,j)\ge 0$. We
impose constraints on these flow variables as needed. We
define incoming edges to node $n\in\mathcal{N}$ as those edges which
are directed towards $n$ (i.e., $(i,n)\in\mathcal{E}$). Outgoing edges are
those exiting $n$. The incoming (outgoing) traffic of a node
is the sum of the flow on all of its incoming (outgoing) edges.
We have the following node behaviors:
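SPLIT NODES ($\mathcal{N}_{split}$) split the incoming traffic between the
outgoing edges (Fig. 6a). They enforce the traditional flow
conservation constraints:
$$\sum_{\{i\in\mathcal{N},\,(i,n)\in\mathcal{E}\}} f(i,n)=\sum_{\{i\in\mathcal{N},\,(n,i)\in\mathcal{E}\}} f(n,i)\quad\forall n\in\mathcal{N}_{split}$$
They can also optionally enforce (1) an upper bound on the
traffic on an outgoing edge (capacity constraint) and (2) the
traffic on an incoming edge to be constant:
$$f(n,i)\le C(n,i),\quad C(n,i)\in\mathbb{R}^{+},\quad\forall i\in\{i\in\mathcal{N},\,(n,i)\in\mathcal{E}\},\ \forall n\in\mathcal{N}_{split}$$
$$f(i,n)=d(i,n),\quad d(i,n)\in\mathbb{R}_{\ge 0},\quad\forall i\in\{i\in\mathcal{N},\,(i,n)\in\mathcal{E}\},\ \forall n\in\mathcal{N}_{split}$$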
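PICK NODES ($\mathcal{N}_{pick}$) satisfy flow conservation but only allow
one of the outgoing edges to carry traffic (Fig. 6b):
$$\sum_{\{i\in\mathcal{N},\,(i,n)\in\mathcal{E}\}} f(i,n)=\sum_{\{i\in\mathcal{N},\,(n,i)\in\mathcal{E}\}} f(n,i)\quad\forall n\in\mathcal{N}_{pick}$$
$$\sum_{\{i\in\mathcal{N},\,(n,i)\in\mathcal{E}\}} \mathbb{1}[f(n,i)>0]=1\quad\forall n\in\mathcal{N}_{pick}$$
where $\mathbb{1}[x>0]$ is an indicator function (= 1 if $x>0$, otherwise = 0).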
MULTIPLY NODES ($\mathcal{N}_{mult}$) only have one incoming and one
outgoing link. They multiply the incoming traffic by a constant
$C\in\mathbb{R}^{+}$ before sending it out (Fig. 6c). They only satisfy
flow conservation when $C=1$.
$$f(n,i)=C\,f(j,n)\quad\forall (i,j)\in\{(i,j)\mid i,j\in\mathcal{N},\ (n,i),(j,n)\in\mathcal{E}\},\ \forall n\in\mathcal{N}_{mult}$$
ALL EQUAL NODES ($\mathcal{N}_{allEq}$) require all the incoming and
outgoing edges to carry the same amount of traffic (Fig. 6d):
$$f(n,i)=f(j,n)\quad\forall (i,j)\in\{(i,j)\mid i,j\in\mathcal{N},\ (n,i),(j,n)\in\mathcal{E}\},\ \forall n\in\mathcal{N}_{allEq}$$
To make it simpler to encode a heuristic in the DSL, we
also add the following node types to our DSL:
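COPY NODES ($\mathcal{N}_{copy}$) copy the total incoming flow into each
outgoing edge (Fig. 6e):
$$f(n,j)=\sum_{\{i\in\mathcal{N},\,(i,n)\in\mathcal{E}\}} f(i,n)\quad\forall j\in\{j\mid j\in\mathcal{N},\,(n,j)\in\mathcal{E}\},\ \forall n\in\mathcal{N}_{copy}$$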
We can recreate this node’s behavior if we combine split
nodes and all equal nodes (Fig. 7). However, using a copy node
directly is more intuitive and straightforward for users, and
we include it in our DSL for that reason.
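The node behaviors defined above amount to simple checks on the flows entering and leaving a node. The following sketch states them as Python predicates. It is only an illustration under stated assumptions: the function names, the list-based representation of flows, and the numerical tolerance `TOL` (used in place of exact comparisons) are hypothetical and are not part of XPlain's DSL or compiler.

```python
TOL = 1e-9  # assumed numerical tolerance for comparing flows


def is_split(inflows, outflows, capacities=None):
    """Flow conservation, plus optional capacities on outgoing edges."""
    conserved = abs(sum(inflows) - sum(outflows)) <= TOL
    if capacities is None:
        return conserved
    within = all(f <= c + TOL for f, c in zip(outflows, capacities))
    return conserved and within


def is_pick(inflows, outflows):
    """Flow conservation with exactly one outgoing edge carrying traffic."""
    active = sum(1 for f in outflows if f > TOL)
    return abs(sum(inflows) - sum(outflows)) <= TOL and active == 1


def is_multiply(inflow, outflow, constant):
    """Single input and output edge: outflow = constant * inflow."""
    return abs(outflow - constant * inflow) <= TOL


def is_all_equal(inflows, outflows):
    """Every incoming and outgoing edge carries the same flow."""
    flows = list(inflows) + list(outflows)
    return all(abs(f - flows[0]) <= TOL for f in flows)


def is_copy(inflows, outflows):
    """Each outgoing edge carries the total incoming flow."""
    total = sum(inflows)
    return all(abs(f - total) <= TOL for f in outflows)
```

For instance, `is_copy([1.0, 2.0], [3.0, 3.0, 3.0])` holds because every outgoing edge carries the total incoming flow of 3, whereas the same flows violate `is_split`, since a copy node does not conserve flow when it has more than one outgoing edge.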
Figure 7: Recreating COPY NODE with SPLIT NODE and
ALL EQUAL NODE: $f(n,*)=f(i_0,n)+f(i_1,n)$.
We use source and sink nodes to define the objective:
SOURCE NODES ($\mathcal{N}_{source}$) are special cases of split or pick
nodes that represent the inputs to the problem. For example,
Fig. 4a illustrates the input traffic demand modeled as source
nodes that enforce split node behavior. Also, Fig. 4b
shows the input ball sizes as source nodes with pick node
behavior (each ball can only be placed in one bin).
SINK NODE ($\mathcal{N}_{sink}$) is a specific node that (1) only has incoming
edges and (2) measures the performance of the problem as
the total incoming traffic through these edges (Fig. 6f). When
the DSL represents an optimization problem, the sink node
is designated as the objective, and the compiler translates the
value of the sink node into the optimization objective.
A.2 XPlain can model any linear optimization
THEOREM A.1. We can model any linear optimization
(linear programming or mixed integer linear programming)
as a flow network using the six node behaviors ($\mathcal{N}_{split}$,
$\mathcal{N}_{pick}$, $\mathcal{N}_{mult}$, $\mathcal{N}_{allEq}$, and $\mathcal{N}_{sink}$).
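PROOF. An optimization problem maximizes (or minimizes)
an objective subject to inputs that fall within a feasible
space that the optimization constraints characterize. We can
express a linear optimization problem as (linear programming
or mixed integer linear programming):
$$\max_{\mathbf{x},\mathbf{y}}\ \mathbf{c}_x^{\top}\mathbf{x}+\mathbf{c}_y^{\top}\mathbf{y}$$
$$\mathbf{A}_x\mathbf{x}+\mathbf{A}_y\mathbf{y}\le\mathbf{b}$$
$$\mathbf{x}\ge\mathbf{0}$$
$$\mathbf{y}\in\{0,1\}^{|\mathbf{y}|}$$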
To show that our DSL is complete, we need to show that we
can capture both the feasible space and the objective correctly
through our flow model for every possible linear optimization.
We first present a general algorithm to express the feasible
space of any given linear optimization as a flow model and
prove it is correct. Next, we show how we can use the same
algorithm to express any linear objective.
How to represent the feasible space with a flow model. We
can express the feasible space of any linear optimization as:
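$$\mathbf{A}_x\mathbf{x}+\mathbf{A}_y\mathbf{y}\le\mathbf{b}\tag{1}$$
$$\mathbf{x}\ge\mathbf{0}\tag{2}$$
$$\mathbf{y}\in\{0,1\}^{|\mathbf{y}|}\tag{3}$$
where we denote matrices and vectors in bold. $\mathbf{x}$ and $\mathbf{y}$ are
vectors of continuous and binary variables of size $|\mathbf{x}|\times 1$
and $|\mathbf{y}|\times 1$, respectively. $\mathbf{b}$ is a constant vector of size
$|\mathbf{b}|\times 1$. $\mathbf{A}_x$ and $\mathbf{A}_y$ are constant matrices of sizes
$|\mathbf{b}|\times|\mathbf{x}|$ and $|\mathbf{b}|\times|\mathbf{y}|$,
respectively. Note that we can enforce an equality constraint
as two inequality constraints (Eq. 1), and represent any integer
variable as the sum of multiple binary variables. We map the
variables to flows in our model.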
We need to transform the above optimization before we
can model it with our node behaviors:
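Transformation 1. The matrices $\mathbf{A}_x$ and $\mathbf{A}_y$, and the vector
$\mathbf{b}$ may contain negative entries. This conflicts with the
non-negativity requirement of the flows in our flow model. To
address this, we decompose these matrices and vector into
their positive and negative components:
$$\mathbf{A}_x=\mathbf{A}_x^{+}-\mathbf{A}_x^{-},\quad\mathbf{A}_y=\mathbf{A}_y^{+}-\mathbf{A}_y^{-},\quad\mathbf{b}=\mathbf{b}^{+}-\mathbf{b}^{-}$$
where all the elements in $\mathbf{A}_x^{+}=[a^{(+,x)}_{ij}]$ and
$\mathbf{A}_x^{-}=[a^{(-,x)}_{ij}]$ are non-negative such that at most one of
$a^{(+,x)}_{ij}$ or $a^{(-,x)}_{ij}$ is non-zero for every
$i\in\mathbb{Z}_{[0,|\mathbf{b}|)}$ and $j\in\mathbb{Z}_{[0,|\mathbf{x}|)}$. Note that
$\mathbb{Z}_{[0,m)}=\{0,\dots,m-1\}$. The same holds for both (1)
$\mathbf{A}_y^{+}$ and $\mathbf{A}_y^{-}$, and (2) $\mathbf{b}^{+}=[b_i^{+}]$ and
$\mathbf{b}^{-}=[b_i^{-}]$ over every $i$. All
matrices have the same size as their originating matrix. After
substituting these decompositions into Eq. 1, we have:
$$\mathbf{A}_x^{+}\mathbf{x}+\mathbf{A}_y^{+}\mathbf{y}+\mathbf{b}^{-}\le\mathbf{A}_x^{-}\mathbf{x}+\mathbf{A}_y^{-}\mathbf{y}+\mathbf{b}^{+}\tag{4}$$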
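A small numerical check can make Transformation 1 concrete. The sketch below splits a matrix and a vector into non-negative positive and negative parts and confirms that the original constraint (Eq. 1) and the decomposed constraint (Eq. 4) agree on a few points. The helper names and the example values are made up for illustration, and only continuous variables are included (no binary part) to keep the example short.

```python
def split_signs(values):
    """Split numbers into non-negative positive and negative parts,
    so that values[k] == pos[k] - neg[k] and at most one is non-zero."""
    pos = [max(v, 0.0) for v in values]
    neg = [max(-v, 0.0) for v in values]
    return pos, neg


def dot(row, x):
    return sum(a * v for a, v in zip(row, x))


def eq1_holds(A, b, x):
    """Original form: A x <= b, checked row by row."""
    return all(dot(row, x) <= bi for row, bi in zip(A, b))


def eq4_holds(A, b, x):
    """Decomposed form: A+ x + b- <= A- x + b+ (non-negative constants only)."""
    b_pos, b_neg = split_signs(b)
    for row, bp, bn in zip(A, b_pos, b_neg):
        row_pos, row_neg = split_signs(row)
        if dot(row_pos, x) + bn > dot(row_neg, x) + bp:
            return False
    return True


# Hypothetical constraint system with negative entries in both A and b.
A_x = [[2.0, -1.0], [-3.0, 0.0]]
b = [4.0, -6.0]

for x in ([2.0, 1.0], [3.0, 1.0], [0.0, 0.0]):
    assert eq1_holds(A_x, b, x) == eq4_holds(A_x, b, x)
```

Here `[2.0, 1.0]` satisfies both forms, while the other two points violate both, which is what the equivalence of Eq. 1 and Eq. 4 requires.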
Transformation 2. Eq. 4 and SPLIT NODEs qualitatively
represent similar behaviors. SPLIT NODEs split the incoming
traffic across outgoing edges and ensure the traffic on each
edge does not exceed the capacity constraints. Ideally, we can
enforce the Eq. 4 constraints using SPLIT NODEs, as a
flow conservation constraint with a non-negative slack flow $\mathbf{f}$:
$$\mathbf{A}_x^{+}\mathbf{x}+\mathbf{A}_y^{+}\mathbf{y}+\mathbf{b}^{-}+\mathbf{f}=\mathbf{A}_x^{-}\mathbf{x}+\mathbf{A}_y^{-}\mathbf{y}+\mathbf{b}^{+}\quad\text{(Flow conservation)}$$
$$\mathbf{0}\le\mathbf{f}\quad\text{(Flow constraint)}\tag{5}$$
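The problem is that Eq. 4 also involves coefficients associated
with each variable ($\mathbf{A}$), while SPLIT NODEs do not accept
weights. We address this by replacing each term (coefficient
multiplied by a variable) in each of the Eq. 4 constraints with
an auxiliary variable:
$$u^{+}_{ij}=a^{(+,x)}_{ij}x_j,\ u^{-}_{ij}=0\quad\text{if } a^{(+,x)}_{ij}\ge 0\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{x}|)}$$
$$u^{-}_{ij}=a^{(-,x)}_{ij}x_j,\ u^{+}_{ij}=0\quad\text{if } a^{(-,x)}_{ij}>0\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{x}|)}$$
$$v^{+}_{ij}=a^{(+,y)}_{ij}y_j,\ v^{-}_{ij}=0\quad\text{if } a^{(+,y)}_{ij}\ge 0\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{y}|)}$$
$$v^{-}_{ij}=a^{(-,y)}_{ij}y_j,\ v^{+}_{ij}=0\quad\text{if } a^{(-,y)}_{ij}>0\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{y}|)}\tag{6}$$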
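We define $\mathbf{U}^{+}=[u^{+}_{ij}]$, $\mathbf{U}^{-}=[u^{-}_{ij}]$,
$\mathbf{V}^{+}=[v^{+}_{ij}]$, and $\mathbf{V}^{-}=[v^{-}_{ij}]$. We can then
express Eq. 5 in terms of these auxiliary variables:
$$\mathbf{U}^{+}\mathbf{d}_x+\mathbf{V}^{+}\mathbf{d}_y+\mathbf{b}^{-}+\mathbf{f}=\mathbf{U}^{-}\mathbf{d}_x+\mathbf{V}^{-}\mathbf{d}_y+\mathbf{b}^{+},\quad\mathbf{0}\le\mathbf{f}$$
where $\mathbf{d}_x$ and $\mathbf{d}_y$ are vectors with all elements equal to 1
and sizes of $|\mathbf{x}|\times 1$ and $|\mathbf{y}|\times 1$, respectively. This is because
each of the auxiliary variables $u_{ij}$ or $v_{ij}$ appears in exactly one
inequality constraint.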
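To illustrate Transformation 2, the following sketch computes the auxiliary variables and the slack flow for a small example and checks that the per-row balance holds with a non-negative slack exactly when the point is feasible. It is a hypothetical helper under simplifying assumptions: only continuous variables are handled (no $\mathbf{y}$, $\mathbf{V}^{+}$, or $\mathbf{V}^{-}$), and the function name and example numbers are invented for this illustration.

```python
def auxiliary_flows(A, b, x):
    """Return (u_pos, u_neg, f) for the continuous-only case.

    u_pos[i][j] = a+_ij * x_j and u_neg[i][j] = a-_ij * x_j, where a+ and a-
    are the positive and negative parts of A. The slack f[i] is whatever
    flow balances row i: sum_j u_pos[i][j] + b-_i + f[i]
                        = sum_j u_neg[i][j] + b+_i.
    """
    u_pos = [[max(a, 0.0) * xj for a, xj in zip(row, x)] for row in A]
    u_neg = [[max(-a, 0.0) * xj for a, xj in zip(row, x)] for row in A]
    b_pos = [max(v, 0.0) for v in b]
    b_neg = [max(-v, 0.0) for v in b]
    f = [
        sum(neg_row) + bp - sum(pos_row) - bn
        for pos_row, neg_row, bp, bn in zip(u_pos, u_neg, b_pos, b_neg)
    ]
    return u_pos, u_neg, f


A_x = [[2.0, -1.0], [-3.0, 0.0]]
b = [4.0, -6.0]

# Feasible point: every slack is non-negative, so it is a valid flow.
_, _, slack_ok = auxiliary_flows(A_x, b, [2.0, 1.0])

# Infeasible point: at least one slack would have to be negative.
_, _, slack_bad = auxiliary_flows(A_x, b, [3.0, 1.0])
```

Because flows cannot be negative, a point that needs a negative slack on some row has no valid flow assignment, which is how the SPLIT NODE encoding rules out infeasible inputs.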
Transformation 3. We encounter a problem when we enforce the
constraints in Eq. 6 using MULTIPLY NODEs for $u_{ij}$ and $v_{ij}$:
each MULTIPLY NODE has only one input and one output edge.
Each edge also corresponds to one variable. This means each
variable can appear in at most two constraints, corresponding
to the two nodes at the two ends of the edge. However, the
variables in Eq. 6 can appear more than twice (for example, $x_j$
can appear up to $|\mathbf{b}|$ times).
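We address this by introducing additional variables and
constraints:
$$u^{+}_{ij}=a^{(+,x)}_{ij}x^{+}_{ij},\quad u^{-}_{ij}=a^{(-,x)}_{ij}x^{-}_{ij}\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{x}|)}$$
$$v^{+}_{ij}=a^{(+,y)}_{ij}y^{+}_{ij},\quad v^{-}_{ij}=a^{(-,y)}_{ij}y^{-}_{ij}\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{y}|)}$$
$$x^{+}_{ij}=x^{-}_{ij}=x_j\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{x}|)}$$
$$y^{+}_{ij}=y^{-}_{ij}=y_j\quad\forall i\in\mathbb{Z}_{[0,|\mathbf{b}|)},\ \forall j\in\mathbb{Z}_{[0,|\mathbf{y}|)}$$
With these modifications, each variable $x^{+}_{ij}$ and $x^{-}_{ij}$ appears
in exactly two constraints (same for $y$).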
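The final resulting optimization after all the transformations
is:
$$\mathbf{U}^{+}\mathbf{d}_x+\mathbf{V}^{+}\mathbf{d}_y+\mathbf{b}^{-}+\mathbf{f}=\mathbf{U}^{-}\mathbf{d}_x+\mathbf{V}^{-}\mathbf{d}_y+\mathbf{b}^{+},\quad\mathbf{0}\le\mathbf{f}\tag{7}$$
$$u^{+}_{ij}=a^{(+,x)}_{ij}x^{+}_{ij}\quad\forall i\,\forall j\tag{8}$$
$$x^{-}_{ij}=\frac{1}{a^{(-,x)}_{ij}}u^{-}_{ij}\quad\text{if } a^{(-,x)}_{ij}>0\quad\forall i\,\forall j\tag{9}$$
$$v^{+}_{ij}=a^{(+,y)}_{ij}y^{+}_{ij}\quad\forall i\,\forall j\tag{10}$$
$$y^{-}_{ij}=\frac{1}{a^{(-,y)}_{ij}}v^{-}_{ij}\quad\text{if } a^{(-,y)}_{ij}>0\quad\forall i\,\forall j\tag{11}$$
$$x^{+}_{ij}=x^{-}_{ij}=x_j\quad\forall i\,\forall j\tag{12}$$
$$y^{+}_{ij}=y^{-}_{ij}=y_j\quad\forall i\,\forall j\tag{13}$$
$$\mathbf{x}\ge\mathbf{0}\tag{14}$$
$$\mathbf{y}\in\{0,1\}^{|\mathbf{y}|}\tag{15}$$
where for each of the equations above, notation $\forall i\,\forall j$ means
all the possible $i$ and $j$ values should be considered according
to the specific constraints or conditions given for each
equation.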
Constructing the flow model. We can encode the above
constraints using a flow model. We first create one edge per
variable and then enforce each constraint using one node:
(S1) We encode Eq. 7 using SPLIT NODEs. We will have a
node for each possible $i$. The inputs to each node are
(1) one edge per variable on the left-hand side of the
constraint ($\mathbf{U}^{+}$ and $\mathbf{V}^{+}$), (2) one edge with a constant
rate $\mathbf{b}^{-}$, and (3) one additional edge associated with $\mathbf{f}$.
The outputs are (1) one edge per variable on the right-hand
side of the constraint ($\mathbf{U}^{-}$ and $\mathbf{V}^{-}$) and (2) one
additional edge with constant rate $\mathbf{b}^{+}$. Fig. 8 shows
how this encoding is done.
Figure 8: Step 1 of the encoding: SPLIT NODE for $i$, which enforces
$\sum_j[u^{+}_{ij}+v^{+}_{ij}]+b^{-}_i+f_i=\sum_j[u^{-}_{ij}+v^{-}_{ij}]+b^{+}_i$. There
will be a SPLIT NODE for each possible $i\in\mathbb{Z}_{[0,|\mathbf{b}|)}$. If a
variable is 0, we do not need to assign it to the node. There
are at most $|\mathbf{x}|$ arrows present for $u^{+}_{ij}$ and $u^{-}_{ij}$ since at most
one of $a^{(-,x)}_{ij}$ or $a^{(+,x)}_{ij}$ is non-zero. Similarly, there are at
most $|\mathbf{y}|$ arrows present for $y^{+}_{ij}$ and $y^{-}_{ij}$.
(S2) We express Eq. 8–11 using MULTIPLY NODEs. The
$\mathbf{U}^{-}$ edges originate from SPLIT NODEs to these MULTIPLY
NODEs while $\mathbf{U}^{+}$ edges are in the opposite direction. So,
the node that models Eq. 8 has $x^{+}_{ij}$ as its input edge and
$u^{+}_{ij}$ as its output edge. Conversely, the input edge is $u^{-}_{ij}$
and the output edge is $x^{-}_{ij}$ for Eq. 9 (same holds for $y$
and $v$). Fig. 9 shows this step.
Figure 9: Step 2 of the encoding: a MULTIPLY NODE with constant $a^{(+,x)}_{ij}$
enforces $u^{+}_{ij}=a^{(+,x)}_{ij}x^{+}_{ij}$, and a MULTIPLY NODE with constant
$1/a^{(-,x)}_{ij}$ enforces $x^{-}_{ij}=\frac{1}{a^{(-,x)}_{ij}}u^{-}_{ij}$. There will be a MULTIPLY
NODE for each possible $i$ and $j$. At most one of these two
MULTIPLY NODEs will be needed since at most one of
$a^{(-,x)}_{ij}$ or $a^{(+,x)}_{ij}$ is non-zero.
(S3) We model Eq. 12–13 using ALL EQUAL NODEs. Note
that for a fixed $i$ and $j$, since at most one of $a^{(-,x)}_{ij}$ and
$a^{(+,x)}_{ij}$ is non-zero, at most one of the equations in Eq. 8 and
Eq. 9 is needed for that $i$ and $j$ (same holds for Eq. 10
and Eq. 11). Consequently, at most one of $x^{+}_{ij}$ and $x^{-}_{ij}$ is
needed in Eq. 12 (same holds for $y^{+}_{ij}$ and $y^{-}_{ij}$ in Eq. 13).
The $x_j$ and $x^{-}_{ij}$s are input edges and $x^{+}_{ij}$s are the output
edges (same for $y$). Fig. 10 illustrates this step.
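Figure 10: Step 3 of the encoding: ALL EQUAL NODE for $j$, with input edges
$x_j$ and $x^{-}_{ij}$ ($\forall i$) and output edges $x^{+}_{ij}$ ($\forall i$). There will be an ALL
EQUAL NODE for each possible $j\in\mathbb{Z}_{[0,|\mathbf{x}|)}$.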
(S4) The input variables are the variables in $\mathbf{x}$ and $\mathbf{y}$. We
represent binary variables in Eq. 15 using PICK NODEs.
Each such node has one incoming edge with a constant rate of 1 and
two outgoing edges. One of the outputs corresponds to
the binary variable. If the node selects that specific edge
to carry the flow, the binary variable is 1. Otherwise,
it is 0. Eq. 14 is inherently satisfied as flows are all
non-negative.
This flow model provably captures the optimization’s feasi-
ble space as there is a one-to-one correspondence between the
constraints in the optimization and the constraints enforced
by the nodes.
How to capture the optimization objective. We can express
the objective of any linear optimization as
$\max_{\mathbf{x},\mathbf{y}}\ \mathbf{c}_x^{\top}\mathbf{x}+\mathbf{c}_y^{\top}\mathbf{y}$
where $\mathbf{c}_x$ and $\mathbf{c}_y$ are constant vectors. We can reformulate and
add a constraint that enforces $p=\mathbf{c}_x^{\top}\mathbf{x}+\mathbf{c}_y^{\top}\mathbf{y}$, so the objective
of the optimization changes to maximizing $p$. Then, we can
use similar transformations, as we explained before, to capture
this constraint within the flow model. We add a sink node that
has one incoming edge $p$. This way, we can express any linear
optimization objective with our model.