L.S. Moss (Ed.): TARK 2019
EPTCS 297, 2019, pp. 293–312, doi:10.4204/EPTCS.297.19
© R. Kuznets, L. Prosperi, U. Schmid & K. Fruzsa
This work is licensed under the
Creative Commons Attribution License.
Causality and Epistemic Reasoning in Byzantine Multi-Agent Systems
Roman Kuznets*
Embedded Computing Systems
TU Wien
Vienna, Austria
roman@logic.at
Laurent Prosperi
ENS Paris-Saclay
Cachan, France
laurent.prosperi@ens-cachan.fr
Ulrich Schmid
Embedded Computing Systems
TU Wien
Vienna, Austria
s@ecs.tuwien.ac.at
Krisztina Fruzsa†
Embedded Computing Systems
TU Wien
Vienna, Austria
krisztina.fruzsa@tuwien.ac.at
Causality is an important concept both for proving impossibility results and for synthesizing efficient protocols in distributed computing. For asynchronous agents communicating over unreliable channels, causality is well studied and understood. This understanding, however, relies heavily on the assumption that agents themselves are correct and reliable. We provide the first epistemic analysis of causality in the presence of byzantine agents, i.e., agents that can deviate from their protocol and, thus, cannot be relied upon. Using our new framework for epistemic reasoning in fault-tolerant multi-agent systems, we determine the byzantine analog of the causal cone and describe a communication structure, which we call a multipede, necessary for verifying preconditions for actions in this setting.
1 Introduction
Reasoning about knowledge has been a valuable tool for analyzing distributed systems for decades [5, 9], and has provided a number of fundamental insights. As crisply formulated by Moses [17] in the form of the Knowledge of Preconditions Principle, a precondition for action must be known in order to be actionable. In a distributed environment, where agents only communicate by exchanging messages, an agent can only learn about events happening to other agents via messages (or sometimes the lack thereof [8]).

In asynchronous systems, where the absence of communication is indistinguishable from delayed communication, agents can only rely on messages they receive. Lamport's seminal definition of the happened-before relation [14] establishes the causal structure for asynchronous agents in the agent–time graph describing a run of a system. This structure is often referred to as a causal cone, whereby causal links are either time transitions from past to future for one agent or messages from one agent to another. As demonstrated by Chandy and Misra [2], the behavior of an asynchronous agent can only be affected by events from within its causal cone.
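In the fault-free asynchronous setting, the past causal cone of an agent–time node can be computed by a simple backward closure over local steps and delivered messages. The following is a minimal Python sketch using a toy run representation of our own (messages as tuples, not the paper's formalism):

```python
def causal_cone(node, messages):
    """Past causal cone of agent-time node (i, t): all nodes (j, t') from which
    a happened-before chain of local steps and messages leads to (i, t).
    messages: iterable of (sender, send_time, receiver, recv_time) tuples,
    with send_time < recv_time assumed."""
    cone, stack = set(), [node]
    while stack:
        i, t = stack.pop()
        if (i, t) in cone:
            continue
        cone.add((i, t))
        if t > 0:                        # local step: (i, t-1) precedes (i, t)
            stack.append((i, t - 1))
        for (j, t_send, k, t_recv) in messages:
            if k == i and t_recv <= t:   # message delivered to i by time t
                stack.append((j, t_send))
    return cone
```

By Chandy and Misra's observation, only the nodes returned by this closure can influence the behavior of the agent at `node`.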
The standard way of showing that an agent does not know of an event is to modify a given run by removing the event in question in such a way that the agent cannot detect the change. By Hintikka's definition of knowledge [11], the agent thinks it possible that the event has not occurred and, hence, does not know the event to have occurred. Chandy and Misra's result shows that in order for agent i to learn of an event happening to another agent j, there must exist a chain of successfully delivered messages leading from the moment of agent j observing the event to some past or present state of agent i. This observation remains valid in asynchronous distributed systems where messages could be lost and/or where agents may stop operating (i.e., crash) [4, 10, 19].

*Supported by the Austrian Science Fund (FWF) projects RiSE/SHiNE (S11405) and ADynNet (P28182).
†PhD student in the FWF doctoral program LogiCS (W1255).
In synchronous systems, if message delays are upper-bounded, agents can also learn from the absence of communication (communication-by-time). As shown in [1], Lamport's happened-before relation must then be augmented by causal links indicating no communication within the message delay upper bound to also capture causality induced via communication-by-time, leading to the so-called syncausality relation. Its utility has been demonstrated using the ordered response problem, where agents must perform a sequence of actions in a given order: both the necessary and sufficient knowledge and a necessary and sufficient communication structure (called a centipede) have been determined in [1]. It is important to note, however, that syncausality works only in fault-free distributed systems with reliable communication. Although it has recently been shown in [8] that silent choirs are a way to extend it to distributed systems where agents may crash, the idea does not generalize to less benign faults.
Unfortunately, all the above ways of capturing causality and the resulting simplicity of determining the causal cone completely break down if agents may be byzantine faulty [15]. Byzantine faulty agents may behave arbitrarily; in particular, they need not adhere to their protocol and may, hence, send arbitrary messages. It is common to limit the maximum number of agents that ever act byzantine in a distributed system by some number f, which is typically much smaller than the total number n of agents. Prompted by the ever growing number of faulty hardware and software having real-world negative, sometimes life-critical, consequences, capturing causality and providing ways for determining the causal cone in byzantine fault-tolerant distributed systems is both an important and scientifically challenging task. To the best of our knowledge, this challenge has not been addressed in the literature before.1
In a nutshell, for f > 0, the problem of capturing causality becomes complicated by the fact that a simple causal chain of messages is no longer sufficient: a single byzantine agent in the chain could manufacture "evidence" for anything, both false negatives and false positives. And indeed, obvious generalizations of message chains do not work. For example, it is a folklore result that, in the case of direct communication, at least f + 1 confirmations are necessary because f of them could be false. When information is transmitted along arbitrary, possibly branching and intersecting chains of messages, the situation is even more complex and defies simplistic direct analysis. In particular, as shown by the counterexample in [16, Fig. 1], one cannot rely on Menger's Theorem [3] for separating nodes in the two-dimensional agent–time graph.
Major contributions: In this paper, we generalize the causality structure of asynchronous distributed systems described above to multi-agent systems involving byzantine faulty agents. Relying on our novel byzantine runs-and-systems framework [12] (described in full detail in [13]), we utilize some generic epistemic analysis results for determining the shape of the byzantine analog of Lamport's causal cone. Since knowledge of an event is too strong a precondition in the presence of byzantine agents, it has to be relaxed to something more akin to belief relative to correctness [18], for which we coined the term hope. We show that hope can only be achieved via a causal message chain that passes solely through correct agents (more precisely, through agents still correct while sending the respective messages). While the result looks natural enough, its formal proof is quite involved technically and paints an instructive picture of how byzantine agents can affect the information flow. We also establish a necessary condition for detecting an event, and a corresponding communication structure (called a multipede), which is severely complicated by the fact that the reliable causal cones of indistinguishable runs may be different.
Paper organization: In Sect. 2, we succinctly introduce the features of our byzantine runs-and-systems framework [13] and state some generic theorems and lemmas needed for proving the results of the paper. In Sect. 3, we describe the mechanism of run modifications, which are used to remove events an agent should not know about from a run, without the agent noticing. Our characterization of the byzantine causal cone is provided in Sect. 4; the necessary conditions for establishing hope for an occurrence of an event and the underlying multipede structure can be found in Sect. 5. Some conclusions in Sect. 6 round off the paper.

1Despite having "Byzantine" in the title, [4, 10] only address benign faults (crashes, send/receive omissions of messages).
2 Runs-and-Systems Framework for Byzantine Agents
First, we describe the modifications of the runs-and-systems framework [5] necessary to account for byzantine behavior. To prevent wasting space on multiple definition environments, we give the following series of formal definitions as ordinary text, marking defined objects by italics; consult [13] for the same definitions in fully spelled-out format. As a further space-saving measure, instead of repeating every time "actions and/or events," we use haps2 as a general term referring to either actions or events.

The goal of all these definitions is to formally describe a system where asynchronous agents 1, ..., n perform actions according to their protocols, observe events, and exchange messages within an environment represented as a special agent ε. Unlike the environment, agents only have limited local information; in particular, being asynchronous, they do not have access to the global clock. No assumptions apart from liveness are made about the communication. Messages can be lost, arbitrarily delayed, and/or delivered in the wrong order. This part of the system is a fairly standard asynchronous system with unreliable communication. The novelty is that the environment may additionally cause at most f agents to become faulty in arbitrary ways. A faulty agent can perform any of its actions irrespective of its protocol and observe events that did not happen, e.g., receive unsent or corrupted messages. It can also have false memories about actions it has performed. At the same time, much like the global clock, such malfunctions are not directly visible to an agent, especially when it mistakenly thinks it acted correctly.
We fix a finite set A = {1, ..., n} of agents. Agent i ∈ A can perform actions a ∈ Actions_i, e.g., send messages, and witness events e ∈ Events_i such as message delivery. We denote Haps_i := Actions_i ⊔ Events_i. The action of sending a copy numbered k of a message µ ∈ Msgs to an agent j ∈ A is denoted send(j, µ_k), whereas a receipt of such a message from i ∈ A is recorded locally as recv(i, µ).3

Agent i records actions from Actions_i and observes events from Events_i without dividing them into correct and faulty. The environment ε, on the contrary, always knows whether the agent acted correctly or was forced into byzantine behavior. Hence, the syntactic representations of each hap for agents (local view) and for the environment (global view) must differ, with the latter containing more information. In particular, the global view syntactically distinguishes correct haps from their byzantine counterparts. While there is no way for an agent to distinguish a real event from its byzantine duplicate, it can analyze its recorded actions and compare them with its protocol. Sometimes, this information might be sufficient for the agent to detect its own malfunctions.
All of Actions := ∪_{i∈A} Actions_i, Events := ∪_{i∈A} Events_i, and Haps := Actions ⊔ Events represent the local view of haps. All haps taking place after a timestamp t ∈ T := ℕ and no later than t+1 are grouped into a round denoted t+½ and are treated as happening simultaneously. To model asynchronous agents, we exclude these system timestamps from the local format of Haps. At the same time, the environment ε incorporates the current timestamp t into the global format of every correct action a ∈ Actions_i, as initiated by agent i in the local format, via a one-to-one function global(i, t, a). Timestamps are especially crucial for proper message processing, with global(i, t, send(j, µ_k)) := gsend(i, j, µ, id(i, j, µ, k, t)) for some one-to-one function id : A × A × Msgs × ℕ × T → ℕ that assigns each sent message a unique global message identifier (GMI). We chose not to model agent-to-agent channels explicitly. With all messages effectively sent through one system-wide channel, these GMIs are needed to ensure the causality of message delivery, i.e., that only sent messages can be delivered correctly. The sets GActions_i := {global(i, t, a) | t ∈ T, a ∈ Actions_i} of all possible correct actions for each agent in global format are pairwise disjoint due to the injectivity of global. We set GActions := ⊔_{i∈A} GActions_i.

2Cf. "Till I know 'tis done, Howe'er my haps, my joys were ne'er begun." W. Shakespeare, Hamlet, Act IV, Scene 3.
3Thus, it is possible to send several copies of the same message in the same round. If one or more of such copies are received in the same round, however, the recipient does not know which copy it has received, nor that there have been multiple copies.
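The paper only requires global and id to be one-to-one; any injective encoding will do. A minimal sketch using Cantor pairing (our own choice of encoding, with agents and messages assumed to be numbered):

```python
def pair(x, y):
    """Cantor pairing: a bijection from N x N to N."""
    return (x + y) * (x + y + 1) // 2 + y

def gmi(i, j, mu, k, t):
    """An injective global message identifier id(i, j, mu, k, t) in N,
    obtained by iterating the pairing function."""
    return pair(pair(pair(pair(i, j), mu), k), t)

def global_format(i, t, action):
    """global(i, t, a): attach the timestamp; sends additionally get a GMI.
    Local sends are assumed to be tuples ("send", j, mu, k)."""
    if action[0] == "send":
        _, j, mu, k = action
        return ("gsend", i, j, mu, gmi(i, j, mu, k, t))
    return ("global", i, t, action)
```

Since pair is a bijection, distinct copies k of the same message, or the same message sent in different rounds, always receive distinct GMIs, which is exactly what the causality of message delivery relies on.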
Unlike correct actions, correct events witnessed by agents are generated by the environment ε and, hence, can be assumed to be produced already in the global format \overline{GEvents}_i. We define \overline{GEvents} := ⊔_{i∈A} \overline{GEvents}_i, assuming them to be pairwise disjoint, and \overline{GHaps} = \overline{GEvents} ⊔ GActions. We do not consider the possibility of the environment violating its protocol, which is meant to model the fundamental physical laws of the system. Thus, all events that can happen are considered correct. A byzantine event is, thus, a subjective notion: it is an event that was perceived by an agent despite not taking place. In other words, each correct event E ∈ \overline{GEvents}_i has a faulty counterpart fake(i, E), and agent i cannot distinguish the two. An important type of correct global events of agent j is the delivery grecv(j, i, µ, id) ∈ \overline{GEvents}_j of message µ with GMI id ∈ ℕ sent by agent i. Note that the GMI, which is used by the global format to ensure causality, must be removed before the delivery is recorded by the agent in the local format because GMIs contain the time of sending, which should not be accessible to agents. To strip this information before updating local histories, we employ a function local : \overline{GHaps} → Haps converting correct haps from the global into the local format in such a way that for actions local reverses global, i.e., local(global(i, t, a)) := a. For message deliveries, local(grecv(j, i, µ, id)) := recv(i, µ), i.e., agent j only knows that it received message µ from agent i. It is, thus, possible for two distinct correct global events, e.g., grecv(j, i, µ, id) and grecv(j, i, µ, id′), representing the delivery of different copies of the same message µ, possibly sent by i at different times, to be recorded by j the same way, as recv(i, µ).
Therefore, correct actions are initiated by agents in the local format and translated into the global format by the environment. Correct and byzantine events are initiated by the environment in the global format and translated into the local format before being recorded by agents.4 We will now turn our attention to byzantine actions.
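The global-to-local conversion can be sketched directly, again over our own tuple representation; only the two cases spelled out in the text (actions wrapped by global, and message deliveries grecv) are handled:

```python
def local(ghap):
    """Strip global-only information (timestamps, GMIs) from a correct hap."""
    kind = ghap[0]
    if kind == "global":                 # local(global(i, t, a)) := a
        return ghap[3]
    if kind == "grecv":                  # local(grecv(j, i, mu, gmi)) := recv(i, mu)
        _, j, i, mu, _gmi = ghap
        return ("recv", i, mu)
    raise ValueError("not a correct global hap: %r" % (ghap,))
```

The sketch makes the collapsing effect explicit: two deliveries of different copies of the same message, carrying different GMIs, are recorded identically in the local history, so the receiving agent cannot tell them apart.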
While a faulty event is purely an error of perception, actions can be faulty in another way: they can violate the protocol. The crucial question is: who should be responsible for such violations? With agents' actions governed by their protocols while everything else is up to the environment, it seems that errors, especially unintended errors, should be the environment's responsibility. A malfunctioning agent tries to follow its protocol but fails for reasons outside of its control, i.e., due to environment interference. A malicious agent tries to hide its true intentions from other agents by pretending to follow its expected protocol and, thus, can also be modeled via environment interference. Thus, we model faulty actions as byzantine events of the form fake(i, A ↦ A′) where A, A′ ∈ GActions_i ⊔ {noop} for a special non-action noop in global format. Here A is the action (or, in case of noop, inaction) actually performed, while A′ represents the action (inaction) perceived instead by the agent. More precisely, the agent either records a′ = local(A′) if A′ ∈ GActions_i or has no record of this byzantine action if A′ = noop. The byzantine inaction fail(i) := fake(i, noop ↦ noop) is used to make agent i faulty without performing any actions and without leaving a record in i's local history. The set of all i's byzantine events, corresponding to both faulty events and faulty actions, is denoted BEvents_i, with BEvents := ⊔_{i∈A} BEvents_i.
To prevent our asynchronous agents from inferring the global clock by counting rounds, we make waking up for a round contingent on the environment issuing a special system event go(i) for the agent in question. Agent i's local view of the system immediately after round t+½, referred to as (process-time or agent-time) node (i, t+1), is recorded in i's local state r_i(t+1), also called i's local history. Nodes (i, 0) correspond to initial local states r_i(0) ∈ Σ_i, with G(0) := ∏_{i∈A} Σ_i. If a round contains neither go(i) nor any event to be recorded in i's local history, then the said history r_i(t+1) = r_i(t) remains unchanged, denying the agent the knowledge of the round just passed. Otherwise, r_i(t+1) = X : r_i(t) for X ⊆ Haps_i, the set of all actions and events perceived by i in round t+½, where : stands for concatenation. The exact definition will be given via the update_i function, to be described shortly. Thus, the local history r_i(t) is a list of all haps as perceived by i in rounds it was active in. The set of all local states of i is L_i.

4This has already been described for correct events. A byzantine event is recorded the same way as its correct counterpart.

While not necessary for asynchronous agents, for future backwards compatibility, we add more system events for each agent, to serve as faulty counterparts to go(i). Commands sleep(i) and hibernate(i) signify a failure to activate the agent's protocol and differ in that the former enforces waking up the agent (and thus recording time) notwithstanding. These commands will be used, e.g., for synchronous systems. None of the system events SysEvents_i := {go(i), sleep(i), hibernate(i)} is directly detectable by agents.

To summarize, GEvents_i := \overline{GEvents}_i ⊔ BEvents_i ⊔ SysEvents_i with GEvents := ⊔_{i∈A} GEvents_i and GHaps := GEvents ⊔ GActions. Throughout the paper, horizontal bars signify phenomena that are correct. Note that the absence of this bar means the absence of a claim of correctness; it does not necessarily imply a fault. Later, this will also apply to formulas: e.g., \overline{occurred}_i(e) demands a correct occurrence of an event e, whereas occurred_i(e) is satisfied by either a correct or a faulty occurrence.
We now turn to the description of runs and protocols for our byzantine-prone asynchronous agents. A run r is a sequence of global states r(t) = (r_ε(t), r_1(t), ..., r_n(t)) of the whole system, consisting of the state r_ε(t) of the environment and local states r_i(t) of every agent. We already discussed the composition of local histories. Similarly, the environment's history r_ε(t) is a list of all haps that happened, this time faithfully recorded in the global format. Accordingly, r_ε(t+1) = X : r_ε(t) for the set X ⊆ GHaps of all haps from round t+½. The set of all global states is denoted G.
What happens in each round is determined by protocols P_i of agents, protocol P_ε of the environment, and chance, the latter implemented as the adversary part of the environment. Agent i's protocol P_i : L_i → ℘(℘(Actions_i)) \ {∅} provides a range P_i(r_i(t)) of sets of actions based on i's current local state r_i(t), with the view of achieving some collective goal. Recall that the global timestamp t is not part of r_i(t). The control of all events—correct, byzantine, and system—lies with the environment ε via its protocol P_ε : T → ℘(℘(GEvents)) \ {∅}, which can depend on a timestamp t ∈ T but not on the current state. The environment's protocol is thus kept impartial by denying it an agenda based on the global history so far. Other parts of the environment must, however, have access to the global history, in particular, to ensure causality. Thus, the environment's protocol provides a range P_ε(t) of sets of events. Protocols P_i and P_ε are non-deterministic and always provide at least one option. The choice among the options (if more than one) is arbitrarily made by the already mentioned adversary part of the environment. It is also required that all events from P_ε(t) be mutually compatible at time t. These t-coherency conditions are: (a) no more than one of the system events go(i), sleep(i), and hibernate(i) per agent i at a time; (b) a correct event perceived as e by agent i is never accompanied by a byzantine event that i would also perceive as e, i.e., an agent cannot be mistaken about witnessing an event that did happen; (c) the GMI of a byzantine sent message is the same as if a copy of the same message were sent correctly in the same round. Note that the prohibition (b) does not extend to correct actions.
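Conditions (a) and (b) can be checked mechanically for a candidate event set. A small sketch under an assumed representation of events as triples (kind, agent, payload); condition (c) concerns GMI assignment and is omitted here:

```python
def t_coherent(X):
    """Check t-coherency conditions (a) and (b) for a set X of global events,
    each a triple (kind, agent, payload). System events carry payload None;
    byzantine events ("fake", i, e) are perceived by i the same way as the
    correct ("event", i, e)."""
    sys_kinds = ("go", "sleep", "hibernate")
    sys_agents = [i for (kind, i, _) in X if kind in sys_kinds]
    if len(sys_agents) != len(set(sys_agents)):   # (a): one system event per agent
        return False
    correct = {(i, e) for (kind, i, e) in X if kind == "event"}
    faked = {(i, e) for (kind, i, e) in X if kind == "fake"}
    return not (correct & faked)                   # (b): no byzantine duplicate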
Both the global run r : T → G and its local parts r_i : T → L_i provide a sequence of snapshots of the system and of local states respectively. Given the joint protocol P := (P_1, ..., P_n) and the environment's protocol P_ε, we focus on τ_{f,P_ε,P}-transitional runs r that result from following these protocols and are built according to a transition relation τ_{f,P_ε,P} ⊆ G × G for asynchronous agents, at most f ≥ 0 of which may become faulty in a given run. In this paper, we only deal with generic f, P_ε, and P. Hence, whenever safe, we write τ in place of τ_{f,P_ε,P}. Each transitional run begins in some initial global state r(0) ∈ G(0) and progresses by ensuring that r(t) τ r(t+1), i.e., (r(t), r(t+1)) ∈ τ, for each timestamp t ∈ T. Given f, P_ε, and P, the transition relation τ_{f,P_ε,P}, consisting of five consecutive phases, is graphically represented in Figure 1 and described in detail below:

[Figure 1: Details of round t+½ of a τ_{f,P_ε,P}-transitional run r. The diagram shows, for each agent i and for the environment ε, how the protocol ranges P_i(r_i(t)) and P_ε(t) are narrowed by the adversary to sets X_i and X_ε, labeled into global format as α^t_i(r) and α^t_ε(r), filtered into β^t_i(r) and β^t_ε(r), and consumed by update_i and update_ε to produce r(t+1). Phases from left to right: Protocol phase, Adversary phase, Labeling phase, Filtering phase, Updating phase.]
1. Protocol phase. A range P_ε(t) ⊆ ℘(GEvents) of t-coherent sets of events is determined by the environment's protocol P_ε; for each i ∈ A, a range P_i(r_i(t)) ⊆ ℘(Actions_i) of sets of i's actions is determined by the agents' joint protocol P.

2. Adversary phase. The adversary non-deterministically picks a t-coherent set X_ε ∈ P_ε(t) and a set X_i ∈ P_i(r_i(t)) for each i ∈ A.

3. Labeling phase. Locally represented actions in the X_i's are translated into the global format: α^t_i(r) := {global(i, t, a) | a ∈ X_i} ⊆ GActions_i. In particular, correct sends are supplied with GMIs.
4. Filtering phase. Functions filter_ε and filter_i for each i ∈ A remove all causally impossible attempted events from α^t_ε(r) := X_ε and actions from α^t_i(r).

4.1. First, filter_ε filters out causally impossible events based (a) on the current global state r(t), which could not have been accounted for by the protocol P_ε, (b) on α^t_ε(r), and (c) on all α^t_i(r), not accessible for P_ε either. Specifically, two kinds of events are causally impossible for asynchronous agents with at most f byzantine failures and are removed by filter_ε in two stages as follows (formal definitions can be found in the appendix, Definitions A.16–A.17; cf. also [13] for details):

(1) in the 1st stage, all byzantine events are removed by filter^{≤f}_ε if they would have resulted in more than f faulty agents in total;

(2) in the 2nd stage, correct receives without matching sends (either in the history r(t) or in the current round) are removed by filter^B_ε.

The resulting set of events to actually occur in round t+½ is denoted β^t_ε(r) := filter_ε(r(t), α^t_ε(r), α^t_1(r), ..., α^t_n(r)).

4.2. After events are filtered, filter_i for each agent i removes all of i's actions iff go(i) ∉ β^t_ε(r). The resulting sets of actions to be actually performed by agents in round t+½ are β^t_i(r) := filter_i(α^t_1(r), ..., α^t_n(r), β^t_ε(r)).

We have β^t_i(r) ⊆ α^t_i(r) ⊆ GActions_i and β^t_ε(r) ⊆ α^t_ε(r) ⊆ GEvents.
5. Updating phase. The resulting mutually causally consistent sets of events β^t_ε(r) and of actions β^t_i(r) are appended to the global history r(t); for each i ∈ A, all non-system events from β^t_{εi}(r) := β^t_ε(r) ∩ GEvents_i as perceived by the agent and all correct actions β^t_i(r) are appended in the local form to the local history r_i(t), which may remain unchanged if no action or event triggers an update, or be appended with the empty set if an update is triggered only by a system event go(i) or sleep(i):

r_ε(t+1) := update_ε(r_ε(t), β^t_ε(r), β^t_1(r), ..., β^t_n(r)); (1)

r_i(t+1) := update_i(r_i(t), β^t_i(r), β^t_ε(r)). (2)

Formal definitions of update_ε and update_i are given in Def. A.19 in the appendix.
The protocols P and P_ε only affect phase 1, so we group the operations in the remaining phases 2–5 into a transition template τ_f that computes a transition relation τ_{f,P_ε,P} for any given P and P_ε. This transition template, primarily via the filtering functions, represents asynchronous agents with at most f faults. The template can be modified independently from the protocols to capture other distributed scenarios.

Liveness and similar properties that cannot be ensured on a round-by-round basis are enforced by restricting the allowable runs by admissibility conditions Ψ, which formally are subsets of the set R of all transitional runs. For example, since no goal can be achieved without allowing agents to act from time to time, it is standard to impose the Fair Schedule (FS) admissibility condition, which for byzantine agents states that an agent can only be delayed indefinitely through persistent faults:

FS := {r ∈ R | (∀i ∈ A)(∀t ∈ T)(∃t′ ≥ t) β^{t′}_ε(r) ∩ SysEvents_i ≠ ∅}.

In scheduling terms, FS ensures that each agent is considered for using CPU time infinitely often. Denying any of these requests constitutes a failure, represented by a sleep(i) or hibernate(i) system event.
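FS quantifies over all of infinite time, so it cannot be decided from a finite prefix alone. For ultimately periodic runs, however, it reduces to a check on the repeated cycle, which the following sketch performs (the periodic-run representation is our own assumption, not part of the framework):

```python
def fair_schedule(cycle_sys_events, agents):
    """FS for an ultimately periodic run, given as the repeated cycle only:
    cycle_sys_events is a list with one entry per round of the cycle, each the
    set of agents i for which beta_eps of that round contains some event from
    SysEvents_i. FS holds iff every agent is scheduled somewhere in the cycle,
    and hence infinitely often in the infinite run."""
    return all(any(i in round_agents for round_agents in cycle_sys_events)
               for i in agents)
```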
We now combine all these parts in the notions of context and agent-context:

Definition 1. A context γ = (P_ε, G(0), τ_f, Ψ) consists of an environment's protocol P_ε, a set of global initial states G(0), a transition template τ_f for f ≥ 0, and an admissibility condition Ψ. For a joint protocol P, we call χ = (γ, P) an agent-context. A run r : T → G is called weakly χ-consistent if r(0) ∈ G(0) and the run is τ_{f,P_ε,P}-transitional. A weakly χ-consistent run r is called (strongly) χ-consistent if r ∈ Ψ. The set of all χ-consistent runs is denoted R_χ ⊆ R. Agent-context χ is called non-excluding if any finite prefix of a weakly χ-consistent run can be extended to a χ-consistent run.
We are also interested in narrower types of faults. Let FEvents_i := BEvents_i ⊔ {sleep(i), hibernate(i)}.

Definition 2. Environment's protocol P_ε makes an agent i ∈ A:

1. correctable if X ∈ P_ε(t) implies that X \ FEvents_i ∈ P_ε(t);

2. delayable if X ∈ P_ε(t) implies X \ GEvents_i ∈ P_ε(t);

3. error-prone if X ∈ P_ε(t) implies that, for any Y ⊆ FEvents_i, the set Y ⊔ (X \ FEvents_i) ∈ P_ε(t) whenever it is t-coherent;

4. gullible if X ∈ P_ε(t) implies that, for any Y ⊆ FEvents_i, the set Y ⊔ (X \ GEvents_i) ∈ P_ε(t) whenever it is t-coherent;

5. fully byzantine if agent i is both error-prone and gullible.

In other words, correctable agents can always be made correct for the round by removing all their byzantine events; delayable agents can always be forced to skip a round completely (which does not make them faulty); error-prone (gullible) agents can exhibit any faults in addition to (in place of) correct events, thus implying correctability (delayability); fully byzantine agents' faults are unrestricted. Common types of faults, e.g., crash or omission failures, can be obtained by restricting allowable sets Y in the definition of gullible agents.
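For a finite protocol range, the closure conditions of items 1 and 2 are directly checkable. A sketch with P_ε(t) modeled as a set of frozensets of (arbitrarily encoded) events; the event encodings below are toy placeholders:

```python
def is_correctable(P_eps_t, fevents_i):
    """Item 1: X in P_eps(t) implies X \\ FEvents_i in P_eps(t)."""
    return all(frozenset(X - fevents_i) in P_eps_t for X in P_eps_t)

def is_delayable(P_eps_t, gevents_i):
    """Item 2: X in P_eps(t) implies X \\ GEvents_i in P_eps(t)."""
    return all(frozenset(X - gevents_i) in P_eps_t for X in P_eps_t)
```

Error-proneness and gullibility additionally close the range under inserting arbitrary t-coherent sets Y ⊆ FEvents_i, which is why they imply correctability and delayability respectively (take Y = ∅).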
Now that our byzantine version of the runs-and-systems framework is laid out, we define interpreted systems in this framework in the usual way, i.e., as special kinds of Kripke models for multi-agent distributed environments [5]. For an agent-context χ, we consider pairs (r, t′) ∈ R_χ × T of a χ-consistent run r and timestamp t′. A valuation function π : Prop → ℘(R_χ × T) determines whether an atomic proposition from Prop is true in run r at time t′. The determination is arbitrary except for a small set of designated atomic propositions whose truth value at (r, t′) is fully determined. More specifically, for i ∈ A, o ∈ Haps_i, and t ∈ T such that t ≤ t′:

correct(i, t) is true at (r, t′), or node (i, t) is correct in run r, iff no faulty event happened to i by timestamp t, i.e., no event from FEvents_i appears in the r_ε(t) prefix of the r_ε(t′) part of r(t′);

correct_i is true at (r, t′) iff correct(i, t′) is;

fake_{(i,t)}(o) is true at (r, t′) iff i has a faulty reason to believe that o ∈ Haps_i occurred in round t−½, i.e., o ∈ r_i(t) because (at least in part) of some O ∈ BEvents_i ∩ β^{t−1}_ε(r);

\overline{occurred}_{(i,t)}(o) is true at (r, t′) iff i has a correct reason to believe o ∈ Haps_i occurred in round t−½, i.e., o ∈ r_i(t) because (at least in part) of O ∈ (\overline{GEvents}_i ∩ β^{t−1}_ε(r)) ⊔ β^{t−1}_i(r);

\overline{occurred}_i(o) is true at (r, t′) iff at least one of \overline{occurred}_{(i,m)}(o) for 1 ≤ m ≤ t′ is;

\overline{occurred}(o) is true at (r, t′) iff at least one of \overline{occurred}_i(o) for i ∈ A is;

occurred_i(o) is true at (r, t′) iff either \overline{occurred}_i(o) is or at least one of fake_{(i,m)}(o) for 1 ≤ m ≤ t′ is.
An interpreted system is a pair I = (R_χ, π). The epistemic language φ ::= p | ¬φ | (φ ∧ φ) | K_i φ, where p ∈ Prop and i ∈ A, and derived Boolean connectives are defined in the usual way. Truth for these (epistemic) formulas is defined in the standard way; in particular, for a run r ∈ R_χ, timestamp t ∈ T, atomic proposition p ∈ Prop, agent i ∈ A, and formula φ, we have (I, r, t) ⊨ p iff (r, t) ∈ π(p) and (I, r, t) ⊨ K_i φ iff (I, r′, t′) ⊨ φ for any r′ ∈ R_χ and t′ ∈ T such that r_i(t) = r′_i(t′). A formula φ is valid in I, written I ⊨ φ, iff (I, r, t) ⊨ φ for all r ∈ R_χ and t ∈ T.

Due to the t-coherency of all allowed protocols P_ε, an agent cannot be both right and wrong about any local event e ∈ Events_i, i.e., I ⊨ ¬(\overline{occurred}_{(i,t)}(e) ∧ fake_{(i,t)}(e)). Note that for actions this can happen.
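The K_i clause quantifies over the (generally infinite) set R_χ; over a finite set of run–time points it can be evaluated directly, which the following sketch illustrates (points, local states, and φ as arbitrary Python values and predicates are our own toy model):

```python
def knows(i, point, points, local_state, phi):
    """(I, r, t) |= K_i phi iff phi holds at every point whose local state
    r'_i(t') for agent i equals r_i(t) at the given point. `points` is the
    finite stand-in for R_chi x T; `phi` is a predicate on points."""
    here = local_state(i, point)
    return all(phi(q) for q in points if local_state(i, q) == here)
```

This makes Hintikka-style non-knowledge arguments concrete: to refute K_i φ it suffices to exhibit a single indistinguishable point where φ fails, which is exactly what the run modifications of Sect. 3 construct.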
Following the concept from [6] of global events that are local for an agent, we define conditions under which formulas can be treated as such local events. A formula φ is called localized for i within an agent-context χ iff r_i(t) = r′_i(t′) implies (I, r, t) ⊨ φ ⇐⇒ (I, r′, t′) ⊨ φ for any I = (R_χ, π), runs r, r′ ∈ R_χ, and timestamps t, t′ ∈ T. By these definitions, we immediately obtain:

Lemma 3. The following statements are valid for any formula φ localized for an agent i ∈ A within an agent-context χ and any interpreted system I = (R_χ, π): I ⊨ φ ↔ K_i φ and I ⊨ ¬φ ↔ K_i ¬φ.

The Knowledge of Preconditions Principle [17] postulates that in order to act on a precondition an agent must be able to infer it from its local state. Thus, Lemma 3 shows that formulas localized for i can always be used as preconditions. Our first observation is that the agent's perceptions of a run are one example of such epistemically acceptable (though not necessarily reliable) preconditions:

Lemma 4. For any agent-context χ, agent i ∈ A, and local hap o ∈ Haps_i, the formula occurred_i(o) is localized for i within χ.

It can be shown that correctness of these perceptions is not localized for i and, hence, cannot be the basis for actions. In fact, Theorem 12 will reveal that no agent can establish its own correctness.
3 Run modifications

We will now introduce the pivotal technique of run modifications, which is used to show that an agent does not know φ by creating an indistinguishable run where φ is false.
Definition 5. A function ρ : R_χ → ℘(GActions_i) × ℘(GEvents_i) is called an i-intervention for an agent-context χ and agent i ∈ A. A joint intervention B = (ρ_1, ..., ρ_n) consists of i-interventions ρ_i for each agent i ∈ A. An adjustment [B_t; ...; B_0] is a sequence of joint interventions B_0, ..., B_t to be performed at rounds from ½ to t+½.
We consider an i-intervention ρ(r) = (X, X_ε) applied to a round t+½ of a given run r to be a meta-action by the system designer, intended to modify the results of this round for i in such a way that β^t_i(r′) = X and β^t_{ε,i}(r′) = β^t_ε(r′) ∩ GEvents_i = X_ε in the artificially constructed new run r′. For ρ(r) = (X, X_ε), we denote aρ(r) := X and eρ(r) := X_ε. Accordingly, a joint intervention (ρ_1, ..., ρ_n) prescribes actions β^t_i(r′) = aρ_i(r) for each agent i and events β^t_ε(r′) = ⊔_{i∈A} eρ_i(r) for the round in question. Thus, an adjustment [B_t; ...; B_0] fully determines the actions and events in the initial t+1 rounds of run r′:
Definition 6. Let adj = [B_t; ...; B_0] be an adjustment where B_m = (ρ^m_1, ..., ρ^m_n) for each 0 ≤ m ≤ t and each ρ^m_i is an i-intervention for an agent-context χ = ((P_ε, G(0), τ_f, Ψ), P). A run r′ is obtained from r ∈ R_χ by adjustment adj iff for all t′ ≤ t, all T′ > t, and all i ∈ A,

(a) r′(0) := r(0),
(b) r′_i(t′+1) := update_i(r′_i(t′), aρ^{t′}_i(r), ⊔_{j∈A} eρ^{t′}_j(r)),
(c) r′_ε(t′+1) := update_ε(r′_ε(t′), ⊔_{j∈A} eρ^{t′}_j(r), aρ^{t′}_1(r), ..., aρ^{t′}_n(r)),
(d) r′(T′) →_{τ_f, P_ε, P} r′(T′+1).

R(τ_f, P_ε, P, r, adj) is the set of all runs obtained from r by adj.
Note that adjusted runs need not a priori be transitional, i.e., obey (d), for t′ ≤ t. Of course, we intend to use adjustments in such a way that r′ is a transitional run, but this requires a separate proof. In order to improve the readability of these proofs, we allow ourselves (and have already used) a small abuse of notation. The β sets β^t_ε(r′) and β^t_i(r′) were initially defined only for transitional runs, as the result of filtering. But they also represent the sets of events and of i's actions, respectively, happening in round t+½. This alternative definition is equivalent for transitional runs and, in addition, can be used for adjusted runs r′. This is what we mean whenever we write β sets for runs obtained by adjustments.
In order to minimize an agent's knowledge in accordance with the structure of its (soon to be defined) reliable (or byzantine) causal cone, we will use several types of i-interventions that copy round t+½ of the original run to various degrees: (a) CFreeze denies i all actions and events, (b) FakeEcho^t_i reproduces all messages sent by i but in byzantine form, (c) XFocus^t_i (for an appropriately chosen set X ⊆ A×T) faithfully reproduces all actions and as many events as causally possible.
Definition 7. For an agent-context χ, i ∈ A, and r ∈ R_χ, we define the following i-interventions:

CFreeze(r) := (∅, ∅). (3)

FakeEcho^t_i(r) := (∅, {fail(i)} ⊔ {fake(i, gsend(i,j,µ,id) ↦ noop) | gsend(i,j,µ,id) ∈ β^t_i(r) ∨ (∃A) fake(i, gsend(i,j,µ,id) ↦ A) ∈ β^t_ε(r)}). (4)

XFocus^t_i(r) := (β^t_i(r), β^t_{ε,i}(r) \ {grecv(i,j,µ, id(j,i,µ,k,m)) | (j,m) ∉ X, k ∈ ℕ}). (5)
4 The Reliable Causal Cone
Before giving formal definitions and proofs, we first explain the intuition behind our byzantine analog of Lamport's causal cone and the particular adjustments used for constructing runs with identical reliable causal cones in the main Lemma 10 of this section.
In the absence of faults [2], the only way information, say, about an event e that happens to an agent j can reach another agent i, or more precisely its node (i,t), is via a causal chain (time progression and delivered messages) originating from j after e happened and reaching i no later than timestamp t. The set of beginnings of causal chains, together with all causal links, is called the causal cone of (i,t). The standard way of demonstrating the necessity of such a chain for i to learn about e, when expressed in our terminology, is by using an adjustment that removes all events and actions outside the causal cone. Once an adjusted run with no haps outside the causal cone is shown to be transitional, and the local state of i at timestamp t is shown to be the same as in the given run, it follows that i considers it possible that e did not happen and, hence, does not know that e happened. This well-known proof is carried out in our framework in [12] (see also [13] for an extended version).
However, one subtle aspect of our formalization is also relevant for the byzantine case. We illustrate it using a minimal example. Suppose, in the given run, j_s sent exactly one message to j_r during round m+½, and it was correctly received by j_r in round l+½. In the same round, j_r itself sent its last-ever message, and sent it to i. If this message to i arrived before t, then (j_r, l) is a node within the causal cone of (i,t). On the other hand, neither (j_r, l+1) nor (j_s, m) is within the causal cone. Thus, the run adjustment discussed in the previous paragraph removes the action of sending the message from j_s to j_r, which happened outside the causal cone, and, hence, makes it causally impossible for j_r to receive it, despite the fact that the receipt happened within, or more precisely on the boundary of, the causal cone. On the other hand, the message sent by j_r in the same round cannot be suppressed without i noticing. Thus, suppressing all haps on the boundary of the causal cone is not an option. These considerations necessitate the use of XFocus^l_j to remove such "ghosts" of messages instead of an exact copy of round l+½ of the given run. To obtain Chandy–Misra's result, one needs to set X to be the entire causal cone.⁵
We now explain the complications created by the presence of byzantine faults. Because byzantine agents can lie, the existence of a causal chain is no longer sufficient for reliable delivery of information. Causal chains can now be reliable, i.e., involve only correct agents, or unreliable, whereby a byzantine agent can corrupt the transmitted information or even initiate the whole communication while pretending to be part of a longer chain. If several causal chains link a node (j,m) witnessing an event with (i,t), where the decision based on this event is to be made, then, intuitively, the information about the event can only be relied upon if at least one of these causal chains is reliable. In effect, all correct nodes, i.e., nodes (j,m) such that (I,r,t) ⊨ correct(j,m), are divided into three categories: those without any causal chains to (i,t), i.e., nodes outside Lamport's causal cone; those with causal chains, but only unreliable ones; and those with at least one reliable causal chain. There is, of course, a fourth category consisting of byzantine nodes, i.e., nodes (j,m) such that (I,r,t) ⊭ correct(j,m). Since there is no way for nodes without reliable causal chains to make themselves heard, we call these nodes silent masses and apply to them the CFreeze intervention: since they cannot have an impact, they need not act. The nodes with at least one reliable causal chain to (i,t), which must be correct themselves, form the reliable causal cone and are treated the same way as Lamport's causal cone in the fault-free case, except that the removal of "ghost" messages is more involved in this case. Finally, the remaining nodes are byzantine and form a fault buffer in the way of reliable information. Their role is to pretend the run is the same independently of what the silent masses do. We will show that FakeEcho^m_j suffices since only messages sent from the fault buffer matter.
⁵ This treatment of the cone's boundary could be perceived as overly pedantic, but in our view this is preferable to being insufficiently precise.
Before stating our main Lemma 10, which constructs an adjusted run that leaves agent i at t in the same position while removing as many haps as possible, it should be noted that our analysis relies on knowing which agents are byzantine in the given run, which may easily change without affecting local histories. This assumption will be dropped in the following section.
First, we define simple causal links among nodes as binary relations on A×T, in infix notation:
Definition 8. For all i ∈ A and t ∈ T, we have (i,t) →_l (i,t+1). Additionally, for a run r, we have (i,m) →^r_c (j,l) iff there are µ ∈ Msgs and id ∈ ℕ such that grecv(j,i,µ,id) ∈ β^{l−1}_ε(r) and either gsend(i,j,µ,id) ∈ β^m_i(r) or fake(i, gsend(i,j,µ,id) ↦ A) ∈ β^m_ε(r) for some A ∈ {noop} ⊔ GActions_i.
Causal r-links →^r := →_l ∪ →^r_c are either local or communication-related. A causal r-path for a run r is a sequence ξ = ⟨θ_0, θ_1, ..., θ_k⟩, k ≥ 0, of nodes connected by causal r-links, i.e., such that θ_l →^r θ_{l+1} for each 0 ≤ l < k. This causal r-path is called reliable iff node (j_l, t_l+1) is correct in r for each θ_l = (j_l, t_l) with 0 ≤ l < k and, additionally, node θ_k = (j_k, t_k) is correct in r. We also write θ_0 ⇝^r_ξ θ_k to denote the fact that path ξ connects node θ_0 to θ_k in run r, or simply θ_0 ⇝^r θ_k to state that such a causal r-path exists.
Note that neither receives nor sends of messages forming a reliable causal r-path can be byzantine. The latter is guaranteed by the immediate future of the nodes on the path being correct.
Definition 9. The reliable causal cone ◮^r_θ of node θ in run r consists of all nodes ζ ∈ A×T such that ζ ⇝^r_ξ θ for some reliable causal r-path ξ. The fault buffer ii^r_θ of node θ = (i,t) in run r consists of all nodes (j,m) with m < t such that (j,m) ⇝^r θ and (j,m+1) is not correct. Abbreviating ii◮^r_θ := ◮^r_θ ⊔ ii^r_θ, the silent masses of node θ in run r are all the remaining nodes ·i^r_θ := (A×T) \ ii◮^r_θ.
Here the filling of the cone ◮ signifies reliable communication, ii represents a barrier for correct information, whereas ·i depicts correct information isolated from its destination. We can now state the main result of this section:
Lemma 10 (Cone-equivalent run construction). For f ∈ ℕ, for a non-excluding agent-context χ = ((P_ε, G(0), τ_f, Ψ), P) such that all agents are gullible, correctable, and delayable, for any τ_f, P_ε, P-transitional run r, and for a node θ = (i,t) ∈ A×T correct in r, let adjustment adj = [B_{t−1}; ...; B_0], where B_m = (ρ^m_1, ..., ρ^m_n) for each 0 ≤ m ≤ t−1, be such that

ρ^m_j :=  ii◮^r_θ Focus^m_j   if (j,m) ∈ ◮^r_θ,
          FakeEcho^m_j        if (j,m) ∈ ii^r_θ,
          CFreeze             if (j,m) ∈ ·i^r_θ.    (6)

Then each r′ ∈ R(τ_f, P_ε, P, r, adj) satisfies the following properties:
(A) (∀(j,m) ∈ ◮^r_θ) r′_j(m) = r_j(m);
(B) (∀m ≤ t) r′_i(m) = r_i(m);
(C) for any m ≤ t, we have β^{m−1}_ε(r′) ∩ FEvents_j ≠ ∅ iff both (j,m−1) ⇝^r θ and (j,m) is not correct in r;
(D) for any m ≤ t, any node (j,m) correct in r is also correct in r′;
(E) the number of agents byzantine by any m ≤ t in run r′ is not greater than that in run r and is ≤ f;
(F) r′ is τ_f, P_ε, P-transitional.
Proof sketch. The following properties follow from the definitions:

◮^r_θ ∩ ii^r_θ = ∅,   θ ∈ ◮^r_θ,    (7)
(j,m) ∈ ◮^r_θ & (k,m′) →^r (j,m) ⟹ (k,m′) ∈ ii◮^r_θ.    (8)
Note that, for β_k = (j_k, m_k) with k = 1, 2, we have that β_1 →^r β_2 implies m_1 < m_2 and that β_1 ⇝^r β_2 implies m_1 ≤ m_2. Thus, all parts of the lemma except Statement (F) only concern m ≤ t, and even this last statement for m > t is a trivial corollary of Def. 6(d). Thus, we focus on m ≤ t.
Statement (A) can be proved by induction on m using the following auxiliary lemma for the given transitional run r and the adjusted run r′, which is also constructed using the standard update functions.
Lemma 11. If r_j(m) = r′_j(m), and β^m_j(r) = aρ^m_j(r), and β^m_{ε,j}(r) = eρ^m_j(r), then r_j(m+1) = r′_j(m+1).
Proof. This statement follows from (2) for the transitional run r, Def. 6(b) for the adjusted run r′, and the fact that update_j only depends on events of agent j, in particular, on the presence of go(j) or sleep(j) (see Def. A.19 in the appendix for details).
The third condition of Lemma 11 is satisfied for ρ^m_j(r) = ii◮^r_θ Focus^m_j within ◮^r_θ by (8). Further, if (j,m) ∈ ◮^r_θ, then so are all (j,m′) with m′ ≤ m. In particular, (i,m′) ∈ ◮^r_θ for any m′ ≤ t. Thus, Statement (B) follows from Statement (A), which we have already proved.
Statement (C) is due to the facts that (a) ii◮^r_θ Focus^m_j does not produce any new byzantine events relative to β^m_{ε,j}(r), which contains none for (j,m) ∈ ◮^r_θ, (b) CFreeze never produces byzantine events, whereas (c) FakeEcho^m_j always contains at least fail(j) ∈ BEvents_j. Statements (D) and (E) are direct corollaries of Statement (C).
The bulk of the proof concerns Statement (F), or, more precisely, transitionality up to timestamp t. For each m < t, we need sets α^m_ε(r′) ∈ P_ε(m) and α^m_j(r′) = {global(j,m,a) | a ∈ X_j} for some X_j ∈ P_j(r′_j(m)) for each j ∈ A such that, for β^m_ε(r′) = ⊔_{j∈A} eρ^m_j(r) and β^m_j(r′) = aρ^m_j(r) for all j ∈ A,

β^m_ε(r′) = filter_ε(r′(m), α^m_ε(r′), α^m_1(r′), ..., α^m_n(r′)),    (9)
β^m_j(r′) = filter_j(α^m_1(r′), ..., α^m_n(r′), β^m_ε(r′)).    (10)
The construction of such α sets and the proof of (9)–(10) for them is by induction on m. Note that r′_ε(0) = r_ε(0) and r′_j(0) = r_j(0) for all j ∈ A by Def. 6(a). We will show that it suffices to choose

α^m_j(r′) := α^m_j(r) if (j,m) ∈ ◮^r_θ; {global(j,m,a) | a ∈ X_j} for some X_j ∈ P_j(r′_j(m)) otherwise,    (11)

with the choice in the latter case possible because P_j(r′_j(m)) ≠ ∅, and

α^m_ε(r′) := filter^{≤f}_ε(r(m), α^m_ε(r), α^m_1(r), ..., α^m_n(r)) \ ⊔_{(l,m) ∈ ·i^r_θ ⊔ ii^r_θ} GEvents_l
    ⊔ {fail(l) | (l,m) ∈ ii^r_θ}
    ⊔ {fake(l, gsend(l,j,µ,id) ↦ noop) | (l,m) ∈ ii^r_θ & (gsend(l,j,µ,id) ∈ β^m_l(r) ∨ (∃A) fake(l, gsend(l,j,µ,id) ↦ A) ∈ β^m_ε(r))}.    (12)
Informally, according to (11), in r′ we just repeat the choices made in r within the reliable causal cone and make arbitrary choices elsewhere. According to (12), events are chosen in a more complex way. First, mimicking the 1st-stage filtering in the given run r, the originally chosen α^m_ε(r) ∈ P_ε(m) is preventively purged of all byzantine events whenever they would have caused more than f agents to become faulty in r. Note that, in our transitional simulation of the adjusted run r′, this is done prior to the filtering (9) by exploiting the correctability of all agents. Secondly, for all agents l outside the reliable causal cone at the current timestamp m, i.e., with (l,m) ∈ ·i^r_θ ⊔ ii^r_θ, all events are removed, to comply with the total freeze among the silent masses ·i^r_θ and to make room for byzantine communication in the fault buffer ii^r_θ. The resulting set complies with P_ε because all agents are delayable. For the silent masses, this is the desired result. For the fault buffer, on the other hand, byzantine sends are added for every correct or byzantine send in r, thus ensuring that the incoming information in the reliable causal cone in r′ is the same as in r. For the case when a fault-buffer node (l,m) sent no messages in the original run, fail(l) is added to make the immediate future (l,m+1) byzantine despite its silence, which is crucial for fulfilling Statement (C) and simplifies bookkeeping for byzantine agents.
The proof of (9)–(10) is by induction on m = 0, ..., t−1. To avoid overlong formulas, we abbreviate the right-hand side of (9) by ϒ^m_ε and the right-hand sides of (10) for each j ∈ A by Ξ^m_j, for the specific α^m_j(r′) and α^m_ε(r′) defined in (11) and (12) respectively. Thus, it only remains to show that β^m_ε(r′) = ϒ^m_ε and (∀j ∈ A) β^m_j(r′) = Ξ^m_j, or equivalently, further abbreviating ϒ^m_j := ϒ^m_ε ∩ GEvents_j, that β^m_{ε,j}(r′) = ϒ^m_j and β^m_j(r′) = Ξ^m_j for all j ∈ A, by simultaneous induction on m.
Induction step for the silent masses (j,m) ∈ ·i^r_θ. By (12), α^m_{ε,j}(r′) := α^m_ε(r′) ∩ GEvents_j = ∅, and filtering it yields ϒ^m_j = ∅ = β^m_{ε,j}(r′), as prescribed by CFreeze. In particular, go(j) ∉ β^m_{ε,j}(r′), thus ensuring that filtering α^m_j(r′), whatever it is, yields Ξ^m_j = ∅ = β^m_j(r′), once again in compliance with CFreeze applied within ·i^r_θ.
Before proceeding with the induction step for the remaining nodes, observe that the events in α^m_ε(r′), if added to r′(m), do not cross the byzantine-agent threshold f, meaning that the 1st-stage filtering does not affect α^m_ε(r′):

filter^{≤f}_ε(r′(m), α^m_ε(r′), α^m_1(r′), ..., α^m_n(r′)) = α^m_ε(r′).    (13)

Indeed, there are two sources of byzantine events in α^m_ε(r′): byzantine events from α^m_ε(r) that survived filter^{≤f}_ε in (12) and those pertaining to nodes in the fault buffer ii^r_θ. The former were also present in β^m_ε(r) in the original run because the 2nd-stage filter filter^B_ε only removes correct (receive) events. At the same time, for any (l,m) ∈ ii^r_θ, the immediate future (l,m+1) was a faulty node in r by the definition of ii^r_θ. In either case, any agent faulty in r′ based on α^m_ε(r′) was also faulty by timestamp m+1 in r. Additionally, any agent already faulty in r′(m) was also faulty in r(m) by Statement (D). Since the number of agents faulty by m+1 in the original transitional run r could not exceed f, adding α^m_ε(r′) to r′(m) does not exceed this threshold either. It follows from (13) that

ϒ^m_j = filter^B_ε(r′(m), α^m_ε(r′), α^m_1(r′), ..., α^m_n(r′)) ∩ GEvents_j.    (14)
Induction step for the fault buffer (j,m) ∈ ii^r_θ. For these nodes, the α^m_{ε,j}(r′) part of α^m_ε(r′) contains no correct events; hence, filter^B_ε, which only removes correct receives, has no effect. In other words,

ϒ^m_j = α^m_ε(r′) ∩ GEvents_j = {fail(j)} ⊔ {fake(j, gsend(j,h,µ,id) ↦ noop) | gsend(j,h,µ,id) ∈ β^m_j(r) ∨ (∃A) fake(j, gsend(j,h,µ,id) ↦ A) ∈ β^m_ε(r)} = β^m_{ε,j}(r′),

as prescribed by FakeEcho^m_j. As in the case of the silent masses, go(j) ∉ β^m_ε(r′) guarantees that the Ξ^m_j = ∅ = β^m_j(r′) requirement is fulfilled within ii^r_θ.
Induction step for the reliable causal cone (j,m) ∈ ◮^r_θ. The case of the nodes with a reliable causal path to θ, whose immediate future remains correct in r, is the final and also most complex induction step. Recall that α^m_j(r′) = α^m_j(r) ∈ P_j(r(m)) = P_j(r′(m)) because, within ◮^r_θ, r′(m) = r(m) by Statement (A). Thus, our choice of α^m_j(r′) in (11) is in compliance with transitionality. Since (j,m+1) is correct, the α^m_{ε,j}(r) := α^m_ε(r) ∩ GEvents_j part of α^m_ε(r) contained no byzantine events and, hence, is unchanged by (12). For the same reason it is not affected by 1st-stage filtering in either run. Thus, the same set of j's events undergoes the 2nd-stage filtering both in the original run r and in our transitional simulation of the adjusted run r′. Let us call this set of j's events Ω_j.
Since both filter^B_ε and ii◮^r_θ Focus^m_j can only remove receive events, it immediately follows that β^m_{ε,j}(r′) ⊆ β^m_{ε,j}(r) and that β^m_{ε,j}(r′) and ϒ^m_j agree on all non-receive events. Importantly, this includes go(j) events, thus ensuring that Ξ^m_j = α^m_j(r).
A receive event U = grecv(j,k,µ,id) ∈ Ω_j is retained in either run iff it is causally grounded by a matching send, correct or byzantine. Due to the uniqueness of GMI id, as ensured by the injectivity of both the id and global functions, as well as Condition (c) of the t-coherency of the sets produced by P_ε, there is at most one node of agent k from which such a matching send can originate. If id is not well-formed and no such send can exist, U is filtered out from both β^m_ε(r) and ϒ^m_ε, the former ensuring U ∉ β^m_ε(r′). The reasoning in the case such a node η = (k,z) exists depends on where timestamp z is relative to m and where η falls in our partition of nodes. Generally, to retain U in β^m_{ε,j}(r) and ϒ^m_j, one must find either a correct send V := gsend(k,j,µ,id) or a faulty send W_A := fake(k, V ↦ A) for some A ∈ GActions_k ⊔ {noop}.
• If z > m, i.e., z is in the future of m, then U is filtered out from both β^m_ε(r) and ϒ^m_ε; hence, U ∉ β^m_ε(r′).
• If z ≤ m and η ∈ ·i^r_θ, then, independently of the filtering in r, hap U ∉ e(ii◮^r_θ Focus^m_j)(r) = β^m_{ε,j}(r′) because the message's origin is outside the focus area. At the same time, no actions or events are scheduled at η in r′ (for z = m this follows from the already proven induction step for the silent masses). Without either V or W_A, event U is filtered out from ϒ^m_ε.
• If z < m and η ∈ ii^r_θ, then, by (4) in the definition of FakeEcho^z_k, only W_noop can save U in ϒ^m_j, and W_noop ∈ β^z_{ε,k}(r′) iff either V ∈ β^z_k(r) or W_A ∈ β^z_{ε,k}(r) for some A. Thus, filtering U yields the same result in both runs, and ii◮^r_θ Focus^m_j does not affect U because η ∈ ii◮^r_θ.
• If z = m and η ∈ ii^r_θ, again only W_noop can save U in ϒ^m_j, this time by the construction (12) of α^m_ε(r′) ∌ go(k). Here W_noop ∈ α^m_{ε,k}(r′) iff either V ∈ β^z_k(r) or W_A ∈ β^z_{ε,k}(r) for some A. Thus, filtering U yields the same result in both runs, and ii◮^r_θ Focus^m_j does not affect U because η ∈ ii◮^r_θ.
• If z < m and η ∈ ◮^r_θ, then (k,z+1) is still correct in both r and r′; hence, no byzantine events such as W_A are present in either r(m) or r′(m). Accordingly, only V can save U in this case. Since β^z_k(r) = β^z_k(r′) by the construction (5) of ii◮^r_θ Focus^z_k, filtering U yields the same result in both runs, and ii◮^r_θ Focus^m_j does not affect U because η ∈ ii◮^r_θ.
• If z = m and η ∈ ◮^r_θ, again (k,m+1) is correct in r, meaning this time that no W_A are present in Ω_k. Again, only V can save U from filtering. Since α^m_k(r) = α^m_k(r′) by the construction (11), and the sets of events being filtered agree on go(k) ∈ Ω_k, here too filtering U yields the same result in both runs, and ii◮^r_θ Focus^m_j does not affect U because η ∈ ii◮^r_θ.
This case analysis completes the induction step for the reliable causal cone, the induction proof, the proof of Statement (F), and thus the proof of the whole Lemma 10.
5 Preconditions for Actions: Multipedes
Arguably the most important application of Lemma 10, and, hence, of causal cones, is to derive preconditions for agents' actions, cp. [1]. While relatively simple in traditional settings, where events can be preconditions according to the knowledge of preconditions principle [17] and where Lamport's causal cone suffices, this is no longer true in byzantine settings. As Theorem 12 reveals, if f > 0, an asynchronous agent can learn neither that it is (still) correct nor that a particular event⁶ really occurred.
Theorem 12 ([12]). If f ≥ 1, then, for any o ∈ Events_i and for any interpreted system I = (R_χ, π) with any non-excluding agent-context χ = ((P_ε, G(0), τ_f, Ψ), P) where i is gullible and every j ≠ i is delayable,

I ⊨ ¬K_i occurred(o)   and   I ⊨ ¬K_i correct_i.    (15)
These validities can be shown by modeling the infamous brain-in-a-vat scenario (see [12] for details).
Theorem 12 obviously implies that knowledge of simple preconditions, e.g., events, is never achievable if byzantine agents are present. Settling for the next best thing, one could investigate whether i knows that o has happened relative to its own correctness, i.e., whether K_i(correct_i → occurred(o)) holds (cf. [18]), a kind of non-factive belief in o. This means that i can be mistaken about o due to its own faults (in which case it cannot rely on any information anyway), but not due to being misinformed by other agents. It is, however, sometimes overly restrictive to assume that K_i(correct_i → occurred(o)) holds in situations when i is, in fact, faulty: typical specifications, e.g., for distributed agreement [15], do not restrict the behavior of faulty agents, and agents might sometimes learn that they are faulty. We therefore introduced the hope modality

H_i ϕ := correct_i → K_i(correct_i → ϕ),

which was shown in [7] to be axiomatized by adding to K45 the axioms correct_i → (H_i ϕ → ϕ), ¬correct_i → H_i ϕ, and H_i correct_i.
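The defining equivalence and these axioms can be sanity-checked on a two-world Kripke model; the encoding below is our own toy illustration, not part of the paper's framework:

```python
# A hypothetical two-world model where agent i cannot tell whether it is
# correct: world w1 (i correct, phi true) vs world w2 (i faulty, phi false).
worlds = {
    "w1": {"correct_i": True, "phi": True},
    "w2": {"correct_i": False, "phi": False},
}
indist = {"w1": {"w1", "w2"}, "w2": {"w1", "w2"}}   # i's indistinguishability

def K_i(pred, w):
    """Knowledge: pred holds in every world indistinguishable from w."""
    return all(pred(v) for v in indist[w])

def hope_i(pred, w):
    """H_i pred := correct_i -> K_i(correct_i -> pred)."""
    return (not worlds[w]["correct_i"]) or \
        K_i(lambda v: (not worlds[v]["correct_i"]) or pred(v), w)

phi = lambda v: worlds[v]["phi"]
# Knowledge of phi fails at w1: the indistinguishable w2 falsifies phi...
assert not K_i(phi, "w1")
# ...but hope for phi holds at w1: phi holds wherever i is correct.
assert hope_i(phi, "w1")
# Axiom ~correct_i -> H_i phi: hope holds vacuously at the faulty world w2.
assert hope_i(phi, "w2")
# Axiom H_i correct_i: agents always hope that they are correct.
correct = lambda v: worlds[v]["correct_i"]
assert hope_i(correct, "w1") and hope_i(correct, "w2")
```

The first two assertions illustrate exactly the gap that motivates hope: Theorem 12 rules out knowledge, while hope survives because it only quantifies over indistinguishable worlds where the agent is correct.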
The following Theorem 13 shows that hope is also closely connected to reliable causal cones, in the
sense that events an agent can hope for must lie within the reliable causal cone.
Theorem 13. For a non-excluding agent-context χ = ((P_ε, G(0), τ_f, Ψ), P) such that all agents are gullible, correctable, and delayable, for a correct node θ = (i,t), and for an event o ∈ Events, if all occurrences of O ∈ GEvents such that local(O) = o happen outside the reliable causal cone ◮^r_θ of a run r ∈ R_χ, i.e., if O ∈ β^m_ε(r) ∩ GEvents_j & local(O) = o implies (j,m) ∉ ◮^r_θ, then, for any I = (R_χ, π),

(I,r,t) ⊭ H_i occurred(o).
Proof. Constructing the first t rounds according to the adjustment from Lemma 10 and extending this prefix to an infinite run r′ ∈ R_χ using the non-exclusiveness of χ, we obtain a run with no correct events recorded as o. Indeed, in r′, there are no events originating from ·i^r_θ, no correct events from ii^r_θ, and all events originating from ◮^r_θ, though correct, were also present in r and, hence, do not produce o in local histories. At the same time, r_i(t) = r′_i(t) by Lemma 10(B), making (I,r′,t) indistinguishable for i, and (I,r′,t) ⊨ correct_i by Lemma 10(D).
It is interesting to compare the results and proofs of Theorems 12 and 13. Essentially, in the run r′ modeling the brain in a vat in the former, i is a faulty agent that perceives events while none really happen. Therefore, K_i occurred(o) can never be attained. In the run r′ constructed by Lemma 10 in Theorem 13, on the other hand, i remains correct. The reason that H_i occurred(o) fails here is that o does not occur within the reliable causal cone.
Theorem 13 shows that, in order to act based on the hope that an event occurred, it is necessary that the event originate from the reliable causal cone. Unfortunately, this is not sufficient. Consider the case of a run r where no agent exhibits a fault: every causal message chain is reliable, and the ordinary and reliable causal cones coincide. However, since up to f agents could be byzantine, it is trivial to modify r by seeding fail(j) events in round ½ for several agents j in a way that is indistinguishable for agent i trying to hope for the occurrence of o. This would enlarge the fault buffer and shrink the reliable causal cone in the so-constructed adjusted run r̂. Obviously, by making different sets of agents byzantine (without violating f, of course), one can fabricate multiple adjusted runs where r̂_i(t) = r_i(t) is exactly the same but fault buffers and reliable causal cones vary in size and shape. Any single one of these r̂ satisfying the conditions of Theorem 13, in the sense that all occurrences of o happen outside its reliable causal cone, dashes the hope of i for o in r.
⁶ Actually, the reasoning in this section also extends to actions, i.e., arbitrary haps.
Thus, in order for i to have hope at (i,t) in run r that o really occurred, it is necessary that some correct global version O of o (not necessarily the same one) be present somewhere (not necessarily at the same node) in the reliable causal cone of every run r̂ that ensures r_i(t) = r̂_i(t). This gives rise to the definition of a multipede, which ensures (I,r,t) ⊨ H_i occurred(o) according to Theorem 13:
Definition 14 (Multipede). We say that a run r in a non-excluding agent-context χ = ((P_ε, G(0), τ_f, Ψ), P) contains a multipede^o_θ for event o ∈ Events at some node θ = (i,t) iff, for all runs r̂ ∈ R_χ with r_i(t) = r̂_i(t), it holds that o happens inside its reliable causal cone, i.e., that

(∃(j,m) ∈ ◮^r̂_θ)(∃O ∈ GEvents_j) (O ∈ β^m_ε(r̂) & local(O) = o).
We obtain the following necessary condition for the existence of a multipede:
Theorem 15 (Necessary condition for a multipede). Given an arbitrary non-excluding agent-context χ = ((P_ε, G(0), τ_f, Ψ), P) such that all agents are gullible, correctable, and delayable, and for any run r ∈ R_χ in any interpreted system I = (R_χ, π), if (I,r,t) ⊨ H_i occurred(o) for a correct node θ = (i,t), i.e., if there is a multipede^o_θ in r, then the following must hold. Let Byz^r_θ := {j ∈ A | (∃m)(j,m) ∈ ii^r_θ}. For any S ⊆ A \ ({i} ⊔ Byz^r_θ) such that |S| = f − |Byz^r_θ|, there must exist a witness w_S ∈ A of some correct event O_S ∈ β^{m_S}_ε(r) ∩ GEvents_{w_S} such that local(O_S) = o and such that there is a causal path (w_S, m_S) ⇝^r_{ξ_S} θ that does not involve agents from S ⊔ Byz^r_θ.
Proof. Since, by Lemma 10, the adjusted run r′ ∈ R_χ, and since the only faults up to t in r′ occur in the fault buffer ii^r_θ, i.e., pertain to agents from Byz^r_θ, one can, for any S described above, construct the first t rounds by setting β^0_ε(r_S) := β^0_ε(r′) ⊔ {fail(j) | j ∈ S ⊔ Byz^r_θ} and keeping the rest of r′ intact. These first t rounds can be extended to complete infinite runs r_S ∈ R_χ indistinguishable for i at θ from either r′ or r because the addition of fail(j) is imperceptible for agents and does not affect protocols. The only potentially affected element could have been filter_ε in the part ensuring that byzantine agents do not exceed f in number, but it also behaves the same way as in r′ because |S| + |Byz^r_θ| = f. Since r_i(t) = r′_i(t) = (r_S)_i(t), we have (I,r_S,t) ⊨ H_i occurred(o). Node θ remains correct in these runs because i ∉ S. Thus, by Theorem 13, each run r_S must have a requisite correct event O_S ∈ β^{m_S}_{ε,w_S}(r_S) with (w_S, m_S) ∈ ◮^{r_S}_θ. It remains to note that any such correct event from r_S must be present in r′ and in r, and that any causal path in r_S already exists in r′ and r, according to the construction from Lemma 10. Thus, there must exist a causal path ξ_S in r from (w_S, m_S) to θ such that ξ_S is reliable in r_S. Finally, since all f byzantine agents in r_S, namely S ⊔ Byz^r_θ, are made faulty from round ½, path ξ_S being reliable in r_S means not involving these agents.
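The combinatorial content of this necessary condition can be checked by brute force. In the sketch below (our own encoding, not the paper's: each witness path is abstracted to the set of agents it traverses), we test every candidate suspect set S:

```python
from itertools import combinations

def multipede_necessary_condition(agents, i, f, byz, witness_paths):
    """For every S of f - |byz| further suspects (disjoint from {i} and byz),
    some witness path must avoid all agents in S as well as in byz."""
    pool = [a for a in agents if a != i and a not in byz]
    k = f - len(byz)
    return all(
        any(not (set(path) & (set(S) | byz)) for path in witness_paths)
        for S in combinations(pool, k)
    )

# The chain example discussed below: information from agent 1 can reach the
# deciding agent 3 only through agent 2, which is byzantine.
agents = [1, 2, 3, 4]
ok = multipede_necessary_condition(agents, i=3, f=2, byz={2},
                                   witness_paths=[{1, 2, 3}])
assert not ok    # every witness path runs through the byzantine agent 2

# Two independent witnesses (reporting directly via different agents) can
# tolerate any single fault when f = 1:
ok2 = multipede_necessary_condition(agents, i=3, f=1, byz=set(),
                                    witness_paths=[{1, 3}, {4, 3}])
assert ok2
```

Note that the number of candidate sets S grows as C(|A| − 1 − |Byz|, f − |Byz|), so this direct check is only feasible for small systems, which is one more reason why practical sufficient conditions are of interest.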
From the perspective of protocol design, arguably of more interest are sufficient conditions for the existence of a multipede in a given run. Whereas a sufficient condition could, of course, be obtained directly from Def. 14, identifying all the transitional runs r̂ with r_i(t) = r̂_i(t) is far from being computable in general. Actually, we conjecture that sufficient conditions cannot be formulated in a protocol-independent way at all. Unfortunately, however, protocol-dependence cannot be expected to be simple either. For instance, even just varying the number and location of faults in r for suppressing occurred(o) in a modified run r̂ could be nontrivial. If k agents are already faulty in run r, at least f−k agents can freely be used for this purpose. However, some of the k byzantine faults in r may also be relocated in r̂, as agents that only become faulty after timestamp t cannot be part of any fault buffer. Rather than making them faulty, it would suffice to just freeze them.
For instance, consider the following communication structure with f = 2 and agents 1 and 2 being byzantine (we omit the time dimension for simplicity's sake):

1 —— 2 —— 3 —— 4

Here both 1 and 2 would participate in the fault buffer, whereas 2 alone would already suffice: even were 1 correct, the observed communication does not give it a chance to get past 2. Depending on 1's protocol, it might be possible to reassign 1 to the silent masses, thereby allowing us to consider 4 as the second faulty agent and, thus, showing the impossibility for 3 to act in this situation. An opposite outcome is possible in the following scenario:
[Figure: investigators I1 and I2, each connected to three aides (A1.1, A1.2, A1.3 and A2.1, A2.2, A2.3, respectively), with all six aides connected to the central agent C.]
Let f = 2 and let the faulty agents be A2.1 and A1.1. While the sufficient condition forces C to consider the case of both I1 and I2 being compromised and the information originating from them unreliable, our necessary condition does not rule out C's ability to make a decision. Indeed, suppose I1 and I2 are investigators sending in their reports via three aides each. Having received 4 identical correct reports from A1.2, A1.3, A2.2, and A2.3 and only 2 fake reports from A1.1 and A2.1, agent C would have been able to choose the correct version if the possibility of both investigators being compromised were off the table. Our method of adjusting the run does not allow us to move the faulty agent from A1.1 to I1 because it is not clear how A1.1 would have behaved were it correct and had it received a fake report from I1. By designing a protocol in such a way that A1.1's correct behavior in such a hypothetical situation is different, we can eliminate the possibility of both investigators being compromised and, thus, resolve the situation for C.
6 Conclusions
The main contribution of this paper is the characterization of the analog of Lamport's causal cone in asynchronous multi-agent systems with byzantine faulty agents. Relying on our novel byzantine runs-and-systems framework, we provided an accurate epistemic characterization of causality and the induced reliable causal cone in the presence of asynchronous byzantine agents. Despite the quite natural final shape of a reliable causal cone, it does not lead to simple conditions for ascertaining preconditions: the detection of what we called a multipede is considerably more complex than the verification of the existence of one causal path in the fault-free case. Since the agents' actions depend on the shape of multiple alternative reliable causal cones in byzantine fault-tolerant protocols like [20], however, there is no alternative but to detect multipedes.
Developing practical sufficient conditions for the existence of a multipede poses exciting challenges, which are currently being addressed in the context of the epistemic analysis of some real byzantine fault-tolerant protocols. This context-dependency is unavoidable, since the agent that tries to detect a multipede in a run lacks global information such as the actual members of the fault buffer. On the other hand, the gap between the necessary and sufficient conditions can potentially be minimized by designing protocols based on the insights into the causality structure we have uncovered. For instance, while we treated all error-creating nodes as part of the fault buffer in our necessary conditions for a multipede, it is sometimes possible to relegate redundant parts of it to the silent masses. Since this would make it possible to relocate byzantine faults so as to intercept more causal paths, one may design protocols in a way that rules this out.
A larger and more long-term goal is to extend our study to syncausality and the reliable syncausal cone in the context of synchronous byzantine fault-tolerant multi-agent systems, and to possibly incorporate protocols explicitly into the logic.
Acknowledgments. We are grateful to Yoram Moses, Hans van Ditmarsch, and Moshe Vardi for
fruitful discussions and valuable suggestions that have helped shape the ﬁnal version of this paper. We
also thank the anonymous reviewers for their comments and suggestions.
References
[1] Ido Ben-Zvi & Yoram Moses (2014): Beyond Lamport's Happened-before: On Time Bounds and the Ordering of Events in Distributed Systems. Journal of the ACM 61(2:13), doi:10.1145/2542181.
[2] K. M. Chandy & Jayadev Misra (1986): How processes learn. Distributed Computing 1(1), pp. 40–52, doi:10.1007/BF01843569.
[3] Reinhard Diestel (2017): Graph Theory, Fifth edition. Springer, doi:10.1007/978-3-662-53622-3.
[4] Cynthia Dwork & Yoram Moses (1990): Knowledge and Common Knowledge in a Byzantine Environment: Crash Failures. Information and Computation 88(2), pp. 156–186, doi:10.1016/0890-5401(90)90014-9.
[5] Ronald Fagin, Joseph Y. Halpern, Yoram Moses & Moshe Y. Vardi (1995): Reasoning About Knowledge. MIT Press.
[6] Ronald Fagin, Joseph Y. Halpern, Yoram Moses & Moshe Y. Vardi (1999): Common knowledge revisited. Annals of Pure and Applied Logic 96(1–3), pp. 89–105, doi:10.1016/S0168-0072(98)00033-5.
[7] Krisztina Fruzsa (2019): Hope for Epistemic Reasoning with Faulty Agents! In: Proceedings of ESSLLI 2019 Student Session. (To appear).
[8] Guy Goren & Yoram Moses (2018): Silence. In: PODC '18, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, ACM, pp. 285–294, doi:10.1145/3212734.3212768.
[9] Joseph Y. Halpern & Yoram Moses (1990): Knowledge and Common Knowledge in a Distributed Environment. Journal of the ACM 37(3), pp. 549–587, doi:10.1145/79147.79161.
[10] Joseph Y. Halpern, Yoram Moses & Orli Waarts (2001): A characterization of eventual Byzantine agreement. SIAM Journal on Computing 31(3), pp. 838–865, doi:10.1137/S0097539798340217.
[11] Jaakko Hintikka (1962): Knowledge and Belief: An Introduction to the Logic of the Two Notions. Cornell University Press.
[12] Roman Kuznets, Laurent Prosperi, Ulrich Schmid & Krisztina Fruzsa (2019): Epistemic Reasoning with Byzantine-Faulty Agents. In: Proceedings of FroCoS 2019. (To appear).
[13] Roman Kuznets, Laurent Prosperi, Ulrich Schmid, Krisztina Fruzsa & Lucas Gréaux (2019): Knowledge in Byzantine Message-Passing Systems I: Framework and the Causal Cone. Technical Report TUW-260549, TU Wien. Available at https://publik.tuwien.ac.at/files/publik_260549.pdf.
[14] Leslie Lamport (1978): Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM 21(7), pp. 558–565, doi:10.1145/359545.359563.
[15] Leslie Lamport, Robert Shostak & Marshall Pease (1982): The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems 4(3), pp. 382–401, doi:10.1145/357172.357176.
[16] Alexandre Maurer, Sébastien Tixeuil & Xavier Defago (2015): Reliable Communication in a Dynamic Network in the Presence of Byzantine Faults. eprint 1402.0121, arXiv. Available at https://arxiv.org/abs/1402.0121.
[17] Yoram Moses (2015): Relating Knowledge and Coordinated Action: The Knowledge of Preconditions Principle. In R. Ramanujam, editor: Proceedings of TARK 2015, pp. 231–245, doi:10.4204/EPTCS.215.17.
[18] Yoram Moses & Yoav Shoham (1993): Belief as defeasible knowledge. Artificial Intelligence 64(2), pp. 299–321, doi:10.1016/0004-3702(93)90107-M.
[19] Yoram Moses & Mark R. Tuttle (1988): Programming Simultaneous Actions Using Common Knowledge. Algorithmica 3, pp. 121–169, doi:10.1007/BF01762112.
[20] T. K. Srikanth & Sam Toueg (1987): Optimal Clock Synchronization. Journal of the ACM 34(3), pp. 626–645, doi:10.1145/28869.28876.
Appendix
Filter functions
Definition A.16. The filtering function $filter_\varepsilon$ for asynchronous agents with at most $f \geq 0$ byzantine faults is defined as follows.

First, we define a subfilter $filter^B_\varepsilon \colon G \times \wp(GEvents) \times \prod_{i=1}^n \wp(GActions_i) \to \wp(GEvents)$ that removes impossible receives: for a global state $h \in G$, a set $X_\varepsilon \subseteq GEvents$, and sets $X_i \subseteq GActions_i$,
\[
filter^B_\varepsilon(h, X_\varepsilon, X_1, \dots, X_n) := X_\varepsilon \setminus \Bigl\{ grecv(j,i,\mu,id) \;\Big|\; gsend(i,j,\mu,id) \notin h_\varepsilon \;\wedge\; (\forall A \in \{noop\} \sqcup GActions_i)\, fake\bigl(i, gsend(i,j,\mu,id) \mapsto A\bigr) \notin h_\varepsilon \;\wedge\; \bigl(gsend(i,j,\mu,id) \notin X_i \vee go(i) \notin X_\varepsilon\bigr) \;\wedge\; (\forall A \in \{noop\} \sqcup GActions_i)\, fake\bigl(i, gsend(i,j,\mu,id) \mapsto A\bigr) \notin X_\varepsilon \Bigr\},
\]
where $h_\varepsilon$ is the environment's record of all haps in the global state $h$ and $O \in h_\varepsilon$ ($O \notin h_\varepsilon$) states that the hap $O \in GHaps$ is (isn't) present in this record of all past rounds, $X_\varepsilon$ represents all events attempted by the environment, and the $X_i$'s represent all actions attempted by agents $i$ in the current round.
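To make the shape of this condition concrete, here is a small Python sketch of the receive subfilter. The tuple encoding of haps (`('grecv', j, i, msg, id)`, `('gsend', i, j, msg, id)`, `('fake', i, send, A)`, `('go', i)`), the dict of attempted actions, and all names are our own illustrative choices, not the paper's notation:

```python
def filter_B(h_eps, X_eps, X_actions):
    """Sketch of filter^B: drop a receive unless its matching send is
    justified, i.e., was sent (or faked) in a past round, or is sent
    (or faked) in the current round.

    h_eps: set of haps recorded by the environment in past rounds.
    X_eps: events the environment attempts this round.
    X_actions: dict mapping agent i to its attempted actions X_i.
    """
    def justified(recv):
        _, j, i, msg, mid = recv
        send = ("gsend", i, j, msg, mid)
        sent_before = send in h_eps or any(
            e[0] == "fake" and e[2] == send for e in h_eps)
        sent_now = (send in X_actions[i] and ("go", i) in X_eps) or any(
            e[0] == "fake" and e[2] == send for e in X_eps)
        return sent_before or sent_now

    return {e for e in X_eps if e[0] != "grecv" or justified(e)}
```

Note how a current-round correct send only justifies the receive if the sender is also scheduled via `go(i)`, mirroring the third conjunct of the definition.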
Second, using $X^B_{\varepsilon i} := X_\varepsilon \cap \bigl(BEvents_i \sqcup \{sleep(i), hibernate(i)\}\bigr)$ and defining $\mathcal{A}(Failed(h))$ to be the set of agents who have already exhibited faulty behavior in the global state $h$, we define a subfilter $filter^{\leq f}_\varepsilon \colon G \times \wp(GEvents) \times \prod_{i=1}^n \wp(GActions_i) \to \wp(GEvents)$ that removes all byzantine events in the situation when having them would have exceeded the $f$ threshold:
\[
filter^{\leq f}_\varepsilon(h, X_\varepsilon, X_1, \dots, X_n) :=
\begin{cases}
X_\varepsilon & \text{if } \bigl| \mathcal{A}(Failed(h)) \cup \{ i \mid X^B_{\varepsilon i} \neq \varnothing \} \bigr| \leq f,\\
X_\varepsilon \setminus \bigsqcup_{i \in \mathcal{A}} X^B_{\varepsilon i} & \text{otherwise.}
\end{cases}
\]
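The threshold check can be rendered as a short Python sketch. The names `failed_agents` (playing the role of $\mathcal{A}(Failed(h))$) and `byz_attempts` (mapping each agent $i$ to its attempted byzantine events $X^B_{\varepsilon i}$) are illustrative:

```python
def filter_le_f(failed_agents, byz_attempts, X_eps, f):
    """Sketch of the <= f subfilter: keep all attempted events if the
    round would stay within the fault budget f, otherwise drop every
    byzantine event attempted this round."""
    would_be_faulty = failed_agents | {
        i for i, evs in byz_attempts.items() if evs}
    if len(would_be_faulty) <= f:
        return set(X_eps)  # fault budget respected: keep everything
    # Budget exceeded: remove all byzantine events of the round.
    all_byz = set().union(*byz_attempts.values()) if byz_attempts else set()
    return set(X_eps) - all_byz
```

The all-or-nothing removal matches the definition: either the whole round's byzantine attempts fit under the budget, or none of them happen.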
The filter $filter_\varepsilon \colon G \times \wp(GEvents) \times \prod_{i=1}^n \wp(GActions_i) \to \wp(GEvents)$ is obtained by composing these two subfilters, with the $\leq f$ subfilter applied first:
\[
filter_\varepsilon(h, X_\varepsilon, X_1, \dots, X_n) := filter^B_\varepsilon\bigl(h,\, filter^{\leq f}_\varepsilon(h, X_\varepsilon, X_1, \dots, X_n),\, X_1, \dots, X_n\bigr).
\]
The composition in the opposite order could violate causality if a message receipt is preserved by $filter^B_\varepsilon$ based on a byzantine send in the same round, which is later removed by $filter^{\leq f}_\varepsilon$.
Definition A.17. The filters $filter_i \colon \prod_{j=1}^n \wp(GActions_j) \times \wp(GEvents) \to \wp(GActions_i)$ for agents' actions are defined as follows: for $X_\varepsilon$ representing all the environment's events and $X_i$ representing all actions attempted by agent $i$ in the current round,
\[
filter_i(X_1, \dots, X_n, X_\varepsilon) :=
\begin{cases}
X_i & \text{if } go(i) \in X_\varepsilon,\\
\varnothing & \text{otherwise.}
\end{cases}
\]
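In code this agent-side filter is a one-liner. As before, the tuple encoding of `go(i)` and the function name are illustrative choices of ours:

```python
def filter_agent(i, X_i, X_eps):
    """Sketch of filter_i: agent i's attempted actions X_i survive the
    round only if the environment issued go(i); otherwise they are all
    dropped. (go(i) is encoded as the tuple ('go', i).)"""
    return set(X_i) if ("go", i) in X_eps else set()
```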
Update functions
Before deﬁning the update functions, we need several auxiliary functions:
Definition A.18. We use a function $local \colon GHaps \to Haps$ converting correct haps from the global format into the local formats for the respective agents in such a way that, for any $i, j \in \mathcal{A}$, any $t \in \mathbb{T}$, any $a \in Actions_i$, any $\mu \in Msgs$, and any $M \in \mathbb{N}$:

1. $local(GActions_i) = Actions_i$;
2. $local(GEvents_i) = Events_i$;
3. $local\bigl(global(i,t,a)\bigr) = a$;
4. $local\bigl(grecv(i,j,\mu,M)\bigr) = recv(j,\mu)$.

For all other haps, the localization cannot be done on a hap-by-hap basis because system events and byzantine events $fake(i, A \mapsto noop)$ do not create a local record. Accordingly, we define a localization function $\sigma \colon \wp(GHaps) \to \wp(Haps)$ as follows: for each $X \subseteq GHaps$,
\[
\sigma(X) := local(X \cap GHaps) \cup \{ E \in GEvents \mid (\exists i)\, fake(i, E) \in X \} \cup \{ A' \in GActions \mid (\exists i)(\exists A)\, fake\bigl(i, A \mapsto A'\bigr) \in X \}.
\]
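A Python sketch of the localization function, under an illustrative tuple encoding of our own: a correct hap is any tuple not tagged `'fake'`; `('fake', i, E)` is a faked event `E`; `('fake', i, A, A2)` is action `A` faked as `A2`, where `A2 == 'noop'` leaves no local record:

```python
def sigma(X, local):
    """Sketch of the localization sigma: correct haps are localized via
    the supplied local function; faked events and faked actions leave
    the record of what was faked, except fakes to noop."""
    out = {local(h) for h in X if h[0] != "fake"}                # correct haps
    out |= {h[2] for h in X if h[0] == "fake" and len(h) == 3}   # faked events
    out |= {h[3] for h in X
            if h[0] == "fake" and len(h) == 4 and h[3] != "noop"}  # faked actions
    return out
```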
Definition A.19. We abbreviate $X_{\varepsilon i} := X_\varepsilon \cap GEvents_i$ for performed events $X_\varepsilon \subseteq GEvents$ and actions $X_i \subseteq GActions_i$ for each $i \in \mathcal{A}$. Given a global state $r(t) = \bigl(r_\varepsilon(t), r_1(t), \dots, r_n(t)\bigr) \in G$, we define agent $i$'s $update_i \colon L_i \times \wp(GActions_i) \times \wp(GEvents) \to L_i$ that outputs a new local state from $L_i$ based on $i$'s actions $X_i$ and events $X_\varepsilon$:
\[
update_i\bigl(r_i(t), X_i, X_\varepsilon\bigr) :=
\begin{cases}
r_i(t) & \text{if } \sigma(X_{\varepsilon i}) = \varnothing \text{ and } X_{\varepsilon i} \cap \{go(i), sleep(i)\} = \varnothing,\\
\sigma(X_{\varepsilon i} \sqcup X_i) : r_i(t) & \text{otherwise}
\end{cases}
\]
(note that in transitional runs, $update_i$ is always used after the action filter $filter_i$; thus, in the absence of $go(i)$, it is always the case that $X_i = \varnothing$).
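The guard in this definition, deciding whether the local state changes at all, can be sketched as follows. We represent a local state as a list of per-round record sets, newest first, mirroring the prepending notation $X : h$; the encoding and names are illustrative:

```python
def update_agent(local_state, X_i, X_eps_i, sigma):
    """Sketch of update_i: the state is frozen when nothing locally
    observable happened and neither go(i) nor sleep(i) occurred;
    otherwise the localized round record is prepended.

    sigma: a localization function in the spirit of Definition A.18.
    """
    woke = any(e[0] in ("go", "sleep") for e in X_eps_i)
    if not sigma(X_eps_i) and not woke:
        return local_state  # nothing observable: state unchanged
    return [sigma(X_eps_i | X_i)] + local_state
```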
Similarly, the environment's state update $update_\varepsilon \colon L_\varepsilon \times \wp(GEvents) \times \prod_{i=1}^n \wp(GActions_i) \to L_\varepsilon$ outputs a new state of the environment based on events $X_\varepsilon$ and all actions $X_i$:
\[
update_\varepsilon\bigl(r_\varepsilon(t), X_\varepsilon, X_1, \dots, X_n\bigr) := (X_\varepsilon \sqcup X_1 \sqcup \dots \sqcup X_n) : r_\varepsilon(t).
\]
Accordingly, the global update function $update \colon G \times \wp(GEvents) \times \prod_{i=1}^n \wp(GActions_i) \to G$ modifies the global state as follows:
\[
update\bigl(r(t), X_\varepsilon, X_1, \dots, X_n\bigr) := \bigl( update_\varepsilon(r_\varepsilon(t), X_\varepsilon, X_1, \dots, X_n),\; update_1(r_1(t), X_1, X_\varepsilon),\; \dots,\; update_n(r_n(t), X_n, X_\varepsilon) \bigr).
\]