Vulnerabilities in synchronous IPC designs



Recent advances in interprocess communication (IPC) performance have been exclusively based on thread-migrating IPC designs. Thread-migrating designs assume that IPC interactions are synchronous, and that user-level execution will usually resume with the invoked process (modulo preemption). This IPC design approach offers shorter instruction path lengths, requires fewer locks, has smaller instruction and data cache footprints, dramatically reduces TLB overheads, and consequently offers higher performance and lower timing variance than previous IPC designs. With care, it can be performed as an atomic unit of operation. While the performance of thread-migrating IPC has been examined in detail, the vulnerabilities implicit in synchronous IPC designs have not been examined in depth in the archival literature, and their implications for IPC design have been actively misunderstood in at least one recent publication. In addition to performance, a sound IPC design must address concerns of asymmetric trust and reproducibility and provide support for dynamic payload lengths. Previous IPC designs, including those of EROS, Mach, L4, Flask, and Pebble, satisfy only two of these three requirements. In this paper, we show how these three design objectives can be met simultaneously. We identify the conflict of requirements and illustrate how their collision arises in two well-documented IPC architectures: L4 and EROS. We then show how all three design objectives are simultaneously met in the next generation EROS IPC system.
Appears in the 2003 IEEE Symposium on Security and Privacy,
Oakland, CA 2003
Jonathan S. Shapiro
Department of Computer Science
Johns Hopkins University
Keywords: operating systems, capability systems, inter-
process communication, vulnerability.
1 Introduction
Thread-migrating interprocess communication (IPC) de-
signs are one of the driving forces behind the current
resurgence of interest in microkernel-based systems. In
particular, IPC implementations by Liedtke [20, 21], and
Shapiro [30] have shown that protected IPC invocations
can be reduced into the 135 cycle range on Pentium-
family processors.
Experimental analysis by Ford [8]
shows that much of this performance advantage cannot be
achieved in asynchronous IPC mechanisms. Given these
results, performance-motivated reports of the demise of
microkernels [3] may have been premature. The question
of security-motivated demise remains open.
The potential security benefit of high-performance IPC
is straightforward. In contrast to asynchronous or buffer-
ing IPC designs [1, 27], the latency of thread-migrating
IPC is low enough that applications can be factored
into multiple protection domains, each encapsulated by
a process boundary and selectively linked by protected,
IPC-based communication. Domain-based isolation is
an essential building block for high-assurance systems
[19, 33, 29, 5, 12, 31]. At least one commercial system
has been constructed using this approach to domain en-
forcement [11].
Selected denial of service attacks against the L4 mi-
crokernel and its servers (including several re-examined
here) have been briefly examined in the literature [22].
This paper provides an in-depth exposure of application-
level vulnerabilities that are implicit in any synchronous
IPC system. When a synchronous communication mech-
anism is introduced, the security and trust implications of
sender blocking must be considered. When the system de-
sign also incorporates user-level pagers [1], interactions
between these pagers, string transfers, and blocking must
also be considered. Relatively experienced IPC design-
ers have failed to recognize the potential security issues
inherent in this combination [7].
Footnote: The 135-cycle results cited above predate the introduction
of the Pentium-family fast system call instructions. Comparably
careful implementations of those designs on current machines should
therefore reduce the common-case IPC time into the 60-70 cycle range.
A satisfactory IPC design must simultaneously satisfy
certain functional, testability, and security requirements.
In particular:
Asymmetric Trust: The IPC architecture must not
embed any assumption that the sender trusts the recipient.
Reproducibility: The behavior of the IPC primitives
should not be influenced by exogenous factors such
as system load. This is required for testability and
operational predictability under variations in workload.
Dynamic Payload: The IPC architecture must support
transmission of messages whose length is not
known in advance to the receiving party. Without dynamic
payload support, procedures returning (e.g.)
precise (unbounded) integers or dynamically sized
vectors cannot be handled at the IPC interface.
This particular set of requirements is deceptively difficult
to satisfy simultaneously. Due to the presence of
timeouts, Mach and L4 fail the reproducibility requirement.
In the absence of timeouts or truncation, Pebble [9]
and Fluke [7] fail the asymmetric trust requirement. Prior
to the work reported here, EROS failed the dynamic string
requirement.
This paper examines in detail certain denial of ser-
vice problems that arise in synchronous IPC designs. We
identify the cases in which there is potential for such de-
nial, and describe the available mechanisms in EROS and
L4 by which applications may attempt to protect them-
selves. We show how the next generation EROS IPC
subsystem simultaneously meets all three of the require-
ments identified above, and in the process show that func-
tional requirements such as support for dynamic message
payloads need not sacrifice reproducibility. The primary
goals of this paper are to ensure that denial of service vul-
nerabilities are not neglected in future IPC designs, and to
show how they are addressed in the next-generation EROS
IPC system.
The balance of this paper proceeds as follows. In section 2,
we selectively review the L4 and EROS IPC designs.
Section 3 describes the problem of asymmetric trust
in detail and the options for dealing with this problem.
Section 4 performs a further case analysis on IPC safety
in conditions of asymmetric trust, identifying many cases
where existing, low cost mechanisms are sufficient to ad-
dress the problem and introduces the trusted buffer object
(TBO), a solution for the general case. Section 5 justifies
why the TBO can be trusted where an ordinary client can-
not, and establishes additional requirements on the IPC
subsystem that are necessary to achieve an effective TBO
design. The remaining sections describe related work, ac-
knowledgments, and conclusions.
2 Review of the L4 and EROS IPC
This paper explores two interprocess communication de-
signs for purposes of illustration and discussion: those of
L4 and EROS. From a feature perspective, the two de-
signs have been converging for several years. Both have
been carefully engineered and specified, but they embody
somewhat different views of how IPC should be imple-
mented. Before proceeding to the body of the paper
proper, it is useful to briefly review these two IPC designs.
The descriptions provided below are not comprehensive,
focusing instead on those portions of the two IPC systems
that are relevant to denial of service vulnerabilities.
2.1 L4 IPC
The L4 IPC system [20, 21] implements a single basic
primitive SendAndReceive. This operation sends a mes-
sage to a named thread (or task) id and blocks waiting for
a response. The receive phase can be omitted, allowing
message sends that initiate a new thread of control within
the receiver.
As part of the specification of the receive phase,
the process performing a SendAndReceive can indicate
whether it will accept a message from any sender (an open
wait) or only from a particular thread (a closed wait). This
enables the implementation of remote procedure calls: by
receiving using a closed wait, the invoker of the RPC is
assured that the next incoming message will be received
from the invoked thread.
The payload of an L4 IPC operation consists of a
bounded vector of “immediate” words, some number of
which are guaranteed to be transmitted in registers. This
word vector is optionally followed by a bounded dope
vector, each entry of which can specify either a byte
string of arbitrary length (limited by the address space
size) or an address space mapping to be transferred. Ignoring
some complications arising from scatter/gather support,
the receive specification must include corresponding
dope vectors specifying where each logical component of
the message is to be received.
An L4 message whose payload is fully registerized is
guaranteed not to induce an exception in either sender or
recipient. Transmission of a data string can lead to
receiver-side page faults that are handled by user-mode fault
handlers.
In general, there is no guarantee that the specified des-
tination process is in the receive state at the time of the
IPC operation. To avoid indefinite blocking, L4 invoca-
tions include two timeout values: the amount of time to
wait before transmission is initiated, and the amount of
time that can be spent servicing page faults. If either time-
out is exceeded, the operation is aborted and an error is
signaled to the sending process. The specified timeout
can be infinite.
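The rendezvous and timeout rules described above can be illustrated with a toy model. The sketch below is not the L4 interface: the names `Receiver`, `send`, `FOREVER`, and the tick-based delays are hypothetical, and the model assumes that a closed wait on another thread simply never accepts the sender. It only shows how the two timeouts (before transmission, and during page fault servicing) partition the possible outcomes.

```python
import math
from dataclasses import dataclass
from typing import Optional

FOREVER = math.inf  # used both for "never ready" delays and for infinite timeouts


@dataclass
class Receiver:
    """Toy model of a receiving thread (hypothetical, not an L4 structure)."""
    accept_from: Optional[str] = None  # None: open wait; otherwise closed wait on that id
    ready_after: float = 0             # ticks until the receiver enters its wait
    fault_ticks: float = 0             # ticks its pager needs to service string page faults


def _phase(delay: float, timeout: float) -> str:
    """Outcome of one blocking phase that needs `delay` ticks under `timeout`."""
    if delay != FOREVER and delay <= timeout:
        return "ok"
    return "blocked forever" if timeout == FOREVER else "timeout"


def send(sender: str, rx: Receiver, wait_timeout: float, fault_timeout: float) -> str:
    """Model the send phase: rendezvous first, then string transfer."""
    # Assumption: a closed wait on another thread never accepts this sender.
    accepted = rx.accept_from is None or rx.accept_from == sender
    rendezvous = _phase(rx.ready_after if accepted else FOREVER, wait_timeout)
    if rendezvous != "ok":
        return "rendezvous " + rendezvous
    transfer = _phase(rx.fault_ticks, fault_timeout)
    if transfer != "ok":
        return "page-fault " + transfer
    return "delivered"
```

In this model an infinite page fault timeout combined with a pager that never answers leaves the sender blocked indefinitely, which is the exposure that the later sections examine.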
Vulnerabilities The two key weaknesses of the L4 de-
sign are the absence of protection in the specification of
the recipient (any sender may invoke any process) and the
presence of timeouts in the IPC interface. In addition,
there are two minor issues: the IPC mechanism reveals
the sender’s task id to the recipient, and the widely re-
ported high-performance Pentium implementation fails to
save and restore segment register values.
The absence of IPC protection deprives a service of
the ability to restrict its callers. A hostile process may
perform denial of service attacks by repeatedly invoking
operations on an arbitrary target process. This can be mit-
igated using the original “clans and chiefs” mechanism
[23], but this solution imposes significant overhead on all
invocations. A more efficient solution proposed in [22]
is to allocate one server thread per client and use closed
waits. This solution requires all service applications to
accept the complexity of multithreading, has unfortunate
implications for resource consumption, and does not ad-
dress the problem of session establishment for such ser-
vices. An experimental design modification by Jaeger
has proposed a capability-like IPC redirection mechanism
[12] that provides access control and renders the recipient
identity opaque. This design has not yet been incorporated
into the L4 specification, but is expected to add between
20% (typical) and 50% to the cost of an L4 IPC operation.
The timeout mechanism mitigates certain low-cost de-
nial of service attacks, but not effectively: the receiving
server thread is occupied by the IPC operation until the
timeout occurs. The presence of timeouts in the primitive
IPC specification impedes reproducible operation unless
workloads and queueing behavior can be predicted. Timeout
failures are difficult to test in real systems, and have
the potential to cascade in unpredictable ways.
While the L4 IPC system has the ability to transmit
multiple, independently labeled strings, the inclusion of
multiple strings has little impact on vulnerability. The pri-
mary concern in string transmission is page fault handling
rather than aggregate string length or number of strings.
It is debatable whether the minor issues identified
above should be viewed as significant vulnerabilities. Re-
vealing sender task identity is clearly a failure of encap-
sulation. In principle this is unfortunate, but it is not
clear that the revelation has significant security conse-
quences. One concern is that revealing the caller task
identity demonstrates authority only indirectly, and makes
interposition between client and server difficult [13]. Failure
to save and restore segment registers reveals the underlying
interrupt event flow to any process that cares to
observe it. This is a high-bandwidth inward covert chan-
nel, but the hole could be straightforwardly closed with
minor alterations to the implementation.
2.2 EROS IPC
The EROS IPC system was derived from that of KeyKOS
[11], but has been influenced by various aspects of the
L4 design. EROS implements three primitive invocations:
SEND (which does not wait), CALL (sends and enters
a closed wait), and RETURN (sends and enters an open
wait). The fact that a process is in an open or a closed wait
is explicitly recorded in the process state of the waiting
process.
In contrast to the L4 design, the open/closed wait dis-
tinction in EROS expresses a restriction on the capabil-
ity type that must be invoked rather than the process id
of the invoker. A receiving process in the available state
(the open wait) must be invoked using a start capability
to the receiving process. If the receiving process is not
in the available state, this invocation will block. A re-
ceiving process in the waiting state (the closed wait) must
be invoked using a resume capability to the receiving process.
Resume capabilities are produced by the CALL operation
and are consumed when invoked. The process model
guarantees that if a resume capability exists to a given process,
that process is in the waiting state. This ensures that
invoking a resume capability will never block waiting for
interprocess rendezvous.
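The interplay between process states and capability types can be made concrete with a small state machine. This is a sketch under stated assumptions, not EROS code: `Process`, `ResumeCap`, `invoke_start`, `call`, and `invoke_resume` are hypothetical names, payloads are omitted, and a blocked invoker is modeled by a `False` or `None` return rather than by actual blocking.

```python
RUNNING, AVAILABLE, WAITING = "running", "available", "waiting"


class Process:
    def __init__(self, name, state=RUNNING):
        self.name = name
        self.state = state


class ResumeCap:
    """Single-use capability naming a process that is in the waiting state."""
    def __init__(self, target):
        self.target = target
        self.valid = True


def invoke_start(target):
    """Start capability: delivered only if the target is available (open wait)."""
    if target.state != AVAILABLE:
        return False              # the invoker would block here
    target.state = RUNNING
    return True


def call(caller, target):
    """CALL: invoke the target, enter a closed wait, produce a resume capability."""
    if not invoke_start(target):
        return None               # modeled as "caller blocks"
    caller.state = WAITING
    return ResumeCap(caller)


def invoke_resume(cap):
    """Resume capability: never blocks on rendezvous and is consumed by use."""
    if not cap.valid:
        return False
    assert cap.target.state == WAITING   # invariant stated by the process model
    cap.valid = False
    cap.target.state = RUNNING
    return True
```

A second invocation of the same resume capability fails immediately instead of blocking, which reflects the consume-on-use rule described above.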
The payload of an EROS IPC operation contains four
registers, four capabilities, and a bounded string. Forth-
coming enhancements will remove the string bound and
provide L4-like scatter/gather support. In contrast to L4,
an EROS process must hold a capability to the process it
wishes to invoke. Capabilities are named indirectly in the
invocation by specifying their index in a kernel-managed
per-process capability list. EROS provides only limited
means for mapping transfers: the sender can explicitly
construct a mapping structure and transmit a capability
to this structure to the recipient. EROS also provides a
translucent forwarding object known as a wrapper that al-
lows capabilities to be selectively rescinded.
As with L4, an EROS invocation whose payload con-
sists only of registers and capabilities is guaranteed not to
induce an exception in either sender or recipient. Transmission
of a data string can lead to page faults in the recipient
if the receive region has not been properly prepared
by the recipient. If such a page fault would require the
invocation of a user-mode page fault handler, and the invocation
is a resume capability invocation, the transmitted
string is truncated. This behavior guards against hostile
clients that seek to block a server indefinitely.
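The truncation rule can be sketched as a page-by-page copy. The code below is illustrative only: `deliver`, `PAGE_SIZE`, and the `prepared_pages` set are hypothetical, and for non-resume invocations the model simply assumes that the user-mode handler eventually supplies the page, since in that case the invoker has agreed to block.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def deliver(payload, recv_addr, prepared_pages, is_resume):
    """Copy `payload` to `recv_addr`; return (bytes_delivered, truncated).

    `prepared_pages` holds the receiver page numbers that can be written
    without invoking a user-mode page fault handler.
    """
    prepared = set(prepared_pages)        # work on a copy of the caller's set
    delivered = 0
    while delivered < len(payload):
        addr = recv_addr + delivered
        page = addr // PAGE_SIZE
        if page not in prepared:
            if is_resume:
                # Replying server must not block: truncate and report it.
                return delivered, True
            # Assumption: the handler runs and the page becomes available.
            prepared.add(page)
        room = PAGE_SIZE - addr % PAGE_SIZE   # bytes left in this page
        delivered += min(room, len(payload) - delivered)
    return delivered, False
```

For example, a 6000-byte reply written at offset 1000 of page 2 is cut to 3096 bytes when page 3 has not been prepared and the invocation uses a resume capability.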
The interactions between EROS invocation types and
capability types allow two mutually trusting processes to
establish an “extended mutual exclusion” using co-routine
style CALL operations. Process A performs a CALL on B.
Instead of returning, B performs a CALL on the resume
capability to A. The two processes continue in this fashion
until both are done, and the final invocation performed
is a RETURN operation. Until this final invocation is performed,
whichever party is blocked is in a closed wait.
This ensures that third-party invocations cannot interrupt
the transaction in progress.
Vulnerabilities The key weakness of the EROS design
is its requirement that the client know in advance an upper
bound on the size of any message returned by a server.
This is a functional rather than a security failing, but it is
rather a nuisance when implementing a language-neutral
capability interface.
The EROS truncation rule has been mis-characterized
as a correctness flaw by Ford et al. [7]. In fact, it reflects
a conscious decision that servers are not obligated to re-
turn correct data to hostile clients. Liedtke also found the
truncation decision surprising (personal communication).
Discussion of its rationale resulted in the introduction of
separate page fault timeouts into the L4 IPC system. In
the absence of either truncation or timeouts, delivery is
assured at the cost of exposure to denial of service. We
note that given a choice between exposure, truncation, and
timeout there are really no desirable outcomes. The solution
that we are adopting in the next generation EROS system
is feasible only in capability-based systems, and relies
heavily on EROS’s ability to factor resource and trust de-
3 Asymmetric Trust
A key, under-recognized requirement in IPC systems is
the ability to support asymmetric or qualified trust rela-
tionships between two communicating processes. This
occurs whenever multiple client applications call a single
server application, and is especially acute when the com-
mon (shared) application is a reference monitor. While
the clients trust the server to process their requests, a correct
server is presented with a curious set of conflicting
requirements:
It must respond to the client requests according to the
requirements of its interface and operational specification.
It cannot trust that a given client will faithfully execute
its part in the application-level invocation protocol.
In particular, no assumptions can be made by the
server about the possibility of client-induced blocking
unless blocking is precluded by the primitive IPC
specification.
Footnote: While it has not been emphasized in prior publications on
the EROS IPC system, the EROS system includes a translucent
forwarding object. The original purpose of the translucent forwarder
was to support selective rescinding of client access. Selective
revocation of this kind can be used defensively to revoke hostile
sessions in the event of request flooding attacks.
A hostile client might attempt denial of service on other
clients simply by calling a server and blocking indefinitely
in a non-receiving process state, causing the server in turn
to become blocked when it attempts to reply. We will refer
to this type of client as a defecting client.
Four basic design features have been attempted in var-
ious IPC designs to manage and mitigate this problem:
Buffering Mach [1], along with most early IPC systems,
implements message buffering. Buffer blocks are kernel
allocated, which effectively converts a well-localized at-
tack on a single server into a global denial of resource
attack on the entire operating system.
Multithreading L4 [20, 21] and Mach support multi-
threaded servers. In L4, threads are explicitly allocated
resources, and a given server will eventually run out of
them. Multithreading is therefore not effective against the
denial of resource attack, but it does ensure that the attack
also pressures the kernel scheduler, expanding the scope
of the vulnerability.
Truncation EROS [30] ensures that all server replies
are “prompt” by virtue of its IPC specification. A prompt
invocation is one that cannot be blocked by the actions of
an untrusted party. The EROS process model guarantees
that the client is in the proper state when the server replies
using a resume capability. However, this is only a partial
solution. Unless care is taken, user-level pagers can be
used by a hostile client to block a replying server in a non-
returning user-level page fault handler. To preclude this
attack, EROS truncates the message and reports the fact
of truncation to the client whenever a user-mode handler
would need to be invoked. Defined pages that have been
removed by the single-level store implementation do not
result in truncation, because the kernel is trusted to supply
them promptly.
Timeouts L4 [20, 21] includes a timeout specification as
part of each invocation, with reserved values to mean “do
not wait” and “wait forever.” While a timeout prevents to-
tal blockage of a server, a small number of hostile clients
can exploit the use of a server-side timeout to implement
severe denial of service against other clients.
A summary of which features are included in com-
monly referenced IPC systems is given in Table 1.
Feature          Mach   L4    EROS
Buffering        Yes    No    No
Multithreading   Yes    No    Via Retry
Truncation       No     No    Yes
Timeouts         No     Yes   No
Table 1. Summary of defensive features in commonly
referenced IPC implementations.
3.1 Effect of Buffering
As we have already mentioned, kernel-implemented IPC
buffering creates opportunity for global denial of service
by performing a denial of resource attack on the avail-
able kernel buffers. Because of Mach’s lazy copy feature,
which performs large copies using transparently kernel-
implemented copy on write mappings, attacking the Mach
IPC system in this way requires fairly large copies: the
pressure on pages is converted into pressure on page ta-
bles. When implemented on most hierarchical memory
management architectures, a small number of cooperat-
ing hostile processes can exhaust Mach kernel memory
on a 32-bit machine by performing misaligned copies of a
2 gigabyte region; each copy allocates enough page tables
to consume 8 megabytes of kernel memory.
To perpetrate this attack, the attacking process need
never acquire the full 2 gigabytes. By starting with a
smaller region whose size just exceeds the kernel’s policy
threshold for copy on write transfer, a hostile process
can use multiple transfers to create the large region without
actually holding a large number of data pages. The
essence of the problem is that neither party in the transaction
is “charged” for their mapping pages by the kernel.
While the additional copies imposed by buffering are
undesirable, the security problem with buffering arises
from a misattribution of burden (cost). By failing to attribute
the cost of buffer storage to an appropriate process,
the kernel becomes open to attack. The VAX/VMM security
monitor mitigated this issue using memory quotas
[18], but a quota-based approach is not practical in effi-
cient IPC systems. Introduction of such a quota mech-
anism into a synchronous IPC system must eventually
result in the delivery of an allocation fault to a user-
mode exception handler, which is exactly the problem that
EROS faces with user-supplied page fault handlers.
The alternative is to implement an IPC primitive that
operates over buffered channels in pipe-like fashion. This
approach abandons essentially all of the IPC performance
advances of the last 20 years, and the resulting IPC primi-
tive is too slow to support an adequate degree of compart-
mentalization. Neither L4 nor EROS implements buffer-
ing in the kernel, and it is now generally accepted that
buffering should not be included in the design of a kernel-
level IPC primitive.
3.2 Effect of Multithreading
The effect of multithreading on IPC denial of service takes
two forms. If new threads are allocated transparently by
the kernel, there is an implied denial of resource attack
by exhausting kernel memory. If threads are allocated
explicitly by the server process, then the attack requires
n copies of the hostile client, where n is the number of
available server threads.
Multithreading is required in certain servers, but from
an assurance perspective it is best avoided on complexity
grounds. Where a single-threaded server is feasible, the
complexity of concurrencymanagement is eliminated and
assurance of the server is more easily achieved.
L4 provides multithreading-aware IPC in the microkernel.
EROS will shortly implement a RETRY operation that
allows a service to force a caller to re-perform its last invocation
on a service-supplied capability. This mechanism
allows an EROS service to selectively block and unblock
callers in the kernel after examining their request. Most
cases of multiplexing and out of order return can be handled
using only a single service thread by leveraging this
mechanism.
3.3 Effect of Truncation
Truncation imposes a subtle requirement on all object in-
terfaces: an upper bound on the length of the reply must
be known by the client at the time of the call so that it can
pre-probe the receive area to ensure that needed pages are
defined. This is an impediment for operations that return
(e.g.) vectors or strings, as it is common for both to be
dynamically sized. In the general case, neither of these
types can safely be used as the return value from an in-
terprocess call unless the underlying IPC system supports
dynamically sized return values.
EROS inherited its truncation approach from KeyKOS
[11] and it has seemed adequate for many years. We have
lately come to feel that it is problematic. Several changes
in the EROS IPC system over the years have combined to
alter our view:
1. The expansion of the KeyKOS [11] one page IPC
payload limit to 64 kilobytes in EROS reduced the
length pressure on messages, and started us thinking
about how to map CORBA-like object invocations
directly onto the primitive invocation mechanism.
2. This in turn led to a paper design, to be implemented
in the next round of EROS IPC modifications, that
removes the string length bound entirely (subject to
the limits of the address space size).
3. The subsequent design and implementation of
CapIDL, which provides a language-neutral inter-
face definition language for capabilities, drew our at-
tention to the fact that unbounded dynamically sized
vectors cannot be supported if the recipient must
know the message size in advance.
4. Our desire to use E [25], a capability-based scripting
language, as a scripting language for EROS objects
led us to introduce a standard GetSignature()
operation on all conforming capabilities. This oper-
ation returns a string whose length cannot be known
to the client in advance.
As luck would have it, the GetSignature() op-
eration is the first time we have needed to define a
commonly used operation with a response size that
is not knowable at call time.
To our knowledge, EROS and KeyKOS are the only
current IPC systems implementing truncation. L4 does
not.
3.4 Effect of Timeouts
As previously discussed, timeouts impede predictable be-
havior and testability. Personal discussion with Yoon Ho
Park regarding how timeout values were actually used in
the L4-based SawMill multiserver system [10] concluded that
the use of L4 timeouts (and timeouts in general) can be di-
vided into four categories:
1. Do not wait at all (corresponds to EROS truncation).
2. Wait indefinitely.
3. Wait for some period motivated by an externally
specified duration in the real world. For example, if a
disk drive is specified to have a maximum seek time
of 12ms, but no seek completion interrupt has been
received within 20ms, something is probably wrong.
4. All other values. In principle any such value must be
incorrect unless the dynamic range of possible work-
loads can be specified in advance.
In practice, L4 services do not use timeouts in the last cat-
egory, typically resorting either to infinite timeouts or to
non-blocking IPC operations. This suggests that in prac-
tice the addition of a timeout value is an ineffective guard
against defecting clients.
Setting aside the question of effectiveness, timeouts
embedded in IPC operations present difficulties in friendly
use: they preclude testability, and they do not interact
favorably with debugging (because the recipient may be
stopped). In practice, when such a timeout is triggered it
is usual for the IPC to fail, but the IPC sender is nonethe-
less blocked for the duration of the timeout when it could
potentially be doing useful work.
When IPC timeouts are leveraged to support client-side
page faults during IPC (as in L4), the timeout mechanism
can be exploited by hostile clients to implement efficient
denial of service attacks against other clients that share a
common server (e.g. a window system) with the attacker.
The attack proceeds by first implementing a client-side
page fault handler that simply never waits for a page fault
notification. With this page fault handler in place, the
client sends a string containing an undefined page to the
shared server. The receiving server thread (in L4: task)
is rendered inaccessible until the timeout expires. In consequence,
well-behaved clients cannot invoke the server.
Multithreading does not circumvent this attack. It simply
requires that several duplicates of the attacking client be
used. All of these duplicates can share in common a single
defecting page fault handler.
L4 provides timeouts, EROS does not.
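The attack described in this section can be illustrated with a small deterministic simulation. It is a worst-case sketch under stated assumptions, not a measurement: `served_requests` is a hypothetical function, time advances in abstract ticks, each attacker's request pins one server thread for exactly `timeout` ticks, attackers re-send in the same tick their previous request times out and win the race for free threads, and a single well-behaved client needs one tick of service per request.

```python
def served_requests(threads, attackers, timeout, horizon):
    """Count well-behaved requests served in `horizon` ticks."""
    busy_until = [0] * threads            # tick at which each thread is free again
    served = 0
    for tick in range(horizon):
        free = [i for i, t in enumerate(busy_until) if t <= tick]
        held = threads - len(free)        # threads currently pinned by attackers
        pending = max(0, attackers - held)
        # Attackers that are not holding a thread grab free ones first.
        for i in free[:pending]:
            busy_until[i] = tick + timeout
        free = free[pending:]
        # The well-behaved client is served only if a thread is left over.
        if free:
            busy_until[free[0]] = tick + 1
            served += 1
    return served
```

With four server threads, three attackers leave one thread for the well-behaved client, while four attackers reduce its service to zero for as long as they keep re-sending. Adding threads only raises the number of duplicates the attacker must run.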
4 Case Analysis for Suspicious IPC
In seeking a solution to the problem of asymmetric trust,
we would like to satisfy three objectives:
1. We would like to avoid introducing complexity
whenever possible. Complex strategies used by a
server manifest as latency experienced by its clients.
2. When a client defects, we would like to impose the
cost of misbehavior on the client.
3. We must avoid solutions that convert a localized de-
nial of service into a systemwide denial of service.
4.1 Sources of Vulnerability
Before looking for a general solution to the defecting
client, it is useful to first understand the cases in which
the problem is actually a threat. This is useful because of
an “escape hatch” in the requirements on the IPC subsys-
tem: we need not guarantee service to attackers. If it can
be determined that a receiver is defecting, the IPC subsystem
can be absolved of its responsibility to deliver the
message.
The potential sources of blockage in the IPC process
lie in the invocation type (divisible or indivisible), the invokee
state, the invoker’s transmitted string (page faults),
and the invokee’s received string (page faults).
4.1.1 Invoker Thread, String Vulnerabilities
As long as IPC invocations are specified as indivisible, we
can rely on the fact that the invoker cannot cause the invo-
cation to block without completing. To prevent invoker-
side page faults, the IPC implementation can behave as
though a “dry run” is performed on the sender side be-
fore the real invocation, effectively causing all sender-side
page faults to occur prior to the invocation and ensuring
that the sender-side string is ready prior to invocation.
Following a successful dry run, we may assume that ex-
ceptions in the send string region constitute defection and
abort the invocation. This is how the EROS IPC mecha-
nism is specified.
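The dry-run rule can be sketched as two passes over the send string. The code is a toy illustration rather than the EROS implementation: `invoke`, the `present` page set, the `pager` callback, and the `unmapped_after_dry_run` parameter (which simulates a hostile sender removing pages between the two passes) are all hypothetical.

```python
def invoke(send_pages, present, pager, unmapped_after_dry_run=()):
    """Return "delivered" or "aborted" for one indivisible invocation.

    `present` is the set of sender pages currently mapped; it is updated
    in place. `pager(page)` stands in for the sender's own fault handling.
    """
    # Dry run: take every sender-side fault before the invocation proper.
    for page in send_pages:
        if page not in present:
            pager(page)               # may be slow, but delays only the sender
            present.add(page)
    # Hypothetical hostile step: the sender removes pages after the dry run.
    present.difference_update(unmapped_after_dry_run)
    # Real transfer: any remaining sender-side fault counts as defection.
    for page in send_pages:
        if page not in present:
            return "aborted"
    return "delivered"
```

Because all legitimate faults are taken during the first pass, a fault in the second pass can only be the sender's doing, so aborting costs nothing to a well-behaved sender.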
4.1.2 Invokee in Wrong State
The next concern is to consider whether the recipient may
be in the wrong state to accept the IPC invocation at all.
We must here consider two kinds of invokee states: open
waits and closed waits. The blocking behavior expected
by an invoker depends on the expected state of the invokee.
When an invoker process A performs an IPC to a
server process S that is in an open wait, there is an implicit
race with all other processes that might be invoking
S. It is possible that some other process beats A to the
invocation, and A may consequently block for a period of
time controlled by S. Therefore, in any such invocation
the invoker is implicitly declaring that they permit the invokee
to indefinitely block the invoker.
In a closed wait matters are quite different. The invoker
expects that the invokee is waiting for a response. If for
some reason the invokee is not waiting in a closed wait,
then the invokee has defected. In practice, the closed wait
arises when a server is responding to a previous client call.
This is the case in which prompt completion is required to
prevent denial of service attacks against the server.
In the L4 IPC system, there is no direct coupling between
the invokee state and the invoker invocation. Where
the processes involved know that procedure-call semantics
is expected, the client can use a “wait forever” timeout
and waits in a closed wait. The server responds using a
“do not wait” timeout.
In the EROS IPC system, a start capability is a capa-
bility whose invocation will block unless the recipient is
in the available (open wait) state. A resume capability
must be used to invoke a process in the waiting (closed
wait) state. Any operation that causes an EROS process
to leave the waiting state causes all outstanding resume
capabilities to be efficiently rescinded, guaranteeing that
client-side debuggers cannot be exploited as a means to
attack servers. Resume capabilities are consumed as they
are used. This prevents a server that is later compromised
from performing denial of service attacks on past clients.
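The behavior of start and resume capabilities described above can be modeled in a few lines. This is a minimal sketch under our own assumptions: the `epoch` counter is a hypothetical stand-in for whatever mechanism the kernel uses to rescind resume capabilities efficiently, and the class and function names are illustrative rather than taken from EROS.

```python
class Process:
    AVAILABLE, WAITING, RUNNING = "available", "waiting", "running"

    def __init__(self):
        self.state = Process.RUNNING
        self.epoch = 0  # hypothetical: bumped whenever the waiting state is left

    def enter_waiting(self):
        self.state = Process.WAITING

    def leave_waiting(self, new_state):
        # Leaving the waiting state rescinds every outstanding resume capability.
        self.epoch += 1
        self.state = new_state


class ResumeCapability:
    def __init__(self, target):
        self.target = target
        self.epoch = target.epoch

    def is_valid(self):
        return (self.target.state == Process.WAITING
                and self.epoch == self.target.epoch)

    def invoke(self):
        """Deliver a reply promptly, or fail without blocking the caller."""
        if not self.is_valid():
            return False
        # Using the capability consumes it, together with all of its copies.
        self.target.leave_waiting(Process.RUNNING)
        return True


def invoke_start(target):
    """A start capability blocks unless the target is in the available state."""
    if target.state != Process.AVAILABLE:
        return "blocked"
    target.state = Process.RUNNING
    return "delivered"
```

In this model a server holding a stale resume capability learns immediately that the reply cannot be delivered, so it never waits on a client that has left the closed wait.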
4.1.3 Receive String Page Faults, Large Strings
When the invocation protocol declares that the invokee
may legally block the invoker, as in “wait forever” (L4)
or start capability (EROS) invocations, we need not consider
why blocking occurs. By agreeing to block, the invoker
is implicitly saying that it trusts the receiver to make
decisions about the disposition of the invoker’s thread of
control. As long as it can tell that this is the case, the
IPC subsystem is free to run an invokee-specified page
fault handler. If the invoker has specified that the invocation
must be prompt, as in the “do not wait” (L4) or resume
key (EROS) invocations, invokee page faults cannot be
serviced unless a guarantee of prompt completion can be made.
Footnote: I have chosen to gloss over the distinction in L4 between page fault
servicing timeouts and invokee readiness timeouts; the essential point
is that the invoker has said that they will not agree to block for the invokee.
Unless care is taken in the design, a parallel problem
can arise if long strings are to be transferred. Real-time
schedulers rely on the fact that clock-driven and interrupt-driven
preemption are recognized quickly by the processor.
As a result, the internal implementation of an IPC that
transmits long strings must break the operation into multiple
internal units of operation, each having a maximum
duration chosen to avoid interfering with the scheduler.
Given the need for internal units of operation, the
IPC designer may be tempted to make the external (i.e.
application-visible) specification of the IPC primitive di-
visible in some form. There are two security design haz-
ards here:
1. The server must not be blocked by an untrusted client
when engaging in an extended transfer, and
2. The server must not be required to indefinitely hold
state in hopes that an IPC might later be resumed,
which precludes servicing other requests while there
is an incomplete divisible IPC outstanding.
To avoid these hazards, the scatter/gather and long string
enhancements to the EROS IPC specification allow an
IPC to be cleanly aborted, but do not permit it to be
“paused” prior to completion.
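The distinction between internal units of operation and an externally indivisible primitive can be sketched as follows. This is an illustrative model, assuming a 4096-byte unit and a caller-supplied `should_abort` predicate, both of which are our own inventions; it shows only that an abort discards all progress instead of leaving a resumable, half-finished transfer behind.

```python
def transfer(src, dst_capacity, unit=4096, should_abort=lambda copied: False):
    """Copy src in bounded internal units; the caller sees only 'done' or 'aborted'."""
    dst = bytearray()
    limit = min(len(src), dst_capacity)  # truncate to the receive area
    for start in range(0, limit, unit):
        if should_abort(start):
            # No resumable state is retained: a retry starts again from byte zero.
            return "aborted", b""
        dst += src[start:min(start + unit, limit)]
        # A preemption point would sit here in a kernel implementation.
    return "done", bytes(dst)
```

Because the only visible outcomes are completion and abort, the server in this model never has to hold state on behalf of an IPC that might be resumed later.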
4.2 Application Layer Implications
With the sources of vulnerability characterized, we can
finally examine the impact on the application-level remote
procedure call protocol in cases where invokee page fault
handlers cannot be safely executed.
Provided that IPC operations are specified as indivisible,
the difficult case arises only when all of the following
conditions apply:
• The invocation type precludes blocking,
• The payload size cannot be known in advance by the
receiver, and
• The side-effects or computational cost of the operation
preclude recovery by delivering a truncated response
to the client, adjusting the client receive area,
and replaying the invocation.
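The three conditions can be read as a decision procedure for the replying party. The function below is a sketch of that reading; the function name, parameter names, and returned labels are ours and do not correspond to any EROS or L4 interface.

```python
def reply_strategy(may_block, bound_known, replayable):
    """Classify an invocation according to the three conditions listed above."""
    if may_block:
        # The invocation type permits blocking, so page faults may be serviced.
        return "blocking send"
    if bound_known:
        # The receiver can pre-validate a buffer of sufficient size.
        return "non-blocking, truncating send"
    if replayable:
        # Truncate, let the receiver enlarge its buffer, and replay.
        return "truncate and replay"
    return "difficult case"
```

Only the final branch, where all three conditions hold at once, requires machinery beyond a non-blocking, truncating IPC.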
Our experience with EROS has been that the vast ma-
jority of interprocess invocations have a statically spec-
ified upper bound on their payload in both directions,
even when data motion is involved. Read and write re-
quests, for example, typically specify an upper bound on
the buffer length. Whenever such a bound can be estab-
lished, we can reasonably require that well-behaved re-
cipients will pre-validate their receive buffer areas. In all
such cases, the sender can robustly use a non-blocking,
truncating IPC operation. Most replies carry no string at
all. For those that do, the string is generally smaller than
one page, so the receive area validation requirement im-
poses no significant performance cost.
If the invokee of a non-blocking IPC operation knows
the length of the expected message and fails to provide
adequately validated buffer space, then it has not correctly
executed the higher-level RPC protocol. In this case
we can presume that it has defected. The non-blocking
requirement is therefore a concern only when a well-behaved
receiver is unable to bound the length of the response.
Many of these unboundable cases, in particular those
that involve the return of strings, are replayable. One
example is the previously described “get signature” operation:
vector<char, *> GetSignature(IF);
In this invocation, the server can respond using a non-
blocking IPC. If the response string exceeds the valid
buffer length of the recipient, the reply is truncated. As
long as the IPC primitive reports to the stub layer the number
of bytes that were sent (as opposed to received), the
stub can transparently increase the receive string buffer
size and retry the invocation.
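The retry logic in the stub can be sketched as a loop. This is a model under stated assumptions: `SIGNATURE`, `server_reply`, and the initial capacity of 64 bytes are hypothetical, and the server is reduced to a function that reports how many bytes it sent alongside the truncated data that fit in the receive area.

```python
SIGNATURE = b"s" * 300  # hypothetical reply string held by the server


def server_reply(receive_capacity):
    """Non-blocking, truncating reply: returns (bytes_sent, bytes_delivered)."""
    return len(SIGNATURE), SIGNATURE[:receive_capacity]


def get_signature_stub(initial_capacity=64):
    """Replay the invocation until the whole reply fits in the receive area."""
    capacity = initial_capacity
    attempts = 0
    while True:
        attempts += 1
        sent, data = server_reply(capacity)
        if sent <= capacity:
            return data, attempts
        capacity = sent  # grow the validated receive area, then replay
```

The stub needs the sent length, not the received length, to size its second attempt; with only the received length it could not tell a complete reply from a truncated one.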
4.3 The Difficult Case
We have now reduced the difficult case, non-replayable
invocations with client-unknown payload, to a relatively
rare set of situations: those in which a side effect occurs or
the cost of performing a “dry run” invocation (replay) is
prohibitive. In L4, the solution at this point is a page fault
IPC timeout. Because EROS is a capability-based system,
it offers the possibility of a more flexible solution.
If a large message is to be transferred in a single logi-
cal operation, the client cannot reliably accept it, and the
server must not block, then somebody must provide mem-
ory to hold it. In a typical IPC design, either the sender
holds it in sender memory and uses some timeout mecha-
nism to decide when to abort the invocation, or the kernel
must provide buffers. Based on the IPC specification, it
initially appears that EROS must either truncate these in-
vocations or incorporate some form of timeout into the
capability invocation mechanism. This was the source of
the mischaracterization in Ford et al. [7]. The next generation
EROS system provides the underpinnings for a third option.
First, we must admit that the problem of server block-
ing has been slightly mischaracterized. We have said that
the server must not block, but this is overly constrain-
ing. A more precise statement is that the server cannot be
blocked by an untrusted party. In particular, if a trusted
buffering agent were available, the server could safely encapsulate
the message in a buffer for later consumption by
the client.
The problem with buffering in general is that (a) it is
expensive, and (b) the wrong party pays for the storage –
usually either the kernel or the sender. The first is merely
a nuisance. The second is a potential cause of denial of
service. In EROS, it is possible to create a buffer that can
be trusted by a server but paid for by the client. We refer
to this object as the trusted buffer object (TBO). Ideally,
the use of a trusted buffer object would proceed as follows:
1. When CapIDL (the EROS IDL compiler) is asked to
generate an invocation stub for a non-replayable operation
that has an unboundable reply string, it modifies
the invocation signature to use a trusted intermediary:
the trusted buffer object. The client invokes
the TBO, which invokes the server on behalf of the client.
2. On receipt, the server verifies using means described
below that the trusted buffer object is authentic. If
this verification succeeds, the server knows that this
object really executes the trusted buffer object pro-
gram. As a consequence, it knows that the TBO can
be counted on not to defect. Further, it knows that
any buffer storage space used by the TBO is ulti-
mately paid for by the client program.
3. The server transmits its response string to the TBO
using the “extended mutual exclusion” provided by
the EROS IPC primitives. This has one of two outcomes:
(a) The TBO accepts the entire string, or
(b) The TBO runs out of space and reports this.
4. On completion, the TBO returns to the client, passing
the string supplied by the server.
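The four steps can be traced in a small model. This is a sketch, not the EROS protocol: the `TrustedBuffer` class, the byte quota, and the `is_authentic` callback are hypothetical, and the authenticity check is a placeholder for whatever verification the server actually performs.

```python
class OutOfSpace(Exception):
    """Raised when the client-funded storage cannot hold the reply."""


class TrustedBuffer:
    """Hypothetical TBO model: storage is charged against the client's quota."""

    def __init__(self, client_quota):
        self.quota = client_quota
        self.data = b""

    def call(self, server, request):
        # Step 1: the TBO invokes the server on behalf of the client.
        status = server(self, request)
        # Step 4: return to the client, passing whatever the server supplied.
        return status, self.data

    def accept(self, payload):
        # Step 3: accept the entire string or report exhaustion.
        if len(payload) > self.quota:
            raise OutOfSpace
        self.data = payload


def make_server(is_authentic):
    def server(reply_to, request):
        # Step 2: refuse to send a long reply to an unverified intermediary.
        if not is_authentic(reply_to):
            return "rejected"
        try:
            reply_to.accept(request.upper() * 1000)
        except OutOfSpace:
            return "out of space"
        return "ok"
    return server
```

When the quota is too small, the failure is confined to the client that supplied it: the server sees an error from the buffer and continues, and no server or kernel storage is consumed.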
Note that a client can reuse its TBO repeatedly.
Once created, the per-invocation overhead of the TBO is low.
A protocol optimization is possible to avoid recopying
the string from the TBO to the client. Instead of accepting
the string that the TBO attempts to return to the client, the
client may elect to accept zero bytes worth of data. Following
completion of the TBO’s reply, the client can directly
map the TBO’s address space into client memory. If
desired, the TBO space can be permanently mapped into
client memory at TBO creation time, eliminating the over-
head of call-time mapping manipulations.
During an early review of this design, Mark Miller observed
that the same memory mapping optimization might
be feasible on the server side of the transaction, yielding
a zero-copy implementation for long string transfers.
The problem with this lies in the fact that the server must
be prepared to recover from unsatisfied page faults within
this mapping region when client space is exhausted. Ask-
ing the TBO how much space is available does not help,
because the client has the authority to revoke this space in
mid-transfer. Our sense at this point is that the efficiency
of normal TBO invocation is high enough, and recovering
from these faults is complex enough, that this approach
will not usually prove to be worthwhile.
By introducing the trusted buffer object into the proto-
col, the entire burden of resource allocation is placed on
the client in such a way that the server is safe from denial
of resource and denial of service. In particular, the TBO
is transmitted using capability arguments to the IPC prim-
itive, and these are guaranteed to be transmitted promptly.
The overhead of using the TBO is largely lost in the noise
when very large strings are transferred.
5 Trusting the TBO
Given that a server cannot in general trust its clients, we
need to account for why injecting the TBO as an inter-
mediary is helpful, and what constraints must be satisfied
if the TBO is to be safely trusted. The constraints are:
• The returning server must be able to determine
whether it is returning to a TBO, that is, to a process
executing code that can be trusted to respond promptly.
• The server must be in a position to know that the
TBO will actually execute that code, that is, that the
client cannot disable the scheduling authority under
which the TBO is executing.
Prior to the work described here, EROS provided
means for client authentication and recovery from de-
struction, but did not provide adequate control over
scheduling authority to support the TBO.
5.1 TBO Authentication
The identification of a calling application (as distinct from
a calling principal) is directly supported by the EROS
constructor [32] mechanism. TBO objects are created by
the TBO constructor. Every EROS constructor “brands”
each process that it creates by inserting a unique capability
into a reserved capability field of the created process (the
brand slot). The brand capability can be any convenient
capability type. To ensure that its brand is universally
unique and private, the constructor uses a distinguished
start capability to itself as the brand for its products. This
distinguished start capability is guaranteed to be accessi-
ble only to that constructor.
To support application authentication, the EROS kernel
implements an operation known as the Identify operation.
Given a process, resume, or start capability to a process
and an alleged brand capability, the Identify operation
compares the alleged brand capability with the actual brand
previously recorded in the process. The operation returns
true (false) according to whether the two brand
capabilities are (are not) equal.
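Branding and the Identify comparison can be modeled with object identity. This is a sketch with hypothetical names (`Brand`, `Constructor`, `Product`, `identify`); Python cannot hide the brand slot the way a kernel-protected capability field does, so the model illustrates only the comparison, not the protection.

```python
class Brand:
    """An opaque token; two brands are equal only if they are the same object."""


class Product:
    def __init__(self, brand):
        self._brand_slot = brand  # stands in for the reserved capability field


class Constructor:
    def __init__(self):
        self._brand = Brand()  # distinguished and private to this constructor

    def create(self):
        return Product(self._brand)

    def is_product(self, process):
        return identify(process, self._brand)


def identify(process, alleged_brand):
    """Model of the Identify operation: compare alleged and recorded brands."""
    return process._brand_slot is alleged_brand
```

A process built by any other party carries a different brand, so a constructor in this model recognizes exactly the processes it created.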
Using this kernel primitive, any constructor is able to
identify its products. Given access to the TBO constructor,
any service can identify whether it is returning to a TBO.
5.2 Assurance of Execution
There are four ways in which the client might prevent the
TBO from executing:
1. The client might reclaim its storage by destroying the
space bank (the source of storage) from which the TBO was
created. In this event the capability to the TBO will
become detectably invalid, so the server is able to
detect defection.
2. The client might reclaim TBO storage in mid-
transaction, while the server is waiting for the TBO.
The EROS kernel already makes provision for de-
fense against this: destruction of a process causes its
outstanding resume capabilities to be invoked, wak-
ing the server.
3. The client might contrive to starve the TBO of stor-
age by providing an inadequately populated space
bank. The TBO code guards against this and re-
turns an appropriate error if this occurs, allowing the
server to detect defection.
4. The client can disable the TBO’s schedule during execution.
To address the last problem, we are adding a new
feature into the next generation EROS IPC mechanism:
schedule donation. If both invoker and invokee agree to
do so, the invoker’s schedule capability will be copied to
the invokee as a side effect of the invocation. Agreement
is required so that a service can decline to execute under
caller control. Similarly, it should not be possible for an
arbitrary program to acquire a thread of control from its
caller by trickery. The L4 IPC design provides a similar mechanism.
The specification of schedule donation is that the recipient
continues to execute under the donated schedule
until such time as it alters its own scheduling
authority, either explicitly or by accepting another donation.
This implies that the donating process trusts the recipient
to give up the donated schedule when appropriate.
As the TBO is trusted, schedule donation is appropriate
here. Having established the identity and safe scheduling
control of the TBO, the server returns an extended string
to the TBO using a non-prompt return operation (one that
permits page fault servicing).
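The agreement rule for schedule donation can be sketched as follows. The `Proc` class, the string-valued schedules, and the `accepts_donation` flag are hypothetical simplifications; the sketch shows only that the schedule is copied when both parties consent and is otherwise left untouched.

```python
class Proc:
    def __init__(self, name, schedule, accepts_donation=False):
        self.name = name
        self.schedule = schedule
        self.accepts_donation = accepts_donation


def invoke_with_donation(invoker, invokee, offer_donation=False):
    """Copy the invoker's schedule to the invokee only if both sides agree."""
    if offer_donation and invokee.accepts_donation:
        invokee.schedule = invoker.schedule
        return True
    return False
```

In this model a server that donates its schedule when returning to the TBO no longer depends on the schedule the client originally supplied, so disabling that schedule cannot stall the reply.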
Once a string has been loaded into the TBO by a server
process, the only operations the client can perform are:
• Fetch the string. Doing so forces the client to donate
scheduling authority to the TBO, relieving
the server of any remaining scheduling exposure
that might arise from an error in the TBO implementation.
• Ignore the TBO. In this case the TBO will retain
the server schedule until destroyed, but as it will not
initiate instructions without an invocation from the
client, and any such invocation will force a schedule
donation, we do not really care what schedule the
TBO acts under.
Note that neither of these actions imposes addi-
tional client-controlled costs on the server-supplied TBO
scheduling authority. Scheduling isolation is preserved.
6 Related Work
While many of the pieces described here rely on several
generations of improvement in the EROS system, EROS
owes a tremendous debt to its predecessor KeyKOS [11].
A great deal of work on thread-migrating IPC has been
done in the last decade, most notably by Liedtke [20, 21],
Ford [8], and Shapiro [31]. Though the KeyKOS capabil-
ity invocation mechanism [11] predates it, most of the cur-
rent work on thread-migrating IPC derives in some mea-
sure from the “lightweight remote procedure call” work
by Bershad [2].
Work on packet filtering [26, 24, 6] is similar in flavor
to the more restricted filtering performed by filtering in-
directors. We are not aware of such filters being used to
defer packet processing, nor do they appear to have been
used to filter dynamically tagged messages.
Registers were heavily used to speed interprocess communication
in the V Distributed System [4], and their use appears
to have been independently proposed by Karger [16]
in connection with the SCAP system [17, 15]. The SCAP
design gains particular advantage if the trust relationship
between caller and callee is known to both parties.
Liedtke et al. [22] have considered selected denial of
service attacks against the L4 microkernel and its servers,
including several of the problems identified here. Their
work does not examine application-level vulnerabilities in
depth, nor does it propose design refinements to the L4
IPC architecture that might mitigate its vulnerabilities.
The dynamic return payload problem addressed in Section 4.3
is identical to the one-way network transfer problem
in multilevel secure systems, noted by Karger and
independently by Rushby [14, 28]. In a one-way network
transfer, the recipient is incapable of signaling error to the
sender and therefore must provide sufficient storage to accept
a message of unknown length.
7 Acknowledgements
The impetus for the work described here was the imple-
mentation of the CapIDL capability IDL system, which
is derived closely from earlier IDL work in CORBA.
CapIDL was jointly designed by Mark Miller and the au-
thor. A recent discussion with Dave Presotto helped spark
the last connections that led to the invention of the trusted
buffer object.
EROS builds heavily on the KeyKOS system designed
by Norm Hardy, Charles Landau, Alan Bomberger, and
Bill Frantz while at Tymshare, Inc. All of these people
have been helpful and patient in describing the workings
of KeyKOS and encouraging the development and evo-
lution of the EROS system. Bryan Ford has been kind
enough at various points to explain a number of details
in the Fluke implementation and the Mach version 4 IPC.
Jochen Liedtke took the time several years ago while
we were both at IBM to discuss the merits of timeouts
and the problem of hostile pagers. Jochen’s continuous
advances in the performance and design of microkernel
operating systems led to improvements in the EROS im-
plementation and drove us to a deeper and more careful
understanding of operating system design. Yoon-Ho Park
similarly took time to discuss the experiences of the L4
team in building the SawMill system [10].
This paper originated from a discussion with Jay Lep-
reau at the Symposium on Operating System Design and
Implementation in 1999, which first prompted us to recognize
that vulnerabilities in synchronous IPC systems arise
from the complex and subtle interaction of many factors
in their designs, and that these interactions had not been
adequately explored in the archival literature.
8 Conclusions
In addition to fast performance, an effective interprocess
communication system must provide reproducible behavior,
deal with asymmetric trust among communicating processes,
and enable support for messages that contain dynamically
sized payloads. This paper describes why simultaneous
satisfaction of these requirements is challenging,
and identifies a set of enhancements to the current
EROS system that let us meet all three design objectives.
Two key enablers of the solution proposed here are the
ability in EROS to authenticate the code executed by an
application independent of its user, and the ability (via
confinement) to protect a trusted program from tampering
by its user. No generally satisfactory means to satisfy all
of these requirements simultaneously has previously been described.
To our knowledge, no previous papers have been published
that expose in depth or adequately address the interprocess
denial of service vulnerabilities that are implicit
in synchronous IPC designs. The analysis presented here
illustrates in detail an unusual case of authority factoring:
the provider of the trusted buffer’s storage authority and
the provider of its execution authority need to be distinct
in order to prevent certain classes of denial of service at-
tacks. Based on other ongoing design activities within
the EROS community, this appears to be an example of a
general pattern that emerges in many places where multi-
plexing crosses a trust boundary.
The solution proposed here leverages application-
based authentication heavily. We rely on the ability to
identify and safely execute components that are instanti-
ated by untrusted providers, and whose storage originates
from an untrusted source (the client). It is difficult to see
how this particular way of straddling the trust boundary
can be achieved without some form of protected naming
primitive. In our TBO approach, the server relies on re-
ceiving such a name (the resume capability) as a basis for
authenticating the TBO. With care, a cryptographic hash
of the invoker executable image injected by the kernel into
every invocation might serve as a substitute for capabil-
ities. Note that this hash can be cached in the kernel’s
per-process data structure; its use need not be expensive.
Given the number of IPC systems (and more broadly,
operating systems) that do not adequately support communication
across asymmetric trust relationships, it appears
that the issues involved are not widely understood.
One goal of this paper is to ensure that the problem of
asymmetric trust is not neglected in future IPC designs,
and to describe one strategy for how to support it.
References
[1] M. Accetta, R. V. Baron, W. Bolosky, D. B. Golub, R. F.
Rashid, A. Tevanian Jr., and M. W. Young. Mach: A
new kernel foundation for UNIX development. In Proc.
1986 USENIX Summer Technical Conference, pages 93–112,
June 1986.
[2] B. Bershad, T. Anderson, E. Lazowska, and H. Levy.
Lightweight remote procedure call. In Proc. 12th Sym-
posium on Operating Systems Principles, pages 102–113,
Dec. 1989.
[3] J. B. Chen and B. N. Bershad. The impact of operating
system structure on memory system performance. In Proc.
14th Symposium on Operating Systems Principles, Dec.
[4] D. Cheriton. The V distributed system. (3), Mar. 1988.
[5] U.S. Department of Defense Trusted Computer System
Evaluation Criteria, 1985.
[6] D. Engler and M. F. Kaashoek. DPF: Fast, flexible message
demultiplexing using dynamic code generation. In Proc.
SIGCOMM ’96 Conference, pages 53–59, Stanford, CA,
USA, Aug. 1996.
[7] B. Ford, M. Hibler, J. Lepreau, R. McGrath, and P. Tullmann.
Interface and execution models in the Fluke kernel.
In Proc. 3rd Symposium on Operating System Design and
Implementation, pages 101–115, Feb. 1999.
[8] B. Ford and J. Lepreau. Evolving Mach 3.0 to a migrat-
ing threads model. In Proc. Winter USENIX Conference,
pages 97–114, Jan. 1994.
[9] E. Gabber, C. Small, J. Bruno, J. Brustoloni, and A. Silberschatz.
The Pebble component-based operating system. In
Proc. 1999 USENIX Annual Technical Conference, pages
267–282, Monterey, CA, USA, June 1999.
[10] A. Gefflaut, T. Jaeger, Y. Park, J. Liedtke, K. Elphinstone,
V. Uhlig, J. Tidswell, L. Deller, and L. Reuther. The
SawMill multiserver approach. In Proc. ACM SIGOPS
European Workshop, Sept. 2000.
[11] N. Hardy. The KeyKOS architecture. Operating Systems
Review, 19(4):8–25, Oct. 1985.
[12] T. Jaeger, K. Elphinstone, J. Liedtke, V. Panteleenko, and
Y. Park. Flexible access control using IPC redirection. In
Proc. 7th Workshop on Hot Topics in Operating Systems,
pages 191–196. IEEE, Mar. 1999.
[13] T. Jaeger, J. E. Tidswell, A. Gefflaut, Y. Park, K. J. Elphinstone,
and J. Liedtke. Synchronous IPC over transparent
monitors. In Proc. Ninth ACM/SIGOPS European Workshop
“Beyond the PC: New Challenges for the Operating
System”, Sept. 2000.
[14] P. Karger. Non-Discretionary Access Control for Decen-
tralized Computing Systems. PhD thesis, Massachusetts
Institute of Technology, Cambridge Massachusetts, May
1977. MIT/LCS/TR-179.
[15] P. Karger. Improving Security and Performance for Capa-
bility Systems. PhD thesis, University of Cambridge, Oct.
1988. Technical Report No. 149.
[16] P. A. Karger. Using registers to optimize cross-domain
call performance. ACM SIGARCH Computer Architecture
News, (2):194–204, Apr. 1989.
[17] P. A. Karger and A. J. Herbert. An augmented capability
architecture to support lattice security and traceability of
access. In Proc. of the 1984 IEEE Symposium on Security
and Privacy, pages 2–12, Oakland, CA, Apr. 1984. IEEE.
[18] P. A. Karger, M. E. Zurko, D. W. Bonin, A. H. Mason,
and C. E. Kahn. A retrospective on the VAX VMM secu-
rity kernel. IEEE Transactions on Software Engineering,
(11):1147–1165, Nov. 1991.
[19] B. W. Lampson and H. E. Sturgis. Reflections on an
operating system design. Communications of the ACM,
19(4):251–265, May 1976.
[20] J. Liedtke. Improving IPC by kernel design. In Proc. 14th
ACM Symposium on Operating System Principles, pages
175–188. ACM, 1993.
[21] J. Liedtke. Improved address-space switching on Pen-
tium processors by transparently multiplexing user ad-
dress spaces. Technical Report GMD TR 933, GMD, Nov.
[22] J. Liedtke, N. Islam, and T. Jaeger. Preventing denial-of-service
attacks on a µ-kernel for WebOSes. In Proc. HotOS-VI,
May 1997.
[23] J. Löser and M. Hohmuth. Omega0: A portable interface
to interrupt hardware for L4 systems. In Proc. First Workshop
on Common Microkernel System Platforms, Dec.
1999. Revised: Jewel Edition, January 5, 2000.
[24] S. McCanne and V. Jacobson. The BSD packet filter: A
new architecture for user-level packet capture. In Proc.
USENIX Technical Conference, pages 259–269, Jan. 1993.
[25] M. S. Miller, C. Morningstar, and B. Frantz. Capability-based
financial instruments. In Proc. Financial Cryptography
2000, Anguilla, BWI, 2000. Springer-Verlag.
[26] J. Mogul, R. Rashid, and M. Accetta. The packet filter: An
efficient mechanism for user-level network code. In Proc.
Eleventh ACM Symposium on Operating Systems Princi-
ples, pages 39–51, Austin, TX, USA, Nov. 1987.
[27] M. Rozier, V. Abrossimov, F. Armand, I. Boule, M. Gien,
M. Guillemont, F. Hermann, C. Kaiser, S. Langlois,
P. Leonard, and W. Neuhauser. Overview of the Chorus
distributed system. Technical Report CS-TR-90-25, Cho-
rus Systemes, F-78182 St. Quentin-en-Yvelines Cedex,
France, 1991.
[28] J. Rushby and B. Randell. A distributed secure system.
IEEE Computer, 16(7):55–67, 1983.
[29] M. D. Schroeder, D. D. Clark, and J. H. Saltzer. The
MULTICS kernel design project. In Proc. 6th ACM Sym-
posium on Operating Systems Principles, pages 43–56.
ACM, Nov. 1977.
[30] J. S. Shapiro, D. J. Farber, and J. M. Smith. The measured
performance of a fast local IPC. In Proc. 5th International
Workshop on Object Orientation in Operating Systems,
pages 89–94, Seattle, WA, USA, Nov. 1996. IEEE.
[31] J. S. Shapiro, J. M. Smith, and D. J. Farber. EROS: A
fast capability system. In Proc. 17th ACM Symposium on
Operating Systems Principles, pages 170–185, Kiawah Is-
land Resort, near Charleston, SC, USA, Dec. 1999. ACM.
[32] J. S. Shapiro and S. Weber. Verifying the EROS confine-
ment mechanism. In Proc. 2000 IEEE Symposium on Se-
curity and Privacy, pages 166–176, Oakland, CA, USA,
[33] W. A. Wulf, R. Levin, and S. P. Harbison. HYDRA/C.mmp:
An Experimental Computer System. McGraw Hill, 1981.
... In these scenarios, the security provided by isolation is not enough. Communication relations are a concern, given that the IPC infrastructure constitute a means for attacking sibling partitions [17,16]. One major concern whenever implementing IPC mechanisms is how to mitigate the chance of a Denial-Of-Service (DoS) attack [18]. ...
... Inherently there are some accruing benefits in terms of performance and resource management. The data transfer can happen directly between address spaces, which will reduce the message propagation latency, and no buffering within the kernel is required [17,16,14]. Synchronous communication usually encompass a donation scheme, where in a client-server scenario the requester may donate its time execution in order to quicker resolve its dependency towards the server [43]. ...
... On scenario (B) one malicious client may block the server infinitely, causing them to fail answering the requests from other clients. This problem can and should be considered in the opposite way, where a malicious server may cause a client to block, by failing to perform the desired operations at specific points in time (scenarios C and D) [16,17]. Timeouts could be used in order to overcome these problems, however from L4 family experience [14] these are not effective, due to inappropriate use from user perspective, and also because it is not a good mensurable way to determine a given timeout value. ...
Full-text available
Embedded systems, which were by definition single-purpose, have evolved rapidly and nowadays are capable of supporting applications that, priorly, would be distributed between different hardware platforms. Virtualization proved its value in other fields, providing a way to safely collocate different applications on the same platform, enforcing security through isolation. Typical virtualization solutions follow a monolithic architecture, which usually contain large Trust Computing Base (TCB). Inherently, these are difficult to maintain, and could likely hide buggy software. Microkernels advocate a minimal TCB, that is restricted to an Inter-Partition Communication (IPC) infrastructure, a scheduler and memory management. Other functionalities are implemented in user-space, isolated from the system’s critical functionalities. Service provision is achieved by leveraging Inter-Partition Communication (IPC) infrastructure, with well defined communication channels, and establishing trustworthy communication relations. The inherent complexity of properly configuring such systems requires the use of dedicated tools, aiming at easing the configuration process. Model-Driven Engineering (MDE) advocates the conception of models towards software development, which would provide a more abstract, simplified view of the final system. Model description is often paired with Domain-Specific Languages (DSLs), that are featured with generative capacities. Thus, it becomes possible to transform a more abstract system into implementation artifacts (e.g. C/C++ code). Semantic technology has also been combined to modeling technologies, providing an alternative system representation, while enhancing modeling tools with: higher consistency, interoperability, automated validation and reasoning support. Under the light of the above, a collaborative effort was conducted towards the enhancement of the in-house developed RTZVisor with microkernel-like principles, that resulted on the µRTZVisor. 
This thesis focus on the implementation of a secure IPC infrastructure, featured with a capability-based access-control facility, to improve its overall reliability by imposing Information Control Flow (ICF). Aiming at easing system’s configuration, a modeling infrastructure was conceived that enabled the description of systems to be deployed on top of µRTZVisor. The infrastructure also converts the model representation into final source code with µRTZVisor resources configuration.
... On the other hand, asynchronous communication requires a double data copy: first from the sender's address space to the kernel, and then from the kernel to the recipient's address space. Although this provokes performance degradation, it enforces the system's security by avoiding the asymmetric trust problem [43,44], where an untrustworthy partition may cause a server to be blocked indefinitely, preventing it from answering other partitions' requests, resulting in possible DOS attacks. This could be solved by the use of timeouts; however, there is no theory to determine reasonable timeout values in non-trivial systems [28]. ...
Virtualization has been deployed as a key enabling technology for coping with the ever-growing complexity and heterogeneity of modern computing systems. However, on its own, classical virtualization is a poor match for modern endpoint embedded system requirements such as safety, security, and real-time operation, which are our main target. Microkernel-based approaches to virtualization have been shown to bridge the gap between traditional and embedded virtualization. Nevertheless, existing microkernel-based solutions follow a highly para-virtualized approach, which inherently requires a significant software engineering effort to adapt guest operating systems (OSes) to run as userland components. In this paper, we present μRTZVisor, a new TrustZone-assisted hypervisor that distinguishes itself from state-of-the-art TrustZone solutions by implementing a microkernel-like architecture while following an object-oriented approach. In contrast to existing microkernel-based solutions, μRTZVisor is able to run nearly unmodified guest OSes, and, in contrast to existing TrustZone-assisted solutions, it provides a high degree of functionality and configurability, placing strong emphasis on real-time support. Our hypervisor was deployed and evaluated on a Xilinx Zynq-based platform. Experiments demonstrate that the hypervisor has a small trusted computing base (approximately 60 KB) and a performance overhead of less than 2% for a 10 ms guest-switching rate.
... This extra overhead results in an understandable performance cost. However, by focusing on asynchronous communication, we avoid the asymmetric-trust problem [20]. This issue is specific to synchronous communication and may result in deadlocks or partitions hanging indefinitely while waiting for a communication event from a compromised partition. ...
Safety has long been a major concern for the aerospace industry. The recent increase in interconnectivity, together with the ongoing trend of adopting commercial off-the-shelf computing systems, has raised several security concerns, and security is gaining attention as a vulnerability that can also affect safety. Current approaches rely on the isolation provided by the space and time partitioning of system virtualization. The problem is that existing virtualization solutions were primarily designed to deal with accidental hardware faults or software bugs, and are not ready to fully manage malicious or intentional faults. This work describes the implementation of the SecSSy hypervisor. SecSSy is a hardware-assisted virtualization solution that addresses security at several stages of system development. SecSSy relies on a secure hardware architecture as the foundation for a secure software architecture, all driven by a safe and secure development process. To the best of the authors' knowledge, this is the first solution offering such a complete security-safety synergy for aerospace systems.
Google's Android OS provides a lightweight IPC mechanism called Binder, which enables the development of feature-rich apps that seamlessly integrate services and data of other apps. Whenever apps can act both as service consumers and service providers, it is inevitable that the IPC mechanism provides message receivers with message provenance information to establish trust. However, the Android OS currently fails in providing sufficient provenance information, which has led to a number of attacks. We present an extension to the Android IPC mechanism, called Scippa, that establishes IPC call-chains across application processes. Scippa provides provenance information required to effectively prevent recent attacks such as confused deputy attacks. Our solution constitutes a system-centric approach that extends the Binder kernel module and Android's message handlers. Scippa integrates seamlessly into the system architecture and our evaluation shows a performance overhead of only 2.23% on Android OS v4.2.2.
The L4 microkernel has undergone 20 years of use and evolution. It has an active user and developer community, and there are commercial versions that are deployed on a large scale and in safety-critical systems. In this article we examine the lessons learnt in those 20 years about microkernel design and implementation. We revisit the L4 design articles and examine the evolution of design and implementation from the original L4 to the latest generation of L4 kernels. We specifically look at seL4, which has pushed the L4 model furthest and was the first OS kernel to undergo a complete formal verification of its implementation as well as a sound analysis of worst-case execution times. We demonstrate that while much has changed, the fundamental principles of minimality, generality, and high inter-process communication (IPC) performance remain the main drivers of design and implementation decisions.
The CHORUS technology has been designed for building new generations of open, distributed, scalable operating systems. CHORUS has the following main characteristics: a communication-based architecture, relying on a minimal Nucleus which integrates distributed processing and communication at the lowest level, and which implements generic services used by a set of subsystem servers to extend standard operating system interfaces (a UNIX subsystem has been developed; other subsystems such as object-oriented systems are planned); a real-time Nucleus providing real-time services which are accessible to system programmers; and a modular architecture providing scalability, and allowing, in particular, dynamic configuration of the system and its applications over a wide range of hardware and network configurations, including parallel and multiprocessor systems. CHORUS-V3 is the current version of the CHORUS Distributed Operating System, developed by Chorus systèmes. Earlier versions were studied and implemented within the Chorus research project at INRIA between 1979 and 1986. This paper presents the CHORUS architecture and the facilities provided by the CHORUS-V3 Nucleus. It also describes the UNIX subsystem built with the CHORUS technology, which provides binary compatibility with UNIX and extended UNIX services, supporting distributed applications by providing network IPC, distributed virtual memory, light-weight processes, and real-time facilities.
Code to implement network protocols can be either inside the kernel of an operating system or in user-level processes. Kernel-resident code is hard to develop, debug, and maintain, but user-level implementations typically incur significant overhead and perform poorly. The performance of user-level network code depends on the mechanism used to demultiplex received packets. Demultiplexing in a user-level process increases the rate of context switches and system calls, resulting in poor performance. Demultiplexing in the kernel eliminates unnecessary overhead. This paper describes the packet filter, a kernel-resident, protocol-independent packet demultiplexer. Individual user processes have great flexibility in selecting which packets they will receive. Protocol implementations using the packet filter perform quite well, and have been in production use for several years.
We have defined and implemented a new kernel API that makes every exported operation fully interruptible and restartable, thereby appearing atomic to the user. To achieve interruptibility, all possible states in which a thread may become blocked for a "long" time are completely representable as valid kernel API calls, without needing to retain any kernel internal state. This API provides important functionality. Since all kernel operations appear atomic, services such as transparent checkpointing and process migration that need access to the complete and consistent state of a process can be implemented by ordinary user-mode processes. Atomic operations also enable applications to provide reliability in a more straightforward manner. This API also allows novel kernel implementation techniques and evaluation of existing techniques, which we explore in this paper. Our new kernel's single source implements either the "process" or the "interrupt" execution model on both uni- and multiprocessors, depending only on a configuration option affecting a small amount of code. Our kernel structure avoids the major complexities of traditional implementations of the interrupt model, neither requiring ad hoc saving of state, nor limiting the operations (such as demand-paged memory) that can be handled by the kernel. Finally, our interrupt model configuration can support the process model for selected components, with the attendant flexibility benefits. We report preliminary measurements comparing fully, partially, and non-preemptible configurations of both process and interrupt model implementations. We find that the interrupt model has a modest speed edge in some benchmarks, maximum latency varies nearly three orders of magnitude, average latency varies by a factor of six, and memory use favors the interrupt model as expected, but not by a large amount. We find that the overhead for restarting the most costly kernel operation ranges from 2–8%.
In this paper we evaluate the memory system behavior of two distinctly different implementations of the UNIX operating system: DEC's Ultrix, a monolithic system, and Mach 3.0 with CMU's UNIX server, a microkernel-based system. In our evaluation we use combined system and user memory reference traces of thirteen industry-standard workloads. We show that the microkernel-based system executes substantially more non-idle system instructions for an equivalent workload than the monolithic system. Furthermore, the average instruction for programs running on Mach has a higher cost, in terms of memory cycles per instruction, than on Ultrix.
This paper describes a new technique to improve the performance of cross-domain calls and returns in a capability-based computer system. Using register optimization information obtained from the compiler, a trusted linker can minimize the number of registers that must be saved, restored, or cleared when changing from one protection domain to another. The size of the performance gain depends on the level of trust between the calling and called protection domains. The paper presents alternate implementations for an extended VAX architecture and for a RISC architecture and reports performance measurements done on a re-microprogrammed VAX-11/730 processor.