Ovid: A Software-Defined Distributed Systems Framework
Deniz Altınbüken
Cornell University
Robbert van Renesse
Cornell University
Abstract
We present Ovid, a framework for building evolvable
large-scale distributed systems that run in the cloud.
Ovid constructs and deploys distributed systems as a col-
lection of simple components, creating systems suited
for containerization in the cloud. Ovid supports evolu-
tion of systems through transformations, which are auto-
mated refinements. Examples of transformations include
replication, batching, sharding, and encryption. Ovid
transformations guarantee that an evolving system still
implements the same specification. Moreover, systems
built with transformations can be combined with other
systems to implement more complex infrastructure ser-
vices. The result of this framework is a software-defined
distributed system, in which a logically centralized con-
troller specifies the components, their interactions, and
their transformations.
1 Introduction
Containerization in the cloud is a lightweight virtualization technique that is gaining popularity, helping developers build and run applications comprising multiple systems in the cloud. Container technology creates an ecosystem in which systems can be deployed quickly and composed with other systems. While we have a good understanding of how to build cloud services that comprise many systems, we do not have the technology to reason about how these systems can be reconfigured and evolved. A given configuration of systems in the cloud may not be able to scale to evolving workloads; improvements such as sharding, replication, and batching will need to be incorporated over time. In addition, online software updates will be necessary for bug fixes, new features, enhanced security, and so on.
We describe the design and initial implementation of
Ovid, a container-based framework for building, main-
taining, and evolving distributed systems that run in the
cloud. Each container runs one or more agents that com-
municate by exchanging messages. An agent is a self-
contained state machine that transitions in response to
messages it receives and may produce output messages
for other agents. Initially, a system designer does not
worry about scaling, failure handling, and so on, and de-
velops relatively simple agents to implement a particu-
lar service. Next, we apply automatic transformations, such as replication or sharding, to these agents. Ovid provides a
suite of such transformations. Each transformation re-
places an agent by one or more new agents and creates a
refinement mapping [15] from the new agents to the orig-
inal agent to demonstrate correctness. Transformations
can be applied recursively, resulting in a tree of transfor-
mations.
When an agent is replaced by a set of agents, the question arises of what happens to messages that are sent to the original agent. To handle these, each transformed agent has one
or more ingress agents that receive such incoming mes-
sages. The routing is governed by routing tables: each
agent has a routing table that specifies, for each destina-
tion address, what the destination agent is. Inspired by
software-defined networks [9, 17], Ovid has a logically
centralized controller, itself an agent, that determines the
contents of the routing tables.
Besides routing, the controller determines where
agents run and uses containerization to run agents as
lightweight processes sharing resources. Agents may be
co-located to reduce communication overhead, or run in
different locations to benefit performance or failure inde-
pendence. Ovid also supports on-the-fly reconfiguration.
The result is what can be termed a “software-defined
distributed system” in which a programmable controller
manages a running system.
We have built a prototype implementation of Ovid that
includes an interactive and visual tool for specifying and
transforming distributed systems and a run-time environ-
ment that deploys and runs the agents in containers in
the cloud. The interactive designer makes it relatively
easy, even for novice users, to construct systems that are
scalable and reliable. The designer can be run from any
web browser. The run-time environment, currently only
supporting agents and transformations written in Python,
manages all execution and communication fully auto-
matically. Eventually, Ovid is intended to support agents
written in a variety of programming languages.
2 Design
A system in Ovid consists of a set of agents that communicate by exchanging messages. Each agent has a unique identifier and messages are sent to specific agent identifiers. When an agent receives a message, it updates its state and it may generate output messages that are sent to other agents. Every agent has a routing table that maps agent identifiers to agent identifiers.

Figure 1: Boxes (rectangles) and agents (circles) for the key-value store that has been transformed to be sharded two ways.
For example, if a web server, acting as a client, uses a
key-value store to store data, it can send GET and PUT
requests to the key-value store using an agent identi-
fier, say ‘KVS’. This means that the web server knows the
key-value store as ‘KVS’. The routing table of the web
server must then contain an entry that maps ‘KVS’ to
the unique agent identifier of the key-value store agent,
and similarly, if the key-value store knows the web server
as a ‘client’, the routing table of the key-value store
agent must contain an entry that maps ‘client’ to the
identifier of the web server.
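To make the routing idea concrete, the following minimal sketch shows a web server whose routing table maps the local name ‘KVS’ to the key-value store's agent identifier. The names (Agent, Network, route, send) are illustrative and are not Ovid's actual API.

```python
# A sketch of agents addressing each other through routing tables.
class Agent:
    def __init__(self, ident, network):
        self.ident = ident          # globally unique agent identifier
        self.routing_table = {}     # local name -> destination agent identifier
        self.network = network      # delivers messages by agent identifier

    def route(self, local_name, dest_ident):
        self.routing_table[local_name] = dest_ident

    def send(self, local_name, message):
        # The agent addresses messages by the name it knows ('KVS');
        # the routing table resolves that name to a concrete identifier.
        dest = self.routing_table[local_name]
        self.network.deliver(dest, self.ident, message)


class Network:
    """Toy in-process 'network' that maps identifiers to agent objects."""
    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents[agent.ident] = agent

    def deliver(self, dest_ident, src_ident, message):
        print(f"{src_ident} -> {dest_ident}: {message}")


net = Network()
web_server = Agent("WebServer1", net)
kvs = Agent("KVS", net)
net.register(web_server)
net.register(kvs)

# The web server knows the store as 'KVS'; the key-value store knows
# its caller as 'client'. Before any transformation the mappings are
# simply the identifiers of the corresponding agents.
web_server.route("KVS", "KVS")
kvs.route("client", "WebServer1")

web_server.send("KVS", {"op": "PUT", "key": "x", "value": 1})
```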
An agent can be transformed, that is, replaced with one
or more new agents in such a way that the new agents col-
lectively implement the same functionality as the origi-
nal agent from the perspective of the other, unchanged,
agents. In the context of a particular transformation, we
call the original agent virtual, and the agents that result
from the transformation physical.
For example, consider the key-value store used by the
web server. We can shard the virtual agent by creating
two physical copies of it, one responsible for all keys that
satisfy some predicate P(key), and the other responsible
for the other keys. To glue everything together, we add
additional physical agents: a collection of ingress proxy
agents, one for each client, and two egress proxy agents,
one for each key-value store agent.
An ingress proxy agent mimics the virtual agent
‘KVS’ to its client, while an egress proxy agent mimics
the client to a shard of the key-value store. The routing
table of the client agent is modified to route messages to
‘KVS’ to the ingress proxy.
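The two proxy roles can be sketched as follows. This is an illustrative sketch only: the class names are ours, P is an arbitrary example predicate, and the real Ovid proxies are generated by the sharding transformation.

```python
def P(key):
    # Example sharding predicate: split the key space by hash parity.
    return hash(key) % 2 == 0

class ShardIngressProxy:
    """Mimics the virtual 'KVS' agent to one client and forwards each
    request to the egress proxy of the responsible shard."""
    def __init__(self, send):
        self.send = send  # function(dest_identifier, message)

    def on_message(self, message):
        shard = ("KVS/ShardEgressProxy1" if P(message["key"])
                 else "KVS/ShardEgressProxy2")
        # Encapsulate the original request so the egress proxy can
        # replay it to its shard as if it came from the client.
        self.send(shard, {"inner": message})

class ShardEgressProxy:
    """Mimics the client to one shard of the key-value store."""
    def __init__(self, send):
        self.send = send

    def on_message(self, envelope):
        # 'KVS' resolves to this proxy's own shard in its routing table.
        self.send("KVS", envelope["inner"])

# Tiny usage example: capture what the ingress proxy forwards.
sent = []
ingress = ShardIngressProxy(lambda dest, msg: sent.append((dest, msg)))
ingress.on_message({"op": "GET", "key": "user:42"})
print(sent)
```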
Figure 1 illustrates the configuration as a directed
graph and demonstrates the use of layering and encap-
sulation, common concepts in distributed systems and
networking. Every physical agent is pictured as a sep-
arate node and agents that are co-located are shown in
the same rectangle representing a box, which acts as
a container. The directed edges between nodes illus-
trate the message traffic patterns between agents. More-
over, the configuration shows two layers with an abstrac-
tion boundary. The top layer shows an application and
its clients. The bottom layer multiplexes and demulti-
plexes. This is similar to multiplexing in common net-
work stacks. For example, the EtherType field in an Eth-
ernet header, the protocol field in an IP header, and the
destination port in a TCP header all specify what the next
protocol is to handle the encapsulated payload. In our
system, agent identifiers fulfill that role. Even if there
are multiple layers of transformation, each layer would
use, uniformly, an agent identifier for demultiplexing.
Note that a transformation is essentially a special case
of a refinement in which an executable specification of
an agent is refined to another executable specification.
Hence, in Ovid every transformation maps the state of
the physical agents to the state of the virtual agent and
the transitions of the physical agents to transitions in the
virtual agent, in effect exhibiting the correctness of the
transformation. Another paper on Ovid [4] includes ex-
amples of these refinements and focuses on the theory of
how they work, while this paper presents the design and
implementation of Ovid.
Ovid agents and transformations can be written in
any programming language, as long as they conform to
a specified line protocol (usually in the form of JSON
records). The example specifically illustrates sharding,
but there are many other kinds of transformations that
can be applied in a similar fashion, among which:
State Machine Replication: similar to sharding, this
deploys multiple copies of the original agent. The
proxies in this case run a replication protocol that
ensures that all copies receive the same messages in
the same order;
Primary-Backup Replication: this can be applied to
applications that keep state on a separate disk us-
ing read and write operations. In our model, such a
disk is considered a separate agent. Fault-tolerance
can be achieved by deploying multiple disk agents,
one of which is considered primary and the others
backups;
Load Balancing: also similar to sharding, and par-
ticularly useful for stateless agents, a load balancing
agent is an ingress proxy agent that spreads incom-
ing messages to a collection of server agents;
Encryption, Compression, and Batching: between
any pair of agents, one can insert a pair of agents
that encode and decode sequences of messages re-
spectively;
Monitoring, Auditing: between any pair of agents,
an agent can be inserted that counts or logs the mes-
sages that flow through it.
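As a simple illustration of the last of these, the sketch below shows an interposed monitoring agent that counts and logs messages before forwarding them unchanged; the names are illustrative rather than Ovid's actual classes.

```python
import json

class MonitoringAgent:
    """Inserted between two agents: counts and logs each message, then
    forwards it unchanged so neither endpoint notices the insertion."""
    def __init__(self, downstream, send):
        self.downstream = downstream  # identifier of the original destination
        self.send = send              # function(dest_identifier, message)
        self.count = 0

    def on_message(self, message):
        self.count += 1
        # Log the message as a JSON record.
        print(json.dumps({"seq": self.count, "msg": message}))
        self.send(self.downstream, message)

monitor = MonitoringAgent("KVS", lambda dest, msg: None)
monitor.on_message({"op": "PUT", "key": "x", "value": 1})
```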
Above we have presented transformations on individ-
ual agents. In limited form, transformations can some-
times also be applied to sets of agents. For example, a
pipeline of agents (in which the output of one agent forms
the input to the next) acts essentially as a single agent,
and transformations that apply to a single agent can also
be applied to pipelines. Some transformations, such as
Nysiad [11], apply to particular configurations of agents.
For simplicity, we focus here on transformations of in-
dividual agents only, but believe the techniques can be
generalized more broadly.
Every agent in the system has a unique identifier.
The agents that result from a transformation have identi-
fiers that are based on the original agent’s identifier, by
adding new identifiers in a ‘path name’ style. Thus
an agent with identifier ‘X/Y/Z’ is part of the imple-
mentation of agent ‘X/Y’, which itself is part of the
implementation of agent ‘X’. In our running example,
assume the identifier of the original key-value store is
‘KVS’. Then we can call its shards ‘KVS/Shard1’
and ‘KVS/Shard2’. We can call the server prox-
ies ‘KVS/ShardEgressProxyX’, and we can call the client proxies ‘KVS/ShardIngressProxyX’, for some X.
The client agent in this example still sends messages
to agent identifier ‘KVS’, but due to the transformation
the original ‘KVS’ agent no longer exists physi-
cally. The client’s routing table maps agent identifier
‘KVS’ to ‘KVS/ShardIngressProxyX’ for some X. Agent ‘KVS/ShardIngressProxyX’
encapsulates the received message and sends it to
agent identifier ‘KVS/ShardEgressProxy1’ or
‘KVS/ShardEgressProxy2’ depending on the
hash function. Assuming those proxy agents have not
been transformed themselves, there is again a one-to-one
mapping to corresponding agent identifiers. Each egress
proxy ends up sending to agent identifier ‘KVS’. Agent
identifier ‘KVS’ is mapped to agent ‘KVS/Shard1’
at agent ‘KVS/ShardEgressProxy1’
and to agent ‘KVS/Shard2’ at agent
‘KVS/ShardEgressProxy2’. Note that if
identifier X in a routing table is mapped to an identifier Y, it is always the case that X is a prefix of Y (and is identical to Y in the case the agent has not been refined).
Given the original system and the transformations that
have been applied, it is always possible to determine
the destination agent for a message sent by a particular
source agent to a particular agent identifier.
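The routing tables for the sharded example might then look as follows; this is an illustrative sketch (the tables are ours, not generated by Ovid), with an assertion checking the prefix property just described.

```python
# Per-agent routing tables: local destination identifier -> physical agent.
routing_tables = {
    "Client1": {"KVS": "KVS/ShardIngressProxy1"},
    "KVS/ShardIngressProxy1": {
        "KVS/ShardEgressProxy1": "KVS/ShardEgressProxy1",
        "KVS/ShardEgressProxy2": "KVS/ShardEgressProxy2",
    },
    "KVS/ShardEgressProxy1": {"KVS": "KVS/Shard1"},
    "KVS/ShardEgressProxy2": {"KVS": "KVS/Shard2"},
}

def resolve(agent, dest_identifier):
    mapped = routing_tables[agent][dest_identifier]
    # The destination identifier is always a prefix of (or identical to)
    # the physical identifier it maps to.
    assert mapped == dest_identifier or mapped.startswith(dest_identifier + "/")
    return mapped

print(resolve("Client1", "KVS"))                # KVS/ShardIngressProxy1
print(resolve("KVS/ShardEgressProxy2", "KVS"))  # KVS/Shard2
```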
Figure 2: The Logical Agent Transformation Tree
(LATT) for the two-way sharded ‘KVS’ key-value store
generated by the visualizer.
We represent a system and its transformation in a Log-
ical Agent Transformation Tree (LATT). A LATT is a
directed tree of agents. The root of this tree is a vir-
tual agent that we call the System Agent. The “children”
of this root are the agents before transformation. Each
transformation of an agent (or set of agents) then results
in a collection of children for the corresponding node.
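A LATT can be represented with a simple tree data structure. The sketch below is illustrative and omits the per-transformation metadata (such as the refinement mapping) that Ovid keeps at each node.

```python
class LATTNode:
    def __init__(self, identifier):
        self.identifier = identifier
        self.children = []   # agents produced by transforming this agent

    def transform(self, child_suffixes):
        # Children inherit the parent identifier in 'path name' style.
        for suffix in child_suffixes:
            self.children.append(LATTNode(self.identifier + "/" + suffix))
        return self.children

# Root of the tree: the virtual System Agent, whose children are the
# agents of the untransformed system.
system = LATTNode("System")
kvs = LATTNode("KVS")
system.children.append(kvs)

# Applying the two-way sharding transformation to 'KVS' adds its
# physical agents as children of the 'KVS' node.
kvs.transform(["Shard1", "Shard2",
               "ShardIngressProxy1", "ShardEgressProxy1", "ShardEgressProxy2"])
```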
A deployed system evolves. Ovid supports on-the-fly
reconfiguration based on “wedging” agents [6, 2]. By
wedging an agent, it can no longer make transitions, al-
lowing its state to be captured and re-instantiated for a
new agent. Brevity prevents us from going into detail about how the state is sharded and transferred, but the refine-
ment mapping provides the information about how the
state of the transformed agent has to map back to the
original agent. By updating routing tables the reconfigu-
ration can be completed.
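Under simplifying assumptions (a single agent, state small enough to copy wholesale), the reconfiguration sequence can be sketched as follows; the real protocol also relies on the refinement mapping to reshard and transfer state, and the identifiers used here are hypothetical.

```python
class WedgeableAgent:
    """Sketch of an agent that can be wedged: once wedged it makes no
    further transitions, so its state can be captured safely."""
    def __init__(self):
        self.state = {}
        self.wedged = False

    def on_message(self, message):
        if self.wedged:
            return
        self.state[message["key"]] = message["value"]

def reconfigure(old_agent, new_agent, client_routing_tables, name, new_ident):
    old_agent.wedged = True                    # 1. wedge the old agent
    new_agent.state = dict(old_agent.state)    # 2. capture and re-instantiate its state
    for table in client_routing_tables:
        table[name] = new_ident                # 3. repoint the clients' routing tables

old, new = WedgeableAgent(), WedgeableAgent()
old.on_message({"key": "x", "value": 1})
client_table = {"KVS": "KVS/v1"}
reconfigure(old, new, [client_table], "KVS", "KVS/v2")
print(new.state, client_table)   # {'x': 1} {'KVS': 'KVS/v2'}
```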
Eventually we would like these transformations to
be determined and updated automatically based on the
workload of the application and the environment in
which it runs. In the current system, transformations are
manually selected and updated by the user with the help
of a visualizer component.
3 Implementation
Our implementation comprises two main parts. The designer creates the LATTs that represent the desired distributed system and transforms these LATTs, creating a configuration that can be deployed. Ovid core then
takes the configuration generated by the designer, builds
the LATT as a real system with a collection of agent pro-
cesses and deploys it in a given setting.
The designer models a given distributed system in the
form of a LATT, and applies transformations on this
LATT. It is implemented in 1200 lines of JavaScript, making it easy to run in a web browser. The
designer has a visualizer component that can draw the
LATT for a given system in the form of a graph. Users
can select the transformations to be applied to a given
system and these transformations create a new LATT that
represents the new distributed system, as a combination
of agents.
Using the LATT visualizer, the user can see how trans-
formations change the LATT of a given system and
which components are created as a result of the transfor-
mation. The corresponding graph for the LATT is kept
up-to-date automatically. The visualizer also shows the
high-level specifications of systems in the environment
and how they are connected to each other. Using this in-
formation, the user can see how transformations affect
the high-level specification of a system and easily under-
stand how a given system interacts with other systems.
When a system is transformed, only the LATT of that
system is affected, but when it comes to deployment,
other systems that are connected to the given system may
be affected as well. The transformation function updates
the deployment information for the environment once the
LATT of a given system is updated. For example, Fig-
ure 2 shows the LATT generated by the visualizer for
the two-way sharded key-value store we used as our run-
ning example. During this transformation, the LATTs of the clients that are connected to the ‘KVS’ system are not transformed, but the deployment information of the clients is updated to include the ingress proxies, so that the clients can send their requests to the correct shard. The
designer then creates a configuration that can be directly
deployed by Ovid core.
Ovid core builds and deploys a distributed system cre-
ated by the designer in a cloud environment. The cur-
rent version of Ovid core is implemented with 5000 lines
of Python code. Agents run as Python threads inside a
box, which acts as a container, that is, an execution en-
vironment for agents. Eventually, we want to support
agents written in any programming language running in
containers maintained by platforms such as Docker [18]
or Kubernetes [1].
For each agent, the box keeps track of the agent’s at-
tributes and runs transitions for messages that have ar-
rived. In addition, each box runs a box manager agent
that supports management operations such as starting a
new agent or updating the Agent Routing Table (ART)
of a given agent.
Boxes also implement the reliable transport of mes-
sages between agents. A box is responsible for making
sure that the set of output messages of an agent running
on the box is transferred to the sets of input messages of
the destination agents. This assumes that, for each destination agent, there is an entry for the message’s agent identifier in the source agent’s routing table, and the box also
needs to know how to map the destination agent identi-
fier to one or more network addresses of the box that runs
the destination agent. A box maintains the Box Routing
Table (BRT) for this purpose.
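A hedged sketch of this delivery path, assuming simplified dictionary-based ART and BRT structures (not Ovid's actual data layout), is shown below.

```python
class Box:
    """Illustrative box: hosts agents, their ARTs, and one BRT."""
    def __init__(self):
        self.agents = {}   # agent identifier -> agent object (co-located)
        self.arts = {}     # agent identifier -> its ART (name -> identifier)
        self.brt = {}      # agent identifier -> network address of its box

    def forward(self, src_ident, dest_name, message):
        art = self.arts[src_ident]
        dest_ident = art[dest_name]          # ART lookup (may be an "ART miss")
        if dest_ident in self.agents:
            # Co-located destination: deliver through shared memory.
            self.agents[dest_ident].on_message(message)
        else:
            address = self.brt[dest_ident]   # BRT lookup (may be a "BRT miss")
            self.transmit(address, dest_ident, message)

    def transmit(self, address, dest_ident, message):
        print(f"send to {dest_ident} at {address}: {message}")


class EchoAgent:
    def on_message(self, message):
        print("delivered locally:", message)

box = Box()
box.agents["KVS/Shard1"] = EchoAgent()
box.arts["Client1"] = {"KVS": "KVS/Shard1"}
box.forward("Client1", "KVS", {"op": "GET", "key": "x"})
```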
Co-locating agents in a box saves message overhead,
since co-located agents can communicate efficiently via
shared memory instead of message passing. Currently
a small PUT request takes on average 0.29 milliseconds
when the client is co-located with the server and 1.38
milliseconds when the client and server are running in
different boxes on Linux machines that are connected us-
ing a 1 Gbit Ethernet.
All routing information is maintained by the controller,
including the list of boxes and their network addresses,
the list of agents and in which boxes they are running,
and the agent identifier mappings used to route messages.
As in other software-defined architectures, we deploy
a logically centralized controller for administration. The
controller itself is just another agent, and has identi-
fier ‘controller’. The controller agent can itself be transformed by replication, sharding, and so on, and for scale it may also be hierarchically structured. Depending on how the system is deployed, the controller can be transformed and deployed accordingly, for instance across a WAN. However,
here we assume for simplicity that the controller agent
is physically centralized and runs on a specific box.
As a starting point, the controller is configured with
the LATT, as well as the identifiers of the box managers
on those boxes. The BRT of the box in which the con-
troller runs is configured with the network addresses of
the box managers.
First, the controller sends a message to each box man-
ager imparting the network addresses of the box in which
the controller agent runs. Upon receipt, the box manager
adds a mapping for the controller agent to its BRT. The
controller can then instantiate the agents of the LATT,
which is accomplished by the controller sending requests
to the various box managers.
Initially, the agents’ routing tables contain only the
‘controller’ mapping. When an agent sends a message to an agent identifier for which its routing table has no entry, there is an “ART miss” event. On such an event, the agent ends up implic-
itly sending a request to the controller asking it to resolve
the agent identifier to agent identifier binding. The con-
troller uses the LATT to determine this binding and re-
sponds to the client with the destination agent identifier.
The client then adds the mapping to its routing table.
Figure 3: The physical deployment of a system in the cloud. Multiple agents can run in a box and multiple boxes can run on machines that are connected to each other through the network.

Next, the box tries to deliver the message to the destination agent. To do this, the box looks up the destination agent identifier in its BRT, and may experience a “BRT miss”. In this case, the box sends a request to the controller agent asking to resolve that binding as well. The destination agent may be within the same box as the source agent, but this can only be learned from the controller. One may think of routing tables as caches for the routes that the controller decides.
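The miss handling can be sketched as follows. Controller.resolve_agent and resolve_box are hypothetical stand-ins for the controller's LATT-based and placement-based lookups, and in Ovid the requests are ordinary messages to the ‘controller’ agent rather than direct calls.

```python
class Controller:
    def __init__(self, latt_bindings, placements):
        self.latt_bindings = latt_bindings  # (src, name) -> destination identifier
        self.placements = placements        # identifier -> box network address

    def resolve_agent(self, src_ident, dest_name):
        return self.latt_bindings[(src_ident, dest_name)]

    def resolve_box(self, dest_ident):
        return self.placements[dest_ident]

controller = Controller(
    {("Client1", "KVS"): "KVS/ShardIngressProxy1"},
    {"KVS/ShardIngressProxy1": "10.0.0.2:7000"},
)

art = {"controller": "controller"}   # initially only the controller mapping
brt = {}

def send(src_ident, dest_name, message):
    if dest_name not in art:                    # "ART miss": ask the controller
        art[dest_name] = controller.resolve_agent(src_ident, dest_name)
    dest_ident = art[dest_name]
    if dest_ident not in brt:                   # "BRT miss": ask the controller
        brt[dest_ident] = controller.resolve_box(dest_ident)
    print(f"deliver {message} to {dest_ident} at {brt[dest_ident]}")

send("Client1", "KVS", {"op": "GET", "key": "x"})
```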
Figure 3 shows a possible layout of systems created
using the Ovid framework. There are three physical ma-
chines connected through the network. Machines can run
multiple boxes, acting as containers, which may run mul-
tiple agents themselves.
Ovid core is implemented in an object-oriented man-
ner, where every agent in the system, including the box
managers and the controller, extends an Agent class. Moreover, each transformation and the agents that implement it are packaged as a separate module, and each module extends a base module. The base module
includes the base agent implementation and the under-
lying messaging layer as well as other utility libraries.
On top of this base module, every module includes trans-
formation specific agents such as the ingress and egress
proxies. Ovid is designed in an object-oriented and mod-
ular fashion to be easily evolvable itself. New transfor-
mations can be added to Ovid by adding new modules
that are independent of existing modules.
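For instance, a new batching module could be added alongside the existing ones by extending the base agent class. The sketch below uses illustrative names and is not the actual Ovid module layout.

```python
class Agent:
    """Base agent from the base module: a state machine driven by
    incoming messages, with send() supplied by the messaging layer."""
    def __init__(self, ident, send):
        self.ident = ident
        self.send = send

    def on_message(self, message):
        raise NotImplementedError

class BatchingAgent(Agent):
    """Transformation-specific agent from a (hypothetical) batching
    module: buffers messages and forwards them as a single batch."""
    def __init__(self, ident, send, downstream, batch_size=10):
        super().__init__(ident, send)
        self.downstream = downstream
        self.batch_size = batch_size
        self.pending = []

    def on_message(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.send(self.downstream, {"batch": self.pending})
            self.pending = []
```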
4 Related Work
There has been much work on support for developing
distributed systems as well as deploying long-lived dis-
tributed systems that can adapt to evolving environments.
Mace [13] is a language-based solution to automati-
cally generate complex distributed system deployments
using high-level language constructs. Orleans [8] and
Sapphire [22] offer distributed programming platforms
to simplify programming distributed applications that
run in cloud environments.
In [21], Wilcox et al. present a framework called Verdi
for implementing practical fault-tolerant distributed sys-
tems and then formally verifying that the implementa-
tions meet their specifications. Systems like Verdi can be
used in combination with Ovid to build provably correct
large-scale infrastructure services that comprise multiple
distributed systems. These systems can be employed to
prove the safety and liveness properties of different mod-
ules in Ovid, as well as the distributed systems that are
transformed by Ovid.
Another approach to implementing evolvable dis-
tributed systems is building reconfigurable systems. Re-
configurable distributed systems [5, 7, 12, 14] support
the replacement of their sub-systems. In [3], Ajmani
et al. propose automatically upgrading the software of
long-lived, highly-available distributed systems gradu-
ally, supporting multi-version systems. Horus [20, 16]
and Ensemble [10, 19] employ a modular approach to
building distributed systems using micro-protocols that
can be combined to create protocols.
5 Conclusion and Future Work
Ovid is a software-defined distributed systems frame-
work for building large-scale distributed systems that
have to support evolution in the cloud. Using Ovid,
a user designs, builds and deploys distributed systems
composed of agents that run in containers. Our proto-
type implementation of Ovid supports agents and trans-
formations of these agents written in Python. Next, we
will support agents written in any programming language
running in containers. We also want to develop tools to
verify performance and reliability objectives of a deploy-
ment. We then plan to evaluate various large-
scale distributed systems developed using Ovid. Even-
tually, Ovid will be able to create complex distributed
infrastructure services as a combination of systems run-
ning in containers in the cloud.
Acknowledgments
The authors are supported in part by AFOSR grants
FA2386-12-1-3008, F9550-06-0019, FA9550-11-1-
0137, by NSF grants CNS-1601879, 0430161, 0964409,
1040689, 1047540, 1518779, 1561209, 1601879,
CCF-0424422, by ONR grants N00014-01-1-0968,
N00014-09-1-0652, by DARPA grants FA8750-10-2-
0238, FA8750-11-2-0256, and by gifts from Yahoo!,
Microsoft Corporation, Infosys, Google, Facebook Inc.
References
[1] Kubernetes. http://kubernetes.io/. Accessed March 7, 2016.

[2] Abu-Libdeh, H., van Renesse, R., and Vigfusson, Y. Leveraging sharding in the design of scalable replication protocols. In Proceedings of the Symposium on Cloud Computing (Farmington, PA, USA, October 2013), SoCC '13.

[3] Ajmani, S., Liskov, B., and Shrira, L. Modular software upgrades for distributed systems. In Proceedings of the 20th European Conference on Object-Oriented Programming (Berlin, Heidelberg, 2006), ECOOP '06, Springer-Verlag, pp. 452–476.

[4] Altınbüken, D., and van Renesse, R. Ovid: A software-defined distributed systems framework to support consistency and change. IEEE Data Engineering Bulletin (Mar. 2016).

[5] Bidan, C., Issarny, V., Saridakis, T., and Zarras, A. A dynamic reconfiguration service for CORBA. In 4th International Conference on Configurable Distributed Systems (1998), ICCDS '98, IEEE Computer Society Press, pp. 35–42.

[6] Birman, K., Malkhi, D., and van Renesse, R. Virtually synchronous methodology for dynamic service replication. Tech. Rep. MSR-TR-2010-151, Microsoft Research, 2010.

[7] Bloom, T. Dynamic module replacement in a distributed programming system. Tech. Rep. MIT-LCS-TR-303, MIT, 1983.

[8] Bykov, S., Geller, A., Kliot, G., Larus, J., Pandya, R., and Thelin, J. Orleans: Cloud computing for everyone. In ACM Symposium on Cloud Computing (SOCC '11) (October 2011), ACM.

[9] Casado, M., Freedman, M. J., Pettit, J., Luo, J., McKeown, N., and Shenker, S. Ethane: Taking control of the enterprise. In Proceedings of the 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (New York, NY, USA, 2007), SIGCOMM '07, ACM, pp. 1–12.

[10] Hayden, M. G. The Ensemble System. PhD thesis, Cornell University, Ithaca, NY, USA, 1998. AAI9818467.

[11] Ho, C., van Renesse, R., Bickford, M., and Dolev, D. Nysiad: Practical protocol transformation to tolerate Byzantine failures. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (Berkeley, CA, USA, 2008), NSDI '08, USENIX Association, pp. 175–188.

[12] Hofmeister, C. R., and Purtilo, J. M. A framework for dynamic reconfiguration of distributed programs. In Proceedings of the 11th International Conference on Distributed Computing Systems (1991), ICDCS 1991, pp. 560–571.

[13] Killian, C. E., Anderson, J. W., Braud, R., Jhala, R., and Vahdat, A. M. Mace: Language support for building distributed systems. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2007), PLDI '07, ACM, pp. 179–188.

[14] Kramer, J., and Magee, J. The evolving philosophers problem: Dynamic change management. IEEE Transactions on Software Engineering 16, 11 (November 1990), 1293–1306.

[15] Lamport, L. Specifying concurrent program modules. Transactions on Programming Languages and Systems 5, 2 (April 1983), 190–222.

[16] Liu, X., Kreitz, C., van Renesse, R., Hickey, J., Hayden, M., Birman, K., and Constable, R. Building reliable, high-performance communication systems from components. In 17th ACM Symposium on Operating System Principles (Kiawah Island Resort, SC, USA, December 1999).

[17] McKeown, N., Anderson, T., Balakrishnan, H., Parulkar, G., Peterson, L., Rexford, J., Shenker, S., and Turner, J. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review 38, 2 (March 2008), 69–74.

[18] Merkel, D. Docker: Lightweight Linux containers for consistent development and deployment. Linux Journal 2014, 239 (March 2014).

[19] van Renesse, R., Birman, K. P., Hayden, M., Vaysburd, A., and Karr, D. Building adaptive systems using Ensemble. Software Practice and Experience 28, 9 (August 1998), 963–979.

[20] van Renesse, R., Birman, K. P., and Maffeis, S. Horus: A flexible group communication system. Communications of the ACM 39, 4 (April 1996), 76–83.

[21] Wilcox, J. R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M. D., and Anderson, T. Verdi: A framework for implementing and formally verifying distributed systems. In Proceedings of the 36th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation (Portland, OR, USA, June 2015), PLDI 2015.

[22] Zhang, I., Szekeres, A., Aken, D. V., Ackerman, I., Gribble, S. D., Krishnamurthy, A., and Levy, H. M. Customizable and extensible deployment for mobile/cloud applications. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14) (Broomfield, CO, Oct. 2014), USENIX Association, pp. 97–112.
7
Chapter
This paper presents Escher, an approach to build and deploy multi-tiered cloud-based applications, and outlines the framework that supports it. Escher is designed to allow systems of systems to be derived methodically and to evolve over time, in a modular way. To this end, Escher includes (i) a novel authenticated message bus that hides from one another the low-level implementation details of different tiers of a distributed system; and (ii) general purpose wrappers that take an implementation of an application and deploy, for example, a sharded or replicated version of an application automatically.
Conference Paper
Full-text available
Distributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder, drop, or duplicate packets. Further, their behavior is often too complex to permit exhaustive testing. Bugs in these systems have led to the loss of critical data and unacceptable service outages. We present Verdi, a framework for implementing and formally verifying distributed systems in Coq. Verdi formalizes various network semantics with different faults, and the developer chooses the most appropriate fault model when verifying their implementation. Furthermore, Verdi eases the verification burden by enabling the developer to first verify their system under an idealized fault model, then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden. To demonstrate Verdi's utility, we present the first mechanically checked proof of linearizability of the Raft state machine replication algorithm, as well as verified implementations of a primary-backup replication system and a key-value store. These verified systems provide similar performance to unverified equivalents.
Article
Full-text available
In designing and building distributed systems, it is common engineering practice to separate steady-state ("normal") operation from abnormal events such as recovery from failure. This way the normal case can be optimized extensively while recovery can be amortized. However, integrating the recovery procedure with the steady-state protocol is often far from obvious, and can present subtle difficulties. This issue comes to the forefront in modern data centers, where applications are often implemented as elastic sets of replicas that must reconfigure while continuing to provide service, and where it may be necessary to install new versions of active services as bugs are fixed or new functionality is introduced. Our paper explores this topic in the context of a dynamic reconfiguration model of our own design that unifies two widely popular prior approaches to the problem: virtual synchrony, a model and associated protocols for reliable group communication, and state machine replication (in particular, Paxos), a model and protocol for replicating some form of deterministic functionality specified as an event-driven state machine.
Conference Paper
Full-text available
Cloud computing is a new computing paradigm, combining diverse client devices -- PCs, smartphones, sensors, single-function, and embedded -- with computation and data storage in the cloud. As with every advance in computing, programming is a fundamental challenge, as the cloud is a concurrent, distributed system running on unreliable hardware and networks. Orleans is a software framework for building reliable, scalable, and elastic cloud applications. Its programming model encourages the use of simple concurrency patterns that are easy to understand and employ correctly. It is based on distributed actor-like components called grains, which are isolated units of state and computation that communicate through asynchronous messages. Within a grain, promises are the mechanism for managing both asynchronous messages and local task-based concurrency. Isolated state and a constrained execution model allow Orleans to persist, migrate, replicate, and reconcile grain state. In addition, Orleans provides lightweight transactions that support a consistent view of state and provide a foundation for automatic error handling and failure recovery. We implemented several applications in Orleans, varying from a messaging-intensive social networking application to a data- and compute-intensive linear algebra computation. The programming model is a general one, as Orleans allows the communications to evolve dynamically at runtime. Orleans enables a developer to concentrate on application logic, while the Orleans runtime provides scalability, availability, and reliability.
Conference Paper
Full-text available
Upgrading the software of long-lived, highly-available distributed sys- tems is di cult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades must happen gradually, and there may be long periods of time when di erent nodes run di erent software versions and need to communicate using incompatible protocols. We present a methodology and in- frastructure that make it possible to upgrade distributed systems automatically while limiting service disruption. We introduce new ways to reason about cor- rectness in a multi-version system. We also describe a prototype implementation that supports automatic upgrades with modest overhead.
Conference Paper
Full-text available
Abstract Although building systems from components has attractions , this approach,also has problems. Can we be sure that a certain configuration of components,is correct? Can it perform as well as a monolithic system?,Our paper answers these questions for the Ensemble communication,architecture by showing how, with help of the Nuprl formal system, configurations may be checked against specifications, and how optimized code can be synthesized from these configurations. The performance,results show that we can substantially reduce end-to-end latency in the already optimized Ensemble system. Finally, we discuss whether the techniques we used are general enough,for systems other than communication systems.
Conference Paper
Full-text available
The paper presents and evaluates Nysiad,1 a system that implements a new technique for transforming a scalable distributed system or network protocol tolerant only of crash failures into one that tolerates arbitrary failures, including such failures as freeloading and malicious at- tacks. The technique assigns to each host a certain num- ber of guard hosts, optionally chosen from the available collection of hosts, and assumes that no more than a con- figurable number of guards of a host are faulty. Nysiad then enforces that a host either follows the system's pro- tocol and handles all its inputs fairly, or ceases to produce output messages altogether—a behavior that the system tolerates. We have applied Nysiad to a link-based routing protocol and an overlay multicast protocol, and present measurements of running the resulting protocols on a simulated network.
Conference Paper
Most if not all datacenter services use sharding and replication for scalability and reliability. Shards are more-or-less independent of one another and individually replicated. In this paper, we challenge this design philosophy and present a replication protocol where the shards interact with one another: A protocol running within shards ensures linearizable consistency, while the shards interact in order to improve availability. We provide a specification for the protocol, prove its safety, analyze its liveness and availability properties, and evaluate a working implementation.
Article
Docker promises the ability to package applications and their dependencies into lightweight containers that move easily between different distros, start up quickly and are isolated from each other.
Conference Paper
This paper presents Ethane, a new network architecture for the enterprise. Ethane allows managers to define a single network-wide fine-grain policy, and then enforces it directly. Ethane couples extremely simple flow-based Ethernet switches with a centralized controller that manages the admittance and routing of flows. While radical, this design is backwards-compatible with existing hosts and switches. We have implemented Ethane in both hardware and software, supporting both wired and wireless hosts. Our operational Ethane network has supported over 300 hosts for the past four months in a large university network, and this deployment experience has significantly affected Ethane's design.