Verifying Cloud Services: Present and Future
University of Grenoble – LIG · IBM Research · Royal Holloway, University of London · University of Luxembourg · Mountain View, CA, USA
Dagstuhl Seminar '12, Dagstuhl, Germany
As cloud-based services gain popularity in both private and
enterprise domains, cloud consumers still lack tools
to verify that these services work as expected. Such tools
should consider properties such as functional correctness,
service availability, reliability, performance and security guar-
antees. In this paper we survey existing work in these ar-
eas and identify gaps in existing cloud technology in terms
of the verification tools provided to users. We also discuss challenges and new research directions that can help bridge these gaps.
1. INTRODUCTION
With trendsetters like Amazon, Microsoft, Google, and Apple, cloud technologies have turned mainstream. Tools and
services such as Dropbox, Google Docs and iCloud are widely
used by home users. As cloud technology matures, public
cloud services are becoming more attractive to enterprise
users as well. Interest in the cloud has been shown by play-
ers such as critical infrastructure providers, including medi-
cal and banking industries, power grid operators, and more.
Indeed, the beneﬁts of cloud services, such as ﬂexible and
rapid service deployment, cost reduction, and little (if any)
administrative overhead, are widely accepted. Yet, very few tools are currently available for clients to monitor and evaluate the provided cloud services.
It is only reasonable that consumers who pay for a service
expect it to be (among other features) available, reliable, se-
cure and careful with their data. But examples abound that
this is not always the case. In terms of availability, Amazon's Elastic Compute Cloud faced an outage in 2011 when it crashed after starting to create too many new backups of its storage volumes. Many large customers of Amazon, such as Reddit and Quora, were down for more than one day. In-
tuit experienced repeated similar outages in 2010 and 2011. No explanation had been provided to customers, who
for a long time could not access their financial data. Authentication and authorization problems are also common: in 2011, Dropbox admitted that a bug in their authentication mechanisms had disabled password authentication; hence, for four hours, the accounts of Dropbox's 25 million users could be accessed with any password. Similar authorization issues have affected other cloud service providers, including Google. Recent data loss incidents include the Sidekick disaster, involving T-Mobile and Microsoft. A server failure is said to have caused the loss of the data of one million users, who had to wait for two weeks to have their two-week-old data restored. In numerous incidents, user data was compromised by hackers. For example, the Sony PlayStation Network was compromised in 2011. Forensic analysis took several days to complete, and the breach caused the data of 77 million users to be stolen.
Users could beneﬁt from knowing to what extent a cloud
provider delivers the promised service. More concretely, the
contract between the cloud and its users should be veriﬁable
(at least to some extent) and the ability to detect failures,
without relying solely on the cloud provider’s report, can be
useful to the users. For example, it may be important to
promptly ﬁnd out that a service does not respect its func-
tional speciﬁcation; or that it generously shares personal
data with the world; or that it is down, underperforms, or that its basic security controls seem to be failing. This information can be especially helpful for critical applications
such as medicine or banking and facilitate their process of
adopting cloud technologies. In addition, veriﬁcation tools
to check these aspects can help consumers pick and choose
a particular cloud provider.
But, other than the claim that a service achieves a certain
goal (stores or serves data, computes a function, etc.), what
delivery parameters may be of interest to consumers? To
answer this question, we identify several areas of concern:
(C1) Trusted software and server identity. Is the service running the right software over the correct set of nodes?
(C2) Functional correctness. Once the service is running, is it doing what it is supposed to? (Functional properties are specified as properties of individual executions of the system, where an execution is an alternating sequence of global system configurations and events specified by the protocol, and are similarly verified over executions; they can be both safety and liveness properties.)
(C3) Performance and dependability. How eﬃcient is
the service? Is it reliable and available?
(C4) Security. Does the service comply with security poli-
cies, if any?
State-of-the-art research has started to tackle these issues
individually. However, we feel that cloud users may beneﬁt
from understanding in breadth, rather than only in depth,
whether they could verify service provisions in the cloud.
Answering concerns such as C1-C4 can raise awareness and
lead the way to more tools that empower cloud users.
In this paper, we attempt to identify existing gaps in to-
day’s cloud technologies with respect to concerns C1-C4. We
discuss recent research advances, and propose directions for
future research to help bridge those gaps. Our survey is related to several others in the area [23, 66]. However, we go beyond a handful of examples and provide a first attempt at systematizing potential client concerns and related solutions.
We consider two main types of cloud customers – service
providers deploying their software for execution in the cloud
(with Platform-as-a-Service or Infrastructure-as-a-Service),
and cloud users, who use a software or storage service exe-
cuting in the cloud, be it provided by a third party software
provider or by the cloud provider. Next, we briefly introduce the different research areas covered in the remainder of the paper.

Veriﬁcation of Strong Service Identities.
Today, service providers have no guarantees that the ser-
vices being delivered to their users match the implemen-
tation deployed to the cloud. The risk of cloud misman-
agement stemming from cloud administration mistakes or
from abuse by other cloud tenants could result in corruption
or misconﬁguration of the service implementation. Conse-
quently, the service could deviate from the behavior origi-
nally intended by the service provider. For example, previ-
ous work manipulated the identities of virtual machine
images to demonstrate an attack on the consumers of Ama-
zon EC2. In Section 2, we discuss a possible path towards
enabling service providers to attest the deployed services and
check for compliance with their original service implemen-
tation. The idea is to bind a strong service identity to the
service instances on the cloud such that this unique associa-
tion is preserved throughout the entire service lifecycle, from
deployment to decommissioning. We focus on a promising
implementation of this idea based on Trusted Computing.
Cloud nodes run special software stacks – trusted software
systems – that can host the service instances in special en-
vironments, isolated from both the administrator and other
tenants. Cloud nodes are also equipped with commodity
trusted computing hardware, which validates the integrity
of the software stack upon boot and enables service providers
to verify that the nodes are running a trusted software sys-
tem; if this is the case, service identity is preserved. In Section 2, we introduce this general approach, discuss existing related work, and highlight the main challenges in realizing it.
Verifying Functional Properties of Cloud Services.
Users can beneﬁt from gaining assurance that the behav-
ior of a cloud service complies with its advertised functional
speciﬁcation. In Section 3, we propose a new approach al-
lowing the users to verify service integrity in a scalable fash-
ion without relying on either a centralized certiﬁcation au-
thority or access to the actual implementation code.
Our approach is based on decomposition of the veriﬁca-
tion process into three phases: test suite generation, test
suite execution, and validation of the results, where each
phase can be performed at a diﬀerent location to maximize
performance and exhaustiveness of the veriﬁcation process.
Our proposal for implementing the test suite generation
is based on black-box testing techniques that generate test
suites covering all interesting behaviors described by the
specification. Since, in our framework, the specification is de-
scribed as a state machine, a test suite would produce inputs
to generate all possible traversals of the state machine. Test
suite execution is done in the cloud, and the resulting traces
are stored in the cloud for future compliance testing. The
latter is done on the end-user infrastructure using sampling
techniques, such as property testing. Sampling improves ef-
ﬁciency and scalability of our approach, while guaranteeing
speciﬁcation compliance with high probability.
Veriﬁcation of Cloud Storage Services.
While generic veriﬁcation methods such as those we pro-
pose in Sections 2 and 3 may, in the future, allow verifying
functional properties of cloud services, they have not yet ma-
tured. Multiple recent works have tackled speciﬁc concerns
that arise in the context of cloud storage, and promising
techniques have emerged. In Section 4 we survey such desirable storage properties and state-of-the-art verification techniques.

Performance and Dependability Non-Functional Properties.
Verifying non-functional properties like performance, de-
pendability, energy consumption and economical costs of
clouds is challenging today due to ad-hoc management in
terms of quality of service (QoS) and service-level agreements (SLAs). We believe that a differentiating element be-
tween cloud computing environments will be the QoS and
the SLA provided by the cloud. In Section 5, we call for
the deﬁnition of a new cloud model that integrates service
levels and SLA into the cloud in a systematic way. The
proposed approach aims to combine and guarantee multiple
cloud service level objectives in a consistent and ﬂexible way.
It also allows providing better-than-best-effort cloud QoS
through a control-theoretic approach for modeling and con-
trolling SLA-oriented cloud services. We also discuss how
to help system designers build SLA-oriented clouds that are
controllable by construction, and how to assess cloud service quality.

Security-Oriented Non-Functional Properties Veriﬁcation.
Service providers may request that the deployment of their
service in the cloud adheres to certain security constraints.
For example, a service provider might ask that their de-
ployed service should only reply to authorized requests com-
ing from the US, between 2 and 6 pm, or that it should never
divulge sensitive data to a set of end users, or that it should
destroy or backup data at periodic intervals and in a certain
way. These behavioral constraints are often independent
of the application that is being provided. It is diﬃcult to
guarantee adherence to such constraints, because of the dy-
namic and multi-tenant nature of the cloud environment.
For both users and service providers, it can be beneﬁcial to
have tools that monitor the high-level system behavior and
raise ‘alarms’ when security policies of this type are violated.
Such monitoring tools have not yet matured. Section 6 ex-
plains the connected issues and advances in more detail.
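As a minimal illustration of such a monitor, the following sketch raises alarms for the example policy mentioned above (only reply to authorized US requests between 2 and 6 pm). The event fields and policy encoding are assumptions made for the example; a real monitor would consume audit logs or interposition points.

```python
# A tiny event-based monitor for the example policy above; in a real
# system, events would come from audit logs or interposition points.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str         # e.g. "reply"
    country: str      # origin of the request being answered
    hour: int         # 0-23, in the policy's reference time zone
    authorized: bool  # did the request pass authorization?

def check_policy(event: Event) -> list:
    """Alarms for: only reply to authorized US requests between 2-6 pm."""
    alarms = []
    if event.kind == "reply":
        if not event.authorized:
            alarms.append("reply to unauthorized request")
        if event.country != "US":
            alarms.append("reply to request from outside the US")
        if not 14 <= event.hour < 18:
            alarms.append("reply outside the 2-6 pm window")
    return alarms

print(check_policy(Event("reply", "US", 15, True)))    # [] -- compliant
print(check_policy(Event("reply", "FR", 20, False)))   # three alarms
```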
In what follows, we examine each of these topics in turn.
2. VERIFYING STRONG SERVICE IDENTITIES
A service provider incurs risks of cloud mismanagement
when making use of a cloud provider’s infrastructure for
hosting services. If the software that the service provider de-
ploys to the cloud is tampered with or replaced with a different version, the service in production could deviate from the intended implementation and harm the service provider and
users. The question we address is: How can cloud providers
guarantee a strong identity between the software running on
the cloud nodes and the service implementation?
2.1 Deﬁnitions and Approach
We focus on enforcing the property of strong service iden-
tity on a cloud platform. If S denotes the service software implementation produced by the service provider and S′ an instance of the software service S hosted in the cloud, strong service identity is satisfied if and only if the invariant S = S′ holds for the entire lifecycle of S and on all the nodes where S is instantiated. The lifecycle of a ser-
vice spans the period from its deployment until its decommissioning. Throughout this time, the service
might be replicated or migrated across various cloud nodes.
In Infrastructure-as-a-Service (IaaS) the service is deployed
as a virtual machine image and instantiated in virtual ma-
chines (VMs). In Platform-as-a-Service (PaaS) the service
is shipped as an application package and instantiated into
objects in application containers.
To enforce strong service identity, a cloud platform could
provide trusted containers. A trusted container hosts the
state of a service instance in isolation from other tenants
and from the cloud administrator. This protection is en-
forced throughout the service lifecycle. When migrating or
replicating service instances to other nodes, the trusted con-
tainer veriﬁes that the sink is also a trusted container and
transmits any relevant service code and data to the sink
over an encrypted channel. The service provider can also
verify that the target host oﬀers trusted container protec-
tions before deploying the service. As a result, insofar as
the service is instantiated in trusted containers, the strong
service identity is satisﬁed.
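The migration handshake described above can be illustrated with a toy model. Attestation is simulated here with an HMAC under a demo key and the encrypted channel with a one-time pad, so all names and keys are illustrative assumptions rather than a real protocol implementation.

```python
# Toy model: a node proves it runs the trusted stack (HMAC stands in
# for a TPM-signed quote); only then is state sent, XOR-"encrypted".
import hashlib
import hmac
import os

TRUSTED_STACK_HASH = hashlib.sha256(b"trusted-software-system").digest()
DEMO_TPM_KEY = b"demo-tpm-key"  # stand-in for per-TPM attestation keys

class Node:
    def __init__(self, stack: bytes):
        self.stack_hash = hashlib.sha256(stack).digest()
        self.received = None
    def attest(self, nonce: bytes) -> bytes:
        return hmac.new(DEMO_TPM_KEY, self.stack_hash + nonce, hashlib.sha256).digest()
    def receive(self, ciphertext: bytes, key: bytes):
        self.received = bytes(c ^ k for c, k in zip(ciphertext, key))

def migrate(state: bytes, sink: Node) -> bool:
    # 1. Attest the sink before transferring any service state.
    nonce = os.urandom(16)
    expected = hmac.new(DEMO_TPM_KEY, TRUSTED_STACK_HASH + nonce, hashlib.sha256).digest()
    if not hmac.compare_digest(sink.attest(nonce), expected):
        return False  # sink does not run a trusted container
    # 2. Send the state over an "encrypted channel" (one-time pad here).
    key = os.urandom(len(state))
    sink.receive(bytes(s ^ k for s, k in zip(state, key)), key)
    return True

trusted = Node(b"trusted-software-system")
rogue = Node(b"commodity-os")
print(migrate(b"service-state", trusted), migrate(b"service-state", rogue))  # True False
```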
The implementation of the trusted container semantics on
the cloud nodes could be carried out by a privileged software
system. A trusted software system oﬀers a speciﬁc hosting
abstraction and is crafted so that neither the administrator
nor other tenants have access to service instances’ state. Ex-
amples of such systems include CloudVisor, which lever-
ages nested virtualization to protect the conﬁdentiality and
integrity of guest virtual machines in Xen. Other trusted
software systems exist, for example, oﬀering isolation at the
process granularity. These systems could be used not
only to protect the state of the service instances, but also to
protect the back-end cloud systems (e.g., database servers).
The question, then, is how remote parties can verify that the cloud nodes execute a trusted software system rather than an insecure OS or hypervisor.
To provide such a validation capability, we leverage commodity Trusted Platform Module (TPM) chips deployed on the cloud nodes. The TPM enables remote attestation
of a cloud node. During bootstrap, a cloud node executes
a sequence of programs and stores the hashes of these pro-
grams in the TPM’s internal registers. Since these registers
cannot be rewritten unless the machine reboots, their content reveals the bootstrap state of a node, and the TPM can securely convey the state of these registers to a remote party using an attestation protocol. To prevent man-
in-the-middle attacks, the TPM signs the registers’ content
with the private part of a cryptographic keypair that never
leaves the TPM in plaintext. The remote party can then
verify the signature and the content of the TPM registers
using a public key certiﬁcate given by the cloud provider:
if the trusted software system boots on the cloud node, its
respective hash will show up in the TPM’s registers.
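As a rough illustration of the measured-boot and attestation flow just described, the sketch below simulates PCR extension and quote verification. The boot-stage names, the shared demo key, and the HMAC stand-in for the TPM's signature are all assumptions for the example; a real TPM signs quotes with an attestation key that never leaves the chip.

```python
# Simulated measured boot: each boot stage is hashed into a PCR,
# and a quote over the PCR is checked by a remote verifier.
import hashlib
import hmac

def extend(pcr: bytes, measurement: bytes) -> bytes:
    # TPM-style PCR extension: new = H(old || H(measurement))
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

BOOT_STAGES = [b"bootloader", b"trusted-software-system"]

# Node side: measure each boot stage into a (simulated) PCR.
pcr = b"\x00" * 32
for stage in BOOT_STAGES:
    pcr = extend(pcr, stage)

# The TPM would sign (PCR, nonce) with its attestation key; an HMAC
# under a demo key stands in for that signature here.
AIK = b"demo-attestation-key"
nonce = b"verifier-chosen-nonce"  # fresh per attestation, prevents replay
quote = hmac.new(AIK, pcr + nonce, hashlib.sha256).digest()

# Verifier side: recompute the expected PCR from known-good hashes
# and check the quote before trusting the node.
expected = b"\x00" * 32
for stage in BOOT_STAGES:
    expected = extend(expected, stage)
ok = hmac.compare_digest(quote, hmac.new(AIK, expected + nonce, hashlib.sha256).digest())
print("trusted software system booted" if ok else "attestation failed")
```

Because the PCR value is a hash chain over the boot sequence, any deviation in any stage yields a different final register value and the verifier's comparison fails.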
By rooting trust in TPMs and in trusted software systems, we require that both of these components be correct.
Under this assumption, strong service identity could be en-
forced in the presence of powerful adversaries. The TPM
can protect the content of its registers from a malicious ad-
ministrator with privileges to manage the cloud nodes from
a remote site: such an attacker can reboot the nodes, access their local disks, install arbitrary software, and eavesdrop on the network.
TPMs, however, cannot defend against physical attacks. We
assume that the hardware is protected by complementary
mechanisms deployed within the cloud provider’s premises.
In summary, by implementing the trusted container ab-
straction, a cloud platform architecture based on a trusted
software system and TPMs deployed on the nodes could enforce strong service identity. Through the use of attes-
tation, this architecture enables service providers and users
to obtain tangible evidence of compliance with the strong
software identity property. Next, we examine existing work
that materializes some of these concepts in concrete systems.
2.2 Existing Work
We brieﬂy survey the existing work on 1) enforcing strong
identity in IaaS, 2) leveraging TPMs in the cloud, and 3)
implementing trusted containers on the cloud nodes. To the
best of our knowledge, no system today implements strong
service identity in PaaS platforms.
Strong software identity in IaaS. In IaaS, services are
typically dispatched to the cloud provider in a virtual ma-
chine image. Enforcing strong identity, then, requires devis-
ing a hardened hypervisor that can oﬀer trusted container
semantics at the granularity of VMs. The hardened hyper-
visor must enforce VM state isolation from the cloud ad-
ministrator. To ensure conﬁnement of VMs only to cloud
nodes running the hardened hypervisor, cloud nodes are attested using their local TPMs. To
give users and service providers guarantees of service iden-
tity (i.e., that the VM image of the VM executing on the
nodes is the VM image uploaded by the service provider and
instantiated on the cloud) attestation can also be done from
outside the cloud. This architecture was ﬁrst proposed by
Santos et al. To implement the role of the hardened hypervisor, CloudVisor could be used.
Systems for leveraging TPMs in the cloud. Some sys-
tems have been developed that, while not oﬀering directly
the property of strong software identity, provide a building
block for doing so. Schiffman et al. proposed a system that allows for the remote attestation of a cloud node's hypervisor and VM image from outside the cloud. A more
advanced version of this system is Excalibur. Excalibur
prevents performance bottlenecks due to TPM ineﬃciency
and oﬀers an abstraction for sealing data based on policy
such that only the nodes that satisfy that policy can unseal
and interpret the data. For example, by sealing a VM image
to a policy designating CloudVisor as the trusted hypervi-
sor, the service provider is guaranteed that only the nodes
running CloudVisor could instantiate the VM image thereby
abiding by the strong identity property. Excalibur can sup-
port other software stacks, not only hypervisors, a feature
that might be relevant in PaaS. Excalibur also supports re-
strictions based on the node location, which gives service
providers additional control over VM placement.
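A minimal sketch of policy-based sealing in this spirit follows. The policy attributes are invented, and the simple key derivation stands in for what a real monitor would do with TPM-protected secrets and authenticated encryption; this is a toy model of the abstraction, not Excalibur's actual interface.

```python
# Toy seal/unseal: data is bound to a policy; the "monitor" derives the
# unsealing key only for nodes whose attested attributes satisfy it.
import hashlib
import json

def _policy_key(policy: dict, master_key: bytes) -> bytes:
    blob = json.dumps(policy, sort_keys=True).encode()
    return hashlib.sha256(master_key + blob).digest()

def _xor(data: bytes, key: bytes) -> bytes:
    stream = key * (len(data) // len(key) + 1)
    return bytes(d ^ k for d, k in zip(data, stream))

def seal(data: bytes, policy: dict, master_key: bytes) -> dict:
    return {"policy": policy, "ciphertext": _xor(data, _policy_key(policy, master_key))}

def unseal(sealed: dict, node_attrs: dict, master_key: bytes) -> bytes:
    # Release the key only if every policy clause is satisfied by the
    # node's attested attributes.
    for attr, required in sealed["policy"].items():
        if node_attrs.get(attr) != required:
            raise PermissionError(f"policy clause {attr}={required} not satisfied")
    return _xor(sealed["ciphertext"], _policy_key(sealed["policy"], master_key))

master = b"monitor-secret"
sealed = seal(b"vm-image-key", {"hypervisor": "CloudVisor", "zone": "EU"}, master)
print(unseal(sealed, {"hypervisor": "CloudVisor", "zone": "EU"}, master))  # b'vm-image-key'
```

A node attesting a different hypervisor simply never obtains the unsealing key, which is how sealing turns an attestation result into an access-control decision on data.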
Systems for implementing trusted containers. While
VMs have been the preferred hosting abstraction in the
context of cloud computing [86, 20], other systems can oﬀer
alternative abstractions that could be more suitable for cer-
tain use cases. Systems like Nexus provide trusted con-
tainer abstractions at the process level. This could be more
appropriate for cloud platforms that do not run VMMs on
their cloud nodes. Maniatis et al. propose trusted con-
tainer abstractions as application sandboxes, which can be
more suitable for isolation of web applications. A considerable amount of research has also been geared toward offering trusted
container abstractions while depending on a small trusted
computing base so as to reduce the chance of vulnerabilities
in the code that could lead to security breaches [86, 77].
2.3 Challenges and Scientiﬁc Directions
While the existing work has focused on supporting strong
service identity for IaaS and designing specialized building
blocks for cloud attestation and trusted container support, a
considerable gap exists between what these mechanisms can
oﬀer and what is necessary to enforce strong service identity
in PaaS. We highlight three main challenges.
High-level PaaS container abstractions. PaaS plat-
forms typically offer their users programming abstractions that
enable them to implement service applications with high
level languages like Java or Python. The service implemen-
tation typically consists of a set of classes which make use of
an API deﬁned by the PaaS provider. These classes are then
packaged, dispatched to the cloud, and instantiated by the
PaaS platform in isolated containers. Containers typically
depend on a software stack that includes the OS, a runtime
engine (e.g., JVM), libraries, and back-end services (e.g.,
databases). In existing PaaS platforms, however, contain-
ers do not yet oﬀer the property of strong service identity.
To enforce this property, one direction is to enhance exist-
ing containers according to the trusted container semantics.
This task, however, is challenging using the known mech-
anisms. On the one hand, trusted container abstractions
based on VMs or processes are too low-level to be use-
ful for the PaaS users. On the other hand, trusted container
abstractions offering application sandboxes depend on
a very large trusted computing base (TCB); with this ap-
proach it would be necessary to trust the entire PaaS stack, thereby incurring TCB bloat. How to provide high-level
PaaS abstractions with a small TCB is an open question.
Integration with PaaS back-end. When instantiated in
a PaaS container, a service instance will normally make use
of additional PaaS back-end services, which include for ex-
ample databases and transaction monitors. When devising
trusted containers for PaaS, it is necessary to account for the
fact that the integrity of the service instance hosted by the
container could be compromised by a back-end service. In
fact, by yielding erroneous results, a back-end service could
taint the code or data of a PaaS user’s service instance, and
introduce corruption that could violate the strong service
identity that we wish for. This danger raises several ques-
tions: How can PaaS users know whether a back-end service is reliable and can therefore be used safely? How should one handle the heterogeneity of back-end services, each of them featuring particular capabilities that inspire varying confidence levels in their users? How should one deal with software updates of the back-end services and determine whether updates are secure? What implications will these issues have for the programming model offered to PaaS users?
Distribution and migration of PaaS service instances.
In general, the PaaS-hosted services can be expected to be
both multi-tiered and clustered. As a result, a service com-
prises multiple components which can be distributed across
several cloud nodes. These components are hosted in in-
dependent containers and communicate among themselves
over secure channels. It is also common that, for resource
management reasons, a PaaS platform might migrate com-
ponents across different hosting containers, e.g., for
balancing load. Components might also need to be instan-
tiated in or eliminated from containers in order to accom-
modate the elastic variations in the service demand. To ac-
count for all these scenarios when implementing the trusted
container semantics, it is then necessary to always attest
a hosting cloud node before creating a component instance
and to ensure that the distributed component instances can
authenticate and communicate securely. Existing systems
that support attestation in the cloud have been used only in
the context of IaaS for attesting hypervisors and VMs [72,
71]. In IaaS, however, the number of VMs that need attesta-
tion is signiﬁcantly smaller than a potentially large number
of PaaS service components. It is unclear if existing systems
could withstand such a large attestation demand without
incurring scalability bottlenecks.
3. VERIFYING FUNCTIONAL PROPERTIES
OF CLOUD SERVICES
The techniques described in the previous section allow
the PaaS services to be associated with a strong identity,
which is preserved throughout the entire software lifetime, withstanding administration mistakes and tampering attempts. In this section, we focus on a complementary
question, namely, given a uniquely identiﬁed service instance
deployed and running on the trusted PaaS platform, how
can we eﬃciently verify that its behavior complies with the
functional properties advertised by its provider?
Our approach to verifying functional properties of the
PaaS services is based on the software testing paradigm.
Figure 1: Specification of the Checkout Flow of an On-Line Shopping Site. The specification is modelled as a finite-state automaton consisting of 7 states, 5 of which belong to the interactive portion of the checkout process. Each of these 5 states allows the customer to return to any one of the preceding states to revise the data entered at that state. In addition, another 3 states in the interactive group have self-cycles allowing the customer to correct errors in the supplied information. The total number of cycles in the automaton graph is therefore equal to 17, and grows quadratically with the number of states.

Conceptually, the software testing process can be viewed as consisting of the following three phases (which can be interleaved to improve performance):
Test suite generation: the speciﬁcation and tested soft-
ware are analyzed to extract eﬀective test cases which
are then assembled into a test suite.
Test suite execution: the software is subjected to the test
suite produced at the previous stage.
Result validation: the traces generated by running the
test suite are compared against those prescribed by
the speciﬁcation, producing “pass” or “fail” outputs for
each compliant and non-compliant trace, respectively.
In order to make the above process amenable for testing
PaaS services hosted in the cloud, the following challenges
must be addressed.
First, since the cloud software is typically developed and
distributed by a third party Software-as-a-Service (SaaS)
provider, the service implementation code cannot be as-
sumed to be available to the end users. This precludes the
test suite generator from using white-box testing techniques
(such as symbolic execution [47, 24, 28]), which utilize the
knowledge of the code structure to achieve high quality cov-
erage of possible execution paths. In Section 3.2, we dis-
cuss alternative approaches to implementing the test suite
generator, and propose several solutions based on black-box testing.

Second, the cloud-based services are typically interactive
(see Figure 1): i.e., they are being driven by on-line user
inputs (e.g., supplied through a web-based interface), which
are forwarded to the remote service implementation via an
RPC-style protocol (such as, e.g., REST or SOAP).
Consequently, executing the service test suite on the user
premises might result in high communication costs, and slow
down the entire testing process. Instead, the cloud provider
must oﬀer support for executing the test suite on the cloud
infrastructure while minimizing the interaction with the user
to the largest possible extent.

Figure 2: Verification Framework for Services in a Cloud (labeled steps include requesting the service spec, returning the service spec, submitting the test suite, and verifying executions).

The users must, however, be
oﬀered tools to eﬃciently validate the test execution results
to guard against the possibility of them being faked by a
potentially dishonest cloud provider.
Third, the service logic can be fairly complex as it must
be able to accommodate a wide range of on-line interaction
scenarios such as, e.g., undoing the eﬀects of previously ex-
ecuted steps of an on-line transaction (e.g., resulting from
the user pressing the “back” button in the browser), or time-
outs following long periods of inactivity. As a result, even a
service with a small number of interaction steps may end up
exhibiting large numbers of acceptable behaviors resulting
from repeated traversals through the interaction workﬂow
cycles (see Figure 1). Exhaustive testing of all the resulting
behaviors may end up producing large volumes of lengthy
output traces whose validation may be too costly to conduct
on a less powerful end user infrastructure.
To address the above challenges, we propose a new dis-
tributed testing framework enabling an eﬃcient veriﬁcation
of services hosted on a remote cloud. Below, we discuss the
framework architecture, and some of the challenges associ-
ated with its implementation.
3.1 Testing Framework Architecture
The architecture of our testing framework is depicted in
Figure 2. Unlike the existing testing solutions, in our frame-
work, the test suite execution and result validation phases
are disjoint from each other, with the former being assigned
to the Testing Harness component hosted in the cloud, and
the latter being executed by the Result Verifier installed on the user premises.

The service implementation is provided by the Software-
as-a-Service (SaaS) provider, which is also responsible for
advertising its speciﬁcation. The user inspects the adver-
tised specifications to select the service whose specification is the closest match to the user's requirements. To stream-
line the service selection process, the speciﬁcation must be
expressed in a standardized speciﬁcation language, such as,
e.g., the Web Services Description Language (WSDL). Here,
we omit the details of the service speciﬁcation framework,
which is the subject of future work.
Next, the speciﬁcation is analyzed by Test Suite Genera-
tor to produce a test suite using the black-box testing tech-
niques (Section 3.2). The resulting test suite is then
submitted to Testing Harness, which deploys the service in-
stance on the cloud-based execution platform, subjects the
deployed instance to the submitted test suite, and stores the
results on the cloud storage facilities. The Result Veriﬁer
can then validate the execution results using the techniques
described in Section 3.4.
In the following sections, we discuss approaches to imple-
menting Test Suite Generator, Testing Harness, and Result
Veriﬁer in more detail.
3.2 Test Suite Generator
A simple way to create a black-box test suite is to gener-
ate a collection of random sequences consisting of the input
invocations as deﬁned by the service API. Although this
technique can be highly eﬀective in ﬁnding bugs in real sys-
tems, it does not guarantee much in terms of the quality of
coverage of the service speciﬁcation.
In contrast, in a more sophisticated black-box methodol-
ogy, known as specification-based or model-based testing,
the test suite is derived from the service speciﬁcation, mod-
elled as a state machine. To guarantee exhaustiveness, the
test suite must include a test case for each possible traver-
sal through the speciﬁcation automaton. Although the test
suite constructed in this fashion does not necessarily check
implementation-speciﬁc details, it provides an assurance that
all observable behaviors of the service will be exercised. The
test suite composition can be further adjusted to achieve a
desired balance between the path coverage and performance,
e.g., by excluding test cases exercising less interesting behaviors.

Note that the standard service API might not always
be suﬃcient to exhaustively exercise all the behaviors pre-
scribed by the service model. For example, the test cases
necessary for exhaustively testing the credit check portion of
the Checkout workﬂow in Figure 1 will be impossible to
generate, using the service’s standard API, without a priori
knowledge of the real customer credit data.
To address this problem we could require the software
provider to expose a special testing API that augments the
standard service API with calls instrumented for the test-
ing purposes (such as, e.g., those simulating requests from
customers with low and high credit scores in the example
above). Note, however, that supporting testing APIs in the
cloud settings requires cooperation on the part of the underlying
cloud platform to ensure that the testing inputs are not
activated during the normal service operation.
Generalizing this approach, of using a public API and a
special testing API in order to generate an exhaustive testing
suite without compromising security, into a complete solu-
tion applicable to realistic services is an interesting research
problem, which we intend to pursue in the future.
3.3 Testing Harness
Testing Harness (TH) is responsible for executing the test
suite submitted by the user on an instance of the service
of interest. One important aspect that must be addressed
by the TH implementation is the degree of isolation of the
tested service instance from other applications concurrently
running on the cloud. In particular, the functional correctness
of the implementation code is best tested "in-vitro",
that is, when its instance is deployed in a fully dedicated
runtime environment. To achieve this degree of isolation, the
underlying trusted PaaS platform (see Section 2) must ex-
pose the necessary hooks which can be leveraged by TH to
create an execution environment with well-defined isolation.
One limitation of the “in-vitro” testing is that it cannot
guarantee that the functional properties, which passed vali-
dation when tested in isolation, will continue to hold when
the service is deployed in a production environment that
could be shared by a large number of other cloud tenants.
For example, when not adequately protected against unauthorized
accesses by other co-hosted services, the
service might be compromised to exhibit a behavior that
arbitrarily deviates from its speciﬁcation.
To address this problem, TH must oﬀer “in-vivo” test-
ing  capabilities that will allow the service instance to
be deployed and tested on a simulated or real multi-tenant
runtime. In addition, the TH and underlying PaaS runtime
must oﬀer hooks that will monitor and log accesses to shared
multi-tenant resources. Building such an in-vivo testing en-
vironment along with automating generation, testing, and
veriﬁcation of multi-tenancy related properties is an inter-
esting research direction to pursue in the future.
3.4 Result Veriﬁer
A straightforward implementation of the result valida-
tion phase would be to execute the speciﬁcation automaton
on each trace produced at the test suite execution phase.
This approach may, however, be too expensive for analyzing
long traces, such as those resulting from repeated traversals
through the speciﬁcation automaton cycles (see above). To
validate such traces more eﬃciently, we propose to utilize
a probabilistic technique known as combinatorial property
testing [9, 29].
Roughly, this technique is based on the observation that a
compliant trace, whose length exceeds the size of the speci-
ﬁcation automaton (in terms of the number of states), must
visit the same state more than once. On the other hand,
a trace that is too far from being compliant must contain
enough states that do not ﬁt any possible traversal on any
cycle of the automaton. Hence, in order to verify the trace
compliance with high probability, it is enough to check whether
a sample of the trace, whose size depends on the number
of cycles in the automaton graph, fits into the pattern of
traversing cycles. A property testing algorithm then pro-
ceeds by sampling short (constant length) segments of the
trace and checks whether they ﬁt into some cyclic path on
the automaton. Note that in this algorithm, we assume that
the input trace is much longer than the longest cycle-free
path in the automaton. This is because short traces can be
veriﬁed exhaustively as described above.
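A much simplified sketch of the sampling idea follows: it checks only the necessary condition that each sampled window is a valid walk on the automaton, rather than the full cyclic-path fit of the property testing algorithms cited above. The two-state specification and the trace are toy examples.

```python
import random

def valid_walk(transitions, segment):
    """True iff consecutive states in `segment` follow automaton edges."""
    return all(b in transitions.get(a, ()) for a, b in zip(segment, segment[1:]))

def sample_check(transitions, trace, num_samples, window, seed=0):
    """Sample short windows of a long trace and verify each is a valid
    walk on the automaton; a bad window proves non-compliance, while
    passing all samples indicates compliance with high probability."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        i = rng.randrange(len(trace) - window + 1)
        if not valid_walk(transitions, trace[i:i + window]):
            return False
    return True

# Two-state specification with a single cycle: s0 -> s1 -> s0.
spec = {"s0": {"s1"}, "s1": {"s0"}}
trace = ["s0", "s1"] * 50          # a long compliant trace
```

The probability of catching an injected violation grows with `num_samples`, while the cost stays independent of the trace length, which is the point of the technique.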
In order to illustrate the advantages of this technique, consider
a linear workflow W consisting of n states such that
for each state s in W, there is an edge pointing back to a
state s′ such that s′ precedes s in W (see, e.g., the interactive
sub-flow of the checkout workflow in Figure 1). The
number of cycles c in W is then roughly on the order of
n², with the average cycle length being n/2. Consequently,
the average length of the trace τ produced by executing
a test case exercising each cycle at least k > 0 times will
be at least kcn/2, which is also equal to the complexity of
the exhaustive compliance check of τ. In contrast, with the
property testing-based compliance check, the complexity de-
pends only on c, resulting in a signiﬁcant speedup compared
to the exhaustive check, for large values of k. For example,
the exhaustive compliance analysis of the trace resulting
from a single traversal of each cycle of the interactive part
of the checkout flow in Figure 1 will have to traverse 47
states, compared to just about 13 states, which is the number
we expect the property testing-based analysis to require in practice.
4. VERIFYING PROPERTIES OF CLOUD STORAGE
Users increasingly rely on the cloud for storage, instantly
uploading their photos, documents, scheduled system back-
ups and more. In this section, we explore some of the prop-
erties expected by users from a cloud storage service and
survey recent work on the veriﬁcation of these properties.
4.1 Protecting Against a Byzantine Provider
We start by describing properties for which the known ver-
iﬁcation methods can overcome any adversarial cloud provider,
even a fully malicious one.
Integrity. One of the basic properties expected from a
storage system is data integrity. Users must be conﬁdent
that their data is not altered while being stored or trans-
ferred to and from the storage service. A simple way to
guarantee this is to use error detecting (or error correcting)
codes. To protect against intentional tampering with the
data, a client may use a cryptographic hash function and
separately maintain the key. For large volumes of data, hash-trees
are commonly used to verify data integrity without
recomputing a hash of the entire data for the purpose of ver-
iﬁcation. The leaves of a hash-tree are hashes of data blocks,
whereas its internal nodes are hashes of their children in the
tree. A user is then able to verify any data block by storing
only the root hash of the tree and performing a logarithmic
number of cryptographic hash operations. When multiple
users share data using a remote storage service, digital sig-
natures allow the clients to verify data integrity.
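A minimal hash-tree (Merkle tree) sketch illustrating the logarithmic verification described above, using SHA-256; block contents and sizes are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Build the hash tree bottom-up; returns the list of levels, leaves first."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof(levels, index):
    """Collect sibling hashes from leaf to root: a logarithmic-size proof."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))   # (hash, sibling-is-left)
        index //= 2
    return path

def verify(root, block, path):
    """Recompute the root hash from a single block and its sibling path."""
    node = h(block)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

blocks = [b"block0", b"block1", b"block2", b"block3"]
levels = build_tree(blocks)
root = levels[-1][0]
```

The client stores only `root`; any block returned by the storage service can then be checked against a proof of logarithmic size.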
Consistency. Although these methods guarantee that
the storage will not be able to corrupt or forge the data, they
do not prevent a storage service from simply hiding updates
performed by one client from the others, or showing
updates to clients in diﬀerent orders. In fact, this would
be impossible to detect without additional trust assump-
tions (such as TPM) or alternatively the clients being able
to jointly audit the server’s responses. Several solutions
using trusted components were proposed [31, 84], guaran-
teeing strong consistency (i.e., linearizability ) even if
the service is malicious. A diﬀerent approach, not assum-
ing any trusted components, was pioneered by Mazi`eres and
Shasha [58, 51], introducing untrusted storage protocols and
the notion of fork-consistency. Intuitively, traditional strong
consistency guarantees that all clients have the same view of
the execution history. On the other hand, fork-consistency
guarantees that client views form a tree, where forks in the
tree are caused by a faulty server hiding operations of one
client from another. To date, this is the strongest known
consistency notion that can be achieved with a possibly
Byzantine remote storage server where no trusted compo-
nents are assumed and when the clients do not communicate
with one another (once clients can communicate directly,
they are able to detect that their views were forked by the
server). Multiple systems were based on this idea, starting
with SUNDR , a network ﬁle system designed to work
with a remote and potentially Byzantine server. Cachin
et al.  implement an SVN system hosted on a potentially
Byzantine server. In FAUST , the authors study
fork-consistency more formally, including a proof that guaranteeing
this notion comes at a price in service availability,
even when the server is correct, and propose a new consis-
tency notion (weak-fork linearizability) that overcomes this
limitation. Venus , a veriﬁcation system built with Ama-
zon S3, uses a weak-fork linearizable protocol as a building
block but provides more traditional consistency semantics to
its clients. When the server is correct, weak-fork linearizabil-
ity allows Venus to guarantee a strong notion of liveness (i.e.,
service availability), where clients are not aﬀected by failures
of other clients. Venus uses direct automated emails among
the clients to uphold strong consistency semantics and to
provide eventual detection of storage failures. Feldman et
al. introduced SPORC , a system which likewise guar-
antees a variation of fork-consistency, but for the ﬁrst time
allows not only to detect storage faults but also to recover
from them by leveraging the conﬂict resolution mechanism of
Operational Transformation. Finally, we note that a similar
consistency notion  was recently used in a non-Byzantine
setting to model consistency in the context of mobile clients
performing disconnected operations , suggesting a yet to
be explored connection between untrusted storage and dis-
connected operations or, more generally, with the traditional
model of message passing with omission faults.
Similarly to storage failure detection using direct commu-
nication among clients, if a global trace of client operations
and storage responses is available, many inconsistencies can
be easily detected [11, 85, 81].
Finally, systems such as Intercloud Storage  and Dep-
Sky  replicate data over multiple clouds in order to mit-
igate integrity or consistency violations and potential un-
availability caused by a provider failure.
Retrievability. How can clients ensure that their data is
still stored somewhere in the cloud and not lost by a provider
trying to cut storage costs? As the amount of uploaded
information grows, it is often infeasible for clients to check
data availability by periodically downloading all the data.
This challenge was addressed in the form of new veriﬁcation
schemes: Proofs of Retrievability (PORs)  and Proofs
of Data Possession (PDP) . These protocols guarantee
with high probability that the cloud is in possession of the
data using challenges submitted by the client. The basic
idea is that a client submits requests for a small sample
of data blocks, and veriﬁes server responses (using small
additional information encoded in each block or by asking for
special blocks whose value is known in advance to the client).
Recently, these schemes were generalized and improved, and
prototype systems have been implemented [75, 18, 17]. This
line of work has also led to the development of schemes for
veriﬁcation of other properties, as we describe next.
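A toy illustration of the challenge-response idea behind these schemes follows. It is not an actual POR/PDP construction: real schemes use homomorphic tags precisely so that the client need not keep one tag per block, as this sketch does. Key, block contents, and sizes are invented.

```python
import hashlib
import hmac
import random

def tag_blocks(key, blocks):
    """Client-side setup: compute a small MAC tag per block, kept locally
    by the client (real PDP schemes avoid this linear client-side state)."""
    return [hmac.new(key, str(i).encode() + b, hashlib.sha256).digest()
            for i, b in enumerate(blocks)]

def challenge(num_blocks, sample_size, seed):
    """Pick a small random sample of block indices to query."""
    return random.Random(seed).sample(range(num_blocks), sample_size)

def verify_response(key, tags, indices, returned_blocks):
    """Check each returned block against the tag stored for its index."""
    for i, b in zip(indices, returned_blocks):
        expected = hmac.new(key, str(i).encode() + b, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tags[i]):
            return False
    return True

key = b"client-secret"
data = [bytes([i]) * 16 for i in range(100)]   # 100 blocks held by the provider
tags = tag_blocks(key, data)                   # kept by the client
idx = challenge(len(data), sample_size=5, seed=42)
```

A provider that has discarded some blocks will, with probability growing in the sample size, be unable to answer a challenge correctly.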
4.2 Protecting Against an Economically Ra-
tional Cloud Provider
In what follows, the verification methods assume an economically
rational adversary. Such a cloud provider may cheat
but will not do so if it requires spending more money or other
resources compared to correct behavior.
Conﬁdentiality. To prevent information leakage and
provide data conﬁdentiality, it is usually expected that stored
data is encrypted. Clients can encrypt the information with
their own keys before storing it to the cloud. However, this
is often not desired as access to the unencrypted data allows
the provider to oﬀer a richer set of functionality, beyond
storage, such as searching the data or sharing it with other
authorized users. Instead, the provider is usually entrusted
with encrypting the data. Recent incidents have shown that
providers do not always uphold this expectation . The authors
of  have recently proposed a scheme to probabilistically
ensure that the provider indeed stores the data in an
encrypted form. The main idea is to ensure that the data is
stored in the desired, encoded format G by encapsulating G
into another format H, which the provider will store instead
of G, such that H has several desired properties. First,
it should be easy for the provider to convert H back into G.
Second, it should be difficult for the provider to cheat by
computing some part of H on the fly, in order to respond to
a client's query, without processing the entire data in format
G. Finally, there should be a certain lower bound on the
time required to translate G into its encapsulation H, using
assumptions on a constrained resource, such as the computa-
tional power of the provider, physical storage access times,
network latencies, etc. The client can then challenge the
cloud provider at a random time to produce random chunks
of H, and require that the cloud provider do so within
a time period τ. A timely correct response proves with high
probability that the provider is indeed storing H and is not
computing H on the fly. Obviously, this scheme does not
prevent the provider from storing another, unencrypted, copy of
the data (incurring double the storage). Instead, it provides
a strong negative incentive to do so, and hence assumes an
economically rational provider.
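The timed-challenge protocol can be sketched as follows; the chunk format, sizes, and deadline are placeholders, and the "provider" is simulated locally by a dictionary lookup.

```python
import random
import time

def timed_challenge(fetch_chunk, num_chunks, total_chunks, tau, seed=None):
    """Request random chunks of the encapsulated format H and accept the
    provider's answer only if it arrives within the deadline `tau`; a
    provider recomputing H from G on the fly is assumed, given bounds
    on its resources, to miss the deadline with high probability."""
    rng = random.Random(seed)
    indices = [rng.randrange(total_chunks) for _ in range(num_chunks)]
    start = time.monotonic()
    chunks = [fetch_chunk(i) for i in indices]
    elapsed = time.monotonic() - start
    return chunks, elapsed <= tau

# An honest provider serving H directly from storage (simulated by a dict).
stored_H = {i: f"chunk-{i}" for i in range(1000)}
chunks, timely = timed_challenge(stored_H.__getitem__, num_chunks=10,
                                 total_chunks=1000, tau=1.0, seed=7)
```

In a real deployment the deadline τ would be derived from the assumed bound on the constrained resource (computation, disk access, or network latency), as described above.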
Redundancy. Storage providers guarantee a certain level
of reliability through data replication. For example, Ama-
zon S3 promises to sustain the concurrent loss of data in two
facilities whereas a cheaper oﬀering from Amazon called the
Reduced Redundancy Storage (RRS) guarantees to sustain
the loss of a single storage facility. How can the client make
sure that the offered level of redundancy is in fact being provided?
The authors of  propose a scheme where, similarly
to the ideas described above, random challenges for data
are submitted to the storage provider and timely responses
are expected. In the proposed scheme, the client and cloud
provider agree upon the following: (a) an encoding G of the
data with an erasure code that tolerates a certain fraction
of block losses, and (b) a mapping of the blocks of G onto c
drives, such that the data is spread evenly over the drives.
The client then submits queries, each requesting c randomly
selected blocks of data, such that each block is expected
to reside on a diﬀerent physical drive. The main idea is
that the mechanics of commercial hard-disk drives guaran-
tee a certain lower bound on the time it takes to retrieve the
data, which for small data blocks is dominated by disk seek
time. The scheme is built such that with high probability
responding to the query within the expected time τ would in
fact require c drives accessed concurrently. This algorithm
works for a class of adversaries the authors call cheap and
lazy: they would like to decrease the cost of storage
by storing fewer replicas, but would not make the effort of
changing the data or cheating in other forms. The scheme
is designed for hard-disk drives (and not, e.g., SSDs) and
is not resilient to fully malicious adversaries, which, for example,
may store the data in an encrypted form on c drives
as required, but then store the encryption key in volatile
memory or on a single drive.
Location. One of the biggest concerns users have when
using a cloud storage service is that once in the cloud, the
user is no longer sure where her data is physically located.
Often, for some types of data, users, and especially companies
and organizations that maintain private user data, are
bound by laws and regulations to store their data within a
particular geographical region or country borders. To ad-
dress this, storage location is frequently an integral part of
cloud storage SLAs. Location is also important for disaster
tolerance – the provider may promise not only to replicate
the data but also that the replicas reside in dispersed
datacenters or geographical locations. How can a client then
verify the actual location of her data in the cloud? Obvi-
ously, it is very diﬃcult, if not impossible, to ensure that a
storage provider does not store a copy of the data outside
of the allowed geographical area. Thus, such veriﬁcation
can only work assuming a weaker adversarial model, such as
an economically rational provider in the Proof Of Location
(PoL) veriﬁcation scheme recently proposed by Watson et
al. . Speciﬁcally, they assume that ﬁle replication (copy)
is only performed by the service provider and for the pur-
pose of providing guaranteed reliability for which the user
is charged. Thus, the provider may try to cheat by storing
a copy of the data in a diﬀerent (perhaps cheaper) location,
but only if this is done instead (and not in addition to) stor-
ing the data in the correct (promised) location. In , the
cloud and client agree on the list of storage replicas and their
locations. The client then uses a combination of an Internet
geolocation system, which can determine the location of
a server using network latencies, with a PoR scheme that
can prove that this server actually possesses the data. More
speciﬁcally, the scheme uses a number of trusted auxiliary
servers as landmarks, whose location is known and that can
send challenges to the storage servers claiming to hold the
data, in order to verify their location. A novel PoR protocol
introduced in  allows the client to encode the data once,
after which the server re-codes it multiple times and stores
a diﬀerent encoding of the data on the diﬀerent replicas.
Storing slightly diﬀerent encodings of the data on diﬀerent
servers prevents the servers from colluding, where a server
can claim to have the data, while in fact fetching it from
another server on-demand to answer the veriﬁcation query.
5. VERIFYING PERFORMANCE AND DEPENDABILITY
Non-functional properties of cloud services represent dif-
ferent aspects of quality-of-service (QoS), such as perfor-
mance, dependability, security, etc., and are quantiﬁed with
diﬀerent metrics. Performance metrics include service re-
quest latency which is the necessary time to respond to a
service requested by a client, and service throughput which
is the amount of requests processed by the service per unit
of time. Dependability metrics include service availability
and service reliability . Availability may be measured as
the ratio between the time the cloud service is capable of
returning successful responses to its clients and the total
elapsed time; cloud service availability is measured over a
period of time, usually a year or a month. Availability may
also be represented by the service use rate, that is, the ratio
of the time a cloud service is used to the total time. Cloud
service reliability may be measured as the ratio of successful
service client requests to the total number of requests during a period of
time. Reliability may also be quantiﬁed as mean time be-
tween failures (MTBF) which is the predicted elapsed time
between inherent failures of the service, or mean time to re-
cover (MTTR), which is the average time that a service
takes to recover from a failure. Other metrics may be considered
to capture cloud service costs, such as an energy cost
that reﬂects the energy footprint of a service, or the ﬁnancial
cost of using a cloud service.
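These metrics can be made concrete with a few one-line definitions; the figures in the example at the bottom are hypothetical.

```python
def availability(up_time, total_time):
    """Ratio of time the service returns successful responses to total time."""
    return up_time / total_time

def reliability(successful_requests, total_requests):
    """Ratio of successful client requests to all requests in the period."""
    return successful_requests / total_requests

def mtbf(failure_times):
    """Mean time between consecutive failures."""
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)

def mttr(outages):
    """Mean duration of (start, end) outage intervals."""
    return sum(end - start for start, end in outages) / len(outages)

# Hypothetical month of operation, times in hours.
month_availability = availability(719.0, 720.0)   # one hour of downtime
month_reliability = reliability(9990, 10000)
```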
Thus, a QoS metric is a means to quantify the service
level with regard to a QoS aspect. One might want a service
level to attain a given objective, that is, a Service Level
Objective (SLO). An SLO usually has one of the following
forms: keep a QoS metric above or below a given threshold,
maximize or minimize the QoS metric, etc.
A Service Level Agreement (SLA), in turn, is a combination
of SLOs to be met, negotiated between two parties,
the cloud service provider and its customer. A simple
example of an SLA is the following: "99.5% of requests to cloud
services should be processed, within 2 seconds, and with a
minimal financial cost". This SLA includes three SLOs, respectively
related to service availability, performance, and financial cost.
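The example SLA above can be represented as data, with each SLO checked against measured metrics; metric names and measured values are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SLO:
    metric: str
    kind: str            # "min_threshold", "max_threshold", or "minimize"
    value: float = 0.0

    def satisfied(self, measured: float) -> bool:
        if self.kind == "min_threshold":
            return measured >= self.value
        if self.kind == "max_threshold":
            return measured <= self.value
        return True      # optimization objectives impose no hard bound

# The example SLA from the text, expressed as three SLOs.
sla = [
    SLO("availability", "min_threshold", 0.995),
    SLO("latency_s", "max_threshold", 2.0),
    SLO("financial_cost", "minimize"),
]
measured = {"availability": 0.997, "latency_s": 1.4, "financial_cost": 12.0}
violations = [s.metric for s in sla if not s.satisfied(measured[s.metric])]
```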
5.2 Related Work
The control of services to guarantee the SLA is a critical
requirement for successful performance and dependability
management of services [53, 57, 60]. Much related work has
been done in the area of system QoS management; an interesting
survey is provided in . In the context of Cloud
Computing, existing public cloud services provide very few
guarantees in terms of performance and dependability .
In the following, we review some of the existing public cloud
services regarding their levels of performance and depend-
ability. Amazon EC2 compute service oﬀers a service avail-
ability of 99.95% . However, in case of an outage Ama-
zon requires the customer to send it a claim within thirty
business days. Amazon's S3 storage service guarantees a
service reliability of 99.9% . Here again, to be reimbursed,
the customer has the responsibility to report Amazon S3 re-
quest failures, and to provide evidence to Amazon within
ten business days. On the other hand, Amazon cloud ser-
vices do not provide performance guarantees or other QoS
guarantees. Similarly, Rackspace Cloud Servers compute
service oﬀers a service availability of 99.86%, and Rackspace
Cloud Files storage service provides a service reliability of
99.9% . Azure Compute guarantees a service availabil-
ity level of 99.95% , and Azure Storage guarantees that
99.9% of storage requests are handled within ﬁxed maximum
processing times .
Some recent works consider Service Level Agreements (SLAs)
in cloud environments [26, 54]. An SLA is a contract negotiated
between a cloud service provider and a cloud customer. It
speciﬁes service level objectives (SLOs) that the cloud ser-
vice must guarantee in the form of constraints on quality-
of-service metrics. Chhetri et al. propose the automation
of SLA establishment based on a classiﬁcation of cloud re-
sources in diﬀerent categories with diﬀerent costs, e.g. on-
demand instances, reserved instances and spot instances in
Amazon EC2 cloud . However, this approach does not
provide guarantees in terms of performance nor dependabil-
ity. Macias and Guitart follow a similar approach for SLA
enforcement, based on classes of clients with diﬀerent prior-
ities, e.g. Gold, Silver, and Bronze clients . Here again,
a relative best-eﬀort behavior is provided for clients with
diﬀerent priorities, but no performance and dependability
SLOs are guaranteed.
5.3 Main Challenges and Scientiﬁc Directions
As far as we know, no clouds adequately address service
performance and dependability guarantees, leaving the fol-
lowing questions open:
How to address and combine multiple cloud SLOs in a
consistent and ﬂexible way? Today’s clouds do not provide
guarantees in terms of service performance. One would like
the cloud to allow the customer to specify expectations, e.g.,
that the service request response time does not exceed a
given maximum value, or that the cloud service throughput
does not fall below a given minimum value. Regarding dependability, there
are some initiatives in terms of guaranteed levels of cloud
service availability and reliability [3, 4, 5, 6]. However, the
provided SLOs are ﬁxed by the cloud provider in an ad-hoc
way and cannot be specified by cloud customers in a flexible
way. Furthermore, one would expect the cloud provider
to allow the customer to combine multiple SLOs regarding
performance, dependability, security, cost, etc. How could
these SLOs be combined in a consistent way, knowing that
some of them may be mutually antagonistic and require trade-offs?
How to provide better than heuristics-based and best-eﬀort
cloud QoS? Existing cloud services provide best-eﬀort QoS
management, usually based on over-provisioning resources
and other heuristics for managing cloud resources. However,
cloud customers would expect strict guarantees regarding
cloud service performance, dependability, cost, etc. How
could a cloud provide such strict QoS guarantees?
How to help system designers build QoS-oriented clouds
and assess QoS guarantees? Cloud services are usually de-
signed as black-boxes. Adding QoS management on top of
these black-boxes is far from being trivial, and raises a chal-
lenging question: How to observe the behavior of cloud ser-
vices in an accurate and non-intrusive way? On the other
hand, with current cloud services the customer has the re-
sponsibility to report violations of QoS levels by analyzing
log files [3, 4]. However, this runs counter to one of the main
motivations of cloud computing, namely hiding the administrative overhead from the customer.
To address these challenges in a principled way, we call for
the definition of a new cloud model where QoS and SLA are
first-class citizens. The new model should enrich the general
paradigm of the cloud; it is orthogonal to Infrastructure-as-
a-Service, Platform-as-a-Service, Software-as-a-Service and
other cloud models, and may apply to any of them. The new
cloud model must take into account both the cloud provider
and cloud customer points of view. From the point of view
of the cloud provider, autonomic SLA management must be
provided to handle non-functional properties of cloud ser-
vices, such as performance, dependability and ﬁnancial cost
in the cloud. On the other hand, from the point of view
of the cloud customer, cloud SLA governance must be pro-
vided. It allows cloud customers to be part of the loop and
to be automatically notiﬁed about the state of the cloud,
such as SLA violations or cloud energy consumption. The
former provides more transparency about SLA guarantees,
and the latter aims to raise customer awareness about cloud
service energy footprint. The new cloud model must allow
the customer to choose the terms of the SLA with the cloud
service provider, to specify the set of (possibly weighted)
SLOs he requires, and to agree on the penalties in case of
SLA violation. The SLOs can be expressed as thresholds to
meet, or as QoS metrics to minimize or maximize.
To provide better than best-eﬀort cloud QoS, a control-
theoretic approach should be followed to design fully au-
tonomic cloud services. First, a utility function should be
deﬁned to precisely describe the set of SLOs as speciﬁed in
the SLA, the weights assigned to these SLOs if any, and the
possible trade-oﬀs and priorities between the SLOs. The
cloud service conﬁguration (i.e. combination of resources)
with the highest utility is the best regarding SLA guar-
antees. How, then, can such a cloud service configuration
be found? Control theory techniques, based on modelling
cloud service behavior and proposing control laws and algorithms,
are good candidates for fully autonomic SLA-oriented cloud
services . The challenges in modelling cloud services
are to build accurate models that are able to capture the
non-linear behavior of cloud services, and that are able to
self-calibrate to reflect variations in service workloads.
The challenge for controlling cloud services is to propose
accurate and eﬃcient algorithms and control laws that cal-
culate the best service conﬁguration, and rapidly react to
changes in cloud service usage. Largely distributed cloud
services would require eﬃcient distributed control based on
scalable distributed protocols.
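A minimal sketch of such a utility-driven configuration choice follows; the scores, weights, and configuration names are invented for illustration, with the per-SLO scores assumed to come from a calibrated model of the kind discussed above.

```python
def utility(slo_scores, weights):
    """Weighted utility aggregating per-SLO satisfaction scores in [0, 1]."""
    return sum(weights[k] * slo_scores[k] for k in weights)

def best_configuration(configs, weights):
    """Select the resource configuration with the highest utility."""
    return max(configs, key=lambda name: utility(configs[name], weights))

# Hypothetical per-configuration scores, e.g. produced by a calibrated model.
configs = {
    "2-small-vms": {"performance": 0.6, "dependability": 0.7, "cost": 0.9},
    "4-large-vms": {"performance": 0.9, "dependability": 0.9, "cost": 0.4},
}
weights = {"performance": 0.5, "dependability": 0.3, "cost": 0.2}
chosen = best_configuration(configs, weights)
```

The weights encode the trade-offs and priorities between SLOs specified in the SLA; a controller would re-evaluate this choice as the workload and the measured scores change.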
To help build SLA-oriented clouds, cloud services should
be designed to be controllable by construction. The services
should make it possible to observe their behavior online, to
monitor their changing QoS, and to apply changes to the
service configuration (i.e., the resource set) while the service is running. To
help system designers assess cloud service QoS guarantees,
benchmarking tools are necessary to inject realistic work-
loads, data loads, fault loads, and attack loads into a cloud
service, and to measure their impact on the actual perfor-
mance, dependability and security of the service [8, 50, 69].
6. VERIFYING SECURITY POLICIES
For many years now, SLAs have been standard practice
when setting up the terms of QoS for a service provision.
However, SLAs normally steer clear of any explicit security
commitments, possibly since cloud providers are reserved
about the security guarantees of their services.
This point is supported by sources such as last year's report
by CA Technologies and the Ponemon Institute , where it
was found that, out of 127 cloud service providers in the
US and Europe, over 80% do not believe that securing their
services gives them a competitive advantage. How, then, can
consumers protect their data and applications?
6.1 Requirements, Policies and Compliance
In order to secure cloud services, providers employ secu-
rity measures that depend on a set of requirements. These
requirements stem from two sources: external sources (e.g.,
laws and regulations), and particular requirements that users
could request. Security policies accurately express both kinds
of requirements. To make sure that such policies are re-
spected, there are tools to enforce policies or verify compli-
ance with policies.
External security requirements. To protect cloud
users, oﬃcial security requirements stem from two main
sources: laws and regulations, and standards that providers
should abide by. Sensitive data protection has been the
target of EU and US laws for several years now, be it in
healthcare or telecommunications. In Europe, for example,
directive 95/46/EC protects personal data (among others,
it forbids the collection and disclosure of such data with-
out the subject's consent); in the US, the Health Insurance
Portability and Accountability Act (HIPAA)  aims to restrict access to computer
systems that contain sensitive patient data, as well as to
prevent interception or deletion of such data by unautho-
rised parties. In terms of security standards and guidelines,
the most active sectors are healthcare and banking, with
examples ranging from Health Level 7 to PCI’s Data Secu-
rity Standards . The focus of such standards is securing
healthcare and payment transactions. In all, such external
requirements aﬀect cloud consumers within a single domain
or country, as well as across multiple jurisdictions. It is an
open problem, outside the scope of regulations, what tools
to use and how to employ them in order to satisfy such re-
quirements, for both cloud providers and users.
Security policies. Unlike regulations and laws that can
specify general security constraints in a text form, secu-
rity policies are the machine-understandable specification of
what a user considers to be acceptable or allowed system be-
havior. A security policy of a cloud consumer can specify,
for instance, that customer-identiﬁable data should not be
propagated to other services; or that the owner should be
notiﬁed of any backups or reconﬁgurations done to their
service. Security policies can impose restrictions on: how
to access and use system resources or the provided service;
user accountability; key management; conﬁguration of the
back-end system (e.g., when to erase application data, when
to do backups, connections to security services). Many en-
terprises have such policies already in place either as a good
practice, or for auditing or certiﬁcation purposes.
Tools to enforce or verify compliance. Enforcing
a security policy means performing the actions to ensure
that the application complies with that policy. Examples of
security enforcement tools are Axiomatics XACML Policy
Server, IBM’s Tivoli Security Manager, or XML gateways
such as Vordel. In a cloud setting, users can either: (1) set
up their own enforcers, when they have control over some
part of the infrastructure, or (2) rely on another party to
enforce their policies, and then verify that the enforcement
is done correctly. After-the-fact verification usually involves analyzing execution logs, provided that a reporting service is in place and its output is made available to the user. At runtime, clients can randomly probe the application to discover policy violations (fast but imprecise), or actively monitor application or service output (which can be a performance burden and requires an analysis architecture and process).
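Such after-the-fact log analysis can be sketched as follows, assuming a hypothetical "timestamp event" log format and event vocabulary:

```python
# After-the-fact verification sketch: scan execution-log lines (assumed to be
# exported by the provider's reporting service) for events that a client-side
# policy forbids. The log format and event names are made-up examples.

FORBIDDEN_EVENTS = {"propagate_customer_data", "disable_encryption"}

def find_violations(log_lines):
    """Return (line_number, event) pairs for every forbidden event."""
    violations = []
    for number, line in enumerate(log_lines, start=1):
        _timestamp, _, event = line.partition(" ")
        if event.strip() in FORBIDDEN_EVENTS:
            violations.append((number, event.strip()))
    return violations

log = [
    "2012-07-01T10:00:02 backup",
    "2012-07-01T10:05:41 propagate_customer_data",
    "2012-07-01T10:09:13 reconfigure",
]
print(find_violations(log))  # [(2, 'propagate_customer_data')]
```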
6.2 Existing work
Expressing security constraints. Surprisingly, it is
only very recently that the notion of security service-level
agreements has been proposed in the cloud context: one of
the ﬁrst is an HP report  suggesting that clients should
negotiate those security needs that they can understand,
predict and measure by themselves. Examples include: 95%
of serious security incidents should be solved within one hour
from detection; an up-to-date antivirus to scan the system
every day; minimum network availability in case of an at-
tack; the percentage of unpatched or unmanaged machines.
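Several of these example clauses are directly checkable from measurements. As a minimal sketch, the incident-resolution clause could be verified like this (the data layout and parameter names are illustrative assumptions):

```python
# Sketch: checking the example clause "95% of serious security incidents
# are resolved within one hour of detection" against measured resolution
# times. The list-of-minutes input format is invented for illustration.

def meets_incident_clause(resolution_minutes,
                          threshold_minutes=60.0,
                          required_fraction=0.95):
    """True iff the required fraction of incidents was resolved in time."""
    on_time = sum(1 for m in resolution_minutes if m <= threshold_minutes)
    return on_time / len(resolution_minutes) >= required_fraction

print(meets_incident_clause([12, 45, 58, 59, 61] * 4))  # False: 16/20 on time
print(meets_incident_clause([12, 45, 58, 59] * 5))      # True: 20/20 on time
```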
In a similar vein, Jaatun et al.  suggest that a secu-
rity SLA should include: the security requirements that the
provider will enforce, the process of monitoring security pa-
rameters, collecting evidence, and assembling it to infer any
security incidents; problem reporting; compensation and re-
sponsibilities. To this list, Breaux and Gordon  add the
dimension of constraint changes across jurisdictions when
regulations share a common focus. Further, Meland et al.  compare potential languages that cloud users can use to express such requirements. The authors examine several languages usable for cloud SLAs, among them XACML, WS-Agreement, and LegalXML; however, they conclude that before choosing how to express security requirements, it is more pressing to converge towards common concepts of security
contracts. A step in that direction has been made on the
industry side: the Cloud Security Alliance has issued CSA-
STAR , a public registry of the security controls oﬀered
by popular cloud security providers.
Malicious insiders. The role of the system adminis-
trator has become much more prominent in the cloud, and administrators are a scarce resource. First, they have to
be competent at managing intricate multitenant systems
that still require an amount of manual maintenance; sec-
ond, cloud administrators have an exacerbated security re-
sponsibility because their actions can aﬀect sensitive data
and numerous users. With root privileges, an administra-
tor can read log ﬁles, conﬁgurations, patch binaries and run
executables. As shown with the four attacks exemplified in , a malicious or sloppy administrator can hence violate user data confidentiality and integrity, and even the availability
of cloud nodes. In order to harden compute nodes, Bleik-
ertz et al. suggested a solution  to minimize administra-
tor privileges during maintenance operations. The authors
identify ﬁve privilege levels that an administrator can have
over a node; security policies deployed on each node define the transitions between privilege levels, while enforcement of these policies and system accountability are handled on the nodes themselves.
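Conceptually, this privilege-level approach is a small state machine: node-local policy whitelists the permitted level transitions, and every attempt is logged for accountability. The sketch below is illustrative; the level names and transition table are placeholders, not those of the cited solution:

```python
# Sketch of policy-constrained administrator privilege levels on a node.
# Level names and the transition table are illustrative placeholders.

ALLOWED_TRANSITIONS = {      # (from_level, to_level) pairs the policy permits
    ("none", "observe"),
    ("observe", "maintain"),
    ("maintain", "observe"),
    ("observe", "none"),
}

class Node:
    def __init__(self):
        self.level = "none"  # administrator starts with no privileges
        self.audit = []      # accountability trail of attempted transitions

    def transition(self, target):
        ok = (self.level, target) in ALLOWED_TRANSITIONS
        self.audit.append((self.level, target, ok))
        if ok:
            self.level = target
        return ok

node = Node()
print(node.transition("observe"))  # True: permitted by policy
print(node.transition("full"))     # False: no direct jump to full privileges
print(node.audit)
```

Because denied attempts are recorded alongside granted ones, the audit trail itself becomes evidence for later accountability checks.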
Monitoring and resource management. Recent work has exposed more and more attacks on cloud resources, among them denial-of-service attacks exploiting cloud underprovisioning ; fraudulent consumption of Web resources ; and several vulnerabilities in Xen's resource scheduler that allow malicious customers to use resources at the expense of others, as observed in Amazon EC2 .
These examples show that even if virtualisation is supposed
to ensure isolation among customers, this isolation is not
complete and there is always another shared resource (cache,
memory, network, etc.) that can be exploited. Resource consumption thus becomes very hard to measure, and providers are likely to charge customers incorrectly or to incur unnecessary expenses themselves. To
bridge this gap, Sekar and Maniatis proposed the notion
of veriﬁability by which customers can check their resource
consumption, with the help of a trusted consumption monitor, which validates consumption reports . Such a monitor faces several challenges: reporting can clog bandwidth, and performance might suffer if measurements are taken too frequently. The authors suggest offloading the monitoring to dedicated entities and using sampling and periodic snapshots of resource consumption. A similar solution is suggested by
Wang and Zhou , who aim to provide a dedicated service
to collect and monitor evidence that a multitenant platform is accountable.
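Such sampling-based verification can be sketched as follows; the sampling model, tolerance, and numbers are illustrative assumptions rather than the scheme of the cited work:

```python
# Sketch of verifiable resource accounting via sampling: a monitor samples a
# tenant's usage rate at random instants, extrapolates an estimate over the
# billing period, and flags provider-reported totals that deviate beyond a
# tolerance. All parameters and values are invented for illustration.

def check_report(reported_cpu_seconds, rate_samples, period_s, tolerance=0.10):
    """Estimate consumption as mean sampled rate (cores in use) times the
    billing period; accept the report only if it is within `tolerance`
    of that estimate."""
    estimate = sum(rate_samples) / len(rate_samples) * period_s
    return abs(reported_cpu_seconds - estimate) <= tolerance * estimate

rate_samples = [1.0, 1.1, 0.9, 1.0]   # cores in use at random sample points
print(check_report(3700.0, rate_samples, period_s=3600))  # True: close to ~3600
print(check_report(5400.0, rate_samples, period_s=3600))  # False: 50% over estimate
```

The tension the authors note is visible here: more samples tighten the estimate but cost bandwidth and measurement overhead.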
6.3 Main challenges and next directions
Cloud consumers and providers are often tempted to treat encryption as the single security tool that provides sufficient protection in the cloud. Yet encryption is not a panacea: data and resources are shared, applications are outsourced, and schemes to control the access
to such resources sometimes fall short in the face of sloppy
users, malicious insiders, or system misconﬁgurations. More-
over, when this happens, users are often expected to provide
proof of the security violations. We therefore believe that
users may benefit from additional tools that allow them
to (1) articulate the desired security policy, (2) gain evidence
in case violations happen, and (3) choose a better provider
in case such violations are too frequent. We further expand
on these points below.
First, clients may beneﬁt from being able to express and
reason about the security of their data or services. Cloud
users often expect protection not only for their personal data
when it is stored or when it propagates, but also for their
service access information and usage habits. As of now there
is no agreed-upon way to express constraints on how their data should be handled by a provider (e.g., data lifetime, redundancy schemes, usage and propagation in other countries).
Second, it is hard to prove that a policy violation has
actually occurred. Application monitoring and oﬄine anal-
ysis can be used for this purpose. Monitoring can be done
by the client, with the help of tools that can intercept and
examine relevant events in the cloud. To enforce informa-
tion ﬂow constraints, some solutions have already been sug-
gested: gateways or security proxies control the data ﬂows
both to and from service providers, and are already on the
market as mentioned before. Similarly, solutions like CloudFilter  can intercept HTTP traffic and filter out sensitive data that has been labelled internally; with such labels and
contextual information, a security decision is made based
on a set of policies on data treatment. In all, these solu-
tions are better suited for those customers able to set up
their own policy compliance checking proxies, somewhere in
the cloud. But this is often not possible, nor straightfor-
ward, in which case may prefer to use third party services
to enact various type of security constraints. Such services
need not only consider regulatory requirements that apply
to each jurisdiction and domain (e.g., encryption key size, anonymisation, etc.), but also client-specific security requirements that can be measured at runtime.
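A label-based filtering decision of this kind can be sketched as below; the label names, policy, and destinations are invented for illustration and do not reproduce CloudFilter's actual design:

```python
# Sketch of label-based filtering of outbound HTTP request bodies, in the
# spirit of the proxy solutions described above. The label scheme and the
# trusted-destination policy are hypothetical examples.

SENSITIVE_LABELS = {"confidential", "customer-pii"}

def allow_outbound(body, labels, destination, trusted_destinations):
    """Block a request carrying sensitive labels unless its destination
    is explicitly trusted by policy; pass everything else through."""
    if labels & SENSITIVE_LABELS:
        return destination in trusted_destinations
    return True

trusted = {"storage.internal.example.com"}
print(allow_outbound("q1-report", {"customer-pii"},
                     "api.thirdparty.example", trusted))   # False: blocked
print(allow_outbound("q1-report", {"public"},
                     "api.thirdparty.example", trusted))   # True: allowed
```

In a real deployment this decision would sit inside an intercepting proxy, with labels attached when the data is created and contextual information (user, destination, time) feeding the policy.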
Third, it may be useful for users to be able to compare
providers in terms of security assurance. This proposition is
currently challenging for a user who is in search of a provider,
and we feel that more research should concentrate in that di-
rection. Conversely, if a user uses two cloud providers at the
same time, such a comparison can be achieved using report-
ing tools similar to those used to measure accountability of
user actions and their resource consumption. Furthermore,
cloud providers currently offer little or no proof to their customers that the amounts billed are correct; nor can users prove that they did, or did not, use more resources than
they should have. Some existing approaches [81, 83] suggest relying on external accountability monitors, but there are
still several challenges in that respect: performance, trust
model used, and privacy. In terms of performance, it is a
challenge to draw the line between how often to report ser-
vice activity so as not to clog the communication lines, and
how much information in the report is actually relevant for
later analysis. In terms of trust model, it is important to
determine to what extent the consumer and the provider
should trust each other in reporting truthfully. Tools for en-
suring timestamping and log tamper-resistance are already
in place. In terms of privacy, reporting should be suﬃcient
to detect faults and at the same time should not expose
private user data.
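One standard building block for such tamper-resistant reporting is a hash-chained log, in which each entry commits to the digest of its predecessor, so that any retroactive change invalidates all later entries. A minimal sketch:

```python
import hashlib

# Minimal hash-chained log: each entry's digest covers the previous entry's
# digest, so altering any record breaks verification of the whole suffix.

def append(chain, entry):
    """Add an entry, chaining its digest to the previous one."""
    prev = chain[-1][1] if chain else "0" * 64
    digest = hashlib.sha256((prev + entry).encode()).hexdigest()
    chain.append((entry, digest))

def verify(chain):
    """Recompute every digest; True iff no entry was altered."""
    prev = "0" * 64
    for entry, digest in chain:
        if hashlib.sha256((prev + entry).encode()).hexdigest() != digest:
            return False
        prev = digest
    return True

chain = []
append(chain, "10:00 backup started")
append(chain, "10:07 backup finished")
print(verify(chain))                                  # True
chain[0] = ("10:00 nothing happened", chain[0][1])    # tamper with entry 1
print(verify(chain))                                  # False
```

Combined with trusted timestamping, such chains let either party detect after the fact whether reports were rewritten, without revealing more than the report contents themselves.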
7. Conclusion
This paper surveys the tools and methods that cloud users
and service providers can employ to verify that cloud ser-
vices behave as expected. We focus on the veriﬁcation of
several properties: the identity of the service and of the
nodes the service runs on; functional correctness of a service;
SLA-imposed parameters like performance and dependabil-
ity; and lastly the compliance of the service with security
requirements as specified by a security policy. We discussed the state of the art in these areas and identified gaps and challenges that explain the current lack of sufficient tools for monitoring and evaluating cloud services. In each of these areas
we highlighted new and promising directions that we believe
to be instrumental in developing such tools in the future. We
hope that our paper will encourage future research in this area.
Acknowledgments
The authors would like to thank Rüdiger Kapitza and the other organisers of the Dagstuhl seminar 12281 “Security and Dependability for Federated Cloud Platforms” (July 2012), who have bolstered this collaboration.
References
 Web Services Description Language (WSDL) 1.1.
 SOAP Version 1.2 Part 1: Messaging Framework (Second Edition). http://www.w3.org/TR/soap12-part1, 2007.
 Amazon EC2 SLA. https://aws.amazon.com/ec2-sla/,
 Amazon S3 SLA. https://aws.amazon.com/simpledb/,
 Rackspace SLA.
 Windows Azure Compute SLA.
 Windows Azure Storage SLA. https:
 D. Agarwal and S. K. Prasad. Azurebench: Benchmarking
the storage services of the azure cloud platform. In IPDPS
Workshops, pages 1048–1057. IEEE Computer Society,
 N. Alon, M. Krivelevich, I. Newman, and M. Szegedy.
Regular languages are testable with a constant number of
queries. In Proc. 40th IEEE Symposium on Foundations of
Computer Science, pages 645–655, 1999.
 American Express may have failed to encrypt data.
american-express-may-have-failed-to-encrypt-data/
 E. Anderson, X. Li, M. Shah, J. Tucek, and J. Wylie. What
consistency does your key-value store actually provide. In
Proceedings of the Sixth international conference on Hot
topics in system dependability, pages 1–16. USENIX
 G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner,
Z. Peterson, and D. Song. Provable data possession at
untrusted stores. In Proceedings of the 14th ACM
conference on Computer and communications security,
pages 598–609. ACM, 2007.
 C. Băsescu, C. Cachin, I. Eyal, R. Haas, A. Sorniotti, M. Vukolić, and I. Zachevsky. Robust data sharing with
key-value stores. In Proc. Intl. Conference on Dependable
Systems and Networks (DSN), June 2012.
 S. A. Baset. Cloud SLAs: present and future. SIGOPS
Oper. Syst. Rev., 46(2):57–66, July 2012.
 A. Bessani, M. Correia, B. Quaresma, F. André, and
P. Sousa. DepSky: Dependable and secure storage in a
cloud-of-clouds. In Proc. 6th European Conference on
Computer Systems (EuroSys), pages 31–46, 2011.
 S. Bleikertz, A. Kurmus, Z. A. Nagy, and M. Schunter.
Secure cloud maintenance – protecting workloads against
insider attacks. In ASIACCS ACM Symposium on
Information, Computer and Communications Security,
2012. to appear.
 K. Bowers, A. Juels, and A. Oprea. Hail: a high-availability
and integrity layer for cloud storage. In Proceedings of the
16th ACM conference on Computer and communications
security, pages 187–198. ACM, 2009.
 K. Bowers, A. Juels, and A. Oprea. Proofs of retrievability:
Theory and implementation. In Proceedings of the 2009
ACM workshop on Cloud computing security, pages 43–54.
 K. Bowers, M. van Dijk, A. Juels, A. Oprea, and R. Rivest.
How to tell if your cloud ﬁles are vulnerable to drive
crashes. In Proceedings of the 18th ACM conference on
Computer and communications security, pages 501–514.
 S. Butt, H. A. Lagar-Cavilla, A. Srivastava, and
V. Ganapathy. Self-service Cloud Computing. In CCS,
 C. Cachin and M. Geisler. Integrity protection for revision
control. In Applied Cryptography and Network Security,
pages 382–399. Springer, 2009.
 C. Cachin, I. Keidar, and A. Shraer. Fail-aware untrusted
storage. SIAM Journal on Computing, 40(2):493–533, 2011.
 C. Cachin and M. Schunter. A Cloud You Can Trust.
a-cloud-you-can-trust, 2011.
 C. Cadar, D. Dunbar, and D. Engler. Klee: unassisted and
automatic generation of high-coverage tests for complex
systems programs. In Proceedings of the 8th USENIX
conference on Operating systems design and
implementation, OSDI’08, pages 209–224, Berkeley, CA,
USA, 2008. USENIX Association.
 R. Cellan-Jones. The Sidekick Cloud Disaster.
 M. B. Chhetri, Q. B. Vo, and R. Kowalczyk. Policy-Based
Automation of SLA Establishment for Cloud Computing
Services. In The 2012 12th IEEE/ACM International
Symposium on Cluster, Cloud and Grid Computing
(CCGrid 2012), pages 164–171, Washington, DC, USA,
 V. Chipounov, V. Kuznetsov, and G. Candea. S2e: a
platform for in-vivo multi-path analysis of software
systems. In Proceedings of the sixteenth international
conference on Architectural support for programming
languages and operating systems, ASPLOS ’11, pages
265–278, New York, NY, USA, 2011. ACM.
 V. Chipounov, V. Kuznetsov, and G. Candea. The s2e
platform: Design, implementation, and applications. ACM
Trans. Comput. Syst., 30(1):2:1–2:49, Feb. 2012.
 H. Chockler and O. Kupferman. ω-regular languages are
testable with a constant number of queries. Theor.
Comput. Sci., 329(1-3):71–92, 2004.
 B. Chun, C. Curino, R. Sears, A. Shraer, S. Madden, and
R. Ramakrishnan. Mobius: uniﬁed messaging and data
serving for mobile apps. In Proceedings of the 10th
international conference on Mobile systems, applications,
and services, pages 141–154. ACM, 2012.
 B. Chun, P. Maniatis, S. Shenker, and J. Kubiatowicz.
Attested append-only memory: Making adversaries stick to
their word. ACM SIGOPS Operating Systems Review,
 Cloud Security Alliance. CSA - Security, Trust, and
 CNN Money. Amazon EC2 outage downs Reddit, Quora.
 P. S. S. Council. PCI Data Security Standard, v2. https:
 R. DeVries. RichardDeVries’s Journal: How Google handles
a bug report.
 A. Feldman, W. Zeller, M. Freedman, and E. Felten. Sporc:
Group collaboration using untrusted cloud resources.
OSDI, Oct, 2010.
 A. Ferdowsi. The Dropbox blog: Yesterday’s Authentication
Bug. https://blog.dropbox.com/?p=821, 2011.
 R. T. Fielding. Chapter 5: Representational State Transfer
(REST). University of California, Irvine, 2000. Ph.D.
 D. G. Gordon and T. D. Breaux. Managing
multi-jurisdictional requirements in the cloud: towards a
computational legal landscape. In Proceedings of the 3rd
ACM workshop on Cloud computing security workshop,
CCSW ’11, pages 83–94, New York, NY, USA, 2011. ACM.
 T. C. Group. TPM Main Speciﬁcation Level 2 Version 1.2,
Revision 130, 2006.
 J. Guitart, J. Torres, and E. Ayguadé. A survey on
performance management for internet applications.
Concurr. Comput. : Pract. Exper., 22(1):68–106, 2010.
 M. Herlihy and J. Wing. Linearizability: A correctness
condition for concurrent objects. ACM Transactions on
Programming Languages and Systems (TOPLAS),
 J. Idziorek and M. Tannian. Exploiting cloud utility models
for proﬁt and ruin. 2012 IEEE Fifth International
Conference on Cloud Computing, 0:33–40, 2011.
 P. Institute. Security of Cloud Computing Providers Study.
security-of-cloud-computing-providers-final-april-2011.
 M. Jaatun, K. Bernsmed, and A. Undheim. Security SLAs -
An Idea Whose Time Has Come? In Multidisciplinary
Research and Practice for Information Systems, volume
7465 of Lecture Notes in Computer Science, pages 123–130.
Springer Berlin Heidelberg, 2012.
 A. Juels and B. S. Kaliski, Jr. Pors: proofs of retrievability
for large ﬁles. In Proceedings of the 14th ACM conference
on Computer and communications security, CCS ’07, pages
584–597. ACM, 2007.
 J. C. King. A new approach to program testing. In
Proceedings of the international conference on Reliable
software, pages 228–233, New York, NY, USA, 1975. ACM.
 M. Krigsman. Intuit: Pain and Pleasure in the Cloud.
intuit-pain-and-pleasure-in-the-cloud/14880, 2011.
 J. C. Laprie. Dependable Computing and Fault-Tolerance:
Concepts and Terminology. In IEEE Int. Symp. on
Fault-Tolerant Computing (FTCS), 1985.
 A. Lenk, M. Menzel, J. Lipsky, S. Tai, and P. Oﬀermann.
What are you paying for? performance benchmarking for
infrastructure-as-a-service oﬀerings. In L. Liu and
M. Parashar, editors, IEEE CLOUD, pages 484–491. IEEE,
 J. Li, M. N. Krohn, D. Mazières, and D. Shasha. Secure
untrusted data repository (sundr). In OSDI, pages 121–136,
 H. Liu. A new form of dos attack in a cloud and its
avoidance mechanism. In Proceedings of the 2010 ACM
workshop on Cloud computing security workshop, CCSW
’10, pages 65–76, New York, NY, USA, 2010. ACM.
 C. Loosley, F. Douglas, and A. Mimo. High-Performance
Client/Server. John Wiley & Sons, Nov. 1997.
 M. Macias and J. Guitart. Client Classiﬁcation Policies for
SLA Enforcement in Shared Cloud Datacenters. In The
2012 12th IEEE/ACM International Symposium on
Cluster, Cloud and Grid Computing (CCGrid 2012), pages
156–163, Washington, DC, USA, 2012.
 L. Malrait, S. Bouchenak, and N. Marchand. Experience
with ConSer: A System for Server Control Through Fluid
Modeling. IEEE Transactions on Computers,
 P. Maniatis, D. Akhawe, K. Fall, E. Shi, S. McCamant, and
D. Song. Do You Know Where Your Data Are? Secure
Data Capsules for Deployable Data Protection. In HotOS,
 E. Marcus and H. Stern. Blueprints for High Availability.
Wiley, Sept. 2003.
 D. Mazières and D. Shasha. Building secure file systems out
of byzantine storage. In Proceedings of the twenty-ﬁrst
annual symposium on Principles of distributed computing,
pages 108–117. ACM, 2002.
 P. H. Meland, K. Bernsmed, M. G. Jaatun, A. Undheim,
and H. Castejon. Expressing cloud security requirements in
deontic contract languages. In CLOSER, pages 638–646.
 D. A. Menascé and V. A. F. Almeida. Capacity Planning
for Web Services: Metrics, Models, and Methods. Prentice
 R. Merkle. Protocols for public key cryptosystems. In IEEE
Symposium on Security and privacy, volume 1109, pages
 B. Monahan and M. Yearworth. Meaningful Security SLAs.
HPL-2005- 218R1.html, 2008.
 A. Oprea and M. Reiter. On consistency of encrypted ﬁles.
Distributed Computing, pages 254–268, 2006.
 I. Papagiannis and P. Pietzuch. Cloudﬁlter: practical
control of sensitive data propagation to the cloud. In
Proceedings of the 2012 ACM Workshop on Cloud
computing security workshop, CCSW ’12, pages 97–102,
New York, NY, USA, 2012. ACM.
 R. Patton. Software Testing. SAMS Publishing, second
 S. Pearson and A. Benameur. Privacy, security and trust
issues arising from cloud computing. Cloud Computing
Technology and Science, IEEE International Conference
on, 0:693–702, 2010.
 M. Pezze and M. Young. Software Testing and Analysis:
Process, Principles and Techniques. Wiley, 2007.
 F. Rocha and M. Correia. Lucy in the sky without
diamonds: Stealing conﬁdential data in the cloud. In
Proceedings of the 2011 IEEE/IFIP 41st International
Conference on Dependable Systems and Networks
Workshops, DSNW ’11, pages 129–134, Washington, DC,
USA, 2011. IEEE Computer Society.
 A. Sangroya, D. Serrano, and S. Bouchenak. Benchmarking
Dependability of MapReduce Systems. In The 31st IEEE
International Symposium on Reliable Distributed Systems
(SRDS 2012), 2012.
 N. Santos, K. P. Gummadi, and R. Rodrigues. Towards
Trusted Cloud Computing. In HotCloud, 2009.
 N. Santos, R. Rodrigues, K. Gummadi, and S. Saroiu.
Policy-Sealed Data: A New Abstraction For Building
Trusted Cloud Services. In USENIX Security, 2012.
 J. Schiﬀman, T. Moyer, H. Vijayakumar, T. Jaeger, and
P. McDaniel. Seeding Clouds with Trust Anchors. In
 V. Sekar and P. Maniatis. Veriﬁable resource accounting for
cloud computing services. In Proceedings of the 3rd ACM
workshop on Cloud computing security workshop, CCSW
’11, pages 21–26, New York, NY, USA, 2011. ACM.
 SensePost Blog, DEF CON 17 Conference. Clobbering the
Cloud, 2009. http://www.sensepost.com/blog/3706.html.
 H. Shacham and B. Waters. Compact proofs of
retrievability. Advances in Cryptology-ASIACRYPT 2008,
pages 90–107, 2008.
 A. Shraer, C. Cachin, A. Cidon, I. Keidar, Y. Michalevsky,
and D. Shaket. Venus: Veriﬁcation for untrusted cloud
storage. In Proceedings of the 2010 ACM workshop on
Cloud computing security workshop, pages 19–30. ACM,
 E. G. Sirer, W. de Bruijn, P. Reynolds, A. Shieh, K. Walsh,
D. Williams, and F. B. Schneider. Logical Attestation: An
Authorization Architecture for Trustworthy Computing. In
 The Guardian. PlayStation Network hack: why it took
Sony seven days to tell the world.
apr/27/playstation-network-hack-sony, 2011.
 United States Congress. Health Insurance Portability Act.
 M. van Dijk, A. Juels, A. Oprea, R. Rivest, E. Stefanov,
and N. Triandopoulos. Hourglass schemes: how to prove
that cloud ﬁles are encrypted. In Proceedings of the 2012
ACM conference on Computer and communications
security, pages 265–280. ACM, 2012.
 C. Wang and Y. Zhou. A collaborative monitoring
mechanism for making a multitenant platform accountable.
In Proceedings of the 2nd USENIX conference on Hot
topics in cloud computing, HotCloud’10, pages 18–18,
Berkeley, CA, USA, 2010. USENIX Association.
 G. Watson, R. Safavi-Naini, M. Alimomeni, M. Locasto,
and S. Narayan. Lost: location based storage. In
Proceedings of the 2012 ACM Workshop on Cloud
computing security workshop, pages 59–70. ACM, 2012.
 J. Yao, S. Chen, C. Wang, D. Levy, and J. Zic.
Accountability as a Service for the Cloud. Services
Computing, IEEE International Conference on, 0:81–88,
 A. Yumerefendi and J. Chase. Strong accountability for
network storage. ACM Transactions on Storage (TOS),
 K. Zellag and B. Kemme. How consistent is your cloud
application? In Proceedings of the Third ACM Symposium
on Cloud Computing, page 6. ACM, 2012.
 F. Zhang, J. Chen, H. Chen, and B. Zang. CloudVisor:
Retroﬁtting Protection of Virtual Machines in Multi-tenant
Cloud with Nested Virtualization. In SOSP, 2011.
 F. Zhou, M. Goel, P. Desnoyers, and R. Sundaram.
Scheduler vulnerabilities and attacks in cloud computing.
IEEE International Symposium on Networking Computing
and Applications, 2011.