An Adaptive Framework for Tunable Consistency and Timeliness
Using Replication∗
Sudha Krishnamurthy, William H. Sanders
Coordinated Science Laboratory,
Dept. of Computer Science, and
Dept. of Electrical & Computer Engineering
University of Illinois at Urbana-Champaign
Michel Cukier
Center for Reliability Engineering
Dept. of Materials & Nuclear Engineering
University of Maryland, College Park
E-mail: {krishnam,whs}@crhc.uiuc.edu, mcukier@eng.umd.edu
Abstract
One of the well-known challenges in using replication to ser-
vice multiple clients concurrently is that of delivering a timely and
consistent response to the clients. In this paper, we address this
problem in the context of client applications that have specific tem-
poral and consistency requirements. These applications can toler-
ate a certain degree of relaxed consistency, in exchange for better
response time. We propose a flexible QoS model that allows these
clients to specify their temporal and consistency constraints. In
order to select replicas to serve these clients, we need to control
the inconsistency of the replicas, so that we have a large enough
pool of replicas with the appropriate state to meet a client’s timeli-
ness, consistency, and dependability requirements. We describe an
adaptive framework that uses lazy update propagation to control
the replica inconsistency and employs a probabilistic approach to
select replicas dynamically to service a client, based on its QoS
specification. The probabilistic approach predicts the ability of a
replica to meet a client’s QoS specification by using the perfor-
mance history collected by monitoring the replicas at runtime. We
conclude with experimental results based on our implementation.
∗This research has been supported by DARPA contract F30602-98-C-0187.
1. Introduction
Replicating distributed services enables us to service multiple
clients concurrently, and deliver good response times,
by selecting different replicas to service different clients.
However, since concurrency has the potential to introduce
replica inconsistency, one of the challenges in replicating
distributed services is the problem of supporting concurrent
client operations while ensuring that the replicated state
does not diverge in an unacceptable manner. Traditional
replica consistency models provide a binary view of consistency:
strong consistency with immediate convergence, or
weak consistency with eventual convergence. Both of these
consistency models have been studied extensively. In the
strong consistency model, concurrent operations on replicated
data are equivalent to a sequential execution on non-replicated
data. Pessimistic replication algorithms, such as
active and passive replication (e.g., [1, 11, 8, 12]), have traditionally
been used to maintain strong consistency among
replicated data. Although these algorithms, which provide
single-copy semantics, ensure correctness for a wide class
of applications (e.g., banking transactions), the performance
overheads incurred in maintaining mutually consistent replicas
may be unreasonably high for clients that do not require
strong consistency. Further, strong consistency may not be a
viable option in environments in which some of the replicas
run on hosts and links that either are inherently slow, or tend
to become slow due to transient overloads and failures.
On the other hand, in the weak consistency model, op-
erations are performed on some subset of replicas, and the
updates are propagated to the other replicas either lazily or
on demand. Typically, the only guarantee provided to the
clients is that the replicated state will eventually converge,
if update activity ceases. Several optimistic replication al-
gorithms (e.g., [2, 9]) have been proposed for applications
that can tolerate relaxed consistency. These algorithms al-
low a client to access any replica in order to provide bet-
ter responsiveness, unlike the pessimistic algorithms, which
allow access to only those servers that have the most up-
to-date state. However, if the clients access different servers
before their states converge, the resulting inconsistency may
lead to conflicts.
In this work, our goal is to support applications that have
specific time constraints. These applications can relax their
consistency requirements in exchange for improving the probability
that their response time constraints can be met. However,
in order for the response to be meaningful, they need
some bounds on the inconsistency in the response they re-
ceive. Examples of applications that benefit from relaxed
but bounded inconsistency in exchange for timeliness in-
clude real-time database applications, such as online stock-
trading and traffic-monitoring applications. In order to sup-
port such applications that have specific temporal and con-
sistency requirements effectively, we use an approach that
allows the users to express their timeliness and consistency
requirements as a quality-of-service (QoS) specification. To
study the trade-offs between timeliness and consistency, we
propose an adaptive middleware framework that allows us
to explore the intermediate space between the above binary
views of consistency. We have implemented this framework
in AQuA, a CORBA-based middleware that supports trans-
parent replication of objects across a LAN [11].
We now list the main contributions of this paper. In Sec-
tion 2, we propose a QoS model that allows a broad spec-
trum of applications to express their timeliness and consis-
tency requirements. In Section 3 we describe our frame-
work that allows us to build protocols for providing differ-
ent consistency guarantees. These protocols use a combi-
nation of immediate and lazy update propagation to ensure
that the states of the replicas do not diverge in an unac-
ceptable manner. As a proof-of-concept, in Section 4, we
describe the protocol we have implemented to maintain se-
quential consistency among the replicas. In Section 5 we
describe a probabilistic approach that allows a middleware
to dynamically select replicas to service the clients based
on the QoS specification of the clients. Similar to the work
we presented in [5], this approach uses the performance history
of replicas obtained by online performance monitoring
to predict a replica’s ability to meet a client’s QoS speci-
fication. However, while our previous work assumed that
the replicas were stateless, our current model addresses this
selection problem in the context of replicas with state. In
Section 6, we present a few experimental results based on
our implementation.
2. QoS Model for Timeliness and Consistency
Several other researchers have extended traditional consistency
models by incorporating the notion of time in order
to bound the degree of inconsistency. For example, the no-
tion of epsilon-serializability (defined in [10]), and timed
consistency models (defined in [13, 6]), require that if a
write is executed at time t, then the effect of the write should
be visible to others by t + x, where x is the maximum ac-
ceptable delay for propagating the effect of the write. The
TACT middleware [15] is another related work that attempts
to provide a middleware framework for tunable consistency
and availability. The consistency measures used by TACT
to bound the level of inconsistency include the order er-
ror, which limits the number of tentative writes that can
be outstanding at any replica; the numerical error, which
bounds the difference between the value delivered to the
client and the most consistent value; and staleness, which
places a real-time bound on the delay of propagating the
writes among the replicas. However, while these models
provide a way to quantify consistency, they do not address
the problem of tuning consistency requirements in the pres-
ence of specific transaction deadlines or response time con-
straints. We now describe our QoS model that allows the
clients to express their consistency and response time re-
quirements.
Our request model enables a middleware to distinguish
invocations that modify the state of the object they invoke
from those that merely retrieve the state. To do this, a client
application has to explicitly specify all the read-only meth-
ods it invokes on an object by their names. If an operation
is not specified as read-only, then our middleware considers
it to be an update operation. An update operation is any in-
vocation that modifies the state of the object on which the
operation is performed, and may be either a write-only op-
eration or a read-write operation.
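The classification rule above can be sketched in a few lines. This is a minimal illustration, not the AQuA interface: the class name `RequestClassifier` and the method names `get_document`, `get_version`, and `append_text` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestClassifier:
    """Hypothetical helper: classifies invocations by method name."""

    # Names the client has explicitly declared as read-only.
    read_only_methods: frozenset = frozenset()

    def kind(self, method_name: str) -> str:
        # Any method that was not declared read-only is treated as an update
        # (write-only or read-write), as in the request model above.
        if method_name in self.read_only_methods:
            return "read-only"
        return "update"


# Hypothetical method names for a document-sharing object.
classifier = RequestClassifier(frozenset({"get_document", "get_version"}))
```

Treating undeclared methods as updates is the conservative default: a misclassified read is merely ordered more strictly than necessary, whereas a misclassified update could bypass the ordering guarantee.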
Our QoS model regards consistency as a two-dimensional
attribute: <ordering guarantee, staleness threshold>. The
ordering guarantee is a service-specific attribute that de-
notes the guarantee provided by a service to all of its clients
about the order in which their requests will be processed by
the servers, so as to prevent conflicts between operations.
Some well-known ordering guarantees that a service can offer
are sequential (or total), causal, and FIFO [1]. In this
paper, we target services that provide sequential ordering
guarantees.
The staleness threshold, which is specified by the client,
is a measure of the maximum degree of staleness a client is
willing to tolerate in the response it receives. In our frame-
work, the staleness of a response denotes the staleness of
the state of the replica that sent the response. We compute
the staleness of a replica by associating a timestamp with
each update operation. We use timestamps based on “logical
clocks” [7] because this obviates the need for synchronized
clocks across the distributed replicas. These logical times-
tamps make it possible to specify the staleness in terms of
“versions.” A replica whose staleness is x has a state that
has not yet been updated to reflect the modifications en-
suing from the most recent x updates. The replica’s state,
however, reflects the modifications of all updates committed
prior to that. In order to meet a client’s QoS specification, a
response delivered to the client should be no more stale than
the staleness threshold specified by the client.
The timeliness specification includes a pair of attributes:
<response time, probability of timely response>. This pair
specifies the time by which a client expects a response after
it has transmitted its request, and the probability with which
it expects its temporal constraint to be met. Failure to meet
a client’s deadline results in a timing failure for the client.
In our QoS model, the timeliness attribute is applicable only
for read-only requests and not for update operations.
Figure 1. Replica organization [labels: Primary Replication Group, Leader, Secondary Replication Group, QoS Group, Client1, Client2, Client3, Service S]
As an example of the use of the above QoS model, consider
a document-sharing application in which multiple readers
and writers concurrently access a document that is updated
in sequential mode. Using the above model, a client
of such an application can specify that he wishes to obtain
a copy of the document that is not more than 5 versions old
within 2.0 seconds with a probability of at least 0.7.
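As an illustration, the client-specified part of such a QoS specification could be represented as follows. This is a sketch under our own naming assumptions: `QoSSpec`, its field names, and `response_meets_spec` are hypothetical and are not part of the AQuA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoSSpec:
    """Hypothetical container for a client's QoS specification."""

    staleness_threshold: int  # maximum tolerated staleness, in versions
    deadline_s: float         # response time constraint, in seconds
    min_probability: float    # minimum probability of a timely response

    def __post_init__(self):
        if self.staleness_threshold < 0:
            raise ValueError("staleness threshold must be non-negative")
        if self.deadline_s <= 0:
            raise ValueError("deadline must be positive")
        if not 0.0 <= self.min_probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")


def response_meets_spec(spec: QoSSpec, replica_staleness: int,
                        response_time_s: float) -> bool:
    """Check one response against the staleness and deadline bounds.

    The probability attribute is not checked here: it constrains the
    fraction of timely responses over many requests, not a single one.
    """
    return (replica_staleness <= spec.staleness_threshold
            and response_time_s <= spec.deadline_s)


# The document-sharing example: at most 5 versions old, within 2.0 s,
# with probability at least 0.7.
doc_spec = QoSSpec(staleness_threshold=5, deadline_s=2.0, min_probability=0.7)
```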
3. Framework for Tunable Consistency
Given the above QoS model, our goal is to build a framework
that can be tuned to support the different application-specific
requirements at the middleware layer. In order to
design this framework, we address three main issues: 1) or-
ganization of the replicas, 2) development of the protocols
that implement different consistency semantics and design
of an infrastructure that would allow these protocols to be
used on demand, and 3) development of a mechanism to
select replicas dynamically to service a client based on the
client’s QoS requirements. We now describe the approach
we have used to address these issues in the context of AQuA.
All the replicas offering the same service are organized
into two groups: a primary replication group and a sec-
ondary replication group, as shown in Figure 1. We also
have a QoS group, which encompasses all of the replicas of a
service and their clients. In our implementation, all of these
groups are derived from Maestro groups [14], and members
of a group communicate with each other by making use of
the Maestro-Ensemble group communication protocol [3].
For each group, Ensemble elects one of the members of the
group as the leader. However, only the leader of the primary
group is relevant to this work. We depend on Maestro-Ensemble
to provide reliable, virtual synchrony, and FIFO
messaging guarantees, and we build upon these guarantees
to provide the different end-to-end consistency guarantees.
We also depend on Maestro-Ensemble to inform the group
members when changes in the group membership occur.
The primary group is used to implement strong consis-
tency semantics, whereas the secondary group implements
weaker consistency semantics. The size of these groups can
be tuned to implement a range of consistency semantics.
The above two-level replica organization was motivated by
the need to favor the operations that can tolerate relaxed
consistency to a certain degree in exchange for a timely re-
sponse. We reduce the overheads incurred by a write-all
scheme, such as an active replication scheme, by perform-
ing the updates on the smaller primary group, while allow-
ing the secondary replicas, which are greater in number, to
handle the read-only operations. The primary replicas subsequently
bring the state of the secondary replicas up-to-date
using lazy update propagation. The degree of divergence
between the states of primary and secondary replicas can be
bounded by choosing an appropriate frequency for the lazy
update propagation. Thus, while clients that need the most
up-to-date state to be reflected in their response may have to
depend more on the response from a primary replica, clients
that are willing to tolerate a certain degree of staleness in
their response can achieve better response times, due to the
higher availability of the secondary replicas.
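The relationship between the propagation frequency and the divergence bound can be illustrated with a small simulation. The sketch below makes a simplifying assumption that is ours, not the paper's: the lazy publisher propagates its state after every `publish_every` committed updates rather than on a timer. The function name is hypothetical.

```python
def max_secondary_staleness(num_updates: int, publish_every: int) -> int:
    """Return the largest staleness (in versions) a secondary replica reaches.

    Assumption: lazy propagation is triggered after every `publish_every`
    committed updates, and staleness is observed just before propagation.
    """
    if publish_every <= 0:
        raise ValueError("publish_every must be positive")
    primary_csn = 0    # updates committed by the primary group
    secondary_csn = 0  # updates reflected in the secondary replica's state
    worst = 0
    for _ in range(num_updates):
        primary_csn += 1  # primaries commit the update immediately
        worst = max(worst, primary_csn - secondary_csn)
        if primary_csn % publish_every == 0:
            secondary_csn = primary_csn  # lazy publisher propagates its state
    return worst
```

Under this assumption the secondary replica never lags by more than `publish_every` versions, so a client with staleness threshold 5 could always be served by a secondary replica if propagation happened at least every 5 updates.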
4. Tunable Consistency Protocols
In Section 2, we mentioned that to maintain replica con-
sistency, the replicas should serve their clients by respect-
ing the ordering guarantee associated with the service. Our
framework allows different ordering guarantees to be im-
plemented as timed consistency handlers in the AQuA gate-
way, as shown in Figure 2. A client can communicate with
a replicated service by using the gateway handler appropri-
ate for the service. For example, Figure 2 shows a client
communicating with two different services. Service A is
an example of an application, such as a document-editing
application, that guarantees sequential consistency using to-
tal ordering. Service B represents an application, such as
a banking transaction, that guarantees FIFO ordering. The
client uses the sequential consistency handler to communi-
cate with service A, while it uses the FIFO handler to com-
municate with service B. In this paper, we will describe the
sequential consistency handler we have implemented in the
AQuA middleware. The protocol processing in the handler
is divided into a client-side protocol and a server-side protocol.
In this section we will describe the processing involved
on the server side in order to maintain sequential consistency
across the two groups of replicas, and in the next section we
will describe how the client-side protocol uses these replicas
to meet the client’s QoS specification.
4.1. Sequential Consistency Protocol
In our sequential consistency model, the update requests
of the clients are executed by all of the primary replicas in
the same order. The secondary replicas do not directly service
a client’s update request. Instead, the secondary replicas
update their state when one of the members of the primary
group lazily propagates its updated state to the secondary
group. We call this member the lazy publisher. Thus,
although the replicas may update their state at different points
in time, they all see the effects of the updates in the same sequential
order. The order in which an update is committed
by the replicas is determined by its Global Sequence Number
(GSN), which is assigned by the leader of the primary
group. The leader merely serves as the sequencer and does
not actually service the client’s request.
Figure 2. Timed consistency handlers in AQuA [labels: Client, Server A, Server B, Gateway, Server Gateway, Gateway Handlers (TOTAL, FIFO), Maestro/Ensemble, LAN]
We now describe how the updates and read-only requests
are processed by the replicas. The processing depends on
whether the replica is a primary or secondary replica. All
of this processing is done at the middleware layer, within
the gateway handler of the replicas. Each gateway handler
maintains a pair of variables, my_GSN and my_CSN, which
are used by the protocol to provide sequential consistency.
my_GSN is the replica’s local view of the current GSN, and
my_CSN is the replica’s commit sequence number (CSN),
which indicates the GSN of the most recent update committed
by the replica. The commit sequence number increases
strictly monotonically, and a replica is assumed to have
committed every update whose global sequence number is
less than or equal to the value of its my_CSN. Our protocol
ensures that the consistency guarantees are preserved even
when replica failures occur. This is done by handling the
failures of the sequencer and the lazy publisher, which play
a crucial role in providing sequential consistency semantics.
However, we omit the details of the failure handling in this
paper due to the space constraint.
4.1.1. Update Operations. The update operations are sent
to all members of the primary group, including the sequencer.
When the sequencer receives an update request from a client,
it advances the GSN and broadcasts the GSN assignment for
the request to all the other members of the primary group.
A non-leader primary replica can service an update request
immediately, provided it has already received the GSN
broadcast for that request from the sequencer. Otherwise,
the replica stores the request in a buffer and processes it
upon receiving the GSN assignment from the sequencer.
If the update request is in sequential order, the replica advances
its CSN, and then delivers the update request to the
server application. If, however, the request is out of the
global order, the replica buffers the request and commits it at
a later time, after the intermediate requests have been committed.
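The update handling at a non-leader primary replica can be sketched as follows. This is a simplified, single-threaded illustration rather than the AQuA gateway code: the class and method names are hypothetical, GSNs are assumed to start at 1 and increase by 1, and failure handling is omitted.

```python
class PrimaryReplica:
    """Hypothetical sketch of GSN-ordered update commitment."""

    def __init__(self):
        self.my_csn = 0      # GSN of the most recent committed update
        self.gsn_of = {}     # request id -> GSN that arrived before the request
        self.pending = {}    # request id -> payload still waiting for its GSN
        self.ready = {}      # GSN -> payload waiting for earlier GSNs to commit
        self.committed = []  # payloads delivered to the application, in order

    def on_update_request(self, request_id, payload):
        if request_id in self.gsn_of:
            # The GSN broadcast was received earlier: try to commit now.
            self.ready[self.gsn_of.pop(request_id)] = payload
            self._commit_in_order()
        else:
            # Buffer the request until the sequencer assigns its GSN.
            self.pending[request_id] = payload

    def on_gsn_assignment(self, request_id, gsn):
        if request_id in self.pending:
            self.ready[gsn] = self.pending.pop(request_id)
            self._commit_in_order()
        else:
            self.gsn_of[request_id] = gsn

    def _commit_in_order(self):
        # Commit only the next update in the global order; out-of-order
        # updates stay buffered until the intermediate ones have committed.
        while self.my_csn + 1 in self.ready:
            self.my_csn += 1
            self.committed.append(self.ready.pop(self.my_csn))
```

For example, if request `r2` (GSN 2) arrives and is sequenced before request `r1` (GSN 1) is complete, `r2` stays buffered and both are delivered in GSN order once `r1` can commit.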
4.1.2. Read-Only Operations. In our sequential consis-
tency model, a read-only request is sent to the sequencer
and a subset of the primary and secondary replicas. Dif-
ferent replicas may service different sets of read-only re-
quests. When the sequencer receives a read-only request,
the leader broadcasts the current value of the GSN to the primary
and secondary replicas, without advancing the GSN.
When a non-leader primary or a secondary replica receives
a read-only request from a client, it buffers the request un-
til it receives the GSN assignment for the request from the
sequencer. The replicas use this GSN to measure the staleness
of their state. To determine its staleness, the replica
first sets its value of my_GSN to the value of the GSN broadcast
by the sequencer. The replica then computes the value
of (my_GSN - my_CSN). This value is a measure of how
stale the state of the replica is. If the replica’s state is no more
stale than the threshold specified by the client in its QoS
specification, the replica can service the client’s request im-
mediately. However, a secondary replica may have a state
that is more stale than the staleness threshold specified by
the client, because the secondary replicas update their state
only when they receive the state update from the lazy pub-
lisher. In that case, the replica performs a deferred read by
buffering the read request and responding to the client im-
mediately after receiving the next state update from the lazy
publisher.
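The staleness check and the deferred read can be sketched as follows. The class and method names are hypothetical, the sketch assumes the GSN broadcast for the read has already been received (the buffering that precedes it is omitted), and it treats a replica as fresh enough when its staleness does not exceed the threshold, in line with the definition in Section 2.

```python
class ReplicaReadHandler:
    """Hypothetical sketch of read-only request handling at a replica."""

    def __init__(self, my_csn=0):
        self.my_gsn = my_csn  # local view of the current GSN
        self.my_csn = my_csn  # GSN of the most recent committed update
        self.deferred = []    # reads waiting for the next lazy update

    def staleness(self):
        return self.my_gsn - self.my_csn

    def on_read(self, request_id, broadcast_gsn, staleness_threshold):
        """Return True if served immediately, False if the read is deferred."""
        self.my_gsn = broadcast_gsn
        if self.staleness() <= staleness_threshold:
            return True
        self.deferred.append(request_id)
        return False

    def on_lazy_update(self, publisher_csn):
        """Apply the lazy publisher's state; return the reads now answered."""
        self.my_csn = max(self.my_csn, publisher_csn)
        self.my_gsn = max(self.my_gsn, self.my_csn)
        served, self.deferred = self.deferred, []
        return served
```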
5. Probabilistic Model-Based Replica Selection
Having described the protocol processing in the server-
side gateway handler, we now describe the processing per-
formed in the client-side handler to meet the QoS specifi-
cation of the client. Each client expresses its constraints in
the form of a QoS specification that includes the response
time constraint, $d$; the minimum probability of meeting this
constraint, $P_c(d)$; and the maximum staleness, $a$, that it can
tolerate in its response. If a response fails to meet the dead-
line constraint of the client, then it results in a timing failure
for the client. Hence, one of the important responsibilities of
the client handler is to select an appropriate subset of repli-
cas to service the clients, and reduce the occurrence of such
timing failures.
A simple approach would be to allocate all the available
replicas to service a single client. However, such an ap-
proach is not scalable, as it increases the load on all the
replicas and results in higher response times for the remain-
ing clients. On the other hand, assigning a single replica to
service each client allows us to service multiple clients con-
currently. However, should a replica fail while servicing a
request, the failure could result in an unacceptable delay for
the client being serviced. Hence, neither approach is suit-
able when a client has specific timing constraints and when
failure to meet the constraints results in a penalty for the
client. Therefore, we need a method that attempts to pre-
vent the occurrence of such timing failures for a client by
selecting replicas from the available replica pool, based on
an understanding of the client’s QoS requirements and the
responsiveness of the replicas.
In our model, the constraints specified by a client apply
only for the read transactions invoked by the client. For an
update transaction, the only constraint that applies is that
it has to be committed by the replicas in a manner that re-
spects the ordering guarantee associated with the service.
Hence, our selection algorithm handles an update request of
a client by simply multicasting the request to all the primary
replicas. The handler on the server side takes care of com-
mitting these updates in the appropriate order, as described
in Section 4 for the sequential ordering case. For the read-
only requests, the selection algorithm has to choose from
among the primary and secondary replicas based on their
ability to meet the client’s temporal requirements, as well
as on whether the state of the replica is within the staleness
threshold specified by the client. However, the uncertainty
in the environment and in the availability of the replicas due
to transient overload and failures makes it impossible for a
client to know with certainty if a set of replicas can meet
its deadline. Further, a client can be certain that the state of
the primary replicas is always up-to-date, because of the immediate
update propagation. However, it cannot make such
guarantees about the state of the secondary replicas, which
update their state only when they receive the lazy update
propagated by the lazy publisher.
Hence, our selection approach makes use of probabilistic
models to estimate a replica’s staleness and to predict the
probability that the replica will be able to meet the client’s
deadline. These models make their prediction based on in-
formation gathered by monitoring the replicas at runtime.
A selection algorithm then uses this online prediction to
choose a subset of replicas that can together meet the client’s
timing constraints with at least the probability requested by
the client. We will now describe our probabilistic models
and replica selection algorithm. They enhance the selection
approach we presented in [5], which made the assumption
that the replicas were stateless. We first define the notation
we use to explain our model.
Let $t$ denote the time at which a request is transmitted.
Since replicas are selected at the time a request is transmitted,
we also use $t$ to denote the time at which the replica selection
is done. Let $R_i$ be the random variable that denotes
the response time of replica $i$. Let $A_i(t)$ denote the staleness
of the state of replica $i$ at time $t$, and $P(A_i(t) \le a)$ be
the probability that the state of replica $i$ at time $t$ is within
the staleness threshold, $a$, specified by the client. We call
this the staleness factor for replica $i$. Let $P(R_i \le d)$ be the
probability that a response from replica $i$ will be received by
the client within the client’s deadline, $d$, and $P_K(d)$ be the
probability that at least one response from the set $K$, consisting
of $k > 0$ replicas, will arrive by the client’s deadline,
$d$. The probability that a replica can meet the client’s time
constraint, $d$, and thereby prevent a timing failure, depends
on whether the replica is functioning and has a state that can
satisfy the client-specified staleness threshold. We can make
use of these individual probabilities to choose a subset $K$ of
replicas such that $P_K(d) \ge P_c(d)$. The replicas in the set $K$
will then form the final set selected to service the request.
5.1. Modeling the Response Time Distribution
We now derive the expression for $P_K(d)$, which is the
probability that at least one response from the replicas in set
$K$ arrives by the client’s deadline, $d$, and thereby avoids the
occurrence of a timing failure. The set $K$ is made up of a
subset $K_p$ of primary replicas and a subset $K_s$ of secondary
replicas (i.e., $K = K_p \cup K_s$). While each replica in $K$
processes the client’s request and returns its response, only
the first response received for a request is delivered to the
client. Hence, a timing failure occurs only if no response is
received from any of the replicas in the selected set $K$ within
$d$ time units after the request was transmitted. Therefore, we
have
\[ P_K(d) = 1 - P(\text{no replica } i \in K : R_i \le d) \]
In our work, we assume that the response times of the replicas
are independent, because they process their requests independently.
While this assumption may not be strictly true
in some cases (e.g., if the network delays are correlated), it
does result in a model that is fast enough to be solved online,
especially for the time-sensitive applications we target
in our work. Furthermore, the results we present in Section 6
show that the resulting model makes reasonably good predictions
for the scenarios we have considered. Thus, using
the independence assumption, we obtain
\[ P_K(d) = 1 - P(\text{no } i \in K_p : R_i \le d) \cdot P(\text{no } j \in K_s : R_j \le d) \tag{1} \]
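Under the independence assumption, the probability of a timely response from a set of replicas is straightforward to evaluate numerically. The sketch below assumes that each replica already has an estimate of the probability that it responds by the deadline (for a secondary replica, an estimate that accounts for its staleness factor). The greedy ordering in `select_replicas` is our own illustrative choice and should not be read as the selection algorithm of this paper; both function names are hypothetical.

```python
import math


def prob_timely(per_replica_probs):
    """Probability that at least one replica responds by the deadline.

    Each entry is the probability that one replica responds in time;
    replicas are assumed to be independent.
    """
    return 1.0 - math.prod(1.0 - p for p in per_replica_probs)


def select_replicas(probs_by_replica, required_prob):
    """Greedy sketch: add the most responsive replicas until the target holds.

    Returns (selected_ids, achieved_probability). If the target cannot be
    reached, all replicas are returned with the best achievable probability.
    """
    selected = []
    miss = 1.0  # probability that no selected replica responds in time
    ranked = sorted(probs_by_replica.items(), key=lambda kv: kv[1], reverse=True)
    for replica_id, p in ranked:
        selected.append(replica_id)
        miss *= 1.0 - p
        if 1.0 - miss >= required_prob:
            break
    return selected, 1.0 - miss
```

For instance, two replicas that each meet the deadline with probability 0.5 together give a probability of 0.75, which would satisfy a client that requests 0.7.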
5.1.1. Case 1: Primary Replicas. In Section 4, we mentioned
that the update requests of the clients are propagated
to the primary group immediately. Hence, for a primary
replica $i$, the staleness factor $P(A_i(t) \le a) = 1$, and the
replica always has a state that can satisfy the staleness threshold
of the client. Therefore, in the case of the primary replicas,
we have
\[ P(\text{no } i \in K_p : R_i \le d) = \prod_{i \in K_p} P(R_i > d) = \prod_{i \in K_p} \bigl(1 - F^I_{R_i}(d)\bigr) \tag{2} \]
where $F^I_{R_i}$ denotes the response time distribution function
for replica $i$, given that it can respond immediately to a read
request without waiting for a state update.
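One simple way to evaluate Equation (2) from monitored data is to estimate each distribution function empirically from a replica's recent response times. This estimator is an assumption made for illustration; this section does not prescribe how the distribution function is computed, and the function names below are hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf(history, d):
    """Fraction of recorded response times that are at most d."""
    if not history:
        raise ValueError("no response time history for this replica")
    ordered = sorted(history)
    return bisect_right(ordered, d) / len(ordered)


def prob_no_primary_timely(histories, d):
    """Equation (2): product over primary replicas of (1 - F(d)).

    `histories` maps a primary replica id to its recorded immediate-read
    response times, in the same time unit as the deadline d.
    """
    return math.prod(1.0 - empirical_cdf(h, d) for h in histories.values())
```

With histories `[0.1, 0.2, 0.3, 0.4]` and `[0.15, 0.5]` and a deadline of 0.25, each replica has an estimated 0.5 chance of responding in time, so the probability that neither does is 0.25.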