Content uploaded by Jan Lindström
Author content
All content in this area was uploaded by Jan Lindström on Nov 18, 2014
Content may be subject to copyright.
MULTIVERSION REPEATABLE READ ISOLATION LEVEL – THEORY AND
PRACTICE
Jan Lindstr¨
om
Solid, an IBM Company
It¨
alahdenkatu 22 B
00210 Helsinki, Finland
jan.lindstrom@solidtech.com
ABSTRACT
Concurrency control is the activity of synchronizing
database operations by concurrently executing transactions
on a shared database. We examine the problem of concur-
rency control when the database supports multiple versions
of the data. Multiversion concurrency control is used in
order to improve the level of achievable concurrency. The
goal is to produce an execution that has the same effect as
a serial one. We use the multiversion concurrency control
theory to analyze histories produced by multiversion con-
currency control methods. We show that using traditional
repeatable read isolation level is inadequate and provide a
new isolation level definition for multiversion repeatable
read. We show that the new isolation level captures the
essence of repeatable read isolation level in multiversion-
ing systems.
KEY WORDS
Concurrency Control, Algorithms, Database theory.
1 Introduction
Adatabase system (DBS) is a process that executes read
and write operations on data items of a database. A trans-
action is a program that issues reads and writes to a DBS.
If transactions execute concurrently, the interleaved execu-
tion of their reads and writes by the DBS can produce un-
desirable results. Thus, the basic problem in concurrency
control is to maintain the consistency of a database updated
by interleaved transactions [11]. Specifically, the goal of
concurrency control is to produce an execution that has the
same effect as a serial one. Although we hope it would
have better performance time characteristics as a serial one.
Such executions are called serializable [6]. A DBS secures
a serializable execution by controlling the order in which
reads and writes are executed [14]. The methods for se-
curing serializability are diverse. Some methods monitor
the execution requests in order to secure a meaningful sub-
set of serializability [14] or insert lock-unlock steps in the
transactions [8, 23, 17].
In a multiversion DBS each write on a data item pro-
duces a new version of the data item. For each read of the
data item the DBS selects one of the versions of data item
to be read. Because writes do not overwrite each other and
reads can read any version the DBS has more alternatives
controlling the order of reads and writes. Multiversion con-
currency control (MVCC) is described in some detail in
sections 4.3 and 5.5 of in [5]. this paper cites a 1978 dis-
sertation by D.P. Reed [16] which describes MVCC and
claims it as an original work. Many interesting concur-
rency control algorithms using multiversioning have been
proposed (e.g. [3, 7, 18, 13, 1, 2, 10, 12, 15, 20]).
MVCC is particularly adept at implementing true
snapshot isolation [4], something which other methods of
concurrency control frequently do either incompletely or
with high performance cost. Snapshot isolation has been
adopted by several major database management systems,
such as SQL Anywhere, InterBase, Firebird, Oracle, Post-
greSQL, Solid and Microsoft SQL Server. The main reason
for its adoption is that it allows better performance than se-
rializability, yet still avoids the kind of concurrency anoma-
lies that cannot easily be worked around. Snapshot isola-
tion has also been used [4] in criticism of the ANSI SQL-92
standard’s definition of isolation levels, as it exhibits none
of the ”anomalies” that the SQL standard prohibited, yet
snapshot isolation is not serializable (the anomaly-free iso-
lation level defined by ANSI).
Unfortunately, the ANSI SQL-92 standard was writ-
ten with a lock-based database in mind, and hence is rather
vague when applied to MVCC systems [4, 9]. As a con-
crete example, imagine a bank storing two balances, Xand
Y, for accounts held by a single person, Phil. The bank
will allow Xor Yto run a deficit, provided that the total
held in both is never negative (i.e., X+Y≥0must hold).
Suppose both Xand Ystart at $100. Now imagine Phil ini-
tiates two transactions concurrently, T1withdrawing $200
from X, and T2withdrawing $200 from Y.
If the database guaranteed serializable transactions,
the simplest way of coding T1is to deduct $200 from X,
and then verify that X+Y≥0still holds, aborting if
not. T2similarly deducts $200 from Yand then verifies
X+Y≥0. Since the transactions must serialize, either
T1happens first, leaving X=−$100, Y = $100, and pre-
venting T2from succeeding or T2happens first and simi-
larly prevents T1from succeeding.
Under snapshot isolation, however, both T1and T2
can operate on private snapshots of the database: each
deducts $200 from an account, and then verifies that the
605-071
189
Proceedings of the Fourth IASTED International Conference
April 2-4, 2008 Langkawi, Malaysia
Advances in Computer Science and Technology (ACST 2008)
ISBN CD: 978-0-88986-730-7
new total is zero, using the other account value that held
when the snapshot was taken. Since neither update con-
flicts, both commit successfully, leaving X=Y=
−$100, X +Y=−$200. This non-serializable anomaly is
known as write skew [9]. ANSI’s ”REPEATABLE READ”
isolation level allows phantom reads, but prevents write
skew. In contrast, snapshot isolation allows write skew,
but prevents phantom reads. Serializable transactions al-
low neither.
Rest of the paper is organized as follows. In sec-
tion 2 we present a formal representation of histories for
multiversion concurrency control. Section 3 present a new
multiversion repeatable read isolation level to solve prob-
lems on traditional repeatable read isolation level. Section
4 presents a performance evaluation where the proposed
method is compared with the traditional implementation of
repeatable read isolation level on multiversion concurrency
control. Finally, Section 5 presents the conclusions of this
work.
2 MultiversionConcurrencyControlTheory
In order to reason about the execution of a collection of
transactions on a database using MVCC we need a formal
representation of histories rich enough to describe multiple
versions of data items. We will develop this in a similar
fashion to [21].
A version function htranslates each write step into a
version creation step and each read step into a version read
step.
Definition 2.1 Multiversion schedule. Let T={t1, ..., tn}
be a (finite) set of transactions. A multiversion history for
Tis a pair L= (Σ,≺Σ)where ≺Σis order on Σand
1. Σ = h(
n
S
i=1
Σi)for some function h
2. ∀t∈Tand ∀p, q ∈Σtthe following holds:
p≺tq⇒h(p)≺Σh(q)
3. if h(rj(x)) = rj(xi), i 6=jand cj∈Σthen ci∈Σ
and ci≺Σcj.
A multiversion schedule is a prefix of a multiversion
history.
Condition (1) states that each transaction operation is
translated into an appropriate multiversion operation. Con-
dition (2) states that history function preserves all orderings
defined by transactions. Condition (3) states that a transac-
tion may not read a version until it has been produced.
Definition 2.2 Reads-From Relation. Let Σbe a multiver-
sion schedule, t,tj∈Σtransactions. The reads-from rela-
tion of Σis defined by RF (Σ) = {(ti, x, tj)|rj(xi)∈Σ}.
A multiversion schedule is called a monoversion
schedule if its version function maps each read step to the
last preceding write step on the same data item.
Definition 2.3 Multiversion conflict. A multiversion con-
flict in a multiversion schedule Σis a pair of steps ri(xj)
and wk(xk)such that ri(xj)≺Σwk(xk).
Definition 2.3 essentially states that in a multiversion
schedule the only relevant conflict is read write operation
pairs on the same data item, not necessarily on the same
version. It is easy to see that write write pairs on the same
data item no longer count as conflicts, as they create dif-
ferent versions. A multiversion schedule can read the same
data item more than once and these reads can read different
version.
Definition 2.4 Multiversion Multiple Reads. If a trans-
action reads the same data item more than once these
reads are ordered as condition (2) stated in Definition
2.1 and these reads are numbered. In other words,
if h(rjk(x)) ∈Σjand h(rjk+1(x)) ∈Sigmajthen
h(rjk(x)) ≺Σh(rjk+1 (x)). Note it may hold that two
consecutive reads in the transaction do not read the same
version of the data item i.e h(rjk(x)) = rjk (xi)and
h(rjk+1 (x)) = rjk+1 (xl)where i6=l.
In this work we assume that the transaction writes a
data item at most once. Similarly, the construction in Def-
inition 2.4 can be used if multiple writes to the same data
item are allowed. From above we obtain:
Definition 2.5 Multiversion Conflict Serializability. A
multiversion history Σis multiversion conflict serializable
if there is a serial monoversion history for the same set of
transactions in which all pairs of operations in multiver-
sion conflict occur in the same order as in Σ.
It can be shown that membership of a history in class
multiversion conflict serializable can be characterized in
graph-theoretic terms using the following notion of a mul-
tiversion serialization graph.
Definition 2.6 Multiversion Serialization Graph (MVSG).
For a given schedule Σand a version order ≺, the mul-
tiversion serialization graph MV SG(Σ,≺)of Σthen is
the conflict graph G(Σ) = (V, E)with the following edges
added to each rk(xj)and wi(xi)in committed projection
of Σ, where k, i and jare pairwise distinct: if xi≺xj,
then (ti, tj)∈E, otherwise (tk, ti)∈E.
Not surprisingly, we obtain:
Theorem 2.1 A multiversion history is multiversion con-
flict serializable (MVSC) iff its multiversion serialization
graph is acyclic.
Definition 2.7 Initialization transaction T0. Lets assume
that we have a initialization transaction performing the fol-
lowing steps:
T0
create table total(s integer);
190
create table vals(i integer not null,
val integer, primary key(i));
insert into total values (5);
insert into vals values (1,1),(2,1),(3,1),
(4,1),(5,1);
commit;
Now executing first initialization transaction de-
scribed in Definition 2.7 lets consider following execution
history of transactions T1and T2:
Example 2.1 Read only transaction case.
T1
begin;
select *from total;
select *from vals;
T2
begin;
insert into vals values (6,1);
update total set s = s + 1;
commit;
select *from total;
select *from vals;
commit;
Naturally, transaction T1sees value (5)
in both selects from the table total and values
(1,1),(2,1),(3,1),(4,1),(5,1) in both selects from
the table vals.
Using a traditional repeatable read isolation level
transaction T2can’t update the value on total table because
transaction T1has taken a shared lock on the same record.
However, on multiversion concurrency control no locks are
taken for reads, and therefore transaction T2can update
total table. Furthermore, transaction T1does not see the
committed changes done by transaction T1and thus pro-
duces consistent repeatable read in other words, relations
produced on the first set of selects are equivalent to second
set of selects. Thus, the situation is simple when we we
have a read only transaction.
The situation becomes more interesting when the
transaction is not a read only transaction. Lets again as-
sume that we have the same initialization transaction as in
Definition 2.7. Now consider following execution history
of transactions T1and T2.
Example 2.2 Repeatable read with own changes.
T1
begin;
select *from total;
select *from vals;
T2
begin;
select *from total;
select *from vals;
insert into vals values (6,1);
update total set s = s + 1;
select *from total;
select *from vals;
commit;
select *from total;
select *from vals;
commit;
Naturally, transaction T1sees value (5)
in both selects from the table total and values
(1,1),(2,1),(3,1),(4,1),(5,1) in both selects from
the table vals. Similarly, transaction T2sees in the
first selects value (5) in the table total and values
(1,1),(2,1),(3,1),(4,1),(5,1) in the table vals. More-
over, transaction T2sees it’s own changes in the second
set of selects, i.e. (6) in the table total and values
(1,1),(2,1),(3,1),(4,1),(5,1),(6,1) in the table vals.
Transaction T1again sees equivalent database state on
both sets of the selects and does not see committed changes
done by transaction T2. However, transaction T2in the sec-
ond set of selects sees its own changes. Furthermore. trans-
action T2is valid iff values read on the first set of selects
do not change i.e no new versions of these data items are
created before commit.
Finally, if both transactions are not read only transac-
tions we have an inconsistent repeatable read. Lets again
assume that we have the same initialization transaction as
in Definition 2.7. Now consider the following execution
history of transactions T1and T2:
Example 2.3 Inconsistent repeatable read.
T1
begin;
select *from total;
select *from vals;
T2
begin;
insert into vals values (6,1);
update total set s = s + 1;
commit;
insert into vals values (7,1);
update total set s = s + 1;
select *from total;
select *from vals;
commit;
Transaction T1sees value (5) in the first select from
the table total and values (1,1),(2,1),(3,1),(4,1),(5,1)
in the first select from the table vals. Furthermore, trans-
action T2sees (7) from the table total because this is the
value the transactions itself has created. However, transac-
tion T1sees values (1,1),(2,1),(3,1),(4,1),(5,1),(7,1)
from the table vals. This is inconsistent because now
sum(val) from vals 6≡ select * from total. Additionally,
transaction T2sees in the first selects value (5) in the table
total and values (1,1),(2,1),(3,1),(4,1),(5,1) in the ta-
ble vals. Moreover, transaction T2sees it’s own changes
in the second set of selects, i.e. (6) in the table total and
values (1,1),(2,1),(3,1),(4,1),(5,1),(6,1) in the table
vals.
This execution history is possible because transaction
T2releases all locks on transaction commit. We could pre-
vent the transaction T1from seeing its own changes and
191
then result of both sets of selects would be equivalent and
consistent. But this would look as a inconsistency from the
users point of view. Additionally, if transaction code as-
serts that change has been successful, this assertion would
fail. If we let transaction T1see it’s own changes but not
the committed changes of transaction T2we have a incon-
sistent retrieval assuming that select sum(val) from vals ≡
select * from total. This is because we would see that total
table has value 7 but sum of values in val table is 6. There-
fore, there is clear need for a new definition for consistent
repeatable read isolation level in multiversion concurrency
control. This new definition and algorithm is presented in
the next section.
3 Multiversion Repeatable Read Isolation
Level
In a traditional monoversion repeatable read isolation level
no data records retrieved by a SELECT statement can be
changed; however, if the SELECT statement contains any
ranged WHERE clauses, phantom reads may occur. In this
isolation level the transaction acquires read locks on all re-
trieved data, but does not acquire range locks.
In a multiversion concurrency control situation rad-
ically changes because no locks are acquired for retrieved
data. Therefore, a new set of inconsistent retrievals are pos-
sible.
Definition 3.1 Multiversion Repeatable Read. Let Tbe a
transaction and ΣTbe operations defined by transaction.
Then transaction Tobeys multiversion consistent repeat-
able read iff:
1. (Read only) if W S (T) = ∅∧h(rTn(x)), h(rTm(x)) ∈
ΣT, n > m ⇒h(rTn(x)) = rTn(xj)∧h(rTm(x)) =
rTm(xj).
2. (Own changes) ∀x∈WS (T)TRS(T)∧
h(wT(x)) = wT(xt)≺Σh(rT(x)) ⇒h(rT(x)) =
rT(xT).
3. (Version upgrade) ∀x∈W S(T)if ∃y∈RS (T) :
h(rTi[y]) ≺h(wT[x]) ≺h(rTj[y]) ∧j > i ⇒
h(rTi[y]) = rTi[yk]≺h(wT[x]) = wT[xl]≺
h(rTj[y]) = rTj[ym]∧k≤m≤l.
4. (No version changes) ∀x∈W S(T)if x∈RS(T)⇒
∄k:h(rT[x]) ≺h(xk[x]) =≺ck≺now, k 6=T
Condition (1) states that if we have a read-only trans-
action and the transaction reads the same data item more
than one then the version function always translates each
read step to the same version of the data item. Condition
(2) states that if a transaction writes a data item all succes-
sive reads to the same data item read the version transac-
tion has written. Condition (3) states that if the transaction
reads the same data item after it has created a new version
of another data item the second read might read a newer
version of the data item compared to first read before the
write. Condition (4) states that if a transaction is not read-
only transaction then versions read by this transaction must
remain the same at the validation phase. Thus, no other
transaction has created a new version of those data items.
Now we are ready to present algorithms to implement
a multiversion repeatable read isolation level. For presen-
tation MVCC-RR. We begin where transaction Twill read
the version valid at the start of the transaction readlevel
and if the transaction is doing a update, then the transac-
tion will read the latest version of the data item. Identifi-
cation and read version is stored to the read set RS of the
transaction.
Definition 3.2 read(T , X, readlevel, mode):
begin
ifX∈RS(T)
update X in RS(T)
elseadd X to RS(T)
end if
if more == read then
read version(X, readlevel)
else
read version(X, ∞)
end if
end
In a write a new version of the data item is created and
this data item is stored to write set W S of the transaction.
If we write a data item we will increase the readlevel to
the data item Xnew version. Thus if we have create a new
version of the data item following reads to the same data
item will read the created version.
Definition 3.3 write(T , X, readlev el)
begin
create new version of X;
add X to WS(T)
set readlevel to version(X)
end
Finally, for every data item read by the transaction we
must see that there is no other transaction that has also writ-
ten the same data item and if the data item version is dif-
ferent from the version read originally that only this trans-
action has created it.
Definition 3.4 validate write set(T , readlevel)
begin
for all X in RS(T)
if version(X) != current_version(X)
if X in WS(T)
if current_version(X) != version(WS(T))
return FALSE;
else return FALSE;
end if
end if
end for
192
return TRUE;
end
Theorem 3.1 MVCS ⊂MVCC-RR. To show true subset
property it is enough to give an example history hwhere
h∈MVCC-RR but h /∈MVCS. Any example where hcon-
tains phantom rows is enough. Lets assume initialization
transaction described in Definition 2.7. Forexample:
T1
begin;
select *from vals where i between 1 and 5;
T2
begin;
insert into vals values(4,1);
commit;
update vals set val = 2 where i = 2;
select *from vals where i between 1 and 5;
commit;
Corollary 3.1 MVCC-RR ∈P. Thus MVCC-RR belongs
to class P and can be solved on a deterministic sequential
machine in an amount of time that is polynomial in the size
of the input.
Lets again assume that we have the same initialization
transaction as in Definition 2.7. Now consider the follow-
ing execution history of transactions T1and T2using the
MVCC-RR method:
Example 3.1 Multiversion repeatable read isolation level.
T1
begin;
select *from total;
select *from vals;
T2
begin;
insert into vals values (6,1);
update total set s = s + 1;
commit;
insert into vals values (7,1);
update total set s = s + 1;
select *from total;
select *from vals;
commit;
Now, transaction T1sees value (5) in
the first select from the table total and values
(1,1),(2,1),(3,1),(4,1),(5,1) in first select from
the table vals. Furthermore, transaction T2sees (7) from
the table total this is because this value is the value the
transaction itself has created. Additionally, transaction T1
sees values (1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(7,1)
from the table vals. This is consistent because now
sum(val) from vals ≡select * from total. Additionally,
transaction T2sees in the first selects value (5) in the table
total and values (1,1),(2,1),(3,1),(4,1),(5,1) in the
table vals. Moreover, transaction T2sees it’s own changes
in the second set of selects, i.e. (6) in the table total and
values (1,1),(2,1),(3,1),(4,1),(5,1),(6,1) in the table
vals.
4 Performance evaluation
We have carried out a set of experiments in order to exam-
ine the feasibility of our prototype implementation, specifi-
cally the concurrency control mechanism. All experiments
were executed in the MySQL database running on a AMD
Opteron Processor 146 processor containing 2 GB of main
memory with the Linux operating system 2.6.20.
The test database represents a typical network
database that mimics a Home Location Register (HLR)
[22] which is used to store information about users of the
network. Operators use HLR databases to store subscriber
data, location data, network access data, and about network
services data, for example call forwarding. To simplify pre-
sentations schema presented below does not contain all the
operations of the HLR. Instead database and transactions
are from The Telecom One (TM1) benchmark designed for
telecommunication applications [19].
In the Figure 1 we have compared overall per-
formance of different MVCC implementations in
MySQL/InnoDB and MySQL/solidDB storage engines.
MySQL/InnoDB implements traditional MVCC using
locks while MySQL/solidDB uses MVCC-RR method
presented in this paper. This experiment clearly shows
that MVCC-RR provides a lot better overall performance
because it allows more concurrency between different type
of transactions.
0
5000
10000
15000
20000
25000
0 200 400 600 800 1000 1200
Transactions in second
Time (seconds)
MySQL/innodb
MySQL/solidDB
Figure 1. MVCC methods compared.
5 Conclusion
Multiversion concurrency control is an attractive choice
for concurrency control method in database system be-
cause reads can read any version without locking and thus
adds more alternatives in controlling the order of reads and
193
writes producing more concurrency to transaction execu-
tion. Repeatable read isolation level is the most used iso-
lation level on many applications. However, traditional
repeatable read isolation level definition is written with a
strict lock-based concurrency control in mind and hence
is rather vague when applied to multiversion concurrency
control systems. We have presented definition for consis-
tent repeatable read isolation level for multiversion concur-
rency control and presented algorithms witch implement
this definition. We have shown that these algorithms pro-
duce correct results.
References
[1] D. Agrawal, A. J. Bernstein, P. Gupta, and S. Sen-
gupta. Distributed multi-version optimistic concur-
rency control for relational databases. In COMPCON,
pages 416–421, 1986.
[2] M. Ahuja and J. C. Browne. Concurrency con-
trol by pre-ordering entities in databases with multi-
versioned entities. In ICDE, pages 312–321, 1987.
[3] R. Bayer. On the integrity of data bases and resource
locking. In IBM Symposium: Data Base Systems,
pages 339–361, 1975.
[4] H. Berenson, P. A. Bernstein, J. Gray, J. Melton,
E. O’Neil, and O’Neilm P. A critique of ansi sql iso-
lation levels. In SIGMOD Conference, pages 1–10,
1995.
[5] P. A. Bernstein and N. Goodman. Concurrency con-
trol in distributed database systems. ACM Computing
Surveys, 13(2), June 1981.
[6] P. A. Bernstein, V. Hadzilacos, and N. Goodman.
Concurrency Control and Recovery in Database Sys-
tems. Addison-Wesley, 1987.
[7] A. Chan, S. Fox, W. K. Lin, A. Nori, and D. R.
Ries. The implementation of an integrated concur-
rency control and recovery scheme. In SIGMOD Con-
ference, pages 184–191, 1982.
[8] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L.
Traiger. The notions of consistency and predicate
locks in a database system. Communications of the
ACM, 19(11):624–633, November 1976.
[9] A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil, and
D. Shasha. Making snapshot isolation serializable.
ACM Trans. Database Syst., 30(2), 2005.
[10] R. Kataoka, T. Satoh, and U. Inoue. A multiver-
sion concurrency control algorithm for concurrent ex-
ecution of partial update and bulk retrieval transac-
tions. In Proceedings of the 10th Phoenix Conference
on Computers and Communications, pages 130–136,
1991.
[11] H. T. Kung and C. H. Papadimitriou. An optimal-
ity theory of database concurrency control. In Pro-
ceedings of ACM SIGMOD 1979 Annual Conference,
pages 116–126, 1979.
[12] X. Liu, J. A. Miller, and N. R. Parate. Transaction
management for object-oriented databases: Perfor-
mance advantages of using multiple versions. In Pro-
ceedings of the 25th Annual Simulation Symposium,
pages 222–231, Los Alamitos, Calif., April 1992.
IEEE Computer Society Press.
[13] S. Muro, T. Kameda, and T. Minoura. Multi-version
concurrency control scheme for a database system. J.
Comput. Syst. Sci., 29(2):207–224, 1984.
[14] C. Papadimitriou. The serializability of concurrent
database updates. Journal of the ACM, 26(4):631–
653, October 1979.
[15] Y. Raz. Commitment ordering based distributed con-
currency control for bridging single and multi version
resources. In RIDE-IMS, pages 189–198, 1993.
[16] D. P. Reed. Naming and synchronization in a decen-
tralized computer system. PhD thesis, Department of
Electrical Engineering and Computer Science, Mas-
sachusetts Institute of Technology, 1978.
[17] D. R. Ries and M. Stonebraker. Locking granularity
revisited. ACM Trans. Database Syst., 4(2):210–227,
1979.
[18] A. Silberschatz. A multi-version concurrency scheme
with no rollbacks. In PODC, pages 216–223, 1982.
[19] T. Strandell. Open source database systems: Systems
study, performance and scalability. Master’s thesis,
2003.
[20] L. Wang and Z. Peng. Extension of multi-version con-
currency control mechanisms for long-duration trans-
action based on nested transaction model. In CIT,
pages 963–968, 2004.
[21] G. Weikum and G. Vossen. Transactional Information
Systems: Theory, algorithms, and the practice of con-
currency control and recovery. Morgan Kaufmann,
2002.
[22] Wikipedia. Network switch-
ing subsystem. Technical report,
http://en.wikipedia.org/wiki/Home Location Register.
[23] M. Yannakakis, C. H. Papadimitriou, and H. T. Kung.
Locking policies: Safety and freedom from deadlock.
In 20th Annual Symposium on Foundations of Com-
puter Science, pages 286–297, 1979.
194