Locomotor: Transparent Migration of Client-Side Database Code
Michael J. Mior
University of Waterloo
200 University Avenue West
Waterloo, Ontario N2L 3G1
mmior@uwaterloo.ca
ABSTRACT
Server-side execution is a well-known method for improving the performance of database applications. Running code on the database server eliminates round trips to the client application, resulting in significantly reduced latency. However, the common approach of explicitly writing server-side code in stored procedures has significant drawbacks. Application developers must develop and maintain code in two separate languages and manually partition code between the client and server. Code shipping is a viable alternative but still requires an explicit specification of which code can be run on the server. We propose a hybrid shipping approach based on static analysis which automatically partitions client code and only ships code to the server which is likely to improve performance.

We demonstrate the viability of this approach in a prototype system which we call Locomotor. Locomotor operates on Python applications using the Redis key-value store. Through static analysis, it identifies fragments of code which can benefit from being executed on the server and automatically performs translation to execute the fragments on the server. Unlike some previous systems, Locomotor is not pattern-based and is able to ship a wide variety of code. By shipping code to the server, Locomotor is able to achieve significant performance gains over client-side execution with no modifications to the application code.
CCS CONCEPTS
• Information systems → Database query processing; • Software and its engineering → Scripting languages;

ACM Reference format:
Michael J. Mior. 2017. Locomotor: Transparent Migration of Client-Side Database Code. In Proceedings of DBPL 2017, Munich, Germany, September 1, 2017, 5 pages.
DOI: 10.1145/3122831.3122840
1 INTRODUCTION
It is common for databases to allow the creation of parameterized stored procedures which execute on the database server. These stored procedures are generally more efficient than equivalent client-side code since the use of stored procedures can eliminate multiple network round trips to the database server. Some systems, such as VoltDB [12], rely on the use of stored procedures to achieve high performance. However, stored procedures can be more difficult to write and manage than client code. Stored procedures may be written in a different language than the application and it is necessary to deploy and maintain stored procedures on the database server as the code changes.
def incr_ex(redis, key):
    if redis.exists(key):
        return redis.incr(key)
    else:
        return None

Figure 1: Python code to conditionally increment in Redis
One example of these challenges is present in the Redis key-value store. Applications accessing Redis do so via a programmatic API using libraries provided in the source language of the application (e.g. Python). For example, Figure 1 shows Python code that increments the value of a key in Redis only if the key exists. This code requires two round trips to the server: first to check if the key exists (EXISTS), and second to increment the key (INCR). The remainder of this work explores this problem using Redis and Python, but a similar solution would be possible for other scripting languages, such as Ruby and JavaScript, and other datastores supporting server-side scripting, such as MongoDB.
Redis has a mechanism to enable scripts written in Lua to be executed on the server. As shown in Figure 2, writing a server-side script can reduce the latency of application code involving multiple operations. Lua is a popular embedded scripting language, but it is rarely used in application development. Therefore, it is likely that code interacting with the database will be written in the source language of the application. Conversion to a server-side script is typically only considered in the case of performance issues since it requires introducing an additional language as well as the burden of maintaining the code on the server.
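To make the shipped version concrete, the following sketch shows how the conditional increment from Figure 1 could be expressed as a single server-side Lua script invoked through the redis-py client. The script source and key name are illustrative; this is not the code Locomotor generates.

import redis

# Lua equivalent of Figure 1: the existence check and the increment both
# execute on the server, so the client pays for only one round trip.
INCR_EX_LUA = """
if redis.call('EXISTS', KEYS[1]) == 1 then
    return redis.call('INCR', KEYS[1])
else
    return nil
end
"""

client = redis.Redis()
incr_ex = client.register_script(INCR_EX_LUA)  # loads the script on the server
result = incr_ex(keys=['counter'])             # single EVALSHA round trip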
Our goal in this work is to automate this approach of translating client application code. We have developed a prototype system we call Locomotor which performs the automation via static analysis and code shipping. Section 2 provides an overview of our code analysis techniques. Section 3 then discusses the benefits of partitioning code between the client and the server. We give the details of our approach to code shipping in Section 4 and evaluate the effectiveness of our approach in Section 5. Finally, we examine some related work in Section 6 and conclude in Section 7.
Figure 2: Sequence diagrams for client-driven and scripted conditional increment. (a) Client-driven: the client issues EXISTS and INCR as separate round trips. (b) Stored procedure: a single EVAL call runs both operations on the server.
2 CLIENT CODE ANALYSIS
We have two tasks to perform when analyzing database client application code. The first is to identify where database access is occurring. This is necessary in order to determine where we need to insert code which calls native server methods inside the stored procedure. Second, we perform data flow analysis to determine what data needs to be passed to/from a block of code which will be translated into a stored procedure.
2.1 Identifying Database Access
We take a simplistic approach to identifying code which accesses the database. First, we consider the scope for the potential optimizations as a Python function. Second, we assume that application code uses simple objects as the only interface to the backend database. Since this form of encapsulation is good software engineering practice, we expect this assumption to hold in practice. With additional effort, it should be possible to automate the identification of database access throughout the code instead of limiting the scope to single functions. With Redis as our target database, we assume that database access objects have methods which are named after Redis commands. Locomotor currently searches for these method calls to identify database access objects. It is also possible to explicitly annotate database access objects. Although this approach is simplistic, it is easy to implement and can be easily replaced if an application uses another database interface.
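As a rough illustration of this detection step, the sketch below walks a function's AST with Python's ast module and records the receivers of method calls named after Redis commands. The command list is abbreviated and the code is our reconstruction, not Locomotor's.

import ast

REDIS_COMMANDS = {'get', 'set', 'exists', 'incr', 'hget', 'hset'}  # abbreviated

class RedisCallFinder(ast.NodeVisitor):
    """Collect names of objects on which Redis-command methods are called."""
    def __init__(self):
        self.receivers = set()

    def visit_Call(self, node):
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr in REDIS_COMMANDS
                and isinstance(func.value, ast.Name)):
            self.receivers.add(func.value.id)
        self.generic_visit(node)

source = """
def incr_ex(redis, key):
    if redis.exists(key):
        return redis.incr(key)
"""
finder = RedisCallFinder()
finder.visit(ast.parse(source))
print(finder.receivers)  # {'redis'} -- 'redis' is treated as a database object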
After identifying these objects, we note where method calls on these objects occur within the application source code. We only consider shipping code to the server which contains database access since there is no benefit to shipping code which does not interact with the server. We further ensure that any database calls executed on the server have their parameters available by shipping the relevant data, as we further discuss in Section 4.
2.2 Data Flow Analysis
We use a simple form of information flow tracking [10] to identify how data fetched from the database is propagated throughout the application code. It is most beneficial to colocate database calls on the server which have dependencies between their parameters. For example, if a store to the database depends on a previous read, then colocating those calls on the server will reduce the number of round trips. After identifying database objects as in the previous section, we denote any values returned by calling methods on those objects as tainted. When those values are assigned to variables or used in expressions, we mark those values as also tainted. Finally, when a tainted value is used in a future call on a database object, we can identify a link between these two calls.
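For example, in the hypothetical fragment below the value returned by one call flows into a second call, creating exactly the kind of link the taint analysis records:

import redis

client = redis.Redis()
total = client.get('order_total')        # return value is marked as tainted
if total is not None:
    client.set('order_archive', total)   # tainted value reaches a second call,
                                         # linking the two for colocation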
Alongside this taint analysis, we perform a simple data flow analysis to identify values required and emitted by a particular block of code. For each block of code that we ship to the server, we must ship all data which is used by that block of code. Furthermore, we must ship back any data modified by the code or values which need to be returned from the enclosing function. Locomotor is currently able to send and receive integer and floating point numbers, strings, and lists or associative arrays consisting of these values. We discuss the full details of translation in Section 4.
3 CODE PARTITIONING
A key insight is that it is only necessary to ship code which accesses the database to the server. This reduces the load on the server and eliminates the need to add support for arbitrary application libraries on the server. As we discuss later, we only translate a subset of application syntax to the server. This simplifies the translation process while still allowing Locomotor to translate significant portions of real-world code. Since applications may mix database and non-database code within a single method, we perform code partitioning at the level of source lines. Making use of the information obtained via analysis of the client code, Locomotor aims to reduce the latency of application operations involving database code.

We assume that code involving database access is not CPU intensive and that the main source of overhead is network traffic between the application and the database [13]. The potential gains we can achieve from sending the code to the server result from eliminating network overhead between the application and the database server. However, it is impossible to precisely determine this overhead statically. The cost of partitioning the code at any point depends on factors such as the size of the required data which must be transferred and the number of iterations for loops in the application code. We make several simplifying assumptions which we expect to hold in an OLTP setting. Firstly, we assume the amount of data processed results in small transfer times relative to network latency. This reduces the problem to determining the estimated minimum number of round trips to the database. Second, we assume any iterative code involving database access will execute more quickly on the server, so we always ship iterative code. Since we assume data size is small and network round trips dominate the runtime, we expect this assumption to hold in practice.
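Under these assumptions, the partitioning decision reduces to a simple static count. The sketch below shows one way such a heuristic could look; the iteration weight and the assumption that attribute calls are database calls are our illustrations, not Locomotor's exact model.

import ast

def estimate_round_trips(node, assumed_iterations=10, multiplier=1):
    """Crude static estimate of round trips saved by shipping a block.

    Loops multiply the weight of the database calls they contain,
    reflecting the assumption that iterative database code always
    benefits from running on the server. Illustrative only.
    """
    trips = 0
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.For, ast.While)):
            trips += estimate_round_trips(child, assumed_iterations,
                                          multiplier * assumed_iterations)
        elif isinstance(child, ast.Call) and isinstance(child.func, ast.Attribute):
            trips += multiplier  # treat method calls as potential database calls
        else:
            trips += estimate_round_trips(child, assumed_iterations, multiplier)
    return trips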
4 CODE SHIPPING
Our approach to code shipping contains two main steps. First, we transpile the client application code to the same language used by server-side stored procedures. Second, we patch the client application code to invoke the stored procedure in the location where the original code was located. We describe the complete process of code shipping in the following sections.
4.1 Transpilation
High-level scripting languages such as Python, JavaScript, or Ruby are popular choices for building database applications [8]. Redis, our target database for Locomotor, makes use of the Lua programming language as an embedded scripting engine. Our choice of dynamic scripting languages for both application and server-side code helps simplify transpilation. When performing transpilation, we operate at the level of the abstract syntax tree (AST). The similarity of the two languages means that the ASTs of equivalent Python and Lua code are very similar. We can therefore perform transpilation by walking the AST of the Python code we wish to translate while emitting the equivalent Lua code. There are some caveats to this approach resulting from minor differences in the two languages which we discuss in Appendix A. After performing transpilation, Redis provides an interface to register the script with the server via a string of source code. We discuss in the following section how we patch the invocation of server-side code into the client application.
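The sketch below conveys the flavor of this AST walk for a few node types. It is a toy reconstruction, not Locomotor's translator, and covers only returns, names, constants, and numeric addition.

import ast

class PyToLua(ast.NodeVisitor):
    """Toy Python-to-Lua emitter for a handful of AST node types."""
    def visit_Return(self, node):
        return 'return ' + self.visit(node.value)

    def visit_BinOp(self, node):
        assert isinstance(node.op, ast.Add)  # only addition in this toy
        return '%s + %s' % (self.visit(node.left), self.visit(node.right))

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        return repr(node.value)

tree = ast.parse('def f(x):\n    return x + 1')
stmt = tree.body[0].body[0]            # the `return x + 1` statement
print(PyToLua().visit(stmt))           # emits: return x + 1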
We also need a mechanism for transporting data structures between the two languages. As discussed in the previous section, Locomotor allows the use of complex types such as lists and associative arrays. Since the Redis scripting interface only permits the transfer of binary strings and numbers, we must serialize these values during transfer. The Redis scripting engine contains a library for working with the MessagePack [6] serialization format. MessagePack is an efficient binary format which supports all the data types we choose to work with in Locomotor. This solves the problem of translating complex sequences of nested objects between the two languages. The Python MessagePack library is invoked in the client application when values are sent to the server. The Lua MessagePack library deserializes the data and then serializes any values which are required by the application code.
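As a minimal illustration of the client half of this exchange, using the Python msgpack package (the Lua side would use Redis's bundled cmsgpack library); the payload contents are invented:

import msgpack

# Data crossing the Python/Lua boundary is serialized with MessagePack.
payload = {'warehouse': 3, 'order_ids': [101, 102], 'total': 42.5}
packed = msgpack.packb(payload)                 # bytes passed to the script
restored = msgpack.unpackb(packed, raw=False)   # what comes back after EVAL
assert restored == payload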
4.2 Method Patching
After translating the Python code to the equivalent Lua, we must modify the application source to call server-side code for that portion of the method. To accomplish this, Locomotor edits the bytecode of functions to replace the relevant portion with a call to the server-side method. This call accepts all the relevant values which must be passed to the server-side code. Upon returning from the server-side code, the Locomotor-patched code checks a flag to see if it should return from the enclosing function. If so, the return value is unpacked from the server and the function returns. Otherwise, we unpack all the values which must be stored back to local variables and continue executing the client-side code.
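Shown as ordinary Python for clarity (the real patching happens at the bytecode level), the patched call site behaves roughly like the hypothetical stub below; the flag-plus-payload protocol mirrors the description above, but the names are invented.

import msgpack

def call_shipped_block(script, keys, args, local_vars):
    """Hypothetical client-side stub for an invocation of a shipped block.

    `script` is a registered redis-py Script object wrapping the Lua
    translation. Returns (True, value) if the enclosing function should
    return `value`, or (False, None) after restoring modified locals.
    """
    flag, payload = script(keys=keys, args=[msgpack.packb(args)])
    values = msgpack.unpackb(payload, raw=False)
    if flag:                        # server-side code reached a `return`
        return True, values
    local_vars.update(values)       # write back locals the block modified
    return False, None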
5 EVALUATION
The goal of our evaluation is to demonstrate that automated approaches for stored procedure conversion can improve application performance without requiring runtime information. We focus on the delivery transaction of py-tpcc [11], an implementation of the TPC-C benchmark [9] in Python with support for Redis. Some minor modifications were required to fix bugs discovered in the implementation. To perform a delivery, few parameters are required, but many keys are accessed on the Redis server. Furthermore, a small amount of data is returned when the transaction completes. These properties make this transaction an ideal target for optimization.
Each machine used in the remainder of this section has two six-core Xeon E5-2620 processors operating at 2.10 GHz and 64 GB of memory. A 1 TB SATA drive operating at 7,200 RPM was used for the Redis data directory. A manually compiled version of Redis 3.2 (unreleased) was used to serve all requests. This version of Redis was required because previous versions do not allow any scripts which are nondeterministic. This includes scripts which write values from a previous read, a restriction which would limit our potential optimizations.
5.1 Manual Partitioning
To explore the effects of code shipping on performance, we first start with a manual partitioning of the TPC-C delivery transaction. In this partitioning, we ship the entire transaction to the server for execution. To measure the effect on latency, we execute a number of transactions in four different settings: 1) the unmodified Python application, 2) a pure Lua client-side implementation of the same transaction generated by Locomotor, 3) a hand-optimized Lua client-side implementation of the same transaction, 4) the entire transaction shipped to the server as Lua. For each implementation, we execute 100,000 transactions in a single thread and measure the total completion time to see how latency is affected. Results are shown in Figure 3.
We rst note that there is signicant room for optimization of the
automatically translated Lua code. Because both Python and Lua
are dynamically typed languages with diering semantics, Locomo-
tor currently requires some runtime checking. We expect that much
of this overhead could be eliminated by performing type inference
on the client application in Python. The hand-optimized version
of the Lua code is approximately 37% faster than the one gener-
ated by Locomotor, suggesting that there are indeed signicant
opportunities for improvement.
Although the Lua implementation of the transaction generated
by Locomotor is slower than the original Python application, we
still see a 4
×
reduction in runtime when this code is shipped to
the server as a stored procedure. This is because the transaction
contains multiple loops make a number of database calls. When
all this code resides on the client, there are a signicant number
of network round trips between the application and the server. By
placing this code on the server as a stored procedure, we greatly
reduce the latency of each of these database accesses.
For example, we measured the number of network round trips
between the client and the server for a single instance of the deliv-
ery transaction in the above experiments. With the unpartitioned
Python implementation, 24 round trips were required. Using the
partitioned Python implementation reduced the number of net-
work round trips to 8. The network used in our experiments has
a measured average latency of approximately 0.163 ms. This leads
to an anticipated time savings from reduced round trips of 16.3
seconds. The actual improvement is approximately 14.7 seconds.
We expect the performance penalty is a result of ineciencies in
our Lua script implementation.
Figure 3: TPC-C delivery transaction runtime. Execution time in seconds (axis 0–35) for the Python, Lua, Optimized Lua, and Partitioned implementations.
6 RELATED WORK
While systems exist which perform automated partitioning of applications between client and server, Locomotor has several key distinctions. Firstly, existing systems which perform code shipping assume that the client and the server are written in the same language and that the server code can be modified to enable the shipping process. Existing work also makes limited use of the information available from analyzing application code and instead requires significant runtime analysis. We provide a more detailed comparison with related systems below.

There has been significant work in optimizing the performance of applications via automatic partitioning. Coign [7] automatically partitions applications using Microsoft's COM interface. This partitioning relies on profiling application binaries to determine communication patterns. After profiling, components are partitioned using a series of classifiers to decide which to place on the client or server. Unlike Locomotor, the client and server both use the same runtime and code can be moved largely unchanged between the client and server.
Others have examined automatic partitioning for mobile applications. MAUI [5] and CloneCloud [4] aim to optimize performance of mobile device applications by offloading compute-intensive code to more powerful processors on remote servers. Both also use coarse-grained partitioning and require a significant amount of profiling data to make correct decisions.
Another approach to reducing communication overhead is to automatically batch application queries together in a single communication round with the database server. Scalpel [1] attempts to do so by monitoring database requests at the network layer and prefetching results for queries which are expected to be issued in the near future. Furthermore, Scalpel is able to rewrite queries to fetch correlated results automatically. Both of these tasks are achieved by predicting correlations of parameters between multiple queries. While Scalpel performs well, it does not take advantage of any knowledge of the application code and performs unnecessary requests to the database when predictions fail. Sloth [3] performs batching by delaying queries until their results are needed by subsequent application code. Queries which are not dependent on the results of previous queries can then be batched together to save on round trips. Sloth performs this batching by adding lazy evaluation to Java programs using JDBC. While this approach works well, it is unable to batch queries which have dependencies between them. Furthermore, the lazy evaluation required to perform batching adds 5–15% overhead.
The most similar work to ours is Pyxis [2]. Pyxis uses runtime profiling to dynamically ship portions of JDBC application code onto a server-side component. The goal of Pyxis is to minimize round trips between the client and the server when an application makes consecutive database requests. However, Pyxis requires dynamic profiling of the application code which adds up to 6× overhead for code not involving database access. This restricts the class of applications which can benefit from Pyxis without manual restructuring of the application to separate compute-intensive code. The overhead is partially a result of a server-side component which is required to communicate with the database process on the server. We avoid this issue in Locomotor by using the scripting language already embedded in the database server, thus requiring no changes to the server code. Furthermore, Locomotor directly handles dynamic languages while Pyxis relies on a compiler which modifies Java application source code.
7 CONCLUSIONS AND FUTURE WORK
By automatically translating code that interacts with the database, we are able to produce efficient stored procedures while eliminating the overhead of manual rewriting. Our prototype implementation showed a 4× performance increase over a purely client-side application for the TPC-C delivery transaction. Furthermore, scripts can be automatically installed on the server and updated as the source-language code changes. For code which includes multiple interactions with the database, this conversion to stored procedures can result in significant performance improvements.

Currently it is necessary for the application developer to specify functions they wish to execute on the server. In future work, we anticipate that by constructing an accurate performance model of both the server- and client-side code, it should be possible to automate the selection of code fragments to execute on the server. While our prototype only operates on Python applications communicating with Redis, we believe similar techniques could apply to other databases.
REFERENCES
[1] I.T. Bowman and K. Salem. 2005. Optimization of Query Streams Using Semantic Prefetching. ACM TODS 30, 4 (Dec 2005), 1056–1101.
[2] A. Cheung and others. 2012. Automatic Partitioning of Database Applications. Proc. VLDB Endow. 5, 11 (Jul 2012).
[3] A. Cheung and others. 2014. Sloth: Being Lazy is a Virtue (when Issuing Database Queries). In Proc. of SIGMOD ’14.
[4] B. Chun and others. 2011. CloneCloud: Elastic Execution Between Mobile Device and Cloud. In Proc. of EuroSys ’11. ACM.
[5] E. Cuervo and others. 2010. MAUI: Making Smartphones Last Longer with Code Offload. In Proc. of MobiSys ’10. ACM.
[6] S. Furuhashi. 2017. MessagePack. (2017). https://msgpack.org/
[7] G.C. Hunt and M.L. Scott. 1999. The Coign Automatic Distributed Partitioning System. In Proc. of OSDI ’99. USENIX Association, 1471–1482.
[8] M. Jazayeri. 2007. Some Trends in Web Application Development. In Proc. FOSE ’07. IEEE Comp. Soc., 199–213.
[9] W. Kohler and others. 1991. Overview of TPC Benchmark C: The Order-Entry Benchmark. Transaction Processing Performance Council, Technical Report (1991).
return {'a': 1}['a']

(a) Python

d = { ['__DICT'] = true, ['a'] = 1 }
return d[(d.__DICT) and ('a') or ('a' + 1)]

(b) Lua

Figure 4: Dictionary access
[10] L. Lam and others. 2006. A General Dynamic Information Flow Tracking Framework for Security Applications. In Proc. ACSAC ’06. 463–472.
[11] A. Pavlo. 2015. Python implementation of TPC-C. (2015). https://github.com/apavlo/py-tpcc
[12] M. Stonebraker and A. Weisberg. 2013. The VoltDB Main Memory DBMS. IEEE Data Eng. Bull. 36, 2 (2013), 21–27.
[13] X. Wu and others. 2013. Optimizing Event Polling for Network-Intensive Applications: A Case Study on Redis. In Proc. ICPADS ’13. 687–692.
A PYTHON TO LUA TRANSLATION DETAILS
In general, our translation process works by examining an AST node in the original Python code and emitting the corresponding Lua code. However, there are caveats which arise as a result of the differences between the two languages which we detail below.
A.1 Data Type Conversions
One of the main difficulties with translating between two languages is the incongruence of their type systems. For example, Python has explicit types for arrays (lists) and hash tables (dictionaries) while Lua has a single type (tables) which is used for both purposes. This is complicated by the fact that Lua uses 1-based indexing for tables with integer indices (arrays). The syntax for accessing both arrays and dictionaries is the same in Python. This means it is necessary to know at runtime whether the Lua table is being used as an array or a hash table to construct the appropriate index. We solve this problem by annotating each Lua table with an attribute indicating if it corresponds to a Python dictionary. On each table access, we check this attribute to construct the correct index into the table. An example translation is given in Figure 4.
A.2 Language Semantics
While there is a high correspondence between the operators available in Python and Lua, some operators have slightly different semantics. For example, Lua has two separate operators for numerical addition (+) and string concatenation (..). Python simply uses + for both purposes and we cannot necessarily statically determine whether the operation represents addition or concatenation. To resolve this issue, we use the simple heuristic of identifying a numerical or string constant on either side of the operator and falling back to integer addition. This heuristic could be replaced by either static type inference (if possible) or runtime type checks as needed.
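A sketch of this heuristic over the Python AST (our reconstruction, not Locomotor's source):

import ast

def lua_add_operator(node):
    """Choose Lua's + or .. for a Python + node (illustrative heuristic)."""
    for side in (node.left, node.right):
        if isinstance(side, ast.Constant) and isinstance(side.value, str):
            return '..'   # a string constant on either side: concatenation
        if isinstance(side, ast.Constant) and isinstance(side.value, (int, float)):
            return '+'    # a numeric constant: addition
    return '+'            # no constant found: fall back to integer addition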
Lua's truth semantics also differ from those of Python. In Lua, only nil and false evaluate to false, while in Python other values such as empty strings and collections also evaluate to false. To match these semantics, we simply check Lua values according to Python semantics whenever an expression is evaluated in a boolean context.
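One plausible implementation is a small Lua helper prepended to each generated script and wrapped around every boolean-context expression; the helper below is our guess at such a check, not Locomotor's actual code.

PY_TRUTHY_HELPER = """
local function __truthy(v)
  -- Python semantics: nil, false, 0, empty strings, and empty tables are falsy.
  if v == nil or v == false or v == 0 or v == '' then return false end
  if type(v) == 'table' and next(v) == nil then return false end
  return true
end
"""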
for i in range(10):
    if i == 3:
        break
    else:
        continue
    x += 1

(a) Python

local __BREAK1 = false
for i = 0, 10 - 1, 1 do
    if __BREAK1 then break end
    repeat
        if i == 3 then
            __BREAK1 = true
            do break end
        else
            if true then break end
        end
        x = x + 1
    until true
end

(b) Lua

Figure 5: Loop break and continue
A.3 Built-in Functions
Python has built-in functions as well as a large standard library which is commonly used by applications. We choose to assume that built-in functions have not been redefined. In practice, it is possible that a developer chose to name their own functions using these names, in which case the translation would not be faithful to the original code. While this has not been a problem with the examples we considered, the issue could be resolved by more detailed static analysis to identify overridden functions, coupled with runtime checks to verify that the guess about whether a function was overridden is correct. When translating built-in functions, we either use an equivalent built-in function in Lua or provide a simple Lua implementation.
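For instance, a translation table might pair Python built-ins with Lua equivalents or small shims; the entries below are hypothetical examples of such a mapping, not Locomotor's actual table:

# Hypothetical built-in translations (format strings over the argument):
BUILTIN_MAP = {
    'len': '#(%s)',           # len(x) -> #x works for Lua strings and tables
    'str': 'tostring(%s)',    # str(x) -> tostring(x)
    'int': 'tonumber(%s)',    # int(x) -> tonumber(x); ignores truncation here
}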
A.4 Loop Constructs
Python and Lua have different constructs for looping. For example, Lua does not support a continue statement, which Python uses to end the current loop iteration and continue with the next. We emulate this in Lua by nesting an additional loop and using a break statement to stop executing the rest of the loop. This has the disadvantage of disallowing both break and continue in a single loop. We can resolve this issue by modifying break statements in the original code to set a flag which can be checked on each iteration to abort the loop, as we demonstrate in Figure 5.
