Fig 6 - uploaded by Gabriel Kerneis
Web servers comparison

Source publication
Article
In this paper, we introduce Continuation Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a compilation technique, based on the CPS transform, that yields efficient code and an extremely lightweight repres...
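The following plain-C sketch illustrates why continuation-based threads can be so light. It is not CPC syntax and not the CPC runtime. A suspended thread is just a small record holding the next function to run and its live variables, and a scheduler resumes these records one after another. `cont`, `sched_push`, `sched_run`, and `counter_step` are hypothetical names, and a fixed-size ring buffer stands in for a real run queue.

```c
#include <assert.h>
#include <stddef.h>

/* A continuation: the code to run next plus the variables it needs. */
typedef struct cont {
    void (*fn)(struct cont *self);
    int id;      /* which toy thread this is */
    int count;   /* remaining steps for this thread */
} cont;

#define QSIZE 64
static cont *queue[QSIZE];           /* hypothetical fixed-size run queue */
static size_t head = 0, tail = 0;

static char trace[128];              /* records which thread ran each step */
static size_t trace_len = 0;

/* Enqueue a continuation to be resumed later (this is what "yield" means here). */
static void sched_push(cont *k) {
    assert(tail - head < QSIZE);
    queue[tail++ % QSIZE] = k;
}

/* Resume continuations in FIFO order until none are pending. */
static void sched_run(void) {
    while (head != tail) {
        cont *k = queue[head++ % QSIZE];
        k->fn(k);
    }
}

/* One step of a toy thread: log its id, then yield if work remains. */
static void counter_step(cont *self) {
    assert(trace_len + 1 < sizeof trace);
    trace[trace_len++] = (char)('A' + self->id);
    trace[trace_len] = '\0';
    if (--self->count > 0)
        sched_push(self);            /* re-enqueue our own continuation */
}
```

Pushing two such threads, one with `count = 2` and one with `count = 3`, and then calling `sched_run()` interleaves them as `ABABB`. A suspended thread costs only the size of its record, not a whole stack. In CPC this compact form comes from the CPS-based compilation described in the abstract rather than from hand-written code like this.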

Similar publications

Article
Autotuning is an established technique for optimizing the performance of parallel applications. However, programmers must prepare applications for autotuning, which is tedious and error-prone coding work. We demonstrate how applications become ready for autotuning with few or no modifications by extending Threading Building Blocks (TBB), a library...

Citations

... Cilk [11] runtime uses continuations to spawn and synchronize workers. Continuation Passing C [24] allows programmers to spawn new workers to execute a particular function, put a worker to sleep, and wake it up later using library functions. Li et al. [25] present an alternative way to implement concurrency in the Glasgow Haskell Compiler by polling the pool of suspended continuations to see whether blocking can be resolved. ...
Conference Paper
OpenMP uses the efficient ‘team of workers’ model, where workers are given chunks of tasks (iterations of a parallel-for-loop, or sections in a parallel-sections block) to execute, and workers (not tasks) can be synchronized using barriers. OpenMP therefore restricts the invocation of barriers in these tasks, since otherwise the behavior of the program would depend on the number of runtime workers. To address this restriction, which can adversely impact programmability and readability, Aloor and Nandivada proposed UW-OpenMP, taking inspiration from the more intuitive interaction of tasks and barriers in newer task-parallel languages such as X10, HJ, and Chapel. UW-OpenMP gives the programmer the impression that each parallel task is executed by a unique worker and, importantly, these parallel tasks can be synchronized using a barrier construct. Though UW-OpenMP is a useful extension of OpenMP (more expressive and efficient), it does not admit barriers within recursive functions invoked from parallel-for-loops, because of the inherent challenges in handling them. In this paper, we extend UW-OpenMP (we call it UWOmp++) to address this challenging limitation and in the process also realize more efficient programs.
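The restriction can be illustrated with a small C sketch of our own (not taken from the UW-OpenMP paper). Standard OpenMP does not allow a `#pragma omp barrier` to be nested inside the body of a `#pragma omp for` loop. A conforming program therefore separates two phases by splitting them into two worksharing loops and relying on the implicit barrier at the end of the first. Compiled without OpenMP support, the pragmas are ignored and the code runs sequentially with the same result.

```c
#include <assert.h>

#define N 8

/* Phase 1 writes a[i]; phase 2 reads a[N-1-i], which another worker may
 * have written, so every phase-1 iteration must finish first. */
void two_phase(const int *in, int *a, int *out) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++) {
            a[i] = in[i] * in[i];
            /* A "#pragma omp barrier" here would be non-conforming:
             * a barrier may not be nested inside a worksharing loop.
             * This per-iteration barrier is the kind of synchronization
             * that UW-OpenMP lets tasks use. */
        }
        /* Implicit barrier at the end of the first 'omp for'. */

        #pragma omp for
        for (int i = 0; i < N; i++)
            out[i] = a[N - 1 - i] + 1;
    }
}
```

With `in = {0, 1, ..., 7}`, `out[0]` is `a[7] + 1 = 50` and `out[7]` is `a[0] + 1 = 1`, whatever the number of workers. Hoisting the barrier out of the loop works here only because the computation happens to split cleanly into two loops. Recursive or data-dependent patterns do not always allow this, which is where the barrier-inside-task model helps.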
... The prototype employs standard SSA construction found in the literature and converts the SSA into CPS. Since nested function definitions, lambda expressions, and closures are not supported in C, we adopt implementation tricks such as lambda lifting, function pointers, and continuation-as-a-stack, which were inspired by Kerneis and Chroboczek's work [8]. ...
Conference Paper
Control flow obfuscation protects software from being reverse-engineered by altering the control flow transfer without changing the software's run-time semantics. We propose a new control flow obfuscation technique by rewriting the source program in the continuation passing style (CPS). The continuation is encoded through higher order combinators and function pointers at the target language level. As a result, the original control flow graph is fragmented which makes any software tampering attempt through binary static analysis hard. We implemented a prototype which performs obfuscation on C source codes. The benchmark shows that this approach is practical compared to existing techniques.
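As a rough illustration of how a CPS rewrite can be expressed in C without closures, the hypothetical sketch below converts recursive factorial into continuation-passing form. Each pending continuation is reified as a frame, consisting of a function pointer plus the value it captured, on an explicit stack, in the spirit of the continuation-as-a-stack trick named in the excerpt above. This is our own example, not the output of the obfuscation prototype or of CPC. It does not attempt any obfuscation.

```c
#include <assert.h>

/* A continuation frame: what to do with the incoming value, plus captured data. */
typedef struct frame {
    unsigned long (*k)(unsigned long v, unsigned long saved);
    unsigned long saved;
} frame;

#define MAX_FRAMES 32                 /* supports n <= 33 in this sketch */
static frame kstack[MAX_FRAMES];
static int sp = 0;

static void push_k(unsigned long (*k)(unsigned long, unsigned long),
                   unsigned long saved) {
    assert(sp < MAX_FRAMES);
    kstack[sp].k = k;
    kstack[sp].saved = saved;
    sp++;
}

/* Continuation for fact(n): multiply the result of fact(n-1) by n. */
static unsigned long mul_k(unsigned long v, unsigned long n) {
    return v * n;
}

/* Instead of recursing and multiplying on return, push each pending
 * multiplication as a continuation, then apply the continuations. */
unsigned long fact_cps(unsigned long n) {
    while (n > 1) {
        push_k(mul_k, n);
        n--;
    }
    unsigned long v = 1;              /* value of the base case */
    while (sp > 0) {                  /* innermost continuation first */
        sp--;
        v = kstack[sp].k(v, kstack[sp].saved);
    }
    return v;
}
```

Control now flows through indirect calls via `kstack[sp].k` rather than through the call structure of the source program. This is the kind of fragmentation of the control-flow graph that the abstract relies on.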
... Fork-join, task, async, and event functions appear not to rely on a specific language design. There is a long-standing debate in systems research about the relationship between threads and events [16]. ...
Article
Asynchronous programming has emerged as a programming style that overcomes undesired properties of concurrent programming. Typically, in asynchronous models of programming, methods are posted into a post list for later execution. The order of method executions is serial, but nondeterministic. This paper presents a new and simple, yet powerful, model for asynchronous programming. The proposed model consists of two components: a context-free grammar and an operational semantics. The model's usefulness is supported by its ability to express important applications. An advantage of our model over related work is that it simplifies the way posted methods are assigned priorities. Another advantage is that the operational semantics uses the simple concept of a singly linked list to simulate the prioritized process of posting and executing methods. This simplicity and expressiveness make it relatively easy for analysis algorithms to uncover programming bugs in asynchronous programs that would otherwise go undetected.
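A minimal C sketch of a prioritized post list kept as a singly linked list, with posted methods executed serially. The names `post` and `run_all` are hypothetical. We also assume two ordering rules: a smaller priority value runs first, and ties run in posting order. Neither rule is claimed to match the paper's grammar or operational semantics. This sketch is deterministic, whereas the abstract's model is nondeterministic.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*method)(void);

/* One posted method in a singly linked list kept sorted by priority. */
typedef struct post_node {
    method m;
    int priority;                /* assumption: smaller value runs first */
    struct post_node *next;
} post_node;

static post_node *post_list = NULL;

/* Post m for later execution; equal priorities keep posting order. */
void post(method m, int priority) {
    post_node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->m = m;
    n->priority = priority;
    post_node **p = &post_list;
    while (*p != NULL && (*p)->priority <= priority)
        p = &(*p)->next;
    n->next = *p;
    *p = n;
}

/* Execute posted methods one at a time; a running method may post more. */
void run_all(void) {
    while (post_list != NULL) {
        post_node *n = post_list;
        post_list = n->next;     /* unlink before running */
        n->m();
        free(n);
    }
}

/* Example methods that record the order in which they ran. */
static char order[8];
static int n_run = 0;
static void record(char c) { assert(n_run < 8); order[n_run++] = c; }
static void say_a(void) { record('a'); }
static void say_c(void) { record('c'); }
static void say_b(void) { record('b'); post(say_c, 0); }
```

Posting `say_a` with priority 2 and `say_b` with priority 1 runs `b` first. `b` then posts `c` with priority 0, which overtakes the still-pending `a`, so the recorded order is `b`, `c`, `a`.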
... Continuation-Passing C Since event-driven programming is more difficult but more efficient than threaded programming, it is natural to want to at least partially automate it. Continuation-Passing C (CPC [24]) is an extension of the C programming language for writing concurrent systems, built on top of the C Intermediate Language (CIL) framework [32]. The CPC programmer manipulates very lightweight threads, annotating cooperative functions and choosing whether they should be cooperatively or preemptively scheduled at any given point. ...
... CPS conversion has been applied at least to C [26], C++ [30], and JavaScript [31]. To the best of our knowledge, CPC is the only public implementation for the C language, as well as the only one using lambda-lifting to avoid the runtime overhead of environments [24]. ...
... Continuation-Passing C (CPC) [24] is an extension of the C language to write concurrent programs. The programmer writes synchronous code in threaded style, using common synchronisation techniques such as condition variables. ...
Conference Paper
Coroutines and events are two common abstractions for writing concurrent programs. Because coroutines are often more convenient, but events more portable and efficient, it is natural to want to translate the former into the latter. CPC is such a source-to-source translator for C programs, based on a partial conversion into continuation-passing style (CPS conversion) of functions annotated as cooperative. In this article, we study the application of the CPC translator to QEMU, an open-source machine emulator which also uses annotated coroutine functions for concurrency. We first propose a new type of annotations to identify functions which never cooperate, and we introduce CoroCheck, a tool for the static analysis and inference of cooperation annotations. Then, we improve the CPC translator, defining CPS conversion as a calling convention for the C language, with support for indirect calls to CPS-converted functions through function pointers. Finally, we apply CoroCheck and CPC to QEMU (750 000 lines of C code), fixing hundreds of missing annotations and comparing performance of the translated code with existing implementations of coroutines in QEMU. Our work shows the importance of static annotation checking to prevent actual concurrency bugs, and demonstrates that CPS conversion is a flexible, portable, and efficient compilation technique, even for very large programs written in an imperative language.
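To give a feel for annotation inference, here is a toy fixpoint check over a call graph, written in C. It is our own illustration and not CoroCheck's actual algorithm or annotation syntax. It rests on three assumptions: `COOP` functions may yield, `NOCOOP` functions promise never to yield, and an unannotated function that calls a `COOP` function must itself be cooperative. Under those rules, a `NOCOOP` function that ends up calling a `COOP` one is reported as an error.

```c
#include <assert.h>
#include <stdbool.h>

#define NFUNCS 4

/* Hypothetical annotation states for each function. */
typedef enum { UNKNOWN, COOP, NOCOOP } annot;

/* calls[f][g] is true when function f calls function g directly. */
static bool calls[NFUNCS][NFUNCS];
static annot ann[NFUNCS];

/* Infer COOP for every unannotated function that (transitively) calls a
 * COOP function, then count NOCOOP -> COOP call edges as errors. */
int check_annotations(void) {
    bool changed = true;
    while (changed) {                        /* iterate to a fixpoint */
        changed = false;
        for (int f = 0; f < NFUNCS; f++) {
            if (ann[f] != UNKNOWN)
                continue;
            for (int g = 0; g < NFUNCS; g++) {
                if (calls[f][g] && ann[g] == COOP) {
                    ann[f] = COOP;           /* inferred missing annotation */
                    changed = true;
                    break;
                }
            }
        }
    }
    int errors = 0;
    for (int f = 0; f < NFUNCS; f++)
        for (int g = 0; g < NFUNCS; g++)
            if (calls[f][g] && ann[f] == NOCOOP && ann[g] == COOP)
                errors++;                    /* a non-cooperating caller may yield */
    return errors;
}
```

For example, consider a call chain where function 0 (`NOCOOP`) calls 1, 1 calls 2, and 2 calls 3 (`COOP`), with 1 and 2 unannotated. The check infers `COOP` for 1 and 2, then reports one error, for the edge 0 → 1.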
... Since event-driven programming is more difficult but more efficient than threaded programming, it is natural to want to at least partially automate it. Continuation-Passing C (CPC [10]) is an extension of the C programming language for writing concurrent systems. The CPC programmer manipulates very lightweight threads, choosing whether they should be cooperatively or preemptively scheduled at any given point. ...
... The CPC translator is structured in a series of proven source-to-source transformations [10], which turn a threaded-style CPC program into an equivalent event-driven C program. Boxing first encapsulates a small number of variables in environments. ...
... Note that because C is a call-by-value language, lifted parameters are duplicated rather than shared and this step is not correct in general. It is however sound in the case of CPC because lifted functions are called in tail position: they never return, which guarantees that at most one copy of each parameter is reachable at any given time [10]. Block floating is then a trivial extraction of closed, inner functions at top-level. ...
Article
Threads and events are two common abstractions for writing concurrent programs. Because threads are often more convenient, but events more efficient, it is natural to want to translate the former into the latter. However, whereas there are many different event-driven styles, existing translators often apply ad-hoc rules which do not reflect this diversity. We analyse various control-flow and data-flow encodings in real-world event-driven code, and we observe that it is possible to generate any of these styles automatically from threaded code, by applying certain carefully chosen classical program transformations. In particular, we implement two of these transformations, lambda lifting and environments, in CPC, an extension of the C language for writing concurrent systems. Finally, we find that, although rarely used in real-world programs because it is tedious to perform manually, lambda lifting yields better performance than environments in most of our benchmarks.
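The two data-flow encodings can be contrasted on a toy loop. The sketch below is our own plain-C illustration, not CPC translator output, and all names are hypothetical. The source form uses a nested function, which standard C lacks, so it appears only in a comment. Lambda lifting turns its free variables into parameters that are copied at each call. The environment encoding stores them in one heap-allocated struct that every step shares through a pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Source form (not standard C, shown only to name the free variables):
 *
 *   int sum_scaled(int n, int scale) {
 *       int acc = 0;
 *       int loop(int i) {           <- free variables: n, scale, acc
 *           if (i == n) return acc;
 *           acc += i * scale;
 *           return loop(i + 1);     <- tail call
 *       }
 *       return loop(0);
 *   }
 */

/* Lambda lifting: free variables become extra parameters. Each call gets
 * its own copy of acc; this is safe because loop_lifted is only called in
 * tail position, so no caller reads its stale copy afterwards. */
static int loop_lifted(int i, int n, int scale, int acc) {
    if (i == n)
        return acc;
    acc += i * scale;
    return loop_lifted(i + 1, n, scale, acc);
}

int sum_lifted(int n, int scale) {
    return loop_lifted(0, n, scale, 0);
}

/* Environment: free variables live in one shared heap-allocated struct. */
typedef struct env {
    int n, scale, acc;
} env;

static int loop_env(int i, env *e) {
    if (i == e->n)
        return e->acc;
    e->acc += i * e->scale;
    return loop_env(i + 1, e);
}

int sum_env(int n, int scale) {
    env *e = malloc(sizeof *e);      /* allocation the lifted version avoids */
    assert(e != NULL);
    e->n = n;
    e->scale = scale;
    e->acc = 0;
    int result = loop_env(0, e);
    free(e);
    return result;
}
```

Both versions compute `(0 + 1 + ... + (n-1)) * scale`, for example 18 for `n = 4, scale = 3`. In CPC the lifted functions are continuations that never return rather than ordinary recursive calls. The data-flow contrast is the same, though: copies passed in parameters versus one shared heap block. The abstract reports that lambda lifting performed better in most of the authors' benchmarks.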
... This paper is a companion technical report to the article "Continuation-Passing C: from threads to events through continuations" [4]. It contains the complete version of the proofs presented in the article. ...
Technical Report
This paper is a companion technical report to the article "Continuation-Passing C: from threads to events through continuations". It contains the complete version of the proofs of correctness of lambda-lifting and CPS-conversion presented in the article.
... Continuation Passing C (CPC) [4,6] is a translator that converts a program written in threaded style into a program written with events and native system threads, at the programmer's choice. Threads in CPC, when compiled to events, are extremely cheap, roughly two orders of magnitude cheaper than in traditional programming systems; this encourages a somewhat unusual programming style. ...
... The CPC translation process itself is described in detail elsewhere [6]. ...
... In this code, we first trigger an asynchronous read of the on-disk data (1), and immediately yield to threads servicing other clients (2) in order to give the kernel a chance to perform the read. When we are scheduled again, we check whether the read has completed (3); if it has, we perform a non-blocking write (7); if it hasn't, we yield one more time (4) and, if that fails again (5), delegate the work to a native thread which can block (6). ...
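The numbered steps amount to a small decision procedure. Below is a sketch of that procedure as plain C control flow. It is not CPC code, and CPC's actual primitives are not shown. `fake_read` and `read_done` are hypothetical stand-ins for probing an asynchronous read, and the yields are only counted rather than performed.

```c
#include <assert.h>

/* Which path was taken, following steps (1)-(7) in the text. */
typedef enum { WROTE_AFTER_1_YIELD, WROTE_AFTER_2_YIELDS, DELEGATED } outcome;

/* Hypothetical stand-in for an asynchronous read that completes
 * after ready_after polls. */
typedef struct {
    int polls;
    int ready_after;
} fake_read;

static int read_done(fake_read *r) {
    return ++r->polls >= r->ready_after;
}

/* Each counted "yield" marks where a cooperative thread would let
 * threads servicing other clients run. */
outcome serve_chunk(fake_read *r, int *yields) {
    *yields = 0;
    /* (1) start the asynchronous read (simulated by fake_read) */
    (*yields)++;                       /* (2) yield so the kernel can read */
    if (read_done(r))                  /* (3) has the read completed? */
        return WROTE_AFTER_1_YIELD;    /* (7) perform a non-blocking write */
    (*yields)++;                       /* (4) yield one more time */
    if (read_done(r))                  /* (5) check again */
        return WROTE_AFTER_2_YIELDS;   /* (7) */
    return DELEGATED;                  /* (6) hand off to a native thread that may block */
}
```

The design bounds the cost of waiting. A fast read is served after one yield, a slightly slower one after two, and anything slower goes to a native thread, so a slow disk never stalls the cooperative threads.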
Article
Threads are a convenient and modular abstraction for writing concurrent programs, but often fairly expensive. The standard alternative to threads, event-loop programming, allows much lighter units of concurrency, but leads to code that is difficult to write and even harder to understand. Continuation Passing C (CPC) is a translator that converts a program written in threaded style into a program written with events and native system threads, at the programmer's choice. Together with two undergraduate students, we taught ourselves how to program in CPC by writing Hekate, a massively concurrent network server designed to efficiently handle tens of thousands of simultaneously connected peers. In this paper, we describe a number of programming idioms that we learnt while writing Hekate; while some of these idioms are specific to CPC, many should be applicable to other programming systems with sufficiently cheap threads.
Article
The emergence of energy-harvesting devices creates the potential for batteryless sensing and computing devices. Such devices operate only intermittently, as energy is available, presenting a number of challenges for software developers. Programmers face a complex design space requiring reasoning about energy, memory consistency, and forward progress. This paper introduces Alpaca, a low-overhead programming model for intermittent computing on energy-harvesting devices. Alpaca programs are composed of a sequence of user-defined tasks. The Alpaca runtime preserves execution progress at the granularity of a task. The key insight in Alpaca is the privatization of data shared between tasks. Shared values written in a task are detected using idempotence analysis and copied into a buffer private to the task. At the end of the task, modified values from the private buffer are atomically committed to main memory, ensuring that data remain consistent despite power failures. Alpaca provides a familiar programming interface, a highly efficient runtime model, and places fewer restrictions on a target device's hardware architecture. We implemented a prototype of Alpaca as an extension to C with an LLVM compiler pass. We evaluated Alpaca and directly compared it to two systems from prior work. Alpaca eliminates checkpoints, which improves performance by up to 15x, and avoids static multi-versioning, which improves memory consumption by up to 5.5x.
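A simplified sketch of task-level privatization with a commit step, in plain C for illustration. It is not Alpaca's runtime or compiler output. We assume that `shared`, `priv`, and a hypothetical `commit_pending` flag live in non-volatile memory. We also assume that the commit is made atomic by redoing it after a reboot, and a power failure is simulated by returning early from the task.

```c
#include <assert.h>
#include <stdbool.h>

#define NVARS 2

/* State assumed to live in non-volatile memory and survive power loss. */
static int shared[NVARS];            /* values visible to other tasks */
static int priv[NVARS];              /* private copies written by the task */
static bool commit_pending = false;  /* set once priv[] holds a complete result */

/* Copy private values to shared memory. Repeating it is harmless, so an
 * interrupted commit can simply be redone after a reboot. */
static void commit(void) {
    for (int i = 0; i < NVARS; i++)
        shared[i] = priv[i];
    commit_pending = false;
}

/* The task reads shared values but writes only private copies, so
 * re-executing it from the start after a failure sees the same inputs.
 * fail_before_commit simulates power loss just before the task ends. */
static bool increment_task(bool fail_before_commit) {
    for (int i = 0; i < NVARS; i++)
        priv[i] = shared[i] + 1;     /* read-then-write made safe by privatization */
    if (fail_before_commit)
        return false;                /* power lost: shared[] untouched */
    commit_pending = true;
    commit();
    return true;
}

/* On reboot: finish an interrupted commit, if any. */
static void on_reboot(void) {
    if (commit_pending)
        commit();
}
```

Suppose the task were to increment `shared` in place. A failure after the first increment would then make re-execution increment it twice. With the private buffer, an interrupted run leaves `shared` unchanged, and re-running the task produces each increment exactly once.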
Thesis
Most computer programs are concurrent ones: they need to perform several tasks at the same time. Threads and events are two common techniques to implement concurrency. Events are generally more lightweight and efficient than threads, but also more difficult to use. Additionally, they are often not powerful enough; it is then necessary to write hybrid code, that uses both preemptively-scheduled threads and cooperatively-scheduled event handlers, which is even more complex. In this dissertation, we show that concurrent programs written in threaded style can be translated automatically into efficient, equivalent event-driven programs through a series of proven source-to-source transformations. We first propose Continuation-Passing C, an extension of the C programming language for writing concurrent systems that provides very lightweight, unified (cooperative and preemptive) threads. CPC programs are processed by the CPC translator to produce efficient sequentialized event-loop code, using native threads for the preemptive parts. We then define and prove the correctness of these transformations, in particular lambda lifting and CPS conversion, for an imperative language. Finally, we validate the design and implementation of CPC by comparing it to other thread libraries, and by exhibiting our Hekate BitTorrent seeder. We also justify the choice of lambda lifting by implementing eCPC, a variant of CPC using environments, and comparing its performance to that of CPC.