Article

Why threads are a bad idea (for most purposes)


... Ever since the advent of concurrent operating systems and interrupts, events (and event handlers, respectively) have also served as a programming abstraction. In the last decade, event-driven programming (see Adya et al. 2002; Bolosky et al. 2000; Cunningham and Kohler 2005; Desai et al. 2013; Floyd et al. 1997; Ousterhout 1996; Franklin and Zdonik 1997; Welsh et al. 2001; Shih et al. 2002; Gustafsson 2005; Odersky 2006, 2009; Li and Zdancewic 2007; Foltzer et al. 2012) has become a major paradigm for developing distributed systems. Garcia et al. (2013) show a 20-fold increase in licenses for so-called "messaging" middleware systems. ...
... Global reasoning about control flow Straightforward decoupling makes it hard to globally reason about the control flow of event-driven processes because (1) shared events as a means of communication lead to competing recipients (Chin and Millstein 2006) and (2) structuring processes as collections of event handlers leads to the problem of stack management (von Behren et al. 2003), as interaction logic is fragmented across multiple event handlers. This has led to the emergence of a family of event-driven systems (see Adya et al. 2002; Bolosky et al. 2000; Cunningham and Kohler 2005; Ousterhout 1996; Welsh et al. 2001; Shih et al. 2002; Gustafsson 2005; Odersky 2006, 2009; Li and Zdancewic 2007; Foltzer et al. 2012) that address these challenges by using either "call-return" (see Adya et al. 2002; Bolosky et al. 2000; Cunningham and Kohler 2005; Ousterhout 1996; Welsh et al. 2001) or coroutine primitives (see Shih et al. 2002; Gustafsson 2005; Odersky 2006, 2009; Li and Zdancewic 2007; Foltzer et al. 2012). Both approaches sacrifice decoupling, i.e., processes are unnamed and do not refer to each other. ...
Article
Full-text available
Event-driven programming has become a major paradigm in developing concurrent, distributed systems. Its benefits are often informally summarized by the key tenet of "decoupling," a notion which roughly captures the ability of processes to join and leave (or fail) applications dynamically, and to be developed by independent parties. Programming models for event-driven programming either make it hard to globally reason about control flow, thus hampering sound execution, or sacrifice decoupling to aid reasoning about control flow. This work fills the gap by introducing a programming model, dubbed cooperative decoupled processes, that achieves both decoupling and global reasoning about control flow. We introduce this programming model through an event calculus, loosely inspired by the Join calculus, that enables reasoning about cooperative decoupled processes through the concepts of pre- and postconditions. A linear type system controls aliasing of events to prevent breaks in control flow and thus ensure the safe exchange of shared events. Fundamental properties of the type system such as subject reduction, migration safety, and progress are established.
... Even when connections remain open for a long time, the scheduling overhead imposed on the operating system becomes non-negligible. The use of threads improves scalability, but leaves the basic scalability problem unsolved [5]. ...
... Discussion. Even though the equivalence of both models was proved very early [7], there has been a huge debate about which execution model is preferable [5,8]. The current state of the art suggests that reasonable performance is possible with both models [9], provided that hardware-specific fine-tuning takes place. ...
... Furthermore, lock-based synchronisation is prone to subtle and hard-to-find errors in the locking protocol, leading to deadlocks or livelocks. Finally, locks are not composable [5]. That is, the composition of two application modules, each with a correct locking protocol, may yield a broken locking protocol. ...
Article
Modern web applications are used concurrently by many users and provide increasingly interactive features. Multi-core processors, highly distributed backend architectures, and new web technologies force a reconsideration of approaches for concurrent programming in order to fulfil scalability demands and to implement modern web application features. We provide a survey of different concepts and techniques of concurrency inside web architectures and guide architects and developers through viable concurrency alternatives.
... Nonetheless, threads have a number of difficulties that make it questionable to expose them to programmers as a way to build concurrent programs [41,45,33,22]. Nontrivial multithreaded programs are astonishingly difficult to understand and can yield unexpected behaviors, nondeterministic behaviors, deadlock, and livelock. ...
... Some argue that message passing is a bad idea [5]. Some argue the contrary [41,47,49]. Gorlatch [19] argues against the direct use of send-receive primitives in message-passing libraries, advocating instead the use of collective operations (like MPI's broadcast, gather, and scatter). ...
... The term "events" is generally used to refer to the kind of processing that goes on in GUI design. As defined by Ousterhout [41], event-driven programming has one execution stream (no "CPU concurrency"), registered callbacks, an event loop that waits for events and invokes handlers, and no preemption of handlers. However, message passing libraries like MPI are not (to my knowledge) used in this way. ...
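Ousterhout's definition can be made concrete with a minimal sketch (illustrative Python; the event and handler names are invented): one execution stream, registered callbacks, an event loop that waits for events and invokes handlers, and no preemption of handlers:

```python
import collections

# One execution stream, registered callbacks, a dispatch loop, no preemption.
handlers = {}
events = collections.deque()

def register(kind, fn):
    handlers[kind] = fn

def post(kind, payload):
    events.append((kind, payload))

log = []
register("click", lambda p: log.append(f"clicked {p}"))
register("key", lambda p: log.append(f"key {p}"))
post("click", "button1")
post("key", "q")

while events:                        # the event loop
    kind, payload = events.popleft()
    handlers[kind](payload)          # each handler runs to completion
print(log)  # ['clicked button1', 'key q']
```

Because each handler runs to completion before the next event is dispatched, there is no "CPU concurrency" and no need for locks, which is exactly the trade-off the debate is about.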
Article
Full-text available
... Namely, explicit time management using the system clock to tag events, and implicit time management by means of the next-instruction abstraction in programming languages and computer systems. These simple abstractions have created complex usage patterns to address massive parallelism (see common concurrency patterns in [7]), frequent resource-sharing errors (e.g., liveness and data-race errors [6,5]), and convoluted event ordering and synchronization algorithms (see, for example, [25]). ...
... Finally, the modeler should take into consideration concurrency and composition, in particular considering that the problem of synchronizing parallel activities has created a plethora of abstractions (e.g., threads, processes, tasks), several complex usage patterns (see common concurrency patterns in [7]), frequent resource-sharing errors (e.g., liveness and data-race errors [6]), and convoluted event ordering and synchronization algorithms [25]. ...
Chapter
Full-text available
In this paper, we propose REAL-T, a distributed event-based language with explicit support for time manipulation. The language introduces automata for operational time manipulation, causality constructs and Linear Temporal Logic for declarative time predicates, and a distributed-time aware event model. We have developed a compiler for the language and a dynamic run-time framework. To validate the proposal we study detection of complex patterns of security vulnerabilities in IoT scenarios.
... Main examples are Rich Internet Applications, based on HTML5 and JavaScript, and mobile applications, based on e.g. the Android platform. Besides such recent technologies, the use of event-driven architectures and their comparison with thread-based ones have been discussed in the literature for a long time, from different points of view [39,55] (a quick overview is reported in Section 2). ...
... Besides the actor context, the dualism between multi-threaded and event-driven models is a well-known topic discussed in the literature, in particular in the context of Operating Systems [35,39,55], as well as asynchronous I/O management. This paper is related in particular to those works that aim at integrating the two models, so as to simplify programming and improve modularity, avoiding problems such as stack ripping [1], in which the logical control flow between operations is broken across a series of callbacks. ...
Article
Event loops are a main control architecture to implement actors. In this paper we first analyse the impact that this choice has on the design of actor-based concurrent programs. Then, we discuss control loops as the main architecture adopted to implement agents, and we frame them as an extension of event loops effective to improve the programming of autonomous components that need to integrate both reactive and proactive behaviours, in a modular way.
... We now review Listing 6 in detail. The code expresses two pointcut-advice pairs: one for blocking (lines 2-10) and another for restarting (lines 12-20). For the first pointcut-advice pair, the advice is invoked after the execution of Request.send. ...
... Although threads provide the means to write a set of asynchronous executions in a synchronous fashion, avoiding callback hell, they lead to other problems. Ousterhout [18] states that threads are more difficult to program than events because programmers must reason about shared state. SyncAS can be applied to the event-based style in single-threaded applications, meaning that programmers do not need to consider the drawbacks of threads while avoiding issues such as callback hell. ...
Article
Full-text available
Asynchronous programming has been widely adopted in domains such as Web development. This programming style usually relies on callback methods and non-blocking operations, allowing highly responsive user interaction even if an application works without multi-threading. However, this style requires uncoupling a module into at least two sub-modules, which are not intuitively connected by a callback method. This separation of modules gives rise to other issues: callback spaghetti and callback hell. This paper proposes a virtual block approach to address these two issues. The approach enables a programmer to virtually block a program execution and restart it at arbitrary points in the program. As a result, programmers do not need to uncouple a module even if non-blocking operations are adopted; therefore, callback dependencies disappear. Using aspect-oriented programming, this approach uses aspects to control the execution of a program in an oblivious manner. As a consequence, programmers do not need to be concerned with whether pieces of code use blocking or non-blocking operations. We implement a proof-of-concept for this approach, called SyncAS, for ActionScript3. In addition, we apply our proposal to a toy application and conduct experiments to show its modular application, flexibility, and performance.
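The module-splitting problem the abstract describes can be sketched as follows (illustrative Python; `fetch_async` and the queue-draining loop are invented stand-ins, not the SyncAS API). One logical task is ripped into two sub-modules connected only by a callback:

```python
pending = []   # simulated event queue

def fetch_async(url, on_done):
    # invented stand-in for a non-blocking I/O call: the callback
    # fires later, when the event loop drains the queue
    pending.append(lambda: on_done(f"<data from {url}>"))

results = []

# Sub-module 1: starts the request...
def start(url):
    fetch_async(url, handle_response)

# Sub-module 2: ...finishes it, far away from its caller.
def handle_response(data):
    results.append(data.upper())

start("a.example")
while pending:           # drain the simulated event loop
    pending.pop(0)()
print(results)  # ['<DATA FROM A.EXAMPLE>']
```

The control flow from `start` to `handle_response` exists only implicitly, through the callback registration; chaining several such operations is what produces callback spaghetti, and it is this split that the virtual-block approach aims to remove.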
... However, while cooperative threads/fibers can hide the blocking problem from the programmer, they rely on the programmer to explicitly invoke the scheduler, either by making a blocking call or by calling a yield() function. Long historical experience, both in academia [42] and in practice in the pre-2000 versions of MacOS and Windows [45], suggests that programmers are likely to get this wrong. In a cooperatively scheduled system, if one programmer gets it wrong in one place, the whole system suffers. ...
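A round-robin cooperative scheduler built on generators illustrates the explicit-yield contract (illustrative Python, not from the cited systems): each task must reach a yield point for anyone else to run, so a single task that forgets to yield stalls the whole system:

```python
def task(name, steps, trace):
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield            # explicit yield point: remove this and no other
                         # task ever runs again

def run(tasks):
    # naive round-robin cooperative scheduler
    queue = list(tasks)
    while queue:
        t = queue.pop(0)
        try:
            next(t)              # resume until the task's next yield
            queue.append(t)      # still alive: back of the queue
        except StopIteration:
            pass                 # task finished

trace = []
run([task("a", 2, trace), task("b", 2, trace)])
print(trace)  # ['a0', 'b0', 'a1', 'b1']
```

The interleaving exists only because both tasks cooperate; that fragility is exactly the failure mode the snippet above attributes to pre-2000 MacOS and Windows.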
... The scheduling mechanisms of WSN OSes can be classified into two kinds: event-driven scheduling and preemptive multithreaded scheduling. 7,8 In an event-driven system, preemption cannot be performed; all tasks are executed one by one within a single global stack. ...
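The run-to-completion discipline described above can be sketched as a FIFO task queue (illustrative Python; the task names are invented, and real WSN OSes implement this in C). Once a task starts, nothing preempts it, so a later-posted urgent task simply waits:

```python
import collections

# Run-to-completion FIFO scheduling, as in event-driven WSN OSes:
# one global stack, no preemption.
ready = collections.deque()
trace = []

def post(task):
    ready.append(task)

def long_sensing():
    trace.append("sense-start")
    # no preemption: nothing else can run until this returns
    trace.append("sense-end")

def urgent_radio():
    trace.append("radio")

post(long_sensing)
post(urgent_radio)       # however urgent, it must wait its turn
while ready:
    ready.popleft()()    # each task runs to completion on the shared stack
print(trace)  # ['sense-start', 'sense-end', 'radio']
```

This is why event-driven WSN OSes need only one stack, and also why a long-running task delays everything behind it, the core trade-off against preemptive multithreading.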
Article
Full-text available
Memory optimization, energy conservation, over-the-air reprogramming, and fault tolerance are critical challenges for the proliferation of wireless sensor networks. To address these challenges, a new wireless sensor network platform termed LiveWSN is presented in this article. In LiveWSN, several new design concepts are implemented, such as hierarchical shared-stack scheduling and pre-linking native-code reprogramming. By doing so, the data memory cost of the LiveWSN scheduling system can be reduced by 25% compared with that of the traditional multithreaded MANTIS OS. Moreover, the application reprogramming code size can be decreased by 72.6% compared with that of Contiki dynamic-linking reprogramming. In addition to the new design concepts, a new research approach which addresses the energy conservation and fault tolerance challenges by combining software techniques with multi-core hardware techniques is applied in LiveWSN. By means of the multi-core hardware infrastructure, the lifetime of LiveWSN nodes can be prolonged by 34% compared with the single-core Live node. Moreover, the fault-tolerant performance of the wireless sensor network node can be improved significantly. With the above features, LiveWSN becomes memory efficient, energy efficient, reprogrammable, and fault tolerant, and it can run on highly resource-constrained nodes to execute outdoor real-time wireless sensor network applications.
... There has been a long-standing debate in the research community about the best programming model for highconcurrency; this debate has often focused on threads and events in particular. Ousterhout [28] enumerated a number of potential advantages for events. Similarly, recent work on scalable servers advocates the use of events. ...
Conference Paper
Full-text available
This paper presents Capriccio, a scalable thread package for use with high-concurrency servers. While recent work has advocated event-based systems, we believe that thread-based systems can provide a simpler programming model that achieves equivalent or superior performance. By implementing Capriccio as a user-level thread package, we have decoupled the thread package implementation from the underlying operating system. As a result, we can take advantage of cooperative threading, new asynchronous I/O mechanisms, and compiler support. Using this approach, we are able to provide three key features: (1) scalability to 100,000 threads, (2) efficient stack management, and (3) resource-aware scheduling. We introduce linked stack management, which minimizes the amount of wasted stack space by providing safe, small, and non-contiguous stacks that can grow or shrink at run time. A compiler analysis makes our stack implementation efficient and sound. We also present resource-aware scheduling, which allows thread scheduling and admission control to adapt to the system's current resource usage. This technique uses a blocking graph that is automatically derived from the application to describe the flow of control between blocking points in a cooperative thread package. We have applied our techniques to the Apache 2.0.44 web server, demonstrating that we can achieve high performance and scalability despite using a simple threaded programming model.
... Our experiments reveal another interesting evolutionary trend: the replacement of POSIX APIs for asynchronous I/O with new abstractions built on multithreading abstractions. The nature and purpose of threads have long been debated in OS research [21,37,41]. POSIX makes no attempt to prioritize a threading model over an event-based model; it simply outlines the APIs necessary for both. ...
Conference Paper
Full-text available
The POSIX standard, developed 25 years ago, comprises a set of operating system (OS) abstractions that aid application portability across UNIX-based OSes. While OSes and applications have evolved tremendously over the last 25 years, POSIX, and the basic set of abstractions it provides, has remained largely unchanged. Little has been done to measure how and to what extent traditional POSIX abstractions are being used in modern OSes, and whether new abstractions are taking form, dethroning traditional ones. We explore these questions through a study of POSIX usage in modern desktop and mobile OSes: Android, OS X, and Ubuntu. Our results show that new abstractions are taking form, replacing several prominent traditional abstractions in POSIX. While the changes are driven by common needs and are conceptually similar across the three OSes, they are not converging on any new standard, increasing fragmentation.
... Coscheduling programmes that involve multiple threads is complicated because of the complex architecture of multi-threaded systems [10,11]; it imposes numerous challenges and complications. These include: 1) excessive power consumption [12]; 2) difficulties in achieving scalability [13]; 3) avoiding deadlocks [14]; and 4) achieving portable and predictable performance. Multi-threaded programmes often utilise different patterns of cache usage. ...
Thesis
Full-text available
This thesis answers the question of whether a scheduler needs to take into account where communicating threads in multi-threaded applications are executed. The impact of cache on data sharing in multi-threaded environments is measured. This work investigates a common base-case scenario in the telecommunication industry, where a programme has one thread that writes data and one thread that reads data. A taxonomy of inter-thread communication is defined. Furthermore, a mathematical model that describes inter-thread communication is presented. Two cycle-level experiments were designed to measure the latency of CPU registers, cache, and main memory. These results were used to quantify the model. Three application-level experiments were used to verify the model by comparing its predictions with data obtained in a real-life setting. The model broadens the applicability of the experimental results, and it describes the three types of communication outlined in the taxonomy. Storing communicated data across all levels of cache does have an impact on the speed of data-intensive multi-threaded applications. Scheduling threads in a sender-receiver scenario to different dies in a multi-chip processor decreases the speed of execution of such programmes by up to 37%. Pinning such threads to different cores on the same chip results in up to a 5% decrease in speed of execution. The findings of this study show how threads need to be scheduled by a cache-aware scheduler. This project extends the author's previous work, which investigated cache interference.
... Despite that, multi-threaded programming was still an error-prone task, as it was subject to race conditions and very complex debugging scenarios. The disadvantages and common problems of using threads were well summarized by Ousterhout [24]. Dataflow programming was able to provide parallelism without the increased complexity involved in the management of threads. ...
Conference Paper
Full-text available
Dataflow Programming (DFP) has been a research topic of Software Engineering since the '70s. The paradigm models computer programs as a directed graph, promoting the application of dataflow diagram principles to computation, opposing the more linear and classical Von Neumann model. DFP is the core of most visual programming languages, which claim to be able to provide end-user programming: with their visual interfaces, they allow non-technical users to extend or create applications without programming knowledge. Also, DFP is capable of parallelizing computation without introducing development complexity, resulting in increased performance of applications built with it on multi-core computers. This survey describes how visual programming languages built on top of DFP can be used for end-user programming and how easy it is to achieve concurrency by applying the paradigm, without any development overhead. DFP's open problems are discussed and some guidelines for adopting the paradigm are provided.
... Because of the advantages and drawbacks present in the threaded and event-driven models, these two have been the subject of a long-running debate [56,62,84,79,40,64]. These debates have led to numerous attempts at improvement, which we describe here. ...
Article
Full-text available
This thesis studies the performance of data servers on multicores. More precisely, we focus on scalability with the number of cores. First, we study the internals of an event-driven multicore runtime. We demonstrate that false sharing and inter-core communications badly hurt performance and prevent applications from scaling. We then propose several optimisations to fix these issues. In a second part, we compare the multicore performance of three Web servers, each representative of a programming model. We observe that the differences between the servers' performance vary as the number of cores increases. We are able to pinpoint the cause of the observed scalability limitation. We present one approach and some perspectives to overcome this limit.
... This is why the event-driven approach has often been put forward in cases where massive or lightweight concurrency is needed. However, it requires the use of cooperative scheduling and the splitting of computations into a series of callbacks, making the control flow of each computation hard to follow [11,15]. ...
... Event loops have become popular because several applications using that model have been shown to have lower memory consumption, better performance, and better scalability than equivalent programs written in a threaded model [1] [2]. The event-based model is also considered simpler than using threads, since threading requires proper synchronization and is more difficult to debug [3]. A counter-argument against events is that reasoning about the control flow is difficult, and that with careful reengineering, threaded approaches can achieve similar performance [4]. ...
Conference Paper
Full-text available
We propose a model for event-oriented programming under shared memory based on access permissions with explicit parallelism. In order to obtain safe parallelism, programmers need to specify the variable permissions of functions. Blocking operations are non-existent; callback-based APIs are used instead, which can be called in parallel for different events as long as the access permissions are guaranteed. This model scales for both IO- and CPU-bound programs. We have implemented this model in the Eve language, which includes a compiler that generates parallel tasks with synchronization on top of variables, and a work-stealing runtime that uses the epoll interface to manage the event loop. We have also evaluated the model with micro-benchmarks on programs that are either CPU-intensive or IO-intensive, with and without shared data. In CPU-intensive programs, it achieved results very close to multithreaded approaches. In the share-nothing IO-intensive benchmark it outperformed all other solutions. In the shared-memory IO-intensive benchmark it outperformed other solutions when there were at least as many write as read operations.
... Traditionally, GUIs do not require threads, which is why they usually function with event loops. This conclusion [Ous96] was highly debated at the time when many new GUI libraries appeared, but is nowadays widely accepted by the majority of GUI libraries in use. Suppose a text editor window, used to write documents, with a blinking cursor. ...
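The blinking-cursor case works naturally in a single-threaded event loop: the timer handler re-arms itself, so no thread is needed. A sketch with a simulated clock (illustrative Python, not from the cited dissertation):

```python
import heapq

# Single-threaded GUI loop sketch: a min-heap of (fire_time, callback)
# timers and a virtual clock stand in for a real event loop.
timers = []
now = 0.0

def set_timer(delay, cb):
    heapq.heappush(timers, (now + delay, cb))

blinks = []
def blink():
    # toggle the cursor, then re-arm: the handler schedules itself
    blinks.append("on" if len(blinks) % 2 == 0 else "off")
    if len(blinks) < 4:
        set_timer(0.5, blink)

set_timer(0.5, blink)
while timers:                      # the event loop, on a simulated clock
    now, cb = heapq.heappop(timers)
    cb()
print(blinks)  # ['on', 'off', 'on', 'off']
```

Because the loop dispatches one handler at a time, the cursor state needs no locking, which is the essence of the argument that GUIs do not require threads.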
Article
In order to avoid the problems raised by the integration of a growing number of programmable home appliances, we propose a language with mobile agents. These mobile agents are capable of migrating from one appliance or computer to another in order to work on its local resources, which allows us to benefit from each appliance's capabilities from a single program. This language is called ULM: Un Langage pour la Mobilité. We present in this dissertation its features, its differences from other languages, and its implementation. ULM is based on the Scheme language, to which we have added functionality linked to mobility and the communication of mobile agents. ULM has a number of primitives allowing the creation of strongly mobile agents, with cooperative deterministic scheduling, and control primitives such as suspension or weak preemption. We present in this dissertation the integration of these primitives into the Scheme language, as well as their interaction and the addition of new primitives such as strong preemption and safe migration. We then present the denotational semantics and its implementation with a bytecode compiler and two virtual machines: one written in Bigloo Scheme for execution on traditional computers, the other in Java ME for mobile phones. We then present the possible use of ULM as a replacement for programs written around event loops, the interfacing of ULM with external languages, a few examples of ULM applications, and future work, before we conclude.
... Errors in applying synchronization primitives easily lead to data corruption or deadlocks, often resulting in non-deterministic and hard-to-debug behavior, so-called heisenbugs. Additionally, fine-grained locking may introduce significant overhead, while coarse-grained locking may unnecessarily reduce parallelism. These difficulties have been addressed by Ousterhout and van Renesse [105,127]. Ousterhout favors event handlers, while van Renesse proposes goal-oriented programming, describing dependencies between tasks at a higher level and hiding low-level mechanisms such as semaphores. These approaches, however, do not fit well with real-time requirements or streams. ...
... The execution model of an OS directly influences its performance. The two most popular models used in WSN motes are event-based [28], such as TinyOS [18] and EYES/PEEROS [29], and thread-based, such as MantisOS [18]. Table 1 ...
Article
Full-text available
Conventional wireless sensor network (WSN) applications mainly deal with scalar data such as temperature, humidity, pressure, and light, which are very suitable for low-rate, low-power networking technology such as the IEEE 802.15.4 standard. The availability of commercial off-the-shelf (COTS) complementary metal-oxide semiconductor (CMOS) cameras has made a single-chip solution possible and consequently fostered researchers to push WSNs a step further. The unique properties of multimedia data delivery posed new challenges for resource-constrained sensor networks: transferring raw data is very expensive, while the limited processing power of sensor nodes seriously constrains any sophisticated multimedia processing. This project proposed a new platform for wireless multimedia sensor networks (WMSN), namely the TelG mote and the WiseOS operating system to support the mote's operation. The mote design for WMSN consists of an ATmega644PV microcontroller from Atmel Co. as its processing unit, an XBee module as its communication unit, and a C328R CMOS camera as its sensing unit. To hide low-level details of TelG motes such as processor management, memory management, device management, scheduling policies, and multitasking, the developed WiseOS provides a clear application programming interface (API) to the application developer. WiseOS is designed to be monolithic and event-driven, and to use a first-in first-out (FIFO) scheduling policy. The low-rate video/image streaming application that was developed shows that multi-hop communication of multimedia content in WMSN using the TelG mote supported by WiseOS proved to be practical.
... An operating system (OS) is important for the WSN as it can manage the hardware resources and serve the application development. The current WSN OSes can be classified into two kinds: event-driven OS and multithreaded OS [6,7]. In the event-driven OS, preemption is not supported; one task can be executed only after the previous one runs to completion (Figure 1b). ...
Article
Full-text available
Memory and energy optimization strategies are essential for resource-constrained wireless sensor network (WSN) nodes. In this article, a new memory-optimized and energy-optimized multithreaded WSN operating system (OS), LiveOS, is designed and implemented. The memory cost of LiveOS is optimized by using the stack-shifting hybrid scheduling approach. Different from the traditional multithreaded OS, in which thread stacks are allocated statically by pre-reservation, thread stacks in LiveOS are allocated dynamically using the stack-shifting technique. As a result, the memory waste caused by static pre-reservation can be avoided. In addition to the stack-shifting dynamic allocation approach, a hybrid scheduling mechanism, which can decrease both the thread scheduling overhead and the number of thread stacks, is also implemented in LiveOS. With these mechanisms, the stack memory cost of LiveOS can be reduced by more than 50% compared to that of a traditional multithreaded OS. Not only is the memory cost optimized, but so is the energy cost, and this is achieved by using the multi-core "context aware" and multi-core "power-off/wakeup" energy conservation approaches. By using these approaches, the energy cost of LiveOS can be reduced by more than 30% compared to a single-core WSN system. The memory and energy optimization strategies in LiveOS not only prolong the lifetime of WSN nodes, but also make a multithreaded OS feasible to run on memory-constrained WSN nodes.
Article
Released as open source in November 2009, Go has become the foundation for critical infrastructure at every major cloud provider. Its creators look back on how Go got here and why it has stuck around.
Article
Modern server software is demanding to develop and operate: it must be available at all times and in all locations; it must reply within milliseconds to user requests; it must respond quickly to capacity demands; it must process a lot of data and even more traffic; it must adapt quickly to changing product needs; and in many cases it must accommodate a large engineering organization, its many engineers the proverbial cooks in a big, messy kitchen.
Chapter
We motivate why event-driven approaches are suitable to address the challenges of mobile and ubiquitous computing. In particular, we describe the beneficial properties of event-based communication in so-called mobile ad hoc networks. However, because contemporary programming languages feature no built-in support for event-driven programming, programmers are often forced to integrate event-driven concepts with a different programming paradigm. In particular, we study the difficulties in combining events with the object-oriented paradigm. We argue that these difficulties form the basis of what we call the object-event impedance mismatch. We highlight the various issues at the software engineering level and propose to resolve this mismatch by introducing a novel object-oriented programming language that supports event-driven abstractions from the ground up.
Chapter
Graph partition quality affects the overall performance of distributed graph computing systems. The quality of a graph partition is measured by the balance factor and the edge cut ratio. A balanced graph partition with a small edge cut ratio is generally preferred since it reduces the high network communication cost. However, through an empirical study on Giraph, we find that the performance over a well-partitioned graph can be as much as two times worse than over a simple random partition. The reason is that such systems only optimize for simple partition strategies and cannot efficiently handle the increased workload of local message processing that arises when a high-quality graph partition is used. In this chapter, we introduce a novel partition-aware graph computing system named PAGE, which equips a new message processor and a dynamic concurrency control model. The new message processor handles local and remote messages concurrently in a unified way. The dynamic model adaptively adjusts the concurrency of the processor based on online statistics. Experimental studies demonstrate the superiority of PAGE on graph partitions of various qualities.
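The dynamic concurrency control idea, adjusting the number of message-processing workers from online statistics, can be sketched as follows; the thresholds and step size are invented, and PAGE's actual model is more involved:

```python
# Toy version of dynamic concurrency control: grow or shrink the pool of
# local message-processing workers based on the observed backlog.
# Thresholds, step size and throughput figures are all hypothetical.
def adjust_workers(current, queue_len, served_per_worker, lo=0.5, hi=2.0):
    load = queue_len / max(1, current * served_per_worker)
    if load > hi:
        return current + 1          # backlog growing: add a worker
    if load < lo and current > 1:
        return current - 1          # mostly idle: release a worker
    return current

w = 2
history = []
for backlog in [100, 400, 900, 300, 50, 10]:     # simulated queue lengths
    w = adjust_workers(w, backlog, served_per_worker=100)
    history.append(w)
print(history)  # → [2, 2, 3, 3, 2, 1]
```

The point of such a controller is that a high-quality partition shifts work from remote to local message processing, so a fixed concurrency level tuned for random partitions becomes a bottleneck.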
Conference Paper
Full-text available
Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.
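The core I/O-reduction idea, deduplicating requests and issuing them as one batched call, can be hand-written as a sketch; Ÿauhau performs such rewrites automatically at compile time, and the `fetch_batch` helper and request keys below are hypothetical:

```python
# Sketch of I/O batching: many logical fetches collapse into one round-trip.
# fetch_batch stands in for a real batched service call.
def fetch_batch(keys, call_log):
    call_log.append(sorted(keys))          # one I/O call for many keys
    return {k: f"value:{k}" for k in keys}

def run_requests(wanted, call_log):
    unique = set(wanted)                   # deduplicate repeated requests
    results = fetch_batch(unique, call_log)
    return [results[k] for k in wanted]    # answers in the original order

log = []
out = run_requests(["a", "b", "a", "c"], log)
print(len(log), out)  # a single batched call instead of four separate ones
```

Writing this by hand across several micro-service layers is exactly the burden the abstract describes; a compile-time rewrite keeps the source in the naive one-fetch-per-use style.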
Conference Paper
The focus of this paper is the use of asynchronous I/O calls in web applications to improve their scalability by increasing the number of requests per second they can process and decreasing the average response time of the system. Popular development frameworks have historically included only blocking I/O APIs in their base libraries, making asynchronous I/O methods hard to implement and maintain. Significant effort has been made in recent years to enrich these frameworks with better syntax for asynchronous APIs, to improve the developer experience and encourage their use. One such improvement, in .NET's syntax, is put to the test in this paper, and the results are presented and evaluated.
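The scalability argument for asynchronous I/O can be demonstrated with a minimal asyncio sketch (Python here rather than .NET, but the async/await shape is the same), assuming three simulated 50 ms I/O waits that overlap instead of running back to back:

```python
import asyncio
import time

# Three handlers each "wait on I/O" for 0.05 s. Run asynchronously they
# overlap, so total wall time is ~0.05 s instead of the 0.15 s a blocking
# server thread would need to serve them one after another.
async def handle_request(i):
    await asyncio.sleep(0.05)   # stands in for an awaited I/O call
    return i

async def main():
    return await asyncio.gather(*(handle_request(i) for i in range(3)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # → [0, 1, 2]
```

The same server thread is free to accept new requests during each `await`, which is where the requests-per-second improvement comes from.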
Conference Paper
Asynchronous programming is one of the currently dominant programming techniques. Its key idea is to use a queue to post computations as tasks (events); the application then picks events from this queue asynchronously for processing. Asynchronous programming has proven very convenient for building a large share of the software systems used today, such as Gmail and Facebook, which are Web 2.0 JavaScript applications. This paper presents a novel technique for finding nonterminating executions in asynchronous programs. The targeted cases of nontermination are those caused by the posting concept. The proposed technique is based on a graphical representation of the posting behaviours of asynchronous programs. Proofs of termination and correctness of the proposed method are outlined in the paper.
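The posting-graph idea can be sketched as a cycle check over a map from each handler to the handlers it may post; a cycle means handlers can keep re-feeding the queue forever. The handler names and edges below are made up for illustration:

```python
# Toy posting-graph analysis: if handler A may post B and B may post A,
# the event queue can grow without bound and the program need not terminate.
def has_posting_cycle(posts):
    # posts: handler name -> list of handlers it may post
    def dfs(node, path, seen):
        if node in path:
            return True             # found a posting cycle
        if node in seen:
            return False            # already explored, no cycle from here
        seen.add(node)
        return any(dfs(n, path | {node}, seen) for n in posts.get(node, ()))
    return any(dfs(h, frozenset(), set()) for h in posts)

terminating = {"click": ["render"], "render": []}
looping = {"click": ["render"], "render": ["click"]}  # render re-posts click
print(has_posting_cycle(terminating), has_posting_cycle(looping))  # → False True
```

A real analysis must also decide whether the cycle is reachable and whether posts along it are conditional, which is where the paper's proofs come in; this sketch only shows the graph view of posting behaviour.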
Article
Céu is a synchronous language targeting soft real-time systems. It is inspired by Esterel and has a simple semantics with fine-grained control over program execution. Céu uses an event-triggered notion of time that enables compile-time checks to detect conflicting concurrent statements, resulting in deterministic and concurrency-safe programs. We present the particularities of our design in comparison to Esterel, such as stack-based internal events, concurrency checks, safe integration with C, and first-class timers. We also present two implementation back ends: one aiming for resource efficiency and interoperability with C, and another as a virtual machine that allows remote reprogramming.
Article
In Model-Driven Engineering system-level approaches, the design of communication protocols and patterns is subordinate to the design of processing operations (computations) and to their mapping onto execution resources. However, this strategy only captures simple communication schemes (e.g., processor-bus-memory) and prevents us from evaluating the performance of both computations and communications (e.g., the impact of application traffic patterns on the communication interconnect) in a single step. To solve these issues, we introduce a novel design approach, the ψ-chart, in which communication patterns and protocols are designed independently of a system's functionality and resources, via dedicated models. At the mapping step, both application and communication models are bound to the platform resources and transformed to explore design alternatives for both computations and communications. We present the ψ-chart and its implementation (i.e., communication models and Design Space Exploration) in TTool/DIPLODOCUS, a Unified Modeling Language (UML)/SysML framework for the modeling, simulation, formal verification and automatic code generation of data-flow embedded systems. The effectiveness of our solution in terms of better design quality (e.g., portability, time) is demonstrated with the design of the physical layer of a ZigBee (IEEE 802.15.4) transmitter on a multi-processor architecture.
Conference Paper
Event-driven programming has become a major paradigm in developing concurrent, distributed systems. Its benefits are often informally captured by the key tenet of “decoupling”, a notion which roughly captures the ability of modules to join and leave (or fail) applications dynamically, and to be developed by independent parties. Programming models for event-driven programming either make it hard to reason about global control flow, thus hampering sound execution, or sacrifice decoupling to aid in reasoning about control flow. This work fills the gap by introducing a programming model – dubbed cooperative decoupled processes – that achieves both decoupling and reasoning about global control flow. We introduce this programming model through an event calculus, loosely inspired by the Join calculus, that enables reasoning about cooperative decoupled processes through the concepts of pre- and postconditions. A linear type system controls aliasing of events to ensure uniqueness of control flow and thus safe exchange of shared events. Fundamental properties of the type system such as subject reduction, migration safety, and progress are established.
Conference Paper
In many important cloud services, different tenants execute their requests in the thread pool of the same process, requiring fair sharing of resources. However, using fair queue schedulers to provide fairness in this context is difficult because of high execution concurrency, and because request costs are unknown and have high variance. Using fair schedulers like WFQ and WF²Q in such settings leads to bursty schedules, where large requests block small ones for long periods of time. In this paper, we propose Two-Dimensional Fair Queueing (2DFQ), which spreads requests of different costs across different threads and minimizes the impact of tenants with unpredictable requests. In an evaluation on production workloads from Azure Storage, a large-scale cloud system at Microsoft, we show that 2DFQ reduces the burstiness of service by 1-2 orders of magnitude. On workloads where many large requests compete with small ones, 2DFQ improves 99th percentile latencies by up to 2 orders of magnitude.
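This is not the 2DFQ algorithm itself, but its core intuition, keeping cheap requests away from threads that run expensive ones so that large requests cannot block small ones, can be sketched with a cost threshold; the costs and the two-lane split below are hypothetical:

```python
# Crude sketch of cost-segregated scheduling: requests are routed to a
# "small" or "large" lane by estimated cost, so the small lane's tail
# latency is never inflated by a request like ("b", 500).
def assign(requests, cheap_threshold=10):
    lanes = {"small": [], "large": []}
    for name, cost in requests:
        lanes["small" if cost <= cheap_threshold else "large"].append(name)
    return lanes

reqs = [("a", 1), ("b", 500), ("c", 2), ("d", 3), ("e", 900)]
lanes = assign(reqs)
print(lanes)  # small lane never waits behind b or e
```

Real 2DFQ does this without a fixed threshold and also copes with tenants whose request costs are unknown or unpredictable, which is the hard part this sketch omits.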
Conference Paper
An event loop is the basic scheduling mechanism for programs that respond to asynchronous events. In some frameworks, only the runtime can spin event loops, while in others, these can also be spun programmatically by event handlers. The latter provides more flexibility and helps improve responsiveness in cases where an event handler must wait for some input, for example, from the user or network. It can do so while spinning an event loop. In this paper, we consider the scheduling scheme of programmatic event loops. Programs which follow this scheme are prone to interference between a handler that is spinning an event loop and another handler that runs inside the loop. We present a happens-before based race detection technique for such programs. We exploit the structure and semantics of executions of these programs to design a sparse representation of the happens-before relation. It relates only a few pairs of operations explicitly in such a way that the ordering between any pair of operations can be inferred from the sparse representation in constant time. We have implemented our technique in an offline race detector for C/C++ programs, called SparseRacer. We discovered 13 new and harmful race conditions in 9 open-source applications using SparseRacer. So far, developers have confirmed 8 as valid bugs, and have fixed 3. These bugs arise from unintended interference due to programmatic event loops. Our sparse representation improved efficiency and gave an average speedup of 5x in race detection time.
Article
There is a big class of problems that requires writing programs in an asynchronous manner. Cloud computing, service-oriented architectures, multi-core and heterogeneous systems all require programs to be written with asynchronous components. The necessity of concurrency and asynchronous execution brings the added complexity of inversion of control into the system, either through message passing or through event processing. In this paper, we introduce explicit programming language support for asynchronous programming that completely hides inversion of control. The presented programming model defines a common abstraction over the different types of tasks, both synchronous and asynchronous. It defines common imperative control constructs equivalent to those of the host programming language, along with a few more advanced ones for transactional and parallel execution that work universally for any task type. It allows the programmer to implement the logic of an asynchronous system in a natural way by writing simple, seemingly synchronous imperative code. We show that programs written using this approach are easier for programmers to understand, easier to design automated tests for, and easier to subject to computer-based static analysis of the program logic. The principles behind this approach were tested in a couple of real-world systems with a worldwide user base. Our experience shows that this approach makes complex code with many interdependencies between asynchronously executed tasks easy to write and reason about.
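The idea of a common abstraction over synchronous and asynchronous tasks can be sketched in asyncio terms; the `ensure` helper and the pipeline below are my own illustration, not the paper's constructs:

```python
import asyncio

# An asynchronous step: a coroutine the caller must await.
async def double(x):
    await asyncio.sleep(0)
    return x * 2

# Common abstraction: the caller awaits every step the same way, whether
# the step is a plain value (synchronous) or a coroutine (asynchronous).
async def ensure(task):
    if asyncio.iscoroutine(task):
        return await task   # asynchronous task: suspend until it completes
    return task             # synchronous value: return immediately

# The caller reads as straight-line imperative code; no callbacks,
# no inversion of control.
async def pipeline():
    a = await ensure(21)            # synchronous step
    b = await ensure(double(a))     # asynchronous step, identical syntax
    return b

result = asyncio.run(pipeline())
print(result)  # → 42
```

The benefit claimed in the abstract is visible even at this scale: `pipeline` can be read, tested and analyzed as if it were ordinary sequential code.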
Conference Paper
A single-page application is a web application that is retrieved with a single page load; such applications have become popular recently. In these web applications, real-time interaction is typically offered by long polling of HTTP requests, as in the Comet model. However, this kind of communication between a client and a server is inefficient because of the TCP handshake and HTTP header overhead. To address this inefficiency, WebSocket has been proposed as a web technology providing full-duplex communication between web browsers and servers. In this paper, we design and implement a load balancer suitable for web applications using the WebSocket protocol, which achieves improved performance in terms of the number of simultaneous connections. Load balancers usually handle TCP packets in the transport layer (L4) of the network. Our load balancer is instead designed as a relay in the application layer (L7), in order to provide a finer distribution of the network load. We implement the load balancer on an event-driven web application framework, Node.js, and evaluate the efficiency of the implementation.
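An application-layer (L7) balancing decision, unlike an L4 one, can inspect the request before choosing a backend, which is what makes finer distribution possible; the routing rule and backend names below are invented for illustration:

```python
import itertools

# Toy L7 routing decision: the balancer sees the request path, so it can
# pin long-lived WebSocket upgrades to a dedicated pool while cycling
# ordinary HTTP requests round-robin over stateless web backends.
class Balancer:
    def __init__(self, ws_backends, http_backends):
        self.ws = itertools.cycle(ws_backends)
        self.http = itertools.cycle(http_backends)

    def route(self, path):
        pool = self.ws if path.startswith("/socket") else self.http
        return next(pool)

b = Balancer(["ws1", "ws2"], ["web1", "web2"])
routes = [b.route(p) for p in ["/socket/a", "/index", "/socket/b", "/about"]]
print(routes)  # → ['ws1', 'web1', 'ws2', 'web2']
```

An L4 balancer cannot make this distinction because it only sees TCP segments, not the upgrade request inside them.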
Conference Paper
Event-driven programming frameworks such as Node.JS have recently emerged as a promising option for Web service development. Such frameworks feature a simple programming model with implicit parallelism and asynchronous I/O. The benefits of the event-based programming model in terms of concurrency management need to be balanced against its limitations in terms of scalability on multicore architectures and against the impossibility of sharing a common memory space between multiple Node.JS processes. In this paper we present Node.Scala, an event-based programming framework for the JVM which overcomes the limitations of current event-driven frameworks. Node.Scala introduces safe stateful programming for event-based services. The programming model of Node.Scala allows threads to safely share state in a standard event-based programming model. The runtime system of Node.Scala automatically parallelizes and synchronizes state access to guarantee correctness. Experiments show that services developed in Node.Scala yield linear scalability and high throughput when deployed on multicore machines.
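The guarantee that Node.Scala's runtime automates, serialized access to state shared by handlers running on several threads, can be sketched manually with a lock; in Node.Scala this synchronization is inserted by the runtime rather than written by hand:

```python
import threading

# Four threads play the role of parallel event handlers updating shared
# state. With the lock, every increment is serialized and the final count
# is exact; removing the lock would make the result nondeterministic.
counter = 0
lock = threading.Lock()

def handler():
    global counter
    for _ in range(10_000):
        with lock:          # synchronized state access
            counter += 1

threads = [threading.Thread(target=handler) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 40000
```

Automating exactly this discipline is what lets an event-based model scale across cores without giving up the single-threaded model's correctness guarantees.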
Conference Paper
This paper introduces AC, a set of language constructs for composable asynchronous IO in native languages such as C/C++. Unlike traditional synchronous IO interfaces, AC lets a thread issue multiple IO requests so that they can be serviced concurrently, and so that long-latency operations can be overlapped with computation. Unlike traditional asynchronous IO interfaces, AC retains a sequential style of programming without requiring code to use multiple threads, and without requiring code to be "stack-ripped" into chains of callbacks. AC provides an "async" statement to identify opportunities for IO operations to be issued concurrently, a "do..finish" block that waits until any enclosed "async" work is complete, and a "cancel" statement that requests cancellation of unfinished IO within an enclosing "do..finish". We give an operational semantics for a core language. We describe and evaluate implementations that are integrated with message passing on the Barrelfish research OS, and integrated with asynchronous file and network IO on Microsoft Windows. We show that AC offers comparable performance to existing C/C++ interfaces for asynchronous IO, while providing a simpler programming model.
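A rough analogue of AC's async / do..finish / cancel constructs can be sketched with asyncio, where gather() plays the role of the do..finish barrier and Task.cancel() the cancel statement; this is an approximation of the programming model, not AC itself:

```python
import asyncio

# Two overlapped "IO operations"; the slow one is cancelled once the fast
# one completes, then a do..finish-style barrier waits for everything.
async def io_op(name, delay, done):
    try:
        await asyncio.sleep(delay)      # stands in for a long-latency IO call
        done.append(name)
    except asyncio.CancelledError:
        done.append(name + ":cancelled")
        raise

async def main():
    done = []
    fast = asyncio.create_task(io_op("fast", 0.01, done))   # "async" work
    slow = asyncio.create_task(io_op("slow", 10, done))     # "async" work
    await fast                  # both ran concurrently up to this point
    slow.cancel()               # analogue of AC's cancel statement
    # analogue of do..finish: do not proceed until enclosed work settles
    await asyncio.gather(slow, return_exceptions=True)
    return done

print(asyncio.run(main()))  # → ['fast', 'slow:cancelled']
```

As in AC, the code stays sequential in style, with no stack-ripping into callback chains, while the two IO operations are still serviced concurrently.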
Thesis
Full-text available
In this thesis, we address the need for a formal model for the design and implementation of asynchronous concurrent systems, and also the need for a programming language that enables analyzable design of asynchronous software components. In the first part of the thesis we present Coordinated Concurrent Scenarios (CCS), a formal model that combines the rich visual notation of High-level Message Sequence Charts (HMSC) with the expressive power of Message Passing Automata (MPA), and show that verification of CCS models can be automated using the model checker Uppaal, under certain bounds. In the second part of the thesis we present Clarity, a programming language that extends C and introduces new features that enable the programmer to write asynchronous code in a sequential manner, producing code that is more amenable to static analysis.
Article
Graph partition quality affects the overall performance of parallel graph computation systems. The quality of a graph partition is measured by the balance factor and the edge cut ratio. A balanced graph partition with a small edge cut ratio is generally preferred since it reduces the expensive network communication cost. However, according to an empirical study on Giraph, the performance over a well-partitioned graph can be as much as two times worse than over a simple random partition. This is because these systems only optimize for simple partition strategies and cannot efficiently handle the increased workload of local message processing that arises when a high-quality graph partition is used. In this paper, we propose a novel partition-aware graph computation engine named PAGE, which equips a new message processor and a dynamic concurrency control model. The new message processor handles local and remote messages concurrently in a unified way. The dynamic model adaptively adjusts the concurrency of the processor based on online statistics. The experimental evaluation demonstrates the superiority of PAGE on graph partitions of various qualities.
Article
The emergence of Wireless Sensor Network (WSN) technology has led to many changes in current and traditional computational techniques in order to adapt to the harsh operating conditions and scarce resources of WSNs. A WSN consists of sensor nodes with wireless communication abilities that allow them to form a network. New system architectures have emerged to overcome sensor network limitations. Each architecture follows one of two traditional design concepts: event-driven or thread-driven design. Although event-driven systems were assumed to generally perform better for embedded systems, tests have shown that event-driven systems tend to save more energy and space, while thread-driven systems provide more concurrency and predictability, creating a tradeoff that depends on the requirements of the application at hand. Performance analyzers are often used to accurately measure the performance of a given system when such a tradeoff is evident; they can also locate deficiencies in a system for future improvement. The ever-increasing complexity of applications executed by WSNs and the evolving nature of the underlying Embedded Operating Systems (EOSs) have led to the need for an accurate evaluation technique to guide practitioners in the field. This paper presents a novel approach to providing a benchmarking and performance evaluation tool for comparing and analyzing the performance of WSN EOSs.
Conference Paper
We analyze the I/O behavior of iBench, a new collection of productivity and multimedia application workloads. Our analysis reveals a number of differences between iBench and typical file-system workload studies, including the complex organization of modern files, the lack of pure sequential access, the influence of underlying frameworks on I/O patterns, the widespread use of file synchronization and atomic operations, and the prevalence of threads. Our results have strong ramifications for the design of next generation local and cloud-based storage systems.
Article
Web applications and their architecture have changed significantly in the last decade: they now utilize asynchronous techniques such as Ajax or Comet to enable real-time data communication and dynamic rendering. They further use HTML5 features such as drag & drop, canvas, Web Workers and the application cache in order to behave like desktop applications, and are referred to as Rich Internet Applications (RIAs). Several research streams have influenced the way those RIAs are built. Client-side applications with JavaScript, service-oriented architecture and cloud computing are discussed regarding their influences on Web architecture.