Paul Barham's research while affiliated with Google Inc. and other places

Publications (52)

Preprint
Full-text available
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we traine...
Preprint
Full-text available
We present the design of a new large scale orchestration layer for accelerators. Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state of the art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures,...
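As a rough sketch of what a dataflow of asynchronous operators that consume and produce futures can look like, the toy Python/asyncio fragment below chains two "shards" of a producer into a downstream operator. The shard count, operator names and structure are invented for illustration; this is not the Pathways API.

```python
# Illustrative sketch only: a toy "sharded dataflow of asynchronous operators
# that consume and produce futures", loosely in the spirit of the abstract.
# Shard counts and operator names are invented; this is not the Pathways API.
import asyncio

async def load_shard(shard_id):
    # Pretend each shard asynchronously produces its slice of the input.
    await asyncio.sleep(0.01)
    return [float(shard_id)] * 4

async def square_op(upstream):
    # Consume an upstream future/task and produce a new result.
    xs = await upstream
    return [x * x for x in xs]

async def main():
    # Two shards run concurrently; downstream operators chain on their futures.
    producers = [asyncio.ensure_future(load_shard(i)) for i in range(2)]
    consumers = [asyncio.ensure_future(square_op(p)) for p in producers]
    print(await asyncio.gather(*consumers))

asyncio.run(main())
```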
Conference Paper
In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it harder to explore innovative machine learning research ideas. We explain how the evolution of hardware a...
Conference Paper
Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from...
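For a concrete flavour of the data-dependent control flow meant here, the minimal TensorFlow sketch below expresses a loop and a conditional inside a traced graph with tf.while_loop and tf.cond; the toy computation itself (Collatz step counting) is chosen only to exercise those constructs.

```python
# Minimal illustration of data-dependent control flow inside a TensorFlow
# graph via tf.cond and tf.while_loop; the Collatz example is only a toy.
import tensorflow as tf

@tf.function
def collatz_steps(n):
    steps = tf.constant(0)

    def cond(n, steps):
        return n > 1                       # data-dependent loop condition

    def body(n, steps):
        n = tf.cond(n % 2 == 0, lambda: n // 2, lambda: 3 * n + 1)
        return n, steps + 1

    _, steps = tf.while_loop(cond, body, [n, steps])
    return steps

print(collatz_steps(tf.constant(27)).numpy())  # 27 takes 111 steps to reach 1
```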
Article
We describe the timely dataflow model for distributed computation and its implementation in the Naiad system. The model supports stateful iterative and incremental computations. It enables both low-latency stream processing and high-throughput batch processing, using a new approach to coordination that combines asynchronous and fine-grained synchro...
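As a rough illustration (not the Naiad API) of the epoch-tagged, notification-driven style of computation this refers to, the sketch below tags records with an epoch, updates per-epoch state incrementally, and emits output only when an epoch is known to be complete.

```python
# Toy flavour of timestamped (epoch-tagged) dataflow: records carry an epoch,
# an operator accumulates per-epoch state, and a notification fires once an
# epoch's inputs are complete. Class and method names are invented.
from collections import defaultdict

class CountPerEpoch:
    def __init__(self):
        self.state = defaultdict(int)

    def on_recv(self, epoch, record):
        self.state[epoch] += 1            # stateful, incremental update

    def on_notify(self, epoch):
        # Called only once no more records for `epoch` can arrive.
        print(f"epoch {epoch} complete: {self.state.pop(epoch)} records")

op = CountPerEpoch()
for epoch, rec in [(0, "a"), (0, "b"), (1, "c")]:
    op.on_recv(epoch, rec)
op.on_notify(0)
op.on_notify(1)
```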
Article
Full-text available
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices...
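For orientation only, a minimal example of the dataflow-graph style of programming described here; it uses today's tf.function tracing API rather than the original graph-construction interface the paper presents.

```python
# Minimal TensorFlow illustration: mutable shared state (tf.Variable) plus a
# function traced into a reusable dataflow graph. Shapes and ops are arbitrary.
import tensorflow as tf

W = tf.Variable(tf.random.normal([3, 2]))   # shared, mutable state
b = tf.Variable(tf.zeros([2]))

@tf.function  # traces the Python function into a dataflow graph
def predict(x):
    return tf.nn.softmax(tf.matmul(x, W) + b)

x = tf.constant([[1.0, 2.0, 3.0]])
print(predict(x).numpy())
```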
Article
Full-text available
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hund...
Patent
A system and method for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices is described. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-wor...
Technical Report
TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of...
Conference Paper
Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on mu...
Patent
A system for providing augmented reality detects foreground occluders in an image of a video stream. One or more virtual objects are then rendered appropriately with respect to the occluders. Location information associated with the image is used to retrieve a three dimensional representation of the location where the image was taken. Features that...
Article
Full-text available
We present a generative model for representing and reasoning about the relationships among events in continuous time. We apply the model to the domain of networked and distributed computing environments where we fit the parameters of the model from timestamp observations, and then use hypothesis testing to discover dependencies between the events a...
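A toy, hand-rolled illustration of the underlying idea of inferring dependencies from timestamp observations appears below; the score, delay window and data are invented and are far simpler than the paper's generative model and hypothesis tests.

```python
# Toy dependency check between two timestamped event streams: does a B event
# tend to follow an A event within a short delay more often than chance?
# The delay window and scoring are illustrative only.
import random

def follows_within(a_times, b_times, delay=0.05):
    """Fraction of A events followed by some B event within `delay` seconds."""
    hits = sum(any(0 <= b - a <= delay for b in b_times) for a in a_times)
    return hits / len(a_times)

random.seed(0)
a = sorted(random.uniform(0, 10) for _ in range(50))
b = sorted(t + random.uniform(0, 0.03) for t in a)    # B is caused by A
c = sorted(random.uniform(0, 10) for _ in range(50))  # independent stream

print("A->B score:", follows_within(a, b))  # high: dependency suspected
print("A->C score:", follows_within(a, c))  # low: likely independent
```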
Article
We discuss diversity and heterogeneity in manycore computer systems, and identify three distinct types of diversity, all of which present challenges to operating system designers and application writers alike. We observe that most current research work has concentrated on a narrow form of one of these (non-uniform memory access) to the exclusion of...
Article
As personal computing devices become increasingly parallel multiprocessors, the requirements for operating system schedulers change considerably. Future general-purpose machines will need to handle a dynamic, bursty, and interactive mix of parallel programs sharing a heterogeneous multicore machine. We argue that a key challenge for such machines i...
Conference Paper
Full-text available
Bugs in kernel extensions remain one of the main causes of poor operating system reliability despite proposed techniques that isolate extensions in separate protection domains to contain faults. We believe that previous fault isolation techniques are not widely used because they cannot isolate existing kernel extensions with low overhead on sta...
Conference Paper
For over forty years, we have assumed hierarchical file system namespaces. These namespaces were a rudimentary attempt at simple organization. As users have begun to interact with increasing amounts of data and are increasingly demanding search capability, ...
Conference Paper
Commodity computer systems contain more and more processor cores and exhibit increasingly diverse architectural tradeoffs, including memory hierarchies, interconnects, instruction sets and variants, and IO configurations. Previous high-performance computing systems have scaled in specific cases, but the dynamic nature of modern client and server...
Article
Full-text available
Worm containment must be automatic because worms can spread too fast for humans to respond. Recent work proposed network-level techniques to automate worm containment; these techniques have limitations because there is no information about the vulnerabilities exploited by worms at the network level. We propose Vigilante, a new end-to-end architectu...
Article
The basic system timer facilities used by applications and OS kernels for scheduling timeouts and periodic activities have remained largely unchanged for decades, while hardware architectures and application loads have changed radically. This raises concerns with CPU overhead, power management and application responsiveness. In this paper we study h...
Conference Paper
The basic system timer facilities used by applications and OS kernels for scheduling timeouts and periodic activities have remained largely unchanged for decades, while hardware architectures and application loads have changed radically. This raises concerns with CPU overhead, power management and application responsiveness. In this paper we study...
Article
Full-text available
In a modern enterprise network of scale, dependencies between hosts and network services are surprisingly complex, typically undocumented, and rarely static. Even though network management and troubleshooting rely on this information, automated discovery and monitoring of these dependencies remains an unsolved problem. In the system we describe in...
Conference Paper
Full-text available
This paper presents the Leslie Graph, a simple yet powerful abstraction describing the complex dependencies between network, host and application components in modern networked systems. It discusses challenges in the discovery of Leslie Graphs, their uses, and describes two alternate approaches to their discovery, supported by some initial feasibil...
Article
Full-text available
On most modern operating systems, a process is a hardware-protected abstraction for executing potentially mutable code and data. Common features of processes include: dynamic code loading, dynamic code generation, access to cross-process shared memory, and a universal API. This paper argues that many of the dependability and security weaknesses of...
Article
Full-text available
Modern software is so complicated that it is often infeasible to get a good understanding of a system's dynamic behaviour simply from its source code. Commodity operating systems are a good example: they comprise numerous separately-authored components, large numbers of interacting threads, and extensibility mechanisms that allow new components...
Conference Paper
Full-text available
Network-centric tools like NetFlow and security systems like IDSes provide essential data about the availability, reliability, and security of network devices and applications. However, the increased use of encryption and tunnelling has reduced the visibility of monitoring applications into packet headers and payloads (e.g. 93% of traffic on our...
Conference Paper
As we become increasingly dependent on computers connected to the Internet, we must protect them from worm attacks. Worms can gain complete control of millions of hosts in a few minutes, and they can use the infected hosts for malicious activities such as distributed denial of service attacks, relaying spam, corrupting data, and disclosing confiden...
Article
Worm containment must be automatic because worms can spread too fast for humans to respond. Recent work proposed network-level techniques to automate worm containment; these techniques have limitations because there is no information about the vulnerabilities exploited by worms at the network level. We propose Vigilante, a new end-to-end architectu...
Article
Full-text available
Singularity is a research project in Microsoft Research that started with the question: what would a software platform look like if it was designed from scratch with the primary goal of dependability? Singularity is working to answer this question by building on advances in programming languages and tools to develop a new system architecture and op...
Conference Paper
Full-text available
Worm containment must be automatic because worms can spread too fast for humans to respond. Recent work has proposed network-level techniques to automate worm containment; these techniques have limitations because there is no information about the vulnerabilities exploited by worms at the network level. We propose Vigilante, a new end-to-end approa...
Conference Paper
Full-text available
Enterprise networks contain hundreds, if not thousands, of cooperative end-systems. We advocate devoting a small fraction of their idle cycles, free disk space and network bandwidth to create Anemone, a platform for network management. In contrast to current approaches which rely on traffic statistics provided by network devices, Anemone combines e...
Conference Paper
Full-text available
This paper addresses the problem of extracting individual request activity from interleaved event traces. We present a new technique for event correlation which applies a form of temporal join over timestamped, parameterized event streams in order to identify the events pertaining to an individual request. Event schemas ensure that the request extr...
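A highly simplified sketch of that idea is shown below: timestamped, parameterized events are grouped into per-request sets by joining on a shared attribute within a time window. The field names, join key and window are made up; the actual technique drives the join from event schemas.

```python
# Toy "temporal join": group timestamped, parameterized events into
# per-request sets by a shared attribute within a time window.
# Event fields, the join key and the window size are invented.
from collections import defaultdict

events = [
    {"t": 0.01, "type": "http_recv", "conn": 7},
    {"t": 0.02, "type": "db_query",  "conn": 7},
    {"t": 0.05, "type": "http_send", "conn": 7},
    {"t": 0.03, "type": "http_recv", "conn": 9},
]

def correlate(events, key="conn", window=1.0):
    """Join events sharing `key` whose timestamps fall within `window`."""
    requests = defaultdict(list)
    for e in sorted(events, key=lambda e: e["t"]):
        requests[e[key]].append(e)
    # Keep only groups whose events all lie within the window.
    return {k: v for k, v in requests.items()
            if v[-1]["t"] - v[0]["t"] <= window}

for conn, evs in correlate(events).items():
    print(conn, [e["type"] for e in evs])
```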
Conference Paper
Full-text available
Tools to understand complex system behaviour are essential for many performance analysis and debugging tasks, yet there are many open research problems in their development. Magpie is a toolchain for automatically extracting a system's workload under realistic operating conditions. Using low-overhead instrumentation, we monitor the system t...
Article
Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100% binary compatibility at the expense of performance. Others sacrifice security or functionality for speed. Few offer resource isolation...
Article
Understanding the performance of distributed systems requires correlation of thousands of interactions between numerous components --- a task best left to a computer. Today's systems provide voluminous traces from each component but do not synthesise the data into concise models of system performance.
Article
This paper argues that there is significant benefit in providing multiple progressively stronger layers of security for hosts connecting to the Internet. It claims that this multi-layered approach allows early discard of packets associated with attacks. This reduces server vulnerability to computational denial-of-service attacks via heavyweight cry...
Article
This report describes the design of Xen, the hypervisor developed as part of the XenoServer wide-area computing project. Xen enables the hardware resources of a machine to be virtualized and dynamically partitioned so as to allow multiple 'guest' operating system images to be run simultaneously.
Conference Paper
Full-text available
Understanding the performance of distributed systems requires correlation of thousands of interactions between numerous components — a task best left to a computer. Today's systems provide voluminous traces from each component but do not synthesise the data into concise models of system performance. We argue that online performance modelling...
Article
Understanding the performance of distributed systems requires correlation of thousands of interactions between numerous components, a task best left to a computer. Today's systems provide voluminous traces from each component but do not synthesise the data into concise models of
Article
This paper considers some of the performance-related issues that must be tackled if Web Services are to realise their potential as a truly global distributed system.
Article
In this short paper we describe a Wide Area Audio Synchronisation demonstration using the Quality of Service (QoS) facilities of the Nemesis operating system. The Nemesis operating system is a result of the ESPRIT-funded Pegasus I research project. A stereo audio stream is split, with each channel being input to one of two distinct netw...
Article
This paper describes the use of Congestion Pricing as a means of providing Congestion Control and Differentiated Quality of Service. The application of the proposed technique to the Internet Protocol has the advantage that it can be simply implemented using Explicit Congestion Notification. In particular: the network mechanism is independent of hig...
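A toy numerical sketch of the general idea, rate adaptation against a congestion price signalled by the fraction of ECN-marked packets, is given below; the update rule and constants are invented and not taken from the paper.

```python
# Toy rate adaptation against an ECN-signalled congestion price: treat the
# fraction of marked packets as the price and adjust the sending rate.
# The update rule and all constants are illustrative only.
def update_rate(rate, marked_fraction, willingness_to_pay=1.0,
                step=0.1, min_rate=0.1):
    price = marked_fraction
    gradient = willingness_to_pay - price * rate   # negative once price*rate exceeds willingness to pay
    return max(min_rate, rate + step * gradient)

rate = 1.0
for marked in [0.0, 0.1, 0.5, 0.9, 0.9, 0.2]:
    rate = update_rate(rate, marked)
    print(f"marked={marked:.1f} -> rate={rate:.2f}")
```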
Article
Support for multimedia applications by general purpose computing platforms has been the subject of considerable research.
Article
Full-text available
Modern networks are now capable of guaranteeing a consistent Quality of Service (QoS) to multimedia traffic streams. A number of major operating system vendors are also working hard to extend these guarantees into the end-system. In both cases, however, there remains the problem of determining a service rate sufficient to ensure the desired Quality...
Conference Paper
A vertically structured operating system is one in which neither the “kernel” nor “servers” perform work on behalf of applications -- the former because it exists only to multiplex the CPU, and the latter in order to avoid Quality of Service interference between the applications. Instead, wherever possible, the applications perform all of their own pr...
Article
A vertically structured Operating System is one in which neither the "kernel" nor "servers" perform work on behalf of applications -- the former because it exists only to multiplex the CPU, and the latter in order to avoid Quality of Service interference between the applications. Instead, wherever possible, the applications perform all of their own...
Article
Support for multimedia applications by general purpose computing platforms has been the subject of considerable research. Much of this work is based on an evolutionary strategy in which small changes to existing systems are made. The approach adopted is to start ab initio with no backward compatibility constraints. This leads to a novel structure f...
Article
The Desk Area Network was proposed as an architecture suitable for a multimedia workstation. This paper describes how the architecture has evolved and the demonstration workstation that has been constructed. Common usage of the term "multimedia" is to describe systems which incorporate both traditional computer data forms, such as...
Article
This proposal describes the Anemone project and a demonstration of the work so far. The project is developing an edge-based IP network management platform which utilises only information collected at the edges of the network, eschewing the need to collect data in the network core. Devoting a small fraction of hosts' idle cycles, disk space, and net...
Article
Full-text available
Enterprise networks contain hundreds, if not thousands, of cooperative end-systems. This paper advocates devoting a small fraction of their idle cycles, free disk space and network bandwidth to create Anemone, a rich platform for network management. This contrasts with current approaches that rely on traffic statistics provided by network devices....

Citations

... Despite the promising progress on capsule works, Barham et al. [23] explained that although their convolutional capsule model required around 4 times fewer floating point operations (FLOPS) with 16 times fewer parameters than their CNN, implementations in both TensorFlow [24] and PyTorch [25] ran significantly slower and ran out of memory with much smaller models. Although several more efficient versions of capsule routing have since then been proposed [26], [27], [28], [29], the underlying problem is not only caused by routing but by the capsule voting procedure as well. ...
... Modern deep learning frameworks such as PyTorch (Paszke et al., 2019; Li S. et al., 2020) and TensorFlow (Yu et al., 2018) usually require one to define only the forward pass, and gradients of the loss function can be easily and automatically computed with respect to any parameter. The availability of open-source, well-designed, and easy-to-use deep learning frameworks certainly contributed to the increased application of DL in different areas of research, including drug discovery. ...
... A computationally efficient optimization is performed using a surrogate model of the MSKM, with the ligament stiffness, reference strain and attachment positions as inputs, and the TF-kinematics and ligament strains as outputs. As the surrogate modeling technique, an artificial neural network (ANN) is used, implemented using TensorFlow 2.4.1 [25]. For further details, we refer to the study of Bartsoen et al. [26]. ...
... example, can the problem be learned best with a multilayer perceptron (MLP) or a long short-term memory network (LSTM)? After a type of topology has been selected, for example MLP, the precise shape of that topology still needs to be determined and justified, e.g. the number of layers and neurons needs to be specified as hyperparameters, as well as the activation functions. In a last step, the network's weights have to be chosen. ...
... Dataflow-based computational models were proposed to perform complex analytics on high-volume data sets: the timely dataflow [69,70] model targets batch processing, while its extension, differential dataflow [65], targets incremental processing. ...
... Our approach has been implemented as a self-contained software toolkit based on TensorFlow (Abadi et al. 2016) and scikit-learn (Pedregosa et al. 2011). It is implemented with a total of about 4K lines of Python code. ...
... According to this, if a sentence length is less than the maximum length, pre-padding is used, and if the sentence is longer than the maximum length, pruning is done at the beginning. For experiment purposes, a well-known Python library (Keras 2015) was used with TensorFlow (Abadi et al. 2016) as a backend, and the scikit-learn library (Pedregosa et al. 2011) was used for the machine learning models. We performed 5-fold cross-validation on the training dataset and evaluated the final model on the test datasets. ...
... However, this extra buffering also adds to the overall latency [26]. Interoperability becomes an issue since typical data logging systems running on general-purpose operating systems (OS) suffer further non-determinism in OS-related aspects such as task scheduling, context switching, communication protocols and buffering, etc. [27,28]. ...
... out-of-order processing. Hydra could be built on top of Apache Flink. Stream Processing Frameworks: This line of research focuses on the architecture of stream processing systems, answering questions about out-of-order data management, fault tolerance, high availability, load management, elasticity, etc. [5,14,15,21,23,27,35,60,66,76]. Fragkoulis et al. analyze the state of the art of stream processing engines [48]. ...
... In fact, we further explore the semantics of AssetRank results, which are used to calculate the importance degrees of services in cloud services. Moreover, many efforts (e.g., [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30]) aimed at discovering the dependency relationships of services have been developed. Specifically, Sujoy Basu et al. [17] used automatic identification of dependency traces of messages to discover the dependency relationships of Web services and are able to detect the dynamics of those services' relationships. ...