Amin Vahdat

Google Inc. | Google

About

322
Publications
61,638
Reads
38,371
Citations


Publications (322)
Conference Paper
We present the large-scale, production deployment of reconfigurable Lightwave Fabrics (LWF) for Machine Learning (ML) supercomputers. These fabrics consist of a custom developed optical circuit switch (OCS), circulators, and WDM transceiver technologies. The use of a LWF dramatically enhances the current generation 4096 tensor processing unit (TPU)...
Conference Paper
In this paper, we describe Apollo, to the best of our knowledge, the world’s first large-scale production deployment of optical circuit switches (OCSes) for datacenter networking. We review the underlying hardware technologies including the design of our internally developed OCS and WDM transceivers.
Preprint
In this paper, we describe Apollo, to the best of our knowledge, the world's first large-scale production deployment of optical circuit switches (OCSes) for datacenter networking. We will first describe the infrastructure challenges and use cases that motivated optical switching inside datacenters. We then delve into the requirements of OCSes for d...
Article
We in Google's various networking teams would like to increase our collaborations with academic researchers related to data-driven networking research. There are some significant constraints on our ability to directly share data, which are not always widely understood in the academic community; this document provides a brief summary. We describe so...
Preprint
To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigu...
Article
There is now a significant and growing functional gap between the public Internet, whose basic architecture has remained unchanged for several decades, and a new generation of more sophisticated private networks. To address this increasing divergence of functionality and overcome the Internet's architectural stagnation, we argue for the creation of...
Conference Paper
Data center networks evolve as they serve customer traffic. When applying network changes, operators risk impacting customer traffic because the network operates at reduced capacity and is more vulnerable to failures and traffic variations. The impact on customer traffic ultimately translates to operator cost (e.g., refunds to customers). However,...
Conference Paper
This paper presents our design and experience with a microkernel-inspired approach to host networking called Snap. Snap is a userspace networking system that supports Google's rapidly evolving needs with flexible modules that implement a range of network functions, including edge packet switching, virtualization for our cloud platform, traffic shap...
Conference Paper
Network virtualization stacks are the linchpins of public clouds. A key goal is to provide performance isolation so that workloads on one Virtual Machine (VM) do not adversely impact the network experience of another VM. Using data from a major public cloud provider, we systematically characterize how performance isolation can break in current virt...
Article
The cloud and telecommunications industry is in the midst of a transition towards the edge. There is a tremendous opportunity for the research community to influence this transformation, but doing so requires understanding industry momentum, and making a concerted effort to align with that momentum. We believe there are three keys to doing this: (1...
Preprint
Packet scheduling determines the ordering of packets in a queuing data structure with respect to some ranking function that is mandated by a scheduling policy. It is the core component in many recent innovations to optimize network performance and utilization. Our focus in this paper is on the design and deployment of packet scheduling in software....
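As a rough illustration of the definition in the abstract above, the sketch below orders packets in a binary heap by a pluggable ranking function. It is not the design from the paper: the `RankedQueue` class, the `deadline` field, and the earliest-deadline-first policy are all hypothetical names chosen for the example.

```python
import heapq
import itertools


class RankedQueue:
    """Queue that releases packets in increasing order of rank_fn(packet)."""

    def __init__(self, rank_fn):
        self._rank_fn = rank_fn          # the policy's ranking function (assumed)
        self._heap = []
        self._seq = itertools.count()    # tie-breaker: FIFO among equal ranks

    def enqueue(self, packet):
        entry = (self._rank_fn(packet), next(self._seq), packet)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        # Pop the entry with the smallest rank and return only the packet.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Hypothetical policy: earliest deadline first.
queue = RankedQueue(rank_fn=lambda pkt: pkt["deadline"])
for pkt in [
    {"id": "a", "deadline": 30},
    {"id": "b", "deadline": 10},
    {"id": "c", "deadline": 20},
]:
    queue.enqueue(pkt)

order = [queue.dequeue()["id"] for _ in range(len(queue))]
```

Swapping the ranking function changes the policy without touching the queue, which is the separation the abstract's wording suggests.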
Conference Paper
Private WANs are increasingly important to the operation of enterprises, telecoms, and cloud providers. For example, B4, Google's private software-defined WAN, is larger and growing faster than our connectivity to the public Internet. In this paper, we present the five-year evolution of B4. We describe the techniques we employed to incrementally mo...
Conference Paper
We present Sincronia, a near-optimal network design for coflows that can be implemented on top of any transport layer (for flows) that supports priority scheduling. Sincronia achieves this using a key technical result: we show that given a "right" ordering of coflows, any per-flow rate allocation mechanism achieves average coflow completion time...
Article
Many algorithms proposed in networking research papers are widely used in many areas, including Congestion Control, Routing, Traffic Engineering, and Load Balancing. In this paper, we present algorithmic advancements that have impacted the practice of Congestion Control (CC) in datacenters and the Internet. Where possible, we also describe negative...
Conference Paper
Traffic shaping, including pacing and rate limiting, is fundamental to the correct and efficient operation of both datacenter and wide area networks. Sample use cases include policy-based bandwidth allocation to flow aggregates, rate-based congestion control algorithms, and packet pacing to avoid bursty transmissions that can overwhelm router buffe...
Conference Paper
We present the design of Espresso, Google's SDN-based Internet peering edge routing infrastructure. This architecture grew out of a need to exponentially scale the Internet edge cost-effectively and to enable application-aware routing at Internet-peering scale. Espresso utilizes commodity switches and host-based routing/packet processing to impleme...
Conference Paper
In this presentation, we will review the evolution of Google's intra-datacenter interconnects and networking over the past decade, then outline future technology directions which, along with a more holistic design approach, will be needed to keep pace with the requirements and growth of the datacenter.
Article
We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale n...
Conference Paper
As data centers grow larger and strive to provide tight performance and availability SLAs, their monitoring infrastructure must move from passive systems that provide aggregated inputs to human operators, to active systems that enable programmed control. In this paper, we propose Trumpet, an event monitoring system that leverages CPU resources and...
Conference Paper
Maintaining the highest levels of availability for content providers is challenging in the face of scale, network evolution and complexity. Little, however, is known about failures large content providers are susceptible to, and what mechanisms they employ to ensure high availability. From a detailed analysis of over 100 high-impact failure events...
Conference Paper
Software-defined networks can enable a variety of concurrent, dynamically instantiated, measurement tasks, that provide fine-grain visibility into network traffic. Recently, there have been many proposals for using sketches for network measurement. However, sketches in hardware switches use constrained resources such as SRAM memory, and the accurac...
Article
Represented as graphs, real networks are intricate combinations of order and disorder. Fixing some of the structural properties of network models to their values observed in real networks, many other properties appear as statistical consequences of these fixed observables, plus randomness in other respects. Here we employ the dk-series, a complete...
Data
Supplementary Figures 1-10, Supplementary Tables 1-5, Supplementary Notes 1-3, Supplementary Discussion, Supplementary Methods and Supplementary References
Conference Paper
The drive towards richer and more interactive web content places increasingly stringent requirements on datacenter network performance. Applications running atop these networks typically partition an incoming query into multiple subqueries, and generate the final result by aggregating the responses for these subqueries. As a result, a large fractio...
Conference Paper
Cloud computing providers have recently begun to offer high-performance virtualized flash storage and virtualized network I/O capabilities, which have the potential to increase application performance. Since users pay for only the resources they use, these new resources have the potential to lower overall cost. Yet achieving low cost requires choos...
Article
We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale...
Article
Datacenter transports aim to deliver low latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that...
Article
The design space for large, multipath datacenter networks is large and complex, and no one design fits all purposes. Network architects must trade off many criteria to design cost-effective, reliable, and maintainable networks, and typically cannot explore much of the design space. We present Condor, our approach to enabling a rapid, efficient desi...
Article
WAN bandwidth remains a constrained resource that is economically infeasible to substantially overprovision. Hence, it is important to allocate capacity according to service priority and based on the incremental value of additional allocation. For example, it may be the highest priority for one service to receive 10Gb/s of bandwidth but upon reachi...
Conference Paper
WAN bandwidth remains a constrained resource that is economically infeasible to substantially overprovision. Hence, it is important to allocate capacity according to service priority and based on the incremental value of additional allocation. For example, it may be the highest priority for one service to receive 10Gb/s of bandwidth but upon reachi...
Article
Emerging cloud-based network services must deliver both good performance and high availability. Achieving both of these goals requires content replication across multiple sites. Many cloud-based services either require or would benefit from the semantics and simplicity of strong consistency. However, replication techniques for strong consistency ca...
Article
Represented as graphs, real networks are intricate combinations of order and disorder. Fixing some of the structural properties of network models to their values observed in real networks, many other properties appear as statistical consequences of these fixed observables, plus randomness in other respects. Here we employ the $dk$-series, a complet...
Patent
A system, apparatus, and method for link layer address resolution of overlapping network addresses is disclosed. In one aspect, the method performed on a first device includes receiving a first address resolution request from a second device, the first address resolution request having a sender network address and a target network address, wherein...
Article
Today's network control and management traffic are limited by their reliance on existing data networks. Fate sharing in this context is highly undesirable, since control traffic has very different availability and traffic delivery requirements. In this paper, we explore the feasibility of building a dedicated wireless facilities network for data ce...
Article
Software-defined networks can enable a variety of concurrent, dynamically instantiated, measurement tasks, that provide fine-grain visibility into network traffic. Recently, there have been many proposals to configure TCAM counters in hardware switches to monitor traffic. However, the TCAM memory at switches is fundamentally limited and the accurac...
Article
Predictably sharing the network is critical to achieving high utilization in the datacenter. Past work has focussed on providing bandwidth to endpoints, but often we want to allocate resources among multi-node services. In this paper, we present Parley, which provides service-centric minimum bandwidth guarantees, which can be composed hierarchicall...
Article
Data Center topologies employ multiple paths among servers to deliver scalable, cost-effective network capacity. The simplest and the most widely deployed approach for load balancing among these paths, Equal Cost Multipath (ECMP), hashes flows among the shortest paths toward a destination. ECMP leverages uniform hashing of balanced flow sizes to ac...
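The hashing step that the abstract above attributes to ECMP can be sketched as follows. This is a minimal illustration, not a switch implementation: the function name `ecmp_next_hop`, the use of SHA-256, and the example addresses and path names are assumptions made for the example (real switches typically use simpler hardware hash functions).

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick one equal-cost next hop by hashing the flow's 5-tuple.

    Every packet of a flow carries the same 5-tuple, so the whole flow
    maps to the same path and is not reordered across paths.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.0.0.1", "10.0.1.7", 6, 43512, 443)
paths = ["spine-1", "spine-2", "spine-3", "spine-4"]

chosen = ecmp_next_hop(flow, paths)
```

Because the choice depends only on the flow identifier and not on flow size or path load, a few large flows can land on the same path, which is the imbalance the abstract goes on to discuss.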
Patent
Systems and methods for optimizing port usage in an optical circuit switch are disclosed herein. A plurality of optical circulators can be coupled to the plurality of input and output ports of an optical circuit switch. An optical circulator coupled to an input port and an optical circulator coupled to an output port can form a bidirectional pair c...
Conference Paper
The shared nature of multi-tenant cloud networks requires providing tenant isolation and quality of service, which in turn requires enforcing thousands of network-level rules, policies, and traffic rate limits. Enforcing these rules in virtual machine hypervisors imposes significant computational overhead, as well as increased latency. In FasTrak,...
Conference Paper
Fault recovery is a key issue in modern data centers. In a fat tree topology, a single link failure can disconnect a set of end hosts from the rest of the network until updated routing information is disseminated to every switch in the topology. The time for re-convergence can be substantial, leaving hosts disconnected for long periods of time and...
Article
OpenFlow is a vendor-agnostic API for controlling hardware and software switches. In its current form, OpenFlow is specific to particular protocols, making it hard to add new protocol headers. It is also tied to a specific processing paradigm. In this paper we make a strawman proposal for how OpenFlow should evolve in the future, starting with the...
Conference Paper
Solving “Big Data” problems requires bridging massive quantities of compute, memory, and storage, which requires a very high bandwidth network. Recently proposed direct connect networks like HyperX [1] and Flattened Butterfly [20] offer large capacity through paths of varying lengths between servers, and are highly cost effective for common data ce...
Conference Paper
Recent proposals have employed optical circuit switching (OCS) to reduce the cost of data center networks. However, the relatively slow switching times (10-100 ms) assumed by these approaches, and the accompanying latencies of their control planes, have limited their use to only the largest data center networks with highly aggregated and constrained...
Conference Paper
We present the design, implementation, and evaluation of B4, a private WAN connecting Google's data centers across the planet. B4 has a number of unique characteristics: i) massive bandwidth requirements deployed to a modest number of sites, ii) elastic traffic demand that seeks to maximize average bandwidth, and iii) full control over the edge ser...
Article
We experimentally evaluate the network-level switching time of a functional 23-host prototype hybrid optical circuit-switched/electrical packet-switched network for datacenters called Mordia (Microsecond Optical Research Datacenter Interconnect Architecture). This hybrid network uses a standard electrical packet switch and an optical circuit-switch...
Technical Report
A basic problem in distributed computing has to do with assigning unique labels — that is, names or addresses — to network elements. Some approaches to solving this problem include using static assignment (e.g., MAC addresses), or using a centralized authority (e.g., DHCP). In this paper, we present an approach that is suitable for dynamic environm...
Article
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100TB of input data spread across 832 disks in 52 nodes at a rate of 0.938TB/min. When evaluated against the annual Indy GraySort sorting benchmark, TritonSort is 66% better in absolute perfo...
Conference Paper
We discuss optical/electrical hybrid switching for datacenters. Our current prototype uses an optical circuit switched architecture based on a wavelength-selective switch (WSS) that has a measured mean host-to-host network reconfiguration time of 11.5 μs.
Conference Paper
We built and evaluated a hybrid electrical-packet/optical-circuit network for datacenters with a 10 μs optical circuit switch built from wavelength-selective switches based on binary MEMS. This network has the potential to support large-scale, dynamic datacenter workloads.
Conference Paper
"Big Data" computing increasingly utilizes the MapReduce programming model for scalable processing of large data collections. Many MapReduce jobs are I/O-bound, and so minimizing the number of I/O operations is critical to improving their performance. In this work, we present Themis, a MapReduce implementation that reads and writes data records to...
Conference Paper
This paper presents the design and implementation of an incrementally scalable architecture for middleboxes based on commodity servers and operating systems. xOMB, the eXtensible Open MiddleBox, employs general programmable network processing pipelines, with user-defined C++ modules responsible for parsing, transforming, and forwarding network flow...
Conference Paper
Recently, there have been proposals for constructing hybrid data center networks combining electronic packet switching with either wireless or optical circuit switching, which are ideally suited for supporting bulk traffic. Previous work has relied on a technique called hotspot scheduling, in which the traffic matrix is measured, hotspots identifie...
Conference Paper
Engineering large-scale data center applications built from thousands of commodity nodes requires both an underlying network that supports a wide variety of traffic demands, and low latency at microsecond timescales. Many ideas for adding innovative functionality to networks, especially active queue management strategies, require either modifying p...
Conference Paper
In data center applications, predictability in service time and controlled latency, especially tail latency, are essential for building performant applications. This is especially true for applications or services built by accessing data across thousands of servers to generate a user response. Current practice has been to run such services at low u...
Article
Modern data centers are massive, and support a range of distributed applications across potentially hundreds of server racks. As their utilization and bandwidth needs continue to grow, traditional methods of augmenting bandwidth have proven complex and costly in time and resources. Recent measurements show that data center traffic is often limited...
Article
We designed and constructed a 24x24-port optical circuit switch (OCS) prototype with a programming time of 68.5 μs, a switching time of 2.8 μs, and a receiver electronics initialization time of 8.7 μs [1]. We demonstrate the operation of this prototype switch in a data center testbed under various workloads.
Article
This talk highlights the symbiotic relationship between data management and networking through a study of two seemingly independent trends in the traditionally separate communities: large-scale data processing and software defined networking. First, data processing at scale increasingly runs across hundreds or thousands of servers. We show that bal...
Article
Application performance in cloud data centers often depends crucially on network bandwidth, not just the aggregate data transmitted as in typical SLAs. We describe a mechanism for data center networks called NetShare that requires no hardware changes to routers but allows bandwidth to be allocated predictably across services based on weights. The w...
Conference Paper
Traditional measures of network goodness (goodput, quality of service, fairness) are expressed in terms of bandwidth. Network latency has rarely been a primary concern because delivering the highest level of bandwidth essentially entails driving up latency, at the mean and, especially, at the tail. Recently, however, there has been renewed interest...
Article
Cloud computing is placing increasingly stringent demands on datacenter networks. Applications like MapReduce and Hadoop demand high bisection bandwidth to support their all-to-all shuffle communication phases. Conversely, Web services often rely on deep chains of relatively lightweight RPCs. While HPC vendors market niche hardware solutions, curre...
Conference Paper
Storage for cluster applications is typically provisioned based on rough, qualitative characterizations of applications. Moreover, configurations are often selected based on rules of thumb and are usually homogeneous across a deployment; to handle increased load, the application is simply scaled out across additional machines and storage of the sam...
Article
Support for distributed application management in large-scale networked environments remains in its early stages. Although a number of solutions exist for subtasks of application deployment, monitoring, and maintenance in distributed environments, few tools provide a unified framework for application management. Many of the existing tools address...
Article
The vast majority of computation, storage, and communication will take place in data centers in the years ahead. Building balanced systems today requires Pb/sec networks; Exabit/sec networks are not far behind. There is a tremendous opportunity for optics to define what it means to perform communication in the data center.
Article
Modern data centers can consist of hundreds of thousands of servers and millions of virtualized end hosts. Managing address assignment while simultaneously enabling scalable communication is a challenge in such an environment. We present ALIAS, an addressing and communication protocol that automates topology discovery and address assignment for the...
Conference Paper
Recent proposals to build hybrid electrical (packet-switched) and optical (circuit switched) data center interconnects promise to reduce the cost, complexity, and energy requirements of very large data center networks. Supporting realistic traffic patterns, however, exposes a number of unexpected and difficult challenges to actually deploying these...