Thomas E. Anderson’s research while affiliated with University of Washington and other places


Publications (194)


EMPower: The Case for a Cloud Power Control Plane
  • Article

April 2025 · ACM SIGEnergy Energy Informatics Review

Jonggyu Park · Theano Stavrinos · Simon Peter · Thomas Anderson
Escalating application demand and the end of Dennard scaling have put energy management at the center of cloud operations. Because of the huge cost and long lead time of provisioning new data centers, operators want to squeeze as much use out of existing data centers as possible, often limited by power provisioning fixed at the time of construction. Workload demand spikes and the inherent variability of renewable energy, as well as increased power unreliability from extreme weather events and natural disasters, make the data center power management problem even more challenging. We believe it is time to build a power control plane to provide fine-grained observability and control over data center power to operators. Our goal is to help make data centers substantially more elastic with respect to dynamic changes in energy sources and application needs, while still providing good performance to applications. There are many use cases for cloud power control, including increased power oversubscription and use of green energy, resilience to power failures, large-scale power demand response, and improved energy efficiency.
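One primitive such a power control plane would need is enforcing a fixed facility power budget across servers. The sketch below is purely illustrative (the function name, policy, and numbers are hypothetical, not EMPower's interface): it caps per-server power proportionally when aggregate demand exceeds the provisioned budget.

```python
# Toy illustration of one power-control-plane primitive: dividing a fixed
# facility power budget among servers when total demand exceeds it.
# Names, policy, and numbers are hypothetical, not EMPower's actual design.

def allocate_power(demands_w, budget_w):
    """Cap each server's power proportionally when aggregate demand exceeds budget."""
    total = sum(demands_w)
    if total <= budget_w:
        return list(demands_w)          # no capping needed
    scale = budget_w / total
    return [d * scale for d in demands_w]

caps = allocate_power([300, 500, 200], budget_w=800)
# total demand 1000 W > 800 W budget, so each server is scaled by 0.8
print(caps)  # caps ≈ [240, 400, 160]
```

A real control plane would of course apply such caps through hardware mechanisms (e.g., RAPL-style power limits) and react dynamically to demand and supply changes.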


Figure 1: m4 mimics the computational structure of flowSim but replaces its components with learnable modules.
Figure 2: m4's workflow: Inputs (yellow boxes), outputs (red boxes), intermediate components (white boxes).
Figure 3: m4 adds "dense" supervision during training by querying intermediate network states for "remaining size" and "queue length". Dashed boxes represent subsequent simulations triggered by new flow-level events.
Figure 4: m4 converts (a) a network snapshot in time to a (b) bipartite graph and uses GNN to capture spatial dynamics.
Figure 5: m4's implementation
m4: A Learned Flow-level Network Simulator
  • Preprint
  • File available

March 2025

Anton A. Zabreyko · [...] · Thomas Anderson

Flow-level simulation is widely used to model large-scale data center networks due to its scalability. Unlike packet-level simulators that model individual packets, flow-level simulators abstract traffic as continuous flows with dynamically assigned transmission rates. While this abstraction enables orders-of-magnitude speedups, it sacrifices accuracy by omitting packet-level effects such as queuing, congestion control, and retransmissions. We present m4, an accurate and scalable flow-level simulator that uses machine learning to learn the dynamics of the network of interest. At the core of m4 lies a novel ML architecture that decomposes state transition computations into distinct spatial and temporal components, each represented by a suitable neural network. To efficiently learn the underlying flow-level dynamics, m4 adds dense supervision signals by predicting intermediate network metrics such as remaining flow size and queue length during training. m4 achieves a speedup of up to 10⁴× over packet-level simulation. Relative to traditional flow-level simulation, m4 reduces per-flow estimation errors by 45.3% (mean) and 53.0% (p90). For closed-loop applications, m4 accurately predicts network throughput under various congestion control schemes and workloads.
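The spatial/temporal decomposition described above (and in Figures 1 and 4) can be sketched with hand-written stand-ins: flows and links form a bipartite graph, a "spatial" pass aggregates state across that graph, and a "temporal" step advances each flow. In m4 these steps are learned neural modules (a GNN and a recurrent model); the fixed functions below are only an illustrative analogue, not m4's implementation.

```python
# Hand-written analogue of the structure m4 learns: a spatial pass over the
# flow-link bipartite graph followed by a temporal per-flow update. In m4
# both steps are neural networks; here they are fixed functions for clarity.

# flows: flow id -> list of link ids it traverses; links: link id -> capacity
flows = {"f1": ["l1"], "f2": ["l1", "l2"], "f3": ["l2"]}
links = {"l1": 10.0, "l2": 4.0}

def spatial_pass(flows, links):
    """One message-passing round: each link computes an equal share over its
    flows, and each flow takes the tightest share across its path."""
    link_share = {l: cap / sum(l in path for path in flows.values())
                  for l, cap in links.items()}
    return {f: min(link_share[l] for l in path) for f, path in flows.items()}

def temporal_step(remaining, rates, dt):
    """Advance each flow's remaining size by its current rate over dt."""
    return {f: max(0.0, remaining[f] - rates[f] * dt) for f in remaining}

rates = spatial_pass(flows, links)   # {'f1': 5.0, 'f2': 2.0, 'f3': 2.0}
left = temporal_step({"f1": 20.0, "f2": 20.0, "f3": 20.0}, rates, dt=2.0)
```

Replacing `spatial_pass` with a GNN and `temporal_step` with a recurrent module, and supervising the intermediate states (remaining size, queue length), gives the "dense supervision" structure shown in Figure 3.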




Figure 2. Possible paths to invoke an accelerator.
Figure 3. Representative results of case studies in Table 1.
Parameter table for accurate traffic shaping.
Acceleration chances on accelerators.
Arcus: SLO Management for Accelerators in the Cloud with Traffic Shaping

October 2024


Cloud servers use accelerators for common tasks (e.g., encryption, compression, hashing) to improve CPU/GPU efficiency and overall performance. However, users' Service-Level Objectives (SLOs) can be violated by accelerator-related contention. The root cause is that existing solutions for accelerators focus only on isolation or fair allocation of compute and memory resources; they overlook contention for communication-related resources. Specifically, three communication-induced challenges drive us to rethink the problem: (1) accelerator traffic patterns are diverse, hard to predict, and mixed across users; (2) communication-related components lack effective low-level isolation mechanisms to configure; and (3) the computational heterogeneity of accelerators leads to unique relationships between the traffic mixture and the corresponding accelerator performance. The focus of this work is meeting SLOs in accelerator-rich systems. We present Arcus, which treats accelerator SLO management as traffic management with proactive traffic shaping. We develop an SLO-aware protocol coupled with an offloaded interface on an architecture that supports precise and scalable traffic shaping. Arcus guarantees accelerator SLOs under various circumstances, with up to 45% tail-latency reduction and less than 1% throughput variance.
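Proactive traffic shaping of the kind Arcus builds on is classically done with a token bucket, which bounds both sustained rate and burst size. The sketch below is a generic shaper for illustration only; the paper's actual SLO-aware protocol and offloaded hardware interface are not reproduced here.

```python
# Generic token-bucket traffic shaper (illustrative only; not Arcus's
# SLO-aware protocol). Requests are admitted only when enough tokens
# have accrued, bounding sustained rate and burstiness.

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst   # tokens/sec, max bucket size
        self.tokens, self.t = burst, 0.0      # start full, at time 0

    def admit(self, size, now):
        """Admit a request costing `size` tokens at time `now` if the
        shaped rate allows; otherwise reject (caller may queue/retry)."""
        self.tokens = min(self.burst, self.tokens + (now - self.t) * self.rate)
        self.t = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

tb = TokenBucket(rate=100.0, burst=50.0)
print(tb.admit(40, now=0.0))   # True: burst tokens are available
print(tb.admit(40, now=0.1))   # False: only 20 tokens have accrued
print(tb.admit(40, now=0.5))   # True: bucket refilled to its cap
```

Shaping each tenant's accelerator-bound traffic this way avoids arbitrating contention reactively inside the datapath, which is the intuition behind treating SLO management as traffic management.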


The Case of Unsustainable CPU Affinity

September 2024


ACM SIGEnergy Energy Informatics Review

CPU affinity reduces data copies and improves data locality, and has become a prevalent technique for high-performance programs in datacenters. This paper explores the tension between CPU affinity and sustainability. In particular, affinity settings can lead to significantly uneven aging of cores on a CPU. We observe that infrastructure threads, used in a wide spectrum of network, storage, and virtualization sub-systems, exercise their affinitized cores up to 23× more than typical μs-scale application threads. In addition, we observe that affinitized infrastructure threads generate regional heat hot spots and prevent CPUs from reaching their expected lifetime. Finally, we discuss design options to tackle the unbalanced core-aging problem, improve the overall sustainability of CPUs, and call for more attention to sustainability-aware affinity and mitigation of such problems.
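A toy model makes the core-aging imbalance concrete: a pinned infrastructure thread concentrates all wear on one core, while periodically rotating the affinity (one direction the paper's design discussion points toward) spreads it evenly. Wear units and the rotation policy below are made up purely for illustration.

```python
# Toy model of affinity-induced core aging: compare a thread pinned to one
# core against a simple round-robin affinity rotation. Wear accounting here
# is illustrative only, not the paper's measurement methodology.

def simulate(cores, hours, rotate):
    wear = [0.0] * cores
    for h in range(hours):
        target = (h % cores) if rotate else 0  # affinitized core this hour
        wear[target] += 1.0                    # 1 wear unit per busy hour
    return wear

pinned  = simulate(cores=4, hours=1000, rotate=False)
rotated = simulate(cores=4, hours=1000, rotate=True)
print(max(pinned), max(rotated))   # peak wear: 1000.0 pinned vs 250.0 rotated
```

Even this crude model shows a 4× reduction in peak per-core wear on a 4-core CPU; real mitigations must also weigh the locality benefits that motivated pinning in the first place.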


An Agile Pathway Towards Carbon-Aware Clouds

September 2024


ACM SIGEnergy Energy Informatics Review

Climate change is a pressing threat to planetary well-being that can be addressed only by rapid near-term actions across all sectors. Yet, the cloud computing sector, with its increasingly large carbon footprint, has initiated only modest efforts to reduce emissions to date; its main approach today relies on cloud providers sourcing renewable energy from a limited global pool of options. We investigate how to accelerate cloud computing's efforts. Our approach tackles carbon reduction from a software standpoint by gradually integrating carbon awareness into the cloud abstraction. Specifically, we identify key bottlenecks to software-driven cloud carbon reduction, including (1) the lack of visibility and disaggregated control between cloud providers and users over infrastructure and applications, (2) the immense overhead presently incurred by application developers to implement carbon-aware application optimizations, and (3) the increasing complexity of carbon-aware resource management due to renewable energy variability and growing hardware heterogeneity. To overcome these barriers, we propose an agile approach that federates the responsibility and tools to achieve carbon awareness across different cloud stakeholders. As a key first step, we advocate leveraging the role of application operators in managing large-scale cloud deployments and integrating carbon efficiency metrics into their cloud usage workflow. We discuss various techniques to help operators reduce carbon emissions, such as carbon budgets, service-level visibility into emissions, and configurable-yet-centralized resource management optimizations.
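One of the operator-facing techniques mentioned above, carbon budgets combined with carbon-intensity-aware placement, can be sketched in a few lines. The region names, intensity values (gCO2/kWh), and function are hypothetical, chosen only to illustrate the idea.

```python
# Minimal sketch of carbon-budget-aware job placement: pick the region with
# the lowest grid carbon intensity and check the job against a carbon budget.
# Region names, intensities (gCO2/kWh), and the API are hypothetical.

def place_job(energy_kwh, regions, budget_g):
    """Pick the lowest-carbon region; return None if even it exceeds the budget."""
    region, intensity = min(regions.items(), key=lambda kv: kv[1])
    emissions = energy_kwh * intensity        # gCO2 = kWh x (gCO2/kWh)
    return (region, emissions) if emissions <= budget_g else (None, emissions)

regions = {"us-east": 450.0, "eu-north": 30.0, "ap-south": 700.0}
print(place_job(12.0, regions, budget_g=500.0))  # ('eu-north', 360.0)
```

A production version would also need real-time intensity feeds, embodied-carbon accounting, and latency/cost constraints, which is precisely the complexity the paper argues should be federated across cloud stakeholders rather than pushed onto each application developer.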



Figure 1: New services empowered by our proposal.
Figure 4: A system w/ accelerator management stack.
Figure 5: End-to-end scenarios. (a) A typical heterogeneous server with wild I/O contention, (b) function call mode, (c) inline mode, and (d) a complex use case.
Case studies: unpredictable accelerator throughput allocation ratios.
Host-FPGA contention (both tenants send DMA reads). We vary MaxReadReqSize to emulate PCIe MTU size variations.
Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

July 2024


I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA, and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantees, and (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we show that the fundamental difficulty of democratizing accelerators is insufficient performance-isolation support. The key obstacles to enforcing accelerator isolation are (1) too many unknown traffic patterns in public clouds and (2) too many possible contention sources in the datapath. In this work, instead of scheduling such complex traffic on the fly and augmenting isolation support on each system component, we propose to model traffic as network flows and proactively reshape the traffic to avoid unpredictable contention. We discuss the implications of our findings for the design of future I/O management stacks and device interfaces.
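Once accelerator traffic is modeled as network flows, a classical building block for deciding the target shape is max-min fair allocation over the shared resource. The water-filling routine below is a generic textbook algorithm, shown only to make the flow-modeling idea concrete; it is not the paper's design.

```python
# Generic max-min fair (water-filling) allocation over one shared resource,
# e.g., intra-host PCIe bandwidth contended by multiple tenants' flows.
# Illustrative textbook algorithm, not the proposal's actual mechanism.

def max_min_fair(capacity, demands):
    """Satisfy small demands fully; split leftover capacity evenly among the rest."""
    alloc = {}
    remaining = dict(demands)
    while remaining and capacity > 0:
        share = capacity / len(remaining)
        # tenants demanding no more than the equal share are fully satisfied
        sated = {t: d for t, d in remaining.items() if d <= share}
        if not sated:
            for t in remaining:
                alloc[t] = share
            return alloc
        for t, d in sated.items():
            alloc[t] = d
            capacity -= d
            del remaining[t]
    for t in remaining:
        alloc[t] = 0.0
    return alloc

print(max_min_fair(10.0, {"A": 2.0, "B": 6.0, "C": 6.0}))
# A's small demand is met; B and C split the remaining 8 units evenly
```

With such target rates in hand, the traffic can be reshaped ahead of time (e.g., by per-tenant shapers) rather than arbitrated reactively at each contention point in the datapath.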



Citations (72)


... Even as network hardware scales to terabit speeds to meet these demands, software-based TCP stacks struggle to keep pace, leading to inefficiencies in bandwidth utilization, energy consumption, and processing overhead. This efficiency challenge is further amplified by the growing adoption of compute-constrained SmartNICs for orchestrating data movement across high-bandwidth accelerators and disaggregated resource pools [53,60,67,84]. Even with state-of-the-art kernel-bypass TCP stacks, applications spend up to 48% of the per-packet CPU cycles in the stack [83], while TCP stacks designed for programmable network processing units (NPUs) [83] and FPGAs [4,53] present limitations in scalability, complexity, and energy efficiency. ...

Reference:

Scaling Data Center TCP to Terabits with Laminar
Beehive: A Flexible Network Stack for Direct-Attached Accelerators
  • Citing Conference Paper
  • November 2024

... Packet-level simulators [35,63,71] are popular in networking research, but face significant scalability challenges for modeling large-scale data center networks. Recent work improves the scalability of packet-level simulation using machine learning [43,74,75], approximation techniques [76], and better parallelization [27]. However, these approaches have some key limitations. ...

m3: Accurate Flow-Level Performance Estimation using Machine Learning
  • Citing Conference Paper
  • August 2024

... Apart from GPUs, another major power consumer in data centers is data storage [56], with power usage estimates ranging from 10% [57,58] to as much as 25-30% [59]. In order to reduce the power utilization of storage, it is important to understand the power contribution of individual hardware components, such as individual SSDs. ...

Can Storage Devices be Power Adaptive?
  • Citing Conference Paper
  • July 2024

... Energy and carbon efficiency are distinct, shaped by diverse energy sources like solar, wind, and gas [4]- [7]. This research emphasizes carbon efficiency, prioritizing the carbon impact of energy use over mere consumption levels. ...

Treehouse: A Case For Carbon-Aware Datacenter Software
  • Citing Article
  • October 2023

ACM SIGEnergy Energy Informatics Review

... While the analyses of relative trends, proportions, and correlations are less affected, the absolute carbon values may be less accurate due to the lack of essential carbon metrics of cloud data centers (e.g., power grid carbon intensity and operational and embodied carbon of hardware). To address this challenge, public cloud providers can increase transparency by exposing more reliable carbon metrics, carbon proxies, and embodied carbon data [2,33,56,74]. At the same time, providing fine-grained, real-time carbon emission logs for cloud services can significantly enhance cloud carbon transparency, enabling developers to make informed decisions about workload rightsizing, shifting, and optimization based on carbon emissions. ...

An Agile Pathway Towards Carbon-aware Clouds
  • Citing Conference Paper
  • August 2023

... With HT enabled, two affinity options are considered: affinity scatter and affinity compact. Affinity restricts the execution of specific threads to a subset of available cores, whether physical or logical. Depending on the topology or arrangement of the cores, affinity is essential because it can have different effects on the execution speed of a program [24][25][26]. ...

The Case of Unsustainable CPU Affinity
  • Citing Conference Paper
  • August 2023

... In addition, ASGARD requires the trustworthiness of this reset interface itself to be carefully assessed for each given accelerator. Reset-based timesharing typically demands hardware support (i) to fully reset all dynamic states, including registers and memory as well as microarchitectural and physical ones [10], [13], and (ii) to attest to the freshness of the reset [90]. However, since our target is a legacy SoC, we can only modify the accelerator software (i.e., firmware) to add a function that wipes all of its software-writable registers and memory, and have the trusted TEEvisor assign the given accelerator (specified by d_id) to the calling enclave, exposing the accelerator's register space at the given guest-physical address (specified by d_gpa). ...

Minimizing a Smartphone's TCB for Security-Critical Programs with Exclusively-Used, Physically-Isolated, Statically-Partitioned Hardware
  • Citing Conference Paper
  • June 2023

... Consequently, innovating and deploying new protocols becomes significantly faster and more straightforward, reducing development time from years to mere weeks or months. Moreover, it bridges the gap between software simulations and hardware implementations, as designers can directly validate their ideas on actual hardware [28,32]. ...

A Vision for Runtime Programmable Networks
  • Citing Conference Paper
  • November 2021

... However, challenges persist. ML models incur computational overhead, require real-time inference capabilities, and the decisions made by ML processes, especially those outside the kernel context, need to be interpretable [9]. Despite these challenges, ML-based optimization techniques are being actively researched to streamline the models and enhance their efficiency for kernel-level decision-making. ...

Toward reconfigurable kernel datapaths with learned optimizations
  • Citing Conference Paper
  • June 2021

... Panics [19], [4] • Support an extendable, no_std unwind library • Stack unwinding in embedded environments
C Interop [22], [13] • Kernel interfaces, while designed for extensibility, are not designed for type safety • Hybrid code flow: the Rust compiler cannot track ownership when switching between modules written in C and Rust
(Table 4: Challenges Unique to the Rust Programming Language)
... following 5 recommendations when using Rust in a size-constrained environment: ...

An incremental path towards a safer OS kernel
  • Citing Conference Paper
  • June 2021