Philip Levis’s research while affiliated with Stanford University and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (185)


Future of Memory: Massive, Diverse, Tightly Integrated with Compute - from Device to Software
  • Conference Paper

December 2024 · 14 Reads

Shuhan Liu · Robert M. Radway · Xinxin Wang · [...]

Towards retina-quality VR video streaming: 15ms could save you 80% of your bandwidth

January 2022 · 16 Reads · 18 Citations

ACM SIGCOMM Computer Communication Review

Virtual reality systems today cannot yet stream immersive, retina-quality virtual reality video over a network. One of the greatest challenges to this goal is the sheer data rates required to transmit retina-quality video frames at high resolutions and frame rates. Recent work has leveraged the decay of visual acuity in human perception in novel gaze-contingent video compression techniques. In this paper, we show that reducing the motion-to-photon latency of a system itself is a key method for improving the compression ratio of gaze-contingent compression. Our key finding is that a client and streaming server system with sub-15ms latency can achieve 5x better compression than traditional techniques while also using simpler software algorithms than previous work.
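
As a rough illustration of the relationship the abstract describes, the sketch below computes a per-tile quality level from the angular distance to the gaze point, padding the full-quality foveal region by how far the eye could move during the motion-to-photon latency window. The acuity falloff, foveal radius, eye-speed bound, and tile grid are all illustrative assumptions, not the paper's actual parameters or algorithm.

```python
import math

def tile_quality(tile_center_deg, gaze_deg, latency_ms,
                 fovea_radius_deg=5.0, eye_speed_deg_per_ms=0.3):
    """Toy gaze-contingent quality map (not the paper's algorithm).

    The region encoded at full quality is the fovea plus a safety margin
    covering how far the gaze could move before the next frame arrives;
    lower motion-to-photon latency shrinks that margin.
    """
    # Hypothetical margin: worst-case gaze travel during the latency window.
    margin = eye_speed_deg_per_ms * latency_ms
    dist = math.dist(tile_center_deg, gaze_deg)  # angular distance in degrees
    if dist <= fovea_radius_deg + margin:
        return 1.0                               # full quality inside padded fovea
    # Simple linear acuity falloff outside the padded fovea (assumed model).
    return max(0.1, 1.0 - 0.02 * (dist - fovea_radius_deg - margin))

# Lower latency -> smaller padded fovea -> fewer full-quality tiles.
for latency in (50, 15):
    q = [tile_quality((x, 0.0), (0.0, 0.0), latency) for x in range(0, 60, 5)]
    print(latency, [round(v, 2) for v in q])
```

Running it with 50 ms versus 15 ms latency shows far more tiles falling outside the padded fovea at the lower latency, which is where gaze-contingent compression recovers bandwidth.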


GRIP: A Graph Neural Network Accelerator Architecture

January 2022 · 28 Reads · 91 Citations

IEEE Transactions on Computers

We present GRIP, a graph neural network accelerator architecture designed for low-latency inference. Accelerating GNNs is challenging because they combine two distinct types of computation: arithmetic-intensive vertex-centric operations and memory-intensive edge-centric operations. GRIP splits GNN inference into three edge- and vertex-centric execution phases that can be implemented in hardware. GRIP specializes each unit for the unique computational structure found in each phase. For vertex-centric phases, GRIP uses a high-performance matrix multiply engine coupled with a dedicated memory subsystem for weights to improve reuse. For edge-centric phases, GRIP uses multiple parallel prefetch and reduction engines to alleviate the irregularity in memory accesses. Finally, GRIP supports several GNN optimizations, including an optimization called vertex-tiling that increases the reuse of weight data. We evaluate GRIP by performing synthesis and place and route for a 28 nm implementation capable of executing inference for several widely-used GNN models (GCN, GraphSAGE, G-GCN, and GIN). Across several benchmark graphs, it reduces 99th percentile latency by a geometric mean of 17× and 23× compared to a CPU and GPU baseline, respectively, while drawing only 5 W.
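
The sketch below walks through one GCN-style layer organized the way the abstract describes: an edge-centric gather-and-reduce over the graph followed by a vertex-centric dense multiply, with a simple loop-blocking over vertex tiles as a nod to vertex-tiling. It is a NumPy illustration of the computational split, not GRIP's hardware interface or schedule.

```python
import numpy as np

def gcn_layer(features, edges, weight, tile=2):
    """Toy GCN-style layer split into edge- and vertex-centric phases.

    Edge-centric phase: irregular gathers along edges plus a sum reduction.
    Vertex-centric phase: dense multiply against the layer weights, processed
    in tiles of vertices so the weight matrix is reused across each tile
    (a loop-blocking sketch inspired by "vertex-tiling", not GRIP's schedule).
    """
    n = features.shape[0]
    # Edge-centric: aggregate each vertex's in-neighbors (gather + reduce).
    agg = np.zeros_like(features)
    for src, dst in edges:
        agg[dst] += features[src]
    # Vertex-centric: dense transform, blocked over tiles of vertices.
    out = np.zeros((n, weight.shape[1]))
    for r in range(0, n, tile):
        out[r:r + tile] = agg[r:r + tile] @ weight
    return np.maximum(out, 0.0)  # ReLU

# Tiny example: 4 vertices, 5 directed edges, 3 -> 2 feature transform.
feats = np.arange(12, dtype=float).reshape(4, 3)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
w = np.ones((3, 2))
print(gcn_layer(feats, edges, w))
```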


Citations (79)


... While Large Language Models (LLMs) exhibit remarkable capabilities applicable to automation [4], individual LLM interactions are often insufficient for reliably handling multi-step, real-world processes. Consequently, the paradigm of Compound AI Systems has emerged, integrating multiple LLM calls with external APIs and tools to address complex automation challenges [49,34,5]. Such systems are increasingly deployed across various industries to streamline operations and enhance decision-making. ...

Reference:

Flow State: Humans Enabling AI Systems to Program Themselves
ALTO: An Efficient Network Orchestrator for Compound AI Systems
  • Citing Conference Paper
  • April 2024
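
As a minimal illustration of the compound-AI pattern described in the excerpt above, the sketch below chains a tool call with a streaming LLM call and passes partial output downstream as it is produced; `call_llm` and `search_web` are hypothetical placeholders, not ALTO's API.

```python
from typing import Iterator

def call_llm(prompt: str) -> Iterator[str]:
    """Placeholder for an LLM call that streams output tokens."""
    for token in f"summary of: {prompt}".split():
        yield token

def search_web(query: str) -> str:
    """Placeholder for an external tool/API call."""
    return f"top results for '{query}'"

def pipeline(question: str) -> Iterator[str]:
    # Stage 1: tool call to fetch context.
    context = search_web(question)
    # Stage 2: LLM call consumes the tool output; tokens stream downstream
    # as they arrive instead of waiting for the full response, which is the
    # kind of inter-stage streaming an orchestrator can exploit.
    yield from call_llm(f"{question}\n{context}")

print(" ".join(pipeline("what is a compound AI system?")))
```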

... Furthermore, administrators need to leverage oversubscription, effectively running multiple processes or threads per core to maximize resource utilization [35]. As we more extensively discuss in §3.2, the problem is that most of the userspace networking stacks (full system solutions such as Shinjuku, or Demikernel [59,85]) follow the libOS model [58,60,80], effectively compiling the networking stack together with the application, which in turn takes full ownership of the NIC. This model is incompatible with the characteristics of cloud applications. ...

Cornflakes: Zero-Copy Serialization for Microsecond-Scale Networking
  • Citing Conference Paper
  • October 2023

... Natural convection finds useful application in smaller-scale home energy storage systems, as shown in Figure 2, prioritizing ease of use and energy efficiency [27]. Examples include the Tesla Powerwall [28], LG Chem RESU [29], Sonnen Eco [30], and Enphase Encharge [31]. These systems utilize lithium-ion or lithium-iron-phosphate battery technology to store excess energy from solar panels or the grid, offering homeowners greater energy independence, flexibility, and the ability to optimize energy usage. ...

Software defined grid energy storage
  • Citing Conference Paper
  • December 2022

... Moderate martial art exercises can enhance various physical qualities such as flexibility, coordination, and strength, improve immunity, and reduce the risk of illness [5]. However, excessive martial art exercise may cause physical dysfunction, and in severe cases, limb injuries may occur [6]. Therefore, it is necessary to develop a suitable martial art exercise intensity assessment method to recommend appropriate exercise and reduce the risk of injury for people. ...

GRIP: A Graph Neural Network Accelerator Architecture
  • Citing Article
  • January 2022

IEEE Transactions on Computers

... These tests will evaluate key performance metrics such as synchronization accuracy, latency, and the effectiveness of personalized content delivery in enhancing learning outcomes. From a technical standpoint, we aim to study approaches to optimize bandwidth consumption and reduce latency by exploring advanced techniques such as foveated video streaming, which achieves throughput reduction by prioritizing rendering quality in areas where the user is looking [27,28]. The user's viewpoint prediction can also be utilized to cache video data proactively and partially offload computing tasks to the edge server, to meet the demanding E2E latency [29]. ...

Towards retina-quality VR video streaming: 15ms could save you 80% of your bandwidth
  • Citing Article
  • January 2022

ACM SIGCOMM Computer Communication Review

... The ECO-system accounts for the energy consumed by each service during synchronous operation to obtain the best battery life for the device. The Cinder operating system [17][18] manages application energy efficiency using three control mechanisms: isolation, delegation, and subdivision. The prototype was tested on an Android G1. ...

Apprehending joule thieves with cinder
  • Citing Article
  • January 2010
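
To make the three mechanisms named in the excerpt above concrete, here is a toy model of an energy budget that can be subdivided into child budgets, delegated to tasks, and isolated so a task cannot spend beyond its own reserve. The `Reserve` class and its methods are illustrative only, not Cinder's actual kernel abstractions.

```python
class Reserve:
    """Toy energy reserve loosely modeled on the excerpt's three mechanisms.

    Not Cinder's real abstraction or API; a sketch of how isolation,
    delegation, and subdivision of an energy budget could compose.
    """
    def __init__(self, joules: float):
        self.joules = joules

    def subdivide(self, joules: float) -> "Reserve":
        """Subdivision: carve a child budget out of this reserve."""
        if joules > self.joules:
            raise ValueError("insufficient energy to subdivide")
        self.joules -= joules
        return Reserve(joules)

    def draw(self, joules: float) -> None:
        """Isolation: a task can only spend what its own reserve holds."""
        if joules > self.joules:
            raise RuntimeError("energy budget exhausted")
        self.joules -= joules

# Delegation: a background radio task is handed a sub-reserve of the app's budget.
app = Reserve(10.0)          # 10 J granted to the application
radio = app.subdivide(2.0)   # delegate 2 J to the radio task
radio.draw(1.5)              # allowed: within the delegated budget
print(app.joules, radio.joules)   # 8.0 0.5
```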

... Unfortunately, these data exchanges suffer from serialization overhead, where the in-memory data needs to be serialized into a contiguous representation for the data transport framework to be able to carry it over the wire [10] [16]. This is even the case for columnar in-memory formats such as Apache Arrow [4], which, although zero-copy within the same host, requires expensive serialization when transferring between hosts. ...

Breakfast of champions: towards zero-copy serialization with NIC scatter-gather
  • Citing Conference Paper
  • June 2021
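
The sketch below illustrates the scatter-gather alternative the excerpt above alludes to: instead of copying every field into one contiguous serialization buffer, the application builds small per-field headers and hands the original payload buffers to the kernel as a list via `socket.sendmsg` (POSIX). The length-prefixed wire format here is a made-up example, not the format used by Cornflakes or the cited paper.

```python
import socket
import struct

def send_scatter_gather(sock: socket.socket, fields: list[bytes]) -> int:
    """Send a toy length-prefixed record without coalescing the payload.

    Only the tiny per-field length headers are newly built; the original
    payload buffers are passed to the kernel as a scatter-gather list,
    avoiding an application-level copy into one contiguous blob.
    """
    buffers = []
    for field in fields:
        buffers.append(struct.pack("!I", len(field)))  # small header, copied
        buffers.append(field)                          # payload, not copied
    return sock.sendmsg(buffers)

# Usage: a connected socket pair standing in for a client/server link.
a, b = socket.socketpair()
send_scatter_gather(a, [b"user=alice", b"payload" * 1000])
print(len(b.recv(65536)), "bytes received")
a.close(); b.close()
```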

... However, naively applying samplers of existing work to TAQA either fails to accelerate queries or requires modifying DBMSs. Specifically, row-level samplers are inefficient in databases that read data at the block level, resulting in query latencies as high as exact queries ( §4.1) [7,91]. This is especially the case for analytical queries where data scanning is often the primary latency bottleneck [17,99]. ...

Approximate partition selection for big-data workloads using summary statistics
  • Citing Article
  • August 2020

Proceedings of the VLDB Endowment
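
A toy comparison of the row-level versus block-level issue raised in the excerpt above: sampling individual rows still touches almost every block, while sampling whole blocks cuts I/O in proportion to the sampling rate. The data, block size, and estimators are illustrative; the cited work's contribution of using summary statistics to pick which partitions to read is not modeled here.

```python
import random

random.seed(0)
ROWS_PER_BLOCK = 100
blocks = [[random.random() for _ in range(ROWS_PER_BLOCK)] for _ in range(1000)]
true_sum = sum(sum(b) for b in blocks)

def row_level_sample(rate: float):
    """Sample individual rows: nearly every block still has to be read."""
    touched, total = set(), 0.0
    for b_idx, block in enumerate(blocks):
        for value in block:
            if random.random() < rate:
                touched.add(b_idx)   # reading this row pulls in its whole block
                total += value
    return total / rate, len(touched)

def block_level_sample(rate: float):
    """Sample whole blocks: I/O shrinks in proportion to the sampling rate."""
    chosen = [b for b in blocks if random.random() < rate]
    return sum(sum(b) for b in chosen) / rate, len(chosen)

for name, fn in (("row-level", row_level_sample), ("block-level", block_level_sample)):
    est, blocks_read = fn(0.01)
    print(f"{name}: estimate={est:.0f} (true={true_sum:.0f}), blocks read={blocks_read}")
```

With a 1% sampling rate and 100 rows per block, the row-level sampler ends up reading most blocks anyway, while the block-level sampler reads roughly 1% of them, at the cost of higher variance when values cluster within blocks.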

... The authors addressed issues concerning the use of non-IP Low Power Wide Area Networks, which arise from the cost of running IP on constrained devices. (Ayers, H., 2020) examines the implementations of 6LoWPAN in major embedded and sensor networking operating systems and finds that they do not interoperate; i.e., for any pair of implementations, one sends 6LoWPAN packets which the other fails to process and receive. ...

Design Considerations for Low Power Internet Protocols
  • Citing Conference Paper
  • May 2020
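
The interoperability failure described in the excerpt above suggests a simple pairwise test harness: for every ordered pair of implementations, encode packets with one and decode with the other. The `encode`/`decode` interface and the two toy "implementations" below are assumptions for illustration, not the cited paper's methodology or real 6LoWPAN stacks.

```python
from itertools import permutations

class Impl:
    """Stand-in for a 6LoWPAN stack; encode/decode are assumed interfaces."""
    def __init__(self, name, encode, decode):
        self.name, self.encode, self.decode = name, encode, decode

def interop_matrix(impls, test_packets):
    """For every ordered (sender, receiver) pair, check that packets
    compressed by the sender are decompressed correctly by the receiver."""
    failures = []
    for tx, rx in permutations(impls, 2):
        for pkt in test_packets:
            try:
                if rx.decode(tx.encode(pkt)) != pkt:
                    failures.append((tx.name, rx.name, pkt))
            except Exception:
                failures.append((tx.name, rx.name, pkt))
    return failures

# Toy "implementations": one emits a dispatch byte the other never sends.
strict = Impl("strict", lambda p: b"\x60" + p, lambda f: f[1:])
sloppy = Impl("sloppy", lambda p: p, lambda f: f[1:])
print(interop_matrix([strict, sloppy], [b"ipv6-payload"]))
```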