
Mark Silberstein
University of Texas at Austin
About

Publications: 40
Reads: 5,012
Citations: 1,161
Publications (40)
Modern datacenters support a wide range of protocols and in-network switch enhancements aimed at improving performance. Unfortunately, the resulting protocols often do not coexist gracefully because they inevitably interact via queuing in the network. In this paper we describe EQDS, a new datagram service for datacenters that moves almost all of th...
With rising network rates, cloud vendors increasingly deploy FPGA-based SmartNICs (F-NICs), leveraging their inline processing capabilities to offload hypervisor networking infrastructure. However, the use of F-NICs for accelerating general-purpose server applications in clouds has been limited.
NICA is a hardware-software co-designed framework fo...
Recent GPUs enable Peer-to-Peer Direct Memory Access (p2p) from fast peripheral devices like NVMe SSDs to exclude the CPU from the data path between them for efficiency. Unfortunately, using p2p to access files is challenging because of the subtleties of low-level non-standard interfaces, which bypass the OS file I/O layers and may hurt system perf...
Modern discrete GPUs have been the processors of choice for accelerating compute-intensive applications, but using them in large-scale data processing is extremely challenging. Unfortunately, they do not provide important I/O abstractions long established in the CPU context, such as memory mapped files, which shield programmers from the complexity o...
Future systems will be omni-programmable: alongside CPUs, GPUs and FPGAs, they will execute user code near-storage, near-network, near-memory, or on other Near-X accelerator Units, NXUs. This paper explores the design space of OS support for omni-programmable systems, aiming to simplify the development of efficient applications that span multiple hete...
GPUs have become an integral part of modern systems, but their implications for system security are not yet clear. This paper demonstrates both that discrete GPUs cannot be used as secure co-processors and that GPUs provide a stealthy platform for malware. First, we examine a recent proposal to use discrete GPUs as secure co-processors and show tha...
GPU hardware is becoming increasingly general purpose, quickly outgrowing the traditional but constrained GPU-as-coprocessor programming model. To make GPUs easier to program and easier to integrate with existing systems, we propose making the host's file system directly accessible from GPU code. GPUfs provides a POSIX-like API for GPU programs, exp...
As GPU hardware becomes increasingly general-purpose, it is quickly outgrowing the traditional, constrained GPU-as-coprocessor programming model. This article advocates for extending standard operating system services and abstractions to GPUs in order to facilitate program development and enable harmonious integration of GPUs in computing systems....
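The host-mediated design behind a GPU-side file API can be sketched in miniature: accelerator code cannot invoke the OS directly, so file requests travel through a shared queue to a host-side daemon that performs the real I/O. This is a hypothetical sketch in plain Python (the names `host_daemon` and `device_read` are ours, not the GPUfs API), illustrating only the request/reply split, not the actual buffer-cache machinery.

```python
import os, queue, threading, tempfile

# Hypothetical sketch: "device-side" code may not call the OS, so file
# requests are placed on a shared queue and serviced by a host daemon.
requests = queue.Queue()

def host_daemon():
    """Host thread: pops (path, offset, size, reply) requests, does real I/O."""
    while True:
        req = requests.get()
        if req is None:                      # shutdown sentinel
            break
        path, offset, size, reply = req
        fd = os.open(path, os.O_RDONLY)
        try:
            reply.put(os.pread(fd, size, offset))
        finally:
            os.close(fd)

def device_read(path, offset, size):
    """'Device-side' read: no direct syscalls, only a request to the host."""
    reply = queue.Queue(maxsize=1)
    requests.put((path, offset, size, reply))
    return reply.get()

# Demo: write a file on the host, read it back through the RPC path.
daemon = threading.Thread(target=host_daemon)
daemon.start()
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello from the host filesystem")
    path = f.name
data = device_read(path, 6, 4)               # -> b"from"
requests.put(None)
daemon.join()
os.unlink(path)
print(data)
```

The queue stands in for the shared-memory request ring a real GPU-to-host channel would use; the key point is that only the host thread ever touches the file system.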
Processing vast numbers of data streams is a common problem in modern computer systems and is known as the "online big data problem." Adding hard real-time constraints to the processing makes the scheduling problem a very challenging task that this paper aims to address. In such an environment, each data stream is manipulated by a (different) appli...
Many scientists perform extensive computations by executing large bags of similar tasks (BoTs) in mixtures of computational environments, such as grids and clouds. Although the reliability and cost may vary considerably across these environments, no tool exists to assist scientists in the selection of environments that can both fulfill deadlines...
Modern systems keep long memories. As we show in this paper, an adversary who gains access to a Linux system, even one that implements secure deallocation, can recover the contents of applications' windows, audio buffers, and data remaining in device drivers, long after the applications have terminated. We design and implement Lacuna, a system that...
We propose a new set of OS abstractions to support GPUs and other accelerator devices as first class computing resources. These new abstractions, collectively called the PTask API, support a dataflow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guara...
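The dataflow model underlying the PTask abstraction can be illustrated with a minimal sketch (hypothetical names, not the actual PTask API): tasks are nodes connected by named channels, and the runtime, not the programmer, decides execution order by running each task once all of its inputs are ready.

```python
from collections import defaultdict, deque

# Hypothetical sketch of a dataflow graph: the runtime schedules a task
# as soon as every channel it consumes has a value.
class Graph:
    def __init__(self):
        self.tasks = {}                      # name -> (fn, input channel names)
        self.consumers = defaultdict(list)   # channel name -> dependent tasks

    def add_task(self, name, fn, inputs=()):
        self.tasks[name] = (fn, tuple(inputs))
        for i in inputs:
            self.consumers[i].append(name)

    def run(self, **sources):
        values = dict(sources)               # channel name -> produced value
        ready = deque(n for n, (f, ins) in self.tasks.items()
                      if all(i in values for i in ins))
        while ready:
            name = ready.popleft()
            fn, ins = self.tasks[name]
            values[name] = fn(*(values[i] for i in ins))
            for c in self.consumers[name]:   # wake tasks that just became ready
                cf, cins = self.tasks[c]
                if c not in values and all(i in values for i in cins):
                    ready.append(c)
        return values

# Usage: square the input, double it, then sum the two branches.
g = Graph()
g.add_task("sq",  lambda x: x * x,    ["src"])
g.add_task("dbl", lambda x: x * 2,    ["src"])
g.add_task("sum", lambda a, b: a + b, ["sq", "dbl"])
print(g.run(src=3)["sum"])               # 9 + 6 = 15
```

Because the graph, not the user code, carries the dependencies, a kernel-resident scheduler has the visibility it needs to enforce system-wide policy, which is the point the abstract makes about OS-managed PTask graphs.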
This chapter covers two difficult problems frequently encountered by graphics processing unit (GPU) developers: optimizing memory access for kernels with complex input-dependent access patterns, and mapping the computations to a GPU or a CPU in composite applications with multiple dependent kernels. Both pose a formidable challenge, as they require...
Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency -- to satisfy the milliseconds-long, hard real-time constraints of each stream, and high throughput -- to enable efficient processing of as many streams as possible. High-thro...
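The latency side of this trade-off is classically handled by deadline-driven dispatch. A minimal sketch (stream names and the `edf_order` helper are ours, purely illustrative): earliest-deadline-first ordering over per-stream work items, the standard way to honor per-stream real-time constraints while the runtime batches work for throughput.

```python
import heapq

def edf_order(items):
    """items: list of (deadline_ms, stream_id) pairs. Returns stream ids in
    the order a deadline-driven (EDF) scheduler would dispatch them."""
    heap = list(items)
    heapq.heapify(heap)                  # min-heap keyed on deadline
    order = []
    while heap:
        deadline, stream = heapq.heappop(heap)
        order.append(stream)
    return order

# Three streams with different deadlines: the tightest deadline goes first.
print(edf_order([(30, "voip"), (5, "ticker"), (12, "sensor")]))
# -> ['ticker', 'sensor', 'voip']
```

A real system interleaves this per-deadline ordering with batching across streams; the sketch shows only the ordering decision.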
We present a holistic approach for efficient execution of bags-of-tasks (BOTs) on multiple grids, clusters, and volunteer computing grids virtualized as a single computing platform. The challenge is twofold: to assemble this compound environment and to employ it for execution of a mixture of throughput- and performance-oriented BOTs, with a dozen t...
Graphical models such as Bayesian networks have many applications in computational biology, and numerous algorithmic improvements have been made over the years. Yet many practical problem instances remain infeasible even as technology advances and more data becomes available, for instance through SNP genotyping and DNA sequencing. We therefore suggest a sch...
The ultimate goal of grid technologies is to materialize the vision of grids as virtual supercomputers of unprecedented power, through utilization of geographically dispersed, distributively owned resources. Despite the overwhelming success of grids in running pleasantly parallel tasks, there still exists a large set of demanding applications conside...
We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of...
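The core idea of a software-managed cache can be sketched in a few lines (all parameters and names here are ours, purely illustrative): a direct-mapped cache layered over a flat "global memory" array, standing in for the close-to-ALU scratchpad, which turns repeated or spatially local reads into fast local hits.

```python
# Hypothetical sketch: a direct-mapped software-managed cache over a flat
# backing array, tracking hits and misses at cache-line granularity.
class SoftwareCache:
    def __init__(self, backing, num_lines=4, line_words=8):
        self.backing = backing               # "global memory": a flat list
        self.line_words = line_words
        self.tags = [None] * num_lines       # which line each slot holds
        self.lines = [None] * num_lines      # cached line contents
        self.hits = self.misses = 0

    def read(self, addr):
        line, offset = divmod(addr, self.line_words)
        slot = line % len(self.tags)         # direct-mapped placement
        if self.tags[slot] != line:          # miss: fetch the whole line
            self.misses += 1
            base = line * self.line_words
            self.lines[slot] = self.backing[base:base + self.line_words]
            self.tags[slot] = line
        else:
            self.hits += 1
        return self.lines[slot][offset]

mem = list(range(100))
cache = SoftwareCache(mem)
values = [cache.read(a) for a in (0, 1, 2, 9, 0)]
print(values, cache.hits, cache.misses)      # -> [0, 1, 2, 9, 0] 3 2
```

On a GPU the fetch on a miss would be a coalesced copy into scratchpad shared by a thread block; the hit/miss accounting is what the chapter's analytical performance model reasons about.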
Linkage analysis is a tool used by geneticists for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However analyses of large inbred pedigrees with extensive missing data are often beyond the capabilities of a single computer. We present a distributed system called superlink-online for computing multipoint LOD sc...
Consider a workload in which massively parallel tasks that require large resource pools are interleaved with short tasks that require fast response but consume fewer resources. We aim at achieving high throughput and short response time when scheduling such a workload over a set of uncoordinated grids of varying sizes and performance characterist...
Linkage analysis is a tool used by geneticists for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However analyses of large inbred pedigrees with extensive missing data are often beyond the capabilities of a single computer. We present a distributed system called SUPERLINK-ONLINE for computing multipoint LOD sc...
Grids are becoming a mission-critical component in research and industry. The services they provide are thus required to be highly available, contributing to the vision of the Grid as a dependable virtual computer of infinite power. However, building highly available services in the Grid is particularly difficult due to the unique characteristics of t...
The Grid community's efforts on managing and transporting data have focused on very large data sets consisting of very large elements. We are interested in leveraging the benefits of solutions such as GridFTP, in particular with respect to parallel data transfer and restartability (as well as security, third-party control, etc.), for moving large data...
Grids are becoming mission-critical components in research and industry, offering quite sophisticated solutions for the exploitation of virtually unlimited computing and storage resources. The grid resources are usually shared among multiple organizations and typically managed in a "best effort" manner. However, many real-world supercomputing appli...