Josep Torrellas

Josep Torrellas
University of Illinois, Urbana-Champaign | UIUC · Department of Computer Science

Doctor of Philosophy, Stanford University

About

423
Publications
31,619
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
12,388
Citations
Citations since 2017
70 Research Items
3025 Citations
20172018201920202021202220230100200300400500
20172018201920202021202220230100200300400500
20172018201920202021202220230100200300400500
20172018201920202021202220230100200300400500

Publications

Publications (423)
Preprint
Full-text available
Side-channel attacks that use machine learning (ML) for signal analysis have become prominent threats to computer security, as ML models easily find patterns in signals. To address this problem, this paper explores using Adversarial Machine Learning (AML) methods as a defense at the computer architecture layer to obfuscate side channels. We call th...
Article
Distributed applications such as key-value stores and databases provide fault tolerance by replicating records in the memories of different nodes, and using data consistency protocols to ensure consistency across replicas. In this environment, nonvolatile memory (NVM) offers the ability to attain high-performance data durability. However, it is unc...
Article
Full-text available
The security of computers is at risk because of information leaking through their power consumption. Attackers can use advanced signal measurement and analysis to recover sensitive data from this side channel. To address this problem, this article presents Maya, a simple and effective defense against power side channels. The idea is to use formal c...
Preprint
Full-text available
Byte-addressable, non-volatile memory (NVM) is emerging as a promising technology. To facilitate its wide adoption, employing NVM in managed runtimes like JVM has proven to be an effective approach (i.e., managed NVM). However, such an approach is runtime specific, which lacks a generic abstraction across different managed languages. Similar to the...
Article
Speculative execution attacks present an enormous security threat, capable of reading arbitrary program data under malicious speculation, and later exfiltrating that data over microarchitectural covert channels. This paper proposes speculative taint tracking (STT), a high security and high performance hardware mechanism to block these attacks. The...
Preprint
Recent security vulnerabilities that target speculative execution (e.g., Spectre) present a significant challenge for processor design. The highly publicized vulnerability uses speculative execution to learn victim secrets by changing cache state. As a result, recent computer architecture research has focused on invisible speculation mechanisms tha...
Article
A microarchitectural replay attack is a novel class of attack where an adversary can denoise nearly arbitrary microarchitectural side channels in a single run of the victim. The idea is to cause the victim to repeatedly replay by inducing pipeline flushes. In this article, we design, implement, and demonstrate our ideas in a framework, called Micro...
Article
Speculative execution attacks present an enormous security threat, capable of reading arbitrary program data under malicious speculation, and later exfiltrating that data over microarchitectural covert channels. This article proposes speculative taint tracking (STT), a high-security and high-performance hardware mechanism to block these attacks. Th...
Article
Computing is taking a central role in advancing science, technology, and society, facilitated by increasingly capable systems. Computers are expected to perform a variety of tasks, including life-critical functions, while the resources they require (such as storage and energy) are becoming increasingly limited. To meet expectations, computers use c...
Article
Ubiquitous multicore processors nowadays rely on an integrated packet-switched network for cores to exchange and share data. The performance of these intra-chip networks is a key determinant of the processor speed and, at high core counts, becomes an important bottleneck due to scalability issues. To address this, several works propose the use of m...
Preprint
Full-text available
Our community has greatly improved the efficiency of deep learning applications, including by exploiting sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known statically, and/or for specialized hardware. We propose a scheme to leverage dynamic sparsity during training. In particular, we exploit zeros introd...
Conference Paper
Many task-based graph algorithms benefit from executing tasks according to some programmer-specified priority order. To support such algorithms, graph frameworks use Concurrent Priority Schedulers (CPSs), which attempt---but do not guarantee---to execute the tasks according to their priority order. While CPSs are critical to performance, there is i...
Conference Paper
Speculative execution attacks present an enormous security threat, capable of reading arbitrary program data under malicious speculation, and later exfiltrating that data over microarchitectural covert channels. Since these attacks first rely on being able to read arbitrary data (potential secrets), a conservative approach to defeat all attacks is...
Conference Paper
Resource control in heterogeneous computers built with subsystems from different vendors is challenging. There is a tension between the need to quickly generate local decisions in each subsystem and the desire to coordinate the different subsystems for global optimization. In practice, global coordination among subsystems is considered hard, and cu...
Preprint
The security of computers is at risk because of information leaking through the system's physical outputs like power, temperature and electromagnetic (EM) emissions. Through advanced methods of signal measurement and analysis, attackers have compromised the security of many systems, recovering sensitive and personal data. Countermeasures that have...
Conference Paper
The popularity of hardware-based Trusted Execution Environments (TEEs) has recently skyrocketed with the introduction of Intel's Software Guard Extensions (SGX). In SGX, the user process is protected from supervisor software, such as the operating system, through an isolated execution environment called an enclave. Despite the isolation guarantees...
Conference Paper
Full-text available
Directories for cache coherence have been recently shown to be vulnerable to conflict-based side-channel attacks. By forcing directory conflicts, an attacker can evict victim directory entries, which in turn trigger the eviction of victim cache lines from private caches. This evidence strongly suggests that directories need to be redesigned for sec...
Conference Paper
Full-text available
A processor laid out vertically in stacked layers can benefit from reduced wire delays, low energy consumption, and a small footprint. Such a design can be enabled by Monolithic 3D (M3D), a technology that provides short wire lengths, good thermal properties, and high integration. In current M3D technology, due to manufacturing constraints, the lay...
Preprint
Wireless Network-on-Chip (WNoC) has emerged as a promising alternative to conventional interconnect fabrics at the chip scale. Since WNoCs may imply the close integration of antennas, one of the salient challenges in this scenario is the management of coupling and interferences. This paper, instead of combating coupling, aims to take advantage of c...
Conference Paper
Byte-addressable, non-volatile memory (NVM) is emerging as a revolutionary memory technology that provides persistency, near-DRAM performance, and scalable capacity. To facilitate its use, many NVM programming models have been proposed. However, most models require programmers to explicitly specify the data structures or objects that should reside...
Conference Paper
Byte addressable, Non-Volatile Memory (NVM) is emerging as a revolutionary technology that provides near-DRAM performance and scalable memory capacity. To facilitate the usability of NVM, new programming frameworks have been proposed to automatically or semi-automatically maintain crash-consistent data structures, relieving much of the burden of de...
Conference Paper
Data access patterns that involve fine-grained sharing, multicasts, or reductions have proved to be hard to scale in shared-memory platforms. Recently, wireless on-chip communication has been proposed as a solution to this problem, but a previous architecture has used it only to speed-up synchronization. An intriguing question is whether wireless c...
Conference Paper
Reference counting (RC) is one of the two fundamental approaches to garbage collection. It has the desirable characteristics of low memory overhead and short pause times, which are key in today's interactive mobile platforms. However, RC has a higher execution time overhead than its counterpart, tracing garbage collection. The reason is that RC imp...
Conference Paper
Byte-addressable non-volatile memory is poised to become prevalent in the near future. Thanks to device-level technological advances, hybrid systems of traditional dynamic random-access memory (DRAM) coupled with non-volatile random-access memory (NVRAM) are already present and are expected to be commonplace soon. NVRAM offers orders of magnitude p...
Preprint
Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN's architecture (i.e., its hyper-parameters) broadly determines the DNN's accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide majo...
Conference Paper
In upcoming architectures that stack processor and DRAM dies, temperatures are higher because of the increased transistor density and the high inter-layer thermal resistance. However, past research has underestimated the extent of the thermal bottleneck. Recent experimental work shows that the Die-to-Die (D2D) layers hinder effective heat transfer,...
Conference Paper
To reduce the memory requirements of virtualized environments, modern hypervisors are equipped with the capability to search the memory address space and merge identical pages --- a process called page deduplication. This process uses a combination of data hashing and exhaustive comparison of pages, which consumes processor cycles and pollutes cach...
Conference Paper
In cache-based side channel attacks, a spy that shares a cache with a victim probes cache locations to extract information on the victim's access patterns. For example, in evict+reload, the spy repeatedly evicts and then reloads a probe address, checking if the victim has accessed the address in between the two operations. While there are many prop...
Conference Paper
The same flexibility that makes dynamic scripting languages appealing to programmers is also the primary cause of their low performance. To access objects of potentially different types, the compiler creates a dispatcher with a series of if statements, each performing a comparison to a type and a jump to a handler. This induces major overhead in in...
Article
The same flexibility that makes dynamic scripting languages appealing to programmers is also the primary cause of their low performance. To access objects of potentially different types, the compiler creates a dispatcher with a series of if statements, each performing a comparison to a type and a jump to a handler. This induces major overhead in in...
Article
In cache-based side channel attacks, a spy that shares a cache with a victim probes cache locations to extract information on the victim's access patterns. For example, in evict+reload, the spy repeatedly evicts and then reloads a probe address, checking if the victim has accessed the address in between the two operations. While there are many prop...
Article
This paper introduces the Survive DRAM architecture for effective in-memory micro-checkpointing. Survive implements low-cost incremental checkpointing, enabling fast rollback that can be used in various architectural techniques such as speculation, approximation, or low voltage operation. Survive also provides crash consistency when used as the fro...
Article
Full-text available
Because most technology and computer architecture innovations were (intentionally) invisible to higher layers, application and other software developers could reap the benefits of this progress without engaging in it. Higher performance has both made more computationally demanding applications feasible (e.g., virtual assistants, computer vision) an...
Article
As processors seek more resource efficiency, they increasingly need to target multiple goals at the same time, such as a level of performance, power consumption, and average utilization. Robust control solutions cannot come from heuristic-based controllers or even from formal approaches that combine multiple single-parameter controllers. Such contr...
Article
Full-text available
Increased transistor integration will soon allow us to build processor chips with over 1,000 cores. To construct such a chip, energy and power consumption are the most formidable obstacles. Hence, we need to design it from the ground up for energy efficiency. First of all, we want to operate at low voltage, since this is the point of maximum energy...
Article
Full-text available
There have been several recent efforts to improve the performance of fences. The most aggressive designs allow post-fence accesses to retire and complete before the fence completes. Unfortunately, such designs present implementation difficulties due to their reliance on global state and structures. This paper's goal is to optimize both the performa...
Article
Full-text available
There have been several recent efforts to improve the performance of fences. The most aggressive designs allow post-fence accesses to retire and complete before the fence completes. Unfortunately, such designs present implementation difficulties due to their reliance on global state and structures. This paper's goal is to optimize both the performa...
Article
Full-text available
There have been several recent efforts to improve the performance of fences. The most aggressive designs allow post-fence accesses to retire and complete before the fence completes. Unfortunately, such designs present implementation difficulties due to their reliance on global state and structures. This paper's goal is to optimize both the performa...
Article
The cache hierarchy often consumes a large portion of a processor's energy. To save energy in HPC environments, this paper proposes software-controlled reconfiguration of the cache hierarchy with an adaptive runtime system. Our approach addresses the two major limitations associated with other methods that reconfigure the caches: predicting the app...
Article
Effective execution of atomic blocks of instructions (also called transactions) can enhance the performance and programmability of multiprocessors. Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically generated by the hardware as in aggressive implementations of strict memory consistency. In most current design...
Article
Hardware-assisted Record and Deterministic Replay (RnR) of programs has been proposed as a primitive for debugging hard-to-repeat software bugs. However, simply providing support for repeatedly stumbling on the same bug does not help diagnose it. For bug diagnosis, developers typically want to modify the code, e.g., by creating and operating on new...
Article
Increased focus on JavaScript performance has resulted in vast performance improvements for many benchmarks. However, for actual code used in websites, the attained improvements often lag far behind those for popular benchmarks. This paper shows that the main reason behind this short-fall is how the compiler understands types. JavaScript has no con...
Conference Paper
Effective execution of atomic blocks of instructions (also called transactions) can enhance the performance and programmability of multiprocessors. Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically generated by the hardware as in aggressive implementations of strict memory consistency. In most current design...
Conference Paper
Hardware-assisted Record and Deterministic Replay (RnR) of programs has been proposed as a primitive for debugging hard-to-repeat software bugs. However, simply providing support for repeatedly stumbling on the same bug does not help diagnose it. For bug diagnosis, developers typically want to modify the code, e.g., by creating and operating on new...
Article
Record and Deterministic Replay (RnR) of multithreaded programs on relaxed-consistency multiprocessors has been a long-standing problem. While there are designs that work for Total Store Ordering (TSO), finding a general solution that is able to record the access reordering allowed by any relaxed-consistency model has proved challenging. This paper...
Article
Record and Deterministic Replay (RnR) of multithreaded programs on relaxed-consistency multiprocessors has been a long-standing problem. While there are designs that work for Total Store Ordering (TSO), finding a general solution that is able to record the access reordering allowed by any relaxed-consistency model has proved challenging. This paper...
Conference Paper
Record and Deterministic Replay (RnR) of multithreaded programs on relaxed-consistency multiprocessors has been a long-standing problem. While there are designs that work for Total Store Ordering (TSO), finding a general solution that is able to record the access reordering allowed by any relaxed-consistency model has proved challenging. This paper...
Conference Paper
EDRAM cells require periodic refresh, which ends up consuming substantial energy for large last-level caches. In practice, it is well known that different eDRAM cells can exhibit very different charge-retention properties. Unfortunately, current systems pessimistically assume worst-case retention times, and end up refreshing all the cells at a cons...