
R. Iris Bahar - Brown University
198 Publications
39,661 Reads
4,217 Citations
Publications (198)
Secure memory is a natural solution to hardware vulnerabilities in memory, but it faces fundamental challenges of performance and memory overheads. While significant work has gone into optimizing the protocol for performance, far less work has gone into optimizing its memory overhead. In this work, we propose the Baobab Merkle Tree, in which counte...
A major challenge in research involving artificial intelligence (AI) is the development of algorithms that can find solutions to problems that can generalize to different environments and tasks. Unlike AI, humans are adept at finding solutions that can transfer. We hypothesize this is because their solutions are informed by causal models. We propos...
Algorithms based on Monte-Carlo sampling have been widely adapted in robotics and other areas of engineering due to their performance robustness. However, these sampling-based approaches have high computational requirements, making them unsuitable for real-time applications with tight energy constraints. In this paper, we investigate 6 degree-of-fr...
Barrier synchronization constructs are placed between phases of parallel programs to ensure correct execution by preventing threads from proceeding to subsequent phases of the program before all threads have completed the preceding stage(s). Upon release, threads leaving the barrier at the same time cause a sudden change in activity...
Ultra-low-power systems with substantial computing capacity require latches and SRAMs to operate at extremely low supply voltages. However, with aggressive technology scaling, reliability becomes a major challenge due to unavoidable process variations and the presence of multiple noise sources, including intrinsic thermal noise. This paper provides...
Integrated circuits and electronic systems, as well as design technologies, are evolving at a great rate -- both quantitatively and qualitatively. Major developments include new interconnects and switching devices with atomic-scale uncertainty, the depth and scale of on-chip integration, electronic system-level integration, the increasing significa...
We propose an architecture for a Field Programmable Gate Array (FPGA) based tester for a 3D stacked integrated circuit (IC). Due to the very short distances between dies in a stack that can make SerDes connections very efficient and the high density of through silicon vias (TSVs) that may be available, it is possible to connect the FPGA to the die...
Recent advancements have led to a proliferation of machine learning systems used to assist humans in a wide range of tasks. However, we are still far from accurate, reliable, and resource-efficient operations of these systems. For robot perception, convolutional neural networks (CNNs) for object detection and pose estimation are recently coming int...
In a traditional DRAM-based main memory architecture, a memory access operation requires much more time and energy than a simple logic operation. This fact is exploited to build time-consuming and power-hungry memory-hard cryptographic functions that serve the purpose of hindering brute-force security attacks. The security of such memory-hard funct...
Reliability and error-rate modeling are major concerns for ultra-low-power subthreshold CMOS circuitry. In this study, we extend our stochastic time-domain error simulation framework [1] to thermally-induced bit-flip errors in ultimate CMOS SRAM latch cells. Our approach extracts the dependence of error-rate on technological parameters like operati...
Recent advances in memory architectures have provoked renewed interest in near-data-processing (NDP) as a way to alleviate the "memory wall" problem. An NDP architecture places logic circuits, such as simple processors, in close proximity to memory. Effective use of NDP architectures requires rethinking data structures and their algorithms. Here, we...
Jointly sponsored by IEEE and ACM, the International Conference on Computer-Aided Design (ICCAD) is the premier forum to explore emerging technology challenges in electronic design automation, present leading-edge research and development solutions, and identify future roadmaps for design automation research areas. This year, ICCAD was held in San...
Energy consumption is the dominant factor in many computing systems. Voltage scaling is a widely used technique to lower energy consumption, which exploits supply voltage margins to ensure reliable circuit operation. Aggressive voltage scaling will slow signal propagation; without coherent frequency relaxation, timing violations may be generated. H...
High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories...
Convolutional neural networks (CNNs) are increasingly widely used in robotics, especially for object recognition. However, such CNNs still lack several critical properties necessary for robots to properly perceive and function autonomously in uncertain, and potentially adversarial, environments. In this paper, we investigate factors for accurat...
The 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) concluded successfully on 16 November 2017. The conference was held at the Marriott Hotel, Irvine, CA, USA, thus returning to California after a couple of years in Austin, TX, USA. An interesting and varied program focused primarily on electronic design automation (EDA) was...
Scaling of semiconductor devices has enabled higher levels of integration and performance improvements at the price of making devices more susceptible to the effects of static and dynamic variability. Adding safety margins (guardbands) on the operating frequency or supply voltage prevents timing errors, but has a negative impact on performance and...
Near-threshold and sub-threshold voltage designs have been identified as possible solutions to overcome the limitations introduced by energy consumption in modern VLSI circuits. However, as we approach sub-10nm transistor technology, aggressive voltage and gate length scaling will reduce the reliability of logic circuits due to the increasing impac...
While Deep Neural Networks (DNNs) push the state-of-the-art in many machine learning applications, they often require millions of expensive floating-point operations for each input classification. This computation overhead limits the applicability of DNNs to low-power, embedded platforms and incurs high cost in data centers. This motivates recent i...
Deep neural networks are gaining in popularity as they are used to generate state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limitations in power budgets dedicated to these networks, the import...
We present thrifty-malloc: a transaction-friendly dynamic memory manager for high-end embedded multicore systems. The manager combines modularity, ease-of-use and hardware transactional memory (HTM) compatibility in a lightweight and memory-efficient design. Thrifty-malloc is easy to deploy and configure for non-expert programmers, yet provides goo...
Numerous application domains (e.g., signal and image processing, computer graphics, computer vision, and machine learning) are inherently error tolerant, which can be exploited to produce approximate ASIC implementations with low power consumption at the expense of negligible or small reductions in application quality. A major challenge is the need...
We present a novel dynamic configuration technique for deep neural networks that permits step-wise energy-accuracy trade-offs during runtime. Our configuration technique adjusts the number of channels in the network dynamically depending on response time, power, and accuracy targets. To enable this dynamic configuration technique, we co-design a ne...
In this work, a low-power, low-error divider design is proposed that can achieve significant power and area savings, while introducing insignificant inaccuracies to the output. The design of our divider is highly scalable, offering a wide range of power and inaccuracy trade-offs based on the application requirements. Furthermore, the proposed divid...
The gate length of CMOS transistors is continuing to shrink down to the sub-10nm region and operating voltages are moving toward near-threshold and even sub-threshold values. With this trend, the number of electrons responsible for the total charge of a CMOS node is greatly reduced. As a consequence, thermal fluctuations that shift a gate from its...
Operating circuits in the sub-threshold region can save power, but at the cost of higher susceptibility to noise. This paper analyzes various gate-level error-mitigation designs appropriate for sub-threshold circuits. Previous works have proposed a modified version of the Schmitt trigger gate that uses logic implications to reinforce correct functi...
In this article, we propose an efficient finite-element-based (FE-based) method for both steady and transient thermal analyses of high-performance integrated circuits based on the hierarchical matrix (H-matrix) representation. H-matrix has been shown to provide a data-sparse way to approximate the matrices and their inverses with almost linear-spac...
In this paper we present a transaction-friendly dynamic memory manager for high-end embedded multicore systems. The current trend in high-end embedded systems design is to turn to Massively Parallel Processor Arrays (MPPAs) or many-core architectures, and to exploit as much as possible the parallelism they offer. In this context, one of the main p...
Embedded systems are becoming increasingly common in everyday life and, like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurr...
Noise analysis in nonlinear logic circuits requires models that take into account time-varying biasing conditions. When considering thermal noise, which moves the circuit away from its equilibrium point, a correct modeling approach has to go beyond the additive white Gaussian noise (AWGN) used in classical noise analysis. Even when accurate models...
As silicon integration technology pushes toward atomic dimensions, errors due to static and dynamic variability are an increasing concern. To avoid such errors, designers often turn to "guardband" restrictions on the operating frequency and voltage. If guardbands are too conservative, they limit performance and waste energy, but less conservative g...
3D die-stacks hold great promise for increasing system performance, but difficulties in testing dies and assembling a 3D stack are leading to yield issues and slowing the large scale manufacturing of these devices. In many cases, a single defective die will kill the entire stack. To help mitigate this issue, we explore the possibility of repairing...
High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access (NUMA) costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like m...
The ease of use and reconfigurability of FPGAs make them an attractive platform for accelerating algorithms. However, acceleration becomes challenging because the large number of possible design parameters leads to many different accelerator variants. In this article, we propose techniques for fast design exploration and multi-objective optimization t...
Many classes of applications, especially in the domains of signal and image processing, computer graphics, computer vision, and machine learning, are inherently tolerant to inaccuracies in their underlying computations. This tolerance can be exploited to design approximate circuits that perform within acceptable accuracies but have much lower power...
3D stacked integrated circuits hold great promise for increasing system performance, but difficulties in testing dies and assembling a 3D stack are leading to yield issues and slowing the large scale manufacture of these devices. We propose helping to mitigate these issues by repairing the stack with programmable logic in FPGAs that have already be...
High-end embedded systems such as smart phones, game consoles, GPS-enabled automotive systems, and home entertainment centers, are becoming ubiquitous. Like their general-purpose counterparts, and for many of the same energy-related reasons, embedded systems are turning to multicore architectures. Moreover, as the demand for more compute-intensive...
The breakdown of Dennard scaling implies radical changes in the design, integration, manufacturing and deployment of new electronic systems. These changes, along with labor-force and macro-economic trends, undermine the status quo in the semiconductor and electronic design automation (EDA) fields. Of particular concern is a fairly static and aging...
Negative Bias Temperature Instability (NBTI) is a major reliability issue in nanoscale VLSI systems. Previous work has shown how the exploitation of conventional optimization techniques can reduce the NBTI-induced aging in cache memories. Other works have proposed approaches that incorporate software directed data allocation strategies to partially...
As circuits continue to scale to smaller feature sizes, wearout and latent defects are expected to cause an increasing number of errors in the field. Online error detection techniques, including logic implication-based checker hardware, are capable of detecting at least some of these errors as they occur. However, recovery may be expensive, and the...
The JPEG2000 image coding standard provides many superior features compared to JPEG and other compression standards. However, the relatively slow performance of JPEG2000, especially in software implementations, is a critical drawback of the standard. Moreover, as image sizes rapidly grow, higher demands on performance for image coding and pr...
Nanoscale circuits operating at sub-threshold voltages are affected by the growing impact of random telegraph signal (RTS) and thermal noise. Given the low operational voltages and subsequently lower noise margins, these noise phenomena are capable of changing the value of some of the nodes in the circuit, compromising the reliability of the computatio...
The push to embed reliable and low-power memory architectures into modern systems-on-chip is driving the EDA community to develop new design techniques and circuit solutions that can concurrently optimize aging effects due to Negative Bias Temperature Instability (NBTI), and static power consumption due to leakage mechanisms. While recent works h...
The reconfigurability of Field Programmable Gate Arrays (FPGAs) makes them an attractive platform for accelerating algorithms. Accelerating a particular algorithm is a challenging task as the large number of possible algorithmic and hardware design parameters leads to different accelerator variant implementations, each with its own metrics such as p...
Roughly ninety percent of all microprocessors manufactured in any one year are intended for embedded devices such as cameras, cell-phones, or machine controllers. We evaluate the energy-efficiency and performance of spin-locks and simple hardware transactional memory on embedded devices. In most cases, transactional memory provides both significan...
In part I of this paper, a robust numerical framework based on Markov queueing theory and nonequilibrium Green's functions was presented to model the fluctuations in a CMOS flip-flop, which could potentially give rise to logic upsets. In part II, this framework is used to investigate quantitatively the failure in time for end-of-roadmap CMOS device...
As CMOS technology continues the path of miniaturization, noise-induced fluctuations raise heightened reliability concerns. In previous work, an analytical framework based on Markov queueing theory and Poisson shot noise was presented to model the probabilistic behavior of a CMOS flip-flop operated in the subthreshold regime. In this paper, this mo...
Two overriding concerns in the development of embedded MPSoCs are ease of programming and hardware complexity. In this paper we present SoC-TM, an integrated HW/SW solution for transactional programming on embedded MPSoCs. Our proposal leverages a Hardware Transactional Memory (HTM) design, based on a dedicated HW module for conflict management, wh...
We present a new method to identify multi-site implications that can significantly increase the fault coverage of error-detecting hardware without increasing the area overhead. This method intelligently divides the input space about the functions of internal circuit sites and finds new valuable implications that can share gates in checker logic.
The JPEG2000 image compression standard provides superior features to the popular JPEG standard; however, the slow performance of software implementation of JPEG2000 has kept it from being widely adopted. More than 80% of the execution time for JPEG2000 is spent on the Tier-1 coding engine. While much effort over the past decade has been devoted to...
$-N_{\mathrm{det}_j}$ $\tau$, where $W_{i,j}(t)$ represents the weight to be used for fault $j$ when the $t$-th pattern of the test subset is to be chosen and $N_{\mathrm{det}_j}$ represents the number of times that fault $j$ has been detected by the $t-1$ patterns that have already been added to the test subset. Once the new weights are obtained, we recompute the score for each pattern rem...
Debugging and speed-binning a fabricated design requires a pattern-dependent timing model to generate patterns, which static timing analysis is incapable of providing. To address these issues, we propose a timing analysis tool that integrates a pattern-dependent delay model into its analysis. Our approach solves for the delay by using the concept o...
Thermally induced fluctuations in the logic state of a simple flip-flop occur on a timescale that renders them impossible to simulate through Monte Carlo methods. In a previous work, an analytical framework based on Markov chains and queue theory was introduced along with a symbolic solution for a truncated 1-D queue, diagonally connecting the two...
While performance and power continue to be important metrics for embedded systems, as CMOS technologies continue to shrink, new metrics such as variability and reliability have emerged as limiting factors in the design of modern embedded systems. In particular, the reliability impact of pMOS negative bias temperature instability (NBTI) has become a...
Radiation-induced soft errors have been a reliability concern for logic integrated circuits since their emergence. Feature-size and supply-voltage reduction require the analysis of soft-error sensitivity as a function of technology scaling. In this paper, an analytical framework based on Markov chains and queue theory is presented for computation o...
With the scaling of CMOS technologies, the gap between nominal supply voltage and threshold voltage has decreased significantly. This trend is further amplified in low-power nanometer libraries, which feature cells with identical size and functionality, but different threshold voltages. As a consequence, different cells may have different delay beh...
We investigate how transactional memory can be adapted for embedded systems. We consider energy consumption and complexity to be driving concerns in the design of these systems and therefore adapt simple hardware transactional memory (HTM) schemes in our architectural design. We propose several different cache structures and contention management s...
Traditionally, the effects of temperature on delay of CMOS devices have been evaluated using the highest operating temperature as a worst-case corner. This conservative approach was based on the fact that, in older technologies, the performance of CMOS devices systematically degraded as temperature increased. With the progressive scaling of technolog...
This paper investigates the use of logic implication checkers for the online detection of errors. A logic implication, or invariant relationship, must hold for all valid input conditions; therefore, any violation of this implication will indicate an error due to an intermittent fault. Techniques are presented to efficiently identify the most useful...
In this paper, we propose the use of logic implications to enhance online error detection capabilities and to improve the testing efficiency of an integrated circuit. These logic implications are implemented in hardware and help to verify that expected invariant circuit relationships are satisfied during field operation. Thus, any implication viola...
Power consumption requirements drive CMOS scaling to ever lower supply voltages, reducing the stability margin with respect to thermal noise and raising the probability for thermally-induced soft errors. Given the long time scale of noise-induced soft errors, conventional Monte Carlo simulations cannot be used to predict error rates and alternative...
We propose a new design for an energy-efficient hardware transactional memory (HTM) system for power-aware embedded devices. Prior hardware transactional memory designs proposed a small, fully-associative transactional cache at the same level as the L1 cache. We propose an alternative design that unifies the transactional and L1 caches, and provide...
Post-silicon clock-tuning is a technique used as part of speed-debug efforts to increase the allowable clock frequency of a chip. These days, it is not uncommon for high-end microprocessors to have cores containing a few thousand clock-tuning elements (i.e., variable-delay buffers). Each such buffer can be assigned to one of several possible discre...
As the complexity of integrated circuits has increased, so has the need for improving testing efficiency. Unfortunately, the types of defects are also becoming more complex, which in turn makes simple approaches for testing inadequate. Using n-detect testing can improve defect coverage; however, this approach can greatly increase the test set size....
The development of future nanoscale CMOS circuits, characterized by lower supply voltages and smaller dimensions, raises the question of logic stability of such devices with respect to electrical noise. This paper presents a theoretical framework that can be used to investigate the thermal noise probability distributions for equilibrium and nonequi...
This paper examines the ramifications of using 3D integration technology on the leakage and timing variability of integrated circuits. We develop models that estimate the outcome of mapping a 2D design onto a 3D stack from a process variation perspective. We statistically prove and experimentally demonstrate that 3D integration is a useful techn...
Ensuring reliable computation at the nanoscale requires mechanisms to detect and correct errors during normal circuit operation. In this paper we propose a method for designing efficient online error detection schemes for circuits based on the identification of invariant relationships in hardware. More specifically, we present a technique that auto...
Synchronization among tasks accounts for a sizable fraction of the energy consumption and execution time of applications running on Multi-Processor Systems-on-Chips platforms. In order to achieve fast and energy-efficient operations, it is therefore essential to implement efficient and power-frugal synchronization primitives. The design of such pri...