ABSTRACT: While Moore's Law predicts the semiconductor industry's ability to engineer ever smaller and more efficient transistors and circuits, it says nothing about several serious issues. One concern is the verification effort of modern computing systems, which has grown to dominate the cost of system design. At the same time, technology scaling leads to the phase-out of burn-in. As a result, the in-the-field error rate may increase due to both actual errors and latent defects. Whereas data can be protected with arithmetic codes, there is a lack of cost-effective mechanisms for control logic. This paper presents a lightweight microarchitectural mechanism that ensures that data consumed through registers are correct. The protected structures include the issue queue logic and its associated data (i.e., tags and control signals), input multiplexors, rename data, replay logic, the register free list and release logic, and the register file logic. Our results show coverage of around 90 percent for the targeted structures, at a cost of about four percent in power and area and with no impact on performance.
IEEE Transactions on Computers, 09/2011.
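The abstract does not detail the checking mechanism, but a minimal sketch is possible under one plausible reading: a small signature derived from the physical register tag at allocation travels with the instruction through rename and the issue queue, and is re-derived from the tag actually presented at the register read port. The hash function, its width, and the check point are all assumptions for illustration, not the paper's exact design.

```cpp
#include <cstdint>
#include <iostream>

// Placeholder 4-bit signature over a physical register tag (assumption:
// the real design's function and width may differ).
static uint8_t signatureOf(uint16_t physTag) {
    return (physTag ^ (physTag >> 4) ^ (physTag >> 8)) & 0xF;
}

// At the register read port: the signature carried with the instruction
// must match one recomputed from the tag the issue/select logic delivered.
// A mismatch means the tag or control path was corrupted in between.
static bool readIsConsistent(uint8_t carriedSig, uint16_t tagAtReadPort) {
    return carriedSig == signatureOf(tagAtReadPort);
}

int main() {
    uint16_t allocatedTag = 0x2A;                      // tag given at rename
    uint8_t sig = signatureOf(allocatedTag);           // travels with the uop
    std::cout << readIsConsistent(sig, 0x2A) << "\n";  // 1: consistent read
    std::cout << readIsConsistent(sig, 0x2B) << "\n";  // 0: corrupted tag caught
    return 0;
}
```

Because the signature and the tag travel through different structures, a corruption of either one makes the two disagree at the read port, which is what gives end-to-end coverage of the listed structures.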
ABSTRACT: Time-to-market is a critical issue for today's integrated circuit manufacturers. In this paper, the Via-Configurable Transistor Array (VCTA) regular layout fabric, which aims to minimize time-to-market and its associated costs, is studied for a delay-locked loop (DLL) design. A comparison with a full-custom design demonstrates that VCTA can be used without loss of functionality while accelerating design time. Layout implementations in a 90 nm CMOS process are provided, as well as electrical simulations of delay, energy, and jitter.
6th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS), 05/2011
ABSTRACT: Increasing device counts and design complexity are posing significant challenges to post-silicon validation. Bug diagnosis is the most difficult step during post-silicon validation. Limited reproducibility and low testing speeds are common limitations of current testing techniques. Moreover, low observability defeats full-speed testing approaches. Modern solutions like on-chip trace buffers alleviate these issues but are unable to store long activity traces. As a consequence, the cost of post-silicon validation now represents a large fraction of the total design cost. This work describes a hybrid post-silicon approach to validate a modern load-store queue. We use an effective error detection mechanism and an expandable logging mechanism to observe the microarchitectural activity for long periods of time, at full processor speed. Validation is performed by analyzing the logged activity with a diagnosis algorithm: memory ordering is checked to locate the root cause of errors.
17th IEEE International Symposium on High Performance Computer Architecture (HPCA), 03/2011
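A minimal sketch of such a diagnosis pass, assuming a value-based log format (commit order, operation kind, address, value) and single-threaded replay; both are illustrative choices, not the paper's exact record layout:

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

// One logged memory operation: commit order, kind, address, data value.
struct MemOp { uint64_t seq; bool isStore; uint64_t addr; uint64_t value; };

// Offline diagnosis pass over the activity log: every load must return the
// value written by the youngest older store to the same address (loads to
// addresses never stored to are taken on faith from memory here).
bool checkMemoryOrdering(const std::vector<MemOp>& log) {
    std::unordered_map<uint64_t, uint64_t> lastValue;  // addr -> latest store value
    bool ok = true;
    for (const MemOp& op : log) {
        if (op.isStore) {
            lastValue[op.addr] = op.value;
        } else {
            auto it = lastValue.find(op.addr);
            if (it != lastValue.end() && it->second != op.value) {
                std::cout << "violation: load seq " << op.seq
                          << " read stale data at addr " << op.addr << "\n";
                ok = false;
            }
        }
    }
    return ok;
}

int main() {
    std::vector<MemOp> log = {
        {1, true,  0x100, 7},   // store 7 to 0x100
        {2, true,  0x100, 9},   // store 9 to 0x100
        {3, false, 0x100, 7},   // load observed 7: forwarded from the wrong store
    };
    std::cout << (checkMemoryOrdering(log) ? "log OK" : "bug found") << "\n";
    return 0;
}
```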
ABSTRACT: Layout regularity is being introduced progressively by integrated circuit manufacturers to reduce the growing systematic process variations of the deep sub-micron era. In this paper, we focus on a scenario where layout regularity must be pushed to the limit to deal with severe systematic process variations in future technology nodes. With this objective, we propose and evaluate a new regular layout style, the Via-Configurable Transistor Array (VCTA), that maximizes regularity at both the device and interconnect levels. To assess the tradeoffs of VCTA's maximal layout regularity, we implement 32-bit adders in the 90 nm technology node for VCTA and compare them with standard-cell implementations. For this purpose, we study the impact of photolithography proximity and coma effects on channel-length variations, and the impact of shallow-trench-isolation mechanical stress on threshold-voltage variations. We demonstrate that both variations, which are important sources of circuit energy and delay variability, are minimized through VCTA regularity.
18th IEEE/IFIP VLSI System on Chip Conference (VLSI-SoC), 10/2010
ABSTRACT: Technology scaling leads to the phase-out of burn-in and to higher post-silicon test complexity, which increase the in-the-field failure rate due to latent defects and actual errors, respectively. As a consequence, current reliability qualification methods will likely become infeasible. Microarchitectural knowledge of application runtime behavior makes it possible for low-cost continuous online testing techniques to detect hard errors in the field. Whereas data can be protected with redundancy (such as parity or ECC), there is a lack of mechanisms for control logic. This paper proposes a microarchitectural approach for validating that the memory order buffer logic works correctly. Our design relies on a small cache-like structure that keeps track of the last store to each cached address. Each load is checked to have obtained its data from the youngest older producing store. We present three different implementations of this idea, offering different trade-offs between error coverage, performance overhead, and design complexity.
IEEE Transactions on Computers, 06/2010.
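A minimal sketch of the online check, assuming the cache-like structure records store sequence numbers and that each committing load can report which store (if any) forwarded its data; the unbounded map stands in for the paper's small, lossy structure:

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Tiny checker: remembers, per address, the sequence number of the last
// committed store. A committing load reports which store it forwarded from
// (kNoStore if it read the cache); disagreement with the tracker exposes
// broken store-to-load forwarding or memory-ordering logic.
class StoreTracker {
public:
    static constexpr uint64_t kNoStore = ~0ull;

    void onStoreCommit(uint64_t addr, uint64_t seq) { last_[addr] = seq; }

    bool onLoadCommit(uint64_t addr, uint64_t forwardedFromSeq) const {
        auto it = last_.find(addr);
        uint64_t expected = (it == last_.end()) ? kNoStore : it->second;
        return forwardedFromSeq == expected;  // false -> raise error, recover
    }

private:
    std::unordered_map<uint64_t, uint64_t> last_;  // addr -> last store seq
};

int main() {
    StoreTracker t;
    t.onStoreCommit(0x40, 10);
    t.onStoreCommit(0x40, 12);                      // youngest older store
    std::cout << t.onLoadCommit(0x40, 12) << "\n";  // 1: correct forwarding
    std::cout << t.onLoadCommit(0x40, 10) << "\n";  // 0: stale store detected
    return 0;
}
```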
ABSTRACT: Soft errors are an important challenge in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce a processor's vulnerability to soft errors. Our designs are motivated by the fact that many of the values produced and consumed in a processor are narrow, so their upper-order bits are meaningless. Soft errors caused by particle strikes to these upper-order bits can be avoided simply by identifying narrow values. Alternatively, soft errors in narrow values can be detected or corrected by replicating the vulnerable portion of the value inside the storage space provided for the upper-order bits of these operands. As a faster but less fault-tolerant alternative to ECC and parity, we offer a variety of schemes that exploit narrow values and analyze their efficiency in reducing the soft-error vulnerability of different data-holding components of a processor. On average, techniques that exploit the narrowness of values provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage for single-bit upsets in the first-level data cache across the SPEC2K benchmarks. In other structures, such as the immediate field of the issue queue, an average error detection rate of 64 percent is achieved.
IEEE Transactions on Dependable and Secure Computing, 10/2009.
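A minimal sketch of the detection variant for one 32-bit storage cell, assuming a 16-bit narrowness threshold and an (unmodeled) per-entry flag marking replicated values; both choices are illustrative:

```cpp
#include <cstdint>
#include <iostream>
#include <optional>

// A 32-bit value is "narrow" when its upper 16 bits are pure sign extension;
// those bits are then free to hold a second copy of the meaningful half.
static bool isNarrow(uint32_t v) {
    uint32_t upper = v >> 16;
    return upper == ((v & 0x8000) ? 0xFFFFu : 0x0000u);
}

// Store: duplicate the meaningful lower half into the unused upper half.
// A real design needs an extra flag bit per entry to mark encoded values.
static uint32_t encodeNarrow(uint32_t v) {
    return ((v & 0xFFFF) << 16) | (v & 0xFFFF);
}

// Load: the two copies must agree; a mismatch detects a single-bit upset in
// either half (detection only -- correction needs a third copy or parity).
static std::optional<int32_t> decodeNarrow(uint32_t stored) {
    uint16_t lo = stored & 0xFFFF, hi = stored >> 16;
    if (lo != hi) return std::nullopt;
    return static_cast<int32_t>(static_cast<int16_t>(lo));  // restore sign ext.
}

int main() {
    uint32_t v = 0xFFFFFFF5;                                 // -11, narrow
    std::cout << isNarrow(v) << "\n";                        // 1
    uint32_t cell = encodeNarrow(v);
    cell ^= (1u << 20);                                      // simulate a particle strike
    std::cout << decodeNarrow(cell).has_value() << "\n";     // 0: upset detected
    return 0;
}
```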
ABSTRACT: Electromigration is a major source of wire and via failure. Refueling undoes EM for bidirectional wires and power/ground grids, some of a chip's most vulnerable wires. Refueling exploits EM's self-healing effect by balancing the amount of current flowing in the two directions of a wire. It can significantly extend a wire's lifetime while reducing the chip area devoted to wires.
ABSTRACT: Technology scaling leads to the phase-out of burn-in and to higher post-silicon test complexity, which increase the in-the-field error rate due to latent defects and actual errors, respectively. As a consequence, current reliability qualification methods will likely become infeasible. Microarchitectural knowledge of application runtime behavior makes it possible for low-cost continuous online testing techniques to cope with hard errors in the field. Whereas data can be protected with redundancy (such as parity or ECC), there is a lack of mechanisms for control logic. This paper proposes a microarchitectural approach for validating that the memory order buffer logic works correctly.
IEEE International Test Conference (ITC), 11/2008
ABSTRACT: Multi-core microprocessors require drastically reducing the FIT (failures-in-time) rate per core to enable a larger number of cores within a given FIT budget. Since large arrays like caches and register files are typically protected with either ECC or parity, the issue system becomes one of the largest contributors to the core's FIT rate. Soft errors are an important concern in contemporary microprocessors: particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In addition, the number of hard errors in the field is expected to grow as burn-in becomes less effective, and continuous device shrinking increases the likelihood of in-the-field failures due to rather small defects exacerbated by degradation. This paper proposes online mechanisms to detect in-the-field errors in the issue system, recover to a consistent state, and classify and confine those errors, for both in-order and out-of-order cores. Such mechanisms provide high coverage at a small cost.
IEEE International Conference on Computer Design (ICCD), 11/2008
ABSTRACT: Technology scaling leads to the phase-out of burn-in and increasing post-silicon test complexity, which increase the in-the-field error rate due to both latent defects and actual errors. As a consequence, there is a growing need for continuous online testing techniques to cope with hard errors in the field. Similar techniques are needed for detecting soft errors in logic, whose error rate is expected to rise in future technologies. Cache memories, which occupy most of the area of the chip, are typically protected with parity or ECC, but most of the wires as well as some combinational blocks remain unprotected against both soft and hard errors. This paper presents a set of techniques to detect and confine hard and soft errors in cache memories, in combination with parity/ECC, at very low cost. By means of hard signatures in data rows and error tracking, faults can be detected, properly classified, and confined for hardware reconfiguration.
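A minimal sketch of the classification side of such a scheme, assuming parity/ECC or a row signature has already flagged the error and that two errors at the same row suffice to declare a hard fault; the threshold and the unbounded counter map are illustrative:

```cpp
#include <cstdint>
#include <iostream>
#include <map>

// Error tracker: repeated errors at the same cache row point to a hard
// fault and trigger confinement (e.g., disabling the row before hardware
// reconfiguration); a first occurrence is treated as a soft error.
class CacheErrorTracker {
public:
    enum class Kind { Soft, Hard };

    Kind report(uint32_t row) {
        return (++count_[row] >= kHardThreshold) ? Kind::Hard : Kind::Soft;
    }

private:
    static constexpr unsigned kHardThreshold = 2;  // assumption
    std::map<uint32_t, unsigned> count_;           // row index -> error count
};

int main() {
    CacheErrorTracker t;
    std::cout << (t.report(17) == CacheErrorTracker::Kind::Soft) << "\n";  // 1: first hit
    std::cout << (t.report(17) == CacheErrorTracker::Kind::Hard) << "\n";  // 1: same row again
    return 0;
}
```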
ABSTRACT: This paper proposes the fuse, a technique to anticipate failures due to degradation in any ALU (arithmetic logic unit), and particularly in an adder. The fuse consists of a replica of the weakest transistor in the adder and the circuitry required to measure its degradation. By mimicking the behavior of the replicated transistor, the fuse anticipates failure shortly before the first failure in the adder appears; hence, data corruption and program crashes can be avoided. Our results show that the fuse anticipates the failure in more than 99.9% of the cases after 96.6% of the lifetime, even under pessimistic random within-die variations.
ABSTRACT: In this paper, we present a global view of the issues outlined above, as well as some directions to address them. First, the most important sources of failure (SOF) are presented, together with their impact on CMOS technology. Then, techniques and key parameters to measure degradation due to the different SOF are introduced, and microarchitectural approaches to mitigate degradation are outlined. The problem of error detection and anticipation is discussed, along with the pros and cons of different types of mechanisms to perform such detection and anticipation. Finally, we present the whole picture, in which performance and reliability must be traded off carefully, and point out some directions for using information about detected errors and the amount of degradation of each component to configure a multi-core such that performance is maximized without compromising reliability.
ABSTRACT: Narrow values, which can be represented with fewer bits than the full machine width, occur very frequently in programs. On the other hand, clustering mechanisms enable cost- and performance-effective scaling of processor back-end features. These attributes can be combined synergistically to design special clusters operating on narrow values (a.k.a. helper clusters), potentially providing performance benefits. We complement a 32-bit monolithic processor with a low-complexity 8-bit helper cluster. Then, as our main focus, we propose various techniques to select suitable instructions to execute in the data-width-based clusters. We add data-width information as another instruction steering metric and introduce new data-width-based selection algorithms that also consider dependences, inter-cluster communication, and load imbalance. Using these techniques, the performance of a wide range of workloads increases substantially; the helper cluster achieves an average speedup of 11% across 412 applications. When focusing on integer applications, the average speedup reaches 22%.
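A toy version of such a steering heuristic, assuming a simple additive score over the three criteria named above (operand width, dependence locality, load balance); the weights are illustrative, not the paper's:

```cpp
#include <iostream>

// Steering decision for a machine with a 32-bit main cluster and an 8-bit
// helper cluster: prefer the helper for narrow instructions unless
// dependences or load imbalance argue otherwise.
enum Cluster { MAIN, HELPER };

Cluster steer(bool narrowInstr, int depsInHelper, int depsInMain,
              int helperOccupancy, int mainOccupancy) {
    int score = 0;
    if (narrowInstr) score += 2;                 // width criterion
    score += depsInHelper - depsInMain;          // keep dependences local
    score += mainOccupancy - helperOccupancy;    // avoid load imbalance
    return score > 0 ? HELPER : MAIN;
}

int main() {
    // Narrow instruction, producers in the helper, helper lightly loaded:
    std::cout << (steer(true, 2, 0, 1, 3) == HELPER) << "\n";  // 1
    // Wide instruction with producers in the main cluster stays put:
    std::cout << (steer(false, 0, 2, 1, 1) == MAIN) << "\n";   // 1
    return 0;
}
```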
ABSTRACT: Memory structures consume an important fraction of total processor energy. One way to reduce the energy consumed by cache memories is to lower their supply voltage and/or raise their threshold voltage, at the expense of access time. We propose to divide the L1 data cache into two cache modules for a clustered VLIW processor consisting of two clusters. The division is done on a per-variable basis, so that the address of a datum determines its location. Each cache module is assigned to a cluster and can be set up either as a fast, power-hungry module or as a slow, power-aware module. We also present compiler techniques to distribute variables between the two cache modules and generate code accordingly. We have explored several cache configurations using the Mediabench suite and observed that the best distributed cache organization outperforms traditional cache organizations by 19%-31% in energy-delay and by 11%-29% in energy-delay². In addition, we explore a reconfigurable distributed cache, where the cache can be reconfigured on a context switch; this reconfigurable scheme further outperforms the best distributed organization by 3%-4%.
14th International Conference on Parallel Architectures and Compilation Techniques (PACT), 10/2005
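A minimal sketch of the address-to-module mapping, assuming a single boundary address separating the two regions; the boundary constant is a placeholder for whatever placement the compiler actually computes:

```cpp
#include <cstdint>
#include <iostream>

// Variable-based split: the compiler places each variable in one of two
// address regions, so the module that serves an access follows directly
// from the address -- no per-access lookup structure is needed.
enum Module { FAST_MODULE, SLOW_MODULE };          // power-hungry vs power-aware

constexpr uint64_t kRegionBoundary = 0x40000000;   // illustrative placeholder

Module moduleFor(uint64_t addr) {
    // Latency-critical variables go below the boundary (fast module),
    // latency-tolerant ones above it (slow, low-power module).
    return addr < kRegionBoundary ? FAST_MODULE : SLOW_MODULE;
}

int main() {
    std::cout << (moduleFor(0x00001000) == FAST_MODULE) << "\n";  // 1
    std::cout << (moduleFor(0x80000000) == SLOW_MODULE) << "\n";  // 1
    return 0;
}
```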
ABSTRACT: Current microprocessors are becoming more vulnerable to cosmic particle strikes and parameter variations. Particle strikes may cause soft (transient) errors, whereas high variability (due to process, temperature, and voltage) may transform non-critical paths into critical paths, resulting in timing errors. This paper proposes a design that exploits the benefits of clustering for detecting and recovering from soft and timing errors in the back-end. We propose to use some of the regular back-end clusters for error detection and correction, having them work as checker back-ends.
ABSTRACT: The TRAMS (Terascale Reliable Adaptive Memory Systems) project addresses, in an evolutionary way, the ultimate CMOS scaling technologies, and paves the way for the most promising revolutionary beyond-CMOS technologies. In this abstract, we show the significant variability levels of future 18 and 13 nm bulk-CMOS device technologies, as well as their dramatic effect on the yield of memory cells and circuits.