Kypros Constantinides

Concordia University–Ann Arbor, Ann Arbor, Michigan, United States

Publications (16) · 2.07 Total Impact

  • Source
    Yiannakis Sazeides · Andreas Moustakas · Kypros Constantinides · Marios Kleanthous
    ABSTRACT: This work investigates the potential of direction-correlations to improve branch prediction. There are two types of direction-correlation: affectors and affectees. This work considers for the first time their implications at a basic level. These correlations are determined based on dataflow graph information and are used to select the subset of global branch history bits used for prediction. If this subset is small then affectors and affectees can be useful to cut down learning time, and reduce aliasing in prediction tables. This paper extends previous work explaining why and how correlation-based predictors work by analyzing the properties of direction-correlations. It also shows that branch history selected based on direction-correlations improves the accuracy of the limit and realistic conditional branch predictors, that won at the recent branch prediction contest, by up to 30% and 17% respectively. The findings in this paper call for the investigation of predictors that can efficiently learn correlations that may be non-consecutive (i.e. with holes between them) from long branch history.
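The history-selection mechanism this abstract describes can be pictured as a gshare-style predictor that hashes only a masked subset of global history bits into its table index. The sketch below is an illustrative toy, not the paper's predictors; the fixed mask standing in for affector/affectee selection, the class name, and the 2-bit counters are all assumptions.

```python
# Toy gshare-style predictor that indexes its table with a selected
# subset of global history bits, mimicking history selection based on
# direction-correlations. Illustrative only.

class MaskedGshare:
    def __init__(self, table_bits, history_mask):
        self.table_bits = table_bits
        self.history_mask = history_mask  # 1-bits mark "correlated" history positions
        self.history = 0
        self.table = [2] * (1 << table_bits)  # 2-bit counters, weakly taken

    def _compress(self, history):
        # Keep only the masked (affector/affectee) bits, packed together,
        # so uncorrelated history bits cannot cause table aliasing.
        out = bit = 0
        for i in range(self.table_bits * 2):
            if (self.history_mask >> i) & 1:
                out |= ((history >> i) & 1) << bit
                bit += 1
        return out

    def predict(self, pc):
        idx = (pc ^ self._compress(self.history)) & ((1 << self.table_bits) - 1)
        return self.table[idx] >= 2  # taken if counter in the upper half

    def update(self, pc, taken):
        idx = (pc ^ self._compress(self.history)) & ((1 << self.table_bits) - 1)
        self.table[idx] = min(3, self.table[idx] + 1) if taken else max(0, self.table[idx] - 1)
        self.history = ((self.history << 1) | int(taken)) & ((1 << (self.table_bits * 2)) - 1)
```

With a small mask, few history bits reach the index, which is exactly why a well-chosen subset can shorten learning time and reduce aliasing.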
  • Kypros Constantinides · Todd M. Austin
    ABSTRACT: As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common, to the point of threatening yield rates and product lifetimes. Introspective software mechanisms hold great promise to address these reliability challenges with both low cost and high coverage. To address these challenges, we have developed a novel instruction set enhancement, called Access-Control Extensions (ACE), that can access and control a microprocessor's internal state. Using ACE technology, special firmware can periodically probe the microprocessor during execution to locate run-time faults, repair design errors (even those discovered in the field), and streamline manufacturing tests.
    Proceedings of the 47th Design Automation Conference, DAC 2010, Anaheim, California, USA, July 13-18, 2010; 01/2010
  • Source
    Kypros Constantinides · Onur Mutlu · Todd M. Austin · Valeria Bertacco
    ABSTRACT: This work proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called Access-Control Extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade off performance with reliability without requiring any change to the hardware. We describe and evaluate different execution models for using the ACE framework. We also describe how the proposed ACE framework can be extended and utilized to improve the quality of post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22 percent of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on a detailed register transfer level (RTL) implementation of our technique, we find its area and power consumption overheads to be modest, with a 5.8 percent increase in total chip area and a 4 percent increase in the chip's overall power consumption.
    Index Terms: Reliability, hardware defects, online defect detection, testing, online self-test, post-silicon debugging, manufacturing test.
    IEEE Transactions on Computers 08/2009; 58:1063-1079. DOI:10.1109/TC.2009.52 · 1.47 Impact Factor
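The suspend/test/repair cycle the ACE firmware performs can be illustrated with a toy model. Everything below (the module and test dictionaries, the stuck-at-zero multiplier) is invented for illustration; the real ACE framework issues directed scan tests through new ISA instructions, not Python callables.

```python
# Toy illustration of a periodic test epoch: apply a directed test to
# each hardware module and report the modules whose response signature
# does not match. All names here are illustrative stand-ins.

def run_test_epoch(modules, directed_tests):
    """Apply each module's directed test; return the modules that failed."""
    defective = []
    for name, respond in modules.items():
        stimulus, expected = directed_tests[name]
        # Scan in the stimulus, scan out the response, compare signatures.
        if respond(stimulus) != expected:
            defective.append(name)
    return defective

# Toy "hardware": the ALU is healthy, the multiplier output is stuck at 0.
modules = {
    "alu": lambda x: x + 1,
    "mul": lambda x: 0,
}
directed_tests = {
    "alu": (41, 42),   # expects 41 + 1
    "mul": (6, 36),    # expects 6 * 6
}
# A failing signature pinpoints the broken unit, which firmware would
# then disable through resource reconfiguration before resuming.
```

Because the test routines live in firmware rather than hardware, they can be upgraded in the field, which is the flexibility argument the abstract makes.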
  • Source
    ABSTRACT: Extreme scaling practices in silicon technology are quickly leading to integrated circuit components with limited reliability, where phenomena such as early-transistor failures, gate-oxide wearout, and transient faults are becoming increasingly common. In order to overcome these issues and develop robust design techniques for large-market silicon ICs, it is necessary to rely on accurate failure analysis frameworks which enable design houses to faithfully evaluate both the impact of a wide range of potential failures and the ability of candidate reliability mechanisms to overcome them. Unfortunately, while failure rates are already growing beyond economically viable limits, no fault analysis framework is yet available that is both accurate and able to operate on a complex integrated system. To address this void, we present CrashTest, a fast, high-fidelity and flexible resiliency analysis system. Given a hardware description model of the design under analysis, CrashTest is capable of orchestrating and performing a comprehensive design resiliency analysis by examining how the design reacts to faults while running software applications. Upon completion, CrashTest provides a high-fidelity analysis report obtained by performing a fault injection campaign at the gate-level netlist of the design. The fault injection and analysis process is significantly accelerated by the use of an FPGA hardware emulation platform. We conducted experimental evaluations on a range of systems, including a complex LEON-based system-on-chip, and evaluated the impact of gate-level injected faults at the system level. We found that CrashTest is 16-90x faster than an equivalent software-based framework when analyzing designs through direct primary I/Os. As shown by our LEON-based SoC experiments, CrashTest exhibits emulation speeds that are six orders of magnitude faster than simulation.
    IEEE International Conference on Computer Design (ICCD 2008); 11/2008
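A gate-level stuck-at fault injection campaign of the kind CrashTest performs can be sketched in software. The real framework instruments synthesized netlists and runs them on an FPGA; the tiny full-adder netlist and the campaign loop below are only an illustrative software analogue.

```python
# Toy gate-level netlist evaluator with stuck-at fault injection, plus a
# campaign that counts which injected faults propagate to a primary
# output. Illustrative sketch, not the CrashTest implementation.

def eval_netlist(netlist, inputs, fault=None):
    """Evaluate gates in topological order, optionally forcing one net."""
    nets = dict(inputs)
    for out, (op, a, b) in netlist:
        if op == "AND":
            nets[out] = nets[a] & nets[b]
        elif op == "OR":
            nets[out] = nets[a] | nets[b]
        elif op == "XOR":
            nets[out] = nets[a] ^ nets[b]
        if fault is not None and out == fault[0]:
            nets[out] = fault[1]    # stuck-at-0 / stuck-at-1 injection
    return nets

# Full adder as (output_net, (gate_type, input_a, input_b)) tuples.
adder = [
    ("s1", ("XOR", "a", "b")), ("sum", ("XOR", "s1", "cin")),
    ("c1", ("AND", "a", "b")), ("c2", ("AND", "s1", "cin")),
    ("cout", ("OR", "c1", "c2")),
]

def campaign(netlist, vectors, outputs):
    """Count stuck-at faults whose effect reaches a primary output."""
    fault_sites = {out for out, _ in netlist}
    detected = 0
    for net in fault_sites:
        for stuck_value in (0, 1):
            for v in vectors:
                good = eval_netlist(netlist, v)
                bad = eval_netlist(netlist, v, fault=(net, stuck_value))
                if any(good[o] != bad[o] for o in outputs):
                    detected += 1   # observable at an output: stop probing
                    break
    return detected, 2 * len(fault_sites)
```

Running every fault site against every input vector is exactly the part that explodes in software for a real design, which is why the paper moves the campaign onto an FPGA emulation platform.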
  • Source
    Yiannakis Sazeides · Andreas Moustakas · Kypros Constantinides · Marios Kleanthous
    ABSTRACT: This work investigates the potential of direction-correlations to improve branch prediction. There are two types of direction-correlation: affectors and affectees. This work considers for the first time their implications at a basic level. These correlations are determined based on dataflow graph information and are used to select the subset of global branch history bits used for prediction. If this subset is small then affectors and affectees can be useful to cut down learning time, and reduce aliasing in prediction tables. This paper extends previous work explaining why and how correlation-based predictors work by analyzing the properties of direction-correlations. It also shows that branch history selected using oracle knowledge of direction-correlations improves the accuracy of the limit and realistic conditional branch predictors, that won at the recent branch prediction contest, by up to 30% and 17% respectively. The findings in this paper call for the investigation of predictors that can efficiently learn correlations from long branch history that may be non-consecutive, with holes between them.
    High Performance Embedded Architectures and Compilers, Third International Conference, HiPEAC 2008, Göteborg, Sweden, January 27-29, 2008, Proceedings; 01/2008
  • Source
    Kypros Constantinides · Onur Mutlu · Todd Austin · Valeria Bertacco
    ABSTRACT: As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common. Such defects are bound to hinder the correct operation of future processor systems, unless new online techniques become available to detect and to tolerate them while preserving the integrity of software applications running on the system. This paper proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called Access-Control Extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade off performance with reliability without requiring any change to the hardware. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22% of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5%. Based on a detailed RTL implementation of our technique, we find its area overhead to be quite modest, with only a 5.8% increase in total chip area.
    40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007); 01/2008
  • Source
    Kypros Constantinides · Onur Mutlu · Todd M. Austin
    ABSTRACT: Higher levels of resource integration and the addition of new features in modern multi-processors put a significant pressure on their verification. Although a large amount of resources and time are devoted to the verification phase of modern processors, many design bugs escape the verification process and slip into processors operating in the field. These design bugs often lead to lower quality products, lower customer satisfaction, diminishing brand/company reputation, or even expensive product recalls. This paper proposes a flexible, low-overhead mechanism to detect the occurrence of design bugs during on-line operation. First, we analyze the actual design bugs found and fixed in a commercial chip multiprocessor, Sun's OpenSPARC T1, to understand the behavior and characteristics of design bugs. Our RTL analysis of design bugs shows that the number of signals that need to be monitored to detect design bugs is significantly larger than suggested by previous studies that analyzed design bugs at a higher level using processor errata sheets. Second, based on the insights obtained from our analyses, we propose a programmable, distributed online design bug detection mechanism that incorporates the monitoring of bugs into the flip-flops of the design. The key contribution of our mechanism is its ability to monitor all control signals in the design rather than a set of signals selected at design time. As a result, it is very flexible: when a bug is discovered after the processor is shipped, it can be detected by monitoring the set of control signals that trigger the design bug. We develop an RTL prototype implementation of our mechanism on the OpenSPARC T1 chip multiprocessor. We found its area overhead to be 10% and its power consumption overhead to be 3.5% over the whole OpenSPARC T1 chip.
    41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), November 8-12, 2008, Lake Como, Italy; 01/2008
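The flip-flop monitoring idea above can be pictured as signature matching: a design-bug "signature" names the control signals and values under which the bug fires, and the detector triggers when the monitored values all match. The sketch below is a hedged software analogue; the signal names and the signature format are invented for illustration.

```python
# Toy model of programmable bug detection: a signature is a set of
# required (control signal, value) pairs, loaded after the bug is
# discovered in the field. Signal names are illustrative assumptions.

def bug_triggered(signals, signature):
    """signals: control-signal name -> sampled flip-flop value this cycle."""
    return all(signals.get(name) == value for name, value in signature.items())

# Hypothetical field-programmed signature: the bug fires when a pipeline
# flush overlaps an outstanding cache miss.
signature = {"flush_valid": 1, "miss_pending": 1}
```

Because every control signal is monitorable, a new signature can be loaded for a bug found after shipping, which is the flexibility claim the abstract makes.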
  • Source
    ABSTRACT: As silicon technologies move into the nanometer regime, transistor reliability is expected to wane as devices become subject to extreme process variation, particle-induced transient errors, and transistor wear-out. Unless these challenges are addressed, computer vendors can expect low yields and short mean-times-to-failure. In this article, we examine the challenges of designing complex computing systems in the presence of transient and permanent faults. We select one small aspect of a typical chip multiprocessor (CMP) system to study in detail, a single CMP router switch. Our goal is to design a BulletProof CMP switch architecture capable of tolerating significant levels of various types of defects. We first assess the vulnerability of the CMP switch to transient faults. To better understand the impact of these faults, we evaluate our CMP switch designs using circuit-level timing on detailed physical layouts. Our infrastructure represents a new level of fidelity in architectural-level fault analysis, as we can accurately track faults as they occur, noting whether they manifest or not, because of masking in the circuits, logic, or architecture. Our experimental results are quite illuminating. We find that transient faults, because of their fleeting nature, are of little concern for our CMP switch, even within large switch fabrics with fast clocks. Next, we develop a unified model of permanent faults, based on the time-tested bathtub curve. Using this convenient abstraction, we analyze the reliability versus area tradeoff across a wide spectrum of CMP switch designs, ranging from unprotected designs to fully protected designs with on-line repair and recovery capabilities. Protection is considered at multiple levels from the entire system down through arbitrary partitions of the design. We find that designs are attainable that can tolerate a larger number of defects with less overhead than naïve triple-modular redundancy, using domain-specific techniques such as end-to-end error detection, resource sparing, automatic circuit decomposition, and iterative diagnosis and reconfiguration.
    ACM Transactions on Architecture and Code Optimization 03/2007; 4. DOI:10.1145/1216544.1216545 · 0.60 Impact Factor
  • Source
    Paul Racunas · Kypros Constantinides · Srilatha Manne · Shubhendu S. Mukherjee
    ABSTRACT: Fault screeners are a new breed of fault identification technique that can probabilistically detect if a transient fault has affected the state of a processor. We demonstrate that fault screeners function because of two key characteristics. First, we show that much of the intermediate data generated by a program inherently falls within certain consistent bounds. Second, we observe that these bounds are often violated by the introduction of a fault. Thus, fault screeners can identify faults by directly watching for any data inconsistencies arising in an application's behavior. We present an idealized algorithm capable of identifying over 85% of injected faults on the SpecInt suite and over 75% overall. Further, in a realistic implementation on a simulated Pentium-III-like processor, about half of the errors due to injected faults are identified while still in speculative state. Errors detected this early can be eliminated by a pipeline flush. In this paper, we present several hardware-based versions of this screening algorithm and show that flushing the pipeline every time the hardware screener triggers reduces overall performance by less than 1%.
    13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 10-14 February 2007, Phoenix, Arizona, USA; 01/2007
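The bounds-based screening idea in this abstract can be sketched directly: learn the range of values each static instruction produces during fault-free execution, then flag out-of-range values as suspected transient faults. The per-PC table and the training scheme below are simplifying assumptions, not the paper's hardware screener.

```python
# Minimal sketch of a value-bounds fault screener. Illustrative only:
# real screeners are hardware structures with finite, approximate state.

class BoundsScreener:
    def __init__(self):
        self.bounds = {}   # pc -> (lo, hi) observed during training

    def train(self, pc, value):
        lo, hi = self.bounds.get(pc, (value, value))
        self.bounds[pc] = (min(lo, value), max(hi, value))

    def check(self, pc, value):
        """True if the value violates learned bounds (possible fault)."""
        if pc not in self.bounds:
            return False   # no history for this instruction: no verdict
        lo, hi = self.bounds[pc]
        return not (lo <= value <= hi)
```

A bit flip in a high-order result bit typically pushes the value far outside the learned range, which is why such screeners catch a large fraction of injected faults while rarely firing on fault-free data.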
  • Source
    ABSTRACT: Extreme transistor scaling trends in silicon technology are soon to reach a point where manufactured systems will suffer from limited device reliability and severely reduced life-time, due to early transistor failures, gate oxide wear-out, manufacturing defects, and radiation-induced soft errors (SER). In this paper we present a low-cost technique to harden a microprocessor pipeline and caches against these reliability threats. Our approach utilizes online built-in self-test (BIST) and microarchitectural checkpointing to detect and diagnose silicon defects or SER events and to recover the impaired computation. The approach works by periodically testing the processor to determine if the system is broken. If so, we reconfigure the processor to avoid using the broken component. A similar mechanism is used to detect SER faults, with the difference that recovery is implemented by re-execution. By utilizing low-cost techniques to address defects and SER, we keep protection costs significantly lower than traditional fault-tolerance approaches while providing high levels of coverage for a wide range of faults. Using detailed gate-level simulation, we find that our approach provides 95% and 99% coverage for silicon defects and SER events, respectively, with only a 14% area overhead.
    2007 Design, Automation and Test in Europe Conference and Exposition (DATE 2007), April 16-20, 2007, Nice, France; 01/2007
  • Source
    Smitha Shyam · Kypros Constantinides · Sujay Phadke · Valeria Bertacco · Todd M. Austin
    ABSTRACT: The sustained push toward smaller and smaller technology sizes has reached a point where device reliability has moved to the forefront of concerns for next-generation designs. Silicon failure mechanisms, such as transistor wearout and manufacturing defects, are a growing challenge that threatens the yield and product lifetime of future systems. In this paper we introduce the BulletProof pipeline, the first ultra low-cost mechanism to protect a microprocessor pipeline and on-chip memory system from silicon defects. To achieve this goal we combine area-frugal on-line testing techniques and system-level checkpointing to provide the same guarantees of reliability found in traditional solutions, but at much lower cost. Our approach utilizes a microarchitectural checkpointing mechanism which creates coarse-grained epochs of execution, during which distributed on-line built-in self-test (BIST) mechanisms validate the integrity of the underlying hardware. In case a failure is detected, we rely on the natural redundancy of instruction-level parallel processors to repair the system so that it can still operate in a degraded performance mode. Using detailed circuit-level and architectural simulation, we find that our approach provides very high coverage of silicon defects (89%) with little area cost (5.8%). In addition, when a defect occurs, the subsequent degraded mode of operation was found to have only moderate performance impacts (from 4% to 18% slowdown).
    Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA, October 21-25, 2006; 10/2006
  • Source
    ABSTRACT: As silicon technologies move into the nanometer regime, transistor reliability is expected to wane as devices become subject to extreme process variation, particle-induced transient errors, and transistor wear-out. Unless these challenges are addressed, computer vendors can expect low yields and short mean-times-to-failure. In this paper, we examine the challenges of designing complex computing systems in the presence of transient and permanent faults. We select one small aspect of a typical chip multiprocessor (CMP) system to study in detail, a single CMP router switch. To start, we develop a unified model of faults, based on the time-tested bathtub curve. Using this convenient abstraction, we analyze the reliability versus area tradeoff across a wide spectrum of CMP switch designs, ranging from unprotected designs to fully protected designs with online repair and recovery capabilities. Protection is considered at multiple levels from the entire system down through arbitrary partitions of the design. To better understand the impact of these faults, we evaluate our CMP switch designs using circuit-level timing on detailed physical layouts. Our experimental results are quite illuminating. We find that designs are attainable that can tolerate a larger number of defects with less overhead than naive triple-modular redundancy, using domain-specific techniques such as end-to-end error detection, resource sparing, automatic circuit decomposition, and iterative diagnosis and reconfiguration.
    The Twelfth International Symposium on High-Performance Computer Architecture (HPCA-12); 03/2006
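The "bathtub curve" fault model mentioned above combines a decreasing infant-mortality failure rate, a constant useful-life rate, and an increasing wear-out rate. A minimal sketch, with all parameter values chosen purely for illustration, models each end of the bathtub as a Weibull hazard:

```python
# Toy bathtub-curve hazard model: sum of a decaying infant-mortality
# term (Weibull shape < 1), a constant random-fault term, and a growing
# wear-out term (Weibull shape > 1). Parameters are illustrative.

def weibull_hazard(t, shape, scale):
    # h(t) = (k / s) * (t / s)^(k - 1); shape < 1 decays, shape > 1 grows.
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_rate(t):
    infant = weibull_hazard(t, shape=0.5, scale=100.0)    # early failures
    useful_life = 1e-3                                    # constant random faults
    wearout = weibull_hazard(t, shape=3.0, scale=5000.0)  # aging failures
    return infant + useful_life + wearout
```

The rate is high at the start, dips through the useful-life phase, and rises again late, which is the shape that makes a single abstraction cover manufacturing defects and wearout alike.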
  • Source
    ABSTRACT: Recently, there has been a growing concern that, in relation to process technology scaling, the soft-error rate will become a major challenge in designing reliable systems. In this work, we introduce a high-fidelity, high-performance simulation infrastructure for quantifying the derating effects on soft-error rates while considering microarchitectural, timing and logic-related masking, using realistic workloads on a CMP switch design. We use a gate-level model for the CMP switch design, enabling us to inject faults into blocks of combinational logic. We are then able to track logic-related and time-related fault masking, as well as microarchitectural-related fault masking, at the architecture level. We find that for complex designs, logic- and time-related fault masking account for more than 50% of the masked faults. We also observe that only 3-4% of the injected faults propagate an error to the design's output and cause an error in the application's execution, resulting in a derating factor of 30. From our experiments, we also demonstrate that soft-error derating effects highly depend on the design's characteristics and utilization.
  • Source
    Kypros Constantinides
    ABSTRACT: One of the major driving forces of the semiconductor industry is the continuous scaling of the silicon process technology. Over the last four decades, the scaling into a new silicon technology every few years offered smaller and faster transistors that made possible the development of high-performance microprocessors. This technological achievement fueled the widespread adoption of microprocessor-based products in applications that touch every aspect of our life. However, many device experts warn that the continued transistor size scaling into smaller dimensions will inevitably result in silicon technologies that are much less reliable than the current ones. Microprocessors manufactured in future silicon technologies will likely experience failures due to silicon defects. In the absence of any viable alternative technology, the success of the semiconductor industry in the future will depend on the creation of cost-effective mechanisms to tolerate silicon defects while the microprocessor is in operation. This thesis is focused on the development of defect-tolerance techniques that will provide low-cost mechanisms to protect a microprocessor from silicon defects. The approach of these novel defect-tolerance solutions represents a new thinking in the field of defect-tolerant design. In particular, traditional approaches to defect-tolerant design saddle a system with redundant components that continuously verify computation. In contrast, the proposed BulletProof approach provides low cost periodic hardware checking. Furthermore, to lower the cost of hardware checking, the silicon defect detection process is shifted from hardware to software using a software-based approach, the ACE Framework. This thesis also makes the case that the hardware resources of the ACE framework can also be used for other applications to add value and ease its adoption in future generation microprocessors. 
Finally, this thesis presents CrashTest, a novel FPGA-based framework used to assess the threats and the reliability requirements of a microprocessor. Altogether, the defect-tolerance solutions presented in this thesis provide a cost-effective defect-tolerance framework that makes possible the development of reliable microprocessors using unreliable silicon technologies. This enables the continuation of silicon scaling into smaller but possibly less reliable transistors, a key requirement for the development of the next generation microprocessors and the extension of microprocessor-based products into new applications.
    Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/62317/1/kypros_1.pdf
  • Source
    Kypros Constantinides · Yiannakis Sazeides
    ABSTRACT: This paper proposes a hardware-based heuristic method for implementing various transformations and detecting isomorphism in the dynamic dependence graph of a program. This enables on-the-fly identification of isomorphic instructions which may be useful for improving the performance of several microarchitectural mechanisms. This work considers the application of the proposed method to conditional branch prediction. The empirical results using SPEC benchmarks suggest that the proposed method may be useful for increasing prediction accuracy and improving performance. Specifically, it is shown for a 4-way processor that a 16KB gshare predictor combined with a 16KB overriding isomorphic predictor can achieve better performance than either a 32KB gshare or a 32KB combining gshare/bimodal predictor.
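One way to picture the isomorphism detection this abstract describes is to hash each dynamic instruction's dependence subgraph and treat matching hashes as isomorphic. The sketch below is an illustrative software analogue, not the paper's hardware heuristic; the node names, opcodes, and fixed recursion depth are assumptions.

```python
# Illustrative software analogue of dependence-graph isomorphism
# detection: two instructions whose subgraph signatures match are
# treated as isomorphic.

def subgraph_signature(graph, node, depth=2):
    """Hash a node's opcode together with its producers' signatures."""
    op, producers = graph[node]
    if depth == 0 or not producers:
        return hash((op,))
    child_sigs = tuple(subgraph_signature(graph, p, depth - 1) for p in producers)
    return hash((op, child_sigs))

# Dynamic dependence graph: instruction -> (opcode, producer instructions).
graph = {
    "i1": ("load", []), "i2": ("load", []),
    "i3": ("add", ["i1", "i2"]),
    "i4": ("load", []), "i5": ("load", []),
    "i6": ("add", ["i4", "i5"]),   # same shape as i3: isomorphic
    "i7": ("mul", ["i1", "i2"]),   # different opcode: not isomorphic to i3
}
```

An overriding predictor could then, for example, share prediction state between i3 and i6, which is the kind of reuse the paper exploits for branch prediction.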

Publication Stats

405 Citations
2.07 Total Impact Points

Institutions

  • 2008–2011
    • Concordia University–Ann Arbor
      Ann Arbor, Michigan, United States
  • 2010
    • Advanced Micro Devices
      Sunnyvale, California, United States
  • 2006–2009
    • University of Michigan
      • Department of Electrical Engineering and Computer Science (EECS)
      Ann Arbor, Michigan, United States
  • 2005
    • University of Texas at Austin
      • Department of Electrical & Computer Engineering
      Austin, TX, United States