Hsien-Hsin S. Lee

Georgia Institute of Technology, Atlanta, GA, USA

Publications (22) · Total impact: 1.1

  • Conference Proceeding: Symbiotic Scheduling for Shared Caches in Multi-core Systems Using Memory Footprint Signature.
    International Conference on Parallel Processing, ICPP 2011, Taipei, Taiwan, September 13-16, 2011; 01/2011
  • Article: Security Refresh: Protecting Phase-Change Memory against Malicious Wear Out.
    Nak Hee Seong, Dong Hyuk Woo, Hsien-Hsin S. Lee
    IEEE Micro. 01/2011; 31:119-127.
  • Conference Proceeding: Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping.
    Nak Hee Seong, Dong Hyuk Woo, Hsien-Hsin S. Lee
    37th International Symposium on Computer Architecture (ISCA 2010), June 19-23, 2010, Saint-Malo, France; 01/2010
  • Conference Proceeding: An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth.
    16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 9-14 January 2010, Bangalore, India; 01/2010
  • Conference Proceeding: SAFER: Stuck-At-Fault Error Recovery for Memories.
    43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2010, 4-8 December 2010, Atlanta, Georgia, USA; 01/2010
  • Conference Proceeding: Way guard: a segmented counting bloom filter approach to reducing energy for set-associative caches.
    Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009, San Francisco, CA, USA, August 19-21, 2009; 01/2009
  • Conference Proceeding: Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs.
    Mrinmoy Ghosh, Hsien-Hsin S. Lee
    ABSTRACT: DRAMs require periodic refresh to preserve the data stored in them. The refresh interval depends on the vendor and the design technology used. During each refresh of a DRAM row, the information stored in each cell is read out and then written back, because every DRAM read is destructive. Refresh is unavoidable for maintaining data correctness, but it comes at the expense of power and bandwidth overhead. The trend toward integrating layers of 3D die-stacked DRAM on top of a processor further exacerbates the situation, as accesses to these DRAMs will be more frequent and hiding refresh cycles in the available slack becomes increasingly difficult. Moreover, because of the resulting temperature increase, the refresh interval of 3D die-stacked DRAMs will be shorter than that of conventional ones. This paper proposes a scheme to reduce the energy consumed in DRAMs. By employing a time-out counter for each memory row of a DRAM module, unnecessary periodic refresh operations can be eliminated. The basic concept behind our scheme is that a DRAM row that was recently read or written by the processor (or by other devices that share the same DRAM) does not need to be refreshed again by the periodic refresh operation, which eliminates excess refreshes and the energy they dissipate. Based on this concept, we propose a low-cost technique in the memory controller for DRAM power reduction. Simulation results show that our technique can eliminate up to 86% of all refresh operations, and 59.3% on average, for a 2GB DRAM. This in turn yields a 52.6% energy saving for refresh operations. The overall energy saving in the DRAM is up to 25.7%, with an average of 12.13%, for SPLASH-2, SPECint2000, and Biobench benchmark programs simulated on a 2GB DRAM. For a 64MB 3D DRAM, the energy saving is up to 21%, and 9.37% on average, when the refresh rate is 64 ms. For a faster 32 ms refresh rate, the maximum and average savings are 12% and 6.8%, respectively.
    40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007); 01/2008
  • Article: Reducing Cache Pollution via Dynamic Data Prefetch Filtering
    Xiaotong Zhuang, Hsien-Hsin S. Lee
    ABSTRACT: To bridge the growing speed disparity between processors and their memory subsystems, aggressive prefetch mechanisms, either hardware-based or compiler-assisted, are employed to hide memory latencies. As first-level caches get smaller in deep-submicron processor designs to keep cache accesses fast, data cache pollution caused by overly aggressive prefetching will become a major performance concern. Ineffective prefetches not only offset the benefits of useful prefetches by polluting the cache but also waste bus bandwidth, leading to overall performance degradation. In this paper, we propose and analyze a number of hardware-based prefetch pollution filtering mechanisms that dynamically distinguish good prefetches from bad ones based on history information. We designed three prefetch pollution filters, organized in one-level, two-level, and gshare styles. In addition, we examine two table indexing schemes: per-address (PA) based and program counter (PC) based. Our prefetch pollution filters work in tandem with both hardware and software prefetchers. Our analysis shows that the cache pollution filters can reduce ineffective prefetches by more than 90 percent and alleviate the excess memory bandwidth they induce. Performance can also be improved by up to 16 percent when our filtering mechanism is incorporated with aggressive prefetch filters, as a result of reduced cache pollution and less competition for the limited number of cache ports. Finally, a number of sensitivity studies are performed to provide further insight into prefetch pollution filter design.
    IEEE Transactions on Computers 02/2007; 56(1):18-31. · 1.10 Impact Factor
  • Conference Proceeding: Virtual Exclusion: An architectural approach to reducing leakage energy in caches for multiprocessor systems.
    Mrinmoy Ghosh, Hsien-Hsin S. Lee
    13th International Conference on Parallel and Distributed Systems (ICPADS 2007), December 5-7, 2007, Hsinchu, Taiwan; 01/2007
  • Conference Proceeding: An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors.
    Weidong Shi, Hsien-Hsin S. Lee, Laura Falk, Mrinmoy Ghosh
    33rd International Symposium on Computer Architecture (ISCA 2006), June 17-21, 2006, Boston, MA, USA; 01/2006
  • Conference Proceeding: Reducing energy of virtual cache synonym lookup using bloom filters.
    Proceedings of the 2006 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2006, Seoul, Korea, October 22-25, 2006; 01/2006
  • Conference Proceeding: Efficient System-on-Chip Energy Management with a Segmented Bloom Filter.
    Mrinmoy Ghosh, Emre Özer, Stuart Biles, Hsien-Hsin S. Lee
    Architecture of Computing Systems - ARCS 2006, 19th International Conference, Frankfurt/Main, Germany, March 13-16, 2006, Proceedings; 01/2006
  • Chapter: Memory-Centric Security Architecture
    Weidong Shi, Chenghuai Lu, Hsien-Hsin S. Lee
    ABSTRACT: This paper presents a new security architecture for protecting software confidentiality and integrity. Unlike previous process-centric systems designed for the same purpose, the new architecture ties cryptographic properties and security attributes to memory instead of to each individual user process. Such a memory-centric design has several advantages. First, it provides a better security model and access control for software privacy, supporting both selective and mixed tamper-resistant protection of software components from heterogeneous sources. Second, the new model supports and facilitates tamper-resistant secure information sharing in an open software system where both data and code components may be shared by different user processes. Third, the proposed security model and secure processor design allow software components protected with different security policies to interoperate efficiently within the same memory space. Our architectural support requires few silicon resources, and its performance impact is minimal according to our experimental results using commercial MS Windows workloads and cycle-based out-of-order processor simulation.
    10/2005: pages 153-168;
  • Conference Proceeding: High Efficiency Counter Mode Security Architecture via Prediction and Precomputation.
    32nd International Symposium on Computer Architecture (ISCA 2005), 4-8 June 2005, Madison, Wisconsin, USA; 01/2005
  • Conference Proceeding: Architectural Support for High Speed Protection of Memory Integrity and Confidentiality in Multiprocessor Systems.
    Weidong Shi, Hsien-Hsin S. Lee, Mrinmoy Ghosh, Chenghuai Lu
    13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September - 3 October 2004, Antibes Juan-les-Pins, France; 01/2004
  • Conference Proceeding: Hardware assisted control flow obfuscation for embedded processors.
    Xiaotong Zhuang, Tao Zhang, Hsien-Hsin S. Lee, Santosh Pande
    Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2004, Washington DC, USA, September 22 - 25, 2004; 01/2004
  • Article: Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints
    ABSTRACT: Predicated execution enables the removal of branches by converting segments of branching code into straight-line segments of conditional operations. An important but generally ignored side effect of this transformation is that the compiler must assign distinct resources to all the predicated operations scheduled at a given time to ensure that those resources are available at run time. However, a resource is put to productive use only when the predicate associated with its operation evaluates to True. We propose predicate-aware scheduling to reduce the superfluous commitment of resources to operations whose predicates evaluate to False at run time. The central idea is to assign multiple operations to the same resource at the same time, thereby oversubscribing it. This assignment is performed so as to ensure that no two operations simultaneously assigned to the same resource can both have their predicates evaluate to True. Thus, no resource is dynamically oversubscribed. The overall effect of predicate-aware scheduling is to use resources more efficiently, thereby increasing performance when resource constraints are a bottleneck.
    02/2003;
  • Conference Proceeding: Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints.
    1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 23-26 March 2003, San Francisco, CA, USA; 01/2003
  • Conference Proceeding: A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches.
    Xiaotong Zhuang, Hsien-Hsin S. Lee
    32nd International Conference on Parallel Processing (ICPP 2003), 6-9 October 2003, Kaohsiung, Taiwan; 01/2003
  • Article: IdlePower: Application-aware management of processor idle states
    ABSTRACT: Power has become a first-class design constraint in modern processor design. To reduce the power density caused by the aggressive, speculative execution seen in previous processor generations, computer architects have turned to a multi-core design strategy in which each core is substantially simplified. Additionally, various power-saving features have been proposed and integrated into each core to adapt to dynamic execution scenarios. Due in part to the independent nature of these cores, power management has also become more flexible, further reducing overall power consumption. With careful adaptation schemes, the system can save power by dynamically entering different idle states with minimal performance impact. Given the simultaneous emergence of virtualization technologies, the question, then, is how to effectively leverage these idle states when multiple virtual machines (VMs) execute on multicore parts. Toward this end, we develop the IdlePower approach to managing idle states in virtualized systems. Our approach uses a novel batching algorithm that creates more opportunities to enter deep idle states by removing unnecessary system wakeups, depending on the monitored behavior of workloads. IdlePower also provides application awareness in another way: it enters deep idle states based not only on transition latencies but also on the performance degradation that can occur due to secondary effects such as data loss in cache structures. We extend the use of Bloom filters with IdlePower to detect application characteristics, dynamically predicting whether deep idle states are worthwhile given their possible performance implications. Overall, IdlePower is shown to improve residency in the deepest C3 idle state by up to 10% and to avoid performance degradations of up to 26% in workloads.
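The Smart Refresh abstract listed above rests on one idea: keep a time-out counter per DRAM row and skip the periodic refresh of any row that was recently read or written, since that access already restored its charge. The Python sketch below is a toy illustration of that idea under stated assumptions, not the paper's memory-controller design: time is counted in abstract ticks, each access fully restarts a row's time-out, and counter width, refresh staggering, and bus timing are not modeled. The function name `count_refreshes` and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def count_refreshes(num_rows: int, interval: int, duration: int,
                    accesses: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (baseline_refreshes, timeout_refreshes) over `duration` ticks.

    Hypothetical toy model (an assumption, not the paper's hardware): any
    read or write to a row restores its charge and restarts its time-out.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    # Group the (tick, row) accesses by tick for quick lookup.
    by_tick: Dict[int, Set[int]] = {}
    for tick, row in accesses:
        by_tick.setdefault(tick, set()).add(row)

    # Baseline: every row is refreshed once per interval, used or not.
    baseline = num_rows * (duration // interval)

    # Time-out scheme: one down-counter per row.
    counters: List[int] = [interval] * num_rows
    refreshes = 0
    for tick in range(duration):
        touched = by_tick.get(tick, set())
        for row in range(num_rows):
            if row in touched:
                counters[row] = interval  # access restores charge: restart time-out
                continue
            counters[row] -= 1
            if counters[row] == 0:        # time-out expired: row must be refreshed
                refreshes += 1
                counters[row] = interval
    return baseline, refreshes


# Row 0 is accessed every 4 ticks, so its 8-tick time-out never expires.
hot_row = [(t, 0) for t in range(0, 64, 4)]
print(count_refreshes(4, 8, 64, hot_row))  # (32, 24)
```

With no accesses at all, both counts are equal in this toy model; the savings grow with the fraction of rows that the workload touches at least once per refresh interval.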