-
Microprocessors and Microsystems - Embedded Hardware Design. 01/2010; 34:102-116.
-
IEICE Transactions. 01/2009; 92-C:517-521.
-
[show abstract]
[hide abstract]
ABSTRACT: Portable multiphase clock generators capable of adjusting its clock phase according to input clock frequencies have been developed both in a 0.18-mum and in a 0.13-mum CMOS technologies. They consist of a full-digital CMOS circuit design that leads to a simple, robust, and portable IP. In addition, their open-loop architecture lead to no jitter accumulation and one-cycle lock characteristic that enables clock-on-demand circuit structures. The implemented low power clock generator tile in a 0.13-mum CMOS technology occupies only 0.004 mm <sup>2</sup> and operates at variable input frequencies ranging from 625 MHz to 1.2 GHz within a plusmn 2% phase error having one-cycle lock time.
Circuits and Systems II: Express Briefs, IEEE Transactions on 03/2008; · 1.41 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: Power gating is one of the most effective techniques in reducing leakage power, which increases exponentially with device scaling. However, large ground bounces during abrupt changes of power mode may cause unwanted transitions in neighboring circuits, which should still be operating normally. We analyzed this ground-bounce noise and reduced it with novel power-gating structures that utilize holistic integrated device-circuit-architecture approaches. We control the amount of charge in the intermediate nodes of the circuit that passes through the sleep transistors during the wake-up transition and stabilize the minimum virtual power supply voltage required for data retention. These techniques have been proven in silicon using 65-nm bulk CMOS technology.
IEEE Transactions on Electron Devices 02/2008; · 2.32 Impact Factor
-
IEEE Trans. on Circuits and Systems. 01/2008; 55-II:116-120.
-
Proceedings of the 2007 ACM Symposium on Applied Computing (SAC), Seoul, Korea, March 11-15, 2007; 01/2007
-
High Performance Computing and Communications, Second International Conference, HPCC 2006, Munich, Germany, September 13-15, 2006, Proceedings; 01/2006
-
Microprocessors and Microsystems. 01/2006; 30:209-215.
-
Microprocessors and Microsystems. 01/2006; 30:137-144.
-
[show abstract]
[hide abstract]
ABSTRACT: This paper proposes a power-aware cache block allocation algorithm for the way-selective set-associative cache on embedded systems to reduce energy consumption without additional delay or performance degradation. For this goal, way selection logic and specialized replacement policy are designed to enable only one way of set-associative cache as in the direct-mapped cache. Overall cache access time becomes almost the same as that of a conventional set associative cache with accessing additional way selection logic. Because data array can be accessed without waiting for tag comparison, multiplexer delay can be removed totally. The simulation result shows that the proposed architecture can reduce a per access power consumption by 59% over conventional set-associative caches with average 0.06% of negligible performance loss.
Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings. IEEE International Conference on; 11/2004
-
IEICE Transactions. 01/2004; 87-D:2253-2257.
-
IEICE Transactions. 01/2004; 87-D:1962-1964.
-
[show abstract]
[hide abstract]
ABSTRACT: We present a selective filter-bank translation lookaside buffer (TLB) system with low power consumption for embedded processors. The proposed TLB is constructed as multiple banks with a small two-bank buffer, called a filter-bank buffer, located above its associated bank. Either a filter-bank buffer or a main bank TLB can be selectively accessed, based on two bits in the filter-bank buffer. Energy savings are achieved by reducing the number of entries accessed at a time, by using filtering and the bank mechanism. The overhead of the proposed TLB turns out to be negligible compared with other hierarchical structures. Simulation results show that the energy×delay product can be reduced by about 88% compared with a fully -associative TLB, 75% with respect to a filter-TLB, and 51% relative to a banked-filter TLB.
Low Power Electronics and Design, 2003. ISLPED '03. Proceedings of the 2003 International Symposium on; 09/2003
-
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003, Seoul, Korea, August 25-27, 2003; 01/2003
-
[show abstract]
[hide abstract]
ABSTRACT: This letter presents point diffusion clock network (PDCN) with local clock tree synthesis (CTS) scheme. The clock network is implemented with ten times wider metal line space than typical mesh networks for low power and utilized to nine times smaller area CTS execution for minimized clock skew amount. The measurement results show that skew amount of PDCN with local CTS is reduced to 36% and latency is shrunk to 45% of the amount in a 4.81 mm2 CortexA-8 core with 65 nm Samsung process.
-
[show abstract]
[hide abstract]
ABSTRACT: With the rapid development of semiconductor technology, more complicated systems have been integrated into single chips. However, system performance is not increased in proportion to the gate-count of the system. This is mainly because the optimized design of the system becomes more difficult as the systems become more complicated. Therefore, it is essential to understand the internal behavior of the system and utilize the system resources effectively in the System on Chip (SOC) design. In this paper, we design a Performance Analysis Unit (PAU) for monitoring the AMBA Advanced eXtensible Interface (AXI) bus as a mechanism to investigate the internal and dynamic behavior of an SOC, especially for internal bus activities. A case study with the PAU for an H.264 decoder application is also presented to show how the PAU is utilized in SOC platform. The PAU has the capability to measure major system performance metrics, such as bus latency, amount of bus traffic, contention between master/slave devices, and bus utilization for specific durations. This paper also presents a distributor and synchronization method to connect multiple PAUs to monitor multiple internal buses of large SOC.
Microprocessors and Microsystems.