Kyeounsoo Kim's research while affiliated with Hanyang University and other places

Publications (12)

Article
This paper demonstrates the design of efficient asynchronous bundled-data pipelines for the matrix-vector multiplication core of discrete cosine transforms (DCTs). The architecture is optimized for both zero and small-valued data, typical in DCT applications, yielding both high average performance and low average power. The proposed bundled-data pi...
Conference Paper
This paper proposes an efficient asynchronous hardwired matrix-vector multiplier for the two-dimensional discrete cosine transform and inverse discrete cosine transform (DCT/IDCT). The design achieves low power and high performance by taking advantage of the typically large fraction of zero and small-valued data in DCT and IDCT applications. In par...
Article
Introduction: The increasing demand for portable and wireless multimedia applications that rely on limited battery energy has made low-power architectures and designs for these applications critical. Since real-time matrix transposition consumes a large fraction of the power in multi-dimensional image and signal processing, low-power matrix transpo...
Article
This paper presents an efficient frame memory interface for an MPEG-2 video encoder. The design reduces interface buffer size through efficient memory map organization and access timing schedules, avoids unnecessary small buffers, and simplifies their control circuits. In this design, 0.5 μm CMOS TLM (triple layer...
Article
This paper presents low-power asynchronous barrel shifters for variable length encoders and decoders useful in portable applications using multimedia standards. Our approach is to create multi-level asynchronous barrel shifters optimized for the skewed shift control statistics often found in these codecs. For common shifts, data passes through on...
Conference Paper
This paper presents a VLSI architecture for the DPCM Hybrid Coding Loop (DHCL), which consists of 2D-DCT, quantization, scan conversion, inverse quantization and 2D-IDCT. The DHCL architecture is designed to handle macroblock data within 1320 cycles and is suitable for an MPEG-2 video encoder accepting NTSC and PAL image formats. Only single...
Article
This paper presents an area-efficient VLSI architecture for the transform coding module of an MPEG-2 video encoder. This module consists of 2-D DCT and 2-D IDCT, Q and IQ, and zigzag and alternate scan conversion circuits. Hardware cost and performance of this module are mainly affected by the 2-D DCT and 2-D IDCT. In the proposed architecture, it is show...
Conference Paper
This paper proposes a low-overhead MSB-controlled inversion coding technique to reduce the transition activity in a matrix transposer, a commonly used component in 2-dimensional discrete cosine transform (DCT) and inverse DCT (IDCT) applications. A family of designs is identified in which this technique is applied to different bit slices of the matr...
Conference Paper
This paper presents low-power asynchronous barrel shifters for variable length encoders and decoders useful in portable applications using multimedia standards. Our approach is to create multi-level asynchronous barrel shifters optimized for the skewed shift control statistics often found in these codecs. For common shifts, data passes through one...
Conference Paper
This paper presents an efficient frame memory interface for an MPEG-2 video encoder. The proposed architecture takes about 58% less hardware area than the existing architecture (Kim et al. 1997) and reduces the total hardware area of the video encoder by up to 24.3%.
Conference Paper
This paper proposes a high-performance low-power asynchronous architecture for matrix-vector multipliers that multiply a constant matrix by a vector, as typically used in discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) applications. The architecture takes advantage of the statistics of DCT and IDCT data that suggest that the...

Citations

... Gate-level techniques are more efficient than other techniques because signal gating and bypassing cannot be used at the architecture level. The 2-dimensional signal gating techniques can achieve power savings for low-precision input data with large dynamic range [31][59][60]. Using a typically large fraction of zero and small-valued input, a signal gating approach can achieve power savings by deactivating slices. ...
... In [70] a review of some of the encoding techniques has been undertaken. The encoding function can be optimized for specific access patterns such as sequential access (Gray [71,72], T0 [68], Pyramid [73]) or random data (Bus Invert [61]), or for special data types such as floating point numbers in DSP applications [74]. The encoding may be fully customized to the target data (Working Zone [75], Beach [76]). ...
... dissipation. An analysis finds that the DA architecture provides higher speed than a hardwired multiplier but also dissipates more power [34]. The advantages of the DA make it a popular choice for high-speed, low area implementations [12] [36]. ...
... A different approach has been used to implement a few asynchronous circuits using the bundled data scheme with completion detection techniques [112], [161][162][163][164][165], which can indicate the data validity as soon as the process is complete. A speculative completion detection scheme is designed for asynchronous fixed-point adders [161,162], and for barrel shifters [163], where the datapath channel is implemented with multiple delay models, including the worst-case delay. ...
... The authors in [11] present a design for the quantization for AVS. The design in [12] describes an MPEG-2 encoder. In [13], another JPEG encoder is implemented for images where the quantization block is designed using multiplication and shift operation instead of division. ...
... Asynchronous circuits in the pipeline style can be classified into two groups, which are [4,5]: a) bundled-data (BD) [6][7][8][9][10][11][12], where acknowledge and request signals are used as handshaking signals and the data is transmitted using the single-rail encoding, i.e., one signal per data bit. Handshaking signals are generated in the time required (matched delay) for the data to be processed; b) Data-Drive (DD) [13][14][15], where the data is encoded, for example, as dual-rail, and the data bit is represented by two signals. ...
... We consider a set of video coders and decoders, which share a set of cores to perform some common signal processing operations. In particular, we include two legacy standards, MPEG-2 [31], [32] and H.263 [33], and the more recent MPEG-4 [34], [35] format. The codecs are supposed to be configured on an FPGA device (in this example, we opted for a Xilinx XC4VLX60) to encode or decode a video stream with one of the supported formats, but they cannot fit on the device at the same time because of area and power-related issues. ...