Ching-Yeh Chen

National Taiwan University, Taipei, Taipei, Taiwan

Publications (28) · 25.2 Total impact

  • Conference Proceeding: Sample adaptive offset for HEVC.
    IEEE 13th International Workshop on Multimedia Signal Processing (MMSP 2011), Hangzhou, China, October 17-19, 2011; 01/2011
  • Article: Efficient Architecture Design of Motion-Compensated Temporal Filtering/Motion Compensated Prediction Engine
    ABSTRACT: Since motion-compensated temporal filtering (MCTF) has become an important temporal prediction scheme in video coding algorithms, this paper presents an efficient temporal prediction engine that is not only the first MCTF hardware work but also supports the traditional motion-compensated prediction (MCP) scheme to provide computation scalability. For the prediction stage of the MCTF and MCP schemes, a modified extended double current frames scheme is adopted to reduce the system memory bandwidth, and a frame-interleaved macroblock pipelining scheme is proposed to eliminate the induced data buffer overhead. In addition, the proposed update stage architecture, with pipelined scheduling and motion estimation (ME)-like motion compensation (MC) using the Level C+ scheme, saves about half of the external memory bandwidth and eliminates irregular memory access for MC. Moreover, 76.4% of the hardware area of the update stage is saved by reusing the hardware resources of the prediction stage. This MCTF chip can process CIF at 30 fps in real time with a searching range of [-32, 32) for 5/3 MCTF with four decomposition levels, and it also supports 1/3 MCTF, hierarchical B-frames, and MCP coding schemes in JSVM and H.264/AVC. The gate count is 352 K gates with 16.8 KBytes of internal memory, and the maximum operating frequency is 60 MHz.
    IEEE Transactions on Circuits and Systems for Video Technology 02/2008; · 1.65 Impact Factor
  • Article: Analysis and Hardware Architecture Design of Global Motion Estimation.
    Signal Processing Systems. 01/2008; 53:285-300.
  • Article: On-Chip Memory Optimization Scheme for VLSI Implementation of Line-Based Two-Dimentional Discrete Wavelet Transform
    ABSTRACT: The on-chip line buffer dominates the total area and power of line-based 2-D discrete wavelet transform (DWT). In this paper, a memory-efficient VLSI implementation scheme for line-based 2-D DWT is proposed, which consists of two parts: the wordlength analysis methodology and the multiple-lifting scheme. The required wordlength of the on-chip memory is first determined using the proposed wordlength analysis methodology, and a memory-efficient VLSI implementation scheme for line-based 2-D DWT, named the multiple-lifting scheme, is then proposed. The proposed wordlength analysis methodology guarantees that coefficient overflow is avoided, and the average difference between the predicted and experimental quality levels is only 0.1 dB in terms of PSNR. The proposed multiple-lifting scheme reduces not only at least 50% of the on-chip memory bandwidth but also about 50% of the line buffer area in the 2-D DWT module.
    IEEE Transactions on Circuits and Systems for Video Technology 08/2007; · 1.65 Impact Factor
  • Chapter: Multimedia IP Development
    ABSTRACT: Multimedia intellectual property (IP) cores play a critical role in a successful multimedia SOC design. This chapter focuses on the design of image and video codec IPs, which usually require substantial computational power. From theory to practice and from algorithm to hardware architecture, design methodologies toward an optimized architecture, as well as real design cases, are presented. Both top-down system analysis and bottom-up core module design are emphasized. Following theoretical discussions of the overall scenario, key building blocks of image and video codecs proposed in the literature are reviewed. Examples cover motion estimation, discrete cosine transform, discrete wavelet transform, and entropy coder. Then, complete image and video codec designs are explored. JPEG, JPEG 2000, and H.264/AVC are the three case studies. This chapter is intended to provide an overview, from theory to practice, of how to design efficient multimedia IPs.
    05/2007: pages 19-72;
  • Article: One-pass computation-aware motion estimation with adaptive search strategy
    ABSTRACT: A computation-aware motion estimation algorithm is proposed in this paper. Its goal is to find the best block-matching results in a computation-limited and computation-variant environment. Our algorithm is characterized by a one-pass flow with an adaptive search strategy. In the prior scheme, Tsai et al. propose that all macroblocks are processed simultaneously, and more computation is allocated, step by step, to the macroblock with the largest distortion in the entire frame. This implies that random access of macroblocks is required, and the related information of neighboring macroblocks cannot be used for prediction. The random access flow requires a huge memory size for all macroblocks to store the up-to-date minimum distortions, best motion vectors, and searching steps. In contrast, our one-pass flow processes the macroblocks one by one, which not only significantly reduces the memory size but also effectively utilizes the context information of neighboring macroblocks to achieve faster speed and better quality. Moreover, in order to improve the video quality when the computation resource is still sufficient, the search pattern is allowed to adaptively change from diamond search to three-step search, and then to full search. Last but not least, traditional block-matching speed-up methods are also combined to provide much better computation-distortion curves.
    IEEE Transactions on Multimedia 09/2006; · 1.93 Impact Factor
  • Article: Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder
    ABSTRACT: H.264/AVC significantly outperforms previous video coding standards with many new coding tools. However, the better performance comes at the price of extraordinarily high computational complexity and memory access requirements, which makes it difficult to design a hardwired encoder for real-time applications. In addition, due to the complex, sequential, and highly data-dependent characteristics of the essential algorithms in H.264/AVC, the use of both pipelining and parallel processing techniques is constrained. The hardware utilization and throughput are also decreased because of the block/MB/frame-level reconstruction loops. In this paper, we describe our techniques to design the H.264/AVC video encoder for HDTV applications. On the system design level, in consideration of the characteristics of the key components and the reconstruction loops, the four-stage macroblock pipelined system architecture is first proposed with an efficient scheduling and memory hierarchy. On the module design level, the design considerations of the significant modules are addressed, followed by the hardware architectures, including low-bandwidth integer motion estimation, parallel fractional motion estimation, reconfigurable intrapredictor generator, dual-buffer block-pipelined entropy coder, and deblocking filter. With these techniques, the prototype chip of the efficient H.264/AVC encoder is implemented with 922.8 K logic gates and 34.72-KB SRAM at 108-MHz operation frequency.
    IEEE Transactions on Circuits and Systems for Video Technology 07/2006; · 1.65 Impact Factor
  • Conference Proceeding: Frame-level data reuse for motion-compensated temporal filtering
    ABSTRACT: Motion-compensated temporal filtering (MCTF) is an open-loop prediction scheme, so frame-level data reuse for MCTF is possible. In this paper, we propose two general frame-level data reuse schemes which can minimize the memory bandwidth of current and reference frames, respectively. The relationships between the required memory bandwidth and the number of searching range buffers are also formulated under the constraint of the data dependency in the joint scalable video model. Finally, we extend our analysis to pyramid MCTF, and the impact of the inter-layer prediction scheme is also considered.
    Circuits and Systems, 2006. ISCAS 2006. Proceedings. 2006 IEEE International Symposium on; 06/2006
  • Article: Level C+ data reuse scheme for motion estimation with corresponding coding orders
    ABSTRACT: Memory bandwidth reduction for motion estimation is important because of the power consumption and limited memory bandwidth in video coding systems. In this paper, we propose a Level C+ scheme which can fully reuse the overlapped searching region in the horizontal direction and partially reuse the overlapped searching region in the vertical direction to save more memory bandwidth compared to the Level C scheme. However, direct implementation of the Level C+ scheme may conflict with some important coding tools and thus lower the hardware efficiency of video coding systems. Therefore, we propose n-stitched zigzag scan for the Level C+ scheme and discuss two types of 2-stitched zigzag scan for MPEG-4 and H.264 as examples. They can reduce memory bandwidth and resolve the conflicts. When the specification is HDTV 720p, where the searching range is [-128,128), the required memory bandwidth is only 54%, and the increase of on-chip memory size is only 12%, compared to those of the traditional Level C data reuse scheme.
    IEEE Transactions on Circuits and Systems for Video Technology 05/2006; · 1.65 Impact Factor
  • Article: Analysis and architecture design of variable block-size motion estimation for H.264/AVC
    ABSTRACT: Variable block-size motion estimation (VBSME) has become an important video coding technique, but it increases the difficulty of hardware design. In this paper, we use inter-/intra-level classification and various data flows to analyze the impact of supporting VBSME in different hardware architectures. Furthermore, we propose two hardware architectures that can support traditional fixed block-size motion estimation as well as VBSME with less chip area overhead compared to previous approaches. By broadcasting reference pixel rows and propagating partial sums of absolute differences (SADs), the first design has fewer reference pixel registers and a shorter critical path. The second design utilizes a two-dimensional distortion array and one adder tree with a reference buffer that can maximize the data reuse between successive searching candidates. The first design is suitable for low resolution or a small search range, and the second design has the advantages of supporting a high degree of parallelism and VBSME. Finally, we propose an eight-parallel SAD tree with a shared reference buffer for H.264/AVC integer motion estimation (IME). Its processing ability is eight times that of a single SAD tree, but the reference buffer size is only doubled. Moreover, the most critical issue of H.264 IME, which is huge memory bandwidth, is overcome. We are able to save 99.9% of off-chip memory bandwidth and 99.22% of on-chip memory bandwidth. We demonstrate a 720-p, 30-fps solution at 108 MHz with 330.2k gate count and 208k bits of on-chip memory.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 04/2006; · 1.97 Impact Factor
  • Article: Video de-interlacing by adaptive 4-field global/local motion compensated approach
    ABSTRACT: A de-interlacing algorithm using an adaptive 4-field global/local motion compensated approach is presented. It consists of block-based directional edge interpolation, same-parity 4-field motion detection, and global/local motion estimation and compensation. The edges are sharper when the directional edge interpolation is adopted. The same-parity 4-field motion detection and the 4-field local motion estimation detect the static areas and fast motion using four reference fields, and the global motion estimation detects the camera panning and zooming motions. The global and local motion compensation recover the interlaced videos to progressive ones. Experimental results show that the peak signal-to-noise ratio of our proposed algorithm is 2 to 3 dB higher than that of previous studies and that it attains the best subjective quality.
    IEEE Transactions on Circuits and Systems for Video Technology 01/2006; · 1.65 Impact Factor
  • Conference Proceeding: Analysis and VLSI architecture of update step in motion-compensated temporal filtering.
    International Symposium on Circuits and Systems (ISCAS 2006), 21-24 May 2006, Island of Kos, Greece; 01/2006
  • Article: Survey on Block Matching Motion Estimation Algorithms and Architectures with New Results.
    VLSI Signal Processing. 01/2006; 42:297-320.
  • Conference Proceeding: Scalable Rate-Distortion-Computation Hardware Accelerator for MCTF and ME.
    Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, ICME 2006, July 9-12 2006, Toronto, Ontario, Canada; 01/2006
  • Conference Proceeding: System analysis of VLSI architecture for motion-compensated temporal filtering
    ABSTRACT: Motion-compensated temporal filtering (MCTF) is an innovative prediction scheme for video coding and has become the core technology of the coming video coding standard, MPEG-21 part 13 - scalable video coding (SVC). This paper provides the system analysis of MCTF for VLSI implementation, which includes computational complexity, external memory access, external storage size, and coding delay. The one-level MCTF is analyzed first, and a modified double current frames scheme is introduced to address the external memory access penalty that results from fractional-pel motion compensation (MC). Then the analysis is extended to multi-level MCTF, in which many important system issues are explored. Finally, a real-life test case is given to compare the system requirements of many different MCTF schemes and the prediction scheme of H.264/AVC.
    Image Processing, 2005. ICIP 2005. IEEE International Conference on; 10/2005
  • Article: Hardware architecture design of video compression for multimedia communication systems
    IEEE Communications Magazine 09/2005; · 3.79 Impact Factor
  • Conference Proceeding: Architecture of global motion compensation for MPEG-4 advanced simple profile
    Yi-Hau Chen, Ching-Yeh Chen, Liang-Gee Chen
    ABSTRACT: Global motion compensation (GMC) is an important coding tool in MPEG-4 advanced simple profile (ASP). In this paper, we propose an efficient GMC hardware architecture for MPEG-4 ASP@L5. Based on analysis of the affine model, the proposed memory arrangement and cascaded scheduling reduce the impact of irregular memory access and improve processing ability. It can process 30 fps at only 25 MHz. The implementation result shows that the total gate count is 19.3 K and the internal memory size is 1.28 Kb. It is suitable for integration into MPEG-4 ASP encoders and decoders.
    Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on; 06/2005
  • Conference Proceeding: One-pass computation-aware motion estimation with adaptive search strategy
    ABSTRACT: A computation-aware motion estimation algorithm is proposed. Its goal is to find the best block matching results in a computation-limited and computation-variant environment. Our new features are one-pass flow and adaptive search strategies. The prior scheme allocates more computation to the macroblock with the highest distortion in the entire frame step by step. This implies that random access of macroblocks is inevitable, and the search pattern must be determined in advance. The random access flow requires a huge size of memory for all macroblocks to store the up-to-date minimum distortions, best motion vectors, and searching steps. In contrast, the one-pass flow can not only significantly reduce the memory size but also effectively use the context information of neighboring macroblocks to achieve faster convergence and better quality. Moreover, to improve video quality when computation resource is still sufficient, the search strategy is allowed to change adaptively from diamond search to three step search, and then to full search. Last but not least, traditional block matching speedup methods are combined to provide much better computation-distortion curves.
    Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on; 06/2005
  • Conference Proceeding: Four field variable block size motion compensated adaptive de-interlacing
    ABSTRACT: A four field variable block size motion compensated adaptive de-interlacing method is proposed to improve the accuracy of the motion vectors and lower the occlusions of motion compensated de-interlacing. The proposed de-interlacing method consists of variable block size motion estimation/compensation with four field SAD, interlaced block mode decision, and new block modes. The variable block size motion estimation and compensation improve the accuracy of the motion vectors, especially for spatially-periodic patterns. The new block modes and the interlaced block mode decision make block decisions more precise, and special patterns that motion compensation cannot handle are correctly de-interlaced by these two methods. The subjective view shows an improvement in the accuracy of the motion vectors and the correctness of the mode decision.
    Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on; 04/2005 · 4.63 Impact Factor
  • Conference Proceeding: Memory analysis of VLSI architecture for 5/3 and 1/3 motion-compensated temporal filtering [video coding applications]
    ABSTRACT: To the best of the authors' knowledge, this paper presents the first work on memory analysis of VLSI architectures for motion-compensated temporal filtering (MCTF). The open-loop MCTF prediction scheme has led the revolution for hybrid video coding methods that are mainly based on the closed-loop MC prediction (MCP) scheme, and it has also become the core technology of the coming video coding standard, MPEG-21 part 13-scalable video coding (SVC). In this paper, the macroblock (MB)-level and frame-level data reuse schemes are analyzed for the MCTF. The MB-level data reuse is especially for the motion estimation (ME), and the level C+ scheme is proposed, which can further reduce the memory bandwidth of the conventional level C scheme. Frame-level data reuse schemes for MCTF are proposed according to the open-loop prediction nature.
    Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on; 04/2005 · 4.63 Impact Factor