Conference Paper

Analysis and design of macroblock pipelining for H.264/AVC VLSI architecture

Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei, Taiwan
DOI: 10.1109/ISCAS.2004.1329261
Conference: Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS '04), Volume 2
Source: IEEE Xplore

ABSTRACT: This paper presents a new macroblock (MB) pipelining scheme for H.264/AVC encoders. Conventional video encoders adopt two-stage MB pipelines, which are not suitable for H.264/AVC because of its long encoding path, sequential procedure, and large bandwidth requirement. Based on our analysis of the encoding process, an H.264/AVC accelerator is divided into five major functional blocks with four-stage MB pipelining to greatly increase processing capability and hardware utilization. By adopting shared memories between adjacent pipeline stages with sophisticated task scheduling, the bus bandwidth is further reduced by 55%. In addition, hardware-oriented algorithms are proposed that, without loss of video quality, remove the data dependencies that prevent parallel processing and MB pipelining. An H.264/AVC Baseline Profile Level 3 encoder, which requires a computational complexity of 1.8 tera-instructions per second (TIPS), is successfully mapped into hardware with our MB pipelining scheme at 100 MHz.
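
The benefit of MB pipelining described in the abstract can be illustrated with a small timing model. The following is only a sketch under stated assumptions: the stage names and per-MB cycle counts are hypothetical placeholders (the abstract does not give them), and the model assumes a lock-step pipeline in which every slot lasts as long as the slowest stage. The 40,500 MB/s figure is the maximum macroblock rate that the H.264 standard allows at Level 3; treating it as the encoder's target rate is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    cycles: int  # hypothetical cycles needed to process one macroblock


def sequential_cycles(stages, num_mbs):
    """Cycles when each MB passes through all stages before the next MB starts."""
    return num_mbs * sum(s.cycles for s in stages)


def pipelined_cycles(stages, num_mbs):
    """Cycles for a lock-step MB pipeline: one slot equals the slowest stage."""
    slot = max(s.cycles for s in stages)
    # The first MB needs len(stages) slots; each further MB adds one slot.
    return (num_mbs + len(stages) - 1) * slot


def utilization(stages):
    """Fraction of each slot in which a stage is busy (steady state)."""
    slot = max(s.cycles for s in stages)
    return {s.name: s.cycles / slot for s in stages}


def cycle_budget_per_mb(clock_hz, mbs_per_second):
    """Whole clock cycles available per macroblock at a given MB rate."""
    return clock_hz // mbs_per_second


# Hypothetical four-stage pipeline; names and cycle counts are illustrative only.
STAGES = [
    Stage("stage_1", 2000),
    Stage("stage_2", 1800),
    Stage("stage_3", 1500),
    Stage("stage_4", 1900),
]
MBS_PER_FRAME = (720 // 16) * (576 // 16)  # 45 x 36 macroblocks in a 720x576 frame

if __name__ == "__main__":
    seq = sequential_cycles(STAGES, MBS_PER_FRAME)
    pipe = pipelined_cycles(STAGES, MBS_PER_FRAME)
    print("sequential cycles per frame:", seq)
    print("pipelined cycles per frame: ", pipe)
    print("speedup: %.2f" % (seq / pipe))
    print("stage utilization:", utilization(STAGES))
    print("cycle budget per MB at 100 MHz:",
          cycle_budget_per_mb(100_000_000, 40_500))
```

In this model the pipelined throughput is bounded by the slowest stage, which is why balancing the stages matters for hardware utilization: a stage that finishes early simply idles until the slot ends.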

  • ABSTRACT: This paper presents an overview of the transform and quantization designs in H.264. Unlike the popular 8×8 discrete cosine transform used in previous standards, the 4×4 transforms in H.264 can be computed exactly in integer arithmetic, thus avoiding inverse transform mismatch problems. The new transforms can also be computed without multiplications, just additions and shifts, in 16-bit arithmetic, thus minimizing computational complexity, especially for low-end processors. By using short tables, the new quantization formulas use multiplications but avoid divisions.
    IEEE Transactions on Circuits and Systems for Video Technology 08/2003; 13(7):598-603. DOI:10.1109/TCSVT.2003.814964 · 2.26 Impact Factor
  • ABSTRACT: Transform coding has been widely used in video coding standards. In this paper, a hardware architecture for accelerating transform coding operations in MPEG-4 AVC/H.264 is presented. This architecture calculates 4 inputs in parallel using fast algorithms described previously. The transpose operations are implemented by a register array with directional transfers. This architecture has been mapped into a 4×4 multiple-transform unit and synthesized in TSMC 0.35 μm technology. The multiple-transform processor can process 320 M pixels/s at 80 MHz for all 4×4 transforms used in MPEG-4 AVC/H.264.
    Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on; 06/2003
  • ABSTRACT: This paper presents an efficient VLSI architecture for the deblocking filter in H.264/JVT/AVC. We use an array of 8×4 8-bit shift registers with a reconfigurable data path to support both horizontal and vertical filtering on the same circuit (a parallel-in parallel-out reconfigurable FIR filter). Two SRAM modules are carefully organized not only to store the current macroblock data and adjacent block data but also to allow efficient access to pixels in different blocks. Simulation results show that in 0.25 μm technology, the synthesized logic gate count is only 19.1 K (not including a 96×32 SRAM and a 64×32 SRAM) at a maximum frequency of 100 MHz. Our architecture can easily support real-time deblocking of 720p (1280×720) 30 Hz video. It is valuable for platform-based design of H.264 codecs.
    Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on; 08/2003
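
The claim in the first abstract above, that the 4×4 transform can be computed exactly with only additions and shifts, can be checked with a short sketch. The matrix `CF` is the forward core transform matrix of H.264; the function names and the butterfly arrangement are illustrative choices, not the implementation of any of the cited papers, and the scaling that the standard folds into quantization is deliberately left out.

```python
# Forward 4x4 core transform matrix of H.264 (integer entries only).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def butterfly_1d(x):
    """1-D core transform of four samples using only adds, subtracts, shifts."""
    x0, x1, x2, x3 = x
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return (
        s03 + s12,         # row [1,  1,  1,  1]
        (d03 << 1) + d12,  # row [2,  1, -1, -2]
        s03 - s12,         # row [1, -1, -1,  1]
        d03 - (d12 << 1),  # row [1, -2,  2, -1]
    )


def forward_core_4x4(block):
    """2-D core transform: transform every row, then every column."""
    rows = [butterfly_1d(r) for r in block]
    cols = [butterfly_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


def matmul(a, b):
    """Plain 4x4 integer matrix product, used only as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def reference_core_4x4(block):
    """Reference result computed as CF * X * CF^T with multiplications."""
    cf_t = [list(r) for r in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


if __name__ == "__main__":
    # Arbitrary residual block chosen for illustration.
    sample = [
        [5, 11,  8, 10],
        [9,  8,  4, 12],
        [1, 10, 11,  4],
        [19, 6, 15,  7],
    ]
    assert forward_core_4x4(sample) == reference_core_4x4(sample)
    print(forward_core_4x4(sample))
```

Each 1-D pass takes four inputs at once, which matches the four-inputs-in-parallel organization mentioned in the second abstract; at four pixels per cycle, an 80 MHz clock corresponds to the quoted 320 M pixels/s.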

