Conference Paper

Multi-Pass and Frame Parallel Algorithms of Motion Estimation in H.264/AVC for Generic GPU

Nat. Taiwan Univ., Taipei
DOI: 10.1109/ICME.2007.4284972
Conference: 2007 IEEE International Conference on Multimedia and Expo (ICME)
Source: IEEE Xplore


In this paper, multi-pass and frame-parallel algorithms are proposed to accelerate various motion estimation (ME) tools in H.264 on the graphics processing unit (GPU). Using a multi-pass method that unrolls and rearranges the multiple nested loops, integer-pel ME can be implemented as a two-pass process on the GPU. Fractional ME requires six passes, covering frame interpolation with the six-tap filter and motion vector refinement. Motion estimation with multiple reference frames can also be implemented as a two-pass process, using a frame-level parallel scheme built on the GPU's SIMD vector operations. Experimental results show speed-ups of about 6 to 56 times over CPU-only implementations, depending on the ME algorithm.
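The two-pass integer-pel scheme and the frame-level parallel scheme described above can be illustrated with a small CPU-side Python sketch. It assumes a data layout that the abstract does not specify: four reference frames interleaved into four-lane "texels" (as in an RGBA texture), so that one pass-1 work item computes four SADs at once, and a second pass reduces each block's candidates and lanes to one best motion vector and reference index. The function names (`pack_refs`, `pass1_sad`, `pass2_min`), the dictionary standing in for a render target, and the boundary rule (out-of-frame candidates are skipped) are hypothetical; this is not the authors' shader code.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]           # luma samples indexed frame[y][x]
Vec4 = Tuple[int, int, int, int]  # one value per reference frame (like RGBA lanes)


def pack_refs(refs: List[Frame]) -> List[List[Vec4]]:
    """Interleave four reference frames into 4-lane texels (hypothetical layout)."""
    assert len(refs) == 4, "the 4-lane layout packs exactly four reference frames"
    h, w = len(refs[0]), len(refs[0][0])
    return [[tuple(r[y][x] for r in refs) for x in range(w)] for y in range(h)]


def pass1_sad(cur: Frame, packed: List[List[Vec4]], bs: int, rng: int) -> Dict[Tuple[int, int, int, int], Vec4]:
    """Pass 1: the nested block/candidate loops are unrolled into independent work
    items, one per (block, candidate) pair; on a GPU each would be its own fragment.
    Each item yields a 4-lane SAD vector, one lane per reference frame.
    Assumes the frame size is a multiple of the block size bs."""
    h, w = len(cur), len(cur[0])
    out = {}
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            for dy in range(-rng, rng + 1):
                for dx in range(-rng, rng + 1):
                    # Skip candidates that leave the reference frame (no padding here).
                    if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                        continue
                    acc = [0, 0, 0, 0]
                    for y in range(bs):
                        for x in range(bs):
                            c = cur[by + y][bx + x]
                            texel = packed[by + dy + y][bx + dx + x]
                            for lane in range(4):  # SIMD-style: all four references at once
                                acc[lane] += abs(c - texel[lane])
                    out[(by, bx, dy, dx)] = tuple(acc)
    return out


def pass2_min(sad: Dict[Tuple[int, int, int, int], Vec4]) -> Dict[Tuple[int, int], Tuple[int, int, int, int]]:
    """Pass 2: per block, reduce over all candidates and all four lanes.
    Returns (sad, ref_index, dy, dx); ties break toward the smaller tuple."""
    best = {}
    for (by, bx, dy, dx), vec in sad.items():
        for ref, cost in enumerate(vec):
            cand = (cost, ref, dy, dx)
            if (by, bx) not in best or cand < best[(by, bx)]:
                best[(by, bx)] = cand
    return best
```

Usage: `pass2_min(pass1_sad(cur, pack_refs([r0, r1, r2, r3]), 4, 2))` returns, for each 4x4 block, its lowest SAD together with the winning reference index and displacement. Packing the references into lanes means the current block's samples are read once per candidate and compared against all four frames, which is the kind of saving a frame-level SIMD scheme aims for.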
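For the fractional-pel stage, the six-tap filter mentioned in the abstract is, in H.264/AVC, the luma half-sample filter with taps (1, -5, 20, 20, -5, 1): the weighted sum is rounded, shifted right by 5 and clipped to 8 bits, the centre half-sample is obtained by filtering unrounded intermediate sums and shifting by 10, and quarter-samples are rounded averages of two neighbouring samples. The per-sample sketch below shows only that arithmetic in plain Python. How the authors distribute interpolation and refinement over their six GPU passes is not described here, so the helper names and the border rule (replicating edge samples) are assumptions.

```python
from typing import List

Frame = List[List[int]]  # 8-bit luma samples indexed frame[y][x]

TAPS = (1, -5, 20, 20, -5, 1)  # H.264/AVC luma half-sample filter


def clip8(v: int) -> int:
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, v))


def px(frame: Frame, y: int, x: int) -> int:
    """Integer sample with edge replication outside the frame (assumed border rule)."""
    h, w = len(frame), len(frame[0])
    return frame[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def raw_h(frame: Frame, y: int, x: int) -> int:
    """Unscaled 6-tap sum for the half-pel point between (y, x) and (y, x + 1)."""
    return sum(t * px(frame, y, x - 2 + i) for i, t in enumerate(TAPS))


def raw_v(frame: Frame, y: int, x: int) -> int:
    """Unscaled 6-tap sum for the half-pel point between (y, x) and (y + 1, x)."""
    return sum(t * px(frame, y - 2 + i, x) for i, t in enumerate(TAPS))


def half_h(frame: Frame, y: int, x: int) -> int:
    """Horizontal half-pel sample: round, divide by 32, clip."""
    return clip8((raw_h(frame, y, x) + 16) >> 5)


def half_v(frame: Frame, y: int, x: int) -> int:
    """Vertical half-pel sample: round, divide by 32, clip."""
    return clip8((raw_v(frame, y, x) + 16) >> 5)


def half_c(frame: Frame, y: int, x: int) -> int:
    """Centre half-pel sample: filter the unrounded horizontal sums vertically,
    then round and divide by 1024 in a single step before clipping."""
    s = sum(t * raw_h(frame, y - 2 + i, x) for i, t in enumerate(TAPS))
    return clip8((s + 512) >> 10)


def quarter(a: int, b: int) -> int:
    """Quarter-pel sample: rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1
```

On a horizontal ramp `10 * x`, `half_h(frame, 0, 3)` falls midway between the samples 30 and 40 at 35, while across a sharp 0/255 edge the negative taps would overshoot the 8-bit range without the final clip.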

    • "The following paragraphs will provide an insight into the state-of-the-art from these two points of view. Solutions for accelerating the H.264/AVC encoding algorithm by making use of Many-Core graphics hardware were firstly proposed in 2007 by Lee et al. [10]. They used a multi-pass and frame parallel algorithm to accelerate some ME tools available in an H.264/AVC encoder by using the OPENGL API. "
    ABSTRACT: H.264/MVC is a standard for supporting the sensation of 3D, based on coding from 2 (stereo) to N views. H.264/MVC adopts many coding options inherited from single view H.264/AVC, and thus its complexity is even higher, mainly because the number of processing views is higher. In this manuscript, we aim at an efficient parallelization of the most computationally intensive video encoding module for stereo sequences, in particular inter prediction and its collaborative execution on a heterogeneous platform. The proposal is based on an efficient dynamic load balancing algorithm and on breaking encoding dependencies. Experimental results demonstrate the proposed algorithm’s ability to reduce the encoding time for different stereo high definition sequences. Speed-up values of up to 90× were obtained when compared with the reference encoder on the same platform. Moreover, the proposed algorithm also provides a more energy-efficient approach and hence requires less energy than the sequential reference algorithm.
    Computers & Electrical Engineering 11/2013; DOI:10.1016/j.compeleceng.2013.05.009 · 0.82 Impact Factor
    • "Therefore, we implement the most time-consuming H.264 coding process – the motion estimation unit – on this device. Although a few GPU-based motion estimation methods have been proposed [4] [5] [6] [7], the new CUDA architecture needs a new algorithm to fully utilize its features [8]. In this paper, a highly parallel variable block size full-search ME algorithm with fractional pixel refinement is proposed. "
    ABSTRACT: Due to the rapid growth of graphics processing unit (GPU) processing capability, using GPU as a coprocessor to assist the central processing unit (CPU) in computing massive data becomes essential. In this paper, we present an efficient block-level parallel algorithm for the variable block size motion estimation (ME) in H.264/AVC with fractional pixel refinement on a Compute Unified Device Architecture (CUDA) platform, developed by NVIDIA in 2007. The CUDA enhances the programmability and flexibility for general-purpose computation on GPU. We decompose the H.264 ME algorithm into 5 steps so that we can achieve highly parallel computation with low external memory transfer rate. Experimental results show that, with the assistance of GPU, the processing time is 12 times faster than that of using CPU only.
    Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, ICME 2008, June 23-26 2008, Hannover, Germany; 01/2008
    ABSTRACT: This paper proposes an efficient implementation of the H.264/AVC motion estimation algorithm in hardware and software. Furthermore, a complete co-design trajectory from the HW/SW partitioning to the actual implementation on two different targets is shown. A Leon 3 + FPGA and an ARM + Montium implementation have been successfully realized. The FPGA implementation shows a speed-up of 43.6, whereas the Montium implementation shows a speed-up of 21.5, both compared to a software-only implementation. Power consumption is 42.0 mW for the FPGA and 60.2 mW for the Montium. A co-simulation tool, CosiMate, is used to achieve both on-target implementations in just five weeks.
    11th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, DSD 2008, Parma, Italy, September 3-5, 2008; 01/2008
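The variable block size full-search ME in the ICME 2008 abstract above suits data-parallel hardware because the SADs of the larger H.264 partitions (16x16, 16x8, 8x16, 8x8, 8x4, 4x8) can be obtained by summing 4x4 SADs computed once per candidate motion vector. The plain-Python sketch below shows this SAD-reuse idea for one 16x16 macroblock. It is a common technique, not necessarily the five-step decomposition used in that paper; the function names and the rule of skipping candidates that leave the reference frame are assumptions.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]  # luma samples indexed frame[y][x]

# Partition shapes as (height, width) in units of 4x4 sub-blocks; "16x8" is width x height.
SHAPES = {"16x16": (4, 4), "16x8": (2, 4), "8x16": (4, 2), "8x8": (2, 2),
          "8x4": (1, 2), "4x8": (2, 1), "4x4": (1, 1)}


def sad4x4_grid(cur: Frame, ref: Frame, mby: int, mbx: int, dy: int, dx: int) -> List[List[int]]:
    """SADs of the sixteen 4x4 sub-blocks of one macroblock for a single candidate."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            d = abs(cur[mby + y][mbx + x] - ref[mby + dy + y][mbx + dx + x])
            grid[y // 4][x // 4] += d
    return grid


def merge(grid: List[List[int]], h4: int, w4: int) -> Dict[Tuple[int, int], int]:
    """Sum 4x4 SADs into partitions of h4 x w4 sub-blocks, keyed by sub-block (row, col)."""
    return {(i, j): sum(grid[i + a][j + b] for a in range(h4) for b in range(w4))
            for i in range(0, 4, h4) for j in range(0, 4, w4)}


def vbs_full_search(cur: Frame, ref: Frame, mby: int, mbx: int, rng: int) -> Dict[Tuple[str, Tuple[int, int]], Tuple[int, int, int]]:
    """Best (sad, dy, dx) for every partition of every shape in one macroblock."""
    h, w = len(ref), len(ref[0])
    best: Dict[Tuple[str, Tuple[int, int]], Tuple[int, int, int]] = {}
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            # Skip candidates whose 16x16 window leaves the reference frame.
            if not (0 <= mby + dy <= h - 16 and 0 <= mbx + dx <= w - 16):
                continue
            grid = sad4x4_grid(cur, ref, mby, mbx, dy, dx)  # the only per-pixel work
            for name, (h4, w4) in SHAPES.items():
                for pos, cost in merge(grid, h4, w4).items():
                    key = (name, pos)
                    if key not in best or (cost, dy, dx) < best[key]:
                        best[key] = (cost, dy, dx)
    return best
```

Because every partition's cost is derived from the same sixteen 4x4 SADs, the per-pixel loop runs once per candidate no matter how many of the 41 partitions are evaluated.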