Ming Zhang

Zhejiang University, Hangzhou, Zhejiang Sheng, China

Publications (11) · 5.87 Total impact

  • Article: Optimizing inter-view prediction structures for multi-view video coding using simulated annealing
    Zheng Zhu, Dong-xiao Li, Ming Zhang
    ABSTRACT: New video applications, such as 3D video and free viewpoint video, require efficient compression of multi-view video. In addition to temporal redundancy, exploiting the inter-view redundancy is crucial to improve the performance of multi-view video coding. In this paper, we present a novel method to construct the optimal inter-view prediction structure for multi-view video coding using simulated annealing. In the proposed model, the design of the prediction structure is converted to the arrangement of coding order. Then, a simulated annealing algorithm is employed to minimize the total cost for obtaining the best coding order. This method is applicable to arbitrary irregular camera arrangements. Experimental results show that the annealing process converges rapidly to satisfactory results, and the generated optimal prediction structure outperforms the reference prediction structure of the joint multi-view video model (JMVM) by 0.1–0.8 dB in PSNR. Key words: Multi-view video coding, Prediction structure, Simulated annealing
    Journal of Zhejiang University: Science C 04/2012; 12(2):155-162.
  • Article: Depth-aided inpainting for disocclusion restoration of multi-view images using depth-image-based rendering
    ABSTRACT: A new algorithm is proposed for restoring disocclusion regions in depth-image-based rendering (DIBR) warped images. Current solutions include layered depth image (LDI), pre-filtering methods, and post-processing methods. The LDI is complicated, and pre-filtering of depth images causes noticeable geometrical distortions in cases of large baseline warping. This paper presents a depth-aided inpainting method which inherits merits from Criminisi’s inpainting algorithm. The proposed method features incorporation of a depth cue into texture estimation. The algorithm efficiently handles depth ambiguity by penalizing larger Lagrange multipliers of filling points closer to the warping position compared with the surrounding existing points. We perform morphological operations on depth images to accelerate the algorithm convergence, and adopt a luma-first strategy to adapt to various color sampling formats. Experiments on test multi-view sequences showed that our method is superior in depth differentiation and geometrical fidelity in the restoration of warped images. Also, peak signal-to-noise ratio (PSNR) statistics on non-hole regions and whole image comparisons both compare favorably to those obtained by state-of-the-art techniques.
    Journal of Zhejiang University - Science A: Applied Physics & Engineering 04/2012; 10(12):1738-1749. · 0.41 Impact Factor
  • Conference Proceeding: Hierarchical Joint Bilateral Filtering for Depth Post-Processing
    ABSTRACT: Various 3D applications require accurate and smooth depth maps, and post-processing is necessary for depth maps directly generated by different correspondence algorithms. A hierarchical joint bilateral filtering method is proposed to improve the coarse depth map. By first carrying out depth confidence measuring, pixels are put into different categories according to their matching confidence. Then the initial coarse depth map is down-sampled together with the corresponding confidence map. The depth map is progressively fixed during multistep up-sampling. Different from many filtering approaches, confident matches are propagated to unconfident regions by suppressing outliers in a hierarchical structure. Experimental results show that the proposed method can achieve significant improvement of the initial depth map with low computational complexity.
    Image and Graphics (ICIG), 2011 Sixth International Conference on; 09/2011
  • Conference Proceeding: GPU Based Implementation of 3DTV System
    ABSTRACT: This paper focuses on the near real-time implementation of an end-to-end 3DTV system. It is specially designed for the generation of high-quality disparity maps and depth-image-based rendering (DIBR) on the graphics processing unit (GPU) through the CUDA (Compute Unified Device Architecture) API. We propose novel methods including a kind of stereo matching with adaptive windows and an asymmetric edge adaptive filter (AEAF) for industrial application. These algorithms are structured in a way that exposes as much data parallelism as possible, and the power of shared memory and data parallel programming in GPU is exploited. We evaluate our proposed methods and implementation on the Middlebury benchmark, and the experimental results show that our method offers a suitable trade-off between accuracy and execution speed. Running on an NVIDIA Quadro FX4800 graphics card, for each 480×375 stereo image pair with 60 disparity levels, the proposed system takes about 146 ms for stereo matching, and DIBR takes 5.7 ms for rendering 1 view or 14 ms for rendering 8 views.
    Image and Graphics (ICIG), 2011 Sixth International Conference on; 09/2011
  • Article: An Asymmetric Edge Adaptive Filter for Depth Generation and Hole Filling in 3DTV
    ABSTRACT: An asymmetric edge adaptive filter (AEAF) is proposed in this paper to partially solve two problems in 3DTV, i.e. depth generation and hole filling. Different from other similar processing methods, a single AEAF operation can simultaneously achieve the effect of edge correction and pre-processing of the depth maps. Thus the computing complexity can be greatly reduced. On the one hand, based on the initial depth map obtained by simple algorithms, AEAF can achieve depth maps with comparatively accurate object edges, avoiding the high computation introduced by depth generation methods based on image and video segmentation. On the other hand, AEAF can reduce the area of holes in rendered views via asymmetric smoothing of depth maps, promising an improvement in the image quality with reduced artifacts and distortions. Experimental results in the applications of 2D-to-3D conversion and stereo matching show that a balance point is found by AEAF between the two aspects of the contradiction.
    IEEE Transactions on Broadcasting 10/2010; · 1.70 Impact Factor
  • Article: Asymmetric bidirectional view synthesis for free viewpoint and three-dimensional video
    ABSTRACT: Virtual view synthesis is a key technology for future free viewpoint video and three-dimensional video applications. A novel asymmetric bidirectional view synthesis method is proposed for multi-view video plus depth based systems. An intermediate virtual view is synthesized based on two adjacent reference views, which are termed as the main view and the auxiliary view. The main view contains two-dimensional color images plus per-pixel depth information. The auxiliary view only contains two-dimensional color images without depth information. The basic depth-image-based rendering technique is firstly optimized with a distance-and-depth weighted algorithm to solve the visibility and resampling problems and then applied in the following four-step view synthesis process. First, extract the occluded color data from the auxiliary view. Second, generate the corresponding occluded depth data. Then, generate two candidate virtual view images based on the main view information and the occluded information. Last, asymmetrically synthesize the two candidate virtual view images to generate the final virtual view image. Theoretical analysis and experimental results show that the proposed method can generate intermediate virtual views with high image quality.
    IEEE Transactions on Consumer Electronics 12/2009; · 0.94 Impact Factor
  • Conference Proceeding: A pipelined hardware architecture of deblocking filter in H.264/AVC
    ABSTRACT: To improve the performance of in-loop deblocking filter in H.264/AVC, this paper proposes a pipelined hardware architecture. A novel transposer design is presented and its hardware cost is reduced by 15% with forwarding logic to shift pixels. Through adopting a filtering order with vertical and horizontal edges processed alternately, on-chip memory is greatly saved and only two 32×32 bits SRAMs are employed as a buffer. The time to prepare and transfer the intermediate data is also reduced, and only 222 clock cycles are required to filter one macroblock. By processing strong and normal mode filtering simultaneously in a 5-stage pipeline, the proposed architecture can work at a maximum clock frequency of 200 MHz under 0.18 μm technology, and meet the real-time filtering requirement of high-definition (1920×1088) video at a frame rate up to 116 frames per second. Moreover, for applications with low power requirement, it only needs a working frequency of 55 MHz to realize real-time decoding of 1920×1088@30fps video.
    Communications and Networking in China, 2008. ChinaCom 2008. Third International Conference on; 09/2008
  • Article: An accurate low complexity algorithm for frequency estimation in MDCT domain
    ABSTRACT: This paper addresses the extraction of frequency, amplitude, and phase of sinusoidal components directly from the modified discrete cosine transform (MDCT) domain, which is broadly used in many digital audio applications. An efficient method is proposed for portable devices based on the approximation of the MDCT expression and simple combinations of local maximum MDCT coefficients. By extracting the frequency and amplitude of a stationary sinusoid, the experimental results demonstrate that the proposed algorithm produces accurate results with low computational complexity.
    IEEE Transactions on Consumer Electronics 09/2008; · 0.94 Impact Factor
  • Conference Proceeding: Pipelined Architecture Design of H.264/AVC CABAC Real-Time Decoding
    ABSTRACT: This paper proposes a high performance 4-stage pipelined VLSI architecture of H.264/AVC CABAC decoder based on four sequential memory accesses. Through adoption of several speed-up techniques such as redundant circuit and forwarding to eliminate the pipeline hazards, the parallelism among the interrelated decoding processes has been fully exploited and the pipeline stalls are avoided. Additionally, by exploring parallelism in bypass mode, the proposed design decodes two successive bins in each cycle. Experimental results show that the proposed architecture can achieve a throughput of more than 1 bin/cycle, which meets the real-time decoding requirement of HD1080i (1920x1088) video.
    Circuits and Systems for Communications, 2008. ICCSC 2008. 4th IEEE International Conference on; 06/2008
  • Article: Efficient pipelined CABAC encoding architecture
    ABSTRACT: Context-based adaptive binary arithmetic coding (CABAC) is one of the key techniques adopted in H.264/AVC to achieve much higher compression efficiency than any other existing video compression standard. Because of its serial and inter-process dependent processing characteristics, the high performance design of a CABAC codec is a challenge for hardware implementation. For example, the renormalization and bit-generation steps in the encoding architecture are successive processes with a variable iteration number, which prevents high-throughput pipelined operation. In this paper, we propose a fully pipelined design scheme of CABAC encoder based on SoC architecture. With speed-up techniques for pipelining and a special but not costly design of renormalization and bit-generation, the proposed design can achieve a steady throughput of one symbol/cycle except during the slice initialization process.
    IEEE Transactions on Consumer Electronics 06/2008; · 0.94 Impact Factor
  • Article: Architecture Design for H.264/AVC Integer Motion Estimation with Minimum Memory Bandwidth
    Dong-Xiao Li, Wei Zheng, Ming Zhang
    ABSTRACT: Motion estimation (ME) is the most critical component of a video coding system, and it also accounts for the major part of computational complexity and memory bandwidth. For H.264/AVC integer motion estimation (IME), this paper presents a novel memory-access and computation efficient full-search block-matching hardware architecture. With the highest level of on-chip data reuse, one-access for off-chip reference pixels is achieved, and the off-chip memory bandwidth is thus minimized. By distributed data caching and virtual connection of reference picture boundaries, the data traffic scheduling is simple, regular and efficient. The computation engine employs a two-dimensional (2-D) systolic processor array to calculate the absolute differences in single-instruction multiple-data (SIMD) manner, and 2-D adder trees to sum up the absolute differences, all with 100% utilization. The proposed architecture fully supports variable block-size matching of H.264/AVC, and can produce 41 sums of absolute differences (SADs) for one search point every cycle without bubble. The architecture is described in parameterized design, and an implementation for standard-definition digital TV encoding applications is presented. Theoretical analysis and experimental results show that the proposed architecture can achieve the minimum off-chip memory bandwidth and the maximum computational performance.
    IEEE Transactions on Consumer Electronics 09/2007; · 0.94 Impact Factor
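
The figure of 41 SADs per search point in the last abstract above matches the number of H.264/AVC luma partitions in one 16x16 macroblock: 16 of 4x4, 8 each of 8x4 and 4x8, 4 of 8x8, 2 each of 16x8 and 8x16, and 1 of 16x16. The following is a minimal software sketch of that count, not the paper's hardware design. It assumes the current and reference blocks are 16x16 nested lists of integers, and the function names are hypothetical. It computes the sixteen 4x4 SADs once and sums them into the larger partitions, which is one common way to derive variable block-size SADs.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block of a 16x16 block."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge(grid, h, w):
    """Sum 4x4-block SADs into partitions spanning h rows and w columns of the grid."""
    return [
        sum(grid[r + i][c + j] for i in range(h) for j in range(w))
        for r in range(0, 4, h)
        for c in range(0, 4, w)
    ]


def all_sads(cur, ref):
    """Return the SADs of every partition size for one search point."""
    grid = sad_4x4_grid(cur, ref)
    # Partition names are width x height in pixels; values are
    # (rows, columns) measured in 4x4 sub-blocks.
    shapes = {
        "4x4": (1, 1),
        "8x4": (1, 2),
        "4x8": (2, 1),
        "8x8": (2, 2),
        "16x8": (2, 4),
        "8x16": (4, 2),
        "16x16": (4, 4),
    }
    return {name: merge(grid, h, w) for name, (h, w) in shapes.items()}
```

Calling `all_sads(cur, ref)` returns a dictionary keyed by partition size. The lengths of its lists add up to 41, and the single 16x16 entry equals the sum of the sixteen 4x4 entries.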