P. Merkle

Fraunhofer Heinrich Hertz Institute HHI, Berlin, Germany

Publications (44) · 18.14 Total impact

  •
    ABSTRACT: The paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter and inter-view residual prediction for coding of the dependent video views have been developed and integrated. Furthermore, for depth coding, new intra coding modes, a modified motion compensation and motion vector coding as well as the concept of motion parameter inheritance are part of the HEVC extension. A novel encoder control uses view synthesis optimization, which guarantees that high quality intermediate views can be generated based on the decoded data. The bitstream format supports the extraction of partial bitstreams, so that conventional 2D video, stereo video and the full multi-view video plus depth (MVD) format can be decoded from a single bitstream. Objective and subjective results are presented, demonstrating that the proposed approach provides about 50% bit rate savings in comparison to HEVC simulcast and about 20% in comparison to a straightforward multi-view extension of HEVC without the newly developed coding tools.
    IEEE Transactions on Image Processing 05/2013; · 3.20 Impact Factor
  • P. Merkle, K. Müller, T. Wiegand
    ABSTRACT: This paper presents a new approach for the depth coding part of a 3D video coding extension based on the Multiview Video plus Depth (MVD) representation. Our approach targets a higher coding efficiency for the depth component and is motivated by the fact that depth signals have specific characteristics that differ from video. For this purpose we apply the method of wedgelet segmentation with residual adaptation for depth blocks by implementing a new set of coding and prediction modes and by optimizing the algorithms for efficient processing and signaling. The results show that a bit rate reduction of up to 6% is achieved for the depth component, using a 3D video codec based on the high-efficiency video coding (HEVC) technology. Apart from the depth coding gains, wedgelets lead to a considerably better rendered view quality.
    Multimedia and Expo (ICME), 2013 IEEE International Conference on; 01/2013
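Conceptually, a wedgelet partition splits a square depth block along a straight line between two border points and approximates each of the two resulting regions by a constant value. The sketch below is a minimal, hypothetical NumPy illustration of that idea only; it is not the codec's actual mode-decision or signaling logic, and all names are invented for this example.

```python
import numpy as np

def wedgelet_approximation(block, p0, p1):
    """Approximate a square depth block by a wedgelet: the straight line
    from border point p0 to border point p1 splits the block into two
    regions, each modeled by its mean ("constant partition value").
    Assumes both regions are non-empty."""
    n = block.shape[0]
    ys, xs = np.mgrid[0:n, 0:n]
    (x0, y0), (x1, y1) = p0, p1
    # Sign of the cross product decides on which side of the line a pixel lies.
    side = (x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0) >= 0
    approx = np.empty_like(block, dtype=float)
    approx[side] = block[side].mean()
    approx[~side] = block[~side].mean()
    return approx, side

# Toy 8x8 depth block with a diagonal foreground/background edge:
block = np.where(np.add.outer(np.arange(8), np.arange(8)) < 8, 50.0, 200.0)
approx, side = wedgelet_approximation(block, (0, 7), (7, 0))
```

An encoder in the spirit of the paper would search over candidate line endpoints and keep the wedgelet that minimizes a rate-distortion cost; the search and signaling are omitted here.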
  •
    ABSTRACT: This paper presents a new approach for 3D video coding, where the video and the depth component of an MVD representation are jointly coded in an integrated framework. This enables a new type of prediction for exploiting the correlations of video and depth signals in addition to existing methods for temporal and inter-view prediction. Our new method is referred to as inter-component prediction and we adopt it for predicting non-rectangular partitions in depth blocks. By dividing the block into two regions, each represented with a constant value, such block partitions are well-adapted to the characteristics of depth maps. The results show that this approach reduces the bit rate of the depth component by up to 11% and leads to an increased quality of rendered views.
    01/2012;
  •
    ABSTRACT: The presented approach for 3D video coding uses the multiview video plus depth format, in which a small number of video views as well as associated depth maps are coded. Based on the coded signals, additional views required for displaying the 3D video on an autostereoscopic display can be generated by depth image based rendering techniques. The developed coding scheme represents an extension of HEVC, similar to the MVC extension of H.264/AVC. However, in addition to the well-known disparity-compensated prediction, advanced techniques for inter-view and inter-component prediction, the representation of depth blocks, and the encoder control for depth signals have been integrated. In comparison to simulcasting the different signals using HEVC, the proposed approach provides about 40% and 50% bit rate savings for the tested configurations with 2 and 3 views, respectively. Bit rate reductions of about 20% have been obtained in comparison to a straightforward multiview extension of HEVC without the newly developed coding tools.
    01/2012;
  •
    ABSTRACT: This paper presents an approach for 3D video coding that uses a format in which a small number of views as well as associated depth maps are coded and transmitted. At the receiver side, additional views required for displaying the 3D video on an autostereoscopic display can be generated based on the corresponding decoded signals by using depth image based rendering (DIBR) techniques. In terms of coding technology, the proposed coding scheme represents an extension of High Efficiency Video Coding (HEVC), similar to the Multiview Coding (MVC) extension of H.264/AVC. Besides the well-known disparity-compensated prediction, advanced techniques for inter-view and inter-component prediction, the representation of depth blocks, and the encoder control for depth signals have been developed and integrated. In comparison to simulcasting the different signals using HEVC, the proposed approach provides about 40% and 50% average bit rate savings for a whole test set when configured to comply with a 2- and 3-view scenario, respectively. The proposed codec was submitted as response to a Call for Proposals on 3D Video Technology issued by the ISO/IEC Moving Picture Experts Group (MPEG) and it was ranked as the overall best performing HEVC-based proposal in the related subjective tests.
    Image Processing (ICIP), 2012 19th IEEE International Conference on; 01/2012
  •
    ABSTRACT: This paper presents efficient coding tools for depth data in depth-enhanced video formats. The method is based on the high-efficiency video codec (HEVC). The developed tools include new depth modeling modes (DMMs), in particular using non-rectangular wedgelet and contour block partitions. As the depth data is used for synthesis of new video views, a specific 3D video encoder optimization is used. This view synthesis optimization (VSO) considers the exact local distortion in a synthesized intermediate video portion or image block for the depth map coding. In a fully optimized 3D-HEVC coder, VSO achieves average bit rate savings of 17%, while DMMs gain 6% in BD rate, even though the depth rate only contributes 10% to the overall MVD bit rate.
    Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific; 01/2012
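The view synthesis optimization idea can be caricatured as a rate-distortion decision in which depth coding modes are judged by the distortion they cause in a rendered view rather than in the depth map itself. The toy sketch below assumes a hypothetical `render` callback and candidate list; it only illustrates the cost computation, not the actual 3D-HEVC encoder control.

```python
def vso_mode_decision(candidates, render, reference_view, lam):
    """Hedged sketch of view synthesis optimization (VSO): each candidate
    (reconstructed_depth, rate) pair is scored by the squared error of the
    view rendered from it, plus a Lagrangian rate term.  `render` and the
    candidate list are illustrative stand-ins, not codec APIs."""
    best = None
    for depth_rec, rate in candidates:
        synth = render(depth_rec)
        dist = sum((s - r) ** 2 for s, r in zip(synth, reference_view))
        cost = dist + lam * rate
        if best is None or cost < best[0]:
            best = (cost, depth_rec, rate)
    return best

# Toy usage: a fake "renderer" that halves depth values into view samples.
render = lambda d: [x // 2 for x in d]
reference_view = [10, 10]
candidates = [([20, 20], 5), ([22, 20], 1)]
best = vso_mode_decision(candidates, render, reference_view, 1.0)
```

The point of the construction is that a depth error which does not change the rendered view costs nothing, which is exactly why VSO can spend fewer bits on depth.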
  •
    ABSTRACT: A depth image-based rendering (DIBR) approach with advanced inpainting methods is presented. The DIBR algorithm can be used in 3-D video applications to synthesize a number of different perspectives of the same scene, e.g., from a multiview-video-plus-depth (MVD) representation. This MVD format consists of video and depth sequences for a limited number of original camera views of the same natural scene. Here, DIBR methods allow the computation of additional new views. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original views may become visible, especially in extrapolated views beyond the viewing range of the original cameras. The presented algorithm synthesizes these occluded textures. The synthesizer achieves visually satisfying results by taking spatial and temporal consistency measures into account. Detailed experiments show significant objective and subjective gains of the proposed method in comparison to the state-of-the-art methods.
    IEEE Transactions on Multimedia 07/2011; · 1.75 Impact Factor
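The core DIBR step for rectified cameras is to convert each pixel's depth into a horizontal disparity and splat the pixel into the virtual view, with a z-buffer resolving conflicts; the positions left unfilled are exactly the disocclusions that the inpainting stage must cover. The following is a deliberately simplified sketch (hypothetical linear depth-to-disparity mapping, integer disparities), not the paper's algorithm.

```python
import numpy as np

def dibr_shift(texture, depth, max_disparity):
    """Minimal 1D-parallel DIBR sketch (assumed rectified cameras):
    map depth [0..255] linearly to a horizontal disparity and splat each
    pixel into the virtual view.  Unfilled positions (-1) are the
    disocclusions that hole filling must cover afterwards."""
    h, w = texture.shape
    virtual = np.full((h, w), -1.0)
    z_buf = np.full((h, w), -1.0)   # keep the nearest (largest-depth) pixel
    for y in range(h):
        for x in range(w):
            d = int(round(depth[y, x] / 255.0 * max_disparity))
            xv = x + d
            if 0 <= xv < w and depth[y, x] > z_buf[y, xv]:
                virtual[y, xv] = texture[y, x]
                z_buf[y, xv] = depth[y, x]
    return virtual
```

On a toy scene with a foreground bar over a flat background, the foreground shifts by the full disparity and leaves a strip of -1 "holes" where the background was occluded in the source view.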
  • K. Müller, P. Merkle, T. Wiegand
    ABSTRACT: Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for pictures delivered by two input cameras. Typically, such stereo systems only reproduce these two camera views at the receiver, and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views. However, since the quality improvement of multiview displays will be governed by an increase of emitted views, a format is needed that allows the generation of arbitrary numbers of views with the transmission bit rate being constant. Such a format is the combination of video signals and associated depth maps. The depth maps provide disparities associated with every sample of the video signal that can be used to render arbitrary numbers of additional views via view synthesis. This paper describes efficient coding methods for video and depth data. For the generation of views, synthesis methods are presented, which mitigate errors from depth estimation and coding.
    Proceedings of the IEEE 05/2011; · 6.91 Impact Factor
  • Karsten Müller, Philipp Merkle
    ABSTRACT: Stereoscopic video transmission systems have now evolved from 2D video systems and have been commercialized for a number of application areas, driven by developments in stereo capturing and display technology. With the new developments in autostereoscopic display technology, these stereo systems need to further advance towards 3D video systems. In contrast to all previous video coding technologies, 3D video data requires a number of new assumptions and novel technology developments. This paper discusses the evolution from 2D to stereo video and finally to 3D video systems and highlights some of the major new challenges for 3D video. Finally, an evaluation framework for 3D video technology is shown that addresses these challenges and is used by ISO-MPEG for standardizing the best 3D video coding solution.
    01/2011;
  •
    ABSTRACT: Depth-image-based rendering (DIBR) is used to generate additional views of a real-world scene from images or videos and associated per-pixel depth information. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original view may become visible in the “virtual” image. The resulting question is: how can these disocclusions be covered in a visually plausible manner? In this paper, a new temporally and spatially consistent hole filling method for DIBR is presented. In a first step, disocclusions in the depth map are filled. Then, a background sprite is generated and updated with every frame using the original and synthesized information from previous frames to achieve temporally consistent results. Next, small holes resulting from depth estimation inaccuracies are closed in the textured image, using methods that are based on solving Laplace equations. The residual disoccluded areas are coarsely initialized and subsequently refined by patch-based texture synthesis. Experimental results are presented, highlighting that gains in objective and visual quality can be achieved in comparison to the latest MPEG view synthesis reference software (VSRS).
    Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
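Closing small holes "by solving Laplace equations" amounts to computing a harmonic fill: each hole pixel converges to the average of its four neighbours, with the known pixels acting as boundary values. Below is a minimal Jacobi-style sketch of that one step (assuming holes do not touch the image border); the paper's full pipeline additionally uses a background sprite and patch-based texture synthesis, which are not shown.

```python
import numpy as np

def fill_holes_laplace(img, hole_mask, iters=500):
    """Hedged sketch: fill masked pixels by iterating the discrete Laplace
    equation (each hole pixel becomes the average of its 4 neighbours),
    Jacobi-style, with known pixels as fixed boundary values."""
    out = img.astype(float).copy()
    out[hole_mask] = out[~hole_mask].mean()   # rough initialization
    for _ in range(iters):
        avg = 0.25 * (np.roll(out, 1, 0) + np.roll(out, -1, 0)
                      + np.roll(out, 1, 1) + np.roll(out, -1, 1))
        out[hole_mask] = avg[hole_mask]       # only hole pixels are updated
    return out
```

Because the harmonic solution of a linear intensity ramp is the ramp itself, a small hole punched into a gradient image is recovered almost exactly, which is why this works well for smooth disocclusion borders but not for textured areas.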
  •
    ABSTRACT: This paper introduces a correlation histogram method for analyzing the different components of depth-enhanced 3D video representations. Depth-enhanced 3D representations such as multi-view video plus depth consist of two components: video and depth map sequences. As depth maps represent the scene geometry, their characteristics differ from the video data. We present a comparative analysis that identifies the significant characteristics of the two components via correlation histograms. These characteristics are of special importance for compression. Modern video codecs like H.264/AVC are highly optimized to the statistical properties of natural video. Therefore the effect of compressing the two components using the MVC extension of H.264/AVC is evaluated in the second part of the analysis. The presented results show that correlation histograms are a powerful and well-suited method for analyzing the impact of processing on the characteristics of depth-enhanced 3D video.
    Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
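One simple form of such a correlation histogram is a 2D histogram of (pixel, neighbour) value pairs: natural video scatters around the diagonal, while piecewise-smooth depth maps concentrate almost entirely on it, with isolated off-diagonal entries at object edges. The exact construction used in the paper is not reproduced here; the sketch below is an assumed horizontal-neighbour variant.

```python
import numpy as np

def correlation_histogram(img, bins=256):
    """Hedged sketch: 2D histogram of (pixel, right-neighbour) value pairs
    over an 8-bit image.  Diagonal mass indicates smooth regions;
    off-diagonal mass indicates edges."""
    a = img[:, :-1].ravel()
    b = img[:, 1:].ravel()
    hist, _, _ = np.histogram2d(a, b, bins=bins, range=[[0, 256], [0, 256]])
    return hist

# Toy depth map: two constant regions separated by a vertical edge.
depth = np.zeros((4, 8))
depth[:, 4:] = 200
hist = correlation_histogram(depth)
```

For this toy depth map, almost all pairs land on the diagonal bins (0,0) and (200,200), and only the four edge-crossing pairs land off-diagonal at (0,200), mirroring the depth-versus-video contrast the paper analyzes.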
  •
    ABSTRACT: The introduction of the first 3D systems for digital cinema and home entertainment is based on stereo technology. For efficiently supporting new display types, depth-enhanced formats and coding technology are required, as introduced in this overview paper. First, we discuss the necessity for a generic 3D video format, as the current state-of-the-art in multi-view video coding cannot support different types of multi-view displays at the same time. Therefore, a generic depth-enhanced 3D format is developed, where any number of views can be generated from one bit stream. This, however, requires a complex framework for 3D video, where not only the 3D format and new coding methods are investigated, but also view synthesis and the provision of high-quality depth maps, e.g. via depth estimation. We present this framework and discuss the interdependencies between the different modules.
    Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
  •
    ABSTRACT: In free viewpoint television or 3D video, depth image based rendering (DIBR) is used to generate virtual views based on a textured image and its associated depth information. In doing so, image regions which are occluded in the original view may become visible in the virtual image. One of the main challenges in DIBR is to extrapolate known textures into the disoccluded area without inserting subjective annoyance. In this paper, a new hole filling approach for DIBR using texture synthesis is presented. Initially, the depth map in the virtual view is filled at disoccluded locations. Then, in the textured image, holes of limited spatial extent are closed by solving Laplace equations. Larger disoccluded regions are initialized via median filtering and subsequently refined by patch-based texture synthesis. Experimental results show that the proposed approach provides improved rendering results in comparison to the latest MPEG view synthesis reference software (VSRS) version 3.6.
    Multimedia and Expo (ICME), 2010 IEEE International Conference on; 08/2010
  •
    ABSTRACT: The paper presents an analysis of temporal residual data sub-sampling in the layered depth video (LDV) format. First, the LDV format with a main view and residual views or data is introduced. Then, the extraction of residual data and its block-wise alignment for better coding efficiency are presented. Next, the temporal residual data sub-sampling is shown together with an advanced merging method prior to sub-sampling for preserving the information necessary for good view synthesis results. These synthesis results are shown for intermediate views, generated from the uncoded as well as coded LDV data with different temporal sub-sampling factors and merging methods for the residual data. The results show that temporal residual data sub-sampling with data merging can outperform regular LDV without sub-sampling for both the coded and uncoded versions.
    3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2010; 07/2010
  • Philipp Merkle, Karsten Müller, Thomas Wiegand
    ABSTRACT: An overview of existing and upcoming 3D video coding standards is given. Various 3D video formats are available, each with individual pros and cons. The 3D video formats can be separated into two classes: video-only formats (such as stereo and multiview video) and depth-enhanced formats (such as video plus depth and multiview video plus depth). Since all these formats consist of at least two video sequences and possibly additional depth data, efficient compression is essential for the success of 3D video applications and technologies. For the video-only formats, the H.264 family of coding standards already provides efficient and widely established compression algorithms: H.264/AVC simulcast, the H.264/AVC stereo SEI message, and H.264/MVC. For the depth-enhanced formats, standardized coding algorithms are currently being developed. New and specially adapted coding approaches are necessary, as the depth or disparity information included in these formats has significantly different characteristics than video and is not displayed directly, but used for rendering. Motivated by evolving market needs, MPEG has started an activity to develop a generic 3D video standard within the 3DVC ad-hoc group. Key features of the standard are efficient and flexible compression of depth-enhanced 3D video representations and the decoupling of content creation and display requirements.
    Proc SPIE 07/2010;
  • P. Merkle, K. Müller, T. Wiegand
    ABSTRACT: An overview of the 3D video processing chain is given, highlighting existing and upcoming technologies and standards, addressing dependencies and interactions between acquisition, coding and display, and pointing out requirements, constraints and problems of these individual modules.
    IEEE Transactions on Consumer Electronics 06/2010; · 1.09 Impact Factor
  •
    ABSTRACT: For advanced 3D Video (3DV) applications, efficient data representations are investigated, which only transmit a subset of the views that are required for 3D visualization. From this subset, all intermediate views are synthesized from sample-dense color and depth data. In this paper, the method of reliability-based view synthesis from compressed multi-view plus depth (MVD) data is investigated and corresponding results are shown. The initial problem in such 3DV systems is the interdependency between view capturing, coding and view synthesis. For evaluating each component separately, we first generate results from the coding stage only, where color and depth coding are carried out separately. In the next step, we add the view synthesis stage with reliability-based view synthesis and show how the separate coding results influence the view synthesis quality and what types of artifacts are produced. Efficient bit rate distribution between color and depth is investigated by objective as well as subjective evaluations. Furthermore, quality characteristics across the viewing range for different bit rate distributions are analyzed. Finally, the robustness of the reliability-based view synthesis to coding artifacts is presented.
    Image Processing (ICIP), 2009 16th IEEE International Conference on; 12/2009
  •
    ABSTRACT: An overview of available 3D video formats and standards is given. It is explained why none of these, although each useful in some particular sense for some particular application, satisfies all requirements of all 3D video applications. Advanced formats currently under investigation, which have the potential to serve as a generic, flexible and efficient future 3D video standard, are explained. Then an MPEG activity for the development of such a new standard is described and an outlook on future developments is given.
    Image and Signal Processing and Analysis, 2009. ISPA 2009. Proceedings of 6th International Symposium on; 10/2009
  •
    ABSTRACT: Recently, the popularity of 3D video has been growing significantly and it may turn into a mass market for home users in the near future. However, the diversity of 3D video content formats is still hampering wide success. An overview of available and emerging 3D video formats and standards is given, which are mostly related to specific types of applications and 3D displays. This includes conventional stereo video, multiview video, video plus depth, multiview video plus depth and layered depth video. Features and limitations are explained. Finally, depth enhanced stereo (DES) is introduced as a flexible, generic, and efficient 3D video format that can unify all others and serve as a universal 3D video format in the future.
    Picture Coding Symposium, 2009. PCS 2009; 06/2009
  •
    ABSTRACT: This paper presents a study on different techniques for stereo video compression and their optimization for mobile 3D services. Stereo video enables 3D television, but as mobile services are subject to various limitations, including bandwidth, memory, and processing power, efficient compression is required. Three of the currently available MPEG coding standards are applicable to stereo video coding, namely H.264/AVC with and without the stereo SEI message and H.264/MVC. These methods are evaluated with respect to the limitations of mobile services. The results clearly indicate that, for a given bitrate, inter-view prediction as well as temporal prediction with hierarchical B pictures leads to significantly increased subjective and objective quality. Although both techniques require more complex processing at the encoder side, their coding efficiency offers the chance to realize 3D stereo at the bitrate of conventional video for mobile services.
    3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2009; 06/2009