Philipp Merkle

Fraunhofer Heinrich Hertz Institute HHI, Berlin, Germany

Publications (49) · 31.44 total impact

  • Philipp Merkle · Karsten Müller · Detlev Marpe · Thomas Wiegand

    Article · Jan 2015 · IEEE Transactions on Circuits and Systems for Video Technology
  • Karsten Müller · Philipp Merkle · Gerhard Tech
    ABSTRACT: In this chapter, compression methods for 3D video (3DV) are presented. This includes data formats, video and depth compression, evaluation methods, and analysis tools. First, the fundamental principles of video coding for classical 2D video content are reviewed, including signal prediction, quantization, transformation, and entropy coding. These methods are extended toward multi-view video coding (MVC), where inter-view prediction is added to the 2D video coding methods to gain higher coding efficiency. Next, 3DV coding principles are introduced, which are different from previous coding methods. In 3DV, a generic input format is used for coding and a dense number of output views are generated for different types of autostereoscopic displays. This influences the format selection, encoder optimization, evaluation methods, and requires new modules, like the decoder-side view generation, as discussed in this chapter. Finally, different 3DV formats are compared and discussed for their applicability for 3DV systems. © Springer Science+Business Media New York 2013. All rights are reserved.
    Article · Nov 2013
  • ABSTRACT: The paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter and inter-view residual prediction for coding of the dependent video views have been developed and integrated. Furthermore, for depth coding, new intra coding modes, a modified motion compensation and motion vector coding as well as the concept of motion parameter inheritance are part of the HEVC extension. A novel encoder control uses view synthesis optimization, which guarantees that high quality intermediate views can be generated based on the decoded data. The bitstream format supports the extraction of partial bitstreams, so that conventional 2D video, stereo video and the full multi-view video plus depth (MVD) format can be decoded from a single bitstream. Objective and subjective results are presented, demonstrating that the proposed approach provides about 50% bit rate savings in comparison to HEVC simulcast and about 20% in comparison to a straightforward multi-view extension of HEVC without the newly developed coding tools.
    Article · May 2013 · IEEE Transactions on Image Processing
  • P. Merkle · K. Müller · T. Wiegand
    ABSTRACT: This paper presents a new approach for the depth coding part of a 3D video coding extension based on the Multiview Video plus Depth (MVD) representation. Our approach targets a higher coding efficiency for the depth component and is motivated by the fact that depth signals have specific characteristics that differ from video. For this purpose we apply the method of wedgelet segmentation with residual adaptation for depth blocks by implementing a new set of coding and prediction modes and by optimizing the algorithms for efficient processing and signaling. The results show that a bit rate reduction of up to 6% is achieved for the depth component, using a 3D video codec based on the high-efficiency video coding (HEVC) technology. Apart from the depth coding gains, wedgelets lead to a considerably better rendered view quality.
    Conference Paper · Jan 2013
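The wedgelet approach described above can be illustrated with a small sketch: a depth block is split by a straight line into two regions, each approximated by a constant value, and the line with the smallest squared error is kept. This is a simplified, hypothetical illustration (3D-HEVC signals wedgelet partitions from a predefined pattern table and adds residual adaptation); all names below are invented for the example.

```python
import numpy as np

def wedgelet_mask(n, p0, p1):
    """Binary mask splitting an n x n block by the straight line p0 -> p1."""
    ys, xs = np.mgrid[0:n, 0:n]
    # The sign of the 2D cross product decides which side of the line a pixel is on.
    return ((p1[0] - p0[0]) * (ys - p0[1]) - (p1[1] - p0[1]) * (xs - p0[0])) > 0

def best_wedgelet(block, candidates):
    """Pick the candidate line whose two constant-valued regions minimize the SSE."""
    best = None
    for p0, p1 in candidates:
        mask = wedgelet_mask(block.shape[0], p0, p1)
        if mask.all() or (~mask).all():
            continue  # degenerate split: only one region
        approx = np.where(mask, block[mask].mean(), block[~mask].mean())
        sse = ((block - approx) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, mask, approx)
    return best

# Synthetic 8x8 depth block with a sharp diagonal edge.
depth = np.fromfunction(lambda y, x: np.where(x > y, 200.0, 50.0), (8, 8))
sse, mask, approx = best_wedgelet(depth, [((0, 0), (7, 7)), ((0, 7), (7, 0))])
# The diagonal candidate fits the edge far better than the anti-diagonal one.
```

The point of the sketch is the mode decision: because depth maps are piecewise smooth with sharp edges, two constants separated by a line often approximate a block better than a transform-coded residual.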
  • ABSTRACT: This paper presents an approach for 3D video coding that uses a format in which a small number of views as well as associated depth maps are coded and transmitted. At the receiver side, additional views required for displaying the 3D video on an autostereoscopic display can be generated based on the corresponding decoded signals by using depth image based rendering (DIBR) techniques. In terms of coding technology, the proposed coding scheme represents an extension of High Efficiency Video Coding (HEVC), similar to the Multiview Coding (MVC) extension of H.264/AVC. Besides the well-known disparity-compensated prediction, advanced techniques for inter-view and inter-component prediction, the representation of depth blocks, and the encoder control for depth signals have been developed and integrated. In comparison to simulcasting the different signals using HEVC, the proposed approach provides about 40% and 50% average bit rate savings for a whole test set when configured to comply with a 2- and 3-view scenario, respectively. The proposed codec was submitted as response to a Call for Proposals on 3D Video Technology issued by the ISO/IEC Moving Picture Experts Group (MPEG) and it was ranked as the overall best performing HEVC-based proposal in the related subjective tests.
    Conference Paper · Sep 2012
  • ABSTRACT: The presented approach for 3D video coding uses the multiview video plus depth format, in which a small number of video views as well as associated depth maps are coded. Based on the coded signals, additional views required for displaying the 3D video on an autostereoscopic display can be generated by depth image based rendering techniques. The developed coding scheme represents an extension of HEVC, similar to the MVC extension of H.264/AVC. However, in addition to the well-known disparity-compensated prediction, advanced techniques for inter-view and inter-component prediction, the representation of depth blocks, and the encoder control for depth signals have been integrated. In comparison to simulcasting the different signals using HEVC, the proposed approach provides about 40% and 50% bit rate savings for the tested configurations with 2 and 3 views, respectively. Bit rate reductions of about 20% have been obtained in comparison to a straightforward multiview extension of HEVC without the newly developed coding tools.
    Article · May 2012
  • ABSTRACT: This paper presents a new approach for 3D video coding, where the video and the depth component of an MVD representation are jointly coded in an integrated framework. This enables a new type of prediction for exploiting the correlations of video and depth signals in addition to existing methods for temporal and inter-view prediction. Our new method is referred to as inter-component prediction and we adopt it for predicting non-rectangular partitions in depth blocks. By dividing the block into two regions, each represented with a constant value, such block partitions are well-adapted to the characteristics of depth maps. The results show that this approach reduces the bit rate of the depth component by up to 11% and leads to an increased quality of rendered views.
    Article · May 2012
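The two-region idea can be sketched in a few lines: derive a partition of the depth block from the co-located video block, then represent each depth region by a constant. The threshold-at-the-mean partitioning and all names here are illustrative assumptions, not the codec's actual derivation.

```python
import numpy as np

def predict_depth_partition(texture_block, depth_block):
    """Toy inter-component prediction: derive a two-region partition of a depth
    block from the co-located texture block (threshold at its mean), then
    represent each depth region by a constant value.
    """
    mask = texture_block > texture_block.mean()
    pred = np.empty_like(depth_block, dtype=float)
    pred[mask] = depth_block[mask].mean()
    pred[~mask] = depth_block[~mask].mean()
    return mask, pred

# Texture with a bright object on a dark background; depth follows the same edge,
# which is exactly the video/depth correlation the prediction exploits.
tex = np.where(np.arange(64).reshape(8, 8) % 8 < 4, 30.0, 220.0)
dep = np.where(np.arange(64).reshape(8, 8) % 8 < 4, 90.0, 40.0)
mask, pred = predict_depth_partition(tex, dep)
```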
  • K. Müller · P. Merkle · G. Tech · T. Wiegand
    ABSTRACT: This paper presents efficient coding tools for depth data in depth-enhanced video formats. The method is based on the high-efficiency video codec (HEVC). The developed tools include new depth modeling modes (DMMs), in particular using non-rectangular wedgelet and contour block partitions. As the depth data is used for synthesis of new video views, a specific 3D video encoder optimization is used. This view synthesis optimization (VSO) considers the exact local distortion in a synthesized intermediate video portion or image block for the depth map coding. In a fully optimized 3D-HEVC coder, VSO achieves average bit rate savings of 17%, while DMMs gain 6% in BD rate, even though the depth rate only contributes 10% to the overall MVD bit rate.
    Conference Paper · Jan 2012
  • ABSTRACT: A depth image-based rendering (DIBR) approach with advanced inpainting methods is presented. The DIBR algorithm can be used in 3-D video applications to synthesize a number of different perspectives of the same scene, e.g., from a multiview-video-plus-depth (MVD) representation. This MVD format consists of video and depth sequences for a limited number of original camera views of the same natural scene. Here, DIBR methods allow the computation of additional new views. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original views may become visible, especially in extrapolated views beyond the viewing range of the original cameras. The presented algorithm synthesizes these occluded textures. The synthesizer achieves visually satisfying results by taking spatial and temporal consistency measures into account. Detailed experiments show significant objective and subjective gains of the proposed method in comparison to the state-of-the-art methods.
    Article · Jul 2011 · IEEE Transactions on Multimedia
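The underlying DIBR step is a forward warp: each sample moves horizontally by a disparity derived from its depth, and target positions that receive no sample are the disocclusions that the inpainting stage must fill. A minimal one-scanline sketch with integer disparities and invented names (real DIBR operates on full images with sub-pixel accuracy):

```python
import numpy as np

def warp_row(colors, disparities, width):
    """Forward-warp one scanline: each sample moves horizontally by its integer
    disparity; positions left unfilled are disocclusions (marked -1).
    Nearer samples (larger disparity) win when two map to the same position.
    """
    out = np.full(width, -1, dtype=colors.dtype)
    depth_buf = np.full(width, -1)
    for x, (c, d) in enumerate(zip(colors, disparities)):
        xt = x + d
        if 0 <= xt < width and d > depth_buf[xt]:
            out[xt], depth_buf[xt] = c, d
    return out

row = np.array([10, 20, 30, 40, 50])
disp = np.array([0, 0, 2, 2, 2])  # foreground region shifts right by 2
warped = warp_row(row, disp, 5)
# Positions 2 and 3 receive no sample: they are disoccluded in the virtual view.
```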
  • Karsten Mueller · Philipp Merkle · Thomas Wiegand
    ABSTRACT: Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for pictures delivered by two input cameras. Typically, such stereo systems only reproduce these two camera views at the receiver, and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views. However, since the quality improvement of multiview displays will be governed by an increase of emitted views, a format is needed that allows the generation of arbitrary numbers of views with the transmission bit rate being constant. Such a format is the combination of video signals and associated depth maps. The depth maps provide disparities associated with every sample of the video signal that can be used to render arbitrary numbers of additional views via view synthesis. This paper describes efficient coding methods for video and depth data. For the generation of views, synthesis methods are presented, which mitigate errors from depth estimation and coding.
    Article · May 2011 · Proceedings of the IEEE
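The depth-to-disparity relation mentioned in the abstract is commonly realized with an inverse-depth quantization: an 8-bit sample v maps to z = 1 / ((v/255) * (1/z_near - 1/z_far) + 1/z_far), and the disparity follows as focal * baseline / z. A sketch with made-up camera parameters:

```python
def depth_to_disparity(v, z_near, z_far, focal, baseline):
    """Convert an 8-bit quantized depth sample to a disparity using the common
    inverse-depth quantization. All parameter values used below are invented
    for illustration.
    """
    z = 1.0 / ((v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    return focal * baseline / z

# v = 255 is the nearest point (z = z_near), v = 0 the farthest (z = z_far).
d_near = depth_to_disparity(255, z_near=10.0, z_far=100.0, focal=1000.0, baseline=0.05)
d_far = depth_to_disparity(0, z_near=10.0, z_far=100.0, focal=1000.0, baseline=0.05)
# Near objects get the larger disparity, i.e. they shift more between views.
```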
  • Karsten Müller · Philipp Merkle
    ABSTRACT: Stereoscopic video transmission systems have now evolved from 2D video systems and have been commercialized for a number of application areas, driven by developments in stereo capturing and display technology. With the new developments in autostereoscopic display technology, these stereo systems need to further advance towards 3D video systems. In contrast to all previous video coding technologies, 3D video data requires a number of new assumptions and novel technology developments. This paper discusses the evolution from 2D to stereo video and finally to 3D video systems and highlights some of the major new challenges for 3D video. Finally, an evaluation framework for 3D video technology is shown that addresses these challenges and is used by ISO-MPEG for standardizing the best 3D video coding solution.
    Article · Jan 2011
  • Karsten Mueller · Philipp Merkle · Gerhard Tech · Thomas Wiegand
    ABSTRACT: The introduction of first 3D systems for digital cinema and home entertainment is based on stereo technology. For efficiently supporting new display types, depth-enhanced formats and coding technology are required, as introduced in this overview paper. First, we discuss the necessity for a generic 3D video format, as the current state-of-the-art in multi-view video coding cannot support different types of multi-view displays at the same time. Therefore, a generic depth-enhanced 3D format is developed, where any number of views can be generated from one bit stream. This, however, requires a complex framework for 3D video, where not only the 3D format and new coding methods are investigated, but also view synthesis and the provision of high-quality depth maps, e.g. via depth estimation. We present this framework and discuss the interdependencies between the different modules.
    Conference Paper · Oct 2010
  • ABSTRACT: Depth-image-based rendering (DIBR) is used to generate additional views of a real-world scene from images or videos and associated per-pixel depth information. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original view may become visible in the “virtual” image. The resulting question is: how can these disocclusions be covered in a visually plausible manner? In this paper, a new temporally and spatially consistent hole filling method for DIBR is presented. In a first step, disocclusions in the depth map are filled. Then, a background sprite is generated and updated with every frame using the original and synthesized information from previous frames to achieve temporally consistent results. Next, small holes resulting from depth estimation inaccuracies are closed in the textured image, using methods that are based on solving Laplace equations. The residual disoccluded areas are coarsely initialized and subsequently refined by patch-based texture synthesis. Experimental results are presented, highlighting that gains in objective and visual quality can be achieved in comparison to the latest MPEG view synthesis reference software (VSRS).
    Conference Paper · Oct 2010
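The Laplace-equation step for small holes can be approximated with a simple Jacobi-style iteration: every hole pixel is repeatedly replaced by the average of its four neighbours, so the fill interpolates smoothly from the hole boundary inwards. This is a stand-in sketch, not the paper's solver; all names are invented.

```python
import numpy as np

def fill_holes_laplace(img, hole_mask, iters=500):
    """Fill hole pixels by iteratively solving the Laplace equation: each hole
    sample converges to the average of its 4 neighbours, while all non-hole
    samples stay fixed as boundary conditions.
    """
    out = img.astype(float).copy()
    out[hole_mask] = out[~hole_mask].mean()  # rough initialization
    for _ in range(iters):
        avg = 0.25 * (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
                      np.roll(out, 1, 1) + np.roll(out, -1, 1))
        out[hole_mask] = avg[hole_mask]  # update only the hole pixels
    return out

# A flat image with a brightness edge and a single missing pixel at (2, 2).
img = np.full((5, 5), 100.0)
img[:, 3:] = 60.0
holes = np.zeros((5, 5), dtype=bool)
holes[2, 2] = True
filled = fill_holes_laplace(img, holes)
```

Patch-based texture synthesis, as in the paper, then refines larger disocclusions where such smooth interpolation would look blurry.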
  • ABSTRACT: This paper introduces a correlation histogram method for analyzing the different components of depth-enhanced 3D video representations. Depth-enhanced 3D representations such as multi-view video plus depth consist of two components: video and depth map sequences. As depth maps represent the scene geometry, their characteristics differ from the video data. We present a comparative analysis that identifies the significant characteristics of the two components via correlation histograms. These characteristics are of special importance for compression. Modern video codecs like H.264/AVC are highly optimized to the statistical properties of natural video. Therefore the effect of compressing the two components using the MVC extension of H.264/AVC is evaluated in the second part of the analysis. The presented results show that correlation histograms are a powerful and well-suited method for analyzing the impact of processing on the characteristics of depth-enhanced 3D video.
    Conference Paper · Oct 2010
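A correlation histogram in this sense can be sketched as a 2D histogram over pairs of horizontally adjacent samples: natural video spreads its mass around the diagonal, while piecewise-smooth depth maps concentrate it in a few clusters plus sharp off-diagonal entries. The binning and the synthetic depth-like block below are illustrative choices, not the paper's setup.

```python
import numpy as np

def correlation_histogram(img, bins=16):
    """Normalized 2D histogram of horizontally adjacent sample pairs
    (x[i], x[i+1]) for an 8-bit image.
    """
    a = img[:, :-1].ravel()  # left sample of each pair
    b = img[:, 1:].ravel()   # its right neighbour
    hist, _, _ = np.histogram2d(a, b, bins=bins, range=[[0, 256], [0, 256]])
    return hist / hist.sum()

# Depth-like block: two flat regions (values 40 and 200) with one sharp edge,
# so nearly all mass sits in two diagonal bins plus one off-diagonal edge bin.
depth_like = np.where(np.arange(64).reshape(8, 8) % 8 < 4, 40, 200)
h = correlation_histogram(depth_like)
```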
  • ABSTRACT: In free viewpoint television or 3D video, depth image based rendering (DIBR) is used to generate virtual views based on a textured image and its associated depth information. In doing so, image regions which are occluded in the original view may become visible in the virtual image. One of the main challenges in DIBR is to extrapolate known textures into the disoccluded area without inserting subjective annoyance. In this paper, a new hole filling approach for DIBR using texture synthesis is presented. Initially, the depth map in the virtual view is filled at disoccluded locations. Then, in the textured image, holes of limited spatial extent are closed by solving Laplace equations. Larger disoccluded regions are initialized via median filtering and subsequently refined by patch-based texture synthesis. Experimental results show that the proposed approach provides improved rendering results in comparison to the latest MPEG view synthesis reference software (VSRS) version 3.6.
    Conference Paper · Aug 2010
  • K. Müller · Kristina Dix · Philipp Merkle · Thomas Wiegand
    ABSTRACT: The paper presents an analysis of temporal residual data sub-sampling in the layered depth video (LDV) format. First, the LDV format with main view and residual views is introduced. Then, the extraction of residual data is presented, together with its block-wise alignment for better coding efficiency. Next, the temporal residual data sub-sampling is shown together with an advanced merging method prior to sub-sampling for preserving the necessary information for good view synthesis results. These synthesis results are shown for intermediate views, generated from the uncoded as well as coded LDV data with different temporal sub-sampling factors and merging methods for the residual data. The results show that temporal residual data sub-sampling with data merging can outperform regular LDV without sub-sampling for both the coded and uncoded versions.
    Conference Paper · Jul 2010
  • Philipp Merkle · Karsten Müller · Thomas Wiegand
    ABSTRACT: An overview of existing and upcoming 3D video coding standards is given. Various 3D video formats are available, each with individual pros and cons. The 3D video formats can be separated into two classes: video-only formats (such as stereo and multiview video) and depth-enhanced formats (such as video plus depth and multiview video plus depth). Since all these formats consist of at least two video sequences and possibly additional depth data, efficient compression is essential for the success of 3D video applications and technologies. For the video-only formats the H.264 family of coding standards already provides efficient and widely established compression algorithms: H.264/AVC simulcast, H.264/AVC stereo SEI message, and H.264/MVC. For the depth-enhanced formats standardized coding algorithms are currently being developed. New and specially adapted coding approaches are necessary, as the depth or disparity information included in these formats has significantly different characteristics than video and is not displayed directly, but used for rendering. Motivated by evolving market needs, MPEG has started an activity to develop a generic 3D video standard within the 3DVC ad-hoc group. Key features of the standard are efficient and flexible compression of depth-enhanced 3D video representations and decoupling of content creation and display requirements.
    Article · Jul 2010 · Proceedings of SPIE - The International Society for Optical Engineering
  • Philipp Merkle · K. Müller · Thomas Wiegand
    ABSTRACT: An overview of the 3D video processing chain is given, highlighting existing and upcoming technologies and standards, addressing dependencies and interactions between acquisition, coding and display, and pointing out requirements, constraints and problems of these individual modules.
    Article · Jun 2010 · IEEE Transactions on Consumer Electronics
  • ABSTRACT: For advanced 3D video (3DV) applications, efficient data representations are investigated, which only transmit a subset of the views that are required for 3D visualization. From this subset, all intermediate views are synthesized from sample-dense color and depth data. In this paper, the method for reliability-based view synthesis from compressed multi-view plus depth (MVD) data is investigated and corresponding results are shown. The initial problem in such 3DV systems is the interdependency between view capturing, coding and view synthesis. For evaluating each component separately, we first generate results from the coding stage only, where color and depth coding is carried out separately. In the next step, we add the reliability-based view synthesis stage and show how the separate coding results influence the view synthesis quality and what types of artifacts are produced. Efficient bit rate distribution between color and depth is investigated by objective as well as subjective evaluations. Furthermore, quality characteristics across the viewing range for different bit rate distributions are analyzed. Finally, the robustness of the reliability-based view synthesis to coding artifacts is presented.
    Conference Paper · Dec 2009
  • A. Smolic · K. Mueller · P. Merkle · A. Vetro
    ABSTRACT: An overview of available 3D video formats and standards is given. It is explained why none of these (although each is useful in some particular sense, for some particular application) satisfies all requirements of all 3D video applications. Advanced formats currently under investigation, which have the potential to serve as a generic, flexible and efficient future 3D video standard, are explained. Then an activity of MPEG for the development of such a new standard is described and an outlook to future developments is given.
    Conference Paper · Oct 2009