P. Merkle

Publications (27) · 15.4 Total impact

  • Article: Depth Image-Based Rendering With Advanced Texture Synthesis for 3-D Video
    ABSTRACT: A depth image-based rendering (DIBR) approach with advanced inpainting methods is presented. The DIBR algorithm can be used in 3-D video applications to synthesize a number of different perspectives of the same scene, e.g., from a multiview-video-plus-depth (MVD) representation. This MVD format consists of video and depth sequences for a limited number of original camera views of the same natural scene. Here, DIBR methods allow the computation of additional new views. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original views may become visible, especially in extrapolated views beyond the viewing range of the original cameras. The presented algorithm synthesizes these occluded textures. The synthesizer achieves visually satisfying results by taking spatial and temporal consistency measures into account. Detailed experiments show significant objective and subjective gains of the proposed method in comparison to the state-of-the-art methods.
    IEEE Transactions on Multimedia 07/2011; · 1.93 Impact Factor
  • Article: 3-D Video Representation Using Depth Maps
    K. Müller, P. Merkle, T. Wiegand
    ABSTRACT: Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for pictures delivered by two input cameras. Typically, such stereo systems only reproduce these two camera views at the receiver and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views. However, since the quality improvement of multiview displays will be governed by an increase of emitted views, a format is needed that allows the generation of arbitrary numbers of views with the transmission bit rate being constant. Such a format is the combination of video signals and associated depth maps. The depth maps provide disparities associated with every sample of the video signal that can be used to render arbitrary numbers of additional views via view synthesis. This paper describes efficient coding methods for video and depth data. For the generation of views, synthesis methods are presented, which mitigate errors from depth estimation and coding.
    Proceedings of the IEEE 05/2011; · 6.81 Impact Factor
  • Conference Proceeding: Correlation histogram analysis of depth-enhanced 3D video coding
    ABSTRACT: This paper introduces a correlation histogram method for analyzing the different components of depth-enhanced 3D video representations. Depth-enhanced 3D representations such as multi-view video plus depth consist of two components: video and depth map sequences. As depth maps represent the scene geometry, their characteristics differ from the video data. We present a comparative analysis that identifies the significant characteristics of the two components via correlation histograms. These characteristics are of special importance for compression. Modern video codecs like H.264/AVC are highly optimized to the statistical properties of natural video. Therefore the effect of compressing the two components using the MVC extension of H.264/AVC is evaluated in the second part of the analysis. The presented results show that correlation histograms are a powerful and well-suited method for analyzing the impact of processing on the characteristics of depth-enhanced 3D video.
    Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
  • Conference Proceeding: 3D video formats and coding methods
    ABSTRACT: The introduction of first 3D systems for digital cinema and home entertainment is based on stereo technology. For efficiently supporting new display types, depth-enhanced formats and coding technology are required, as introduced in this overview paper. First, we discuss the necessity for a generic 3D video format, as the current state-of-the-art in multi-view video coding cannot support different types of multi-view displays at the same time. Therefore, a generic depth-enhanced 3D format is developed, where any number of views can be generated from one bit stream. This, however, requires a complex framework for 3D video, where not only the 3D format and new coding methods are investigated, but also view synthesis and the provision of high-quality depth maps, e.g. via depth estimation. We present this framework and discuss the interdependencies between the different modules.
    Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
  • Conference Proceeding: Temporally consistent handling of disocclusions with texture synthesis for depth-image-based rendering
    ABSTRACT: Depth-image-based rendering (DIBR) is used to generate additional views of a real-world scene from images or videos and associated per-pixel depth information. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original view may become visible in the “virtual” image. The resulting question is: how can these disocclusions be covered in a visually plausible manner? In this paper, a new temporally and spatially consistent hole filling method for DIBR is presented. In a first step, disocclusions in the depth map are filled. Then, a background sprite is generated and updated with every frame using the original and synthesized information from previous frames to achieve temporally consistent results. Next, small holes resulting from depth estimation inaccuracies are closed in the textured image, using methods that are based on solving Laplace equations. The residual disoccluded areas are coarsely initialized and subsequently refined by patch-based texture synthesis. Experimental results are presented, highlighting that gains in objective and visual quality can be achieved in comparison to the latest MPEG view synthesis reference software (VSRS).
    Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
  • Conference Proceeding: Depth image based rendering with advanced texture synthesis
    ABSTRACT: In free viewpoint television or 3D video, depth image based rendering (DIBR) is used to generate virtual views based on a textured image and its associated depth information. In doing so, image regions which are occluded in the original view may become visible in the virtual image. One of the main challenges in DIBR is to extrapolate known textures into the disoccluded area without inserting subjective annoyance. In this paper, a new hole filling approach for DIBR using texture synthesis is presented. Initially, the depth map in the virtual view is filled at disoccluded locations. Then, in the textured image, holes of limited spatial extent are closed by solving Laplace equations. Larger disoccluded regions are initialized via median filtering and subsequently refined by patch-based texture synthesis. Experimental results show that the proposed approach provides improved rendering results in comparison to the latest MPEG view synthesis reference software (VSRS) version 3.6.
    Multimedia and Expo (ICME), 2010 IEEE International Conference on; 08/2010
  • Conference Proceeding: Temporal residual data sub-sampling in LDV representation format
    ABSTRACT: The paper presents an analysis of temporal residual data sub-sampling in the layered depth video format (LDV). First, the LDV format with main view and residual views or data is introduced. Then, the extraction of residual data and its block-wise alignment for better coding efficiency are presented. Next, the temporal residual data sub-sampling is shown together with an advanced merging method prior to sub-sampling for preserving the necessary information for good view synthesis results. These synthesis results are shown for intermediate views, generated from the uncoded, as well as coded LDV data with different temporal sub-sampling factors and merging methods for the residual data. The results show that temporal residual data sub-sampling with data merging can outperform regular LDV without sub-sampling for the coded and uncoded versions.
    3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2010; 07/2010
  • Article: 3D video: acquisition, coding, and display
    P. Merkle, K. Müller, T. Wiegand
    ABSTRACT: An overview of the 3D video processing chain is given, highlighting existing and upcoming technologies and standards, addressing dependencies and interactions between acquisition, coding and display, and pointing out requirements, constraints and problems of these individual modules.
    IEEE Transactions on Consumer Electronics 06/2010; · 0.94 Impact Factor
  • Conference Proceeding: Coding and intermediate view synthesis of multiview video plus depth
    ABSTRACT: For advanced 3D Video (3DV) applications, efficient data representations are investigated, which only transmit a subset of the views that are required for 3D visualization. From this subset, all intermediate views are synthesized from sample-dense color and depth data. In this paper, the method for reliability-based view synthesis from compressed multi-view + depth data (MVD) is investigated and corresponding results are shown. The initial problem in such 3DV systems is the interdependency between view capturing, coding and view synthesis. For evaluating each component separately, we first generate results from the coding stage only, where color and depth coding is carried out separately. In the next step, we add the view synthesis stage with reliability-based view synthesis and show how the separate coding results influence the view synthesis quality and what type of artifacts are produced. Efficient bit rate distribution between color and depth is investigated by objective as well as subjective evaluations. Furthermore, quality characteristics across the viewing range for different bit rate distributions are analyzed. Finally, the robustness of the reliability-based view synthesis to coding artifacts is presented.
    Image Processing (ICIP), 2009 16th IEEE International Conference on; 12/2009
  • Conference Proceeding: Development of a new MPEG standard for advanced 3D video applications
    ABSTRACT: An overview of available 3D video formats and standards is given. It is explained why none of these - although each useful in some particular sense, for some particular application - satisfies all requirements of all 3D video applications. Advanced formats currently under investigation, which have the potential to serve as a generic, flexible and efficient future 3D video standard, are explained. Then an activity of MPEG for development of such a new standard is described and an outlook to future developments is given.
    Image and Signal Processing and Analysis, 2009. ISPA 2009. Proceedings of 6th International Symposium on; 10/2009
  • Conference Proceeding: An overview of available and emerging 3D video formats and depth enhanced stereo as efficient generic solution
    ABSTRACT: Recently, popularity of 3D video has been growing significantly and it may turn into a home user mass market in the near future. However, diversity of 3D video content formats is still hampering wide success. An overview of available and emerging 3D video formats and standards is given, which are mostly related to specific types of applications and 3D displays. This includes conventional stereo video, multiview video, video plus depth, multiview video plus depth and layered depth video. Features and limitations are explained. Finally, depth enhanced stereo (DES) is introduced as a flexible, generic, and efficient 3D video format that can unify all others and serve as universal 3D video format in the future.
    Picture Coding Symposium, 2009. PCS 2009; 06/2009
  • Conference Proceeding: Stereo video compression for mobile 3D services
    ABSTRACT: This paper presents a study on different techniques for stereo video compression and its optimization for mobile 3D services. Stereo video enables 3D television, but as mobile services are subject to various limitations, including bandwidth, memory, and processing power, efficient compression is required. Three of the currently available MPEG coding standards are applicable for stereo video coding, namely H.264/AVC with and without stereo SEI message and H.264/MVC. These methods are evaluated with respect to the limitations of mobile services. The results clearly indicate that for a certain bitrate inter-view prediction as well as temporal prediction with hierarchical B pictures lead to a significantly increased subjective and objective quality. Although both techniques require more complex processing at the encoder side, their coding efficiency offers the chance to realize 3D stereo at the bitrate of conventional video for mobile services.
    3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2009; 06/2009
  • Conference Proceeding: Video plus depth compression for mobile 3D services
    ABSTRACT: This paper presents a study on video plus depth compression using available MPEG standards and its optimization for mobile 3D services. Video plus depth enables 3D television, but as mobile services are subject to various limitations, including bandwidth, memory, and processing power, efficient compression as well as low complexity view synthesis is required. Two MPEG coding standards are applicable for video plus depth coding, namely MPEG-C Part 3 and H.264 Auxiliary Picture Syntax. These methods are evaluated with respect to the limitations of mobile services and the achievable quality for rendering the second stereo view from compressed video plus depth. In conclusion, video plus depth is an interesting alternative to conventional stereo video for mobile 3D services. The results indicate that depth can be compressed at significantly lower bitrates than a secondary video, however at the expense of increased complexity for rendering the second view at the decoder.
    3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2009; 06/2009
  • Conference Proceeding: Optimization and comparision of coding algorithms for mobile 3DTV
    ABSTRACT: Different methods for coding of stereo video content for mobile 3DTV are examined and compared. These methods are H.264/MPEG-4 AVC simulcast transmission, H.264/MPEG-4 AVC Stereo SEI message, mixed resolution coding, and video plus depth coding using MPEG-C Part 3. The first two methods are based on a full left and right video (V+V) representation, the third method uses a full and a subsampled view and the fourth method is based on a one video plus associated depth (V+D) representation. Each method was optimized and tested using professional 3D video content. Subjective tests were carried out on a small size autostereoscopic display that is used in mobile devices. A comparison of the four methods at two different bitrates is presented. Results are provided in average subjective scoring, PSNR and VSSIM (Video Structure Similarity).
    3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2009; 06/2009
  • Conference Proceeding: Intermediate view interpolation based on multiview video plus depth for advanced 3D video systems
    ABSTRACT: A system for video on multiscopic 3D displays is considered where the data representation consists of multiview video plus scene depth. At most, 3 multiview video signals are being transmitted and used together with the depth data to generate intermediate views at the receiver. The paper presents an approach to such an intermediate view interpolation that separates unreliable image regions along depth discontinuities from reliable image regions. These image regions are processed with different algorithms and then fused to obtain the final interpolated view. In contrast to previous layered approaches, two boundary layers and one reliable layer are used. Moreover, the presented technique does not rely on 3D graphics support but uses image-based 3D warping instead. For enhanced quality intermediate view generation, hole-filling and filtering methods are described. As a result, high quality intermediate views for an existing 9-view auto-stereoscopic display are presented, which prove the suitability of the approach for advanced 3D video (3DV) systems.
    Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on; 11/2008
  • Conference Proceeding: The Effect of Depth Compression on Multiview Rendering Quality
    ABSTRACT: This paper presents a comparative study on different techniques for depth-image compression and its implications on the quality of multiview video plus depth virtual view rendering. A novel coding algorithm for depth images that concentrates on their special characteristics, namely smooth regions delineated by sharp edges, is compared to H.264 intra-coding with depth images. These two coding techniques are evaluated in the context of multiview video plus depth representations, where depth information is used to render virtual intermediate camera views of the scene. Therefore it is important to evaluate the influence of depth-image coding artifacts on the quality of rendered virtual views. The results of this evaluation show that the coding algorithm specialized on the characteristics of depth images outperforms H.264 intra-coding, although its RD-performance is worse.
    3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 2008; 06/2008
  • Article: Compressing Time-Varying Visual Content
    K. Müller, P. Merkle, T. Wiegand
    ABSTRACT: This article investigates compression approaches for 3D scene representations, where image and geometry are combined. The approaches exemplified in this article mostly focus on work in which the authors have participated.
    IEEE Signal Processing Magazine 12/2007; · 4.07 Impact Factor
  • Article: Efficient Prediction Structures for Multiview Video Coding
    ABSTRACT: An experimental analysis of multiview video coding (MVC) for various temporal and inter-view prediction structures is presented. The compression method is based on the multiple reference picture technique in the H.264/AVC video coding standard. The idea is to exploit the statistical dependencies from both temporal and inter-view reference pictures for motion-compensated prediction. The effectiveness of this approach is demonstrated by an experimental analysis of temporal versus inter-view prediction in terms of the Lagrange cost function. The results show that prediction with temporal reference pictures is highly efficient, but for 20% of a picture's blocks on average prediction with reference pictures from adjacent views is more efficient. Hierarchical B pictures are used as basic structure for temporal prediction. Their advantages are combined with inter-view prediction for different temporal hierarchy levels, starting from simulcast coding with no inter-view prediction up to full level inter-view prediction. When using inter-view prediction at key picture temporal levels, average gains of 1.4-dB peak signal-to-noise ratio (PSNR) are reported, while additionally using inter-view prediction at nonkey picture temporal levels, average gains of 1.6-dB PSNR are reported. For some cases, gains of more than 3 dB, corresponding to bit-rate savings of up to 50%, are obtained.
    IEEE Transactions on Circuits and Systems for Video Technology 12/2007; · 1.65 Impact Factor
  • Conference Proceeding: Efficient Compression of Multi-View Depth Data Based on MVC
    ABSTRACT: This paper presents a method for efficient compression of multi-view depth data based on our multi-view video coding approach for color data. The idea is to exploit statistical dependencies from both temporal and inter-view reference pictures for prediction. For this purpose a multi-view video data set including color and depth information is analyzed in terms of coding efficiency. Coding experiments using prediction structures with and without inter-view reference pictures are performed with multi-view depth data and compared to multi-view video coding. The results show that additionally applying inter-view prediction to temporal prediction with hierarchical B pictures improves coding efficiency for depth as well as color, reporting average gains in PSNR-Y of 0.5 dB for depth and 0.3 dB for color.
    3DTV Conference, 2007; 06/2007
  • Article: Efficient Compression of Multi-View Video Exploiting Inter-View Dependencies Based on H.264/MPEG4-AVC
    ABSTRACT: Efficient multi-view coding requires coding algorithms that exploit temporal, as well as inter-view dependencies between adjacent cameras. Based on a spatiotemporal analysis on the multi-view data set, we present a coding scheme utilizing an H.264/MPEG4-AVC codec. To handle the specific requirements of multi-view datasets, namely temporal and inter-view correlation, two main features of the coder are used: hierarchical B pictures for temporal dependencies and an adapted prediction scheme to exploit inter-view dependencies. Both features are set up in the H.264/MPEG4-AVC configuration file, such that coding and decoding are purely based on standardized software. Additionally, picture reordering before coding to optimize coding efficiency and inverse reordering after decoding to obtain individual views are applied. Finally, coding results are shown for the proposed multi-view coder and compared to simulcast anchor and simulcast hierarchical B picture coding.
    2006 IEEE International Conference on Multimedia and Expo; 07/2006