On Learning Conditional Random Fields for Stereo

International Journal of Computer Vision (Impact Factor: 3.53). 09/2012; 99(3):1-19. DOI: 10.1007/s11263-010-0385-z

ABSTRACT: Until recently, the lack of ground truth data has hindered the application of discriminative structured prediction techniques
to the stereo problem. In this paper we use ground truth data sets that we have recently constructed to explore different
model structures and parameter learning techniques. To estimate parameters in Markov random fields (MRFs) via maximum likelihood
one usually needs to perform approximate probabilistic inference. Conditional random fields (CRFs) are discriminative versions
of traditional MRFs. We explore a number of novel CRF model structures including a CRF for stereo matching with an explicit
occlusion model. CRFs require expensive inference steps for each iteration of optimization and inference is particularly slow
when there are many discrete states. We explore belief propagation, variational message passing and graph cuts as inference
methods during learning and compare with learning via pseudolikelihood. To accelerate approximate inference we have developed
a new method called sparse variational message passing which can reduce inference time by an order of magnitude with negligible
loss in quality. Learning using sparse variational message passing improves upon previous approaches using graph cuts and
allows efficient learning over large data sets when energy functions violate the constraints imposed by graph cuts.

Keywords: Stereo · Learning · Structured prediction · Approximate inference
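At its core, the model class discussed in the abstract is a pairwise CRF over disparities: unary matching costs plus a smoothness prior, where sparse message passing prunes low-probability disparity states to accelerate inference. As a rough illustration only (not the paper's actual model or parameterization; `crf_energy`, `sparse_message`, the truncated-linear prior, and all parameter values are assumptions made for this sketch):

```python
import numpy as np

def crf_energy(disp, data_cost, lam=1.0, trunc=2.0):
    """Pairwise CRF energy for a disparity map: unary data costs plus a
    truncated-linear smoothness prior over a 4-connected grid."""
    h, w = disp.shape
    # Unary term: pick each pixel's matching cost at its assigned disparity.
    unary = data_cost[np.arange(h)[:, None], np.arange(w)[None, :], disp].sum()
    # Pairwise term: truncated absolute disparity difference between neighbours.
    pair = np.minimum(np.abs(disp[:, 1:] - disp[:, :-1]), trunc).sum()
    pair += np.minimum(np.abs(disp[1:, :] - disp[:-1, :]), trunc).sum()
    return unary + lam * pair

def sparse_message(unary_p, incoming, active, lam=1.0, trunc=2.0):
    """Min-sum message from pixel p to a neighbour, restricted to an
    'active' subset of p's disparity states -- the sparsification idea:
    states with negligible belief are simply dropped from the minimisation."""
    cost = unary_p + incoming            # aggregated cost at p, shape (L,)
    L = len(unary_p)
    msg = np.empty(L)
    for dq in range(L):
        pen = lam * np.minimum(np.abs(active - dq), trunc)
        msg[dq] = (cost[active] + pen).min()
    return msg - msg.min()               # normalise to keep values bounded
```

With the full state set, each message costs O(L) per output state; restricting the minimisation to a small active set is what yields the order-of-magnitude speedups the abstract reports for sparse variational message passing.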

    ABSTRACT: Probabilistic graphical models have had a tremendous impact in machine learning and approaches based on energy function minimization via techniques such as graph cuts are now widely used in image segmentation. However, the free parameters in energy function-based segmentation techniques are often set by hand or using heuristic techniques. In this paper, we explore parameter learning in detail. We show how probabilistic graphical models can be used for segmentation problems to illustrate Markov random fields (MRFs), their discriminative counterparts conditional random fields (CRFs) as well as kernel CRFs. We discuss the relationships between energy function formulations, MRFs, CRFs, hybrids based on graphical models and their relationships to key techniques for inference and learning. We then explore a series of novel 3D graphical models and present a series of detailed experiments comparing and contrasting different approaches for the complete volumetric segmentation of multiple organs within computed tomography imagery of the abdominal region. Further, we show how these modeling techniques can be combined with state of the art image features based on histograms of oriented gradients to increase segmentation performance. We explore a wide variety of modeling choices, discuss the importance and relationships between inference and learning techniques and present experiments using different levels of user interaction. We go on to explore a novel approach to the challenging and important problem of adrenal gland segmentation. We present a 3D CRF formulation and compare with a novel 3D sparse kernel CRF approach we call a relevance vector random field. The method yields state of the art performance and avoids the need to discretize or cluster input features. 
We believe our work is the first to provide quantitative comparisons between traditional MRFs with edge-modulated interaction potentials and CRFs for multi-organ abdominal segmentation and the first to explore the 3D adrenal gland segmentation problem. Finally, along with this paper we provide the labeled data used for our experiments to the community.
    Machine Vision and Applications 02/2014; 25(2). DOI:10.1007/s00138-013-0497-x · 1.44 Impact Factor
    ABSTRACT: When stereo images are shown on three-dimensional (3D) display devices of different aspect ratios, resizing algorithms designed for single images can distort the shape and depth of the stereo image's main content. This paper proposes a novel method for retargeting stereo image pairs that avoids distorting important objects in the scene while maintaining consistency between the left and right images. We extend the seam carving algorithm to stereo images. The novelty of our method is that important objects are determined by jointly considering gradient intensities and the visual fusion area. The retargeted stereo pair has a feasible 3D interpretation similar to the original one. Our method protects important content and reduces the visual distortion in each of the images as well as the depth distortion. Experimental results demonstrate that the proposed method effectively guarantees the geometric consistency of resized stereo images.
    EURASIP Journal on Wireless Communications and Networking 01/2013; 2013(1). DOI:10.1186/1687-1499-2013-116 · 0.81 Impact Factor
    ABSTRACT: In this paper we present a novel stereo view synthesis algorithm that is highly accurate with respect to inter-view consistency, thus enabling stereo content to be viewed on autostereoscopic displays. The algorithm finds identical occluded regions within each virtual view and aligns them to extract a surrounding background layer. The background layer for each occluded region is then used with an exemplar-based inpainting method to synthesize all virtual views simultaneously. Our algorithm requires the alignment and extraction of background layers for each occluded region; however, these two steps are performed with lower computational complexity than in previous approaches based on exemplar-based inpainting. It is therefore more efficient than existing algorithms that synthesize one virtual view at a time. This paper also describes a simplified GPU-accelerated version of the approach and its implementation in CUDA. Our CUDA method has sublinear complexity in the number of views to be generated, which makes it especially useful for generating content for autostereoscopic displays that require many views to operate. An objective of our work is to allow the user to change depth and viewing perspective on the fly. Therefore, to further accelerate the CUDA variant of our approach, we present a modified version of our method that warps background pixels from the reference views to a middle view to recover background pixels. We then use an exemplar-based inpainting method to fill in the occluded regions. We warp the foreground from the reference images and the background from the filled regions to synthesize new virtual views on the fly. Our experimental results indicate that the simplified CUDA implementation decreases running time by orders of magnitude with negligible loss in quality.
    3D Research 03/2012; 3(1). DOI:10.1007/3DRes.01(2012)1
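The stereo retargeting abstract above builds on seam carving, which removes minimal-energy seams found by dynamic programming over an importance map (in that paper, computed jointly from gradient intensities and the visual fusion area). A minimal single-image sketch of the seam DP; the function names and the assumption of a precomputed energy map are illustrative, not taken from the paper:

```python
import numpy as np

def seam_cost(energy):
    """Cumulative dynamic-programming cost of the cheapest vertical seam
    reaching each pixel, scanning top to bottom."""
    M = energy.astype(float)
    h, w = M.shape
    for i in range(1, h):
        # Each pixel may extend a seam from its upper-left, upper, or upper-right neighbour.
        left = np.concatenate(([np.inf], M[i - 1, :-1]))
        right = np.concatenate((M[i - 1, 1:], [np.inf]))
        M[i] += np.minimum(np.minimum(left, M[i - 1]), right)
    return M

def min_seam(energy):
    """Backtrack the column index of the minimal vertical seam in each row."""
    M = seam_cost(energy)
    h, w = M.shape
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)   # stay within one column of the row below
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam
```

The stereo extension described in the abstract constrains the seams chosen in the left and right images so the pair keeps a consistent 3D interpretation; the DP above is only the single-image building block.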
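The view-synthesis abstract above warps background pixels from reference views toward a virtual viewpoint before inpainting the remaining holes. A toy sketch of such a disparity-driven forward warp with a z-buffer; the function name, the scalar `alpha` baseline fraction, and the -1 hole marker are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def warp_background(image, disparity, alpha=0.5):
    """Forward-warp a single-channel image to a virtual view at fraction
    `alpha` of the baseline. A z-buffer on disparity lets nearer pixels
    win collisions; unfilled positions stay -1 (holes to be inpainted)."""
    h, w = image.shape
    out = np.full((h, w), -1.0)
    zbuf = np.full((h, w), -np.inf)
    for y in range(h):
        for x in range(w):
            xv = int(round(float(x - alpha * disparity[y, x])))
            if 0 <= xv < w and disparity[y, x] > zbuf[y, xv]:
                zbuf[y, xv] = disparity[y, x]
                out[y, xv] = image[y, x]
    return out
```

In the pipeline the abstract describes, holes left in `out` would then be filled by exemplar-based inpainting using the extracted background layer, and foreground pixels would be warped separately on top.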

