On Learning Conditional Random Fields for Stereo

International Journal of Computer Vision (Impact Factor: 3.53). 09/2012; 99(3):1-19. DOI: 10.1007/s11263-010-0385-z

ABSTRACT: Until recently, the lack of ground truth data has hindered the application of discriminative structured prediction techniques
to the stereo problem. In this paper we use ground truth data sets that we have recently constructed to explore different
model structures and parameter learning techniques. To estimate parameters in Markov random fields (MRFs) via maximum likelihood
one usually needs to perform approximate probabilistic inference. Conditional random fields (CRFs) are discriminative versions
of traditional MRFs. We explore a number of novel CRF model structures including a CRF for stereo matching with an explicit
occlusion model. CRFs require expensive inference steps for each iteration of optimization and inference is particularly slow
when there are many discrete states. We explore belief propagation, variational message passing and graph cuts as inference
methods during learning and compare with learning via pseudolikelihood. To accelerate approximate inference we have developed
a new method called sparse variational message passing which can reduce inference time by an order of magnitude with negligible
loss in quality. Learning using sparse variational message passing improves upon previous approaches using graph cuts and
allows efficient learning over large data sets when energy functions violate the constraints imposed by graph cuts.

Keywords: Stereo, Learning, Structured prediction, Approximate inference

  • ABSTRACT: This article describes the dense stereoscopic 3D reconstruction of surfaces that offer only low texture, using a global matching algorithm with smoothness-based priors in an energy minimization framework. The envisaged application areas are high-speed image sequences of dynamic processes where the projection of structured light is not applicable. The lack of depth cues on the measured object normally leads to very sparse and often false reconstructions if common local matching algorithms such as cross correlation or least squares matching are employed. Within this AiF-funded project, an operational photogrammetric stereo measurement system has been developed, consisting of a stereo rig with high-speed cameras and a global matching algorithm. This system allows, for the first time, a dense reconstruction of surfaces with low texture in high-speed image sequences. Quantitative and qualitative results for two test data sets demonstrate that a dense point cloud of low-texture objects can be determined without employing structured light.
    Photogrammetrie - Fernerkundung - Geoinformation 02/2012; 2012(1):51-61. DOI:10.1127/1432-8364/2012/0102 · 0.43 Impact Factor
  • International Journal of Computer Vision 09/2012; DOI:10.1007/s11263-012-0530-y · 3.53 Impact Factor
  • ABSTRACT: We present a new method to combine possibly inconsistent locally (piecewise) trained conditional models p(y_α | x_α) into pseudo-samples from a global model. Our method does not require training of a CRF, but instead generates samples by iterating forward a weakly chaotic dynamical system. The new method is illustrated on image segmentation tasks where classifiers based on local appearance cues are combined with pairwise boundary cues.
    IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011; 01/2011

