ABSTRACT: The problem of figure-ground segmentation is of great importance in both video editing and visual perception tasks. Classical video segmentation algorithms approach the problem from one of two perspectives. At one extreme, global approaches constrain the camera motion to simplify the image structure. At the other extreme, local approaches estimate motion in small image regions over a small number of frames and tend to produce noisy signals that are difficult to segment. Recent advances in image segmentation have shown that sparse information is often sufficient for figure-ground segmentation, so it is surprising that, even with the extra temporal information available in video, an unconstrained automatic figure-ground segmentation algorithm still eludes the research community. In this paper we present an automatic video segmentation algorithm that is intermediate between these two extremes and uses spatiotemporal features to regularize the segmentation. By detecting spatiotemporal T-junctions that indicate occlusion edges, we learn an occlusion edge model that is used within a colour contrast-sensitive MRF to segment individual frames of a video sequence. T-junctions are learnt and classified using a support vector machine, and a Gaussian mixture model is fitted to the (foreground, background) pixel pairs sampled from the detected T-junctions. Graph cut is then used to segment each frame of the video, showing that sparse occlusion edge information can automatically initialize the video segmentation problem.
Proceedings of the British Machine Vision Conference 2006, Edinburgh, UK, September 4-7, 2006; 01/2006
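To make the "contrast-sensitive MRF solved by graph cut" step concrete, here is a minimal sketch on a single 1-D row of grey values. It is an illustration under stated assumptions, not the paper's method: the unary costs use hypothetical fixed foreground/background means (`fg_mean`, `bg_mean`) in place of the learnt Gaussian mixture over (foreground, background) pixel pairs, the pairwise term is the common contrast-sensitive Potts weight `lam * exp(-beta * (I_i - I_j)**2)` with `beta` set from the mean squared neighbour difference rather than the learnt occlusion edge model, and a plain Edmonds-Karp max-flow stands in for whichever graph cut solver was used. The names `min_cut_source_side`, `segment_row` and `lam` are hypothetical.

```python
import math
from collections import deque


def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max-flow; returns the nodes reachable from s in the
    final residual graph, i.e. the source side of a minimum s-t cut.

    cap[u][v] is the capacity of the directed edge u -> v.
    """
    residual = {u: dict(vs) for u, vs in cap.items()}
    for u, vs in cap.items():
        for v in vs:
            # Every edge needs a reverse residual entry (initially 0).
            residual.setdefault(v, {}).setdefault(u, 0.0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        # Bottleneck capacity along the path found.
        flow, v = math.inf, t
        while parent[v] is not None:
            flow = min(flow, residual[parent[v]][v])
            v = parent[v]
        # Push the flow and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= flow
            residual[v][u] += flow
            v = u


def segment_row(pixels, fg_mean, bg_mean, lam=2.0):
    """Label a 1-D row of grey values (1 = foreground) with a graph cut.

    Unary costs: squared distance to hypothetical fg/bg means.
    Pairwise: contrast-sensitive Potts weight lam * exp(-beta * (I_i - I_j)^2).
    """
    n = len(pixels)
    diffs = [(pixels[i] - pixels[i + 1]) ** 2 for i in range(n - 1)]
    mean_sq = sum(diffs) / len(diffs) if diffs else 0.0
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
    cap = {"s": {}, "t": {}}
    for i, x in enumerate(pixels):
        cap["s"][i] = (x - bg_mean) ** 2    # cut (paid) if pixel i is background
        cap[i] = {"t": (x - fg_mean) ** 2}  # cut (paid) if pixel i is foreground
    for i, d in enumerate(diffs):
        w = lam * math.exp(-beta * d)       # cheap to cut across strong edges
        cap[i][i + 1] = w
        cap[i + 1][i] = w
    source_side = min_cut_source_side(cap, "s", "t")
    return [1 if i in source_side else 0 for i in range(n)]


print(segment_row([0.1, 0.15, 0.12, 0.9, 0.85, 0.95], fg_mean=0.9, bg_mean=0.1))
# prints [0, 0, 0, 1, 1, 1]
```

The contrast-sensitive weight is what lets the cut fall on the strong intensity step between the third and fourth pixels: that neighbour pair has a small `w`, so separating them is cheap, while cutting anywhere else in the row is expensive. Raising `lam` strengthens the smoothness prior; with `lam=100.0` the same row collapses to a single label.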
ABSTRACT: The goal of motion segmentation and layer extraction can be viewed as the detection and localization of occluding surfaces. A feature that has been shown to be a particularly strong indicator of occlusion, in both computer vision and neuroscience, is the T-junction; however, little progress has been made in T-junction detection. One reason for this is the difficulty of distinguishing false T-junctions (i.e. those not on an occluding edge) from real T-junctions in cluttered images. In addition, their photometric profile alone is not sufficient for reliable detection. This paper overcomes the first problem by searching for T-junctions not in space, but in space-time. This removes many false T-junctions and creates a simpler image structure to explore. The second problem is mitigated by learning the appearance of T-junctions in these spatiotemporal images. An RVM T-junction classifier is learnt from hand-labelled data, using SIFT to capture its redundancy. This detector is then demonstrated in a novel occlusion detector that fuses Canny edges and T-junctions in the spatiotemporal domain to detect occluding edges in the spatial domain.
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on; 07/2005
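Searching "in space-time" can be illustrated with an x-t slice: fix an image row `y` and stack that row from every frame, so one axis is position and the other is time. The sketch below assumes a video stored as nested lists indexed `video[t][y][x]`, and builds a hypothetical toy scene (`make_video`) in which a static striped background is occluded by a bar moving one pixel right per frame. It only shows how such a slice is formed; it does not implement the paper's detector, which classifies the appearance of spatiotemporal T-junctions with an RVM on SIFT descriptors.

```python
def make_video(width=8, height=3, frames=6, stripes=(2, 6), occluder_width=2):
    """Hypothetical toy sequence, indexed video[t][y][x]: a static background
    with stripes (value 1) occluded by a bar (value 5) that moves one pixel
    to the right per frame."""
    video = []
    for t in range(frames):
        frame = []
        for _ in range(height):
            row = [1 if x in stripes else 0 for x in range(width)]
            for x in range(t, min(t + occluder_width, width)):
                row[x] = 5  # occluder covers the background here
            frame.append(row)
        video.append(frame)
    return video


def xt_slice(video, y):
    """Stack row y of every frame; the result is indexed slice_[t][x]."""
    return [list(frame[y]) for frame in video]


slice_ = xt_slice(make_video(), y=1)
for t, row in enumerate(slice_):
    # '.' background, '#' stripe, 'O' occluder
    print(t, "".join(".#O"[{0: 0, 1: 1, 5: 2}[v]] for v in row))
```

In the printed slice the static stripe at `x = 2` traces a vertical line over time, while the occluder traces a diagonal band. The stripe's line stops at `t = 1`, where it meets the band's leading edge, and reappears at `t = 3` behind the trailing edge; such terminations of one trajectory against another are the T-junctions that signal occlusion in the spatiotemporal domain.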
ABSTRACT: Video matting, or layer extraction, is a classic inverse problem in computer vision that involves the extraction of foreground objects, and the alpha mattes that describe their opacity, from a set of images. Modern approaches that work with natural backgrounds often require user-labelled "trimaps" that segment each image into foreground, background and unknown regions. For long sequences, the production of accurate trimaps can be time-consuming. In contrast, another class of approach depends on automatic background extraction to automate the process, but existing techniques do not make use of spatiotemporal consistency and cannot take account of operator hints such as trimaps. This paper presents a method inspired by natural image statistics that cleanly unifies these approaches. A prior is learnt that models the relationship between the spatiotemporal gradients in the image sequence and those in the alpha mattes. This is used in combination with a learnt foreground colour model and a prior on the alpha distribution to help regularize the solution and greatly improve the automatic performance of such systems. The system is applied to several real image sequences that demonstrate the advantage the unified approach can afford.
Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on; 01/2004
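Why image gradients should carry information about alpha gradients can be seen from the standard compositing model of matting. This is an assumption made here for illustration: the abstract describes a prior learnt from data, not the derivation below. Writing the observed colour $C$ as a blend of foreground $F$ and background $B$, and applying the product rule per colour channel with $\nabla = (\partial_x, \partial_y, \partial_t)$:

```latex
\begin{align}
C &= \alpha F + (1 - \alpha) B \\
\nabla C &= \alpha \nabla F + (1 - \alpha) \nabla B + (F - B)\,\nabla \alpha
\end{align}
```

Where foreground and background are locally smooth ($\nabla F \approx \nabla B \approx 0$), this reduces to $\nabla C \approx (F - B)\,\nabla\alpha$, so large spatiotemporal gradients in the sequence are expected to coincide with large alpha gradients, scaled by the foreground-background colour difference.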