Fig 2 - uploaded by Emile Hendriks
(a) Illustration of spatio-temporal smoothing: the gray-shaded regions stacked one after another represent the object segmentation maps in each frame. (b) Flowchart of the spatio-temporal smoothing using curve evolution.
TV movie. The segmentation of the objects in this realistic sequence is particularly difficult since the object and background colors are quite similar. The initial segmentation of the car, the walking lady and the man objects is carried out using the algorithms in [8, 3]. The results on 168 frames of the "walking lady" object are presented here. In Fig. 3, the smoothing results for several frames in the x-y domain are provided. The top row shows the given temporally unstable object segmentation maps and the bottom row shows the object segmentation maps after convergence of the curve. The weight of the curvature term in (3) is set to α = 0.4 (determined experimentally). We can observe that unwanted high-curvature parts and missegmented background regions are eliminated easily. In Fig. 4 (a), an x-t cross-section of the "lady" object for a fixed y value is shown (after processing in the x-y domain). The bottom figure shows the result of x-t curve evolution. We can see in Fig. 4 (a) that the elimination of the high-curvature part in the x-t domain corresponds to the elimination of the missegmented background pixels in the x-y domain, which is marked by the horizontal line in Fig. 4 (b). Fig. 4 (b) shows the segmentation map of frame 111 in the spatial (x-y) domain before and after x-t processing. In Fig. 5 (a), a y-t cross-section is given for a fixed x value. Two disconnected groups of black regions can be seen due to the motion of the lady, who first walks towards the left and then towards the right. Motion compensation is utilized to make the cross-sections better aligned, as seen in Fig. 5 (b). Fig. 5 (c) shows the y-t smoothing results.
We can see that some high-curvature lines are eliminated, which correspond to the legs of the lady; this actually introduces a loss of segmentation accuracy. However, it is not noticeable when the scene is viewed in 3D. The effect of y-t smoothing in the spatial domain is shown in Fig. 5 (d), where the temporal instability caused by the legs is eliminated. In Fig. 6, several frames of the Flikken sequence are shown after applying the complete spatio-temporal smoothing algorithm. We can see from the bottom row that the smoothed results do not display sudden changes compared to the top row, which implies better temporal stability. Although the accuracy of segmentation decreases in several frames after temporal stabilization, the overall decrease in segmentation accuracy over the 168 frames was marginal (a 3% increase in the average number of missegmented pixels). This shows that it is possible to increase the temporal stability of object segmentation without significantly increasing the segmentation errors, as explained below.
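As a hedged illustration of curvature-driven smoothing of a single segmentation map (not the paper's actual evolution equation), the diffusion-then-threshold (MBO) scheme is known to approximate motion by mean curvature. The `steps`/`rounds` schedule below is an assumption and plays a role only loosely analogous to the curvature weight α in (3):

```python
import numpy as np

def diffuse(u):
    # One step of 5-point averaging with replicated (Neumann) borders.
    p = np.pad(u, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] +
            p[1:-1, :-2] + p[1:-1, 2:] + 4.0 * u) / 8.0

def mbo_smooth(mask, steps=3, rounds=2):
    """Diffuse the binary map, then re-threshold at 1/2; repeating this
    moves the boundary roughly by its mean curvature, shaving off
    high-curvature spurs and isolated missegmented pixels."""
    u = mask.astype(float)
    for _ in range(rounds):
        for _ in range(steps):
            u = diffuse(u)
        u = (u >= 0.5).astype(float)
    return u.astype(bool)
```

Applied to a map containing a solid object and a stray missegmented pixel, the stray pixel diffuses below the threshold and disappears while the object interior survives.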
Source publication
We present a method for improving the temporal stability of video object segmentation algorithms for 3D-TV applications. First, two quantitative measures to evaluate temporal stability without ground truth are presented. Then, a pseudo-3D curve evolution method, which spatio-temporally stabilizes the estimated object segments, is introduced. Tempora...
Contexts in source publication
Context 1
... a set of temporally unstable video object segmentation maps, we first stack them together so that a three-dimensional "object blob" in x-y-t space is formed (see Fig. 2 (a)). We propose to improve the temporal stability of this "object blob" by smoothing its surface using a surface evolution approach. ...
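The stacking step can be sketched in a few lines of NumPy (the drifting-square masks below are hypothetical toy data): binary per-frame maps become one boolean x-y-t volume.

```python
import numpy as np

# Stack per-frame binary segmentation maps (H x W each) into a single
# x-y-t "object blob" volume of shape (T, H, W).
def stack_object_blob(masks):
    """masks: list of 2D boolean arrays, one per frame."""
    return np.stack(masks, axis=0)

# Toy example: a square object drifting one pixel per frame.
frames = []
for t in range(5):
    m = np.zeros((8, 8), dtype=bool)
    m[2:5, 1 + t:4 + t] = True
    frames.append(m)

blob = stack_object_blob(frames)
print(blob.shape)  # (5, 8, 8)
```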
Context 2
... idea is illustrated in Fig. 2 (a), where the horizontal rectangle shows the x-t cross section and the vertical rectangle shows the y-t cross section. By using this pseudo-3D approach, we can obtain spatio-temporally stable object segmentation by processing the x-y, x-t and y-t slices of the "object volume" iteratively, until the shape converges. The order of ...
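A minimal sketch of this slice-iteration scheme, with a 3x3 majority vote standing in for one curve-evolution step on each slice (the real method evolves the boundary curve; the majority filter is an assumption for illustration only):

```python
import numpy as np

def majority_smooth(slice2d):
    # 3x3 majority vote as a crude stand-in for one curve-evolution step.
    padded = np.pad(slice2d.astype(int), 1)
    acc = np.zeros(slice2d.shape, dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += padded[1 + dy:1 + dy + slice2d.shape[0],
                          1 + dx:1 + dx + slice2d.shape[1]]
    return acc >= 5

def pseudo3d_smooth(blob, max_iters=10):
    """Iterate over x-y, x-t and y-t slices of the (T, H, W) object
    volume until the shape no longer changes."""
    blob = blob.copy()
    for _ in range(max_iters):
        prev = blob.copy()
        for t in range(blob.shape[0]):        # x-y slices
            blob[t] = majority_smooth(blob[t])
        for y in range(blob.shape[1]):        # x-t slices
            blob[:, y] = majority_smooth(blob[:, y])
        for x in range(blob.shape[2]):        # y-t slices
            blob[:, :, x] = majority_smooth(blob[:, :, x])
        if np.array_equal(blob, prev):        # shape converged
            break
    return blob
```

A stray pixel far from the object is removed in the first x-y pass, while the object interior is untouched by any of the three slice directions.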
Context 3
... black regions in any y-t or x-t cross-section. If multiple disconnected black blobs still exist after motion compensation because of the natural topology of the object, the curve evolution has to be applied for each disconnected region of significant size. The overall flowchart of the proposed pseudo-3D smoothing algorithm is given in Fig. 2 ...
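The per-region step could look like the following sketch: a hypothetical helper using 4-connected BFS labelling to collect the disconnected blobs of significant size in a cross-section, so the evolution can be run on each one separately.

```python
import numpy as np
from collections import deque

def significant_regions(mask, min_size=4):
    """mask: 2D boolean cross-section. Returns one boolean mask per
    4-connected component containing at least min_size pixels."""
    visited = np.zeros(mask.shape, dtype=bool)
    regions = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not visited[sy, sx]:
                comp = np.zeros(mask.shape, dtype=bool)
                queue = deque([(sy, sx)])
                visited[sy, sx] = True
                while queue:                      # BFS flood fill
                    y, x = queue.popleft()
                    comp[y, x] = True
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                if comp.sum() >= min_size:        # drop insignificant blobs
                    regions.append(comp)
    return regions
```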
Similar publications
Previous saliency detection research required the reader to evaluate performance qualitatively, based on renderings of saliency maps on a few shapes. This qualitative approach meant it was unclear which saliency models were better, or how well they compared to human perception. This paper provides a quantitative evaluation framework that addresses...
Citations
... Unlike traditional segmentation algorithms that aim to extract uniform and homogeneous regions (in texture or color), recent segmentation algorithms can be defined as a process that separates an image into meaningful objects according to some specified semantics. To satisfy the coming content-based multimedia services [1], segmentation of meaningful objects in an unsupervised manner is urgently required in real-world scenes. But for an arbitrary scene (e.g., a dynamic background), fully automatic object segmentation is still a monumental challenge for the state-of-the-art techniques [2]-[4] due to the wide variety of possible object combinations. ...
In this paper, we propose an automatic human body segmentation system which mainly consists of human body detection and object segmentation. First, an automatic human body detector is designed to provide hard constraints on the object and background for segmentation, and a coarse-to-fine segmentation strategy is employed to deal with partly detected objects. Second, background contrast removal (BCR) and self-adaptive initialization level set (SAILS) are proposed to solve the tough segmentation problems of high contrast at the object boundary and/or similar colors in the object and background. Finally, an object updating scheme is proposed to detect and segment a new object when it appears in the scene. Experimental results demonstrate that our body segmentation system works very well on live video and standard sequences with complex backgrounds.
... To the best of our knowledge, the literature on identifying segmentation errors in a track is relatively limited. For instance, Erdem et al. [1] have tried to identify and overcome segmentation errors for a 3D-television application in order to improve the temporal stability of object segmentation, rather than to identify and remove the errors. They achieved their aim by minimising changes in the global colour histogram and in the turning-angle function of the boundary pixels of the segmented object in each frame, so as to maximise temporal stability. ...
... MCR are essentially colour histograms in the joint R, G, B space, built with sparse bins whose position and number are adjusted to fit the pixel distribution. Instead of just using a global colour feature, as in [1] and [5], we propose to add two extra colour features relating to the upper and lower clothing colours of a person. These features are chosen to represent the often different colours of the clothing on the upper torso and those on the legs. Extracting the MCR for each of these three colour features utilises the same process, but analyses different spatial components of the appearance of the segmented object. ...
This paper presents a method to identify frames with significant segmentation errors in an individual's track by analysing the changes in appearance and size features along the frame sequence. The features used and compared include global colour histograms, local histograms and the bounding box's size. Experiments were carried out on 26 tracks from 4 different people across two cameras with differing illumination conditions. By fusing two local colour features with a global colour feature, probabilities of segmentation-error detection as high as 83 percent of human-expert-identified major segmentation errors are achieved, with false alarm rates of only 3 percent. This indicates that the analysis of such features along a track can be useful in the automatic detection of significant segmentation errors. This can improve the final results of many applications that wish to use robust segmentation results for a tracked person.
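As a rough sketch of the idea (not the authors' exact features or threshold): compare consecutive global colour histograms along the track and flag frames where the distance jumps. The bin count and threshold below are hypothetical.

```python
import numpy as np

def colour_histogram(frame, bins=8):
    """frame: (H, W, 3) uint8 image. Joint R,G,B histogram, L1-normalised."""
    h, _ = np.histogramdd(frame.reshape(-1, 3),
                          bins=(bins, bins, bins),
                          range=[(0, 256)] * 3)
    return h / h.sum()

def flag_error_frames(frames, threshold=0.5):
    """Flag frames whose global colour histogram differs sharply (L1
    distance) from the previous frame's; the first frame is never flagged."""
    hists = [colour_histogram(f) for f in frames]
    flags = [False]
    for prev, cur in zip(hists, hists[1:]):
        flags.append(np.abs(cur - prev).sum() > threshold)
    return flags
```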
... The Major Colour Representation (MCR) used in this paper to define the colour features extends the method previously developed in [2]. We propose to add two extra colour features relating to the upper and lower clothing colours of an individual to the global colours used in [2, 3, 5]. These features are chosen to represent the often different colours of the clothing on the upper torso, and those on the legs. ...
This paper presents a framework based on robust shape and appearance features for matching the various tracks generated by a single individual moving within a surveillance system. Each track is first automatically analysed in order to detect and remove the frames affected by large segmentation errors and drastic changes in illumination. The object's features computed over the remaining frames prove more robust and capable of supporting correct matching of tracks even in the case of significantly disjointed camera views. The shape and appearance features used include a height estimate as well as illumination-tolerant colour representation of the individual's global colours and the colours of the upper and lower portions of clothing. The results of a test from a real surveillance system show that the combination of these four features can provide a probability of matching as high as 91 percent with 5 percent probability of false alarms under views which have significantly differing illumination levels and suffer from significant segmentation errors in as many as 1 in 4 frames.
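The upper/lower clothing split described above can be sketched simply (an assumption for illustration: divide the segmented person's bounding box at its vertical midpoint and compute a mean colour per half; the actual work uses MCR descriptors rather than mean colours):

```python
import numpy as np

def upper_lower_colours(frame, mask):
    """frame: (H, W, 3) image; mask: (H, W) boolean person segment.
    Returns (mean colour of upper half, mean colour of lower half)."""
    ys, _ = np.nonzero(mask)
    mid = (ys.min() + ys.max()) // 2          # vertical midpoint of bbox
    rows = np.arange(mask.shape[0])[:, None]
    upper = mask & (rows <= mid)              # torso half
    lower = mask & (rows > mid)               # legs half
    return frame[upper].mean(axis=0), frame[lower].mean(axis=0)
```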
Digital video content analysis is an important item for multimedia content-based indexing (MCBI), content-based video retrieval (CBVR) and visual surveillance systems. There are several frequently used generic object detection and/or tracking (D&T) algorithms in the literature, such as Background Subtraction (BS), Continuously Adaptive Mean Shift (CMS) and Optical Flow (OF). An important problem for performance evaluation is the absence of stable and flexible software that can compare different algorithms with the same metrics, in real time and on the same platform. In this paper, we have designed and implemented software for the performance comparison and evaluation of well-known video object D&T algorithms (for people D&T) on the same platform. The software works as an automatic and/or semi-automatic test environment in real time, using image and video processing essentials such as morphological operations and filters, ground-truth (GT) XML data files, and charting/plotting capabilities.
Tracking the movements of people within large video surveillance systems is becoming increasingly important in the current security-conscious environment. Such system-wide tracking is based on algorithms for tracking a person within a single camera, which typically operate by extracting features that describe the shape, appearance and motion of that person as they are observed in each video frame. These features can be extracted and then matched across different cameras to obtain global tracks that span multiple cameras within the surveillance area. In this chapter, we combine a number of such features within a statistical framework to determine the probability of any two tracks being made by the same individual. Techniques are presented to improve the accuracy of the features. These include the application of spatial or temporal smoothing, the identification and removal of significant feature errors, as well as the mitigation of other potential error sources, such as illumination. The results of tracking using individual features and the combined system-wide tracks are presented based upon an analysis of people observed in real surveillance footage. These show that software operating on current camera technology can provide significant assistance to security operators in the system-wide tracking of individual people.
Two new region-based methods for video object tracking using active contours are presented. The first method is based on the assumption that the color histogram of the tracked object is nearly stationary from frame to frame. The proposed method is based on minimizing the color histogram difference between the estimated objects at a reference frame and the current frame using a dynamic programming framework. The second method is defined for scenes where there is an out-of-focus blur difference between the object of interest and the background. In such scenes, the proposed “defocus energy” can be utilized for automatic segmentation of the object boundary, and it can be combined with the histogram method to track the object more efficiently. Experiments demonstrate that the proposed methods are successful in difficult scenes with significant background clutter.
It is a challenge to select a general feature for object representation in unconstrained videos. An object detection method that is robust to target rotation and scale is proposed, based on the histogram feature and particle swarm optimization. First, the characteristics of the histogram are presented, and its merits are analyzed. To overcome the computational cost of pixel-by-pixel searching, particle swarm optimization (PSO) is employed. The flowchart of the target detection algorithm using the histogram and PSO is then described. The experimental results show that the histogram offers robustness and efficiency for target detection, and that the computation can be reduced thanks to the performance of PSO.
In this paper, we introduce a multiple-target detection method combining the histogram feature with the Imperialist Competitive Algorithm (ICA). We use the histogram feature because it is robust to target rotation and scale. To overcome the computational cost of pixel-by-pixel searching, ICA is employed. Another advantage of ICA is that if several targets exist in the image or frame, we can detect all of them simultaneously. We then apply a threshold to remove weak empires belonging to objects that are merely similar to the targets, cluster the remaining empires by distance, and select the most powerful empire of each cluster as one of the targets contained in the frame, so that all targets in the frame are detected. Finally, we compare ICA with PSO (Particle Swarm Optimization) and show that ICA is faster and more accurate than PSO for target detection.
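Both abstracts start from the same baseline: exhaustive pixel-by-pixel histogram matching, which PSO and ICA are then used to accelerate. A minimal version of that baseline (grey-level histograms, hypothetical window size and bin count) makes the cost they avoid concrete:

```python
import numpy as np

def brute_force_locate(img, target_hist, size=8, bins=8):
    """Slide a size x size window over every position of a greyscale
    image and return the (x, y) whose normalised histogram has the
    smallest L1 distance to target_hist."""
    best, best_cost = (0, 0), np.inf
    for y in range(img.shape[0] - size + 1):
        for x in range(img.shape[1] - size + 1):
            h, _ = np.histogram(img[y:y + size, x:x + size],
                                bins=bins, range=(0, 256))
            cost = np.abs(h / h.sum() - target_hist).sum()
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best
```

This evaluates every candidate position once; swarm-based methods sample the same cost function at far fewer positions.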
Indexing deals with the automatic extraction of information with the objective of automatically describing and organizing the content. In a video stream, different types of information can be considered semantically important. Since we can assume that the most relevant is linked to the presence of moving foreground objects, their number, their shape and their appearance can constitute a good means for content description. For this reason, we propose to combine motion information and region-based color segmentation to extract moving objects from an MPEG-2 compressed video stream, starting from low-resolution data only. This approach, which we refer to as "rough indexing," consists in processing P-frame motion information first, and then in performing I-frame color segmentation. Next, since many details can be lost due to the low-resolution data, a novel spatiotemporal filtering has been developed to improve the object detection results; it is constituted by a quadric surface modeling the object trace along time. This method makes it possible to effectively correct possible earlier detection errors without heavily increasing the computational effort.
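The "quadric trace along time" idea can be caricatured one dimension lower: fit a low-order polynomial to the object's centroid trajectory and flag frames that stray from it. This is a hedged sketch only; the paper fits a quadric surface to the full object trace, not just centroids, and the degree and deviation threshold below are assumptions.

```python
import numpy as np

def flag_trace_outliers(centroids, deviation=3.0):
    """centroids: sequence of (x, y) object centroids, one per frame.
    Fit a quadratic in t to each coordinate and flag frames whose
    centroid deviates from the fitted trace by more than `deviation`."""
    t = np.arange(len(centroids))
    c = np.asarray(centroids, dtype=float)        # shape (T, 2)
    flags = np.zeros(len(c), dtype=bool)
    for dim in range(2):                          # x trace, then y trace
        coeffs = np.polyfit(t, c[:, dim], deg=2)
        residual = np.abs(np.polyval(coeffs, t) - c[:, dim])
        flags |= residual > deviation
    return flags
```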