-
ABSTRACT: Graph-based semi-supervised learning approaches have proven effective and efficient in addressing the insufficiency of labeled training data in many real-world application areas, such as video annotation. However, the pair-wise similarity metric between samples, a significant factor in these algorithms, has not been fully investigated. Specifically, in existing approaches the estimation of pair-wise similarity between two samples relies on the spatial property of video data, while the temporal property, an essential characteristic of video data, is not embedded in the similarity measure. Accordingly, this paper proposes a novel framework for video annotation called Joint Spatio-Temporal Correlation Learning (JSTCL). The framework simultaneously takes into account both the spatial and temporal properties of video data to improve the estimation of pair-wise similarity. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
Keywords: Graph-based semi-supervised learning; Pair-wise similarity measure; Spatio-temporal correlation
Formal Pattern Analysis & Applications 04/2012; · 0.74 Impact Factor
-
ABSTRACT: This paper presents an efficient method for recovering and tracking 3D human pose from monocular images using 3D-to-2D joint correspondences. Its novelty relative to previous work lies in several aspects. First, the method does not involve any complex features, so it does not rely on good foreground segmentation. Second, the model is formulated as a second-order cone programming (SOCP) problem, which can be solved reliably and efficiently. Finally, it uses a more effective prediction strategy to increase robustness. Experiments on walking sequences demonstrate that the model performs accurately and reliably.
Pattern Recognition (CCPR), 2010 Chinese Conference on; 11/2010
-
ABSTRACT: This paper presents a method for 3D human motion tracking and voxel-based reconstruction from sparse views. We adopt annealed Gaussian based particle swarm optimization (AGPSO) for 3D human motion tracking. The AGPSO algorithm incorporates temporal continuity information into the traditional particle swarm optimization (PSO) algorithm under a Bayesian framework. In the online tracking process, the state variables are estimated via particle filtering, where the observation is designed as a minimized Markov Random Field (MRF) energy. Finally, voxel reconstruction is conducted using the skeleton shape prior via dynamic graph cut. The experimental results show that our method performs well against cluttered backgrounds and generates plausible voxel reconstructions from sparse views.
Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
-
ABSTRACT: The saliency mechanism is considered crucial in the human visual system and helpful for object detection and recognition. This paper proposes a novel feature-based model for visual saliency detection. It consists of two steps: first, image patches are represented using learned overcomplete sparse bases; then, saliency information is estimated via direct low-rank and sparsity matrix decomposition. We compare our model with previous methods on natural images. Experimental results show that our model performs competitively on the visual saliency detection task and suggest the potential of matrix decomposition and convex optimization for image analysis.
Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
-
ABSTRACT: The saliency mechanism is considered crucial in the human visual system and helpful for object detection and recognition. This paper proposes a novel feature-based model for visual saliency detection. It consists of two steps: first, image patches are represented using learned overcomplete sparse bases; then, saliency information is estimated via low-rank and sparsity matrix decomposition. We compare our model with previous methods on natural images. Experimental results on both natural images and psychological patterns show that our model performs competitively on the visual saliency detection task and suggest the potential of matrix decomposition and convex optimization for image analysis.
IEEE Signal Processing Letters 09/2010; · 1.39 Impact Factor
-
ABSTRACT: In the computer vision community, human pose estimation and nonrigid shape recovery have evolved into different subfields. State-of-the-art optimization techniques have been applied successfully to the problem of deformable surface reconstruction, and recent methods in this area have focused on designing formulations that are easier to solve. In general, these techniques owe their success to the assumption that sufficient 2-D to 3-D correspondences can be detected. By contrast, confronted with a similar ambiguity problem, many techniques for human pose estimation adopt stochastic searching or discriminative predictions, which allow for more generative image cues. However, stochastic methods cannot guarantee global optimality, and discriminative techniques usually suffer from inaccuracy. In this letter, we absorb ideas from both domains and propose a unified approach for articulated human pose estimation. Specifically, we optimize the human pose to account for the discriminative pose prediction and bone length preservation in parallel with the point-to-point image observation. Moreover, the L2 norm minimization is solved iteratively as a linear system with high computational efficiency.
IEEE Signal Processing Letters 09/2010; · 1.39 Impact Factor
-
02/2010; ISBN: 978-953-7619-87-9
-
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, 14-19 March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA; 01/2010
-
Advances in Multimedia Information Processing - PCM 2010 - 11th Pacific Rim Conference on Multimedia, Shanghai, China, September 21-24, 2010, Proceedings, Part I; 01/2010
-
Advances in Multimedia Information Processing - PCM 2010 - 11th Pacific Rim Conference on Multimedia, Shanghai, China, September 21-24, 2010, Proceedings, Part I; 01/2010
-
ABSTRACT: Automatic image annotation is a promising way to achieve more effective image management and retrieval. However, the performance of existing state-of-the-art keyword annotation schemes is often unsatisfactory. Therefore, image annotation refinement is crucial to improve imprecise annotation results. In this paper, a novel approach is developed to automatically annotate image content using a semi-supervised learning model. Using perceptual visual characteristics, the candidate annotations of unlabelled images are first obtained from a progressive model. Then a transductive model, the random walk with restart algorithm, is used to refine these candidate annotations, and the top ones are retained as the final annotations. Experiments conducted on the typical Corel dataset show the effectiveness of the proposed approach.
IEEE Signal Processing Letters 12/2009; · 1.39 Impact Factor
-
ABSTRACT: We consider the problem of 3D modeling in environments where the colors of the foreground objects are similar to the background, which makes foreground and background classification difficult. A purely image-based algorithm is adopted in this paper, with no prior information about the foreground objects. We classify foreground and background by fusing information at the pixel and region levels to obtain a similarity probability map, followed by a Bayesian sensor fusion framework to infer the space occupancy grid. The occupancy estimate can be updated incrementally once a new observation is available, and the contribution of each observation can be adjusted according to its reliability. Finally, three parameters of the algorithm are analyzed in detail, and experiments show the effectiveness of the method.
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on; 05/2009 · 4.63 Impact Factor
-
Science in China Series F: Information Sciences. 01/2009; 52:244-251.
-
Science in China Series F: Information Sciences. 01/2009; 52:183-194.
-
Multimedia Tools Appl. 01/2009; 42:183-205.
-
Computer Vision - ACCV 2009, 9th Asian Conference on Computer Vision, Xi'an, China, September 23-27, 2009, Revised Selected Papers, Part II; 01/2009
-
Proceedings of the International Conference on Image Processing, ICIP 2009, 7-10 November 2009, Cairo, Egypt; 01/2009
-
Proceedings of the Fifth International Conference on Image and Graphics, ICIG 2009, Xi'an, Shanxi, China, 20-23 September 2009; 01/2009
-
Proceedings of the Fifth International Conference on Image and Graphics, ICIG 2009, Xi'an, Shanxi, China, 20-23 September 2009; 01/2009
-
ABSTRACT: In this letter, a novel framework for segmenting video scenes and representing scene content is proposed. First, video shots are detected using a coarse-to-fine algorithm. Second, key frames are selected adaptively, and redundant key frames are removed using template matching. Then, spatio-temporally coherent shots are clustered into the same scene. Finally, based on a full analysis of the typical characteristics of continuously recorded videos, video scene content is represented semantically to meet users' needs in video retrieval. Experimental results show that the proposed method supports efficient retrieval of video content of interest.
IEEE Signal Processing Letters 02/2008; · 1.39 Impact Factor