Publications

  • ABSTRACT: This paper addresses the problem of human activity recognition in still images. We propose a novel method that focuses on human-object interaction for feature representation of activities on Riemannian manifolds, and exploits the underlying Riemannian geometry for classification. The main contributions of the paper include: (a) represent human activity by appearance features from local patches centered at hands containing interacting objects, and by structural features formed from the detected human skeleton containing the head, torso axis and hands; (b) formulate the SVM kernel function based on geodesics on Riemannian manifolds under the log-Euclidean metric; (c) apply a multi-class SVM classifier on the manifold under the one-against-all strategy. Experiments were conducted on a dataset containing 17196 images in 12 classes of activities from 4 subjects. Test results, evaluations, and comparisons with state-of-the-art methods support the effectiveness of the proposed scheme.
    8th ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC 2014), Venice, Italy; 11/2014
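    A minimal sketch (Python with NumPy/SciPy; the function names and the bandwidth sigma are illustrative, not from the paper) of the log-Euclidean machinery this and several later abstracts rely on: the geodesic distance between symmetric positive-definite (SPD) feature matrices, and the Gaussian kernel built from it for use as a precomputed SVM kernel under the one-against-all strategy.

      import numpy as np
      from scipy.linalg import logm

      def log_euclidean_distance(X, Y):
          # Geodesic distance under the log-Euclidean metric:
          # d(X, Y) = ||logm(X) - logm(Y)||_F for SPD matrices X, Y.
          return np.linalg.norm(logm(X) - logm(Y), ord="fro")

      def geodesic_kernel(X, Y, sigma=1.0):
          # Gaussian kernel over geodesic distances; positive-definite here
          # because logm() maps the SPD manifold isometrically into a
          # Euclidean (vector) space under this metric.
          d = log_euclidean_distance(X, Y)
          return np.exp(-d**2 / (2.0 * sigma**2))

      def gram_matrix(mats_a, mats_b, sigma=1.0):
          # Precomputed Gram matrix, e.g. for sklearn's SVC(kernel="precomputed").
          return np.array([[geodesic_kernel(A, B, sigma) for B in mats_b]
                           for A in mats_a])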
  • Y. Yun, K. Fu, I.Y.H. Gu, J. Yang
    ABSTRACT: This paper addresses issues in video object tracking. We propose a novel method where tracking is regarded as a one-class classification problem of domain-shift objects. The proposed tracker is inspired by the fact that the positive samples can be bounded by a closed hypersphere generated by a one-class support vector machine (SVM), leading to a solution for robust online learning of the target model. The main novelties of the paper include: (a) represent the target model by a set of positive samples as a cluster of points on Riemannian manifolds; (b) perform online learning of the target model as a dynamic cluster of points flowing on the manifold, alternating with tracking; (c) formulate a geodesic-based kernel function for one-class SVM on Riemannian manifolds under the log-Euclidean metric. Experiments are conducted on several videos, where results support the proposed method.
    IEEE Int'l Conf. ICCP 2014; 10/2014
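    A hedged sketch of the one-class formulation above: the target's positive samples (SPD descriptors) are bounded by a hypersphere learned with a one-class SVM over the precomputed geodesic kernel. It reuses gram_matrix from the previous sketch; nu and sigma are assumed values, not the paper's settings.

      from sklearn.svm import OneClassSVM

      def learn_target_model(positives, sigma=1.0, nu=0.1):
          # positives: list of SPD descriptors of the tracked target.
          K = gram_matrix(positives, positives, sigma)
          return OneClassSVM(kernel="precomputed", nu=nu).fit(K)

      def score_candidate(model, candidate, positives, sigma=1.0):
          # Decision value > 0: the candidate falls inside the learned
          # hypersphere, i.e. it resembles the target cluster on the manifold.
          K_row = gram_matrix([candidate], positives, sigma)
          return model.decision_function(K_row)[0]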
  • ABSTRACT: Most existing salient object detection algorithms face the problem of either under- or over-segmenting an image. More recent methods address the problem via multi-level segmentation. However, the number of segmentation levels is manually predetermined and only works well for specific classes of images. In this paper, a new salient object detection scheme is presented based on adaptive multi-level region merging. A graph-based merging scheme is developed to reassemble regions based on their shared contour strength. This merging process adapts to complete the contours of salient objects, which can then be used for global perceptual analysis, e.g., figure/ground separation. Such contour completion is enhanced by graph-based spectral decomposition. We show that even though simple region saliency measurements are adopted for each region, encouraging performance can be obtained after across-level integration. Experiments comparing with 13 existing methods on three benchmark datasets (MSRA-1000, SOD and SED) show that the proposed method yields uniform object enhancement and achieves state-of-the-art performance.
    BMVC 2014, Nottingham, UK; 09/2014
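    An illustrative sketch (not the authors' code) of adaptive region merging on a region adjacency graph: adjacent region pairs are merged from the weakest shared contour upward, each merge defining one segmentation level of the hierarchy. The edges input, mapping a region pair to its mean shared-boundary strength, is an assumed precomputed structure.

      def merge_hierarchy(edges):
          # edges: dict {(i, j): strength}, the mean contour strength along
          # the shared boundary of adjacent regions i and j. Yields one
          # (kept_label, absorbed_label, strength) event per merge, i.e. one
          # new segmentation level, using union-find bookkeeping.
          parent = {}

          def find(x):
              parent.setdefault(x, x)
              while parent[x] != x:
                  parent[x] = parent[parent[x]]   # path compression
                  x = parent[x]
              return x

          for (i, j), strength in sorted(edges.items(), key=lambda kv: kv[1]):
              ri, rj = find(i), find(j)
              if ri != rj:
                  parent[rj] = ri
                  yield ri, rj, strength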
  • ABSTRACT: Recently, many graph-based salient region/object detection methods have been developed. They are rather effective for still images. However, little attention has been paid to salient region detection in videos. This paper addresses salient region detection in videos, proposing a unified approach towards graph construction for salient object detection. The proposed method combines static appearance and motion cues to construct the graph, enabling a direct extension of original graph-based salient region detection to video processing. To maintain both intra- and inter-frame coherence, a spatio-temporal smoothing operation is proposed on a structured graph derived from consecutive frames. The effectiveness of the proposed method is tested and validated using seven videos from two video datasets.
    ICPR 2014, Stockholm; 08/2014
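    A small sketch of the kind of graph edge weight the abstract implies: static appearance (e.g. mean Lab color) and motion (mean optical flow) of neighboring superpixels combined into one affinity. The Gaussian forms and the alpha mixing weight are assumptions, not the paper's exact formula.

      import numpy as np

      def edge_weight(color_i, color_j, flow_i, flow_j,
                      sigma_c=10.0, sigma_m=1.0, alpha=0.5):
          # Affinity between superpixels i and j; larger = more similar,
          # suitable as an edge weight in the spatio-temporal graph.
          d_app = np.linalg.norm(np.asarray(color_i) - np.asarray(color_j))
          d_mot = np.linalg.norm(np.asarray(flow_i) - np.asarray(flow_j))
          return alpha * np.exp(-d_app**2 / (2 * sigma_c**2)) \
               + (1 - alpha) * np.exp(-d_mot**2 / (2 * sigma_m**2))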
  • ABSTRACT: This paper addresses two issues for mitigating driver distraction/inattention using novel video analysis techniques: (a) inside an ego vehicle, driver inattention is monitored by first tracking the driver's face/eye region using Riemannian manifold-based particle filters, followed by recognition of dynamic eye states using PPCA (probabilistic principal component analysis) and an SVM (support vector machine) classifier. The frequencies of eye blinking and eye closure are used as indicators of sleepiness, and a warning sign is then generated as a recommendation; (b) outside the ego vehicle, road traffic is also analyzed. Surrounding vehicles (in both directions) are tracked, and their states are analyzed by self-calibrated cameras using view geometries and road information. Parameters (e.g. distance, velocity, number) of tracked vehicles are estimated on the road ground plane in the 3D world coordinate system. These pieces of information are provided for mitigating driver inattention. The main novelties of the proposed scheme include facial-geometry-based eye region detection for eye closure identification, combined tracking and detection of vehicles, new formulae derived for camera self-calibration, and a hybrid system that handles both daytime and nighttime scenarios. Experiments have been conducted on video data from two different camera settings, i.e., captured inside and outside a vehicle. Preliminary test results and performance evaluation have indicated the effectiveness of the proposed methods.
    IEEE Int'l Conf. on Signal Processing and Integrated Networks (SPIN 2014), Noida-Delhi NCR, India; 02/2014
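    A toy sketch of the warning logic in part (a): per-frame eye states from the PPCA + SVM classifier are accumulated over a sliding window, and a warning fires when the eye-closure ratio exceeds a threshold. The window length and threshold are illustrative assumptions.

      from collections import deque

      class DrowsinessMonitor:
          def __init__(self, fps=25, window_s=30, max_closure_ratio=0.3):
              self.states = deque(maxlen=fps * window_s)   # 1 = eye closed
              self.max_closure_ratio = max_closure_ratio

          def update(self, eye_closed: bool) -> bool:
              # Feed one frame's classified eye state; returns True when a
              # sleepiness warning should be raised for the full window.
              self.states.append(1 if eye_closed else 0)
              if len(self.states) < self.states.maxlen:
                  return False                             # window not full yet
              return sum(self.states) / len(self.states) > self.max_closure_ratio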
  • ABSTRACT: This paper addresses issues in multi-class visual object classification, where sequential learning and sensor fusion are exploited in a unified framework. We adopt a novel method for head pose classification using RGB and depth images. The main contribution of this paper is a multi-class AdaBoost classification framework in which information obtained from the RGB and depth modalities interactively complements each other. This is achieved by learning weak hypotheses for the RGB and depth modalities independently with the same sampling weight in the boosting structure, and then fusing them by learning a sub-ensemble. Experiments are conducted on a Kinect RGB-D face image dataset containing 4098 face images in 5 different poses. Results have shown good performance, with a high classification rate (99.76%) and low false alarms on the dataset.
    IEEE Int'l Conf. on Signal Processing and Integrated Networks (SPIN 2014), Noida-Delhi NCR, India; 02/2014
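    A schematic sketch of the boosting-with-fusion idea: at each round, one weak learner per modality is fitted on the same sample weights, and the weight update follows SAMME. Using depth-1 decision stumps and keeping the lower-error modality per round are simplifying assumptions; the paper itself fuses the modalities by learning a sub-ensemble.

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      def boosted_fusion(X_rgb, X_depth, y, n_rounds=50, n_classes=5):
          # Returns a list of (modality, learner, alpha) triples.
          w = np.full(len(y), 1.0 / len(y))
          ensemble = []
          for _ in range(n_rounds):
              candidates = []
              for name, X in (("rgb", X_rgb), ("depth", X_depth)):
                  stump = DecisionTreeClassifier(max_depth=1)
                  stump.fit(X, y, sample_weight=w)
                  err = np.average(stump.predict(X) != y, weights=w)
                  candidates.append((err, name, stump, X))
              err, name, stump, X = min(candidates, key=lambda c: c[0])
              err = np.clip(err, 1e-10, 1 - 1e-10)
              alpha = np.log((1 - err) / err) + np.log(n_classes - 1)  # SAMME
              w *= np.exp(alpha * (stump.predict(X) != y))             # reweight
              w /= w.sum()
              ensemble.append((name, stump, alpha))
          return ensemble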
  • ABSTRACT: This paper addresses a visual tracking and analysis method for automatic monitoring of an industrial manual assembly process, where each worker sequentially picks up components from different boxes during an assembly process. Automatic surveillance of the assembly process would help reduce assembly errors by giving early warnings. We propose a hand tracking and trajectory analysis method for videos captured by several uncalibrated cameras with overlapping views. The proposed method consists of three modules: single-view hand tracking, consistent labeling across views, and optimal decision from multi-view temporal dynamics. The main novelties of the paper include: (a) target model learning with multiple instances through K-means clustering, applied to accommodate different levels of light reflection; (b) an optimal criterion for consistent labeling of tracked hands across views, based on the symmetric epipolar distance; (c) backward correction of mis-detections by combining epipolar lines with previously tracked results; (d) a multi-view voting scheme for analyzing hand trajectories using binary hand location maps. Experiments have been conducted on videos from multiple uncalibrated cameras, where a person performs assembly operations. Test results and performance evaluation have shown the effectiveness of this method in terms of multi-view consistent estimation of hand trajectories and accurate interpretation of component assembly actions.
    7th ACM/IEEE Int’l Conf. on Distributed Smart Cameras (ICDSC’13), Palm Springs, California, USA; 10/2013
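    A short sketch of the symmetric epipolar distance used as the consistent-labeling criterion in novelty (b): point-to-epipolar-line distances summed over both views. Here F is the fundamental matrix mapping points in view 1 to epipolar lines in view 2, and image points are homogeneous 3-vectors with third coordinate 1.

      import numpy as np

      def symmetric_epipolar_distance(x1, x2, F):
          # d(x2, F x1) + d(x1, F^T x2): small when x1 and x2 are
          # projections of the same 3-D point (e.g. the same hand).
          l2 = F @ x1                       # epipolar line of x1 in view 2
          l1 = F.T @ x2                     # epipolar line of x2 in view 1
          d2 = abs(x2 @ l2) / np.hypot(l2[0], l2[1])
          d1 = abs(x1 @ l1) / np.hypot(l1[0], l1[1])
          return d1 + d2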
  • Y. Yun, I.Y.H. Gu, H. Aghajan
    ABSTRACT: This paper addresses the issue of classification of human activities in still images. We propose a novel method where part-based features focusing on human and object interaction are utilized for activity representation, and classification is designed on manifolds by exploiting the underlying Riemannian geometry. The main contributions of the paper include: (a) represent human activity by appearance features from image patches containing hands, and by structural features formed from the distances between the torso and patch centers; (b) formulate the SVM kernel function based on the geodesics on Riemannian manifolds under the log-Euclidean metric; (c) apply a multi-class SVM classifier on the manifold under the one-against-all strategy. Experiments were conducted on a dataset containing 2750 images in 7 classes of activities from 10 subjects. Results have shown good performance (average classification rate of 95.83%, false positive rate 0.71%, false negative rate 4.24%). Comparisons with three other related classifiers provide further support to the proposed method.
    IEEE Int'l Conf. on Image Processing (ICIP 2013); 09/2013
  • Y. Yun, I.Y.H. Gu, H. Aghajan
    ABSTRACT: This paper addresses issues in object tracking under occlusion scenarios, where multiple uncalibrated cameras with overlapping fields of view are exploited. We propose a novel method where tracking is first done independently in each individual view, and tracking results are then mapped between views to improve the tracking jointly. The proposed tracker uses the assumptions that objects are visible in at least one view and move upright on a common planar ground, which may induce a homography relation between views. A method for online learning of object appearances on Riemannian manifolds is also introduced. The main novelties of the paper include: (a) define a similarity measure, based on geodesics, between a candidate object and a set of mapped references from multiple views on a Riemannian manifold; (b) propose multi-view maximum likelihood (ML) estimation of object bounding box parameters, based on Gaussian-distributed geodesics on the manifold; (c) introduce online learning of object appearances on the manifold, taking possible occlusions into account; (d) utilize projective transformations for objects between views, where parameters are estimated from the warped vertical axis by combining planar homography, epipolar geometry and the vertical vanishing point; (e) embed single-view trackers in a three-layer multi-view tracking scheme. Experiments have been conducted on videos from multiple uncalibrated cameras, where objects undergo long-term partial/full occlusions or frequent intersections. Comparisons have been made with three existing methods, where the performance is evaluated both qualitatively and quantitatively. Results have shown the effectiveness of the proposed method in terms of robustness against tracking drift caused by occlusions.
    IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 3(2):12; 05/2013
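    A hedged sketch of the multi-view ML step in novelty (b): each candidate bounding box is scored by Gaussian log-likelihoods of its geodesic distances to references mapped in from the other views, and the best-scoring box is kept. Here geodesic stands for the log-Euclidean distance between covariance descriptors (see the first sketch above), and sigma is an assumed bandwidth.

      import numpy as np

      def ml_select(candidates, mapped_refs, geodesic, sigma=1.0):
          # candidates: list of (box, descriptor) pairs from one view;
          # mapped_refs: reference descriptors mapped in from other views.
          def loglik(desc):
              d = np.array([geodesic(desc, r) for r in mapped_refs])
              return np.sum(-d**2 / (2 * sigma**2))   # Gaussian over geodesics
          return max(candidates, key=lambda bd: loglik(bd[1]))[0]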
  • Int'l Conf. ICPR 2012; 11/2012
  • Yixiao Yun, Irene Y.H. Gu, Hamid Aghajan
    ABSTRACT: This paper addresses the problem of object tracking in occlusion scenarios, where multiple uncalibrated cameras with overlapping fields of view are used. We propose a novel method where tracking is first done independently for each view, and tracking results are then mapped between each pair of views to improve the tracking in individual views, under the assumptions that objects are not occluded in all views and move upright on a planar ground, which may induce a homography relation between each pair of views. The tracking results are mapped by jointly exploiting the geometric constraints of homography, epipolar geometry and the vertical vanishing point. The main contributions of this paper include: (a) formulate a reference model of multi-view object appearance using region covariance for each view; (b) define a likelihood measure based on geodesics on a Riemannian manifold that is consistent with the destination view, by mapping both the estimated positions and appearances of the tracked object from other views; (c) locate the object in each individual view based on the maximum likelihood criterion from multi-view estimations of object position. Experiments have been conducted on videos from multiple uncalibrated cameras, where targets experience long-term partial or full occlusions. Comparisons with two existing methods and performance evaluations are also made. Test results have shown the effectiveness of the proposed method in terms of robustness against tracking drift caused by occlusions.
    6th ACM/IEEE Int'l Conf. on Distributed Smart Cameras, 2012 (ICDSC 12); 10/2012
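    A minimal version of the region covariance descriptor named in contribution (a): per-pixel features are stacked and summarized by their covariance matrix, the SPD input assumed by the geodesic sketches above. The exact feature set (here pixel coordinates, intensity and gradient magnitudes) is an assumption.

      import numpy as np

      def region_covariance(patch):
          # patch: 2-D grayscale array. Returns a 5x5 SPD covariance descriptor.
          h, w = patch.shape
          ys, xs = np.mgrid[0:h, 0:w]
          iy, ix = np.gradient(patch.astype(float))
          feats = np.stack([xs.ravel(), ys.ravel(), patch.ravel(),
                            np.abs(ix).ravel(), np.abs(iy).ravel()])
          return np.cov(feats) + 1e-6 * np.eye(5)   # regularize to stay SPD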
  • Yixiao Yun, Irene Y.H. Gu
    ABSTRACT: This paper proposes a novel method for multi-view face pose classification through sequential learning and sensor fusion. The basic idea is to use face images observed in the visual and thermal infrared (IR) bands, with the same sampling weight in a multi-class boosting structure. The main contribution of this paper is a multi-class AdaBoost classification framework where information obtained from the visual and infrared bands interactively complements each other. This is achieved by learning weak hypotheses for the visual and IR bands independently and then fusing the optimized hypothesis sub-ensembles. In addition, an effective feature descriptor is introduced for thermal IR images. Experiments are conducted on a visual and thermal IR image dataset containing 4844 face images in 5 different poses. Results have shown a significant increase in classification rate compared with the existing multi-class AdaBoost algorithm SAMME trained on visual or infrared images alone, as well as with a simple baseline classification-fusion algorithm.
    IEEE Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP 2012); 03/2012
