ABSTRACT: In this paper, we present a new framework to monitor medication intake for elderly individuals by incorporating a video camera and Radio Frequency Identification (RFID) sensors. The proposed framework can provide a key function for monitoring activities of daily living (ADLs) of elderly people at their own home. In an assistive environment, RFID tags are applied to medicine bottles located in a medicine cabinet so that each medicine bottle has a unique ID. The description of the medicine data for each tag is manually input to a database. RFID readers detect if any of these bottles are taken away from the medicine cabinet and identify the tag attached to the medicine bottle. A video camera is installed to continuously monitor the activity of taking medicine by integrating face detection and tracking, mouth detection, background subtraction, and activity detection. The preliminary results demonstrate 100% detection accuracy for identifying medicine bottles and promising results for monitoring the activity of taking medicine.
Network modeling and analysis in health informatics and bioinformatics. 07/2013; 2(2):61-70.
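The pipeline above combines RFID events with vision components, one of which is background subtraction. As a rough illustration of that step only (the abstract does not give the authors' implementation, so the threshold and toy frames below are assumptions), a minimal frame-differencing sketch in Python/NumPy:

```python
import numpy as np

def detect_motion(background, frame, threshold=25):
    """Flag pixels whose absolute difference from a static background
    model exceeds a threshold (simplest form of background subtraction)."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy example: a flat background with a bright "object" appearing.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200            # 2x2 block of object pixels
mask = detect_motion(background, frame)
print(mask.sum())                # number of changed pixels -> 4
```

A deployed system would instead use an adaptive background model, but the thresholded-difference idea is the same.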
ABSTRACT: This paper proposes a novel feature extraction algorithm specifically designed for learning to rank in image ranking. Different from previous works, the proposed method not only preserves the local manifold structure of the data, but also keeps the ordinal information among different data blocks in the low-dimensional subspace, where a ranking model can be learned effectively and efficiently. We first define the ideal directions for preserving local manifold structure and ordinal information, respectively. Based on the two definitions, a unified model is built to leverage the two kinds of information, which is formulated as an optimization problem. The experiments are conducted on two publicly available data sets: the MSRA-MM image data set and the “Web Queries” image data set, and the experimental results demonstrate the power of the proposed method against the state-of-the-art methods.
ABSTRACT: Learning to rank has been demonstrated as a powerful tool for image ranking, but the "curse of dimensionality" is a key challenge in learning a ranking model from a large image database. This paper proposes a novel dimensionality reduction algorithm named ordinal preserving projection (OPP) for learning to rank. We first define two matrices, which work in the row direction and column direction respectively. The two matrices aim at leveraging the global structure of the data set and the ordinal information of the observations. By maximizing the corresponding objective functions, we obtain two optimal projection matrices mapping the original data points into a low-dimensional subspace, in which both global structure and ordinal information are preserved. The experiments are conducted on the publicly available MSRA-MM image data set and "Web Queries" image data set, and the experimental results demonstrate the effectiveness of the proposed method.
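OPP's exact objective functions are defined in the paper itself; the abstract only states that projection matrices are obtained by maximizing objective functions. The generic recipe such subspace methods follow — maximize a trace criterion over orthonormal projections, solved by top eigenvectors of a scatter matrix — can be sketched as below; the plain covariance matrix here is a stand-in, not OPP's actual objective:

```python
import numpy as np

# Hedged sketch of the generic subspace-projection recipe:
# maximize tr(P^T S P) s.t. P^T P = I, solved by the leading
# eigenvectors of a symmetric scatter matrix S.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))        # 100 samples, 5 features
S = np.cov(X, rowvar=False)              # stand-in scatter matrix
vals, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
P = vecs[:, ::-1][:, :2]                 # top-2 eigenvectors as columns
Y = X @ P                                # low-dimensional embedding
print(Y.shape)                           # -> (100, 2)
```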
ABSTRACT: The nine papers in this special section on object and event classification in large-scale video collections can be categorized into four themes: video indexing, concept detection, video summarization, and event recognition.
IEEE Transactions on Multimedia 01/2012; 14:1-2.
ABSTRACT: Recently, recognizing affect from both face and body gestures has attracted increasing attention. However, there is still a lack of efficient and effective features to describe the dynamics of face and gestures for real-time automatic affect recognition. In this paper, we propose a novel approach, which combines both MHI-HOG and Image-HOG through a temporal normalization method, to describe the dynamics of face and body gestures for affect recognition. The MHI-HOG stands for Histogram of Oriented Gradients (HOG) on the Motion History Image (MHI). It captures the motion direction of an interest point as an expression evolves over time. The Image-HOG captures the appearance information of the corresponding interest point. The combination of MHI-HOG and Image-HOG can effectively represent both local motion and appearance information of face and body gestures for affect recognition. The temporal normalization method explicitly solves the time resolution issue in video-based affect recognition. Experimental results demonstrate promising performance as compared with the state of the art. We also show that expression recognition with temporal dynamics outperforms frame-based recognition.
Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on; 07/2011
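MHI-HOG computes HOG descriptors over a Motion History Image. Leaving the HOG part aside, the MHI update rule itself is easy to sketch: recently moving pixels are set to a maximum value and older motion decays frame by frame. The decay constant `tau` and the toy motion masks below are illustrative, not values from the paper:

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=5):
    """Update a Motion History Image: pixels moving in this frame are
    set to tau; previously moving pixels decay by one per frame."""
    return np.where(motion_mask, tau, np.maximum(mhi - 1, 0))

mhi = np.zeros((3, 3), dtype=np.int32)
frames = [np.zeros((3, 3), dtype=bool) for _ in range(3)]
frames[0][0, 0] = True   # motion at (0,0) in the first frame
frames[2][2, 2] = True   # motion at (2,2) in the last frame
for mask in frames:
    mhi = update_mhi(mhi, mask, tau=5)
print(mhi[0, 0], mhi[2, 2])   # older motion has decayed: 3 5
```

The gradient structure of the resulting image encodes motion direction over time, which is what the HOG descriptor then summarizes.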
ABSTRACT: Myocardial strain is a critical indicator of many cardiac diseases and dysfunctions. The goal of this paper is to extract and use the myocardial strain pattern from tagged magnetic resonance imaging (MRI) to identify and localize regional abnormal cardiac function in human subjects. In order to extract the myocardial strains from the tagged images, we developed a novel nontracking-based strain estimation method for tagged MRI. This method is based on the direct extraction of tag deformation, and therefore avoids some limitations of conventional displacement or tracking-based strain estimators. Based on the extracted spatio-temporal strain patterns, we have also developed a novel tensor-based classification framework that better conserves the spatio-temporal structure of the myocardial strain pattern than conventional vector-based classification algorithms. In addition, the tensor-based projection function keeps more of the information of the original feature space, so that abnormal tensors in the subspace can be back-projected to reveal the regional cardiac abnormality in a more physically meaningful way. We have tested our novel methods on 41 human image sequences, and achieved a classification rate of 87.80%. The regional abnormalities recovered from our algorithm agree well with the patient's pathology and clinical image interpretation, and provide a promising avenue for regional cardiac function analysis.
IEEE transactions on medical imaging. 05/2011; 30(12):2017-29.
ABSTRACT: This paper presents a component-based deformable model for generalized face alignment, in which a novel bistage statistical model is proposed to account for both local and global shape characteristics. Instead of using statistical analysis on the entire shape, we build separate Gaussian models for shape components to preserve more detailed local shape deformations. In each model of components, a Markov network is integrated to provide simple geometry constraints for our search strategy. In order to make a better description of the nonlinear interrelationships over shape components, the Gaussian process latent variable model is adopted to obtain enough control of shape variations. In addition, we adopt an illumination-robust feature to lead the local fitting of every shape point when light conditions change dramatically. To further boost the accuracy and efficiency of our component-based algorithm, an efficient subwindow search technique is adopted to detect components and to provide better initializations for shape components. Based on this approach, our system can generate accurate shape alignment results not only for images with exaggerated expressions and slight shading variation but also for images with occlusion and heavy shadows, which are rarely reported in previous work.
IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society 02/2011; 41(1):287-98.
ABSTRACT: A new method is proposed to detect abnormal behaviors in human group activities. This approach effectively models group activities based on social behavior analysis. Different from previous work that uses independent local features, our method explores the relationships between the current behavior state of a subject and its actions. An interaction energy potential function is proposed to represent the current behavior state of a subject, and velocity is used as its actions. Our method does not depend on human detection or segmentation, so it is robust to detection errors. Instead, tracked spatio-temporal interest points are able to provide a good estimation for modeling group interaction. An SVM is used to find abnormal events. We evaluate our algorithm on two datasets: UMN and BEHAVE. Experimental results show its promising performance against the state-of-the-art methods.
The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011; 01/2011
ABSTRACT: In this paper, we present a new framework for identifying medicine bottles using a combination of a video camera and Radio Frequency Identification (RFID) sensors for applications of monitoring the elderly's activities of daily living (ADLs) at home. RFID tags are attached to medicine bottles and are first detected by RFID readers. However, the readers can only detect RFID tags within a certain range of the antenna. Once a medicine bottle is moved out of the range of the RFID antenna, a camera is activated to continue detecting and tracking the medicine bottle for further action analysis, based on moving object detection and a color model of the medicine bottle. The experimental results demonstrate 100% detection accuracy for identifying medicine bottles.
2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), Atlanta, GA, USA, November 12-15, 2011; 01/2011
ABSTRACT: An expression can be approximated by a sequence of temporal segments called neutral, onset, offset and apex. However, it is not easy to accurately detect such temporal segments based only on facial features. Some researchers try to temporally segment expression phases with the help of body gesture analysis. The problem with this approach is that the expression temporal phases from the face and gesture channels are not synchronized. Additionally, most previous work adopted facial key point tracking or body tracking to extract motion information, which is unreliable in practice due to illumination variations and occlusions. In this paper, we present a novel algorithm to overcome the above issues, in which two simple and robust features are designed to describe face and gesture information, i.e., the motion area and neutral divergence features. Neither feature depends on motion tracking, and both are easy to compute. Moreover, different from previous work, we integrate face and body gesture together in modeling the temporal dynamics through a single channel of sensorial source, which avoids the synchronization issue between the face and gesture channels. Extensive experimental results demonstrate the effectiveness of the proposed algorithm.
Ninth IEEE International Conference on Automatic Face and Gesture Recognition (FG 2011), Santa Barbara, CA, USA, 21-25 March 2011; 01/2011
ABSTRACT: Most previous works treated image retrieval as a classification problem or a similarity measurement problem. In this paper, we propose a new idea for image retrieval, in which we regard image retrieval as a ranking issue by evaluating image content quality. Based on the content preference between images, image pairs are organized to build the data set for rank learning. Because image content is generally disclosed by image patches with meaningful objects, each image is regarded as one bag, and the regions inside are the corresponding instances. To save computation cost, the instances in an image are rectangular regions, and the integral histogram is applied to speed up histogram feature extraction. Because the feature dimension is high, we propose a boost-based multiple instance learning method for image retrieval. Based on different assumptions in the multiple instance setting, Mean, Max and TopK ranking models are developed with boost learning. Experiments on real-world images from Flickr, Pisca, and Google show the power of the proposed method.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28 - December 1, 2011; 01/2011
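The integral histogram mentioned above precomputes one integral image per bin, so the histogram of any rectangular region costs four table lookups per bin instead of a scan over the region's pixels. A minimal sketch (the bin count and toy image are illustrative):

```python
import numpy as np

def integral_histogram(img, n_bins):
    """Per-bin integral images: ih[b, i, j] is the count of bin-b pixels
    in img[:i, :j], so any rectangle's histogram is O(n_bins)."""
    h, w = img.shape
    ih = np.zeros((n_bins, h + 1, w + 1), dtype=np.int64)
    for b in range(n_bins):
        ih[b, 1:, 1:] = np.cumsum(np.cumsum(img == b, axis=0), axis=1)
    return ih

def region_hist(ih, r0, c0, r1, c1):
    """Histogram of img[r0:r1, c0:c1] via four lookups per bin."""
    return ih[:, r1, c1] - ih[:, r0, c1] - ih[:, r1, c0] + ih[:, r0, c0]

img = np.array([[0, 1],
                [1, 1]])
ih = integral_histogram(img, n_bins=2)
print(region_hist(ih, 0, 0, 2, 2))   # full image: [1 3]
```

This is what makes dense evaluation of many rectangular instances per image affordable.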
ABSTRACT: We present a framework for unsupervised image categorization in which images containing specific objects are taken as vertices in a hypergraph and the task of image clustering is formulated as the problem of hypergraph partition. First, a novel method is proposed to select the region of interest (ROI) of each image, and then hyperedges are constructed based on shape and appearance features extracted from the ROIs. Each vertex (image) and its k-nearest neighbors (based on shape or appearance descriptors) form two kinds of hyperedges. The weight of a hyperedge is computed as the sum of the pairwise affinities within the hyperedge. Through all of the hyperedges, not only the local grouping relationships among the images are described, but also the merits of the shape and appearance characteristics are integrated together to enhance the clustering performance. Finally, a generalized spectral clustering technique is used to solve the hypergraph partition problem. We compare the proposed method to several methods and its effectiveness is demonstrated by extensive experiments on three image databases.
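The construction described above — one hyperedge per vertex, formed from the vertex and its k-nearest neighbors and weighted by the sum of pairwise affinities inside the edge — can be sketched as follows. The Gaussian affinity and toy 2-D points are illustrative assumptions; the paper works with shape and appearance descriptors:

```python
import numpy as np

def knn_hyperedges(features, k=2, sigma=1.0):
    """Form one hyperedge per vertex (itself plus its k nearest
    neighbors) and weight it by the sum of pairwise affinities."""
    d = np.linalg.norm(features[:, None] - features[None, :], axis=2)
    affinity = np.exp(-(d ** 2) / (sigma ** 2))
    edges, weights = [], []
    for i in range(len(features)):
        members = np.argsort(d[i])[:k + 1]      # i itself plus its k-NN
        pair = affinity[np.ix_(members, members)]
        w = (pair.sum() - np.trace(pair)) / 2   # sum over distinct pairs
        edges.append(members)
        weights.append(w)
    return edges, np.array(weights)

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
edges, weights = knn_hyperedges(pts, k=1)
print(weights[0] > weights[2])   # the tight pair's edge outweighs the outlier's
```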
ABSTRACT: In this paper, we propose a new feature, the dynamic soft encoded pattern (DSEP), for facial event analysis. We first develop similarity features to describe complicated variations of facial appearance, which take the similarities between a Haar-like feature in a given image and the corresponding ones in reference images as the feature vector. The reference images are selected from the apex images of facial expressions, and k-means clustering is applied to the references. We further perform temporal clustering on the similarity features to produce several temporal patterns along the temporal domain, and then we map the similarity features into DSEPs to describe the dynamics of facial expressions, as well as to handle the issue of time resolution. Finally, a boosting-based classifier is designed based on DSEPs. Different from previous works, the proposed method makes no assumption about the time resolution. The effectiveness is demonstrated by extensive experiments on the Cohn–Kanade database.
Computer Vision and Image Understanding. 01/2011; 115:456-465.
ABSTRACT: In this paper, we propose a new transductive learning framework for image retrieval, in which images are taken as vertices in a weighted hypergraph and the task of image search is formulated as the problem of hypergraph ranking. Based on the similarity matrix computed from various feature descriptors, we take each image as a ‘centroid’ vertex and form a hyperedge by a centroid and its k-nearest neighbors. To further exploit the correlation information among images, we propose a soft hypergraph, which assigns each vertex v_i to a hyperedge e_j in a soft way. In the incidence structure of a soft hypergraph, we describe both the higher order grouping information and the affinity relationship between vertices within each hyperedge. After feedback images are provided, our retrieval system ranks image labels by a transductive inference approach, which tends to assign the same label to vertices that share many incidental hyperedges, with the constraints that predicted labels of feedback images should be similar to their initial labels. We further reduce the computation cost with the sampling strategy. We compare the proposed method to several other methods and its effectiveness is demonstrated by extensive experiments on Corel5K, the Scene dataset and Caltech 101.
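In a soft hypergraph, the incidence matrix H holds a membership degree H[i, j] for vertex i in hyperedge j instead of a 0/1 flag. The abstract does not specify the functional form of the soft assignment, so the Gaussian affinity to the centroid used below is purely an illustrative assumption:

```python
import numpy as np

def soft_incidence(features, k=2, sigma=1.0):
    """Soft incidence matrix H: H[i, j] is vertex i's degree of
    membership in hyperedge j (centroid j plus its k-NN), taken here
    as the Gaussian affinity to the centroid -- an assumed choice."""
    n = len(features)
    d = np.linalg.norm(features[:, None] - features[None, :], axis=2)
    affinity = np.exp(-(d ** 2) / (sigma ** 2))
    H = np.zeros((n, n))
    for j in range(n):                        # one hyperedge per centroid
        members = np.argsort(d[j])[:k + 1]
        H[members, j] = affinity[members, j]  # soft membership degrees
    return H

pts = np.array([[0.0], [0.2], [3.0]])
H = soft_incidence(pts, k=1)
print(H[0, 0] == 1.0, H[2, 0] == 0.0)   # centroid fully in its own edge
```

Non-member vertices get zero, members get a graded weight, which is the "soft way" the incidence structure encodes affinity within each hyperedge.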
ABSTRACT: Most previous work focuses on how to learn discriminating appearance features over the whole face without considering the fact that each facial expression is physically composed of some related action units (AUs). However, the definition of an AU is an ambiguous semantic description in the Facial Action Coding System (FACS), which makes accurate AU detection very difficult. In this paper, we adopt a compromise scheme to avoid AU detection, and instead interpret facial expressions by learning compositional appearance features around AU areas. We first divide the face image into local patches according to the locations of AUs, and then extract local appearance features from each patch. A minimum-error-based optimization strategy is adopted to build compositional features from the local appearance features, and this process is embedded into a boosting learning structure. Experiments on the Cohn-Kanade database show that the proposed method achieves promising performance and the built compositional features are basically consistent with FACS.
The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010; 01/2010
ABSTRACT: This paper presents a new geometric active contour (GAC) model based on the Lennard-Jones (L-J) force field, which is inspired by the theory of intermolecular interaction. Different from conventional gradient-based GAC models, the proposed model does not rely on any pre-computed edge map. We take each pixel of the image as a particle, and design an L-J force field for the GAC model according to the interaction between pixels. We introduce a distance regularization parameter to make the force tunable, and define an energy function for the L-J force field to integrate various image features efficiently. A switch parameter c generates two different characteristics for the L-J force field: in the case of c=0, the force vectors flow bi-directionally and converge to boundaries, while in the case of c≠0 a morphological effect is formed. To quantitatively evaluate the proposed force field, we present two criteria, directional uniformity and average amplitude, and compare it with the popular GVF. In addition, we also prove that the so-called electrostatic model is actually a special case of the proposed model.
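For reference, the classical Lennard-Jones force law that inspires the model has a short-range repulsive term and a longer-range attractive term, with equilibrium at r = 2^(1/6)·σ. The sketch below shows only this physics formula, not the paper's image force field, which additionally integrates image features and the distance regularization parameter:

```python
# Lennard-Jones potential: V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)
# Force magnitude F(r) = -dV/dr; positive = repulsive, negative = attractive.
def lj_force(r, epsilon=1.0, sigma=1.0):
    return 4 * epsilon * (12 * sigma**12 / r**13 - 6 * sigma**6 / r**7)

r_eq = 2 ** (1 / 6)               # equilibrium distance: force vanishes
print(lj_force(0.9) > 0)          # closer than r_eq: repulsion
print(abs(lj_force(r_eq)) < 1e-9) # at r_eq: zero force
print(lj_force(2.0) < 0)          # farther than r_eq: attraction
```

The repulsion/attraction crossover is what lets pixel "particles" settle onto object boundaries rather than collapse or drift apart.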
ABSTRACT: In this paper, we propose a new transductive learning framework for image retrieval, in which images are taken as vertices in a weighted hypergraph and the task of image search is formulated as the problem of hypergraph ranking. Based on the similarity matrix computed from various feature descriptors, we take each image as a 'centroid' vertex and form a hyperedge by a centroid and its k-nearest neighbors. To further exploit the correlation information among images, we propose a probabilistic hypergraph, which assigns each vertex v_i to a hyperedge e_j in a probabilistic way. In the incidence structure of a probabilistic hypergraph, we describe both the higher order grouping information and the affinity relationship between vertices within each hyperedge. After feedback images are provided, our retrieval system ranks image labels by a transductive inference approach, which tends to assign the same label to vertices that share many incidental hyperedges, with the constraints that predicted labels of feedback images should be similar to their initial labels. We compare the proposed method to several other methods and its effectiveness is demonstrated by extensive experiments on Corel5K, the Scene dataset and Caltech 101.
The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010; 01/2010
ABSTRACT: Conventional whole-heart CAC quantification has been demonstrated to be insufficient for predicting coronary events, especially for accurately predicting near-term coronary events in high-risk adults. In this paper, we propose a lesion-specific CAC quantification framework to improve CAC's near-term predictive value in intermediate- to high-risk populations with a novel multiple instance support vector machine (MISVM) approach. Our method works on data sets acquired with clinical imaging protocols on conventional CT scanners, without modifying the CT hardware or updating the imaging protocol. The calcific lesions are quantified by geometric information, density, and some clinical measurements. A MISVM model is built to predict cardiac events and, moreover, to give better insight into the characteristics of vulnerable or culprit lesions in CAC. Experimental results on 31 patients showed significant improvement of the predictive value over conventional methods in ROC analysis, net reclassification improvement evaluation, and leave-one-out validation.
Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention. 01/2010; 13(Pt 1):484-92.
ABSTRACT: This paper presents a novel multi-view learning framework based on variational inference. We formulate the framework as a graph representation in the form of graph factorization: the graph comprises factor graphs, which are used to describe the internal states of the views. Each view is modeled with a Gaussian mixture model. The proposed framework has three main advantages: (1) fewer constraints assumed on the data, (2) effective utilization of unlabeled data, and (3) automatic data structure inference: a proper data structure can be inferred in only one round. Experiments on image segmentation demonstrate its effectiveness.
Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on; 08/2009
ABSTRACT: Currently, the bag of visual words (BOW) representation has been widely applied in object categorization. However, the BOW representation ignores the dependency relationships among visual words, which could provide informative knowledge for understanding an image. In this paper, we first design a simple method to discover this dependency by computing the spatial correlation between visual words in overlapped local patches. Having obtained the dependency relationships, we further propose a novel update strategy to modify the BOW representation. The modification is motivated by the idea of query expansion, applied successfully in text retrieval. We implement our approach on the challenging PASCAL 2006 database, and the experimental results show its improved performance over the BOW representation.
Proceedings of the International Conference on Image Processing, ICIP 2009, 7-10 November 2009, Cairo, Egypt; 01/2009
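Computing the spatial correlation between visual words in overlapped local patches could be sketched as below. The patch size, stride, and pairwise counting scheme are illustrative assumptions, as the abstract does not fix them:

```python
from collections import Counter
from itertools import combinations

import numpy as np

def word_cooccurrence(word_map, patch, step):
    """Count how often pairs of visual words co-occur inside a sliding
    (overlapped) local patch over a 2-D map of word labels."""
    h, w = word_map.shape
    counts = Counter()
    for r in range(0, h - patch + 1, step):
        for c in range(0, w - patch + 1, step):
            words = set(word_map[r:r + patch, c:c + patch].ravel())
            for a, b in combinations(sorted(words), 2):
                counts[(a, b)] += 1          # unordered word pair
    return counts

word_map = np.array([[0, 0, 1],
                     [0, 1, 1],
                     [2, 2, 2]])
counts = word_cooccurrence(word_map, patch=2, step=1)
print(counts[(0, 1)])   # words 0 and 1 co-occur in 3 of the 4 patches
```

High pairwise counts identify dependent word pairs, which the paper's update strategy then uses to re-weight the BOW histogram in the spirit of query expansion.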