Qingshan Liu

Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China

Publications (121) · 46.98 Total Impact

  • ABSTRACT: Online multiple-output regression is an important machine learning technique for modeling, predicting, and compressing multi-dimensional correlated data streams. In this paper, we propose a novel online multiple-output regression method, called MORES, for streaming data. MORES can dynamically learn the structure of the regression coefficients to facilitate the model's continuous refinement. We observe that the limited expressive ability of the regression model, especially in the preliminary stage of online updating, often leads to dependencies among the variables of the residual errors. In light of this, MORES dynamically learns and leverages the structure of the residual errors to improve prediction accuracy. Moreover, we define three statistical variables that exactly represent all the samples seen so far, so that the prediction loss can be calculated incrementally in each online update round; this avoids loading all the training data into memory when updating the model, and also effectively prevents drastic fluctuation of the model in the presence of noise. Furthermore, we introduce a forgetting factor that assigns different weights to samples, so that the evolving characteristics of the data streams can be tracked quickly from the latest samples. Experiments on three real-world datasets validate the effectiveness and efficiency of the proposed method.
    12/2014;
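The abstract does not reproduce MORES's exact update rules, but the general idea it describes — summary statistics in place of stored samples, plus a forgetting factor that down-weights old data — corresponds to a standard recursive scheme. A minimal sketch (a textbook forgetting-factor least-squares variant, not the authors' MORES; all names are illustrative) might look like:

```python
import numpy as np

class ForgettingRLS:
    """Online multiple-output least squares with a forgetting factor.

    Only summary statistics are kept (A = weighted sum of x x^T,
    B = weighted sum of x y^T), so no past samples are stored.
    """
    def __init__(self, d_in, d_out, lam=0.95, ridge=1e-3):
        self.lam = lam                    # forgetting factor in (0, 1]
        self.A = ridge * np.eye(d_in)     # weighted input covariance
        self.B = np.zeros((d_in, d_out))  # weighted input-output cross term
        self.W = np.zeros((d_in, d_out))  # regression coefficients

    def update(self, x, y):
        x = np.asarray(x, float).reshape(-1, 1)
        y = np.asarray(y, float).reshape(1, -1)
        # Exponentially down-weight all past samples, then add the new one.
        self.A = self.lam * self.A + x @ x.T
        self.B = self.lam * self.B + x @ y
        self.W = np.linalg.solve(self.A, self.B)
        return self.W

    def predict(self, x):
        return np.asarray(x, float) @ self.W
```

With `lam < 1`, recent samples dominate the statistics, so the model tracks drifting streams; `lam = 1` recovers ordinary cumulative least squares.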
  • ABSTRACT: In this paper, we propose a new max-margin-based discriminative feature learning method. Specifically, we aim to learn a low-dimensional feature representation that maximizes the global margin of the data and makes samples from the same class as close as possible. To enhance robustness to noise, an $\ell_{2,1}$-norm constraint is introduced to impose group sparsity on the transformation matrix. In addition, for multi-class classification tasks, we further learn and leverage the correlations among the class-specific tasks to assist in learning discriminative features. The experimental results demonstrate the power of the proposed method against related state-of-the-art methods.
    12/2014;
  • ABSTRACT: Recently, distance metric learning (DML) has attracted much attention in image retrieval, but most previous methods only work for image classification and clustering tasks. In this brief, we focus on designing ordinal DML algorithms for image ranking tasks, by which the rank levels among the images can be well measured. We first present a linear ordinal Mahalanobis DML model that tries to preserve both the local geometry information and the ordinal relationships of the data. Then, we develop a nonlinear DML method by kernelizing the above model, considering that real-world image data often have nonlinear structures. To further improve ranking performance, we finally derive a multiple-kernel DML approach, inspired by multiple-kernel learning, that applies different kernel operators to different kinds of image features. Extensive experiments on four benchmarks demonstrate the power of the proposed algorithms against related state-of-the-art methods.
IEEE Transactions on Neural Networks and Learning Systems 08/2014; · 4.37 Impact Factor
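As background for all three variants above: a Mahalanobis metric is parameterized by a PSD matrix, conveniently written M = LᵀL, so that learning the metric amounts to learning a linear embedding L. This is only the shared building block, not the paper's ordinal learning objective:

```python
import numpy as np

def mahalanobis(x, y, L):
    """Distance under the metric M = L^T L (PSD by construction):
    d(x, y) = sqrt((x - y)^T M (x - y)) = ||L (x - y)||_2.
    Learning L yields an embedding in which plain Euclidean
    distance reflects the desired (here: ordinal) relationships.
    """
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(np.linalg.norm(L @ diff))
```

Kernelizing replaces x with a feature-space mapping phi(x), which handles the nonlinear image structures the abstract mentions.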
  • ABSTRACT: In this paper, we present a new framework for monitoring medication intake by elderly individuals, incorporating a video camera and Radio Frequency Identification (RFID) sensors. The proposed framework can provide a key function for monitoring the activities of daily living (ADLs) of elderly people in their own homes. In an assistive environment, RFID tags are applied to medicine bottles located in a medicine cabinet so that each bottle has a unique ID, and a description of the medicine for each tag is manually entered into a database. RFID readers detect when any of these bottles is taken from the medicine cabinet and identify the tag attached to it. A video camera continuously monitors the activity of taking medicine by integrating face detection and tracking, mouth detection, background subtraction, and activity detection. The preliminary results demonstrate 100% detection accuracy in identifying medicine bottles and promising results in monitoring the activity of taking medicine.
Network Modeling and Analysis in Health Informatics and Bioinformatics 07/2013; 2(2):61-70.
  • ABSTRACT: This paper proposes a novel feature extraction algorithm specifically designed for learning to rank in image ranking. Different from previous work, the proposed method not only preserves the local manifold structure of the data, but also keeps the ordinal information among different data blocks in the low-dimensional subspace, where a ranking model can be learned effectively and efficiently. We first define the ideal directions for preserving local manifold structure and ordinal information, respectively. Based on these two definitions, a unified model is built to leverage both kinds of information, formulated as an optimization problem. The experiments are conducted on two publicly available data sets, the MSRA-MM image data set and the “Web Queries” image data set, and the experimental results demonstrate the power of the proposed method against state-of-the-art methods.
    Signal Processing 06/2013; 93(6):1651–1661. · 2.24 Impact Factor
  • Qingshan Liu, Yueting Zhuang
    ABSTRACT: Given its importance, the problem of classification in imbalanced data has attracted great attention in recent years. However, few efforts have been made to develop feature selection techniques for the classification of imbalanced data. This paper thus ...
    Neurocomputing 04/2013; 105:1–2. · 2.01 Impact Factor
  • ABSTRACT: Recently, recognizing affect from both face and body gestures has attracted more attention. However, efficient and effective features for describing the dynamics of face and gestures in real-time automatic affect recognition are still lacking. In this paper, we combine local motion and appearance features in a novel framework to model the temporal dynamics of face and body gesture. The proposed framework employs MHI-HOG and Image-HOG features, through temporal normalization or a bag of words, to capture motion and appearance information. MHI-HOG stands for the Histogram of Oriented Gradients (HOG) computed on the Motion History Image (MHI); it captures the motion direction and speed of a region of interest as an expression evolves over time. Image-HOG captures the appearance information of the corresponding region of interest. The temporal normalization method explicitly solves the time-resolution issue in video-based affect recognition. To implicitly model the local temporal dynamics of an expression, we further propose a bag-of-words (BOW) representation for both MHI-HOG and Image-HOG features. Experimental results demonstrate promising performance compared with the state-of-the-art. Significant improvement in recognition accuracy is achieved compared with a frame-based approach that does not consider the underlying temporal dynamics.
    Image and Vision Computing 02/2013; 31(2):175-185. · 1.58 Impact Factor
  • ABSTRACT: Motion saliency detection aims at finding the semantic regions in a video sequence, and it is an important pre-processing step in many vision applications. In this paper, we propose a new algorithm, Temporal Spectral Residual, for fast motion saliency detection. Different from conventional motion saliency detection algorithms that use complex mathematical models, our goal is to find a good tradeoff between computational efficiency and accuracy. The basic observation is that on a cross section along the temporal axis of a video sequence, the regions of moving objects contain distinct signals while the background area contains redundant information. Our focus in this paper is thus to extract the salient information on the cross section by utilizing the off-the-shelf Spectral Residual method, a 2D image saliency detector. A majority voting strategy is also introduced to generate reliable results. Since the proposed method only involves Fourier spectrum analysis, it is computationally efficient. We validate our algorithm on two applications: background subtraction in outdoor video sequences with dynamic backgrounds, and left ventricle endocardium segmentation in MR sequences. Compared with state-of-the-art algorithms, our algorithm achieves both good accuracy and fast computation, which makes it well suited as a pre-processing method.
    Neurocomputing 06/2012; 86:24–32. · 2.01 Impact Factor
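The off-the-shelf operator the abstract builds on, Spectral Residual (Hou & Zhang's 2D image saliency method), can be sketched on a single 2-D array as follows; the temporal cross-section slicing and majority voting of the proposed method are omitted, and the box-filter size is an illustrative choice:

```python
import numpy as np

def spectral_residual_saliency(img, avg_k=3):
    """Spectral-residual saliency on a 2-D array.

    The log-amplitude spectrum minus its local average is the
    'spectral residual'; transforming it back to the spatial domain
    highlights parts of the signal that deviate from the redundant
    background.
    """
    img = np.asarray(img, float)
    F = np.fft.fft2(img)
    log_amp = np.log(np.abs(F) + 1e-12)
    phase = np.angle(F)
    # Local average of the log spectrum via a box filter (wrap-around).
    pad = avg_k // 2
    padded = np.pad(log_amp, pad, mode="wrap")
    avg = np.zeros_like(log_amp)
    for i in range(log_amp.shape[0]):
        for j in range(log_amp.shape[1]):
            avg[i, j] = padded[i:i + avg_k, j:j + avg_k].mean()
    residual = log_amp - avg
    # Back to the spatial domain; squared magnitude is the saliency map.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()
```

In the proposed method this operator is applied not to video frames but to X-T and Y-T cross sections, so a few FFTs per slice suffice, which is the source of the claimed efficiency.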
  • ABSTRACT: The nine papers in this special section on object and event classification in large-scale video collections can be categorized into four themes: video indexing, concept detection, video summarization, and event recognition.
    IEEE Transactions on Multimedia 02/2012; 14:1-2. · 1.78 Impact Factor
  • ABSTRACT: Learning to rank has been demonstrated as a powerful tool for image ranking, but the "curse of dimensionality" is a key challenge in learning a ranking model from a large image database. This paper proposes a novel dimensionality reduction algorithm named ordinal preserving projection (OPP) for learning to rank. We first define two matrices, which work in the row direction and column direction respectively, aiming to leverage the global structure of the data set and the ordinal information of the observations. By maximizing the corresponding objective functions, we obtain two optimal projection matrices that map the original data points into a low-dimensional subspace in which both the global structure and the ordinal information are preserved. The experiments are conducted on the publicly available MSRA-MM image data set and "Web Queries" image data set, and the experimental results demonstrate the effectiveness of the proposed method.
    01/2012;
  • ABSTRACT: In this paper, we present a new method for facial age estimation based on ordinal discriminative feature learning. Considering the temporally ordinal and continuous nature of the aging process, the proposed method not only aims at preserving the local manifold structure of facial images, but also keeps the ordinal information among aging faces. Moreover, we try to remove redundant information from both the locality information and the ordinal information as much as possible by minimizing nonlinear correlation and rank correlation. Finally, we formulate these two issues as a unified feature selection optimization problem and present an efficient solution. The experiments are conducted on the publicly available Images of Groups dataset and the FG-NET dataset, and the experimental results demonstrate the power of the proposed method against state-of-the-art methods.
    Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on; 01/2012
  • ABSTRACT: This paper proposes a novel regression method based on distance metric learning for human age estimation. We treat age estimation as a distance-based ordinal regression problem, in which the facial aging trend can be discovered by a learned distance metric. The learned metric is designed to simultaneously preserve both the ordinal information of different age groups and the local geometric structure of the target neighborhoods, so that the facial aging trend can be truly discovered. Experimental results on the publicly available FG-NET database are very competitive against state-of-the-art methods.
    Pattern Recognition (ICPR), 2012 21st International Conference on; 01/2012
  • ABSTRACT: In this paper, we present a new idea for analyzing facial expression by exploring common and specific information among different expressions. Inspired by the observation that only a few facial parts are active in expression disclosure (e.g., around the mouth and eyes), we try to discover the common patches that are important for discriminating all the expressions and the specific patches that are important for a particular expression. A two-stage multi-task sparse learning (MTSL) framework is proposed to efficiently locate these discriminative patches. In the first stage, expression recognition tasks, each of which aims to find the dominant patches for one expression, are combined to locate the common patches. In the second stage, two related tasks, facial expression recognition and face verification, are coupled to learn the specific facial patches for each individual expression. Extensive experiments validate the existence and significance of the common and specific patches. Utilizing these learned patches, we achieve superior expression recognition performance compared to the state-of-the-art.
    Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on; 01/2012
  • ABSTRACT: In this paper, we propose a new transductive learning framework for image retrieval, in which images are taken as vertices in a weighted hypergraph and the task of image search is formulated as a hypergraph ranking problem. Based on the similarity matrix computed from various feature descriptors, we take each image as a ‘centroid’ vertex and form a hyperedge from the centroid and its k-nearest neighbors. To further exploit the correlation information among images, we propose a soft hypergraph, which assigns each vertex v_i to a hyperedge e_j in a soft way. The incidence structure of a soft hypergraph describes both the higher-order grouping information and the affinity relationships between vertices within each hyperedge. After feedback images are provided, our retrieval system ranks image labels by a transductive inference approach, which tends to assign the same label to vertices that share many incident hyperedges, under the constraint that the predicted labels of the feedback images should be similar to their initial labels. We further reduce the computational cost with a sampling strategy. We compare the proposed method to several other methods, and its effectiveness is demonstrated by extensive experiments on Corel5K, the Scene dataset, and Caltech 101.
    Pattern Recognition 10/2011; · 2.58 Impact Factor
  • ABSTRACT: Recently, recognizing affect from both face and body gestures has attracted more attention. However, efficient and effective features for describing the dynamics of face and gestures in real-time automatic affect recognition are still lacking. In this paper, we propose a novel approach that combines MHI-HOG and Image-HOG through a temporal normalization method to describe the dynamics of face and body gestures for affect recognition. MHI-HOG stands for the Histogram of Oriented Gradients (HOG) computed on the Motion History Image (MHI); it captures the motion direction of an interest point as an expression evolves over time. Image-HOG captures the appearance information of the corresponding interest point. The combination of MHI-HOG and Image-HOG can effectively represent both the local motion and the appearance information of face and body gestures for affect recognition. The temporal normalization method explicitly solves the time-resolution issue in video-based affect recognition. Experimental results demonstrate promising performance compared with the state of the art. We also show that expression recognition with temporal dynamics outperforms frame-based recognition.
    Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on; 07/2011
  • ABSTRACT: We present a framework for unsupervised image categorization in which images containing specific objects are taken as vertices in a hypergraph and the task of image clustering is formulated as a hypergraph partition problem. First, a novel method is proposed to select the region of interest (ROI) of each image, and then hyperedges are constructed based on shape and appearance features extracted from the ROIs. Each vertex (image) and its k-nearest neighbors (based on shape or appearance descriptors) form two kinds of hyperedges. The weight of a hyperedge is computed as the sum of the pairwise affinities within the hyperedge. Through the hyperedges, not only are the local grouping relationships among the images described, but the merits of the shape and appearance characteristics are also integrated to enhance clustering performance. Finally, a generalized spectral clustering technique is used to solve the hypergraph partition problem. We compare the proposed method to several other methods, and its effectiveness is demonstrated by extensive experiments on three image databases.
    IEEE Transactions on Software Engineering 06/2011; 33(6):1266-73. · 2.29 Impact Factor
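The hyperedge construction described above — each ‘centroid’ vertex plus its k-nearest neighbors, weighted by the sum of pairwise affinities among the members — can be sketched as follows. The feature extraction and spectral partition steps are omitted, and the Gaussian affinity is an illustrative choice:

```python
import math
from itertools import combinations

def gaussian_affinity(a, b, sigma=1.0):
    """Pairwise affinity exp(-||a - b||^2 / sigma^2)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / sigma ** 2)

def build_hyperedges(points, k=2, sigma=1.0):
    """Form one hyperedge per 'centroid' vertex: the vertex plus its
    k nearest neighbors. The hyperedge weight is the sum of pairwise
    affinities among its members."""
    n = len(points)
    edges = []
    for i in range(n):
        # k nearest neighbors of vertex i (by squared distance, excluding i).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sum((a - b) ** 2
                                          for a, b in zip(points[i], points[j])))
        members = [i] + others[:k]
        weight = sum(gaussian_affinity(points[u], points[v], sigma)
                     for u, v in combinations(members, 2))
        edges.append((frozenset(members), weight))
    return edges
```

Hyperedges whose members are mutually close receive large weights, so cuts in the subsequent partition step prefer to separate loosely affiliated groups.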
  • ABSTRACT: Myocardial strain is a critical indicator of many cardiac diseases and dysfunctions. The goal of this paper is to extract and use the myocardial strain pattern from tagged magnetic resonance imaging (MRI) to identify and localize regional abnormal cardiac function in human subjects. In order to extract the myocardial strains from the tagged images, we developed a novel nontracking-based strain estimation method for tagged MRI. This method is based on the direct extraction of tag deformation, and therefore avoids some limitations of conventional displacement or tracking-based strain estimators. Based on the extracted spatio-temporal strain patterns, we have also developed a novel tensor-based classification framework that better conserves the spatio-temporal structure of the myocardial strain pattern than conventional vector-based classification algorithms. In addition, the tensor-based projection function keeps more of the information of the original feature space, so that abnormal tensors in the subspace can be back-projected to reveal the regional cardiac abnormality in a more physically meaningful way. We have tested our novel methods on 41 human image sequences, and achieved a classification rate of 87.80%. The regional abnormalities recovered from our algorithm agree well with the patients' pathology and clinical image interpretation, and provide a promising avenue for regional cardiac function analysis.
IEEE Transactions on Medical Imaging 05/2011; 30(12):2017-29.
  • ABSTRACT: In this paper, we propose a new feature, the dynamic soft encoded pattern (DSEP), for facial event analysis. We first develop similarity features to describe complicated variations of facial appearance, taking the similarities between a Haar-like feature in a given image and the corresponding ones in reference images as the feature vector. The reference images are selected from the apex images of facial expressions, and k-means clustering is applied to the references. We further perform temporal clustering on the similarity features to produce several temporal patterns along the temporal domain, and then map the similarity features into DSEPs to describe the dynamics of facial expressions, as well as to handle the issue of time resolution. Finally, a boosting-based classifier is designed on top of the DSEPs. Different from previous work, the proposed method makes no assumption about the time resolution. The effectiveness is demonstrated by extensive experiments on the Cohn–Kanade database.
    Computer Vision and Image Understanding 03/2011; 115:456-465. · 1.36 Impact Factor
  • ABSTRACT: This paper presents a component-based deformable model for generalized face alignment, in which a novel bistage statistical model is proposed to account for both local and global shape characteristics. Instead of using statistical analysis on the entire shape, we build separate Gaussian models for shape components to preserve more detailed local shape deformations. In each model of components, a Markov network is integrated to provide simple geometry constraints for our search strategy. In order to make a better description of the nonlinear interrelationships over shape components, the Gaussian process latent variable model is adopted to obtain enough control of shape variations. In addition, we adopt an illumination-robust feature to lead the local fitting of every shape point when light conditions change dramatically. To further boost the accuracy and efficiency of our component-based algorithm, an efficient subwindow search technique is adopted to detect components and to provide better initializations for shape components. Based on this approach, our system can generate accurate shape alignment results not only for images with exaggerated expressions and slight shading variation but also for images with occlusion and heavy shadows, which are rarely reported in previous work.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 02/2011; 41(1):287-98. · 3.01 Impact Factor
  • ABSTRACT: A new method is proposed to detect abnormal behaviors in human group activities. The approach effectively models group activities based on social behavior analysis. Different from previous work that uses independent local features, our method explores the relationship between a subject's current behavior state and its actions. An interaction energy potential function is proposed to represent the current behavior state of a subject, and velocity is used as its action. Our method does not depend on human detection or segmentation, so it is robust to detection errors; instead, tracked spatio-temporal interest points provide a good basis for modeling group interaction. An SVM is used to find abnormal events. We evaluate our algorithm on two datasets, UMN and BEHAVE, and experimental results show promising performance against state-of-the-art methods.
    The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011; 01/2011

Publication Stats

1k Citations
46.98 Total Impact Points

Institutions

  • 2011–2013
    • Nanjing University of Information Science & Technology
Nanjing, Jiangsu, China
    • City College of New York
      • Department of Electrical Engineering
      New York City, NY, United States
  • 2007–2011
    • Rutgers, The State University of New Jersey
• Department of Computer Science
      • Center for Computational Biomedicine Imaging and Modeling (CBIM)
      New Brunswick, NJ, United States
  • 2002–2010
    • Chinese Academy of Sciences
• Institute of Automation
      • National Pattern Recognition Laboratory
      Beijing, China
  • 2003–2009
    • Northeast Institute of Geography and Agroecology
• National Pattern Recognition Laboratory
      • Institute of Automation
      Beijing, China
  • 2006
    • Ritsumeikan University
      • College of Information Science and Engineering
Kyoto, Japan
  • 2005
    • The Chinese University of Hong Kong
      • Department of Information Engineering
      Hong Kong, Hong Kong