Qingshan Liu

Nanjing University of Science and Technology, Nanjing, Jiangsu, China

Publications (130) · 67.99 Total Impact Points

  • Qingshan Liu · Jing Yang · Kaihua Zhang · Yi Wu
    ABSTRACT: Recently, the compressive tracking (CT) method has attracted much attention due to its high efficiency, but it cannot handle large-scale target appearance variations well, because its data-independent random projection matrix yields less discriminative features. To address this issue, in this paper we propose an adaptive CT approach, which selects the most discriminative features to design an effective appearance model. Our method improves CT in three respects. First, the most discriminative features are selected via an online vector boosting method. Second, the object representation is updated in an effective online manner, which preserves the stable features while filtering out the noisy ones. Finally, a simple and effective trajectory rectification approach is adopted to make the estimated location more accurate. Extensive experiments on the CVPR2013 tracking benchmark demonstrate the superior performance of our algorithm compared with state-of-the-art tracking algorithms.
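As a rough illustration of the two ingredients above, the sketch below pairs a CT-style data-independent sparse random projection with a per-feature discriminability score used to keep only the most informative compressed features. The Fisher-style score is a stand-in for the paper's online vector boosting, and all sizes and sample matrices are placeholders.

```python
import numpy as np

def sparse_random_projection(n_features, n_raw, s=3, rng=None):
    """CT-style data-independent projection: entries in {-1, 0, +1} with
    P(+-1) = 1/(2s), so most entries are zero (sparse random projection)."""
    rng = rng or np.random.default_rng(0)
    R = rng.choice([-1.0, 0.0, 1.0], size=(n_features, n_raw),
                   p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
    return R * np.sqrt(s)

def fisher_scores(pos, neg):
    """Per-feature discriminability: squared mean gap over pooled variance.
    An illustrative stand-in for the paper's online vector boosting."""
    return (pos.mean(0) - neg.mean(0)) ** 2 / (pos.var(0) + neg.var(0) + 1e-9)

# Usage sketch with placeholder samples: project raw Haar-like features,
# then keep the 50 most discriminative compressed features.
R = sparse_random_projection(n_features=200, n_raw=10_000)
pos = np.random.randn(50, 10_000) @ R.T    # positives around the target
neg = np.random.randn(200, 10_000) @ R.T   # negatives from the background
keep = np.argsort(fisher_scores(pos, neg))[-50:]
```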
  • ABSTRACT: This paper focuses on the problem of simultaneous sample and feature selection for machine learning in a fully unsupervised setting. Though most existing works tackle these two problems separately, giving rise to the two well-studied sub-areas of active learning and feature selection, a unified approach is desirable since the two problems are often interleaved: noisy and high-dimensional features adversely affect sample selection, while 'good' samples benefit feature selection. We present a unified framework that conducts active learning and feature selection simultaneously. From the data reconstruction perspective, both the selected samples and the selected features should best approximate the original dataset, so that the selected samples characterized by the selected features are highly representative. Additionally, our method is one-shot, without iteratively selecting samples for progressive labeling; it is thus especially suitable when the initial labeled samples are scarce or absent altogether, a setting existing works hardly address, particularly for simultaneous feature selection. To alleviate the NP-hardness of the raw problem, we relax it to a convex but non-smooth optimization problem, which we solve efficiently by an iterative algorithm with proven global convergence. Experiments on publicly available datasets validate that our method is promising compared with the state-of-the-art.
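The data-reconstruction view of sample selection can be made concrete with a simple greedy surrogate: repeatedly pick the sample whose addition best lets the selected subset reconstruct the whole dataset in the least-squares sense. This is only a sketch of the criterion; the paper instead solves a joint convex relaxation covering both samples and features.

```python
import numpy as np

def greedy_representative_samples(X, k):
    """Greedily pick k rows of X whose span best reconstructs all of X.
    A brute-force surrogate for the paper's convex joint formulation."""
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(X.shape[0]):
            if j in selected:
                continue
            S = X[selected + [j]]                          # candidate subset
            W, *_ = np.linalg.lstsq(S.T, X.T, rcond=None)  # X ~= W.T @ S
            err = np.linalg.norm(X - W.T @ S)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected
```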
  • Huihui Song · Bo Huang · Qingshan Liu · Kaihua Zhang
    ABSTRACT: To take advantage of the wide swath width of Landsat Thematic Mapper (TM)/Enhanced Thematic Mapper Plus (ETM+) images and the high spatial resolution of Système Pour l'Observation de la Terre 5 (SPOT5) images, we present a learning-based super-resolution method to fuse these two data types. The fused images are expected to be characterized by the swath width of TM/ETM+ images and the spatial resolution of SPOT5 images. To this end, we first model the imaging process from a SPOT image to a TM/ETM+ image at their corresponding bands, by building an image degradation model via blurring and downsampling operations. With this degradation model, we can generate a simulated Landsat image from each SPOT5 image, thereby avoiding the requirement for geometric coregistration of the two input images. Then, band by band, image fusion can be implemented in two stages: 1) learning a dictionary pair representing the high- and low-resolution details from the given SPOT5 and the simulated TM/ETM+ images; 2) super-resolving the input Landsat images based on the dictionary pair and a sparse coding algorithm. It is noteworthy that the proposed method can also deal with the conventional spatial and spectral fusion of TM/ETM+ and SPOT5 images by using the learned dictionary pairs. To examine the performance of the proposed method in fusing the swath width of TM/ETM+ and the spatial resolution of SPOT5, we illustrate the fusion results on actual TM images and compare them with several classic pansharpening methods, assuming that the corresponding SPOT5 panchromatic image exists. Furthermore, we conduct classification experiments on both the actual images and the fusion results to demonstrate the benefits of the proposed method for further classification applications.
    IEEE Transactions on Geoscience and Remote Sensing 03/2015; 53(3):1195-1204. DOI:10.1109/TGRS.2014.2335818 · 3.51 Impact Factor
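The degradation model described above (fine image, blurred and downsampled, yields the simulated coarse image) reduces to a few lines. The Gaussian blur width and the 3x downsampling factor below are illustrative assumptions, not the paper's calibrated sensor parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_spot_to_tm(spot_band, scale=3, blur_sigma=1.5):
    """Simulate a coarse TM/ETM+ band from a fine SPOT5 band by blurring
    (approximating the sensor point-spread function) and then decimating."""
    blurred = gaussian_filter(spot_band.astype(np.float64), sigma=blur_sigma)
    return blurred[::scale, ::scale]    # integer-factor downsampling
```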
  • Jiankang Deng · Yubao Sun · Qingshan Liu · Hanqing Lu
    ABSTRACT: Localizing facial landmarks is an essential prerequisite to facial image analysis. However, due to the large variability in expression, illumination, and pose, and the existence of occlusions in real-world face images, how to localize facial landmarks efficiently remains a challenging problem. In this paper, we present a low-rank driven regression model for robust facial landmark localization. Our approach consists of two steps: low-rank face frontalization and sparse shape constrained cascade regression. (1) Exploiting the low-rank prior of face images, we recover a low-rank face from its deformed image together with the associated deformation, despite significant distortion and corruption; aligning the recovered frontal face image is simpler and more effective. (2) Exploiting the sparse coding of face shape over a shape dictionary learned from training data, a sparse shape constrained cascade regression model is proposed to simultaneously suppress the ambiguity in local features and the outliers caused by occlusion; the sparse residual error deviating from the low-rank face texture is also utilized to predict the occluded area. Extensive results on several in-the-wild benchmarks such as COFW, LFPW and Helen demonstrate that the proposed method is robust to facial occlusions, pose variations and exaggerated facial expressions.
    Neurocomputing 03/2015; 151:196-206. DOI:10.1016/j.neucom.2014.09.052 · 2.01 Impact Factor
  • Kaihua Zhang · Qingshan Liu · Yi Wu · Ming-Hsuan Yang
    ABSTRACT: Deep networks have been successfully applied to visual tracking by learning a generic representation offline from numerous training images. However, offline training is time-consuming, and the learned generic representation may be less discriminative for tracking specific objects. In this paper we show that, even without offline learning, simple convolutional networks can be powerful enough to develop a robust representation for visual tracking. In the first frame, we randomly extract a set of normalized patches from the target region as filters, which define a set of feature maps in the subsequent frames. These maps measure similarities between each filter and the local intensity patterns across the target, thereby encoding its local structural information. Furthermore, all the maps together form a global representation that maintains the relative geometric positions of the local intensity patterns, so the inner geometric layout of the target is also well preserved. A simple and effective online strategy is adopted to update the representation, allowing it to robustly adapt to target appearance variations. Our convolutional networks have a surprisingly lightweight structure, yet perform favorably against several state-of-the-art methods on a large benchmark dataset with 50 challenging videos.
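The first-frame construction the abstract describes is easy to sketch: sample random patches from the target, normalize them, and use them as fixed correlation filters whose responses form the feature maps. Patch count and size below are arbitrary placeholders.

```python
import numpy as np
from scipy.signal import correlate2d

def patch_filters(target, k=10, p=6, rng=None):
    """Sample k p-by-p patches from the first-frame target region and
    normalize them (zero mean, unit norm) to serve as fixed filters."""
    rng = rng or np.random.default_rng(0)
    H, W = target.shape
    filters = []
    for _ in range(k):
        y, x = rng.integers(0, H - p), rng.integers(0, W - p)
        f = target[y:y + p, x:x + p].astype(np.float64)
        f -= f.mean()
        filters.append(f / (np.linalg.norm(f) + 1e-9))
    return filters

def feature_maps(region, filters):
    """Correlate a candidate region with every filter; stacking the maps
    preserves the relative geometry of the local intensity patterns."""
    return np.stack([correlate2d(region, f, mode='valid') for f in filters])
```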
  • Changsheng Li · Weishan Dong · Qingshan Liu · Xin Zhang
    ABSTRACT: Online multiple-output regression is an important machine learning technique for modeling, predicting, and compressing multi-dimensional correlated data streams. In this paper, we propose a novel online multiple-output regression method, called MORES, for streaming data. MORES can dynamically learn the structure of the regression coefficients to facilitate the model's continuous refinement. We observe that the limited expressive ability of the regression model, especially in the preliminary stage of online updating, often leads to dependencies among the variables of the residual errors. In light of this, MORES dynamically learns and leverages the structure of the residual errors to improve prediction accuracy. Moreover, we define three statistical variables that exactly represent all the seen samples, so the prediction loss can be calculated incrementally in each online update round; this avoids loading all the training data into memory for model updating and also effectively prevents drastic fluctuation of the model in the presence of noise. Furthermore, we introduce a forgetting factor that weights samples differently so as to quickly track the evolving characteristics of the data streams from the latest samples. Experiments on three real-world datasets validate the effectiveness and efficiency of the proposed method.
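For intuition about the online update and the forgetting factor, here is a textbook multi-output recursive least squares baseline; it maintains sufficient statistics instead of stored samples and down-weights old data by lam, but it is not MORES itself (no structure learning on coefficients or residuals).

```python
import numpy as np

class ForgettingRLS:
    """Multi-output recursive least squares with forgetting factor lam:
    a standard baseline for MORES-style incremental updating."""
    def __init__(self, n_in, n_out, lam=0.98, delta=100.0):
        self.W = np.zeros((n_in, n_out))   # regression coefficients
        self.P = delta * np.eye(n_in)      # inverse-covariance estimate
        self.lam = lam

    def update(self, x, y):
        """One online round with input x (n_in,) and target y (n_out,)."""
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)                # gain vector
        self.W += np.outer(k, y - x @ self.W)       # correct by residual
        self.P = (self.P - np.outer(k, Px)) / self.lam

    def predict(self, x):
        return x @ self.W
```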
  • Changsheng Li · Qingshan Liu · Weishan Dong · Xin Zhang · Lin Yang
    ABSTRACT: In this paper, we propose a new max-margin based discriminative feature learning method. Specifically, we aim at learning a low-dimensional feature representation that maximizes the global margin of the data and makes samples from the same class as close as possible. In order to enhance robustness to noise, an l2,1-norm constraint is introduced to enforce group sparsity on the transformation matrix. In addition, for multi-class classification tasks, we further learn and leverage the correlations among the class tasks to assist in learning discriminative features. The experimental results demonstrate the power of the proposed method against the related state-of-the-art methods.
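The l2,1-norm constraint mentioned above has a simple closed-form proximal step: shrink each row of the transformation matrix by its l2 norm, which zeroes out entire rows, i.e., discards whole input features. A minimal sketch:

```python
import numpy as np

def l21_norm(W):
    """Sum of l2 norms of the rows of W; small values mean few active
    rows, i.e., few selected input features."""
    return np.linalg.norm(W, axis=1).sum()

def l21_prox(W, t):
    """Proximal operator of t * ||.||_{2,1}: shrink each row's norm by t,
    zeroing rows whose norm falls below t (row-wise soft thresholding)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / (norms + 1e-12))
```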
  • Kaihua Zhang · Qingshan Liu · Huihui Song · Xuelong Li
    ABSTRACT: This paper presents a novel variational approach for simultaneous estimation of the bias field and segmentation of images with intensity inhomogeneity. We model the intensities of inhomogeneous objects as Gaussian distributed with different means and variances, and then introduce a sliding window to map the original image intensities onto another domain, where the intensity distribution of each object is still Gaussian but better separated. The means of the Gaussian distributions in the transformed domain can be adaptively estimated by multiplying the bias field with a piecewise constant signal within the sliding window. A maximum likelihood energy functional is then defined on each local region, combining the bias field, the membership function of the object region, and the constant approximating the true signal of the corresponding object. The energy functional is extended to the whole image domain by the Bayesian learning approach. An efficient iterative algorithm is proposed for energy minimization, via which image segmentation and bias field correction are achieved simultaneously. Furthermore, the smoothness of the obtained optimal bias field is ensured by normalized convolutions without extra cost. Experiments on real images demonstrate the superiority of the proposed algorithm over other state-of-the-art representative methods.
    IEEE Transactions on Cybernetics 10/2014; 45(8). DOI:10.1109/TCYB.2014.2352343 · 3.47 Impact Factor
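Normalized convolution, which the abstract uses to smooth the estimated bias field, divides a filtered certainty-weighted signal by the filtered certainty map. The Gaussian width below is an arbitrary placeholder.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_convolution(field, certainty, sigma=8.0):
    """Smooth a field (e.g., an estimated bias field) by filtering the
    certainty-weighted field and the certainty map separately, then
    dividing, so unreliable pixels are filled in from reliable neighbors."""
    num = gaussian_filter(field * certainty, sigma)
    den = gaussian_filter(certainty, sigma)
    return num / (den + 1e-12)
```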
  • The 22nd International Conference on Pattern Recognition (ICPR); 08/2014
  • Changsheng Li · Qingshan Liu · Jing Liu · Hanqing Lu
    ABSTRACT: Recently, distance metric learning (DML) has attracted much attention in image retrieval, but most previous methods only work for image classification and clustering tasks. In this brief, we focus on designing ordinal DML algorithms for image ranking tasks, by which the rank levels among the images can be well measured. We first present a linear ordinal Mahalanobis DML model that preserves both the local geometry information and the ordinal relationships of the data. Then, we develop a nonlinear DML method by kernelizing the above model, considering that real-world image data often have nonlinear structures. To further improve the ranking performance, we finally derive a multiple-kernel DML approach, inspired by the idea of multiple-kernel learning, which applies different kernel operators to different kinds of image features. Extensive experiments on four benchmarks demonstrate the power of the proposed algorithms against related state-of-the-art methods.
    IEEE Transactions on Neural Networks and Learning Systems 08/2014; 26(7). DOI:10.1109/TNNLS.2014.2339100 · 4.37 Impact Factor
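A learned Mahalanobis metric, as in the linear model above, is just a positive semi-definite matrix M; factoring M = L^T L turns metric-based ranking into Euclidean ranking in a projected space. A minimal sketch (the learning step itself is omitted):

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance under a learned positive semi-definite matrix M."""
    d = x - y
    return np.sqrt(d @ M @ d)

def projection_from_metric(M):
    """Factor M = L.T @ L via eigendecomposition, so ranking with the
    learned metric reduces to Euclidean ranking of L-projected features."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)      # guard tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T
```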
  • Yubao Sun · Qingshan Liu · Jinhui Tang · Dacheng Tao
    ABSTRACT: In recent years, sparse representation has been widely used in object recognition applications, and how to learn the dictionary is a key issue. A popular method is to use the l1 norm as the sparsity measurement of representation coefficients for dictionary learning. However, the l1 norm treats each atom in the dictionary independently, so the learned dictionary cannot well capture the multi-subspace structural information of the data. Additionally, the learned sub-dictionaries for the individual classes usually share some common atoms, which weakens the discriminative ability of each sub-dictionary's reconstruction error. This paper presents a new dictionary learning model to improve sparse representation for image classification, which aims to learn a class-specific sub-dictionary for each class and a common sub-dictionary shared by all classes. The model is composed of a discriminative fidelity term, a weighted group sparse constraint, and a sub-dictionary incoherence term. The discriminative fidelity encourages each class-specific sub-dictionary to sparsely represent the samples of the corresponding class. The weighted group sparse constraint captures the structural information of the data. The sub-dictionary incoherence term makes all sub-dictionaries as independent as possible. Because the common sub-dictionary represents features shared by all classes, we only use the reconstruction error of each class-specific sub-dictionary for classification. Extensive experiments are conducted on several public image databases, and the results demonstrate the power of the proposed method compared with the state-of-the-art.
    IEEE Transactions on Image Processing 06/2014; 23(9). DOI:10.1109/TIP.2014.2331760 · 3.11 Impact Factor
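The classification rule the abstract prescribes, scoring each class by the reconstruction error obtained with its class-specific sub-dictionary while the common sub-dictionary absorbs shared features, can be sketched as follows; plain least squares stands in for the actual sparse coder.

```python
import numpy as np

def classify_by_residual(x, sub_dicts, D_common):
    """Code x over each class-specific sub-dictionary stacked with the
    shared sub-dictionary, then score the class by the resulting
    reconstruction error. Atoms are columns of each dictionary; least
    squares replaces the sparse coding step for brevity."""
    errors = []
    for D_c in sub_dicts:                             # D_c: (d, k_c)
        D = np.hstack([D_c, D_common])
        a, *_ = np.linalg.lstsq(D, x, rcond=None)
        errors.append(np.linalg.norm(x - D @ a))
    return int(np.argmin(errors))
```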
  • ABSTRACT: In this paper, we propose a novel feature selection-based method for facial age estimation. Face aging is a typical temporal process, and facial images should exhibit certain ordinal patterns in the aging feature space. From the geometrical perspective, a facial image can usually be seen as sampled from a low-dimensional manifold embedded in the original high-dimensional feature space. Thus, we first measure the energy of each feature in preserving the underlying local structure information and the ordinal information of the facial images, respectively, and then we learn a low-dimensional aging representation that maximally preserves both kinds of information. To further improve the performance, we eliminate redundant local information and ordinal information as much as possible by minimizing nonlinear correlation and rank correlation among features. Finally, we formulate all these issues into a unified optimization problem, which is similar to linear discriminant analysis in form. Since it is expensive to collect labeled facial aging images in practice, we extend the proposed supervised method to a semi-supervised learning mode, including a semi-supervised feature selection method and a semi-supervised age prediction algorithm. Extensive experiments are conducted on the FACES dataset, the Images of Groups dataset, and the FG-NET aging dataset to show the power of the proposed algorithms compared with the state-of-the-art.
    IEEE Transactions on Cybernetics 01/2014; DOI:10.1109/TCYB.2014.2376517 · 3.47 Impact Factor
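One concrete way to quantify the rank correlation among features that the method minimizes is the mean absolute pairwise Spearman correlation over the selected features; this is an illustrative criterion, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import spearmanr

def ordinal_redundancy(X_selected):
    """Mean absolute pairwise Spearman rank correlation among selected
    features (columns of X_selected); assumes at least three features."""
    rho, _ = spearmanr(X_selected)              # (k, k) correlation matrix
    k = rho.shape[0]
    return np.abs(rho[~np.eye(k, dtype=bool)]).mean()
```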
  • ABSTRACT: In this paper, we present a new framework to monitor medication intake for elderly individuals by incorporating a video camera and Radio Frequency Identification (RFID) sensors. The proposed framework can provide a key function for monitoring activities of daily living (ADLs) of elderly people in their own homes. In an assistive environment, RFID tags are applied to medicine bottles in a medicine cabinet so that each bottle has a unique ID, and a description of the medicine for each tag is manually entered into a database. RFID readers detect when any of these bottles is taken from the medicine cabinet and identify the tag attached to it. A video camera continuously monitors the activity of taking medicine by integrating face detection and tracking, mouth detection, background subtraction, and activity detection. The preliminary results demonstrate 100% detection accuracy in identifying medicine bottles and promising results in monitoring the activity of taking medicine.
    07/2013; 2(2):61-70. DOI:10.1007/s13721-013-0025-y
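Of the pipeline stages listed above, background subtraction is the most self-contained; a minimal sketch using OpenCV's stock MOG2 model follows. The video filename and the activity threshold are placeholders, and the paper's own detector is not reproduced here.

```python
import cv2

# Background-subtraction stage of the monitoring pipeline (illustrative).
cap = cv2.VideoCapture("medication.avi")   # placeholder video source
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:                             # end of video
        break
    mask = subtractor.apply(frame)         # per-frame foreground mask
    active = cv2.countNonZero(mask) > 0.01 * mask.size  # crude activity flag
cap.release()
```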
  • Changsheng Li · Qingshan Liu · Jing Liu · Hanqing Lu
    ABSTRACT: This paper proposes a novel feature extraction algorithm specifically designed for learning to rank in image ranking. Different from previous works, the proposed method not only targets preserving the local manifold structure of the data, but also keeps the ordinal information among different data blocks in the low-dimensional subspace, where a ranking model can be learned effectively and efficiently. We first define the ideal directions for preserving local manifold structure and ordinal information, respectively. Based on these two definitions, a unified model is built to leverage both kinds of information, formulated as an optimization problem. Experiments are conducted on two publicly available data sets, the MSRA-MM image data set and the “Web Queries” image data set, and the results demonstrate the power of the proposed method against state-of-the-art methods.
    Signal Processing 06/2013; 93(6):1651–1661. DOI:10.1016/j.sigpro.2012.06.022 · 2.24 Impact Factor
  • Qingshan Liu · Yueting Zhuang
    ABSTRACT: Given its importance, the problem of classification in imbalanced data has attracted great attention in recent years. However, few efforts have been made to develop feature selection techniques for the classification of imbalanced data. This paper thus ...
    Neurocomputing 04/2013; 105:1–2. DOI:10.1016/j.neucom.2012.07.038 · 2.01 Impact Factor
  • ABSTRACT: Recently, recognizing affect from both face and body gestures has attracted increasing attention. However, efficient and effective features for describing the dynamics of face and gestures in real-time automatic affect recognition are still lacking. In this paper, we combine local motion and appearance features in a novel framework to model the temporal dynamics of face and body gestures. The proposed framework employs MHI-HOG and Image-HOG features, through temporal normalization or a bag of words, to capture motion and appearance information. MHI-HOG computes the Histogram of Oriented Gradients (HOG) on the Motion History Image (MHI); it captures the motion direction and speed of a region of interest as an expression evolves over time. Image-HOG captures the appearance information of the corresponding region of interest. The temporal normalization method explicitly addresses the time-resolution issue in video-based affect recognition. To implicitly model the local temporal dynamics of an expression, we further propose a bag-of-words (BOW) based representation for both MHI-HOG and Image-HOG features. Experimental results demonstrate promising performance compared with the state-of-the-art; in particular, recognition accuracy improves significantly over a frame-based approach that does not consider the underlying temporal dynamics.
    Image and Vision Computing 02/2013; 31(2):175-185. DOI:10.1016/j.imavis.2012.06.014 · 1.58 Impact Factor
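The MHI underlying MHI-HOG is a per-pixel recency map: moving pixels take the current timestamp and stale ones decay to zero, so gradients of the MHI encode motion direction and speed. A minimal sketch (HOG itself, e.g. skimage.feature.hog, would then be applied to the MHI):

```python
import numpy as np

def update_mhi(mhi, motion_mask, timestamp, duration):
    """Motion History Image update: pixels moving now take the current
    timestamp; pixels last seen moving more than `duration` ago decay
    to zero. Applying HOG to this image gives the MHI-HOG descriptor."""
    mhi = np.where(motion_mask, float(timestamp), mhi)
    mhi[mhi < timestamp - duration] = 0.0
    return mhi

def motion_mask(prev_gray, curr_gray, thresh=30):
    """Simple frame-difference mask (int16 avoids uint8 wraparound)."""
    return np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16)) > thresh
```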
  • Changsheng Li · Qingshan Liu · Jing Liu · Hanqing Lu
    ABSTRACT: In this paper, we present a new method for facial age estimation based on ordinal discriminative feature learning. Considering the temporally ordinal and continuous nature of the aging process, the proposed method aims not only at preserving the local manifold structure of facial images, but also at keeping the ordinal information among aging faces. Moreover, we remove redundant information from both the locality information and the ordinal information as much as possible by minimizing nonlinear correlation and rank correlation. Finally, we formulate these two issues into a unified optimization problem for feature selection and present an efficient solution. Experiments are conducted on the publicly available Images of Groups dataset and the FG-NET dataset, and the results demonstrate the power of the proposed method against state-of-the-art methods.
    Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on; 06/2012
  • ABSTRACT: Motion saliency detection aims at finding the semantic regions in a video sequence and is an important pre-processing step in many vision applications. In this paper, we propose a new algorithm, Temporal Spectral Residual, for fast motion saliency detection. Different from conventional motion saliency detection algorithms that use complex mathematical models, our goal is to find a good tradeoff between computational efficiency and accuracy. The basic observation is that on a cross section along the temporal axis of a video sequence, the regions of moving objects contain distinct signals while the background area contains redundant information. Our focus is thus to extract the salient information on the cross section by utilizing the off-the-shelf Spectral Residual method, a 2D image saliency detection technique. A majority voting strategy is also introduced to generate reliable results. Since the proposed method only involves Fourier spectrum analysis, it is computationally efficient. We validate the algorithm on two applications: background subtraction in outdoor video sequences under dynamic backgrounds, and left ventricle endocardium segmentation in MR sequences. Compared with some state-of-the-art algorithms, our algorithm achieves both good accuracy and fast computation, satisfying the needs of a pre-processing method.
    Neurocomputing 06/2012; 86:24–32. DOI:10.1016/j.neucom.2011.12.033 · 2.01 Impact Factor
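Spectral Residual itself is a few Fourier-domain lines; applied to an x-t or y-t cross-section of the video volume, it yields the temporal saliency described above. Filter sizes below follow the common 2D-image defaults, not necessarily the paper's settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(slice2d):
    """Spectral Residual saliency on a 2D array, here intended for an x-t
    or y-t cross-section of the video volume: subtract the locally
    averaged log-amplitude spectrum, keep the phase, and invert the FFT."""
    F = np.fft.fft2(slice2d.astype(np.float64))
    log_amp = np.log(np.abs(F) + 1e-12)
    residual = log_amp - uniform_filter(log_amp, size=3)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(F)))) ** 2
    return gaussian_filter(saliency, sigma=2.5)
```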
  • ABSTRACT: The nine papers in this special section on object and event classification in large-scale video collections can be categorized into four themes: video indexing, concept detection, video summarization, and event recognition.
    IEEE Transactions on Multimedia 02/2012; 14(1):1-2. DOI:10.1109/TMM.2011.2176990 · 1.78 Impact Factor
  • Changsheng Li · Qingshan Liu · Jing Liu · Hanqing Lu
    ABSTRACT: This paper proposes a novel regression method based on distance metric learning for human age estimation. We treat age estimation as a problem of distance-based ordinal regression, in which the facial aging trend can be discovered by a learned distance metric that simultaneously preserves both the ordinal information of different age groups and the local geometric structure of the target neighborhoods. Experimental results on the publicly available FG-NET database are very competitive with state-of-the-art methods.
    Pattern Recognition (ICPR), 2012 21st International Conference on; 01/2012

Publication Stats

2k Citations
67.99 Total Impact Points

Institutions

  • 2015
    • Nanjing University of Science and Technology
      Nanjing, Jiangsu, China
  • 2011–2015
    • Nanjing University of Information Science & Technology
      Nanjing, Jiangsu, China
  • 2007–2011
    • Rutgers, The State University of New Jersey
      • Department of Computer Science
      • Center for Computational Biomedicine Imaging and Modeling (CBIM)
      New Brunswick, New Jersey, United States
  • 2002–2009
    • Chinese Academy of Sciences
      • Institute of Automation
      • National Pattern Recognition Laboratory
      Beijing, China
  • 2005–2008
    • The Chinese University of Hong Kong
      • Department of Information Engineering
      Hong Kong, Hong Kong