Liang Wang

Chinese Academy of Sciences, Beijing, China


Publications (70) · 31.88 Total Impact

  •
    ABSTRACT: Spatial information is an important cue for visual object analysis. Various studies in this field have been conducted; however, they are either too rigid or too fragile to efficiently utilize such information. In this paper, we propose to model the distribution of objects' local appearance patterns by using their co-occurrence at different spatial locations. In order to represent such a distribution, we propose a flexible framework called spatial feature co-pooling, with which the relations between patterns are discovered. As the final representation resulting from our framework is of high dimensionality, we propose a semi-greedy (SG) grafting algorithm to select the most discriminative features. Experimental results on the CIFAR-10, UIUC Sports and VOC 2007 datasets show that our method is effective and comparable with state-of-the-art algorithms.
    Neurocomputing. 01/2014; 139:415–422.
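    A minimal sketch of the co-occurrence idea described in this abstract (not the authors' exact co-pooling framework or their grafting step): given per-patch codeword assignments and patch locations, count how often pairs of codewords co-occur within a spatial neighbourhood and use the flattened co-occurrence matrix as the image representation. The function name, neighbourhood radius and toy data are illustrative assumptions.

    ```python
    import numpy as np

    def cooccurrence_pooling(codes, positions, n_words, radius=32.0):
        """Pool pairwise codeword co-occurrences within a spatial radius.

        codes     : (N,) int array, codeword index of each local patch
        positions : (N, 2) float array, (x, y) location of each patch
        n_words   : size of the codebook
        radius    : patches closer than this are treated as co-occurring
        """
        cooc = np.zeros((n_words, n_words))
        for i in range(len(codes)):
            d = np.linalg.norm(positions - positions[i], axis=1)   # distances to patch i
            for j in np.nonzero((d < radius) & (d > 0))[0]:
                cooc[codes[i], codes[j]] += 1.0
        cooc /= max(cooc.sum(), 1.0)        # normalise to a distribution
        return cooc.ravel()                 # high-dimensional image representation

    # toy usage: 200 random patches on a 256x256 image, 50 codewords
    rng = np.random.default_rng(0)
    feat = cooccurrence_pooling(rng.integers(0, 50, 200),
                                rng.uniform(0, 256, (200, 2)), n_words=50)
    print(feat.shape)   # (2500,) -- illustrates why a feature selection step is needed
    ```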
  •
    ABSTRACT: Global spatial structure is an important factor for visual object recognition but has not attracted sufficient attention in recent studies. In particular, the problems of feature ambiguity and sensitivity to location changes in the image space are not yet well solved. In this paper, we propose multiple spatial pooling (MSP) to address these problems. MSP models global spatial structure with multiple Gaussian distributions and then pools features according to the relations between features and the Gaussian distributions. This process is further generalized into a unified framework, which formulates multiple pooling as matrix operations with structured sparsity. Experiments on scene classification and object categorization demonstrate that MSP can enhance traditional algorithms at a small extra computational cost.
    Neurocomputing 01/2014; 129:225–231. · 1.63 Impact Factor
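    A hedged sketch of the pooling idea outlined above: local features are softly assigned to several 2-D Gaussian components over image coordinates and then pooled per component. The Gaussian centers, the shared sigma and the max-pooling choice are illustrative assumptions, not the paper's learned parameters.

    ```python
    import numpy as np

    def gaussian_weights(positions, centers, sigma):
        """Soft assignment of patch locations to spatial Gaussian components."""
        d2 = ((positions[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        w = np.exp(-0.5 * d2 / sigma ** 2)
        return w / (w.sum(axis=1, keepdims=True) + 1e-12)          # (N, K)

    def multiple_spatial_pooling(codes, positions, centers, sigma=0.25):
        """Weight each patch's code vector by its spatial responsibility, then max-pool."""
        w = gaussian_weights(positions, centers, sigma)              # (N, K)
        pooled = [(codes * w[:, k:k + 1]).max(axis=0) for k in range(w.shape[1])]
        return np.concatenate(pooled)

    # toy usage: 300 patches with 100-d codes, positions normalised to [0, 1]^2,
    # and a 2x2 grid of Gaussian centers (an illustrative choice)
    rng = np.random.default_rng(1)
    codes = rng.random((300, 100))
    pos = rng.random((300, 2))
    centers = np.array([[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]])
    print(multiple_spatial_pooling(codes, pos, centers).shape)       # (400,)
    ```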
  • Zhen Zhou, Li Zhong, Liang Wang
    ABSTRACT: Clustering methods are widely deployed in the fields of data mining and pattern recognition. Many of them require the number of clusters as input, which may not be practical when this number is completely unknown. Several existing visual methods for cluster tendency assessment can be used to estimate the number of clusters by displaying the pairwise dissimilarity matrix as an intensity image in which objects are reordered to reveal the hidden data structure as dark blocks along the diagonal. A major limitation of these methods is that they cannot highlight cluster structure when the clusters are complex. To address this problem, this paper proposes an effective approach based on Markov Random Fields, which dynamically updates each object using its local information and maximizes a global probability measure. The proposed method can be used to determine cluster tendency and partition the data simultaneously. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed method.
    Neurocomputing 01/2014; 136:49–55. · 1.63 Impact Factor
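    For context, a sketch of the classical reordering step that the visual methods mentioned above rely on: a Prim-like ordering of the dissimilarity matrix so that similar objects end up adjacent and clusters appear as dark diagonal blocks. The MRF-based refinement proposed in the paper is not shown; names and the toy data are assumptions.

    ```python
    import numpy as np

    def vat_reorder(D):
        """Reorder a symmetric dissimilarity matrix so that similar objects are
        adjacent (a VAT-style ordering); clusters then show as dark diagonal blocks."""
        n = D.shape[0]
        order = [int(np.unravel_index(np.argmax(D), D.shape)[0])]   # start from an 'outlier'
        remaining = set(range(n)) - set(order)
        while remaining:
            rem = sorted(remaining)
            sub = D[np.ix_(order, rem)]
            j = rem[int(np.argmin(sub.min(axis=0)))]   # closest remaining object
            order.append(j)
            remaining.remove(j)
        return D[np.ix_(order, order)], order

    # toy usage: two well-separated 2-D blobs
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    D_ordered, _ = vat_reorder(D)   # imshow(D_ordered) would show two dark blocks
    ```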
  •
    ABSTRACT: Feature coding and pooling are two critical stages in the widely used Bag-of-Features (BOF) framework for image classification. After coding, each local feature forms its representation in terms of the visual codewords. However, this two-dimensional feature-code layout is collapsed into a one-dimensional codeword representation after pooling: the properties of individual local features are ignored and the whole representation is tightly coupled. To resolve this problem, we propose a hierarchical feature coding approach which regards each feature-code representation as a high-level feature. Codeword learning, coding and pooling are then applied to these new features, yielding a high-level representation of the image. Experiments on different datasets validate our analysis and demonstrate that the new representation is more discriminative than that of the previous BOF framework. Moreover, we show that various traditional feature coding algorithms can easily be embedded into our framework to achieve better performance.
    Neurocomputing 01/2014; 144:509–515. · 1.63 Impact Factor
  •
    ABSTRACT: We present a novel unsupervised fall detection system that employs acoustic signals (footstep sounds) collected from an elderly person's normal activities to construct a data description model that distinguishes falls from non-falls. The measured acoustic signals are first processed with a source separation (SS) technique to remove possible interference from other background sound sources. Mel-frequency cepstral coefficient (MFCC) features are then extracted from the processed signals and used to construct a data description model based on a one-class support vector machine (OCSVM), which is finally applied to distinguish fall from non-fall sounds. Experiments on a recorded dataset confirm that the proposed fall detection system achieves better performance than existing single-microphone methods, especially under high levels of interference from other sound sources.
    Signal Processing. 01/2014;
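    A minimal sketch of the last two stages of the pipeline described above (MFCC features plus a one-class SVM), using librosa and scikit-learn; the source-separation front end is omitted, and the clip length, MFCC summary statistics and SVM parameters are assumptions rather than the paper's settings.

    ```python
    import numpy as np
    import librosa
    from sklearn.svm import OneClassSVM

    def clip_features(y, sr, n_mfcc=13):
        """Mean and std of MFCCs over a short clip -> one fixed-length feature vector."""
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)     # (n_mfcc, n_frames)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # train only on normal (non-fall) sounds such as footsteps;
    # random noise stands in here for real one-second recordings at 16 kHz
    sr = 16000
    rng = np.random.default_rng(0)
    normal_clips = [rng.standard_normal(sr) for _ in range(20)]
    X_train = np.vstack([clip_features(y, sr) for y in normal_clips])

    # the one-class SVM describes the "normal" region; nu bounds the outlier fraction
    ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

    def is_fall(y, sr=16000):
        """Clips the model flags as outliers (-1) are treated as candidate falls."""
        return ocsvm.predict(clip_features(y, sr).reshape(1, -1))[0] == -1

    print(is_fall(rng.standard_normal(sr)))
    ```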
  •
    ABSTRACT: Cross-modal matching has recently drawn much attention due to the widespread existence of multimodal data. It aims to match data from different modalities and generally involves two basic problems: the measure of relevance and coupled feature selection. Most previous works mainly focus on solving the first problem. In this paper, we propose a novel coupled linear regression framework to deal with both problems. Our method learns two projection matrices to map multimodal data into a common feature space in which cross-modal data matching can be performed. In the learning procedure, l2,1-norm penalties are imposed on the two projection matrices separately, which leads to selecting relevant and discriminative features from the coupled feature spaces simultaneously. A trace norm is further imposed on the projected data as a low-rank constraint, which enhances the relevance of the data from different modalities. We also present an iterative algorithm based on half-quadratic minimization to solve the proposed regularized linear regression problem. Experimental results on two challenging cross-modal datasets demonstrate that the proposed method outperforms state-of-the-art approaches.
    Proceedings of the 2013 IEEE International Conference on Computer Vision; 12/2013
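    The full method couples two projections and adds a trace-norm term; the sketch below covers only the l2,1-regularized regression subproblem, solved with the standard half-quadratic / iteratively reweighted least-squares update. This simplification and all names and parameters are assumptions, not the paper's complete algorithm.

    ```python
    import numpy as np

    def l21_regression(X, T, lam=1.0, n_iter=50, eps=1e-8):
        """min_W ||X W - T||_F^2 + lam * sum_i ||W[i, :]||_2 via iteratively
        reweighted least squares (half-quadratic style).  Rows of W that shrink
        to zero correspond to input features that are discarded."""
        d = X.shape[1]
        W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)   # ridge warm start
        for _ in range(n_iter):
            row_norms = np.linalg.norm(W, axis=1) + eps
            D = np.diag(lam / (2.0 * row_norms))                  # weights from the l2,1 term
            W = np.linalg.solve(X.T @ X + D, X.T @ T)
        return W

    # toy usage: 50-d "image" features, a 10-d common-space target driven by 5 features
    rng = np.random.default_rng(3)
    X = rng.standard_normal((200, 50))
    T = X[:, :5] @ rng.standard_normal((5, 10))
    W = l21_regression(X, T, lam=5.0)
    print(np.linalg.norm(W, axis=1).round(2)[:8])   # the first five rows dominate
    ```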
  •
    ABSTRACT: Recommender systems have become popular in recent years and have been widely used in many areas such as e-commerce and social networks. Many recommender algorithms have been proposed, and the most famous is the collaborative filtering algorithm. Because it is vulnerable to profile injection attacks, some people attack the system for profit: they promote or demote target items by injecting fake user profiles into the system. Existing works mainly focus on detecting individual spammers; few of them detect groups of spammers, even though such groups have a stronger influence on recommender systems than individuals, and none of them can simultaneously detect the items under attack. In this paper we propose a method that detects not only groups of spammers but also the correlated groups of target items. The proposed method first finds candidate groups of attackers and candidate groups of target items. We then derive three features from the collusion attack phenomenon to detect groups of spammers and groups of target items. Finally, we use a rank aggregation technique to obtain the most probable group of spammers and the correlated group of target items. Experimental results demonstrate the effectiveness of the proposed method.
    Proceedings of the 2013 Fifth International Conference on Multimedia Information Networking and Security; 11/2013
  • Ran He, Tieniu Tan, Liang Wang
    ABSTRACT: Low-rank matrix recovery algorithms aim to recover a corrupted low-rank matrix with sparse errors. However, corrupted errors may not be sparse in real-world problems, and the relationship between the L1 regularizer on noise and robust M-estimators is still unknown. This paper proposes a general robust framework for low-rank matrix recovery via implicit regularizers of robust M-estimators, which are derived from convex conjugacy and can be used to model arbitrarily corrupted errors. Based on the additive form of half-quadratic optimization, proximity operators of implicit regularizers are developed such that both the low-rank structure and the corrupted errors can be alternately recovered. In particular, the dual relationship between the absolute function in the L1 regularizer and the Huber M-estimator is studied, which links robust low-rank matrix recovery methods to M-estimator-based robust principal component analysis methods. Extensive experiments on synthetic and real-world datasets corroborate our claims and verify the robustness of the proposed framework.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 09/2013;
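    A sketch of the special case the abstract mentions, where the implicit regularizer reduces to the L1 penalty (the dual of the Huber M-estimator): the low-rank part is updated by singular-value soft-thresholding and the errors by element-wise soft-thresholding. This is not the general implicit-regularizer framework, and the thresholds and toy data are assumptions.

    ```python
    import numpy as np

    def soft(x, t):
        """Element-wise soft-thresholding (prox of the L1 norm)."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def svt(X, t):
        """Singular-value thresholding (prox of the nuclear norm)."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

    def robust_lowrank(M, tau=1.0, lam=0.1, n_iter=100):
        """Alternating minimisation of
           0.5 * ||M - L - E||_F^2 + tau * ||L||_* + lam * ||E||_1."""
        L, E = np.zeros_like(M), np.zeros_like(M)
        for _ in range(n_iter):
            L = svt(M - E, tau)     # update the low-rank component
            E = soft(M - L, lam)    # update the (assumed sparse) errors
        return L, E

    # toy usage: a rank-2 matrix with a few gross corruptions
    rng = np.random.default_rng(4)
    M = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
    mask = rng.random(M.shape) < 0.05
    L, E = robust_lowrank(M + 10.0 * mask * rng.standard_normal(M.shape), tau=2.0, lam=0.3)
    print(np.linalg.matrix_rank(L, tol=1e-3), np.abs(M - L).mean().round(3))
    ```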
  •
    ABSTRACT: Sparse representation is one of the most influential frameworks for visual tracking. However, when applying this framework to the real-world tracking applications, there are still many challenges such as appearance variations and background noise. In ...
    Pattern Recognition. 07/2013; 46(7):1748–1749.
  •
    ABSTRACT: Image classification is a hot topic in computer vision and pattern recognition. Feature coding, as a key component of image classification, has been widely studied over the past several years, and a number of coding algorithms have been proposed. However, there is no comprehensive study of the connections between different coding methods, especially of how they have evolved. In this paper, we first survey various feature coding methods, including their motivations and mathematical representations, and then explore their relations, based on which a taxonomy is proposed to reveal their evolution. Further, we summarize the main characteristics of current algorithms, each of which is shared by several coding strategies. Finally, we choose representatives from different kinds of coding approaches and empirically evaluate them with respect to codebook size and the number of training samples on several widely used databases (15-Scenes, Caltech-256, PASCAL VOC07 and SUN397). The experimental findings firmly justify our theoretical analysis, which is expected to benefit both practical applications and future research.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 06/2013;
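    For readers new to the area, a toy illustration of the two oldest coding strategies covered by such surveys: hard voting and soft (kernel) assignment of descriptors to codewords, followed by sum or max pooling. The codebook here is random; real systems learn it, e.g. with k-means on SIFT-like descriptors.

    ```python
    import numpy as np

    def hard_coding(descriptors, codebook):
        """Each descriptor votes for its nearest codeword (classical BoW histogram)."""
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        codes = np.zeros_like(d2)
        codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
        return codes                                            # (N, K)

    def soft_coding(descriptors, codebook, beta=10.0):
        """Soft assignment: Gaussian-kernel weights to all codewords."""
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        w = np.exp(-beta * d2)
        return w / (w.sum(axis=1, keepdims=True) + 1e-12)

    # toy usage: 500 descriptors of dimension 128, a 64-word codebook
    rng = np.random.default_rng(5)
    X, B = rng.random((500, 128)), rng.random((64, 128))
    hist = hard_coding(X, B).sum(axis=0)      # sum pooling -> the BoW histogram
    maxp = soft_coding(X, B).max(axis=0)      # max pooling, common with soft codes
    print(hist.shape, maxp.shape)             # (64,) (64,)
    ```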
  •
    ABSTRACT: Spatial information in images is considered to be of great importance for object recognition. Recent studies show that humans' classification accuracy may drop dramatically if the spatial information of an image is removed. The original bag-of-words (BoW) model is essentially a system simulating such a classification process with incomplete information. To handle spatial information, spatial pyramid matching (SPM) was proposed and has become the most widely used scheme for spatial modeling. Given an image, SPM divides it into a series of spatial blocks at several levels and concatenates the representations obtained separately within all the blocks. SPM greatly improves performance since it embeds spatial information into BoW; however, it ignores the relationships between the spatial blocks. To address this problem, we propose a new scheme based on a spatial graph, whose nodes correspond to the spatial blocks in SPM and whose edges correspond to the relationships between the blocks. Thorough experiments on several popular datasets verify the advantages of the proposed scheme.
    Proceedings of the 11th Asian conference on Computer Vision - Volume Part I; 11/2012
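    A sketch of the baseline SPM scheme that the proposed spatial-graph method builds on: pool coded features inside the blocks of each pyramid level and concatenate. The graph edges between blocks introduced by the paper are not shown; the pyramid levels below are the conventional 1x1, 2x2 and 4x4 choice.

    ```python
    import numpy as np

    def spm_pooling(codes, positions, levels=(1, 2, 4)):
        """Spatial pyramid max pooling.
        codes     : (N, K) coded local features
        positions : (N, 2) patch coordinates normalised to [0, 1)
        levels    : grid sizes, e.g. 1x1, 2x2 and 4x4 blocks
        """
        parts = []
        for g in levels:
            cell = np.minimum((positions * g).astype(int), g - 1)   # block index per patch
            for bx in range(g):
                for by in range(g):
                    mask = (cell[:, 0] == bx) & (cell[:, 1] == by)
                    pooled = codes[mask].max(axis=0) if mask.any() else np.zeros(codes.shape[1])
                    parts.append(pooled)
        return np.concatenate(parts)   # K * (1 + 4 + 16) dimensions for levels (1, 2, 4)

    rng = np.random.default_rng(6)
    rep = spm_pooling(rng.random((400, 64)), rng.random((400, 2)))
    print(rep.shape)    # (1344,) = 64 * 21
    ```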
  •
    ABSTRACT: In the gait recognition field, template-based approaches such as the Gait Energy Image (GEI) and the Chrono-Gait Image (CGI) can achieve good recognition performance at low computational cost, and CGI preserves temporal information better than GEI. However, both pay little attention to local shape features. To preserve temporal information and generate richer local shape features, we build multiple HOG templates by extracting Histograms of Oriented Gradients (HOG) from the GEI and CGI templates. Experiments show that, compared with several published approaches, the proposed multiple HOG templates achieve better gait recognition performance.
    Pattern Recognition (ICPR), 2012 21st International Conference on; 01/2012
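    A minimal sketch of the GEI branch of this construction using scikit-image: average aligned binary silhouettes into a GEI and describe it with HOG. The HOG cell and block sizes are illustrative assumptions, and the CGI branch is not shown here.

    ```python
    import numpy as np
    from skimage.feature import hog

    def gait_energy_image(silhouettes):
        """GEI: pixel-wise mean of size-normalised binary silhouettes (one gait cycle)."""
        return np.mean(np.stack(silhouettes, axis=0), axis=0)

    def gei_hog_template(silhouettes):
        """HOG descriptor of the GEI -- adds the local shape/gradient information
        that the raw averaged template lacks (cell/block sizes are illustrative)."""
        gei = gait_energy_image(silhouettes)
        return hog(gei, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm="L2-Hys")

    # toy usage: 30 random 128x88 "silhouettes" standing in for one aligned gait cycle
    rng = np.random.default_rng(7)
    frames = [(rng.random((128, 88)) > 0.5).astype(float) for _ in range(30)]
    print(gei_hog_template(frames).shape)
    ```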
  •
    ABSTRACT: Recognizing objectionable content has drawn more and more attention given the rapid proliferation of images and videos on the Internet. Although there are investigations into violent video detection and pornographic information filtering, very few existing methods touch on the problem of violence detection in still images. Given its potential use in violent webpage filtering, online public opinion monitoring and other applications, recognizing violence in still images is worth investigating in depth. To this end, we first establish a new database containing 500 violence images and 1500 non-violence images. We then use the Bag-of-Words (BoW) model, which is frequently adopted in the image classification domain, to discriminate violence images from non-violence images. The effectiveness of four different feature representations is tested within the BoW framework. Finally, baseline results for violence image detection on the newly built database are reported.
    Advanced Video and Signal-Based Surveillance (AVSS), 2012 IEEE Ninth International Conference on; 01/2012
  •
    ABSTRACT: Hermeneutics, defined as the art of interpreting a message, for many centuries centered its study on theorizing the interpretation of written texts, mainly biblical ones. The appearance of the first recordings of image sequences at the end of the 19th century expanded this domain. The aesthetics of such recordings interested great philosophers of the early 20th century such as Ludwig Wittgenstein since, in that new format of communication, cinematographic texts became an interpretation game in which film language was articulated in a network of multiple readings. Hermeneutics in image sequences, or as we call it, Video-Hermeneutics (VH), involves explaining the subjective and social value of the human behavior observed in image sequences and, in general, of all multimedia content. VH is therefore concerned not only with what happens in a video, but also with understanding the meaning of what is being described, i.e., what message is being transmitted to us as human observers.
    Computer Vision and Image Understanding 01/2012; 116:305-306. · 1.23 Impact Factor
  •
    ABSTRACT: Saliency is an important factor in feature coding, based on which saliency coding (SaC) has recently been proposed for image classification. SaC is both effective and efficient for moderate-sized codebooks. However, empirical studies show that SaC loses its superiority as the codebook size increases. To address this problem, we propose a group coding strategy, wherein the latent structure information of a codebook is explored by grouping neighboring codewords into a group-code. We apply group coding to SaC and derive the group saliency coding (GSC) scheme. Thorough experiments on different datasets show that GSC consistently performs better than SaC, and also outperforms other popular coding schemes, e.g., locality-constrained linear coding, in terms of both accuracy and speed.
    Pattern Recognition (ICPR), 2012 21st International Conference on; 01/2012
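    A hedged sketch of the saliency-coding idea that GSC extends: each descriptor responds only on its nearest codeword, with a strength given by its margin over the next-nearest codewords. The exact response formula is recalled from the SaC literature and should be treated as an approximation; the grouping of neighboring codewords proposed in this paper is not shown.

    ```python
    import numpy as np

    def saliency_coding(descriptors, codebook, K=5):
        """Each descriptor activates only its closest codeword; the response is the
        average distance margin to the next (K - 1) nearest codewords."""
        d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=-1)
        codes = np.zeros_like(d)
        nn = np.argsort(d, axis=1)[:, :K]               # K nearest codewords per descriptor
        for i in range(len(descriptors)):
            nearest, others = nn[i, 0], nn[i, 1:]
            codes[i, nearest] = np.mean(d[i, others] - d[i, nearest])
        return codes                                     # max-pool over descriptors afterwards

    rng = np.random.default_rng(8)
    resp = saliency_coding(rng.random((300, 128)), rng.random((256, 128)))
    print(resp.max(axis=0).shape)    # (256,) image-level representation after max pooling
    ```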
  • Yan Huang, Wei Wang, Liang Wang, Tieniu Tan
    ABSTRACT: In this paper, we propose a new region-based saliency model to simulate human visual attention. First, we construct a pixel-level fully-connected graph representation for an image and perform normalized cut to segment the image based on the proximity and similarity principles. After obtaining image regions, we then construct a region-based fully-connected graph. Based on the saliency principle of “center-surround contrast”, we define new dissimilarity functions in terms of several visual features. Finally, we run a random walk on the region graph and apply the site entropy rate to measure region saliency. We evaluate the proposed model on a public dataset consisting of 120 images. Experimental results demonstrate that our model predicts eye fixations more accurately than four other state-of-the-art methods. We also apply our saliency model to improve the performance of image retargeting.
    Pattern Recognition (ICPR), 2012 21st International Conference on; 01/2012
  •
    ABSTRACT: Analysis of retinal blood vessels allows us to identify individuals with the onset of cardiovascular diseases, diabetes and hypertension. Unfortunately, this analysis requires a specialist to identify specific retinal features, which is not always possible. Automating this process would allow the analysis to be performed in regions where specialists are unavailable and would also enable large-scale analysis. Many algorithms have been designed to extract retinal features from fundus images. However, to date, these algorithms have been evaluated using generic image similarity measures without any justification of the reliability of these measures. In this article, we study the applicability of different measures for the retinal vessel segmentation evaluation task. In addition, we propose an evaluation measure, F1, based on the concepts of precision, recall and the F-measure. An important property of F1 is its tolerance of small localization errors, which often appear in a segmented image but do not affect the desired retinal features. The performance of different measures is tested on both real and synthetic datasets that take into account the important properties of retinal blood vessels. The results show that F1 correlates best with the desired evaluation measure in all experiments and is thus the most suitable measure for the retinal segmentation evaluation task.
    International Journal of Computer Vision and Signal Processing. 01/2012; 1(1):1-8.
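    A sketch of a precision/recall/F-measure style evaluation that tolerates small localization errors, here obtained by matching each map against a slightly dilated version of the other. This is an assumed mechanism illustrating the property described above, not necessarily the exact F1 definition in the article.

    ```python
    import numpy as np
    from scipy.ndimage import binary_dilation

    def tolerant_prf(segmented, reference, tolerance_px=2):
        """Precision/recall/F-measure that forgives misplacements of a few pixels."""
        struct = np.ones((2 * tolerance_px + 1, 2 * tolerance_px + 1), dtype=bool)
        seg, ref = segmented.astype(bool), reference.astype(bool)
        tp_p = np.logical_and(seg, binary_dilation(ref, struct)).sum()   # seg pixels near ref
        tp_r = np.logical_and(ref, binary_dilation(seg, struct)).sum()   # ref pixels near seg
        precision = tp_p / max(seg.sum(), 1)
        recall = tp_r / max(ref.sum(), 1)
        f = 2 * precision * recall / max(precision + recall, 1e-12)
        return precision, recall, f

    # toy usage: a thin vertical "vessel" segmented one pixel to the right of the reference
    ref = np.zeros((64, 64), dtype=bool); ref[:, 30] = True
    seg = np.zeros((64, 64), dtype=bool); seg[:, 31] = True
    print(tolerant_prf(seg, ref))        # close to (1.0, 1.0, 1.0) despite the 1-px shift
    print(tolerant_prf(seg, ref, 0))     # strict overlap gives (0.0, 0.0, 0.0)
    ```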
  •
    ABSTRACT: In this paper, we study the problem of robust feature extraction based on l2,1-regularized correntropy from both theoretical and algorithmic perspectives. On the theoretical side, we point out that l2,1-norm minimization can be justified from the viewpoint of half-quadratic (HQ) optimization, which facilitates convergence analysis and algorithmic development. In particular, a general formulation is proposed to unify l1-norm and l2,1-norm minimization within a common framework. On the algorithmic side, we propose an l2,1-regularized correntropy algorithm that extracts informative features while removing outliers from the training data. A new alternating minimization algorithm is also developed to optimize the non-convex correntropy objective. For face recognition, we apply the proposed method to obtain an appearance-based model, called Sparse-Fisherfaces. Extensive experiments show that our method can select robust and sparse features, and it outperforms several state-of-the-art subspace methods on large-scale and open face recognition datasets.
    Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on; 01/2012
  •
    ABSTRACT: Understanding the relationship among different modalities is a challenging task. The frequently used canonical correlation analysis (CCA) and its variants have proved effective for building a common space in which the correlation between different modalities is maximized. In this paper, we show that CCA and its variants may cause information dissipation when switching between modalities, and we therefore propose to use the continuum regression (CR) model to handle this problem. In particular, the CR model with a fixed variance coefficient of 1/2 is adopted. We also apply a multinomial logistic regression model for the subsequent classification task. To evaluate the CR model, we perform a series of cross-modal retrieval experiments on two modalities, namely image and text. Compared with previous methods, experimental results show that the CR model achieves the best retrieval precision, which demonstrates the potential of our method for real Internet search applications.
    Image Processing (ICIP), 2012 19th IEEE International Conference on; 01/2012
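    A sketch of cross-modal retrieval in a learned common space. Since continuum regression with variance coefficient 1/2 corresponds to a partial-least-squares-type criterion, scikit-learn's PLSCanonical is used here as a stand-in; this substitution, and all data and parameters, are assumptions for illustration, not the paper's implementation.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSCanonical

    # paired "image" and "text" features sharing a 10-d latent structure (synthetic)
    rng = np.random.default_rng(9)
    latent = rng.standard_normal((300, 10))
    img_feats = latent @ rng.standard_normal((10, 128)) + 0.1 * rng.standard_normal((300, 128))
    txt_feats = latent @ rng.standard_normal((10, 50)) + 0.1 * rng.standard_normal((300, 50))

    # learn a common space from the paired samples and project both modalities into it
    pls = PLSCanonical(n_components=10).fit(img_feats, txt_feats)
    img_c, txt_c = pls.transform(img_feats, txt_feats)

    def retrieve(query_img_idx, k=5):
        """Text-from-image retrieval: rank texts by cosine similarity in the common space."""
        q = img_c[query_img_idx]
        sims = txt_c @ q / (np.linalg.norm(txt_c, axis=1) * np.linalg.norm(q) + 1e-12)
        return np.argsort(-sims)[:k]

    print(retrieve(0))   # the paired text (index 0) is expected to rank highly
    ```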
  •
    ABSTRACT: The Gait Energy Image (GEI) is an efficient template for human identification by gait. However, such a template loses the temporal information in a gait sequence, which is critical to gait recognition performance. To address this issue, we develop a novel temporal template, named the Chrono-Gait Image (CGI), in this paper. The proposed CGI template first extracts the contour in each gait frame, then encodes each of the gait contour images in the same gait sequence with a multi-channel mapping function and composites them into a single CGI. To make the templates robust to complex surrounding environments, we also propose CGI-based real and synthetic temporal-information-preserving templates by using different gait periods and contour distortion techniques. Extensive experiments on three benchmark gait databases indicate that, compared with recently published gait recognition approaches, our CGI-based temporal-information-preserving approach achieves competitive performance in gait recognition with robustness and efficiency.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 12/2011;
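    A sketch of the temporal-encoding idea behind a CGI-like template: each frame's contour is written into color channels according to its phase in the gait period and the frames are composited into one multi-channel image. The linear red-to-blue ramp below is an assumed stand-in for the paper's mapping function, and contour extraction is omitted.

    ```python
    import numpy as np

    def chrono_template(contours):
        """Composite binary contour frames into one RGB image whose colour encodes time.
        contours : list of (H, W) binary arrays covering one gait period, in temporal order."""
        T = len(contours)
        H, W = contours[0].shape
        out = np.zeros((H, W, 3))
        for t, c in enumerate(contours):
            phase = t / max(T - 1, 1)
            colour = np.array([1.0 - phase, 0.5, phase])   # early frames red, late frames blue
            out = np.maximum(out, c[..., None] * colour)   # composite, keeping the strongest value
        return out

    # toy usage: a vertical "contour" sweeping left to right across the frame
    frames = []
    for t in range(20):
        f = np.zeros((64, 64)); f[:, 3 * t: 3 * t + 2] = 1.0
        frames.append(f)
    cgi_like = chrono_template(frames)      # a (64, 64, 3) image; plt.imshow(cgi_like) to view
    print(cgi_like.shape)
    ```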

Publication Stats

481 Citations
31.88 Total Impact Points

Institutions

  • 2007–2014
    • Chinese Academy of Sciences
      • Institute of Automation
      • National Laboratory of Pattern Recognition
      Beijing, China
  • 2007–2011
    • University of Melbourne
      • Department of Electrical and Electronic Engineering
      Melbourne, Victoria, Australia
  • 2010
    • Southeast University (China)
      • School of Computer Science and Engineering
      Nanjing, Jiangsu Sheng, China
    • Nanyang Technological University
      • School of Computer Engineering
      Singapore, Singapore
    • University of Bath
      • Department of Computer Science
      Bath, England, United Kingdom
  • 2009
    • Deakin University
      • School of Information Technology
      Geelong, Victoria, Australia
  • 2008
    • Victoria University Melbourne
      Melbourne, Victoria, Australia
  • 2006–2008
    • Monash University (Australia)
      • Department of Electrical and Computer Systems Engineering, Clayton
      • Intelligent Robotics Research Centre (IRRC)
      Melbourne, Victoria, Australia