Yunhong Wang

Beihang University (BUAA), Beijing, China

Publications (259) · 133.4 Total impact

  • Jie Qin · Li Liu · Zhaoxiang Zhang · Yunhong Wang · Ling Shao
    ABSTRACT: Human action recognition in videos has been extensively studied in recent years due to its wide range of applications. Instead of classifying video sequences into a number of action categories, in this paper, we focus on a particular problem of action similarity labeling, which aims at verifying whether a pair of videos contain the same type of action or not. To address this challenge, a novel approach called Compressive Sequential Learning (CSL) is proposed by leveraging the compressive sensing theory and sequential learning. We first project data points to a low dimensional space by effectively exploring an important property in compressive sensing: the Restricted Isometry Property (RIP). In particular, a very sparse measurement matrix is adopted to reduce the dimensionality efficiently. We then learn an ensemble classifier for measuring similarities between pair-wise videos by iteratively minimizing its empirical risk with the AdaBoost strategy on the training set. Unlike conventional AdaBoost, the weak learner for each iteration is not explicitly defined and its parameters are learned through greedy optimization. Furthermore, an alternative of CSL named Compressive Sequential Encoding (CSE) is developed as an encoding technique and followed by a linear classifier to address the similarity labeling problem. Our method has been systematically evaluated on four action data sets: ASLAN, KTH, HMDB51 and Hollywood2, and the results show the effectiveness and superiority of our method for action similarity labeling.
    No preview · Article · Dec 2015 · IEEE Transactions on Image Processing
  • Heng Wang · Di Huang · Yunhong Wang · Hongyu Yang
    ABSTRACT: In aging simulation, the most essential requirements are (1) human identity should remain stable in texture synthesis; and (2) the texture synthesized is expected to accord with human cognitive perception in aging. In this paper, we address the problem of face aging simulation by using a tensor completion based method. The proposed method is composed of two steps. In the first stage, Active Appearance Models (AAM) is applied to facial images to normalize pose variations. In the second stage, the tensor completion based aging simulation method is adopted to synthesize aging effects on facial images. By introducing age and identity prior information in the tensor space, human identity is mostly protected during the aging procedure and proper textures are generated to simulate the aged appearance. Experimental results achieved on the FG-NET database are not only in the age as subjective expectation, but also reserve the person specific cues, which demonstrates the effectiveness of the proposed method.
    No preview · Chapter · Nov 2015
  • Hongyu Yang · Di Huang · Yunhong Wang · Heng Wang · Yuanyan Tang
    ABSTRACT: Face aging simulation has received rising investigations nowadays, whereas it still remains a challenge to generate convincing and natural age-progressed face images. In this paper, we present a novel approach to such an issue by using hidden factor analysis joint sparse representation. In contrast to the majority of tasks in the literature that handle the facial texture integrally, the proposed aging approach separately models the person-specific facial properties that tend to be stable in a relatively long period and the age-specific clues that change gradually over time. It then merely transforms the age component to a target age group via sparse reconstruction, yielding aging effects, which is finally combined with the identity component to achieve the aged face. Experiments are carried out on three aging databases, and the results achieved clearly demonstrate the effectiveness and robustness of the proposed method in rendering a face with aging effects. Additionally, a series of evaluations prove its validity with respect to identity preservation and aging effect generation.
    No preview · Article · Nov 2015
  • Lin Wu · Yunhong Wang · Jiangtao Long · Zhisheng Liu
    ABSTRACT: The novel approach presented in this paper aims for unsupervised change detection applicable and adaptable to remote sensing images. This is achieved based on a combination of principal component analysis (PCA) and genetic algorithm (GA). The PCA is firstly applied to difference image to enhance the change information, and the significance index F is computed for selecting the principal components which contain predominant change information based on Gaussian mixture model. Then the unsupervised change detection is implemented and the resultant optimal binary change detection mask is obtained by minimizing a mean square error (MSE) based fitness function using GA. We apply the proposed and the state-of-the-art change detection methods to ASTER and QuickBird data sets, meanwhile the extensive quantitative and qualitative analysis of change detection results manifests the capability of the proposed approach to consistently produce promising results on both data sets without any priori assumptions.
    No preview · Article · Aug 2015
  • Tao Xu · Zhaoxiang Zhang · Yunhong Wang
    ABSTRACT: In human-centric technologies, skin segmentation of body parts is a prerequisite for high-level processing. The traditional method of skin detection is pixel-wise detection coupled with morphological operations. Pixel-wise methods usually generate a number of false samples and outlier skin pixels, which can make it difficult for morphological operations to provide satisfactory results in complex scenarios. Furthermore, in many cases only a coarse region is required (e.g., the bounding-box of the face) rather than detailed pixel-wise labeling. A patch-wise skin segmentation method is proposed based on deep neural networks. Our method treats image patches as processing units instead of pixels, which directly exploits the spatial information of pixels in the detection stage rather than using morphological operations on isolated pixels after detection. An image patch dataset is built and deep skin models (DSMs) are trained based on the new dataset. Trained DSMs are then integrated into a sliding window framework to segment skin regions of the human body parts. Experiments on standard benchmarks demonstrate that DSMs provide more explicit skin region of interest candidates than pixel-wise methods in complex scenarios, and achieve competitive performance on pixel-wise skin detection.
    No preview · Article · Aug 2015 · Journal of Electronic Imaging
  • Jiaxin Chen · Zhaoxiang Zhang · Yunhong Wang
    ABSTRACT: Person re-identification aims to match people across non-overlapping camera views, which is an important but challenging task in video surveillance. In order to obtain a robust metric for matching, metric learning has been introduced recently. Most existing works focus on seeking a Mahalanobis distance by employing sparse pairwise constraints, which utilize image pairs with the same person identity as positive samples, and select a small portion of those with different identities as negative samples. However, this training strategy has abandoned a large amount of discriminative information, and ignored the relative similarities. In this paper, we propose a novel Relevance Metric Learning method with Listwise Constraints (RMLLC) by adopting listwise similarities, which consist of the similarity list of each image with respect to all remaining images. By virtue of listwise similarities, RMLLC could capture all pairwise similarities, and consequently learn a more discriminative metric by enforcing the metric to conserve predefined similarity lists in a low dimensional projection subspace. Despite the performance enhancement, RMLLC using predefined similarity lists fails to capture the relative relevance information, which is often unavailable in practice. To address this problem, we further introduce a rectification term to automatically exploit the relative similarities, and develop an efficient alternating iterative algorithm to jointly learn the optimal metric and the rectification term. Extensive experiments on four publicly available benchmarking datasets are carried out and demonstrate that the proposed method is significantly superior to state-of-the-art approaches. The results also show that the introduction of the rectification term could further boost the performance of RMLLC.
    Full-text · Article · Aug 2015 · IEEE Transactions on Image Processing
  • ABSTRACT: We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU-3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors in order to achieve efficiency and robustness. First, a large set of fiducial facial landmarks of 2D face images along with their 3D face scans are localized using a novel algorithm namely incremental Parallel Cascade of Linear Regression (iPar-CLR). Then, a novel Histogram of Second Order Gradients (HSOG) based local image descriptor in conjunction with the widely used first-order gradient based SIFT descriptor are used to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed using the first-order and the second-order surface differential geometry quantities, i.e., Histogram of mesh Gradients (meshHOG) and Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM) based recognition results of all 2D and 3D descriptors are fused at both feature-level and score-level to further improve the accuracy. Comprehensive experimental results demonstrate that there exist impressive complementary characteristics between the 2D and 3D descriptors. We use the BU-3DFE benchmark to compare our approach to the state-of-the-art ones. Our multimodal feature-based approach outperforms the others by achieving an average recognition accuracy of 86.32%. Moreover, a good generalization ability is shown on the Bosphorus database.
    No preview · Article · Jul 2015 · Computer Vision and Image Understanding
  • Huibin Li · Di Huang · Jean-Marie Morvan · Yunhong Wang · Liming Chen
    ABSTRACT: Registration algorithms performed on point clouds or range images of face scans have been successfully used for automatic 3D face recognition under expression variations, but have rarely been investigated to solve pose changes and occlusions mainly since that the basic landmarks to initialize coarse alignment are not always available. Recently, local feature-based SIFT-like matching proves competent to handle all such variations without registration. In this paper, towards 3D face recognition for real-life biometric applications, we significantly extend the SIFT-like matching framework to mesh data and propose a novel approach using fine-grained matching of 3D keypoint descriptors. First, two principal curvature-based 3D keypoint detectors are provided, which can repeatedly identify complementary locations on a face scan where local curvatures are high. Then, a robust 3D local coordinate system is built at each keypoint, which allows extraction of pose-invariant features. Three keypoint descriptors, corresponding to three surface differential quantities, are designed, and their feature-level fusion is employed to comprehensively describe local shapes of detected keypoints. Finally, we propose a multi-task sparse representation based fine-grained matching algorithm, which accounts for the average reconstruction error of probe face descriptors sparsely represented by a large dictionary of gallery descriptors in identification. Our approach is evaluated on the Bosphorus database and achieves rank-one recognition rates of 96.56, 98.82, 91.14, and 99.21 % on the entire database, and the expression, pose, and occlusion subsets, respectively. To the best of our knowledge, these are the best results reported so far on this database. Additionally, good generalization ability is also exhibited by the experiments on the FRGC v2.0 database.
    No preview · Article · Jun 2015 · International Journal of Computer Vision
  • Zheng Liu · Zhaoxiang Zhang · Qiang Wu · Yunhong Wang
    ABSTRACT: Person re-identification is an important problem for associating behavior of people monitored in surveillance camera networks. The fundamental challenges of person re-identification are the large appearance distortions caused by view angles, illumination and occlusions. To address these challenges, a method is proposed in this paper to enhance person re-identification by integrating gait biometric. The proposed framework consists of the hierarchical feature extraction and descriptor matching with learned metric matrixes. Considering the appearance feature is not discriminative in some cases, the feature in this work composes of the appearance features and the gait feature for shape and temporal information. In order to solve the view-angle change problem and measuring similarity, data are mapped into a metric space so that distances between people can be measured more accurately. Then two fusion strategies are adopted. The score-level fusion computes distances on the appearance feature and the gait feature, respectively, and combine them as the final distance between samples. The feature-level fusion firstly installs two types of features in series and then computes distances by the fused feature. Finally, our method is tested on the CASIA gait dataset. Experiments show that integrating gait biometric is an effective way to enhance person re-identification.
    No preview · Article · May 2015 · Neurocomputing
  • Chunlei Li · Zhaoxiang Zhang · Yunhong Wang · Bin Ma · Di Huang
    ABSTRACT: This paper proposed a wavelet quantization based method for robust watermarking with resistance to incidental distortions. For transform domain based watermarking algorithms, blindly localizing adequate significant coefficients is one critical issue to guarantee the robustness while preserving good fidelity. In the proposed method, low frequency wavelet coefficients of the host image are randomly permutated into sub-groups according to a watermarking secret key. Embedding modifications are then distributed to important coefficients which preserve large perceptual capacity by quantizing the significant amplitude difference (SAD). Meanwhile, dither modulation strategy is employed to control the quantization artifacts and increase the robustness. In such a framework, the blind watermark extraction can be straightforwardly achieved with the watermarking secret keys which only shared by the embedder and extractor for advanced security. Numerous comparison experiments are conducted to evaluate the watermarking performance. Experimental results demonstrate the superiority of our scheme on robustness against content-preserving operations and incidental distortions such as JPEG compression, Gaussian noise.
    Full-text · Article · Apr 2015 · Neurocomputing
  • Dawei Weng · Yunhong Wang · Mingming Gong · Dacheng Tao · Hui Wei · Di Huang
    ABSTRACT: Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and its information representation supports vision tasks on both ventral and dorsal pathways. In this paper, a new local image descriptor, termed Distinctive Efficient Robust Features, or DERF, is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells (P-GCs) in the primate retina. DERF features exponential scale distribution, exponential grid structure, and circularly symmetric function Difference of Gaussian (DoG) used as a convolution kernel, all of which are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames. DoG is naturally a wavelet, and the structure of the grid points array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we modulate the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly. This is verified by designing a tight frame DoG (TF-DoG) which leads to much better performance. Extensive experiments conducted in the image matching task on the Multiview Stereo Correspondence Data set demonstrate that DERF outperforms state of the art methods for both hand-crafted and learned descriptors, while remaining robust and being much faster to compute.
    Full-text · Article · Mar 2015 · IEEE Transactions on Image Processing
  • Qingkai Zhen · Di Huang · Yunhong Wang · Liming Chen
    ABSTRACT: Facial expression is the most important channel for human nonverbal communication. This paper presents a novel and effective approach to automatic 3D Facial Expression Recognition, FER based on the Muscular Movement Model (MMM). In contrast to most of existing methods, MMM deals with such an issue in the viewpoint of anatomy. It first automatically segments the input face by localizing the corresponding points around each muscular region of the reference face using Iterative Closest Normal Pattern (ICNP). A set of shape features of multiple differential quantities, including coordinates, normals and shape index values, are then extracted to describe the geometry deformation of each segmented region. Therefore, MMM tends to combine both the advantages of the model based techniques as well as the feature based ones. Meanwhile, we analyze the importance of these muscular areas, and a score level fusion strategy which optimizes the weights of the muscular areas by using a Genetic Algorithm (GA) is proposed in the learning step. The muscular areas with their optimal weights are finally combined to predict the expression label. The experiments are carried out on the BU-3DFE database, and the results clearly demonstrate the effectiveness of the proposed method.
    No preview · Article · Jan 2015
  • Yunhong Wang · Jiaxin Chen · Ningning Liu · Li Zhang
    ABSTRACT: In this working note, we mainly focus on the image annotation subtask of ImageCLEF 2015 challenge that BUAA-iCC research group participated. For this task, we firstly explore textual similarity information between each test sample and predefined concept. Subsequently, two different kinds of semantic information are extracted from visual images: visual tags using generic object recognition classifiers and visual tags relevant to human being related concepts. For the former information, the visual tags are predicted by using deep convolutional neural network (CNN) and a set of support vector machines trained on ImageNet, and finally transferred to textual information. For the latter visual information, human related concepts are extracted via face and facial attribute detection, and finally transferred to similarity information by using manually designed mapping rules, in order to enhance the performance of annotating human related concepts. Meanwhile, a late fusion strategy is developed to incorporate aforementioned various kinds of similarity information. Results validate that the combination of the textual and visual similarity information and the adopted late fusion strategy could yield significantly better performance.
    No preview · Conference Paper · Jan 2015
  • Zhaoxiang Zhang · Jianliang Hao · Yunhong Wang · Yuhang Zhao
    ABSTRACT: We address the problem of human pose estimation, which is a very challenging problem due to view angle variance, noise and occlusions. In this paper, we propose a novel human parsing method which can estimate diverse human poses from real world images. We merge the parallel lines feature and uniform LBP feature, thereby the new feature contains both shape and texture information, which can be used by discriminative body part detectors. The standard tree model is augmented by using virtual nodes in order to describe the correlations between originally unconnected nodes, which enhances the robustness of the traditional kinematic tree model. We test our method in a sports image dataset, and the experimental results demonstrate the advantages of the merged feature as well as the augmented pose model in real applications.
    No preview · Article · Dec 2014
  • Zhaoxiang Zhang · Jie Qin · Yunhong Wang · Meng Liang
    ABSTRACT: Object Classification in traffic scene surveillance has gained popularity in recent years. Traditional methods tend to utilize a large number of labeled training samples to achieve a satisfactory classification performance. However, labels of samples are not always available and manual labeling work is both time and labor consuming. To address the problem, a large number of semi-supervised learning based methods have been proposed, but most of them only focus on the offline settings. Motivated by an active learning framework, a novel online learning strategy is proposed in this paper. Furthermore, an intuitive semi-supervised learning method, which incorporates the spirits of both the online and active learning, is proposed and utilized in the scenario of traffic scene surveillance. The proposed learning framework is evaluated on the BUAA-IRIP traffic database, and the observed superior performance proves the effectiveness of our approach.
    No preview · Article · Dec 2014
  • Di Huang · Jia Sun · Xudong Yang · Dawei Weng · Yunhong Wang
    ABSTRACT: In the past decade, research on 3D face analysis has been extensively developed, and this study briefly reviews the progress achieved in data acquisition, algorithms, and experimental methodologies, for the issues of face recognition, facial expression recognition, gender and ethnicity classification, age estimation, etc., especially focusing on that after the availability of FRGC v2.0. It further points out several challenges to deal with for more efficient and reliable systems in the real world.
    No preview · Chapter · Nov 2014
  • ABSTRACT: Most existing pose-independent Face Recognition (FR) techniques take advantage of 3D model to guarantee the naturalness while normalizing or simulating pose variations. Two nontrivial problems to be tackled are accurate measurement of pose parameters and computational efficiency. In this paper, we introduce an effective and efficient approach to estimate human head pose, which fundamentally ameliorates the performance of 3D aided FR systems. The proposed method works in a progressive way: firstly, a random forest (RF) is constructed utilizing synthesized images derived from 3D models; secondly, the classification result obtained by applying well-trained RF on a probe image is considered as the preliminary pose estimation; finally, this initial pose is transferred to shape-based 3D morphable model (3DMM) aiming at definitive pose normalization. Using such a method, similarity scores between frontal view gallery set and pose-normalized probe set can be computed to predict the identity. Experimental results achieved on the UHDB dataset outperform the ones so far reported. Additionally, it is much less time-consuming than prevailing 3DMM based approaches.
    No preview · Conference Paper · Oct 2014
  • Di Huang · Yinhang Tang · Yiding Wang · Liming Chen · Yunhong Wang
    ABSTRACT: As an emerging biometric for people identification, the dorsal hand vein has received increasing attention in recent years due to the properties of being universal, unique, permanent, and contactless, and especially its simplicity of liveness detection and difficulty of forging. However, the dorsal hand vein is usually captured by near-infrared (NIR) sensors and the resulting image is of low contrast and shows a very sparse subcutaneous vascular network. Therefore, it does not offer sufficient distinctiveness in recognition particularly in the presence of large population. This paper proposes a novel approach to hand-dorsa vein recognition through matching local features of multiple sources. In contrast to current studies only concentrating on the hand vein network, we also make use of person dependent optical characteristics of the skin and subcutaneous tissue revealed by NIR hand-dorsa images and encode geometrical attributes of their landscapes, e.g., ridges, valleys, etc., through different quantities, such as cornerness and blobness, closely related to differential geometry. Specifically, the proposed method adopts an effective keypoint detection strategy to localize features on dorsal hand images, where the speciality of absorption and scattering of the entire dorsal hand is modeled as a combination of multiple (first-, second-, and third-) order gradients. These features comprehensively describe the discriminative clues of each dorsal hand. This method further robustly associates the corresponding keypoints between gallery and probe samples, and finally predicts the identity. Evaluated by extensive experiments, the proposed method achieves the best performance so far known on the North China University of Technology (NCUT) Part A dataset, showing its effectiveness. Additional results on NCUT Part B illustrate its generalization ability and robustness to low quality data.
    No preview · Article · Oct 2014 · IEEE Transactions on Cybernetics
  • Qingjie Liu · Yunhong Wang · Zhaoxiang Zhang
    ABSTRACT: Utilizing an implicit nonparametric learning framework, a neighbor-embedding-based method is proposed to solve the remote-sensing pan-sharpening problem. First, the original high-resolution (HR) and down-sampled panchromatic (Pan) images are used to train the high/low-resolution (LR) patch pair dictionaries. Based on the perspective of locally linear embedding, patches in LR and HR images form manifolds with similar local intrinsic structure in the corresponding feature space. Every patch in each multispectral (MS) image band is modeled by its K nearest neighbors in the patch set generated from the LR Pan image, and this model can be generalized to the HR condition. Then, the desired HR MS patch is reconstructed from the corresponding neighbors in the HR Pan patch set. Finally, HR MS images are recovered by stitching these patches together. Recognizing that the K nearest neighbors should have local geometric structures similar to the input query patch based on clustering, we employ a dominant orientation algorithm to perform such clustering. The K nearest neighbors of each input LR MS patch are adaptively chosen from the associate subdictionary. Four datasets of images acquired by QuickBird and IKONOS satellites are used to test the performance of the proposed method. Experimental results show that the proposed method performs well in preserving spectral information as well as spatial details. (C) 2014 Society of Photo-Optical Instrumentation Engineers (SPIE)
    No preview · Article · Sep 2014 · Optical Engineering
  • Full-text · Dataset · Sep 2014

Publication Stats

5k Citations
133.40 Total Impact Points


  • 2005-2015
    • Beihang University (BUAA)
      • State Key Laboratory for Virtual Reality Technology and Systems
      • School of Computer Science and Engineering
      Beijing, China
  • 2011
    • University of Alberta
      • Department of Computing Science
      Edmonton, Alberta, Canada
  • 2010-2011
    • Ecole Centrale de Lyon
      • Laboratoire d'Informatique en Image et Systèmes d'Informations (LIRIS)
      Rhône-Alpes, France
  • 1999-2005
    • Chinese Academy of Sciences
      • National Pattern Recognition Laboratory
      • Institute of Automation
      Beijing, China
  • 2004
    • National Space Science
      Beijing, China
  • 1997
    • Nanjing University of Science and Technology
      Nanjing, Jiangsu, China