Yunhong Wang

Beijing University of Aeronautics and Astronautics (Beihang University), Beijing, China

Publications (248) · 123.93 Total Impact

  • Hongyu Yang · Di Huang · Yunhong Wang · Heng Wang · Yuanyan Tang ·
    ABSTRACT: Face aging simulation has attracted increasing attention in recent years, yet it remains a challenge to generate convincing and natural age-progressed face images. In this paper, we present a novel approach to this problem using hidden factor analysis joint sparse representation. In contrast to the majority of work in the literature, which handles the facial texture integrally, the proposed aging approach separately models the person-specific facial properties, which tend to be stable over a relatively long period, and the age-specific clues, which change gradually over time. It then transforms only the age component to a target age group via sparse reconstruction to yield aging effects, and finally combines the result with the identity component to obtain the aged face. Experiments are carried out on three aging databases, and the results clearly demonstrate the effectiveness and robustness of the proposed method in rendering a face with aging effects. Additionally, a series of evaluations confirms its validity with respect to identity preservation and aging-effect generation.
  • Tao Xu · Zhaoxiang Zhang · Yunhong Wang ·
    ABSTRACT: In human-centric technologies, skin segmentation of body parts is a prerequisite for high-level processing. The traditional method of skin detection is pixel-wise detection coupled with morphological operations. Pixel-wise methods usually generate a number of false samples and outlier skin pixels, which can make it difficult for morphological operations to provide satisfactory results in complex scenarios. Furthermore, in many cases only a coarse region is required (e.g., the bounding-box of the face) rather than detailed pixel-wise labeling. A patch-wise skin segmentation method is proposed based on deep neural networks. Our method treats image patches as processing units instead of pixels, which directly exploits the spatial information of pixels in the detection stage rather than using morphological operations on isolated pixels after detection. An image patch dataset is built and deep skin models (DSMs) are trained based on the new dataset. Trained DSMs are then integrated into a sliding window framework to segment skin regions of the human body parts. Experiments on standard benchmarks demonstrate that DSMs provide more explicit skin region of interest candidates than pixel-wise methods in complex scenarios, and achieve competitive performance on pixel-wise skin detection.
    Journal of Electronic Imaging 08/2015; 24(4):043009. DOI:10.1117/1.JEI.24.4.043009 · 0.67 Impact Factor
  • Jiaxin Chen · Zhaoxiang Zhang · Yunhong Wang ·
    ABSTRACT: Person re-identification aims to match people across non-overlapping camera views, which is an important but challenging task in video surveillance. In order to obtain a robust metric for matching, metric learning has been introduced recently. Most existing works focus on seeking a Mahalanobis distance by employing sparse pairwise constraints, which utilize image pairs with the same person identity as positive samples, and select a small portion of those with different identities as negative samples. However, this training strategy has abandoned a large amount of discriminative information, and ignored the relative similarities. In this paper, we propose a novel Relevance Metric Learning method with Listwise Constraints (RMLLC) by adopting listwise similarities, which consist of the similarity list of each image with respect to all remaining images. By virtue of listwise similarities, RMLLC could capture all pairwise similarities, and consequently learn a more discriminative metric by enforcing the metric to conserve predefined similarity lists in a low dimensional projection subspace. Despite the performance enhancement, RMLLC using predefined similarity lists fails to capture the relative relevance information, which is often unavailable in practice. To address this problem, we further introduce a rectification term to automatically exploit the relative similarities, and develop an efficient alternating iterative algorithm to jointly learn the optimal metric and the rectification term. Extensive experiments on four publicly available benchmarking datasets are carried out and demonstrate that the proposed method is significantly superior to state-of-the-art approaches. The results also show that the introduction of the rectification term could further boost the performance of RMLLC.
    IEEE Transactions on Image Processing 08/2015; 24(12). DOI:10.1109/TIP.2015.2466117 · 3.63 Impact Factor
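The Mahalanobis distance this line of metric-learning work builds on can be sketched in a few lines. This is an illustrative NumPy fragment, not the authors' RMLLC code; the function names are mine:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y).

    M must be positive semi-definite for this to be a valid
    (pseudo-)metric; M = I recovers the squared Euclidean distance.
    """
    d = x - y
    return float(d @ M @ d)

def projected_sq(x, y, L):
    """Squared Euclidean distance after a linear projection L.

    With M = L^T L the two functions agree, which is why learning M
    amounts to learning a (possibly low-dimensional) projection subspace.
    """
    return float(np.sum((L @ x - L @ y) ** 2))
```

The factorization M = L^T L is what lets such methods speak of conserving similarity lists "in a low dimensional projection subspace": a rank-k L projects features into R^k before an ordinary Euclidean comparison.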
  •
    ABSTRACT: We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU-3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors in order to achieve efficiency and robustness. First, a large set of fiducial facial landmarks of 2D face images along with their 3D face scans are localized using a novel algorithm namely incremental Parallel Cascade of Linear Regression (iPar-CLR). Then, a novel Histogram of Second Order Gradients (HSOG) based local image descriptor in conjunction with the widely used first-order gradient based SIFT descriptor are used to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed using the first-order and the second-order surface differential geometry quantities, i.e., Histogram of mesh Gradients (meshHOG) and Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM) based recognition results of all 2D and 3D descriptors are fused at both feature-level and score-level to further improve the accuracy. Comprehensive experimental results demonstrate that there exist impressive complementary characteristics between the 2D and 3D descriptors. We use the BU-3DFE benchmark to compare our approach to the state-of-the-art ones. Our multimodal feature-based approach outperforms the others by achieving an average recognition accuracy of 86.32%. Moreover, a good generalization ability is shown on the Bosphorus database.
    Computer Vision and Image Understanding 07/2015; 140. DOI:10.1016/j.cviu.2015.07.005 · 1.54 Impact Factor
  • Huibin Li · Di Huang · Jean-Marie Morvan · Yunhong Wang · Liming Chen ·
    ABSTRACT: Registration algorithms performed on point clouds or range images of face scans have been successfully used for automatic 3D face recognition under expression variations, but have rarely been investigated to handle pose changes and occlusions, mainly because the basic landmarks needed to initialize coarse alignment are not always available. Recently, local feature-based SIFT-like matching has proven competent at handling all such variations without registration. In this paper, towards 3D face recognition for real-life biometric applications, we significantly extend the SIFT-like matching framework to mesh data and propose a novel approach using fine-grained matching of 3D keypoint descriptors. First, two principal curvature-based 3D keypoint detectors are provided, which can repeatedly identify complementary locations on a face scan where local curvatures are high. Then, a robust 3D local coordinate system is built at each keypoint, which allows extraction of pose-invariant features. Three keypoint descriptors, corresponding to three surface differential quantities, are designed, and their feature-level fusion is employed to comprehensively describe local shapes of detected keypoints. Finally, we propose a multi-task sparse representation based fine-grained matching algorithm, which accounts for the average reconstruction error of probe face descriptors sparsely represented by a large dictionary of gallery descriptors in identification. Our approach is evaluated on the Bosphorus database and achieves rank-one recognition rates of 96.56%, 98.82%, 91.14%, and 99.21% on the entire database and on the expression, pose, and occlusion subsets, respectively. To the best of our knowledge, these are the best results reported so far on this database. Additionally, good generalization ability is also exhibited by the experiments on the FRGC v2.0 database.
    International Journal of Computer Vision 06/2015; 113(2). DOI:10.1007/s11263-014-0785-6 · 3.81 Impact Factor
  • Zheng Liu · Zhaoxiang Zhang · Qiang Wu · Yunhong Wang ·
    ABSTRACT: Person re-identification is an important problem for associating the behavior of people monitored in surveillance camera networks. Its fundamental challenges are the large appearance distortions caused by view angles, illumination, and occlusions. To address these challenges, a method is proposed in this paper to enhance person re-identification by integrating a gait biometric. The proposed framework consists of hierarchical feature extraction and descriptor matching with learned metric matrices. Since the appearance feature is not discriminative in some cases, the feature used in this work combines appearance features with a gait feature carrying shape and temporal information. To cope with view-angle changes and to measure similarity, data are mapped into a metric space so that distances between people can be measured more accurately. Two fusion strategies are then adopted. Score-level fusion computes distances on the appearance feature and the gait feature separately and combines them into the final distance between samples. Feature-level fusion first concatenates the two types of features and then computes distances on the fused feature. Finally, our method is tested on the CASIA gait dataset. Experiments show that integrating the gait biometric is an effective way to enhance person re-identification.
    Neurocomputing 05/2015; 168. DOI:10.1016/j.neucom.2015.05.008 · 2.08 Impact Factor
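The two fusion strategies described in the abstract can be sketched generically; this is a simplified illustration (function names mine), not the paper's implementation, which additionally applies learned metric matrices:

```python
import numpy as np

def score_level_fusion(d_app, d_gait, w=0.5):
    """Combine the per-modality distances into a single matching score."""
    return w * d_app + (1.0 - w) * d_gait

def feature_level_fusion(f_app, f_gait):
    """Concatenate the two feature vectors; one distance is then
    computed on the fused vector."""
    return np.concatenate([f_app, f_gait])
```

With plain Euclidean distances the two coincide up to weighting: the squared distance between concatenated vectors is exactly the sum of the per-modality squared distances, and a learned metric changes that balance per dimension.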
  • Chunlei Li · Zhaoxiang Zhang · Yunhong Wang · Bin Ma · Di Huang ·
    ABSTRACT: This paper proposes a wavelet quantization based method for robust watermarking with resistance to incidental distortions. For transform-domain watermarking algorithms, blindly localizing adequate significant coefficients is a critical issue for guaranteeing robustness while preserving good fidelity. In the proposed method, low-frequency wavelet coefficients of the host image are randomly permuted into sub-groups according to a watermarking secret key. Embedding modifications are then distributed to important coefficients, which preserve large perceptual capacity, by quantizing the significant amplitude difference (SAD). Meanwhile, a dither modulation strategy is employed to control the quantization artifacts and increase robustness. In such a framework, blind watermark extraction can be achieved straightforwardly with the watermarking secret keys, which are shared only by the embedder and extractor for advanced security. Numerous comparison experiments are conducted to evaluate the watermarking performance. Experimental results demonstrate the superiority of our scheme in robustness against content-preserving operations and incidental distortions such as JPEG compression and Gaussian noise.
    Neurocomputing 04/2015; 166. DOI:10.1016/j.neucom.2015.03.039 · 2.08 Impact Factor
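The dither modulation idea mentioned above can be sketched as scalar quantization index modulation (QIM): a coefficient is quantized onto one of two interleaved lattices depending on the watermark bit. A minimal sketch under that standard QIM form, not the paper's exact SAD-based scheme:

```python
import numpy as np

def qim_embed(coeff, bit, step):
    """Quantize a coefficient onto the lattice for `bit` (0 or 1).

    The two lattices are offset by step/2, so the embedded bit can be
    read back blindly as long as noise stays below step/4.
    """
    dither = 0.0 if bit == 0 else step / 2.0
    return step * np.round((coeff - dither) / step) + dither

def qim_extract(coeff, step):
    """Blind decode: pick the lattice whose nearest point is closer."""
    d0 = abs(coeff - qim_embed(coeff, 0, step))
    d1 = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if d0 <= d1 else 1
```

A larger `step` buys robustness against distortions like JPEG compression at the cost of larger (more visible) embedding modifications, which is the fidelity trade-off the abstract refers to.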
  • Dawei Weng · Yunhong Wang · Mingming Gong · Dacheng Tao · Hui Wei · Di Huang ·
    ABSTRACT: Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and its information representation supports vision tasks on both ventral and dorsal pathways. In this paper, a new local image descriptor, termed Distinctive Efficient Robust Features, or DERF, is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells (P-GCs) in the primate retina. DERF features exponential scale distribution, exponential grid structure, and circularly symmetric function Difference of Gaussian (DoG) used as a convolution kernel, all of which are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames. DoG is naturally a wavelet, and the structure of the grid points array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we modulate the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly. This is verified by designing a tight frame DoG (TF-DoG) which leads to much better performance. Extensive experiments conducted in the image matching task on the Multiview Stereo Correspondence Data set demonstrate that DERF outperforms state of the art methods for both hand-crafted and learned descriptors, while remaining robust and being much faster to compute.
    IEEE Transactions on Image Processing 03/2015; 24(8). DOI:10.1109/TIP.2015.2409739 · 3.63 Impact Factor
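The circularly symmetric Difference-of-Gaussians convolution kernel used by DERF can be written down directly; an illustrative NumPy version (function names mine):

```python
import numpy as np

def gaussian_2d(size, sigma):
    """Normalized 2D Gaussian kernel on a size x size grid."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def dog_kernel(size, sigma, k=1.6):
    """Difference of Gaussians: a band-pass, circularly symmetric
    kernel; varying sigma over an exponential scale distribution
    gives the multi-scale responses the descriptor pools."""
    return gaussian_2d(size, k * sigma) - gaussian_2d(size, sigma)
```

Because both Gaussians are normalized, the DoG kernel has zero DC response, i.e., it ignores constant image regions and responds to local contrast, consistent with its role as a wavelet-like band-pass filter.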
  • Yunhong Wang · Jiaxin Chen · Ningning Liu · Li Zhang ·
    ABSTRACT: In this working note, we mainly focus on the image annotation subtask of the ImageCLEF 2015 challenge, in which the BUAA-iCC research group participated. For this task, we firstly explore textual similarity information between each test sample and each predefined concept. Subsequently, two different kinds of semantic information are extracted from visual images: visual tags obtained from generic object recognition classifiers and visual tags relevant to human-related concepts. For the former, the visual tags are predicted using a deep convolutional neural network (CNN) and a set of support vector machines trained on ImageNet, and finally transferred to textual information. For the latter, human-related concepts are extracted via face and facial attribute detection, and finally transferred to similarity information using manually designed mapping rules, in order to enhance the performance of annotating human-related concepts. Meanwhile, a late fusion strategy is developed to incorporate the aforementioned kinds of similarity information. Results validate that the combination of the textual and visual similarity information and the adopted late fusion strategy yields significantly better performance.
    CLEF (Working Notes); 01/2015
  • Di Huang · Jia Sun · Xudong Yang · Dawei Weng · Yunhong Wang ·
    ABSTRACT: In the past decade, research on 3D face analysis has been extensively developed, and this study briefly reviews the progress achieved in data acquisition, algorithms, and experimental methodologies, for the issues of face recognition, facial expression recognition, gender and ethnicity classification, age estimation, etc., especially focusing on that after the availability of FRGC v2.0. It further points out several challenges to deal with for more efficient and reliable systems in the real world.
    Biometric Recognition, 11/2014: pages 1-21;
  •
    ABSTRACT: Most existing pose-independent Face Recognition (FR) techniques take advantage of a 3D model to guarantee naturalness while normalizing or simulating pose variations. Two nontrivial problems to be tackled are accurate measurement of pose parameters and computational efficiency. In this paper, we introduce an effective and efficient approach to estimate human head pose, which fundamentally ameliorates the performance of 3D-aided FR systems. The proposed method works in a progressive way: firstly, a random forest (RF) is constructed utilizing synthesized images derived from 3D models; secondly, the classification result obtained by applying the well-trained RF to a probe image is taken as the preliminary pose estimate; finally, this initial pose is transferred to a shape-based 3D morphable model (3DMM) for definitive pose normalization. Using such a method, similarity scores between the frontal-view gallery set and the pose-normalized probe set can be computed to predict the identity. Experimental results achieved on the UHDB dataset outperform those reported so far. Additionally, the method is much less time-consuming than prevailing 3DMM-based approaches.
    International Conference on Image Processing, Paris; 10/2014
  • Di Huang · Yinhang Tang · Yiding Wang · Liming Chen · Yunhong Wang ·
    ABSTRACT: As an emerging biometric for people identification, the dorsal hand vein has received increasing attention in recent years due to the properties of being universal, unique, permanent, and contactless, and especially its simplicity of liveness detection and difficulty of forging. However, the dorsal hand vein is usually captured by near-infrared (NIR) sensors, and the resulting image is of low contrast and shows a very sparse subcutaneous vascular network. Therefore, it does not offer sufficient distinctiveness in recognition, particularly in the presence of a large population. This paper proposes a novel approach to hand-dorsa vein recognition through matching local features of multiple sources. In contrast to current studies that concentrate only on the hand vein network, we also make use of person-dependent optical characteristics of the skin and subcutaneous tissue revealed by NIR hand-dorsa images and encode geometrical attributes of their landscapes, e.g., ridges, valleys, etc., through different quantities, such as cornerness and blobness, closely related to differential geometry. Specifically, the proposed method adopts an effective keypoint detection strategy to localize features on dorsal hand images, where the specific absorption and scattering behavior of the entire dorsal hand is modeled as a combination of multiple (first-, second-, and third-) order gradients. These features comprehensively describe the discriminative clues of each dorsal hand. The method further robustly associates corresponding keypoints between gallery and probe samples, and finally predicts the identity. Evaluated by extensive experiments, the proposed method achieves the best performance reported so far on the North China University of Technology (NCUT) Part A dataset, showing its effectiveness. Additional results on NCUT Part B illustrate its generalization ability and robustness to low-quality data.
    IEEE Transactions on Cybernetics 10/2014; 45(9). DOI:10.1109/TCYB.2014.2360894 · 3.47 Impact Factor
  • Qingjie Liu · Yunhong Wang · Zhaoxiang Zhang ·
    ABSTRACT: Utilizing an implicit nonparametric learning framework, a neighbor-embedding-based method is proposed to solve the remote-sensing pan-sharpening problem. First, the original high-resolution (HR) and down-sampled panchromatic (Pan) images are used to train the high/low-resolution (LR) patch pair dictionaries. Based on the perspective of locally linear embedding, patches in LR and HR images form manifolds with similar local intrinsic structure in the corresponding feature space. Every patch in each multispectral (MS) image band is modeled by its K nearest neighbors in the patch set generated from the LR Pan image, and this model can be generalized to the HR condition. Then, the desired HR MS patch is reconstructed from the corresponding neighbors in the HR Pan patch set. Finally, HR MS images are recovered by stitching these patches together. Recognizing that the K nearest neighbors should have local geometric structures similar to the input query patch based on clustering, we employ a dominant orientation algorithm to perform such clustering. The K nearest neighbors of each input LR MS patch are adaptively chosen from the associate subdictionary. Four datasets of images acquired by QuickBird and IKONOS satellites are used to test the performance of the proposed method. Experimental results show that the proposed method performs well in preserving spectral information as well as spatial details.
    Optical Engineering 09/2014; 53(9):093109. DOI:10.1117/1.OE.53.9.093109 · 0.95 Impact Factor
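The locally-linear-embedding step described above can be sketched: solve for reconstruction weights of a query LR patch over its K nearest LR neighbors, then apply the same weights to the corresponding HR patches. A simplified sketch under the usual LLE assumptions (function names mine; the actual method additionally clusters neighbors by dominant orientation):

```python
import numpy as np

def lle_weights(query, neighbors, reg=1e-8):
    """Weights w (summing to 1) minimizing ||query - w @ neighbors||^2.

    `neighbors` is a (K, d) array of the K nearest LR patches, flattened.
    """
    D = neighbors - query              # local differences, (K, d)
    G = D @ D.T                        # local Gram matrix, (K, K)
    G = G + reg * np.eye(len(G))       # regularize: G is often singular
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()                 # enforce the sum-to-one constraint

def reconstruct_hr_patch(w, hr_neighbors):
    """Transfer the LR-domain weights to the corresponding HR patches."""
    return w @ hr_neighbors
```

The manifold assumption in the abstract is exactly that these weights, fitted in the LR patch space, remain valid in the HR patch space.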
  • Di Huang · W. Ben Soltana · M. Ardabilian · Yunhong Wang · Liming Chen ·

  • Di Huang · Chao Zhu · Yunhong Wang · Liming Chen ·
    ABSTRACT: Recent investigations of human vision show that the retinal image is a landscape or geometric surface, consisting of features such as ridges and summits. However, most existing popular local image descriptors in the literature, e.g., SIFT, HOG, DAISY, LBP, and GLOH, employ only first-order gradient information, related to the slope and the elasticity, i.e., length, area, etc., of a surface, and thereby only partially characterize the geometric properties of a landscape. In this paper, we introduce a novel and powerful local image descriptor that extracts Histograms of Second Order Gradients, namely HSOG, to capture the curvature-related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, etc. We conduct comprehensive experiments on three different applications: local image matching, visual object categorization (VOC), and scene classification. The experimental results clearly demonstrate the discriminative power of HSOG compared with its first-order gradient based counterparts, e.g., SIFT, HOG, DAISY, and CSLBP, and its complementarity in terms of image representation, demonstrating the effectiveness of the proposed local descriptor.
    IEEE Transactions on Image Processing 09/2014; 23(11). DOI:10.1109/TIP.2014.2353814 · 3.63 Impact Factor
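The first- versus second-order distinction can be illustrated numerically: on a plane (constant slope), first-order gradients are constant and non-zero while second-order gradients vanish; curvature only appears at second order. An illustrative fragment (function names mine), not the HSOG implementation:

```python
import numpy as np

def gradient_orders(img):
    """Return the first-order magnitude and the four second-order maps."""
    img = img.astype(float)
    gy, gx = np.gradient(img)          # first-order (what SIFT/HOG histogram)
    mag1 = np.hypot(gx, gy)
    gyy, gyx = np.gradient(gy)         # second-order (what HSOG histograms)
    gxy, gxx = np.gradient(gx)
    return mag1, (gxx, gxy, gyx, gyy)

def orientation_histogram(gx, gy, n_bins=8):
    """Magnitude-weighted orientation histogram, the basic HOG-style cell."""
    ang = np.arctan2(gy, gx) % (2.0 * np.pi)
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 2.0 * np.pi),
                           weights=mag)
    return hist
```

Pooling such histograms over the second-order maps, rather than only the first-order one, is the sense in which HSOG captures cliffs, ridges, and valleys that slope-only descriptors miss.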
  • Zhaoxiang Zhang · Yunhong Wang · Zeda Zhang ·
    ABSTRACT: This paper presents a tensor analysis based method to synthesize an artificial high-resolution (HR) visible light (VIS) face image from a low-resolution (LR) near-infrared (NIR) input image captured in challenging operating environments. Active NIR imaging has been widely employed as a viable means to avoid dramatic illumination changes in outdoor circumstances. However, it exhibits discrepant photic properties in comparison with VIS imaging, and the captured images may suffer from limited quality and low resolution resulting from uncontrolled environments and challenging imaging conditions. Based on the Lambertian reflection model and a linear observation model, we derive the framework of our approach: a tensor structure based super-resolution (SR) method is employed to transform the heterogeneous face data into uniform subspaces and conduct SR in feature space with maximum a posteriori (MAP) estimation; a discrete wavelet transform (DWT) based fusion scheme is adopted to reduce the noise and compensate for the information loss in the tensor transformation. Experiments are conducted on our database collected with a JAI AD-080 multispectral camera (AD-080CL JAI, 2007) [1]. Compared with two state-of-the-art algorithms, KNN (Liu et al., 2005) [2] and LBP-KNN (Chen et al., 2009) [3], our approach shows better robustness to moderate pose and expression variations, and outstanding efficiency in dealing with images of poor quality and low resolution.
    Neurocomputing 09/2014; 140:146–154. DOI:10.1016/j.neucom.2014.03.028 · 2.08 Impact Factor
  • Bin Ma · Yunhong Wang · Chunlei Li · Zhaoxiang Zhang · Di Huang ·
    ABSTRACT: As malicious attacks greatly threaten the security and reliability of biometric systems, ensuring the authenticity of biometric data is becoming increasingly important. In this paper, we propose a watermarking-based two-stage authentication framework to address this problem. During data collection, face features are embedded into a fingerprint image of the same individual as a data credibility token and secondary authentication source. At the first stage of authentication, the credibility of input data is established by checking the validity of extracted patterns. Due to the specific characteristics of face watermarks, face detection based classification strategies are introduced for reliable watermark verification instead of conventional correlation based watermark detection. If authentic, the face patterns can further serve as supplemental identity information to facilitate subsequent biometric authentication. In this framework, one critical issue is to guarantee the robustness and capacity of the watermark while preserving the discriminating features of the host fingerprints. Hence, a wavelet quantization based watermarking approach is proposed to adaptively distribute watermark energy on significant DWT coefficients of fingerprint images. Experimental results evaluating both watermarking and biometric authentication performance demonstrate the effectiveness of this work.
    Multimedia Tools and Applications 09/2014; 72(1). DOI:10.1007/s11042-013-1372-5 · 1.35 Impact Factor
  • Yunhong Wang · Zhaoxiang Zhang · Kaiyue Wang · Haoran Deng · Bin Ma ·
    ABSTRACT: We address the problem of on-line signature representation and verification. A novel strategy is proposed in this paper that combines texture based image analysis and spatio-temporal representation. Firstly, a correlation based method is proposed to describe spatio-temporal information between sampling points, which is then converted to traditional 2D intensity images. Secondly, a rich set of texture analysis methods is adopted to construct effective features for high-accuracy verification. Furthermore, a template selection strategy based on intra-class variations is presented to further enhance the performance of signature verification. Extensive experiments are conducted on the SVC2004 database, and the results demonstrate the promising performance of our proposed methods.
    Multimedia Tools and Applications 09/2014; 72(1). DOI:10.1007/s11042-013-1408-x · 1.35 Impact Factor
  • Gaopeng Gou · Di Huang · Yunhong Wang ·
    ABSTRACT: Video-based face recognition has attracted much attention and made great progress in the past decade. However, it still encounters two main problems: efficiently representing faces in frames and sufficiently exploiting temporal-spatial constraints between frames. The authors investigate the existing real-time features for face description and compare their performance. Moreover, a novel approach is proposed to model temporal-spatial information, which is then combined with real-time features to further enforce the consistency constraints between frames and improve the recognition performance. The experiments are validated on three video face databases, and the results demonstrate that temporal-spatial cues combined with the most powerful real-time features largely improve the recognition rate.
    IET Computer Vision 08/2014; 8(4):347-357. DOI:10.1049/iet-cvi.2013.0025 · 0.96 Impact Factor
  • Qingjie Liu · Yunhong Wang · Zhaoxiang Zhang · Lining Liu ·
    ABSTRACT: Pan-sharpening is a technique that provides an efficient and economical solution for generating multi-spectral (MS) images with high spatial resolution by fusing the spectral information in MS images with the spatial information in a panchromatic (PAN) image. In this study, the authors propose a new pan-sharpening method based on weighted red-black (WRB) wavelets and adaptive principal component analysis (PCA), where WRB wavelet decomposition is used to extract the spatial details in the PAN image and adaptive PCA is used to select the adequate principal component for injecting spatial details. WRB wavelets are data-dependent second-generation wavelets. Multi-resolution analysis (MRA) based on the WRB wavelet transform shows better de-correlation of the data compared with common linear translation-invariant MRA, which makes it suitable for applications requiring manipulation of image details. A local processing strategy is introduced to reduce artefact effects and spectral distortions in the pan-sharpened images. The proposed method is evaluated on datasets acquired by the QuickBird, IKONOS, and Landsat-7 ETM+ satellites and compared with existing methods. Experimental results demonstrate that the authors' method can provide promising fused MS images with high spatial resolution.
    IET Image Processing 08/2014; 8(8):477-488. DOI:10.1049/iet-ipr.2013.0279 · 0.75 Impact Factor
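The component-substitution idea behind PCA-based pan-sharpening can be sketched: project the MS bands onto principal components, replace the first component with the histogram-matched Pan band, and invert the projection. A plain-PCA sketch (function name mine); the paper's method instead selects the component adaptively and injects WRB wavelet details rather than substituting outright:

```python
import numpy as np

def pca_pansharpen(ms, pan):
    """Component-substitution pan-sharpening sketch.

    ms:  (H, W, B) multispectral cube; pan: (H, W) panchromatic band,
    assumed already resampled to the same grid.
    """
    H, W, B = ms.shape
    X = ms.reshape(-1, B).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # principal axes from the SVD of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Xc @ Vt.T
    # match pan's mean/std to the first PC before substituting it
    p = pan.reshape(-1).astype(float)
    p = (p - p.mean()) / (p.std() + 1e-12) * pcs[:, 0].std() + pcs[:, 0].mean()
    pcs[:, 0] = p
    fused = pcs @ Vt + mean
    return fused.reshape(H, W, B)
```

Because the first principal component carries most of the shared spatial structure, swapping it for the Pan band injects spatial detail while the remaining components preserve the spectral relationships between bands.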

Publication Stats

5k Citations
123.93 Total Impact Points


  • 2005-2015
    • Beijing University of Aeronautics and Astronautics (Beihang University)
      • State Key Laboratory for Virtual Reality Technology and Systems
      • School of Computer Science and Engineering
      Beijing, China
  • 2011
    • University of Alberta
      Edmonton, Alberta, Canada
  • 2010-2011
    • Ecole Centrale de Lyon
      • Laboratoire d'Informatique en Image et Systèmes d'Informations (LIRIS)
      Rhône-Alpes, France
  • 1999-2005
    • Chinese Academy of Sciences
      • Institute of Automation
      • National Laboratory of Pattern Recognition
      Beijing, China
  • 2004
    • National Space Science
      Beijing, China
  • 1997
    • Nanjing University of Science and Technology
      Nanjing, Jiangsu, China