Yunhong Wang

Beijing University of Aeronautics and Astronautics (Beihang University), Beijing, China

Publications (243) · 118.77 Total impact

  • Tao Xu · Zhaoxiang Zhang · Yunhong Wang
    Journal of Electronic Imaging 08/2015; 24(4):043009. DOI:10.1117/1.JEI.24.4.043009 · 0.67 Impact Factor
  • Jiaxin Chen · Zhaoxiang Zhang · Yunhong Wang
    ABSTRACT: Person re-identification aims to match people across non-overlapping camera views, which is an important but challenging task in video surveillance. In order to obtain a robust metric for matching, metric learning has been introduced recently. Most existing works focus on seeking a Mahalanobis distance by employing sparse pairwise constraints, which utilize image pairs with the same person identity as positive samples, and select a small portion of those with different identities as negative samples. However, this training strategy has abandoned a large amount of discriminative information, and ignored the relative similarities. In this paper, we propose a novel Relevance Metric Learning method with Listwise Constraints (RMLLC) by adopting listwise similarities, which consist of the similarity list of each image with respect to all remaining images. By virtue of listwise similarities, RMLLC could capture all pairwise similarities, and consequently learn a more discriminative metric by enforcing the metric to conserve predefined similarity lists in a low dimensional projection subspace. Despite the performance enhancement, RMLLC using predefined similarity lists fails to capture the relative relevance information, which is often unavailable in practice. To address this problem, we further introduce a rectification term to automatically exploit the relative similarities, and develop an efficient alternating iterative algorithm to jointly learn the optimal metric and the rectification term. Extensive experiments on four publicly available benchmarking datasets are carried out and demonstrate that the proposed method is significantly superior to state-of-the-art approaches. The results also show that the introduction of the rectification term could further boost the performance of RMLLC.
    IEEE Transactions on Image Processing 08/2015; 24(12). DOI:10.1109/TIP.2015.2466117 · 3.63 Impact Factor
  • Computer Vision and Image Understanding 07/2015; DOI:10.1016/j.cviu.2015.07.005 · 1.54 Impact Factor
  • Huibin Li · Di Huang · Jean-Marie Morvan · Yunhong Wang · Liming Chen
    ABSTRACT: Registration algorithms performed on point clouds or range images of face scans have been successfully used for automatic 3D face recognition under expression variations, but have rarely been investigated to solve pose changes and occlusions, mainly because the basic landmarks needed to initialize coarse alignment are not always available. Recently, local feature-based SIFT-like matching has proved competent to handle all such variations without registration. In this paper, towards 3D face recognition for real-life biometric applications, we significantly extend the SIFT-like matching framework to mesh data and propose a novel approach using fine-grained matching of 3D keypoint descriptors. First, two principal curvature-based 3D keypoint detectors are provided, which can repeatedly identify complementary locations on a face scan where local curvatures are high. Then, a robust 3D local coordinate system is built at each keypoint, which allows extraction of pose-invariant features. Three keypoint descriptors, corresponding to three surface differential quantities, are designed, and their feature-level fusion is employed to comprehensively describe local shapes of detected keypoints. Finally, we propose a multi-task sparse representation based fine-grained matching algorithm, which accounts for the average reconstruction error of probe face descriptors sparsely represented by a large dictionary of gallery descriptors in identification. Our approach is evaluated on the Bosphorus database and achieves rank-one recognition rates of 96.56%, 98.82%, 91.14%, and 99.21% on the entire database, and the expression, pose, and occlusion subsets, respectively. To the best of our knowledge, these are the best results reported so far on this database. Additionally, good generalization ability is also exhibited by the experiments on the FRGC v2.0 database.
    International Journal of Computer Vision 06/2015; 113(2). DOI:10.1007/s11263-014-0785-6 · 3.81 Impact Factor
  • Zheng Liu · Zhaoxiang Zhang · Qiang Wu · Yunhong Wang
    ABSTRACT: Person re-identification is an important problem for associating behavior of people monitored in surveillance camera networks. The fundamental challenges of person re-identification are the large appearance distortions caused by view angles, illumination and occlusions. To address these challenges, a method is proposed in this paper to enhance person re-identification by integrating gait biometrics. The proposed framework consists of hierarchical feature extraction and descriptor matching with learned metric matrices. Considering that the appearance feature is not discriminative in some cases, the feature in this work is composed of the appearance features and the gait feature for shape and temporal information. In order to solve the view-angle change problem and measure similarity, data are mapped into a metric space so that distances between people can be measured more accurately. Then two fusion strategies are adopted. The score-level fusion computes distances on the appearance feature and the gait feature, respectively, and combines them as the final distance between samples. The feature-level fusion first concatenates the two types of features and then computes distances on the fused feature. Finally, our method is tested on the CASIA gait dataset. Experiments show that integrating gait biometrics is an effective way to enhance person re-identification.
    Neurocomputing 05/2015; 168. DOI:10.1016/j.neucom.2015.05.008 · 2.08 Impact Factor
  • Chunlei Li · Zhaoxiang Zhang · Yunhong Wang · Bin Ma · Di Huang
    ABSTRACT: This paper proposes a wavelet quantization based method for robust watermarking with resistance to incidental distortions. For transform domain based watermarking algorithms, blindly localizing adequate significant coefficients is one critical issue to guarantee the robustness while preserving good fidelity. In the proposed method, low frequency wavelet coefficients of the host image are randomly permuted into sub-groups according to a watermarking secret key. Embedding modifications are then distributed to important coefficients which preserve large perceptual capacity by quantizing the significant amplitude difference (SAD). Meanwhile, a dither modulation strategy is employed to control the quantization artifacts and increase the robustness. In such a framework, the blind watermark extraction can be straightforwardly achieved with the watermarking secret keys, which are shared only by the embedder and extractor for advanced security. Numerous comparison experiments are conducted to evaluate the watermarking performance. Experimental results demonstrate the superiority of our scheme on robustness against content-preserving operations and incidental distortions such as JPEG compression and Gaussian noise.
    Neurocomputing 04/2015; 166. DOI:10.1016/j.neucom.2015.03.039 · 2.08 Impact Factor
  • Dawei Weng · Yunhong Wang · Mingming Gong · Dacheng Tao · Hui Wei · Di Huang
    ABSTRACT: Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and its information representation supports vision tasks on both ventral and dorsal pathways. In this paper, a new local image descriptor, termed Distinctive Efficient Robust Features, or DERF, is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells (P-GCs) in the primate retina. DERF features exponential scale distribution, exponential grid structure, and circularly symmetric function Difference of Gaussian (DoG) used as a convolution kernel, all of which are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames. DoG is naturally a wavelet, and the structure of the grid points array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we modulate the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly. This is verified by designing a tight frame DoG (TF-DoG) which leads to much better performance. Extensive experiments conducted in the image matching task on the Multiview Stereo Correspondence Data set demonstrate that DERF outperforms state of the art methods for both hand-crafted and learned descriptors, while remaining robust and being much faster to compute.
    IEEE Transactions on Image Processing 03/2015; 24(8). DOI:10.1109/TIP.2015.2409739 · 3.63 Impact Factor
  • Di Huang · Jia Sun · Xudong Yang · Dawei Weng · Yunhong Wang
    ABSTRACT: In the past decade, research on 3D face analysis has been extensively developed, and this study briefly reviews the progress achieved in data acquisition, algorithms, and experimental methodologies, for the issues of face recognition, facial expression recognition, gender and ethnicity classification, age estimation, etc., especially focusing on that after the availability of FRGC v2.0. It further points out several challenges to deal with for more efficient and reliable systems in the real world.
    Biometric Recognition, 11/2014: pages 1-21
  • ABSTRACT: Most existing pose-independent Face Recognition (FR) techniques take advantage of 3D model to guarantee the naturalness while normalizing or simulating pose variations. Two nontrivial problems to be tackled are accurate measurement of pose parameters and computational efficiency. In this paper, we introduce an effective and efficient approach to estimate human head pose, which fundamentally ameliorates the performance of 3D aided FR systems. The proposed method works in a progressive way: firstly, a random forest (RF) is constructed utilizing synthesized images derived from 3D models; secondly, the classification result obtained by applying well-trained RF on a probe image is considered as the preliminary pose estimation; finally, this initial pose is transferred to shape-based 3D morphable model (3DMM) aiming at definitive pose normalization. Using such a method, similarity scores between frontal view gallery set and pose-normalized probe set can be computed to predict the identity. Experimental results achieved on the UHDB dataset outperform the ones so far reported. Additionally, it is much less time-consuming than prevailing 3DMM based approaches.
    International Conference on Image Processing, Paris; 10/2014
  • Di Huang · Yinhang Tang · Yiding Wang · Liming Chen · Yunhong Wang
    ABSTRACT: As an emerging biometric for people identification, the dorsal hand vein has received increasing attention in recent years due to the properties of being universal, unique, permanent, and contactless, and especially its simplicity of liveness detection and difficulty of forging. However, the dorsal hand vein is usually captured by near-infrared (NIR) sensors and the resulting image is of low contrast and shows a very sparse subcutaneous vascular network. Therefore, it does not offer sufficient distinctiveness in recognition particularly in the presence of large population. This paper proposes a novel approach to hand-dorsa vein recognition through matching local features of multiple sources. In contrast to current studies only concentrating on the hand vein network, we also make use of person dependent optical characteristics of the skin and subcutaneous tissue revealed by NIR hand-dorsa images and encode geometrical attributes of their landscapes, e.g., ridges, valleys, etc., through different quantities, such as cornerness and blobness, closely related to differential geometry. Specifically, the proposed method adopts an effective keypoint detection strategy to localize features on dorsal hand images, where the speciality of absorption and scattering of the entire dorsal hand is modeled as a combination of multiple (first-, second-, and third-) order gradients. These features comprehensively describe the discriminative clues of each dorsal hand. This method further robustly associates the corresponding keypoints between gallery and probe samples, and finally predicts the identity. Evaluated by extensive experiments, the proposed method achieves the best performance so far known on the North China University of Technology (NCUT) Part A dataset, showing its effectiveness. Additional results on NCUT Part B illustrate its generalization ability and robustness to low quality data.
    IEEE Transactions on Cybernetics 10/2014; 45(9). DOI:10.1109/TCYB.2014.2360894 · 3.47 Impact Factor
  • Qingjie Liu · Yunhong Wang · Zhaoxiang Zhang
    ABSTRACT: Utilizing an implicit nonparametric learning framework, a neighbor-embedding-based method is proposed to solve the remote-sensing pan-sharpening problem. First, the original high-resolution (HR) and down-sampled panchromatic (Pan) images are used to train the high/low-resolution (LR) patch pair dictionaries. Based on the perspective of locally linear embedding, patches in LR and HR images form manifolds with similar local intrinsic structure in the corresponding feature space. Every patch in each multispectral (MS) image band is modeled by its K nearest neighbors in the patch set generated from the LR Pan image, and this model can be generalized to the HR condition. Then, the desired HR MS patch is reconstructed from the corresponding neighbors in the HR Pan patch set. Finally, HR MS images are recovered by stitching these patches together. Recognizing that the K nearest neighbors should have local geometric structures similar to the input query patch based on clustering, we employ a dominant orientation algorithm to perform such clustering. The K nearest neighbors of each input LR MS patch are adaptively chosen from the associate subdictionary. Four datasets of images acquired by QuickBird and IKONOS satellites are used to test the performance of the proposed method. Experimental results show that the proposed method performs well in preserving spectral information as well as spatial details. (C) 2014 Society of Photo-Optical Instrumentation Engineers (SPIE)
    Optical Engineering 09/2014; 53(9):093109. DOI:10.1117/1.OE.53.9.093109 · 0.95 Impact Factor
  • Di Huang · Chao Zhu · Yunhong Wang · Liming Chen
    ABSTRACT: Recent investigations on human vision discover that the retinal image is a landscape or a geometric surface, consisting of features such as ridges and summits. However, most of existing popular local image descriptors in the literature, e.g., SIFT, HOG, DAISY, LBP, and GLOH, only employ the first order gradient information related to the slope and the elasticity, i.e. length, area, etc. of a surface, and thereby partially characterize the geometric properties of a landscape. In this paper, we introduce a novel and powerful local image descriptor that extracts the Histograms of Second Order Gradients, namely HSOG, to capture the curvature related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, etc. We conduct comprehensive experiments on three different applications including the problem of local image matching, visual object categorization (VOC), and scene classification. The experimental results clearly evidence the discriminative power of HSOG as compared with its first order gradient based counterparts, e.g., SIFT, HOG, DAISY, and CSLBP, and the complementarity in terms of image representation, demonstrating the effectiveness of the proposed local descriptor.
    IEEE Transactions on Image Processing 09/2014; 23(11). DOI:10.1109/TIP.2014.2353814 · 3.63 Impact Factor
  • Zhaoxiang Zhang · Yunhong Wang · Zeda Zhang
    ABSTRACT: This paper presents a tensor analysis based method to synthesize an artificial high-resolution (HR) visual light (VIS) face image from a low-resolution (LR) near-infrared (NIR) input image captured from challenging operating environments. As we know, active NIR imaging has been widely employed as viable means to avoid dramatic illumination changes in outdoor circumstances. However, it exhibits discrepant photic properties in comparison with VIS imaging, and the captured images may suffer from limited quality and low resolutions resulted from uncontrolled environments and challenging imaging conditions. Based on the Lambertian reflection model and a linear observation model, we derived the framework of our approach: a tensor structure based super-resolution (SR) method is employed to transform the heterogeneous face data into uniform subspaces and conduct SR in feature space with maximum a posteriori (MAP) estimation; a discrete wavelet transform (DWT) based fusion scheme is adopted to reduce the noise and compensate for the information loss in the tensor transformation. Experiments are conducted on our collected database with JAI AD-080 multispectral camera (AD-080CL JAI, 2007) [1]. Compared to the two state-of-the-art algorithms, KNN (Liu et al., 2005) [2] and LBP-KNN (Chen et al., 2009) [3], our approach shows better robustness to moderate pose and expression variations, and outstanding efficiency in dealing with images of poor quality and low resolutions.
    Neurocomputing 09/2014; 140:146–154. DOI:10.1016/j.neucom.2014.03.028 · 2.08 Impact Factor
  • Bin Ma · Yunhong Wang · Chunlei Li · Zhaoxiang Zhang · Di Huang
    ABSTRACT: As malicious attacks greatly threaten the security and reliability of biometric systems, ensuring the authenticity of biometric data is becoming increasingly important. In this paper we propose a watermarking-based two-stage authentication framework to address this problem. During data collection, face features are embedded into a fingerprint image of the same individual as data credibility token and secondary authentication source. At the first stage of authentication, the credibility of input data is established by checking the validness of extracted patterns. Due to the specific characteristics of face watermarks, the face detection based classification strategies are introduced for reliable watermark verification instead of conventional correlation based watermark detection. If authentic, the face patterns can further serve as supplemental identity information to facilitate subsequential biometric authentication. In this framework, one critical issue is to guarantee the robustness and capacity of watermark while preserving the discriminating features of host fingerprints. Hence a wavelet quantization based watermarking approach is proposed to adaptively distribute watermark energy on significant DWT coefficients of fingerprint images. Experimental results which evaluate both watermarking and biometric authentication performance demonstrate the effectiveness of this work.
    Multimedia Tools and Applications 09/2014; 72(1). DOI:10.1007/s11042-013-1372-5 · 1.35 Impact Factor
  • Yunhong Wang · Zhaoxiang Zhang · Kaiyue Wang · Haoran Deng · Bin Ma
    ABSTRACT: We address the problem of on-line signature representation and verification. A novel strategy is proposed in this paper by combining texture based image analysis and spatio-temporal representation. Firstly, a correlation based method is proposed to describe spatio-temporal information between sampling points, which is then converted to traditional 2D intensity images. Secondly, abundant texture analysis methods are adopted to construct effective features for high accuracy verification. Furthermore, a template selection strategy based on intra-class variations is presented to further enhance the performance of signature verification. Extensive experiments are conducted on the SVC2004 database and experimental results demonstrate the inspiring performance of our proposed methods.
    Multimedia Tools and Applications 09/2014; 72(1). DOI:10.1007/s11042-013-1408-x · 1.35 Impact Factor
  • Gaopeng Gou · Di Huang · Yunhong Wang
    ABSTRACT: Video-based face recognition has attracted much attention and made great progress in the past decade. However, it still encounters two main problems, which are efficiently representing faces in frames and sufficiently exploiting temporal-spatial constraints between frames. The authors investigate the existing real-time features for face description, and compare their performance. Moreover, a novel approach is proposed to model temporal-spatial information which is then combined with real-time features to further enforce the consistent constraints between frames to improve the recognition performance. The experiments are validated on three video face databases and the results demonstrate that temporal-spatial cues combined with the most powerful real-time features largely improve the recognition rate.
    IET Computer Vision 08/2014; 8(4):347-357. DOI:10.1049/iet-cvi.2013.0025 · 0.96 Impact Factor
  • Qingjie Liu · Yunhong Wang · Zhaoxiang Zhang · Lining Liu
    ABSTRACT: Pan-sharpening is a technique which provides an efficient and economical solution to generate multi-spectral (MS) images with high-spatial resolution by fusing spectral information in MS images and spatial information in panchromatic (PAN) image. In this study, the authors propose a new pan-sharpening method based on weighted red-black (WRB) wavelets and adaptive principal component analysis (PCA), where the usage of WRB wavelet decomposition is to extract the spatial details in PAN image and the adaptive PCA is used to select the adequate principal component for injecting spatial details. WRB wavelets are data-dependent second generation wavelets. Multi-resolution analysis (MRA) based on WRB wavelet transform shows a better de-correlation of the data compared with common linear translation-invariant MRA, which makes it suitable for applications requiring manipulating image details. A local processing strategy is introduced to reduce the artefact effects and spectral distortions in the pan-sharpened images. The proposed method is evaluated on the datasets acquired by QuickBird, IKONOS and Landsat-7 ETM+ satellites and compared with existing methods. Experimental results demonstrate that the authors' method can provide promising fused MS images with high-spatial resolution.
    IET Image Processing 08/2014; 8(8):477-488. DOI:10.1049/iet-ipr.2013.0279 · 0.75 Impact Factor
  • ABSTRACT: Gender and ethnicity are both key demographic attributes of human beings and they play a very fundamental and important role in automatic machine based face analysis; therefore, there has been increasing attention for face based gender and ethnicity classification in recent years. In this paper, we present an effective and efficient approach on this issue by combining both boosted local texture and shape features extracted from 3D face models, in contrast to the existing ones that only depend on either 2D texture or 3D shape of faces. In order to comprehensively represent the difference between different genders or ethnicities, we propose a novel local descriptor, namely Local Circular Patterns (LCP). LCP improves the widely utilized Local Binary Patterns (LBP) and its variants by replacing the binary quantization with a clustering based one, resulting in higher discriminative power as well as better robustness to noise. Meanwhile the following Adaboost based feature selection finds the most discriminative gender- and race-related features and assigns them with different weights to highlight their importance in classification, which not only further raises the performance but reduces the time and memory cost as well. Experimental results achieved on the FRGC v2.0 and BU-3DFE datasets clearly demonstrate the advantages of the proposed method.
    Image and Vision Computing 07/2014; 32(12). DOI:10.1016/j.imavis.2014.06.009 · 1.59 Impact Factor
  • Huibin Li · Di Huang · Jean-Marie Morvan · Liming Chen · Yunhong Wang
    ABSTRACT: In the theory of differential geometry, surface normal, as a first order surface differential quantity, determines the orientation of a surface at each point and contains informative local surface shape information. To fully exploit this kind of information for 3D face recognition (FR), this paper proposes a novel highly discriminative facial shape descriptor, namely multi-scale and multi-component local normal patterns (MSMC-LNP). Given a normalized facial range image, three components of normal vectors are first estimated, leading to three normal component images. Then, each normal component image is encoded locally to local normal patterns (LNP) on different scales. To utilize spatial information of facial shape, each normal component image is divided into several patches, and their LNP histograms are computed and concatenated according to the facial configuration. Finally, each original facial surface is represented by a set of LNP histograms including both global and local cues. Moreover, to make the proposed solution robust to the variations of facial expressions, we propose to learn the weight of each local patch on a given encoding scale and normal component image. Based on the learned weights and the weighted LNP histograms, we formulate a weighted sparse representation-based classifier (W-SRC). In contrast to the overwhelming majority of 3D FR approaches which were only benchmarked on the FRGC v2.0 database, we carried out extensive experiments on the FRGC v2.0, Bosphorus, BU-3DFE and 3D-TEC databases, thus including 3D face data captured in different scenarios through various sensors and depicting in particular different challenges with respect to facial expressions. The experimental results show that the proposed approach consistently achieves competitive rank-one recognition rates on these databases despite their heterogeneous nature, and thereby demonstrates its effectiveness and its generalizability.
    Neurocomputing 06/2014; 133:179–193. DOI:10.1016/j.neucom.2013.11.018 · 2.08 Impact Factor

Publication Stats

4k Citations
118.77 Total Impact Points


  • 2005–2015
    • Beijing University of Aeronautics and Astronautics (Beihang University)
      • State Key Laboratory for Virtual Reality Technology and Systems
      • School of Computer Science and Engineering
      Beijing, China
  • 2011
    • French National Centre for Scientific Research
      Paris, Île-de-France, France
    • University of Alberta
      Edmonton, Alberta, Canada
  • 2010–2011
    • Ecole Centrale de Lyon
      • Laboratoire d'Informatique en Image et Systèmes d'Informations (LIRIS)
      Rhône-Alpes, France
  • 1999–2005
    • Chinese Academy of Sciences
      • Institute of Automation
      • National Pattern Recognition Laboratory
      Beijing, China
  • 2004
    • National Space Science
      Beijing, China