Yunhong Wang

Beijing University of Aeronautics and Astronautics (Beihang University), Beijing, China

Publications (216) · 69.97 Total impact

  • ABSTRACT: Most existing pose-independent Face Recognition (FR) techniques take advantage of 3D models to guarantee naturalness while normalizing or simulating pose variations. Two nontrivial problems to be tackled are accurate measurement of pose parameters and computational efficiency. In this paper, we introduce an effective and efficient approach to estimate human head pose, which fundamentally ameliorates the performance of 3D aided FR systems. The proposed method works in a progressive way: firstly, a random forest (RF) is constructed utilizing synthesized images derived from 3D models; secondly, the classification result obtained by applying the well-trained RF to a probe image is considered as the preliminary pose estimation; finally, this initial pose is transferred to a shape-based 3D morphable model (3DMM) aiming at definitive pose normalization. Using such a method, similarity scores between the frontal view gallery set and the pose-normalized probe set can be computed to predict the identity. Experimental results achieved on the UHDB dataset outperform the ones so far reported. Additionally, it is much less time-consuming than prevailing 3DMM based approaches.
    International Conference on Image Processing, Paris; 10/2014
  • ABSTRACT: As an emerging biometric for people identification, the dorsal hand vein has received increasing attention in recent years due to the properties of being universal, unique, permanent, and contactless, and especially its simplicity of liveness detection and difficulty of forging. However, the dorsal hand vein is usually captured by near-infrared (NIR) sensors and the resulting image is of low contrast and shows a very sparse subcutaneous vascular network. Therefore, it does not offer sufficient distinctiveness in recognition, particularly in the presence of a large population. This paper proposes a novel approach to hand-dorsa vein recognition through matching local features of multiple sources. In contrast to current studies only concentrating on the hand vein network, we also make use of person-dependent optical characteristics of the skin and subcutaneous tissue revealed by NIR hand-dorsa images and encode geometrical attributes of their landscapes, e.g., ridges, valleys, etc., through different quantities, such as cornerness and blobness, closely related to differential geometry. Specifically, the proposed method adopts an effective keypoint detection strategy to localize features on dorsal hand images, where the speciality of absorption and scattering of the entire dorsal hand is modeled as a combination of multiple (first-, second-, and third-) order gradients. These features comprehensively describe the discriminative clues of each dorsal hand. This method further robustly associates the corresponding keypoints between gallery and probe samples, and finally predicts the identity. Evaluated by extensive experiments, the proposed method achieves the best performance so far known on the North China University of Technology (NCUT) Part A dataset, showing its effectiveness. Additional results on NCUT Part B illustrate its generalization ability and robustness to low quality data.
    IEEE Transactions on Cybernetics 10/2014
  • Di Huang, Chao Zhu, Yunhong Wang, Liming Chen
    ABSTRACT: Recent investigations on human vision discover that the retinal image is a landscape or a geometric surface, consisting of features such as ridges and summits. However, most existing popular local image descriptors in the literature, e.g., SIFT, HOG, DAISY, LBP, and GLOH, only employ the first order gradient information related to the slope and the elasticity, i.e., length, area, etc., of a surface, and thereby only partially characterize the geometric properties of a landscape. In this paper, we introduce a novel and powerful local image descriptor that extracts the Histograms of Second Order Gradients, namely HSOG, to capture the curvature related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, etc. We conduct comprehensive experiments on three different applications including the problem of local image matching, visual object categorization (VOC), and scene classification. The experimental results clearly evidence the discriminative power of HSOG as compared with its first order gradient based counterparts, e.g., SIFT, HOG, DAISY, and CSLBP, and the complementarity in terms of image representation, demonstrating the effectiveness of the proposed local descriptor.
    IEEE Transactions on Image Processing 09/2014; · 3.20 Impact Factor
  • ABSTRACT: We address the problem of on-line signature representation and verification. A novel strategy is proposed in this paper by combining texture based image analysis and spatio-temporal representation. Firstly, a correlation based method is proposed to describe spatio-temporal information between sampling points, which is then converted to traditional 2D intensity images. Secondly, a variety of texture analysis methods are adopted to construct effective features for high-accuracy verification. Furthermore, a template selection strategy based on intra-class variations is presented to further enhance the performance of signature verification. Extensive experiments are conducted on the SVC2004 database and experimental results demonstrate the promising performance of our proposed methods.
    Multimedia Tools and Applications 09/2014; · 1.01 Impact Factor
  • ABSTRACT: As malicious attacks greatly threaten the security and reliability of biometric systems, ensuring the authenticity of biometric data is becoming increasingly important. In this paper we propose a watermarking-based two-stage authentication framework to address this problem. During data collection, face features are embedded into a fingerprint image of the same individual as a data credibility token and secondary authentication source. At the first stage of authentication, the credibility of input data is established by checking the validity of extracted patterns. Due to the specific characteristics of face watermarks, face detection based classification strategies are introduced for reliable watermark verification instead of conventional correlation based watermark detection. If authentic, the face patterns can further serve as supplemental identity information to facilitate subsequent biometric authentication. In this framework, one critical issue is to guarantee the robustness and capacity of the watermark while preserving the discriminating features of host fingerprints. Hence a wavelet quantization based watermarking approach is proposed to adaptively distribute watermark energy on significant DWT coefficients of fingerprint images. Experimental results which evaluate both watermarking and biometric authentication performance demonstrate the effectiveness of this work.
    Multimedia Tools and Applications 09/2014; · 1.01 Impact Factor
  • Zhaoxiang Zhang, Yunhong Wang, Zeda Zhang
    ABSTRACT: This paper presents a tensor analysis based method to synthesize an artificial high-resolution (HR) visible light (VIS) face image from a low-resolution (LR) near-infrared (NIR) input image captured in challenging operating environments. Active NIR imaging has been widely employed as a viable means to avoid dramatic illumination changes in outdoor circumstances. However, it exhibits discrepant photic properties in comparison with VIS imaging, and the captured images may suffer from limited quality and low resolution resulting from uncontrolled environments and challenging imaging conditions. Based on the Lambertian reflection model and a linear observation model, we derived the framework of our approach: a tensor structure based super-resolution (SR) method is employed to transform the heterogeneous face data into uniform subspaces and conduct SR in feature space with maximum a posteriori (MAP) estimation; a discrete wavelet transform (DWT) based fusion scheme is adopted to reduce the noise and compensate for the information loss in the tensor transformation. Experiments are conducted on our database collected with a JAI AD-080 multispectral camera (AD-080CL JAI, 2007) [1]. Compared to two state-of-the-art algorithms, KNN (Liu et al., 2005) [2] and LBP-KNN (Chen et al., 2009) [3], our approach shows better robustness to moderate pose and expression variations, and outstanding efficiency in dealing with images of poor quality and low resolution.
    Neurocomputing 09/2014; 140:146–154. · 2.01 Impact Factor
  • ABSTRACT: Gender and ethnicity are both key demographic attributes of human beings and they play a fundamental and important role in automatic machine based face analysis; therefore, there has been increasing attention to face based gender and ethnicity classification in recent years. In this paper, we present an effective and efficient approach to this issue by combining both boosted local texture and shape features extracted from 3D face models, in contrast to the existing ones that only depend on either 2D texture or 3D shape of faces. In order to comprehensively represent the difference between different genders or ethnicities, we propose a novel local descriptor, namely Local Circular Patterns (LCP). LCP improves the widely utilized Local Binary Patterns (LBP) and its variants by replacing the binary quantization with a clustering based one, resulting in higher discriminative power as well as better robustness to noise. Meanwhile, the subsequent Adaboost based feature selection finds the most discriminative gender- and race-related features and assigns them different weights to highlight their importance in classification, which not only further raises the performance but reduces the time and memory cost as well. Experimental results achieved on the FRGC v2.0 and BU-3DFE datasets clearly demonstrate the advantages of the proposed method.
    Image and Vision Computing 07/2014; · 1.96 Impact Factor
  • ABSTRACT: In the theory of differential geometry, surface normal, as a first order surface differential quantity, determines the orientation of a surface at each point and contains informative local surface shape information. To fully exploit this kind of information for 3D face recognition (FR), this paper proposes a novel highly discriminative facial shape descriptor, namely multi-scale and multi-component local normal patterns (MSMC-LNP). Given a normalized facial range image, three components of normal vectors are first estimated, leading to three normal component images. Then, each normal component image is encoded locally to local normal patterns (LNP) on different scales. To utilize spatial information of facial shape, each normal component image is divided into several patches, and their LNP histograms are computed and concatenated according to the facial configuration. Finally, each original facial surface is represented by a set of LNP histograms including both global and local cues. Moreover, to make the proposed solution robust to the variations of facial expressions, we propose to learn the weight of each local patch on a given encoding scale and normal component image. Based on the learned weights and the weighted LNP histograms, we formulate a weighted sparse representation-based classifier (W-SRC). In contrast to the overwhelming majority of 3D FR approaches which were only benchmarked on the FRGC v2.0 database, we carried out extensive experiments on the FRGC v2.0, Bosphorus, BU-3DFE and 3D-TEC databases, thus including 3D face data captured in different scenarios through various sensors and depicting in particular different challenges with respect to facial expressions. The experimental results show that the proposed approach consistently achieves competitive rank-one recognition rates on these databases despite their heterogeneous nature, and thereby demonstrates its effectiveness and its generalizability.
    Neurocomputing 01/2014; 133:179–193. · 2.01 Impact Factor
  • ABSTRACT: Depression is a typical mood disorder, and people who are often in this state risk mental and even physical problems. In recent years, there has therefore been increasing attention to machine based depression analysis. In such a low mood, both the facial expression and voice of human beings appear different from the ones in normal states. This paper presents a novel method, which comprehensively models visual and vocal modalities, and automatically predicts the scale of depression. On one hand, Motion History Histogram (MHH) extracts the dynamics from corresponding video and audio data to represent characteristics of subtle changes in facial and vocal expression of depression. On the other hand, for each modality, the Partial Least Square (PLS) regression algorithm is applied to learn the relationship between the dynamic features and depression scales using training data, and then predict the depression scale for an unseen sample. Predicted values of visual and vocal clues are further combined at decision level for the final decision. The proposed approach is evaluated on the AVEC2013 dataset and experimental results clearly highlight its effectiveness and better performance than baseline results provided by the AVEC2013 challenge organiser.
    Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge; 10/2013
  • Maodi Hu, Yunhong Wang, Zhaoxiang Zhang
    ABSTRACT: Considering that it is difficult to guarantee that at least one continuous complete gait cycle is captured in real applications, we address the multi-view gait recognition problem with short probe sequences. With unified multi-view population hidden Markov models (umvpHMMs), the gait pattern is represented as fixed-length multi-view stances. By incorporating the multi-stance dynamics, the well-known view transformation model (VTM) is extended into a multi-linear projection model in a fourth-order tensor space, so that a view-independent stance-independent identity vector (VSIV) can be extracted. The main advantage is that the proposed VSIV is stable for each subject regardless of the camera location or the sequence length. Experiments show that our algorithm achieves encouraging performance for cross-view gait recognition even with short probe sequences.
    International Journal of Pattern Recognition and Artificial Intelligence 10/2013; 27(06). · 0.56 Impact Factor
  • ABSTRACT: The performance of most gait recognition methods drops if the viewpoint of the test data is different from the viewpoint of the training data. In this paper, we present an idea of estimating the view angle of a test sample in advance so as to compare it with the corresponding training samples with the same or approximate viewpoint. In order to obtain reliable estimation results, view-sensitive features should be extracted. We propose a novel and effective feature extraction method to characterize the silhouettes from different views. The discrimination power of this representation is also verified through experiments. Afterwards, a robust regression method is employed to estimate the viewpoint of gait. The view angles of test samples from the BUAA-IRIP Gait Database are estimated with the regression models learned from the CASIA Gait Database. Compared with the ground truth angles, such estimation is satisfactory with a small error level. Therefore, it can provide necessary help for gait application systems when the view angles of test data are uncertain. This point is verified experimentally through integrating the view angle estimation into a gait based gender classification system.
    Multimedia Tools and Applications 08/2013; 65(3). · 1.01 Impact Factor
  • ABSTRACT: For traditional fragile watermarking schemes, isolated-block tampering, which destroys the minutiae of the fingerprint image, can hardly be detected efficiently. In this paper, we propose a multi-block dependency based fragile watermarking scheme to overcome this shortcoming. The images are split into image blocks with size of 8 × 8; a 64-bit watermark is generated for each image block, and then equally partitioned into eight parts. Each part of the watermark is embedded into another image block which is selected by the corresponding secret key. Theoretical analysis and experimental results demonstrate that the proposed method not only can detect and localize isolated-block tampering on fingerprint images with high detection probability and low false detection probability, but also noticeably enhances system security.
    Multimedia Tools and Applications 06/2013; 64(3). · 1.01 Impact Factor
  • ABSTRACT: Asymmetric 3D-2D face recognition (FR) aims to recognize individuals from 2D face images using textured 3D face models in the gallery (or vice versa). This new FR scenario has the potential to be readily deployable in field applications while still keeping the advantages of 3D FR solutions of being more robust to pose and lighting variations. In this paper, we propose a new experimental protocol based on the UHDB11 dataset for benchmarking 3D-2D FR algorithms. This new experimental protocol allows for the study of the performance of a 3D-2D FR solution under pose and/or lighting variations. Furthermore, we also benchmarked two state-of-the-art 3D-2D FR algorithms. One is based on the Annotated Deformable Model (using manually labeled landmarks in this paper), whereas the other makes use of Oriented Gradient Maps along with automatic pose estimation through a random forest.
    10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG); 04/2013
  • ABSTRACT: This paper presents a novel local image descriptor for object categorization that extracts the Histograms of the Second Order Gradients and is thereby named HSOG. The HSOG descriptor is in contrast to the widely used ones in the literature, e.g., SIFT, DAISY, HOG, LBP, etc., which are based on the first order gradient information. The contributions of this work can be summarized as: (1) the design of HSOG; (2) the proof of its discriminative power and its complementarity to the first order gradient based descriptors; (3) the analysis of performance variation caused by different parameter settings; and (4) the multi-scale extension which further improves the categorization accuracy. The experimental results achieved on the Caltech 101 and Caltech 256 databases clearly highlight the effectiveness of the proposed approach.
  • ABSTRACT: This paper addresses the problem of tracking and recognizing faces via incremental local sparse representation. We first develop a robust face tracking algorithm based on the local sparse appearance. This sparse representation model exploits both partial and spatial information of the face based on a covariance pooling method. Then, in the face recognition stage, with the employment of a novel template update strategy, our recognition algorithm adapts the template to appearance change and reduces the influence of occlusion and illumination variation. In the experiments, we test the quality of face recognition in real-world noisy videos on the YouTube database. Our proposed method produces high face recognition results on over 93% of all videos. The tracking results on challenging videos demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods. On the challenging data set in which faces undergo occlusion and illumination variation, our proposed method also consistently demonstrates a high recognition rate.
    Image and Graphics (ICIG), 2013 Seventh International Conference on; 01/2013
  • Tao Xu, Yunhong Wang, Zhaoxiang Zhang
    ABSTRACT: Skin colour detection plays an important role in image processing and computer vision. Selection of a suitable colour space is one key issue. The question of which colour space is most appropriate for pixel-wise skin colour detection has not yet been settled. In this study, a pixel-wise skin colour detection method is proposed based on the flexible neural tree (FNT) without considering the problem of selecting a suitable colour space. An FNT-based skin model is constructed by using large skin data sets, which identifies the important components of colour spaces automatically. Experimental results show improved accuracy and false positive rates (FPRs). The structure and parameters of the FNT are optimised via genetic programming and particle swarm optimisation algorithms, respectively. In the experiments, nine FNT skin models are constructed and evaluated on features extracted from RGB, YCbCr, HSV and CIE-Lab colour spaces. The Compaq and ECU datasets are used for constructing the FNT-based skin model and evaluating its performance compared with other skin detection methods. Without extra processing steps, the authors' method achieves state-of-the-art performance in skin pixel classification and better performance in terms of accuracy and FPRs.
    IET Image Processing 01/2013; 7(8):751-761. · 0.90 Impact Factor
  • ABSTRACT: We address the problem of camera calibration for traffic scene surveillance, which supplies a connection between 2-D image features and 3-D measurement. It is helpful to deal with appearance distortion related to view angles, establish multiview correspondences, and make use of 3-D object models as prior information to enhance surveillance performance. A convenient and practical camera calibration method is proposed in this paper. With the camera height H measured as the only user input, we can recover both intrinsic and extrinsic parameters of the camera based on redundant information supplied by moving objects in monocular videos. All cases of traffic scene layouts are considered and corresponding solutions are given to make our method applicable to almost all kinds of traffic scenes in reality. Numerous experiments are conducted in different scenes, and experimental results demonstrate the accuracy and practicability of our approach. It is shown that our approach can be effectively adopted in all kinds of traffic scene surveillance applications.
    IEEE Transactions on Circuits and Systems for Video Technology 01/2013; 23(3):518-533. · 1.82 Impact Factor
  • ABSTRACT: Ethnicity is a key demographic attribute of human beings and it plays an important role in automatic machine based face analysis; therefore, there has been increasing attention to face based ethnicity classification in recent years. In this paper, we propose a novel method for this issue by combining both boosted local texture and shape features extracted from 3D face models, in contrast to the existing ones that only depend on 2D facial images. The proposed method makes use of the Oriented Gradient Maps (OGMs) to highlight local geometry as well as texture variations of entire faces, and further learns a compact set of features which are highly related to the ethnicity property for classification. Experiments are comprehensively carried out on the FRGC v2.0 dataset, and the performance is up to 98.3% in distinguishing Asians from non-Asians when 80% of the samples are used in the training set, demonstrating the effectiveness of the proposed method.
    Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on; 01/2013
  • ABSTRACT: Biometrics and information hiding, as two different yet promising techniques for individual identification and digital media protection, have been extensively studied in the last decade. Recently, hybrid approaches that combine these two techniques (i.e., biometric information hiding) for advanced information security have attracted increasing research interest. The principal idea is applying information hiding as a data anti-counterfeiting tool for secure biometric authentication, or introducing biometrics as an identity token to information hiding applications such as owner identification and traitor tracing. In this paper, we present a brief classification of existing biometric information hiding approaches and discuss the corresponding emerging requirements. Meanwhile, two typical application cases proposed in our previous work, which adopted face and fingerprint as watermarks respectively, are further discussed and analysed.
    Signal and Information Processing (ChinaSIP), 2013 IEEE China Summit & International Conference on; 01/2013

Publication Stats

3k Citations
69.97 Total Impact Points


  • 2005–2013
    • Beijing University of Aeronautics and Astronautics (Beihang University)
      • School of Computer Science and Engineering
      • State Key Laboratory for Virtual Reality Technology and Systems
      Beijing, China
  • 2011
    • University of Lyon
      Lyon, Rhône-Alpes, France
  • 2010–2011
    • Ecole Centrale de Lyon
      • Laboratoire d'Informatique en Image et Systèmes d'Informations (LIRIS)
      Rhône-Alpes, France
  • 1999–2007
    • Chinese Academy of Sciences
      • Institute of Automation
      • National Pattern Recognition Laboratory
      Beijing, China
    • Nanjing University of Science and Technology
      Nanjing, Jiangsu, China
  • 2002–2005
    • Northeast Institute of Geography and Agroecology
      • National Pattern Recognition Laboratory
      • Institute of Automation
      Beijing, China
  • 2001
    • Georgia Institute of Technology
      Atlanta, Georgia, United States