Yunhong Wang

Beijing University of Aeronautics and Astronautics (Beihang University), Beijing, China

Publications (219) · 77.89 Total impact

  • ABSTRACT: Most existing pose-independent Face Recognition (FR) techniques take advantage of 3D models to guarantee naturalness while normalizing or simulating pose variations. Two nontrivial problems to be tackled are accurate measurement of pose parameters and computational efficiency. In this paper, we introduce an effective and efficient approach to estimate human head pose, which fundamentally ameliorates the performance of 3D aided FR systems. The proposed method works in a progressive way: firstly, a random forest (RF) is constructed utilizing synthesized images derived from 3D models; secondly, the classification result obtained by applying the well-trained RF to a probe image is considered as the preliminary pose estimation; finally, this initial pose is transferred to a shape-based 3D morphable model (3DMM) aiming at definitive pose normalization. Using such a method, similarity scores between the frontal view gallery set and the pose-normalized probe set can be computed to predict the identity. Experimental results achieved on the UHDB dataset outperform the ones so far reported. Additionally, it is much less time-consuming than prevailing 3DMM based approaches.
    International Conference on Image Processing, Paris; 10/2014
  • ABSTRACT: As an emerging biometric for people identification, the dorsal hand vein has received increasing attention in recent years due to the properties of being universal, unique, permanent, and contactless, and especially its simplicity of liveness detection and difficulty of forging. However, the dorsal hand vein is usually captured by near-infrared (NIR) sensors and the resulting image is of low contrast and shows a very sparse subcutaneous vascular network. Therefore, it does not offer sufficient distinctiveness in recognition, particularly in the presence of a large population. This paper proposes a novel approach to hand-dorsa vein recognition through matching local features of multiple sources. In contrast to current studies only concentrating on the hand vein network, we also make use of person dependent optical characteristics of the skin and subcutaneous tissue revealed by NIR hand-dorsa images and encode geometrical attributes of their landscapes, e.g., ridges, valleys, etc., through different quantities, such as cornerness and blobness, closely related to differential geometry. Specifically, the proposed method adopts an effective keypoint detection strategy to localize features on dorsal hand images, where the speciality of absorption and scattering of the entire dorsal hand is modeled as a combination of multiple (first-, second-, and third-) order gradients. These features comprehensively describe the discriminative clues of each dorsal hand. This method further robustly associates the corresponding keypoints between gallery and probe samples, and finally predicts the identity. Evaluated by extensive experiments, the proposed method achieves the best performance so far known on the North China University of Technology (NCUT) Part A dataset, showing its effectiveness. Additional results on NCUT Part B illustrate its generalization ability and robustness to low quality data.
    IEEE Transactions on Cybernetics 10/2014
  • Di Huang, Chao Zhu, Yunhong Wang, Liming Chen
    ABSTRACT: Recent investigations on human vision discover that the retinal image is a landscape or a geometric surface, consisting of features such as ridges and summits. However, most existing popular local image descriptors in the literature, e.g., SIFT, HOG, DAISY, LBP, and GLOH, only employ the first order gradient information related to the slope and the elasticity, i.e., length, area, etc. of a surface, and thereby only partially characterize the geometric properties of a landscape. In this paper, we introduce a novel and powerful local image descriptor that extracts the Histograms of Second Order Gradients, namely HSOG, to capture the curvature related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, etc. We conduct comprehensive experiments on three different applications including the problem of local image matching, visual object categorization (VOC), and scene classification. The experimental results clearly evidence the discriminative power of HSOG as compared with its first order gradient based counterparts, e.g., SIFT, HOG, DAISY, and CSLBP, and the complementarity in terms of image representation, demonstrating the effectiveness of the proposed local descriptor.
    IEEE Transactions on Image Processing 09/2014; · 3.11 Impact Factor
  • ABSTRACT: We address the problem of on-line signature representation and verification. A novel strategy is proposed in this paper by combining texture based image analysis and spatio-temporal representation. Firstly, a correlation based method is proposed to describe spatio-temporal information between sampling points, which is then converted to traditional 2D intensity images. Secondly, a range of texture analysis methods are adopted to construct effective features for high accuracy verification. Furthermore, a template selection strategy based on intra-class variations is presented to further enhance the performance of signature verification. Extensive experiments are conducted on the SVC2004 database and experimental results demonstrate the promising performance of our proposed methods.
    Multimedia Tools and Applications 09/2014; 72(1). · 1.06 Impact Factor
  • ABSTRACT: As malicious attacks greatly threaten the security and reliability of biometric systems, ensuring the authenticity of biometric data is becoming increasingly important. In this paper we propose a watermarking-based two-stage authentication framework to address this problem. During data collection, face features are embedded into a fingerprint image of the same individual as a data credibility token and secondary authentication source. At the first stage of authentication, the credibility of input data is established by checking the validity of extracted patterns. Due to the specific characteristics of face watermarks, face detection based classification strategies are introduced for reliable watermark verification instead of conventional correlation based watermark detection. If authentic, the face patterns can further serve as supplemental identity information to facilitate subsequent biometric authentication. In this framework, one critical issue is to guarantee the robustness and capacity of the watermark while preserving the discriminating features of host fingerprints. Hence a wavelet quantization based watermarking approach is proposed to adaptively distribute watermark energy on significant DWT coefficients of fingerprint images. Experimental results which evaluate both watermarking and biometric authentication performance demonstrate the effectiveness of this work.
    Multimedia Tools and Applications 09/2014; · 1.06 Impact Factor
  • Zhaoxiang Zhang, Yunhong Wang, Zeda Zhang
    ABSTRACT: This paper presents a tensor analysis based method to synthesize an artificial high-resolution (HR) visible light (VIS) face image from a low-resolution (LR) near-infrared (NIR) input image captured in challenging operating environments. Active NIR imaging has been widely employed as a viable means to avoid dramatic illumination changes in outdoor circumstances. However, it exhibits discrepant photic properties in comparison with VIS imaging, and the captured images may suffer from limited quality and low resolution resulting from uncontrolled environments and challenging imaging conditions. Based on the Lambertian reflection model and a linear observation model, we derived the framework of our approach: a tensor structure based super-resolution (SR) method is employed to transform the heterogeneous face data into uniform subspaces and conduct SR in feature space with maximum a posteriori (MAP) estimation; a discrete wavelet transform (DWT) based fusion scheme is adopted to reduce the noise and compensate for the information loss in the tensor transformation. Experiments are conducted on our database collected with a JAI AD-080 multispectral camera (AD-080CL JAI, 2007) [1]. Compared to the two state-of-the-art algorithms, KNN (Liu et al., 2005) [2] and LBP-KNN (Chen et al., 2009) [3], our approach shows better robustness to moderate pose and expression variations, and outstanding efficiency in dealing with images of poor quality and low resolution.
    Neurocomputing 09/2014; 140:146–154. · 2.01 Impact Factor
  • Gaopeng Gou, Di Huang, Yunhong Wang
    ABSTRACT: Video-based face recognition has attracted much attention and made great progress in the past decade. However, it still encounters two main problems, which are efficiently representing faces in frames and sufficiently exploiting temporal-spatial constraints between frames. The authors investigate the existing real-time features for face description, and compare their performance. Moreover, a novel approach is proposed to model temporal-spatial information which is then combined with real-time features to further enforce the consistent constraints between frames to improve the recognition performance. The experiments are validated on three video face databases and the results demonstrate that temporal-spatial cues combined with the most powerful real-time features largely improve the recognition rate.
    IET Computer Vision 08/2014; 8(4):347-357. · 0.76 Impact Factor
  • ABSTRACT: Pan-sharpening is a technique which provides an efficient and economical solution to generate multi-spectral (MS) images with high-spatial resolution by fusing spectral information in MS images and spatial information in the panchromatic (PAN) image. In this study, the authors propose a new pan-sharpening method based on weighted red-black (WRB) wavelets and adaptive principal component analysis (PCA), where WRB wavelet decomposition is used to extract the spatial details in the PAN image and the adaptive PCA is used to select the adequate principal component for injecting spatial details. WRB wavelets are data-dependent second generation wavelets. Multi-resolution analysis (MRA) based on the WRB wavelet transform shows a better de-correlation of the data compared with common linear translation-invariant MRA, which makes it suitable for applications requiring manipulation of image details. A local processing strategy is introduced to reduce the artefact effects and spectral distortions in the pan-sharpened images. The proposed method is evaluated on the datasets acquired by QuickBird, IKONOS and Landsat-7 ETM+ satellites and compared with existing methods. Experimental results demonstrate that the authors' method can provide promising fused MS images with high-spatial resolution.
    IET Image Processing 08/2014; 8(8):477-488. · 0.68 Impact Factor
  • ABSTRACT: Gender and ethnicity are both key demographic attributes of human beings and they play a very fundamental and important role in automatic machine based face analysis; therefore, there has been increasing attention for face based gender and ethnicity classification in recent years. In this paper, we present an effective and efficient approach on this issue by combining both boosted local texture and shape features extracted from 3D face models, in contrast to the existing ones that only depend on either 2D texture or 3D shape of faces. In order to comprehensively represent the difference between different genders or ethnicities, we propose a novel local descriptor, namely Local Circular Patterns (LCP). LCP improves the widely utilized Local Binary Patterns (LBP) and its variants by replacing the binary quantization with a clustering based one, resulting in higher discriminative power as well as better robustness to noise. Meanwhile the following Adaboost based feature selection finds the most discriminative gender- and race-related features and assigns them different weights to highlight their importance in classification, which not only further raises the performance but reduces the time and memory cost as well. Experimental results achieved on the FRGC v2.0 and BU-3DFE datasets clearly demonstrate the advantages of the proposed method.
    Image and Vision Computing 07/2014; 32(12). · 1.58 Impact Factor
  • ABSTRACT: In the theory of differential geometry, surface normal, as a first order surface differential quantity, determines the orientation of a surface at each point and contains informative local surface shape information. To fully exploit this kind of information for 3D face recognition (FR), this paper proposes a novel highly discriminative facial shape descriptor, namely multi-scale and multi-component local normal patterns (MSMC-LNP). Given a normalized facial range image, three components of normal vectors are first estimated, leading to three normal component images. Then, each normal component image is encoded locally to local normal patterns (LNP) on different scales. To utilize spatial information of facial shape, each normal component image is divided into several patches, and their LNP histograms are computed and concatenated according to the facial configuration. Finally, each original facial surface is represented by a set of LNP histograms including both global and local cues. Moreover, to make the proposed solution robust to the variations of facial expressions, we propose to learn the weight of each local patch on a given encoding scale and normal component image. Based on the learned weights and the weighted LNP histograms, we formulate a weighted sparse representation-based classifier (W-SRC). In contrast to the overwhelming majority of 3D FR approaches which were only benchmarked on the FRGC v2.0 database, we carried out extensive experiments on the FRGC v2.0, Bosphorus, BU-3DFE and 3D-TEC databases, thus including 3D face data captured in different scenarios through various sensors and depicting in particular different challenges with respect to facial expressions. The experimental results show that the proposed approach consistently achieves competitive rank-one recognition rates on these databases despite their heterogeneous nature, and thereby demonstrates its effectiveness and its generalizability.
    Neurocomputing 06/2014; 133:179–193. · 2.01 Impact Factor
  • ABSTRACT: Existing methods for multi-view gait-based identification mainly focus on transforming the features of one view to the features of another view, which is technically sound but has limited practical utility. In this paper, we propose a view-invariant discriminative projection (ViDP) method, to improve the discriminative ability of multi-view gait features by a unitary linear projection. It is implemented by iteratively learning the low dimensional geometry and finding the optimal projection according to the geometry. By virtue of ViDP, the multi-view gait features can be directly matched without knowing or estimating the viewing angles. The ViDP feature projected from gait energy image achieves promising performance in the experiments of multi-view gait-based identification. We suggest that it is possible to construct a gait-based identification system for arbitrary probe views, by incorporating the information of gallery data with sufficient viewing angles. In addition, ViDP performs even better than the state-of-the-art view transformation methods, which are trained for the combination of gallery and probe viewing angles in every evaluation.
    IEEE Transactions on Information Forensics and Security 12/2013; 8(12):2034-2045. · 2.07 Impact Factor
  • Tao Xu, Yunhong Wang, Zhaoxiang Zhang
    ABSTRACT: Skin colour detection plays an important role in image processing and computer vision. Selection of a suitable colour space is one key issue. The question of which colour space is most appropriate for pixel-wise skin colour detection has not yet been settled. In this study, a pixel-wise skin colour detection method is proposed based on the flexible neural tree (FNT) without considering the problem of selecting a suitable colour space. An FNT-based skin model is constructed by using large skin data sets, and it identifies the important components of colour spaces automatically. Experimental results show improved accuracy and false positive rates (FPRs). The structure and parameters of FNT are optimised via genetic programming and particle swarm optimisation algorithms, respectively. In the experiments, nine FNT skin models are constructed and evaluated on features extracted from RGB, YCbCr, HSV and CIE-Lab colour spaces. The Compaq and ECU datasets are used for constructing the FNT-based skin model and evaluating its performance compared with other skin detection methods. Without extra processing steps, the authors' method achieves state of the art performance in skin pixel classification and better performance in terms of accuracy and FPRs.
    IET Image Processing 11/2013; 7(8):751-761. · 0.68 Impact Factor
  • ABSTRACT: Depression is a typical mood disorder, and people who are often in this state face the risk of mental and even physical problems. In recent years, there has therefore been increasing attention in machine based depression analysis. In such a low mood, both the facial expression and voice of human beings appear different from the ones in normal states. This paper presents a novel method, which comprehensively models visual and vocal modalities, and automatically predicts the scale of depression. On one hand, Motion History Histogram (MHH) extracts the dynamics from corresponding video and audio data to represent characteristics of subtle changes in facial and vocal expression of depression. On the other hand, for each modality, the Partial Least Square (PLS) regression algorithm is applied to learn the relationship between the dynamic features and depression scales using training data, and then predict the depression scale for an unseen one. Predicted values of visual and vocal clues are further combined at decision level for final decision. The proposed approach is evaluated on the AVEC2013 dataset and experimental results clearly highlight its effectiveness and better performance than baseline results provided by the AVEC2013 challenge organiser.
    Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge; 10/2013
  • ABSTRACT: Automatic object classification is an important issue in traffic scene surveillance. Appearance variation due to perspective distortion is one of the most difficult problems for moving object detection, tracking, and recognition. We propose an active transfer learning approach to bridge the gap between appearance variations under two different scenes. Only a small number of training samples are required in the target scene, which can be combined with transferred samples of the source scene to achieve a reliable object classifier in the target scene, and an active learning strategy makes the algorithm more efficient. Extensive experiments are conducted and experimental results demonstrate the effectiveness and convenience of our approach.
    IEEE Transactions on Information Forensics and Security 10/2013; 8(10):1632-1641. · 2.07 Impact Factor
  • Maodi Hu, Yunhong Wang, Zhaoxiang Zhang
    ABSTRACT: Considering that it is difficult to guarantee that at least one continuous complete gait cycle is captured in real applications, we address the multi-view gait recognition problem with short probe sequences. With unified multi-view population hidden Markov models (umvpHMMs), the gait pattern is represented as fixed-length multi-view stances. By incorporating the multi-stance dynamics, the well-known view transformation model (VTM) is extended into a multi-linear projection model in a four-order tensor space, so that a view-independent stance-independent identity vector (VSIV) can be extracted. The main advantage is that the proposed VSIV is stable for each subject regardless of the camera location or the sequence length. Experiments show that our algorithm achieves encouraging performance for cross-view gait recognition even with short probe sequences.
    International Journal of Pattern Recognition and Artificial Intelligence 10/2013; 27(06). · 0.56 Impact Factor
  • Meng Liang, Zhaoxiang Zhang, Yunhong Wang
    ABSTRACT: Object classification in traffic scene surveillance has attracted much attention in recent years. Traditional classification methods need many labeled samples to build a satisfactory classifier. However, the acquisition of the labeled samples may cost a lot of time and human labor. In this paper, we propose a label-propagation based semi-supervised learning method which uses the information of both labeled and unlabeled samples. Experimental results show that our method outperforms the traditional methods both in accuracy and robustness.
    2013 20th IEEE International Conference on Image Processing (ICIP); 09/2013
  • Jie Qin, Zhaoxiang Zhang, Yunhong Wang
    ABSTRACT: Human action recognition is a hot topic in the computer vision field. Various applicable approaches have been proposed to recognize different types of actions. However, the recognition performance deteriorates rapidly when the viewpoint changes. Traditional approaches aim to address the problem by inductive transfer learning, in which target-view samples are manually labeled. In this paper, we present a novel approach for cross-view action recognition based on transductive transfer learning. We address the problem by transferring instances across views. In our settings, both labels of examples from the target view and the corresponding relation between examples from pairwise views are dispensable. Experimental results on the IXMAS multi-view data set demonstrate the effectiveness of our approach, and are comparable to the state of the art.
    2013 20th IEEE International Conference on Image Processing (ICIP); 09/2013
  • ABSTRACT: The performance of most gait recognition methods drops if the viewpoint of test data is different from the viewpoint of training data. In this paper, we present an idea of estimating the view angle of a test sample in advance so as to compare it with the corresponding training samples with the same or approximate viewpoint. In order to obtain reliable estimation results, view-sensitive features should be extracted. We propose a novel and effective feature extraction method to characterize the silhouettes from different views. The discrimination power of this representation is also verified through experiments. Afterwards, a robust regression method is employed to estimate the viewpoint of gait. The view angles of test samples from the BUAA-IRIP Gait Database are estimated with the regression models learned from the CASIA Gait Database. Compared with the ground truth angles, such estimation is satisfactory with a small error level. Therefore, it can provide necessary help for gait application systems when the view angles of test data are uncertain. This point is verified experimentally through integrating the view angle estimation into a gait based gender classification system.
    Multimedia Tools and Applications 08/2013; 65(3). · 1.06 Impact Factor
  • ABSTRACT: For traditional fragile watermarking schemes, isolated-block tampering, which destroys the minutiae of the fingerprint image, can hardly be efficiently detected. In this paper, we propose a multi-block dependency based fragile watermarking scheme to overcome this shortcoming. The images are split into image blocks with size of 8 × 8; a 64-bit watermark is generated for each image block, and then equally partitioned into eight parts. Each part of the watermark is embedded into another image block which is selected by the corresponding secret key. Theoretical analysis and experimental results demonstrate that the proposed method not only can detect and localize isolated-block tampering on fingerprint images with high detection probability and low false detection probability, but also noticeably enhances the security of the system.
    Multimedia Tools and Applications 06/2013; 64(3). · 1.06 Impact Factor

Publication Stats

3k Citations
77.89 Total Impact Points


  • 2005–2014
    • Beijing University of Aeronautics and Astronautics (Beihang University)
      • School of Computer Science and Engineering
      Beijing, China
  • 2011
    • University of Lyon
      Lyon, Rhône-Alpes, France
    • University of Alberta
      Edmonton, Alberta, Canada
    • French National Centre for Scientific Research
      Paris, Île-de-France, France
  • 2010–2011
    • Ecole Centrale de Lyon
      • Laboratoire d'Informatique en Image et Systèmes d'Informations (LIRIS)
      Rhône-Alpes, France
  • 1999–2007
    • Chinese Academy of Sciences
      • Institute of Automation
      • National Pattern Recognition Laboratory
      Beijing, China
  • 2002–2005
    • Northeast Institute of Geography and Agroecology
      • National Pattern Recognition Laboratory
      • Institute of Automation
      Beijing, Beijing Shi, China
  • 2004
    • The Hong Kong University of Science and Technology
      • Department of Computer Science and Engineering
      Kowloon City, Hong Kong
    • National Space Science
      Beijing, China