About
97
Publications
61,893
Reads
7,308
Citations
Introduction
https://sites.google.com/site/huhanhomepage
My research focuses on unconstrained and heterogeneous face recognition (e.g., sketch to mugshot), 3D face modeling, demographic (attribute) estimation, and soft biometrics.
05/2015: I joined the Institute of Computing Technology, CAS as an associate professor.
01/2015-04/2015: I was a visiting scholar at Google ATAP, Mountain View, CA.
10/2011-01/2015: I was a Research Associate in the Department of Computer Science and Engineering at Michigan State University.
Current institution
Additional affiliations
May 2015 - present
Institute of Computing Technology, Chinese Academy of Sciences
Position
- Professor (Associate)
Description
- Computer vision and biometrics, covering unconstrained face recognition, heterogeneous face recognition, 3D face modeling, demographic attribute estimation, and soft biometrics.
October 2011 - March 2015
Education
September 2005 - July 2011
September 2001 - July 2005
Publications (97)
Medical image segmentation is essential for clinical diagnosis, surgical planning, and treatment monitoring. Traditional approaches typically strive to tackle all medical image segmentation scenarios via one-time learning. However, in practical applications, the diversity of scenarios and tasks in medical image segmentation continues to expand, nec...
Facial recognition (FR) technology offers convenience in our daily lives, but it also raises serious privacy issues due to unauthorized FR applications. To protect facial privacy, existing methods have proposed adversarial face examples that can fool FR systems. However, most of these methods work only in the digital domain and do not consider natu...
Although multimodal large language models (MLLMs) have achieved promising results on a wide range of vision-language tasks, their ability to perceive and understand human faces is rarely explored. In this work, we comprehensively evaluate existing MLLMs on face perception tasks. The quantitative results reveal that existing MLLMs struggle to handle...
Facial action unit (AU) recognition is essential for recognizing fine-grained changes in facial expression, while the demand for a large amount of accurately labeled AU data for training purposes has resulted in high labor costs. Nevertheless, massive face images are widely available and inaccurate labels can be easily obtained, especially as large...
Although face analysis has achieved remarkable improvements in the past few years, designing a multi-task face analysis model is still challenging. Most face analysis tasks are studied as separate problems and do not benefit from the synergy among related tasks. In this work, we propose a novel task-adaptive multi-task face analysis method named as...
Medical imaging is a non-invasive method for obtaining internal images of the human body or specific body parts by utilizing physical phenomena such as light, electric fields, magnetic fields, and sound waves. In clinical practice, modalities such as X-ray imaging, computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound are most...
Blood vessel and surgical instrument segmentation is a fundamental technique for robot-assisted surgical navigation. Despite the significant progress in natural image segmentation, surgical image-based vessel and instrument segmentation are rarely studied. In this work, we propose a novel self-supervised pretraining method (SurgNet) that can effect...
Existing studies indicate that deep neural networks (DNNs) can eventually memorize the label noise. We observe that the memorization strength of DNNs towards each instance is different and can be represented by the confidence value, which becomes larger and larger during the training process. Based on this, we propose a Dynamic Instance-specific Se...
Affective behavior analysis has aroused researchers’ attention due to its broad applications. However, it is labor exhaustive to obtain accurate annotations for massive face images. Thus, we propose to utilize the prior facial information via Masked Auto-Encoder (MAE) pretrained on unlabeled face images. Furthermore, we combine MAE pretrained Visio...
Semi-supervised learning (SSL) methods show their powerful performance to deal with the issue of data shortage in the field of medical image segmentation. However, existing SSL methods still suffer from the problem of unreliable predictions on unannotated data due to the lack of manual annotations for them. In this paper, we propose an unreliabilit...
Cross-modality face image synthesis such as sketch-to-photo, NIR-to-RGB, and RGB-to-depth has wide applications in face recognition, face animation, and digital entertainment. Conventional cross-modality synthesis methods usually require paired training data, i.e., each subject has images of both modalities. However, paired data can be difficult to...
Universal Lesion Detection (ULD) in computed tomography plays an essential role in computer-aided diagnosis. Promising ULD results have been reported by multi-slice-input detection approaches which model 3D context from multiple adjacent CT slices, but such methods still experience difficulty in obtaining a global representation among different sli...
Affective behaviour analysis has aroused researchers' attention due to its broad applications. However, it is labor exhaustive to obtain accurate annotations for massive face images. Thus, we propose to utilize the prior facial information via Masked Auto-Encoder (MAE) pretrained on unlabeled face images. Furthermore, we combine MAE pretrained Visi...
In this paper, we propose a new method for remote photoplethysmography (rPPG) based heart rate (HR) estimation. In particular, our proposed method BVPNet is streamlined to predict the blood volume pulse (BVP) signals from face videos. Towards this, we firstly define ROIs based on facial landmarks and then extract the raw temporal signal from each R...
Universal Lesion Detection (ULD) in computed tomography plays an essential role in computer-aided diagnosis. Promising ULD results have been reported by coarse-to-fine two-stage detection approaches, but such two-stage ULD methods still suffer from issues like imbalance of positive vs. negative anchors during object proposal and insufficient super...
Apathy is characterized by symptoms such as reduced emotional response, lack of motivation, and limited social interaction. Current methods for apathy diagnosis require the patient’s presence in a clinic and time-consuming clinical interviews, which are costly and inconvenient for both patients and clinical staff, hindering among other large-scale...
Occlusions are often present in face images in the wild, e.g., under video surveillance and forensic scenarios. Existing face de-occlusion methods are limited as they require the knowledge of an occlusion mask. To overcome this limitation, we propose in this paper a new generative adversarial network (named OA-GAN) for natural face de-occlusion wit...
Remote physiological measurements, e.g., remote photoplethysmography (rPPG) based heart rate (HR), heart rate variability (HRV) and respiration frequency (RF) measuring, are playing more and more important roles under the application scenarios where contact measurement is inconvenient or impossible. Since the amplitude of the physiological signals...
Face presentation attack detection (PAD) is essential for securing the widely used face recognition systems. Most of the existing PAD methods do not generalize well to unseen scenarios because labeled training data of the new domain is usually not available. In light of this, we propose an unsupervised domain adaptation with disentangled representa...
Face presentation attack detection (PAD) has been an urgent problem to be solved in the face recognition systems. Conventional approaches usually assume the testing and training are within the same domain; as a result, they may not generalize well into unseen scenarios because the representations learned for PAD may overfit to the subjects in the t...
Remote measurement of physiological signals from videos is an emerging topic. The topic draws great interest, but the lack of publicly available benchmark databases and a fair validation platform is hindering its further development. To address this concern, we organize the first challenge on Remote Physiological Signal Sensing (RePSS), in which two dat...
Combined variations containing low-resolution and occlusion often present in face images in the wild, e.g., under the scenario of video surveillance. While most of the existing face image recovery approaches can handle only one type of variation per model, in this work, we propose a deep generative adversarial network (FCSR-GAN) for performing join...
Heart rate (HR) is an important physiological signal that reflects the physical and emotional status of a person. Traditional HR measurements usually rely on contact monitors, which may cause inconvenience and discomfort. Recently, some methods have been proposed for remote HR estimation from face videos; however, most of them focus on well-control...
Facial action units (AUs) recognition is essential for emotion analysis and has been widely applied in mental state analysis. Existing work on AU recognition usually requires big face dataset with AU labels; however, manual AU annotation requires expertise and can be time-consuming. In this work, we propose a semi-supervised approach for AU recogni...
Face recognition (FR) is being widely used in many applications from access control to smartphone unlock. As a result, face presentation attack detection (PAD) has drawn increasing attention to secure the FR systems. Traditional approaches for PAD mainly assume that training and testing scenarios are similar in imaging conditions (illumination, s...
Face presentation attack detection (PAD) has drawn increasing attention as a means to secure face recognition (FR) systems, which are widely used in many applications from access control to smartphone unlock. Traditional approaches for PAD may lack good generalization capability into new application scenarios due to the limited number of subjects and da...
In the past few years, great efforts have been devoted to scene text detection. Nevertheless, efficient text detection in the wild remains a challenging problem. Methods for general object detection usually have limitations in handling the arbitrary orientations and large aspect ratios of scene text. In this paper, we present a novel scene text det...
Multi-label classification is an essential problem in image classification, because there are usually multiple related tags associated with each image. However, building a large scale multi-label dataset with clean labels can be very expensive and difficult. Therefore, utilizing a small set of data with verified labels and massive data with noise l...
Heart rate (HR) is an important physiological signal that reflects the physical and emotional activities of humans. Traditional HR measurements are mainly based on contact monitors, which are inconvenient and may cause discomfort for the subjects. Recently, methods have been proposed for remote HR estimation from face videos. However, most of the e...
In this work, we propose an end-to-end approach for robust remote heart rate (HR) measurement gleaned from facial videos. Specifically, the approach is based on remote photoplethysmography (rPPG), which constitutes a pulse-triggered perceivable chromatic variation, sensed in RGB face videos. Incidentally, rPPGs can be affected in less-constraine...
The explosive growth of digital images in video surveillance and social media has led to the significant need for efficient search of persons of interest in law enforcement and forensic applications. Despite tremendous progress in primary biometric traits (e.g., face and fingerprint) based person identification, a single biometric trait alone can n...
The explosive growth of digital images in video surveillance and social media has led to the significant need for efficient search of persons of interest in law enforcement and forensic applications. Despite tremendous progress in primary biometric traits (e.g., face and fingerprint) based person identification, a single biometric trait alone canno...
In this paper, we propose an automatic engagement prediction method for the Engagement in the Wild sub-challenge of EmotiW 2018. We first design a novel Gaze-AU-Pose (GAP) feature taking into account the information of gaze, action units and head pose of a subject. The GAP feature is then used for the subsequent engagement level prediction. To effi...
RGB-D face recognition (FR) has drawn increasing attention in recent years with the advances of new RGB-D sensing technologies and the decrease in sensor price. While a number of multi-modality fusion methods are available in face recognition, there is no known conclusion on how the RGB and depth should be fused. We provide a comparative study of fo...
Face attribute estimation has many potential applications in video surveillance, face retrieval, and social media. While a number of methods have been proposed for face attribute estimation, most of them did not explicitly consider the attribute correlation and heterogeneity (e.g., ordinal vs. nominal attributes) during feature representation learn...
Action recognition has wide applications from video surveillance and scene understanding to forensic investigation. While recent methods typically focus on a single action recognition from video clips, we investigate the problem of action recognition in crowd, which better replicates real video surveillance scenarios. We propose to perform actions r...
With the wide application of user authentication based on face recognition, face spoof attacks against face recognition systems are drawing increasing attention. While emerging approaches to face antispoofing have been reported in recent years, most of them are limited to non-realistic intra-database testing scenarios instead of the cross-database...
With the wide deployment of face recognition systems in applications from de-duplication to mobile device unlocking, security against face spoofing attacks requires increased attention; such attacks can be easily launched via printed photos, video replays and 3D masks of a face. We address the problem of face spoof detection against print (photo) an...
Demographic estimation entails automatic estimation of age, gender and race of a person from his face image, which has many potential applications ranging from forensics to social media. Automatic demographic estimation, particularly age estimation, remains a challenging problem because persons belonging to the same demographic group can be vastly...
With the wide deployment of face recognition systems in applications from border control to mobile device unlocking, the combat of face spoofing attacks requires increased attention; such attacks can be easily launched via printed photos, video replays and 3D masks. We address the problem of facial spoofing detection against replay attacks based on...
Mobile devices can carry large amounts of personal data, but are often left unsecured. PIN locks are inconvenient to use and thus have seen low adoption (33% of users). While biometrics are beginning to be used for mobile device authentication, they are used only for initial unlock. Mobile devices secured with only login authentication are still vu...
Automatic face recognition is now widely used in applications ranging from deduplication of identity to authentication of mobile payment. This popularity of face recognition has raised concerns about face spoof attacks (also known as biometric sensor presentation attacks), where a photo or video of an authorized person’s face could be used to gain...
Face recognition in surveillance systems is important for security applications, especially in nighttime scenarios when the subject is far away from the camera. However, due to the face image quality degradation caused by large camera standoff and low illuminance, nighttime face recognition at large standoff is challenging. In this paper, we report...
As face recognition applications progress from constrained sensing and cooperative subjects scenarios (e.g., driver’s license and passport photos) to unconstrained scenarios with uncooperative subjects (e.g., video surveillance), new challenges are encountered. These challenges are due to variations in ambient illumination, image resolution, backgr...
Facial composites are widely used by law enforcement agencies to assist in the identification and apprehension of suspects involved in criminal activities. These composites, generated from witness descriptions, are posted in public places and in the media with the hope that some viewers will provide tips about the identity of the suspect. This meth...
Automatic estimation of demographic attributes (e.g., age, gender, and race) from a face image is a topic of growing interest with many potential applications. Most prior work on this topic has used face images acquired under constrained and cooperative scenarios. This paper addresses the more challenging problem of automatic age, gender, and race...
In person identification, recognition failure due to variations of illumination is common. In this study, we employed image-processing techniques to tackle this problem. Participants performed recognition and matching tasks where the face stimuli were either original images or computer-processed images in which shading was weakened via a number of...
One of the major challenges encountered by face recognition lies in the difficulty of handling arbitrary poses variations. While different approaches have been developed for face recognition across pose variations, many methods either require manual landmark annotations or assume the face poses to be known. These constraints prevent many face recog...
There has been a growing interest in automatic age estimation from facial images due to a variety of potential applications in law enforcement, security control, and human-computer interaction. However, despite advances in automatic age estimation, it remains a challenging problem. This is because the face aging process is determined not only by in...
Tattoos on the human body provide an important clue to the identity of a suspect. While a tattoo is not a unique identifier, it narrows down the list of identities for the suspect. For these reasons, law enforcement agencies have been collecting tattoo images of the suspects at the time of booking. A few successful attempts have been made to design an au...
Facial sketches are widely used by law enforcement agencies to assist in the identification and apprehension of suspects involved in criminal activities. Sketches used in forensic investigations are either drawn by forensic artists (forensic sketches) or created with computer software (composite sketches) following the verbal description provided b...
Illumination preprocessing is an effective and efficient approach in handling lighting variations for face recognition. Despite much attention to face illumination preprocessing, there is seldom systemic comparative study on existing approaches that presents fascinating insights and conclusions in how to design better illumination preprocessing met...
Lighting normalization is a kind of widely used approach for achieving illumination invariant face recognition. Lighting normalization approaches try to regularize various lighting conditions in different face images into ideal illumination before face recognition. However, many existing methods perform lighting normalization by treating face image...
The problem of automatically matching composite sketches to facial photographs is addressed in this paper. Previous research on sketch recognition focused on matching sketches drawn by professional artists who either looked directly at the subjects (viewed sketches) or used a verbal description of the subject's appearance as provided by an eyewitne...
When a crime occurs and a facial photograph of the suspect is not available (from a surveillance camera or a mobile phone), law enforcement agencies often use a facial sketch to help identify and capture the suspect. Typically, the facial sketch of a suspect is released to the public via newspapers and television so that citizens can identify the...
In the last decade, some illumination preprocessing approaches were proposed to eliminate the lighting variation in face images for lighting-invariant face recognition. However, we find surprisingly that existing preprocessing methods were seldom modeled to directly enhance the separability of different faces, which should have been the essential g...
There is growing interest in achieving age invariant face recognition due to its wide applications in law enforcement. The challenge lies in that face aging is quite a complicated process, which involves both intrinsic and extrinsic factors. Face aging also influences individual facial components (such as the mouth, eyes, and nose) differently. We...
3D face modeling from 2D face images is of significant importance for face analysis, animation and recognition. Previous research on this topic mainly focused on 3D face modeling from a single 2D face image; however, a single face image can only provide a limited description of a 3D face. In many applications, for example, law enforcement, multi-vi...
Illumination variation is one of the intractable yet crucial problems in face recognition, and many lighting normalization approaches have been proposed in the past decades. Nevertheless, most of them preprocess all the face images in the same way, without considering the specific lighting in each face image. In this paper, we propose a lighting awa...
Illumination variation has been one of the most intractable problems in face recognition, and many approaches have been proposed to handle the illumination problem in recent decades. The key problem is how to get stable similarity measurements between two face images of the same individual but captured under dramatically different lighting co...
Today's camera sensors usually have a high gray-scale resolution, e.g., 256; however, due to dramatic lighting variations, the gray-scales distributed to the face region might be far fewer than 256. Therefore, besides low spatial resolution, a practical face recognition system must also handle degraded face images of low gray-scale resolution (LG...
In this paper, we propose a novel homomorphic wavelet filtering based illumination transfer technique to change the dominant lighting of one face image (source face image) to another face image (reference face image). Specifically, in the proposed method, based on the “reflectance-illumination” imaging model, we first obtain an approximate...