Conference Paper

Mitigating Bias in Face Recognition Using Skewness-Aware Reinforcement Learning


... The underlying bias issues of neural networks, involved in the aforementioned examples, lead to important discussions [5,6,10,11,12,13,14,15,16,17,18,19]. Specifically, these aforementioned examples highlight the presence of two distinct prevalent types of biases. ...
... I, the current literature often ambiguously groups them under the general term "bias" (e.g., dataset bias, algorithmic bias, sex bias, or racial bias) [14,20,21] and interprets them differently across scenarios. Furthermore, numerous works addressing one type of bias inadvertently cite the other as their motivation [11,12,21]. Additionally, the taxonomy of bias issues in existing survey papers may not sufficiently distinguish between them or explicitly acknowledge their differences [22,23,24]. ...
... • "Racial bias indeed degrades the fairness of recognition system and the error rates on non-Caucasians are usually much higher than Caucasians." [12] • "A certain demographic group can be better recognized than other groups." [13] • "Recognition accuracies depend on demographic cohort." ...
Preprint
Full-text available
Bias issues in neural networks have garnered significant attention alongside their promising advancement. Among various bias issues, mitigating two predominant biases is crucial in advancing fair and trustworthy AI: (1) ensuring neural networks yield even performance across demographic groups, and (2) ensuring algorithmic decision-making does not rely on protected attributes. However, upon investigating papers in the relevant literature, we find that there exists a persistent, extensive but under-explored confusion regarding these two types of biases. Furthermore, the confusion has already significantly hampered the clarity of the community and subsequent development of debiasing methodologies. Thus, in this work, we aim to restore clarity by providing two mathematical definitions for these two predominant biases and leveraging these definitions to unify a comprehensive list of papers. Next, we highlight the common phenomena and the possible reasons for the existing confusion. To alleviate the confusion, we provide extensive experiments on synthetic, census, and image datasets, to validate the distinct nature of these biases, distinguish their different real-world manifestations, and evaluate the effectiveness of a comprehensive list of bias assessment metrics in assessing the mitigation of these biases. Further, we compare these two types of biases from multiple dimensions including the underlying causes, debiasing methods, evaluation protocol, prevalent datasets, and future directions. Last, we provide several suggestions aiming to guide researchers engaged in bias-related work to avoid confusion and further enhance clarity in the community.
... The RFW dataset has been widely adopted by the research community to evaluate and compare the performance of FR algorithms across different racial groups. • BUPT-BalancedFace [53]: The BUPT-BalancedFace dataset was constructed to address demographic bias by ensuring race balance across the dataset. It contains approximately 1.3 million images from 28,000 celebrities, with a balanced distribution of 7,000 identities per race. ...
... Several studies have evaluated bias and fairness in FR systems using the standard deviation of performance metrics calculated across demographic groups. These metrics include FMR, FNMR, and True Match Rate (TMR), where a higher standard deviation corresponds to greater demographic disparities [32], [53], [57], [72]-[74]. Another significant metric is the Skewed Error Ratio (SER), which specifically focuses on worst-case error ratios, providing insights into the performance imbalance across groups [53], [72], [75]. ...
... These metrics include FMR, FNMR, and True Match Rate (TMR), where a higher standard deviation corresponds to greater demographic disparities [32], [53], [57], [72]-[74]. Another significant metric is the Skewed Error Ratio (SER), which specifically focuses on worst-case error ratios, providing insights into the performance imbalance across groups [53], [72], [75]. Recent competitions [76]-[78] exploring the use of synthetic data for FR and bias mitigation have adopted a trade-off performance metric: the mean accuracy adjusted by the standard deviation. ...
Preprint
Full-text available
Demographic bias in face recognition (FR) has emerged as a critical area of research, given its impact on fairness, equity, and reliability across diverse applications. As FR technologies are increasingly deployed globally, disparities in performance across demographic groups -- such as race, ethnicity, and gender -- have garnered significant attention. These biases not only compromise the credibility of FR systems but also raise ethical concerns, especially when these technologies are employed in sensitive domains. This review consolidates extensive research efforts providing a comprehensive overview of the multifaceted aspects of demographic bias in FR. We systematically examine the primary causes, datasets, assessment metrics, and mitigation approaches associated with demographic disparities in FR. By categorizing key contributions in these areas, this work provides a structured approach to understanding and addressing the complexity of this issue. Finally, we highlight current advancements and identify emerging challenges that need further investigation. This article aims to provide researchers with a unified perspective on the state-of-the-art while emphasizing the critical need for equitable and trustworthy FR systems.
... Fairness in FR is often assessed by identifying biases that cause models to favor or disadvantage specific demographic groups, resulting in unequal outcomes [62]. For example, FR models may exhibit varying accuracy rates across races, resulting in affected groups facing increased risks of fraud or restricted access to essential services [61], as well as wrongful investigation or heightened surveillance based on racial and gender lines [69]. Consequently, enhancing fairness in FR technology is essential. ...
... Consequently, enhancing fairness in FR technology is essential. A widely adopted approach to mitigate these biases is to employ balanced training datasets such as BUPT-Balancedface, ensuring equitable representation in demographics [33], [61], [86]. ...
... Further, we use the standard deviation (STD) and the skewed error ratio (SER) to evaluate bias in the accuracy of our models across different ethnic groups, in line with earlier work [61]. The Skewed Error Ratio (SER) is calculated as the ratio of the highest accuracy to the lowest accuracy across the different groups: SER = Highest Accuracy / Lowest Accuracy. ...
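The excerpts above fully specify these bias metrics, so a short sketch of how they can be computed from per-group verification accuracies may be useful. Note that some cited works define SER on error rates while the excerpt above defines it on accuracies, so both variants are returned; the group names and numbers below are purely illustrative and not taken from any of the cited papers.

```python
import numpy as np

def fairness_stats(group_accuracies):
    """Bias metrics over per-group verification accuracies.

    Returns the accuracy-ratio variant of SER (highest/lowest accuracy),
    the error-ratio variant (worst/best error rate), and the standard
    deviation of accuracy across groups. SER has an optimum of 1 and
    STD an optimum of 0 for a perfectly even model.
    """
    acc = np.asarray(list(group_accuracies.values()), dtype=float)
    err = 1.0 - acc
    ser_acc = acc.max() / acc.min()   # accuracy-ratio variant
    ser_err = err.max() / err.min()   # error-ratio (skewed error ratio) variant
    return ser_acc, ser_err, acc.std()

# Illustrative numbers only, not results from the cited works.
acc = {"African": 0.945, "Asian": 0.952, "Caucasian": 0.978, "Indian": 0.957}
print(fairness_stats(acc))
```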
Article
Full-text available
Face recognition (FR) technology has a concurrent interaction between accuracy, privacy, and fairness. To investigate this, we present a deep learning FR model trained on the BUPT-Balancedface dataset, which is racially balanced, and incorporate differential privacy (DP) into the private variant of the model to ensure data confidentiality, retaining the focus on fairness aspects. We analyze the verification accuracy of both private (with different privacy budgets) and non-private models using a variety of benchmark FR datasets. Our results show that the non-private model achieves reasonably high accuracy comparable to the current state-of-the-art models. The private model shows a trade-off between accuracy and privacy, as well as between fairness and privacy, meaning that enhancing privacy tends to reduce both accuracy and fairness. Our findings indicate that DP unevenly reduces accuracy across demographics and suggest that adjusting the privacy budget allows for better balancing of privacy, accuracy, and fairness. Furthermore, we extend our experiments to consider real-world bias by training our private model on the imbalanced CASIA-WebFace dataset, where variability in accuracy and fairness disparities are amplified, showing the impact of dataset composition on the interplay between privacy, accuracy, and fairness. Additionally, we show that traditional membership inference attacks (MIAs) can compromise privacy in FR systems. We further introduce a more realistic, identity-based MIA (I-MIA) tailored specifically for FR. Our analysis demonstrates that DP significantly mitigates privacy risks from both traditional MIAs and the proposed I-MIA.
... Performance variations can also be observed for different ethnicities [2,13,34,47]. The most commonly used datasets in this area for training and evaluation, Racial Faces-in-the-Wild (RFW) [57], Balanced Faces in the Wild (BFW) [46], and BUPT [55,58], introduce four ethnicity subsets that individuals belong to: Indian, Black/African, Asian, Caucasian/White. This probably oversimplifies a complex, naturally non-binary attribute, as individuals may belong to or identify with more than one ethnicity. ...
... Other works therefore propose to rather use skin tone [42]. In [55], the Skewed Error Ratio (SER) has been introduced as a bias-measuring metric. Wu et al. ...
... For training models on unbalanced data that amplify bias, we utilized the BUPT-Balancedface dataset [55,58]. This dataset provides ethnicity-based training subsets, covering the ethnicities White/Caucasian, Indian, African, and Asian. ...
Preprint
Full-text available
Face recognition (FR) models are vulnerable to performance variations across demographic groups. The causes for these performance differences are unclear due to the highly complex deep learning-based structure of face recognition models. Several works aimed at exploring possible roots of gender and ethnicity bias, identifying semantic reasons such as hairstyle, make-up, or facial hair as possible sources. Motivated by recent discoveries of the importance of frequency patterns in convolutional neural networks, we explain bias in face recognition using state-of-the-art frequency-based explanations. Our extensive results show that different frequencies are important to FR models depending on the ethnicity of the samples.
... Deng et al. [30] proposed variational margins at the algorithm level to mitigate the misleading tail-sample prediction in the head class. Wang et al. [58] carefully created an unfair filtering network under a meta-learning paradigm that utilizes meta re-weighting interventions to reduce training bias caused by category imbalance. Bao et al. [59] proposed Pixel-level Auxiliary learning (PA) and Feature Rearrangement (FR) to better utilize the facial features, while Adaptive Routing (AR) was devised to select the appropriate classifiers to improve the long-tailed recognition. ...
... Liu et al. [61] presented fair loss, which is a margin-aware reinforcement learning-based loss function to learn an adaptive margin. Wang et al. [58] proposed a reinforcement learning-based Racial Balancing Network (RL-RBN) to find the most suitable margin for non-Caucasians and reduce the skewness of feature dispersion among races. In this work, a dynamic group-aware margin strategy based on reinforcement learning is designed with a more flexible and stable margin loss function for imbalanced age estimation. ...
... Inspired by [58], [61], we conceptualize the problem of identifying suitable adaptive margins within the framework of a Markov Decision Process (MDP). With state s_t as input, the Q-value Q(s_t, a) is estimated by the deep Q-learning network (DQN), where the agent is trained to use different actions a_t to adapt the margins for each state. ...
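The excerpts above cast margin selection as an MDP solved with (deep) Q-learning. As a rough illustration of the idea, the sketch below uses a simple tabular Q-learning loop over a discretized state (e.g., binned skewness of feature dispersion) and a handful of candidate margins; the states, actions, and reward here are placeholders to make the loop runnable, not the actual formulation of RL-RBN or the age-estimation work.

```python
import numpy as np

# Candidate additive margins the agent can pick between (hypothetical values).
ACTIONS = [0.35, 0.40, 0.45, 0.50, 0.55]
N_STATES = 10            # e.g., discretized bins of inter-group skewness
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

q_table = np.zeros((N_STATES, len(ACTIONS)))

def choose_action(state, rng):
    """Epsilon-greedy action selection over the margin candidates."""
    if rng.random() < EPS:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_table[state]))

def environment_step(state, action, rng):
    """Placeholder environment: in the cited works the next state would come from
    re-measuring feature-dispersion skewness after training with the chosen margin,
    and the reward from the reduction of that skewness. Here both are simulated."""
    next_state = int(rng.integers(N_STATES))
    reward = -abs(next_state - N_STATES // 2) / N_STATES  # fake "skewness" penalty
    return next_state, reward

rng = np.random.default_rng(0)
state = int(rng.integers(N_STATES))
for _ in range(1000):
    action = choose_action(state, rng)
    next_state, reward = environment_step(state, action, rng)
    # Standard Q-learning update.
    best_next = q_table[next_state].max()
    q_table[state, action] += ALPHA * (reward + GAMMA * best_next - q_table[state, action])
    state = next_state

print("Selected margin per state:", [ACTIONS[int(a)] for a in q_table.argmax(axis=1)])
```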
Preprint
Full-text available
With the recent advances in computer vision, age estimation has significantly improved in overall accuracy. However, because the most common methods do not take into account the class imbalance problem in age estimation datasets, they suffer from a large bias in recognizing long-tailed groups. To achieve high-quality imbalanced learning in long-tailed groups, the dominant solution lies in that the feature extractor learns the discriminative features of different groups and the classifier is able to provide appropriate and unbiased margins for different groups by the discriminative features. Therefore, in this work, we propose an innovative collaborative learning framework (GroupFace) that integrates a multi-hop attention graph convolutional network and a dynamic group-aware margin strategy based on reinforcement learning. Specifically, to extract the discriminative features of different groups, we design an enhanced multi-hop attention graph convolutional network. This network is capable of capturing the interactions of neighboring nodes at different distances, fusing local and global information to model facial deep aging, and exploring diverse representations of different groups. In addition, to further address the class imbalance problem, we design a dynamic group-aware margin strategy based on reinforcement learning to provide appropriate and unbiased margins for different groups. The strategy divides the sample into four age groups and considers identifying the optimum margins for various age groups by employing a Markov decision process. Under the guidance of the agent, the feature representation bias and the classification margin deviation between different groups can be reduced simultaneously, balancing inter-class separability and intra-class proximity. After joint optimization, our architecture achieves excellent performance on several age estimation benchmark datasets.
... • BUPT-BalancedFace [43] is designed to address performance disparities across different ethnic groups. We relabel it according to the FairFace classifier [47], which provides labels for ethnicity (White, Black, Asian, Indian) and gender (Male, Female). ...
... Task 1: The first task focuses on using synthetic data to mitigate demographic biases within FR systems. To evaluate the performance of these systems, we create sets of mated and non-mated comparisons using subjects from the BUPT-BalancedFace database [43]. We consider the eight demographic groups defined in Section II-B, which result from the combination of four ethnicities (White, Black, Asian, and Indian) and two genders (Male and Female), ensuring a balanced representation across these groups in the comparison lists. ...
... Evaluation: In each sub-task, participants received the comparison files comprising both mated and non-mated comparisons, which are used to evaluate the performance of their proposed FR systems. Task 1 involves a single comparison file containing balanced comparisons of different demographic groups of the BUPT [43] database, while Task 2 comprises four comparison files, each corresponding to one of the specific real-world databases considered (i.e., BUPT, AgeDB [44], CFP-FP [45], and ROF [46]). During the evaluation of each sub-task, participants are required to submit two files per database through the Codalab platform: i) the scores of the baseline system, and ii) the scores of the proposed system. ...
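The Task 1 protocol described above evaluates FR systems on mated and non-mated comparison lists that are balanced across eight demographic groups. A minimal sketch of how such balanced lists could be assembled is given below; the dictionary structure, group names, image paths, and pair counts are assumptions for illustration and not the actual FRCSyn protocol files.

```python
import itertools
import random

def build_comparisons(identities, pairs_per_group, seed=0):
    """identities: dict mapping demographic group -> {identity_id: [image paths]}.
    Returns lists of mated and non-mated pairs, capped per group for balance."""
    rng = random.Random(seed)
    mated, non_mated = [], []
    for group, ids in identities.items():
        # Mated: two different images of the same identity.
        candidates = [pair for imgs in ids.values() if len(imgs) >= 2
                      for pair in itertools.combinations(imgs, 2)]
        mated += [(group,) + p for p in rng.sample(candidates, min(pairs_per_group, len(candidates)))]
        # Non-mated: images of two different identities within the same group.
        id_list = list(ids)
        cross = [(rng.choice(ids[i]), rng.choice(ids[j]))
                 for i, j in itertools.combinations(id_list, 2)]
        non_mated += [(group,) + p for p in rng.sample(cross, min(pairs_per_group, len(cross)))]
    return mated, non_mated

# Tiny illustrative structure (paths and groups are placeholders).
data = {"White_Male": {"id1": ["a.jpg", "b.jpg"], "id2": ["c.jpg", "d.jpg"]},
        "Black_Female": {"id3": ["e.jpg", "f.jpg"], "id4": ["g.jpg", "h.jpg"]}}
mated, non_mated = build_comparisons(data, pairs_per_group=2)
print(len(mated), len(non_mated))
```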
Preprint
Full-text available
Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to the privacy concerns and challenges associated with obtaining real data, including diverse scenarios, quality, and demographic groups, among others. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it to adapt to specific problem-solving needs. To effectively use such data, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. In order to promote the proposal of novel Generative AI methods and synthetic data, and investigate the application of synthetic data to better train face recognition systems, we introduce the 2nd FRCSyn-onGoing challenge, based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This is an ongoing challenge that provides researchers with an accessible platform to benchmark i) the proposal of novel Generative AI methods and synthetic data, and ii) novel face recognition systems that are specifically proposed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition such as demographic bias, domain adaptation, and performance constraints in demanding situations, such as age disparities between training and testing, changes in the pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.
... skewed error ratio (SER) and standard deviations (STD). Our results on RFW, which are aligned with previous works [25,44,61-63], report demographic bias in deep learning-based FR models. ...
... , where g represents the demographic group, as reported in [61,63]. A higher STD value indicates more bias across demographic groups and vice versa. ...
... This can be observed from the high STD values (far from the optimal zero) and SER values that are far from 1. These observations are aligned with previous works [25,44,61-63] reporting bias in FR when trained/finetuned on unbalanced datasets. Potential causes of this bias [67] link to the unbalanced training datasets, the model's sensitivity to skin color [39] and/or hairstyles [49]. The racial distribution of the current FR datasets [27,66,75] used in the literature is not balanced, with a majority of identities being Caucasians [61-63]. ...
Preprint
Full-text available
Foundation models are predominantly trained in an unsupervised or self-supervised manner on highly diverse and large-scale datasets, making them broadly applicable to various downstream tasks. In this work, we investigate for the first time whether such models are suitable for the specific domain of face recognition. We further propose and demonstrate the adaptation of these models for face recognition across different levels of data availability. Extensive experiments are conducted on multiple foundation models and datasets of varying scales for training and fine-tuning, with evaluation on a wide range of benchmarks. Our results indicate that, despite their versatility, pre-trained foundation models underperform in face recognition compared to similar architectures trained specifically for this task. However, fine-tuning foundation models yields promising results, often surpassing models trained from scratch when training data is limited. Even with access to large-scale face recognition training datasets, fine-tuned foundation models perform comparably to models trained from scratch, but with lower training computational costs and without relying on the assumption of extensive data availability. Our analysis also explores bias in face recognition, with slightly higher bias observed in some settings when using foundation models.
... For the final evaluation of the proposed FR systems, we use the same four real databases of the 1st edition of the FRCSyn Challenge [30,31]: i) BUPT-BalancedFace [46], designed to address performance disparities across different ethnic groups; ii) AgeDB [33], including facial images of the same subjects at different ages; iii) CFP-FP [37], presenting facial images from subjects with great changes in pose, including both frontal and profile images; and iv) ROF [15], consisting of occluded faces with both upper and lower face occlusions. ...
... Task 1: The first proposed task focuses on the use of synthetic data to mitigate demographic biases in FR systems. To assess the effectiveness of the proposed systems, we generate lists of mated and non-mated comparisons using subjects from the BUPT-BalancedFace database [46]. We take into account eight demographic groups obtained from the combination of four ethnic groups (White, Black, Asian, and Indian) and two genders (Male and Female), and keep these groups balanced in the number of comparisons. ...
... Evaluation: In each sub-task, participants received the comparison files comprising both mated and non-mated comparisons, which are used to evaluate the performance of their proposed FR systems. Task 1 involves a single comparison file containing balanced comparisons of different demographic groups of the BUPT [46] database, while Task 2 comprises four comparison files, each corresponding to one of the specific real-world databases considered (i.e., BUPT, AgeDB [33], CFP-FP [37], and ROF [15]). During the evaluation of each sub-task, participants are required to submit via Codalab three files per database: i) the scores of the baseline system, ii) the scores of the proposed system, and iii) the decision threshold for each FR system (i.e., baseline and proposed). ...
... Approaches to mitigate bias in inter-attribute discrimination performance can be categorized into two stages: the dataset construction and the model construction. In the dataset construction stage, efforts are made to create datasets with balanced racial proportions [20,21], sample or augment data to minimize disparities in recognition accuracy between attributes [13,16], and propose methods for data augmentation [11]. In the model construction stage, strategies involve mitigating bias in model performance through score normalization between attributes [18] and dynamically adjusting hyperparameters based on attributes [20,22,19], under the assumption of the existence of racially biased datasets. ...
... In the dataset construction stage, efforts are made to create datasets with balanced racial proportions [20,21], sample or augment data to minimize disparities in recognition accuracy between attributes [13,16], and propose methods for data augmentation [11]. In the model construction stage, strategies involve mitigating bias in model performance through score normalization between attributes [18] and dynamically adjusting hyperparameters based on attributes [20,22,19], under the assumption of the existence of racially biased datasets. ...
... Therefore, in facial recognition systems, the issue of racial bias has been mainly related to the racial proportions in the training datasets. Wang et al. [20] created the BUPT-Balancedface dataset with balanced racial proportions and demonstrated that training with it could reduce racial bias compared to training with traditionally biased datasets in terms of racial proportions. Faisal et al. [13] proposed a method of resampling in which data objects are repeatedly replaced to minimise differences in recognition accuracy between races, thereby removing data that could cause racial bias from the dataset. ...
Preprint
Demographic bias is one of the major challenges for face recognition systems. The majority of existing studies on demographic biases are heavily dependent on specific demographic groups or a demographic classifier, making it difficult to address performance for unrecognised groups. This paper introduces "LabellessFace", a novel framework that mitigates demographic bias in face recognition without requiring the demographic group labeling typically required for fairness considerations. We propose a novel fairness enhancement metric called the class favoritism level, which assesses the extent of favoritism towards specific classes across the dataset. Leveraging this metric, we introduce the fair class margin penalty, an extension of existing margin-based metric learning. This method dynamically adjusts learning parameters based on class favoritism levels, promoting fairness across all attributes. By treating each class as an individual in facial recognition systems, we facilitate learning that minimizes biases in authentication accuracy among individuals. Comprehensive experiments have demonstrated that our proposed method is effective for enhancing fairness while maintaining authentication accuracy.
... To overcome these issues, many previous studies have been conducted, and various multiracial datasets have been established, facilitating research on face recognition across diverse racial environments. For instance, the BUPT-BalancedFace dataset was developed to enhance the fairness of face recognition models by balancing the representation of various races [1]. However, the process of constructing such datasets typically involves web crawling or sampling from existing datasets by race, making it challenging to obtain a sufficient number of images for each subject. ...
... However, most of the research has been conducted in the Western world, and accuracy for other cultures, such as Asian and African, is still poor [26][27][28]. To overcome this, attempts are being made to create datasets that are increasingly racially diverse [1,19,29,30]. ...
... First, the dataset used for the style image that forms the basis of the synthetic face image is BUPT-BalancedFace [1]. This dataset includes 7000 subjects per race and consists of a total of 1.3 M images. ...
Article
Full-text available
Despite major breakthroughs in facial recognition technology, problems with bias and a lack of diversity still plague face recognition systems today. To address these issues, we created synthetic face data using a diffusion-based generative model and fine-tuned already-high-performing models. To achieve a more balanced overall performance across various races, the synthetic dataset was created by following the dual-condition face generator (DCFace) resolution and using race-varied data from BUPT-BalancedFace as well as FairFace. To verify the proposed method, we fine-tuned a pre-trained improved residual networks (IResnet)-100 model with additive angular margin (ArcFace) loss using the synthetic dataset. The results show that the racial gap in performance is reduced from 0.0107 to 0.0098 in standard deviation terms, while the overall accuracy increases from 96.125% to 96.1625%. The improved racial balance and diversity in the synthetic dataset led to an improvement in model fairness, demonstrating that this resource could facilitate more equitable face recognition systems. This method provides a low-cost way to address data diversity challenges and help make face recognition more accurate across different demographic groups. The results of the study highlighted that more advanced synthesized datasets, created through diffusion-based models, can also result in increased facial recognition accuracy with greater fairness, emphasizing that these should not be ignored by developers aiming to create artificial intelligence (AI) systems.
... Authentic Datasets. For authentic face data used to train the FR models, we adopted the well-known BUPT-Balancedface [60] and CASIA-WebFace [63] datasets. BUPT-Balancedface [60] consists of 1.3M images from 28K identities and is annotated with both ethnicity and identity labels. ...
... For authentic face data used to train the FR models, we adopted the well-known BUPT-Balancedface [60] and CASIA-WebFace [63] datasets. BUPT-Balancedface [60] consists of 1.3M images from 28K identities and is annotated with both ethnicity and identity labels. Its ethnicity annotations include four demographic groups: African, Asian, Caucasian, and Indian, with 7K identities and approximately 300K images each. ...
... The synthetic datasets were unbalanced towards the Caucasian group, as determined by labeling all the data using a ResNet18 [28] backbone trained on BUPT-BalancedFace [60] to predict the ethnicity label of each identity. The inferred ethnicity pseudo-labels are reported in Table 1. ...
Preprint
Full-text available
Over the recent years, the advancements in deep face recognition have fueled an increasing demand for large and diverse datasets. Nevertheless, the authentic data acquired to create those datasets is typically sourced from the web, which, in many cases, can lead to significant privacy issues due to the lack of explicit user consent. Furthermore, obtaining a demographically balanced, large dataset is even more difficult because of the natural imbalance in the distribution of images from different demographic groups. In this paper, we investigate the impact of demographically balanced authentic and synthetic data, both individually and in combination, on the accuracy and fairness of face recognition models. Initially, several generative methods were used to balance the demographic representations of the corresponding synthetic datasets. Then a state-of-the-art face encoder was trained and evaluated using (combinations of) synthetic and authentic images. Our findings emphasized two main points: (i) the increased effectiveness of training data generated by diffusion-based models in enhancing accuracy, whether used alone or combined with subsets of authentic data, and (ii) the minimal impact of incorporating balanced data from pre-trained generative methods on fairness (in nearly all tested scenarios using combined datasets, fairness scores remained either unchanged or worsened, even when compared to unbalanced authentic datasets). Source code and data are available at \url{https://cutt.ly/AeQy1K5G} for reproducibility.
... Biometrics has been particularly fruitful in this scenario, with the study of demographic biases, including their prevention and mitigation [10,13]. Methods tackling bias in biometrics span the evaluation of trained models in different populations [6,38,40], learning strategies including fairness constraints in their optimization objectives [16,39], or the development of new databases with a broad and fair representation of the demographic groups [32,46,48]. However, a key point for analyzing and mitigating demographic biases is to be able to measure them properly. ...
... With this balance, we are not only aiming to measure the model's bias but also to consider how competitive the recognition system is, a relevant aspect in systems with very small error rates. By examining the evaluation of high-performance models (e.g., those presented to NIST FRTE) with the DFI metric on state-of-the-art datasets, such as RFW [48] or BUPT-B [46], we noticed that error rates associated to demographic biases are not captured with the cited metric. We hypothesize that, since DFI uses the entire distribution to measure fairness i) the tail has a little relevance in the computation and ii) genuine and impostor distributions are treated as a whole, hence hindering the assessment of any bias present in either of them. ...
... Similarly to the models evaluated in [49], we assessed the performance of the trained models on IJB-C [28]. These models will be used later in our experiments: Throughout the experiments carried out in the present work, we will use the following publicly available databases: MORPH [37,44], RFW [48], and BUPT-B [46]. All three databases include demographic labels with the gender and ethnicity of each subject. ...
Preprint
Full-text available
We present a novel metric designed, among other applications, to quantify biased behaviors of machine learning models. At its core, the metric consists of a new similarity metric between score distributions that balances both their general shapes and tails' probabilities. In that sense, our proposed metric may be useful in many application areas. Here we focus on and apply it to the operational evaluation of face recognition systems, with special attention to quantifying demographic biases; an application where our metric is especially useful. The topic of demographic bias and fairness in biometric recognition systems has gained major attention in recent years. The usage of these systems has spread in society, raising concerns about the extent to which these systems treat different population groups. A relevant step to prevent and mitigate demographic biases is first to detect and quantify them. Traditionally, two approaches have been studied to quantify differences between population groups in machine learning literature: 1) measuring differences in error rates, and 2) measuring differences in recognition score distributions. Our proposed Comprehensive Equity Index (CEI) trades off both approaches, combining both errors from distribution tails and general distribution shapes. This new metric is well suited to real-world scenarios, as measured on NIST FRVT evaluations, involving high-performance systems and realistic face databases including a wide range of covariates and demographic groups. We first show the limitations of existing metrics to correctly assess the presence of biases in realistic setups and then propose our new metric to tackle these limitations. We tested the proposed metric with two state-of-the-art models and four widely used databases, showing its capacity to overcome the main flaws of previous bias metrics.
... In such procedures, additional prediction heads may be introduced for attribute subgroup predictions and the information concerning sensitive attributes would be removed through inverse gradient updating [7], [8] or disentangling features [9]- [12]. Other fairness enhancing techniques include data augmentations [13], sampling [14], [15], data noising [16], dataset balancing with generative methods [17]- [19], and reweighting mechanisms [20], [21]. ...
... Baseline fairness measures: To evaluate the validity of HFM in capturing the discriminative degree of classifiers, we compare it with three commonly-used group fairness measures (that is, demographic parity (DP) [35], [36], equality of opportunity (EO) [37], and predictive quality parity (PQP) [2], [38]) and one named discriminative risk (DR) [32] that could reflect the bias level of ML models from both individual- and group-fairness aspects. ...
... As we can see from Figures 5(j) and 5(l), when direct computation of distances (i.e., maximal distance D_a(S) and average distance D_a^avg(S)) is expensive, obtaining their approximated values via ExtendDist distinctly costs less time than that of precise values by Eq. (6) and (7). Increasing m_2 (or m_1) in ExtendDist would cost more time, while the effect of increasing m_1 is more obvious. ...
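Since the excerpt above lists demographic parity and equality of opportunity among the baseline group fairness measures, a small sketch of how those two gaps are commonly computed from binary predictions and a binary sensitive attribute follows. It uses the usual textbook definitions, which may differ in detail from the formulations in [35]-[38]; the toy data is purely illustrative.

```python
import numpy as np

def dp_and_eo_gaps(y_true, y_pred, sensitive):
    """Demographic parity gap: difference in positive-prediction rates between groups.
    Equality-of-opportunity gap: difference in true-positive rates between groups.
    All inputs are binary arrays; `sensitive` encodes group membership (0/1)."""
    y_true, y_pred, sensitive = map(np.asarray, (y_true, y_pred, sensitive))
    rates, tprs = [], []
    for g in (0, 1):
        mask = sensitive == g
        rates.append(y_pred[mask].mean())        # P(yhat=1 | group g)
        pos = mask & (y_true == 1)
        tprs.append(y_pred[pos].mean())          # P(yhat=1 | y=1, group g)
    return abs(rates[0] - rates[1]), abs(tprs[0] - tprs[1])

# Toy data for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(dp_and_eo_gaps(y_true, y_pred, group))
```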
Preprint
Discrimination mitigation with machine learning (ML) models could be complicated because multiple factors may interweave with each other including hierarchically and historically. Yet few existing fairness measures are able to capture the discrimination level within ML models in the face of multiple sensitive attributes. To bridge this gap, we propose a fairness measure based on distances between sets from a manifold perspective, named as 'harmonic fairness measure via manifolds (HFM)' with two optional versions, which can deal with a fine-grained discrimination evaluation for several sensitive attributes of multiple values. To accelerate the computation of distances of sets, we further propose two approximation algorithms named 'Approximation of distance between sets for one sensitive attribute with multiple values (ApproxDist)' and 'Approximation of extended distance between sets for several sensitive attributes with multiple values (ExtendDist)' to respectively resolve bias evaluation of one single sensitive attribute with multiple values and that of several sensitive attributes with multiple values. Moreover, we provide an algorithmic effectiveness analysis for ApproxDist under certain assumptions to explain how well it could work. The empirical results demonstrate that our proposed fairness measure HFM is valid and approximation algorithms (i.e., ApproxDist and ExtendDist) are effective and efficient.
... However, our approach differs from the mentioned approach (FAAP (Wang et al. 2022)) in the following ways: (i) Their adversarial filters do not simulate any real-world situation, and the authors do not disclose what type of filter is developed, hence it is difficult to compare against or audit for; (ii) Their adversarial filters conceal the gender and race information and improve performance for tasks like smile or hair color identification, whereas we explicitly audit the platforms for gender prediction, which has a large number of downstream applications ranging from marketing to surveillance. Furthermore, most existing approaches are either too complex or need significant changes to the pipeline (Gong, Liu, and Jain 2020; Wang and Deng 2020; Wang, Zhang, and Deng 2021; Conti et al. 2022; Wang et al. 2022). We, on the other hand, perform simple fine-tuning and contrastive learning to mitigate the observed biases. ...
... We hope our dataset and bias mitigation algorithms will break this hegemony and provide some balance. Wang, M.; Zhang, Y.; and Deng, W. 2021. Meta balanced network for fair face recognition. ...
Article
Full-text available
Facial Recognition Systems (FRSs) are being developed and deployed all around the world at unprecedented rates. Most platforms are designed in a limited set of countries, but deployed in other regions too, without adequate checkpoints for region-specific requirements. This is especially problematic for Global South countries which lack strong legislation to safeguard persons facing disparate performance of these systems. A combination of unavailability of datasets, lack of understanding of how FRSs function and low-resource bias mitigation measures accentuate the problems at hand. In this work, we propose a self-curated face dataset composed of 6,579 unique male and female sports-persons (cricket players) from eight countries around the world. More than 50% of the dataset is composed of individuals from the Global South countries and is demographically diverse. To aid adversarial audits and robust model training, we curate four adversarial variants of each image in the dataset, leading to more than 40,000 distinct images. We also use this dataset to benchmark five popular facial recognition systems (FRSs), including both commercial and open-source FRSs, for the task of gender prediction (and country prediction for one of the open-source models as an example of red-teaming). Experiments on industrial FRSs reveal accuracies ranging from 98.2% (in case of Azure) to 38.1% (in case of Face++), with a large disparity between males and females in the Global South (max difference of 38.5% in case of Face++). Biases are also observed in all FRSs between females of the Global North and South (max difference of ~50%). A Grad-CAM analysis shows that the nose, forehead and mouth are the regions of interest for one of the open-source FRSs. Based on this crucial observation, we design simple, low-resource bias mitigation solutions using few-shot and novel contrastive learning techniques that demonstrate a significant improvement in accuracy with disparity between males and females reducing from 50% to 1.5% in one of the settings. For the red-teaming experiment using the open-source Deepface model we observe that simple fine-tuning is not very useful while contrastive learning brings steady benefits.
... We evaluate our findings with various fairness approaches, including data sampling, reweighting, adversarial training, and constraint-based approaches. In the experiments, we adopt the implementations of these approaches from [78], [79]. Similarly, we focus on the case of CelebA (T=s/S=g), and Figure 9 presents the results. ...
Preprint
Full-text available
While in-processing fairness approaches show promise in mitigating biased predictions, their potential impact on privacy leakage remains under-explored. We aim to address this gap by assessing the privacy risks of fairness-enhanced binary classifiers via membership inference attacks (MIAs) and attribute inference attacks (AIAs). Surprisingly, our results reveal that enhancing fairness does not necessarily lead to privacy compromises. For example, these fairness interventions exhibit increased resilience against MIAs and AIAs. This is because fairness interventions tend to remove sensitive information among extracted features and reduce confidence scores for the majority of training data for fairer predictions. However, during the evaluations, we uncover a potential threat mechanism that exploits prediction discrepancies between fair and biased models, leading to advanced attack results for both MIAs and AIAs. This mechanism reveals potent vulnerabilities of fair models and poses significant privacy risks of current fairness methods. Extensive experiments across multiple datasets, attack methods, and representative fairness approaches confirm our findings and demonstrate the efficacy of the uncovered mechanism. Our study exposes the under-explored privacy threats in fairness studies, advocating for thorough evaluations of potential security vulnerabilities before model deployments.
... and group standard deviation (STD) [52], defined as std_{∀y∈Y, ∀s∈S} ...
Preprint
Full-text available
Recent advances in generative models have sparked research on improving model fairness with AI-generated data. However, existing methods often face limitations in the diversity and quality of synthetic data, leading to compromised fairness and overall model accuracy. Moreover, many approaches rely on the availability of demographic group labels, which are often costly to annotate. This paper proposes AIM-Fair, aiming to overcome these limitations and harness the potential of cutting-edge generative models in promoting algorithmic fairness. We investigate a fine-tuning paradigm starting from a biased model initially trained on real-world data without demographic annotations. This model is then fine-tuned using unbiased synthetic data generated by a state-of-the-art diffusion model to improve its fairness. Two key challenges are identified in this fine-tuning paradigm, 1) the low quality of synthetic data, which can still happen even with advanced generative models, and 2) the domain and bias gap between real and synthetic data. To address the limitation of synthetic data quality, we propose Contextual Synthetic Data Generation (CSDG) to generate data using a text-to-image diffusion model (T2I) with prompts generated by a context-aware LLM, ensuring both data diversity and control of bias in synthetic data. To resolve domain and bias shifts, we introduce a novel selective fine-tuning scheme in which only model parameters more sensitive to bias and less sensitive to domain shift are updated. Experiments on CelebA and UTKFace datasets show that our AIM-Fair improves model fairness while maintaining utility, outperforming both fully and partially fine-tuned approaches to model fairness.
... An empirical examination of race-based differences in attitudes toward the police is particularly important given the nature of the technology that is the focus of the present study. Previous studies have shown that racial bias arises from both the data and the algorithms used to analyze them (Wang and Deng 2020). Facial recognition algorithms exhibit racial bias for various reasons. ...
Article
Full-text available
The police's use of facial recognition technologies allows them to verify identification in real-time by mapping facial features into indicators that can be compared with other data stored in its database or in online social networks. Advances in facial recognition technologies have changed law enforcement agencies' operations, improving their ability to identify suspects, investigate crimes, and deter criminal behavior. Most applications are used in tracking and identifying potential terrorists, searching for abducted and missing persons, and security surveillance at airports, national borders, and large public gatherings. However, for facial recognition technologies to fulfill their potential, they must not only be adopted by the police, but the public must also support their routine use. Using a secondary data analysis of a public opinion survey conducted by the Pew Research Center among a representative sample of US residents (N = 5307), we investigated the factors associated with the public's expectations about the positive and negative outcomes of the police's adoption of facial recognition technologies. Our results show that public attitudes to FRT are balanced, indicating public awareness of the potential advantages, but also disadvantages, of police adoption of FRT. Privacy considerations and familiarity with the technology were found to be critical for explaining public attitudes expressing both positive and negative expectations of the police's adoption of facial recognition technology. Our study contributes to the understanding of the factors associated with public attitudes toward the police's use of facial recognition technologies.
... The authors of [57,70] propose adaptive threshold-based approaches to improve fairness. Another approach is to address ethnicityrelated bias by learning disparate margins per demographic segment in the representation space [73,75,78] or by suppressing attribute-related information in the model [59]. While technically interesting, these methods are ethically and legally problematic in practice since they assume disparate treatment of human subjects by AI-based systems. ...
Preprint
Full-text available
Face recognition and verification are two computer vision tasks whose performances have advanced with the introduction of deep representations. However, ethical, legal, and technical challenges due to the sensitive nature of face data and biases in real-world training datasets hinder their development. Generative AI addresses privacy by creating fictitious identities, but fairness problems remain. Using the existing DCFace SOTA framework, we introduce a new controlled generation pipeline that improves fairness. Through classical fairness metrics and a proposed in-depth statistical analysis based on logit models and ANOVA, we show that our generation pipeline improves fairness more than other bias mitigation approaches while slightly improving raw performance.
... To compare the selected datasets, we choose the FRCSyn benchmark used to rank the submitted models to the competition. This benchmark is composed of four datasets, named AgeDB [16], BUPT [17], CFP-FP [18] and ROF [19]. The AgeDB dataset is focused on a comparison of images across age, and the most challenging scenario was used, named AgeDB-30, which has a gap of 30 years between the face images of the individuals and contains 3000 genuine pairs and 3000 impostor pairs. ...
Conference Paper
Face recognition has become a widely adopted method for user authentication and identification, with applications in various domains such as secure access, law enforcement, and locating missing persons. The success of this technology is largely attributed to deep learning, which leverages large datasets and effective loss functions to achieve highly discriminative features. Despite its advancements, face recognition still faces challenges in areas such as explainability, demographic bias, privacy and robustness against aging, pose variations, illumination changes, occlusions, and expressions. Additionally, the emergence of privacy regulations has led to the discontinuation of several well-established datasets, raising legal, ethical, and privacy concerns. To address these issues, synthetic facial data generation has been proposed as a solution. This technique not only mitigates privacy concerns but also allows for comprehensive experimentation with facial attributes that cause bias, helps alleviate demographic bias, and provides complementary data to enhance models trained with real data. Competitions, such as the FRCSyn and SDFR, have been organized to explore the limitations and potential of face recognition technology trained with synthetic data. This paper compares the effectiveness of established synthetic face datasets with different generation techniques in face recognition tasks. We benchmark the accuracy of seven mainstream datasets, providing a vivid comparison of approaches that are not explicitly contrasted in the literature. Our experiments highlight the diverse techniques used to address the synthetic facial data generation problem and present a comprehensive benchmark of the area. The results demonstrate the effectiveness of various methods in generating synthetic facial data with realistic variations, evidencing the diverse techniques used to deal with the problem.
... We consider the TinyFaR-A model and, similar to Table 2, we trained it with different synthetic datasets as well as real data. We use the evaluation procedure in [72] and calculated the verification accuracies for Asian, African, Caucasian, and Indian identities of the RFW dataset. The average and standard deviation of the accuracies are reported in Table 9. ...
Article
Full-text available
State-of-the-art face recognition models are computationally expensive for mobile applications. Training lightweight face recognition models also requires large identity-labeled datasets, raising privacy and ethical concerns. Generating synthetic datasets for training is also challenging, and there is a significant gap in performance between models trained on real and synthetic face datasets. We propose a new framework (called SynthDistill) to train lightweight face recognition models by distilling the knowledge from a pretrained teacher model using synthetic data. We generate synthetic face images without identity labels, mitigating the problems in the intra-class variation generation of synthetic datasets, and dynamically sample from the intermediate latent space of a face generator network to generate new variations of the challenging images while further exploring new face images. The results on different benchmarking real face recognition datasets demonstrate the superiority of SynthDistill compared to training on previous synthetic datasets, achieving a verification accuracy of 99.52% on the LFW dataset with a lightweight network. The results also show that SynthDistill significantly narrows the gap between real and synthetic data training. The source code of our experiments is publicly available to facilitate the reproducibility of our work.
... Bao (2019) used a multiagent RL approach for fair stock trading. Finally, Wang and Deng (2020) used RL to mitigate bias in facial recognition. ...
Article
While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems and showcase the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.
... [57] altered the utility function of a GAN in order to generate equitable images, while [54] proposed a U-Net for creating unbiased image data. Deep information maximization adaptation networks were employed to eliminate racial bias in face vision datasets [67], while reinforcement learning was utilized for training a race-balanced network [66]. Wang et al. [69] offer a generative few-shot cross-domain adaptation method for performing fair cross-domain adaptation and enhancing minority category performance. ...
Preprint
Full-text available
The persistent challenge of bias in machine learning models necessitates robust solutions to ensure parity and equal treatment across diverse groups, particularly in classification tasks. Current methods for mitigating bias often result in information loss and an inadequate balance between accuracy and fairness. To address this, we propose a novel methodology grounded in bilevel optimization principles. Our deep learning-based approach concurrently optimizes for both accuracy and fairness objectives, and under certain assumptions, achieving proven Pareto optimal solutions while mitigating bias in the trained model. Theoretical analysis indicates that the upper bound on the loss incurred by this method is less than or equal to the loss of the Lagrangian approach, which involves adding a regularization term to the loss function. We demonstrate the efficacy of our model primarily on tabular datasets such as UCI Adult and Heritage Health. When benchmarked against state-of-the-art fairness methods, our model exhibits superior performance, advancing fairness-aware machine learning solutions and bridging the accuracy-fairness gap. The implementation of FairBiNN is available on https://github.com/yazdanimehdi/FairBiNN.
... Then, it can train a recovery model f^{-1} to map X_p back to X, as arg min_δ ℓ_1(f^{-1}(X_p; δ), X), and exploit f^{-1} to recover the client's shared X_p. We concretely use BUPT [54] of 1.3M images as the attacker's dataset, and employ a full-scale U-Net [42] as its f^{-1}. Comparison with SOTAs. ...
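The excerpt above describes training a recovery model f^{-1} that maps the protected representation X_p back to X by minimizing an ℓ1 reconstruction distance. Below is a minimal PyTorch-style sketch of that objective; the small convolutional network stands in for the full-scale U-Net, and random tensors replace the attacker's BUPT images, so everything apart from the loss formulation is an assumption for illustration.

```python
import torch
import torch.nn as nn

# Placeholder recovery model; the cited attack uses a full-scale U-Net instead.
recovery = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(recovery.parameters(), lr=1e-3)
l1 = nn.L1Loss()

def protect(x):
    """Stand-in for the client-side protection producing X_p (here: just added noise)."""
    return x + 0.1 * torch.randn_like(x)

for step in range(100):                      # toy loop on random data
    x = torch.rand(8, 3, 64, 64)             # surrogate batch from the attacker's dataset
    x_p = protect(x)
    loss = l1(recovery(x_p), x)              # arg min_delta l1(f^{-1}(X_p; delta), X)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```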
... The dataset contains about 30000 face images of about 2000 individuals, all of which are manually tagged with age, gender, and identity. Besides, other commonly used fairness face datasets are LFW [118], UTK Face [119], IJB-A [120], PPB [2], MS-Celeb-1M [121], DiF [122], MTFL [123], RFW [124], BUPT Faces [125]. ...
Article
In recent years, artificial intelligence technology has been widely used in many fields, such as computer vision, natural language processing and autonomous driving. Machine learning algorithms, as the core technique of AI, have significantly facilitated people’s lives. However, underlying fairness issues in machine learning systems can pose risks to individual fairness and social security. Studying fairness definitions, sources of problems, and testing and debugging methods of fairness can help ensure the fairness of machine learning systems and promote the wide application of artificial intelligence technology in various fields. This paper introduces relevant definitions of machine learning fairness and analyzes the sources of fairness problems. Besides, it provides guidance on fairness testing and debugging methods and summarizes popular datasets. This paper also discusses the technical advancements in machine learning fairness and highlights future challenges in this area.
... Skewed Error Ratio (SER) - SER is a fairness metric introduced by Wang et al. [24] in the context of FR. Considering the set of protected groups {g_1, g_2, ..., g_n}, the SER is defined as: ...
Preprint
This study investigates the effects of occlusions on the fairness of face recognition systems, particularly focusing on demographic biases. Using the Racial Faces in the Wild (RFW) dataset and synthetically added realistic occlusions, we evaluate their effect on the performance of face recognition models trained on the BUPT-Balanced and BUPT-GlobalFace datasets. We note increases in the dispersion of FMR, FNMR, and accuracy alongside decreases in fairness according to Equalized Odds, Demographic Parity, STD of Accuracy, and Fairness Discrepancy Rate. Additionally, we utilize a pixel attribution method to understand the importance of occlusions in model predictions, proposing a new metric, Face Occlusion Impact Ratio (FOIR), that quantifies the extent to which occlusions affect model performance across different demographic groups. Our results indicate that occlusions exacerbate existing demographic biases, with models placing higher importance on occlusions in an unequal fashion, particularly affecting African individuals more severely.
... Objective-based debiasing modifies the model's training process to achieve improved fairness. Early studies proposed methods to reduce social bias within uni-modal models [2,17,39,40,59]. For example, Bolukbasi et al. [7] optimize word embeddings to remove gender stereotypes. ...
Preprint
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities. However, these models often inherit severe social biases from their training datasets, leading to unfair predictions based on attributes like race and gender. This paper addresses the issue of social biases in MLLMs by i) Introducing a comprehensive Counterfactual dataset with Multiple Social Concepts (CMSC), which provides a more diverse and extensive training set compared to existing datasets. ii) Proposing an Anti-Stereotype Debiasing strategy (ASD). Our method works by revisiting the MLLM training process, rescaling the autoregressive loss function, and improving data sampling methods to counteract biases. Through extensive experiments on various MLLMs, our CMSC dataset and ASD method demonstrate a significant reduction in social biases while maintaining the models' original performance.
... Recently, [48] proposed a deep reinforcement learning-based approach to make FR race-invariant. Then, another work by [39] proposed a Gaussian distribution-based face embedding method that can extract an unambiguous mapping of the face to mitigate the impact of TID variations. ...
Article
Full-text available
Facial identity is subject to two primary natural variations: time-dependent (TD) factors such as age, and time-independent (TID) factors including sex and race. This study aims to address a broader problem known as variation-invariant face recognition (VIFR) by exploring the question: “How can identity preservation be maximized in the presence of TD and TID variations?" While existing state-of-the-art (SOTA) methods focus on either age-invariant or race- and sex-invariant FR, our approach introduces the first deep learning architecture utilizing multi-task learning to tackle VIFR, termed “multi-task learning-based variation-invariant face recognition (MTLVIFR)." We redefine FR by incorporating both TD and TID, decomposing faces into age (TD) and residual features (TID: sex, race, and identity). MTLVIFR outperforms existing methods by 2% on the LFW benchmark, 1% on CALFW, and 5% on AgeDB (20-year protocol) in terms of face verification score. Moreover, it achieves higher face identification scores than all SOTA methods. Open-source code is available.
... Bias problems in machine learning are the Achilles heel of many applications, including recommendation systems (Schnabel et al., 2016), facial recognition (Wang and Deng, 2019), and speech recognition (Koenecke et al., 2020). One of the main sources of bias is the training dataset: as noted by Shankar et al. (2017), ImageNet and the Open Images dataset disproportionately represented people from North America and Europe. ...
Article
Full-text available
Medical conditions and systemic diseases often manifest as distinct facial characteristics, making identification of these unique features crucial for disease screening. However, detecting diseases using facial photography remains challenging because of the wide variability in human facial features and disease conditions. The integration of artificial intelligence (AI) into facial analysis represents a promising frontier offering a user-friendly, non-invasive, and cost-effective screening approach. This review explores the potential of AI-assisted facial analysis for identifying subtle facial phenotypes indicative of health disorders. First, we outline the technological framework essential for effective implementation in healthcare settings. Subsequently, we focus on the role of AI-assisted facial analysis in disease screening. We further expand our examination to include applications in health monitoring, support of treatment decision-making, and disease follow-up, thereby contributing to comprehensive disease management. Despite its promise, the adoption of this technology faces several challenges, including privacy concerns, model accuracy, issues with model interpretability, biases in AI algorithms, and adherence to regulatory standards. Addressing these challenges is crucial to ensure fair and ethical use. By overcoming these hurdles, AI-assisted facial analysis can empower healthcare providers, improve patient care outcomes, and enhance global health.
Preprint
Full-text available
Biometric authentication is increasingly popular for its convenience and accuracy. However, while recent advancements focus on reducing errors and expanding modalities, the reliability of reported performance metrics often remains overlooked. Understanding reliability is critical, as it communicates how accurately reported error rates represent a system's actual performance, considering the uncertainty in error-rate estimates from test data. Currently, there is no widely accepted standard for reporting these uncertainties, and biometric studies rarely provide reliability estimates, limiting comparability and interpretation. To address this gap, we introduce BioQuake, a measure to estimate uncertainty in biometric verification systems, and empirically validate it on four systems and three datasets. Based on BioQuake, we provide simple guidelines for estimating performance uncertainty and facilitating reliable reporting. Additionally, we apply BioQuake to analyze biometric recognition performance on 62 biometric datasets used in research across eight modalities: face, fingerprint, gait, iris, keystroke, eye movement, Electroencephalogram (EEG), and Electrocardiogram (ECG). Our analysis shows that reported state-of-the-art performance often deviates significantly from actual error rates, potentially leading to inaccurate conclusions. To support researchers and foster the development of more reliable biometric systems and datasets, we release BioQuake as an easy-to-use web tool for reliability calculations.
Article
With the recent advances in computer vision, age estimation has significantly improved in overall accuracy. However, because most common methods do not take into account the class-imbalance problem in age estimation datasets, they suffer from a large bias when recognizing long-tailed groups. To achieve high-quality imbalanced learning for long-tailed groups, the dominant solution is for the feature extractor to learn the discriminative features of different groups and for the classifier to provide appropriate and unbiased margins for the different groups based on those discriminative features. Therefore, in this work, we propose an innovative collaborative learning framework (Group-Face) that integrates a multi-hop attention graph convolutional network and a dynamic group-aware margin strategy based on reinforcement learning. Specifically, to extract the discriminative features of different groups, we design an enhanced multi-hop attention graph convolutional network. This network is capable of capturing the interactions of neighboring nodes at different distances, fusing local and global information to model facial deep aging, and exploring diverse representations of different groups. In addition, to further address the class-imbalance problem, we design a dynamic group-aware margin strategy based on reinforcement learning to provide appropriate and unbiased margins for different groups. The strategy divides the samples into four age groups and identifies the optimal margins for the various age groups by employing a Markov decision process. Under the guidance of the agent, the feature representation bias and the classification margin deviation between different groups can be reduced simultaneously, balancing inter-class separability and intra-class proximity. After joint optimization, our architecture achieves excellent performance on several age estimation benchmark datasets. It not only achieves large improvements in overall estimation accuracy but also gains balanced performance in long-tailed group estimation.
Article
Facial recognition is one of the most academically studied and industrially developed areas within computer vision where we readily find associated applications deployed globally. This widespread adoption has uncovered significant performance variation across subjects of different racial profiles leading to focused research attention on racial bias within face recognition spanning both current causation and future potential solutions. In support, this study provides an extensive taxonomic review of research on racial bias within face recognition exploring every aspect and stage of the associated facial processing pipeline. Firstly, we discuss the problem definition of racial bias, starting with race definition, grouping strategies, and the societal implications of using race or race-related groupings. Secondly, we divide the common face recognition processing pipeline into four stages: image acquisition, face localisation, face representation, face verification and identification, and review the relevant corresponding literature associated with each stage. The overall aim is to provide comprehensive coverage of the racial bias problem with respect to each and every stage of the face recognition processing pipeline whilst also highlighting the potential pitfalls and limitations of contemporary mitigation strategies that need to be considered within future research endeavours or commercial applications alike.
Article
From helping you and me unlock our smartphones with a mere glance to enabling the police to identify criminals via CCTV footage, artificial intelligence (AI)-powered face recognition technology (FRT) is rapidly advancing by the day. According to reports, the Indian Railways will install FRT-based video surveillance systems in 983 railway stations across the country to ramp up security. However, the accuracy and reliability of FRT depend on the quality of the input images, the algorithms used and the size and quality of the reference database. Due to the scope for major imbalances in these elements, these algorithms have been found to be biased—largely against minority communities and women, thereby exacerbating already prevalent forms of societal discrimination. Scrutinising key research studies that collectively expose racial and gender bias prevalent in widely adopted FRT tools in the United States as well as India, this article analyses how bias and inaccuracy of these tools have led to poor outcomes and raised concerns when deployed by law enforcement agencies in India. Furthermore, this article traces the historical context of these biases and proposes debiasing measures that go beyond balancing datasets.
Article
Full-text available
Deep learning algorithms have demonstrated remarkable efficacy in various medical image analysis (MedIA) applications. However, recent research highlights a performance disparity in these algorithms when applied to specific subgroups, such as exhibiting poorer predictive performance in elderly females. Addressing this fairness issue has become a collaborative effort involving AI scientists and clinicians seeking to understand its origins and develop solutions for mitigation within MedIA. In this survey, we thoroughly examine the current advancements in addressing fairness issues in MedIA, focusing on methodological approaches. We introduce the basics of group fairness and subsequently categorize studies on fair MedIA into fairness evaluation and unfairness mitigation. Detailed methods employed in these studies are presented too. Our survey concludes with a discussion of existing challenges and opportunities in establishing a fair MedIA and healthcare system. By offering this comprehensive review, we aim to foster a shared understanding of fairness among AI researchers and clinicians, enhance the development of unfairness mitigation methods, and contribute to the creation of an equitable MedIA society.
Article
Full-text available
Questions of unfairness and inequity pose critical challenges to the successful deployment of artificial intelligence (AI) in healthcare settings. In AI models, unequal performance across protected groups may be partially attributable to the learning of spurious or otherwise undesirable correlations between sensitive attributes and disease-related information. Here, we introduce the Attribute Neutral Framework, designed to disentangle biased attributes from disease-relevant information and subsequently neutralize them to improve representation across diverse subgroups. Within the framework, we develop the Attribute Neutralizer (AttrNzr) to generate neutralized data, for which protected attributes can no longer be easily predicted by humans or by machine learning classifiers. We then utilize these data to train the disease diagnosis model (DDM). Comparative analysis with other unfairness mitigation algorithms demonstrates that AttrNzr outperforms in reducing the unfairness of the DDM while maintaining DDM’s overall disease diagnosis performance. Furthermore, AttrNzr supports the simultaneous neutralization of multiple attributes and demonstrates utility even when applied solely during the training phase, without being used in the test phase. Moreover, instead of introducing additional constraints to the DDM, the AttrNzr directly addresses a root cause of unfairness, providing a model-independent solution. Our results with AttrNzr highlight the potential of data-centered and model-independent solutions for fairness challenges in AI-enabled medical systems.
Preprint
Full-text available
Computer vision systems have witnessed rapid progress over the past two decades due to multiple advances in the field. As these systems are increasingly being deployed in high-stakes real-world applications, there is a dire need to ensure that they do not propagate or amplify any discriminatory tendencies in historical or human-curated data or inadvertently learn biases from spurious correlations. This paper presents a comprehensive survey on fairness that summarizes and sheds light on ongoing trends and successes in the context of computer vision. The topics we discuss include 1) The origin and technical definitions of fairness drawn from the wider fair machine learning literature and adjacent disciplines. 2) Work that sought to discover and analyze biases in computer vision systems. 3) A summary of methods proposed to mitigate bias in computer vision systems in recent years. 4) A comprehensive summary of resources and datasets produced by researchers to measure, analyze, and mitigate bias and enhance fairness. 5) Discussion of the field's success, continuing trends in the context of multimodal foundation and generative models, and gaps that still need to be addressed. The presented characterization should help researchers understand the importance of identifying and mitigating bias in computer vision and the state of the field and identify potential directions for future research.
Article
Full-text available
This work proposes a novel privacy-preserving neural network feature representation that suppresses the sensitive information of a learned space while maintaining the utility of the data. The new international regulation for personal data protection forces data controllers to guarantee privacy and avoid discriminative hazards while managing sensitive data of users. In our approach, privacy and discrimination are related to each other. Unlike existing approaches aimed directly at fairness improvement, the proposed feature representation enforces the privacy of selected attributes. This way, fairness is not the objective but the result of a privacy-preserving learning method. This approach guarantees that sensitive information cannot be exploited by any agent that processes the output of the model, ensuring both privacy and equality of opportunity. Our method is based on an adversarial regularizer that introduces a sensitive-information removal function into the learning objective. The method is evaluated on three different primary tasks (identity, attractiveness, and smiling) and three publicly available benchmarks. In addition, we present a new face annotation dataset with a balanced distribution between genders and ethnic origins. The experiments demonstrate that it is possible to improve privacy and equality of opportunity while retaining competitive performance independently of the task.
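As a rough illustration of the general pattern behind such an adversarial regularizer, the sketch below combines a task loss with a penalty that pushes a sensitive-attribute head toward uninformative (maximum-entropy) predictions on the learned representation. The module names, the entropy-based penalty, and the weight lam are illustrative assumptions, not the authors' formulation.

```python
# Sketch of a sensitive-information-removal regularizer (illustrative assumptions:
# encoder/task_head/attr_head and the weight `lam` are not from the paper).
import torch
import torch.nn.functional as F

def privacy_regularized_loss(encoder, task_head, attr_head, x, y_task, lam=1.0):
    z = encoder(x)
    task_loss = F.cross_entropy(task_head(z), y_task)

    # Sensitive-information removal term: drive the (separately trained)
    # sensitive-attribute head toward maximum-entropy predictions on z.
    attr_probs = F.softmax(attr_head(z), dim=1)
    attr_entropy = -(attr_probs * torch.log(attr_probs + 1e-8)).sum(dim=1).mean()

    # Minimizing (task_loss - lam * attr_entropy) maximizes the entropy term,
    # i.e. makes z less informative about the protected attribute.
    return task_loss - lam * attr_entropy
```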
Conference Paper
Full-text available
Recent research has highlighted the vulnerabilities of modern machine learning based systems to bias, especially towards segments of society that are under-represented in training data. In this work, we develop a novel, tunable algorithm for mitigating the hidden, and potentially unknown, biases within training data. Our algorithm fuses the original learning task with a variational autoencoder to learn the latent structure within the dataset and then adaptively uses the learned latent distributions to re-weight the importance of certain data points while training. While our method is generalizable across various data modalities and learning tasks, in this work we use our algorithm to address the issue of racial and gender bias in facial detection systems. We evaluate our algorithm on the Pilot Parliaments Benchmark (PPB), a dataset specifically designed to evaluate biases in computer vision systems, and demonstrate increased overall performance as well as decreased categorical bias with our debiasing approach.
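The core idea described above is to estimate how common each training sample is in the learned latent space and to up-weight rare samples during training. The following is a simplified sketch of that re-weighting step using a histogram-based density estimate over latent coordinates; the binning scheme and parameters are illustrative assumptions, not the published algorithm (which uses the variational autoencoder's learned latent distribution).

```python
# Simplified latent-density-based re-weighting sketch (illustrative only).
import numpy as np

def debias_weights(latents, bins=10, alpha=0.01):
    """Weight each sample inversely to its estimated latent-space density."""
    latents = np.asarray(latents)               # shape: (num_samples, latent_dim)
    density = np.ones(len(latents))
    for d in range(latents.shape[1]):
        hist, edges = np.histogram(latents[:, d], bins=bins, density=True)
        idx = np.clip(np.digitize(latents[:, d], edges[:-1]) - 1, 0, bins - 1)
        density *= hist[idx] + 1e-8              # product of per-dimension densities
    weights = 1.0 / (density + alpha)            # rare samples get larger weights
    return weights / weights.sum()               # normalized sampling probabilities
```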
Article
Full-text available
Driven by graphics processing units (GPUs), massive amounts of annotated data and more advanced algorithms, deep learning has recently taken the computer vision community by storm and has benefited real-world applications, including face recognition (FR). Deep FR methods leverage deep networks to learn more discriminative representations, significantly improving the state of the art and surpassing human performance (97.53%). In this paper, we provide a comprehensive survey of deep FR methods, including data, algorithms and scenes. First, we summarize the commonly used datasets for training and testing. Then, the data preprocessing methods are categorized into two classes: "one-to-many augmentation" and "many-to-one normalization". Second, for algorithms, we summarize different network architectures and loss functions used in the state-of-the art methods. Third, we review several scenes in deep FR, such as video FR, 3D FR and cross-age FR. Finally, some potential deficiencies of the current methods and several future directions are highlighted.
Article
Full-text available
In this paper, we design and evaluate a convolutional autoencoder that perturbs an input face image to impart privacy to a subject. Specifically, the proposed autoencoder transforms an input face image such that the transformed image can be successfully used for face recognition but not for gender classification. In order to train this autoencoder, we propose a novel training scheme, referred to as semi-adversarial training in this work. The training is facilitated by attaching a semi-adversarial module consisting of a pseudo gender classifier and a pseudo face matcher to the autoencoder. The objective function utilized for training this network has three terms: one to ensure that the perturbed image is a realistic face image; another to ensure that the gender attributes of the face are confounded; and a third to ensure that biometric recognition performance due to the perturbed image is not impacted. Extensive experiments confirm the efficacy of the proposed architecture in extending gender privacy to face images.
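The objective described above has three terms: image realism, gender confusion, and preserved matching utility. The sketch below shows one plausible way to combine such terms; the module names, the specific realism and confusion losses, and the weights are illustrative assumptions rather than the authors' exact formulation.

```python
# Sketch of a three-term semi-adversarial objective (illustrative assumptions).
import torch
import torch.nn.functional as F

def semi_adversarial_loss(autoencoder, gender_clf, face_matcher,
                          x, identity_embedding, w=(1.0, 1.0, 1.0)):
    x_perturbed = autoencoder(x)

    # 1) Keep the perturbed output a realistic face image (reconstruction term).
    realism = F.l1_loss(x_perturbed, x)

    # 2) Confound the gender classifier: push its prediction toward 0.5.
    gender_prob = torch.sigmoid(gender_clf(x_perturbed))
    gender_confusion = F.mse_loss(gender_prob, torch.full_like(gender_prob, 0.5))

    # 3) Preserve matching utility: the perturbed image's embedding should stay
    #    close to the embedding of the original identity.
    matching = 1.0 - F.cosine_similarity(face_matcher(x_perturbed),
                                         identity_embedding).mean()

    return w[0] * realism + w[1] * gender_confusion + w[2] * matching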
Conference Paper
Full-text available
This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of angular margin can be quantitatively adjusted by a parameter m. We further derive specific m to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF) and MegaFace Challenge 1 show the superiority of A-Softmax loss in FR tasks.
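For reference, the A-Softmax objective summarized above can be written compactly as follows; this is a restatement of the loss as given in the paper, with m the integer angular margin and θ the angle between the feature and the class weight.

```latex
% A-Softmax loss for sample x_i with label y_i
L_i = -\log
\frac{e^{\|x_i\|\,\psi(\theta_{y_i,i})}}
     {e^{\|x_i\|\,\psi(\theta_{y_i,i})} + \sum_{j \neq y_i} e^{\|x_i\|\cos\theta_{j,i}}},
\qquad
\psi(\theta) = (-1)^{k}\cos(m\theta) - 2k,\quad
\theta \in \Bigl[\tfrac{k\pi}{m}, \tfrac{(k+1)\pi}{m}\Bigr],\ k \in \{0,\dots,m-1\}
```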
Conference Paper
Full-text available
Thanks to the recent developments of Convolutional Neural Networks, the performance of face verification methods has increased rapidly. In a typical face verification method, feature normalization is a critical step for boosting performance. This motivates us to introduce and study the effect of normalization during training. But we find this is non-trivial, despite normalization being differentiable. We identify and study four issues related to normalization through mathematical analysis, which yields understanding and helps with parameter settings. Based on this analysis we propose two strategies for training using normalized features. The first is a modification of softmax loss, which optimizes cosine similarity instead of inner-product. The second is a reformulation of metric learning by introducing an agent vector for each class. We show that both strategies, and small variants, consistently improve performance by between 0.2% to 0.4% on the LFW dataset based on two models. This is significant because the performance of the two models on LFW dataset is close to saturation at over 98%.
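A minimal sketch of the first strategy described above, a softmax computed on the cosine similarity between L2-normalized features and L2-normalized class weights with a scale factor, is given below; the scale value and class-weight initialization are illustrative assumptions.

```python
# Minimal cosine-softmax layer of the kind discussed above (scale s is illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineSoftmax(nn.Module):
    def __init__(self, feat_dim, num_classes, scale=20.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, features, labels):
        # Cosine similarity between normalized features and normalized class weights.
        logits = F.linear(F.normalize(features), F.normalize(self.weight))
        return F.cross_entropy(self.scale * logits, labels)
```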
Article
Full-text available
Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the impact of limited sample size in accomplishing this objective, and apply two instances of the proposed optimization to datasets, including one on real-world criminal recidivism. The results demonstrate that all three criteria can be simultaneously achieved and also reveal interesting patterns of bias in American society.
Article
Full-text available
A number of classification problems need to deal with data imbalance between classes. Often it is desired to have a high recall on the minority class while maintaining a high precision on the majority class. In this paper, we review a number of resampling techniques proposed in literature to handle unbalanced datasets and study their effect on classification performance.
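As a minimal illustration of the two basic strategies such reviews cover, the sketch below implements random oversampling of the minority class and random undersampling of the majority class for a binary-labelled dataset; it is a toy example, not any specific technique from the paper.

```python
# Minimal random over-/under-sampling for a binary-labelled dataset.
import numpy as np

def resample(X, y, minority_label=1, strategy="over", seed=None):
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    minority, majority = X[y == minority_label], X[y != minority_label]
    if strategy == "over":   # duplicate minority samples up to the majority size
        idx = rng.integers(0, len(minority), size=len(majority))
        X_new = np.vstack([majority, minority[idx]])
        y_new = np.concatenate([y[y != minority_label],
                                np.full(len(majority), minority_label)])
    else:                    # "under": drop majority samples down to the minority size
        idx = rng.choice(len(majority), size=len(minority), replace=False)
        X_new = np.vstack([majority[idx], minority])
        y_new = np.concatenate([y[y != minority_label][idx],
                                np.full(len(minority), minority_label)])
    return X_new, y_new
```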
Article
Full-text available
In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base. More specifically, we propose a benchmark task to recognize one million celebrities from their face images, by using all the possibly collected face images of this individual on the web as training data. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis. Associated with this task, we design and provide concrete measurement set, evaluation protocol, as well as training data. We also present in details our experiment setup and report promising baseline results. Our benchmark task could lead to one of the largest classification problems in computer vision. To the best of our knowledge, our training dataset, which contains 10M images in version 1, is the largest publicly available one in the world.
Article
Full-text available
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
Conference Paper
Full-text available
We consider the problem of perturbing a face image in such a way that it cannot be used to ascertain soft biometric attributes such as age, gender and race, but can be used for automatic face recognition. Such an exercise is useful for extending different levels of privacy to a face image in a central database. In this work, we focus on masking the gender information in a face image with respect to an automated gender estimation scheme, while retaining its ability to be used by a face matcher. To facilitate this privacy-enhancing technique, the input face image is combined with another face image via a morphing scheme resulting in a mixed image. The mixing process can be used to progressively modify the input image such that its gender information is progressively suppressed; however, the modified images can still be used for recognition purposes if necessary. Preliminary experiments on the MUCT database suggest the potential of the scheme in imparting “differential privacy” to face images.
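At its simplest, the progressive mixing idea described above can be seen as a convex combination of the input face with a candidate face, where the blending coefficient controls how strongly the soft-biometric attribute is suppressed. The sketch below shows only that blending step and ignores the landmark-based morphing of the actual scheme; names and parameters are illustrative.

```python
# Minimal sketch of progressive face mixing for attribute suppression
# (ignores landmark-based morphing; alpha controls suppression strength).
import numpy as np

def mix_faces(input_face, candidate_face, alpha):
    """Convex combination of two aligned face images; alpha in [0, 1]."""
    assert input_face.shape == candidate_face.shape
    mixed = (1.0 - alpha) * input_face.astype(np.float32) \
            + alpha * candidate_face.astype(np.float32)
    return np.clip(mixed, 0, 255).astype(np.uint8)

# Larger alpha suppresses the attribute more strongly but moves the image
# further from the original identity.
```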
Article
Full-text available
MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters. This paper describes both the API design and the system implementation of MXNet, and explains how embedding of both symbolic expression and tensor operation is handled in a unified fashion. Our preliminary experiments reveal promising results on large scale deep neural network applications using multiple GPU machines.
Article
Full-text available
Pushed by big data and deep convolutional neural networks (CNNs), the performance of face recognition is becoming comparable to that of humans. Using private large-scale training datasets, several groups achieve very high performance on LFW, i.e., 97% to 99%. While there are many open-source implementations of CNNs, no large-scale face dataset is publicly available. The current situation in the field of face recognition is that data is more important than algorithms. To solve this problem, this paper proposes a semi-automatic way to collect face images from the Internet and builds a large-scale dataset containing about 10,000 subjects and 500,000 images, called CASIA-WebFace. Based on this database, we use an 11-layer CNN to learn discriminative representations and obtain state-of-the-art accuracy on LFW and YTF. The publication of CASIA-WebFace will attract more research groups to this field and accelerate the development of face recognition in the wild.
Article
Full-text available
The key challenge of face recognition is to develop effective feature representations for reducing intra-personal variations while enlarging inter-personal differences. In this paper, we show that it can be well solved with deep learning and using both face identification and verification signals as supervision. The Deep IDentification-verification features (DeepID2) are learned with carefully designed deep convolutional networks. The face identification task increases the inter-personal variations by drawing DeepID2 extracted from different identities apart, while the face verification task reduces the intra-personal variations by pulling DeepID2 extracted from the same identity together, both of which are essential to face recognition. The learned DeepID2 features can be well generalized to new identities unseen in the training data. On the challenging LFW dataset, 99.15% face verification accuracy is achieved. Compared with the best deep learning result on LFW, the error rate has been significantly reduced by 67%.
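The joint supervision described above combines an identification (softmax) signal with a verification signal that pulls same-identity pairs together and pushes different-identity pairs apart. The sketch below shows one plausible combination of these two signals; the margin, weighting, and pairing logic are illustrative assumptions, not the exact DeepID2 formulation.

```python
# Sketch of joint identification + verification supervision (illustrative).
import torch
import torch.nn.functional as F

def joint_ident_verif_loss(features_a, features_b, logits_a, labels_a, labels_b,
                           margin=1.0, lam=0.05):
    # Identification signal: classify identities with a softmax loss.
    ident_loss = F.cross_entropy(logits_a, labels_a)

    # Verification signal: pull same-identity pairs together, push
    # different-identity pairs at least `margin` apart (contrastive form).
    dist = F.pairwise_distance(features_a, features_b)
    same = (labels_a == labels_b).float()
    verif_loss = (same * dist.pow(2)
                  + (1 - same) * F.relu(margin - dist).pow(2)).mean()

    return ident_loss + lam * verif_loss
```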
Article
Full-text available
This paper studies the influence of demographics on the performance of face recognition algorithms. The recognition accuracies of six different face recognition algorithms (three commercial, two nontrainable, and one trainable) are computed on a large scale gallery that is partitioned so that each partition consists entirely of specific demographic cohorts. Eight total cohorts are isolated based on gender (male and female), race/ethnicity (Black, White, and Hispanic), and age group (18-30, 30-50, and 50-70 years old). Experimental results demonstrate that both commercial and the nontrainable algorithms consistently have lower matching accuracies on the same cohorts (females, Blacks, and age group 18-30) than the remaining cohorts within their demographic. Additional experiments investigate the impact of the demographic distribution in the training set on the performance of a trainable face recognition algorithm. We show that the matching accuracy for race/ethnicity and age cohorts can be improved by training exclusively on that specific cohort. Operationally, this leads to a scenario, called dynamic face matcher selection, where multiple face recognition algorithms (each trained on different demographic cohorts) are available for a biometric system operator to select based on the demographic information extracted from a probe image. This procedure should lead to improved face recognition accuracy in many intelligence and law enforcement face recognition scenarios. Finally, we show that an alternative to dynamic face matcher selection is to train face recognition algorithms on datasets that are evenly distributed across demographics, as this approach offers consistently high accuracy across all cohorts.
Chapter
Neural networks achieve the state-of-the-art in image classification tasks. However, they can encode spurious variations or biases that may be present in the training data. For example, training an age predictor on a dataset that is not balanced for gender can lead to gender-biased predictions (e.g., wrongly predicting that males are older if only elderly males are in the training set). We present two distinct contributions: (1) An algorithm that can remove multiple sources of variation from the feature representation of a network. We demonstrate that this algorithm can be used to remove biases from the feature representation, and thereby improve classification accuracies, when training networks on extremely biased datasets. (2) An ancestral origin database of 14,000 images of individuals from East Asia, the Indian subcontinent, sub-Saharan Africa, and Western Europe. We demonstrate on this dataset, for a number of facial attribute classification tasks, that we are able to remove racial biases from the network feature representation. Keywords: dataset bias, face attribute classification, ancestral origin dataset.
Article
Computer scientists must identify sources of bias, de-bias training data and develop artificial-intelligence algorithms that are robust to skews in the data, argue James Zou and Londa Schiebinger.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of approximately 25 percent. Models and code are available at https://github.com/hujie-frank/SENet.
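The recalibration step described above (squeeze by global average pooling, excite through a small bottleneck with a sigmoid gate, then rescale the channels) can be sketched as follows; the reduction ratio and the plain-Linear implementation are the usual choices but are stated here as assumptions, not the authors' exact code.

```python
# Compact squeeze-and-excitation block following the description above.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                       # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                  # squeeze: global average pooling
        s = torch.relu(self.fc1(s))             # excitation: bottleneck MLP
        s = torch.sigmoid(self.fc2(s))          # channel-wise gates in (0, 1)
        return x * s.view(x.size(0), -1, 1, 1)  # recalibrate channels
```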
Chapter
In 2007, Labeled Faces in the Wild was released in an effort to spur research in face recognition, specifically for the problem of face verification with unconstrained images. Since that time, more than 50 papers have been published that improve upon this benchmark in some respect. A remarkably wide variety of innovative methods have been developed to overcome the challenges presented in this database. As performance on some aspects of the benchmark approaches 100 % accuracy, it seems appropriate to review this progress, derive what general principles we can from these works, and identify key future challenges in face recognition. In this survey, we review the contributions to LFW for which the authors have provided results to the curators (results found on the LFW results web page). We also review the cross cutting topic of alignment and how it is used in various methods. We end with a brief discussion of recent databases designed to challenge the next generation of face recognition algorithms.
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
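The reformulation described above has the layers learn a residual function F(x) that is added back to the identity shortcut. A minimal basic block illustrating this pattern is sketched below (channel counts and the absence of a projection shortcut are simplifying assumptions).

```python
# Minimal basic residual block: output = relu(F(x) + x).
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(residual + x)             # identity shortcut
```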
Article
Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must act autonomously to perform well and achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
Article
Despite significant recent advances in the field of face recognition, implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors. Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128 bytes per face. On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result by 30% on both datasets.
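The training objective summarized above is a triplet loss on embedded face patches: the anchor should be closer to a positive (same identity) than to a negative (different identity) by a margin. A minimal sketch on L2-normalized embeddings is below; the margin value is illustrative and the online triplet mining step is omitted.

```python
# Minimal triplet loss on normalized embeddings (online mining omitted).
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    anchor, positive, negative = (F.normalize(t) for t in (anchor, positive, negative))
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # squared distance to positive
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # squared distance to negative
    # Anchor must be closer to the positive than to the negative by `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```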
Article
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Article
In many applications, a face recognition model learned on a source domain but applied to a novel target domain degrades significantly due to the mismatch between the two domains. Aiming at learning a better face recognition model for the target domain, this paper proposes a simple but effective domain adaptation approach that transfers the supervision knowledge from a labeled source domain to the unlabeled target domain. Our basic idea is to convert the source domain images to the target domain (termed "targetizing" the source domain hereinafter), while keeping their supervision information. For this purpose, each source domain image is simply represented as a linear combination of sparse target domain neighbors in the image space, with the combination coefficients learned in a common subspace. The principle behind this strategy is that the common knowledge is only favorable for accurate cross-domain reconstruction, while for classification in the target domain the specific knowledge of the target domain is also essential and thus should be mostly preserved (through targetization in the image space in this work). To discover the common knowledge, a common subspace is learned in which the structures of both domains are preserved and the disparity between the source and target domains is reduced. The proposed method is extensively evaluated under three face recognition scenarios, i.e., domain adaptation across view angle, domain adaptation across ethnicity, and domain adaptation across imaging condition. The experimental results illustrate the superiority of our method over competitive approaches.
Article
Psychological research indicates that humans recognize faces of their own race more accurately than faces of other races. This “other-race effect” occurs for algorithms tested in a recent international competition for state-of-the-art face recognition algorithms. We report results for a Western algorithm made by fusing eight algorithms from Western countries and an East Asian algorithm made by fusing five algorithms from East Asian countries. At the low false accept rates required for most security applications, the Western algorithm recognized Caucasian faces more accurately than East Asian faces and the East Asian algorithm recognized East Asian faces more accurately than Caucasian faces. Next, using a test that spanned all false alarm rates, we compared the algorithms with humans of Caucasian and East Asian descent matching face identity in an identical stimulus set. In this case, both algorithms performed better on the Caucasian faces—the “majority” race in the database. The Caucasian face advantage, however, was far larger for the Western algorithm than for the East Asian algorithm. Humans showed the standard other-race effect for these faces, but showed more stable performance than the algorithms over changes in the race of the test faces. State-of-the-art face recognition algorithms, like humans, struggle with “other-race face” recognition.
Article
Most face databases have been created under controlled conditions to facilitate the study of specific parameters on the face recognition problem. These parameters include such variables as position, pose, lighting, background, camera quality, and gender. While there are many applications for face recognition technology in which one can control the parameters of image acquisition, there are also many applications in which the practitioner has little or no control over such parameters. This database, Labeled Faces in the Wild, is provided as an aid in studying the latter, unconstrained, recognition problem. The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life. The database exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background. In addition to describing the details of the database, we provide specific experimental paradigms for which the database is suitable. This is done in an effort to make research performed with the database as consistent and comparable as possible. We provide baseline results, including results of a state of the art face recognition system combined with a face alignment system. To facilitate experimentation on the database, we provide several parallel databases, including an aligned version.
Conference Paper
Summary form only given. The face recognition vendor test (FRVT) 2002 is an independently administered technology evaluation of mature face recognition systems. FRVT 2002 provides performance measures for assessing the capability of face recognition systems to meet requirements for large-scale, real-world applications. Participation in FRVT 2002 was open to commercial and mature prototype systems from universities, research institutes, and companies. Ten companies submitted either commercial or prototype systems. FRVT 2002 computed performance statistics on an extremely large data set: 121,589 operational facial images of 37,437 individuals. FRVT 2002 1) characterized identification and watch list performance as a function of database size, 2) estimated the variability in performance for different groups of people, 3) characterized performance as a function of elapsed time between enrolled and new images of a person, and 4) investigated the effect of demographics on performance. FRVT 2002 showed that recognition from indoor images has made substantial progress since FRVT 2000. Demographic results show that males are easier to recognize than females and that older people are easier to recognize than younger people. FRVT 2002 also assessed the impact of three new techniques for improving face recognition: three-dimensional morphable models, normalization of similarity scores, and face recognition from video sequences. Results show that three-dimensional morphable models and normalization increase performance and that face recognition from video sequences offers only a limited increase in performance over still images. A new XML-based evaluation protocol was developed for FRVT 2002. This protocol is flexible and supports evaluations of biometrics in general. The FRVT 2002 reports can be found at http://www.frvt.org.
Article
This paper studies empirically the effect of sampling and threshold-moving in training cost-sensitive neural networks. Both oversampling and undersampling are considered. These techniques modify the distribution of the training data such that the costs of the examples are conveyed explicitly by the appearances of the examples. Threshold-moving tries to move the output threshold toward inexpensive classes such that examples with higher costs become harder to misclassify. Moreover, hard-ensemble and soft-ensemble, i.e., the combination of the above techniques via hard or soft voting schemes, are also tested. Twenty-one UCI data sets with three types of cost matrices and a real-world cost-sensitive data set are used in the empirical study. The results suggest that cost-sensitive learning with multiclass tasks is more difficult than with two-class tasks, and a higher degree of class imbalance may increase the difficulty. It also reveals that almost all the techniques are effective on two-class tasks, while most are ineffective and may even have a negative effect on multiclass tasks. Overall, threshold-moving and soft-ensemble are relatively good choices for training cost-sensitive neural networks. The empirical study also suggests that some methods believed to be effective in addressing the class imbalance problem may, in fact, only be effective on learning with imbalanced two-class data sets.
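As a minimal illustration of threshold-moving in the multiclass setting, one common realization is to rescale predicted class probabilities by the misclassification costs before taking the argmax, so that costly (typically minority) classes are harder to miss; the sketch below shows that idea and is not the paper's exact procedure.

```python
# Minimal threshold-moving: rescale predicted probabilities by class costs
# so that costly (typically minority) classes are harder to miss.
import numpy as np

def cost_adjusted_predictions(probs, class_costs):
    """probs: (N, K) predicted probabilities; class_costs: (K,) misclassification costs."""
    adjusted = np.asarray(probs) * np.asarray(class_costs)[None, :]
    return adjusted.argmax(axis=1)

# Example: a rare class with cost 5 vs. a common class with cost 1.
probs = np.array([[0.8, 0.2], [0.6, 0.4]])
print(cost_adjusted_predictions(probs, class_costs=[1.0, 5.0]))  # -> [1 1]
```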
IARPA Janus Benchmark-C: Face dataset and protocol
  • Brianna Maze
  • Jocelyn Adams
  • James A. Duncan
  • Nathan Kalka
  • Tim Miller
  • Charles Otto
  • Anil K. Jain
  • Tyler Niggel
  • Janet Anderson
  • Jordan Cheney
Unsupervised domain adaptation for distance metric learning
  • Kihyuk Sohn
  • Wenling Shang
  • Xiang Yu
  • Manmohan Chandraker
Gender shades: Intersectional accuracy disparities in commercial gender classification
  • Joy Buolamwini
  • Timnit Gebru