Fig 1. Face segmentation as produced by our algorithm

Source publication
Conference Paper
Full-text available
In this paper the problem of multi-class face segmentation is introduced. Differently from previous works, which consider only a few classes - typically skin and hair - the label set is extended here to six categories: skin, hair, eyes, nose, mouth, and background. A dataset of 70 images taken from the MIT-CBCL and FEI face databases is manually annotate...
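For readers experimenting with this six-class label set, a minimal sketch of per-class pixel-accuracy evaluation is given below; the integer class encoding and function names are illustrative assumptions, not the paper's actual format.

import numpy as np

CLASSES = ["skin", "hair", "eyes", "nose", "mouth", "background"]

def per_class_accuracy(pred, gt):
    # pred, gt: HxW integer label maps with values in 0..5 (assumed encoding)
    scores = {}
    for idx, name in enumerate(CLASSES):
        mask = (gt == idx)
        if mask.any():
            # fraction of this class's ground-truth pixels predicted correctly
            scores[name] = float((pred[mask] == idx).mean())
    return scores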

Citations

... Convolutional networks can also be combined with recurrent networks, such as in (Liu et al., 2017b) and (Zhou, 2017). Many other algorithms can be used for facial feature extraction, like random forests (Khan et al., 2015) and support vector machines (SVMs) (Khan et al., 2018). Still, most of the available segmentation models require extensive and error-prone preprocessing steps, in the form of either facial landmark detection, image cropping, or image alignment. ...
Preprint
Current child face generators are restricted by the limited size of the available datasets. In addition, feature selection can prove to be a significant challenge, especially due to the large number of features that need to be trained for. To manage these problems, we propose CADA-GAN, a Context-Aware GAN that allows optimal feature extraction, with added robustness from additional Data Augmentation. CADA-GAN is adapted from the popular StyleGAN2-Ada model, with attention on augmentation and segmentation of the parent images. The model achieves the lowest Mean Squared Error loss (MSE loss) on latent feature representations, and the generated child image is more robust than those generated by baseline models.
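As a rough illustration of the reported metric, the following sketch computes a mean squared error between two latent feature vectors; the vector names and shapes are assumptions for illustration, not CADA-GAN's actual interface.

import numpy as np

def latent_mse(z_generated, z_reference):
    # z_generated, z_reference: 1-D latent feature vectors (assumed shapes)
    z_generated = np.asarray(z_generated, dtype=np.float64)
    z_reference = np.asarray(z_reference, dtype=np.float64)
    # mean squared error over all latent dimensions
    return float(np.mean((z_generated - z_reference) ** 2))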
... The FASSEG public dataset focuses on the multiclass semantic segmentation of the face [28] as well as the estimation of its pose [29]. In our study, we consider a subset of this dataset corresponding to a specific pose (the front view). ...
Preprint
Full-text available
Deep learning based pipelines for semantic segmentation often ignore structural information available in the annotated images used for training. We propose a novel post-processing module that enforces structural knowledge about the objects of interest to improve segmentation results provided by deep learning. This module corresponds to a "many-to-one-or-none" inexact graph matching approach and is formulated as a quadratic assignment problem. Our approach is compared to a CNN-based segmentation (for various CNN backbones) on two public datasets, one for face segmentation from 2D RGB images (FASSEG) and the other for brain segmentation from 3D MRIs (IBSR). Evaluations are performed using two types of structural information (distances and directional relations, this choice being a hyper-parameter of our generic framework). On FASSEG data, results show that our module improves the accuracy of the CNN by about 6.3% (the Hausdorff distance decreases from 22.11 to 20.71). On IBSR data, the improvement is 51% (the Hausdorff distance decreases from 11.01 to 5.4). In addition, our approach is shown to be resilient to small training datasets that often limit the performance of deep learning methods: the improvement increases as the size of the training dataset decreases.
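The Hausdorff distances quoted above can be reproduced in spirit with a short SciPy sketch; the symmetric formulation below is the standard definition, and treating it as the paper's aggregation of the two directed distances is an assumption.

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(points_a, points_b):
    # points_a, points_b: (N, 2) arrays of boundary-pixel coordinates
    d_ab = directed_hausdorff(points_a, points_b)[0]
    d_ba = directed_hausdorff(points_b, points_a)[0]
    # symmetric Hausdorff distance: worst-case nearest-neighbour gap
    return max(d_ab, d_ba)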
... As a sample data set I use the Picasso Faces data set we already created in [25]. The data set is derived from the FASSEG data set [14] of frontal face images. The underlying task of the data set was altered towards a visual relational classification task. ...
Article
Full-text available
Deep learning methods, although effective in their assigned tasks, are mostly black-boxes with respect to their inner workings. For image classification with CNNs, there exists a variety of visual explanation methods that highlight the parts of input images that were relevant for the classification result. But in many domains visual highlighting may not be expressive enough when the classification relies on complex relations within visual concepts. This paper presents an approach to enrich visual explanations with verbal local explanations, emphasizing important relational information. The proposed SymMetric algorithm combines metric learning and inductive logic programming (ILP). Labels given by a human for a small subset of important image parts are first generalized to a neighborhood of similar images using a learned distance metric. The information about labels and their spatial relations is then used to build background knowledge for ILP and ultimately to learn a first-order theory that locally explains the black-box with respect to the given image. The approach is evaluated with the Dogs vs. Cats data set, demonstrating the generalization ability of metric learning, and with Picasso Faces, illustrating the recognition of spatially meaningful constellations of sub-concepts and the creation of an expressive explanation.
... The intrinsic relation between various face parts is also confirmed in the latest research work reported in [48][49][50][51][52]. Research reported in [49][50][51][52] suggests that a mutual relationship between different face parts can be exploited to address several tasks in a single framework. The research work proposed in [53] segments a face image into six different face parts, which are later used for HPE in [49,50]. Similarly, gender recognition is combined with other tasks using the same strategy in [52]. ...
... These works address several face analysis tasks, including HPE, race, gender, and age estimation, in a single model. The methods proposed in [51,52] use face segmentation information provided by a prior model developed in [53]. These methods do not extract landmark information or high-dimensional data, but instead perform face segmentation as a prior step. ...
Article
Full-text available
Human face image analysis using machine learning is an important element in computer vision. The human face image conveys information such as age, gender, identity, emotion, race, and attractiveness to both human and computer systems. Over the last ten years, face analysis methods using machine learning have received immense attention due to their diverse applications in various tasks. Although several methods have been reported in the last ten years, face image analysis still represents a complicated challenge, particularly for images obtained from 'in the wild' conditions. This survey paper presents a comprehensive review focusing on methods in both controlled and uncontrolled conditions. Our work illustrates both merits and demerits of each method previously proposed, starting from seminal works on face image analysis and ending with the latest ideas exploiting deep learning frameworks. We show a comparison of the performance of the previous methods on standard datasets and also present some promising future directions on the topic.
... CelebA (Liu et al, 2015): cls, facial attr., 202 599, 40, -; FASSEG (Khan et al, 2015): seg, facial parts, 270, 3, -; Picasso (Rabold et al, 2020): seg, facial parts, 452, 3, 452 ...
... It unifies diverse other image datasets (ADE20k by Zhou et al (2017), OpenSurfaces by Bell et al (2014), Pascal-Context by Mottaghi et al (2014), Pascal-Part by Chen et al (2014), and the Describable Textures Dataset by Cimpoi et al (2014)). In Rabold et al (2020) the generated Picasso dataset is presented for binary classification of images into "valid" and "invalid" faces. In each image, clips of facial features (eye, nose, mouth) from the FASSEG dataset (Khan et al, 2015) are pasted onto portrait images in which the facial features were erased. The face is considered "invalid" (scrambled) if the feature arrangement is unnatural, e.g., eye and nose swapped. ...
... Source: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html FASSEG. The FAce Semantic SEGmentation dataset (Khan et al, 2015) contains 70 frontal face and 200 multi-pose face images with semantic segmentation of the object part concepts eye, nose, and mouth. The portraits are sections from images taken from the MIT-CBCL (Center for Biological and Computational Learning), FEI (Thomaz and Giraldi, 2010), and Pointing04 (Gourier et al, 2004) datasets. ...
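Datasets of this kind typically ship their part annotations as color-coded masks; a hedged sketch of converting such a mask to integer labels follows, where the PALETTE colors are hypothetical placeholders rather than FASSEG's actual coding.

import numpy as np

PALETTE = {  # RGB -> class id; illustrative values only
    (255, 0, 0): 0,  # e.g., eye
    (0, 255, 0): 1,  # e.g., nose
    (0, 0, 255): 2,  # e.g., mouth
}

def mask_to_labels(rgb_mask):
    # rgb_mask: HxWx3 uint8 array; returns HxW label map (-1 = unmapped color)
    labels = np.full(rgb_mask.shape[:2], -1, dtype=np.int32)
    for color, cls in PALETTE.items():
        match = np.all(rgb_mask == np.array(color, dtype=np.uint8), axis=-1)
        labels[match] = cls
    return labels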
Preprint
Full-text available
Deep neural networks (DNNs) have found their way into many applications with potential impact on the safety, security, and fairness of human-machine systems. Such applications require a basic understanding of, and sufficient trust in, the models by their users. This motivated the research field of explainable artificial intelligence (XAI), i.e., finding methods for opening the "black-boxes" that DNNs represent. For the computer vision domain specifically, practical assessment of DNNs requires a globally valid association of human-interpretable concepts with the internals of the model. The research field of concept (embedding) analysis (CA) tackles this problem: CA aims to find global, assessable associations of humanly interpretable semantic concepts (e.g., eye, bearded) with internal representations of a DNN. This work establishes a general definition of CA and a taxonomy for CA methods, uniting several ideas from the literature. This allows one to easily position and compare CA approaches. Guided by the defined notions, the current state-of-the-art research regarding CA methods and interesting applications is reviewed. More than thirty relevant methods are discussed, compared, and categorized. Finally, for practitioners, a survey of fifteen datasets that have been used for supervised concept analysis is provided. Open challenges and research directions are pointed out at the end.
... Generative Adversarial Networks (GAN) were introduced in 2014 by Goodfellow et al. [14]. GANs have achieved impressive results in computer vision [17,18,25], image-to-image translation [26,27], inpainting [28,29], and image segmentation [30,31]. Facial manipulation tasks have continuously gained attention in recent years due to the high demand for facial editing applications. ...
Article
Full-text available
This article shows how to create a robust thermal face recognition system based on the FaceNet architecture. We propose a method for generating thermal images to create a thermal face database with six different attributes (frown, glasses, rotation, normal, vocal, and smile) based on various deep learning models. First, we use StyleCLIP, which manipulates the latent space of the input visible image to add the desired attributes to the visible face. Second, we use the GANs N’ Roses (GNR) model, a multimodal image-to-image framework that uses style and content maps to generate thermal images from visible images with generative adversarial approaches. Using the proposed generator system, we create a database of synthetic thermal faces composed of more than 100k images corresponding to 3227 individuals. When trained and tested on the synthetic database, the Thermal-FaceNet model obtained 99.98% accuracy. Furthermore, when tested on a real database, the accuracy was more than 98%, validating the proposed thermal image generator system.
... In other cases, finer facial structure is extracted with 68-point landmarks, including the eyebrow line, eye contour, length and width of the nose, upper and lower lip contours, and jawline [40]. In face segmentation, the face is either segmented as a whole [41] or separated into different facial regions (e.g., eyes, nose, mouth, skin, hair) [42]. Face alignment and segmentation are particularly useful in further facial analysis applications such as face recognition or facial expression detection; however, they are more difficult tasks than detecting bounding boxes. ...
Article
Full-text available
The development of non-contact patient monitoring applications for the neonatal intensive care unit (NICU) is an active research area, particularly in facial video analysis. Recent studies have used facial video data to estimate vital signs, assess pain from facial expression, differentiate sleep-wake status, detect jaundice, and perform face recognition. These applications depend on an accurate definition of the patient’s face as a region of interest (ROI). Most studies have required manual ROI definition, while others have leveraged automated face detectors developed for adult patients, without systematic validation for the neonatal population. To overcome these issues, this paper first evaluates the state-of-the-art in face detection in the NICU setting. Finding that such methods often fail in complex NICU environments, we demonstrate how fine-tuning can increase neonatal face detector robustness, resulting in our NICUface models. A large and diverse neonatal dataset was gathered from actual patients admitted to the NICU across three studies, and gold-standard face annotations were completed. In comparison to state-of-the-art face detectors, our NICUface models address NICU-specific challenges such as ongoing clinical intervention, phototherapy lighting, occlusions from hospital equipment, etc. These analyses culminate in the creation of robust NICUface detectors with improvements on our most challenging neonatal dataset of +36.14, +35.86, and +32.19 in AP30, AP50, and mAP, respectively, relative to state-of-the-art CE-CLM, MTCNN, img2pose, RetinaFace, and YOLO5Face models. Face orientation estimation is also addressed, leading to an accuracy of 99.45%. Fine-tuned NICUface models, gold-standard face annotation data, and the face orientation estimation method are also released here.
... The experimental analysis of our proposed work is carried out on the MIT-CBCL and FEI datasets, comprising 85 frontal face images [39]. Moreover, the faces are not all aligned in the same way, and the datasets contain faces of various ethnicities, genders, and ages. Some sample images from the chosen datasets are shown in Fig. 6. ...
Article
Full-text available
Face segmentation is the process of segmenting the visible parts of the face, excluding the neck, ears, hair, and beard. Several methods have been developed in this field, but none of them has been effective in providing optimal face segmentation. Hence, we propose a novel face segmentation method known as the level-set-based neural network (NN) algorithm. This method exploits a hybrid filter for the pre-processing of images, which eliminates unwanted noise and blurring effects from the images. The hybrid filter is a combination of Median, Mean, and Gaussian filters and effectively removes unwanted noise. The images are then segmented using the level-set-based NN algorithm, which is based on a population set and effectively reduces the gap between predicted and expected outcomes. The proposed method is compared with state-of-the-art methods such as the fully convolutional network (FCN), Gabor filter (GF), multi-class semantic face segmentation (MSFS), and genetic algorithms (GA). From the experimental analysis, it is evident that the proposed work achieved better results compared to the other approaches.
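The abstract names the three constituent filters but not the combination rule; the sketch below assumes a simple average of the three smoothed images, built from standard OpenCV calls, as one plausible reading rather than the paper's exact method.

import cv2
import numpy as np

def hybrid_filter(image, ksize=5):
    # three standard smoothers; ksize must be odd for the median filter
    med = cv2.medianBlur(image, ksize)
    mean = cv2.blur(image, (ksize, ksize))
    gauss = cv2.GaussianBlur(image, (ksize, ksize), 0)
    # assumed combination rule: plain average of the three outputs
    stack = np.stack([med, mean, gauss]).astype(np.float32)
    return np.round(stack.mean(axis=0)).astype(image.dtype)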
... Studies in [26][27][28][29][30] suggest that the mutual relationships between face parts can be exploited not only for HPE, but also for other visual tasks such as gender recognition, race classification, and age estimation. Most of these face analysis tasks showed improved performance when the different face parts were first accurately estimated by semantic face segmentation [31]. ...
... Instead of first extracting landmark points or high-dimensional feature vectors and then applying a certain modeling strategy, these methods first perform face segmentation. The segmentation algorithm categorizes face parts into six different classes, as in [31], and exploits a rich database created through manual labeling [124]. On the basis of this segmentation, a proper model for head pose estimation is built. ...
Article
Head pose is an important cue in computer vision when using facial information. Over the last three decades, methods for head pose estimation have received increasing attention due to their application in several image analysis tasks. Although many techniques have been developed over the years to address this issue, head pose estimation remains an open research topic, particularly in unconstrained environments. In this paper, we present a comprehensive survey of methods under both constrained and unconstrained conditions, focusing on the literature from the last decade. This work illustrates the advantages and disadvantages of existing algorithms, starting from seminal contributions to head pose estimation and ending with the more recent approaches which adopt deep learning frameworks. Several performance comparisons are provided. This paper also states promising directions for future research on the topic.
... Unsupervised Co-part Segmentation With the popularity of deep neural networks, motion part segmentation has achieved superior performance in domains where labeled data are abundant, such as faces (Khan et al., 2015) and human bodies (Güler et al., 2018; Kanazawa et al., 2018). Part segmentation can also be learned in an entirely unsupervised fashion. ...
Preprint
Co-part segmentation is an important problem in computer vision due to its rich applications. We propose an unsupervised learning approach for co-part segmentation from images. For the training stage, we leverage motion information embedded in videos and explicitly extract latent representations to segment meaningful object parts. More importantly, we introduce a dual procedure of part assembly to form a closed loop with part segmentation, enabling effective self-supervision. We demonstrate the effectiveness of our approach with a host of extensive experiments covering human bodies, hands, quadrupeds, and robot arms. We show that our approach can achieve meaningful and compact part segmentation, outperforming state-of-the-art approaches on diverse benchmarks.