High Performing Facial Skin Problem Diagnosis with Enhanced
Mask R-CNN and Super Resolution GAN
Mira Kim 1,* and Myeong Ho Song 2
1 IBM Corporation, Costa Mesa, CA 92626, USA
2 Department of Software, Soongsil University, Seoul 06978, Republic of Korea
* Correspondence: mirakim1012@gmail.com; Tel.: +1-213-308-4596
Abstract:
Facial skin condition is perceived as a vital indicator of a person's apparent age, perceived beauty, and degree of health. Machine-learning-based software analytics on facial skin conditions can be a time- and cost-efficient alternative to the conventional approach of visiting facial skin care shops or dermatologists' offices. However, the conventional CNN-based approach has been shown to be limited in diagnosis performance due to the intrinsic characteristics of facial skin problems. In this paper, the technical challenges in facial skin problem diagnosis are first addressed, and a set of 5 effective tactics is proposed to overcome them. A total of 31 segmentation models are trained and applied in the experiments validating the proposed tactics. Through the experiments, the proposed approach achieves a diagnosis performance of 83.38%, which is 32.58% higher than that of the conventional CNN approach.
Keywords: facial skin problem; Mask R-CNN; super resolution; Generative Adversarial Network (GAN); tactics for high performance
1. Introduction
Facial skin condition is perceived as a vital indicator of the person’s apparent age,
perceived beauty, and degree of health. A face with shining, silky, bright, hydrated, and
trouble-free skin indicates a high degree of beauty and thus attractiveness, which creates
an initial impression of the person. As people get older, their facial skin also ages, revealing symptoms of aging such as wrinkles, age spots, and visible pores. A person's biological age can intuitively be predicted from his or her facial skin condition. For this reason, people wish to maintain youthful facial skin without aging symptoms.
The conventional way of assessing our facial skin condition is to visit facial skin care
shops or dermatologists' offices. However, this imposes the burden of locating the right facial skin clinic, making appointments, and visiting the clinics. In addition, the cost incurred for the visits can be substantial.
Machine learning-based software analytics on facial skin conditions can be a time-
and cost-efficient alternative to the conventional approach of visiting the clinics. In recent
years, researchers have applied Convolutional Neural Network (CNN) based deep learning
models to diagnose facial skin problems [1–12]. However, current CNN-based approaches
have been shown to be limited in delivering a diagnosis with high performance and,
hence, limited in their applicability in clinics. This is mainly due to the following technical
challenges in diagnosing facial skin problems with CNN models.
- Detecting small-sized skin problems such as pores, moles, and acne
- Handling the high complexity of detecting about 20 different facial skin problem types
- Handling appearance variability of the same facial skin problem type among people
- Handling appearance similarity of different facial skin problem types
- Handling false segmentations on non-facial areas
The goal of this study is to devise effective software methods to overcome the technical challenges and thereby provide clinic-level high performance in facial skin diagnosis. This paper proposes a set of five software tactics that can effectively remedy the challenges and provide a high level of performance in diagnosing facial skin problems. The paper also presents a technical assessment of the proposed methods through experiments and comparisons to other approaches.
The paper is organized as follows. Section 2 summarizes related works and their contributions. Section 3 presents the intrinsic limitations of the conventional CNN approach to diagnosing facial skin problems. Section 4 elaborates the set of five tactics that can effectively overcome the technical limitations and provide a high diagnosis performance. Section 5 presents the datasets used for training facial skin diagnosis models and the results of the experiments for evaluating the proposed tactics and comparing them to other approaches.
The contribution of this study is twofold: (1) proposing a set of five tactics that overcome the intrinsic limitations of CNN models in diagnosing facial skin problems and (2) seamlessly integrating the tactics into a software implementation. Through a proof-of-concept implementation of the system and experiments with it, the set of proposed tactics is shown to provide a diagnosis performance of 83.38%, improve the diagnosis performance by 32.58% compared to the conventional CNN approach, and outperform diagnosis models with MobileNetV2, Xception, ResNet, VGG16, and VGG19 by an average of 17.89%.
The proposed facial skin diagnosis system with the tactics can potentially be utilized
as a supplementary approach in face skin care clinics and a cost-effective alternative to
visiting clinics by individuals.
2. Related Works
There have been a number of studies for diagnosing facial skin problems with deep neural networks. Shen's study [1], Quattrini's study [2], and Zhao's study [3] explored the diagnosis of a specific facial skin problem, such as acne or rosacea, using CNN models. VGG-19 models were utilized in Shen's work and Quattrini's work. Liu's study proposed a system for detecting moles [4] using U-Net segmentation models.
There have been studies to analyze multiple types of facial skin problems, including Wu [5] and Gerges [6]. Both works utilize a CNN network for diagnosing multiple types of facial skin problems.
There are works that utilize effective methods to improve the performance of analyzing spatial information in the domain of facial skin diagnosis. Yadav's study [7] and Junayed's study [8] proposed pre-processing methods for improving the performance of diagnosing facial skin problems. Yadav applied a method of changing the image color space from RGB to HSV to emphasize the acne area. Junayed applied a method of generating multiple images by changing color spaces to reduce noise and emphasize the acne scar areas. Bekmirzaev [9] proposed a segmentation model structure for multiple facial skin problems, which consists of Long Short-Term Memory (LSTM) layers and convolutional layers. Gessert [10] proposed a deep neural network structure whose input is multiple divided sections from high-resolution skin photos and which consists of convolutional layers and recurrent layers. Gulzar [11] proposed a segmentation model for skin lesions, which combines a vision transformer and a U-Net structure.
There exist works to effectively detect small-sized objects with CNN models [13–21]. Cui [13] proposed a CNN structure for detecting small-sized objects by revising the Single Shot Detector (SSD) structure, applying fusion layers that combine multiple convolutional layers with deconvolutional layers. Liu [14] proposed a modified Mask R-CNN model to detect cracks in asphalt pavement using ground-penetrating radar images by adding a feature pyramid network to the original backbone network of the Mask R-CNN.
The studies applying small-sized object detection models in specific domains are summarized in Table 1.
Table 1. Representative studies for detecting small-sized objects.

Study | Input Image | Functionality | Enhanced Section
Liu [14] | Ground-penetrating radar images | To detect signals for cracks in asphalt pavement | Applied two feature pyramid networks as backbone networks in Mask R-CNN.
Nie [15] | Remote sensing images | To detect and segment ships | Added a bottom-up structure to the feature pyramid networks in Mask R-CNN.
Liu [16] | Images captured by unmanned aerial vehicle (UAV) | To detect objects from the UAV perspective | Modified the backbone network in YOLO to increase receptive fields.
Seo [17] | Medical images | To segment the liver and liver tumors | Added a residual path with deconvolution and activation operations to the skip connection of the U-Net.
Peng [18] | 3D magnetic resonance imaging images | To segment breast tumors in 3D images | Proposed a segmentation structure consisting of localization and segmentation by two Pseudo-Siamese networks (PSN).
Tian [19] | Images captured by UAV | To detect objects from the UAV perspective | Proposed a detection model consisting of two detection networks to compute low-dimensional features.
Small-sized object detection and segmentation are required in multiple domains, and the studies above enhanced the original CNN models by adding or modifying networks.
There also exist works to enhance the quality of input images, such as face photos, by enlarging the images and improving their quality. Various approaches for enhancing resolution have been proposed in [22–27].
There exist works to analyze directional images, including [28–30]. Wei [28] proposed a CNN model for segmenting brain areas from MRI by dividing the brain MRI image into coronal, sagittal, and transverse axial views. The feature maps for each direction are applied to generate the 3D brain segmentation result.
There exist works to handle false segmentation with machine learning, including [31–35]. Sander [31] proposed a process to discard falsely detected areas using a CNN model and knowledge-based filtering.
There exist works to distinguish the target class images among similar images, including [36,37]. Khan's study [36] proposed a medical image classification that handles the appearance similarity of organs across different medical images using a method that fuses the scale-invariant feature transform (SIFT) descriptor and the Harris corner algorithm.
The current related works provide various deep-learning approaches to diagnose facial skin problems, but they do not address the technical hardships in diagnosing facial skin problems. Our study is distinct in identifying the specific challenges of the diagnosis problem and proposing a set of 5 practical tactics to handle those challenges. As a result, the diagnosis performance is high enough to be utilized in clinics.
3. Technical Challenges in Diagnosing Facial Skin Problems
Due to the intrinsic characteristics of facial skin problems, the diagnosis of skin
problems—even with advanced machine learning algorithms—presents the following
technical challenges.
3.1. Challenge #1: Detecting Small-Sized Skin Problems Such as Pores, Moles, and Acne
The CNN algorithm can effectively analyze spatial information. CNN is based on
the shared-weight architecture of the convolution kernels and filters that slide along input
features and provide translation-equivariant responses known as feature maps [38].
However, the performance of CNN models drops when the size of the target object in an image is considerably small. Some studies have reported the problem of detecting small-sized objects with CNN models and conducted experiments to find better algorithms for detecting such objects [39–41]. This is due to the limited spatial features exposed in the small-sized object image and, consequently, the limited extraction of spatial features with filters.
Using our collection of 2225 facial photos (at a resolution of 576 × 576 pixels), the average occupation ratios of facial skin problem areas on the photos are measured, as shown in Table 2.
Table 2. Comparing occupation ratios of face, face section, and facial skin problem areas.

Instances on Facial Photo | Average Number of Pixels | Occupation Ratio on Photo
Whole face | 101,000 pixels | 33.06%
Sections of face, i.e., eye, nose, mouth, and ear | 3690 pixels | 3.36%
Facial skin problems such as pore, mole, and acne | 25 pixels | 0.02%
The whole face occupies an average of 33.06% of the photo. A face section such as eye,
nose, mouth, or ear occupies an average of 3.36% of the photo. A facial skin problem, such
as pore, mole, or acne, occupies an average of 0.02% of the photo. The average radius of a
pore is 0.02~0.05 mm [42], and the average radius of a mole is about 6 mm [43].
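To make the occupation ratio concrete, the following is a minimal NumPy sketch, not taken from the paper's implementation, of how such a ratio can be computed from a binary annotation mask; the mask contents are illustrative.

import numpy as np

def occupation_ratio(mask: np.ndarray) -> float:
    """Ratio of annotated pixels to all pixels in a binary (H, W) mask."""
    return float(mask.sum()) / mask.size

# Illustrative 576 x 576 photo with a 5 x 5 mole annotation (25 pixels).
photo_mask = np.zeros((576, 576), dtype=np.uint8)
photo_mask[100:105, 200:205] = 1
print(f"{occupation_ratio(photo_mask):.4%}")  # a tiny fraction of the photo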
The hypothesis from this observation is that the detection of such small-sized facial skin problems with CNN models results in significantly low performance. To validate our hypothesis, CNN models using the Mask R-CNN algorithm [44] were trained to detect and visually segment 3 different types of objects. The model was trained with ResNet as the backbone, 0.001 for the learning rate, and (16, 32, 64, 128, 256) as the RPN anchor sizes. The performance of the detection results using the Dice Similarity Coefficient (DSC) metric is shown in Table 3.
Table 3. Performance measurements of detecting objects in different sizes.

Target Objects in Different Sizes | Average of DSC Measurements
Whole face | 95.6%
Mouth as a section of the face | 90.9%
Mole as a facial skin problem | 31.5%
The average DSC of a mole is shown to be only 31.5%, which is considerably lower
than the average DSC of 95.6% for the entire facial area and the DSC of 90.9% for the
mouth area.
This is due to the size of the mole, which is too small for the filters in a CNN model to capture its spatial characteristics. If a mole is represented by 25 pixels and the size of a filter is 5 × 5 (i.e., 25 pixels), then the visual features of the mole are not sufficiently captured by the filter; rather, the features are simplified and lost through the process of convolution. The resulting feature map cannot represent the mole with sufficient information.
Moreover, the mask placed around such a small object as a mole by a Mask R-CNN model cannot present its boundary with high distinction.
3.2. Challenge #2: Detecting about 20 Different Types of Facial Skin Problems
There are about 20 different types of commonly known facial skin problems, including
acne, hyperpigmentation, scars, birthmarks, spider veins, white spots, rosacea, ingrown
hair, moles, wrinkles, dark circles, eye bags, dry skin, oily skin, dull skin, large pores, and
black heads. Some of the common skin problems are shown in Figure 1.
Figure 1. Different types of facial skin problems.
It is challenging to train a CNN model that can detect all the different types of facial
skin problems with a high level of performance. This is because the 20 facial skin problem
types are not distinct in their appearances; rather, they have a high similarity. Consequently, training a CNN model from a training set of instances that come from different classes but have a high similarity would result in a low classification performance.
Moreover, because facial skin problem areas are quite small in size, training a CNN model for detecting all the skin problem types with a high performance becomes infeasible.
3.3. Challenge #3: Variability on Appearances of Same Facial Skin Problem Type
There also exists a high variability in the appearances of a facial skin problem type among people. Figure 2 shows four different appearances of acne.
Figure 2. Different appearances of acne.
For a given facial skin problem type, there can be tens of different appearances, varying in shape, size, depth, darkness, borderline vividness, and direction. When considering about 20 different facial skin problem types and an average of 'm' different appearances for each facial skin problem type, there exist (20 × m) spatial patterns to be recognized by a CNN model. When 'm' is 50, there are 1000 spatial patterns to handle.
It is challenging to train a CNN model that can detect that many different spatial
patterns with a high level of performance. This is because the variability in appearances for
the same facial skin problem type expands the heterogeneity of spatial features within a
facial skin problem type and the complexity of the spatial features to handle for all 20 facial
skin problem types.
3.4. Challenge #4: Similarity on Appearances of Different Facial Skin Problem Types
As discussed earlier, some facial skin problem types are not highly distinguishable;
rather, they show some similarities. As an example, consider the instances of a mole,
blackhead, hyperpigmentation, and age spot as shown in Figure 3.
Figure 3. Appearance similarity among 4 different facial skin problem types.
They are not highly distinguishable but exhibit a considerable level of similarity. In the figure, the mole is similar to the blackhead and to the central part of the hyperpigmentation, which in turn is similar to the age spot.
It is challenging to train a CNN model that can distinguish among different facial skin problem types with a high appearance similarity. This is because the appearance similarity between different facial skin problem types must be learned by a CNN model, and training the model requires a sufficiently large training set that is configured to represent all appearance variants. Moreover, this similarity adds to the complexity of the spatial features to recognize.
3.5. Challenge #5: False Segmentations on Non-Facial Areas
Facial skin problems occur only on the facial area, and hence detection of the skin problems should occur only on the facial area. A photo or image of a face typically includes eyebrows and hair on the head. Consequently, a CNN model trained for the purpose of detecting facial skin problems could falsely detect skin problem instances on non-facial areas.
Figure 4 shows examples of the false detection of facial skin problems on non-facial areas using a Mask R-CNN model.
Figure 4. Examples of false detection on non-facial areas.
The left image shows a false detection of a wrinkle around hairs on the forehead, and the right image shows a false detection of acne on the nose. This type of false detection could occur whenever a non-facial area contains shapes that are similar to the facial skin problem types.
4. Design of Tactics for Remedying the Technical Challenges
To remedy the technical challenges presented earlier and yield a high performance of detecting and segmenting facial skin problems, a set of 5 effective technical tactics is presented in this section. Each tactic is used to handle one or more technical challenges, as shown in Figure 5.
Figure 5. Effectiveness of the tactics on remedying the technical challenges.
4.1. Design of Tactic #1: Refining Mask R-CNN Network with Fusion and Deconvolution Layers
This tactic is to devise a refined version of Mask R-CNN network structure that is
suitable for detecting small-sized objects such as facial skin problems. To overcome the
limitations of CNN algorithms in detecting small-sized objects, the Mask R-CNN structure
is refined with two elements: Fusion Layers and Deconvolution Layers.
The structure of our refined Mask R-CNN structure is shown in Figure 6.
Figure 6. Structure of the refined Mask R-CNN for small-sized objects.
The network structure of a CNN model is shown on the top of the figure, consisting of
convolution layers and pooling layers. The CNN structure is refined by performing the
following six steps.
Step 1 is to identify the Front-end Block that captures finer-grained features of the input image. The block consists of 'x' layers that perform convolution and pooling operations to extract features from the input image. The size of this block is determined by the kernel size of each layer in the block and the average size of the annotated facial skin problem instances. The Front-end Block is defined as the layers whose kernel size is smaller than the average size of the facial skin problem areas captured on the immediately preceding feature map.
Step 2 is to identify the Back-end Block that captures coarser-grained features of the input image. The block consists of the same number 'x' of layers, which extract the features of larger-sized objects. The size of this block is the same as that of the Front-end Block because the Fusion Block requires pairs of one layer in the Front-end Block and one layer in the Back-end Block, as shown in Figure 6.
Step 3 is to generate the Deconvolution Block, which consists of 'x' deconvolution layers. This block is used to enlarge the input feature maps from the Back-end Block, which are fed into the Fusion Block. A deconvolution layer performs the reverse operation of convolution, i.e., enlarging the feature map created by a convolution layer. That is, each vector in a feature map is padded with zero values.
Step 4 is to generate the Fusion Block, which consists of 'x' fusion layers. This block is used to fuse two feature maps from two sources: the Front-end Block and the Back-end Block. That is, each fusion layer receives a feature map from the (i + t)-th layer in the Front-end Block and a feature map from the (i + x + t)-th layer in the Back-end Block, sums the two feature maps, and returns a fused feature map.
Step 5 is to refine the structure of the Region Proposal Network of the Mask R-CNN model by entering the feature maps from the Fusion Block as inputs to the Region Proposal Network. Note that the basic structure of the Region Proposal Network in the Mask R-CNN model is constructed from the feature maps of layers appearing later in the network. In contrast, our refined Region Proposal Network is enhanced with feature maps from the Front-end Block that capture the spatial features of small-sized objects.
Step 6 is to apply the Object Detection Network of the Mask R-CNN model to detect facial skin problem instances and the Object Segmentation Network of the Mask R-CNN to segment the detected problem areas.
By applying this process of six steps, the enhanced version of the Mask R-CNN model
can detect target objects of a small size, and consequently, the performance of facial skin
problem diagnosis can significantly be increased.
Hyperparameters of the proposed network are set as follows: 0.001 as the learning rate, 0.3 as the detection non-maximum suppression (NMS) threshold, 0.5 as the region of interest (ROI) positive ratio, 0.7 as the RPN NMS threshold, (0.5, 1, 2) as the RPN anchor ratios, and (8, 16, 32, 64, 128) as the RPN anchor sizes. The loss functions are localization loss (smooth L1) and confidence loss (Softmax) in the proposed front-end and back-end blocks, and average binary cross-entropy loss for the mask. In addition, the loss function for the refined Mask R-CNN is computed as the sum of the loss functions for classification, bounding box detection, and segmentation.
4.2. Design of Tactic #2: Super Resolution Generative Adversarial Network (GAN) for Small-Sized and Blurry Images
This tactic is to enhance the quality of small-sized object images, i.e., facial skin problem instances, by applying a Super-Resolution Generative Adversarial Network (SR-GAN) [45]. A Generative Adversarial Network (GAN) consists of a generator network and a discriminator network that compete with each other to generate accurate predictions. SR-GAN is a GAN model that upscales and improves the quality of low-resolution images. That is, the structure of the Generator in the GAN is enhanced with Sub-Pixel Convolution Layers, as shown in Figure 7.
A Sub-Pixel Convolutional Layer enlarges the size of a feature map by combining the vectors of several feature maps into a single feature map. The resulting feature map consists of a larger number of vectors than each input feature map. Accordingly, the original image is enhanced with more detailed image features.
The effect of applying SR-GAN on facial skin problem diagnosis is to enlarge the small-sized facial skin problem instances, resulting in a more accurate problem diagnosis with Mask R-CNN models.
Figure 7. Generator in SR-GAN with Sub-Pixel convolution layers.
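The pixel-shuffle operation behind a Sub-Pixel Convolution Layer can be sketched in a few lines of TensorFlow. This is a minimal illustration of the rearrangement step only, not the authors' full SR-GAN generator; the section size and upscaling ratio are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers

scale = 2                                   # illustrative upscaling ratio
lr_input = tf.keras.Input(shape=(192, 192, 3))

# A convolution first produces (channels * scale^2) feature maps ...
x = layers.Conv2D(3 * scale**2, 3, padding="same")(lr_input)

# ... then the sub-pixel step rearranges depth into space:
# (H, W, C*scale^2) -> (H*scale, W*scale, C).
sr_output = tf.nn.depth_to_space(x, scale)

tail = tf.keras.Model(lr_input, sr_output)
print(tail.output_shape)                    # (None, 384, 384, 3)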
4.3. Design of Tactic #3: Training Facial Skin Problem-Specific Segmentation Models
This tactic is to train a segmentation model for each type of facial skin problem. It is based on an observation of the feature extraction scheme of the CNN algorithm. A CNN network consists of convolution layers to learn the spatial characteristics of objects, pooling layers to reduce the dimensions of the feature maps, flattening layers to convert the resultant 2-dimensional arrays into a single long continuous linear vector, and fully connected layers to connect every input neuron to every output neuron [46,47].
However, CNN models provide lower performance for detecting multi-class objects due to the learning scheme of spatial features with convolution layers and the dimension reduction scheme with pooling layers [46,47]. Owing to this phenomenon, detecting multi-class objects is harder than detecting single-class objects [48,49]. For example, a CNN model detecting people would perform better than a CNN model detecting people with gender information, i.e., male and female.
Another example is detecting animals in a zoo. Detecting a single-class object, such as 'dog', should outperform detecting 100 different animal types in a zoo, since the CNN model for 100-class objects must handle the appearance features of all 100 types of animals. Through the repeated application of convolution and pooling in a CNN, the spatial features of the 100 animal types are abstracted by cancelling some of the acquired features through activation functions such as ReLU.
Another cause of the lower performance of a multi-class CNN model is the technical hardship in distinguishing objects of different types that have some degree of appearance similarity [49–52]. For example, domestic dogs, wolves, coyotes, foxes, jackals, and dingoes belong to different animal classes, but there exist a number of similar appearance features among the different types of animals, such as between domestic dogs and wolves and between coyotes and foxes.
Hence, this tactic is to train k segmentation models for k types of facial skin problems, rather than training a single segmentation model to recognize all k types of facial skin problems. That is, for a given facial image, the k segmentation models are individually applied to detect and segment their specific facial skin problem types. Then, the results of applying the k segmentation models are integrated into a single output, as shown in Figure 8. In the figure, the facial image is fed into the k segmentation models, each of which detects and segments its specific facial skin problem type. The results are integrated into a single output.
By specializing segmentation models by their facial skin problem types, the perfor-
mance of facial skin diagnosis is increased over employing a single integrated segmenta-
tion model.
Figure 8. Applying k segmentation models and integrating the results of all segmentations.
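A minimal sketch of how the k problem-specific segmenters can be applied and their outputs integrated; the segment method and the model names are assumptions for illustration, not the paper's actual API.

import numpy as np

def diagnose(photo: np.ndarray, segmenters: dict) -> dict:
    """Apply one problem-specific segmenter per facial skin problem type
    and collect the per-type binary masks into a single integrated result.
    """
    integrated = {}
    for problem_type, model in segmenters.items():
        integrated[problem_type] = model.segment(photo)  # one mask per type
    return integrated

# Usage sketch (models are assumed to expose segment(photo) -> binary mask):
# result = diagnose(photo, {"acne": acne_seg, "mole": mole_seg, "wrinkle": wrinkle_seg})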
4.4. Design of Tactic #4: Training Face Direction-Specific Segmentation Models
This tactic is to train a segmentation model for each direction of a face, i.e., left-side face,
frontal face, and right-side face. A face photo is taken from one direction, and the photo
cannot capture the facial skin problem instances on other directions, such as instances near
ears or side-cheeks. As a result, a single segmentation model to detect facial skin problem
instances on all different areas cannot correctly detect all the skin problem instances.
This tactic handles this problem by applying face direction-specific segmentation models. To utilize this tactic, a set of 3 facial photos is required as input for the facial skin problem diagnosis system. In order to determine the direction of a face photo, a Face Direction Identifier is designed by applying a Facial Landmark Detection model [53]. Using the face contours and the locations of the nose, mouth, and eyes, the direction of a photo can automatically be identified.
An example of applying face direction-specific segmentation models is shown in
Figure 9.
Figure 9. Applying face direction-specific segmentation models.
As shown in the figure, Face Direction Identifier determines the direction of each input
face photo. Then, its specific segmentation model is applied to diagnose the facial skin
problem instances, and their results are integrated.
By specializing segmentation models by the directions of a face photo, the performance
of facial skin diagnosis is increased over employing a single integrated segmentation model.
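The direction decision itself can be sketched as below, mirroring the landmark comparison used in the main control flow (Section 4.6); the scalar landmark coordinates are illustrative assumptions.

def face_direction(loc_mouth: float, loc_nose: float,
                   loc_lt: float, loc_rt: float) -> str:
    """Classify a face photo as LEFT, RIGHT, or FRONTAL from the horizontal
    locations of the mouth, nose, and left/right face contour landmarks."""
    if abs(loc_mouth - loc_lt) < abs(loc_mouth - loc_rt) and \
       abs(loc_nose - loc_lt) < abs(loc_nose - loc_rt):
        return "LEFT"
    if abs(loc_mouth - loc_lt) > abs(loc_mouth - loc_rt) and \
       abs(loc_nose - loc_lt) > abs(loc_nose - loc_rt):
        return "RIGHT"
    return "FRONTAL"

print(face_direction(loc_mouth=120, loc_nose=115, loc_lt=40, loc_rt=300))  # LEFT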
4.5. Design of Tactic #5: Discarding Segmentations on Non-Facial Areas Using Facial
Landmark Model
This tactic is to discard the false segmentations made on non-facial areas by applying
a Facial Landmark Detection model. The false segmentations can effectively be discarded
with the following steps.
Step 1 is to detect facial skin problem instances and facial landmarks for each facial
image. The Facial Landmark Detection model is used to detect the landmarks on the
face image.
Step 2 is to generate a mask around only the skin area of the facial image. That is, the eyes, eyebrows, mouth, nostrils, and hair are excluded from this mask.
Step 3 is to remove the facial skin problem instances in non-facial areas. This is done
by overlaying the two types of masks, masks of facial skin problem instances and the mask
of the facial skin area, and discarding segmentations made on the outside of the mask of
the facial skin area.
An example of discarding false segmentations is shown in Figure 10.
Figure 10. Example of discarding false segmentations.
In the figure, the Wrinkle Segmenter model produces masks of detected wrinkles. Three of the wrinkle segmentations are made on a non-facial area. The Facial Landmark Detector produces a mask of the facial skin area. By overlaying the two types of masks, the falsely segmented wrinkles are discarded.
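The overlay in Step 3 amounts to a logical AND of two binary masks; a minimal NumPy sketch under that assumption:

import numpy as np

def discard_non_facial(problem_mask: np.ndarray, face_mask: np.ndarray) -> np.ndarray:
    """Keep only the segmented problem pixels that fall inside the facial
    skin area mask (eyes, eyebrows, mouth, nostrils, and hair excluded)."""
    return np.logical_and(problem_mask, face_mask).astype(np.uint8)

# Any wrinkle pixels segmented outside the facial skin area are zeroed out:
# cleaned = discard_non_facial(wrinkle_mask, face_skin_mask)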
4.6. Design of the Main Control Flow
The main control flow of the facial skin diagnosis system is to invoke the functional components that implement the proposed 5 tactics. The control flow is shown in Algorithm 1.
As shown in the algorithm, the main control of the diagnosis system reads facial photos as the input and invokes the functional components that implement the 5 tactics.
Algorithm 1. Main control flow of the 'Facial Skin Problem Diagnosis' system.
Input: photos: a list of 3 face photos (per person)
Output: FSPResults: a list of detected facial skin problem instances

Main() {
  FSPResults = [];
  SRGAN = ...                 // SR-GAN model for upscaling and improving the quality of images
  FaceLandmarkDetector = ...  // Facial Landmark Detection model
  upscalingRatio = ...        // ratio of upscaling an image by SRGAN
  segmenters = ...            // set of segmentation models per face direction and facial skin problem type

  for (photo in photos) {
    // Step 1. Identify the face direction (regarding Tactic #4)
    landmarks = FaceLandmarkDetector.identify(photo);
    locMouth = ...  // location of the mouth in the detected landmarks
    locNose = ...   // location of the nose in the detected landmarks
    locRT = ...     // location of the right side of the face in the detected landmarks
    locLT = ...     // location of the left side of the face in the detected landmarks
    if ((|locMouth - locLT| < |locMouth - locRT|) and (|locNose - locLT| < |locNose - locRT|))
      curDirection = LEFT;
    else if ((|locMouth - locLT| > |locMouth - locRT|) and (|locNose - locLT| > |locNose - locRT|))
      curDirection = RIGHT;
    else
      curDirection = FRONTAL;

    // Step 2. Invoke the facial skin problem-specific segmenters (regarding Tactic #3)
    listFSPs = [];
    for (segmenter_type in segmenters[curDirection]) {
      SEGRef_type = ...  // segmenter based on the refined Mask R-CNN for segmenter_type
      SEGOrg_type = ...  // segmenter based on the original Mask R-CNN for segmenter_type

      // Step 3. Apply the refined Mask R-CNN (regarding Tactic #1)
      resultRef = SEGRef_type.segment(photo);

      // Step 4. Enhance the quality of facial images with the SR-GAN model (regarding Tactic #2)
      sections = { SEC_i | SEC_i ⊆ photo, ∪ SEC_i = photo };  // divide the photo into sections
      resultOrg = [];
      for (SEC ∈ sections) {
        enlargedSEC = SRGAN.enlarge(SEC);
        result = SEGOrg_type.segment(enlargedSEC);
        resultOrg ← result;  // after decreasing the size of result by (1/upscalingRatio)
      }

      // Determine the facial skin problem
      // (1) Select a result from Step 3 and Step 4
      if (size(resultOrg) < thInstanceSize)
        result = resultRef;
      else
        result = resultOrg;
      // (2) Check whether the segmented instances are classified as different FSP types
      for (fsp in listFSPs) {
        if (size(fsp ∩ result) / max(size(fsp), size(result)) > thSize) {
          if ((confidence score of fsp) > (confidence score of result))
            ... // keep fsp
          else
            ... // discard fsp from listFSPs and add result
        }
      }
    }

    // Step 5. Discard false segmentations on the non-facial skin area (regarding Tactic #5)
    faceArea = ...  // mask for the facial skin area excluding eyes, nostrils, and mouth
    segResult = FSPArea ∩ faceArea;  // overlay both segmented areas
    FSPResults ← (curDirection, segResult);
  }
  return FSPResults;
}
5. Experiments and Assessment
This section is to present the results of experiments by applying the facial skin seg-
mentation models trained with the proposed tactics.
5.1. Datasets for Training Models
The data collection of face photos used for training, evaluation, and experiments consists of 2225 face photos at a resolution of 576 × 576 pixels. Each photo contains one or more instances of acne, age spots, moles, rosacea, and wrinkles.
In the experiments, only 5 facial skin problem types are considered, for the following reasons.
- A minimal set of essential facial skin problem types is needed; performing experiments with photos of all 20 different facial skin problem types would require a dataset of more than 10,000 photo images, and manually annotating the facial skin problem areas on each photo would require an effort of more than 30 person-months.
- A dataset including facial skin problem types that are relatively large in size as well as small in size is needed. Hence, photos showing wrinkles and rosacea are selected for the large-sized types, and acne, moles, and age spots are selected for the small-sized types.
- A dataset including different facial skin problem types that have some similarities in their appearances is needed. Hence, photos showing acne, age spots, and moles are selected.
- A dataset including blurry boundaries between the problem-free skin areas and the facial skin problem areas is needed. Hence, photos showing acne, rosacea, and wrinkles are selected.
The face photos were acquired from 3 different sources: the Acne 04 dataset [54] and the Flickr-Faces-HQ dataset [55], available on GitHub repositories [56,57], and the FEI face dataset, available on the Centro University website [58].
To train CNN models to detect facial skin problems, each photo had to be manually annotated in all the areas of facial skin problem instances. For this task, COCO Annotator was utilized as the annotation software tool [59], and an XP-Pen Artist 15.6 Pro stylus pen was used as a touch-pad device. Since a photo may contain several skin problem instances, the task of manually annotating the 2225 photos demanded a high effort of 8 person-months.
An example of manual annotations for facial skin problem instances is shown in
Figure 11.
Figure 11. Face photo with annotations and its JSON representation.
The left side of the figure shows annotations for two facial skin problems: moles are annotated in blue, and wrinkles are annotated in red. The right side of the figure shows the JSON representation of the annotations, which is required to train the model.
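For reference, a COCO-style annotation record, the format that COCO Annotator exports, has roughly the following shape, expressed here as a Python dict for brevity; all field values are illustrative, not taken from our dataset.

# Schematic COCO-format annotation for a single mole instance.
annotation = {
    "image_id": 17,                     # photo this instance belongs to
    "category_id": 3,                   # e.g., 3 = "mole"
    "segmentation": [[210.5, 98.0,      # polygon vertices as a flat
                      214.0, 98.5,      # [x1, y1, x2, y2, ...] list
                      214.5, 102.0,
                      210.0, 101.5]],
    "bbox": [210.0, 98.0, 4.5, 4.0],    # [x, y, width, height]
    "iscrowd": 0,
}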
By using our collection of 2225 face photos with annotations on facial skin problem
instances, a total of 31 Mask R-CNN models were trained. The data collection is utilized as
a training set, validation set, and test set, as shown in Table 4.
Table 4. Distribution of the data collection.

Subset Type | Training Set | Validation Set | Test Set | Total
Ratio (%) | 69.98 | 10.02 | 20.0 | 100
# of Photos | 1557 | 223 | 445 | 2225
The training set consists of 1557 photos, which are 70% of the data collection. The validation set consists of 223 photos, which are 10% of the data collection. The test set consists of 445 photos, which are 20% of the data collection.
Hyperparameters for training the models are set to 0.001 as the learning rate, 1000 as the number of epochs, and 5 as the batch size. Early stopping is applied to prevent the models from overfitting to the training set.
5.2. Proof-of-Concept Implementation
A web-based system of facial skin diagnosis has been implemented in Python using the following libraries: TensorFlow for developing the CNN models, NumPy for processing operations on mask data, OpenCV for processing facial photos, MySQL for managing the database, and the Django framework for building the website.
Figure 12 shows the web user interface of this system.
The original image, the masks generated around the facial skin problems, and the overlay of the masks on the original image are shown on the left side of the figure. The right side of the figure shows the results of identifying facial skin problem instances.
Figure 12. User Interface of Facial Skin Problem Diagnosis System.
5.3. Performance Metric for Facial Skin Problem Diagnosis
Dice Similarity Coefficient (DSC) is an appropriate measure for evaluating the perfor-
mance of the segmentation models. DSC measures how well the segmented areas match
the labeled areas on the images. Its metric is given below.
$$ \mathrm{DSC} = \frac{2\,|LB \cap SEG|}{|LB| + |SEG|} $$
Let LB be the mask for the label data of a photo, and let SEG be the mask of the segmenta-
tion results for the photo. Let |X| be the area of the masked instances in an input mask X.
DSC is twice the area of the intersection of LB and SEG divided by the sum of the areas of
LB and SEG. The range of DSC is between 0 and 1; the more accurate the segmentation
result, the closer the DSC value is to 1.
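For reference, a minimal NumPy sketch of this metric for two binary masks of equal shape is given below.

import numpy as np

def dice_similarity(lb: np.ndarray, seg: np.ndarray) -> float:
    # lb: binary label mask (LB); seg: binary segmentation mask (SEG).
    lb = lb.astype(bool)
    seg = seg.astype(bool)
    intersection = np.logical_and(lb, seg).sum()   # |LB ∩ SEG|
    denom = lb.sum() + seg.sum()                   # |LB| + |SEG|
    # If both masks are empty, treat the match as perfect.
    return 2.0 * intersection / denom if denom else 1.0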
5.4. Experiment Scenarios and Results
Three groups of experiments were conducted: experiments evaluating the effectiveness of
each of the 5 tactics, an experiment evaluating the integration of all the tactics, and an
experiment comparing our approach with other known approaches.
5.4.1. Experiment for Tactic #1: Refined Mask R-CNN Segmentation Models
This experiment is to evaluate the effectiveness of the refined Mask R-CNN model.
This is done by training two segmentation models: a model with the conventional Mask
R-CNN structure and a model with the refined Mask R-CNN structure with fusion and
deconvolution layers.
A dataset of 410 face photos was used; each photo contains one or more facial skin
problems among acne, age spots, and moles. These facial skin problem types are quite
small in size, and the bounding box of the largest problem instance in the dataset is
(12 × 12) pixels.
The performances of the two models are compared in Figure 13.
Figure 13. Comparing performances of conventional Mask R-CNN and enhanced Mask R-CNN models.
In this experiment, a conventional Mask R-CNN segmentation model was applied to
detect facial skin problems on all the face photos, the performance was measured with
DSC, and the average of all DSC measurements was computed. Then, a segmentation
model with the enhanced Mask R-CNN structure was trained and applied, and its
performance was measured in DSC in the same way.
For the facial photos with acne problems, the conventional Mask R-CNN model
yielded 56.63% of DSC, whereas the enhanced Mask R-CNN model yielded 73.26%, as
shown in the figure. A significant degree of performance was gained.
In summary, the performances of segmenting acne, age spot, and mole problems
were increased by 16.63%, 13.7%, and 19.05%, respectively, and the average performance
gain over all 3 types of facial skin problems is 16.46%.
5.4.2. Experiment for Tactic #2: Super Resolution GAN Model
This experiment is to evaluate the effectiveness of applying the Super Resolution GAN
model in enhancing the quality of facial images. A dataset covering 3 facial skin problem
types was made: acne, rosacea, and wrinkle. A segmentation model combining the
Mask R-CNN and SR-GAN structures was trained, and the experiments were performed
with it. The results of applying the super resolution tactic with SR-GAN are shown in
Figure 14.
Figure 14. Result of applying super resolution with SR-GAN.
In the figure, the original face image is partitioned into 9 images of (192 × 192) size,
which is the input size required by the trained Mask R-CNN model. Then, each image is
fed into the SR-GAN model, which enhances the image quality as shown in the figure.
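A minimal sketch of this tiling step is given below; it assumes the face image has already been resized so that it divides evenly into a 3 × 3 grid of (192 × 192) tiles, and that sr_model is a trained SR-GAN generator with a Keras-style predict interface.

import numpy as np

def enhance_by_tiles(image, sr_model, tile=192, grid=3):
    # Split the (576 x 576) face image into a 3 x 3 grid of (192 x 192) tiles
    # and enhance each tile with the super-resolution generator.
    enhanced = []
    for r in range(grid):
        for c in range(grid):
            patch = image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            # Normalize to [0, 1], add a batch axis, and run the SR generator.
            sr = sr_model.predict(patch[np.newaxis, ...] / 255.0)[0]
            enhanced.append(sr)
    return enhanced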
To compare the performance of super resolution, a segmentation model was trained
with conventional Mask R-CNN and another segmentation model with both Mask R-CNN
and SR-GAN. The comparison of the performances of the two models is shown in Figure 15.
Figure 15. Comparing performances of segmentation with and without SR-GAN.
The conventional Mask R-CNN segmentation model was applied to all the face photos, the
performance was measured in DSC, and the average of all the DSC measurements was
computed. Then, a segmentation model combining Mask R-CNN and SR-GAN was trained
and applied to perform the same operations.
For wrinkle problems, the conventional Mask R-CNN model yielded 49.97% of DSC,
whereas the SR-GAN-enhanced model yielded 67.24%, as shown in the figure. A
significant degree of performance was gained.
In summary, the performances of segmenting acne, rosacea, and wrinkle problems
were increased by 8.63%, 12.46%, and 17.27%, respectively, and the average performance
gain over all 3 types of facial skin problems is 12.79%.
5.4.3. Experiment for Tactic #3: Facial Skin Problem-Specific Models
This experiment is to evaluate the effectiveness of applying facial skin problem-specific
models instead of a single integrated model. A dataset of face photos covering 5 types of
skin problems was used: acne, age spots, moles, rosacea, and wrinkles.
An integrated Mask R-CNN model covering all 5 facial skin problem types was
trained. Then, a set of 5 individual segmentation models, one for each facial skin problem
type, was trained. Their performances were measured and compared, as shown in
Figure 16.
Figure 16. Comparing performance of integrated model and facial skin problem type-specific model.
For wrinkle problems, the integrated segmentation model yielded 45.97% of DSC,
whereas the wrinkle-specific segmentation model yielded 59.74%, as shown in the figure.
A significant degree of performance was gained.
The performances of segmenting acne, age spots, moles, rosacea, and wrinkles were
increased by 8.2%, 11.06%, 9.07%, 8.6%, and 13.77%, respectively, and the average of
performance gains for all 5 types of facial skin problems is 10.14%.
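As an illustration of this tactic, the sketch below keeps one trained specialist model per problem type and collects each model's instance masks separately; the type names, dictionary layout, and predict interface are assumptions.

PROBLEM_TYPES = ["acne", "age_spot", "mole", "rosacea", "wrinkle"]

def diagnose_with_specialists(image, specialist_models):
    # specialist_models: dict mapping each problem type to its trained
    # segmentation model (tactic #3: one model per facial skin problem type).
    results = {}
    for ptype in PROBLEM_TYPES:
        # Each specialist returns the instance masks for its own problem type.
        results[ptype] = specialist_models[ptype].predict(image)
    return results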
5.4.4. Experiment for Tactic #4: Face Direction-Specific Models
This experiment is to evaluate the effectiveness of applying face direction-specific
models instead of using a single integrated model. A dataset of face photos taken in
3 different directions was used in this experiment.
A Mask R-CNN model was trained to segment face photos in any direction. Then, a set of
3 individual models, one for each face direction (left-side, frontal, and right-side), was
trained. Their performances were measured and compared, as shown in Figure 17.
Figure 17. Comparing performances of integrated model and face direction-specific models.
For the photos of the frontal face, the integrated segmentation model yielded 53.12% of
DSC, whereas the frontal direction-specific model yielded 65.38%, as shown in the figure.
A significant degree of performance was gained.
The average performances of segmenting face photos showing the left-side face, frontal
face, and right-side face were increased by 8.45%, 12.26%, and 8.66%, respectively, and the
average performance gain over all 3 face directions is 9.79%.
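A sketch of this routing idea is shown below: estimate the face direction first, then dispatch the photo to the matching direction-specific model. The landmark-based heuristic (using the dlib-style 68-point indices for the eye corners and nose tip) and the threshold values are illustrative assumptions, not the paper's exact method.

def estimate_direction(landmarks):
    # landmarks: list of (x, y) points; indices 36, 45, and 30 are the left eye
    # corner, right eye corner, and nose tip in the dlib 68-point convention.
    left_x, right_x = landmarks[36][0], landmarks[45][0]
    nose_x = landmarks[30][0]
    ratio = (nose_x - left_x) / max(right_x - left_x, 1e-6)
    if ratio < 0.35:
        return "left"
    if ratio > 0.65:
        return "right"
    return "frontal"

def diagnose_by_direction(image, landmarks, direction_models):
    # Dispatch the photo to the model trained for its face direction (tactic #4).
    return direction_models[estimate_direction(landmarks)].predict(image)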
5.4.5. Experiment for Tactic #5: Discarding False Segmentations
This experiment is to evaluate the effectiveness of discarding false segmentations made
on non-facial areas. A Mask R-CNN model was trained and applied to detect facial skin
problems. Then, the software component implementing the tactic of discarding false
segmentations was applied to remove any false segmentations from the results.
The performance of the diagnosis without applying this tactic and the performance
with this tactic were measured and compared, as shown in Figure 18.
For the wrinkle problems, the segmentation with the Mask R-CNN yielded 45.97% of DSC,
whereas the performance measured after discarding false segmentations was 61.23%, as
shown in the figure. A significant degree of performance was gained.
The performances of segmenting acne, age spots, moles, rosacea, and wrinkles were
increased by 5.49%, 9.17%, 8.01%, 6.65%, and 15.26%, respectively, and the average of
performance gains for all 5 types of facial skin problems is 8.92%.
Figure 18. Comparing performances of segmentations with and without discarding false segmentations.
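A minimal sketch of the discarding step is given below; face_mask is assumed to be a binary NumPy mask of the facial region (e.g., derived from the facial landmark contour), and the 0.5 overlap threshold is an assumed value.

import numpy as np

def discard_false_segmentations(instance_masks, face_mask, min_overlap=0.5):
    # Keep only instances whose masked area lies mostly inside the facial region.
    kept = []
    for mask in instance_masks:
        area = mask.sum()
        inside = np.logical_and(mask, face_mask).sum()
        if area > 0 and inside / area >= min_overlap:
            kept.append(mask)
    return kept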
5.4.6. Experiment for Integrating all 5 Tactics
This experiment is to evaluate the performance of facial skin problem diagnosis when all
5 tactics are integrated. A conventional Mask R-CNN segmentation model was trained for
all the skin problem types. Then, a total of 30 individual segmentation models were trained
for the 5 tactics and the 3 facial photo directions.
Then, the performances of 3 different approaches were measured: (1) the performance
of the conventional Mask R-CNN model, (2) the average performance of each of the
5 tactics, and (3) the average performance of applying all 30 segmentation models.
The performances of the three approaches are compared in Figure 19.
Figure 19. Comparing performances of the three approaches.
The conventional Mask R-CNN yielded 50.8% of DSC, the tactic-specific approaches
yielded (64.6%, 69.33%, 60.94%, 59.74%, and 59.72%) of DSC, and the approach of integrat-
ing all 5 tactics yielded 83.38% of DSC.
The integrated approach outperformed the conventional Mask R-CNN approach by
32.58%, and it outperformed the tactic-specific approach by an average of 22.47%.
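Putting the pieces together, the sketch below shows one way the 5 tactics could be chained for a single photo, reusing the illustrative helpers from the previous sections; it is a sketch of the integration idea under those assumptions, not the authors' exact code.

import numpy as np

def diagnose_integrated(image, landmarks, sr_model, models, face_mask):
    # Tactic #2: enhance the photo with super resolution before segmentation.
    hi_res = sr_model.predict(image[np.newaxis, ...] / 255.0)[0]
    # Tactic #4: route the photo by its estimated face direction.
    direction = estimate_direction(landmarks)
    results = {}
    for ptype in PROBLEM_TYPES:
        # Tactics #1 and #3: a per-problem-type model with the enhanced
        # Mask R-CNN structure, keyed here by (problem type, face direction).
        masks = models[(ptype, direction)].predict(hi_res)
        # Tactic #5: discard segmentations outside the facial region.
        results[ptype] = discard_false_segmentations(masks, face_mask)
    return results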
5.4.7. Experiment for Comparing with Other Backbone Networks
This experiment was to compare the performance of our proposed approach with
diagnosis models trained with 6 different backbone networks: MobileNetV2, Xception,
VGG16, VGG19, ResNet50, and ResNet101.
Table 5 shows the code segment for training the segmentation models using the
6 different backbone networks.
Table 5. Code segment of training segmentation models using the 6 backbone networks.
Code. Implementing a Python Class for training 6 segmentation models
1: import tensorflow as tf
2: import tensorflow.keras as keras
3: import tensorflow.keras.layers as KL
4: from .backbone import (build_mobilenet, build_xception, build_vgg16, build_vgg19, build_resnet50,
5:                        build_resnet101, build_fusion_deconv)
6: from .mrcnn import build_rpn_and_mrcnn
7:
8: class FSPSegmenter:
9:     def __init__(self, backbone_type, config, p_model):
10:         self.config = config
11:         self.path_model = p_model
12:         self.model = self.build(backbone_type)
13:
14:     def build(self, backbone_type):
15:         input_img = KL.Input(shape=self.config.img_shape)
16:         # To build the (6 + 1) backbone feature maps from the shared input
17:         feature_maps_MobileNetV2 = build_mobilenet(input_img)
18:         feature_maps_xception = build_xception(input_img)
19:         feature_maps_vgg16 = build_vgg16(input_img)
20:         feature_maps_vgg19 = build_vgg19(input_img)
21:         feature_maps_resnet50 = build_resnet50(input_img)
22:         feature_maps_resnet101 = build_resnet101(input_img)
23:         feature_maps_fusion_deconv = build_fusion_deconv(input_img)
24:
25:         # To initialize structures for region proposal networks and networks for segmentation and detection
26:         self.model_mobileNetV2 = build_rpn_and_mrcnn(feature_maps_MobileNetV2)
27:         self.model_xception = build_rpn_and_mrcnn(feature_maps_xception)
28:         self.model_vgg16 = build_rpn_and_mrcnn(feature_maps_vgg16)
29:         self.model_vgg19 = build_rpn_and_mrcnn(feature_maps_vgg19)
30:         self.model_resnet50 = build_rpn_and_mrcnn(feature_maps_resnet50)
31:         self.model_resnet101 = build_rpn_and_mrcnn(feature_maps_resnet101)
32:         self.model_fusion_deconv = build_rpn_and_mrcnn(feature_maps_fusion_deconv)
33:         # To return the model matching the requested backbone type
34:         return getattr(self, 'model_' + backbone_type)
The code segment configures the (6 + 1) segmentation backbone structures inside the
build method (lines 17 to 23) and configures the remaining network structures of Mask
R-CNN through the build_rpn_and_mrcnn calls (lines 26–32).
Once the network structures are configured, all 7 segmentation models are trained and
applied to detect facial skin problems using the test set of 445 photos.
The comparison of their average performances is shown in Figure 20.
As shown in the figure, the segmentation models with MobileNetV2, Xception, VGG16,
VGG19, ResNet50, and ResNet101 show an average detection performance of 46.91%,
which is 17.89% lower than the performance of our proposed model.
Figure 20. Comparing the proposed approach with 6 other backbone networks.
6. Concluding Remarks
The condition of the facial skin is perceived as a vital indicator of the person’s apparent
age, perceived beauty, and degree of health. For this reason, people wish to maintain
youthful facial skin without aging symptoms.
Machine-learning-based software analytics on facial skin conditions can be a time- and
cost-efficient alternative to the conventional approach of visiting clinics. However,
current CNN-based approaches have been shown to be limited in diagnosis performance
and, hence, limited in their applicability in clinics.
In this paper, the set of 5 technical challenges in diagnosing facial skin problems was
addressed. Then, a set of 5 effective design tactics to overcome these technical challenges
was presented. Each proposed tactic is devised to resolve one or more of the technical
challenges.
Using a data collection of 2225 photos, a total of 31 segmentation models were trained
and applied in the experiments. The experiments showed 83.38% diagnosis performance
when all 5 tactics were applied, which outperforms conventional CNN approaches by
32.58%. The diagnosis system presented in this study can potentially be utilized in
developing clinical diagnosis systems.
Author Contributions: Conceptualization, M.K.; Methodology, M.K. and M.H.S.; Software, M.K.
and M.H.S.; Investigation, M.H.S.; Writing—original draft, M.K. and M.H.S.; Supervision, M.K. All
authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are
solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s).
MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting
from any ideas, methods, instructions or products referred to in the content.