GAN based augmentation using a hybrid loss function for dermoscopy images


Dermatology is the most appropriate field to utilize pattern recognition-based automated techniques for objective, accurate, and rapid diagnosis because diagnosis mainly relies on visual examinations of skin lesions. Recent approaches utilizing deep learning techniques have shown remarkable results in this field. However, they necessitate a substantial quantity of images and the availability of dermoscopy images is often limited. Also, even if enough images are available, their labeling requires expert knowledge and is time-consuming. To overcome these issues, an efficient augmentation approach is needed to expand training datasets from input images. Therefore, in this work, a generative adversarial network has been developed using a new hybrid loss function constructed with traditional loss functions to enhance the generation power of the architecture. Also, the effect of the proposed approach and different generative network-based augmentations, which have been used with dermoscopy images in the literature, on the classification of skin lesions has been investigated. Therefore, the main contributions of this work are: (i) introducing a new generative model for the augmentation of dermoscopy images; (ii) presenting the effect of the proposed model on the classification of the images; (iii) comparative evaluations of the effectiveness of different generative network-based augmentations in the classification of seven forms of skin lesions. The classification accuracy when the proposed augmentation is used is 93.12%, which is higher than its counterparts. Experimental results indicate the significance of augmentation techniques in the classification of skin lesions and the efficiency of the proposed structure in improving the classification accuracy.
Accepted: 25 July 2024 / Published online: 7 August 2024
© The Author(s) 2024
Evgin Goceri
1 Department of Biomedical Engineering, Engineering Faculty, Akdeniz University, Antalya,
Articial Intelligence Review (2024) 57:234
Dermatology is the most appropriate eld to utilize pattern recognition-based automated
techniques for objective, accurate, and rapid diagnosis because diagnosis mainly relies on
visual examinations of skin lesions. Recent approaches utilizing deep learning techniques
have shown remarkable results in this eld. However, they necessitate a substantial quantity
of images and the availability of dermoscopy images is often limited. Also, even if enough
images are available, their labeling requires expert knowledge and is time-consuming. To
overcome these issues, an ecient augmentation approach is needed to expand training
datasets from input images. Therefore, in this work, a generative adversarial network has
been developed using a new hybrid loss function constructed with traditional loss func-
tions to enhance the generation power of the architecture. Also, the eect of the proposed
approach and dierent generative network-based augmentations, which have been used
with dermoscopy images in the literature, on the classication of skin lesions has been
investigated. Therefore, the main contributions of this work are: (i) introducing a new
generative model for the augmentation of dermoscopy images; (ii) presenting the eect
of the proposed model on the classication of the images; (iii) comparative evaluations of
the eectiveness of dierent generative network-based augmentations in the classication
of seven forms of skin lesions. The classication accuracy when the proposed augmenta-
tion is used is 93.12%, which is higher than its counterparts. Experimental results indicate
the signicance of augmentation techniques in the classication of skin lesions and the
eciency of the proposed structure in improving the classication accuracy.
Keywords Augmentation · Deep learning · Dermoscopy images · Generative adversarial
networks · Loss function
1 Introduction
Dermoscopy is a signicant imaging technique for screening skin lesions. Like other
advancements in applied science (Dehghani et al. 2020; Ghiasi et al. 2019, 2023a, b), auto-
mated analyses of dermoscopy images have taken great attention recently (Raza et al. 2022;
Dave et al. 2022; Majidian et al. 2022). Because diagnosis mainly relies on visual examina-
tions of skin lesions. Therefore, dermatology is the most appropriate eld to utilize pattern
recognition-based techniques for objective, accurate, and rapid diagnosis. Although auto-
mated techniques developed utilizing deep learning show remarkable results, they neces-
sitate a substantial quantity of images. However, the availability of dermoscopy images is
usually limited because of various reasons such as not allowing the use of images of patients,
not having enough patients for some diseases, inability to capture images that meet desired
properties, lack of medical equipment and devices, and inability to access the images of
patients from dierent races or living in dierent locations. Furthermore, even if enough
images are available, their labeling necessitates expert knowledge and is time-consuming.
In the literature, various augmentation approaches have been implemented for training deep network architectures using extended dermoscopy image datasets (Raza et al. 2022; Galdran et al. 2017; Perez et al. 2018). However, they have limitations or disadvantages. For instance, augmentations with geometric transformations (e.g., flipping, translation, scaling, shearing, etc.) produce images with the same texture as the original image data. They essentially duplicate the original images and do not change intensity values. Also, they might not be label-preserving transformations. Therefore, although they increase the number of images, they do not add novel visual features that could increase the generalization ability of the networks. To increase both the number and the variety of images, augmentations with random erasing, cropping, noise addition, blurring, and color changing have been applied. However, they may cause a loss of critical lesion information. Also, color conversions can discard significant color information and therefore may not preserve labels. It might be impossible to see or detect the lesions in the image if the color or intensity values are decreased to simulate a darker background or environment (Galdran et al. 2017; Perez et al. 2018).
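The limitation of purely geometric augmentation can be illustrated with a small sketch. The NumPy example below is only an illustration under stated assumptions: it uses a synthetic random array as a stand-in for a dermoscopy image, and the helper names `geometric_variants` and `same_intensity_values` are hypothetical. It shows that flips and 90-degree rotations only rearrange pixels, so the set of intensity values is identical before and after the transformation.

```python
import numpy as np

def geometric_variants(image):
    """Return flipped and rotated copies of an H x W x C image array."""
    return {
        "horizontal_flip": np.flip(image, axis=1),
        "vertical_flip": np.flip(image, axis=0),
        "rotate_90": np.rot90(image, k=1, axes=(0, 1)),
    }

def same_intensity_values(a, b):
    """True if both arrays hold exactly the same multiset of intensity values."""
    return np.array_equal(np.sort(a, axis=None), np.sort(b, axis=None))

# Synthetic stand-in for a dermoscopy image (assumption: no real data is used).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)

variants = geometric_variants(image)
for name, augmented in variants.items():
    # Pixel positions change, but the intensity values themselves do not.
    print(name, augmented.shape, same_intensity_values(image, augmented))
```

Transformations that resample the image, such as scaling or shearing with interpolation, can alter individual values slightly, but they still do not introduce new lesion appearance.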
Augmentations with Generative Adversarial Networks (GANs) can generate new realistic images. The goal of GANs is to learn the real data distribution from a limited amount of data and then produce new images by using the learned distribution. To this end, they use the capability of networks to learn a function that brings the distribution of the generated images as close as possible to the distribution of the original images. GANs do not rely on assumptions about data distributions and are able to produce new images with high visual fidelity. They are able to learn to generate images in many variations (e.g., backgrounds, scale, light conditions, viewpoints, etc.) and for any desired number of classes. Although there have been considerable improvements in generating realistic images, GAN structures have three main drawbacks: mode collapse, non-convergence, and vanishing gradient problems. Also, researchers developing applications are interested in controlling the content of the generated images (Yan et al. 2015; Tan et al. 2020). The standard GAN structure (Goodfellow et al. 2014) has no control over the modes of the produced images. Among all GAN variants, conditional GAN (cGAN) (Mirza and Osindero 2014) shows promise (Sect. 2.1). The cGAN learns to generate images by using a conditional distribution rather than a marginal distribution. It has been developed to have control over the types of produced images by adjusting the model based on prior information. However, the cGAN structure is vulnerable to mode collapse problems.
Loss functions have a critical role in the performance of GANs. The loss functions in
GANs are known as adversarial loss functions, which make estimations of the distances
between the generated and original images’ distributions. Two loss functions working
together are used to construct an adversarial loss function. One of them is for training the
discriminator part, and the other is for training the generator part (Tzeng et al. 2017).
The mode collapse problem in the cGAN can be alleviated by integrating hybrid loss functions. Also, earlier research has demonstrated that combining a traditional loss function with the adversarial loss function is beneficial for generating high-quality images with GANs (Pathak et al. 2016; Isola et al. 2017). Therefore, in this study, a hybrid loss function has been designed and implemented to train a modified cGAN model and to improve its generation power. Also, the effectiveness of the proposed augmentation approach in classifying dermoscopy images has been evaluated quantitatively.
Although GANs have been increasingly applied in medical applications (e.g., classi-
cation, segmentation, synthesis, and reconstruction (Sect. 2)), there are only a few works
based on GANs for the augmentation of dermoscopy images. Also, their performances
have been compared by using dierent measurements, such as recall, sensitivity, t-test, etc.
(Sect. 2.3). Moreover, they have been implemented with dierent datasets (which include
images showing dierent types of skin diseases) and classication architectures. Because of
these dierences, comparisons of their eectiveness in the classications according to the
results presented in the articles will not be meaningful, and it is unclear which one of them
is the most appropriate technique. Therefore, in this work, they have been applied with the
same datasets and classiers. Also, comparative evaluations have been performed using the
same metrics to determine the best GAN structure and to compare their eects on the clas-
sication of dermoscopy images.
The main contributions of this work are:
(i) Introducing a generative model with a new hybrid loss function.
(ii) Demonstrating the ability of the proposed model in the augmentation of dermoscopy images.
(iii) Presenting the effect of the proposed model on the classification of seven forms of skin lesions.
(iv) Comparing GAN-based augmentation methods, which have been applied to dermoscopy images in the literature, using the same datasets, and comparing their effectiveness in classifying skin lesions.
We believe that deep learning-based analyses and classifications using the images generated by the proposed augmentation will assist dermatologists in diagnosing different forms of skin lesions. This paper is organized as follows: A brief overview of the GAN structure, GAN-based applications, and augmentation methods proposed for dermoscopy images in the literature is presented in the “Related Work” section. The proposed method is explained in the “The Proposed Method” section. The datasets and evaluation metrics used in this work, and the results, are given in the “Experimental Results” section. Discussions and conclusions are presented in the “Discussion” and “Conclusion” sections, respectively.
2 Related work
2.1 Background: GAN structure
GANs are generative models that have the ability to learn real data distributions from available images and produce new images using the learned distributions. GANs consist of two deep neural network architectures, namely the discriminator D and the generator G. The word ‘adversarial’ in GANs means that the D and G network structures are in competition with each other. The G part aims to generate sample images, while the D part aims to separate the generated samples from the original images. Therefore, GANs try to find the optimal mapping function, written with the loss function $L_{GAN}$, by using:

$$G^{*} = \arg\min_{G}\max_{D}\, L_{GAN}(G, D) \quad (1)$$
The G network is trained to create sample data, which cannot be separated from original
images by the D network trained to separate them as well as possible. Therefore, GANs’
training is performed by alternating between G and D training to provide their improvement
gradually. This is because too much improvement of one of them leads to the failure of the
other. The training is conducted using back-propagation. Figure 1 shows a GAN structure
with a sample distribution (e.g., Gaussian) used by the G.
Fig. 1 A typical GAN architecture

To form an adversarial loss function, GANs have two loss functions working together. One of them (G loss) is used to train the generator part, while the other (D loss) is used to train the discriminator part (Tzeng et al. 2017). A review work has explained that a network trained with a mean square loss function may blur images by averaging all the features in an image; in contrast, an adversarial loss function can preserve the features with the help of the discriminator, which detects nonexistent features and non-realistic images (Lotter et al. 2016). In other words, features can be preserved very well using an additional adversarial loss (Lotter et al. 2016). However, although there have been considerable improvements in producing realistic images, GAN structures have three main drawbacks: non-convergence, mode collapse, and vanishing gradient problems (Chen et al. 2019; Lin et al. 2018; Salimans et al. 2016). Also, researchers developing applications are interested in controlling the content of the produced images (Tan et al. 2020; Yan et al. 2015). The standard GAN structure (Goodfellow et al. 2014) has no control over the modes of the produced images. To address these challenges and increase the performance of GANs, research in the literature has been conducted either by modifying loss functions or enhancing GAN architectures (Oord et al. 2016). Among all GAN variants, condition-based GAN architectures can significantly increase the quality of produced images (Oord et al. 2016). A condition-based GAN can learn to map from an original image (x) and a noise vector (z) to an output image (y) (G: {x, z} → y); on the other hand, a non-condition-based GAN can learn to map from a noise vector to an output (Goodfellow et al. 2014). Condition-based GANs control the mode of the produced image by conditioning the architecture with a conditional variable. In a condition-based GAN, the adversarial loss function ($L_{cGAN}$) is obtained using the expected value E as:
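$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))] \quad (2)$$

The objective function is expressed by $G^{*} = \arg\min_{G}\max_{D} L_{cGAN}(G, D)$, where the generator part aims to minimize the value of the $L_{cGAN}$ function, whereas the discriminator part aims to maximize it. The discriminator part uses the input image in the $L_{cGAN}$. However, it does not use the input image in the loss function ($L_{GAN}$), which can be given by:

$$L_{GAN}(G, D) = \mathbb{E}_{y}[\log D(y)] + \mathbb{E}_{x,z}[\log(1 - D(G(x, z)))] \quad (3)$$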
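As a rough numerical illustration of the adversarial losses described above, the following NumPy sketch estimates the conditional adversarial loss from discriminator outputs. It assumes the standard form E[log D(x, y)] + E[log(1 - D(x, G(x, z)))]; the function name `cgan_loss` and the example probabilities are hypothetical and are not taken from the experiments of this work. The unconditional loss has the same shape, with the discriminator simply not receiving x.

```python
import numpy as np

def cgan_loss(d_real, d_fake, eps=1e-12):
    """Sample estimate of E[log D(x, y)] + E[log(1 - D(x, G(x, z)))].

    d_real: discriminator outputs D(x, y) for real pairs, in (0, 1)
    d_fake: discriminator outputs D(x, G(x, z)) for generated pairs, in (0, 1)
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Hypothetical discriminator outputs, chosen only for illustration.
confident_d = cgan_loss(d_real=[0.9, 0.95], d_fake=[0.05, 0.1])
fooled_d = cgan_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

# The discriminator tries to push the value up (toward 0), while the
# generator tries to push it down by making fakes look real.
print(confident_d, fooled_d)
```

In this sketch, a discriminator that separates real from generated pairs well yields a value close to zero, whereas a fooled discriminator that outputs 0.5 everywhere yields 2 log(0.5).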
The condition-based GAN architecture, known as cGAN and used in (Mirza and Osindero 2014), exploits the side information of annotated labels of sample images by training a conditional generator to generate conditional images from class labels (Odena et al. 2017; Miyato and Koyama 2018; Brock et al. 2019). Image generation based on a condition enables flexibility in augmentations and high resolution. However, the cGAN is vulnerable to the mode collapse issue. Therefore, to address this problem, various cGAN-based models have been employed in different works, such as image-to-image translation operations (e.g., semantic labeling of photographs, photograph sketching, mapping of aerial photographs, and background removal) (Isola et al. 2017) and image inpainting (Liu et al. 2021). In particular, the Pixel-to-Pixel (Pix2Pix) model (Isola et al. 2017) has achieved satisfactory results for many image types (e.g., MR and Computed Tomography (CT)) and has been commonly applied, especially for the synthesis and reconstruction of medical images (Table 1).
2.2 GAN-Based augmentations and applications with medical images
Most of the GAN-based augmentations of medical images presented in the literature have been performed with Magnetic Resonance (MR) images (Makhlouf et al. 2023; Kazeminia et al. 2020; Sampath et al. 2021; Creswell et al. 2018). One reason for this is that MR imaging requires an excessive amount of scanning time to acquire multiple sequences, whereas GANs can generate specific sequences from those already acquired. Another possible reason is the large number of publicly available MR images that can be used by researchers for the training of deep networks. Therefore, GAN-based applications in medicine have generally been with MR images.
In the literature, medical image analyses and applications using GANs have mostly been performed in the fields of segmentation, reconstruction, and classification (Sampath et al. 2021; Jeong et al. 2022; Xun et al. 2022; AlAmir and AlGhamdi 2022). This is because regulations about the generator’s outputs and adversarial training can be efficient for image-to-image translations. Even though GANs have been increasingly applied in medical applications, there are only a few works based on GANs for the augmentation of dermoscopy images (Sect. 2.3).
2.3 GAN-Based augmentation methods for Dermoscopy images
In the literature, only a few GAN models have been proposed for the augmentation of dermoscopy images. One of them is StyleGAN, which has been applied with mixing regularization that uses two random latent codes to augment dermoscopy images (Gong et al. 2020; Bissoto et al. 2021). It switches between latent codes to check the styles. Experiments indicated that the model is efficient for augmentation and produces classification results with 97.5% accuracy (Gong et al. 2020). However, StyleGAN has been designed mainly for natural images (which have obviously changeable styles and continuous information) and may result in poor quality when generating skin lesion images, which have different styles, colors, and patterns. Therefore, style mixing can cause the overlapping of some features of skin lesions and even the creation of irrelevant styles.
In (Qin et al. 2020), Skin-Lesion StyleGAN (SL-StyleGAN) has been applied to augment dermoscopy images. The authors used varying numbers of fully connected layers (e.g., 2 and 6 fully connected layers) to solve the mode collapse problem. They evaluated the quality of the generated images with recall scores. Their results indicate that the SL-StyleGAN constructed with a generator with four fully connected layers can achieve a higher recall value (0.26) compared to generators with different numbers of layers. However, the generated images are not completely diverse according to the results presented by the authors. Further work is needed to solve this issue.
In (Abdelhalim et al. 2021), an extension of Progressive Growing of GAN (PGGAN), namely Self-attention Progressive Growing of GAN (SPGGAN), has been applied. It provides higher sensitivity than its counterparts in the recognition of skin lesions according to the presented results. Although SPGGAN has higher performance than PGGAN in image generation, the generated images lack fine details and are far from real images. Therefore, in (Abdelhalim et al. 2021), a combination of SPGGAN and the Two Time-scale Update Rule (TTUR), called SPGGAN-TTUR, has been used. The performance of the structure has been evaluated with the P-value of the T-test (PVT). The authors have concluded that the model outperforms the SPGGAN, since a PVT of 68.1 ± 0.8% for training datasets and 60.8 ± 1.5% for testing datasets has been achieved. On the other hand, the SPGGAN-TTUR model still leads to several artifacts and needs to be improved (Abdelhalim et al. 2021).

Table 1 Medical image synthesis and reconstruction applications with the Pix2pix model
Reference | Image Type | Application | Explanation
(Mardani et al. 2019; Ran et al. 2019; Armanious et al. 2019a; Armanious et al. 2019b; Seitzer et al. 2018; Kim et al. 2017; Quan et al. 2018; Yang et al. 2018; Zhang et al. 2018; Shitrit and Raviv 2017) | MR | Reconstruction | The Pix2pix model has been applied for the reconstruction of under-sampled k-space, super-resolution, motion correction, and inpainting
(Liu et al. 2020; Shan et al. 2018; You et al. 2018, 2020; Yi and Babyn 2018; Liao et al. 2018; Wolterink et al. 2017) | CT | Reconstruction | The Pix2pix model has been applied for denoising and sparse view CT reconstruction
(Mahapatra 2017) | Retinal fundus image | Reconstruction | The Pix2pix model has been applied to provide super-resolution
(Jin et al. 2018) | CT | Synthesis | Conditional image synthesis has been applied with the Pix2pix model using conditional information such as a segmentation map
(Shin et al. 2018) | MR | |
(Mahapatra et al. 2018; Oh and Yun | | |
(Maspero et al. 2018) | MR → CT | Cross modality | Paired training of images has been performed with the Pix2pix model to analyze prostate cancer
(Choi and Lee 2018) | PET → MR | | Paired templates have been used in the training stage of the Pix2pix model for brain imaging
In (Mutepfe et al. 2021), a Deep Convolutional GAN (DCGAN) has been applied by using four convolutional layers, and it was observed that the classification of the images can be performed with 93.5% accuracy. Similarly, in (Bisla et al. 2019), the authors applied two DCGANs for two classes (seborrheic keratosis and melanoma) and obtained a classification accuracy of 91.5%. However, images generated by DCGAN suffer from artifacts. The authors have concluded that it is challenging to create high-quality samples and to compare different classification techniques, since non-public datasets have been used in some methods.
In a dierent study, a combination of DCGAN and Laplacian GAN (LAPGAN) has been
applied (Pollastri et al. 2020). Its performance has been evaluated in a segmentation task
rather than a classication. The authors used several metrics (i.e., Jaccard index, entropy,
accuracy). They concluded that DCGAN causes heavy checkerboard eects in the gener-
ated images and leads to low accuracy. The state-of-the-art GANs employed with dermos-
copy image data sets and important information about them are presented in Table 2.
Table 2 The GANs applied with dermoscopy images
Reference | GAN | Result | Drawback
(Gong et al. 2020; Bissoto et al. 2021) | StyleGAN | It improves classification performance and can provide results with 97.5% accuracy, according to the findings in (Gong et al. 2020). | It can cause the overlapping of some features and even the creation of irrelevant styles, resulting in poor quality.
(Qin et al. 2020) | SL-StyleGAN | The generator constructed using four fully connected layers can achieve a recall score of 0.26. | The generated images are not completely diverse, and the method needs improvement.
(Abdelhalim et al. 2021) | SPGGAN | It can provide a 2.5% higher sensitivity in recognizing skin lesions when compared to its counterparts. | Although SPGGAN has better performance than PGGAN, the images generated by SPGGAN lack fine details.
(Abdelhalim et al. 2021) | SPGGAN-TTUR | The PVT of 60.8 ± 1.5% for testing sets and 68.1 ± 0.8% for training sets has been achieved. | The authors have concluded that the model still produces several artifacts and needs to be improved.
(Mutepfe et al. 2021; Bisla et al. 2019) | DCGAN | Classification of melanoma and seborrheic keratosis can be performed with 91.5% accuracy (Bisla et al. 2019); classification of other skin lesions can be done with 93.5% accuracy (Mutepfe et al. 2021). | Generating high-quality images similar to real ones and comparing different classification techniques are difficult since some techniques use datasets that are not public.

Although GANs have been employed in various applications (e.g., image synthesis, registration, detection (Sect. 2.2)), only a few GAN models have been proposed for the augmentation of dermoscopy images. They have been applied with different datasets and classification architectures (mostly ResNet-50 and InceptionV3), and their performances have been assessed with different measurements, such as recall, sensitivity, t-test, etc. Because of these differences, comparing their effectiveness in classification according to the results presented in the articles is not meaningful, and it is unclear which one of them is the most appropriate approach. Also, each of them has a drawback, and a new, more efficient augmentation algorithm is needed to produce high-quality skin lesion images. Therefore, in this work, they have been implemented with the same datasets and the same classifier, and comparative evaluations have been performed using the same metrics to determine the best GAN structure. Also, their effects on the classification of dermoscopy images have been compared. Additionally, a new GAN model with a hybrid loss function has been applied, and its effect on the classification of the images has been evaluated. To the best of our knowledge, there is no study with these features in the literature.
3 The proposed method
The proposed augmentation technique is based on a cGAN structure because of its ability to generate images with high quality (Oord et al. 2016). In this work, an efficient extension of it, namely the Pix2pix architecture, has been used as a backbone since it addresses the instability problem of other GANs and provides the advantage of stable training using paired input images. Also, it has yielded satisfactory results for most image types (Isola et al. 2017) and is commonly applied, particularly for the reconstruction and synthesis of medical images (Table 1).
A new hybrid loss function has been designed and used in the architecture, since loss functions greatly influence the performance of the network. They should be chosen carefully to avoid producing blurry images, which occurs with common loss functions like mean-square error, and to obtain realistic images. Also, to increase the performance in the augmentation of dermoscopy images, the activation function in the generator network has been changed. In addition, the optimization function in the discriminator of the architecture has been changed. The architecture of the backbone, the proposed loss function, and the activation and optimization functions used in the architecture are explained in the following subsections.
3.1 Backbone
Both the discriminator and generator in the backbone, the Pix2pix structure, are relatively different compared to the previous architectures extended from the cGAN. The generator is an encoder-decoder type network, namely U-Net (Ronneberger et al. 2015). It reduces and then increases the size of samples, and its performance is enhanced by the skip connections between the encoder and decoder layers. The discriminator part is the PatchGAN (Isola et al. 2017).
Figure 2 shows a graphical representation of the augmentation approach. The generator produces fake images resembling real images by utilizing labeled images. Then, the outputs of the generator part and the original image are used as inputs for the discriminator part, which distinguishes fake images from real images and produces a patch as an output. Next, the patch is used for a detailed comparison between the generated image and the original image. Based on the original image, it is determined whether the image being generated is fake or real. After that, fine-tuning is applied to both the discriminator and generator based on the determination result to generate a fake image resembling a real one. By feeding labeled images into the trained network, an image reflecting the shape and position of each object can be generated. In this manner, the model creates images through a competition, an iterative game, between the discriminator and generator, each with opposing objectives.
3.2 Loss function
In the Pix2pix model, the adversarial loss function plays a critical role. The generator’s loss function incorporates L1 loss along with the cGAN loss. However, our experiments have revealed that, due to the constant gradient of the L1 function, it is difficult for the training to converge to high accuracy, and the color saturation (color information) is not as similar to the original image as expected (Sect. 4). In this study, to generate more realistic dermoscopy images with high quality, a new loss function has been designed and applied.
In previous works, combining a traditional loss function with the adversarial loss function has been found to be beneficial in achieving significant improvements in GANs (Isola et al. 2017; Pathak et al. 2016). Motivated by this, in this work, a hybrid loss function has been used in the generator section of the Pix2pix architecture to capture both the high- and low-frequency details. The proposed loss function has been constructed by using three traditional loss functions: content loss, Structural Similarity Index Measurement (SSIM), and L1 loss functions.
Fig. 2 A graphical representation of the architecture

The content loss has been applied to make the generator more competitive. It offers stability in training, which is required for convergence, and thus provides powerful results. It is calculated by Euclidean distances between the content representations of the produced images and the original images. The content representations are obtained from the feature maps provided by the pre-trained VGG19Net (Simonyan and Zisserman 2015). Therefore, it is a feature-wise function, which minimizes the difference between the feature (content) representations. The CNN-based content loss function is expressed by:

$$L_{content}(x, y, l) = \frac{1}{2}\left\lVert X^{l} - Y^{l} \right\rVert_{2}^{2} \quad (4)$$

where the terms $Y^{l}$ and $X^{l}$ correspond to the feature representation of y (the produced image) and x (the original image) in the lth layer, respectively. To further improve detailed information and image quality by preserving contrast and brightness in the regions with high frequency, the SSIM loss function has been added. It is an important metric that measures the similarities between images with the formula:

where the terms $p_j$ and $p_i$ refer to the pixel values in the y and x axis, respectively. The term $SSIM(p_i, p_j)$ is expressed by:

$$SSIM(p_i, p_j) = \frac{(2\mu_{x}\mu_{y} + c_1)(2\sigma_{xy} + c_2)}{(\mu_{x}^{2} + \mu_{y}^{2} + c_1)(\sigma_{x}^{2} + \sigma_{y}^{2} + c_2)} \quad (6)$$

In Eq. (6), the terms $\mu_{y}$ and $\mu_{x}$ refer to the means of the intensities in the y and x directions, respectively. The term $\sigma_{xy}$ refers to the covariance, while the terms $\sigma_{y}^{2}$ and $\sigma_{x}^{2}$ refer to the variance of the intensities in the y and x directions, respectively. The terms $c_1$ and $c_2$ are constants, which have been set as $1 \times 10^{-4}$ and $9 \times 10^{-4}$, respectively (as in (Wang et al. 2004)), to identify the dynamic range of the image.
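The two similarity terms described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation used in this work: the content loss is taken as half the squared Euclidean distance between two feature maps that are passed in directly (the VGG19 feature extraction is omitted), and SSIM is computed once over the whole image with the standard formula of Wang et al. (2004) and constants 1e-4 and 9e-4, rather than over local windows. The function names and the synthetic inputs are hypothetical.

```python
import numpy as np

def content_loss(feat_real, feat_fake):
    """Half the squared Euclidean distance between two feature maps."""
    diff = np.asarray(feat_real, dtype=float) - np.asarray(feat_fake, dtype=float)
    return 0.5 * np.sum(diff ** 2)

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM computed once over whole images scaled to [0, 1] (no local windows)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Synthetic stand-ins for an original image, a degraded copy, and feature maps.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
noisy = np.clip(image + rng.normal(0.0, 0.1, image.shape), 0.0, 1.0)
features = rng.random((4, 8, 8))

print(global_ssim(image, image), global_ssim(image, noisy))
print(content_loss(features, features))
```

An image compared with itself gives an SSIM of essentially 1 and a content loss of 0, while the noisy copy gives a lower SSIM, which is the behavior a similarity-based loss term relies on.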
Also, the L1 loss, whose minimization is associated with the shape of lesions, has been integrated to ensure similarities between the original and generated images. It is calculated using L1 distances between the generated and original images. In this work, a smooth form of the L1 loss function has been implemented. Unlike the constant gradient of the L1 loss, the gradient of this form varies with the difference value. It is not sensitive to boundaries and provides more smoothness than the L1 loss. This function is defined by:

$$L_{SL1}(G) = \mathbb{E}_{x,y,z}\begin{cases} 0.4\,(y - G(x, z))^{2}, & \text{if } |y - G(x, z)| < 1 \\ |y - G(x, z)| - 0.4, & \text{otherwise} \end{cases} \quad (7)$$

where the terms x and y denote the original and produced images, respectively, while the term z refers to the random noise. The constant 0.4 has been found empirically. The final objective function used for the generator part is expressed by:

$$G^{*} = \arg\min_{G}\max_{D}\, L_{cGAN}(G, D) + \beta_1 L_{content} + \beta_2 L_{SSIM} + \beta_3 L_{SL1} \quad (8)$$

where the coefficients β1, β2, and β3 have been set empirically to 10, 5, and 100, respectively.
The proposed hybrid loss function utilizes the advantages of dierent functions. The con-
tent loss function adds the feature-wise loss by comparing two images based on high-level
feature representations and provides stability. The SSIM loss function controls and pre-
serves brightness and contrast in regions with high-frequency. The smooth L1 loss maintains
low-frequency information to generate globally consistent images. Therefore, the weighted
GAN based augmentation using a hybrid loss function for dermoscopy…
combination of the three loss functions will ensure that the low- and high-frequency details
are better captured, and the generator produces high-quality images.
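The weighted combination described above can be illustrated with a minimal NumPy sketch. It is not the author's implementation, and several choices are assumptions made only for illustration: the content loss is taken as a mean squared difference between feature maps, SSIM is computed once over the whole image rather than over local windows, the SSIM loss is taken as 1 − SSIM, the smooth L1 loss uses the common constant 0.5 (the text reports an empirically chosen 0.4 whose exact placement is not recoverable here), the pairing of β1, β2, β3 with the content, SSIM, and smooth L1 terms follows the order in which the losses are introduced, and the adversarial term is omitted. The `pool` helper is a hypothetical stand-in for a deep feature extractor.

```python
import numpy as np

C1, C2 = 1e-4, 9e-4          # SSIM constants quoted in the text (images scaled to [0, 1])
BETAS = (10.0, 5.0, 100.0)   # beta1..beta3 from the text; pairing with the losses is assumed


def content_loss(feat_x, feat_y):
    """Mean squared difference between two feature maps (assumed form)."""
    return float(np.mean((feat_x - feat_y) ** 2))


def ssim_global(x, y):
    """SSIM evaluated once over the whole image (simplification of windowed SSIM)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return float(num / den)


def smooth_l1(x, y, c=0.5):
    """Quadratic for small residuals, linear for large ones (standard form, c assumed)."""
    d = np.abs(x - y)
    return float(np.mean(np.where(d < 1.0, c * d ** 2, d - c)))


def hybrid_loss(x, y, feat_x, feat_y, betas=BETAS):
    """Weighted sum of content, SSIM, and smooth L1 terms (adversarial term omitted)."""
    b1, b2, b3 = betas
    return (b1 * content_loss(feat_x, feat_y)
            + b2 * (1.0 - ssim_global(x, y))
            + b3 * smooth_l1(x, y))


def pool(im):
    """Hypothetical stand-in for a feature extractor: 2x2 average pooling."""
    h, w = im.shape
    return im.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


rng = np.random.default_rng(0)
real = rng.random((64, 64))
fake = np.clip(real + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)

loss_same = hybrid_loss(real, real, pool(real), pool(real))
loss_diff = hybrid_loss(real, fake, pool(real), pool(fake))
print(loss_same, loss_diff)
```

With identical inputs every term vanishes, and any perturbation of the generated image raises all three terms. In a real training loop the features would come from a pre-trained network and the loss would be written in a framework with automatic differentiation.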
3.3 Activation function
The activation function in the original Pix2pix’s generator network is a Rectified Linear Unit
(ReLU). However, ReLUs can be fragile during training (Cui and Fearn 2018): the output of
ReLU is zero when the input is negative, and so is its first derivative, which prevents
neurons from updating their parameters. In other words, neurons cannot learn when the
input lies in the negative half interval. As a remedy to this problem, Leaky ReLU (LReLU),
which applies a small slope when the input is negative, has been proposed (Common
activation functions 2022; Maas et al. 2013). Because the LReLU’s output keeps a small slope
for negative inputs, its derivative is never exactly zero. It therefore overcomes the problem
and has been used in this work instead of ReLU. In the original Pix2pix’s discriminator
network, the activation function is already the LReLU, so it has not been changed. In the
output layer, to perform binary classification between fake and real images, the activation
function is sigmoid, as in Pix2pix.
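The difference between the two activations can be seen directly in their derivatives. The short NumPy sketch below is only an illustration; the negative slope of 0.2 is an assumed value, since the text does not state the slope used in this work.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def relu_grad(x):
    # Derivative is 0 for negative inputs, so those units receive no update
    return (x > 0).astype(float)


def leaky_relu(x, slope=0.2):
    # slope=0.2 is an assumed value for illustration
    return np.where(x > 0, x, slope * x)


def leaky_relu_grad(x, slope=0.2):
    # Derivative stays at `slope` for negative inputs, so learning can continue
    return np.where(x > 0, 1.0, slope)


x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu_grad(x))        # zero gradient on the negative side
print(leaky_relu_grad(x))  # small but non-zero gradient on the negative side
```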
3.4 Optimization function
The optimization function used in the original Pix2pix’s discriminator network is Adap-
tive moment estimation (Adam). However, researchers have found that stochastic gradient
descent generalizes better than the Adam optimizer (Wilson et al. 2017). Therefore, optimi-
zation in the discriminator part has been provided by stochastic gradient descent. The opti-
mization function in the original Pix2pix’s generator network is already stochastic gradient
descent. Therefore, it has not been changed in this work.
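For readers unfamiliar with the two optimizers, the sketch below contrasts a single update step of plain stochastic gradient descent with a single Adam step. It is a generic illustration, not the training code of this work; the learning rates and the Adam hyperparameters (`beta1`, `beta2`, `eps`) are assumed values.

```python
import numpy as np


def sgd_step(theta, grad, lr=0.01):
    """Plain gradient descent: step proportional to the raw gradient."""
    return theta - lr * grad


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta = np.array([1.0, -3.0])
grad = 2.0 * theta  # gradient of the toy objective ||theta||^2

theta_sgd = sgd_step(theta, grad)
theta_adam, m, v = adam_step(theta, grad, np.zeros(2), np.zeros(2), t=1)
print(theta_sgd, theta_adam)
```

The SGD step scales with the gradient magnitude, while the first Adam step has nearly the same size for every parameter because the gradient is normalized by its own running magnitude.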
4 Experimental results
4.1 Datasets and evaluation metrics
In this work, public dermoscopy images provided from the HAM10000 database (Tschandl
et al. 2018) have been used. The database includes these seven types of skin diseases: (1)
vascular lesions (VASC), which appear as purple or red color circles; (2) actinic kerato-
ses (AKIEC), which are generally non-pigmented; (3) benign keratosis-like lesions (BKL),
resembling melanoma; (4) melanoma (MEL), malignant neoplasms; (5) dermatofibroma
(DF), identified by reticulated lines at the edges with a white center; (6) melanocytic nevi
(NV), benign neoplasm; (7) basal cell carcinoma (BCC), characterized by pigmented, nodular,
flat, and cystic lesions. Example images for each type are presented in Fig. 3.
A total of 770 original images from seven classes (110 images from each class) have
been taken from the database. After augmentations, a total of 2065 images (containing the
original and augmented images) have been obtained. 80% (1652 images) have been used for
training, while 20% (413 images) have been utilized for testing. A 5-fold cross-validation
approach has been utilized. The quantitative results in Sect. 4.2 represent the average results
of the 5-fold cross-validation.
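The split sizes quoted above follow directly from dividing 2065 images into five equal folds. The sketch below reproduces that arithmetic with NumPy. The random shuffling and the seed are assumptions, since the text does not describe how images were assigned to folds.

```python
import numpy as np


def five_fold_indices(n_images=2065, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs; shuffling and seed are assumed."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    folds = np.array_split(order, k)
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train_idx, test_idx))
    return splits


splits = five_fold_indices()
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each fold holds 413 images for testing and leaves 1652 for training, which matches the 20% and 80% figures in the text.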
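To evaluate the effectiveness of the GAN-based augmentation methods in classifying
images, numerical comparisons and analyses have been conducted using these metrics:
$$\text{Accuracy} = \frac{T_{Positive} + T_{Negative}}{T_{Positive} + T_{Negative} + F_{Positive} + F_{Negative}} \quad (9)$$
$$\text{Sensitivity} = \frac{T_{Positive}}{T_{Positive} + F_{Negative}} \quad (10)$$
$$\text{Specificity} = \frac{T_{Negative}}{T_{Negative} + F_{Positive}} \quad (11)$$
$$\text{F1-Score} = \frac{2 \times T_{Positive}}{2 \times T_{Positive} + F_{Positive} + F_{Negative}} \quad (12)$$
$$\text{MCC} = \frac{T_{Positive} \times T_{Negative} - F_{Positive} \times F_{Negative}}{\sqrt{(T_{Positive} + F_{Positive})(T_{Positive} + F_{Negative})(T_{Negative} + F_{Positive})(T_{Negative} + F_{Negative})}} \quad (13)$$
where the term MCC refers to the Matthews correlation coefficient, and the terms F and T stand
for false and true, respectively.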
The networks used in this study have been trained with a mini-batch size of 128 for 600
epochs. All the methods have been implemented in Python and run on a machine with 64 GB RAM,
an i7-6900K CPU, and a single 12 GB GeForce GTX 1080 Ti.
4.2 Detailed results
In this work, both qualitative and quantitative evaluations have been performed. For quali-
tative evaluations, the generated images obtained from the proposed GAN structure have
been visually compared with the images obtained from the original Pix2pix structure and
other state-of-the-art GANs used for augmenting dermoscopy images (Sect. 2.3). Example
images generated by all GANs applied in this work are presented in Fig. 4.
For quantitative evaluations, the impact of the proposed augmentation model on image
classification has been investigated. For this purpose, the Inception-v4 structure (Szegedy
et al. 2017) was used. The reason for choosing this network is its top ranking in classifying
melanoma cases at the ISIC 2017 Challenge (Menegola et al. 2017). Also, the
Inception-ResNet-V2 network (Szegedy et al. 2017), which is an updated form of the Inception-v4,
was used. Rather than being initialized with random weight values, these classifiers were
initialized with weights pre-trained on the ImageNet dataset. Note that the
aim of this study is to assess the efficiency of GAN-based augmentation methods applied to
dermoscopy images, rather than to evaluate classifier models. Also, the classification results
have been compared with results obtained from augmented images provided by other GANs
applied to generate dermoscopy images in the literature (Sect. 2.3). To make meaningful
comparisons, the same datasets and classifiers have been used. The results obtained from
the Inception-V4 and Inception-ResNet-V2 classifiers are presented in Tables 3 and 4,
respectively.
Fig. 3 Example images from the HAM10000 database (Tschandl et al. 2018)
Table 3 Classication results (in percentage) from the Inception-V4 architecture
GAN Model Accuracy Sensitivity Specicity F1-Score MCC
Pix2pix (Isola et al. 2017)74.76 77.90 96.02 73.06 71.11
DCGAN (Mutepfe et al. 2021)78.57 78.45 96.52 77.72 74.73
StyleGAN (Bissoto et al. 2021)80.48 80.07 96.88 78.72 76.52
SL-StyleGAN (Qin et al. 2020)84.86 84.82 97.40 83.06 81.22
SPGGAN (Abdelhalim et al. 2021)85.71 86.16 97.74 84.48 83.10
SPGGAN-TTUR (Abdelhalim et al. 2021)87.92 88.53 98.04 86.67 85.57
The Proposed Model 91.62 91.95 98.62 91.03 90.05
Fig. 4 Example images generated by Pix2pix (a), DCGAN (b), StyleGAN (c), SL-StyleGAN (d),
SPGGAN (e), SPGGAN-TTUR (f), and the proposed GAN model (g)
5 Discussion
Nowadays, in the era of deep learning, automated techniques with deep networks dominate
visual examinations of dermoscopy images, which have traditionally been conducted
by dermatologists. This is because these techniques provide accurate, objective, and rapid
results. However, one of the fundamental challenges of automated analyses and classifications
is to achieve satisfactory results using limited labeled datasets. Image augmentation
is an efficient solution to overcome this challenge. Augmentation of medical images using
GAN structures has been explored in the literature to alleviate the scarcity of images in
developing deep learning-based applications. However, there is no GAN model that can
be applied efficiently for the augmentation of all kinds of medical images, since medical
images have different characteristics due to several reasons such as imaging technique and
environmental factors. In addition, although there are many works on augmentation and
processing of MR images, there are only a few studies on the augmentation of dermoscopy
images with GANs, and each of them has some drawbacks (Sect. 2.3). Furthermore, it is
not clear which one is the best, since their evaluations have been performed with different
metrics and different datasets have been used. Moreover, different classifiers have been
applied to evaluate their performances in classifications. For meaningful comparisons and
identification of the most appropriate model, they should be applied with the same datasets,
and their performances should be assessed using the same classifiers and metrics. Therefore,
in this work, the effectiveness of those GAN-based augmentation techniques for dermoscopy
images and their significance in classification have been presented by fair comparisons.
The original Pix2pix model produces blurs and artifacts which result in the loss of details
and unrealistic lesion appearances. Because of the low quality of the generated images, this
model is not suitable for the augmentation of dermoscopy images and leads to low accuracy
when used in their classication. On the other hand, it has the advantage of being able to
perform stable training using input pairs of images and resolves the instability problem in
the existing GANs.
The DCGAN model can learn to produce skin lesion samples with good diversity by
utilizing its end-to-end training. However, the convolution layers in the architecture are
strided and cause artifacts. Also, the images generated by this model lack fine-grained
details and suffer from mode collapse. Therefore, they are not suitable for use
in training sets.
The StyleGAN is not appropriate for generating skin lesion samples in spite of its success
in applications with natural images. This is because it was designed for images with obvious,
changeable styles and continuous information. There are major differences between
such images and dermoscopy images, which are influenced by different styles. The features
of lesions (e.g., styles, colors, and patterns) in dermoscopy images do not change
meaningfully or continuously like facial images do, and they are much less plentiful. Mixing
of styles can cause overlapping of features and even the occurrence of irrelevant styles.
Because of this, style changes applied using other types of images are inappropriate for
dermoscopy images and result in images with poor quality.
Table 4 Classification results (in percentage) from the Inception-ResNet-V2 architecture
GAN Model                               Accuracy  Sensitivity  Specificity  F1-Score  MCC
Pix2pix (Isola et al. 2017)             79.52     79.34        96.68        78.65     75.81
DCGAN (Mutepfe et al. 2021)             80.00     79.57        96.75        79.08     76.25
StyleGAN (Bissoto et al. 2021)          82.14     86.10        97.65        83.45     82.81
SL-StyleGAN (Qin et al. 2020)           86.67     88.01        97.91        85.23     84.39
SPGGAN (Abdelhalim et al. 2021)         87.64     88.53        98.04        86.67     85.57
SPGGAN-TTUR (Abdelhalim et al. 2021)    89.42     90.50        98.12        87.31     86.33
The Proposed Model                      93.12     94.20        98.99        93.71     92.88
In GANs with attention mechanisms, such as SL-StyleGAN, SPGGAN, and SPGGAN-TTUR,
attention is assigned to more important features. The important features in different
medical images are not the same for different applications (e.g., segmentation, classification,
and augmentation). For instance, the features on the edges (boundaries) of the skin lesions
are more important than the textural (pattern) features inside the lesions for the segmentation
of the lesions, while the opposite is true for the classification of the lesions. Both texture
and edge features are important for the augmentation process. The SL-StyleGAN-based
augmentation needs more work to improve its performance because, although it can eliminate
artifacts and noise, the diversity in the images generated by this method
is not enough. The SPGGAN and SPGGAN-TTUR are based on progressive growth and
provide better performance than the style-based GANs. However, the synthetic images
generated using the SPGGAN lack fine details, and the SPGGAN-TTUR needs to be improved
since it still suffers from several artifacts in the generated images.
The authors in (Andrade et al. 2020) employed CycleGAN. However, their aim for image
augmentation is to improve the accuracy of lesion segmentation, wherein edge information
holds greater significance than internal lesion details. Comparisons of the generated images
from an augmentation method, which has been specified to enhance segmentation performance,
with the images generated to improve classification tasks will not be meaningful.
Therefore, CycleGAN has not been applied in this work.
Adversarial loss functions can preserve features with the aid of the discriminator detecting
non-existent features and unrealistic images. However, these loss functions greatly
influence the performance of network structures and the quality of synthetic images. They
should be chosen carefully to avoid producing blurry or unrealistic images.
Despite the improvements made by enhancing GAN architectures, properly synthesizing
details within the image remains a challenge. This is because earlier research has typically
focused on detecting either high-frequency or low-frequency details. Experiments indicate
that the proposed GAN model is promising since it incorporates features characteristic of
a skin lesion. Its high performance has been achieved through: (i) the Pix2pix backbone,
which provides stable training; (ii) the hybrid loss function, incorporating a feature-wise
loss and eciently capturing both low- and high-frequency details; and (iii) appropriate
activation and optimization functions. In the hybrid loss function, the SSIM loss preserves
brightness and contrast in regions with high frequency, the L1 loss maintains low-frequency
information, and the content loss provides the training stability necessary for convergence.
Additionally, activation and optimization functions have crucial impacts on the performance
of deep learning models. The reason is that inappropriate activation functions may cause
gradients to exponentially explode or vanish during back-propagation, resulting in the loss
of input information during forward propagation (Hayou et al. 2019). Also, researchers have
found that stochastic gradient descent generalizes better than the Adam optimizer (Wilson
et al. 2017). Due to these reasons, optimization and activation functions have been selected
carefully in this study. Therefore, the effect of the proposed network is more appropriate
than others, and it provides slightly better performance in the classification of skin lesions
(Tables 3 and 4).
6 Conclusion
The main challenge of deep learning-based analyses and classifications of dermoscopy
images is the necessity of large training datasets, and image augmentation presents an efficient
solution to overcome this problem. In this work, a GAN model has been proposed for
the augmentation of dermoscopy images. Furthermore, its effect on the classification of the
images has been assessed. Additionally, GAN-based augmentation methods applied with
dermoscopy datasets in the literature have been implemented using the same datasets, and
their effectiveness in the classification of skin lesions has been compared.
Qualitative evaluations indicated that the Pix2pix, DCGAN, and StyleGAN architectures
produce images with poor quality (Fig. 4a, b, c). They lead to
various artifacts and are not appropriate for generating realistic dermoscopy images. The
SL-StyleGAN, SPGGAN, SPGGAN-TTUR, and the proposed structure can produce visually
more realistic images (Fig. 4d, e, f, g). However, quantitative results with five metrics
indicated that the images from the first three of these models lack fine-grained detail, and
the proposed model provides slightly better
results in the classification than the others (Tables 3 and 4). The classification accuracy
obtained by the proposed augmentation is at least 3.7 percentage points higher than that of
its counterparts. The results confirm that the proposed approach contains features that characterize a
lesion and is promising. The hybrid loss function used in the proposed GAN model uses a
feature-wise loss and provides efficiency in capturing both low- and high-frequency details.
This yields high accuracy in the classification stage. If images with reduced features are
used, then the accuracy decreases since high-frequency or low-frequency details are lost
in those images. Also, it has been observed that the Inception-ResNet-V2 network
classifies skin lesions better than the Inception-V4 network. Although the proposed
augmentation is efficient for dermoscopy images, it has not been applied to other medical
images acquired with different techniques (such as MR, CT, PET, X-ray, and ultrasound)
that have very different characteristics caused by various reasons (such as imaging technique
and environmental factors). This is a limitation of this work. Therefore, the proposed
augmentation will be applied to other kinds of images, and its performance will be tested
in future works. We believe that deep learning-based classifications using the generated
images from the proposed augmentation will improve the performance of automated diagnosis
in dermatology.
Author contributions This is a single-authored article.
Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License, which permits any non-commercial use, sharing, distribution and
reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material.
You do not have permission under this licence to share adapted material derived from this article or parts
of it. The images or other third party material in this article are included in the article’s Creative Commons
licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s
Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit
Purpose Since Generative Adversarial Network (GAN) was introduced into the field of deep learning in 2014, it has received extensive attention from academia and industry, and a lot of high-quality papers have been published. GAN effectively improves the accuracy of medical image segmentation because of its good generating ability and capability to capture data distribution. This paper introduces the origin, working principle, and extended variant of GAN, and it reviews the latest development of GAN-based medical image segmentation methods. Method To find the papers, we searched on Google Scholar and PubMed with the keywords like “segmentation”, “medical image”, and “GAN (or generative adversarial network)”. Also, additional searches were performed on Semantic Scholar, Springer, arXiv, and the top conferences in computer science with the above keywords related to GAN. Results We reviewed more than 120 GAN-based architectures for medical image segmentation that were published before September 2021. We categorized and summarized these papers according to the segmentation regions, imaging modality, and classification methods. Besides, we discussed the advantages, challenges, and future research directions of GAN in medical image segmentation. Conclusions We discussed in detail the recent papers on medical image segmentation using GAN. The application of GAN and its extended variants has effectively improved the accuracy of medical image segmentation. Obtaining the recognition of clinicians and patients and overcoming the instability, low repeatability, and uninterpretability of GAN will be an important research direction in the future.