GAN based augmentation using a hybrid loss function for dermoscopy images

Evgin Goceri¹
evgin@akdeniz.edu.tr

¹ Department of Biomedical Engineering, Engineering Faculty, Akdeniz University, Antalya, Turkey

Artificial Intelligence Review (2024) 57:234
https://doi.org/10.1007/s10462-024-10897-x

Accepted: 25 July 2024 / Published online: 7 August 2024
© The Author(s) 2024
Abstract

Dermatology is the most appropriate field to utilize pattern recognition-based automated techniques for objective, accurate, and rapid diagnosis because diagnosis mainly relies on visual examinations of skin lesions. Recent approaches utilizing deep learning techniques have shown remarkable results in this field. However, they necessitate a substantial quantity of images, and the availability of dermoscopy images is often limited. Also, even if enough images are available, their labeling requires expert knowledge and is time-consuming. To overcome these issues, an efficient augmentation approach is needed to expand training datasets from input images. Therefore, in this work, a generative adversarial network has been developed using a new hybrid loss function constructed with traditional loss functions to enhance the generation power of the architecture. Also, the effect of the proposed approach and different generative network-based augmentations, which have been used with dermoscopy images in the literature, on the classification of skin lesions has been investigated. Therefore, the main contributions of this work are: (i) introducing a new generative model for the augmentation of dermoscopy images; (ii) presenting the effect of the proposed model on the classification of the images; (iii) comparative evaluations of the effectiveness of different generative network-based augmentations in the classification of seven forms of skin lesions. The classification accuracy when the proposed augmentation is used is 93.12%, which is higher than its counterparts. Experimental results indicate the significance of augmentation techniques in the classification of skin lesions and the efficiency of the proposed structure in improving the classification accuracy.

Keywords Augmentation · Deep learning · Dermoscopy images · Generative adversarial networks · Loss function
1 Introduction

Dermoscopy is a significant imaging technique for screening skin lesions. Like other advancements in applied science (Dehghani et al. 2020; Ghiasi et al. 2019, 2023a, b), automated analyses of dermoscopy images have attracted great attention recently (Raza et al. 2022; Dave et al. 2022; Majidian et al. 2022), because diagnosis mainly relies on visual examinations of skin lesions. Therefore, dermatology is the most appropriate field to utilize pattern recognition-based techniques for objective, accurate, and rapid diagnosis. Although automated techniques developed utilizing deep learning show remarkable results, they necessitate a substantial quantity of images. However, the availability of dermoscopy images is usually limited for various reasons, such as patients not permitting the use of their images, not having enough patients for some diseases, inability to capture images that meet desired properties, lack of medical equipment and devices, and inability to access the images of patients from different races or living in different locations. Furthermore, even if enough images are available, their labeling necessitates expert knowledge and is time-consuming.
In the literature, various augmentation approaches have been implemented for training deep network architectures using extended dermoscopy image datasets (Raza et al. 2022; Galdran et al. 2017; Perez et al. 2018). However, they have limitations or disadvantages. For instance, augmentations with geometric transformations (e.g., flipping, translation, scaling, shearing, etc.) produce images with the same texture as the original image data. They simply duplicate the original images and do not change intensity values. Also, they might not be label-preserving transformations. Therefore, although they increase the number of images, they do not add novel visual features that can increase the generalization ability of the networks. To augment images by increasing both their number and variation, augmentations with random erasing, cropping, noise addition, blurring, and color changing have been applied. However, they may cause a loss of critical lesion information. Also, color conversions can discard significant color information and therefore may not preserve labels. It might be impossible to see or detect the lesions in the image if the color or intensity values are decreased to simulate a darker background or environment (Galdran et al. 2017; Perez et al. 2018).
Augmentations with Generative Adversarial Networks (GANs) can generate new realistic images. GANs' goal is to learn real data distributions from a limited number of data sets and then produce new images by using the learned distributions. For this goal, they use the capability of networks to learn a function that approximates the generated image's distribution to the original image's distribution as closely as possible. GANs do not rely on assumptions about data distributions and are able to produce new images with high visual fidelity. They are able to learn to generate images in many variations (e.g., backgrounds, scale, light conditions, viewpoints, etc.) and for any desired number of classes. Although there have been considerable improvements in generating realistic images, GAN structures have three main drawbacks: mode collapse, non-convergence, and vanishing gradient problems. Also, researchers developing applications are interested in controlling the content of the generated images (Yan et al. 2015; Tan et al. 2020). The standard GAN structure (Goodfellow et al. 2014) has no control over the modes of the produced images. Among all GAN variants, conditional GAN (cGAN) (Mirza and Osindero 2014) shows promise (Sect. 2.1). The cGAN learns to generate images by using a conditional distribution rather than marginal distributions. It has been developed to have control over the types of produced images by adjusting the model based on prior information. However, the cGAN structure is vulnerable to mode collapse problems.
Loss functions have a critical role in the performance of GANs. The loss functions in GANs are known as adversarial loss functions, which estimate the distances between the generated and original images' distributions. Two loss functions working together are used to construct an adversarial loss function: one for training the discriminator part, and the other for training the generator part (Tzeng et al. 2017).

The mode collapse problem in the cGAN can be alleviated by integrating hybrid loss functions. Also, earlier research has demonstrated that combining a traditional loss function with the adversarial loss function is beneficial in generating high-quality images with GANs (Pathak et al. 2016; Isola et al. 2017). Therefore, in this study, a hybrid loss function has been designed and implemented to train a modified cGAN model and to improve its generation power. Also, the effectiveness of the proposed augmentation approach in classifying dermoscopy images has been evaluated quantitatively.
Although GANs have been increasingly applied in medical applications (e.g., classification, segmentation, synthesis, and reconstruction (Sect. 2)), there are only a few works based on GANs for the augmentation of dermoscopy images. Also, their performances have been compared using different measurements, such as recall, sensitivity, t-test, etc. (Sect. 2.3). Moreover, they have been implemented with different datasets (which include images showing different types of skin diseases) and classification architectures. Because of these differences, comparisons of their effectiveness in classification according to the results presented in the articles would not be meaningful, and it is unclear which one of them is the most appropriate technique. Therefore, in this work, they have been applied with the same datasets and classifiers. Also, comparative evaluations have been performed using the same metrics to determine the best GAN structure and to compare their effects on the classification of dermoscopy images.
The main contributions of this work are:

(i) Introducing a generative model with a new hybrid loss function.
(ii) Demonstrating the ability of the proposed model in the augmentation of dermoscopy images.
(iii) Presenting the effect of the proposed model on the classification of seven forms of skin lesions.
(iv) Comparing GAN-based augmentation methods, which have been applied to dermoscopy images in the literature, using the same datasets, and comparing their effectiveness in classifying skin lesions.
We believe that deep learning-based analyses and classifications using the generated images from the proposed augmentation will assist dermatologists in diagnosing different forms of skin lesions. This paper is organized as follows: A brief overview of the GAN structure, GAN-based applications, and augmentation methods proposed for dermoscopy images in the literature is presented in the "Related Work" section. The proposed method is explained in the "The Proposed Method" section. The datasets and evaluation metrics used in this work, and the results, are given in the "Experimental Results" section. Discussions and conclusions are presented in the "Discussion" and "Conclusion" sections, respectively.
2 Related work

2.1 Background: GAN structure

GANs are generative models that have the ability to learn real data distributions from available images and produce new images using the learned distributions. GANs consist of two deep neural network architectures, namely the discriminator D and the generator G. The word 'adversarial' in GANs means that the D and G network structures are in competition with each other. The G part aims to generate sample images, while the D part aims to separate the generated samples from the original images. Therefore, GANs try to find the optimal mapping function, written with the loss function $L_{GAN}$, by using:

$$G^* = \arg\min_G \max_D L_{GAN}(G, D) \quad (1)$$
The G network is trained to create sample data which cannot be separated from original images by the D network, which is trained to separate them as well as possible. Therefore, GANs' training is performed by alternating between G and D training to provide their gradual improvement. This is because too much improvement of one of them leads to the failure of the other. The training is conducted using back-propagation. Figure 1 shows a GAN structure with a sample distribution (e.g., Gaussian) used by the G.

Fig. 1 A typical GAN architecture
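As an illustration of this alternating scheme, the following is a minimal training-step sketch (not the paper's code), assuming PyTorch and pre-built generator G and discriminator D whose outputs are probabilities of shape (batch, 1):

```python
import torch
import torch.nn.functional as F

def gan_train_step(G, D, real, opt_g, opt_d, z_dim=100):
    """One alternating update of D and G for a standard (unconditional) GAN."""
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    opt_d.zero_grad()
    fake = G(torch.randn(batch, z_dim)).detach()  # detach: do not update G here
    d_loss = (F.binary_cross_entropy(D(real), ones) +
              F.binary_cross_entropy(D(fake), zeros))
    d_loss.backward()
    opt_d.step()

    # Generator step: push D(G(z)) toward 1 (non-saturating surrogate of Eq. 1).
    opt_g.zero_grad()
    g_loss = F.binary_cross_entropy(D(G(torch.randn(batch, z_dim))), ones)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Alternating the two updates, rather than fully optimizing either network, is what keeps the competition balanced, as described above.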
To form an adversarial loss function, GANs have two loss functions working together. One of them (G loss) is used to train the generator part, while the other (D loss) is used to train the discriminator part (Tzeng et al. 2017). A review work has explained that a network trained with a mean square loss function may blur images by averaging all the features in an image; in contrast, an adversarial loss function can preserve the features with the help of the discriminator detecting nonexistent features and non-realistic images (Lotter et al. 2016). In other words, features can remain very well preserved using an additional adversarial loss (Lotter et al. 2016). However, although there have been considerable improvements in producing realistic images, GAN structures have three main drawbacks: non-convergence, mode collapse, and vanishing gradient problems (Chen et al. 2019; Lin et al. 2018; Salimans et al. 2016). Also, researchers developing applications are interested in controlling the content of the produced images (Tan et al. 2020; Yan et al. 2015). The standard GAN structure (Goodfellow et al. 2014) has no control over the modes of the produced images. To address these challenges and increase the performance of GANs, research in the literature has been conducted either by modifying loss functions or enhancing GAN architectures (Oord et al. 2016). Among all GAN variants, condition-based GAN architectures can significantly increase the quality of produced images (Oord et al. 2016). A condition-based GAN can learn to map from an original image (x) and a noise vector (z) to an output image (y) (G: {x, z} → y); on the other hand, a non-condition-based GAN learns to map only from a noise vector to an output (Goodfellow et al. 2014). Condition-based GANs control the mode of the produced image by conditioning the architecture with a conditional variable. In a condition-based GAN, the adversarial loss function ($L_{cGAN}$) is obtained using the expected value E as:
$$L_{cGAN}(G, D) = E_{x,y}[\log D(x, y)] + E_{x,z}[\log(1 - D(x, G(x, z)))] \quad (2)$$
The objective function is expressed by $G^* = \arg\min_G \max_D L_{cGAN}(G, D)$, where the generator part aims to minimize the value of the $L_{cGAN}$ function, whereas the discriminator part aims to maximize it. The discriminator part uses the input image in $L_{cGAN}$; however, it does not use the input image in the loss function ($L_{GAN}$), which can be given by:
$$L_{GAN}(G, D) = E_y[\log D(y)] + E_{x,z}[\log(1 - D(G(x, z)))] \quad (3)$$
The condition-based GAN architecture known as cGAN, used in (Mirza and Osindero 2014), leverages the side information of annotated labels of sample images by training a conditional generator to generate conditional images from class labels (Odena et al. 2017; Miyato and Koyama 2018; Brock et al. 2019). Image generation based on a condition enables flexibility in augmentations and high resolution. However, the cGAN is a model vulnerable to the mode collapse issue. Therefore, to solve this problem, various cGAN-based models have been employed in different works, such as image-to-image translation operations (e.g., semantic labeling of photographs, photograph sketching, mapping of aerial photographs, and background removal) (Isola et al. 2017) and image inpainting (Liu et al. 2021). In particular, the Pixel-to-Pixel (Pix2Pix) model (Isola et al. 2017) has achieved satisfactory results for many images (e.g., MR, Computed Tomography (CT)) and has been commonly applied, especially for the synthesis and reconstruction of medical images (Table 1).
2.2 GAN-based augmentations and applications with medical images

Most of the GAN-based augmentations of medical images presented in the literature have been performed with Magnetic Resonance (MR) images (Makhlouf et al. 2023; Kazeminia et al. 2020; Sampath et al. 2021; Creswell et al. 2018). One reason for this is that MR imaging requires an excessive amount of scanning time to acquire multiple sequences. On the other hand, GANs can generate specific sequences from those already acquired. Another possible reason can be the large amount of MR images in the publicly available domain that can be used by researchers for the training of deep networks. Therefore, GAN-based applications in medicine have generally been with MR images.

In the literature, medical image analyses and applications using GANs have mostly been performed in the fields of segmentation, reconstruction, and classification (Sampath et al. 2021; Jeong et al. 2022; Xun et al. 2022; AlAmir and AlGhamdi 2022). This is because regulation of the generator's outputs and adversarial training can be efficient for image-to-image translations. Even though GANs have been increasingly applied in medical applications, there are only a few works based on GANs for the augmentation of dermoscopy images (Sect. 2.3).
2.3 GAN-based augmentation methods for dermoscopy images

In the literature, only a few GAN models have been proposed for the augmentation of dermoscopy images. One of them is StyleGAN, which has been applied with mixing regularization that uses two random latent codes to augment dermoscopy images (Gong et al. 2020; Bissoto et al. 2021). It switches between latent codes to check the styles. Experiments indicated that the model is efficient for augmentation and produces classification results with 97.5% accuracy (Gong et al. 2020). However, StyleGAN has been designed mainly for natural images (which have obviously changeable styles and continuous information) and may result in poor quality when generating skin lesion images, which have different styles, colors, and patterns. Therefore, style mixing can cause the overlapping of some features of skin lesions and even the creation of irrelevant styles.

In (Qin et al. 2020), Skin-Lesion StyleGAN (SL-StyleGAN) has been applied to augment dermoscopy images. The authors used varying numbers of fully connected layers (e.g., 2 and 6 fully connected layers) to solve the mode collapse problem. They evaluated the performance of generated images with recall scores. Their results indicate that the SL-StyleGAN constructed with a generator with four fully connected layers can achieve a higher recall value (0.26) compared to the other generators with different numbers of layers. However, the generated images are not completely diverse according to the results presented by the authors. Further work is needed to solve this issue.
Table 1 Medical image synthesis and reconstruction applications with the Pix2pix model

Reference | Image Type | Application | Explanation
(Mardani et al. 2019; Ran et al. 2019; Armanious et al. 2019a; Armanious et al. 2019b; Seitzer et al. 2018; Kim et al. 2017; Quan et al. 2018; Yang et al. 2018; Zhang et al. 2018; Shitrit and Raviv 2017) | MR | Reconstruction | The Pix2pix model has been applied for the reconstruction of under-sampled k-space, super-resolution, motion correction, and inpainting
(Liu et al. 2020; Shan et al. 2018; You et al. 2018, 2020; Yi and Babyn 2018; Liao et al. 2018; Wolterink et al. 2017) | CT | Reconstruction | The Pix2pix model has been applied for denoising and sparse-view CT reconstruction
(Mahapatra 2017) | Retinal fundus image | Reconstruction | The Pix2pix model has been applied to provide super-resolution
(Jin et al. 2018) | CT | Synthesis | Conditional image synthesis has been applied with the Pix2pix model using conditional information such as a segmentation map
(Shin et al. 2018) | MR | Synthesis | (as above)
(Mahapatra et al. 2018; Oh and Yun 2018) | X-ray | Synthesis | (as above)
(Maspero et al. 2018) | MR→CT | Cross-modality synthesis | Paired training of images has been performed with the Pix2pix model to analyze prostate cancer
(Choi and Lee 2018) | PET→MR | Cross-modality synthesis | Paired templates have been used in the training stage of the Pix2pix model for brain imaging

In (Abdelhalim et al. 2021), an extension of Progressive Growing of GAN (PGGAN), namely Self-attention Progressive Growing of GAN (SPGGAN), has been applied. It provides higher sensitivity than its counterparts in the recognition of skin lesions according to the presented results. Although SPGGAN has higher performance than PGGAN in image generation, the generated images lack fine details and are far from real images. Therefore, in (Abdelhalim et al. 2021), a combination of SPGGAN and Two Time-scale Update Rules (TTUR), called SPGGAN-TTUR, has been used. The performance of the structure has been evaluated with the P-value of the T-test (PVT). The authors concluded that the model outperforms the SPGGAN, since a PVT of 68.1 ± 0.8% for training datasets and 60.8 ± 1.5% for testing datasets has been achieved. On the other hand, the SPGGAN-TTUR model still leads to several artifacts and needs to be improved (Abdelhalim et al. 2021).
In (Mutepfe et al. 2021), a Deep Convolutional GAN (DCGAN) has been applied using four convolutional layers, and it was observed that the classification of the images can be performed with 93.5% accuracy. Similarly, in (Bisla et al. 2019), the authors applied two DCGANs for two classes (seborrheic keratosis and melanoma) and obtained a classification accuracy of 91.5%. However, images generated by DCGAN suffer from artifacts. The authors concluded that it is challenging to create high-quality samples and to compare different classification techniques, since non-public datasets have been used in some methods.

In a different study, a combination of DCGAN and Laplacian GAN (LAPGAN) has been applied (Pollastri et al. 2020). Its performance has been evaluated in a segmentation task rather than a classification task. The authors used several metrics (i.e., Jaccard index, entropy, accuracy). They concluded that DCGAN causes heavy checkerboard effects in the generated images and leads to low accuracy. The state-of-the-art GANs employed with dermoscopy image datasets, along with important information about them, are presented in Table 2.
Table 2 The GANs applied with dermoscopy images

Reference | GAN | Result | Drawback
(Gong et al. 2020; Bissoto et al. 2021) | StyleGAN | It improves classification performance and can provide results with 97.5% accuracy, according to the findings in (Gong et al. 2020). | It can cause the overlapping of some features and even the creation of irrelevant styles, resulting in poor quality.
(Qin et al. 2020) | SL-StyleGAN | The generator constructed using four fully connected layers can achieve a recall score of 0.26. | The generated images are not completely diverse, and the method needs improvement.
(Abdelhalim et al. 2021) | SPGGAN | It can provide a 2.5% higher sensitivity in recognizing skin lesions when compared to its counterparts. | Although SPGGAN has better performance than PGGAN, the images generated by SPGGAN lack fine details.
(Abdelhalim et al. 2021) | SPGGAN-TTUR | A PVT of 60.8 ± 1.5% for testing sets and 68.1 ± 0.8% for training sets has been achieved. | The authors have concluded that the model still produces several artifacts and needs improvement.
(Mutepfe et al. 2021; Bisla et al. 2019) | DCGAN | Classification of melanoma and seborrheic keratosis can be performed with 91.5% accuracy (Bisla et al. 2019); classification of other skin lesions can be done with 93.5% accuracy (Mutepfe et al. 2021). | Generating high-quality images similar to real ones and comparing different classification techniques are difficult since some techniques use data sets that are not public.

Although GANs have been employed in various applications (e.g., image synthesis, registration, detection (Sect. 2.2)), only a few GAN models have been proposed for the augmentation of dermoscopy images. They have been applied with different datasets and classification architectures (mostly ResNet-50 and InceptionV3), and their performances have been assessed with different measurements, such as recall, sensitivity, t-test, etc. Because of these differences, comparisons of their effectiveness in classification according to the results presented in the articles would not be meaningful, and it is unclear which one of them is the most appropriate approach. Also, each of them has a drawback, and a new, more efficient augmentation algorithm is needed to produce high-quality skin lesion images. Therefore, in this work, they have been implemented with the same datasets and the same classifier, and comparative evaluations have been performed using the same metrics to determine the best GAN structure. Also, their effects on the classification of dermoscopy images have been compared. Additionally, a new GAN model with a hybrid loss function has been applied, and its effect on the classification of the images has been evaluated. To the best of our knowledge, there is no study with these features in the literature.
3 The proposed method

The proposed augmentation technique is based on a cGAN structure because of its ability to generate high-quality images (Oord et al. 2016). In this work, an efficient extension of it, namely the Pix2pix architecture, has been used as a backbone, since it solves the instability problem of other GANs and provides the advantage of stable training using input image pairs. Also, it has yielded satisfactory results for most kinds of images (Isola et al. 2017) and is commonly applied, particularly for the reconstruction and synthesis of medical images (Table 1).

A new hybrid loss function has been designed and used in the architecture, since loss functions greatly influence the performance of the network. They should be chosen carefully to avoid producing blurry images, which occurs with common loss functions like mean-square error, and to obtain realistic images. Also, to increase the performance in the augmentation of dermoscopy images, the activation function in the generator network has been changed. In addition, the optimization function in the discriminator of the architecture has been changed. The architecture of the backbone, the proposed loss function, and the activation and optimization functions used in the architecture are explained in the following sections.
3.1 Backbone

Both the discriminator and generator in the backbone, the Pix2pix structure, are relatively different compared to the previous architectures extended from the cGAN. The generator is an encoder-decoder type network, namely U-Net (Ronneberger et al. 2015). It can reduce and increase the size of samples and enhance performance thanks to the skip connections employed between the layers of the decoder and encoder. The discriminator part is the PatchGAN (Isola et al. 2017).

Fig. 2 A graphical representation of the architecture

Figure 2 shows a graphical representation of the augmentation approach. The generator produces fake images resembling real images by utilizing labeled images. Then, the outputs of the generator part and the original image are used as inputs for the discriminator part, which distinguishes fake images from real images and produces a patch as an output. Next, the patch is used for a detailed comparison between the generated image and the original image. Based on the original image, it is determined whether the generated image is fake or real. After that, fine-tuning is applied to both the discriminator and generator based on the determination result to generate a fake image resembling a real one. By feeding labeled images into the trained network, an image reflecting the shape and position of each object can be generated. In this manner, the model creates images through a competition, an iterative game, between the discriminator and generator, each with opposing objectives.
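To make the "patch as an output" concrete, the following is a hedged sketch of a PatchGAN-style discriminator (assuming PyTorch; the layer sizes follow the common 70×70 PatchGAN configuration, which is not necessarily the exact configuration used in the paper):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=6):  # conditioning image + candidate image
        super().__init__()
        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, 4, stride, 1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2))
            return layers
        self.net = nn.Sequential(
            *block(in_channels, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, 1, 1),  # one logit per receptive-field patch
        )

    def forward(self, condition, image):
        # The labeled/conditioning image and the candidate (real or fake)
        # image are concatenated channel-wise, as in Pix2pix.
        return self.net(torch.cat([condition, image], dim=1))

# Example: a 256x256 input pair yields a 30x30 grid of patch scores.
d = PatchDiscriminator()
patch_map = d(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(patch_map.shape)  # torch.Size([1, 1, 30, 30])
```

Each entry of the output grid judges one local region of the image, which is what enables the detailed patch-level comparison described above.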
3.2 Loss function

In the Pix2pix model, the adversarial loss function plays a critical role. The generator's loss function incorporates an L1 loss along with the cGAN loss. However, our experiments have revealed that, due to the constant gradient of the L1 function, the training is difficult to converge to high accuracy, and the color saturation (color information) is not as similar to the original image as expected (Sect. 4). In this study, to generate more realistic dermoscopy images with high quality, a new loss function has been designed and applied.

In previous works, combining a traditional loss function with the adversarial loss function has been found to be beneficial in achieving significant improvements in GANs (Isola et al. 2017; Pathak et al. 2016). Motivated by this, in this work, a hybrid loss function has been used in the generator section of the Pix2pix architecture to capture both the high- and low-frequency details. The proposed loss function has been constructed using three traditional loss functions: the content loss, the Structural Similarity Index Measurement (SSIM) loss, and the L1 loss.

The content loss has been applied to make the generator more competitive. It offers stability in training, which is required for convergence, and thus provides powerful results. It is calculated from the Euclidean distances between the content representations of the produced images and the original images. The content representations are obtained from the feature maps provided by the pre-trained VGG19Net (Simonyan and Zisserman 2015). Therefore, it is a feature-wise function, which minimizes the difference between the feature (content) representations. The CNN-based content loss function is expressed by:
$$L_{content}(x, y, l) = \frac{1}{2}\sum_{i,j}\left(Y^l_{ij} - X^l_{ij}\right)^2 \quad (4)$$

where the terms $Y^l_{ij}$ and $X^l_{ij}$ correspond to the feature representations of y (the produced image) and x (the original image) in the l-th layer, respectively.
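A minimal sketch of this content loss, assuming PyTorch/torchvision; the VGG19 layer index used below (26, i.e., relu4_4) is an illustrative assumption, since the paper does not state which layer l was chosen:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

class ContentLoss(torch.nn.Module):
    def __init__(self, layer_index=26):  # illustrative layer choice
        super().__init__()
        # Frozen pre-trained VGG19 feature extractor up to the chosen layer.
        self.features = vgg19(weights="IMAGENET1K_V1").features[:layer_index + 1].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, generated, original):
        # 0.5 * sum of squared feature-map differences, as in Eq. (4).
        return 0.5 * F.mse_loss(self.features(generated),
                                self.features(original), reduction="sum")
```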
To further improve detailed information and image quality by preserving contrast and brightness in the regions with high frequency, the SSIM loss function has been added. It is an important metric that measures the similarities between images with the formula:

$$L_{SSIM}(G) = 1 - SSIM(p_i, p_j) \quad (5)$$
where the terms $p_j$ and $p_i$ refer to the pixel values of y and x, respectively. The term $SSIM(p_i, p_j)$ is expressed by:

$$SSIM(p_i, p_j) = \frac{(2\mu_{p_i}\mu_{p_j} + c_1)(2\sigma_{p_ip_j} + c_2)}{(\mu_{p_i}^2 + \mu_{p_j}^2 + c_1)(\sigma_{p_i}^2 + \sigma_{p_j}^2 + c_2)} \quad (6)$$
In Eq. (6), the terms $\mu_{p_i}$ and $\mu_{p_j}$ refer to the means of the intensities of x and y, respectively. The term $\sigma_{p_ip_j}$ refers to the covariance, while the terms $\sigma_{p_i}^2$ and $\sigma_{p_j}^2$ refer to the variances of the intensities of x and y, respectively. The terms $c_1$ and $c_2$ are constants, which have been set to $1 \times 10^{-4}$ and $9 \times 10^{-4}$, respectively (as in (Wang et al. 2004)), to account for the dynamic range of the image.
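The SSIM term of Eqs. (5)-(6) can be sketched as follows (assuming PyTorch, images scaled to [0, 1], and, for brevity, per-image global statistics rather than the sliding-window statistics of the full SSIM of Wang et al. (2004)):

```python
import torch

C1, C2 = 1e-4, 9e-4  # c1 and c2 as set in the paper (Wang et al. 2004)

def ssim(x, y):
    """Global SSIM per image; x and y have shape (batch, channels, H, W)."""
    mu_x = x.mean(dim=(1, 2, 3))
    mu_y = y.mean(dim=(1, 2, 3))
    var_x = x.var(dim=(1, 2, 3), unbiased=False)
    var_y = y.var(dim=(1, 2, 3), unbiased=False)
    cov = ((x - mu_x[:, None, None, None]) *
           (y - mu_y[:, None, None, None])).mean(dim=(1, 2, 3))
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2) /
            ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)))

def ssim_loss(generated, original):
    return (1.0 - ssim(generated, original)).mean()  # Eq. (5)
```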
Also, the L1 loss, whose minimization is associated with the shape of lesions, has been integrated to ensure similarity between the original and generated images. It is calculated using L1 distances between the generated and original images. In this work, a smooth form of the L1 loss function has been implemented. This form has a variable gradient with a differentiable value. It is less sensitive to outliers and provides more smoothness than the L1 loss. This function is defined by:

$$L_{L1smooth}(G) = \begin{cases} 0.4 \times (y - G(x, z))^2, & \text{if } |y - G(x, z)| < 1 \\ |y - G(x, z)| - 0.4, & \text{otherwise} \end{cases} \quad (7)$$

where the terms x and y denote the original and produced images, respectively, while the term z refers to the random noise. The constant 0.4 has been found empirically.
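A direct transcription of Eq. (7) into code might look like this (assuming PyTorch; note the empirical constant 0.4 in place of the 0.5 used by the standard smooth L1 loss):

```python
import torch

def smooth_l1(y, g_xz):
    """Smooth L1 term of Eq. (7); y is the original image, g_xz = G(x, z)."""
    diff = torch.abs(y - g_xz)
    per_pixel = torch.where(diff < 1.0,
                            0.4 * (y - g_xz) ** 2,  # quadratic near zero
                            diff - 0.4)             # linear elsewhere
    return per_pixel.mean()
```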
The final objective function used for the generator part is expressed by:

$$G^* = \arg\min_G \max_D L_{cGAN}(G, D) + \beta_1 L_{content}(G) + \beta_2 L_{SSIM}(G) + \beta_3 L_{L1smooth}(G) \quad (8)$$
where the coefficients β1, β2, and β3 have been set empirically to 10, 5, and 100, respectively. The proposed hybrid loss function utilizes the advantages of the different functions. The content loss function adds a feature-wise loss by comparing two images based on high-level feature representations and provides stability. The SSIM loss function controls and preserves brightness and contrast in regions with high frequency. The smooth L1 loss maintains low-frequency information to generate globally consistent images. Therefore, the weighted combination of the three loss functions ensures that the low- and high-frequency details are better captured and the generator produces high-quality images.
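Putting the pieces together, a hedged sketch of the generator objective in Eq. (8) could look as follows, reusing the ContentLoss, ssim_loss, and smooth_l1 functions sketched above (the discriminator D is assumed to take the conditioning image and a candidate image and return patch logits):

```python
import torch
import torch.nn.functional as F

BETA1, BETA2, BETA3 = 10.0, 5.0, 100.0  # empirical weights from Eq. (8)

def generator_objective(D, content_loss, x, y, fake):
    """Weighted hybrid generator loss; x: input, y: original, fake: G(x, z)."""
    logits = D(x, fake)  # patch logits for the (condition, fake) pair
    adversarial = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))  # try to fool the discriminator
    return (adversarial
            + BETA1 * content_loss(fake, y)
            + BETA2 * ssim_loss(fake, y)
            + BETA3 * smooth_l1(y, fake))
```

The large weight on the smooth L1 term mirrors the original Pix2pix practice of weighting the pixel-level reconstruction term far above the adversarial term.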
3.3 Activation function

The activation function in the original Pix2pix's generator network is the Rectified Linear Unit (ReLU). However, ReLUs can be fragile during training (Cui and Fearn 2018), because the output value of ReLU is zero when the input value is negative, so its first derivative is zero, which makes neurons unable to update their parameters. In other words, neurons cannot learn when the function is in the negative half-interval. As a remedy to this problem, the Leaky ReLU (LReLU), which uses a leaky value in the negative interval, has been proposed (Common activation functions 2022; Maas et al. 2013). The LReLU's output has a small slope for negative inputs, and its derivative is therefore not always zero. It overcomes the problem and has been used in this work instead of ReLU. In the original Pix2pix's discriminator network, the activation function is already the LReLU; therefore, it has not been changed. In the output layer, to perform binary classification between fake and real images, the activation function is the sigmoid, as in Pix2pix.
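As an illustration, swapping the activation in a generator block is a one-line change in PyTorch (the slope 0.2 below is a common default, not a value reported in the paper):

```python
import torch.nn as nn

# ReLU: zero output and zero gradient for negative inputs ("dead" neurons).
# LeakyReLU: small non-zero slope for negative inputs, so gradients still flow.
decoder_block = nn.Sequential(
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2),  # replaces the original nn.ReLU()
)
```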
3.4 Optimization function

The optimization function used in the original Pix2pix's discriminator network is Adaptive moment estimation (Adam). However, researchers have found that stochastic gradient descent generalizes better than the Adam optimizer (Wilson et al. 2017). Therefore, optimization in the discriminator part has been performed with stochastic gradient descent. The optimization function in the original Pix2pix's generator network is already stochastic gradient descent; therefore, it has not been changed in this work.
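The corresponding optimizer setup might be sketched as below (assuming PyTorch; the placeholder modules, learning rates, and momentum are illustrative, as the paper does not report them):

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(100, 784))    # placeholder stand-ins for
discriminator = nn.Sequential(nn.Linear(784, 1))  # the real G and D networks

# SGD replaces Adam in the discriminator; the generator already uses SGD.
opt_d = torch.optim.SGD(discriminator.parameters(), lr=1e-3, momentum=0.9)
opt_g = torch.optim.SGD(generator.parameters(), lr=1e-3, momentum=0.9)
```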
4 Experimental results

4.1 Datasets and evaluation metrics

In this work, public dermoscopy images from the HAM10000 database (Tschandl et al. 2018) have been used. The database includes these seven types of skin diseases: (1) vascular lesions (VASC), which appear as purple or red circles; (2) actinic keratoses (AKIEC), which are generally non-pigmented; (3) benign keratosis-like lesions (BKL), resembling melanoma; (4) melanoma (MEL), malignant neoplasms; (5) dermatofibroma (DF), identified by reticulated lines at the edges with a white center; (6) melanocytic nevi (NV), benign neoplasms; (7) basal cell carcinoma (BCC), characterized by pigmented, nodular, flat, and cystic lesions. Example images for each type are presented in Fig. 3.

Fig. 3 Example images from the HAM10000 database (Tschandl et al. 2018)

A total of 770 original images from seven classes (110 images from each class) have been taken from the database. After augmentation, a total of 2065 images (containing the original and augmented images) have been obtained. 80% (1652 images) have been used for training, while 20% (413 images) have been utilized for testing. A 5-fold cross-validation approach has been utilized. The quantitative results in Sect. 4.2 represent the average results of the 5-fold cross-validation.
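A sketch of this evaluation protocol, assuming scikit-learn and hypothetical image-path and label lists:

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

# paths/labels are hypothetical placeholders for the 2065 augmented images.
paths = [f"img_{i}.png" for i in range(2065)]
labels = [i % 7 for i in range(2065)]  # seven lesion classes

# 80/20 train/test split, stratified over the seven classes.
train_paths, test_paths, train_y, test_y = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=0)

# 5-fold cross-validation on the training portion.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(train_paths, train_y)):
    pass  # train and validate one fold here; metrics are averaged over folds
```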
To evaluate the effectiveness of the GAN-based augmentation methods in classifying images, numerical comparisons and analyses have been conducted using these metrics:

$$Sensitivity = \frac{T_{Positive}}{F_{Negative} + T_{Positive}} \quad (9)$$

$$Specificity = \frac{T_{Negative}}{F_{Positive} + T_{Negative}} \quad (10)$$

$$Accuracy = \frac{T_{Negative} + T_{Positive}}{F_{Positive} + T_{Positive} + F_{Negative} + T_{Negative}} \quad (11)$$

$$F1 = \frac{2 \times T_{Positive}}{2 \times T_{Positive} + F_{Positive} + F_{Negative}} \quad (12)$$

$$MCC = \frac{T_{Positive} \times T_{Negative} - F_{Positive} \times F_{Negative}}{\sqrt{(T_{Positive} + F_{Positive})(T_{Positive} + F_{Negative})(T_{Negative} + F_{Positive})(T_{Negative} + F_{Negative})}} \quad (13)$$

where the term MCC refers to the Matthews correlation coefficient, and the terms F and T stand for false and true, respectively.
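For reference, these five metrics can be computed from the per-class confusion-matrix counts as follows (a plain-Python sketch):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Eqs. (9)-(13) from the counts of one class (one-vs-rest)."""
    sensitivity = tp / (tp + fn)                          # Eq. (9)
    specificity = tn / (tn + fp)                          # Eq. (10)
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Eq. (11)
    f1 = 2 * tp / (2 * tp + fp + fn)                      # Eq. (12)
    mcc = (tp * tn - fp * fn) / math.sqrt(                # Eq. (13)
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sensitivity, specificity, accuracy, f1, mcc

print(classification_metrics(tp=90, tn=95, fp=5, fn=10))
```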
The networks used in this study have been trained with a mini-batch size of 128 for 600 epochs. All the methods have been implemented in Python on a machine with 64 GB RAM, an i7-6900K CPU, and a single 12 GB GeForce GTX 1080 Ti GPU.
4.2 Detailed results

In this work, both qualitative and quantitative evaluations have been performed. For qualitative evaluations, the generated images obtained from the proposed GAN structure have been visually compared with the images obtained from the original Pix2pix structure and other state-of-the-art GANs used for augmenting dermoscopy images (Sect. 2.3). Example images generated by all GANs applied in this work are presented in Fig. 4.

For quantitative evaluations, the impact of the proposed augmentation model on image classification has been investigated. For this purpose, the Inception-v4 structure (Szegedy et al. 2017) was used. The reason for choosing this network is its top ranking in classifying melanoma cases at the ISIC 2017 Challenge (Menegola et al. 2017). Also, the Inception-ResNet-V2 network (Szegedy et al. 2017), which is an updated form of the Inception-v4, was used. Instead of initializing these classifiers with random weight values, they were initialized with weights pre-trained on ImageNet datasets. It should be noted here that the aim of this study is to assess the efficiency of GAN-based augmentation methods applied to dermoscopy images, rather than to evaluate classifier models. Also, the classification results have been compared with results obtained from augmented images provided by the other GANs applied to generate dermoscopy images in the literature (Sect. 2.3). To make meaningful comparisons, the same datasets and classifiers have been used. The results obtained from the Inception-V4 and Inception-ResNet-V2 classifiers are presented in Tables 3 and 4, respectively.
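As an illustration of this setup, both classifiers are available with ImageNet weights in the timm library (an assumption for illustration; the paper does not state which implementation was used):

```python
import timm

# Seven output classes for the seven HAM10000 lesion types;
# weights are pre-trained on ImageNet rather than randomly initialized.
inception_v4 = timm.create_model("inception_v4", pretrained=True, num_classes=7)
inception_resnet_v2 = timm.create_model(
    "inception_resnet_v2", pretrained=True, num_classes=7)
```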
Table 3 Classification results (in percentage) from the Inception-V4 architecture

GAN Model | Accuracy | Sensitivity | Specificity | F1-Score | MCC
Pix2pix (Isola et al. 2017) | 74.76 | 77.90 | 96.02 | 73.06 | 71.11
DCGAN (Mutepfe et al. 2021) | 78.57 | 78.45 | 96.52 | 77.72 | 74.73
StyleGAN (Bissoto et al. 2021) | 80.48 | 80.07 | 96.88 | 78.72 | 76.52
SL-StyleGAN (Qin et al. 2020) | 84.86 | 84.82 | 97.40 | 83.06 | 81.22
SPGGAN (Abdelhalim et al. 2021) | 85.71 | 86.16 | 97.74 | 84.48 | 83.10
SPGGAN-TTUR (Abdelhalim et al. 2021) | 87.92 | 88.53 | 98.04 | 86.67 | 85.57
The Proposed Model | 91.62 | 91.95 | 98.62 | 91.03 | 90.05

Table 4 Classification results (in percentage) from the Inception-ResNet-V2 architecture

GAN Model | Accuracy | Sensitivity | Specificity | F1-Score | MCC
Pix2pix (Isola et al. 2017) | 79.52 | 79.34 | 96.68 | 78.65 | 75.81
DCGAN (Mutepfe et al. 2021) | 80.00 | 79.57 | 96.75 | 79.08 | 76.25
StyleGAN (Bissoto et al. 2021) | 82.14 | 86.10 | 97.65 | 83.45 | 82.81
SL-StyleGAN (Qin et al. 2020) | 86.67 | 88.01 | 97.91 | 85.23 | 84.39
SPGGAN (Abdelhalim et al. 2021) | 87.64 | 88.53 | 98.04 | 86.67 | 85.57
SPGGAN-TTUR (Abdelhalim et al. 2021) | 89.42 | 90.50 | 98.12 | 87.31 | 86.33
The Proposed Model | 93.12 | 94.20 | 98.99 | 93.71 | 92.88

Fig. 4 Example images generated by Pix2pix (a), DCGAN (b), StyleGAN (c), SL-StyleGAN (d), SPGGAN (e), SPGGAN-TTUR (f), and the proposed GAN model (g)
5 Discussion

Nowadays, in the era of deep learning, automated techniques with deep networks are dominating the visual examinations of dermoscopy images that have traditionally been conducted by dermatologists. This is because these techniques provide accurate, objective, and rapid results. However, one of the fundamental challenges of automated analyses and classifications is to achieve satisfactory results using limited labeled datasets. Image augmentation is an efficient solution to overcome this challenge. Augmentation of medical images using GAN structures has been explored in the literature to alleviate the scarcity of images in developing deep learning-based applications. However, there is no GAN model that can be applied efficiently for the augmentation of all kinds of medical images, since medical images have different characteristics due to several factors such as imaging technique and environmental conditions. In addition, although there are many works on the augmentation and processing of MR images, there are only a few studies on the augmentation of dermoscopy images with GANs, and each of them has some drawbacks (Sect. 2.3). Furthermore, it is not clear which one is the best, since their evaluations have been performed with different metrics and different datasets. Moreover, different classifiers have been applied to evaluate their performances in classification. For meaningful comparisons and identification of the most appropriate model, they should be applied with the same datasets, and their performances should be assessed using the same classifiers and metrics. Therefore, in this work, the effectiveness of those GAN-based augmentation techniques for dermoscopy images and their significance in classification have been presented through fair comparisons.

The original Pix2pix model produces blurs and artifacts, which result in the loss of details and unrealistic lesion appearances. Because of the low quality of the generated images, this model is not suitable for the augmentation of dermoscopy images and leads to low accuracy when used in their classification. On the other hand, it has the advantage of being able to perform stable training using input pairs of images, and it resolves the instability problem of the existing GANs.
The DCGAN model can learn to produce skin lesion samples with good diversity by utilizing its end-to-end training. However, the convolution layers in the architecture are strided and cause artifacts. Also, the images generated by this model lack fine-grained details and suffer from the mode collapse problem. Therefore, they are not suitable for use in training sets.

The StyleGAN is not appropriate for generating skin lesion samples in spite of its success in applications with natural images. This is because it was designed for images with obvious, changeable styles and continuous information. There are major differences between such images and dermoscopy images, which are influenced by different styles. The features of lesions (e.g., styles, colors, and patterns) in dermoscopy images do not change meaningfully or continuously the way facial images do, and they are much less plentiful. Mixing of styles can cause overlapping of features and even the occurrence of irrelevant styles. Because of this, style changes applied using other types of images are inappropriate for dermoscopy images and result in images with poor quality.
In GANs with attention mechanisms, such as SL-StyleGAN, SPGGAN, and SPGGAN-TTUR, attention is assigned to the more important features. The important features in different medical images are not the same for different applications (e.g., segmentation, classification, and augmentation). For instance, the features on the edges (boundaries) of the skin lesions are more important than the textural (pattern) features inside the lesions for the segmentation of the lesions, while the opposite is true for the classification of the lesions. Both texture and edge features are important for the augmentation process. The SL-StyleGAN-based augmentation needs more work to improve its performance because, although it can eliminate artifacts and noise, the diversity in the images generated by this method is not sufficient. The SPGGAN and SPGGAN-TTUR are based on progressive growing and provide better performance than the style-based GANs. However, the synthetic images generated using the SPGGAN lack fine details, and the SPGGAN-TTUR needs to be improved, since it still suffers from several artifacts in the generated images.

The authors in (Andrade et al. 2020) employed CycleGAN. However, their aim for image augmentation was to improve the accuracy of lesion segmentation, wherein edge information holds greater significance than internal lesion details. Comparisons of the images generated by an augmentation method designed to enhance segmentation performance with the images generated to improve classification tasks would not be meaningful. Therefore, CycleGAN has not been applied in this work.
Adversarial loss functions can preserve features with the aid of the discriminator detecting non-existent features and unrealistic images. However, these loss functions greatly influence the performance of network structures and the quality of synthetic images. They should be chosen carefully to avoid producing blurry or unrealistic images.

Despite the improvements made by enhancing GAN architectures, properly synthesizing details within the image remains a challenge. This is because earlier research has typically focused on detecting either high-frequency or low-frequency details. Experiments indicate that the proposed GAN model is promising, since it incorporates features characteristic of a skin lesion. Its high performance has been achieved through: (i) the Pix2pix backbone, which provides stable training; (ii) the hybrid loss function, incorporating a feature-wise loss and efficiently capturing both low- and high-frequency details; and (iii) appropriate activation and optimization functions. In the hybrid loss function, the SSIM loss preserves brightness and contrast in regions with high frequency, the L1 loss maintains low-frequency information, and the content loss provides the training stability necessary for convergence. Additionally, activation and optimization functions have crucial impacts on the performance of deep learning models; inappropriate activation functions may cause gradients to exponentially explode or vanish during back-propagation, resulting in the loss of input information during forward propagation (Hayou et al. 2019). Also, researchers have found that stochastic gradient descent generalizes better than the Adam optimizer (Wilson et al. 2017). For these reasons, optimization and activation functions have been selected carefully in this study. Therefore, the proposed network is more appropriate than the others and provides slightly better performance in the classification of skin lesions (Tables 3 and 4).
6 Conclusion

The main challenge of deep learning-based analyses and classifications of dermoscopy images is the necessity of large training datasets, and image augmentation presents an efficient solution to this problem. In this work, a GAN model has been proposed for the augmentation of dermoscopy images. Furthermore, its effect on the classification of the images has been assessed. Additionally, GAN-based augmentation methods applied with dermoscopy datasets in the literature have been implemented using the same datasets, and their effectiveness in the classification of skin lesions has been compared.

Qualitative evaluations indicated that the Pix2pix, StyleGAN, and DCGAN architectures produce images with poor quality (Fig. 4a, b, c). They lead to various artifacts and are not appropriate for generating realistic dermoscopy images. The SL-StyleGAN, SPGGAN, SPGGAN-TTUR, and the proposed structure can produce visually more realistic images (Fig. 4d, e, f, g). However, quantitative results with five metrics indicated that they still lack fine-grained detail and that the proposed model provides slightly better results in classification than the others (Tables 3 and 4). The classification accuracy obtained by the proposed augmentation is at least 3.7% higher than its counterparts. The results confirm that the proposed approach captures features that characterize a lesion and is promising. The hybrid loss function used in the proposed GAN model uses a feature-wise loss and provides efficiency in capturing both low- and high-frequency details. This yields high accuracy in the classification stage. If images with reduced features are used, then the accuracy decreases, since high-frequency or low-frequency details are lost in those images. Also, it has been observed that the Inception-ResNet-V2 network has the ability to classify skin lesions better than the Inception-V4 network. Although the proposed augmentation is efficient for dermoscopy images, it has not been applied to other medical images acquired with different techniques (such as MR, CT, PET, X-ray, and ultrasound), which have very different characteristics caused by various factors (such as imaging technique and environmental factors). This is a limitation of this work. Therefore, the proposed augmentation will be applied to other kinds of images, and its performance will be tested in future works. We believe that deep learning-based classifications using the generated images from the proposed augmentation will improve the performance of automated diagnosis in dermatology.
Author contributions This is a single-authored article.

Declarations

Competing interests The authors declare no competing interests.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
References
Abdelhalim ISA, Mohamed MF, Mahdy YB (2021) Data augmentation for skin lesion using self-attention
based progressive generative adversarial network. Expert Syst App 165:1–13
AlAmir M, AlGhamdi M (2022) The role of generative adversarial network in medical image analysis: an
in-depth survey. ACM-CSUR 55:1–36
Andrade C, Teixeira LF, Vasconcelos MJ, Rosado L (2020) Data augmentation using adversarial image-to-
image translation for the segmentation of mobile-acquired dermatological images. J Imaging 7:1–15
Armanious K, Mecky Y, Gatidis S, Yang B (2019a) Adversarial inpainting of medical image modalities.
Conf. on Acoustics, Speech and Signal Processing, Brighton, UK, pp. 3267–3271
Armanious K, Gatidis S, Nikolaou K, Yang B, Kustner T (2019b) Retrospective correction of rigid and non-rigid MR motion artifacts using GANs. Symp. on Bio. Imaging, Venice, Italy, pp. 1–5
Bisla D, Choromanska A, Berman RS, Stein JA, Polsky D (2019) Towards automated melanoma detection with deep learning: data purification and augmentation. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Long Beach, USA, pp. 1–9
Bissoto A, Valle E, Avila S (2021) Gan-based data augmentation and anonymization for skin-lesion analysis:
a critical review. Conf Comp Vis Pattern Rec Virtual, pp.1847–1856
Brock A, Donahue J, Simonyan K (2019) Large scale GAN training for high fidelity natural image synthesis. arXiv:1809.11096, pp. 1–35
Chen T, Zhai X, Ritter M, Lucic M, Houlsby N (2019) Self-supervised gans via auxiliary rotation loss. Conf.
on Computer Vision and Pattern Recognition, Long Beach, USA, pp. 12154–12163
Choi H, Lee DS (2018) Generation of structural MR images from amyloid PET: application to MR-less quantification. J Nucl Med 59:1111–1117
Common activation functions. http://cs231n.github.io/neural-networks-1/#actfun. Accessed 6 December
2022
Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial
networks: an overview. IEEE Signal Process Mag 35:53–65
Cui C, Fearn T (2018) Modern practical convolutional neural networks for multivariate regression: applications to NIR calibration. Chemometrics Intel Lab Syst 182:9–20
Dave P, Nambudiri V, Grant-Kels JM (2022) The introduction of Dr AI: what dermatologists should consider.
J Am Acad Dermatology 1:1–2
Dehghani M, Ghiasi M, Niknam T, Kavousi-Fard A, Shasadeghi M et al (2020) Blockchain-based securing of data exchange in a power transmission system considering congestion management and social welfare. Sustainability. https://doi.org/10.3390/su13010090
Galdran A, Alvarez-Gila A, Meyer MI et al (2017) Data-driven color augmentation techniques for deep skin image analysis. arXiv:1703.03702. https://doi.org/10.48550/arXiv.1703.03702
Ghiasi M, Ghadimi N, Ahmadinia E (2019) An analytical methodology for reliability assessment and failure
analysis in distributed power system. SN Appl Sci. https://doi.org/10.1007/s42452-018-0049-0
Ghiasi M, Niknam T, Wang Z, Mehrandezh M et al (2023a) A comprehensive review of cyber-attacks and defense mechanisms for improving security in smart grid energy systems: past, present and future. Electric Power Systems Research. https://doi.org/10.1016/j.epsr.2022.108975
Ghiasi M, Wang Z, Mehrandezh M, Jalilian S, Ghadimi N (2023b) Evolution of smart grids towards the inter-
net of energy: Concept and essential components for deep decarbonisation. IET Smart Grid 6:86–102
Gong A, Yao X, Lin W (2020) Dermoscopy image classification based on StyleGANs and decision fusion. IEEE Access 8:70640–70650
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S et al (2014) Generative adver-
sarial nets. Adv Neural Inf Process Syst 3:2672–2680
Hayou S, Doucet A, Rousseau J (2019) On the impact of the activation function on deep neural networks
training. Int. Conference on Machine Learning, Long Beach, USA, pp. 2672–2680
Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks.
Conf. on Comp. Vis. and Pattern Recog., Honolulu, Hawaii, pp. 1125–1134
Jeong JJ, Tariq A, Adejumo T, Trivedi H, Gichoya JW, Banerjee I (2022) Systematic review of generative adversarial networks (GANs) for medical image classification and segmentation. J Digit Imaging 35:137–152
Jin D, Xu Z, Tang Y, Harrison AP, Mollura DJ (2018) CT-realistic lung nodule simulation from 3D conditional
generative adversarial networks for robust lung segmentation. Lect Notes Comput Sci 11071:732–740
Kazeminia S, Baur C, Kuijper A et al (2020) GANs for medical image analysis. Artif Intell Med 109:101938
Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to discover cross-domain relations with generative
adversarial networks. Conf. on Machine Learn., Sydney, Australia, pp. 1857–1865
Liao H, Huo Z, Sehnert WJ, Zhou SK, Luo J (2018) Adversarial sparse-view CBCT artifact reduction. Lect
Notes Comput Sci 11070:154–162
Lin Z, Khetan A, Fanti G, Oh S (2018) PacGAN: the power of two samples in generative adversarial networks. Adv. in Neural Inf. Proc. Systems, Montreal, Canada, pp. 1–10
Liu Z, Bicer T, Kettimuthu R et al (2020) TomoGAN: low-dose synchrotron x-ray tomography with genera-
tive adversarial networks: discussion. J Opt Soc Am A 37:422–434
Liu R, Wang X, Lu H, Wu Z, Fan Q, Li S, Jin X (2021) SCCGAN: style and characters inpainting based on CGAN. Mob Netw Appl 26:3–12
Lotter W, Kreiman G, Cox D (2016) Deep predictive coding networks for video prediction and unsupervised
learning. arXiv:1605.08104v5, pp.1–18
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. The
30th Int. Conference on Machine Learning (ICML 2013), Atlanta, USA, pp.1–6
Mahapatra D (2017) Retinal vasculature segmentation using local saliency maps and generative adversarial
networks for image super resolution. arXiv:1710.04783, pp. 1–8
Mahapatra D, Bozorgtabar B, Thiran JP, Reyes M (2018) Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network. Lect Notes Comput Sci 11071:580–588
Majidian M, Tejani I, Jarmain T, Kellett L, Moy R (2022) Artificial intelligence in the evaluation of telemedicine dermatology patients. J Drugs Dermatol 21:191–194
Makhlouf A, Maayah M, Abughanam N, Catal C (2023) The use of generative adversarial networks in medi-
cal image augmentation. Neural Comput Appl 35:24055–24068
Mardani M, Gong E, Cheng JY et al (2019) Deep generative adversarial neural networks for compressive
sensing MRI. IEEE Trans Med Imaging 38:167–179
Maspero M, Savenije MHF, Dinkla AM et al (2018) Dose evaluation of fast synthetic-CT generation using a
generative adversarial network for general pelvis MR-only radiotherapy. Phys Med Biol 63:1–12
Menegola A, Tavares J, Fornaciali M, Li LT, Avila S, Valle E (2017) RECOD titans at ISIC challenge 2017.
arXiv:1703.04819, pp.1–5
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784, pp.1–7
Miyato T, Koyama M (2018) cGANs with projection discriminator. arXiv:1802.05637, pp. 1–21
Mutepfe F, Kalejahi BK, Meshgini S, Danishvar S (2021) Generative adversarial network image synthesis
method for skin lesion generation and classification. J Med Signals Sens 11:237–252
Odena A, Olah C, Shlens J (2017) Conditional image synthesis with auxiliary classifier GANs. The 34th
International Conference on Machine Learning, Sydney, Australia, pp. 2642–2651
Oh DY, Yun ID (2018) Learning bone suppression from dual energy chest x-rays using adversarial networks.
arXiv: 1811.02628, pp.1–17
Oord A, Kalchbrenner N, Vinyals O et al (2016) Conditional image generation with PixelCNN Decoders.
Adv. in Neural Inf. Proc. Systems, Barcelona, Spain, pp. 1–9
Pathak D, Krahenbuhl P, Donahue J, Darrell T, Efros AA (2016) Context encoders: feature learning by
inpainting. Conf. on Comp. Vis. and Pattern Recog., Las Vegas, USA, pp. 2536–2544
Perez F, Vasconcelos C, Avila S, Valle E (2018) Data augmentation for skin lesion analysis. Lect Notes Comput Sci 11041:303–311
Pollastri F, Bolelli F, Paredes R, Grana C (2020) Augmenting data with GANs to segment melanoma skin
lesions. Multimedia Tools Appl 79:15575–15592
Qin Z, Liu Z, Zhu P, Xue Y (2020) A GAN-based image synthesis method for skin lesion classication.
Comput Methods Programs Biomed 195:1–19
Quan TM, Nguyen-Duc T, Jeong WK (2018) Compressed sensing MRI reconstruction using a generative
adversarial network with a cyclic loss. IEEE Trans Med Imaging 37:1488–1497
Ran M, Hu J, Chen Y et al (2019) Denoising of 3D magnetic resonance images using a residual encoder–
decoder Wasserstein generative adversarial network. Med Image Anal 55:165–180
Raza R, Zulfiqar F, Tariq S et al (2022) Melanoma classification from dermoscopy images using ensemble of
convolutional neural networks. Mathematics 26:1–15
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation.
Conf. Med. Image Comp. and Comp. Assist. Inter., Munich, Germany, pp. 234–241
Salimans T, Goodfellow I, Zaremba W et al (2016) Improved techniques for training GANs. Adv. in Neural Inf.
Proc. Systems, Barcelona, Spain, pp.1–9
Sampath V, Maurtua I, Martín JJA, Gutierrez A (2021) A survey on generative adversarial networks for
imbalance problems in computer vision tasks. J Big Data 8:1–59
Seitzer M, Yang G, Schlemper J et al (2018) Adversarial and perceptual refinement for compressed sensing
MRI reconstruction. Lect Notes Comput Sci 11070:12–20
Shan H, Zhang Y, Yang Q et al (2018) 3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network. IEEE Trans Med Imaging 37:1522–1534
Shin HC, Tenenholtz NA, Rogers JK et al (2018) Medical image synthesis for data augmentation and anony-
mization using generative adversarial networks. Lect Notes Comput Sci 11037:1–11
Shitrit O, Raviv TR (2017) Accelerated magnetic resonance imaging by adversarial neural network. Lect Notes Comput Sci 10553:30–38
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. 3rd
Int. Conference on Learning Representations (ICLR), San Diego, USA, pp.1–14
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. Conf. on Artificial Intel., California, USA, pp. 1–7
Tan Z, Chai M, Chen D, Liao J, Chu Q et al (2020) MichiGAN: multi-input conditioned hair image generation for portrait editing. arXiv:2010.16417. https://arxiv.org/abs/2010.16417
Tschandl P, Rosendahl C, Kittler H (2018) The HAM10000 dataset, a large collection of multi-source derma-
toscopic images of common pigmented skin lesions. Sci Data 5:1–9
Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, Hawaii, pp. 7167–7176
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to
structural similarity. IEEE Trans Image Process 13:600–612
Wilson AC, Roelofs R, Stern M, Srebro N, Recht B (2017) The marginal value of adaptive gradient methods
in machine learning. Adv. in Neural Inf. Proc. Systems 1:4148–4158
Wolterink JM, Leiner T, Viergever MA, Išgum I (2017) Generative adversarial networks for noise reduction
in low-dose CT. IEEE Trans Med Imaging 36:2536–2545
Xun S, Li D, Zhu H et al (2022) Generative adversarial networks in medical image segmentation: a review.
Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2021.105063
Yan X, Yang J, Sohn K, Lee H (2015) Attribute2image: conditional image generation from visual attributes.
Lect Notes Comput Sci 9908:776–791
Yang G, Yu S, Dong H et al (2018) DAGAN: deep de-aliasing generative adversarial networks for fast com-
pressed sensing MRI reconstruction. IEEE Trans Med Imaging 37:1310–1321
Yi X, Babyn P (2018) Sharpness-aware low-dose CT denoising using conditional generative adversarial network. J Digit Imaging 31:655–669
You C, Yang Q, Shan H et al (2018) Structurally-sensitive multi-scale deep neural network for low-dose CT
denoising. IEEE Access 6:41839–41855
You C, Li G, Zhang Y et al (2020) CT super-resolution GAN constrained by the identical, residual, and cycle
learning ensemble (GAN-CIRCLE). IEEE Trans Med Imaging 39:188–203
Zhang P, Wang F, Xu W, Li Y (2018) Multi-channel generative adversarial network for parallel magnetic
resonance image reconstruction in k-space. Lect Notes Comput Sci 11070:180–188
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional aliations.