Timeline of some of the representative loss- and architecture-variants of GANs

Source publication
Article
Generative adversarial networks (GANs), a novel framework for training generative models in an adversarial setup, have attracted significant attention in recent years. The two opposing neural networks of the GAN framework, i.e., a generator and a discriminator, are trained simultaneously in a zero-sum game, where the generator generates images to...
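As a concrete illustration of the alternating zero-sum training described above, the following is a minimal PyTorch sketch. The generator G and discriminator D are assumed to be arbitrary torch.nn.Module instances with compatible shapes; the latent dimension, loss choice, and update order are illustrative assumptions, not details from the source publication.

import torch
import torch.nn as nn

def gan_train_step(G, D, real, opt_g, opt_d, latent_dim=100):
    # One alternating update of the two-player game.
    # Assumes D outputs a single logit per sample: shape (batch, 1).
    bce = nn.BCEWithLogitsLoss()
    batch = real.size(0)
    z = torch.randn(batch, latent_dim)

    # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(batch, 1)) \
           + bce(D(G(z).detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: push D(G(z)) toward 1, i.e., fool the discriminator.
    opt_g.zero_grad()
    g_loss = bce(D(G(z)), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

Detaching G(z) in the discriminator step stops that loss's gradients from reaching the generator, which is what keeps the two updates genuinely adversarial rather than cooperative.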

Citations

... These refined architectures effectively mitigate challenges in GAN training, notably mode collapse and non-convergence. GANs exhibit superior performance across applications including image synthesis, video and audio synthesis, natural language processing, medicine and healthcare, security, and various other domains [33]. ...
Article
The origin–destination (OD) matrix describes traffic flow information between regions. It is a critical input for intelligent transportation systems (ITS). However, obtaining the OD matrix remains challenging due to high costs and privacy concerns. Synthetic data, which have the same statistical distribution as real data, help address privacy issues and data scarcity. Based on Generative Adversarial Networks (GANs), OD matrix generation models, which can effectively generate a synthetic OD matrix, help address the challenge of obtaining OD matrix data in ITS research. However, existing OD matrix generation methods can only handle tens of nodes. To address this challenge, this study proposes Origin–Destination Progressive Growing Generative Adversarial Networks (OD-PGGAN) for the large-scale OD matrix generation task, adapting the PGGAN architecture. OD-PGGAN adopts a progressive learning strategy to gradually learn the structure of the OD matrix from a coarse to a fine scale. OD-PGGAN utilizes multi-scale generators and discriminators to perform generation and discrimination tasks at different spatial resolutions. OD-PGGAN introduces a geography-based upsampling and downsampling algorithm to maintain the geographical significance of the OD matrix during spatial resolution transformations. The results demonstrate that the proposed OD-PGGAN can generate a large-scale synthetic OD matrix with 1024 nodes that has the same distribution as the real sample, outperforming two classical methods. OD-PGGAN can effectively provide reliable synthetic data for transportation applications.
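As a rough sketch of the coarse-to-fine idea behind such progressive training, the snippet below grows the working resolution of an OD matrix stage by stage. The doubling schedule and the flow-preserving sum-pooling used for downsampling are illustrative assumptions; the paper's geography-based algorithm differs.

import torch
import torch.nn.functional as F

def downsample_od(od, factor=2):
    # od: (1, 1, N, N) tensor of flows between N regions.
    # avg_pool2d * factor**2 is sum-pooling, so merged regions
    # keep the total flow of the regions they absorb.
    return F.avg_pool2d(od, kernel_size=factor) * factor ** 2

# Coarse-to-fine schedule: train a generator/discriminator pair
# at each resolution before growing to the next one.
for n in [4 * 2 ** k for k in range(9)]:   # 4, 8, ..., 1024 nodes
    print(f"training stage at OD matrix resolution {n}x{n}")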
... GANs consist of two neural networks, a generator and a discriminator, which are in an adversarial setup with each other [104]. GANs have been applied to generate images, videos, and even audio that seem real. ...
Article
Machine learning (ML) and deep learning (DL), subsets of artificial intelligence (AI), are the core technologies driving transformation and innovation in various industries by integrating AI-driven solutions. Understanding ML and DL is essential to analyse their applicability and identify their effectiveness in different areas like healthcare, finance, agriculture, manufacturing, and transportation. ML consists of supervised, unsupervised, semi-supervised, and reinforcement learning techniques. DL, a subfield of ML comprising neural networks (NNs), can deal with complicated datasets in the health, autonomous systems, and finance industries. This study presents a holistic view of ML and DL technologies, analysing algorithms and their capacity to address real-world problems. The study investigates the real-world application areas in which ML and DL techniques are implemented. Moreover, the study highlights the latest trends and possible future avenues for research and development (R&D), including the development of hybrid models, generative AI, and the integration of ML and DL with the latest technologies. The study aims to provide a comprehensive view of ML and DL technologies that can serve as a reference guide for researchers, industry professionals, practitioners, and policy makers.
... These networks are trained adversarially, leading to a model that learns the underlying data distribution and can reconstruct highly accurate outputs, even in high-noise environments. Through this adversarial training framework, the model can perform denoising in a nuanced and adaptive manner, an achievement that pure error-based strategies often struggle to attain [30]. Fig. 3 illustrates the complete GAN framework and the model structure of the discriminator. ...
Article
The growing complexity of cyber threats requires innovative machine learning techniques, and image-based malware classification opens up new possibilities. However, existing research has largely overlooked the impact of noise and obfuscation techniques commonly employed by malware authors to evade detection; there is a critical gap in using noise simulation to replicate real-world malware obfuscation techniques and in adopting a denoising framework to counteract these challenges. This study introduces an image denoising technique based on a U-Net combined with a GAN framework to address noise interference and obfuscation challenges in image-based malware analysis. The proposed methodology addresses existing classification limitations by introducing noise addition, which simulates obfuscated malware, and denoising strategies to restore robust image representations. To evaluate the approach, we used multiple CNN-based classifiers to assess noise resistance across architectures and datasets, measuring significant performance variation. Our denoising technique demonstrates remarkable performance improvements across two multi-class public datasets, MALIMG and BIG-15. For example, MALIMG classification accuracy improved from 23.73% to 88.84% when denoising was applied after Gaussian noise injection, demonstrating robustness. This approach contributes to improving malware detection by offering a robust framework for noise-resilient classification in noisy conditions.
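A common way to couple a U-Net denoiser with a GAN objective is to train the U-Net as the generator against a weighted sum of a pixel-level reconstruction term and an adversarial term. The sketch below assumes that standard pix2pix-style formulation; the weight lam and the L1/BCE loss choices are illustrative, not taken from the study.

import torch
import torch.nn as nn

def denoiser_losses(G, D, noisy, clean, lam=100.0):
    # G: U-Net mapping a noisy image to a clean estimate.
    # D: discriminator judging whether an image looks clean/real.
    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()
    denoised = G(noisy)

    ones = torch.ones_like(D(clean))
    zeros = torch.zeros_like(ones)
    # Discriminator: real clean images vs. detached denoised outputs.
    d_loss = bce(D(clean), ones) + bce(D(denoised.detach()), zeros)
    # Generator: fool D while staying close to the clean target.
    g_loss = bce(D(denoised), ones) + lam * l1(denoised, clean)
    return d_loss, g_loss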
... However, as GANs evolve further, the sheer number of variants and models becomes prohibitive for researchers and practitioners looking to choose the optimal architecture for specific tasks. Several GAN architectures were implemented, which included the conditional GAN (CGAN), Deep Convolutional GAN (DCGAN), and the Wasserstein GAN (WGAN) [1,2], to counter some of the weaknesses inherent in the basic GAN architecture, such as instability during training, mode collapse, and failure to generate high-resolution images [3,4]. Each model improves on one of the weaknesses cited but introduces others simultaneously. ...
Article
The growing spectrum of Generative Adversarial Network (GAN) applications in medical imaging, cyber security, data augmentation, and remote sensing makes a rigorous review of GANs increasingly critical. Earlier reviews that targeted a single GAN architecture or emphasized one application area did so narrowly and lacked a systematic comparative analysis of model performance metrics. Many reviews do not apply standardized frameworks, leaving gaps in the evaluation of GAN efficiency, training stability, and suitability for specific tasks. In this work, a systematic review of GAN models using the PRISMA framework is developed in detail to fill this gap by structurally evaluating GAN architectures. A wide variety of GAN models is discussed, ranging from the basic Conditional GAN, Wasserstein GAN, and Deep Convolutional GAN to specialized models such as EVAGAN, FCGAN, and SIF-GAN, across domains like fault diagnosis, network security, medical imaging, and image segmentation. The PRISMA methodology systematically filters relevant studies by inclusion and exclusion criteria to ensure transparency and replicability in the review process. All models are then assessed against specific performance metrics such as accuracy, stability, and computational efficiency. There are multiple benefits to using the PRISMA approach in this setting: it helps identify optimal models for various applications and provides an explicit framework for comparing GAN performance. In addition, diverse types of GANs are included to ensure a comprehensive view of state-of-the-art techniques. This work matters not only for its results but also because it guides future research by pinpointing which applications call for which GAN architectures, improves model selection for specific tasks, and identifies areas for further work on the development and application of GANs.
... Training continues until the discriminator can no longer distinguish between real and generated data. GANs gradually improve their performance through this dynamic interaction and generate synthetic data that closely resemble real-world examples [80,81]. DCGANs specifically use CNNs in both the generator and the discriminator. ...
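For reference, a DCGAN-style generator stacks transposed convolutions with batch normalization and ReLU activations, mirroring a strided-convolution discriminator. The layer widths below follow the common recipe for 64x64 RGB output and are illustrative rather than taken from the cited study.

import torch.nn as nn

# Minimal DCGAN-style generator: 100-dim latent vector -> 64x64 image.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False),  # 1x1  -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),  # 4x4  -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # 8x8  -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),     # 32x32 -> 64x64
    nn.Tanh(),  # outputs in [-1, 1], matching normalized training images
)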
Article
This study aims to improve the efficiency of mineral exploration by introducing a novel application of Deep Convolutional Generative Adversarial Networks (DCGANs) to augment geological evidence layers. By training a DCGAN model with existing geological, geochemical, and remote sensing data, we have synthesized new, plausible layers of evidence that reveal unrecognized patterns and correlations. This approach deepens the understanding of the controlling factors in the formation of mineral deposits. The implications of this research are significant and could improve the efficiency and success rate of mineral exploration projects by providing more reliable and comprehensive data for decision-making. The predictive map created using the proposed feature augmentation technique covered all known deposits in only 18% of the study area.
... GANs comprise two distinct yet jointly trained networks [39]: the generator and the discriminator. The generator creates data, such as images and text, that mimic authentic data despite being artificial, while the discriminator, by judging whether data are real or generated, drives the generator to improve. ...
Article
Text summarization is crucial in various sectors, such as engineering and healthcare, because it enhances efficiency in terms of time and costs. Current extractive text summarization methods struggle with challenges such as greedy selection, model generalization limitations, and high computational demands. To solve these problems, this research introduces a novel extractive text summarization method that uniquely integrates a Generative Adversarial Network (GAN), Transductive Long Short-Term Memory (TLSTM), and DistilBERT for sentence embedding. Our technique uses GANs, which include generator and discriminator components, with the core design based on TLSTM. TLSTM utilizes transductive learning to improve accuracy by focusing on samples geographically closer to the test data. In our model, the generator considers whether to include a sentence in the summary while the discriminator critically reviews the generated summary. This GAN model reduces greedy sentence selection, enhancing summary coherence and quality. We implement a Reinforcement Learning (RL)-based strategy to address an imbalance caused by more fake than real samples in the discriminator. This RL approach, novel in the context of GANs for summarization, views training as a sequence of interconnected decisions, treating each sample as a unique scenario. The network, acting as the decision-making agent, assigns greater rewards or penalties to the minority class to correct the imbalance. The effectiveness of our model was evaluated using the well-regarded CNN/Daily Mail dataset, achieving ROUGE-1, ROUGE-2, and ROUGE-L scores of 52.45, 26.46, and 44.85, respectively. Compared to existing methods, our results demonstrate a significant improvement in summarization quality and operational efficiency, as measured by the ROUGE metric.
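The reward re-weighting described above can be pictured as a class-weighted reward: each discriminator decision is treated as an action whose reward is scaled up for the minority (real) class. The weighting below is an illustrative assumption, not the paper's exact formulation.

def reward(is_real_sample, prediction_correct, minority_weight=3.0):
    # Hypothetical reward: the minority (real) class earns a larger
    # reward or penalty, nudging the agent away from majority-class bias.
    w = minority_weight if is_real_sample else 1.0
    return w if prediction_correct else -w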
... As an emerging technique, generative adversarial networks (GANs) have been widely applied to various generation tasks, such as image generation, audio and speech synthesis, text generation, image translation, video generation, style transfer, and so on [1]. The GAN architecture includes two competing flexible networks: one is the generator, which replicates a data distribution and generates synthesized data; the other is the discriminator, which distinguishes between real and generated samples [2]. ...
... The two opposing networks are trained alternately in a zero-sum game until a Nash equilibrium is reached. While GANs have demonstrated remarkable success in generating realistic and high-resolution images, they are still striving to achieve comparably significant results in other domains [1]-[3]. ...
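The zero-sum game referred to here is the standard GAN minimax objective, in which the discriminator D maximizes and the generator G minimizes the same value function:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

At the Nash equilibrium, the generator's distribution matches the data distribution and the optimal discriminator outputs D(x) = 1/2 everywhere.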
... In addition, the quality of generated audio severely degrades when the model is conditioned on mel-spectrograms from unseen speakers in different acoustic environments [15]. GANs also struggle with training difficulties and mode collapse, in which case the generator can produce only limited sample varieties [1]. Denoising diffusion probabilistic models (DDPMs) have been proposed to address these issues but suffer from a slow reverse process, making them impractical for real-time applications. ...
Article
In recent years, generative adversarial networks (GANs) have made significant progress in generating audio sequences. However, these models typically rely on bandwidth-limited mel-spectrograms, which constrain the resolution of generated audio sequences and lead to mode collapse during conditional generation. To address this issue, we propose the Deformable Periodic Network based GAN (DPN-GAN), a novel GAN architecture that incorporates a kernel-based periodic ReLU activation function to induce periodic bias in audio generation. This innovative approach enhances the model’s ability to capture and reproduce intricate audio patterns. In particular, our proposed model features a DPN module for multi-resolution generation utilizing deformable convolution operations, allowing for adaptive receptive fields that improve the quality and fidelity of the synthetic audio. Additionally, we enhance the discriminator network using deformable convolution to better distinguish between real and generated samples, further refining the audio quality. We trained two versions of the model: DPN-GAN small (38.67M parameters) and DPN-GAN large (124M parameters). For evaluation, we use five different datasets, covering both speech synthesis and music generation tasks, to demonstrate the efficiency of the DPN-GAN. The experimental results demonstrate that DPN-GAN delivers superior performance on both out-of-distribution and noisy data, showcasing its robustness and adaptability. Trained across various datasets, DPN-GAN outperforms state-of-the-art GAN architectures on standard evaluation metrics, and exhibits increased robustness in synthesized audio.
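One way to picture the periodic inductive bias described above is a Snake-style activation, x + sin^2(ax)/a, which lets a network represent repeating structure such as pitch periodicity. This is a stand-in for illustration only; DPN-GAN's kernel-based periodic ReLU is a different function.

import torch

def snake(x, a=1.0):
    # Snake periodic activation: the sin^2 term adds a periodic
    # component while the identity path keeps gradients well-behaved.
    return x + torch.sin(a * x) ** 2 / a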
... Mode collapse, where the generator fails to cover the full diversity of the target data, remains a key issue [6,7]. Strategies include manifold-preserving GANs (MP-GAN) [8], which apply entropy maximization on the data manifold, and mutual information maximization in models like InfoMax-GAN [9]. ...
... For example, the presence of sharp change points in financial time series data due to noise contamination severely degrades the prediction performance of these models [12]. Therefore, with the rapid advancement in deep learning technology, more complex models are also being tried for financial time series prediction, including transformers [13], Generative Adversarial Networks (GANs) [14], Reinforcement Learning (RL) [15], and so on. Meanwhile, to overcome deficiencies associated with individual models, a series of hybrid models have also been implemented to enhance the prediction accuracy of financial time series [3,5]. ...
Article
Financial time series prediction is a fundamental problem in investment and risk management. Deep learning models, such as multilayer perceptrons, Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM), have been widely used in modeling time series data by incorporating historical information. Among them, LSTM has shown excellent performance in capturing long-term temporal dependencies in time series data, owing to its enhanced internal memory mechanism. In spite of the success of these models, it is observed that they fail to perform well in the presence of sharp change points. To address this problem, we propose, in this article, an innovative financial time series prediction method inspired by the Deep Operator Network (DeepONet) architecture, which uses a combination of a transformer architecture and a one-dimensional CNN for processing feature-based information, followed by an LSTM-based network for processing temporal information. It is therefore named the CNN–LSTM–Transformer (CLT) model. It not only incorporates external information to identify latent patterns within the financial data but also excels in capturing their temporal dynamics. The CLT model adapts to evolving market conditions by leveraging diverse deep learning techniques, and this dynamic adaptation plays a pivotal role in navigating abrupt changes in the financial markets. Furthermore, the CLT model improves long-term prediction accuracy and stability compared with state-of-the-art deep learning models and also mitigates the adverse effects of market volatility. The experimental results show the feasibility and superiority of the proposed CLT model in terms of prediction accuracy and robustness as compared to existing prediction models. Moreover, we posit that the innovation encapsulated in the proposed DeepONet-inspired CLT model also holds promise for applications beyond the confines of finance, such as remote sensing, data mining, natural language processing, and so on.
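A rough sketch of such a hybrid is given below, assuming a simple serial composition of a 1-D convolution, a Transformer encoder, and an LSTM; the layer sizes and the wiring are illustrative assumptions, and the paper's DeepONet-inspired design may arrange these components differently.

import torch
import torch.nn as nn

class HybridSketch(nn.Module):
    # Illustrative CNN + Transformer + LSTM stack for one-step-ahead
    # prediction; all dimensions here are assumptions for illustration.
    def __init__(self, n_features, d_model=64):
        super().__init__()
        self.conv = nn.Conv1d(n_features, d_model, kernel_size=3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                      # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local features
        h = self.encoder(h)                    # cross-time attention
        h, _ = self.lstm(h)                    # temporal memory
        return self.head(h[:, -1])             # next-value prediction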
... There is another class of deep learning models, known as Generative Adversarial Networks (GANs) [48][49][50], which can produce highly realistic images. However, they can also inadvertently introduce artifacts resembling real structures [51][52][53], such as spurious vessel-like patterns in PA imaging. These spurious features arise because the generator overlearns certain features in an attempt to fool the discriminator, leading to structures that resemble real vessels but are not present in the original image. ...
Article
Recent advances in Light Emitting Diode (LED) technology have enabled a more affordable, high-frame-rate photoacoustic (PA) imaging alternative to traditional laser-based PA systems, which are costly and have slow pulse repetition rates. However, a major disadvantage of LEDs is their low energy output, which does not produce high signal-to-noise ratio (SNR) PA images. There have been recent advances in integrating deep learning methodologies aimed at improving SNR in LED-PA images, yet comprehensive evaluations across varied datasets and architectures are lacking. In this study, we systematically assess the efficacy of various Encoder-Decoder-based CNN architectures for enhancing SNR in real-time LED-based PA imaging. Through experimentation with in vitro phantoms, ex vivo mouse organs, and in vivo tumors, we compare basic convolutional autoencoder and U-Net architectures, explore hierarchical depth variations within U-Net, and evaluate advanced variants of U-Net. Our findings reveal that while U-Net architectures generally exhibit comparable performance, the Dense U-Net model shows promise in denoising different noise distributions in the PA image. Notably, hierarchical depth variations did not significantly impact performance, emphasizing the efficacy of the standard U-Net architecture for practical applications. Moreover, the study underscores the importance of evaluating robustness to diverse noise distributions, with Dense U-Net and R2 U-Net demonstrating resilience to Gaussian, salt and pepper, Poisson, and speckle noise types. These insights inform the selection of appropriate deep learning architectures based on application requirements and resource constraints, contributing to advances in PA imaging technology.
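The robustness comparison described above depends on injecting standard noise models into test images. A minimal NumPy sketch of the four noise types named in the abstract follows; the parameter values are illustrative, not those used in the study.

import numpy as np

rng = np.random.default_rng(0)

def add_noise(img, kind, amount=0.05):
    # img: float array scaled to [0, 1]; returns a noisy copy.
    if kind == "gaussian":
        out = img + rng.normal(0.0, amount, img.shape)
    elif kind == "salt_pepper":
        out = img.copy()
        mask = rng.random(img.shape)
        out[mask < amount / 2] = 0.0        # pepper
        out[mask > 1 - amount / 2] = 1.0    # salt
    elif kind == "poisson":
        out = rng.poisson(img * 255.0) / 255.0  # signal-dependent noise
    elif kind == "speckle":
        out = img * (1.0 + rng.normal(0.0, amount, img.shape))
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return np.clip(out, 0.0, 1.0)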