Article


## Abstract

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
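The claim that the optimal D equals 1/2 everywhere can be checked numerically on a toy case. The sketch below assumes small discrete distributions (the probability vectors are made up for illustration) and uses the value function V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] together with the optimal discriminator for a fixed generator, D*(x) = p_data(x) / (p_data(x) + p_g(x)). The function names are hypothetical, not taken from the paper.

```python
import math

def optimal_discriminator(p_data, p_g):
    # D*(x) = p_data(x) / (p_data(x) + p_g(x)) for each discrete outcome x
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value(p_data, p_g, d):
    # V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]
    return sum(pd * math.log(dx) + pg * math.log(1.0 - dx)
               for pd, pg, dx in zip(p_data, p_g, d))

# Illustrative (assumed) data distribution over three outcomes.
p_data = [0.2, 0.5, 0.3]

# Case 1: the generator matches the data distribution exactly.
matched = optimal_discriminator(p_data, p_data)   # every entry is 0.5
v_matched = value(p_data, p_data, matched)        # approximately -log(4)

# Case 2: a mismatched generator; the best discriminator does better than chance.
p_g = [0.6, 0.3, 0.1]
mismatched = optimal_discriminator(p_data, p_g)
v_mismatched = value(p_data, p_g, mismatched)

print(matched, v_matched)
print(mismatched, v_mismatched)
```

When the generator matches the data, the value at the optimal discriminator sits at its minimum of -log 4; any mismatch lets the discriminator push the value above that, which is the gap the generator is trained to close.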

... The most successful recent deep neural networks for image colorizing are based on CNNs. Also, in the context of deep learning, Goodfellow et al. (2014) introduced the generative adversarial network (GAN), which is very useful for image colorization and a variety of other problems. ...
... Moreover, as mentioned above, most researchers employed CNN models trained on colored image datasets. However, in the context of deep learning, a successful solution is the generative adversarial network (GAN) introduced by Goodfellow et al. (2014). It includes two neural networks, a generative and a discriminative network, which compete with each other. ...
... Generative Adversarial Network (GAN) is a very successful approach proposed by Goodfellow et al. (2014). It consists of two deep neural networks: the generator (G) and the discriminator (D). The former tries to generate fake images similar to the target dataset images. The discriminator, a classifier, tries to distinguish between fake generated images and real training images. They follow a continuous two-player minimax game where the gen ...
Article
Multi-GANs, inspired by traditional GAN, divide each problem space into several smaller and more homogeneous subspaces. It is an architecture of multiple generative adversarial networks that work together to achieve the highest output quality. This paper presents Advanced Multi-GANs architecture for colorization based on two novelties, including the cluster numbers and the color harmonies. Advanced Multi-GANs can intelligently decide the number of clusters using the input test image and its scene complexity, leading to much more realistic colorization. Also, color harmony, which defines a rational relation between pixels of frames and their generated colors, is proposed to keep the harmony of the colors among a sequence of frames in video colorizing. Color harmony helps avoid changing the colors of the same objects between video frames. In experimental results, the evaluation of this study with several protocols, including image and video colorization, is provided. In addition to visual qualitative evaluation, the performance of the proposed method is quantitatively measured in the Advanced Multi-GAN framework. The experimental results show much more realistic outputs in comparison to the traditional approaches and state-of-the-art.
... The only room for attack is through providing poisoned data to the local discriminator, as shown in Fig. 1. Adversarial Motivation: A vanilla GAN optimizes its loss function in the manner outlined in [8], where the discriminator seeks to maximize the accuracy of the real and fake image classification while the generator seeks to minimize the likelihood that its generated image will be classified as fake. Specifically, the objective is written as follows: ...
... The optimization of GAN is recognized to be difficult, nevertheless, because log(1 − D(G(z))) is likely to saturate early in learning, when the generator is still poor [8]. ...
... One practical solution is to replace the minimax loss (Eq. (1)) of the vanilla GAN [8] with the Wasserstein distance to regularize GAN training, owing to its uniform gradient throughout [1]. To further confine the loss function within 1-Lipschitz, we propose to use WGAN with gradient penalty (WGAN-DP) [9] as the local image generation model. ...
Preprint
Deep learning-based image synthesis techniques have been applied in healthcare research for generating medical images to support open research. Training generative adversarial neural networks (GAN) usually requires large amounts of training data. Federated learning (FL) provides a way of training a central model using distributed data from different medical institutions while keeping raw data local. However, FL is vulnerable to backdoor attacks, an adversarial attack that poisons training data, given that the central server cannot access the original data directly. Most backdoor attack strategies focus on classification models and centralized domains. In this study, we propose a way of attacking federated GAN (FedGAN) by treating the discriminator with a commonly used data poisoning strategy in backdoor attack classification models. We demonstrate that adding a small trigger with size less than 0.5 percent of the original image size can corrupt the FL-GAN model. Based on the proposed attack, we provide two effective defense strategies: global malicious detection and local training regularization. We show that combining the two defense strategies yields robust medical image generation.
... Generative adversarial nets (GAN) [11], proposed by Goodfellow in 2014, are a network structure that uses input data to generate synthetic data. So far, many GAN-based image inpainting algorithms have been developed. ...
... Then we apply the alternating direction method of multipliers (ADMM) to solve model (11). Moreover, because the variables of model (11) are inter-dependent, auxiliary variables are imposed to simplify the optimization. Then the model can be written as: ...
Article
For a damaged image, recovering an image with entire rows or columns missing is a challenging problem arising in many real applications, such as digital image inpainting. For this kind of missing information, the diffusion-based inpainting methods tend to produce blur, the exemplar-based methods are prone to filling errors and the neural network-based methods are highly dependent on data. Many existing approaches formulate this problem as a general low-rank matrix approximation problem, which cannot handle this special structural missing pattern very well. In this paper, we propose a novel image inpainting algorithm named nonlocal low-rank tensor completion (NLLRTC) based on the nonlocal self-similarity prior and the low-rank prior. By using the nonlocal self-similarity of image patches, we directly stack these patches into a three-dimensional similar tensor instead of pulling them into column vectors, then the similar tensor can be completed by tensor ring (TR) decomposition. By leveraging the alternating direction method under the augmented Lagrangian multiplier framework, the optimization results can be obtained. Moreover, a weighted nuclear norm is added to the tensor completion model to achieve better inpainting performance, which we call the weighted nonlocal low-rank tensor completion (WNLLRTC) algorithm. Our empirical studies show encouraging results on both quantitative assessment and visual interpretation of our proposed methods in comparison to some state-of-the-art algorithms.
... In recent years, recurrent neural networks (RNNs), variational autoencoders (VAEs) [6] and generative adversarial networks (GANs) [7] have been used for anomaly detection of multivariate KPIs and have achieved good results. RNNs (such as long short-term memory (LSTM) [8], gated recurrent units (GRUs), etc.) are often used as the base model of VAEs (i.e., the encoder and decoder) and GANs (i.e., the generator and discriminator) to capture the time dependence of multivariate KPIs. ...
... In the anomaly detection method based on KPIs, RNNs, VAEs [6] and GANs [7] have been widely used. RNNs (e.g., LSTM, GRU networks) can process time series data and capture the time dependence of entity KPI data. ...
... (1) Accuracy rate: represents the accuracy of the model prediction results: $\text{precision} = \frac{TP}{TP + FP}$ (7) (2) Recall rate: represents the ratio of the positive cases found in the model to the real positive cases: ...
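The two metrics named in the excerpt above can be sketched as follows. Precision follows the quoted formula, TP / (TP + FP); the recall formula is elided in the excerpt, so the standard definition TP / (TP + FN) is assumed here. The function names, the example counts, and the convention of returning 0.0 for an empty denominator are illustrative assumptions, not taken from the cited paper.

```python
def precision(tp: int, fp: int) -> float:
    # Fraction of predicted positives that are true positives: TP / (TP + FP).
    # Returning 0.0 when nothing was predicted positive is an assumed convention.
    total = tp + fp
    return tp / total if total else 0.0

def recall(tp: int, fn: int) -> float:
    # Fraction of real positives that were found: TP / (TP + FN) (standard definition).
    # Returning 0.0 when there are no real positives is an assumed convention.
    total = tp + fn
    return tp / total if total else 0.0

# Hypothetical counts from an anomaly detector's confusion matrix.
tp, fp, fn = 80, 20, 40
print(precision(tp, fp))  # 80 / 100
print(recall(tp, fn))     # 80 / 120
```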
Article
In a large-scale cloud environment, many key performance indicators (KPIs) of entities are monitored in real time. These multivariate time series consist of high-dimensional, high-noise, random and time-dependent data. As a common method implemented in artificial intelligence for IT operations (AIOps), time series anomaly detection has been widely studied and applied. However, the existing detection methods cannot fully consider the influence of multiple factors and cannot quickly and accurately detect anomalies in multivariate KPIs of entities. Concurrently, fine-grained root cause locations cannot be determined for detected anomalies and often require abundant normal data that are difficult to obtain for model training. To solve these problems, we propose a long short-term memory (LSTM)-based semisupervised variational autoencoder (VAE) anomaly detection strategy called LR-SemiVAE. First, LR-SemiVAE uses VAE to perform feature dimension reduction and reconstruction of multivariate time series data and judges whether the entity is abnormal by calculating the reconstruction probability score. Second, by introducing an LSTM network into the VAE encoder and decoder, the model can fully learn the time dependence of multivariate time series. Then, LR-SemiVAE predicts the data labels by introducing a classifier to reduce the dependence on the original labeled data during model training. Finally, by proposing a new evidence lower bound (ELBO) loss function calculation method, LR-SemiVAE pays attention to the normal pattern and ignores the abnormal pattern during training to reduce the time cost of removing random anomaly and noise data. However, due to the limitations of LSTM in learning the long-term dependence of time series data, based on LR-SemiVAE, we propose a transformer-based semisupervised VAE anomaly detection and location strategy called RT-SemiVAE for cluster systems with complex service dependencies. 
This method learns the long-term dependence of multivariate time series by introducing a parallel multihead attention mechanism transformer, while LSTM is used to capture short-term dependence, and the introduction of parallel computing also markedly reduces model training time. After RT-SemiVAE detects entity anomalies, it traces the root entities according to the obtained service dependence graph and locates the root causes at the indicator level. We verify the strategies by using public data sets and constructing a system prototype. Experimental results show that compared with existing baseline methods, the LR-SemiVAE and RT-SemiVAE strategies can detect anomalies more quickly and accurately and perform fine-grained and accurate localization of the root causes of anomalies.
... GAN [17,23,27,60] is widely used for image enhancement and synthesis. GAN adopts adversarial training [17] to learn a generator ( ) and a discriminator ( ) simultaneously. ...
... We follow the common practice [17,23,27,55,60] to design the overall architecture of GAN. However, two key changes are introduced to our model to make it more suitable for opportunistic enhancement in video analytics. ...
Preprint
Edge computing is being widely used for video analytics. To alleviate the inherent tension between accuracy and cost, various video analytics pipelines have been proposed to optimize the usage of GPU on edge nodes. Nonetheless, we find that GPU compute resources provisioned for edge nodes are commonly under-utilized due to video content variations, subsampling and filtering at different places of a pipeline. As opposed to model and pipeline optimization, in this work, we study the problem of opportunistic data enhancement using the non-deterministic and fragmented idle GPU resources. Specifically, we propose a task-specific discrimination and enhancement module and a model-aware adversarial training mechanism, providing a way to identify and transform low-quality images that are specific to a video pipeline in an accurate and efficient manner. A multi-exit model structure and a resource-aware scheduler are further developed to make online enhancement decisions and fine-grained inference execution under latency and GPU resource constraints. Experiments across multiple video analytics pipelines and datasets reveal that by judiciously allocating a small amount of idle resources on frames that tend to yield greater marginal benefits from enhancement, our system boosts DNN object detection accuracy by $7.3-11.3\%$ without incurring any latency costs.
... After building the data set in ZSL format, we transplanted some classical methods in ZSL, which include the projection methods DAP [9] and IAP [9], and the generation method [13] based on GAN [14]. These methods establish the relationship between attributes and features, so as to generalize Chinese characters to Bai characters and finally improve the accuracy [10]. ...
... Learning. Currently, generative approaches dominate in GZSL, which exploit existing adversarial generative networks (GAN) [14,17,18] or variational autoencoders (VAE) [15,19,20] so that visual characteristics from class-level semantic attributes and random noise can be synthesized. f-CLSWGAN [13], cycle-UWGAN [21], and LisGAN [22] introduce the Wasserstein generative adversarial network (WGAN) [23] coupled with a pretrained classifier so that visual characters for invisible characteristics can be synthesized, thus allowing the GZSL work to deteriorate into a fully supervised issue for categorization. ...
... Generative Adversarial Network. The generative adversarial network (GAN) [13,14] proposes the feature generative framework for zero-shot learning. We introduce it into zero-shot Bai character recognition. ...
Article
When talking about Bai nationality, people are impressed by its long history and the language it has created. However, since fewer people of the young generation learn the traditional language, the glorious Bai culture becomes less known, making understanding Bai characters difficult. Based on the highly precise character recognition model for Bai characters, the paper is aimed at helping people read books written in Bai characters so as to popularize the culture. To begin with, a data set is built with the support of Bai culture fans and experts. However, the data set is not large enough as knowledge in this respect is limited. This makes the deep learning model less accurate since it lacks sufficient data. The popular zero-shot learning (ZSL) is adopted to overcome the insufficiency of data sets. We use Chinese characters as the seen class, Bai characters as the unseen class, and the number of strokes as the attribute to construct the ZSL format data set. However, the existing ZSL methods ignore the character structure information, so a generation method based on variational autoencoder (VAE) is put forward, which can automatically capture the character structure information. Experimental results show that the method facilitates the recognition of Bai characters and makes it more precise.
... In this systematic review, we survey works that generate realistic synthetic 3D data with Generative Adversarial Networks (GANs) Goodfellow et al. [2014]. With the massive increase of data-driven algorithms, such as deep learning-based approaches, during the last years Egger et al. [2021, 2022], data is of great interest. ...
... We performed a search in the IEEE Xplore Digital Library, Scopus, PubMed, and Web of Science with the search query ' (("Generative Adversarial Network" OR "Generative Adversarial Networks" OR gan OR gans) AND (generation OR generative) AND data AND synthetic)' to find specific papers on the use of GANs for volumetric data generation. Since GANs were presented in 2014 by Goodfellow et al. [2014], all papers prior to 2014 were excluded. ...
... (1) However, this equation has the problem of saturation in minimising the loss of the generator log(1 − D(G(z))). To solve this problem, Goodfellow et al. [2014] propose to maximize log(D(G(z))) instead. This technique is also known as a non-saturating GAN. ...
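The saturation problem described in the excerpt above can be illustrated with a small calculation. The sketch assumes the discriminator output is a sigmoid of a scalar logit a, so D(G(z)) = sigmoid(a), and compares the gradient with respect to a of the saturating generator loss log(1 − D(G(z))) against that of the non-saturating loss −log(D(G(z))). The function names and the chosen logit values are hypothetical.

```python
import math

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a: float) -> float:
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def non_saturating_grad(a: float) -> float:
    # d/da [-log(sigmoid(a))] = sigmoid(a) - 1
    return sigmoid(a) - 1.0

# A very negative logit means the discriminator confidently rejects the sample,
# which is typical early in training when the generator is still poor.
for a in (-6.0, -2.0, 0.0, 2.0):
    print(a, saturating_grad(a), non_saturating_grad(a))
```

For strongly negative logits the saturating gradient is close to zero, so the generator receives almost no learning signal, while the non-saturating gradient stays close to −1 in the same regime.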
Preprint
Data has become the most valuable resource in today's world. With the massive proliferation of data-driven algorithms, such as deep learning-based approaches, the availability of data is of great interest. In this context, high-quality training, validation and testing datasets are particularly needed. Volumetric data is a very important resource in medicine, as it ranges from disease diagnoses to therapy monitoring. When the dataset is sufficient, models can be trained to help doctors with these tasks. Unfortunately, there are scenarios and applications where large amounts of data are unavailable. For example, in the medical field, rare diseases and privacy issues can lead to restricted data availability. In non-medical fields, the high cost of obtaining a sufficient amount of high-quality data can also be a concern. A solution to these problems can be the generation of synthetic data to perform data augmentation in combination with other more traditional methods of data augmentation. Therefore, most of the publications on 3D Generative Adversarial Networks (GANs) are within the medical domain. The existence of mechanisms to generate realistic synthetic data is a good asset to overcome this challenge, especially in healthcare, as the data must be of good quality and close to reality, i.e. realistic, and without privacy issues. In this review, we provide a summary of works that generate realistic 3D synthetic data using GANs. We therefore outline GAN-based methods in these areas with common architectures, advantages and disadvantages. We present a novel taxonomy, evaluations, challenges and research opportunities to provide a holistic overview of the current state of GANs in medicine and other fields.
... The rapid progress of deep learning technology (Hinton and Salakhutdinov, 2006) and deep convolutional neural networks (CNNs) has led to many new applications in computer vision and image processes. The emergence of generative adversarial networks (GANs) (Goodfellow et al., 2014) has brought almost a leap in image generation, inpainting, repair, and completion. A conditional generative adversarial net (CGAN) (Mirza and Osindero, 2014) can generate custom outputs by adding class information to the model. ...
... The deep convolutional generative adversarial network structure (Goodfellow et al., 2014; Radford et al., 2016) is adopted as the main body of the model to reconstruct high-quality and high-resolution images from low-quality microscopic cell images. ...
Article
Long-term live-cell imaging technology has emerged in the study of cell culture and development, and it is expected to elucidate the differentiation or reprogramming morphology of cells and the dynamic process of interaction between cells. There are some advantages to this technique: it is noninvasive, high-throughput, low-cost, and it can help researchers explore phenomena that are otherwise difficult to observe. Many challenges arise in the real-time process, for example, low-quality micrographs are often obtained due to unavoidable human factors or technical factors in the long-term experimental period. Moreover, some core dynamics in the developmental process are rare and fleeting in imaging observation and difficult to recapture again. Therefore, this study proposes a deep learning method for microscope cell image enhancement to reconstruct sharp images. We combine generative adversarial nets and various loss functions to make blurry images sharp again, which is much more convenient for researchers to carry out further analysis. This technology can not only make up the blurry images of critical moments of the development process through image enhancement but also allows long-term live-cell imaging to find a balance between imaging speed and image quality. Furthermore, the scalability of this technology makes the methods perform well in fluorescence image enhancement. Finally, the method is tested in long-term live-cell imaging of human-induced pluripotent stem cell-derived cardiomyocyte differentiation experiments, and it can greatly improve the image space resolution ratio.
... The current developments focus on the gaze redirection of video that is primarily implemented using the Convolutional Neural Network (CNN). Similar works exist; however, none has been developed by means of a Generative Adversarial Network (GAN) [9] [10]. GAN is essentially a new scheme for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. ...
... Generative Adversarial Networks [9] produce a model distribution which mimics a given target distribution. This method has been adopted in many applications like video prediction and generation [17], image super resolution [18], image in-painting and also in classical computer vision problems [19]. ...
Article
Gaze correction is a type of video re-synthesis problem that trains to redirect a person's eye gaze into camera by manipulating the eye area. It has many applications like video conferencing, movies, games and has a great future in medical fields such as to experiment with people having autism. Existing methods are incapable of gaze redirection of video using GAN. The suggested approach is based on the in-painting model to read from the face and fill the missed eye regions with new contents, reflecting corrected eye gaze in this paper. Both gaze estimation as well as gaze redirection have been implemented. The Hourglass model of CNN was used for gaze estimation and the Generative Adversarial Network(GAN) for video gaze redirection, in which two neural networks compete in a game to learn and produce new data with the same statistics as the training set. In addition, various losses were estimated such as discriminator, generator loss and perceptual loss in order to determine the accuracy of our model and evaluate the performance by adversarial divergence, reconstruction error and image quality measures. We demonstrate that the proposed method outperforms in terms of quality of the image and redirection precision in comprehensive tests.
... On the other hand, the data can be enhanced by using methods of data generation. For example, in the field of image recognition, many scholars improve the recognition capability of the model by augmenting the data set through generative adversarial networks (GAN) [24]. For example, Zhong et al. [25] transferred the labeled training image style to each camera by using CycleGAN and formed an enhanced training set together with the original training samples. ...
... GAN mainly consists of two parts, the generator and the discriminator [24]. The generator makes the data it generates more realistic by learning the distribution of real sample data, while the discriminator is used to distinguish the authenticity of the received data. ...
Article
At present, deep learning is widely used to predict the remaining useful life (RUL) of rotation machinery in failure prediction and health management (PHM). However, in the actual manufacturing process, massive rotating machinery data are not easily obtained, which will lead to the decline of the prediction accuracy of the data-driven deep learning method. Firstly, a novel prognostic framework is proposed, which is comprised of conditional Wasserstein distance-based generative adversarial networks (CWGAN) and adversarial convolution neural networks (AdCNN), which can stably generate high-quality training samples to augment the bearing degradation dataset and solve the problem of few samples. Then, the bearing RUL prediction method is realized by inputting the monitoring data into the one-dimensional convolutional neural network (1DCNN) for adversarial training. Via the bearing degradation dataset of the IEEE 2012 PHM data challenge, the reliability of the proposed method is verified. Finally, experimental results show that our approach is better than others in RUL prediction on average absolute deviation and average square root error.
... These models can capture key growth mechanisms of cities, but the generated form is often very different from the real situation because they ignore many complex high-dimensional features of urban morphology [24]. On the other hand, machine learning methods, especially deep generative models, have been shown to be able to approximate complicated, high-dimensional probability distributions [13] and have been widely used in urban science and geoscience in recent years [30,36]. Among deep generative models, the Generative Adversarial Network (GAN) has proved to be a suitable method to study spatial effects of geographical systems [1,2]. ...
... GAN has been demonstrated to be a powerful tool to fit high-dimensional complex distributions [13]. This advantage makes GAN good at modeling complex geospatial data with ubiquitous spatial effects (e.g., spatial dependence and heterogeneity). ...
Conference Paper
Simulating urban morphology with location attributes is a challenging task in urban science. Recent studies have shown that Generative Adversarial Networks (GANs) have the potential to shed light on this task. However, existing GAN-based models are limited by the sparsity of urban data and instability in model training, hampering their applications. Here, we propose a GAN framework with geographical knowledge, namely Metropolitan GAN (MetroGAN), for urban morphology simulation. We incorporate a progressive growing structure to learn hierarchical features and design a geographical loss to impose the constraints of water areas. Besides, we propose a comprehensive evaluation framework for the complex structure of urban systems. Results show that MetroGAN outperforms the state-of-the-art urban simulation methods by over 20% in all metrics. Inspiringly, using physical geography features singly, MetroGAN can still generate shapes of the cities. These results demonstrate that MetroGAN solves the instability problem of previous urban simulation GANs and is generalizable to deal with various urban attributes.
... Further, 3DMM cannot model geometry beyond face such as hair, clothing, glasses, and even ears and consequently the morphing results exhibit strong visual artifacts in respective regions. Recent implicit solutions, popularized by generative models [Goodfellow et al. 2014], can generate unseen portraits with convincing pose variations [Karras et al. 2019; Shen et al. 2020]. However, as a generator, they do not readily support restylizing a given portrait. ...
... The emerging implicit solutions, popularized by generative models [Goodfellow et al. 2014; Karras et al. 2019, 2020, 2021], have shown huge potential for free-view high-quality portrait synthesis. Although, as a generation task, they were not specifically designed for perspective retargeting, it is possible to tweak these solutions via neural inversion to support free-view rendering [Karras et al. 2019; Shen et al. 2020; Shen and Zhou 2021; Tewari et al. 2020]. ...
Preprint
In this work, we propose NARRATE, a novel pipeline that enables simultaneously editing portrait lighting and perspective in a photorealistic manner. As a hybrid neural-physical face model, NARRATE leverages complementary benefits of geometry-aware generative approaches and normal-assisted physical face models. In a nutshell, NARRATE first inverts the input portrait to a coarse geometry and employs neural rendering to generate images resembling the input, as well as producing convincing pose changes. However, the inversion step introduces mismatch, producing low-quality images with fewer facial details. As such, we further estimate portrait normal to enhance the coarse geometry, creating a high-fidelity physical face model. In particular, we fuse the neural and physical renderings to compensate for the imperfect inversion, resulting in both realistic and view-consistent novel perspective images. In the relighting stage, previous works focus on single-view portrait relighting but ignore consistency between different perspectives, leading to unstable and inconsistent lighting effects under view changes. We extend Total Relighting to fix this problem by unifying its multi-view input normal maps with the physical face model. NARRATE conducts relighting with consistent normal maps, imposing cross-view constraints and exhibiting stable and coherent illumination effects. We experimentally demonstrate that NARRATE achieves more photorealistic, reliable results than prior works. We further bridge NARRATE with animation and style transfer tools, supporting pose change, light change, facial animation, and style transfer, either separately or in combination, all at a photographic quality. We showcase vivid free-view facial animations as well as 3D-aware relightable stylization, which help facilitate various AR/VR applications like virtual cinematography, 3D video conferencing, and post-production.
... Researchers have developed techniques for the autonomous synthesis and augmentation of visual data during the last few years. Numerous techniques evolved from generative adversarial networks (GANs) [9] and variational autoencoders (VAEs) [10]. Additional data, such as conditioning labels, are included in these systems (e.g. ...
... It has been demonstrated that GANs [9] generate fake images with the same distribution as the target domain. Therefore, VGAN [27], a 3D convolutional GAN capable of concurrently generating all target video frames, was introduced by Vondrick et al. ...
Article
Significant advances have been made in facial image animation from a single image. Nonetheless, generating convincing facial feature movements remains a complex challenge in computer graphics. The purpose of this study is to develop an efficient and effective approach for transferring motion from a source video to a single facial image by governing the position and expression of the face in the video to generate a new video imitating the source image. Compared to prior methods that focus solely on manipulating facial expressions, this model has been trained to distinguish the moving foreground from the background image and to create motions such as facial rotation and translation as well as small local motions such as gaze shift. The proposed technique uses generative adversarial networks (GANs) with a motion transfer model. The network forecasts photorealistic video frames for a given target image using synthetic input in renderings from a parametric face model. The authenticity in this postprocessing conversion is attained by precise image manipulation. Thorough adversarial training is used to produce greater accuracy in this postprocessing conversion. Although more improvements to face landmark identification on videos and face super-resolution techniques have been made to improve the results, the proposed technique can provide more coherent videos with improved visual quality, resulting in more aligned landmark sequences for training. In addition, experiments indicate that we obtain superior results compared to those obtained by the state-of-the-art image-driven technique with PSNR 30.74 and SSIM 0.90.
... GANs involve data synthesis where the synthesized data point is approximated with the actual data point. These are used with an encoder-decoder architecture Goodfellow et al. [2014] where an encoder first takes input and projects it to a latent space. A decoder then takes the latent features and projects it to the original space, attempting to reconstruct the original data point. ...
... learning utilizes the generative power of neural networks to synthesize video Vondrick et al. [2016]. These tasks originated in the image domain with the most common technique being Generative Adversarial Networks (GANs) Goodfellow et al. [2014], Zhu et al. [2017a]. These generative approaches for images have been extended to video in Srivastava et al. [2015], Tulyakov et al. [2018], Vondrick et al. [2016] ...
Preprint
The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, the use of human-generated annotations leads to models with biased learning, poor domain generalization, and poor robustness. Obtaining annotations is also expensive and requires great effort, which is especially challenging for videos. As an alternative, self-supervised learning provides a way to learn representations without annotations and has shown promise in both image and video domains. Unlike in the image domain, learning video representations is more challenging due to the temporal dimension, which brings in motion and other environmental dynamics. This also provides opportunities for video-specific ideas that can advance self-supervised learning in the video and multimodal domains. In this survey, we provide a review of existing approaches to self-supervised learning, focusing on the video domain. We summarize these methods into three categories based on their learning objectives: pre-text tasks, generative modeling, and contrastive learning. These approaches also differ in terms of the modalities being used: video, video-audio, video-text, and video-audio-text. We further introduce the commonly used datasets, downstream evaluation tasks, insights into the limitations of existing works, and potential future directions in this area.
... Generative models can be trained to produce samples that mimic hard-to-model channel conditions. Applications of deep generative models in the form of variational autoencoders (VAEs) [44] and generative adversarial networks (GANs) [53] were specifically reported in the context of end-to-end simulation of wireless systems in [54], [55] and for channel modeling in [56]- [59] for earlier applications to satellite communications. ...
Preprint
Full-text available
This work takes a critical look at the application of conventional machine learning methods to wireless communication problems through the lens of reliability and robustness. Deep learning techniques adopt a frequentist framework, and are known to provide poorly calibrated decisions that do not reproduce the true uncertainty caused by limitations in the size of the training data. Bayesian learning, while in principle capable of addressing this shortcoming, is in practice impaired by model misspecification and by the presence of outliers. Both problems are pervasive in wireless communication settings, in which the capacity of machine learning models is subject to resource constraints and training data is affected by noise and interference. In this context, we explore the application of the framework of robust Bayesian learning. After a tutorial-style introduction to robust Bayesian learning, we showcase the merits of robust Bayesian learning on several important wireless communication problems in terms of accuracy, calibration, and robustness to outliers and misspecification.
... This problem can be widely used in various applications, i.e., image editing, interactive painting and content generation. Recent work [24,33,43,46,48,50] mainly follows the adversarial learning paradigm, where the network is trained with adversarial loss [7], along with a reconstruction loss. By exploring the model architectures, these methods gradually improve performance on the benchmark datasets. ...
Preprint
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs). Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches, which may lead to unsatisfactory quality or diversity of generated images. In this paper, we propose a novel framework based on DDPM for semantic image synthesis. Unlike previous conditional diffusion models, which directly feed the semantic layout and noisy image as input to a U-Net structure and may not fully leverage the information in the input semantic mask, our framework processes the semantic layout and the noisy image differently. It feeds the noisy image to the encoder of the U-Net structure and the semantic layout to the decoder through multi-layer spatially-adaptive normalization operators. To further improve the generation quality and semantic interpretability in semantic image synthesis, we introduce the classifier-free guidance sampling strategy, which acknowledges the scores of an unconditional model in the sampling process. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our proposed method, achieving state-of-the-art performance in terms of fidelity (FID) and diversity (LPIPS).
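The classifier-free guidance step mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common convention, in which the guided noise estimate is the unconditional prediction plus a scaled difference between the conditional and unconditional predictions; the function name `guided_noise` and the parameter `scale` are hypothetical and are not taken from the cited paper, whose exact formulation may differ.

```python
from typing import List, Sequence


def guided_noise(eps_cond: Sequence[float],
                 eps_uncond: Sequence[float],
                 scale: float) -> List[float]:
    """Combine conditional and unconditional noise predictions.

    Assumed convention (not confirmed by the source):
        eps = eps_uncond + scale * (eps_cond - eps_uncond)
    scale = 0 gives the unconditional prediction, scale = 1 the
    conditional one, and scale > 1 extrapolates past the conditional one.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Toy two-element "noise predictions" standing in for U-Net outputs.
    print(guided_noise([1.0, 2.0], [0.0, 1.0], 2.0))
```

In a real sampler the two predictions would come from the same network evaluated with and without the semantic layout; here they are plain lists so that the arithmetic is easy to check.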
... Generative Adversarial Networks. Generative Adversarial Networks (GANs) [14] are powerful generative models which learn a distribution that mimics a given target distribution. They have been applied to many fields, such as low-level image processing tasks (e.g., image inpainting [15], [16], image super-resolution [17]- [19]), semantic and style transfer (e.g., image translation [20]- [26], image attribute manipulation [27]- [32], person image synthesis [33]- [37], image manipulation [38]). ...
Preprint
Full-text available
This paper proposes a gaze correction and animation method for high-resolution, unconstrained portrait images, which can be trained without the gaze angle and the head pose annotations. Common gaze-correction methods usually require annotating training data with precise gaze, and head pose information. Solving this problem using an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels. To address this issue, we first create two new portrait datasets: CelebGaze and high-resolution CelebHQGaze. Second, we formulate the gaze correction task as an image inpainting problem, addressed using a Gaze Correction Module (GCM) and a Gaze Animation Module (GAM). Moreover, we propose an unsupervised training strategy, i.e., Synthesis-As-Training, to learn the correlation between the eye region features and the gaze angle. As a result, we can use the learned latent space for gaze animation with semantic interpolation in this space. Moreover, to alleviate both the memory and the computational costs in the training and the inference stage, we propose a Coarse-to-Fine Module (CFM) integrated with GCM and GAM. Extensive experiments validate the effectiveness of our method for both the gaze correction and the gaze animation tasks in both low and high-resolution face datasets in the wild and demonstrate the superiority of our method with respect to the state of the arts. Code is available at https://github.com/zhangqianhui/GazeAnimationV2
... Generative models have been attracting much interest in the audio community due to their ability to learn the underlying audio distribution. Generative adversarial networks (GANs) (Goodfellow et al. 2014), variational autoencoders (VAEs) (Kingma and Welling 2013), and autoregressive models (Shannon et al. 2012) are extensively investigated in the speech and audio processing scientific community. Specifically, they are used to synthesise audio signals from a low-dimensional representation to a high-resolution signal (Hsu et al. 2017; Ma et al. 2019;. ...
Article
Full-text available
Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence (AI) by endowing autonomous systems with high levels of understanding of the real world. Currently, deep learning (DL) is enabling DRL to effectively solve various intractable problems in various fields including computer vision, natural language processing, healthcare, robotics, to name a few. Most importantly, DRL algorithms are also being employed in audio signal processing to learn directly from speech, music and other sound signals in order to create audio-based autonomous systems that have many promising applications in the real world. In this article, we conduct a comprehensive survey on the progress of DRL in the audio domain by bringing together research studies across different but related areas in speech and music. We begin with an introduction to the general field of DL and reinforcement learning (RL), then progress to the main DRL methods and their applications in the audio domain. We conclude by presenting important challenges faced by audio-based DRL agents and by highlighting open areas for future research and investigation. The findings of this paper will guide researchers interested in DRL for the audio domain.
... This data enhancement approach is based on machine learning models proposed by Ian Goodfellow et al. [29]. The algorithm uses an unsupervised Generative Adversarial Network (GAN) pre-trained on the ImageNet dataset [30] and then trained on several datasets [31,32,33,34] to improve input image lighting. ...
Article
Localization is one of the most critical tasks for an autonomous vehicle, as position information is required to understand its surroundings and move accordingly. In the last years, Visual Odometry (VO) has shown promising results among different localization approaches. However, VO algorithms are usually evaluated in standard datasets and benchmarks with favorable perceptual conditions. This research aims to explore the performance of state-of-the-art VO algorithms in a perceptually challenging environment: an urban underground railway scenario. This domain includes complex conditions defined by significant light changes, insufficient illumination and textures, or non-Lambertian surfaces that challenge VO. First, a dataset of an urban underground railway scenario has been generated, designing an algorithm that synchronizes camera frames, train odometry data, the gradient profile of the scenario railway, and Geomap coordinate trajectories. Then, the performance of DF-VO and ORB-SLAM2 is evaluated in the generated dataset. Finally, in order to explore how to overcome the challenging lighting conditions of the scenario, the application of the data enhancement algorithm EnlightenGAN is proposed. The results show that enhancing the dataset with EnlightenGAN improves the performance of both algorithms and reduces the dispersion of ORB-SLAM2 estimations among different runs.
... The GTDA method obtained the optimal classification result on the target domain using Nash equilibrium [17]. Wulfmeier et al. [18] adopted Generative Adversarial Networks (GANs) [19] to align the features across domains. ...
Preprint
In this paper, we address the Online Unsupervised Domain Adaptation (OUDA) problem and propose a novel multi-stage framework to solve real-world situations when the target data are unlabeled and arriving online sequentially in batches. To project the data from the source and the target domains to a common subspace and manipulate the projected data in real-time, our proposed framework institutes a novel method, called an Incremental Computation of Mean-Subspace (ICMS) technique, which computes an approximation of the mean-target subspace on a Grassmann manifold and is proven to be a close approximation of the Karcher mean. Furthermore, the transformation matrix computed from the mean-target subspace is applied to the next target data in the recursive-feedback stage, aligning the target data closer to the source domain. The computation of the transformation matrix and the prediction of the next-target subspace improve the performance of the recursive-feedback stage by considering the cumulative temporal dependency among the flow of the target subspace on the Grassmann manifold. The labels of the transformed target data are predicted by the pre-trained source classifier, then the classifier is updated by the transformed data and predicted labels. Extensive experiments on six datasets were conducted to investigate in depth the effect and contribution of each stage in our proposed framework and its performance over previous approaches in terms of classification accuracy and computational speed. In addition, the experiments on traditional manifold-based learning models and neural-network-based learning models demonstrated the applicability of our proposed framework for various types of learning models.
... Recent attempts [3,7,13] leverage solutions based on Generative Adversarial Networks [4] or Variational Autoencoders [8]. Although those methods offer considerable speed-up of the simulation process, they also suffer from the limitations of existing generative models. ...
Preprint
Full-text available
Generative Adversarial Networks (GANs) are powerful models able to synthesize data samples closely resembling the distribution of real data, yet the diversity of those generated samples is limited due to the so-called mode collapse phenomenon observed in GANs. Especially prone to mode collapse are conditional GANs, which tend to ignore the input noise vector and focus on the conditional information. Recent methods proposed to mitigate this limitation increase the diversity of generated samples, yet they reduce the performance of the models when similarity of samples is required. To address this shortcoming, we propose a novel method to selectively increase the diversity of GAN-generated samples. By adding a simple, yet effective regularization to the training loss function we encourage the generator to discover new data modes for inputs related to diverse outputs while generating consistent samples for the remaining ones. More precisely, we maximise the ratio of distances between generated images and input latent vectors scaling the effect according to the diversity of samples for a given conditional input. We show the superiority of our method in a synthetic benchmark as well as a real-life scenario of simulating data from the Zero Degree Calorimeter of ALICE experiment in LHC, CERN.
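The regularization idea in this abstract, maximising the ratio of distances between generated samples and their input latent vectors, can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance, a per-condition scalar `diversity_weight`, and the names `distance_ratio` and `diversity_penalty` are all hypothetical and do not reproduce the authors' actual loss.

```python
import math
from typing import Sequence


def distance_ratio(x1: Sequence[float], x2: Sequence[float],
                   z1: Sequence[float], z2: Sequence[float],
                   eps: float = 1e-8) -> float:
    """Ratio of the distance between two generated samples to the
    distance between the latent vectors that produced them.
    eps guards against division by zero for identical latents."""
    return math.dist(x1, x2) / (math.dist(z1, z2) + eps)


def diversity_penalty(x1: Sequence[float], x2: Sequence[float],
                      z1: Sequence[float], z2: Sequence[float],
                      diversity_weight: float) -> float:
    """Term added to a generator loss that is being minimised.

    The sign is negative so that minimising the loss maximises the ratio.
    diversity_weight (assumed to lie in [0, 1]) scales the effect for
    conditions whose real samples are more diverse."""
    return -diversity_weight * distance_ratio(x1, x2, z1, z2)


if __name__ == "__main__":
    # Two toy "generated images" and the latent vectors behind them.
    print(diversity_penalty([0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [0.0, 1.0], 0.5))
```

With a weight of zero the term vanishes, which matches the stated goal of keeping samples consistent for conditions where similarity is required.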
... Deep learning shows great power and enables significant opportunities in financial modeling and risk management; see for instance Heaton et al. [18] for financial prediction, Min and Hu [20], Hu [19] and Cao et al. [8] for solving stochastic differential games, Wise et al. [23] for deep hedging, to list a few. A very popular subclass of deep neural networks is the generative adversarial network (GAN) [17], which consists of two neural networks that contest with each other in a game. We refer the reader to Ni et al. [21] for an alternative GAN method with application to sequential data generation. ...
Preprint
The aim of this paper is to study a new methodological framework for systemic risk measures by applying deep learning methods as a tool to compute the optimal strategy of capital allocations. Under this new framework, systemic risk measures can be interpreted as the minimal amount of cash that secures the aggregated system by allocating capital to the single institutions before aggregating the individual risks. This problem has no explicit solution except in very limited situations. Deep learning is increasingly receiving attention in financial modeling and risk management, and we propose deep-learning-based algorithms to solve both the primal and dual problems of the risk measures, and thus to learn the fair risk allocations. In particular, our method for the dual problem involves a training philosophy inspired by the well-known Generative Adversarial Networks (GAN) approach and a newly designed direct estimation of the Radon-Nikodym derivative. We close the paper with substantial numerical studies of the subject and provide interpretations of the risk allocations associated with the systemic risk measures. In the particular case of exponential preferences, numerical experiments demonstrate excellent performance of the proposed algorithm when compared with the optimal explicit solution as a benchmark.
... Structure-aware shape generation. Deep generative models [17], [18] are a powerful approach to shape generation tasks and have achieved tremendous success in just a few years. For example, GRASS [19] employs a generative recursive autoencoder to encode part structure in a hierarchical way. ...
Article
Full-text available
It is desirable to make robots capable of automatic assembly. Structural understanding of object parts plays a crucial role in this task yet remains relatively unexplored. In this paper, we focus on the setting of furniture assembly from a complete set of part geometries, which is essentially a 6-DoF part pose estimation problem. We propose a multi-layer transformer-based framework that involves geometric and relational reasoning between parts to update the part poses iteratively. We carefully design a unique instance encoding to resolve the ambiguity between geometrically similar parts so that all parts can be distinguished. In addition to assembling from scratch, we extend our framework to a new task called in-process part assembly. Analogous to furniture maintenance, it requires robots to continue with unfinished products and assemble the remaining parts into appropriate positions. Our method achieves improvements of well over 10% over the current state-of-the-art in multiple metrics on the public PartNet dataset. Extensive experiments and quantitative comparisons demonstrate the effectiveness of the proposed framework.
... Generative adversarial networks (GANs) rely on establishing a minimax game between two neural networks which work in opposition to each other [14]. One network (often called a generator or creator) is trained to produce synthetic data whereas the opposing network (often called discriminator or critic) is trained to judge the authenticity (that is, real or generated) of the created data. ...
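The minimax game described in this snippet is usually written with the value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], which the discriminator maximises and the generator minimises. The sketch below estimates V from discriminator outputs on a batch; the function name `gan_value` and the toy probabilities are illustrative assumptions, not code from any cited work.

```python
import math
from typing import Sequence


def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


if __name__ == "__main__":
    # A discriminator that cannot tell real from generated outputs 1/2 everywhere.
    print(gan_value([0.5, 0.5], [0.5, 0.5]))
    # A confident, correct discriminator attains a higher value.
    print(gan_value([0.9, 0.9], [0.1, 0.1]))
```

When D outputs 1/2 everywhere, as at the solution described in the abstract at the top of this page, the estimate equals -log 4; a discriminator that separates real from generated samples pushes the value above that.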
Preprint
Full-text available
Processes related to cloud physics constitute the largest remaining scientific uncertainty in climate models and projections. This uncertainty stems from the coarse nature of current climate models and, relatedly, the lack of understanding of detailed physics. We train a generative adversarial network to generate realistic cloud fields conditioned on meteorological reanalysis data for both climate model outputs and satellite imagery. While our network is able to generate realistic cloud fields, especially their large-scale patterns, more work is needed to refine its accuracy in resolving the finer textural details of cloud masses and so improve its predictions.
... Because feature disentanglement helps us construct a set of independent features responsible for the data-generating process, the feature alignment property transforms those obtained features into semantically meaningful features, and the generator helps us to obtain visual explanations, explaining these features. Due to feature disentanglement and generator, our implicit choice of model reduced to variational auto-encoders [33] or generative models [34]. We adopt the discoveries from [31], which shows how variational and adversarial training encourages models to learn disentangled representations implicitly. ...
Preprint
Full-text available
Most of the current explainability techniques focus on capturing the importance of features in input space. However, given the complexity of models and data-generating processes, the resulting explanations are far from being 'complete', in that they lack an indication of feature interactions and visualization of their 'effect'. In this work, we propose a novel twin-surrogate explainability framework to explain the decisions made by any CNN-based image classifier (irrespective of the architecture). For this, we first disentangle latent features from the classifier, followed by aligning these features to observed/human-defined 'context' features. These aligned features form semantically meaningful concepts that are used for extracting a causal graph depicting the 'perceived' data-generating process, describing the inter- and intra-feature interactions between unobserved latent features and observed 'context' features. This causal graph serves as a global model from which local explanations of different forms can be extracted. Specifically, we provide a generator to visualize the 'effect' of interactions among features in latent space and draw feature importance therefrom as local explanations. Our framework utilizes adversarial knowledge distillation to faithfully learn a representation from the classifiers' latent space and use it for extracting visual explanations. We use the styleGAN-v2 architecture with an additional regularization term to enforce disentanglement and alignment. We demonstrate and evaluate explanations obtained with our framework on Morpho-MNIST and on the FFHQ human faces dataset. Our framework is available at https://github.com/koriavinash1/GLANCE-Explanations.
Article
Abstract: Optimizing machining processes requires a fundamental understanding of tool wear behavior. To train artificial intelligence algorithms on this behavior, a selection of significant and technically relevant characteristic values is necessary. This work presents a method for data selection and scaling that meets the application requirements of an artificial intelligence. Based on a reduced experimental scope, the developed solution shows high agreement with the measured wear characteristic values.
Article
In this paper, we present a methodology based on generative adversarial network architecture to generate synthetic data sets with the intention of augmenting continuous glucose monitor data from individual patients. We use these synthetic data with the aim of improving the overall performance of prediction models based on machine learning techniques. Experiments were performed on two cohorts of patients suffering from type 1 diabetes mellitus with significant differences in their clinical outcomes. In the first contribution, we have demonstrated that the chosen methodology is able to replicate the intrinsic characteristics of individual patients following the statistical distributions of the original data. Next, a second contribution demonstrates the potential of synthetic data to improve the performance of machine learning approaches by testing and comparing different prediction models for the problem of predicting nocturnal hypoglycemic events in type 1 diabetic patients. The results obtained for both generative and predictive models are quite encouraging and set a precedent in the use of generative techniques to train new machine learning models.
Chapter
Deep neural networks have been shown to be promising approaches for medical image analysis. However, their training is most effective when they learn robust data representations using large-scale annotated datasets, which are tedious to acquire in clinical practice. As medical annotations are often limited, there has been an increasing interest in making data representations robust when data are scarce. In particular, a spate of research focuses on constraining the learned representations to be interpretable and able to separate out, or disentangle, the data explanatory factors. This chapter discusses recent disentanglement frameworks, with a special focus on the image segmentation task. We build on a recent approach for disentanglement of cardiac medical images into disjoint patient anatomy and imaging modality dependent representations. We incorporate into the model a purposely designed architecture (which we term “temporal transformer”) which, from a given image and a time gap, can estimate anatomical representations of an image at a future time-point within the cardiac cycle of cine MRI. The transformer's role is to introduce a self-supervised objective to encourage the emergence of temporally coherent data representations. We show that such a regularization improves the quality of disentangled representations, ultimately increasing semi-supervised segmentation performance when annotations are scarce. Finally, we show that predicting future representations can potentially be used for image synthesis tasks.
Chapter
Missing data is common in medical image research. For instance, corrupted or unusable slices owing to the presence of artifacts such as respiratory or motion ghosting, aliasing, and signal loss in images significantly reduce image quality and diagnostic accuracy. Also, medical image acquisition time is often limited by cost and physical or patient care constraints, resulting in highly under-sampled images, which can be formulated as missing in-between slices. Such clinically acquired scans violate underlying assumptions of many downstream algorithms. Another important application lies in multi-modal/multi-contrast imaging, where different medical images contain complementary information for improving the diagnosis. However, a complete set of different images is often difficult to obtain. All of these can be considered as missing image data, which can lead to a reduced statistical power and potentially biased results, if not handled appropriately. Thanks to the recent advances in deep neural networks and generative adversarial networks (GANs), the problem of missing image imputation can be viewed as an image synthesis problem, and its performance has been remarkably improved. In this chapter, we present cardiac MR imaging as a use case and investigate a robust approach, namely Image Imputation Generative Adversarial Network (I2-GAN), and compare it with several traditional and state-of-the-art image imputation techniques in context of missing slices.
Article
Throughout its entire path of development, humankind has tried to engage in forecasting, because if you can forecast and anticipate the course of events, your chance of survival increases. As society developed, the objects and aims of forecasting changed, grew more complicated, and came to encompass ever more complex phenomena and processes.
Chapter
This chapter reviews recent developments of generative adversarial network (GAN)-based methods for medical and biomedical image synthesis tasks. These methods are classified into GAN, conditional GAN (cGAN), and cycle-consistent GAN (Cycle-GAN) according to the network architecture designs. For each category, a literature survey is given, which covers discussions of the network architecture designs, loss functions used to supervise the network, and challenges of training the network. We then introduce some practical aspects of these GAN-based methods, such as network setting for different tasks' aim, image pre-processing to enhance the input image quality, and data augmentation to enlarge the training data variation. We also briefly introduce some specific applications of cGAN and Cycle-GAN. Finally, a conclusion with highlighted important contributions and discussion of some identified specific challenges for GAN-based methods are given.
Article
We consider the problem of enhancing user privacy in common data analysis and machine learning development tasks, such as data annotation and inspection, by substituting the real data with samples from a generative adversarial network. We propose employing Bayesian differential privacy as the means to achieve a rigorous theoretical guarantee while providing a better privacy-utility trade-off. We demonstrate experimentally that our approach produces higher-fidelity samples compared to prior work, making it possible to (1) detect more subtle data errors and biases, and (2) reduce the need for real data labelling by achieving high accuracy when training directly on artificial samples.
Chapter
Detecting and identifying pathological structures is one of the key tasks in the research of medical image computing. In clinical practice, the task is one step of diagnosis involving doctors and radiologists, and can be tedious and time-consuming. With the rise of machine learning and deep learning, researchers have formulated the task mathematically and proposed numerous methods to automate and accelerate this process. Current research has been actively developed around several topics, such as semantic segmentation, active learning and continual learning, where one of each can present years of research works. In spite of the great variety of research, we would like to dedicate this chapter to one of these interesting tasks, unsupervised abnormality detection, where pathological structures are detected as abnormalities according to prior knowledge representation in the form of features or distribution for healthy anatomy. Unsupervised abnormality detection, though more challenging than supervised detection, has the advantage of data efficiency and capability of detecting pathologies without specification. Approaches such as probabilistic models and generative models are learned on a dataset consisting of clinically confirmed healthy images, and then new images are evaluated by the learned model to decide whether they have high probability in the estimated distribution, e.g., healthy images, or low probability, e.g., abnormal images. Typical deep-learning-based approaches are built with generative models such as generative adversarial networks and variational autoencoders, and have been evaluated on large public datasets. The detection accuracy of unsupervised abnormality detection has seen significant improvement in recent years and gradually narrows the gap with supervised methods.
Article
Given the exponential growth of patent documents, automatic patent summarization methods to facilitate the patent analysis process are in strong demand. Recently, the development of natural language processing (NLP), text-mining, and deep learning has greatly improved the performance of text summarization models for general documents. However, existing models cannot be successfully applied to patent documents, because patent documents describing an inventive technology and using domain-specific words have many differences from general documents. To address this challenge, we propose in this study a multi-patent summarization approach based on deep learning to generate an abstractive summarization considering the characteristics of a patent. Single patent summarization and multi-patent summarization were performed through a patent-specific feature extraction process, a summarization model based on generative adversarial network (GAN), and an inference process using topic modeling. The proposed model was verified by applying it to a patent in the drone technology field. In consequence, the proposed model performed better than existing deep learning summarization models. The proposed approach enables high-quality information summary for a large number of patent documents, which can be used by R&D researchers and decision-makers. In addition, it can provide a guideline for deep learning research using patent data.
Article
Data scarcity is a serious issue in credit risk assessment for some emerging financial institutions. As a typical category of data scarcity, small sample with high dimensionality often leads to the failure to build an effective credit risk assessment model. To solve this issue, a Wasserstein generative adversarial networks (WGAN)-based data augmentation and hybrid feature selection method is proposed for small sample credit risk assessment with high dimensionality. In this methodology, WGAN is first used to produce the virtual samples to overcome the data instance scarcity issue, and then a kernel partial least square with quantum particle swarm optimization (KPLS-QPSO) algorithm is proposed to solve the high-dimensionality issue. For verification purposes, two small sample credit datasets with high dimensionality are used to demonstrate the effectiveness of the proposed methodology. Empirical results indicate that the proposed methodology can significantly improve the prediction performance and avoid possible economic losses in credit risk assessment. This implies that the proposed methodology is a competitive approach to small sample credit risk assessment with high dimensionality.
Article
The ability of deep learning to learn graphical features for building-plan generation has been tested. However, whether deeper space allocation strategies can be learned, and thus reduce energy consumption, has still not been investigated. In the present study, we aimed to train a neural network on a characterized sample set to generate a residential building floor plan (RBFP) that achieves energy reduction effects. The network is based on Pix2Pix and includes two sub-models: functional segmentation layout (FSL) generation and building floor plan (BFP) generation. To better characterize energy efficiency, 98 screened floor plans of Solar Decathlon (SD) entries were labeled as the sample set. A data augmentation method was adopted to improve the performance of the FSL sub-model after preliminary testing. Three existing residential buildings were used as cases to observe whether the network-generated RBFP decreased energy consumption through sound space allocation. The results showed that, under the same simulation settings and building exterior profile (BEP) conditions, the function arrangement of the generated scheme was more reasonable than that of the original scheme in each case. The annual total energy consumption was reduced by 13.38%, 12.74%, and 7.47%, respectively. In conclusion, when trained on a sample set that characterizes energy efficiency, the RBFP generation network has a positive effect on both optimizing space allocation and reducing energy consumption. The implemented data augmentation method can significantly improve the network’s training results with a small sample size.
Article
Unsupervised domain adaptation (UDA) person re-identification (re-ID) aims to transfer knowledge from a labeled source domain to guide the task posed on the unlabeled target domain, in which people have different identities and appear across multiple camera views within two different domains. Consequently, traditional UDA re-ID techniques generally suffer from the negative transfer caused by the inevitable noise generated by variant backgrounds, while the foregrounds also lack sufficient reliable identification knowledge to guarantee qualified cross-domain re-ID. To remedy the negative transfer caused by variant backgrounds, we propose a novel semantic-driven attention network (SDA) enforced by a body structure estimation (BSE) mechanism, which gives the designed model the semantic capability to distinguish foreground from background. To find reliable feature representations in the foreground areas, we propose a novel label refinery mechanism that dynamically optimizes traditional attribute learning techniques to strengthen personal attribute features, thus resulting in qualified UDA re-ID. Extensive experiments demonstrate the effectiveness of our method in solving the unsupervised domain adaptation person re-ID task on three large-scale datasets: Market-1501, DukeMTMC-reID, and MSMT17.
Article
We present a data-driven or non-intrusive reduced-order model (NIROM) which is capable of making predictions for a significantly larger domain than the one used to generate the snapshots or training data. This development relies on the combination of a novel way of sampling the training data (which frees the NIROM from its dependency on the original problem domain) and a domain decomposition approach (which partitions unseen geometries in a manner consistent with the sub-sampling approach). The method extends current capabilities of reduced-order models to generalise, i.e., to make predictions for unseen scenarios. The method is applied to a 2D test case which simulates the chaotic time-dependent flow of air past buildings at a moderate Reynolds number using a computational fluid dynamics (CFD) code. The procedure for 3D problems is similar; however, a 2D test case is considered sufficient here as a proof of concept. The reduced-order model consists of a sampling technique to obtain the snapshots, a convolutional autoencoder for dimensionality reduction, and an adversarial network for prediction, all set within a domain decomposition framework. The autoencoder is chosen for dimensionality reduction as it has been demonstrated in the literature that these networks can compress information more efficiently than traditional (linear) approaches based on singular value decomposition. In order to keep the predictions realistic, properties of adversarial networks are exploited. To demonstrate its ability to generalise, once trained, the method is applied to a larger domain which has a different arrangement of buildings. Statistical properties of the flows from the reduced-order model are compared with those from the CFD model in order to establish how realistic the predictions are.
Article
The fine-grained localization of clinicians in the operating room (OR) is a key component in designing the new generation of OR support systems. Computer vision models for pixel-based person segmentation and body-keypoint detection are needed to better understand the clinical activities and the spatial layout of the OR. This is challenging, not only because OR images are very different from traditional vision datasets, but also because data and annotations are hard to collect and generate in the OR due to privacy concerns. To address these concerns, we first study how joint person pose estimation and instance segmentation can be performed on low-resolution images with downsampling factors from 1x to 12x. Second, to address the domain shift and the lack of annotations, we propose a novel unsupervised domain adaptation method, called AdaptOR, to adapt a model from an in-the-wild labeled source domain to a statistically different unlabeled target domain. We propose to exploit explicit geometric constraints on the different augmentations of the unlabeled target domain image to generate accurate pseudo labels, and to use these pseudo labels to train the model on high- and low-resolution OR images in a self-training framework. Furthermore, we propose disentangled feature normalization to handle the statistically different source and target domain data. Extensive experimental results with detailed ablation studies on the two OR datasets MVOR+ and TUM-OR-test show the effectiveness of our approach against strongly constructed baselines, especially on the low-resolution privacy-preserving OR images. Finally, we show the generality of our method as a semi-supervised learning (SSL) method on the large-scale COCO dataset, where we achieve comparable results with as few as 1% of labeled supervision against a model trained with 100% labeled supervision. Code is available at https://github.com/CAMMA-public/HPE-AdaptOR.
Article
Expression recognition, which infers human feelings from the muscles around the corners of the mouth, the eyes, and the face, has been an important research direction in the field of psychology and can be used in traffic, medical, security, and criminal investigation applications. Most existing research uses convolutional neural networks (CNNs) to recognize face images and thus classify expressions. This achieves good results, but CNNs do not have enough ability to extract global features. The Transformer has advantages for global feature extraction, but it is more computationally intensive and requires a large amount of training data. In this paper, we therefore use a hierarchical Transformer, namely the Swin Transformer, for the expression recognition task, which greatly reduces the computational cost. At the same time, we fuse it with a CNN model to propose a network architecture that combines the Transformer and the CNN. To the best of our knowledge, we are the first to combine the Swin Transformer with a CNN and use it in an expression recognition task. We then evaluate the proposed method on several publicly available expression datasets and obtain competitive results.