Generative Adversarial Networks: The State of the Art and Beyond

Abstract

Generative adversarial networks (GANs) have become a hot research topic in artificial intelligence. Inspired by the two-player zero-sum game, a GAN is composed of a generator and a discriminator, both trained with an adversarial learning mechanism. The aim of GAN is to estimate the underlying distribution of existing data and to generate new data samples from the same distribution. Since its introduction, GAN has been widely studied because of its broad application prospects, including image and vision computing, speech and language processing, information security, and chess-playing programs. In this paper we summarize the state of the art of GAN and look into its future. First, we survey GAN's background, theoretical and implementation models, application fields, advantages and disadvantages, and development trends. Then, we investigate the relation between GAN and parallel intelligence, concluding that GAN has great potential in parallel systems, especially in computational experiments, in terms of virtual-real interaction and integration. Finally, we clarify that GAN can provide specific and substantial algorithmic support for the ACP theory.
Acta Automatica Sinica, Vol. 43, No. 3, March 2017
DOI 10.16383/j.aas.2017.y000003
WANG Kun-Feng 1,2; GOU Chao 1,3; DUAN Yan-Jie 1,3; LIN Yi-Lun 1,3; ZHENG Xin-Hu 4; WANG Fei-Yue 1,5
Key words Generative adversarial networks, generative models, zero-sum game, adversarial learning, parallel intelligence, ACP methodology
Citation Wang Kun-Feng, Gou Chao, Duan Yan-Jie, Lin Yi-Lun, Zheng Xin-Hu, Wang Fei-Yue. Generative adversarial networks: the state of the art and beyond. Acta Automatica Sinica, 2017, 43(3): 321-332
Manuscript received February 1, 2017; accepted March 1, 2017
Supported by National Natural Science Foundation of China (61533019, 71232006, 91520301)
Recommended by Associate Editor LIU De-Rong
1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. Qingdao Academy of Intelligent Industries, Qingdao 266000, China
3. University of Chinese Academy of Sciences, Beijing 100049, China
4. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55414, USA
5. Research Center for Computational Experiments and Parallel Systems Technology, National University of Defense Technology, Changsha 410073, China

Generative adversarial networks (GANs) are a class of generative models proposed by Goodfellow et al. [1] in 2014. The structure of GAN is inspired by the two-player zero-sum game in game theory (the payoffs of the two players sum to zero, so one player's gain is exactly the other's loss). The system consists of a generator and a discriminator. The generator captures the underlying distribution of real data samples and generates new data samples; the discriminator is a binary classifier that decides whether its input is real data or a generated sample. Both the generator and the discriminator can be implemented with deep neural networks [2], currently an active research area. The optimization of GAN is a minimax game whose goal is to reach a Nash equilibrium [3], at which the generator has estimated the distribution of the data samples.
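In the current wave of enthusiasm for artificial intelligence (AI), GAN has met research and application needs in many fields and injected new momentum into them. GAN has become a hot research topic in the AI community; the well-known scholar LeCun even called it "the most exciting idea in machine learning in the past ten years". At present, image and vision computing is the field in which GAN is most widely studied and applied. GANs can already generate objects such as digits and faces, compose realistic indoor and outdoor scenes, recover original images from segmentation maps, colorize black-and-white images, recover object images from object contours, and generate high-resolution images from low-resolution ones [4]. In addition, GAN has begun to be applied to speech and language processing [5-6], computer virus detection [7], and chess-playing programs [8].
This paper surveys the latest research progress on GAN and looks into its development trends. Section 1 introduces the background of GAN. Section 2 describes the theory and implementation models of GAN, including its basic principle, learning method, and derived models. Section 3 lists typical applications of GAN in image and vision, speech and language, information security, and other fields. Section 4 reflects on GAN and looks ahead, discussing the relation between GAN and parallel intelligence, especially computational experiments. Finally, Section 5 concludes the paper.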
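1 Background of GAN
This section introduces the background against which GAN was proposed, so that readers can better understand its research progress and application fields.
1.1 The boom in artificial intelligence
In recent years, with growing computing power and a surge in data volume across industries, AI has developed rapidly, and both researchers' attention to AI and the public's expectations of it have risen to unprecedented levels [2, 9]. The academic community generally holds that AI has two stages: a perception stage and a cognition stage. In the perception stage, machines receive various signals from the outside world, such as visual and auditory signals, and make judgments on them; the corresponding research fields include image recognition and speech recognition. In the cognition stage, machines have some understanding of the essence of the world and no longer make judgments in a purely mechanical way. Based on years of research experience, the authors consider that AI manifests at four levels: judgment, generation, understanding, and creation and application, as shown in Fig. 1. On the one hand, these levels are interrelated and promote each other; on the other hand, there are large gaps between them that await new research breakthroughs.
Both the commonly accepted two stages and the four levels summarized by the authors involve understanding. However, understanding is an internal capability, for humans and AI alike; it cannot be measured directly and can only be inferred indirectly from other aspects. There is no consensus on how to measure an AI's degree of understanding, but the renowned scholar Feynman famously said, "What I cannot create, I do not understand." This suggests that a machine's ability to create things depends to some extent on its understanding of them. As a typical generative model, GAN has a generator capable of generating data samples, an ability that to some extent reflects its understanding of things. GAN is therefore expected to deepen research on the understanding level of AI.
Fig. 1 The levels of artificial intelligence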
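1.2 The accumulation of generative models
Generative models not only occupy an important position in AI; generative methods themselves are of great research value. Generative methods and discriminative methods are the two branches of supervised learning in machine learning. A generative model is the model learned by a generative method. Generative methods involve assumptions about the data distribution and learning of the distribution parameters, and they can sample new examples from the learned model. We divide generative models into two categories according to their starting point: the perspective of humans understanding data, and the perspective of machines understanding data.
From the human perspective, the typical practice is to first make distributional assumptions about the explicit or latent variables of the data, then fit or train the distribution parameters, or a model containing the distribution, on real data, and finally generate new samples from the learned distribution or model. The main methods in this category include maximum likelihood estimation, approximation methods [10-11], and Markov chain methods [12-14]. Models learned in this way have distributions that humans can understand, but each method has its own limitations for machine learning. For example, in maximum likelihood estimation on real samples, parameter updates come directly from the data samples, which restricts the learned generative model. With approximation methods, the objective function is intractable, so learning can generally only approach a lower bound of the objective rather than the objective itself. Markov chain methods can be used both to train generative models and to generate new samples, but their computational cost is high.
From the machine perspective, the generative model generally does not estimate or fit a distribution directly. Instead, it obtains sampled data from a distribution that is not explicitly assumed [15] and uses these data to correct the model. A generative model obtained this way lacks interpretability for humans, yet the samples it generates are understandable to humans. It can be inferred that the machine has understood the data in a way humans cannot explicitly grasp and has generated new data that humans can understand. Before GAN was proposed, generative models built from this perspective generally required Markov chains for training, which was inefficient and to some extent limited their systematic application.
Before GAN, research on generative models had already accumulated, but the limitations in model training and data generation were undoubtedly obstacles. To truly realize the four levels of AI, new generative models must be designed to break through these obstacles.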
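1.3 The deepening of neural networks
Over the past 10 years, as deep learning [16-17] has achieved great success in many fields, neural network research has risen again. As the model structure of deep learning, neural networks have benefited from growth in computing power and data volume, which has to some extent solved their problems of having many parameters and being hard to train, and they are now widely used for all kinds of problems. For example, deep learning has achieved breakthrough results in image classification [18-19], significantly improved the accuracy of speech recognition [20], and been successfully applied to natural language understanding [21]. The success of neural networks is inseparable from the characteristics of the model itself. In terms of training, neural networks can use the general back-propagation algorithm, and training is easy to implement; in terms of structure, their design is free and flexible with few restrictions; in terms of modeling capacity, they can in theory approximate arbitrary functions and thus have a wide range of application. In addition, increased computing power allows neural networks to train more parameters faster, further promoting their popularity.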
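1.4 The success of adversarial ideas
From machine learning to AI, adversarial ideas have been successfully introduced into several fields. Both games and competition embody the idea of confrontation. Game-theoretic machine learning [22] combines game theory with machine learning: it models people's dynamic strategies with game-theoretic methods to optimize ad auction mechanisms, and its effectiveness has been demonstrated experimentally. The Go program AlphaGo [23] defeated human players and aroused public interest in AI. An intermediate version of AlphaGo used self-play between two networks when training the policy network, obtaining board states, policies, and the corresponding rewards, and maximizing an expectation that includes the game reward. In neural network research, two neural networks have been trained by competing with each other [24], encouraging the hidden units of the network to be statistically independent, which serves as a regularizer during training. Other researchers [25-26] have used adversarial ideas to train domain-adaptive neural networks: a feature generator transforms source-domain and target-domain data into high-level abstract features and tries to make the domain of origin as hard to identify as possible, while a domain discriminator, operating on the transformed features, tries to identify the domain as accurately as possible. Adversarial examples [27-28] also embody the idea of confrontation. They are samples that differ only slightly from real samples yet are misclassified, or samples that differ greatly from real samples yet are assigned to a real class with high confidence, and they reflect a peculiar behavior of neural networks. Although adversarial examples and adversarial networks both embody adversarial ideas, their purposes are entirely different. The many achievements of adversarial ideas in machine learning and AI have also motivated more researchers to keep exploring GAN.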
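2 Theory and implementation models of GAN
2.1 Basic principle of GAN
The core idea of GAN comes from the Nash equilibrium in game theory. The two players of the game are a generator and a discriminator. The generator aims to learn the real data distribution as well as possible, while the discriminator aims to decide as correctly as possible whether its input comes from the real data or from the generator. To win the game, the two players must keep optimizing, each improving its own generative or discriminative ability; this learning and optimization process is the search for a Nash equilibrium between them. The computation procedure and structure of GAN are shown in Fig. 2. Any differentiable function can be used to represent the generator and the discriminator, so we use differentiable functions $D$ and $G$ to denote the discriminator and the generator, whose inputs are real data $x$ and a random variable $z$, respectively. $G(z)$ is a sample generated by $G$ that should follow the real data distribution $p_{data}$ as closely as possible. If the input of the discriminator comes from the real data, it is labeled 1; if the input is $G(z)$, it is labeled 0. The goal of $D$ is a binary classification of the data source: real (from the distribution of real data $x$) or fake (from the generator's fake data $G(z)$). The goal of $G$ is to make the score $D(G(z))$ of its fake data $G(z)$ on $D$ match the score $D(x)$ of real data $x$ on $D$. These two mutually adversarial, iteratively optimized processes continually improve the performance of $D$ and $G$. When the discriminative ability of $D$ has been raised to a certain level and $D$ still cannot correctly identify the data source, the generator $G$ can be considered to have learned the real data distribution.
Fig. 2 Computation procedure and structure of GAN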
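2.2 Learning method of GAN
This section discusses the learning and training mechanism of GAN.
First, given the generator $G$, we consider optimizing the discriminator $D$. As with ordinary sigmoid-based binary classifiers, training $D$ minimizes a cross-entropy, with loss function
$$Obj^{D}(\theta_D, \theta_G) = -\frac{1}{2} E_{x \sim p_{data}(x)}[\log D(x)] - \frac{1}{2} E_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (1)$$
where $x$ is sampled from the real data distribution $p_{data}(x)$, $z$ is sampled from the prior distribution $p_z(z)$ (for example, a Gaussian noise distribution), and $E(\cdot)$ denotes expectation. Unlike an ordinary binary classifier, the training data of the discriminator come from two sources: the real data distribution $p_{data}(x)$ (label 1) and the generator's data distribution $p_g(x)$ (label 0). Given the generator $G$, we minimize (1) to obtain the optimal solution. In continuous space, (1) can be written as
$$\begin{aligned} Obj^{D}(\theta_D, \theta_G) &= -\frac{1}{2}\int_x p_{data}(x)\log(D(x))\,dx - \frac{1}{2}\int_z p_z(z)\log(1 - D(G(z)))\,dz \\ &= -\frac{1}{2}\int_x \left[p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x))\right]dx \end{aligned} \qquad (2)$$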
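As a small numerical illustration of (1), the following sketch estimates the discriminator loss from batches of discriminator outputs. The function name `discriminator_loss`, the clipping constant `eps`, and the sample values are assumptions made for this example; they do not come from the paper.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of Eq. (1).

    d_real: outputs D(x) on a batch of real samples (label 1).
    d_fake: outputs D(G(z)) on a batch of generated samples (label 0).
    """
    # Clip to avoid log(0); eps is an illustrative choice.
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return -0.5 * np.mean(np.log(d_real)) - 0.5 * np.mean(np.log(1.0 - d_fake))

# A confident and correct discriminator has a small loss.
good = discriminator_loss([0.9, 0.95], [0.1, 0.05])
# A discriminator that outputs 0.5 everywhere has loss log 2.
chance = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

The value `log 2` for the constant output 0.5 is the loss of a discriminator that cannot tell the two sources apart.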
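For any non-negative real numbers $m$ and $n$ that are not both zero, and any real value $y \in [0, 1]$, the expression
$$-m\log(y) - n\log(1 - y) \qquad (3)$$
attains its minimum at $y = \frac{m}{m+n}$. Therefore, given the generator $G$, the objective (2) attains its minimum at
$$D_G^{*}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} \qquad (4)$$
which is the optimal discriminator. Equation (4) shows that GAN estimates the ratio of two probability densities, which is the key difference from other methods based on lower-bound optimization or Markov chains.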
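The claim about expression (3) can be checked numerically. The sketch below evaluates (3) on a grid of $y$ values for one pair of density values and compares the grid minimizer with the closed form in (4). The values of `m` and `n` and the grid resolution are arbitrary choices for illustration.

```python
import numpy as np

def pointwise_loss(y, m, n):
    """Expression (3): -m*log(y) - n*log(1 - y)."""
    return -m * np.log(y) - n * np.log(1.0 - y)

# Hypothetical density values p_data(x) and p_g(x) at a single point x.
m, n = 0.3, 0.1

# Search over a grid in the open interval (0, 1) with step 0.001.
ys = np.linspace(0.001, 0.999, 999)
y_best = ys[np.argmin(pointwise_loss(ys, m, n))]

# Closed-form optimum from Eq. (4).
y_star = m / (m + n)
```

Here `y_star` is 0.75, and the grid search lands on the same value up to the grid resolution.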
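On the other hand, $D(x)$ represents the probability that $x$ comes from the real data rather than from the generated data. When the input is sampled from the real data $x$, the goal of $D$ is to make the output probability $D(x)$ approach 1. When the input comes from the generated data $G(z)$, the goal of $D$ is to identify the data source correctly, so that $D(G(z))$ approaches 0, while the goal of $G$ is to make it approach 1. This is in effect a zero-sum game between $G$ and $D$, so the loss function of the generator $G$ is $Obj^{G}(\theta_G) = -Obj^{D}(\theta_D, \theta_G)$. The optimization of GAN is therefore a minimax problem, and its objective function can be written as
$$\min_G \max_D \left\{ f(D, G) = E_{x \sim p_{data}(x)}[\log D(x)] + E_{z \sim p_z(z)}[\log(1 - D(G(z)))] \right\} \qquad (5)$$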
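As a worked step connecting (4) and (5), the optimal discriminator can be substituted into the objective. The result below is the standard one from the original GAN paper [1], restated in this paper's notation; the symbol $C(G)$ and the abbreviation JSD (Jensen-Shannon divergence) are introduced here for illustration only.

```latex
\begin{aligned}
C(G) &= \max_D f(D, G) = f(D_G^{*}, G) \\
     &= E_{x \sim p_{data}}\left[\log \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right]
      + E_{x \sim p_g}\left[\log \frac{p_g(x)}{p_{data}(x) + p_g(x)}\right] \\
     &= -\log 4 + 2\,\mathrm{JSD}\left(p_{data} \,\|\, p_g\right)
\end{aligned}
```

Because the Jensen-Shannon divergence is non-negative and equals zero only when the two distributions coincide, $C(G)$ reaches its minimum value $-\log 4$ exactly when $p_g = p_{data}$, in which case $D_G^{*}(x) = 1/2$ everywhere.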
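In summary, in the learning process of GAN we train the model $D$ to maximize the accuracy of deciding whether data come from the real data or from the fake data distribution $G(z)$, and at the same time we train the model $G$ to minimize $\log(1 - D(G(z)))$. This can be done by alternating optimization: first fix the generator $G$ and optimize the discriminator $D$ to maximize its discrimination accuracy; then fix the discriminator $D$ and optimize the generator $G$ to minimize the discrimination accuracy of $D$. The global optimum is reached if and only if $p_{data} = p_g$. When training GAN, within one round of parameter updates, the parameters of $D$ are typically updated $k$ times before the parameters of $G$ are updated once.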
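The alternating scheme described above (k discriminator updates followed by one generator update) can be sketched on a one-dimensional toy problem with hand-written gradients. Everything concrete in this sketch is an assumption for illustration: the real data are drawn from N(3, 1), the generator is the affine map G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), and the constant factor 1/2 in (1) is dropped. It shows the update order only and makes no claim about convergence.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_toy_gan(steps=200, k=3, lr=0.05, batch=64, seed=0):
    """Alternating gradient updates for a 1-D toy GAN (illustrative only)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # k discriminator steps with G fixed:
        # descend -E[log D(x)] - E[log(1 - D(G(z)))].
        for _ in range(k):
            x = rng.normal(3.0, 1.0, batch)            # real samples
            g = a * rng.normal(0.0, 1.0, batch) + b    # generated samples
            d_real = sigmoid(w * x + c)
            d_fake = sigmoid(w * g + c)
            grad_w = np.mean((d_real - 1.0) * x) + np.mean(d_fake * g)
            grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
            w, c = w - lr * grad_w, c - lr * grad_c
        # One generator step with D fixed: descend E[log(1 - D(G(z)))].
        z = rng.normal(0.0, 1.0, batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        grad_a = np.mean(-d_fake * w * z)
        grad_b = np.mean(-d_fake * w)
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b, w, c

params = train_toy_gan(steps=20, k=2)
```

A discriminator that is linear in x can only react to a shift between the two distributions, so this toy setup is far simpler than the neural networks used in practice.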
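2.3 Derived models of GAN
Since Goodfellow et al. [1] proposed GAN in 2014, various GAN-derived models have been proposed, whose innovations include improved model structures, theoretical extensions, and applications. The computation procedures and structures of some derived models are shown in Fig. 3.
GAN suffers from vanishing gradients when trained with gradient descent, because when the real and generated samples have very little or no overlap, the Jensen-Shannon divergence in its objective function is a constant, which makes the optimization objective discontinuous. To address the vanishing gradient problem, Arjovsky et al. [29] proposed Wasserstein GAN (W-GAN). W-GAN replaces the Jensen-Shannon divergence with the Earth-Mover distance to measure the distance between the distributions of real and generated samples, uses a critic function $f$ in place of the GAN discriminator, and requires $f$ to satisfy a Lipschitz continuity assumption. In addition, the discriminator $D$ of GAN has unlimited modeling capacity: no matter how complex the real and generated samples are, $D$ can separate them, which easily leads to overfitting. To limit the modeling capacity, Qi [30] proposed Loss-sensitive GAN (LS-GAN), which restricts the loss function obtained by minimizing the objective to the class of Lipschitz-continuous functions; the author also gives a quantitative analysis of vanishing gradients. Note that W-GAN and LS-GAN do not change the structure of the GAN model; they only improve the optimization method.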
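The difference between the two distance measures can be seen on a small discrete example. In the sketch below, two point masses on a one-dimensional grid never overlap, so their Jensen-Shannon divergence stays at the constant log 2 regardless of how far apart they are, while the Earth-Mover distance grows with the separation. The grid size, the positions of the point masses, and the function names are illustrative assumptions.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def earth_mover_1d(p, q, dx=1.0):
    """1-D Earth-Mover distance computed from the difference of the two CDFs."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx

def point_mass(index, size=20):
    p = np.zeros(size)
    p[index] = 1.0
    return p

real = point_mass(0)
near, far = point_mass(2), point_mass(10)

js_near, js_far = js_divergence(real, near), js_divergence(real, far)
em_near, em_far = earth_mover_1d(real, near), earth_mover_1d(real, far)
```

Both `js_near` and `js_far` equal log 2, so the divergence gives no signal about which generated distribution is closer to the real one, whereas `em_near` is 2.0 and `em_far` is 10.0.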
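Training GAN requires only the label of the data source (real or fake) and optimizes according to the discriminator output. Odena [31] proposed Semi-GAN, which adds the class labels of real data to the training of the discriminator $D$. Going further, Conditional GAN (CGAN) [32] adds extra information $y$ to $G$, $D$, and the real data for modeling, where $y$ can be a label or other auxiliary information. Conventional GANs learn a generative model that maps a latent-variable distribution to the complex real data distribution. Donahue et al. [33] proposed Bidirectional GANs (BiGANs) to map complex data back to the latent space and thereby perform feature learning. In addition to the basic GAN framework, BiGANs add an encoder $Q$ that maps real data $x$ to the latent space, and the optimization problem becomes
$$\min_{G, Q} \max_D f(D, Q, G).$$
Fig. 3 Computation procedures and structures of GAN-derived models ((a) GAN [1], W-GAN [29], LS-GAN [30]; (b) Semi-GAN [31]; (c) C-GAN [32]; (d) Bi-GAN [33]; (e) Info-GAN [34]; (f) AC-GAN [35]; (g) Seq-GAN [6])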
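InfoGAN [34] is another important extension of GAN. GAN can learn effective semantic features, but the relation between particular dimensions of the input noise variable $z$ and particular semantics is unclear; InfoGAN can capture the mutual information between the input latent variables and specific semantics. Concretely, the input of the generator $G$ is split into two parts, $z$ and $c$, where $z$ is the same as the input of GAN and $c$ is called the latent code, which represents the implicit relation between the structured latent random variables and specific semantics. GAN assumes $p_G(x) = p_G(x|c)$, whereas in practice $c$ is strongly correlated with the output of $G$. Denoting the generator output by $G(z, c)$, the authors [34] proposed using the mutual information $I(c; G(z, c))$ to measure the correlation between the two, and solving with the objective function
$$\min_G \max_D \left\{ f_I(D, G) = f(D, G) - \lambda I(c; G(z, c)) \right\} \qquad (6)$$
Because the posterior probability $p(c|x)$ cannot be obtained directly, a variational distribution is introduced to approximate the posterior, which gives a lower bound that is optimized to obtain the solution.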
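The lower bound mentioned above can be written out as follows. This is the variational bound from the InfoGAN paper [34], restated in the notation of (6); the symbols $q(c|x)$ for the variational distribution, $H(c)$ for the entropy of the latent code, and $L_I$ for the bound are not defined in this survey and are introduced here only for illustration.

```latex
\begin{aligned}
I(c; G(z, c)) &= H(c) - H(c \mid G(z, c)) \\
              &\geq E_{c \sim p(c),\; x \sim G(z, c)}\left[\log q(c \mid x)\right] + H(c)
               \;=\; L_I(G, q), \\
\min_{G,\, q} \max_D \; & f(D, G) - \lambda\, L_I(G, q).
\end{aligned}
```

The bound is tight when $q(c|x)$ equals the true posterior $p(c|x)$, so maximizing $L_I$ over $q$ and $G$ pushes the generated samples to carry recoverable information about the latent code.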
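The Auxiliary Classifier GAN (AC-GAN) proposed by Odena et al. [35] can handle multi-class problems; its discriminator outputs the corresponding label probabilities. In training, the objective function contains both the likelihood of the correct data source and the likelihood of the correct class label, so the parameters are no longer adjusted by back-propagating the discriminator's binary classification loss alone, and the loss function can be further tuned to improve classification accuracy. The key point of AC-GAN is that the label fed to the generator can be used to generate images of the corresponding class, while the loss function can also be extended and tuned at the discriminator, which further improves the generative and discriminative abilities of the adversarial networks.
Because the output of GAN is a continuous real-valued distribution and cannot produce distributions over discrete spaces, Yu et al. [6] proposed Seq-GAN, a generative model that can generate discrete sequences. They implement the generator $G$ with an RNN and the discriminator $D$ with a CNN, and use the probability output by $D$ to update $G$ through reinforcement learning. The reward in reinforcement learning is computed by $D$; for the possible subsequent actions, Monte Carlo search is used, and the average of the outputs of $D$ is fed back as the reward.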
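The reward computation described for Seq-GAN can be sketched as follows: a partial sequence is completed several times by sampling, each completed sequence is scored by the discriminator, and the scores are averaged. The function `rollout_reward`, the uniform sampling policy, and the toy discriminator are hypothetical stand-ins written for this example; they are not the RNN generator or CNN discriminator of [6].

```python
import random

def rollout_reward(prefix, seq_len, sample_next, discriminator, n_rollouts=16, seed=0):
    """Average discriminator score over Monte Carlo completions of a prefix."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        # Complete the sequence token by token with the rollout policy.
        while len(seq) < seq_len:
            seq.append(sample_next(seq, rng))
        total += discriminator(seq)
    return total / n_rollouts

# Hypothetical stand-in for the generator's sampling policy over tokens {0, 1}.
def uniform_policy(seq, rng):
    return rng.choice([0, 1])

# Hypothetical stand-in for D: the fraction of tokens equal to 1.
def toy_discriminator(seq):
    return sum(seq) / len(seq)

full = rollout_reward([1, 1, 0, 1], 4, uniform_policy, toy_discriminator)
partial = rollout_reward([1, 1], 4, uniform_policy, toy_discriminator)
```

For a prefix that is already complete, no sampling happens and the reward is simply the discriminator score of that sequence; for a shorter prefix the reward is an estimate of the expected final score.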
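3 Application fields of GAN
As a model with generative capability, the most direct application of GAN is modeling: generating data samples that follow the real data distribution, such as images and videos. GAN can be used for learning problems with insufficient labeled data, such as unsupervised and semi-supervised learning. GAN can also be used for speech and language processing, such as generating dialogue and generating images from text. This section describes the applications of GAN in three areas: image and vision, speech and language, and other fields.
3.1 Image and vision
GAN can generate images that follow the real data distribution. A typical application comes from Twitter: Ledig et al. [36] proposed using GAN to transform a blurry low-resolution image into a high-resolution image with rich detail. The authors used a VGG-style network [37] as the discriminator and a parameterized residual network [19] as the generator. The experimental results are shown in Fig. 4, where the GAN-generated image is rich in detail.
Fig. 4 Illustration of GAN-generated image [36]
GAN has also begun to be used to generate autonomous driving scenes. Santana et al. [38] proposed using GAN to generate images that follow the distribution of real traffic scenes, and then training an RNN-based transition model for prediction; the experimental results are shown in Fig. 5. GAN can be used for semi-supervised or unsupervised learning tasks in autonomous driving, and the generator can be optimized in real time using continuously updated video frames from real scenes.
Gou et al. [39-40] proposed using simulated and real images as training samples for eye detection, but there is a gap between the distributions of such simulated images and real images. Shrivastava et al. [41] proposed a GAN-based method (called SimGAN) that uses unlabeled real images to refine simulated images so that the synthetic images become more realistic. The authors introduced a self-regularization term to minimize the synthesis error and preserve the annotation of the simulated image as much as possible, and added a local adversarial loss that discriminates each local image patch, which enriches local information.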
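Fig. 5 Another illustration of GAN-generated images (odd columns show the generated images, and even columns show the target images) [38]
3.2 Speech and language
Several papers have applied GAN to speech and language processing. Li et al. [5] proposed using GAN to capture the implicit relevance between dialogue turns in order to generate dialogue text. Zhang et al. [42] proposed GAN-based text generation with a CNN as the discriminator; the discriminator is trained against the output of an LSTM, and moment matching is used to solve the optimization problem. In training, unlike the conventional scheme of updating the discriminator several times before updating the generator once, the generator is updated several times before the CNN discriminator is updated once. SeqGAN [6] trains the generator $G$ with policy gradient, where the reward signal for the policy gradient is obtained through Monte Carlo search over sequences produced by the generator. Experiments show that SeqGAN can outperform conventional methods in speech, poetry, and music generation. Reed et al. [43] proposed using GAN to generate images from text descriptions. The text encoding is used as a conditional input to the generator and, to exploit the text encoding, it is also fed as extra information into a specific layer of the discriminator, which then judges whether the image matches the text description. The experimental results show that the generated images are highly relevant to the text descriptions.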
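3.3 Other fields
Besides image and vision and speech and language, GAN can also be combined with reinforcement learning, as in SeqGAN [6] mentioned above. Some researchers have combined GAN with imitation learning [44-45] and with actor-critic methods [46]. Hu et al. [7] proposed MalGAN to help detect malicious code, using GAN to generate adversarial malware samples; the experimental results show that the GAN-based method outperforms conventional methods based on black-box detection models. Chidambaram et al. [8] proposed an extension of GAN based on style transfer, in which the discriminator is used to regularize the generator in place of a loss function, and demonstrated the effectiveness of the method with chess experiments.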
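4 Reflections and outlook on GAN
4.1 Significance and advantages of GAN
GAN is of great significance for the development of generative models. As a generative method, GAN effectively addresses the problem of generating data that admit a natural interpretation. Especially for high-dimensional data, the neural network structure used does not restrict the generation dimension, which greatly broadens the range of data samples that can be generated. The neural network structure can also integrate various loss functions, which increases design freedom. The training process of GAN innovatively uses the confrontation between two neural networks as the training criterion and can be carried out with back-propagation. Training requires neither inefficient Markov chain methods nor approximate inference, and it involves no complicated variational lower bound, which greatly reduces the difficulty and improves the efficiency of training generative models. The generation process of GAN needs no cumbersome sampling sequence and can sample and infer new samples directly, which improves generation efficiency. Adversarial training avoids directly copying or averaging real data, which increases the diversity of generated samples. In practice, the samples generated by GAN are easy for humans to understand. For example, GAN can generate very sharp and clear images, offering a possible way to creatively generate data that are meaningful to humans.
Besides its contribution to generative models, GAN is also instructive for semi-supervised learning. GAN does not need data labels during learning. Although GAN was not proposed for semi-supervised learning, its training process can be used to pre-train a model on unlabeled data in semi-supervised learning. Specifically, a GAN is first trained on unlabeled data; then, building on the trained GAN's understanding of the data, the discriminator is trained with a small amount of labeled data for conventional classification and regression tasks.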
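4.2 Shortcomings and development trends of GAN
Although GAN solves some problems of generative models and is inspiring for the development of other methods, it is not perfect; while solving existing problems it also introduces new ones. The most prominent advantage of GAN is also the root of its biggest problem. GAN uses adversarial learning as its criterion, and in theory it is not yet possible to determine whether the model converges or whether an equilibrium exists. Training requires the two adversarial networks to be balanced and synchronized; otherwise good results are hard to obtain. In practice, however, the synchronization of the two networks is hard to control, so training may be unstable. In addition, as a generative model based on neural networks, GAN shares the general weakness of neural network models, namely poor interpretability. Furthermore, although the samples generated by GAN are diverse, mode collapse occurs [4]: the model may generate many samples that nonetheless differ little from a human point of view.
Although GAN has these problems, its research progress undeniably shows broad prospects. For example, Wasserstein GAN [29] was reported to resolve the training instability and to largely alleviate mode collapse. How to fully solve mode collapse and further optimize the training process is one research direction for GAN. Theoretical analysis of the convergence of GAN and the existence of equilibria is another important topic for future research. These directions aim to remedy the shortcomings of GAN. From the perspective of developing and applying GAN, how to generate diverse data that can interact with humans from simple random inputs is a near-term application direction. From the perspective of combining GAN with other methods, how to better integrate GAN with feature learning, imitation learning, reinforcement learning, and other techniques, in order to develop new AI applications or advance these methods, is a meaningful direction. In the long run, how to use GAN to promote the development and application of AI, improve the ability of AI to understand the world, and even stimulate the creativity of AI are questions worth researchers' consideration.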
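4.3 Relation between GAN and parallel intelligence
In 2004, Wang Fei-Yue [47-48] proposed the ACP (Artificial societies, computational experiments, and parallel execution) theory and the parallel systems method for modeling and control of complex systems. Parallel systems emphasize virtual-real interaction: artificial systems are built to describe actual systems, computational experiments are used to learn and evaluate various computational models, and parallel execution is used to improve the performance of the actual system, so that the artificial and actual systems advance together [49-50]. The ACP theory and the parallel systems method have now developed into the broader theory of parallel intelligence [51]. In GAN training, real data samples and generated data samples interact through the adversarial networks, and the trained generator can generate more virtual samples than there are real samples. GAN can deepen the parallel-systems ideas of virtual-real interaction and integration. As an effective generative model, GAN can be incorporated into the research framework of parallel intelligence. This section discusses the relation between GAN and parallel intelligence from the following aspects.
4.3.1 GAN and parallel vision
Parallel vision [52] is the extension of the ACP theory to visual computing; its basic framework and architecture are shown in Fig. 6. Parallel vision combines computer graphics, virtual reality, machine learning, knowledge automation, and other technologies, and uses artificial scenes, computational experiments, and parallel execution to build a theoretical and methodological system for visual perception and understanding in complex environments. Parallel vision uses artificial scenes to simulate and represent complex, challenging actual scenes, which makes it possible to collect and annotate large-scale diverse datasets; it designs and evaluates vision algorithms through computational experiments; and finally it optimizes the vision system online through parallel execution. The virtual artificial scenes can be produced with GAN, as shown in Fig. 5. GAN can generate large-scale, diverse image datasets which, combined with real datasets to train vision models, help improve the generalization ability of the vision models.
Fig. 6 Basic framework and architecture for parallel vision [52]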
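4.3.2 GAN and parallel control
Parallel control [53-55] is a form of feedback control and a concrete application of the ACP theory to the control of complex systems; its structure is shown in Fig. 7. The core of parallel control is to use artificial systems for modeling and representation, computational experiments for analysis and evaluation, and parallel execution to control the complex system. Besides the generation of artificial systems and the analysis of computational experiments, the parallel execution of the artificial and actual systems can also be simulated with GAN: on the one hand, for predictive learning of the artificial system and feedback learning of the actual system; on the other hand, for imitation learning and reinforcement learning of the control unit.
4.3.3 GAN and parallel learning
Parallel learning [56] is a new theoretical framework for machine learning and the embodiment of the ACP theory in the field of learning; its theoretical framework is shown in Fig. 8. The framework emphasizes three points: using predictive learning to explore data as they evolve over time, using ensemble learning to explore data over their spatial distribution, and using prescriptive learning to explore the direction of data generation. As a new theoretical framework for machine learning, parallel learning is closely related to parallel vision and parallel control. GAN can be developed together with parallel learning in areas such as big-data generation and predictive learning based on computational experiments.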
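Fig. 7 Structure of parallel control systems [55]
Fig. 8 Theoretical framework of parallel learning [56]
5 Conclusion
This paper surveyed the research progress on GAN. Since it was proposed, GAN has attracted immediate attention from AI researchers. Its basic idea comes from the two-player zero-sum game in game theory: it consists of a generator and a discriminator that are trained iteratively through adversarial learning, approaching a Nash equilibrium. As a generative model, GAN does not estimate the distribution of data samples directly; instead, it estimates the underlying distribution through model learning and generates new samples from the same distribution. This ability to generate "unlimited" new samples from the underlying distribution has great application value in image and vision computing, speech and language processing, information security, and other fields.
This paper also looked ahead to the development trends of GAN, focusing on the relation between GAN and parallel intelligence. We consider that GAN can deepen the parallel-systems ideas of virtual-real interaction and integration, and that it provides specific and rich algorithmic support for the ACP theory. In parallel systems such as parallel vision, parallel control, and parallel learning, GAN can support theoretical and applied research by generating data samples that follow the same distribution as real data. Therefore, as an effective generative model, GAN can be incorporated into the research framework of parallel intelligence.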
References
1 Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Proceedings of the 2014 Conference on Advances in Neural Information Processing Systems 27. Montreal, Canada: Curran Associates, Inc., 2014. 2672-2680
2 Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
3 Ratliff L J, Burden S A, Sastry S S. Characterization and computation of local Nash equilibria in continuous games. In: Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton). Monticello, IL, USA: IEEE, 2013. 917-924
4 Goodfellow I. NIPS 2016 tutorial: generative adversarial networks. arXiv preprint arXiv: 1701.00160, 2016.
5 Li J W, Monroe W, Shi T L, Jean S, Ritter A, Jurafsky D. Adversarial learning for neural dialogue generation. arXiv preprint arXiv: 1701.06547, 2017.
6 Yu L T, Zhang W N, Wang J, Yu Y. SeqGAN: sequence generative adversarial nets with policy gradient. arXiv preprint arXiv: 1609.05473, 2016.
7 Hu W W, Tan Y. Generating adversarial malware examples for black-box attacks based on GAN. arXiv preprint arXiv: 1702.05983, 2017.
8 Chidambaram M, Qi Y J. Style transfer generative adversarial networks: learning to play chess differently. arXiv preprint arXiv: 1702.06762, 2017.
9 Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127
10 Kingma D P, Welling M. Auto-encoding variational Bayes. arXiv preprint arXiv: 1312.6114, 2013.
11 Rezende D J, Mohamed S, Wierstra D. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv: 1401.4082, 2014.
12 Hinton G E, Sejnowski T J, Ackley D H. Boltzmann Machines: Constraint Satisfaction Networks that Learn. Technical Report No. CMU-CS-84-119, Carnegie-Mellon University, Pittsburgh, PA, USA, 1984.
13 Ackley D H, Hinton G E, Sejnowski T J. A learning algorithm for Boltzmann machines. Cognitive Science, 1985, 9(1): 147-169
14 Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554
15 Bengio Y, Thibodeau-Laufer É, Alain G, Yosinski J. Deep generative stochastic networks trainable by backprop. arXiv preprint arXiv: 1306.1091, 2013.
16 Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507
17 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444
18 Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: ACM, 2012. 1097-1105
19 He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 770-778
20 Hinton G, Deng L, Yu D, Dahl G E, Mohamed A R, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T N, Kingsbury B. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine, 2012, 29(6): 82-97
21 Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of the 2014 Conference on Advances in Neural Information Processing Systems 27. Montreal, Canada: Curran Associates, Inc., 2014. 3104-3112
22 He D, Chen W, Wang L W, Liu T Y. A game-theoretic machine learning approach for revenue maximization in sponsored search. arXiv preprint arXiv: 1406.0728, 2014.
23 Silver D, Huang A, Maddison C J, Guez A, Sifre L, van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D. Mastering the game of go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489
24 Schmidhuber J. Learning factorial codes by predictability minimization. Neural Computation, 1992, 4(6): 863-879
25 Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 2016, 17(59): 1-35
26 Chen W Z, Wang H, Li Y Y, Su H, Wang Z H, Tu C H, Lischinski D, Cohen-Or D, Chen B. Synthesizing training images for boosting human 3D pose estimation. In: Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV). Stanford, CA, USA: IEEE, 2016. 479-488
27 Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R. Intriguing properties of neural networks. arXiv preprint arXiv: 1312.6199, 2013.
28 McDaniel P, Papernot N, Celik Z B. Machine learning in adversarial settings. IEEE Security & Privacy, 2016, 14(3): 68-72
29 Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv preprint arXiv: 1701.07875, 2017.
30 Qi G J. Loss-sensitive generative adversarial networks on Lipschitz densities. arXiv preprint arXiv: 1701.06264, 2017.
31 Odena A. Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv: 1606.01583, 2016.
32 Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv: 1411.1784, 2014.
33 Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint arXiv: 1605.09782, 2016.
34 Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Proceedings of the 2016 Neural Information Processing Systems. Barcelona, Spain: Department of Information Technology IMEC, 2016. 2172-2180
35 Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv: 1610.09585, 2016.
36 Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z H, Shi W Z. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv: 1609.04802, 2016.
37 Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: 1409.1556, 2014.
38 Santana E, Hotz G. Learning a driving simulator. arXiv preprint arXiv: 1608.01230, 2016.
39 Gou C, Wu Y, Wang K, Wang F Y, Ji Q. Learning-by-synthesis for accurate eye detection. In: Proceedings of the 2016 IEEE International Conference on Pattern Recognition (ICPR). Cancun, Mexico: IEEE, 2016.
40 Gou C, Wu Y, Wang K, Wang K F, Wang F Y, Ji Q. A joint cascaded framework for simultaneous eye detection and eye state estimation. Pattern Recognition, 2017, 67: 23-31
41 Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W D, Webb R. Learning from simulated and unsupervised images through adversarial training. arXiv preprint arXiv: 1612.07828, 2016.
42 Zhang Y Z, Gan Z, Carin L. Generating text via adversarial training. In: Proceedings of the 2016 Conference on Advances in Neural Information Processing Systems 29. Curran Associates, Inc., 2016.
43 Reed S, Akata Z, Yan X C, Logeswaran L, Lee H, Schiele B. Generative adversarial text to image synthesis. In: Proceedings of the 33rd International Conference on Machine Learning. New York, NY, USA: ICML, 2016.
44 Ho J, Ermon S. Generative adversarial imitation learning. In: Proceedings of the 2016 Conference on Advances in Neural Information Processing Systems 29. Curran Associates, Inc., 2016. 4565-4573
45 Finn C, Christiano P, Abbeel P, Levine S. A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv: 1611.03852, 2016.
46 Pfau D, Vinyals O. Connecting generative adversarial networks and actor-critic methods. arXiv preprint arXiv: 1610.01945, 2016.
47 Wang Fei-Yue. Parallel system methods for management and control of complex systems. Control and Decision, 2004, 19(5): 485-489, 514 (in Chinese)
48 Wang Fei-Yue. Computational experiments for behavior analysis and decision evaluation of complex systems. Journal of System Simulation, 2004, 16(5): 893-897 (in Chinese)
49 Wang F Y, Zhang J, Wei Q L, Zheng X H, Li L. PDP: parallel dynamic programming. IEEE/CAA Journal of Automatica Sinica, 2017, 4(1): 1-5
50 Bai Tian-Xiang, Wang Shuai, Shen Zhen, Cao Dong-Pu, Zheng Nan-Ning, Wang Fei-Yue. Parallel robotics and parallel unmanned systems: framework, structure, process, platform and applications. Acta Automatica Sinica, 2017, 43(2): 161-175 (in Chinese)
51 Wang F Y, Wang X, Li L X, Li L. Steps toward parallel intelligence. IEEE/CAA Journal of Automatica Sinica, 2016, 3(4): 345-348
52 Wang Kun-Feng, Gou Chao, Wang Fei-Yue. Parallel vision: an ACP-based approach to intelligent vision computing. Acta Automatica Sinica, 2016, 42(10): 1490-1500 (in Chinese)
53 Wang Fei-Yue. On the modeling, analysis, control and management of complex systems. Complex Systems and Complexity Science, 2006, 3(2): 26-34 (in Chinese)
54 Wang Fei-Yue, Liu De-Rong, Xiong Gang, Cheng Chang-Jian, Zhao Dong-Bin. Parallel control theory of complex systems and applications. Complex Systems and Complexity Science, 2012, 9(3): 1-12 (in Chinese)
55 Wang Fei-Yue. Parallel control: a method for data-driven and computational control. Acta Automatica Sinica, 2013, 39(4): 293-302 (in Chinese)
56 Li Li, Lin Yi-Lun, Cao Dong-Pu, Zheng Nan-Ning, Wang Fei-Yue. Parallel learning - a new framework for machine learning. Acta Automatica Sinica, 2017, 43(1): 1-8 (in Chinese)
WANG Kun-Feng Associate professor at The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interest covers intelligent transportation systems, intelligent vision computing, and machine learning. E-mail: kunfeng.wang@ia.ac.cn
GOU Chao Ph. D. candidate at The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interest covers intelligent transportation systems, image processing, and pattern recognition. E-mail: gouchao2012@ia.ac.cn
DUAN Yan-Jie Ph. D. candidate at The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. Her research interest covers intelligent transportation systems, machine learning and its application. E-mail: duanyanjie2012@ia.ac.cn
LIN Yi-Lun Ph. D. candidate at The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interest covers social computing, intelligent transportation systems, deep learning and reinforcement learning. E-mail: linyilun2014@ia.ac.cn
ZHENG Xin-Hu Postgraduate in the Department of Computer Science and Engineering, University of Minnesota, USA. His research interest covers social computing, machine learning, and data analytics. E-mail: zheng473@umn.edu
WANG Fei-Yue Professor at The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. Director of the Research Center for Computational Experiments and Parallel Systems Technology, National University of Defense Technology. His research interest covers modeling, analysis, and control of intelligent systems and complex systems. Corresponding author of this paper. E-mail: feiyue.wang@ia.ac.cn
... This paper reconstructs the image deblurring as a sample conversion between the blurred image domain and the clear image domain. This paper proposes a model named AS-CycleGAN which combines CGAN and image translation based on Generative Adversarial Networks (GAN) [10,11]. To realize image conversion, AS-CycleGAN adopts WGAN-GP [12,13] and two symmetric CGAN based on CycleGAN [11]. ...
... This paper proposes a model named AS-CycleGAN which combines CGAN and image translation based on Generative Adversarial Networks (GAN) [10,11]. To realize image conversion, AS-CycleGAN adopts WGAN-GP [12,13] and two symmetric CGAN based on CycleGAN [11]. Meanwhile, the conversion of images between the blurred domain and clear domain by using a deep residual network structure. ...
... Experiments show that the fusion method can change the shape and features of the input image and generate a new image while retaining the main content of the input image. Wang et al. summarized the image super-resolution technology based on deep learning and divided it into three types: supervised, unsupervised and specific application fields, and provided systematic super-resolution theory and practical methods [13]. Lei et al. proposed a framework that can be used to decipher passwords, so that GAN can be applied to decipher passwords [14]. ...
Article
Full-text available
With the generation of images, videos, and other data, how to identify the gait of the action in the video has gradually become the focus of research. Aiming at the problems of complex and changeable movements, strong coherence, and serious occlusion in dance video images, this paper proposes a dynamic recognition model of gait contour of dance movements based on GAN (generative adversarial networks). GAN method is used to convert the gait diagrams in any state into a group of gait diagrams in normal state with multiple angles, which are arranged in turn. In order to retain as much original feature information as possible, multiple loss strategy is adopted to optimize the network, increase the distance between classes, and reduce the distance within classes. Experimental results show that the average recognition rates of this model at 50°, 90°, and 120°are 93.24, 98.24, and 97.93, respectively, which shows that the recognition accuracy of dance movement recognition method is high. And this method can effectively improve the dynamic recognition of gait contour of dance movements.
... Recently, the methods based on generative adversarial networks (GAN) [24] have achieved great success [25][26][27][28]. Among the several GAN, the cycle-consistent adversarial network (CycleGAN) [29] has received a lot of attention. ...
Article
Full-text available
Foggy weather can cause such problems as blurred image information and the loss of image details, which may pose great challenges to road traffic target detection based on images and videos. In this study, we propose a domain‐adaptive road vehicle target detection method to implement domain adaptation for the real foggy scene. We firstly constructed a highway vehicle detection dataset with foggy images (HVFD), which contains normal weather images and foggy images and provides a complete data support for vehicle detection based on computer vision. Secondly, by improving CycleGAN we designed an improved generative confrontation network (CPGAN), which realised the style transfer between foggy images and normal weather images. Finally, we formulated a YOLOv4 target detection framework according to the domain adaptation based on the pre‐trained YOLOv4 fog vehicle detection model. The experimental results show that the method we put forward can effectively improve vehicle detection performance and reduce the work of manually labelling a large number of foggy image tags, which has a strong generalisation ability for computer vision‐based applications in low‐visibility weather.
... The generative adversarial network (GAN) was proposed by Goodfellow, which uses convolutional neural networks to train image samples [10,11]. As a probabilistic generative model, the generative adversarial network has been applied to many visual tasks, with especially strong performance in image generation. ...
Article
This paper proposes a self-adjusting generative adversarial network image denoising algorithm. The algorithm combines noise reduction and the adaptive learning GAN model. First, the algorithm uses image features to preprocess the image and extract the effective information of the image. Then, the edge signal is classified according to the threshold value to suppress the problem of "excessive strangulation," and the edge signal of the image is extracted to enhance the effective signal in the high-frequency signal. Finally, the algorithm uses an adaptive learning GAN model to further train the image. Each iteration of the generator network is composed of three stages, after which the best value is obtained. In the experiments, the proposed algorithm is compared with the traditional algorithm and with algorithms from the literature. Under the same conditions, the proposed algorithm maintains operating efficiency while achieving better fidelity, and it still denoises effectively. The edge signal of the image is preserved, which gives a better visual effect.
... We use the idea of Generative Adversarial Networks [29] to improve the segmentation performance. As shown in Fig. 1, it is an overview of the proposed network with GAN structure. ...
Article
Iris segmentation plays a vital role in the iris recognition system. However, it faces many challenges in non-ideal situations. To improve the iris segmentation performance for possible mobile devices, this paper presents a light iris segmentation method based on fully convolutional network. Firstly, a lightweight fully convolutional iris segmentation network is developed. Secondly, we adopt weighted loss, multi-level feature dense fusion module, multi-supervised training of multi-scale image and generative adversarial network to improve the segmentation performance. The final model is 6.21 M. Experiments show that the proposed method achieves 99.30% PA, 95.35% mIoU on UBIRIS.v2 and 99.66% PA, 96.75% mIoU on CASIA-Iris-Thousand database, which is relatively encouraging for a light iris segmentation network. It takes 41.56 ms and 63.03 ms to segment an image of UBIRIS.v2 and CASIA-Iris-Thousand databases, respectively.
Article
With the expansion of people’s needs, the translation performance of traditional models is increasingly unable to meet current demands. This article mainly studied the Transformer model. First, the structure and principle of the Transformer model were briefly introduced. Then, the model was improved by a generative adversarial network (GAN) to improve the translation effect of the model. Finally, experiments were carried out on the linguistic data consortium (LDC) dataset. It was found that the average Bilingual Evaluation Understudy (BLEU) value of the improved Transformer model improved by 0.49, and the average perplexity value reduced by 10.06 compared with the Transformer model, but the computation speed was not greatly affected. The translation results of the two example sentences showed that the translation of the improved Transformer model was closer to the results of human translation. The experimental results verify that the improved Transformer model can improve the translation quality and be further promoted and applied in practice to further improve the English translation and meet application needs in real life.
Article
This paper focuses on the difficulties that appear when the number of fault samples collected from a permanent magnet synchronous motor is too low and seriously unbalanced compared with the normal data. In order to effectively extract the fault characteristics of the motor and provide the basis for the subsequent fault mechanism and diagnosis method research, a permanent magnet synchronous motor fault feature extraction method based on variational auto-encoder (VAE) and improved generative adversarial network (GAN) is proposed in this paper. The VAE is used to extract fault features, combined with the GAN to extend the data samples, and the two-dimensional features are extracted by means of mean and variance for visual analysis to measure the classification effect of the model on the features. Experimental results show that the method has good classification and generation capabilities to effectively extract the fault features of the motor and its accuracy is as high as 98.26%.
Article
In this paper, we propose a framework to incorporate robotics and software-defined surrogates using the ACP-based parallel systems theory. The framework offers a flexible, cost-effective and safe platform to develop and conduct experiments on UAVs, UGVs, USVs and AUVs, and links unmanned vehicles with cyber-physical-social systems (CPSS). This paper focuses on the structure of the proposed framework and each of the functional modules. Relevant tools, as well as further applications and challenges of the proposed system are also discussed.
Article
This paper presents a novel loss-sensitive generative adversarial net (LS-GAN). Compared with the classic GAN that uses a dyadic classification of real and generated samples to train the discriminator, we learn a loss function that can generate samples with the constraint that a real example should have a smaller loss than a generated sample. This results in a novel paradigm of loss-sensitive GAN (LS-GAN), as well as a conditional derivative that can generate samples satisfying specified conditions by properly defining a suitable loss function. The theoretical analysis shows that the LS-GAN can generate samples following the true data density we wish to estimate. In particular, we focus on a large family of Lipschitz densities for the underlying data distribution, allowing us to use a class of Lipschitz losses and generators to model the LS-GAN. This relaxes the assumption on the classic GANs that the model should have infinite modeling capacity to obtain the similar theoretical guarantee. This provides a principled way to regularize a family of deep generative models with the proposed LS-GAN criterion, preventing them from being overfitted to duplicate few training examples. Furthermore, we derive a non-parametric solution that characterizes the upper and lower bounds of the losses learned by the LS-GAN. We conduct experiments to evaluate the proposed LS-GAN on classification and generation tasks, and demonstrate the competitive performances as compared with the other state-of-the-art models.
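The constraint described in this abstract, that a real example should have a smaller loss than a generated one, can be illustrated with a hinge-style penalty. This is only a minimal sketch and not the authors' implementation: the function name `margin_violation` and the fixed `margin` value are hypothetical, chosen here for illustration, and the loss values are made-up inputs rather than outputs of a trained network.

```python
def margin_violation(loss_real, loss_fake, margin=1.0):
    """Penalty for violating loss_real <= loss_fake - margin.

    loss_real: loss assigned to a real example
    loss_fake: loss assigned to a generated example
    margin:    required gap between the two (a fixed value is assumed here)
    Returns 0.0 when the constraint holds, otherwise the size of the violation.
    """
    return max(0.0, margin + loss_real - loss_fake)


# Constraint satisfied: the real example's loss is lower by more than the margin.
satisfied = margin_violation(loss_real=0.2, loss_fake=2.0)

# Constraint violated: the real example's loss exceeds the generated one's.
violated = margin_violation(loss_real=1.5, loss_fake=1.0)
```

Under these assumptions the penalty is zero whenever the real example is already better than the generated one by at least the margin, so only violating pairs would contribute to training.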
Conference Paper
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
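The source-reversal trick mentioned at the end of this abstract is a preprocessing step and can be sketched in a few lines. The helper name `reverse_source` and the placeholder tokens below are hypothetical; the abstract does not describe the authors' actual data pipeline.

```python
def reverse_source(pairs):
    """Reverse the token order of each source sentence; targets are left unchanged.

    pairs: iterable of (source_tokens, target_tokens) tuples
    """
    return [(list(reversed(src)), list(tgt)) for src, tgt in pairs]


# Placeholder tokens stand in for real source and target words.
pairs = [(["a", "b", "c"], ["x", "y", "z"])]
reversed_pairs = reverse_source(pairs)
```

After reversal, the first source token "a" sits directly next to the start of the target sequence, which is the kind of short-term dependency the abstract credits with making optimization easier.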
Article
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
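The minimax game in this abstract can be checked numerically. The sketch below assumes the standard value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], which the abstract itself does not spell out. The function name `gan_value` and the sample discriminator outputs are hypothetical, and no network is trained here.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: values D(x) for samples drawn from the training data
    d_fake: values D(G(z)) for samples drawn from the generator
    All values must lie strictly between 0 and 1.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


# At the solution described in the abstract, D equals 1/2 everywhere.
equilibrium_value = gan_value([0.5] * 8, [0.5] * 8)

# A discriminator that separates real from generated samples scores higher,
# which is what D maximizes and G tries to prevent.
confident_value = gan_value([0.9] * 8, [0.1] * 8)
```

With D fixed at 1/2, both terms equal log(1/2), so the value is -log 4. Any discriminator that can still tell the two sample sets apart attains a larger value.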
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Conference Paper
The cascade regression framework has been successfully applied to facial landmark detection and has recently achieved state-of-the-art performance. It requires a large number of facial images with labeled landmarks for training regression models. We propose to use the cascade regression framework to detect the eye center by capturing its contextual and shape information of other related eye landmarks. For eye detection, however, it is time-consuming to collect large-scale training data, and manual annotation of eye-related landmarks can be unreliable. In addition, it is difficult to collect enough training data to cover various illuminations, subjects with different head poses and gaze directions. To tackle this problem, we propose to learn cascade regression models from synthetic photorealistic data. In our proposed approach, the eye region is first coarsely localized by a facial landmark detection method. Then we learn the cascade regression models iteratively to predict the eye shape updates based on local appearance and shape features. Experimental results on benchmark databases such as BioID and GI4E show that our proposed cascade regression models learned from synthetic data can accurately localize the eye center. Comparisons with existing methods also demonstrate that our proposed framework can achieve preferable performance against state-of-the-art methods.
Article
Eye detection and eye state (close/open) estimation are important for a wide range of applications, including iris recognition, visual interaction and driver fatigue detection. Current work typically performs eye detection first, followed by eye state estimation by a separate classifier. Such an approach fails to capture the interactions between eye location and its state. In this paper, we propose a method for simultaneous eye detection and eye state estimation. Based on a cascade regression framework, our method iteratively estimates the location of the eye and the probability of the eye being occluded by the eyelid. At each iteration of cascaded regression, image features from the eye center as well as contextual image features from the eyelid and eye corners are jointly used to estimate the eye position and openness probability. Using the eye openness probability, the most likely eye state can be estimated. Since the method requires a large number of facial images with labeled eye-related landmarks, we propose to combine real and synthetic images for training. This learning-by-synthesis method further improves the performance. Evaluations of our method on benchmark databases such as BioID and Gi4E as well as on real-world driving videos demonstrate its superior performance compared to state-of-the-art methods for both eye detection and eye state estimation.
Article
Deep reinforcement learning is a focus research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as the necessary requirement, for the real reinforcement learning, is discussed. Finally, the principle of the parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future computational intelligence.