Abertay University
School of Design and Informatics
Using Generative Adversarial Networks
for Content Generation in Games
Dissertation
Submitted in partial fulfillment of the requirements for the degree of
MSc in Computer Games Technology
August 2020
Ridam S.S. Rahman
Abstract
Building game worlds for the players to explore can be a particularly time-consuming
activity, especially for game designers. Editing tools in game engines can heavily
improve this process by allowing users to edit one or more identical level
components in a way that usually resembles Microsoft Paint’s tool system;
nonetheless, these methods still require a lot of input, and designers oftentimes need
to care about a multitude of small details when creating levels and game worlds.
While there is undoubtedly a certain positive value in manually crafted details,
sometimes these small minutiae replicate patterns seen elsewhere. A question
arises: what if we could encapsulate these patterns into replicable features? Until   
now, such attempts at replication have been done through standard procedural
content generation techniques. But technological advancements made in the field of
machine learning now allow us to generate content by unprecedented means.
The aim of this dissertation is to determine whether it is possible, using a generative
adversarial network, to create game worlds with a semantic tool-brush, which would allow
users to paint a map with colours that represent entities such as mountains, hills,
rivers, etc., and generate a 3D world accordingly. In order to achieve such a result, an
open-source GAN model by Nvidia called SPADE will be trained on game data to
convert a 2D image into a 3D game world, in a similar fashion to Nvidia’s
GauGAN model.



Contents
1. Introduction
1.1. Problem Definition
1.2. Aims and Objectives
1.3. Project Overview
2. Background and Literature Review
2.1. Overview on Procedural Content Generation
2.2. Image Synthesis using GANs
2.2.1. Random Generation
2.2.2. Paired Image to Image Translation
2.2.3. Unpaired Image to Image Translation
2.2.4. Image Synthesis using Text
2.3. Music Generation
2.4. Terrain Generation using GANs
3. Requirements Specification and Design
3.1. Training Data Set
3.1.1. Requirements
3.1.2. Acquiring the Data Set
3.1.3. Improving the Data Set
3.2. Compute Power
3.2.1. Limitations
3.2.2. Possibilities
4. Implementation
4.1. SPADE
5. Evaluation
5.1. Fréchet Inception Distance
5.2. Summary of Results
6. Conclusion
6.1. Future work
References

Chapter 1
Introduction
Creating fictional worlds is an activity mankind has been pursuing for ages. If we do not
limit ourselves to strictly computer-generated content, the set of virtual worlds we can
draw from becomes far larger, full of material that has been around for longer than
anyone can remember. Human folklore has always been filled with imaginary worlds that
mimic reality in one way or another. Depictions of the Garden of Eden, for instance,
almost always end up portraying a pleasant green forest, apart from a few exceptional cases.
i.1 [On the left: Lucas Cranach the Elder, The Garden of Eden (1530)]
i.2 [On the right: Peter Paul Rubens and Jan Brueghel the Elder, The garden of Eden with the fall of man (1615)]
i.3 [On the left: Thomas Cole, The Garden of Eden (1828)]
i.4 [On the right: Hieronymus Bosch, The Garden of Earthly Delights (1503-1515)]
While some would assume these similarities are due to a lack of imagination, an
interesting implication is that we simply rework certain concepts and ideas that we
grow up with. We subconsciously build our imaginary worlds around certain tropes, or
features, that we absorb from the environments we live in and our cultural
backgrounds. It is largely the reason why the fantasy genre in western countries
has become a reiteration of tropes seen in J.R.R. Tolkien’s Lord of the Rings.
From a certain perspective, without even realizing it, humans do something similar to
what happens in the context of machine learning: we see, we learn, and we elaborate
outputs based on the inputs we have received.
Before our modern-day computers, all we could do to describe and depict imaginary
worlds was to rely on the talent of writers and artists. Although the results of
handcrafted work can be stunning, it still remains a really tedious process that can
take a lot of time. But ever since the power of automata became part of our lives, we
have managed to figure out new ways to create virtual worlds using technology. Among
the many, one that clearly comes to mind is procedural content generation. The idea at
the core of the concept is, more or less, to replicate patterns. According to cognitive
neuroscience, humans have a knack for pattern recognition. We even show a noticeable
tendency towards pareidolia, a brain mechanism that can lead people to see
dinosaur-shaped clouds, for example. Knowing all of this, it should not come as a
surprise that humans are good at coming up with techniques to algorithmically
generate pseudo-randomized patterns. These methods have been around for years,
with impressive results that can easily be seen in different kinds of entertainment
media, from movies to video games. Although it is true that there are already
techniques that can mimic natural features based on patterns, the implementation
process usually requires establishing a set of rules for content generation, which may
not be immediate, and still requires a lot of input from the designers. In layman's terms,
we need to figure out the rules that shape our world before being able to create a
world.
Technological advancements achieved over the course of the last decade, however,
have opened up a whole new set of possibilities for content generation using
machine learning. What if the patterns we try to replicate were extracted by a
neural network? What if we could turn these patterns into feature vectors? In the end,
any information regarding virtual worlds can be represented as a data distribution.
Patterns and features can be automatically gathered during the training process from
the very content we are trying to synthesize. The real issue, though, is gathering such
training data.
1.1. Problem Definition
Most common game editing tools allow users to create entire virtual worlds starting
from simple blocks. Sometimes these tools are even integrated in the game itself:
creative players can build extremely rich and diverse environments from scratch just by
using these in-game editors, although these are only the tip of the iceberg compared to
all the tools available nowadays for game world editing, including third-party software
and fully fledged game engines.
i.5 [On the left, screenshot from a world generated using Minecraft’s Creative Mode, credits to Mojang Studios]
i.6 [On the right, screenshot from Dreams, credits to Media Molecule]
i.7 [On the left, screenshot from Little Big Planet’s Create Mode, credits to Media Molecule]
i.8 [On the right, screenshot from Mario Maker, credits to Nintendo]
Games such as Minecraft (Mojang Studios, 2009), Dreams (Media Molecule, 2020),
Little Big Planet (Media Molecule, 2008), or Mario Maker (Nintendo, 2015) allow players
to create the game world by playing the game itself. While being able to create game
content has become an in-game feature for some commercially successful games,
turning an often tedious process for developers into an enjoyable experience for players,
one thing that hasn’t changed is the effort that has to be put into the process of turning
one’s imagination into a game level. The aforementioned third-party software, such as
MCEdit (MCEdit.net, 2016) for Minecraft, becomes quite useful in these cases, as tools
like this can dramatically expand the ability to create worlds: MCEdit, for example,
allows users to modify a Minecraft save file, helping them reshape the game world in
ways that are more complex than what can be achieved by merely using the classic
in-game editor in Creative Mode. The number of features these tools offer could almost
qualify them as low-level game engines.
i.9 [Screenshot from MCEdit, credits to MCEdit.net]
As complex as they can be, these tools nonetheless still present certain limitations: for
instance, although entertaining, designing and building game worlds can still be a tiring
experience, as current world editing tools only allow worlds to be created by interacting
with individual sets of game components in a way that is similar to Microsoft Paint, a
method that requires a lot of interaction from the user.
Is it possible to build environments using a more efficient method? Is there some way to
design worlds using artificial intelligence, without losing the intrinsic complexity
derived from the time and work spent on more conventional world-building techniques?
To answer such questions, it is opportune to first focus on the technological
advancements that have been made in the field of machine learning, particularly where
content generation is concerned.
Generative Adversarial Networks, or GANs for short, have recently taken the
spotlight as a machine learning technique (Goodfellow I. et al., 2014), due to the
remarkable possibilities they offer. In this machine learning model, two neural networks
compete against each other so that one of them becomes good at “fooling” the other,
acquiring in the process the ability to replicate any data distribution. In other
words, given a proper training set, a GAN can imitate or actually create any kind of
content, from music to art.
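To make the adversarial setup more concrete, the following is a minimal sketch of such a training loop, written in PyTorch on a toy one-dimensional data distribution; the architectures and hyperparameters are illustrative assumptions, not the configuration used later in this project.

import torch
import torch.nn as nn

# Toy "real" data distribution the generator has to learn to imitate: N(4, 1.5)
def real_samples(n):
    return torch.randn(n, 1) * 1.5 + 4.0

# Generator maps random noise to candidate samples; discriminator scores realness.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # 1) Train the discriminator to tell real samples apart from generated ones.
    real, fake = real_samples(64), G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train the generator to "fool" the discriminator into scoring fakes as real.
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

After enough iterations, samples drawn from the generator become statistically hard to distinguish from the real data, which is the property exploited throughout this dissertation.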
Ever since their inception, GANs have been used for a wide variety of purposes: the
fashion industry for example has benefited from GAN-made imaginary models, thus
avoiding all the expenses associated with hiring models, photographers, makeup artists,
and so forth (Wong, C., 2019); GANs have also been used to make portraits,
landscapes, and high-fidelity replications of art styles using image style transfer. Using
the latter technology, anyone can become an artist and turn any picture into something
that looks as if it were made by Van Gogh or Gauguin (Gatys et al., 2015). Ethical
concerns have also been raised, as GANs have recently been combined with
autoencoders to improve Deepfakes, synthetic media in which a person in an
existing image or video is typically replaced with someone else's likeness.
GANs offer limitless possibilities, even within the scope of games: in this context, one of
the most popular uses of GANs is upscaling 2D game content, such as textures,
in order to obtain higher-resolution versions of them (Wang et al., 2015).
AI upscaling has also been used at a commercial level in games like Resident Evil.
Last but not least, there is one recent GAN model that is extremely relevant for the
potential it offers: GauGAN, a trained model created by Nvidia.
i.10 [Screenshot from GauGAN, credits to Nvidia]
This software features a semantic tool-brush (Park T. et al., 2019), a particular
two-dimensional brush that can represent semantic references as colours. Using
GauGAN users can paint rocky pinnacles using a “Mountain brush”, or similarly use a
“Sea brush” or a “Lake brush” to paint bodies of water on a realistic landscape.
The idea proposed here is that it is possible to train a model like GauGAN on a training
set based on game data, and possibly even develop a tool that allows users to create 3D
worlds by merely drawing a 2D representation of them.
1.2. Aims and Objectives
The aim of this dissertation can be summarized in a few steps.
The first objective is to thoroughly analyze the technology behind content generation,
with particular interest in machine learning, in order to gain a full
understanding of what can be done and what the current limits of content generation
are. Although the primary focus of this dissertation will be on analyzing
the technology behind Nvidia’s GauGAN and how to generate images using semantic
representations, insight shall be gathered on other techniques as well, to avoid a
narrow view of the topic and to evaluate the possibility of game content generation
using different approaches.
After a brief overview of such techniques, semantic-image-to-photo translation shall be
explained in more detail. In particular, requirements shall be analyzed to explore the
possibility of generating game content using SPADE, the specific generative adversarial
network model used by Nvidia to create GauGAN. If such requirements can be met, a
SPADE model shall be trained in order to obtain a GAN that can process
semantic inputs and generate game data.
Once the training session is over, the results generated by the model shall be assessed
using a quantitative scoring system known as the Fréchet Inception Distance. If the
results prove valid according to the score, the experiment shall be considered
successful.
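For reference, the Fréchet Inception Distance compares the mean and covariance of Inception-network activations computed on real and generated images. A minimal sketch of the computation is shown below, assuming the activations have already been extracted from a pre-trained Inception network (not shown); the function name is our own.

import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(act_real, act_fake):
    # act_real, act_fake: Inception activations, shape (n_samples, n_features).
    mu_r, mu_f = act_real.mean(axis=0), act_fake.mean(axis=0)
    cov_r = np.cov(act_real, rowvar=False)
    cov_f = np.cov(act_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # numerical noise can introduce tiny imaginary parts
    # FID = ||mu_r - mu_f||^2 + Tr(cov_r + cov_f - 2 * sqrt(cov_r * cov_f))
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2.0 * covmean))

A lower score indicates that the distribution of generated images is closer to the distribution of real ones.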
1.3. Project Overview
The project’s most critical sections can be split into four parts. The first is understanding
content generation using GANs, with particular focus on semantic-image-to-photo
translation. This part will be covered in chapter 2, Background and Literature Review.
The second part of the project involves acquiring data that is suitable for training a
model based on such technology. This section will be covered in chapter 3,
Requirements Specification and Design.
The third part shall be mostly related to the training session. This section will be covered
in chapter 4, Implementation.
Last but not least, the fourth part of this project will be centered around the evaluation
of the results obtained from the training, and the possibilities these results open up.
This last section shall be covered in chapter 5, Evaluation.
Chapter 2
Background and Literature Review
In order to understand how to use a GAN model for generating game content, it is first
necessary to gain a more in-depth insight into how artificial content generation works and
how it has been used in the context of neural networks and within the game industry.
Before diving into that, though, a quick overview of standard procedural content
generation shall be given.
2.1. Overview on Procedural Content Generation
Many games nowadays rely in one way or another on some sort of algorithm-based
procedural content generation technique in order to generate the game world. Some of
these techniques are usually used to generate terrains and trees, for instance, although
there’s no specific limitation to what can be created, which can even include planets and
monsters as well. Another interesting use of procedural content generation techniques
is to create dungeons, particularly for the game genre known as rogue-likes.
Although these methods differ considerably in their implementation, they have
something in common: they are all based on a rule system, and the general objective is to
replicate patterns in an apparently random way, in order to simulate our perception of
nature as something somewhat chaotic. Some of these implementations do indeed
include a certain degree of randomness, but usually this is kept to a minimum.
Pseudo-randomness in procedurally generated game worlds is usually achieved by
relying on seeds, fixed numeric values used to initialize the pseudo-random generation,
in a fashion similar to how Perlin noise is produced. The seed is meant to be a consistent
value between playthroughs in a game world originated from that seed. If the seed and
the rules used for content generation are known, it is technically possible to predict what
is going to be generated. Although powerful, the problem with content generated using
techniques such as Perlin noise is the little control designers have over the generated
content. Developers can change the rules for content generation, but in order to edit
the game world to one's heart's content, classic editing tools remain preferable.
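As a small illustration of this predictability, the sketch below builds a heightmap from a few octaves of simple value noise (a cruder relative of Perlin noise): the same seed always reproduces exactly the same terrain.

import numpy as np

def value_noise_heightmap(seed, size=128, octaves=4):
    # The seed fully determines the output: re-running with the same seed
    # regenerates exactly the same heightmap.
    rng = np.random.default_rng(seed)
    heightmap = np.zeros((size, size))
    for octave in range(octaves):
        cells = 2 ** (octave + 2)                 # coarse grid of random heights
        grid = rng.random((cells, cells))
        idx = np.arange(size) * cells // size     # nearest-neighbour upsampling, for brevity
        heightmap += grid[np.ix_(idx, idx)] / (2 ** octave)
    return heightmap / heightmap.max()

# Identical output on every run, because the seed is fixed.
print(value_noise_heightmap(seed=42)[0, :4])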
2.2. Image Synthesis using GANs
The most famous example of content generation in common culture is image synthesis,
the creation of images using computer algorithms. The ability of machines to generate
pictures has been improving way beyond expectations over the course of the last
decades. Such improvements have led researchers to analyse the techniques used to
generate these images, thus allowing a more precise classification system that will be
used here to explain the separate approaches to image synthesis through machine
learning. According to “ Image Synthesis Using Machine Learning Techniques” (Gupta P.
et al., 2019), there are four generation techniques.
i.11 [Classification diagram, taken from Image Synthesis Using Machine Learning Techniques, 2019]
2.2.1. Random Generation
The synthesis of random images of a particular class. A random image generator
that is trained using a set of pictures of real faces will synthesize realistic images
of new faces previously unseen by the generator. The major limitation of this
technique is that a large training set is required. In the original paper by Ian
Goodfellow, et al. “Generative Adversarial Networks” (2014), GANs were used to
generate new realistic examples for the MNIST handwritten digit dataset, the
CIFAR-10 small object photograph dataset, and the Toronto Face Database.
i.12 [Examples of GANs used to generate new plausible examples for image datasets. Taken from Generative
Adversarial Networks, 2014.]
In “Unsupervised Representation Learning with Deep Convolutional Generative
Adversarial Networks” (Alec Radford, et al., 2015), a model called DCGAN is
used to demonstrate how to train stable GANs at scale in order to generate
examples of bedrooms.
i.13 [Example of GAN-Generated Photographs of Bedrooms. Taken from Unsupervised Representation Learning
with Deep Convolutional Generative Adversarial Networks, 2015.]
A notable feat accomplished by the DCGAN is the ability to perform vector
arithmetic in the latent space between two inputs.
i.14 [Example of Vector Arithmetic for GAN-Generated Faces. Taken from Unsupervised Representation Learning
with Deep Convolutional Generative Adversarial Networks, 2015.]
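In practice, this means that directions in the generator's input space correspond to visual concepts; the tiny NumPy sketch below illustrates the arithmetic with placeholder vectors, since a real example would require a trained generator.

import numpy as np

rng = np.random.default_rng(0)
latent_dim = 100

# Placeholder latent vectors standing in for averaged codes of generated images,
# e.g. "man with glasses", "man without glasses", "woman without glasses".
z_man_glasses = rng.standard_normal(latent_dim)
z_man = rng.standard_normal(latent_dim)
z_woman = rng.standard_normal(latent_dim)

# The DCGAN-style arithmetic: the result is expected to decode to "woman with glasses".
z_result = z_man_glasses - z_man + z_woman

# With a trained generator G (not defined here), the image would be G(z_result).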
In “Progressive Growing of GANs for Improved Quality, Stability, and Variation”
(Tero Karras, et al., 2017), it is demonstrated that the generation of realistic
photographs of human faces can be achieved. The model was trained using the
physical appearance of celebrities, meaning that there are features from existing
well-known personalities in the generated faces, making them seem oddly
familiar to a certain degree.
i.15 [Examples of Photorealistic GAN-Generated Faces. Taken from Progressive Growing of GANs for Improved
Quality, Stability, and Variation, 2017.]
DCGAN has also been used to generate objects and scenes.
i.16 [Example of Photorealistic GAN-Generated Objects and Scenes. Taken from Progressive Growing of GANs for
Improved Quality, Stability, and Variation, 2017.]
In “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and
Mitigation” (M. Brundage et al., 2018), the results achieved by DCGAN are
discussed in order to showcase the rapid progress of GANs from 2014 to 2017.
i.17 [Example of the Progression in the Capabilities of GANs from 2014 to 2017. Taken from The Malicious Use of
Artificial Intelligence: Forecasting, Prevention, and Mitigation, 2018.]
In “Large Scale GAN Training for High Fidelity Natural Image Synthesis” (Andrew
Brock, et al., 2018), another model called BigGAN produces results that resemble
realistic photographs.
i.18 [Example of Realistic Synthetic Photographs Generated with BigGAN. Taken from Large Scale GAN Training
for High Fidelity Natural Image Synthesis, 2018.]
In “Towards the Automatic Anime Characters Creation with Generative
Adversarial Networks” (Yanghua Jin, et al., 2017) it is also shown that GANs can
be used to generate faces of anime characters (i.e. characters drawn in the style of
Japanese animation).
i.19 [Example of GAN-Generated Anime Character Faces. Taken from Towards the Automatic Anime Characters
Creation with Generative Adversarial Networks, 2017.]
2.2.2. Paired Image to Image Translation
Used for synthesizing an image that belongs to a certain category or set using an
image of another category or set when paired images belonging to both sets are
available. In “Image-to-Image Translation with Conditional Adversarial
Networks” (Phillip Isola, et al., 2016) it is demonstrated that GANs can also be
used for Image-To-Image Translation. Examples include tasks such as the
translation of:
Semantic images to photographs of cityscapes and buildings.
Satellite photographs to Google Maps.
Photos from day to night.
Black and white photographs to color.
Sketches to color photographs.
i.20 [Example of Photographs of Daytime Cityscapes to Nighttime With pix2pix. Taken from Image-to-Image      
Translation with Conditional Adversarial Networks, 2016.]
i.21 [Example of Sketches to Color Photographs With pix2pix. Taken from Image-to-Image Translation with
Conditional Adversarial Networks, 2016.]
In “Beyond Face Rotation: Global and Local Perception GAN for Photorealistic
and Identity Preserving Frontal View Synthesis” (Rui Huang, et al., 2017) GANs
are used to generate frontal-view photographs of human faces given
photographs taken at an angle.
i.22 [Example of GAN-based Face Frontal View Photo Generation. Taken from Beyond Face Rotation: Global and
Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis, 2017.]
In “Pose Guided Person Image Generation” (Liqian Ma, et al., 2017) photographs
of human models striking new poses are generated.
i.23 [Example of GAN-Generated Photographs of Human Poses. Taken from Pose Guided Person Image
Generation, 2017.]
In “Unsupervised Cross-Domain Image Generation” (Yaniv Taigman, et al., 2016)
a GAN model is used to translate images from one domain to another, including
from street numbers to MNIST handwritten digits, and from photographs of
celebrities to what they call emojis or small cartoon faces.
i.24 [Example of Celebrity Photographs and GAN-Generated Emojis. Taken from Unsupervised Cross-Domain
Image Generation, 2016.]
In “Invertible Conditional GANs For Image Editing” (Guim Perarnau, et al., 2016)
a model called IcGAN is used to reconstruct photographs of faces with specific
features, such as changes in hair color, style, facial expression, and even gender.
i.25 [Example of Face Photo Editing with IcGAN. Taken from Invertible Conditional GANs For Image Editing, 2016.]
In “Coupled Generative Adversarial Networks” (Ming-Yu Liu, et al., 2016) the
generation of faces with specific properties such as hair color, facial expression,
and glasses is explored. Images with varied color and depth are also generated.
i.26 [Example of GANs used to Generate Faces With and Without Blond Hair. Taken from Coupled Generative
Adversarial Networks, 2016.]
In “Neural Photo Editing with Introspective Adversarial Networks” (Andrew
Brock, et al., 2016) a face photo editor is presented using a hybrid of variational
autoencoders and GANs. The editor allows rapid realistic modification of human
faces including changing hair color, hairstyles, facial expression, poses, and
adding facial hair.
i.27 [Example of Face Photo Editing with the Neural Photo Editor. Taken from Neural Photo Editing with
Introspective Adversarial Networks, 2016.]
In “Image De-raining Using a Conditional Generative Adversarial Network” (He
Zhang, et al., 2017) GANs are used for image editing, including examples such as
removing rain and snow from photographs.
i.28 [Example of Using a GAN to Remove Rain From Photographs. Taken from Image De-raining Using a
Conditional Generative Adversarial Network, 2017.]
In “Face Aging With Conditional Generative Adversarial Networks” (Grigory
Antipov, et al., 2017) GANs are used to generate photographs of faces with
different apparent ages, from younger to older.
i.29 [Example of Photographs of Faces Generated With a GAN With Different Apparent Ages. Taken from Face
Aging With Conditional Generative Adversarial Networks, 2017.]
In “Age Progression/Regression by Conditional Adversarial Autoencoder” (Zhifei
Zhang et al., 2017) GANs are used to rejuvenate photographs of faces.
i.30 [Example of Using a GAN to Age Photographs of Faces. Taken from Age Progression/Regression by
Conditional Adversarial Autoencoder, 2017.]
In “GP-GAN: Towards Realistic High-Resolution Image Blending” (Huikai Wu, et
al., 2017) it is demonstrated that GANs can be used to blend photographs,
specifically elements from different photographs such as fields, mountains, and
other large structures.
i.31 [Example of GAN-based Photograph Blending. Taken from GP-GAN: Towards Realistic High-Resolution
Image Blending, 2017.]
In “Photo-Realistic Single Image Super-Resolution Using a Generative
Adversarial Network” (Christian Ledig, et al., 2016) the SRGAN model is used to
generate output images with higher, sometimes much higher, pixel resolution.
i.32 [Example of GAN-Generated Images With Super Resolution. Taken from Photo-Realistic Single Image
Super-Resolution Using a Generative Adversarial Network, 2016.]
In “High-Quality Face Image SR Using Conditional Generative Adversarial
Networks” (Huang Bin, et al., 2017) GANs are used to create high-resolution
versions of photographs of human faces.
i.33 [Example of High-Resolution Generated Human Faces. Taken from High-Quality Face Image SR Using
Conditional Generative Adversarial Networks, 2017.]
In “Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual
Super-resolution Network” (Subeesh Vasu, et al., 2018) an example of GANs
creating high-resolution photographs is provided.
i.34 [Example of GAN-Generated Photograph Inpainting Using Context Encoders. Taken from Context Encoders:
Feature Learning by Inpainting, 2016.]
In “Semantic Image Inpainting with Deep Generative Models” (Raymond A. Yeh,
et al., 2016) GANs are used to fill in and repair intentionally damaged photographs of
human faces.
i.35 [Example of GAN-based Inpainting of Photographs of Human Faces. Taken from Semantic Image Inpainting
with Deep Generative Models, 2016.]
In “Generative Face Completion” (Yijun Li, et al., 2017) GANs are used for
inpainting and reconstructing damaged photographs of human faces.
i.36 [Example of GAN-Reconstructed Photographs of Faces. Taken from Generative Face Completion, 2017.]
In “Pixel-Level Domain Transfer” (Donggeun Yoo, et al., 2016) it is demonstrated
that GANs can be used to generate photographs of clothing as may be seen in a
catalog or online store, based on photographs of models wearing the clothing.
i.37 [Example of Input Photographs and GAN-Generated Clothing Photographs. Taken from Pixel-Level Domain
Transfer, 2016.]
In “Generating Videos with Scene Dynamics” (Carl Vondrick, et al., 2016) GANs
are used for video prediction, specifically predicting up to a second of video
frames with success, mainly for static elements of the scene.
i.38 [Example of Video Frames Generated With a GAN. Taken from Generating Videos with Scene Dynamics,
2016.]
In “Learning a Probabilistic Latent Space of Object Shapes via 3D
Generative-Adversarial Modeling” (Jiajun Wu, et al., 2016) GANs are used to
generate new three-dimensional objects (e.g. 3D models) such as chairs, cars,
sofas, and tables.
i.39 [Example of GAN-Generated Three Dimensional Objects. Taken from Learning a Probabilistic Latent Space of
Object Shapes via 3D Generative-Adversarial Modeling, 2016.]
In “3D Shape Induction from 2D Views of Multiple Objects” (Matheus Gadelha, et
al., 2016) GANs are used to generate three-dimensional models given
two-dimensional pictures of objects from multiple perspectives.
i.40 [Example of Three-Dimensional Reconstructions of a Chair From Two-Dimensional Images. Taken from 3D
Shape Induction from 2D Views of Multiple Objects, 2016.]
GauGAN, the model upon which we will be focusing within this project in
order to train our own model for game content generation, belongs to a
subcategory of paired image-to-image translation models based on semantic
inputs. In “High-Resolution Image Synthesis and Semantic Manipulation with
inputs. In “High-Resolution Image Synthesis and Semantic Manipulation with
Conditional GANs” (Ting-Chun Wang, et al., 2017) photorealistic images
generated from semantic images are showcased as examples.
i.41 [Example of Semantic Image and GAN-Generated Cityscape Photograph. Taken from High-Resolution Image
Synthesis and Semantic Manipulation with Conditional GANs, 2017.]
2.2.3. Unpaired Image to Image Translation
Used when paired data belonging to the input set and the target set do not exist
for training. Images of both sets must be used for training but each input image
does not require a corresponding target image present in the training dataset.
In “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial
Networks” (Jun-Yan Zhu et al., 2017), the CycleGAN model performs an
impressive set of image-to-image translations.
Five image translation cases are shown below as an example:
Photograph to artistic painting style.
Horse to zebra.
Photograph from summer to winter.
Satellite photograph to Google Maps view.
Painting to photograph.
i.42 [Four Image-to-Image Translations performed with CycleGAN. Taken from Unpaired Image-to-Image
Translation using Cycle-Consistent Adversarial Networks, 2017.]
i.43 [Example of Translation from Paintings to Photographs With CycleGAN. Taken from Unpaired
Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.]
2.2.4. Image Synthesis using Text
Used to synthesize images using a description, in the form of text, of the content
of the image to be synthesized. Image synthesis using text models will produce
an image of a particular class that the model is trained for when provided with a
detailed description of an image of that class. For example, a model
can be created for synthesizing images of birds using a detailed description of
the bird. Images of a particular class along with paired text descriptions must be
provided for training. In “StackGAN: Text to Photo-realistic Image Synthesis with
Stacked Generative Adversarial Networks” (Han Zhang, et al., 2016) it is
demonstrated that GANs can be used to generate realistic looking photographs
from textual descriptions of simple objects like birds and flowers.
i.44 [Example of Textual Descriptions and GAN-Generated Photographs of Birds. Taken from StackGAN: Text to
Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, 2016.]
“Generative Adversarial Text to Image Synthesis” (Scott Reed, et al., 2016) also
features an interesting example of text to image generation of small objects and
scenes including birds and flowers. In “TAC-GAN – Text Conditioned Auxiliary
Classifier Generative Adversarial Network“ (Ayushman Dash, et al., 2017),
another model is trained on the same dataset, producing similar results.
i.45 [Example of Textual Descriptions and GAN-Generated Photographs of Birds and Flowers. Taken from    
Generative Adversarial Text to Image Synthesis.]
In “Learning What and Where to Draw” (Scott Reed, et al., 2016), the capability
of GANs to generate images from text is expanded by using bounding boxes and
key points as hints as to where to draw a described object.
i.46 [Example of Photos of Objects Generated From Text and Position Hints With a GAN. Taken from Learning
What and Where to Draw, 2016.]
2.3. Music Generation
Although it does not strictly relate to the scope of this project, music generation is also
another interesting feat that has been achieved in the field of machine learning. A good
example of this can be seen by using MuseNet (Payne C., 2019), a neural network
trained to generate the next note in a sequence, given an input sequence. This is built on
a model called Sparse Transformer (Child R. et al., 2019), a model that can reconstruct
sequences using a concept known as “sparse attention”. Using the latter Sparse
Transformers can actually create not only music, but also images or any data set.
2.4. Terrain Generation using GANs
GANs have also been used for terrain generation. In “Interactive example-based terrain
authoring with conditional generative adversarial networks” (Guérin É. et al., 2017),
Conditional GANs have been used in order to generate terrains using sketches, similarly
to the aim of this project. Rivers, mountains and other entities can be drawn and
turned into a 3D terrain. In “A step towards procedural terrain generation with GANs”
(Beckham C. et al., 2017) there is another interesting use of GANs for height-map
generation, using data from NASA’s Visible Earth project. In the latter example, the
input contains data from satellite images that capture Earth’s morphology. These inputs
are then processed by a DCGAN to generate height maps, and by Pix2Pix to generate textures.
i.47-50 [Examples of Terrains Generated With a GAN. Taken from Interactive example-based terrain authoring with
conditional generative adversarial networks, 2017.]
Chapter 3
Requirements Specification and Design
Based on “Image Synthesis Using Machine Learning Techniques” (Gupta P. et al., 2019)
and “Interactive example-based terrain authoring with conditional generative adversarial
networks” (Guérin É. et al., 2017) it is assumed that a model trained using SPADE may
show the results that feature the best inception score, and the best height-maps as it is
also a conditional GAN, while it can be assumed that Pix2PixHD generates better
textures than its predecessor, Pix2Pix. The research conducted in “A step towards
procedural terrain generation with GANs” (Beckham C. et al., 2017) may thus be adapted
to a more modern framework.
The first fundamental step that has to be taken in order to train SPADE is to obtain a
proper dataset to train on. In order to do so, the ideal plan is to obtain game world data
from a video game, such as Minecraft, alongside 2D images that can be mapped onto
corresponding 3D landscapes.
While the optimal case would be to start with available game data, in the unlikely event
that the aforementioned condition is not fulfilled the scope of this research would pivot
towards acquiring data. This data could be acquired by capturing game information
from any game that is deemed suitable for terrain generation using GauGAN. Before
defining how to do so, it is important to establish some requirements, as not all games
could be suitable for the scope of this project.
To elaborate more, a suitable game should have level content that features entities that
can be mapped into semantic data. As a semantic tool-brush could generate any
content it is trained to generate, it is important to have entities that can be mapped
using semantic rules. In other words, it is required that the training data is labelled,
associating entities with data representations of them. This can be a taxing task, as
sometimes labelling data can be achieved only through human effort; even
with automation tools for data scraping, gathering training data can be a heavily
time-consuming process.
Although this makes it harder to develop a general tool for games with specific unique
entities (like a game containing a fictional alien planet with particularly uncommon
features), a potential tool can still be really helpful for generating entities that are easier
to find in most games, like natural terrains. Once these fundamental requirements are
sorted out, training activities can be discussed in more detail. Using Minecraft as an
example, natural entities could be easily defined as block types belonging to a certain
kind of biome, while more challenging entities such as man-made artefacts like temples
and villages could be added later on, using conditional generation methods.
3.1. Training Data Set
The first step for training our model, as can be easily guessed, is to acquire suitable
training data out of a game. Luckily this has been kindly provided by Microsoft, using
game save data from fifty Minecraft worlds generated through Project Malmo, a
research project centered around artificial intelligence. Specifically, a bot has
been sent to travel around each of these worlds, filling the missing chunks as it walked.
This raw data has then been processed through an open-source software called
Mapcrafter (Moritz Hilscher, 2018). Mapcrafter is a powerful tool that can generate
isometric zoomable views of the processed maps, navigable through a browser-ready
HTML file. As it parses through a save file in order to generate the aforementioned map,
it also creates a nested folder structure that contains images of sections from the map.
These are quite convenient for the training process, as they can indeed be fed to a GAN
as ground truth images.
Last but not least, Mapcrafter allows users to tweak how the map is rendered by specifying
certain options, such as lighting, biome-rendering, shadow-rendering, or even
block-masking, and most importantly, custom-texturing. These options have been
carefully selected in order to generate the training set alongside the labelled set.
The means by which this feat has been achieved will be discussed in the next sections.
3.1.1. Requirements
As mentioned earlier, one of the most important things to do for semantic labeling is the
definition of semantic entities. In the case of GauGAN, such entities are things like
flowers, trees, mountains, rocks, stones, and so on and so forth.
Although Minecraft does indeed feature things that look like geomorphological features,
labeling them is neither immediate nor easy. There are no specific data structures that
define mountains or hills, for instance. However, there are certain elements that could
be clustered into semantic entities, like trees. Certain block types for example do appear
clustered together in specific biomes. But let us acquire some context first.
Blocks are building materials that can be used to create structures in Minecraft. They are
the very core of Minecraft’s gameplay, and they feature in all versions of Minecraft. They
can be crafted or can be naturally found in biomes, although it should be noted that
some blocks are exclusive to the Creative Mode. For the scope of this project, we shall
be limiting the block type labeling to the ones that belong to the Overworld. Blocks
belonging to the Nether or the End won’t be considered.
Biomes are areas with specific height, light levels, vegetation, and types of blocks that
could be easily compared to ecosystems we find in real life. Minecraft currently has 34
biomes, listed here quickly for reference, and categorized by temperature:
Snowy
These biomes are known for their inclusion of Snow and Ice.
Frozen River
Ice Plains
Ice Plains Spikes
Cold Beach
Cold Taiga
Cold Taiga (Mountainous)
Cold
These biomes are cold, but not cold enough to have snow everywhere.
Extreme Hills
Extreme Hills M
Taiga
Taiga M
Mega Taiga
Mega Spruce Taiga
Extreme Hills+
Extreme Hills+ M
Stone Beach
Lush
Lush biomes are warm and often contain Flowers.
Plains
Sunflower Plains
Forest
Flower Forest
Swamp
Swamp M
River
Beach
Jungle
Jungle M
Jungle Edge
Jungle Edge M
Birch Forest
Birch Forest M
Birch Forest Hills M
Roofed Forest
Roofed Forest M
Mushroom Island
Mushroom Island Shore
Dry
Dry Biomes are very hot, and rarely contain any moisture or any foliage.
Desert
Desert M
Savannah
Savannah M
Mesa
Mesa (Bryce)
Plateau
Plateau M
Neutral
These biomes are either completely filled with Water, or have several variants
that differ depending on their biome.
Ocean (Variants)
Hills (Variants)
As can be seen, biomes are not trivial entities to encode in labels. While such
environmental diversity is certainly appreciable, it comes with a fair share of problems
when trying to establish semantic entities. Different biomes, in fact, require different
semantic labels. An entity like “grass” would require a different label for every different
hue of grass existing across biomes. The same thought process can be applied to
anything that looks different in a particular biome.
Although it is noted that a more complex labeling can doubtlessly be achieved, for the
scope of this project biome diversity has been excluded from the training requirements,
by using a rendering option in Mapcrafter that can turn off biome rendering.
Given the aforementioned issue and the fact that blocks are usually defined by a small
subset of associated textures, entities have thus been clustered using a different ad-hoc
structure, i.e. by custom block clustering, as follows:
i.51 [Semantic Entity Block Clustering used for training]
i.52 [Semantic Entity Block Clustering used for training]
i.53 [Semantic Entity Block Clustering used for training]
i.54 [Semantic Entity Block Clustering used for training]
i.55 [Semantic Entity Block Clustering used for training]
i.56 [Semantic Entity Block Clustering used for training]
The elements listed in the block section are texture references, matching the names of
the textures in Mapcrafter. This has been a fundamental step in order to achieve
labeling. The process will be described in more detail in the following section.
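As a rough illustration of what such a clustering amounts to in practice, it can be thought of as a mapping from Mapcrafter texture names to a semantic entity and its label colour; the entries below are invented examples, not the exact clusters shown in the figures above.

# Hypothetical sketch of the block-to-entity clustering; names and colours are examples only.
SEMANTIC_CLUSTERS = {
    "Grass": {"color": (0, 255, 0),   "textures": ["grass_top.png", "tallgrass.png", "fern.png"]},
    "Water": {"color": (0, 0, 255),   "textures": ["water_still.png", "water_flow.png"]},
    "Sand":  {"color": (255, 255, 0), "textures": ["sand.png", "sandstone_top.png"]},
    "Tree":  {"color": (0, 100, 0),   "textures": ["log_oak.png", "leaves_oak.png"]},
}

def label_color_for(texture_name):
    # Return the label colour of the entity a texture belongs to, or None if unlabeled.
    for entity in SEMANTIC_CLUSTERS.values():
        if texture_name in entity["textures"]:
            return entity["color"]
    return None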
 
3.1.2. Acquiring the Data Set
In order to train a GauGAN-like model using SPADE, two kinds of inputs are required.
The first kind of input is a real image, the kind of final result we want to obtain, or
replicate. The second input, on the other hand, has to be a labeled version of the same
image. Getting this labeled image set is not trivial.
GauGAN, for instance, has been trained on a dataset known as COCO-Stuff, an
open-source repository of labelled images. The labelling of that dataset was achieved
through a combination of human labor and computer vision algorithms that can
distinguish objects to a certain degree. As neither option was available to us in the short
span of time in which the project was developed, alternative methods for labeling had to
be sought.
As the first step, in order to label the dataset, all the maps have been processed through
Mapcrafter by using two custom Python scripts. These scripts generated a
configuration file and executed a Mapcrafter command with the generated configuration
for each of the maps in the Minecraft game save dataset. The generated configuration
templates are shown here for reference.
Mapcrafter original map generator sample configuration file:
output_dir = ..\output_original\***
[global:map]
world = world
render_view = isometric
render_mode = daylight
render_biomes = false
rotations = top-left
[world:world]
input_dir = ..\minecraft-java-raw-worlds\***
[map:map_world]
name = World
world = world
render_view = isometric
Although we won’t dive deep into how Mapcrafter works for the scope of this project, it
should be noted that further documentation is available at mapcrafter.org. Nonetheless,
the most relevant options that have been chosen to generate these files will be briefly
discussed here.
As can be seen in the original map generator sample configuration file, we have
specified certain options in the global map section related to view, mode, biomes, and
rotations.
Mapcrafter labeled map generator sample configuration file:
output_dir = ..\output_original\***
background_color = #000000
[global:map]
world = world
render_view = isometric
render_mode = plain
render_biomes = false
rotations = top-left
[world:world]
input_dir = ..\minecraft-java-raw-worlds\***
[map:map_world]
name = World
world = world
texture_dir = data\labeled_textures
lighting_intensity = 0.0
render_view = isometric
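A condensed sketch of what such a driver script might look like is shown below; the directory names are hypothetical, the template simply reproduces the configuration shown above for each world, and the invocation assumes Mapcrafter's -c (config file) and -j (thread count) command-line flags.

import os
import subprocess

CONFIG_TEMPLATE = """output_dir = ..\\output_original\\{name}
[global:map]
world = world
render_view = isometric
render_mode = daylight
render_biomes = false
rotations = top-left
[world:world]
input_dir = ..\\minecraft-java-raw-worlds\\{name}
[map:map_world]
name = World
world = world
render_view = isometric
"""

worlds_dir = "../minecraft-java-raw-worlds"          # hypothetical location of the 50 saves
for name in os.listdir(worlds_dir):
    config_path = f"config_{name}.conf"
    with open(config_path, "w") as f:
        f.write(CONFIG_TEMPLATE.format(name=name))
    # Render this world with the freshly generated configuration file.
    subprocess.run(["mapcrafter", "-c", config_path, "-j", "12"], check=True)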
Here’s a quick rundown on the configuration options:
render_view = isometric | topdown
Default: isometric - This is the view that your world is rendered from. You
can choose from different render views:
isometric
A 3D isometric view looking at north-east, north-west, south-west or
south-east (depending on the rotation of the world).
topdown
A simple 2D top view. This view could be used to generate an alternative
training set. Its potential for training will be discussed in a later section.
render_mode = plain|daylight|nightlight|cave
Default: daylight - This is the render mode to use when rendering the world.
Possible render modes are:
plain
Plain render mode without lighting or other special effects. This was chosen
for the labeled set rendering. As the labeling doesn’t require any particular
kind of lighting effect, this was deemed the optimal choice.
daylight
Renders the world with lighting. This was selected to generate the real
images, with high quality shadows.
nightlight
Like daylight, but renders at night. This could be potentially used for training
to generate a night version of the same maps.
cave
Renders only caves and colors blocks depending on their height to make
them easier to recognize. This is not suitable for training, and it is hard to
label.
render_biomes = true|false
Default: true - This setting makes the renderer use the original biome colors
for blocks like grass and leaves. It was important to set this feature to false in the
original map configuration file, in order to exclude biome complexity. Although
the latter can be kept, it requires a deeper understanding of how textures are
processed in mapcrafter.
rotations = [top-left] [top-right] [bottom-right] [bottom-left]
Default: top-left - This is a list of directions to render the world from. You can
rotate the world by n*90 degrees. Later in the output file you can interactively
rotate your world. Possible values for this space-separated list are: top-left,
top-right, bottom-right, bottom-left. Top left means that north is on the top left
side on the map (same thing for other directions).
texture_dir = data\labeled_textures
This is the directory with the Minecraft Texture files. Labeled images have been
generated by modifying this value in the labeled map configuration file. The new
folder address contains a labelled version of the textures. Here’s a screenshot of
the content of that folder:
i.57 [Screenshot of the labeled texture folder content ]
The process to generate these textures is simple. Given the texture elements that
belong to a certain block, and given the labeled cluster of blocks to which the
aforementioned block belongs, we choose a color to represent the label and
change every non-empty pixel in the textures to that color.
Due to a lack of time, this has been quickly achieved by using Photoshop and
Microsoft Paint, although it is also believed that the labeling process could be
improved by algorithmic means. Knowing which textures are associated with a
block type, a simple software tool could easily be made to select
certain blocks, cluster them into an arbitrary entity, choose the entity’s color, and
swap all the selected textures’ colors with that semantic label color. Textures that
are not to be labeled have been swapped with a black texture, to prevent artifacts
from being visible in the map view.
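A minimal sketch of the algorithmic alternative suggested above, using Pillow, might look as follows; file names and colours are illustrative.

from PIL import Image

def recolor_texture(path, label_color, out_path):
    # Paint every non-empty (non-transparent) pixel of a block texture with the label colour,
    # leaving fully transparent pixels untouched.
    img = Image.open(path).convert("RGBA")
    pixels = img.load()
    for x in range(img.width):
        for y in range(img.height):
            r, g, b, a = pixels[x, y]
            if a > 0:
                pixels[x, y] = (label_color[0], label_color[1], label_color[2], 255)
    img.save(out_path)

# Example: turn a (hypothetical) grass texture into a flat green label texture.
recolor_texture("textures/grass_top.png", (0, 255, 0), "labeled_textures/grass_top.png")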
i.58-59 [Full map images generated using the map processing script. The map in the top image shows the real images. The
map in the bottom shows labeled maps. ]
On each iteration of the map processing script, a real map and an equivalent
labeled map are generated in a new directory. Once all the maps are generated, a
Python script processes the folders containing each map in order to stitch one
huge image out of each map, obtaining a full representation of the map in a single
image.
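The stitching step can be pictured with the simplified sketch below, which assumes equally sized tiles named tile_<column>_<row>.png rather than Mapcrafter's actual nested output structure.

from PIL import Image
import glob, re

def stitch_tiles(tile_dir, tile_size=512):
    # Collect tiles by their (column, row) position and paste them onto one large canvas.
    tiles = {}
    for path in glob.glob(f"{tile_dir}/tile_*_*.png"):
        col, row = map(int, re.findall(r"\d+", path)[-2:])
        tiles[(col, row)] = path
    cols = max(c for c, _ in tiles) + 1
    rows = max(r for _, r in tiles) + 1
    full_map = Image.new("RGB", (cols * tile_size, rows * tile_size))
    for (col, row), path in tiles.items():
        full_map.paste(Image.open(path), (col * tile_size, row * tile_size))
    return full_map

stitch_tiles("output_original/map_world").save("full_map.png")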
These large images are then processed through a third party software called
IrfanView to make them suitable for training, and to generate the dataset that
our GAN model, SPADE, will use to train.
IrfanView (Irfan Skiljan, 2020) is a powerful image processing tool, and it is quite
convenient for quickly applying a huge variety of graphical effects to an image or,
most importantly, for batch processing of images. This tool was indeed
fundamental in order to obtain the labeled images, as changing the textures in
Mapcrafter to render the labeled maps is not enough to obtain a properly labeled
GAN-ready dataset.
Mapcrafter, in fact, applies shadows to textures, and there’s no trivial way of
disabling this feature through the configuration options. Due to this, the labeled
images generated from Mapcrafter are like the one shown below.
i.60 [Unprocessed Labeled Map ]
This is quite problematic. Each color chosen for labeling ends up accidentally
generating up to two extra hues of the original label color. As can be seen in the
image, there is one hue for each side that is illuminated differently. This makes the
data unsuitable for training, due to there being too many colors.

An efficient solution for labeling would indeed be to look through the code of
Mapcrafter (as it’s an open source project), find how shadows are generated, and
get rid of any line of code that renders the extra shadows. Although this option
has been evaluated, due to technical requirements and Mapcrafter’s complexity it
has been deemed inefficient to try to pursue this solution in a short amount of
time. Instead, the labeled full maps have been processed all together through
IrfanView, by using an option that allows image batch processing.
i.61 [On the left, File>Batch Conversion/Rename option, from
IrfanView]
i.62 [Above, File>Batch Conversion/Rename>Advanced
button, from IrfanView]
Although some attempts were previously made to create a script that would
get rid of the extra colors efficiently by using OpenCV and nearest-neighbour
color processing algorithms, IrfanView revealed itself to be a more efficient
solution than expected. Why reinvent the wheel when the wheel already exists?
The batch processing option allows the user to select a set of images and process them
using specific conditions. In our case, the only relevant condition selected
in order to generate a properly labeled image set was the “Replace
color” function.
i.63 [File>Batch Conversion/Rename>Advanced>Replace color>Settings button, from IrfanView]
The “Replace Color” function can not only replace one color with another, but
also replace multiple colors together by using a “tolerance value”, similarly to
what has been attempted by using the nearest-neighbour algorithm.
As only 13 labels have been chosen as semantic entities for the scope of this
project, this color swapping process has been done manually for each label
across all maps, but it should also be noted that the same process can be
achieved in a more efficient way. According to IrfanView’s documentation, in fact,
custom scripts can be written to drive batch processing by using a .ini file.
Although this hasn’t been done for this project, it is suggested that gathering
further insight into how to process image batches using .ini files may lead to a
more efficient workflow, especially if the number of labels is greater.
i.64 [File>Batch Conversion/Rename>Advanced>Replace color>Settings>Replace Color window, from IrfanView]
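For completeness, the nearest-neighbour idea mentioned above can be sketched in a few lines of NumPy: every pixel is snapped to the closest colour in the label palette, collapsing the shadow hues into flat labels. The palette and file names below are placeholders.

import numpy as np
from PIL import Image

def snap_to_palette(image_path, palette, out_path):
    # Map every pixel to the closest label colour (squared RGB distance).
    img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.int32)
    palette = np.asarray(palette, dtype=np.int32)                     # (n_labels, 3)
    dists = ((img[:, :, None, :] - palette[None, None, :, :]) ** 2).sum(axis=-1)
    snapped = palette[dists.argmin(axis=-1)].astype(np.uint8)
    Image.fromarray(snapped).save(out_path)

# Hypothetical palette with only a few of the 13 label colours shown.
LABELS = [(0, 255, 0), (0, 0, 255), (255, 255, 0), (0, 0, 0)]
snap_to_palette("labeled_full_map.png", LABELS, "labeled_full_map_clean.png")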
Once the properly labeled images have been obtained, the large full map images
have been split again into squares using another IrfanView option that allows
images to be cut into tiles. This last processing step has been run on both the
original maps and the labeled maps. The final result is a GAN-ready dataset,
perfectly suitable for training!
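The same tiling step could also be scripted; the sketch below crops the real map and its labeled counterpart at identical positions, so that every ground-truth tile keeps a perfectly aligned semantic tile. Paths and tile size are assumptions.

from PIL import Image

def tile_pair(real_path, labeled_path, out_dir, tile=256):
    # Crop both maps at identical positions to keep real/label pairs aligned.
    real, labeled = Image.open(real_path), Image.open(labeled_path)
    assert real.size == labeled.size
    count = 0
    for top in range(0, real.height - tile + 1, tile):
        for left in range(0, real.width - tile + 1, tile):
            box = (left, top, left + tile, top + tile)
            real.crop(box).save(f"{out_dir}/real_{count:05d}.png")
            labeled.crop(box).save(f"{out_dir}/label_{count:05d}.png")
            count += 1

tile_pair("full_map.png", "labeled_full_map_clean.png", "dataset/train")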
3.1.3. Improving the Data Set
The dataset generated for the training is the one obtained through the process
described in the previous section. However, some relevant information concerning the
labeled set generation, and how it could be improved, is reported in this section as well.
As can be seen in the Semantic Entity Block Clustering images (i.51-56), some blocks
belonging to multiple biomes have been clustered together under certain semantic
entities in the labeling process. Most shrubbery has been clustered under “Grass”, for
instance. It should be mentioned, though, that patches of tall grass, flowers and cacti are
technically small features shared across different biomes. This is not a trivial notion, as
it would cause labeling issues if the biome rendering feature were turned on. By
turning off such a feature, this problem of oddly clustered textures has been avoided,
but not solved as well as it could have been. While at first labeling all these textures
seemed a good idea, an issue came to light during the training process, as cacti belong
to two different clusters, the regular sand desert cluster and the red sand desert cluster.
As cacti were assigned to the regular sand cluster, a first naive attempt at generating
labels produced some odd-looking artifacts in red desert biomes: cacti appeared as
“regular sand-labeled” objects scattered across the red sand biome, even though cacti
are part of that biome as well. The reason why this wasn’t noticed at first is that the
red sand biome is rare, and appears in only three of the fifty maps provided by
Microsoft.
An efficient solution to this issue, as well as for anything that is shared across different
biomes, is to turn off the rendering of anything that is not needed, by using a Mapcrafter
configuration option.
block_mask = <block mask>
Default: show all blocks
With the block mask option it is possible to hide or show only specific blocks.
The block mask is a space separated list of block groups you want to hide/show.
If a ! precedes a block group, all blocks of this block group are hidden, otherwise
they are shown. Per default, all blocks are shown. Possible block groups are:
All blocks:
*
A single block (independent of block data):
[blockid]
A single block with specific block data:
[blockid]:[blockdata]
A range of blocks:
[blockid1]-[blockid2]
All blocks with a specific id and (block data & bitmask) ==
specified data:
[blockid]:[blockdata]b[bitmask]
For example:
Hide all blocks except blocks with id 1,7,8,9 or id 3 / data 2:
!* 1 3:2 7-9
Show all blocks except jungle wood and jungle leaves:
!17:3b3 !18:3b3
Jungle wood and jungle leaves have id 17 and 18 and use
data value 3 for first two bits (bitmask 3 = 0b11)
other bits are used otherwise -> ignore all those bits
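For instance, the cactus labeling issue described in the previous section could presumably be avoided altogether by hiding cacti during rendering (assuming the legacy block id 81 for cactus), with a single extra line in the map section:
block_mask = !81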
As can be guessed, the block mask configuration option has a lot of potential, especially
if biome-enabled labeling is desired.
Last but not least, the dataset can definitely be expanded by rendering the other
rotations as well using Mapcrafter, as the currently available real and labeled dataset is
based solely on the top-left view. This would increase the size of the image dataset by up
to four times, one set per rotation (top-right, top-left, bottom-right, bottom-left). This
entire set could then be further expanded by batch processing the images and generating
mirrored views of the same images.
The implication is that the dataset used for training within the scope of this project could
be expanded by up to 16 times. Although it would have been desirable to train with such
a larger dataset, this was sadly not possible, due to limitations that will be discussed in
the next sections.
3.2.1 Compute Power
Compute power is a non-negotiable requirement for machine learning. Anybody who has spent time training a neural network knows how slow the training process can be, and additional computational power can significantly speed it up. For this project the available resources were a local machine, a remote server provided by Abertay University, and 5000 credits of compute power on Microsoft Azure provided by Microsoft, with a few usage limitations on the credits.
The local machine has the following specifications:
Operating System: Windows 10 Professional
CPU: AMD Ryzen 5 1600 Six-Core Processor, 3400 MHz, 6 cores, 12 logical processors
Motherboard: B350 TOMAHAWK (MS-7A34)
GPU: NVIDIA GTX 1060 6G
RAM: 8 GB DDR4
Abertay’s remote machine, also known as Talisker, has been provided with the
following specifications:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 1297.800
CPU max MHz: 3100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4390.24
Virtualisation: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
The Azure Virtual Machines have been configured with these options.
Virtual machine
Computer name: VM-02
Operating system: Linux (ubuntu 18.04)
SKU: 1804
Publisher: microsoft-dsvm
VM generation: V1
Agent status: Ready
Agent version: 2.2.50
Host: None
Proximity placement group: N/A
Colocation status: N/A
Availability + scaling
Availability zone: N/A
Extensions:
DependencyAgentLinux
OMSAgentForLinux
Size
Standard NC24r_Promo
vCPUs: 24
RAM: 224 GiB
Each Mapcrafter map generation can use at most 12 threads. Multiplying this by the number of maps, at least 12 * 50 = 600 threads would be needed to render all the maps at once. This figure doubles because both a real map and a labeled map are rendered for each world in the Minecraft game-save dataset, bringing the number of potentially concurrent threads to 1200. Finally, an improved version of the dataset could be generated by rendering all four isometric viewpoints, which raises the number of potentially concurrent threads to 4800. If Mapcrafter were modified to generate the labeled images directly, no further processing would be required; as this has not yet been achieved, further image processing is still needed.
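To give an idea of how these renders could be run in practice, the sketch below launches one Mapcrafter job per configuration file in parallel. It assumes the mapcrafter binary is on the PATH, that one configuration file has been written per world and per variant (real and labeled), and that the -c and -j command-line options behave as described in the Mapcrafter documentation; directory names are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

THREADS_PER_RENDER = 12      # Mapcrafter uses up to 12 worker threads per map
MAX_PARALLEL_RENDERS = 4     # tune to the number of cores actually available

def render(config_path):
    # Run one Mapcrafter job and return its exit code.
    result = subprocess.run(
        ["mapcrafter", "-c", str(config_path), "-j", str(THREADS_PER_RENDER)]
    )
    return result.returncode

configs = sorted(Path("configs").glob("*.conf"))   # one config per world/variant
with ThreadPoolExecutor(max_workers=MAX_PARALLEL_RENDERS) as pool:
    for cfg, code in zip(configs, pool.map(render, configs)):
        print(cfg.name, "ok" if code == 0 else "failed")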
An image batch-processing pass still needs to be run to turn the rendered images into properly labeled ones. IrfanView can also be used to double the size of the dataset by generating mirrored views.
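As an alternative to the IrfanView batch step, a small script such as the following could mirror every tile and double the dataset. This is a minimal sketch assuming Pillow is installed; the directory names are placeholders, not the ones used in the project.
from pathlib import Path
from PIL import Image, ImageOps

src = Path("tiles_in")            # directory containing the rendered tiles
dst = Path("tiles_mirrored")
dst.mkdir(exist_ok=True)

for tile in src.glob("*.png"):
    with Image.open(tile) as img:
        # Horizontal flip; the same loop is run on both real and labeled tiles.
        ImageOps.mirror(img).save(dst / (tile.stem + "_mirror.png"))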
3.2.2 Limitations
During training it was noticed that Talisker processes images faster, as its GPUs are more powerful, but because only a few GPUs are available it cannot process as many images at once. The situation on Azure is the opposite: more images can be processed in parallel, but processing takes considerably longer, since the individual GPUs are less powerful than those on Talisker.
3.2.3 Possibilities
Due to a lack of time, more computational power could not be obtained, nor could a larger dataset be generated with the data-generation tweaks discussed at the end of the Compute Power section: processing a dataset sixteen times larger than the one used here would have been unrealistic within the available time. Nonetheless, the labeling process can be improved, and the dataset can be expanded in other ways as well. Mapcrafter, in fact, offers a dimension configuration option in addition to the biome rendering option.
dimension = nether|overworld|end
Default: overworld
You can specify with this option the dimension of the world Mapcrafter should render. If you choose The Nether or The End, Mapcrafter will automatically detect the corresponding region directory.
The “dimension” value has been ignored for the current training, leaving it set to “overworld” by default; however, much like the “night” rendering-mode option, it could come in handy for expanding the labeled set and training the model to render different kinds of maps. The Nether features its own biomes, and, although it is visually less interesting, training could also be done on the End dimension.
Labeling Workflow Diagram
The diagram shown on the left summarizes all the steps described in the sections above. The code used in all of the steps, including links to the open-source repositories used for this project, can be found at the end of this document, in the Appendix section.
Although this workflow relies on IrfanView to do the image processing required for labeling, such processing can be avoided if Mapcrafter is tweaked not to render shadows. Achieving this is not trivial, but it is certainly possible.
While technical limitations have prevented what is believed to be the most optimal technique for obtaining a labeled dataset from Minecraft, the alternative solution that has been used is still reported in this workflow, in order to allow replication of the results obtained in this project.
Chapter 4
Implementation
The architecture behind GauGAN is called SPADE, and it is the core of the training model. It is an image-processing layer that uses spatially-adaptive normalization to synthesize images given an input semantic layout. Using this open-source model provided by Nvidia (Park T. et al., 2019), the goal of this project is to train SPADE on the data discussed in the previous sections. The training data should contain images of the terrains seen from above, as well as a semantic labeling of those images. Example training datasets can be seen on the SPADE repository: COCO-Stuff, Cityscapes and ADE20K are useful references for what a proper training dataset should look like. Alternatively, the Visible Earth project from NASA is another viable option to consider. Once a proper training dataset is acquired, the training phase begins. This phase can be somewhat erratic, as tuning a training model and its weights is a process prone to trial and error; nonetheless, given enough time, finding good tuning values becomes straightforward. Ideally, once the model is properly trained with a good dataset, it will be able to generate 3D height-map textures from semantic inputs, in a way similar to what GauGAN does.
4.1. SPADE
i.65 [SPADE batch normalization. Taken from Semantic Image Synthesis with Spatially-Adaptive Normalization]
In many common normalization techniques, such as Batch Normalization (Ioffe et al., 2015), learned affine layers (as in PyTorch and TensorFlow) are applied after the actual normalization step. In SPADE, the affine layer is instead learned from the semantic segmentation map. This is similar to Conditional Normalization (De Vries et al., 2017 and Dumoulin et al., 2016), except that the learned affine parameters are spatially adaptive, meaning that SPADE uses a different scaling and bias for each semantic label. With this simple method, semantic signals can act on all layer outputs without being washed out by the normalization process, which might otherwise cause a loss of such information. Moreover, because the semantic information is provided through the SPADE layers, random latent vectors can be used as input to the network and exploited to manipulate the style of the generated images (Park T. et al., 2019). In order to train the model, Nvidia's open-source repository has been cloned from GitHub; it is available at https://github.com/NVlabs/SPADE
Instructions to set up the training are reported from the official documentation. New
models can be trained with the following commands.
1. Prepare a dataset.
To train on custom datasets, the easiest way is to use ./data/custom_dataset.py
by specifying the option --dataset_mode custom, along with --label_dir
[path_to_labels] --image_dir [path_to_images]. You also need to specify options
such as --label_nc for the number of label classes in the dataset,
--contain_dontcare_label to specify whether it has an unknown label, or
--no_instance to denote the dataset doesn't have instance maps.
2. Train.
To train on your own custom dataset, the command pattern to use is the following: python train.py --name [experiment_name] --dataset_mode custom --label_dir [path_to_labels] --image_dir [path_to_images] --label_nc [num_labels]
For example, for the training of our model:
python train.py --name <insert_name> --dataset_mode custom --label_dir ./datasets/027/train_label --image_dir ./datasets/027/train_img --label_nc 13 --no_instance --contain_dontcare_label --gpu_ids 0,1,2,3 --batchSize 4
There are many options that can be specified. More instructions can be accessed by
typing python train.py --help. The specified options are printed to the console. To
specify the number of GPUs to utilize, use --gpu_ids. If you want to use the second and
third GPUs for example, use --gpu_ids 1,2.
To log training, use --tf_log for Tensorboard. The logs are stored at
[checkpoints_dir]/[name]/logs.
Testing is similar to testing pretrained models.
python test.py --name [name_of_experiment] --dataset_mode [dataset_mode]
--dataroot [path_to_dataset]
Use --results_dir to specify the output directory. --how_many will specify the maximum
number of images to generate. By default, it loads the latest checkpoint. It can be
changed using --which_epoch.
Code Structure Reference:
train.py, test.py: the entry points for training and testing.
trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
models/pix2pix_model.py: creates the networks and computes the losses.
models/networks/: defines the architecture of all models.
options/: creates option lists using the argparse package. Individual options are dynamically added in other files as well; see the Options section below.
data/: defines the classes for loading images and label maps.
Options:
Some options belong to only one specific model, while others have different default
values depending on other options. The BaseOption class dynamically loads and sets
options depending on what model, network, and datasets are used.
Training has been done on 90% of the input dataset (4145 images per world, randomly
sampled).
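As an illustration of this split, the following is a minimal sketch (not the script actually used for the project) of how paired real and label tiles could be divided 90/10 into the folders expected by the training and testing commands; the all_img and all_label folder names are placeholders.
import random
import shutil
from pathlib import Path

random.seed(0)  # reproducible split
root = Path("datasets/027")
for d in ("train_img", "train_label", "test_img", "test_label"):
    (root / d).mkdir(parents=True, exist_ok=True)

images = sorted((root / "all_img").glob("*.png"))  # real rendered tiles
random.shuffle(images)
split = int(len(images) * 0.9)                     # 90% training, 10% testing

for subset, subset_images in (("train", images[:split]), ("test", images[split:])):
    for img in subset_images:
        label = root / "all_label" / img.name      # label tile with the same file name
        shutil.copy(img, root / f"{subset}_img" / img.name)
        shutil.copy(label, root / f"{subset}_label" / img.name)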
Here is some additional information gathered during the implementation process.
On the Azure nodes an epoch (i.e. one full pass through the input dataset) takes about 30 minutes, so a full training run takes around 30 hours on the Azure Virtual Machines.
On Talisker, using a 1080 Ti and a Titan X, training takes 14-16 hours.
Testing has been done using the remaining 10% of the dataset (461 images, randomly sampled). This set was then used during the evaluation process to gather further results on the quality of the model.
Default parameters used for training:
Epochs: 50
Learning rate: 0.0002
Optimizer: Adam, used for both the generator and the discriminator, with β1 and β2 set to 0.5 and 0.999 respectively
Model: pix2pix
Generator network (netG): SPADE
NGF (number of convolution filters in the first layer): 64
Worlds trained on: 20 out of 50
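For reproducibility, it is worth noting how these defaults map onto the training script's command-line options. Judging from the option files in the SPADE repository (the exact flag names should be verified with python train.py --help), an invocation that makes them explicit would look roughly like the following; the additional flags are reported here as assumptions rather than verified settings:
python train.py --name <insert_name> --dataset_mode custom --label_dir ./datasets/027/train_label --image_dir ./datasets/027/train_img --label_nc 13 --no_instance --contain_dontcare_label --niter 50 --lr 0.0002 --beta1 0.5 --beta2 0.999 --ngf 64 --netG spade --batchSize 4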
Chapter 5
Evaluation
According to SPADE's open source repository, the results of the model trained by Nvidia, GauGAN, as reported in the paper, can be replicated using an NVIDIA DGX-1 machine with 8 V100 GPUs, equivalent to 128 GB of total GPU memory. A benchmark reference for training time could in principle be derived from this information, by considering the size of the dataset used for training and the number of labels. However, as the time required for training is not given in the paper, and as the resources to replicate such results were not available, these benchmark statistics could not be included in the quantitative assessment. Other means of evaluation have therefore been found and used to assess the training results.
Results are tested using the scoring methodology of (Gupta P. et al., 2019), in order to evaluate the generated game worlds' Fréchet Inception Distance (FID) score. Further useful feedback could have been gathered through qualitative assessment with surveys, but due to the extraordinary circumstances in which this dissertation was written, i.e. during a global pandemic, this proved difficult. Nonetheless, there is still plenty of data for quantitative assessment, which makes it possible to gain more insight into how to improve the model. Furthermore, once the model is trained, a tool like GauGAN could be developed in order to generate 3D game worlds. While this is beyond the scope of this project, it is worth noting how much can be built on top of a successfully trained model.
5.1. Fréchet Inception Distance
In GANs, the objective function for the generator and the discriminator estimates how good they are at fooling each other; for example, we measure how well the generator can produce content that the discriminator finds close to the ground-truth image. However, this is not a good metric for assessing image quality or diversity.
A common evaluation metric is the Inception Score, which uses two criteria to measure the performance of a GAN: the quality of the generated images and their diversity. Entropy can be viewed as randomness: if the value of a random variable x is highly predictable, it has low entropy; if it is highly unpredictable, the entropy is high. In the Fréchet Inception Distance, or FID, the Inception network is used to extract features from an intermediate layer, and the data distribution over these features is then modeled as a multivariate Gaussian with mean µ and covariance Σ. The FID between the real images x and the generated images g is computed as:
FID(x, g) = ||µ_x - µ_g||² + Tr(Σ_x + Σ_g - 2(Σ_x Σ_g)^(1/2))
where Tr sums all the diagonal elements. FID is more robust to noise than the classic Inception Score: if the model generates only a single image per class, the distance will be high, so FID is a better measure of image diversity. FID has a rather high bias but low variance. By computing the FID between a training dataset and a testing dataset, we should expect the score to be close to zero, since both consist of real images; however, running the test with different batches of training samples yields different FIDs. This characteristic has been used to establish a threshold, the world-variance FID, reported as the Reference FID in the table shown in the next section. This value measures the mean distance between worlds, or, in layman's terms, "how much the worlds vary between each other". As it was calculated over the 20 worlds used for training, the number of unique pairwise combinations was 190.
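For reference, a short sketch of the FID computation described above is given below, assuming the Inception features of the two image sets have already been extracted into the arrays feats_real and feats_fake (one row per image); it is an illustrative implementation, not the exact script used for the project.
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    # Fit a Gaussian (mean and covariance) to each set of Inception features.
    mu_r, sigma_r = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu_f, sigma_f = feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False)
    diff = mu_r - mu_f
    # Matrix square root of the product of the covariances; tiny imaginary
    # parts caused by numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))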
5.2. Summary of Results
The following table has been partially filled with results obtained from training.
Epoch | Reference FID (average reference FID of n=190) | Training FID score (real images vs training synthesized images) | FID score on test images
10 | 93.099 | 41.471 | 77.515
15 | 93.099 | 43.761 | N/A
20 | 93.099 | 39.125 | N/A
25 | 93.099 | 42.656 | N/A
30 | 93.099 | 41.669 | 88.498
35 | 93.099 | 43.662 | N/A
40 | 93.099 | 34.898 | 72.169
45 | 93.099 | 36.656 | N/A
50 | 93.099 | 40.367 | 77.473
As can be noticed, the FID score between the synthesized images and the real images decreases over the course of training, although slight spikes happen every now and then. This is expected, just as it is expected that unseen images from the test set will get a higher FID score. Considering the partial results that could be extracted and the downward trend over time, training can be regarded as moderately successful, given the limits of this first attempt. Increases in FID towards the end are assumed to be related to attempts at generating certain artifacts present in Minecraft. Some visual results are also included in the next section for qualitative assessment.
i.66 [Training results sample at epoch 35]
i.67 [Training results sample at epoch 35]
i.68 [Training results sample at epoch 35]
i.69 [Training results sample at epoch 35]
i.70 [Training results sample at epoch 35]
As can be seen, although SPADE struggles to extract the features of less common biomes (in particular the red sand label, or villages), it is still able to offer a quasi-realistic representation. It is assumed that a larger dataset featuring more red desert biomes would improve content generation, as the more frequent biomes already show promising results.
The increase in FID score at the last epoch can be explained: the GAN model is starting to encode features learned across different images, and is therefore beginning to diverge from the original images it was trained on, while still maintaining the ability to generate features related to the labels. While this is ultimately a desirable result, it is always a good idea to check the visual results at each epoch, in order to avoid a situation where unrealistic artifacts are generated.
i.71 [Real Image sample at epoch 50]
i.72 [Synthesized sample at epoch 50]
Chapter 6
Conclusion
If trained correctly, it is assumed that a SPADE model could not only give us a tool like GauGAN for game-specific world generation (depending on the dataset the model is trained on), but it is also believed that it could go beyond such limitations and generate all-purpose content for multiple other games. It is the author's conviction that a semantic tool-brush based terrain generator could greatly improve the work of artists and game designers.
Procedurally generated content, for instance, always carries a certain degree of chaos that cannot be controlled. While on one hand it allows a lot of content to be generated, it does not give designers much power over content creation. A tool like the one discussed in this dissertation, on the other hand, could dramatically improve the ability of content creators to manipulate their own creations, allowing them to sketch entire maps without having to focus on small details, saving time and letting them concentrate on creating other content. Further training may also substantially expand what the model can generate, increasing the variety of elements that inventive users could create.
Through this research, we have managed to find an answer to our main question: can semantic-labeling based generation be used to create game content? The answer is positive. The labeling process is not trivial, and finding suitable game data to parse is what requires most of the effort, but, as has been shown in this dissertation, it can be achieved.
6.1. Future work
Due to a lack of time, a global pandemic and technical limitations, understanding in full depth how Mapcrafter works is still on the to-do list, especially in order to remove shadow rendering for labeling (thus ideally eliminating the extra processing required through IrfanView). In addition, an interesting feat achieved by C. Acornley from Abertay University has opened new possibilities for training: a modified version of Mapcrafter (here referred to as Heightcrafter) can extract the height of each block and encode it into the alpha channel. This feature, combined with what has been found over the course of this project, opens up a whole new set of possibilities. Retraining a new model with Heightcrafter using the top-down view could allow us to create a tool that parses the synthesized results and turns them into actual Minecraft worlds.
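As a pointer for this future work, the snippet below sketches how a height map could be read back from an image whose alpha channel encodes height, as Heightcrafter is reported to do; the file name and the assumption that alpha maps directly to block height are illustrative only.
import numpy as np
from PIL import Image

with Image.open("synthesized_tile.png") as img:   # placeholder file name
    rgba = np.asarray(img.convert("RGBA"))

# Alpha channel interpreted as height (assumed: one alpha step = one block of height).
height = rgba[:, :, 3].astype(np.float32)
print(height.shape, height.min(), height.max())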
Finally, training done with biome-enabled labeling could allow us to apply style transfer as well, in a fashion similar to GauGAN's demo. As most biomes share similar features, changing the style would be trivial; the fundamental requirement for achieving this is to train on the biomes.
References
Bibliography
Games
Mojang (2019) Minecraft
Media Molecule (2020) Dreams
Media Molecule (2008) Little Big Planet
Nintendo (2015) Super Mario Maker
Software
Mcedit.net (2016) Mcedit - World Editor For Minecraft. [online] Available at: https://www.mcedit.net/
IrfanView (Irfan Skiljan, 2020) [online] Available at: https://www.irfanview.com/
Papers
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron
Courville, Yoshua Bengio (2014) Generative Adversarial Networks
Ceecee Wong (2019) The Rise of AI Supermodels, CDO Trends
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge (2015) A Neural Algorithm of Artistic Style
Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou
Tang (2018) ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu (2019) Semantic Image Synthesis with
Spatially-Adaptive Normalization
Gupta P., Shukla S. (2020) Image Synthesis Using Machine Learning Techniques. In: Hemanth D.,
Shakya S., Baig Z. (eds) Intelligent Data Communication Technologies and Internet of Things. ICICI
2019. Lecture Notes on Data Engineering and Communications Technologies, vol 38. Springer, Cham
Christine Payne (2019) MuseNet. OpenAI, 25 Apr. 2019, openai.com/blog/musenet
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever, (2019) Generating Long Sequences with Sparse
Transformers
Éric Guérin, Julie Digne, Éric Galin, Adrien Peytavie, Christian Wolf, Bedrich Benes, Benoît Martinez
(2017) Interactive example-based terrain authoring with conditional generative adversarial networks
Christopher Beckham, Christopher Pal (2017) A step towards procedural terrain generation with GANs
Alec Radford, Luke Metz, Soumith Chintala (2015), Unsupervised Representation Learning with Deep
Convolutional Generative Adversarial Networks
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen (2017) Progressive Growing of GANs for
Improved Quality, Stability, and Variation
Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul
Scharre, Thomas Zeitzoff, Bobby Filar, Hyrum Anderson, Heather Roff, Gregory C. Allen, Jacob
Steinhardt, Carrick Flynn, Seán Ó hÉigeartaigh, Simon Beard, Haydn Belfield, Sebastian Farquhar, Clare
Lyle, Rebecca Crootof, Owain Evans, Michael Page, Joanna Bryson, Roman Yampolskiy, Dario Amodei
(2018) The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
Andrew Brock, Jeff Donahue, Karen Simonyan (2018) Large Scale GAN Training for High Fidelity
Natural Image Synthesis
Yanghua Jin, Jiakai Zhang, Minjun Li, Yingtao Tian, Huachun Zhu, Zhihao Fang (2017) Towards the
Automatic Anime Characters Creation with Generative Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros (2016) Image-to-Image Translation with
Conditional Adversarial Networks
Rui Huang, Shu Zhang, Tianyu Li, Ran He (2017) Beyond Face Rotation: Global and Local Perception
GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars, Luc Van Gool (2017) Pose Guided
Person Image Generation
Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Álvarez (2016), Invertible Conditional
GANs For Image Editing
Ming-Yu Liu, Oncel Tuzel (2016), Coupled Generative Adversarial Networks
Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston (2016) Neural Photo Editing with Introspective
Adversarial Networks
He Zhang, Vishwanath Sindagi, Vishal M. Patel (2017), Image De-raining Using a Conditional
Generative Adversarial Network
Grigory Antipov, Moez Baccouche, Jean-Luc Dugelay (2017), Face Aging With Conditional Generative
Adversarial Networks
Zhifei Zhang, Yang Song, Hairong Qi (2017), Age Progression/Regression by Conditional Adversarial
Autoencoder
Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang (2017), GP-GAN: Towards Realistic
High-Resolution Image Blending
Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta,
Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi (2016), Photo-Realistic
Single Image Super-Resolution Using a Generative Adversarial Network
Huang Bin, Chen Weihai, Wu Xingming, Lin Chun-Liang (2017), High-Quality Face Image SR Using
Conditional Generative Adversarial Networks
Subeesh Vasu, Nimisha Thekke Madam, Rajagopalan A.N (2018) Analyzing Perception-Distortion
Tradeoff using Enhanced Perceptual Super-resolution Network
Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N.
Do (2016) Semantic Image Inpainting with Deep Generative Models
Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang (2017), Generative Face Completion
Donggeun Yoo, Namil Kim, Sunggyun Park, Anthony S. Paek, In So Kweon (2016), Pixel-Level Domain
Transfer
Carl Vondrick, Hamed Pirsiavash, Antonio Torralba (2016), Generating Videos with Scene Dynamics
Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum (2016), Learning a
Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
Matheus Gadelha, Subhransu Maji, Rui Wang (2016) 3D Shape Induction from 2D Views of Multiple
Objects
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro (2017),
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros (2017), Unpaired Image-to-Image Translation
using Cycle-Consistent Adversarial Networks
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
(2016), StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial
Networks
Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee (2016),
Generative Adversarial Text to Image Synthesis
Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Marcus Liwicki, Muhammad Zeshan
Afzal (2017), TAC-GAN – Text Conditioned Auxiliary Classifier Generative Adversarial Network
Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee (2016), Learning
What and Where to Draw
Sergey Ioffe, Christian Szegedy (2015), Batch Normalization: Accelerating Deep Network Training by
Reducing Internal Covariate Shift
Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville (2017),
Modulating early visual processing by language
Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur (2016), A Learned Representation For Artistic
Style
List of Figures
(i.1) Lucas Cranach the Elder, The Garden of Eden (1530), oil on poplar wood, 81 cm x 114 cm,
Gemäldegalerie Alte Meister, Dresden
(i.2) Peter Paul Rubens and Jan Brueghel the Elder, The garden of Eden with the fall of man (1615), oil
on panel, 74.3 cm x 114.7 cm, Mauritshuis art museum in The Hague, Netherlands
(i.3) Thomas Cole, The Garden of Eden (1828), 97.7 cm x 133.9 cm, Amon Carter Museum of American Art
(i.4) Hieronymus Bosch, The Garden of Earthly Delights (1503-1515), oil on oak panels, 205.5 cm ×
384.9 cm (81 in × 152 in), Museo del Prado, Madrid
(i.5) Mojang Studios, screenshot from Minecraft
(i.6) Media Molecule, screenshot from Dreams
(i.7) Media Molecule, screenshot from Little Big Planet
(i.8) Nintendo, screenshot from Mario Maker
(i.9) MCEdit.net, screenshot from MCEdit 2.0
(i.10) Nvidia, screenshot from GauGAN
(i.11) Classification diagram, taken from Image Synthesis Using Machine Learning Techniques, 2019
(i.12) Examples of GANs used to generate new plausible examples for image datasets. Taken from
Generative Adversarial Networks, 2014
(i.13) Example of GAN-Generated Photographs of Bedrooms. Taken from Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
(i.14) Example of Vector Arithmetic for GAN-Generated Faces. Taken from Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
(i.15) Examples of Photorealistic GAN-Generated Faces. Taken from Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017
(i.16) Example of Photorealistic GAN-Generated Objects and Scenes. Taken from Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017
(i.17) Example of the Progression in the Capabilities of GANs from 2014 to 2017. Taken from The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation, 2018
(i.18) Example of Realistic Synthetic Photographs Generated with BigGAN. Taken from Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018
(i.19) Example of GAN-Generated Anime Character Faces. Taken from Towards the Automatic Anime Characters Creation with Generative Adversarial Networks, 2017
i.20 Example of Photographs of Daytime Cityscapes to Nighttime With pix2pix. Taken from
Image-to-Image Translation with Conditional Adversarial Networks, 2016
i.21 Example of Sketches to Color Photographs With pix2pix. Taken from Image-to-Image Translation
with Conditional Adversarial Networks, 2016.
i.22 Example of GAN-based Face Frontal View Photo Generation. Taken from Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis, 2017
i.23 Example of GAN-Generated Photographs of Human Poses. Taken from Pose Guided Person Image Generation, 2017
i.24 Example of Celebrity Photographs and GAN-Generated Emojis. Taken from Unsupervised Cross-Domain Image Generation, 2016
i.25 Example of Face Photo Editing with IcGAN. Taken from Invertible Conditional GANs For Image Editing, 2016
i.26 Example of GANs used to Generate Faces With and Without Blond Hair. Taken from Coupled Generative Adversarial Networks, 2016
i.27 Example of GANs used to Generate Faces With and Without Blond Hair. Taken from Coupled Generative Adversarial Networks, 2016
i.28 Example of Using a GAN to Remove Rain From Photographs. Taken from Image De-raining Using a Conditional Generative Adversarial Network, 2017
i.29 Example of Photographs of Faces Generated With a GAN With Different Apparent Ages. Taken from Face Aging With Conditional Generative Adversarial Networks, 2017
i.30 Example of Using a GAN to Age Photographs of Faces. Taken from Age Progression/Regression by Conditional Adversarial Autoencoder, 2017
i.31 Example of GAN-based Photograph Blending. Taken from GP-GAN: Towards Realistic High-Resolution Image Blending, 2017
i.32 Example of GAN-Generated Images With Super Resolution. Taken from Photo-Realistic Single
Image Super-Resolution Using a Generative Adversarial Network, 2016
i.33 Example of High-Resolution Generated Human Faces. Taken from High-Quality Face Image SR
Using Conditional Generative Adversarial Networks, 2017
i.34 Example of GAN-Generated Photograph Inpainting Using Context Encoders. Taken from Context Encoders: Feature Learning by Inpainting, 2016
i.35 Example of GAN-based Inpainting of Photographs of Human Faces. Taken from Semantic Image Inpainting with Deep Generative Models, 2016
i.36 Example of GAN Reconstructed Photographs of Faces. Taken from Generative Face Completion, 2017
i.37 Example of Input Photographs and GAN-Generated Clothing Photographs. Taken from Pixel-Level
Domain Transfer, 2016
i.38 Example of Video Frames Generated With a GAN. Taken from Generating Videos with Scene
Dynamics, 2016
i.39 Example of GAN-Generated Three Dimensional Objects. Taken from Learning a Probabilistic Latent
Space of Object Shapes via 3D Generative-Adversarial Modeling
i.40 Example of Three-Dimensional Reconstructions of a Chair From Two-Dimensional Images. Taken
from 3D Shape Induction from 2D Views of Multiple Objects, 2016
i.41 Example of Semantic Image and GAN-Generated Cityscape Photograph. Taken from
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, 2017
i.42 Four Image-to-Image Translations performed with CycleGAN. Taken from Unpaired
Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017
i.43 Example of Translation from Paintings to Photographs With CycleGAN. Taken from Unpaired
Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017
i.44 Example of Textual Descriptions and GAN-Generated Photographs of Birds. Taken from
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks,
2016.
i.45 Example of Textual Descriptions and GAN-Generated Photographs of Birds and Flowers. Taken
from Generative Adversarial Text to Image Synthesis
i.46 Example of Photos of Object Generated From Text and Position Hints With a GAN. Taken from
Learning What and Where to Draw, 2016
i.47-50 Examples of Terrains Generated With a GAN. Taken from Interactive example-based terrain
authoring with conditional generative adversarial networks, 2017.
i.51-56 Semantic Entity Block Clustering used for training
i.57 Screenshot of the labeled texture folder content
i.58-59 Full map images generated using the map processing script. The map in the top image shows
the real images. The map in the bottom shows labeled maps
i.60 Unprocessed Labeled Map
i.61 File>Batch Conversion/Rename option, from IrfanView
i.62 Above, File>Batch Conversion/Rename>Advanced button, from IrfanView
i.63 File>Batch Conversion/Rename>Advanced>Replace color>Settings button, from IrfanView
i.64 File>Batch Conversion/Rename>Advanced>Replace color>Settings>Replace Color window, from
IrfanView
i.65 SPADE batch normalization. Taken from Semantic Image Synthesis with Spatially-Adaptive
Normalization
i.66-70 Training results samples at epoch 35
i.71 Real Image samples at epoch 50
i.72 Synthesized sample at epoch 50