Architectural Drawings Recognition and Generation through Machine Learning
1 Apartment oor plan: recog-
nition and generation through
generative adversarial network
Weixin Huang, School of Architecture, Tsinghua University
Hao Zheng, School of Design, University of Pennsylvania
ABSTRACT
With the development of information technology, the ideas of programming and mass calculation were introduced into the design field, resulting in the growth of computer-aided design. With the idea of designing by data, we began to manipulate data directly and interpret data through design works. Machine learning, as a decision-making tool, has been widely used in many fields. It can be used to analyze large amounts of data and predict future changes. The Generative Adversarial Network (GAN) is a model framework in machine learning, specially designed to learn and generate output data with similar or identical characteristics. Pix2pixHD is a modified version of GAN that learns image data in pairs and generates new images based on the input. The authors applied pix2pixHD to recognizing and generating architectural drawings, marking rooms with different colors and then generating apartment plans through two convolutional neural networks. Next, in order to understand how these networks work, the authors analyzed their framework and provided an explanation of the three working principles of the networks: the convolution layer, the residual network layer, and the deconvolution layer. Lastly, in order to visualize the networks in architectural drawings, the authors derived data from different layers and different training epochs and visualized the findings as grayscale images. It was found that the features of the architectural plan drawings are gradually learned and stored as parameters in the networks. As the networks get deeper and the training epoch increases, the features in the graph become more concise and clearer. This phenomenon may be inspiring in understanding the designing behavior of humans.
RECENT DEVELOPMENT OF GENERATIVE ADVERSARIAL NETWORKS (GAN)
In the past four years, Generative Adversarial Networks (GAN), as one type of machine learning algorithm, have achieved a great deal of progress on generative tasks. Although there were many problems when GAN was first proposed, such as unstable training, researchers improved it in aspects such as the framework and training techniques, resulting in the subsequent explosive growth.
Goodfellow et al. (2014) are known as the first team to propose the Generative Adversarial Network in machine learning. By providing training data in pairs, the program finds the most suitable parameters in the network so that the discriminator (D) has the least potential to distinguish the generated data (G) from the original data (Figure 2).
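As a rough sketch of this adversarial setup (not the authors' implementation), one alternating training step in PyTorch could look like the following; the generator `G`, discriminator `D`, their optimizers, the real batch `real`, and the noise `z` are assumed to be defined elsewhere, and `D` is assumed to end in a sigmoid.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z):
    # Discriminator step: label real samples 1 and generated samples 0.
    fake = G(z).detach()                       # freeze G while updating D
    pred_real, pred_fake = D(real), D(fake)
    loss_D = (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real))
              + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: try to make D classify generated samples as real.
    pred = D(G(z))
    loss_G = F.binary_cross_entropy(pred, torch.ones_like(pred))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```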
In order to solve the problem of training in the wrong direction, Mirza and Osindero (2014) proposed a refined version of GAN, the Conditional Generative Adversarial Network (CGAN). The idea of CGAN is to turn the original generation process into a conditional process by providing some additional information as hints. The additional information can be one-hot vectors, two-dimensional images, or even three-dimensional models. Once the training process runs in an unexpected direction, a penalty is given to correct the training tendency according to the additional information.
Later, Zhu et al. (2016) invented iGAN/GVM. Their work uses two kinds of additional information: one is the user's input, such as strokes, lines, stretches, and deformations; the other is the boundary of objects in the image. In addition to outputting two-dimensional data (images), the program also sticks the texture of the original images onto the shape of the resulting images, making the output images more realistic and clearer near the boundary. They used light field information to capture point-to-point mapping relationships, making it possible to repeatedly paste the textures onto the output images.
After the creation of iGAN, Isola et al. (2017) continued with pix2pix, generating a real photo from a partly damaged photo, a colorful map from a black-and-white map, and an image with texture and shadow from a linear sketch. In pix2pix, the input to D is a pair of images rather than a single image, and the task of D becomes evaluating whether those two images truly correspond. So after training, we can input an image and tell the program to generate the most probable corresponding output image (Figure 3).
Based on pix2pix, Wang et al. (2017) built a refined network called pix2pixHD, enlarging the resolution of the images to 2048 × 1024 instead of the previous 256 × 256. An input image is regarded as three parallel two-dimensional matrices, according to the width, height, and three RGB channels of the image. The matrices are then transformed in the generator through five groups of convolution layers, then nine groups of residual network layers, and finally five groups of deconvolution layers (Figure 4).

For now, pix2pixHD is the latest and most efficient framework for processing image data in pairs (Wang et al. 2017). Its ability to process large images also gives us more detail when generating complex architectural drawings. The following research in this article is based on the pix2pixHD framework to discuss the mapping between architectural plan drawings, which differs from previous research on mapping city images (Zheng 2018), mapping perspective images (Peng, Zhang, and Nagakura 2017), and mapping structural images (Luo, Wang, and Xu 2018).
Figure 2: Workflow of GAN.
Figure 3: Pix2pix examples by Isola et al. (2017).
Figure 4: Network architecture of pix2pixHD by Wang et al. (2017).

RECOGNIZING AND GENERATING THROUGH GAN

Since GAN is a powerful tool in dealing with image data, its application in architecture, especially in recognizing and generating architectural drawings, has good potential for development. A process of training and evaluation between an architectural drawing and its corresponding labeled map was carried out by the authors in Python and PyTorch. In addition, to simplify the study, only a dataset of colorful floor plans of apartments collected from the property website lianjia.com was tested, in order to remove the influence of varying scales and styles of the drawings.
Labeling Principles
First of all, a labeling rule was created that uses different colors to represent areas with different functions (Figure 5). Colors with RGB values of only 0 or 255 were used in the labeling map in order to differentiate the labels as far as possible; altogether, eight combinations of RGB values can be achieved, which are used to label the walkway, bedroom, living room, kitchen, toilet, dining room, balcony, and the blank area outside the flat. Windows and doors are less important, so R:128 G:0 B:0 is used for windows and R:0 G:128 B:0 for doors. Since windows and doors are the connections between the other areas, their drawing layer is always on top of the others.
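A minimal sketch of this labeling rule as a Python color map. The paper fixes the window and door colors; the assignment of the eight 0/255 combinations to the room types is not spelled out, so those pairings are illustrative assumptions:

```python
# Eight 0/255 RGB combinations for the areas (hypothetical pairings),
# plus the two half-intensity colors the paper gives for windows and doors.
LABEL_COLORS = {
    "walkway":      (0, 0, 0),
    "bedroom":      (255, 0, 0),
    "living room":  (0, 255, 0),
    "kitchen":      (0, 0, 255),
    "toilet":       (255, 255, 0),
    "dining room":  (255, 0, 255),
    "balcony":      (0, 255, 255),
    "blank":        (255, 255, 255),
    # Drawn on top, since windows and doors connect the other areas:
    "window":       (128, 0, 0),
    "door":         (0, 128, 0),
}
```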
One hundred fifteen image pairs such as Figure 5 were selected, sized to a fixed plotting scale, and carefully marked by three volunteer architecture students. Based on this dataset, two trainings, plan-to-map (recognizing plan drawings and producing color-labeled maps) and map-to-plan (inputting color-labeled maps and generating plan drawings), were tested; they are introduced in the following pages.
Recognizing
After dividing the 115 images into a training set of 100 images and a testing set of 15 images, our team first trained the network using plan drawings as input and color-labeled maps as output. The program is supposed to take in a plan drawing and recognize it by producing a map with different colors that represent different functional areas. The whole training process was carried out on one NVIDIA TITAN X graphics card; one epoch over the 100 images took 80 seconds, so one network took 1.8 hours in total.

Figure 5: Left: floor plan drawing. Middle: labeled image. Right: labeling rule.
Figure 6 shows selected results from the testing set. The network performs well in recognizing the bedroom, kitchen, toilet, and balcony areas, whose boundaries are clear because there are usually walls separating them from each other and specific furniture inside, as in No.237 and No.C22.
However, for walkway, living room, and dining room, the
network may not be able to tell them apart since there is
usually no clear boundary between them, as in No.255
where the areas of walkway and living room are mixed
together. But actually, a test asking multiple architects to
mark No.255 showed different results between the areas
of walkway and living room, so it’s also hard for humans to
distinguish these two areas in No.255. Also in No.240 the
boundary between dining room, walkway, and living room
is not clear, and different architects may give different
answers based on their own understanding. This uncertainty somehow reflects the similarity between human cognition and the machine learning results.
In No.C27, the shape of the floor plan contains an ellipse and a triangle, but most floor plans in the training set are orthogonal. As a result, the prediction does not perfectly match the original plan. Adding more images with irregular boundary shapes to the training set may help solve this problem.
It is also interesting to see that in No.C23, an error from
our volunteer was found by the trained network. The living
room and parts of a walkway are labeled as a bedroom
by the volunteer, but the network successfully recognized
this area. The performance of the network even exceeds
that of a human in some images. Later, we double-checked all images and found four wrongly marked image pairs in the training set, but those errors didn't lead the training process in the wrong direction, which demonstrates the fault tolerance of the network.
In conclusion, the network works well in recognizing
architectural plan drawings. Compared to the training set
of thousands of images commonly used in other research,
a training set with 100 images is enough for the network to
learn and summarize the knowledge of architectural plan
drawings of specific apartments.
Generating
Next, instead of regarding plan drawings as input images,
our team then trained another network using color labeled
maps as input images and plan drawings as output images.
When evaluating, the program should generate a plan
drawing according to the input labeled map.
Figure 7 shows selected results from the testing set. All six
selected images show clear generation of the kitchen and
toilet areas, including accurate positions of kitchenware
and sanitary ware and correct direction of door openings
(Figure 8). The high quality of these results is not surprising, since there is not much uncertainty in the positioning of fixtures and doors in the training set.
Figure 6: Result of recognizing.
Figure 7: Result of generating.
Figure 8: Detailed generating result of toilet and kitchen.
However, when generating the area of living room, the
positions of the sofa and the TV are not always clear, as in
the output of No.237 and No.C22, since facing either right
or left seems reasonable. But in No.C26 and No.C30, facing
the other direction is impossible because of the existence
of a door and walkway, so the positions of the sofa and TV
are very clear in these two output images. Similar results
happen in the bedroom of No.228 and No.256, which are
also reasonable.
Another point is that the position of the dining table in
No.228 and No.C26 is slightly different from that in the
original images. But a survey shows that more architects
thought the generated position was more reasonable
because it leaves more space for the walkway and door.
This somehow shows the reliability of the network in design.
In conclusion, the network has the potential to learn the rules of design effectively. Both the very certain rules that a design needs to follow and the uncertain situations that provide flexibility can be reflected by the network. Architects can release their hands from simple or even complex design work by inputting labeling information to the program and getting detailed design plans as feedback.

Figure 9: Convolution layer and kernel.
Figure 10: Nine groups of ResNet.
Figure 11: Deconvolution layer and kernel.
WORKING PRINCIPLES
Based on the dataset and experiments above, the following two sections explore the working principles and core algorithms in the generator of the GAN, from the whole framework down to individual neurons.
Convolution Layer
Image data is actually a combination of three two-dimensional matrices that represent the RGB channels of the pixels in the image. When the network takes in the image data, the matrices go through a series of calculations and finally come out as a new set of matrices. We call each set of calculations a layer, and each single calculation a neuron.
The rst section of layers includes ve groups of convo-
lution layer sets; each contains one convolution layer, one
batch normalization layer, and one ReLU layer. As Figure 9
shows, the original image will be multiplied by a convolution
kernel matrix in the convolution layer, and become a new
matrix. This operation will be carried out every two pixels,
so the size of the new matrix is half of the original image in
911
10
Architectural Drawings Recognition and Generation Huang, Zheng
161
width and height. This calculation enables the combination
of information in neighbor pixels, and further summariza-
tion of the information in the image. Usually, a convolution
layer contains multiple convolution kernels, and each
kernel produces a new matrix. All new matrices arrange in
a line, resulting in a three-dimensional matrix, which is the
nal outcome of a convolution layer.
Then, one batch normalization layer and one ReLU layer will
act as a data coordinator to normalize the numbers and
produce the activation matrices for the next layer.
After the computation of all five groups of layers, the size of the image will be greatly reduced to width/16 × height/16, with 1024 layers of information, so the final size of the data will be 16 × 16 × 1024. All the information in the original image is summarized and stored in the new three-dimensional matrix, ready for the next group of calculations.
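As a sketch of this downsampling stage in PyTorch, following the pix2pixHD global generator that the paper builds on (Wang et al. 2017): one stride-1 convolution followed by four stride-2 convolutions, so a 256 × 256 RGB input ends up as a 16 × 16 × 1024 feature volume. Kernel sizes and channel counts are assumptions based on the cited architecture, not the authors' released code.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    # One convolution layer + batch normalization + ReLU, as described above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

encoder = nn.Sequential(
    conv_block(3, 64, stride=1),      # 256x256x3  -> 256x256x64
    conv_block(64, 128, stride=2),    # -> 128x128x128
    conv_block(128, 256, stride=2),   # -> 64x64x256
    conv_block(256, 512, stride=2),   # -> 32x32x512
    conv_block(512, 1024, stride=2),  # -> 16x16x1024
)

x = torch.randn(1, 3, 256, 256)       # a dummy RGB image
print(encoder(x).shape)               # torch.Size([1, 1024, 16, 16])
```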
Residual Network Layer
The second part of the network is nine groups of residual network layers (ResNet). One ResNet block contains two sets of convolution layers, but instead of directly linking the convolution layers, ResNet has a back door to skip the two layers if the result is growing worse (Figure 10). This lets the network go deeper while making sure the overfitting problem does not occur.
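A minimal residual block in the same style; the "back door" is the identity path added to the two-convolution branch, so the block can fall back to passing its input through unchanged. Again a sketch modeled on the cited architecture, not the authors' code:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Two sets of convolution layers, as in the text.
        self.branch = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        # The skip connection ("back door"): if the branch learns nothing
        # useful, the block reduces to (roughly) the identity.
        return torch.relu(x + self.branch(x))

res_layers = nn.Sequential(*[ResBlock(1024) for _ in range(9)])
```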
Deconvolution Layer
Compared to a convolution layer, which makes the image smaller, the deconvolution layer is the reverse operation, enlarging the image back to its original size while reducing the number of two-dimensional matrices.

Figure 11 illustrates the computation principles of a deconvolution layer. The source pixels are arranged apart from each other, and the same rule of multiplication is applied with the kernel. Each deconvolution layer doubles the matrices in height and width but reduces the number of matrices. After going through five deconvolution layers, the data of size width/16 × height/16 × 1024 returns to a size of width × height × 3, the same as the original image.
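The corresponding upsampling sketch: four stride-2 transposed convolutions, each doubling width and height while halving the channel count, plus a final convolution down to three RGB channels (one way to read the paper's five deconvolution groups; sizes again assumed from the cited architecture):

```python
import torch
import torch.nn as nn

def deconv_block(in_ch, out_ch):
    # A transposed convolution doubles the spatial size and halves channels.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                           padding=1, output_padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

decoder = nn.Sequential(
    deconv_block(1024, 512),                     # 16x16x1024 -> 32x32x512
    deconv_block(512, 256),                      # -> 64x64x256
    deconv_block(256, 128),                      # -> 128x128x128
    deconv_block(128, 64),                       # -> 256x256x64
    nn.Conv2d(64, 3, kernel_size=7, padding=3),  # -> 256x256x3
    nn.Tanh(),                                   # RGB output in [-1, 1]
)

x = torch.randn(1, 1024, 16, 16)
print(decoder(x).shape)                          # torch.Size([1, 3, 256, 256])
```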
In the generator network, image data will be folded into
a smaller image with many layers of information, then
be unfolded back to another image of the same size but
with only 3 channels of color information. In the network,
thousands of kernels work together to summarize and then
explain the features, which are the main parameters that
the network should learn from the training set.
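Tying the three stages together with the `encoder`, `res_layers`, and `decoder` sketches above (assumed to be in scope), this fold-then-unfold structure of the generator can be expressed in one line, still as an approximation of the cited architecture:

```python
generator = nn.Sequential(encoder, res_layers, decoder)
out = generator(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256]): same size, 3 color channels
```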
VISUALIZING THE NETWORK
After understanding how the network trains and processes
image data, our team visualized each matrix in the whole
network to see what kinds of visual features the network
has recognized and generated.
Network for Recognition
In order to activate the kernels in the network, a testing image was input. Then a series of black-and-white images was translated from the two-dimensional matrices that result from the original image passing through each layer. Pixels with more extreme values (0 or 255) indicate stronger activation toward one of the two groups.
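One way to produce such grayscale activation images in PyTorch (a sketch of the general technique, not the authors' tooling): register forward hooks on the generator's layers, run one forward pass, and rescale each captured channel to the 0-255 range. Here `generator` and the preprocessed input `test_image` are assumed to exist.

```python
import numpy as np
import torch
from PIL import Image

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach().cpu()
    return hook

# `generator` is the trained network; hook each top-level layer.
for name, layer in generator.named_children():
    layer.register_forward_hook(make_hook(name))

with torch.no_grad():
    generator(test_image)                 # one pass activates the kernels

for name, feat in activations.items():
    for i, channel in enumerate(feat[0]): # one 2-D matrix per neuron
        arr = channel.numpy()
        span = arr.max() - arr.min() + 1e-8
        gray = ((arr - arr.min()) / span * 255).astype(np.uint8)
        Image.fromarray(gray).save(f"{name}_neuron{i:03d}.png")
```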
Figures 12 and 13 show selected translated images from the recognition network. The network was trained for 80 epochs, and its loss value reached a relatively low and stable number, so we considered the training process complete.
In Figure 12, as the convolution layer (conv-layer) goes
deeper, the activation of the image becomes more and more
conspicuous, and more and more features are activated.
Neuron No.29 in conv-layer 1 indicates that only features
like vertical walls are detected, but more features like the
edges of tables and beds are activated in Neuron No.1 in
conv-layer 2. What's more, Neuron No.67 in conv-layer 3 shows that the paving pattern of the bedrooms and living room is detected, and in Neuron No.0 in conv-layer 4, the features
of the paving, walls, and furniture edges can be activated
together. In the last conv-layer, it seems all features are
summarized and combined into one matrix. So the aim of
this convolution process is to condense and re-encode
the information and features in the original image, and to
prepare the data for the later deconvolution layers. This is
more like the learning process of humans, from concrete
entities to abstract concepts as we think deeper.
Figure 13 shows the translated images in the ResNet layers and deconvolution layers. The matrices don't change much in the ResNet layers because of the mechanism protecting against the overfitting problem. The authors tried shutting down the back door in ResNet, but this caused a vanishing gradient problem when backpropagating. Here, the ResNet is necessary, although it takes some computation.
Next come five groups of deconvolution layers (deconv-layers). In the recognition network, the final aim is to map the floor plans to the color-labeled maps, whose colors are usually continuous and compact. Neuron No.69 in deconv-layer 1 shows the chaotic situation when the computation of the matrices in the first deconvolution layer completes. But as the network goes deeper, the boundaries between different areas become clearer, and the noise gradually disappears. Finally, a clean map showing the prediction of the different areas comes out as the output of the last deconvolution layer.
The gradual change of a specific neuron across different training epochs can also be illustrated (Figure 14). Out of a total of 80 training epochs, samples from epochs 4, 20, 36, 48, 68, and 80 in conv-layer 3 are selected. For example, Neuron No.67 in epoch 4 shows a clear activation of the paving pattern in the bedroom area. As the training goes on, the activation of the paving pattern in the living room also becomes clearer. In the final training epoch, 80, the image of Neuron No.67 shows an equivalent weight for both paving patterns, which is reasonable because the positions of the two areas may be connected. We can understand this as a learning procedure like that of humans: the more experience accumulated in the learning process, the better the knowledge matches reality. It is easier to understand the effect of training epochs on performance by referring to the human learning process.
What's more, through the analysis of the last deconv-layer, a table was produced showing which two areas have a greater possibility of being activated together (Table 1). Numbers greater than 0.7 are highlighted. The table shows a larger possibility for areas with the same type of function to be activated together, such as the upper and lower balcony, and the upper and lower bedroom, because the network detects similar patterns in them. However, the possibility of the living room and balcony being activated together is also comparatively high. Since there is no similarity in pattern between these two kinds of areas, it could be said that the design of living rooms and balconies, such as their positions, may be highly related.

Figure 12: Translated images in the convolution layers of the recognition network.
Figure 13: Translated images in the ResNet and deconvolution layers of the recognition network.
Figure 14: Translated images of selected neurons at different training epochs for the recognition network.

Table 1: Possibilities that two areas are activated together, for the recognition network.
| | kitchen | upper balcony | upper bedroom | dining room | walkway | toilet | living room | lower balcony | lower bedroom |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| kitchen | 1 | 0.45 | 0.38 | 0.27 | 0.31 | 0.28 | 0.25 | 0.36 | 0.39 |
| upper balcony | 0.45 | 1 | 0.59 | 0.48 | 0.62 | 0.5 | 0.73 | 1 | 0.48 |
| upper bedroom | 0.38 | 0.59 | 1 | 0.39 | 0.5 | 0.39 | 0.52 | 0.52 | 1 |
| dining room | 0.27 | 0.48 | 0.39 | 1 | 0.42 | 0.39 | 0.47 | 0.45 | 0.34 |
| walkway | 0.31 | 0.62 | 0.5 | 0.42 | 1 | 0.52 | 0.53 | 0.58 | 0.42 |
| toilet | 0.28 | 0.5 | 0.39 | 0.39 | 0.52 | 1 | 0.53 | 0.5 | 0.25 |
| living room | 0.25 | 0.73 | 0.52 | 0.47 | 0.53 | 0.53 | 1 | 0.73 | 0.44 |
| lower balcony | 0.36 | 1 | 0.52 | 0.47 | 0.58 | 0.5 | 0.73 | 1 | 0.47 |
| lower bedroom | 0.39 | 0.48 | 1 | 0.34 | 0.42 | 0.25 | 0.44 | 0.47 | 1 |
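The paper does not give the exact formula behind these tables, so the sketch below is one plausible reading: binarize each area's activation map from the last deconv-layer and measure how often pairs of maps fire together, normalized by the smaller of the two activation counts (the threshold and normalization are assumptions).

```python
import torch

def coactivation_table(feats, threshold=0.5):
    """feats: tensor of shape (areas, H, W), one activation map per area.
    Returns an (areas, areas) matrix of co-activation possibilities."""
    active = (feats > threshold).float().flatten(1)   # (areas, H*W) binary
    joint = active @ active.T                         # co-active pixel counts
    counts = active.sum(dim=1, keepdim=True)          # active pixels per area
    return joint / torch.clamp(torch.min(counts, counts.T), min=1)

# Example with random maps for the 9 area channels on a 16x16 grid:
table = coactivation_table(torch.rand(9, 16, 16))
print(table.shape)  # torch.Size([9, 9]), with 1s on the diagonal
```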
In conclusion, by visualizing the recognition network, certain similarities are found between the learning method and process of the GAN machine learning algorithm and human cognition. It might be interesting to dig deeper into other types of data to explore the relationship between machine learning and human cognition in architectural design problems.
Network for Generation
Next, the same test was done for the generation network.
Figure 15 shows the translated images in the conv-layers. In the generation network, the features of the input images are easier to recognize and understand than in the recognition network. The edges and differences between each area are quite clear, and almost no noise exists in the conv-layers. The features being activated change from simple color blocks to combinations of color blocks and their boundary lines, which further supports the former conclusion.
However, after the computation in the ResNet, the same thing happens in deconv-layer 1: the images are in a chaotic state with much noise (Figure 16). As the network goes deeper, the noise gradually disappears and the true generation of the floor plan begins to show up. As shown in Neuron No.42 of deconv-layer 3, distinguishable edges of walls and furniture are activated, which means the generation network acts properly in the deconv-layers.
The same test of translating images from different training epochs was done for the generation network (Figure 17). But here we thought it more valuable to activate the deconv-layers, because in the generation network they are more complex and contain more unique information than the conv-layers.

Figure 15: Translated images in the convolution layers of the generation network.
Figure 16: Translated images in the ResNet and deconvolution layers of the generation network.
Figure 17: Translated images of selected neurons at different training epochs for the generation network.

Table 2: Possibilities that two areas are activated together, for the generation network.
| | kitchen | upper balcony | upper bedroom | dining room | walkway | toilet | living room | lower balcony | lower bedroom |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| kitchen | 1 | 0.33 | 0.38 | 0.22 | 0.56 | 0.38 | 0.14 | 0.33 | 0.38 |
| upper balcony | 0.33 | 1 | 0.48 | 0.36 | 0.47 | 0.39 | 0.49 | 1 | 0.48 |
| upper bedroom | 0.38 | 0.48 | 1 | 0.44 | 0.25 | 0.09 | 0.31 | 0.48 | 1 |
| dining room | 0.22 | 0.36 | 0.44 | 1 | 0.09 | 0.27 | 0.48 | 0.36 | 0.44 |
| walkway | 0.56 | 0.47 | 0.25 | 0.09 | 1 | 0.55 | 0.28 | 0.47 | 0.25 |
| toilet | 0.38 | 0.39 | 0.09 | 0.27 | 0.55 | 1 | 0.45 | 0.39 | 0.09 |
| living room | 0.14 | 0.49 | 0.31 | 0.48 | 0.28 | 0.45 | 1 | 0.49 | 0.31 |
| lower balcony | 0.33 | 1 | 0.48 | 0.36 | 0.47 | 0.39 | 0.49 | 1 | 0.48 |
| lower bedroom | 0.38 | 0.48 | 1 | 0.44 | 0.25 | 0.09 | 0.31 | 0.48 | 1 |
The same phenomenon found in the recognition network also appears in the generation network's translated images from different training epochs.
Neuron No.18 in epoch 4 shows a very blurry generation of the living room and the dining room. But by epoch 36 the dining table becomes clear, and in epoch 48 the blurry area in the living room disappears, which means this neuron regards the generation of the living room as unrelated to that of the other areas and excludes its weight. In epoch 80, however, the formerly highly activated dining area becomes less activated for the same reason. This shows the network's ability to self-correct and evolve as training time increases.
As shown in Table 2, while the two balcony areas and the two bedroom areas still keep a very large possibility (100%) of being activated together, the possibility for the dining room and walkway appears very small. This represents the network's ability to distinguish these two easily confused areas. Meanwhile, the possibilities for the toilet and the two bedrooms are both very small; this indicates a preference in the dataset for designing toilets and bedrooms close together, so the network adjusted its parameters to tell them apart at a very early stage.
Figure 18 summarizes the large and small activation possibilities. The walkway and living room are the core components in this graph, since they have links to most of the other areas. This might reveal the design sequence of apartment plans, in which the walkway and living room come first, then the other rooms.
Figure 19 shows the evolution of the possibility tables across different training epochs. Since the matrices are symmetrical, only the upper triangle is illustrated. Generally speaking, as training goes on, the possibilities shift from evenly distributed values to more extreme values (Figure 20), which indicates improvement during the training process. In epoch 4, most values fall between 0.28 and 0.42, but by epoch 80 the number of values at 0.09 and 0.48 increases considerably; this may indicate that during training the network gradually learns to tell areas apart or combine them together.
To be specific, the possibility for the toilet and bedrooms is 0.17 in epoch 4 and gradually decreases to 0.09 in epoch 80, while that for the kitchen and walkway increases from 0.41 in epoch 4 to 0.56 in epoch 80. This demonstrates the learning process of understanding the relationships between different areas that occurs in the generation network.
On the other hand, it can be seen that many of the co-activation possibilities do not change significantly, which may indicate that the knowledge of apartment plan design and the training process is more complex than what can be revealed by the statistical relationships between spaces, and requires more in-depth exploration in the future.

Figure 18: Summary of highlighted activation possibilities.
Figure 19: Possibility tables at different training epochs.
Figure 20: Possibility distribution at different training epochs.
CONCLUSION
Pix2pixHD, an application of Generative Adversarial Networks (GAN), was tentatively applied to recognizing and generating architectural drawings. The experiments were successful and can be further developed into prototypes of powerful tools for drawing review, digitalization, and drawing assistance. Also, by understanding the working principles and visualizing sample networks, designers can verify and summarize their design techniques and ideas, and gain further inspiration through this process.
Analysis of the recognizing and generating processes, as well as the training process of the GAN, has been tentatively carried out, revealing some interesting phenomena. Because of the complexity of neural networks, it is believed that more in-depth associations lie within the network, which would provide a valuable understanding of architectural floor plan design. Further in-depth studies could be carried out to explore the mechanisms that lie within the network.
Through the analysis of the network training and information processing, it is interesting to find that a machine learning algorithm has characteristics similar to a human learning process, such as digging abstract concepts out of concrete entities and extracting accurate standards from blurry understandings.
It can be seen that in the future, artificial intelligence may play more and more active roles not only in repetitive work but also in creative work. It is highly possible that human ability will be greatly expanded when combined with artificial intelligence. The next step of this research is to develop networks that recognize and generate architectural drawings faster and more reliably, which could be applied to releasing architects from repetitive work and enhancing the exploration of creative design solutions.
ACKNOWLEDGEMENTS
This paper continues the authors' research in "Understanding and Visualizing Generative Adversarial Networks in Architectural Drawings." The previous article was published as a short paper at CAADRIA 2018.
REFERENCES
Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. "Generative Adversarial Nets." In Advances in Neural Information Processing Systems 27. Montreal, QC: NIPS.

Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. "Image-to-Image Translation with Conditional Adversarial Networks." arXiv preprint. arXiv:1611.07004.

Luo, Dan, Jinsong Wang, and Weiguo Xu. 2018. "Robotic Automatic Generation of Performance Model for Non-Uniform Linear Material via Deep Learning." In Learning, Prototyping and Adapting: Proceedings of the 23rd International Conference on Computer-Aided Architectural Design Research in Asia. Beijing: CAADRIA.

Mirza, Mehdi, and Simon Osindero. 2014. "Conditional Generative Adversarial Nets." arXiv preprint. arXiv:1411.1784.

Peng, Wenzhe, Fan Zhang, and Takehiko Nagakura. 2017. "Machines' Perception of Space: Employing 3D Isovist Methods and a Convolutional Neural Network in Architectural Space Classification." In Disciplines & Disruption: Proceedings of the 37th Annual Conference of the Association for Computer Aided Design in Architecture, edited by Takehiko Nagakura, Skylar Tibbits, Mariana Ibanez, and Caitlin Mueller, 474-81. Cambridge, MA: ACADIA.

Wang, Ting-Chun, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. 2017. "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs." arXiv preprint. arXiv:1711.11585.

Zheng, Hao. 2018. "Drawing with Bots: Human-Computer Collaborative Drawing Experiments." In Learning, Prototyping and Adapting: Proceedings of the 23rd International Conference on Computer-Aided Architectural Design Research in Asia. Beijing: CAADRIA.

Zhu, Jun-Yan, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. 2016. "Generative Visual Manipulation on the Natural Image Manifold." In Proceedings of the 14th European Conference on Computer Vision, Part V, edited by Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, 597-613. Amsterdam: ECCV.
IMAGE CREDITS
Figure 3: © Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros, 2017.
Figure 4: © Wang, Ting-Chun, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro, 2017.
All other drawings and images by the authors.
Weixin Huang is an Associate Professor in the School of Architecture, Tsinghua University. He is the Associate Director of the Digital Architectural Technology Education Committee of China, a committee member of Computer Aided Architectural Design Research in Asia (CAADRIA), and one of the founders of the Digital Architectural Design Association (DADA) of the Architectural Society of China (ASC). He received his Ph.D. from Kyoto University, Japan. His research focuses on the digital design of architectural and structural integrated systems, big-data spatio-temporal behavior analysis, and design cognition.

Hao Zheng is currently a Ph.D. student at the University of Pennsylvania, School of Design. He is a programmer and design researcher specializing in machine learning, robotic technology, mixed reality, and generative design. He holds a Master of Architecture degree from the University of California, Berkeley, and a Bachelor of Architecture degree from Shanghai Jiao Tong University. Before joining UPenn, Hao worked as a research assistant at Tsinghua University with a concentration on robotic assembly and machine learning, and at UC Berkeley with a concentration on bio-material 3D printing and deep learning.