iPLAN: Interactive and Procedural Layout Planning
Feixiang He
University of Leeds, UK
scfh@leeds.ac.uk
Yanlong Huang
University of Leeds, UK
y.l.huang@leeds.ac.uk
He Wang *
University of Leeds, UK
h.e.wang@leeds.ac.uk
*Corresponding author
Abstract
Layout design is ubiquitous in many applications, e.g., architecture and urban planning, and involves a lengthy iterative design process. Recently, deep learning has been leveraged to automatically generate layouts via image generation, showing huge potential to free designers from laborious routines. While automatic generation can greatly boost productivity, designer input is undoubtedly crucial. An ideal AI-aided design tool should automate repetitive routines while accepting human guidance and providing smart/proactive suggestions. However, the capability of involving humans in the loop has been largely ignored in existing methods, which are mostly end-to-end approaches. To this end, we propose a new human-in-the-loop generative model, iPLAN, which is capable of automatically generating layouts, but also of interacting with designers throughout the whole procedure, enabling humans and AI to co-evolve a sketchy idea gradually into the final design. iPLAN is evaluated on diverse datasets and compared with existing methods. The results show that iPLAN has high fidelity in producing layouts similar to those from human designers, great flexibility in accepting designer inputs and providing design suggestions accordingly, and strong generalizability when facing unseen design tasks and limited training data.
1. Introduction
Layout generation has recently sparked research interest in computer vision and graphics, aiming to automate the design process and boost productivity. The traditional design process follows a paradigm of iteratively adjusting and finalizing details from coarse to fine and from global to local, which imposes repetitive and laborious routines on designers. Very
recently, it has been shown that automatic image genera-
tion of such designs (with minimal human input) is pos-
sible through learning from data [12,25,26,36]. This new
line of research combines deep learning with design and has
demonstrated a new avenue for AI-aided design.
To achieve full automation, current research tends to
learn from existing designs in an end-to-end fashion, and
then to generate new ones with qualitative similarity and
sufficient diversity. Taking floorplan design as an example, automated generation can be based on simple human input, such as the boundary of the floor space [36], the relations among rooms [25, 26], or both [12]. While fully automated generation is important, design is by nature a procedural process, which involves alternation between repetitive routines and creative thinking at multiple intermediate stages [29]. Therefore, an ideal AI-aided system should automate the routine part while allowing the designer to impart creativity. This requires the system to be able to interact with the designer, in the sense that it should accept the designer's guidance and then actively suggest possible solutions accordingly, completing a feedback loop. So far, the human-in-the-loop element has been largely missing, which prevents a closer integration of AI into existing design practice.
Designing such an AI model faces several intrinsic chal-
lenges. In practice, learning how to interact with the de-
signer requires a full observation of the decisions made at
every intermediate stage. However, existing datasets, such
as RPLAN [36] and LIFULL [27], usually only include
the final designs, without the stage-to-stage design process.
One potential solution to overcome this issue is to reverse-
engineer intermediate stages from final designs, which how-
ever leads to another difficulty: the order of stages depends
on the specific task/goal and could vary dramatically even
for the same final design. Further, the order uncertainty is
exacerbated by the strong personal styles and preferences of
designers. Thus, how to design an AI system that can ac-
count for the above factors is a key research question, which
is under-explored to date.
In this paper, we propose a new human-in-the-loop gen-
erative model for layout generation, which is referred to
as interactive planning (iPLAN). Unlike previous work,
iPLAN is equipped with a user-friendly interaction mech-
anism, which is achieved by letting the AI model learn the
multi-stage design process, aiming to accommodate free-
form user interactions and propose design suggestions at
every stage. This allows designer inputs at different stages
across a wide range of levels of detail, while offering the
capability of fully automated generation. To address the
challenge of missing procedural design data, we reverse-
engineer the final design to obtain the stage-to-stage pro-
cess, based on principles that are widely adopted by profes-
sional designers [29]. This enables us to design a Markov
chain model to capture the full design procedure. Since
there is more than one way to reverse-engineer the final de-
signs (i.e., the stage order can vary), our model is designed
with the capacity of accepting inputs with an arbitrary or-
der, and consequently can learn the style variations implic-
itly from the data.
While iPLAN is general, we focus on floorplan design
in this paper. iPLAN has been validated on two large-scale
benchmark datasets, i.e., RPLAN [36] and LIFULL [27],
under diverse scenarios. The experiments show that our
model is versatile in accepting designer inputs at various
levels of detail, from minimal input and automatic gen-
eration, to stage-to-stage human guidance and interactive
design. By learning from designs augmented by reverse-
engineered processes, our model exhibits high fidelity in
generating new designs with close style similarity and suf-
ficient diversity. Finally, our model is highly flexible and
generalizable when trained on varying amounts of data and
facing unseen spaces and design requirements that are cate-
gorically different from the training data.
Contributions: (i) We propose a novel human-in-the-
loop generative model iPLAN which respects design prin-
ciples and mimics the design styles of professional design-
ers implicitly. (ii) We demonstrate a successful fine-grained
stage-to-stage generative model for floorplan, as opposed
to existing end-to-end approaches. (iii) We show a variety of design scenarios, including fully automated generation, interactive planning with user instructions, and generalization to unseen tasks. (iv) We conduct extensive evaluations on diverse benchmark datasets and demonstrate that iPLAN outperforms the state of the art under multiple metrics.
2. Related Work
Layout generation has been an active research area in
computer vision, e.g., indoor scene synthesis [6,23,37,39,
40] and floorplan generation [2,9,12,24,28,36], image com-
position [1,15], etc. Existing approaches can be generally
grouped into two categories: handcrafted rule-based meth-
ods and data-driven methods. We mainly review the latter
as they are closely related to our research.
Indoor scene synthesis. The synthesis of indoor scenes
typically involves the placement of furniture models from
an existing database into a given room. Convolutional neu-
ral networks can be trained to iteratively insert one ob-
ject at a time into a room for indoor scene generation
[31,35]. High-level scene semantics can also be employed,
e.g., scene graphs, as a prior for a more controlled genera-
tion [34]. The biggest difference between indoor scene syn-
thesis and floorplan generation is their requirements on the
space partitioning. While indoor scene synthesis places ob-
jects in a room where the room itself does not need to be
divided, floorplan normally requires explicit space division
for different functionalities.
Image composition from scene graphs. Another related
field is image composition from scene graphs, where the
task is to derive the scene from a layout graph that de-
scribes the locations and features of the objects. Such gener-
ation can be achieved by Generative Adversarial Networks
(GANs) based on graph convolution [15]. Further improve-
ments can be obtained by separating the layout from the
appearance of the objects [1]. For more controllability, Li
et al. [21] synthesize images from a scene graph and the
corresponding image crops. In contrast to the floorplan, the
challenge in image composition is how to compose different
objects into an image rather than partitioning the space.
Floorplan generation. Floorplan generation can be formulated as an image synthesis problem, which is an active research area in computer vision. Owing to the surge of deep learning, the most promising approaches are GANs [3, 7, 16–19, 38]. Image-based GANs have proven effective in floorplan generation [4, 13, 14, 30, 41–43]. Graph-based GANs can also produce floorplans by taking only spatial constraints, such as room connections and room types, in the form of a graph [25, 26]. However, all these methods are end-to-end approaches, and therefore provide limited interactivity to the designer.
More recently, some human-in-the-loop approaches have been proposed. Wu et al. [36] propose a two-phase approach to produce floorplans of residential buildings. The model successively predicts the locations of rooms and walls given a building boundary, and converts the predicted layout into a vector graphics format. Graph2Plan [12] combines topology information in the form of graphs with spatial constraints to instantiate rooms accordingly. These methods enable human interactions at certain stages, e.g., modifying room locations and retrieving the graph. Different from the existing methods, we propose a fine-grained generative model which enables interactions with the designer at different levels, from providing high-level design requirements to giving low-level instructions at each step.
3. Methodology
The overview of iPLAN is presented in Fig. 1. Without loss of generality, our model takes the space boundary as input, and decomposes the design procedure into: acquiring room types, locating rooms, and finalizing room partitions. This workflow aims to mimic human designers [29] and to accept designer inputs at any stage. The workflow is modeled as a joint probability distribution of all the aforementioned factors, which is then factorized into stages and later formulated as a Markov chain (Sec. 3.1). Next, each factorized distribution provides a flexible entry point to incorporate user input or can be used for automatic generation (Sec. 3.2–Sec. 3.4).
Figure 1. Overview of our framework iPLAN. Room types are predicted all at once, while room locations and partitions are predicted iteratively.
3.1. Problem Formulation
A dataset of $H$ layouts is denoted by $\mathcal{D} = \{D_i\}_{i=1}^{H}$, with the $i$-th layout $D_i = (B_i, R_i, T_i, N_i, C_i)$. $B_i \in \mathbb{R}^{128 \times 128}$ is the boundary and $N_i$ denotes the total number of rooms. We use $j$ to index a specific room. $R_i = \{r_{i,j}\}_{j=1}^{N_i}$ denotes the room regions, with $r_{i,j} \in \mathbb{R}^4$ indicating the top-left and bottom-right corners of the bounding box of the room. $T_i = \{t_{i,j}\}_{j=1}^{N_i}$ is a set of room types $t_{i,j} \in \mathbb{Z}^+$. $C_i = \{c_{i,j}\}_{j=1}^{N_i}$ is a set of room centers $c_{i,j} \in \mathbb{R}^2$. Given $\mathcal{D}$, we aim to design a generative model for $P(\mathcal{D}) = \prod_{i=1}^{H} P(D_i)$.
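For readers who prefer code, the sketch below shows one way a layout sample $D_i$ could be represented; the field names and NumPy types are our own illustration, not part of the released implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Layout:
    """One training sample D_i = (B_i, R_i, T_i, N_i, C_i), following Sec. 3.1."""
    boundary: np.ndarray   # B_i: 128x128 binary mask of the floor boundary
    regions: np.ndarray    # R_i: (N_i, 4) top-left / bottom-right corners of room boxes
    types: np.ndarray      # T_i: (N_i,) integer room-type labels
    centers: np.ndarray    # C_i: (N_i, 2) room-center coordinates

    @property
    def num_rooms(self) -> int:
        return len(self.types)   # N_i
```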
Similarly to existing methods, we also formulate floorplan design as an image generation problem, but we focus on proposing a fine-grained generative model that enables human-in-the-loop interaction through decomposing $P(\mathcal{D})$ appropriately. From a mathematical perspective, there are many ways to decompose $P(\mathcal{D})$. In order to allow human inputs across various levels of detail, we rely on design principles and widely adopted practice [29] to mimic the human design workflow, and naturally divide a floorplan design procedure into several stages. First, the desired number of rooms and their types are determined. Next, the locations of areas with specific functionality (e.g., living rooms, bedrooms) are roughly estimated. Finally, room partitioning is conducted to finalize the design. This design paradigm serves as a strong inductive bias in our model, following which $P(\mathcal{D})$ is decomposed into:
$$P(\mathcal{D}) = \prod_{i=1}^{H} P(D_i) = \prod_{i=1}^{H} P(R_i, C_i, T_i, N_i, B_i) = \prod_{i=1}^{H} P(R_i \mid C_i, T_i, N_i, B_i)\, P(C_i \mid T_i, N_i, B_i)\, P(T_i, N_i \mid B_i)\, P(B_i), \tag{1}$$
where $P(B_i)$ accounts for the boundary, which is known a priori; $P(T_i, N_i \mid B_i)$ infers the desired number of rooms and the room types; $P(C_i \mid T_i, N_i, B_i)$ and $P(R_i \mid C_i, T_i, N_i, B_i)$ respectively correspond to the coarse and fine designs of the layout, where the former estimates the room locations while the latter predicts the exact partitions. A visualization of Eq. (1) is provided in Figure 1, where the second, third and fourth blocks correspond to $P(T_i, N_i \mid B_i)$, $P(C_i \mid T_i, N_i, B_i)$ and $P(R_i \mid C_i, T_i, N_i, B_i)$, respectively.
Inspired by indoor scene synthesis where objects are placed iteratively [31, 35], we assume rooms are designed one by one. Formally, we model $P(R_i \mid C_i, T_i, N_i, B_i)$ and $P(C_i \mid T_i, N_i, B_i)$ as Markov chains, i.e., designs are conducted in a step-wise manner and early decisions will affect later ones, which allows a designer to focus on one room at a time and give guidance at any step:
$$P(C_i \mid T_i, N_i, B_i) = \prod_{j=1}^{N_i} P(c_{i,j} \mid c_{i,<j}, t_{i,j}, N_i, B_i), \tag{2}$$
$$P(R_i \mid C_i, T_i, N_i, B_i) = \prod_{j=1}^{N_i} P(r_{i,j} \mid r_{i,<j}, c_{i,j}, t_{i,j}, N_i, B_i), \tag{3}$$
where $r_{i,<j} = \{r_{i,1}, \ldots, r_{i,j-1}\}$ and $c_{i,<j} = \{c_{i,1}, \ldots, c_{i,j-1}\}$ denote the sets of room partitions and room centers allocated before the $j$-th room, respectively.
Connections to existing research. Eqs. (1)–(3) are a generalization of existing methods and are more fine-grained. Graph2Plan [12] simultaneously determines $C_i$, $T_i$ and $N_i$ given $B_i$, and then predicts $P(R_i \mid C_i, T_i, N_i, B_i)$. RPLAN [36] estimates $P(C_i, T_i, N_i \mid B_i)$ via predicting $P(c_{i,j}, t_{i,j} \mid c_{i,<j}, t_{i,<j}, B_i)$ consecutively, and then indirectly estimates the room areas $R_i$ by locating walls. In contrast, we further decompose $P(C_i, T_i, N_i \mid B_i)$ into $P(T_i, N_i \mid B_i)$ and $P(C_i \mid T_i, N_i, B_i)$, and further decompose the latter using Eq. (2). Moreover, $P(R_i \mid C_i, T_i, N_i, B_i)$ is also decomposed into multiple steps by Eq. (3). Our decompositions yield a more fine-grained procedural generative model that allows for user interactions at arbitrary steps. This enables more flexible and closer human–AI interactions. For the sake of brevity, we omit the subscript $i$ of $T_i$, $N_i$, $B_i$ and $C_i$ in the following sections.
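To make the factorization concrete, the following minimal sketch shows how Eqs. (1)–(3) translate into a step-wise sampling loop with an optional per-step user override. The names predict_types, predict_center, predict_region and user_override are placeholders for the modules described in Secs. 3.2–3.4; their signatures are assumptions for illustration, not the released API.

```python
def generate_layout(boundary, predict_types, predict_center, predict_region,
                    user_override=None):
    """Sketch of the factorized sampling loop implied by Eqs. (1)-(3)."""
    types = predict_types(boundary)                  # sample from P(T, N | B)
    centers, regions = [], []
    for j, t in enumerate(types):                    # one room per step (Markov chain)
        c = predict_center(boundary, centers, t)     # P(c_j | c_<j, t_j, N, B)
        r = predict_region(boundary, regions, c, t)  # P(r_j | r_<j, c_j, t_j, N, B)
        if user_override is not None:                # designer may adjust any step
            c, r = user_override(j, t, c, r)
        centers.append(c)
        regions.append(r)
    return types, centers, regions
```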
3.2. The Number and Types of Rooms
$T$ and $N$ are normally given beforehand. However, given a specific $B$, the design might not be unique, e.g., the same space can be designed as a 2-bed or a 3-bed flat. In other words, there exists a distribution of possible designs. Thus, we propose to learn their distribution $P(T, N \mid B)$ from real designs by professional designers to enable automatic exploration. $\{T, N\}$ can be replaced by a random variable $Q = \{q_k\}_{k=1}^{K}$, where $K$ denotes the number of room types in $\mathcal{D}$ and $q_k$ corresponds to the number of rooms under the $k$-th type. So, we model $P(Q \mid B)$ instead.
Figure 2. Prediction of a room center at one step.
We propose a boundary-conditioned Variational Autoencoder (BCVAE) based on the VAE [32], where $B$ serves as a condition. The model consists of an embedding module $F_{ed}$, an encoder $F_{en}$ and a decoder $F_{de}$. Feeding $F_{ed}$ with $B$ yields an embedded vector $\gamma \in \mathbb{R}^{128}$. Also, $\{\mu, \Sigma\} = F_{en}(Q, \gamma)$, where $\mu \in \mathbb{R}^{32}$ and $\Sigma \in \mathbb{R}^{32 \times 32}$ are the mean and covariance of a Gaussian distribution. Furthermore, a latent variable $z$ is sampled from $\mathcal{N}(\mu, \Sigma)$ using the reparameterization trick [20]. Given $z$ and $\gamma$, we reconstruct $Q$ as $\hat{Q} = F_{de}(z, \gamma)$. We employ a standard VAE loss for training. The detailed network architecture and the training loss are discussed in the supplementary material.
During inference, for a new boundary $B$, we sample $z$ from $\mathcal{N}(0, I)$ and predict $\hat{Q} = F_{de}(z, F_{ed}(B))$, which is then used to recover $\hat{T}$ and $\hat{N}$.
3.3. Locating Rooms
Locating the room regions, $P(C \mid T, N, B)$, plays an essential role in the design process. Similarly to [36], we model room region prediction as a step-wise classification task. At each step, given a multi-channel image representation of the current design state and the next desired room type $t_j$, we predict the center of the next room. The multi-channel image encodes the boundary $B$ and all previously predicted room centers $c_{<j}$. The image consists of $K + 4$ binary channels, three of which label $B$, i.e., the boundary, the front door and the interior area pixels. $K$ channels represent the predicted room centers, with each channel corresponding to a room type. For each room, its center is represented by a $9 \times 9$ square of pixels with value 1. We also use one channel to summarize the centers of all predicted rooms. Regarding the desired room type $t_j$, we convert it into a one-hot vector.
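As a concrete reading of this encoding, the sketch below assembles the $(K+4)$-channel state image from NumPy masks. The channel ordering, the helper name and the integer-pixel center convention are our assumptions for illustration.

```python
import numpy as np

def build_state_image(boundary_mask, front_door_mask, interior_mask,
                      centers, types, K, size=128, half=4):
    """Sketch of the (K+4)-channel design-state encoding described above.
    centers: list of (cx, cy) integer pixel coordinates; types: labels in [0, K)."""
    state = np.zeros((K + 4, size, size), dtype=np.float32)
    state[0], state[1], state[2] = boundary_mask, front_door_mask, interior_mask
    for (cx, cy), t in zip(centers, types):
        y0, y1 = max(cy - half, 0), min(cy + half + 1, size)   # 9x9 square of ones
        x0, x1 = max(cx - half, 0), min(cx + half + 1, size)
        state[3 + t, y0:y1, x0:x1] = 1.0       # per-type center channel
        state[K + 3, y0:y1, x0:x1] = 1.0       # summary channel of all centers
    return state
```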
We feed the multi-channel representation to a ResNet-18 [10] network and the one-hot vector of the desired room type to an embedding network to extract features. The embedding network has three fully connected layers, followed by four convolution blocks, a Batch Normalization layer and a LeakyReLU layer. The extracted features are concatenated and subsequently fed to a decoder module that contains four atrous spatial pyramid pooling (ASPP) [5] blocks, a convolutional block and a deconvolutional block. The output is $O \in \mathbb{R}^{(K+3) \times 128 \times 128}$, giving a probability vector for every pixel. For each pixel, we predict $K + 3$ labels, i.e., the $K$ room types plus EXISTING, FREE and OUTSIDE, which indicate whether a pixel belongs to an existing room, a free space (in the interior of the boundary) or the exterior, respectively. The whole model is shown in Figure 2.
Figure 3. Procedural room shape generation.
For training, we predict one room at a time, by decomposing the final design into a series of design states with one room added at each stage. However, there are multiple possible sequences for a given final design and the ground-truth step-to-step decisions are unavailable. Therefore, we employ a stochastic training process to learn all possible sequences by randomly removing rooms from a design. We propose a pixel-wise weighted cross-entropy loss:
$$L = -\sum_{h=1}^{128} \sum_{w=1}^{128} \omega_y \log \frac{\exp(O_{y,h,w})}{\sum_{k=1}^{K+3} \exp(O_{k,h,w})}, \tag{4}$$
where $y$ is the ground-truth class index for the pixel located at $(h, w)$ and $\omega_y$ is the weight of the $y$-th label; we set it to 2 for the $K$ room types and to 1.25 for the other three labels.
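Under the convention that the first $K$ channels of $O$ are the room types and the last three are EXISTING, FREE and OUTSIDE, Eq. (4) can be written as a weighted cross-entropy. The sketch below is a minimal illustration; note it uses PyTorch's default mean reduction rather than the plain pixel sum in Eq. (4).

```python
import torch
import torch.nn.functional as F

def room_center_loss(logits, target, K):
    """Sketch of the weighted pixel-wise cross-entropy in Eq. (4).
    logits: (B, K+3, 128, 128); target: (B, 128, 128) integer class map.
    Weights follow the paper: 2 for the K room types, 1.25 for the rest."""
    weights = torch.cat([torch.full((K,), 2.0), torch.full((3,), 1.25)])
    return F.cross_entropy(logits, target, weight=weights.to(logits.device))
```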
During the inference phase, for a sequence of room types $T = \{t_1, t_2, \ldots, t_N\}$, we first predict the room center for $t_1$, i.e., $P(c_1 \mid t_1, N, B)$, then we update the current design state using this predicted center $\hat{c}_1$ and continue to predict the next room center for $t_2$, i.e., $P(c_2 \mid \hat{c}_1, t_2, N, B)$. This procedure is repeated until all centers $\hat{C}$ are determined. Note that when the order of the elements in $T$ varies, the corresponding layout can also change, thus providing design diversity. Also, at any intermediate step, user input (e.g., adjusting a room center) can be incorporated into our model before it predicts the next room center.
3.4. Predicting Room Partitioning
After predicting the room centers $\hat{C}$, we are ready for detailed room partitioning $P(R \mid \hat{C}, \hat{T}, \hat{N}, B)$, where we need to consider the room size and the room shape. The room size is directly related to the specific functionality of the room, e.g., living rooms are normally larger than bathrooms for social interactions. The shape is affected by the functionality too, but also strongly affected by the boundary geometry and the inner walls. Further, there is usually no 'optimal' solution but a distribution of near-optimal solutions when looking at the final designs. Since the distribution can be arbitrary, we propose a new Generative Adversarial Network (GAN) to model the step-wise prediction of room partitioning. The general model is shown in Figure 3. Instead of directly predicting the shape of each room, we first predict its bounding box. This allows the bounding box predicted at a later stage to grab space that is already allocated. We find this is a straightforward yet effective way to generate non-box rooms.
Generator: The generator begins with a bounding-box regressor $F_b$ that outputs the top-left and bottom-right corners of the box. $F_b$ consists of six convolutional units and two fully connected layers, each unit with a convolution layer, a Layer Normalization layer and a ReLU layer. $F_b$ enables the designer to interact with the system easily, e.g., relocating a room or modifying a room bounding box. However, predicting only the corners of bounding boxes is not convenient: we want to label all the pixels within the box so that we can work in the image space consistently. So we design a tailor-made masking module $F_m$ to map each room bounding box into a room mask image. $F_m$ contains six convolution layers with $3 \times 3$ kernels, followed by a Sigmoid activation function.
For a given sequence of room centers and types $\{(c_1, t_1), (c_2, t_2), \ldots, (c_N, t_N)\}$, we use $F_b$ to predict the coordinates $\hat{r}_1$ of the bounding box of the first room, specified by $c_1$ and $t_1$, where $\hat{r}_1 = F_b(S_0, c_1, t_1)$ with $S_0 = B$. Furthermore, we obtain the room mask as $\hat{M}_1 = F_m(\hat{r}_1) \in \mathbb{R}^{128 \times 128}$. The output of the generator at the first step is computed by
$$\hat{S}_1 = S_0 \times (\mathbf{1} - \hat{M}_1) + \hat{M}_1 \times t_1, \tag{5}$$
where $\mathbf{1} \in \mathbb{R}^{128 \times 128}$ represents a matrix with all elements being 1. During the prediction in the second step, the generator takes $\hat{S}_1$, $c_2$ and $t_2$ as input and outputs $\hat{S}_2 = \hat{S}_1 \times (\mathbf{1} - \hat{M}_2) + \hat{M}_2 \times t_2$ with $\hat{M}_2 = F_m(\hat{r}_2) = F_m\big(F_b(\hat{S}_1, c_2, t_2)\big)$. The same procedure is iterated (for $\hat{N}$ steps in total) until the entire design is accomplished.
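One generation step of Eq. (5) can be sketched as follows; box_regressor and mask_net stand in for $F_b$ and $F_m$, and their interfaces are assumptions based on the description above rather than the released code.

```python
import torch

def generator_step(state, center, room_type, box_regressor, mask_net):
    """Sketch of one step of the procedural generator (Eq. (5))."""
    box = box_regressor(state, center, room_type)   # \hat{r}_j: (x0, y0, x1, y1)
    mask = mask_net(box)                            # \hat{M}_j: 128x128 soft room mask
    # S_j = S_{j-1} * (1 - M_j) + M_j * t_j : paint the new room onto the state
    new_state = state * (1.0 - mask) + mask * float(room_type)
    return new_state, box
```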
Discriminator: The discriminator retains a similar backbone to $F_b$ in the generator, except that the last two convolution layers are dropped. The discriminator is employed to distinguish whether a sequence of design states comes from the generator or from the ground truth. Specifically, a sequence of predicted design states $\hat{S} = \{\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{\hat{N}}\}$ should be recognized as 'FALSE', while a sequence from the ground truth with the same order of room types (i.e., $\{t_1, t_2, \ldots, t_{\hat{N}}\}$) should be predicted as 'TRUE'.
For training, we define a loss on top of the WGAN-GP [8] loss, i.e.,
$$L = \mathbb{E}_{\hat{S} \sim P_g}[D(\hat{S})] - \mathbb{E}_{S \sim P_r}[D(S)] + \lambda_1 \mathbb{E}_{\bar{S} \sim P_{\bar{S}}}\big[(\|\nabla_{\bar{S}} D(\bar{S})\|_2 - 1)^2\big] + \lambda_2 L_s, \tag{6}$$
where $P_g$ and $P_r$ represent the generated and the real data distributions, respectively. $\bar{S} \sim P_{\bar{S}}$ denotes a random interpolation between $\hat{S}$ and $S$, and is used for a gradient penalty with $\lambda_1 = 10$; $\|\cdot\|_2$ is the 2-norm. The last term is a smooth-L1 regularizer introduced to explicitly constrain the bounding boxes, with $\lambda_2 = 100$:
$$L_s = \sum_{j=1}^{N} l_j, \qquad l_j = \begin{cases} 0.5\,(r_j - \hat{r}_j)^2, & \text{if } \|r_j - \hat{r}_j\|_1 < 1 \\ \|r_j - \hat{r}_j\|_1 - 0.5, & \text{otherwise,} \end{cases} \tag{7}$$
where $r_j$ is the ground truth and $\|\cdot\|_1$ is the 1-norm.
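For reference, the sketch below assembles Eqs. (6)–(7) for one critic update. Tensor shapes and the interpolation scheme follow standard WGAN-GP practice and are assumptions where the paper does not specify them; the box term uses PyTorch's smooth-L1, summed over box coordinates.

```python
import torch
import torch.nn.functional as F

def critic_and_box_loss(D, real_seq, fake_seq, pred_boxes, gt_boxes,
                        lambda1=10.0, lambda2=100.0):
    """Sketch of Eqs. (6)-(7): WGAN-GP critic loss plus a smooth-L1 box term."""
    # Wasserstein terms: E_fake[D] - E_real[D]
    loss = D(fake_seq).mean() - D(real_seq).mean()
    # Gradient penalty on random interpolations between real and fake sequences
    eps = torch.rand(real_seq.size(0), device=real_seq.device)
    eps = eps.view(-1, *([1] * (real_seq.dim() - 1)))
    inter = (eps * real_seq + (1 - eps) * fake_seq).requires_grad_(True)
    grad = torch.autograd.grad(D(inter).sum(), inter, create_graph=True)[0]
    gp = ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    # Smooth-L1 regularizer on the predicted bounding boxes (Eq. (7))
    box_term = F.smooth_l1_loss(pred_boxes, gt_boxes, reduction="sum")
    return loss + lambda1 * gp + lambda2 * box_term
```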
At the inference stage, our procedural model predicts
room areas in a step-wise manner, which hence allows user
input at an arbitrary intermediate step, such as modifying
room types, changing room centers or even the already pre-
dicted room areas. Gaps may exist between the predicted
rooms. Thus, we employ a simple post-processing step as
in [33] and detail it in the supplementary material.
4. Experiments
4.1. Datasets
We conduct experiments on two commonly used datasets, RPLAN [36] and LIFULL [27]. RPLAN is collected from real-world residential buildings in the Asian real estate market, and contains over 80k floorplans and 13 types of rooms¹. All floorplans in RPLAN are axis-aligned and pre-processed to the same scale. The training–validation–test split of the dataset is 70%–15%–15% [12]. The LIFULL HOME'S dataset offers approximately five million apartment floorplans from the Japanese housing market. The original dataset is given in the form of images, but a subset has been parsed into vector format by [22]. We select a subset with 4–10 rooms. This specific subset consists of approximately 54k floorplans and 9 room types², among which 85% of the data (randomly sampled) serves as the training set while the remainder is used for testing.
¹LivingRoom, MasterRoom, Kitchen, Bathroom, DiningRoom, ChildRoom, StudyRoom, SecondRoom, GuestRoom, Balcony, Entrance, Storage, Wall-in.
²LivingRoom, Kitchen, Bedroom, Bathroom, Office, Balcony, Hallway, OtherRoom.
4.2. Metrics
Although each floorplan only contains one design (i.e.,
one boundary corresponds to one design), our model can
predict the distribution of plausible designs, offering de-
sign diversity and alternative choices. To evaluate the pre-
dicted distribution, we employ the Fréchet Inception Distance (FID) [11] to calculate the distance between two distributions, which has also been used in floorplan generation [25, 26]. Built on FID, we introduce three metrics, FID_img, FID_area and FID_type: (i) FID_img is computed on rendered images to evaluate the distributional differences between the generated and true images. (ii) FID_area evaluates the distributional differences of room areas. Each layout is represented by a $1 \times K$ vector $\mathrm{area}_i$, with its $k$-th element $\mathrm{area}_{i,k}$ representing the average area of the $k$-th type of rooms in the $i$-th floorplan; $K$ denotes the number of room types. (iii) FID_type calculates the distributional differences of room numbers against room types. Each layout is represented by a $1 \times K$ vector $\mathrm{type}_i$, whose element $\mathrm{type}_{i,k}$ represents the number of rooms under the $k$-th type in the $i$-th floorplan. It is worth mentioning that the above three metrics are not biased towards our method, as they are not involved in the training process at all.

Dataset | Method | FID_img | FID_area | FID_type
RPLAN | HouseGAN++ | 51.33 | 1.36 × 10^8 | 0.038
RPLAN | Rplan | 4.1 | 2.29 × 10^5 | 0.58
RPLAN | Our_III | 1.22 | 3.13 × 10^4 | 0.05
RPLAN | Our_II | 0.72 | 1.09 × 10^4 | 0.03
RPLAN | Rplan* | 0.11 | 8.62 × 10^3 | 6.4 × 10^-3
RPLAN | Graph2Plan | 0.62 | 8.82 × 10^3 | 2.70 × 10^-4
RPLAN | Our_I | 0.16 | 4.89 × 10^2 | 4.44 × 10^-6
LIFULL | Rplan | 50.19 | 4.29 × 10^6 | 5.15
LIFULL | Our_III | 37.35 | 9.81 × 10^5 | 2.52
LIFULL | Our_II | 32.65 | 7.75 × 10^5 | 2.14
LIFULL | Rplan* | 1.43 | 5.62 × 10^5 | 0.064
LIFULL | Graph2Plan | 0.64 | 2.87 × 10^3 | 2.63 × 10^-5
LIFULL | Our_I | 0.38 | 2.07 × 10^3 | 3.59 × 10^-6
Table 1. FID-based metrics on RPLAN and LIFULL.
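For the vector-valued metrics, the feature construction and the Fréchet distance can be sketched as below. This is our reading of the metric definitions above; FID_img additionally passes rendered images through an Inception network, which is omitted here.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two sets of per-layout
    feature vectors, x and y, each of shape (num_layouts, K)."""
    mu_x, mu_y = x.mean(0), y.mean(0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    covmean = sqrtm(cov_x @ cov_y).real
    return float(((mu_x - mu_y) ** 2).sum()
                 + np.trace(cov_x + cov_y - 2 * covmean))

def type_vector(room_types, K):
    """1xK vector for FID_type: number of rooms of each type in one floorplan."""
    v = np.zeros(K)
    for t in room_types:
        v[t] += 1
    return v
```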
4.3. Baselines
We choose three methods as baselines. Rplan³ [36] is an image-based method, which outputs a floorplan given a boundary. We further use a variant of Rplan, named Rplan*, which also takes the room types and centers as input. HouseGAN++ [26] is a graph-constrained approach, which treats rooms as nodes and requires node types and connectivity as input. It generates mask images for all nodes according to the graph, then blends them to form the layout. Graph2Plan [12] requires a boundary and a room relation graph as input, where the graph includes information on room size, room center, room type and their connections. All floorplans are rendered in the same way as [12], including door and window placements.
³We rename the approach RPLAN [36] as Rplan in order to distinguish it from the dataset RPLAN.
4.4. Quantitative Evaluations
Ablation Study. iPLAN is capable of fully automated generation while allowing for user interactions at different stages. Based on the amount of information received from the designer, three variants are evaluated: (v1) Our_I takes as input the boundary, room types, and room centers; (v2) Our_II takes as input the boundary and room types; (v3) Our_III takes as input only the boundary. The variants show different levels of automation/user interactivity, demonstrating iPLAN's flexibility when the depth of human involvement varies, from little human input (e.g., iPLAN freely explores designs) to step-to-step guidance (e.g., where to put each room). The implementation details are in the supplementary document. Table 1 shows the results. iPLAN achieves good results in all three settings, among which Our_I achieves the best results and Our_II is better than Our_III. This is natural, as more prior information helps for better prediction. Considering that RPLAN and LIFULL are significantly different in terms of the overall shapes of boundaries and rooms, iPLAN can indeed handle different data distributions well.
Comparisons on RPLAN. As shown in Table 1, iPLAN in general outperforms all baseline methods by large margins. Looking closely, HouseGAN++ achieves worse results than the other methods, except on FID_type, where the margin is small. The reason is that its layouts are not bounded by the boundaries. By taking the boundary as input, both Rplan and Our_III predict room centers and regions successively, but Our_III outperforms Rplan on all three metrics. Furthermore, when both the boundary and the room types are given, Our_II outperforms the first three methods. Finally, Rplan*, Graph2Plan and Our_I are given the full information (boundary, room types and centers). Our_I further improves on all metrics, although Rplan* achieves a slightly better score on FID_img. We observe that the step-wise prediction in iPLAN is less likely to generate partitioning ambiguity (allocating a space to multiple rooms), which is observed in Rplan*. Also, learning conditional probabilities (as iPLAN does), as opposed to a joint probability of multiple factors from final designs (as Rplan* and Graph2Plan do), is easier, since the decomposition of final designs can be seen as a data augmentation strategy.
Comparisons on LIFULL. Similar comparisons have been done on LIFULL (Table 1). We skip HouseGAN++ because it needs door locations in the input, and there is no such data in LIFULL. Compared with Rplan*, Our_I and Graph2Plan, room centers are not provided as priors for Rplan, Our_III and Our_II, leading to a large performance gap between these two groups. This is understandable because LIFULL is more challenging/heterogeneous, with multi-scale samples and nested rooms. Next, Our_II outperforms Our_III on all three metrics because more prior knowledge (i.e., the room types) is given, and both are better than Rplan. Furthermore, when the room centers are provided, inaccurate wall prediction affects Rplan*. Different from the results on RPLAN, Graph2Plan outperforms Rplan* on LIFULL. This is because Graph2Plan takes spatial room relations as input, which is more robust for such a multi-scale dataset. Our_I is still better than Rplan* and Graph2Plan. Therefore, we can conclude that, when the same amount of prior knowledge is given, iPLAN achieves the best performance.
Figure 4. Qualitative comparisons. Shaded areas indicate design choices that are questionable and unobserved in data.
4.5. Qualitative Evaluation
To show intuitive results, qualitative comparisons are
given in Figure 4. Each color represents one room type (see
color indication in Fig. 5). The shaded boxes indicate ques-
tionable design areas. Specifically, in the first row, Rplan
designs an isolated bathroom at a corner in the first image,
which is a strange design and not observed in the data. In
the second row, the first image by Rplan allocates an unrea-
sonable space (shaded) to the public area, which however
should be assigned to the neighboring bedroom. Similarly,
the layout produced by Graph2Plan consists of 3 bedrooms
(second row), but the bathroom is embedded among them
(shaded), which means people need to pass through one
bedroom to go to the toilet. Besides, the bathroom is too
small. As for the last row, the room area is not properly predicted by Graph2Plan because the region (shaded) at the bottom-left corner is too narrow.
By comparing Our_I, Our_II and Our_III, it is apparent that Our_I is closest to the GT as a result of the strong input constraints. In contrast, when fewer constraints are imposed, the diversity is significant, as depicted by Our_II and Our_III. More results and analysis are shown in the supplementary material.
4.6. User Interactions
The performance of iPLAN under different levels of hu-
man interactions is showcased in Figure 5. Figure 5(a) is
fully automated generation without human input other than
a boundary. iPLAN can generate diverse layouts, with a
varying number of rooms and different room types/areas.
Figure 5(b) shows the user can decide the room types, where
the final layouts are different when the order of room types
varies, showing the flexibility of iPLAN. We further provide iPLAN with more priors in Fig. 5(c), i.e., the boundary, room types and centers, leading to a planned layout that is nearly the same as the ground truth. We also evaluate iPLAN by introducing a user input at an intermediate step (Fig. 5(d)), where the balcony is moved to the left and consequently a different layout is obtained. Figure 5 demonstrates the interactive ability of iPLAN, which is crucial in settings where the designer leads the design.

Dataset | Method | FID_img | FID_area | FID_type
RPLAN | Our_II^8 | 1.2 | 1.36 × 10^4 | 0.06
RPLAN | Our_I^8 | 0.2 | 2.69 × 10^2 | 1.25 × 10^-5
RPLAN | Our_II^7-8 | 2.28 | 4.87 × 10^4 | 0.18
RPLAN | Our_I^7-8 | 0.24 | 6.89 × 10^2 | 1.54 × 10^-5
Table 2. Generalization results on RPLAN. The superscript denotes the number of rooms in the test layouts (8 for the first group, 7–8 for the second; see Sec. 4.7).
4.7. Generalizability
Quantitative results. To further push iPLAN, we quantitatively evaluate its generalizability by setting up two groups of experiments on RPLAN. In the first group, we select 51,322 layouts consisting of 4–7 rooms for training and randomly choose 12,000 8-room layouts for testing. In the second group, we consider a more challenging task by randomly selecting 26,574 layouts containing 4–6 rooms for training and 12,000 layouts containing 7–8 rooms for testing. Note that in both experiments, the layouts used for testing include more rooms than the ones used for training.
Quantitative results are reported in Table 2. Again, we can observe that Our_I outperforms Our_II, which is consistent with our intuition, since more prior knowledge is provided to Our_I. Further, the results in the first group are better than those in the second group, which is reasonable because the generalization setting in the second group is more challenging. However, both groups achieve performance comparable to Table 1 and still outperform the baselines.
Qualitative Results. An interesting test is to see whether iPLAN trained on RPLAN can be generalized to unseen types of boundaries. Qualitative results are also provided to verify the generalizability of iPLAN. Note that RPLAN only contains layouts with straight, axis-aligned edges in the boundaries, so we consider boundaries with non-axis-aligned edges and curves. Since HouseGAN++ cannot design a layout for a specific building boundary and Graph2Plan is restricted to boundaries with axis-aligned edges, we only compare iPLAN with Rplan.
Figure 5. User interactions. (a)–(c) indicate interactions via providing different levels of human input; (d) shows fine-grained interaction in a step-wise generation.
Figure 6. Layouts with non-axis-aligned edges. Left: successful examples. Right: Rplan fails but our model succeeds.
Figure 6 shows several examples of non-axis-aligned designs produced by Rplan and iPLAN. The left lists four cases where both approaches succeed, and the right shows two examples where Rplan fails but iPLAN succeeds. Even when successful, Rplan often predicts isolated rooms which divide the public areas into strange shapes and reduce their usability (the first two examples), while iPLAN utilizes the space better. As for the failures of Rplan, we give two examples of room center estimation and wall prediction, where it fails to reasonably fill and partition the space, a problem from which iPLAN does not suffer. Please refer to the supplementary document for more examples.
5. Limitation & Conclusion
While iPLAN is able to handle some irregular bound-
aries, it cannot cope with extremely irregular ones. One im-
portant future direction would be to handle complex envi-
ronments, e.g., non-axis aligned boundaries and compound
and nested rooms, which is crucial to generalize iPLAN to
public spaces such as train stations and shopping malls.
In this paper, we proposed a novel human-in-the-loop
generative model iPLAN to learn professional designs in
a stage-to-stage fashion while respecting design principles.
While being capable of fully automated generation, iPLAN
allows close interactions with humans by accepting user
guidance at every stage and automatically suggesting possi-
ble designs accordingly. Comprehensive evaluations on two
benchmark datasets RPLAN and LIFULL show that iPLAN
outperforms the state-of-the-art methods, both quantita-
tively and qualitatively. Importantly, iPLAN has exhibited
strong generalization capability to unseen design tasks and
boundary inputs.
Acknowledgements: We thank Jing Li for her input on
the design practice. This project has received funding
from the European Union’s Horizon 2020 research and in-
novation programme under grant agreement No 899739
CrowdDNA and the Marie Skłodowska-Curie grant agree-
ment No 101018395. Feixiang He has been supported by
UKRI PhD studentship [EP/R513258/1, 2218576].
References
[1] Oron Ashual and Lior Wolf. Specifying object attributes
and relations in interactive scene generation. In Proceedings
of the IEEE/CVF International Conference on Computer Vi-
sion, pages 4561–4569, 2019. 2
[2] Fan Bao, Dong-Ming Yan, Niloy J Mitra, and Peter Wonka.
Generating and exploring good building layouts. ACM
Transactions on Graphics (TOG), 32(4):1–10, 2013. 2
[3] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large
scale gan training for high fidelity natural image synthesis.
arXiv preprint arXiv:1809.11096, 2018. 2
[4] Stanislas Chaillou. Archigan: Artificial intelligence x ar-
chitecture. In Architectural intelligence, pages 117–127.
Springer, 2020. 2
[5] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos,
Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image
segmentation with deep convolutional nets, atrous convolu-
tion, and fully connected crfs. IEEE transactions on pattern
analysis and machine intelligence, 40(4):834–848, 2017. 4
[6] Matthew Fisher, Daniel Ritchie, Manolis Savva, Thomas
Funkhouser, and Pat Hanrahan. Example-based synthesis
of 3d object arrangements. ACM Transactions on Graphics
(TOG), 31(6):1–11, 2012. 2
[7] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing
Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. Generative adversarial nets. Advances in
neural information processing systems, 27, 2014. 2
[8] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent
Dumoulin, and Aaron Courville. Improved training of
wasserstein gans. arXiv preprint arXiv:1704.00028, 2017.
5
[9] Mikako Harada, Andrew Witkin, and David Baraff. Inter-
active physically-based manipulation of discrete/continuous
models. In Proceedings of the 22nd annual conference on
Computer graphics and interactive techniques, pages 199–
208, 1995. 2
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
Deep residual learning for image recognition. In Proceed-
ings of the IEEE conference on computer vision and pattern
recognition, pages 770–778, 2016. 4
[11] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner,
Bernhard Nessler, and Sepp Hochreiter. Gans trained by a
two time-scale update rule converge to a local nash equilib-
rium. Advances in neural information processing systems,
30, 2017. 5
[12] Ruizhen Hu, Zeyu Huang, Yuhan Tang, Oliver Van Kaick,
Hao Zhang, and Hui Huang. Graph2plan: Learning floor-
plan generation from layout graphs. ACM Transactions on
Graphics (TOG), 39(4):118–1, 2020. 1,2,3,5,6
[13] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz.
Multimodal unsupervised image-to-image translation. In
Proceedings of the European conference on computer vision
(ECCV), pages 172–189, 2018. 2
[14] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A
Efros. Image-to-image translation with conditional adver-
sarial networks. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 1125–1134,
2017. 2
[15] Justin Johnson, Agrim Gupta, and Li Fei-Fei. Image gener-
ation from scene graphs. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition, pages
1219–1228, 2018. 2
[16] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen.
Progressive growing of gans for improved quality, stability,
and variation. arXiv preprint arXiv:1710.10196, 2017. 2
[17] Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine,
Jaakko Lehtinen, and Timo Aila. Training generative
adversarial networks with limited data. arXiv preprint
arXiv:2006.06676, 2020. 2
[18] Tero Karras, Samuli Laine, and Timo Aila. A style-based
generator architecture for generative adversarial networks.
In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pages 4401–4410, 2019. 2
[19] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten,
Jaakko Lehtinen, and Timo Aila. Analyzing and improv-
ing the image quality of stylegan. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 8110–8119, 2020. 2
[20] Diederik P Kingma and Max Welling. Auto-encoding varia-
tional bayes. arXiv preprint arXiv:1312.6114, 2013. 4
[21] Yikang Li, Tao Ma, Yeqi Bai, Nan Duan, Sining Wei, and
Xiaogang Wang. Pastegan: A semi-parametric method to
generate image from scene graph. Advances in Neural Infor-
mation Processing Systems, 32:3948–3958, 2019. 2
[22] Chen Liu, Jiajun Wu, Pushmeet Kohli, and Yasutaka Fu-
rukawa. Raster-to-vector: Revisiting floorplan transforma-
tion. In Proceedings of the IEEE International Conference
on Computer Vision, pages 2195–2203, 2017. 5
[23] Paul Merrell, Eric Schkufza, Zeyang Li, Maneesh Agrawala,
and Vladlen Koltun. Interactive furniture layout using in-
terior design guidelines. ACM transactions on graphics
(TOG), 30(4):1–10, 2011. 2
[24] Pascal Müller, Peter Wonka, Simon Haegler, Andreas Ulmer, and Luc Van Gool. Procedural modeling of buildings. In ACM SIGGRAPH 2006 Papers, pages 614–623. 2006. 2
[25] Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg
Mori, and Yasutaka Furukawa. House-gan: Relational gener-
ative adversarial networks for graph-constrained house lay-
out generation. In European Conference on Computer Vi-
sion, pages 162–177. Springer, 2020. 1,2,5
[26] Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang,
Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa. House-
gan++: Generative adversarial layout refinement networks.
arXiv preprint arXiv:2103.02574, 2021. 1,2,5,6
[27] National Institute of Informatics. LIFULL HOME’S Dataset,
2020. 1,2,5
[28] Chi-Han Peng, Yong-Liang Yang, and Peter Wonka. Com-
puting layouts with deformable templates. ACM Transac-
tions on Graphics (TOG), 33(4):1–11, 2014. 2
[29] Roberto J Rengel. The interior plan: Concepts and exercises.
A&C Black, 2011. 1,2,3
[30] Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan,
Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding
in style: a stylegan encoder for image-to-image translation.
In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pages 2287–2296, 2021. 2
[31] Daniel Ritchie, Kai Wang, and Yu-an Lin. Fast and flex-
ible indoor scene synthesis via deep convolutional genera-
tive models. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages 6182–
6190, 2019. 2,3
[32] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning
structured output representation using deep conditional gen-
erative models. Advances in neural information processing
systems, 28:3483–3491, 2015. 4
[33] Chun-Yu Sun, Qian-Fang Zou, Xin Tong, and Yang Liu.
Learning adaptive hierarchical cuboid abstractions of 3d
shape collections. ACM Transactions on Graphics (TOG),
38(6):1–13, 2019. 5
[34] Kai Wang, Yu-An Lin, Ben Weissmann, Manolis Savva, An-
gel X Chang, and Daniel Ritchie. Planit: Planning and in-
stantiating indoor scenes with relation graph and spatial prior
networks. ACM Transactions on Graphics (TOG), 38(4):1–
15, 2019. 2
[35] Kai Wang, Manolis Savva, Angel X Chang, and Daniel
Ritchie. Deep convolutional priors for indoor scene syn-
thesis. ACM Transactions on Graphics (TOG), 37(4):1–14,
2018. 2,3
[36] Wenming Wu, Xiao-Ming Fu, Rui Tang, Yuhan Wang, Yu-
Hao Qi, and Ligang Liu. Data-driven interior plan genera-
tion for residential buildings. ACM Transactions on Graph-
ics (SIGGRAPH Asia), 38(6), 2019. 1,2,3,4,5,6
[37] Lap Fai Yu, Sai Kit Yeung, Chi Keung Tang, Demetri Terzopoulos, Tony F Chan, and Stanley J Osher. Make it home: automatic optimization of furniture arrangement. ACM Transactions on Graphics (TOG), 30(4), article no. 86, 2011. 2
[38] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augus-
tus Odena. Self-attention generative adversarial networks. In
International conference on machine learning, pages 7354–
7363. PMLR, 2019. 2
[39] Xi Zhao, Ruizhen Hu, Paul Guerrero, Niloy Mitra, and Taku
Komura. Relationship templates for creating scene varia-
tions. ACM Transactions on Graphics (TOG), 35(6):1–13,
2016. 2
[40] Xi Zhao, He Wang, and Taku Komura. Indexing 3d scenes
using the interaction bisector surface. ACM Trans. Graph.,
33(3):22:1–22:14, June 2014. 2
[41] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A
Efros. Unpaired image-to-image translation using cycle-
consistent adversarial networks. In Proceedings of the IEEE
international conference on computer vision, pages 2223–
2232, 2017. 2
[42] Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Dar-
rell, Alexei A Efros, Oliver Wang, and Eli Shechtman. Multi-
modal image-to-image translation by enforcing bi-cycle con-
sistency. In Advances in neural information processing sys-
tems, pages 465–476, 2017. 2
[43] Peihao Zhu, Rameen Abdal, Yipeng Qin, and Peter Wonka.
Sean: Image synthesis with semantic region-adaptive nor-
malization. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages 5104–
5113, 2020. 2
iPLAN: Interactive and Procedural Layout Planning-Supplementary Material
Feixiang He
University of Leeds, UK
scfh@leeds.ac.uk
Yanlong Huang
University of Leeds, UK
y.l.huang@leeds.ac.uk
He Wang *
University of Leeds, UK
h.e.wang@leeds.ac.uk
1. Architecture of BCVAE
The detailed architecture of BCVAE is illustrated in Tab. 1. For a given layout, $Q = \{q_k\}_{k=1}^{K}$ represents the room types, where $K$ denotes the number of room types in $\mathcal{D}$ and $q_k \in \mathbb{Z}$ corresponds to the number of rooms under the $k$-th type.
Before feeding $Q$ into BCVAE, a reformulation is implemented. We first determine the largest number of rooms for each type $k$ across the whole dataset and denote it as $q_k^{*} \in \mathbb{Z}$. Then, for each $q_k \in Q$ ($q_k \le q_k^{*}$), we transform it into a $q_k^{*}$-dimensional vector $v_k$, whose first $q_k$ elements are set to 1 while the remaining elements are set to 0. By concatenating all transformed vectors, we obtain an alternative representation of $Q$, i.e., $V = [v_1^{T}\ v_2^{T}\ \ldots\ v_K^{T}]^{T}$.
We denote the output of BCVAE as $\hat{V}$ and use the binary cross-entropy as the reconstruction loss:
$$L_{rec} = \sum_{j=1}^{n_c} l_j, \qquad l_j = -\big[v_j \log \hat{v}_j + (1 - v_j)\log(1 - \hat{v}_j)\big], \tag{1}$$
where $n_c = \sum_{k=1}^{K} q_k^{*}$ represents the length of $V$.
The total loss of BCVAE is:
$$L = L_{rec} + \lambda\, D_{KL}\big(\mathcal{N}(\mu, \Sigma)\,\|\,\mathcal{N}(0, I)\big), \tag{2}$$
where $D_{KL}$ denotes the Kullback–Leibler (KL) divergence and $\lambda = 0.5$.
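A short sketch of the $Q \rightarrow V$ encoding and the loss in Eqs. (1)–(2) follows. It assumes a diagonal covariance parameterized by a log-variance output, which is the usual VAE convention; variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def encode_room_counts(q, q_max):
    """Q -> V reformulation: each count q_k becomes a q*_k-dimensional
    binary vector whose first q_k entries are set to 1."""
    parts = []
    for qk, qk_max in zip(q, q_max):
        v = torch.zeros(qk_max)
        v[:qk] = 1.0
        parts.append(v)
    return torch.cat(parts)          # V, length n_c = sum_k q*_k

def bcvae_loss(v_hat, v, mu, logvar, lam=0.5):
    """Eqs. (1)-(2): BCE reconstruction plus a KL term against N(0, I)."""
    rec = F.binary_cross_entropy(v_hat, v, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + lam * kl
```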
2. Post-Processing for Room Partitioning Prediction
After the room partitioning prediction, gaps may sometimes exist between the predicted rooms $\hat{R} = \{\hat{r}_1, \hat{r}_2, \ldots, \hat{r}_N\}$. We employ a simple post-processing step to ensure that the interior area of $B$ is fully covered and the room bounding boxes are located within $B$. We formulate it as a generic optimization problem:
$$\arg\min_{\hat{R}} L = \arg\min_{\hat{R}} \big[ L_{coverage}(\hat{R}, B) + L_{interior}(\hat{R}, B) \big], \tag{3}$$
where $L_{coverage}$ and $L_{interior}$ constrain the spatial consistency between $B$ and the room bounding box set $\hat{R}$.
To explain $L_{coverage}$ and $L_{interior}$ clearly, we introduce a distance function $d(p, r)$ to measure the coverage of a point $p$ by a box $r$:
$$d(p, r) = \begin{cases} 0, & \text{if } p \in in(r) \\ \min_{q \in bd(r)} \|p - q\|, & \text{otherwise,} \end{cases} \tag{4}$$
where $in(r)$ denotes the interior area of the box $r$ and $bd(r)$ represents the boundary of $r$.
The coverage loss can be defined as:
$$L_{coverage}(\hat{R}, B) = \frac{\sum_{p \in in(B)} \min_i d^2(p, \hat{r}_i)}{|in(B)|}, \tag{5}$$
where $|in(B)|$ is the number of pixels in the set $in(B)$.
The interior loss is defined as:
$$L_{interior}(\hat{R}, B) = \frac{\sum_i \sum_{p \in in(\hat{r}_i)} d^2(p, \hat{B})}{\sum_i |in(\hat{r}_i)|}, \tag{6}$$
where $\hat{B}$ is the bounding box of the boundary. Note that $B \subseteq \hat{B}$.
Therefore, in the inference stage, we directly adjust the predicted rooms $\hat{R} = \{\hat{r}_1, \hat{r}_2, \ldots, \hat{r}_N\}$ by minimizing the loss $L$ in Eq. (3).
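The distance in Eq. (4) and the coverage term in Eq. (5) can be sketched as below; the interior term in Eq. (6) follows the same pattern with $d(p, \hat{B})$. The (x0, y0, x1, y1) box convention and the vectorized form are our assumptions.

```python
import numpy as np

def point_box_distance(points, box):
    """d(p, r) in Eq. (4): 0 inside the box, otherwise distance to its boundary.
    points: (M, 2) array of (x, y); box: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    zeros = np.zeros(len(points))
    dx = np.maximum.reduce([x0 - points[:, 0], zeros, points[:, 0] - x1])
    dy = np.maximum.reduce([y0 - points[:, 1], zeros, points[:, 1] - y1])
    return np.hypot(dx, dy)

def coverage_loss(boxes, interior_points):
    """L_coverage in Eq. (5): mean squared distance from each interior pixel
    of B to the nearest predicted room box."""
    d = np.stack([point_box_distance(interior_points, b) for b in boxes])  # (N, M)
    return float((d.min(axis=0) ** 2).mean())
```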
3. Additional Qualitative Comparisons
Fig. 1 and Fig. 2 show qualitative results on RPLAN and LIFULL, respectively. In both datasets, Graph2Plan and Our_I are provided with the full human input (including the boundary, room types and room locations), so their generated layouts are expected to be similar to the GT. While this is the case for Our_I, it does not seem to be so for Graph2Plan. The shaded areas on the layouts produced by Graph2Plan show clear differences from the GT layouts. In contrast, the layouts from Our_I are nearly the same as the GT.
In addition, we also compare iPLAN with other methods on the more challenging dataset, LIFULL. When only the house boundary is provided, Our_III outperforms Rplan. Our_II corresponds to the case where the house boundary and the room types are known, and it achieves slightly better predictions than Our_III as more information is fed in. Furthermore, if the full human input is provided, Our_I performs better than Our_II. Note that Our_I is superior to Graph2Plan, which is also fed with the full human input.

Architecture | Layer | Specification | Output Size
embedding network | conv_bn_relu1 | 1 × 16 × 4 × 4 (s = 2, p = 1) | 64 × 64 × 16
embedding network | conv_bn_relu2 | 16 × 16 × 4 × 4 (s = 2, p = 1) | 32 × 32 × 16
embedding network | conv_bn_relu3 | 16 × 32 × 4 × 4 (s = 2, p = 1) | 16 × 16 × 32
embedding network | conv_bn_relu4 | 32 × 32 × 4 × 4 (s = 2, p = 1) | 8 × 8 × 32
embedding network | conv_bn_relu5 | 32 × 16 × 4 × 4 (s = 2, p = 1) | 4 × 4 × 16
embedding network | conv_bn_relu6 | 16 × 16 × 4 × 4 (s = 2, p = 1) | 2 × 2 × 16
embedding network | flatten | N/A | 1 × 64
encoder | concat | N/A | 1 × (n_c + 64)
encoder | linear_relu1 | (n_c + 64) × 128 | 1 × 128
encoder | linear_relu2 | 128 × 64 | 1 × 64
encoder | linear31 | 64 × 32 | 1 × 32
encoder | linear32 | 64 × 32 | 1 × 32
decoder | concat | N/A | 1 × 96
decoder | linear_relu1 | 96 × 96 | 1 × 96
decoder | linear_relu2 | 96 × 64 | 1 × 64
decoder | linear3 | 64 × n_c | 1 × n_c
decoder | sigmoid | N/A | 1 × n_c
Table 1. The BCVAE architectural specification. s and p respectively denote stride and padding; n_c is the dimension of the house-type vector V. Convolution kernels and layer outputs are specified by (N_in × N_out × W × H) and (W × H × C), respectively.
Figure 1. Qualitative comparisons on RPLAN. Shaded areas indicate design choices that are questionable.
4. Additional Generalization Evaluations
More generalization results on RPLAN are presented in Fig. 3. The first row and second row correspond to Rplan and Our_III, respectively. Our_III achieves better results. Consistent with our analysis in the main paper, Rplan is prone to splitting the public area into two main areas, causing potential inconvenience for family activities (the first two columns in Fig. 3). Sometimes, Rplan also fails to plan a bathroom in the layout (the third column in Fig. 3). Moreover, on some boundaries, Rplan fails to design the layouts (the last three columns in Fig. 3). In general, Our_III outperforms Rplan when the boundary is non-axis-aligned.
Figure 2. Qualitative comparisons on LIFULL. Shaded areas indicate design choices that are questionable.
Figure 3. Layouts with non-axis aligned edges. Left: successful examples. Right: Rplan fails but our model succeeds.
5. Implementation Details
We have implemented iPLAN in PyTorch. All models are trained and tested on an NVIDIA GeForce RTX 2080
Ti. It takes about two hours to train BCVAE, two days to
optimize the room-locating network and one day to train
the room area prediction model.