A Pig, an Angel and a Cactus Walk Into a Blender:
A Descriptive Approach to Visual Blending
João M. Cunha, João Gonçalves, Pedro Martins, Penousal Machado, Amílcar Cardoso
CISUC, Department of Informatics Engineering
University of Coimbra
Abstract

A descriptive approach for automatic generation of visual
blends is presented. The implemented system, the Blender,
is composed of two components: the Mapper and the Visual
Blender. The approach uses structured visual representations
along with sets of visual relations which describe how the el-
ements – in which the visual representation can be decom-
posed – relate among each other. Our system is a hybrid
blender, as the blending process starts at the Mapper (concep-
tual level) and ends at the Visual Blender (visual representa-
tion level). The experimental results show that the Blender is
able to create analogies from input mental spaces and produce
well-composed blends, which follow the rules imposed by its
base-analogy and its relations. The resulting blends are visu-
ally interesting and some can be considered as unexpected.
Introduction

Conceptual Blending (CB) theory is a cognitive framework
proposed by Fauconnier and Turner (2002) as an attempt to
explain the creation of meaning and insight. CB consists in
integrating two or more mental spaces in order to produce
a new one, the blend(ed) space. Here, mental space means
a temporary knowledge structure created for the purpose of
local understanding (Fauconnier 1994).
Visual blending, which draws inspiration from CB the-
ory, is a relatively common technique used in Computa-
tional Creativity to generate creative artefacts in the visual
domain. While some of the works are explicitly based on
Conceptual Blending theory, as blending occurs at a con-
ceptual level, other approaches generate blends only at a rep-
resentation/instance level by means of, for example, image
processing techniques.
We present a system for automatic generation of visual
blends (Blender), which is divided into two different parts:
the Mapper and the Visual Blender. We follow a descriptive
approach in which a visual representation for a given con-
cept is constructed as a well-structured object (from here
onwards when we use the term representation we are re-
ferring to visual representations). The object can contain
other objects and has a list of descriptive relations, which
describe how the object relates to others. The relations de-
scribe how the representation is constructed (example: part
A inside part B). In our opinion, this approach allows an eas-
ier blending process and contributes to the overall sense of
cohesion among the parts.

Figure 1: Examples of produced blends.
Our system can be seen as a hybrid blender, as the blend-
ing process starts at the conceptual level (which occurs in
the Mapper) and only ends at the visual representation level
(which occurs in the Visual Blender). We use an evolution-
ary engine based on a Genetic Algorithm, in which each
population corresponds to a different analogy and each indi-
vidual is a visual blend. The evolution is guided by a fitness
function that assesses the quality of each blend based on the
satisfied relations. In the scope of this work, the focus is
given to the Visual Blender.
Related Work
In terms of the type of rendering, current computational ap-
proaches to visual blending can be divided into two groups:
the ones which attempt to blend pictures or photorealistic
renderings; and the ones that focus on non-photorealistic
representations, such as pictograms or icons.
The Boat-House Visual Blending Experience (Pereira and
Cardoso 2002) is, to the best of our knowledge, one of the
earliest attempts to computationally produce visual blends.
The work was motivated by the need to interpret and visu-
alize blends produced by a preliminary version of the Di-
vago framework, which is one of the first artificial creative
systems based on CB theory (Pereira 2007). In addition to
a declarative description of the concepts via rules and con-
cept maps (i.e., graphs representing binary relations between
concepts), Pereira and Cardoso also considered a domain of
instances, which were drawn using a Logo-like program-
ming language. To test the system, the authors performed
several experiments with the house and boat blend (Goguen
1999) considering different instances for the input spaces.
Ribeiro et al. (2003) explored the use of the Divago
framework in procedural content generation. In this work,
the role of Divago was to produce novel creatures at a con-
ceptual level from a set of existing ones. Then, a 3D in-
terpreter was used to visualize the objects. The interpreter
was able to convert concept maps from Divago, representing
creatures, into Wavefront OBJ files that could be rendered.

Steinbrück (2013) introduced a framework that formalises
the process of CB while applying it to the visual domain.
The framework is composed of five modules that com-
bine image processing techniques with gathering semantic
knowledge about the concept depicted in an image with the
help of ontologies. Elements of the image are replaced with
other unexpected elements of similar shape (for example,
round medical tablets are replaced with pictures of a globe).
Confalonieri et al. (2015) proposed a discursive approach
to evaluate the quality of blends (although there is no ev-
idence of an implementation). The main idea was to use
Lakatosian argumentative dialogue (Lakatos 1976) to itera-
tively construct valuable and novel blends as opposed to a
strictly combinatorial approach. To exemplify the argumen-
tative approach, the authors focused on icon design by in-
troducing a semiotic system for modelling computer icons.
Since icons can be considered as a combination of signs that
can convey multiple intended meanings to the icon, Con-
falonieri et al. proposed argumentation to evaluate and refine
the quality of the icons.
Xiao and Linkola (2015) proposed Vismantic, a semi-
automatic system aimed at producing visual compositions
to express specific meanings, namely the ones of abstract
concepts. Their system is based on three binary image oper-
ations (juxtaposition, replacement and fusion), which are the
basic operations to represent visual metaphors (Phillips and
McQuarrie 2004). For example, Vismantic represents the
slogan Electricity is green as an image of an electric light
bulb where the wire filament and screw base are fused with
an image of green leaves. The selection of images as well as
the application of the visual operations require the user's
interaction.

Correia et al. (2016) proposed X-Faces, which can be
seen as a data augmentation technique to autonomously gen-
erate new faces out of existing ones. Elementary parts of the
faces, such as eyes, nose or mouth, are recombined by means
of evolutionary algorithms and computer vision techniques.
The X-Faces framework generates unexpected, yet realis-
tic, faces by exploring the shortcomings and vulnerabilities
of computational face detectors to promote the evolution of
faces that are not recognised as such by these systems.
Recent works such as DeepStyle (Gatys, Ecker, and
Bethge 2015) can also be seen as a form of visual blend-
ing. DeepStyle is based on a deep neural network that has
the ability to separate image content from certain aspects of
style, allowing to recombine the content of an arbitrary im-
age with a given rendering style (style transfer). The system
is known for mimicking features of different painting styles.
Several other authors have seen the potential of deep
neural networks for tasks related to visual blending (Berov
and Kühnberger 2016; McCaig, DiPaola, and Gabora 2016;
Heath and Ventura 2016). For instance, Berov and
Kühnberger (2016) proposed a computational model of vi-
sual hallucination based on deep neural networks. To some
extent, the creations of this system can be seen as visual
blends.

The approach
Having the organization of mental spaces as an inspiration,
we follow a similar approach to structure the construction of
the visual representations, which are considered as a group
of several parts / elements. By focusing on the parts instead
of the whole, something extra stands out: not only are the
parts given importance, but the representation also ceases
to be a whole and starts to be seen as parts related to
each other. As our goal is to produce visual results, these
relations have a visual descriptive nature (i.e. the nature of
the relation between two elements is either related to their
relative position or to their visual qualities). This allows the
generation of visual blends, guided and evaluated by criteria
imposed by the relations present in the base-representations
(see Fig.3) used in the visual blend production.
In addition, by using a representation style that consists
of basic shapes, we reduce the concept to its simplest form,
maintaining its most important features and thus, hopefully,
capturing its essence (a similar process can be seen in Pi-
casso’s The Bull, a set of eleven lithographs produced in
1945). As such, our approach can be classified as belonging
to the group of non-photorealistic visual blending. This sim-
plification of concepts is inspired by several attempts to
produce a universal language understandable by everyone –
such as the pictographic ISOTYPE by Otto Neurath (1936)
or the symbolic Blissymbolics by Charles Bliss (1965).
As already mentioned, our main idea is centered on the
fact that the construction of a visual representation for a
given concept can be approached in a structured way. Each
representation is associated with a list of descriptive rela-
tions (e.g.: part A below part B), which describes how the
representation is constructed. Due to this, a visual blend
between two representations is not simply a replacement of
parts but its quality is assessed based on the number of re-
lations that are respected. This gives much more flexibility
to the construction of representations by presenting a ver-
sion of it and also allowing the generation of similar ones, if
needed.

The initial idea involved only a representation for each
concept. However, a given concept has several possible vi-
sual representations (e.g. there are several possible ways
of visually representing the concept car), which means that
only using one would make the system very limited.
In order to avoid biased results, we decided to use several
versions for each concept. Each visual representation can
be different (varying in terms of style, complexity, number
of characteristics and even chosen perspective) and thus also
have a different set of visual relations among the parts.
Figure 2: On the left is the representation drawn with the
elements identified; on the right is the result of the conver-
sion into a fully scalable vector graphic.
In comparison to the systems described in the previous
Section, we follow a different approach to the generation of
visual blends by implementing a hybrid system and giving
great importance to the parts and their relations – such tends
to be overlooked by the majority of the reviewed works in
which an unguided replacement of parts often leads to a lack
of cohesion among them. This approach allows us not only
to assess the quality of the blends and guide evolution but
also to easily generate similar (and also valid) blends based
on a set of relations.
Collecting data
The initial phase of the project consisted in a process of data
collection. Firstly, a list of possible concepts was produced
by collecting concepts already used in the conceptual blend-
ing field of research. From this list, three concepts were
selected based on their characteristics: angel (human-like),
pig (animal) and cactus (plant) – collected from Keane and
Costello (2001). The goal of this phase was to collect vi-
sual representations for these concepts. An enquiry to col-
lect the desired data was designed, which was composed of
five tasks:
T1 Collection of visual representations for the selected con-
cepts;
T2 Identification of the representational elements;
T3 Description of the relations among the identified ele-
ments;
T4 Identification of the prototypical elements – i.e. the
element(s) that most identify a given concept (Johnson
1985). For instance, for the concept pig most participants
considered nose and tail as the prototypical elements;
T5 Collection of visual blends for the selected concepts.
The data was collected from nine participants who were
asked to complete the required tasks. In the first task (T1),
the participants were asked to draw a representation for each
concept avoiding unnecessary complexity but still represent-
ing the most important elements of the concept. In order to
achieve intelligible and relatively simple representations,
the participants were advised to use primitives such as lines,
ellipses, triangles and quadrilaterals as the basis for their
drawings. After completing the first version, a second one
was requested. The reason for two versions was to promote
diversity.

Figure 3: Representations used as a base.
In the second task (T2), the participants identified the el-
ements drawn using their own terms (for example, for the
concept angel some of the identified elements were head,
halo and wings).

After completing the previous task, the participants were
asked to identify the relations among elements that they con-
sidered as being essential (T3). These relations
were not only related to the conceptual space but also (and
mostly) to the representation. In order to help the partici-
pants, a list of relations was provided. Despite being told
that the list was only to be considered as an example and not
to be seen as closed, all the participants used the relations
provided – this ensured the semantic sharing between par-
ticipants. Some participants suggested other relations that
were not on the list – these contributions were well-received.
The identified relations are dependent on the author’s in-
terpretation of the concept, which can be divided into two
levels. The first level is related to how the author interprets
the connections among the concepts of the parts at a con-
ceptual level (for example car, wheel or trunk). The second
level is related to the visual representation being considered:
different visual representations may have different relations
among the same parts (this can be caused, for example, by
the change of perspective or style) – e.g. the different posi-
tioning of the head in the two pig representations in Fig.3.
Task four (T4) consisted in identifying the prototypical
parts of the representations – the parts which most identify
the concept (Johnson 1985). These will be used for inter-
preting the results obtained and for posterior developments.
In the last task of the enquiry (T5), the participants were
asked to draw representations for the blends between the
three concepts. A blend between two concepts can be in-
terpreted and posteriorly represented in different ways (e.g.
just at a naming level a blend between pig and cactus can
be differently interpreted depending on its name being pig-
cactus or cactus-pig). For this reason, the participants were
asked to draw one or more visual representations for the
blend. These visual representations were later used for com-
paring with the results obtained with the Visual Blender.
Figure 4: Structure of the implemented Blender. The
Blender consists of a Mapper and a Visual Blender. The fig-
ure also shows the input spaces (1), the visual representa-
tions and list of relations (2), the produced analogies (3) and
the produced blends (4).
After the enquiry was conducted, the data was prepared so
that it could be used by the Visual Blender. Firstly, the repre-
sentations collected for each of the concepts were converted
into fully scalable vector graphics (see Fig. 2) and prepared
to be used as base visual representations (see Fig.3) for the
Visual Blender (using layer naming according to the data
collected for each representation – each layer was named af-
ter its identified part). In addition to this, the relations among
parts were formatted to be used as input together with their
corresponding representation.
The Visual Blender
As already mentioned, the Blender has two different com-
ponents: the Mapper and the Visual Blender (see Fig.4).
The Mapper receives two input spaces (represented as 1 in
Fig.4), one referring to concept A and the other one to con-
cept B. It produces analogies (3 in Fig.4) that are afterwards
used by the Visual Blender component. The Visual Blender
also receives visual representations and corresponding list of
relations among parts (2 in Fig.4) that are used as a base and
data for producing the visual blends (4 in Fig.4).
As this paper is focused on the Visual Blender component,
the Mapper is only briefly described (subsection Generating
the blends: structural mapping). Despite being related, the
two components have different implementation details (e.g.
object structure).
Generating the blends: structural mapping
In Conceptual Blending theory, after the selection of input
spaces, the subsequent step is to perform a partial matching
between elements of the given mental spaces. This can be
seen as establishing an analogy between the two inputs. Our
input spaces are in the form of semantic maps composed of
Nc concepts and Nt triples, with Nt, Nc ∈ ℕ. The triples
are in the form <concept0, relation, concept1>. Each con-
cept corresponds to a vertex in a generic graph and the rela-
tion represents a directed edge connecting both concepts.
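As a concrete illustration of this input format (the variable and function names below are ours, not the authors'), a semantic map of this kind can be held as a list of triples and indexed as a directed labelled graph:

```python
# Minimal sketch (our own naming) of an input space as a set of
# <concept0, relation, concept1> triples forming a directed graph.
from collections import defaultdict

def build_graph(triples):
    """Map each concept (vertex) to its outgoing (relation, concept) edges."""
    graph = defaultdict(list)
    for c0, rel, c1 in triples:
        graph[c0].append((rel, c1))
    return graph

# A tiny hypothetical input space for the concept pig.
pig_space = [
    ("pig", "pw", "head"),     # pw = part-whole, one of the relation types
    ("pig", "pw", "tail"),
    ("pig", "isa", "animal"),
]
graph = build_graph(pig_space)
```

Here each concept is a vertex and each relation a directed, labelled edge, matching the description above.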
The Mapper iterates through all possible root mappings,
each composed of two distinct concepts taken from the in-
put spaces. This means that there is a total of Nc² possible
combinations. Then, the algorithm extracts two isomorphic sub-
graphs from the larger input space. The two sub-graphs
are split in two sets of vertices A (left) and B (right). The
structural isomorphism is defined by the sequence of relation
types (pw, isa,...) found in both sub-graphs.
Starting at the root mapping defined by two (left and right)
concepts, the isomorphic sub-graphs are extracted from the
larger semantic structure (the input spaces) by executing two
synchronised expansions of nearby concepts at increasing
depths. The first expansion starts from the left concept and
the second from the right concept. The left expansion is
done recursively in the form of a depth first expansion and
the right as a breadth first expansion. The synchronisation is
controlled by two mechanisms:
1. the depth of the expansion, which is related to the number
of relations reached by each expansion, starting at either
concept from the root mapping;
2. the label used for selecting the same relation to be ex-
panded next in both sub-graphs.
Both left (depth) and right (breadth) expansions are al-
ways synchronized at the same level of deepness (first mech-
anism above).
While expanding, the algorithm stores additional associa-
tions between each matched relation and the corresponding
concept which was reached through that relation. In prac-
tice, a multitude of isomorphisms is likely to occur. In that
case, the algorithm will store various map-
pings from any given concept to multiple different concepts,
as long as the same concepts were reached from a previous
concept with the same relation. In the end, each isomor-
phism and corresponding set of concept mappings gives rise
to an analogy. The output of the Mapper component is a list
of analogies with the greatest number of mappings.
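The mapping idea can be sketched as follows. This is a heavily simplified stand-in for the Mapper, not its actual implementation: both sides expand breadth-first here, whereas the paper pairs a depth-first left expansion with a breadth-first right one, and the synchronisation mechanisms are reduced to matching relation labels at equal depth.

```python
# Much-simplified sketch: starting from a root mapping, expand both
# concepts in lockstep and map concepts reached through relations with
# the same label. One concept may map to multiple concepts, as in the
# paper's description of coexisting isomorphisms.
from collections import deque

def find_mappings(graph, left_root, right_root, max_depth=3):
    mappings = {left_root: {right_root}}
    queue = deque([(left_root, right_root, 0)])
    while queue:
        left, right, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for rel_l, left_next in graph.get(left, []):
            for rel_r, right_next in graph.get(right, []):
                if rel_l == rel_r:  # same relation label in both sub-graphs
                    mappings.setdefault(left_next, set()).add(right_next)
                    queue.append((left_next, right_next, depth + 1))
    return mappings

# Hypothetical miniature input spaces joined in one graph.
g = {"pig": [("pw", "leg"), ("pw", "tail")], "cactus": [("pw", "arm")]}
analogy = find_mappings(g, "pig", "cactus")
# analogy: {'pig': {'cactus'}, 'leg': {'arm'}, 'tail': {'arm'}}
```

Note how both leg and tail end up mapped to arm: when several isomorphisms exist, each stored set of concept mappings can give rise to a different analogy.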
Generating the blends: construction and relations
The Visual Blender component uses structured base-
representations (of the input concepts) along with their set
of relations among parts to produce visual blends based on
analogies (mappings) produced by the Mapper component.
The way of structuring the representations is based on
the Syntactic decomposition of graphic representations pro-
posed by von Engelhardt (2002) in which a composite
graphic object consists of: a graphic space (occupied by the
object); a set of graphic objects (which may also be com-
posite graphic objects); and a set of graphic relations (which
may be object-to-space and/or object-to-object).
The objects store several attributes: name, shape, posi-
tion relative to the father-object (which has the object in the
set of graphic objects), the set of relations to other objects
and the set of child-objects. By having such a structure, the
complexity of blending two base representations is reduced,
as it facilitates object exchange and recursive changing (by
moving an object, the child-objects are also easily moved).
A relation between two objects consists of: the object A,
the object B and the type of relation (above, lowerPart, in-
side, ...) – e.g. eye (A) inside head (B).
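The composite-object structure just described can be sketched as a small class (class and attribute names are our own; the authors do not publish their object model):

```python
# Sketch of the composite graphic object: name, shape, position relative
# to the father-object, a set of child-objects and a set of relations.
class VisualObject:
    def __init__(self, name, shape, rel_pos=(0.0, 0.0)):
        self.name = name
        self.shape = shape        # e.g. an ellipse or polygon primitive
        self.rel_pos = rel_pos    # position relative to the father-object
        self.children = []        # child-objects move along with this object
        self.relations = []       # (relation_type, other_object) pairs

    def add_child(self, child):
        self.children.append(child)

    def relate(self, relation_type, other):
        # e.g. eye.relate("inside", head) encodes "eye (A) inside head (B)"
        self.relations.append((relation_type, other))

head = VisualObject("head", "ellipse")
eye = VisualObject("eye", "ellipse", rel_pos=(0.2, 0.1))
head.add_child(eye)
eye.relate("inside", head)
```

Because positions are stored relative to the father-object, moving an object implicitly moves its child-objects, which is exactly what makes the recursive changes mentioned above cheap.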
Generating the blends: visual blending
The Visual Blender receives the analogies between two
given concepts produced by the Mapper component and the
blend step occurs during the production of the visual rep-
resentation – differently from what happens in The Boat-
House Visual Blending Experience (Pereira and Cardoso
2002), in which the blends are merely interpreted at the vi-
sual representation level.
The part of the blending process that occurs at the Visual
Blender produces visual representations as output and con-
sists of five steps:
S1 An analogy is selected from the set of analogies pro-
vided by the Mapper;
S2 One of the concepts (either A or B) is chosen as a base
(consider A as the chosen one, as an example);
S3 A visual representation (rA) is chosen for the concept A
and a visual representation (rB) is chosen for the concept
B;
S4 Parts of rA are replaced by parts of rB based on the anal-
ogy. For each mapping of the analogy – consider for ex-
ample leg of Acorresponds to arm of B– the following
steps occur:
S4.1 The parts from rA that correspond to the element in
the mapping (e.g. leg) are searched using the names
of the objects. In the current example, the parts found
could be left leg (left is a prefix), right leg 1 (right is
a prefix and 1 a suffix) or even left front leg;
S4.2 For each of the found parts in S4.1, a matching part
is searched in rB using the names of the objects. This
search firstly looks for objects that match the full name,
including the prefix and suffix (e.g. right arm 1) and, if
none is found, searches only using the name in the map-
ping (e.g. arm). It avoids plural objects (e.g. arms). If
no part is found, it proceeds to step S4.4;
S4.3 The found part (pA) of rA is replaced by the match-
ing part (pB) of rB, updating the relative positions of
pB and its child-objects, and relations (i.e. relations
that used to belong to pA now point to pB);
S4.4 A process of Composition occurs (see examples in
Fig.5 – the tail and the belly / round shape in the tri-
angular body are obtained using composition). For each
of the matching parts from rB (even if the replacement
does not occur) a search is done for parts from rB that
have a relation with pB (for example, a found part could
be hand). It only accepts a part if rA does not have a
part with the same name and if the analogy used does
not have a mapping for it. If a found part matches these
criteria, a composition can occur by copying the part
to rA (in our example, depending on whether the replace-
ment in Step S4.3 occurred or not, rA would have either
hand related to arm or to leg, respectively);

Figure 5: The “face expressions” of the angel-pigs – given
the same or similar rules, the produced results are still quite
diverse. The tail and the belly / round shape in the triangular
body are obtained through a process of composition (S4.4).
S5 The rA resulting from the previous steps is checked for
inconsistencies (both in terms of relative positioning and
obsolete relations – which can happen if an object does
not exist anymore due to a replacement);
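The name matching of steps S4.1 and S4.2 can be sketched as follows (helper names are ours, and we assume names are whitespace-separated tokens; the paper does not specify the exact matching routine):

```python
# Sketch of prefix/suffix-tolerant part matching (S4.1-S4.2).
def matches_element(part_name, element):
    """S4.1: 'left leg 1' and 'right leg' both correspond to element 'leg'.

    Token-based matching also avoids plurals: 'arms' does not match 'arm'.
    """
    return element in part_name.split()

def find_matching_part(part_a, element_a, element_b, part_names_b):
    """S4.2: search rB first for the full name with prefix and suffix kept
    (e.g. 'right leg 1' -> 'right arm 1'), then for the bare mapped name.
    Returns None if nothing matches, in which case step S4.4 still runs."""
    full = " ".join(element_b if tok == element_a else tok
                    for tok in part_a.split())
    if full in part_names_b:
        return full
    for name in part_names_b:
        if matches_element(name, element_b):
            return name
    return None
```

For the mapping leg → arm, `find_matching_part("right leg 1", "leg", "arm", ["right arm 1", "arms"])` returns `"right arm 1"`, while with only `["arms"]` available it returns `None` and no replacement occurs.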
After generating a representation, the similarity to the
base representations (rA and rB) is assessed to avoid pro-
ducing representations visually equal to them. This assess-
ment is done by using a Root Mean Square Error (RMSE)
measure that checks the similarity on a pixel-by-pixel basis.
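A pixel-wise RMSE check of this kind can be sketched as follows (the threshold value is our own assumption; the paper does not state one):

```python
# Sketch of the pixel-by-pixel RMSE similarity check used to reject
# blends that are visually equal to a base representation.
import math

def rmse(pixels_a, pixels_b):
    """Root Mean Square Error over two equal-length pixel sequences."""
    assert len(pixels_a) == len(pixels_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b))
                     / len(pixels_a))

def too_similar(blend_pixels, base_pixels, threshold=1.0):
    # a near-zero RMSE means the blend is (almost) a copy of the base
    return rmse(blend_pixels, base_pixels) < threshold
```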
Evolutionary Engine
The main goal of the Visual Blender component is to pro-
duce and evolve possible visual blends based on the analo-
gies produced by the Mapper. In order to achieve this and
promote diversity while respecting each analogy, an evolu-
tionary engine was implemented. This engine is based on
a Genetic Algorithm (GA) using several populations (each
corresponding to a different analogy), in which each indi-
vidual is a visual blend.
In order to guide evolution, we adopt a fitness function
that assesses how well the existing relations are re-
spected. Some of the relations, e.g. the relation above, have
a binary assessment – either 0, when the relation is not re-
spected, or 1 when it is respected. Others yield a value be-
tween 0 and 1 depending on how respected it is – e.g. the
relation inside calculates the number of points that are inside
and returns #PointsInside / #TotalPoints.
The fitness function for a given visual blend b is as fol-
lows:

f(b) = (1 / #R(b)) × Σ v(r), summed over the relations r present in b,

where #R(b) denotes the number of relations present in b
and v is the function with values in [0, 1] that indicates how
much a relation r is respected (0 – not respected at all, 1 –
fully respected).
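The fitness computation can be sketched as follows. The two relation-scoring helpers are simplified stand-ins of our own (we assume screen coordinates with y growing downwards, so "above" means smaller y); only the averaging mirrors the formula directly:

```python
# Sketch of the fitness: the mean satisfaction v(r) in [0, 1] over all
# relations present in blend b. Relation scoring is heavily simplified.
def v_above(a_points, b_points):
    """Binary relation: 1 if A is entirely above B (y grows downwards)."""
    return 1.0 if max(y for _, y in a_points) < min(y for _, y in b_points) else 0.0

def v_inside(a_points, containment_test):
    """Graded relation: fraction of A's points that lie inside B."""
    inside = sum(1 for p in a_points if containment_test(p))
    return inside / len(a_points)

def fitness(relation_scores):
    """f(b) = (1 / #R(b)) * sum of v(r) over the relations of b."""
    return sum(relation_scores) / len(relation_scores)
```

A blend with one fully satisfied and two half-satisfied relations thus scores (1 + 0.5 + 0.5) / 3 = 2/3.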
The evolutionary engine includes five tasks which are per-
formed in each generation for each population:
T1 Produce more individuals when the population size is
below the maximum size;
T2 Store the best individual to avoid losing it (elitism);
T3 Mutate the individuals of the population. For each indi-
vidual, each object can be mutated by changing its posi-
tion. This change also affects its child-objects;
T4 Recombine the individuals: the parents are chosen using
tournament selection (with size 2) and a N-point crossover
is used to produce the children. In order to avoid the gen-
eration of invalid individuals, the crossover only occurs
between chromosomes (objects) with the same name (e.g.
a head is only exchanged with a head). If this rule were not
used, it would lead to the production of descendants that
would not respect the analogy followed by the population;
T5 Removal of identical individuals in order to increase
diversity.
In the experiments reported in this paper the mutation
probability was set to 0.05, per gene, and the recombina-
tion probability to 0.2, per individual. These values were
established empirically in preliminary runs.
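One generation of the per-population engine can be sketched as below. This is our own highly simplified stand-in: an individual is reduced to a dict mapping object names to positions, the fitness is passed in as a function, and the variation operators only illustrate tasks T1-T5 (mutation jitters positions, crossover only mixes genes of objects with the same name, so the analogy is preserved).

```python
# Sketch of one generation of the per-population GA (representation and
# operators heavily simplified; parameter defaults follow the paper).
import random

def one_generation(pop, fitness, max_size, mut_prob=0.05, rec_prob=0.2):
    # T1: refill the population up to its maximum size
    while len(pop) < max_size:
        pop.append(random.choice(pop).copy())
    # T2: elitism - keep a copy of the best individual untouched
    best = max(pop, key=fitness).copy()
    # T3: mutation, per gene (object position); a real implementation
    # would also move the object's child-objects along with it
    for ind in pop:
        for name in ind:
            if random.random() < mut_prob:
                x, y = ind[name]
                ind[name] = (x + random.uniform(-5, 5), y + random.uniform(-5, 5))
    # T4: recombination, per individual: tournament selection (size 2);
    # genes only cross between objects with the same name
    if random.random() < rec_prob and len(pop) >= 2:
        parents = [max(random.sample(pop, 2), key=fitness) for _ in range(2)]
        child = {name: random.choice([parents[0][name], parents[1][name]])
                 for name in parents[0]}
        pop.append(child)
    # T5: remove identical individuals to preserve diversity
    unique = list({tuple(sorted(ind.items())): ind for ind in pop}.values())
    return [best] + unique
```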
Results and discussion
In this section we present and discuss the experimental re-
sults. We begin with a general analysis. Afterwards, we
analyse the resulting visual representations comparing them
with the data collected in the initial enquiry. Then, we anal-
yse the quality of the produced blends by presenting the re-
sults of a final enquiry focused on perception.
Overall, the analysis of the experimental results indicates
that the implemented blender is able to produce sets of
blends with great variability (see Fig.5 for an example of
the results obtained for the same analogy and the same rela-
tions) and unexpected features, while respecting the analogy.
The evolutionary engine is capable of evolving the blends
towards a higher number of satisfied relations. This is veri-
fiable in numerical terms, through the analysis of the evolu-
tion of fitness, and also through the visual assessment of the
results. Figure 6 illustrates the evolution of a blend: the legs
and tail are iteratively moved towards the body in order to
increase the degree of satisfaction of the relations.
We can also observe that the system tends to produce
blends in which few parts are exchanged between concepts.
This can be explained as follows: when the number of ex-
changed parts increases, the difficulty of (randomly) produc-
ing a blend with adequate fitness drastically increases. As
such, blends with
fewer exchanges of parts, thus closer to base representa-
tion (in which all the relations are satisfied), tend to become
dominant during the initial generations of the evolutionary
runs. We consider that a significantly higher number of runs
would be necessary to produce blends with more exchanges.
Furthermore, valuing the exchange of parts, through the
modification of the fitness function, may also be advisable
for promoting the emergence of such blends.
As the blends are being produced as a visual representa-
tion which works as a whole as well as a set of individual
parts, the Principle of Integration is being respected by de-
sign – from the Optimality Principles presented by Faucon-
nier and Turner (1998).

Figure 6: Evolution of a blend: the legs and tail come closer
to the body, guided by the fitness function.

Figure 7: Comparison between hand-drawn blends and
blends generated by the implemented Blender, organised
by groups: group 1 corresponds to pig-cactus blends; 2
corresponds to angel-cactus; groups 3-5 correspond to pig-
angel (the figure on the left of each group is the hand-drawn
one).
Comparison with user-drawn blends
During the initial phase of the project, we conducted a task
of collecting visual blends drawn by the participants. A to-
tal of 39 drawn blends were collected, from which 14 corre-
spond to the blend between cactus and angel, 12 correspond
to the blend between cactus and pig and 13 correspond to
the blend between pig and angel. The implemented blender
was able to produce visual blends similar to the ones drawn
by the participants (see some examples in Fig. 7). After
analysing the produced blends, the following results were
obtained:
23 from the 39 drawn blends (DB) were produced by our
Blender;
2 are not possible to be produced due to inconsisten-
cies (e.g. one drawn blend from angel-pig used a map-
ping from wing-tail and at the same time maintained the
wings);
6 were not able to be produced in the current version due
to mappings that were not produced by the Mapper (e.g.
head from angel with body from cactus);
5 were not able to be produced because not all of the
collected drawn representations were used in the exper-
iments.

Figure 8: Examples of the visual blends presented in the
second enquiry. On the left are the “good” blends (one for
each) and on the right are the “bad” blends (1 corresponds
to cactus-pig, 2 to angel-cactus and 3 to angel-pig).
According to the aforementioned results, the imple-
mented Blender is not only able to produce blends that are
coherent with the ones drawn by participants but is also able
to produce novel blends that no participant drew, showing
creative behaviour.
Evaluating perception
In order to assess if the produced blends could be correctly
perceived, a second enquiry was conducted. The main goal
was to evaluate whether or not the participant could identify
the input spaces used for each blend (i.e. if it was possible to
identify pig and cactus in a blend produced for pig-cactus).
This is related to the Unpacking Principle (Fauconnier and
Turner 1998).
In the first enquiry, the fourth task (T4) consisted of collecting the prototypical parts for each concept – these are the parts that most strongly identify the concept (e.g. wing for angel). We used this data to produce the second enquiry.
For each blend (angel-pig, cactus-pig or angel-cactus), four
visual blends were selected (two considered “good” and two
considered “bad”, see Fig. 8). The quality evaluation (“bad”
or “good”) was based on two criteria: fitness of the individ-
ual and presence or legibility of the prototypical parts (i.e. a
“good” exemplar is an individual with the prototypical parts
clearly identifiable; a “bad” exemplar is an individual with
fewer prototypical parts or these are not clearly identifiable).
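The “good”/“bad” labelling criterion above can be sketched as a simple presence check over prototypical parts. This is an illustrative reconstruction, not the authors' implementation; the part names and the `PROTOTYPICAL` table are hypothetical stand-ins for the data collected in T4.

```python
# Illustrative sketch of the second quality criterion: a blend is "good"
# when the prototypical parts of both input concepts are clearly
# identifiable, "bad" when some are missing or not identifiable.
# The prototypical parts below are assumed examples, not the T4 data.
PROTOTYPICAL = {
    "angel": {"wings", "halo"},
    "pig": {"snout", "curly-tail"},
    "cactus": {"spikes"},
}

def label_blend(input_a, input_b, identifiable_parts):
    """Label a blend by presence of both inputs' prototypical parts."""
    required = PROTOTYPICAL[input_a] | PROTOTYPICAL[input_b]
    return "good" if required <= set(identifiable_parts) else "bad"

# An angel-pig blend with no halo or wings reads as pig + woman:
label_blend("angel", "pig", {"snout", "curly-tail"})  # -> "bad"
```

In practice the criterion also weighed the fitness of the individual, so a full scorer would combine both signals rather than use presence alone.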
A total of 12 visual blends were used and the enquiry was conducted with 30 participants. Each visual blend was tested by 5 participants. In order to minimise the biasing of the results, each participant evaluated two visual representations (one “bad” and one “good”) of different blends (e.g. when the first was of cactus-pig, the second could only be of angel-pig or angel-cactus). The “bad” blends were evaluated first to further minimise the biasing.

Table 1: Number of correct names given (input spaces' names) for each of the blends (percentage of answers).

                       # correct:   0    1    2
cactus-pig     Good                20   50   30
               Bad                 50   50    0
angel-pig      Good                10   20   70
               Bad                 40   50   10
angel-cactus   Good                 0   60   40
               Bad                 10   80   10

Table 2: Number of correct names given (input spaces' names) for each of the blends (number of answers).

                      # R.    0   1   2
cactus-pig     Good     1     1   2   2
                        2     1   3   1
               Bad      3     1   4   0
                        4     4   1   0
angel-pig      Good     5     0   1   4
                        6     1   1   3
               Bad      7     2   3   0
                        8     2   2   1
angel-cactus   Good     9     0   4   1
                       10     0   2   3
               Bad     11     0   5   0
                       12     1   3   1
The results (Table 1 and Table 2) clearly show that the “good” blends were easier to name correctly: the percentage of totally correct naming is always higher for the “good” examples, and the percentage of totally incorrect naming is always higher for the “bad” blends. In addition, the input spaces were easier to identify in some of the representations than in others (e.g. the “good” blends for angel-pig received more totally correct answers than the rest of the blends, as shown in Table 2).
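The percentages in Table 1 follow directly from the raw counts in Table 2: each representation was seen by 5 participants, so each “good”/“bad” pair of rows aggregates 10 answers. The aggregation can be verified with a few lines of Python (the dictionary below simply transcribes Table 2):

```python
# Raw counts from Table 2: for each blend/quality pair, two rows of
# (answers with 0, 1, 2 correct names), one row per representation.
table2 = {
    ("cactus-pig", "good"):   [(1, 2, 2), (1, 3, 1)],
    ("cactus-pig", "bad"):    [(1, 4, 0), (4, 1, 0)],
    ("angel-pig", "good"):    [(0, 1, 4), (1, 1, 3)],
    ("angel-pig", "bad"):     [(2, 3, 0), (2, 2, 1)],
    ("angel-cactus", "good"): [(0, 4, 1), (0, 2, 3)],
    ("angel-cactus", "bad"):  [(0, 5, 0), (1, 3, 1)],
}

def to_percentages(rows):
    """Sum the two representations' counts and convert to percentages."""
    totals = [sum(col) for col in zip(*rows)]
    n = sum(totals)  # 10 answers per good/bad group (2 reps x 5 people)
    return [100 * t // n for t in totals]

table1 = {key: to_percentages(rows) for key, rows in table2.items()}
# e.g. table1[("cactus-pig", "good")] == [20, 50, 30], matching Table 1
```

Every row of Table 1 is recovered this way, which also serves as a consistency check between the two tables.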
Overall, the majority of the participants could identify at least one of the input spaces for the “good” exemplars of visual blends. Even though some of the participants could not correctly name both of the input spaces, the answers given were often related to the correct ones (e.g. the names given for the input spaces in the first “bad” blend of 3 in Fig. 8 were often pig and lady/woman, instead of pig and angel; this is due to the fact that no halo nor wings are present).
Conclusions and future work
We presented a descriptive approach for automatic genera-
tion of visual blends. The approach uses structured representations along with sets of visual relations which describe how the parts into which the visual representation can be decomposed relate to each other. The experimental results demonstrate the ability of the Blender to produce
analogies from input mental spaces and generate a wide va-
riety of visual blends based on them. The Visual Blender
component, in addition to fulfilling its purpose, is able to
produce interesting and unexpected blends. Future enhance-
ments to the proposed approach include:
(i) exploring an island approach in which exchange of indi-
viduals from different analogies may occur if they respect
the analogy of the destination population;
(ii) exploring the role of the user (guided evolution), by al-
lowing the selection of individuals to evolve;
(iii) considering Optimality Principles in the assessment of
fitness (e.g. how many parts are exchanged) and explor-
ing which of them may be useful or needed – something
discussed by Martins et al. (2016);
(iv) using relations such as biggerThan or smallerThan to
explore style changing (e.g. the style of the produced
blends will be affected if a base visual representation has
head biggerThan body);
(v) exploring context in the production of blends (e.g. stars
surrounding the angel).
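Future-work item (i) can be sketched as a migration step with an analogy-consistency constraint: an individual may move between island populations only if every part mapping it uses also exists in the destination island's base analogy. All names below (the mapping representation, `migrate`, the example populations) are illustrative assumptions, not the system's API.

```python
# Hedged sketch of future-work item (i): individuals migrate between
# island populations only if they respect the destination's base analogy.
# Individuals are represented here simply as frozensets of part mappings.

def respects_analogy(individual_mappings, destination_analogy):
    """True if every part mapping the individual uses appears in the
    destination island's base analogy."""
    return individual_mappings <= destination_analogy

def migrate(individual_mappings, source_pop, destination_pop,
            destination_analogy):
    """Move an individual between islands only when the constraint holds."""
    if respects_analogy(individual_mappings, destination_analogy):
        source_pop.remove(individual_mappings)
        destination_pop.append(individual_mappings)
        return True
    return False

# An angel-pig individual built on a wing->tail mapping cannot enter an
# island whose base analogy only maps wing->arm:
island_a = [frozenset({("wing", "tail")})]
island_b = []
migrate(island_a[0], island_a, island_b,
        destination_analogy=frozenset({("wing", "arm")}))  # -> False
```

The constraint check is what distinguishes this from a standard island model: exchange is allowed only when the migrant remains a valid individual under the destination population's analogy.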
Acknowledgements
This research is partially funded by Fundação para a Ciência e Tecnologia (FCT), Portugal, under the grant

References
Berov, L., and Kühnberger, K.-U. 2016. Visual hallucination
for computational creation. In Proceedings of the Seventh
International Conference on Computational Creativity.
Bliss, C. K. 1965. Semantography (Blissymbolics): A Logi-
cal Writing for an illogical World. Semantography Blissym-
bolics Publ.
Confalonieri, R.; Corneli, J.; Pease, A.; Plaza, E.; and Schor-
lemmer, M. 2015. Using argumentation to evaluate concept
blends in combinatorial creativity. In Proc. of the Sixth Int.
Conf. on Computational Creativity, 174–181.
Correia, J.; Martins, T.; Martins, P.; and Machado, P. 2016.
X-faces: The exploit is out there. In Proceedings of the Sev-
enth International Conference on Computational Creativity.
Fauconnier, G., and Turner, M. 1998. Conceptual integra-
tion networks. Cognitive Science 22(2):133–187.
Fauconnier, G., and Turner, M. 2002. The Way We Think.
New York: Basic Books.
Fauconnier, G. 1994. Mental Spaces: Aspects of Meaning
Construction in Natural Language. New York: Cambridge
University Press.
Gatys, L. A.; Ecker, A. S.; and Bethge, M. 2015. A neural
algorithm of artistic style. arXiv preprint arXiv:1508.06576.
Goguen, J. 1999. An introduction to algebraic semiotics,
with applications to user interface design. In Lecture Notes
in Artificial Intelligence, volume Computation for Metaphor,
Analogy and Agents, 242–291. Springer.
Heath, D., and Ventura, D. 2016. Before a computer can draw, it must first learn to see. In Proceedings of the 7th International Conference on Computational Creativity, to appear.
Johnson, R. 1985. Prototype theory, cognitive linguistics
and pedagogical grammar. Working Papers in Linguistics
and Language Training 8:12–24.
Keane, M. T., and Costello, F. J. 2001. Setting limits
on analogy: Why conceptual combination is not structural
alignment. In Gentner, D.; Holyoak, K.; and Kokinov, B.,
eds., The Analogical Mind: A Cognitive Science Perspec-
tive. Cambridge, MASS: MIT Press.
Lakatos, I. 1976. Proofs and refutations: the logic of math-
ematical discovery. Cambridge University Press.
Martins, P.; Pollak, S.; Urbančič, T.; and Cardoso, A. 2016. Optimality principles in computational approaches to conceptual blending: Do we need them (at) all? In Proceedings of the Seventh International Conference on Computational Creativity.
McCaig, G.; DiPaola, S.; and Gabora, L. 2016. Deep convo-
lutional networks as models of generalization and blending
within visual creativity. arXiv preprint arXiv:1610.02478.
Neurath, O. 1936. International Picture Language. The
First Rules of Isotype... With Isotype Pictures. Kegan Paul
& Company.
Pereira, F. C., and Cardoso, A. 2002. The boat-house visual
blending experience. In Proceedings of the Symposium for
Creativity in Arts and Science of AISB 2002.
Pereira, F. C. 2007. Creativity and Artificial Intelligence: A
Conceptual Blending Approach. Berlin: Mouton de Gruyter.
Phillips, B. J., and McQuarrie, E. F. 2004. Beyond visual
metaphor: A new typology of visual rhetoric in advertising.
Marketing theory 4(1-2):113–136.
Ribeiro, P.; Pereira, F. C.; Marques, B.; Leitão, B.; and Cardoso, A. 2003. A model for creativity in creature generation. In 4th International Conference on Intelligent Games and Simulation (GAME-ON 2003).
Steinbrück, A. 2013. Conceptual blending for the visual domain. Masters thesis, University of Amsterdam.
von Engelhardt, J. 2002. The language of graphics: A
framework for the analysis of syntax and meaning in maps,
charts and diagrams. Yuri Engelhardt.
Xiao, P., and Linkola, S. 2015. Vismantic: Meaning-making
with images. In Proceedings of the 6th Int. Conference on
Computational Creativity, ICCC-15.