Content uploaded by Mika Hämäläinen
Author content
All content in this area was uploaded by Mika Hämäläinen on Nov 07, 2019
Content may be subject to copyright.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing
and the 9th International Joint Conference on Natural Language Processing, pages 6001–6006,
Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics
Generating Modern Poetry Automatically in Finnish
Mika Hämäläinen
Department of Digital Humanities
University of Helsinki
mika.hamalainen@helsinki.fi
Khalid Alnajjar
Department of Computer Science
University of Helsinki
khalid.alnajjar@helsinki.fi
Abstract
We present a novel approach for generating poetry automatically in the morphologically rich Finnish language by using a genetic algorithm. The approach improves on the state of the art of previous Finnish poem generators by introducing a higher degree of freedom in terms of structural creativity. Our approach is evaluated and described within the paradigm of computational creativity, where the fitness functions of the genetic algorithm are treated as the system's notion of aesthetics. Human evaluators consider the output to be a poem 81.5% of the time.
1 Introduction
Poem generation is a challenging task for creative NLG (natural language generation), requiring structural integrity in the form of rhyme and meter, grammatical correctness and figurative expression. Poems are meant to be interpreted, and therefore the meaning they convey cannot be fully explained by semantics alone; they also require an exploration of pragmatics.
In this paper, we present a novel approach based on a genetic algorithm for creating poetry in Finnish from the standpoint of computational creativity. In addition to the problems related to poems in general, the morphosyntactically complex Finnish language sets additional requirements for producing grammatical output.
Computational creativity can be seen as a search for creative artefacts in a conceptual space (cf. Wiggins, 2006). Therefore, using a genetic algorithm for a creative task is reasonable, as it conducts a search and picks out the most suitable candidates based on its fitness function. An important aspect of creativity is that the system should be able to assess its own creations, a notion called appreciation (Colton, 2008) or aesthetic function (Colton et al., 2011) in the literature. The fitness function of the genetic algorithm serves this exact purpose, as it can score the output in terms of different aesthetic dimensions.
2 Related work
In the past, poetry generation has been studied from the point of view of both computational creativity and natural language generation. Poem generation has been tackled with a variety of methods, such as case-based reasoning (Gervás, 2001), templates (Colton et al., 2012), translation with WFSTs (weighted finite-state transducers) (Greene et al., 2010), text transformation via word embeddings (Bay et al., 2017) and conditional variational autoencoders (Li et al., 2018). As the field of poem generation has been broadly surveyed by Oliveira (2017), we dedicate the rest of this section to describing the existing poetry generation work conducted for Finnish within the computational creativity paradigm. We also discuss some previous approaches that use genetic algorithms.
One of the first takes on Finnish poem gener-
ation is the P. O. Eticus system (Toivanen et al.,
2012). P. O. Eticus uses a corpus of human au-
thored poems. These poems are used as templates
for generating new poetry. In practice, the system
takes a random poem from the corpus, conducts
a morphological analysis on it and replaces some
of the words in the existing poem. The replaced
words are inflected to match the morphology of
the original word.
Another take on poetry generation in Finnish is that of Kantosalo et al. (2015). This approach is presented as part of a poem authoring system. The generator takes sentences from children's books stored in its corpus based on a shared keyword. These sentences serve as verses, or poem fragments, and they are output one after another to form a generated poem. As the system does not alter the text at all, it does not have to deal with the complexities of Finnish morphology.
The most recent work on Finnish poem generation is that presented by Hämäläinen (2018a). This approach consists of individual rule-based verse generators, each of which produces structurally different verses with different types of figurative expression, such as metaphors, tautology and comparison. The verse generators are applied in the order defined by hand-written poem structures. Semantic cohesion is achieved by having each verse generator pass a noun to the following verse generator in the poem structure. This guarantees that each verse is always coherent to some extent with the verse that immediately precedes it. This generator is used in the creative internet application Poem Machine, tailored for co-creativity (Hämäläinen, 2018b).
The previous approaches to Finnish poem generation covered in this section are limited in terms of structural creativity: they are constrained either by the structure imposed by existing poems or sentences, or by hand-written verse structures. The approach we present in this paper offers more creative freedom on the structural level. This, however, is challenging due to the complex morphosyntax of Finnish; structural changes can easily render the results nonsensical, as a word whose morphology does not fit its syntactic position makes the entire sentence ungrammatical. We take this into account in our proposed method.
Genetic algorithms have been used in the generation of poetic language before. Although not for full poem generation, Hervás et al. (2007) have used genetic algorithms for generating alliterations in Spanish. In terms of full poetry generation, Manurung et al. (2012) aim for grammaticality, meaningfulness and poeticness with their genetic algorithm approach. Their approach tries to maximize the similarity of the poem's meter to a target meter and of the poem's semantics to target semantics, while still retaining grammaticality.
A recent approach to poem generation with genetic algorithms, TwitSong 3.0 (Lamb and Brown, 2019), is based on a mined corpus of sentences that are used as verses in poems based on their inter-compatibility in terms of rhyme. Its fitness functions are based on four metrics, namely meter, emotion, topicality and imagery, and they operate at the verse level. Emotion and imagery are scored with existing lexicons, topicality is assessed based on trigram and keyword similarity with the desired topic, and meter is scored based on how close it is to an iambic meter.
3 Poem Generator
This section describes the data used for poem generation, the genetic algorithm and how the system handles Finnish morphosyntax. Special attention is paid to the fitness functions, according to which the system can rank its creations.
3.1 Data
We crawl Wikisource¹ for Finnish poetry, obtaining 6,189 poems. We parse the poems using the Finnish dependency parser (Haverinen et al., 2014) to obtain syntactic relations, morphological features, part of speech and lemma for each word. This constitutes our poem corpus, denoted as P, with verse-level syntactic parsing. The genetic algorithm uses these poems for the initial population, where each stanza of a human-authored poem is treated as one poem.
For semantics, we use the pretrained word2vec word embeddings² trained on the Finnish Internet Parsebank (Kanerva et al., 2014). This word2vec model has been trained on lemmatized data, which is important as we are interested in obtaining replacement words in an uninflected form.
3.2 Genetic Algorithm
Genetic algorithms are inspired by evolution tak-
ing place in the real world. They have an initial set
of individuals forming a population. These indi-
viduals are then exposed to evolutionary processes
such as mutation and crossover. After a genera-
tion, the fittest individuals survive to the next gen-
eration and the evolutionary process is repeated.
For modeling this process, we use the Python li-
brary DEAP (Fortin et al.,2012) as the genetic al-
gorithms framework.
We employ a standard (μ+λ) genetic algorithm, which has previously been used in computational creativity applications (see Alnajjar et al., 2018). The method begins by constructing an initial population and then evolves it over G generations while optimizing certain parameters. At each generation step, the fittest μ individuals among the current population and its λ offspring are selected to form the next population. We empirically set μ and λ to 100 and G to 25. Additionally, the algorithm takes two user-defined inputs: a poem p and a theme t. In our case, we considered a theme t to be a single word representing an abstract concept such as nature; alternatively, a set of words could be used to represent a more focused theme (e.g. tree, forest, flower, etc.).

¹ https://fi.wikisource.org
² http://bionlp-www.utu.fi/fin-vector-space-models/fin-word2vec-lemma.bin
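To make the (μ+λ) scheme concrete, the following is a minimal sketch in plain Python rather than DEAP. It assumes toy individuals (lists of integers), a single scalar fitness to be maximized and mutation as the only variation operator, whereas the actual system evolves poems under several objectives with NSGA-II selection; the names fitness, mutate and mu_plus_lambda are our own illustrative choices.

```python
import random

# Minimal (mu + lambda) loop on toy individuals (illustrative only).
# Assumptions: an individual is a list of integer genes, fitness is one
# scalar to maximize, and variation is mutation only (no crossover).
MU, LAMBDA, NGEN = 100, 100, 25  # mu, lambda and G as reported in the paper


def fitness(ind):
    # Toy objective standing in for the poem metrics: sum of the genes.
    return sum(ind)


def mutate(ind, rng):
    # Copy the parent and overwrite one random gene with a value in 0..9.
    child = list(ind)
    child[rng.randrange(len(child))] = rng.randint(0, 9)
    return child


def mu_plus_lambda(population, rng, mu=MU, lambda_=LAMBDA, ngen=NGEN):
    for _ in range(ngen):
        # Create lambda offspring from randomly chosen parents.
        offspring = [mutate(rng.choice(population), rng) for _ in range(lambda_)]
        # Parents and offspring compete together; the mu fittest survive.
        pool = population + offspring
        population = sorted(pool, key=fitness, reverse=True)[:mu]
    return population


rng = random.Random(42)
start = [[rng.randint(0, 9) for _ in range(8)] for _ in range(MU)]
final = mu_plus_lambda(start, rng)
print(len(final), fitness(final[0]))
```

Because parents compete with their offspring, the best fitness can never decrease between generations. DEAP offers a ready-made version of this loop as algorithms.eaMuPlusLambda, which is presumably closer to what the system uses.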
3.2.1 Initial Population
To build an initial population containing poems with various syntactic structures, the method makes μ copies of the input poem p. In each copy, the method then replaces one verse with a random verse from a different poem in the poem corpus P.
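The initialization step can be sketched as follows, assuming that a poem is represented as a list of verse strings and the corpus as a list of such poems; the placeholder verses stand in for real corpus verses and are purely illustrative.

```python
import random

# Sketch of the initial-population step (illustrative assumptions: a poem is
# a list of verse strings; the corpus is a list of such poems).


def initial_population(p, corpus, mu, rng):
    donors = [poem for poem in corpus if poem != p and poem]
    population = []
    for _ in range(mu):
        individual = list(p)                  # copy of the input poem p
        donor = rng.choice(donors)            # a different poem from P
        i = rng.randrange(len(individual))    # position of the verse to replace
        individual[i] = rng.choice(donor)     # random verse from the donor poem
        population.append(individual)
    return population


# Hypothetical placeholder corpus; real verses would come from Wikisource.
corpus = [["a1", "a2", "a3"], ["b1", "b2", "b3", "b4"], ["c1", "c2"]]
p = corpus[0]
population = initial_population(p, corpus, mu=100, rng=random.Random(1))
```

Each individual keeps the length of p and differs from it in exactly one verse here, since the donor verses never coincide with the verses of p.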
3.2.2 Mutation and Crossover
In our method, we implement one type of mutation, which selects a random content word in the poem. The term content word here refers to a word that belongs to an open-class part-of-speech category. The selected word is substituted with another, semantically similar word, determined as follows. Let w be the random content word selected for replacement; the method retrieves the top 300 words most semantically similar to w from the word2vec model as candidate replacements. Thereafter, the method uses UralicNLP (Hämäläinen, 2019) to perform morphological analysis on all candidate words. Candidate words whose part-of-speech tag differs from that of the original word w are omitted. Out of the remaining candidate words, a random word is picked to substitute w.
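A sketch of the substitution logic for a single verse of lemmas is shown below. The dictionaries similar and pos_of are hypothetical stand-ins for the word2vec neighbour lookup and UralicNLP's morphological analysis, and the open-class tag set is our own assumption based on Universal POS tags.

```python
import random

# Sketch of the lemma-substitution mutation for a single verse.
# Assumptions: `similar` stands in for the word2vec neighbour lookup and
# `pos_of` for UralicNLP's analysis; both are hypothetical toy dicts, and
# the open-class tag set below is our own choice (Universal POS tags).
OPEN_CLASSES = {"NOUN", "VERB", "ADJ", "ADV"}


def mutate_verse(lemmas, pos_tags, similar, pos_of, rng, topn=300):
    # Indices of content words, i.e. words in open-class POS categories.
    content = [i for i, tag in enumerate(pos_tags) if tag in OPEN_CLASSES]
    mutated = list(lemmas)
    if not content:
        return mutated
    i = rng.choice(content)
    w = lemmas[i]
    # Keep only the top-n neighbours that share the POS tag of w.
    candidates = [c for c in similar.get(w, [])[:topn] if pos_of.get(c) == pos_tags[i]]
    if candidates:
        mutated[i] = rng.choice(candidates)
    return mutated


similar = {"talo": ["koti", "asua", "rakennus"], "punainen": ["sininen", "väri"]}
pos_of = {"koti": "NOUN", "asua": "VERB", "rakennus": "NOUN", "sininen": "ADJ", "väri": "NOUN"}
result = mutate_verse(["punainen", "talo"], ["ADJ", "NOUN"], similar, pos_of, random.Random(3))
```

Filtering by part of speech keeps the replaced slot syntactically plausible (e.g. the verb asua is never offered as a substitute for the noun talo), leaving the inflection to the surface generation step.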
We use a single-point crossover at the verse level. In practice, this means that during the evolutionary process two poem individuals are selected, and a single crossover point at a verse boundary is chosen at random. The verses after that point are swapped between the two poems.
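Under the assumption that poems are lists of verses, the verse-level crossover can be sketched as below. DEAP ships a comparable in-place operator, tools.cxOnePoint, for list-like individuals; the function here returns new lists instead.

```python
import random


def verse_crossover(a, b, rng):
    # Single-point crossover at a verse boundary. Assumes both poems have
    # at least two verses so that both sides of the cut are non-empty.
    n = min(len(a), len(b))
    point = rng.randint(1, n - 1)  # cut between verse point-1 and verse point
    return a[:point] + b[point:], b[:point] + a[point:]


a = ["a1", "a2", "a3", "a4"]
b = ["b1", "b2", "b3", "b4"]
child1, child2 = verse_crossover(a, b, random.Random(7))
```

Since whole verses are exchanged, the operator never splits a sentence, which keeps the morphosyntax of each verse intact.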
3.2.3 Fitness Functions
The genetic algorithm assesses the individuals based on six metrics that evaluate the poem's structure and one metric that evaluates semantics. Two of the structural metrics compare the generated poem to the original poem: the difference in the number of syllables in the verses and the difference in poetic foot, as measured by the distribution of long and short syllables. The genetic algorithm minimizes these values to keep the differences small. As not changing the poem at all would result in the minimum difference, we penalize identical verses by giving them a distance of 20. This way, the genetic algorithm is pushed to make changes, while results that follow the original meter are still preferred.
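The syllable-count metric with the identical-verse penalty might look roughly like the sketch below. The paper does not state how syllables are counted or how per-verse distances are aggregated, so counting one syllable per maximal vowel group (which under-counts vowel sequences spanning a syllable boundary, such as koe) and summing over verses are our own assumptions; the foot metric is omitted.

```python
import re

# Rough approximation (assumption): one syllable per maximal vowel group.
VOWEL_GROUP = re.compile(r"[aeiouyäö]+", re.IGNORECASE)
IDENTICAL_PENALTY = 20  # distance assigned to unchanged verses (from the paper)


def approx_syllables(verse):
    return len(VOWEL_GROUP.findall(verse))


def syllable_distance(original, generated):
    # Sum of per-verse syllable-count differences (aggregation is assumed).
    total = 0
    for orig, gen in zip(original, generated):
        if orig == gen:
            total += IDENTICAL_PENALTY  # discourage leaving verses untouched
        else:
            total += abs(approx_syllables(orig) - approx_syllables(gen))
    return total


print(syllable_distance(["talo on", "x y"], ["koti on", "x y"]))
```

Replacing talo with koti keeps the syllable count, so that verse contributes 0, whereas the unchanged second verse contributes the full penalty.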
The numbers of full rhymes, assonance rhymes and consonance rhymes between the verses of each generated poem are used as metrics to assess the overall poetic quality of the individuals. The number of alliterating words is counted within verses, as this type of rhyme traditionally occurs within verses in Finnish poetry. The genetic algorithm maximizes the values of these four metrics to obtain the maximum number of rhyming words in the final outputs.
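As an illustration of the within-verse counting, the sketch below counts alliterating words per verse and sums them over the poem. The paper does not specify how alliteration is detected, so treating words that share their first letter as alliterating is a deliberately crude assumption of ours.

```python
from collections import Counter


def alliterating_words(verse):
    # Assumption: words alliterate when they share their first letter.
    words = [w.strip(".,!?;:").lower() for w in verse.split()]
    initials = Counter(w[0] for w in words if w)
    # Count every word whose initial is shared with at least one other word.
    return sum(n for n in initials.values() if n >= 2)


def poem_alliteration(poem):
    # Alliteration is counted within verses only, then summed over the poem.
    return sum(alliterating_words(v) for v in poem)


print(poem_alliteration(["Ja laulut ne kiertävät maata ja merta", "Jos virkkaan kun orja"]))
```

In the first verse, ja/ja and maata/merta form two alliterating pairs, while the second verse has no shared initials.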
The last metric measures the average semantic similarity of the words in the poem to the input theme t, using the word2vec model. Maximizing this function pushes the evolutionary process towards creating poems that are semantically close to the desired input theme.
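The theme metric amounts to an average cosine similarity. The sketch below assumes a plain dictionary of lemma vectors with hypothetical two-dimensional values, and skips lemmas missing from the vocabulary, which the paper does not discuss.

```python
import math


def cosine(u, v):
    # Cosine similarity of two equal-length vectors (0.0 if either is zero).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def theme_similarity(lemmas, theme, vectors):
    # Average similarity of the poem's lemmas to theme t; lemmas missing
    # from the embedding vocabulary are skipped (our assumption).
    sims = [cosine(vectors[w], vectors[theme]) for w in lemmas if w in vectors]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical 2-dimensional vectors; the real model has far more dimensions.
vectors = {"luonto": [1.0, 0.0], "metsä": [1.0, 0.0], "kaupunki": [0.0, 1.0]}
print(theme_similarity(["metsä", "kaupunki"], "luonto", vectors))
```

With the pretrained model, the similarity computation could presumably be delegated to gensim (KeyedVectors.load_word2vec_format(path, binary=True) and then KeyedVectors.similarity), although we do not know whether the system does so.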
As we employ multiple objective functions in our genetic algorithm, we use the Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al., 2002) to optimize them. In short, the algorithm selects individuals that are not dominated by any other individual. An individual x is considered to dominate another individual y if its scores on all objective functions are at least as good as y's (greater than or equal for maximized objectives, lower than or equal for minimized ones) and it is strictly better than y on at least one objective.
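The dominance relation and the first non-dominated front can be sketched as follows, with a flag per objective to handle the mix of minimized and maximized metrics. Full NSGA-II additionally ranks the remaining fronts and breaks ties by crowding distance (available in DEAP as tools.selNSGA2); the scores below are hypothetical.

```python
def dominates(x, y, maximize):
    # x and y are tuples of objective scores; maximize[k] tells whether
    # objective k is maximized (True) or minimized (False).
    at_least_as_good = all(
        (a >= b) if m else (a <= b) for a, b, m in zip(x, y, maximize)
    )
    strictly_better = any(
        (a > b) if m else (a < b) for a, b, m in zip(x, y, maximize)
    )
    return at_least_as_good and strictly_better


def first_front(scores, maximize):
    # Indices of individuals not dominated by any other individual.
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s, maximize) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores: (syllable distance to minimize, rhyme count to maximize).
MAXIMIZE = (False, True)
scores = [(0, 3), (2, 3), (1, 5), (0, 1)]
print(first_front(scores, MAXIMIZE))
```

Here (0, 3) and (1, 5) survive: neither is better on both objectives, whereas (2, 3) and (0, 1) are dominated by (0, 3).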
3.3 Surface Generation
As the genetic algorithm does substitutions on the
level of lemmas, it is important to be able to turn
the verses with new lemmas into grammatical sen-
tences. This is not only needed for presenting the
final output produced by the genetic algorithm to
people, but also for the fitness functions to work.
In Finnish, the surface form of a word (its morphological realization) is affected by two mechanisms: agreement and government. The former means that certain words have to share morphological features in a sentence. For example, adjectives have to follow the case and number of the noun they modify, as in punainen talo (a red house) and punaisessa talossa (in a red house). This can be accounted for simply by inflecting the replacement word with the morphology of the original word. For this purpose, we use Omorfi (Pirinen et al., 2017), which implements Finnish morphology as an FST (finite-state transducer).
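Handling agreement thus reduces to reusing the original word's morphological tags with the new lemma. In the sketch below, two small dictionaries stand in for Omorfi's analyzer and generator; the word forms are genuine Finnish, but the tag strings are invented and do not reflect Omorfi's actual format, and falling back to the bare lemma is our own assumption.

```python
# Toy stand-ins for Omorfi analysis and generation. The forms are real
# Finnish, but the tag strings are invented for this sketch and are not
# Omorfi's actual output format.
TOY_ANALYSES = {"punaisessa": ("punainen", "+A+Sg+Ine")}
TOY_GENERATOR = {
    ("punainen", "+A+Sg+Ine"): "punaisessa",
    ("sininen", "+A+Sg+Ine"): "sinisessä",
}


def replace_with_agreement(original_form, new_lemma):
    # Reuse the morphological tags of the original word for the new lemma.
    _, tags = TOY_ANALYSES[original_form]
    # Fall back to the bare lemma if the generator has no form (assumption).
    return TOY_GENERATOR.get((new_lemma, tags), new_lemma)


print(replace_with_agreement("punaisessa", "sininen"), "talossa")
```

Replacing punainen with sininen in punaisessa talossa thus yields sinisessä talossa (in a blue house), with case and number preserved.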
Government, on the other hand, requires some additional work. In government, words affect each other morphologically in a way that depends on the governor. This means that if a governor word is replaced by another one in the sentence, the morphology of the governed word needs to adapt to the change. Concretely, given an original verse uneksin hatusta (I dream of a hat) and a change of the verb to näen hatun (I see a hat), the case of the object hattu has to change from elative to genitive. We resolve government with Syntax maker (Hämäläinen and Rueter, 2018), which determines the required case based on corpus statistics.
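The idea of choosing a governed case from corpus statistics can be illustrated with a frequency lookup. This is not a description of Syntax maker's internals: the counts below are invented, and picking the single most frequent case per governing verb is our own simplifying assumption.

```python
from collections import Counter

# Hypothetical corpus counts of (governing verb lemma, case of its object).
# The numbers are invented for illustration only.
TOY_COUNTS = Counter({
    ("uneksia", "Ela"): 120,
    ("uneksia", "Gen"): 3,
    ("nähdä", "Gen"): 95,
    ("nähdä", "Par"): 80,
    ("nähdä", "Ela"): 2,
})


def governed_case(verb, counts):
    # Pick the case most frequently observed with this governor, if any.
    options = {case: n for (v, case), n in counts.items() if v == verb}
    return max(options, key=options.get) if options else None


# Replacing uneksia ("dream") with nähdä ("see") switches the object case.
print(governed_case("uneksia", TOY_COUNTS), "->", governed_case("nähdä", TOY_COUNTS))
```

The selected case would then be passed to the morphological generator together with the object's lemma, here turning hatusta into hatun.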
4 Results and Evaluation
As evaluation of creative systems is one of the
most difficult problems in the field of compu-
tational creativity, instead of trying to come up
with an evaluation metric of our own, we opt for
the evaluation method used to evaluate a previous
Finnish poem generator. In practice, this means
conducting a quantitative evaluation with human
judges with the evaluation questions defined by
Toivanen et al. (2012).
An additional reason for using human evaluators instead of automated evaluation metrics is the poor correlation with human judgments observed in a previous study (Hämäläinen and Alnajjar, 2019) for automatic evaluation metrics such as BLEU (Papineni et al., 2002) and PINC (Chen and Dolan, 2011) when evaluating the creativity of a system.
We run the genetic algorithm to produce a final population for each of 20 different initial poems across four different themes: luonto (nature), perhe (family), lemmikki (pet) and ihminen (human). From each of the 20 final populations, we pick one poem at random. We shuffle the order of the poems to reduce the priming effect of poems always appearing in the same order. We divide the 20 poems into two batches of 10 poems to reduce the effort required of an individual evaluator. Each batch of 10 is then evaluated by 10 different human evaluators recruited from the university campus. The total number of evaluators is 20, and all of them are native speakers of Finnish.
We use the following evaluation questions from Toivanen et al. (2012): (1) How typical is the text as a poem? (2) How understandable is it? (3) How good is the language? (4) Does the text evoke mental images? (5) Does the text evoke emotions? (6) How much do you like the text? These questions are answered on a 5-point Likert scale, where 1 represents the worst and 5 the best grade. In addition to these questions, one simple binary question is asked: Is the text a poem?
Figure 1: Evaluation results
Figure 1 presents the average values of the human evaluation results for each question. The plot also shows the evaluation results of P. O. Eticus as reported in its original study. As we can see, our method obtains higher ratings on all the evaluation questions except for question 3. As for the binary question, the judges rated the output as a poem 81.5% of the time, which is exactly the same result as P. O. Eticus obtained.
However, it is important to remember that, as a high level of subjectivity is involved in this evaluation setting, our results should not be directly compared to those of P. O. Eticus. The results from their study should be taken as a reference rather than as definite proof that our system always outperforms P. O. Eticus.
           Q1     Q2     Q3     Q4     Q5     Q6
Average    3.10   2.94   3.11   3.60   3.23   2.77
Median     3      3      3      4      3      3
Mode       2      2      3      4      4      2
Table 1: The average, median and mode of the evaluation results
Table 1 shows the median and mode of the results in addition to the average values. The median values seem to correspond to the rounded average values. However, the mode deviates in the case of the first, second, fifth and last questions, as the answer most often chosen by the judges differed from the rounded average.
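The comparison between the summary statistics can be checked directly against Table 1. The sketch below copies the reported values and assumes round-half-up for the "rounded average", a detail the paper does not specify (none of the reported averages end in .5, so the choice does not affect the outcome).

```python
# Values copied from Table 1.
AVERAGE = {"Q1": 3.10, "Q2": 2.94, "Q3": 3.11, "Q4": 3.60, "Q5": 3.23, "Q6": 2.77}
MEDIAN = {"Q1": 3, "Q2": 3, "Q3": 3, "Q4": 4, "Q5": 3, "Q6": 3}
MODE = {"Q1": 2, "Q2": 2, "Q3": 3, "Q4": 4, "Q5": 4, "Q6": 2}


def rounded(x):
    # Round half up (assumption); avoids Python's round-half-to-even.
    return int(x + 0.5)


# Questions whose median equals the rounded average.
median_matches = [q for q in AVERAGE if MEDIAN[q] == rounded(AVERAGE[q])]
# Questions whose mode differs from the rounded average.
mode_deviates = [q for q in AVERAGE if MODE[q] != rounded(AVERAGE[q])]
print(median_matches, mode_deviates)
```

The check confirms that every median matches its rounded average and that the mode deviates exactly for Q1, Q2, Q5 and Q6.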
Ja kultaa, kuninkaankin saan.
Ja laulut ne kiertävät maata ja merta
Jos virkkaan kun orja
Aina, todella Herra pahankurisuutta antaa.
And gold, of a king I shall have.
And songs, they shall roam on the land and the sea
If I knit like a slave
Always, indeed the Lord shall wrack his mischief.
Above is an example of a poem generated by
the system and its translation in English. The po-
ems generated by the system are typically of this
length as the genetic algorithm uses a stanza of an
existing poem as its starting point.
5 Discussion and Conclusion
The method presented in this paper shows improvement on a previously used evaluation metric. However, based on the discussions we had with some of the human evaluators after they had given their judgments, it became evident that people have very different criteria for poetry. Some of the judges had guessed that they were reading computer-generated poems, even though this detail was not revealed to them explicitly. Their judgments were the most critical of the generated poetry. On the other hand, the evaluators who were surprised to learn that the poems were generated by a computer were in general more generous in their judgments. One of the evaluators almost refused to believe that the poems were generated by a computer instead of a person.
The high level of subjectivity that we could observe just by talking with people calls for a more robust qualitative study of the poem evaluation problem itself in the future. This would allow us to uncover additional factors that affect the judgments given by people. Furthermore, conducting a study on the evaluation itself would make it possible to assess the adequacy of the evaluation metric used for computer-generated poetry.
Nevertheless, the scores achieved by our system relative to a previous method evaluated with the same metric are promising, as they indicate potentially higher output quality. We have presented a solution for handling Finnish morphosyntax in conjunction with a genetic algorithm to cater for computational creativity in poem generation.
Acknowledgements
Special thanks to Jack Rueter for helping out with
the evaluation.
References
Khalid Alnajjar, Hadaytullah Hadaytullah, and Hannu
Toivonen. 2018. “Talent, Skill and Support.” A
method for automatic creation of slogans. In Pro-
ceedings of the 9th International Conference on
Computational Creativity (ICCC 2018), pages 88–
95, Salamanca, Spain. Association for Computa-
tional Creativity.
Benjamin Bay, Paul Bodily, and Dan Ventura. 2017.
Text transformation via constraints and word em-
bedding. In Proceedings of the Eighth International
Conference on Computational Creativity, pages 49–
56.
David L Chen and William B Dolan. 2011. Collect-
ing highly parallel data for paraphrase evaluation.
In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human
Language Technologies-Volume 1, pages 190–200.
Simon Colton. 2008. Creativity Versus the Perception of Creativity in Computational Systems. In AAAI Spring Symposium: Creative Intelligent Systems, Technical Report SS-08-03, pages 14–20, Stanford, California, USA.
Simon Colton, John William Charnley, and Alison
Pease. 2011. Computational creativity theory: The
face and idea descriptive models. In ICCC, pages
90–95.
Simon Colton, Jacob Goodwin, and Tony Veale. 2012. Full-face poetry generation. In Proceedings of the Third International Conference on Computational Creativity, pages 95–102.
K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197.
Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, and Christian Gagné. 2012. DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research, 13:2171–2175.
Pablo Gervás. 2001. An expert system for the composition of formal Spanish poetry. Knowledge-Based Systems, 14(3):181–188.
Erica Greene, Tugba Bodrumlu, and Kevin Knight.
2010. Automatic analysis of rhythmic poetry with
applications to generation and translation. In Pro-
ceedings of the 2010 Conference on Empirical Meth-
ods in Natural Language Processing, EMNLP ’10,
pages 524–533, Stroudsburg, PA, USA. Association
for Computational Linguistics.
Mika Hämäläinen. 2018a. Harnessing NLG to Create Finnish Poetry Automatically. In Proceedings of the Ninth International Conference on Computational Creativity, pages 9–15.
Mika Hämäläinen. 2018b. Poem Machine - a Co-creative NLG Web Application for Poem Writing. In The 11th International Conference on Natural Language Generation: Proceedings of the Conference, pages 195–196.
Mika Hämäläinen. 2019. UralicNLP: An NLP library for Uralic languages. Journal of Open Source Software, 4(37):1345.
Mika Hämäläinen and Khalid Alnajjar. 2019. Modelling the Socialization of Creative Agents in a Master-Apprentice Setting: The Case of Movie Title Puns. In Proceedings of the Tenth International Conference on Computational Creativity, pages 266–273.
Mika Hämäläinen and Jack Rueter. 2018. Development of an Open Source Natural Language Generation Tool for Finnish. In Proceedings of the Fourth International Workshop on Computational Linguistics for Uralic Languages, pages 51–58.
Katri Haverinen, Jenna Nyblom, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, Tapio Salakoski, and Filip Ginter. 2014. Building the essential resources for Finnish: the Turku Dependency Treebank. Language Resources and Evaluation, 48(3):493–531.
Raquel Hervás, Jason Robinson, and Pablo Gervás. 2007. Evolutionary assistance in alliteration and allelic drivel. In Workshops on Applications of Evolutionary Computation, pages 537–546. Springer.
Jenna Kanerva, Juhani Luotolahti, Veronika Laippala,
and Filip Ginter. 2014. Syntactic n-gram collection
from a large-scale corpus of internet Finnish. In Hu-
man Language Technologies-The Baltic Perspective:
Proceedings of the Sixth International Conference
Baltic HLT, volume 268, pages 184–191.
Anna Kantosalo, Jukka Toivanen, and Hannu Toivo-
nen. 2015. Interaction Evaluation for Human-
Computer Co-creativity: A Case Study. In Proceed-
ings of the Sixth International Conference on Com-
putational Creativity, pages 276–283.
Carolyn Lamb and Daniel G. Brown. 2019. Twit-
Song 3.0: towards semantic revisions in computa-
tional poetry. In Proceedings of the Tenth Inter-
national Conference on Computational Creativity,
pages 212–219.
Juntao Li, Yan Song, Haisong Zhang, Dongmin Chen,
Shuming Shi, Dongyan Zhao, and Rui Yan. 2018.
Generating classical Chinese poems via conditional
variational autoencoder and adversarial training.
In Proceedings of the 2018 Conference on Em-
pirical Methods in Natural Language Processing,
pages 3890–3900, Brussels, Belgium. Association
for Computational Linguistics.
Ruli Manurung, Graeme Ritchie, and Henry Thomp-
son. 2012. Using genetic algorithms to create mean-
ingful poetic text. J. Exp. Theor. Artif. Intell., 24:43–
64.
Hugo Gonc¸alo Oliveira. 2017. A survey on intelligent
poetry generation: Languages, features, techniques,
reutilisation and evaluation. In Proceedings of the
10th International Conference on Natural Language
Generation, pages 11–20, Santiago de Compostela,
Spain. Association for Computational Linguistics.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-
Jing Zhu. 2002. Bleu: a method for automatic eval-
uation of machine translation. In Proceedings of
the 40th annual meeting on association for compu-
tational linguistics, pages 311–318.
Tommi A Pirinen, Inari Listenmaa, Ryan Johnson, Francis M. Tyers, and Juha Kuokkala. 2017. Open morphology of Finnish. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University.
Jukka Toivanen, Hannu Toivonen, Alessandro Valitutti,
and Oskar Gross. 2012. Corpus-Based Generation
of Content and Form in Poetry. In Proceedings
of the Third International Conference on Computa-
tional Creativity.
Geraint A Wiggins. 2006. A preliminary framework
for description, analysis and comparison of creative
systems. Knowledge-Based Systems, 19(7):449–
458.