Modern French Poetry Generation with RoBERTa and GPT-2
Mika Hämäläinen¹,², Khalid Alnajjar¹,² and Thierry Poibeau²
¹University of Helsinki, Finland
²École Normale Supérieure-PSL and CNRS and Université Sorbonne nouvelle, Paris, France
firstname.lastname@{helsinki.fi¹ or ens.psl.eu²}
Abstract

We present a novel neural model for modern poetry generation in French. The model consists of two pretrained neural models that are fine-tuned for the poem generation task. The encoder of the model is RoBERTa-based, while the decoder is based on GPT-2. This way, the model can benefit from the superior natural language understanding performance of RoBERTa and the good natural language generation performance of GPT-2. Our evaluation shows that the model can create French poetry successfully. On a 5-point scale, human judges gave the lowest score of 3.57 to the typicality and emotionality of the output poetry, and the highest score of 3.79 to understandability.
Introduction
Poem generation is a challenging creative natural language generation task. As a form of art, poetry has undergone several changes throughout history. Classical poetry typically incorporates meter and rhyme, as their function was to help people recall poems, especially when the poetic tradition was still mostly oral rather than written.

In the modern era, poetry has moved away from being an art form that has to follow a fixed structure defining its meter and rhyming, such as the iamb, haiku or anapest. Modern poetry is more concerned with creating something new by breaking strict structural rules and by continuously questioning what poetry is, what it can be and what it should be (see Kantokorpi, Lyytikäinen, and Viikari 1990).
In the field of poem generation, meter is a feature that is very often considered in generated poetry (Colton, Goodwin, and Veale 2012; Lau et al. 2018; Hämäläinen and Alnajjar 2019b; Zugarini, Melacci, and Maggini 2019; Lewis, Zugarini, and Alonso 2021). By incorporating meter, people can be more forgiving when evaluating the output of the system, as it is known that people are ready to read more into the content of the output of a computationally creative system if the form is correct (Veale 2016). In other words, a poem that looks like a poem, in that it follows a certain meter, must be a poem. A truly competent computational poet, however, should be capable of generating something that is recognizable as a poem even if its output is modern free-form poetry.
In this paper, we explore the topic of modern poetry generation in French. We fine-tune a novel encoder-decoder architecture which consists of a RoBERTa-based (Liu et al. 2019) model as the encoder and a GPT-2-based (Radford et al. 2019) model as the decoder. Because RoBERTa is very good at natural language understanding tasks but poor at generation tasks, while GPT-2 is good at generation but bad at understanding, it makes sense to combine the two models. The task of RoBERTa is to encode the input (i.e. to understand poetry) and the task of GPT-2 is to decode the output (i.e. to generate poetry).
Related work
Poem generation has sparked a lot of interest in the past, as we can see in a recent survey of the field (Gonçalo Oliveira 2017). There is also some work on generating French poetry in particular (Van de Cruys 2019; Poibeau et al. 2020). In this section, we provide a quick overview of the related work.
Poetry has been generated using rules. This can, for instance, be done by modeling certain poetic features to find suitable verses (Lamb and Brown 2019), by writing different rules to generate verses with different poetic features (Hämäläinen 2018) or by using templates (Colton, Goodwin, and Veale 2012).
Genetic algorithms have been popular in poem generation as well (Manurung, Ritchie, and Thompson 2012; Yang et al. 2016; 2021). In many of these approaches, the generation itself is not governed by rules, but rules are used to assess the quality of each generated poem. These approaches can be seen as a step away from fully relying on rules in the generative process.
Recently, different neural approaches have become commonplace in poetry generation, such as LSTM models (Zhang and Zhang 2020; Atassi and El Azami 2022) and GPT-2 (Hu and Sun 2020; Beheitt and Hmida 2022). Our approach follows this line of work. We set ourselves apart from the existing models by proposing a novel poem generator that can benefit from both RoBERTa and GPT-2 instead of relying on only one model. In addition, we condition the beam search on rhyme instead of solely relying on the output of a vanilla model.
Verse | Input | Output | Output in English
1 | amant galamment escrime dame | Vainement, paladin des dames, tu t’escrimes | In vain, paladin of the ladies, you fight me with a sword
2 | Vainement, paladin des dames, tu t’escrimes | À crier que j’ai fait, ici, peu galamment. | Shouting what I have done, here, little gallantly.
3 | À crier que j’ai fait, ici, peu galamment. | si de Laure Pétrarque avait été l’amant, | if Pétrarque had been the lover of Laure
4 | si de Laure Pétrarque avait été l’amant, | Qui sait ce qu’il eût dit de Laure dans ses Rimes? | Who knows what he said about Laure in his Rhymes?

Table 1: Example of the training data for one poem
Data
As machine learning requires data, we need a poem corpus. For this reason, we crawl all the French poems that are available on Wikisource¹. The poems are not free of noise, as some of them include verses in the Greek alphabet, multiple different types of quotation marks, and hyphens and spaces of different lengths, etc. We clean the data of all of these inconsistencies by manually inspecting odd characters and either replacing them (e.g. keeping only one type of hyphen) or removing them (e.g. Greek letters). The corpus contains 7,553 poems. In addition, we use the French sonnet corpus introduced by Poibeau et al. (2020). This corpus has 1,039 sonnets.
Because these poems and sonnets are of different lengths, we split all of them into stanzas. From this point on, we treat a stanza as a poem, so that all poems in our corpus are of a similar length. This gives us altogether 25,215 French poems and sonnets. For the purposes of our models, we do not make a distinction between poems and sonnets.
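The exact splitting procedure is not spelled out in the paper; a minimal sketch in Python, assuming stanzas are separated by blank lines:

def stanzas(poem_text):
    # Treat each blank-line-separated block of verses as one stanza.
    return [s.strip() for s in poem_text.split("\n\n") if s.strip()]

# Hypothetical usage: `corpus` is the list of crawled poem texts.
corpus = ["Premier vers\nDeuxième vers\n\nTroisième vers\nQuatrième vers"]
stanza_poems = [s for poem in corpus for s in stanzas(poem)]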
Poem generator
In this section, we describe our poem generation model. The model follows an encoder-decoder architecture where the encoder is a RoBERTa model and the decoder is a GPT-2 model. Rather than training these models from scratch, we use pretrained language models and fine-tune them for the task of poem generation using a transfer learning approach. We chose a RoBERTa-based model as the encoder given its great ability to capture contextual semantics. GPT-2 is well known for language modeling; hence, it makes an optimal decoder for text generation tasks.
First, we have to pick suitable pretrained models. As we use the Transformers library (Wolf et al. 2020), we select our models from their repository. The current state-of-the-art French RoBERTa model is CamemBERT² (Martin et al. 2020), which is based on the RoBERTa (Liu et al. 2019) architecture and trained on the French portion of the large OSCAR corpus (Abadji et al. 2022). We use CamemBERT as our encoder. As for the selection of the GPT-2 model, there were several alternatives. By trying the models out, we could see that all of them except for Belgian GPT-2³ (Louis 2020) produced rather poor output. That model was trained on a variety of genres (such as news, Wikipedia, novels, European parliament text, etc.) on a relatively big corpus of around 60 GB. For this reason, we opted for Belgian GPT-2 as our decoder model.
¹ https://fr.wikisource.org/wiki/Catégorie:Poèmes
² https://huggingface.co/camembert-base
³ https://huggingface.co/antoiloui/belgpt2
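A minimal sketch of how such an encoder-decoder can be assembled with the Transformers library, using the two checkpoints from the footnotes; the special-token choices below are our assumptions, not values reported in the paper:

from transformers import EncoderDecoderModel, AutoTokenizer

# Tie a CamemBERT encoder to a Belgian GPT-2 decoder; cross-attention
# layers are added to the decoder automatically.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "camembert-base", "antoiloui/belgpt2")

enc_tok = AutoTokenizer.from_pretrained("camembert-base")
dec_tok = AutoTokenizer.from_pretrained("antoiloui/belgpt2")

# Map the special tokens so that generation knows where to start and
# stop (assumed configuration).
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.eos_token_id = dec_tok.eos_token_id
model.config.pad_token_id = dec_tok.pad_token_id or dec_tok.eos_token_id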
We use spaCy⁴ (Honnibal et al. 2020) to extract up to 4 keywords from each poem in the corpus. We train our encoder-decoder architecture for sequence-to-sequence generation, where it predicts the next verse in a poem given the previous verse. In the absence of a previous verse, we train the model to predict the first verse of a poem from the up to 4 keywords extracted from the poem. An example of input and output in the training data for one poem can be seen in Table 1. The first input consists of the keywords amant (lover), galamment (gallantly), escrime (fencing) and dame (lady), which are used to predict the first verse of the poem.
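The paper does not state how the keywords are chosen; the sketch below assumes they are the first distinct content-word lemmas found by spaCy, and builds the (input, output) pairs of Table 1:

import spacy

nlp = spacy.load("fr_core_news_sm")  # the model named in footnote 4

def keywords(text, k=4):
    # Hypothetical heuristic: first k distinct content-word lemmas.
    lemmas = []
    for tok in nlp(text):
        if tok.pos_ in {"NOUN", "VERB", "ADJ"} and tok.lemma_ not in lemmas:
            lemmas.append(tok.lemma_)
    return lemmas[:k]

def training_pairs(verses):
    # Keywords predict the first verse; every later verse is predicted
    # from its predecessor (cf. Table 1).
    pairs = [(" ".join(keywords(" ".join(verses))), verses[0])]
    pairs.extend(zip(verses[:-1], verses[1:]))
    return pairs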
The poem corpus is split randomly into 80% for training and 20% for validation. The model is trained for 10 epochs. We use the Adam algorithm (Kingma and Ba 2014) with decoupled weight decay regularization (Loshchilov and Hutter 2017) and a learning rate of 5e-05 to optimize the parameters of the model, with cross-entropy loss as the loss function to reduce the difference between the gold-standard and predicted tokens.
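Continuing the sketch above, the reported hyperparameters map onto the Transformers trainer roughly as follows; the batch size and weight-decay value are assumptions, and `train_ds`/`val_ds` stand for the tokenized 80%/20% splits:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="fr-poem-model",     # hypothetical output path
    num_train_epochs=10,            # as reported in the paper
    learning_rate=5e-5,             # as reported; the Trainer defaults to
                                    # AdamW, i.e. Adam with decoupled
                                    # weight decay
    weight_decay=0.01,              # assumed value; none is reported
    per_device_train_batch_size=8,  # assumed; not reported
)
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()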
Rhyming is taken into account during the generation phase. The model is requested to generate a sequence of between 4 and 20 tokens, with a length penalty of 1.0, using a greedy approach. At each step of generating the output sequence (i.e., when predicting the next token), we use the model to predict the top 10 possible tokens instead of just the one highest-scoring output. We then sort these candidate tokens based on their probabilities and rhyming scores. The rhyming score is calculated by counting the number of tokens in the output that rhyme (full rhyme, consonance or assonance) with the input (i.e., the previous verse and any subsequent words generated during the run).
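A sketch of this rhyme-aware decoding step; how probability and rhyme score are combined is not specified in the paper, so the additive ranking below is an assumption, and `rhyme_score` is the counting function built from the rules described next:

import torch

def generate_verse(model, dec_tok, enc_ids, rhyme_score,
                   min_len=4, max_len=20):
    # Greedy decoding, except that at each step the 10 most probable
    # next tokens are re-ranked by probability plus rhyme score.
    out_ids = [model.config.decoder_start_token_id]
    for _ in range(max_len):
        logits = model(input_ids=enc_ids,
                       decoder_input_ids=torch.tensor([out_ids])).logits
        probs = logits[0, -1].softmax(dim=-1)
        top = torch.topk(probs, k=10)
        ranked = sorted(zip(top.indices.tolist(), top.values.tolist()),
                        key=lambda c: c[1] + rhyme_score(c[0], out_ids),
                        reverse=True)
        next_id = ranked[0][0]
        if next_id == model.config.eos_token_id and len(out_ids) > min_len:
            break
        out_ids.append(next_id)
    return dec_tok.decode(out_ids[1:], skip_special_tokens=True)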
Because it is not easy to know whether two French words rhyme or not based on their orthography (similarly to English), we use eSpeak-ng⁵ to produce an IPA (International Phonetic Alphabet) representation for each token in the model's vocabulary. The IPA is designed to represent how words are pronounced by writing out the actual phonemes. We use a simple set of rules to compare the IPA strings of two tokens with each other to determine whether they rhyme or not.
In practice, we first ensure that both of the IPA strings are equally long; if this is not the case, we remove characters from the beginning of the longer string until the IPA strings are equally long. If the strings are identical, no rhyme is considered, because a word does not make a good rhyme with itself. For a full rhyme, the two IPA strings rhyme if they are identical from the first vowel onward. For assonance, we replace all consonants with a placeholder character C; if the resulting IPA strings are identical, i.e. they share the same vowels in the same positions, the words are considered to have an assonance rhyme. For consonance, we do the same as with assonance, but replace all vowels with a placeholder V.

⁴ The fr_core_news_sm model
⁵ https://github.com/espeak-ng/espeak-ng
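A direct reading of these rules in Python; the IPA lookup shells out to the eSpeak-ng command line tool from footnote 5, and the vowel inventory is a simplified assumption (diacritics such as nasalization marks are ignored):

import subprocess

def ipa(word, voice="fr"):
    # e.g. `espeak-ng -q --ipa -v fr soleil` prints the IPA string.
    res = subprocess.run(["espeak-ng", "-q", "--ipa", "-v", voice, word],
                         capture_output=True, text=True)
    return res.stdout.strip()

VOWELS = set("aeiouyɑɔɛœøə")  # simplified French vowel set (assumption)

def rhyme_type(word_a, word_b):
    a, b = ipa(word_a), ipa(word_b)
    # Trim the longer string from the front until the lengths match.
    m = min(len(a), len(b))
    if m == 0:
        return None
    a, b = a[-m:], b[-m:]
    if a == b:
        return None  # a word does not make a good rhyme with itself
    # Full rhyme: identical from the first vowel onward.
    first_v = next((i for i, ch in enumerate(a) if ch in VOWELS), None)
    if first_v is not None and a[first_v:] == b[first_v:]:
        return "full"
    # Assonance: same vowels in the same positions.
    if ["C" if ch not in VOWELS else ch for ch in a] == \
       ["C" if ch not in VOWELS else ch for ch in b]:
        return "assonance"
    # Consonance: same consonants in the same positions.
    if ["V" if ch in VOWELS else ch for ch in a] == \
       ["V" if ch in VOWELS else ch for ch in b]:
        return "consonance"
    return None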
D’un beau travail, d’une bonne pose,
De la paix, de la beauté.
Que je plains la beauté
De la femme, qui m’inspire

From a beautiful work, from a good pose.
From the peace, from the beauty
Oh, I lament the beauty
Of the woman, who inspires me

C’est ici que s’éveille le soleil,
C’est ici que repose le grand créateur,
Dont la ruine, hélas! se renouvelle
De l’Enfant du Progrès

It is here where the sun wakes
It is here where the great creator rests
Whose ruin, alas! renews itself
From the Child of Progress

C’est un des mois les plus beaux de l’année,
C’est le printemps, c’est l’été, c’est
Le ciel où mon printemps se joue.
À mon jardin qui s’effondrit.

It is one of the most beautiful months of the year,
It is the spring, it is the summer, it is
The sky where my spring plays.
In my garden that collapses

Table 2: Examples of generated poetry and their translations.
Results and evaluation
For evaluation purposes, we generate 20 different poems consisting of 4 verses each. For each poem, we use a set of four keywords randomly selected from all the keywords extracted from the poem corpus. None of the keyword combinations is identical to what the model saw during training. We generate the poems similarly to the example shown in Table 1. This means that the keywords were used to generate the first verse, which was then used to generate the second verse, and so on.

Some of the generated poems and their translations can be seen in Table 2. As we can see, the generated output is cohesive and quite grammatical. We can, however, see that sometimes the verb conjugation may be wrong, as in the case of effondrit, which is a non-existent inflectional form of effondrer (to collapse). Also, the model has a tendency to start every verse with a capital letter even when the verse is a continuation of a sentence started in the previous verse.
We conduct a crowd-sourced evaluation on Appen⁶. We set French as a language requirement for the crowd-workers so that we know that they actually speak French and are able to assess French poetry. Each poem is evaluated by 20 different crowd-workers. An individual worker can evaluate all 20 different poems or just some of them, in which case the remaining unevaluated poems are shown to a different crowd-worker. An individual crowd-worker cannot evaluate the same poem multiple times.

For evaluation, we use the same parameters as used by several authors for evaluating poetry (Toivanen et al. 2012; Hämäläinen and Alnajjar 2019a; Shihadeh and Ackerman 2020): (1) The poem is typical. (2) The poem is understandable. (3) The poem is grammatical. (4) The poem evokes imagination. (5) The poem evokes emotions. (6) I like the poem. These statements are evaluated on a 5-point Likert scale, where 1 represents the worst and 5 the best grade.

⁶ https://appen.com/
     Q1    Q2    Q3    Q4    Q5    Q6
Avg  3.57  3.79  3.77  3.65  3.57  3.77
STD  0.88  0.84  0.81  0.79  0.88  0.77

Table 3: The evaluation results and standard deviations
The results can be seen in Table 3. All in all, the results are good and show that the system can generate poetry successfully. The lowest scores were obtained for typicality and emotionality, while the highest score was given to understandability. In the future, more robust human evaluation methods need to be applied to understand why these parameters scored high and low (Hämäläinen and Alnajjar 2021a; 2021b).
Conclusions
In this paper, we have presented a novel approach to French poem generation. We have presented an architecture that consists of RoBERTa and GPT-2 models that are fine-tuned on a poem corpus. In addition, we have modeled rhyme as a part of the prediction pipeline of the model.

The results obtained in the human evaluation are promising, and they indicate that the model performs well in the task it was designed to do. In order to make the evaluation results more transparent, we have released them in full on Zenodo⁷, together with the generated poems that were used in the evaluation.

Pretrained neural language models have proven useful in poem generation. In the future, it would be interesting to study them in a multilingual setting, where a pretrained multilingual model is fine-tuned to generate poetry using the corpora of languages other than the desired target language.
Author Contributions
The first two authors contributed equally to the work presented in this paper. The third author was involved in planning the methods and writing the paper.

⁷ https://zenodo.org/record/6558357
Acknowledgments
This work was partially financed by the Society of Swedish Literature in Finland with funding from Enhancing Conversational AI with Computational Creativity. This work was conducted during a mobility period supported by the Nokia Foundation under grant number 20220193. This work was supported in part by the French government under the management of the Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute). The work was also supported by the CNRS-funded International Research Network Cyclades (Corpora and Computational Linguistics for Digital Humanities).
References
Abadji, J.; Ortiz Suárez, P.; Romary, L.; and Sagot, B. 2022. Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. arXiv e-prints arXiv:2201.06642.
Atassi, A., and El Azami, I. 2022. Comparison and generation of a poem in Arabic language using the LSTM, BiLSTM and GRU. Journal of Management Information & Decision Sciences 25.
Beheitt, M. E. G., and Hmida, M. B. H. 2022. Automatic Arabic poem generation with GPT-2. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, 366–374. INSTICC.
Colton, S.; Goodwin, J.; and Veale, T. 2012. Full-face poetry
generation. In ICCC, 95–102.
Gonçalo Oliveira, H. 2017. A survey on intelligent poetry generation: Languages, features, techniques, reutilisation and evaluation. In Proceedings of the 10th International Conference on Natural Language Generation, 11–20. Santiago de Compostela, Spain: Association for Computational Linguistics.
Hämäläinen, M., and Alnajjar, K. 2019a. Generating modern poetry automatically in Finnish. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 5999–6004. Hong Kong, China: Association for Computational Linguistics.
Hämäläinen, M., and Alnajjar, K. 2019b. Let’s face it. Finnish poetry generation with aesthetics and framing. In Proceedings of the 12th International Conference on Natural Language Generation, 290–300.
Hämäläinen, M., and Alnajjar, K. 2021a. The great misalignment problem in human evaluation of NLP methods. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), 69–74. Online: Association for Computational Linguistics.
Hämäläinen, M., and Alnajjar, K. 2021b. Human evaluation of creative NLG systems: An interdisciplinary survey on recent papers. In Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), 84–95. Online: Association for Computational Linguistics.
Hämäläinen, M. 2018. Harnessing NLG to create Finnish poetry automatically. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity (ACC).
Honnibal, M.; Montani, I.; Van Landeghem, S.; and Boyd, A. 2020. spaCy: Industrial-strength natural language processing in Python. https://doi.org/10.5281/zenodo.1212303.
Hu, J., and Sun, M. 2020. Generating major types of Chinese classical poetry in a uniformed framework. arXiv preprint arXiv:2003.11528.
Kantokorpi, M.; Lyytikäinen, P.; and Viikari, A. 1990. Runousopin perusteet. Gaudeamus.
Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Lamb, C., and Brown, D. G. 2019. TwitSong 3.0: towards semantic revisions in computational poetry. In Proceedings of the Tenth International Conference on Computational Creativity, 212–219.
Lau, J. H.; Cohn, T.; Baldwin, T.; Brooke, J.; and Hammond, A. 2018. Deep-speare: A joint neural model of poetic language, meter and rhyme. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1948–1958.
Lewis, D.; Zugarini, A.; and Alonso, E. 2021. Syllable neural language models for English poem generation. In 12th International Conference on Computational Creativity (ICCC’21).
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Loshchilov, I., and Hutter, F. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
Louis, A. 2020. BelGPT-2: a GPT-2 model pre-trained on
French corpora. https://github.com/antoiloui/belgpt2.
Manurung, R.; Ritchie, G.; and Thompson, H. 2012. Using genetic algorithms to create meaningful poetic text. Journal of Experimental & Theoretical Artificial Intelligence 24(1):43–64.
Martin, L.; Muller, B.; Ortiz Suárez, P. J.; Dupont, Y.; Romary, L.; de la Clergerie, É.; Seddah, D.; and Sagot, B. 2020. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7203–7219. Online: Association for Computational Linguistics.
Poibeau, T.; Maignant, M.; Mélanie-Becquet, F.; Plancq, C.; Raffard, M.; and Roussel, M. 2020. Sonnet combinatorics with OuPoCo. In Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 133–137. Online: International Committee on Computational Linguistics.
Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I.; et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1(8):9.
Shihadeh, J., and Ackerman, M. 2020. EMILY: An Emily Dickinson machine. In ICCC, 243–246.
Toivanen, J.; Toivonen, H.; Valitutti, A.; and Gross, O. 2012. Corpus-based generation of content and form in poetry. In Proceedings of the Third International Conference on Computational Creativity.
Van de Cruys, T. 2019. La génération automatique de poésie en français (Automatic poetry generation in French). In Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume I : Articles longs, 113–126. Toulouse, France: ATALA.
Veale, T. 2016. The shape of tweets to come: Automating
language play in social networks. Multiple Perspectives on
Language Play 1:73–92.
Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45.
Yang, W.; Cheng, Y.; He, J.; Hu, W.; and Lin, X. 2016. Research on community competition and adaptive genetic algorithm for automatic generation of Tang poetry. Mathematical Problems in Engineering 2016.
Yang, W.; Weng, W.; Chen, G.; and Jiang, Z. 2021. Elitist strategy of genetic algorithms for writing Tang poetry. International Arab Journal of Information Technology 18(4):604–610.
Zhang, H., and Zhang, Z. 2020. Automatic generation method of ancient poetry based on LSTM. In 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), 95–99. IEEE.
Zugarini, A.; Melacci, S.; and Maggini, M. 2019. Neural poetry: Learning to generate poems using syllables. In International Conference on Artificial Neural Networks, 313–325. Springer.