Automatic Poetry Generation in Turkish

Thesis · June 2016
DOI: 10.13140/RG.2.1.3625.4322
Advisor: Asst. Prof. Dr. Tugba Yildiz
Abstract
Poetry is one of the unique fields of literary art and can be seen as a human-based natural language generation system. A text is considered a poem if it satisfies three properties: grammaticality, meaningfulness and poeticness. The goal of this project is to create a computer program that can generate poems which are indistinguishable from human-written poems by satisfying these three properties. An experiment was carried out with 146 participants to test whether our automatic poetry generation program, ROMTU, achieves this goal. As a result, ROMTU was able to mislead 48.63% of participants.
AUTOMATIC POETRY GENERATION IN TURKISH
By
Utku Sen
Supervised By
Asst. Prof. Dr. Tugba Yildiz
Computer Engineering
Istanbul Bilgi University
May 2016
Contents
1 Introduction
2 Related Work
2.0.1 The Poetry Creator
2.0.2 RACTER
2.0.3 ELUAR
2.0.4 WASP
3 Methodology
3.1 Creating a Lexicon
3.2 Creating a POS-tagged Word List
3.3 Creating a Vectorised Corpus
3.4 Creating Pattern Syntax for Poems
3.5 Poem Generation Algorithm
4 Experiments
4.1 Questions
4.2 Results and Analysis
4.2.1 Question 1
4.2.2 Question 2
4.2.3 Question 3
4.2.4 Question 4
4.2.5 Question 5
4.2.6 Question 6
4.2.7 Total Result
5 Conclusion
6 References
List of Figures
1 Flow chart representation of overall process
2 Distribution by Age
3 Distribution by Gender
4 Distribution by Department
5 Distribution by Interest in Poetry
6 Total Result for Question 1
7 Results on Demographic Distributions for Question 1
8 Total Rank for Question 1
9 Total Result for Question 2
10 Results on Demographic Distributions for Question 2
11 Total Rank for Question 2
12 Total Result for Question 3
13 Results on Demographic Distributions for Question 3
14 Total Rank for Question 3
15 Total Result for Question 4
16 Results on Demographic Distributions for Question 4
17 Total Rank for Question 4
18 Total Result for Question 5
19 Results on Demographic Distributions for Question 5
20 Total Rank for Question 5
21 Total Result for Question 6
22 Results on Demographic Distributions for Question 6
23 Total Rank for Question 6
24 Total Result for All Questions
25 Results on Demographic Distributions for All Questions
26 Total Rank for All Questions
List of Tables
1 Top 5 most similar words for ağaç (tree)
2 Top 5 most similar words for Türkiye (Turkey)
3 Top 3 most similar words for kırmızı (red) compared with the beverage model
1 Introduction
Poetry is one of the unique fields of literary art and can be seen as a human-based
natural language generation system. Poetry can be described as: "Composition in
verse or some comparable patterned arrangement of language in which the expression
of feelings and ideas is given intensity by the use of distinctive style and
rhythm."[1]
Since we describe poetry as a part of literary art, it should satisfy the basic
requirements of literary art from a human point of view. Poetry generation
requires intelligence, expert mastery of world and linguistic knowledge, and
creativity.[2] In the poem generation process, human creativity is mostly fed by
emotions.
The main goal of this study is to create a computer program that can generate
poems which are indistinguishable from human-written poems. Such a program may
help in understanding human creativity and behaviour in art creation. Moreover,
if we understand this process, we can represent it with algorithms[4].
Since there is no strict rule that defines poetry[3], we need some properties, or
key traits, to separate a poetic text from a non-poetic one. To do this, we used
three properties of poetry. A text can be considered a poem if it satisfies
these three properties[5]:
Grammaticality: A poem must satisfy all the grammatical rules of the selected
language.
Meaningfulness: A poem must convey at least one message to the reader which is
meaningful under some interpretation.
Poeticness: A poem must be distinguishable from a non-poetic text. To achieve
this, a poem must have poetic features such as rhythm, rhyme and metaphors.
2 Related Work
Automatic poem generation became a research field in the late nineties.[5] Since
poetry is a branch of literary art, building a poem generation system which
satisfies the three properties of poetry is not easy. Because of that, many
methods have been developed to achieve this goal. In this chapter, these methods
and related projects are discussed.
2.0.1 The Poetry Creator
It is a basic system which generates poems from words supplied by a user. The
words fill predefined verse templates.[5]
2.0.2 RACTER
RACTER uses a grammar-based generation system. It maintains thematic continuity
by reusing lexical elements. With this method, it produces understandable
sentences[5].
2.0.3 ELUAR
ELUAR uses a template-based generation system, categorising its templates into
themes such as love, nature and philosophy.[5]
2.0.4 WASP
WASP is a rule-based poem generation system. According to Oliveira (2009),
"it aims to study and test the importance of the initial vocabulary, the word
choice, the verse pattern selection and the construction heuristics taking into
account the acceptance (or not) of the generated verses and complete poems."
3 Methodology
As described in the introduction, the goal of the project is to create a
computer program that can generate poems which are indistinguishable from
human-written poems. To achieve this goal, a generated poem should satisfy three
properties: grammaticality, meaningfulness and poeticness. To satisfy these
properties, we used a mix of approaches.
3.1 Creating a Lexicon
In our context, lexicon refers to the vocabulary used for generating poems. To
create the lexicon, 1500 different poems were first gathered from the
siirakademisi.com website. After that, the most commonly used words were
selected. Since Turkish is an agglutinative language, the stems of the words were
found with the tr-disamb tool[4] so that they can be used in different
grammatical forms. As a result, a lexicon of 4245 stem words was built. To
achieve the meaningfulness property, the words in the lexicon were divided into
the categories below according to their model and meaning:
Adjective - e.g. düz (flat)
Negative Adjective - e.g. aptal (stupid)
Positive Adjective - e.g. zarif (elegant)
Verb - e.g. değişmek (changing)
Negative Verb - e.g. ölmek (dying)
Positive Verb - e.g. gülmek (laughing)
Noun - e.g. amaç (goal)
Negative Noun - e.g. cinayet (murder)
Positive Noun - e.g. destek (support)
Pronoun - e.g. sen (you)
Preposition - e.g. yakın (close)
Negative Feeling - e.g. pişmanlık (regret)
Positive Feeling - e.g. aşk (love)
Negative Color - e.g. siyah (black)
Positive Color - e.g. altın (gold)
Location - e.g. liman (harbor)
Object - e.g. masa (table)
Person - e.g. yargıç (judge)
Animal - e.g. kedi (cat)
Beverage - e.g. şarap (wine)
Body part - e.g. göz (eye)
Fruit - e.g. elma (apple)
Time - e.g. akşam (evening)
Vehicle - e.g. gemi (ship)
Weather - e.g. güneşli (sunny)
Plant - e.g. gül (rose)
Planet - e.g. ay (moon)
Season - e.g. kış (winter)
Gender - e.g. kadın (woman)
This categorization helps in choosing suitable words during the poem generation
process.
3.2 Creating a POS-tagged Word List
As stated before, Turkish is an agglutinative language. We need to use suffixes
to generate different grammatical forms of a word. An example is given below:
git (go)
gitti (he/she/it went)
gittim (I went)
To generate different grammatical forms for the words in the lexicon
automatically, we use the POS-tagging method. "Part-of-speech tagging (POS-tag)
is the process of marking up the words in a text with their corresponding
POS-tags which are reflecting their syntactic category"[7]. The most common tags
are listed below[7]:
+Noun: Noun or derived noun
+Adj: Adjective or derived adjective
+Adv: Adverb or derived adverb
+Verb: Verb or derived verb
+Pron: Pronoun or derived pronoun
+Conj: Conjunction
+Det: Determiner
+Postp: Postposition
+Ques: Question particle written as a separate word
+Interj: Interjection
The following morphological POS-tags are also used in this project[7]:
Minor POS: Able, Acquire, ActOf, Adamantly, AfterDoingSo, Agt, Almost, As, AsIf,
AsLongAs, Become, ByDoingSo, Card, Caus, DemonsP, Dim, Distrib, EverSince,
FeelLike, FitFor, FutPart, Hastily, InBetween, Inf, Inf1, Inf2, Inf3, JustLike,
Ly, Ness, NotState, Ord, Pass, PastPart, PCAbl, PCAcc, PCDat, PCGen, PCIns,
PCNom, Percent, PersP, PresPart, Prop, Quant, QuesP, Range, Ratio, Real, Recip,
ReflexP, Rel, Related, Repeat, Since, SinceDoingSo, Start, Stay, Time, When,
While, With, Without, Zero
Person Agreements: A1pl, A1sg, A2pl, A2sg, A3pl, A3sg
Possessive Agreements: P1pl, P1sg, P2pl, P2sg, P3pl, P3sg, Pnon
Case Markers: Abl, Acc, Dat, Equ, Gen, Ins, Loc, Nom
Polarity: Neg, Pos
The following example shows how new words with different meanings are formed
from a stem with POS-tags[8]:
1. stem: al (take, red)
2. al+Verb^DB+Verb+Pass+Pos+Past+A3sg (It was taken)
3. al+Adj^DB+Noun+Zero+A3sg+P2sg+Nom^DB+Verb+Zero+Past+A3sg (It was your red)
4. alın+Noun+A3sg+Pnon+Nom^DB+Verb+Zero+Past+A3sg (It was the forehead)
As a word list, we used Zemberek[9], which contains 2882636 Turkish words in all
their inflected forms and usages. Using a Turkish morphological disambiguator
program[10], we parsed each word in the Zemberek list into its stem and
POS-tags. A small part of this word list is given below.
1. abartmayacağım abart[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+Hm[P1sg]
2. abartmayacağız abart[Verb]+mA[Neg]+YAcAk[Fut]+YHz[A1pl]
3. abartmayacak abart[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+[Pnon]
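Each line of the word list pairs a surface form with its analysis. As a rough illustration, and not the code used in the thesis, such an analysis string could be split into a stem and its morpheme/tag groups. The function name `parse_analysis` and both regular expressions are assumptions made for this sketch.

```python
import re

# One suffix group: a separator (+ or -), an optional surface morpheme
# such as "mA", and a bracketed tag such as "[Neg]".
GROUP = re.compile(r"([+-])([^\[\]+-]*)\[([^\]]+)\]")

def parse_analysis(analysis):
    """Split 'abart[Verb]+mA[Neg]...' into (stem, stem_tag, groups)."""
    head = re.match(r"([^\[]+)\[([^\]]+)\]", analysis)
    if head is None:
        raise ValueError("not a stem[Tag] analysis: %r" % analysis)
    stem, stem_tag = head.group(1), head.group(2)
    # Everything after the stem tag is a sequence of suffix groups.
    groups = GROUP.findall(analysis[head.end():])
    return stem, stem_tag, groups

line = "abartmayacağım abart[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+Hm[P1sg]"
surface, analysis = line.split(maxsplit=1)
stem, stem_tag, groups = parse_analysis(analysis)
```

Here `groups` keeps the separator of each suffix group, so the original analysis string can be rebuilt from the parsed pieces if needed.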
3.3 Creating a Vectorised Corpus
To satisfy the meaningfulness property, we needed a trained model that provides
similarity relationships between words through their cosine distance. A corpus
of 500M tokens, gathered from Turkish news portals and Turkish web pages[11],
was used as training data.
The word2vec tool[12] was used to generate a vectorised representation of our
corpus. The following commands were used:
$ ./word2phrase -train tr-corpus.txt -output text8-phrase -threshold 500 -debug 2
$ ./word2vec -train text8-phrase -output tr-corpus.bin -cbow 0 -size 300 -window 10 -negative 0 -hs 1 -sample 1e-3 -threads 12 -binary 1
As a result, the vectorised data is exported to tr-corpus.bin. We can inspect
this data by running the following command:
$ ./distance tr-corpus.bin
We can look up the words most similar to a given word by their cosine distance.
For example, the top 5 most similar words for ağaç (tree) and Türkiye (Turkey)
are given below.
Word                      Cosine Distance
yaprak (leaf)             0.667447
çalı (bush)               0.654950
çam ağaç (pine tree)      0.649843
sarmaşık (ivy)            0.646334
Table 1: Top 5 most similar words for ağaç (tree)
Word                      Cosine Distance
ülke (country)            0.619929
ab (eu)                   0.600553
avrupa (europe)           0.575065
entegrasyon (integration) 0.458791
Table 2: Top 5 most similar words for Türkiye (Turkey)
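The scores in Tables 1 and 2 are produced by the word2vec `distance` tool, which ranks words by the cosine of the angle between their vectors, so a higher score means a more similar word. A minimal sketch of that computation follows. The three-dimensional vectors are invented for illustration (the real model uses 300 dimensions) and the helper name `cosine` is an assumption.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for this example only.
tree = [1.0, 2.0, 0.0]
leaf = [2.0, 3.0, 1.0]
car = [-1.0, 0.0, 3.0]

# Rank candidate words by similarity to "tree", highest score first.
candidates = {"leaf": leaf, "car": car}
ranked = sorted(candidates, key=lambda w: cosine(tree, candidates[w]), reverse=True)
```

With these toy vectors, `ranked` puts "leaf" before "car", mirroring how the tables list the closest words first.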
3.4 Creating Pattern Syntax for Poems
Predefined patterns are used in the poem generation process. Each pattern starts
with a pattern head:
<pattern number=(number) theme=(theme name)>
For example:
<pattern number=1 theme=aşk>
Each pattern ends with the following line:
</pattern>
Stanzas are located between these start and end tags. Each stanza may
include the following elements:
1. A model which is defined in the lexicon (e.g. animal, adjective, vehicle)
2. A POS-tag representation inside quotes
3. A custom word (e.g. ağaç (tree))
An example stanza definition:
date "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" ncolor gibi
date is a model defined in the lexicon, which contains words like sabah (morning),
akşam (evening) etc.
"[Noun]+lAr[A3pl]+[Pnon]+[Nom]" is a POS-tag definition
ncolor (negative color) is a model defined in the lexicon, which contains words
like siyah (black), koyu (dark) etc.
gibi (like, just as) is a custom word
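The three kinds of stanza element can be told apart mechanically. The sketch below labels each token of the example stanza as a model, a POS-tag or a custom word. It assumes straight double quotes around POS-tags; `tokenize_stanza` and the `MODELS` set are hypothetical names, not part of the thesis code.

```python
import re

# A token is either a double-quoted POS-tag string or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')

def tokenize_stanza(line, models):
    """Label each token of a stanza line as 'model', 'postag' or 'word'."""
    tokens = []
    for raw in TOKEN.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            tokens.append(("postag", raw[1:-1]))  # strip the quotes
        elif raw in models:
            tokens.append(("model", raw))
        else:
            tokens.append(("word", raw))
    return tokens

MODELS = {"date", "ncolor"}  # assumed subset of the lexicon's model names
stanza = 'date "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" ncolor gibi'
tokens = tokenize_stanza(stanza, MODELS)
```

A POS-tag token always applies to the model token directly before it, which is why the order of `tokens` is preserved.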
Six different poem patterns were created for our experiments (see section 4).
All of these patterns are given below:
Poem 1
1) bir season time "[Noun]+[A3sg]+SH[P3sg]+[Nom]-YDH[Verb+Past]+[A3sg]"
2) nadjective "[Adj]-YDH[Verb+Past]+m[A1sg]" işte
3) pverb "[Verb]+[Pos]-DHk[Noun+PastPart]+[A3sg]+Hn[P2sg]+[Nom]" gender
4) pverb "[Verb]+mA[Neg]+z[Aor]+[A3sg]" pronoun "[Pron]+[Pers]+[A2sg]+[Pnon]+NH[Acc]" üstelik
5) sen nadjective person
6) nverb "[Verb]+[Pos]-DHk[Noun+PastPart]+[A3sg]+Hm[P1sg]+NH[Acc]" bilmelisin
7) o nadjective yerde
8) date "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" nverb "[Verb]+[Pos]-YHncA[Adv+When]" aniden
Poem 2
1) ne diye nverb "[Verb]-Hl[Verb+Pass]+[Pos]+Hr[Aor]+sHn[A2sg]"
2) o nadjective location "[Noun]+[A3sg]+[Pnon]+DA[Loc]"
3) padjective "[Adj]-sHn[Verb+Pres+A2sg]" işte
4) padjective bir animal gibi
5) date "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" ncolor içindeydi
6) plant "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" nadjective bugün
7) nverb "[Verb]-Hn[Verb+Pass]+[Pos]+Hyor[Prog1]+YHm[A1sg]" işte
8) nadjective bir animal gibi
Poem 3
1) animal "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" pverba "[Verb]+[Pos]+DH[Past]+[A3sg]"
2) nadjective gender "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" gibi
3) weather bir location "[Noun]+[A3sg]+[Pnon]+DA[Loc]"
4) nadjective season gibi
5) vehicle "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" verb "[Verb]+mA[Neg]+z[Aor]+[A3sg]" artık
6) beverage da verb "[Verb]-Hl[Verb+Pass]+mA[Neg]+z[Aor]+[A3sg]"
7) season nverb "[Verb]+mA[Neg]+z[Aor]+[A3sg]" artık
8) pfeeling verb "[Verb]-Hl[Verb+Pass]+mA[Neg]+z[Aor]+[A3sg]"
Poem 4
1) bir gender "[Noun]+[A3sg]+SH[P3sg]+[Nom]" pverbi "[Verb]+[Pos]+Hyor[Prog1]+YDH[Past]+m[A1sg]"
2) weather location "[Noun]+[A3sg]+[Pnon]+NHn[Gen]" preposition "[Noun]+[A3sg]+SH[P3sg]+NDA[Loc]"
3) planet gibiydi bodypart "[Noun]+[A3sg]+SH[P3sg]+[Nom]"
4) nadjective season "[Noun]+[A3sg]+[Pnon]+DA[Loc]"
5) vehicle "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" verb "[Verb]+[Pos]+Hyor[Prog1]+[A3sg]+YDH[Past]"
6) weather location "[Noun]+[A3sg]+[Pnon]+NHn[Gen]" preposition "[Noun]+[A3sg]+SH[P3sg]+NDA[Loc]"
7) animal "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" verb "[Verb]+[Pos]+Hyor[Prog1]+[A3sg]+YDH[Past]"
8) nadjective "[Adj]-CA[Adj+AsIf]"
Poem 5
1) time "[Noun]+[A3sg]+[Pnon]+NHn[Gen]" nverb "[Verb]+[Pos]-DHk[Noun+PastPart]+[A3sg]+SH[P3sg]+[Nom]" bu date "[Noun]+lAr[A3pl]+[Pnon]+DA[Loc]"
2) nfeeling içinde pverbi "[Verb]+[Pos]+Hyor[Prog1]+YHm[A1sg]" pronoun "[Pron]+[Pers]+[A2sg]+[Pnon]+NH[Acc]" nadjective "[Adj]-CA[Adj+AsIf]"
3) ve sen
4) bir gender "[Noun]+[A3sg]+[Pnon]+YlA[Ins]" pverbw "[Verb]+[Pos]+Hyor[Prog1]+sHn[A2sg]" nadjective "[Adj]-CA[Adj+AsIf]"
5) ve ben
6) weather bir location "[Noun]+[A3sg]+[Pnon]+DA[Loc]-YHm[Verb+Pres+A1sg]"
7) plant "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" bile nadjective burada
Poem 6
1) weather bir nnoun "[Noun]+[A3sg]+Hn[P2sg]+[Nom]" preposition "[Noun]+[A3sg]+SH[P3sg]+NDA[Loc]"
2) vehicle "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" verb "[Verb]+[Pos]+Hyor[Prog1]+[A3sg]" date vakti
3) nfeeling var location "[Noun]+[A3sg]+[Pnon]+DA[Loc]" yine
4) ve amount beverage
5) plant "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" nverb "[Verb]+[Pos]+Hyor[Prog1]+[A3sg]" animal "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" nverb "[Verb]+[Pos]+Hyor[Prog1]+[A3sg]"
6) ama ben pverbi "[Verb]+[Pos]+Hyor[Prog1]+YHm[A1sg]" pronoun "[Pron]+[Pers]+[A2sg]+[Pnon]+NH[Acc]"
7) pverb "[Verb]+[Pos]+Hyor[Prog1]+YHm[A1sg]"
8) yine bu season "[Noun]+[A3sg]+[Pnon]+DA[Loc]"
3.5 Poem Generation Algorithm
Python was used as the programming language for the automatic poetry generation
algorithm. The pseudocode of each function is given below.
Algorithm 1 similarity_compare(word1, word2)
Input: two words from the lexicon (string)
Output: similarity score (float)
  return corpus.similarity(word1, word2)
A library named gensim[13] is used for loading and querying the vectorised
corpus in Python. When we give two words from the lexicon to Algorithm 1, it
looks them up in the tr-corpus.bin file and returns the cosine distance of these
two words.
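A possible Python form of Algorithm 1 is sketched below. It assumes a current gensim release, where binary word2vec files are read with `KeyedVectors.load_word2vec_format`; the thesis does not say which gensim calls were actually used. Passing the loaded vectors in as a parameter and returning `None` for unknown words are choices made for this sketch only.

```python
def load_corpus(path="tr-corpus.bin"):
    """Load binary word2vec vectors with gensim (imported only when called)."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)

def similarity_compare(corpus, word1, word2):
    """Algorithm 1: similarity score of two words, or None if one is unknown."""
    try:
        return float(corpus.similarity(word1, word2))
    except KeyError:
        # gensim raises KeyError for words missing from the vocabulary.
        return None
```

Loading the vectors once with `load_corpus` and reusing the returned object avoids re-reading the file on every comparison.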
Algorithm 2 find_postag(word, postag)
Input: a word (string), a POS-tag representation (string)
Output: a word (string)
  zemberek ← zemberekparse.bin
  for each line in zemberek do
    search word + postag
  end for
  return first part of the line in which word + postag is found
Algorithm 2 searches for the given word and its POS-tag representation in our
POS-tagged word list named zemberek. We can repeat the example given in
section 3.2. Let's say the given word is "abart" (exaggerate) and its POS-tag
representation is "[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+Hm[P1sg]".
This algorithm searches every line in the word list to find a line which
includes "abart[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+Hm[P1sg]".
In our example, there was a line in the following format:
abartmayacağım abart[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+Hm[P1sg]
As a result, this algorithm returns "abartmayacağım" (I won't exaggerate).
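The lookup described above can be sketched in a few lines of Python. This version takes the word list as any iterable of lines and, as an assumption that differs slightly from the "includes" test in the text, requires the analysis column to match exactly so that a shorter analysis cannot match a longer one by accident.

```python
def find_postag(word, postag, lines):
    """Algorithm 2: return the surface form whose analysis equals word + postag."""
    wanted = word + postag
    for line in lines:
        parts = line.split(maxsplit=1)  # surface form, then analysis
        if len(parts) == 2 and parts[1].strip() == wanted:
            return parts[0]
    return None  # no inflected form found for this stem and tag

# The three sample lines from section 3.2 stand in for the full word list.
WORD_LIST = [
    "abartmayacağım abart[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+Hm[P1sg]",
    "abartmayacağız abart[Verb]+mA[Neg]+YAcAk[Fut]+YHz[A1pl]",
    "abartmayacak abart[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+[Pnon]",
]
result = find_postag("abart", "[Verb]+mA[Neg]-YAcAk[Adj+FutPart]+Hm[P1sg]", WORD_LIST)
```

In a real run, `lines` would be an open file handle on the parsed word list rather than an in-memory list.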
Algorithm 3 find_random_pattern(theme)
Input: theme name (string)
Output: a pattern (string)
  lines ← poempatterns.txt
  for each line in lines do
    search given theme
  end for
  return a random pattern with given theme
As all the patterns are stored in the poempatterns.txt file, Algorithm 3 helps in
choosing a random pattern from there.
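A small sketch of this step, under the assumption that the pattern file uses the head and end tags from section 3.4: the text is grouped into patterns per theme, and one pattern of the requested theme is drawn at random. `load_patterns`, the `HEAD` expression and the sample text are illustrative names and data, not taken from the thesis code.

```python
import random
import re

HEAD = re.compile(r"<pattern number=(\d+) theme=([^>\s]+)>")

def load_patterns(text):
    """Return {theme: [pattern, ...]}, each pattern being a list of stanza lines."""
    patterns, current, theme = {}, None, None
    for line in text.splitlines():
        line = line.strip()
        head = HEAD.match(line)
        if head:
            theme, current = head.group(2), []
        elif line == "</pattern>" and current is not None:
            patterns.setdefault(theme, []).append(current)
            current = None
        elif line and current is not None:
            current.append(line)
    return patterns

def find_random_pattern(theme, patterns, rng=random):
    """Algorithm 3: pick one pattern with the given theme at random."""
    return rng.choice(patterns[theme])

SAMPLE = """<pattern number=1 theme=aşk>
sen nadjective person
o nadjective yerde
</pattern>
<pattern number=2 theme=aşk>
ve sen
ve ben
</pattern>"""
patterns = load_patterns(SAMPLE)
chosen = find_random_pattern("aşk", patterns, random.Random(0))
```

Passing a seeded `random.Random` instance, as in the last line, makes the choice repeatable when testing.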
Algorithm 4 helps in choosing the most similar word for the words in a stanza, in
order to satisfy the meaningfulness property. First, it loads the words listed
in the given model's lexicon. After that, it checks whether there is already a
selected or predefined word in the stanza. If not, there is nothing to compare,
so it just returns a random word from the lexicon. If there is, it calls
Algorithm 1 for each word listed in the model's lexicon and stores the scores in
a dictionary. It then returns the word with the highest cosine distance score,
that is, the most similar word.
Algorithm 4 select_word(model)
Input: model name which is defined in the lexicon (string)
Output: a word (string)
  lines ← text file of the model's lexicon
  if it's the first word in stanza then
    most_similar ← a random word from lexicon
    return most_similar
  else
    for each candidate_word in lines do
      result ← similarity_compare(first word in stanza, candidate_word)
      similarity_index[candidate_word] ← result
    end for
    most_similar ← word with highest similarity score
    return most_similar
  end if
For example, suppose our stanza is: "pcolor beverage"
Since there is no selected word in the stanza yet, Algorithm 4 returns a random
word for the pcolor model. Assume that it returned kırmızı (red). Now, it
compares every word in the beverage model with kırmızı by calling Algorithm 1.
The three most similar words are listed below.
Word                      Cosine Distance
şarap (wine)              0.321968951606
meyve suyu (fruit juice)  0.167722385212
bira (beer)               0.118995090746
Table 3: Top 3 most similar words for kırmızı (red) compared with the beverage
model
Since şarap (wine) has the highest cosine distance score, Algorithm 4 returns it.
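The worked example can be reproduced with a short sketch of Algorithm 4. To keep it self-contained, the similarity scores are copied from Table 3 into a lookup table instead of being read from the trained vectors, and the two-model `LEXICON` is an assumed excerpt; both are stand-ins for illustration.

```python
import random

# Scores copied from Table 3; a real run would query the trained vectors.
SCORES = {
    ("kırmızı", "şarap"): 0.321968951606,
    ("kırmızı", "meyve suyu"): 0.167722385212,
    ("kırmızı", "bira"): 0.118995090746,
}

LEXICON = {  # tiny assumed excerpt of two lexicon models
    "pcolor": ["kırmızı"],
    "beverage": ["şarap", "meyve suyu", "bira"],
}

def similarity_compare(word1, word2):
    """Stand-in for Algorithm 1, backed by the table above."""
    return SCORES.get((word1, word2), 0.0)

def select_word(model, first_word=None, rng=random):
    """Algorithm 4: random word for an empty stanza, else the most similar one."""
    candidates = LEXICON[model]
    if first_word is None:
        return rng.choice(candidates)
    similarity_index = {c: similarity_compare(first_word, c) for c in candidates}
    return max(similarity_index, key=similarity_index.get)

first = select_word("pcolor")                        # nothing to compare with yet
second = select_word("beverage", first_word=first)   # compared against `first`
```

Because the excerpt lists only one positive color, `first` is always kırmızı here, and `second` is then the beverage with the highest score in the table.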
Algorithm 5 create_poem(theme)
Input: a theme (string)
Output: stanzas (string)
  pattern ← find_random_pattern(theme)
  for each line in pattern do
    for each word in line do
      if the word is defined in models and it's followed by a postag then
        selected_word ← select_word(word)
        final_word ← find_postag(selected_word, postag)
        stanza ← stanza + final_word
      else if the word is defined in models and it's not followed by a postag then
        selected_word ← select_word(word)
        stanza ← stanza + selected_word
      else
        stanza ← stanza + word
      end if
    end for
    print stanza
  end for
Algorithm 5 creates the poem by returning the final form of the stanzas. First,
it calls Algorithm 3 to find a random pattern. Once a pattern is returned,
Algorithm 5 checks every line of the pattern and every word in each line. It
checks whether the word is defined as a lexicon model (see section 3.1). If so,
it checks whether the word has a POS-tag representation. If it has, it first
calls Algorithm 4 to get a word, then calls Algorithm 2 to find the final word,
and adds that to the stanza. If it has no POS-tag representation, it calls
Algorithm 4 and adds the returned word to the stanza. If the word is not defined
as a lexicon model, Algorithm 5 adds it to the stanza directly. After the
process is finished, it prints the stanza.
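The control flow described above can be sketched as follows. This is a simplified illustration, not the thesis implementation: Algorithms 4 and 2 are passed in as functions and replaced by hard-coded stand-ins, POS-tags are assumed to sit in straight double quotes, and only the first model word of a line is used as the comparison word.

```python
import re

TOKEN = re.compile(r'"[^"]*"|\S+')  # quoted POS-tag or plain token

def create_poem(pattern, models, select_word, find_postag):
    """Algorithm 5: turn pattern lines into finished stanza lines."""
    poem = []
    for line in pattern:
        tokens = TOKEN.findall(line)
        stanza, first_word, i = [], None, 0
        while i < len(tokens):
            token = tokens[i]
            if token in models:
                word = select_word(token, first_word)
                if first_word is None:
                    first_word = word  # later words are compared with this stem
                nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
                if nxt.startswith('"'):
                    # Model followed by a POS-tag: inflect the chosen stem.
                    word = find_postag(word, nxt.strip('"'))
                    i += 1  # skip the POS-tag token
                stanza.append(word)
            else:
                stanza.append(token)  # custom word, copied as-is
            i += 1
        poem.append(" ".join(stanza))
    return poem

# Stand-ins for Algorithms 4 and 2, hard-coded for this example only.
CHOICES = {"date": "sabah", "ncolor": "siyah"}
FORMS = {("sabah", "[Noun]+lAr[A3pl]+[Pnon]+[Nom]"): "sabahlar"}

poem = create_poem(
    ['date "[Noun]+lAr[A3pl]+[Pnon]+[Nom]" ncolor gibi'],
    models=set(CHOICES),
    select_word=lambda model, first: CHOICES[model],
    find_postag=lambda word, tag: FORMS[(word, tag)],
)
```

Returning the lines as a list, rather than printing them, makes the function easier to test; printing each element reproduces the behaviour of the pseudocode.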
The following flow chart represents the overall process.
Figure 1: Flow chart representation of overall process
4 Experiments
As mentioned in the first chapter, the aim of this project is to create a
computer program that can generate poetry which is indistinguishable from
human-written poetry. This experiment was prepared to measure the success of
ROMTU at this aim. It follows a methodology similar to Bolock (2014).
The test was carried out in person with 146 participants. All participants were
engineering students of Istanbul Bilgi University. Participants were fully aware
of the purpose of the test. Each participant answered the following questions
before the test:
Age
Gender
Department
Interest in Poetry
The following graphs show the demographics of the participants:
Figure 2: Distribution by Age
Figure 2 shows the age distribution of participants. 62.32% of them are 18-21
years old, 34.93% are 22-25 years old and only 2.75% are 26-28 years old or
older.
Figure 3: Distribution by Gender
Figure 3 shows the gender distribution of participants.
Figure 4: Distribution by Department
Figure 4 shows the department distribution. 53.42% of participants are computer
engineering students while 46.58% are students of other engineering
departments.
Figure 5: Distribution by Interest in Poetry
Figure 5 shows the distribution of participants by interest in poetry. 39.72% of
them say that they are interested in poetry while 60.28% say that they are not
interested.
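As a quick consistency check on the reported percentages, they can be converted back into participant counts. The sketch assumes that every one of the 146 participants falls into exactly one group per demographic question; the counts are derived here and are not stated in the text.

```python
PARTICIPANTS = 146

shares = {  # percentages as reported in the text
    "age 18-21": 62.32, "age 22-25": 34.93, "age 26+": 2.75,
    "computer engineering": 53.42, "other engineering": 46.58,
    "interested in poetry": 39.72, "not interested": 60.28,
}

# Nearest whole number of participants behind each percentage.
counts = {label: round(PARTICIPANTS * pct / 100) for label, pct in shares.items()}

age_total = counts["age 18-21"] + counts["age 22-25"] + counts["age 26+"]
```

Under that assumption, the rounded counts within each demographic question add up to the 146 participants.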
4.1 Questions
The experiment consists of 6 questions. In each question, a computer-generated
poem and a human-written poem by a famous Turkish poet are given to
participants. Participants were asked to identify the human-written poem in
every question. Also, participants ranked each poem from 0 to 5 (0: weak,
5: strong) according to the following criteria:
Rhyme
Message
Usage of Language
The poems used in the test are given below with their question number:
Question 1:
Computer-generated poem
Bir kızı öpüyordum
Kapalı koğuşun kenarında
Güneş gibiydi gözü
Vefasız sonbaharda
Otobüsler doluşuyordu
Karlı atölyenin yakınında
Kanaryalar ötüyordu
Tuhafça
Denize Serenad - Rüştü Onur[14]
Neyim varsa
Sana bırakmalıyım deniz
Sende geçmeli mevsimlerim
Sende çiçek açmalı ağaçlarım
Sende yaşamalıyım deniz
Asi ve hür
Sende ölmeliyim
Bulutlara bakarak
Question 2:
Şair - Muzaffer Tayyip Uslu[15]
Siz bakmayın bana
Ben şairim
Denizin üzerinde yürüyebilirim
Islık çalarak
Hatta ellerim cebimde
Bir de sigara bulunsun
İsterseniz ağzımda
Computer-generated poem
Puslu bir matemin kenarında
Vapurlar yanaşıyor sabah vakti
Acı var memlekette yine
Ve azıcık rakı
Ama ben seviyorum seni
Özlüyorum
Yine bu sonbaharda
Question 3:
Ayrılış - Ahmet Muhip Dıranas[16]
Gün batıyor, gün batıyor
Veda etsem hepinize
Ufuk kanlı bir denize
Dönüyor sizi bıraksam
Gün batıyor gün batıyor
Evimi eşyamı paramı
Nem varsa yaksam ve bir an
Kaybetsem kara bir duman
Arkasında hafızamı
Computer-generated poem
Bir yaz haftasıydı
Mecalsizdim işte
Özlediğin kadın
Özlemez seni üstelik
Sen vefasız çirkef
Utandığımı bilmelisin
O berbat yerde
Şafaklar sararınca aniden
Question 4:
Computer-generated poem
Saatin vedalaştığı bu sabahlarda
Acı içinde seviyorum seni burukça
Ve sen
Bir erkekle evleniyorsun yaşlıca
Ve ben
Puslu bir çöldeyim
Çiçekler bile yabani burada
Belki Bir Gün - Nazım Hikmet Ran[17]
Belki bir gün maviler giyer
Deniz olurum
Belki bir gün yeşiller giyer
Ağaç olurum
Hiç belli olmaz sevgilim
Belki bir gün beyazlar giyer
Senin olurum
Question 5:
Kardelen - Hikmet Elp[18]
Erken doğum yaptı bahar
Kar çiçeği doğdu
Karların altından
Artık bahar yakın derken
Erken doğum yaptı bahar
Kış bahara gebe
Bahar çiçeklere
Oysa kış bahardan önce
Kardeleni doğurdu
Computer-generated poem
Ne diye incinirsin
O yetim kabirde
Özgürsün işte
Yiğit bir şahin gibi
Akşamlar siyah içindeydi
Çiçekler miskin bugün
Parçalanıyorum işte
Yılgın bir gelincik gibi
Question 6:
Computer-generated poem
Kuzular sırnaştı
Çekingen kadınlar gibi
Kasvetli bir köşkte
Susuz yaz gibi
Trenler devrilmez artık
Şarap da damıtılmaz
Kış ürpermez artık
Aşk anlatılmaz
Seni Anlamak - Muzaffer Tayyip Uslu[14]
Seni anlamak için
Sen olmak gerekir
Seni anlıyorum
Çünkü yüreğindeki senim
Beni korkutan şey
Ölüm değil
Beni korkutan insanlar
4.2 Results and Analysis
In this section, results are shown and analyzed based on the answers of
participants. First, each question is evaluated separately. After that, overall
results are shown. We also evaluate each result across the different
demographic groups.
4.2.1 Question 1
Figure 6: Total Result for Question 1
Figure 6 shows that most of the participants identified the human-written poem.
The computer-generated poem has a concrete, realistic theme while the
human-written poem has more abstract details. This is one possible reason for
the success ratio of participants. Also, the computer-generated poem has
difficulty carrying its message to participants (Figure 8). This is another
possible reason for the result.
Figure 7: Results on Demographic Distributions for Question 1
Figure 7 shows the results of different groups of participants. In line with
the total result, each group successfully identified the human-written poem,
with different success ratios. It is possible to make the following assumptions:
Female participants were 12.59% more successful than male participants at
identifying the human-written poem. As a possible explanation, we can state that
female participants find abstract poems more humanistic. Also, we can say that
when the message of the poem is not strong (Figure 8), female participants can
identify the computer-generated poem better.
Participants who say that they are interested in poetry have a higher success
ratio than the not-interested group. The difference is 19.12%. Since
participants who are interested in poetry are more familiar with poem structures
and themes, they found the second poem more humanistic.
Computer engineering students identified the human-written poem with a 9.96%
difference compared with the other engineering students. Since computer
engineering students are more familiar with Natural Language Processing, they
have more chance of spotting the algorithmic language style.
Figure 8: Total Rank for Question 1
Figure 8 shows the total ranking of the computer-generated poem in Question 1 by
its features. It has 3.15 and 3.18 points out of 5 on the rhyme and usage of
language features. But on the message feature, it has 2.62 points out of 5,
which can be considered less successful than the other features. This poem has
difficulty carrying its message to participants. This may be the main reason for
the success ratio of participants.
4.2.2 Question 2
Figure 9: Total Result for Question 2
Figure 9 shows that most of the participants failed to distinguish the
human-written poem from the computer-generated poem. In this question, the
computer-generated poem has more abstract details than the human-written poem.
The computer-generated poem also contains more metaphors. As in Question 1, this
is one possible explanation of the result.
Figure 10: Results on Demographic Distributions for Question 2
Figure 10 shows the results of different groups of participants. In line with
the total result, each group failed to identify the human-written poem, with
different ratios.
In this question, we do not see much difference between the success ratios of
male and female participants. The difference is 0.63%. Since the
computer-generated poem has more abstract details, female participants find this
poem more humanistic. But male participants also have almost the same success
ratio as female participants. As we can see in Figure 11, the computer-generated
poem has a low rank on its rhyme feature. This is a candidate explanation for
this result.
Participants who say that they are interested in poetry have a higher success
ratio than the not-interested group. The difference is 12.98%. Since
participants who are interested in poetry are more familiar with poem structures
and themes, they found the second poem more humanistic.
In this question, computer engineering students failed to identify the
human-written poem with a 10.38% difference compared with the other engineering
students. As we said in Question 1, computer engineering students are more
familiar with NLP and this kind of text generation algorithm. We can say that
whether the message of the poem is strong or not, computer engineering students
try to spot algorithmic structures, and they thought that Şair - Muzaffer Tayyip
Uslu[15] follows a more algorithmic structure than the computer-generated poem.
Figure 11: Total Rank for Question 2
Figure 11 shows the total ranking of the computer-generated poem in Question 2
by its features. It has 3.42, 3.69 and 3.66 points out of 5 on the rhyme,
message and usage of language features. This poem has higher ranks than the one
in Question 1 on every feature. This may be the main reason for the results.
4.2.3 Question 3
Figure 12: Total Result for Question 3
Figure 12 shows that 58.21% of the participants identified the human-written
poem while 41.79% of them failed. In this question, both poems follow a similar
theme. Because of that, we cannot analyze them in terms of abstract versus
concrete themes.
Figure 13: Results on Demographic Distributions for Question 3
26
Figure 13 shows results of different groups of participants. The result differs
from results of first two questions.
In this question, male participants are more successful than female
participants. The difference is 5.49%. This poem also has lower ranks than
the poems in the first two questions (Figure 14). In particular, the rhyme
feature has a very low rank. We can hypothesize that male participants
identify a computer-generated poem more efficiently if it has a lower rank
on the rhyme feature than on the other features. The words used in the
poems might also have influenced the participants' decisions.
Interestingly, participants who say they are interested in poetry have a
lower success ratio than the not-interested group. The difference is
22.21%. Since our poem generation algorithm imitates the styles of real
poets, they might have found the structure of the computer-generated poem
familiar.
We don't see much difference in success between computer engineering
students and other engineering students. The difference is 2.73%.
Figure 14: Total Rank for Question 3
Figure 14 shows the total ranking of the computer-generated poem in
Question 3 by its features. It scores 2.85, 3.12 and 3.06 points out of 5 on
the rhyme, message and usage of language features. This poem ranks lower
than the Question 2 poem on every feature. We can state that when a
computer-generated poem has high feature rankings, participants are more
likely to fail to identify the human-written poem.
4.2.4 Question 4
Figure 15: Total Result for Question 4
Figure 15 shows that 54.8% of the participants identified the human-written
poem while 45.2% of them failed. The failure ratio is 3.41 percentage points
higher than in Question 3. As in Question 3, both poems follow a similar
theme in this question. Because of that, we can't analyze them in terms of
abstract versus concrete themes. However, this poem has higher feature ranks
than the Question 3 poem. This difference is one possible explanation of
the result.
Figure 16: Results on Demographic Distributions for Question 4
Figure 16 shows the results for different groups of participants. The results
parallel those of Question 3.
In this question, male participants are more successful than female
participants at identifying the human-written poem. The difference is 6.05%.
This poem has higher ranks than the Question 3 poem, and its rhyme feature
again ranks lower than its other features. Our assumption was that male
participants identify a computer-generated poem more efficiently if it has
a lower rank on the rhyme feature than on the other features. But since
this gap is not as large as in Question 3, we can't accept it as the main
reason for this question. As noted earlier, the words used in the poems
might have influenced the participants' decisions.
As in Question 3, participants who say they are interested in poetry have a
lower success ratio than the not-interested group. The difference is
13.66%. Since our poem generation algorithm imitates the styles of real
poets, they might have found the structure of the computer-generated poem
familiar.
In this question, computer engineering students failed to identify the
human-written poem more often than the other engineering students, by a
margin of 11.65%. As noted for Questions 1 and 2, computer engineering
students are more familiar with NLP and this kind of text generation
algorithm. Whether or not the message of the poem is strong, computer
engineering students try to spot algorithmic structures, and they thought
that Belki Bir Gün - Nazım Hikmet Ran[17] follows a more algorithmic
structure than the computer-generated poem.
Figure 17: Total Rank for Question 4
Figure 17 shows the total ranking of the computer-generated poem in
Question 4 by its features. It scores 3.5, 3.52 and 3.58 points out of 5 on
the rhyme, message and usage of language features. This poem ranks higher
than the Question 3 poem on every feature. Again, we can state that when a
computer-generated poem has high feature rankings, participants are more
likely to fail to identify the human-written poem.
4.2.5 Question 5
Figure 18: Total Result for Question 5
Figure 18 shows that 50.69% of the participants successfully identified the
human-written poem while 49.31% failed to identify it. In this question, the
computer-generated poem has slightly more abstract details than the
human-written poem, even though they follow similar themes. As in Questions
1 and 2, this is one possible explanation of the result.
Figure 19: Results on Demographic Distributions for Question 5
Figure 19 shows results of different groups of participants.
In this question, female participants are more successful than male
participants at identifying the human-written poem. The difference is 6.05%.
Since the computer-generated poem has slightly more abstract details, female
participants find this poem more humanistic.
Participants who say they are interested in poetry have a higher success
ratio than the not-interested group. The difference is 10.3%. Since our poem
generation algorithm imitates the styles of real poets, they might have
found the structure of the computer-generated poem familiar.
In this question, computer engineering students failed to identify the
human-written poem more often than the other engineering students, by a
margin of 15.28%. As noted for Questions 1 and 2, computer engineering
students are more familiar with Natural Language Processing and this kind
of text generation algorithm. Whether or not the message of the poem is
strong, computer engineering students try to spot algorithmic structures,
and they thought that Kardelen - Hikmet Elp[18] follows a more algorithmic
structure than the computer-generated poem.
Figure 20: Total Rank for Question 5
Figure 20 shows the total ranking of the computer-generated poem in
Question 5 by its features. It scores 3.26, 3.17 and 3.17 points out of 5 on
the rhyme, message and usage of language features. This poem ranks lower
than the Question 4 poem on every feature. Overall, however, participants
identified the human-written poem with a lower success rate than in
Question 4. Because of that, it's hard to say these features are the only
reason for the result of this question. The words used in the poems might
have influenced the participants' decisions.
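The comparison across Questions 3 to 5 can be tabulated directly from the values reported above. The following is a minimal sketch added for illustration and is not part of ROMTU; the names `reported` and `mean_rank` are hypothetical, and averaging the three feature ranks is an assumption made here, not a measure used in the experiment.

```python
# Values copied from the text: (rhyme, message, usage of language) ranks out of 5,
# and the percentage of participants who failed to pick the human-written poem.
reported = {
    "Question 3": {"ranks": (2.85, 3.12, 3.06), "failure_pct": 41.79},
    "Question 4": {"ranks": (3.50, 3.52, 3.58), "failure_pct": 45.20},
    "Question 5": {"ranks": (3.26, 3.17, 3.17), "failure_pct": 49.31},
}


def mean_rank(ranks):
    """Average of the three feature ranks (hypothetical summary measure)."""
    return sum(ranks) / len(ranks)


# Map each question to (mean rank rounded to 2 decimals, failure percentage).
summary = {
    question: (round(mean_rank(values["ranks"]), 2), values["failure_pct"])
    for question, values in reported.items()
}

for question, (rank, failure) in summary.items():
    print(f"{question}: mean rank {rank:.2f}, failure {failure:.2f}%")
```

Under this summary, the Question 5 poem has a lower mean rank than the Question 4 poem but a higher failure rate, which is why the feature ranks alone do not explain the result.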
4.2.6 Question 6
Figure 21: Total Result for Question 6
Figure 21 shows that most of the participants failed to identify the
human-written poem. In this question, the computer-generated poem has more
abstract details than the human-written poem. The computer-generated poem
also contains more metaphors. As in Question 2, this is one possible
explanation of the result.
Figure 22: Results on Demographic Distributions for Question 6
Figure 22 shows the results for different groups of participants. In line
with the total result, every group failed to identify the human-written
poem, each at a different ratio.
In this question, female participants are more successful than male
participants at identifying the human-written poem. The difference is 8.73%.
Since the computer-generated poem has more abstract details, female
participants find this poem more humanistic.
Participants who say they are interested in poetry have a higher success
ratio than the not-interested group. The difference is 2.74%.
In this question, computer engineering students failed to identify the
human-written poem more often than the other engineering students, by a
margin of 15.54%. As noted in the previous questions, computer engineering
students are more familiar with Natural Language Processing and this kind
of text generation algorithm. Whether or not the message of the poem is
strong, computer engineering students try to spot algorithmic structures,
and they thought that Seni Anlamak - Muzaffer Tayyip Uslu[15] follows a
more algorithmic structure than the computer-generated poem.
Figure 23: Total Rank for Question 6
Figure 23 shows the total ranking of the computer-generated poem in
Question 6 by its features. It scores 3.57, 3.33 and 3.52 points out of 5 on
the rhyme, message and usage of language features.
4.2.7 Total Result
Figure 24: Total Result for All Questions
Figure 24 shows the overall results of our experiment. According to these
results, 51.37% of the participants successfully identified the
human-written poem while 48.63% of them failed. As a result, we can say that
poems generated by our algorithm are nearly indistinguishable from
human-written poems.
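One way to read "nearly indistinguishable" is to ask how far the overall success rate is from the 50% expected under pure guessing. The sketch below is an illustration added here, not an analysis performed in the experiment. It assumes 146 independent answers (the number of participants) and uses the normal approximation to the binomial distribution; the function name `z_against_chance` is hypothetical.

```python
import math


def z_against_chance(success_rate, n, chance=0.5):
    """z-score of an observed proportion against a chance level,
    using the normal approximation to the binomial."""
    standard_error = math.sqrt(chance * (1 - chance) / n)
    return (success_rate - chance) / standard_error


# 51.37% overall success is reported in the text;
# n = 146 is an assumption (one pooled answer per participant).
z = z_against_chance(0.5137, 146)
print(f"z = {z:.2f}")
```

Under these assumptions the z-score is well below the conventional 1.96 threshold, so the overall success rate would not be distinguishable from guessing at the 5% level. A different choice of n, such as counting every answer to every question, would change the value.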
Figure 25: Results on Demographic Distributions for All Questions
Figure 25 shows the total results of different groups of participants.
In the total result, we don't see much difference in success between male
and female participants. Female participants are 2.3% more successful at
identifying human-written poems. According to the results, we can say that
gender is not a very influential trait in identifying human-written poetry.
However, we can make these assumptions about the decisions of different
genders:
1. When a poem contains more abstract details and metaphors, female
participants find it more humanistic.
2. When a poem follows a strong rhyme structure, male participants find it
more humanistic.
We also don't see much difference in success between the participants who
say they are interested in poetry and those who do not. The difference is
1.54%. According to this result, we can say that interest in poetry has
little effect on distinguishing a human-written poem from a
computer-generated poem.
According to the total result, computer engineering students failed to
identify the human-written poem more often than the other engineering
students, by a margin of 7.6%. The likely main reason was mentioned in the
previous questions. Since computer engineering students are more familiar
with language generation algorithms, they try to find the footprints of an
algorithm in the poems instead of checking their features. That is why they
failed at a higher ratio.
Figure 26: Total Rank for All Questions
Figure 26 shows the total ranking of the computer-generated poems by
their features. They score 3.28, 3.25 and 3.39 points out of 5 on the rhyme,
message and usage of language features. The usage of language feature has a
higher rank than the other features. The rhyme and message features got
almost the same score. As a result, we can say that our poetry generation
algorithm closely satisfies the three properties of poetry covered in the
Introduction section.
5 Conclusion
As described in the first section, the aim of this project is to create a
computer program that can generate poems which are indistinguishable
from human-written poems. As a result, our automatic poetry generation
program ROMTU was able to mislead 48.63% of participants. We can
state that this program nearly accomplished its aim.
Despite its success, ROMTU has some weak points, which are listed below.
Fixing them may increase its success rate in the future.
1. Small lexicon: The lexicon used to generate poems was not very large.
The success ratio could be increased by using a larger lexicon.
2. Doesn't follow a rhyme scheme: ROMTU doesn't follow a predefined rhyme
scheme such as abba, abab etc. The success ratio could be increased by
implementing a rhyme scheme.
3. Similarity analysis isn't aware of following stanzas: In our algorithm,
word similarity is compared separately for each line. The overall message
of the poem might be better if the similarity analysis covered all stanzas.
4. Lack of literary properties: ROMTU isn't aware of literary devices such
as alliteration and anaphora. It also isn't aware of words with opposite
meanings such as black-white, morning-night etc.
5. Low performance: Our development machine has a 2.7 GHz Intel Core i5
processor with 16 GB of memory. ROMTU uses 99.21% of the CPU and 2.51 GB of
RAM while it is running. Since it uses so many resources, it's hard to
deploy it as a web application.
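The second weak point, following a predefined rhyme scheme, could be approached by checking candidate stanzas against a pattern such as abab. The following is a minimal sketch under strong assumptions and is not ROMTU's code: it treats two lines as rhyming when their last words end in the same `suffix_len` letters, which is only a rough stand-in for Turkish rhyme. The names `line_ending` and `follows_scheme` are hypothetical, and the stanza is made-up placeholder text.

```python
def line_ending(line, suffix_len=2):
    """Last `suffix_len` letters of the final word, lowercased (crude rhyme key).

    Note: Python's str.lower() is not Turkish-aware (I/İ), so this is approximate.
    """
    words = line.strip().split()
    if not words:
        return ""
    last_word = "".join(ch for ch in words[-1] if ch.isalpha())
    return last_word.lower()[-suffix_len:]


def follows_scheme(stanza, scheme, suffix_len=2):
    """True if lines sharing a scheme letter share an ending and
    lines with different scheme letters have different endings."""
    if len(stanza) != len(scheme):
        return False
    seen = {}  # scheme letter -> rhyme key
    for line, letter in zip(stanza, scheme):
        key = line_ending(line, suffix_len)
        if letter in seen:
            if seen[letter] != key:
                return False
        elif key in seen.values():
            return False  # a new letter must introduce a new ending
        else:
            seen[letter] = key
    return True


# Placeholder lines for illustration only, not generated by ROMTU.
stanza = ["deniz mavi", "gece uzun", "gökyüzü mavi", "yol uzun"]
print(follows_scheme(stanza, "abab"))
print(follows_scheme(stanza, "abba"))
```

A generator could use such a check as a filter, rejecting candidate lines whose ending does not match the letter required by the scheme at that position.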
6 References
[1] Oed.com. (2016). poetry, n. : Oxford English Dictionary. [online]
Available at: http://www.oed.com/view/Entry/146552 [Accessed 11 Jun.
2016].
[2] Manurung, H., Ritchie, G., Thompson, H. (2000). Towards A
Computational Model Of Poetry Generation.
[3] Manurung, H. (2003). An evolutionary algorithm approach to poetry
generation.
[4] Denizyuret.com. (2006). Deniz Yuret's Homepage: Learning
Morphological Disambiguation Rules for Turkish. [online] Available at:
http://www.denizyuret.com/2006/06/learning-morphological-disambiguation.html
[Accessed 11 Jun. 2016].
[5] Bolock, A. (2014). Automatic Poetry Generation Using CHR.
[6] Oliveira, H. (2009). Automatic generation of poetry: an overview.
[7] Altunyurt, L. and Orhan, Z. (2006). PART OF SPEECH TAGGER
FOR TURKISH.
[8] Ehsani, R., Alper, M., Eryiğit, G. and Adalı, E. (2012).
Disambiguating Main POS tags for Turkish.
[9] GitHub. (2016). ahmetaa/zemberek-nlp. [online] Available at:
https://github.com/ahmetaa/zemberek-nlp [Accessed 28 May 2016].
[10] Denizyuret.com. (2006). Deniz Yuret's Homepage: Learning
Morphological Disambiguation Rules for Turkish. [online] Available at:
http://www.denizyuret.com/2006/06/learning-morphological-disambiguation.html
[Accessed 28 May 2016].
[11] Sak, H., Güngör, T. and Saraçlar, M. (2008). Turkish Language
Resources: Morphological Parser, Morphological Disambiguator and Web
Corpus.
[12] Code.google.com. (2016). [online] Available at:
https://code.google.com/archive/p/word2vec [Accessed 29 May 2016].
[13] Radimrehurek.com. (2016). gensim: topic modelling for humans.
[online] Available at: https://radimrehurek.com/gensim/ [Accessed 29 May
2016].
[14] Serenad, D. (2016). Denize Serenad Şiiri - Rüştü Onur. [online]
Antoloji.com. Available at: http://www.antoloji.com/denize-serenad-siiri/
[Accessed 11 Jun. 2016].
[15] Siirsanatedebiyat.com. (2016). Muzaffer Tayyip Uslu | Siir Sanat
Edebiyat. [online] Available at:
http://www.siirsanatedebiyat.com/muzaffer-tayyip-uslu-bir-sevda-siiri/
[Accessed 11 Jun. 2016].
[16] Antoloji.com. (2016). Ayrılış Şiiri - Ahmet Muhip Dıranas. [online]
Available at: http://www.antoloji.com/ayrilis-2-siiri/ [Accessed 11
Jun. 2016].
[17] Neokur.com. (2016). BELKİ BİR GÜN | Nazım Hikmet Ran Şiirleri.
[online] Available at: http://www.neokur.com/siir/1610/belki-bir-gun
[Accessed 11 Jun. 2016].
[18] Antoloji.com. (2016). Kardelen Şiiri - Hikmet Elp. [online]
Available at: http://www.antoloji.com/kardelen-139-siiri/ [Accessed 11 Jun.
2016].