Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL, pages 328–334,
New York, June 2006. © 2006 Association for Computational Linguistics
Learning Morphological Disambiguation Rules for Turkish
Deniz Yuret
Dept. of Computer Engineering
Koç University
İstanbul, Turkey
dyuret@ku.edu.tr
Ferhan Türe
Dept. of Computer Engineering
Koç University
İstanbul, Turkey
fture@ku.edu.tr
Abstract
In this paper, we present a rule-based model for morphological disambiguation of Turkish. The rules are generated by a novel decision list learning algorithm using supervised training. Morphological ambiguity (e.g. lives = live+s or life+s) is a challenging problem for agglutinative languages like Turkish, where close to half of the words in running text are morphologically ambiguous. Furthermore, a word can take an unlimited number of suffixes, so the number of possible morphological tags is unlimited. We attempted to cope with these problems by training a separate model for each of the 126 morphological features recognized by the morphological analyzer. The resulting decision lists independently vote on each of the potential parses of a word, and the final parse is selected based on our confidence in these votes. The accuracy of our model (96%) is slightly above the best previously reported results, which use statistical models. For comparison, when we train a single decision list on full tags instead of using separate models for each feature, we get 91% accuracy.
1 Introduction
Morphological disambiguation is the task of selecting the correct morphological parse for a given word in a given context. The possible parses of a word are generated by a morphological analyzer. In Turkish, close to half the words in running text are morphologically ambiguous. Below is a typical word, “masalı”, with three possible parses.
masal+Noun+A3sg+Pnon+Acc (= the story)
masal+Noun+A3sg+P3sg+Nom (= his story)
masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (= with tables)
Table 1: Three parses of the word “masalı”
The first two parses start with the same root, masal (= story, fable), but the following +ı suffix is interpreted as the accusative marker in one case and as third person possessive agreement in the other. The third parse starts with a different root, masa (= table), followed by a derivational suffix +lı (= with) which turns the noun into an adjective. The symbol ^DB represents a derivational boundary and splits the parse into chunks called inflectional groups (IGs).¹
We will use the term feature to refer to individual morphological features like +Acc and +With, the term IG to refer to groups of features split by derivational boundaries (^DB), and the term tag to refer to the sequence of IGs following the root.
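The three terms (feature, IG, tag) can be illustrated with a small parser for the analyzer's output strings. This is a sketch that assumes the string notation of Table 1, with `+` separating features and `^DB` marking a derivational boundary; the function name `split_parse` is hypothetical and not part of the analyzer's interface.

```python
def split_parse(parse: str):
    """Split an analyzer parse into root, inflectional groups (IGs), features and tag.

    Assumes the Table 1 notation: '+' separates features, '^DB' marks a
    derivational boundary between IGs.
    """
    first, *derived = parse.split("^DB")
    root, _, first_ig = first.partition("+")   # root is everything before the first '+'
    igs = [first_ig] + [ig.lstrip("+") for ig in derived]
    features = ["+" + f for ig in igs for f in ig.split("+")]
    tag = "^DB".join("+" + ig for ig in igs)   # the sequence of IGs following the root
    return root, igs, features, tag


if __name__ == "__main__":
    root, igs, features, tag = split_parse("masa+Noun+A3sg+Pnon+Nom^DB+Adj+With")
    print(root)      # masa
    print(igs)       # ['Noun+A3sg+Pnon+Nom', 'Adj+With']
    print(features)  # ['+Noun', '+A3sg', '+Pnon', '+Nom', '+Adj', '+With']
```

A parse without a derivational boundary, such as masal+Noun+A3sg+Pnon+Acc, simply yields a single IG.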
Morphological disambiguation is a useful first step for higher level analysis of any language, but it is especially critical for morphologically rich languages like Turkish, Czech, Hungarian, and Finnish. These languages have a relatively free constituent order, and syntactic relations are partly determined by morphological features. Many applications, including syntactic parsing, word sense disambiguation, text-to-speech synthesis, and spelling correction, depend on accurate analyses of words.
¹ See Oflazer et al. (1999) for a detailed description of the morphological features used in this paper.
An important qualitative difference between part of speech tagging in English and morphological disambiguation in an agglutinative language like Turkish is the number of possible tags that can be assigned to a word. Typical English tag sets include fewer than a hundred tag types representing syntactic and morphological information. The number of potential morphological tags in Turkish is theoretically unlimited. We have observed more than ten thousand tag types in our training corpus of a million words. The high number of possible tags poses a data sparseness challenge for the typical machine learning approach, somewhat akin to what we observe in word sense disambiguation.
One way out of this dilemma could be to ignore the detailed morphological structure of the word and focus on determining only the major and minor parts of speech. However, Oflazer et al. (1999) observe that modifier words in Turkish can depend on any one of the inflectional groups of a derived word. For example, in “mavi masalı oda” (= the room with a blue table) the adjective “mavi” (= blue) modifies the noun root “masa” (= table) even though the final part of speech of “masalı” is an adjective. Therefore, the final part of speech and inflection of a word do not carry sufficient information to identify the syntactic dependencies it is involved in; one needs the full morphological analysis.
Our approach to the data sparseness problem is to consider each morphological feature separately. Even though the number of potential tags is unlimited, the number of morphological features is small: the Turkish morphological analyzer we use (Oflazer, 1994) produces tags that consist of 126 unique features. For each unique feature f, we take the subset of the training data in which at least one of the candidate parses of each instance contains f. We then split this subset into positive and negative examples depending on whether the correct parse contains f. These examples are used to learn rules using the Greedy Prepend Algorithm (GPA), a novel decision list learner.
To predict the tag of a word in unseen text, we first use the morphological analyzer to generate all its possible parses. The decision lists are then used to predict the presence or absence of each of the features contained in the candidate parses. The results are combined probabilistically, taking into account the accuracy of each decision list, to select the best parse. The resulting tagging accuracy is 96% on a hand-tagged test set.
A more direct approach would be to train a single decision list using the full tags as the target classification. Given a word in context, such a decision list assigns a complete morphological tag instead of predicting individual morphological features. As such, it does not need the output of a morphological analyzer and should be considered a tagger rather than a disambiguator. For comparison, such a decision list was built, and its accuracy was determined to be 91% on the same test set.
The main reason we chose to work with decision lists and the GPA algorithm is their robustness to irrelevant or redundant attributes. The input to the decision lists includes the suffixes of all possible lengths and character type information within a five-word window. Each instance ends up with 40 attributes on average, which are highly redundant and mostly irrelevant. GPA is able to sort out the relevant attributes automatically and build a fairly accurate model. Our experiments with Naive Bayes resulted in significantly worse performance. Typical statistical approaches include the tags of the previous words as inputs to the model. GPA was able to deliver good performance without using the previous tags as inputs, because it was able to extract equivalent information implicit in the surface attributes. Finally, unlike most statistical approaches, the resulting models of GPA are human-readable and open to interpretation, as Section 3.1 illustrates.
The next section reviews related work. Section 3 introduces decision lists and the GPA training algorithm. Section 4 presents the experiments and the results.
2 Related Work
There is a large body of work on morphological disambiguation and part of speech tagging using a variety of rule-based and statistical approaches. In the rule-based approach, a large number of hand-crafted rules are used to select the correct morphological parse or POS tag of a given word in a given context (Karlsson et al., 1995; Oflazer and Tür, 1997). In the statistical approach, a hand-tagged corpus is used to train a probabilistic model, which is then used to select the best tags in unseen text (Church, 1988; Hakkani-Tür et al., 2002). Examples of statistical and machine learning approaches that have been used for tagging include transformation-based learning (Brill, 1995), memory-based learning (Daelemans et al., 1996), and maximum entropy models (Ratnaparkhi, 1996). It is also possible to train statistical models using unlabeled data with the expectation maximization algorithm (Cutting et al., 1992). Van Halteren (1999) gives a comprehensive overview of syntactic word-class tagging.
Previous work on morphological disambiguation of inflectional or agglutinative languages includes unsupervised learning for Hebrew (Levinger et al., 1995), maximum entropy modeling for Czech (Hajič and Hladká, 1998), a combination of statistical and rule-based disambiguation methods for Basque (Ezeiza et al., 1998), and transformation-based tagging for Hungarian (Megyesi, 1999).
Early work on Turkish used a constraint-based approach with hand-crafted rules (Oflazer and Kuruöz, 1994). A purely statistical morphological disambiguation model was recently introduced (Hakkani-Tür et al., 2002). To counter the data sparseness problem, the morphological parses are split across their derivational boundaries and certain independence assumptions are made in the prediction of each inflectional group.
A combination of three ideas makes our approach unique in the field: (1) the use of decision lists and a novel learning algorithm that combine statistical and rule-based techniques, (2) the treatment of each individual feature separately to address the data sparseness problem, and (3) reliance on surface attributes alone, with no dependence on previous tags.
3 Decision Lists
We introduce a new method for morphological disambiguation based on decision lists. A decision list is an ordered list of rules where each rule consists of a pattern and a classification (Rivest, 1987). In our application, the pattern specifies surface attributes of the words surrounding the target, such as suffixes and character types (e.g. upper vs. lower case, use of punctuation, digits). The classification indicates the presence or absence of a morphological feature for the center word.
3.1 A Sample Decision List
We will explain the rules and their patterns using the sample decision list in Table 2, trained to identify the feature +Det (determiner).
Rule   Class   Pattern
1      1       W=~çok R1=+DA
2      1       L1=~pek
3      0       W=+AzI
4      0       W=~çok
5      1       (none)
Table 2: A five rule decision list for +Det
The value in the class column is 1 if word W should have a +Det feature and 0 otherwise. The pattern column describes the required attributes of the words surrounding the target word for the rule to match. The last (default) rule has no pattern, matches every instance, and assigns them +Det. This default rule captures the behavior of the majority of the training instances, which had +Det in their correct parse. Rule 4 indicates a common exception: the frequently used word “çok” (meaning very) should not be assigned +Det by default, because “çok” can also be used as an adjective, an adverb, or a postposition. Rule 1 introduces an exception to rule 4: if the right neighbor R1 ends with the suffix +DA (the locative suffix), then “çok” should receive +Det. The meanings of the various symbols in the patterns are described in Table 3.
When the decision list is applied to a window of words, the rules are tried in order from the most specific (rule 1) to the most general (rule 5). The first rule that matches is used to predict the classification of the center word. The last rule acts as a catch-all; if none of the other rules have matched, this rule assigns the instance a default classification. For example, the five rule decision list given above classifies the middle word in “pek çok alanda” (matches rule 1) and “pek çok insan” (matches rule 2) as +Det, but “insan çok daha” (matches rule 4) as not +Det.
Symbol   Meaning                   Group   Characters
W        target word               A       [ae]
L1, L2   left neighbors            I       [ıiuü]
R1, R2   right neighbors           D       [dt]
==       exact match               B       [bp]
=~       case insensitive match    C       [cç]
=+       is a suffix of            K       [kgğ]
Table 3: Symbols used in the rule patterns. Capital letters on the right represent character groups useful in identifying phonetic variations of certain suffixes, e.g. the locative suffix +DA can surface as +de, +da, +te, or +ta depending on the root word ending.
One way to interpret a decision list is as a sequence of if-then-else constructs familiar from programming languages. Another way is to see the last rule as the default classification, the previous rule as specifying a set of exceptions to the default, the rule before that as specifying exceptions to these exceptions, and so on.
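The first-match procedure described above can be sketched in a few lines of Python. The sketch assumes that a word window is represented as a set of attribute strings in the notation of Tables 2 and 3, and it computes only the exact, case-insensitive and suffix attributes (no character types, no digit normalization). The names `GROUPS`, `word_attributes`, `window_attributes`, `DET_RULES` and `classify` are hypothetical and are not the authors' code.

```python
# Character groups from Table 3; they are applied to suffix attributes only.
GROUPS = str.maketrans({"a": "A", "e": "A", "ı": "I", "i": "I", "u": "I", "ü": "I",
                        "d": "D", "t": "D", "b": "B", "p": "B", "c": "C", "ç": "C",
                        "k": "K", "g": "K", "ğ": "K"})


def word_attributes(pos: str, word: str) -> set:
    """Exact (==), case-insensitive (=~) and suffix (=+) attributes of one word."""
    lower = word.lower()
    mapped = lower.translate(GROUPS)
    attrs = {f"{pos}=={word}", f"{pos}=~{lower}"}
    attrs.update(f"{pos}=+{mapped[i:]}" for i in range(len(mapped)))
    return attrs


def window_attributes(words: list, center: int) -> set:
    """Attributes of the window L2 L1 W R1 R2 around words[center]."""
    attrs = set()
    for name, i in zip(("L2", "L1", "W", "R1", "R2"), range(center - 2, center + 3)):
        if 0 <= i < len(words):
            attrs |= word_attributes(name, words[i])
    return attrs


# Table 2 as (class, pattern) pairs, most specific rule first; the empty
# pattern of the default rule is a subset of every attribute set.
DET_RULES = [
    (1, {"W=~çok", "R1=+DA"}),
    (1, {"L1=~pek"}),
    (0, {"W=+AzI"}),
    (0, {"W=~çok"}),
    (1, set()),
]


def classify(rules, attrs: set) -> int:
    """Return the class of the first rule whose whole pattern is present."""
    for cls, pattern in rules:
        if pattern <= attrs:
            return cls
    raise ValueError("decision list has no matching (default) rule")


for phrase in ("pek çok alanda", "pek çok insan", "insan çok daha"):
    print(phrase, "->", classify(DET_RULES, window_attributes(phrase.split(), 1)))
```

With these definitions “pek çok alanda” and “pek çok insan” come out as 1 (+Det) and “insan çok daha” as 0, matching the walk-through in the text: “alanda” maps to AlAnDA, whose suffixes include DA, while “daha” maps to DAhA, whose suffixes do not.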
3.2 The Greedy Prepend Algorithm (GPA)
To learn a decision list from a given set of training examples, the general approach is to start with a default rule or an empty decision list and keep adding the best rule to cover the unclassified or misclassified examples. The new rules can be added to the end of the list (Clark and Niblett, 1989), the front of the list (Webb and Brkic, 1993), or other positions (Newlands and Webb, 2004). Other design decisions include the criteria used to select the “best rule” and how to search for it.
The Greedy Prepend Algorithm (GPA) is a variant of the PREPEND algorithm (Webb and Brkic, 1993). It starts with a default rule that matches all instances and classifies them using the most common class in the training data. Then it keeps prepending the rule with the maximum gain to the front of the growing decision list until no further improvement can be made. The algorithm can be described as follows:
GPA(data)
    dlist ← NIL
    default-class ← MOST-COMMON-CLASS(data)
    rule ← [if TRUE then default-class]
    while GAIN(rule, dlist, data) > 0
        do dlist ← PREPEND(rule, dlist)
           rule ← MAX-GAIN-RULE(dlist, data)
    return dlist
The gain of a candidate rule in GPA is defined as the increase in the number of correctly classified instances in the training set as a result of prepending the rule to the existing decision list. This is in contrast with the original PREPEND algorithm, which uses the less direct Laplace preference function (Webb and Brkic, 1993; Clark and Boswell, 1991).
To find the next rule with the maximum gain, GPA uses a heuristic search algorithm. Candidate rules are generated by adding a single new attribute to the pattern of each rule already in the decision list. The candidate with the maximum gain is prepended to the decision list, and the process is repeated until no more positive-gain rules can be found. Note that if the best possible rule has more than one extra attribute compared to the existing rules in the decision list, a suboptimal rule will be selected. The original PREPEND uses an admissible search algorithm, OPUS, which is guaranteed to find the best possible candidate (Webb, 1995), but we found OPUS to be too slow to be practical for a problem of this scale.
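The training loop, the gain criterion, and the one-attribute candidate search can be sketched as follows. This is a brute-force illustration under stated assumptions, not the authors' implementation: each training instance is a `(frozenset_of_attributes, label)` pair, a decision list is a Python list of `(pattern, class)` pairs with the most specific rule first, and `gain`, `max_gain_rule` and `gpa` mirror the names in the pseudocode but are our own. Every candidate rescans the whole data set, so it suits toy inputs only, not the million-instance training runs reported in this paper.

```python
from collections import Counter


def predict(dlist, attrs):
    """First-match prediction; None if no rule matches (e.g. an empty list)."""
    for pattern, cls in dlist:
        if pattern <= attrs:
            return cls
    return None


def gain(rule, dlist, data):
    """Increase in correctly classified training instances if rule is prepended."""
    pattern, cls = rule
    delta = 0
    for attrs, label in data:
        if pattern <= attrs:  # only instances the new rule matches can change
            delta += int(cls == label) - int(predict(dlist, attrs) == label)
    return delta


def max_gain_rule(dlist, data):
    """Extend each existing pattern by one attribute; return the best positive-gain rule."""
    candidates = set()
    for pattern, _ in dlist:
        for attrs, _ in data:
            if pattern <= attrs:
                candidates.update(pattern | {a} for a in attrs - pattern)
    classes = sorted({label for _, label in data})
    best, best_gain = None, 0
    for pattern in sorted(candidates, key=sorted):  # sorted for deterministic tie-breaking
        for cls in classes:
            g = gain((pattern, cls), dlist, data)
            if g > best_gain:
                best, best_gain = (pattern, cls), g
    return best


def gpa(data):
    dlist = []
    default_class = Counter(label for _, label in data).most_common(1)[0][0]
    rule = (frozenset(), default_class)  # "if TRUE then default-class"
    while rule is not None and gain(rule, dlist, data) > 0:
        dlist.insert(0, rule)  # prepend
        rule = max_gain_rule(dlist, data)
    return dlist


if __name__ == "__main__":
    toy = [(frozenset({"x"}), 1), (frozenset({"x"}), 1), (frozenset({"y"}), 1),
           (frozenset({"z"}), 0), (frozenset({"z"}), 0), (frozenset({"z", "w"}), 1)]
    for pattern, cls in gpa(toy):
        print(sorted(pattern), "->", cls)
```

On this toy set the sketch learns the default rule first, then the exception `z -> 0`, then an exception to that exception, `w -> 1`, after which every toy instance is classified correctly and no candidate has positive gain.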
We picked GPA for the morphological disambiguation problem because we found it to be fast and fairly robust to the existence of irrelevant or redundant attributes. The average training instance has 40 attributes describing the suffixes of all possible lengths and character type information in a five-word window. Most of this information is redundant or irrelevant to the problem at hand. The number of distinct attributes is on the order of the number of distinct word-forms in the training set. Nevertheless, GPA is able to process a million training instances for each of the 126 unique morphological features and produce a model with state-of-the-art accuracy in about two hours on a regular desktop PC.²
² Pentium 4 CPU, 2.40 GHz.
4 Experiments and Results
In this section we present the details of the data, the training and testing procedures, the surface attributes used, and the accuracy results.
4.1 Training Data
documents 2383
sentences 50673
tokens 948404
parses 1.76 per token
IGs 1.33 per parse
features 3.29 per IG
unique tokens 111467
unique tags 11084
unique IGs 2440
unique features 126
ambiguous tokens 399223 (42.1%)
Table 4: Statistics for the training data
Our training data consists of about 1 million words of semi-automatically disambiguated Turkish news text. For each of the 126 unique morphological features, we used the subset of the training data in which instances have the given feature in at least one of their generated parses. We then split this subset into positive and negative examples depending on whether the correct parse contains the given feature. A decision list specific to that feature is created using GPA based on these examples.
Some relevant statistics for the training data are given in Table 4.
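The per-feature example selection described above amounts to a filter followed by a labeling step. The sketch assumes each training instance is stored as an `(attributes, candidate_parses, correct_parse)` triple, with every parse given as a list of feature strings like those in Table 1; `feature_examples` is a hypothetical helper whose output is the `(attributes, label)` format a decision list learner would consume.

```python
def feature_examples(instances, feature):
    """Binary training examples for one morphological feature.

    Keeps only instances where at least one candidate parse contains the
    feature; the label is 1 if the correct parse contains it, else 0.
    """
    examples = []
    for attrs, candidates, correct in instances:
        if any(feature in parse for parse in candidates):
            examples.append((attrs, int(feature in correct)))
    return examples


if __name__ == "__main__":
    # The three parses of "masalı" from Table 1; here the correct one is "his story".
    parses = [["+Noun", "+A3sg", "+Pnon", "+Acc"],
              ["+Noun", "+A3sg", "+P3sg", "+Nom"],
              ["+Noun", "+A3sg", "+Pnon", "+Nom", "+Adj", "+With"]]
    data = [(frozenset({"W=~masalı"}), parses, parses[1])]
    for f in ("+Acc", "+P3sg", "+With", "+Verb"):
        print(f, feature_examples(data, f))
```

Here +Acc and +With each yield one negative example, +P3sg one positive example, and +Verb none, because no candidate parse of “masalı” contains it.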
4.2 Input Attributes
Once the training data is selected for a particular morphological feature, each instance is represented by surface attributes of five words centered around the target word. We tried larger window sizes but observed no significant improvement. The attributes computed for each word in the window consist of the following:
1. The exact word string (e.g. W==Ali’nin)
2. The lowercase version (e.g. W=~ali’nin). Note: all digits are replaced by 0’s at this stage.
3. All suffixes of the lowercase version (e.g. W=+n, W=+In, W=+nIn, W=+’nIn, etc.). Note: certain characters are replaced with capital letters representing the character groups listed in Table 3. These groups help the algorithm recognize different forms of a suffix created by the phonetic rules of Turkish: for example, the locative suffix +DA can surface as +de, +da, +te, or +ta depending on the ending of the root word.
4. Attributes indicating the types of characters at various positions of the word (e.g. Ali’nin would be described with W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWER-LAST)
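The four attribute types can be computed for a five-word window roughly as follows. This is a sketch under assumptions: the paper names only the UPPER, LOWER and APOS character types, so the DIGIT and PUNCT labels here are hypothetical; both the ASCII and the typographic apostrophe are treated as APOS; window positions outside the sentence are simply skipped, which the paper does not specify; and the function names are ours.

```python
import re

# Character groups from Table 3, used to normalize suffix attributes.
GROUPS = str.maketrans({"a": "A", "e": "A", "ı": "I", "i": "I", "u": "I", "ü": "I",
                        "d": "D", "t": "D", "b": "B", "p": "B", "c": "C", "ç": "C",
                        "k": "K", "g": "K", "ğ": "K"})


def char_type(ch: str) -> str:
    """Coarse character class; DIGIT and PUNCT are assumed labels."""
    if ch in "'’":
        return "APOS"
    if ch.isdigit():
        return "DIGIT"
    if ch.isupper():
        return "UPPER"
    if ch.islower():
        return "LOWER"
    return "PUNCT"


def word_attributes(pos: str, word: str) -> set:
    lower = re.sub(r"\d", "0", word.lower())         # 2. lowercase, digits -> 0
    mapped = lower.translate(GROUPS)                  # 3. character groups for suffixes
    attrs = {f"{pos}=={word}", f"{pos}=~{lower}"}     # 1. exact, 2. lowercase
    attrs.update(f"{pos}=+{mapped[i:]}" for i in range(len(mapped)))
    if word:                                          # 4. character types by position
        attrs.add(f"{pos}={char_type(word[0])}-FIRST")
        attrs.add(f"{pos}={char_type(word[-1])}-LAST")
        attrs.update(f"{pos}={char_type(ch)}-MID" for ch in word[1:-1])
    return attrs


def window_attributes(words: list, center: int) -> set:
    """Union of the attributes of the five words L2 L1 W R1 R2."""
    attrs = set()
    for name, i in zip(("L2", "L1", "W", "R1", "R2"), range(center - 2, center + 3)):
        if 0 <= i < len(words):
            attrs |= word_attributes(name, words[i])
    return attrs


if __name__ == "__main__":
    for attr in sorted(word_attributes("W", "Ali'nin")):
        print(attr)
```

For Ali'nin this produces exactly the example attributes listed above, including W=+'nIn and the four character-type attributes.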
Each training instance is represented by 40 attributes on average. The GPA procedure is responsible for picking the attributes that are relevant to the decision. No dictionary information is required or used; therefore, the models are fairly robust to unknown words. One potentially useful source of attributes is the tags assigned to previous words, which we plan to experiment with in future work.
4.3 The Decision Lists
At the conclusion of training, 126 decision lists of the form given in Table 2 are produced. The number of rules in each decision list ranges from 1 to 6145. The longer decision lists are typically for part of speech features, e.g. distinguishing nouns from adjectives, and contain rules specific to lexical items. The average number of rules is 266. To estimate the accuracy of each decision list, we split the one million word data into training, validation, and test portions using the ratio 4:1:1. The training set accuracy of the decision lists is consistently above 98%. The test set accuracies of the 126 decision lists range from 80% to 100%, with the average at 95%. Table 5 gives the six features with test set accuracy below 89%; these are the most difficult to disambiguate.
4.4 Correct Tag Selection
To evaluate the candidate tags, we need to combine the results of the decision lists. We assume that the presence or absence of each feature is an independent event with a probability determined by the test set accuracy of the corresponding decision list. For example, if the +P3pl decision list predicts YES, we assume that the +P3pl feature is present with probability 0.8408 (see Table 5). If the +Fut decision list predicts NO, we assume the +Fut feature is present with probability 1 - 0.8511 = 0.1489. To avoid zero probabilities, we cap the test set accuracies at 99%.
Accuracy   Feature    Description
87.89%     +Acquire   To acquire (noun)
86.18%     +PCIns     Postposition subcat.
85.11%     +Fut       Future tense
84.08%     +P3pl      3rd person plural possessive
80.79%     +Neces     Must
79.81%     +Become    To become (noun)
Table 5: The six features with the worst test set accuracy.
Each candidate tag indicates the presence of certain features and the absence of others. The probability of the tag being correct under our independence assumption is the product of the probabilities for the presence and absence of each of the 126 features as determined by our decision lists. For efficiency, one can neglect the features that are absent from all the candidate tags, because their contribution will not affect the comparison.
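The combination rule can be sketched as follows. The product of probabilities is replaced by a sum of logarithms, which selects the same tag while avoiding numerical underflow when many small factors are multiplied. The dictionaries `predictions` (each decision list's yes/no vote) and `accuracy` (its test set accuracy), as well as the function names, are assumptions for illustration. As in the text, features absent from every candidate are skipped, since they contribute the same factor to every candidate.

```python
import math


def presence_prob(predicted_present: bool, accuracy: float, cap: float = 0.99) -> float:
    """Probability that a feature is present, given its decision list's vote."""
    acc = min(accuracy, cap)  # cap at 99% so a perfect list never yields exactly 0 or 1
    return acc if predicted_present else 1.0 - acc


def best_parse(candidates, predictions, accuracy):
    """Pick the candidate (a set of features) with the highest joint probability.

    Features are treated as independent events, as assumed in the text.
    """
    relevant = set().union(*candidates)  # features absent from all candidates are skipped

    def log_score(parse):
        total = 0.0
        for f in relevant:
            p = presence_prob(predictions[f], accuracy[f])
            total += math.log(p if f in parse else 1.0 - p)
        return total

    return max(candidates, key=log_score)


if __name__ == "__main__":
    print(presence_prob(True, 0.8408))   # +P3pl votes YES
    print(presence_prob(False, 0.8511))  # +Fut votes NO, about 0.1489
```

Applied to the three parses of “masalı” with votes that favor +P3sg and +Nom and reject +Acc, +Adj and +With, `best_parse` returns the “his story” parse.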
4.5 Results
The final evaluation of the model was performed on a test data set of 958 instances. The possible parses for each instance were generated by the morphological analyzer, and the correct one was picked manually. 40% of the instances were ambiguous, and these had 3.9 parses on average. The disambiguation accuracy of our model was 95.82%. The 95% confidence interval for the accuracy is [0.9457, 0.9708].
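The paper does not say how this interval was computed. As a cross-check, the standard normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 - p)/n) with p = 0.9582 and n = 958, gives approximately [0.9455, 0.9709], within about 0.0002 of the reported bounds. Other methods (e.g. Wilson or exact binomial intervals) give somewhat different numbers, so this sketch is only a plausibility check.

```python
import math


def wald_interval(accuracy: float, n: int, z: float = 1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(accuracy * (1.0 - accuracy) / n)  # standard error of the accuracy
    return accuracy - z * se, accuracy + z * se


low, high = wald_interval(0.9582, 958)
print(f"[{low:.4f}, {high:.4f}]")
```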
An analysis of the mistakes in the test data shows that at least some of them are due to incorrect tags in our training data. The training data was semi-automatically generated and thus contained some errors. Based on hand evaluation of the differences between the training data tags and the GPA-generated tags, we estimate the accuracy of the training data to be below 95%. We ran two further experiments to see if we could improve on the initial results.
In our first experiment, we used our original model to re-tag the training data. The re-tagged training data was used to construct a new model. The resulting accuracy on the test set increased to 96.03%, not a statistically significant improvement.
In our second experiment, we used only unambiguous instances for training. Decision list training requires negative examples, so we selected random unambiguous instances as positive and negative examples for each feature. The accuracy of the resulting model on the test set was 82.57%. The problem with selecting unambiguous instances is that certain common disambiguation decisions are never represented during training. More careful selection of negative examples and a sophisticated bootstrapping mechanism may still make this approach workable.
Finally, we decided to see if our decision lists could be used for tagging rather than disambiguation, i.e. given a word in context, decide on the full tag without the help of a morphological analyzer. Even though the number of possible tags is unlimited, the most frequent 1000 tags cover about 99% of the instances. A single decision list trained with the full tags was able to achieve 91.23% accuracy using 10000 rules. This is a promising result and will be explored further in future work.
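The coverage figure for the most frequent tags is a cumulative frequency count over the gold tags. A minimal sketch, assuming the gold tags of the training tokens are available as a list of strings; the function name and the toy tags are ours.

```python
from collections import Counter


def top_k_coverage(tags, k):
    """Fraction of tokens whose tag is among the k most frequent tag types."""
    counts = Counter(tags)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(tags)


if __name__ == "__main__":
    toy_tags = ["+Noun+A3sg+Pnon+Nom"] * 5 + ["+Verb+Pos+Past+A3sg"] * 3 + ["+Adj"] * 2
    print(top_k_coverage(toy_tags, 1), top_k_coverage(toy_tags, 2))  # 0.5 0.8
```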
5 Contributions
We have presented an automated approach to learning morphological disambiguation rules for Turkish using a novel decision list induction algorithm, GPA. The only inputs to the rules are the surface attributes of a five-word window. The approach can be generalized to other agglutinative languages, which share the common challenge of a large number of potential tags. Our approach to resolving the data sparseness problem caused by the large number of tags is to generate a separate model for each morphological feature. The predictions for individual features are combined probabilistically, based on the accuracy of each model, to select the best tag. We were able to achieve an accuracy of around 96% using this approach.
Acknowledgments
We would like to thank Kemal Oflazer of Sabancı University for providing us with the Turkish morphological analyzer, training and testing data for disambiguation, and valuable feedback.
References
Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543–565.
Church, K. W. (1988). A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the Second Conference on Applied Natural Language Processing, pages 136–143.
Clark, P. and Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Kodratoff, Y., editor, Machine Learning – Proceedings of the Fifth European Conference (EWSL-91), pages 151–163, Berlin. Springer-Verlag.
Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3:261–283.
Cutting, D., Kupiec, J., Pedersen, J., and Sibun, P. (1992). A practical part-of-speech tagger. In Proceedings of the 3rd Conference on Applied Natural Language Processing, pages 133–140.
Daelemans, W. et al. (1996). MBT: A memory-based part of speech tagger-generator. In Ejerhed, E. and Dagan, I., editors, Proceedings of the Fourth Workshop on Very Large Corpora, pages 14–27.
Ezeiza, N. et al. (1998). Combining stochastic and rule-based methods for disambiguation in agglutinative languages. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (COLING/ACL98), pages 379–384.
Hajič, J. and Hladká, B. (1998). Tagging inflective languages: Prediction of morphological categories for a rich, structured tagset. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (COLING/ACL98), pages 483–490, Montreal, Canada.
Hakkani-Tür, D. Z., Oflazer, K., and Tür, G. (2002). Statistical morphological disambiguation for agglutinative languages. Computers and the Humanities, 36:381–410.
Karlsson, F., Voutilainen, A., Heikkilä, J., and Anttila, A. (1995). Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter.
Levinger, M., Ornan, U., and Itai, A. (1995). Learning morpho-lexical probabilities from an untagged corpus with an application to Hebrew. Computational Linguistics, 21(3):383–404.
Megyesi, B. (1999). Improving Brill's POS tagger for an agglutinative language. In Fung, P. and Zhou, J., editors, Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 275–284, College Park, Maryland, USA.
Newlands, D. and Webb, G. I. (2004). Alternative strategies for decision list construction. In Proceedings of the Fourth Data Mining Conference (DM IV 03), pages 265–273.
Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing, 9(2):137–148.
Oflazer, K., Hakkani-Tür, D. Z., and Tür, G. (1999). Design for a Turkish treebank. In Proceedings of the Workshop on Linguistically Interpreted Corpora, EACL 99, Bergen, Norway.
Oflazer, K. and Kuruöz, İ. (1994). Tagging and morphological disambiguation of Turkish text. In Proceedings of the 4th Applied Natural Language Processing Conference, pages 144–149. ACL.
Oflazer, K. and Tür, G. (1997). Morphological disambiguation by voting constraints. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL97, EACL97), Madrid, Spain.
Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2:229–246.
van Halteren, H., editor (1999). Syntactic Wordclass Tagging. Text, Speech and Language Technology. Kluwer Academic Publishers.
Webb, G. I. (1995). OPUS: An efficient admissible algorithm for unordered search. JAIR, 3:431–465.
Webb, G. I. and Brkic, N. (1993). Learning decision lists by prepending inferred rules. In Proceedings of the AI 93 Workshop on Machine Learning and Hybrid Systems, pages 6–10, Melbourne.
... Rule based methods apply hand-crafted rules in order to select the correct morphological analyses or eliminate incorrect ones (Oflazer and Kuruöz 1994;Oflazer and Tur 1996;Daybelge and Cicekli 2007). Yüret and Türe (2006) proposed a decision list learning algorithm for extraction of (Oflazer and Tur 1996;Kutlu and Cicekli 2013). We propose a deep neural architecture followed by the Viterbi algorithm for morphological disambiguation of words in a sentence. ...
... Moreover, compared to an agglu-tinative language such as Turkish, English words can take on a limited number of word forms and part-of-speech tags. Yüret and Türe (2006) observe that more than ten thousand tag types exists in a corpus comprised of a million Turkish words. Thus, due to the high number of possible tags and the number of possible analyses in languages with productive morphology, morphological disambiguation is quite different from part-of-speech tagging in English. ...
... They integrate their generative model to NLP applications such as language modeling, word alignment and morphological disambiguation and obtain state-of-the-art results for Russian morphological disambiguation. Yüret and Türe (2006) extract Turkish morphological disambiguation rules using a decision list learner, Greedy Prepend Algorithm (GPA), and they achieve 95.8% accuracy on manually disambiguated data consisting of around 1K words. Megyesi (1999) adapt a transformation based syntactic rule learner (Brill 1995) for Hungarian and Hajic (1998) extend his work for Czech and five other languages. ...
Preprint
Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological analysis of a word. Morphological disambiguation is important because it generally is one of the first steps of natural language processing and its performance affects subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many of the state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained through deep learning models. However, applying deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation we also present results for French and German in order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resource. In the experiments, we achieve 84.12, 88.35 and 93.78 morphological disambiguation accuracy among the ambiguous words for Turkish, German and French respectively.
... Machine learning and deep learning approaches are commonly applied for Turkish MD problem, as well. In [28], a rule-based approach utilizing decision lists is used, named as the Greedy prepend algorithm. The features of surface form of the ambiguous words are used in the method together with the preceding and following words, and this approach achieved 91.23% accuracy without the help of a morphological analyzer and 95.82% accuracy when the morphological analysis is carried out by an external morphological analyzer. ...
... The first one is TrMor2018 [9], which has been used in recent MD studies. The second one is the previous version of the first data set, TrMor2006 [28]. The last dataset is an unambiguous dataset. ...
... Both TrMor2006 [28] and TrMor2018 [9] are semi-automatically generated morphology data sets. Since the TrMor2006 training data set has limited accuracy, a new data set is required. ...
... Rule-based methods apply hand-crafted rules in order to select the correct morphological analyses or eliminate incorrect ones (Oflazer and Kuruöz 1994; Oflazer and Tur 1996; Daybelge and Cicekli 2007). Yüret and Türe (2006) proposed a decision list learning algorithm for the extraction of … (Oflazer and Tur 1996; Kutlu and Cicekli 2013). We propose a deep neural architecture followed by the Viterbi algorithm for morphological disambiguation of words in a sentence. ...
... Moreover, compared to an agglutinative language such as Turkish, English words can take on only a limited number of word forms and part-of-speech tags. Yüret and Türe (2006) observe that more than ten thousand tag types exist in a corpus of one million Turkish words. Thus, due to the large number of possible tags and the number of possible analyses in languages with productive morphology, morphological disambiguation is quite different from part-of-speech tagging in English. ...
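A back-of-the-envelope illustration of why the full tag set grows so quickly: with a simplified, assumed inventory of inflections for a non-derived noun, the number of full tags is the product of the slot sizes, while the number of distinct feature values is only their sum. This is the motivation for training one model per feature rather than one over full tags. The slot inventory below is an illustrative assumption, not an analyzer's complete tag set; stacked derivational suffixes would multiply the full-tag count further.

```python
from itertools import product

# Illustrative (simplified) inflectional slots for a non-derived Turkish noun;
# this inventory is an assumption for the sketch, not a full analyzer tag set.
SLOTS = {
    "agreement": ["A3sg", "A3pl"],
    "possessive": ["Pnon", "P1sg", "P2sg", "P3sg", "P1pl", "P2pl", "P3pl"],
    "case": ["Nom", "Acc", "Dat", "Loc", "Abl", "Gen", "Ins"],
}

# Full tags combine one value per slot, so the label space multiplies.
full_tags = ["+".join(("Noun",) + combo) for combo in product(*SLOTS.values())]
# Per-feature labels only add up across slots.
feature_values = {v for values in SLOTS.values() for v in values}

print(len(full_tags), len(feature_values))  # 98 16
```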
Article
... The exact procedure used for the disambiguation is unclear. The corpus was introduced by Hakkani-Tür et al. (2002) and made publicly available by later studies on morphological disambiguation (Dayanık et al., 2018; Sak et al., 2011; Yüret & Türe, 2006). Another fully manually disambiguated dataset, consisting of 25,098 words, is reported in Kutlu and Çiçekli (2013); it can be obtained from the authors via email. ...
Article
This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications in Turkish Linguistics and Natural Language Processing.
... The exact procedure used for the disambiguation is unclear. The corpus was introduced by Hakkani-Tür et al. (2002) and made publicly available by later studies on morphological disambiguation (Yüret and Türe, 2006; Sak et al., 2011; Dayanık et al., 2018). Another fully manually disambiguated dataset, consisting of 25 098 words, is reported in Kutlu and Çiçekli (2013); it is not publicly available but can be obtained from the authors via email. ...
Preprint
Article
In this paper, a contrastive learning approach for morphological disambiguation (MD) using large language models (LLMs) is presented. A contrastive loss function is introduced for training, which reduces the distance between the correct analysis and contextual embeddings while maintaining a margin between correct and incorrect embeddings. One aim of the paper is to analyze the effects of fine-tuning an LLM for MD in morphologically complex languages (MCLs), with special reference to low-resource languages such as Kazakh, as well as Turkish. Another goal is to compare various distance measures for this contrastive loss function, aiming to achieve better results when performing disambiguation by computing the distance between the context and analysis embeddings. Existing approaches to morphological disambiguation, such as HMM-based and feature-engineering approaches, have limitations in modeling long-term dependencies and in handling large, sparse tagsets. The proposed approach mitigates these challenges by leveraging LLMs, achieving better accuracy on ambiguous and out-of-vocabulary (OOV) tokens without relying on additional features. Experiments were conducted on three datasets for two MCLs, Kazakh and Turkish, the former being a typical low-resource language. The results show that the proposed approach with contrastive loss improves MD performance when integrated with knowledge from large language models.
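The abstract describes a loss that pulls the correct analysis toward the context embedding while keeping incorrect analyses at least a margin away, with interchangeable distance measures. A minimal sketch in plain Python, assuming the classic margin-based contrastive form (squared distance for the positive pair, squared hinge for negatives); the paper's exact loss, distances and embedding model are not given here, so the names and formula below are assumptions. A real implementation would compute this on LLM embeddings inside an autodiff framework.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def contrastive_loss(context: Vector, correct: Vector, incorrect: List[Vector],
                     margin: float = 1.0, dist: Distance = euclidean) -> float:
    # Pull the correct analysis embedding toward the context embedding...
    loss = dist(context, correct) ** 2
    # ...and penalise any incorrect analysis that lies closer than `margin`.
    for wrong in incorrect:
        loss += max(0.0, margin - dist(context, wrong)) ** 2
    return loss


def pick_analysis(context: Vector, candidates: List[Vector],
                  dist: Distance = euclidean) -> int:
    # At inference time, choose the candidate analysis closest to the context.
    return min(range(len(candidates)), key=lambda i: dist(context, candidates[i]))


print(contrastive_loss([1.0, 0.0], [1.0, 0.0], [[1.0, 0.5]]))  # 0.25
```

Swapping `dist` for `cosine_distance` changes the geometry; since cosine distance lies in [0, 2], the margin would need to be rescaled accordingly.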
Article
In this paper, we introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank), along with the guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process that we employed was shaped and implemented by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework as well as our recent efforts for unifying the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, the BOUN Treebank is the largest Turkish UD treebank. It contains a total of 9761 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other treebanks in Turkish. Our results demonstrate that the unification of the Turkish annotation scheme and the introduction of a more comprehensive treebank lead to improved performance with regard to dependency parsing.
Article
This work surveys well-known approaches to building decision lists. Some novel variations of strategies based on default rules for the most common class and insertion of new rules before the default rule are presented. These are expected to speed up the construction of the decision list as well as to shorten the resulting list. These strategies and a testing regime have been implemented, and some empirical studies have been done to compare the strategies. Experimental results are presented and interpreted. We show that all strategies deliver decision lists of comparable accuracy. However, two techniques are shown to deliver this accuracy with lists composed of significantly fewer rules than alternative strategies. Of these, one also demonstrates significant computational advantages. The prepending strategy is also demonstrated to produce decision lists which are as much as an order of magnitude shorter than those produced by CN2.
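The prepending strategy can be sketched as follows: start with a single default rule that predicts the most common class, then repeatedly place in front of the list the rule that most increases training accuracy, stopping when no rule helps. This is a simplified illustration in the same spirit as the Greedy Prepend Algorithm mentioned above, assuming single-feature tests and a toy dataset; it is not the implementation evaluated in the survey, and all names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Tuple

Example = Tuple[FrozenSet[str], str]   # (active features, class label)
Rule = Tuple[Optional[str], str]       # (feature test, or None for the default rule; class)


def classify(rules: List[Rule], feats: FrozenSet[str]) -> str:
    # The first rule whose test fires decides; the default rule always fires.
    for test, label in rules:
        if test is None or test in feats:
            return label
    raise ValueError("decision list has no default rule")


def n_correct(rules: List[Rule], data: List[Example]) -> int:
    return sum(classify(rules, feats) == label for feats, label in data)


def learn_prepend(data: List[Example], max_rules: int = 50) -> List[Rule]:
    # Start from a single default rule predicting the most common class.
    default = Counter(label for _, label in data).most_common(1)[0][0]
    rules: List[Rule] = [(None, default)]
    features = sorted({f for feats, _ in data for f in feats})
    labels = sorted({label for _, label in data})
    while len(rules) < max_rules:
        current = n_correct(rules, data)
        best_gain, best_rule = 0, None
        for f in features:
            for y in labels:
                gain = n_correct([(f, y)] + rules, data) - current
                if gain > best_gain:
                    best_gain, best_rule = gain, (f, y)
        if best_rule is None:
            break  # no single-feature rule improves training accuracy
        rules.insert(0, best_rule)  # prepend: the newest rule is tested first
    return rules


toy = [
    (frozenset({"suf=s", "prev=the"}), "Noun"),
    (frozenset({"suf=s", "prev=he"}), "Verb"),
    (frozenset({"suf=s", "prev=she"}), "Verb"),
    (frozenset({"suf=ed"}), "Verb"),
    (frozenset({"prev=a"}), "Noun"),
]
print(learn_prepend(toy))  # [('prev=the', 'Noun'), ('prev=a', 'Noun'), (None, 'Verb')]
```

Prepending means each new rule only needs to fix the errors of the list behind it, which is one reason such lists tend to stay short.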
Conference Paper
This paper describes a full two-level morphological description of Turkish word structures. The description has been implemented using the PC-KIMMO environment and is based on a root word lexicon of about 23,000 root words. The phonetic rules of contemporary Turkish (spoken in Turkey) have been encoded using 22 two-level rules, while the morphotactics of the agglutinative word structures have been encoded as finite-state machines for verbal and nominal paradigms and other categories. Almost all the special cases of, and exceptions to, phonological and morphological rules have been taken into account. In this paper, we describe the rules and the finite-state machines along with examples and a discussion of how various special cases were handled. We also describe some known limitations and problems with this description.
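To make the two ingredients concrete, here is a minimal sketch assuming a tiny nominal fragment: a finite-state machine that only allows the plural before the locative, and lexical suffixes written with archiphonemes (a common notation: `A` resolves to a/e by vowel harmony, `D` to d/t by voicing). This is a deliberate simplification: real two-level rules are parallel constraints on lexical:surface pairs, whereas the sketch resolves archiphonemes sequentially, and it ignores exceptions that the described system handles (for example the loanword *saat*, which takes front-vowel suffixes: *saatler*). All names are hypothetical.

```python
from typing import Dict, List

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOICELESS = set("çfhkpsşt")

# Hypothetical morphotactics: which suffix may follow which state.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "ROOT": {"PLU": "PLURAL", "LOC": "CASE"},
    "PLURAL": {"LOC": "CASE"},
    "CASE": {},
}
# Lexical suffix forms: A = a/e (vowel harmony), D = d/t (voicing).
LEXICAL = {"PLU": "lAr", "LOC": "DA"}


def last_vowel(s: str) -> str:
    for ch in reversed(s):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    raise ValueError(f"no vowel in {s!r}")


def realize(surface: str, lexical: str) -> str:
    # Resolve archiphonemes left to right against the surface form built so far.
    out = surface
    for ch in lexical:
        if ch == "A":
            out += "a" if last_vowel(out) in BACK_VOWELS else "e"
        elif ch == "D":
            out += "t" if out[-1] in VOICELESS else "d"
        else:
            out += ch
    return out


def generate(root: str, suffixes: List[str]) -> str:
    state, word = "ROOT", root
    for suf in suffixes:
        if suf not in TRANSITIONS[state]:
            raise ValueError(f"{suf} cannot follow state {state}")
        state = TRANSITIONS[state][suf]
        word = realize(word, LEXICAL[suf])
    return word


print(generate("ev", ["PLU", "LOC"]))     # evlerde
print(generate("kitap", ["LOC"]))         # kitapta
print(generate("kitap", ["PLU", "LOC"]))  # kitaplarda
```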
Conference Paper
We present statistical models for morphological disambiguation in agglutinative languages, with a specific application to Turkish. Turkish presents an interesting problem for statistical models, as the potential tag set size is very large because of the productive derivational morphology. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflectional groups and surface roots in trigram models. Among the four models that we have developed and tested, the simplest model, ignoring the local morphotactics within words, performs the best. Our best trigram model performs with 93.95% accuracy on our test data, getting all the morphosyntactic and semantic features correct. If we are just interested in syntactically relevant features and ignore a very small set of semantic features, then the accuracy increases to 95.07%.
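The inflectional-group (IG) decomposition can be sketched as splitting an analysis at its derivation boundaries. This assumes analyses are written as a root followed by `+`-separated features, with derivation boundaries marked `^DB`; treat that format, the example analysis, and the name `split_igs` as assumptions. The trigram scoring over IGs is not shown.

```python
from typing import List, Tuple


def split_igs(analysis: str, boundary: str = "^DB") -> Tuple[str, List[str]]:
    """Split a morphological analysis into its root and inflectional groups (IGs)."""
    root, _, rest = analysis.partition("+")
    # Each derivation boundary starts a new IG; drop the '+' that follows it.
    return root, [ig.lstrip("+") for ig in rest.split(boundary)]


print(split_igs("sağlam+Adj^DB+Noun+Ness+A3sg+Pnon+Nom"))
# ('sağlam', ['Adj', 'Noun+Ness+A3sg+Pnon+Nom'])
print(split_igs("ev+Noun+A3pl+Pnon+Loc"))
# ('ev', ['Noun+A3pl+Pnon+Loc'])
```

Modeling statistics over IGs rather than whole tags keeps the label inventory bounded even though derivations can stack.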
Conference Paper
The paper puts forward a quasi-dependency model for structural analysis of Chinese baseNPs and a MDL-based algorithm for quasi-dependency-strength acquisition. The experiments show that the proposed model is more suitable for Chinese baseNP analysis ...
Article
This paper introduces a new representation for Boolean functions, called decision lists, and shows that they are efficiently learnable from examples. More precisely, this result is established for k-DL – the set of decision lists with conjunctive clauses of size k at each decision. Since k-DL properly includes other well-known techniques for representing Boolean functions such as k-CNF (formulae in conjunctive normal form with at most k literals per clause), k-DNF (formulae in disjunctive normal form with at most k literals per term), and decision trees of depth k, our result strictly increases the set of functions that are known to be polynomially learnable, in the sense of Valiant (1984). Our proof is constructive: we present an algorithm that can efficiently construct an element of k-DL consistent with a given set of examples, if one exists.
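The constructive algorithm can be sketched greedily: enumerate all conjunctions of at most k literals, repeatedly pick one that covers some remaining examples which all share a label, emit it with that label, and discard the covered examples; if no such term exists, no consistent k-DL exists. A minimal sketch under those assumptions, with hypothetical names and brute-force term enumeration (polynomial for fixed k).

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Literal = Tuple[int, bool]             # (variable index, required truth value)
Term = Tuple[Literal, ...]             # conjunction of literals; () is the constant "true"
Example = Tuple[Sequence[bool], bool]  # (Boolean input vector, label)
DecisionList = List[Tuple[Term, bool]]


def satisfies(x: Sequence[bool], term: Term) -> bool:
    return all(x[i] == v for i, v in term)


def learn_k_dl(examples: List[Example], n: int, k: int) -> Optional[DecisionList]:
    """Greedily build a k-DL consistent with the examples, or return None if none exists."""
    literals = [(i, v) for i in range(n) for v in (True, False)]
    terms = [t for size in range(k + 1) for t in combinations(literals, size)]
    remaining = list(examples)
    dl: DecisionList = []
    while remaining:
        for term in terms:
            covered = {y for x, y in remaining if satisfies(x, term)}
            # Accept a term covering some remaining examples that all share one label.
            if len(covered) == 1:
                dl.append((term, covered.pop()))
                remaining = [(x, y) for x, y in remaining if not satisfies(x, term)]
                break
        else:
            return None  # no term is consistent: no k-DL fits the data
    if not dl or dl[-1][0] != ():
        dl.append(((), False))  # default rule; any label is consistent here
    return dl


def evaluate(dl: DecisionList, x: Sequence[bool]) -> bool:
    for term, label in dl:
        if satisfies(x, term):
            return label
    raise ValueError("decision list has no default rule")


xor = [((False, False), False), ((False, True), True),
       ((True, False), True), ((True, True), False)]
print(learn_k_dl(xor, 2, 1))  # None: XOR of two variables is not in 1-DL
print(learn_k_dl(xor, 2, 2) is not None)  # True
```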
Conference Paper
A program that tags each word in an input sentence with the most likely part of speech has been written. The program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech). Program performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.
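The dynamic program can be illustrated with a Viterbi search that maximizes the product of lexical probabilities P(tag | word) and contextual probabilities, computed in log space to avoid underflow. This sketch swaps in a simpler bigram context conditioned on the *preceding* tag, whereas the abstract conditions on following tags. The probability tables are toy numbers, unseen events get a small floor value, and all names are hypothetical. Running time is linear in sentence length (times the square of the tag count).

```python
import math
from typing import Dict, List, Tuple


def viterbi(words: List[str], tags: List[str],
            lexical: Dict[Tuple[str, str], float],
            contextual: Dict[Tuple[str, str], float],
            start: str = "<s>", floor: float = 1e-6) -> List[str]:
    def logp(table: Dict[Tuple[str, str], float], key: Tuple[str, str]) -> float:
        return math.log(table.get(key, floor))

    # score[i][t]: best log score of any tag sequence for words[:i+1] ending in tag t
    score = [{t: logp(contextual, (start, t)) + logp(lexical, (words[0], t)) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: score[i - 1][p] + logp(contextual, (p, t)))
            score[i][t] = (score[i - 1][prev] + logp(contextual, (prev, t))
                           + logp(lexical, (words[i], t)))
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


TAGS = ["DET", "NOUN", "VERB"]
LEX = {("the", "DET"): 1.0, ("can", "NOUN"): 0.5, ("can", "VERB"): 0.5,
       ("rusts", "NOUN"): 0.3, ("rusts", "VERB"): 0.7}
CTX = {("<s>", "DET"): 0.9, ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
       ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.3,
       ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1}
print(viterbi(["the", "can", "rusts"], TAGS, LEX, CTX))  # ['DET', 'NOUN', 'VERB']
```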
Article
Systems for inducing concept descriptions from examples are valuable tools for assisting in the task of knowledge acquisition for expert systems. This paper presents a description and empirical evaluation of a new induction system, CN2, designed for the efficient induction of simple, comprehensible production rules in domains where problems of poor description language and/or noise may be present. Implementations of the CN2, ID3, and AQ algorithms are compared on three medical classification tasks.
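As CN2 is commonly described, its beam search ranks candidate rule conditions by the entropy of the class distribution they cover (lower is better), and it keeps only conditions whose likelihood-ratio statistic, measuring departure from the overall class distribution, is significant under a chi-square test. A minimal sketch of those two measures; the function names are hypothetical and the search itself is omitted.

```python
import math
from typing import Sequence


def entropy(class_counts: Sequence[int]) -> float:
    """Entropy (in bits) of the class distribution covered by a candidate condition."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def likelihood_ratio(class_counts: Sequence[int], prior: Sequence[float]) -> float:
    """2 * sum_i f_i * ln(f_i / e_i), where e_i is the count expected under the prior."""
    total = sum(class_counts)
    return 2.0 * sum(f * math.log(f / (p * total))
                     for f, p in zip(class_counts, prior) if f > 0)


# A condition covering 8 examples of one class and 2 of the other,
# in a problem where both classes are equally frequent overall.
print(entropy([8, 2]))                       # about 0.72 bits
print(likelihood_ratio([8, 2], [0.5, 0.5]))  # about 3.85
```

With two classes the statistic has one degree of freedom, so a value of about 3.85 sits just above the 5% chi-square critical value of about 3.84.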