Submission for PolEval competition task 1A, http://poleval.pl/
Can word embeddings be used in an application of
morphosyntactic disambiguation task?
Adrianna Janik
Sages sp. z o.o. (Kodołamacz.pl)
Contact: ada.janik@gmail.com
Abstract
This article explores a model for choosing a morphosyntactic disambiguation
from a set of candidate forms. The model combines word embeddings with graph
theory. Evaluated on the test dataset, it reached an accuracy of 85%. The
embeddings were constructed over morphosyntactic forms in place of words,
following the observation that the idea of embeddings generalizes to other
sequence-like problems or, as in this case, to different representations of
words. The model is a demonstration of how morphosyntactic embeddings can be
used in a disambiguation task: although it cannot yet serve as a stand-alone
disambiguator, it may find partial application in hybrid solutions.
Introduction
Word embeddings are a relatively new and interesting trend in deep
learning. Word2Vec [1] and the idea of word embeddings originated in the
domain of Natural Language Processing. The idea of describing words through
the context of a sentence or a surrounding word window is universal: it can
be applied to any problem dealing with sequences of related data.
The starting points are the definitions. The first question is: what do
we understand by vocabulary? The Merriam-Webster dictionary defines it as
'a list or collection of words (...)'. And what is a word? According to
Merriam-Webster, it is 'a speech sound or series of speech sounds that
symbolizes and communicates a meaning usually without being divisible into
smaller units capable of independent use'. The question, then, is: if we
describe a word as a set of morphosyntactic tags representing the
traditionally defined word and use this simpler model in the embeddings,
what will be the final result?
Given the sequence of morphosyntactic forms as it occurs in a dataset, we
can extract some information about the syntax of the language. The model
takes as its vocabulary morphosyntactic forms instead of raw words, e.g.
'subst:sg:acc:m3' or 'prep:nom', and calculates a similarity for each
adjacent pair of possible disambiguations in a sentence. Following that
reasoning, a weighted directed graph of disambiguations is created for each
sentence, in which the weight of the edge between two disambiguation nodes
is their similarity. With such a model we can then analyze the graph in
search of the shortest path between the beginning of the sentence and its
end.
Goal
The goal of this paper is to present a model that originated from a
submission to PolEval 2017 Task 1A: morphosyntactic disambiguation and
guessing for the Polish language.
1 Model architecture
First, an embedding model for morphosyntactic forms was trained using the
Word2Vec skip-gram model, with the morphosyntactic forms taken as the
vocabulary; e.g. for the sentence 'Rzecz gustu :)' the graph shown in
Fig. 1 was created. The embeddings were trained with the Adagrad optimizer,
minimizing a mean sampled-softmax loss.
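The skip-gram step can be illustrated by the generation of training pairs: each morphosyntactic tag predicts the tags in a window around it, exactly as Word2Vec treats raw words. The following is a minimal sketch; the tag strings, the function name, and the window size are illustrative, not taken from the actual submission.

```python
# Sketch: generating skip-gram (center, context) pairs from a sequence of
# morphosyntactic tags instead of raw words. These pairs would then feed a
# skip-gram model trained with a sampled-softmax loss.

def skipgram_pairs(tags, window=2):
    """Return (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tags):
        lo = max(0, i - window)
        hi = min(len(tags), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tags[j]))
    return pairs

# Hypothetical tag sequence for a short sentence.
sentence_tags = ["subst:sg:nom:f", "subst:sg:gen:m3", "interp"]
pairs = skipgram_pairs(sentence_tags, window=1)
```

With a window of 1, the middle tag contributes two pairs and each end tag one, so the three-tag sentence yields four training pairs.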
Second, for each sentence a directed graph with weighted edges was created,
in which each morphosyntactic form within the sentence is represented by a
node. The edges correspond to the relations between adjacent words, and the
weights were calculated as the similarity between morphosyntactic forms in
the word-vector space.
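This construction can be sketched as follows, assuming cosine similarity in the embedding space (the embeddings, tag names, and edge-key layout here are toy placeholders, not the trained model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def build_graph(candidates, emb):
    """candidates: one list of candidate tags per word position.
    Returns directed edges {((i, tag_a), (i+1, tag_b)): similarity}
    connecting every disambiguation of a word to every disambiguation
    of the next word."""
    edges = {}
    for i in range(len(candidates) - 1):
        for a in candidates[i]:
            for b in candidates[i + 1]:
                edges[((i, a), (i + 1, b))] = cosine(emb[a], emb[b])
    return edges

# Toy 2-dimensional embeddings for three hypothetical tags.
emb = {
    "subst:sg:nom:f": [1.0, 0.0],
    "fin:sg:ter": [0.6, 0.8],
    "interp": [0.0, 1.0],
}
candidates = [["subst:sg:nom:f"], ["fin:sg:ter", "interp"]]
edges = build_graph(candidates, emb)
```

One word with a single candidate followed by a word with two candidates produces two edges, one per possible disambiguation of the second word.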
Third, a graph analysis was performed for each sentence: the least-cost
path with a given number of nodes was calculated between the beginning and
the end of the sentence. The K shortest paths were found using the
algorithm proposed by Jin Yen [2]. In Fig. 2 we can see the path chosen by
the algorithm, which is also marked as the gold standard in the dataset.
The complexity of this procedure is O(EST + MKN^3), where E is the number
of epochs (here 30), S the number of steps needed to show the whole dataset
in batches, T the number of words in a batch (128), M the number of
sentences, K the number of shortest paths that need to be found to get the
result (K = 100 in the worst case, 1 in the best), and N the number of
nodes in each graph.
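Yen's algorithm [2] repeatedly runs a single-source shortest-path search as its inner step; the best case K = 1 reduces to one such search. A minimal Dijkstra sketch over the sentence graph is shown below, assuming non-negative edge costs and hypothetical node names (this is an illustration of the inner step, not the full K-shortest-paths procedure):

```python
import heapq

def dijkstra(adj, start, goal):
    """Least-cost path from start to goal in a directed graph.
    adj: {node: [(neighbor, cost), ...]} with non-negative costs."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    done = set()
    while pq:
        d, node = heapq.heappop(pq)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            break
        for nbr, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(pq, (nd, nbr))
    # Reconstruct the path by walking predecessors back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path)), dist[goal]

# Toy sentence graph with two competing disambiguations for the first word.
adj = {
    "START": [("a1", 0.2), ("a2", 0.9)],
    "a1": [("b1", 0.1)],
    "a2": [("b1", 0.3)],
    "b1": [("END", 0.0)],
}
path, cost = dijkstra(adj, "START", "END")
```

On this toy graph the search picks the a1 branch, giving a total cost of 0.3.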
Measure    Value
Accuracy   0.85
Precision  0.27
Recall     0.24
F-score    0.25

Table 1: Model evaluation
2 Morphosyntactic embeddings
In Fig. 1 we can see 30 points from the t-SNE representation of the learned
embeddings. Even in such a small subset we can detect some groups: in the
red area are the words in a subgroup of substantives (subst), in the green
the subset of prepositions (prep), and in the lower part of the plot the
adverbs (adv).
3 Morphosyntactic graph
After the embeddings were learned, a graph of all possible disambiguations
was created for each sentence in the dataset. When the first word in a
sentence had many forms, an artificial start node was created, joined by
zero-weight edges to all disambiguations of the first word (Fig. 3). A
similar operation was applied to the last word or punctuation mark in the
sentence, where a fake end node was inserted. These operations allowed us
to use standard shortest-path algorithms on the graph.
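The terminal-node trick can be sketched as below, assuming the adjacency-list representation used earlier; the node names "START" and "END" are placeholders:

```python
def add_terminals(adj, first_candidates, last_candidates):
    """Insert artificial START/END nodes joined by zero-weight edges, so
    that standard single-source shortest-path algorithms apply even when
    the first or last word has several candidate disambiguations."""
    # Copy the adjacency lists so the original graph is left untouched.
    g = {node: list(nbrs) for node, nbrs in adj.items()}
    g["START"] = [(node, 0.0) for node in first_candidates]
    for node in last_candidates:
        g.setdefault(node, []).append(("END", 0.0))
    return g

# Two candidate disambiguations for the first word, one for the last.
adj = {"a1": [("b1", 0.5)], "a2": [("b1", 0.3)]}
g = add_terminals(adj, ["a1", "a2"], ["b1"])
```

Because the added edges cost nothing, they change neither the least-cost path nor its total cost; they only give the search a single well-defined source and sink.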
Figure 1: Created graph for sentence: ’Rzecz gustu :)’
Figure 2: Created graph for sentence: 'A w Gdańsku?'
4 Status of research
The model was trained on the dataset from the PolEval competition. On the
test data the accuracy of the output was 85%. Unfortunately, the number of
false positives (FP = 72679) was much higher than the number of true
positives (TP = 26721), so accuracy alone cannot evaluate the model
adequately for this problem; metrics better suited to such data give rather
low results (Table 1).
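The reported precision follows directly from those counts; a quick check:

```python
# Precision from the confusion counts given in the text:
# TP = 26721 true positives, FP = 72679 false positives.
tp, fp = 26721, 72679
precision = tp / (tp + fp)  # 26721 / 99400 ≈ 0.2688
print(round(precision, 2))  # 0.27, matching Table 1
```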
Figure 3: ROC curve for graph-based classifier
Figure 4: t-SNE representation of embeddings
5 Conclusion and future work
The work is still in the exploratory phase, so meaningful results will take
some time; I remain optimistic despite the unsatisfactory results for this
application. The intuition behind the model is that a morphosyntactic form
is just another representation of a word, carrying slightly different
information, and that it enables prediction of the following
morphosyntactic forms much as the Word2Vec model predicts the following
words.
Morphosyntactic embeddings could be used to suggest correct grammatical
forms. For future work, the challenge is to estimate similarity between
morphosyntactic forms and to train the model on a bigger corpus, joining
sources from articles and books as well as from Internet forums. It would
be very interesting to observe how the results depend on the level of
grammatical correctness of the data.
References
[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Ef-
ficient estimation of word representations in vector space. CoRR,
abs/1301.3781, 2013.
[2] Jin Y. Yen. Finding the K shortest loopless paths in a network.
Management Science, 17(11):712–716, 1970.