Can word embeddings be used in an application of
morphosyntactic disambiguation task?
Sages sp. z o.o. (Kodołamacz.pl)
This article explores a model for choosing the correct morphosyntactic dis-
ambiguation among given forms. The model combines the idea of word
embeddings with graph theory. The embeddings were constructed using
morphosyntactic forms in place of words; this approach was pursued due to
the observation that the idea of embeddings generalizes to other sequence-like
problems or, as in this case, to different forms of words. On the test dataset
the model reached an accuracy of 85%. The model is a demonstration of how
morphosyntactic embeddings can be used in a disambiguation task. Although
it cannot yet be used as a stand-alone disambiguator, it can probably find a
partial application in hybrid solutions.
Word embeddings are a relatively new and interesting trend in deep
learning studies. Word2Vec [1] and the idea of word embeddings
originated in the domain of Natural Language Processing. As we can
see, the idea of representing words within the context of a sentence or
a surrounding word window is universal: it can be applied to any
problem dealing with sequences of related data.
The starting points are the definitions. The first question is: what do
we understand by vocabulary? The Merriam-Webster dictionary de-
fines it as 'a list or collection of words (...)'. And what about a word?
According to Merriam-Webster, it is 'a speech sound or series of speech
sounds that symbolizes and communicates a meaning usually without
being divisible into smaller units capable of independent use'. Now
the whole question is: if we describe a word as a set of morphosyntac-
tic tags representing the traditionally defined word and use this simpler
model in the embeddings, what will be the final result?
Given the sequence of morphosyntactic forms as it occurs in a
dataset, we can extract some information about the syntax of the lan-
guage. As its vocabulary, the model takes morphosyntactic forms
instead of raw words, e.g. 'subst:sg:acc:m3' or 'prep:nom', and for each
adjacent pair of possible disambiguations in the sentence it calculates
a similarity. Following that reasoning, for each sentence a weighted
directed graph of disambiguations is created in such a way that the
weight between two disambiguation nodes is their similarity. As a re-
sult, with such a model we can analyze the graph in search of the
shortest path between the beginning of the sentence and its end.
The goal of this paper is to present a model that originated from a sub-
mission to PolEval 2017 Task 1A: morphosyntactic disambiguation
and guessing for the Polish language.
1 Model architecture
First, an embedding model for morphosyntactic forms was trained us-
ing the Word2Vec skip-gram model [1]. The morphosyntactic forms
were taken as the vocabulary; e.g. for the sentence 'Rzecz gustu :)' the
graph shown in Fig. 2 was created. The embeddings were trained with
the Adagrad optimizer by minimizing a mean sampled-softmax loss.
Second, for each sentence a directed graph with weighted edges was
created in such a way that each morphosyntactic form within the sen-
tence was represented by a node. The edges corresponded to the rela-
tion between adjacent words, and the weights were calculated as the
similarity between morphosyntactic forms in the embedding vector space.
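A minimal sketch of this graph construction, using networkx. The candidate forms and random vectors are hypothetical placeholders for the trained embeddings; note that since a shortest-path search minimizes total weight, the sketch stores 1 − similarity as a distance, which is one interpretation of the similarity-weighted edges described above.

```python
import numpy as np
import networkx as nx

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical candidate forms for a 3-token sentence; in the real
# model each form's vector comes from the trained embeddings.
candidates = [
    ["subst:sg:nom:f", "subst:sg:acc:f"],   # forms of token 1
    ["subst:sg:gen:m3"],                    # forms of token 2
    ["interp"],                             # forms of token 3
]
rng = np.random.default_rng(0)
emb = {form: rng.normal(size=64) for forms in candidates for form in forms}

# Directed graph: node (i, form) for the i-th token's candidate form;
# edges join the forms of adjacent tokens.  1 - similarity is stored
# so that the most similar adjacent pair is the cheapest edge.
G = nx.DiGraph()
for i, (left, right) in enumerate(zip(candidates, candidates[1:])):
    for a in left:
        for b in right:
            G.add_edge((i, a), (i + 1, b),
                       weight=1.0 - cosine_similarity(emb[a], emb[b]))
```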
Third, a graph analysis was performed for each sentence: the least-
cost path of a certain number of nodes was calculated between the be-
ginning and the end of the sentence. The K shortest paths were found
using the algorithm proposed by Yen [2]. In Fig. 2 we can see the
path chosen by the algorithm, which is also marked as the gold stan-
dard in the dataset. The complexity of the whole procedure is
O(E·S·T + M·K·N³), where E is the number of epochs (30 in this
case), S the number of steps needed to show the whole dataset in
batches, T the number of words in a batch (128), M the number of
sentences, and K the number of shortest paths that need to be found
in order to get the result (K = 100 in the worst case, 1 in the best);
finally, N is the number of nodes in each sentence graph.
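The K-shortest-path step can be sketched with networkx, whose `shortest_simple_paths` generator implements Yen's algorithm for weighted graphs. The toy graph below, with an artificial START/END node and made-up similarity-derived distances, stands in for a real sentence graph.

```python
from itertools import islice
import networkx as nx

# Toy disambiguation graph; weights are made-up distances
# (lower = more similar adjacent forms).
G = nx.DiGraph()
G.add_edge("START", "subst:sg:nom:f", weight=0.0)
G.add_edge("START", "subst:sg:acc:f", weight=0.0)
G.add_edge("subst:sg:nom:f", "subst:sg:gen:m3", weight=0.3)
G.add_edge("subst:sg:acc:f", "subst:sg:gen:m3", weight=0.7)
G.add_edge("subst:sg:gen:m3", "END", weight=0.0)

# shortest_simple_paths yields simple paths in order of increasing
# total weight (Yen's algorithm); take at most K of them.
K = 3
paths = list(islice(
    nx.shortest_simple_paths(G, "START", "END", weight="weight"), K))

best = paths[0]   # cheapest disambiguation path
```

Only two simple paths exist in this toy graph, so fewer than K paths may be returned, which the `islice` handles gracefully.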
Table 1: Model evaluation
2 Morphosyntactic embeddings
In Fig. 1 we can see 30 points from the t-SNE representation of the
learned embeddings. Even in such a small subset we can detect some
groups: in the red area we can see a subgroup of substantives (subst),
in the green area the subset of prepositions (prep), and in the lower
part of the plot the adverbs (adv).
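A projection like the one in the figure can be produced with scikit-learn's t-SNE. The 30 random 64-dimensional vectors below are hypothetical placeholders for the learned embeddings; the perplexity value is an assumption (it only has to stay below the number of samples).

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical 64-dimensional embeddings for 30 morphosyntactic
# forms, standing in for the trained vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(30, 64))

# Project to 2-D for plotting.
points = TSNE(n_components=2, perplexity=5.0,
              random_state=0).fit_transform(vectors)
```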
3 Morphosyntactic graph
After the embeddings were learned, a graph of all possible disambigua-
tions for each sentence in dataset was created. In a case, when the ﬁrst
word in a sentence had many forms, an artiﬁcial start node was created
with 0 weighted edges joining all disambiguations for the ﬁrst word
(Fig. 3). The similar operation was applied to the last word or sign
in the sentence - a fake end node was inserted. Finally, the operation
described above allowed us to use standard algorithms for searching of
the shortest path in a graph.
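The start/end-node trick above can be sketched as follows; the two candidate forms and the weights are hypothetical.

```python
import networkx as nx

# Toy graph: two candidate forms for the first word, one for the last.
G = nx.DiGraph()
G.add_edge("subst:sg:nom:f", "interp", weight=0.4)
G.add_edge("subst:sg:acc:f", "interp", weight=0.6)

# Artificial START node joined by zero-weight edges to every
# disambiguation of the first word, and a fake END node after the
# last token, so standard single-source shortest-path algorithms
# apply directly.
for form in ["subst:sg:nom:f", "subst:sg:acc:f"]:
    G.add_edge("START", form, weight=0.0)
G.add_edge("interp", "END", weight=0.0)

path = nx.dijkstra_path(G, "START", "END")
```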
Figure 2: Created graph for the sentence 'Rzecz gustu :)'
Figure 3: Created graph for the sentence 'A w Gda…'
4 Status of research
The model was trained on the dataset coming from the PolEval com-
petition. The accuracy on the test data was 85%. Unfortunately, the
number of false positives (FP = 72679) was much higher than the
number of true positives (TP = 26721), so accuracy alone cannot eval-
uate the model adequately in the discussed problem; metrics better
suited to such data give rather low results (Table 1).
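The effect of the imbalance can be illustrated with precision, computed from the TP and FP counts quoted above (FN and TN are not reported here, so recall cannot be derived from these numbers alone):

```python
# Counts reported for the test data.
TP = 26721
FP = 72679

# Precision shows why accuracy alone is misleading here: only about
# 27% of the positive predictions are correct.
precision = TP / (TP + FP)
```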
Figure 4: ROC curve for the graph-based classifier
Figure 1: t-SNE representation of the learned embeddings
5 Conclusion and future work
The work is still in its exploratory phase, so meaningful results will
take some time; I remain optimistic despite the unsatisfactory results
for this application. The intuition behind the model is that a morpho-
syntactic form is just another representation of a word, carrying
slightly different information, and that it enables prediction of the fol-
lowing morphosyntactic forms, similarly to how the Word2Vec model
predicts the following words.
Morphosyntactic embeddings could be used to suggest some correct
grammatical forms. For future work, the challenge is to estimate the
similarity between morphosyntactic forms and to train the model on a
bigger corpus, joining sources coming from articles and books with
those coming from Internet forums. It would be very interesting to
observe how the results depend on the level of grammatical correct-
ness of the data.
[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
[2] Jin Y. Yen. Finding the k shortest loopless paths in a network. Management Science, 17:712–716, 1971.