The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian,...
The mean morpheme length (measured in phonemes) is shown to decrease as the length of word types (measured in morphemes) increases in Czech texts; that is, these language units behave according to the Menzerath-Altmann law. The law does not hold in general for word tokens. Some hints towards an interpretation of the parameters are presented.
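As a rough illustration of the quantity described in this abstract, the following minimal sketch computes the mean morpheme length for each word length in morphemes, optionally restricting the count to word types rather than tokens. The function name, the `types_only` flag, and the toy segmentations are hypothetical and are not taken from the study; character counts stand in for phoneme counts here, which is only an approximation.

```python
from collections import defaultdict


def mean_morpheme_length(segmented_words, types_only=True):
    """Map word length (in morphemes) to mean morpheme length.

    segmented_words: iterable of words, each a sequence of morpheme strings.
    Assumption: len(morpheme) approximates the morpheme's phoneme count.
    """
    words = [tuple(w) for w in segmented_words if len(w) > 0]
    if types_only:
        # Keep each distinct segmented word once (types, not tokens).
        words = list(dict.fromkeys(words))

    # word length in morphemes -> [total units, total morphemes]
    totals = defaultdict(lambda: [0, 0])
    for word in words:
        n = len(word)
        totals[n][0] += sum(len(m) for m in word)
        totals[n][1] += n

    return {n: units / count for n, (units, count) in sorted(totals.items())}


# Toy, hand-made segmentations for illustration only (not data from the study).
toy_words = [
    ("strom",),
    ("les", "y"),
    ("les", "y"),            # repeated token, ignored when types_only=True
    ("ne", "moc", "n", "ý"),
]
result = mean_morpheme_length(toy_words)
```

On this toy input the mean length falls as the number of morphemes per word grows, which is the direction of the trend the abstract reports for word types. Calling the function with `types_only=False` counts every token instead, which is the setting for which the abstract says the law is not valid in general.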
This study deals with the recently proposed concept of Context Specificity of Lemma (CSL). CSL is based on the word embedding technique Word2vec, which makes it possible to measure the similarity of lexical contexts between lemmas. Specifically, a recently proposed method, Closest Context Specificity (CCS), is applied to a diachronic analysis of Czech t...
Every well-defined linguistic concept can be studied quantitatively. Although such study has no end, it must proceed stepwise. Here we analyze the behavior of adverbs and adverbial expressions and apply the models to Czech texts. The adverbials are classified into 13 classes, and we study the class size, the length in individual classes, the pla...