Tomaž Erjavec

  • PhD
  • Senior Researcher at Jožef Stefan Institute

About

203
Publications
26,111
Reads
2,924
Citations
Current institution
Jožef Stefan Institute
Current position
  • Senior Researcher
Additional affiliations
January 1998 - present
Jožef Stefan Institute
Position
  • Senior Researcher


Publications (203)
Article
Full-text available
The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and ar...
Chapter
The central topic of the monograph Paremiology between Tradition and Innovation is proverbs, which today belong to the field of paremiology. The monograph discusses proverbs from the perspective of folklore, linguistics, ethnolinguistics, semiotics, anthropology and digitization methods. For all these fields, it is the digital humanities that enabl...
Book
Full-text available
The central topic of the monograph Paremiology between Tradition and Innovation is proverbs, which today belong to the field of paremiology. The monograph discusses proverbs from the perspective of folklore, linguistics, ethnolinguistics, semiotics, anthropology and digitization methods. For all these fields, it is the digital humanities that enabl...
Conference Paper
Full-text available
The speeches in ParlaMint corpora of parliamentary proceedings are marked by their speaker, and the speakers are then paired with various metadata, also with their time-delimited affiliations with political parties or parliamentary groups. These are stored separately, and are also associated with further information. This paper discusses the additi...
Article
Full-text available
Parliamentary debates represent an essential part of democratic discourse and provide insights into various socio-demographic and linguistic phenomena - parliamentary corpora, which contain transcripts of parliamentary debates and extensive metadata, are an important resource for parliamentary discourse analysis and other research areas. This paper...
Preprint
Full-text available
The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and ar...
Chapter
Decision-making and opinion-forming are everyday tasks that involve weighing pro and con arguments. The goal of Touché is to foster the development of support-technologies for decision-making and opinion-forming and to improve our understanding of these processes. This fifth edition of the lab features three shared tasks: (1) Human value detection...
Article
The collection of critical editions eZISS came about with the connection of three elements: the awakening of interest in primary sources of Slovene literature; the intensive methodological introduction of the fundamental philosophical-historical sciences, particularly ecdotics or text criticism; and finally the use of contemporary open standards fo...
Article
Full-text available
Ordered collections of machine-readable texts, corpora, are useful in various branches of linguistics. The present paper focuses on the machine-readable form of corpora, above all on their annotation, i.e. adding interpretative information to the text in the corpus. The annotation presented is based on taking into consideration the international st...
Article
Full-text available
This thematic issue of the journal Slovenščina 2.0 is devoted to digital linguistics, a rapidly growing interdisciplinary field of research at the intersection of traditional linguistics, information technology, and the social sciences. At the forefront of digital-linguistics research are the preservation, analysis, and use of language data, digital artefact...
Conference Paper
Full-text available
This paper discusses the encoding, validation and development of language resources of the completed ParlaMint I and on-going ParlaMint II CLARIN projects, which centre on the collaborative development of a large set of interoperable corpora of parliamentary proceedings. It focuses on the ParlaMint encoding and the GitHub development platform and t...
Preprint
Full-text available
Abbreviations present a significant challenge for NLP systems because they cause tokenization and out-of-vocabulary errors. They can also make the text less readable, especially in reference printed books, where they are extensively used. Abbreviations are especially problematic in low-resource settings, where systems are less robust to begin with....
Conference Paper
The article discusses the digitization of the collection of Slovenian proverbs from the Institute of Slovenian Ethnography ZRC SAZU. The collection was created from 1947 onwards, and its digitization began at the start of the 21st century on the initiative of Marija Stanonik. The starting point of the work presented was two Excel spreadsheets with paremiologi...
Chapter
Full-text available
In this chapter we describe the recent developments in language technology infrastructure building for three South Slavic languages - Slovenian, Croatian, and Serbian. These developments are primarily the result of intense coordination between different projects. Our experience shows that the infrastructure for language technologies can be signific...
Article
Full-text available
The paper presents the Korpus šolskih besedil slovenskega jezika (Corpus of Slovene School Texts), a specialised written corpus of Slovene comprising approximately 1.8 million tokens. The corpus was designed within the project Franček, Jezikovna svetovalnica za učitelje slovenščine in Šolski slovar slovenskega jezika (Franček, the Language Counselling Service for Teachers of Slovene, and the School Dictionary of the Slovene Language), specifically as the source material for compiling the Šolski slovar...
Poster
The poster shows how the archival collection of Slovenian paremiological units was digitized and uploaded to the Clarin.si repository.
Article
Full-text available
This paper presents the ParlaMint corpora containing transcriptions of the sessions of the 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples...
Article
Full-text available
The aim of this contribution is to reflect on the process of building the multilingual 'European Literary Text Collection' (ELTeC) that is being created in the framework of the networking project 'Distant Reading for European Literary History' funded by COST (European Cooperation in Science and Technology). To provide some background, we briefly in...
Article
Full-text available
Parliamentary proceedings are a rich source of data that can be used by scholars in various humanities and social sciences disciplines. Unlike the sources of most other language corpora, parliamentary proceedings are not subject to copyright or personal privacy protections, and are typically available online, thus making them ideal for compilation...
Conference Paper
Full-text available
This paper presents recent developments and the content of the ssj500k training corpus, the largest and most widely used open-source collection of training data for Slovene language processing, which has been manually annotated with respect to segmentation, tokenisation, lemmatisation, JOS morphosyntax and dependency syntax, Universal Dependencies...
Article
The paper presents the KAS corpus of Slovenian academic writing, which consists of almost 65,000 B.A./B.Sc., 16,000 M.A./M.Sc. and 1600 Ph.D. theses (5 million pages or 1.7 billion tokens) gathered from the digital libraries of Slovenian higher education institutions via the Slovenian Open Science portal. We discuss the compilation, meta-data, anno...
Article
Full-text available
Open science is based on freely and openly accessible scientific publications and data. The latter make it possible to verify the results of previous research and to build on them and, in the context of language technologies and manually annotated language resources, also to train new text-processing tools. However, just as with scientific publications, ...
Conference Paper
Full-text available
We describe a new version of the Gigafida reference corpus of Slovene. In addition to updating the corpus with new material and annotating it with better tools, the focus of the upgrade was also on its transformation from a general reference corpus, which contains all language variants including non-standard language, to the corpus of standard (wri...
Preprint
The preprint presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The MULTEXT-East dataset includes the EAGLES-based morphosyntactic specifications, morphosyntactic lexicons, and annotated multilingual corpora. The parallel corpus, the novel "1984" by George...
Article
Full-text available
The paper presents a corpus-based treatment of written variation in the 16th-century Slovene literary language from both a synchronic and a diachronic perspective. The study is based on a manually checked sample (about 14,000 word units) from Trubar's works Ta pervi deil tiga Noviga teſtamenta, 1557, and Hiſhna poſtilla, 1595, from Juričič's Poſtilla, 1578, and...
Article
Part-of-speech (PoS) tagging of non-standard language with models developed for standard language is known to suffer from a significant decrease in accuracy. Two methods are typically used to improve it: word normalisation, which decreases the out-of-vocabulary rate of the PoS tagger, and domain adaptation where the tagger is made aware of the non-...
Chapter
In this paper we present datasets of Facebook comment threads to mainstream media posts in Slovene and English developed inside the Slovene national project FRENK (the acronym FRENK stands for “FRENK - Raziskave Elektronske Nespodobne Komunikacije” (engl. “Research on Electronic Inappropriate Communication”)) which cover two topics, migrants and LG...
Chapter
This paper presents a dataset and supervised learning experiments for term extraction from Slovene academic texts. Term candidates in the dataset were extracted via morphosyntactic patterns and annotated for their termness by four annotators. Experiments on the dataset show that most co-occurrence statistics, applied after morphosyntactic patterns...
Article
Full-text available
The paper deals with the problem of encoding the verses and textual variants in the critical edition of Foglar's Manuscript, a Styrian Baroque hymn book from the mid-eighteenth century. We first show the diplomatic transcript of the verse in selected problematic cases, after which we present the method applied to produce a critical apparatus for ap...
Article
Full-text available
The paper presents the Parlameter corpus of contemporary Slovene parliamentary proceedings, which covers the VIIth mandate of the Slovene Parliament (2014-2018). The Parlameter corpus offers rich speaker metadata (gender, age, education, party affiliation) and is linguistically annotated (lemmatization, tagging), which boosts research in several dig...
Preprint
This paper presents a dataset and supervised learning experiments for term extraction from Slovene academic texts. Term candidates in the dataset were extracted via morphosyntactic patterns and annotated for their termness by four annotators. Experiments on the dataset show that most co-occurrence statistics, applied after morphosyntactic patterns...
Preprint
In this paper we present datasets of Facebook comment threads to mainstream media posts in Slovene and English developed inside the Slovene national project FRENK which cover two topics, migrants and LGBT, and are manually annotated for different types of socially unacceptable discourse (SUD). The main advantages of these datasets compared to the e...
Conference Paper
Full-text available
With the rapid development and increasing accessibility of natural language processing (NLP) techniques, the exploitation of NLP inside electronic lexicography is on the rise. Textual datasets manually annotated with linguistic information are a backbone of the currently dominating paradigm in NLP based on supervised machine learning. However, develo...
Article
The paper presents the results of the Janes project, which aimed to develop language resources and tools for Slovene user generated content. The paper first describes the 200 million word Janes corpus, containing tweets, forum posts, news comments, user and talk pages from Wikipedia, and blogs and blog comments, where each text is accompanied by ri...
Conference Paper
Although recent years have witnessed a growth in the number of computational language resources and tools, a lot still needs to be done, especially with low-density languages. This is the case with all South Slavic languages and especially Montenegrin, the fourth standard of the once Serbo-Croatian language that has been re-codified only recently....
Conference Paper
Full-text available
In this paper we present hr500k, a Croatian reference training corpus of 500 thousand tokens, segmented at document, sentence and word level, and annotated for morphosyntax, lemmas, dependency syntax, named entities, and semantic roles. We present each annotation layer via basic label statistics and describe the final encoding of the resource in Co...
Chapter
Full-text available
In this chapter we first present the general procedure and workflow for producing manually annotated corpora (from data preparation, the drafting of annotation guidelines, work with the annotation platform, and the running of the annotation campaign to conversion into the final format, publication, and distribution), focusing in more detail on the two largest corpora created in this way, Janes-...
Article
Full-text available
In this paper, we focus on corpus-linguistic studies that address theoretical questions and on computational linguistic work on corpus annotation, which makes corpora useful for linguistic work. First, we discuss why the corpus linguistic approach was discredited by generative linguists in the second half of the 20th century, how it made a comeback...
Chapter
Full-text available
Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so centr...
Chapter
The chapter presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The MULTEXT-East dataset includes the EAGLES-based morphosyntactic specifications, morphosyntactic lexicons, and annotated multilingual corpora. The parallel corpus, t...
Article
Full-text available
The paper presents the current version of the Slovene corpus of netspeak Janes which contains tweets, forum posts, news comments, blogs and blog comments, and user and talk pages from Wikipedia. First, we describe the harvesting procedure for each data source and provide a quantitative analysis of the corpus. Next, we present automatic and manual p...
Article
Full-text available
Web texts are becoming increasingly relevant sources of information, with web corpora useful for corpus linguistic studies and development of language technologies. Even though web texts are directly accessible, which substantially simplifies the collection procedure, compilation of web corpora is still complex, time consuming and expensive. It is c...
Article
Full-text available
The paper presents the current version of the Slovene corpus of netspeak Janes which contains tweets, forum posts, news comments, blogs and blog comments, and user and talk pages from Wikipedia. First, we describe the harvesting procedure for each data source and provide a quantitative analysis of the corpus. Next, we present automatic and manual p...
Conference Paper
Michael Beißwenger, Thierry Chanier, Isabella Chiari, Tomaž Erjavec, Darja Fișer, Axel Herold, Nikola Ljubešić, Harald Lüngen, Céline Poudat, Egon Stemle, Angelika Storrer and Ciara Wigham, 2016, “Integrating corpora of computer-mediated communication into the language resources landscape: Initiatives and best practices from French, German, Ital...
Article
Full-text available
Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so centr...
Article
Text mining and natural language processing are fast growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of...
Article
Full-text available
Main results of MONDILEX project. The paper presents the results and recommendations of MONDILEX, a 7FP project that covered six Slavic languages: Bulgarian, Polish, Russian, Slovak, Slovene, and Ukrainian. The paper summarizes the research undertaken on standardisation and integration of Slavic language resources and on the establishment of...
Article
Full-text available
MONDILEX – towards the research infrastructure for digital resources in Slavic lexicography. The paper presents activities of the EU project MONDILEX, Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and their Digital Resources. The main objective of MONDILEX is to design the conceptual scheme of...
Article
Full-text available
The Japanese-Slovene dictionary jaSlo: its development, enhancement and use. The paper presents the on-line Japanese-Slovene dictionary jaSlo, in particular the ways in which it has been used, and how it has been extended with examples mined from a parallel corpus. The paper first describes jaSlo and the structure of its dictionary entry, its...
Data
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Ma...
Article
We propose a language-independent word normalisation method and exemplify it on modernising historical Slovene words. Our method relies on character-level statistical machine translation (CSMT) and uses only shallow knowledge. We present relevant data on historical Slovene, consisting of two (partially) manually annotated corpora and the lexicons d...
Article
The availability of large collections of text (language corpora) is crucial for empirically supported linguistic investigations of various languages; however, such corpora are complicated and expensive to collect. In recent years corpora made from texts on the World Wide Web have become an attractive alternative to traditional corpora, as they can...
Article
Full-text available
The paper describes the combined results of several projects which constitute a basic language resource infrastructure for printed historical Slovene. The IMP language resources consist of a digital library, an annotated corpus and a lexicon, which are interlinked and uniformly encoded following the Text Encoding Initiative Guidelines. The library...
Conference Paper
This paper presents the results of the standardization procedure of Slovene tweets that are full of colloquial, dialectal and foreign-language elements. With the aim of minimizing the human input required we produced a manually normalized lexicon of the most salient out-of-vocabulary (OOV) tokens and used it to train a character-level statistical m...
Article
Full-text available
The article presents the algorithm and implementation of a program for named entity recognition in Slovene using machine learning. The supervised approach, based on conditional random fields, is trained on the annotated ssj500k corpus. In the corpus, which is freely available under the Creative Commons CC-BY-NC-SA licence, the word tokens are, in addition to morphosyntactic...
Article
Full-text available
In the paper we present the reference, specialised, and parallel corpora that can be accessed via the concordancers on the nl.ijs.si server. Most of the corpora contain texts in Slovene, while a few are in other languages. Many of the corpora have existed for some time but have now been newly annotated; new texts have been added to some, and some...
Article
Full-text available
For languages with large numbers of speakers, research and development in language technologies is today rapidly being transferred into commercial systems, which are becoming increasingly widespread. For example, automatic speech recognition and automatic speech synthesis solutions are being built on a large scale into affordable software packages intended primarily for use on personal computers...
Article
The paper gives a thorough examination of the Register of Slovenian-language manuscripts from the 17th and 18th centuries from different points of view: it is presented as a digital repository in humanities disciplines available for searching (digital library) and as a methodological framework of further scholarly research and discoveries in the fi...
Article
The paper deals with the issues of digital curation, including storage formats, presentation, and access rights, in the frame of the case-study of two projects on digitisation of written materials: the Scholarly digital editions of Slovenian literature and the Slovenian biographical lexicon. Three basic aspects of digital curation, including preser...
Article
The paper presents three language resources enabling better full-text access to digitised printed historical Slovenian texts: a hand-annotated corpus, a hand-annotated lexicon of historical words and a collection of transcribed texts. The aim of the resources is twofold: on one hand they support empirical linguistic research (corpus, collection) an...
Article
Full-text available
This paper presents the Register of Early Modern Slovenian Manuscripts, which includes manuscripts from the 17th and 18th centuries that have been overlooked by scholars focused on printed books from the same era. The Register attempts to address this gap in Slovenian manuscript studies by describing these unknown and forgotten early modern manuscr...
Article
Full-text available
The paper presents a set of integrated on-line language resources targeted at Japanese language learners, primarily those whose mother tongue is Slovene. The resources consist of the on-line Japanese-Slovene learners’ dictionary jaSlo and two corpora, a 1 million word Japanese-Slovene parallel corpus and a 300 million word corpus of web pages, wher...
Conference Paper
This paper describes a Web-based editor called CoBaLT (Corpus-Based Lexicon Tool), developed to construct corpus-based computational lexica and to correct word-level annotations and transcription errors in corpora. The paper describes the tool as well as our experience in using it to annotate a reference corpus and compile a large lexicon of histor...
Article
The paper presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The MULTEXT-East dataset includes the morphosyntactic specifications, morphosyntactic lexica, and a parallel corpus, the novel “1984” by George Orwell, which is sentence al...
Article
This paper presents a web service for automatic linguistic annotation of Slovene and English texts. The web service enables text up-loading in a number of different input formats, and then converts, tokenises, tags and lemmatises the text, and returns the annotated text. The paper presents the ToTrTaLe annotation tool, and the implementation of the...
Conference Paper
Full-text available
Web corpora have become an attractive source of linguistic content, yet are for many languages still not available. This paper introduces two new annotated web corpora: the Croatian hrWaC and the Slovene slWaC. Both were built using a modified standard “Web as Corpus” pipeline having in mind the limited amount of available web data. The modificatio...
Conference Paper
The paper describes a tool developed to process historical (Slovene) text, which annotates words in a TEI encoded corpus with their modern-day equivalents, morphosyntactic tags and lemmas. Such a tool is useful for developing historical corpora of highly-inflecting languages, enabling full text search in digital libraries of historical texts, for m...
Conference Paper
Full-text available
This paper describes the modeling of the morphosyntactic annotations of the MULTEXT-East corpora and lexicons as an OWL/DL ontology. Formalizing annotation schemes in OWL/DL has the advantages of enabling formally specifying interrelationships between the various features and making logical inferences based on the relationships between them. We sho...
Article
Full-text available
Part-of-Speech (PoS) or, better, morphosyntactic tagging is the process of assigning morphosyntactic categories to words in a text, an important pre-processing step for most human language technology applications. PoS-tagging of Slovene texts is a challenging task since the size of the tagset is over one thousand tags (as opposed to English, where...
Article
Full-text available
Lemmatisation is the process of finding the normalised forms of words appearing in text. It is a useful preprocessing step for a number of language engineering and text mining tasks, and especially important for languages with rich inflectional morphology. This paper presents a new lemmatisation system, LemmaGen, which was trained to generate accur...
Conference Paper
Full-text available
After a brief overview of the elements of modern grid computing, a number of common use-cases of natural language processing tasks running on the grid are presented, notably corpus annotation with morpho-syntactic tagging (600+ million-word corpus in one day), n-gram statistics processing of a corpus and web-accessible services with annotation and...
Conference Paper
Full-text available
The JOS language resources are meant to facilitate developments of HLT and corpus linguistics for the Slovene language and consist of the morphosyntactic specifications, defining the Slovene morphosyntactic features and tagset; two annotated corpora (jos100k and jos1M); and two web services (a concordancer and text annotation tool). The paper intro...
Article
Full-text available
The article discusses possibilities of using the Grid platform for Natural Language Processing tasks. Legal problems concerning distribution of copyrighted texts are described and possible solutions including encryption of data are outlined.
Article
Full-text available
In this article we analyze common human language technology requirements and the possibility of implementing them using Grid infrastructure. Different possibilities for the setup of an execution environment are treated and the standard PKI based Grid security approach is explained, with an emphasis on securing data access in a potentially untrus...
Article
Full-text available
We describe the results of a short-term SEEERAnet project the aim of which was to investigate the feasibility of machine translation (MT) research and development for several South Slavic and Balkan languages, more precisely Romanian, Bulgarian, Slovene, Greek and Serbian. For these languages MT systems are scarce and for some of them even non-exis...
Article
Full-text available
It is widely recognized that the proliferation of annotation schemes runs counter to the need to re-use language resources, and that standards for linguistic annotation are becoming increasingly mandatory. To answer this need, we have developed a framework comprised of an abstract model for a variety of different annotation types (e.g., morpho-synt...
Article
Full-text available
Of all the major world languages, Japanese is lagging behind in terms of publicly accessible and searchable corpora. In this paper we describe the development of JpWaC, a large corpus of 400 million words of Japanese web text, and its encoding for the Sketch Engine. The Sketch Engine is a web-based corpus query tool that supports fast concordancing...
Conference Paper
Full-text available
In our previous work we introduced a hybrid, GA&ILP-based approach for learning of stem-suffix segmentation rules from an unmarked list of words. Evaluation of the method was made difficult by the lack of word corpora annotated with their morphological segmentation. Here the hybrid approach is evaluated indirectly, on the task of tag prediction. A...
Article
Full-text available
Part-of-speech (PoS) or, better, morphosyntactic tagging is the process of assigning morphosyntactic categories to words in a text, an important pre-processing step for most human language technology applications. PoS-tagging of Slovene texts is a challenging task since the size of the tagset is over one thousand tags (as opposed to English, where...
Article
Full-text available
Lemmatisation is the process of finding the normalised forms of wordforms as they appear in text. It is a useful pre-processing step for a large number of language engineering tasks, and especially important for languages with rich inflection morphology. This paper presents a machine learning approach to automated word lemmatisation using a Ripple...
Conference Paper
Full-text available
The JOS morphosyntactic resources for Slovene consist of the specifications, lexicon, and two corpora: jos100k, a 100,000 word balanced monolingual sampled corpus annotated with hand validated morphosyntactic descriptions (MSDs) and lemmas, and jos1M, the 1 million word partially hand validated corpus. The two corpora have been sampled from the 60...
Conference Paper
Full-text available
This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, modifications of the core tagset, and its evaluation. The tagset and associated morphosyntactic specifications are based on the MULTEXT-East framework, while the decisions in designing it were aimed at achieving a balance between parameters impo...
