Spela Arhar Holdt

University of Ljubljana

PhD

About

58 Publications
7,845 Reads
228 Citations
Citations since 2017
39 Research Items
149 Citations

Publications (58)
Article
Full-text available
The current study examines whether the fear of being laughed at (gelotophobia) can be assessed reliably and validly by means of a self-report instrument in different countries of the world. All items of the GELOPH (Ruch and Titze, GELOPH〈46〉, University of Düsseldorf, 1998; Ruch and Proyer, Swiss Journal of Psychology 67:19–27, 2008b) were translat...
Article
Full-text available
On 6 June 2018, the Jožef Stefan Institute hosted an event at which the goals and first results of the project New Grammar of Contemporary Standard Slovene: Sources and Methods (ARRS J6-8256) were presented to the public. The aim of the project is to develop a linguistic methodology for the computer-assisted analysis of contemporary Slovene as captured in reference...
Article
Full-text available
The Thesaurus of Modern Slovene is a responsive dictionary: it is compiled automatically from existing language resources while further developments of the dictionary include user participation. Many of the features introduced by the responsive model are new to the Slovene language community (e.g. data is extracted automatically and includes some e...
Conference Paper
Full-text available
The paper presents the "Game of Words" (in Slovene: Igra besed), a mobile application designed for the gamified improvement of two automatically compiled dictionaries for Slovene: the Collocations Dictionary of Modern Slovene and the Thesaurus of Modern Slovene. We provide a brief history of the game, and introduce its two modules that utilize colloc...
Conference Paper
Full-text available
We describe a new version of the Gigafida reference corpus of Slovene. In addition to updating the corpus with new material and annotating it with better tools, the focus of the upgrade was also on its transformation from a general reference corpus, which contains all language variants including non-standard language, to the corpus of standard (wri...
Chapter
Full-text available
The 2021 EUROCALL conference engaged just under 250 speakers from 40 different countries. Cnam Paris and Sorbonne Université joined forces to host and organise the event despite the challenging context due to the Covid-19 pandemic. Originally programmed to be held on site in the heart of Paris, France, the EUROCALL organising team and executive com...
Chapter
The first part of the monograph presents scientific contributions, and the second part professional contributions, on the development and use of language resources and e-learning environments for teaching Slovene. It takes digitalisation in linguistics and new possibilities for teaching Slovene as its starting point, and outlines directions for the development of teaching materials. It highlights the latest language resources, which...
Chapter
This paper aims at attracting attention towards crowdsourcing opportunities for the language teaching community. It presents and discusses a specific crowdsourcing paradigm designed to mass-produce exercise content for different languages by involving their language learners online. The discussion of the implicit crowdsourcing paradigm is framed by...
Conference Paper
Full-text available
Corpora are valuable sources for the development of language learning materials (e.g., books, grammars, dictionaries, exercises), because they contain language as produced in natural contexts. Even though corpora are getting larger, mainly due to crawling data from the web, their pedagogical use remains rather challenging. Not all texts are appropr...
Conference Paper
Full-text available
The paper presents the results of the project "Thesaurus of Modern Slovene: By the Community for the Community", which carried out a series of dissemination activities to raise public awareness of the release of the Thesaurus of Modern Slovene and of the new concept of a responsive dictionary, which is digitally designed and involves the user community in its development...
Conference Paper
Full-text available
The paper presents the preparation and content of a reference list of frequent general words for Slovene. The list was prepared by overlapping the 10,000 most frequent lemmas from four Slovene text corpora: Kres, the balanced reference corpus of written Slovene; GOS, the reference corpus of spoken Slovene; the corpus of computer-mediated...
Conference Paper
Full-text available
This paper presents recent developments and the content of the ssj500k training corpus, the largest and most widely used open-source collection of training data for Slovene language processing, which has been manually annotated with respect to segmentation, tokenisation, lemmatisation, JOS morphosyntax and dependency syntax, Universal Dependencies...
Article
Full-text available
The paper presents a cross-European survey on teachers and crowdsourcing. The survey examines how familiar language teachers are with the concept of crowdsourcing and addresses their attitude towards including crowdsourcing in language teaching activities. The survey was administered via an online questionnaire and collected volunteers' data on:...
Article
Full-text available
The paper is based on a survey conducted within the framework of the basic research project Collocations as a Basis for Language Description: Semantic and Temporal Perspectives (KOLOS; J6-8255). It presents a qualitative analysis of a user evaluation of the interface of the Collocations Dictionary of Modern Slovene (CDS). It discusses an alternativ...
Conference Paper
Full-text available
We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercise...
Conference Paper
Full-text available
The Thesaurus of Modern Slovene is the largest open-source digital collection of Slovene synonyms, published in March 2018 by the Centre of Language Resources and Technologies of the University of Ljubljana. The Thesaurus was initially compiled entirely automatically and allows users to contribute toward improving the resource by adding suggestions...
Article
Full-text available
This paper is an extended version of a conference paper presenting the categorization of verbal multi-word expressions (VMWEs) according to the PARSEME COST Action Shared Task 1.1 Guidelines. The categorization is universal but takes into account the characteristics of the individual languages included in it. The Shared Task was used to annotate ov...
Article
Full-text available
The majority of existing readability measures are designed for English texts. We aim to adapt and test the readability measures on Slovene. We test ten well-known readability formulas and eight additional readability criteria on five types of texts: children's magazines, general magazines, daily newspapers, technical magazines, and transcriptions o...
Article
The paper focuses on collocations typical of Slovene computer-mediated communication (CMC), which comprises communication via social networks, forums, blogs, etc. The study examines the CMC-specific collocates of the most frequent Slovene nouns, as well as collocates of CMC-typical nouns. Collocations were automatically extracted with procedures ba...
Article
Full-text available
A special thematic section of this year's first issue contains eight short scientific papers that give an overview of the current state of lexicography in Denmark, Sweden, Norway, Croatia, Greece, the Basque Country, Estonia, and Brazil. The papers are the result of scientific cooperation within the European network ENeL – European Network of e-Lex...
Article
Full-text available
The paper presents the first step towards supplementing the Sloleks lexicon with morphological patterns, using nouns as a case study. In the first step, the patterns are extracted automatically from the lexicon itself on the basis of selected distinguishing features (morphosyntactic tags and the variable parts of word forms). This is followed by manual classification, in which we (a) separate...
Article
Full-text available
The paper describes a lexical analysis of data extracted for a specific collocational frame from the Janes and Kres corpora, and presents results relevant to monitoring lexical innovations in the Slovene lexicon and to updating its description in dictionaries. The extracted data were analysed comparatively against current dictionaries of Slovene from the perspective of...
Article
Full-text available
The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. The survey is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. Our survey covers varied user groups, going beyond the students and translators who...
Article
Full-text available
The latest reference corpus of written Slovene, the Gigafida corpus, was created as part of the ‘Communication in Slovene’ project. In the same project, a web concordancer was designed for the broadest possible use, and tailored to the needs and abilities of user groups such as translators, writers, proofreaders and teachers. Two years after the co...
Conference Paper
Full-text available
The paper presents the categories of verbal multi-word expressions as defined for 26 different languages within the PARSEME Shared Task 1.1 of the international COST Action, and the creation of a training corpus of verbal multi-word expressions for Slovene. The main aim of the paper is to describe the first quantitative and qualitative analyses, which will provide a starting...
Conference Paper
Full-text available
The paper presents the Collocations Dictionary of Modern Slovene, a new lexical resource for Slovene. The resource is based on modern lexicographic methods, which include automatic extraction of lexical data from corpora, crowdsourcing, and rapid responsiveness to changes in the language. Among the more important features of the corpus is the display of entries in different...
Conference Paper
Full-text available
The majority of existing readability measures are explicitly designed for and tested on English texts. The aim of our paper is to adapt and test the readability measures on Slovene. We test a set of 10 well-known readability formulas and 8 additional readability criteria on different types of texts: children's magazines, general magazines, daily ne...
Chapter
Full-text available
The chapter presents the annotation and analysis of syntactic features of computer-mediated Slovene. At the level of word order, we first present the preparation of the Janes-Syn corpus and the adaptations of the annotation system to the specifics of computer-mediated Slovene, and then examine how three independent annotators understand marked word order...
Chapter
Full-text available
In this chapter we first present the general procedure and workflow for creating manually annotated corpora (from data preparation, drafting annotation guidelines, working with the annotation platform, and running the annotation campaign to conversion into the final format, publication, and distribution), focusing in more detail on the two largest corpora created in this way, Janes-...
Conference Paper
Full-text available
With the rise of digital media in the last decades, many language-related discussions have found a home on various fora and social media such as Facebook, where users can participate in a shared-interest group to discuss language use, problems and resources. The posts in these groups are formulated by language users as a genuine response to a specifi...
Conference Paper
Full-text available
By presenting the Thesaurus of Modern Slovene, the largest open-access collection of Slovene synonyms, this paper describes the concept of a responsive dictionary, a dictionary that allows its data to continuously respond to the changes in language and the feedback from the language community. We begin by briefly summarizing the method of its const...
Conference Paper
Full-text available
The paper presents the compilation of the Collocations Dictionary of Modern Slovene, a new resource targeting the language production needs of Slovene speakers. An important aspect of the compilation of the dictionary is the immediate publication of all the entries, from automatic, postprocessed, finalized by lexicographers and so on, and indicatin...
Article
Full-text available
The paper presents the results of the Slovene part of an international survey on language users' attitudes towards general monolingual dictionaries. The survey was conducted in 2017 as part of the activities of the European Network of e-Lexicography (ENeL). In Slovenia, the questionnaire was completed by 619 individuals, including both regular dictionary users and...
Article
Full-text available
A Survey on the Use of Slovene (the project “Language Policy in the Republic of Slovenia and the Needs of the Speakers of Slovene”)
Article
Full-text available
Response to the “Survey on Slovene” of the project Language Policy of the Republic of Slovenia and the Needs of Users
Article
Full-text available
The paper deals with the possibilities of using Šolar, a corpus of school writing, to identify the processes of collocation acquisition in Slovene, and with the potential impact of corpus-derived collocation data on the teaching of Slovene. On a numerically limited set of adjective + noun and noun + noun combinations...
Chapter
The chapter highlights the potential of corpus-based resources for language education in K-12, more specifically for L1 teaching in the higher grades of elementary school and in the secondary school. Presented are two freely available online resources that were recently developed for teaching and learning Slovene as L1. Firstly, the Šolar corpus (w...
Article
Full-text available
Terminology in Professional Life: The Status of the Present and the Needs of the Future
Article
Full-text available
In the week between 4 and 8 July 2016, a linguistic research camp for secondary-school and university students was held under the auspices of the Department of Translation Studies at the Faculty of Arts, University of Ljubljana, with financial support from the Slovene research infrastructure Clarin.si. The camp, whose central theme was internet Slovene, was the second of its kind (a report on...
Article
Full-text available
As part of the conference Slovenščina na spletu in v novih medijih (Slovene on the Web and in New Media), a round table entitled Slovenščina Janes: pogovorna, nestandardna, spletna ali spretna? (Janes Slovene: Colloquial, Non-standard, Online, or Skilful?) was held on 27 November 2015 in the hall of the Geographical Museum of GIAM ZRC SAZU. Five experts in Slovene linguistics were invited to the discussion: Dr Helena Dobrovoljc...
Article
Full-text available
The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers...
Article
This paper explores the possibility of identifying lexicographical needs of language users by analysing user-generated content in digital media, and presents an analysis of more than 1,000 language-related questions and comments posted on Slovene language advice sites, Facebook groups, and news forums. The questions stem from authentic situations o...
Article
Full-text available
Within the European funding mechanism COST, which supports international and interdisciplinary networking of European researchers and research, Action IS1401: Strengthening Europeans’ Capabilities by Establishing the European Literacy Network (ELN) is running between 2015 and 2018. The main aim of the Action is to connect researchers who, in various...
Article
Full-text available
The paper presents an analysis of the extraction of single- and multi-word term candidates, carried out with the LUIZ term extractor on the KoRP corpus for the purpose of compiling a terminological database of public relations. We focus in more detail on two aspects: (a) the extracted single-word nominal term candidates, whose list we compare with...
Article
Full-text available
One of the morphological peculiarities of Slovene is the option of using the ending -je instead of -i in the nominative plural of certain nouns of the first masculine declension (e.g. študenti – študentje). The paper presents the results of a corpus analysis showing which Slovene nouns appear with the ending -je in contemporary Slovene and how they...
Conference Paper
Full-text available
Mihael Arčan, Unit for Natural Language Processing, Galway. UDC 811.163.6'322'373.74'374. Computational lexicography is an interdisciplinary field that focuses on automating lexicographic procedures and preparing lexical databases of various kinds. The paper presents a procedure for automatically acquiring phrases...
Article
Full-text available
The paper focuses on the possibilities of using a text corpus, or corpus data, in Slovene language teaching. In the classroom, a corpus primarily complements the existing language description, as it provides up-to-date data on real language use that are indispensable for language teaching, and consequently opens new ways of thinking about language:...
Article
Full-text available
Trojina, Institute for Applied Slovene Studies. The SSJ Training Corpus and the Lexicon of Word Forms for Slovene. The main purpose of the paper is to present the preparation of a training corpus and a lexicon of word forms for Slovene. The 400,000-word SSJ corpus is planned to have four levels of annotation: lemmatisation, annotation at the morphosyntactic and syntactic levels, and...
Article
Full-text available
The paper describes the FidaPLUS corpus which is an upgrade of the Slovenian reference corpus. The corpus has been improved on various levels: size, up-to-dateness, quality of linguistic annotation (lemmatization, POS-tagging), availability and user-friendliness of the on-line concordancer. It has also been implemented in the Sketch Engine softwar...
Article
Full-text available
The paper presents the FidaPLUS corpus, an upgrade of the Slovene reference corpus. The corpus, distinguished on the one hand by its large size, up-to-dateness, the necessary linguistic annotation, balance, and heterogeneity, and on the other by a powerful, IT-supported concordancer, is freely available online for general use. The article...
Article
Full-text available
For research into language for special purposes, a reference corpus, as a balanced representative sample of texts from a given discourse space, is in itself insufficient, since it does not contain a large enough number of texts from individual subject fields. One of the methods a researcher can use in such cases is to build...
Article
Full-text available
Abstract. Klepec is a chatbot: a programmed conversational partner that uses Slovene to communicate with the user. It was created as part of the KOLOS project, whose goal is to enable communication between humans and computers in natural language. The article presents the current state of the program and outlines directions for further development,...

Projects (5)
Project
Our research combines knowledge from the fields of linguistics, computer and information sciences, and education to support the development of language resources and technologies for the contemporary Slovene language.
Project
The project developed corpus-based methodology and datasets for analysis of contemporary Slovene.