Darinka Verdonik

  • PhD
  • University of Maribor

About

49
Publications
8,117
Reads
236
Citations
Current institution
University of Maribor


Publications (49)
Article
Full-text available
The paper details the creation of an open access speech corpus for a less-resourced language, covering the diversity in terms of accents, dialects, speech styles and demographic characteristics that exist in the target population. Three primary challenges are identified that impact the time and cost efficiency of such a speech corpus development si...
Article
Full-text available
Sequence-to-sequence models have been applied to many challenging problems, including those in text and speech technologies. Normalization is one of them. It refers to transforming non-standard language forms into their standard counterparts. Non-standard language forms come from different written and spoken sources. This paper deals with one such...
Chapter
The scientific monograph Stanje in perspektive uporabe govornih virov v raziskavah govora (The State and Prospects of Using Speech Resources in Speech Research) presents the results of the first year of work in the research project Temeljne raziskave za razvoj govornih virov in tehnologij za slovenski jezik (Basic Research for the Development of Speech Resources and Technologies for the Slovene Language, J7-4642), as well as the results of researchers working on speech in other research projects, with the shared main...
Chapter
The aim of this study is to uncover the interaction between DM functions and dialogue segmentation. More precisely, we investigate to what extent DMs are used as autonomous dialogue acts or as part of a larger dialogue act. Dialogue acts are defined as speech acts following Austin’s definition of illocutionary force. We hypothesize that the dialogu...
Article
The research proposed in this paper focuses on pragmatic interlinks between discourse markers and non-verbal behavior. Although non-verbal behavior is recognized to add non-redundant information and social interaction is not merely recognized as the transmission of words and sentences, the evidence regarding grammatical/linguistic interlinks betwee...
Book
This teaching material is adapted for students of computer science and of technical disciplines in general. It covers a range of communication topics and uses them to shed light on various aspects of private, business and public communication, from the perspective of either the author or the addressee. It covers: (1) various forms of workplace communication, with an emphasis on work meetings...
Chapter
The first part of the monograph presents scientific contributions, and the second part professional contributions, on the development and use of language resources and e-learning environments for teaching Slovene. It takes as its starting point digitalization in linguistics and the new possibilities for teaching Slovene, and outlines directions for the development of teaching materials. It highlights the latest language resources, which...
Chapter
The first part of the monograph presents scientific contributions, and the second part professional contributions, on the development and use of language resources and e-learning environments for teaching Slovene. It takes as its starting point digitalization in linguistics and the new possibilities for teaching Slovene, and outlines directions for the development of teaching materials. It highlights the latest language resources, which...
Article
Full-text available
The paper starts from three challenges that we observe in the teaching of Slovene in the upper grades of primary school and in secondary school: how to eliminate errors against the standard-language norm that persist in pupils' written work; how to improve phraseological competence; and how to improve communicative language competence. These challenges are the central focus in developing a modern e-learning environment...
Chapter
The monograph presents interim results of the project Slovenščina na dlani (2017–2021), within which we are preparing an innovative, freely accessible e-learning environment for teaching Slovene. The e-environment is designed as an extensive collection of exercises and tasks from four content areas (orthography, grammar, phrasemes and proverbs, and texts), which are based on...
Chapter
Full-text available
The present research explores non-verbal behavior that accompanies the management of turns in naturally occurring conversations. To analyze turn management, we implemented the ISO 24617-2 multidimensional dialog act annotation scheme. The classification of the communicative intent of non-verbal behavior was performed with the annotation scheme for...
Research
EVA Corpus 1.0 consists of one episode of an audio/video session, 57 minutes long, with corresponding orthographic transcriptions. The multi-party spontaneous discourse in the recording is from the evening entertainment TV talk show "A si ti tut not padu", broadcast by the Slovene commercial TV station POP-TV in 2008, and represents a p...
Chapter
The present paper describes a corpus for research into the pragmatic nature of how information is expressed synchronously through language, speech, and gestures. The outlined research stems from the ‘growth point theory’ and ‘integrated systems hypothesis’, which proposes that co-speech gestures (including hand gestures, facial expressions, posture...
Article
This paper presents an investigation of what is gained in the process of dictionary creation by using a speech reference corpus of one million words in conjunction with a huge written reference corpus. It also analyses how much additional effort this requires. Collecting spoken data takes a great deal of effort, and existing speech corpora are rath...
Article
In the present paper, we investigate a group of markers in spoken interaction, commonly termed general extenders (GEs). We compare their usage in different discourse settings within the reference speech corpus of the Slovene language GOS. The results show that there is a high variability of GE form, but that most forms are rarely used. GEs are gene...
Article
Full-text available
The discussion concerns the theoretical and methodological principles developed in the circles of the so-called neo-Firthians, who from the very beginning have advocated corpus analysis that is as unburdened as possible by earlier linguistic theories. The paper first reviews the works of these authors, which gave rise to, among other things, pattern grammar, theor...
Article
The paper investigates the use of Christian vocabulary in phrasemes (with an emphasis on pragmatic phrasemes). To complement previous phraseological studies based on dictionary material, written texts or written corpora, we focus on usage in everyday spontaneous spoken language. The results show that these are words predominantly used in...
Article
This paper addresses the problem of statistical machine translation between highly inflected languages. Even when dealing with closely-related language pairs, statistical machine translation encounters problems if the parallel corpus is not big enough. To reduce the problem of data sparsity, we use the approach called factored translation, which ha...
Article
This article studies the use of Christian vocabulary, and coinages derived from it, in phrasemes (with an emphasis on pragmatic phrasemes). As an addition to previous studies of phraseology that were based on lexicographic material, written texts or written corpora, this study examines everyday spontaneous spoken language. The results show that thes...
Article
Full-text available
In recent years, building reference speech corpora has been an important part of the activities that provide the necessary linguistic infrastructure in many European countries, for languages with many speakers (e.g., French, German, Spanish, Italian) as well as for those with smaller numbers of speakers (e.g., Swedish, Dutch, Czech, Slovak). This pape...
Article
Full-text available
Evaluation of translations is a constant part of machine translation development, and it relies mainly on automatic procedures. These are always based on a reference translation. In this paper we show how greatly reference translations can differ in the subtitling domain and how this can affect the score: the same metric can rate the same translation system as...
Article
Full-text available
The article gives detailed information about the procedures for building Slovenian lexica within the LC-STAR project, and about the size of those lexica. The University of Maribor joined the LC-STAR project in order to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language....
Article
Full-text available
The paper provides an overview and critical examination of the concept of context as it developed in various theories of linguistics and discourse analyses. The presented approaches to context belong to two main directions, i.e., social and cognitive. Socially oriented approaches to context do not develop a detailed and coherent theory of context t...
Article
Full-text available
One of the aspects of speech that remains under-researched is the internal variety of speech, i.e. the differences and similarities between different types of speech. This paper aims to contribute to this research by making the comparison between different discourses of Slovene spontaneous speech, focusing on the use of vocabulary. The key word ana...
Article
Full-text available
Different kinds of pragmatic expressions in spoken discourse, such as discourse markers, interjections, topic orientation markers, pragmatic deictics, general extenders, etc., have attracted the attention of researchers over recent decades. However, expressions that have their origins within religions have not as yet been studied from the pragmatic...
Article
Full-text available
This paper presents a framework for the efficient development and representation of morphological and phonetic lexicons, to be used in speech technology applications. When developing the lexicons, the solutions most appropriate for developing speech technologies for a specific language have to be analyzed. In the paper, issues such as the...
Article
Full-text available
General extenders are expressions such as in tako naprej ‘and so on’, pa to ‘and such’, pa tako ‘and like that’, ali pa nekaj takega ‘or something like that’. The paper provides a survey of the forms and frequency of these expressions in various types of Slovene discourse and through a qualitative analysis sheds light on discursive roles of the mos...
Article
General extenders are expressions such as in tako naprej 'and so on', pa to 'and such', pa tako 'and like that', ali pa nekaj takega 'or something like that'. The paper provides a survey of the forms and frequency of these expressions in various types of Slovene discourse and through a qualitative analysis sheds light on discursive roles of the mos...
Book
Full-text available
The book Slovenski govorni korpus Gos, by Dr. Darinka Verdonik and Dr. Ana Zwitter Vitez, has been published in the new series Sporazumevanje. The monograph presents the design, construction and possible uses of the first reference speech corpus, which contains authentic recordings of spoken Slovene captured in the most common speech situations: in...
Article
Full-text available
In this paper, we discuss borderline examples of (mis)understanding where it is not clear whether or not a misunderstanding has occurred, whether or not communication was successful, and where the participants do not try to negotiate an understanding, even though different interpretations are very likely to exist. By analyzing real data, we point o...
Article
Full-text available
The relationships between text or talk and the context are among the basic fields of pragmatic research and an insight into their nature may contribute to a better understanding of language use. In this article, we use the results of an analysis of discourse marker use in two different conversational genres (telephone conversation and television in...
Article
The paper presents a new Slovenian language resource, the Slovenian BNSI Broadcast News database. Its main goal is to provide the language resources necessary for Slovenian large-vocabulary continuous speech recognition in an unrestricted domain. The BNSI Broadcast News database is the result of cooperation between the Faculty of Electrical Engin...
Article
Full-text available
Speech-to-speech translation technology has difficulties processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consistency on what expressions...
Article
Full-text available
University of Maribor, Faculty of Electrical Engineering and Computer Science, Centre for Language Technologies, Smetanova ul. 17, 2000 Maribor, Slovenia; darinka.verdonik@guest.arnes.si. Abstract: The article presents the linguistic aspects of compiling a morphological and a phonetic lexicon of standard Slovene (SImlex and SIflex), which we are editing...
Article
Full-text available
The paper presents the Turdis database of spontaneous conversations in the tourist domain in the Slovenian language. The database was built for use in developing speech-to-speech translation components; however, it can also be used for developing dialog systems or for linguistic research. The idea was to record a database of telephone conversations in...
Article
The aim of this paper is to discuss and specify some pragmatic language categories that could be used as attributes in spontaneous speech corpora, especially corpora used for developing components of speech-to-speech translation systems. When developing speech-to-speech translation, researchers have to deal with spontaneous (conversational) sp...
Article
Full-text available
This paper presents the Slovenian Broadcast News Database project, which was started in 2002 as a cooperation between the University of Maribor and the Slovenian national broadcaster RTV Slovenia. The resulting database will be used for large-vocabulary continuous speech recognition and multimedia database retrieval or archive indexation. First some organ...
Article
Full-text available
This paper presents the SINOD database, the first Slovenian non-native speech database. It will be used to improve the performance of a large-vocabulary continuous speech recogniser for non-native speakers. The main quality impact is expected for the acoustic models and the recogniser's vocabulary. The SINOD database is designed as a supplement to the...
Article
The article presents the language resources needed for developing speech-to-speech translation systems. Most researchers have recently focused on statistical approaches to machine translation, so we concentrate on the language resources needed for statistical machine translation. Since speech-to-speech machine translation systems are composed of...
Article
Full-text available
Discourse markers are expressions which contribute little or nothing to the content of discourse, but have an important pragmatic function, since they connect the content, organise discourse, express the attitude towards the interlocutor, the content, etc. Even though the number of studies focusing on discourse markers within the framework of pragm...
Article
This paper presents the Slovenian Broadcast News Database project, which was started in 2002 as a cooperation between the University of Maribor and the Slovenian national broadcaster RTV Slovenia. The resulting database will be used for large-vocabulary continuous speech recognition and multimedia database retrieval or archive indexation. First some organ...
Article
Full-text available
The paper presents a comparison of the morphological characteristics of three highly inflectional Slavic languages: Slovenian, Russian and Czech. We find that the morphology of these languages is very similar. We then present SImlex, a morphological lexicon for Slovenian with 20,000 lemmas and around 800,000 morphological forms, and SIflex, a parallel...
