About
78 publications
13,681 reads
318 citations
Publications (78)
Because of the prevalence of depression, its often-chronic course, relapse and associated disability, early detection and non-intrusive monitoring are crucial tools for timely diagnosis and treatment, remission of depression and prevention of relapse. In this way, its impact on quality of life and well-being can be limited. Current attempts to use...
The importance and value of real-world data in healthcare cannot be overstated because it offers a valuable source of insights into patient experiences. Traditional patient-reported experience and outcomes measures (PREMs/PROMs) often fall short in addressing the complexities of these experiences due to subjectivity and their inability to precisely...
With spoken language interfaces, chatbots, and enablers, conversational intelligence has become an emerging field of research in man-machine interfaces across several target domains. In this paper, we introduce a multilingual conversational chatbot platform that integrates the Open Health Connect platform and an mHealth application together with multimodal...
Background: Due to the prevalence of depression, its often chronic course, recurrences, and associated disability, early detection and non-intrusive monitoring are crucial tools for timely diagnosis and treatment, remission of depression, and prevention of relapse, thereby limiting its impact on quality of life and well-being. Existing suc...
This paper focuses on gaining new knowledge through observation, qualitative analytics, and cross-modal fusion of rich multi-layered conversational features expressed during multiparty discourse. The outlined research stems from the theory that speech and co-speech gestures originate from the same representation; however, the representation is not...
Large acoustic inventories must be used to produce speech close to natural quality. However, the concatenation cost space grows exponentially with the number of acoustic units in the acoustic inventory, which increases the latency of the unit selection algorithm and makes such algorithms unusable in real-time end-to-end systems. Even when data compression tech...
The research proposed in this paper focuses on pragmatic interlinks between discourse markers and non-verbal behavior. Although non-verbal behavior is recognized to add non-redundant information and social interaction is not merely recognized as the transmission of words and sentences, the evidence regarding grammatical/linguistic interlinks betwee...
Collecting patient-reported outcomes (PROs) remotely and using them in the clinical workflow improves both patients’ quality of life and cancer care. However, the collection of PROs is rarely adopted into the clinical workflow, and existing work still falls short of providing a holistic approach. This paper offers enhancement...
Patient-reported outcomes (PROs) and their use in the clinical workflow can improve cancer survivors’ outcomes and quality of life. However, there are several challenges regarding efficient collection of the patient-reported outcomes and their integration into the clinical workflow. Patient adherence and interoperability are recognized as main barr...
When human-TV interaction is performed by remote control and mobile devices only, the interactions tend to be mechanical, dreary and uninformative. To achieve more advanced and more human-like interaction, we introduce virtual agent technology as a feedback interface. Verbal and co-verbal gestures are linked through complex mental pro...
The present research explores non-verbal behavior that accompanies the management of turns in naturally occurring conversations. To analyze turn management, we implemented the ISO 24617-2 multidimensional dialog act annotation scheme. The classification of the communicative intent of non-verbal behavior was performed with the annotation scheme for...
In data-driven corpus-based text-to-speech synthesis systems, the main issue is to select the most natural-sounding sequence of acoustic units without unnatural acoustic transitions, and to minimize all acoustic mismatches at the concatenation points. Unit selection algorithms incorporating unit selection cost functions have been known to synthesiz...
EVA Corpus 1.0 consists of one episode of an audio/video session plus corresponding orthographic transcriptions with a duration of 57 minutes. The multi-party spontaneous discourse in the recording is from an evening entertainment TV talk show, "A si ti tut not padu", broadcast by the Slovene commercial TV station POP-TV in 2008, and represents a p...
The present paper describes a corpus for research into the pragmatic nature of how information is expressed synchronously through language, speech, and gestures. The outlined research stems from the ‘growth point theory’ and the ‘integrated systems hypothesis’, which propose that co-speech gestures (including hand gestures, facial expressions, posture...
A major drawback of corpus-based speech synthesis systems is the use of large acoustic inventories, and currently one of the main challenges is the optimal representation of concatenation costs associated with units in the acoustic inventory. These concatenation costs are used to evaluate spectral mismatches between the acoustic units to be concate...
This paper outlines a novel framework that has been designed to create a repository of “gestures” for embodied conversational agents. By utilizing it, the virtual agents can sculpt conversational expressions incorporating both verbal and non-verbal cues. The 3D representations of gestures are captured in EVA Corpus, and then stored as a repository...
To address these issues, a comprehensive multimodal knowledge source is used that is based on video data of spontaneous multi-party conversations, together with its utilization and the extraction of information signals through the EVA annotation scheme [14], which has been defined based on several theories in corpus linguistics, psycho-linguistics, a...
This paper outlines a novel framework that has been designed to create a repository of “gestures” for embodied conversational agents. By utilizing it, the virtual agents can sculpt conversational expressions incorporating both verbal and non-verbal cues. The 3D representations of gestures are captured in EVA Corpus, and then stored as a repositor...
We are looking for a candidate to apply for a job as part of the H2020 project. Number of posted positions: 1. The job location is the University of Maribor, Faculty of Electrical Engineering, Computer and Information Science, Laboratory for Digital Signal Processing. The type of employment is full-time. Duration of the employment: full time fixed term...
We are looking for a researcher to apply for a research position under the public call "Javni razpis za spodbujanje raziskovalcev na začetku kariere 2.1. (CALL 5442-1/2018)".
The researcher must have an interest in the following topics:
• conversational dialog systems and dialogue managers,
• machine learning (especially deep learning),
• natural langua...
In order to engage with a human user on a more personal level, natural HCI is starting to virtualize itself and is utilizing the potential of entities resembling human collocutors in interaction. In particular, through human-likeness, these entities represent multimodal interaction models that are capable of adapting to the user’s context and to facili...
Conversation is becoming one of the key interaction modes in HMI. As a result, conversational agents (CAs) have become an important tool in various everyday scenarios. From Apple and Microsoft to Amazon, Google, and Facebook, all have adopted their own variations of CAs. The CAs range from chatbots and 2D, cartoon-like implementations of talking...
WSEAS Transactions on Environment and Development
Embodied conversational agents are virtual entities that tend to imitate as many features of face-to-face dialogs as possible. In order to achieve this goal, the ability to reproduce synchronized verbal and co-verbal signals coupled into conversational behavior becomes essential. Further, signals such as social cues, attitude (emotions), personality,...
Multimodality and multimodal communication form a rapidly evolving research field addressed by scientists working from various perspectives, from psycho-sociological fields, anthropology and linguistics, to communication and multimodal interfaces, companions, smart homes and ambient assisted living etc. Multimodality in human-machine interaction is not...
This study is part of an ongoing effort to empirically investigate in detail the relations between verbal and co-verbal behavior expressed during multi-speaker, highly spontaneous and affective face-to-face conversations. The main motivation for this study is to be able to create natural co-verbal resources for automatic synthesis of highly n...
In the paper, a speech-based platform for intelligent ambience and/or supportive environment applications is presented. The platform has a distributed architecture, which enables extended connectivity and support for multiple intelligent ambience services. The mobile unit Genesis is an integral part of the distributed platform, enabling interaction...
Full access: https://authors.elsevier.com/a/1T~NG3OWJ8hFRu
As a result of the convergence of different services delivered over the internet protocol, internet protocol television (IPTV) may be regarded as one of the most widespread user interfaces accepted by a highly diverse user domain. Every generation, from children to the elderly, can use...
Version 1.0.4 released. Version 1.0.4 of Meettell mobile application features the moderator functionality. By entering the moderator session code any participant can become a moderator of a discussion and can efficiently manage the discussion with his/her mobile phone. The Meettell mobile application now supports all the available discussion manage...
An advanced new system for managing discussions at meetings, conferences and other events.
The aim of the book is to present a flexible and efficient algorithm and a novel system used for the planning, generation, and realization of conversational behavior (co-verbal behavior). Such behavior is best described as a set of moving body parts, which are meaningful. In terms of prosody, it is synchronized with the accompanying speech. The m...
Multimodal interfaces incorporating embodied conversational agents enable the development of novel concepts regarding interaction management tactics within responsive human-machine interfaces. Such interfaces provide several additional non-verbal communication channels, such as: natural visualized speech, facial expression, and different body motio...
The paper presents the novel design of a one-pass large vocabulary continuous-speech recognition decoder engine, named SPREAD. The decoder is based on a time-synchronous beam-search approach, including statically expanded cross-word triphone contexts. An approach using efficient tuple structures is proposed for the construction of the complete sear...
Several systems with multimodal interfaces are already available, and they allow for a more natural and more advanced exchange of information between man and a machine. Nevertheless, the television domain is still undergoing an innovation/development phase within which standard linear television is further enhanced with several novel technologies....
The main goal of using non-verbal modalities together with the general text-to-speech (TTS) system is to better emulate the human-like course of interaction between users and the UMB-SmartTV platform. Namely, when human-TV interaction is supported by TTS only, the interactions still tend to be less functional and less human-like. In order to achiev...
Embodied conversational agents (ECA) and speech-based human–machine interfaces can together represent more advanced and more natural human–machine interaction. Fusion of both topics is a challenging agenda in research and production spheres. The important goal of human–machine interfaces is to provide content or functionality in the form of a dialo...
IPTV services are still evolving and try to bring ICT novelties into the IPTV environment. Several initiatives focus on providing more personalized interactivity to standard TV sets and on developing more personalized interactive applications for STBs. Nevertheless, the personalization and interactivity are usually limited towards context-awarene...
When human-TV interaction is performed by remote control and mobile devices only, the interactions tend to be mechanical, dreary and uninformative. To achieve more advanced and more human-like interaction, we introduce virtual agent technology as a feedback interface. Verbal and co-verbal gestures are linked through complex mental pr...
This article gives detailed information about the procedures for building Slovenian lexica within the LC-STAR project, as well as about the size of those lexica. The University of Maribor joined the LC-STAR project in order to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language....
Web-based solutions and interfaces should be easy, more intuitive, and should also adapt to the natural and cognitive information processing and presentation capabilities of humans. Today, human-controlled multimodal systems with multimodal interfaces are possible. They allow for a more natural and more advanced exchange of information between man...
Visual perception, speech perception and the understanding of perceived information are linked through complex mental processes. Gestures, as part of visual perception and synchronized with verbal information, are a key concept of human social interaction. Even when there is no physical contact (e.g., a phone conversation), humans still tend to exp...
Several features of human-human conversation have to be accounted for in order to recreate conversational behavior on a synthetic model as naturally as possible. Spontaneous conversations combine multiple modalities (e.g. gestures, postures, gazes, expressions) in order to effectively convey information between participants. This paper...
We describe the preparation of parallel corpora based on professional quality subtitles in seven European language pairs. The main focus is the effect of the processing steps on the size and quality of the final corpora.
This paper presents a framework for the efficient development and representation of morphological and phonetic lexicons to be used in speech technology applications. When developing the lexicons, the solutions most appropriate for developing speech technologies for a specific language have to be analyzed. In the paper issues such as the...
This paper presents a novel process of transferring the human-generated communicative behavior onto an embodied conversational agent. The aim of our work is to build a high-resolution motion dictionary based on empirical analysis of non-verbal behavior performed in multi-speaker informal dialogues. The verbal and non-verbal behavior is recreated...
Non-verbal behavior performed by embodied conversational agents still appears “wooden” and sometimes even “unnatural”. Annotated corpora and high-resolution annotations capturing the expressive details of movement may improve the gradualness of synthetic behavior. This paper presents a non-functional, form-oriented annotation scheme based on infor...
This paper proposes a gradient-descent based unit selection optimization algorithm for the optimization of unit-cost function weights and for improving the overall performance of the unit-selection algorithm, as used in a corpus-based text-to-speech synthesis system. Complex multidimensional and fuzzy-logic based unit-cost functions are used in the...
The first reason for using non-verbal modalities together with the TTS system is to better emulate the natural course of the dialogue, and to make people feel more comfortable when “speaking” to a machine. The second reason is hidden in those issues that occur during the usage of human-machine interaction systems. The need to repeat and the m...
Web applications are a widespread and widely used concept for presenting information. Their underlying architecture and standards, in many cases, limit their presentation/control capabilities of showing pre-recorded audio/video sequences. Highly-dynamic text content, for instance, can only be displayed in its native form (as part of HTML conte...
Multimodal interfaces supporting ECAs enable the development of novel concepts regarding human-machine interaction interfaces and provide several communication channels such as: natural speech, facial expression, and different body gestures. This paper presents the synthesis of expressive behaviour within the realm of affective computing. By provid...
Embodied Conversational Agents (ECAs) play an important role in the development of personalized and expressive human-machine interaction, allowing users to interact with a system over several communication channels, such as: natural speech, facial expression, and different body gestures. This paper presents a novel approach to the generation o...
The ECESS consortium (European Center of Excellence in Speech Synthesis) aims to speed up progress in speech synthesis technology, by providing an appropriate evaluation framework. The key element of the evaluation framework is based on the partition of a text-to-speech synthesis system into distributed TTS modules. A text processing, prosody gener...
In this paper, a new modular framework (the EVA framework) and the expressive embodied conversational agent EVA are presented. From talking heads to fully animatable bodies, and by techniques such as behavioral modeling and emotion modeling, researchers are trying to present interaction interfaces providing behavior that is as natural as possible. The ECA EVA presen...
In this paper, a finite-state machine based distributed framework, DATA, used for the development of intelligent ambience systems is presented. The event-based distributed framework DATA enables the development of efficient, clear and flexible operation over several electronic devices, mobile platforms and mobile units that can be part of complex intelligent amb...
Most users in either desktop or ubiquitous environments access Web applications from Web browser interfaces. The majority of standard Web applications are still based on GUIs and usually support user-machine interaction using traditional human-machine interfaces (e.g. mouse, keyboard). In order to make access to the Web content more natural and to impr...
The paper presents a platform for the web-based evaluation of TTS modules and systems, named RES (Remote Evaluation System). It is being developed within the European Centre of Excellence for Speech Synthesis (ECESS, www.ecess.eu). The presented platform will be used for web based online evaluation of various text-to-speech (TTS) modules, and even complete T...
The ECESS consortium (European Center of Excellence for Speech Synthesis) has set up a framework for the evaluation of software modules and tools relevant for speech synthesis. So far, two lines of evaluation campaigns have been established: (1) Evaluation of the ECESS TTS modules (text processing, prosody, acoustic synthesis). (2) Evaluation of ECESS...
Speech-to-speech translation technology has difficulties processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consistency on what expressions...
This article presents a new unified approach to modeling grapheme-to-phoneme conversion for the PLATTOS Slovenian text-to-speech system. A cascaded structure consisting of several successive processing steps is proposed for the aim of grapheme-to-phoneme conversion. Processing foreign words and rules for the post-processing of phonetic transcriptio...
This paper proposes a time and space-efficient architecture for a text-to-speech synthesis system (TTS). The proposed architecture can be efficiently used in those applications with unlimited domain, requiring multilingual or polyglot functionality. The integration of a queuing mechanism, heterogeneous graphs and finite-state machines gives a power...
Embodied conversational agents employed in multimodal interaction applications have the potential to achieve similar properties as humans in face-to-face conversation. They enable the inclusion of verbal and nonverbal communication. Thus, the degree of personalization of the user interface is much higher than in other human-computer interfaces. Thi...
This paper focuses on the estimation of the Tilt intonation model (1). Usually, Tilt events are detected using a first estimation which is improved using gradient descent techniques. To speed up the search we propose to use a closed form expression for some of the Tilt parameters. The gradient descent search is used only for the time-related para...
In multilingual text-to-speech synthesis systems, many external extensive natural language resources are used, especially in the text processing part. Therefore it is very important that representation of these resources is time and space efficient. It is also very important that language resources for new languages can be easily incorporated into...
This paper presents an application, LentInfo, which is a system used to provide information about programmes for the Festival Lent in Slovenia. The Festival Lent consists of different open-air theatre and music performances and draws more than 400,000 visitors per year. This application is based on a Hidden Markov Model (HMM) speech recogniser, and...
Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid...
Many external natural language resources are used in spoken dialogue systems. These resources present considerable problems because of the space they require and their slow lookup time. It is, therefore, very important that the representation of external language resources is time and space efficient. It is also very important that new language resources are...
University of Maribor, Faculty of Electrical Engineering, Computer and Information Science, Centre for Language Technologies, Smetanova ul. 17, 2000 Maribor, Slovenia, darinka.verdonik@guest.arnes.si. Abstract: The article presents the linguistic aspect of compiling a morphological lexicon and a phonetic lexicon for standard Slovenian (SImlex and SIflex), which we are editing...
This paper presents a spoken dialogue system used for automatic correspondence. The system allows all of its available functions to be controlled by voice command, which is the most natural way of man-machine communication. The architecture of the system is modular. It consists of four major modules: graphic interface, recognizer based on key-w...
The paper presents the Turdis database of spontaneous conversations in the tourist domain in the Slovenian language. The database was built for use in developing speech-to-speech translation components; however, it can also be used for developing dialog systems or for linguistic research. The idea was to record a database of telephone conversations in...
University of Maribor, Faculty of Electrical Engineering, Computer and Information Science, Smetanova ul. 17, SI-2000 Maribor, Slovenia, andrej.zgank@uni-mb.si. Abstract: In this article we present the speech-driven information portal LentInfo, which is the first application presented to a wider circle of users that can be controlled using Slovenian speech. The portal...