Matej Rojc

  • associate professor
  • Professor (Associate) at University of Maribor

About

78
Publications
13,681
Reads
318
Citations
Current institution
University of Maribor
Current position
  • Professor (Associate)
Additional affiliations
September 1997 - present
University of Maribor
Position
  • Professor (Associate)
January 1997 - present
University of Maribor
Position
  • Professor (Associate)

Publications

Article
Full-text available
Because of the prevalence of depression, its often-chronic course, relapse and associated disability, early detection and non-intrusive monitoring are crucial tools for timely diagnosis and treatment, remission of depression and prevention of relapse. In this way, its impact on quality of life and well-being can be limited. Current attempts to use...
Article
Full-text available
The importance and value of real-world data in healthcare cannot be overstated because it offers a valuable source of insights into patient experiences. Traditional patient-reported experience and outcomes measures (PREMs/PROMs) often fall short in addressing the complexities of these experiences due to subjectivity and their inability to precisely...
Chapter
Full-text available
With spoken language interfaces, chatbots, and enablers, conversational intelligence has become an emerging field of research in man-machine interfaces in several target domains. In this paper, we introduce a multilingual conversational chatbot platform that integrates the Open Health Connect platform and an mHealth application together with multimodal...
Preprint
BACKGROUND Due to the prevalence of depression, its often chronic course, recurrences, and associated disability, early detection and non-intrusive monitoring are crucial tools for timely diagnosis and treatment, remission of depression, and prevention of relapse, thereby limiting its impact on quality of life and well-being. Existing suc...
Article
Full-text available
This paper focuses on gaining new knowledge through observation, qualitative analytics, and cross-modal fusion of rich multi-layered conversational features expressed during multiparty discourse. The outlined research stems from the theory that speech and co-speech gestures originate from the same representation; however, the representation is not...
Article
Large acoustic inventories must be used to produce speech close to natural quality. However, the concatenation cost space grows exponentially with the number of acoustic units in the acoustic inventory, increasing the latency of the unit selection algorithm, making algorithms unusable in real-time end-to-end systems. Even when data compression tech...
Article
The research proposed in this paper focuses on pragmatic interlinks between discourse markers and non-verbal behavior. Although non-verbal behavior is recognized to add non-redundant information and social interaction is not merely recognized as the transmission of words and sentences, the evidence regarding grammatical/linguistic interlinks betwee...
Article
Full-text available
Remote collection of patient-reported outcomes (PROs) and their use in the clinical workflow improve both patients’ quality of life and cancer care. However, PRO collection is rarely adopted into the clinical workflow, and existing work still has many issues in providing a holistic approach. This paper offers enhancement...
Article
Full-text available
Patient-reported outcomes (PROs) and their use in the clinical workflow can improve cancer survivors’ outcomes and quality of life. However, there are several challenges regarding efficient collection of the patient-reported outcomes and their integration into the clinical workflow. Patient adherence and interoperability are recognized as main barr...
Article
Full-text available
When human-TV interaction is performed by remote control and mobile devices only, the interactions tend to be mechanical, dreary and uninformative. To achieve more advanced interaction that is closer to human-human interaction, we introduce virtual agent technology as a feedback interface. Verbal and co-verbal gestures are linked through complex mental pro...
Chapter
Full-text available
The present research explores non-verbal behavior that accompanies the management of turns in naturally occurring conversations. To analyze turn management, we implemented the ISO 24617-2 multidimensional dialog act annotation scheme. The classification of the communicative intent of non-verbal behavior was performed with the annotation scheme for...
Article
In data-driven corpus-based text-to-speech synthesis systems, the main issue is to select the most natural-sounding sequence of acoustic units without unnatural acoustic transitions, and to minimize all acoustic mismatches at the concatenation points. Unit selection algorithms incorporating unit selection cost functions have been known to synthesiz...
Research
EVA Corpus 1.0 consists of one episode of an audio/video session plus corresponding orthographic transcriptions with a duration of 57 minutes. The multi-party spontaneous discourse in the recording is from an evening entertainment TV talk show, "A si ti tut not padu", broadcast by POP-TV, a Slovene commercial TV station, in 2008, and represents a p...
Chapter
The present paper describes a corpus for research into the pragmatic nature of how information is expressed synchronously through language, speech, and gestures. The outlined research stems from the ‘growth point theory’ and ‘integrated systems hypothesis’, which proposes that co-speech gestures (including hand gestures, facial expressions, posture...
Article
Full-text available
A major drawback of corpus-based speech synthesis systems is the use of large acoustic inventories, and currently one of the main challenges is the optimal representation of concatenation costs associated with units in the acoustic inventory. These concatenation costs are used to evaluate spectral mismatches between the acoustic units to be concate...
Chapter
Full-text available
This paper outlines a novel framework that has been designed to create a repository of “gestures” for embodied conversational agents. By utilizing it, the virtual agents can sculpt conversational expressions incorporating both verbal and non-verbal cues. The 3D representations of gestures are captured in EVA Corpus, and then stored as a repository...
Chapter
Full-text available
In order to attack these issues, a comprehensive multimodal knowledge source is used that is based on video data of spontaneous multi-party conversations, together with its utilization and information signals extraction through EVA annotation scheme [14], which has been defined, based on several theories in corpus linguistics, psycho-linguistics, a...
Conference Paper
Full-text available
This paper outlines a novel framework that has been designed to create a repository of “gestures” for embodied conversational agents. By utilizing it, the virtual agents can sculpt conversational expressions incorporating both verbal and non-verbal cues. The 3D representations of gestures are captured in EVA Corpus, and then stored as a repositor...
Research Proposal
Full-text available
We are looking for a candidate to apply for a job as part of the H2020 project. Number of posted positions: 1. Job Location is at the University of Maribor, Faculty of Electrical Engineering, Computer and Information Science, Laboratory for Digital Signal Processing. Type of employment is full-time. Duration of the employment: full time fixed term...
Research Proposal
We are looking for a researcher to apply for a research position on the public call "Javni razpis za spodbujanje raziskovalcev na začetku kariere 2.1. (CALL 5442-1/2018)". The researcher must have interest in the following topics: • conversational dialog systems and dialogue managers, • machine learning (especially deep learning), • natural langua...
Chapter
Full-text available
In order to engage with a human user on a more personal level, natural HCI is starting to virtualize itself and is utilizing the potential of entities resembling human collocutors in interaction. In particular through human-likeness, these entities represent the multimodal interaction models, which are capable of adapting to the user’s context and to facili...
Chapter
Full-text available
Conversation is becoming one of the key interaction modes in HMI. As a result, conversational agents (CAs) have become an important tool in various everyday scenarios. From Apple and Microsoft to Amazon, Google, and Facebook, all have adopted their own variations of CAs. The CAs range from chatbots and 2D, cartoon-like implementations of talking...
Conference Paper
Full-text available
Embodied conversational agents are virtual entities that tend to imitate as many features of face-to-face dialogs as possible. In order to achieve this goal, the ability to reproduce synchronized verbal and co-verbal signals coupled into conversational behavior becomes essential. Further, signals such as social cues, attitude (emotions), personality,...
Article
Full-text available
Multimodality and multimodal communication form a rapidly evolving research field addressed by scientists working from various perspectives, from psycho-sociological fields, anthropology and linguistics to communication and multimodal interfaces, companions, smart homes, ambient assisted living, etc. Multimodality in human-machine interaction is not...
Conference Paper
Full-text available
This study is part of an ongoing effort to empirically investigate in detail the relations between verbal and co-verbal behavior expressed during multi-speaker, highly spontaneous and affective face-to-face conversations. The main motivation for this study is to be able to create natural co-verbal resources for automatic synthesis of highly n...
Article
In the paper, a speech-based platform for intelligent ambience and/or supportive environment applications is presented. The platform has a distributed architecture, which enables extended connectivity and support for multiple intelligent ambience services. The mobile unit Genesis is an integral part of the distributed platform, enabling interaction...
Article
Full-text available
Full access: https://authors.elsevier.com/a/1T~NG3OWJ8hFRu As a result of the convergence of different services delivered over the internet protocol, internet protocol television (IPTV) may be regarded as one of the most widespread user interfaces accepted by a highly diverse user domain. Every generation, from children to the elderly, can use...
Presentation
Full-text available
Version 1.0.4 released. Version 1.0.4 of Meettell mobile application features the moderator functionality. By entering the moderator session code any participant can become a moderator of a discussion and can efficiently manage the discussion with his/her mobile phone. The Meettell mobile application now supports all the available discussion manage...
Presentation
Full-text available
An advanced new system for managing discussions at meetings, conferences and other events.
Book
Full-text available
The aim of the book is to present a flexible and efficient algorithm and a novel system used for the planning, generation, and realization of conversational behavior (co-verbal behavior). Such behavior is best described as a set of moving body parts, which are meaningful. In terms of prosody, it is synchronized with the accompanying speech. The m...
Article
Full-text available
Multimodal interfaces incorporating embodied conversational agents enable the development of novel concepts regarding interaction management tactics within responsive human-machine interfaces. Such interfaces provide several additional non-verbal communication channels, such as: natural visualized speech, facial expression, and different body motio...
Article
Full-text available
The paper presents the novel design of a one-pass large vocabulary continuous-speech recognition decoder engine, named SPREAD. The decoder is based on a time-synchronous beam-search approach, including statically expanded cross-word triphone contexts. An approach using efficient tuple structures is proposed for the construction of the complete sear...
Article
Full-text available
Several systems with multimodal interfaces are already available, and they allow for a more natural and more advanced exchange of information between man and a machine. Nevertheless, the television domain is still undergoing an innovation/development phase within which standard linear television is further enhanced with several novel technologies....
Article
Full-text available
The main goal of using non-verbal modalities together with the general text-to-speech (TTS) system is to better emulate the human-like course of interaction between users and the UMB-SmartTV platform. Namely, when human-TV interaction is supported by TTS only, the interactions still tend to be less functional and less human-like. In order to achiev...
Book
Full-text available
Embodied conversational agents (ECA) and speech-based human–machine interfaces can together represent more advanced and more natural human–machine interaction. Fusion of both topics is a challenging agenda in research and production spheres. The important goal of human–machine interfaces is to provide content or functionality in the form of a dialo...
Conference Paper
Full-text available
IPTV services are still evolving and trying to bring ICT novelties into the IPTV environment. Several initiatives are focused on providing more personalized interactivity for standard TV sets and on developing more personalized interactive applications for STBs. Nevertheless, personalization and interactivity are usually limited towards context-awarene...
Conference Paper
Full-text available
When human-TV interaction is performed by remote control and mobile devices only, the interactions tend to be mechanical, dreary and uninformative. To achieve more advanced interaction that is closer to human-human interaction, we introduce virtual agent technology as a feedback interface. Verbal and co-verbal gestures are linked through complex mental pr...
Article
Full-text available
The article gives detailed information about the procedures for building Slovenian lexica within the LC-STAR project, and about the size of those lexica. The University of Maribor joined the LC-STAR project in order to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language....
Article
Full-text available
Web-based solutions and interfaces should be easy, more intuitive, and should also adapt to the natural and cognitive information processing and presentation capabilities of humans. Today, human-controlled multimodal systems with multimodal interfaces are possible. They allow for a more natural and more advanced exchange of information between man...
Article
Full-text available
Visual perception, speech perception and the understanding of perceived information are linked through complex mental processes. Gestures, as part of visual perception and synchronized with verbal information, are a key concept of human social interaction. Even when there is no physical contact (e.g., a phone conversation), humans still tend to exp...
Article
Full-text available
Several features of human-human conversation have to be accounted for in order to recreate conversational behavior on a synthetic model as naturally as possible. Spontaneous conversations combine multiple modalities (e.g. gestures, postures, gazes, expressions) in order to effectively convey information between participants. This paper...
Conference Paper
Full-text available
We describe the preparation of parallel corpora based on professional quality subtitles in seven European language pairs. The main focus is the effect of the processing steps on the size and quality of the final corpora.
Article
Full-text available
This paper presents a framework for the efficient development and representation of morphological and phonetic lexicons, to be used in speech technology applications. When developing the lexicons, the solutions most appropriate for developing speech technologies for a specific language have to be analyzed. In the paper, issues such as the...
Conference Paper
Full-text available
This paper presents a novel process of transferring the human-generated communicative behavior onto an embodied conversational agent. The aim of our work is to build a high-resolution motion dictionary based on empirical analysis of non-verbal behavior performed in multi-speaker informal dialogues. The verbal and non-verbal behavior is recreated...
Chapter
Full-text available
Non-verbal behavior performed by embodied conversational agents still appears “wooden” and sometimes even “unnatural”. Annotated corpora and high resolution annotations capturing the expressive details of movement, may improve the gradualness of synthetic behavior. This paper presents a non-functional, form-oriented annotation scheme based on infor...
Article
Full-text available
This paper proposes a gradient-descent based unit selection optimization algorithm for the optimization of unit-cost function weights and for improving the overall performance of the unit-selection algorithm, as used in a corpus-based text-to-speech synthesis system. Complex multidimensional and fuzzy-logic based unit-cost functions are used in the...
Chapter
Full-text available
The first reason for using non-verbal modalities together with the TTS system is to better emulate the natural course of the dialogue, and to make people feel more comfortable when “speaking” to a machine. The second reason lies in the issues that occur during the use of human-machine interaction systems. The need to repeat and the m...
Conference Paper
Full-text available
Web applications are a widespread and widely used concept for presenting information. Their underlying architecture and standards, in many cases, limit their presentation/control capabilities to showing pre-recorded audio/video sequences. Highly-dynamic text content, for instance, can only be displayed in its native form (as part of HTML conte...
Conference Paper
Full-text available
Multimodal interfaces supporting ECAs enable the development of novel concepts regarding human-machine interaction interfaces and provide several communication channels such as: natural speech, facial expression, and different body gestures. This paper presents the synthesis of expressive behaviour within the realm of affective computing. By provid...
Article
Full-text available
Embodied Conversational Agents (ECAs) play an important role in the development of personalized and expressive human-machine interaction, allowing users to interact with a system over several communication channels, such as: natural speech, facial expression, and different body gestures. This paper presents a novel approach to the generation o...
Article
Full-text available
The ECESS consortium (European Center of Excellence in Speech Synthesis) aims to speed up progress in speech synthesis technology, by providing an appropriate evaluation framework. The key element of the evaluation framework is based on the partition of a text-to-speech synthesis system into distributed TTS modules. A text processing, prosody gener...
Conference Paper
Full-text available
In this paper, a new modular framework (EVA framework) and the expressive embodied conversational agent EVA are presented. From talking heads to fully animatable bodies, and by techniques such as behavioral modeling and emotion modeling, researchers are trying to present interaction interfaces providing as natural behavior as possible. The ECA EVA presen...
Article
Full-text available
In this paper, a finite-state machine based distributed framework DATA used for development of intelligent ambience systems is presented. Event-based distributed framework DATA enables development of efficient, clear and flexible operation over several electronic devices, mobile platforms and mobile units that can be part of complex intelligent amb...
Article
Full-text available
Most users in either desktop or ubiquitous environments access Web applications from Web browser interfaces. The majority of standard Web applications are still based on GUIs and usually support user-machine interaction using traditional human-machine interfaces (e.g. mouse, keyboard). In order to make access to the Web content more natural and to impr...
Chapter
Full-text available
The paper presents a platform, named RES (Remote Evaluation System), for the web-based evaluation of TTS modules and systems. It is being developed within the European Centre of Excellence for Speech Synthesis (ECESS, www.ecess.eu). The presented platform will be used for web-based online evaluation of various text-to-speech (TTS) modules, and even complete T...
Conference Paper
Full-text available
The ECESS consortium (European Center of Excellence for Speech Synthesis) has set up a framework for the evaluation of software modules and tools relevant for speech synthesis. So far, two lines of evaluation campaigns have been established: (1) Evaluation of the ECESS TTS modules (text processing, prosody, acoustic synthesis). (2) Evaluation of ECESS...
Article
Full-text available
Speech-to-speech translation technology has difficulties processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consistency on what expressions...
Article
Full-text available
This article presents a new unified approach to modeling grapheme-to-phoneme conversion for the PLATTOS Slovenian text-to-speech system. A cascaded structure consisting of several successive processing steps is proposed for the aim of grapheme-to-phoneme conversion. Processing foreign words and rules for the post-processing of phonetic transcriptio...
Article
This paper proposes a time and space-efficient architecture for a text-to-speech synthesis system (TTS). The proposed architecture can be efficiently used in those applications with unlimited domain, requiring multilingual or polyglot functionality. The integration of a queuing mechanism, heterogeneous graphs and finite-state machines gives a power...
Conference Paper
Full-text available
Embodied conversational agents employed in multimodal interaction applications have the potential to achieve similar properties as humans in face-to-face conversation. They enable the inclusion of verbal and nonverbal communication. Thus, the degree of personalization of the user interface is much higher than in other human-computer interfaces. Thi...
Conference Paper
Full-text available
This paper focuses on the estimation of the Tilt intonation model (1). Usually, Tilt events are detected using a first estimation, which is then improved using gradient descent techniques. To speed up the search, we propose to use a closed-form expression for some of the Tilt parameters. The gradient descent search is used only for the time related para...
Article
In multilingual text-to-speech synthesis systems, many external extensive natural language resources are used, especially in the text processing part. Therefore it is very important that representation of these resources is time and space efficient. It is also very important that language resources for new languages can be easily incorporated into...
Article
Full-text available
This paper presents an application, LentInfo, which is a system used to provide information about programmes for the Festival Lent in Slovenia. The Festival Lent consists of different open-air theatre and music performances and draws more than 400,000 visitors per year. This application is based on a Hidden Markov Model (HMM) speech recogniser, and...
Article
Full-text available
Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid...
Article
Many external natural language resources are used in spoken dialogue systems. These resources present considerable problems because of the space they require and their slow lookup time. It is, therefore, very important that the presentation of external language resources is time and space efficient. It is also very important that new language resources are...
Article
Full-text available
University of Maribor, Faculty of Electrical Engineering and Computer Science, Centre for Language Technologies, Smetanova ul. 17, 2000 Maribor, Slovenia darinka.verdonik@guest.arnes.si Abstract The article presents the linguistic aspect of compiling a morphological and a phonetic lexicon for standard Slovenian (SImlex and SIflex), which we are editing...
Conference Paper
Full-text available
This paper presents a spoken dialogue system used for automatic correspondence. The system allows all of its functions to be controlled by voice commands, which is the most natural way of man-machine communication. The architecture of the system is modular. It consists of four major modules: graphic interface, recognizer based on key-w...
Article
Full-text available
The paper presents the Turdis database of spontaneous conversations in the tourist domain in the Slovenian language. The database was built for use in developing speech-to-speech translation components; however, it can also be used for developing dialog systems or for linguistic research. The idea was to record a database of telephone conversations in...
Article
Full-text available
University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, SI-2000 Maribor, Slovenia andrej.zgank@uni-mb.si Abstract In this article we present the voice-driven information portal LentInfo, the first application presented to a wider circle of users that can be controlled using Slovenian speech. The portal...
