Stevan Ostrogonac
Infostud Ltd.

PhD

About

46 Publications
11,096 Reads
132 Citations
Additional affiliations
September 2021 - November 2021
Univerzitet Metropolitan
Position
  • Assistant Professor
Description
  • Involved in defining the curriculum for several courses related to machine learning and natural language processing.
February 2019 - present
Infostud 3 Ltd.
Position
  • Senior Artificial Intelligence Engineer
Description
  • Creating data-intensive AI applications: combining NLP techniques, different ML model architectures and domain knowledge to build complex systems that solve a variety of business problems. Implementing innovations related to the use of data. Intellectual property protection.
May 2017 - January 2019
OCD Solutions
Position
  • Founder / Senior Software Developer
Description
  • Computer programming related to text-to-speech for Serbian, English and Hebrew, mainly in C++, in cooperation with AlfaNum Ltd. (Novi Sad, Serbia) and Speech Morphing Inc. (San Jose, California, USA).
Education
January 2011 - December 2018
University of Novi Sad
Field of study
  • Faculty of Technical Sciences, Electrical and Computer Engineering
October 2009 - January 2011
University of Novi Sad
Field of study
  • Faculty of Technical Sciences, Electrical and Computer Engineering
October 2005 - October 2009
University of Novi Sad
Field of study
  • Faculty of Technical Sciences, Electrical and Computer Engineering

Publications

Conference Paper
Full-text available
Purchasing a car is a capital investment that often requires a buyer to spend a significant amount of time acquiring domain knowledge before making a decision. This paper presents the main results of a project aimed at helping used car buyers make the optimal choice. The goal of the project was to analyze available data and choose an adequate m...
Article
Full-text available
Machine learning models have been tested on countless classification problems in the past. However, little information is available on how well they perform when the task is learning abstract concepts that are difficult to understand even for humans. The objective of this research was to find the best model for capturing the concepts of white-...
Article
Full-text available
Within the past two decades, text processing has become an important part of most state-of-the-art advanced automation systems. However, for many under-resourced languages, textual data preparation is still challenging due to the lack of adequate tools. In this work, a Python package for Serbian text processing called nlpheart is pres...
Article
Full-text available
When training language models (especially for highly inflective languages), some applications require word clustering in order to mitigate the problems of insufficient training data or limited storage space. The goal of word clustering is to group words that can be well represented by a single class in terms of their probabilities of appearance in different...
Thesis
Full-text available
A statistical language model, in theory, represents a probability distribution over sequences of words of a language. In practice, it is a tool for estimating the probabilities of word sequences of interest. The mathematical basis of language models is mostly language-independent. However, the quality of trained models depends not only on training...
Article
Full-text available
Modern text-to-speech systems generally achieve good intelligibility. One of the main drawbacks of these systems is their lack of expressiveness in comparison to natural human speech. It is very unpleasant when an automated system conveys positive and negative messages in exactly the same way. The introduction of parametric methods in speech synth...
Article
Full-text available
Naturalness is one of the most important aspects of synthesized speech, and state-of-the-art parametric speech synthesizers require training on large quantities of annotated speech data to be able to convey prosodic elements such as pitch accent and phrase boundary tone. The most frequently used framework for prosodic annotation of speech in Americ...
Conference Paper
Full-text available
One of the advantages of statistical methods in speech synthesis is the possibility of easily changing speaker characteristics and speaking styles. In this paper we present a simple method, based on style codes, for incorporating styles into synthesized speech. The proposed method is successfully applied to both hidden Markov model and deep neural networks...
Conference Paper
Full-text available
ToBI (Tones and Break Indices) is a widely used set of conventions for prosodic annotation of speech in American English. This paper presents certain problems in the application of ToBI within a system for American English speech synthesis, which arise due to the lack of explicit tags for indicating either positive or negative emphasis of certain w...
Conference Paper
Full-text available
Document classification and topic modeling represent some of the biggest challenges in the fields of natural language processing and information retrieval. Many of the techniques developed for these purposes are language-independent (Sanderson and Bruce Croft, 2012). However, language resources are needed for each language, along with domain-specif...
Article
Full-text available
Spell checking tools have been developed for many languages, but for most of them (including Serbian) such applications are based on simple dictionary lookup and can, therefore, handle only so-called non-word errors. This research is focused on developing advanced spell checking software for Serbian. Semantic errors are the most difficult ones to h...
Article
Full-text available
The inclusion of persons with disabilities has always represented an important issue. Advancements within the field of computer science have enabled the development of different types of aids, which have significantly improved the quality of life of the disabled. However, for some disabilities, such as visual impairment, the purpose of these aids i...
Conference Paper
Full-text available
This paper presents initial work related to the construction of an advanced spell checking software for textual content written in Serbian. So far, spell checking tools for Serbian have been limited to detecting only non-word spelling errors, since they rely on dictionary lookup techniques and implement no context analysis. By incorporating languag...
Conference Paper
Full-text available
This paper presents a deep neural network (DNN) based large vocabulary continuous speech recognition (LVCSR) system for Serbian, developed using the open-source Kaldi speech recognition toolkit. The DNNs are initialized using stacked restricted Boltzmann machines (RBMs) and trained using cross-entropy as the objective function and the standard erro...
Conference Paper
Full-text available
Teaching computers how to derive the meaning of words has been one of the greatest challenges of the Information Age. Semantic analysis requires extensive language resources and complex techniques involving computationally expensive algorithms. For English, extensive research has been conducted in this field and a great amount of language resources...
Conference Paper
Full-text available
Voice Assistant is a personal assistant mobile phone application for the Serbian language that allows natural communication between the phone and the user [1]. The application provides an array of essential commands for fast and efficient use of the device in a number of tasks, e.g., messaging, calling, handling of contacts, changing settings,...
Article
Full-text available
This paper describes the procedure of collecting speech and corresponding textual data, and the processing needed to create a repository for training an LVCSR system for the Serbian language. The speech database for Serbian consists of speech recordings from audio books, radio programs and talk shows, as well as read utterances from an array of male...
Conference Paper
Full-text available
The paper presents the results obtained using a large vocabulary continuous speech recognition system for the Serbian language, based on the open-source Kaldi speech recognition toolkit. Data preparation procedures are described in brief, giving special attention to the particularities of the Serbian language. The original, proposed recipes were mo...
Conference Paper
Full-text available
Eigenvalues Driven Gaussian Selection (EDGS) is used in this paper to reduce the computational complexity of the acoustic processing module of a medium vocabulary continuous speech recognition system for the Serbian language, based on Hidden Markov Models (HMMs) with diagonal covariance matrices. The optimal values of five different par...
Conference Paper
Full-text available
This paper presents a computer application based on speech technologies for the Serbian language, which has been adapted to persons with disabilities and is especially intended for the education of blind or partially sighted children. There are many different approaches to the education of visually impaired pupils. However, language-dependen...
Conference Paper
Full-text available
Although a considerable step towards natural-sounding synthesized speech has been made in the last decade, the key objection to state-of-the-art synthesizers is that synthesized voices still fail to convey the emotion that a human speaker would be expected to express in a particular situation. The activities presented in this work include the produc...
Conference Paper
Full-text available
This paper presents the results of introducing additional clustering to the leaves of the classification and regression trees (CARTs) used in text-to-speech for Serbian. The additional clustering is based on acoustic features of the observations, and the decoding process employs a statistical language model consisting of n-grams of states which corr...
Conference Paper
Full-text available
Temporal Discrete Cosine Transform (TDCT) features have shown good performance in the speaker verification task, and in this paper we use them in speech emotion recognition. Tests were conducted on a Serbian emotional speech database, using Neural Networks (NN) as a classifier and Mel-Frequency Cepstral Coefficients (MFCC) as a reference featur...
Conference Paper
Full-text available
This paper describes the whole procedure of speech database collection and processing required for building a good large vocabulary speech recognition system for the Serbian language. The speech database consists of speech recordings from audio books, radio programs and talk shows, as well as read utterances from an array of male and female speaker...
Conference Paper
Full-text available
This paper presents an extension to the SEDREAMS algorithm for extracting information about glottal closure and glottal opening instants (GCIs and GOIs) directly from the speech signal. Accurate detection of GCIs and GOIs is crucial for estimating the glottal features to be used in speaker recognition systems. Many different approaches...
Conference Paper
Full-text available
This paper describes a system for extracting the glottal excitation signal from speech. The system was developed in order to obtain information on glottal excitation pulse shapes, which would be used to improve the quality of the Hidden Markov Model (HMM) based Text-To-Speech (TTS) system developed for Serbian. The glottal excitation extra...
Conference Paper
Full-text available
Unlike other new technologies, most speech technologies are heavily language dependent and have to be developed separately for each language. The paper gives a detailed description of speech and language resources for Serbian and kindred South Slavic languages developed during the last decade within joint projects of the Faculty of Technical Scienc...
Conference Paper
Full-text available
This paper presents a way to reduce the size of an n-gram-based language model (LM). This reduction method has proven to be very efficient in terms of minimal information loss. The list of n-gram probabilities becomes too large for use in large vocabulary continuous speech recognition (LVCSR) systems when medium or large vocabularies are used for language...
Conference Paper
Full-text available
Deep belief networks are used in this paper in order to initialize a feed-forward neural network, used for classification of emotions based on lip shape. Deep architectures are trained in a layer-wise greedy manner during the unsupervised training phase, whereas the neural network is trained during the supervised training phase, using the images fr...
Conference Paper
Full-text available
This paper addresses two different approaches to modeling emotions, applied to automatic Emotional Speech Recognition (ESR) for Serbian: the categorical emotion model and the dimensional emotion model. Using the first modeling approach, utterances are classified into 5 discrete emotion classes, and using the second modeling ap...
Conference Paper
Full-text available
This paper describes a study on correspondence between the language model quality and the size of the textual corpus used in the training process. Three types of n-gram models developed for the Serbian language were included in the study: word-based, lemma-based and class-based model. They are created in order to deal with the data sparsity problem...
Conference Paper
Full-text available
This paper gives a short overview of contemporary text-to-speech (TTS) systems available for the Serbian language and then presents the results of subjective assessment tests of the quality of synthesized speech generated with these methods. Its main goal is to show the improvement in resulting speech quality obtained using the new hidden Markov mo...
Conference Paper
Full-text available
This paper proposes a method of creating language models for highly inflective non-agglutinative languages. Three types of language models were considered - a common n-gram model, an n-gram model of lemmas and a class n-gram model. The last two types were specially designed for the Serbian language reflecting its unique grammar structure. All the l...
Conference Paper
Full-text available
In this paper, a text-to-speech synthesis system for Serbian based on hidden Markov models (HMMs) is presented. This is the first HMM-based synthesizer that can be successfully used on Serbian texts. It uses static and dynamic mel-generalized cepstrum coefficients as spectral parameters for training context-dependent phone models, while as excitatio...
Conference Paper
Full-text available
A language model based on the N-gram concept was developed for the purpose of large vocabulary continuous speech recognition in Serbian. This model represents a combination of three models, each representing a separate language model. One of the models has been trained on a textual corpus containing common words, another one on a corpus containing...
Conference Paper
Full-text available
In this paper, a speech synthesis system for Serbian based on hidden Markov models (HMMs) is presented, with an emphasis on the use of global variance to enhance synthesized speech. When the basic technique is used, parameter trajectories of static features are over-smoothed due to statistical processing, and synthesized speech sounds muffl...
Conference Paper
Full-text available
This paper describes synthesized speech intelligibility testing for the Serbian language using the DRT (Diagnostic Rhyme Test) and SUS (Semantically Unpredictable Sentences). A program called SUSmaker, which creates semantically unpredictable sentences for five basic sentence structures in Serbian, is also described. An overview of A...
Conference Paper
Full-text available
The speech signal carries a great amount of information that is vital for efficient human communication; it is therefore very important to ensure the high quality of its transmission and understanding. Natural speech audibility is one of the few subjective features that can be precisely evaluated. This is done by using short words without meaning, a....
