Vassilis Katsouros

Athena-Research and Innovation Center in Information, Communication and Knowledge Technologies · Institute for Language and Speech Processing

About

Publications: 52
Reads: 10,183
Citations: 573

Publications (52)
Conference Paper
Full-text available
Despite the fact that in some areas of cultural life, as in the case of certain online video platforms or TV programs, notable progress has been made to provide content accessible to Deaf and Hard of Hearing people (DHH), the same cannot be said for live theater performances. In this work, a system called NLP-Theatre is presented, with the emphasi...
Conference Paper
Full-text available
This paper presents SciPar, a new collection of parallel corpora created from openly available metadata of bachelor theses, master theses and doctoral dissertations hosted in institutional repositories, digital libraries of universities and national archives. We first describe how we harvested and processed metadata from 86, mainly European, reposi...
Article
Full-text available
Guitar tablature transcription consists in deducing the string and the fret number on which each note should be played to reproduce the actual musical part. This assignment should lead to playable string-fret combinations throughout the entire track and, in general, preserve parsimonious motion between successive combinations. Throughout the histor...
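The string-fret assignment described in the tablature abstract above can be illustrated with a small sketch. It assumes standard tuning (E2 A2 D3 G3 B3 E4), a 24-fret neck and a hypothetical greedy rule that picks, for each note, the position closest in fret to the previous one; none of these choices come from the paper, whose method is not reproduced here.

```python
# Hypothetical sketch: enumerate playable (string, fret) positions for MIDI pitches
# and pick them greedily to keep hand motion small. Tuning, fret range and the
# cost function are illustrative assumptions, not the paper's method.
STANDARD_TUNING = [40, 45, 50, 55, 59, 64]  # open-string MIDI pitches, low E to high E
MAX_FRET = 24

def candidate_positions(midi_pitch, tuning=STANDARD_TUNING, max_fret=MAX_FRET):
    """Return (string_number, fret) pairs that sound midi_pitch; string 1 is the high E."""
    positions = []
    for index, open_pitch in enumerate(tuning):
        fret = midi_pitch - open_pitch
        if 0 <= fret <= max_fret:
            string_number = len(tuning) - index  # low E is string 6
            positions.append((string_number, fret))
    return positions

def motion_cost(prev, curr):
    """Toy parsimony cost: fret distance between consecutive positions."""
    return abs(prev[1] - curr[1])

def greedy_tab(pitches):
    """For each note, choose the candidate with the smallest motion cost."""
    tab = []
    for pitch in pitches:
        options = candidate_positions(pitch)
        if not options:
            raise ValueError(f"pitch {pitch} is outside the instrument's range")
        if not tab:
            choice = min(options, key=lambda p: p[1])  # start as low on the neck as possible
        else:
            choice = min(options, key=lambda p: motion_cost(tab[-1], p))
        tab.append(choice)
    return tab
```

For example, `greedy_tab([40, 45, 64])` stays on open strings and returns `[(6, 0), (5, 0), (1, 0)]`. A greedy choice can paint itself into a corner on longer passages, which is one reason a global search over the whole track is preferable.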
Article
Modern businesses are obligated to conform to regulations to prevent physical injuries and ill health for anyone present on a site under their responsibility, such as customers, employees and visitors. Safety officers (SOs) are engineers who perform site audits of businesses, record observations regarding possible safety issues and make appropriat...
Preprint
Full-text available
Recent deep learning Text-to-Speech (TTS) systems have achieved impressive performance by generating speech close to human parity. However, they suffer from training stability issues as well as incorrect alignment of the intermediate acoustic representation with the input text sequence. In this work, we introduce Regotron, a regularized version of...
Preprint
Full-text available
Aphasia is a common speech and language disorder, typically caused by a brain injury or a stroke, that affects millions of people worldwide. Detecting and assessing Aphasia in patients is a difficult, time-consuming process, and numerous attempts to automate it have been made, the most successful using machine learning models trained on aphasic spe...
Article
Automatically synthesizing dance motion sequences is an increasingly popular research task in the broader field of human motion analysis. Recent approaches have mostly used recurrent neural networks (RNNs), which are known to suffer from prediction error accumulation, usually limiting models to synthesize short choreographies of less than 100 poses...
Article
Full-text available
Plant identification from images has become a rapidly developing research field in computer vision and is particularly challenging due to the morphological complexity of plants. The availability of large databases of plant images, and the research advancements in image processing, pattern recognition and machine learning, have resulted in a number...
Conference Paper
Full-text available
In this work, we employ deep learning methods for visual onset detection. We focus on live music performances involving bowed string instruments. In this context, we take as a source of meaningful information the sequence of movements of the performers' body and especially the bowing motion of the (right) hand. Body skeletons for each video frame a...
Article
Full-text available
Jazz improvisation on a given lead sheet with chords is an interesting scenario for studying the behaviour of artificial agents when they collaborate with humans. Specifically in jazz improvisation, the role of the accompanist is crucial for reflecting the harmonic and metric characteristics of a jazz standard, while identifying in real-time the in...
Chapter
iMuSciCA supports mastery of core academic content on STEM subjects for secondary school students alongside the development of their creativity and deeper learning skills, through engagement in music activities. To reach this goal, iMuSciCA introduces new methodologies and innovative technologies supporting active, discovery-based, collaborati...
Conference Paper
Full-text available
We present a web-based real-time application that enables gestural interaction with virtual instruments for musical expression. Skeletons of the users are tracked by a Kinect sensor, while the performance of the virtual instruments is accomplished using gestures inspired by their corresponding physical counterparts. The application supports the v...
Conference Paper
Full-text available
Long Short-Term Memory (LSTM) neural networks have been effectively applied to learning and generating musical sequences, powered by sophisticated musical representations and integrations into other deep learning models. Deep neural networks, alongside LSTM-based systems, learn implicitly: given a sufficiently large amount of data, they transform i...
Article
Music Information Research (MIR) requires access to real musical content in order to test the efficiency and effectiveness of its methods as well as to compare developed methodologies on common data. Existing datasets do not address the research direction of musical track popularity that has recently received considerable attention. Moreover, source...
Conference Paper
Music Information Research requires access to real musical content in order to test the efficiency and effectiveness of its methods as well as to compare developed methodologies on common data. Existing datasets do not address the research direction of musical track popularity that has recently received considerable attention. Existing sources of musica...
Article
This paper addresses the extraction of multipurpose spectral rhythm features that simultaneously tackle a variety of rhythm analysis tasks, namely, dance style classification, meter estimation, and tempo estimation. The term spectral rhythm features emanates from the origin of the extracted features, which is the periodicity function (PF), a spectr...
Conference Paper
Full-text available
This paper investigates the development of a rhythm representation of music audio signals that (i) is able to tackle rhythm-related tasks and (ii) is invertible, i.e. is suitable for reconstructing audio from it with the corresponding rhythm content being preserved. A conventional front-end processing schema is applied to the audio signal to extract...
Article
Energy production from Municipal Solid Waste (MSW) has become one of the most prominent strategies in MSW management. In this study a multi-objective mathematical programming model is developed in order to provide the candidate (Pareto optimal) solutions for an MSW management system performing structural, design and operational optimization. Besides...
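The MSW abstract above returns a set of candidate (Pareto optimal) solutions rather than a single optimum. The non-dominance test behind that notion can be sketched as a brute-force filter; the objective vectors are hypothetical (for example cost and environmental impact, both minimized), and this filter stands in for, rather than reproduces, the paper's mathematical programming model.

```python
# Hypothetical sketch: keep the Pareto-optimal (non-dominated) candidates among
# solutions scored on several objectives, all of which are to be minimized.

def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the solutions not dominated by any other one (O(n^2) pairwise scan)."""
    # A solution never dominates itself, so no self-exclusion is needed.
    return [s for s in solutions if not any(dominates(other, s) for other in solutions)]
```

With candidates `[(100, 50), (120, 30), (110, 60), (90, 80), (130, 30)]`, the filter keeps `(100, 50)`, `(120, 30)` and `(90, 80)`: each trades one objective against the other, while `(110, 60)` and `(130, 30)` are beaten outright.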
Conference Paper
This paper reports on high-performance Optical Character Recognition (OCR) experiments using Long Short-Term Memory (LSTM) Networks for Greek polytonic script. Even though there are many Greek polytonic manuscripts, the digitization of such documents has not been widely applied, and very limited work has been done on the recognition of such scripts....
Article
Although recognition of online handwritten text has reached a point of maturity, recognition of online handwritten mathematical expressions still remains a challenging problem. In this work we train a probabilistic SVM classifier to recognize spatial relations between two mathematical symbols or sub-expressions and then employ a CYK-based algorithm...
Conference Paper
A critical issue in recognition of mathematical expressions is the identification of the spatial relations of the symbols and/or sub-expressions that comprise the entire mathematical formula. This paper addresses the problem of structural analysis of mathematical expressions by constructing appropriate feature vectors to represent the spatial affin...
Conference Paper
Full-text available
In this paper a method for computing an audio based similarity between music excerpts is presented. The method consists of three main parts, with the first step being feature extraction, which involves the calculation of three feature sets that correspond to music timbre, rhythm and harmony. Next, for each feature set a Deep Belief Network was trai...
Article
In this study a multi-objective mathematical programming model is developed for taking into account GHG emissions for Municipal Solid Waste (MSW) management. Mathematical programming models are often used for structure, design and operational optimization of various systems (energy, supply chain, processes, etc.). Over the last twenty years they have been use...
Conference Paper
Full-text available
Mathematical expression recognition is still a very challenging task for the research community mainly because of the two-dimensional (2d) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of a ME, based on spatial features of the...
Article
Full-text available
This paper presents a Bayesian approach for maintenance action recommendation tested on the PHM 2013 Data Challenge dataset. The Challenge focused on maintenance action recommendation based on historical cases and the algorithms were evaluated on their ability to recommend confirmed problem types. The proposed approach is based on a Bayesian infere...
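The abstract above ranks maintenance actions by Bayesian inference over historical cases, but the exact model is cut off. As an illustration only, a Bernoulli naive Bayes ranking over hypothetical fault-code sets shows the general shape of such a recommender; the case format, the fault codes and the smoothing constant `alpha` are all assumptions.

```python
# Hypothetical sketch: rank problem types for a new case with Bernoulli naive Bayes,
# trained on historical (fault_codes, confirmed_problem) pairs. Not the paper's model.
import math
from collections import Counter, defaultdict

def train(cases):
    """cases: list of (set_of_fault_codes, problem_type) from historical records."""
    priors = Counter(problem for _, problem in cases)
    code_counts = defaultdict(Counter)  # problem -> number of cases showing each code
    vocabulary = set()
    for codes, problem in cases:
        code_counts[problem].update(codes)
        vocabulary.update(codes)
    return priors, code_counts, vocabulary

def recommend(codes, model, alpha=1.0):
    """Return problem types sorted from most to least probable given observed codes."""
    priors, code_counts, vocabulary = model
    total_cases = sum(priors.values())
    scores = {}
    for problem, n_problem in priors.items():
        score = math.log(n_problem / total_cases)  # log prior
        for code in vocabulary:
            # Laplace-smoothed probability that this code appears for this problem
            p = (code_counts[problem][code] + alpha) / (n_problem + 2 * alpha)
            score += math.log(p if code in codes else 1 - p)
        scores[problem] = score
    return sorted(scores, key=scores.get, reverse=True)
```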
Conference Paper
Full-text available
In this paper we present a method for learning tempo classes in order to reduce tempo octave errors. There are two main contributions of this paper in the rhythm analysis field. Firstly, a novel technique is proposed to code the rhythm periodicity functions of a music signal. Target tempi range is divided into overlapping "tempo bands" and the peri...
Conference Paper
Full-text available
We present a system for recognizing online mathematical expressions (ME). Symbol recognition is based on a template elastic matching distance between pen direction features. The structural analysis of the ME is based on extracting the baseline of the ME and then classifying symbols into levels above and below the baseline. The symbols are then sequ...
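The template elastic matching step mentioned in the abstract above can be sketched as a dynamic time warping (DTW) distance between sequences of 8-direction pen codes. The circular local cost, the unit-step recursion and the nearest-template classifier are assumptions chosen for illustration; the paper's exact distance is not given here.

```python
# Hypothetical sketch: elastic (DTW-style) matching between two sequences of
# 8-direction pen codes (0..7), with nearest-template classification.

def direction_cost(a, b):
    """Smallest number of 45-degree steps between two 8-direction codes."""
    d = abs(a - b) % 8
    return min(d, 8 - d)

def elastic_distance(seq_a, seq_b):
    """DTW distance allowing match, insertion and deletion steps."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # dp[i][j] = minimal cost of aligning seq_a[:i] with seq_b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = direction_cost(seq_a[i - 1], seq_b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

def classify(sample, templates):
    """Return the label of the template with the smallest elastic distance."""
    return min(templates, key=lambda label: elastic_distance(sample, templates[label]))
```

The warping lets `[0, 0, 2]` match the shorter template `[0, 2]` at zero cost, which is the point of elastic matching: writers stretch and compress strokes at different speeds.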
Conference Paper
Document image binarization is an initial but critical stage towards the recognition of the text components of a document. This paper describes an efficient method based on mathematical morphology for extracting text regions from degraded handwritten document images. The basic stages of our approach are: (a) top-hat-by-reconstruction to produce...
Conference Paper
Full-text available
This work extends the mean shift algorithm from the observation space to the manifolds of parametric models that are formed by exponential families. We show how the Kullback-Leibler divergence and its dual define the corresponding affine connection and propose a method for incorporating the uncertainty in estimating the parameters. Experiments a...
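For reference, the baseline that the abstract above extends is ordinary mean shift in Euclidean observation space. A minimal sketch with a Gaussian kernel follows; the bandwidth and stopping rule are assumptions, and the paper's extension to exponential-family manifolds with the Kullback-Leibler divergence is deliberately not reproduced.

```python
# Hypothetical sketch: standard mean shift in Euclidean space with a Gaussian kernel.
# The paper replaces this space and distance with an exponential-family manifold
# and a KL divergence; that extension is NOT implemented here.
import math

def mean_shift_mode(start, points, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Move `start` uphill on the kernel density estimate until it reaches a mode."""
    x = list(start)
    for _ in range(max_iter):
        weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2))
                   for p in points]
        total = sum(weights)
        if total == 0.0:
            break  # start is too far from every point for the kernel to register them
        # New estimate: kernel-weighted mean of all points
        new_x = [sum(w * p[k] for w, p in zip(weights, points)) / total for k in range(len(x))]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x
```

Running the iteration from every data point and grouping the points whose modes coincide yields the usual mean shift clustering.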
Conference Paper
Full-text available
In this paper, we present tempo estimation and beat tracking algorithms by utilizing percussive/harmonic separation of the audio signal, in order to extract filterbank energies and chroma features from the respective components. Periodicity analysis is carried out by the convolution of feature sequences with a bank of resonators. Target tempo is es...
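The periodicity analysis in the abstract above convolves feature sequences with a bank of resonators. A minimal sketch using comb-filter resonators (in the style of Scheirer's beat tracker, an assumption here) on a single onset-strength sequence follows; each filter's impulse response is an exponentially decaying pulse train, so filtering amounts to that convolution. The percussive/harmonic separation, filterbank energies and chroma features of the paper are not reproduced, and the frame rate, tempo band and feedback gain `alpha` are illustrative.

```python
# Hypothetical sketch: tempo estimation with a bank of comb-filter resonators,
# one per candidate beat period (lag, in frames).
import math

def comb_resonator_energy(onset_env, lag, alpha=0.8):
    """Mean output energy of the comb filter y[n] = alpha*y[n-lag] + (1-alpha)*x[n]."""
    y = [0.0] * len(onset_env)
    for n, x in enumerate(onset_env):
        feedback = y[n - lag] if n >= lag else 0.0
        y[n] = alpha * feedback + (1 - alpha) * x
    return sum(v * v for v in y) / len(y)

def estimate_tempo(onset_env, fps, bpm_min=80.0, bpm_max=160.0, alpha=0.8):
    """Return the tempo (BPM) whose resonator responds most strongly."""
    shortest = max(1, math.ceil(60.0 * fps / bpm_max))
    longest = int(60.0 * fps / bpm_min)
    best_lag = max(range(shortest, longest + 1),
                   key=lambda lag: comb_resonator_energy(onset_env, lag, alpha))
    return 60.0 * fps / best_lag
```

A pulse train with a period of P frames also drives the resonator at lag 2P to nearly the same energy, which is why the sketch limits the search to a one-octave band; choosing between such octave-related candidates is exactly where tempo octave errors arise.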
Conference Paper
Full-text available
This paper proposes an enhancement of our previously presented word segmentation method (ILSP-LWseg) [1] by exploiting local spatial features. ILSP-LWseg is based on a gap metric that exploits the objective function of a soft-margin linear SVM that separates successive connected components (CCs). Then a global threshold for the gap metrics is estima...
Article
Full-text available
This paper discusses the use of the BIC with respect to speaker diarization, i.e., the problem of assigning the observation vectors of an audio file to a set of speakers of unknown cardinality. Our primary goals are to examine the two dominant approaches of the BIC, namely the global and the local and combine the strengths of the two variants into...
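The BIC comparison discussed in the abstract above, applied as a local merge test between two clusters, can be sketched for clusters modeled by single Gaussians. The diagonal covariance, the variance floor and the penalty weight `lam` (λ) are simplifying assumptions; the abstract's combination of the global and local variants is not reproduced.

```python
# Hypothetical sketch: Delta-BIC test for merging two clusters of feature vectors,
# each modeled by one diagonal-covariance Gaussian.
import math

def log_det_diag(frames):
    """log|Sigma| of the ML diagonal Gaussian: sum of per-dimension log variances."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0) on constant features
    return total

def delta_bic(frames_a, frames_b, lam=1.0):
    """Positive values favor keeping the clusters separate (two speakers)."""
    n_a, n_b = len(frames_a), len(frames_b)
    n = n_a + n_b
    dim = len(frames_a[0])
    merged = frames_a + frames_b
    # Log-likelihood gain of two Gaussians over one (constant terms cancel)
    gain = 0.5 * (n * log_det_diag(merged)
                  - n_a * log_det_diag(frames_a)
                  - n_b * log_det_diag(frames_b))
    # The extra Gaussian costs dim means + dim variances
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return gain - penalty
```

Two clusters drawn from the same distribution gain almost nothing from separate models, so the penalty makes ΔBIC negative and they are merged.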
Conference Paper
Full-text available
In this paper we examine a new penalty term for the Bayesian Information Criterion (BIC) that is suited to the problem of speaker diarization. Based on our previous approach of penalizing each cluster only with its effective sample size - an approach we called segmental - we propose a stricter penalty term. The criterion we derive retains the main...
Article
Full-text available
Two novel approaches to extract text lines and words from handwritten documents are presented. The line segmentation algorithm is based on locating the optimal succession of text and gap areas within vertical zones by applying the Viterbi algorithm. Then, a text-line separator drawing technique is applied and finally the connected components are assigne...
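The Viterbi step in the abstract above, locating the optimal succession of text and gap areas within a vertical zone, can be sketched as decoding a two-state HMM over the rows of that zone. Observing each row only as "inked" or "blank", and every probability below, are illustrative assumptions rather than the paper's model.

```python
# Hypothetical sketch: Viterbi decoding of a two-state (gap/text) HMM over the rows
# of one vertical zone of a binarized page. All probabilities are assumptions.
import math

GAP, TEXT = 0, 1
LOG_START = [math.log(0.5), math.log(0.5)]
LOG_TRANS = [[math.log(0.9), math.log(0.1)],   # from GAP to GAP / TEXT
             [math.log(0.1), math.log(0.9)]]   # from TEXT to GAP / TEXT
LOG_EMIT = [[math.log(0.9), math.log(0.1)],    # GAP emits blank (0) / inked (1)
            [math.log(0.2), math.log(0.8)]]    # TEXT emits blank (0) / inked (1)

def viterbi(observations):
    """Most likely GAP/TEXT label per row for a non-empty list of 0/1 observations."""
    scores = [LOG_START[s] + LOG_EMIT[s][observations[0]] for s in (GAP, TEXT)]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in (GAP, TEXT):
            best_prev = max((GAP, TEXT), key=lambda p: scores[p] + LOG_TRANS[p][s])
            new_scores.append(scores[best_prev] + LOG_TRANS[best_prev][s] + LOG_EMIT[s][obs])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max((GAP, TEXT), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def row_observations(zone, min_ink=1):
    """Turn a binarized zone (list of rows of 0/1 pixels) into per-row 0/1 observations."""
    return [1 if sum(row) >= min_ink else 0 for row in zone]
```

With these settings a single stray inked row inside a gap is absorbed into the gap state, while a run of several inked rows is labeled as text, illustrating why a globally optimal succession is preferable to thresholding rows one by one.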
Conference Paper
Full-text available
This paper addresses the problem of automatic text-line and word segmentation in handwritten document images. Two novel approaches are presented, one for each task. In text-line segmentation a Viterbi algorithm is proposed while an SVM-based metric is adopted to locate words in each text-line. The overall algorithm was tested in the ICDAR2007 handw...
Conference Paper
Full-text available
In this paper we present a method of combining several acoustic parametric spaces, statistical models and distance metrics in the speaker diarization task. Focusing our interest on the post-segmentation part of the problem, we adopt an incremental feature selection and fusion algorithm based on the Maximum Entropy Principle and Iterative Scaling Algori...
Article
Full-text available
An on-line handwritten character recognition technique based on a template matching distance is proposed. In this method, the pen-direction features are quantized using the 8-level Freeman chain coding scheme and the dominant points of the stroke are identified using the first difference of the chain code. The distance between two symbols results f...
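The quantization and dominant-point steps named in the abstract above can be sketched directly: each pen movement is mapped to one of the 8 Freeman directions, and large turns in the first difference of the chain code mark dominant points. The y-up axis convention, the turn threshold and the assumption that consecutive pen samples differ are choices made for illustration.

```python
# Hypothetical sketch: 8-level Freeman chain coding of a pen stroke and dominant-point
# detection from the first difference of the chain code.
import math

def freeman_code(dx, dy):
    """Map a movement vector to a code 0..7 (0 = east, counter-clockwise, y axis up)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(round(angle / (math.pi / 4))) % 8

def chain_code(points):
    """Chain code of a stroke given as (x, y) samples; assumes consecutive samples differ."""
    return [freeman_code(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def first_difference(codes):
    """Circular first difference of the chain code, mapped to the range -3..4."""
    diffs = []
    for a, b in zip(codes, codes[1:]):
        d = (b - a) % 8
        diffs.append(d - 8 if d > 4 else d)
    return diffs

def dominant_points(points, threshold=2):
    """Indices of samples where the direction turns by at least `threshold` steps."""
    diffs = first_difference(chain_code(points))
    # diffs[k] compares segments k and k+1, which meet at sample k+1
    return [k + 1 for k, d in enumerate(diffs) if abs(d) >= threshold]
```

For an L-shaped stroke sampled at `(0, 2), (0, 1), (0, 0), (1, 0), (2, 0)`, the chain code is `[6, 6, 0, 0]` and the only dominant point is the corner at index 2.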
Conference Paper
Full-text available
A new method for verifying text areas detected in video streams is proposed. The algorithm explores the spectral properties of the horizontal projection of candidate text regions in order to reduce the high number of false alarms that most text detection algorithms suffer from. The full algorithm (text localization followed by verification and temp...
Article
A model of a stochastic froth is introduced in which the rate of random coalescence of a pair of bubbles depends on an inverse power law of their sizes. The main question of interest is whether froths with a large number of bubbles can grow in a stable fashion; that is, whether under some time-varying change of scale the distributions of rescaled b...

Projects (2)
Project
The proposed project, "A Speech and language Therapy Platform with Virtual Agent" (PLan-V), aims at developing an integrated, technologically assisted, speech/language intervention platform for people with chronic neurogenic communication disorders. Acquired speech and language disorders are increasingly relevant for a significant percentage of the population, given the current aging rate, and they have a direct and severe effect on quality of life because they limit daily communication. The effective support of this population requires individualized, systematic and regular intervention by speech/language therapists. Treatment outcomes are directly and positively correlated with the quality and frequency of clinical services. The project aims at developing a novel system that can support the self-management of chronic patients with speech and language impairment. It will allow patients to practice wherever and whenever they wish, without the physical presence of a clinician, via the assistance of a digital character/virtual speech and language therapist (Avatar). At the same time, the proposed platform will constitute a valuable clinical tool that will assist clinicians with the time-consuming process of routine patient assessment and evaluation, and the development and execution of individualized intervention programs that will complement face-to-face clinical sessions.