George Carayannis's research while affiliated with National Technical University of Athens and other places

Publications (76)

Article
This paper addresses the extraction of multipurpose spectral rhythm features that simultaneously tackle a variety of rhythm analysis tasks, namely, dance style classification, meter estimation, and tempo estimation. The term spectral rhythm features emanates from the origin of the extracted features, which is the periodicity function (PF), a spectr...
Conference Paper
This paper investigates the development of a rhythm representation of music audio signals that (i) can tackle rhythm-related tasks and (ii) is invertible, i.e., audio can be reconstructed from it with the corresponding rhythm content preserved. A conventional front-end processing schema is applied to the audio signal to extract...
Article
Although recognition of online handwritten text has reached a point of maturity, recognition of online handwritten mathematical expressions still remains a challenging problem. In this work we train a probabilistic SVM classifier to recognize spatial relations between two mathematical symbols or sub-expressions and then employ a CYK-based algorithm...
Conference Paper
In this paper a method for computing an audio-based similarity between music excerpts is presented. The method consists of three main parts, the first being feature extraction, which involves the calculation of three feature sets that correspond to music timbre, rhythm and harmony. Next, for each feature set a Deep Belief Network was trai...
Conference Paper
Mathematical expression recognition is still a very challenging task for the research community, mainly because of the two-dimensional (2D) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of a ME, based on spatial features of the...
Conference Paper
In this paper we present a method for learning tempo classes in order to reduce tempo octave errors. This paper makes two main contributions to the rhythm analysis field. Firstly, a novel technique is proposed to code the rhythm periodicity functions of a music signal. The target tempo range is divided into overlapping "tempo bands" and the peri...
Conference Paper
We present a system for recognizing online mathematical expressions (MEs). Symbol recognition is based on an elastic template-matching distance between pen-direction features. The structural analysis of the ME is based on extracting the baseline of the ME and then classifying symbols into levels above and below the baseline. The symbols are then sequ...
Conference Paper
In this paper we present a simple yet novel technique for harmonic/percussive separation of monaural audio music signals. Under the assumption that percussive/harmonic components exhibit vertical/horizontal lines in the spectrogram, image morphological filters are applied to the spectrogram of the input signal. The structuring elements of the morphol...
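The separation principle can be illustrated with a minimal NumPy sketch, assuming a non-negative magnitude spectrogram `S` of shape (frequency bins, frames). A grayscale opening with a flat horizontal structuring element (along time) keeps horizontal, harmonic ridges; one with a flat vertical element (along frequency) keeps vertical, percussive ridges; the two openings are then combined into a soft mask. The element lengths, the helper names, and the soft-mask step are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _running(S, length, axis, reduce):
    """Flat 1-D grayscale erosion (np.min) or dilation (np.max) along one axis, edge-padded."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (length // 2, length - 1 - length // 2)
    padded = np.pad(S, pad, mode="edge")
    return reduce(sliding_window_view(padded, length, axis=axis), axis=-1)


def opening(S, length, axis):
    """Grayscale opening (erosion then dilation) with a flat line of odd `length`."""
    return _running(_running(S, length, axis, np.min), length, axis, np.max)


def hp_separate(S, h_len=9, p_len=9, eps=1e-12):
    """Split a magnitude spectrogram S (freq x time) into harmonic and percussive parts."""
    assert h_len % 2 == 1 and p_len % 2 == 1, "use odd (centred) structuring elements"
    H = opening(S, h_len, axis=1)  # horizontal line along time: keeps sustained partials
    P = opening(S, p_len, axis=0)  # vertical line along frequency: keeps broadband onsets
    mask = H / (H + P + eps)       # soft mask (illustrative choice)
    return S * mask, S * (1.0 - mask)
```

On a toy spectrogram with one horizontal and one vertical line, the row ends up in the harmonic part, the column in the percussive part, and their crossing point is split evenly.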
Conference Paper
Document image binarization is an initial though critical stage towards the recognition of the text components of a document. This paper describes an efficient method based on mathematical morphology for extracting text regions from degraded handwritten document images. The basic stages of our approach are: (a) top-hat-by-reconstruction to produce...
Conference Paper
In this paper, we present tempo estimation and beat tracking algorithms that utilize percussive/harmonic separation of the audio signal in order to extract filterbank energies and chroma features from the respective components. Periodicity analysis is carried out by convolving the feature sequences with a bank of resonators. Target tempo is es...
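The resonator-bank idea can be illustrated with a minimal sketch, assuming a single onset-strength feature sequence `x` sampled at `frame_rate` frames per second. Each resonator is modeled here as a comb filter whose impulse response is an exponentially decaying pulse train spaced by one candidate beat period; the decay `alpha`, the number of taps, the BPM search range, and the function names are hypothetical choices, not values from the paper.

```python
import numpy as np


def resonator_bank_energy(x, lags, alpha=0.9, n_taps=16):
    """Output energy of one comb resonator per candidate beat period (in frames)."""
    x = np.asarray(x, dtype=float)
    energies = []
    for T in lags:
        # Impulse response: (1 - alpha) * alpha**k at n = k*T, k = 0..n_taps-1
        h = np.zeros((n_taps - 1) * T + 1)
        h[::T] = (1 - alpha) * alpha ** np.arange(n_taps)
        y = np.convolve(x, h)[: len(x)]  # causal convolution, trimmed to input length
        energies.append(np.mean(y ** 2))
    return np.array(energies)


def estimate_tempo_bpm(x, frame_rate, bpm_range=(60, 200), alpha=0.9):
    """Pick the beat period whose resonator responds most strongly; return it in BPM."""
    lo = int(round(60 * frame_rate / bpm_range[1]))  # shortest period = fastest tempo
    hi = int(round(60 * frame_rate / bpm_range[0]))  # longest period = slowest tempo
    lags = np.arange(max(lo, 1), hi + 1)
    best = lags[np.argmax(resonator_bank_energy(x, lags, alpha))]
    return 60.0 * frame_rate / best
```

For a pulse every 50 frames at a hypothetical 100 frames/s, the sketch returns 120 BPM. A resonator at twice the true period also responds almost as strongly, which is the tempo-octave ambiguity that the tempo-class work listed above targets.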
Conference Paper
In this paper, the use of closed-form expressions is compared to the BIC approximation with respect to speaker clustering. We first show that the particular BIC setting which is commonly used in this task, namely the approximation of the marginal (with respect to the model parameters) and conditional (with respect to the latent variables) like...
Conference Paper
Document image segmentation into text lines is a critical stage towards unconstrained handwritten document recognition. Although morphological operations have proved effective in processing machine-printed documents for several tasks, similar methods for unconstrained handwritten documents lack accuracy. We propose an efficient method based on binar...
Article
This paper discusses the use of the BIC with respect to speaker diarization, i.e., the problem of assigning the observation vectors of an audio file to a set of speakers of unknown cardinality. Our primary goals are to examine the two dominant approaches of the BIC, namely the global and the local, and to combine the strengths of the two variants into...
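For reference, the conventional (global) BIC test for whether two segments come from the same speaker can be sketched as follows, assuming each segment is an array of feature vectors (e.g. MFCC frames) modeled by a single full-covariance Gaussian. This is the textbook ΔBIC with a tunable penalty weight `lam`; it is not the local or segmental variants studied in these papers, and the function names are hypothetical.

```python
import numpy as np


def _gauss_logdet(X):
    """Log-determinant of the ML (biased) covariance of the rows of X."""
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance is singular; segment too short or degenerate")
    return logdet


def delta_bic(X1, X2, lam=1.0):
    """Textbook ΔBIC: 'one Gaussian per segment' versus 'one Gaussian for both'.

    Positive -> keep apart (different speakers); negative -> merge.
    """
    X = np.vstack([X1, X2])
    n1, n2, n = len(X1), len(X2), len(X)
    d = X.shape[1]
    # Log-likelihood gain of two Gaussians over one (the constant terms cancel)
    gain = 0.5 * (n * _gauss_logdet(X) - n1 * _gauss_logdet(X1) - n2 * _gauss_logdet(X2))
    n_params = d + d * (d + 1) / 2  # extra mean vector + extra full covariance
    penalty = 0.5 * lam * n_params * np.log(n)
    return gain - penalty
```

In an agglomerative clustering loop, the pair with the most negative ΔBIC would be merged first, and merging would stop once every remaining pair scores positive.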
Article
The aim of this paper is to report for the first time the 1000 most common words and lemmas of Modern Greek and some of their quantitative characteristics. The frequency word list produced is based on the Hellenic National Corpus (HNC), a corpus of Modern Greek language consisting of about 13 million words of written texts. In particular, we invest...
Conference Paper
In this paper we examine a new penalty term for the Bayesian Information Criterion (BIC) that is suited to the problem of speaker diarization. Based on our previous approach of penalizing each cluster only with its effective sample size (an approach we called segmental), we propose a stricter penalty term. The criterion we derive retains the main...
Article
Two novel approaches to extract text lines and words from handwritten documents are presented. The line segmentation algorithm is based on locating the optimal succession of text and gap areas within vertical zones by applying the Viterbi algorithm. Then, a text-line separator drawing technique is applied and finally the connected components are assigne...
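A minimal sketch of Viterbi decoding over the rows of one vertical zone, assuming each row is summarized by its ink density in [0, 1] and using a two-state (gap/text) model with hypothetical costs: labeling a row as gap costs its density, labeling it as text costs one minus its density, and each gap/text switch costs `switch_cost`. This cost model is an assumption for illustration; the paper's state and transition definitions are not given in the abstract.

```python
def viterbi_text_gap(density, switch_cost=0.5):
    """Label each row as gap (0) or text (1) by minimum-cost (Viterbi) decoding."""
    def emit(d):
        return (d, 1.0 - d)  # hypothetical cost of labelling the row gap / text

    cost = list(emit(density[0]))  # best cost of a labelling ending in each state
    back = []                      # back[t][s]: best previous state if row t+1 is s
    for d in density[1:]:
        e = emit(d)
        new_cost, ptr = [], []
        for s in (0, 1):
            stay = cost[s]
            switch = cost[1 - s] + switch_cost
            if stay <= switch:
                new_cost.append(stay + e[s])
                ptr.append(s)
            else:
                new_cost.append(switch + e[s])
                ptr.append(1 - s)
        cost = new_cost
        back.append(ptr)
    # Backtrack from the cheaper final state
    state = 0 if cost[0] <= cost[1] else 1
    labels = [state]
    for ptr in reversed(back):
        state = ptr[state]
        labels.append(state)
    return labels[::-1]
```

The switch cost is what lets a faint row inside a gap (or a light row inside a text line) keep the label of its neighbours instead of fragmenting the zone.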
Conference Paper
This paper presents an algorithm that extracts the tempo of a musical excerpt. The proposed system assumes a constant tempo and deals directly with the audio signal. A sliding window is applied to the signal and two feature classes are extracted. The first class is the log-energy of each band of a mel-scale triangular filterbank, a common feature v...
Conference Paper
A novel approach to the Bayesian Information Criterion (BIC) is introduced. The new criterion redefines the penalty terms of the BIC, such that each parameter is penalized with the effective sample size it is trained with. Contrary to Local-BIC, the proposed criterion scores overall clustering hypotheses and therefore is not restricted to hierarchical...
Conference Paper
In this paper we describe a system that applies emerging technologies for speech recognition, language processing, multimedia indexing and retrieval, all integrated into a large video and audio library that covers broadcast news and current affairs in Greece. It assists the Greek National Council for Radio and Television (NCRTV) in compiling inform...
Conference Paper
This paper addresses the problem of automatic text-line and word segmentation in handwritten document images. Two novel approaches are presented, one for each task. In text-line segmentation a Viterbi algorithm is proposed while an SVM-based metric is adopted to locate words in each text-line. The overall algorithm was tested in the ICDAR2007 handw...
Conference Paper
In this paper we present a method of combining several acoustic parametric spaces, statistical models and distance metrics in the speaker diarization task. Focusing our interest on the post-segmentation part of the problem, we adopt an incremental feature selection and fusion algorithm based on the Maximum Entropy Principle and Iterative Scaling Algori...
Article
An on-line handwritten character recognition technique based on a template matching distance is proposed. In this method, the pen-direction features are quantized using the 8-level Freeman chain coding scheme and the dominant points of the stroke are identified using the first difference of the chain code. The distance between two symbols results f...
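The chain-coding step can be sketched as follows, assuming the stroke is a list of (x, y) pen samples with y pointing up and Freeman code 0 = east, counted counter-clockwise in 45° steps. Dominant points are flagged where the first difference of the chain code reaches a turn of at least `min_turn` × 45°; that threshold and the function names are hypothetical, and the paper's template-matching distance is not reproduced.

```python
import math


def freeman_codes(points):
    """Quantize the pen direction between successive samples to 8 Freeman directions."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # skip repeated samples: no direction
        angle = math.atan2(y1 - y0, x1 - x0)
        # Nearest multiple of 45 degrees; exact 22.5-degree ties follow Python's round()
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes


def first_difference(codes):
    """Direction change between consecutive codes, mapped to -3..4 (45-degree units)."""
    diffs = []
    for a, b in zip(codes, codes[1:]):
        d = (b - a) % 8
        diffs.append(d - 8 if d > 4 else d)
    return diffs


def dominant_points(codes, min_turn=2):
    """Indices where the stroke turns by at least min_turn * 45 degrees."""
    return [i + 1 for i, d in enumerate(first_difference(codes)) if abs(d) >= min_turn]
```

For an L-shaped stroke [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)] the codes are [0, 0, 2, 2], and the corner (point index 2) is the only dominant point. Without repeated samples, code index i starts at point index i, so the returned index also locates the corner point.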
Article
This paper presents a new state-space method for spectral estimation that performs decimation by any factor and is based on Singular Value Decomposition in order to estimate frequency, damping factor, amplitude and phase of complex damped sinusoids in the presence of noise. The new method, called DESED, makes use of the full set of data and bri...
Article
Modern Greek is one of the least quantitatively studied modern European languages and the goal of this paper is to fill this relative void. We use the Hellenic National Corpus (HNC), which is a growing corpus that currently includes 33 million words. The corpus and all the tools used in our work were developed by the Institute for Language and Spee...
Article
This article describes a method for discriminating among authors within a given register of Modern Greek. The focus here is to determine to what extent the stylistic differences among authors can be detected with a high degree of accuracy for a set of texts belonging to a well‐defined register. To that end, the chosen register is characteriz...
Article
This article describes a method for discriminating among registers of Modern Greek and among authors within a given register. Two issues have been investigated: (a) whether register discrimination can successfully exploit linguistic information reflecting the evolution of a language (such as diglossia features of the Modern Greek language) and (b)...
Article
We report on the application of the Self-Organizing Map (SOM) classification method to the task of categorizing texts according to their register and the style of their author. The SOM has been selected as its performance in various data-mining applications has been found to be highly successful. Here, the method is evaluated against the task of cl...
Conference Paper
If speech analysis is to detect a speaker's emotional state, it needs to derive information from both linguistic information, i.e., the qualitative targets that the speaker has attained (or approximated), conforming to the rules of language; and paralinguistic information, i.e., allowed variations in the way that qualitative linguistic targets are...
Article
This article investigates (a) whether register discrimination can successfully exploit linguistic information reflecting the evolution of a language (such as the diglossia phenomenon of the Modern Greek language) and (b) what kind of linguistic information and which statistical techniques may be employed to distinguish among individual styles within...
Article
This presentation focuses on the IMUTUS project, which concerns the creation of an innovative method for training users on traditional musical instruments with no MIDI (Musical Instrument Digital Interface) output. The entities collaborating in IMUTUS are ILSP (coordinator), EXODUS, SYSTEMA, DSI, SMF, GRAME, and KTH. The IMUTUS effectiveness is enh...
Article
This paper presents a new state-space method for spectral estimation that performs decimation by any factor D while imposing no constraints on the model order with respect to D. The new method, called DESED, as well as its Total Least Squares version called DESED_TLS, makes use of the full data set available and is based on SVD in order to estima...
Conference Paper
A new state-space method for spectral estimation that performs decimation by factor two while it makes use of the full set of data available is presented. The proposed method, called DESE2, is based on singular value decomposition in order to estimate frequency, damping factor, amplitude and phase of exponentially damped sinusoids in the presence o...
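For comparison, a plain (non-decimative) SVD-based state-space estimator of complex damped sinusoids can be sketched as follows. It is not DESED or DESE2: it assumes the number of components `K` is known, builds a single Hankel matrix from the undecimated data, and recovers the signal poles from the shift invariance of the dominant left singular vectors; amplitudes and phases then follow from a least-squares fit. The function name and the default Hankel size `L` are hypothetical.

```python
import numpy as np


def state_space_estimate(x, K, L=None):
    """SVD-based state-space estimate of K complex damped sinusoids
    x[n] = sum_k c_k * z_k**n with z_k = exp(-d_k + 2j*pi*f_k) and c_k = a_k*exp(1j*phi_k).

    Returns (frequency in cycles/sample, damping factor, amplitude, phase) per component.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    L = L or N // 2
    # Hankel data matrix H[i, j] = x[i + j], shape (L, N - L + 1)
    H = np.array([x[i:i + N - L + 1] for i in range(L)])
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Uk = U[:, :K]  # signal subspace: K dominant left singular vectors
    # Shift invariance: Uk[1:] = Uk[:-1] @ A, and the eigenvalues of A are the poles z_k
    A = np.linalg.lstsq(Uk[:-1], Uk[1:], rcond=None)[0]
    z = np.linalg.eigvals(A)
    freq = np.angle(z) / (2 * np.pi)
    damping = -np.log(np.abs(z))
    # Complex amplitudes c_k by least squares on the Vandermonde model
    V = z[np.newaxis, :] ** np.arange(N)[:, np.newaxis]
    c = np.linalg.lstsq(V, x, rcond=None)[0]
    return freq, damping, np.abs(c), np.angle(c)
```

With noisy data, the truncation to K singular vectors is what suppresses the noise; on noiseless data the estimates are exact up to rounding.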
Article
In the present paper, the Self-Organising Map (SOM) is applied to the problem of categorising a corpus of Modern Greek texts according to the style of their authors. A number of variants of the SOM model are used in a series of experiments, in order to compare and contrast their behaviour in the specific task. The experimental results indicate that...
Conference Paper
In this paper we present a laboratory prototype that has been developed at the Institute for Language and Speech Processing (ILSP) for Optical Recognition of Printed Music in the framework of the MUSTUTOR project funded by the EC. The algorithms implemented proceed in three different stages and perform symbol recognition of the printed music image....
Article
This article proposes a new method for determining the order of wide-band quasi-periodic signals from frequency estimates provided either by their short-time Fourier or linear prediction (LP) spectra. The method consists of a search for harmonic patterns in the signal spectrum that minimize an error sum of the estimated frequencies. This error ca...
Article
A new analytical methodology is introduced here for fixed-point error analysis of various Toeplitz solving algorithms. The method is applied to the very useful Schur algorithm and the recently introduced split Schur (1918, 1986) algorithm. Both exact and first-order error analyses are provided in this paper. The theoretical results obtained are consi...
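The Schur recursion itself can be sketched in floating point as follows, assuming a positive-definite autocorrelation sequence r[0..p] and the sign convention in which the first reflection coefficient is -r[1]/r[0]. It propagates the correlations of the forward and backward lattice prediction errors with the input; it performs no fixed-point quantization (the subject of the error analysis) and does not cover the split Schur variant. The function name is hypothetical.

```python
def schur_reflection(r):
    """Reflection coefficients k_1..k_p and final prediction-error power from r[0..p].

    alpha[i] ~ E[f_m[n] x[n-i]] and beta[i] ~ E[b_m[n] x[n-i]] for the order-m
    forward/backward lattice errors; only entries i >= m are still needed.
    """
    r = [float(v) for v in r]
    p = len(r) - 1
    alpha, beta = r[:], r[:]  # order-0 errors are the signal itself
    ks = []
    for m in range(1, p + 1):
        k = -alpha[m] / beta[m - 1]  # chosen so that alpha_m(m) = 0
        new_alpha, new_beta = alpha[:], beta[:]
        for i in range(m, p + 1):
            new_alpha[i] = alpha[i] + k * beta[i - 1]
            new_beta[i] = beta[i - 1] + k * alpha[i]
        alpha, beta = new_alpha, new_beta
        ks.append(k)
    return ks, beta[p]  # beta_p(p) equals the order-p prediction-error power
```

Unlike the Levinson recursion, no inner products are formed, which is what makes the Schur form attractive for parallel and fixed-point implementations.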
Article
This comparative study of the l-step-ahead linear prediction and least-squares finite impulse response (LS FIR) filtering problems emphasizes the numerical behavior of the resulting Toeplitz systems. It is shown that, although these systems are similar, the constraints on the autocorrelation coefficients fundamentally differentiate them. In the pro...
Article
This paper focuses on the study of some key properties of transform coding techniques, for efficient object manipulation and reconstruction. Two-dimensional binary objects are considered and represented by their one-dimensional contour signals. The expression of these contours in the transform domain has several advantages: generation of the object...
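One common instance of such a transform-domain contour representation is the set of Fourier descriptors. The sketch assumes the contour is an (N, 2) array of boundary points in traversal order, treats each point as the complex number x + iy, and reconstructs the contour from only the lowest harmonics. It illustrates the general idea and is not the paper's specific transform or coding scheme; the function names are hypothetical.

```python
import numpy as np


def fourier_descriptors(contour):
    """DFT of the boundary, with each point (x, y) taken as the complex number x + iy."""
    contour = np.asarray(contour, dtype=float)
    return np.fft.fft(contour[:, 0] + 1j * contour[:, 1])


def reconstruct(coeffs, n_keep):
    """Invert the DFT keeping only harmonics -n_keep..n_keep (a smoothed, compressed contour)."""
    N = len(coeffs)
    harmonic = np.fft.fftfreq(N) * N  # 0, 1, ..., -2, -1 (as floats)
    kept = np.where(np.abs(harmonic) <= n_keep, coeffs, 0)
    z = np.fft.ifft(kept)
    return np.column_stack([z.real, z.imag])
```

In this representation a translation changes only the zeroth coefficient, a uniform scaling scales all coefficients, and a rotation multiplies all coefficients by the same unit-modulus factor, which is what makes object manipulation in the transform domain convenient.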
Article
The parallel realization of a popular dynamic time warping (DTW) algorithm is discussed. Two alternative techniques are proposed, one based on a circular array and the other using a linear array of processing elements (PEs). The architecture of each PE is defined in both cases and computational phases are outlined. The number of PEs is not restrict...
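The property such array architectures exploit can be sketched sequentially. In the basic DTW recurrence D(i, j) = d(i, j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1)), all cells with the same i + j depend only on the two previous anti-diagonals, so each anti-diagonal could be computed in one parallel step. The unconstrained symmetric recurrence, the absolute-difference local distance, and the function name are assumptions; the paper's particular DTW variant and PE design are not reproduced.

```python
def dtw_wavefront(a, b, dist=lambda u, v: abs(u - v)):
    """DTW distance computed anti-diagonal by anti-diagonal (wavefront order)."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for s in range(2, n + m + 1):  # s = i + j indexes the anti-diagonal
        # Cells on this anti-diagonal are mutually independent: in an array
        # architecture each one could be handled by its own processing element.
        for i in range(max(1, s - m), min(n, s - 1) + 1):
            j = s - i
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The result is identical to the usual row-by-row evaluation; only the evaluation order changes.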
Article
An adaptive approach to the restoration of noisy images is presented in this paper. Two recursive schemes are developed, which simultaneously estimate the unknown image model and restore the image. In the first, the reduced update Kalman filter (RUKF) is appropriately combined with a fast multichannel space-recursive estimation technique (FAMSRET)....
Article
A computationally efficient method for adaptive image estimation is developed, based on the multichannel form of the one-dimensional fast least-squares algorithms. Extended forms of various two-dimensional autoregressive image models are derived and used for this purpose. It is shown that the method, named the fast multichannel space recursive esti...
Article
This work gives a general presentation and classification of the various rules for text-to-phoneme transcription in Modern Greek. It is the outcome of a detailed study of Modern Greek based on more than 5000 words taken from everyday texts. We believe that this study is reasonably exhaustive, and that the rules formulated can accommodate not only te...
Article
This correspondence presents an order-recursive algorithm for the computation of the model parameters required in multipulse linear predictive coding (LPC) of speech in the case of the autocorrelation method. Its computational complexity is 2p² + 4p multiplications and divisions, where p + 1 is the number of the model parameters. It is also s...
Article
This paper presents order-recursive algorithms for the computation of the model parameters required in Multipulse Linear Predictive Coding (LPC) of speech in the case of the autocorrelation and covariance methods. Its computational complexity is 2p² + 4p multiplications and divisions, where p+1 is the number of the model parameters. It is also shown, tha...
Article
This paper is concerned with the efficient determination of the optimum, in the least squares sense, FIR filter on the basis of data samples of the input and desired response signals, by procedures recursive in the filter order. This situation typically arises when no a priori statistics are available and the system order is not known. The general...
Article
A new computationally efficient algorithm for sequential least-squares (LS) estimation is presented in this paper. This fast a posteriori error sequential technique (FAEST) requires 5p MADPR (multiplications and divisions per recursion) for AR modeling and 7p MADPR for LS FIR filtering, where p is the number of estimated parameters. In contrast the...
Conference Paper
The present paper deals with a new, computationally efficient, algorithm for Sequential Least Squares (LS) estimation. This scheme requires only O(5p) MAD (Multiplications And Divisions) per recursion to update a Kalman type gain vector; p is the number of estimated parameters. In contrast the well-known fast Kalman algorithm requires O(8p) MAD. Th...
Article
In many applications, including geophysical signal processing and system identification, the computation of a FIR Wiener filter, corresponding to the optimum lag between the input signal and the desired response, or of an optimum prediction distance predictor is often required. These problems lead to the solution of a family of Toeplitz systems of...
Article
In many signal processing applications, one often seeks the solution of a linear system of equations by means of fast algorithms. The special form of the matrix associated with the linear system may permit the development of algorithms requiring O(p²) or fewer operations. Hankel and Toeplitz matrices provide well known examples and vari...
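As a standard example of an O(p²) Toeplitz solver, the Levinson-Durbin recursion can be sketched as follows, assuming a symmetric positive-definite Toeplitz matrix built from an autocorrelation sequence r[0..p] and the linear-prediction (Yule-Walker) right-hand side. The abstract is truncated and does not identify the paper's own algorithms, so this only illustrates the problem class; the function name is hypothetical.

```python
import numpy as np


def levinson_durbin(r):
    """Solve sum_{j=1..p} a[j] r[|i-j|] = -r[i] (i = 1..p) in O(p^2) operations.

    Returns the prediction-error filter a (a[0] = 1), the reflection
    coefficients, and the final prediction-error power.
    """
    r = np.asarray(r, dtype=float)
    p = len(r) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    ks = np.zeros(p)
    for m in range(1, p + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])  # correlation of current error with r
        k = -acc / err
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]          # order update (RHS evaluated first)
        a[m] = k
        err *= 1.0 - k * k
        ks[m - 1] = k
    return a, ks, err
```

Each order update costs O(m) operations, giving O(p²) in total instead of the O(p³) of a general dense solver.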
Article
The relative average duration of vowels was found to have a significant impact on the perception of a given speech rate (tempo) of a Text-to-Speech Synthesis System (TTS) developed for the Greek language. Even though the duration ratios vary slightly according to segmental context and word prominence, their modification beyond a certain level of...
Article
The pitch or F0 contour carries multiple kinds of information, from segmental to intonational (i.e., grammatical and syntactic to semantic to pragmatic). In this paper, the relation of certain configurations of F0 to the structural organization of an utterance is investigated. First, some analytical observations are made. Then, in order to verify ou...
Article
This paper presents the Hellenic National Corpus (HNC), which is the corpus of Modern Greek developed by the Institute for Language and Speech Processing (ILSP). The presentation describes all stages of the creation of the corpus: collection of the material, tagging and tokenizing, construction of the database and the online implementation which aims at r...
Article
In this article, a system is proposed for the automatic style categorisation of text corpora in the Greek language. This categorisation is based to a large extent on the type of language used in the text, for example whether the language used is representative of formal Greek or not. To arrive at this categorisation, the highly inflectional nature...
Article
The present paper describes a new algorithm for addressing a significant issue: "Greeklish" (or "Greenglish"), which arose from the fact that the Greek language is not fully supported by computer programs and operating systems. In the first section of the paper we describe the "Greeklish" phenomenon and the current situation, also with reference to r...

Citations

... A recent study estimated tempo, dance-style and meter by applying oscillators to rhythm activations, extracting the response at different frequencies (Gkiokas, Katsouros, & Carayannis, 2016). On a similar theme, Schreiber and Müller (2017) use 50 features for tempo octave correction capturing the energy at 10 beat periodicities (log-spaced) × 5 spectral bands. ...