George Carayannis’s research while affiliated with National Technical University of Athens and other places


Publications (76)


Towards Multi-Purpose Spectral Rhythm Features: An Application to Dance Style, Meter and Tempo Estimation
  • Article

April 2016 · 66 Reads · 10 Citations

IEEE/ACM Transactions on Audio Speech and Language Processing

George Carayannis

This paper addresses the extraction of multipurpose spectral rhythm features that simultaneously tackle a variety of rhythm analysis tasks, namely dance style classification, meter estimation, and tempo estimation. The term spectral rhythm features reflects the origin of the extracted features: the periodicity function (PF), a spectral representation that encapsulates the salience of the rhythm frequencies. Two dimensionality reduction techniques applied to the PF to extract expressive and compact features are compared, namely a linear transformation resulting from Principal Component Analysis and a nonlinear mapping derived from a Restricted Boltzmann Machine. The derived features are then used as input to an SVM classifier for each task. Moreover, an additional method is proposed that reformulates the well-studied tempo estimation task as a combination of multiple binary classification sub-problems. Evaluation on a large number of datasets demonstrates that the same set of features learned from the PF provides a robust rhythmic representation that achieves results comparable to the current state-of-the-art methods for the aforementioned tasks.


Figure 5. Class sampling method overview. 
Towards an Invertible Rhythm Representation
  • Conference Paper
  • Full-text available

November 2015 · 146 Reads

[...] · George Carayannis

This paper investigates the development of a rhythm representation of music audio signals that (i) is able to tackle rhythm-related tasks and (ii) is invertible, i.e., suitable for reconstructing audio from it while preserving the corresponding rhythm content. A conventional front-end processing scheme is applied to the audio signal to extract time-varying characteristics (accent features) of the signal. Next, a periodicity analysis method is proposed that is capable of reconstructing the accent features. A network consisting of Restricted Boltzmann Machines is then applied to the periodicity function to learn a latent representation. This latent representation is finally used to tackle two distinct rhythm tasks, namely dance style classification and meter estimation. The results are promising for both input signal reconstruction and rhythm classification performance. Moreover, the proposed method is extended to generate random samples from the corresponding classes.


Recognition of online handwritten mathematical formulas using probabilistic SVMs and stochastic context free grammars

February 2015 · 105 Reads · 52 Citations

Pattern Recognition Letters

Although recognition of online handwritten text has reached a point of maturity, recognition of online handwritten mathematical expressions remains a challenging problem. In this work we train a probabilistic SVM classifier to recognize spatial relations between two mathematical symbols or sub-expressions and then employ a CYK-based algorithm to parse the mathematical expression in order to produce the respective MathML output. For the recognition of mathematical expressions we assume compliance with a stochastic context-free grammar. It must be noted that in this work we assume that the symbols that comprise the mathematical expression have been correctly recognized. We evaluate the recognition of spatial relations on the MathBrush database, and the experimental results give an overall mean error rate of 2.8%. MathML output is evaluated using the datasets and evaluation tools of the CROHME2012 and CROHME2013 competitions. Experimental results give, at mathematical expression level, an accuracy of 78.70%, 65.78%, 56.37% and 50.22% for Part-I, Part-II, Part-III and Part-IV on the respective test sets.
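To illustrate the parsing step mentioned in the abstract, the following is a minimal sketch of a Viterbi-style CYK parser over a stochastic context-free grammar in Chomsky normal form. It is not the paper's parser: the toy grammar, its probabilities, and the names `LEXICAL`, `BINARY`, `cyk_best_parse` and `build_tree` are assumptions made for illustration, and the input is a plain one-dimensional token sequence rather than symbols linked by SVM-classified spatial relations.

```python
import math

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper).
LEXICAL = {  # terminal -> list of (nonterminal, probability)
    "x": [("Expr", 0.3)],
    "y": [("Expr", 0.3)],
    "+": [("Op", 1.0)],
}
BINARY = [  # (parent, left child, right child, probability)
    ("Expr", "Expr", "Rest", 0.4),
    ("Rest", "Op", "Expr", 1.0),
]


def build_tree(chart, i, j, nonterminal):
    """Follow backpointers to rebuild the best tree for tokens[i..j]."""
    _, back = chart[i][j][nonterminal]
    if isinstance(back, str):  # lexical rule: back is the terminal itself
        return (nonterminal, back)
    k, left, right = back
    return (
        nonterminal,
        build_tree(chart, i, k, left),
        build_tree(chart, k + 1, j, right),
    )


def cyk_best_parse(tokens, start="Expr"):
    """Return (probability, tree) of the most likely parse, or None."""
    n = len(tokens)
    # chart[i][j] maps nonterminal -> (log probability, backpointer)
    chart = [[{} for _ in range(n)] for _ in range(n)]
    for i, token in enumerate(tokens):
        for nonterminal, prob in LEXICAL.get(token, []):
            chart[i][i][nonterminal] = (math.log(prob), token)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two children
                for parent, left, right, prob in BINARY:
                    if left in chart[i][k] and right in chart[k + 1][j]:
                        score = (math.log(prob)
                                 + chart[i][k][left][0]
                                 + chart[k + 1][j][right][0])
                        best = chart[i][j].get(parent)
                        if best is None or score > best[0]:
                            chart[i][j][parent] = (score, (k, left, right))
    if n == 0 or start not in chart[0][n - 1]:
        return None
    return math.exp(chart[0][n - 1][start][0]), build_tree(chart, 0, n - 1, start)
```

Calling `cyk_best_parse(["x", "+", "y"])` returns the probability of the single derivation allowed by the toy grammar together with its tree; an incomplete input such as `["x", "+"]` yields `None`.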


Deploying Deep Belief Nets for content based audio music similarity

In this paper a method for computing an audio-based similarity between music excerpts is presented. The method consists of three main parts, with the first step being feature extraction, which involves the calculation of three feature sets that correspond to music timbre, rhythm and harmony. Next, for each feature set a Deep Belief Network was trained without supervision on a large music collection. The respective distances of the output units of the Deep Belief Networks between two music excerpts are computed, normalized and finally combined to form the distance measure. The proposed method was evaluated on the MIREX 2013 Audio Music Similarity task. Results are encouraging; however, they indicate that the harmonic similarity component degrades the performance.


Figure 1. The bounding box of a symbol A is the rectangular area that firmly encloses the symbol.
Table 1. Dataset of spatial relations between pairs of symbols.
Table 2. Comparison of average number of errors of three different classifiers: (i) ILSP-1, (ii) one-against-all SVM and (iii) one-against-one SVM.
Figure 3. Error example produced by the "one-against-all" technique.
Figure 4. Error examples produced by the "one-against-one" technique.
Structural analysis of online handwritten mathematical symbols based on support vector machines

February 2013 · 448 Reads · 4 Citations

Proceedings of SPIE - The International Society for Optical Engineering

Mathematical expression recognition is still a very challenging task for the research community, mainly because of the two-dimensional (2d) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of an ME, based on spatial features of the symbols. We introduce six features to represent the spatial affinity of the symbols and compare two multi-class classification methods that employ support vector machines (SVMs), one based on the “one-against-one” technique and one based on the “one-against-all” technique, in identifying the relation between a pair of symbols (e.g. subscript, numerator). A dataset containing 1906 spatial relations derived from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 training dataset is constructed to evaluate the classifiers and compare them with the rule-based classifier of the ILSP-1 system that participated in the contest. The experimental results give an overall mean error rate of 2.61% for the “one-against-one” SVM approach, 6.57% for the “one-against-all” SVM technique and 12.31% for the ILSP-1 classifier. © (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
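The difference between the two multi-class schemes compared in the abstract can be sketched as follows. This is only an illustration of the decision rules: the trained SVMs are replaced by simple stand-in functions, and the single vertical-offset feature, the class names, and the names `PROTOTYPES`, `pairwise_winner` and `class_score` are hypothetical, not the six features or classifiers of the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical class prototypes on a single vertical-offset feature.
PROTOTYPES = {"superscript": 1.0, "inline": 0.0, "subscript": -1.0}


def pairwise_winner(class_a, class_b, sample):
    """Stand-in for a binary classifier trained on class_a versus class_b."""
    dist_a = abs(sample - PROTOTYPES[class_a])
    dist_b = abs(sample - PROTOTYPES[class_b])
    return class_a if dist_a <= dist_b else class_b


def class_score(label, sample):
    """Stand-in for the decision value of a label-versus-rest classifier."""
    return -abs(sample - PROTOTYPES[label])


def predict_one_against_one(classes, sample):
    """One binary decision per class pair; the class with most votes wins."""
    votes = Counter(pairwise_winner(a, b, sample)
                    for a, b in combinations(classes, 2))
    return max(classes, key=lambda label: votes[label])


def predict_one_against_all(classes, sample):
    """One scorer per class; the class with the highest score wins."""
    return max(classes, key=lambda label: class_score(label, sample))
```

With K classes, the one-against-one rule consults K(K-1)/2 pairwise classifiers and counts votes, while the one-against-all rule consults K classifiers and compares their scores.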


Figure 1. Overview of the proposed method.
Figure 2. Periodicity vector coding process. ⊗ stands for inner product.
Figure 4. Distribution of classification errors (dark bars) with respect to ground-truth tempo compared to the overall dataset tempo distribution (light bars).
Reducing Tempo Octave Errors by Periodicity Vector Coding and SVM Learning

October 2012 · 101 Reads · 17 Citations

In this paper we present a method for learning tempo classes in order to reduce tempo octave errors. This paper makes two main contributions to the rhythm analysis field. Firstly, a novel technique is proposed to code the rhythm periodicity functions of a music signal. The target tempo range is divided into overlapping "tempo bands" and the periodicity function is filtered by triangular masks aligned to those tempo bands in order to calculate the respective saliencies, followed by the application of the DCT to the band strengths. The second contribution is the adoption of Support Vector Machines to learn broad tempo classes from the coded periodicity vectors. Training instances are assigned a tempo class according to their annotated tempo. The classes are assumed to correspond to "music speed". In the classification phase, each target excerpt is assigned a tempo class label by the SVM. The target periodicity vector is masked by the predicted tempo class range, and tempo is estimated by peak picking in the reduced periodicity vector. The proposed method was evaluated on the benchmark ISMIR 2004 Tempo Induction Evaluation Exchange Dataset for both tempo class and tempo value estimation tasks. Results indicate that the proposed approach provides an efficient framework to tackle the tempo estimation task.
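The coding step described in the abstract (overlapping tempo bands, triangular masks, then a DCT over the band strengths) can be sketched roughly as below. The band layout, the 50% overlap, the unnormalized DCT-II, and the names `triangular_mask`, `band_strengths` and `dct_ii` are assumptions for illustration; the paper's actual band edges and normalization are not given here.

```python
import math


def triangular_mask(bpm, low, centre, high):
    """Weight of a tempo value under a triangle peaking at `centre`."""
    if low < bpm <= centre:
        return (bpm - low) / (centre - low)
    if centre < bpm < high:
        return (high - bpm) / (high - centre)
    return 0.0


def band_strengths(tempi, periodicity, edges):
    """Salience of each tempo band.

    Band b spans edges[b]..edges[b + 2] and peaks at edges[b + 1],
    so neighbouring bands overlap by half (an assumed layout).
    """
    strengths = []
    for b in range(len(edges) - 2):
        low, centre, high = edges[b], edges[b + 1], edges[b + 2]
        strengths.append(sum(
            triangular_mask(bpm, low, centre, high) * salience
            for bpm, salience in zip(tempi, periodicity)
        ))
    return strengths


def dct_ii(values):
    """Unnormalized DCT-II of a short vector."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


# Toy periodicity function with a single peak at 120 BPM.
tempi = list(range(40, 241))
periodicity = [1.0 if bpm == 120 else 0.0 for bpm in tempi]
edges = [40, 80, 120, 160, 200, 240]
coded_vector = dct_ii(band_strengths(tempi, periodicity, edges))
```

In the method described above, a vector such as `coded_vector` would be the input from which the SVM learns broad tempo classes.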


Figure 2. Bounding box of a symbol.
Centroid. A very common technique to test whether a symbol lies within a region or not is to examine the coordinates of its centroid [1]. In processing on-line handwritten symbols, the calculation of a symbol's centroid cannot be based on the mean value of pixels' coordinates as in the case of off-line symbols. Following the approach suggested in [3], we define the x-coordinate of the centroid of a symbol A
Figure 3. MathML structure of the example of Figure 1.
We have to point out that the symbol of fraction/minus, such as the {_=1} symbol in the example ME of Figure 1, is clarified at the structural analysis stage and not at the symbol recognition stage.
A System for Recognition of On-Line Handwritten Mathematical Expressions

September 2012 · 506 Reads · 21 Citations

We present a system for recognizing online mathematical expressions (ME). Symbol recognition is based on a template elastic matching distance between pen direction features. The structural analysis of the ME is based on extracting the baseline of the ME and then classifying symbols into levels above and below the baseline. The symbols are then sequentially analyzed using six spatial relations and a respective 2d structure is processed to give the resulting MathML representation of the ME. The system was evaluated on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2011 datasets and demonstrates promising results.


Figure 5. SDR of the proposed method for opening/erosion filter (simple structure element used) compared to median filter proposed in [10] for various filter sizes. 
Deploying Nonlinear Image Filters to Spectrogram for Harmonic/Percussive Separation

September 2012 · 210 Reads · 6 Citations

In this paper we present a simple yet novel technique for harmonic/percussive separation of monaural audio music signals. Under the assumption that percussive/harmonic components exhibit vertical/horizontal lines in the spectrogram, image morphological filters are applied to the spectrogram of the input signal. The structure elements of the morphological filters are chosen to accentuate regions of the spectrogram corresponding to harmonic and percussive components. The proposed method was evaluated on the SISEC 2008/2010 development data and outperformed the baseline method adopted.
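As a rough illustration of the idea in the abstract, the sketch below applies a grayscale morphological opening with a line-shaped structuring element along time (to keep horizontal, harmonic ridges) and along frequency (to keep vertical, percussive ridges). The structuring element size, the hard assignment of each bin to the stronger estimate, and the names `erode`, `dilate`, `opening`, `transpose` and `separate` are assumptions, not the paper's exact filters.

```python
def erode(values, size):
    """Grayscale erosion with a flat line element (window clipped at edges)."""
    half = size // 2
    return [min(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def dilate(values, size):
    """Grayscale dilation with a flat line element (window clipped at edges)."""
    half = size // 2
    return [max(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def opening(values, size):
    """Opening = erosion followed by dilation; removes peaks narrower than size."""
    return dilate(erode(values, size), size)


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def separate(spectrogram, size=3):
    """Split a magnitude spectrogram (rows = frequency bins, columns = frames).

    Returns (harmonic, percussive) spectrograms of the same shape.
    """
    # Opening along time keeps horizontal lines (harmonic content).
    harmonic_est = [opening(row, size) for row in spectrogram]
    # Opening along frequency keeps vertical lines (percussive content).
    percussive_est = transpose(
        [opening(column, size) for column in transpose(spectrogram)]
    )
    harmonic, percussive = [], []
    for f, row in enumerate(spectrogram):
        h_row, p_row = [], []
        for t, value in enumerate(row):
            # Assumed hard mask: each bin goes to the stronger estimate.
            if harmonic_est[f][t] >= percussive_est[f][t]:
                h_row.append(value)
                p_row.append(0.0)
            else:
                h_row.append(0.0)
                p_row.append(value)
        harmonic.append(h_row)
        percussive.append(p_row)
    return harmonic, percussive
```

On a toy spectrogram containing one horizontal line and one vertical line, the horizontal line ends up in the harmonic output and the vertical line in the percussive output; the bin where they cross is assigned to the harmonic part because of the tie rule chosen here.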


A Morphology Based Approach for Binarization of Handwritten Documents

September 2012 · 42 Reads · 8 Citations

Document image binarization is an initial though critical stage towards the recognition of the text components of a document. This paper describes an efficient method based on mathematical morphology for extracting text regions from degraded handwritten document images. The basic stages of our approach are: (a) top-hat-by-reconstruction to produce a filtered image with a reasonably even background, (b) region growing starting from a set of seed points and attaching to each seed neighboring pixels of similar intensity and (c) conditional extension of the initially detected text regions based on the values of the second derivative of the filtered image. The method was evaluated on the benchmarking dataset of the International Document Image Binarization Contest (DIBCO 2011) and shows promising results.


Music Tempo Estimation and Beat Tracking by Applying Source Separation and Metrical Relations

March 2012 · 385 Reads · 59 Citations

Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on

In this paper, we present tempo estimation and beat tracking algorithms that utilize percussive/harmonic separation of the audio signal in order to extract filterbank energies and chroma features from the respective components. Periodicity analysis is carried out by the convolution of feature sequences with a bank of resonators. Target tempo is estimated from the resulting periodicity vector by incorporating knowledge of metrical relations. Tempo estimation is followed by a local tempo refinement method to enhance the beat-tracking algorithm. Beat tracking involves the computation of the beat saliencies derived from the resonators' responses and proposes a distance measure between candidate beat locations. A dynamic programming algorithm is adopted to find the optimal “path” of beats. Both tempo estimation and beat tracking methods were submitted to MIREX 2011, while the tempo estimation algorithm was also evaluated on the ISMIR 2004 Tempo Induction Evaluation Exchange Dataset.
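The dynamic programming step mentioned in the abstract can be sketched as a search for the beat sequence that maximizes total salience while keeping inter-beat intervals close to the estimated beat period. The squared log-ratio penalty, the `tightness` weight, and the names `transition_penalty` and `best_beat_path` are assumptions in the spirit of common DP beat trackers; the paper's own distance measure between candidate beats is not reproduced here.

```python
import math


def transition_penalty(interval, period, tightness=5.0):
    """Assumed cost for an inter-beat interval that deviates from the period."""
    return tightness * math.log(interval / period) ** 2


def best_beat_path(candidates, period, tightness=5.0):
    """Pick beat times from (time_sec, salience) candidates sorted by time."""
    n = len(candidates)
    if n == 0:
        return []
    # score[j]: best cumulative score of any path ending at candidate j.
    score = [salience for _, salience in candidates]
    previous = [None] * n
    for j in range(n):
        for i in range(j):
            interval = candidates[j][0] - candidates[i][0]
            if interval <= 0:
                continue
            total = (score[i] + candidates[j][1]
                     - transition_penalty(interval, period, tightness))
            if total > score[j]:
                score[j] = total
                previous[j] = i
    # Backtrack from the best-scoring end point.
    end = max(range(n), key=lambda index: score[index])
    path = []
    while end is not None:
        path.append(candidates[end][0])
        end = previous[end]
    return path[::-1]
```

Given candidates at 0.0, 0.25, 0.5, 0.8, 1.0 and 1.5 seconds with strong saliences on the half-second grid and a period of 0.5 s, the weak off-grid candidates are skipped because their intervals are penalized.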


Citations (52)


... A recent study estimated tempo, dance-style and meter by applying oscillators to rhythm activations, extracting the response at different frequencies (Gkiokas, Katsouros, & Carayannis, 2016). On a similar theme, Schreiber and Müller (2017) use 50 features for tempo octave correction capturing the energy at 10 beat periodicities (log-spaced) × 5 spectral bands. ...

Reference:

Tempo-Invariant Processing of Rhythm with Convolutional Neural Networks
Towards Multi-Purpose Spectral Rhythm Features: An Application to Dance Style, Meter and Tempo Estimation
  • Citing Article
  • April 2016

IEEE/ACM Transactions on Audio Speech and Language Processing

... Subsequently, each frequency bin is resampled at the frame rate of 200 Hz. Next, the Harmonic/Percussive separation method described in [49] is applied to the resampled CQT in order to extract the TF components that correspond to the Percussive (P) and Harmonic (H) parts of the signal. The motivation behind this step is to decorrelate, as much as possible, the rhythmic content of the two components. ...

Deploying Nonlinear Image Filters to Spectrogram for Harmonic/Percussive Separation

... If a listener perceives the tempo to be halved or doubled, their preferred tactus would change to a metrical level once removed from the originally detected tactus. Octave errors have been revealed in numerous attempts at automatic tempo extraction in the music information retrieval community (Gkiokas et al., 2012;Hockman & Fujinaga, 2010;Hörschläger et al., 2015;Klapuri et al., 2006), but very little behavioral research has addressed this phenomenon directly. ...

Reducing Tempo Octave Errors by Periodicity Vector Coding and SVM Learning

... Instead, it uses extensive training data to acquire the domain knowledge. Table 5 presents Recent studies on online [61,48,49,51,52,57,46] and offline [40,54,60] handwritten MEs deduce that much consideration has appealed to the use of RNN variations (LSTM, GRU) as a decoder. The online stroke-based inputs are encoded using GRU [40,51], BLSTM [57], GNN [47], DenseNet [52] and the joint GRU-CNN [58] models. ...

Recognition of online handwritten mathematical formulas using probabilistic SVMs and stochastic context free grammars
  • Citing Article
  • February 2015

Pattern Recognition Letters

... In order to detect Greeklish and transcode them to Greek characters, we use specialized software. After fixing each term's language, we proceed in correcting its case (line 5). As a consequence, terms of keywords that had been treated separately up to that point (as, for example, in the case of the 3rd, 9th and 10th rows of Table 4) are now grouped together. ...

Bypassing Greeklish!
  • Citing Article

... The storyboards are essentially utilized by the designer to communicate the ideas regarding the site structure navigation of the application (Newman & Landay, 2000). With the advance technologies that are being used nowadays, the multimedia storyboard is becoming more attractive and engaging for the novices (Antoniou-kritikou, Carayannis, & Katsouros, 1991). Thus, the storyboards will help the researcher to sketch the suitable flow of this multimedia application. ...

A MULTIMEDIA STORYBOARD AS AN OBJECT AND AS A STARTING POINT FOR LANGUAGE LEARNING

... For the Language model training, we create a large corpus for the Greek language using a subset of the Greek part of CC-Net [71] (approximately 11 billion tokens) and combine it with 1.5 billion tokens from the Greek version of Wikipedia and the Hellenic National Corpus (HNC) [72]. During preprocessing, we remove all punctuation and accents, deduplicate lines and convert all letters to lowercase. ...

Design and implementation of the online ILSP Greek Corpus

... Decimative spectrum estimation constitutes a very interesting field of signal processing research [4], [5], [6], compared to the classical methods that were proposed a few decades ago. Decimation improves resolution capability of a frequency estimation method. ...

A New Decimative Spectral Estimation Method with Unconstrained Model Order and Decimation Factor
  • Citing Article
  • January 2002

... A specified number of best-matching points are marked with a symbol and can then be used as starting points for browsing. Some other approaches use SOM as a clustering and visualization software tool [24,28,39]. They have also been used to estimate mobile location [47], pattern recognition [1], and gene clustering [48]. ...

Evaluating SOM-based models in Text Classification Tasks for the Greek Language
  • Citing Article
  • January 2001

... Tambouratzis, et al. [28] carried out style-based text classification tests for the Greek language, focusing on polysemy and grammatically equivalent word forms. They counted morphological, as well as structural features of the texts and deployed cluster analysis on three categories (Fiction, History, Politics), with high accuracy results. ...

Automatic Style Categorisation of Corpora in the Greek Language