George Tambouratzis, Institute for Language and Speech Processing, Athena Res. Centre
Ph.D.
About
87
Publications
3,908
Reads
485
Citations
Introduction
Pattern Recognition
Neural Networks and Computational Intelligence
Machine Translation
Skills and Expertise
Publications (87)
The present article involves the generation of phrase boundaries in unconstrained free text. Particle Swarm Optimisation (PSO) is applied to determine the optimal values for a set of parameters, based on a limited amount of training data. The starting point is a detailed analysis of generated solutions, which leads to a reformulation of the phrasin...
The present article describes a novel phrasing model which can be used for segmenting sentences of unconstrained text into syntactically-defined phrases. This model is based on the notion of attraction and repulsion forces between adjacent words. Each of these forces is weighed appropriately by system parameters, the values of which are optimised v...
The present article reviews the application of Particle Swarm Optimization (PSO) algorithms to optimize a phrasing model, which splits any text into linguistically-motivated phrases. In terms of its functionality, this phrasing model is equivalent to a shallow parser. The phrasing model combines attractive and repulsive forces between neighbouring...
Following the detailed description of the PRESEMT Machine Translation system and the report on its performance, the current chapter focuses on the system’s portability. Portability is a term intended to signify the process of integrating a new language pair into the system. This involves reviewing all the necessary system modules and resources and...
The topic of the current chapter is the evaluation of the performance of PRESEMT both per se as well as in comparison with other MT systems, the performance relating to the translation quality being achieved. While it is possible to employ humans for this task (subjective evaluation), who assess an MT system in terms of fluency (i.e. grammaticality...
This chapter performs a review of the research work discussed in the previous chapters of the present volume. This review represents a summary of the outcomes of the research within the PRESEMT project. As a logical outcome, a set of key directions is identified for future work in order to further improve the MT methodology. A brief report of the m...
This chapter presents in detail the main translation process of PRESEMT, delving deeper into the core of the system and its inner workings.
This chapter introduces the general design characteristics of PRESEMT and provides a detailed description of all resources required as well as all pre-processing steps needed, such as corpora processing and model creation.
This chapter describes a number of improvements performed on the basic PRESEMT system. These improvements are aimed at specific modules of the system in an effort to achieve gains in the translation accuracy, for which alternative implementations have been suggested. These extensions concern different modules of the PRESEMT architecture. The first...
This chapter contains a general introduction to the topic of the present book. It presents the current challenges of Machine Translation (MT), in particular for languages where only a limited amount of specialised resources is readily available. To that end, a comprehensive review of the state-of-the-art in MT is performed. Focus is placed on relat...
This book provides a unified view on a new methodology for Machine Translation (MT). This methodology extracts information from widely available resources (extensive monolingual corpora) while only assuming the existence of a very limited parallel corpus, thus having a unique starting point to Statistical Machine Translation (SMT). In this book, a...
The present article investigates the effectiveness of evolutionary computation algorithms in a specific optimisation task, namely morphological segmentation of words into subword segments, focusing on the definition of stems and endings. More precisely, particle swarm optimisation (PSO) is compared to an earlier study on the same task using ant col...
The present chapter reviews the development of a hybrid Machine Translation (MT) methodology, which is readily portable to new language pairs. This MT methodology (which has been developed within the PRESEMT project) is based on sampling mainly monolingual corpora, with very limited use of parallel corpora, thus supporting portability to new langua...
This communication focuses on comparing the template-matching technique to established probabilistic approaches - such as conditional random fields (CRF) - on a specific linguistic task, namely the phrasing of a sequence of words into phrases. This task represents a low-level parsing of the sequence into linguistically-motivated phrases. CRF repres...
The present article reports on efforts to improve the translation accuracy of a corpus-based hybrid MT system developed using the PRESEMT methodology. This methodology operates on a phrasal basis, where phrases are linguistically-motivated but are automatically determined via a dedicated module. Here, emphasis is placed on improving the structure...
The present article focuses on improving the performance of a hybrid Machine Translation (MT) system, namely PRESEMT. The PRESEMT methodology is readily portable to new language pairs, and allows the creation of MT systems with minimal reliance on expensive resources. PRESEMT is phrase-based and uses a small parallel corpus from which to extract st...
The present article investigates the fusion of different language models to improve translation accuracy. A hybrid MT system, recently developed in the European Commission-funded PRESEMT project, which combines example-based MT and Statistical MT principles, is used as a starting point. In this article, the syntactically-defined phrasal language models...
In this article, the application of Ant-Colony Optimization (ACO) to a morphological segmentation task is described, where the aim is to analyse a set of words into their constituent stem and ending. A number of criteria for determining the optimal segmentation are evaluated comparatively while at the same time investigating more comprehensively th...
This article presents a hierarchical clustering algorithm aimed at creating groups of stems with similar characteristics. The resulting groups (clusters) are expected to comprise stems belonging to the same inflectional paradigm (e.g. verbs in passive voice) in order to support the creation of a morphological lexicon. A new metric for calculating t...
The current paper evaluates the performance of the PRESEMT methodology, which facilitates the creation of machine translation (MT) systems for different language pairs. This methodology aims to develop a hybrid MT system that extracts translation information from large, predominantly monolingual corpora, using pattern recognition techniques. PRESEM...
The current paper presents a language-independent methodology, which facilitates the creation of machine translation (MT) systems for various language pairs. This methodology is implemented in the PRESEMT hybrid MT system. PRESEMT has the lowest possible requirements on specialised resources and tools, given that for many languages (especially less...
This document contains a brief presentation of the PRESEMT project, which aims at the development of a novel language-independent methodology for the creation of a flexible and adaptable MT system.
This article investigates the application of the SOLNN (Self-Organising Logic Neural Network) n-tuple-based network to character recognition and image segmentation clustering tasks, where the classes consist of a large number of distinct sub-classes. It is shown that the SOLNN clustering performance and node utilisation are both improved by virtue...
The main purpose of this paper is the classification of documents in terms of their content. Two systems are presented here that share a two-level architecture that includes 1) a word map created via unsupervised learning that functions as a document-representation module and 2) a supervised multilayer-perceptron-based classifier. Two approaches to...
In this article, aspects regarding the optimisation of machine translation systems via evolutionary computation algorithms are examined. The article focuses on pattern-recognition-based machine translation systems that use large monolingual corpora in the target language from which statistical information is extracted. The research reported here...
The present article introduces a phrase-alignment approach that involves the processing of a small bilingual corpus in order to extract suitable structural information. This is used in the PRESEMT project, whose aim is the quick development of phrase-based Machine Translation (MT) systems for new language pairs. A main bottleneck of such systems is...
In this paper, an automated method is proposed for optimising the real-valued parameters of a hybrid Machine Translation (MT) system that employs pattern recognition techniques together with extensive monolingual corpora in the target language from which statistical information is extracted. The absence of a parallel corpus prohibits the use of the...
The present paper investigates the application of the multi-layer perceptron (MLP) to the task of categorizing texts based on their authors’ style. This task is of particular importance for information retrieval applications involving very large document databases. The emphasis of this article is to determine the extent to which the MLP model can b...
This article presents a novel approach for morphological analysis based on the concept of genetic algorithms (GAs). Morphological analysis is of critical importance in data mining and information retrieval systems because it leads to a more homogeneous representation of words. The system presented here makes minimal use of language specific informa...
In the present study an ACO algorithm is adopted as a part of a document classification system that classifies documents written in Greek into thematic categories. The main purpose of the ACO module is to create a word map that will assist in the representation of the documents in the pattern space. The word map creation algorithm proposed involves...
Given a text or collection of texts involving unconstrained language, a basic task in a multitude of applications is the identification of stems and endings for each word form, which is termed morphological analysis. In this paper, the use of an ant colony optimization (ACO) metaheuristic is proposed for a linguistic task that involves the automate...
This article investigates the behaviour of a self-organizing logic neural network when it is tasked with clustering complex data spaces. The network is based on the discriminator-node structure and is trained using an unsupervised-learning adaptation rule. The network performance is evaluated by applying it to clustering tasks involving identifiabl...
A data analysis task is described, which is focused on the clustering of high-dimensional meteorological data collected long term (more than 43 years) at 128 weather stations in Greece. The proposed hybrid method combines (a) the assignment of the stations to two-dimensional grids of nodes via self-organizing maps (SOMs) of various sizes and (b) st...
The sectioned genetic algorithm (hereafter denoted as sectioned GA), which is presented in this paper, represents a modification of the standard GA and deals with large scale problems (i.e. problems involving pattern spaces with high dimensionalities). Instead of increasing the size of the population searching the pattern space when the problem dim...
This paper presents a hierarchical clustering algorithm aimed at creating groups of stems with similar characteristics. The resulting groups (clusters) are expected to comprise stems belonging to the same inflectional paradigm (e.g. verbs in passive voice) which will aid the creation of a morphological lexicon. A new metric for calculating the dist...
In this article, a system based on Hidden Markov Models (HMM) for document organization is presented. The purpose of the system is the classification of a document collection in terms of document content. The system possesses a two-level hybrid connectionist architecture that comprises (i) an automatically created word map using a HMM, which functi...
In this paper, the VEMUS platform is presented, as a novel approach for music tuition that focuses on beginner and intermediate students, typically aged from 9 to 15 years. This platform is characterized by an open, highly interactive and networked multilingual music tuition framework that covers a selection of popular wind instruments. The VEMUS e...
The present article investigates the effectiveness of neural network models when applied to the task of categorising texts in the Greek language based on the style of their authors. Multilayer perceptrons (MLP), radial basis function networks (RBF) and self-organizing maps (SOM) are comparatively studied on the task of classifying documents based o...
A genetic algorithm (GA) is presented in this article aiming at the automated extraction of morphological information from a corpus and ultimately at the creation of a computational model capable of distinguishing the stem of a word from its inflectional suffix. A multiobjective approach of a GA (MGA) is introduced, where different objective functi...
In this paper, a novel SOM-based system for document organization is presented. The purpose of the system is the classification of a document collection in terms of document content. The system possesses a two-level hybrid connectionist architecture that comprises (i) an automatically created word map using a SOM, which functions as a feature extra...
This article reports on experiments performed with a large corpus, aiming at separating texts according to the author style. The study initially focusses on whether the classification accuracy regarding the author identity may be improved, if the text topic is known in advance. The experimental results indicate that this kind of information contrib...
The innovative feature of the system presented in this paper is the use of pattern-matching techniques to retrieve translations resulting in a flexible, language-independent approach, which employs a limited amount of explicit a priori linguistic knowledge. Furthermore, while all state-of-the-art corpus-based approaches to Machine Translation (MT)...
In this paper an innovative approach is presented for MT, which is based on pattern matching techniques, relies on extensive target language monolingual corpora and employs a series of similarity weights between the source and the target language. Our system is based on the notion of 'patterns', which are viewed as 'models' of target language s...
This article describes a method for discriminating among registers of Modern Greek and among authors within a given register. Two issues have been investigated: (a) whether register discrimination can successfully exploit linguistic information reflecting the evolution of a language (such as diglossia features of the Modern Greek language) and (b)...
This article describes a method for discriminating among authors within a given register of Modern Greek. The focus here is to determine to what extent the stylistic differences among authors can be detected with a high degree of accuracy for a set of texts belonging to a well‐defined register. To that end, the chosen register is characteriz...
In this article, two clustering techniques based on neural networks are introduced. The two neural network models are the Harmony theory network (HTN) and the self-organizing logic neural network (SOLNN), both of which are characterized by parallel processing, a distributed architecture, and a large number of nodes. After describing their clusterin...
We report on the application of the Self-Organizing Map (SOM) classification method to the task of categorizing texts according to their register and the style of their author. The SOM has been selected as its performance in various data-mining applications has been found to be highly successful. Here, the method is evaluated against the task of cl...
A data mining application is described, which is focused on the analysis of high-dimensional meteorological data collected long-term (over 43 years) at 130 weather stations in Greece. A hybrid clustering method (combining artificial neural networks and statistical-based techniques) has been employed for grouping the data. The proposed method has be...
This article investigates (a) whether register discrimination can successfully exploit linguistic information reflecting the evolution of a language (such as the diglossia phenomenon of the Modern Greek language) and (b) what kind of linguistic information and which statistical techniques may be employed to distinguish among individual styles within...
The scanning n-tuple technique (as introduced by Lucas and Amiri, 1996) is studied in pattern recognition tasks, with emphasis placed on methods that improve its recognition performance. We remove potential edge effect problems and optimize the parameters of the scanning n-tuple method with respect to memory requirements, processing speed and recog...
This presentation focuses on the IMUTUS project, which concerns the creation of an innovative method for training users on traditional musical instruments with no MIDI (Musical Instrument Digital Interface) output. The entities collaborating in IMUTUS are ILSP (coordinator), EXODUS, SYSTEMA, DSI, SMF, GRAME, and KTH. The IMUTUS effectiveness is enh...
In this article, the self-organizing map (SOM) is employed to analyze data describing the 24-hour blood pressure and heart-rate variability of human subjects. The number of observations varies widely over different subjects, and therefore a direct statistical analysis of the data is not feasible without extensive pre-processing and interpolation fo...
In this paper, a system is presented that performs an automated morphological categorization of Greek words extracted from a corpus. This system processes the words morphologically via the repetitive application of a masking-and-matching technique. It is found that the introduction of a priori information regarding the grammar of the Greek language...
In the present paper, the Self-Organising Map (SOM) is applied to the problem of categorising a corpus of Modern Greek texts according to the style of their authors. A number of variants of the SOM model are used in a series of experiments, in order to compare and contrast their behaviour in the specific task. The experimental results indicate that...
This article investigates the application of the SOLNN (Self-Organising Logic Neural Network) n-tuple-based network to character recognition and image segmentation clustering tasks, where the classes consist of a large number of distinct sub-classes. It is shown that the SOLNN clustering performance and node utilisation are both improved by virtue...
In this article, the application of the scanning n-tuple technique to classification tasks is studied. The performance of this technique is examined in a handwritten character recognition task where the accuracy is initially low. This task is employed as a case study for designing a general-purpose algorithm that improves the scanning n-tuple perfo...
This article studies the implementation of a handwritten character recognition task using neural networks. Two logic neural network models are employed to classify the Essex dataset, which comprises real-world hand-written characters. To reduce the underlying dataset variation, several pre-processing approaches are investigated. This allows the com...
In this article, two neural network clustering techniques are compared to classical statistical techniques. This is achieved by examining the results obtained when applying each technique to a real-world phoneme recognition task. An analysis of the phoneme datasets exposes the clusters which exist in the pattern space. The study of the similarity o...
This article focuses on the systematic design of a segment database which has been used to support a time-domain speech synthesis system for the Greek language. Thus, a methodology is presented for the generation of a corpus containing all possible instances of the segments for the specific language. Issues such as the phonetic coverage, the senten...
In this article, an image segmentation method based on the SOLNN self-organising logic neural network is studied. The input image is initially processed using the TCS texture-highlighting technique and is then presented to the SOLNN network which segments it. The SOLNN is characterised by a variable sensitivity which enables it to be fine-tuned to...
This article discusses the implementation of a hand-written character recognition task using neural networks. Two logic neural networks, the WISARD (I. Aleksander and H. Morton, 1990) and the SOLNN (G. Tambouratzis and T.J. Stonham, 1993), are compared on the basis of their classification accuracy. The results obtained are compared to those of other...
In this article, a self-organising logic neural network is studied. This network successfully clusters input patterns into classes characterised by a high similarity, while assigning these classes to the network nodes so that relationships existing in the pattern space are replicated on the network structure. The network performance is optimised by...
A self-organising discriminator-based logic neural network is compared to the similarly-structured supervised WISARD neural network on the basis of their performance in a pattern recognition task. The self-organising system is shown to possess a superior performance in learning environments where the training patterns have a high degree of variabil...
This article investigates the behaviour of a self-organizing logic neural network when it is tasked with clustering complex data spaces. The network is based on the discriminator-node structure and is trained using an unsupervised-learning adaptation rule. The network performance is evaluated by applying it to clustering tasks involving identifiabl...
An unsupervised learning algorithm which enables a logical neural network to separate different classes of binary images while at the same time creating a topology-preserving mapping of the input space is described. Results concerning its successful application to character separation and recognition are presented and the quality of the mappings ge...
A logic artificial neural network paradigm is used to cluster texture spectra in feature space to achieve image segmentation. The features are grouped such that they represent regions of textural homogeneity on the image. These are extracted from small local areas on the image. The strategy results in a feature spectrum transformation of the image....
An unsupervised-learning algorithm which endows a logical neural network model with the ability to separate patterns belonging to different pattern classes while at the same time creating a topology-preserving mapping of the input space is examined in this paper. Emphasis is placed on the storage efficiency of the algorithm and its ability to maxim...
In this paper, we present a method for image segmentation via texture recognition and feature space clustering, using a logical neural network paradigm. The texture analysis strategy results in a transformation of the image into a Texture Co-occurrence Spectrum (TCS). The logical neural network clusters the TCS in feature space by operating in an u...
The topology-preservation characteristics of a self-organising system are studied in this paper. The system consists of a logical neural network with a structure based on the discriminator network and a method of training that presents certain similarities to Kohonen’s self-organising maps. In particular, the optimal neighbourhood size for the most...
The suitability of the ASP computer for image processing tasks has been evaluated by implementing the Abingdon Cross benchmark. The results obtained indicate that the ASP is among the fastest and most cost effective image processors available.
An online, unsupervised training algorithm is presented, which allows a logical neural network already trained to identify classes of objects to adapt to changes in the environment. This algorithm enables the system to operate continuously, without danger of overgeneralisation and displays useful noise-reduction properties. Results indicating its c...
The performance of the ASP computer on vision tasks has been evaluated by applying the Abingdon Cross benchmark using a number of different algorithms. In this paper, these algorithms are compared and contrasted on the basis of their performance.
In this article the principles of the METIS Machine Translation system are presented. METIS employs an extensive tagged and lemmatised corpus of texts in the target language, coupled with bilingual lexica covering the desired pairs of source-target languages. To generate a high-quality translation, the METIS system is provided with statistica...
In this article, a system is proposed for the automatic style categorisation of text corpora in the Greek language. This categorisation is based to a large extent on the type of language used in the text, for example whether the language used is representative of formal Greek or not. To arrive at this categorisation, the highly inflectional nature...
The present article describes AMP, a system for automated morphological processing of Ancient Greek word forms. It is considered a hybrid approach, combining pattern recognition techniques with limited linguistic knowledge to achieve accurate segmentation into stem and ending, and is expected to substantially contribute to the creation and/or enri...