Table 3 - uploaded by Antal Van den Bosch
Average memory storage (in kilobytes) and processing time per test instance (in seconds) obtained with IB1-IG and IGTREE, applied to the GS dataset.

Source publication
Article
Full-text available
We propose a memory-based (similarity-based) approach to learning the mapping of words into phonetic representations for use in speech synthesis systems. The main advantage of memory-based data mining techniques is their high accuracy; the main disadvantage is processing speed. We introduce a hybrid between memory-based and decision-tree-based lear...
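As a rough sketch of what such a memory-based (IB1-IG-style) word-to-phoneme mapper can look like, the Python fragment below classifies the phoneme of each letter from a fixed-width letter window using information-gain-weighted nearest-neighbour matching. The window width, the toy aligned data and all function names are illustrative assumptions, not the implementation evaluated in the table above.

```python
import math
from collections import Counter, defaultdict

PAD = "_"  # padding symbol outside the word

def windows(word, phones, width=3):
    """Turn an aligned (word, phoneme string) pair into fixed-width instances:
    the focus letter plus `width` letters of left and right context."""
    padded = PAD * width + word + PAD * width
    for i, target in enumerate(phones):
        yield tuple(padded[i:i + 2 * width + 1]), target

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(instances):
    """One information-gain weight per feature position, as in IB1-IG."""
    labels = [y for _, y in instances]
    base = entropy(labels)
    weights = []
    for f in range(len(instances[0][0])):
        by_value = defaultdict(list)
        for x, y in instances:
            by_value[x[f]].append(y)
        rest = sum(len(ys) / len(instances) * entropy(ys) for ys in by_value.values())
        weights.append(base - rest)
    return weights

def classify(query, instances, weights):
    """1-nearest neighbour with weighted feature overlap."""
    def sim(x):
        return sum(w for w, a, b in zip(weights, x, query) if a == b)
    _, label = max(instances, key=lambda inst: sim(inst[0]))
    return label

# Toy aligned data: one phoneme (or '-' for a phonemic null) per letter.
pairs = [("boek", "bu-k"), ("beek", "be-k"), ("bos", "bOs")]
train = [inst for w, p in pairs for inst in windows(w, p)]
weights = information_gain(train)

word = "boer"
padded = PAD * 3 + word + PAD * 3
query = tuple(padded[1:8])        # window centred on the letter "o"
print(classify(query, train, weights))
```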

Similar publications

Conference Paper
Full-text available
In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, designed to be compatible with the Montreal Forced Aligner (MFA), a forced-alignment tool. In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcr...
Book
Full-text available
A novel application of convolutional neural networks to phone recognition is presented in this paper. Both the TIMIT and NTIMIT speech corpora have been employed. The phonetic transcriptions of these corpora have been used to label spectrogram segments for training the convolutional neural network. A sliding window extracted fixed-size images fr...
Chapter
Full-text available
A novel application of convolutional neural networks to phone recognition is presented in this paper. Both the TIMIT and NTIMIT speech corpora have been employed. The phonetic transcriptions of these corpora have been used to label spectrogram segments for training the convolutional neural network. A sliding window extracted fixed-size images from...
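As a hedged illustration of the sliding-window labelling that both versions of this work describe, the sketch below cuts fixed-size windows out of a spectrogram array and labels each window with the phone covering its centre frame. The window size, hop, array shapes and function name are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def windowed_segments(spectrogram, phone_intervals, window=11, hop=1):
    """Cut fixed-size images out of a spectrogram with a sliding window and
    label each one with the phone covering the window's centre frame.
    `spectrogram` is (frames, bins); `phone_intervals` is a list of
    (start_frame, end_frame, phone) triples from a phonetic transcription."""
    segments, labels = [], []
    half = window // 2
    for centre in range(half, spectrogram.shape[0] - half, hop):
        image = spectrogram[centre - half:centre + half + 1]
        phone = next((p for s, e, p in phone_intervals if s <= centre < e), None)
        if phone is not None:
            segments.append(image)
            labels.append(phone)
    return np.stack(segments), labels

# Toy example: 100 frames, 40 filterbank bins, two annotated phones.
spec = np.random.rand(100, 40)
intervals = [(0, 50, "s"), (50, 100, "a")]
X, y = windowed_segments(spec, intervals)
print(X.shape, y[:3])
```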
Article
Full-text available
This paper introduces Emilia, a speech corpus created to build a female voice in Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit selection text-to-speech system, which employs diphones as units of synthesis. The key requirements and design criteria for Emilia were: to synthesize any text in Spanish into high-qual...
Conference Paper
Full-text available
The mapping of a raw phonetic transcription to an orthographic word sequence is carried out in three steps: First, a syllable segmentation of the transcription is bootstrapped, based on unsupervised subtractive learning. Then, the syllables are grouped into word entities guided by non-linguistic distributional properties. Finally, the phonetic word...

Citations

... vowels followed by final /r/ or /l/. For the segment boundaries and vowel classes we relied on the corpus segmentations and annotations that fitted our research in terms of a broad transcription: the phonemic representation of the corpus is based on the orthographic transcriptions of the corpus and was generated fully automatically by TreeTalk [4]. The symbols were derived from SAMPA in such a way that the produced sounds were related to the phonemes of Dutch [5], thus giving the same symbol to all variants of a phoneme: "E+" to all /EI/, "A+" to /Au/, "Y+" to /2y/, "e" to /e:/, and "o" to /o:/. ...
... MBL has been applied successfully to word phonemization in the past (Stanfill and Waltz, 1986; Van den Bosch and Daelemans, 1993; Van den Bosch et al., 1996; Daelemans and Van den Bosch, 2001), also for Dutch (Busser, 1998), and is known to perform at the state-of-the-art levels needed in speech synthesis technology. It has also been shown to outperform artificial neural networks (Stanfill and Waltz, 1986) and decision-tree methods (Daelemans et al., 1999). ...
... The actual choice of which letter of a digraph becomes aligned to the null phoneme is a random but consistent one made by the expectation-maximization (EM) algorithm (Dempster et al., 1977), which is employed to create an optimized letter-phoneme probability matrix automatically (for details, cf. Daelemans & Van den Bosch, 2001). Using the same mechanism, EM is used to insert graphemic nulls in those (relatively less frequent) cases in which the written form has fewer letters than the phonemic transcription has phonemes; e.g., <taxi> – [tAksi]. ...
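The excerpt above describes EM-driven letter-phoneme alignment with phonemic and graphemic nulls. Below is a minimal, assumption-laden sketch of that idea: a Viterbi-style approximation that alternates between aligning letter and phone strings (allowing nulls on either side) and re-estimating a letter-phoneme probability matrix. The null_cost penalty, the co-occurrence initialisation and the toy data are invented for illustration; this is not the procedure of Daelemans & Van den Bosch (2001).

```python
import math
from collections import defaultdict

NULL = "-"  # null symbol for unaligned positions

def initial_probs(data):
    """Crude initialisation: co-occurrence of a letter with every phone in the same word."""
    counts, totals = defaultdict(float), defaultdict(float)
    for letters, phones in data:
        for l in letters:
            for p in phones:
                counts[(l, p)] += 1.0
                totals[l] += 1.0
    return {(l, p): c / totals[l] for (l, p), c in counts.items()}

def align(letters, phones, prob, null_cost=2.0):
    """Best monotone alignment between a letter string and a phone string,
    allowing letters to map to a phonemic null and phones to a graphemic null."""
    n, m = len(letters), len(phones)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:  # letter aligned to phone
                c = cost[i][j] - math.log(prob.get((letters[i], phones[j]), 1e-6))
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1], back[i + 1][j + 1] = c, (i, j, "sub")
            if i < n:            # letter aligned to a phonemic null
                c = cost[i][j] + null_cost
                if c < cost[i + 1][j]:
                    cost[i + 1][j], back[i + 1][j] = c, (i, j, "del")
            if j < m:            # graphemic null aligned to a phone
                c = cost[i][j] + null_cost
                if c < cost[i][j + 1]:
                    cost[i][j + 1], back[i][j + 1] = c, (i, j, "ins")
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, op = back[i][j]
        if op == "sub":
            pairs.append((letters[pi], phones[pj]))
        elif op == "del":
            pairs.append((letters[pi], NULL))
        else:
            pairs.append((NULL, phones[pj]))
        i, j = pi, pj
    return pairs[::-1]

def viterbi_em(data, iterations=5):
    """Re-estimate the letter-phoneme probability matrix from the best alignments."""
    prob = initial_probs(data)
    for _ in range(iterations):
        counts, totals = defaultdict(float), defaultdict(float)
        for letters, phones in data:
            for l, p in align(letters, phones, prob):
                if l != NULL and p != NULL:
                    counts[(l, p)] += 1.0
                    totals[l] += 1.0
        prob = {(l, p): c / totals[l] for (l, p), c in counts.items()}
    return prob

data = [(list("taxi"), list("tAksi")), (list("boek"), list("buk")), (list("kast"), list("kAst"))]
prob = viterbi_em(data)
for letters, phones in data:
    print(align(letters, phones, prob))
```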
Article
Full-text available
The Dutch spelling system, like other European spelling systems, represents a certain balance between preserving the spelling of morphemes (the morphological principle) and obeying letter-to-sound regularities (the phonological principle). We present experimental results with artificial learners that show a competition effect between the two principles: adhering more to one principle leads to more violations of the other. The artificial learners, memory-based learning algorithms, are trained (1) to convert written words to their phonemic counterparts and (2) to analyze written words into their morphological composition, based on data extracted from the CELEX lexical database. As an exception to the competition effect, we show that introducing the schwa as a letter in the spelling system causes both morphology and phonology to be learnt better by the artificial learners. In general, we argue that artificial learning studies are a tool for obtaining objective measurements of a spelling system, which may be of help in spelling reform processes.
... Nearest neighbor synthesis simply chooses the closest example. Locally weighted regression interpolates among the nearby points, weighted by their distance to the query point, and has proven very effective in such problem domains as motor learning [Atkeson et al. 1997b] and speech synthesis [Daelemans and van den Bosch 2001]. Figure 6 shows that our method creates more accurate results than either nearest neighbor synthesis or locally weighted regression. ...
Article
Full-text available
This paper introduces an approach to performance animation that employs video cameras and a small set of retro-reflective markers to create a low-cost, easy-to-use system that might someday be practical for home use. The low-dimensional control signals from the user's performance are supplemented by a database of pre-recorded human motion. At run time, the system automatically learns a series of local models from a set of motion capture examples that are a close match to the marker locations captured by the cameras. These local models are then used to reconstruct the motion of the user as a full-body animation. We demonstrate the power of this approach with real-time control of six different behaviors using two video cameras and a small set of retro-reflective markers. We compare the resulting animation to animation from commercial motion capture equipment with a full set of markers.
... Nearest neighbor synthesis simply chooses the closest example. Locally weighted regression interpolates the nearby points by their distance to the query point and has proven very effective in such problem domains as motor learning [6] and speech synthesis [28]. Figure 3.6 shows that our method creates more accurate results than either nearest neighbor synthesis or locally weighted regression. ...
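For readers unfamiliar with the contrast drawn in these excerpts, here is a small sketch of nearest-neighbour prediction versus locally weighted (linear) regression on synthetic one-dimensional data; the Gaussian kernel, the bandwidth value and the data are illustrative assumptions only.

```python
import numpy as np

def nearest_neighbor(query, X, Y):
    """Return the output value of the single closest training example."""
    i = np.argmin(np.linalg.norm(X - query, axis=1))
    return Y[i]

def locally_weighted_regression(query, X, Y, bandwidth=1.0):
    """Fit a linear model around the query point, weighting each training
    example by a Gaussian kernel on its distance to the query."""
    d = np.linalg.norm(X - query, axis=1)
    w = np.exp(-(d ** 2) / (2 * bandwidth ** 2))
    Xb = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
    sw = np.sqrt(w)                              # sqrt so the squared loss is w-weighted
    beta, *_ = np.linalg.lstsq(sw[:, None] * Xb, sw * Y, rcond=None)
    return np.append(query, 1.0) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
Y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
q = np.array([0.5])
print(nearest_neighbor(q, X, Y), locally_weighted_regression(q, X, Y, bandwidth=0.5))
```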
... Based on the principle of combining two or more (correlated) variables into a single factor, principal component analysis breaks down the information in the bandfilters into its most basic variations. Physical or psychophysical ...
... For our vowel study, we wanted to consider as many vowel segments from spontaneous speech as possible. For the segment boundaries and vowel classes we relied on the existing segmentations and annotations of the CGN that fitted the research in terms of a broad transcription: a phonemic representation that was based on the orthographic transcriptions of the corpus, and had been generated fully automatically by TreeTalk (Daelemans & van den Bosch, 2001 [24]). The symbols used were derived from SAMPA in such a way that the produced sounds were related to the phonemes of Dutch (Gillis, 2001 [41]), hence giving the same symbol to all variants of a phoneme: "E+" to all /Ei/, "A+" to /Au/, "Y+" to /oey/, "e" to /e:/, and "o" to /o:/. ...
Article
Software systems convert between graphemes and phonemes using lexicon-based, rule-based or data-driven techniques. SHOTGUN combines these techniques in a hybrid system which converts between graphemes and phonemes bi-directionally, adds linguistic and educational information about the relationships between graphemes and phonemes and provides estimates about the likelihood that the generated output is correct. We describe the components from which SHOTGUN is built and determine its accuracy by running tests on two data sources, the BasisSpellingBank and CELEX, comparing the results to Nunn’s (1998) rule-based conversion system. SHOTGUN converts phonemes to graphemes and vice versa with precision of 81% and 86% when tested on the BasisSpellingBank, and 80% and 81% when tested on CELEX. SHOTGUN proves to be a powerful new conversion tool.
Article
The field of Natural Language Processing is mainly covered by two families of approaches. The first is characterized by linguistic knowledge expressed through rules (production rules for syntax, inference rules for semantics, etc.) operating on symbolic representations. The second assumes a probabilistic model underlying the data, the parameters of which are induced from corpora of annotated linguistic data. These two families of methods, although efficient for a number of applications, have serious drawbacks. On the one hand, rule-based methods face the difficulty and the cost of constructing high-quality knowledge bases: experts are rare, and the knowledge of a domain $X$ may not simply adapt to another domain $Y$. On the other hand, probabilistic methods do not naturally handle strongly structured objects, do not support the inclusion of explicit linguistic knowledge, and, more importantly, rely heavily on an often subjective prior choice of a certain model.

Our work focuses on analogy-based methods whose goal is to tackle all or part of these limitations. In the framework of Natural Language Learning, alternative inferential models in which no abstraction is performed have been proposed: linguistic knowledge is implicitly contained within the data. In Machine Learning, methods built on such principles are known as "lazy learning". They usually rely on the following learning bias: if an input object $Y$ is "close" to another object $X$, then its output $f(Y)$ is a good candidate for $f(X)$. Although this hypothesis is relevant for most Machine Learning tasks, the structured nature and the paradigmatic organization of linguistic data suggest a slightly different approach. To take this specificity into account, we study a model relying on the notion of "analogical proportion". Within this model, inferring $f(T)$ is performed by finding an analogical proportion with three known objects $X$, $Y$ and $Z$. The "analogical hypothesis" is formalized as: if $X : Y :: Z : T$, then $f(X) : f(Y) :: f(Z) : f(T)$. Inferring $f(T)$ from the known $f(X)$, $f(Y)$, $f(Z)$ is achieved by solving the "analogical equation" (with unknown $U$): $f(X) : f(Y) :: f(Z) : U$.

In the first part of this work, we present a study of this model of analogical proportion within a more general framework termed "analogical learning". This framework is instantiated in several contexts: in cognitive science, it is related to analogical reasoning, an essential faculty underlying a number of cognitive processes; in traditional linguistics, it gives support to a number of phenomena such as analogical creation, opposition and commutation; in machine learning, it corresponds to "lazy learning" methods.

The second part of our work proposes a unified algebraic framework that defines the concept of analogical proportion. Starting from a model of analogical proportion operating on strings (elements of a free monoid), we present an extension to the more general case of semigroups. This generalization directly yields a valid definition for all sets deriving from the structure of a semigroup, which allows us to handle analogical proportions of common representations of linguistic entities such as strings, trees, feature structures and finite sets. We describe algorithms adapted to processing analogical proportions of such structured objects, and we propose some directions to enrich the model, allowing its use in more complex cases.
The inferential model we studied, originally designed for Natural Language Processing purposes, can be explicitly interpreted as a Machine Learning method. This formalization makes it possible to highlight several of its notable features. One of these characteristics lies in its capacity to handle structured objects, both as input and as output, whereas traditional classification tasks generally assume an output space made up of a finite set of classes. We then introduce the notion of analogical extension in order to express the learning bias of the model. Lastly, we conclude by presenting experimental results obtained in several Natural Language Processing tasks: pronunciation, inflectional analysis and derivational analysis.
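To make the analogical equation concrete for the simplest case (flat strings rather than the general semigroup setting of this thesis), here is a naive affix-based solver for a : b :: c : ?. The shared-prefix/shared-suffix heuristic and the function names are assumptions for illustration and do not reflect the algorithms actually proposed in the work above.

```python
def common_prefix(x, y):
    i = 0
    while i < min(len(x), len(y)) and x[i] == y[i]:
        i += 1
    return x[:i]

def common_suffix(x, y):
    i = 0
    while i < min(len(x), len(y)) and x[-1 - i] == y[-1 - i]:
        i += 1
    return x[len(x) - i:] if i else ""

def solve_analogy(a, b, c):
    """Solve a : b :: c : ? for plain strings with a naive affix heuristic.
    Returns None when neither the prefix nor the suffix pattern applies."""
    # Case 1: a and b share a prefix, so the change is a suffix (reader : reading).
    p = len(common_prefix(a, b))
    if c.endswith(a[p:]):
        return c[:len(c) - len(a[p:])] + b[p:]
    # Case 2: a and b share a suffix, so the change is a prefix (undo : redo).
    s = len(common_suffix(a, b))
    if c.startswith(a[:len(a) - s]):
        return b[:len(b) - s] + c[len(a) - s:]
    return None

print(solve_analogy("reader", "reading", "writer"))  # -> writing
print(solve_analogy("undo", "redo", "untie"))        # -> retie
```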
Article
We present an efficient way to learn letter-to-phoneme mapping rules for Polish automatically, using a “dynamic context shortening” method. Attempts at reconstructing transcription rules date back to 1987, when Sejnowski and Rosenberg applied a self-organizing neural network to “grapheme-to-phoneme” mapping. In later approaches, decision-tree-based methods were applied: the trees for each letter were built starting from empty contexts, and the left and right contexts were then alternately widened until the transcription ambiguity of the training data disappeared. Our approach starts instead from a symmetrical context wide enough to ensure unambiguous transcription in every context; both contexts, i.e., left and right, are then shortened alternately until ambiguity appears. In all cases where ambiguous transcriptions occurred, the previous context forms were restored and not shortened further. Therefore, at every step the cause of ambiguity, namely a too-short left or right context, was clearly known and removed. On the basis of the results obtained, transcription tables for each letter were constructed. A 350,000-character corpus of Polish text transcribed into phonemic form was prepared; training samples of different lengths were taken from it at random and analysed, and the remaining parts were used for verification. It turned out that a 30,000-character training sample was enough to learn Polish grapheme-to-phoneme minimum-context mapping. We also describe three original generalization methods, which we call rules coring, indeterminacies absorption and a guessing method; the last was designed to be applied in the case of limited acceptance of a context. Applied together, the methods allow the removal of about 70% of errors.
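A minimal sketch of the dynamic context shortening idea described above, assuming letter-aligned training data and a single global context width rather than per-letter tables: starting from a wide symmetric context, the left and right widths are shortened alternately and restored as soon as ambiguity appears. The data, symbols and widths are toy assumptions, not the authors' implementation.

```python
MAX_CONTEXT = 3  # symmetric starting width; wide enough for the toy data below

def contexts(word, phones, left, right):
    """Yield ((left context, letter, right context), phoneme) pairs for one
    aligned word/transcription pair, padding with '_' outside the word."""
    padded = "_" * MAX_CONTEXT + word + "_" * MAX_CONTEXT
    for i, target in enumerate(phones):
        j = i + MAX_CONTEXT
        key = (padded[j - left:j], padded[j], padded[j + 1:j + 1 + right])
        yield key, target

def is_ambiguous(data, left, right):
    """True if two identical contexts of this size map to different phonemes."""
    table = {}
    for word, phones in data:
        for key, target in contexts(word, phones, left, right):
            if table.setdefault(key, target) != target:
                return True
    return False

def shorten(data):
    """Alternately shorten the left and right context until ambiguity appears,
    keeping the last unambiguous width on each side."""
    left = right = MAX_CONTEXT
    blocked = {"left": False, "right": False}
    while not all(blocked.values()):
        for side in ("left", "right"):
            if blocked[side]:
                continue
            l, r = (left - 1, right) if side == "left" else (left, right - 1)
            if l < 0 or r < 0 or is_ambiguous(data, l, r):
                blocked[side] = True      # restore: keep the previous width
            else:
                left, right = l, r
    return left, right

# Toy aligned data: letters and phonemes one-to-one ('-' marks a phonemic null).
data = [("bok", "bOk"), ("boek", "bu-k"), ("beek", "be-k")]
print(shorten(data))
```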
Article
In this thesis, a pilot study for the development of a corpus of Dutch aphasic speech (CoDAS) is presented. Given the lack of resources of this kind, not only for Dutch but also for other languages, CoDAS can set standards and contribute to future research in this area. A corpus of Dutch aphasic speech should fulfill at least three requirements. First, it should encode a plausible sample of contemporary Dutch as spoken by aphasic patients; that is, it should include speech representing different types of aphasia as well as various communication settings. Second, the speech fragments should be documented with relevant metadata, including information about the speaker and the aphasia. Third, the corpus should be enriched with various kinds of linguistic information. Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and annotation protocols of existing corpora such as SDC or CHILDES; however, they have served as a starting point. In our pilot study, we have established the basic requirements with respect to text types, metadata and annotation levels that CoDAS should fulfill. In this respect, we have investigated whether and how the annotation and transcription procedures and protocols used for the SDC should be adapted in order to annotate and transcribe aphasic speech properly. In particular, for the orthographic transcription and the part-of-speech tagging, suggestions for improving the existing protocols are given. On the other hand, the phonetic transcription procedure assumed within the SDC can be adopted without major modifications.