Table 3 - uploaded by Antal Van den Bosch
Average memory storage (in kilobytes) and processing time per test instance (in seconds) obtained with IB1-IG and IGTREE, applied to the GS dataset.

Source publication
Article
Full-text available
We propose a memory-based (similarity-based) approach to learning the mapping of words into phonetic representations for use in speech synthesis systems. The main advantage of memory-based data mining techniques is their high accuracy; the main disadvantage is processing speed. We introduce a hybrid between memory-based and decision-tree-based lear...
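As a rough sketch of what such a memory-based (IB1-IG-style) word-to-phoneme mapper can look like, the Python fragment below classifies the phoneme of each letter from a fixed-width letter window using information-gain-weighted nearest-neighbour matching. The window width, the toy aligned data and all function names are illustrative assumptions, not the implementation evaluated in the table above.

```python
import math
from collections import Counter, defaultdict

PAD = "_"  # padding symbol outside the word

def windows(word, phones, width=3):
    """Turn an aligned (word, phoneme string) pair into fixed-width instances:
    the focus letter plus `width` letters of left and right context."""
    padded = PAD * width + word + PAD * width
    for i, target in enumerate(phones):
        yield tuple(padded[i:i + 2 * width + 1]), target

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(instances):
    """One information-gain weight per feature position, as in IB1-IG."""
    labels = [y for _, y in instances]
    base = entropy(labels)
    weights = []
    for f in range(len(instances[0][0])):
        by_value = defaultdict(list)
        for x, y in instances:
            by_value[x[f]].append(y)
        rest = sum(len(ys) / len(instances) * entropy(ys) for ys in by_value.values())
        weights.append(base - rest)
    return weights

def classify(query, instances, weights):
    """1-nearest neighbour with weighted feature overlap."""
    def sim(x):
        return sum(w for w, a, b in zip(weights, x, query) if a == b)
    _, label = max(instances, key=lambda inst: sim(inst[0]))
    return label

# Toy aligned data: one phoneme (or '-' for a phonemic null) per letter.
pairs = [("boek", "bu-k"), ("beek", "be-k"), ("bos", "bOs")]
train = [inst for w, p in pairs for inst in windows(w, p)]
weights = information_gain(train)

word = "boer"
padded = PAD * 3 + word + PAD * 3
query = tuple(padded[1:8])        # window centred on the letter "o"
print(classify(query, train, weights))
```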

Similar publications

Conference Paper
Full-text available
In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, designed to be compatible with the Montreal Forced Aligner (MFA), a forced-alignment tool. In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcr...
Book
Full-text available
A novel application of convolutional neural networks to phone recognition is presented in this paper. Both the TIMIT and NTIMIT speech corpora have been employed. The phonetic transcriptions of these corpora have been used to label spectrogram segments for training the convolutional neural network. A sliding window extracted fixed-size images fr...
Chapter
Full-text available
A novel application of convolutional neural networks to phone recognition is presented in this paper. Both the TIMIT and NTIMIT speech corpora have been employed. The phonetic transcriptions of these corpora have been used to label spectrogram segments for training the convolutional neural network. A sliding window extracted fixed-size images from...
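As a hedged illustration of the sliding-window labelling that both versions of this work describe, the sketch below cuts fixed-size windows out of a spectrogram array and labels each window with the phone covering its centre frame. The window size, hop, array shapes and function name are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def windowed_segments(spectrogram, phone_intervals, window=11, hop=1):
    """Cut fixed-size images out of a spectrogram with a sliding window and
    label each one with the phone covering the window's centre frame.
    `spectrogram` is (frames, bins); `phone_intervals` is a list of
    (start_frame, end_frame, phone) triples from a phonetic transcription."""
    segments, labels = [], []
    half = window // 2
    for centre in range(half, spectrogram.shape[0] - half, hop):
        image = spectrogram[centre - half:centre + half + 1]
        phone = next((p for s, e, p in phone_intervals if s <= centre < e), None)
        if phone is not None:
            segments.append(image)
            labels.append(phone)
    return np.stack(segments), labels

# Toy example: 100 frames, 40 filterbank bins, two annotated phones.
spec = np.random.rand(100, 40)
intervals = [(0, 50, "s"), (50, 100, "a")]
X, y = windowed_segments(spec, intervals)
print(X.shape, y[:3])
```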
Article
Full-text available
This paper introduces Emilia, a speech corpus created to build a female voice in Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit selection text-to-speech system, which employs diphones as units of synthesis. The key requirements and design criteria for Emilia were: to synthesize any text in Spanish into high-qual...
Conference Paper
Full-text available
The mapping of a raw phonetic transcription to an orthographic word sequence is carried out in three steps: First, a syllable segmentation of the transcription is bootstrapped, based on unsupervised subtractive learning. Then, the syllables are grouped into word entities guided by non-linguistic distributional properties. Finally, the phonetic word...

Citations

... vowels followed by final /r/ or /l/. For the segment boundaries and vowel classes we relied on the corpus segmentations and annotations that fitted our research in terms of a broad transcription: the phonemic representation of the corpus is based on the orthographic transcriptions of the corpus and was generated fully automatically by TreeTalk [4]. The symbols were derived from SAMPA in such a way that the produced sounds were related to the phonemes of Dutch [5], thus giving the same symbol to all variants of a phoneme: "E+" to all /EI/, "A+" to /Au/, "Y+" to /2y/, "e" to /e:/, and "o" to /o:/. ...
... MBL has been applied successfully to word phonemization in the past (Stanfill and Waltz, 1986; Van den Bosch and Daelemans, 1993; Van den Bosch et al., 1996; Daelemans and Van den Bosch, 2001), also for Dutch (Busser, 1998), and is known to perform at the state-of-the-art levels needed in speech synthesis technology. It has also been shown to outperform artificial neural networks (Stanfill and Waltz, 1986) and decision-tree methods (Daelemans et al., 1999). ...
... The actual choice of which letter of a digraph becomes aligned to the null phoneme is a random but consistent one made by the expectation-maximization (EM) algorithm (Dempster et al., 1977), which is employed to create an optimized letter-phoneme probability matrix automatically (for details, cf. Daelemans & Van den Bosch, 2001). Using the same mechanism, EM is used to insert graphemic nulls in those (relatively less frequent) cases in which the written form has fewer letters than the phonemic transcription has phonemes; e.g., <taxi> – [tAksi]. ...
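The excerpt above describes EM-driven letter-phoneme alignment with phonemic and graphemic nulls. Below is a minimal, assumption-laden sketch of that idea: a Viterbi-style approximation that alternates between aligning letter and phone strings (allowing nulls on either side) and re-estimating a letter-phoneme probability matrix. The null_cost penalty, the co-occurrence initialisation and the toy data are invented for illustration; this is not the procedure of Daelemans & Van den Bosch (2001).

```python
import math
from collections import defaultdict

NULL = "-"  # null symbol for unaligned positions

def initial_probs(data):
    """Crude initialisation: co-occurrence of a letter with every phone in the same word."""
    counts, totals = defaultdict(float), defaultdict(float)
    for letters, phones in data:
        for l in letters:
            for p in phones:
                counts[(l, p)] += 1.0
                totals[l] += 1.0
    return {(l, p): c / totals[l] for (l, p), c in counts.items()}

def align(letters, phones, prob, null_cost=2.0):
    """Best monotone alignment between a letter string and a phone string,
    allowing letters to map to a phonemic null and phones to a graphemic null."""
    n, m = len(letters), len(phones)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:  # letter aligned to phone
                c = cost[i][j] - math.log(prob.get((letters[i], phones[j]), 1e-6))
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1], back[i + 1][j + 1] = c, (i, j, "sub")
            if i < n:            # letter aligned to a phonemic null
                c = cost[i][j] + null_cost
                if c < cost[i + 1][j]:
                    cost[i + 1][j], back[i + 1][j] = c, (i, j, "del")
            if j < m:            # graphemic null aligned to a phone
                c = cost[i][j] + null_cost
                if c < cost[i][j + 1]:
                    cost[i][j + 1], back[i][j + 1] = c, (i, j, "ins")
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, op = back[i][j]
        if op == "sub":
            pairs.append((letters[pi], phones[pj]))
        elif op == "del":
            pairs.append((letters[pi], NULL))
        else:
            pairs.append((NULL, phones[pj]))
        i, j = pi, pj
    return pairs[::-1]

def viterbi_em(data, iterations=5):
    """Re-estimate the letter-phoneme probability matrix from the best alignments."""
    prob = initial_probs(data)
    for _ in range(iterations):
        counts, totals = defaultdict(float), defaultdict(float)
        for letters, phones in data:
            for l, p in align(letters, phones, prob):
                if l != NULL and p != NULL:
                    counts[(l, p)] += 1.0
                    totals[l] += 1.0
        prob = {(l, p): c / totals[l] for (l, p), c in counts.items()}
    return prob

data = [(list("taxi"), list("tAksi")), (list("boek"), list("buk")), (list("kast"), list("kAst"))]
prob = viterbi_em(data)
for letters, phones in data:
    print(align(letters, phones, prob))
```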
Article
Full-text available
The Dutch spelling system, like other European spelling systems, represents a certain balance between preserving the spelling of morphemes (the morphological principle) and obeying letter-to-sound regularities (the phonological principle). We present experimental results with artificial learners that show a competition effect between the two principles: adhering more to one principle leads to more violations of the other. The artificial learners, memory-based learning algorithms, are trained (1) to convert written words to their phonemic counterparts and (2) to analyze written words into their morphological composition, based on data extracted from the CELEX lexical database. As an exception to the competition effect, we show that introducing the schwa as a letter in the spelling system causes both morphology and phonology to be learnt better by the artificial learners. In general, we argue that artificial learning studies are a tool for obtaining objective measurements of a spelling system, which may be of help in spelling reform processes.
... Nearest neighbor synthesis simply chooses the closest example. Locally weighted regression interpolates among the nearby points, weighted by their distance to the query point, and has proven very effective in such problem domains as motor learning [Atkeson et al. 1997b] and speech synthesis [Daelemans and van den Bosch 2001]. Figure 6 shows that our method creates more accurate results than either nearest neighbor synthesis or locally weighted regression. ...
Article
Full-text available
This paper introduces an approach to performance animation that employs video cameras and a small set of retro-reflective markers to create a low-cost, easy-to-use system that might someday be practical for home use. The low-dimensional control signals from the user's performance are supplemented by a database of pre-recorded human motion. At run time, the system automatically learns a series of local models from a set of motion capture examples that are a close match to the marker locations captured by the cameras. These local models are then used to reconstruct the motion of the user as a full-body animation. We demonstrate the power of this approach with real-time control of six different behaviors using two video cameras and a small set of retro-reflective markers. We compare the resulting animation to animation from commercial motion capture equipment with a full set of markers.
... Nearest neighbor synthesis simply chooses the closest example. Locally weighted regression interpolates the nearby points by their distance to the query point and has proven very effective in such problem domains as motor learning [6] and speech synthesis [28]. Figure 3.6 shows that our method creates more accurate results than either nearest neighbor synthesis or locally weighted regression. ...
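For readers unfamiliar with the contrast drawn in these excerpts, here is a small sketch of nearest-neighbour prediction versus locally weighted (linear) regression on synthetic one-dimensional data; the Gaussian kernel, the bandwidth value and the data are illustrative assumptions only.

```python
import numpy as np

def nearest_neighbor(query, X, Y):
    """Return the output value of the single closest training example."""
    i = np.argmin(np.linalg.norm(X - query, axis=1))
    return Y[i]

def locally_weighted_regression(query, X, Y, bandwidth=1.0):
    """Fit a linear model around the query point, weighting each training
    example by a Gaussian kernel on its distance to the query."""
    d = np.linalg.norm(X - query, axis=1)
    w = np.exp(-(d ** 2) / (2 * bandwidth ** 2))
    Xb = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
    sw = np.sqrt(w)                              # sqrt so the squared loss is w-weighted
    beta, *_ = np.linalg.lstsq(sw[:, None] * Xb, sw * Y, rcond=None)
    return np.append(query, 1.0) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
Y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
q = np.array([0.5])
print(nearest_neighbor(q, X, Y), locally_weighted_regression(q, X, Y, bandwidth=0.5))
```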
... Based on the principle of combining two or more (correlated) variables into a single factor, principal component analysis breaks down the information in the bandfilters into its most basic variations. Physical or psychophysical ...
... For our vowel study, we wanted to consider as many vowel segments from spontaneous speech as possible. For the segment boundaries and vowel classes we relied on the existing segmentations and annotations of the CGN that fitted the research in terms of a broad transcription: a phonemic representation that was based on the orthographic transcriptions of the corpus, and had been generated fully automatically by TreeTalk (Daelemans & van den Bosch, 2001 [24]). The symbols used were derived from SAMPA in such a way that the produced sounds were related to the phonemes of Dutch (Gillis, 2001 [41]), hence giving the same symbol to all variants of a phoneme: "E+" to all /Ei/, "A+" to /Au/, "Y+" to /oey/, "e" to /e:/, and "o" to /o:/. ...
Article
Software systems convert between graphemes and phonemes using lexicon-based, rule-based or data-driven techniques. SHOTGUN combines these techniques in a hybrid system which converts between graphemes and phonemes bi-directionally, adds linguistic and educational information about the relationships between graphemes and phonemes and provides estimates about the likelihood that the generated output is correct. We describe the components from which SHOTGUN is built and determine its accuracy by running tests on two data sources, the BasisSpellingBank and CELEX, comparing the results to Nunn’s (1998) rule-based conversion system. SHOTGUN converts phonemes to graphemes and vice versa with precision of 81% and 86% when tested on the BasisSpellingBank, and 80% and 81% when tested on CELEX. SHOTGUN proves to be a powerful new conversion tool.
Article
The field of Natural Language Processing is mainly covered by two families of approaches. The first is characterized by linguistic knowledge expressed through rules (production rules for syntax, inference rules for semantics, etc.) operating on symbolic representations. The second assumes a probabilistic model underlying the data, the parameters of which are induced from corpora of annotated linguistic data. These two families of methods, although efficient for a number of applications, have serious drawbacks. On the one hand, rule-based methods face the difficulty and the cost of constructing high-quality knowledge bases: experts are rare, and the knowledge of a domain $X$ may not simply adapt to another domain $Y$. On the other hand, probabilistic methods do not naturally handle strongly structured objects, do not support the inclusion of explicit linguistic knowledge, and, more importantly, rely heavily on an often subjective prior choice of a certain model.

Our work focuses on analogy-based methods whose goal is to tackle all or part of these limitations. In the framework of Natural Language Learning, alternative inferential models in which no abstraction is performed have been proposed: linguistic knowledge is implicitly contained within the data. In Machine Learning, methods built on such principles are known as "lazy learning". They usually rely on the following learning bias: if an input object $Y$ is "close" to another object $X$, then its output $f(Y)$ is a good candidate for $f(X)$. Although this hypothesis is relevant for most Machine Learning tasks, the structured nature and the paradigmatic organization of linguistic data suggest a slightly different approach. To take this specificity into account, we study a model relying on the notion of "analogical proportion". Within this model, inferring $f(T)$ is performed by finding an analogical proportion with three known objects $X$, $Y$ and $Z$. The "analogical hypothesis" is formalized as: if $X : Y :: Z : T$, then $f(X) : f(Y) :: f(Z) : f(T)$. Inferring $f(T)$ from the known $f(X)$, $f(Y)$, $f(Z)$ is achieved by solving the "analogical equation" (with unknown $U$): $f(X) : f(Y) :: f(Z) : U$.

In the first part of this work, we present a study of this model of analogical proportion within a more general framework termed "analogical learning". This framework is instantiated in several contexts: in cognitive science, it is related to analogical reasoning, an essential faculty underlying a number of cognitive processes; in traditional linguistics, it gives support to a number of phenomena such as analogical creation, opposition and commutation; in machine learning, it corresponds to "lazy learning" methods.

The second part of our work proposes a unified algebraic framework that defines the concept of analogical proportion. Starting from a model of analogical proportion operating on strings (elements of a free monoid), we present an extension to the more general case of semigroups. This generalization directly yields a valid definition for all sets deriving from the structure of a semigroup, which allows us to handle analogical proportions of common representations of linguistic entities such as strings, trees, feature structures and finite sets. We describe algorithms adapted to processing analogical proportions of such structured objects, and we propose some directions to enrich the model, allowing its use in more complex cases.
The inferential model we studied, originally designed for Natural Language Processing purposes, can be explicitly interpreted as a Machine Learning method. This formalization makes it possible to highlight several of its notable features. One of these characteristics lies in its capacity to handle structured objects, both as input and as output, whereas traditional classification tasks generally assume an output space made up of a finite set of classes. We then introduce the notion of analogical extension in order to express the learning bias of the model. Lastly, we conclude by presenting experimental results obtained in several Natural Language Processing tasks: pronunciation, inflectional analysis and derivational analysis.
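To make the analogical equation concrete for the simplest case (flat strings rather than the general semigroup setting of this thesis), here is a naive affix-based solver for a : b :: c : ?. The shared-prefix/shared-suffix heuristic and the function names are assumptions for illustration and do not reflect the algorithms actually proposed in the work above.

```python
def common_prefix(x, y):
    i = 0
    while i < min(len(x), len(y)) and x[i] == y[i]:
        i += 1
    return x[:i]

def common_suffix(x, y):
    i = 0
    while i < min(len(x), len(y)) and x[-1 - i] == y[-1 - i]:
        i += 1
    return x[len(x) - i:] if i else ""

def solve_analogy(a, b, c):
    """Solve a : b :: c : ? for plain strings with a naive affix heuristic.
    Returns None when neither the prefix nor the suffix pattern applies."""
    # Case 1: a and b share a prefix, so the change is a suffix (reader : reading).
    p = len(common_prefix(a, b))
    if c.endswith(a[p:]):
        return c[:len(c) - len(a[p:])] + b[p:]
    # Case 2: a and b share a suffix, so the change is a prefix (undo : redo).
    s = len(common_suffix(a, b))
    if c.startswith(a[:len(a) - s]):
        return b[:len(b) - s] + c[len(a) - s:]
    return None

print(solve_analogy("reader", "reading", "writer"))  # -> writing
print(solve_analogy("undo", "redo", "untie"))        # -> retie
```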
Article
We present an efficient way to learn letter-to-phoneme mapping rules for Polish automatically, using a “dynamic context shortening” method. Attempts at reconstructing transcription rules date back to 1987, when Sejnowski and Rosenberg applied a self-organizing neural network to “grapheme-to-phoneme” mapping. In later approaches, decision-tree-based methods were applied: the trees for each letter were built starting from empty contexts, and the left and right contexts were then alternately widened until the transcription ambiguity of the training data disappeared. Our approach starts instead from a symmetrical context wide enough to ensure unambiguous transcription in every context; both contexts, i.e., left and right, are then shortened alternately until ambiguity appears. In all cases where ambiguous transcriptions occurred, the previous context forms were restored and not shortened further. Therefore, at every step the cause of ambiguity, namely a too-short left or right context, was clearly known and removed. On the basis of the results obtained, transcription tables for each letter were constructed. A 350,000-character corpus of Polish text transcribed into phonemic form was prepared; training samples of different lengths were taken from it at random and analysed, and the remaining parts were used for verification. It turned out that a 30,000-character training sample was enough to learn Polish grapheme-to-phoneme minimum-context mapping. We also describe three original generalization methods, which we call rules coring, indeterminacies absorption and a guessing method; the last was designed to be applied in the case of limited acceptance of a context. Applied together, the methods allow the removal of about 70% of errors.
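A minimal sketch of the dynamic context shortening idea described above, assuming letter-aligned training data and a single global context width rather than per-letter tables: starting from a wide symmetric context, the left and right widths are shortened alternately and restored as soon as ambiguity appears. The data, symbols and widths are toy assumptions, not the authors' implementation.

```python
MAX_CONTEXT = 3  # symmetric starting width; wide enough for the toy data below

def contexts(word, phones, left, right):
    """Yield ((left context, letter, right context), phoneme) pairs for one
    aligned word/transcription pair, padding with '_' outside the word."""
    padded = "_" * MAX_CONTEXT + word + "_" * MAX_CONTEXT
    for i, target in enumerate(phones):
        j = i + MAX_CONTEXT
        key = (padded[j - left:j], padded[j], padded[j + 1:j + 1 + right])
        yield key, target

def is_ambiguous(data, left, right):
    """True if two identical contexts of this size map to different phonemes."""
    table = {}
    for word, phones in data:
        for key, target in contexts(word, phones, left, right):
            if table.setdefault(key, target) != target:
                return True
    return False

def shorten(data):
    """Alternately shorten the left and right context until ambiguity appears,
    keeping the last unambiguous width on each side."""
    left = right = MAX_CONTEXT
    blocked = {"left": False, "right": False}
    while not all(blocked.values()):
        for side in ("left", "right"):
            if blocked[side]:
                continue
            l, r = (left - 1, right) if side == "left" else (left, right - 1)
            if l < 0 or r < 0 or is_ambiguous(data, l, r):
                blocked[side] = True      # restore: keep the previous width
            else:
                left, right = l, r
    return left, right

# Toy aligned data: letters and phonemes one-to-one ('-' marks a phonemic null).
data = [("bok", "bOk"), ("boek", "bu-k"), ("beek", "be-k")]
print(shorten(data))
```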
Article
In this thesis, a pilot study for the development of a corpus of Dutch aphasic speech (CoDAS) is presented. Given the lack of resources of this kind, not only for Dutch but also for other languages, CoDAS can set standards and contribute to future research in this area. A corpus of Dutch aphasic speech should fulfill at least three requirements. First, it should encode a plausible sample of contemporary Dutch as spoken by aphasic patients; that is, it should include speech representing different types of aphasia as well as various communication settings. Second, the speech fragments should be documented with relevant metadata, including information about the speaker and the aphasia. Third, the corpus should be enriched with various kinds of linguistic information. Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and annotation protocols of existing corpora such as SDC or CHILDES; however, they have served as a starting point. In our pilot study, we have established the basic requirements with respect to text types, metadata and annotation levels that CoDAS should fulfill. In this respect, we have investigated whether and how the annotation and transcription procedures and protocols used for the SDC should be adapted in order to annotate and transcribe aphasic speech properly. In particular, for the orthographic transcription and the part-of-speech tagging, suggestions for improving the existing protocols are given. On the other hand, the phonetic transcription procedure assumed within the SDC can be adopted without major modifications.