ABSTRACT: Developing a correct grapheme-to-phoneme (GTP) conversion method is a central problem in text-to-speech synthesis. In particular, deriving phonological features that are not shown in the orthography is challenging. In Amharic, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in the orthography. This paper describes an architecture, a preprocessing morphological analyzer integrated into an Amharic Text-to-Speech (AmhTTS) system, that converts Amharic Unicode text into a phonemic specification of pronunciation. The study focuses mainly on disambiguating gemination and vowel epenthesis, which are the significant problems in developing an Amharic TTS system. The evaluation test on 666 words shows that the analyzer assigns geminates correctly (100%). Our approach is suitable for languages like Amharic with rich morphology and can be customized to other languages.
ABSTRACT: Despite its linguistic complexity, the Horn of Africa region includes several major languages with more than 5 million speakers, some crossing the borders of multiple countries. All of these languages have official status in regions or nations and are crucial for development; yet computational resources for the languages remain limited or non-existent. Since these languages are complex morphologically, software for morphological analysis and generation is a necessary first step toward nearly all other applications. This paper describes a resource for morphological analysis and generation for three of the most important languages in the Horn of Africa: Amharic, Tigrinya, and Oromo.
ABSTRACT: Computer-assisted language learning is by now so common around the world as to be something of a default, and the teaching of the indigenous languages of the Americas is already benefiting from the new technology. Intelligent computer-assisted language learning relies on software that has relatively sophisticated models of the target language and/or the learner. An example is the use of a program that has an explicit model of some aspect of the grammar of the target language and can analyze or generate words or sentences. Many indigenous languages of the Americas are characterized by complex morphology, and morphology must play a significant role in the instruction of these languages. This paper describes how morphological analyzers and generators can handle the complex morphology of languages such as K'iche' and Quechua and discusses a potential application of this technology to the teaching of such languages.
Computer-Assisted Language Learning
In recent years, computers have become so important in language teaching that it is hard to imagine a class without them. Students use computers to do exercises practicing what they have learned in the class, they access documents from the Internet, they interact with other learners or with native speakers of the target language on the Internet, and they write papers with word processing software that may be especially adapted to second language learning. The field of computer-assisted language learning (CALL) has its own conferences and its own journals, CALICO Journal, Computer Assisted Language Learning, and Language Learning and Technology. Computers are even a part of the language curriculum in relatively impoverished parts of the world, including regions where indigenous languages are taught as
ABSTRACT: Extensible Dependency Grammar (XDG; Debusmann, 2007) is a flexible, modular dependency grammar framework in which sentence analyses consist of multigraphs and processing takes the form of constraint satisfaction. This paper shows how XDG lends itself to grammar-driven machine translation and introduces the machinery necessary for synchronous XDG. Since the approach relies on a shared semantics, it resembles interlingua MT. It differs in that there are no separate analysis and generation phases. Rather, translation consists of the simultaneous analysis and generation of a single source-target sentence.
ABSTRACT: Resource-poor languages may suffer from a lack of any of the basic resources that are fundamental to computational linguistics, including an adequate digital lexicon. Given the relatively small corpus of texts that exists for such languages, extending the lexicon presents a challenge. Languages with complex morphology present a special case, however, because individual words in these languages provide a great deal of information about the grammatical properties of the roots that they are based on. Given a morphological analyzer, it is even possible to extract novel roots from words. In this paper, we look at the case of Tigrinya, a Semitic language with limited lexical resources for which a morphological analyzer is available. It is shown that this analyzer applied to the list of more than 200,000 Tigrinya words that is extracted by a web crawler can extend the lexicon in two ways, by adding new roots and by inferring some of the derivational constraints that apply to known roots.
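The workflow the abstract describes — run a morphological analyzer over a crawled word list and keep the roots that are not yet in the lexicon — can be sketched as follows. The analyzer, patterns, and data here are stand-ins invented for the example (a real Tigrinya analyzer extracts consonantal roots from templatic verb forms); only the control flow is meant to be illustrative.

```python
# Stand-in analyzer: maps a word to a root via toy prefix patterns,
# or returns None when the word cannot be analyzed. A real analyzer
# would do full templatic morphological analysis.
def analyze(word, known_patterns):
    for pattern, root in known_patterns.items():
        if word.startswith(pattern):
            return root
    return None

def extend_lexicon(word_list, lexicon, known_patterns):
    """Collect roots proposed by the analyzer that the lexicon lacks."""
    new_roots = set()
    for word in word_list:
        root = analyze(word, known_patterns)
        if root is not None and root not in lexicon:
            new_roots.add(root)
    return lexicon | new_roots

# Toy data standing in for the ~200,000-word crawled list.
patterns = {"sbr": "s-b-r", "qtl": "q-t-l"}
words = ["sbret", "qtlat", "xyz"]
print(extend_lexicon(words, {"s-b-r"}, patterns))
```

Unanalyzable word forms (like "xyz" above) are simply skipped; in practice these are candidates for spelling normalization or for proposing genuinely new roots.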
Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta; 01/2010
ABSTRACT: There has been little work on computational grammars for Amharic or other Ethio-Semitic languages and their use for parsing and generation. This paper introduces a grammar for a fragment of Amharic within the Extensible Dependency Grammar (XDG) framework of Debusmann. A language such as Amharic presents special challenges for the design of a dependency grammar because of the complex morphology and agreement constraints. The paper describes how a morphological analyzer for the language can be integrated into the grammar, introduces empty nodes as a solution to the problem of null subjects and objects, and extends the agreement principle of XDG in several ways to handle verb agreement with objects as well as subjects and the constraints governing relative clause verbs. It is shown that XDG's multiple dimensions lend themselves to a new approach to relative clauses in the language. The introduced extensions to XDG are also applicable to other Ethio-Semitic languages.
ABSTRACT: The ontological distinction between discrete individuated objects and continuous substances, and the way this distinction is expressed in different languages has been a fertile area for examining the relation between language and thought. In this paper we combine simulations and a cross-linguistic word learning task as a way to gain insight into the nature of the learning mechanisms involved in word learning. First, we look at the effect of the different correlational structures on novel generalizations with two kinds of learning tasks implemented in neural networks-prediction and correlation. Second, we look at English- and Spanish-speaking 2-3-year-olds' novel noun generalizations, and find that count/mass syntax has a stronger effect on Spanish- than on English-speaking children's novel noun generalizations, consistent with the predicting networks. The results suggest that it is not just the correlational structure of different linguistic cues that will determine how they are learned, but the specific learning mechanism and task in which they are involved.
ABSTRACT: This paper presents an application of finite state transducers weighted with feature structure descriptions, following Amtrup (2003), to the morphology of the Semitic language Tigrinya. It is shown that feature-structure weights provide an efficient way of handling the templatic morphology that characterizes Semitic verb stems as well as the long-distance dependencies characterizing the complex Tigrinya verb morphotactics. A relatively complete computational implementation of Tigrinya verb morphology is described.
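The central idea — transducer transitions carry feature-structure "weights" that are unified along a path, so that unification failure prunes the path and long-distance dependencies fall out for free — can be illustrated with a toy transducer. The grammar, morphs, and features below are invented for the example and are unrelated to the actual Tigrinya implementation.

```python
def unify(fs1, fs2):
    """Unify two flat feature structures (dicts); None on conflict."""
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat in result and result[feat] != val:
            return None          # conflicting values: this path is pruned
        result[feat] = val
    return result

# Each transition: (source, input morph, output, target, feature weight).
# The toy grammar forces the prefix and suffix to agree on 'asp' (aspect),
# a dependency that spans the whole stem in between.
transitions = [
    (0, "y-",  "y",   1, {"asp": "imperfective"}),
    (0, "",    "",    1, {"asp": "perfective"}),
    (1, "sbr", "sbr", 2, {}),
    (2, "-u",  "u",   3, {"asp": "perfective"}),
    (2, "-u",  "u",   3, {"asp": "imperfective", "subj": "3pl"}),
]

def paths(state, fs, morphs, final=3):
    """Enumerate accepted morph sequences with their unified features."""
    if state == final:
        yield morphs, fs
    for src, inp, out, tgt, weight in transitions:
        if src == state:
            unified = unify(fs, weight)
            if unified is not None:
                yield from paths(tgt, unified, morphs + [inp])

for morphs, fs in paths(0, {}, []):
    print(morphs, fs)
```

Note that the mismatched combination (imperfective prefix with a plain perfective suffix) never reaches the final state: the unification failure kills the path without any extra bookkeeping states.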
EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, March 30 - April 3, 2009, Athens, Greece; 01/2009
ABSTRACT: The embodiment hypothesis is the idea that intelligence emerges in the interaction of an agent with an environment and as a result of sensorimotor activity. We offer six lessons for developing embodied intelligent agents suggested by research in developmental psychology. We argue that starting as a baby grounded in a physical, social, and linguistic world is crucial to the development of the flexible and inventive intelligence that characterizes humankind.
Artificial Life 01/2005; 11(1-2):13-29.
ABSTRACT: Introduction
We are interested in modeling an aspect of human rhythm perception and production called beat induction. Roughly, beat induction consists of finding the downbeats in a metrical signal. The most common example of beat induction in human performance is foot tapping to music, and one way to state our goal is that we want to build feet which can tap to the radio as well as people do. However, we are interested in more than the end-state of adult rhythmical behavior; we also focus on the path that people take in perfecting this skill. We build embodied models, ones with actual robotic components that interact with and constrain computational components. This commitment to a physical model of beat induction might seem wrong-headed. After all, beat induction is a perceptual phenomenon in adults, not necessarily involving motor control at all. But we believe that the interactions between body and brain in developing infants and toddlers cannot
ABSTRACT: Children generalize nouns in ways that are consistent with the referent's ontological and/or grammatical kind. In other words, children generalize a new noun based on both the perceptual properties of the referent and the linguistic properties of the noun (Soja, Carey, & Spelke, 1991; Jones & Smith, 1998; Soja, 1992; Smith, 1995). Cross-linguistic studies have shown that systematic differences in the structures of different languages are reflected in children's novel noun generalizations (Imai & Gentner, 1997; Gathercole & Min, 1997; Yoshida & Smith, 1999). One of the differences that has been studied is the ontological object/substance distinction as it relates to the syntactic count/mass distinction. In this paper we look at the effect of mass/count syntax and perceptual cues concerning solidity on English- and Spanish-speaking children's generalization of new nouns. The task used to study this is the Novel Noun Extension Task. In this task, the child is shown an exemplar and the exemplar is labeled. The child is then asked what other things, matching the exemplar on different dimensions, can be called by the same name. Previous research has shown that children extend the name of a solid object to other objects of the same shape and the name of a nonsolid substance to other shapes made out of the same material. (Soja et al
ABSTRACT: Most theories of language processing and acquisition make the assumption that perception and comprehension are related to production, but few have anything to say about how. This paper describes a performance-oriented connectionist model of the acquisition of morphology in which production builds on representations which develop during the learning of word recognition. Using artificial language stimuli embodying simple suffixation, prefixation, and template rules, I demonstrate that the model generalizes to novel combinations of roots and inflections for both word recognition and production. I argue that the capacity of connectionist networks to develop intermediate distributed representations which not only enable the solving of the task at hand but also facilitate another task offers a plausible account of how comprehension and production come to share phonological knowledge as words are learned.
Introduction
Language learners must acquire both the ability to comprehend language and...
ABSTRACT: This paper presents a connectionist model of how representations for syllables might be learned from sequences of phones. A simple recurrent network is trained to distinguish a set of words in an artificial language, which are presented to it as sequences of phonetic feature vectors. The distributed syllable representations that are learned as a side-effect of this task are used as input to other networks. It is shown that these representations encode syllable structure in a way which permits the regeneration of the phone sequences (for production) as well as systematic phonological operations on the representations.
Linguistic Structure and Distributed Representation
If the language sciences agree on one thing, it is the hierarchical nature of language. The importance of hierarchical, structured representations is now generally recognized for the phonological pole, where syllables and metrical units now play a major role (see, e.g., Frazier (1987) and Goldsmith (1990)),...
ABSTRACT: The morphological systems of natural languages are replete with examples of the same devices used for multiple purposes: (1) the same type of morphological process (for example, suffixation for both noun case and verb tense) and (2) identical morphemes (for example, the same suffix for English noun plural and possessive). These sorts of similarity would be expected to confer advantages on language learners in the form of transfer from one morphological category to another. Connectionist models of morphology acquisition have been faulted for their supposed inability to represent phonological similarity across morphological categories and hence to facilitate transfer. This paper describes a connectionist model of the acquisition of morphology which is shown to exhibit transfer of this type. The model treats the morphology acquisition problem as one of learning to map forms onto meanings and vice versa. As the network learns these mappings, it makes phonological generalizations whi...
ABSTRACT: This paper describes a modular connectionist model of the acquisition of receptive inflectional morphology. The model takes inputs in the form of phones one at a time and outputs the associated roots and inflections. In its simplest version, the network consists of separate simple recurrent subnetworks for root and inflection identification; both networks take the phone sequence as inputs. It is shown that the performance of the two separate modular networks is superior to a single network responsible for both root and inflection identification. In a more elaborate version of the model, the network learns to use separate hidden-layer modules to solve the separate tasks of root and inflection identification.
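The architecture described here can be sketched as the forward pass of an Elman-style simple recurrent network that consumes one phone vector per step and feeds separate output layers for root and inflection identification. The layer sizes, random untrained weights, and softmax readouts below are illustrative choices, not details taken from the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 8 phone features, 12 hidden units,
# 5 candidate roots, 3 candidate inflections.
n_phone, n_hidden, n_roots, n_infl = 8, 12, 5, 3
W_in   = rng.normal(size=(n_hidden, n_phone))   # input -> hidden
W_rec  = rng.normal(size=(n_hidden, n_hidden))  # context -> hidden
W_root = rng.normal(size=(n_roots, n_hidden))   # hidden -> root layer
W_infl = rng.normal(size=(n_infl, n_hidden))    # hidden -> inflection layer

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def run(phone_seq):
    """Feed phones one at a time; return final root/inflection distributions."""
    h = np.zeros(n_hidden)                       # empty context at word onset
    for phone in phone_seq:
        h = np.tanh(W_in @ phone + W_rec @ h)    # hidden = f(input, context)
    return softmax(W_root @ h), softmax(W_infl @ h)

word = [rng.normal(size=n_phone) for _ in range(4)]   # a 4-phone "word"
root_dist, infl_dist = run(word)
print(root_dist.round(3), infl_dist.round(3))
```

In the modular version the abstract describes, root and inflection identification would instead use two separate recurrent subnetworks over the same phone sequence, rather than a shared hidden layer as sketched here.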
ABSTRACT: It is proposed that the theory of dynamical systems offers appropriate tools to model many phonological aspects of both speech production and perception. A dynamic account of speech rhythm is shown to be useful for description of both Japanese mora timing and English timing in a phrase repetition task. This orientation contrasts fundamentally with the more familiar symbolic approach to phonology, in which time is modeled only with sequentially arrayed symbols. It is proposed that an adaptive oscillator offers a useful model for perceptual entrainment (or 'locking in') to the temporal patterns of speech production. This helps to explain why speech is often perceived to be more regular than experimental measurements seem to justify. Because dynamic models deal with real time, they also help us understand how languages can differ in their temporal detail---contributing to foreign accents, for example. The fact that languages differ greatly in their temporal detail suggests th...
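A bare-bones adaptive oscillator of the kind mentioned here can be written in a few lines: on each input beat it nudges its phase and period toward the observed onset, so it gradually entrains to the input's tempo. The gains and initial values below are arbitrary toy parameters, not those of any published model.

```python
def entrain(beat_times, period, phase, phase_gain=0.5, period_gain=0.2):
    """Adapt period and phase to a list of beat onset times.

    Returns a history of (period, phase) pairs after each beat.
    """
    history = []
    expected = phase                   # time of the first predicted beat
    for t in beat_times:
        error = t - expected           # how early/late the beat arrived
        phase = expected + phase_gain * error    # shift toward the beat
        period = period + period_gain * error    # stretch/shrink the cycle
        expected = phase + period      # next predicted beat
        history.append((round(period, 3), round(phase, 3)))
    return history

# Beats arrive every 0.6 s, but the oscillator starts with a 0.5 s period.
beats = [0.6 * i for i in range(1, 8)]
hist = entrain(beats, period=0.5, phase=0.5)
print(hist)   # the period drifts toward the true 0.6 s inter-beat interval
```

Because the oscillator's prediction absorbs small timing deviations, its output is more regular than the input it is tracking, which is one way to read the abstract's point that speech is perceived as more regular than measurements justify.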
ABSTRACT: This paper describes an evolving computational model of the perception and production of simple rhythmic patterns. The model consists of a network of oscillators of different resting frequencies which couple with input patterns and with each other. Oscillators whose frequencies match periodicities in the input tend to become activated. Metrical structure is represented explicitly in the network in the form of clusters of oscillators whose frequencies and phase angles are constrained to maintain the harmonic relationships that characterize meter. Rests in rhythmic patterns are represented by explicit rest oscillators in the network, which become activated when an expected beat in the pattern fails to appear. The model makes predictions about the relative difficulty of patterns and the effect of deviations from periodicity in the input. The Phenomenon The nested periodicity that defines musical, and probably also linguistic, meter appears to be fundamental to the way in which ...
ABSTRACT: Relations lie at the center of humankind's most intellectual endeavors and are also fundamental to any account of linguistic semantics. Despite the importance of relations in understanding cognition and language, there is no well-accepted account of the origins of relations. What are relations made of? How are they made? In this chapter we address these questions. First, we consider past proposals of how relations are represented and the implications of these representational ideas for development. Second, we review the developmental evidence in the context of five psychological facts about relations that must be explained by any account of their origin. This evidence suggests that relational concepts are similarity based, influenced by specific developmental history, and influenced by language. Third, we summarize Gasser and Colunga's Playpen model of the learning of relations. This connectionist model instantiates a new proposal about the stuff out of which relations are made and the...
[Show abstract][Hide abstract] ABSTRACT: One kind of prosodic structure that apparently underlies both music and some examples of speech production is meter. Yet detailed measurements of the timing of both music and speech show that the nested periodicities that define metrical structure can be quite noisy in time. What kind of system could produce or perceive such variable metrical timing patterns? And what would it take to be able to store and reproduce particular metrical patterns from long-term memory? We have developed a network of coupled oscillators that both produces and perceives patterns of pulses that conform to particular meters. In addition, beginning with an initial state with no biases, it can learn to prefer the particular meter that it has been previously exposed to.