Grapheme-to-Phoneme Conversion for Amharic Text-to-Speech System

Conference on Human Language Technology for Development, Alexandria, Egypt, 2-5 May 2011.
Tadesse Anberbir
Ajou University, Graduate School of
Information and Communication,
South Korea.
tadesse@ajou.ac.kr
Michael Gasser
Indiana University, School
of Informatics and Computing,
USA.
gasser@indiana.edu
Tomio Takara
University of the Ryukyus,
Graduate School of Engineering
and Science, Japan.
Kim Dong Yoon
Ajou University, Graduate School of
Information and Communication,
South Korea.
Abstract
Developing a correct Grapheme-to-Phoneme (GTP) conversion method is a central problem in text-to-speech synthesis. In particular, deriving phonological features that are not shown in the orthography is challenging. In Amharic, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in the orthography. This paper describes an architecture in which a preprocessing morphological analyzer is integrated into an Amharic Text-to-Speech (AmhTTS) system to convert Amharic Unicode text into a phonemic specification of pronunciation. The study focuses mainly on disambiguating gemination and vowel epenthesis, which are significant problems in developing an Amharic TTS system. An evaluation on 666 words shows that the analyzer assigns geminates correctly in all cases (100%). Our approach is suitable for morphologically rich languages like Amharic and can be customized for other languages.
1 Introduction
Grapheme-to-Phoneme (GTP) conversion is the process of converting a word from its written form (graphemes) to its pronunciation (phonemes). Language technologies such as text-to-speech (TTS) synthesis require a good GTP conversion method.
GTP conversion methods fall into two main approaches, rule-based and data-driven, and more recently some statistical techniques have been proposed (see Damper et al., 1998, for a review of several techniques). These methods have produced successful results for different languages (Taylor, 2005; Chalamandaris et al., 2005, among others). However, in many languages the automatic derivation of the correct pronunciation from the written form of a text is still challenging. In particular, phonological features that are not shown in the orthography make GTP conversion very complex.
Amharic, the official language of Ethiopia, has a complex morphology, and some of its phonological features are not shown in the orthography. Morphology, the way morphemes in a language join to form words, influences language technology because some phonological processes cannot be modeled without proper modeling of morphological processes. For example, most geminates in Amharic are related to grammatical processes and can be predicted from them. In general, for Semitic languages such as Amharic and Tigrinya, morphological analysis can make explicit some of the phonological features that are not reflected in the orthography, and it therefore plays a crucial role in text-to-speech synthesis. However, so far no study has been conducted in this area.
In this study, we propose and integrate a preprocessing morphological analyzer into an Amharic Text-to-Speech (AmhTTS) system, mainly to predict automatically the positions of geminates and epenthetic vowels in a word. Our research is the first attempt to integrate a morphological analyzer, HornMorpho (Gasser, 2011), into an Amharic TTS system. The integrated analyzer takes Amharic Unicode text as input and outputs a Latin transcription that marks geminates and the locations of epenthetic vowels. The AmhTTS system then processes this output further to extract all the features generated by the analyzer. AmhTTS is a parametric, rule-based system based on the cepstral method (Anberbir and Takara, 2006).
The paper is organized as follows. Section 2 provides background information about the Amharic language, and Section 3 briefly discusses the Amharic writing system and the challenges it poses for GTP conversion. Sections 4 and 5 describe the automatic assignment method using the morphological analyzer and the evaluation results, respectively. Finally, Section 6 presents concluding remarks.
2 The Amharic language
Amharic, the official language of Ethiopia, is the Semitic language with the greatest number of speakers after Arabic. According to the 1998 census, Amharic has 17.4 million mother-tongue speakers and 5.1 million second-language speakers (Ethnologue, 2004).
A set of 34 phones, seven vowels and 27 consonants, makes up the complete inventory of sounds for Amharic (Baye, 2008). Consonants are generally classified as stops, fricatives, nasals, liquids, and semi-vowels. Table 1 shows the consonants of Amharic classified by manner of articulation, voicing, and place of articulation.
Table 1. Categories of Amharic consonants with corresponding IPA representation.
| Manner of articulation | Voicing | Labials | Dentals | Palatals | Velars | Labiovelars | Glottals |
|---|---|---|---|---|---|---|---|
| Stops | Voiceless | ፕ [p] | ት [t] | ች [tʃ] | ክ [k] | ኵ [kʷ] | አ [ʔ] |
| Stops | Voiced | ብ [b] | ድ [d] | ጅ [dʒ] | ግ [g] | ጕ [gʷ] | |
| Stops | Glottalized | ጵ [p’] | ጥ [t’] | ጭ [tʃ’] | ቅ [q] | [qʷ] | |
| Fricatives | Voiceless | ፍ [f] | ስ [s] | ሽ [ʃ] | | | ህ [h] |
| Fricatives | Voiced | ቭ [v] | ዝ [z] | ዥ [ʒ] | | | |
| Fricatives | Glottalized | | ጽ [s’] | | | | |
| Fricatives | Rounded | | | | | ኊ [hʷ] | |
| Nasals | Voiced | ም [m] | ን [n] | ኝ [ɲ] | | | |
| Liquids | Voiced | | ል [l], ር [r] | | | | |
| Glides | Voiced | ው [w] | | ይ [j] | | | |
The seven vowels, along with their representation in Ge’ez characters, are shown in terms of their place of articulation in Table 2. In addition to the five vowels common to many languages, Amharic has two central vowels, /ə/ and /ɨ/, the latter with a mainly epenthetic function. The epenthetic vowel /ɨ/ plays a key role in syllabification. Moreover, in our study we found the epenthetic vowel to be crucial for proper pronunciation in Amharic speech synthesis.
Table 2. Categories of Amharic vowels with IPA equivalents.
| | Front | Central | Back |
|---|---|---|---|
| High | [i] | [ɨ] | [u] |
| Mid | [e] | [ə] | [o] |
| Low | | [a] | |
Like other languages, Amharic has its own typical phonological and morphological features. The following are some of the striking features of Amharic phonology that give the language its characteristic sound when spoken: the weak, indeterminate stress; the presence of glottalic, palatal, and labialized consonants; the frequent gemination of consonants and central vowels; and the use of the automatic epenthetic vowel (Bender et al., 1976). Among these, we found the gemination of consonants and the use of the automatic epenthetic vowel to be critical for naturalness in Amharic speech synthesis.
3 Amharic Writing System and Problems in GTP Conversion
Amharic uses the Ge’ez (or Ethiopic) writing system, which originated with the ancient Ge’ez language. In this system the symbols are consonant-based but also contain an obligatory vowel marking. Most symbols represent consonant-vowel combinations, but there may also be a special symbol for each consonant that represents the plain consonant. Each Amharic consonant is associated with seven characters (referred to as “orders”), one for each of the seven vowels of the language. The sixth-order character is the special symbol that represents the plain consonant. The basic pattern for each consonant is shown in Fig. 1, where C stands for the consonant and the vowel is given in IPA in square brackets.
| Order | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th |
|---|---|---|---|---|---|---|---|
| Pattern | C[ə] | C[u] | C[i] | C[a] | C[e] | C | C[o] |
| Example for /l/ | ለ | ሉ | ሊ | ላ | ሌ | ል | ሎ |
Figure 1. Amharic syllabic structure with example for consonant /l/.
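The regular pattern in Fig. 1 is mirrored in the Unicode Ethiopic block, where each basic consonant series occupies eight consecutive code points starting at U+1200 and the first seven slots correspond to the seven orders. The following Python sketch relies on that layout to recover the order of a character. The names `order_of` and `ORDER_VOWELS` are illustrative, and the labiovelar series and other irregular rows are deliberately not handled.

```python
# Vowel associated with each order (index 0 = 1st order), following Fig. 1.
# The 6th order is the bare consonant, which may surface with epenthetic ɨ.
ORDER_VOWELS = ["ə", "u", "i", "a", "e", "", "o"]

def order_of(ch):
    """Return the order (1-7) of a basic Ethiopic syllable, else None."""
    cp = ord(ch)
    if not 0x1200 <= cp <= 0x1357:
        return None                      # outside the basic syllable rows
    slot = (cp - 0x1200) % 8             # position within the 8-slot row
    return slot + 1 if slot < 7 else None  # 8th slot is not one of the 7 orders

def is_sixth_order(ch):
    """True for the characters that are ambiguous between C and Cɨ."""
    return order_of(ch) == 6
```

For instance, all three characters of ስልክ are sixth-order characters, which is why the spelling alone does not say where a vowel is pronounced.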
The Amharic writing system is partially phonetic. According to Leslau (1995), there is more or less a one-to-one correspondence between the sounds and the graphemes. However, as shown in Table 3, it has some features that make it complex from the perspective of GTP conversion. In what follows we discuss the two main ambiguities in Amharic orthography in more detail.
Table 3. Problems in Amharic grapheme-to-phoneme (GTP) conversion.
| Problem | Example |
|---|---|
| Homograph | Depending on the context, the word ገና can mean ‘still/yet’ or ‘Christmas’. |
| Insertion of epenthetic vowel [ɨ] | In words like ትምህርት, the epenthetic vowel must be inserted: the word is pronounced /tɨmhɨrt/, not /tmhrt/. |
| Introduction of semi-vowel w, y | In words like በቅሎአችን bəqlo-aʧn ‘our mule’, a semi-vowel is introduced, giving በቅሎዋችን bəqlowaʧn. |
| Compression of successive vowels | ለ + እኔ /lə + ɨne/ becomes ለኔ /ləne/; የአማርኛ yə-amarɨɲa becomes yamarɨɲa. |
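The last two rows of Table 3 describe what happens when two vowels meet at a morpheme boundary. The Python sketch below encodes only the three cases illustrated in the table as a lookup on the boundary vowels. The rule table `BOUNDARY_RULES` and the function `join_morphemes` are hypothetical names, and generalizing beyond these three examples is an assumption, not a description of Amharic phonology as a whole.

```python
# (last vowel of left morpheme, first vowel of right morpheme) -> result.
# Derived solely from the examples in Table 3.
BOUNDARY_RULES = {
    ("ə", "a"): "a",    # yə + amarɨɲa -> yamarɨɲa  (ə is dropped)
    ("ə", "ɨ"): "ə",    # lə + ɨne     -> ləne      (ɨ is dropped)
    ("o", "a"): "owa",  # bəqlo + aʧn  -> bəqlowaʧn (glide w inserted)
}

def join_morphemes(left, right):
    """Join two romanized morphemes, applying a boundary rule if one matches."""
    key = (left[-1:], right[:1])
    if key in BOUNDARY_RULES:
        return left[:-1] + BOUNDARY_RULES[key] + right[1:]
    return left + right
```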
3.1 Gemination
Gemination is one of the most distinctive characteristics of the cadence of Amharic speech, and it also carries a heavy semantic and syntactic functional load. Unlike English, in which the rhythm of speech is mainly characterized by stress (loudness), rhythm in Amharic is mainly marked by longer and shorter syllables, depending on the gemination of consonants, and by certain features of phrasing (Bender and Fulass, 1978). In Amharic, all consonants except /h/ and /ʔ/ may occur in either a geminated or a non-geminated form. Amharic and other languages written with the Ge’ez script differ from most other languages that feature gemination, such as Japanese, Arabic, Italian, and Tamil, in that gemination is not shown in the orthography.
Table 4. Minimal pairs with singleton vs. geminate consonants.
| Orthography | Singleton pronunciation | Gloss | Geminate pronunciation | Gloss |
|---|---|---|---|---|
| ገና | gəna | still/yet | gənna | Christmas |
| ለጋ | ləga | fresh | ləgga | he hit |
| ሰፊ | səfi | tailor | səffi | wide |
| ሽፍታ | ʃɨfta | outlaw | ʃɨffɨta | rash |
| ይሰማል | yɨsəmal | he hears | yɨssəmmal | he/it is heard |
Amharic gemination is either lexical or morphological. As a lexical feature it usually cannot be predicted. For instance, ገና may be read as /gəna/, meaning ‘still/yet’, or /gənna/, meaning ‘Christmas’ (see Table 4 for similar minimal pairs). Although this is not a problem for Amharic speakers, because minimal pairs are relatively infrequent, it is a challenging problem in Amharic speech synthesis. In fact, the failure of Amharic orthography to show geminates is the main challenge in GTP conversion that we found in our research. Without context, it is impossible to disambiguate such forms.
On the other hand, when gemination is morphological rather than lexical, it is often possible to predict it from the orthography of the word alone. This is especially true for verbs (Bender and Fulass, 1978). For example, consider two words derived from the verb root consisting of the consonant sequence sbr ‘break’: ስበረው and ይሰበራሉ. The first is unambiguously /sɨbərəw/ ‘break (masc. sing.) it!’, the second unambiguously /yɨssəbbərallu/ ‘they are broken’. The fact that /s/ and /b/ are not geminated in the first word and are both geminated in the second, and that /r/ is geminated in neither, is inferable from the prefix, the suffix, and the pattern of stem vowels. That is, within the verb there is normally some redundancy. Therefore, with knowledge of the lexical and morphological properties of the language, it is possible to predict gemination.
3.2 Epenthesis
Epenthesis is the process of inserting a vowel to break up consonant clusters. Unlike gemination, epenthesis is not contrastive, so it is not surprising that it is not indicated in the orthography of Amharic and other languages. Although it carries no meaning, the Amharic epenthetic vowel /ɨ/ (in Amharic ‘ሰርጎ ገብ’; Baye, 2008) plays a key role in the proper pronunciation of speech and in syllabification. However, the nature and function of the epenthetic vowel has been a problem in Amharic studies, and so far no study has been conducted on its phonetic nature.
As noted above, Amharic script does not distinguish between consonants that are not followed by a vowel and consonants that are followed by the high central vowel /ɨ/; as shown in Fig. 1, both are represented by the sixth-order (ሳድስ) character in a series. The sixth-order characters are therefore ambiguous: depending on their position in a word, they can be either voweled (with the epenthetic vowel /ɨ/) or unvoweled. For example, in ልብ /lɨbb/ ‘heart’, the first character, ል, represents the CV sequence /lɨ/ (voweled), whereas in ስልክ /sɨlk/ ‘telephone’, the same character represents the bare consonant /l/ (unvoweled). Because such differences are crucial for speech synthesis, a TTS system needs access to the epenthesis rules.
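To make the ambiguity concrete, the following Python sketch applies a toy right-to-left syllable-template heuristic to a word written entirely with sixth-order characters. It assumes that the final syllable may end in two consonants, that earlier syllables end in at most one, and that each consonant is a single character. This heuristic is an illustration only: it is not the rule set implemented in HornMorpho, and it merely happens to reproduce the three examples cited in this paper.

```python
def insert_epenthetic(skeleton, vowel="ɨ"):
    """Insert an epenthetic vowel into a string of bare consonants.

    Works from the end of the word: the last syllable takes up to two
    coda consonants, every earlier syllable at most one, and each
    syllable gets exactly one onset consonant followed by the vowel.
    """
    consonants = list(skeleton)
    syllables = []
    coda_max = 2                       # CVCC allowed only word-finally
    while consonants:
        coda = []
        while len(coda) < coda_max and len(consonants) > 1:
            coda.insert(0, consonants.pop())
        onset = consonants.pop()
        syllables.insert(0, onset + vowel + "".join(coda))
        coda_max = 1                   # earlier syllables: at most CVC
    return "".join(syllables)
```

With these assumptions, the skeleton of ትምህርት (tmhrt) is expanded to tɨmhɨrt, and the skeleton of ስልክ (slk) to sɨlk.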
4 Proposed Amharic GTP Conversion Method
In this section, we first discuss Amharic morphology and the HornMorpho morphological analyzer, which we integrated into the AmhTTS system. We then briefly describe our proposed GTP conversion method.
4.1 Amharic Verb Morphology
Amharic is a morphologically complex language. As in other Semitic languages such as Arabic, Amharic verbs consist of a stem, analyzable as a combination of a lexical root and a pattern representing tense, aspect, mood, and various derivational categories, and various affixes representing inflectional categories (Bender and Fulass, 1978). Verb roots consist of consonant sequences, most of them of three consonants. The patterns that combine with roots to form verb stems consist of vowels that are inserted between the consonants and gemination of particular root consonants.
Consider the role of gemination in distinguishing the three main categories of three-consonant (triradical) roots, traditionally referred to as types A, B, and C (Bender and Fulass, 1978). Table 5 shows the pattern of gemination and the positions of vowels for triradical roots in the five basic tense-aspect-mood categories of the language. In type A, there is no gemination at all except in the perfective form. Type B is characterized by the gemination of the second consonant throughout the conjugation. Type C differs from the other two types in that the second consonant is geminated in the perfective and imperfective aspects only (Amsalu and Demeke, 2006). The system is complicated by the fact that each root can also occur in combination with up to 10 derivational categories, such as passive and causative.
Table 5. Vowel-gemination patterns for triradical roots (adopted from Amsalu and Demeke, 2006).
| | Type A | Type B | Type C |
|---|---|---|---|
| Perfective | C1VC2C2C3- | C1VC2C2VC3 | C1VC2C2VC3- |
| Imperfective | -C1VC2C3(-) | -C1VC2C2C3(-) | -C1VC2C2C3(-) |
| Imperative | C1C2VC3(-) | C1VC2C2C3(-) | C1VC2C3(-) |
| Gerund | C1VC2C3- | C1VC2C2C3- | C1VC2C3- |
| Infinitive | -C1C2VC3 | -C1VC2C2C3 | -C1VC2C3 |
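The gemination behavior of the three verb types, as described in the text and summarized in Table 5, can be written as a small lookup. The Python sketch below covers only the simple derivational category, and the names `TAM_FORMS`, `GEMINATING_FORMS`, and `second_consonant_geminated` are illustrative rather than part of any existing tool.

```python
TAM_FORMS = ("perfective", "imperfective", "imperative", "gerund", "infinitive")

# Forms in which the second root consonant (C2) is geminated, per verb type.
GEMINATING_FORMS = {
    "A": {"perfective"},                   # geminated in the perfective only
    "B": set(TAM_FORMS),                   # geminated throughout
    "C": {"perfective", "imperfective"},   # geminated in these two only
}

def second_consonant_geminated(verb_type, tam):
    """Return True if C2 is geminated for this verb type and form."""
    if tam not in TAM_FORMS:
        raise ValueError(f"unknown tense-aspect-mood form: {tam}")
    return tam in GEMINATING_FORMS[verb_type]
```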
The patterns shown in Table 5 are for the simple derivational category, which can be seen as the default for most roots, but each cell in the table could be augmented with up to nine other patterns. These derivational categories also affect gemination. For example, for the passive category, the imperfective pattern for Type A becomes -C1C1VC2C2VC3(-), with both the first and second consonants geminated.
Based on the gemination patterns, a morphological analyzer can locate geminates in an input word form. For example, the Amharic word ይሰበራል ‘it is broken’ is correctly pronounced with gemination (lengthening) of the first and second root consonants: yɨssəbbəral. The gemination in this case is grammatical, and a morphological analyzer can infer it from its knowledge of Amharic verb roots and the particular patterns that they occur with.
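As an illustration of how a root combines with a pattern, the following Python sketch fills the passive imperfective pattern for Type A, -C1C1VC2C2VC3(-), with the root sbr. The patterns in this paper give only vowel positions (V), so the vowel qualities are passed in explicitly, and the prefix yɨ- and suffix -al are taken from the example word. The helper name `fill_pattern` is hypothetical.

```python
import re

def fill_pattern(pattern, root, vowels):
    """Instantiate a C/V pattern with root consonants and given vowels.

    `pattern` uses C1, C2, ... for root consonants and V for a vowel slot;
    hyphens and parentheses (affix markers) are ignored.
    """
    vowel_iter = iter(vowels)
    pieces = []
    for token in re.findall(r"C\d|V", pattern):
        if token == "V":
            pieces.append(next(vowel_iter))
        else:
            pieces.append(root[int(token[1]) - 1])
    return "".join(pieces)

# Passive imperfective stem of the Type A root sbr 'break'.
stem = fill_pattern("-C1C1VC2C2VC3(-)", "sbr", "əə")
word = "yɨ" + stem + "al"
```

Here `stem` is ssəbbər and `word` is yɨssəbbəral, matching the pronunciation given above.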
4.2 ‘HornMorpho’ Morphological Analyzer
Amharic morphology, especially verb morphology, is extremely complex (Baye, 2008), but it is relatively well understood. Thus it is possible, with considerable effort, to create a morphological analyzer for the language using finite state techniques. HornMorpho (Gasser, 2011) is such a system. Given a word in conventional Amharic orthography, it infers the gemination of consonants in the word wherever this is possible, and it also extracts grammatical and lexical information from the word. The rules for epenthesis in Amharic are also quite complex and not completely understood, but a first approximation has been implemented in HornMorpho. That is, the output of HornMorpho includes the epenthetic vowel /ɨ/ wherever it is expected according to the rules in the program, effectively disambiguating the sixth-order orthographic characters and simplifying the overall GTP conversion.
To analyze a word, HornMorpho first romanizes it using a variant of the SERA romanization system. Next, the program checks whether the word is stored in a list of unanalyzable or pre-analyzed words. If not, it attempts a morphological analysis of the word. It first does this using a “lexical” approach, based on a built-in lexicon of roots or stems and its knowledge of Amharic morphology. If this fails, it tries to guess the structure of the word, using only its knowledge of morphology and the general structure of roots or stems. If the “guesser” analyzer also fails, the program gives up. Both the lexical and guesser analyzers operate within the general framework known as finite state morphology.¹
¹ For a more in-depth introduction to finite state morphology, see Beesley & Karttunen (2003); for another application of finite state technology to Amharic morphology, see Amsalu & Demeke (2006).
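The lookup order described in this section can be sketched as a simple fallback chain. The Python code below is not HornMorpho's API: `analyze` and its parameters are hypothetical, and the romanizer, stored-word list, and analyzers are toy stand-ins supplied by the caller.

```python
def analyze(word, romanize, stored, lexical, guesser):
    """Try each analysis strategy in turn and report which one succeeded.

    Returns (strategy_name, analyses), or None if every strategy fails.
    """
    roman = romanize(word)
    if roman in stored:                      # unanalyzable or pre-analyzed
        return ("stored", stored[roman])
    for name, analyzer in (("lexical", lexical), ("guesser", guesser)):
        analyses = analyzer(roman)
        if analyses:                         # first non-empty result wins
            return (name, analyses)
    return None                              # the program gives up

# Toy stand-ins, for illustration only.
TOY_ROMAN = {"ገና": "gəna", "ይሰበራል": "yɨsəbəral"}
toy_romanize = lambda w: TOY_ROMAN.get(w, w)
toy_stored = {"gəna": ["gəna", "gənna"]}
toy_lexical = lambda r: (["sbr + [sbj=3sm, asp=imprf, vc=ps]"]
                         if r == "yɨsəbəral" else [])
toy_guesser = lambda r: []
```

Calling `analyze("ይሰበራል", toy_romanize, toy_stored, toy_lexical, toy_guesser)` takes the lexical route, while an unknown string falls through every step and yields `None`.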
4.3 Finite State Morphology
Finite state morphology makes use of transducers that relate surface forms to lexical forms. In the case of HornMorpho, a surface form is the orthographic representation of an Amharic word, for example ይሰበራል yɨsəbəral, and the corresponding lexical form is the root of the word together with a set of grammatical features. For ይሰበራል, the lexical representation is sbr + [sbj=3sm, asp=imprf, vc=ps], that is, the root sbr ‘break’, third person singular masculine subject, imperfective aspect, and passive voice; in English, ‘he/it is broken’.
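A lexical form written in the notation used here, a root followed by a bracketed feature list, is easy to split into its parts. The following Python sketch parses that notation into a root string and a feature dictionary. The function name `parse_lexical` is illustrative, and the sketch assumes the exact `root + [key=value, ...]` layout shown in the example.

```python
import re

def parse_lexical(form):
    """Split 'root + [k=v, k=v]' into (root, {k: v})."""
    match = re.fullmatch(r"\s*(\w+)\s*\+\s*\[(.*)\]\s*", form)
    if match is None:
        raise ValueError(f"not a lexical form: {form!r}")
    root, body = match.groups()
    features = dict(
        item.strip().split("=", 1)       # 'sbj=3sm' -> ['sbj', '3sm']
        for item in body.split(",")
        if item.strip()
    )
    return root, features

root, features = parse_lexical("sbr + [sbj=3sm, asp=imprf, vc=ps]")
```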
Each arc joining two states in a finite state transducer (FST) represents an association between an input character and an output character. Successful traversal of an FST results in the transduction of a string of input characters into a string of output characters. In HornMorpho, using a technique developed by Amtrup (2003), we add grammatical features to the arcs as well, and the result of traversing the FST is the unification of the features on the arcs along with the output string. This allows us to input and output grammatical features as well as character strings.
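The idea of arcs that carry features can be illustrated with a very small deterministic transducer. In the Python sketch below, each arc maps an input character to an output string, a next state, and a feature dictionary, and traversal fails if the features on two arcs conflict. The arcs, states, and feature values are invented for the illustration, and unification is simplified to merging flat dictionaries; this is not HornMorpho's implementation.

```python
def unify(left, right):
    """Merge two flat feature dicts; return None if any value conflicts."""
    merged = dict(left)
    for key, value in right.items():
        if merged.get(key, value) != value:
            return None
        merged[key] = value
    return merged

# (state, input char) -> (output string, next state, features on the arc)
ARCS = {
    (0, "a"): ("A", 1, {"num": "sg"}),
    (1, "b"): ("B", 2, {"asp": "imprf"}),
    (1, "c"): ("C", 2, {"num": "pl"}),   # conflicts with num=sg above
}
FINAL_STATES = {2}

def transduce(text, arcs=ARCS, finals=FINAL_STATES):
    """Return (output, features) for an accepted input, else None."""
    state, output, features = 0, [], {}
    for ch in text:
        if (state, ch) not in arcs:
            return None                  # no arc for this character
        out, state, arc_features = arcs[(state, ch)]
        features = unify(features, arc_features)
        if features is None:
            return None                  # feature clash blocks the path
        output.append(out)
    return ("".join(output), features) if state in finals else None
```

The input "ab" is accepted with the features of both arcs combined, whereas "ac" is rejected because the two arcs disagree on the value of `num`.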
A simple FST can represent a phonological or orthographic rule, a morphotactic rule (representing possible sequences of morphemes), or the form of a particular root. Because gemination is both lexical and grammatical in Amharic, it plays a role in all three types. By combining a set of such FSTs using concatenation and composition, we can produce a single FST that embodies all of the rules necessary for handling words in the language. Since FSTs are reversible, this FST can be used in either direction: analysis (surface to lexical forms) or generation (lexical to surface forms). HornMorpho uses a single large FST to handle the analysis and generation of all verbs. This FST includes a lexicon of verb roots as well as all of the hundreds of phonological, orthographic, and morphological rules that characterize Amharic verbs.
4.4 Proposed GTP conversion Method
For Amharic text-to-speech, we use the existing HornMorpho analysis FST and a modified version of the HornMorpho generation FST. We first run the analysis FST on the input orthographic form. For example, for the word ይሰበራል, this yields sbr + [sbj=3sm, asp=imprf, vc=ps]. Next we run a phonological generation FST on this output, yielding the phonological form yɨssəbbəral for this example. The first and second root consonants, /s/ and /b/ (the second and third consonants of the word), are geminated by an FST that is responsible for the passive imperfective stem form of three-consonant roots.
Figure 2 shows the architecture of our proposed method. First, the morphological analyzer accepts Unicode text from a file and generates the corresponding phonemic transcription with appropriate prosodic markers, such as geminates, and indicates the locations of epenthetic vowels. The output of the analyzer is then passed to the AmhTTS system, where the text analysis module annotates it further to extract syllables and prosodic marks. Finally, the speech synthesis module generates the speech sounds.
The AmhTTS synthesis system is a parametric, rule-based system designed on the basis of the general speech synthesis system (Takara and Kochi, 2000). The text analysis module in AmhTTS extracts the linguistic and prosodic information, including gemination and other marks, from the output of the morphological analyzer and then converts it into a sequence of syllables using the syllabification rule. The syllabification rule uses the following syllable structures: V, VC, V’C, VCC, CV, CVC, CV’C, and CVCC.
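One simple way to obtain syllables of these shapes from a phonemic string is to treat every vowel as a nucleus and give each non-initial syllable a single onset consonant, leaving the remaining consonants in the preceding coda. The Python sketch below implements that assumption. It is not the syllabification rule used in AmhTTS, it treats every character as one phoneme, and it does not model the V’C and CV’C structures, whose notation is not defined in this paper.

```python
VOWELS = set("aeiouəɨ")

def syllabify(phonemes):
    """Split a phonemic string into syllables with single-consonant onsets."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        return [phonemes] if phonemes else []
    bounds = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        # If consonants separate two vowels, the last one starts the
        # next syllable; otherwise the boundary falls between the vowels.
        bounds.append(nxt - 1 if nxt - prev > 1 else nxt)
    bounds.append(len(phonemes))
    return [phonemes[a:b] for a, b in zip(bounds, bounds[1:])]
```

Under these assumptions, tɨmhɨrt is split into tɨm and hɨrt (CVC, CVCC), and yɨssəbbəral into yɨs, səb, bə, and ral.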
Figure 2. Proposed architecture for GTP conversion and automatic geminate and epenthetic vowel assignment.
[Figure components: Text → Preprocessing Morphological Analyzer (Grapheme-to-Phoneme (GTP) conversion: geminate assignment, epenthetic vowel assignment) → Amharic Text to Speech System (Text analysis: syllabification and phrasing; Prosody modeling: duration and intonation modeling; Speech synthesis) → Synthetic speech]
5 Evaluation
A preliminary evaluation of the proposed automatic geminate assignment method was carried out on 666 words, and gemination was assigned (restored) correctly for all of them (100%). The words were selected from Armbruster’s (1908) verb collections, in which gemination is marked manually, and represent all of the verb conjugation classes (Types A, B, and C). In Type A the penultimate consonant geminates in the Perfect only, in Type B the penultimate consonant geminates throughout the conjugation, and in Type C the penultimate consonant geminates in the Perfect and Contingent.
However, the analyzer does not analyze words that contain non-Ge’ez characters or Ge’ez numerals, and there is also an incomplete list of common words (with gemination) which it does not attempt to analyze. For words that it does handle, the output depends on whether the form is ambiguous: given the unambiguous form ይሰበራሉ, it outputs /yɨssəbbərallu/, and given the ambiguous form ገና, it outputs both possible pronunciations, /gəna/ and /gənna/. The pronunciation of words like ገና can only be determined by analyzing the context and finding the part of speech (POS) of the word, which is beyond the scope of the current work.
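A word-level score of the kind reported above can be computed by comparing the analyzer's output with manually marked forms. The Python sketch below is an illustration with toy data, not the evaluation script used in this study. In particular, counting an ambiguous word as correct when the reference is among the candidate pronunciations is an assumption, and `gemination_accuracy` is a hypothetical name.

```python
def gemination_accuracy(predictions, gold):
    """Fraction of gold words whose reference form is among the predictions.

    predictions: word -> list of candidate pronunciations
    gold:        word -> manually geminated reference pronunciation
    """
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(
        1 for word, reference in gold.items()
        if reference in predictions.get(word, ())
    )
    return correct / len(gold)

# Toy data built from the two examples in the text.
toy_predictions = {"ይሰበራሉ": ["yɨssəbbərallu"], "ገና": ["gəna", "gənna"]}
toy_gold = {"ይሰበራሉ": "yɨssəbbərallu", "ገና": "gənna"}
score = gemination_accuracy(toy_predictions, toy_gold)
```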
6 Conclusion
This paper discussed orthographic problems in the Amharic writing system and presented preliminary results on a method for the automatic assignment of geminates and epenthetic vowels in GTP conversion for an Amharic TTS system. Our method is the first such attempt for Amharic and can easily be customized for other Ethio-Semitic languages.
However, the work described in this paper is still in progress, and for more accurate GTP conversion a part-of-speech (POS) tagger and a phrase break predictor need to be implemented. For example, the pronunciation of the word ገና, which can be pronounced as either /gəna/ ‘still/yet’ or /gənna/ ‘Christmas’, can only be determined by analyzing the context and finding the POS of the word.
In the future, we plan to disambiguate words like ገና using syntax, to use the analyzer to extract more features, and finally to customize our TTS system for other languages.
Acknowledgments
The authors would like to thank Daniel Yacob for providing the Armbruster verb collections, in which gemination is marked manually with the “Tebek” symbol (U+135F).
References
A. Chalamandaris, S. Raptis, and P. Tsiakoulis. 2005. Rule-based grapheme-to-phoneme method for the Greek. In: Proc. of INTERSPEECH-2005, pp. 2937-2940, Lisbon, Portugal.
Amtrup, J. 2003. Morphology in machine translation systems: efficient integration of finite state transducers and feature structure descriptions. Machine Translation, 18, 213-235.
Armbruster, C. H. 1908. Initia Amharica: an introduction to spoken Amharic, Cambridge: Cambridge University Press.
Baye Yimam. 2008. የአማርኛ ሰዋሰው (Amharic Grammar), Addis Ababa. (in Amharic).
Beesley, K.R., and Karttunen, L. 2003. Finite state morphology. Stanford, California: CSLI Publications.
Ethnologue. 2004. Languages of the World, http://www.ethnologue.com/
Gasser, M. 2011. HornMorpho: a system for morphological analysis and generation of Amharic, Oromo, and Tigrinya words. Conference on Human Language Technology for Development, Alexandria, Egypt.
Leslau, Wolf. 1995. Reference Grammar of Amharic, Wiesbaden: Harrassowitz.
M.L. Bender, J.D. Bowen, R.L. Cooper and C.A. Ferguson. 1976. Language in Ethiopia, London: Oxford University Press.
M. Lionel Bender and Hailu Fulass. 1978. Amharic Verb Morphology: A Generative Approach, Carbondale.
Paul Taylor. 2005. Hidden Markov Model for Grapheme to Phoneme Conversion. In: Proc. of INTERSPEECH-2005, pp. 1973-1976.
R.I. Damper, Y. Marchand, M. J. Adamson, and K. Gustafson. 1998. A comparison of letter-to-sound conversion techniques for English text-to-speech synthesis, Proceedings of the Institute of Acoustics, 20 (6), pp. 245-254.
Saba Amsalu and Girma A. Demeke. 2006. Non-concatinative Finite-State Morphotactics of Amharic Simple Verbs, ELRC Working Papers Vol. 2; number 3.
T. Anberbir and T. Takara. 2006. Amharic Speech Synthesis Using Cepstral Method with Stress Generation Rule, INTERSPEECH 2006 ICSLP, Pittsburgh, Pennsylvania, pp. 1340-1343.
T. Takara and T. Kochi. 2000. General speech synthesis system for Japanese Ryukyu dialect, Proc. of the 7th WestPRAC, pp. 173-176.
73
... Gemination in Amharic is one of the most distinctive features of the speech's cadence, and it has great semantic and syntactic functional weight. In contrast to English, where the rhythm is mainly characterized by stress (loudness), the rhythm of Amharic is mainly characterized by longer and shorter syllables depending on the germination of consonants and certain characteristics of the phrase [1]. Amharic gemination is either lexical or morphological. ...
... Dialects: Amharic is considered one of the most challenging languages to be utilized in speech emotion recognition systems because of its huge lexical variety and complicated morphology [1]. There are four main types of Amharic dialect, namely Gojjam (Gojjamegna), Wollo (Wollogna), Shewa (Shewagna), and Gonder (Gonderegna) [21]. ...
... In total there were 65 participants (25 female, 40 male), aged from 20 to 40 years. Professional (20), Semi-professional (26), Amateur (19) Recording: In order to record the speech, we used six Huawei nova4 mobile phones, on which an Android-based speech recording software app [1] had been installed. Mobile phones were used because professional audio equipment was not available to us. ...
Preprint
Full-text available
In this paper we present the Amharic Speech Emotion Dataset (ASED), which covers four dialects (Gojjam, Wollo, Shewa and Gonder) and five different emotions (neutral, fearful, happy, sad and angry). We believe it is the first Speech Emotion Recognition (SER) dataset for the Amharic language. 65 volunteer participants, all native speakers, recorded 2,474 sound samples, two to four seconds in length. Eight judges assigned emotions to the samples with high agreement level (Fleiss kappa = 0.8). The resulting dataset is freely available for download. Next, we developed a four-layer variant of the well-known VGG model which we call VGGb. Three experiments were then carried out using VGGb for SER, using ASED. First, we investigated whether Mel-spectrogram features or Mel-frequency Cepstral coefficient (MFCC) features work best for Amharic. This was done by training two VGGb SER models on ASED, one using Mel-spectrograms and the other using MFCC. Four forms of training were tried, standard cross-validation, and three variants based on sentences, dialects and speaker groups. Thus, a sentence used for training would not be used for testing, and the same for a dialect and speaker group. The conclusion was that MFCC features are superior under all four training schemes. MFCC was therefore adopted for Experiment 2, where VGGb and three other existing models were compared on ASED: RESNet50, Alex-Net and LSTM. VGGb was found to have very good accuracy (90.73%) as well as the fastest training time. In Experiment 3, the performance of VGGb was compared when trained on two existing SER datasets, RAVDESS (English) and EMO-DB (German) as well as on ASED (Amharic). Results are comparable across these languages, with ASED being the highest. This suggests that VGGb can be successfully applied to other languages. We hope that ASED will encourage researchers to experiment with other models for Amharic SER.
... A set of sequential rewrite rules can be used to achieve this. Agglutinative languages like Turkish [8] and Amharic [9] have reported works on language specific knowledge based g2p conversion using FST technology. Epitran [3], an open source tool using rule based FSTs for g2p conversion of more than 61 world languages recently added Malayalam support, with preliminary mapping between graphemes and phonemes. ...
... The graph in Fig. 21 illustrates the phonemic richness of the corpora used in the ASR experiments described in Section VIII. The phoneme with the highest number of appearances is the inherent vowel /a/, followed by the vowel /i/ in all the corpora 9 Mlphon Web Interface https://phon.smc.org.in/ under consideration. ...
Article
Full-text available
In this article we present the design and development of a knowledge-based computational linguistic tool, Mlphon [em.el.foːɳ], for the Malayalam language. Mlphon computationally models linguistic rules using finite state transducers and performs multiple functions including grapheme to phoneme (g2p) and phoneme to grapheme (p2g) conversions, syllabification, phonetic feature analysis and script grammar check. This open source software tool, released under the MIT license, is developed as a one-stop solution to handle different speech-related text processing tasks for automatic speech recognition, text to speech synthesis and non-speech natural language processing tasks including syllable subword based language modeling, phoneme diversity analysis and text sanity check. The tool is evaluated on a manually crafted gold standard lexicon. Mlphon performs orthographic syllabification with 99% accuracy and a syllable error rate of 0.62% on the gold standard lexicon. For the grapheme to phoneme conversion task, an overall phoneme recognition accuracy of 99% with a phoneme error rate of 0.55% is obtained on the gold standard lexicon. Additionally, an extrinsic evaluation of Mlphon is performed by employing the pronunciation lexicon created using Mlphon in a Malayalam automatic speech recognition (ASR) task. Performance analysis in terms of the computation time of the lexicon creation process and the word error rate (WER) on the ASR task is presented, along with a comparison against other automated tools for lexicon creation. Pronunciation lexicons with more than 100k commonly used Malayalam words in phonemised and syllabified forms are created and published as open language resources along with this work. We also demonstrate the usage of Mlphon in different natural language processing applications: syllable subword ASR, assisted pronunciation learning, phoneme diversity analysis and text sanity check. Being a knowledge-based solution with open source code, Mlphon can be adapted to other languages of similar script nature.
... Gemination in Amharic is one of the most distinctive features of the speech's cadence, and it has great semantic and syntactic functional weight. In contrast to English, where the rhythm is mainly characterized by stress (loudness), the rhythm of Amharic is mainly characterized by longer and shorter syllables depending on the gemination of consonants and certain characteristics of the phrase [2]. Amharic gemination is either lexical or morphological. ...
... Dialects: Amharic can be considered a challenging language for SER because of its huge lexical variety and complicated morphology [2]. There are four main Amharic dialects, namely Gojjam (Gojjamegna), Wollo (Wollogna), Shewa (Shewagna), and Gonder (Gonderegna) [32]. ...
Article
In this paper we present the Amharic Speech Emotion Dataset (ASED), which covers four dialects (Gojjam, Wollo, Shewa and Gonder) and five different emotions (neutral, fearful, happy, sad and angry). We believe it is the first Speech Emotion Recognition (SER) dataset for the Amharic language. 65 volunteer participants, all native speakers of Amharic, recorded 2,474 sound samples, two to four seconds in length. Eight judges (two for each dialect) assigned emotions to the samples with a high level of agreement (Fleiss kappa = 0.8). The resulting dataset is freely available for download. Next, we developed a four-layer variant of the well-known VGG model which we call VGGb. Three experiments were then carried out using VGGb for SER, using ASED. First, we investigated which features work best for Amharic: FilterBank, Mel Spectrogram, or Mel-frequency Cepstral Coefficient (MFCC). This was done by training three VGGb SER models on ASED, using FilterBank, Mel Spectrogram and MFCC features respectively. Four forms of training were tried: standard cross-validation, and three variants based on sentences, dialects and speaker groups. In these variants, a sentence used for training would not be used for testing, and the same held for a dialect and a speaker group. MFCC features were superior under all four training schemes. MFCC was therefore adopted for Experiment 2, where VGGb and three well-known existing models were compared on ASED: ResNet50, AlexNet and LSTM. VGGb was found to have very good accuracy (90.73%) as well as the fastest training time. In Experiment 3, the performance of VGGb was compared when trained on two existing SER datasets, RAVDESS (English) and EMO-DB (German), as well as on ASED (Amharic). Results are comparable across these languages, with ASED being the highest. This suggests that VGGb can be successfully applied to other languages. We hope that ASED will encourage researchers to explore the Amharic language and to experiment with other models for Amharic SER.
... The process of converting a target word from its written form (grapheme) to its pronunciation form (phoneme) is known as grapheme-to-phoneme (GTP) conversion [34]. For example, the phonemes for the word ሀገር (country) are ሀ (h a), ገ (g aa), ር (r ee). ...
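The character-by-character mapping in the quoted example can be sketched as a simple table lookup. This is only an illustrative sketch: the table `FIDEL_TO_PHONEMES` and the function `grapheme_to_phoneme` are hypothetical names, and the table holds just the three characters and phoneme labels that appear in the snippet above.

```python
# Hypothetical toy table: only the three characters from the quoted example,
# with the phoneme labels copied verbatim from that example.
FIDEL_TO_PHONEMES = {
    "ሀ": "h a",
    "ገ": "g aa",
    "ር": "r ee",
}


def grapheme_to_phoneme(word):
    """Map each character of `word` to its phoneme string via table lookup."""
    missing = [ch for ch in word if ch not in FIDEL_TO_PHONEMES]
    if missing:
        # The toy table is deliberately incomplete, so fail loudly.
        raise KeyError(f"no phoneme entry for: {missing}")
    return [FIDEL_TO_PHONEMES[ch] for ch in word]


print(grapheme_to_phoneme("ሀገር"))
```

A per-character lookup of this kind cannot recover gemination or epenthetic vowels, since neither is shown in the orthography. That is the gap the morphological analyzer described in the paper is meant to fill.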
Article
More than 7000 languages are spoken in the world today. Amharic is one of the languages spoken in the East African country Ethiopia. A lot of speech data is being produced every day in different languages as machines are getting better at processing and have improved storage capacity. However, searching for a particular word with its respective time frame inside a given audio file is a challenge. Since Amharic has its own distinguishing characteristics, such as glottal, palatal, and labialized consonants, it is not possible to directly use models that are developed for other languages. A popular approach in developing systems for searching particular information in speech involves using an automatic speech recognition (ASR) module that generates the text version of the speech, in which the word or phrase is then searched based on a text query. However, it is not possible to transcribe a long audio file without segmentation, which in turn affects the performance of the ASR module. In this paper, we report our investigation on the effects of manual and automatic speech segmentation of Amharic audio files in a spiritual domain. We have used manual segmentation as a baseline for our investigation and found that sentence-like automatic segmentation resulted in a word error rate (WER) closer to the WER achieved on the manually segmented test speech. Based on the experimental results, we propose Amharic speech search using text word query (ASSTWQ) based on automatic sentence-like segmentation. Since we have achieved lower WER using the previously developed speech corpus, which is in a broadcast news domain, together with the in-domain speech corpus, we recommend using both in- and out-domain speech corpora to develop the Amharic ASR module. The proposed ASR achieves a WER of 53%, which needs further improvement. Combining two language models (LMs) developed using training text from the two different domains (spiritual and broadcast news) allowed a WER reduction from 53% to 46%. Therefore, we have developed two ASSTWQ systems using the two ASR modules with WERs of 53% and 46%.
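For readers unfamiliar with the metric, the word error rate quoted in this abstract is conventionally the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code of the cited work. The function names are hypothetical, and whitespace tokenization is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] is the distance between the ref prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

The same `edit_distance` function applied to phoneme sequences instead of word sequences gives a phoneme error rate.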
Chapter
We present in this chapter some basic linguistic facts about Semitic languages, covering orthography, morphology, and syntax. We focus on Arabic (both standard and dialectal), Ethiopian languages (specifically, Amharic), Hebrew, Maltese and Syriac. We conclude the chapter with a contrastive analysis of some of these phenomena across the various languages.
Article
In this paper we have described nonconcatenative finite-state morphotactics of Amharic simple verbs. A morphological analyzer (transducer) that analyses simple Amharic verbal stems into their roots and feature tags is developed. The transducer also functions as a morphological synthesizer. It has an interface that works for Amharic text written in Unicode-encoded Fidel script. We used the Xerox Finite-State Tool (XFST) and Lexicon Compiler (LEXC) to construct the finite-state lexical transducer.
Article
The finite-state paradigm of computer science has provided a basis for natural-language applications that are efficient, elegant, and robust. This volume is a practical guide to finite-state theory and the affiliated programming languages lexc and xfst. Readers will learn how to write tokenizers, spelling checkers, and especially morphological analyzer/generators for words in English, French, Finnish, Hungarian, and other languages. Included are graded introductions, examples, and exercises suitable for individual study as well as formal courses. These take advantage of widely-tested lexc and xfst applications that are just becoming available for noncommercial use via the Internet.
Article
We present a finite state morphology system augmented with typed feature structures as weights on transitions. This mechanism allows the use of highly efficient finite state approaches for morphological analysis and generation, while providing the rich linguistic descriptions often used in Machine Translation systems. Using a semiring interpretation, the weight of a morphological analysis result represents the possible linguistic interpretations of an input word, while the resulting character string itself represents the lemma of the input. Long-distance phenomena and infixation can be handled in an easy and elegant manner, simultaneously providing a seamless interface to subsequent linguistic processing modules. Two extensions to the basic model are discussed: the incorporation of lexical knowledge into the finite state transducer and a transformation that renders unification-based finite state models as efficient as those employing other weight structures. The model is applied to morphological operations in a Persian–English Machine Translation system.