Joint Optimization of Word Alignment
and Epenthesis Generation for Chinese
to Taiwanese Sign Synthesis
Yu-Hsien Chiu, Chung-Hsien Wu, Senior Member, IEEE, Hung-Yu Su, and Chih-Jen Cheng
Abstract—This work proposes a novel approach to translate Chinese to Taiwanese sign language and to synthesize sign videos. An aligned bilingual corpus of Chinese and Taiwanese Sign Language (TSL) with linguistic and signing information is also presented for sign language translation. A two-pass alignment in syntax level and phrase level is developed to obtain the optimal alignment between Chinese sentences and Taiwanese sign sequences. For sign video synthesis, a scoring function is presented to develop motion transition-balanced sign videos with rich combinations of intersign transitions. Finally, the maximum a posteriori (MAP) algorithm is employed for sign video synthesis based on joint optimization of two-pass word alignment and intersign epenthesis generation. Several experiments are conducted in an educational environment to evaluate the performance on the comprehension of sign expression. The proposed approach outperforms the IBM Model 2 in sign language translation. Moreover, deaf students perceived sign videos generated by the proposed method to be satisfactory.
Index Terms—Taiwanese sign language, language translation, sign language synthesis, video concatenation.
1 INTRODUCTION
Sign language is a visual/gestural language that serves as the primary means of communication for deaf individuals, just as spoken languages are used among the hearing [1], [2]. Deaf individuals encounter the difficulty that most hearing individuals communicate with spoken language. A language barrier exists between these two populations. Media, such as books, newspapers, and TV news, are presented visually in written and spoken language, rather than in the sign language with which they are most familiar. Conversely, hearing people who communicate using sign language as their second language also experience difficulties in sign grammar and production [3]. The current state of the technologies utilized to provide deaf and hearing individuals with access to information and communication with each other is inadequate [4], [5]. The support provided by most computerized grammar checkers and language translators does not provide the specific requirements for translating sign language from the written or spoken language. Accordingly, designing a sign generation system is an attempt to exploit the concurrent effects of linguistic and gestural characteristics of TSL to enable sign expression to be learned intuitively.
Sign language translation research concentrates on generating written language from sign language by using image recognition and virtual reality gloves [6], [7], [8], [9], [10], [11]. Gesture features involving hand shapes, movement, position, and palm orientation have been employed to recognize and segment sign image sequences. However, these methods focus on image processing more than machine translation and sign language analysis and have limited vocabularies.
Natural language processing is typically used to translate from written to sign language. Brown [3] developed a transfer-based model using lexical functional grammar to create a representation of American Sign Language (ASL) from English, in which correspondences between English and ASL are defined manually. He also reviewed other work and the limitations relating to computational linguistics and ASL. Many statistical approaches for machine translation and language modeling have recently been addressed [12], [13]. Alignment probability estimation is applied to resolve hard decisions at any level of the translation process, such as the levels of words, phrases, and sentences. Representative systems include the IBM models [14] and Verbmobil [15]. The IBM models, introduced by Brown et al., are a series of five statistical models for machine translation, which quantify information such as alignment of word position and translation of word pairs from bilingual corpora. Since the IBM model is a statistical approach, it requires a large corpus for model training. The IBM model is therefore not a good choice for machine translation between languages using only a small bilingual corpus.
Several researchers proposed various approaches for visual display of sign language by using signing avatars and 3D animation [16], [17], [18], [19], [20] based on motion generation [21], [22], [23], [24], [25]. Kennaway et al. [16] developed an avatar approach, based on motion capture and virtual reality, to synthesize sign animation from a high-level description of signs in terms of the HamNoSys transcription system. Synthesizing realistic 3D animation requires not only driving an avatar precisely but also smoothing the textures and movements and, accordingly, is time-consuming. Related works based on 3D animation and talking heads were considered by [21], [22], [23], [24], [25] and [26], [27], [28], respectively, but are impractical due to the real-time production issues and computational complexity. Since synthesizing realistic video frames only considers the smoothness of concatenated videos, the dimension reduction (from 3D to 2D) greatly decreases the complexity of the synthesis system. Solina and Krapež [29] presented an approach for concatenating sign video clips into video clips of complete sign sequences of Slovene Sign Language (SSL). They eliminated redundant moves of the hands and produced smooth transitions between sign videos. Their concatenation criteria are based on the sense of similar palm positions between video clips. However, the motion in the synthesized video sequence still suffers from discontinuity and only the difference of palm positions is considered.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 1, JANUARY 2007
Y.-H. Chiu is with the Home Network Technology Center, Industrial Technology Research Institute, N200, ITRI Bldg. R1, No. 31, Gongye 2nd Rd., Annan District, Tainan City 709, Taiwan, ROC. E-mail: chiuyh@itri.org.tw.
C.-H. Wu, H.-Y. Su, and C.-J. Cheng are with the Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, Ta-Hsueh Road, Tainan City 701, Taiwan, ROC. E-mail: {chwu, elfsu, chengc}@csie.ncku.edu.tw.
Manuscript received 1 Dec. 2005; revised 27 Apr. 2006; accepted 24 May 2006; published online 13 Nov. 2006.
Recommended for acceptance by T. Darrell.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-0665-1205.
0162-8828/07/$20.00 © 2007 IEEE. Published by the IEEE Computer Society
This work proposes a sequence correspondence model between Chinese and TSL by the hierarchical alignment of different levels of grammatical components. This hierarchical two-pass alignment scheme can reduce the complexity of the standard IBM Model 2 and is suitable for translation with a small bilingual corpus. Grammatical components at the syntax and phrase levels are defined to tokenize sentences into sequences of primary units (PUs) and secondary units (SUs). Consecutive secondary units are further concatenated into a phrase fragment (PF), which is also regarded as a PU at the syntactical level. In the first-pass alignment, primary units, including phrase fragments, are aligned at the syntax level. In the second-pass alignment, secondary units within the phrase fragments are aligned at the phrase level. An alignment approach modified from IBM Model 2 is presented to model the correspondence between the transliterated word/sentence pairs. A sign video database with motion transition information was developed for sign synthesis. Finally, the maximum a posteriori (MAP) algorithm is employed for sign synthesis based on joint optimization of two-pass word alignment and intersign epenthesis generation. Fig. 1 displays the system diagram for sign translation and epenthesis generation.
The remainder of this paper is organized as follows:
Section 2 describes the development of several significant
corpora applied in this work. Section 3 then presents a
model that translates Chinese to concatenated sign videos.
Next, Section 4 summarizes the experimental results.
Conclusions are finally drawn in Section 6, along with
suggestions for future research.
2 CORPUS DEVELOPMENT
Sign language is a natural language based on vision and space. The meaning of a sign is richer than that of a word. The bilingual corpus used herein was annotated with information concerning grammar and alignment from Chinese to TSL. Spatial information, such as initial/final gestures, hand positions, and moving types, was manually tagged. To define the hand position, the signing space of the sign image was simplified in this study to 25 regions. Since the intersign region between the final region of the preceding sign video and the initial region of the succeeding sign video is crucial for achieving smooth video concatenation, all possible movement transitions between two signs have to be included in the corpus, called the transition-balanced corpus, for sign video filming. Fig. 2 illustrates a frame of the sign-word “sister,” which comprises the simplified signing space and the possible movements of the right hand. The spatial information of a sign is annotated to build a sign transition-balanced corpus.
Fig. 1. System diagram for sign translation and epenthesis generation.
Fig. 2. Simplified signing space. The dotted arrows indicate the possible transitions.
2.1 Bilingual Corpus with Annotated TSL Information
Two thousand one hundred fifty-nine frequently used signs were gathered from teaching materials for deaf students in primary schools [30]. Professor Jane S. Tsay of the Department of Linguistics at National Chung Cheng University, Taiwan, annotated the TSL information. Table 1 lists several examples with different transliteration patterns in the aligned parallel corpus. The syntactic information was extracted automatically using the CKIP AutoTag system [31] during the annotation process. A Chinese lexicon and parser were employed to obtain the part-of-speech (POS) sequence and the grammatical structure of a sentence. The HowNet knowledge base developed by Dong [32] was adopted to analyze semantic roles, such as agents, times, and places.
Two grammatical components, primary units (PUs) and secondary units (SUs), as listed in Table 2, are defined by analyzing the bilingual sentences. Primary units are defined as the main components of a Chinese sentence, such as verbs, specific nouns including proper nouns, places, and time nouns, adverbs (for modifying verbs), and conjunctions. Secondary units are defined as the phrases that contain common nouns, quantity nouns, adjectives, and adverbs (for modifying adjectives). The bilingual corpus can be tokenized and denoted as PUs and SUs based on the defined grammatical components. Furthermore, consecutive SUs are grouped as a phrase fragment (PF), which is treated as a PU. Fig. 3 shows the tokenization process for the sentence, “This meeting needs five sign language (SL) translators to help.” According to Table 2, the Chinese sentence can be tokenized as {SU1: This (Nep), SU2: meeting (Na), PU1: need (VK), SU3: five (Neu), SU4: SL (Na), SU5: translator (Na), PU2: help (VC)}. The consecutive SUs can be further concatenated and represented as phrase fragments (PFs). Hence, the Chinese sentence is represented as:
[PF1: {SU1: This (Nep), SU2: meeting (Na)}, PU1: need (VK), PF2: {SU3: five (Neu), SU4: SL (Na), SU5: translator (Na)}, PU2: help (VC)].
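The grouping of consecutive SUs into phrase fragments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set `SECONDARY_POS` is a hypothetical stand-in covering just the POS tags of the example sentence (the full mapping is the one listed in Table 2), and the function name `tokenize` is not from the paper.

```python
from itertools import groupby

# Hypothetical SU tag set covering only the example sentence;
# the complete PU/SU mapping is the one given in Table 2.
SECONDARY_POS = {"Nep", "Neu", "Na"}


def tokenize(tagged_words):
    """Group each run of consecutive secondary units (SUs) into one phrase fragment (PF).

    Every word whose POS is not in SECONDARY_POS is kept as its own primary unit (PU).
    """
    units = []
    for is_su, run in groupby(tagged_words, key=lambda wp: wp[1] in SECONDARY_POS):
        run = list(run)
        if is_su:
            units.append(("PF", run))                  # one PF per run of SUs
        else:
            units.extend(("PU", [wp]) for wp in run)   # each PU stands alone
    return units


sentence = [("This", "Nep"), ("meeting", "Na"), ("need", "VK"),
            ("five", "Neu"), ("SL", "Na"), ("translator", "Na"), ("help", "VC")]
units = tokenize(sentence)
```

For the example sentence, `units` contains four syntax-level units in the order PF, PU, PF, PU, mirroring the bracketed representation above.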
In the developed parallel corpus, 1,983 Chinese sentences were aligned to TSL sequences, yielding 3,966 aligned parallel sentences. The average length of sign words was 5.8. Table 3 shows the average number of occurrences of grammatical components in each sentence. Appendix A presents the grammatical components and POSs defined in this work.
TABLE 1
Examples of the Aligned Parallel Texts
TABLE 2
A Set of Grammatical Components
The model extracted 1,983 distinct aligned primary unit sequences (average length = 4.6) and 639 distinct aligned secondary unit sequences (average length = 2.7). The complexity of the language model was evaluated according to perplexity, defined as the average word branching factor of a language model:
$$\mathrm{Perplexity} = 2^{H} = \prod_{i=1}^{N} P(w_i \mid w_{i-1}, w_{i-2}, \ldots, w_{i-N+1})^{-1/N}, \tag{1}$$
where $H$ is the estimated entropy, which is the average amount of information in a given word sequence $\{w_1, w_2, \ldots, w_N\}$. A low perplexity usually indicates a good language model. This work used the bi-gram model as the baseline model for comparison with the POS-based and two-pass alignment models. Table 4 presents the perplexity comparison of the three models.
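As a rough illustration of (1) for the bi-gram baseline, the following Python sketch computes the perplexity of a test set. The add-one smoothing, the `<s>` start symbol, and the function name are assumptions made here for the sake of a runnable example; the paper does not state how its baseline was smoothed.

```python
import math
from collections import Counter


def bigram_perplexity(train, test, vocab_size):
    """Perplexity 2**H of an add-one-smoothed bi-gram model evaluated on `test`.

    `train` and `test` are lists of sentences (lists of words); `vocab_size`
    is the number of distinct words that may follow a history.
    """
    # Count how often each word serves as a history and each (history, word) pair.
    history = Counter(prev for sent in train for prev in ["<s>"] + sent[:-1])
    bigrams = Counter(pair for sent in train for pair in zip(["<s>"] + sent, sent))
    log2_sum, n = 0.0, 0
    for sent in test:
        for prev, cur in zip(["<s>"] + sent, sent):
            p = (bigrams[(prev, cur)] + 1) / (history[prev] + vocab_size)
            log2_sum += math.log2(p)
            n += 1
    entropy = -log2_sum / n      # H: average number of bits per word
    return 2 ** entropy
```

With no training data every word is equally likely, so the perplexity equals the vocabulary size, which matches the reading of perplexity as an average branching factor.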
2.2 Transition Balanced Corpus and Database Development
Since the intersign region between the final region of the preceding sign video and initial region of the succeeding sign video is crucial for obtaining smooth video concatenation, all possible movement transitions between two signs need to be included in the transition-balanced corpus for sign video filming. The TSL linguist helped annotate the hand positions defined in the simplified sign space. The sign database consisted of 891 two-handed gestures, 417 right-handed gestures, and eight left-handed gestures (without compound sign words). Fig. 4 illustrates the number of occurrences of the initial and final positions with the right-handed gestures displayed as a 2D gray-scale image on the simplified signing space. Most initial and final positions of the right-handed gestures occur in front of the face. Fig. 5 displays the number of occurrences of initial and final positions with two-handed gestures. Most initial and final positions of the two-handed gestures occur in front of the chest.
To obtain all possible concatenation information between two signs, a POS-based sign word replacement method is presented to build a large set of sign sequences. Every sign word in the 1,983 aligned parallel sentences was replaced by a sign word with the same POS in the sign database (2,159 frequently used signs). For instance, given the sign sequence “Mother is very busy” {(Mother)/Na, (Busy)/VH, (Very)/Dfa}, the sign words with the same POS, {(Student)/Na} and {(Brother)/Na}, can be applied to replace the sign word {(Mother)/Na} and generate two expanded sign sequences. In this example, the motion transitions {(Student) → (Busy)} and {(Brother) → (Busy)} can be obtained by the replacement method. To select a suitable sentence for video filming, a scoring method is adopted to rank and screen the expanded sentence $S_i = w^i_1, \ldots, w^i_j, \ldots, w^i_{N(S_i)}$:
$$STS(S_i) = \left( \prod_{j=1}^{N(S_i)} P(w^i_j) \right)^{1/N(S_i)} \times \left( \prod_{j=1}^{N(S_i)-1} P(w^i_{j+1} \mid w^i_j) \right)^{1/(N(S_i)-1)}, \tag{2}$$
where $N(S_i)$ is the number of sign words in sentence $S_i$, $P(w^i_j)$ denotes the word frequency of $w^i_j$, and $P(w^i_{j+1} \mid w^i_j)$ represents the bi-gram probability. This equation reveals that a corpus with higher sentence scores contains more frequently used and grammatically correct sign words.
Fig. 3. An instance of Chinese to TSL translation using the two-pass alignment approach.
TABLE 3
Average Number of Occurrences of Grammatical Components in Each Sentence
TABLE 4
Perplexity Evaluation Results for the Baseline, the POS-Based, and the Grammatical Component-Based Approaches
Based on this corpus, a sentence selection algorithm was designed to extract the smallest possible number of sentences containing all possible transition information. First, the number of occurrences of the possible transition patterns for each sign word extracted from the expanded corpus was counted. The transition patterns were defined as video clips between the final hand position of sign word $w_j$ and the initial hand position of the succeeding sign word $w_{j+1}$. For each sign word, a 25 × 25 transition number matrix was defined with respect to 25 × 25 possible movements from each of the 25 regions in the simplified signing space to the others. An example of the transition occurrence matrix $T_j$ of sign word $w_j$ is represented as
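$T_j$ = [a 25 × 25 matrix of occurrence counts with rows $F_1, F_2, \ldots, F_{24}, F_{25}$ and columns $I_1, I_2, \ldots, I_{24}, I_{25}$; the example entries are mostly 0, with nonzero counts 12, 1, and 5]. (3)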
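A minimal Python sketch of how such per-word transition occurrence matrices might be accumulated is given below. It assumes that rows index the final region of the current sign and columns index the initial region of the succeeding sign, that regions are numbered 0 to 24, and that each sign is stored as a `(word, initial_region, final_region)` tuple; all of these encodings are hypothetical.

```python
from collections import defaultdict

N_REGIONS = 25  # number of regions in the simplified signing space


def transition_matrices(sentences):
    """Count (final region -> next initial region) transitions for every sign word.

    Each sentence is a list of (word, initial_region, final_region) tuples.
    Returns a dict mapping each word to a 25 x 25 list-of-lists count matrix.
    """
    mats = defaultdict(lambda: [[0] * N_REGIONS for _ in range(N_REGIONS)])
    for sent in sentences:
        # Pair every sign with its successor in the same sentence.
        for (word, _, final), (_, next_initial, _) in zip(sent, sent[1:]):
            mats[word][final][next_initial] += 1
    return mats
```

A word that only ever appears at the end of a sentence contributes no transitions and therefore has no matrix until it is first looked up.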
A large value in the transition matrix indicates a frequently occurring transition pattern in the expanded corpus. The candidate sentence is chosen by considering the frequency and the value in the transition matrix of the sign words in the sentence. Therefore, an information scoring scheme for selecting the most appropriate sentences is defined as follows:
$$Select(S_i) = STS(S_i) \times \frac{NTP(S_i)}{TP(S_i)}, \tag{4}$$
where $NTP(S_i)$ denotes the number of transition patterns in the expanded sentence $S_i$ not included in the balanced corpus, and $TP(S_i)$ represents the total number of transition patterns in the expanded sentence $S_i$. A large value of $NTP(S_i)/TP(S_i)$ demonstrates that sentence $S_i$ contributes many transition patterns and, therefore, has high priority for selection. For each transition pattern in a certain sign word, only the sentence with the highest $Select(S_i)$ value can be chosen and added to the balanced corpus.
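The two scores in (2) and (4) can be sketched as follows. The probability tables are plain dictionaries with made-up contents, the representation of a transition pattern is left to the caller, and the function names are illustrative only; the sketch also assumes every sentence has at least two sign words.

```python
import math


def sts(words, unigram, bigram):
    """Geometric-mean word frequency times geometric-mean bi-gram probability, as in (2)."""
    n = len(words)
    uni = math.prod(unigram[w] for w in words) ** (1 / n)
    bi = math.prod(bigram[(a, b)] for a, b in zip(words, words[1:])) ** (1 / (n - 1))
    return uni * bi


def select_score(words, patterns, covered, unigram, bigram):
    """Select(S) = STS(S) * NTP(S) / TP(S), as in (4).

    `patterns` lists the transition patterns found in the sentence and
    `covered` is the set of patterns already present in the balanced corpus.
    """
    new_patterns = sum(1 for p in patterns if p not in covered)
    return sts(words, unigram, bigram) * new_patterns / len(patterns)
```

A sentence whose transition patterns are all covered already scores zero, so it is never preferred over a sentence that adds at least one new pattern.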
Many online resources involving sign images and videos can be found for sign video synthesis. However, most sign video databases lack the transition information from one sign word to the next. The initial and final gestures in these sign video databases are generally performed in front of the abdomen, so the discontinuity problem typically appears in the video synthesis output. This study applied an alternative way to acquire sign videos from the sentence-level videos. To achieve this purpose, a systematic method was adopted for developing a sign video database with rich information on motion transitions between sign videos.
Fig. 4. Number of occurrences of (a) initial and (b) final positions associated with right-handed gestures displayed as a 2D gray-scale image on the
simplified sign space.
Fig. 5. Number of occurrences of (a) initial and (b) final positions
associated with two-hand gestures illustrated as a 2D gray-scale image
on the simplified sign space.
A sentence-level sign video database was developed from the transition-balanced corpus. The imaging conditions include a fixed distance between the signer and the camera and consistent dressing and lighting during a two-month filming period. All videos were calibrated for light conditions and signing space positions. The same linguist who conducted the annotation of hand positions also helped segment sign videos. Each sign video is annotated with the complete signing process from the initial gesture to the final gesture and the transition regions between the preceding and succeeding sign words.
According to the sentence selection algorithm, 409 transition-balanced sentences were selected from the expanded sentences database for inclusion in the transition-balanced corpus. The corresponding video sequences were filmed, segmented, and annotated based on this corpus. The linguist helped segment the video sequences and annotate the positions of the initial and final gestures and the transition regions of each sign video. In total, 1,810 sign videos were extracted. The average number of video frames in the initial and final transition regions was 7.4 and 12.6, respectively.
3 TAIWANESE SIGN TRANSLATION
IBM models [14] have been proven to be helpful for machine translation. IBM Model 2, which is based on word-by-word alignment, models a translation sentence $\tilde{T}$ from source language $S$ with length $m$ to target language $T$ with length $l$ as follows:
$$\tilde{T} = \arg\max_{T} P(T \mid S) \approx \arg\max_{T} P(S \mid T)\, P(T), \tag{5}$$
where $P(S \mid T)$ denotes the translation model and $P(T)$ is the language model of target language $T$. The translation model $P(S \mid T)$ is estimated as:
$$P(S \mid T) = P(S, a \mid T) = \varepsilon(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} t(S_j \mid T_i)\, a(i \mid j, l, m), \tag{6}$$
where $\varepsilon(m \mid l)$ denotes the string length probability, $t(S_j \mid T_i)$ represents the translation probability for translating $S_j$ to $T_i$, and $a(i \mid j, l, m)$ is the alignment probability for aligning position $j$ of the source sentence with length $m$ to position $i$ of the target sentence with length $l$. The most probable sentence $\tilde{T}$ of the target language is obtained by IBM Model 2:
$$\begin{aligned}
\tilde{T} &= \arg\max_{T} P(T \mid S) \approx \arg\max_{T} P(S \mid T)\, P(T) \\
&= \arg\max_{T} \left( \varepsilon(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} t(S_j \mid T_i)\, a(i \mid j, l, m) \right) P(T).
\end{aligned} \tag{7}$$
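To make the structure of (6) concrete, here is a small Python sketch that evaluates the IBM Model 2 likelihood from given parameter tables. The dictionary layout (`t[(source_word, target_word)]`, `a[(i, j, l, m)]`), the use of `None` for the empty word at target position 0, and the default `eps=1.0` are assumptions of this sketch, not details from the paper.

```python
def ibm2_likelihood(src, tgt, t, a, eps=1.0):
    """P(S|T) = eps * prod_j sum_i t(S_j|T_i) * a(i|j,l,m), following (6).

    `src` and `tgt` are word lists; target position 0 is the empty (NULL) word.
    Missing table entries are treated as probability 0.
    """
    tgt = [None] + list(tgt)              # index 0 = NULL word
    l, m = len(tgt) - 1, len(src)
    prob = eps
    for j, s in enumerate(src, start=1):  # source positions 1..m
        prob *= sum(
            t.get((s, w), 0.0) * a.get((i, j, l, m), 0.0)
            for i, w in enumerate(tgt)    # target positions 0..l
        )
    return prob
```

Each source position contributes a sum over all `l + 1` target positions, so the number of parameters involved grows with sentence length, which is one reason the model needs a large training corpus.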
Since IBM Model 2 is a statistical model, it requires a large corpus for model training. IBM Model 2 has the problem of data sparseness for TSL translation using only a small bilingual corpus. This study presents an approach to modeling the sequence correspondence between Chinese and TSL using hierarchical alignment of different levels of grammatical components. This hierarchical two-pass alignment can lower the complexity of the standard IBM Model 2 and is suitable for translation using a small bilingual corpus. For a Chinese sentence, function words/quantifiers are first removed to generate the Chinese grammatical component sequence, including PUs and SUs. Consecutive SUs are further concatenated as a complete unit and defined as a PF. For a given Chinese sentence, a Chinese PU sequence is generated as $CPU_1^N = CPU_1, CPU_2, \ldots, CPU_N$ with $N$ PUs, including PFs. An alignment-based translation model modified from IBM Model 2 was used for Chinese and TSL sequence alignment. The Chinese PU sequence $CPU_1^N$ is aligned to the sign PU sequence $SPU_1^N = SPU_1, SPU_2, \ldots, SPU_N$, with equal sequence length, $N$, based on the syntax-level (first-pass) alignment. In the phrase-level (second-pass) alignment, given the aligned sign PU sequence $SPU_1^N$, the Chinese SU sequence within the $i$th PF, $PF_i \equiv CSU_1^{M_i} = CSU_1, CSU_2, \ldots, CSU_{M_i}$ with length $M_i$, is further aligned to a sign SU sequence $SSU_1^{M_i} = SSU_1, SSU_2, \ldots, SSU_{M_i}$. Fig. 3 shows an example for the two-pass alignment. Table 5 shows an example of the bilingual sentences with aligned grammatical components. Fig. 6 shows the sign sequence {“I,” “Meeting,” “Taipei”} translated from the Chinese sentence “I am having a meeting in Taipei.” The sign-word “I” is a one-handed sign and the sign-words “Meeting” and “Taipei” are two-handed signs. Each sign can be denoted as an initial gesture (hand shape), followed by a motion (palm orientation), and a final gesture (hand shape). Some criteria must be considered when modeling the intersign hand transition for sign video synthesis.
TABLE 5
Examples of the Aligned Parallel Texts Using Grammatical Components
The two-pass alignment translation model modified from IBM Model 2 is used to choose the sign sequence $\hat{S}_1^N$ (length $N$) with the highest probability among all possible target sequences according to Bayes’ decision rule:
$$\begin{aligned}
\hat{S}_1^N &= \arg\max_{S} P(S_1^N \mid CPU_1^N) \\
&= \arg\max_{S} P(CPU_1^N \mid S_1^N)\, P(S_1^N) \\
&= \arg\max_{S} \left( \sum_{SPU_1^N} P(CPU_1^N, SPU_1^N \mid S_1^N) \right) P(S_1^N).
\end{aligned} \tag{8}$$
Although all possible sign PU sequences $SPU_1^N$ should be included in the estimation of the conditional probability, only the maximally aligned sign PU sequence $SPU_1^N$ is considered herein. Therefore, the equation can be simplified as
$$\begin{aligned}
\hat{S}_1^N &\approx \arg\max_{S} \left( \max_{SPU_1^N} P(CPU_1^N, SPU_1^N \mid S_1^N) \right) \cdot P(S_1^N) \\
&\cong \arg\max_{S} P(CPU_1^N \mid SPU_1^N) \cdot P(SPU_1^N \mid S_1^N) \cdot P(S_1^N) \\
&= \arg\max_{S} P(CPU_1^N \mid SPU_1^N) \cdot \left( \sum_{SSU_1^N} P(SPU_1^N, SSU_1^N \mid S_1^N) \right) \cdot P(S_1^N) \\
&\approx \arg\max_{S} P(CPU_1^N \mid SPU_1^N) \cdot P(SPU_1^N \mid SSU_1^N) \cdot P(SSU_1^N \mid S_1^N) \cdot P(S_1^N) \\
&\cong \arg\max_{S} P(CPU_1^N \mid SPU_1^N) \cdot P(SPU_1^N \mid SSU_1^N) \cdot P(SSU_1^N, S_1^N),
\end{aligned} \tag{9}$$
where $P(CPU_1^N \mid SPU_1^N)$ and $P(SPU_1^N \mid SSU_1^N)$ denote the alignment probabilities for the primary and secondary units, respectively. $P(SSU_1^N, S_1^N)$ is defined as the TSL syntactic-visual language model probability considering both the syntactic and visual information of signs for intersign epenthesis. These three probabilities are described in the following sections.
3.1 Alignment Probability Estimation
Consider the sequence of primary and secondary units, each of which is independent of the others. The modified alignment models can be estimated as follows:
$$P(CPU_1^N \mid SPU_1^N) = \sum_{a_1^N} P(CPU_1^N, a_1^N \mid SPU_1^N) = \sum_{a_1^N} \left( \prod_{j=1}^{N} P(a_j \mid j) \cdot P(CPU_j \mid SPU_{a_j}) \right), \tag{10}$$
$$P(SPU_1^N \mid SSU_1^N) = \sum_{b_1^N} P(SPU_1^N, b_1^N \mid SSU_1^N) = \sum_{b_1^N} \left( \prod_{j=1}^{N} P(b_j \mid j) \cdot P(SPU_j \mid SSU_{b_j}) \right), \tag{11}$$
where $a_j$ is the alignment mapping $j \to i = a_j$, which aligns $CPU_j$ in position $j$ to $SPU_i$ in position $i = a_j$, and $P(a_j \mid j)$ and $P(b_j \mid j)$ denote the PU and SU alignment probabilities, respectively. Fig. 7 depicts the alignment concept.
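Under the independence assumption, the sum over all alignment vectors in (10) does not have to be enumerated: it factorizes into a product over positions of a sum over candidate alignments, as in IBM Model 2. The Python sketch below checks this on toy numbers. The tables `align_p[j][i]` (standing for $P(a_j = i \mid j)$) and `trans_p[j][i]` (standing for $P(CPU_j \mid SPU_i)$) are hypothetical, and every position is allowed to align to every position.

```python
from itertools import product


def brute_force(align_p, trans_p):
    """Sum over every alignment vector of prod_j P(a_j|j) * P(CPU_j|SPU_{a_j})."""
    n = len(align_p)
    total = 0.0
    for a in product(range(n), repeat=n):   # all n**n alignment vectors
        term = 1.0
        for j, i in enumerate(a):
            term *= align_p[j][i] * trans_p[j][i]
        total += term
    return total


def factorized(align_p, trans_p):
    """Equivalent closed form: prod_j sum_i P(i|j) * P(CPU_j|SPU_i)."""
    result = 1.0
    for pa, pt in zip(align_p, trans_p):
        result *= sum(x * y for x, y in zip(pa, pt))
    return result
```

The factorized form costs on the order of N squared multiplications instead of N to the power N, which is what makes the alignment probability practical to evaluate for the short PU and SU sequences used here.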
Fig. 6. Illustration of the sign sequence {I, meeting, Taipei} translated from the Chinese sentence “I am having a meeting in Taipei”, and video concatenation.
Fig. 7. Illustration of the first-pass alignment.
3.2 TSL Language Model Probability Considering Intersign Epenthesis
In statistical machine translation, a language model of the target language can be adopted to validate the translation result. The probability of the syntactic-visual language model $P(SSU_1^N, S_1^N)$, based on both syntactic and visual information of signs for intersign epenthesis, is defined as follows:
$$P(SSU_1^N, S_1^N) \approx \prod_{i=1}^{N} \left[ P(SSU_i \mid SSU_1^{i-1}) \cdot IE(S_{i-1}, S_i) \right], \tag{12}$$
where $P(SSU_i \mid SSU_1^{i-1})$ denotes the N-gram language model probability and $IE(S_{i-1}, S_i)$, defined in (15), represents the score for intersign epenthesis based on the visual information. For the problem of sparse data, the bi-gram language model and Long-Distance Information (LDI) are applied to smooth the language model. The syntactic-visual language model probability $P(SSU_1^N, S_1^N)$ is further defined as
$$P(SSU_1^N, S_1^N) = \prod_{i=1}^{N} \left[ \left( P(SSU_i \mid SSU_{i-1}) \cdot \max_{1 \le k \le i-1} LDI(SSU_k, SSU_i) \right)^{1/2} \cdot IE(S_{i-1}, S_i) \right]. \tag{13}$$
The LDI between $SSU_k$ and $SSU_i$ of the varying distance is estimated as follows:
$$LDI(SSU_k, SSU_i) = \frac{P(SSU_k, SSU_i)}{P(SSU_k)}, \quad 1 \le k \le i-1. \tag{14}$$
In this equation, if
$$LDI(SSU_{k\,(1 \le k \le i-1)}, SSU_i) > LDI(SSU_{i-1}, SSU_i),$$
then the language model probability can be increased by considering the long-distance information; otherwise, the language model probability equals the bi-gram probability.
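The smoothing step can be illustrated with a short Python sketch that computes the square-root term of (13) for one position. The tables `bigram`, `cooc` (joint probability estimates for a word preceding another at any distance), and `unigram` are hypothetical dictionaries, and positions are 0-based here, so `i` must be at least 1.

```python
import math


def ldi_smoothed(seq, i, bigram, cooc, unigram):
    """sqrt( P(SSU_i | SSU_{i-1}) * max_k LDI(SSU_k, SSU_i) ) for 0-based i >= 1.

    LDI(u, v) = P(u, v) / P(u), with the maximum taken over all earlier positions k.
    """
    cur = seq[i]
    best = max(cooc.get((seq[k], cur), 0.0) / unigram[seq[k]] for k in range(i))
    return math.sqrt(bigram[(seq[i - 1], cur)] * best)
```

When the best LDI comes from the immediately preceding unit and its estimate coincides with the bi-gram probability, the square root returns the bi-gram probability unchanged, which matches the "otherwise" case described above.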
The intersign epenthesis score, which estimates the transition smoothness for intersign epenthesis based on the direction and distance between two consecutive signs, is defined as follows:
$$IE(S_{i-1}, S_i) = \frac{1}{1 + e^{\,??\,D(S_{i-1}, S_i)}}, \tag{15}$$
$$D(S_{i-1}, S_i) = \alpha \times D_l(Curve_F(S_{i-1}), Curve_I(S_i)) + (1 - \alpha) \times D_\theta(Curve_F(S_{i-1}), Curve_I(S_i)), \tag{16}$$
where $\alpha$ is the weighting factor, which is calculated experimentally. $Curve_F(S_{i-1})$ represents the final part of the signing curve of sign $S_{i-1}$ in the intersign transition. Similarly, $Curve_I(S_i)$ denotes the initial part of the signing curve of sign $S_i$ in the intersign transition. $D_l(Curve_F(S_{i-1}), Curve_I(S_i))$ is defined as the distance between the hand positions of the frame in the final part of the signing curve of sign $S_{i-1}$ and the frame in the initial part of the signing curve of sign $S_i$ based on the Euclidean distance:
$$\begin{aligned}
D_l(Curve_F(S_{i-1}), Curve_I(S_i)) &= D_l(Z(x_{F_{i-1}}, y_{F_{i-1}}), Z(x_{I_i}, y_{I_i})) \\
&= \beta \times \sqrt{(x_{I_i} - x_{F_{i-1}})^2 + (y_{I_i} - y_{F_{i-1}})^2},
\end{aligned} \tag{17}$$
where $\beta$ denotes the normalization factor, which equals the diagonal distance in the video frame and $Z(x, y)$ is the curve function of a given hand motion trajectory, created by the curve fitting algorithm. $D_\theta(Curve_F(S_{i-1}), Curve_I(S_i))$ is determined with the included angles between the tangent direction from the hand position of the frame in the final part of the signing curve of sign $S_{i-1}$ to the hand position of the frame in the initial part of the signing curve of sign $S_i$:
$$D_\theta(Curve_F(S_{i-1}), Curve_I(S_i)) = \gamma \cos^{-1}\left( \frac{\mathbf{v}_{F_{i-1}} \cdot \mathbf{v}_{I_i}}{\|\mathbf{v}_{F_{i-1}}\| \, \|\mathbf{v}_{I_i}\|} \right), \tag{18}$$
where $\gamma$ denotes the normalization factor defined as $1/\pi$. The tangent vector is calculated by differentiating the curve function $Z(x, y)$ at the specific point $(x, y)$. These two vectors, $\mathbf{v}_{F_{i-1}}$ and $\mathbf{v}_{I_i}$, are defined as follows:
$$\mathbf{v}_{F_{i-1}} = \left( \frac{\partial Z(x_{F_{i-1}}, y_{F_{i-1}})}{\partial x_{F_{i-1}}}, \frac{\partial Z(x_{F_{i-1}}, y_{F_{i-1}})}{\partial y_{F_{i-1}}} \right), \tag{19}$$
$$\mathbf{v}_{I_i} = \left( \frac{\partial Z(x_{I_i}, y_{I_i})}{\partial x_{I_i}}, \frac{\partial Z(x_{I_i}, y_{I_i})}{\partial y_{I_i}} \right). \tag{20}$$
Fig. 8 illustrates the calculation of the optimal points for concatenating two sign videos based on the distance and the tangent direction between the preceding sign $S_{i-1}$ and the succeeding sign $S_i$.
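A rough Python sketch of the distance and angle terms, and of a logistic score built on them, is given below. Several choices here are assumptions rather than the paper's settings: the Euclidean distance is divided by the frame diagonal so that it lies in [0, 1], the angle is divided by pi, `alpha=0.5` is a placeholder weight, and the logistic is written so that a smaller combined distance yields a higher score with a hypothetical `slope` parameter.

```python
import math


def intersign_distance(p_final, p_initial, v_final, v_initial,
                       frame_w, frame_h, alpha=0.5):
    """Weighted sum of a normalized position distance and a normalized angle.

    `p_final` / `p_initial` are (x, y) hand positions and `v_final` /
    `v_initial` are nonzero tangent vectors at those positions.
    """
    d_l = math.dist(p_final, p_initial) / math.hypot(frame_w, frame_h)
    cos_t = (v_final[0] * v_initial[0] + v_final[1] * v_initial[1]) / (
        math.hypot(*v_final) * math.hypot(*v_initial))
    # Clamp against rounding error before taking the arc cosine.
    d_theta = math.acos(max(-1.0, min(1.0, cos_t))) / math.pi
    return alpha * d_l + (1 - alpha) * d_theta


def epenthesis_score(d, slope=1.0):
    """Logistic score of a combined distance; direction and slope are assumed."""
    return 1.0 / (1.0 + math.exp(slope * d))
```

With these conventions, two signs that meet at the same point with the same tangent direction give a combined distance of 0, and any larger distance or sharper turn lowers the score.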
Fig. 8. Determination of the optimal points for concatenating two sign videos based on the distance and the tangent direction between the preceding and succeeding signs.
4 RESULTS AND DISCUSSION
Several experiments were conducted to evaluate the quality of translation and synthesis of Taiwanese sign videos. The Top-N measure was applied to determine the translation quality. The trajectory was analyzed to show the smoothness of the synthesized videos. In subjective evaluation on understanding, deaf primary school students were invited to watch the synthesized videos of the question and give the corresponding answers. These evaluations are described in the following sections.
4.1 Evaluation on the Performance of Translation
In the evaluation of the translation performance, the V-fold
cross-validation technique [34] was used to ensure that the
training data were utilized effectively. The corpus was thus
split into 10 disjoint subsets, i.e., V = 10. Each subset,
containing about 198 aligned parallel sentences, was used in
turn as an independent test set, while the remaining V - 1
subsets, containing 1,785 aligned parallel sentences, were used
for alignment model training. If the target sign sequence of
interest was included in the Top-N candidate list of the
sentences translated from a given test Chinese sentence, then
the translation was considered correct. Fig. 9 displays the
comparison between the proposed two-pass alignment
approach and IBM Model 2. The vertical axis indicates
the correct translation rate, and the horizontal axis indicates
the number of candidates, represented as Top-N. The
proposed approach outperformed IBM Model 2 and had a
lower alignment complexity. The performance of IBM Model 2
was limited by the small corpus size in this task. The correct
translation rates of Top-1 and Top-5 were 75.1 and 84.7 percent,
respectively. Some sign sentences ranked among the Top-5
were also found to be acceptable in terms of TSL. These
sentences with different word orders express different
senses. When this factor is considered, the Top-5 correct
translation rate in this experiment reaches 91.1 percent.
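The evaluation protocol described above (V disjoint folds, with a translation counted as correct when the reference sign sequence appears among the first N candidates) can be sketched as follows. The function names and the round-robin fold assignment are assumptions made for illustration; the paper does not state how sentences were assigned to subsets.

```python
def v_fold_splits(items, v=10):
    """Yield (train, test) pairs for V-fold cross-validation.

    Fold k takes every v-th item starting at index k (an assumed,
    round-robin assignment), so the test sets are disjoint and
    together cover the whole corpus.
    """
    for k in range(v):
        test = items[k::v]
        train = [x for j, x in enumerate(items) if j % v != k]
        yield train, test


def top_n_rate(candidate_lists, references, n):
    """Fraction of test sentences whose reference is in the Top-N list.

    candidate_lists[i] is the ranked list of candidate sign sequences
    for test sentence i; references[i] is its target sign sequence.
    """
    hits = sum(
        1 for cands, ref in zip(candidate_lists, references)
        if ref in cands[:n]
    )
    return hits / len(references)
```

In use, each fold's training part would feed the alignment model and `top_n_rate` would be computed on the held-out part for N = 1, ..., 5, with the rates averaged over the folds.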
4.2 Evaluation on the Performance of Sign Video Synthesis
A TSL learning system was developed to evaluate the system
performance in practice. Fig. 10 shows the interface of the TSL
learning system. The system was implemented in Visual C++
version 6.0 on the Windows 2000 platform with a P4 2.0 GHz
CPU and 512 MB RAM. Two input modes, text and speech,
were provided. This system translates an input Chinese
sentence into a TSL sign sequence and outputs the
synthesized video sequence. Five profoundly deaf students (three
male and two female native signers) in the sixth grade at a
primary school in Tainan, Taiwan, evaluated the utility of the
proposed approach as a practical learning aid.
4.2.1 Evaluation on the Sign Video Concatenation
The distance and the tangent direction between the hand
positions were considered in sign video concatenation, as
described in Section 3.2. The effects of the factors were
investigated by subjective and objective evaluations of
four cases:
1. considering both distance and direction,
2. considering only direction,
3. considering only distance, and
4. direct concatenation.
The weighting factor in (16) was set to 0.5.
Fig. 9. Comparison of the proposed modified alignment approach and
IBM Model 2.
Fig. 10. Interface of the Taiwanese sign language learning system.
Fig. 11. An example of moving trajectories between two concatenated
sign videos for four concatenation cases.
In the objective evaluation, the hand's motion trajectory of
a certain sign video sequence, chosen from the sentence-level
sign videos of the developed database, was compared with
the trajectories produced by the proposed approach under
Cases 1, 2, 3, and 4. Fig. 11 shows the hand position curves
of the synthesized frames (frames 13 to 23 and 37 to 43) for
Cases 1, 2, 3, and the ground truth sign video. The hand
position is represented by X- and Y-coordinates and is
tracked automatically by a hand position tracking algorithm.
In this figure, Case 3, which considers only the distance of
hand transitions, achieved the smallest root mean square
(RMS) error. However, in this study, smoothness is defined
as a metric considering both distance and direction.
Accordingly, a subjective evaluation was further conducted
to evaluate the naturalness of the concatenated sign video
outputs. In the subjective evaluation, three ordinal opinions
(good, fair, and poor) were given by the viewers/subjects
based on the synthesized sign videos. Thirty-five sign
sequences, randomly selected from the aligned parallel
corpus, were used as the test corpus for each subject. Fig. 12
shows the subjective evaluation results. The horizontal axis
indicates the subjective opinions, and the vertical axis
indicates the histogram of the three opinions. A $\chi^2$-test
yielded a significance level of p < 0.001, revealing
significant differences among Cases 1, 2, 3, and 4.
Experimental results indicate that the synthesized sign
videos for Case 1 outperformed the other cases.
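The RMS error used in the objective comparison can be illustrated with a short sketch. The text does not specify whether the error is computed per coordinate or per point, so the version below assumes a per-frame Euclidean error between tracked hand positions; the function name and the input layout (lists of (x, y) pairs, one per frame) are likewise assumptions.

```python
import math


def rms_error(synth, truth):
    """RMS error between two hand trajectories of equal length.

    synth and truth are lists of (x, y) hand positions, one per frame.
    The per-frame error is the Euclidean distance between the
    synthesized and ground-truth positions (an assumed definition).
    """
    if len(synth) != len(truth) or not synth:
        raise ValueError("trajectories must be non-empty and equal in length")
    squared = [
        (xs - xt) ** 2 + (ys - yt) ** 2
        for (xs, ys), (xt, yt) in zip(synth, truth)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Such a measure captures only positional deviation, which is consistent with the observation above that a distance-only criterion can score best on RMS error without being the smoothest in direction.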
4.2.2 Case Study
A subjective evaluation, based on the reading comprehension
of synthesized sign videos by the subjects, was undertaken.
The performance evaluation used three ordinal opinions:
good, fair, and poor. The text sentences for testing were chosen
from questions in the teaching materials. Two test sets,
containing 20 long sentences (average length = 8.2) and
20 short sentences (average length = 4.4), were also
considered when investigating the performance of TSL translation.
Fig. 13 displays the evaluation results. The horizontal axis
indicates the subjective opinions, and the vertical axis
indicates the histogram of the three opinions. Experimental
results show that the performance of reading comprehension
achieved 75 percent, with 43 percent fair and 32 percent good
opinions. The performance for short sentences exceeded
that for long sentences. A $\chi^2$-test yielded a significance level of
p < 0.001, revealing significant differences between the
results of the two methods. The translation results indicate
that a large number of phrase fragments in a long sentence
leads to low translation accuracy. For time complexity
evaluation, Table 6 lists the time requirements for both
translation and video synthesis.
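The $\chi^2$ comparison of opinion histograms mentioned above can be sketched as a Pearson test on a contingency table whose rows are conditions (for example, long and short sentences) and whose columns are the three opinions. The sketch below is an assumption about the form of the test, since the paper does not give the table; the function names are hypothetical and the counts in the final lines are made up purely to show the call, not taken from the experiments.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    table is a list of rows of observed counts. Every row and column
    is assumed to contain at least one observation, so that no
    expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Exact upper-tail p-value of a chi-square variable with 2 degrees
    of freedom (the 2 x 3 case), where the survival function is
    exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# Made-up counts (rows: two conditions; columns: good, fair, poor).
made_up_counts = [[30, 50, 20], [50, 40, 10]]
stat, dof = chi_square_statistic(made_up_counts)
p_value = p_value_two_dof(stat)
```

With two degrees of freedom, p < 0.001 corresponds to a statistic above roughly 13.82; tables with other shapes would need the general chi-square distribution rather than this closed form.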
5 CONCLUSIONS
This work has presented an innovative approach to joint
optimization of TSL translation and sign video synthesis for
teaching and learning TSL. In the translation part, an
alignment-based translation model, modified from IBM
Model 2, is presented to transliterate Chinese into TSL by
estimating the aligned probabilities of the grammatical
structure. Grammatical components are useful for modeling
the transliterated correspondences and for reducing the
perplexity of the alignment model. The long-distance
information approach and intersign epenthesis score can
be used to acquire local relationships within TSL grammar
at either the syntactic or the visual level and reinforce the
probability estimation using a bigram model. Experimental
results in the evaluation of the aligned parallel corpus reveal
that the proposed translation framework outperformed IBM
Model 2.
The proposed sign video database provides rich information
on motion transitions between sign videos for displaying
the translation result. The proposed sign video concatenation
approach searches for the optimal sequence of sign video clips
among the possible translated TSL sequences by computing
the maximum epenthesis score based on the distance and
direction of hand positions. Experimental results in the
subjective and objective evaluation reveal that the proposed
sign language synthesis approach outperformed the
approach described in [14]. The case study also indicates that the
proposed TSL learning system performed well for reading
comprehension.
Current work on sign language processing focuses on
building gesture recognition systems with large vocabul-
aries. Grammatical inflection [11] is a difficult issue for the
recognition of continuous signing sequences in these
systems and also affects the reconcilability of the synthesized
results in sign synthesis. Different inflections of a sign
were included in the sentences for sign video filming in this study.
However, collecting enough data for all inflections is often
difficult. Fortunately, speakers can correct and, therefore,
understand the synthesized sign output with inflection
based on their knowledge and experiences. Problems with
inflections for translation from written language to gesture
language need to be studied in the future. To achieve this
requirement, the phonological constraints of TSL and the
effects of sign inflection could be applied to improve the
naturalness and fluency of sign expression.
Fig. 12. Subjective evaluation results of sign video synthesis.
Fig. 13. Case study results.
TABLE 6
Evaluation of Time Requirement
APPENDIX A
Table 7 shows the grammatical components and POSs
defined in this study.
ACKNOWLEDGMENTS
The authors would like to thank the National Science Council
of the Republic of China, Taiwan, for financially supporting
this research under Contract No. NSC 94-2614-E-006-073.
REFERENCES
[1] S. Wilcox and P.P. Wilcox, Learning to See. Gallaudet Univ. Press,
1997.
[2] F. Alonso, A. Antonio, J.L. Fuertes, and C. Montes, “Teaching
Communication Skills to Hearing-Impaired Children,” IEEE
Multimedia, pp. 55-67, 1995.
[3] C. Brown, “Assistive Technology Computers and Persons with
Disabilities,” Comm. ACM, vol. 5, pp. 36-46, 1992.
[4] D.L. Speers, “Representation of American Sign Language for
Machine Translation,” PhD dissertation, Graduate School of Arts
and Sciences, Georgetown Univ., 2001.
[5] L.L. Lloyd, D.R. Fuller, and H.H. Arvidson, Augmentative and
Alternative Communication: A Handbook of Principles and Practices.
Allyn and Bacon, Inc., 1997.
[6] C. Vogler and D. Metaxas, “Toward Scalability in ASL Recogni-
tion: Breaking Down Signs into Phonemes,” Lecture Notes in
Artificial Intelligence, vol. 1739, pp. 211-224, 1999.
[7] C. Vogler and D. Metaxas, “A Framework for Recognizing the
Simultaneous Aspects of American Sign Language,” Computer
Vision and Image Understanding, no. 81, pp. 358-384, 2001.
[8] T. Starner, J. Weaver, and A. Pentland, “Real-Time American Sign
Language Recognition Using Desk and Wearable Computer-
Based Video,” IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 20, no. 12, pp. 1371-1375, Dec. 1998.
[9] M.C. Su, Y.X. Zhao, H. Huang, and H.F. Chen, “A Fuzzy Rule-
Based Approach to Recognizing 3-D Arm Movements,” IEEE
Trans. Neural Systems and Rehabilitation Eng., vol. 9, no. 2, 2001.
[10] R. Liang, “Continuous Gesture Recognition System for Taiwanese
Sign Language,” PhD dissertation, Nat’l Taiwan Univ., 1997.
[11] S.C.W. Ong and S. Ranganath, “Automatic Sign Language
Analysis: A Survey and the Future beyond Lexical Meaning,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 6,
pp. 873-891, June 2005.
[12] C.D. Manning and H. Schütze, Foundations of Statistical Natural
Language Processing. MIT Press, 1999.
[13] W. Chou and B.H. Juang, Pattern Recognition in Speech and
Language Processing. CRC Press, 2003.
[14] P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer, “The
Mathematics of Statistical Machine Translation: Parameter Estima-
tion,” Computational Linguistics, vol. 19, no. 2, pp. 263-311, 1993.
[15] H. Ney, S. Niessen, F. Och, H. Sawaf, C. Tillmann, and S. Vogel,
“Algorithms for Statistical Translation of Spoken Language,” IEEE
Trans. Speech and Audio Processing, vol. 8, no. 1, pp. 24-36, 2000.
[16] R. Kennaway, “Synthetic Animation of Deaf Signing Gestures,”
Proc. Fourth Int’l Workshop Gesture and Sign Language Based Human-
Computer Interaction, 2001.
[17] A. Irving and R. Foulds, “A Parametric Approach to Sign
Language Synthesis,” Proc. SIGACCESS, pp. 212-213, 2005.
[18] Y. Chen, W. Gao, G. Fang, C. Yang, and Z. Wang, “CSLDS:
Chinese Sign Language Dialog System,” Proc. IEEE Int’l Workshop
Analysis and Modeling of Faces and Gestures, pp. 236-237, 2003.
[19] A.B. Grieve-Smith, “SignSynth: A Sign Language Synthesis
Application Using Web3D and Perl,” Proc. Gesture Workshop,
pp. 134-145, 2001.
[20] E.J. Holden, J.C. Wong, and R. Owens, “An Effective Sign
Language Display System,” Proc. Eighth Int’l Symp. Signal
Processing and Its Applications, vol. 1, pp. 54-57, 2005.
[21] O. Arikan and D.A. Forsyth, “Interactive Motion Generation from
Examples,” Proc. 29th Ann. Conf. Computer Graphics and Interactive
Techniques, pp. 483-490, 2002.
[22] Y. Li, T. Wang, and H.Y. Shum, “Motion Texture: A Two-Level
Statistical Model for Character Motion Synthesis,” ACM Trans.
Graphics, vol. 21, no. 3, pp. 465-472, 2002.
[23] L. Kovar, M. Gleicher, and F. Pighin, “Motion Graphs,” Proc. ACM
SIGGRAPH, pp. 473-482, 2002.
[24] J. Lee, J. Chai, P.S.A. Reitsma, J.K. Hodgins, and N.S. Pollard,
“Interactive Control of Avatars Animated with Human Motion
Data,” Proc. ACM SIGGRAPH, pp. 491-500, 2002.
[25] S.W. Kim, Z.X. Li, and Y. Aoki, “On Intelligent Avatar Commu-
nication Using Korean, Chinese and Japanese Sign-Languages: An
Overview,” Proc. Eighth Control, Automation, Robotics and Vision
Conf., vol. 1, pp. 747-752, 2004.
[26] Y. Cao, P. Faloutsos, E. Kohler, and F. Pighin, “Real-Time Speech
Motion Synthesis from Recorded Motions,” Proc. ACM SIG-
GRAPH Eurographics Symp. Computer Animation, pp. 347-355, 2004.
[27] T. Ezzat, G. Geiger, and T. Poggio, “Trainable Video-Realistic
Speech Animation,” Proc. ACM SIGGRAPH, vol. 21, pp. 388-397,
2002.
[28] C. Bregler, M. Covell, and M. Slaney, “Video Rewrite: Driving
Visual Speech with Audio,” Proc. ACM SIGGRAPH, pp. 353-360,
1997.
[29] F. Solina and S. Krapež, “Synthesis of the Sign Language of the
Deaf from the Sign Video Clips,” Electrotechnical Rev., vol. 66,
pp. 260-265, 1999.
[30] Ministry of Education, Division of Special Education, Changyong
Cihui Shouyu Huace (Sign Album of Common Words), vol. 1. Taipei:
Ministry of Education, 2000.
[31] The Chinese Knowledge Information Processing Group, “Analysis
of Chinese Part of Speech,” CKIP Technical Report, no. 93-05, Inst. of
Information Science, Academia Sinica, Taipei, 1993 (in Chinese).
[32] Z. Dong, The HowNet Web Site, http://www.keenage.com, 1999.
[33] Inst. of Linguistics, Nat’l Chung Cheng Univ., Chiayi, Taiwan,
Proc. Int’l Symp. Taiwan Sign Language Linguistics, http://
www.ccunix.ccu.edu.tw/~lngsign/tsl-links-e.htm, 2003.
TABLE 7
Grammatical Components and POSs Defined in This Study
[34] P.A. Lachenbruch and M.R. Mickey, “Estimation of Error Rate in
Discriminant Analysis,” Technometrics, pp. 1-11, 1968.
[35] S. Shott, Statistics for Health Professionals. W.B. Saunders, 1990.
[36] C.H. Wu, Y.H. Chiu, and C.S. Guo, “Text Generation from
Taiwanese Sign Language Using a PST-Based Language Model for
Augmentative Communication,” IEEE Trans. Neural Systems and
Rehabilitation Eng., vol. 12, no. 4, pp. 441-454, 2004.
[37] C.H. Wu, Y.H. Chiu, and K.W. Cheng, “Error-Tolerant Sign
Retrieval Using Visual Features and Maximum A Posteriori
Estimation,” IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 26, no. 4, pp. 495-508, Apr. 2004.
Yu-Hsien Chiu received the BS degree in
electrical engineering from I-Shou University,
Kaohsiung, Taiwan, ROC, in 1997, and the MS
degree in biomedical engineering from National
Cheng Kung University, Tainan, Taiwan, in
1999. He received the PhD degree from the
Department of Computer Science and Informa-
tion Engineering, National Cheng Kung Univer-
sity, Tainan, Taiwan, in 2004. His research
interests include speech and biomedical signal
processing, embedded system design, spoken language processing,
and sign language processing for the hearing-impaired.
Chung-Hsien Wu received the PhD degree in
electrical engineering from National Cheng Kung
University, Tainan, Taiwan, ROC, in 1991. Since
August 1991, he has been with the Department
of Computer Science and Information Engineer-
ing, National Cheng Kung University, Tainan,
Taiwan. He became a professor in August 1997.
He is currently the editor-in-chief for the Inter-
national Journal of Computational Linguistics
and Chinese Language Processing. His research
interests include speech recognition, text-to-speech, and
multimedia information retrieval. Dr. Wu is a senior member of IEEE. He is also
a member of International Speech Communication Association (ISCA)
and ROCLING.
Hung-Yu Su received the BS and MS degrees
from the Department of Computer Science and
Information Engineering, National Cheng Kung
University, Tainan, Taiwan, in 2001 and 2003,
respectively. Currently, he is a PhD student in the
Department of Computer Science and Informa-
tion Engineering, National Cheng Kung Univer-
sity, Tainan, Taiwan. His research interests
include natural language processing, machine
translation, and sign language processing for the hearing-impaired.
Chih-Jen Cheng received the BS degree in
information engineering from Yuan Ze Univer-
sity, Chung-Li, Taiwan, in 2001, and the MS
degree in information engineering from National
Cheng Kung University, Tainan, Taiwan, in
2003. His research interests include digital
signal processing, natural language processing,
machine translation, and image processing.