Verb Pattern Based Korean-Chinese Machine Translation System
Kim, Changhyun
NLP Team
Human Information Processing Dept.
ETRI, Korea
chkim@etri.re.kr
Hong, Munpyo
NLP Team
Human Information Processing Dept.
ETRI, Korea
hmp63108@etri.re.kr
Yang, Sung
NLP Team
Human Information Processing Dept.
ETRI, Korea
siyang@etri.re.kr
Kim, Young Kil
NLP Team
Human Information Processing Dept.
ETRI, Korea
kimyk@etri.re.kr
Seo, Young Ae
NLP Team
Human Information Processing Dept.
ETRI, Korea
yaseo@etri.re.kr
Choi, Sung-Kwon
NLP Team
Human Information Processing Dept.
ETRI, Korea
choisk@etri.re.kr
Abstract
This paper describes our ongoing work on a Korean-Chinese machine translation system based on verb
patterns. A verb pattern consists of a source-language pattern part for analysis and a target-language
pattern part for generation. Describing knowledge at the lexical level makes it easy to achieve accurate
analysis and natural, correct generation. These features are especially important and effective in machine
translation between languages with very different linguistic structures, such as Korean and Chinese. We
performed a preliminary evaluation of the current system and report the results in this paper.
1 Introduction
Machine translation requires correct analysis of the source language, appropriate generation into the
target language, and a large amount of knowledge such as rules, statistics or patterns. In particular,
persistent and consistent knowledge acquisition and management, together with a monotonic improvement
of performance as knowledge accumulates, are the keys to machine translation.
Rule-based methods suffer from the difficulty of knowledge acquisition and consistent management.
Statistical methods show no connection between previously acquired and newly acquired statistical
knowledge, and have difficulty reflecting linguistic phenomena and peculiarities directly in the knowledge.
Patterns come in several formats, such as sentence-based patterns (Kaji Hiroyuki (1992)), phrase-based
patterns (Watanabe Hideo (1993)) and collocation-based patterns (Smadja (1996), Kevin McTait (1999)).
Sentence-based patterns use a whole sentence as a pattern and transfer the input sentence in a single
step; they suffer mainly from data sparseness. Phrase-based patterns can be used for both syntactic
analysis and transfer, and the transfer is done phrase by phrase. Collocation-based patterns are used for
lexical transfer, that is, the transfer unit is a word.
ETRI developed a verb-pattern based Korean-English machine translation system from 1999 to 2001 and
found this approach strong in knowledge acquisition, in consistent management of the linguistic
peculiarities of the two languages, and in a monotonic increase in system performance with the number of
patterns (Kim, Y.K. et al. (2001); Seo, Y.A. et al. (2000)). A verb pattern consists of a source-language
pattern for analysis and a target-language pattern for generation. Describing knowledge at the lexical
level makes it easy to achieve accurate analysis and natural, correct generation. Thus, accurate
analysis directly leads to correct generation, which is very effective in machine translation between
languages with very different linguistic structures. As for the reusability of knowledge, verb patterns
built for Korean-English machine translation can be reused simply by replacing the English pattern part
with Chinese, which saves the cost of knowledge construction.
In section 2 we give an overview of the system and a simulation example that translates a Korean
sentence into Chinese. Verb patterns are explained in more detail in section 3, and a parser using verb
patterns is described in section 4. The generation module is explained in section 5, and an evaluation is
presented in section 6. Section 7 concludes with some remarks.
2 System Overview
Our verb-pattern based Korean-Chinese machine translation system consists of a Korean morphological
analyzer, a verb-pattern based parser and a generation module, which comprises a verb phrase linker and a
word generator. Figure 1 gives an overview of the system.
Figure 2 shows a simple simulation of translating a Korean input sentence into a Chinese sentence.
The morphological analyzer first readjusts words into appropriate morphological units, then performs
morphological analysis, and finally ranks the results using statistical information. The parser first
readjusts the results of the morphological analyzer into syntactic units and then performs
predicate-argument-adjunct analysis and predicate-predicate structure analysis. The generation module
determines the Chinese translation of connectives, arranges the order of the clauses they connect, and
finally generates the Chinese words.
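As a rough illustration of the data flow just described, the sketch below wires the three top-level modules together. The intermediate types `MorphResult` and `ParseResult` and the function `translate` are hypothetical; the paper does not publish its interfaces, so each stage is simply passed in as a callable.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MorphResult:
    """Hypothetical output of the morphological analyzer (its best-ranked analysis)."""
    morphemes: list[tuple[str, str]]  # (surface form, tag) pairs


@dataclass
class ParseResult:
    """Hypothetical output of the verb-pattern based parser."""
    frames: list[dict]  # one predicate-argument-adjunct frame per predicate
    links: list[tuple[int, int, str]] = field(default_factory=list)  # (pred i, pred j, connective)


def translate(
    sentence: str,
    analyze: Callable[[str], MorphResult],
    parse: Callable[[MorphResult], ParseResult],
    generate: Callable[[ParseResult], str],
) -> str:
    """Run the three top-level modules in order; each stage sees only the previous output."""
    morph = analyze(sentence)  # readjust units, analyze morphemes, rank by statistics
    tree = parse(morph)        # argument/adjunct analysis, then predicate-predicate linking
    return generate(tree)      # choose connectives, order clauses, generate Chinese words
```

Keeping the stages as separate callables mirrors the modular layout of the system: one module, such as the parser, can be replaced without touching the generator.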
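[Figure 1 diagram. Processing modules: Korean Input, Pre-Processor, Morphological Analyzer, Morphological
Tagger, Refiner, Morphological Normalizer, Chunker, Frozen Expression Processor, Dependency Tree Generator,
Case Structure Analyzer, Complex Structure Analyzer, Verb Phrase Linker, Relative Clause Linker, VP
Generator, NP Generator, Morphological Generator, Post Processor, Chinese Output. Knowledge sources: Unified
Lexicon, Analyzed Dictionary, Statistical Dictionary, MRD, Tagged Corpus, Bilingual Corpus, Law Corpus,
Knowledge Building Tool, Translation Memory, Verb Patterns, VP Structure Patterns, Case Constraint
Patterns, ADVP Patterns, Dependency Rules, Link Patterns, Relative Clause Linking Rules, VP Patterns, NP
Patterns, Generation Codes, and Pre-Processing, Normalization, Refining and Post-Processing Rules.]
Figure 1: Korean-Chinese Machine Translation System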
3 Verb Pattern
Phrase-based patterns can be used for both syntactic analysis and transfer (Watanabe Hideo (1993)). The
term 'verb pattern', as we use it, should be understood as a kind of subcategorization frame of a predicate.
[Figure 2 diagram: a Korean input sentence is divided into three verb phrases, VP1, VP2 and VP3. The
panels show the <Linking Pattern of Verb Phrases> (e.g. VP1[] VP2[] > VP1[ECONJ:[eroot := …]] VP2), the
<Verb Pattern> entry applied to each verb phrase, the <Noun Phrase Pattern> entries (e.g. A=… B=… C=… >
A B … C), and the resulting Chinese output.]
Figure 2 : A Simulation for Korean-Chinese Machine Translation
The main difference between a verb pattern and a subcategorization frame is that a verb pattern is
always linked to a target-language word (the predicate of the target language). A verb pattern is
employed not only in the analysis phase but also in the transfer phase, which is why a source-language
verb pattern is linked to a target-language verb pattern. Because we consider target words during
analysis, a verb pattern in our approach differs slightly from a subcategorization frame in the
traditional sense. In theoretical linguistics, a subcategorization frame contains the arguments of the
predicate; an adjunct of a predicate or a modifier of an argument is usually not included. For the purpose
of machine translation, however, we think these words must be taken into account, because adjuncts of a
verb or modifiers of an argument can seriously affect the selection of target words, as the following
example shows.
Korean : 이 달이 다 갔다
English : This month is up
Verb Pattern: A=시간!이 다:b 가다:PAST > A be:v up :: 가다 1-2
In this example, the Korean verb '가다 (to pass by)' is more appropriately and naturally translated
as 'be up' when it is modified by the adverb '다 (completely)'. This kind of conflational divergence can
be handled in pattern-based approaches: our verb patterns annotate the adverb with a marker (b) and link
the adverb and the verb to a conflated English expression. Idiomatic usages of a verb can also be treated
easily within verb patterns. A frozen argument in an idiomatic expression is simply treated in the same
way as a variable. For example, the Korean idiomatic expression '호감이 가다 (= be favorably disposed
toward)' can be described as follows:
… > A be:v favorably disposed toward B:OBJ :: 가다 1-3
The noun '호감 (= a favorable impression)' is not overtly expressed in the target-language expression:
if an expression on the source-language side is not marked, it is no longer considered in later phases of
translation. Postpositions such as '에게' and '가' are normalized Korean postpositions that correspond to
syntactic case markers.
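To make the notation concrete, the sketch below parses a simplified pattern string into its source part and target part and reports which source elements are carried into later phases (variables and marked words), while unmarked literals such as the frozen noun of an idiom are dropped. The slot syntax `VAR=feature!postposition`, the `::` entry label, and the names `VerbPattern` and `parse_pattern` are assumptions read off the examples in this section, not the system's documented format; the Korean strings are illustrative.

```python
import re
from dataclasses import dataclass

# Assumed slot syntax: VAR=feature!postposition, e.g. "A=시간!이".
SLOT = re.compile(r"^([A-Z])=([^!]+)!(.+)$")


@dataclass
class VerbPattern:
    slots: dict[str, tuple[str, str]]  # variable -> (semantic feature, postposition)
    marked: list[tuple[str, str]]      # marked literal words, e.g. ("다", "b") for an adverb
    frozen: list[str]                  # unmarked literal words, e.g. the frozen part of an idiom
    predicate: str                     # source-language predicate without its marker
    target: list[str]                  # target-side tokens, e.g. ["A", "be:v", "up"]
    entry: str = ""                    # optional entry label after "::"

    def carried_forward(self) -> list[str]:
        """Source elements passed to later phases: variables and marked words only."""
        return list(self.slots) + [word for word, _ in self.marked]


def parse_pattern(text: str) -> VerbPattern:
    """Split a pattern at '>' into its source part and its target part."""
    source, _, rhs = text.partition(" > ")
    target, _, entry = rhs.partition(" :: ")
    *args, predicate = source.split()  # the last source token is the predicate
    slots: dict[str, tuple[str, str]] = {}
    marked: list[tuple[str, str]] = []
    frozen: list[str] = []
    for token in args:
        if m := SLOT.match(token):
            slots[m.group(1)] = (m.group(2), m.group(3))
        elif ":" in token:
            word, mark = token.split(":", 1)
            marked.append((word, mark))
        else:
            frozen.append(token)
    return VerbPattern(slots, marked, frozen, predicate.split(":")[0], target.split(), entry.strip())
```

For example, `parse_pattern("A=시간!이 다:b 가다:PAST > A be:v up :: 가다 1-2").carried_forward()` keeps `A` and the adverb `다`, whereas in a pattern containing an unmarked literal such as `호감!이`, that literal ends up in `frozen` and is not passed on.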
We now turn to verb patterns for Korean-Chinese machine translation. A Korean verb pattern is linked to
its corresponding Chinese verb pattern by the symbol '>'. The arguments on the left-hand side of a verb
pattern are basically represented with semantic features such as 'time' and 'transportation'. We
currently use about 200 hierarchical semantic features to cover the semantic information of nouns, which
fill the argument positions of a verb. The right-hand side of '>' is the corresponding target-word
expression. At the current stage of development we have about 40,000 Korean-Chinese verb patterns, and we
expect to have about 60,000 by the end of the year. Some examples of Korean-Chinese verb patterns are given
below.
[Table of example Korean-Chinese verb patterns, grouped as Adjectival Verbs, Copular, Other Intransitive
Verbs, Transitive Verbs, and Verbs taking NPs with Adverbial Postpositions]
Each entry denotes a different meaning or a different translation. For example, the two copular entries,
numbered 1 and 2, yield different translations. The verb pattern examples above show that many divergence
problems in translation can be solved easily with verb patterns. For example, a structural divergence
between Korean and Chinese can be properly handled with the verb pattern for 'help', as shown below.
…
The direct object of … is used as a kind of modifier of the grammatical object …. This kind of divergence
is addressed directly in the verb pattern by combining the direct object with …; '… (the woman)' is
combined with a post-nominal particle, as shown below.
… 3: A=… B=… > A …:v B
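Because pattern arguments are written as semantic features drawn from a hierarchy, matching a noun against a slot amounts to a subsumption test. The sketch below assumes a child-to-parent table `PARENT` with made-up feature names (the real inventory has about 200 features). It also assumes, which the paper does not state, that when several entries of the same verb accept a noun, the one with the most specific slot feature is preferred. `ancestors`, `subsumes` and `best_pattern` are hypothetical names.

```python
from typing import Optional

# Hypothetical fragment of a semantic feature hierarchy (child -> parent).
PARENT = {
    "time": "abstract",
    "abstract": "entity",
    "transportation": "artifact",
    "artifact": "concrete",
    "person": "concrete",
    "concrete": "entity",
}


def ancestors(feature: str) -> list[str]:
    """Return the feature followed by all of its ancestors up to the root."""
    chain = [feature]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general: str, specific: str) -> bool:
    """True if a slot labelled `general` accepts a noun whose feature is `specific`."""
    return general in ancestors(specific)


def best_pattern(noun_feature: str, patterns: list[tuple[str, str]]) -> Optional[str]:
    """Among (entry id, slot feature) pairs, pick the entry whose slot feature is the
    closest ancestor of the noun's feature; return None if no slot accepts the noun."""
    chain = ancestors(noun_feature)
    best, best_distance = None, None
    for entry, slot_feature in patterns:
        if slot_feature in chain:
            distance = chain.index(slot_feature)  # 0 means an exact feature match
            if best_distance is None or distance < best_distance:
                best, best_distance = entry, distance
    return best
```

With these tables, a noun tagged `transportation` is accepted by slots labelled `artifact`, `concrete` or `entity`, and `best_pattern` prefers an `artifact` slot over an `entity` slot.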
4 Parser
The parser performs four steps: it pre-processes the results of morphological analysis into syntactic
units, identifies the predicate-argument-adjunct structure of each predicate using verb patterns, finds
the head of each unresolved adverb phrase, and finally links predicates using predicate-predicate
structure patterns. In this section, we explain the analysis of predicate-argument-adjunct structures and
of predicate-predicate structures. Verb patterns represent predicate-argument-adjunct structures and also
provide the information for resolving the syntactic cases of auxiliary particles and particle ellipses.
Currently, predicate-predicate structure patterns use statistics of verb endings to represent structural
preferences between predicates in multi-predicate sentences.
4.1 Predicate-Argument-Adjunct Structure Analysis
A dependency structure is used in predicate-argument-adjunct analysis. As described previously, verb
patterns describe not only arguments but also adjuncts. In predicate-argument-adjunct analysis, verb
patterns are used in two steps: binary pattern matching and full pattern matching. In binary pattern
matching, triples of the form <noun meaning, postposition, verb meaning> are extracted from verb patterns
and used to filter out unnecessary dependency relations. In full pattern matching, each verb on a
dependency tree is compared with verb patterns and evaluated according to the matched proportion: the
higher the proportion, the higher the evaluation score. In the case of a prenominal (relative) clause, the
noun modified by the clause can fill an argument of the clause's predicate, so the clause is compared with
verb patterns together with the noun it modifies.
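The two matching steps can be sketched as follows. This is a minimal illustration, not the system's implementation: `Pattern`, `Dependent` and the function names are hypothetical, semantic features are compared by exact equality rather than through the feature hierarchy, and the "matched proportion" is assumed to be the share of a pattern's slots that find a distinct matching dependent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependent:
    feature: str        # semantic feature of the noun
    postposition: str   # normalized postposition attached to the noun


@dataclass(frozen=True)
class Pattern:
    verb: str
    slots: tuple[tuple[str, str], ...]  # (semantic feature, postposition) per argument/adjunct


def binary_triples(patterns: list[Pattern]) -> set[tuple[str, str, str]]:
    """Binary step: extract <noun meaning, postposition, verb> triples from all patterns."""
    return {(f, p, pat.verb) for pat in patterns for f, p in pat.slots}


def filter_dependencies(candidates, triples):
    """Keep only candidate (Dependent, verb) arcs licensed by some triple."""
    return [(d, v) for d, v in candidates if (d.feature, d.postposition, v) in triples]


def match_score(pattern: Pattern, dependents: list[Dependent]) -> float:
    """Full step: proportion of the pattern's slots filled by a distinct matching dependent."""
    if not pattern.slots:
        return 0.0
    remaining = list(dependents)
    matched = 0
    for slot in pattern.slots:
        for i, d in enumerate(remaining):
            if (d.feature, d.postposition) == slot:
                matched += 1
                del remaining[i]  # each dependent fills at most one slot
                break
    return matched / len(pattern.slots)


def best_match(verb: str, dependents: list[Dependent], patterns: list[Pattern]):
    """Return the pattern of `verb` with the highest matched proportion, or None."""
    candidates = [p for p in patterns if p.verb == verb]
    return max(candidates, key=lambda p: match_score(p, dependents), default=None)
```

Running the cheap binary filter first shrinks the dependency tree before the more expensive full comparison; for a prenominal clause, the modified noun would simply be added to `dependents` before scoring.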
Auxiliary predicates and suffixes can change the argument structure of a verb, which would require
additional verb patterns beyond the original one and would make manual construction and management of
verb patterns difficult. We therefore use rules to deal with auxiliary predicates and suffixes instead of
constructing additional verb patterns. This is possible because there are regularities, for example in
transforming syntactic cases from passive to active and from causative to active forms, as follows.
passive -> active, for transitive verbs:
(1) subject -> object
(2) adverb -> subject
(3) subject, adverb(…) -> adverb(…), object(…)
causative -> active, for verbs/adjectives:
(1) subject -> adverb
(2) object, adverb(…) -> object, subject
(3) adverb(…) -> subject
(4) object -> subject
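Because these rules only relabel cases, a passive or causative clause can be mapped onto the case frame of its active counterpart, which is then matched against the ordinary verb pattern. The sketch below uses the hypothetical function `to_active` and two illustrative rule tables built from the clearly legible rules: subject -> object and adverb -> subject for passives, and subject -> adverb and object -> subject for causatives. The relabelling is applied to all cases at once, so the rules do not feed into each other.

```python
# Hypothetical rule tables: case label in the derived clause -> label in the active clause.
PASSIVE_TO_ACTIVE = {"subject": "object", "adverb": "subject"}
CAUSATIVE_TO_ACTIVE = {"subject": "adverb", "object": "subject"}


def to_active(frame: dict[str, str], rules: dict[str, str]) -> dict[str, str]:
    """Relabel every case of `frame` simultaneously according to `rules`.

    Cases without a rule keep their label. Reading from the original frame
    (instead of rewriting it in place) keeps subject -> object and
    adverb -> subject from being applied one after the other.
    """
    active: dict[str, str] = {}
    for case, noun in frame.items():
        new_case = rules.get(case, case)
        if new_case in active:
            raise ValueError(f"two arguments map to the case {new_case!r}")
        active[new_case] = noun
    return active
```

For a passive clause meaning "the thief was caught by the police", `to_active({"subject": "thief", "adverb": "police"}, PASSIVE_TO_ACTIVE)` gives `{"object": "thief", "subject": "police"}`, which can then be checked against the active verb pattern of "catch".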
Auxiliary particles and particle ellipsis can be interpreted as several syntactic cases, which makes
syntactic case resolution difficult. The current parser restricts the interpretation to the subjective and
objective cases and applies the same method to both phenomena. The noun modified by a prenominal clause is
also treated as a case of particle ellipsis. The parser tries the subjective particle '가' and the
objective particle '를' in turn on auxiliary particles and elided particles and compares the result with
verb patterns. If the semantic feature of the noun matches the one required by the verb pattern and the
corresponding case slot is still empty, case resolution succeeds.
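The procedure above can be sketched as a short loop. This minimal illustration assumes that `'가'` and `'를'` are the normalized particle forms, that a verb pattern is reduced to a mapping from postposition to required semantic feature, and that features are compared by equality; `resolve_case` is a hypothetical name.

```python
from typing import Optional

# Assumed normalized forms of the subjective and objective particles.
SUBJECTIVE, OBJECTIVE = "가", "를"


def resolve_case(noun_feature: str, pattern_slots: dict[str, str], filled: set[str]) -> Optional[str]:
    """Try the subjective, then the objective particle for a noun whose case is hidden
    by an auxiliary particle or an elided particle.

    pattern_slots maps a normalized postposition to the semantic feature the verb
    pattern requires there; filled holds the postpositions already taken by other
    arguments. Returns the resolved particle, or None if neither fits.
    """
    for particle in (SUBJECTIVE, OBJECTIVE):
        required = pattern_slots.get(particle)
        if required == noun_feature and particle not in filled:
            return particle
    return None
```

For a topicalized object whose subject slot is already filled, `resolve_case("food", {"가": "person", "를": "food"}, {"가"})` returns `"를"`.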
4.2 Predicate-Predicate Structure Analysis
Predicate-predicate structure analysis determines the structure between predicate phrases. An example is
shown below.
Korean Sentence : …
English : He disappeared after declaring that he is not the criminal
After predicate-argument-adjunct analysis :