A STATISTICAL APPROACH TO LANGUAGE TRANSLATION
P. BROWN, J. COCKE, S. DELLA PIETRA, V. DELLA PIETRA,
F. JELINEK, R. MERCER, and P. ROOSSIN
IBM Research Division
T.J. Watson Research Center
Department of Computer Science
P.O. Box 218
Yorktown Heights, N.Y. 10598
ABSTRACT
An approach to automatic translation is outlined that utilizes techniques of statistical information extraction from large data bases. The method is based on the availability of pairs of large corresponding texts that are translations of each other. In our case, the texts are in English and French.

Fundamental to the technique is a complex glossary of correspondence of fixed locutions. The steps of the proposed translation process are: (1) Partition the source text into a set of fixed locutions. (2) Use the glossary plus contextual information to select the corresponding set of fixed locutions in the target language. (3) Arrange the words of the target fixed locutions into a sequence forming the target sentence.

We have developed statistical techniques facilitating both the automatic creation of the glossary and the performance of the three translation steps, all on the basis of an alignment of corresponding sentences in the two texts.

While we are not yet able to provide examples of French / English translation, we present some encouraging intermediate results concerning glossary creation and the arrangement of target word sequences.
1. INTRODUCTION
In this paper we will outline an approach to automatic translation
that utilizes techniques of statistical information extraction from
large data bases. These self-organizing techniques have proven
successful in the field of automatic speech recognition [1,2,3].
Statistical approaches have also been used recently in lexicography [4] and natural language processing [3,5,6]. The idea of automatic translation by statistical (information theoretic) methods was proposed many years ago by Warren Weaver [7]. As will be seen in the body of the paper, the suggested technique is based on the availability of pairs of large corresponding texts that are translations of each other.
In particular, we have chosen to work with the English and French languages because we were able to obtain the bilingual Hansard corpus of proceedings of the Canadian parliament containing 30 million words of text [8]. We also prefer to apply our ideas initially to two languages whose word order is similar, a condition that French and English satisfy.
Our approach eschews the use of an intermediate mechanism (language) that would encode the "meaning" of the source text. The proposal will seem especially radical since very little will be said about employment of conventional grammars. This omission, however, is not essential, and may only reflect our relative lack of tools as well as our uncertainty about the degree of grammar sophistication required. We are keeping an open mind!
In what follows we will not be able to give actual results of French / English translation: our less than a year old project is not far enough along. Rather, we will outline our current thinking, sketch certain techniques, and substantiate our optimism by presenting some intermediate quantitative data. We wrote this somewhat speculative paper hoping to stimulate interest in applications of statistics to translation and to seek cooperation in achieving this difficult task.
2. A HEURISTIC OUTLINE OF THE BASIC PHILOSOPHY
Figure 1 juxtaposes a rather typical pair of corresponding English and French sentences, as they appear in the Hansard corpus. They are arranged graphically so as to make evident that (a) the literal word order is on the whole preserved, (b) the clausal (and perhaps phrasal) structure is preserved, and (c) the sentence pairs contain stretches of essentially literal correspondence interrupted by fixed locutions. In the latter category are [I rise on = je souleve], [affecting = a propos], and [one which reflects on = pour mettre en doute].

It can thus be argued that translation ought to be based on a complex glossary of correspondence of fixed locutions. Included would be single words as well as phrases consisting of contiguous or non-contiguous words. E.g., [word = mot], [word = propos], [not = ne ... pas], [no = ne ... pas], [seat belt = ceinture], [ate = a mange], and even (perhaps) [one which reflects on = pour mettre en doute], etc.
Translation can be somewhat naively regarded as a three stage process:
(1) Partition the source text into a set of fixed locutions.
(2) Use the glossary plus contextual information to select the corresponding set of fixed locutions in the target language.
(3) Arrange the words of the target fixed locutions into a sequence that forms the target sentence.
This naive approach forms the basis of our work. In fact, we have
developed statistical techniques facilitating the creation of the
glossary, and the performance of the three translation steps.
While the only way to refute the many weighty objections to our ideas would be to construct a machine that actually carries out satisfactory translation, some mitigating comments are in order.

We do not hope to partition uniquely the source sentence into locutions. In most cases, many partitions will be possible, each having a probability attached to it.
Whether "affccting" is to be translated as "apropos" or
"cuncernant," or, as our dictionary has it, "touchant" or
"cmouvant," or in a variety of other ways, depends on the rest
of the sentence. However, a statistical indication may be
obtained from the presence or absence of particular guide words
in that scntcncc. Tile statistical technique of decision trees [9]
can be used to determine the guide word set, and to estimate the
ln'obability to be attached to cach possible translate.
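To make the idea concrete, the following minimal Python sketch (not our implementation) trains a guide-word decision tree with scikit-learn; the guide words, training sentences, and sense labels are invented purely for illustration.

```python
# Illustrative sketch only: choosing among candidate translations of
# "affecting" with a decision tree over guide-word features. The guide
# words and training data are invented; the paper's trees are built with
# the CART methodology of reference [9].
from sklearn.tree import DecisionTreeClassifier

GUIDE_WORDS = ["rights", "prerogatives", "story", "deeply"]  # hypothetical

def features(sentence):
    """Binary presence/absence of each guide word in the sentence."""
    words = set(sentence.lower().split())
    return [int(g in words) for g in GUIDE_WORDS]

train_sentences = [
    "affecting the rights and prerogatives of committees",
    "a story affecting us deeply",
]
train_senses = ["a propos", "emouvant"]  # chosen translates

tree = DecisionTreeClassifier().fit(
    [features(s) for s in train_sentences], train_senses)

# predict_proba supplies the probability attached to each translate
probe = features("a question affecting the rights of the house")
print(dict(zip(tree.classes_, tree.predict_proba([probe])[0])))
```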
The sequential arrangement of target words obtained from the glossary may depend on an analysis of the source sentence. For instance, clause correspondence may be insisted upon, in which case only permutations of words which originate in the same source clause would be possible. Furthermore, the character of the source clause may affect the probability of use of certain function words in the target clause. There is, of course, nothing to prevent the use of more detailed information about the structure of the parse of the source sentence. However, preliminary experiments presented below indicate that only a very crude grammar may be needed (see Section 6).
3. CREATING THE GLOSSARY: FIRST ATTEMPT
We have already indicated in the previous section why creating a glossary is not just a matter of copying some currently available dictionary into the computer. In fact, in the paired sentences of Figure 1, "affecting" was translated as "a propos," a correspondence that is not ordinarily available. Laying aside for the time being the desirability of (idiomatic) word-cluster-to-word-cluster translation, what we are after at first is to find for each word f in the (French) source language the list of words {e_1, e_2, ..., e_n} of the (English) target language into which f can translate, and the probability P(e_i | f) that such a translation takes place.
A first approach to a solution that takes advantage of a large data base of paired sentences (referred to as 'training text') may be as follows. Suppose for a moment that in every French / English sentence pair each French word f translates into one and only one English word e, and that this word is somehow revealed to the computer. Then we could proceed by:
1. Establish a counter C(e_i, f) for each word e_i of the English vocabulary. Initially set C(e_i, f) = 0 for all words e_i. Set J = 1.
2. Find the Jth occurrence of the word f in the French text. Let it take place in the Kth sentence, and let its translate be the qth word in the Kth English sentence E = e_{i1}, e_{i2}, ..., e_{in}. Then increment by 1 the counter C(e_{iq}, f).
3. Increase J by 1 and repeat steps 2 and 3.
Setting M(f) equal to the sum of all the counters C(e_i, f) at the conclusion of the above operation (in fact, it is easy to see that M(f) is the number of occurrences of f in the total French text), we could then estimate the probability P(e_i | f) of translating the word f by the word e_i by the fraction C(e_i, f) / M(f).
The problem with the above approach is that it relies on correct identification of the translates of French words, i.e., on the solution of a significant part of the translation problem. In the absence of such identification, the obvious recourse is to profess complete ignorance, beyond knowing that the translate is one of the words of the corresponding English sentence, each of its words being equally likely. Step 2 of the above algorithm then must be changed to:
2'. Find the Jth occurrence of the word f in the French text. Let it take place in the Kth sentence, and let the Kth English sentence consist of words e_{i1}, e_{i2}, ..., e_{in}. Then increment the counters C(e_{i1}, f), C(e_{i2}, f), ..., C(e_{in}, f) by the fraction 1/n.
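As an illustration, here is a minimal Python sketch of this modified counting scheme over an invented three-pair toy corpus; the names C, M, and P follow the text.

```python
# Sketch of step 2': every word of the corresponding English sentence
# shares one unit of count equally. The three sentence pairs are toy data.
from collections import defaultdict

pairs = [
    ("je souleve la question", "i rise on the question"),
    ("la question de privilege", "the question of privilege"),
    ("le comite est ici", "the committee is here"),
]

C = defaultdict(float)   # C[(e, f)]: fractional co-occurrence counts
M = defaultdict(float)   # M[f]: number of occurrences of f

for french, english in pairs:
    e_words = english.split()
    for f in french.split():
        M[f] += 1.0
        for e in e_words:                  # each e_i equally likely
            C[(e, f)] += 1.0 / len(e_words)

def P(e, f):
    """Estimate P(e | f) = C(e, f) / M(f)."""
    return C[(e, f)] / M[f]

print(P("question", "question"))   # 0.225
print(P("the", "question"))        # also 0.225: the flaw discussed next
```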
This second approach is based on the faith that in a large corpus, the frequency of occurrence of true translates of f in corresponding English sentences would overwhelm that of other candidates whose appearance in those sentences is accidental. This belief is obviously flawed. In particular, the article "the" would get the highest count since it would appear multiply in practically every English sentence, and similar problems would exist with other function words as well.
What needs to be done is to introduce some sort of normalization that would appropriately discount for the expected frequency of occurrence of words. Let P(e_i) denote the probability (based on the above procedure) that the word e_i is a translate of a randomly chosen French word. P(e_i) is given by

P(e_i) = Σ_f' P(e_i | f') P(f') = Σ_f' P(e_i | f') M(f') / M     (3.1)
where M is the total length of the French text, and M(f') is the number of occurrences of f' in that text (as before). The fraction P(e_i | f) / P(e_i) is an indicator of the strength of association of e_i with f, since P(e_i | f) is normalized by the frequency P(e_i) of associating e_i with an average word. Thus it is reasonable to consider e_i a likely translate of f if P(e_i | f) / P(e_i) is sufficiently large.
The above normalization may seem arbitrary, but it has a sound underpinning from the field of Information Theory [10]. In fact, the quantity

I(e_i; f) = log [ P(e_i | f) / P(e_i) ]     (3.2)

is the mutual information between the French word f and the English word e_i.
Unfortunately, while normalization yields ordered lists of likely English word translates of French words, it does not provide us with the desired probability values. Furthermore, we get no guidance as to the size of a threshold T such that e_i would be a candidate translate of f if and only if

I(e_i; f) > T     (3.3)
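Continuing the toy sketch above, the normalization (3.1) and mutual information (3.2) can be computed directly from the count tables C and M; this is a sketch, not our actual code.

```python
# Sketch of (3.1)-(3.2), reusing C, M, and P from the step 2' sketch.
import math

def P_e(e):
    """(3.1): P(e) = sum_f' P(e | f') M(f') / M; the M(f') factors
    cancel, leaving the total fractional count of e over the text length."""
    M_total = sum(M.values())
    return sum(C.get((e, f), 0.0) for f in M) / M_total

def mutual_information(e, f):
    """(3.2): I(e; f) = log [ P(e | f) / P(e) ]."""
    return math.log(P(e, f) / P_e(e))

# Normalization separates the tie left by the raw conditional estimates:
print(mutual_information("question", "question"))  # ~ +0.41
print(mutual_information("the", "question"))       # ~ -0.04
```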
Various ad hoc modifications exist to circumvent the two problems. One might, for instance, find the pair e_i, f with the highest mutual information, eliminate e_i and f from all corresponding sentences in which they occur (i.e. decide once and for all that in those sentences e_i is the translate of f!), then re-compute all the quantities over the shortened texts, determine the new maximizing pair e_i', f', and continue the process until some arbitrary stopping rule is invoked.
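A compact sketch of this greedy procedure follows; the helper recomputes the step 2' counts after each removal, and the fixed number of rounds stands in for the unspecified stopping rule.

```python
# Sketch of the ad hoc greedy procedure: fix the highest mutual
# information pair, strike both words from the sentence pairs in which
# they co-occur, and recompute. `rounds` is an arbitrary stopping rule.
import math
from collections import defaultdict

def counts(pairs):
    C, M = defaultdict(float), defaultdict(float)
    for fr, en in pairs:
        e_words = en.split()
        for f in fr.split():
            M[f] += 1.0
            for e in e_words:
                C[(e, f)] += 1.0 / len(e_words)
    return C, M

def greedy_glossary(pairs, rounds=5):
    decided = []
    for _ in range(rounds):
        C, M = counts(pairs)
        if not C:
            break
        M_total = sum(M.values())
        def mi(e, f):
            p_e = sum(C.get((e, fp), 0.0) for fp in M) / M_total
            return math.log(C[(e, f)] / M[f] / p_e)
        e, f = max(C, key=lambda ef: mi(*ef))
        decided.append((f, e))
        # strike e and f from every pair in which they co-occur
        pairs = [(" ".join(w for w in fr.split() if w != f),
                  " ".join(w for w in en.split() if w != e))
                 if f in fr.split() and e in en.split() else (fr, en)
                 for fr, en in pairs]
    return decided
```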
Before the next section introduces a better approach that yields probabilities, we present in Figure 2 a list of high mutual information English words for some selected French words. The reader will agree that even the flawed technique is quite powerful.
4. A SIMPLE GLOSSARY BASED ON A MODEL OF THE TRANSLATION PROCESS
We will now revert to our original ambition of deriving probabilities of translation, P(e_i | f). Let us start by observing that the algorithm of the previous section has the following flaw: Should it be "decided" that the qth word, e_{iq}, of the English sentence is the translate of the rth word, f_r, of the French sentence, that process makes no provision for removing e_{iq} from consideration as a candidate translate of any of the remaining French words (those not in the rth position)! We need to find a method to decide (probabilistically!) which English word was generated by which French one, and then estimate P(e_i | f) by the relative frequency with which f gave rise to e_i as "observed" in the texts of paired French / English sentence translates. Our procedure will be based on a model (an admittedly crude one) of how English words are generated from their French counterparts.
With a slight additional refinement to be specified in the next section (see the discussion on position distortion), the following model will do the trick. Augment the English vocabulary by the NULL word e_0 that leaves no trace in the English text. Then each French word f will produce exactly one 'primary' English word (which may be, however, invisible). Furthermore, primary English words can produce a number of secondary ones.
The provisions for the null word and for the production of secondary words will account for the unequal length of corresponding French and English sentences. It would be expected that some (but not all) French function words would be killed by producing null words, and that English ones would be created by secondary production. In particular, in the example of Figure 1, one would expect that "reflects" would generate both "which" and "on" by secondary production, and "rise" would similarly generate "on." On the other hand, the article "l'" of "l'Orateur" and the preposition "a" of "a propos" would both be expected to generate a null word in the primary process.
This model of generation of English words from French ones then requires the specification of the following quantities:
1. The probabilities P(e_i | f) that the ith word of the English dictionary was generated by the French word f.
2. The probabilities Q(e_j | e_i) that the jth English word is generated from the ith one in a secondary generation process.
3. The probabilities R(k | e_i) that the ith English word generates exactly k other words in the secondary process. By convention, we set R(0 | e_0) = 1 to assure that the null word does not generate any other words.
The model probability that the word f generates e_{i1} in the primary process, and e_{i2}, ..., e_{ik} in the secondary one, is equal to the product

P(e_{i1} | f) R(k-1 | e_{i1}) Q(e_{i2} | e_{i1}) Q(e_{i3} | e_{i1}) ... Q(e_{ik} | e_{i1})     (4.1)
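In code, the product (4.1) is nearly a one-liner; the sketch below represents the three probability tables as dictionaries, assumed to come from the estimation procedure described next.

```python
# Sketch of the model probability (4.1): f generates one primary English
# word, which in turn generates k-1 secondary ones. P, R, Q are
# probability tables (dicts) assumed estimated elsewhere.
def generation_prob(f, primary, secondaries, P, R, Q):
    """The product P(e_i1|f) R(k-1|e_i1) Q(e_i2|e_i1) ... Q(e_ik|e_i1)."""
    prob = P[(primary, f)] * R[(len(secondaries), primary)]
    for e in secondaries:
        prob *= Q[(e, primary)]
    return prob
```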
Given a pair of English and French sentences E and F, by the term generation pattern $ we understand the specification of which English words were generated from which French ones, and which secondary words from which primary ones. Therefore, the probability P(E, $ | F) of generating the words of E in a pattern $ from those of F is given simply by a product of factors like (4.1), one for each French word. We can then think of estimating the probabilities P(e_i | f), R(k | e_i), and Q(e_j | e_i) by the following algorithm at the start of which all counters are set to 0:
1. For a sentence pair E, F of the texts, find that pattern $ that gives the maximal value of P(E, $ | F), and then make the (somewhat impulsive) decision that that pattern $ actually took place.
2. If in the pattern $, f gave rise to e_i, augment counter CP(e_i, f) by 1; if e_i gave rise to k secondary English words, augment counter CR(k, e_i) by 1; if e_j is any (secondary) word that was given rise to by e_i, augment counter CQ(e_j, e_i) by 1.
3. Carry out steps 1 and 2 for all sentence pairs of the training text.
4. Estimate the model probabilities by normalizing the corresponding counters, i.e.,

P(e_i | f) = CP(e_i, f) / CP(f)   where CP(f) = Σ_i CP(e_i, f)
R(k | e_i) = CR(k, e_i) / CR(e_i)   where CR(e_i) = Σ_k CR(k, e_i)
Q(e_j | e_i) = CQ(e_j, e_i) / CQ(e_i)   where CQ(e_i) = Σ_j CQ(e_j, e_i)
The problem with the above algorithm is that it is circular: in order to evaluate P(E, $ | F) one needs to know the probabilities P(e_i | f), R(k | e_i), and Q(e_j | e_i) in the first place! Fortunately, the difficulty can be alleviated by use of iterative re-estimation, which is a technique that starts out by guessing the values of unknown quantities and gradually re-adjusts them so as to account better and better for given data [11].
More precisely, given any specification of the probabilities P(e_i | f), R(k | e_i), and Q(e_j | e_i), we compute the probabilities P(E, $ | F) needed in step 1, and after carrying out step 4, we use the freshly obtained probabilities P(e_i | f), R(k | e_i), and Q(e_j | e_i) to repeat the process from step 1 again, etc. We halt the computation when the obtained estimates stop changing from iteration to iteration.
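A skeletal Python sketch of this loop is given below. Since the actual pattern search is deferred to reference [14], step 1 is approximated here by independently choosing, for each French word, its most probable primary translate, and the R and Q re-estimates are omitted for brevity.

```python
# Skeletal sketch of the iterative re-estimation loop (hard decisions).
# pairs: list of (french_words, english_words) token lists.
# P: dict (e, f) -> current guess of P(e | f); returned re-estimated.
from collections import defaultdict

def reestimate(pairs, P, iterations=10):
    for _ in range(iterations):
        CP = defaultdict(float)
        for f_words, e_words in pairs:
            for f in f_words:
                # step 1 (approximation): best primary translate of f
                best_e = max(e_words, key=lambda e: P.get((e, f), 1e-12))
                CP[(best_e, f)] += 1.0        # step 2: count the decision
        CPf = defaultdict(float)              # step 4: normalize counters
        for (e, f), c in CP.items():
            CPf[f] += c
        P = {(e, f): c / CPf[f] for (e, f), c in CP.items()}
    return P
```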
While it can be shown that the probability estimates obtained in the above process will converge [11,12], it cannot be proven that the values obtained will be the desired ones. A heuristic argument can be formulated making it plausible that a more complex but computationally excessive version [13] will succeed. Its truncated modification leads to a glossary that seems a very satisfactory one. We present some interesting examples of its P(e_i | f) entries in Figure 3.
Two important aspects of this process have not yet been dealt with: the initial selection of values of P(e_i | f), R(k | e_i), and Q(e_j | e_i), and a method of finding the pattern $ maximizing P(E, $ | F).
A good starting point is as follows:
A. Make Q(e_j | e_i) = 1/K, where K is the size of the English vocabulary.
B. Let R(1 | e_i) = 0.8, R(0 | e_i) = 0.1, R(2 | e_i) = R(3 | e_i) = R(4 | e_i) = R(5 | e_i) = 0.025 for all words e_i except the null word e_0. Let R(0 | e_0) = 1.0.
C. To determine the initial distribution P(e_i | f) proceed as follows:
(i) Estimate first P(e_i | f) by the algorithm of Section 3.
(ii) Compute the mutual information values I(e_i; f) by formula (3.2), and for each f find the 20 words e_i for which I(e_i; f) is largest.
(iii) Let P(e_0 | f) = P(e_i | f) = (1/21) - ε for all words e_i on the list obtained in (ii), where ε is some small positive number. Distribute the remaining probability 21ε uniformly over all the English words not on the list.
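The initialization A-C translates into a few lines of code; in the sketch below, mi_scores holds the Section 3 mutual information values, english_vocab excludes the null word, and ε is a small constant, all per the text.

```python
# Sketch of initialization steps A-C. english_vocab excludes the null
# word, which is added here; mi_scores maps (e, f) to I(e; f) from
# Section 3. Dictionary tables are only practical at toy vocabulary sizes.
EPS = 1e-4
NULL = "<null>"   # the null word e_0

def initialize(french_vocab, english_vocab, mi_scores):
    K = len(english_vocab)
    # A: uniform secondary generation
    Q = {(ej, ei): 1.0 / K for ei in english_vocab for ej in english_vocab}
    # B: the number-of-secondaries distribution R(k | e)
    R = {}
    for e in english_vocab:
        R[(0, e)], R[(1, e)] = 0.1, 0.8
        for k in (2, 3, 4, 5):
            R[(k, e)] = 0.025
    for k in (1, 2, 3, 4, 5):
        R[(k, NULL)] = 0.0
    R[(0, NULL)] = 1.0            # the null word generates nothing
    # C: top-20 mutual information words plus the null word
    P = {}
    for f in french_vocab:
        ranked = sorted(english_vocab,
                        key=lambda e: mi_scores.get((e, f), float("-inf")),
                        reverse=True)
        top, rest = ranked[:20], ranked[20:]
        for e in top + [NULL]:
            P[(e, f)] = 1.0 / 21 - EPS
        for e in rest:            # remaining mass 21*EPS spread uniformly
            P[(e, f)] = 21 * EPS / len(rest)
    return P, R, Q
```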
Finding the maximizing pattern $ for a given sentence pair E, F is a well-studied technical problem with a variety of computationally feasible solutions that are suboptimal in some practically unimportant respects [14]. Not to interrupt the flow of intuitive ideas, we omit the discussion of the corresponding algorithms.
5. TOWARD A COMPLEX GLOSSARY
In the previous section we have introduced a technique that derives a word-to-word translation glossary. We will now refine the model to make the probabilities a better reflection of reality, and then outline an approach for including in the glossary the fixed locutions discussed in Section 2.
It should be noted that while English / French translation is quite local (as illustrated by the alignment of Figure 1), the model leading to (4.1) did not take advantage of this affinity of the two languages: the relative position of the word translate pairs in their respective sentences was not taken into account. If m and n denote the respective lengths of corresponding French and English sentences, then the probability that e_{ik} (the kth word in the English sentence) is a primary translate of f_h (the hth word in the French sentence) should more accurately be given by the probability P(e_i, k | f_h, h, m, n) that depends both on word positions and sentence lengths. To keep the formulation as simple as possible, we can restrict ourselves to the functional form

P(e_i, k | f_h, h, m, n) = PW(e_i | f_h) PD(k | h, m, n)     (5.1)

In (5.1) we make the 'distortion' distribution PD(k | h, m, n) independent of the identity of the words whose positional discrepancy it describes.
As far as secondary generation is concerned, it is first clear that the production of preceding words differs from that of those that follow. So the R and Q probabilities should be split into left and right probabilities RL and QL, and RR and QR. Furthermore, we should provide the Q-probabilities with their own distortion components that would depend on the distance of the secondary word from its primary 'parent'. As a result of these considerations, the probability that f_h generates (for instance) the primary word e_{ik} and preceding and following secondary words e_{ik-3}, e_{ik-1}, e_{ik+2} would be given by

PW(e_{ik} | f_h) PD(k | h, m, n) RL(2 | e_{ik}) RR(1 | e_{ik}) QL(e_{ik-3}, 3 | e_{ik}) QL(e_{ik-1}, 1 | e_{ik}) QR(e_{ik+2}, 2 | e_{ik})     (5.2)
Obviously, other distortion formulations are possible. The purpose of any is to sharpen the derivation process by restricting the choice of translates to the positionally likely candidates in the corresponding sentence.
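A sketch of the refined probability follows; the six tables are dictionaries assumed estimated elsewhere, and the left and right secondary words carry their distance from the primary 'parent', as in (5.2).

```python
# Sketch of the refined generation probability (5.2). PW, PD, RL, RR,
# QL, QR are probability tables (dicts) assumed estimated elsewhere;
# `left` and `right` hold (word, distance-from-parent) pairs.
def refined_prob(f, h, e, k, m, n, left, right, PW, PD, RL, RR, QL, QR):
    """P that f at position h (of m) generates primary e at position k
    (of n) plus the given left and right secondary words."""
    prob = PW[(e, f)] * PD[(k, h, m, n)]
    prob *= RL[(len(left), e)] * RR[(len(right), e)]
    for word, dist in left:
        prob *= QL[(word, dist, e)]
    for word, dist in right:
        prob *= QR[(word, dist, e)]
    return prob
```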
To find fixed locutions in English, we can use the final probabilities QL and QR obtained by the method of the previous section to compute mutual informations between primary and secondary word pairs,

IR(e; e') = log [ QR(e' | e) / P(e') ]     (5.3)

and

IL(e'; e) = log [ QL(e' | e) / P(e') ]

where P(e') = C(e') / N is the relative frequency of occurrence of the secondary word e' in the English text (C(e') denotes the number of occurrences of e' in the text of size N), and QR and QL are the average secondary generation probabilities,

QR(e' | e) = Σ_i QR(e', i | e)     (5.4)

and

QL(e' | e) = Σ_i QL(e', i | e)

We can then establish an experimentally appropriate threshold T, and include in the glossary all pairs (e, e') and (e', e) whose mutual information exceeds T.
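In code, the locution extraction is a thresholding pass; the sketch below assumes position-summed tables QR_avg and QL_avg per (5.4) and a unigram frequency table P_e, both estimated elsewhere.

```python
# Sketch of fixed-locution discovery: keep word pairs whose mutual
# information (5.3) exceeds the threshold T. QR_avg and QL_avg are the
# position-summed tables of (5.4); P_e[e2] = C(e2)/N.
import math

def find_locutions(QR_avg, QL_avg, P_e, T):
    locutions = set()
    for (e, e2), q in QR_avg.items():     # e2 generated to the right of e
        if q > 0.0 and math.log(q / P_e[e2]) > T:
            locutions.add((e, e2))
    for (e, e2), q in QL_avg.items():     # e2 generated to the left of e
        if q > 0.0 and math.log(q / P_e[e2]) > T:
            locutions.add((e2, e))
    return locutions
```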
While the process above results in two-word fixed locutions, longer locutions can be obtained iteratively in the next round after the two-word variety has been included in the glossary and in the formulation of its creation.

To obtain French locutions, one must simply reverse the direction of the translation process, making English and French the source and target languages, respectively.
With two-word locutions present in both the English and French parts of the glossary, it is necessary to reformulate the generation process (4.1). The change would be minimal if we could decide to treat the words of a locution (f, f') as a single word f* = (f, f') rather than as two separate words f and f' whenever both are found in a sentence. In such a case nothing more than a recoding of the French text would be required. However, such a radical step would almost certainly be wrong: it could well connect auxiliaries and participles that were not part of a single past construction. Clearly then, the choice between separateness and unity should be statistical, with probabilities estimated in the overall glossary construction process and initialized according to the frequencies with which elements of the pair f, f' were associated or not by secondary generation when they appeared in the same sentence.
Since the approach of this section was not yet used to obtain any results, we will leave its complete mathematical specification to a future report.
6. GENERATION OF TRANSLATED TEXT
We have pointed out in Section 2 that translation can be somewhat naively regarded as a three stage process:
(1) Partition the source text into a set of fixed locutions.
(2) Use the glossary plus contextual information to select the corresponding set of fixed locutions in the target language.
(3) Arrange the words of the target fixed locutions into a sequence forming the target sentence.
We have just finished arguing in Section 5 that the partitioning of source text into locutions is somewhat complex, and that it must be approached statistically. The basic idea of using contextual information to select the correct 'sense' of a locution is to construct a contextual glossary based on a probability of the form P(e | f; g(F)) where e and f are English and French locutions, and g(F) denotes a 'lexical' equivalence class of the sentence F. The test of class membership would typically depend on the presence of some combination of words in F. The choice of an appropriate equivalence classification scheme would, of course, be the subject of research based on yet another statistical formulation. The estimate of P(e | f; g(F)) would be derived from counts of locution alignments in sentence translate pairs, the alignments being estimated based on non-contextual glossary probabilities of the form (5.2).
The last step in our translation scheme is the re-arrangement of the words of the generated English locutions into an appropriate sequence. To see whether this can be done statistically, we explored what would happen in the impossibly optimistic case where the words generated in (2) were exactly those of the English sentence (only their order would be unknown):

From a large English corpus we derived estimates of trigram probabilities, P(e_3 | e_1, e_2), that the word e_3 immediately follows the sequence pair e_1, e_2. A model of English sentence production based on a trigram estimate would conclude that a sentence e_1, e_2, ..., e_n is generated with probability

P(e_1, e_2) P(e_3 | e_1, e_2) P(e_4 | e_2, e_3) ... P(e_n | e_{n-2}, e_{n-1})     (6.1)
We then took other English sentences (not included in the training corpus) and determined which of the n! different arrangements of their n words was most likely, using the formula (6.1). We found that in 63% of sentences of 10 words or less, the most likely arrangement was the original English sentence. Furthermore, the most likely arrangement preserved the meaning of the original sentence in 79% of the cases.
Figure 4 shows examples of synonymous and non-synonymous re-arrangements.
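The arrangement experiment reduces to scoring every permutation with (6.1); a minimal sketch follows, in which bigram_lp and trigram_lp are assumed callables wrapping a model trained on a large corpus, and brute force is only feasible for the short sentences considered here.

```python
# Sketch of the word-arrangement experiment: score all n! orderings of a
# short word list with (6.1) and keep the best. bigram_lp(e2, e1) and
# trigram_lp(e3, e1, e2) are assumed log-probability callables from a
# trained model; n! growth limits this to roughly n <= 10.
import itertools

def sentence_logprob(words, bigram_lp, trigram_lp):
    """log of (6.1): P(e1, e2) * prod_k P(e_k | e_{k-2}, e_{k-1});
    assumes at least two words."""
    lp = bigram_lp(words[1], words[0])
    for k in range(2, len(words)):
        lp += trigram_lp(words[k], words[k - 2], words[k - 1])
    return lp

def best_arrangement(words, bigram_lp, trigram_lp):
    return max(itertools.permutations(words),
               key=lambda p: sentence_logprob(p, bigram_lp, trigram_lp))
```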
We realize that very little hope exists of the glossary yielding the words and only the words of an English sentence translating the original French one, and that, furthermore, English sentences are typically longer than 10 words. Nevertheless, we feel that the above result is a hopeful one for future statistical translation methods incorporating the use of appropriate syntactic structure information.
REFERENCES
[1] L.R. Bahl, F. Jelinek, and R.L. Mercer: A maximum likelihood approach to continuous speech recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(2): 179-190, March 1983.
[2] J.K. Baker: Stochastic modeling for automatic speech understanding. In R.A. Reddy, editor, Speech Recognition, pages 521-541, Academic Press, New York, 1979.
[3] J.D. Ferguson: Hidden Markov analysis: An introduction. In J.D. Ferguson, Ed., Hidden Markov Models for Speech, Princeton, New Jersey, IDA-CRD, Oct. 1980, pp. 8-15.
[4] J. McH. Sinclair: "Lexicographic Evidence" in Dictionaries, Lexicography and Language Learning (ELT Documents: 120), editor R. Ilson, New York: Pergamon Press, pp. 81-94, 1985.
[5] R.G. Garside, G.N. Leech and G.R. Sampson: The Computational Analysis of English: a Corpus-Based Approach, Longman, 1987.
[6] G.R. Sampson: "A Stochastic Approach to Parsing" in Proceedings of the 11th International Conference on Computational Linguistics (COLING '86), Bonn, 151-155, 1986.
[7] W. Weaver: Translation (1949). Reproduced in: Locke, W.N. & Booth, A.D., eds.: Machine translation of languages, Cambridge, MA: MIT Press, 1955.
[8] Hansards: Official Proceedings of the House of Commons of Canada, 1974-78, Canadian Government Printing Bureau, Hull, Quebec, Canada.
[9] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone: Classification and Regression Trees, Wadsworth and Brooks, Monterey, CA, 1984.
[10] R.G. Gallager: Information Theory and Reliable Communication, John Wiley and Sons, Inc., New York, 1968.
[11] A.P. Dempster, N.M. Laird, and D.B. Rubin: Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, 39(B): 1-38, 1977.
[12] A.J. Viterbi: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, IT-13: 260-269, 1967.
[13] L.E. Baum: An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process, Inequalities, 3: 1-8, 1972.
[14] F. Jelinek: A fast sequential decoding algorithm using a stack, IBM Journal of Research and Development, vol. 13, pp. 675-685, Nov. 1969.
Mr. Speaker, I rise on a question of privilege
Monsieur l'Orateur, je souleve la question de privilege
affecting the rights and prerogatives of parliamentary committees
a propos des droits et des prerogatives des comites parlementaires
and one which reflects on the word of two ministers
et pour mettre en doute les propos de deux ministres
of the Crown.
de la Couronne.
FIGURE 1
ALIGNMENT OF A FRENCH AND ENGLISH SENTENCE PAIR
eau water
lait milk
banque bank
banques banks
hier yesterday
janvier January
jours days
votre your
enfants children
trop too
toujours always
trois three
monde world
pourquoi why
aujourd'hui today
sans without
lui him
mais but
suis am
seulement only
peut cannot
ceintures seat
ceintures belts
bravo !
FIGURE 2
A LIST OF HIGH MUTUAL INFORMATION FRENCH-ENGLISH
WORD PAIRS
WHICH QUI
1. qui 0.380 who 0.188
2. que 0.177 which 0.161
3. dont 0.082 that 0.084
4. de 0.060 0.038
5. d' 0.035 to 0.032
6. laquelle 0.031 of 0.027
7. ou 0.027 the 0.026
8. et 0.022 what 0.018
THEREFORE DONC
1. donc 0.514 therefore 0.322
2. consequent 0.075 so 0.147
3. pat" 0.074 is 0.034
4. ce 0.066 then 0.024
5. pourquoi 0.064 thus 0.022
6. alors 0.025 the 0.018
7. il 0.025 that 0.013
8. aussi 0.015 us 0.012
STILL ENCORE
1. encore 0.435 still 0.181
2. toujours 0.230 again 0.174
3. reste 0.027 yet 0.148
4. *** 0.020 even 0.055
5. quand 0.018 more 0.046
6. meme 0.017 another 0.030
7. de 0.015 further 0.021
8. de 0.014 once 0.013
FIGURE 3 (PART I)
EXAMPLES OF PARTIAL GLOSSARY LISTS OF MOST LIKELY
WORD TRANSLATES AND THEIR PROBABILITIES
Note: *** denotes miscellaneous words not belonging to the lexicon.
PEOPLE GENS
1. les 0.267 people 0.781
2. gens 0.244 they 0.013
3. personnes 0.100 those 0.009
4. population 0.055 individuals 0.008
5. peuple 0.035 persons 0.005
6. canadiens 0.031 people's 0.004
7. habitants 0.024 men 0.004
8. ceux 0.023 person 0.003
OBTAIN OBTENIR
1. obtenir 0.457 get 0.301
2. pour 0.050 obtain 0.108
3. les 0.033 have 0.036
4. de 0.031 getting 0.032
5. trouver 0.026 seeking 0.023
6. se 0.025 available 0.021
7. obtenu 0.020 obtaining 0.021
8. procurer 0.020 information 0.016
QUICKLY RAPIDEMENT
1. rapidement 0.508 quickly 0.389
2. vite 0.130 rapidly 0.147
3. tot 0.042 fast 0.052
4. rapide 0.021 quick 0.042
5. brievement 0.019 soon 0.036
6. aussitot 0.013 faster 0.035
7. plus 0.012 speedy 0.026
8. bientot 0.012 briefly 0.025
FIGURE 3 (PART II)
EXAMPLES OF PARTIAL GLOSSARY LISTS OF MOST LIKELY
WORD TRANSLATES AND THEIR PROBABILITIES
EXAMPLES OF RECONSTRUCTION THAT PRESERVE MEANING:
would I report directly to you?
I would report directly to you?
now let me mention some of the disadvantages.
let me mention some of the disadvantages now.
he did this several hours later.
this he did several hours later.
EXAMPLES OF RECONSTRUCTION THAT DO NOT PRESERVE MEANING:
these people have a fairly large rate of turnover.
of these people have a fairly large turnover rate.
in our organization research has two missions.
in our missions research organization has two.
exactly how this might be done is not clear.
clear is not exactly how this might be done.
FIGURE 4
STATISTICAL ARRANGEMENT OF WORDS BELONGING TO
ENGLISH SENTENCES