Peter F. Brown’s research while affiliated with IBM Research - Thomas J. Watson Research Center and other places


Publications (22)


A Statistical Approach to Machine Translation
  • Chapter

June 2003 · 35 Reads · 5 Citations

Peter F. Brown · John Cocke · Stephen A. Della Pietra · [...] · Paul S. Roossin

A collection of historically significant articles on machine translation, from its beginnings through the early 1990s. The field of machine translation (MT)—the automation of translation between human languages—has existed for more than fifty years. MT helped to usher in the field of computational linguistics and has influenced methods and applications in knowledge representation, information theory, and mathematical statistics. This valuable resource offers the most historically significant English-language articles on MT. The book is organized in three sections. The historical section contains articles from MT's beginnings through the late 1960s. The second section, on theoretical and methodological issues, covers sublanguage and controlled input, the role of humans in machine-aided translation, the impact of certain linguistic approaches, the transfer versus interlingua question, and the representation of meaning and knowledge. The third section, on system design, covers knowledge-based, statistical, and example-based approaches to multilevel analysis and representation, as well as computational issues. Bradford Books imprint


A Statistical Approach To Machine Translation
  • Article
  • Full-text available

July 2002 · 2,061 Reads · 1,388 Citations

Computational Linguistics

In this paper, we present a statistical approach to machine translation. We describe the application of our approach to translation from French to English and give preliminary results.


Word-Sense Disambiguation Using Statistical Methods

May 2002 · 656 Reads · 213 Citations

We describe a statistical technique for assigning senses to words. An instance of a word is assigned a sense by asking a question about the context in which the word appears. The question is constructed to have high mutual information with the translation of that instance in another language. When we incorporated this method of assigning senses into our statistical machine translation system, the error rate of the system decreased by thirteen percent.
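The mutual-information criterion behind the sense questions can be illustrated with a toy example. All words, counts, and the candidate question below are hypothetical, not from the paper; the point is only to show how a binary context question is scored against the translations:

```python
from collections import Counter
from math import log2

# Toy parallel data: (context_word, french_translation) pairs for an
# ambiguous English word such as "duty". Counts are invented.
pairs = [("tax", "droit")] * 40 + [("pay", "droit")] * 10 + \
        [("moral", "devoir")] * 35 + [("pay", "devoir")] * 15

def mutual_information(pairs):
    """I(X; Y) in bits for a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

# A candidate sense question: "does the context contain 'tax'?"
def question(x):
    return x == "tax"

# Score the question by the mutual information between its yes/no
# answer and the translation; higher is a better sense indicator.
split = [("yes" if question(x) else "no", y) for x, y in pairs]
mi = mutual_information(split)
```

In a full system one would score many candidate questions this way and keep the one with the highest mutual information.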


Aligning Sentences In Parallel Corpora

May 2002 · 210 Reads · 281 Citations

In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. Because we make no use of the lexical details of the sentence, the alignment computation is fast and therefore practical for application to very large collections of text. We have used this technique to align several million sentences in the English-French Hansard corpora and have achieved an accuracy in excess of 99% in a randomly selected set of 1000 sentence pairs that we checked by hand. We show that even without the benefit of anchor points the correlation between the lengths of aligned sentences is strong enough that we should expect to achieve an accuracy of between 96% and 97%. Thus, the technique may be applicable to a wider variety of texts than we have yet tried.
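The core idea of aligning by token counts alone can be sketched as a small dynamic program over alignment "beads" (1-1, 1-0, 0-1, 2-1, 1-2). The cost function below is a crude illustrative stand-in for the probabilistic length model trained in the paper, not the paper's actual model:

```python
def length_cost(l1, l2):
    """Penalty for pairing a source segment of l1 tokens with a target
    segment of l2 tokens; deletions/insertions pay a fixed surcharge."""
    if l1 == 0 and l2 == 0:
        return 0.0
    mean = (l1 + l2) / 2
    return abs(l1 - l2) / mean + (2.0 if l1 == 0 or l2 == 0 else 0.0)

def align(src_lens, tgt_lens):
    """Align sentences given only their token counts.

    Returns the cheapest bead sequence as a list of ((i, j), (i2, j2))
    pairs, meaning src[i:i2] was aligned with tgt[j:j2]."""
    INF = float("inf")
    n, m = len(src_lens), len(tgt_lens)
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == INF:
                continue
            for di, dj in beads:
                if i + di <= n and j + dj <= m:
                    c = best[i][j] + length_cost(sum(src_lens[i:i + di]),
                                                 sum(tgt_lens[j:j + dj]))
                    if c < best[i + di][j + dj]:
                        best[i + di][j + dj] = c
                        back[i + di][j + dj] = (i, j)
    # Trace the bead sequence back from the final state.
    path, ij = [], (n, m)
    while ij != (0, 0):
        prev = back[ij[0]][ij[1]]
        path.append((prev, ij))
        ij = prev
    return list(reversed(path))

# Three source sentences of 10, 4, and 9 tokens against two target
# sentences of 11 and 13 tokens: the DP merges the last two sources.
alignment = align([10, 4, 9], [11, 13])
```

Because no lexical lookups are involved, the cost of the computation depends only on the sentence counts, which is what makes the approach fast on very large corpora.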


The Candide System for Machine Translation

March 2000 · 557 Reads · 102 Citations

We present an overview of Candide, a system for automatic translation of French text to English text. Candide uses methods of information theory and statistics to develop a probability model of the translation process. This model, which is made to accord as closely as possible with a large body of French and English sentence pairs, is then used to generate English translations of previously unseen French sentences. This paper provides a tutorial in these methods, discussions of the training and operation of the system, and a summary of test results.

1. Introduction. Candide is an experimental computer program, now in its fifth year of development at IBM, for translation of French text to English text. Our goal is to perform fully-automatic, high-quality text-to-text translation. However, because we are still far from achieving this goal, the program can be used in both fully-automatic and translator's-assistant modes. Our approach is founded upon the statistical analysis of language....


Estimating Hidden Markov Model Parameters So As To Maximize Speech Recognition Accuracy

February 1993 · 137 Reads · 70 Citations

IEEE Transactions on Speech and Audio Processing

The problem of estimating the parameter values of hidden Markov word models for speech recognition is addressed. It is argued that maximum-likelihood estimation of the parameters via the forward-backward algorithm may not lead to values which maximize recognition accuracy. An alternative estimation procedure called corrective training, which is aimed at minimizing the number of recognition errors, is described. Corrective training is similar to a well-known error-correcting training procedure for linear classifiers and works by iteratively adjusting the parameter values so as to make correct words more probable and incorrect words less probable. There are strong parallels between corrective training and maximum mutual information estimation; the relationship of these two techniques is discussed and a comparison is made of their performance. Although it has not been proved that the corrective training algorithm converges, experimental evidence suggests that it does, and that it leads to fewer recognition errors than can be obtained with conventional training methods.
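The error-correcting flavor of the update can be illustrated with a perceptron-style sketch: raise the parameters used by the correct word, lower those used by the misrecognized competitor. This is a simplified stand-in for the HMM-parameter adjustment in the paper; the feature names and values are hypothetical:

```python
def corrective_step(params, correct_feats, error_feats, step=0.1):
    """One corrective-training step: push up the parameters the correct
    word relies on, push down those of the misrecognized competitor."""
    for f, v in correct_feats.items():
        params[f] = params.get(f, 0.0) + step * v
    for f, v in error_feats.items():
        params[f] = params.get(f, 0.0) - step * v
    return params

def score(params, feats):
    """Linear score of a word hypothesis under the current parameters."""
    return sum(params.get(f, 0.0) * v for f, v in feats.items())

# Hypothetical feature vectors for two confusable word hypotheses.
correct = {"frame_a": 1.0, "frame_b": 2.0}
wrong   = {"frame_a": 1.0, "frame_c": 2.0}

params = {}
# Iterate until the decoder would prefer the correct word.
while score(params, correct) <= score(params, wrong):
    params = corrective_step(params, correct, wrong)
```

Shared features ("frame_a" here) cancel out of the update, so only the discriminating evidence moves, which mirrors how corrective training concentrates on the errors the recognizer actually makes.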


But dictionaries are data too

January 1993 · 84 Reads · 41 Citations

Although empiricist approaches to machine translation depend vitally on data in the form of large bilingual corpora, bilingual dictionaries are also a source of information. We show how to model at least a part of the information contained in a bilingual dictionary so that we can treat a bilingual dictionary and a bilingual corpus as two facets of a unified collection of data from which to extract values for the parameters of a probabilistic machine translation system. We give an algorithm for obtaining maximum likelihood estimates of the parameters of a probabilistic model from this combined data and we show how these parameters are affected by inclusion of the dictionary for some sample words.
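The idea of treating dictionary and corpus as two facets of one data collection can be sketched by folding dictionary entries into the translation counts as extra observations before taking the maximum-likelihood estimate. The weighting scheme below is hypothetical, a far simpler device than the paper's unified probabilistic model:

```python
from collections import Counter

def combined_mle(corpus_counts, dictionary_pairs, dict_weight=1.0):
    """MLE of t(f|e) from corpus co-occurrence counts, with each
    bilingual-dictionary entry counted as dict_weight extra observations."""
    counts = Counter(corpus_counts)
    for e, f in dictionary_pairs:
        counts[(e, f)] += dict_weight
    totals = Counter()
    for (e, f), c in counts.items():
        totals[e] += c
    return {(e, f): c / totals[e] for (e, f), c in counts.items()}

# Toy numbers: the corpus pairs "house" with "maison" 3 times and with
# "la" 2 times; a dictionary entry house -> maison tips the estimate.
t = combined_mle({("house", "maison"): 3, ("house", "la"): 2},
                 [("house", "maison")], dict_weight=2.0)
```

With the dictionary included, t(maison|house) rises from 3/5 to 5/7, showing how dictionary evidence can sharpen parameters for words that are rare in the corpus.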


The Mathematics of Statistical Machine Translation: Parameter Estimation

January 1993 · 2,931 Reads · 3,887 Citations

Computational Linguistics

We describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another. We define a concept of word-by-word alignment between such pairs of sentences. For any given pair of such sentences each of our models assigns a probability to each of the possible word-by-word alignments. We give an algorithm for seeking the most probable of these alignments. Although the algorithm is suboptimal, the alignment thus obtained accounts well for the word-by-word relationships in the pair of sentences. We have a great deal of data in French and English from the proceedings of the Canadian Parliament. Accordingly, we have restricted our work to these two languages; but we feel that because our algorithms have minimal linguistic content they would work well on other pairs of languages. We also feel, again because of the minimal linguistic content of our algorithms, that it is reasonable to argue that word-by-word alignments are inherent in any sufficiently large bilingual corpus.
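The simplest of the series (commonly known as IBM Model 1) admits a compact EM sketch. The toy corpus below is invented for illustration; this is the general shape of the estimation procedure, not the paper's implementation:

```python
from collections import defaultdict

def train_model1(corpus, iters=10):
    """EM estimation of Model 1 translation probabilities t(f|e).

    `corpus` is a list of (english_tokens, french_tokens) pairs; a NULL
    word is prepended to each English sentence so that French words need
    not align to any real English word."""
    f_vocab = {f for _, fs in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iters):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in corpus:
            es = ["<NULL>"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalize over alignments
                for e in es:
                    p = t[(f, e)] / z            # expected alignment count
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():          # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

corpus = [(["the", "house"], ["la", "maison"]),
          (["the", "door"],  ["la", "porte"]),
          (["the", "book"],  ["le", "livre"])]
t = train_model1(corpus, iters=20)
```

After a few iterations the co-occurrence pattern alone pulls "la" toward "the" (they co-occur twice) and leaves "maison" for "house", even though no alignment was ever observed directly, which is exactly the sense in which alignments are inherent in the corpus.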


Dividing and conquering long sentences in a translation system

January 1992 · 239 Reads · 10 Citations

The time required for our translation system to handle a sentence of length l is a rapidly growing function of l. We describe here a method for analyzing a sentence into a series of pieces that can be translated sequentially. We show that for sentences with ten or fewer words, it is possible to decrease the translation time by 40% with almost no effect on translation accuracy. We argue that for longer sentences, the effect should be more dramatic.


Table 2: Tokens in the test sample but not in the 293,181-token vocabulary.
Table 3: Component contributions to the cross-entropy.
An Estimate of an Upper Bound for the Entropy of English

January 1992 · 464 Reads · 372 Citations

Computational Linguistics

We present an estimate of an upper bound of 1.75 bits for the entropy of characters in printed English, obtained by constructing a word trigram model and then computing the cross-entropy between this model and a balanced sample of English text. We suggest the well-known and widely available Brown Corpus of printed English as a standard against which to measure progress in language modeling and offer our bound as the first of what we hope will be a series of steadily decreasing bounds.
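The per-character cross-entropy computation can be sketched as follows. The paper uses a word trigram model; the unigram model with add-one smoothing below is only a minimal illustration of how a word model's cross-entropy is converted to bits per character:

```python
from collections import Counter
from math import log2

def bits_per_character(train_words, test_words):
    """Cross-entropy of a smoothed unigram word model over a test text,
    expressed in bits per character (each word's trailing space counts
    as a character)."""
    n = len(train_words)
    vocab = set(train_words) | set(test_words)
    counts = Counter(train_words)

    def p(w):
        # add-one smoothing so unseen test words get nonzero probability
        return (counts[w] + 1) / (n + len(vocab))

    bits = -sum(log2(p(w)) for w in test_words)
    chars = sum(len(w) + 1 for w in test_words)
    return bits / chars

bpc = bits_per_character("the cat sat on the mat".split(), "the cat".split())
```

Any valid probability model yields an upper bound on the true entropy this way, because cross-entropy can never fall below entropy; better models simply give tighter bounds.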


Citations (18)


... It took many months to train, and the result fell short of expectations: a 4% reduction in perplexity over the baseline trigram, and a further 9% reduction when interpolated with the latter. In the second attempt [38], much stronger bias was introduced: first, the vocabulary was clustered into a binary hierarchy as in [33], and each word was assigned a bit-string representing the path leading to it from the root. Then, tree questions were restricted to the identity of the most significant as-yet-unknown bit in each word in the history. ...

Reference:

Rosenfeld, R.: Two decades of statistical language modeling: where do we go from here? Proceedings of the IEEE 88(8), 1270-1278
Language modeling using decision trees
  • Citing Article

... Other corpus-based machine translation approaches use statistical and probabilistic techniques for the analysis of the source language text and the generation of the target language text. An example of this type of corpus-based machine translation is the statistics-based approach of Brown et al. (1990). Example-based translation and statistically-based translation are so-called "empirical" approaches which apply relatively low-level statistical or pattern-matching techniques. ...

Analysis, Statistical Transfer, and Synthesis in Machine Translation

... Our work is related to the studies on segmenting long sentences into short ones, and [5] first explored dividing a sentence into a set of parts. Later, many criteria were proposed, such as N-gram, edit distance clues [7], and word alignment [22]. ...

Dividing and conquering long sentences in a translation system

... A fertility penalty actually allows the pairwise weights to be more optimistic in that they can predict more alignments for reasonable pairs, allowing the fertility penalty to ensure only the best is chosen. This penalty also prevents the "garbage collecting" effect that arises for instances that have rare features (Brown et al., 1993). ...

But dictionaries are data too

... Unsupervised Explicit Alignment does not require any labels to align the instances from different modalities [77,78]. In time series applications where multi-view time series are available, dynamic time warping (DTW) can be used as a similarity measure to align two different sequences [78,79]. ...

The Mathematics of Statistical Machine Translation: Parameter Estimation

Computational Linguistics

... Then, with access to the true distribution of our grammar, we evaluate and interpret our models. model architectures or training regimes (Brown et al., 1992), or with the expected perplexity given a (compute-optimal) scaling law (Kaplan et al., 2020;Hoffmann et al., 2022). This approach has been enormously successful, but it leaves a number of interesting questions unanswered. ...

An Estimate of an Upper Bound for the Entropy of English

Computational Linguistics

... As a result, in an uncomfortable iterative process, network retraining and HMM realignments are alternated to provide targets that are more exact. Direct training of HMM neural network hybrids has been done using full-sequence training techniques like Maximum Mutual Information to increase the likelihood of accurate transcription (Bahl et al., 1986). However, these methods can only be used to retrain a system that has already been trained at the frame level, and they necessitate the careful adjustment of several hyperparameters, often much more than for deep neural networks. ...

Maximum mutual information estimation of hidden Markov parameters for speech recognition

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

... The authors reported an improved performance with their techniques. In [28], a method for parameter estimation designed to cope with the inaccurate modelling assumptions sometimes made with HMM was developed. Also investigated in the paper was the trade-off between packing information into the speech feature vector and the ability to derive accurate models from the vectors. ...

Speech recognition with continuous-parameter hidden Markov models
  • Citing Article
  • September 1987

Computer Speech & Language

... Jelinek-Mercer smoothing [31], ratio of Host Control Interface (HCI) command frames in the flow, ratio of ACL data frames in the flow, ratio of Synchronous Connection Oriented (SCO) data frames in the flow, and the ratio of HCI data frames in the flow. Malicious network flows are detected using a pretrained machine learning model for attack detection. ...

A Statistical Approach to French/English Translation

... Upper bounds on the entropy of English (bits per character), by reference and model: 0.6–1.3, Shannon [86], experiments with human subjects; 1.2, Rosenfeld [81], maximum entropy language model; 1.25, Cover and King [26], experiments with human subjects; 1.5, Tilbourg [94], n-grams; 2.14, Shannon [86], word frequencies; 2.39, Witten, Moffat, Bell [100], PPMC (context length of 5); 2.48, Moffat [69], PPMC (context length of 3); 3.1 ...

A fast algorithm for deleted interpolation
  • Citing Conference Paper
  • September 1991