Lawrence Rabiner

Rutgers, The State University of New Jersey | Rutgers · Department of Electrical and Computer Engineering

About

362 Publications
55,492 Reads
75,150 Citations

Publications (362)
Article
Recounts the career and contributions of James L. Flanagan.
Conference Paper
This paper describes a set of 58 Matlab®-based speech processing exercises designed to give students and instructors hands-on experience with digital speech processing basics, fundamentals, representations, algorithms and applications. This result is achieved by providing working Matlab code using a LITE graphical user interface (GUI) for ease of u...
Conference Paper
This paper describes a set of about 60 MATLAB®-based speech processing exercises designed to give students and instructors hands-on experience with digital speech processing algorithms and systems. This result is achieved by providing MATLAB code with a graphical user interface (GUI) for ease of use and understanding of the code. For each MATLAB ex...
Conference Paper
A new loss function has been introduced for Minimum Classification Error, that approaches optimal Bayes' risk and also gives an improvement in performance over standard MCE systems when evaluated on the Aurora connected digits database.
Conference Paper
A Minimum Classification Error (MCE) based recognition system that also estimates a global feature transformation matrix has been implemented. Unlike earlier studies, we make the explicit assumption that the covariance matrix of the Gaussian mixtures is diagonal when estimating the transformation matrix. This is necessary for mathematical consisten...
Conference Paper
We have developed a novel loss function that embeds large-margin classification into Minimum Classification Error (MCE) training. Unlike previous efforts, this approach employs a loss function that is bounded and does not require incremental adjustment of the margin or prior MCE training. It extends the Bayes risk formulation of MCE using Parzen Window...
Conference Paper
Phonemes in the English language can be represented using either parallel or hierarchical distinctive speech features. There have been a number of efforts to integrate multiple information sources but none of these efforts addressed the issue of combining multiple sets of articulatory/linguistic features with different organization topologies. In t...
Chapter
The quest for a machine that can recognize and understand speech, from any speaker, and in any environment has been the holy grail of speech recognition research for more than 70 years. Although we have made great progress in understanding how speech is produced and analyzed, and although we have made enough advances to build and deploy in the field...
Chapter
Introduction; Recognition Technology Hierarchy; Sources of Variability of Speech; Issues in Implementation of Speech Recognition Systems; Statistical Pattern Recognition Model; Templates Versus Statistical Models; Results on Isolated Word Recognition; Connected-Word Recognition Model; Continuous, Large Vocabulary, Speech Recognition; Summary; References
Conference Paper
Automatic Speech Attribute Transcription (ASAT), an ITR project sponsored under the NSF grant (IIS-04-27113), is a cross-institute effort involving Georgia Institute of Technology, The Ohio State University, University of California at Berkeley, and Rutgers University. This project approaches speech recognition from a more linguistic perspective: u...
Conference Paper
Time-Delay Neural Networks (TDNN) have been shown by Waibel et al. (1) to be a good method for the classification of dynamic speech sounds such as voiced stop consonants. In this paper we discuss key issues in the design and training of a TDNN, based on a Multi-Layer Perceptron (MLP), when used for classification of the sets of voiced stop consonan...
Article
Since even before the time of Alexander Graham Bell's revolutionary invention, engineers and scientists have studied the phenomenon of speech communication with an eye on creating more efficient and effective systems of human-to-human and human-to-machine communication. Starting in the 1960s, digital signal processing (DSP) assumed a central r...
Chapter
Statistical methods for speech processing refer to a general methodology in which knowledge about both a speech signal and the language that it expresses, along with practical uses of that knowledge for specific tasks or services, is developed from actual realizations of speech data through a well-defined mathematical and statistical formalism. For...
Article
Biing-Hwang Juang (Ph.D., University of California, Santa Barbara, 1981) has worked at Speech Communications Research Laboratory and Signal Technology, Inc. on several government-sponsored research projects. He joined the Acoustics Research Department at Bell Laboratories in 1982. In 1996 he became the director of the Acoustics and Speech Research...
Conference Paper
In this paper we discuss the design and implementation of the ASAT front end processing system, whose goal is to convert the speech waveform into a range of measurements and parameters which are then combined to form probabilistic attributes. The ASAT front end processing module utilizes a range of spectral and temporal speech parameters as input t...
Conference Paper
Camera calibration is an important step in 3D reconstruction of scenes. Many natural and man-made objects are circular and form good candidates as calibration objects. We present a linear calibration algorithm to estimate the intrinsic camera parameters using at least three images of concentric circles of unknown radii. Novel methods to determine t...
Conference Paper
Earlier research has shown that the maximum spectral transition positions are related to the perceptual critical points that contain the most important information for consonant and syllable perception. This paper presents a quantitative analysis of the relation, in time, between the maximum spectral transition positions and the phone boundaries...
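Assuming the common definition of the spectral transition measure (the mean squared first-order regression slope of the cepstral coefficients over a short window of frames), its maximum positions can be located as local peaks of the curve. This is a minimal sketch, not the paper's analysis procedure; the window half-width K (which must be at least 1), the edge handling and the function names are assumptions.

```python
from typing import List, Sequence


def spectral_transition_measure(frames: Sequence[Sequence[float]], K: int = 2) -> List[float]:
    """STM(t): mean over coefficients of the squared regression slope over frames t-K..t+K."""
    T, P = len(frames), len(frames[0])
    denom = sum(k * k for k in range(-K, K + 1))
    stm = []
    for t in range(T):
        total = 0.0
        for m in range(P):
            # Replicate the first/last frame at the edges of the utterance.
            slope = sum(k * frames[min(max(t + k, 0), T - 1)][m] for k in range(-K, K + 1)) / denom
            total += slope * slope
        stm.append(total / P)
    return stm


def transition_peaks(stm: Sequence[float]) -> List[int]:
    """Local maxima of the STM curve (a flat peak reports its first frame)."""
    return [t for t in range(1, len(stm) - 1) if stm[t] > stm[t - 1] and stm[t] >= stm[t + 1]]


# A toy 2-coefficient "cepstrum" that jumps between frames 4 and 5.
frames = [[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
print(transition_peaks(spectral_transition_measure(frames)))  # [4]
```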
Article
Automatic chord recognition has been a topic of interest in the context of Music Information Retrieval (MIR) for several years, and attempts have been made at implementing such systems using well understood Signal Processing and Pattern Recognition techniques. The sequence of chords in a musical recording, in addition to providing the melody, often...
Conference Paper
In spite of the effort and progress made during the last few decades, the performance of automatic speech recognition (ASR) systems still lags far behind that achieved by humans. Some researchers think that more speech data will be sufficient in order to bridge this performance gap. Others think that radical modifications to the current methods nee...
Article
Designing a machine that mimics human behavior, particularly the capability of speaking naturally and responding properly to spoken language, has intrigued engineers and scientists for centuries. Since the 1930s, when Homer Dudley of Bell Laboratories proposed a system model for speech analysis and synthesis (1, 2), the problem of automatic speech...
Conference Paper
A heterogeneous distributed system that enables people in geographically separate locations to share a common workspace is presented. In particular, the applicability of such a system to the notion of asymmetric collaboration is illustrated by a chess scenario. In our system one user (novice) works in the real world and the other user (expert) work...
Conference Paper
We present a novel paradigm for human to human asymmetric collaboration. There is a need for people at geographically separate locations to seamlessly collaborate in real time as if they are physically co-located. In our system one user (novice) works in the real world and the other user (expert) works in a parallel virtual world. They are assisted...
Article
In the multimedia world of future communications, speech will play an increasingly important role. From speaker verification to automatic speech recognition and the understanding of key phrases by computers, the spoken word will replace keyboards and pointing devices like the mouse. In his Perspective, Rabiner discusses recent advances and remainin...
Article
Digital signal processing (DSP) is a fundamental tool for much of the research that has been carried out at Bell Labs in the areas of speech and acoustics research. The fundamental bases for DSP include the sampling theorem of Nyquist, the method for digitization of analog signals by Shannon et al., methods of spectral analysis by Tukey, the cepstr...
Article
In the future, the world of telecommunications will be vastly different than it is today. The driving force will be the seamless integration of real time communications (e.g. voice, video, music, etc.) and data into a single network, with ubiquitous access to that network anywhere, anytime, and by a wide range of devices. The only currently availab...
Chapter
The advent of digital multimedia communications has generated a growing need for powerful multimedia processing techniques to enable the generation of useful and intelligent communications services. Multimedia processing techniques play a significant role in creating communications services by: 1) enabling efficient transmission and storage of mult...
Article
Discusses coding standards for still images and motion video. We first briefly discuss standards already in use, including: Group 3 and Group 4 for bilevel fax images; JPEG for still color images; and H.261, H.263, MPEG-1, and MPEG-2 for motion video. We then cover newly emerging standards such as JBIG1 and JBIG2 for bilevel fax images, JPEG-2000 f...
Article
The challenge of multimedia processing is to provide services that seamlessly integrate text, sound, image, and video information and to do it in a way that preserves the ease of use and interactivity of conventional plain old telephone service (POTS) telephony. To achieve this goal, there are a number of technological problems that must be conside...
Conference Paper
Advances in speech recognition technology over the past 4 decades (1950s to 1990s) have enabled a wide range of telecommunications and desktop services to become 'voice enabled'. Early applications were driven by the need to automate and thereby reduce the cost of attendant services, or by the need to create revenue-generating new services which...
Article
We are currently in the midst of a revolution in communications that promises to provide ubiquitous access to multimedia communication services. In order to succeed, this revolution demands seamless, easy-to-use, high quality interfaces to support broadband communication between people and machines. In this paper we argue that spoken language inter...
Conference Paper
The challenge of multimedia processing is to seamlessly integrate text, sound, image, and video information into a single communications channel, and to do it in a way that provides high quality communications while preserving the ease-of-use and interactivity of conventional telephony. There are a number of technology drivers that are pushing the...
Article
Historically, the first major revolution in communications occurred around the turn of the 20th century when the concept of "Universal Service" became the rallying point for creating a system where everyone had access to a telephone and could connect automatically and without operator assistance to any other telephone user. The next major revolut...
Article
For the past two decades, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines,...
Article
Article
Research has been conducted in the area of voice processing for over six decades but it has only been in the past few years that the impact of the years of research is starting to be seen in modern telecommunications systems. Virtually every area of voice processing, including speech coding, speech synthesis, speech recognition, and even, to a smal...
Article
Multimedia services are made possible by a host of underlying technologies. These include the processing of speech, audio, image and video signals, and handwritten data, as well as the high-quality transmission of audiovisual messages and data information. Audiovisual signal processing incorporates the subtechnologies of coding, synthesis, and reco...
Article
Vision 2001, AT&T's concept for 21st-century communications, is a world where access to people, machines, and information is easy and convenient, and where every type of communication and message service is ubiquitous and readily accessible. To make Vision 2001 a reality, advances are required in speech, audio and video signal processing; computer...
Article
The broad goal of Vision 2001 is to provide seamless, easy-to-use, high-quality, and affordable communications between people and machines — anywhere and any time. To achieve this goal, the fields of computing, communications, and networking must converge in a variety of information terminals and network services. To make Vision 2001 a reality, the...
Article
In this paper, we review the current state of the art in automatic speech recognition and discuss future directions for the technology. For the past several years, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition system...
Chapter
The ways in which people communicate are changing rapidly. The standard voice call over a wired network is but one means of communications, which already includes cordless and wireless voice calls, video calls, beeper service, FAX service, e-mail service, and data services. This revolution in communications is being fueled by several sources, inclu...
Conference Paper
During the decade of the 1990s, the fields of communications, computing, and networking are coming together in the form of personal information/communication terminals, and in the associated services (so-called personal communications services, PCS). Several technologies will play major roles in this communications revolution, but one of the key on...
Article
The ways in which people communicate are changing rapidly. The options are many and diverse, ranging from voice calls over wireless networks, to video calls over the conventional wired network, ISDN video, FAX, e-mail, voice mail, beeper services, data services, audio teleconferencing, video teleconferencing, and so-called scribble phone service (t...
Article
In this paper, we review the current state of the art in automatic speech recognition and discuss future directions for the technology. Over the past several years, intensive research in speech recognition has been carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems...
Article
Research in large vocabulary speech recognition has been intensively carried out worldwide in the past several years, spurred on by advances in algorithms, architectures and hardware. In the United States, the DARPA community has focused efforts on studying several continuous speech recognition tasks including Naval Resource Management, a 991 word...
Article
During the past several years, research in large-vocabulary speech recognition has been intensively carried out worldwide, encouraged by advances in algorithms, architecture, and hardware. In the United States, the Defense Advanced Research Projects Agency (DARPA) spoken-language-processing community has focused its efforts on studying several syst...
Article
Accurate and robust connected digit recognition is essential for a wide range of telecommunication services. Based on training and testing using only clean network digit data, and using the same whole‐word model architecture as in the TI/NIST connected digit testing, the string error rate increased from less than 1% to more than 5%. The performance...
Article
The problem of recognizing strings of connected digits is crucial to a number of applications such as voice dialing of telephone numbers, automatic data entry, credit card entry, PIN (personal identification number) entry, entry of access codes for transactions, etc. Algorithms for connected digit recognition, based on whole-word reference patterns...
Article
Connected digit recognition is a problem that has received a lot of attention over the past several years because of its importance in providing speech recognition services (e.g., catalog ordering, credit card entry, all digit dialing of telephone numbers, etc.). Although a number of systems have been described that provide very high string accurac...
Article
Word juncture coarticulation is one of the major sources of acoustic variability for initial and final word segments when spoken in fluent speech. One way to improve characterization of word pronunciations in continuous speech is to include inter-word contexts in lexical representations, similar to the way intra-word contexts are utilized. In this p...
Article
An important area of speech recognition is automatic recognition of connected digit strings (i.e., sequences composed of the digits zero through nine, and oh). Applications of this technology include credit card authorization, catalog ordering, dialing of telephone numbers, and data entry. For the past two years AT&T has experimented with a system...
Article
We report on some recent improvements to an HMM-based, continuous speech recognition system which is being developed at AT&T Bell Laboratories. These advances, which include the incorporation of inter-word, context-dependent units and an improved feature analysis, lead to a recognition system which gives a 95% word accuracy for speaker-independent...
Chapter
The field of large vocabulary continuous speech recognition has advanced to the point where there are several systems capable of providing greater than 95% word accuracy for speaker-independent recognition of a 1000-word vocabulary, spoken fluently for a task with a perplexity of about 60. There are several factors which account for the high perfo...
Chapter
The use of hidden Markov models for speech recognition has become predominant over the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons why this method has become so popular are the inherent statistical (mathematically precise) framework, the ease and availability of training algor...
Chapter
In this paper, we present an efficient data structure for implementing a continuous, large vocabulary, speech recognizer. The recognition system is based on hidden Markov models of phonetic units for representing both intraword and interword context dependent phones. Due to the large number of connections present in the decoding network, the struct...
Conference Paper
Spectrum-based speech representations are discussed. Spectral representations, in order to be useful for speech recognition, need to be justified from both the computational (analytical) and the perceptual viewpoints. The authors' discussion of spectral representations, therefore, includes both the computational model and the associated measures of...
Article
Two methods for generating training sets for a speech recognition system are studied. The first uses a nondeterministic statistical method to generate a uniform distribution of sentences from a finite state machine (FSM) represented in digraph form. The second method, a deterministic heuristic approach, takes into consideration the importance of wo...
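One way to draw sentences uniformly from an acyclic FSM is to count, for every state, the number of complete word sequences reachable from it, and then choose each arc with probability proportional to the count at its destination. This is a minimal sketch, assuming an acyclic, unambiguous FSM (each sentence follows exactly one path); the arc format, the toy grammar and the function names are hypothetical, and this is not necessarily the method used in the paper.

```python
import random
from functools import lru_cache
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical FSM encoding: state -> list of (word, next_state) arcs.
Arcs = Dict[str, List[Tuple[str, str]]]


def make_sampler(arcs: Arcs, start: str, finals: Set[str]
                 ) -> Tuple[Callable[[str], int], Callable[[random.Random], List[str]]]:
    @lru_cache(maxsize=None)
    def count(state: str) -> int:
        # Number of word sequences from `state` to an accepting state.
        stop_here = 1 if state in finals else 0
        return stop_here + sum(count(nxt) for _, nxt in arcs.get(state, []))

    def sample(rng: random.Random) -> List[str]:
        words, state = [], start
        while True:
            r = rng.randrange(count(state))
            if state in finals:
                if r == 0:          # one slot is reserved for stopping here
                    return words
                r -= 1
            for word, nxt in arcs.get(state, []):
                if r < count(nxt):  # arc chosen with prob count(nxt) / count(state)
                    words.append(word)
                    state = nxt
                    break
                r -= count(nxt)

    return count, sample


arcs: Arcs = {
    "0": [("call", "1"), ("dial", "1")],
    "1": [("home", "2"), ("office", "2"), ("john", "3")],
    "3": [("smith", "2")],
}
count, sample = make_sampler(arcs, "0", {"2", "3"})
print(count("0"))  # 8 distinct sentences
print(" ".join(sample(random.Random(1))))
```

Because the product of the arc probabilities along any path telescopes to 1/count(start), every sentence accepted by the FSM is equally likely. For a cyclic grammar the counts are unbounded, so a maximum sentence length would have to be added.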
Conference Paper
It is shown how one can apply the improved acoustic modeling techniques (using a continuous density hidden Markov model framework) developed for large vocabulary speech recognition applications to the problem of connected digit recognition with no changes made to the basic modeling techniques and with no vocabulary specific information used. The im...
Article
The authors provide a detailed description of all aspects of the implementation of a large-vocabulary speaker-independent, continuous speech recognizer used as a tool for the development of recognition algorithms based on hidden Markov models (HMMs) and Viterbi decoding. The complexity of HMM recognizers is greatly increased by the introduction of...
Article
The modifications made to a connected word speech recognition algorithm based on hidden Markov models (HMMs) which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described. The novelty of this approach is that statistical models of both the actual vocabulary word and the extraneous speech and ba...
Article
The technology of speech recognition has evolved for almost 2 decades since the introduction of sophisticated pattern recognition techniques such as dynamic time warping and clustering. Applications of the technology have been slower to evolve for several reasons, including system performance, system cost, and the general acceptance of voice techno...
Article
The authors discuss and document a parameter estimation algorithm for data sequence modeling involving hidden Markov models. The algorithm, called the segmental K-means method, uses the state-optimized joint likelihood for the observation data and the underlying Markovian state sequence as the objective function for estimation. The authors prove t...
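Segmental K-means alternates two steps: find the best state sequence for each training sequence with the Viterbi algorithm (which also yields the state-optimized joint likelihood max_q P(O, q | λ)), then re-estimate the model from the resulting segmentation. This minimal sketch simplifies to discrete observation symbols, where clustering within each state reduces to frequency counting; the small count floor and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _log(x: float) -> float:
    return math.log(x) if x > 0 else float("-inf")


def viterbi(pi, A, B, obs: Sequence[int]) -> Tuple[float, List[int]]:
    """Best state path q* and log max_q P(O, q | lambda) for a discrete HMM."""
    N = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(N)]
    back = []
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + _log(A[i][j]))
            ptr.append(best_i)
            new.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        back.append(ptr)
        delta = new
    # Backtrack from the best final state.
    path = [max(range(N), key=lambda i: delta[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[path[-1]], path


def _normalize(row: List[float]) -> List[float]:
    s = sum(row)
    return [x / s for x in row]


def segmental_kmeans_step(pi, A, B, sequences, floor: float = 1e-3):
    """Viterbi-segment every sequence, then re-estimate pi, A, B by counting.

    Also returns the summed log max_q P(O, q | lambda) under the model passed in.
    """
    N, M = len(pi), len(B[0])
    # Start every count at a small floor so no probability becomes exactly zero.
    pi_c = [floor] * N
    A_c = [[floor] * N for _ in range(N)]
    B_c = [[floor] * M for _ in range(N)]
    total = 0.0
    for obs in sequences:
        score, path = viterbi(pi, A, B, obs)
        total += score
        pi_c[path[0]] += 1
        for t, (q, o) in enumerate(zip(path, obs)):
            B_c[q][o] += 1
            if t + 1 < len(path):
                A_c[q][path[t + 1]] += 1
    return _normalize(pi_c), [_normalize(r) for r in A_c], [_normalize(r) for r in B_c], total


pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.2, 0.8]]
B = [[0.9, 0.1], [0.1, 0.9]]
print(viterbi(pi, A, B, [0, 0, 1, 1])[1])  # [0, 0, 1, 1]
```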
Article
We report on some recent improvements to an HMM-based, continuous speech recognition system which is being developed at AT&T Bell Laboratories. These advances, which include the incorporation of inter-word, context-dependent units and an improved feature analysis, lead to a recognition system which achieves better than 95% word accuracy for speake...
Conference Paper
An approach for designing a set of acoustic models for speech recognition applications which results in a minimal empirical error rate for a given decoder and training data is studied. In an evaluation of the system for an isolated word recognition task, hidden Markov models (HMMs) are used to characterize the probability density functions of the a...
Conference Paper
Methods for acoustic modeling of basic speech subword units that provide high word recognition accuracy are discussed. It is shown that for a basic set of 47 context-independent phone-like units, word accuracies on the order of 86-90% can be obtained for a 1000-word vocabulary, in a speaker-independent mode, for a grammar with a perplexity of 60, on indep...
Article
Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. Pattern recognition techniques were applied to the problems of isolated word (or discrete u...
Article
The field of large vocabulary, continuous-speech recognition has advanced to the point where there are several systems capable of attaining between 90 and 95% word accuracy for speaker-independent recognition, of a 1000-word vocabulary, spoken fluently for a task with a perplexity (average word branching factor) of about 60. There are several facto...
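The perplexity quoted in these abstracts can be computed from the probability that a grammar or language model assigns to each word of a test sequence. This is a minimal sketch, assuming the standard definition 2^H, where H is the average negative log2 probability per word; when every word is one of b equally likely choices, the result equals the branching factor b, and in general it is the geometric mean of the effective branching factors.

```python
import math
from typing import Sequence


def perplexity(word_probs: Sequence[float]) -> float:
    """Perplexity = 2 ** H, with H the average negative log2 probability per word."""
    if not word_probs:
        raise ValueError("need at least one word")
    h = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** h


print(perplexity([1 / 60] * 5))        # approx 60: uniform branching factor of 60
print(perplexity([1 / 10, 1 / 1000]))  # approx 100: geometric mean of 10 and 1000
```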
Article
Some relations among approaches that have been applied to estimating models for acoustic signals in speech recognition systems are examined. In particular, the modeling approaches based on maximum likelihood (ML), maximum mutual information (MMI), and minimum discrimination information (MDI) are studied. It is shown that all three approaches can be...
Article
The field of digital speech processing includes the areas of speech coding, speech synthesis, and speech recognition. With the advent of faster computation and high speed VLSI circuits, speech processing algorithms are becoming more sophisticated, more robust, and more reliable. As a result, significant advances have been made in coding, synthesis,...
Article
Algorithms for speech recognition can be characterized broadly as pattern recognition approaches and acoustic phonetic approaches. To date, the greatest degree of success in speech recognition has been obtained using pattern recognition paradigms. Thus, in this paper, we will be concerned primarily with showing how pattern recognition techniques ha...
Article
Most large vocabulary speech recognition systems essentially consist of a training algorithm and a recognition structure which is a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the training procedure, it is absolutely essential that the r...
Article
A description is given of an implementation of a novel frame-synchronous network search algorithm for recognizing continuous speech as a connected sequence of words according to a specified grammar. The algorithm, which has all the features of earlier methods, is inherently based on hidden Markov model (HMM) representations and is described in an e...
Article
An iterative approach for minimum-discrimination-information (MDI) hidden Markov modeling of information sources is proposed. The approach is developed for sources characterized by a given set of partial covariance matrices and for hidden Markov models (HMMs) with Gaussian autoregressive output probability distributions (PDs). The approach aims at...
Conference Paper
The problem of how to select and construct a set of fundamental unit statistical models suitable for speech recognition is addressed. A unified framework is discussed which can be used to accomplish the goal of creating effective basic models of speech. The performances of three types of fundamental units, namely whole word, phoneme-like, and acous...
Conference Paper
The authors describe an HMM (hidden Markov model) clustering procedure and discuss its application to connected-word systems and to large-vocabulary recognition based on phonelike units. It is shown that the conventional approach of maximizing likelihood is easily implemented but does not work well in practice, as it tends to give improved models o...
Conference Paper
The authors present an algorithm based on hidden Markov models which can recognize a predefined set of vocabulary items spoken in the context of fluent speech. They show that for a vocabulary of five words, it is possible to correctly recognize 87.1% of keywords when they occur in fluent speech and are spoken over a long-distance telephone network....
Article
This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sou...
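The first of the basic HMM problems, computing P(O | λ) for an observation sequence, is solved efficiently by the forward algorithm. This is a minimal sketch for a discrete-observation HMM, assuming λ = (A, B, π) is given as nested Python lists; it omits scaling, so for long sequences α should be rescaled at each step (or computed in the log domain) to avoid underflow.

```python
from typing import Sequence


def forward_probability(pi: Sequence[float], A: Sequence[Sequence[float]],
                        B: Sequence[Sequence[float]], obs: Sequence[int]) -> float:
    """P(O | lambda) for a discrete HMM lambda = (A, B, pi) via the forward recursion."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o] for j in range(N)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(forward_probability(pi, A, B, [0, 1, 2]))  # approx 0.03628
```

The recursion costs on the order of N^2 T operations, compared with the N^T state sequences a direct summation would enumerate.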
Article
The field of large vocabulary, continuous speech recognition has advanced to the point where there are several systems capable of attaining between 90 and 95% word accuracy for speaker independent recognition of a 1000 word vocabulary, spoken fluently for a task with a perplexity (average word branching factor) of about 60. There are several factor...
Article
A circuit for electronically synthesizing speech has an audio generator for representing voiced sounds and a noise generator for representing voiceless sounds and a means for selecting significant parameters of the various speech elements by sampling and a means for storing those parameters. The circuit also includes a filter unit comprised of a nu...
Article
At AT&T Bell Laboratories, a broad range of systems for speech recognition, depending on the intended application area, has been studied. These systems have been extensively studied for isolated word (and phrase) recognition for command and control applications where a single word (or phrase) suffices to effect some type of control over a system or...
Article
Techniques for training hidden Markov model (HMM) parameters from a labeled training set of data are well established and include the forward‐backward algorithm as well as the segmental K‐means algorithm. These algorithms have been shown to be capable of estimating the parameters of an HMM based on mathematically well‐founded techniques. In practic...
Conference Paper
Past research has shown that a connected digit recognition system, based on either word templates or word hidden Markov models (HMM), could effectively be trained using a segmental k-means training procedure. In these studies, a set of randomly generated digit strings of variable length was used to train the recognizer. However, problems were enco...
