Oded Ghitza
Boston University | BU · BioMolecular Engineering Research Center

About

69 Publications
12,274 Reads
3,213 Citations

Publications (69)
Preprint
Full-text available
The human brain tracks temporal regularities in acoustic signals faithfully. Recent neuroimaging studies have shown complex modulations of synchronized neural activities to the shape of stimulus envelopes. How to connect neural responses to different envelope shapes with listeners' perceptual ability to synchronize to acoustic rhythms requires furt...
Article
Full-text available
Speech comprehension requires the ability to temporally segment the acoustic input for higher-level linguistic analysis. Oscillation-based approaches suggest that low-frequency auditory cortex oscillations track syllable-sized acoustic information and therefore emphasize the relevance of syllabic-level acoustic processing for speech segmentation. H...
Article
Full-text available
Oscillation-based models of speech perception postulate a cortical computational principle by which decoding is performed within a window structure derived by a segmentation process. Segmentation of syllable-size chunks is realized by a theta oscillator. We provide evidence for an analogous role of a delta oscillator in the segmentation of phrase-s...
Preprint
Full-text available
Oscillation-based models of speech perception postulate a cortical computational principle by which decoding is performed within a window structure derived by a segmentation process. At the syllable level, segmentation is realized by a theta oscillator. We provide evidence for an analogous role of a delta oscillator at the phrasal level. We recorded...
Article
Full-text available
This is a commentary on a review article by Meyer, Sun & Martin (2019), “Synchronous, but not entrained: exogenous and endogenous cortical rhythms of speech and language processing”, doi:10.1080/23273798.2019.1693050. At the heart of this review article is the language comprehension process. Anchored in a psycho- and neurolinguistic viewpoint, the...
Preprint
Full-text available
Can neural activity reveal syntactic structure building processes and their violations? To verify this, we recorded electroencephalographic and behavioral data as participants discriminated concatenated isochronous sentence chains containing only grammatical sentences (regular trials) from those containing ungrammatical sentences (irregular trials)...
Preprint
Full-text available
Speech comprehension requires the ability to temporally segment the acoustic input for higher-level linguistic analysis. Oscillation-based approaches suggest that low-frequency auditory cortex oscillations track syllable-sized acoustic information and therefore emphasize the relevance of syllabic-level processing for speech segmentation. Most lingu...
Article
Full-text available
The rhythms of speech and the time scales of linguistic units (e.g., syllables) correspond remarkably to cortical oscillations. Previous research has demonstrated that in young adults, the intelligibility of time-compressed speech can be rescued by "repackaging" the speech signal through the regular insertion of silent gaps to restore correspondenc...
Preprint
Human listeners understand spoken language across a variety of rates, but when speech is presented at three or more times its usual rate, it becomes unintelligible. How the brain achieves such tolerance, and why speech becomes unintelligible above certain rates, is still unclear. We addressed these questions using electrocorticography (ECoG)...
Article
Full-text available
This psychoacoustic study provides behavioural evidence that neural entrainment in the theta range (3–9 Hz) causally shapes speech perception. Adopting the “rate normalization” paradigm (presenting compressed carrier sentences followed by uncompressed target words), we show that uniform compression of a speech carrier to syllable rates inside the t...
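The arithmetic behind the rate-normalization paradigm is simple enough to sketch. A minimal illustration follows; the 3–9 Hz theta bounds are quoted from the abstract, while the base syllable rate and the framing in code are assumptions, not values from the paper. Uniform compression by a factor c multiplies the syllable rate by c, and the prediction is that intelligibility degrades once that rate leaves the theta range.

```python
# Hypothetical sketch of the rate-normalization arithmetic.
# Theta bounds (3-9 Hz) are quoted from the abstract; the base
# syllable rate of 5 Hz is an assumed typical conversational value.

THETA_LO_HZ, THETA_HI_HZ = 3.0, 9.0

def compressed_syllable_rate(base_rate_hz: float, compression: float) -> float:
    """Uniform time compression by factor c multiplies the syllable rate by c."""
    return base_rate_hz * compression

def inside_theta(rate_hz: float) -> bool:
    return THETA_LO_HZ <= rate_hz <= THETA_HI_HZ

for c in (1.0, 1.5, 2.0, 3.0):
    r = compressed_syllable_rate(5.0, c)
    print(f"compression x{c}: {r:.1f} syll/s, inside theta: {inside_theta(r)}")
```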
Article
Oscillation-based models of speech perception postulate a cortical computation principle by which decoding is performed within a time-varying window structure, synchronised with the input on multiple time scales. The windows are generated by a segmentation process, implemented by a cascade of oscillators. This paper tests the hypothesis that prosod...
Article
At the core of oscillation-based models of speech perception is the notion that decoding is guided by parsing. In these models, parsing is executed by setting a time-varying, hierarchical window structure synchronized to the input. Syllabic parsing segments the input into speech fragments that are multi-phone in duration, and it is realized by a theta oscillator ca...
Article
Full-text available
This study examines the decoding times at which the brain processes structural information in music and compares them to timescales implicated in recent work on speech. Combining an experimental paradigm based on Ghitza and Greenberg (Phonetica, 66(1-2), 113-126, 2009) for speech with the approach of Farbood et al. (Journal of Experimental Psycholo...
Article
Full-text available
Studies on the intelligibility of time-compressed speech have shown flawless performance for moderate compression factors, a sharp deterioration for compression factors above three, and an improved performance as a result of “repackaging”—a process of dividing the time-compressed waveform into fragments, called packets, and delivering the packets i...
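The "repackaging" operation described here lends itself to a short sketch. The following is a minimal, hypothetical implementation (the packet and gap durations are illustrative placeholders, not values from the study): the time-compressed waveform is cut into fixed-length packets and silent gaps are inserted between them, slowing the delivery rate without touching the packets themselves.

```python
import numpy as np

def repackage(compressed: np.ndarray, fs: int,
              packet_ms: float = 40.0, gap_ms: float = 80.0) -> np.ndarray:
    """Cut a time-compressed waveform into packets and insert silent
    gaps between them; packet_ms and gap_ms are illustrative values."""
    packet = int(fs * packet_ms / 1000)
    gap = np.zeros(int(fs * gap_ms / 1000), dtype=compressed.dtype)
    pieces = []
    for start in range(0, len(compressed), packet):
        pieces.append(compressed[start:start + packet])
        pieces.append(gap)
    return np.concatenate(pieces[:-1])  # drop the trailing gap
```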
Article
Full-text available
The premise of this study is that models of hearing, in general, and of individual hearing impairment, in particular, can be improved by using speech test results as an integral part of the modeling process. A conceptual iterative procedure is presented which, for an individual, considers measures of sensitivity, cochlear compression, and phonetic...
Article
Full-text available
A recent commentary (Oscillators and syllables: a cautionary note. Cummins, 2012) questions the validity of a class of speech perception models inspired by the possible role of neuronal oscillations in decoding speech (e.g., Ghitza, 2011; Giraud and Poeppel, 2012). In arguing against the approach, Cummins raises a cautionary flag "from a phoneticia...
Article
Full-text available
A recent opinion article (Neural oscillations in speech: do not be enslaved by the envelope. Obleser et al., 2012) questions the validity of a class of speech perception models inspired by the possible role of neuronal oscillations in decoding speech (e.g., Ghitza, 2011; Giraud and Poeppel, 2012). The authors criticize, in particular, what they see...
Article
Full-text available
Recent hypotheses on the potential role of neuronal oscillations in speech perception propose that speech is processed on multi-scale temporal analysis windows formed by a cascade of neuronal oscillators locked to the input pseudo-rhythm. In particular, Ghitza (2011) proposed that the oscillators are in the theta, beta, and gamma frequency bands wi...
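As a rough illustration of oscillator-driven segmentation, here is a hypothetical sketch, not the model of Ghitza (2011): a theta-band "oscillator" whose period may vary between 1/9 s and 1/3 s, closing each analysis window at the lowest-envelope sample inside that admissible range so that window boundaries track dips in the input pseudo-rhythm.

```python
import numpy as np

def theta_windows(env: np.ndarray, fs: int,
                  f_lo: float = 3.0, f_hi: float = 9.0):
    """Segment an amplitude envelope into theta-sized windows: each
    window ends at the lowest-envelope sample found inside the
    admissible theta period range [1/f_hi, 1/f_lo] seconds."""
    min_len, max_len = int(fs / f_hi), int(fs / f_lo)
    windows, start = [], 0
    while start + min_len < len(env):
        stop = min(start + max_len, len(env) - 1)
        seg = env[start + min_len: stop + 1]
        end = start + min_len + int(np.argmin(seg))  # lock to an envelope dip
        windows.append((start, end))
        start = end
    return windows
```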
Conference Paper
Full-text available
In this paper, we investigate a closed-loop auditory model and explore its potential as a feature representation for speech recognition. The closed-loop representation consists of an auditory-based, efferent-inspired feedback mechanism that regulates the operating point of a filter bank, thus enabling it to dynamically adapt to changing background...
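To make the closed-loop idea concrete, here is a deliberately simplified, hypothetical sketch (not the model evaluated in the paper): a per-channel feedback loop that nudges a gain so the channel's output level tracks a fixed operating point, which is the essence of letting a filter bank adapt to changing background level.

```python
import numpy as np

def closed_loop_gain(channel: np.ndarray, frame: int = 160,
                     target_rms: float = 0.1, alpha: float = 0.01) -> np.ndarray:
    """Frame-by-frame feedback gain for one filter-bank channel: the
    gain is corrected multiplicatively so output RMS drifts toward a
    target operating point (all parameter values are illustrative)."""
    gains, g = [], 1.0
    for start in range(0, len(channel) - frame, frame):
        rms = np.sqrt(np.mean((g * channel[start:start + frame]) ** 2)) + 1e-12
        g *= (target_rms / rms) ** alpha  # slow corrective update
        gains.append(g)
    return np.array(gains)
```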
Article
Full-text available
The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscil...
Chapter
This study was motivated by the hypothesis that low-frequency cortical oscillations help the brain decode the speech signal. The intelligibility (in terms of word error rate) of natural-sounding, synthetically-generated sentences was measured using a paradigm that alters speech-energy rhythm over a range of modulation frequencies. The material comp...
Article
Current predictors of speech intelligibility are inadequate for understanding and predicting speech confusions caused by acoustic interference. We develop a model of auditory speech processing that includes a phenomenological representation of the action of the Medial Olivocochlear efferent pathway and that is capable of predicting consonant confus...
Article
Full-text available
Sensory processing is associated with gamma frequency oscillations (30-80 Hz) in sensory cortices. This raises the question whether gamma oscillations can be directly involved in the representation of time-varying stimuli, including stimuli whose time scale is longer than a gamma cycle. We are interested in the ability of the system to reliably dis...
Article
Full-text available
This study was motivated by the prospective role played by brain rhythms in speech perception. The intelligibility - in terms of word error rate - of natural-sounding, synthetically generated sentences was measured using a paradigm that alters speech-energy rhythm over a range of frequencies. The material comprised 96 semantically unpredictable sen...
Conference Paper
We present a model of auditory speech processing capable of predicting consonant confusions by normal hearing listeners, based on a phenomenological model of the Medial Olivocochlear efferent pathway. We then use this model to predict human error patterns of initial consonants in consonant-vowel-consonant words. In the process we demonstrate its po...
Article
Full-text available
We developed a computational model of diphone perception based on salient properties of peripheral and central auditory processing. The model comprises an efferent-inspired closed-loop model of the auditory periphery (PAM) connected to a template-matching circuit (TMC). Robustness against background noise is provided principally by the signal proce...
Chapter
The work described here arose from the need to understand and predict speech confusions caused by acoustic interference and by hearing impairment. Current predictors of speech intelligibility are inadequate for making such predictions (even for normal-hearing listeners). The Articulation Index, and related measures, STI and SII, are geared to predi...
Article
Full-text available
In the past few years, objective quality assessment models have become increasingly used for assessing or monitoring speech and audio quality. By measuring perceived quality on an easily-understood subjective scale, such as listening quality (excellent, good, fair, poor, bad), these methods provide a quick and repeatable way to estimate customer ex...
Article
JNDs of interaural time delay (ITD) of selected frequency bands in the presence of other frequency bands have been reported for noiseband stimuli [Zurek (1985); Trahiotis and Bernstein (1990)]. Similar measurements will be reported for speech and music signals. When stimuli are synthesized with bandpass/band-stop operations, performance with comple...
Article
Full-text available
Studies in neurophysiology and in psychophysics provide evidence for the existence of temporal integration mechanisms in the auditory system. These auditory mechanisms may be viewed as "detectors," parametrized by their cutoff frequencies. There is an interest in quantifying those cutoff frequencies by direct psychophysical measurement, in particul...
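Viewing temporal integration as a detector parametrized by a single cutoff frequency suggests a one-line model. A minimal sketch follows (an assumed first-order form, not the measurement procedure of the paper): a leaky integrator applied to the rectified signal, with the cutoff as its only parameter.

```python
import numpy as np

def leaky_integrator(x: np.ndarray, fs: float, cutoff_hz: float) -> np.ndarray:
    """First-order low-pass ('leaky integrator') on a rectified signal;
    the cutoff frequency is the detector's single parameter."""
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)  # one-pole coefficient
    y = np.empty_like(x, dtype=float)
    acc = 0.0
    for i, v in enumerate(np.abs(x)):
        acc = a * acc + (1.0 - a) * v
        y[i] = acc
    return y
```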
Article
The hypothesis explored in this study is that the MOC efferent system plays an important role in speech reception in the presence of sustained background noise. This talk describes efforts to assess this hypothesis using a test of initial consonant reception (the Diagnostic Rhyme Test) performed by subjects with normal hearing. Activation of select...
Conference Paper
Full-text available
A coding paradigm is proposed which is based solely on the properties of the human auditory system and does not assume any specific source properties. Hence, its performance is equally good for speech, noisy speech, and music signals. The signal decomposition in the proposed paradigm takes advantage of binaural properties of the human auditory syst...
Article
Full-text available
A computational model to predict MOS of processed speech is proposed. The system measures the distortion of processed speech (compared to the source speech) using a peripheral model of the mammalian auditory system and a psychophysically-inspired measure, and maps the distortion value onto the MOS scale. This paper describes our attempt to derive a...
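The two-stage structure described here (an auditory-model distortion measure followed by a map onto the MOS scale) can be sketched schematically. Everything below is a hypothetical placeholder, not the paper's model: the distortion is a mean absolute difference between two time-by-channel auditory representations, and the map is a logistic curve whose two constants would be fitted to subjective test data.

```python
import numpy as np

def distortion(ref: np.ndarray, proc: np.ndarray) -> float:
    """Placeholder distortion between two auditory-model outputs
    (time x channel arrays) for source and processed speech."""
    return float(np.mean(np.abs(ref - proc)))

def distortion_to_mos(d: float, d50: float = 0.2, slope: float = 10.0) -> float:
    """Monotone logistic map from distortion onto the 1-5 MOS scale;
    d50 and slope are illustrative and would be fitted to data."""
    return 1.0 + 4.0 / (1.0 + np.exp(slope * (d - d50)))
```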
Article
Neurophysiological and psychophysical studies provide evidence for the existence of temporal integration mechanisms in the auditory system. These may be viewed as low‐pass filters, parametrized by their cutoff frequencies. It is of interest to specify these cutoffs, particularly for tasks germane to the effect of temporal smoothing on speech qualit...
Conference Paper
Full-text available
A computational model to predict MOS (mean opinion score) of processed speech is proposed. The system measures the distortion of processed speech (compared to the source speech) using a peripheral model of the mammalian auditory system and a psychophysically-inspired measure, and maps the distortion value onto the MOS scale. This paper describes ou...
Conference Paper
Full-text available
For many tasks in speech signal processing it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (By speech segments the authors mean pieces of a speech signal of duration 50-150 milliseconds. For concreteness they consider a segment to mean a diphone.) Such a distance metric wo...
Conference Paper
Full-text available
The performance of large-vocabulary automatic speech recognition (ASR) systems deteriorates severely in mismatched training and testing conditions. Signal processing techniques based on the human auditory system have been proposed to improve ASR performance, especially under adverse acoustic conditions. The paper compares one such scheme, the ensem...
Article
Full-text available
The purpose of this special session is to call the attention of the hearing science community to the need for new knowledge on how speech segments 50–150 ms long (e.g., phonemes, diphones) are represented in the auditory system. In this session, the need for such knowledge will be addressed in the context of two specific spee...
Article
For many tasks in speech signal processing it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (Speech segments are defined as pieces of a speech signal of duration 50–150 ms. For concreteness, a segment is considered to mean a diphone, i.e., a segment from the midpoint of one...
Article
At present, the performance of automatic speech recognition (ASR) systems is still limited by variabilities within and between speakers, by acoustic differences between training and application environments, and by the sensitivity of ASR systems against changing communication channels. This talk considers the conjecture that the use of speech‐produ...
Article
Full-text available
Auditory models that are capable of achieving human performance in tasks related to speech perception would provide a basis for realizing effective speech processing systems. Saving bits in speech coders, for example, relies on a perceptual tolerance to acoustic deviations from the original speech. Perceptual invariance to adverse signal conditions...
Article
Full-text available
This study provides a quantitative measure of the accuracy of the auditory periphery in representing prespecified time-frequency regions of initial and final diphones of spoken CVCs. The database comprised word pairs that span the speech space along Jakobson et al.'s binary phonemic features [Tech. Rep. No. 13, Acoustic Laboratory, MIT, Cambridge,...
Article
Full-text available
A long-standing question that arises when studying a particular auditory model is how to evaluate its performance. More precisely, it is of interest to evaluate to what extent the model representation can describe the actual human internal representation. Here, this question is addressed in the context of speech perception. That is, given a speech...
Article
In most implementations of hidden Markov models (HMMs) a state is assumed to be a stationary random sequence of observation vectors whose mean and covariance are estimated. Successive observations in a state are assumed to be independent and identically distributed. These assumptions are reasonable when each state represents a short segment of the...
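The stationarity assumption described here is easy to state in code: under a conventional HMM state, frames are treated as IID, so the state's log-likelihood is just a sum of identical per-frame Gaussian log-densities. A minimal sketch with an assumed diagonal covariance:

```python
import numpy as np

def state_log_likelihood(obs: np.ndarray, mean: np.ndarray,
                         cov_diag: np.ndarray) -> float:
    """Log-likelihood of a (T x d) observation sequence under one HMM
    state emitting IID diagonal-covariance Gaussian vectors: because
    frames are assumed independent, per-frame log-densities simply add."""
    d = obs.shape[1]
    const = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(cov_diag)))
    z = (obs - mean) ** 2 / cov_diag
    return float(np.sum(const - 0.5 * np.sum(z, axis=1)))
```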
Article
Full-text available
Most speech processing systems (e.g., speech recognition systems or speech coding systems) contain a feature‐analysis stage that extracts the required task-specific information from the speech waveform. This study addresses the question of how to identify what part of the speech information is lost in this process. To answer this question, a diagnosti...
Article
Full-text available
Traditional speech coding schemes are designed to produce synthesized speech with a waveform (or a spectrum) that is as close as possible to the original. With limits on the bit rate, however, it would be better to produce synthesized speech that matches the original speech at the auditory‐nerve level. Current models of the auditory periphery enabl...
Article
Full-text available
In most implementations of hidden Markov models (HMM) a state is assumed to be a stationary random sequence of observation vectors whose mean and covariance are estimated. Successive observations in a state are assumed to be independent and identically distributed. These assumptions are reasonable when each state represents a short segment of the s...
Conference Paper
Full-text available
The author describes the closed-loop ensemble-interval-histogram (EIH) model. It is constructed by adding a feedback system to the former, open-loop, EIH model (Ghitza, Computer Speech and Language, 1(2), pp. 109-130, Dec. 1986). While the open-loop EIH is a computational model based upon the ascending path of the auditory periphery, the feedback s...
Article
Full-text available
Traditional speech analysis/synthesis techniques are designed to produce synthesized speech with a spectrum (or waveform) that is as close as possible to the original. It is suggested, instead, that representations of the synthetic and the original speech be matched at the auditory nerve level. This concept has been used in conjunction with the sin...
Conference Paper
Full-text available
In a previous report (Ghitza, 1987, [1]) we described a computational model based upon the temporal characteristics of the information in the auditory nerve fiber firing patterns, which produced an "auditory" spectral representation (the EIH) of the input signal. We also demonstrated that for speech recognition purposes, the EIH is more robust agai...
Article
We describe here a computational model based upon the temporal characteristics of the information in the auditory nerve-fiber firing patterns. The model produces a frequency domain representation of the input signal in terms of the ensemble histogram of the inverse of the interspike intervals, measured from firing patterns generated by a simulated...
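The ensemble-histogram computation named here can be outlined in a few lines. This is a schematic reading of the abstract only (the published EIH is built from level-crossing intervals across simulated fibers; the version below simply pools inverse interspike intervals):

```python
import numpy as np

def ensemble_interval_histogram(spike_times, edges: np.ndarray) -> np.ndarray:
    """Pool inverse interspike intervals (instantaneous rates, Hz) from
    an ensemble of simulated fibers into one histogram, giving a
    frequency-domain representation of the input signal."""
    rates = []
    for t in spike_times:                  # one array of spike times per fiber
        intervals = np.diff(np.sort(np.asarray(t)))
        rates.append(1.0 / intervals[intervals > 0])
    return np.histogram(np.concatenate(rates), bins=edges)[0]
```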
Article
Efficient scalar quantization tables for LPC k-parameters were developed using a distortion measure based on just-noticeable-differences (JND's) in formant parameters of the speech spectrum envelope. Forty percent fewer bits were required than the 41/frame used in conventional approaches. An empirical technique was developed for relating perturbati...
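The bit saving comes from spacing quantizer levels by perceptual rather than numerical distance. A hypothetical sketch (the JND function below is invented for illustration; the paper derives JNDs empirically from formant perturbations): step through the parameter range one JND at a time, so adjacent reconstruction levels are just noticeably different and no bits are spent on imperceptible resolution.

```python
import numpy as np

def jnd_quantization_table(lo: float, hi: float, jnd) -> np.ndarray:
    """Build a scalar quantization table whose adjacent levels differ by
    one JND; a value-dependent jnd() yields a nonuniform table with
    fewer levels (fewer bits) than uniform quantization."""
    levels = [lo]
    while levels[-1] + jnd(levels[-1]) < hi:
        levels.append(levels[-1] + jnd(levels[-1]))
    return np.array(levels)

# Example with an assumed JND that grows with the parameter value:
table = jnd_quantization_table(0.0, 1.0, lambda v: 0.02 + 0.05 * v)
```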
Conference Paper
Full-text available
Traditional speech analysis/synthesis techniques are designed to produce synthesized speech with a spectrum (or waveform) which is as close as possible to the original. It is suggested, instead, to match the in-synchrony-bands spectrum measures (Ghitza, ICASSP-85, Tampa, FL, Vol. 2, p. 505) of the synthetic and the original speech. This concept has...
Conference Paper
Full-text available
A speech spectrum intensity measure based on temporal non-place modeling of the cat's auditory nerve firing patterns is introduced where the spectrum intensity values are estimated using timing-synchrony measurements only. The ability of this measure to serve as a speech information carrier was tested psychoacoustically, by integrating the proposed...
Chapter
In 1955, Flanagan performed psychoacoustical experiments to measure the JNDs for the formant center frequency (Flanagan, 1955a) and its intensity (Flanagan, 1955b), and thereby determine the precision required in formant vocoder speech synthesis (Flanagan, 1957). These experiments were performed on steady-state, synthetic speech vowels, yielding the...
Article
Full-text available
Psychoacoustical discrimination limits describe the ability of the human observer to perceive changes in an acoustic stimulus; sounds differing by less than a “Just Noticeable Difference” (JND) are heard alike. Flanagan and his associates [J. Acoust. Soc. Am. 27, 613 (1955); 27, 1223 (1955); 30, 435 (1958)] studied steady‐state synthetic speech...
Article
Full-text available
We describe a computational model of diphone perception based on salient properties of peripheral and central auditory processing. The model comprises an efferent-inspired closed-loop model of the auditory periphery connected to a template-matching neuronal circuit with a gamma rhythm at its core. We show that by exploiting auditory feedback a plac...
Article
Full-text available
This is a final report for a stand-alone grant supporting the first 9 months of a 4-year research program entitled "Auditory peripheral processing of degraded speech". The underlying thesis is that the auditory periphery contributes to the robust performance of humans in speech reception in noise through a concerted contribution of the efferent fee...
