This paper presents a pathological voice identification system employing signal processing techniques through cochlear implant models. The fundamentals of the biological process for speech perception are investigated to develop this technique. Two cochlear implant models are considered in this work: one uses a conventional bank of bandpass filters, and the other one uses a bank of optimized gammatone filters. The critical center frequencies of those filters are selected to mimic the human cochlear vibration patterns caused by audio signals. The proposed system processes the speech samples and applies a CNN for final pathological voice identification. The results show that the two proposed models adopting bandpass and gammatone filterbanks can discriminate the pathological voices from healthy ones, resulting in F1 scores of 77.6% and 78.7%, respectively, with speech samples. The obtained results of this work are also compared with those of other related published works.
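The abstract does not list the center frequencies it selects. As an illustration only, one common way to place gammatone channels is to space them evenly on the ERB-rate scale of Glasberg and Moore. The sketch below assumes that spacing; the frequency range (100 Hz to 8 kHz) and the channel count (16) are hypothetical values chosen for the example, not taken from the paper.

```python
import math

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def center_frequencies(f_low, f_high, n_channels):
    """Center frequencies spaced evenly on the ERB-rate scale."""
    e_low, e_high = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [erb_rate_to_hz(e_low + i * step) for i in range(n_channels)]

# Hypothetical range and channel count, for illustration only.
cfs = center_frequencies(100.0, 8000.0, 16)
for fc in cfs:
    print(f"{fc:8.1f} Hz  (ERB {erb_bandwidth(fc):6.1f} Hz)")
```

Channels placed this way are narrow and dense at low frequencies and wide and sparse at high frequencies, which is the qualitative pattern a cochlea-inspired filterbank aims for.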
In a typical cochlear implant design, the ambient sound is detected via a microphone and the transmission unit of the implant is placed at the back of the auricle. However, this design has several drawbacks. First, the subject cannot bathe or swim comfortably with the microphone unit on; second, an external attached unit that may be visible is cosmetically undesirable. Herein, the idea is to explore obtaining the acoustic signals that would directly drive the cochlear nerves without using a microphone, employing only the vibrations of the ossicles. Thus, the natural filtering provided by the anatomy of the ear may be maintained. The proposed method is to attach a tiny, lightweight micro-electro-mechanical-system (MEMS) accelerometer to sense the vibrations of the ossicles, namely the malleus, incus, and stapes. A preliminary analysis indicated that the physically longer extension of the incus is the most suitable and convenient place to attach such a sensor. The model adopted has been optimized to match the amplitude and phase response of the human ear from a system analysis point of view. Simulation experiments were performed to study the possible loading effects of placing a sensor on the incus. The purpose of the simulations is to test feasibility before attempting the very difficult surgical procedures. Preliminary results indicate that placing a sensor weighing up to 36 mg does not seriously affect the amplitude or phase response of the ear. This study is yet another example of how simulations of physiological systems can be advantageous in the design of biomedical systems.
Vowels, consonants, and sentences were processed through software emulations of cochlear-implant signal processors with 2-9 output channels. The signals were then presented, as either the sum of sine waves at the center of the channels or as the sum of noise bands the width of the channels, to normal-hearing listeners for identification. The results indicate, as previous investigations have suggested, that high levels of speech understanding can be obtained using signal processors with a small number of channels. The number of channels needed for high levels of performance varied with the nature of the test material. For the most difficult material--vowels produced by men, women, and girls--no statistically significant differences in performance were observed when the number of channels was increased beyond 8. For the least difficult material--sentences--no statistically significant differences in performance were observed when the number of channels was increased beyond 5. The nature of the output signal, noise bands or sine waves, made only a small difference in performance. The mechanism mediating the high levels of speech recognition achieved with only a few channels of stimulation may be the same one that mediates the recognition of signals produced by speakers with a high fundamental frequency, i.e., the levels of adjacent channels are used to determine the frequency of the input signal. The results of an experiment in which frequency information was altered but temporal information was not altered indicate that vowel recognition is based on information in the frequency domain even when the number of channels of stimulation is small.
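The sine-wave presentation described above can be sketched roughly as follows: the input is split into channels, the level of each channel is tracked over time, and each channel is replaced by a single sine at the channel center. This is a minimal sketch and not the processors used in the study. The band limits (300 to 5500 Hz), the logarithmic channel spacing, the 10 ms frame, and the function name `sine_vocoder` are all assumptions made for the example.

```python
import numpy as np

def sine_vocoder(x, fs, n_channels, f_lo=300.0, f_hi=5500.0, frame_s=0.01):
    """Replace each analysis channel by one sine at the channel center.

    Band limits, log spacing, and the 10 ms frame are assumptions made
    for this sketch only.
    """
    x = np.asarray(x, dtype=float)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # channel boundaries
    centers = np.sqrt(edges[:-1] * edges[1:])          # geometric centers
    hop = int(round(frame_s * fs))
    n_frames = len(x) // hop
    frames = x[: n_frames * hop].reshape(n_frames, hop)
    spec = np.abs(np.fft.rfft(frames, axis=1))         # per-frame magnitudes
    freqs = np.fft.rfftfreq(hop, d=1.0 / fs)
    t = np.arange(n_frames * hop) / fs
    out = np.zeros(n_frames * hop)
    for lo, hi, fc in zip(edges[:-1], edges[1:], centers):
        band = (freqs >= lo) & (freqs < hi)
        # Per-frame level of the channel (arbitrary overall scaling).
        level = np.sqrt((spec[:, band] ** 2).sum(axis=1)) / hop
        out += np.repeat(level, hop) * np.sin(2.0 * np.pi * fc * t)
    return out, centers

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)        # 1 kHz test tone
y, centers = sine_vocoder(tone, fs, n_channels=4)
```

With four channels, the 1 kHz test tone falls in the second channel, so the output is dominated by a sine at that channel's center rather than at 1 kHz. This illustrates how a small number of channels discards fine frequency detail within each band.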
Recent studies have shown that high levels of speech understanding could be achieved when the speech spectrum was divided into four channels and then reconstructed as a sum of four noise bands or sine waves with frequencies equal to the center frequencies of the channels. In these studies speech understanding was assessed using sentences produced by a single male talker. The aim of experiment 1 was to assess the number of channels necessary for a high level of speech understanding when sentences were produced by multiple talkers. In experiment 1, sentences produced by 135 different talkers were processed through n channels (2 ≤ n ≤ 16), synthesized as a sum of n sine waves with frequencies equal to the center frequencies of the filters, and presented to normal-hearing listeners for identification. A minimum of five channels was needed to achieve a high level (90%) of speech understanding. Asymptotic performance was achieved with eight channels, at least for the speech material used in this study. The outcome of experiment 1 demonstrated that the number of channels needed to reach asymptotic performance varies as a function of the recognition task and/or need for listeners to attend to fine phonetic detail. In experiment 2, sentences were processed through 6 and 16 channels and quantized into a small number of steps. The purpose of this experiment was to investigate whether listeners use across-channel differences in amplitude to code frequency information, particularly when speech is processed through a small number of channels. For sentences processed through six channels there was a significant reduction in speech understanding when the spectral amplitudes were quantized into a small number (< 8) of steps. High levels (92%) of speech understanding were maintained for sentences processed through 16 channels and quantized into only 2 steps. The findings of experiment 2 suggest an inverse relationship between the importance of spectral amplitude resolution (number of steps) and spectral resolution (number of channels).
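The amplitude quantization in experiment 2 can be illustrated with a small sketch. The abstract does not say how the steps were spaced, so the code below assumes uniform linear steps between zero and the peak amplitude, with each value mapped to the midpoint of its step. The function name `quantize_amplitudes` is hypothetical.

```python
import numpy as np

def quantize_amplitudes(amplitudes, n_steps):
    """Quantize channel amplitudes into n_steps uniform levels.

    Uniform linear spacing relative to the peak is an assumption of this
    sketch; the abstract does not state the spacing that was used.
    """
    a = np.asarray(amplitudes, dtype=float)
    peak = a.max()
    if peak == 0.0:
        return a.copy()
    step = peak / n_steps
    # Index of the step each amplitude falls in; the peak goes in the top step.
    idx = np.minimum(np.floor(a / step), n_steps - 1)
    return (idx + 0.5) * step   # represent each step by its midpoint

q2 = quantize_amplitudes([0.0, 0.2, 0.6, 1.0], n_steps=2)
q8 = quantize_amplitudes(np.linspace(0.0, 1.0, 50), n_steps=8)
```

With only two steps, the channel amplitudes 0.2 and 0.0 become indistinguishable, as do 0.6 and 1.0. When there are few channels, such collapsed across-channel level differences remove much of the cue listeners could use to locate spectral peaks.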
This paper describes the listening habits and musical enjoyment of postlingually deafened adults who use cochlear implants. Sixty-five implant recipients (35 females, 30 males) participated in a survey containing questions about musical background, prior involvement in music, and audiologic success with the implant in various listening circumstances. Responses were correlated with measures of cognition and speech recognition. Sixty-seven implant recipients completed daily diaries (7 consecutive days) in which they reported hours spent in specific music activities. Results indicate a wide range of success with music. In general, people enjoy music less postimplantation than prior to hearing loss. Musical enjoyment is influenced by the listening environment (e.g., a quiet room) and features of the music.
Abbreviations: HIGH = high school, POST = adult education, PRIM = primary school, SCT = Sequence Completion Test, SE = standard error, TERT = tertiary, VMT = Visual Monitoring Task, VMT1 = Visual Monitoring Task one per second rate, VMT2 = Visual Monitoring Task two per second rate.
The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process. This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about objects in the environment or to predict their behavior. In order to explore the process, attention is restricted to isolated sounds produced by a small class of sound sources, the non-percussive orchestral musical instruments. Previous research on the perception and production of orchestral instrument sounds is reviewed from a vantage point based on the excitation and resonance structure of the sound-production process, revealing a set of perceptually salient acoustic features. A computer model of the recognition process is developed that is capable of "listening" to a recording of a musical instrument and classifying the instrument as one of 25 possibilities. The model is based on current models of signal processing in the human auditory system. It explicitly extracts salient acoustic features and uses a novel improvisational taxonomic architecture (based on simple statistical pattern-recognition techniques) to classify the sound source. The performance of the model is compared directly to that of skilled human...
A PERCEPTUAL REPRESENTATION OF AUDIO, by Daniel Patrick Whittlesey Ellis. Submitted to the Department of Electrical Engineering and Computer Science on Feb 5th 1992, in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science. The human auditory system performs many remarkable feats; we only fully appreciate how sophisticated these are when we try to simulate them on a computer. Through building such computer models, we gain insight into perceptual processing in general, and develop useful new ways to analyze signals. This thesis describes a transformation of sound into a representation with various properties specifically oriented towards simulations of source separation. Source separation denotes the ability of listeners to perceive sound originating from a particular origin as separate from simultaneous interfering sounds. An example would be following the notes of a single instrument while listening to a...
This study was designed to evaluate the relative importance of transients, harmonic structure, and vibrato as timbre cues in the absolute judgment of musical tones. Tape recordings were made of tones played by experienced musicians in an anechoic chamber on ten different instruments at three selected frequencies of the equally tempered scale. Appropriate retaping and splicing procedures provided five types of stimulus material: (1) entire tone (initial transients, steady state, and final transient); (2) entire tone (initial transients, shortened steady state, and final transients); (3) initial transients and steady‐state portion only; (4) steady‐state portion only; and (5) steady state and final transients only. The final test tape consisted of 300 randomly ordered tonal stimuli (10 instruments × 3 frequencies × 2 playing styles × 5 types of tones). Twenty trained musicians were tested and retested on the final tape in two experimental sessions. Ss were provided with a reminder list of 39 instruments (grouped into classes), and were required, on hearing a test tone, to identify the appropriate instrument. The results of this study indicate that (1) some instruments are better identified than others, (2) absolute judgments are better at F4 than at C4 or A4, (3) a vibrato tone is better identified than a nonvibrato tone, (4) initial transients with a steady‐state portion of the tone produce the best identification, and (5) identification is significantly improved with practice. In general, discrimination was relatively poor, with only about 40% of the absolute judgments being correct.
The purposes of this study were (a) to evaluate the Primary Measures of Music Audiation (PMMA) as a test of musical perception for postlingually deafened adult cochlear implant (CI) users; and (b) to report test outcomes on the Rhythm and Tonal subtests of the PMMA. Correlations between PMMA scores and speech perception tasks were calculated. Subjects were 34 postlingually deafened adults with CI experience. Subject performance on the PMMA was analyzed to determine test usability and technical adequacy (reliability, item discrimination, and difficulty) for this particular population. Comparisons were made across two different implant types (Nucleus and Ineraid devices) and across Rhythm and Tonal subtests. The PMMA was found to be usable with minor adjustments. No significant differences in accuracy were found for the Rhythm or Tonal subtest across devices. However, CI (Nucleus and Ineraid) users were significantly more accurate on the Rhythm than the Tonal subtest (p ≤ .001). The mean difficulty for the Rhythm subtest was 84.93, while the mean difficulty for the Tonal subtest was 77.50. The mean discrimination indices were as follows: Rhythm subtest, .18; Tonal subtest, .28. The Tonal subtest contained a larger number of items within the satisfactory range for item difficulty and item discrimination. The strongest correlations between musical perception and speech perception were between the Tonal subtest and the speech perception measures of phoneme identification (r = .45) and accent recognition (r = .46).
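For readers unfamiliar with the item statistics reported above, the sketch below shows one conventional way to compute them. It assumes that item difficulty is the percentage of subjects answering an item correctly (the reported means of 84.93 and 77.50 read naturally as percentages) and that discrimination is the upper-group minus lower-group proportion correct. The abstract does not state which discrimination formula was used, so that choice, the 27% group fraction, and the toy response matrix are all assumptions.

```python
def item_difficulty(responses):
    """Percent of subjects answering each item correctly.

    responses: list of per-subject lists of 0/1 item scores.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(r[j] for r in responses) / n_subjects
            for j in range(n_items)]

def item_discrimination(responses, fraction=0.27):
    """Upper-minus-lower group proportion correct per item (assumed formula)."""
    ranked = sorted(responses, key=sum, reverse=True)   # best total score first
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [(sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
            for j in range(n_items)]

# Toy data: 4 subjects x 3 items (hypothetical, not from the study).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
difficulty = item_difficulty(toy)
discrimination = item_discrimination(toy, fraction=0.5)
```

Under this convention a high difficulty value means an easy item, and a discrimination index near zero means the item separates strong and weak performers poorly.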
The problem in this study was to determine and compare the success of two groups of musicians and a group of non-musicians in their efforts to identify nine different musical instruments by tone quality alone when each instrument was heard directly and indirectly over a public-address system. An effort was made to use instruments that were representative of the strings, brass, wood-winds, reeds, and percussion groups. In order to make tone quality the principal factor in identification, an effort was made to use a tone whose frequency was as nearly common to all of the instruments as was possible. The tone agreed upon was middle C or its octave. The instruments were presented in a definite order with rest periods between each presentation. The data show that some of the instruments were recognized with greater ease and more certainty than others. In general, the percent of correct identifications was lower for each of the three groups when the tones were heard over a public address system than when heard directly. Statistical examination of the reliability of the difference between the means for each of the three groups, however, revealed that only the difference found for the members of the first group of musicians was large enough to meet Fisher's test for significance at the five percent level.
This study was designed to evaluate the relative importance of transients, harmonic structure, and vibrato as timbre cues in the absolute judgment of musical tones. Tape recordings were made of tones played on ten different instruments at C4, F4, and A4 of the equally tempered scale. Appropriate splicing and retaping procedures provided a test tape with 300 randomly ordered tonal stimuli (10 instruments × 3 frequencies × 2 playing styles × 5 types of tones). Twenty trained musicians were tested and retested on the final tape in two experimental sessions. Ss were provided with a reminder list of 39 instruments (grouped into classes) and were required to identify the particular instrument for each tonal stimulus. The results of this study indicate that (1) some instruments (e.g., clarinet, oboe, and flute) are identified more easily than others (e.g., violin, cello, and bassoon); (2) more correct identifications are made at F4 than C4 or A4; (3) the best identification is made for stimuli consisting of initial transients and a short steady state; (4) a vibrato tone is better identified than a nonvibrato tone; and (5) identification is improved significantly with practice. Suggestions are offered for further research.
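The 300-item test tape is a full factorial crossing of the four factors named above. A minimal sketch of building and shuffling such a list follows. The abstract does not enumerate all ten instruments or the five tone types, so the instrument and tone-type labels are placeholders, and the fixed random seed is only there to make the example repeatable.

```python
import itertools
import random

# Placeholder labels: the abstract does not list all ten instruments
# or name the five tone types.
instruments = [f"instrument_{i}" for i in range(1, 11)]
notes = ["C4", "F4", "A4"]
styles = ["vibrato", "nonvibrato"]
tone_types = [f"type_{i}" for i in range(1, 6)]

# Every combination appears exactly once: 10 * 3 * 2 * 5 = 300 stimuli.
stimuli = list(itertools.product(instruments, notes, styles, tone_types))

# Random presentation order (seed chosen arbitrarily for repeatability).
random.Random(0).shuffle(stimuli)
```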
Typical methods for observing pitch changes with intensity for pure tones consist of varying the frequency of one tone of fixed intensity (the comparison tone) so as to match the pitch of a second tone of fixed frequency (the tone under test) when the latter is set at different intensities. Differences between the comparison and test‐tone frequencies, when equated in pitch under these conditions, are ascribed to their intensity differences and used as a measure of the pitch intensity shifts for the test tone. The comparison and test tones may differ in frequency, however, even when they are matched in pitch under equal intensity conditions. These frequency differences, called pitch‐matching errors, may be identical to those noted above for comparable conditions and consequently nullify the apparent pitch‐intensity shifts. This possibility was studied in two experiments which sought to reproduce the pitch intensity relationship as defined by the data of Stevens and Snow. In more than half the comparisons made under various frequency and intensity conditions, apparent pitch shifts for 50, 75, 100, 200, 400, 700, 1500, and 6000 cps were found to be not significantly different from pitch‐matching errors. When averaged, the remaining shifts followed the directions of Stevens' curves but were small (2% or less). Pitch‐intensity functions for the lower frequency tones were particularly variable and generally bore little relationship to Snow's functions for such tones.
Listeners' recognition of wind‐instrument tones was investigated for the tone concert F on the treble staff (frequency approximately 349 cps). Tones of a flute, oboe, B‐flat clarinet, tenor saxophone, alto saxophone, cornet, trumpet, French horn, trombone, and baritone were recorded, equated for duration and sound level as well as frequency. Thirty university band members listened to these tones played back: unaltered, backward, with the rise and decay removed, and through a 480‐cps low‐pass filter. Recognition of wind‐instrument tone quality was best for the unaltered playback, next best under the backward condition, next in the absence of rise and decay, and poorest under the filtered condition.
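A short calculation suggests why the filtered condition was the hardest. For a tone with a fundamental near 349 cps, the second harmonic lies near 698 cps, above the 480-cps cutoff. The sketch below assumes an ideal (brick-wall) low-pass filter, which a real filter only approximates; the function name `harmonics_below` is hypothetical.

```python
def harmonics_below(f0_hz, cutoff_hz):
    """Harmonics of f0_hz that lie below cutoff_hz (ideal low-pass assumed)."""
    passed = []
    k = 1
    while k * f0_hz < cutoff_hz:
        passed.append(k * f0_hz)
        k += 1
    return passed

surviving = harmonics_below(349.0, 480.0)
print(surviving)  # [349.0]
```

Under this idealization only the fundamental survives, so the filtered tones from all ten instruments lose the harmonic structure that distinguishes their tone quality.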
An experiment was designed to assess the relative contribution to listener categorization strategies of various temporal partitions of the acoustic signal for trumpet, clarinet, and violin. The role of context, whole phrase versus single note, was also evaluated. Analog recordings of three folk-song phrases performed on two clarinets, violins, and trumpets were digitized. A computer program was developed for digital signal editing. Signal edit conditions included normal, time-variant steady-state alone, transients alone, and static steady state with and without transients. Musicians and nonmusicians responded to a matching procedure in which unedited signals of one phrase were choice stimuli and edited signals for two different phrases served as models. Two replications of all possible combinations of instrument, phrase, and edit conditions were presented for a total of 72 items. Two additional groups of musicians and nonmusicians participated in an identical procedure in which the stimuli were single notes extracted from two phrases. Analyses revealed that, for the whole-phrase signals, there was no case in which the means obtained with the "normal" signal and the "time variant steady state alone" signal were statistically different; these means were always statistically higher than the "transients alone" mean. It was concluded that transients were neither sufficient nor necessary for the categorization of trumpet, clarinet, and violin in whole-phrase contexts. The time-variant quasi-steady state was sufficient and necessary for the categorization of trumpet and violin phrases, and it was sufficient but not necessary for the categorization of clarinet phrases. For the single-note stimuli, "transients alone" yielded means statistically equivalent to the "normal" and "time variant steady state alone" means. It was concluded that transients were sufficient, but not necessary, for instrument categorization in single-note contexts. The whole-phrase context yielded significantly higher means than the single-note context; music majors performed the task with greater accuracy than nonmusic majors.
Two experiments were performed to evaluate the perceptual relationships between 16 music instrument tones. The stimuli were computer synthesized based upon an analysis of actual instrument tones, and they were perceptually equalized for loudness, pitch, and duration. Experiment 1 evaluated the tones with respect to perceptual similarities, and the results were treated with multidimensional scaling techniques and hierarchic clustering analysis. A three-dimensional scaling solution, well matching the clustering analysis, was found to be interpretable in terms of the spectral energy distribution; the presence of synchronicity in the transients of the higher harmonics, along with the closely related amount of spectral fluctuation within the tone through time; and the presence of low-amplitude, high-frequency energy in the initial attack segment; an alternate interpretation of the latter two dimensions viewed the cylindrical distribution of clusters of stimulus points about the spectral energy distribution, grouping on the basis of musical instrument family (with two exceptions). Experiment 2 was a learning task of a set of labels for the 16 tones. Confusions were examined in light of the similarity structure for the tones from experiment 1, and one of the family-grouping exceptions was found to be reflected in the difficulty of learning the labels.
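To illustrate how a set of pairwise dissimilarities can be turned into a three-dimensional configuration, the sketch below implements classical (Torgerson) multidimensional scaling. This is a stand-in chosen for brevity: the abstract does not say which scaling algorithm was used, and the 16 random points here are synthetic data, not the similarity judgments from the experiment.

```python
import numpy as np

def classical_mds(D, n_dims=3):
    """Classical (Torgerson) MDS: embed points so distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, v = np.linalg.eigh(B)                   # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:n_dims]         # indices of the largest ones
    scale = np.sqrt(np.clip(w[top], 0.0, None))
    return v[:, top] * scale

# Synthetic check: 16 random points in 3-D stand in for 16 tones.
rng = np.random.default_rng(0)
P = rng.standard_normal((16, 3))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
X = classical_mds(D, n_dims=3)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```

When the dissimilarities are exact Euclidean distances, as in this synthetic check, the recovered configuration reproduces them up to rotation and reflection. Real perceptual judgments are noisier, so the axes of a solution have to be interpreted against acoustic properties, as the abstract describes.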
Presently devised single-channel devices generate a relatively primitive sensation of hearing. They provide some enhancement of communication skills for the totally deaf. Definite psychological advantages for the totally deaf have been observed. Pitch discrimination is by the mechanism of "periodicity pitch." No "place" pitch encoding is possible. The recognition of complex sounds is not possible. Multiple segments of auditory nerve must be stimulated in a manner which will stimulate the complex patterns of neural activity necessary for speech discrimination. Electrodes can be optimized and the pathophysiological consequences of electrical stimulation can be determined in experimental animals. The perceptual consequences of electrical stimulation, however, can best be determined in man himself. How much we will have to rely on known and future methods of aural rehabilitation will depend upon how well perceptual speech patterns can be generated by electrical stimulation of the auditory nerve.
High levels of speech recognition have been achieved with a new sound processing strategy for multielectrode cochlear implants. A cochlear implant system consists of one or more implanted electrodes for direct electrical activation of the auditory nerve, an external speech processor that transforms a microphone input into stimuli for each electrode, and a transcutaneous (rf-link) or percutaneous (direct) connection between the processor and the electrodes. We report here the comparison of the new strategy and a standard clinical processor. The standard compressed analogue (CA) processor presented analogue waveforms simultaneously to all electrodes, whereas the new continuous interleaved sampling (CIS) strategy presented brief pulses to each electrode in a nonoverlapping sequence. Seven experienced implant users, selected for their excellent performance with the CA processor, participated as subjects. The new strategy produced large improvements in the scores of speech reception tests for all subjects. These results have important implications for the treatment of deafness and for minimal representations of speech at the auditory periphery.
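The nonoverlapping pulse sequence that distinguishes CIS from simultaneous analogue stimulation can be sketched as a simple timing schedule. The code below covers pulse timing only and omits the envelope-derived pulse amplitudes. The electrode count, per-electrode rate, and pulse width are hypothetical values, since the abstract gives none, and `cis_schedule` is an illustrative name.

```python
def cis_schedule(n_electrodes, rate_pps, pulse_width_s, n_cycles):
    """Return (onset_s, offset_s, electrode) tuples with no temporal overlap.

    Each electrode is pulsed rate_pps times per second; within one period
    the electrodes take turns in equal time slots.
    """
    period = 1.0 / rate_pps            # time between pulses on one electrode
    slot = period / n_electrodes       # time available to each electrode
    if pulse_width_s > slot:
        raise ValueError("pulse does not fit in its time slot")
    schedule = []
    for cycle in range(n_cycles):
        for electrode in range(n_electrodes):
            onset = cycle * period + electrode * slot
            schedule.append((onset, onset + pulse_width_s, electrode))
    return schedule

# Hypothetical parameters, for illustration only.
pulses = cis_schedule(n_electrodes=6, rate_pps=800.0,
                      pulse_width_s=100e-6, n_cycles=3)
```

Because only one electrode is active at any instant, the electric fields of different channels are never summed in the cochlea. This is the interaction that simultaneous analogue stimulation cannot avoid.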
The purpose of this pilot study was to investigate adult Ineraid and Nucleus cochlear implant (CI) users' perceptual accuracy for melodic and rhythmic patterns, and quality ratings for different musical instruments. Subjects were 18 postlingually deafened adults with CI experience. Evaluative measures included the Primary Measures of Music Audiation (PMMA) and a Musical Instrument Quality Rating. Performance scores on the PMMA were correlated with speech perception measures, music background, and subject characteristics. Results demonstrated a broad range of perceptual accuracy and quality ratings across subjects. On these measures, performance for temporal contrasts was better than for melodic contrasts independent of CI device. Trends in the patterns of correlations between speech and music perception suggest that particular structural elements of music are differentially accessible to cochlear implant users. Additionally, notable qualitative differences for ratings of musical instruments were observed between Nucleus and Ineraid users.
Direct electrical stimulation of the auditory nerve can be used to restore some degree of hearing to the profoundly deaf. Percepts due to electrical stimulation have characteristics corresponding approximately to the acoustic percepts of loudness, pitch, and timbre. To encode speech as a pattern of electrical stimulation, it is necessary to determine the effects of the stimulus parameters on these percepts. The effects of the three basic stimulus parameters of level, repetition rate, and stimulation location on subjects' percepts were examined. Pitch difference limens arising from changes in rate of stimulation increase as the stimulating rate increases, up to a saturation point of between 200 and 1000 pulses per second. Changes in pitch due to electrode selection depend upon the subject, but generally agree with a tonotopic organization of the human cochlea. Further, the discriminability of such place-pitch percepts seems to be dependent on the degree of current spread in the cochlea. The effect of stimulus level on perceived pitch is significant but is highly dependent on the individual tested. The results of these experiments are discussed in terms of their impact on speech-processing strategies and their relevance to acoustic pitch perception.
Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
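The manipulation described above, in which band envelopes modulate noise of the same bandwidth, can be sketched in a few lines. This is a minimal sketch, not the processing used in the study: it uses ideal FFT-domain filters, half-wave rectification, and a 50 Hz envelope cutoff, all of which are assumptions, and the band edges in the usage lines are hypothetical.

```python
import numpy as np

def fft_bandpass(x, fs, lo, hi):
    """Ideal band-pass filter: keep spectral components with lo <= f < hi."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def envelope(x, fs, cutoff_hz=50.0):
    """Half-wave rectify, then low-pass (cutoff is an assumed value)."""
    smoothed = fft_bandpass(np.maximum(x, 0.0), fs, 0.0, cutoff_hz)
    return np.maximum(smoothed, 0.0)

def noise_vocoder(x, fs, edges, rng):
    """Sum of band-limited noises, each modulated by its band's envelope."""
    x = np.asarray(x, dtype=float)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(fft_bandpass(x, fs, lo, hi), fs)
        carrier = fft_bandpass(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * carrier
    return out

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * 1000.0 * t)       # stand-in for a speech signal
edges = [100.0, 800.0, 1500.0, 4000.0]     # three hypothetical bands
y = noise_vocoder(x, fs, edges, np.random.default_rng(0))
```

The output keeps the slow amplitude fluctuations of each band but replaces the fine spectral structure inside the band with noise, which is what restricts the listener to temporal envelope cues.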
The perception of musical pitch was investigated in postlinguistically deaf subjects with cochlear implants. Stimuli consisted of sequences of biphasic electrical pulse trains at rates which represented the tones of the equal-tempered musical scale, delivered at equalized comfortable loudness levels to selected single bipolar electrodes along the array of the Nucleus cochlear implant. Seventeen subjects correctly identified a mean of 44% of rhythmically intact familiar tunes, presented in an open-set paradigm. Three subjects were tested with a closed set of melodies without rhythmic cues. The results showed relatively higher recognition scores at lower pulse rates, although melody recognition remained possible up to rates of approximately 600-800 pulses per second. Stimulation of apical electrodes yielded higher recognition scores than of basal electrodes. The perception of musical intervals, defined as frequency ratios between two trains of stimulus pulse rates, was investigated in an interval intonation labeling experiment, for intervals ranging from a minor 3rd to a major 6th. Within a range of low pulse rates, subjects defined the intervals mediated by electrical pulse rate by the same ratios which govern musical intervals of tonal frequencies in normal-hearing listeners. It may be concluded that temporal cues are sufficient for the mediation of musical pitch, at least for the lower half of the range of fundamental frequencies commonly used in music.
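Pulse rates that represent the tones of the equal-tempered scale follow from the standard relation that each semitone multiplies the rate by 2^(1/12). A small worked example follows; the base rate of 100 pulses per second is a hypothetical starting point, not a value from the study.

```python
def equal_tempered_rates(base_rate_pps, n_semitones):
    """Pulse rates for 0..n_semitones semitones above base_rate_pps."""
    return [base_rate_pps * 2.0 ** (k / 12.0) for k in range(n_semitones + 1)]

rates = equal_tempered_rates(100.0, 12)     # one octave above 100 pps
minor_third = rates[3] / rates[0]           # 3 semitones
major_sixth = rates[9] / rates[0]           # 9 semitones
print(f"minor 3rd ratio: {minor_third:.4f}, major 6th ratio: {major_sixth:.4f}")
```

The interval range tested in the study, a minor 3rd to a major 6th, therefore corresponds to rate ratios between roughly 1.19 and 1.68.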
Numerical estimations of pitch were obtained from nine postlinguistically deafened adults using the 22-electrode cochlear implant manufactured by Cochlear Pty. Limited. A series of electrodes on the array were stimulated using three modes of stimulation: Bipolar (BP), common ground (CG), and monopolar (MONO). In BP stimulation, an electric current was passed between two electrodes separated by one electrode for eight patients and two electrodes for one patient. In CG stimulation, a single electrode was activated and the other electrodes on the array were connected together to serve as the return path for the current. In MONO stimulation, an electric current was passed between a single electrode and the most basal electrode on the array. Pitch estimations were generally consistent with the tonotopic organization of the cochlea. There was a marked reversal in pitch for electrodes in the middle of the array using CG stimulation for three patients. A reduced range of pitch using MONO stimulation was recorded for patients where the most basal electrode was internal to the cochlea. There were also individual differences in pitch estimations between the three modes of stimulation for most patients. The current levels required to elicit threshold (T) and comfortable listening (C) levels were, in general, higher for BP stimulation than for CG stimulation and were lowest for MONO stimulation. For CG stimulation, there was a tendency for T and C levels to be higher for electrodes in the middle of the array than at the basal or apical ends. For MONO stimulation, T and C levels uniformly increased in an apical to basal direction for the majority of patients. There was no consistent pattern in T and C levels for BP stimulation. The size of the range of usable hearing using CG stimulation tended to be similar to that using BP stimulation and was usually higher than that using MONO stimulation.
Studies were undertaken to investigate the ability of a user of the Nucleus multi-electrode cochlear implant to judge pitch in the context of musical intervals. The subject had qualified as a musical instrument tuner before he received his implant, and was able to judge the intervals between electrical sensations with neither training nor the guidance of familiar melodies. The procedures used were interval estimation, and interval production by the method of adjustment. The pitch of the electrical stimulation was controlled by varying the pulse repetition rate, the active electrode position, or two combinations of these parameters. Further studies employed sinusoidally amplitude modulated pulse trains with varying modulation frequency. The results showed that rate or modulation frequency could convey musical pitch information over a limited range (approximately two octaves). The data were directly comparable with the relationship between musical intervals and frequency for normal hearing. The pitch related to electrode place varied in accordance with the tonotopic organization of the cochlea, and also appeared to be able to support musical intervals. When both place and rate varied together, the place-related pitch was generally dominant. In all cases, the judgement of intervals tended to diverge from their acoustic counterparts as the intervals became larger.
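To compare rate-based intervals with their acoustic counterparts, a ratio of two pulse rates can be converted to equal-tempered semitones with 12·log2(ratio). The example rates below are hypothetical.

```python
import math

def interval_semitones(rate_a_pps, rate_b_pps):
    """Size of the interval from rate_a to rate_b in equal-tempered semitones."""
    return 12.0 * math.log2(rate_b_pps / rate_a_pps)

two_octaves = interval_semitones(100.0, 400.0)   # a fourfold change in rate
fifth = interval_semitones(200.0, 300.0)         # a 3:2 rate ratio
```

A usable range of about two octaves thus corresponds to roughly a fourfold span of pulse rates or modulation frequencies.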
This study compares the musical perception of 17 adult recipients of the Nucleus cochlear implant using two different formant extraction processing strategies (F0F1F2 and MPEAK). Over a 12-month period, participants were alternately switched between the two strategies every 3 months. Performance was evaluated using three measures of rhythmic and sequential pitch perception. Three individuals performed significantly better with the MPEAK strategy on one particular rhythm task, 11 participants performed better with the MPEAK strategy on another rhythm task, and no significant differences were found between the two strategies on a sequential pitch pattern task. Neither strategy seems clearly superior for perception of either sequential pitch or rhythmic patterns.
Cochlear implants have been successful in restoring partial hearing to profoundly deaf people. The success of cochlear implants can be attributed to the combined efforts of scientists from various disciplines, including bioengineering, physiology, otolaryngology, speech science, and signal processing. Each of these disciplines contributed to various aspects of the cochlear implant design. Signal processing, in particular, played an important role in the development of different techniques for deriving electrical stimuli from the speech signal. The purpose of this article is to present a review of various signal-processing techniques that have been used for cochlear prosthesis over the past 25 years.
Eight adults with cochlear implants participated in experiments to test their ability to recognize music. Some subjects showed good ability to recognize songs that were sung with instrumental accompaniment but poor ability to recognize songs played on an electronic keyboard without verbal cues, indicating that they were recognizing the songs by verbal cues rather than by musical qualities such as tones and melodic intervals. This conclusion was strengthened by the finding that subjects were barely able to distinguish between songs with the same rhythm and pitch range, and they showed poor ability to discriminate musical intervals. (The closest discrimination was 4 semitones.) Subjects had good ability to distinguish among the synthesized sounds of various musical instruments played on the electronic keyboard. We speculate that subjects could distinguish the various musical instruments in the same way they distinguish among human voices using spectrographic patterns such as formants or maxima.
The purpose of this study was to compare postlingually deafened cochlear implant recipients and normal-hearing adults on timbre (tone quality) recognition and appraisal of 8 musical instruments representing 3 frequency ranges and 4 instrumental families. The implant recipients were significantly less accurate than the normal-hearing adults on timbre recognition. The implant recipients gave significantly poorer ratings than did the normal-hearing adults to those instruments played in the higher frequency range and to those from the string family. The timbre measures were weakly correlated with speech perception measures, but were significantly correlated with 3 cognitive measures of sequential processing.
A statistical pattern-recognition technique was applied to the classification of musical instrument tones within a taxonomic hierarchy. Perceptually salient acoustic features, related to the physical properties of source excitation and resonance structure, were measured from the output of an auditory model (the log-lag correlogram) for 1023 isolated tones over the full pitch ranges of 15 orchestral instruments. The data set included examples from the string (bowed and plucked), woodwind (single, double, and air reed), and brass families. Using 70%/30% splits between training and test data, maximum a posteriori classifiers were constructed based on Gaussian models arrived at through Fisher multiple-discriminant analysis. The classifiers distinguished transient from continuant tones with approximately 99% correct performance. Instrument families were identified with approximately 90% performance, and individual instruments were identified with an overall success rate of appr...
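The idea of a maximum a posteriori classifier built on Gaussian class models can be sketched as follows. This is not the authors' implementation: it assumes a diagonal covariance per class, omits both the Fisher multiple-discriminant projection and the correlogram feature extraction, and uses made-up two-dimensional feature values. All function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_map(samples, labels):
    """Estimate a prior, per-feature means and per-feature variances for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    total = len(samples)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor keeps each variance positive for tiny or constant classes.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
            for d in range(dims)
        ]
        model[label] = (n / total, means, variances)
    return model


def log_posterior(entry, x):
    """Unnormalized log posterior: log prior plus Gaussian log likelihood."""
    prior, means, variances = entry
    score = math.log(prior)
    for value, mean, var in zip(x, means, variances):
        score += -0.5 * math.log(2.0 * math.pi * var) - (value - mean) ** 2 / (2.0 * var)
    return score


def classify_map(model, x):
    """Return the class label with the highest posterior score for feature vector x."""
    return max(model, key=lambda label: log_posterior(model[label], x))


# Toy training data (hypothetical feature values, not taken from the study).
train_x = [[0.9, 0.1], [1.1, 0.2], [1.0, 0.0], [3.0, 2.1], [3.2, 1.9], [2.9, 2.0]]
train_y = ["string", "string", "string", "brass", "brass", "brass"]
model = fit_gaussian_map(train_x, train_y)
prediction = classify_map(model, [1.05, 0.15])
```

In a hierarchical setting such as the one described, a classifier of this kind could be applied at each level of the taxonomy (for example, family first, then instrument), although the abstract does not detail how the levels were combined.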
Helmholtz, H. On the Sensations of Tone as a Physiological Basis for the Theory of Music. Trans. A.J. Ellis. 2nd ed. New York: Dover, 1954.
House, William F. Cochlear Implants. Ann. Otol. 85 (1976): 2.