The cocktail party problem.

Adaptive Systems Lab, McMaster University, Hamilton, Ontario, Canada L8S 4K1.
Neural Computation (Impact Factor: 1.69). 10/2005; 17(9):1875-902. DOI: 10.1162/0899766054322964
Source: PubMed

ABSTRACT This review presents an overview of a challenging problem in auditory perception, the cocktail party phenomenon, the delineation of which goes back to a classic paper by Cherry in 1953. In this review, we address the following issues: (1) human auditory scene analysis, which is a general process carried out by the auditory system of a human listener; (2) insight into auditory perception, which is derived from Marr's vision theory; (3) computational auditory scene analysis, which focuses on specific approaches aimed at solving the machine cocktail party problem; (4) active audition, the proposal for which is motivated by analogy with active vision; and (5) discussion of brain theory and independent component analysis, on the one hand, and correlative neural firing, on the other.

Available from: Zhe Chen, Aug 21, 2014
  • Source
    ABSTRACT: We present the concept of an acoustic rake receiver (ARR), a microphone beamformer that uses echoes to improve noise and interference suppression. The rake idea is well-known in wireless communications. It involves constructively combining different multipath components that arrive at the receiver antennas. Unlike typical spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts, which makes acoustic raking a more challenging problem. That is why the correct way to think about it is spatial. Instead of explicitly estimating the channel, we create correspondences between early echoes in time and image sources in space. These multiple sources of the desired and interfering signals offer additional spatial diversity that we can exploit in the beamformer design. We present several "intuitive" and optimal formulations of ARRs, and show theoretically and numerically that the rake formulation of the maximum signal-to-interference-and-noise beamformer offers significant performance boosts in terms of noise suppression and interference cancellation. We accompany the paper with the complete simulation and processing chain written in Python. The code and the sound samples are available online.
    IEEE Journal of Selected Topics in Signal Processing 07/2014; DOI:10.1109/JSTSP.2015.2415761 · 3.63 Impact Factor
  • Source
    ABSTRACT: In two experiments, we assessed the effects of combining different cues of concurrent sound segregation on the object-related negativity (ORN) and the P400 event-related potential components. Participants were presented with sequences of complex tones, half of which contained some manipulation: One or two harmonic partials were mistuned, delayed, or presented from a different location than the rest. In separate conditions, one, two, or three of these manipulations were combined. Participants watched a silent movie (passive listening) or reported after each tone whether they perceived one or two concurrent sounds (active listening). ORN was found in almost all conditions except for location difference alone during passive listening. Combining several cues or manipulating more than one partial consistently led to sub-additive effects on the ORN amplitude. These results support the view that ORN reflects a combined, feature-unspecific assessment of the auditory system regarding the contribution of two sources to the incoming sound.
    Biological psychology 05/2014; 100. DOI:10.1016/j.biopsycho.2014.04.005 · 3.47 Impact Factor
  • Source
    ABSTRACT: No other modality is more frequently represented in the prefrontal cortex than the auditory, but the role of auditory information in prefrontal functions is not well understood. Pathways from auditory association cortices reach distinct sites in the lateral, orbital, and medial surfaces of the prefrontal cortex in rhesus monkeys. Among prefrontal areas, frontopolar area 10 has the densest interconnections with auditory association areas, spanning a large antero-posterior extent of the superior temporal gyrus from the temporal pole to auditory parabelt and belt regions. Moreover, auditory pathways make up the largest component of the extrinsic connections of area 10, suggesting a special relationship with the auditory modality. Here we review anatomic evidence showing that frontopolar area 10 is indeed the main frontal "auditory field" as the major recipient of auditory input in the frontal lobe and chief source of output to auditory cortices. Area 10 is thought to be the functional node for the most complex cognitive tasks of multitasking and keeping track of information for future decisions. These patterns suggest that the auditory association links of area 10 are critical for complex cognition. The first part of this review focuses on the organization of prefrontal-auditory pathways at the level of the system and the synapse, with a particular emphasis on area 10. Then we explore ideas on how the elusive role of area 10 in complex cognition may be related to the specialized relationship with auditory association cortices.
    Frontiers in Neuroscience 04/2014; 8:77. DOI:10.3389/fnins.2014.00077