Sparse time-frequency representation

Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021.
Proceedings of the National Academy of Sciences (Impact Factor: 9.67). 05/2006; 103(16):6094-9. DOI: 10.1073/pnas.0601707103
Source: PubMed


Auditory neurons preserve exquisite temporal information about sound features, but we do not know how the brain uses this information to process the rapidly changing sounds of the natural world. Simple arguments for effective use of temporal information led us to consider the reassignment class of time-frequency representations as a model of auditory processing. Reassigned time-frequency representations can track isolated simple signals with accuracy unlimited by the time-frequency uncertainty principle, but lack of a general theory has hampered their application to complex sounds. We describe the reassigned representations for white noise and show that even spectrally dense signals produce sparse reassignments: the representation collapses onto a thin set of lines arranged in a froth-like pattern. Preserving phase information allows reconstruction of the original signal. We define a notion of "consensus," based on stability of reassignment to time-scale changes, which produces sharp spectral estimates for a wide class of complex mixed signals. As the only currently known class of time-frequency representations that is always "in focus" this methodology has general utility in signal analysis. It may also help explain the remarkable acuity of auditory perception. Many details of complex sounds that are virtually undetectable in standard sonograms are readily perceptible and visible in reassignment.
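The reassignment idea described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: it uses the standard three-window (Auger-Flandrin style) formulation with a Gaussian window, and the function name `reassigned_spectrogram` and the parameters `n_win`, `sigma`, and `hop` are hypothetical choices made here for illustration. Each spectrogram cell is assigned a corrected time and frequency, so the energy of a pure tone collapses onto a single line at the tone's true frequency.

```python
import numpy as np

def reassigned_spectrogram(x, n_win=256, sigma=32.0, hop=16):
    """Return (power, t_hat, f_hat), each of shape (n_frames, n_win // 2 + 1).

    t_hat is in samples, f_hat in cycles per sample.
    Illustrative sketch only; names and defaults are assumptions.
    """
    x = np.asarray(x, dtype=float)
    # Window time axis, centred on zero.
    tau = np.arange(n_win) - (n_win - 1) / 2.0
    h = np.exp(-0.5 * (tau / sigma) ** 2)   # Gaussian window
    th = tau * h                            # time-weighted window
    dh = -tau / sigma**2 * h                # derivative of the window

    # Slice the signal into overlapping frames.
    starts = np.arange(0, len(x) - n_win + 1, hop)
    frames = x[starts[:, None] + np.arange(n_win)[None, :]]

    # Three short-time Fourier transforms, one per window.
    X_h = np.fft.rfft(frames * h, axis=1)
    X_th = np.fft.rfft(frames * th, axis=1)
    X_dh = np.fft.rfft(frames * dh, axis=1)

    power = np.abs(X_h) ** 2
    safe = np.where(power > 0, power, 1.0)  # avoid division by zero
    ratio_t = X_th * np.conj(X_h) / safe
    ratio_f = X_dh * np.conj(X_h) / safe

    centres = starts[:, None] + (n_win - 1) / 2.0
    freqs = np.fft.rfftfreq(n_win)[None, :]

    # Move each cell to the local centre of gravity of its energy.
    t_hat = centres + ratio_t.real
    f_hat = freqs - ratio_f.imag / (2 * np.pi)
    return power, t_hat, f_hat

# A pure tone whose frequency falls between FFT bins.
f0 = 0.123
n = np.arange(2048)
tone = np.cos(2 * np.pi * f0 * n)

power, t_hat, f_hat = reassigned_spectrogram(tone)
strong = power > 0.01 * power.max()   # keep only cells carrying real energy
print("bin spacing:", 1.0 / 256)
print("spread of reassigned frequencies:", np.ptp(f_hat[strong]))
```

In the ordinary spectrogram the tone is smeared over several bins; after reassignment the strong cells should all report nearly the same frequency, which is the "in focus" behavior the abstract refers to for isolated simple signals.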



Available from: Marcelo Osvaldo Magnasco
  • Source
    • "Although surprising, these results should not be unexpected. Human analysis of acoustic signals might be superior to the performance of mathematical methods for time-frequency analysis such as Fourier Transformation (Gardner et al., 2006; Oppenheim et al., 2013). Such human 'hyperacuity' is particularly relevant in professional musicians but has been observed also in non-musicians."
    ABSTRACT: We present here an innovative hypothesis and report preliminary evidence that the sound of NMR signals could provide an alternative to the current representation of the individual metabolic fingerprint and supply equally significant information. The NMR spectra of the urine samples provided by four healthy donors were converted into audio signals that were analyzed in two audio experiments by listeners with both musical and non-musical training. The listeners were first asked to cluster the audio signals of two donors on the basis of perceived similarity and then to classify unknown samples after having listened to a set of reference signals. In the clustering experiment, the probability of obtaining the same results by pure chance was 7.04% and 0.05% for non-musicians and musicians, respectively. In the classification experiment, musicians scored 84% accuracy which compared favorably with the 100% accuracy attained by sophisticated pattern recognition methods. The results were further validated and confirmed by analyzing the NMR metabolic profiles belonging to two other different donors. These findings support our hypothesis that the uniqueness of the metabolic phenotype is preserved even when reproduced as audio signal and warrants further consideration and testing in larger study samples.
    Full-text · Article · Mar 2015 · Omics A Journal of Integrative Biology
  • Source
    • "For one important component of the speech signal (the vowels), we propose such a functional element. It has been repeatedly suggested that the neural system uses sparse representations to code visual (Olshausen & Field, 1996) or auditory (Hahnloser, Kozhevnikov, & Fee, 2002; Gardner & Magnasco, 2006) information. We will show how a sparsity condition on the chosen speech representation triggers the extraction of salient speech features and how this can be used to decompose a sound mixture based on deterministic speech features."
    ABSTRACT: The separation of mixed auditory signals into their sources is an eminent neuroscience and engineering challenge. We reveal the principles underlying a deterministic, neural network-like solution to this problem. This approach is orthogonal to ICA/PCA that views the signal constituents as independent realizations of random processes. We demonstrate exemplarily that in the absence of salient frequency modulations, the decomposition of speech signals into local cosine packets allows for a sparse, noise-robust speaker separation. As the main result, we present analytical limitations inherent in the approach, where we propose strategies of how to deal with this situation. Our results offer new perspectives toward efficient noise cleaning and auditory signal separation and provide a new perspective of how the brain might achieve these tasks.
    Full-text · Article · Jun 2011 · Neural Computation
  • Source
    • "Second, we could then try to reconstruct the receptive field as usual in auditory physiology, i.e., as the spike-triggered sonogram. The outcome of this calculation is shown in Fig 2, where one may see that this calculation only succeeded in reconstructing the analyzing wavelet of the sonogram itself, rather than displaying the sharp features shown in the reassigned spectrograms computed in Ref [18]. Furthermore it is to be noted that there is no inhibition in our model, yet the spike-triggered sonogram has a central 'on' feature surrounded by an inhibitory 'off' halo."
    ABSTRACT: It is widely acknowledged that detailed timing of action potentials is used to encode information, for example, in auditory pathways; however, the computational tools required to analyze encoding through timing are still in their infancy. We present a simple example of encoding, based on a recent model of time-frequency analysis, in which units fire action potentials when a certain condition is met, but the timing of the action potential depends also on other features of the stimulus. We show that, as a result, spike-triggered averages are smoothed so much that they do not represent the true features of the encoding. Inspired by this example, we present a simple method, differential reverse correlations, that can separate an analysis of what causes a neuron to spike, and what controls its timing. We analyze with this method the leaky integrate-and-fire neuron and show the method accurately reconstructs the model's kernel.
    Full-text · Article · Jul 2008 · Biosystems