Sparse time-frequency representations.

Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 05/2006; 103(16):6094-9. DOI: 10.1073/pnas.0601707103
Source: PubMed

ABSTRACT: Auditory neurons preserve exquisite temporal information about sound features, but we do not know how the brain uses this information to process the rapidly changing sounds of the natural world. Simple arguments for effective use of temporal information led us to consider the reassignment class of time-frequency representations as a model of auditory processing. Reassigned time-frequency representations can track isolated simple signals with accuracy unlimited by the time-frequency uncertainty principle, but lack of a general theory has hampered their application to complex sounds. We describe the reassigned representations for white noise and show that even spectrally dense signals produce sparse reassignments: the representation collapses onto a thin set of lines arranged in a froth-like pattern. Preserving phase information allows reconstruction of the original signal. We define a notion of "consensus," based on stability of reassignment to time-scale changes, which produces sharp spectral estimates for a wide class of complex mixed signals. As the only currently known class of time-frequency representations that is always "in focus," this methodology has general utility in signal analysis. It may also help explain the remarkable acuity of auditory perception. Many details of complex sounds that are virtually undetectable in standard sonograms are readily perceptible and visible in reassignment.
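
As an illustration of the reassignment idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the standard three-window formulation of reassignment, in which each short-time Fourier cell is moved to a local estimate of group delay and instantaneous frequency. The function name `reassigned_stft`, the Gaussian window, and the parameters `win_len` and `hop` are hypothetical choices made for this example; the "consensus" step across time scales is not implemented here.

```python
import numpy as np


def reassigned_stft(x, fs, win_len=256, hop=64):
    """Return reassigned times (s), reassigned frequencies (Hz) and magnitudes.

    Assumed setup (not from the source): Gaussian window, real input signal,
    one-sided spectrum computed with numpy.fft.rfft.
    """
    # Sample offsets measured from the centre of the window.
    n = np.arange(win_len) - (win_len - 1) / 2.0
    sigma = win_len / 8.0
    h = np.exp(-0.5 * (n / sigma) ** 2)   # analysis window
    th = n * h                            # time-weighted window
    dh = -(n / sigma ** 2) * h            # window derivative, per sample

    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] for s in starts])

    # Three transforms of the same frames, one per window.
    X = np.fft.rfft(frames * h, axis=1)
    Xt = np.fft.rfft(frames * th, axis=1)
    Xd = np.fft.rfft(frames * dh, axis=1)

    mag = np.abs(X)
    # Avoid division by zero in empty cells; those cells carry no energy anyway.
    safe = np.where(mag > 1e-12, X, 1.0)

    # Nominal grid: frame centres in seconds, bin centres in Hz.
    t_grid = (starts + (win_len - 1) / 2.0)[:, None] / fs
    f_grid = np.fft.rfftfreq(win_len, d=1.0 / fs)[None, :]

    # Move each cell to its local group delay and instantaneous frequency.
    t_hat = t_grid + np.real(Xt / safe) / fs
    f_hat = f_grid - np.imag(Xd / safe) * fs / (2.0 * np.pi)
    return t_hat, f_hat, mag


if __name__ == "__main__":
    fs = 8000.0
    t = np.arange(4096) / fs
    tone = np.cos(2.0 * np.pi * 1000.0 * t)
    t_hat, f_hat, mag = reassigned_stft(tone, fs)
    strong = mag > 0.1 * mag.max()
    # Energetic cells around the tone collapse toward a single frequency.
    print("median reassigned frequency (Hz):", np.median(f_hat[strong]))
```

For an isolated tone, the energetic cells that a conventional spectrogram smears over several bins are all reassigned to nearly the same frequency, and for an isolated click they are reassigned to nearly the same instant. This is the sense in which the abstract says isolated simple signals can be tracked more sharply than the window's own time-frequency spread suggests.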

  • ABSTRACT: Spectral decomposition, or local time-frequency analysis, tries to enhance the amount of information one can obtain from a seismic volume by finding the frequency content of the seismic data at each time sample. However, if a small amount of noise is present within the seismic amplitude volume, it has the potential to become more prominent in the spectrally decomposed data, especially if high-resolution or sparsity-promoting methods are utilized. To combat this problem, post-processing noise removal has commonly been employed, but these techniques can potentially degrade the resolution of small-scale geological structures in their attempt to remove this noise. Rather than de-noising the spectrally decomposed data after they are generated, we propose to incorporate the ideas of f−x−y deconvolution within the spectral decomposition process to create an algorithm that has the ability to de-noise the time-frequency representation of the data as they are being generated. By incorporating the spatial prediction error filters that are utilized for f−x−y deconvolution with the spectral decomposition problem, a spatially smooth time-frequency representation that maintains its sparsity, or high-resolution characteristics, can be obtained. This spatially smooth high-resolution time-frequency representation is less likely to exhibit the random noise that was present in the more conventionally obtained time-frequency representation. Tests on a real data set demonstrate that by de-noising while the time-frequency representation is being constructed, small-scale geological structures are more likely to maintain their resolution since the de-noised time-frequency representation is specifically built to reconstruct the data.
    Geophysical Prospecting 06/2013; 61(s1). · 1.51 Impact Factor
  • ABSTRACT: In this work, the overall perceived pitch (principal pitch) of pure tones modulated in frequency with an asymmetric waveform is studied. The dependence of the principal pitch on the degree of asymmetric modulation was obtained from a psychophysical experiment. The modulation waveform consisted of a flat portion of constant frequency and two linear segments forming a peak. Consistent with previous results, significant pitch shifts with respect to the time-averaged geometric mean were observed. The direction of the shifts was always toward the flat portion of the modulation. The results from the psychophysical experiment, along with those obtained from previously reported studies, were compared with the predictions of six models of pitch perception proposed in the literature. Even though no single model was able to predict accurately the perceived pitch for all experiments, two models gave robust predictions that fall within the range of acceptable tuning of modulated tones in almost all cases. Both models point to the existence of an underlying "stability sensitive" mechanism for the computation of pitch that gives more weight to the portion of the stimuli where the frequency is changing more slowly.
    The Journal of the Acoustical Society of America 03/2014; 135(3):1344. · 1.65 Impact Factor
  • ABSTRACT: This paper presents an alternative approach to speech enhancement by using compressed sensing (CS). CS is a new sampling theory, which states that sparse signals can be reconstructed from far fewer measurements than Nyquist sampling requires. As such, CS can be exploited to reconstruct only the sparse components (e.g., speech) from the mixture of sparse and non-sparse components (e.g., noise). This is possible because, in a time-frequency representation, the speech signal is sparse whilst most noise is non-sparse. Derivation shows that on average the signal-to-noise ratio (SNR) in the compressed domain is greater than or equal to that in the uncompressed domain. Experimental results concur with the derivation, and the proposed CS scheme achieves better or similar perceptual evaluation of speech quality (PESQ) scores and segmental SNR compared to other conventional methods over a wide range of input SNR.
    Speech Communication 07/2013; 55(6):757–768. · 1.55 Impact Factor

Full text available from Marcelo Osvaldo Magnasco (May 27, 2014).