Book

DAFX: Digital Audio Effects: Second Edition

Abstract

The rapid development in various fields of Digital Audio Effects, or DAFX, has led to new algorithms, and this second edition of the popular book DAFX: Digital Audio Effects has been updated throughout to reflect progress in the field. It maintains a unique approach to DAFX with a lecture-style introduction to the basics of effect processing. Each effect description begins with the presentation of the physical and acoustical phenomena, followed by an explanation of the signal processing techniques used to achieve the effect, and a discussion of musical applications and the control of effect parameters. Topics covered include: filters and delays, modulators and demodulators, nonlinear processing, spatial effects, time-segment processing, time-frequency processing, source-filter processing, spectral processing, and time and frequency warping of musical signals. Updates to the second edition include:

• Three completely new chapters devoted to the major research areas of Virtual Analog Effects, Automatic Mixing, and Sound Source Separation, authored by leading researchers in the field.
• Improved presentation of the basic concepts and explanation of the related technology.
• Extended coverage of the MATLAB™ scripts which demonstrate the implementation of the basic concepts in software programs.
• A companion website (www.dafx.de), serving as the download source for MATLAB™ scripts, which will be updated to reflect the new material in the book.

Discussing DAFX at both an introductory and an advanced level, the book systematically introduces the reader to digital signal processing concepts, how they can be applied to sound, and their use in musical effects. This makes the book suitable for a range of professionals, including those working in audio engineering, as well as researchers and engineers involved in digital signal processing, along with students on multimedia-related courses.
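As a flavor of the kind of MATLAB script the companion website distributes, a minimal feedback-delay (echo) effect can be written in a few lines. This sketch is illustrative only and is not taken from the book; the sampling rate, delay time, and feedback gain are assumed values.

% Minimal feedback-delay (echo) effect, a sketch in the spirit of the
% book's MATLAB scripts (illustrative only; all parameter values assumed).
fs    = 44100;                 % sampling rate in Hz
x     = 0.5*randn(fs, 1);      % one second of test noise as input
delay = round(0.25*fs);        % 250 ms delay-line length
g     = 0.5;                   % feedback gain, |g| < 1 for stability
y     = zeros(size(x));
buf   = zeros(delay, 1);       % circular delay buffer
idx   = 1;
for n = 1:length(x)
    y(n)     = x(n) + g*buf(idx);  % input plus delayed, attenuated feedback
    buf(idx) = y(n);               % write output back into the delay line
    idx      = mod(idx, delay) + 1;
end
soundsc(y, fs);                % audition the result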
... Some examples of time-frequency signal processing are multipliers [25,27,26,1,33,2], where each time-frequency component is multiplied by a scalar that depends on the time and frequency of the atom (with applications, for example, in audio analysis [3] and in improving the signal-to-noise ratio [23]); signal denoising, e.g., wavelet shrinkage denoising [11,10] and Shearlet denoising [16], where the coefficient of each time-frequency atom is transformed by some nonlinear scalar mapping; and the phase vocoder [30,7,35,20], where each time-frequency atom is mapped to a different time-frequency atom and the coefficients undergo some nonlinear transformation. ...
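The shrinkage step mentioned in this snippet can be sketched with a plain STFT in MATLAB: analyze with a windowed FFT, soft-threshold each coefficient's magnitude, and resynthesize by overlap-add. This is a generic textbook construction, not the cited papers' method; the window length N, hop size, and threshold tau are assumed values.

% Sketch of a simple time-frequency shrinkage step: soft-threshold the
% STFT coefficients of a noisy signal, then resynthesize by overlap-add.
fs  = 8000;  t = (0:fs-1)'/fs;
x   = sin(2*pi*440*t) + 0.3*randn(fs,1);   % noisy test tone
N   = 512;  hop = N/2;
w   = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);      % periodic Hann, COLA at 50% overlap
tau = 8;                                   % soft-threshold level (assumed)
y   = zeros(size(x));
for m = 1:hop:length(x)-N+1
    X = fft(w .* x(m:m+N-1));              % analysis: one time-frequency slice
    X = sign(X) .* max(abs(X) - tau, 0);   % shrink each coefficient's magnitude
    y(m:m+N-1) = y(m:m+N-1) + real(ifft(X));  % synthesis: overlap-add
end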
... It is evident from this description that the signal is time-dilated by dilating the positions of the time-frequency atoms, without dilating their frequencies. The intensities of the atoms are retained, but their phases are modified so that the oscillations of neighboring atoms have compatible phases, avoiding destructive interference (see, for example, [35] or [21] for an explanation of the phase correction nonlinearity κ). ...
... In the proof of Theorem 13, the bound (33) is derived from (28) and (35). In signal processing methods that transform V_f[s_M] to a function S that preserves the bound (35), namely, ...
Article
Full-text available
We study signal processing tasks in which the signal is mapped via some generalized time-frequency transform to a higher dimensional time-frequency space, processed there, and synthesized to an output signal. We show how to approximate such methods using a quasi-Monte Carlo (QMC) approach. We consider cases where the time-frequency representation is redundant, having feature axes in addition to the time and frequency axes. The proposed QMC method allows sampling both efficiently and evenly such redundant time-frequency representations. Indeed, 1) the number of samples required for a certain accuracy is log-linear in the resolution of the signal space, and depends only weakly on the dimension of the redundant time-frequency space, and 2) the quasi-random samples have low discrepancy, so they are spread evenly in the redundant time-frequency space. One example of such redundant representation is the localizing time-frequency transform (LTFT), where the time-frequency plane is enhanced by a third axis. This higher dimensional time-frequency space improves the quality of some time-frequency signal processing tasks, like the phase vocoder (an audio signal processing effect). Since the computational complexity of the QMC is log-linear in the resolution of the signal space, this higher dimensional time-frequency space does not degrade the computation complexity of the proposed QMC method. The proposed QMC method is more efficient than standard Monte Carlo methods, since the deterministic QMC sample points are optimally spread in the time-frequency space, while random samples are not.
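The efficiency claim rests on the low discrepancy of quasi-random points. A generic way to see the mechanism (not the paper's time-frequency algorithm) is to compare a Halton sequence against uniform random points when estimating an integral over [0,1]^2; the integrand f below is an arbitrary assumed test function.

% Generic quasi-Monte Carlo illustration: estimate an integral over
% [0,1]^2 with low-discrepancy Halton points vs. plain random points.
f = @(u,v) exp(-(u.^2 + v.^2));    % assumed test integrand
K = 2048;
h = zeros(K,2);
bases = [2 3];
for k = 1:K                        % Halton point via the radical inverse
    for d = 1:2
        b = bases(d); n = k; q = 0; bk = 1/b;
        while n > 0
            q  = q + mod(n,b)*bk;  % append the next base-b digit
            n  = floor(n/b);
            bk = bk/b;
        end
        h(k,d) = q;
    end
end
r   = rand(K,2);                   % plain Monte Carlo points
qmc = mean(f(h(:,1),h(:,2)));      % QMC estimate
mc  = mean(f(r(:,1),r(:,2)));      % MC estimate
fprintf('QMC: %.6f   MC: %.6f\n', qmc, mc);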
... Time-frequency signal processing is any method that decomposes a signal into its time-frequency components, manipulates these components, and recombines/synthesizes the resulting atoms into an output time signal. Some examples of time-frequency signal processing are multipliers [18,19], where each time-frequency component is multiplied by a scalar that depends on the time and frequency of the atom (with applications, for example, in audio analysis [1] and in improving the signal-to-noise ratio [16]); signal denoising, e.g., wavelet shrinkage denoising [6,5], where the coefficient of each time-frequency atom is transformed by some nonlinear scalar mapping; and the phase vocoder [21,3,23,13], where each time-frequency atom is mapped to a different time-frequency atom and the coefficients undergo some nonlinear transformation. ...
... It is evident from this description that the signal is time-dilated by dilating the positions of the time-frequency atoms, without dilating their frequencies. The intensities of the atoms are retained, but their phases are modified so that the oscillations of neighboring atoms have compatible phases, avoiding destructive interference (see, for example, [23] or [14] for an explanation of the phase correction nonlinearity κ). ...
Preprint
Full-text available
We study signal processing tasks in which the signal is mapped via some generalized time-frequency transform to a higher dimensional time-frequency space, processed there, and synthesized to an output signal. We show how to approximate such methods using a quasi-Monte Carlo (QMC) approach. The QMC method speeds up computations, since the number of samples required for a certain accuracy is log-linear in the resolution of the signal space, and depends only weakly on the resolution of the time-frequency space, which is typically higher. We focus on signal processing based on the localizing time-frequency transform (LTFT). In the LTFT, the time-frequency plane is enhanced by adding a third axis. This higher dimensional time-frequency space improves the quality of some time-frequency signal processing tasks, like the phase vocoder (an audio signal processing effect). Since the computational complexity of the QMC is log-linear in the resolution of the signal space, this higher dimensional time-frequency space does not degrade the computational complexity of the QMC method. This is in contrast to more standard grid-based discretization methods, whose cost increases exponentially with the dimension of the time-frequency space. The QMC method is also more efficient than standard Monte Carlo methods, since the deterministic QMC sample points are optimally spread in the time-frequency space, while random samples are not.
... Signal processing tasks of this form are used in a multitude of applications, including multipliers [25,26,2,32,3] (with applications, for example, in audio analysis [4] and in increasing the signal-to-noise ratio [24]), signal denoising, e.g., wavelet shrinkage denoising [10,9] and Shearlet denoising [18], and the phase vocoder [29,6,35,21,23,11,27,30]. The frame synthesis operator V_f^* is computed by integrating over G, with the formula ...
... 2. Equation (35) ensures that the energy of q is well spread over the interval [−1/2, 1/2]. Indeed, no small subset of [−1/2, 1/2] can contain most of the energy of q; otherwise ‖q‖_∞ would be significantly larger than ‖q‖_2. ...
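The norm inequality invoked in this snippet is easy to check numerically: a vector whose energy sits on one sample attains the maximal ratio ‖q‖_∞/‖q‖_2 = 1, while an evenly spread vector of the same 2-norm has a ratio of only 1/√N. A minimal check, with an assumed length N:

% Numeric check of the norm argument: a spike concentrates energy and has
% a large ratio ||q||_inf/||q||_2; a spread-out q keeps the ratio small.
N = 1024;
spike  = [1; zeros(N-1,1)];          % all energy in one sample
spread = ones(N,1)/sqrt(N);          % energy spread evenly, unit 2-norm
ratio  = @(q) norm(q,Inf)/norm(q,2);
fprintf('spike: %.3f   spread: %.3f\n', ratio(spike), ratio(spread));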
Preprint
Full-text available
Recently, a Monte Carlo approach was proposed for speeding up signal processing pipelines based on continuous frames. In this paper we present and analyze applications of this new theory. The computational complexity of the Monte Carlo method relies on the continuous frame being so-called linear volume discretizable (LVD). The LVD property means that the number of samples in the coefficient space required by the Monte Carlo method is proportional to the resolution of the discrete signal. We show in this paper that the continuous wavelet transform (CWT) and the localizing time-frequency transform (LTFT) are LVD. The LTFT is a time-frequency representation based on a 3D time-frequency space with a richer class of time-frequency atoms than classical time-frequency transforms like the short time Fourier transform (STFT) and the CWT. Our analysis proves that performing signal processing with the LTFT is as efficient as signal processing with the STFT and CWT (based on FFT), even though the coefficient space of the LTFT is higher dimensional.
... In Figure 3 below, there is a brief summary of the relationship between these concepts. In digital form, signal processing operations are applied to the audio waveform in its discrete representation, usually as a binary stream (ZOELZER, 2011). The audio being manipulated can be obtained from an analog-to-digital converter (ADC) in real time, from previously converted audio, or from an electronically generated audio signal (synthesized audio). ...
Preprint
Full-text available
Music actors operate several tools to achieve high-quality musical artifacts. One of the typically used tools is the digital multi-effects processor, which applies a chain of non-linear transformations (effects) to audio signals. However, the large number of available configurations implies that better use of such equipment is restricted to specialists. In this context, intelligent systems for music production aim to propose automatic tools to ease and support decision making by music actors. In this work, we tackle the problem of sequence-dependent audio plugin (effect implementation) recommendation from data. In particular, we are interested in three settings: i) recommendations under partially observed plugin sequences; ii) determining the best ordering of plugins in a sequence; iii) full recommendations of plugin sequences. For these tasks, we employ two machine learning paradigms: supervised learning and collaborative filtering. In the first, we apply standard classifiers to predict plugins/classes in a fixed position within a sequence given the remaining sequence elements/plugins. We evaluate Multilayer Perceptron, Logistic Regression, and Support Vector Machines. On the other hand, models for collaborative filtering seek to perform automatic predictions based on similar sequences from the available data. We evaluate the k-nearest neighbors method, and the Restricted Boltzmann Machine for Collaborative Filtering (RBM-CF) model. All models are trained using data collected from online repositories. Based on the experiments, supervised learning approaches outperform those of collaborative filtering for the task of recommendations under partial sequence information. However, the supervised learning paradigm does not scale with sequence size as it requires training a large number of models. Given its generative capability, only RBM-CF can be applied to full-sequence recommendations, and to determining plugin orderings.
Article
GuiaRT is an interactive musical setup based on a nylon-string guitar equipped with hexaphonic piezoelectric pickups. It consists of a modular set of real-time tools for the symbolic transcription, variation, and triggering of selected segments during a performance, as well as some audio processing capabilities. Its development relied on an iterative approach, with distinct phases dedicated to feature extraction, transcriptions, and creative use. This article covers the motivations for this augmented instrument and several details of its implementation, including the hardware and strategies for identifying the most typical types of sound produced on a nylon-string guitar, as well as tools for symbolic musical transformations. This acoustic–digital interface was primarily designed for interactive exploration, and it has also been effectively used in performance analyses and as a pedagogical tool.
Chapter
Modern digital audio effects processing and circuit technology has made available a number of methods for processing the acoustic signal, covering various requirements. Among the different methods, the term effect generally refers to the processing of an existing sound in order to make it more suggestive.
Article
This paper presents a design method for frequency response masking (FRM)-based nonuniform filter bank with reduced effective wordlength. Instead of designing prototype filters separately, we propose to model the filter bank design problem as a nonlinear programming problem and jointly design the prototype filters in the FRM structure. Moreover, in this work, all the filters are designed directly in the finite wordlength space, such that they can be directly implemented using efficient fixed-point arithmetic without any quantization. By simultaneously considering all the subband constraints during optimization, the proposed method is able to guarantee the desirable stopband attenuation for all subbands and achieve optimal effective wordlength (EWL) for the coefficients, reducing the hardware complexity for very large-scale integration (VLSI) implementation. Experimental results show that the hardware complexity of filter banks can be reduced without sacrificing the audiogram compensation performance for hearing aid applications.
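The finite-wordlength idea at the core of this design can be illustrated by rounding filter taps to B fractional bits and comparing frequency responses. The sketch below uses a generic windowed-sinc FIR and shows coefficient quantization only, not the paper's joint FRM optimization; the filter order, cutoff, and wordlength B are assumed values.

% Sketch of the finite-wordlength constraint: round taps to B fractional
% bits, so they are directly implementable in fixed-point arithmetic.
N  = 64;  n = (0:N)';  fc = 0.3;               % assumed FIR specs
t  = fc*(n - N/2);  h = fc*ones(N+1,1);
nz = t ~= 0;
h(nz) = sin(pi*t(nz))./(pi*t(nz))*fc;          % windowed-sinc low-pass
h  = h .* (0.54 - 0.46*cos(2*pi*n/N));         % Hamming window
B  = 10;                                       % effective wordlength in bits
hq = round(h*2^B)/2^B;                         % quantized, fixed-point-ready taps
H  = fft(h, 1024);  Hq = fft(hq, 1024);        % frequency responses
plot((0:511)/512, 20*log10(abs([H(1:512) Hq(1:512)])));
xlabel('\omega/\pi'); ylabel('dB'); legend('float', 'quantized');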
Article
Full-text available
Guitar effects are commonly used in popular music to shape the guitar sound to fit specific genres, or to create more variety within musical compositions. The sound not only is determined by the choice of the guitar effect, but also heavily depends on the parameter settings of the effect. Previous research focused on the classification of guitar effects and the extraction of their parameter settings from solo guitar audio recordings. However, classification and extraction from instrument mixes is more realistic. This work investigates the use of convolutional neural networks (CNNs) for the classification and parameter extraction of guitar effects from audio samples containing guitar, bass, keyboard, and drums. The CNN was compared to previously proposed baseline methods, like support vector machines and shallow neural networks together with predesigned features. On two datasets, the CNN achieved classification accuracies 1–5% above the baseline accuracy, achieving up to 97.4% accuracy. With parameter values between 0.0 and 1.0, mean absolute parameter extraction errors of below 0.016 for the distortion, below 0.052 for the tremolo, and below 0.038 for the slapback delay effect were achieved, matching or surpassing the presumed human expert error of 0.05. The CNN approach was found to generalize to further effects, achieving mean absolute parameter extraction errors below 0.05 for the chorus, phaser, reverb, and overdrive effects. For sequentially applied combinations of distortion, tremolo, and slapback delay, the mean extraction error increased slightly from the single-effect performance to the range of 0.05 to 0.1. The CNN was found to be moderately robust to noise and pitch changes of the background instrumentation, suggesting that the CNN extracted meaningful features.
Chapter
Full-text available
The development of Virtual Reality (VR) systems and multimodal simulations presents possibilities in spatial-music mixing, be it in virtual spaces, for ensembles and orchestral compositions, or for surround sound in film and music. Traditionally, user interfaces for mixing music have employed the channel-strip metaphor for controlling volume, panning, and other audio effects, aspects that have also grown into the culture of mixing music spatially. Simulated rooms and two-dimensional panning systems are readily implemented on computer screens to facilitate the placement of sound sources within space. In this chapter, we present design aspects for mixing in VR, investigating already existing virtual music mixing products and creating a framework from which a virtual spatial-music mixing tool can be implemented. Finally, the tool will be tested against a similar computer version to examine whether or not the sensory benefits and palpable spatial proportions of a virtual environment (VE) can improve the process of mixing 3D sound.
Chapter
This work focuses on binaural spatialization, presenting an analysis of the most common solutions in order to understand and classify their advantages and drawbacks, and to find the one that results in a better virtual/augmented experience on the basis of subjective tests. The work is a preliminary step toward the implementation of an augmented reality system for cultural heritage enjoyment that exploits spatial audio through low-cost devices. Two different models are implemented in order to avoid the use of non-individualized HRTFs, and the results show promising opportunities that must be exploited further.
Chapter
In the present contribution, we propose a novel approach for digital amplifier simulation and signal modulation in audio engineering applications. To this end, input and output signals are superimposed to produce a Lissajous curve, which contains nonlinear effects from amplification and further phase shifts from linear equalization filters. Here, the challenge is to identify the nonlinearity. From experimental data, a time sequence forming a closed Lissajous figure was used for the analysis, which provided the spectrum together with the phase shifts of the output signal by fitting the data. A representation of the nonlinear transfer function was obtained by setting the phase shifts to zero in the reconstructed output signal, using the developed analytical expressions for the amplitudes and phase shifts. For the three nonlinear transfer functions considered, the method showed fairly good agreement with the original output signals. The developments were further applied to a set of different input frequencies in order to analyze their influence on the quality of profiling the nonlinearity. Input frequencies in the range where the linear filters had only small effects provided acceptable results for the nonlinear transfer function. Finally, possible future developments and improvements of the proposed approach are presented.
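The Lissajous construction is easy to reproduce for a memoryless nonlinearity: plotting output against input for a sine excitation traces the transfer curve, and phase shifts from linear filtering would open this curve into a loop. In the sketch below, a tanh stage stands in for the amplifier; the excitation frequency and drive gain are assumptions, not values from the paper.

% Minimal Lissajous view of a static nonlinearity: output vs. input for
% a sine excitation traces the (closed) transfer curve.
fs = 48000;  f0 = 100;  g = 4;          % assumed excitation and drive values
t  = (0:fs-1)'/fs;
x  = sin(2*pi*f0*t);                    % input sine
y  = tanh(g*x);                         % nonlinear "amplifier" output
plot(x, y); xlabel('input'); ylabel('output');
title('Closed Lissajous figure of a static nonlinearity');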
Conference Paper
Peak reduction is a common step used in audio playback chains to increase the loudness of a sound. The distortion introduced by a conventional nonlinear compressor can be avoided with the use of an allpass filter, which provides peak reduction by acting on the signal phase. This way, the signal energy around a waveform peak can be smeared while maintaining the total energy of the signal. In this paper, a new technique for linear peak amplitude reduction is proposed based on a Schroeder allpass filter, whose delay line and gain parameters are synced to match peaks of the signal’s auto-correlation function. The proposed method is compared with a previous search method and is shown to be often superior. An evaluation conducted over a variety of test signals indicates that the achieved peak reduction spans from 0 to 5 dB depending on the input waveform. The proposed method is widely applicable to real-time sound reproduction with a minimal computational processing budget.
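A Schroeder allpass section of the kind the paper builds on is a one-liner with MATLAB's filter: its magnitude response is flat, so it can only redistribute a waveform's energy in time. The delay length M and coefficient g below are placeholders; the paper derives them from peaks of the signal's autocorrelation function.

% A Schroeder allpass section, as used for phase-based peak reduction:
% flat magnitude response; M and g act only on the signal phase.
fs = 44100;
M  = 64;                        % delay-line length in samples (assumed)
g  = 0.5;                       % allpass coefficient (assumed)
b  = [-g, zeros(1, M-1), 1];    % numerator   of H(z) = (-g + z^-M)/(1 - g z^-M)
a  = [ 1, zeros(1, M-1), -g];   % denominator
x  = randn(fs, 1);              % test signal
y  = filter(b, a, x);           % phase-smeared output, same total energy
fprintf('peak in: %.2f   peak out: %.2f\n', max(abs(x)), max(abs(y)));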
Article
Full-text available
Recently, a Monte Carlo approach was proposed for processing highly redundant continuous frames. In this paper, we present and analyze applications of this new theory. The computational complexity of the Monte Carlo method relies on the continuous frame being so-called linear volume discretizable (LVD). The LVD property means that the number of samples in the coefficient space required by the Monte Carlo method is proportional to the resolution of the discrete signal. We show in this paper that the continuous wavelet transform (CWT) and the localizing time-frequency transform (LTFT) are LVD. The LTFT is a time-frequency representation based on a 3D time-frequency space with a richer class of time-frequency atoms than classical time-frequency transforms like the short time Fourier transform (STFT) and the CWT. Our analysis proves that performing signal processing with the LTFT has the same asymptotic complexity as signal processing with the STFT and CWT (based on FFT), even though the coefficient space of the LTFT is higher dimensional.
Chapter
Guitar effects are commonly used in popular music to shape the guitar sound to fit specific genres or to create more variety within musical compositions. The sound is not only determined by the choice of the guitar effect, but also heavily depends on the parameter settings of the effect. Previous research focused on the classification of guitar effects and the extraction of their parameter settings from solo guitar audio recordings. However, classification and extraction from instrument mixes is more realistic. This work investigates the use of convolutional neural networks (CNNs) for the classification and extraction of guitar effects from audio samples containing guitar, bass, keyboard, and drums. The CNN was compared to previously proposed baseline methods, such as support vector machines and shallow neural networks together with predesigned features. The CNN outperformed all baselines, achieving a classification accuracy of up to 97.4% and mean absolute parameter extraction errors of below 0.016 for the distortion, below 0.052 for the tremolo, and below 0.038 for the slapback delay effect, matching or surpassing the presumed human expert error of 0.05.

Keywords: Convolutional neural networks, Guitar effects, Parameter extraction, Music information retrieval
Article
This paper describes a novel deep learning method for the design of IIR parametric filters for automatic multipoint audio equalization, that is, the task of improving the sound quality of a listening environment at multiple listening points employing multiple loudspeakers. The filters are designed to approximate the inverse of the room impulse response (RIR) and achieve an almost flat magnitude response. A simple and effective neural architecture, named BiasNet, is proposed to determine the IIR equalizer parameters. This novel architecture is conceived for optimization and, as such, is able to produce optimal IIR equalizer parameters at its output, after training, with no input required. In the absence of an input, the presence of learnable non-zero bias terms ensures that the network works properly. An output scaling method is used to obtain accurate tuning of the IIR filters' center frequency, quality factor, and gain. All layers involved in the proposed method are shown to be differentiable, allowing backpropagation to optimize the network weights and achieve, after a number of training iterations, the optimal output according to a given RIR. The parameters are optimized with respect to a loss function based on a spectral distance between the measured and desired magnitude responses, and a regularization term is used to keep the same microphone-loudspeaker energy balance after equalization. Two experimental scenarios are employed, a room and a car cabin, each with several loudspeakers. The performance of the proposed method improves over the baseline techniques and achieves an almost flat magnitude response at a lower computational cost.
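The three quantities the network learns per section (center frequency, quality factor, gain) map to IIR coefficients through standard parametric-equalizer formulas. The sketch below uses the widely known peaking-EQ biquad from the Audio EQ Cookbook as one such mapping; it is a generic building block under assumed parameter values, not the BiasNet architecture itself.

% Peaking-EQ biquad: map (f0, Q, gain in dB) to filter coefficients using
% the standard Audio EQ Cookbook formulas (one parametric section).
fs = 48000;  f0 = 1000;  Q = 2;  gdB = 6;   % assumed parameter values
A   = 10^(gdB/40);
w0  = 2*pi*f0/fs;
alp = sin(w0)/(2*Q);
b   = [1 + alp*A, -2*cos(w0), 1 - alp*A];   % numerator
a   = [1 + alp/A, -2*cos(w0), 1 - alp/A];   % denominator
b   = b/a(1);  a = a/a(1);                  % normalize so a(1) = 1
y   = filter(b, a, randn(fs,1));            % apply one equalizer section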
Article
Full-text available
This paper focuses on signal processing tasks in which the signal is transformed from the signal space to a higher dimensional coefficient space (also called phase space) using a continuous frame, processed in the coefficient space, and synthesized to an output signal. We show how to approximate such methods, termed phase space signal processing methods, using a Monte Carlo method. As opposed to standard discretizations of continuous frames, based on sampling discrete frames from the continuous system, the proposed Monte Carlo method is directly a quadrature approximation of the continuous frame. We show that the Monte Carlo method allows working with highly redundant continuous frames, since the number of samples required for a certain accuracy is proportional to the dimension of the signal space, and not to the dimension of the phase space. Moreover, even though the continuous frame is highly redundant, the Monte Carlo samples are spread uniformly, and hence represent the coefficient space more faithfully than standard frame discretizations.
Article
A Digital Audio Workstation (DAW) is a hardware and/or software device aiming to ease the operations required for music production, such as arranging, recording, editing, mixing, and, more generally, modifying sounds creatively. A peculiarity of a DAW environment is that most of the work is highly parallelizable, since the basic architecture of a DAW consists of the simultaneous processing of different audio tracks, largely independent of each other. In order to exploit this feature, this paper proposes an interface that lets the DAW interact with the Graphics Processing Unit (GPU) in a standardized way. Despite some academic research and experimentation, the professional audio software industry has almost never exploited GPUs when implementing entire DAWs, but only when realising very specific tools or third-party extensions (plugins). This work also presents and discusses the outcomes of a number of tests conducted in order to choose the optimal architecture. As a result, a GPU-based approach turned out to be a valid alternative to the use of CPUs in the computation of audio effects, such as the rendering of audio tracks after mixing and mastering operations, both in real time and offline.
Article
Full-text available
Voice activity detection (VAD) aims to detect the presence of speech in a given input signal, and is often the first step in voice-based applications such as speech communication systems. In the context of personal devices, own voice detection (OVD) is a sub-task of VAD, since it targets speech detection of the person wearing the device, while ignoring other speakers in the presence of interference signals. This article first summarizes recent single- and multi-microphone, multi-sensor, and hearing-aid-related VAD techniques. Then, a wearable in-ear device equipped with multiple microphones and an accelerometer is investigated for the OVD task using a neural network with input embedding and long short-term memory (LSTM) layers. The device picks up the user's speech signal through the air as well as vibrations through the body. However, besides external sounds, the device is sensitive to the user's own non-speech vocal noises (e.g. coughing, yawning, etc.) and movement noise caused by physical activities. A signal mixing model is proposed to produce databases of noisy observations used for training and testing the frame-by-frame OVD method. The best model's performance is further studied in the presence of different recorded interference. An ablation study reports the model's performance on sub-sets of sensors. The results show that the OVD approach is robust towards both user motion and user-generated vocal non-speech sounds in the presence of loud external interference. The approach is suitable for real-time operation and achieves 90–96% OVD accuracy in challenging use scenarios with a short 10 ms processing frame length.
Conference Paper
Full-text available
Nonlinear digital circuits and waveshaping are active areas of study, specifically for what concerns numerical and aliasing issues. In the past, an effective method was proposed to discretize nonlinear static functions with reduced aliasing based on the antiderivative of the nonlinear function. Such a method is based on the continuous-time convolution with an FIR antialiasing filter kernel, such as a rectangular kernel. These kernels, however, are far from optimal for the reduction of aliasing. In this paper we introduce the use of arbitrary IIR rational transfer functions that allow a closer approximation of the ideal antialiasing filter, required in the fictitious continuous-time domain before sampling the nonlinear function output. These allow a higher degree of aliasing reduction and can be flexibly adjusted to balance performance and computational cost.
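The antiderivative method that this paper generalizes can be sketched for a single tanh stage: first-order antiderivative antialiasing (the rectangular-kernel baseline) replaces each output sample with the averaged slope of the antiderivative F1(x) = log(cosh(x)) between consecutive inputs. The test signal and tolerance below are assumptions.

% First-order antiderivative antialiasing (the rectangular-kernel baseline
% this paper generalizes), applied to a driven tanh nonlinearity.
f  = @(x) tanh(x);
F1 = @(x) log(cosh(x));                    % antiderivative of tanh
fs = 44100;  t = (0:fs-1)'/fs;
x  = 4*sin(2*pi*1244.5*t);                 % heavily driven test tone (assumed)
y  = zeros(size(x));  xm = 0;              % xm holds x[n-1]
for n = 1:length(x)
    d = x(n) - xm;
    if abs(d) > 1e-6
        y(n) = (F1(x(n)) - F1(xm)) / d;    % averaged slope of the antiderivative
    else
        y(n) = f((x(n) + xm)/2);           % ill-conditioned case: direct evaluation
    end
    xm = x(n);
end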
Conference Paper
Full-text available
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely-used tool in audio processing. Multiple solutions have been proposed in recent years, using time-frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of the sound. In this paper, we present SiTraNo: an easy-to-use MATLAB application with a graphic user interface for audio decomposition that enables visualization and access to the sinusoidal, transient, and noise classes, individually. This application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously control and remix its spectral components, and to visually check the quality of the separation, before producing the desired output file. The visualization of common artifacts, such as birdies and dropouts, is demonstrated. This application promotes experimenting with the sound decomposition process by observing the effect of variations for each spectral component on the original sound and by comparing different methods against each other, evaluating the separation quality both audibly and visually. SiTraNo and its source code are available on a companion website and repository.
Article
Full-text available
Ultrasonic sources are inaudible to humans, and while digital signal processing techniques are available to bring ultrasonic signals into the audible range, there are currently no systems which also simultaneously permit the listener to localise the sources through spatial hearing. Therefore, we describe a method whereby an in-situ listener with normal binaural hearing can localise ultrasonic sources in real time, opening up new applications, such as the monitoring of certain forms of wildlife in their habitats and of man-made systems. In this work, an array of ultrasonic microphones is mounted on headphones, and the spatial parameters of the ultrasonic sound field are extracted. A pitch-shifted signal is then rendered to the headphones with spatial properties dictated by the estimated parameters. The processing provides the listener with the spatial cues that would normally occur if the acoustic wave produced by the source were to arrive at the listener having already been pitch-shifted. The results show that the localisation accuracy delivered by the proof-of-concept device implemented here is almost as good as with audible sources, as tested both in the laboratory and in field conditions.
Article
A linear-in-the-parameters nonlinear filter consists of a functional expansion block, which expands the input signal to a higher dimensional space nonlinearly, followed by an adaptive weight network. The number of weights to be updated depends on the type and order of the functional expansion used. When applied to a nonlinear system identification task, as the degree of the nonlinearity of the system is usually not known a priori, linear-in-the-parameters nonlinear filters are required to update a large number of coefficients to effectively model the nonlinear system. However, not all the weights of the nonlinear filter may contribute significantly to the identified model. We show via simulation experiments that the weight vector of a linear-in-the-parameters nonlinear filter usually exhibits a low-rank nature. To take advantage of this observation, this paper proposes a class of linear-in-the-parameters nonlinear filters based on the nearest Kronecker product decomposition. The performance of the proposed filters is superior in terms of convergence behaviour as well as tracking ability in comparison to their traditional linear-in-the-parameters nonlinear filter counterparts, when tested for nonlinear system identification. Furthermore, the proposed nearest Kronecker product decomposition-based linear-in-the-parameters nonlinear filters have been shown to provide improved noise mitigation capabilities in a nonlinear active noise control scenario.
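The low-rank observation can be made concrete with a rank-1 nearest-Kronecker-product approximation in the style of Van Loan and Pitsianis: reshaping a length L1·L2 weight vector w into an L2×L1 matrix turns w = kron(a,b) into the rank-1 factorization b*a'. The sketch below is a generic illustration with assumed sizes, not the paper's adaptive filter.

% Rank-1 nearest Kronecker product approximation of a long weight vector
% via the SVD of its reshaped matrix (Van Loan-Pitsianis style).
L1 = 16;  L2 = 32;                                            % assumed sizes
w  = kron(randn(L1,1), randn(L2,1)) + 0.01*randn(L1*L2,1);    % nearly rank-1 weights
W  = reshape(w, L2, L1);        % w = kron(a,b)  <=>  reshape(w,L2,L1) = b*a'
[U, S, V] = svd(W, 'econ');
b  = U(:,1) * sqrt(S(1,1));     % short factor of length L2
a  = V(:,1) * sqrt(S(1,1));     % short factor of length L1
rel_err = norm(w - kron(a,b)) / norm(w);
fprintf('relative approximation error: %.4f\n', rel_err);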
Article
Full-text available
The digital implementation of a nonlinear audio circuit often employs the Newton-Raphson (NR) method for solving the corresponding system of implicit ordinary differential equations in the discrete-time domain. Although its quadratic convergence speed makes NR attractive for real-time audio applications, quadratic convergence is not always guaranteed, since it depends on the initial conditions, and divergence might also occur. For this reason, especially in the context of Virtual Analog modeling, techniques for increasing the robustness of NR are in order. Among the various approaches, the Wave Digital (WD) formalism has recently shown potential to rethink traditional circuit simulation methods. In this manuscript, we discuss an original formulation of the NR method in the WD domain for the solution of audio circuits with multiple one-port nonlinearities. We provide an in-depth theoretical analysis of the proposed iterative method and we show how its quadratic convergence strongly depends on the free parameters (called port resistances) introduced when modeling the reference circuit in the WD domain. In particular, we demonstrate that the size of the basin where the WD NR solver can be initialized to converge on a solution with quadratic speed is a function of the free parameters. We also show that by setting each port resistance value as close as possible to the derivative with respect to current of the nonlinear element's v-i characteristic, we keep the basin size large. We finally implement an audio ring modulator circuit with four diodes in order to test the proposed iterative method.
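For contrast with the WD formulation, the conventional NR iteration it improves upon can be sketched on a static diode-pair clipper, solving the node equation (vin − v)/R = 2·Is·sinh(v/Vt) for the output voltage v. All component values below are illustrative assumptions, not taken from the paper.

% Conventional Newton-Raphson on a static diode-pair clipper: solve
% (vin - v)/R - 2*Is*sinh(v/Vt) = 0 for the output voltage v.
R  = 2200;  Is = 1e-12;  Vt = 25.85e-3;      % assumed circuit constants
vin = 1.0;                                   % one input voltage sample
v   = 0;                                     % initial guess
for it = 1:50
    g  = (vin - v)/R - 2*Is*sinh(v/Vt);      % residual of the node equation
    dg = -1/R - (2*Is/Vt)*cosh(v/Vt);        % its derivative w.r.t. v
    dv = g/dg;
    v  = v - dv;                             % NR update
    if abs(dv) < 1e-12, break; end           % stop when the step is tiny
end
fprintf('clipped output: %.6f V after %d iterations\n', v, it);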
Article
This chapter discusses the needs and challenges faced in sound reproduction. It covers a wide variety of applications in which sound needs to be reproduced, such as public address, full-duplex speech communication, audio content production, broadcasting, computer games, virtual reality, accurate reproduction of sound, enhancement of acoustics, active noise cancellation, and aided hearing. Sound can be made audible over different loudspeaker set-ups, ranging from a monophonic setup to multi-channel systems and headphones. The best-known systems are discussed and the most common recording techniques are described. Virtual source positioning aims to control only the perceived direction of virtual sources, although sometimes the distance and the spatial width of sources may also be controlled. The basic binaural recording technique is to reproduce a recorded binaural sound track through headphones. Digital audio effects are systems that modify the audio signals fed to their inputs according to set control parameters and make the modified signal available at their outputs.
Article
Sound radiation of most natural sources, like human speakers or musical instruments, typically exhibits a spatial directivity pattern. This directivity contributes to the perception of sound sources in rooms, affecting the spatial energy distribution of early reflections and late diffuse reverberation. Thus, for convincing sound field reproduction and acoustics simulation, source directivity has to be considered. Whereas perceptual effects of directivity, such as source-orientation-dependent coloration, appear relevant for the direct sound and individual early reflections, it is unclear how spectral and spatial cues interact for later reflections. Better knowledge of the perceptual relevance of source orientation cues might help to simplify the acoustics simulation. Here, it is assessed to what extent the directivity of a human speaker should be simulated for early reflections and diffuse reverberation. The computationally efficient hybrid approach to simulate and auralize binaural room impulse responses [Wendt et al., J. Audio Eng. Soc. 62, 11 (2014)] was extended to simulate source directivity. Two psychoacoustic experiments assessed the listeners' ability to distinguish between different virtual source orientations when the frequency-dependent spatial directivity pattern of the source was approximated by a direction-independent average filter for different higher reflection orders. The results indicate that it is sufficient to simulate the effects of source directivity in the first-order reflections.
Chapter
A brain-controlled wheelchair is an assistive device for patients with motor disabilities, controlled by brain waves. This work focuses on the user convenience and safety of a brain-controlled wheelchair developed using EMG. Patients with disabilities who are still able to move their fingers can control the brain-controlled wheelchair with a finger. This paper discusses the design and implementation of signal processing using an artificial neural network for the classification of motion commands for the brain-controlled wheelchair. The signal processing is divided into three parts, namely preprocessing, feature extraction, and classification. The preprocessing stage uses digital filters: an FIR bandpass filter (10-500 Hz) and a notch filter at 50 Hz to eliminate noise. Preprocessing is followed by the feature extraction stage, which computes RMS, MAX, VAR, SD, and MAV values. These feature values are fed to the artificial neural network to generate commands such as forward, turn right, turn left, or stop.
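The five features named in the abstract are standard time-domain statistics of an analysis window. A minimal sketch, assuming the window emg has already passed the band-pass and notch filters:

% The five time-domain features, computed over one filtered EMG window.
emg  = randn(1000, 1);            % placeholder for a filtered EMG window
feat = [sqrt(mean(emg.^2)), ...   % RMS: root mean square
        max(abs(emg)),      ...   % MAX: peak amplitude
        var(emg),           ...   % VAR: variance
        std(emg),           ...   % SD : standard deviation
        mean(abs(emg))];          % MAV: mean absolute value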
Chapter
In film, the sounds related to characters' actions, such as clothing friction and footsteps, are difficult to capture directly during recording, and adding sound effects manually requires a lot of time and effort. Focusing on the sound of footsteps, which is highly correlated with human motion, this paper proposes a method for generating footstep sounds. First, pedestrian gait detection is used to extract the landing frames, and then the relevant footfall parameters are obtained by sampling-rate alignment. Finally, using these parameters, a footstep sound effect synchronized with the pedestrian gait footage is synthesized to obtain plausible audio and video. In addition, we tested the method on the UCF-ARG and HMDB datasets, and addressed the problems of different directions, different speeds, and half-length views in pedestrian video through feature selection and fusion, showing the effectiveness of our proposed method.
Chapter
This paper addresses the description of musical note attacks, considering the influence of reverberation. It is well known that attacks play an essential role in music performance. By manipulating note attack quality, musicians are able to control timbre, articulation, and rhythm, which are essential parameters for conveying their expressive intentions. Including information about the interaction with room acoustics enriches the study of musical performances in everyday practice conditions, where reverberant environments are always present. Spectral Modeling decomposition was applied to evaluate independently three components along the attack: (i) the harmonics of the note being played, (ii) the harmonics of the reverberation, and (iii) the residual energy. The proposed description covers two stages: a 2D confrontation of the energy from the extracted components, and a profile representing the structure of the first nine harmonics. We tested the approach in a case study using recordings of an excerpt from a clarinet piece from the traditional classical repertoire, played by six professional musicians. MANOVA tests indicated significant differences (p < 0.05) when considering the musician as a factor for the 2D confrontation. Linear Discriminant Analysis applied for supervised dimensionality reduction of the harmonic profile data also indicated group separation for the same factor. We examined different legato as well as articulated note transitions presenting different performance technique demands.
Chapter
In this chapter we learn the basics of modern audio coding. First, we become familiar with the fundamentals of the human hearing system and the frequency masking effect. We do some computer experiments illustrating signal-to-mask ratio (SMR) calculation. Next, we learn about the filter bank used in the MP2 audio coder for splitting the signal into 32 sub-signals, each containing only samples lying in one of 32 different frequency sub-bands. We see that quantization of sub-band signals offers a better signal-to-noise ratio than quantization of the original full-band signal. We test the near-perfect reconstruction of the original signal from the sub-band samples. We will see how this filter bank is built: the prototype filter of the MPEG-audio standard is modulated by 32 cosines. Next, we learn about the extension of the MP2 filter bank to an adaptive MP3 filter bank having up to 576 sub-bands. Finally, we learn about the signal sub-band decomposition used in the advanced audio coding (AAC) standard. We become familiar with some AAC tricks and tips. As a complete example, an MP2 audio encoder and decoder MATLAB program is presented.
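The cosine modulation mentioned above can be sketched compactly: each of the 32 subband filters is the prototype low-pass modulated by a cosine, h_k(n) = p(n)·cos((2k+1)(n−16)π/64). In the sketch below a windowed-sinc placeholder stands in for the MPEG-1 prototype coefficient table, so it illustrates the structure only, not the standard's exact filter bank.

% 32-band cosine-modulated analysis bank with a placeholder prototype.
N = 512;  K = 32;                        % prototype length, number of subbands
n = (0:N-1)';
t = (n - (N-1)/2)/(2*K);                 % t is never zero (non-integer center)
p = (sin(pi*t)./(pi*t)) .* (0.54 - 0.46*cos(2*pi*n/(N-1)));  % placeholder low-pass
h = zeros(N, K);
for k = 0:K-1                            % MPEG-style modulation of the prototype
    h(:, k+1) = p .* cos((2*k+1)*(n - 16)*pi/(2*K));
end
x = randn(8192, 1);                      % test signal
y = zeros(8192/K, K);
for k = 1:K
    v = filter(h(:,k), 1, x);            % analysis filtering for subband k
    y(:,k) = v(1:K:end);                 % critical downsampling by 32
end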
Conference Paper
Full-text available
In this paper, an experimental evaluation of speech intelligibility in small (177 m³) and medium-sized (270 m³) classrooms is presented. These studies are based on the assumption that the harmful effect of noise is negligible compared to that of reverberation. Speech intelligibility was computed from pre-measured room impulse responses by modulation and formant-modulation methods. It is shown that speech intelligibility in a room significantly depends not only on the distance of the listener from the speaker, but also on the distance from reflective surfaces (walls), as well as on the orientation of the listener's head relative to these surfaces. In particular, it was found that speech intelligibility was significantly higher at a distance of 30-50 cm from the rear wall than in the middle of the room, in both studied rooms. At the same time, speech intelligibility at a distance of 1-2 m from any sidewall was lower than near the rear wall. Moreover, for positions near sidewalls, speech intelligibility was lower for the ear closer to the wall.
Article
Full-text available
Introduction: Surrounding spherical loudspeaker arrays facilitate the application of various spatial audio reproduction methods and can be used for a broad range of acoustic measurements and perceptual evaluations. Methods: The design and implementation of such an array, installed in an anechoic chamber and consisting of 68 coaxial loudspeakers sampling a spherical cap with a radius of 1.35 m on an equal-area grid, is presented. A network-based audio backbone enables low-latency signal transmission, with low-noise amplifiers providing a high signal-to-noise ratio. To address batch-to-batch variations, the loudspeaker transfer functions were equalised by individually designed 512-tap finite impulse response filters. Time delays and corresponding level adjustments further helped to minimise radial mounting imperfections. Results: The equalised loudspeaker transfer functions measured under ideal conditions and when mounted, their directivity patterns, and in-situ background noise levels satisfy key criteria towards applicability. Advantages and shortcomings of the selected decoders for panning-based techniques, as well as the influence of loudspeaker positioning errors, are analysed in terms of simulated performance metrics. An evaluation of the achievable channel separation allows deriving recommendations of feasible subset layouts for loudspeaker-based binaural reproduction. Conclusion: The combination of electroacoustic properties, simulated sound field synthesis performance, and measured channel separation classifies the system as suitable for its target applications.
Article
Obtaining accurate measurements of the turbocharger rotational speed is a key task in achieving good powertrain control performance in turbocharged combustion engines. However, direct access to the rotating parts of a turbocharger requires expensive sensors that present long-term reliability issues. In view of this, this article focuses on the design of measurement architectures for the estimation of the turbocharger shaft rotational speed via numerical processing of the overall sound emissions acquired by a microphone placed in the vehicle hood. This kind of signal represents an extremely rich source of information about the operating conditions of all noisy powertrain subsystems. The core of the scheme is an adaptive discrete-time nonlinear frequency-locked loop (FLL) filter that is properly designed to extract the useful frequency content from the acquired audio signal. The whole architecture is innovative, flexible, and extremely low-cost, requiring for its implementation only the additional installation of a single microphone capsule. Moreover, it exhibits such a modest computational burden that it can be directly implemented in commercial engine control units (ECUs) without requiring additional computing hardware. The reported experimental assessments show that the accuracy of the estimate is excellent in all allowed rotational speed regimes.
Article
Full-text available
This article discusses methods and algorithms for processing sound signals, the purpose and classification of filters, basic digital filters, and first-order low-pass and high-pass filters for solving technical problems, including matching signal parameters to the characteristics of the electro-acoustic path.
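The first-order sections mentioned in this article have a compact digital form via the bilinear transform, as used throughout the DAFX literature: with K = tan(π·fc/fs), the low-pass is H(z) = K(1+z⁻¹)/((K+1)+(K−1)z⁻¹) and the high-pass is H(z) = (1−z⁻¹)/((K+1)+(K−1)z⁻¹). A minimal sketch with an assumed cutoff:

% First-order low-pass and high-pass sections via the bilinear transform.
fs = 48000;  fc = 1000;             % assumed sampling rate and cutoff
K  = tan(pi*fc/fs);                 % prewarped frequency
bl = [K, K]  / (K+1);               % low-pass numerator
al = [1, (K-1)/(K+1)];              % shared denominator
bh = [1, -1] / (K+1);               % high-pass numerator
ah = [1, (K-1)/(K+1)];
x  = randn(fs, 1);                  % test signal
yl = filter(bl, al, x);             % keeps content below fc
yh = filter(bh, ah, x);             % keeps content above fc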
Article
Audio- and image-based soft failure detection methods are developed which can detect both severe failures (such as a system hang) and subtle ones (such as a glitch or a momentary disturbance on the display). Incorporating the developed detection methods into a robotic ESD (electrostatic discharge) tester, we developed a fully automated soft failure investigation tool. Using this fully automated tool, we obtained failure-specific susceptibility maps for a camera (our target device). These susceptibility maps not only illustrate the sensitive locations of the device, they also show what type of soft failure is correlated with which locations.
Article
Consumer electronics equipped with a microphone array, such as car navigation devices and headsets, commonly implement speech enhancement techniques based on the gradient method to cope with additive noise. However, while these techniques were originally developed for voice communication and can maximize the signal-to-distortion ratio (SDR), they cannot always maximize automatic speech recognition (ASR) accuracy. For this reason, the front-end speech enhancement parameters have been adjusted by human experts for each environment and acoustic model. In this study, we developed a novel system for maximizing the accuracy of a given ASR engine by automatically adjusting the front-end speech enhancement. The proposed method allows consumers to use ASR through consumer electronics with less stress when ambient noise varies. A genetic algorithm (GA) is used to generate parameter values of the front-end speech enhancement for particular environments. The generated values can be dynamically assigned to input speech signals by preliminarily clustering the environments based on noise features. In evaluations, parameter values determined by our method outperformed those adjusted by a human expert.