Journal of the Audio Engineering Society

Published by Audio Engineering Society
Online ISSN: 1549-4950
Publications
Article
An optimal spherical-head model is derived from the high-frequency interaural time difference (ITD) of a population of 25 subjects. Analysis demonstrates that the optimal sphere results in very small lateral angle errors, except near the interaural poles. The spherical-head model is then estimated using the anthropometry of the subjects, based on simple and robust empirical predictive equations. Such a customization greatly decreases objective angular errors that occur when a generic model is used.
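The effect of a mismatched head radius on lateral angle can be sketched numerically. The example below assumes the classic Woodworth ray-tracing formula for the high-frequency ITD of a rigid sphere with a distant source, ITD = (a/c)(θ + sin θ). The abstract does not state which formula or which values the authors use, so the speed of sound, the radii, and all function names here are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def woodworth_itd(radius_m, lateral_angle_rad, c=SPEED_OF_SOUND):
    """High-frequency ITD (seconds) of a rigid sphere for a far-field source.

    lateral_angle_rad is measured from the median plane, in [0, pi/2].
    """
    return (radius_m / c) * (lateral_angle_rad + math.sin(lateral_angle_rad))


def lateral_angle_from_itd(itd_s, radius_m, c=SPEED_OF_SOUND):
    """Invert the Woodworth formula by bisection (it is monotonic in angle)."""
    target = itd_s * c / radius_m
    # Clamp to the range the formula can produce on [0, pi/2].
    target = min(max(target, 0.0), math.pi / 2 + 1.0)
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid + math.sin(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def angle_error_deg(true_radius_m, model_radius_m, angle_deg):
    """Lateral angle error when a sphere of the wrong radius decodes the ITD."""
    itd = woodworth_itd(true_radius_m, math.radians(angle_deg))
    estimate = lateral_angle_from_itd(itd, model_radius_m)
    return math.degrees(estimate) - angle_deg
```

Calling `angle_error_deg(0.095, 0.0875, 45.0)` gives the error, in degrees, that a generic 8.75 cm sphere would make for a listener whose best-fit radius is 9.5 cm. With matching radii the error is zero up to the bisection tolerance.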
 
Article
On September 26, 1999, a musical performance taking place at McGill University was transmitted over the Internet to an audience at New York University. While Internet streaming audio technologies have been in use for several years, what made this event unique was the audience's experience of uninterrupted, intermediate-quality, multichannel audio (AC-3). To achieve this result, a custom system was developed employing both TCP and UDP protocols and providing its own buffering and retransmission algorithms. The motivation for this approach is explored, and experiments justifying the decisions made are explained. Appeared in Journal of the Audio Engineering Society, July-August 2000. 1 Introduction. The growth of the Internet has not only reshaped our work lives, but, in recent years, has also begun to affect and redefine various areas of entertainment, in particular, that of music. Until the last decade, the idea of downloading music to one's home, at great convenienc...
 
Article
Look-ahead Sigma-Delta modulators look forward k samples before deciding to output a “one” or a “zero”. The Viterbi algorithm is then used to search the trellis of the exponential number of possibilities that such a procedure generates. This paper describes alternative tree-based algorithms. Tree-based algorithms are simpler to implement because they do not require backtracking to determine the correct output value. They can also be made more efficient using “Stack” algorithms. Both the tree algorithm and the more computationally efficient “Stack” algorithms are described. Implementations of both algorithms are described in some detail, in particular the appropriate data structures for both the trial filters and the score memories. Comparative results of their performance are also presented.
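The look-ahead idea can be illustrated with a deliberately naive sketch. It is not the tree or “Stack” algorithm of the paper: it searches every candidate bit sequence exhaustively, and it assumes a first-order integrator as the trial filter and the sum of squared filter states as the score. Both choices, and the function name, are assumptions made only for illustration.

```python
from itertools import product


def lookahead_sigma_delta(samples, depth=3):
    """1-bit modulator that scores every +/-1 sequence `depth` samples ahead.

    Only the first bit of the best-scoring sequence is emitted, so no
    backtracking over earlier outputs is needed.
    """
    state = 0.0  # integrator state of the (assumed) first-order loop filter
    bits = []
    for n in range(len(samples)):
        window = samples[n:n + depth]  # shorter near the end of the input
        best_cost, best_first = None, 1
        for candidate in product((1, -1), repeat=len(window)):
            trial_state, cost = state, 0.0
            for x, y in zip(window, candidate):
                trial_state += x - y      # run the trial filter
                cost += trial_state ** 2  # accumulate the score
            if best_cost is None or cost < best_cost:
                best_cost, best_first = cost, candidate[0]
        state += samples[n] - best_first  # commit the chosen bit
        bits.append(best_first)
    return bits
```

The cost of this exhaustive search grows as 2^depth per output sample, which is the exponential growth that motivates the more efficient search strategies the abstract mentions.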
 
Article
A spherical microphone array has been used to perform directional measurements of airborne sound transmission between rooms. With a source and array on opposite sides of a wall, omnidirectional impulse responses were measured to each of the array microphones. Beamforming resulted in a set of directional impulse responses, which were analyzed to find the distribution of arriving sound energy at the array position during various time ranges. Weak spots in the separating wall are indicated as directions of increased arriving sound energy. The system was able to identify minor defects in a test wall in between two reverberation chambers, and also to identify leaks in the wall of an actual meeting room.
 
Magnitude responses of a plane wave's spatial derivatives of different orders n.
Directional characteristics of a plane wave's
Article
The literature on gradient and differential microphone arrays makes a distinction between the two types, but nevertheless shows how both types can be used to obtain the same directional responses. A theoretically sound rationale for using delays in differential microphone arrays has not yet been given. This paper presents a gradient analysis of the sound field viewed as a spatio-temporal phenomenon, and gives a theoretical interpretation of the working principles of gradient and differential microphone arrays. It shows that both types of microphone arrays can be viewed as devices for approximately measuring spatio-temporal derivatives of the sound field. Furthermore, it also motivates the design of high-order differential microphone arrays using the aforementioned spatio-temporal gradient analysis.
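The role of the delay can be seen in a minimal first-order sketch, which is a standard textbook model rather than the paper's analysis. Two omnidirectional capsules spaced d apart are subtracted after delaying one of them by τ. For a plane wave arriving at angle θ, the magnitude response is |1 − exp(−jω(τ + d cos θ / c))|. The spacing, frequency, and function name below are assumed values for illustration.

```python
import cmath
import math


def differential_response(freq_hz, angle_rad, spacing_m, delay_s, c=343.0):
    """Magnitude response of a two-capsule delay-and-subtract array."""
    omega = 2.0 * math.pi * freq_hz
    # Acoustic travel time between capsules plus the electrical delay.
    total_delay = delay_s + spacing_m * math.cos(angle_rad) / c
    return abs(1.0 - cmath.exp(-1j * omega * total_delay))


spacing = 0.01  # 1 cm capsule spacing (assumed)

# Zero delay: a pure pressure-gradient (dipole) pattern with a null at 90 degrees.
dipole_side = differential_response(100.0, math.pi / 2, spacing, 0.0)

# Delay equal to the acoustic travel time d/c: a cardioid with a null at 180 degrees.
cardioid_rear = differential_response(100.0, math.pi, spacing, spacing / 343.0)
```

At low frequencies the normalized pattern approaches α + (1 − α) cos θ with α = τ / (τ + d/c), so changing only the delay moves the array between dipole, cardioid, and intermediate responses.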
 
Article
The subjective significance of two general types of metrics used to describe the "quality" of a room based on its aspect ratio is compared. Tests were carried out to evaluate differences between three virtual rooms that score extreme classifications in each of the metrics. The results of the tests indicate that room aspect ratios do have a sonic effect on the perception of the modal distribution, but the effect is very much dependent on the frequency content of the original signal. This indicates that a room that scores well using a certain metric may still suffer from problems if the frequency content of the driving signal matches one particularly strong modal artifact. More significantly, the results of these tests imply that attempts to rank critical listening spaces based on modal distribution metrics are likely to be highly misleading, and the derivation of difference limen, which would be required for useful comparison, has been shown in practice to be highly problematic and perhaps meaningless.
 
Article
The importance of telecommunication continues to grow in our everyday lives. An ambitious goal for developers is to provide the most natural way of audio communication by giving users the impression of being located next to each other. MPEG Spatial Audio Object Coding (SAOC) is a technology for coding, transmitting, and interactively reproducing spatial sound scenes on any conventional multi-loudspeaker setup (e.g., ITU 5.1). This paper describes how Directional Audio Coding (DirAC) can be used as recording front-end for SAOC-based teleconference systems to capture acoustic scenes and to extract the individual objects (talkers). By introducing a novel DirAC to SAOC parameter transcoder, a highly efficient way of combining both technologies is presented that enables interactive, object-based spatial teleconferencing.
 
Article
The sampling and interpolation of a sound field in two and three dimensions along a circle is discussed. The Fourier domain representation of the sound field is used, and an angular sampling theorem is developed for the sampling of the sound field along a circle. Based on these results, HRTF sampling and interpolation are discussed. This method achieves very precise interpolation in terms of mean square error. However, these results are only possible if very finely spaced HRTF measurements are available. A method is proposed to improve interpolation results when the HRTF measurements are more coarsely spaced than dictated by the angular Nyquist theorem. The proposed method interpolates the HRTFs in a subband domain. In subbands where small angular aliasing occurs, the previous method is applied. In the other subbands, interpolation is carried out in a complex temporal envelope domain to avoid aliasing. Simulations with models and measured data show that the proposed algorithm performs significantly better than previous methods in a mean square error sense.
 
Article
In comparison to active absorbers, active diffusers have more complicated target impedance functions, a smaller region of stable control, and are more sensitive to control impedance errors. The issues of stability and sensitivity are analyzed theoretically and empirically. Measurements show comparable performance for active and passive diffusers with the active solution requiring less space.
 
Article
An investigation into the measurement of the thresholds of detection of modal Q factors in rooms at low frequency is presented. Key features of the approach taken include the use of music rather than test tones or noise as program material, and the manipulation of damping conditions for a range of modes over a broad low-frequency bandwidth as opposed to the control of one modal artifact within an array of surrounding uncontrolled resonances. It is shown that the detectability of Q-factor changes is directly proportional to the reference Q, and is weakly dependent on the presence and level of higher frequency reverberations. A threshold value of Q = 16 is suggested, below which further changes are unlikely to be detected.
 
Article
The perceptibility of small changes in reverberation time (RT) when music is reproduced within small studio control rooms was judged by a number of subjects. The aim of the study was to determine the difference limen (DL) for midfrequency RTs shorter than 0.6 s, which are usually encountered in such rooms. The work used a real control room and consisted of two experiments. In the first experiment the DL for RT was measured by changing the amount of absorption in the room. Here the measurement stimulus (music) was reproduced to the subjects by one loudspeaker located in the center in front of the mixing console. In the second experiment the measurement stimuli were recorded in the control room using a dummy head and later replayed to the subjects by headphones. Differences between the two measurements were compared and the overall DL was found to be 0.042 ± 0.015 s.
 
Article
A perceptually motivated spatial decomposition for two-channel stereo audio signals, capturing the information about the virtual sound stage, is proposed. The spatial decomposition allows resynthesizing audio signals for playback over sound systems other than two-channel stereo. With the use of more front loudspeakers the width of the virtual sound stage can be increased beyond ±30° and the sweet-spot region is extended. Optionally, lateral independent sound components can be played back separately over loudspeakers on the sides of a listener to increase listener envelopment. It is also explained how the spatial decomposition can be used with surround sound and wavefield synthesis-based audio systems.
 
Article
Semantic audio analysis has become a fundamental task in modern audio applications, making the improvement and optimization of classification algorithms a necessity. Standard frame-based audio classification methods have been optimized, and modern approaches introduce engineering methodologies that capture the temporal dependency between successive feature observations, following the process of temporal feature integration. Moreover, the deployment of convolutional neural networks defined a new era in semantic audio analysis. The current paper attempts a thorough comparison between standard feature-based classification strategies, state-of-the-art temporal feature integration tactics, and 1D/2D deep convolutional neural network setups on typical audio classification tasks. Experiments focus on optimizing a lightweight configuration for convolutional network topologies on a Speech/Music/Other classification scheme that can be deployed on various audio information retrieval tasks, such as voice activity detection, speaker diarization, or speech emotion recognition. The ultimate goal of this work is to establish an optimized protocol for constructing deep convolutional topologies on general audio detection classification schemes, minimizing complexity and computational needs.
 
Article
AES workshops chaired by Nick Sansano and Jay LeBoeuf discussed business practices and technology involved in the new generation of music production. Producers on the panel talked about the additional responsibility for some of the bands or artists they were developing. It was highlighted that getting proper credit on a record is extremely important because building a brand as a producer/engineer is crucial. Tony Visconti gave examples of how he had produced or coached artists using Skype between New York and the UK and suggested that rather than spending a lot of money on flying people around and putting them up in hotels we should consider spending it on the production. One of the advantages of cloud computing, suggested the panel, is that as networking resources improve in speed there is less need to be concerned about whether a certain amount of processing power is available on the local platform.
 
Article
In movie theaters, sound sources such as dialog are often reproduced on the center loudspeaker without regard to the visual position on screen. Some sound engineers and researchers have suggested that spatial audiovisual coherence could improve the audience experience, especially for stereoscopic-3D (s-3D) movies. In the experiment described, subjects were asked to judge the suitability of several soundtracks for eight s-3D sequences. Depending on the soundtrack, sound sources could be more or less coherent in azimuth and depth. Results showed that sound suitability could be significantly improved for most of the sequences when coherence in azimuth was achieved. An improvement in the experience of depth was only observed with one sequence. When sequences were presented in nonstereoscopic (2D) version, there was no significant effect of stereoscopy. Subjects quickly became accustomed to azimuthal coherence, which improved sound suitability throughout the experiment. This suggests that the audience adaptation to a new cinematographic convention regarding spatialization of sound objects would not be a burden.
 
Article
The Schroeder backward integration method for estimating the reverberation time of an enclosure, as suggested in the ISO 3382 standard, is analyzed from an estimation theoretic perspective, in a general context that is applicable to both blind and non-blind estimation. Expressions for the estimation bias and variance of the reverberation decay rate are derived and verified using Monte-Carlo simulations. Comparison is made with a straightforward linear regression method (not using backward integration). It is shown that, even though it significantly reduces the estimation variance, the use of backward integration can in many cases degrade the estimation accuracy due to large bias. This clearly indicates that prudence is called for when using backward integration for automated decay rate estimation problems.
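Backward integration itself is short enough to sketch. The example below computes the energy decay curve by summing the squared impulse response from the end toward the start, then fits a straight line to the decay between two levels and extrapolates to 60 dB. The −5 dB to −35 dB fitting range and the function names are assumptions for illustration, and the sketch makes no attempt to reproduce the bias and variance analysis of the paper.

```python
import math


def schroeder_decay_db(impulse_response):
    """Energy decay curve in dB, obtained by backward integration of h^2."""
    energy = 0.0
    edc = [0.0] * len(impulse_response)
    for n in range(len(impulse_response) - 1, -1, -1):
        energy += impulse_response[n] ** 2
        edc[n] = energy
    if not edc or edc[0] <= 0.0:
        raise ValueError("impulse response has no energy")
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in edc]


def reverberation_time(impulse_response, fs, upper_db=-5.0, lower_db=-35.0):
    """Fit a line to the decay curve and extrapolate to a 60 dB decay."""
    decay = schroeder_decay_db(impulse_response)
    points = [(n / fs, level) for n, level in enumerate(decay)
              if lower_db <= level <= upper_db]
    if len(points) < 2:
        raise ValueError("not enough points in the fitting range")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(l for _, l in points) / len(points)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    if slope >= 0.0:
        raise ValueError("decay curve does not decay in the fitting range")
    return -60.0 / slope
```

On a noise-free exponential decay the fitted value matches the true reverberation time closely. With measurement noise, the tail of the integral lifts the curve, which is one source of the bias the abstract warns about.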
 
Article
Recently, a new generation of spatial audio formats was introduced that includes elevated loudspeakers and surpasses traditional surround sound formats, such as 5.1, in terms of spatial realism. To facilitate high-quality bit-rate-efficient distribution and flexible reproduction of 3D sound, the MPEG standardization group recently started the MPEG-H Audio Coding development for the universal carriage of encoded 3D sound from channel-based, object-based, and HOA-based input. High-quality reproduction is supported for many output formats, from 22.2 and beyond down to 5.1, stereo, and binaural reproduction, independently of the original encoding format, thus overcoming incompatibility between various 3D formats. The paper describes the current status of the standardization project and provides an overview of the system architecture, its capabilities, and performance.
 
Microphone preamplifier with phantom power capability. Switches SW1 and SW2, when properly closed, cause the three fault modes discussed in this paper.
Current distribution during the differential fault for the circuits type A, B, C, and D.
During the fault, in the circuit of Figure 21, the negative rail (dashed) follows the voltage across R4 (solid).
Article
In 2001, Hebert and Thomas presented a paper at the 110th AES Convention which described the "phantom menace" phenomenon wherein microphone phantom power faults can damage audio input circuitry. This paper offers new information about the phantom menace fault mechanisms, analyzes common protection circuits, and introduces a new protection scheme that is more robust. In addition, new information is presented relating these input protection schemes to audio performance, and recommendations are made to minimize noise and distortion.
 
Article
ITU-T Recommendation P.563 defines a single-ended method for objective speech quality assessment. The P.563 objective speech quality measurement method predicts the subjective absolute category rating (ACR) listening quality. The accuracy of the P.563 algorithm was tested against data from subjective listening tests. It was found that P.563 compresses the corresponding MOS (mean opinion score) value range for coded speech. For AMR coded speech with GSM and 3G radio channel errors, the MOS value range was from about 1.2 to 4.0, while the corresponding P.563 MOS-LQO range was from about 3.0 to 3.6. The quality of the modulated noise reference unit (MNRU) and direct samples was predicted better.
 
Article
Aspects of a study of different methods to mitigate the impact of packet loss in a wireless distribution network on the subjective quality of compressed high-fidelity audio are presented. The system was simulated in MATLAB based on parameters of an 802.11a WLAN in multicast mode and the Vorbis codec. Selecting the most appropriate packet loss concealment strategy requires considering more than the additional bandwidth, the processing requirements, and the latency: one important differentiating factor is the perceived subjective audio quality. Therefore an accurate estimate of the resulting audio quality is required. Several simulation-based methods using psychoacoustic models of the human hearing system to quantify subjective audio quality are compared.
 
Recording situation in the anechoic chamber of the TU Berlin with eight musical instruments and a conductor, also showing the monitor mixing desk.
Transfer function (gray) and inverse filter (black) for a Neumann KM120 microphone; the dashed line shows the compensation result of the inverse filter on the transfer function.
Coordinate system for the documentation of the microphone positions.
Article
For the quality of model-based, virtual acoustic environments, not only the room acoustic simulation but also the quality and suitability of the source material play an important role. An optimal recording of real sound sources is characterized by an anechoic production and a high signal-to-noise ratio and crosstalk attenuation between the different recording channels. Furthermore, a recording in the far field of the source is necessary to use correct directivities for room acoustic simulations. From an artistic point of view, the recording situation with its technical boundary conditions must be designed in a way that the musical or vocal rendering of professional performers is impaired as little as possible. To provide a high-quality source signal for acoustic simulations of orchestral content, a professional symphony orchestra was recorded in the anechoic chamber of TU Berlin performing the 8th Symphony of L. v. Beethoven. Through a combination of groupwise and sequential recordings with individual monitor mixes via headphones and video recordings of the conductor and concertmaster, an optimal compromise was sought with regard to artistic and technical aspects. The article presents the recording process and processing chain as well as the results achieved with respect to technical and artistic quality criteria.
 
Article
Nowadays, mobile devices (smart phones and tablet PCs), and DMB (digital media broadcasting) systems can play music or movies with different pitches or tempos. In order to change the pitch or tempo of an audio signal, it is common to perform an algorithm after the decoder stage. An algorithm that can be used to change the pitch or tempo requires time-domain operations to be performed on the decoded audio signal. Since two audio algorithms are carried out in the processor and the system, such methods require several calculations and have high power consumption, making them inefficient. If changing the pitch or tempo of an audio signal is performed while the encoded audio signal is being decoded, it reduces the load on the processor as well as the system. In this paper we propose a modified AAC (advanced audio coding) decoder structure that can change the pitch and tempo of an audio signal.
 
Article
A universal design procedure is presented for enhancing the low-frequency responses of loudspeakers ranging from handset microspeakers to subwoofers. This procedure aims at finding the optimal parameters of vented-box configurations using a systematic procedure based on vibration-absorber theory. By viewing the system as two coupled serial and parallel oscillators, a characteristic equation is derived for the vented-box system. A simulation platform is then established using electro-mechano-acoustical (EMA) analogous circuits. The electrical impedance and the on-axis sound pressure level (SPL) of the loudspeaker can be simulated by solving the loop equations of the analogous circuit. In order to facilitate the design process of such loudspeaker systems, a design chart is devised using the characteristic equation in deciding the parameters to deliver maximum output at the low-frequency end. In addition, a constrained optimization procedure is applied to maximize the acoustic output under the enclosure constraints. Simulations and experiments were undertaken to validate the resulting optimum design. Design guidelines are summarized.
 
Single Helmholtz resonator geometry
Free field representation of Helmholtz resonator (HR) model attached to a panel with signal paths shown.
Typical comparative results for absorption coefficient evaluated for 1/3 octave bands of a commercially available resonator absorption panel and the simulated data of the same panel obtained by the proposed method. 
Helmholtz resonators 10x10 array with displaced receiver along an arc. 
Article
An efficient method for simulating the far field acoustic radiation from resonator panel absorbers is presented following a filter-based modeling approach. This method allows the evaluation of the time and frequency domain response of arbitrary-sized perforated panels in any specific receiver position under free-field and diffuse-field acoustic environments. Such panel absorbers employ the Helmholtz resonator principle, whose filter-based representation is well-defined. Hence, combining such basic resonator elements, the physical parameters of the panel can be user-defined in a parametric fashion, and it is shown that from the derived impulse response, the panel's absorption coefficient can be evaluated with sufficient accuracy, following the standardized procedure. Unlike existing analytical approaches, the proposed approach offers significant computational efficiency and allows flexible and fast practical evaluation of the effect of such panel absorbers.
 
Conference Paper
The Recording Arts Program at the University of Colorado at Denver and Health Sciences Center (UCDHSC) performed an independent evaluation of three audio codecs: Dolby Digital (AC-3 at 384 kbps), Advanced Audio Coding Plus (HE-AAC at 160 kbps), and Dolby Digital Plus (E-AC-3 at 224 and 200 kbps). UCDHSC performed double-blind listening tests during the summer of 2006, which largely adhered to the standards of ITU-R BS.1116-1 (which provides guidelines for multichannel critical listening tests). The results of this test illustrate a clear delineation between the AC-3 codec and the others tested. Test procedures and findings are presented.
 
Article
An equivalent circuit model for dynamic moving-coil transducers incorporating a semi-inductance had been evaluated previously. A more complex equivalent circuit model inspired by problems with loudspeakers with a copper cap on the pole piece is now evaluated in order to include transducers that were not covered by the previous model. A simplified version of this model, having only a few extra components compared to the traditional model, is proposed. This modified model is shown to be in good agreement with test measurements, and this new semi-inductance model is therefore proposed as a replacement for the roughly 30-year-old traditional model. Simulations according to this model provide reliable small-signal parameters for most types of loudspeakers. A nondestructive way to extract the electrical (blocked) impedance is also shown, as well as the possibility to reveal "hidden" information in the motional impedance due not only to the rim resonance but also to cone breakups at higher frequencies.
 
Article
Multichannel audio signal processing has undergone major development in recent years. The incorporation of spatial information into an immersive audiovisual virtual environment or into video games provides a better sense of "presence" to applications. In a binaural system, spatial sound consists of reproducing audio signals with spatial cues (spatial information embedded in the sound) through headphones. This spatial information allows the listener to identify the virtual positions of the sources corresponding to different sounds. Headphone-based spatial sound is obtained by filtering different sound sources through a collection of special filters (whose frequency responses are called Head-Related Transfer Functions) prior to rendering them through headphones. These filters belong to a database composed of a limited number of fixed spatial positions. A complete audio application that can render multiple sound sources at any position in space and virtualize movements of sound sources in real time has high computing demands. Graphics Processing Units (GPUs) are highly parallel programmable coprocessors that provide massive computation when the needed operations are properly parallelized. This paper presents the design of a headphone-based multisource spatial audio application whose main feature is that all required processing is carried out on the GPU. To this end, two solutions have been developed in order to synthesize sound sources in spatial positions that are not included in the database and to virtualize sound source movements between different spatial positions. The results show that the proposed application is able to move up to 240 sources simultaneously.
 
Article
The relationship between the duration of a sound presentation and the accuracy of human localization is investigated. The three-dimensional sound is presented via headphones. The head-tracking system was integrated together with the sound presentation. Generalized head-related transfer functions (HRTFs) are used in the experiment. Six different types of sounds with durations of 0.5, 2, 4, and 6 seconds were presented in random order on any azimuth in the horizontal plane. Thirty subjects participated in the study. A special location indication system called DINC (directional indication compass) was developed. With DINC the judged location of every test can be recorded accurately. The results showed that the localization accuracy is significantly related to the duration of the sound presentation. As long as the sound has a broad frequency bandwidth, the sound type has little effect on the localization accuracy. A presentation of at least 4-second duration is recommended. There is no significant difference between male and female subjects in the accuracy of detection.
 
Article
Auralization is a powerful tool to increase the realism and sense of immersion in Virtual Reality environments. The Head Related Transfer Function (HRTF) filters commonly used for auralization are non-individualized, as obtaining individualized HRTFs poses very serious practical difficulties. It is therefore extremely important to understand to what extent this hinders sound perception. In this paper we address this issue from a learning perspective. In a set of experiments, we observed that mere exposure to virtual sounds processed with generic HRTF did not improve the subjects' performance in sound source localization, but short training periods involving active learning and feedback led to significantly better results. We propose that using auralization with non-individualized HRTF should always be preceded by a learning period.
 
MFS simulation showing pressure contours (in 2 dB increments) at 10 Hz. Dotted curve is a perfect circle shifted 9 cm (the acoustic center) forward of the enclosure.
MFS simulation showing pressure contours (in 3 dB increments) at 1 kHz. A very complex pressure field is created behind the enclosure.
Article
Numerical calculation of radiation and diffraction from loudspeaker cabinets is usually carried out in many design tools using methods that are strictly valid only for high frequency. However, knowledge of the theoretical sound field at low frequency (below 500 Hz) is needed, for example, to model the anechoic response. In the low-frequency regime, traditional approaches to diffraction modeling can produce errors that are not insignificant. This work describes an approach to solve the three-dimensional Helmholtz equation for a piston radiator in a rectangular solid enclosure using the Method of Fundamental Solutions (MFS). It is simpler to implement than boundary-element schemes and has an accuracy that is in principle limited only by machine precision and computing resources. In practice, there is a maximum frequency limitation that depends on the computational cabinet size. We therefore supplement the algorithm with a high-frequency approach that joins smoothly to the MFS scheme at intermediate frequency (1-2 kHz). Examples and applications are also described.
 
(Color online) The linear array prototype.  
Sound field control problem for the generation of zones of accurate reproduction of sound. The bright zone consists of M_B = 3 bright points, corresponding to control points #19, #20, and #21, whilst the rest of the M_D = 34 control points form the dark area.
Sound field control problem for the generation of zones of silence. The dark zone consists of M_D = 3 dark points, corresponding to control points #19, #20, and #21, whilst the rest of the M_B = 34 control points form the bright area.
Article
Compact loudspeaker arrays driven with input signals designed with the Pressure Matching Method (PMM) can be used for the reproduction of a target signal in a given control area. This technology may be used for the generation of zones of private sound and zones of silence. In this work we compare two strategies for the accurate reproduction of a target signal defined with large amplitude variations between the so-called acoustically bright and dark zones. More specifically, we compare the Weighted Pressure Matching Method (WPMM) and a formulation of the PMM with a constraint on the accuracy of the target signal reproduction. Results of simulations and experiments with a linear array prototype show that input signals designed with the WPMM provide better trade-offs between accuracy of the target field reproduction in the so-called bright zone and directivity performance than the other strategy considered.
 
Article
The sound field produced by loudspeakers at low frequencies in small- and medium-size rectangular listening rooms is highly nonuniform due to the multiple reflections and diffractions of sound on the walls and different objects in the room. A new method, called controlled acoustic bass system (CABS), is introduced. The system utilizes front loudspeakers and extra loudspeakers on the opposite wall of the room processed to cancel out the rear-wall reflections, which effectively conveys a more uniform sound field. The system works in the time domain and presents good performance over the loudspeaker low-frequency range. CABS has been simulated and measured in two different standard listening rooms with satisfactory results.
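A time-domain sketch of the cancellation idea follows. It assumes the rear loudspeakers are fed a delayed, polarity-inverted copy of the front signal, with the delay equal to the travel time along the room length. The abstract does not give the actual processing, so the delay rule, the `gain` parameter, and the function name are assumptions for illustration.

```python
def rear_drive_signal(front_signal, room_length_m, fs, gain=1.0, c=343.0):
    """Delayed, inverted copy of the front signal for the rear loudspeakers.

    The delay approximates the time the front wave needs to reach the rear
    wall, so the rear loudspeakers can absorb it instead of reflecting it.
    """
    delay_samples = round(room_length_m / c * fs)
    delayed = [0.0] * delay_samples + list(front_signal)
    return [-gain * x for x in delayed]


# Example: a 3.43 m long room at a 1 kHz sample rate gives a 10-sample delay.
rear = rear_drive_signal([1.0, 0.5], room_length_m=3.43, fs=1000)
```

In a real room the gain would need to account for propagation loss between the walls, which this sketch leaves as a free parameter.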
 
Article
When creating a desired complicated sound field, an acoustic source array should be designed appropriately in order to obtain the acoustic source parameters. To this end a method is suggested that utilizes an acoustical holography technique based on the inverse boundary-element method. The acoustic analogy between the problems of source reconstruction and source design was the initial motivation for the study. In the design of the source array the pressure distribution at specific field points is the constraint of the problem, and the signal distribution at the source surface points is the object function of the problem. The whole procedure of the application consists of three stages. First, a condition of the desired sound field should be set as the constraint. Second, the geometry and the boundary conditions of the source array system and the target field, that is, points in the sound field of concern, are modeled by the boundary elements. Actual characteristics of source and space can be considered to generate an accurate condition of the target field, regardless of the near and far fields. Finally, the source parameters are inversely calculated by backward projection. The suggested method is especially useful in controlling the near and intermediate fields, and it can control the far field similarly to other methods. As an example, a source array to fulfill the plane-wave propagating zone and another quiet zone near the propagation zone was designed and tested by simulation and measurement.
 
Article
Two-port and multiport models are widely used to represent elements that can be characterized with linear models. This paper provides a basic overview of two- and multiport models of linear, time-invariant acoustic systems and applies these to modeling audio systems and creating audio signal-processing algorithms. Both frequency domain and discrete time domain analyses are included along with notes on MATLAB implementations. The concepts and methods are applied to the design and implementation of networks of acoustic elements.
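The cascading property of two-port models can be sketched with transmission (ABCD) matrices. The example below assumes a lossless uniform duct relating pressure and volume velocity at its two ends. The constants and the helper names `duct`, `cascade`, and `input_impedance` are illustrative and not taken from the paper, which refers to MATLAB implementations.

```python
import math

RHO_AIR, C_SOUND = 1.21, 343.0  # assumed air density (kg/m^3) and speed of sound (m/s)

def duct(length, area, f):
    """ABCD matrix of a lossless uniform duct relating (p, U) at its two ends."""
    k = 2 * math.pi * f / C_SOUND
    z0 = RHO_AIR * C_SOUND / area  # characteristic acoustic impedance
    return ((math.cos(k * length), 1j * z0 * math.sin(k * length)),
            (1j * math.sin(k * length) / z0, math.cos(k * length)))

def cascade(a, b):
    """2x2 matrix product: the two-port formed by `a` followed by `b`."""
    return tuple(tuple(sum(a[i][n] * b[n][j] for n in range(2)) for j in range(2))
                 for i in range(2))

def input_impedance(t, z_load):
    """Impedance seen at the input of two-port `t` terminated by `z_load`."""
    (a11, a12), (a21, a22) = t
    return (a11 * z_load + a12) / (a21 * z_load + a22)

# Two segments of equal area behave like one segment of the combined length.
f, area = 500.0, 0.01
two_segments = cascade(duct(0.2, area, f), duct(0.3, area, f))
one_segment = duct(0.5, area, f)
```

Networks of acoustic elements can then be built by multiplying the matrices of the individual elements in order.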
 
Article
Automatic recognition of sound events can be valuable for efficient situation analysis of audio scenes. In this article we address the problem of detecting human activities in natural environments based solely on the acoustic modality. The primary goal is the continuous acoustic surveillance of a particular natural scene for illegal human activities (trespassing, hunting, etc.) in order to promptly alert an authorized officer to take the appropriate measures. We constructed a novel system that is mainly characterized by its hierarchical structure as well as by its acoustic parameters. Each sound class is represented by a hidden Markov model created using descriptors from the time, frequency, and wavelet domains. The system can automatically adapt to the acoustic conditions of different scenes via a feedback loop that performs unsupervised model refinement. We conducted extensive experiments to assess the performance of the system with respect to its recognition and detection capabilities. To this end we employed confusion matrices and Detection Error Tradeoff curves, and we report that high performance was achieved for both detection and recognition.
 
Article
A new acoustic filter that uses four polymer films with half-wavelength spacing was designed and used for detailed measurements of the audio signal components from amplitude-modulated high-intensity ultrasonic waves. The filter provided strong attenuation of the carrier frequency, which is a major cause of microphone nonlinearity. The measurements clearly showed parametric array effects from 300 to 8000 Hz and modulated radiation pressure above 30 Hz.
 
Article
The boundary-element method is applied to model the transient acoustic field radiating from a loudspeaker. The finite-element method models the structural behavior of the loudspeaker and provides the necessary boundary data for the acoustic model. The well-known stability problems of time-domain boundary-element methods are avoided by using a Burton-Miller type integral equation. Structural damping and postprocessing are applied to the structural model in order to obtain a realistic response. The convergence of the exterior pressure results as the time step decreases is investigated along with the effect of varying the bandwidth of the applied forcing. The model is verified at different points in the exterior field and two examples of insights from the time domain are shown: the response of the structure to an impulse, and the presence of the acoustic center.
 
Article
This paper presents measurements and visualization of sound intensity around a human head simulator in a free field. A Cartesian robot, applied for precise positioning of the acoustic vector sensor, was used to measure sound intensity. Measurements were performed in a free field using a head and torso simulator and a setup consisting of four different loudspeaker configurations. The acoustic vector sensor was positioned around the head with a 5-cm step. Sound intensity was measured in 277 points. For every step three orthogonal sound intensity components were calculated. Pure tones of frequencies 250, 1000, and 4000 Hz were applied to analyze the acoustic field. Obtained results were used to provide visualizations of sound intensity distribution around the human head. The tool developed for this purpose utilized three-dimensional sound intensity measurements and visualization techniques.
 
Article
The center of the spherical waves radiated from a loudspeaker is defined as its acoustic center. This study aims to investigate how the acoustic center of a closed-box loudspeaker is shifted when the loudspeaker is placed in a linear array. That is, the acoustic center of the loudspeaker is estimated when the loudspeaker is placed alone and then the loudspeaker is placed in a linear array composed of two or three identical loudspeakers. The acoustic center of each loudspeaker in the linear arrays is estimated with the other loudspeakers turned off and compared with that in the single loudspeaker case. In order to estimate the acoustic center based on the wave fronts, a method is proposed that measures sound pressure around the loudspeaker with an array of microphones and uses the beamforming method for the reduction of the effect of the experimental errors. Experimental results show that the acoustic center is shifted differently depending on the relative position of the loudspeaker in the array. This implies that the performance of sound field control with a linear array of loudspeakers can be improved by taking the shift of the acoustic center into account. Journal of the Audio Engineering Society.
 
Article
Compression drivers couple a radiating diaphragm to a horn throat of smaller area, resulting in high efficiency. When coupled to a suitable horn, this area reduction improves the match between the output mechanical impedance of the driver and the loading acoustic radiation impedance. Early workers devised a phase plug to fill most of the cavity volume. The application of both an equal path-length approach and a modal approach to a phase plug with concentric annular channels coupled to a cavity shaped as a flat disk is explored. The assumption that the cavity may be represented as a flat disk is investigated by comparing its behavior with that of an axially vibrating rigid spherical cap radiating into a curved cavity. It is demonstrated that channel arrangements derived for a flat disk are not optimum for use in a typical compression driver with a curved cavity. A new methodology for calculating the channel positions and areas giving least modal excitation is described. The impact of the new approach is illustrated with a practical design.
 
Block diagram illustrating the MIMO channel inversion problem.  
SFR filter of length N_F = 512 samples obtained from the frequency response H̃_k(ω) using a DFT of length N_T = 2048.
Comparison of WFS and SFR in reproducing a point source with frequency f_1 = 500 Hz located at r_m = (3 m, 1 m). The loudspeakers used are marked with squares. Sound field snapshots: (a) desired, (b) WFS I, (c) WFS II, and (d) SFR.
Article
Sound fields are essentially band-limited phenomena, both temporally and spatially. This implies that a spatially sampled sound field respecting the Nyquist criterion is effectively equivalent to its continuous original. We describe Sound Field Reconstruction (SFR), a technique that uses the previously stated observation to express the reproduction of a continuous sound field as an inversion of the discrete acoustic channel from a loudspeaker array to a grid of control points. The acoustic channel is inverted using truncated singular value decomposition (SVD) in order to provide optimal sound field reproduction subject to a limited effort constraint. Additionally, a detailed procedure for obtaining loudspeaker driving signals that involves selection of active loudspeakers, coverage of the listening area with control points, and frequency-domain FIR filter design is described. Extensive simulations comparing SFR with Wave Field Synthesis show that on average, SFR provides higher sound field reproduction accuracy.
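The channel inversion by truncated SVD can be sketched as follows. This is a generic truncated-SVD least-squares solve at a single frequency, not the authors' implementation. The relative threshold `rel_tol`, the function name `tsvd_solve`, and the random test channel are assumptions made for illustration.

```python
import numpy as np

def tsvd_solve(G, p, rel_tol=1e-2):
    """Least-squares solution of G x = p using only singular values >= rel_tol * s_max."""
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    keep = s >= rel_tol * s[0]
    # x = V_keep diag(1 / s_keep) U_keep^H p
    return Vh[keep].conj().T @ ((U[:, keep].conj().T @ p) / s[keep])

# Hypothetical channel: 12 control points, 6 loudspeakers, one frequency bin.
rng = np.random.default_rng(0)
G = rng.standard_normal((12, 6)) + 1j * rng.standard_normal((12, 6))
p = rng.standard_normal(12) + 1j * rng.standard_normal(12)

x_full = tsvd_solve(G, p, rel_tol=0.0)   # no truncation: ordinary least squares
x_trunc = tsvd_solve(G, p, rel_tol=0.5)  # aggressive truncation: lower effort
```

Discarding small singular values can only reduce the norm of the driving vector, which is how truncation bounds the loudspeaker effort at the cost of some reproduction accuracy.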
 
Article
A new method for the combined simulation of the audio and haptic response of an acoustic environment for arbitrary musical source material based on physical measurements is presented. Tactile signals are reproduced by the convolution of "decoupled" vibration with impulse responses derived from mechanical impedance measurements. Audio signals are reproduced by the convolution of anechoic sound with binaural room impulse responses. Playback is accomplished through headphones and a calibrated motion platform. Benefits of combining the new method of tactile signal reproduction with known audio auralization techniques include the ability to perform side-by-side listening tests for audio-tactile stimuli perceived in real music performance situations. Details of the method are discussed along with obstacles and applications. Structural response measurements indicate that propagation time and magnitude attenuation with distance vary perceptibly depending on both floor construction and direction of travel. One method of analysis shows that tactile and acoustical impulse responses vary independently across environments, supporting the need for measured vibration signals in audio-tactile displays. The results of the study can be used to calibrate haptic displays for music reproduction, but also to build test environments for perceptual studies aiming to improve stage construction with regard to vibration characteristics.
 
Article
The Helmholtz equation admits one-parameter (1P) solutions in u, that is, solutions depending on a single spatial coordinate u, if and only if |∇u| and ∇²u are functions of u alone. The |∇u| condition allows u to be transformed to another coordinate ξ, which measures arc length. For a 1P field inside a tube of orthogonal trajectories to the surfaces of constant ξ, the wave equation reduces exactly to Webster's horn equation, in which ξ is the axial coordinate of the horn and S(ξ) the area of a constant-ξ surface segment bounded by the tube. The 1P existence conditions can be expressed in terms of coordinate scale factors and used to determine whether a given coordinate system admits 1P waves. They can also be expressed in terms of the principal curvatures of the constant-ξ surfaces, leading to the unfortunate conclusion that the only coordinates meeting the conditions are those whose level surfaces are parallel planes, coaxial cylinders, or concentric spheres; that is, no new 1P horn geometries remain to be discovered. [P.S.: This material is also covered in chapters 3 to 5 of the author's thesis "Modeling of Horns and Enclosures for Loudspeakers", q.v.]
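For reference, the existence conditions and Webster's horn equation mentioned in the abstract can be written out in their standard time-harmonic form. The symbols p (pressure amplitude), k (wavenumber), and the function names f and g are notation assumed here rather than taken from the paper.

```latex
% 1P existence conditions: both quantities depend on u alone
|\nabla u| = f(u), \qquad \nabla^2 u = g(u)

% Webster's horn equation for p(\xi), with S(\xi) the wavefront area
\frac{1}{S(\xi)} \frac{d}{d\xi}\!\left( S(\xi)\, \frac{dp}{d\xi} \right) + k^2 p = 0

% Area laws of the three admissible wavefront families
S(\xi) = \text{const} \ \text{(parallel planes)}, \qquad
S(\xi) \propto \xi \ \text{(coaxial cylinders)}, \qquad
S(\xi) \propto \xi^2 \ \text{(concentric spheres)}
```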
 
Article
Zoom control is a key feature of audiovisual capture in both professional and consumer cameras. While a video zoom operation is often not complemented with a corresponding acoustic zoom, psychophysical as well as neuroimaging results suggest that a cross-modal approach to zooming may facilitate multisensory integration. As auditory distance perception is primarily determined by sound intensity, an audiovisual zoom effect may be obtained by matching the levels of different sources in a sound scene with their visually perceived motion during video zooming. In this paper, we propose a general theory for independent sound source level control which can be exploited to attain an acoustic zoom effect. An essential feature of the proposed theory is that it does not rely on explicit sound source separation, which reduces its computational requirements. An efficient implementation using fixed and adaptive spatial and spectral noise reduction algorithms is proposed and evaluated. Experimental results using an array with a small number of low-cost microphones confirm that the proposed approach is particularly suited for consumer audiovisual capture applications.
 
Top-cited authors
Ville Pulkki
  • Aalto University
Brian C J Moore
  • University of Cambridge
Brian Glasberg
  • University of Cambridge
Tapio Lokki
  • Aalto University
Mark A Poletti
  • Callaghan Innovation