Article

Computer-generated pulse signal applied for sound measurement

Abstract

A computer-generated pulse signal for sound measurement is discussed. A pulse signal with a flat power spectrum is generated by inverse Fourier transformation. The generation of a time-stretched pulse and its compression method are also considered. Computer-controlled measurement enables time averaging, and reflected sound can be eliminated in computer memory by the operator, who issues instructions while monitoring the acquired waveform on a CRT.
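The generation, averaging, and compression steps summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's own procedure: it assumes the quadratic-phase spectrum commonly associated with the TSP, $H[k] = \exp(j 4\pi m k^2 / N^2)$ for $0 \le k \le N/2$ (conjugate-symmetric above $N/2$ so the waveform is real), and the stretch parameter `m`, the function names, the fixed gating cutoff, and the simulated system are hypothetical. With zero phase, the same flat spectrum would inverse-transform to a single unit pulse; the quadratic phase spreads that energy over roughly $2m$ samples, and compression multiplies by the conjugate spectrum.

```python
import numpy as np


def tsp_half_spectrum(n_fft: int, m: int) -> np.ndarray:
    """Bins 0..n_fft/2 of a flat-magnitude, quadratic-phase spectrum (assumed TSP design)."""
    k = np.arange(n_fft // 2 + 1)
    return np.exp(1j * 4 * np.pi * m * k**2 / n_fft**2)


def make_tsp(n_fft: int, m: int) -> np.ndarray:
    """Time-stretched pulse by inverse FFT; irfft imposes conjugate symmetry, so it is real."""
    return np.fft.irfft(tsp_half_spectrum(n_fft, m), n=n_fft)


def compress(recorded: np.ndarray, m: int) -> np.ndarray:
    """Pulse compression: multiply by the conjugate (= inverse) TSP spectrum."""
    n_fft = len(recorded)
    inverse = np.conj(tsp_half_spectrum(n_fft, m))
    return np.fft.irfft(np.fft.rfft(recorded) * inverse, n=n_fft)


def measure(system_ir: np.ndarray, n_fft: int, m: int, repeats: int,
            noise_std: float, seed: int = 0) -> np.ndarray:
    """Simulate periodic TSP playback through a system, average the periods, compress."""
    rng = np.random.default_rng(seed)
    tsp = make_tsp(n_fft, m)
    # Steady-state response to a periodic excitation is a circular convolution
    response = np.fft.irfft(np.fft.rfft(tsp) * np.fft.rfft(system_ir, n_fft), n=n_fft)
    recordings = response + noise_std * rng.standard_normal((repeats, n_fft))
    averaged = recordings.mean(axis=0)          # synchronous time averaging
    return compress(averaged, m)


def remove_reflections(ir: np.ndarray, cutoff: int) -> np.ndarray:
    """Zero the response from sample `cutoff` on, e.g. just before the first reflection."""
    gated = ir.copy()
    gated[cutoff:] = 0.0
    return gated


if __name__ == "__main__":
    n_fft, m = 1024, 256
    tsp = make_tsp(n_fft, m)
    print("flat magnitude:", np.allclose(np.abs(np.fft.rfft(tsp)), 1.0))
    print("TSP peak amplitude:", np.max(np.abs(tsp)))   # a unit impulse would peak at 1
    h = np.zeros(n_fft)
    h[[3, 10, 25]] = [1.0, 0.5, -0.25]          # hypothetical direct sound + two reflections
    h_est = measure(h, n_fft, m, repeats=16, noise_std=0.05)
    print("max error after averaging:", np.max(np.abs(h_est - h)))
    print("direct sound only:", np.nonzero(np.abs(remove_reflections(h_est, 8)) > 0.5)[0])
```

Because the compression filter has unit magnitude, it does not amplify the noise that remains after averaging; averaging $R$ repetitions lowers the uncorrelated noise amplitude by about $\sqrt{R}$.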

... The RIR can be recorded from a physical environment using different techniques [4][11][12][13]. ...
... To overcome the limitations of synthetic RIRs, real RIRs are recorded in a controlled environment using different techniques [11][12][13]. The maximum length sequence method [69], the time-stretched pulses method [11], and the exponential sine sweep method [12] are common methods to measure real RIRs. Among these approaches, the exponential sine sweep method is robust to changing loudspeaker output volume and performs well in automatic speech recognition tasks. ...
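The maximum length sequence method cited above can be illustrated with a short sketch: a ±1 MLS generated by a linear feedback shift register, and impulse-response recovery by circular cross-correlation. The tap sets `(4, 3)` and `(10, 7)` correspond to primitive polynomials commonly listed for 4- and 10-bit registers; the function names and the simulated system are hypothetical, and a real measurement would play at least two periods and analyze a steady-state one.

```python
import numpy as np


def mls(n_bits: int, taps: tuple[int, ...]) -> np.ndarray:
    """One period (2**n_bits - 1 samples) of a +/-1 maximum length sequence.

    Fibonacci LFSR; `taps` are 1-indexed register positions that must come
    from a primitive polynomial, e.g. (4, 3) or (10, 7).
    """
    state = [1] * n_bits
    period = 2**n_bits - 1
    seq = np.empty(period)
    for i in range(period):
        seq[i] = 1.0 - 2.0 * state[-1]        # bit 0 -> +1, bit 1 -> -1
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]       # shift the register by one position
    return seq


def circular_xcorr(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """r[k] = sum_n a[n] * b[(n - k) mod N], computed via FFT."""
    n = len(a)
    return np.fft.irfft(np.fft.rfft(a) * np.conj(np.fft.rfft(b)), n=n)


def mls_measure(system_ir: np.ndarray, seq: np.ndarray) -> np.ndarray:
    """Estimate a system IR from its steady-state response to a periodic MLS."""
    n = len(seq)
    h = np.zeros(n)
    h[: len(system_ir)] = system_ir
    # One period of the periodic response is a circular convolution
    response = np.fft.irfft(np.fft.rfft(seq) * np.fft.rfft(h), n=n)
    # The MLS autocorrelation is (n + 1) * delta - 1, so dividing by (n + 1)
    # recovers h up to a small constant offset of -sum(h) / (n + 1)
    return circular_xcorr(response, seq) / (n + 1)


if __name__ == "__main__":
    seq = mls(10, (10, 7))                           # period 1023
    h_true = np.array([0.0, 1.0, 0.5, -0.25])        # hypothetical system
    print(np.round(mls_measure(h_true, seq)[:6], 3))
```

The recovery relies on the periodic autocorrelation of a ±1 MLS of period $L = 2^n - 1$ being $L$ at lag 0 and $-1$ at every other lag, which is why the estimate carries a small constant offset of $-\sum h/(L+1)$.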
Preprint
Sound propagation is the process by which sound energy travels through a medium, such as air, to the surrounding environment as sound waves. The room impulse response (RIR) describes this process and is influenced by the positions of the source and listener, the room's geometry, and its materials. Physics-based acoustic simulators have been used for decades to compute accurate RIRs for specific acoustic environments. However, we have encountered limitations with existing acoustic simulators. To address these limitations, we propose three novel solutions. First, we introduce a learning-based RIR generator that is two orders of magnitude faster than an interactive ray-tracing simulator. Our approach can be trained to take both statistical and traditional parameters directly as input, and it can generate both monaural and binaural RIRs for both reconstructed and synthetic 3D scenes. Our generated RIRs outperform interactive ray-tracing simulators in speech-processing applications, including ASR, speech enhancement, and speech separation. Second, we propose estimating RIRs from reverberant speech signals and visual cues without a 3D representation of the environment. By estimating RIRs from reverberant speech, we can augment training data to match test data, improving the word error rate of the ASR system. Our estimated RIRs achieve a 6.9% improvement over previous learning-based RIR estimators in far-field ASR tasks. We demonstrate that our audio-visual RIR estimator aids tasks such as visual acoustic matching, novel-view acoustic synthesis, and voice dubbing, validated through perceptual evaluation. Finally, we introduce IR-GAN to augment accurate RIRs using real RIRs. IR-GAN parametrically controls acoustic parameters learned from real RIRs to generate new RIRs that imitate different acoustic environments, outperforming ray-tracing simulators on the far-field ASR benchmark by 8.95%.
... Tikander obtained positive results after testing an equalised hear-through function on a portable AAR system in real-life situations, and observed some adaptation after the subjects wore the device for at least 1.5 hours [3]. Previous studies have suggested that humans can adapt to changes in their Head-Related Transfer Function (HRTF) if they are exposed to them for an extended period of time [9][10][11]. The goal of this study is to further explore the effect of long-term user adaptation to an AAR system built with off-the-shelf components (rather than custom-made ones), aiming to achieve perceived realism in an equalised hear-through function. ...
... Length Sequence (MLS) [151], the Inverse Repeated Sequence (IRS) [51], or to the Time-Stretched Pulse [10] impulse response recording method. ...
... • Participants had a high approval rate on Prolific (> 95%) and had made more than 10 submissions to previous studies. ...
Thesis
This thesis aims to investigate a variety of effects linking the auditory distance perception of virtual sound sources to the context of audio-only augmented reality (AAR) applications. It focuses on how the specific perceptual context and primary objectives of AAR impose constraints on the design of the distance rendering approach used to generate virtual sound sources for AAR applications. AAR refers to a set of technologies that aim to merge computer-generated auditory content into a user's acoustic environment. AAR systems have a fundamental requirement: the audio playback system must enable a seamless integration of virtual sound events into the user's environment. These critical requirements raise several challenges. The first part of the thesis concerns the critical role of acoustic cue reproduction in the auditory distance perception of virtual sound sources in the context of AAR. Auditory distance perception relies on a range of cues categorized as acoustic and cognitive. We examined which cue-weighting strategies the auditory system uses to create the perception of sound distance. By considering different spatial and temporal segmentations, we attempted to characterize how early energy is perceived in relation to reverberation. The second part of the thesis focuses on how, in AAR applications, environment-related cues could affect the perception of virtual sound sources. In AAR applications, the geometry of the environment is not always fully taken into account. In particular, the calibration effect that the perception of the visual environment induces on auditory perception is generally overlooked. We also examined cases in which co-occurring real sound sources, whose positions are unknown to the user, could affect the auditory distance perception of virtual sound sources through an intra-modal calibration effect.
Presentation
Visual and acoustic environments may influence the perception of auditory distance. In the context of audio-only augmented reality (AAR), the coherence of the perceived virtual sound sources with the apparent room geometry and acoustics cannot always be guaranteed. The perceptual consequences of these incoherences are not well known. We conducted two online perceptual studies with a sound distance rendering model based on measured spatial room impulse responses (SRIRs). The first study evaluated the perceptual performance of the model in incongruent visual contexts. The incongruent environment-related visual cues (spatial visual boundary and room volume) had a significant effect on the auditory distance perception (ADP) of virtual sound sources, through a calibration effect. The second study evaluated the impact of acoustical incongruence. Virtual sound source distances were judged after participants listened to distracting sound sources conveying distance cues from a different acoustical environment. When these distracting sound sources corresponded to a larger room than the one reproduced by the model, a stronger compression effect was observed on the ADP of virtual sound sources. However, when the intensity cue conveyed by the distracting sound sources was coherent with the acoustical environment simulated by the model, their distracting effect was negligible.
... Periodic pulses [10,11] were often used to counteract the effect of noise. Time-stretched pulses [12,13], composed of an expanded pulse excitation, have been proposed to overcome limitations on pulse amplitude: high-amplitude pulses, if electronically generated, could damage the loudspeaker or activate the amplifier's protection circuit. Periodic sequences have been widely used in RIR estimation. ...
... The number of nonlinear equations in the reduced system (12) increases exponentially with the order $K$, geometrically with the diagonal number $D$, but only linearly with the memory length $N_T$. The Newton-Raphson method still has computational and memory requirements that can be prohibitively large, since they increase with the cube of the number of equations, i.e., $Q^3$. ...
... , $a_1$. For every pair of symmetric joint moments (e.g., $\langle x(n)x^2(n-2)\rangle_P$ and $\langle x^2(n)x(n-2)\rangle_P$), only one has to be considered in (12). • Oddness: for any $N$-tuple $a_1, a_2, \ldots$ ...
Article
The paper discusses a measurement approach for the room impulse response (RIR) that is insensitive to the nonlinearities affecting the measurement instruments. The approach uses the perfect periodic sequences for Wiener nonlinear (WN) filters as measurement signals. Perfect periodic sequences (PPSs) are periodic sequences that guarantee perfect orthogonality of a filter's basis functions over a period. PPSs for WN filters are appealing for RIR measurement, since their sample distribution is almost Gaussian and provides a low excitation to the highest amplitudes. RIR measurement using PPSs for WN filters is studied, and its advantages and limitations are discussed. The derivation of PPSs for WN filters suitable for RIR measurement is detailed. Identification limitations caused by underestimating the RIR memory or the order of nonlinearity, as well as the effect of measurement noise, are analysed and estimated. Finally, experimental results, involving both simulations using signals affected by real nonlinear devices and real RIR measurements in the presence of nonlinearities, compare the proposed approach with approaches based on PPSs for Legendre nonlinear filters, maximal length sequences, and exponential sweeps.
... We found, by chance, that FVN provides powerful tools for assessing acoustic systems [6]. In a broad sense, FVN is a new member of the TSP (Time-Stretched Pulse) family [7][8][9][10][11][12]. Well-known TSP members are pseudo-random noise (PN) and swept-sine (SS) signals. ...
... Well-known TSP members are pseudo-random noise (PN) and swept-sine (SS) signals. They provide an essential infrastructure for acoustic measurement and assessment [7,8,12-15]. It is crucial to acquire reusable speech materials [16,17] and to present sound stimuli to participants of subjective tests using adequately prepared sound devices and environments. ...
... TSP-based methods [7][8][9][10][11] are commonly used in acoustic measurement and are appropriate for measuring linear time-invariant responses. However, they require extra equipment or expert inspection to analyze non-linear responses and inter-modulation distortion [12][13][14][15]. ...
Preprint
We introduce a new member of the TSP (Time-Stretched Pulse) family for acoustic and speech measurement infrastructure, based on a simple all-pass filter and systematic randomization. This new infrastructure fundamentally upgrades our previous measurement procedure, which enables simultaneous measurement of multiple attributes, including non-linear ones, without requiring extra filtering or post-processing. Our new proposal establishes a theoretically solid, flexible, and extensible foundation for acoustic measurement. Moreover, it is general enough to provide versatile research tools for other fields, such as biological signal analysis. We illustrate it with acoustic measurements and data augmentation as representative examples among various prospective applications. We have open-sourced a MATLAB implementation consisting of an interactive, real-time acoustic tool, MATLAB functions, and supporting materials.
... This results in a time-domain signal resembling a sine with varying instantaneous frequency. When the measurement is finished, the recorded signal is temporally compressed so that all frequencies again appear at the same time instant, followed by the measured sound decay [134,135]. ...
... The LSS is depicted in the top part of Fig. 3.1. Since all the frequencies are emitted for the same amount of time, the spectrum of the LSS is flat in the frequency region of interest [134,135]. ...
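The linear-sweep description above can be checked numerically. The sketch below, assuming illustrative values (8 kHz sampling, 100 Hz to 3 kHz over 1 s) and hypothetical function names, builds a linear sine sweep, measures the in-band flatness of its magnitude spectrum, and compresses it by convolving with the time-reversed sweep, which for a flat-spectrum sweep acts as an inverse filter up to a scale factor.

```python
import numpy as np


def linear_sweep(f1: float, f2: float, duration: float, fs: float) -> np.ndarray:
    """Linear sine sweep: instantaneous frequency rises from f1 to f2 at a constant rate."""
    t = np.arange(int(round(duration * fs))) / fs
    phase = 2 * np.pi * (f1 * t + (f2 - f1) * t**2 / (2 * duration))
    return np.sin(phase)


def fft_convolve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linear convolution via zero-padded FFTs."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)


if __name__ == "__main__":
    fs, f1, f2, duration = 8000.0, 100.0, 3000.0, 1.0
    x = linear_sweep(f1, f2, duration, fs)
    # Every frequency is swept for the same time, so the in-band magnitude is flat
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(x)))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    band = (freqs > 500) & (freqs < 2500)
    print("in-band ripple (dB):", spectrum_db[band].max() - spectrum_db[band].min())
    # Compression: the time-reversed sweep realigns all frequencies into one peak
    compressed = fft_convolve(x, x[::-1])
    print("peak index:", np.argmax(np.abs(compressed)), "expected:", len(x) - 1)
```

Ripple is concentrated near the sweep's start and stop frequencies, where the abrupt onset and cutoff cause Fresnel-type oscillations; fading the sweep in and out is a common way to reduce it.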
Thesis
In this dissertation, the discussion centers on sound energy decay in enclosed spaces. The work starts with methods to predict reverberation parameters, continues with room impulse response measurement procedures, and ends with an analysis of techniques to digitally reproduce the sound decay. Research on reverberation in physical spaces began when the first formula for calculating a room's reverberation time emerged. Since then, finding an accurate and reliable method to predict reverberation has been an important area of acoustic research. This thesis presents a comprehensive comparison of the most commonly used reverberation time formulas, describes their applicability in various scenarios, and discusses their accuracy compared with measurement results. Common sources of uncertainty in reverberation time calculations, such as bias introduced by air absorption and errors in sound absorption coefficients, are analyzed as well. The thesis shows that reducing such uncertainties leads to good prediction accuracy of the Sabine and Eyring equations under diverse sound absorption distributions. Measuring sound energy decay plays a crucial part in understanding sound propagation in physical spaces. Numerous techniques to capture room impulse responses are available today, each with its own advantages and drawbacks. This dissertation lists most of the commonly used measurement techniques and describes the exponential swept-sine method in more detail. It elaborates on external factors that may impair the measurements and introduce errors into their results, such as stationary and non-stationary noise, as well as time variance. The dissertation introduces the Rule of Two, a method for detecting non-stationary disturbances in sweep measurements. It also shows the importance of using the median as a robust estimator in non-stationary noise detection.
Artificial reverberation is a popular sound effect used to synthesize sound energy decay for audio production. This dissertation offers insight into artificial reverberation algorithms based on recursive structures. The filter design proposed in this work offers precise control over the decay rate while being efficient enough for real-time implementation. The thesis discusses the role of the delay lines and the feedback matrix in achieving high echo density in feedback delay networks. It also shows that four velvet-noise sequences are sufficient to obtain a smooth output in an interleaved velvet-noise reverberator. The thesis shows that increasing reproduction accuracy increases the perceptual similarity between measured and synthesized impulse responses. Overall, this dissertation sheds light on the intricacies of reverberation prediction, measurement, and synthesis. The results allow for reliable estimation of parameters related to sound energy decay and offer an improvement in the field of artificial reverberation.
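For the exponential swept-sine method discussed in the dissertation, a common formulation (the one usually attributed to Farina, assumed here rather than taken from the thesis) uses $x(t) = \sin(2\pi f_1 L (e^{t/L} - 1))$ with $L = T/\ln(f_2/f_1)$, and an inverse filter equal to the time-reversed sweep weighted by $e^{-t/L}$. After deconvolution, the $k$-th harmonic distortion product appears $L \ln k$ seconds before the linear response. The sweep parameters, function names, and the memoryless nonlinearity `x + 0.5 * x**2` below are hypothetical.

```python
import numpy as np


def exp_sweep(f1: float, f2: float, duration: float, fs: float):
    """Exponential sine sweep x and its inverse filter (assumed Farina-style design)."""
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    rate = duration / np.log(f2 / f1)            # the constant L in the formula
    x = np.sin(2 * np.pi * f1 * rate * (np.exp(t / rate) - 1.0))
    # Time reversal undoes the sweep's phase; the exp(-t/L) envelope (-6 dB/octave)
    # compensates the sweep's 1/f (pink) energy distribution.
    inverse = x[::-1] * np.exp(-t / rate)
    return x, inverse, rate


def fft_convolve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linear convolution via zero-padded FFTs."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)


def harmonic_peak_index(order: int, n: int, rate: float, fs: float) -> float:
    """Sample index of the k-th order response: L*ln(k) seconds before the linear peak."""
    return (n - 1) - rate * np.log(order) * fs


if __name__ == "__main__":
    fs, f1, f2, duration = 16000.0, 50.0, 3500.0, 1.0
    x, inverse, rate = exp_sweep(f1, f2, duration, fs)
    n = len(x)
    y = x + 0.5 * x**2                  # hypothetical weakly nonlinear system
    h = fft_convolve(y, inverse)
    print("linear peak at", np.argmax(np.abs(h)), "expected", n - 1)
    search = np.abs(h[: n - 1 - 500])   # exclude the linear peak's neighbourhood
    print("2nd-order peak at", np.argmax(search),
          "expected about", round(harmonic_peak_index(2, n, rate, fs)))
```

Because the distortion products land before the linear response in time, they can be cut away with a window, which is the main practical advantage usually cited for this method.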
... The topic of this paper, audio peak reduction, is a close relative of other phase-processing techniques that lead to inaudible or nearly inaudible results, such as group-delay equalization [9][10][11], decorrelation [12][13][14][15], and signal-processing techniques used for upmixing [16]. Chirps or sweeps similar to the ones discussed in this paper are used in measurements [17][18][19][20], short ones especially for estimating the properties of time-varying systems [21,22]. This paper explores the Lynch and Orban-Foti techniques for audio peak reduction. ...
... A finite-length chirp with a completely flat spectrum cannot be synthesized. However, avoiding discontinuities in the frequency response can flatten the spectrum [17,19]. ...
Article
Two filtering methods for reducing the peak value of audio signals are studied. Both methods essentially warp the signal phase while leaving its magnitude spectrum unchanged. The first technique, originally proposed by Lynch in 1988, consists of a wideband linear chirp. The listening test presented here shows that the chirp must be no longer than 4 ms to avoid any audible change in timbre. The second method, called the phase rotator and put forward by Orban and Foti in 2001, is based on a cascade of second-order all-pass filters. This work proposes extensions to improve the performance of both methods, including rules for choosing the parameter values. A comparison with previous methods in terms of achieved peak reduction, using a collection of short audio signals, is presented. The computational load of both methods is low enough for real-time application. The extended phase rotator method is found to be superior to the linear chirp method and comparable to the other search methods. The practical peak reduction obtained with the proposed methods ranges from 0 to about 3.5 dB. The signal processing methods presented in this work can increase loudness or save power in audio playback.
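The phase-rotator principle, a cascade of second-order all-pass sections that alters phase while leaving the magnitude spectrum untouched, can be sketched as follows. The section design uses the widely published RBJ "Audio EQ Cookbook" all-pass biquad, and the (f0, Q) pairs are hypothetical placeholders; they are not the parameter-selection rules proposed in the article.

```python
import numpy as np


def allpass_biquad(f0: float, q: float, fs: float):
    """Second-order all-pass coefficients (b, a), normalized so a[0] == 1."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    b = a[::-1].copy()              # numerator = reversed denominator -> |H| = 1
    return b / a[0], a / a[0]


def biquad_filter(b, a, x):
    """Direct-form I difference equation for one second-order section."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for i, xi in enumerate(x):
        yi = b[0] * xi + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xi
        y2, y1 = y1, yi
        y[i] = yi
    return y


def phase_rotator(x, fs, sections=((200.0, 0.7), (200.0, 0.7), (400.0, 0.7))):
    """Cascade of all-pass sections; the default (f0, Q) pairs are hypothetical."""
    for f0, q in sections:
        b, a = allpass_biquad(f0, q, fs)
        x = biquad_filter(b, a, x)
    return x


def magnitude_response(sections, fs, freqs):
    """|H(e^jw)| of the cascade at the given frequencies (should be 1 everywhere)."""
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs, dtype=float) / fs)
    h = np.ones_like(z_inv)
    for f0, q in sections:
        b, a = allpass_biquad(f0, q, fs)
        h *= (b[0] + b[1] * z_inv + b[2] * z_inv**2) / (a[0] + a[1] * z_inv + a[2] * z_inv**2)
    return np.abs(h)


if __name__ == "__main__":
    fs = 48000.0
    sections = ((200.0, 0.7), (200.0, 0.7), (400.0, 0.7))
    print(magnitude_response(sections, fs, [50.0, 200.0, 1000.0, 10000.0]))
    impulse = np.zeros(4800)
    impulse[0] = 1.0
    out = phase_rotator(impulse, fs, sections)
    print("peak:", np.max(np.abs(out)), "energy:", np.sum(out**2))
```

Because each section has unit magnitude, the cascade preserves signal energy; whether the peak of a given audio signal actually drops depends on the signal and on the chosen sections, which is what the article's parameter rules address.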
... Moreover, one must invest in a playback system that becomes more expensive and complex to install as the number of channels to be played increases. 4. With a special mention of Nintendo's Virtual Boy console, released in 1995, whose technology already allowed stereoscopic rendering. ...
... The principle is to approximate a Dirac impulse as closely as possible and then to improve the signal-to-noise ratio. That said, it has been shown that their use in HRTF measurements can easily introduce artifacts if the head moves [164]. [4], it brings together a series of methods whose common feature is an attempt to "stretch" a Dirac impulse in time, perform the desired acoustic measurement, and then correctly "compress" the response to recover the result that a true Dirac impulse would have produced. Most often, the excitation signal takes the form of a frequency ramp (sweep), and the compression stage is performed by simply convolving the measurement with the inverse ramp. ...
Thesis
The term "binaural" refers to the field of research aimed at understanding and mastering the mechanisms that allow human beings to perceive the spatial origin of sounds. This perception emerges from our ability to detect certain localization cues within our sound environment, and these cues arise from the interaction of sounds with our body, in particular our ears. To reproduce a spatialization effect over headphones, one must therefore both reintroduce these cues and personalize their generation by adapting it to the listener's morphology. To achieve this, we devised an original process based on a deformable 3D ear model and on a preliminary study of the links between morphology and HRTFs. We demonstrate its feasibility here by putting it into practice with synthetic databases created for the occasion. This data generation also led us to propose optimizations to the numerical computation of HRTFs and to consider possible improvements to make its subjective rendering more reliable.
... Various types of TSPs provide a means to circumvent this difficulty. Swept-sine and MLS (Maximum Length Sequence) signals are representative types of TSPs [4], [5], [6], [7]. They have temporally spread waveforms and can be compressed back into a virtual impulse. ...
... Therefore, the convolution of $h_{\mathrm{fvn}}[n]$ and $h_{\mathrm{fvn}}[-n]$ yields a unit impulse. This pulse recovery is the behavior that makes TSP signals useful for impulse response measurement [4], [5], [6], [7]. ...
Preprint
We introduce a new acoustic measurement method that can measure the linear time-invariant response, the nonlinear time-invariant response, and random and time-varying responses simultaneously. The method uses a set of orthogonal sequences made from a set of unit FVNs (Frequency domain variant of Velvet Noise), a new member of the TSP (Time-Stretched Pulse) family. FVN has a unique feature that other TSP members lack: a high degree of design freedom, which makes the proposed method possible without introducing extra equipment. We introduce two useful cases using two and four orthogonal sequences and illustrate their use with simulations and acoustic measurement examples. We developed an interactive, real-time acoustic analysis tool based on the proposed method and made it available in an open-source repository. The proposed response analysis method is general and applies to other fields, such as auditory-feedback research and the assessment of sound recording and coding.
... This approach was not successfully applied at that time, mainly because of the immature hardware and software technology used to perform measurements [70]. Different types of sweep signals can be found in the literature, such as linear sweeps [46,71,72], exponential sweeps [41,70], red-colored sweeps [38,73], sweeplets [74], hyperbolic sweeps [75], and constant-SNR sweeps [76]. Among them, linear sweeps (which some researchers call time-stretched pulses, TSP [71,72]) and exponential sweeps (sometimes called log sweeps or logarithmic sweeps) are two popular sweep types often used to measure acoustic impulse responses. ...
Article
A head-related transfer function (HRTF) describes an acoustic transfer function between a point sound source in the free-field and a defined position in the listener’s ear canal, and plays an essential role in creating immersive virtual acoustic environments (VAEs) reproduced over headphones or loudspeakers. HRTFs are highly individual, and depend on directions and distances (near-field HRTFs). However, the measurement of high-density HRTF datasets is usually time-consuming, especially for human subjects. Over the years, various novel measurement setups and methods have been proposed for the fast acquisition of individual HRTFs while maintaining high measurement accuracy. This review paper provides an overview of various HRTF measurement systems and some insights into trends in individual HRTF measurements.
... Figure 5 shows the arrangements of the equipment in the test room. The impulse responses were measured in the target and test rooms using the time-stretched pulse (TSP) method [29]. The reverberation time of the target room, calculated from the impulse response, is 940 ms, whereas that of the test room is 350 ms. ...
Article
Highly realistic sound-field reproduction systems have been attracting attention owing to their ability to reproduce a sound field. These systems commonly use electrodynamic loudspeakers (EDLs) to construct sound images. The directivity of EDLs is broad, so the constructed sound images become diffused owing to the reverberation characteristics of the room. Therefore, sharp sound-image construction requires a large, complex system. As a more feasible approach, we focus on parametric array loudspeakers (PALs). The directivity of PALs is narrow, so the constructed sound image is sharp. Therefore, we propose a sound-field reproduction system that involves using a PAL to construct a sharp sound image and EDLs to reproduce the sensation of reverberation. To reproduce the sensation of reverberation, the early reflections of the target sound field are calculated using the mirror-image method, and virtual early reflections are produced. Thus, the sharpness of the sound image and the sensation of reverberation can be easily controlled to reproduce a highly realistic sound field. We conducted objective evaluation experiments to verify the effectiveness of the proposed system.
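The mirror-image method mentioned above can be sketched for the simplest case: first-order image sources in a rectangular (shoebox) room. Assumptions not taken from the article: the room geometry, a single frequency-independent reflection coefficient `beta`, a speed of sound of 343 m/s, and 1/r spherical spreading; the function names are hypothetical, and rendering the reflections through EDLs is not shown.

```python
import numpy as np


def first_order_images(source, room):
    """Image sources obtained by mirroring `source` in the six walls of a shoebox room.

    The room spans [0, Lx] x [0, Ly] x [0, Lz]; `room` = (Lx, Ly, Lz) in metres.
    """
    source = np.asarray(source, dtype=float)
    images = []
    for axis, length in enumerate(room):
        for wall in (0.0, length):
            image = source.copy()
            image[axis] = 2.0 * wall - source[axis]   # mirror across the wall plane
            images.append(image)
    return np.array(images)


def early_reflections(source, receiver, room, c=343.0, beta=0.8):
    """(delay in s, amplitude) for the direct path and the six first-order reflections.

    Amplitude = reflection coefficient / distance (1/r spreading); `beta` is an
    assumed frequency-independent wall reflection coefficient.
    """
    receiver = np.asarray(receiver, dtype=float)
    paths = [(np.asarray(source, dtype=float), 1.0)]
    paths += [(image, beta) for image in first_order_images(source, room)]
    result = []
    for position, reflection in paths:
        distance = np.linalg.norm(position - receiver)
        result.append((distance / c, reflection / distance))
    return result


if __name__ == "__main__":
    for delay, gain in early_reflections((1.0, 2.0, 1.5), (4.0, 2.0, 1.5), (6.0, 4.0, 3.0)):
        print(f"{delay * 1000:6.2f} ms  gain {gain:.3f}")
```

Higher-order images are obtained by mirroring the images again, which is how the method extends to later reflections at growing computational cost.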
... The rightmost gray frame represents the test signal for the simultaneous measurement. In the figure, "MLS: Maximum Length Sequence" and "Swept-sine" are commonly used test signals for impulse response measurements [28,29]. The term "TSP: Time Stretched Pulse" represents them, and our CAPRICEP [16] is a new family member of TSP. ...
Preprint
We propose protocols for acquiring speech materials, making them reusable for future investigations, and presenting them for subjective experiments. We also provide means to evaluate the compatibility of existing speech materials with target applications. We built these protocols and tools on structured test signals and analysis methods, including a new family member of the Time-Stretched Pulse (TSP). Computational resources (including software development resources) over a billion times more powerful than those of half a century ago make these protocols and tools accessible to under-resourced environments.
... The proposed measurement technique can capture, at densely spaced positions, the radiation characteristics of a sound source that can repeatedly output the same acoustic signal, by recording signals each time the turntable is rotated. When the sound source can output a measurement signal, the radiation characteristics can be captured as impulse responses using time-stretched pulse (TSP) [29] and maximum length sequence (MLS) [30] signals. ...
Article
The finite-difference time-domain (FDTD) method has been proposed and used for sound field simulation. To reproduce actual sound wave propagation in sound field simulations, the radiation characteristics of the source must be applied. With the FDTD method, radiation characteristics can be applied by setting the sound pressure on a dense grid. However, conventional techniques for capturing radiation characteristics use a sparse microphone array and are considered insufficient for FDTD simulation. Furthermore, the technique required to apply captured acoustic signals on a dense grid in the FDTD method has not been considered. In this paper, we propose a novel hardware and software system that captures the radiation characteristics on a dense grid and applies them to the FDTD method, while controlling sound wave propagation with a non-propagation region. The proposed system yields average differences from measured values of 1.8 dB in sound pressure, 0.04 ms in propagation time, 700 Hz in center frequency, and 3.5 dB in log-spectral distortion, which is more accurate than conventional techniques. The results show that this system is useful for improving the accuracy of sound wave propagation reproduction in sound field simulation.
... Dan et al. [37] proposed an integrated framework using a Bayesian model and an expectation-maximization algorithm to calibrate time offsets and microphone positions, particularly in the context of multiple sound sources. However, since these methods rely on special calibration sounds, such as hand-clapping and time-stretched pulses (TSP) [38], it is difficult to achieve real-time calibration. In addition, most existing techniques focus on calibrating the positions of the microphone array and sound sources, rather than directly estimating the ATF sets, which are critical for effective SSL and SSS. ...
Article
In this paper, we propose an online adaptation method for Fourier series-based acoustic transfer function (FS-ATF) models for robot audition systems using microphone array signal processing. The ATF represents the characteristics of signal propagation from a sound source to a microphone, which is essential for sound source localization (SSL) and sound source separation (SSS). For practical applications in dynamically changing real-world environments, ATF models must satisfy two criteria: (1) they must be adaptable to changes in the acoustic environment; and (2) they must be lightweight to be suitable for resource-constrained systems, such as robots with limited memory and computational capacity. The proposed method addresses these challenges using Fourier series expansions for interpolation, which reduces the memory footprint of the ATF model and facilitates online adaptation to acoustic environmental changes. The experimental results demonstrate that the proposed online adaptation method improves both SSL and SSS performance while reducing the size of the ATF model, which represents a significant improvement over existing online ATF adaptation methods.
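The Fourier-series ATF model can be illustrated, for a single microphone and frequency bin, by fitting a truncated complex Fourier series over azimuth to a sparse set of measured transfer functions and then evaluating it on a dense grid. This is a sketch under assumptions (azimuth-only dependence, least-squares fitting, hypothetical function names and data); it does not reproduce the paper's online adaptation step.

```python
import numpy as np


def fit_fourier_atf(angles: np.ndarray, atf: np.ndarray, order: int) -> np.ndarray:
    """Least-squares coefficients c_m (m = -order..order) of H(theta) = sum_m c_m e^{j m theta}.

    `angles` are measurement azimuths in radians and `atf` the complex transfer
    function values for one microphone and one frequency bin.
    """
    m = np.arange(-order, order + 1)
    basis = np.exp(1j * np.outer(angles, m))          # (n_angles, 2*order + 1)
    coeffs, *_ = np.linalg.lstsq(basis, atf, rcond=None)
    return coeffs


def eval_fourier_atf(coeffs: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Evaluate the truncated Fourier series at arbitrary azimuths (radians)."""
    order = (len(coeffs) - 1) // 2
    m = np.arange(-order, order + 1)
    return np.exp(1j * np.outer(angles, m)) @ coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    measured_az = np.deg2rad(np.arange(0, 360, 30))   # 12 hypothetical measurement angles
    # Hypothetical "measured" ATFs: a smooth azimuth pattern plus a little noise
    pattern = 0.8 + 0.3 * np.exp(1j * measured_az) + 0.1 * np.exp(-2j * measured_az)
    measured = pattern + 0.01 * (rng.standard_normal(12) + 1j * rng.standard_normal(12))
    coeffs = fit_fourier_atf(measured_az, measured, order=3)
    dense_az = np.deg2rad(np.arange(0, 360, 5))
    print(len(coeffs), "coefficients ->", len(eval_fourier_atf(coeffs, dense_az)), "directions")
```

Only $2M+1$ complex coefficients per microphone and bin need to be stored for order $M$, which is the kind of memory saving that motivates a series representation on resource-constrained robots.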
... where RIR represents the intensity and time of arrival of direct sound, early reflections, and late reverberation. The RIR can either be measured in a controlled environment [5,16,24,77] or simulated using physics-based simulators [86,92]. Measuring RIR requires sophisticated hardware and human labor. ...
Conference Paper
Accurate estimation of the Room Impulse Response (RIR), which captures an environment's acoustic properties, is important for speech processing and AR/VR applications. We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and the visual cues of its corresponding environment. AV-RIR builds on a novel neural codec-based architecture that effectively captures environment geometry and material properties and solves speech dereverberation as an auxiliary task by using multi-task learning. We also propose Geo-Mat features that augment material information into visual cues and CRIP that improves late reverberation components in the estimated RIR via image-to-RIR retrieval by 86%. Empirical results show that AV-RIR quantitatively outperforms previous audio-only and visual-only approaches by achieving 36%-63% improvement across various acoustic metrics in RIR estimation. Additionally, it achieves higher preference scores in human evaluation. As an auxiliary benefit, dereverberated speech from AV-RIR shows competitive performance with the state of the art in various spoken language processing tasks and outperforms it in reverberation time error score on the real-world AVSpeech dataset. Qualitative examples of both synthesized reverberant speech and enhanced speech are available online.
... These methods, however, rely on offline processing. In addition, it is difficult to calibrate in real time while localizing and separating sound sources, because special calibration sounds, such as hand claps and time-stretched pulses, are required [31]. Furthermore, most methods focus on calibrating the microphone-array and source positions and do not directly estimate the TF sets needed for SSL and SSS. ...
Conference Paper
This paper proposes an online adaptation method for Fourier series-based acoustic transfer function (TF) models for robot audition systems based on microphone array signal processing. The TF represents the signal propagation characteristics from a sound source to a microphone, which is an essential component for real-world auditory scene analysis, including sound source localization and separation. Real-world applications of TF-based array signal processing require two characteristics: 1) adaptability to changes in the acoustic environment (changes in the signal propagation characteristics between the sound source and the microphone), and 2) a lightweight TF set for use in embedded systems, such as robots, with limited memory and computational resources. This paper proposes an online adaptation method for lightweight TF models using Fourier series expansion that has both of these characteristics. Experimental results showed that TF sets adapted online with the proposed method yield better sound source localization and separation performance than existing online TF adaptation methods.
... The method's drawback is the longer computation time for the deconvolution, which employs high-order FFT and IFFT filters [20]. Another approach to measuring impulse responses is the time-stretched pulse [21]. This approach aims to raise the signal-to-noise ratio while reducing peak distortion. ...
Article
Typically, background noise of different types and levels is present during the measurement of the impulse response in spaces. The two methods most frequently used in practice to measure the impulse response are the exponential sine sweep (ESS) and the maximum length sequence (MLS). This study’s objective was to estimate the impact of background noise (white noise, tonal noise) on the acoustic parameters (T30, EDT, C80, and D50) for ESS and MLS measurements by introducing artificial background noise with an external sound source. For this purpose, measurements were performed with varying levels of external noise (in steps of 2 dB), and the effect was assessed using the relative error compared with measurements without artificial background noise. For white background noise, the difference between the two methods in T30 and EDT, as well as the relative error, was small at the initial levels of added noise. At higher levels of added background noise, however, the relative error increased sharply for both T30 and EDT, and more so for the ESS method. For C80 and D50, the differences between the ESS and MLS methods were initially small, but as the background noise increased, the relative error increased for both methods, with the ESS method showing the larger error. For tonal background noise, the results were consistent with those observed for white noise. The study’s findings contribute to a better understanding of the ESS and MLS methods and indicate the expected relative error of acoustic parameters in the presence of various types and levels of background noise. Additionally, based on the type and level of background noise, the study suggests the optimum method for conducting impulse response measurements.
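As background for the four parameters, the sketch below computes them from an impulse response using Schroeder backward integration and the usual ISO 3382-1 style definitions: EDT from the 0 to -10 dB range and T30 from the -5 to -35 dB range of the decay curve, both extrapolated to 60 dB of decay, and C80 and D50 from the early/late energy split at 80 ms and 50 ms. It is not the study's processing chain; it assumes a noise-free IR that starts at the direct sound, and the synthetic exponential decay and helper names are hypothetical.

```python
import numpy as np


def schroeder_edc_db(h: np.ndarray) -> np.ndarray:
    """Energy decay curve by Schroeder backward integration, normalized to 0 dB."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    edc = np.maximum(edc, np.finfo(float).tiny)  # avoid log10(0) on trailing zeros
    return 10.0 * np.log10(edc / edc[0])


def decay_time(edc_db: np.ndarray, fs: float, start_db: float, end_db: float) -> float:
    """Fit a line to the EDC between start_db and end_db and extrapolate to -60 dB."""
    t = np.arange(len(edc_db)) / fs
    mask = (edc_db <= start_db) & (edc_db >= end_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second (negative)
    return -60.0 / slope


def room_parameters(h: np.ndarray, fs: float) -> dict:
    """EDT and T30 (s), C80 (dB), and D50 (ratio) of an IR assumed to start at the direct sound."""
    energy = np.asarray(h, dtype=float) ** 2
    edc_db = schroeder_edc_db(h)
    n50, n80 = int(round(0.050 * fs)), int(round(0.080 * fs))
    return {
        "EDT": decay_time(edc_db, fs, 0.0, -10.0),
        "T30": decay_time(edc_db, fs, -5.0, -35.0),
        "C80": 10.0 * np.log10(energy[:n80].sum() / energy[n80:].sum()),
        "D50": energy[:n50].sum() / energy.sum(),
    }


# Hypothetical noise-free IR whose energy decays by exactly 60 dB per second (RT = 1 s).
fs, rt = 8000, 1.0
t = np.arange(int(2 * rt * fs)) / fs
h = 10.0 ** (-3.0 * t / rt)  # amplitude envelope; energy is 60 dB down after rt seconds
params = room_parameters(h, fs)
print({name: round(float(value), 3) for name, value in params.items()})
```

With a clean exponential decay, EDT and T30 coincide; added background noise lifts the late part of the decay curve, which is one way noise can bias decay-time estimates.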
... We confirmed the spatial independence of the sites where sensors were located using a frequency receiver test (FRT) designed by an audio specialist (CG). Parameters of the FRT were designed according to standardized principles of outdoor sound propagation and computer-generated pulse signals for sound measurement (Aoshima, 1981; Embleton et al., 1976). To test the acoustic isolation of each sensor location, we placed a Klipsch speaker (KMC3 series), a monopole point source with fixed maximum amplitude, at the location and height of each sensor. ...
... The physical model was fixed so that the nose leaf of the bat was perpendicular to the ground. The impulse response was measured using an upward time-stretched pulse (TSP) (Aoshima, 1981; Suzuki et al., 1995). The TSP length was 500 ms at a sampling rate of 48 kHz (bandwidth 24 kHz). ...
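The TSP cited here (Aoshima, 1981; Suzuki et al., 1995) is a flat-spectrum pulse whose phase is quadratic in frequency, so its energy is spread over time and can later be compressed back into an impulse by the inverse filter. The sketch below uses one common parameterization, H(k) = exp(-j4πmk²/N²) for 0 ≤ k ≤ N/2, with the conjugate-symmetric upper half implied by the real inverse FFT. Sign conventions differ between papers and FFT libraries; the stretch parameter m, the centering shift, and the helper names are assumptions for illustration, not the bat study's code.

```python
import numpy as np


def upward_tsp(n_samples: int, m: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (tsp, inverse) for one period of a linear upward time-stretched pulse.

    n_samples: period N in samples (even).
    m: stretch parameter (integer); the sweep occupies roughly 2 * m samples.
    """
    if n_samples % 2 or not 0 < 2 * m < n_samples:
        raise ValueError("need an even n_samples and 0 < 2*m < n_samples")
    k = np.arange(n_samples // 2 + 1)
    # Flat (unit-magnitude) spectrum with quadratic phase; with NumPy's FFT sign
    # convention the group delay 4*m*k/N grows with frequency, giving an upward sweep.
    spectrum = np.exp(-1j * 4 * np.pi * m * k**2 / n_samples**2)
    tsp = np.fft.irfft(spectrum, n=n_samples)
    # |H(k)| = 1, so the inverse filter 1/H(k) is simply conj(H(k)).
    inverse = np.fft.irfft(np.conj(spectrum), n=n_samples)
    # Center the sweep in the period; the inverse filter gets the opposite circular shift.
    shift = n_samples // 2 - m
    return np.roll(tsp, shift), np.roll(inverse, -shift)


def circular_convolve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular convolution of two equal-length real signals via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


fs = 48_000
n_samples = fs // 2  # 500 ms period, matching the snippet above
tsp, inverse = upward_tsp(n_samples, m=n_samples // 4)  # m chosen for illustration only
compressed = circular_convolve(tsp, inverse)  # pulse compression
print(int(np.argmax(np.abs(compressed))))  # 0: the stretched pulse collapses to a unit impulse
```

In a measurement, the recorded response to the TSP would be circularly convolved with `inverse` in the same way to obtain the impulse response, typically after synchronous averaging over several periods.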
Article
The practicality of the finite-difference time-domain (FDTD) method was confirmed by comparing head-related transfer functions obtained from a three-dimensional (3D) digital model of a bat (Rhinolophus ferrumequinum nippon) head with acoustic experiments using a 3D printed physical model. Furthermore, we simulated the auditory directionality using a 3D digital model that was modified based on the pinna movement of a bat during echolocation and found that the alternating movements of the left and right pinna result in a binaural sound pressure difference for vertical sources. Using the FDTD method, suitable for simulating acoustics in large spaces, we could analyze in detail the binaural echoes that bats receive and the acoustic cues they use for echolocation.
... Measurement signals (or excitation signals) with high energy are used for impulse response measurement to achieve the required noise-reduction performance. Conventionally, white-spectrum signals, such as maximum length sequence (MLS) [5][6][7] and time-stretched pulse (TSP, or linearly swept sine) [8][9][10] signals, have been used as typical measurement signals. An exponentially swept sine (ESS or log-SS) signal [11][12][13][14], which has high energy in the low-frequency band, is currently the predominant measurement signal for room acoustic measurements. ...
Article
It is desirable for a measured acoustic impulse response to have constant normalized noise power (NNP) in all frequency bands. However, the conventional measurement signals aimed at achieving this property were derived intuitively, and their theoretical background is insufficient. In this work, we first theoretically derived the relational formula that measurement signals must satisfy for the measured impulse response to have constant NNP over all frequency bands. This formula covers all measurement signals that achieve constant NNP. We then found the shortest (equivalently, the minimum-energy) measurement signal among them, which we call the bandwise minimum noise (BMN) signal. Experiments measuring room impulse responses were carried out. The experimental results confirmed that the impulse responses measured with the BMN signal had almost constant NNP in all frequency bands. They also confirmed that the BMN signal achieved the NNP required for reverberation time measurement with the shortest signal length among the measurement signals compared.
... For example, we measured the response to pitch perturbation using the maximum length sequence (MLS) [19]. Choosing MLS rather than TSP-type signals [22]-[26] was unavoidable, because the test signal had to be unpredictable. However, MLS has difficulty measuring systems with non-linearity [24], [25]. ...
Preprint
We propose an objective method for measuring the responses of pitch extractors to frequency-modulated signals. The method simultaneously measures the linear and non-linear time-invariant responses as well as the random and time-varying responses. It uses extended time-stretched pulses combined with binary orthogonal sequences. This proposal was motivated by our recent finding of an involuntary voice pitch response to auditory stimulation during voicing. The involuntary voice pitch response provides a means to investigate the subsystems of the voice chain individually and objectively. Analyzing this response requires reliable and precise pitch extraction. Using the proposed method, we found that existing pitch extractors failed to correctly analyze the signals used for auditory stimulation. Therefore, we propose two reference pitch extractors based on instantaneous frequency analysis and multi-resolution power spectrum analysis. The proposed extractors correctly analyze the test signals. We have open-sourced MATLAB code for measuring pitch extractors and for conducting the voice pitch response experiment in our GitHub repository.
... For example, we measured the response to pitch perturbation using the maximum length sequence (MLS) [19]. Choosing MLS rather than TSP-type signals [22]-[26] was unavoidable, because the test signal had to be unpredictable. However, MLS has difficulty measuring systems with non-linearity [24], [25]. ...
Preprint
We introduced a measurement procedure for the involuntary response of the voice fundamental frequency to frequency-modulated auditory stimulation. This involuntary response plays an essential role in voice fundamental frequency control but has been less investigated owing to technical difficulties. This article introduces an interactive, real-time tool for investigating this response, together with supporting tools that adopt our new measurement method. The method enables simultaneous measurement of multiple system properties based on a novel set of extended time-stretched pulses combined with orthogonalization. We have made MATLAB implementations of these tools available in an open-source repository. This article also provides the detailed measurement procedure using the interactive tool, followed by offline measurement tools for conducting subjective experiments and statistical analyses. It also provides technical descriptions of the constituent signal processing subsystems as appendices. This application serves as an example of adopting our method for biological system analysis.
... To overcome these limitations, methods based on stationary noise have been proposed. While Barry (1974) and Hollin and Jones (1977) use white noise, Aoshima (1981) and Suzuki et al. (1995) later proposed a flat-spectrum pulse signal stretched in time by filtering. Other excitation signals were then developed to provide better immunity to background noise, such as MLS (Rife and Vanderkooy, 1999; Schroeder, 1979; Stan et al., 2002) and ...
Preprint
In the context of building acoustics and the acoustic diagnosis of an existing room, this paper introduces and investigates a new approach to estimate mean absorption coefficients solely from a room impulse response (RIR). This inverse problem is tackled via virtually-supervised learning, namely, the RIR-to-absorption mapping is implicitly learned by regression on a simulated dataset using artificial neural networks. We focus on simple models based on well-understood architectures. The critical choices of geometric, acoustic and simulation parameters used to train the models are extensively discussed and studied, while keeping in mind conditions that are representative of the field of building acoustics. Estimation errors from the learned neural models are compared to those obtained with classical formulas that require knowledge of the room's geometry and reverberation times. Extensive comparisons made on a variety of simulated test sets highlight different conditions under which the learned models can overcome the well-known limitations of the diffuse sound field hypothesis underlying these formulas. Results obtained on real RIRs measured in an acoustically configurable room show that at 1 kHz and above, the proposed approach performs comparably to classical models when reverberation times can be reliably estimated, and continues to work even when they cannot.
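The abstract does not name its classical formulas; Sabine's and Eyring's reverberation formulas are the usual examples, and both rest on the diffuse-field hypothesis it mentions. The sketch below inverts them to obtain a mean absorption coefficient from a reverberation time and the room geometry, assuming a shoebox room, a speed of sound of 343 m/s, and no air absorption; the room dimensions and helper names are hypothetical.

```python
import math


def room_volume_and_surface(lx: float, ly: float, lz: float) -> tuple[float, float]:
    """Volume (m^3) and total surface area (m^2) of a shoebox room."""
    return lx * ly * lz, 2.0 * (lx * ly + lx * lz + ly * lz)


def sabine_alpha(volume: float, surface: float, rt60: float, c: float = 343.0) -> float:
    """Mean absorption coefficient from Sabine: T = (24 ln 10 / c) * V / (S * alpha)."""
    k = 24.0 * math.log(10.0) / c  # about 0.161 s/m at c = 343 m/s
    return k * volume / (surface * rt60)


def eyring_alpha(volume: float, surface: float, rt60: float, c: float = 343.0) -> float:
    """Mean absorption coefficient from Eyring: T = (24 ln 10 / c) * V / (-S * ln(1 - alpha))."""
    k = 24.0 * math.log(10.0) / c
    return 1.0 - math.exp(-k * volume / (surface * rt60))


volume, surface = room_volume_and_surface(5.0, 4.0, 3.0)  # hypothetical 5 m x 4 m x 3 m room
print(round(sabine_alpha(volume, surface, 0.5), 3), round(eyring_alpha(volume, surface, 0.5), 3))  # 0.206 0.186
```

For the same reverberation time, Eyring's formula gives a lower mean absorption than Sabine's; the two agree when absorption is small, and both become unreliable when the diffuse-field assumption fails.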
... For example, we measured the response to pitch perturbation using the maximum length sequence (MLS) [21]. Choosing MLS rather than TSP-type signals [24][25][26][27][28] was unavoidable, because the test signal had to be unpredictable. However, MLS has difficulty measuring systems with non-linearity [26,27]. ...
... To overcome the limitations of synthetic RIRs, real RIRs are recorded in a controlled environment. The maximum length sequence method [19], the time-stretched pulses method [20], and the exponential sine sweep method [21] are common methods to measure real RIRs. Among these approaches, the exponential sine sweep method is robust to changing loudspeaker output volume and performs well in automatic speech recognition tasks. ...
... To overcome these limitations, methods based on stationary noise have been proposed. While Barry (1974) and Hollin and Jones (1977) use white noise, Aoshima (1981) and Suzuki et al. (1995) later proposed a flat-spectrum pulse signal stretched in time by filtering. Other excitation signals were then developed to provide better immunity to background noise, such as MLS (Rife and Vanderkooy, 1999; Schroeder, 1979; Stan et al., 2002) and ...
Article
In the context of building acoustics and the acoustic diagnosis of an existing room, this work introduces and investigates a new approach to estimate the mean absorption coefficients solely from a room impulse response (RIR). This inverse problem is tackled via virtually supervised learning, namely, the RIR-to-absorption mapping is implicitly learned by regression on a simulated dataset using artificial neural networks. Simple models based on well-understood architectures are the focus of this work. The critical choices of geometric, acoustic, and simulation parameters, which are used to train the models, are extensively discussed and studied while keeping in mind the conditions that are representative of the field of building acoustics. Estimation errors from the learned neural models are compared to those obtained with classical formulas that require knowledge of the room's geometry and reverberation times. Extensive comparisons made on a variety of simulated test sets highlight different conditions under which the learned models can overcome the well-known limitations of the diffuse sound field hypothesis underlying these formulas. Results obtained on real RIRs measured in an acoustically configurable room show that at 1 kHz and above, the proposed approach performs comparably to classical models when reverberation times can be reliably estimated and continues to work even when they cannot.
... However, generating a strong yet very short pulse (i.e., an impulse) in real-world situations is difficult. Several studies have used alternative methods for calculating impulse responses without generating an actual impulse [27], [28]. This study uses a sine sweep that does not require tight synchronization between the sampling clock of the signal generator and the digitizing unit of the probing device used to capture the response. ...
Article
This study presents a method for predicting the location class of a room, such as a kitchen or restroom, in which a user is located, by discovering location-specific sensor data motifs in data observed by the user's sensor devices, such as a smartwatch, without requiring labeled training data collected in the target environment. For example, we can observe similar waveforms corresponding to kitchen-knife chopping actions using body-worn accelerometers in kitchens, and similar sound features by active sound probing in bathrooms because of their water-resistant walls. This indicates that such location-specific sensor data motifs can serve as inherent information for location class prediction in almost every environment. This study proposes a novel method that automatically detects location-specific motifs in time-series sensor data by calculating a score that represents the “location specificity” of each motif in a time series. Previous studies on location class prediction assume that location-specific sensor data are always observed in a room, or they use handcrafted rules and templates to detect location-specific sensor data, which makes them difficult to apply to many realistic environments. In contrast, our method, named IndoLabel, can automatically discover short sensor data motifs specific to a location class and can automatically build an environment-independent location classifier without requiring handcrafted rules and templates. The proposed method was evaluated in real house environments using leave-one-environment-out cross-validation and achieved state-of-the-art performance even though labeled training data in the target environment were unavailable.
... Swept-sine signals differ in how their frequency changes over time. The instantaneous frequency of a linear swept-sine signal increases or decreases linearly [28,29]. In contrast, the instantaneous frequency of an exponential swept-sine signal increases exponentially, as its name suggests [30][31][32]. ...
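The difference between the two sweep types can be made concrete by writing down their phase functions, whose time derivatives give the instantaneous frequency. The sketch below estimates the instantaneous frequency numerically from the phase; the helper names and the 20 Hz to 20 kHz, 1 s parameters are illustrative assumptions, not taken from the cited works.

```python
import numpy as np


def sweep_phase(f1: float, f2: float, duration: float, fs: float, kind: str = "linear") -> np.ndarray:
    """Phase (rad) of a swept sine from f1 to f2 Hz over `duration` s; the signal is np.sin(phase)."""
    t = np.arange(int(round(duration * fs))) / fs
    if kind == "linear":
        # instantaneous frequency: f1 + (f2 - f1) * t / duration
        return 2 * np.pi * (f1 * t + (f2 - f1) * t**2 / (2 * duration))
    if kind == "exponential":
        # instantaneous frequency: f1 * (f2 / f1) ** (t / duration)
        rate = np.log(f2 / f1)
        return 2 * np.pi * f1 * duration / rate * np.expm1(t * rate / duration)
    raise ValueError(f"unknown sweep kind: {kind}")


def instantaneous_frequency(phase: np.ndarray, fs: float) -> np.ndarray:
    """Finite-difference estimate of the instantaneous frequency in Hz."""
    return np.diff(phase) * fs / (2 * np.pi)


fs = 48_000
linear_f = instantaneous_frequency(sweep_phase(20, 20_000, 1.0, fs, "linear"), fs)
exponential_f = instantaneous_frequency(sweep_phase(20, 20_000, 1.0, fs, "exponential"), fs)
mid = fs // 2
# Halfway through, the linear sweep is near the arithmetic mean of f1 and f2 (about 10 kHz),
# and the exponential sweep near their geometric mean (about 632 Hz).
print(linear_f[mid], exponential_f[mid])
```

Because the exponential sweep's frequency grows by a constant factor per unit time, it spends the same time in every octave, which is why it carries more energy at low frequencies than a linear sweep of equal duration.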
Preprint
A new impulse response (IR) dataset called "MeshRIR" is introduced. Currently available datasets usually include IRs at an array of microphones from several source positions under various room conditions, which are basically designed for evaluating speech enhancement and distant speech recognition methods. On the other hand, methods of estimating or controlling spatial sound fields have been extensively investigated in recent years; however, the current IR datasets are not applicable to validating and comparing these methods because of the low spatial resolution of measurement points. MeshRIR consists of IRs measured at positions obtained by finely discretizing a spatial region. Two subdatasets are currently available: one consists of IRs in a three-dimensional cuboidal region from a single source, and the other consists of IRs in a two-dimensional square region from an array of 32 sources. Therefore, MeshRIR is suitable for evaluating sound field analysis and synthesis methods. This dataset is freely available at \url{https://sh01k.github.io/MeshRIR/} with some codes of sample applications.
... swept-sine technique is demonstrated as an example of state-of-the-art methods for measuring RIRs. It outperforms other methods, such as the maximum length sequence (MLS) method [Sch79], the inverse repeated sequence (IRS) method [DH93], and the time-stretched pulse method [Aos81, Suz+95], owing to its very high signal-to-noise ratio (SNR). Additionally, unlike the other measurement methods, it does not require periodic repetitions of the excitation signal during the measurement. ...
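The snippet credits the swept-sine method with a high SNR and no need for periodic repetition. A sketch of the exponential-sweep deconvolution usually attributed to Farina (the snippet does not name the procedure) shows why: a single sweep is played once, and linear (not circular) convolution of the recording with an amplitude-weighted, time-reversed copy of the sweep recovers the impulse response at lag N - 1. The system under test below is a hypothetical 0.5 gain with a 123-sample delay, and all parameters and helper names are illustrative.

```python
import numpy as np


def exponential_sweep(f1: float, f2: float, duration: float, fs: float) -> np.ndarray:
    """Exponential swept sine whose frequency rises from f1 to f2 Hz."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * duration / rate * np.expm1(t * rate / duration))


def inverse_sweep(sweep: np.ndarray, f1: float, f2: float) -> np.ndarray:
    """Time-reversed sweep whose amplitude is proportional to its instantaneous frequency
    (+6 dB/octave), compensating the sweep's -3 dB/octave power spectrum."""
    n = len(sweep)
    envelope = np.exp(-np.arange(n) / n * np.log(f2 / f1))  # f(t) / f2 along the reversed sweep
    return sweep[::-1] * envelope


def linear_convolve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Full linear convolution via zero-padded FFTs."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)


fs = 8_000
sweep = exponential_sweep(50.0, 3_000.0, 1.0, fs)
inverse = inverse_sweep(sweep, 50.0, 3_000.0)
delay = 123  # hypothetical system: 0.5 gain and a 123-sample delay
recorded = np.concatenate([np.zeros(delay), 0.5 * sweep, np.zeros(fs // 2)])  # one sweep plus silent tail
ir = linear_convolve(recorded, inverse)
peak = int(np.argmax(np.abs(ir)))
print(peak - (len(sweep) - 1))  # 123: the recovered impulse sits at the system delay
```

With a nonlinear system, harmonic-distortion products would appear at lags earlier than N - 1, separated from the linear response; the amplitude scaling of `ir` is left unnormalized here.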
Thesis
Dereverberating audio signals recorded in a reverberant environment is among the most challenging problems in acoustic signal processing. The complexity increases in the presence of interfering noise or if the acoustic source is not stationary and changes its position. The field of dereverberation is extensive, and numerous dereverberation methods provide solutions for treating signals with fading artifacts. However, most of them lack efficacy or precision if the sound source moves through space. In this paper, a method for the dereverberation of a moving sound source is proposed and examined. The method uses microphone arrays and beamforming to track the sound source's position. It recovers the excitation signal and eliminates the reflections using the multiple-input multiple-output inverse filtering theorem. When applied to a moving acoustic source, the dereverberation method is amended with periodically tracked source positions and segment-wise inverse filtering with repeatedly updated inverse filter coefficients. The method is both simulated in an environment with one reflecting wall and investigated in an anechoic chamber under the same circumstances. This includes the examination of an algorithm that aims to improve sound source localization with adapted steering vector formulations. Tests are performed with both a stationary and a moving source signal. Two aspects of the dereverberation of a moving sound source are considered. First, we observe what influence a time-invariant impulse response has on the inverse filter applied to a short segment of a moving speech signal. Then, a reverberant signal is sliced, and the performance of the inverse filter is examined. Within this work, simulations show that the dereverberation of a stationary sound source is highly accurate.
The measurements do not confirm the results obtained in the simulations, because the applied inverse filtering method is sensitive to impulse response fluctuations. Simulations, however, show good results for the dereverberation of short segments of moving speech signals. Impulse response fluctuations cause strong perturbations only at the fricatives of a speech signal. The method fails for sliced reverberant signals because the inverse filtering method cannot dereverberate the signal segment by segment.
... For example, we measured the response to pitch perturbation using the maximum length sequence (MLS) [21]. Choosing MLS rather than TSP-type signals [24][25][26][27][28] was unavoidable, because the test signal had to be unpredictable. However, MLS has difficulty measuring systems with non-linearity [26,27]. ...
Preprint
Auditory feedback plays an essential role in regulating the fundamental frequency of voiced sounds. The fundamental frequency also responds to auditory stimulation other than the speaker's own voice. We propose using this response of the fundamental frequency of sustained vowels to frequency-modulated test signals to investigate the involuntary control of voice pitch. This involuntary response is difficult to identify and isolate with the conventional paradigm, which uses step-shaped pitch perturbation. We recently developed a versatile measurement method using a mixture of orthogonal sequences made from a set of extended time-stretched pulses (TSP). In this article, we extended our approach and designed a set of test signals that use this mixture to modulate the fundamental frequency of artificial signals. To test the response, the experimenter presents the modulated signal aurally while the subject is voicing sustained vowels. We developed a tool for conducting this test quickly and interactively. We make the tool available as open source and also provide executable GUI-based applications. Preliminary tests revealed that the proposed method consistently yields compensatory responses with a latency of about 100 ms, representing involuntary control. Finally, we discuss future applications of the proposed method for objective and non-invasive auditory response measurements.
... Notably, the total area of the sound-absorbing panels was 1.08 m² (two pieces of 600 × 900 mm), and their sound absorption coefficient exceeded 1. Table 2 lists the experimental conditions. The measurement method was the swept-sine method (time-stretched pulse method), using a chirp signal as the source signal [16][17][18]. ...
Article
In recent years, the rapid development of information and communication technology (ICT) and the influence of the novel coronavirus (COVID-19) have affected our lives and work in various fields, such as medicine and welfare, construction and manufacturing, and education. Against this global background, teleconference systems have received attention and become a new trend. However, in rooms using a teleconference system, the acoustic characteristics of multiple rooms on both the speaker and listener sides often overlap, which can sometimes make it difficult for participants to hear each other. A prior study suggested that installing sound-absorbing panels improves intelligibility and reduces listening difficulty for young people. However, elderly people must also be included as targets owing to the effects of aging. This study aimed to clarify the improvements in the subjective assessments of elderly people in a room where a teleconference system is used. In addition, the differences in subjective assessments between young and elderly people were investigated. The experimental results indicate, first, that a room using a teleconference system showed a greater improvement in subjective assessments after the acoustic improvements than the same room used for face-to-face meetings. Second, the subjective assessments of elderly people, and their improvements, differed greatly from those of young people, since older users had listening habits and experiences that differed from those of young people.
... The RIR can be captured from an acoustic environment using different techniques [8,9,10]. Recording real RIRs requires a lot of human labor and special hardware. ...
Preprint
We propose a method for improving the quality of synthetic room impulse responses generated using acoustic simulators for far-field speech recognition tasks. We bridge the gap between synthetic and real room impulse responses using our novel one-dimensional CycleGAN architecture. We pass a synthetic room impulse response in the form of raw-waveform audio to our one-dimensional CycleGAN and translate it into a real room impulse response. We also apply sub-band room equalization to the translated room impulse response to further improve its quality. We artificially create far-field speech by convolving the LibriSpeech clean speech dataset [1] with room impulse responses and adding background noise. We show that far-field speech simulated with the room impulse responses improved by our approach reduces the word error rate by up to 19.9% compared to the unmodified room impulse responses in the Kaldi LibriSpeech far-field automatic speech recognition benchmark [2].
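The far-field data creation described above (convolving clean speech with an RIR and adding background noise) can be sketched as follows. The SNR definition relative to the reverberant signal, the truncation to the clean length, the toy RIR, and the random stand-in signals are assumptions; the authors' pipeline uses LibriSpeech and the Kaldi benchmark, which are not reproduced here.

```python
import numpy as np


def simulate_far_field(clean: np.ndarray, rir: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Convolve clean speech with an RIR, then add noise scaled to the requested SNR (in dB)."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    reverberant = np.convolve(clean, rir)[: len(clean)]  # keep the original length
    noise = noise[: len(reverberant)]
    scale = np.sqrt(np.mean(reverberant**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return reverberant + scale * noise


rng = np.random.default_rng(0)
clean = rng.standard_normal(16_000)  # stand-in for one second of 16 kHz speech
rir = np.zeros(800)
rir[[0, 400]] = [1.0, 0.5]  # toy RIR: direct path plus one reflection (hypothetical)
noise = rng.standard_normal(16_000)
far_field = simulate_far_field(clean, rir, noise, snr_db=10.0)
```

Whether the SNR is defined against the clean or the reverberant signal varies between pipelines; this sketch uses the reverberant signal.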
... The RIR is then calculated by circular correlation between the measured output and the original MLS signal. This method was further improved to achieve better RIR estimation in [Aoshima 1981; Dunn and Hawksford 1993]. Unfortunately, this technique introduces several artifacts that yield spurious peaks in the estimate. ...
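The basic MLS procedure summarized in this snippet (excite the system with a periodically repeated MLS, then circularly cross-correlate one period of the steady-state output with the sequence) can be sketched as follows. This is the plain method, not the improved variants cited. Assumptions: the taps (10, 7) are one commonly tabulated choice that yields a maximum-length sequence, the room IR is a hypothetical 200-tap filter shorter than the sequence period, and the estimate carries a small DC bias of sum(h)/(L + 1).

```python
import numpy as np


def mls(order: int = 10, taps: tuple[int, ...] = (10, 7)) -> np.ndarray:
    """Maximum length sequence (+1/-1) of length 2**order - 1 from the binary recurrence
    s[k] = XOR of s[k - t] over taps; the taps must correspond to a primitive polynomial."""
    length = 2**order - 1
    bits = np.ones(length + order, dtype=np.int8)  # any nonzero seed works
    for k in range(order, length + order):
        value = 0
        for t in taps:
            value ^= int(bits[k - t])
        bits[k] = value
    return 1.0 - 2.0 * bits[order:]  # one full period, mapped 0 -> +1, 1 -> -1


def circular_xcorr(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """r[tau] = sum_n y[(n + tau) mod L] * x[n], computed with the FFT."""
    return np.fft.irfft(np.fft.rfft(y) * np.conj(np.fft.rfft(x)), n=len(x))


def circular_convolve(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Steady-state (periodic) response of FIR filter h to periodic input x."""
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, len(x)), n=len(x))


x = mls()
length = len(x)  # 1023
h = np.zeros(200)  # hypothetical room IR, shorter than the MLS period
h[[0, 37, 120]] = [1.0, -0.4, 0.2]
y = circular_convolve(x, h)  # one period of the measured steady-state output
h_est = circular_xcorr(y, x) / (length + 1)
print(np.max(np.abs(h_est[: len(h)] - h)))  # small DC bias of sum(h) / (L + 1)
```

If the true IR is longer than the sequence period, its tail wraps around (time aliasing), and nonlinearities in the loudspeaker or amplifier spread into spurious peaks, consistent with the artifacts mentioned in the snippet.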
Thesis
Most audio signal processing methods regard reverberation, and in particular acoustic echoes, as a nuisance. However, echoes convey important spatial and semantic information about sound sources, and, based on this, echo-aware methods have recently been proposed. In this work, we focus on two directions. First, we study how to estimate acoustic echoes blindly from microphone recordings. Two approaches are proposed, one leveraging continuous dictionaries and one using recent deep learning techniques. Then, we focus on extending existing methods in audio scene analysis to their echo-aware forms. The multichannel NMF framework for audio source separation, the SRP-PHAT localization method, and the MVDR beamformer for speech enhancement are all extended to their echo-aware versions.
... To overcome the limitations of synthetic RIRs, real RIRs are recorded in a controlled environment. The maximum length sequence method [19], the time-stretched pulses method [20], and the exponential sine sweep method [21] are common methods to measure real RIRs. Among these approaches, the exponential sine sweep method is robust to changing loudspeaker output volume and performs well in automatic speech recognition tasks. ...
Preprint
We present a Generative Adversarial Network (GAN) based room impulse response generator (IR-GAN) for generating realistic synthetic room impulse responses (RIRs). IR-GAN extracts acoustic parameters from captured real-world RIRs and uses these parameters to generate new synthetic RIRs. We use these generated synthetic RIRs to improve far-field automatic speech recognition in new environments that are different from the ones used in training datasets. In particular, we augment the far-field speech training set by convolving our synthesized RIRs with a clean LibriSpeech dataset [1]. We evaluate the quality of our synthetic RIRs on the far-field LibriSpeech test set created using real-world RIRs from the BUT ReverbDB [2] and AIR [3] datasets. Our IR-GAN reports up to an 8.95% lower error rate than Geometric Acoustic Simulator (GAS) in far-field speech recognition benchmarks. We further improve the performance when we combine our synthetic RIRs with synthetic impulse responses generated using GAS. This combination can reduce the word error rate by up to 14.3% in far-field speech recognition benchmarks.
Article
This study proposes a three-dimensional room transfer function (RTF) parameterization method based on multiple concentric planar circular arrays, which is robust to variations in the positions of both the receiver and the source. According to the harmonic solution to the wave equation, the RTFs between two spherical regions (sound source and receiver) in a room can be expressed as a weighted sum of spherical harmonics, whose weight coefficients serve as the RTF parameters. These parameters can be estimated by placing multiple concentric planar circular arrays composed of monopole-source pairs (MSPs) in the source region and multiple concentric planar circular arrays composed of omnidirectional-microphone pairs (OMPs) in the receiver region. We use MSP arrays to generate the required outgoing sound fields originating from the source region. We derive a method that uses OMP arrays to estimate the RTF parameters concealed within the captured sound field, which can then be used to reconstruct the RTF from any point in the source region to any point in the receiver region. The accuracy of the RTF parameterization method is validated through simulation testing.
Conference Paper
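This paper describes an online adaptation method for Fourier series-based acoustic transfer function models for robot audition systems based on microphone array signal processing. The transfer function represents the signal propagation characteristics from a sound source to a microphone and is indispensable for analyzing real environments, for example through sound source localization and separation. Applying transfer function-based array signal processing to real environments requires two characteristics: 1) the ability to adapt to changes in the acoustic environment, and 2) a lightweight transfer function model for use in embedded systems, such as robots, with limited memory and computational resources. In this paper, we propose an online adaptation method for a lightweight transfer function model using Fourier series expansion that has both of these characteristics. Experimental results showed that using transfer functions adapted online with the proposed method improves sound source localization and separation performance compared with existing online transfer function adaptation methods.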
Article
An acoustic emitter based on electrical–mechanical transduction is designed to have a specified emission pattern. However, owing to construction features, design flaws, and varying material quality, the final device does not match the design specifications, which may cause, among other things, reduced performance, loss of emission power, and undesirable noise. In this work, an apparatus for acoustic characterization comprising a turntable and an arc is developed to estimate the emission power of acoustic emitters. The time-stretched pulse is used as the test signal to obtain the emission power at each azimuth and elevation angle of the emitter. Processing the acquired signals yields an estimated radiation pattern in three-dimensional space, which allows the emission characteristics of the acoustic source to be analyzed visually.
Article
The parametric array loudspeaker (PAL) can realize sharp directivity by exploiting the straightness of ultrasound. Some recent studies have applied phased-array approaches to the PAL to control its directivity. A PAL is composed of a large-aperture emitter consisting of multiple small ultrasonic transducers, so that it exhibits sharper directivity and reproduces audible sound with sufficient sound pressure. In contrast, the directivity of a small-scale PAL may diffuse. In the present paper, we focus on forming a narrow acoustic beam for practical use of the PAL, so that the area in which the audible sound reproduced by the PAL can be heard can be controlled. We propose a narrow-edged beamforming method based on individual phase inversion in the amplitude-modulated wave. The PAL is divided into an interior part and an exterior part, and the phases of the components in the signal fed to the exterior part are controlled individually. In the proposed method, the lower-frequency component of the carrier and sideband waves is phase-inverted in the signal fed to the exterior of the PAL. The proposed method uses phase cancellation of both the ultrasound and the demodulated audible sound to form a narrower beam. It can also be combined with beam-steering techniques to form a narrow steered beam. The experimental results demonstrate that the proposed method forms a narrower beam than a conventional PAL.
Preprint
We propose a simple method to measure acoustic responses using any sound by converting it into a signal suitable for measurement. This method enables music pieces to be used for measuring acoustic conditions, which has the advantage of not exposing listeners to annoying test sounds. In addition, applying the underlying idea of simultaneous measurement of multiple paths provides practically valuable features. For example, it is possible to measure deviations (temporally stable, random, and time-varying) and the impulse response while reproducing slightly modified content under the target conditions. The key idea of the proposed method is to add relatively small deterministic signals that sound like noise to the original sounds. We call the converted sounds safeguarded test signals.
Article
An open problem in room impulse response (RIR) measurement is the effect of nonlinearities, especially those with memory, in the measurement system, specifically in the power amplifier and the loudspeaker. These nonlinearities can corrupt the measurement by introducing artifacts. The paper discusses an RIR measurement method that is robust to these nonlinearities. The proposed methodology measures the RIR using the cross-correlation method, i.e., by computing the cross-correlation between the output signal and an appropriate sequence. In contrast to other cross-correlation-based methods, the proposed approach directly estimates the first-order kernel of the Volterra filter modeling the measurement system, i.e., the system impulse response for small signals. The approach exploits the concept of orthogonal periodic sequences, recently proposed in the literature. The input signal can be any periodic, persistently exciting sequence, including a quantized sequence. Measurements performed in both an emulated scenario and real environments illustrate the validity of the approach and compare it with competing RIR measurement methods.
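The basic cross-correlation principle mentioned above can be sketched with a maximum length sequence (MLS). This is the classical linear method, not the paper's orthogonal-periodic-sequence approach, and it does not model nonlinearities. It assumes a linear, time-invariant system in periodic steady state, a ±1 MLS of period 127 generated from the primitive polynomial x^7 + x + 1 (an illustrative choice), and a hypothetical four-tap response. The function names are illustrative.

```python
def mls127():
    """One period (127 samples) of a +/-1 maximum length sequence.
    Recurrence a[n+7] = a[n+1] XOR a[n], i.e. the primitive polynomial x^7 + x + 1."""
    bits = [1, 0, 0, 0, 0, 0, 0]           # any nonzero seed gives the full period
    while len(bits) < 127:
        bits.append(bits[-6] ^ bits[-7])
    return [1.0 if b else -1.0 for b in bits]


def periodic_response(seq, h):
    """Steady-state output of an LTI system h driven by the periodic sequence seq."""
    L = len(seq)
    return [sum(hj * seq[(i - j) % L] for j, hj in enumerate(h)) for i in range(L)]


def cross_correlation_ir(y, seq):
    """Circular cross-correlation of the output with the input sequence, scaled by L + 1.
    For a +/-1 MLS this yields h[k] - sum(h) / (L + 1), i.e. a small constant offset."""
    L = len(seq)
    return [sum(y[i] * seq[(i - k) % L] for i in range(L)) / (L + 1) for k in range(L)]


seq = mls127()
h = [0.0, 0.8, 0.3, -0.1]                  # hypothetical linear system
y = periodic_response(seq, h)
h_est = cross_correlation_ir(y, seq)
```

The periodic autocorrelation of a ±1 MLS equals L at lag zero and -1 at every other lag, so the scaled cross-correlation returns the impulse response minus a constant sum(h)/(L+1). Nonlinear distortion would instead spread into spurious peaks across `h_est`, which is the problem the paper addresses.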
Article
Several studies have examined the transfer effects of playing action video games. Recently, some researchers have proposed auditory virtual reality games with a three-dimensional virtual auditory display. These studies were intended to apply auditory virtual reality games to the auditory education of visually impaired people. However, few studies have investigated the transfer effects of playing auditory games. In this paper, we introduce previous studies that investigated the transfer effects of playing virtual three-dimensional auditory games. Moreover, we propose new perspectives and future directions for auditory virtual reality games.
Article
A method employing a digital computer for evaluating the acoustic properties of enclosures is described. Specially shaped tone bursts, generated on the computer, are radiated into the enclosure under study. The sound-pressure responses at different locations in the enclosure are recorded on magnetic tape. The data are converted into digital form by an analog-to-digital converter and processed by the digital computer. The processing includes filtering (to improve the signal-to-noise ratio), envelope detection, and evaluation of various quantities of subjective or physical significance. A microfilm plotter attached to the computer is used to plot the results. Among the quantities evaluated are reverberation times based on different portions of the decay; direct, early, and reverberant energies; and the directional distribution of sound-energy flux (diffusion). These quantities are evaluated both as a function of frequency and of location in the enclosure. Spatial and frequency averages of the quantities are also evaluated.
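Reverberation time from a chosen portion of the decay, one of the quantities listed above, can be sketched as a least-squares line fit to a decay curve in dB, followed by extrapolation to a 60 dB decay. The sketch assumes the envelope has already been detected and converted to dB. The function name and the default -5 to -35 dB evaluation range (the range commonly labeled T30) are illustrative assumptions, not details from the paper.

```python
def reverberation_time(envelope_db, fs, start_db=-5.0, end_db=-35.0):
    """Estimate the reverberation time (s) from a decay envelope given in dB.

    A least-squares line is fitted to the samples after the peak whose level,
    relative to the peak, lies between start_db and end_db. The fitted decay
    rate (dB/s) is then extrapolated to a 60 dB decay.
    """
    i_peak = max(range(len(envelope_db)), key=envelope_db.__getitem__)
    peak = envelope_db[i_peak]
    pts = [(i / fs, envelope_db[i] - peak)
           for i in range(i_peak, len(envelope_db))
           if end_db <= envelope_db[i] - peak <= start_db]
    if len(pts) < 2:
        raise ValueError("decay range not covered by the envelope")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))   # decay rate in dB/s (negative)
    return -60.0 / slope


# Synthetic, perfectly exponential decay with a 1.2 s reverberation time.
fs = 1000
decay_db = [-60.0 * (i / fs) / 1.2 for i in range(2000)]
t30 = reverberation_time(decay_db, fs)                # evaluated from -5 to -35 dB
t20 = reverberation_time(decay_db, fs, -5.0, -25.0)   # evaluated from -5 to -25 dB
```

On a measured decay, different evaluation ranges generally give different values, which is why the portion of the decay used should be reported with the result.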
Article
A new method is presented for acquiring impulse responses in concert halls with high signal-to-noise ratios and high resolution. In the proposed method, an omnidirectional loudspeaker is driven by an amplified sweep: a signal containing all frequencies of interest smeared out in time. Using a deconvolution technique, an almost perfect pulse is obtained with a high peak pressure and a short effective duration. Measurements were made in two different concert halls to illustrate the practical implications of the new technique.
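The sweep-plus-deconvolution idea can be sketched with one period of a linear sweep and deconvolution by spectral division. The sketch assumes periodic (circular) playback, a noiseless linear system, a hypothetical five-tap response, and a naive DFT to keep it self-contained. The small regularization term `eps` is an added safeguard against near-zero frequency bins and is not part of the original proposal.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, adequate for a short illustration; inverse is scaled by 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(complex(x[i]) * cmath.exp(sign * 2j * math.pi * k * i / n) for i in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def linear_sweep(n):
    """One period of a linear sweep from 0 to fs/2 (phase pi * i^2 / (2n))."""
    return [math.sin(math.pi * i * i / (2 * n)) for i in range(n)]


def deconvolve(recorded, sweep, eps=1e-12):
    """Estimate the impulse response by lightly regularized spectral division Y/X."""
    X = dft(sweep)
    Y = dft(recorded)
    floor = eps * max(abs(v) ** 2 for v in X)
    H = [y * x.conjugate() / (abs(x) ** 2 + floor) for x, y in zip(X, Y)]
    return [v.real for v in dft(H, inverse=True)]


N = 128
sweep = linear_sweep(N)
room = [0.0, 1.0, 0.0, -0.4, 0.2]          # hypothetical impulse response
# Periodic playback: recorded[i] = sum_j room[j] * sweep[(i - j) % N]
recorded = [sum(h * sweep[(i - j) % N] for j, h in enumerate(room)) for i in range(N)]
estimate = deconvolve(recorded, sweep)
```

Because the sweep spreads its energy over the whole period, the loudspeaker can deliver much more total energy than with a single short pulse. Deconvolution then concentrates that energy back into a short, high-peak response, which is the source of the signal-to-noise advantage described above.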