Conference Paper

Amplitude Panning and the Interior Pan

The perception of source location using multi-loudspeaker amplitude panning is considered. While many perceptual models exist for pairwise panning, relatively few studies consider the general multi-loudspeaker case. This paper evaluates panning scenarios in which a source is panned on the boundary of, or within, the volume bounded by discrete loudspeakers, referred to as boundary and interior pans respectively. Listening results reveal the following: 1) pans to a single loudspeaker yield the lowest localization error, 2) pairwise pans tend to be consistently localized closer to the listener than single-loudspeaker pans, 3) the largest errors occur when the virtual source is panned close to the listener, 4) interior pans are accurately perceived and, surprisingly, in some cases more accurately than pairwise pans.

... Thomas and Robinson [35] conducted a study using physical loudspeakers as anchors in a localization task assessing dual-balance-panned auditory events. Five loudspeakers were configured at positions +90°, +45°, 0°, -45°, and -90°. ...
... The maximum of eight loudspeakers comprises four pairs, two for left/right and two for front/rear. The gains for each balance pair are computed using sine/cosine functions [35]. The Cartesian point source panner is a 3D extension of the dual-balance panner, as described in [16]. ...
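The sine/cosine balance gains mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes one left/right pair and one front/rear pair (four loudspeakers rather than the eight the excerpt mentions), normalized positions in [0, 1], and a product rule for combining the two balances. The function names `balance_pair` and `dual_balance_gains` are hypothetical and are not taken from the cited papers.

```python
import math


def balance_pair(position):
    """Return constant-power gains for one balance pair.

    position is 0.0 at the first loudspeaker and 1.0 at the second
    (an assumed normalization, not one given in the source).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    angle = position * math.pi / 2.0
    # cos^2 + sin^2 = 1, so the pair always carries unit power.
    return math.cos(angle), math.sin(angle)


def dual_balance_gains(x, y):
    """Combine a left/right and a front/rear balance by multiplication."""
    left, right = balance_pair(x)
    front, rear = balance_pair(y)
    return {
        "front_left": left * front,
        "front_right": right * front,
        "rear_left": left * rear,
        "rear_right": right * rear,
    }


gains = dual_balance_gains(0.25, 0.5)
total_power = sum(g * g for g in gains.values())
print(gains)
print(total_power)
```

Because each pair is power-normalized, the product of the two balances keeps the summed squared gains at one for any position inside the square, which is why interior positions do not change the overall level in this sketch.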
Interior panning algorithms enable content authors to position auditory events not only at the periphery of the loudspeaker configuration, but also within the internal space between the listeners and the loudspeakers. In this study such algorithms are rigorously evaluated, comparing rendered static auditory events at various locations against true physical loudspeaker references. Various algorithmic approaches are subjectively assessed in terms of Overall, Timbral, and Spatial Quality for three different stimuli, at five different positions and three radii. Results show that, for static positions, standard Vector Base Amplitude Panning performs as well as, or better than, all other interior panning algorithms tested here. Timbral Quality is maintained throughout all distances. Ratings for Spatial Quality vary, with some algorithms performing significantly worse at closer distances. Ratings for Overall Quality reduce moderately as the reproduction radius is reduced and are predominantly influenced by Timbral Quality.
... As we further discuss in Sect. 2, the renderer (a fully-differentiable implementation [12] of Dolby Atmos [13]) enables multichannel-based learning, as opposed to object-based learning, because it dictates a fully-determined object-based format at the encoder's output. Finally, note that multichannel-based learning also bears resemblances with self-supervised learning [14,15]: instead of directly learning from objects, our loss targets a proxy signal based on multichannel renders. ...
... The mask-estimation block implements the U-Net [16], with minimal adaptations in the final layer to generate 5.1 masks. The renderer is the Dolby Atmos renderer [13] expressed in differentiable form. The de-panner and de-trimmer also correspond to the Dolby Atmos renderer. ...
The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata. While rendering an object-based production into a multichannel mix is straightforward, the reverse process involves sound source separation and estimating the spatial trajectories of the extracted sources. Moreover, cinematic object-based productions are often composed of dozens of simultaneous audio objects, which poses a scalability challenge for audio object extraction. Here, we propose a novel deep learning approach to object extraction that learns from the multichannel renders of object-based productions, instead of directly learning from the audio objects themselves. This approach allows tackling the object scalability challenge and also offers the possibility to formulate the problem in a supervised or an unsupervised fashion. Since, to our knowledge, no other works have previously addressed this topic, we first define the task and propose an evaluation methodology, and then discuss under what circumstances our methods outperform the proposed baselines.
A method for reproducing binaural audio over an array of loudspeakers is presented. The proposed method is based on spatially matched filters and is shown to be, in general, more stable than the conventional inverse filter approach. The non-weighted formulation of the method introduces reductions in the reproduced pressure magnitude response at each control point and is therefore inherently lossy. On the other hand, far less error is introduced for frequencies where the Hermitian angle between the row vectors of the plant matrix is close to ninety degrees. For sufficiently dense arrays, little error is introduced at mid-low and high frequencies. The trade-off between reductions in the reproduced target signals and crosstalk cancellation is controlled by introducing frequency-dependent weights to the left and right channel filters. The weights that minimise the error are derived. It is shown that the optimal weight solution emphasises achieved crosstalk cancellation over lossless focusing.
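The Hermitian angle referred to in this abstract can be computed from its standard definition, cos θ = |⟨u, v⟩| / (‖u‖ ‖v‖). The sketch below is only an illustration of that definition: the function name `hermitian_angle_deg` is hypothetical, and the two example vectors are made-up stand-ins for rows of a plant matrix, not values from the paper.

```python
import math


def hermitian_angle_deg(u, v):
    """Hermitian angle, in degrees, between two complex vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    inner = sum(a * complex(b).conjugate() for a, b in zip(u, v))
    norm_u = math.sqrt(sum(abs(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(abs(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to guard against rounding slightly above 1.
    cosine = min(1.0, abs(inner) / (norm_u * norm_v))
    return math.degrees(math.acos(cosine))


# Hypothetical rows (one per ear) at a single frequency.
row_left = [1.0, 1j]
row_right = [1.0, -1j]
angle = hermitian_angle_deg(row_left, row_right)
print(angle)
```

An angle near ninety degrees means the two rows are close to orthogonal, which is the condition under which the abstract reports far less error.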
This study investigates the independent influences of interchannel level difference (ICLD) and interchannel time difference (ICTD) on the panning of 2-channel stereo phantom images for various musical sources. The results indicate that level panning can perform robustly regardless of the spectral and temporal characteristics of source signals, whereas time panning is not suitable for a continuous source with a high fundamental frequency. Statistical differences between the data obtained for different sources are found to be insignificant, and from this a unified set of ICLD and ICTD values for 10°, 20°, and 30° image positions is derived. Linear level and time panning functions for the two separate panning regions of 0°–20° and 21°–30° are further proposed, and their applicability to arbitrary loudspeaker base angles is also considered. These perceptual panning functions are expected to be more accurate than the theoretical sine or tangent law in terms of matching between predicted and actually perceived image positions.
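For orientation, the theoretical sine and tangent laws named in this abstract can be turned into predicted ICLD values. The sketch below uses the common textbook forms, sin φ / sin φ₀ = (g₁ − g₂) / (g₁ + g₂) and tan φ / tan φ₀ = (g₁ − g₂) / (g₁ + g₂), with φ₀ the loudspeaker half-base angle. The function name `theoretical_icld_db` and the 30° half-base angle are assumptions, and the study's perceptually derived values are not reproduced here.

```python
import math


def theoretical_icld_db(image_deg, base_deg=30.0, law="tangent"):
    """Predicted ICLD in dB for an image angle under the sine or tangent law.

    image_deg is the target image angle and base_deg the loudspeaker
    half-base angle, both measured from the centre line.
    """
    if not 0.0 <= image_deg <= base_deg < 90.0:
        raise ValueError("need 0 <= image_deg <= base_deg < 90")
    phi = math.radians(image_deg)
    phi0 = math.radians(base_deg)
    if law == "tangent":
        ratio = math.tan(phi) / math.tan(phi0)
    elif law == "sine":
        ratio = math.sin(phi) / math.sin(phi0)
    else:
        raise ValueError("law must be 'sine' or 'tangent'")
    if ratio >= 1.0:
        # Image at the loudspeaker: the far channel is silent.
        return math.inf
    # ratio = (g1 - g2) / (g1 + g2)  =>  g1 / g2 = (1 + ratio) / (1 - ratio)
    return 20.0 * math.log10((1.0 + ratio) / (1.0 - ratio))


for angle in (10.0, 20.0, 30.0):
    print(angle,
          theoretical_icld_db(angle, law="sine"),
          theoretical_icld_db(angle, law="tangent"))
```

For image angles inside the base, the tangent law asks for a smaller level difference than the sine law, so the two theoretical curves already disagree before any perceptual correction is applied.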
Phantom images that rely on interchannel level differences can be produced easily for two-channel stereo. Yet one of the most difficult challenges in production for a five-channel environment is the creation of stable phantom images to the side of the listening position. The addition of simulated early reflection patterns from all five loudspeakers influences the localization of lateral phantom sources. Listening tests were conducted to compare participants' abilities to localize lateral sources under three conditions: power-panned sources alone, sources with simulated early reflection patterns, and simulated early reflection patterns alone (without direct sound). Results compare localization error for the three conditions at different locations and suggest that early reflection patterns alone can be sufficient for source localization.
The performance of a stereophonic sound reproduction system is considered in terms of its wavefront reconstruction capabilities. A new theory of image localization is proposed and used to develop a more general stereophonic sine law which is valid at higher frequencies. The phenomenon of central image disappearance is considered, and the resulting analysis is shown to conform with subjective experience. A criterion is established for the frequency above which a stereophonic system can no longer provide high-fidelity reproduction. Practical results are provided to substantiate the proposed theories.
The results of some practical measurements on the effects of interchannel intensity and time differences in two channel (stereophonic) sound systems are presented. The effects of alterations in the listener position are also covered. The test signals used ranged from single component tones to wide and narrow band noise and included running speech. A theory based on the assumption that the brain is sensitive to interaural time difference and its variation with head movement is developed and is shown to be in reasonable agreement with the practical findings.
The localization of amplitude-panned virtual sources is biased towards the median plane when loudspeakers are not positioned symmetrically about the median plane. The bias is measured using listening tests and with computational modeling of virtual source perception. A modification to an existing pair-wise panning method is proposed that compensates for the displacement of virtual sources. The proposed method is evaluated by interpreting the listening test results and by simulating virtual sources with computational modeling. Evaluations suggest that the bias is nonexistent with the proposed method.
The apparent position of speech and music sound sources was investigated using both a two-channel loudspeaker array and a three-channel loudspeaker array. The results showed that a sine-cosine pan law was reasonably accurate for the three-channel array, but with a two-channel array it consistently produced sharp images whose positions were wider than expected. The discrepancy was investigated using a headphone model. We found that the apparent position depends strongly on the spectrum of the source, with speech frequencies tending to dominate the overall impression.
The perceived spatial spread of amplitude-panned virtual sources depends on the number of loudspeakers used to produce them. When pair-wise or triplet-wise panning is applied, the number of active loudspeakers varies as a function of the panning direction. This may cause unwanted changes in the spatial spread and coloration of a virtual source if it is moved in the sound stage. In this paper a method is presented to make the directional spread of amplitude-panned virtual sources independent of their panning direction. This is accomplished by panning the sound signal to multiple directions near each other simultaneously. This forms a single virtual source with constant directional spread as a function of direction.
The paper reviews briefly the history of stereophonic reproduction. The principal basic systems with their underlying ideas are described and compared. Some account is given of the supposed mechanism of natural binaural listening from the viewpoint of direction localization. The principles and practice are discussed of a particular system for domestic use, derived from the early work of Blumlein, and characterized by the use of spaced loudspeakers driven in phase, to which the name "stereosonic" has been given. The aims of this system are defined, and the mathematical theory involved in its use is developed. Limitations and sources of error in the results achieved are described. Equipment used in the making of master recordings and some of the problems of studio technique involved are described. Consideration is given to the form which a domestic stereophonic record should take, and the standards to which such a record should conform, together with the requirements which these impose on the reproducing equipment.
An improved understanding of some stereophonic phenomena may be obtained by use of acoustical pressure phasors to portray sound pressure at the ears of the observer. With the help of phasors, it is possible to expand and modify certain conclusions of previous observers and to validate some previously unpublished observations: a stereophonic "law of sines" is derived. The existence and location of the "out-of-bounds" stereophonic image is analyzed and verified. The "allowed maximum out-of-phase ratio" is derived, together with the observation that this maximum is exceeded by certain microphone arrays. The motion and elevation of the center image in stereophonic reproduction is observed and explained.
Blumlein, A. D., "British Patent Specification 394,325 (Improvements in and Relating to Sound-Transmission, Sound-Recording and Sound-Reproducing Systems)," Journal Audio Eng. Soc., 6(2), pp. 91-98, 1958.
Ceoen, C., "Comparative Stereophonic Listening Tests," Journal Audio Eng. Soc., 20(1), pp. 19-27, 1972.
Daniel, J., "Further Study of Sound Field Coding with Higher Order Ambisonics," Berlin, Germany, 2004.
Ahrens, J., Analytic Methods of Wavefield Synthesis, Springer-Verlag Berlin Heidelberg, 2012, doi:10.1007/978-3-642-25743-8.
Lossius, T., Baltazar, P., and de la Hogue, T., "Distance-Based Amplitude Panning," in Proc. International Computer Music Conf., Montreal, Canada, 2009.
Laitinen, M.-V., Walther, A., Plogsties, J., and Pulkki, V., "Auditory Distance Rendering Using a Standard 5.1 Loudspeaker Layout," in Audio Engineering Soc., New York, USA, 2015.
Dickens, G., Flax, M., McKeag, A., and McGrath, D., "Optimal 3D Speaker Panning," in Audio Engineering Soc., pp. 421-426, Rovaniemi, Finland, 1999.
Tsingos, N., Robinson, C. Q., Darcy, D. P., and Crum, P. A., "Evaluation of Panning Algorithms for Theatrical Applications," in Proc. International Conf. on Spatial Audio, Erlangen, Germany, 2014.
Frank, M., Phantom Sources using Multiple Loudspeakers in the Horizontal Plane, Ph.D. thesis, University of Music and Performing Arts, Graz, Austria, 2013.
Williams, M., "Unified Theory of Microphone Systems for Stereophonic Sound Recording," in Proc. Audio Engineering Soc. Conv., London, 1987.
"Acoustics - Normal Equal-Loudness-Level Contours," Technical Report ISO 226:2003, International Organization for Standardization, New York, USA, 2003.
de Boer, K., "A Remarkable Phenomenon with Stereophonic Sound Reproduction," Technical Report 9, Philips, 8-13, 1947.
Lee, H., "Investigation on the Phantom Image Elevation Effect," in Audio Engineering Soc., 2015.
Moore, B., Hearing, New York: Academic, 1995.
AES 143rd Convention, New York, NY, USA, 2017 October 18-21