Audio Engineering Society
Convention Paper
Presented at the 128th Convention
2010 May 22–25 London, UK
The papers at this Convention have been selected on the basis of a submitted abstract and extended précis that have been peer
reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance
manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents.
Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New
York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof,
is not permitted without direct permission from the Journal of the Audio Engineering Society.
Sampling rate discrimination:
44.1 kHz vs. 88.2 kHz
Amandine Pras¹, Catherine Guastavino¹
¹ Centre for Interdisciplinary Research in Music Media and Technology,
Multimodal Interaction Laboratory, McGill University, Montréal, Québec, H3A 1EA, Canada
amandine.pras@mcgill.ca, catherine.guastavino@mcgill.ca
ABSTRACT
It is currently common practice for sound engineers to record digital music in high-resolution formats and then
down-sample the files to 44.1 kHz for commercial release. This study investigates whether listeners can
perceive differences between musical files recorded at 44.1 kHz and at 88.2 kHz with the same analog chain and type of
AD converter. Sixteen expert listeners were asked to compare three versions (44.1 kHz, 88.2 kHz, and the 88.2 kHz
version down-sampled to 44.1 kHz) of five musical excerpts in a blind ABX task. Overall, participants were able to
discriminate between files recorded at 88.2 kHz and their version down-sampled to 44.1 kHz. Furthermore, for the
orchestral excerpt, they were able to discriminate between files recorded at 88.2 kHz and files recorded at 44.1 kHz.
1. INTRODUCTION
In 1982, Sony and Philips defined the CD standard
with a sample rate of 44.1 kHz. Since then,
‘high-resolution’ formats, defined by Rumsey [9] as
digital formats with a sample rate beyond the CD
standard of 44.1 kHz, have been introduced to the
market without commercial success. As a result, sound
engineers tend to record digital music at very high
sample rates and then down-sample the files to
44.1 kHz for commercial release. However, the
down-sampling process introduces measurable
artifacts [3]. This raises the question of why sound
engineers use high sample rates for recording when
the final delivery format is 44.1 kHz.
Sample rate refers to the number of samples per
second taken from the original signal. To
reconstruct a signal, the sample rate must be more
than twice the highest frequency contained in the
signal [8]. Given this theorem and the commonly cited
20 kHz upper limit of human hearing, the CD
standard of 44.1 kHz, whose Nyquist frequency is
22.05 kHz, is high enough to encode the audible
content of a signal. However, several theories
support the practice of recording at very high sample
rates.
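The sampling arithmetic above can be made concrete with a short sketch. It assumes an ideal sampler with no anti-aliasing filter, and the helper names `nyquist_hz` and `alias_frequency_hz` are hypothetical. It shows where a 25 kHz component, the upper limit suggested by Stuart [10], would end up at each of the two sample rates used in this study.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable without aliasing: fs / 2."""
    return sample_rate_hz / 2


def alias_frequency_hz(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after ideal, unfiltered sampling.

    Sampling cannot distinguish f from f + k * fs, so the tone folds back
    to its distance from the nearest integer multiple of fs.
    """
    nearest_multiple = round(signal_hz / sample_rate_hz) * sample_rate_hz
    return abs(signal_hz - nearest_multiple)


for fs in (44_100, 88_200):
    print(f"fs = {fs} Hz: Nyquist = {nyquist_hz(fs):.0f} Hz, "
          f"25 kHz tone appears at {alias_frequency_hz(25_000, fs):.0f} Hz")
```

At 44.1 kHz the 25 kHz tone folds down to 19100 Hz, inside the audible band, whereas at 88.2 kHz it stays at 25000 Hz, below the 44.1 kHz Nyquist frequency.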
First, Stuart [10] claimed that some people can hear
above 20 kHz, possibly up to 25 kHz. In a study
conducted by Nishiguchi et al. [6], one out of 36
participants detected, at a significant level,
differences between sounds with and without
frequency components above 20 kHz.
A second theory relates to the technological limitations
of analog-to-digital converters. To avoid spectral
aliasing of frequencies that are too high to be
encoded, the first step of analog-to-digital conversion
is low-pass filtering. The slope of this anti-aliasing
filter could affect the high-frequency content of the
signal and may introduce audible artifacts [10].
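To illustrate why the required filter slope depends on the sample rate, the sketch below computes the minimum order of an analog Butterworth low-pass filter whose −3 dB corner sits at 20 kHz and which reaches a given attenuation at the Nyquist frequency. The Butterworth response, the 96 dB target (roughly the range of 16-bit audio) and the function name are assumptions for illustration only; practical converters usually oversample and use digital decimation filters, so this does not model the Micstasy or any other specific converter.

```python
import math


def butterworth_min_order(passband_hz: float, stopband_hz: float,
                          attenuation_db: float) -> int:
    """Smallest Butterworth order reaching `attenuation_db` at `stopband_hz`.

    Assumes the -3 dB corner is at `passband_hz`, so the attenuation at f is
    10 * log10(1 + (f / passband_hz) ** (2 * n)).
    """
    ratio = stopband_hz / passband_hz
    needed = math.log10(10 ** (attenuation_db / 10) - 1) / (2 * math.log10(ratio))
    return math.ceil(needed)


for fs in (44_100, 88_200):
    order = butterworth_min_order(20_000, fs / 2, 96)
    print(f"fs = {fs} Hz: 20 kHz to {fs / 2000:.2f} kHz needs order >= {order}")
```

Under these assumptions, the narrow 20 to 22.05 kHz transition band at 44.1 kHz requires an order of 114, against 14 at 88.2 kHz, which is one way to read the concern about steep anti-aliasing filters.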
A third theory concerns the temporal resolution
implied by the sample rate [11]. When listening with
two ears, humans can discriminate time differences
of 2 µs or less [7]. Percussionists can play sounds
with transients lasting only a few µs, and hall
reverberation may include reflections only a few µs
apart. At 44.1 kHz, the interval between two
consecutive samples is 22.7 µs, which may not be
precise enough.
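The 22.7 µs figure is simply the reciprocal of the sample rate. The minimal sketch below (the function name is hypothetical) computes the sample spacing for both rates used in this study. It only computes the spacing; it does not by itself show how much timing information a band-limited signal can carry between samples, which is the point the theory in [11] addresses.

```python
def sample_period_us(sample_rate_hz: float) -> float:
    """Time between two consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz


for fs in (44_100, 88_200):
    print(f"fs = {fs} Hz: {sample_period_us(fs):.1f} us between samples")
```

This prints 22.7 µs for 44.1 kHz and 11.3 µs for 88.2 kHz, both well above the 2 µs interaural threshold reported in [7].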
Few studies have investigated the perceptual
differences between high-resolution formats and
44.1 kHz or 48 kHz. Meyer & Moran [5] compared
Super Audio CD playback with the same playback
passed through a CD-standard A/D/A loop at
44.1 kHz in an ABX comparison test but failed to
observe significant differences.
Yoshikawa et al. [12] found that three participants out
of 11 could discriminate between musical excerpts in
96 kHz and their down-sampled version in 48 kHz.
However, these audible differences could be
attributed to the down-sampling algorithm rather than
to the difference in sample rate. Laugier [2] observed
better spatial reproduction and high-frequency
restitution when listening to high-resolution files
recorded at 192 kHz / 24 bits compared to files
recorded at 48 kHz / 16 bits. However, since different
bit depths and equipment were used, the perceived
differences cannot be attributed to the difference in
sample rate alone.
To date, we do not know whether people can
perceive differences between musical files recorded
at 44.1 kHz and files recorded at higher sample rates.
This question is critical to determining whether
high-resolution audio is economically viable.
Furthermore, we aim to determine in which contexts
listeners are most sensitive to sample rate differences,
so that sound engineers can choose the best recording
format as a function of the instrument(s) recorded and
the acoustics of the room.
In this article, we hypothesize that expert listeners
can discriminate between musical files recorded at
44.1 kHz and 88.2 kHz. To test this hypothesis, we
recorded five different musical excerpts, each
presented in three different formats: 44.1 kHz,
88.2 kHz and the 88.2 kHz version down-sampled to
44.1 kHz. Except for the sample rates, the exact same
audio gear and settings were used for recording and
playback. Overall, participants were able to
discriminate between files recorded at 88.2 kHz and
their version down-sampled to 44.1 kHz.
Furthermore, for the orchestral excerpt, they were
able to discriminate between files recorded at
88.2 kHz and files recorded at 44.1 kHz (p = .01).
2. METHODS
2.1. Participants
Sixteen expert listeners, fifteen males and one
female¹, with a mean age of 30 (SD = 7.1), took part
in the study and received CDN$20 per hour for their
participation. All participants reported having studio
experience in sound engineering for an average of 8
years (SD = 5.6). Six reported working as
professional sound engineers in Montreal and ten
were Sound Recording students at McGill University.
All participants except one had musical training (15
years on average, SD = 5.5).
2.2. Musical excerpts
We recorded five musical excerpts corresponding to
different instruments and hall acoustics, namely
Orchestra, Cymbals, Classical Guitar, Voice and
Violin (see details of the musical excerpts in
Table 1). All musicians except the percussionist were
performance students at Université du Québec à
Montréal and McGill University.
All musical excerpts were captured with the exact
same analog chain, consisting of a non-coincident
pair of omnidirectional MKH 8020 microphones
(Sennheiser, QC, Canada) and HV-3D preamplifiers
(Millennia, CA, USA). The two microphones were
spaced 30 cm (12 in) apart and slightly angled (see an
example from the cymbal recording in Figure 1).
¹ The first author (AP) participated in the study.
Excerpt | Composer / Piece | Performer | Location | Room characteristics | Recording distance
Orchestra | Anton Bruckner, Symphony No. 6 | McGill Symphony Orchestra, conducted by Alexis Hauser | Pollack Hall | Medium concert hall (600 seats) made of wood | 50 cm (20 in) above the conductor
Cymbals | Improvisation | Mark Nelson | CIRMMT Immersive Presence Lab | Small dry room | 50 cm (20 in) from the higher cymbal
Classical Guitar | Johann Kaspar Mertz, An Malvina | Michel Salvail | CIRMMT Critical Listening Lab | Small lively room made of wood | 50 cm (20 in) from the guitar’s soundhole
Voice | Libby Larsen, A man can love two women | Margaret Rood | Tanna Schulich Hall | Small concert hall (200 seats) made of wood | 150 cm (60 in) from the mouth
Violin | Improvisation | Sonia Coppey | Tanna Schulich Hall | Small concert hall (200 seats) made of wood | 150 cm (60 in) from the violin
Table 1. Details of the five musical excerpts used in the study
We chose these microphones for their frequency
response, ranging from 10 Hz to 60 kHz. According
to the Nyquist theorem, the highest frequency that
can be represented at a sample rate of 88.2 kHz is
44.1 kHz. Therefore, the frequency response of the
Sennheiser MKH 8020 microphones does not limit the
captured bandwidth when recording at a sample rate
of 88.2 kHz.
We split the two analog outputs of the preamplifier
(Left and Right) into four channels (Left and Right
twice), which were converted to 24-bit digital signals,
one pair at 44.1 kHz and the other at 88.2 kHz, using
two Micstasy analog-to-digital converters (RME,
Germany). We used the 744T portable audio recorder
(Sound Devices, WI, USA) to record the digital signal
at 44.1 kHz, and Logic Studio software on a MacBook
Pro (Apple, CA, USA) to record the digital signal at
88.2 kHz. The entire recording chain is detailed in
Figure 2.
We isolated five short excerpts from our recordings,
corresponding to musical phrases of five to eight
seconds, both at 44.1 kHz and 88.2 kHz. No sound
processing was applied, except for a fade-in and a
fade-out in Pyramix 6 software (Merging
Technologies, Switzerland). We made sure that the
selected files at 44.1 kHz and 88.2 kHz had the exact
same fades (in and out) and length. Then, we
down-sampled the 88.2 kHz files to 44.1 kHz. We used
Pyramix to down-sample the files because this
software is commonly used by sound engineers who
record acoustic music in high-resolution formats.
Furthermore, the down-sampling algorithm in
Pyramix offers no user settings that could
introduce bias.
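The paper does not describe Pyramix's internal algorithm, so the following is only a generic textbook sketch of down-sampling by a factor of two (88.2 kHz to 44.1 kHz): low-pass filter the signal below the new Nyquist frequency, then keep every second sample. The windowed-sinc FIR design, the 101-tap length, the Blackman window and the 20 kHz cutoff are all illustrative assumptions, and the function names are hypothetical.

```python
import math


def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Blackman-windowed sinc low-pass FIR; `cutoff` in cycles/sample (0..0.5)."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps so the filter has a centre tap")
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (infinitely long) low-pass impulse response, truncated.
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        # Blackman window tapers the truncation to improve stopband rejection.
        x = 2 * math.pi * n / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(x) + 0.08 * math.cos(2 * x)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def downsample_by_two(signal: list[float], num_taps: int = 101,
                      cutoff: float = 20_000 / 88_200) -> list[float]:
    """Low-pass filter, then keep every second sample (e.g. 88.2 -> 44.1 kHz)."""
    taps = lowpass_taps(num_taps, cutoff)
    mid = num_taps // 2
    out = []
    for i in range(0, len(signal), 2):  # only compute the samples we keep
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + mid - j  # centred convolution, so no added delay
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out


def tone(freq_hz: float, fs: float, n: int) -> list[float]:
    """Unit-amplitude sine wave of n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / fs) for t in range(n)]


kept = downsample_by_two(tone(1_000, 88_200, 882))      # inside the audio band
removed = downsample_by_two(tone(30_000, 88_200, 882))  # above 22.05 kHz
print(max(abs(v) for v in kept[60:380]), max(abs(v) for v in removed[60:380]))
```

Away from the edges of the signal, the 1 kHz tone passes with a peak close to 1, while the 30 kHz tone, which would otherwise alias to 14.1 kHz, is attenuated to a tiny residual. Commercial converters and editors typically use longer, more carefully optimized filters, and differences between such designs are what the down-sampling comparison in this study may be sensitive to.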
Figure 1. Top view of the cymbal recording in the
Immersive Presence Laboratory of CIRMMT
Pras & Guastavino Sample rate discrimination
AES 128th Convention, London, UK, 2010 May 22–25
Page 4 of 8
In summary, five musical excerpts were available in
three versions: 44.1 kHz, 88.2 kHz and the 88.2 kHz
version down-sampled to 44.1 kHz. The experiment
consisted of five blocks corresponding to the five
musical excerpts. Each block consisted of 12 trials,
i.e. all three pairwise combinations of the three
versions, each presented four times (twice in each of
the two presentation orders).
Figure 2. Recording diagram
2.3. Procedure
Participants were asked to perform a double-blind
ABX task. In each trial, three versions of the excerpt
were presented, namely A, B and the reference X.
A and B always differed, and X was always identical
to either A or B. The participant’s task was to
indicate whether X = A or X = B. To control for order
effects, the order of presentation was randomized
across trials and blocks.
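The block design and randomized ABX presentation described above can be sketched as a trial generator. This is an illustration under stated assumptions, not the authors' Max/MSP interface: the paper does not say how X was assigned within each A/B order, so drawing X at random is an assumption, and all names (`make_block`, `make_session`, the version labels) are hypothetical.

```python
import itertools
import random

VERSIONS = ("44.1 kHz", "88.2 kHz", "88.2 kHz down-sampled to 44.1 kHz")
EXCERPTS = ("Orchestra", "Cymbals", "Classical Guitar", "Voice", "Violin")


def make_block(rng: random.Random) -> list[dict]:
    """12 ABX trials: 3 version pairs x 2 presentation orders x 2 repetitions."""
    trials = []
    for a, b in itertools.permutations(VERSIONS, 2):  # 6 ordered (A, B) pairs
        for _ in range(2):                             # each order presented twice
            x = rng.choice((a, b))                     # assumption: X drawn at random
            trials.append({"A": a, "B": b, "X": x,
                           "answer": "A" if x == a else "B"})
    rng.shuffle(trials)                                # randomize trial order
    return trials


def make_session(seed: int) -> list[tuple[str, list[dict]]]:
    """One block per excerpt, with the block order also randomized."""
    rng = random.Random(seed)
    blocks = [(excerpt, make_block(rng)) for excerpt in EXCERPTS]
    rng.shuffle(blocks)
    return blocks


session = make_session(seed=1)
print(len(session), sum(len(trials) for _, trials in session))  # 5 blocks, 60 trials
```

Seeding the generator per participant would make each randomized session reproducible for later analysis.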
Participants had to listen to all three versions
presented in a trial at least once, and could then
repeat each version as many times as desired or
switch between versions while playing before making
their decision. If they were unsure, they were asked
to pick a version arbitrarily. Before the experimental
session, we demonstrated the graphical interface with
four practice trials. Listeners were free to adjust the
sound level and their position if needed. The duration
of the experiment ranged from two to four hours
per participant, including a break between blocks
of trials.
The experiment took place in the Critical Listening
Laboratory of the Centre for Interdisciplinary
Research in Music Media and Technology
(CIRMMT, Montréal, QC, Canada). This ITU-standard
room provides high-quality controlled listening
conditions. Stimuli were presented through an RME
Fireface 800 digital-to-analog converter, a Grace
m906 monitor controller (Grace Design, CO, USA), a
Classé CA-5200 stereo amplifier (Classé Audio, QC,
Canada) and B&W 802D loudspeakers (Bowers &
Wilkins, West Sussex, England). Although the RME
Fireface 800 may not be considered a high-end
digital-to-analog converter, we used it because it was
the only converter that allowed us to switch sample
rates between 44.1 kHz and 88.2 kHz in a reasonable
amount of time. To avoid clipping, we adjusted
delays in our user interface, programmed in
Max/MSP/Jitter 5 (Cycling ’74, CA, USA), resulting
in a 730 ms gap between versions. B&W 802D
loudspeakers have a frequency response ranging from
27 Hz to 33 kHz, which extends beyond the 22.05 kHz
Nyquist frequency of 44.1 kHz audio and thus allows
the high-frequency content of high-resolution formats
to be reproduced.
2.4. Post-study questionnaire
After the listening task, participants were invited to
fill out a questionnaire. The first part concerned
demographic information, studio experience and
musical training. Then, expert listeners were asked to
rate the difficulty of the listening task on a scale of 0
to 10, as well as to describe the perceptual
differences between the different versions. Finally,
we asked which sample rate(s) they commonly use
while recording and why.
3. RESULTS
3.1. Overall discrimination
Cumulative binomial tests on the number of correct
responses were conducted for each participant,
collapsing over all comparison pairs and all musical
excerpts. At this individual level, three expert
listeners out of 16 obtained significant results,
p < .05, 2-tailed. However, all three performed
significantly below chance, i.e. they consistently
selected the wrong answer, suggesting that they could hear
differences between A and B but picked the wrong
one (e.g. answering X = A when in fact X = B). We
therefore present the results of these three
participants separately. The remaining 13 participants
did not perform above chance level, either at the
individual or group level, p > .05, 2-tailed, when
collapsing over all format comparison pairs and all
musical excerpts.
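The cumulative binomial test used here can be sketched in a few lines. This is a minimal illustration, assuming a chance rate of 0.5 for the ABX task and a two-tailed p-value obtained by doubling the smaller tail; the function names and the example scores are hypothetical, not study data. The two-tailed form is what flags consistently wrong answers, as for the three participants above.

```python
from math import comb


def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the cumulative one-tailed p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def two_tailed_p(n: int, correct: int) -> float:
    """Two-tailed p-value against chance (p = 0.5), doubling the smaller tail.

    Scores far below chance are flagged as significant, just like scores
    far above it, which is how consistent wrong answers show up.
    """
    above = upper_tail(n, correct)      # P(X >= correct)
    below = upper_tail(n, n - correct)  # P(X <= correct), by symmetry at p = 0.5
    return min(1.0, 2 * min(above, below))


# Hypothetical scores out of 60 trials (5 blocks x 12 trials), not study data.
for correct in (40, 30, 20):
    print(correct, round(two_tailed_p(60, correct), 4))
```

A score of 40/60 and a score of 20/60 receive the same p-value, below .05, while 30/60 gives p = 1.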
We applied signal detection theory to take the
false-alarm rate into account [1][4]. This analysis
confirmed our findings: whenever the binomial test
was significant, the corresponding |d’| was greater
than 1 and its 95% confidence interval did not
include 0.
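A common way to compute d’ with a confidence interval is sketched below. It uses the simple yes/no model d’ = z(H) − z(F), treating "X = A" responses on trials where X really was A as hits and "X = A" responses on trials where X was B as false alarms; this mapping, the standard large-sample variance approximation and the function names are assumptions. Dedicated ABX models (see [1] and [4]) scale d’ differently, so the exact method used in the study may differ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Yes/no sensitivity index d' = z(H) - z(F); negative when F exceeds H."""
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(false_alarm_rate)


def d_prime_ci95(hits: int, n_x_is_a: int, false_alarms: int,
                 n_x_is_b: int) -> tuple[float, tuple[float, float]]:
    """d' and an approximate 95% confidence interval.

    hits: "X = A" responses on trials where X really was A.
    false_alarms: "X = A" responses on trials where X was B.
    Rates of exactly 0 or 1 make inv_cdf fail and must be corrected first.
    """
    h, f = hits / n_x_is_a, false_alarms / n_x_is_b
    zh, zf = STD_NORMAL.inv_cdf(h), STD_NORMAL.inv_cdf(f)
    # Large-sample approximation of the variance of d'.
    variance = (h * (1 - h) / (n_x_is_a * STD_NORMAL.pdf(zh) ** 2)
                + f * (1 - f) / (n_x_is_b * STD_NORMAL.pdf(zf) ** 2))
    d = zh - zf
    half_width = 1.96 * variance ** 0.5
    return d, (d - half_width, d + half_width)


# Hypothetical counts for one listener, not data from this study.
d, (low, high) = d_prime_ci95(hits=22, n_x_is_a=30, false_alarms=10, n_x_is_b=30)
print(f"d' = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

A listener who systematically picks the wrong answer gets a negative d’, which is why the criterion above is stated in terms of |d’|.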
To further test our research hypotheses, performance
results were analyzed separately for each format
comparison.
3.2. Format discrimination
We conducted binomial tests on the number of
correct responses for each format comparison
collapsing over all 13 participants and all musical
excerpts. Significant results were observed for the
comparison between files recorded at 88.2 kHz and
their down-sampled 44.1 kHz version, p = .04,
1-tailed². A tendency was observed for the comparison
between files recorded at 88.2 kHz and 44.1 kHz,
p = .1. No significant result was observed for the
comparison between files recorded at 44.1 kHz and
files down-sampled to 44.1 kHz, p = .2.
The same tests were conducted for the three
participants who significantly picked the wrong
answer. Significant results were observed for the
comparison between files recorded at 88.2 kHz and
their down-sampled 44.1 kHz version, as well as for
the comparison between files recorded at 44.1 kHz
and files down-sampled to 44.1 kHz, p = .02,
p < .001, respectively. However, no significant
results were observed for the comparison between
files recorded at 88.2 kHz and 44.1 kHz, p = .15.
² One-tailed binomial tests were used to test our directional research
hypothesis.
3.3. Discrimination by musical excerpts
Figure 3 shows the percentage of times the 13
remaining participants selected the correct answer for
each format comparison and musical excerpt.
According to the binomial test, performances above
63% indicate that expert listeners could discriminate
between the two versions and picked the correct
answer. Performances between 37% and 63% are not
significant (p > .05), suggesting that listeners could
not discriminate between the two versions.
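The 63% and 37% bounds, and the 17% and 83% bounds used for the three remaining participants, can be approximately reproduced with an exact binomial calculation. This sketch assumes one-tailed tests at α = .05 (as stated in footnote 2) and per-comparison sample sizes derived from the design: 13 listeners × 4 presentations = 52, and 3 listeners × 4 = 12. Because some cells have fewer observations, the exact thresholds shift slightly from cell to cell. The function names are hypothetical.

```python
from math import comb
from typing import Optional


def tail_at_least(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly from counts."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


def critical_count(n: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of correct answers significant at `alpha` (one-tailed).

    Returns None when even n out of n correct is not significant.
    """
    for k in range(n + 1):
        if tail_at_least(n, k) <= alpha:
            return k
    return None


# n = 52: 13 listeners x 4 presentations of one comparison for one excerpt.
# n = 12: the same for the three listeners analysed separately.
for n in (52, 12):
    k = critical_count(n)
    print(f"n = {n}: upper bound {k / n:.1%}, lower bound {(n - k) / n:.1%}")
```

For n = 52 the critical count is 33 (63.5%), with 19 (36.5%) as the symmetric lower bound; for n = 12 it is 10 (83.3%) and 2 (16.7%).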
Regarding the comparison between files recorded at
88.2 kHz and 44.1 kHz, significant results were
observed for the Orchestra excerpt only, p = .02.
Regarding the comparison between files recorded at
88.2 kHz and their down-sampled 44.1 kHz version,
significant results were observed for the Classical
Guitar and the Voice excerpts, p = .004, p = .04,
respectively. Regarding the comparison between files
recorded at 44.1 kHz and files down-sampled to
44.1 kHz, no significant result was observed for any
musical excerpt.
Figure 3. Discrimination results for the 13 remaining
participants (n = 149 for Orchestra, n = 150 for
Cymbals, n = 156 for Classical Guitar, Voice and
Violin; N = 767 for all excerpts)
Figure 4 shows the percentage of times the three
participants who significantly picked the wrong
answer selected the correct answer for each format
comparison and musical excerpt. According to the
binomial test, performances below 17% indicate that
listeners could discriminate between the two versions
but picked the wrong answer. Performances between
17% and 83% are not significant (p > .05),
suggesting that listeners could not discriminate
between the two versions.
It should be noted that significance levels depend on
the number of observations, hence the different
dotted lines in Figures 3 and 4 (the number of
observations is given in the captions).
Regarding the comparison between files recorded at
88.2 kHz and 44.1 kHz, significant results were
observed for the Violin excerpt only, p = .006.
Regarding the comparison between files recorded at
88.2 kHz and their down-sampled 44.1 kHz version,
no significant result was observed. Regarding the
comparison between files recorded at 44.1 kHz and
files down-sampled to 44.1 kHz, significant results
were observed for the Classical Guitar and the Violin
excerpts, p = .02, p = .006, respectively.
Figure 4. Discrimination results for the three
participants who significantly picked the wrong
answer (n = 36 for Orchestra, Cymbals, Classical
Guitar and Voice; n = 32 for Violin; N = 176)
Although these three participants significantly picked
the wrong answer over all format comparisons and
musical excerpts, we observed that for the
comparison between the Orchestra and Cymbals files
recorded at 88.2 kHz and 44.1 kHz, their percentage
of correct answers was similar to that of the 13
remaining participants. However, these results did not
reach statistical significance given the low number of
observations. When collapsing over all 16
participants, the comparison between Orchestra files
recorded at 88.2 and 44.1 kHz remains significant,
p = .01.
3.4. Post questionnaire
On a scale from 0 to 10, expert listeners reported that
the difficulty level of the task was 9 on average
(SD = 1.1). They commented that the task was very
demanding in terms of concentration and that it was
hard to stop doubting what they heard. Thirteen
out of 16 participants described in their own words
the perceived differences between the different
versions. We extracted a total of 16 phrasings from
these verbal descriptions and grouped them into five
categories of sound criteria, namely spatial
reproduction (7 occurrences), high frequency
richness (7 occ.), timbre (5 occ.), precision (5 occ.)
and fullness (2 occ.).
Ten out of 16 participants reported that they are used
to working both at 1 fs (i.e. 44.1 or 48 kHz)³ and at
2 fs (i.e. 88.2 or 96 kHz) in recording studios. Six
participants further specified that their choice of
sample rate depends on the format of final delivery.
More specifically, three mentioned selecting 2 fs for
classical music and 1 fs for pop music due to digital
signal processing (DSP) limitations. Five other
participants reported always recording at 1 fs, and the
remaining participant reported recording only at
88.2 kHz. Overall, participants justified recording at
1 fs by storage space (5 occ.) and equipment
limitations (4 occ.), while they chose to record at
2 fs to enhance the sensation of space (3 occ.) and to
get the highest possible resolution (3 occ.).
4. DISCUSSION
Findings from the listening tests suggest that expert
listeners can detect differences between musical
excerpts presented at 88.2 kHz and 44.1 kHz.
Moreover, the qualitative analysis of verbal
descriptors indicates that these differences were
perceived in terms of spatial reproduction, high
frequency content, timbre and precision. However,
the ability to perceive these differences depends on
the format comparison and musical excerpt. Listeners
could significantly discriminate between files
recorded at different sample rates only for the
orchestral excerpt, the only recording of a complex
scene with different musical instruments playing in a
medium concert hall. This finding provides support
for theories that high-resolution formats better
reproduce the details of transients and room acoustics
[10][11].
³ fs: sampling frequency, i.e. sample rate.
Furthermore, our findings show that listeners were
more sensitive to differences between files recorded
at 88.2 kHz and their 44.1 kHz down-sampled
version, than to differences between files recorded at
different sample rates. Because we down-sampled the
files with a single software program, further
investigation of down-sampling algorithms is
required before drawing conclusions about the impact
of down-sampling vs. recording directly at 44.1 kHz.
Nevertheless, our findings call into question the
common practice of recording at high sample rates
and later down-sampling, as it seems to lower the
sound quality more than recording directly at
44.1 kHz. Therefore, sound
engineers should consider the format of final delivery
and commercial release before choosing the
recording sample rate.
While we observed audible differences between
sample rates of 88.2 and 44.1 kHz, they remain very
subtle and difficult to detect. It is difficult to interpret
why three out of 16 participants significantly picked
the wrong answer. We verified every step of the data
collection and analysis. A possible reason could be
that, given the difficulty and duration of the listening
test, these participants doubted themselves so much
that they lost confidence and systematically picked
the wrong answer.
It should also be noted that all the files used in this
study were recorded and presented in 24 bits. Thus,
we were not comparing the CD standard (i.e.
44.1 kHz, 16 bits) with high-resolution formats but
restricted our experiment to sample rate
discrimination. This choice was based on the fact that
the limitations of the 16-bit depth of the CD standard
have already been identified and documented [10].
Therefore, differences between the CD standard and
high-resolution audio formats should be easier to
detect than the differences observed in this study.
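The bit-depth limitation mentioned above can be quantified with the textbook signal-to-quantization-noise ratio for a full-scale sine wave under uniform N-bit quantization, roughly 6.02 N + 1.76 dB. The sketch below (function name hypothetical) is an idealized figure that ignores dither and analog noise; real 24-bit converters fall well short of the theoretical 24-bit value.

```python
import math


def quantization_snr_db(bits: int) -> float:
    """Theoretical SNR of a full-scale sine under uniform N-bit quantization.

    Signal power (2**(N - 1) * step)**2 / 2 divided by noise power
    step**2 / 12 equals 1.5 * 2**(2 * N), i.e. about 6.02 * N + 1.76 dB.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


for bits in (16, 24):
    print(f"{bits} bits: {quantization_snr_db(bits):.1f} dB")
```

This gives about 98.1 dB for the 16-bit CD standard and 146.3 dB for the 24-bit files used in this study.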
Participants suggested using more excerpts with long
reverberation in future experiments. Indeed, we
focused here on different instruments and only
included one complex auditory scene in a medium
hall. For this orchestral excerpt only, participants
were able to significantly discriminate between
44.1 kHz and 88.2 kHz. These perceptual differences
will be further investigated by varying systematically
and independently the complexity of the auditory
scene and the acoustics of the room. Furthermore, we
plan to replicate this study with professional
musicians to quantify the extent to which the ability
to hear differences between sample rates depends on
expertise. We will also extend this research to
preference tests on the file comparisons that yielded
significant results. In addition, the stimuli used for
our listening tests were also recorded simultaneously
through different analog-to-digital converters. We are
currently investigating the sensitivity of expert
listeners to different converters.
5. ACKNOWLEDGMENT
The work reported herein was funded by an FQRSC
team grant on the Perception of Audio Quality (P.I.:
I. Fujinaga, CG and 4 co-applicants). The user
interface for the listening task was programmed in
Max/MSP by Guillaume Boutard and Julien
Boissinot. The authors would like to thank the
musicians who participated in the stimuli recordings,
and the expert listeners who offered their listening
skills and time for this study. The authors would also
like to thank Julien Boissinot, Yves Methot and
Harold Kilianski for technical assistance during the
recordings and the experiments conducted at the
Centre for Interdisciplinary Research in Music Media
and Technology (Montréal, QC, Canada), and
Maryse Lavoie and Aaron Rosenblum for
proofreading.
6. REFERENCES
[1] Boley, J., & Lester, M., “Statistical Analysis of
ABX Results Using Signal Detection Theory,”
presented at the AES 127th Convention, New
York, NY, USA, 2009 October 9–12.
[2] Laugier, V., “La Haute Résolution
Audionumérique,” Master’s thesis, CNSMDP
(France). (June 2005) [Not published]
[3] Leonard, B., “The Downsampling Dilemma:
Perceptual Issues in Sample Rate Reduction,”
presented at the AES 124th Convention,
Amsterdam, The Netherlands, 2008 May 17–20.
[4] Macmillan, N. A., & Creelman, C. D., “Detection
Theory: A User’s Guide,” Cambridge: Cambridge
University Press. (1991)
[5] Meyer, E. B., & Moran, D. R., “Audibility of a
CD-Standard A/D/A Loop Inserted into High-
Resolution Audio Playback,” J. Audio Eng. Soc.,
vol. 55(9), pp. 775–779. (September 2007)
[6] Nishiguchi, T., Hamasaki, K., Iwaki, M., & Ando,
A., “Perceptual Discrimination between Musical
Sounds with and without Very High Frequency
Components,” presented at the AES 115th
Convention, New York, NY, USA, 2003 October 10–13.
[7] Nordmark, J. O., “Binaural Time Discrimination,”
J. Acoust. Soc. Am., vol. 60(4), pp. 870–880.
(October 1976)
[8] Nyquist, H., “Certain Topics in Telegraph
Transmission Theory,” AIEE Trans., vol. 47,
pp. 617–644. (April 1928). Reprinted in Proc.
IEEE, vol. 90(2), pp. 280–305. (2002)
[9] Rumsey, F., “High Resolution Audio,” J. Audio
Eng. Soc., vol. 55(12), pp. 1161–1167.
(December 2007)
[10] Stuart, J., “Coding for High-Resolution Audio
Systems,” J. Audio Eng. Soc., vol. 52(3),
pp. 117–144. (March 2004)
[11] Woszczyk, W., “Physical and Perceptual
Considerations for High-Resolution Audio,”
presented at the AES 115th Convention, New
York, NY, USA, 2003 October 10–13.
[12] Yoshikawa, S., Noge, S., Ohsu, M., Toyama, S.,
Yanagawa, H., & Yamamoto, T., “Sound Quality
Evaluation of 96 kHz Sampling Digital Audio,”
presented at the AES 99th Convention, New
York, NY, USA, 1995 October 6–9.
... Table 1 provides a near complete listing of all perceptual studies (i.e., listening tests) involving high resolution audio. Studies generally are divided into those focused on Real world content [11,24,35,43,58,[62][63][64][65] Same different [23,66] ABX [64,[67][68][69] AXY [24,70] XY [71,72] MulƟsƟmulus raƟng establishing the limits of auditory perception and those focused on our ability to discriminate differences in format. ...
... Others (Nishiguchi 2003 andHamasaki 2004) may have employed an equivalent of the Martingale betting system, where an experiment was repeated with a participant until a lack of effect was observed (though this may also be considered a method of verifying an initial observation). And several studies had conclusions that may have suffered from the multiple comparisons problem (Yoshikawa 1995, Nishiguchi 2003, Hamasaki 2004, Pras 2010. Interestingly, several studies reported results suggesting that for some trials, participants had an uncanny ability to discriminate far worse than guessing (Oohashi 1991, Meyer 2007, Woszcyk 2007, Pras 2010. ...
... And several studies had conclusions that may have suffered from the multiple comparisons problem (Yoshikawa 1995, Nishiguchi 2003, Hamasaki 2004, Pras 2010. Interestingly, several studies reported results suggesting that for some trials, participants had an uncanny ability to discriminate far worse than guessing (Oohashi 1991, Meyer 2007, Woszcyk 2007, Pras 2010. ...
Article
Full-text available
There is considerable debate over the benefits of recording and rendering high resolution audio, i.e., systems and formats that are capable of rendering beyond CD quality audio. We undertook a systematic review and meta-analysis to assess the ability of test subjects to perceive a difference between high resolution and standard, 16 bit, 44.1 or 48 kHz audio. All 18 published experiments for which sufficient data could be obtained were included, providing a meta-analysis involving over 400 participants in over 12,500 trials. Results showed a small but statistically significant ability of test subjects to discriminate high resolution content, and this effect increased dramatically when test subjects received extensive training. This result was verified by a sensitivity analysis exploring different choices for the chosen studies and different analysis approaches. Potential biases in studies, effect of test methodology, experimental design, and choice of stimuli were also investigated. The overall conclusion is that the perceived fidelity of an audio recording and playback chain can be affected by operating beyond conventional levels.
... There has been a lot of discussion regarding whether CD and other similar media provide adequate resolution to recreate the live listening experience, with the aim of determining whether such high resolutions as 192 kHz might be excessive [1,2,3,4,5,6]. It is commonly assumed that the upper frequency limit of the human auditory system for the perception of spectral information is about 20 kHz [7,8,9], so frequencies above this have been termed "ultrasonic". ...
... Several authors have reported positive results [1,5,10,11]. ...
... Pras et al. [5] found that expert listeners were able to distinguish between musical excerpts with sample rates of 44 kHz and 88.2 kHz. These differences, although difficult to detect, were reported by the listeners as corresponding to changes in the spatial reproduction, high frequency richness, timbre, precision, and fullness of the sounds. ...
Article
This paper describes listening tests investigating the audibility of various filters applied in a high-resolution wideband digital playback system. Discrimination between filtered and unfiltered signals was compared directly in the same subjects using a double-blind psychophysical test. Filter responses tested were representative of anti-alias filters used in A/D (analogue-to-digital) converters or mastering processes. Further tests probed the audibility of 16-bit quantization with or without a rectangular dither. Results suggest that listeners are sensitive to the small signal alterations introduced by these filters and quantization. Two main conclusions are offered: firstly, there exist audible signals that cannot be encoded transparently by a standard CD; and secondly, an audio chain used for such experiments must be capable of high-fidelity reproduction.
... Previous audio quality research in the context of digitization has explored listeners' ability to discriminate between audio files presented in CD audio and compressed MP3 formats [1]. Pras and Guastavino [2] have also compared the differentiability of sample rates. However, discrimination of digitized audio recorded using different phonograph playback systems (PPS) has to our knowledge remained unexplored. ...
... For each condition, we recorded five different musical examples two times each through each system described above, following current digitization guidelines (24-bit/96 kHz). The excerpts were presented to participants in an ITU-standard listening room. Each condition took approximately one hour to complete. ...
... Components for the initial mid-range system were selected based on their moderate price given our collection. We then proceeded to minimize the perceived disparity through the following steps: (1) replace mid-range system component n, (2) adjust the setting for n within the recommended range of acceptable settings (if provided; otherwise, manufacturers were contacted) to best match high-end sound quality, (3) perform the equipment setup according to a well-reputed guide [3] and calibrate it using the Ultimate Analogue Test Record (Analogue Productions, KS, USA), and (4) perform an informal AB comparison using software described in Section 2.4 to determine the closeness of the two PPS. Each of these comparisons was performed using a set of records withheld from our actual test conditions. ...
Article
Digitization of phonograph records is an important step towards the preservation of our cultural history and heritage. The phonograph playback systems (PPS) required for this digitization process consist of several components in a variety of price ranges. We report on the results of two listening tests intended to ascertain the extent to which expert listeners can discriminate between PPS of different price ranges. These results are intended to determine the extent to which component selection affects the discrimination between PPS and to provide a set of guidelines for the purchase of PPS components for the digitization of phonograph record collections.
... This change therefore can be represented by 7 bits. The total number of bits for the biological signals of all muscles can be obtained by multiplying the total number of muscles by 7. In addition, the brain transmits a command signal to each muscle 60 times per second; thus, the frequency is set to 60 Hz [36]. The bit rate for a muscle is obtained by multiplying the frequency by the data size. ...
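The arithmetic in this passage (7 bits per change, 60 commands per second) can be written out directly. A minimal sketch; the muscle count is left as an input because the passage does not state it, and the value used in the demo is purely illustrative:

```python
BITS_PER_CHANGE = 7    # bits to represent one change in a muscle's signal (from the passage)
COMMAND_RATE_HZ = 60   # commands sent to each muscle per second (from the passage)

def muscle_bit_rate(bits: int = BITS_PER_CHANGE, rate_hz: int = COMMAND_RATE_HZ) -> int:
    """Bit rate for one muscle: command frequency times data size (bit/s)."""
    return bits * rate_hz

def total_motor_bit_rate(n_muscles: int, bits: int = BITS_PER_CHANGE,
                         rate_hz: int = COMMAND_RATE_HZ) -> int:
    """Aggregate bit rate for n_muscles muscles (bit/s)."""
    return n_muscles * muscle_bit_rate(bits, rate_hz)

if __name__ == "__main__":
    print(muscle_bit_rate(), "bit/s per muscle")  # 7 * 60 = 420
    n = 100  # illustrative muscle count, not a figure from the paper
    print(total_motor_bit_rate(n), "bit/s for", n, "muscles")
```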
Article
In this paper, the amount of information collected by sensory and motor organs is analyzed. Humans use their senses to recognize the information around them and to perform appropriate actions. To perform these processes, information is exchanged between organs via analog electrical signals. These signals convey information transmitted from the sensory organs to the cerebrum, or commands sent from the cerebrum to the appropriate motor organs. This paper analyzes the amount of analog signal information generated inside the human body and converts it into equivalent digital data, in an effort to build a human-like humanoid based on that equivalent digital data. The analog information generated in the human body is investigated based on the medical publications to date. These analyses yield the bit rate and delay requirements of nervous systems built with digital networks. The paper shows that, for a humanoid performing at a human-like level, the two artificial eyes together generate the equivalent of approximately 14 gigabits from a single look. In addition, the human body is found to be more sensitive to pressure than to temperature, since the pressure sensation generates, on average, more information than the temperature sensation.
... Higher-than-CD data rate doesn't guarantee improved sound quality, but doubling or quadrupling sample rate from 44.1 or 48 kHz has shown incremental improvements [26][27][28][29][30][31][32]. ...
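For uncompressed linear PCM, the data rate is simply sampling rate × bit depth × channel count, which makes the cost of doubling or quadrupling the rate concrete. A minimal sketch assuming two-channel (stereo) PCM with no container or error-correction overhead:

```python
def pcm_bit_rate(fs_hz: int, bits: int, channels: int = 2) -> int:
    """Uncompressed linear PCM data rate in bit/s."""
    return fs_hz * bits * channels

if __name__ == "__main__":
    cd = pcm_bit_rate(44_100, 16)
    for fs, bits in [(44_100, 16), (48_000, 16), (88_200, 24), (96_000, 24), (192_000, 24)]:
        rate = pcm_bit_rate(fs, bits)
        print(f"{fs / 1000:g} kHz / {bits}-bit: {rate:,} bit/s ({rate / cd:.2f}x CD)")
```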
Article
Recent interest in high-resolution digital audio has been accompanied by a trend to higher and higher sampling rates and bit depths, yet the sound quality improvements show diminishing returns and so fail to reconcile human auditory capability with the information capacity of the channel. We propose an audio capture, archiving, and distribution methodology based on sampling kernels having finite length, unlike the “ideal” sinc kernel that extends indefinitely. We show that with the new kernels, original transient events need not become significantly extended in time when reproduced. This new approach runs contrary to some conventional audio desiderata such as the complete elimination of aliasing. The paper reviews advances in neuroscience and recent evidence on the statistics of real signals, from which we conclude that the conventional criteria may be unhelpful. We show that this proposed approach can result in improved time/frequency balance in a high-performance chain whose errors, from the perspective of the human listener, are equivalent to those introduced when sound travels a short distance through air.
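The contrast between the ideal sinc kernel and a finite-length kernel can be illustrated numerically. This sketch uses a Lanczos (sinc-windowed sinc) kernel as a generic stand-in; it is an assumption for illustration and not the kernel family proposed in the paper. Reconstructing an isolated impulse shows that the sinc kernel leaves nonzero ringing arbitrarily far from the event, whereas the finite kernel is exactly zero beyond `a` samples:

```python
import numpy as np

def sinc_kernel(x):
    """Ideal (band-limited) reconstruction kernel; its support extends indefinitely."""
    return np.sinc(x)  # numpy's normalized sinc: sin(pi x) / (pi x)

def lanczos_kernel(x, a=3):
    """Finite-length kernel: sinc windowed by a wider sinc, zero for |x| >= a."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def reconstruct(samples, t, kernel):
    """Evaluate sum_n samples[n] * kernel(t - n) at fractional sample positions t."""
    n = np.arange(len(samples))
    return np.array([np.sum(samples * kernel(ti - n)) for ti in np.atleast_1d(t)])

if __name__ == "__main__":
    impulse = np.zeros(101)
    impulse[50] = 1.0                    # an isolated transient
    far = np.array([60.5, 80.5])         # positions 10.5 and 30.5 samples away
    print("sinc:   ", reconstruct(impulse, far, sinc_kernel))
    print("lanczos:", reconstruct(impulse, far, lanczos_kernel))
```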
... These software toolkits also contain routines for playing back audio directly within the interface, though this functionality is generally limited [269]. The human auditory system displays heightened sensitivity within the frequency range of 2-5 kHz, which should be taken into consideration when determining an appropriate sampling rate. ...
Article
These audio files accompany the PhD dissertation by R. L. Alexander entitled "The Bird's Ear View: Audification for the Spectral Analysis of Heliospheric Time Series Data".
... Sound: In its everyday use related to music, the term 'sound' may rather vaguely refer to the acoustical 'footprint' of either a certain instrument, ensemble, genre, performance space, audio production paradigm, time-epoch, playback device, effect-device, compression algorithm, audio emitter or playback room that makes a difference for the musical experience (Västfjäll et al., 2002; Smudits, 2003; Timmers, 2007; Pras et al., 2009; Pras & Guastavino, 2010; Moore and Dockwray, 2010). By drawing on ecological perceptual psychology (Gibson, 1986), the quasi-Gestalt-quality of these impressions may be explained by the perception and recognition of real existing spectro-morphological invariants in the audio signal that specify 'musical affordances' (Windsor and de Bézenac, 2012). ...
Article
Within academic music research, ‘musical expertise’ is often employed as a ‘moderator variable’ when conducting empirical studies on music listening. Prevalent conceptualizations typically conceive of it as a bundle of cognitive skills acquired through formal musical education. By implicitly drawing on the paradigm of the Western classical live concert, this ignores that for most people nowadays, the term ‘music’ refers to electro-acoustically generated sound waves rendered by audio or multimedia electronic devices. Hence, our paper tries to challenge the traditional musicologist’s view by drawing on empirical findings from three newer music-related research lines that explicitly include the question of media playback technologies. We conclude by suggesting a revised ‘musical expertise’ concept that extends from the traditional dimensions and also incorporates expertise gained through ecological perception, material practice and embodied listening experiences in the everyday. Altogether, our contribution shall draw attention to growing convergences between musicology and media and communications research.
Article
Existing designs for collaborative online audio mixing and production, within a Digital Audio Workstation (DAW) context, require a balance between synchronous collaboration, scalability, and audio resolution. Synchronous multiparty collaboration models typically use compressed audio streams. Alternatively, those that stream high-resolution audio do not scale to multiple collaborators or suffer from network limitations. Asynchronous platforms allow collaboration using copies of DAW projects and high-resolution audio files. However, they require participants to contribute in isolation and to have their work auditioned through asynchronous communication, which is not ideal for collaboration. This paper presents an innovative online DAW collaboration framework for audio mixing that addresses these limitations. The framework allows collaborators to communicate synchronously while contributing to the control of a shared DAW project. Collaborators perform remote audio mixing with access to high-resolution audio and receive real-time updates of remote collaborators’ actions. Participants share project and audio files before a collaboration session; during the session, however, the framework transmits only control data for remote mixing actions. Implementation and evaluation have demonstrated scalability up to 30 collaborators on residential Internet bandwidth. The framework delivers an authentic studio mixing experience in which high-resolution audio projects are democratically auditioned and synchronously mixed by remotely located collaborators.
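Transmitting mixing actions instead of audio is what keeps the bandwidth low. The following sketch uses a hypothetical JSON message format and a toy mixer state (neither is the framework's actual protocol) to show a control update being encoded on one site and applied to the local copy of the project on another:

```python
import json
from dataclasses import dataclass, field

@dataclass
class MixerState:
    """Toy local copy of a shared project's mix parameters (hypothetical)."""
    gains_db: dict = field(default_factory=dict)
    mutes: dict = field(default_factory=dict)

def encode_action(sender: str, track: str, param: str, value) -> bytes:
    """Serialize one mixing action as a small JSON control message."""
    msg = {"sender": sender, "track": track, "param": param, "value": value}
    return json.dumps(msg).encode("utf-8")

def apply_action(state: MixerState, payload: bytes) -> MixerState:
    """Apply a received control message to the local mixer state."""
    msg = json.loads(payload.decode("utf-8"))
    if msg["param"] == "gain_db":
        state.gains_db[msg["track"]] = float(msg["value"])
    elif msg["param"] == "mute":
        state.mutes[msg["track"]] = bool(msg["value"])
    else:
        raise ValueError(f"unsupported parameter: {msg['param']}")
    return state

if __name__ == "__main__":
    payload = encode_action("alice", "vocals", "gain_db", -3.5)
    local = apply_action(MixerState(), payload)
    print(local.gains_db, f"({len(payload)} bytes)")
    # For comparison: one second of 24-bit/96 kHz stereo audio.
    print(96_000 * 3 * 2, "bytes per second of audio")
```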
Article
On the topic of high-performance audio, there remains disagreement over the ways in which sound quality might benefit from higher sample-rates or bit-depths in a digital path. Here we consider the hypothesis that if a digital pathway includes any unintended or undithered quantizations, then several types of errors are imprinted, whose nature will change with increased sampling rate and wordsize. Although dither methods for ameliorating quantization error have been well understood in the literature for some time, these insights are not always applied in practice. We observe that it can be rare for a performance to be captured, produced, and played back through a chain that is “flawless” in this regard. The paper includes an overview of digital sampling and quantization with additive, subtractive, and noise-shaped dither. The paper also discusses more advanced topics such as cascaded quantizers, fixed- and floating-point arithmetic, and time-domain aspects of quantization errors. The paper concludes with guidelines and recommendations, including for the design of listening tests.
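Of the dither variants the paper reviews, noise shaping is the least obvious. A minimal sketch of one common form, a first-order error-feedback quantizer with TPDF dither inside the loop, follows; the filter order, dither type, and test signal are assumptions for illustration. Because each step subtracts the previous step's total error, the output error becomes e[n] − e[n−1], i.e. it is high-pass filtered: less noise at low frequencies, more near Nyquist. The sketch assumes low-level signals and omits clipping:

```python
import numpy as np

def tpdf(rng, lsb, n):
    """Triangular-PDF dither spanning +/- 1 LSB: sum of two independent RPDF sources."""
    return rng.uniform(-lsb / 2, lsb / 2, n) + rng.uniform(-lsb / 2, lsb / 2, n)

def quantize_tpdf(x, lsb, rng):
    """Plain TPDF-dithered rounding (white error spectrum)."""
    return np.round((x + tpdf(rng, lsb, len(x))) / lsb) * lsb

def quantize_noise_shaped(x, lsb, rng):
    """First-order error feedback: output error is e[n] - e[n-1]."""
    d = tpdf(rng, lsb, len(x))
    y = np.empty_like(x)
    e_prev = 0.0
    for n in range(len(x)):
        v = x[n] - e_prev                      # feed back the previous total error
        y[n] = np.round((v + d[n]) / lsb) * lsb
        e_prev = y[n] - v                      # this step's error (dither + rounding)
    return y

def band_powers(err, fraction=0.25):
    """(power in the lowest `fraction` of the spectrum, total power) of an error signal."""
    p = np.abs(np.fft.rfft(err)) ** 2
    k = int(len(p) * fraction)
    return float(p[:k].sum()), float(p.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lsb = 2.0 ** -15                                    # 16-bit step for full scale +/-1
    x = 0.25 * np.sin(2 * np.pi * 0.01 * np.arange(1 << 16))
    for name, y in [("TPDF", quantize_tpdf(x, lsb, rng)),
                    ("noise-shaped", quantize_noise_shaped(x, lsb, rng))]:
        low, total = band_powers(y - x)
        print(f"{name:>12}: low-band share of error power = {low / total:.3f}")
```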
Article
We conducted subjective evaluation tests to study perceptual discrimination between musical sounds with and without very high frequency components (above 21 kHz). To allow strict evaluation tests, the sound reproduction system used for these tests was designed to exclude any leakage or influence of very high frequency components into the audible frequency range. As a result, no significant difference was found between sounds with and without very high frequency components for any of the sound stimuli or subjects. From these results, however, we can neither confirm nor deny the possibility that some subjects could discriminate between musical sounds with and without very high frequency components. Nevertheless, the results also showed that the test system is entirely reliable, and that further evaluation tests using this test system will accurately show whether the very high frequency components in sound stimuli affect human recognition of sound quality.
Article
What do we mean by high resolution? The recording and replay chain is reviewed from the viewpoints of digital audio engineering and human psychoacoustics. An attempt is made to define high resolution and to identify the characteristics of a transparent digital audio channel. The theory and practice of selecting high sample rates such as 96 kHz and word lengths of up to 24 bits are examined. The relative importance of sampling rate and word size at various points in the recording, mastering, transmission, and replay chain is discussed. Encoding methods that can achieve high resolution are examined and compared, and the advantages of schemes such as lossless coding, noise shaping, oversampling, and matched preemphasis with noise shaping are described.
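The link between word length and resolution is often summarized by the textbook signal-to-noise ratio of an ideal N-bit quantizer driven by a full-scale sine, about 6.02N + 1.76 dB. A short sketch, assuming uniformly distributed quantization error; the optional TPDF case adds the dither's own noise (triple the undithered noise power, about 4.77 dB lower SNR):

```python
import math

def ideal_sine_snr_db(bits: int, tpdf_dither: bool = False) -> float:
    """SNR (dB) of a full-scale sine through an ideal `bits`-bit quantizer.

    Signal power of a full-scale sine: (2**(bits-1) * lsb)**2 / 2.
    Undithered quantization noise power: lsb**2 / 12; TPDF dither adds lsb**2 / 6.
    """
    signal = (2 ** (bits - 1)) ** 2 / 2           # in units of lsb**2
    noise = 1 / 12 + (1 / 6 if tpdf_dither else 0.0)
    return 10 * math.log10(signal / noise)

if __name__ == "__main__":
    for bits in (16, 20, 24):
        print(f"{bits}-bit: {ideal_sine_snr_db(bits):.2f} dB undithered, "
              f"{ideal_sine_snr_db(bits, tpdf_dither=True):.2f} dB with TPDF dither")
```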
Article
Claims both published and anecdotal are regularly made for audibly superior sound quality for two-channel audio encoded with longer word lengths and/or at higher sampling rates than the 16-bit/44.1-kHz CD standard. The authors report on a series of double-blind tests comparing the analog output of high-resolution players playing high-resolution recordings with the same signal passed through a 16-bit/44.1-kHz "bottleneck." The tests were conducted for over a year using different systems and a variety of subjects. The systems included expensive professional monitors and one high-end system with electrostatic loudspeakers and expensive components and cables. The subjects included professional recording engineers, students in a university recording program, and dedicated audiophiles. The test results show that the CD-quality A/D/A loop was undetectable at normal-to-loud listening levels, by any of the subjects, on any of the playback systems. The noise of the CD-quality loop was audible only at very elevated levels.
Article
The research papers presented at the AES 32nd International Conference, held in Copenhagen, Denmark, on September 21-23, 2007, focusing on high-resolution audio, are summarized along with their results. The papers discussed include digital microphones for high-resolution audio, localization in spatial audio from wave field synthesis to 22.2, and MPEG-A, a professional archival multimedia application format (MAF) under development. The conference also discussed which of the two digital audio systems best matches the quality of the analog system. Results of the subjective tests in the presented papers suggest that audible differences between high-resolution and lower-resolution systems can be hard to hear in various cases.
Article
ABX tests have been around for decades and provide a simple, intuitive means to determine if there is an audible difference between two audio signals. Unfortunately, however, the results of proper statistical analyses are rarely published along with the results of the ABX test. The interpretation of the results may critically depend on a proper statistical analysis. In this paper, a very successful analysis method known as signal detection theory is presented in a way that is easy to apply to ABX tests. This method is contrasted with other statistical techniques to demonstrate the benefits of this approach.
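Whatever analysis is chosen, the baseline question for an ABX run is whether the score exceeds chance. One common approach, shown here as an assumption and not as the signal-detection method the paper advocates, is an exact one-sided binomial test against a guessing rate of 0.5:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` of `trials` ABX trials by guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("require 0 <= correct <= trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

if __name__ == "__main__":
    for correct in (10, 12, 14):
        print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A proportion-correct test like this mixes sensitivity with any response bias; separating the two is what a signal detection analysis adds.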
Conference Paper
Many options currently exist for sample rate conversion. With sample rate reduction playing an integral part in the modern production world, downsampling algorithm quality is more important than ever. This paper presents data exploring the differences in sample rate reduction algorithms. While certain tests clearly display differences in the quality of the algorithms, listening test data shows the average listener is unable to repeatedly discern the difference in sample rate reduction methods.
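The difference between sample rate reduction methods is easiest to see with an ultrasonic tone. This sketch contrasts two generic approaches for halving 88.2 kHz to 44.1 kHz: simply dropping every other sample, and applying a windowed-sinc lowpass FIR first. Neither is one of the algorithms evaluated in the paper; the filter length, window, and 20 kHz cutoff are illustrative choices. A 30 kHz tone folds down to 44.1 − 30 = 14.1 kHz under naive decimation, but is strongly attenuated by the filtered path:

```python
import numpy as np

FS_IN = 88_200  # input rate; output rate is FS_IN / 2 = 44_100

def lowpass_fir(num_taps=255, cutoff_hz=20_000.0, fs=FS_IN):
    """Windowed-sinc lowpass FIR (Blackman window), scaled to unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.blackman(num_taps)
    return h / h.sum()

def decimate_naive(x):
    """Keep every other sample with no anti-alias filtering."""
    return x[::2]

def decimate_filtered(x, h=None):
    """Lowpass filter, then keep every other sample."""
    h = lowpass_fir() if h is None else h
    return np.convolve(x, h, mode="same")[::2]

def rms(x, trim=500):
    """RMS level, ignoring `trim` samples at each end where the filter has edge effects."""
    core = x[trim:-trim]
    return float(np.sqrt(np.mean(core ** 2)))

if __name__ == "__main__":
    t = np.arange(FS_IN) / FS_IN                  # one second of audio
    tone = np.sin(2 * np.pi * 30_000 * t)         # above the 22.05 kHz output Nyquist
    print(f"naive:    {rms(decimate_naive(tone)):.4f}")     # aliased energy survives
    print(f"filtered: {rms(decimate_filtered(tone)):.6f}")  # strongly attenuated
```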
Article
The least discriminable change in the position of a sound image was measured for pure tones with and without initial interaural delay, as well as for complex tones. The signals were either perfectly regular or jittered. The introduction of jitter allowed the subjects to lateralize equal‐amplitude tones beyond 1500 Hz, which has been assumed to represent the limit for binaural phase discrimination. No frequency limit for jittered tones either in a fixed relationship or as binaural beats could be found. The smallest deviation from regularity employed was 0.2 μsec, and under certain conditions it was effective in producing lateralization. With no jitter in one ear, jitter in the other ear was ineffective. For a given high‐frequency jittered tone, discrimination improves up to a limit with level and jitter magnitude. Complex tones, jittered and unjittered, show smaller just‐noticeable differences for low and medium frequencies. Some other implications of these findings are discussed. A striking similarity between the curves for binaural time discrimination and those for pitch discrimination was found. Subject Classification: [43]65.68, [43]65.62, [43]65.60.
Book
Detection Theory is an introduction to one of the most important tools for analysis of data where choices must be made and performance is not perfect. Originally developed for evaluation of electronic detection, detection theory was adopted by psychologists as a way to understand sensory decision making, then embraced by students of human memory. It has since been utilized in areas as diverse as animal behavior and X-ray diagnosis. This book covers the basic principles of detection theory, with separate initial chapters on measuring detection and evaluating decision criteria. Some other features include: complete tools for application, including flowcharts, tables, pointers, and software; student-friendly language; complete coverage of the content area, including both one-dimensional and multidimensional models; separate, systematic coverage of sensitivity and response bias measurement; integrated treatment of threshold and nonparametric approaches; an organized, tutorial-level introduction to multidimensional detection theory; popular discrimination paradigms presented as applications of multidimensional detection theory; and a new chapter on ideal observers and an updated chapter on adaptive threshold measurement. This up-to-date summary of signal detection theory is both a self-contained reference work for users and a readable text for graduate students and other researchers learning the material either in courses or on their own. © 2005 by Lawrence Erlbaum Associates, Inc. All rights reserved.
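The two basic quantities such a text separates, sensitivity and response bias, have simple closed forms in the equal-variance Gaussian yes/no model: d' = z(H) − z(F) and c = −[z(H) + z(F)]/2, where z is the inverse standard normal CDF, H the hit rate, and F the false-alarm rate. A minimal sketch using only the Python standard library; discrimination paradigms such as ABX or same-different need different conversions, which are not shown here:

```python
from statistics import NormalDist

def yes_no_sdt(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d', c) for a yes/no task under the equal-variance Gaussian model."""
    for r in (hit_rate, false_alarm_rate):
        if not 0.0 < r < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1; correct extreme rates first")
    z = NormalDist().inv_cdf
    zh, zf = z(hit_rate), z(false_alarm_rate)
    return zh - zf, -(zh + zf) / 2

if __name__ == "__main__":
    d, c = yes_no_sdt(0.80, 0.20)
    print(f"d' = {d:.3f}, c = {c:.3f}")  # symmetric rates: no bias
    d, c = yes_no_sdt(0.90, 0.40)
    print(f"d' = {d:.3f}, c = {c:.3f}")  # negative c: liberal bias toward "yes"
```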
Article
The most obvious method for determining the distortion of telegraph signals is to calculate the transients of the telegraph system. This method has been treated by various writers, and solutions are available for telegraph lines with simple terminal conditions. It is well known that the extension of the same methods to more complicated terminal conditions, which represent the usual terminal apparatus, leads to great difficulties. The present paper attacks the same problem from the alternative standpoint of the steady-state characteristics of the system. This method has the advantage over the method of transients that the complication of the circuit which results from the use of terminal apparatus does not complicate the calculations materially. This method of treatment necessitates expressing the criteria of distortionless transmission in terms of the steady-state characteristics. Accordingly, a considerable portion of the paper describes and illustrates a method for making this translation. A discussion is given of the minimum frequency range required for transmission at a given speed of signaling. In the case of carrier telegraphy, this discussion includes a comparison of single-sideband and double-sideband transmission. A number of incidental topics are also discussed.
Woszczyk, W., "Physical and perceptual considerations for high-resolution audio," presented at the AES 115th Convention, New York, NY, USA, 2003 October 10-13.