Audio Engineering Society
Convention Paper
Presented at the 136th Convention
2014 April 26–29 Berlin, Germany
This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed
by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention
paper has been reproduced from the author’s advance manuscript without editing, corrections, or consideration by the
Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request
and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see
www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct
permission from the Journal of the Audio Engineering Society.
Subjective Evaluation of High Resolution
Recordings in PCM and DSD Audio Formats
Atsushi MARUI¹, Toru KAMEKAWA¹, Kazuhiko ENDO², and Erisa SATO²
¹Faculty of Music, Tokyo University of the Arts, 1-25-1 Senju, Adachi, Tokyo, 120-0034, Japan
²TEAC Corporation, 1-47 Ochiai, Tama, Tokyo, 206-8530, Japan
Correspondence should be addressed to Atsushi MARUI (marui@ms.geidai.ac.jp)
ABSTRACT
High-resolution audio production and consumption are attracting increasing attention, supported by the release of relatively affordable audio recorders from multiple manufacturers and by the broader bandwidth of the Internet. However, differences in audio quality between high-resolution audio formats are still not well known, especially between the different formats available on these recorders. To evaluate differences in the subjective impression of sounds recorded in high-resolution audio formats, three audio formats, namely PCM (192 kHz/24 bit), DSD (2.8 MHz), and DSD (5.6 MHz), recorded with multiple studio-quality audio recorders were evaluated in a double-blind A-B comparison listening test. Six sound programs, evaluated by forty-six participants on eight attributes, revealed statistically significant differences between PCM and DSD but not between the two sampling frequencies (2.8 MHz and 5.6 MHz) of DSD.
1. INTRODUCTION
While the music industry is actively releasing perceptually coded versions of almost all new music, high-resolution audio production and consumption are also attracting increasing attention. This trend is supported by the broader bandwidth of the Internet, which makes music distribution over the Internet practical, and by the release of relatively affordable high-resolution-capable sound recorders from several manufacturers. Nevertheless, despite the use of high-resolution formats in the industry, differences in audio quality between high-resolution audio formats are still not well known, especially between the different audio formats available on sound recorders.
Meyer and Moran [1] reported that they were unable to reject the null hypothesis that listeners could not differentiate between SACD and 44.1 kHz/16 bit. However, the source materials used in that test are not well described, so it is difficult to know whether the result reflects differences between the playback formats. A similar experiment using a different approach was conducted by Woszczyk et al. [2], who concluded that the higher sampling rate (eight times the CD sampling rate) was judged to have a higher degree of fidelity to the analog reference. While these two reports compare different formats and/or sampling rates, Blech and Yang compared PCM and DSD having the same bit rate (2.8224 MHz against 176.4 kHz/24 bit) and found that listeners were not able to discriminate between the two systems [3]. These results were obtained from discrimination tasks in which listeners chose which stimulus was different from, or the same as, the other stimuli, based on a global impression of the stimuli that includes spatial, spectral, and temporal aspects. These aspects can be evaluated independently, but this was not done because the focus of those studies was simply to discover whether listeners were able to discriminate between different formats.
Our aim in this paper is to document the stimuli and the method used to evaluate differences in the subjective impression of sounds recorded in different high-resolution audio formats, especially between PCM and DSD. Three audio formats recorded with multiple studio-quality sound recorders were assessed in a subjective listening test on multiple attributes. The formats used in the test were PCM (192 kHz/24 bit), DSD (2.8 MHz), and DSD (5.6 MHz). These three formats were chosen because they are among the highest resolutions available in most consumer and professional audio recorders currently on the market. Also, their numbers of bits per second (per channel) are comparable: DSD (2.8 MHz) is the lowest of the three at 2,822,400, DSD (5.6 MHz) is the highest at 5,644,800, and PCM (192 kHz/24 bit) is in the middle at 4,608,000. However, the authors are aware that the processes of converting analog to digital and vice versa differ between PCM and DSD, so a direct comparison of the number of bits is not very helpful for understanding differences in audio quality. The sound sources were recorded simultaneously using four recorder models, and comparisons between different recorders were also made in the test, but only the portion relevant to the comparison of audio formats is discussed in this paper, since benchmarking is not our intention.
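The per-channel bit rates quoted above follow from multiplying the sampling rate by the word length. The short Python sketch below reproduces that arithmetic. It assumes that DSD (2.8 MHz) and DSD (5.6 MHz) are 64 and 128 times 44.1 kHz at 1 bit per sample; the names `FORMATS` and `bits_per_second` are illustrative only.

```python
# Per-channel raw bit rate = sampling rate (Hz) x word length (bits per sample).
FORMATS = {
    "DSD (2.8 MHz)": (64 * 44_100, 1),      # 2.8224 MHz, 1-bit (assumed DSD64)
    "DSD (5.6 MHz)": (128 * 44_100, 1),     # 5.6448 MHz, 1-bit (assumed DSD128)
    "PCM (192 kHz/24 bit)": (192_000, 24),  # 192 kHz, 24-bit
}


def bits_per_second(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the uncompressed bit rate of a single channel."""
    return sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Print the formats from lowest to highest bit rate.
    for name, (fs, bits) in sorted(FORMATS.items(), key=lambda kv: bits_per_second(*kv[1])):
        print(f"{name:22s} {bits_per_second(fs, bits):>10,d} bit/s per channel")
```

For the two-channel recordings used in this study, each figure doubles.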
Section 2 describes the recording devices and formats, the recording of the source material, and the preparation of the stimuli. Section 3 explains the listening environment, the participants, and the test design. Section 4 presents the results of the statistical analysis of the data, followed by the conclusion in Section 5.
2. STIMULI
2.1. Recording Devices and Formats
The recorders used in the recording session were the TASCAM DA-3000, TASCAM HS-2000, and KORG MR-2000S for PCM (192 kHz/24 bit); the TASCAM DA-3000, TASCAM DV-RA1000HD, and KORG MR-2000S for DSD (2.8 MHz); and the TASCAM DA-3000 and KORG MR-2000S for DSD (5.6 MHz) (also shown in Table 1).
Although the actual model names of the recorders are revealed here so that readers know the exact procedure used in preparing the stimuli, we do not discuss the results of comparisons between different recorder models of the same audio format, since this is beyond the focus of this report. Only the data relevant to comparing the three audio formats recorded with the same recorder model are discussed in the following sections. Two recorder models in this study recorded all three formats (DA-3000 and MR-2000S); of these, only the sources recorded with the DA-3000 were used.
Figure 1 shows the frequency and time responses of the analog-to-digital-to-analog conversion for the three formats. The responses were measured with a swept-sine signal (Optimized Aoshima's Time-Stretched Pulse [4]) using an RME Fireface UC audio interface operated with Apple Logic X software at 192 kHz/24 bit. The response of the audio interface itself was compensated for.
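The principle behind a time-stretched-pulse (TSP) measurement can be illustrated with a minimal, standard-library-only Python sketch. It assumes the basic Aoshima TSP spectrum H(k) = exp(j·4πmk²/N²) for 0 ≤ k ≤ N/2, completed by conjugate symmetry, rather than the optimized variant used in the measurement. The length N = 64, the stretch parameter m = 8, and the three-tap "device under test" are hypothetical stand-ins for the real recorder and interface. Because |H(k)| = 1, dividing the spectrum of the recorded response by H(k) removes only the quadratic phase and leaves the (circular) impulse response.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def tsp_spectrum(n, m):
    """Basic (non-optimized) Aoshima TSP: unit magnitude, quadratic phase.

    n must be even and m an integer so that the Nyquist bin is real.
    """
    spec = [0j] * n
    for k in range(n // 2 + 1):
        spec[k] = cmath.exp(1j * 4 * m * math.pi * k * k / (n * n))
    for k in range(1, n // 2):
        spec[n - k] = spec[k].conjugate()  # Hermitian symmetry gives a real signal
    return spec


def circular_convolve(x, h):
    """Hypothetical device under test: circular convolution with a short filter h."""
    n = len(x)
    return [sum(h[i] * x[(t - i) % n] for i in range(len(h))) for t in range(n)]


def measure_impulse_response(n=64, m=8, system=(1.0, 0.5, -0.25)):
    spec = tsp_spectrum(n, m)
    tsp = [v.real for v in idft(spec)]         # excitation signal
    recorded = circular_convolve(tsp, system)  # simulated recording
    rec_spec = dft(recorded)
    # Deconvolution: |H(k)| = 1, so the division only removes the TSP phase.
    return [v.real for v in idft([r / s for r, s in zip(rec_spec, spec)])]


if __name__ == "__main__":
    ir = measure_impulse_response()
    print([round(v, 6) for v in ir[:4]])
```

The recovered response matches the simulated three-tap filter up to floating-point error; a real measurement would also need to handle noise, nonlinearity, and the non-circular nature of an actual recording.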
2.2. Recording of Source Material
Recording took place on April 27, 2013, in Studio A and Studio B on the Senju Campus of Tokyo University of the Arts. Jazz musicians (a trio of piano, bass, and drums), vocalists (two females and one male), voice actors (two females and one male), and
a classical pianist participated in the recording. The performers were recorded in separate sessions. All performers and the recording engineer were highly qualified professionals and were paid for their participation.
Format | Recorders
PCM (192 kHz/24 bit) | TASCAM DA-3000, TASCAM HS-2000, KORG MR-2000S
DSD (2.8 MHz) | TASCAM DA-3000, TASCAM DV-RA1000HD, KORG MR-2000S
DSD (5.6 MHz) | TASCAM DA-3000, KORG MR-2000S
Table 1: Eight audio recorders used for recording music/speech performances.
[Figure 1 (plots not reproduced): left panel, Power (dB) versus Frequency (Hz, 16 Hz to 32 kHz); right panel, Amplitude versus Time (samples, Fs = 192 kHz); curves for DSD (2.8 MHz), DSD (5.6 MHz), and PCM (192 kHz/24 bit).]
Fig. 1: The left panel shows the frequency responses of the audio recorder for the three formats used in the test: DSD (2.8 MHz) (top), DSD (5.6 MHz) (middle), and PCM (192 kHz/24 bit) (bottom). The abscissa is frequency (Hz) and the ordinate is power (dB). Curves for DSD (2.8 MHz) and PCM (192 kHz/24 bit) are shifted by 1 dB for readability. The right panel shows the time responses of the audio recorder for the three formats: DSD (2.8 MHz) (top), DSD (5.6 MHz) (middle), and PCM (192 kHz/24 bit) (bottom). The abscissa is time in samples (192 kHz sampling rate) and the ordinate is amplitude. Amplitudes were scaled to the [−1, +1) range. Curves for DSD (2.8 MHz) and PCM (192 kHz/24 bit) are shifted by 1 for readability.
The microphones (AKG C414, C451, D112, DPA 4006, 4011, 4015, Neumann U87Ai, Royer R-121, Sanken CO-100K, Shure SM57, and Sony C-38B) were connected to a microphone amplifier, either that of the Trident S80 mixing console or a Millenia HV-3D-8, John Hardy M-1, or API 512c, and mixed down to two channels on the Trident S80 mixing console. The resulting two-channel mix was split in parallel into eight two-channel lines and sent to the eight recorders.
The recorders were operated to record the music/speech performances simultaneously. Calibration was done so that a +4.0 dBu, 1 kHz sine wave input was recorded at −16 dBFS, achieving equal recording levels within ±0.1 dB as measured with an NTI XL2 Sound Level Meter. No effects processing was applied, and no digital processing was used in the course of recording except for the analog-to-digital conversion in each recorder.
2.3. Stimuli Preparation
Although there were 19 recorded sources (2 jazz trio performances, 6 percussion, 6 speeches, 4 vocals, and 2 piano performances), only six of them were used in the following subjective listening test:
• drums solo
• triangle solo
• speech (male, in Japanese)
• vocal solo (female)
• jazz trio
• classical piano
These six were selected because they cover a wide variety of spectral and temporal characteristics that were thought likely to reveal differences between the audio formats. Stimuli such as speech were added for their familiarity to the listeners. Of these, the triangle solo and the speech were each recorded with a single microphone (Sanken CO-100K for the triangle and Neumann U87 for the speech), so the two channels of these recordings are identical.
The recorded materials were trimmed to about 10 to 15 seconds each on the respective audio recorder, without level adjustment or fade-in/out, to minimize the effect of signal processing. No processing other than this trimming was applied to any of the sounds in preparing the stimuli. The frequency responses and amplitude waveforms of the stimuli are shown in Figures 3 and 4.
3. LISTENING TEST
3.1. Listening Environment
Listening tests were conducted at two separate sites: a listening room at TEAC Corporation (Site A) and the Sound Production Studio on the Senju Campus of Tokyo University of the Arts (Site B). Site A is a room used for critical listening and product evaluation, with relatively little reverberation. Site B is a mixing studio conforming to ITU-R BS.1116 [5] that is often used for listening tests and the evaluation of audio materials.
Two TASCAM DA-3000 units (from the same production lot, with the same firmware version installed) were used for playback of all the stimuli. They were set to master and slave mode for playback synchronization. Hence, the same model of digital-to-analog converter was used for all stimuli. The outputs of the DA-3000 units were sent to a remotely controllable monitor switcher (operating in the analog domain), which allowed the listener to switch between the two playback sources.
Two loudspeakers were positioned in the standard stereo arrangement according to ITU-R BS.775 [6], 2.70 m (8.86 feet) from the listening position (Figure 2). Two Genelec 1032A loudspeakers were used at Site A and two Genelec 8050A loudspeakers at Site B. A stereo volume controller was installed as a precaution against exposing the participants to loud noise. No loud noise was emitted by accident, so the level was kept constant throughout the experiment. An Esoteric C-02 preamplifier (Site A) and a Tomoca TCC-100ST (Site B) served as the volume controller.
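As a worked example of this layout, the loudspeaker coordinates follow from the 2.70 m listening distance and the ±30° angles of the standard stereo arrangement (60° between the loudspeakers, as in Figure 2). The Python sketch below is illustrative only; it assumes a coordinate convention with the listener at the origin facing the +y axis, and the function name `stereo_positions` is hypothetical.

```python
import math


def stereo_positions(distance_m: float = 2.70, half_angle_deg: float = 30.0):
    """Return (x, y) of the left and right loudspeakers.

    Assumed convention: listener at the origin, facing the +y axis.
    """
    a = math.radians(half_angle_deg)
    x = distance_m * math.sin(a)  # lateral offset of each loudspeaker
    y = distance_m * math.cos(a)  # distance in front of the listener
    return (-x, y), (x, y)


if __name__ == "__main__":
    left, right = stereo_positions()
    spacing = math.dist(left, right)
    print(f"left={left}, right={right}, spacing={spacing:.2f} m")
```

With a 60° opening angle, the two loudspeakers and the listener form an equilateral triangle, so the loudspeaker spacing equals the 2.70 m listening distance.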
[Figure 2 (diagram not reproduced): two DA-3000 recorders, master and slave, linked via cascade out/cascade in; their L/R outputs go to a remote-controlled switcher and then through a volume controller to two loudspeakers, each 2.70 m (8.86 feet) from the listening position, with 60° between them.]
Fig. 2: Signal path in listening test setup.
3.2. Participants
A total of 46 listeners (30 at Site A and 16 at Site B) with normal hearing participated in the test. Participants at Site A were selected from people not involved in the development or evaluation of the recording devices. Participants at Site B were students and professors in a sound recording program with experience in timbral ear training.
3.3. Test Design
A double-blind, two-interval, two-alternative forced-choice method (pairwise A-B comparison) was used for the listening test. A listener was presented with a pair of stimuli, asked to listen carefully for similarities and differences while freely switching between them, and then asked to choose which of the two stimuli evoked the higher sensation or impression for a given attribute. The eight attributes used in the test were:
• image width,
• image depth,
• image definition,
• timbral brightness,
• timbral richness,
• temporal separability,
• overall quality, and
• overall preference.
All attributes were presented in Japanese, the listeners' native language. Choices were made on all eight attributes for one pair of stimuli before moving on to the next pair. The stimulus pairs were presented in a different random order for each participant, and the two stimuli in a pair were randomly assigned to the two playback systems.
To reduce the duration of the test, only 10 pairs were compared for each source material. The comparisons included in the test (also shown in Table 2) were:
• three pairs among the three recorders in PCM (192 kHz/24 bit): DA-3000, HS-2000, and MR-2000S,
• three pairs among the three recorders in DSD (2.8 MHz): DA-3000, DV-RA1000HD, and MR-2000S,
• one pair of the two recorders in DSD (5.6 MHz): DA-3000 and MR-2000S, and
• three pairs among the three formats PCM (192 kHz/24 bit), DSD (2.8 MHz), and DSD (5.6 MHz) on the DA-3000.
Ten comparisons for each of six programs resulted in 60 trials.
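The trial structure described above (10 comparison pairs for each of 6 programs, giving 60 trials, presented in a fresh random order for each participant with random assignment of the two stimuli to the two playback units) can be sketched as follows. This is a hypothetical Python illustration, not the software used in the test; the labels, the per-participant seeding, and the name `build_session` are assumptions.

```python
import itertools
import random

PROGRAMS = ["Drums", "Triangle", "Speech", "Vocal", "Jazz Trio", "Piano"]

# The ten comparison pairs of Table 2, written as (format, recorder) tuples.
PAIRS = (
    [(("PCM", a), ("PCM", b)) for a, b in itertools.combinations(
        ["DA-3000", "HS-2000", "MR-2000S"], 2)]
    + [(("DSD2.8", a), ("DSD2.8", b)) for a, b in itertools.combinations(
        ["DA-3000", "DV-RA1000HD", "MR-2000S"], 2)]
    + [(("DSD5.6", "DA-3000"), ("DSD5.6", "MR-2000S"))]
    + [((f, "DA-3000"), (g, "DA-3000")) for f, g in itertools.combinations(
        ["DSD5.6", "DSD2.8", "PCM"], 2)]
)


def build_session(participant_id: int):
    """Return a randomized list of 60 trials for one participant.

    In each trial the listener answers all eight attributes before moving on.
    """
    rng = random.Random(participant_id)  # assumed: reproducible seed per participant
    trials = []
    for program, (s1, s2) in itertools.product(PROGRAMS, PAIRS):
        # Randomly assign the two stimuli to playback unit A or B.
        a, b = (s1, s2) if rng.random() < 0.5 else (s2, s1)
        trials.append({"program": program, "A": a, "B": b})
    rng.shuffle(trials)  # a different random order for each participant
    return trials


if __name__ == "__main__":
    for trial in build_session(participant_id=1)[:3]:
        print(trial)
```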
The test began after instructions and a training session were given. The listeners were allowed to take a break at any time. The test was conducted individually for each participant and took approximately 1.5 to 2 hours per person. Listening tests were conducted between August 5 and 30, 2013, and the listeners were compensated for their participation.
4. RESULTS AND DISCUSSION
For the reasons discussed earlier, only the results of the comparisons between the three audio formats are presented in this section.
Condition | Stimulus 1 | Stimulus 2
PCM (192 kHz) | DA-3000 | HS-2000
PCM (192 kHz) | DA-3000 | MR-2000S
PCM (192 kHz) | HS-2000 | MR-2000S
DSD (2.8 MHz) | DA-3000 | DV-RA1000HD
DSD (2.8 MHz) | DA-3000 | MR-2000S
DSD (2.8 MHz) | DV-RA1000HD | MR-2000S
DSD (5.6 MHz) | DA-3000 | MR-2000S
DA-3000 | DSD (5.6 MHz) | PCM (192 kHz)
DA-3000 | DSD (2.8 MHz) | PCM (192 kHz)
DA-3000 | DSD (5.6 MHz) | DSD (2.8 MHz)
Table 2: Ten comparison pairs in the test. For each comparison, six programs were presented, resulting in 60 trials.
No noticeable differences were found between the response data from the two sites; therefore, the data from both sites were pooled.
A binomial test was used to analyze the results. The p-values from the binomial test for each combination of source, comparison pair, and attribute are shown in Table 3. Each p-value is the probability, under the null hypothesis that either format is equally likely to be chosen, that the left-hand format in the “Comparison” column would be chosen over the right-hand format on the same row at least as often as was observed for the given attribute. The smaller the p-value, the more statistically significant the evidence that the left-hand format has a higher level of sensation or impression on the given attribute. For example, the Drums stimulus in DSD (5.6 MHz) has p = .001 when compared against PCM (192 kHz/24 bit) in overall preference, which indicates that the DSD version was significantly preferred over the PCM version. The last rows (“Combined”) show the result of the binomial test with all sources combined. In the following discussion, a significance level of α = .01 is used; it is indicated by two or three asterisks in Table 3.
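The p-values in Table 3 are consistent with a one-sided (upper-tail) binomial test with n = 46 responses per cell and a chance probability of 0.5. For instance, 23 and 24 choices out of 46 give p ≈ .559 and p ≈ .441, two values that appear in the table. The standard-library Python sketch below computes such a p-value; it is an illustration under that assumption rather than the authors' analysis script, and the name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided binomial test p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n = 46  # one choice per participant for a given source, pair, and attribute
    for k in (23, 24, 30):
        print(k, round(binomial_upper_tail(k, n), 3))
```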
The spatial attributes (width, depth, and definition) were not significantly different for the stimuli with monophonic content (Triangle and Speech). Comparisons on these attributes were statistically significant for the Vocal and Jazz Trio stimuli, in particular for all three spatial attributes between DSD (5.6 MHz) and PCM, and a subset of the attributes was significantly different for the Drums and Piano stimuli. This result is somewhat expected, since monaural stimuli cannot reveal spatial differences between the formats.
Among the timbral attributes, richness showed significant differences in nearly all comparisons between DSD and PCM (the exception being DSD (5.6 MHz) against PCM for Triangle, p = .052), whereas brightness showed only one significant difference, between DSD (5.6 MHz) and PCM for the Drums stimulus. Although the authors expected brightness to discriminate between the formats, the opposite result was obtained. Sharpness, an attribute synonymous with brightness, is related to the spectral centroid with weighting toward high frequencies [7, 8]. The results suggest that participants were not able to hear the spectral differences in the high-frequency region. On the other hand, although very subtle, the difference between the frequency response curves of DSD and PCM is larger below 31.5 Hz than in the high-frequency range above 16 kHz, disregarding the spectral noise above 32 kHz in DSD (2.8 MHz) (Figure 1). Participants may have relied on the low-frequency content to discriminate between the formats; this interpretation is supported by the fact that the Triangle stimulus, which has little low-frequency content, did not show highly significant differences in the comparison.
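To make the centroid idea concrete, a minimal Python sketch of a plain spectral centroid (magnitude-weighted mean frequency) follows. It is only a rough proxy: von Bismarck's sharpness [8] is based on loudness across critical bands with additional weight at high frequencies, which this sketch does not model. The function name `spectral_centroid` and the two-component example spectra are hypothetical.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of a spectrum (a simple brightness proxy).

    Does NOT model loudness or critical-band weighting as in sharpness.
    """
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


if __name__ == "__main__":
    freqs = [1_000.0, 3_000.0]
    print(spectral_centroid(freqs, [1.0, 1.0]))  # 2000.0
    print(spectral_centroid(freqs, [1.0, 0.5]))  # weaker upper component
```

Halving the 3 kHz component lowers the centroid from 2000 Hz to about 1667 Hz.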
Temporal separability was not significantly different for any individual stimulus.
Overall quality and preference showed a similar tendency: participants chose DSD (5.6 MHz) more often than PCM (192 kHz/24 bit) for the Drums, Speech, Vocal, and Jazz Trio stimuli. Recall that the participants were asked to choose which of the two stimuli had the higher sensation or impression for a given attribute. Therefore, DSD was more often judged to have higher quality, and was more often preferred, than PCM.
There were no significant differences between DSD (2.8 MHz) and DSD (5.6 MHz) for any attribute in any of the source materials at the α = .01 level.
The combined rows show the result of the binomial test with the response data from all six stimuli summed. Statistically significant differences between PCM and DSD are seen in all attributes for both 2.8 MHz and 5.6 MHz. In contrast, no significant differences were found between 2.8 MHz and 5.6 MHz DSD.
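The combined rows pool the choices over the six programs, so each combined test uses n = 6 × 46 = 276 responses. A minimal Python sketch of this pooling step follows. Like any simple pooled binomial test, it treats all 276 responses as independent even though the same participants contribute to every program. The per-program counts in the example are made-up placeholders, not data from this study, and the function names are hypothetical.

```python
from math import comb

PARTICIPANTS = 46


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided binomial test p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pooled_p_value(counts_per_program):
    """Sum the 'left-hand format chosen' counts over programs and test the total."""
    n = PARTICIPANTS * len(counts_per_program)
    k = sum(counts_per_program)
    return n, k, binomial_upper_tail(k, n)


if __name__ == "__main__":
    hypothetical_counts = [30, 25, 27, 31, 29, 26]  # placeholders, NOT study data
    n, k, p = pooled_p_value(hypothetical_counts)
    print(f"n={n}, k={k}, one-sided p={p:.4f}")
```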
5. CONCLUSION
To evaluate differences in the subjective impression of sounds recorded in high-resolution audio formats, three audio formats recorded with multiple studio-quality audio recorders were evaluated in a double-blind A-B comparison listening test. The three formats were PCM (192 kHz/24 bit), DSD (2.8 MHz), and DSD (5.6 MHz). They were chosen because they are among the highest resolutions currently available in most consumer and professional audio recorders. The three formats were compared by 46 participants on six sound programs and eight attributes. The binomial test applied to the data from the pairwise comparison experiment revealed statistically significant differences between PCM and DSD, but not between the two sampling frequencies (2.8 MHz and 5.6 MHz) of DSD.
Although some stimuli (monaural sounds such as Triangle and Speech) and some attributes (such as brightness and temporal separability) were not useful for discriminating between the formats, stimuli with broad spectra and clear temporal transients (such as Vocal, Jazz Trio, and Piano) and attributes such as spatial width, spatial depth, and timbral richness could be used to discriminate between DSD and PCM. Overall quality and preference showed a similar tendency in favor of DSD (5.6 MHz) over PCM (192 kHz/24 bit).
The authors took care in preparing the stimuli and in conducting the experiment. Nevertheless, which physical aspects the participants were listening to when discriminating between the formats is still not fully understood. We hope that this presentation contributes to understanding the qualities of high-resolution audio formats.
6. REFERENCES
[1] E. Brad Meyer and David R. Moran. Audibility of CD-standard A/D/A loop inserted into high-resolution audio playback. Journal of the Audio Engineering Society, 55(9):775–779, September 2007.
[2] Wieslaw Woszczyk, Jan Engel, John Usher, Ronald Aarts, and Derk Reefman. Which of the two digital audio systems best matches the quality of the analog system? In Proceedings of the AES 31st International Conference, London, UK, June 2007. Audio Engineering Society.
[3] Dominik Blech and Min-Chi Yang. DVD-Audio versus SACD: Perceptual discrimination of digital audio coding formats. In Proceedings of the AES 116th Convention, Berlin, Germany, May 2004. Audio Engineering Society.
[4] Nobuharu Aoshima. Computer-generated pulse signal applied for sound measurement. Journal of the Acoustical Society of America, 69(5):1484–1488, May 1981.
[5] International Telecommunication Union. Rec. ITU-R BS.1116-1: Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems, October 1997.
[6] International Telecommunication Union. Rec. ITU-R BS.775-3: Multichannel stereophonic sound system with and without accompanying picture, August 2012.
[7] G. von Bismarck. Timbre of steady sounds: A factorial investigation of its verbal attributes. Acustica, 30:146–159, 1974.
[8] G. von Bismarck. Sharpness as an attribute of the timbre of steady sounds. Acustica, 30:159–172, 1974.
[Figure 3 (plots not reproduced): six panels, Drums Solo, Vocal Solo (female), Triangle Solo, Jazz Trio, Speech (male), and Classical Piano, each showing Power (dB, 0 to 90) versus Frequency (Hz, 16 Hz to 32 kHz).]
Fig. 3: Frequency responses of the six stimuli used in the listening test. Power on the vertical axis is not on a physical scale, but the relative levels of the six stimuli are preserved. Plots were generated from the PCM (192 kHz/24 bit) version of the stimuli.
[Figure 4 (plots not reproduced): six panels, Drums Solo, Vocal Solo (female), Triangle Solo, Jazz Trio, Speech (male), and Classical Piano, each showing Amplitude (−1 to 1) versus Time (s, 0 to 16).]
Fig. 4: Amplitude plots of the six stimuli used in the listening test. Plots were generated from the PCM (192 kHz/24 bit) version of the stimuli.
Source | Comparison | Spatial width | Spatial depth | Spatial definition | Timbral richness | Timbral brightness | Temporal separability | Overall quality | Overall preference
Drums | DSD5 vs. PCM | 0.001 *** | 0.024 * | 0.226 | 0.005 ** | 0.005 ** | 0.087 . | 0.000 *** | 0.001 ***
Drums | DSD2 vs. PCM | 0.011 * | 0.024 * | 0.024 * | 0.002 ** | 0.024 * | 0.226 | 0.005 ** | 0.011 *
Drums | DSD5 vs. DSD2 | 0.146 | 0.146 | 0.011 * | 0.024 * | 0.146 | 0.011 * | 0.048 * | 0.048 *
Triangle | DSD5 vs. PCM | 0.013 * | 0.441 | 0.151 | 0.052 . | 0.027 * | 0.151 | 0.231 | 0.231
Triangle | DSD2 vs. PCM | 0.092 . | 0.092 . | 0.441 | 0.006 ** | 0.151 | 0.092 . | 0.092 . | 0.013 *
Triangle | DSD5 vs. DSD2 | 0.231 | 0.559 | 0.987 | 0.671 | 0.231 | 0.329 | 0.052 . | 0.441
Speech | DSD5 vs. PCM | 0.092 . | 0.027 * | 0.151 | 0.006 ** | 0.151 | 0.052 . | 0.027 *** | 0.006 **
Speech | DSD2 vs. PCM | 0.231 | 0.013 * | 0.027 * | 0.001 *** | 0.908 | 0.151 | 0.013 * | 0.027 *
Speech | DSD5 vs. DSD2 | 0.231 | 0.769 | 0.849 | 0.671 | 0.151 | 0.151 | 0.441 | 0.441
Vocal | DSD5 vs. PCM | 0.001 *** | 0.001 *** | 0.002 ** | 0.000 *** | 0.231 | 0.092 . | 0.000 *** | 0.000 ***
Vocal | DSD2 vs. PCM | 0.000 *** | 0.002 ** | 0.013 * | 0.000 *** | 0.151 | 0.441 | 0.027 * | 0.013 *
Vocal | DSD5 vs. DSD2 | 0.027 * | 0.441 | 0.849 | 0.671 | 0.441 | 0.329 | 0.329 | 0.441
Trio | DSD5 vs. PCM | 0.000 *** | 0.000 *** | 0.006 ** | 0.000 *** | 0.151 | 0.441 | 0.001 *** | 0.002 **
Trio | DSD2 vs. PCM | 0.006 ** | 0.092 . | 0.151 | 0.001 *** | 0.052 . | 0.231 | 0.092 . | 0.052 .
Trio | DSD5 vs. DSD2 | 0.994 | 0.908 | 0.948 | 0.973 | 0.671 | 0.769 | 0.849 | 0.908
Piano | DSD5 vs. PCM | 0.001 *** | 0.002 ** | 0.329 | 0.001 *** | 0.329 | 0.151 | 0.052 . | 0.092 .
Piano | DSD2 vs. PCM | 0.000 *** | 0.092 . | 0.092 . | 0.000 *** | 0.151 | 0.151 | 0.092 . | 0.027 *
Piano | DSD5 vs. DSD2 | 0.329 | 0.559 | 0.329 | 0.092 . | 0.908 | 0.973 | 0.769 | 0.559
Combined | DSD5 vs. PCM | 0.000 *** | 0.000 ** | 0.000 *** | 0.000 *** | 0.000 *** | 0.003 ** | 0.000 *** | 0.000 ***
Combined | DSD2 vs. PCM | 0.000 *** | 0.000 *** | 0.000 *** | 0.000 *** | 0.007 ** | 0.010 ** | 0.000 *** | 0.000 ***
Combined | DSD5 vs. DSD2 | 0.034 * | 0.475 | 0.858 | 0.244 | 0.172 | 0.093 . | 0.058 . | 0.142
Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Table 3: p-values from the binomial test for each combination of source, comparison pair, and attribute. Each p-value is the probability, under the null hypothesis that either format is equally likely to be chosen, that the left-hand format in the “Comparison” column (e.g., DSD5) would be chosen over the right-hand format (e.g., PCM) on the same row at least as often as observed for the given attribute. The smaller the p-value, the more statistically significant the evidence that the left-hand format has a higher level of sensation on the given attribute. The symbols “DSD5,” “DSD2,” and “PCM” denote DSD (5.6 MHz), DSD (2.8 MHz), and PCM (192 kHz/24 bit), respectively. The last rows (“Combined”) show the results of the binomial test with all sources combined.