The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-43958-7
Quality Assessment of two Fullband Audio
Codecs Supporting Real-Time Communication
M. Maruschke, O. Jokisch, M. Meszaros, F. Trojahn, M. Hoffmann
Leipzig University of Telecommunications (HfTL), Germany
maruschke@hft-leipzig.de, jokisch@hft-leipzig.de
http://www.hft-leipzig.de
Abstract. Draft only. The complete publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-43958-7
Recent audio codecs enable high-quality signals up to fullband (20 kHz), which is usually regarded as the maximum audible bandwidth. Following previous studies on speech coding assessment, this study surveys the music coding ability of two real-time codecs with fullband capability: the IETF-standardized Opus codec and the 3GPP-specified EVS codec. We test both codecs with vocal, instrumental and mixed music signals. For evaluation, we predict human assessments with the instrumental POLQA method, which was primarily designed for speech assessment. Additionally, we perform a listening test with 21 young adults as a reference. Opus and EVS show similar music coding performance. The assessed quality depends mainly on the specific music characteristics and on the tested bitrates from 16.4 to 64 kbit/s. The POLQA measures and the listening results correlate, although the absolute ratings of the young listeners yield lower MOS values.
Keywords: Opus, EVS, music coding, POLQA, listening test
1 Introduction
In current IP-based real-time communication, two fullband audio codecs predominate: the Internet-driven Opus codec [1] and the Enhanced Voice Services (EVS) codec [2], which is backed by the telecommunication carriers. The widespread Opus codec is pre-installed in popular web browsers such as Google Chrome or Firefox and supports their so-called Web-based Real-Time Communication (WebRTC) functionality [3]. At nearly the same time, the telecommunication industry standardized the EVS coder, aiming at a flexible and sustainable fullband audio codec that is backward compatible with existing wideband speech codecs in public cellular networks (e.g. the AMR-WB codec). Despite the different motivations behind their development, both Opus and EVS support the audio bandwidth categories listed in Table 1. Both codecs are intended and specified to serve different needs, from fullband (FB) speech coding to high-quality music communication applications such as live music streaming or web-radio broadcasting [1] [2]. It is therefore reasonable to assess the codec quality of such “all-rounders”, most notably for different kinds of music. In this contribution we focus on comparing the fullband music performance of Opus and EVS. To pose diverse challenges, we test samples from the following music categories:
– Singing voice (a cappella music),
– Musical instruments,
– Mixed music (instrumental parts and singing voice).
We intend to detect audible differences by using these music styles in varying codec operating modes. The design targets adequate bitrate conditions for both the Opus and the EVS codec in FB mode and uses the instrumental assessment method Perceptual Objective Listening Quality Assessment (POLQA) [4] in FB music operation. It should be noted that the perceptual model of POLQA was not developed for music assessment. This study therefore tests the limits of the method, something that has not been published so far. Furthermore, we performed a listening test with human subjects, using the widely used audio quality rating score, the Mean Opinion Score (MOS). Following an overview of previous Opus- and EVS-related research in section 2, we introduce our test design in section 3. Afterwards we discuss the instrumental and perceptual assessment results for Opus and EVS in section 4 and formulate some conclusions.
Beyond these two, there are other fullband codecs, such as the AAC-ELD family [5] and G.719 [6]. For practical reasons (e.g. the missing support in the WebRTC environment), however, they are not widely used.
2 Previous studies on Opus and EVS
Standardized in RFC 6716 [1] by the Internet Engineering Task Force (IETF), Opus is designed as an all-purpose interactive speech and audio codec. Applicable in multiple use cases, Opus is suitable for Voice over IP, videoconferencing, online gaming or audio on demand. It covers low bit rate speech as well as very high-quality stereo music. To achieve high quality and dynamic characteristics, Opus combines the linear-prediction-based SILK codec with the Constrained Energy Lapped Transform (CELT) codec, which is based on the Modified Discrete Cosine Transform (MDCT). For flexible use, the Opus codec supports the frequency band types NB, WB, SWB and FB. Consequently, the Opus codec handles speech and music (either mono or stereo) within a bit rate range from 6 kbit/s to 510 kbit/s and with low-delay coding (2.5 ms to 60 ms) for all relevant sample rates from 8 kHz up to 48 kHz. Opus supports constant bitrate as well as variable bitrate (VBR), which is the default operational mode. Since its introduction in 2010/2011, the Opus codec has passed several listening test campaigns, supplemented by comparisons with other speech and audio codecs (Speex NB/WB, iLBC, G.722.1/G.722.1C, AMR NB/WB, HE-AAC, Vorbis). The results are examined and summarized by Hoene et al. [7]. In these tests Opus outperformed all other codecs, in particular in the wider bands where applicable. Nevertheless, the mentioned tests were carried out on the stand-alone codec without disruptions.

Table 1. Audio bandwidths and corresponding quality levels

Abbreviation  Meaning        Pass-band          Quality expectation
NB            narrowband     0.3 ... 3.4 kHz    traditional phone voice
WB            wideband       50 Hz ... 7 kHz    AM radio / HD voice
SWB           superwideband  50 Hz ... 14 kHz   FM radio / full HD voice + music
FB            fullband       20 Hz ... 20 kHz   CD quality / full HD voice + music
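The bandwidth categories of Table 1 can be related to a sampling rate through the Nyquist limit. The following is a minimal sketch, not part of the original study: the helper name `bandwidth_category` is hypothetical, and it assumes that a category applies whenever its upper pass-band edge from Table 1 fits below half the sampling rate.

```python
# Upper pass-band edges in Hz, taken from Table 1.
BANDWIDTH_UPPER_HZ = [
    ("NB", 3_400),
    ("WB", 7_000),
    ("SWB", 14_000),
    ("FB", 20_000),
]


def bandwidth_category(sample_rate_hz):
    """Return the widest Table 1 category that fits below the Nyquist limit.

    Hypothetical helper for illustration; the codecs themselves use their
    own internal band definitions.
    """
    nyquist_hz = sample_rate_hz / 2
    category = None
    for name, upper_hz in BANDWIDTH_UPPER_HZ:
        if upper_hz <= nyquist_hz:
            category = name  # keep the widest category seen so far
    if category is None:
        raise ValueError("sampling rate too low for any Table 1 category")
    return category


for rate in (8_000, 16_000, 32_000, 48_000):
    print(rate, "Hz ->", bandwidth_category(rate))
```

Under this assumption, the 8 kHz and 48 kHz sample rates mentioned in the text map to NB and FB, respectively.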
EVS was standardized by the 3GPP in 2014 as the successor to the AMR-WB codec and is designed for packet-switched networks as well as for mobile communication such as VoLTE [2]. As an all-purpose codec it is comparable to Opus. To provide high quality and dynamic characteristics, EVS combines several working modes. It can switch on command between Linear Prediction (LP)-based coding, frequency-domain coding and inactive-signal coding (Comfort Noise Generation (CNG)). To be applicable in multiple switched-network use cases, EVS provides a higher resilience against packet losses and errors. Furthermore, EVS supports an interoperable mode to interwork with AMR-WB. It supports all frequency band types, whereby FB is optional. Like Opus, EVS can handle speech and music, at bit rates from 7.2 kbit/s up to 128 kbit/s and with low delay times from 30.9 ms to 32 ms for all relevant sample rates (8 kHz NB up to 48 kHz FB). Currently, EVS is not yet ready for stereo music processing.
The ITU-T P.863 recommendation (POLQA) describes an objective method for predicting the overall listening speech quality in telecommunication scenarios from narrowband (NB) up to superwideband (SWB), as perceived by the user in an ITU-T P.800 or ITU-T P.830 Absolute Category Rating (ACR) listening-only test. POLQA supports two operational modes, one for narrowband and one for superwideband. Because of an internal frequency limitation to 14 kHz, the POLQA method in superwideband mode (P.863 SWB) cannot differentiate between clean, unprocessed 14 kHz SWB and 20 kHz FB test signals. Nevertheless, we used the POLQA prediction tool for our quality survey: our motivation is to check how far the POLQA method in SWB mode is suitable for evaluating FB music tests.
Currently there are three research studies by other institutions/authors that compare Opus and EVS under various conditions. The first investigation is by Anssi Rämö and Henri Toukomaa (both from Nokia Networks) [8]. It is a listening test using a discrete nine-point MOS scale at bit rates from 4.7 kbit/s up to 128 kbit/s with clean speech and mixed content. The second test, provided by ITU-T Study Group 12, is a P.800 ACR-based listening test to evaluate the prediction performance of the POLQA assessment method [9]. Therein, the EVS codec was tested, focusing on bit rates between 7.2 kbit/s and 24.4 kbit/s. Based on the MOS scores, the prediction performance of the POLQA superwideband mode (rev. 2.4) was validated. The last known study is provided by 3GPP itself [10]. It is an ITU-T P.800 listening test using the five-point MOS scale. The tests were conducted under laboratory conditions using all frequency bandwidths (NB to FB) with bit rates between 4.7 kbit/s and 24.4 kbit/s. This quality test characterizes the performance of the EVS codec.
To the best of our knowledge, there are no direct comparisons between Opus and EVS in FB mode at bit rates higher than 32 kbit/s using a five-point MOS scale. This question is therefore posed and answered for the first time in this paper. Furthermore, purely FB audio/music codec testing with the POLQA assessment is new.
3 Test design
3.1 Fullband Opus and EVS testing concept
The two codecs under examination require different minimum bitrates for the FB audio operating mode. For Opus this minimum bitrate is around 20 kbit/s (VBR mode), whereas for EVS it is 16.4 kbit/s. Furthermore, both codecs were tested at the bitrates 32 kbit/s and 64 kbit/s.
As EVS does not support stereo operation, we tested both codecs solely in mono mode. In mono mode, a better quality is therefore not expected at bitrates higher than 64 kbit/s, because the codecs reach their saturation, as demonstrated in [8].
As Sound Quality Assessment Material (SQAM) we selected recordings for subjective tests provided by the European Broadcasting Union (EBU) [11]. These lossless EBU SQAM sound samples are available free of charge for research and development use. To achieve a well-balanced variety of music types, we picked 6 sound samples from this database (2 singing voice, 2 musical instruments, 2 mixed music samples).
3.2 Instrumental POLQA method and perceptual test
In order to validate the POLQA (ITU-T P.863) prediction method, an ACR listening-only test was conducted.
Figure 1 shows the procedure for determining the audio quality, expressed as MOS ACR for the listening test and as MOS Listening Quality Objective (LQO) for the POLQA method.
Fig. 1. Experimental setup including the POLQA method.
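For the listening test, a MOS is the arithmetic mean of the listeners' ratings on the five-point ACR scale. The sketch below illustrates this. The function name `mos_with_ci` is hypothetical, the confidence interval uses a normal approximation that the paper does not mention, and the example ratings are invented for illustration only; they are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95 % confidence interval)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = mean(ratings)
    # Normal approximation; a single rating gives no estimate of spread.
    if len(ratings) > 1:
        half_width = z * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Invented ratings for illustration only, NOT data from the listening test.
example_ratings = [4, 3, 5, 4, 4, 3, 4]
mos, ci = mos_with_ci(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With 21 listeners per stimulus, as in this test, the same function would be applied to the 21 ratings collected for each audio sample.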
For the instrumental assessment test we utilized the SQuadAnalyzer (version
2.4.0.4) software which supports the POLQA algorithm in SWB operating mode.
The selected stereo signal sound samples of the SQAM database were prepro-
cessed into mono signal reference samples, ready for our test setup.
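The paper does not state how the stereo SQAM files were converted to mono. As a minimal sketch, assuming a plain average of the left and right channels (the function name `downmix_to_mono` is hypothetical, and other downmix rules are possible), the preprocessing step could look like this:

```python
def downmix_to_mono(left, right):
    """Average two equally long channel sequences into one mono sequence.

    Assumption: equal-weight averaging. The original preprocessing used
    in the study is not documented here.
    """
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    return [(l + r) / 2 for l, r in zip(left, right)]


# Tiny synthetic example, not taken from the SQAM material.
left_channel = [0, 100, -100]
right_channel = [0, 50, 100]
print(downmix_to_mono(left_channel, right_channel))
```

Averaging keeps the mono signal within the original amplitude range, so no additional clipping protection is needed in this sketch.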
To find an appropriate listening test environment, we conducted two test variants under different room acoustic conditions. To minimize background noise, the first listening test variant took place in a sound-insulated cabin according to the P.800 requirements. The second test variant was performed in a regular lecture room. The results of the cabin variant differed by ≤ 0.2 MOS points from those of the lecture room variant. We consider these minor differences insignificant. Summarizing the results of both variants, the overall test scenario was as follows:
– 21 students as naive listeners,
– 20 to 28 years of age,
– frequent music listeners (see footnote 1),
– 6 different SQAM samples (violin, glockenspiel, quartet, soprano, two pop music examples),
– 42 audio samples in total (each SQAM sample at three different bitrates for Opus as well as for EVS, plus the FB reference signal),
– three training sequences,
– five-point ACR MOS scale.
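The total of 42 audio samples follows from 6 SQAM samples times (2 codecs times 3 bitrates, plus 1 reference). The sketch below enumerates this test matrix. The bitrates are those named in section 3.1, while the function name `build_stimuli` and the labels "pop music 1" and "pop music 2" are placeholders introduced here.

```python
# Sample labels follow the list above; the two pop labels are placeholders.
SAMPLES = ["violin", "glockenspiel", "quartet", "soprano",
           "pop music 1", "pop music 2"]

# Bitrates in kbit/s as described in section 3.1
# (Opus: around 20 kbit/s as the FB minimum in VBR mode).
CODEC_BITRATES = {
    "Opus": [20, 32, 64],
    "EVS": [16.4, 32, 64],
}


def build_stimuli():
    """Return one (sample, condition, bitrate) tuple per audio stimulus."""
    stimuli = []
    for sample in SAMPLES:
        stimuli.append((sample, "reference", None))  # uncoded FB reference
        for codec, bitrates in CODEC_BITRATES.items():
            for bitrate in bitrates:
                stimuli.append((sample, codec, bitrate))
    return stimuli


print(len(build_stimuli()))
```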
4 Results and discussion
4.1 Effect of the codec bitrate
Figure 2 compares POLQA measures and listening test results for both codecs, Opus and EVS, summarized over all test samples as a function of bitrate. As a reference, the mean assessments of the uncoded samples at 768 kbit/s (PCM, 48 kHz, 16 bit) are given. The quality degradation between the reference and the 64 kbit/s version is low (Opus = / EVS = ..).
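The reference bitrate of 768 kbit/s follows directly from the PCM parameters: 48 000 samples per second times 16 bits per sample for one (mono) channel. The short sketch below reproduces this arithmetic and relates it to the tested codec bitrates. The function names are hypothetical, and the ratios are simple bitrate quotients, not quality figures.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(codec_bitrate_kbps, reference_kbps):
    """How many times smaller the coded bitrate is than the PCM reference."""
    return reference_kbps / codec_bitrate_kbps


reference = pcm_bitrate_kbps(48_000, 16)  # mono, as used in the tests
print(f"PCM reference: {reference:.0f} kbit/s")
for bitrate in (16.4, 32, 64):
    ratio = compression_ratio(bitrate, reference)
    print(f"{bitrate} kbit/s -> ratio {ratio:.1f}:1")
```

At 64 kbit/s the coded signal thus needs one twelfth of the reference bitrate.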
4.2 Influence of music characteristics
Figure 3 illustrates the influence of the music characteristics of the different music pieces for the Opus codec.
In contrast to the Opus results (cf. Figure 3), Figure 4 summarizes the EVS codec performance on the different music pieces.
Footnote 1: Listening to music several hours a day, using different audio playback equipment (HD stereo headsets, high-quality sound systems).
Fig. 2. Overall results including POLQA and listening test.
Fig. 3. Music coding by Opus – listening test results.
Fig. 4. Music coding by EVS – listening test results.
5 Conclusion
Will follow soon.
Acknowledgment
We would like to thank SwissQual AG (a Rohde & Schwarz company) for supplying the POLQA tool SQuadAnalyzer, and in particular Jens Berger for the detailed discussions. Further acknowledgments go to André Schuster for supporting the experiments in the sound-insulated cabin of HfT Leipzig and to all student volunteers in the listening tests.
References
1. J. Valin, K. Vos, and T. Terriberry, “Definition of the Opus Audio Codec,” RFC 6716 (Proposed Standard), Internet Engineering Task Force, Sep. 2012. [Online]. Available: http://www.ietf.org/rfc/rfc6716.txt
2. 3GPP, “EVS Codec General Overview,” 3rd Generation Partnership Project (3GPP), TS 26.441 v12.1.0, Dec. 2014. [Online]. Available: http://www.3gpp.org/DynaReport/26441.htm
3. Google Inc. (2014, Sep.) WebRTC. [Online]. Available: http://www.webrtc.org/
4. ITU-T, “Methods for objective and subjective assessment of speech quality (POLQA): Perceptual Objective Listening Quality Assessment,” International Telecommunication Union (Telecommunication Standardization Sector), REC P.863, Sep. 2014. [Online]. Available: http://www.itu.int/rec/T-REC-P.863-201409-I/en
5. Fraunhofer IIS, “The AAC-ELD Family For High Quality Communication Services,” Fraunhofer IIS, Technical Paper, Dec. 2015. [Online]. Available: http://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/wp/FraunhoferIIS_Technical-Paper_AAC-ELD-family.pdf
6. ITU-T, “Low-complexity, full-band audio coding for high-quality, conversational applications,” International Telecommunication Union (Telecommunication Standardization Sector), REC G.719, Jun. 2008. [Online]. Available: http://www.itu.int/rec/T-REC-G.719-200806-I/en
7. C. Hoene, J. Valin, K. Vos, and J. Skoglund, “Summary of OPUS listening test results draft-ietf-codec-results-03,” Internet-Draft, Internet Engineering Task Force, Jan. 2014. Available: http://tools.ietf.org/html/draft-ietf-codec-results-03 [retrieved: April 2015].
8. A. Rämö and H. Toukomaa, “Subjective Quality Evaluation of the 3GPP EVS codec,” IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, April 2015.
9. ITU-T. (2016, Jan.) P.Imp863: Implementer’s Guide on assessment of EVS coded speech with Recommendation ITU-T P.863. [Online]. Available: http://www.itu.int/rec/T-REC-P.Imp863-201601-I!Oth1/en
10. ETSI, “Universal Mobile Telecommunications System (UMTS); LTE; Codec for Enhanced Voice Services (EVS); Performance characterization,” European Telecommunications Standards Institute (ETSI), TR 126 952 v13.0.0, Jan. 2016. [Online]. Available: http://www.etsi.org/deliver/etsi_tr/126900_126999/126952/13.00.00_60/tr_126952v130000p.pdf
11. European Broadcasting Union. (2008, Oct.) Sound Quality Assessment Material recordings for subjective tests. [Online]. Available: https://tech.ebu.ch/publications/sqamcd