Objective and Subjective QoE Evaluation for
Adaptive Point Cloud Streaming
Jeroen van der Hooft1, Maria Torres Vega1, Christian Timmerer2,3, Ali C. Begen4, Filip De Turck1 and
Raimund Schatz2,5
1IDLab, Department of Information Technology, Ghent University - imec, jeroen.vanderhooft@ugent.be
2Institute of Information Technology, Alpen-Adria-Universität Klagenfurt 3Bitmovin
4Computer Science Department, Özyeğin University 5AIT Austrian Institute of Technology
Abstract—Volumetric media has the potential to provide the
six degrees of freedom (6DoF) required by truly immersive
media. However, achieving 6DoF requires ultra-high bandwidth
transmissions, which real-world wide area networks cannot
provide today. Therefore, recent efforts have started to target
efficient delivery of volumetric media, using a combination of
compression and adaptive streaming techniques. It remains,
however, unclear how the effects of such techniques on the user-
perceived quality can be accurately evaluated. In this paper, we
present the results of an extensive objective and subjective quality
of experience (QoE) evaluation of volumetric 6DoF streaming.
We use PCC-DASH, a standards-compliant means for HTTP
adaptive streaming of scenes comprising multiple dynamic point
cloud objects. By means of a thorough analysis, we investigate
the perceived quality impact of the available bandwidth, rate
adaptation algorithm, viewport prediction strategy and user’s
motion within the scene. We determine which of these aspects
has more impact on the user’s QoE, and to what extent subjective
and objective assessments are aligned.
Index Terms—Volumetric media, HTTP adaptive streaming,
6DoF, MPEG V-PCC, QoE assessment, objective metrics.
I. INTRODUCTION
Six degrees of freedom (6DoF) allows an immersive media
user to move freely within the virtual environment. Enabling
volumetric 6DoF, however, requires sophisticated methods for
media representation and delivery. One plausible solution is
to combine point cloud compression (PCC) with adaptive
streaming techniques that are accustomed to two-dimensional
and 360° video content. PCC significantly reduces the storage
amount at the expense of complex preprocessing and rendering
at the client. HTTP adaptive streaming (HAS) copes with
dynamic network conditions while attempting to deliver the
highest quality possible under the given circumstances.
Even though various approaches to applying HAS to point
clouds have already been proposed, this is still a niche area of
research. Notable studies include the ones by Hosseini and
Timmerer [1] and van der Hooft et al. [2]. While these studies
agree on the advantages of using HAS to nicely trade off the
streaming quality with the bandwidth consumption, the impact
of this trade-off on the user perception, i.e., on the quality of
experience (QoE), is yet to be thoroughly examined. Thus, a
subjective QoE study is necessary in order to shed light on to
the initial findings, which is our primary goal in this paper.
This paper presents an analysis of the effects of PCC and
HAS on the perceived quality. Starting from the state of the
art, we assess the quality by means of both subjective and
objective evaluations. Our three primary contributions are as
follows: (i) we perform a subjective quality assessment study
on 6DoF adaptive point cloud streaming (PCS), (ii) we analyze
the impact of different network conditions and configurations
on the QoE, and (iii) we provide a benchmark for objective
quality metrics. Before we get to the details of the study and
results, we provide background on point cloud compression
and streaming, and review related work.
II. BACKGROUND AND RELATED WORK
A. Point Cloud Compression and Streaming
HAS allows the client to adapt the video quality based
on the network conditions, playback buffer status, user
preferences and the considered video content. When it comes
to point clouds, several PCC techniques exist to compress the
original data (e.g., the encoder by Mekuria et al. [3] and
the recent V-PCC encoder [4]). Changing the quantization
parameters results in multiple representations, each of which
comes with a different bit rate and quality. These compressed
objects can then be retrieved by a client and used to render a
three-dimensional scene with 6DoF.
Hosseini and Timmerer are the first to propose a standards-
compliant approach for on-demand PCS of a single object [1].
The authors sample different points to generate versions
of lower quality, and request objects on a per-frame basis.
He et al. consider view-dependent single PCS, using a cubic
projection to create six two-dimensional images that are
then compressed using traditional compression techniques [5].
The proposed approach relies on a (hybrid) broadband and
broadcast network and in-network optimizations such as
caching. Li et al. introduce a framework for PCS, balancing
communication and computational resources to maximize a
proprietary QoE metric [6]. Simulation results are promising,
but do not consider important factors such as latency. Van der
Hooft et al. present PCC-DASH, a framework for streaming
scenes consisting of multiple point cloud objects [2]. PCC is
used to prepare multiple quality versions of the objects, and
several rate adaptation heuristics that take into account the
user’s position and viewing angle are proposed.
B. Objective Metrics for Point Cloud Streaming
Multiple objective metrics have been used to represent
the quality of point clouds in the past. In this regard, a
distinction must be made between the quality of a point
cloud object compared to a derived version of the same
object, and the quality of the rendered field of view (i.e.,
what a user observes when looking through a head-mounted
display). To compare the quality of derived point clouds,
two Peak Signal-to-Noise Ratio (PSNR)-based metrics have
been proposed by MPEG, referred to as point-to-point and
point-to-plane geometry distortion metrics [7]. The former
calculates the mean square error (MSE) between the reference
and reconstructed points (both for geometry in terms of x,y
and z, and for color in terms of Y UV ). The latter calculates
the MSE between the surface plane and reconstructed points.
PSNR values are obtained based on the volume resolution for
geometry and on the color depths for each color channel.
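The point-to-point geometry metric described above can be sketched as follows. This is an illustrative brute-force implementation, not MPEG's reference software; the choice of peak value, the symmetric max convention, and the function name are assumptions that may differ from the exact formulation in [7].

```python
import numpy as np

def point_to_point_psnr(ref, rec, peak):
    """Symmetric point-to-point geometry PSNR (illustrative sketch).

    ref, rec: (N, 3) and (M, 3) arrays of point coordinates.
    peak: signal peak, e.g. the voxel-grid resolution of the content
          (conventions for the peak factor vary between tools).
    Brute-force nearest neighbour; suitable for small clouds only.
    """
    def mse(a, b):
        # For each point in a, squared distance to its nearest point in b.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).mean()

    # Evaluate both directions and keep the worse (larger) error.
    err = max(mse(ref, rec), mse(rec, ref))
    return 10.0 * np.log10(peak ** 2 / err)
```

A k-d tree (e.g., `scipy.spatial.cKDTree`) replaces the quadratic distance matrix for realistically sized clouds; the point-to-plane variant additionally projects each error vector onto the local surface normal.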
Although relevant to assess the performance of compression
techniques for volumetric media, these metrics give no
indication of how the user visually perceives the corresponding
point cloud object(s) from a given viewpoint or angle.
Recently, well-known metrics for traditional video streaming
have been applied to assess the visual quality of the rendered
point cloud content, compared to a given benchmark (e.g.,
uncompressed point cloud objects). These metrics include the
PSNR, the structured similarity index (SSIM) [8] and the
multiscale SSIM [9], to name a few. Although these metrics
give an indication of the visual quality of the rendered content,
it should be noted that they take the background of the
consumed scenes into account. This background contributes
less to the perceived quality, since it is expected that users
mainly focus on the objects in the front. One work has
considered background removal for images generated from
point cloud content, using a MATLAB-based tool for assisted
removal [10]. Although useful, this feature comes with a
significant computational overhead when video is considered.
C. Subjective Evaluation of Point Cloud Quality
Subjective assessment of the quality of point cloud
rendering has been performed in a number of settings
and configurations featuring different attributes (colored,
colorless), rendering types (raw points, cube, mesh) and
degradation types (compression, noise, octree-pruning), see
[11], [12] for a comprehensive overview. The majority of
studies so far have focused on static models (except, e.g., [3]),
using a passive evaluation protocol (except, e.g., [11], [13])
and double-stimulus testing (except, e.g., [3]). Recent work
has confirmed the viability of using pre-rendered 2D imagery
for point cloud subjective quality testing (cf., [11], [14]) and
investigated alternatives to double-stimulus testing (cf., [15]).
In short, two common key aspects are found in the literature:
encoding evaluation and double-stimulus assessment. First,
almost all studies so far have focused on the quality
degradation due to the encoding. However, this degradation
does not cover the effect of networks on the perceived quality,
which will be fundamental given the bandwidth required
by PCS. Second, subjective evaluations were predominantly
performed using double-stimulus testing, where the users rate
the degradation of one video compared to the unimpaired
source. While double-stimulus provides the means to assess
the perceived degradation of the content, it falls short of
assessing the overall perceived quality. The aim of this work
is thus to provide an experimental subjective and objective
evaluation of the effects of adaptive PCS on the user’s QoE.

TABLE I: Parameter configurations considered in this work.

Parameter                 Configurations
Point cloud objects       loot, redandblack, soldier, longdress
Quality representations   R1, R2, R3, R4, R5
Segment duration          1 second
Scene / camera path       1, 2, 3
Bandwidth                 20, 60, 100 Mb/s
Latency                   37 ms (reference for 4G)
Buffer size               4 seconds
Object priority           Avis
Bit rate allocation       uniform, view-focused
Prediction                most recent, clairvoyant
III. EXPERIMENTAL APPROACH
To address the aforementioned gaps in the existing research,
we conducted a subjective and objective evaluation of the
QoE of adaptive PCS. Our aim was to assess the QoE impact
of relevant settings and parameters in a setup that considers
the dynamic nature of PCS (animated models and moving
camera). Further, we wanted to benchmark common objective
metrics against subjective results in order to identify their
alignment with human perception. In particular, the purpose
of this research is to answer the following two questions:
RQ1: What is the impact of available network bandwidth,
viewport prediction, bit rate allocation, and 3D scene
type on adaptive PCS quality perception?
RQ2: How do objective image-based metrics correlate with
the subjective quality for different adaptive PCS
delivery settings and scenarios?
The remainder of this section presents the required steps and
approaches used to answer these research questions.
A. Evaluation Space and Content Generation
We implemented the setup and algorithms proposed by van
der Hooft et al. [2] and deployed them on a dedicated network
testbed1. This allowed us to replicate their experiments and
analyze the observed results ourselves. Multiple parameters
are defined by the authors, of which the ones in Table I were
considered in this work.
First, the four point cloud objects from the 8i dataset [16]
were encoded using the V-PCC encoder with MPEG’s five
reference quality representations [4]. As an example, Fig. 2
shows the lowest and highest quality representation of one of
the frames of the soldier object, which has an uncompressed
bit rate of 5.7 Gb/s. The original content comes with 300
frames per object, at a frame rate of 30 frames per second. A
segment duration of one second (corresponding to 30 frames)
is considered in this work.
1https://doc.ilabt.imec.be/ilabt/virtualwall/
Fig. 1: Screenshot of an example field of view.
TABLE II: Selected parameter configurations for each SRC.

Bandwidth [Mb/s]   Content      Allocation     Prediction
20                 compressed   view-focused   most recent
60                 compressed   view-focused   most recent
100                compressed   view-focused   most recent
20                 compressed   view-focused   clairvoyant
60                 compressed   view-focused   clairvoyant
100                compressed   view-focused   clairvoyant
60                 compressed   uniform        most recent
∞ (reference)      original     N/A            N/A
Second, the point cloud objects were merged together to
form different scenes. We considered two different types of
scenes: one in which the objects (humans) are positioned in
a line, and one in which they are placed on a semi-circle. To
allow the evaluation of a larger number of conditions, three
excerpts of the original content (120 s, playing out the 300
frames forward and backward six times each) were extracted.
For the remainder of this paper, these excerpts are referred to
as Reference Source Sequences (SRC). The first SRC, with a
duration of 24 s, belongs to the first scene and pans the four
objects. The other two, with a duration of 18 s, are taken from
the second scene and zoom in and out of two different objects
(loot and redandblack in SRC 2, soldier and longdress in SRC
3). With regard to the user’s movement and focus, it is worth
noting that the same programmatically generated traces were
used as the ones in [2]. This allowed us to generate and render
the field of view of different videos using the MPEG point
cloud compression renderer2. An example screenshot can be
found in Fig. 1. The three resulting videos (with the original
point cloud objects) have been made available online3.
Third, a network was emulated in which a single client was
connected to an HTTP server. Using traffic control (tc) on
a shared network link, the available bandwidth was fixed to
discrete values (20, 60 and 100 Mb/s). The latency was set to
37 ms, a reference value for 4G networks. The server contained
all point cloud objects at different quality representations, and
offered a manifest file containing all the metadata required
during streaming. The client used a Python-based player,
which allowed multiple configurations to be set. In this work, the
2http://mpegx.int-evry.fr/software/MPEG/PCC/mpeg-pcc-renderer/
3https://users.ugent.be/~jvdrhoof/pcc-dash/
Fig. 2: Two representations of the soldier object: R1 at
4.5 Mb/s (left) and R5 at 40.4 Mb/s (right).
buffer size was set to four seconds, while the size of the visual
area (defined as Avis in [2]) was used to prioritize objects
within the scene. Once these objects were ranked according
to their priority, the available bandwidth was allocated to the
four point cloud objects in a uniform way (i.e., increase the
quality of all objects one by one, starting with the object of
the highest priority), or in a view-focused way (i.e., increase
the quality of the visible objects first, before considering the
objects outside of the field of view). Finally, there was an
option to use either the most recent information on the user’s
position and focus at the time of buffering, or use clairvoyant
prediction (i.e., the client assumes perfect prediction on the
user’s position and focus when the content is being played
out). Once all configurations had been set, the client started
a streaming session by retrieving the different point cloud
segments one by one, adapting the video quality based on the
observed throughput and the configurations above. Decisions
on the quality representation for each of the point cloud objects
were logged on a per-segment basis, so that the resulting field
of view of the entire streaming session could be generated
once the experiment had finished.
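The uniform and view-focused allocation strategies can be illustrated with a small greedy sketch. This is a simplified reconstruction of the behavior described above, not the authors' actual PCC-DASH implementation; the function and field names are hypothetical.

```python
def allocate_bitrate(objects, rates, budget, mode="uniform"):
    """Greedy per-segment quality allocation (illustrative sketch).

    objects: list of dicts with 'priority' (higher = more important,
             e.g. derived from the visual area Avis) and 'visible' (bool).
    rates:   bit rates (Mb/s) of representations R1..R5, ascending.
    budget:  available bandwidth (Mb/s).
    Returns the chosen representation index per object.
    """
    # Everyone starts at the lowest quality; assume that always fits.
    levels = [0] * len(objects)
    spent = sum(rates[0] for _ in objects)

    def upgrade(candidates):
        # Round-robin upgrades: repeatedly bump each candidate by one
        # level, highest priority first, until the budget runs out.
        nonlocal spent
        progress = True
        while progress:
            progress = False
            for i in sorted(candidates, key=lambda i: -objects[i]["priority"]):
                if levels[i] + 1 < len(rates):
                    cost = rates[levels[i] + 1] - rates[levels[i]]
                    if spent + cost <= budget:
                        levels[i] += 1
                        spent += cost
                        progress = True
        return levels

    if mode == "view-focused":
        visible = [i for i, o in enumerate(objects) if o["visible"]]
        hidden = [i for i, o in enumerate(objects) if not o["visible"]]
        upgrade(visible)  # max out visible objects first
        return upgrade(hidden)
    return upgrade(range(len(objects)))
```

Under a tight budget, the view-focused mode drives visible objects to higher representations while leaving off-screen objects at R1, which is exactly why a sudden change of focus can expose a low-quality object.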
For evaluation purposes, eight configurations were selected
for each SRC (see Table II). This results in a total number of
24 processed video sequences (PVS).
B. Subjective Quality Experiments
For the test setup and procedure, we used ITU-R BT.500-13
and ITU-T P.913 as general guidelines. A screen with 2K
resolution was used to passively show the content to the user,
who was sitting at a distance of approximately four times the
height of the screen. The applied test protocol was as follows:
1) Welcome (3 min): Briefing and informed consent.
2) Setup (2 min): Screening and demographic data.
3) Training (1 min): Evaluation of a single video example not
related to point clouds.
4) Evaluation (16 min): 24 PVSs, post-stimulus questionnaire.
5) Debriefing (2 min): Feedback and remarks.
Prior to the experiment, subjects were screened for correct
visual acuity using Snellen charts (20/20) and for color
vision using Ishihara charts. A short training was performed
at the beginning of each test session to familiarize the
subjects with the test procedure. During the evaluation
sessions, 24 PVSs were shown, corresponding to the three
SRCs with different configurations. The post-stimulus rating
questionnaire prompted participants to rate the quality of each
PVS on an ACR-7 continuous scale.

Fig. 3: MOS scores for the 24 PVSs. MOS is normalized
to 1-100, with 1-20 equating “bad” and 81-100 equating
“excellent” quality (MOS CI = 0.95). Each color designates an
SRC, with conditions further grouped by viewport prediction
and ordered by increasing network bandwidth. Infinite
bandwidth designates reference conditions. Condition code:
SRC, prediction [0: most recent, 1: clairvoyant], bandwidth
[Mb/s], rate allocation [V: view-focused, U: uniform].
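As a minimal illustration of how per-condition MOS values and confidence intervals such as those in Fig. 3 are typically derived from individual ratings (assuming ratings already normalized to the 1-100 scale; the authors' exact procedure may differ):

```python
import math

def mos_with_ci(ratings, z=1.96):
    """Mean opinion score with a confidence interval half-width.

    ratings: per-subject quality ratings for one PVS, normalized
             to 1-100. Uses the normal approximation; for N = 26
             subjects a Student-t quantile (~2.06) is slightly wider.
    """
    n = len(ratings)
    mos = sum(ratings) / n
    var = sum((r - mos) ** 2 for r in ratings) / (n - 1)  # sample variance
    half_width = z * math.sqrt(var / n)
    return mos, half_width
```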
C. Objective Quality Evaluation
To objectively evaluate the generated fields of view, the
considered PVSs were compared with the SRCs containing
the original, non-compressed point cloud objects. Three full-
reference video metrics were considered in this paper:
Weighted PSNR [17]: using reference weights of 0.75,
0.125 and 0.125 for the Y, U and V components, respectively;
SSIM [8]: Averaged over all frames in the video;
VQM [18]: The standardized NTIA General Model, using
full-reference calibration.
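One common way to compute such a weighted PSNR is to combine per-plane PSNRs with the stated weights. The sketch below assumes 8-bit YUV frames and may differ in detail from the formulation in [17] (some variants weight the MSEs before taking a single logarithm).

```python
import numpy as np

def weighted_yuv_psnr(ref, rec, weights=(0.75, 0.125, 0.125), peak=255.0):
    """Weighted PSNR over the Y, U and V planes of one frame.

    ref, rec: (H, W, 3) arrays in YUV order.
    Combines per-plane PSNRs with the 0.75/0.125/0.125 weights;
    averaging the result over all frames gives the video score.
    """
    ref = ref.astype(np.float64)
    rec = rec.astype(np.float64)
    score = 0.0
    for c, w in enumerate(weights):
        mse = np.mean((ref[..., c] - rec[..., c]) ** 2)
        score += w * 10.0 * np.log10(peak ** 2 / mse)
    return score
```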
Results for these metrics were compared with those of
the subjective evaluation procedure. Note that the distortion
metrics, specifically developed for point clouds, cannot be
applied, because the considered scenes consist of multiple
point clouds, which are shown either sequentially or in parallel
throughout the considered SRCs.
IV. EVALUATION RESULTS
In this section, we first answer RQ1 discussing the results
of the subjective study and the objective evaluation. We then
address RQ2 via a comparative analysis of both result sets.
A. Impact of Content/Streaming-Related Factors (RQ1)
1) Subjective Results: A total of 30 subjects participated
in our subjective experiment: 7 subjects were female and 23
were male, while 19, 10 and 1 subjects were between 20-
29, 30-39 and 40-49 years old, respectively. Subjects were
recruited from Ghent University, offering participants a chance
of winning a movie ticket in a raffle as incentive. As the result
of our outlier removal procedure, according to ITU-R BT.1788
Annex 2 [19], we eliminated four subjects from the study
dataset, resulting in a final subject count of N = 26.
Fig. 4: CDF plots of subjective quality ratings grouped by SRC
(top left), available bandwidth [Mb/s] (top right), viewport
prediction (bottom left) and bit rate allocation (bottom right).
Fig. 3 shows the normalized MOS scores (0-100) for each
test condition. An important observation is that the range of
the average MOS scores for all configurations is limited to an
interval between 23.8 and 61.3. Even though the subjects were
shown the uncompressed content, the majority of the people
rated the videos on the lower end of the quality spectrum.
This shows that the considered content, which was captured
using 42 cameras and requires 19.7 Gb/s, was not to the
subjects’ standards. This was also clear from the debriefing
interviews with feedback and remarks, where the subjects
often mentioned that their expectations of the visual content
were higher because of their familiarity with full-HD and 4K
resolutions for traditional video.
Fig. 4 shows the cumulative distribution function (CDF)
of the MOS scores given by the 26 subjects, for different
configurations. Results show that SRC 1 (line-up of the
objects, zoomed out) is rated higher than SRC 2 and 3
(zoom in on a specific object). This can be related to the
visual size and area of the considered point cloud objects,
which is significantly lower in SRC 1. Also the texture of
the point cloud objects turned out to be an important factor
to the subjects. Multiple subjects indicated to have given
higher scores to SRC 2 than to SRC 3, because the loot and
redandblack objects show less contrast differences than the
soldier and longdress objects in SRC 3. This explains the
higher range of MOS scores for the latter content. We also
observe that higher bandwidth values result in higher MOS
scores, which was to be expected. Viewport prediction leads
to better results as well, which can be explained by the lack
of quality switches when shifting the focus from one object to
the other (the client was able to anticipate this change during
buffering). The bit rate allocation scheme also has an impact
on the subjects’ perception of the clip, with a preference
toward uniform bandwidth allocation. Finally, the results show
that even the uncompressed content is rated less than 60 almost
half of the time. This again indicates that the subjects were
generally underwhelmed by the provided video sequences.
The above observations are confirmed by our mixed-
model ANOVA results, which identify all factors except the
bandwidth allocation strategy as exerting significant influence
on the MOS (see Table III), with the prediction component
being on the borderline (p = 0.035). In addition, we also tested
for user-related influences by analyzing the random-effects
part of our mixed-effects model. Indeed, the model suggests
the presence of significant “assessor effects” on quality ratings
(p < 0.001). In particular, the influence of the shown SRC on
quality rating behavior varied significantly across participants
(p < 0.001). However, we could not detect any systematic
influence of subject variables such as age or gender.

TABLE III: Mixed-model ANOVA results for fixed (F-Test)
and random effects (likelihood-ratio test). Asterisks indicate
levels of significance.

Fixed Effects      F        p
Bandwidth          73.177   <0.001 ***
Prediction         4.830    0.035 *
Allocation         2.844    0.092
SRC                12.472   <0.001 ***

Random Effects     ChiSq    p
Bandwidth:User     6.499    0.011 *
Prediction:User    3.937    0.047 *
Allocation:User    0.000    0.998
SRC:User           12.644   <0.001 ***
User               32.494   <0.001 ***

*** p < 0.001, ** p < 0.01, * p < 0.05

Fig. 5: Scatter plots for the different objective metrics (PSNR,
SSIM, VQM) with MOS scores as the ground truth. Each dot
represents a test condition, each color a video sequence. Lines
are fitted via linear regression using ordinary least squares.
2) Objective Results: Table IV shows the results obtained
for the objective metrics. Three observations can be made.
First, even though the configurations are significantly different
(especially regarding the available bandwidth), differences in
terms of metrics are limited. Looking at the obtained PSNR
values, for instance, the range among all PVSs equals 4.97,
while the highest range among the three SRCs merely equals
2.38. This can mostly be attributed to the fact that the
considered metrics are calculated based on the whole field
of view, which includes the (static) background. A different
point cloud representation thus affects these metrics in a less
pronounced manner than is the case for traditional video.
Second, although differences are small, the observed trends
are evident: (i) with increasing network bandwidth, better
scores are always observed, (ii) viewport prediction has a
positive effect on the resulting video quality, and (iii) uniform
allocation of the available bandwidth leads to better results
than prioritizing objects visible at the time of buffering. The
latter is related to our prior observation that a change of focus
can result in a negative impact on the observed video quality.
Third, a strong linear correlation between the considered
metrics is observed. As shown in Fig. 6, the highest correlation
is achieved for the PSNR and SSIM metrics, with a Pearson
correlation coefficient of 0.85. Similar results are observed for
other correlation metrics, such as Spearman’s rank correlation.

TABLE IV: Subjective MOS and objective metrics for the
different test conditions. Condition code: SRC, prediction,
bandwidth, bit rate allocation (see Fig. 3).

Condition     MOS      PSNR     SSIM    VQM
1 0 20 V      41.3462  38.9713  0.9846  0.0987
1 0 60 V      44.9365  40.0958  0.9880  0.0795
1 0 100 V     51.4000  40.3799  0.9887  0.0753
1 1 20 V      39.4873  39.1501  0.9852  0.0964
1 1 60 V      49.6907  40.3158  0.9885  0.0768
1 1 100 V     51.6669  40.5320  0.9890  0.0730
1 0 60 U      49.8088  40.1941  0.9883  0.0759
2 0 20 V      31.1538  40.6554  0.9847  0.0879
2 0 60 V      40.6415  42.2854  0.9893  0.0614
2 0 100 V     43.9754  42.6054  0.9901  0.0551
2 1 20 V      35.8327  41.1388  0.9863  0.0783
2 1 60 V      43.5254  42.6909  0.9901  0.0541
2 1 100 V     45.4488  42.6925  0.9901  0.0540
2 0 60 U      42.8208  42.1978  0.9893  0.0602
3 0 20 V      23.7823  37.7223  0.9790  0.1062
3 0 60 V      40.9615  39.2676  0.9848  0.0796
3 0 100 V     49.8077  39.9239  0.9871  0.0551
3 1 20 V      29.6150  38.1217  0.9807  0.0956
3 1 60 V      45.3858  39.8047  0.9869  0.0653
3 1 100 V     48.0119  40.1055  0.9879  0.0612
3 0 60 U      42.9485  39.5652  0.9864  0.0685
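The Pearson and Spearman coefficients reported in Fig. 6 can be reproduced from paired per-condition scores (e.g., the MOS and SSIM columns of Table IV). A minimal dependency-free sketch, ignoring rank ties:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank values.

    Double argsort assigns ranks; it does not average tied ranks,
    which is fine for continuous scores like those in Table IV.
    """
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))
```

In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` provide the same coefficients together with p-values and proper tie handling.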
B. Subjective vs. Objective Results (RQ2)
In this section, we answer RQ2 by comparing subjective
assessment results with objective metrics. To this end, Fig. 5
shows the scatter plots for the considered objective metrics
with the MOS scores as the ground truth. Each dot represents
a test condition, as defined in Table II. Lines are fit through
linear regression, using ordinary least squares. From these
graphs, we observe that there is a clear linear correlation
between the MOS scores and the considered objective metrics.
This corresponds to the results observed for the Pearson
correlation coefficients in Fig. 6, with values of 0.40, 0.82
and -0.59 for PSNR, SSIM and VQM, respectively. Comparing
results with the observed MOS scores, the SSIM metric thus
seems to best reflect the increasing/decreasing trend of the
subjects’ preferences toward the considered content. However,
given the observed offsets between the different videos, it
should be noted that the relation between the subjective scores
and the objective metrics is strongly SRC-dependent. Indeed,
although the MOS scores overlap, objective results for SRC 2
are significantly better than for SRC 1 and 3. Thus, we
conclude that it is not possible to accurately deduce QoE
scores solely from the objective results. We believe this can
be attributed to two factors.

Fig. 6: Pearson and Spearman correlation coefficients for the
MOS scores and the three objective metrics, over all 24 PVSs.
First, the scores for the objective metrics strongly depend
on the properties of the point cloud object(s). Objects that
are smaller, for instance, contribute less to the PSNR of the
field of view. Real users, however, tend to strongly focus on
(the quality of) the objects, without taking the background
into account. Furthermore, it is harder for users to evaluate
(differences between) the quality of the objects that are farther
away, resulting in a lower range of MOS scores.
Second, the objective scores do not reflect the impact of
quality switches within the video. These metrics are based on
(weighted) averages and do not take dynamic behavior into
account. A typical user, however, pays attention to quality
switches and rates the video accordingly. Psychological factors
play an important role here, in that a subject remembers what
was bad (e.g., switching to a lower quality when a new object
is being focused on). This shows that accurate prediction
models for 6DoF user movement are important to improve
the QoE in these applications.
V. CONCLUSIONS AND FUTURE WORK
In this paper, we evaluated the impact of adaptive streaming
optimizations on the quality of experience (QoE) of point
clouds. Based on the state of the art, we prepared a set
of 24 impaired volumetric test videos that were analyzed
both subjectively, through a single stimulus approach, and
objectively, through the full reference metrics PSNR, SSIM
and VQM. First, we found that users are able to provide
accurate and consistent responses when assessing quality
without the presence of the ground truth, even though the
most common point cloud quality assessment approach in the
literature so far has been double stimulus. However, given the
different nature of volumetric media, subjects tend to give
lower ratings than those for traditional HD or 4K videos.
Second, high correlation between objective and subjective
metrics was shown for the case of adaptive point cloud
streaming. Nonetheless, objective metrics need to be rescaled
or adjusted to properly match the human perception. Thus,
there is a need for more representative metrics and QoE models
that more accurately reflect the quality perceived by the user.
In future work, we aim to extend our experimental test set with
additional scenes and conditions, as well as a comparative analysis
of the performance of single versus double stimulus testing
for this type of media. Furthermore, we plan to use alternative
video-based point cloud compression techniques that allow for
real-time decoding of the considered point cloud objects. This
way, we can evaluate truly interactive 6DoF video scenarios.
ACKNOWLEDGMENTS
This research is part of a collaborative project between
Huawei and Ghent University, funded by Huawei Technologies,
China, and has been supported in part by the Christian
Doppler Laboratory ATHENA (https://athena.itec.aau.at/).
Maria Torres Vega is funded by the Research Foundation
Flanders (FWO), grant number 12W4819N.