Textured Mesh vs Coloured Point Cloud: A
Subjective Study for Volumetric Video Compression
Emin Zerman, Cagri Ozcinar, Pan Gao, and Aljosa Smolic
V-SENSE, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland.
Email: {zermane, ozcinarc, smolica}
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.
Abstract—Volumetric video (VV) pipelines reached a high level
of maturity, creating interest to use such content in interactive
visualisation scenarios. VV allows real world content to be
captured and represented as 3D models, which can be viewed
from any chosen viewpoint and direction. Thus, VV is ideal
to be used in augmented reality (AR) or virtual reality (VR)
applications. Both textured polygonal meshes and point clouds
are popular methods to represent VV. Even though the signal
and image processing community slightly favours the point cloud
due to its simpler data structure and faster acquisition, textured
polygonal meshes might have other benefits such as better visual
quality and easier integration with computer graphics pipelines.
To better understand the difference between them, in this study,
we compare these two different representation formats for a
VV compression scenario utilising state-of-the-art compression
techniques. For this purpose, we build a database and collect user
opinion scores for subjective quality assessment of the compressed
VV. The results show that meshes provide the best quality at high
bitrates, while point clouds perform better for low bitrate cases.
The created VV quality database will be made available online
to support further scientific studies on VV quality assessment.
Index Terms—Volumetric video, textured mesh, point cloud, subjective quality assessment, point cloud compression, mesh compression

2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)
I. INTRODUCTION

Volumetric video (VV) – also known as free-viewpoint video – is a new form of immersive visual media which enables viewers to look at the captured content from any direction and any viewing angle. The technical advancements made in capture and display technologies have helped VV content creation and delivery pipelines reach a high level of maturity. The
VV is generally captured in dedicated studios using several
cameras looking inwards [1]–[5]. The captured volumetric
content is ideal for augmented reality (AR) or virtual reality
(VR) applications, and can be used in various scenarios from entertainment to education and communication. As the interest in VV and its applications in AR and VR grows, the visual media community continues its efforts towards
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under the Grant Number
(a) Textured mesh structure
(Mesh edges are plotted black)
(b) Textured triangular mesh
(c) Coloured point cloud structure
(Rendered with a small point size)
(d) Coloured point cloud
(Rendered with a large point size)
Fig. 1. Sample snapshots from the textured mesh and the point cloud of the
same volumetric video.
standardisation of VV delivery. In particular, point cloud compression is a hot topic which attracts attention from researchers in academia and industry [6], [7], including the JPEG [8] and MPEG [9] communities. In parallel, scientists are also trying to understand how perception and quality assessment (QA) change for this new immersive visual media [10]–[15]. Nevertheless, there are currently no publicly available large QA databases which can be used for understanding QA for VV.
Textured polygonal meshes [2] and point clouds [16] are
two popular formats to represent and store the captured VV.
Sample figures for these formats can be seen in Fig. 1.
Textured polygonal meshes consist of vertices, edges with
connectivity information, and a texture atlas to colourise the
3D model. Hence, polygonal meshes need to preserve the
connectivity information, and this brings some limitations
during compression and transmission. Point clouds, instead,
forego the connectivity information in favour of easier storage
and simplicity. Moreover, since point clouds do not have
connectivity information, they can be acquired and stored in
any order as long as the whole of the point cloud is considered.
The recent standardisation efforts [8], [9] show that the signal processing community slightly favours point clouds nowadays due to their easier acquisition and simpler data representation.
Nevertheless, textured polygonal meshes have other benefits, such as easier integration with computer graphics pipelines and a piecewise continuous surface representation.
In this paper, we aim to understand the perceptual differences between the two most commonly used VV representation formats: textured polygonal meshes and point clouds.
For this purpose, we create a database including both VV
representations and build a subjectively-annotated VV quality
database considering the compression scenario. Besides comparing textured polygonal meshes and coloured point clouds,
we also compare different compression methods. The main contributions of this paper are:
- a comparison of state-of-the-art VV compression methods: Draco [17], G-PCC [9], and V-PCC [9];
- a comparison of the effect of two different VV representation techniques (i.e., textured meshes and coloured point clouds) on compression performance; and
- a QA database with subjectively annotated mean opinion score (MOS) values, comprising a total of 152 compressed VVs generated from eight different contents using three different compression methods.
The details of these contributions are discussed in the following sections.
II. RELATED WORK

Recent years have seen a peak in the interest in the creation of VV [1]–[5]. Using different numbers of cameras and sensors, these methods capture real-life 3D content from the scene.
For example, Alexiadis et al. [1] used four Microsoft Kinect
sensors, each of which consists of an RGB camera and a
depth scanner. A more advanced system developed by Microsoft [2] captures the scene with 106 cameras. Afterwards,
the reconstructed mesh is temporally tracked and turned into
a streamable bitstream. Pagés et al. [3], instead, propose a
more affordable system which uses 12 cameras for 3D mesh
reconstruction. In a more recent work [5], 32 cameras are
used to generate VV. Often in these studies, the reconstruction
starts with point cloud generation and proceeds with meshing.
Nevertheless, recent point cloud databases [16], [18] show that VV can also be represented as point clouds.
Polygonal meshes have long been used in computer graphics
applications from virtual 3D representations of objects to
video games. Thus, mesh compression is a widely researched
topic in computer graphics [19]. Nevertheless, for most of these applications, the 3D models were computer generated, and thus there was no need to change the connectivity (or structure) of a 3D mesh except for compression purposes (e.g., progressive meshes). However, for VV applications, there are
certain limitations stemming from 3D reconstruction. In most
of the cases, the volumetric model is reconstructed for each
frame of source video independently. That is, there are initially
no temporal constraints imposed during the reconstruction process. This can be remedied in post-reconstruction by temporal
tracking of meshes [2]. Even in this case, it is likely that
these temporally consistent groups of meshes will not have
the same length (i.e. number of frames), depending on the
nature of the motion. Therefore, representing VV as point clouds for storage and delivery has certain advantages, as it does not require connectivity. Hence, point cloud compression is now a hot research topic, with many researchers [6], [7] as well as the JPEG [8] and MPEG [9] communities working towards it.
There are many studies conducted for the quality assessment
of polygonal meshes [20]–[23] and point clouds [10]–[15].
Nevertheless, only some of these studies considered the state-of-the-art compression methods in a way which can be beneficial for a VV setting. Doumanoglou et al. [22] and Christaki et al. [23] studied different open-source mesh compression algorithms and found Google's Draco the best performing
among the compared. Su et al. [12] used static point clouds to
compare three different encoders: L-PCC, S-PCC, and V-PCC.
According to the results of their subjective experiment, V-
PCC was found better than other methods. Alexiou et al. [13]
also used static point clouds to compare G-PCC and V-
PCC both subjectively and objectively. These studies provide
some insightful results even though they consider static 3D
models. On the other hand, studies considering the QA of
VVs (i.e., time-sequences of either meshes or point clouds)
are rather limited [14], [15]. Zerman et al. [14] used two VVs
and considered TMC2 (i.e., V-PCC), also creating an initial
subjective VV quality database. Gonçalves et al. [15] instead used the 8i VV database [16] and compared TMC2 (i.e., V-PCC) to PCC [6]. TMC2 was found to be better than PCC.
To the best of our knowledge, this paper is the first study to
compare textured meshes and point clouds for a compression
scenario. Even though the differences between point clouds and meshes (reconstructed from those point clouds) were compared in one earlier study [24], it drew no conclusions on which representation is better. In this study, instead, our findings give a clear recommendation for the VV compression scenario.
III. SUBJECTIVE QUALITY EXPERIMENT

To build the VV quality database and collect viewer quality opinions on the comparison of textured meshes and point clouds, we conducted a subjective quality experiment. The
details are discussed in the following subsections. In order
to foster scientific works for VV research, the database with the collected subjective quality scores will be made publicly available on the project webpage.
TABLE I. Selected VV contents and available representations.
Database | Point Cloud | Mesh | Contents
V-SENSE  |      X      |  X   | AxeGuy, LubnaFriends, Rafa2, Matis
8i [16]  |      X      |  –   | Longdress, Loot, Redandblack, Soldier
A. Stimuli & Database
For the creation of the database, we selected eight different VVs in total, as shown in Table I. Four of these VVs were
(a) AxeGuy
[v:25K / p:405K]
(b) LubnaFriends
[v:25K / p:402K]
(c) Rafa2
[v:25K / p:406K]
(d) Matis
[v:25K / p:406K]
(e) Longdress
(f) Loot
(g) Redandblack
(h) Soldier
Fig. 2. Selected VVs from (a-d) V-SENSE and (e-h) 8i databases [16]. Vertex (i.e., v:) and point counts (i.e., p:) are indicated in brackets for the textured mesh and point cloud representations, respectively. All textured meshes have 50K faces. All VVs have 300 frames and are played at 30 fps.
generated by us (called “V-SENSE” from this point on) using
the method of Pages et al. [3]. The remaining four of these
VVs were taken from the 8i point cloud dataset [16]. While
V-SENSE database had both textured meshes and point clouds,
8i database only had point clouds. Sample renders for the
selected VVs are shown in Fig. 2, together with other details.
The collected data contains only human bodies as this was the
only content available for volumetric videos at the moment.
Anticipating that VV will be used alongside traditional visual media in the near future, we consider the compression scenario in this paper. Considering the findings
in recent studies [22], [23], Google’s Draco encoder was
selected to compress VV represented as polygonal meshes.
In this case, JPEG compression was used for texture atlases.
For VVs represented as point clouds, we consider the state-
of-the-art compression algorithms which are being developed
in MPEG standardisation efforts [9]: G-PCC and V-PCC.
Draco and G-PCC were proposed for the aim of compression
of static volumetric contents, and V-PCC was developed for
compression of VV. Since Draco and G-PCC do not consider
temporal redundancies, we included the all-intra option of
V-PCC to compare them in a fair way. In total, we consider
four different cases as shown in Table II.
The Draco encoder was downloaded from its GitHub page [17] and run on a Linux system. The texture atlases were compressed using JPEG with different quality levels in Matlab. For the compression of point clouds, we used the G-PCC (mpeg-pcc-tmc13 v6.0) and V-PCC (mpeg-pcc-tmc2 v5.0) encoders, following the descriptions in the MPEG document [25] for the RAHT, all-intra, and random-access modes.
TABLE II. The four compression cases considered.
Encoders     | Description
Draco+JPEG   | Draco encoder for polygonal meshes used together with JPEG for texture atlases
G-PCC (RAHT) | G-PCC encoder with region-adaptive hierarchical transform (RAHT) to compress point-wise colour
V-PCC (AI)   | V-PCC encoder where each frame was intra coded
V-PCC (RA)   | V-PCC encoder with random access setting

TABLE III. Selected coding parameters for each quality level.
                  R1   R2   R3   R4   R5   R6
Draco+JPEG  QP     8   10   10   12   12   12
            QT     6   10   10   10   12   12
            JPEG   0    0    5   10   30   55
G-PCC       depth 10   10   10   10   10   10
            level  6    7    7    7    8   10
            colSt 64   32   16    8    4    1
V-PCC       geoQP 32   28   24   20   16    -
            texQP 42   37   32   27   22    -

For the point clouds, we selected six different levels for G-PCC and five different levels for V-PCC, as given in MPEG's common test conditions (CTC) for point cloud compression [25]. For the polygonal meshes, the coding parameters for six different quality levels were selected in a pilot
study which was conducted with three expert viewers. Out of
these six levels, five levels were selected to correspond to ACR
categories [26], and another level (with a higher bandwidth
requirement) was selected so that we can see the extent of
the curve for the ‘Draco+JPEG’ case. The selected parameters
are given in Table III. This stimuli preparation resulted in 152
distorted videos created from eight different contents.
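As a concrete illustration, the per-level Draco settings in Table III can be scripted. The `-qp`/`-qt` flags of `draco_encoder` (position and texture-coordinate quantisation bits) correspond to the QP/QT rows of the table; the file names below and the exact batching are illustrative assumptions, not the authors' actual scripts.

```python
import subprocess  # only needed if the commands are actually run

# Per-level settings from Table III for the 'Draco+JPEG' case.
# 'qp'/'qt' are quantisation bits for positions/texture coordinates;
# 'jpeg' is the JPEG quality used for the texture atlas.
LEVELS = {
    "R1": {"qp": 8,  "qt": 6,  "jpeg": 0},
    "R2": {"qp": 10, "qt": 10, "jpeg": 0},
    "R3": {"qp": 10, "qt": 10, "jpeg": 5},
    "R4": {"qp": 12, "qt": 10, "jpeg": 10},
    "R5": {"qp": 12, "qt": 12, "jpeg": 30},
    "R6": {"qp": 12, "qt": 12, "jpeg": 55},
}

def draco_cmd(mesh_in, drc_out, level):
    """Build a draco_encoder command line for one quality level."""
    opts = LEVELS[level]
    return [
        "draco_encoder",
        "-i", mesh_in,
        "-o", drc_out,
        "-qp", str(opts["qp"]),   # position quantisation bits
        "-qt", str(opts["qt"]),   # texture-coordinate quantisation bits
    ]

# Hypothetical example: compress one frame at the lowest quality level.
cmd = draco_cmd("frame_0000.obj", "frame_0000.drc", "R1")
# subprocess.run(cmd, check=True)  # requires draco_encoder on PATH
```

Note that R2 and R3 share the same mesh quantisation and differ only in the JPEG quality of the texture atlas.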
For visualisation of the VV, a passive approach was used,
which also made sure that all the participants were presented
with the same stimuli. For this purpose, the compressed VVs
were rendered using Blender (version 2.80) with the “Point cloud visualizer” add-on, and the rendered representations were stored as traditional videos. Using ffmpeg, these videos were compressed with x264 and the -crf 15 parameter to ensure that the compression is perceptually lossless. The point sizes were selected to ensure that the stimuli would look watertight, both for textured triangular meshes and coloured point clouds. The VV was placed in the scene, and the camera was set to orbit the VV’s initial origin twice in the clockwise direction within the video duration (i.e., 10 sec).
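The rendering and encoding step above can be sketched as follows. The orbit maths follows the paper (two clockwise orbits over 300 frames at 30 fps); the ffmpeg invocation uses the x264/CRF 15 settings stated in the text, while the pixel-format flag and frame-pattern file name are illustrative assumptions.

```python
FPS, DURATION_S, ORBITS = 30, 10, 2          # 300 frames, two orbits in 10 s
N_FRAMES = FPS * DURATION_S

def camera_azimuth(frame):
    """Azimuth of the orbiting camera (degrees) at a given frame index.
    Negative increments correspond to a clockwise orbit around the
    VV's initial origin."""
    return -(360.0 * ORBITS) * frame / N_FRAMES

def encode_cmd(frame_pattern, out_mp4):
    """ffmpeg command encoding rendered frames with x264 at CRF 15,
    chosen in the paper to keep the compression perceptually lossless."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(FPS),
        "-i", frame_pattern,       # e.g. render_%04d.png from Blender
        "-c:v", "libx264",
        "-crf", "15",
        "-pix_fmt", "yuv420p",     # assumption: added for player compatibility
        out_mp4,
    ]

print(camera_azimuth(150))  # halfway point: one full orbit, i.e. -360.0
```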
(a) AxeGuy (b) LubnaFriends (c) Rafa2 (d) Matis
Fig. 3. Comparing textured mesh (Draco) and point cloud (G-PCC, V-PCC) compression results: MOS vs bitrate plots for the VVs in the V-SENSE database.
The horizontal lines indicate MOS for reference VVs for each content: solid red line for textured mesh and dashed blue line for point cloud.
B. Experiment Setup
As mentioned in the previous subsection, to reduce inter-
subject variation, a passive approach was chosen for viewer
interaction with the VV. A rendered version of each VV was
shown to the participants on a 24” LCD display. The distance
between the viewers and the screen was set to three times the stimulus height, which was around 1 metre. The experiment
was conducted in a dark room as recommended by ITU [26].
C. Participants & Procedure
For this test, a call for participation was announced through
e-mail lists within the university. In total, 23 participants, 17
male and 6 female, volunteered to participate in the study. The
average age of the participants was 30.9 (Std = 4.14).
The participants were first trained on ‘Rafa’ sequence [14]
(different from ‘Rafa2’ used in this paper) with different
V-PCC compression levels. Afterwards, only compressed examples from different bitrate levels were shown to the participants without providing specific instructions, to familiarise
them while avoiding any bias.
Considering that both textured meshes and point clouds
were present in the experiment, we wanted to conduct the
test using a single stimulus methodology in order to avoid
affecting the users by showing an explicit “reference” stimulus.
To minimise the time required for the experiment, the absolute
category rating (ACR) with hidden reference was selected as
the test methodology [26]. References for both meshes and
point clouds were shown separately (see Fig. 3). Each stimulus
was 10 seconds long, and participants voted for each stimulus
in 1.5 sec on average. The experiment was conducted in two
sessions, and each session was kept under 30 minutes to avoid
participant fatigue. In order to get consistent results for the
comparison between meshes and point clouds, the first session
consisted of the V-SENSE database with both representations,
and the second session consisted of 8i dataset with point
clouds. To be able to merge the subjective scores afterwards,
a common set was present in both sessions. That is, 44 videos
in total were present in both sessions, including both textured
meshes and coloured point clouds to avoid bias.
IV. RESULTS AND DISCUSSION

In this section, we first calculate the mean opinion score (MOS) for each stimulus, and then we compare textured meshes with point clouds. We also compare different state-of-the-art compression methods, including a quantitative analysis based on subjective quality scores.
A. Calculation of MOS
Although the participants were shown sample VVs in the
training part prior to the experiment, we noticed that each
participant had a different understanding of the scale during
rating. To reduce the subject variability and keep the overall
mean, we follow the procedure described by Athar et al. [27].
According to this, the raw opinion scores, s_ij, for each subject i and stimulus j, were used to calculate the z-score as follows:

    z_ij = (s_ij − μ_i) / σ_i                        (1)

where μ_i and σ_i are the mean and standard deviation of all the opinion scores of the subject i. The MOS was then found as follows:

    MOS_j = (σ_rmos / σ_zmos) · MOS^z_j + μ_rmos     (2)

where MOS^z_j is the mean of the z-scores for the stimulus j, and μ and σ are the mean and standard deviation of either the raw opinion scores (rmos) or the calculated z-scores (zmos). In the remainder of this paper, we use the MOS values as computed in (2).
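The two-step normalisation in Eqs. (1)-(2) can be sketched in a few lines of NumPy. This assumes a complete score matrix with one row per subject, and reads μ_rmos/σ_rmos and σ_zmos as the statistics of the per-stimulus raw MOS and z-score MOS respectively, which is one plausible reading of Eq. (2).

```python
import numpy as np

def compute_mos(raw):
    """Rescaled MOS per stimulus, following Eqs. (1)-(2).

    raw: (n_subjects, n_stimuli) matrix of opinion scores.
    """
    mu_i = raw.mean(axis=1, keepdims=True)            # per-subject mean
    sigma_i = raw.std(axis=1, ddof=1, keepdims=True)  # per-subject std
    z = (raw - mu_i) / sigma_i                        # Eq. (1)

    mos_z = z.mean(axis=0)          # mean z-score per stimulus (MOS^z_j)
    mos_raw = raw.mean(axis=0)      # raw per-stimulus MOS
    # Eq. (2): rescale the z-based MOS back to the raw-score range,
    # preserving the overall mean and spread of the raw MOS.
    return mos_raw.std(ddof=1) / mos_z.std(ddof=1) * mos_z + mos_raw.mean()

scores = np.array([[1.0, 3.0, 5.0],
                   [2.0, 3.0, 4.0],
                   [1.0, 4.0, 5.0]])   # 3 subjects x 3 stimuli (toy data)
mos = compute_mos(scores)
# By construction, the overall mean of the raw scores is preserved.
```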
B. Comparing Mesh with Point Clouds
After calculating the MOS values for all the stimuli, we plot
the MOS vs bitrate figures and compare the performances of
the two VV representation methods. We must note that this comparison (see Fig. 3) has only been done for the four VVs of the V-SENSE dataset, since only these are available in both textured mesh and coloured point cloud formats.
It can be seen from the plots that the textured mesh
has a higher maximum MOS value compared to the point
cloud representation. On the other hand, point clouds seem
to be a better choice for the scenarios where the transmission
bandwidth and storage capacity are limited. In fact, textured
meshes seem to be the best choice for a no-compression case;
however, in most cases that the VV is compressed, point clouds
seem to have a good balance between the perceived quality
and the bitrate. The reduction in MOS for the point cloud case can be explained by a change in perceptual resolution. Since point clouds have to sample the texture of the model, this

(a) AxeGuy (b) LubnaFriends (c) Rafa2 (d) Matis
(e) Longdress (f) Loot (g) Redandblack (h) Soldier
Fig. 4. Comparing different point cloud (G-PCC, V-PCC) compression results: MOS vs bitrate plots for all the VVs considered in this study. The horizontal dashed blue line indicates the MOS for the reference point cloud VV for each content.

might reduce the apparent resolution, depending on the sampling density and the point sizes.
It can also be seen that two of the VVs (‘LubnaFriends’
and ‘Matis’) were found to be of lower quality than the other
two VVs in the V-SENSE database. It has been noted by
the providers that ‘LubnaFriends’ and ‘Matis’ sequences were
generated with an earlier version of their technology compared
to ‘AxeGuy’ and ‘Rafa2’. The reason for this difference is
related to the reconstruction technology. Nevertheless, we
believe that including these VVs in the dataset brings value as
they have different characteristics and limitations.
C. Comparing Point Cloud Compression Methods
In addition to the comparison between meshes and point
clouds, we also compare the state-of-the-art point cloud compression methods using our database. To help readers see the
results for the four VVs mentioned in the previous subsection,
we plot the figures again without the “Draco+JPEG” curve.
The resulting plots for all VVs are shown in Fig. 4. Examining
the plots, we can observe that “G-PCC (RAHT)” is not
better than V-PCC in any case. This shows that the V-PCC
compression, which is based on video coding of projected
point clouds, yields better results with a smaller bitrate than
G-PCC, which is based on octree coding of points in 3D space.
As can be expected, the “Random Access” option of V-PCC
is more effective than the “All Intra” case.
Similar to the discussion in the previous subsection, the
difference between maximum MOS values for each content
is related to reconstruction errors. 8i content was captured
in a more professional studio setting with more cameras.
Additionally, another reason for this MOS difference can be the difference in point counts, i.e., resolution. The point clouds provided with the 8i database had around 800,000 points, while the ones from the V-SENSE dataset had around 400,000 points.

TABLE IV. BD-MOS gains of the point cloud coding schemes over "Draco+JPEG".
Sequence     | G-PCC (RAHT) | V-PCC (AI) | V-PCC (RA)
AxeGuy       | 22.72        | 54.86      | 73.20
LubnaFriends | 17.23        | 69.54      | 138.88
Rafa2        | 47.32        | 112.71     | 176.92
Matis        | 22.92        | 67.00      | 75.08
Average      | 27.55        | 76.03      | 115.99

This might also have affected the visual quality, as the
resolution affects the appearance significantly.
D. Subjective BD-MOS Analysis
In this section, we quantitatively evaluate the performance
of the various mesh and point cloud compression schemes.
Defined similarly to the Bjontegaard Delta PSNR (BD-PSNR) [28], the Bjontegaard Delta MOS (BD-MOS) was developed in [29] for measuring the quality difference between different algorithms at the same bit rate using MOS values.
As demonstrated in [29], the BD-MOS can lead to better
results since it can fully take the saturation effect of the human
visual system and the full nature of the artefacts into account.
Therefore, we adopt BD-MOS for an additional subjective
quality comparison and analysis. Table IV summarises BD-
MOS gains of the various point cloud coding approaches over
the mesh coding scheme, i.e., “Draco+JPEG scheme”. As can
be observed, "V-PCC (Random Access)" performs the best, achieving MOS gains ranging from 73.2 to 176.9.
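BD-MOS can be computed like BD-PSNR with MOS substituted for PSNR: fit a low-order polynomial to each codec's MOS-versus-log-bitrate curve and average the gap over the overlapping bitrate range. The sketch below follows the standard Bjontegaard procedure; the exact fitting choices behind Tables IV-V are not specified in the paper, so the rate/MOS numbers here are purely illustrative.

```python
import numpy as np

def bd_mos(rate_anchor, mos_anchor, rate_test, mos_test):
    """Average MOS difference (test minus anchor) over the common bitrate range.

    Each curve is fitted with a cubic polynomial in log10(bitrate), as in
    the classic Bjontegaard delta computation, then integrated analytically.
    """
    la, lt = np.log10(rate_anchor), np.log10(rate_test)
    p_a = np.polyfit(la, mos_anchor, 3)
    p_t = np.polyfit(lt, mos_test, 3)
    # Overlapping log-bitrate interval of the two rate-distortion curves.
    lo, hi = max(la.min(), lt.min()), min(la.max(), lt.max())
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return avg_t - avg_a

# A curve uniformly 10 MOS points above the anchor yields a BD-MOS of ~10.
rates = np.array([2.0, 5.0, 12.0, 30.0])   # Mbps, illustrative values
anchor = np.array([1.5, 2.5, 3.4, 4.2])
delta = bd_mos(rates, anchor, rates, anchor + 10.0)
```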
Table V gives the BD-MOS gains among the point cloud compression schemes, in which "G-PCC (RAHT)" is used as the anchor.

TABLE V. BD-MOS gains of the V-PCC schemes over the "G-PCC (RAHT)" anchor.
Sequence     | V-PCC (AI) | V-PCC (RA)
AxeGuy       | 17.81      | 36.12
LubnaFriends | 18.11      | 30.75
Rafa2        | 28.97      | 42.45
Matis        | 14.35      | 18.53
Longdress    | 22.35      | 30.50
Loot         | 29.60      | 39.19
Redandblack  | 22.75      | 30.19
Soldier      | 25.47      | 44.64
Average      | 22.43      | 34.05

It can also be found that "V-PCC (Random Access)",
which exploits the temporal redundancy during coding yields
the best subjective quality, and achieves a BD-MOS gain of
34.05 on average for all the test sequences.
V. CONCLUSION

In this paper, we report building a subjectively annotated quality database for volumetric video (VV) in order to compare two different representations for VV considering the compression scenario: textured meshes and point clouds.
Moreover, we compared different state-of-the-art point cloud
and mesh compression methods with objective and subjective
experiments. It can be inferred from the results that textured meshes provide the best visual quality and are more advantageous for high-bandwidth applications (e.g., over 50 Mbps), whereas point clouds are well suited for applications with limited bandwidth (e.g., below 20 Mbps).
We can also say that “V-PCC (Random Access)” was found
to be more effective than all the other compression schemes
considered for most of the cases. One must note that the constructed database contains only human bodies, and the findings of this study are therefore only valid for such content.
As future work, we aim to utilise this dataset with a better
perceptually suited quality assessment method, and we also
aim to complement this study with a comprehensive evaluation
and benchmark of the objective quality metrics for VV. The
created database will be made publicly available and can be used for further scientific studies in VV compression or quality assessment.

REFERENCES
[1] D. S. Alexiadis, D. Zarpalas, and P. Daras, “Real-time, full 3-D
reconstruction of moving foreground objects from multiple consumer
depth cameras,” IEEE Trans. Multimedia, vol. 15, no. 2, Feb 2013.
[2] A. Collet, M. Chuang, P. Sweeney, D. Gillett, D. Evseev, D. Calabrese,
H. Hoppe, A. Kirk, and S. Sullivan, “High-quality streamable free-viewpoint video,” ACM Trans. Graphics, vol. 34, no. 4, Jul. 2015.
[3] R. Pagés, K. Amplianitis, D. Monaghan, J. Ondřej, and A. Smolić, “Affordable content creation for free-viewpoint video and VR/AR applications,” J. Visual Commun. Image Represent., vol. 53, 2018.
[4] “8i,”, accessed: 2020-01-25.
[5] O. Schreer, I. Feldmann, S. Renault, M. Zepp, M. Worchel, P. Eisert,
and P. Kauff, “Capture and 3D video processing of volumetric video,” in
IEEE International Conference on Image Processing (ICIP), Sep. 2019.
[6] R. Mekuria, K. Blom, and P. Cesar, “Design, implementation, and evaluation of a point cloud codec for tele-immersive video,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 4, pp. 828–842, 2016.
[7] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso, “Subjective and objective quality evaluation of compressed point clouds,” in IEEE International Workshop on Multimedia Signal Processing (MMSP), 2017.
[8] S. Perry, A. Pinheiro, E. Dumic, C. da Silva, and A. Luis, “Study of
subjective and objective quality evaluation of 3D point cloud data by
the JPEG committee,” in IS&T Electronic Imaging, Image Quality and
System Performance XVI, 2019, pp. 312–1–312–6.
[9] S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, P. A. Chou, R. A. Cohen, M. Krivokuća, S. Lasserre, Z. Li et al., “Emerging MPEG standards for point cloud compression,” IEEE Trans. Emerg. Sel. Topics Circuits Syst., vol. 9, no. 1, pp. 133–148, Mar 2019.
[10] R. Mekuria, Z. Li, C. Tulvan, and P. Chou, “Evaluation criteria for
PCC (Point Cloud Compression),” ISO/IEC JTC 1/SC29/WG11 Doc.
N16332, 2016.
[11] D. Tian, H. Ochimizu, C. Feng, R. Cohen, and A. Vetro, “Geometric
distortion metrics for point cloud compression,” in IEEE International
Conference on Image Processing (ICIP), Sept 2017, pp. 3460–3464.
[12] H. Su, Z. Duanmu, W. Liu, Q. Liu, and Z. Wang, “Perceptual quality
assessment of 3D point clouds,” in IEEE International Conference on
Image Processing (ICIP). IEEE, 2019, pp. 3182–3186.
[13] E. Alexiou, I. Viola, T. M. Borges, T. A. Fonseca, R. L. de Queiroz, and
T. Ebrahimi, “A comprehensive study of the rate-distortion performance
in MPEG point cloud compression,” APSIPA Transactions on Signal and
Information Processing, vol. 8, 2019.
[14] E. Zerman, P. Gao, C. Ozcinar, and A. Smolic, “Subjective and objective quality assessment for volumetric video compression,” in IS&T Electronic Imaging, Image Quality and System Performance XVI, 2019.
[15] M. Gonçalves, L. Agostini, D. Palomino, M. Porto, and G. Correa, “Encoding efficiency and computational cost assessment of state-of-the-art point cloud codecs,” in IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 3726–3730.
[16] E. d’Eon, B. Harrison, T. Myers, and P. A. Chou, “8i voxelized full bodies, version 2 – A voxelized point cloud dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m40059/M74006, Jan 2017, Geneva, Switzerland.
[17] “Google Draco,”, accessed: 2020-01-25.
[18] T. Ebner, I. Feldmann, O. Schreer, P. Kauff, and T. Unger, “HHI point
cloud dataset of a boxing trainer,” ISO/IEC JTC1/SC29 Joint WG11
(MPEG) input document m42921, Jul 2018, Ljubljana, Slovenia.
[19] A. Maglo, G. Lavoué, F. Dupont, and C. Hudelot, “3D mesh compression: Survey, comparisons, and emerging trends,” ACM Computing Surveys (CSUR), vol. 47, no. 3, 2015.
[20] J. Guo, V. Vidal, I. Cheng, A. Basu, A. Baskurt, and G. Lavoué, “Subjective and objective visual quality assessment of textured 3D meshes,” ACM Trans. Appl. Percept. (TAP), vol. 14, no. 2, 2017.
[21] K. Vanhoey, B. Sauvage, P. Kraemer, and G. Lavoué, “Visual quality assessment of 3D models: on the influence of light-material interaction,” ACM Trans. Appl. Percept. (TAP), vol. 15, no. 1, 2017.
[22] A. Doumanoglou, P. Drakoulis, N. Zioulis, D. Zarpalas, and P. Daras, “Benchmarking open-source static 3D mesh codecs for immersive media interactive live streaming,” IEEE Trans. Emerg. Sel. Topics Circuits Syst., vol. 9, no. 1, pp. 190–203, March 2019.
[23] K. Christaki, E. Christakis, P. Drakoulis, A. Doumanoglou, N. Zioulis, D. Zarpalas, and P. Daras, “Subjective visual quality assessment of immersive 3D media compressed by open-source static 3D mesh codecs,” in 25th Int. Conf. on MultiMedia Modeling (MMM), 2019.
[24] E. Alexiou, M. V. Bernardo, L. A. da Silva Cruz, L. G. Dmitrovic,
C. Duarte, E. Dumic, T. Ebrahimi, D. Matkovic, M. Pereira, A. Pinheiro,
and A. Skodras, “Point cloud subjective evaluation methodology based
on 2D rendering,” in 10th International Conference on Quality of
Multimedia Experience (QoMEX), 2018.
[25] S. Schwarz, P. A. Chou, and M. Budagavi, “Common test conditions for
point cloud compression,” ISO/IEC JTC1/SC29 Joint WG11 (MPEG)
input document N17345, Jan 2018, Gwangju, Korea.
[26] ITU-T, “Subjective video quality assessment methods for multimedia
applications,” ITU-T Recommendation P.910, Apr 2008.
[27] S. Athar, T. Costa, K. Zeng, and Z. Wang, “Perceptual quality assessment
of UHD-HDR-WCG videos,” in IEEE International Conference on
Image Processing (ICIP). IEEE, 2019, pp. 1740–1744.
[28] G. Bjontegaard, “Improvements of the BD-PSNR model,” in ITU-T
SG16/Q6, 35th VCEG Meeting, Berlin, Germany, July, 2008.
[29] P. Hanhart, M. Rerabek, F. De Simone, and T. Ebrahimi, “Subjective quality evaluation of the upcoming HEVC video compression standard,” in Applications of Digital Image Processing XXXV. SPIE, 2012.
... A polygon mesh consists of edges, vertices, and surfaces. Point clouds are preferred over meshes as they are easy to capture, store, and transmit and do not need connectivity information [1,6,7]. ...
... There are several works on the visual quality assessment of point clouds reported in the literature, as also listed in Table 1 [1–3, 6–18]. These studies can be classified into two categories, among other criteria. The first type of study evaluates geometrical degradations [1, 2, 8, 10–12], while the second type focuses on assessing the impact of compression algorithms [3, 6, 7, 13–17]. A more comprehensive overview of subjective studies for point cloud contents can be found in [19]. ...
... cloud [11,34], voxel [33], mesh [34], and even neural representations [23]. Volumetric video is envisioned as a fundamental service that is able to facilitate various new applications such as extended reality (XR) and the Metaverse, empowering entertainment [20], healthcare [7], and education [2]. ...
Volumetric video has emerged as an attractive new video paradigm in recent years, since it provides an immersive and interactive 3D viewing experience with six degrees of freedom (DoF). Unlike traditional 2D or panoramic videos, volumetric videos require dense point clouds, voxels, meshes, or huge neural models to depict volumetric scenes, which results in a prohibitively high bandwidth burden for video delivery. User behavior analysis, especially viewport and gaze analysis, therefore plays a significant role in prioritizing the content streaming within users' viewports and degrading the remaining content, to maximize user QoE under limited bandwidth. Although understanding user behavior is crucial, to the best of our knowledge there are no available 3D volumetric video viewing datasets containing fine-grained user interactivity features, let alone further analysis and behavior prediction. In this paper, we release, for the first time, a volumetric video viewing behavior dataset with a large scale, multiple dimensions, and diverse conditions. We conduct an in-depth analysis to understand user behavior when viewing volumetric videos. Interesting findings on user viewport, gaze, and motion preferences related to different videos and users are revealed. We finally design a transformer-based viewport prediction model that fuses gaze and motion features and achieves high accuracy under various conditions. Our prediction model is expected to further benefit volumetric video streaming optimization. Our dataset, along with the corresponding visualization tools, is accessible at
The offshore industry depends largely on the use of indirect methods, such as conventional seismics, complemented by well-scale test data and cores. These data are typically of varying quality and scale, creating an issue of non-uniqueness in the description of subsurface reservoirs used for, e.g., energy and water production. The inclusion of onshore data from geological outcrops can increase the accuracy of 3D digital models used for subsurface characterization, as onshore observations can be made from micron to kilometre scale, constraining patterns of spatial heterogeneity within geological features. We present a workflow for collecting, processing and combining geological and geophysical data surveyed at the Rørdal Chalk Quarry (Jutland, Denmark), an onshore analogue to the Maastrichtian section of the Danish North Sea. Detailed observations from these datasets can be used for identifying structural features and trends that are too small to be resolved by large-scale continuous geophysical datasets. We focus on digital outcrop models (DOMs), ground penetrating radar (GPR) and shallow seismic. The Rørdal quarry is excavated at a current rate of approximately 20–30 metres per year. The production thus continuously reveals new sections of strata. We take advantage of this unique possibility by repeatedly surveying the naturally fractured chalks using the applied methods. Strategic data collection has enabled tracing of structural elements in a three-dimensional domain. The construction of 3D volumes creates a solid foundation for a conceptual geological model as well as for static modelling of the quarry, which would not be obtainable working with individual datasets. As different geological features appear differently in the geological and geophysical data, their lateral extent and orientations can be evaluated in greater detail. The DOMs allow for structural interpretation and analysis in the digital space.
We have established a statistically unique dataset containing 27,400 digital fracture interpretations at cm scale. This would not be possible to collect manually in the field. This extensive dataset offers two key strengths to alleviate typical uncertainties in outcrop studies: the dataset is large, and it is taken from an area of near-continuous exposure, providing a population of observations that we believe to be both statistically significant and representative of natural variability in the system.
Keywords: Chalk; Fractures; GPR; Photogrammetry; Outcrop analogues
Recent trends in multimedia technologies indicate the need for richer imaging modalities to increase user engagement with the content. Among other alternatives, point clouds denote a viable solution that offers an immersive content representation, as witnessed by current activities in the JPEG and MPEG standardization committees. As a result of such efforts, MPEG is at the final stages of drafting an emerging standard for point cloud compression, which we consider the state of the art. In this study, the entire set of encoders developed in the MPEG committee is assessed through an extensive and rigorous quality analysis. We initially focus on the assessment of encoding configurations that have been defined by experts in MPEG for their core experiments. Then, two additional experiments are designed and carried out to address some of the identified limitations of the current approach. As part of the study, state-of-the-art objective quality metrics are benchmarked to assess their capability to predict the visual quality of point clouds under a wide range of radically different compression artifacts. To carry out the subjective evaluation experiments, a web-based renderer was developed and is described. The subjective and objective quality scores, along with the rendering software, are made publicly available to facilitate and promote research in the field.
High Dynamic Range (HDR) Wide Color Gamut (WCG) Ultra High Definition (4K/UHD) content has become increasingly popular recently. Due to the increased data rate, novel video compression methods have been developed to maintain the quality of the videos being delivered to consumers under bandwidth constraints. This has led to new challenges for the development of objective Video Quality Assessment (VQA) models, which are traditionally designed without sufficient calibration and validation based on subjective quality assessment of UHD-HDR-WCG videos. The large performance variations between different consumer HDR TVs, and between consumer HDR TVs and the professional HDR reference displays used for content production, further complicate the task of acquiring reliable subjective data that faithfully reflects the impact of compression on UHD-HDR-WCG videos. In this work, we construct a first-of-its-kind video database composed of PQ-encoded UHD-HDR-WCG content, which is subsequently compressed by H.264 and HEVC encoders. We carry out a subjective study on a professional 4K-HDR reference display in a controlled lab environment. We also benchmark representative Full Reference (FR) and No-Reference (NR) objective VQA models against the subjective data to evaluate their performance on compressed UHD-HDR-WCG video content. The database will be made available to the public, subject to content copyright constraints. (Among the top 10% of accepted papers in ICIP 2019.)
Volumetric video is becoming easier to capture and display thanks to recent technical developments in acquisition and display technologies. Using point clouds is a popular way to represent volumetric video for augmented or virtual reality applications. This representation, however, requires a large number of points to achieve a high quality of experience and needs compression before storage and transmission. In this paper, we study subjective and objective quality assessment results for volumetric video compression, using a state-of-the-art compression algorithm: MPEG Point Cloud Compression Test Model Category 2 (TMC2). We conduct subjective experiments to find the perceptual impact of different quantization parameters and point counts on compressed volumetric video. Additionally, we examine the relationship between state-of-the-art objective quality metrics and the acquired subjective quality assessment results. To the best of our knowledge, this study is the first to consider TMC2 compression for volumetric video represented as coloured point clouds and to study its effects on perceived quality. The results show that the effect of input point count on TMC2 compression is not meaningful, and that some geometry distortion metrics disagree with the perceived quality. The developed database is publicly available to promote the study of volumetric video compression.
The SC29/WG1 (JPEG) Committee within ISO/IEC is currently working on developing standards for the storage, compression and transmission of 3D point cloud information. To support the creation of these standards, the committee has created a database of 3D point clouds representing various quality levels and use-cases and examined a range of 2D and 3D objective quality measures. The examined quality measures are correlated with subjective judgments for a number of compression levels. In this paper we describe the database created, tests performed and key observations on the problems of 3D point cloud quality assessment.
This work provides a systematic understanding of the requirements of live 3D mesh coding, targeting (tele-)immersive media streaming applications. We thoroughly benchmark, in rate-distortion and runtime performance terms, four static 3D mesh coding solutions that are openly available. Apart from mesh geometry and connectivity, our analysis includes experiments on compressing vertex normals and attributes, something scarcely found in the literature. Additionally, we provide a theoretical model of the tele-immersion pipeline that calculates its expected frame rate, as well as lower and upper bounds for its end-to-end latency. To obtain these measures, the theoretical model takes into account the compression performance of the codecs and some indicative network conditions. Based on the results obtained through our codec benchmarking, we used the theoretical model to calculate concrete values for these tele-immersion pipeline metrics and discuss the optimal codec choice depending on the network setup. This offers deep insight into the available solutions and paves the way for future research.
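The kind of pipeline budget such a theoretical model formalises can be sketched with back-of-the-envelope arithmetic. The stage times, frame size, and bandwidth below are invented assumptions for illustration, not figures or formulas from the benchmarked work:

```python
# Hypothetical per-frame budget for a tele-immersion pipeline.
encode_ms = 12.0        # assumed mesh encoding time per frame
decode_ms = 6.0         # assumed decoding time per frame
frame_bits = 2_000_000  # assumed compressed size of one mesh frame
bandwidth_bps = 50e6    # assumed network throughput

# Serialisation delay of one frame on the wire.
transmit_ms = frame_bits / bandwidth_bps * 1000.0

# Lower bound on end-to-end latency: all stages executed back to back.
latency_lower_ms = encode_ms + transmit_ms + decode_ms

# With the stages pipelined, throughput is limited by the slowest stage.
bottleneck_ms = max(encode_ms, transmit_ms, decode_ms)
fps_upper = 1000.0 / bottleneck_ms

print(f"latency >= {latency_lower_ms:.1f} ms, fps <= {fps_upper:.1f}")
```

With these numbers the network is the bottleneck, which is why codec choice (smaller `frame_bits` at higher encode cost) shifts both bounds.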
Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and presenting it to users in an immersive fashion has never been higher. Distributing such representations enables users to freely navigate multi-sensory 3D media experiences. Unfortunately, these representations require a large amount of data, which is not feasible to transmit on today's networks. Efficient compression technologies, well adopted in the content chain, are in high demand and are key to democratizing augmented and virtual reality applications. The Moving Picture Experts Group (MPEG), as one of the main standardization groups dealing with multimedia, identified the trend and recently started the process of building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.
Point clouds are one of the most promising technologies for 3D content representation. In this paper, we describe a study on the quality assessment of point clouds degraded by octree-based compression at different levels. The test contents were displayed using Screened Poisson surface reconstruction, without any textural information, and were rated by subjects passively, using a 2D image sequence. Subjective evaluations were performed in five independent laboratories in different countries, with an inter-laboratory correlation analysis showing no statistical differences despite the different equipment employed. Benchmarking results reveal that state-of-the-art point cloud objective metrics are not able to accurately predict the expected visual quality of such test contents. Moreover, the subjective scores collected from this experiment were found to be poorly correlated with subjective scores obtained from another test involving visualization of raw point clouds. These results suggest the need for further investigation of adequate point cloud representations and objective quality assessment tools.
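One of the simplest objective metrics of the kind benchmarked in such studies is the one-way point-to-point ("D1") geometry error: the mean squared distance from each degraded point to its nearest reference point, often converted to a PSNR. The sketch below is a brute-force illustration, not any tool's actual implementation; real evaluators use k-d trees and a standardised peak value (e.g. the content's intrinsic resolution), whereas the peak here is an arbitrary assumption.

```python
import math

def point_to_point_mse(reference, degraded):
    """One-way point-to-point (D1-style) geometry error: mean squared
    distance from each degraded point to its nearest reference point."""
    total = 0.0
    for p in degraded:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q))
                     for q in reference)
    return total / len(degraded)

# Toy example: one point of the degraded cloud moved by 0.3 along x.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
deg = [(0.0, 0.0, 0.0), (1.3, 0.0, 0.0), (0.0, 1.0, 0.0)]

mse = point_to_point_mse(ref, deg)

# PSNR relative to an assumed signal peak (here: an arbitrary value).
peak = math.sqrt(2.0)
psnr = 10.0 * math.log10(peak ** 2 / mse)
print(round(mse, 4), round(psnr, 2))
```

A symmetric variant evaluates the error in both directions and keeps the worse of the two, which penalises missing regions as well as displaced points.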