Article

Abstract

Point clouds have become increasingly prevalent in representing 3D scenes within virtual environments, alongside 3D meshes. Their ease of capture has facilitated a wide array of applications on mobile devices, from smartphones to autonomous vehicles. Notably, point cloud compression has reached an advanced stage and has been standardized. However, the availability of quality assessment datasets, which are essential for developing improved objective quality metrics, remains limited. In this paper, we introduce BASICS, a large-scale quality assessment dataset tailored for static point clouds. The BASICS dataset comprises 75 unique point clouds, each compressed with four different algorithms including a learning-based method, resulting in the evaluation of nearly 1500 point clouds by 3500 unique participants. Furthermore, we conduct a comprehensive analysis of the gathered data, benchmark existing point cloud quality assessment metrics and identify their limitations. By publicly releasing the BASICS dataset, we lay the foundation for addressing these limitations and fostering the development of more precise quality metrics.


... The initial metrics considered for this study were the point-to-point (PSNR MSE D1) [8], point-to-plane (PSNR MSE D2) [8], point-to-attribute (PSNR MSE YUV) [9], Point Cloud Structural Similarity (PointSSIM) [10], Point Cloud Quality Metric (PCQM) [11], and Graph Similarity (GraphSIM) [12], along with its multiscale extension, Multiscale Graph Similarity (MS-GraphSIM) [13]. In this work, image metrics were not considered, although they can be applied to point cloud objective quality evaluation [4], [14], [15], [9], [16], [17], [18]. However, these metrics depend on the visualization directions of the point clouds, leading to some instability. ...
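The D1 and D2 measures named above share a simple geometric core: each degraded point is matched to its nearest reference neighbor, and the error vector is taken either directly (point-to-point) or projected onto the reference normal (point-to-plane). Below is a minimal numpy/scipy sketch of that computation; the PSNR peak value is an assumption here (MPEG tools derive it from the content, e.g., the voxel-grid precision), and a symmetric variant would evaluate both directions and keep the worse MSE.

```python
# Hedged sketch of the D1 (point-to-point) and D2 (point-to-plane) errors.
# Assumes per-point normals are available for the reference cloud; `peak`
# is a placeholder for the content-dependent peak value used in the PSNR.
import numpy as np
from scipy.spatial import cKDTree

def d1_d2_mse(deg, ref, ref_normals):
    """One-way MSEs from the degraded cloud to the reference cloud."""
    _, idx = cKDTree(ref).query(deg)                 # nearest reference point
    err = deg - ref[idx]                             # per-point error vectors
    d1 = np.mean(np.sum(err ** 2, axis=1))           # point-to-point MSE
    d2 = np.mean(np.sum(err * ref_normals[idx], axis=1) ** 2)  # normal projection
    return d1, d2

def psnr(mse, peak):
    return 10.0 * np.log10(peak ** 2 / mse)
```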
... However, these metrics depend on the visualization directions of the point clouds, leading to some instability. Furthermore, some recent works [14], [4] seem to indicate that the most recent point cloud metrics tend to provide better performance. ...
... The Broad Quality Assessment of Static Point Clouds in Compression Scenario (BASICS) training dataset [14] is used to analyze the contribution of each metric's features. The main target of this study is to assess the quality of colored static point cloud coding solutions. ...
Preprint
Full-text available
Full-reference point cloud objective metrics currently provide very accurate representations of perceptual quality. These metrics are usually composed of a set of features that are combined in some way to produce a final quality value. In this study, the features of the best-performing metrics are analyzed. To that end, different objective quality metrics are compared with one another, and the differences in their quality representations are studied. This led to the selection of the set of metrics used in this study, namely the point-to-plane, point-to-attribute, Point Cloud Structural Similarity, Point Cloud Quality Metric and Multiscale Graph Similarity. The features defined in those metrics are examined based on their contribution to the objective estimate using recursive feature elimination. Both support vector regression and ridge regression were employed as estimators for the recursive feature elimination algorithm. For this study, the Broad Quality Assessment of Static Point Clouds in Compression Scenario database was used for both training and validation of the models. Based on the recursive feature elimination, several features were selected and then combined using the same regression method used to select them. The best combination models were then evaluated across five different publicly available subjective quality assessment datasets, targeting different point cloud characteristics and distortions. It was concluded that a combination of features selected from the Point Cloud Quality Metric, Multiscale Graph Similarity and PSNR MSE D2, combined with ridge regression, results in the best performance. This model leads to the definition of the Feature Selection Model.
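As a rough illustration of the pipeline this abstract describes, the sketch below runs recursive feature elimination over a matrix of per-metric features with both estimators mentioned (ridge regression and a linear-kernel SVR). The feature matrix and MOS vector are random placeholders standing in for features extracted on the BASICS training split.

```python
# Sketch of recursive feature elimination over candidate metric features.
# X and y are placeholders; in the study they would hold per-cloud features
# (e.g., PCQM or MS-GraphSIM sub-features) and the corresponding MOS.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((800, 20))            # 800 clouds x 20 candidate features
y = 1.0 + 4.0 * rng.random(800)      # placeholder MOS in [1, 5]

for estimator in (Ridge(alpha=1.0), SVR(kernel="linear")):
    rfe = RFE(estimator, n_features_to_select=8).fit(X, y)
    kept = np.flatnonzero(rfe.support_)   # indices of surviving features
    print(type(estimator).__name__, kept)
```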
... For example, the scores provided by different candidates, or even the same candidate at different periods, for the same point clouds are typically different. Third, existing publicly available PCQA datasets with MOS include both geometry and color distortions, such as the coloured point cloud database (CPCD) 2.0 [18], the SJTU point cloud quality assessment database (SJTU-PCQA) [13], the SIAT point cloud quality database (SIAT-PCQD) [12], the Waterloo point cloud database (WPC) [19], the large-scale point cloud quality assessment database (LS-PCQA) [20] and the broad quality assessment of static point clouds in compression scenario (BASICS) [21]. Moreover, the largest existing colorless geometry point cloud dataset (G-PCD) [22] has only 5 reference point clouds and 40 distorted instances, which is not sufficient to derive learning-based GQA metrics with high generalization ability. ...
... The entire network is jointly trained for quality prediction. Tao et al. [47] projected 3D point clouds into 2D color projection maps and geometric projection maps and designed a multi-scale feature fusion network to blindly evaluate the visual quality. However, the selection of projection directions may significantly influence the overall assessment performance [30,20]. [Interleaved table fragment; recoverable rows (dataset, references, distortion types, distorted samples): …, Octree, downsampling, color and geometry noise, 420; WPC [19], 20, Gaussian noise, downsampling, G-PCC, V-PCC, 740; LS-PCQA [20], 104, G-PCC, V-PCC, color and geometry noise, 22,568; BASICS [21], 75, V-PCC, G-PCC-Predlift, G-PCC-Raht, GeoCNN, 1494.] ...
... Colored PCQA datasets, such as CPCD 2.0 [18], SJTU-PCQA [13], SIAT-PCQD [12], WPC [19], LS-PCQA [20] and BASICS [21], are built for general-purpose PCQA methods. These datasets contain numerous forms of colored point clouds (e.g., cultural heritage, computer-generated objects, and human figures) that are degraded by diverse geometric and color distortions [20]. ...
Preprint
Full-text available
Geometry quality assessment (GQA) of colorless point clouds is crucial for evaluating the performance of emerging point cloud-based solutions (e.g., watermarking, compression, and 3-Dimensional (3D) reconstruction). Unfortunately, existing objective GQA approaches are traditional full-reference metrics, whereas state-of-the-art learning-based point cloud quality assessment (PCQA) methods target both color and geometry distortions, neither of which is suited to the no-reference GQA task. In addition, the lack of large-scale GQA datasets with subjective scores, which are always imprecise, biased, and inconsistent, also hinders the development of learning-based GQA metrics. Driven by these limitations, this paper proposes a no-reference geometry-only quality assessment approach based on list-wise rank learning, termed LRL-GQA, which comprises a geometry quality assessment network (GQANet) and a list-wise rank learning network (LRLNet). The proposed LRL-GQA formulates no-reference GQA as a list-wise ranking problem, with the objective of directly optimizing the entire quality ordering. Specifically, a large dataset containing a variety of geometry-only distortions, named the LRL dataset, is constructed first; each sample in it is label-free but coupled with quality ranking information. Then, the GQANet is designed to capture intrinsic multi-scale patch-wise geometric features in order to predict a quality index for each point cloud. After that, the LRLNet leverages the LRL dataset and a likelihood loss to train the GQANet and ranks the input list of degraded point clouds according to their distortion levels. In addition, the pre-trained GQANet can be further fine-tuned to obtain absolute quality scores. Experimental results demonstrate the superior performance of the proposed no-reference LRL-GQA method compared with existing full-reference GQA metrics.
... 3) We validate streamPCQ-OR on the WPC5.0, BASICS [9] and M-PCCD [10] databases and compare it with existing competitive PCQA models to verify its effectiveness and robustness. ...
... To evaluate the performance of the model, we selected other competitive PCQA models, namely (A) MPED [73], (B) PSNR Y [11], (C) PointSSIM [14], (D) IW-SSIM P [15], (E) MS-GraphSIM [17], (F) GraphSIM [16], (G) PCM RR [39], (H) 3DTA [49], (I) GMS-3DQA [54] and (J) MM-PCQA [62] for performance comparison with the proposed streamPCQ-OR model on three databases: WPC5.0, BASICS [9] and M-PCCD [10]. Initially, scatter plots were created, as shown in Fig. 10, depicting the objective scores obtained by all selected models on the WPC5.0 database against MOS, along with the best-fitting logistic function. ...
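The evaluation protocol sketched in these excerpts is standard across the cited papers: objective scores are passed through a best-fitting logistic function before computing correlation with MOS. A minimal version using one common 4-parameter logistic (the exact functional form varies between papers) could look like this:

```python
# Hedged sketch of the usual PCQA evaluation step: fit a 4-parameter logistic
# mapping objective scores to MOS, then report PLCC on the mapped scores and
# SROCC on the raw scores (rank correlation is unaffected by monotone maps).
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic4(x, b1, b2, b3, b4):
    return b2 + (b1 - b2) / (1.0 + np.exp(-(x - b3) / np.abs(b4)))

def plcc_srocc(obj, mos):
    p0 = [mos.max(), mos.min(), float(np.median(obj)), float(np.std(obj)) + 1e-6]
    beta, _ = curve_fit(logistic4, obj, mos, p0=p0, maxfev=20000)
    return pearsonr(logistic4(obj, *beta), mos)[0], spearmanr(obj, mos)[0]
```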
... The results are summarized in Table V. It can be observed that streamPCQ-OR demonstrates strong competitiveness compared to other objective models, showing especially good predictive performance on the WPC5.0 and BASICS [9] databases. In addition, due to the overlap between SJTU-PCQA [36], the training set of GMS-3DQA [54], and the M-PCCD [10] database, the performance of streamPCQ-OR on the M-PCCD [10] database is slightly lower than that of GMS-3DQA [54]. ...
Preprint
No-reference bitstream-layer point cloud quality assessment (PCQA) can be deployed without full decoding at any network node to achieve real-time quality monitoring. In this work, we focus on the PCQA problem dedicated to the Octree-RAHT encoding mode. First, to address the issue that existing PCQA databases have a small scale and limited distortion levels, we establish the WPC5.0 database, the first dedicated to the Octree-RAHT encoding mode, with a scale of 400 distorted point clouds (PCs) spanning 4 geometry distortion levels multiplied by 5 attribute distortion levels. Then, we propose the first PCQA model dedicated to the Octree-RAHT encoding mode by parsing PC bitstreams without full decoding. The model introduces the texture bitrate per point (TBPP) to predict texture complexity (TC) and further derives the texture distortion factor. In addition, the geometry quantization parameter (PQS) is used to estimate the geometric distortion factor, which is then integrated into the model along with the texture distortion factor to obtain the proposed PCQA model, named streamPCQ-OR. The proposed model has been compared with other advanced PCQA methods on the WPC5.0, BASICS and M-PCCD databases, and experimental results show that our model achieves excellent performance at very low computational complexity, providing a reliable choice for time-critical applications. To facilitate subsequent research, the database and source code will be publicly released at https://github.com/qdushl/Waterloo-Point-Cloud-Database-5.0.
... We utilized the following three datasets for our experiments: 1) the broad quality assessment of static point clouds in compression scenario dataset (BASICS) [52], 2) the ICIP2020 dataset (ICIP20) [53], and 3) the Waterloo point cloud database (WPC) [54]. In these datasets, the mean opinion score (MOS) of the distorted point clouds, obtained from a subjective quality assessment, is available. ...
... • The BASICS [52] dataset includes 75 reference and 1498 distorted point clouds. The distorted point clouds undergo compression by V-PCC [5], G-PCC [5], or a deep-learning-based compression method [7] at different compression levels. ...
... The proposed method was designed for the ICIP 2023 PCVQA Grand Challenge [26], which aims to assess compression distortion introduced in the BASICS dataset [52]. To confirm the effectiveness against compression distortion on another dataset, we experimented with the ICIP20 dataset in this paper. ...
Preprint
Point clouds are a general format for representing realistic 3D objects in diverse 3D applications. Since point clouds have large data sizes, developing efficient point cloud compression methods is crucial. However, excessive compression leads to various distortions, which deteriorates the point cloud quality perceived by end users. Thus, establishing reliable point cloud quality assessment (PCQA) methods is essential as a benchmark to develop efficient compression methods. This paper presents an accurate full-reference point cloud quality assessment (FR-PCQA) method called full-reference quality assessment using support vector regression (FRSVR) for various types of degradations such as compression distortion, Gaussian noise, and down-sampling. The proposed method demonstrates accurate PCQA by integrating five FR-based metrics covering various types of errors (e.g., considering geometric distortion, color distortion, and point count) using support vector regression (SVR). Moreover, the proposed method achieves a superior trade-off between accuracy and calculation speed because it includes only the calculation of these five simple metrics and SVR, which can perform fast prediction. Experimental results with three types of open datasets show that the proposed method is more accurate than conventional FR-PCQA methods. In addition, the proposed method is faster than state-of-the-art methods that utilize complicated features such as curvature and multi-scale features. Thus, the proposed method provides excellent performance in terms of the accuracy of PCQA and processing speed. Our method is available from https://github.com/STAC-USC/FRSVR-PCQA.
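The fusion step this abstract outlines, combining a handful of simple full-reference scores with support vector regression, can be prototyped in a few lines. The sketch below uses placeholder features and MOS; the five columns stand in for the five metrics the method integrates (their exact identities and preprocessing are not reproduced here).

```python
# Hedged sketch of SVR-based metric fusion: stack a few simple full-reference
# scores per point cloud into a feature vector and regress onto MOS. The
# feature values and MOS below are random placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train = rng.random((1000, 5))         # e.g., [d1_psnr, d2_psnr, color_psnr, ...]
y_train = 1.0 + 4.0 * rng.random(1000)  # placeholder MOS

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)
scores = model.predict(rng.random((3, 5)))  # prediction itself is very fast
```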
... The subjective quality study reported by Prazeres et al. [1] (referred to as EI2022 in the remainder of the paper) is expanded by increasing the number of evaluated metrics, and conducting more in-depth benchmarking. Furthermore, results are generalized using the BASICS database [2]. Additionally, more details on the subjective quality evaluation protocol, as well as visual examples are provided. ...
... The subjective quality evaluation methodology is described, and the performance of several commonly used objective evaluation metrics is analyzed under different types of compression artifacts. Moreover, the considered metrics are benchmarked on the BASICS validation dataset [2], a large database coded with the same four codecs. The four codecs represent the diversity of coding artifacts usually created by the lossy coding of point clouds. ...
... Subjective quality evaluation studies also lead to the definition of point cloud quality assessment databases, commonly used to benchmark point cloud quality metrics [29][30][31][32]. Crowdsourcing methodologies have also been studied [2,33] as a method of subjective evaluation. ...
... To map the input attributes to a quality score, we use a light-weight hybrid deep model combining a Deformable Convolutional Network (DCN) and Vision Transformers (ViT). Experiments are carried out on the ICIP20 [1] and PointXR [2] datasets, and a new large dataset called BASICS [3]. The results show that our approach outperforms state-of-the-art NR-PCQA measures and even some FR-PCQA measures on PointXR. ...
... • We present a light-weight metric that ranked 1st in terms of run-time and 4th in terms of accuracy at the ICIP 2023 PCVQA grand challenge (Tracks 2 & 4 for no-reference metrics) [3]. • We extend the use of ViT for 3D-PC quality assessment with a hybrid model combining convolution and self-attention. ...
... For the PCQA domain, a variety of databases are publicly available. Three datasets were used for performance evaluation: two well-known ones (PointXR [2] and ICIP [1]) and a third, new dataset named BASICS [3], which was part of the ICIP 2023 Grand Challenge on Point Cloud Quality Assessment (PCVQA). The datasets are briefly described as follows: PointXR contains 5 PCs, from which 45 degraded versions were created using G-PCC with octree coding for geometry compression and Lifting and RAHT for color compression. ...
Conference Paper
Full-text available
Deep learning-based quality assessment has significantly enhanced perceptual multimedia quality assessment; however, it is still in the early stages for 3D visual data such as 3D point clouds (PCs). Due to their high data volume, 3D-PCs are frequently compressed for transmission and viewing, which may affect perceived quality. Therefore, we propose a no-reference quality metric for a given 3D-PC. Compared to existing methods that mostly focus on geometry or color aspects, we propose integrating frequency magnitudes as an indicator of spatial degradation patterns caused by compression. To map the input attributes to a quality score, we use a light-weight hybrid deep model combining a Deformable Convolutional Network (DCN) and Vision Transformers (ViT). Experiments are carried out on the ICIP20 [1] and PointXR [2] datasets, and a new large dataset called BASICS [3]. The results show that our approach outperforms state-of-the-art NR-PCQA measures and even some FR-PCQA measures on PointXR. The implementation code can be found at: https://github.com/o-messai/3D-PCQA
... Evaluation Criteria: To quantify the correlation with the subjective assessment scores, we employed Pearson's linear correlation coefficient (PLCC) and Spearman's rank-order correlation coefficient (SROCC). Dataset: We used two datasets: the broad quality assessment of static point clouds in compression scenario (BASICS) [33] dataset, consisting of 1498 point clouds distorted by compression errors, and the Waterloo point cloud (WPC) [34] dataset, consisting of 740 distorted point clouds contaminated by compression, Gaussian noise, or down-sampling. Both datasets include the MOS of the corresponding distorted point clouds, which is used to measure PLCC and SROCC. ...
... The two datasets include the MOS of the corresponding distorted point clouds to be utilized to measure PLCC and SROCC. Since the BASICS dataset was explicitly divided into training and test sets [33], we conducted training and evaluation with the training and test sets, respectively. As for the WPC dataset, distorted point clouds were segmented into five parts (20% data × 5 parts). ...
Preprint
Full-text available
Point clouds in 3D applications frequently experience quality degradation during processing, e.g., scanning and compression. Reliable point cloud quality assessment (PCQA) is important for developing compression algorithms with good bitrate-quality trade-offs and techniques for quality improvement (e.g., denoising). This paper introduces a full-reference (FR) PCQA method utilizing spectral graph wavelets (SGWs). First, we propose novel SGW-based PCQA metrics that compare SGW coefficients of coordinate and color signals between reference and distorted point clouds. Second, we achieve accurate PCQA by integrating several conventional FR metrics and our SGW-based metrics using support vector regression. To our knowledge, this is the first study to introduce SGWs for PCQA. Experimental results demonstrate the proposed PCQA metric is more accurately correlated with subjective quality scores compared to conventional PCQA metrics.
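Spectral graph wavelets, as used above, filter a per-point signal through band-pass kernels defined on the spectrum of a graph Laplacian built over the cloud. The numpy-only sketch below illustrates the machinery on a small cloud; the kNN graph construction, the kernel g(t*lam) = t*lam * exp(-t*lam), and the scales are assumptions for illustration, not the paper's exact design, and the dense eigendecomposition limits it to small inputs.

```python
# Hedged sketch of spectral graph wavelet (SGW) coefficients on a point cloud.
# Builds a symmetric kNN adjacency, takes the combinatorial Laplacian, and
# filters a per-point signal (e.g., luminance) with band-pass kernels.
import numpy as np
from scipy.spatial import cKDTree

def sgw_coeffs(points, signal, k=8, scales=(0.5, 2.0, 8.0)):
    n = len(points)
    _, idx = cKDTree(points).query(points, k=k + 1)  # includes the point itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx[:, 1:].ravel()] = 1.0
    W = np.maximum(W, W.T)                           # symmetrize
    L = np.diag(W.sum(axis=1)) - W                   # combinatorial Laplacian
    lam, U = np.linalg.eigh(L)                       # dense eig: small clouds only
    fhat = U.T @ signal                              # graph Fourier transform
    return [U @ ((t * lam) * np.exp(-t * lam) * fhat) for t in scales]
```

A simple SGW-based distance would then compare these coefficients between the reference and distorted signals sampled on the same graph, before any regression-based fusion.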
... As video capturing and processing techniques advance, the exploration of perceptual quality assessment and modeling has extended to the realm of 3D space. This research can be broadly categorized into two groups based on the data format used to construct the 3D models: perceptual quality modeling of point cloud [1,2,4,12,17,23,24] and mesh [19,20]. Among the works focusing on point cloud QoE modeling, the majority predominantly investigate impairments introduced to single-frame 3D models. ...
... These videos were encoded using V-PCC with compression rates ranging from R0 to R5. Because the ratings range from 0 to 100, we perform a discretization process to map them onto the discrete levels {1, 2, 3, 4, 5}. Specifically, we divided the rating range into five equal intervals of 20 units each. ...
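The discretization described here is plain arithmetic; a one-line mapping with an explicit assumption about the top boundary (a rating of exactly 100 is clipped into the fifth bin) looks like this:

```python
# Sketch of the 0-100 -> {1,...,5} discretization via five 20-unit intervals.
# Boundary handling (e.g., a rating of exactly 100) is an assumption here.
import numpy as np

ratings = np.array([0.0, 19.9, 20.0, 55.0, 99.5, 100.0])
levels = np.clip(np.floor(ratings / 20.0).astype(int) + 1, 1, 5)
print(levels)  # [1 1 2 3 5 5]
```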
Conference Paper
Full-text available
Volumetric video, which is typically represented by 3D point clouds, requires efficient point cloud compression (PCC) technologies for practical storage and transmission. Particularly, developed by the Moving Picture Experts Group (MPEG), video-based PCC (V-PCC) converts 3D point clouds into 2D image maps and compresses them with 2D video codecs, showing excellent compression performance. However, understanding the impact of compression on the perceptual quality of volumetric videos, which consist of both geometry and texture components, remains challenging. In this study, we propose a quality of experience (QoE) model to predict the subjective quality with respect to the compression level of geometry and texture, quantifying the impact of geometry and texture compression on perceptual quality. To the best of our knowledge, this study is the first to accurately model the perceptual quality of V-PCC-encoded volumetric videos. The QoE model is built based on a volumetric video quality assessment dataset, VOLVQAD, collected by us. We further evaluate our QoE model on the vsenseVVDB2 dataset, which was collected from diverse study settings, to validate its robustness and generalization ability. Both evaluations demonstrate the effectiveness of our model in various compression scenarios. This study makes a valuable contribution to our understanding of the factors that influence the QoE in V-PCC-encoded volumetric videos. The proposed model also holds potential for various other applications, such as adaptive bitrate allocation.
... Databases. To illustrate the effectiveness of our method, we employ three benchmarks with available raw opinion scores: SJTU-PCQA (Yang et al. 2020a), LS-PCQA Part I (Liu et al. 2023b) and BASICS (Ak et al. 2024). SJTU-PCQA includes 9 native point clouds and 378 distorted point clouds disturbed by 7 types of distortion under 6 levels. ...
Article
In recent years, No-Reference Point Cloud Quality Assessment (NR-PCQA) research has achieved significant progress. However, existing methods mostly seek a direct mapping function from visual data to the Mean Opinion Score (MOS), which is contradictory to the mechanism of practical subjective evaluation. To address this, we propose a novel language-driven PCQA method named CLIP-PCQA. Considering that human beings prefer to describe visual quality using discrete quality descriptions (e.g., "excellent" and "poor") rather than specific scores, we adopt a retrieval-based mapping strategy to simulate the process of subjective assessment. More specifically, based on the philosophy of CLIP, we calculate the cosine similarity between the visual features and multiple textual features corresponding to different quality descriptions, in which process an effective contrastive loss and learnable prompts are introduced to enhance the feature extraction. Meanwhile, given the personal limitations and bias in subjective experiments, we further convert the feature similarities into probabilities and consider the Opinion Score Distribution (OSD) rather than a single MOS as the final target. Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches.
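The retrieval-based mapping described in this abstract reduces, at inference time, to comparing one visual embedding against a bank of text embeddings for the quality descriptions and reading off a distribution. A minimal numpy sketch with placeholder embeddings follows; the prompts, backbone, temperature and training losses here are illustrative, not the paper's.

```python
# Hedged sketch of retrieval-based quality mapping, CLIP-style: cosine
# similarities to text embeddings of quality words become a distribution
# over levels; its expectation is the predicted score. All vectors are
# random placeholders.
import numpy as np

levels = ["bad", "poor", "fair", "good", "excellent"]
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # score per description

rng = np.random.default_rng(0)
visual = rng.standard_normal(512)                 # placeholder image feature
texts = rng.standard_normal((5, 512))             # placeholder text features

sims = texts @ visual / (np.linalg.norm(texts, axis=1) * np.linalg.norm(visual))
logits = sims / 0.07                              # temperature (assumed)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                              # predicted opinion distribution
score = float(probs @ values)                     # expected quality score
```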
... In recent years, thanks to the increasing capability of 3D acquisition systems, point clouds have emerged as one of the most popular formats for immersive media [1]. Point clouds consist of a collection of points defined by geometric coordinates and optional attributes such as color and reflectivity. ...
Preprint
Full-text available
During the compression, transmission, and rendering of point clouds, various artifacts are introduced, affecting the quality perceived by the end user. However, evaluating the impact of these distortions on the overall quality is a challenging task. This study introduces PST-PCQA, a no-reference point cloud quality metric based on a low-complexity, learning-based framework. It evaluates point cloud quality by analyzing individual patches, integrating local and global features to predict the Mean Opinion Score. In summary, the process involves extracting features from patches, combining them, and using correlation weights to predict the overall quality. This approach allows us to assess point cloud quality without relying on a reference point cloud, making it particularly useful in scenarios where reference data is unavailable. Experimental tests on three state-of-the-art datasets show good prediction capabilities of PST-PCQA, through the analysis of different feature pooling strategies and its ability to generalize across different datasets. The ablation study confirms the benefits of evaluating quality on a patch-by-patch basis. Additionally, PST-PCQA's light-weight structure, with a small number of parameters to learn, makes it well-suited for real-time applications and devices with limited computational capacity. For reproducibility purposes, we made code, model, and pretrained weights available at https://github.com/michaelneri/PST-PCQA.
... In recent years, thanks to the increasing capability of 3D acquisition systems, point clouds have emerged as one of the most popular formats for immersive media [1]. Point clouds consist of a collection of points defined by geometric coordinates and optional attributes such as color and reflectivity. ...
Article
Full-text available
During the compression, transmission, and rendering of point clouds, various artifacts are introduced, affecting the quality perceived by the end user. However, evaluating the impact of these distortions on the overall quality is a challenging task. This study introduces PST-PCQA, a no-reference point cloud quality metric based on a low-complexity, learning-based framework. It evaluates point cloud quality by analyzing individual patches, integrating local and global features to predict the Mean Opinion Score. In summary, the process involves extracting features from patches, combining them, and using correlation weights to predict the overall quality. This approach allows us to assess point cloud quality without relying on a reference point cloud, making it particularly useful in scenarios where reference data is unavailable. Experimental tests on three state-of-the-art datasets show good prediction capabilities of PST-PCQA, through the analysis of different feature pooling strategies and its ability to generalize across different datasets. The ablation study confirms the benefits of evaluating quality on a patch-by-patch basis. Additionally, PST-PCQA's lightweight structure, with a small number of parameters to learn, makes it well-suited for real-time applications and devices with limited computational capacity. For reproducibility purposes, we made code, model, and pretrained weights available at https://github.com/michaelneri/PST-PCQA.
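Patch-based evaluation like the one described above starts by cutting the cloud into local neighborhoods. One common recipe, sketched below under the assumption of farthest-point-sampled seeds with fixed-size kNN patches (the paper's exact patch extraction may differ), is:

```python
# Hedged sketch of patch extraction for patch-wise PCQA: farthest point
# sampling picks well-spread seeds; each patch is the k nearest neighbors
# of a seed. Scoring and pooling of patches are out of scope here.
import numpy as np
from scipy.spatial import cKDTree

def farthest_point_sampling(points, m):
    chosen = [0]                                    # arbitrary start point
    d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(m - 1):
        chosen.append(int(d.argmax()))              # farthest from chosen set
        d = np.minimum(d, np.linalg.norm(points - points[chosen[-1]], axis=1))
    return np.array(chosen)

def extract_patches(points, m=32, k=256):
    seeds = farthest_point_sampling(points, m)
    _, idx = cKDTree(points).query(points[seeds], k=k)
    return points[idx]                              # (m, k, 3) local patches
```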
... Such approaches accomplish the task of assessing the quality of three-dimensional point clouds by leveraging the principles of image quality assessment [20]. However, algorithms resorting to two-dimensional projection face the challenges of occlusion and voids generated by the projection, which have a significant impact on the evaluation results [21]. Moreover, projecting the entire point cloud from three dimensions to two is costly. ...
Preprint
Full-text available
One of the main challenges in point cloud compression (PCC) is how to evaluate the perceived distortion so that the codec can be optimized for perceptual quality. Current standard practice in PCC highlights a primary issue: while single-feature metrics are widely used to assess compression distortion, the classic method of searching point-to-point nearest neighbors frequently fails to build precise correspondences between point clouds, resulting in an ineffective capture of human perceptual features. To overcome these limitations, we propose a novel assessment method called RBFIM, which utilizes radial basis function (RBF) interpolation to convert discrete point features into a continuous feature function for the distorted point cloud. By substituting the geometry coordinates of the original point cloud into the feature function, we obtain bijective sets of point features. This enables the establishment of precise feature correspondences between distorted and original point clouds and significantly improves the accuracy of quality assessments. Moreover, this method avoids the complexity caused by bidirectional searches. Extensive experiments on multiple subjective quality datasets of compressed point clouds demonstrate that our RBFIM excels in addressing human perception tasks, thereby providing robust support for PCC optimization efforts.
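The core interpolation step that RBFIM builds on is directly available in scipy. The sketch below fits a local RBF interpolant to the distorted cloud's colors and evaluates it at the reference coordinates, so features become comparable point by point; the kernel, neighbor count, and final MSE comparison are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of RBF-based feature correspondence: interpolate the distorted
# cloud's per-point colors into a continuous function, then sample it at the
# reference geometry. Kernel and neighbor count are illustrative choices.
import numpy as np
from scipy.interpolate import RBFInterpolator

def rbf_feature_mse(ref_xyz, ref_color, deg_xyz, deg_color):
    f = RBFInterpolator(deg_xyz, deg_color, neighbors=32, kernel="linear")
    color_at_ref = f(ref_xyz)        # distorted colors at reference positions
    return float(np.mean((ref_color - color_at_ref) ** 2))
```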
... To further demonstrate the effectiveness of PCE-GAN, we applied the trained PCE-GAN models directly to a new dataset without retraining. We tested its performance on the broad quality assessment of static point clouds in a compression scenario (BASICS) [60] dataset. The BASICS dataset consists of 75 point clouds, categorized into "Humans & Animals", "Inanimate Objects", and "Buildings & Landscapes", with 25 point clouds in each category, encompassing a wide range of subjects such as animals, humans, everyday items, vehicles, architectural structures, and natural landscapes. ...
Preprint
Point cloud compression significantly reduces data volume but sacrifices reconstruction quality, highlighting the need for advanced quality enhancement techniques. Most existing approaches focus primarily on point-to-point fidelity, often neglecting the importance of perceptual quality as interpreted by the human visual system. To address this issue, we propose a generative adversarial network for point cloud quality enhancement (PCE-GAN), grounded in optimal transport theory, with the goal of simultaneously optimizing both data fidelity and perceptual quality. The generator consists of a local feature extraction (LFE) unit, a global spatial correlation (GSC) unit and a feature squeeze unit. The LFE unit uses dynamic graph construction and a graph attention mechanism to efficiently extract local features, placing greater emphasis on points with severe distortion. The GSC unit uses the geometry information of neighboring patches to construct an extended local neighborhood and introduces a transformer-style structure to capture long-range global correlations. The discriminator computes the deviation between the probability distributions of the enhanced point cloud and the original point cloud, guiding the generator to achieve high quality reconstruction. Experimental results show that the proposed method achieves state-of-the-art performance. Specifically, when applying PCE-GAN to the latest geometry-based point cloud compression (G-PCC) test model, it achieves an average BD-rate of -19.2% compared with the PredLift coding configuration and -18.3% compared with the RAHT coding configuration. Subjective comparisons show a significant improvement in texture clarity and color transitions, revealing finer details and more natural color gradients.
... Databases. To illustrate the effectiveness of our method, we employ three benchmarks with available raw opinion scores: SJTU-PCQA (Yang et al. 2020a), LS-PCQA Part I (Liu et al. 2023b) and BASICS (Ak et al. 2024). SJTU-PCQA includes 9 native point clouds and 378 distorted point clouds disturbed by 7 types of distortion under 6 levels. ...
Preprint
In recent years, No-Reference Point Cloud Quality Assessment (NR-PCQA) research has achieved significant progress. However, existing methods mostly seek a direct mapping function from visual data to the Mean Opinion Score (MOS), which is contradictory to the mechanism of practical subjective evaluation. To address this, we propose a novel language-driven PCQA method named CLIP-PCQA. Considering that human beings prefer to describe visual quality using discrete quality descriptions (e.g., "excellent" and "poor") rather than specific scores, we adopt a retrieval-based mapping strategy to simulate the process of subjective assessment. More specifically, based on the philosophy of CLIP, we calculate the cosine similarity between the visual features and multiple textual features corresponding to different quality descriptions, in which process an effective contrastive loss and learnable prompts are introduced to enhance the feature extraction. Meanwhile, given the personal limitations and bias in subjective experiments, we further convert the feature similarities into probabilities and consider the Opinion Score Distribution (OSD) rather than a single MOS as the final target. Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches.
... Furthermore, the linearity, monotonicity and accuracy of the metric relative to subjective scores at a coarser level should be taken into account, such that one can hypothesize these will hold at finer granularity. In this respect, Ak et al. [51] show VMAF to have the best performance among image-based metrics for static point clouds, with PLCC and SROCC values of 0.742 and 0.669, respectively. ...
Article
Full-text available
Point cloud video delivery will be an important part of future immersive multimedia. In it, objects represented as sets of points are embedded within a video which is streamed and displayed to remote users. This opens possibilities for remote presence scenarios such as tele-conferencing, remote education and virtual training. Due to its infeasibly high bandwidth requirements, encoding is unavoidable. The introduced artifacts and network degradations can have an important but unpredictable impact on the end-user's Quality of Experience (QoE). Thus, real-time quality monitoring and prediction mechanisms are key to allowing fast countermeasures in case of QoE decrease. Since current state-of-the-art research focuses either on continuous QoE monitoring of traditional video streaming services or on objective delivery optimizations of point cloud content without any QoE validation, we believe this work brings a valuable contribution to the current literature. Therefore, we present a no-reference (NR) QoE model, consisting of KMeans clustering and sigmoidal mapping, that works at video-level, group-of-pictures (GOP)-level and frame-level granularity. Results show the value of the sigmoidal mapping across all granularity levels. The clustering algorithm shows its value at the video level and in the role of an outlier detector on the more fine-grained levels. Satisfactory results are obtained, with correlation values often above 0.700 at the GOP and frame levels, while the root mean squared error (RMSE) stays below 10 on a 0–100 scale. In addition, a Command Line Interface (CLI) Video Metric Tool is presented that allows for easy and modular calculation of NR metrics on a given video.
... The best results in terms of minimum c_v values are boldfaced. • D2 [32]: The dataset has 75 distinct reference PCs and 898 test PCs with a wide diversity of contents and distortions; only the training subset was used. • D3 [23]: It contains 9 reference PCs, subjected to 7 distortion types (compression, color noise, geometric shift, down-sampling, and three mixed distortions) at six levels, yielding a total of 378 distorted PCs. ...
... Numerous studies have been conducted to evaluate the quality of point clouds, taking into account several coding approaches and experimental configurations [9,10,11,4,12,13]. Perry et al. presented an assessment of the perceived quality of MPEG Point Cloud codecs, notably Video Point Cloud Compression (V-PCC) and Geometry Point Cloud Compression (G-PCC), using a 2D display [2]. ...
Preprint
Full-text available
Typically, point cloud encoders allocate a similar bitrate to geometry and attribute (usually RGB color components) information coding. This paper reports a quality study considering different coding bitrate tradeoffs between geometry and attributes. A set of five point clouds, representing different characteristics and types of content, was encoded with the MPEG standard Geometry Point Cloud Compression (G-PCC), using octree coding for geometry information and both the Region Adaptive Hierarchical Transform and the Prediction Lifting transform for attributes. Furthermore, the JPEG Pleno Point Cloud Verification Model was also tested. Five different attributes/geometry bitrate tradeoffs were considered, notably 70%/30%, 60%/40%, 50%/50%, 40%/60% and 30%/70%. Three point cloud objective metrics were selected to assess the quality of the reconstructed point clouds, notably the PSNR YUV, the Point Cloud Quality Metric, and GraphSIM. Furthermore, for each encoder, the Bjøntegaard Deltas were computed for each tradeoff, using the 50%/50% tradeoff as a reference. The reported results indicate that using a higher bitrate allocation for attribute encoding usually yields slightly better results.
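The Bjøntegaard deltas reported above compare whole rate-distortion curves rather than single points. A compact sketch of the usual computation (cubic fit over the overlapping quality range, integrated in the log-rate domain) is shown below; it mirrors the common reference implementation, with no claim to match the authors' exact anchors or metrics.

```python
# Hedged sketch of BD-rate: fit log10(rate) as a cubic in quality for two
# codecs/configurations, integrate over the shared quality interval, and
# convert the average log-rate gap into a percentage bitrate difference.
import numpy as np

def bd_rate(rate_a, qual_a, rate_b, qual_b):
    pa = np.polyfit(qual_a, np.log10(rate_a), 3)
    pb = np.polyfit(qual_b, np.log10(rate_b), 3)
    lo = max(min(qual_a), min(qual_b))           # overlapping quality range
    hi = min(max(qual_a), max(qual_b))
    ia = np.diff(np.polyval(np.polyint(pa), [lo, hi]))[0]
    ib = np.diff(np.polyval(np.polyint(pb), [lo, hi]))[0]
    avg_log_diff = (ib - ia) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0  # % rate change of B vs A
```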
... Three-dimensional point clouds (3D PCs) are assemblages of points defined in a 3D coordinate system, representing the external surfaces of objects or scenes. Typically harvested via 3D scanning technologies like LiDAR sensors, photogrammetry and depth sensors [1], [2], these points are characterized by their X, Y and Z coordinates and may also encompass additional attributes like color, surface normals and reflectance. The utility of 3D PCs extends to various fields, including autonomous vehicles, digital museums, smart cities, virtual reality, augmented reality, immersive communication and the preservation of cultural heritage [1]- [7]. ...
Preprint
No-reference bitstream-layer point cloud quality assessment (PCQA) can be deployed without full decoding at any network node to achieve real-time quality monitoring. In this work, we develop the first PCQA model dedicated to Trisoup-Lifting encoded 3D point clouds by analyzing bitstreams without full decoding. Specifically, we investigate the relationship among texture bitrate per point (TBPP), texture complexity (TC) and texture quantization parameter (TQP) while geometry encoding is lossless. Subsequently, we estimate TC by utilizing TQP and TBPP. Then, we establish a texture distortion evaluation model based on TC, TBPP and TQP. Ultimately, by integrating this texture distortion model with a geometry attenuation factor, a function of trisoupNodeSizeLog2 (tNSL), we acquire a comprehensive NR bitstream-layer PCQA model named streamPCQ-TL. In addition, this work establishes a database named WPC6.0, the first and largest PCQA database dedicated to the Trisoup-Lifting encoding mode, encompassing 400 distorted point clouds spanning 4 geometry distortion levels multiplied by 5 texture distortion levels. Experimental results on M-PCCD, ICIP2020 and the proposed WPC6.0 database suggest that the proposed streamPCQ-TL model exhibits robust and notable performance in contrast to existing advanced PCQA metrics, particularly in terms of computational cost. The dataset and source code will be publicly released at https://github.com/qdushl/Waterloo-Point-Cloud-Database-6.0
... Firstly, concerning single-object point cloud quality assessment, although there are existing studies in this area, we consider current algorithms unsuitable for evaluating outdoor LiDAR point clouds. Most existing algorithms focus on single objects rather than full scenes, which include backgrounds and multiple objects, and they often rely on a reference point cloud as ground truth, something typically unavailable for outdoor real-world LiDAR point clouds [17], [18], [19], [20]. Secondly, regarding confidence score thresholds from most object detection algorithms, while many 3D object detection methods predict object confidence scores to represent the probability of an object's existence at a specific position, these confidence scores are often misconstrued as indicators of point cloud quality. ...
Preprint
LiDAR is one of the most crucial sensors for autonomous vehicle perception. However, current LiDAR-based point cloud perception algorithms lack comprehensive and rigorous LiDAR quality assessment methods, leading to uncertainty in detection performance. Additionally, existing point cloud quality assessment algorithms are predominantly designed for indoor environments or single-object scenarios. In this paper, we introduce a novel image-guided point cloud quality assessment algorithm for outdoor autonomous driving environments, named the Image-Guided Outdoor Point Cloud Quality Assessment (IGO-PQA) algorithm. Our proposed algorithm comprises two main components. The first component is the IGO-PQA generation algorithm, which leverages point cloud data, corresponding RGB surrounding view images, and agent objects' ground truth annotations to generate an overall quality score for a single-frame LiDAR-based point cloud. The second component is a transformer-based IGO-PQA regression algorithm for no-reference outdoor point cloud quality assessment. This regression algorithm allows for the direct prediction of IGO-PQA scores in an online manner, without requiring image data and object ground truth annotations. We evaluate our proposed algorithm using the nuScenes and Waymo open datasets. The IGO-PQA generation algorithm provides consistent and reasonable perception quality indices. Furthermore, our proposed IGO-PQA regression algorithm achieves a Pearson Linear Correlation Coefficient (PLCC) of 0.86 on the nuScenes dataset and 0.97 on the Waymo dataset.
... A point cloud is a collection of non-uniformly scattered 3D points that may suffer from impairments in both geometry and attributes (e.g., color) during processing, resulting in perceptual degradation. To facilitate quality of experience (QoE)-oriented tasks (e.g., compression [3], [4] and enhancement [5], [6]), point cloud quality assessment (PCQA) has gained significant attention among researchers. PCQA can be achieved through subjective experiments or objective metrics. ...
Preprint
Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, as reference point clouds are not available in many cases, no-reference (NR) metrics have become a research hotspot. Existing NR methods suffer from poor generalization performance. To address this shortcoming, we propose a novel NR-PCQA method, Point Cloud Quality Assessment via Domain-relevance Degradation Description (D^3-PCQA). First, we demonstrate our model's interpretability by deriving the function of each module using a kernelized ridge regression model. Specifically, quality assessment can be characterized as a leap from the scattered perception domain (reflecting subjective perception) to the ordered quality domain (reflecting the mean opinion score). Second, to reduce the significant domain discrepancy, we establish an intermediate domain, the description domain, based on insights from subjective experiments, by considering the domain relevance among samples located in the perception domain and learning a structured latent space. The anchor features derived from the learned latent space are generated as cross-domain auxiliary information to promote domain transformation. Furthermore, the newly established description domain decomposes the NR-PCQA problem into two relevant stages. These stages include a classification stage that assigns degradation descriptions to point clouds and a regression stage that determines the confidence degrees of the descriptions, providing a semantic explanation for the predicted quality scores. Experimental results demonstrate that D^3-PCQA exhibits robust performance and outstanding generalization ability on several publicly available datasets. The code in this work will be publicly available at https://smt.sjtu.edu.cn.
... We first note for completeness that an extensive research literature addresses a multitude of aspects related to the general topic area of immersive 3D environment creation without specific consideration of semantic compression. For instance, general 3D mesh estimation techniques have been examined, e.g., in [38], [39], while compression and compression evaluations have, for instance, been conducted in [40]- [44]. Also, advanced compression concepts, such as compression based on graph neural networks [45]- [48], have recently been explored. ...
Article
Full-text available
Extended Reality (XR) applications, which may encompass Augmented Reality (AR) and Virtual Reality (VR), commonly involve immersive virtual environments that are based on real physical environments. Transmitting the extensive color and depth image data for representing a physical environment over a communication network to create a corresponding immersive environment at a remote location is very challenging due to the enormous data volumes and the time-constraints of real-time immersion. We explore semantic compression, which conveys the semantic meaning of the data through color-codes (CCs) to reduce the transmitted data volume. The creation of an immersive environment conventionally involves the five steps (with corresponding output): calibration (single-frame point cloud), registration (registered point cloud), volume reconstruction (voxel frame), marching cubes (meshes), and rendering (3D environment). We develop the novel cXR+ semantic compression that splits the volume reconstruction into a client-side virtual network function (VNF) that represents the registered point cloud as CCs for network transmission to a server-side VNF that decodes the CCs to obtain the voxel frames so as to complete the volume reconstruction. We consider an analogous splitting of the calibration into two VNFs with Graph Coloring (GC) compression of the color and depth image data as a comparison benchmark. Additionally, we consider the transmission of the raw and JPG compressed point cloud data as well as the volume reconstruction at the client-side and subsequent transmission of the voxel frames as benchmarks. We conduct measurements of the compression (computing) time, transmission time, compression ratio, and reconstruction accuracy for real-world data sets so as to elucidate the trade-offs in employing voxel-based semantic compression for transmitting immersive environments over communication networks.
Article
Quality assessment, which evaluates the visual quality level of multimedia experiences, has garnered significant attention from researchers and has evolved substantially through dedicated efforts. Before the advent of large models, quality assessment typically relied on small expert models tailored for specific tasks. While these smaller models are effective at handling their designated tasks and predicting quality levels, they often lack explainability and robustness. With the advancement of large models, which align more closely with human cognitive and perceptual processes, many researchers are now leveraging the prior knowledge embedded in these large models for quality assessment tasks. This emergence of quality assessment within the context of large models motivates us to provide a comprehensive review focusing on two key aspects: 1) the assessment of large models, and 2) the role of large models in assessment tasks. We begin by reflecting on the historical development of quality assessment. Subsequently, we move to detailed discussions of related works concerning quality assessment in the era of large models. Finally, we offer insights into the future progression and potential pathways for quality assessment in this new era. We hope this survey will enable a rapid understanding of the development of quality assessment in the era of large models and inspire further advancements in the field.
Article
Full-text available
This meta-survey provides a comprehensive review of 3D point cloud (PC) applications in remote sensing (RS), essential datasets available for research and development purposes, and state-of-the-art point cloud compression methods. It offers a comprehensive exploration of the diverse applications of point clouds in remote sensing, including specialized tasks within the field, precision agriculture-focused applications, and broader general uses. Furthermore, datasets that are commonly used in remote-sensing-related research and development tasks are surveyed, including urban, outdoor, and indoor environment datasets; vehicle-related datasets; object datasets; agriculture-related datasets; and other more specialized datasets. Due to their importance in practical applications, this article also surveys point cloud compression technologies from widely used tree- and projection-based methods to more recent deep learning (DL)-based technologies. This study synthesizes insights from previous reviews and original research to identify emerging trends, challenges, and opportunities, serving as a valuable resource for advancing the use of point clouds in remote sensing.
Conference Paper
Measuring the complexity of visual content is crucial in various applications, such as selecting sources to test processing algorithms, designing subjective studies, and efficiently determining the appropriate encoding parameters and bandwidth allocation for streaming. While spatial and temporal complexity measures exist for 2D videos, a geometric complexity measure for 3D content is still lacking. In this paper, we present the first study to characterize the geometric complexity of 3D point clouds. Inspired by existing complexity measures, we propose several compression-based definitions of geometric complexity derived from the rate-distortion curves obtained by compressing a dataset of point clouds using G-PCC. Additionally, we introduce density-based and geometry-based descriptors to predict complexity. Our initial results show that even simple density measures can accurately predict the geometric complexity of point clouds.
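The closing observation above, that simple density measures already predict geometric complexity well, is easy to prototype. The sketch below computes one such density descriptor, the mean distance to the k nearest neighbors; the specific descriptor and the value of k are assumptions, and the paper's own descriptor set is richer.

```python
# Hedged sketch of a density-based complexity descriptor: the mean k-nearest-
# neighbor distance, a coarse proxy for how sparse or intricate a cloud is.
import numpy as np
from scipy.spatial import cKDTree

def mean_knn_distance(points, k=8):
    dist, _ = cKDTree(points).query(points, k=k + 1)  # first hit is the point itself
    return float(dist[:, 1:].mean())
```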
Article
Recent years have witnessed the success of the deep learning-based technique in research of no-reference point cloud quality assessment (NR-PCQA). For a more accurate quality prediction, many previous studies have attempted to capture global and local features in a bottom-up manner, but ignored the interaction and promotion between them. To solve this problem, we propose a novel asynchronous feedback quality prediction network (AFQ-Net). Motivated by human visual perception mechanisms, AFQ-Net employs a dual-branch structure to deal with global and local features, simulating the left and right hemispheres of the human brain, and constructs a feedback module between them. Specifically, the input point clouds are first fed into a transformer-based global encoder to generate the attention maps that highlight these semantically rich regions, followed by being merged into the global feature. Then, we utilize the generated attention maps to perform dynamic convolution for different semantic regions and obtain the local feature. Finally, a coarse-to-fine strategy is adopted to merge the two features into the final quality score. We conduct comprehensive experiments on three datasets and achieve superior performance over the state-of-the-art approaches on all of these datasets. The code will be available at https://github.com/zhangyujie-1998/AFQ-Net.
Preprint
Full-text available
Although 3D generated content (3DGC) offers advantages in reducing production costs and accelerating design timelines, its quality often falls short when compared to professionally generated 3D content. Common quality issues frequently affect 3DGC, highlighting the importance of timely and effective quality assessment. Such evaluations not only ensure a higher standard of 3DGCs for end-users but also provide critical insights for advancing generative technologies. To address existing gaps in this domain, this paper introduces a novel 3DGC quality assessment dataset, 3DGCQA, built using 7 representative Text-to-3D generation methods. During the dataset's construction, 50 fixed prompts are utilized to generate contents across all methods, resulting in the creation of 313 textured meshes that constitute the 3DGCQA dataset. The visualization intuitively reveals the presence of 6 common distortion categories in the generated 3DGCs. To further explore the quality of the 3DGCs, subjective quality assessment is conducted by evaluators, whose ratings reveal significant variation in quality across different generation methods. Additionally, several objective quality assessment algorithms are tested on the 3DGCQA dataset. The results expose limitations in the performance of existing algorithms and underscore the need for developing more specialized quality assessment methods. To provide a valuable resource for future research and development in 3D content generation and quality assessment, the dataset has been open-sourced at https://github.com/zyj-2000/3DGCQA.
Preprint
Full-text available
Quality assessment, which evaluates the visual quality level of multimedia experiences, has garnered significant attention from researchers and has evolved substantially through dedicated efforts. Before the advent of large models, quality assessment typically relied on small expert models tailored for specific tasks. While these smaller models are effective at handling their designated tasks and predicting quality levels, they often lack explainability and robustness. With the advancement of large models, which align more closely with human cognitive and perceptual processes, many researchers are now leveraging the prior knowledge embedded in these large models for quality assessment tasks. This emergence of quality assessment within the context of large models motivates us to provide a comprehensive review focusing on two key aspects: 1) the assessment of large models, and 2) the role of large models in assessment tasks. We begin by reflecting on the historical development of quality assessment. Subsequently, we move to detailed discussions of related works concerning quality assessment in the era of large models. Finally, we offer insights into the future progression and potential pathways for quality assessment in this new era. We hope this survey will enable a rapid understanding of the development of quality assessment in the era of large models and inspire further advancements in the field.
Article
The evaluation of the quality experience in 3D visual media is increasingly crucial, as user experience plays a pivotal role in the commercial success of immersive displays for widespread 3D graphics. Although numerous methods for assessing the quality of static Point Clouds (PCs) have been proposed, they consistently commence from visual representations of the PCs themselves and appear to lack an adequate description of the actual viewing environment for users. In this paper, focal length-oriented point cloud visualization is implemented as a crucial factor for evaluating static PCs with color. The key insight lies in the integration of projection planes with varying focal lengths into the Convolutional Gated Recurrent Unit (ConvGRU) module, enabling the learning of spatial correlation representations across multiple scales to simulate mechanisms of zooming in and zooming out. Then, we propose the dual-branch transformer, incorporating self-attention modeling and information validity-based attention modeling, to improve the efficiency of global feature descriptors in projected images. Finally, the Multi-Layer Perceptron (MLP) is employed to predict quality scores of PCs by aggregating the multi-plane and multi-scale corresponding features along a coarse-to-fine pattern. Towards No-Reference Point Cloud Quality Evaluation (NR-PCQE), the proposed lightweight network, known as MS-PCQE, demonstrates a 3.87%, 0.85%, and 2.10% improvement in SROCC performance compared to the second-ranked approaches on the WPC, SJTU-PCQA, and SIAT-PCQD databases, respectively, while its inference speed can reach approximately 2 frames per second (fps) without algorithmic acceleration and around 25 fps with TensorRT acceleration. All resources can be found at https://github.com/zerosola/MS-PCQE.
Article
Full-text available
The recent rise in interest in point clouds as an imaging modality has motivated standardization groups such as JPEG and MPEG to launch activities aiming at developing compression standards for point clouds. Lossy compression usually introduces visual artifacts that negatively impact the perceived quality of media, which can only be reliably measured through subjective visual quality assessment experiments. While MPEG standards have been subjectively evaluated in previous studies on multiple occasions, no work has yet assessed the performance of the recent JPEG Pleno standard in comparison to them. In this study, a comprehensive performance evaluation of JPEG and MPEG standards for point cloud compression is conducted. The impact of different configuration parameters on the performance of the codecs is first analyzed with the help of objective quality metrics. The results from this analysis are used to define three rate allocation strategies for each codec, which are employed to compress a set of point clouds at four target rates. The set of distorted point clouds is then subjectively evaluated following two subjective quality assessment protocols. Finally, the obtained results are used to compare the performance of these compression standards and draw insights about best coding practices.
Article
Full-text available
This article describes an empirical exploration of the effect of information loss in compressed representations of dynamic point clouds on the subjective quality of the reconstructed point clouds. The study involved compressing a set of test dynamic point clouds using the MPEG V-PCC (Video-based Point Cloud Compression) codec at 5 different levels of compression and applying simulated packet losses at three packet loss rates (0.5%, 1% and 2%) to the V-PCC sub-bitstreams prior to decoding and reconstructing the dynamic point clouds. The quality of the recovered dynamic point clouds was then assessed by human observers in experiments conducted at two research laboratories in Croatia and Portugal, to collect MOS (Mean Opinion Score) values. These scores were subjected to a set of statistical analyses to measure the degree of correlation of the data from the two laboratories, as well as the degree of correlation between the MOS values and a selection of objective quality measures, while taking into account compression level and packet loss rates. The objective quality measures considered, all of the full-reference type, included point cloud-specific measures as well as others adapted from image and video quality measures. Among the image-based quality measures, FSIM (Feature Similarity index), MSE (Mean Squared Error), and SSIM (Structural Similarity index) yielded the highest correlation with subjective scores in both laboratories, while PCQM (Point Cloud Quality Metric) showed the highest correlation among all point cloud-specific objective measures. The study showed that even a 0.5% packet loss rate reduces the subjective quality of the decoded point clouds by more than 1 to 1.5 MOS scale units, highlighting the need to adequately protect the bitstreams against losses. The results also showed that degradations in the V-PCC occupancy and geometry sub-bitstreams have a significantly higher (negative) impact on decoded point cloud subjective quality than degradations of the attribute sub-bitstream.
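As an illustration of the statistical analysis described above, the correlation between MOS values and an objective quality measure is typically quantified with Pearson (PLCC) and Spearman (SROCC) coefficients. Below is a minimal sketch with hypothetical data, not the authors' code:

```python
# A minimal sketch (not the authors' code) of the correlation analysis the
# study describes: comparing MOS values against an objective quality measure.
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: one MOS and one metric score per processed point cloud.
mos = np.array([4.2, 3.8, 2.9, 2.1, 1.5])
metric_scores = np.array([0.91, 0.85, 0.64, 0.48, 0.30])

plcc, _ = pearsonr(metric_scores, mos)    # linear correlation
srocc, _ = spearmanr(metric_scores, mos)  # rank-order correlation
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```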
Article
Full-text available
Fuelled by the increase in popularity of virtual and augmented reality applications, point clouds have emerged as a popular 3D format for acquisition and rendering of digital humans, thanks to their versatility and real-time capabilities. Due to technological constraints and real-time rendering limitations, however, the visual quality of dynamic point cloud contents is seldom evaluated using virtual and augmented reality devices, instead relying on prerecorded videos displayed on conventional 2D screens. In this study, we evaluate how the visual quality of point clouds representing digital humans is affected by compression distortions. In particular, we compare three different viewing conditions based on the degrees of freedom that are granted to the viewer: passive viewing (2DTV), head rotation (3DoF), and rotation and translation (6DoF), to understand how interacting in the virtual space affects the perception of quality. We provide both quantitative and qualitative results of our evaluation involving 78 participants, and we make the data publicly available. To the best of our knowledge, this is the first study evaluating the quality of dynamic point clouds in virtual reality, and comparing it to traditional viewing settings. Results highlight the dependency of visual quality on the content under test, and limitations in the way current data sets are used to evaluate compression solutions. Moreover, influencing factors in quality evaluation in VR, and shortcomings in how point cloud encoding solutions handle visually-lossless compression, are discussed.
Conference Paper
Full-text available
3D point clouds constitute an emerging multimedia content, now used in a wide range of applications. The main drawback of this representation is the size of the data since typical point clouds may contain millions of points, usually associated with both geometry and color information. Consequently, a significant amount of work has been devoted to the efficient compression of this representation. Lossy compression leads to a degradation of the data and thus impacts the visual quality of the displayed content. In that context, predicting perceived visual quality computationally is essential for the optimization and evaluation of compression algorithms. In this paper, we introduce PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds. The metric is an optimally-weighted linear combination of geometry-based and color-based features. We evaluate its performance on an open subjective dataset of colored point clouds compressed by several algorithms; the proposed quality assessment approach outperforms all previous metrics in terms of correlation with mean opinion scores.
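PCQM's final step is described as an optimally-weighted linear combination of geometry-based and color-based features. The sketch below illustrates only that pooling step; the feature names and weights are placeholders, not the published PCQM features or weights:

```python
# Illustrative sketch of the final pooling step described above: a weighted
# linear combination of geometry- and color-based features. Feature names and
# weights here are hypothetical, not the published PCQM values.
import numpy as np

def pcqm_like_score(features: dict, weights: dict) -> float:
    """Combine per-model features into a single quality index."""
    return sum(weights[name] * features[name] for name in weights)

features = {"curvature_comparison": 0.12, "lightness_contrast": 0.08,
            "chroma_difference": 0.05}       # hypothetical feature values
weights = {"curvature_comparison": 0.4, "lightness_contrast": 0.4,
           "chroma_difference": 0.2}         # hypothetical learned weights
print(pcqm_like_score(features, weights))
```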
Conference Paper
Full-text available
Volumetric video (VV) pipelines have reached a high level of maturity, creating interest in using such content in interactive visualisation scenarios. VV allows real-world content to be captured and represented as 3D models, which can be viewed from any chosen viewpoint and direction. Thus, VV is ideal for use in augmented reality (AR) or virtual reality (VR) applications. Both textured polygonal meshes and point clouds are popular methods for representing VV. Even though the signal and image processing community slightly favours the point cloud due to its simpler data structure and faster acquisition, textured polygonal meshes may offer other benefits such as better visual quality and easier integration with computer graphics pipelines. To better understand the difference between them, in this study we compare these two representation formats in a VV compression scenario utilising state-of-the-art compression techniques. For this purpose, we build a database and collect user opinion scores for subjective quality assessment of the compressed VV. The results show that meshes provide the best quality at high bitrates, while point clouds perform better in low-bitrate cases. The created VV quality database will be made available online to support further scientific studies on VV quality assessment.
Article
Full-text available
This article presents an overview of the recent standardization activities for point cloud compression (PCC). A point cloud is a 3D data representation used in diverse applications associated with immersive media including virtual/augmented reality, immersive telepresence, autonomous driving and cultural heritage archival. The international standard body for media compression, also known as the Motion Picture Experts Group (MPEG), is planning to release in 2020 two PCC standard specifications: video-based PCC (V-PCC) and geometry-based PCC (G-PCC). V-PCC and G-PCC will be part of the ISO/IEC 23090 series on the coded representation of immersive media content. In this paper, we provide a detailed description of both codec algorithms and their coding performances. Moreover, we also discuss certain unique aspects of point cloud compression.
Conference Paper
Full-text available
Numerous methodologies for subjective quality assessment exist in the field of image processing. In particular, the Absolute Category Rating with Hidden Reference (ACR-HR) and the Double Stimulus Impairment Scale (DSIS) are considered two of the most prominent methods for assessing the visual quality of 2D images and videos. Are these methods valid and accurate for evaluating the perceived quality of 3D graphics data? Is the presence of an explicit reference necessary, given the lack of human prior knowledge of 3D graphics data compared to natural images/videos? To answer these questions, we compare these two subjective methods (ACR-HR and DSIS) on a dataset of high-quality colored 3D models impaired with various distortions. These subjective experiments were conducted in a virtual reality (VR) environment. Our results show differences in the performance of the methods depending on the 3D contents and the types of distortions. We show that DSIS outperforms ACR-HR in terms of accuracy and exhibits stable performance. Results also yield interesting conclusions on the importance of a reference for judging the quality of 3D graphics. We finally provide recommendations regarding the influence of the number of observers on accuracy.
Conference Paper
Full-text available
Subjective quality assessment is considered a reliable method for assessing distorted stimuli in several multimedia applications. The experimental methods can be broadly categorized into those that rate and those that rank stimuli. Although ranking directly provides an order of stimuli rather than a continuous measure of quality, the experimental data can be converted using scaling methods into an interval scale, similar to that provided by rating methods. In this paper, we compare the results collected in a rating (mean opinion scores) experiment to the scaled results of a pairwise comparison experiment, the most common ranking method. We find a strong linear relationship between the results of both methods, which, however, differs across contents. To improve the relationship and unify the scale, we extend the experiment to include cross-content comparisons. We find that the cross-content comparisons reduce the confidence intervals for pairwise comparison results, and also improve the relationship with mean opinion scores.
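For illustration, here is a minimal sketch of one standard scaling approach, Thurstone Case V via a probit transform of preference probabilities, which converts pairwise comparison counts into an interval scale. The paper's exact scaling procedure may differ, and robust implementations use maximum likelihood estimation:

```python
# A minimal sketch, assuming Thurstone Case V scaling, of converting pairwise
# comparison counts into an interval scale. Illustrative only.
import numpy as np
from scipy.stats import norm

# counts[i, j] = number of times condition i was preferred over condition j
counts = np.array([[0, 18, 25],
                   [12, 0, 20],
                   [5, 10, 0]], dtype=float)

totals = counts + counts.T
with np.errstate(invalid="ignore", divide="ignore"):
    p = counts / totals                     # empirical preference probabilities
p = np.clip(np.nan_to_num(p, nan=0.5), 0.05, 0.95)  # avoid infinite z-scores
z = norm.ppf(p)                             # probit transform
scale = z.mean(axis=1)                      # quality on an interval scale
print(scale - scale.min())                  # anchor the lowest condition at 0
```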
Article
Full-text available
The number of online experiments conducted with subjects recruited via online platforms has grown considerably in the recent past. While one commercial crowdworking platform, Amazon's Mechanical Turk, essentially established and has since dominated this field, new alternatives offer services explicitly targeted at researchers. In this article, we present www.prolific.ac and lay out its suitability for recruiting subjects for social and economic science experiments. After briefly discussing key advantages and challenges of online experiments relative to lab experiments, we trace the platform's historical development, present its features, and contrast them with requirements for different types of social and economic experiments.
Article
Full-text available
This paper introduces a new Urban Point Cloud Dataset for Automatic Segmentation and Classification, acquired by Mobile Laser Scanning (MLS). We describe how the dataset is obtained, from acquisition to post-processing and labeling. The dataset can be used to train classification algorithms; moreover, since great attention has been paid to the separation of the different objects, it can also be used to train segmentation methods. The dataset consists of around 2 km of MLS point clouds acquired in two cities. The number of points and the range of classes make it suitable for training deep-learning methods. We also show some results of automatic segmentation and classification. The dataset is available at: http://caor-mines-paristech.fr/fr/paris-lille-3d-dataset/
Conference Paper
Full-text available
Most Depth Image Based Rendering (DIBR) techniques produce synthesized images which contain non-uniform geometric distortions affecting edge coherency. This type of distortion is challenging for common image quality metrics. Morphological filters maintain important geometric information, such as edges, across different resolution levels. In this paper, a morphological wavelet peak signal-to-noise ratio measure, MW-PSNR, based on morphological wavelet decomposition is proposed to tackle the evaluation of DIBR-synthesized images. It is shown that MW-PSNR achieves much higher correlation with human judgment compared to state-of-the-art image quality measures in this context.
Article
Full-text available
The depth-image-based rendering (DIBR) algorithms used for 3D video applications introduce new types of artifacts, mostly located around the disoccluded regions. As DIBR algorithms involve geometric transformations, most of them introduce non-uniform geometric distortions affecting edge coherency in the synthesized images. Such distortions are not handled efficiently by common image quality assessment metrics, which are primarily designed for other types of distortions. In order to better deal with the specific geometric distortions in DIBR-synthesized images, we propose a full-reference metric based on multi-scale image decomposition applying morphological filters. Using non-linear morphological filters in multi-scale image decomposition, important geometric information such as edges is maintained across different resolution levels. Edge distortion between the multi-scale representation subbands of the reference image and the DIBR-synthesized image is measured precisely using mean squared error. In this way, areas around edges that are prone to synthesis artifacts are emphasized in the metric score. Two versions of the morphological multi-scale metric have been explored: (a) the Morphological Pyramid Peak Signal-to-Noise Ratio metric (MP-PSNR), based on morphological pyramid decomposition, and (b) the Morphological Wavelet Peak Signal-to-Noise Ratio metric (MW-PSNR), based on morphological wavelet decomposition. The performance of the proposed metrics has been tested using two databases containing DIBR-synthesized images: the IRCCyN/IVC DIBR image database and the MCL-3D stereoscopic image database. The proposed metrics achieve significantly higher correlation with human judgment than state-of-the-art image quality metrics and than the tested metric dedicated to synthesis-related artifacts. The proposed metrics are computationally efficient, given that the morphological operators involve only integer numbers and simple computations like min, max, and sum, as well as a simple calculation of MSE. MP-PSNR has slightly better performance than MW-PSNR. It shows very good agreement with human judgment (Pearson's 0.894, Spearman's 0.770) when tested on the MCL-3D stereoscopic image database. We have demonstrated that PSNR agrees particularly well with human judgment when calculated between images at higher scales of morphological multi-scale representations. Consequently, simplified and in essence reduced versions of the multi-scale metrics are proposed, taking into account only the detail images at higher decomposition scales. The reduced version of MP-PSNR shows very good agreement with human judgment (Pearson's 0.904, Spearman's 0.863) on the IRCCyN/IVC DIBR image database.
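The sketch below illustrates the general idea shared by MP-PSNR and MW-PSNR: compute plain MSE/PSNR between corresponding levels of a morphological multi-scale decomposition. It uses grey-scale opening and dyadic downsampling as stand-ins for the papers' exact pyramid and wavelet operators, so it is an approximation of the approach rather than a reference implementation:

```python
# A simplified sketch of the morphological multi-scale PSNR idea, assuming
# grey-scale opening as the pyramid filter and dyadic downsampling.
import numpy as np
from scipy.ndimage import grey_opening

def morph_pyramid(img, levels=3, size=3):
    pyr = [img.astype(float)]
    for _ in range(levels):
        smoothed = grey_opening(pyr[-1], size=(size, size))
        pyr.append(smoothed[::2, ::2])      # dyadic downsampling
    return pyr

def mp_psnr_like(ref, dist, levels=3, peak=255.0):
    mses = [np.mean((r - d) ** 2)
            for r, d in zip(morph_pyramid(ref, levels),
                            morph_pyramid(dist, levels))]
    mse = np.mean(mses)                     # pool MSE across scales
    return 10.0 * np.log10(peak ** 2 / max(mse, 1e-12))

ref = np.random.randint(0, 256, (64, 64))
dist = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255)
print(mp_psnr_like(ref, dist))
```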
Article
Full-text available
We present the first end-to-end solution to create high-quality free-viewpoint video encoded as a compact data stream. Our system records performances using a dense set of RGB and IR video cameras , generates dynamic textured surfaces, and compresses these to a streamable 3D video format. Four technical advances contribute to high fidelity and robustness: multimodal multi-view stereo fusing RGB, IR, and silhouette information; adaptive meshing guided by automatic detection of perceptually salient areas; mesh tracking to create temporally coherent subsequences; and encoding of tracked textured meshes as an MPEG video stream. Quantitative experiments demonstrate geometric accuracy, texture fidelity, and encoding efficiency. We release several datasets with calibrated inputs and processed results to foster future research.
Article
Full-text available
The objective of this investigation was to develop and investigate methods for point cloud generation by image matching using aerial image data collected by quadrocopter type micro unmanned aerial vehicle (UAV) imaging systems. Automatic generation of high-quality, dense point clouds from digital images by image matching is a recent, cutting-edge step forward in digital photogrammetric technology. The major components of the system for point cloud generation are a UAV imaging system, an image data collection process using high image overlaps, and post-processing with image orientation and point cloud generation. Two post-processing approaches were developed: one of the methods is based on Bae Systems’ SOCET SET classical commercial photogrammetric software and another is built using Microsoft®’s Photosynth™ service available in the Internet. Empirical testing was carried out in two test areas. Photosynth processing showed that it is possible to orient the images and generate point clouds fully automatically without any a priori orientation information or interactive work. The photogrammetric processing line provided dense and accurate point clouds that followed the theoretical principles of photogrammetry, but also some artifacts were detected. The point clouds from the Photosynth processing were sparser and noisier, which is to a large extent due to the fact that the method is not optimized for dense point cloud generation. Careful photogrammetric processing with self-calibration is required to achieve the highest accuracy. Our results demonstrate the high performance potential of the approach and that with rigorous processing it is possible to reach results that are consistent with theory. We also point out several further research topics. Based on theoretical and empirical results, we give recommendations for properties of imaging sensor, data collection and processing of UAV image data to ensure accurate point cloud generation.
Article
Full-text available
In the research field of image processing, mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are extensively adopted as objective visual quality metrics, mainly because of their simplicity for calculation and optimization. However, it has been well recognized that these pixel-based difference measures correlate poorly with human perception. Inspired by existing works (1)-(3), in this paper we propose a novel algorithm which separately evaluates detail losses and additive impairments for image quality assessment. The detail loss refers to the loss of useful visual information which affects the content visibility, and the additive impairment represents redundant visual information whose appearance in the test image distracts the viewer's attention from the useful contents, causing an unpleasant viewing experience. To separate detail losses and additive impairments, a wavelet-domain decoupling algorithm is developed which can be used for a host of distortion types. Two HVS characteristics, i.e., the contrast sensitivity function and the contrast masking effect, are taken into account to approximate the HVS sensitivities. We propose two simple quality measures to correlate detail losses and additive impairments with visual quality, respectively. Based on the findings in (3) that observers judge low-quality images in terms of the ability to interpret the content, the outputs of the two quality measures are adaptively combined to yield the overall quality index. By conducting experiments based on five subjectively-rated image databases, we demonstrate that the proposed metric has better or similar performance in matching subjective ratings when compared with state-of-the-art image quality metrics.
Conference Paper
Full-text available
The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multiscale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.
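For reference, the multi-scale combination this method introduced is commonly written as follows, with luminance compared only at the coarsest scale M and contrast and structure compared at every scale j:

```latex
% Multi-scale SSIM: luminance term l at the coarsest scale M, contrast (c)
% and structure (s) terms at every scale j, with exponents weighting scales.
\[
\text{MS-SSIM}(x, y) =
  \bigl[l_M(x, y)\bigr]^{\alpha_M}
  \prod_{j=1}^{M} \bigl[c_j(x, y)\bigr]^{\beta_j}
                  \bigl[s_j(x, y)\bigr]^{\gamma_j}
\]
```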
Article
Full-text available
Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
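The single-scale structural similarity index between image patches x and y is standardly defined as below, with local means mu, variances sigma squared, covariance sigma_xy, and stabilizing constants C1 and C2:

```latex
% Standard single-scale SSIM index between patches x and y.
\[
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
\]
```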
Article
FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception. While many other metrics are available, our work provides the first practical treatment of these three central aspects of vision simultaneously. The complex interplay between spatial and temporal sensitivity across retinal locations is especially important for displays that cover a large field-of-view, such as Virtual and Augmented Reality displays, and associated methods, such as foveated rendering. Our metric is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification and contrast masking. It accounts for physical specification of the display (luminance, size, resolution) and viewing distance. To validate the metric, we collected a novel foveated rendering dataset which captures quality degradation due to sampling and reconstruction. To demonstrate our algorithm's generality, we test it on 3 independent foveated video datasets, and on a large image quality dataset, achieving the best performance across all datasets when compared to the state-of-the-art.
Article
Unmanned aerial vehicles (UAVs) equipped with integrated global navigation satellite systems/inertial navigation systems (GNSS/INS) together with cameras and/or LiDAR sensors are being widely used for topographic mapping in a variety of applications such as precision agriculture, coastal monitoring, and archaeological documentation. Integration of image-based and LiDAR point clouds can provide a comprehensive 3D model of the area of interest. For such integration, ensuring a good alignment between data from the different sources is critical. Although many works have been conducted on this topic, there is still a need for a rigorous integration approach that minimizes the discrepancy between camera and LiDAR data caused by inaccurate system calibration parameters and/or trajectory artifacts. This study proposes an automated tightly-coupled camera/LiDAR integration workflow for GNSS/INS-assisted UAV systems. The proposed strategy is conducted in three main steps. First, an image-based point cloud is generated using a LiDAR/GNSS/INS-assisted structure from motion (SfM) strategy. Then, feature correspondences between image-based and LiDAR point clouds are automatically identified. Finally, an integrated-bundle adjustment procedure including image points, LiDAR raw measurements, and GNSS/INS information is conducted to minimize the discrepancy between point clouds from different sensors while estimating system calibration parameters and refining the trajectory information. The proposed SfM strategy and integration framework are evaluated using five datasets. The SfM results show that using LiDAR data can facilitate feature matching and further increase the number of reconstructed 3D points. The experimental results also illustrate that the developed automated camera/LiDAR integration strategy is capable of accurately estimating system calibration parameters to achieve good alignment among camera/LiDAR data from single/multiple systems. An absolute accuracy in the range of 3–5 cm is achieved for the image/LiDAR point clouds after the integration process.
Article
In this paper, we focus on subjective and objective Point Cloud Quality Assessment (PCQA) in an immersive environment and study the effect of geometry and texture attributes in compression distortion. Using a Head-Mounted Display (HMD) with six degrees of freedom, we establish a subjective PCQA database, named the SIAT Point Cloud Quality Database (SIAT-PCQD). Our database consists of 340 distorted point clouds compressed by the MPEG point cloud encoder with the combination of 20 sequences and 17 pairs of geometry and texture quantization parameters. The impact of distorted geometry and texture attributes is further discussed in this paper. Then, we propose two projection-based objective quality evaluation methods, i.e., a weighted view projection based model and a patch projection based model. Our subjective database and findings can be used in point cloud processing, transmission, and coding, especially for virtual reality applications. The subjective dataset has been released publicly at https://dx.doi.org/10.21227/ad8d-7r28 and http://codec.siat.ac.cn/video_download_siat-pcqd.html.
Article
The point cloud has emerged as a promising media format to represent realistic 3D objects or scenes in applications such as virtual reality, teleportation, etc. How to accurately quantify subjective point cloud quality for application-driven optimization, however, is still a challenging and open problem. In this paper, we attempt to tackle this problem in a systematic manner. First, we produce a fairly large point cloud dataset in which ten popular point clouds are augmented with seven types of impairments (e.g., compression, photometry/color noise, geometry noise, scaling) at six distortion levels, and organize a formal subjective assessment with tens of subjects to collect mean opinion scores (MOS) for all 420 processed point cloud samples (PPCS). We then develop an objective metric that can accurately estimate subjective quality. Towards this goal, we project the 3D point cloud onto the six perpendicular image planes of a cube to obtain a color texture image and a corresponding depth image per plane, and aggregate image-based global features (e.g., Jensen-Shannon (JS) divergence) and local features (e.g., edge, depth, pixel-wise similarity, complexity) across all projected planes into a final objective index. Model parameters are fixed constants obtained by regression on a small, independent, previously published dataset. The proposed metric demonstrates state-of-the-art performance in predicting subjective point cloud quality compared with multiple full-reference and no-reference models, e.g., the weighted peak signal-to-noise ratio (PSNR), structural similarity (SSIM), feature similarity (FSIM) and natural image quality evaluator (NIQE). The dataset is made publicly accessible at http://smt.sjtu.edu.cn or http://vision.nju.edu.cn for all interested audiences.
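Here is a minimal sketch of the projection step described above, rendering a colored point cloud orthographically onto one cube face and keeping the nearest point per pixel; the feature aggregation and regression stages of the metric are not reproduced:

```python
# An illustrative sketch (not the authors' code) of projecting a colored
# point cloud orthographically onto one face of a bounding cube.
import numpy as np

def project_to_xy_plane(points, colors, resolution=256):
    """points: (N, 3) in [0, 1]^3; colors: (N, 3). Returns an RGB image."""
    img = np.zeros((resolution, resolution, 3))
    depth = np.full((resolution, resolution), np.inf)
    u = np.clip((points[:, 0] * (resolution - 1)).astype(int), 0, resolution - 1)
    v = np.clip((points[:, 1] * (resolution - 1)).astype(int), 0, resolution - 1)
    for ui, vi, zi, ci in zip(u, v, points[:, 2], colors):
        if zi < depth[vi, ui]:              # keep the point closest to the plane
            depth[vi, ui] = zi
            img[vi, ui] = ci
    return img

pts = np.random.rand(10000, 3)
cols = np.random.rand(10000, 3)
face = project_to_xy_plane(pts, cols)       # one of six cube-face projections
```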
Conference Paper
Point cloud is a 3D image representation that has recently emerged as a viable approach for advanced content modality in modern communication systems. In view of its wide adoption, quality evaluation metrics are essential. In this paper, we propose and assess a family of statistical dispersion measurements for the prediction of perceptual degradations. The employed features characterize local distributions of point cloud attributes reflecting topology and color. After associating local regions between a reference and a distorted model, the corresponding feature values are compared. The visual quality of a distorted model is then predicted by error pooling across individual quality scores obtained per region. The extracted features aim at capturing local changes, similarly to the well-known Structural Similarity Index. Benchmarking results using available datasets reveal best-performing attributes and features, under different neighborhood sizes. Finally, point cloud voxelization is examined as part of the process, improving the prediction accuracy under certain conditions.
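The following is a loose sketch of the dispersion-based idea described above, assuming k-nearest-neighbor regions, standard deviation as the dispersion statistic, and a luminance attribute; the paper's actual feature set, region association and pooling rules differ in detail:

```python
# A hedged sketch of comparing local dispersion statistics between a
# reference and a distorted point cloud. Approximation only.
import numpy as np
from scipy.spatial import cKDTree

def local_dispersion(points, attribute, k=12):
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)        # k-nearest-neighbor regions
    return attribute[idx].std(axis=1)       # per-region dispersion

def dispersion_similarity(ref_pts, ref_attr, dist_pts, dist_attr, k=12):
    d_ref = local_dispersion(ref_pts, ref_attr, k)
    # associate each reference region with its nearest distorted point
    _, nn = cKDTree(dist_pts).query(ref_pts, k=1)
    d_dis = local_dispersion(dist_pts, dist_attr, k)[nn]
    rel_err = np.abs(d_ref - d_dis) / (np.abs(d_ref) + 1e-8)
    return 1.0 - np.mean(np.minimum(rel_err, 1.0))  # 1 = identical statistics

ref = np.random.rand(5000, 3); ref_y = np.random.rand(5000)
dis = ref + np.random.normal(0, 0.002, ref.shape); dis_y = ref_y + 0.01
print(dispersion_similarity(ref, ref_y, dis, dis_y))
```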
Conference Paper
In this study, we explore the use of virtual reality to subjectively evaluate the visual quality of point cloud contents. To this aim, we develop the PointXR toolbox, a set of Unity applications that can host experiments under variants of interactive and passive evaluation protocols. An auxiliary tool to facilitate the configuration of the supported rendering schemes for point cloud visualization is provided as part of it. Our toolbox is employed to conduct two validating experiments in a virtual environment with 6 degrees of freedom. The purpose is to assess the performance of color encoders that are incorporated in the upcoming MPEG standard on point cloud compression. For this study, we convert a set of mesh models to point cloud contents, and form a high-quality cultural heritage repository, namely, the PointXR dataset. A comparison between the adopted protocols and the codecs' performance is carried out based on the ratings obtained from both experiments. Finally, interactivity patterns are extracted from behavioral data recorded during the evaluations, and the results are discussed. The PointXR toolbox, the PointXR dataset, and the experimental results are made publicly available.
Conference Paper
Recently, stakeholders in the area of multimedia representation and transmission have been looking at plenoptic technologies to improve immersive experience. Among these technologies, point clouds denote a volumetric information representation format with important applications in the entertainment, automotive and geographical mapping industries. There is some consensus that state-of-the-art solutions for efficient storage and communication of point clouds are far from satisfactory. This paper describes a study on point cloud quality evaluation, conducted in the context of JPEG Pleno to help define the test conditions of future compression proposals. A heterogeneous set of static point clouds, varying in number of points, geometric structure and represented scenarios, was selected and compressed using octree-pruning and a projection-based method, with three different levels of degradation. The models comprised both geometry and color information and were displayed using point sizes large enough to ensure the observation of watertight surfaces. The stimuli under assessment were presented to the observers on 2D displays as animations, after defining suitable camera paths to enable visualization of the models in their entirety and realistic consumption. The experiments were carried out in three different laboratories, and the subjective scores were used in a series of correlation studies to benchmark objective quality metrics and assess inter-laboratory consistency.
Conference Paper
The rise of immersive technologies has been recently fuelled by emerging applications which employ advanced content representations. Among various alternatives, point clouds denote a promising solution which has recently drawn a significant amount of interest, as witnessed by the latest activities of standardization committees. However, subjective and objective quality assessments for this type of content still remain an open problem. In this paper, we introduce a simple yet efficient objective metric to capture perceptual degradations of a distorted point cloud. Correlation with subjective quality assessment scores carried out by human subjects shows the proposed metric to be superior to the state of the art in terms of predicting the visual quality of point clouds under realistic types of distortions, such as octree-based compression.
Article
Faithfully evaluating perceptual image quality is an important task in applications such as image compression, image restoration and multimedia streaming. A good image quality assessment (IQA) model is expected to be not only effective (i.e., deliver high quality prediction accuracy) but also computationally efficient. Owing to the need to deploy image quality measurement tools in high-speed networks, the efficiency of an IQA metric is particularly important due to the increasing proliferation of high-volume visual data. Here we develop and explain a new effective and efficient IQA model, called gradient magnitude similarity deviation (GMSD). Although the image gradient has been employed in other IQA models, few have achieved favorable performance in terms of both accuracy and efficiency. The results are promising: we find that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images, combined with a novel pooling strategy (the standard deviation of the GMS map), accurately predicts perceptual image quality. The resulting GMSD algorithm is much faster than most state-of-the-art IQA methods, and delivers highly competitive prediction accuracy on benchmark IQA databases. Matlab code that implements GMSD can be downloaded at http://www4.comp.polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.
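A minimal sketch of the GMS/GMSD computation described above, using Prewitt gradients; the published method also specifies image downsampling and a particular constant c, so treat this as an approximation:

```python
# A minimal sketch of GMSD: pixel-wise gradient magnitude similarity (GMS)
# between reference and distorted images, pooled by standard deviation.
import numpy as np
from scipy.ndimage import prewitt

def gradient_magnitude(img):
    gx = prewitt(img.astype(float), axis=0)
    gy = prewitt(img.astype(float), axis=1)
    return np.hypot(gx, gy)

def gmsd(ref, dist, c=170.0):               # c chosen here for illustration
    g_r, g_d = gradient_magnitude(ref), gradient_magnitude(dist)
    gms = (2 * g_r * g_d + c) / (g_r ** 2 + g_d ** 2 + c)
    return gms.std()                         # deviation pooling: 0 = identical

ref = np.random.randint(0, 256, (64, 64))
dist = np.clip(ref + np.random.normal(0, 10, ref.shape), 0, 255)
print(gmsd(ref, dist))
```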
Article
To provide a convincing proof that a new method is better than the state of the art, computer graphics projects are often accompanied by user studies, in which a group of observers rank or rate results of several algorithms. Such user studies, known as subjective image quality assessment experiments, can be very time-consuming and do not guarantee to produce conclusive results. This paper is intended to help design efficient and rigorous quality assessment experiments and emphasise the key aspects of the results analysis. To promote good standards of data analysis, we review the major methods for data analysis, such as establishing confidence intervals, statistical testing and retrospective power analysis. Two methods of visualising ranking results together with the meaningful information about the statistical and practical significance are explored. Finally, we compare four most prominent subjective quality assessment methods: single-stimulus, double-stimulus, forced-choice pairwise comparison and similarity judgements. We conclude that the forced-choice pairwise comparison method results in the smallest measurement variance and thus produces the most accurate results. This method is also the most time-efficient, assuming a moderate number of compared conditions.
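As a small concrete example of one analysis step reviewed above, a t-based 95% confidence interval around a mean opinion score can be computed from a set of (here hypothetical) observer ratings:

```python
# A small example of establishing a confidence interval for a condition's MOS.
import numpy as np
from scipy import stats

ratings = np.array([4, 5, 3, 4, 4, 5, 3, 4])  # hypothetical observer ratings
mean = ratings.mean()
sem = stats.sem(ratings)                       # standard error of the mean
ci = stats.t.interval(0.95, df=len(ratings) - 1, loc=mean, scale=sem)
print(f"MOS = {mean:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```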
Article
Image quality assessment (IQA) aims to use computational models to measure the image quality consistently with subjective evaluations. The well-known structural similarity index brings IQA from pixel- to structure-based stage. In this paper, a novel feature similarity (FSIM) index for full reference IQA is proposed based on the fact that human visual system (HVS) understands an image mainly according to its low-level features. Specifically, the phase congruency (PC), which is a dimensionless measure of the significance of a local structure, is used as the primary feature in FSIM. Considering that PC is contrast invariant while the contrast information does affect HVS' perception of image quality, the image gradient magnitude (GM) is employed as the secondary feature in FSIM. PC and GM play complementary roles in characterizing the image local quality. After obtaining the local quality map, we use PC again as a weighting function to derive a single quality score. Extensive experiments performed on six benchmark IQA databases demonstrate that FSIM can achieve much higher consistency with the subjective evaluations than state-of-the-art IQA metrics.
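The pooling described above is commonly stated as follows, where the local similarity combines phase-congruency similarity S_PC and gradient-magnitude similarity S_G, weighted by the maximum phase congruency PC_m of the two images over the spatial domain Omega:

```latex
% FSIM pooling: local similarity S_L weighted by maximum phase congruency.
\[
\mathrm{FSIM} =
  \frac{\sum_{x \in \Omega} S_L(x)\, \mathrm{PC}_m(x)}
       {\sum_{x \in \Omega} \mathrm{PC}_m(x)},
\qquad
S_L(x) = \bigl[S_{PC}(x)\bigr]^{\alpha} \bigl[S_{G}(x)\bigr]^{\beta}
\]
```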
Article
Measurement of visual quality is of fundamental importance to numerous image and video processing applications. The goal of quality assessment (QA) research is to design algorithms that can automatically assess the quality of images or videos in a perceptually consistent manner. Image QA algorithms generally interpret image quality as fidelity or similarity with a "reference" or "perfect" image in some perceptual space. Such "full-reference" QA methods attempt to achieve consistency in quality prediction by modeling salient physiological and psychovisual features of the human visual system (HVS), or by signal fidelity measures. In this paper, we approach the image QA problem as an information fidelity problem. Specifically, we propose to quantify the loss of image information to the distortion process and explore the relationship between image information and visual quality. QA systems are invariably involved with judging the visual quality of "natural" images and videos that are meant for "human consumption." Researchers have developed sophisticated models to capture the statistics of such natural signals. Using these models, we previously presented an information fidelity criterion for image QA that related image quality with the amount of information shared between a reference and a distorted image. In this paper, we propose an image information measure that quantifies the information that is present in the reference image and how much of this reference information can be extracted from the distorted image. Combining these two quantities, we propose a visual information fidelity measure for image QA. We validate the performance of our algorithm with an extensive subjective study involving 779 images and show that our method outperforms recent state-of-the-art image QA algorithms by a sizeable margin in our simulations. The code and the data from the subjective study are available at the LIVE website.
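The information-fidelity ratio underlying this measure can be summarized as follows, where C denotes the source (reference) coefficients, E and F the perceived reference and distorted signals, and j indexes subbands; this is a compact restatement of the criterion that omits the details of the underlying Gaussian scale mixture model:

```latex
% VIF: information the brain can extract from the distorted image, relative
% to the information extractable from the reference image.
\[
\mathrm{VIF} =
  \frac{\sum_{j} I\!\left(C^{j}; F^{j} \mid s^{j}\right)}
       {\sum_{j} I\!\left(C^{j}; E^{j} \mid s^{j}\right)}
\]
```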
Toward a practical perceptual video quality metric
  • Li
Visualizing a million time series with the density line chart
  • D Moritz
  • D Fisher
Evaluation criteria for PCC (Point Cloud Compression)
  • Mekuria
SIAT-PCQD: Subjective point cloud quality database with 6DoF head-mounted display
  • X Wu
  • Y Zhang
  • C Fan
  • J Hou
  • S Kwong
Common test conditions for point cloud compression
  • S Schwarz
  • P A Chou
  • M Budagavi