Publications: 341
Reads: 39,233
Citations: 6,665
Publications (341)
With the increased interest in immersive experiences, point clouds emerged and were widely adopted as the first choice for representing 3D media. Besides the several distortions that can affect 3D content, from acquisition to rendering, efficient transmission of such volumetric content over traditional communication systems stands at the...
The development of the Metaverse industry produces many 360$^{\circ}$ images and videos. Transmitting these images and videos efficiently is key to the success of the Metaverse. Since the subject’s field of view is limited in the Metaverse, from the perception perspective, bit rates can be saved by focusing video encoding on salient regions. On different ways...
Dayong Wang, Xin Lu, Yu Sun, [...], Ce Zhu
To seamlessly adapt to time-varying network bandwidths, Quality Scalable High-Efficiency Video Coding (QSHVC) was developed. However, its coding process is overly complex, which seriously limits its application in real-time environments. Therefore, it is of great significance to study fast coding algorithms for QSHVC. In this paper, we...
Following the advent of immersive technologies and the increasing interest in representing interactive geometrical formats, 3D Point Clouds (PC) have emerged as a promising solution and an effective means to display 3D visual information. In addition to other challenges in immersive applications, objective and subjective quality assessments of compress...
With multi-layer encoding and inter-layer prediction, Spatial Scalable High Efficiency Video Coding (SSHVC) has extremely high coding complexity. It is crucial to speed up its coding to promote widespread and cost-effective SSHVC applications. Specifically, we first reveal that the average RD cost of Inter-layer Reference (ILR) mode is differe...
A Tone Mapping Operator (TMO) is required to render images with a High Dynamic Range (HDR) on media with limited dynamic capabilities. TMOs compress the dynamic range with the aim of preserving the visually perceptual cues of the scene. Previous literature has established the benefits of TMOs being semantic aware, understanding the content in the s...
In-loop filtering is used in video coding to process the reconstructed frame in order to remove blocking artifacts. With the development of convolutional neural networks (CNNs), CNNs have been explored for in-loop filtering considering it can be treated as an image de-noising task. However, in addition to being a distorted image, the reconstructed...
Point clouds are becoming essential in key applications with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. In this work, the state of the art for geometry and attribute compression methods with a focus on deep learning based approaches is reviewed. The challenges faced...
Tone mapping operators (TMO) are functions that map high dynamic range (HDR) images to a standard dynamic range (SDR), while aiming to preserve the perceptual cues of a scene that govern its visual quality. Despite the increasing number of studies on quality assessment of tone mapped images, current subjective quality datasets have relatively small...
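As an illustration of what "mapping HDR to SDR" means in the abstract above, the following is a minimal sketch of a global tone mapping curve. It uses the well-known Reinhard-style compression L / (1 + L) followed by a simple display gamma; this is a generic textbook operator chosen for illustration, not the method of any publication listed here, and the function name `tone_map` and the `gamma` parameter are assumptions.

```python
def tone_map(hdr_luminance, gamma=2.2):
    """Compress non-negative HDR luminance values to 8-bit SDR code values.

    Illustrative global operator (Reinhard-style curve), not a method
    taken from the publications on this page.
    """
    sdr = []
    for lum in hdr_luminance:
        if lum < 0:
            raise ValueError("luminance must be non-negative")
        compressed = lum / (1.0 + lum)         # maps [0, inf) into [0, 1)
        encoded = compressed ** (1.0 / gamma)  # simple display gamma
        sdr.append(round(255 * encoded))       # quantize to 8-bit code values
    return sdr


if __name__ == "__main__":
    # Luminance spanning eight orders of magnitude fits into 0..255.
    print(tone_map([0.0, 0.01, 1.0, 100.0, 1e6]))
```

The curve is monotonic, so the ordering of scene luminances is preserved while the range is compressed; content-adaptive or semantic-aware operators would replace the fixed curve with one that depends on the image.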
Computational image aesthetics aims at designing algorithmic approaches to perform aesthetic decisions, in a similar fashion as humans. In the past fifteen years, computational aesthetics has undergone unprecedented development, thanks to the availability of large annotated datasets and deep learning approaches, impacting many applications in multi...
Due to the expansion of High Dynamic Range (HDR) imaging applications into various aspects of daily life, an efficient retrieval system, tailored to this type of data, has become a pressing challenge. In this paper, the reliability of Convolutional Neural Network (CNN) descriptors and their use for HDR image retrieval are studied. The main...
Point clouds are essential for storage and transmission of 3D content. As they can entail significant volumes of data, point cloud compression is crucial for practical usage. Recently, point cloud geometry compression approaches based on deep neural networks have been explored. In this paper, we evaluate the ability to predict perceptual quality of...
Tone mapping operators (TMO) are pivotal in rendering High Dynamic Range (HDR) content on limited dynamic range media. Analysing the quality of tone mapped images depends on several objective factors and a combination of several subjective factors like aesthetics, fidelity etc. Objective Image quality assessment (IQA) metrics are often used to eval...
The interest in autonomous driving has continuously increased in the last two decades. However, to be adopted, such critical systems need to be safe. Concerning the perception of the ego-vehicle environment, the literature has investigated two different types of methods. On the one hand, traditional analytical methods generally rely on handcrafted...
Light fields increase the degree of realism and immersion of the visual experience by capturing a scene with a higher number of dimensions than conventional 2D imaging. On the other hand, higher dimensionality entails significant storage and transmission overhead compared to traditional video. Conventional coding schemes achieve high coding gain...
A Tone Mapping Operator (TMO) aims at reproducing the visual perception of a scene with a high dynamic range (HDR) on low dynamic range (LDR) media. TMOs have primarily aimed to preserve global perception by employing a model of human visual system (HVS), analysing perceptual attributes of each pixel and adjusting exposure at the pixel level. Prese...
This paper considers the problem of positive unlabeled (PU) learning. In this context, we propose a two-stage GAN-based model. More specifically, the main contribution is to incorporate a biased PU risk within the standard GAN discriminator loss function. In this manner, the discriminator is constrained to steer the generator to converge towards th...
Point clouds have been recognized as a crucial data structure for 3D content and are essential in a number of applications such as virtual and mixed reality, autonomous driving, cultural heritage, etc. In this paper, we propose a set of contributions to improve deep point cloud compression, i.e.: using a scale hyperprior model for entropy coding; e...
Scalable High Efficiency Video Coding (SHVC) is the extension of High Efficiency Video Coding (HEVC). In intra prediction for quality SHVC, a Coding Unit (CU) is recursively divided into a quadtree-based structure from the largest 64×64 CU to the smallest 8×8 CU, in which 35 intra prediction modes and Inter-Layer Reference (ILR) mode are checked to...
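To give a sense of why the exhaustive search described above is costly, here is a minimal counting sketch. It assumes a simplified model in which every CU at every quadtree depth, from 64×64 down to 8×8, is tested against all 35 intra modes plus the ILR mode; real encoders prune this search and handle prediction-unit partitioning separately, so the figure is only an upper bound under these assumptions. The function names are hypothetical.

```python
INTRA_MODES = 35  # intra prediction modes named in the abstract
ILR_MODES = 1     # Inter-Layer Reference mode


def cu_sizes(largest=64, smallest=8):
    """Return the CU edge lengths visited by the quadtree, largest first."""
    sizes = []
    size = largest
    while size >= smallest:
        sizes.append(size)
        size //= 2  # each split halves the CU edge length
    return sizes


def exhaustive_mode_checks(largest=64, smallest=8):
    """Count mode evaluations for one largest CU in the simplified model."""
    total = 0
    for size in cu_sizes(largest, smallest):
        cus_at_depth = (largest // size) ** 2  # 1, 4, 16, 64 CUs
        total += cus_at_depth * (INTRA_MODES + ILR_MODES)
    return total


if __name__ == "__main__":
    print(cu_sizes())                # [64, 32, 16, 8]
    print(exhaustive_mode_checks())  # 85 CUs * 36 modes = 3060
```

Fast algorithms of the kind the abstract motivates aim to skip most of these evaluations, for example by terminating the depth search early or discarding unlikely modes.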
Existing techniques to compress point cloud attributes leverage either geometric or video-based compression tools. We explore a radically different approach inspired by recent advances in point cloud representation learning. Point clouds can be interpreted as 2D manifolds in 3D space. Specifically, we fold a 2D grid onto a point cloud and we map at...
In this working note paper we present the contribution and results of the participation of the UPB-L2S team to the MediaEval 2019 Predicting Media Memorability Task. The task requires participants to develop machine learning systems able to predict automatically whether a video will be memorable for the viewer, and for how long (e.g., hours, or day...
With the surge of available but unlabeled data, Positive Unlabeled (PU) learning is becoming a thriving challenge. This work deals with this demanding task, for which recent GAN-based PU approaches have demonstrated promising results. Generative Adversarial Networks (GANs) are not hampered by deterministic bias or need for specific dimensionality. Howev...
This article mainly aims at motivating more investigations on self-supervised learning (SSL) perception techniques and their applications in autonomous driving. Such approaches are of broad interest as they can improve the performance of analytical methods, for example to perceive farther and more accurately, spatially or temporally. In the meantime, they...
Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantizati...
Scalable high efficiency video coding (SHVC) is an extension of high efficiency video coding (HEVC) that introduces multiple layers and inter-layer predictions, and thus significantly increases the coding complexity on top of the already complicated HEVC encoder. In inter prediction for quality SHVC, in order to determine the best possible mode a...
A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is essential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited range of HDR content and require an...
Modern holography for 3D imaging makes it possible to reconstruct all the parallaxes needed for truly immersive visualisation. Nevertheless, it involves a huge amount of data, which induces higher transmission and storage requirements. To gain more popularity and acceptance, digital holography demands the development of efficient coding schemes that provid...
Presents a listing of the SPS society Multimedia Signal Processing Technical Committee.
An intra coding algorithm with layer separation is proposed. This algorithm is designed on top of an adopted tool in VVC, called Block DPCM (BDPCM), and benefits from texture information in a neighborhood to derive intensity levels of background and foreground layers. This information is used to reduce large rate of residual in case of incorrect la...
Advances in transmission and compression technologies over the past decade have led to a shift of multimedia content towards cloud systems. Multiple copies of the same video are available through numerous distribution systems. Different compression levels, algorithms and resolutions are used to match the requirements of particular applications. As 4k di...
With the growing popularity of high dynamic range (HDR) imaging, efficient compression techniques are in demand, as HDR video typically entails a higher raw data rate than traditional video. For this purpose, we introduce a hybrid spatially and temporally constrained content-adaptive tone mapping operator (TMO) to convert the input HDR video into a to...
In this paper, we propose a new framework to optimally tone map the high dynamic range (HDR) content for image matching under drastic illumination variations. Since tone mapping operators (TMO) have traditionally been used for displaying HDR scenes, their design is suboptimal when used for computer vision tasks such as image matching. We address th...
Leveraging the properties of the human visual system, most well-designed video coding standards utilize rate–distortion optimization techniques by maximizing a fidelity cost function (e.g., peak signal-to-noise ratio, PSNR) under an available bit rate budget constraint. However, a huge amount of video data is consumed by computers rather than by...
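For reference, the PSNR fidelity measure mentioned above can be computed as 10·log10(MAX² / MSE), where MSE is the mean squared error between reference and distorted samples and MAX is the peak sample value. The sketch below is a plain-Python illustration on flat sample lists; the function name `psnr` and the default peak of 255 (8-bit content) are assumptions, and it is not code from the publication.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: distortion-free
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [10, 20, 30, 40]
    dist = [11, 21, 31, 41]  # every sample off by one, so MSE = 1
    print(round(psnr(ref, dist), 2))
```

Because PSNR depends only on sample-wise error, it says nothing about whether the distortion affects features that a machine vision task relies on, which is the gap the abstract points to.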
High dynamic range (HDR) imaging makes it possible to capture the full range of physical luminance of a real-world scene, and is expected to progressively replace traditional low dynamic range (LDR) pictures and videos. Despite the increasing popularity of HDR, very little attention has been devoted to new forensic problems that are characteristic of this conte...
Accurate prediction of local distortion visibility thresholds is critical in many image and video processing applications. Existing methods require accurate modeling of the human visual system, and are derived through psychophysical experiments with simple, artificial stimuli. These approaches, however, are difficult to generalize to natural ima...
Subjective quality assessment is considered a reliable method for quality assessment of distorted stimuli for several multimedia applications. The experimental methods can be broadly categorized into those that rate and rank stimuli. Although ranking directly provides an order of stimuli rather than a continuous measure of quality, the experimental...
In this paper we present a new complete detector–descriptor framework for local features extraction from grayscale texture-plus-depth images. It is designed by putting together a locally normalized binary descriptor and the popular AGAST corner detector modified to incorporate the depth map into the keypoint detection process. With these new local...
High Dynamic Range (HDR) image visual quality assessment in the absence of a reference image is challenging. This research topic has not been adequately studied, largely due to the high cost of HDR display devices. Nevertheless, HDR imaging technology has attracted increasing attention because it provides more realistic content, consistent with what t...