Dong Tian

Mitsubishi Electric Research Laboratories · Multimedia

About

75 Publications · 6,394 Reads · 1,410 Citations
Featured research
Article
Full-text available
Point clouds are becoming essential in key applications, with advances in capture technologies leading to large volumes of data; compression is thus essential for storage and transmission. In this work, the state of the art in geometry and attribute compression methods is reviewed, with a focus on deep-learning-based approaches. The challenges faced when compressing geometry and attributes are considered, with an analysis of current approaches to address them, their limitations, and the relations between deep learning and traditional methods. Current open questions in point cloud compression, existing solutions, and perspectives are identified and discussed. Finally, the links between existing point cloud compression research and open problems in adjacent fields, such as rendering in computer graphics, mesh compression, and point cloud quality assessment, are highlighted.
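As a minimal illustration of the octree-style geometry coding such surveys cover, here is a hedged sketch (our own code; the function names, grid resolution, and point data are hypothetical, not from any cited paper): points are quantized to a voxel grid and the occupied voxels are serialized as Morton codes, the traversal order octree coders follow.

```python
import numpy as np

def morton3d(q, bits):
    """Interleave bits of integer coordinates into one Morton code --
    the traversal order octree geometry coders follow."""
    code = np.zeros(len(q), dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            code |= ((q[:, axis] >> b) & 1) << (3 * b + axis)
    return code

def voxelize(points, bits=4):
    """Quantize points in [0, 1)^3 onto a 2^bits grid and keep the sorted,
    deduplicated Morton codes of occupied voxels (a real coder would
    entropy-code octree occupancy instead of storing raw codes)."""
    q = np.clip((points * (1 << bits)).astype(np.int64), 0, (1 << bits) - 1)
    return np.unique(morton3d(q, bits))

pts = np.array([[0.0, 0.0, 0.0], [0.99, 0.99, 0.99], [0.0, 0.0, 0.0]])
codes = voxelize(pts, bits=4)   # duplicate points collapse to one voxel
```

The lossy step is the quantization; deep-learning-based coders replace the hand-designed occupancy entropy model with a learned one.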
Article
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited, while Graph Signal Processing (GSP)---a fast-developing field in the signal processing community---enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data from low-level processing to high-level analysis. To further advance the research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.
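The spectral graph filtering at the core of GSP can be conveyed with a minimal sketch (the graph, signal, and cutoff below are illustrative assumptions, not from the paper): the signal is expanded in the eigenbasis of the graph Laplacian, where eigenvalues play the role of frequencies, and is attenuated per graph frequency.

```python
import numpy as np

# Illustrative 4-node path graph; nodes carry a scalar signal.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A           # combinatorial Laplacian L = D - A

# Graph Fourier basis: eigenvectors of L; eigenvalues act as frequencies.
lam, U = np.linalg.eigh(L)

x = np.array([1.0, 0.2, 0.9, 0.1])       # signal living on the nodes
x_hat = U.T @ x                           # graph Fourier transform

# Ideal low-pass spectral filter: keep the two lowest graph frequencies.
h = (lam <= lam[1] + 1e-9).astype(float)
y = U @ (h * x_hat)                       # filtered signal, back on the nodes
```

The quadratic form x.T @ L @ x measures how much a signal varies across edges, so low-pass filtering can only decrease it.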
Article
Full-text available
Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, and AR/VR. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds, which have sparked new research in 3D scene flow. Nevertheless, it remains challenging to extract scene flow from point clouds due to the sparsity and irregularity of typical point cloud sampling patterns. One major issue related to irregular sampling is identified as the randomness during point set abstraction/feature extraction, an elementary process in many flow estimation scenarios. A novel Spatial Abstraction with Attention (SA²) layer is accordingly proposed to alleviate the unstable abstraction problem. Moreover, a Temporal Abstraction with Attention (TA²) layer is proposed to rectify attention in the temporal domain, benefiting motions over a larger range. Extensive analysis and experiments verify the motivation and the significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), when compared to several state-of-the-art benchmarks for scene flow estimation.
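A generic attention-weighted abstraction (our illustrative stand-in, not the exact SA² layer) shows the underlying idea: replace sampling-sensitive hard pooling with a softmax-weighted aggregation over each center's nearest neighbors, so small perturbations of the point set perturb the features smoothly.

```python
import numpy as np

def attentive_set_abstraction(points, feats, centers, k=4):
    """Aggregate per-point features at each center with softmax attention
    weights derived from (negative squared) distances -- a smooth,
    illustrative alternative to max pooling over sampled neighbors."""
    out = np.zeros((len(centers), feats.shape[1]))
    for i, c in enumerate(centers):
        d2 = ((points - c) ** 2).sum(axis=1)
        nn = np.argsort(d2)[:k]          # k nearest neighbors
        logits = -d2[nn]                 # closer points score higher
        w = np.exp(logits - logits.max())
        w /= w.sum()                     # softmax attention weights
        out[i] = w @ feats[nn]           # attention-weighted aggregation
    return out

rng = np.random.default_rng(1)
points = rng.random((10, 3))             # toy point cloud
feats = rng.random((10, 2))              # per-point features
centers = points[:2]                     # abstraction centers
out = attentive_set_abstraction(points, feats, centers, k=4)
```

Because the weights form a convex combination, each output feature stays within the range of its neighbors' features, unlike max pooling, which can jump when the sampled neighbor set changes.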
Preprint
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited, while Graph Signal Processing (GSP)---a fast-developing field in the signal processing community---enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data from low-level processing to high-level analysis. To further advance the research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.

Publications (75)
Article
Full-text available
Point clouds are becoming essential in key applications with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. In this work, the state of the art for geometry and attribute compression methods with a focus on deep learning based approaches is reviewed. The challenges faced...
Article
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited,...
Article
Full-text available
Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, AR/VR, etc. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds which have sparked...
Preprint
Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, AR/VR, etc. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds which have sparked...
Preprint
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited,...
Preprint
Topology matters. Despite the recent success of point cloud processing with geometric deep learning, it remains arduous to capture the complex topologies of point cloud data with a learning model. Given a point cloud dataset containing objects with various genera or scenes with multiple objects, we propose an autoencoder, TearingNet, which tackles...
Article
We propose a deep autoencoder with graph topology inference and filtering to achieve compact representations of unorganized 3D point clouds in an unsupervised manner. Many previous works discretize 3D points to voxels and then use lattice-based methods to process and learn 3D spatial information; however, this leads to inevitable discretization err...
Preprint
We propose a deep autoencoder with graph topology inference and filtering to achieve compact representations of unorganized 3D point clouds in an unsupervised manner. The encoder of the proposed networks adopts similar architectures as in PointNet, which is a well-acknowledged method for supervised learning of 3D point clouds. The decoder of the pr...
Patent
Full-text available
A method processes a signal by first constructing a graph from the signal, and then determining a graph matrix from the graph and the signal. A Krylov-based subspace is determined based on the graph matrix and the signal. A filter for the Krylov subspace is determined. The filter transforms the signal to produce a filtered signal, which is output.
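The idea can be sketched in a few lines (our own illustrative code, not the patented method): build an orthonormal basis of the Krylov subspace generated by the graph matrix and the signal, apply the spectral filter to the small projected matrix, and lift the result back to the nodes.

```python
import numpy as np

def krylov_filter(M, x, m, f):
    """Approximate f(M) @ x using an order-m Krylov subspace
    span{x, Mx, ..., M^(m-1) x} built by Gram-Schmidt."""
    n = len(x)
    Q = np.zeros((n, m))
    Q[:, 0] = x / np.linalg.norm(x)
    for k in range(1, m):
        v = M @ Q[:, k - 1]
        v -= Q[:, :k] @ (Q[:, :k].T @ v)   # orthogonalize against the basis
        Q[:, k] = v / np.linalg.norm(v)
    H = Q.T @ M @ Q                        # small projected matrix
    lam, U = np.linalg.eigh(H)
    fH = U @ np.diag(f(lam)) @ U.T         # spectral filter on the subspace
    return Q @ fH @ Q.T @ x

# Illustrative graph matrix: Laplacian of a 4-node path, plus a node signal.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, -0.3, 0.8, 0.2])

y = krylov_filter(L, x, m=4, f=lambda lam: np.exp(-lam))  # heat-kernel smoothing
```

For m much smaller than the graph size, only m matrix-vector products are needed, which is the practical appeal of Krylov-based filtering.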
Article
Full-text available
An axiomatic approach to signal reconstruction is formulated, involving a sample consistent set and a guiding set, describing desired reconstructions. New frame-less reconstruction methods are proposed, based on a novel concept of a reconstruction set, defined as a shortest pathway between the sample consistent set and the guiding set. Existence an...
Patent
Full-text available
A method reconstructs a signal by sampling the signal using a sampling procedure to obtain an input signal. A consistent set is determined from the input signal including the first elements such that applying the sampling procedure to the first elements results in the input signal. According to the type of the signal, a guiding set is determined in...
Conference Paper
In contrast to still image analysis, motion information offers a powerful means to analyze video. In particular, motion trajectories determined from keypoints have become very popular in recent years for a variety of video analysis tasks, including search, retrieval and classification. Additionally, cloud-based analysis of media content has been ga...
Presentation
Full-text available
We propose signal reconstruction algorithms which utilize a guiding subspace that represents desired properties of reconstructed signals. Optimal reconstructed signals are shown to belong to a convex bounded set, called the "reconstruction" set. Iterative reconstruction algorithms, based on conjugate gradient methods, are developed to approximate...
Article
Full-text available
We study the problem of reconstructing a signal from its projection on a subspace. The proposed signal reconstruction algorithms utilize a guiding subspace that represents desired properties of reconstructed signals. We show that optimal reconstructed signals belong to a convex bounded set, called the "reconstruction" set. We also develop iterative...
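A POCS-style sketch conveys the guiding-subspace idea under illustrative assumptions (the DCT guiding subspace, the first-k-entries sampling, and the iteration count are ours, not the paper's): alternate between the set of signals consistent with the samples and the guiding subspace of smooth signals.

```python
import numpy as np

# Illustrative setup: length-16 signals, first 8 entries observed.
n, k = 16, 8
rng = np.random.default_rng(0)

# Guiding subspace: the 4 lowest-frequency DCT-II vectors (smooth signals).
t = np.arange(n)
G = np.stack([np.cos(np.pi * (t + 0.5) * j / n) for j in range(4)], axis=1)
G, _ = np.linalg.qr(G)                  # orthonormal basis of the subspace
P_guide = G @ G.T                       # orthogonal projector onto it

truth = G @ rng.standard_normal(4)      # smooth ground-truth signal
samples = truth[:k]                     # the observed samples

f = np.zeros(n)                         # alternating projections (POCS)
for _ in range(200):
    f[:k] = samples                     # project onto the sample-consistent set
    f = P_guide @ f                     # project onto the guiding subspace
f[:k] = samples                         # finish on the consistent set
```

Both projections are non-expansive toward any signal lying in both sets, so the iterate can only move closer to the ground truth here.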
Article
Full-text available
In 3D image/video acquisition, different views are often captured with varying noise levels across the views. In this paper, we propose a graph-based image enhancement technique that uses a higher quality view to enhance a degraded view. A depth map is utilized as auxiliary information to match the perspectives of the two views. Our method performs...
Article
In order to improve 3-D video coding efficiency, we propose methods to estimate rendered view distortion in synthesized views as a function of the depth map quantization error. Our approach starts by calculating the geometric error caused by the depth map error based on the camera parameters. Then, we estimate the rendered view distortion based on...
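For rectified cameras with the common 8-bit inverse-depth representation (an assumption about the depth format, not the paper's exact model), the rendering position error is linear in the depth-value error, as this sketch shows:

```python
# Hedged sketch: geometric (warping position) error in a synthesized view
# caused by a depth-map coding error, for rectified cameras storing 8-bit
# quantized inverse depth between z_near and z_far.
def disparity(v, f, b, z_near, z_far, levels=255):
    """Disparity (pixels) for 8-bit depth value v: d = f * b / z(v)."""
    inv_z = v / levels * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return f * b * inv_z

def geometric_error(dv, f, b, z_near, z_far, levels=255):
    """Horizontal position error induced by a depth-value error dv.
    Linear in dv because disparity is linear in the stored inverse depth."""
    return f * b * (1.0 / z_near - 1.0 / z_far) * dv / levels

# Example camera: focal length 1000 px, 5 cm baseline, scene depth 1-10 m.
f, b, z_near, z_far = 1000.0, 0.05, 1.0, 10.0
err = geometric_error(dv=4, f=f, b=b, z_near=z_near, z_far=z_far)
```

Once the geometric error is known, rendered view distortion can be estimated by combining it with the local texture characteristics, which is the second step the abstract describes.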
Patent
An image for a virtual view of a scene is generated based on a set of texture images and a corresponding set of depth images acquired of the scene. A set of candidate depths associated with each pixel of a selected image is determined. For each candidate depth, a cost that estimates a synthesis quality of the virtual image is determined. The candid...
Article
In stereo video applications, the quality of the two views may vary based on different camera capturing conditions and setup, compression/transmission, and sensor noise. Although some studies show that the perceived video quality may not be significantly affected by the lower quality view, maintaining a similar video quality is still desired in ord...
Patent
Full-text available
A disparity vector for a pixel in a right image corresponding to a pixel in a left image in a pair of stereo images is determined. The disparity vector is based on a horizontal disparity and a vertical disparity and the pair of stereo images is unrectified. First, a set of candidate horizontal disparities is determined. For each candidate horizonta...
Article
Full-text available
Advanced multiview video systems are able to generate intermediate viewpoints of a 3-D scene. To enable low-complexity free view generation, texture and its associated depth are used as input data for each viewpoint. To improve the coding efficiency of such content, view synthesis prediction (VSP) is proposed to further reduce interview redundancy...
Conference Paper
Full-text available
Depth images are often presented at a lower spatial resolution, either due to limitations in the acquisition of the depth or to increase compression efficiency. As a result, upsampling low-resolution depth images to a higher spatial resolution is typically required prior to depth image based rendering. In this paper, depth enhancement and up-sampli...
Patent
Full-text available
A quality of a virtual image for a synthetic viewpoint in a 3D scene is determined. The 3D scene is acquired by texture images, and each texture image is associated with a depth image acquired by a camera arranged at a real viewpoint. A texture noise power is based on the acquired texture images and reconstructed texture images corresponding to a v...
Article
Full-text available
We propose an analytical model to estimate the synthesized view quality in 3D video. The model relates errors in the depth images to the synthesis quality, taking into account texture image characteristics, texture image quality and the rendering process. Specifically, we decompose the synthesis distortion into texture-error induced distortion and...
Conference Paper
Modern, state-of-the-art disparity estimation techniques are able to very accurately estimate the disparity for a wide variety of scene types. However all of these methods assume that the input images are epipolar rectified. When an image pair is not rectified, it must be pre-processed before any estimation can be done. In this paper we propose a d...
Conference Paper
Full-text available
Depth-based 3D formats are currently being developed as extensions to both AVC and HEVC standards. The availability of depth information facilitates the generation of intermediate views for advanced 3D applications and displays, and also enables more efficient coding of the multiview input data through view synthesis prediction techniques. This pap...
Conference Paper
We propose an analytical model to estimate the synthesized view quality in 3D video. Specifically, we estimate the depth-error induced distortion using an approach that combines frequency and spatial domain analysis. We also propose to decompose the spatial-variant video signals into gradient-based representations to capture the interaction between...
Conference Paper
View synthesis prediction provides an effective way to reduce inter-view redundancy of multiview video in addition to conventional disparity compensated prediction. Traditional forward warping techniques incur high complexity since an entire picture is typically warped from one viewpoint to another. To reduce this complexity, block-based backward w...
Conference Paper
In the state-of-the-art HEVC-based 3D video codec, multiview video plus associated depth maps are used. In order to achieve better coding performance, instead of the conventional sum of squared errors (SSE), view synthesis optimization (VSO) is proposed and included in the anchor encoder software to calculate view synthesis distortion in rate-distortio...
Conference Paper
Advanced multiview video systems are able to generate intermediate viewpoints of a 3D scene. In addition to the texture content, corresponding depth is associated with each viewpoint. To improve the coding efficiency of such content, view synthesis prediction can be used to further reduce inter-view redundancy in addition to traditional disparity c...
Conference Paper
Traditional multi-view coding (MVC) systems compress the texture content captured from different view points, where temporal and inter-view redundancy are exploited to improve MVC coding efficiency. The advanced 3D video coding systems compress both the texture content and its corresponding depth captured from different view points, known as multiv...
Article
Full-text available
Depth map images are characterized by large homogeneous areas and strong edges. It has been observed that efficient compression of the depth map is achieved by applying a down-sampling operation prior to encoding. However, since high resolution depth maps are also needed for depth-based 3D coding tools, such as view synthesis prediction, an up-samp...
Article
Standardization of a new set of 3D formats has been initiated with the goal of improving the coding of stereo and multiview video, and also facilitating the generation of multiview output needed for auto-stereoscopic displays. Part of this effort will develop 3D and multiview extensions of the emerging standard for High Efficiency Video Coding (HEV...
Conference Paper
We propose an analytical model to estimate the rendering quality in 3D video. The model relates errors in the depth images to the rendering quality, taking into account texture image characteristics, texture image quality, the camera configuration and the rendering process. Specifically, we derive position (disparity) errors from the depth errors,...
Conference Paper
Full-text available
The quality of the depth map is crucial for depth image based rendering (DIBR) which enables a variety of advanced 3D video related applications such as perceived depth adjustment for stereoscopic video and intermediate view generation for multiview auto-stereoscopic displays. However, the input depth map for DIBR may suffer from errors and noise,...
Conference Paper
Full-text available
Depth map images are characterized by large homogeneous areas and strong edges. It has been observed that efficient compression of the depth map is achieved by applying a down-sampling operation prior to encoding. However, since high resolution depth maps are typically required for view synthesis, an up-sampling method that is able to recover the l...
Conference Paper
Full-text available
View synthesis is an essential function for a number of 3D video applications including free-viewpoint navigation and view generation for auto-stereoscopic displays. Depth Image Based Rendering (DIBR) techniques are typically applied for this purpose. However, the quality of the rendered views is very sensitive to the quality of the depth image. In...
Article
3D Video (3DV) with depth-image-based view synthesis is a promising candidate for next-generation broadcasting applications. However, the synthesized views in 3DV are often contaminated by annoying artifacts, particularly around object boundaries, due to imperfect depth maps (e.g., produced by state-of-the-art stereo matching algorithms or c...
Article
Full-text available
With the development of 3D display and interactive multimedia systems, new 3D video applications, such as 3DTV and Free Viewpoint Video, are attracting significant interests. In order to enable these new applications, new data formats including captured 2D video sequences and corresponding depth maps have been proposed. Compared to conventional vid...
Conference Paper
Depth maps estimated using stereo matching between frames from different video views typically exhibit false contours and noisy artifacts around object boundaries. In this paper, iterative joint multilateral filtering is proposed to deal with these artifacts. The proposed filter consists of multiple filter kernels. Knowing that the estimated depth...
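A single-kernel, non-iterative cross/joint bilateral pass conveys the core mechanism (a simplified sketch with illustrative parameters, not the proposed multi-kernel iterative filter): depth values are averaged with spatial weights plus range weights taken from the aligned texture image, so depth edges snap to texture edges.

```python
import numpy as np

def joint_bilateral(depth, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Smooth `depth` with Gaussian spatial weights and range weights
    computed from the aligned `guide` (texture) image, so filtering does
    not blur across guide edges."""
    h, w = depth.shape
    out = np.zeros_like(depth, dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            dy, dx = np.mgrid[y0:y1, x0:x1]
            ws = np.exp(-((dy - y) ** 2 + (dx - x) ** 2) / (2 * sigma_s ** 2))
            wr = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2)
                        / (2 * sigma_r ** 2))
            wgt = ws * wr
            out[y, x] = (wgt * depth[y0:y1, x0:x1]).sum() / wgt.sum()
    return out

# Synthetic example: guide has a vertical edge; depth has a speckle at (2, 1).
guide = np.zeros((8, 8))
guide[:, 4:] = 1.0
depth = guide.copy()
depth[2, 1] = 0.8                        # noisy depth sample on the left
out = joint_bilateral(depth, guide)
```

The speckle is suppressed by its agreeing-guide neighbors, while the depth step aligned with the guide edge survives untouched.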
Conference Paper
During view synthesis based on depth maps, also known as Depth-Image-Based Rendering (DIBR), annoying artifacts are often generated around foreground objects, yielding the visual effect that slim silhouettes of foreground objects are scattered into the background. These artifacts are referred to as boundary noises. We investigate the cause of bound...
Conference Paper
In order to enable new video applications such as 3DTV and free-viewpoint video, new data formats including both 2D video sequences and corresponding depth map sequences have been proposed. One major characteristic making the depth maps different from video frames is that they typically consist of homogeneous areas separated by sharp edges represen...
Article
New data formats including 2D video and the corresponding depth maps enable new video applications in which virtual views can be rendered, such as 3DTV and free-viewpoint video (FVV). Different from video frames, depth maps typically consist of homogeneous areas (with no textures) separated by sharp edges representing depth value changes such as be...
Article
New data formats that include both video and the corresponding depth maps, such as multiview plus depth (MVD), enable new video applications in which intermediate video views (virtual views) can be generated using the transmitted/stored video views (reference views) and the corresponding depth maps as inputs. We propose a depth map coding method ba...
Article
Full-text available
In its Release 6, the Third Generation Partnership Project (3GPP) is defining a new service known as Multimedia Broadcast/Multicast (MBMS) that enables a number of new applications. Due to its nature, no feedback link from the receiver to the sender exists in MBMS. Hence no retransmission techniques can be employed to cope with the underlying e...
Conference Paper
In 3D video (3DV) applications, a reduced number of views plus depth maps are transmitted or stored. When there is a need to render virtual views in between the actual views, the technique of depth image based rendering (DIBR) can be used to generate the intermediate views. To address the problem of noisy depth information in 3DV systems, we propos...
Conference Paper
Video representations that support view synthesis based on depth maps, such as multiview plus depth (MVD), have been recently proposed, raising interest in efficient tools for depth map coding. In this paper, we derive a new distortion metric that takes into consideration camera parameters and global video characteristics in order to quantify the e...
Article
To facilitate new video applications such as three-dimensional video (3DV) and free-viewpoint video (FVV), multiple view plus depth format (MVD), which consists of both video views and the corresponding per-pixel depth images, is being investigated. Virtual views can be generated using depth image based rendering (DIBR), which takes video and the c...
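The DIBR step can be sketched for rectified cameras (the focal length, baseline, and single-channel texture below are illustrative; practical DIBR adds sub-pixel interpolation, blending, and hole filling): each pixel is shifted by its disparity d = alpha * f * b / z, with a z-buffer resolving occlusions.

```python
import numpy as np

def dibr_warp(texture, depth, f, b, alpha):
    """Forward-warp a texture into a virtual view at fractional baseline
    `alpha`, keeping the closest point at each target column (z-buffer).
    Unfilled targets (disocclusion holes) stay NaN."""
    h, w = texture.shape
    out = np.full((h, w), np.nan)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            d = alpha * f * b / depth[y, x]     # disparity from depth
            xt = int(round(x + d))              # target column
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]       # closest point wins
                out[y, xt] = texture[y, x]
    return out

# 1x8 toy view: background at z=4, one foreground pixel at z=1.
texture = np.arange(8, dtype=float).reshape(1, 8)
depth = np.full((1, 8), 4.0)
depth[0, 2] = 1.0
warped = dibr_warp(texture, depth, f=4.0, b=1.0, alpha=1.0)
```

The foreground pixel jumps four columns while the background shifts by one, leaving NaN holes where the background was disoccluded.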
Article
Full-text available
This paper investigates the transmission of H.264/AVC video in the 3GPP Multimedia Broadcast/Multicast Streaming service (MBMS). Application-layer forward error correction (FEC) codes are used to combat transmission errors in the radio access network. In this FEC protection scheme, the media RTP stream is organized into source blocks spanning man...
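A single XOR repair symbol per source block is the simplest instance of such application-layer FEC (MBMS actually standardizes Raptor codes; this sketch only conveys the source-block idea, and its packet contents are illustrative):

```python
def add_parity(source_block):
    """Append one XOR repair symbol to a source block of equal-size
    byte strings; it can rebuild any single lost symbol."""
    parity = bytearray(len(source_block[0]))
    for pkt in source_block:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return source_block + [bytes(parity)]

def recover(block_with_gap):
    """Rebuild the single missing symbol (marked None) by XOR-ing the rest."""
    missing = block_with_gap.index(None)
    size = len(next(p for p in block_with_gap if p is not None))
    rec = bytearray(size)
    for pkt in block_with_gap:
        if pkt is not None:
            for i, byte in enumerate(pkt):
                rec[i] ^= byte
    out = list(block_with_gap)
    out[missing] = bytes(rec)
    return out

block = add_parity([b"video", b"frame", b"data!"])  # 3 source + 1 repair
lossy = list(block)
lossy[1] = None                                     # one packet lost
restored = recover(lossy)
```

Raptor codes generalize this to many repair symbols per block, trading overhead for tolerance of longer loss bursts without any feedback channel.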
Article
Full-text available
It is well-known that the problem of addressing heterogeneous networks in multicast c