Jiahao Pang
Hong Kong University of Science and Technology | UST · Department of Electronic and Computer Engineering
Doctor of Philosophy
About
64
Publications
11,117
Reads
2,935
Citations
Publications (64)
A deep learning system typically suffers from a lack of reproducibility that is partially rooted in hardware or software implementation details. This irreproducibility leads to skepticism about deep learning technologies and can hinder their deployment in many applications. In this work, the irreproducibility issue is analyzed where deep lea...
There have been recent efforts to learn more meaningful representations via fixed-length codewords from mesh data, since a mesh serves as a more complete model of the underlying 3D shape than a point cloud. However, mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learni...
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints. The measurements suffer from both quantization and noise corruption. To improve quality, previous works denoise a point cloud a posteriori after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements di...
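The construction step this abstract mentions — turning depth measurements into a 3D point cloud — can be sketched with pinhole back-projection. This is a generic illustration, not the paper's pipeline; the function name `depth_to_points` and the intrinsics values are hypothetical:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, metres) to an N x 3 point cloud
    with a pinhole camera model; zero-depth pixels are dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # lateral coordinate from column index
    y = (v - cy) * z / fy          # vertical coordinate from row index
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]      # keep only valid (positive-depth) points

# Example: a flat plane 2 m from the camera.
depth = np.full((4, 4), 2.0)
pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(pts.shape)  # (16, 3)
```

Quantization and noise in `depth` propagate directly into the point positions, which is why the works above argue for enhancing the depth measurements before projection.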
Point cloud compression (PCC) is a key enabler for various 3-D applications, owing to the universality of the point cloud format. Ideally, 3D point clouds endeavor to depict object/scene surfaces that are continuous. Practically, as a set of discrete samples, point clouds are locally disconnected and sparsely distributed. This sparse nature is hind...
Point clouds are becoming essential in key applications, with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. In this work, the state of the art in geometry and attribute compression, with a focus on deep-learning-based approaches, is reviewed. The challenges faced...
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited,...
Representative image restoration problems include image denoising, image deblurring, image inpainting and image super-resolution. This chapter focuses on resolving this category of problems with graph spectral signal processing. It introduces a sufficiently general but simple image degradation model. The chapter emphasises approaches based on fo...
Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, AR/VR, etc. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds which have sparked...
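A standard way to quantify scene-flow accuracy is the mean end-point error (EPE) between predicted and ground-truth 3D flow vectors. The sketch below is a generic metric implementation, not tied to any particular method in these papers:

```python
import numpy as np

def epe(flow_pred, flow_gt):
    """Mean end-point error between predicted and ground-truth 3-D scene
    flow vectors (both N x 3 arrays): average Euclidean distance."""
    return float(np.linalg.norm(flow_pred - flow_gt, axis=1).mean())

pred = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
gt   = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(epe(pred, gt))  # 1.5
```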
Topology matters. Despite the recent success of point cloud processing with geometric deep learning, it remains arduous to capture the complex topologies of point cloud data with a learning model. Given a point cloud dataset containing objects with various genera or scenes with multiple objects, we propose an autoencoder, TearingNet, which tackles...
A 3D point cloud is often synthesized from depth measurements collected by sensors at different viewpoints. The acquired measurements are typically both coarse in precision and corrupted by noise. To improve quality, previous works denoise a synthesized 3D point cloud a posteriori after projecting the imperfect depth data onto 3D space. Instead, we...
Recently, it has become increasingly popular to equip mobile RGB cameras with Time-of-Flight (ToF) sensors for active depth sensing. However, for off-the-shelf ToF sensors, one must tackle two problems in order to obtain high-quality depth with respect to the RGB camera, namely 1) online calibration and alignment; and 2) complicated error correction for To...
With the development of dual-lens camera modules, depth information representing the third dimension of the captured scene becomes available for smartphones. It is estimated by stereo matching algorithms, taking as input the two views captured by dual-lens cameras at slightly different viewpoints. Depth-of-field rendering (also referred to as synt...
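Depth-of-field rendering of the kind described above can be illustrated by blurring each pixel with a window whose size grows with its distance from the focal plane. This is a deliberately naive sketch (real bokeh rendering handles occlusion and lens models far more carefully), and `synthetic_dof` with its parameters is a hypothetical name:

```python
import numpy as np

def synthetic_dof(image, depth, focus, max_blur=3):
    """Naive synthetic depth-of-field: each output pixel is the mean of a
    square window whose radius grows with |depth - focus|, so regions at
    the focal depth stay sharp while others blur."""
    h, w = image.shape
    out = np.empty_like(image, dtype=float)
    for i in range(h):
        for j in range(w):
            r = int(min(max_blur, round(abs(depth[i, j] - focus))))
            win = image[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = win.mean()
    return out
```

With `depth` equal to `focus` everywhere, the window radius is zero and the image passes through unchanged.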
We propose to combine the robustness merit of model-based approaches and the learning power of data-driven approaches for image restoration. Specifically, by integrating graph Laplacian regularization as a trainable module into a deep learning framework, we are less susceptible to overfitting than pure CNN-based approaches, achieving higher robustn...
3D point cloud - a new signal representation of volumetric objects - is a discrete collection of triples marking exterior object surface locations in 3D space. Conventional imperfect acquisition processes of 3D point cloud - e.g., stereo-matching from multiple viewpoint images or depth data acquired directly from active light sensors - imply non-ne...
Despite the recent success of stereo matching with convolutional neural networks (CNNs), it remains arduous to generalize a pre-trained deep stereo model to a novel domain. A major difficulty is to collect accurate ground-truth disparities for stereo pairs in the target domain. In this work, we propose a self-adaptation approach for CNN training, u...
Previous monocular depth estimation methods take a single view and directly regress the expected results. Though recent advances have been made by applying geometrically inspired loss functions during training, the inference procedure does not explicitly impose any geometric constraint. Therefore, these models rely purely on the quality of data and the...
We observed that recent state-of-the-art results on single-image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding their superior performance on static images, applying these models to videos is not only computationally intensive, it also suffers from performance degradation and flickering...
Leveraging on the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we...
Recent advances in visual tracking showed that deep Convolutional Neural Networks (CNN) trained for image classification can be strong feature extractors for discriminative trackers. However, due to the drastic difference between image classification and tracking, extra treatments such as model ensemble and feature engineering must be carried out t...
Most of the recent successful methods in accurate object detection and localization used some variant of R-CNN-style two-stage Convolutional Neural Networks (CNNs), where plausible regions are proposed in the first stage and then followed by a second stage for decision refinement. Despite the simplicity of training and the efficiency in deployment, the...
Inverse imaging problems are inherently under-determined, and hence, it is important to employ appropriate image priors for regularization. One recent popular prior—the graph Laplacian regularizer—assumes that the target pixel patch is smooth with respect to an appropriately chosen graph. However, the mechanisms and implications of imposing the gra...
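The graph Laplacian regularizer referred to here leads, with a least-squares data term, to a linear system: minimizing ||x - y||^2 + lambda * x^T L x gives (I + lambda * L) x = y. A minimal sketch with a hand-built chain graph; the weight matrix and lambda are chosen arbitrarily for illustration:

```python
import numpy as np

def laplacian_denoise(y, W, lam=0.5):
    """Graph-Laplacian-regularized denoising of a patch signal y:
    solve (I + lam * L) x = y, where L = D - W is the combinatorial
    Laplacian of the patch graph with weight matrix W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + lam * L, y)

# Chain graph over 4 pixels: a signal already smooth on the graph
# (constant) passes through unchanged, since L y = 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.ones(4)
print(laplacian_denoise(y, W))  # [1. 1. 1. 1.]
```

A non-smooth input is pulled toward graph smoothness instead, which is the "strength and direction" behavior these works analyze.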
Color filter array (CFA) interpolation, or 3-band demosaicking, is a process of interpolating the missing color samples in each band to reconstruct a full color image. In this paper, we are concerned with the challenging problem of multispectral demosaicking, where each band is significantly undersampled due to the increment in the number of bands....
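For context, classical single-sensor color imaging fills in each CFA channel independently. The sketch below is an averaging variant of bilinear demosaicking for an RGGB Bayer mosaic, using wrap-around borders for brevity; it is the 3-band baseline, not the multispectral method this paper proposes:

```python
import numpy as np

def bilinear_demosaic(mosaic):
    """Demosaic an RGGB Bayer mosaic (H x W, H and W even) into an
    H x W x 3 image: each channel is filled in by averaging the known
    samples of that channel in a 3 x 3 neighbourhood (wrap-around)."""
    h, w = mosaic.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True                          # R sites
    masks[0::2, 1::2, 1] = masks[1::2, 0::2, 1] = True   # G sites
    masks[1::2, 1::2, 2] = True                          # B sites
    out = np.zeros((h, w, 3))
    for c in range(3):
        vals = np.where(masks[:, :, c], mosaic, 0.0)
        num = np.zeros((h, w))
        den = np.zeros((h, w))
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):   # accumulate 3 x 3 neighbourhood sums
                num += np.roll(np.roll(vals, di, 0), dj, 1)
                den += np.roll(np.roll(masks[:, :, c].astype(float), di, 0), dj, 1)
        out[:, :, c] = num / np.maximum(den, 1)
    return out
```

The multispectral setting is harder precisely because each of the many bands is far more undersampled than the three Bayer bands here.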
Subpixel rendering technology increases the apparent resolution of an LCD/OLED screen by exploiting the physical property that a pixel is composed of individually addressable RGB subpixels. Due to the intrinsic trade-off between apparent luminance resolution and color-fringing artifacts, a common method of subpixel image assessment is subjec...
Image denoising is an under-determined problem, and hence it is important to define appropriate image priors for regularization. One recent popular prior is the graph Laplacian regularizer, where a given pixel patch is assumed to be smooth in the graph-signal domain. The strength and direction of the resulting graph-based filter are computed from t...
Image denoising is the most basic inverse imaging problem. As an under-determined problem, appropriate definition of image priors to regularize the problem is crucial. Among recent proposed priors for image denoising are: i) graph Laplacian regularizer where a given pixel patch is assumed to be smooth in the graph-signal domain; and ii) self-simila...
We present a novel framework for high bit-precision image acquisition and reconstruction. This framework is designed based on the inherent Markov property of image signals. In the acquisition stage, we add planned sensor distortion (PSD) to the analog image signal before feeding it to the A/D converters (or quantizers) in the camera sensor. In the reconstruction s...
In this work, we tackle the problem of coloring black-and-white images, i.e., image colorization. Existing image colorization algorithms can be categorized into two types: scribble-based colorization algorithms and example-based colorization algorithms. In contrast, we propose a hybrid scheme that combines the advantages of both categories. Give...
The High Efficiency Video Coding (HEVC) standard utilizes Z-scan order to process coding units (CUs). For intra prediction, this order cannot fully exploit the spatial correlation between adjacent CUs. After transform and quantization, the residue still contains substantial energy along edges, which consumes many bits for compression. To effectively reduce the r...
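The Z-scan order mentioned here is Morton order: the bits of a block's x and y coordinates are interleaved to give its traversal index. A small pure-Python illustration, independent of any HEVC codebase:

```python
def z_order_index(x, y, bits=8):
    """Morton (Z-scan) index of block coordinates (x, y): interleave the
    bits of x and y, x contributing the even bit positions and y the odd
    ones. HEVC traverses coding units inside a CTU in this order."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)       # x bit -> even position
        idx |= ((y >> b) & 1) << (2 * b + 1)   # y bit -> odd position
    return idx

# The first four blocks are visited (0,0), (1,0), (0,1), (1,1):
print([z_order_index(x, y) for y in (0, 1) for x in (0, 1)])  # [0, 1, 2, 3]
```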
Digital image matting is the determination of foreground color, background color, and an opacity value of each pixel for an input image. Inherently, matting is a highly ill-posed and under-constrained problem. Thus, some assumptions need to be made to resolve it. Inspired by closed-form matting and color clustering matting, in this work, we first d...
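The compositing model underlying matting is C = alpha * F + (1 - alpha) * B. Given per-pixel foreground and background estimates, alpha has a closed-form least-squares solution along the F - B direction. A per-pixel sketch (the function name is hypothetical; real solvers such as closed-form matting estimate F, B and alpha jointly):

```python
import numpy as np

def alpha_from_fb(c, f, b, eps=1e-8):
    """Least-squares alpha for one pixel: project the observed colour c
    onto the line from background b to foreground f (all 3-vectors),
    then clip to [0, 1]."""
    d = f - b
    a = float(np.dot(c - b, d) / (np.dot(d, d) + eps))
    return min(1.0, max(0.0, a))

c = np.array([0.5, 0.5, 0.5])   # observed colour, halfway between F and B
f = np.array([1.0, 1.0, 1.0])
b = np.array([0.0, 0.0, 0.0])
print(alpha_from_fb(c, f, b))   # ~0.5
```

The ill-posedness these papers mention is visible here: when f and b are close (d near zero), alpha is essentially unconstrained, which is why additional assumptions such as local color-line models are needed.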
This paper proposes an effective color halftone image visual cryptography method to embed a binary secret pattern into dot diffused color halftone images, Data Hiding by Dual Color Conjugate Dot Diffusion (DCCDD). DCCDD considers inter-channel correlation in order to restrict the embedding distortions between different channels within an acceptable...
Subpixel-based downsampling has shown its advantages over pixel-based downsampling in terms of preserving more spatial details along edges and generating sharper images, at the cost of a certain amount of color-fringing artifacts in the downsampled image. To balance sharpness and color-fringing artifacts, some algorithms are proposed to design op...
Advances in digital photography have resulted in large numbers of photos stored on personal computers. Photo album compression algorithms aim to save storage space and manage photos efficiently. In this paper, a general forest structure model involving a depth constraint for photo album compression is proposed, which further exploits the...
Subpixel-based image down-sampling is a class of methods that can provide improved apparent resolution of the down-scaled image compared to the pixel-based methods. The frequency characteristics of all possible subpixel-based down-sampling patterns for RGB vertical stripes are analytically studied in this paper. Our proposed algorithm reveals that...
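The direct form of subpixel-based downsampling for RGB vertical stripes can be sketched as a 3x horizontal downsample in which each output pixel's R, G and B come from three consecutive input columns. This is a generic illustration of the idea only, not the anti-aliasing filters these papers actually design:

```python
import numpy as np

def subpixel_downsample_3x(gray):
    """Direct subpixel-based 3x horizontal downsampling for an RGB
    vertical-stripe display: output pixel k takes its R from input
    column 3k, G from 3k+1 and B from 3k+2, so apparent luminance
    resolution stays close to the input's (at the cost of fringing)."""
    w = gray.shape[1] - gray.shape[1] % 3   # drop incomplete triple
    return np.stack([gray[:, 0:w:3],        # R subpixels
                     gray[:, 1:w:3],        # G subpixels
                     gray[:, 2:w:3]],       # B subpixels
                    axis=-1)
```

Pixel-based 3x downsampling would instead average each triple into a single grey value, discarding the extra horizontal detail the subpixels can carry.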
Subpixel-based downsampling generates images with higher apparent resolution at the expense of annoying color-fringing artifacts near strong edges. In this paper we propose two methods that find a balance in the trade-off between apparent resolution and color-fringing artifacts. The first method is called Chroma Replacing, in which the color-fringing ar...
Among existing interpolation methods, convolution-based methods can perform arbitrary-factor interpolation but usually produce blurry or jaggy results, while adaptive interpolation methods can reduce the blurry and jaggy artifacts but cannot handle arbitrary interpolation factors. In this paper we propose an arbitrary-factor adaptive inter...
Natural image matting refers to the problem of extracting regions of interest, such as the foreground object, from an image based on user inputs like scribbles or a trimap. More specifically, we need to estimate the color information of the background, the foreground and the corresponding opacity, which is an inherently ill-posed problem. Inspired by closed-form m...
Halftone image watermarking has been explored and developed rapidly over the past decade. However, there are still issues to be studied. This paper presents a data hiding method called Data Hiding by Dual Color Conjugate Error Diffusion (DHDCCED) to hide a binary secret pattern into two error diffused color halftone images, such that when the two c...
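For context, error-diffused halftoning binarizes each pixel and propagates the quantization error to its unvisited neighbours. The sketch below is plain Floyd-Steinberg diffusion, i.e. only the base halftoning step that schemes like DHDCCED build on, not the dual-color conjugate embedding itself:

```python
import numpy as np

def floyd_steinberg(img):
    """Floyd-Steinberg error diffusion: threshold a grayscale image in
    [0, 1] to {0, 1}, pushing each pixel's quantization error onto its
    right and lower neighbours with weights 7/16, 3/16, 5/16, 1/16."""
    f = img.astype(float).copy()
    h, w = f.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = 1.0 if f[i, j] >= 0.5 else 0.0
            e = f[i, j] - out[i, j]
            if j + 1 < w:               f[i, j + 1]     += e * 7 / 16
            if i + 1 < h and j > 0:     f[i + 1, j - 1] += e * 3 / 16
            if i + 1 < h:               f[i + 1, j]     += e * 5 / 16
            if i + 1 < h and j + 1 < w: f[i + 1, j + 1] += e * 1 / 16
    return out

# Mid-grey halftones to roughly 50% white dots (tone is preserved).
ht = floyd_steinberg(np.full((16, 16), 0.5))
print(abs(ht.mean() - 0.5) < 0.1)  # True
```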
Image colorization is the task of coloring a grayscale image with limited color cues. In this work, we present a novel method to perform image colorization using sparse representation. Our method first trains an over-complete dictionary in YUV color space. Then, taking a grayscale image and a small subset of color pixels as inputs, our method colorizes...
Least square regression has been widely used in image interpolation. Some existing regression-based interpolation methods used ordinary least squares (OLS) to formulate cost functions. These methods usually have difficulties at object boundaries because OLS is sensitive to outliers. Weighted least squares (WLS) is then adopted to solve the outlier...
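The OLS-versus-WLS distinction described here shows up clearly in a toy fit: one gross outlier drags the OLS line, while down-weighting it in WLS restores the fit. The data and weights below are made up purely for illustration:

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i @ beta)^2
    via the weighted normal equations. With w = 1 everywhere this is
    ordinary least squares (OLS)."""
    Wm = np.diag(w)
    return np.linalg.solve(X.T @ Wm @ X, X.T @ Wm @ y)

# A line y = 2x with one gross outlier at the last sample.
X = np.array([[x, 1.0] for x in range(5)])       # columns: slope, intercept
y = np.array([0.0, 2.0, 4.0, 6.0, 100.0])
ols = wls_fit(X, y, np.ones(5))
wls = wls_fit(X, y, np.array([1, 1, 1, 1, 1e-6]))
print(round(ols[0], 1))     # 20.4  (OLS slope yanked toward the outlier)
print(np.round(wls, 3))     # close to [2, 0] (true line recovered)
```

In the interpolation setting, the weights are typically chosen per pixel, e.g. to suppress samples across an object boundary, which is exactly where OLS-based cost functions go wrong.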