Binh-Son Hua

Binh-Son Hua
  • Research Scientist at VinAI Research

About

86
Publications
11,745
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
4,177
Citations
Current institution
VinAI Research
Current position
  • Research Scientist
Additional affiliations
February 2013 - December 2014
National University of Singapore
Position
  • Research Assistant
January 2009 - January 2015
National University of Singapore
Position
  • PhD Student

Publications

Publications (86)
Preprint
Full-text available
Diffusion models have shown great promise in synthesizing visually appealing images. However, it remains challenging to condition the synthesis at a fine-grained level, for instance, synthesizing image pixels following some generic color pattern. Existing image synthesis methods often produce contents that fall outside the desired pixel conditions....
Preprint
Full-text available
We propose a new view synthesis method via synthesizing a 3D neural field from both single or few-view input images. To address the ill-posed nature of the image-to-3D generation problem, we devise a two-stage method that involves a reconstruction model and a diffusion model for view synthesis. Our reconstruction model first lifts one or more input...
Preprint
Full-text available
We propose SharpDepth, a novel approach to monocular metric depth estimation that combines the metric accuracy of discriminative depth estimation methods (e.g., Metric3D, UniDepth) with the fine-grained boundary sharpness typically achieved by generative methods (e.g., Marigold, Lotus). Traditional discriminative models trained on real-world data w...
Preprint
Full-text available
Existing Score Distillation Sampling (SDS)-based methods have driven significant progress in text-to-3D generation. However, 3D models produced by SDS-based methods tend to exhibit over-smoothing and low-quality outputs. These issues arise from the mode-seeking behavior of current methods, where the scores used to update the model oscillate between...
Preprint
Instance segmentation on 3D point clouds (3DIS) is a longstanding challenge in computer vision, where state-of-the-art methods are mainly based on full supervision. As annotating ground truth dense instance masks is tedious and expensive, solving 3DIS with weak supervision has become more practical. In this paper, we propose GaPro, a new instance s...
Preprint
In this paper, we address the problem of conditional scene decoration for 360-degree images. Our method takes a 360-degree background photograph of an indoor scene and generates decorated images of the same scene in the panorama view. To do this, we develop a 360-aware object layout generator that learns latent object vectors in the 360-degree view...
Preprint
In this paper, we propose a new method for mapping a 3D point cloud to the latent space of a 3D generative adversarial network. Our generative model for 3D point clouds is based on SP-GAN, a state-of-the-art sphere-guided 3D point cloud generator. We derive an efficient way to encode an input 3D point cloud to the latent space of the SP-GAN. Our po...
Preprint
Full-text available
Monte Carlo integration is typically interpreted as an estimator of the expected value using stochastic samples. There exists an alternative interpretation in calculus where Monte Carlo integration can be seen as estimating a \emph{constant} function -- from the stochastic evaluations of the integrand -- that integrates to the original integral. Th...
Chapter
In this paper, we conduct a study on the state-of-the-art methods for text-to-image synthesis and propose a framework to evaluate these methods. We consider syntheses where an image contains a single or multiple objects. Our study outlines several issues in the current evaluation pipeline: (i) for image quality assessment, a commonly used metric, e...
Preprint
Recently, great progress has been made in 3D deep learning with the emergence of deep neural networks specifically designed for 3D point clouds. These networks are often trained from scratch or from pre-trained models learned purely from point cloud data. Inspired by the success of deep learning in the image domain, we devise a novel pre-training t...
Preprint
Full-text available
High dynamic range (HDR) imaging is an indispensable technique in modern photography. Traditional methods focus on HDR reconstruction from multiple images, solving the core problems of image alignment, fusion, and tone mapping, yet having a perfect solution due to ghosting and other visual artifacts in the reconstruction. Recent attempts at single-...
Chapter
We present a novel method for few-shot video classification, which performs appearance and temporal alignments. In particular, given a pair of query and support videos, we conduct appearance alignment via frame-level feature matching to achieve the appearance similarity score between the videos, while utilizing temporal order-preserving priors for...
Chapter
Furnishing and rendering indoor scenes has been a long-standing task for interior design, where artists create a conceptual design for the space, build a 3D model of the space, decorate, and then perform rendering. Although the task is important, it is tedious and requires tremendous effort. In this paper, we introduce a new problem of domain-speci...
Chapter
Full-text available
Object reconstruction from 3D point clouds has achieved impressive progress in the computer vision and computer graphics research field. However, reconstruction from time-varying point clouds (a.k.a. 4D point clouds) is generally overlooked. In this paper, we propose a new network architecture, namely RFNet-4D, that jointly reconstruct objects and...
Preprint
Full-text available
Architectural photography is a genre of photography that focuses on capturing a building or structure in the foreground with dramatic lighting in the background. Inspired by recent successes in image-to-image translation methods, we aim to perform style transfer for architectural photographs. However, the special composition in architectural photog...
Preprint
Full-text available
We present a novel method for few-shot video classification, which performs appearance and temporal alignments. In particular, given a pair of query and support videos, we conduct appearance alignment via frame-level feature matching to achieve the appearance similarity score between the videos, while utilizing temporal order-preserving priors for...
Article
Monte Carlo integration is typically interpreted as an estimator of the expected value using stochastic samples. There exists an alternative interpretation in calculus where Monte Carlo integration can be seen as estimating a constant function---from the stochastic evaluations of the integrand---that integrates to the original integral. The integra...
Preprint
Full-text available
In this work, we propose to use out-of-distribution samples, i.e., unlabeled samples coming from outside the target classes, to improve few-shot learning. Specifically, we exploit the easily available out-of-distribution samples to drive the classifier to avoid irrelevant features by maximizing the distance from prototypes to out-of-distribution sa...
Article
Full-text available
3D point clouds deep learning is a promising field of research that allows a neural network to learn features of point clouds directly, making it a robust tool for solving 3D scene understanding tasks. While recent works show that point cloud convolutions can be invariant to translation and point permutation, investigations of the rotation invarian...
Preprint
Object reconstruction from 3D point clouds has achieved impressive progress in the computer vision and computer graphics research field. However, reconstruction from time-varying point clouds (a.k.a. 4D point clouds) is generally overlooked. In this paper, we propose a new network architecture, namely RFNet-4D, that jointly reconstructs objects and...
Preprint
Full-text available
Medical image analysis using deep learning has recently been prevalent, showing great performance for various downstream tasks including medical image segmentation and its sibling, volumetric image segmentation. Particularly, a typical volumetric segmentation network strongly relies on a voxel grid representation which treats volumetric data as a s...
Preprint
Full-text available
Medical image segmentation has been so far achieving promising results with Convolutional Neural Networks (CNNs). However, it is arguable that in traditional CNNs, its pooling layer tends to discard important information such as positions. Moreover, CNNs are sensitive to rotation and affine transformation. Capsule network is a data-efficient networ...
Preprint
3D point clouds deep learning is a promising field of research that allows a neural network to learn features of point clouds directly, making it a robust tool for solving 3D scene understanding tasks. While recent works show that point cloud convolutions can be invariant to translation and point permutation, investigations of the rotation invarian...
Preprint
Full-text available
Capsule network is a recent new deep network architecture that has been applied successfully for medical image segmentation tasks. This work extends capsule networks for volumetric medical image segmentation with self-supervised learning. To improve on the problem of weight initialization compared to previous capsule networks, we leverage self-supe...
Chapter
Full-text available
Medical image analysis using deep learning has recently been prevalent, showing great performance for various downstream tasks including medical image segmentation and its sibling, volumetric image segmentation. Particularly, a typical volumetric segmentation network strongly relies on a voxel grid representation which treats volumetric data as a s...
Chapter
Full-text available
Medical image segmentation has been so far achieving promising results with Convolutional Neural Networks (CNNs). However, it is arguable that in traditional CNNs, its pooling layer tends to discard important information such as positions. Moreover, CNNs are sensitive to rotation and affine transformation. Capsule network is a data-efficient networ...
Preprint
Furnishing and rendering an indoor scene is a common but tedious task for interior design: an artist needs to observe the space, create a conceptual design, build a 3D model, and perform rendering. In this paper, we introduce a new problem of domain-specific image synthesis using generative modeling, namely neural scene decoration. Given a photogra...
Preprint
Full-text available
Network pruning is an effective method to reduce the computational expense of over-parameterized neural networks for deployment on low-resource systems. Recent state-of-the-art techniques for retraining pruned networks such as weight rewinding and learning rate rewinding have been shown to outperform the traditional fine-tuning technique in recover...
Article
Monte Carlo integration is an efficient method to solve a high-dimensional integral in light transport simulation, but it typically produces noisy images due to its stochastic nature. Many existing methods, such as image denoising and gradient-domain reconstruction, aim to mitigate this noise by introducing some form of correlation among pixels. Wh...
Preprint
With the recent developments of convolutional neural networks, deep learning for 3D point clouds has shown significant progress in various 3D scene understanding tasks including 3D object recognition. In a safety-critical environment, it is however not well understood how such neural networks are vulnerable to adversarial examples. In this work, we...
Preprint
Recent advances in deep learning for 3D point clouds have shown great promises in scene understanding tasks thanks to the introduction of convolution operators to consume 3D point clouds directly in a neural network. Point cloud data, however, could have arbitrary rotations, especially those acquired from 3D scanning. Recent works show that it is p...
Article
Full-text available
In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative tha...
Preprint
In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative tha...
Conference Paper
Deep learning requires availability of massive 3D data. How to acquire 3D scenes efficiently?
Conference Paper
Full-text available
Deep learning techniques for point cloud data have demonstrated great potentials in solving classical problems in 3D computer vision such as 3D object classification and segmentation. Several recent 3D object classification methods have reported state-of-the-art performance on CAD model datasets such as ModelNet40 with high accuracy (∼92%). Despite...
Preprint
Deep learning with 3D data has progressed significantly since the introduction of convolutional neural networks that can handle point order ambiguity in point cloud data. While being able to achieve good accuracies in various scene understanding tasks, previous methods often have low training speed and complex network architecture. In this paper, w...
Preprint
Full-text available
Recent progresses in 3D deep learning has shown that it is possible to design special convolution operators to consume point cloud data. However, a typical drawback is that rotation invariance is often not guaranteed, resulting in networks being trained with data augmented with rotations. In this paper, we introduce a novel convolution operator for...
Preprint
Full-text available
Deep learning techniques for point cloud data have demonstrated great potentials in solving classical problems in 3D computer vision such as 3D object classification and segmentation. Several recent 3D object classification methods have reported state-of-the-art performance on CAD model datasets such as ModelNet40 with high accuracy (~92%). Despite...
Article
Full-text available
Monte Carlo methods for physically‐based light transport simulation are broadly adopted in the feature film production, animation and visual effects industries. These methods, however, often result in noisy images and have slow convergence. As such, improving the convergence of Monte Carlo rendering remains an important open problem. Gradient‐domai...
Preprint
Full-text available
Deep learning techniques have become the to-go models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task p...
Conference Paper
We introduce a novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases. The advantage of natural language editing interfaces is strongest when performing semantic operations at the sub-scene level, acting on groups...
Conference Paper
Despite the wide adoption in film production and animation industry nowadays, Monte Carlo light transport simulation is still prone to producing noisy images within short rendering time. Accelerating the convergence of Monte Carlo rendering without sacrificing its accuracy is by far a challenging task. In this course, we will learn about gradient-d...
Article
Full-text available
Gradient-domain rendering can improve the convergence of surface-based light transport by exploiting smoothness in image space. Scenes with participating media exhibit similar smoothness and could potentially benefit from gradient-domain techniques. We introduce the first gradient-domain formulation of image synthesis with homogeneous participating...
Article
Full-text available
The widespread adoption of autonomous systems such as drones and assistant robots has created a need for real-time high-quality semantic scene segmentation. In this paper, we propose an efficient yet robust technique for on-the-fly dense reconstruction and semantic segmentation of 3D indoor scenes. To guarantee real-time performance, our method is...
Article
Full-text available
Deep learning with 3D data such as reconstructed point clouds and CAD models has received great research interests recently. However, the capability of using point clouds with convolutional neural network has been so far not fully explored. In this technical report, we present a convolutional neural network for semantic segmentation and object reco...
Preprint
Depth sensing devices have created various new applications in scientific and commercial research with the advent of Microsoft Kinect and PMD (Photon Mixing Device) cameras. Most of these applications require the depth cameras to be pre-calibrated. However, traditional calibration methods using a checkerboard do not work very well for depth cameras...
Article
The most common solutions to the light transport problem rely on either Monte Carlo (MC) integration or density estimation methods, such as uni- & bi-directional path tracing or photon mapping. Recent gradient-domain extensions of MC approaches show great promise; here, gradients of the final image are estimated numerically (instead of the image in...
Article
Full-text available
Recent advances of 3D acquisition devices have enabled large-scale acquisition of 3D scene data. Such data, if completely and well annotated, can serve as useful ingredients for a wide spectrum of computer vision and graphics works such as data-driven modeling and scene understanding, object detection and recognition. However, annotating a vast amo...
Preprint
Recent advances of 3D acquisition devices have enabled large-scale acquisition of 3D scene data. Such data, if completely and well annotated, can serve as useful ingredients for a wide spectrum of computer vision and graphics works such as data-driven modeling and scene understanding, object detection and recognition. However, annotating a vast amo...
Conference Paper
Full-text available
Several RGB-D datasets have been publicized over the past few years for facilitating research in computer vision and robotics. However, the lack of comprehensive and fine-grained annotation in these RGB-D datasets has posed challenges to their widespread usage. In this paper, we introduce SceneNN, an RGB-D scene dataset consisting of 100 scenes. Al...
Poster
Full-text available
Monte Carlo path tracing has been increasingly popular in movie production recently. It is a general and unbiased rendering technique that can easily handle diffuse and glossy surfaces. To trace light paths, most of existing path tracers rely on surface BRDFs for directional sampling. This works well for glossy appearance, but tends to be not effec...
Article
Full-text available
Depth sensing devices have created various new applications in scientific and commercial research with the advent of Microsoft Kinect and PMD (Photon Mixing Device) cameras. Most of these applications require the depth cameras to be pre-calibrated. However, traditional calibration methods using a checkerboard do not work very well for depth cameras...
Article
Full-text available
Intrinsic image decomposition is an important problem that targets the recovery of shading and reflectance components from a single image. While this is an ill-posed problem on its own, we propose a novel approach for intrinsic image decomposition using reflectance sparsity priors that we have developed. Our sparse representation of reflectance is...
Conference Paper
Full-text available
Dual photography is a well-known application of light transport acquired by a projector-camera system. By applying compressive sensing, compressive dual photography [1] is a fast approach to acquire the light transport for dual photography. However, the reconstruction step in compressive dual photography can still take several hours before dual ima...
Conference Paper
Full-text available
While geometry reconstruction has been extensively studied, several shortcomings still exist. First, traditional geometry reconstruction methods such as geometric or photometric stereo only recover either surface depth or normals. Second, such methods require calibration. Third, such methods cannot recover accurate geometry in the presence of inter...
Conference Paper
Full-text available
We propose a single-image, shift-invariant motion deblurring approach where the blur kernel is directly estimated from light streaks in the blurred image. Combining with the sparsity constraint, the blur kernel can be solved quickly and accurately from a user input region containing a light streak. This kernel can then be applied to state-of-the-ar...
Conference Paper
Full-text available
We propose a blind inverse halftoning method with adaptive energy diffusion. A discrete Voronoi diagram is built by treating halftone dots as Voronoi cell sites. Gaussian filters are then created adaptively based on Voronoi cells and used for energy diffusion to form the grayscale inverse halftone image. We further perform a median filter on the Ga...

Network

Cited By