Bing Zeng
University of Electronic Science and Technology of China | UESTC · Electronic Engineering

PhD

About

274 Publications
17,515 Reads
4,997 Citations
Since 2017: 90 research items, 2,363 citations
Citations per year, 2017-2023 (chart)

Publications

Publications (274)
Preprint
This paper proposes a hybrid synthesis method for fusing multi-exposure images taken by hand-held cameras. Motions, whether due to camera shake or dynamic scenes, should be compensated before any content fusion, since any misalignment can easily cause blurring/ghosting artifacts in the fused result. Our hybrid method can deal with such motions...
Preprint
Existing homography and optical flow methods are error-prone in challenging scenes such as fog, rain, night, and snow, because basic assumptions such as brightness and gradient constancy are broken. To address this issue, we present an unsupervised learning approach that fuses gyroscope data into homography and optical flow learning. Specifically, we...
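Under a pure-rotation assumption, a gyroscope reading can be turned into a frame-to-frame homography via H = K R K^-1, which is the standard relation behind gyro-based alignment. The sketch below illustrates only this relation, with an assumed camera intrinsic matrix and gyro sample; it is not the paper's exact fusion scheme.

```python
# Hypothetical sketch: converting a gyroscope-integrated rotation into a
# homography under a pure-rotation camera model (H = K R K^-1). The intrinsics
# and gyro values below are assumptions for illustration.
import numpy as np

def rotation_from_gyro(omega, dt):
    """Rotation matrix from an angular velocity (rad/s) integrated over dt seconds."""
    theta = np.asarray(omega, dtype=float) * dt
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3)
    k = theta / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' formula
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def gyro_homography(omega, dt, K_intrinsics):
    """Homography mapping pixels of frame t to frame t+dt for a purely rotating camera."""
    R = rotation_from_gyro(omega, dt)
    return K_intrinsics @ R @ np.linalg.inv(K_intrinsics)

# Example with assumed intrinsics and a single gyro reading.
K_cam = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
H = gyro_homography(omega=[0.02, -0.01, 0.005], dt=1 / 30, K_intrinsics=K_cam)
```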
Article
In this letter, we propose two solutions for the rate control of VVC low-delay coding. Both solutions are developed by determining the bit-allocation factors of video frames based on their dependency. Specifically, the first solution is designed according to the distortion correlation between a key frame and its subsequent frames. With this so...
Article
Omnidirectional video streaming is usually implemented with tile-based representations, where tiles are obtained by splitting the video frame into several rectangular areas and each tile is converted into multiple representations with different resolutions and encoded at different bitrates. One key issue in omnidirectional video streaming...
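As one concrete, hypothetical view of the representation-selection problem mentioned here, the sketch below greedily upgrades the tile with the best viewport-probability gain per extra bit until an assumed bandwidth budget is spent. The tile probabilities, bitrate ladder, and greedy rule are illustrative assumptions, not the paper's scheme.

```python
# Generic greedy heuristic for picking one representation per tile under a
# total bitrate budget; purely illustrative, not the proposed method.

def select_representations(view_prob, bitrates, budget):
    """view_prob: per-tile viewport probabilities; bitrates: ascending ladder shared by all tiles."""
    choice = [0] * len(view_prob)              # start every tile at the lowest bitrate
    spent = len(view_prob) * bitrates[0]
    while True:
        best = None                            # (gain, tile, extra_cost)
        for t, p in enumerate(view_prob):
            r = choice[t]
            if r + 1 < len(bitrates):
                cost = bitrates[r + 1] - bitrates[r]
                if spent + cost <= budget:
                    gain = p / cost            # viewing probability gained per extra bit
                    if best is None or gain > best[0]:
                        best = (gain, t, cost)
        if best is None:
            return choice
        _, t, cost = best
        choice[t] += 1
        spent += cost

tiles = [0.30, 0.25, 0.20, 0.15, 0.10]         # assumed viewport probabilities
reps = [0.5, 1.0, 2.0, 4.0]                    # assumed Mbps per representation
print(select_representations(tiles, reps, budget=8.0))
```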
Article
In this paper, we introduce a new framework for unsupervised deep homography estimation. Our contributions are threefold. First, unlike previous methods that regress four offsets for a homography, we propose a homography flow representation, which can be estimated as a weighted sum of eight pre-defined homography flow bases. Second, considering a homography...
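To make the homography-flow idea concrete, the sketch below builds eight flow bases by finite-difference perturbation of the eight homography parameters and forms a flow field as their weighted sum. The paper's pre-defined bases may be constructed differently, so treat this purely as an illustration.

```python
# Illustrative sketch (not the paper's exact bases): build 8 homography flow
# bases and represent a homography-induced flow as a weighted sum of them.
import numpy as np

H_, W_ = 60, 80  # assumed small image grid
ys, xs = np.meshgrid(np.arange(H_), np.arange(W_), indexing="ij")
pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)  # (H, W, 3)

def homography_flow(H):
    """Per-pixel displacement field induced by homography H."""
    warped = pts @ H.T
    warped = warped[..., :2] / warped[..., 2:3]
    return warped - pts[..., :2]  # (H, W, 2)

# 8 bases: one per homography parameter (the 8 DOF; global scale is skipped).
eps = 1e-3
bases = []
for i in range(3):
    for j in range(3):
        if i == 2 and j == 2:
            continue
        P = np.eye(3)
        P[i, j] += eps
        bases.append(homography_flow(P) / eps)   # finite-difference basis flow
bases = np.stack(bases)                          # (8, H, W, 2)

# A homography flow is then approximated by a weighted sum of the bases.
weights = np.random.randn(8) * 1e-3
flow = np.tensordot(weights, bases, axes=1)      # (H, W, 2)
```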
Chapter
High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are typically prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a no...
Article
The paper proposes a method to effectively fuse multi-exposure inputs and generate high-quality high dynamic range (HDR) images with unpaired datasets. Deep learning-based HDR image generation methods rely heavily on paired datasets. The ground truth images play a leading role in generating reasonable HDR images. Datasets without ground truth are h...
Preprint
Full-text available
Convolutional neural network (CNN)-based methods offer effective solutions for enhancing the quality of compressed images and videos. However, these methods do not exploit the raw data for quality enhancement. In this paper, we adopt the raw data in the quality enhancement of HEVC intra-coded images by proposing an online learning-based method. When q...
Article
We propose a decoder-friendly chrominance enhancement method for compressed images. Our method is developed based on the luminance-guided chrominance enhancement network (LGCEN) and online learning. With LGCEN, the textures of the compressed chrominance components are enhanced under the guidance of the luminance component. Moreover, LGCEN is...
Article
Data association is important in point cloud registration. In this work, we propose to solve partial-to-partial registration from a new perspective, introducing multi-level feature interactions between the source and reference clouds at the feature extraction stage, so that the registration can be realized without the attentions or...
Preprint
Full-text available
In this paper, we propose a luminance-guided chrominance image enhancement convolutional neural network for HEVC intra coding. Specifically, we first develop a gated recursive asymmetric-convolution block to restore each degraded chrominance image, which generates an intermediate output. Then, guided by the luminance image, the quality of this in...
Preprint
In this paper, we propose a new fast CU partition algorithm for VVC intra coding based on cross-block difference. This difference is measured by the gradient and the content of sub-blocks obtained from partition and is employed to guide the skipping of unnecessary horizontal and vertical partition modes. With this guidance, a fast determination of...
Article
3D object detection has become an emerging task in autonomous driving scenarios. Most previous works process 3D point clouds using either projection-based or voxel-based models. However, both approaches have some drawbacks. The voxel-based methods lack semantic information, while the projection-based methods suffer from numerous spatial infor...
Article
The paper proposes a solution based on a Generative Adversarial Network (GAN) for solving jigsaw puzzles. The problem assumes that an image is divided into equal square pieces, and the task is to recover the image from the information provided by the pieces. Conventional jigsaw puzzle solvers often determine the relationships based on the boundaries of...
Article
We present an unsupervised optical flow estimation method by proposing an adaptive pyramid sampling in the deep pyramid network. Specifically, in the pyramid downsampling, we propose a Content-Aware Pooling (CAP) module, which promotes local feature gathering by avoiding cross region pooling, so that the learned features become more representative....
Preprint
Full-text available
Panorama images have a much larger field of view than standard perspective images and thus naturally encode richer scene context information, which, however, has not been well exploited by previous scene understanding methods. In this paper, we propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the sh...
Article
Images captured by mobile devices can be aligned using their gyroscope sensors. However, an optical image stabilizer (OIS) breaks this possibility by adjusting the images during capture. In this work, we propose a deep network that compensates for the motions caused by the OIS, such that gyroscopes can be used for image alignment on OIS cameras. To achi...
Article
Occlusion is an inevitable and critical problem in unsupervised optical flow learning. Existing methods either treat occlusions equally as non-occluded regions or simply remove them to avoid incorrectness. However, the occlusion regions can provide effective information for optical flow learning. In this paper, we present OIFlow, an occlusion-inpai...
Article
The channel redundancy of convolutional neural networks (CNNs) results in large consumption of memory and computational resources. In this work, we design a novel Slim Convolution (SlimConv) module to boost the performance of CNNs by reducing channel redundancies. Our SlimConv consists of three main steps: Reconstruct, Transform, and F...
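A rough, heavily hedged sketch of a "Reconstruct, Transform, Fuse" style module is given below; the specific layer choices (SE-style reweighting, channel folding, the two slim paths, and the output width) are assumptions for illustration and should not be read as the exact SlimConv design.

```python
# Hypothetical sketch of reducing channel redundancy by reweighting and folding
# channels before lightweight transforms; illustrative only.
import torch
import torch.nn as nn

class SlimConvSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        assert channels % 4 == 0
        half, quarter = channels // 2, channels // 4
        # Channel attention (squeeze-and-excitation style) for reweighting.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, quarter, 1), nn.ReLU(inplace=True),
            nn.Conv2d(quarter, channels, 1), nn.Sigmoid(),
        )
        # Transform: two slim paths operating on the folded (halved) features.
        self.path1 = nn.Conv2d(half, half, 3, padding=1)
        self.path2 = nn.Sequential(nn.Conv2d(half, quarter, 1),
                                   nn.Conv2d(quarter, quarter, 3, padding=1))
        # Fuse: merge the two paths (an assumed choice; output width may differ in the paper).
        self.fuse = nn.Conv2d(half + quarter, channels, 1)

    def forward(self, x):
        w = self.attn(x)
        # Reconstruct: reweight, split channels in half and add, once with the
        # weights and once with the channel-reversed weights.
        top = (x * w).chunk(2, dim=1)
        bot = (x * torch.flip(w, dims=[1])).chunk(2, dim=1)
        f1, f2 = top[0] + top[1], bot[0] + bot[1]
        # Transform and fuse.
        y = torch.cat([self.path1(f1), self.path2(f2)], dim=1)
        return self.fuse(y)

x = torch.randn(1, 64, 32, 32)
out = SlimConvSketch(64)(x)   # shape: (1, 64, 32, 32)
```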
Article
Full-text available
In this paper, we introduce a novel point-to-surface representation for 3D point cloud learning. Unlike previous methods that mainly adopt voxel, mesh, or point coordinates, we propose to tackle this problem from a new perspective: learn a set of static and global reference surfaces based on quadratic terms to describe 3D shapes, such that the coo...
Preprint
Full-text available
Data association is important in point cloud registration. In this work, we propose to solve partial-to-partial registration from a new perspective, introducing feature interactions between the source and reference clouds at the feature extraction stage, so that the registration can be realized without the explicit mask estimation...
Preprint
Full-text available
We present a new pipeline for holistic 3D scene understanding from a single image, which could predict object shape, object pose, and scene layout. As it is a highly ill-posed problem, existing methods usually suffer from inaccurate estimation of both shapes and layout especially for the cluttered scene due to the heavy occlusion between objects. W...
Preprint
Point cloud registration is a key task in many computational fields. Previous correspondence matching based methods require the point clouds to have distinctive geometric structures to fit a 3D rigid transformation according to point-wise sparse feature matches. However, the accuracy of transformation heavily relies on the quality of extracted feat...
Preprint
The paper proposes a method to effectively fuse multi-exposure inputs and generate high-quality high dynamic range (HDR) images with unpaired datasets. Deep learning-based HDR image generation methods rely heavily on paired datasets. The ground truth provides the information the network needs to generate HDR images without ghosting. Datasets without ground...
Preprint
Full-text available
The paper proposes a solution based on a Generative Adversarial Network (GAN) for solving jigsaw puzzles. The problem assumes that an image is cut into equal square pieces, and the task is to recover the image from the pieces' information. Conventional jigsaw solvers often determine piece relationships based on the piece boundaries, which ignores the impo...
Article
Full-text available
The paper proposes a solution to effectively handle salient regions for style transfer between unpaired datasets. Recently, Generative Adversarial Networks (GANs) have demonstrated their potential for translating images from a source domain X to a target domain Y in the absence of paired examples. However, such a translation cannot guarantee...
Article
In this letter, we construct a new three-stage deep convolutional neural network (NTSDCN) for image demosaicking, which consists of our proposed Laplacian energy-constrained local residual unit (LC-LRU) and a feature-guided prior fusion unit (FG-PFU). Specifically, the LC-LRU is used to refine the learning target of the specific residual blocks in t...
Chapter
Single-image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmospheric light. While advanced models have been proposed for image restoration (i.e., background image generation), they regard rain streaks as having the same properties as the background rather than as a transmission medium. As vapors (i.e., rain...
Article
Facial Expression Recognition (FER) is a challenging yet important research topic owing to its academic and commercial potential. In this work, we propose an oriented attention pseudo-siamese network that takes advantage of global and local facial information for highly accurate FER. Our network consists of two branc...
Article
Quality enhancement of HEVC compressed videos has attracted a lot of attention in recent years. In this paper, we propose a robust multi-frame guided attention network (MGANet) to reconstruct high-quality frames based on HEVC compressed videos. In our network, we first use an advanced motion flow algorithm to estimate the motion information of inp...
Article
Removing rain effects from an image is important for various applications such as autonomous driving, drone piloting, and photo editing. Conventional methods rely on heuristics to handcraft various priors to remove or separate the rain effects from an image. Recently, deep learning models have been proposed to learn end-to-end methods to complete t...
Preprint
Single-image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmospheric light. While advanced models have been proposed for image restoration (i.e., background image generation), they regard rain streaks as having the same properties as the background rather than as a transmission medium. As vapors (i.e., rain...
Preprint
3D object detection has become an emerging task in autonomous driving scenarios. Previous works process 3D point clouds using either projection-based or voxel-based models. However, both approaches contain some drawbacks. The voxel-based methods lack semantic information, while the projection-based methods suffer from numerous spatial information l...
Preprint
Accurate 3D object detection from point clouds has become a crucial component in autonomous driving. However, the volumetric representations and the projection methods in previous works fail to establish the relationships between the local point sets. In this paper, we propose Sparse Voxel-Graph Attention Network (SVGA-Net), a novel end-to-end trai...
Article
Physically based rendering has been widely used to generate photo-realistic images, which greatly impacts industry by providing appealing rendering, such as for entertainment and augmented reality, and academia by serving large scale high-fidelity synthetic training data for data hungry methods like deep learning. However, physically based renderin...
Article
Multiple description coding (MDC) is an efficient source coding technique for error-prone transmission over multiple channels. In this paper, we focus on the design of a new polyphase down-sampling based MDC (NPDS-MDC) for image signals. The encoding of our proposed NPDS-MDC consists of three steps. First, we perform down-sampling on each ${N} \ti...
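For background, the classical polyphase split that underlies this family of MDC schemes is sketched below: the image is separated into four sub-images over even/odd rows and columns, each forming one description. This shows only the generic split, not the proposed NPDS-MDC encoding steps.

```python
# Minimal sketch of polyphase down-sampling for multiple description coding.
# Each of the four sub-images would be coded and sent over its own channel;
# any received subset can be interpolated to approximate the full image.
import numpy as np

def polyphase_split(img):
    """Return the four polyphase components of a 2-D image."""
    return [img[0::2, 0::2], img[0::2, 1::2],
            img[1::2, 0::2], img[1::2, 1::2]]

def polyphase_merge(descriptions):
    """Reassemble the image when all four descriptions are received."""
    d00, d01, d10, d11 = descriptions
    h, w = d00.shape
    out = np.empty((2 * h, 2 * w), dtype=d00.dtype)
    out[0::2, 0::2], out[0::2, 1::2] = d00, d01
    out[1::2, 0::2], out[1::2, 1::2] = d10, d11
    return out

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
descs = polyphase_split(img)
assert np.array_equal(polyphase_merge(descs), img)
```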
Article
Full-text available
Removing rain streaks from a single image continues to draw attention in outdoor vision systems. In this paper, we present an efficient method to remove rain streaks. First, the location map of rain pixels needs to be known as precisely as possible, to which end we implement a relatively accurate detection of rain streaks by utilizing two charac...
Preprint
The channel redundancy in feature maps of convolutional neural networks (CNNs) results in large consumption of memory and computational resources. In this work, we design a novel Slim Convolution (SlimConv) module to boost the performance of CNNs by reducing channel redundancies. Our SlimConv consists of three main steps: Reconstruct, Transfo...
Article
Full-text available
Light-field raw data captured by a state-of-the-art light-field camera is limited in its spatial and angular resolutions due to the camera’s optical hardware. In this paper, we propose an all-software algorithm to synthesize light-field raw data from a single RGB-D input image, which is driven largely by the need in the research area of light-field...
Preprint
The unprecedented performance achieved by deep convolutional neural networks for image classification is linked primarily to their ability of capturing rich structural features at various layers within networks. Here we design a series of experiments, inspired by children's learning of the arithmetic addition of two integers, to showcase that such...
Preprint
We present a new deep point cloud rendering pipeline through multi-plane projections. The input to the network is the raw point cloud of a scene and the output is an image or an image sequence from a novel view or along a novel camera trajectory. Unlike previous approaches that directly project features from 3D points onto the 2D image domain, we propose t...
Preprint
In recent years, deep learning-based methods have made significant progress in rain removal. However, existing methods usually do not generalize well: almost all of them perform satisfactorily in removing a specific type of rain streaks, but may have a relatively poor performance...
Article
Full-text available
In this paper, we focus on the intrinsic image decomposition problem for stereoscopic image pairs. The existing methods cannot be applied directly to decompose stereoscopic images, as they often produce inconsistent reflectance (albedo) and 3D artifacts after the decomposition. We propose a straightforward yet effective framework that enables a high...
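The block below states the standard intrinsic image model assumed in this kind of decomposition, plus the cross-view reflectance consistency one would expect for a stereo pair; the exact constraints used in the paper may differ.

```latex
% Standard intrinsic image model: each pixel is the product of reflectance
% (albedo) and shading; taking logs makes the decomposition additive.
\begin{align}
  I(x) &= R(x)\, S(x), \\
  \log I(x) &= \log R(x) + \log S(x).
\end{align}
% For a stereo pair $(I_L, I_R)$ with correspondence $x \leftrightarrow x'$,
% one would expect consistent reflectance across views,
% $R_L(x) \approx R_R(x')$ (an assumed constraint, not necessarily the
% paper's exact formulation).
```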
Preprint
Rain removal in images/videos is still an important task in the computer vision field and is attracting more and more attention. Traditional methods usually utilize some incomplete priors or filters (e.g., the guided filter) to remove the rain effect. Deep learning offers more possibilities for better solving this task. However, existing methods remove rain either by ev...
Preprint
Automatically removing rain effects from an image has many applications, such as autonomous driving, drone piloting, and photo editing, and still draws the attention of many people. Traditional methods use heuristics to handcraft various priors to remove or separate the rain effects from an image. Recently, end-to-end deep learning-based deraining meth...
Article
In color image compression, the chroma components are often sub-sampled before compression and up-sampled after compression. Although sub-sampling the chroma components saves bit-cost for compression, it often induces extra color distortions in the compressed images. In this paper, we propose two approaches to tackle this problem. Firstly, we propo...
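A minimal sketch of the sub-sampling/up-sampling round trip that induces the extra color distortion is given below, assuming simple 2x2 averaging for down-sampling and nearest-neighbor up-sampling; real codecs use better filters, and the paper's two proposed approaches are not reproduced here.

```python
# Sketch of the chroma 4:4:4 -> 4:2:0 -> 4:4:4 round trip that causes the
# color distortion discussed in the abstract (filters are assumed, simple ones).
import numpy as np

def chroma_downsample(c):
    """Average each 2x2 block of a chroma plane (4:4:4 -> 4:2:0)."""
    h, w = c.shape
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def chroma_upsample(c):
    """Nearest-neighbor up-sampling back to full resolution."""
    return np.repeat(np.repeat(c, 2, axis=0), 2, axis=1)

cb = np.random.rand(144, 176)                     # a chroma plane (full resolution)
cb_rec = chroma_upsample(chroma_downsample(cb))
color_distortion = np.mean((cb - cb_rec) ** 2)    # extra distortion from sub-sampling
```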
Preprint
In this paper, we propose a quality enhancement network for Versatile Video Coding (VVC) compressed videos by jointly exploiting spatial details and temporal structure (SDTS). The network consists of a temporal structure prediction subnet and a spatial detail enhancement subnet. The former subnet is used to estimate and compensate the temporal moti...
Preprint
Rain streaks are inevitably captured by outdoor vision systems, which lowers the visual quality of images and also interferes with various computer vision applications. We present a novel rain removal method in this paper, which consists of two steps, i.e., detection of rain streaks and reconstruction of the rain-removed image. An accurate detectio...
Article
In this paper, color-depth conditional generative adversarial networks (CDcGAN) are proposed to resolve the problems of simultaneous color image super-resolution and depth image super-resolution in 3D videos. Firstly, a generative network is presented to leverage the mutual information of the low-resolution color image and low-resolution depth imag...
Preprint
In video compression, most of the existing deep learning approaches concentrate on the visual quality of a single frame, while ignoring the useful priors as well as the temporal information of adjacent frames. In this paper, we propose a multi-frame guided attention network (MGANet) to enhance the quality of compressed videos. Our network is compos...
Article
In this paper, a local activity measure based on the clipped and normalized variance or standard deviation is proposed to drive anisotropic diffusion and relative total variation (RTV) toward better structure preservation. First, two novel edge-stop functions are introduced for our local activity-driven anisotropic diffusion (LAD-AD) to effic...
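A minimal sketch of a clipped, normalized local-variance activity measure feeding a Perona-Malik-style edge-stop weight is shown below; the window size, clipping threshold, and edge-stop form are illustrative assumptions rather than the paper's exact functions.

```python
# Sketch: local activity from clipped, normalized local variance, used to
# weight an (assumed) exponential edge-stop function.
import numpy as np
from scipy.ndimage import uniform_filter

def local_activity(img, size=5, clip=0.02):
    """Clipped, normalized local variance in [0, 1] (high = textured/edge region)."""
    mean = uniform_filter(img, size)
    var = uniform_filter(img * img, size) - mean * mean   # E[x^2] - E[x]^2
    var = np.clip(var, 0.0, clip)                          # clip large responses
    return var / clip                                      # normalize to [0, 1]

def edge_stop(img, k=1.0, size=5):
    """Illustrative Perona-Malik-style weight driven by the activity measure."""
    a = local_activity(img, size)
    return np.exp(-(a / k) ** 2)   # small diffusion weight where activity is high

img = np.random.rand(128, 128)
w = edge_stop(img)
```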
Article
This paper presents a method to stabilize shaky stereoscopic videos captured by hand-held stereo cameras. It is often problematic to apply traditional monocular video stabilization techniques directly to the stereoscopic views independently. This is mainly because some undesirable vertical disparities and inaccurate horizontal disparities are produ...
Article
Coding of a color image usually happens in the YCbCr space so that the rate-distortion optimization is conducted in this space. Due to the use of a non-unitary matrix in the RGB-to-YCbCr conversion, an optimal coding performance achieved in the YCbCr space does not guarantee an optimal quality in the RGB space, which would impact most display devic...
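A small numerical illustration of the non-unitary conversion issue is given below: an error of the same energy placed in different YCbCr components maps to different RGB error energies under a standard BT.601-style matrix, so optimizing distortion in YCbCr alone is not equivalent to optimizing it in RGB. The matrix is a standard one and only illustrates the effect discussed in the abstract.

```python
# Why equal YCbCr distortion != equal RGB distortion: the conversion matrix
# is not unitary, so error energy is not preserved uniformly across components.
import numpy as np

# YCbCr -> RGB (BT.601, full-range, approximate)
M = np.array([[1.0,  0.0,       1.402],
              [1.0, -0.344136, -0.714136],
              [1.0,  1.772,     0.0]])

e = 1.0  # same error magnitude placed in different YCbCr components
for name, err in [("Y", [e, 0, 0]), ("Cb", [0, e, 0]), ("Cr", [0, 0, e])]:
    rgb_err = M @ np.array(err, dtype=float)
    print(f"unit error in {name}: RGB squared error = {np.sum(rgb_err**2):.3f}")
# The three printed values differ, so minimizing YCbCr distortion alone is not
# equivalent to minimizing RGB distortion.
```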
Article
Transform domain downward conversion (TDDC) for image coding is usually implemented by discarding some high-frequency components from each transformed block. As a result, a block of fewer coefficients is formed and a lower compression cost is achieved due to the coding of only a few low-frequency coefficients. In this paper, we focus on the design...
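A minimal sketch of the basic TDDC operation described here is given below: DCT each block, keep only a low-frequency sub-block for coding, and zero-pad before the inverse transform at reconstruction. The 8x8 block and 4x4 retained size are arbitrary illustrative choices, not the design proposed in the paper.

```python
# Sketch of transform-domain downward conversion via DCT coefficient discarding.
import numpy as np
from scipy.fft import dctn, idctn

def tddc_block(block, keep=4):
    """Keep only the low-frequency keep x keep corner of the block's DCT."""
    coeffs = dctn(block, type=2, norm="ortho")
    return coeffs[:keep, :keep]                # fewer coefficients to code

def reconstruct_block(low_coeffs, size=8):
    """Zero-pad the kept coefficients and inverse-transform."""
    full = np.zeros((size, size))
    k = low_coeffs.shape[0]
    full[:k, :k] = low_coeffs
    return idctn(full, type=2, norm="ortho")

block = np.random.rand(8, 8)
rec = reconstruct_block(tddc_block(block, keep=4), size=8)
mse = np.mean((block - rec) ** 2)              # distortion from dropped high frequencies
```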
Chapter
This paper presents a method that automatically segments the foreground objects for stereoscopic images. Given a stereo pair, a disparity map can be estimated, which encodes the depth information. Objects that stay close to the camera are considered as foreground while regions with larger depths are deemed as background. Although the raw disparity...
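A simple sketch of the initial disparity-based foreground/background assignment described here is shown below, using a percentile threshold on the disparity map; the actual threshold selection and the subsequent refinement of the raw disparity map are not reproduced.

```python
# Sketch: pixels with large disparity (close to the camera) are treated as
# foreground. The percentile threshold is an illustrative assumption.
import numpy as np

def rough_foreground(disparity, percentile=75):
    """Binary foreground mask from a disparity map (large disparity = close = foreground)."""
    thresh = np.percentile(disparity, percentile)
    return disparity > thresh

disp = np.random.rand(120, 160)    # stand-in for an estimated disparity map
mask = rough_foreground(disp)
```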
Chapter
Endoscopic videos have been widely used for stomach diagnosis. It is of particular importance to obtain the 3D shapes, which enable observation from different perspectives so as to facilitate comprehensive and accurate diagnoses. However, obtaining 3D shapes is challenging for traditional multi-view 3D reconstruction methods, due to strong motion...
Article
Traditional color image compression is usually conducted in the YCbCr space, but many color displays only accept RGB signals as inputs. Due to the use of a non-unitary matrix in the YCbCr-RGB conversion, a low distortion achieved in the YCbCr space cannot guarantee a low distortion for the RGB signals. To solve this problem, we propose a novel com...