Wen Gao

Peking University | PKU · Department of Computer Science and Technology

PhD

About

1,062
Publications
248,036
Reads
28,657
Citations
Additional affiliations
February 1996 - February 2006
Institute of Computing Technology, Chinese Academy of Sciences
Position
  • Professor (Full)
December 1991 - January 1996
Harbin Institute of Technology
Position
  • Professor (Full)
January 2006 - present
Peking University
Position
  • Professor (Full)

Publications

Preprint
Cross-subject EEG emotion recognition is a challenging and popular research direction in affective computing. At present, graph-based methods have been proposed to model EEG data with graph structure. Although these existing methods have achieved significant improvements, it is difficult for many methods relying on local features to effectively lea...
Preprint
Efficient emotion recognition has attracted extensive research interest and enables new applications in many fields, such as human-computer interaction, disease diagnosis, and service robots. Although existing work on sentiment analysis relying on sensors or unimodal methods performs well for simple contexts like business recommend...
Article
Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which ma...
Preprint
Emotion plays an increasingly important role in our daily lives, and negative emotions can increase dangerous driving behaviors, leading to extremely serious traffic accidents. Therefore, it is necessary to establish a system that can automatically recognize emotions to alert drivers to avoid dangerous driving behaviors. To this end, we propose to...
Preprint
Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which...
Preprint
Identifying COVID-19 patients at an early stage through deep learning technology will largely reduce the burden on clinicians and effectively prevent the rapid spread of the virus. Although some pioneering results have been obtained in automatic segmentation of pneumonia lesions from CT slices, there is still a big gap between the precision of auto...
Preprint
Stress has been identified as one of the major contributing factors in car crashes due to its negative impact on driving performance. There is an urgent need to detect drivers' stress levels in real time with high accuracy so that intervening or navigating measures can be taken in time to mitigate the situation. Existing driver stress det...
Preprint
The resurgence of deep learning has brought many new ideas and methods to improve the accuracy of traffic flow prediction. In this paper, a novel multi-step prediction model based on deep neural networks is proposed, which can explore the periodic features of traffic flow by learning the intrinsic relationship between traffic flow and corresponding...
Preprint
The resurgence of deep learning has brought many new methods and ideas to the traditional transportation field, especially for improving the accuracy of traffic prediction. In this paper, a novel one-to-one traffic prediction model based on deep neural networks is proposed, which can model traffic flow data and time information simultaneously to en...
Article
Visual communication in wireless scenarios is quite challenging because the channel quality often varies unpredictably and dramatically with time and for different users. As a result, uncoded pseudo-analog transmission schemes like SoftCast have attracted a lot of research interest in the past few years due to the ability to handle channel variatio...
Conference Paper
Owing to their ability to remove compression artifacts, many in-loop filters have been proposed for video coding standards. They are performed after the reconstruction of all coding units (CUs); however, none of them has been taken into account in the mode decision when coding each CU. To address this issue and make the rate-distortion optimiz...
Article
The pose problem is one of the bottlenecks in automatic face recognition. We argue that one of the difficulties in this problem is the severe misalignment in face images or feature vectors with different poses. In this paper, we propose that this problem can be statistically solved or at least mitigated by maximizing the intra-subject across-pose co...
Article
Towards low bit rate mobile visual search, recent works have proposed to aggregate the local features and compress the aggregated descriptor (such as Fisher vector, the vector of locally aggregated descriptors) for low latency query delivery as well as moderate search complexity. Even though Hamming distance can be computed very fast, the computati...
Article
The popularity of stereo images and various display devices creates a need for stereo image retargeting techniques. Existing warping-based retargeting methods preserve the shape of salient objects in a retargeted stereo image pair well. Nevertheless, these methods often incur depth distortion, since they attempt to preserve depth by maintaining...
Article
In this paper, a hierarchical dependency context model (HDCM) is first proposed to exploit the statistical correlations of DCT (Discrete Cosine Transform) coefficients in the H.264/AVC video coding standard, in which the number of non-zero coefficients in a DCT block and the scanned position are used to capture the magnitude varying tendency of DCT c...
Article
In super-resolution that constructs a high-resolution (HR) image from a set of low-resolution (LR) reference images, it is crucial to align the LR reference images in order to efficiently exploit the pixels therein. However, due to the existence of complex local motion, ideal registration is difficult to acquire. In this paper, we present a robust...
Article
In this paper, an effective low bit-rate video coding scheme is developed to realize state-of-the-art video coding efficiency with lower encoder complexity, while supporting standard compliance and error resilience. Such an architecture is particularly attractive for application scenarios involving resource-deficient wireless video communications....
Article
AVS2 is a new generation of video coding standard developed by the IEEE 1857 Working Group under project 1857.4. AVS2 is also the second-generation video coding standard established by the Audio and Video Coding Standard (AVS) Working Group of China; the first-generation AVS1 was developed by the AVS Working Group and issued as Chinese national sta...
Article
Search range (SR) is a key parameter for controlling search quality in the motion estimation (ME) of a real-time video encoder. Dynamic search range (DSR) is a commonly employed algorithm to reduce the computational complexity of ME in a video encoder. In this paper, we model an effective predicted motion vector (PMV) deviation metric to predict the re...
Article
Wavefront parallelism is effective for parallel video encoding thanks to its merits of low latency, no quality loss and high degree of parallelism. In traditional video encoders, macroblock row wavefront (MRW) parallelism was widely adopted. However, the performance of MRW is limited by workload imbalance and computing resource imbalance among multi...
Article
In this paper, a unified and adaptive web video thumbnail recommendation framework is proposed, which recommends thumbnails both for video owners and browsers on the basis of image quality assessment, image accessibility analysis, video content representativeness analysis and query-sensitive matching. At the very start, video shot detection is perf...
Article
Most model-based rate control schemes use independent rate-distortion (R-D) models at the macroblock (MB) level to represent the relationship among bit rate, distortion and encoding complexity. However, the correlations between frames (INTER-dependency) are not well considered for distortion, bit allocation and quantization parameter (QP) decision. In t...
Article
As an important procedure in image retrieval, off-line indexing focuses on organizing relevant images together and largely decides the efficiency, accuracy, and memory cost of the retrieval system. Because the image contains multi-level visual and semantic clues, the described indexing strategy should be able to reflect such multi-level relevance....
Conference Paper
High Efficiency Video Coding (HEVC) is the emerging video coding standard, which provides equivalent subjective quality with about 50% bit rate reduction compared to H.264/AVC High Profile. However, the improvement of coding efficiency is obtained at the expense of increased computational complexity. In this paper, a fast algorithm for HEVC intra c...
Article
Compressive sensing (CS) theory demonstrates that a signal can be reconstructed with high probability from far fewer acquired measurements than suggested by the Nyquist sampling theory when it exhibits sparsity in some domain. Most of the conventional CS recovery approaches, however, exploited a set of fixed bases (e.g., DCT, wavelet and gradient...
Article
As a 3D extension of the High Efficiency Video Coding (HEVC) standard, 3D-HEVC is developed to improve the coding efficiency of multi-view video. However, the improvement of the coding efficiency is obtained at the expense of a computational complexity increase. How to relieve the computational burden of the encoder is becoming a critical problem i...
Article
Recent local stereo matching methods have achieved comparable performance with global methods. However, the final disparity map still contains significant outliers. In this article, the authors propose a local stereo matching method that employs a new combined cost approach and a secondary disparity refinement mechanism. They formulate combined cos...
Article
The IEEE 1857 Standard for Advanced Audio and Video Coding was released as IEEE 1857-2013 in June 2013. Despite consisting of several different groups, the most significant feature of IEEE 1857-2013 is its Surveillance Groups, which can not only achieve at least twice the coding efficiency on surveillance videos as H.264/AVC High Profile, but it's...
Article
For real-time and low-delay video surveillance and teleconferencing applications, the new video coding standard HEVC can achieve much higher coding efficiency than H.264/AVC. However, we argue that the hierarchical prediction structure in the HEVC low-delay encoder still does not fully utilize the special characteristics of surveillance...
Article
Vehicle electrification is envisioned to be a significant component of the forthcoming smart grid. In this paper, a smart grid vision of the electric vehicles for the next 30 years and beyond is presented from six perspectives pertinent to intelligent transportation systems: 1) vehicles; 2) infrastructure; 3) travelers; 4) systems, operations, and...
Conference Paper
In this paper, we provide some insights into the IEEE Standard for Systems of Advanced Audio and Video Coding (IEEE 1857.3). Specifically, the standard defines file format and real-time transport protocol (RTP) payload format for IEEE 1857 video and IEEE 1857.2 audio. The storage of video and audio not only uses the existing capabilities of the ISO...
Article
We propose a computational model which computes the importance of 2-D object shape parts, and we apply it to detect and localize objects with and without occlusions. The importance of a shape part (a localized contour fragment) is considered from the perspective of its contribution to the perception and recognition of the global shape of the object...
Article
To ensure application interoperability in visual object search technologies, the MPEG Working Group has made great efforts in standardizing visual search technologies. Moreover, extraction and transmission of compact descriptors are valuable for next-generation, mobile, visual search applications. This article reviews the significant progress of MP...
Article
Currently, many local descriptors have been proposed to tackle a basic issue in computer vision: duplicate visual content matching. These descriptors are either represented as high-dimensional vectors that are relatively expensive to extract and compare or are binary codes limited in robustness. The Bag-of-visual Words (BoWs) model compresses local features int...
Article
Human pose estimation is a key step to action recognition. We propose a method of estimating 3D human poses from a single image, which works in conjunction with an existing 2D pose/joint detector. 3D pose estimation is challenging because multiple 3D poses may correspond to the same 2D pose after projection due to the lack of depth information. Mor...
Article
Incorporating image classification into an image retrieval system brings many attractive advantages. For instance, the search space can be narrowed down by rejecting images in irrelevant categories of the query. The retrieved images can be more consistent in semantics by indexing and returning images in the relevant categories together. However, due t...
Conference Paper
Rate-distortion (R-D) optimization plays an important role in video coding. R-D sense discarding (thresholding) can greatly improve coding efficiency. This work first proposes a multi-level coefficient discarding scheme, which is composed of coefficient-level (CL), block-level and macroblock-level discarding. In CL...
Conference Paper
The SoftCast scheme recently proposed for wireless visual communication avoids the threshold effect that traditional communication systems usually suffer from. It provides graceful quality transition by sending images in a sequence of whitened transform coefficients using dense-constellation modulation and analog-like transmission. A key point in S...
Conference Paper
The fractional-pel interpolation filter varies across video coding standards such as H.264/AVC, AVS and HEVC. Since fractional-pel motion compensation plays an important role in the video encoder, the interpolation of fractional-pel pixels can be refined and designed better to enhance the coding efficiency. In this paper, we first propose the gener...
Conference Paper
Total-variation (TV) regularization is widely adopted in image restoration problems to exploit the feature that natural images are smooth with small gradient values in most regions. The basic TV method assumes an identical zero-mean Laplacian distribution for the gradients at all pixels. However, for real-world images, the statistics of gradients may not...
Article
Visual patterns, i.e., high-order combinations of visual words, contribute to a discriminative abstraction of the high-dimensional bag-of-words image representation. However, the existing visual patterns are built upon the 2D photographic concurrences of visual words, which is ill-posed compared to their real-world 3D concurrences, since the words...
Article
Traditional patch-based sparse representation modeling of natural images usually suffers from two problems. First, it has to solve a large-scale optimization problem with high computational complexity in dictionary learning. Second, each patch is considered independently in dictionary learning and sparse coding, which ignores the relationship among...
Article
This paper presents a novel strategy for high-fidelity image restoration by characterizing both local smoothness and nonlocal self-similarity of natural images in a unified statistical manner. The main contributions are threefold. First, from the perspective of image statistics, a joint statistical modeling (JSM) in an adaptive hybrid space-trans...
Article
The IEEE Standards Association recently approved a comprehensive set of standards that cover advanced audio and video coding for storage and transmission, targeting emerging applications such as Internet media streaming and smart-camera surveillance. The standards' basic goal is to facilitate reliable and efficient exchange of audiovisual data stre...
Conference Paper
In this paper, we propose a hybrid parallel decoding strategy for HEVC which combines task-level parallelism and data-level parallelism based on CTUs. The data-level parallelism makes the execution time distribution of different decoding stages more balanced, and makes the task-level parallelism more efficient. Our approach imposes no constraint on...
Conference Paper
Consistent video quality and encoding latency due to buffering are two important aspects in designing a rate control scheme for real-time video coding systems. To balance these two conflicting objectives well, we first analyze the constraint of buffer latency and the definition of a “consistent” video quality. Then a window-based...
Article
We propose a novel representation for stereo videos, namely 2D-plus-depth-cue. This representation is able to encode stereo videos compactly by leveraging the by-product of a stereo video conversion process. Specifically, the depth cues are derived from an interactive labeling process during 2D-to-stereo video conversion; they are contour points of...
Article
Recovering images from corrupted observations is necessary for many real-world applications. In this paper, we propose a unified framework to perform progressive image recovery based on hybrid graph Laplacian regularized regression. We first construct a multiscale representation of the target image by Laplacian pyramid, then progressively recover t...
Article
Video retargeting is a useful technique to adapt a video to a desired display resolution. It aims to preserve the information contained in the original video and the shapes of salient objects while maintaining the temporal coherence of contents in the video. Existing video retargeting schemes achieve temporal coherence via constraining each region/...
Conference Paper
As an important procedure in image retrieval, off-line indexing focuses on organizing relevant images together and making them easy to access. However, most existing indexing strategies view database images individually and only consider partial relevance, i.e., either visual or semantic relevance among them. To overcome these issues and design...
Article
In this paper, we propose a novel image interpolation algorithm via graph-based Bayesian label propagation. The basic idea is to first create a graph with known and unknown pixels as vertices and with edge weights encoding the similarity between vertices; the interpolation problem then becomes how to effectively propagate the label informati...
Conference Paper
Conventional image and video communication systems are usually designed to maximize the fidelity of reconstructed images as measured by mean square error (MSE). It is well known that the MSE fidelity metric may not reflect the visual quality perceived by human eyes. Recent advancements in image quality assessment tell us that...
Article
A visual phrase considers multiple visual words and captures extra spatial clues among them. Thus, visual phrases show better discriminative power than single visual words in image retrieval and matching. Notwithstanding their success, existing visual phrases still show obvious shortcomings: (1) limited flexibility, i.e., visual phrases are considere...
Article
With the proliferation of mobile devices, recent years have witnessed an emerging potential to integrate mobile visual search techniques into digital libraries. Such a mobile application scenario in digital libraries has posed significant and unique challenges in document image search. The mobile photograph makes it tough to extract discriminative feat...
Article
The ever-increasing Internet image collection densely samples real-world objects, scenes, etc. and is commonly accompanied by multiple metadata such as textual descriptions and user comments. Such image data has the potential to serve as a knowledge source for large-scale image applications. Facilitated by such publicly available and ever-incre...
Article
The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards that were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance vid...
Article
Extraction and transmission of compact descriptors are of great importance for next-generation mobile visual search applications. Existing visual descriptor techniques mainly compress visual features into compact codes of fixed bit rate, which do not adapt to bandwidth fluctuation in wireless environments. In this letter, we propose a Rate-ad...
Article
In conventional motion compensation, a prediction block is associated with only one motion vector in a P frame. Multihypothesis motion compensation (MHMC) is proposed to improve the prediction performance of conventional motion compensation. However, multiple motion vectors have to be searched and coded for MHMC. In this paper, we propose a new low-cost...
Article
High Efficiency Video Coding (HEVC) offers a significant compression performance benefit over previous standards. Thanks to its high-efficiency prediction tools, blocks with all-zero quantized transform coefficients are quite common in HEVC. The computation load of transform and quantization can be remarkably reduced if the all-zero blocks can be detec...
Article
In this correspondence, we explore a low-complexity adaptive view synthesis optimization (VSO) scheme in the upcoming high-efficiency video coding (HEVC)-based 3-D video coding standard. We first devise a novel zero-synthesized view difference (ZSVD) model which jointly accounts for the distortion of the synthesized view induced by the compound imp...
Conference Paper
Motion estimation is the most complex module, consuming nearly 70% of computation resources in a hardware-based video encoder. This huge computational complexity limits the performance of HD video encoders in terms of encoding speed and power consumption. This paper presents a hardware-oriented multi-resolution motion estimation algorithm us...
Article
Rate distortion optimization (RDO) is the best known mode decision method, but its high implementation complexity limits its applications, and almost no real-time hardware encoder is truly full-featured RDO based. In this paper, first, a full-featured RDO-based mode decision (MD) algorithm is presented, which makes more modes enter the RDO process. Se...
Article
There is an urgent need to develop fast and efficient transcoding methods so as to remarkably save the storage of surveillance videos and synchronously transmit conference videos over different bandwidths. Towards this end, the special characteristics of these videos, e.g., the relatively static background, should be utilized for transcoding. Therefor...
Article
In this paper, a Rate-GOP based frame level rate control scheme is proposed for High Efficiency Video Coding (HEVC). The proposed scheme is developed with the consideration of the new coding tools adopted into HEVC, including the quad-tree coding structure and the new reference frame selection mechanism, called reference picture set (RPS). The cont...
Article
This paper provides an overview of the mode dependent coding tools in the development of video coding technology. In video coding, the prediction mode is closely related with the local image statistical characteristics. For instance, the angular intra mode contains the oriented structure information and the interpartition mode can reveal the local...
Article
Content-based copy detection (CBCD) is drawing increasing attention as an alternative technology to watermarking for video identification and copyright protection. In this article, we present a comprehensive method to detect copies that are subjected to complicated transformations. A multimodal feature representation scheme is designed to exploit t...
Conference Paper
In this paper, we propose a novel perception-based shape decomposition method which aims to decompose a shape into semantically meaningful parts. In addition to three popular perception rules (the Minima rule, the Short-cut rule and the Convexity rule) in shape decomposition, we propose a new rule named part-similarity rule to encourage consistent...
Conference Paper
MB-level parallelism is widely used in parallel video coding thanks to its merits of low latency, no performance loss and high degree of parallelism. Most video encoders with MB-level parallelism employ the MB Row Scheduling (MRS) scheme. In software video encoders, early termination algorithms tend to cause significant differences in coding time of dif...
Conference Paper
To reduce the surveillance video coding cost, it is intuitive to encode surveillance videos by dealing with the foreground objects and the background separately. One widely used method following this strategy is Region-of-Interest (ROI) based coding. To achieve significant improvement for the coding efficiency of ROI based methods, this paper prese...
Conference Paper
In this work, a CTU level rate control algorithm is proposed for High Efficiency Video Coding (HEVC). On top of the CTU level rate control algorithm, an efficient bit allocation method considering the HEVC hierarchical coding structure is specifically designed. Instead of directly applying the rate distortion model at the CTU level, it's proposed t...
Conference Paper
In this paper, we propose a new reduced reference image quality assessment algorithm based on the recent advances in sparse coding and representation, particularly, the entropy of primitives (EoP). The EoP is defined in terms of the distribution of the primitives, which form an overcomplete dictionary to represent the natural scene by linear combin...
Conference Paper
A growing societal awareness about privacy and security pushes the development of signal processing techniques in the encrypted domain. Data compression in the encrypted domain has attracted much attention in recent years because it avoids leaking the data source during compression. This paper proposes an improved block-by-block compression scheme of encr...
Conference Paper
HEVC (High Efficiency Video Coding) has recently been published as the next-generation video coding standard. Compared with previous standards, the coding efficiency is greatly improved at the cost of much higher codec complexity. On the other hand, ARM with SIMD (Single Instruction Multiple Data) instructions is widely deployed on mobile platforms,...
Conference Paper
In this paper, we aim at evaluating the perceptual visual information based on a novel top-down methodology: entropy of primitive (EoP). The EoP is determined by the distribution of the atoms in describing an image, and is demonstrated to exhibit close correlation with the perceptual image quality. Based on the visual information evaluation, we f...
Conference Paper
In the High Efficiency Video Coding (HEVC) based 3D video coding, 3D-HEVC, the disparity vector (DV) derivation is critical for inter-view motion prediction, inter-view residual prediction, disparity-compensated prediction (DCP) or any other tools exploiting inter-view correlation. In HTM-5.0.1, the DV is derived from some spatial and temporal neig...
Conference Paper
Traditional communication systems usually suffer from the threshold effect when the channel signal-to-noise ratio (CSNR) fluctuates unpredictably in wireless and mobile scenarios. The SoftCast scheme, however, provides graceful quality transition over a wide CSNR range. In SoftCast, the input image is decorrelated by a transform and modulated directly to a den...
Conference Paper
This paper proposes a coding tree unit (CTU) level rate control for HEVC based on the Laplace distribution modeling of the transformed residuals. First, we study the relationship model among the optimal quantization step, the Laplace parameter and the Lagrange multiplier. Based on the relationship model, the quantization parameter for e...
Conference Paper
In this paper, a content-adaptive in-loop depth map filter is proposed for HEVC-based 3DV video coding to improve the quality of the synthesized views. The proposed depth map filtering scheme is block based by reusing the TU (transform unit) split structure in HEVC. First, TUs are classified into 3 categories: flat, directional, and textureless, a...
Conference Paper
In this paper, a novel quadratic ρ-domain frame layer rate control algorithm is proposed for Low Delay (LD) configuration in High Efficiency Video Coding (HEVC). Firstly, we propose a quadratic ρ-domain rate quantization (R-Q) model to establish the relationship between bit rate and quantization parameter. Subsequently, inspired by the specialized...