Guangtao Zhai
Shanghai Jiao Tong University | SJTU ·  Department of Electronic Engineering

Ph.D.
Full texts are available on my Google Scholar page. Please don't send full-text requests.

About

525
Publications
123,331
Reads
9,904
Citations
Citations since 2016
378 Research Items
8969 Citations
[Chart: citations per year, 2016 to 2022]
Introduction
Guangtao Zhai is currently a professor in the Department of Electronic Engineering, Shanghai Jiao Tong University. His research focuses on multimedia signal processing. He serves as Editor-in-Chief of the journal Displays (Elsevier).
Additional affiliations
December 2017 - present
Shanghai Jiao Tong University
Position
  • Professor (Full)
August 2012 - July 2013
Friedrich-Alexander-University of Erlangen-Nürnberg
Position
  • Humboldt Research Fellow
May 2012 - December 2017
Shanghai Jiao Tong University
Position
  • Professor

Publications (525)
Article
Full-text available
With the development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary value of AR is to promote the fusion of digital content and real-world environments; however, studies on how this fusion influences the Quality of Experience (QoE) of these two components are lacking. To ach...
Article
To improve the viewer’s Quality of Experience (QoE) and optimize computer graphics applications, 3D model quality assessment (3D-QA) has become an important task in the multimedia area. Point clouds and meshes are the two most widely used digital representation formats for 3D models, and their visual quality is quite sensitive to lossy operations li...
Article
Recent years have witnessed the rapid development of image storage and transmission systems, in which image compression plays an important role. Generally speaking, image compression algorithms are developed to ensure good visual quality at limited bit rates. However, due to the different compression optimization methods, the compressed images may...
Article
Purpose: Large-scale jaw reconstruction rarely achieves satisfactory results when it relies solely on doctors’ experience. In this study, we assessed a new approach that uses a machine learning algorithm based on jaw feature points to assist complex jaw reconstruction in patients with maxillary and mandibular defects. Methods: One hundred and two comp...
Preprint
Full-text available
Human motion synthesis is a long-standing problem with various applications in digital twins and the Metaverse. However, modern deep-learning-based motion synthesis approaches rarely consider the physical plausibility of the synthesized motions and consequently often produce unrealistic human motions. In order to solve this problem, we propose a...
Preprint
Full-text available
No-reference image quality assessment (NR-IQA) aims to quantify how humans perceive visual distortions of digital images without access to their undistorted references. NR-IQA models are extensively studied in computational vision, and are widely used for performance evaluation and perceptual optimization of man-made vision systems. Here we make on...
Chapter
This paper presents a new vision Transformer, named Iwin Transformer, which is specifically designed for human-object interaction (HOI) detection, a detailed scene understanding task involving a sequential process of human/object detection and interaction recognition. Iwin Transformer is a hierarchical Transformer which progressively performs token...
Article
Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimations of user visual fixations and head movements can enhance the quality of experience by allocating more computational resources for analyzing, rendering and 3D registration on the areas of interest. However, there is inadequate research to he...
Preprint
Full-text available
Digital humans have attracted growing research interest over the last decade, and considerable effort has been devoted to their generation, representation, rendering, and animation. However, the quality assessment of digital humans has lagged behind. Therefore, to tackle the challenge of digital human quality assessment, we pro...
Article
Quality sleep is a basic human need for well-being, yet sleep deprivation has long been a global problem. A common cause of sleep deprivation is obstructive sleep apnea, in which people repeatedly stop breathing during sleep, with subsequent abnormal vital signs, namely respiration rate and heart rate. While tremendous effort has been made for vita...
Article
Perceptual quality assessment of videos acquired in the wild is of vital importance for the quality assurance of video services. The inaccessibility of pristine-quality reference videos and the complexity of authentic distortions pose great challenges for this kind of blind video quality assessment (BVQA) task. Although model-based transfer l...
Preprint
Full-text available
The visual quality of point clouds has received growing emphasis as ever more 3D vision applications are expected to provide cost-effective, high-quality experiences for users. Looking back on the development of point cloud quality assessment (PCQA) methods, the visual quality is usually evaluated by utilizing single-modal information...
Article
Current saliency prediction models based on convolutional neural networks (CNNs) achieve solid improvements in predicting human attention on omnidirectional images (ODIs). However, these models, which employ standard convolution, have two main shortcomings: they are content-agnostic and computation-intensive. To address these two shortcomings, we propose a decoup...
Preprint
Full-text available
Point clouds are among the most widely used digital representation formats for 3D content; their visual quality may suffer from noise and geometric shift during production, as well as from compression and downsampling during transmission. To tackle the challenge of point cloud quality assessment (PCQA), many PCQA methods...
Preprint
Full-text available
Objective quality assessment of 3D point clouds is essential for the development of immersive multimedia systems in real-world applications. Despite the success of perceptual quality evaluation for 2D images and videos, blind/no-reference metrics are still scarce for 3D point clouds with large-scale irregularly distributed 3D points. Therefore, in...
Article
Full-text available
Efficiently modeling spatial–temporal information in videos is crucial for action recognition. To achieve this goal, state-of-the-art methods typically employ the convolution operator and the dense interaction modules such as non-local blocks. However, these methods cannot accurately fit the diverse events in videos. On the one hand, the adopted co...
Preprint
Omnidirectional images and videos can provide an immersive experience of real-world scenes in Virtual Reality (VR) environments. We present a perceptual omnidirectional image quality assessment (IQA) study in this paper, since providing a good quality of experience is extremely important in VR environments. We first establish an omnidirectiona...
Preprint
Existing learning-based frame interpolation algorithms extract consecutive frames from high-speed natural videos to train the model. Compared with natural videos, cartoon videos usually have a low frame rate. Besides, the motion between consecutive cartoon frames is typically nonlinear, which breaks the linear motion assumption of interpolation alg...
Article
Palmprints have long been of practical and cultural interest. Palmprint principal lines, also called primary palmar lines, are among the most dominant palmprint features and do not change over the lifespan. Existing methods use filters and edge detection operators to extract the principal lines from the palm region of interest (ROI), but cannot dis...
Conference Paper
Full-text available
Image quality assessment (IQA) is very important for both end-users and service providers, since a high-quality image can significantly improve the user's quality of experience (QoE). Most existing blind image quality assessment (BIQA) models were developed for synthetically distorted images; however, they perform poorly on in-the-wild images, which...
Preprint
With the development of rendering techniques, computer-graphics-generated images (CGIs) have been widely used in practical application scenarios such as architectural design, video games, simulators, and movies. Unlike natural scene images (NSIs), the distortions of CGIs are usually caused by poor rendering settings and limited computation re...
Preprint
To support application scenarios where high-resolution (HR) images are urgently needed, various single image super-resolution (SISR) algorithms have been developed. However, SISR is an ill-posed inverse problem, which may introduce artifacts such as texture shift and blur into the reconstructed images; thus it is necessary to evaluate the quality of super-...
Preprint
An intelligent video surveillance system (IVSS) can automatically analyze the content of surveillance images (SIs) and reduce the burden of manual labour. However, SIs may suffer quality degradations during acquisition, compression, and transmission, which make it hard for the IVSS to understand their content. In this paper, we f...
Preprint
4K content can deliver a more immersive visual experience to consumers due to the huge improvement in spatial resolution. However, existing blind image quality assessment (BIQA) methods are not suitable for original and upscaled 4K content due to the expanded resolution and specific distortions. In this paper, we propose a deep learning-ba...
Preprint
Point clouds are among the most widely used digital formats for 3D models, and their visual quality is quite sensitive to distortions such as downsampling, noise, and compression. To tackle the challenge of point cloud quality assessment (PCQA) in scenarios where no reference is available, we propose a no-reference quality assessment metric for c...
Preprint
Recent years have witnessed the rapid development of image storage and transmission systems, in which image compression plays an important role. Generally speaking, image compression algorithms are developed to ensure good visual quality at limited bit rates. However, due to the different compression optimization methods, the compressed images may...
Preprint
Full-text available
We present a novel vision Transformer, named TUTOR, which is able to learn tubelet tokens that serve as highly abstracted spatiotemporal representations for video-based human-object interaction (V-HOI) detection. The tubelet tokens structure videos by agglomerating and linking semantically related patch tokens along the spatial and temporal domains, wh...
Article
The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, meanwhile introducing novel distortions to the processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, which are weak at adapting to subpopulation shif...
Preprint
Full-text available
Blind image quality assessment (BIQA), which aims to accurately predict image quality without any pristine reference information, has attracted considerable attention over the past decades. In particular, great progress has been achieved with the help of deep neural networks. However, BIQA remains less investigated for night-time images (NTIs), wh...
Article
A single superimposed image containing two image views causes visual confusion for both human vision and computer vision. Human vision needs a "develop-then-rival" process to decompose the superimposed image into two individual images, which effectively suppresses visual confusion. However, separating individual image views from a single superimpos...
Preprint
Full-text available
Quality assessment for User Generated Content (UGC) videos plays an important role in ensuring the viewing experience of end-users. Previous UGC video quality assessment (VQA) studies use either image recognition models or image quality assessment (IQA) models to extract frame-level features of UGC videos for quality regression, which are re...
Article
Full-text available
Crowd counting has long been a challenging task due to perspective distortion and variability in head size. Previous methods either ignore the multi-scale information in images or simply use convolutions with different kernel sizes to extract multi-scale features, resulting in incomplete multi-scale features. In this paper, we propose a...
Preprint
Full-text available
With the rapid development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary theory underlying AR is human visual confusion, which allows users to perceive the real-world scenes and augmented contents (virtual-world scenes) simultaneously by superimposing them together. To achieve g...
Preprint
Full-text available
With the development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary value of AR is to promote the fusion of digital content and real-world environments; however, studies on how this fusion influences the Quality of Experience (QoE) of these two components are lacking. To ach...
Chapter
Recently, there has been growing interest in Ultra High-Definition (UHD) content, which brings a better visual experience to end-users. However, a considerable amount of 4K-resolution content is upscaled from High-Definition (HD) content and suffers quality degradations such as blur and texture shift. Such pseudo-4K content cannot deliver the...
Preprint
Full-text available
This paper presents a new vision Transformer, named Iwin Transformer, which is specifically designed for human-object interaction (HOI) detection, a detailed scene understanding task involving a sequential process of human/object detection and interaction recognition. Iwin Transformer is a hierarchical Transformer which progressively performs token...
Preprint
Full-text available
In this paper, we propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following. Current approaches decouple the HGT detection task into separate branches of salient object detection and human gaze prediction, employing a two-stage framework where human head locations must first be detected and then be fed in...
Preprint
Full-text available
Conversation is an essential component of virtual avatar activities in the metaverse. With the development of natural language processing, textual and vocal conversation generation has achieved significant breakthroughs. Face-to-face conversations account for the vast majority of daily conversations. However, this task has not received enough atte...
Preprint
Recently, image quality has generally been described by a mean opinion score (MOS). However, we observe that the quality scores an image receives from a group of subjects are highly subjective and diverse. Thus, a MOS alone is not enough to describe image quality. In this paper, we propose to describe image quality using a parameterized distribution...
Article
Mesh is a type of data structure commonly used for 3-D shapes. Representation learning for 3-D meshes is essential in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) for structured data (e.g., images) suggests the value of adapting insights from CNNs to 3-D shapes. However, 3-D shape data a...
Article
Full-text available
Software-defined wireless sensor networks (SDWSN), where the data and control planes are decoupled, are more suited to handling big sensor data and effectively monitoring dynamic environments and events. To overcome the limitations of using static routing tables under high traffic intensity, such as network congestion, high packet loss rate, low th...
Preprint
Most video understanding methods are trained on high-quality videos. However, in most real-world scenarios, videos are compressed before transmission and then decompressed for understanding. The decompressed videos are degraded in perceptual quality, which may degrade performance on downstream tasks. To address this issue, we propo...
Preprint
Full-text available
While recent advances in deep neural networks have made it possible to render high-quality images, generating photo-realistic and personalized talking heads remains challenging. Given an audio input, the key to tackling this task is synchronizing lip movements while simultaneously generating personalized attributes such as head movements and eye blinks. In this...
Article
Full-text available
Ophthalmologists must manually measure many ocular parameters during diagnosis, a process that is time-consuming, unsanitary, subjective, and unrepeatable. Such manually obtained parameters often undermine the objectivity of clinical trials, resulting in unreliable clinical conclusions. We designed a two-phase algorithm to automatically...
Article
For people who ardently love painting but unfortunately have visual impairments, holding a paintbrush to create a work is a very difficult task. People in this special group are eager to pick up the paintbrush, like Leonardo da Vinci, to create and make full use of their own talents. Therefore, to maximally bridge this gap, we propose a painting navi...
Article
Mobile games have played an increasingly significant role in people's leisure lives in recent years, thanks to the fast expansion of the gaming industry and the widespread use of mobile devices. The aesthetic quality of game pictures is a very important factor that attracts users' interest. However, evaluating the aesthetic quality of mobile game p...
Article
Graph entropy measures have recently gained wide attention for identifying and discriminating various networks in biology, society, transportation, etc. However, existing methods cannot sufficiently explore the structural contents by merely considering the elementary invariants of a graph, ignoring the underlying patterns in higher-order features....
Article
Imagine an interesting situation: while watching a movie, we scan the screen with our smartphones to get extra information about the movie, such as the cast, the release date, and the movie's homepage. Our prospect is a world where each video contains invisible information that can be delivered to us through mobile devices with cameras. Th...
Article
4K content can deliver a more immersive visual experience to consumers due to the huge improvement in spatial resolution. However, the high spatial resolution brings a great challenge for video transmission and storage. Therefore, it is necessary to compress or downscale 4K content before transmitting it to end-users. Existing blind image q...
Article
Image quality assessment (IQA) is very important for both end-users and service providers, since a high-quality image can significantly improve the user's quality of experience (QoE). Most existing blind image quality assessment (BIQA) models were developed for synthetically distorted images; however, they perform poorly on in-the-wild images, which...
Article
High-speed video acquisition under poor illumination conditions is a challenging task. Imaging using long exposure can ensure brightness and suppress noise. However, the captured images may be blurry due to fast object movements or camera shakes. Imaging with short exposure can record sharp textures, but the high camera gain may cause noticeable no...
Article
Low-light image enhancement algorithms (LIEAs) can light up images captured in dark or backlit conditions. However, LIEAs may introduce various distortions such as structural damage, color shift, and noise into the enhanced images. Although various LIEAs have been proposed in the literature, few efforts have been made to study the quality evaluation of low...