About
74 Publications
12,577 Reads
1,163 Citations
Introduction
Ning Xie received the M.E. and Ph.D. degrees from the Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan, in 2009 and 2012, respectively. He is an associate professor in the School of Computer Science and Engineering, UESTC. His research interests include computer graphics, game engines, and the theory and application of artificial intelligence and machine learning. His research is supported by research grants including NSFC (China), MOE (China), CREST (Japan), and the Ministry of Education, Culture, Sports, Science and Technology (Japan).
Publications (74)
Point clouds serve as the foundational representation of 3D objects, playing a pivotal role in both computer vision and computer graphics. Recently, advances in hardware devices have made point clouds easy to acquire. However, the collected point clouds may be incomplete due to environmental conditions, such as occlus...
Virtual Reality (VR) creates a highly realistic and controllable simulation environment that can manipulate users' sense of space and time. While the sensation of "losing track of time" is often associated with enjoyable experiences, the link between time perception and user experience in VR and its underlying mechanisms remain largely unexplored....
Technological advances have led to the emergence of edge devices, wearable sensors, and the IoT, which have revolutionized how we interact with technology. These devices enable real-time monitoring and analysis of human activities, giving rise to wearable sensor-based human activity recognition (HAR). Several recognition models have been proposed for federated le...
Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks. However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, mitigate this issue...
Point cloud completion aims to complete shapes from their partial observations. Most existing methods use prior information about shapes for point cloud completion, for example by feeding the partial shape into an encoder-decoder deep learning structure to obtain the complete one. However, this very often causes loss of information in the generation...
Nowadays, AR HMDs are widely used in scenarios such as intelligent manufacturing and digital factories. In a factory environment, fast and accurate text input is crucial for operators' efficiency and task completion quality. However, the traditional AR keyboard may not meet this requirement, and the noisy environment is unsuitable for voice input....
Cross-modal retrieval (e.g., querying with a given image to obtain a semantically similar sentence, and vice versa) is an important but challenging task because of the heterogeneity gap and inconsistent distributions between different modalities. The dominant approaches struggle to bridge the heterogeneity by capturing the common representations among heter...
Zhitao Liu, Yi Li, Ning Xie, [...], Wei Zhang
Virtual reality (VR) produces a highly realistic simulated environment with controllable environment variables. This paper proposes a Dynamic Scene Adjustment (DSA) mechanism based on the user interaction status and performance, which aims to adjust the VR experiment variables to improve the user's game engagement. We combined the DSA mechanism wit...
3D cross-modal retrieval is gaining attention in the multimedia community. Central to this topic is learning a joint embedding space to represent data from different modalities, such as images, 3D point clouds, and polygon meshes, to extract modality-invariant and discriminative features. Hence, the performance of cross-modal retrieval methods heav...
The creation of 3D content with deep learning has been a research focus in computer graphics in recent years. Recently, researchers have analyzed 3D shapes through a divide-and-conquer strategy using geometry information and structure information. Although many works perform well, several problems remain. For example, the ge...
3D visualization of digital humans has become a key tool for medical visualization, especially for medical education. Web3D technology has been commonly applied in this field. However, its rendering quality falls short of what medical purposes require. Nowadays, the global illumination (GI) map is an efficient tool for real-time lighting and shadow rende...
With wide applications in surveillance and human-robot interaction, view-invariant human action recognition is critical yet challenging, due to the action occlusion and information loss caused by view change. Current methods mainly seek a common feature space for different views. However, such solutions become invalid when there exist few...
Filtering an image by eliminating noise and irrelevant details while preserving prominent structure edges is an important pre-processing task in many fields, such as image processing and computer vision. In this study, a novel approach called anisotropic joint trilateral rolling filter (AJTRF) is proposed for the smoothing of an image while preserv...
It is difficult to extract text for character recognition and document analysis from document images that contain irregular backgrounds owing to a multitude of causes such as aging, bleed-through, creased paper, smears and stains, or uneven shading. In this paper, we propose a method called progressive restoration of text strokes (PRTS) for extra...
Liang Peng, Yang Yang, Yi Bin, [...], Xing Xu
Visual attention, which allows more concentration on the image regions that are relevant to a reference question, brings remarkable performance improvement in Visual Question Answering (VQA). Most VQA attention models employ the entire reference question representation to query relevant image regions. Nonetheless, only certain salient words of the...
As one of the most successful multimedia tools for digital media and the creative industry, computer-aided drawing systems assist users in converting input real photos into painterly style images. Nowadays, such systems are widely deployed as cloud brush engine services in many creative software tools and artistic rendering applications such as Prisma, Photos...
In this paper, we study the problem of cross-modal retrieval by hashing-based approximate nearest neighbor (ANN) search techniques. Most existing cross-modal hashing work mainly addresses the issue of multi-modal integration complexity using the same mapping and similarity calculation for data from different media types. Nonetheless, this may cause...
Zheng has been one of the most representative plucked string instruments in East Asia for over three thousand years, as illustrated in Figure 1. Although it is still popular, several factors hinder its spread: the instrument is precious and not suitable to carry around, and it is hard to learn on one's own.
Recent years have witnessed unprecedented efforts in visual representation for enabling various efficient and effective multimedia applications. In this paper, we propose a novel visual representation learning framework, which generates efficient semantic hash codes for visual samples by substantially exploring concepts, semantic attributes as...
We propose a framework for statistical modeling of the 3D geometry and topology of botanical trees. We treat botanical trees as points in a tree‐shape space equipped with a proper metric that captures the geometric and the topological differences between trees. Geodesics in the tree‐shape space correspond to the optimal sequence of deformations, i....
We propose a framework for statistical modeling of the 3D geometry and topology of botanical trees. We treat botanical trees as points in a tree-shape space equipped with a proper metric that captures the geometric and the topological differences between trees. Geodesics in the tree-shape space correspond to the optimal sequence of deformations, i....
Predicting the popularity of Point of Interest (POI) has become increasingly crucial for location-based services, such as POI recommendation. Most existing methods seldom achieve satisfactory performance due to the scarcity of POI information, which tends to confine the recommendation to popular scenic spots and ignores the unpop...
Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches rely heavily on static visual information or capture only local temporal knowledge (e.g., within 16 frames), and thus can hardly describe motions accurately from a global view. In this paper, we propose a novel video capti...
Hashing methods have been extensively applied to efficient multimedia data indexing and retrieval owing to the explosion of multimedia data. Cross-modal hashing usually learns binary codes by mapping multi-modal data into a common Hamming space. Most supervised methods utilize relation information like class labels as pairwise similarities of cros...
Recent years have witnessed unprecedented efforts in visual representation for enabling various efficient and effective multimedia applications. In this paper, we propose a novel visual representation framework, which generates efficient semantic hash codes for visual samples by substantially exploring concepts, semantic attributes as well as t...
Among various traditional art forms, brush stroke drawing is one of the most widely used styles in modern computer graphics tools such as GIMP, Photoshop and Painter. In this paper, we develop an AI-aided art authoring (A4) system of non-photorealistic rendering that allows users to automatically generate brush stroke paintings in a specific artist's sty...
We propose an algorithm for generating novel 3D tree model variations from existing ones via geometric and structural blending. Our approach is to treat botanical trees as elements of a tree-shape space equipped with a proper metric that quantifies geometric and structural deformations. Geodesics, or shortest paths under the metric, between two poi...
Action recognition in videos, which contain complex semantic content, is still a challenging task in computer vision research. In this paper, we propose a novel attention mechanism that leverages the gate system of Long Short Term Memory (LSTM) to compute the attention weights for action recognition. The proposed attention mechanism is e...
Subspace representations have been widely applied for videos in many tasks. In particular, the subspace-based query-by-image video retrieval (QBIVR), facing high challenges on similarity-preserving measurements and efficient retrieval schemes, urgently needs considerable research attention. In this paper, we propose a novel subspace-based QBIVR fra...
Computer-aided drawing systems assist users in converting input real photos into painterly style images. Nowadays, such systems are widely deployed as cloud brush engine services in many creative software tools and artistic rendering applications such as Prisma [1], Photoshop [2], and Meitu [3], because the machine learning server is more powerful than t...
Large-scale search methods are increasingly critical for many content-based visual analysis applications, among which hashing-based approximate nearest neighbor (ANN) search techniques have attracted broad interest due to their high efficiency in storage and retrieval. However, existing hashing works are commonly designed for measuring data simila...
With the explosive growth of user-generated content (e.g., texts, images, and videos) on social networks, it is of great significance to analyze and extract people’s interests from massive social media data, thus providing more accurate personalized recommendations and services. In this paper, we propose a novel multimodal deep learning algorit...
In this paper, we introduce a novel deep semantic indexing method, a.k.a. captioning, for image databases. Our method can automatically generate a natural language caption describing an image as a semantic reference to index the image. Specifically, we use a convolutional localization network to generate a pool of region proposals from an image, and...
Zero-shot learning (ZSL) aims to recognize classes whose samples did not appear during training. Existing research focuses on mapping deep visual features to a semantic embedding space explicitly or implicitly. However, ZSL improvements brought by discriminative feature transformation are not well studied. In this paper, we propose a ZSL framework that map...
Predicting the popularity of Point of Interest (POI) has become increasingly crucial for location-based services, such as POI recommendation. Most existing methods seldom achieve satisfactory performance due to the scarcity of POI information, which tends to confine the recommendation to popular scenic spots and ignores the unpo...
Cosmetic medical visualization has become an important application in computer graphics, especially for facial appearance visualization [Chandawarkar et al. 2013]. Recent approaches have achieved very realistic results using blend shapes [Ma et al. 2012], which are the most practical tool to make the facial appearance and expression animation in applicatio...
Text line extraction from a document image is a very important task for optical character recognition, document analysis, etc. In this paper, a novel approach is presented to extract text lines from a printed or handwritten document image. The document image is first binarized, then connected components are detected, and consequently character...
Nowadays, voice acting plays an increasingly important role in video games, especially in role-playing games, anime-based games and serious games. To enhance communication, naturally synchronizing lip and mouth movements is an important part of a convincing 3D character performance [Xu et al. 2013]. In this paper, we propose a lightweight Li...
This paper proposes a simple approach for extracting text lines and segmenting image regions from a color document image that mixes textual and non-textual regions and has uneven shading, finally obtaining a clean document image. Our experimental results demonstrate that the proposed approach produces plausible results.
As a result of informatization in construction, Building Information Modeling (BIM) has now become a core technology for smart construction. We present a Web3D-based lightweighting solution for real-time visualization of large-scale BIM scenes, considering the redundancy, semantics, and the parameterization of BIM data under the limited resources o...
This paper explores how the reconstruction of special historical scenarios can be applied in online education. After investigating various virtual reality techniques, including design of virtual educational systems, reconstruction of virtual scenes, scene management, AI, lightweighting of 3D models, and light and shadow rendering, we build an online educat...
With the rapid popularization of mobile cameras, capturing and storing documents has become easy. However, when a document is illuminated under poor conditions, the document image may exhibit uneven shading. In that case, it is difficult to restore text for character recognition, document analysis, etc. In this paper, we propose an effective and sim...
Unlike virtual scenes in stand-alone applications, the web version may have larger-scale scenes and more concurrent users. However, network delay (latency) and the limited computational power of the servers prevent users from accessing the virtual scene interactively and fluently. Meanwhile, the...
3D tree models are widely applied to construct large-scale virtual scenes. However, converting real trees into computer representation faces two main problems. One is the low quality of reconstructed point cloud. The other is that the skeletonization produces inaccurate results due to the complex structure of trees. We propose a novel pipeline to r...
Recent years have witnessed the effectiveness and efficiency of learning-based hashing methods, which generate short binary codes preserving the Euclidean similarity in the original high-dimensional space. However, because of their complexity and out-of-sample problems, most methods are not appropriate for embedding large-scale datasets. In...
The neighbor table/distributed hash table (DHT) is used to choose the data supplier for data-dispatching services in distributed virtual environments based on peer-to-peer networks. It is essential that a stable and efficient neighbor table/DHT be maintained. Because the avatar has much freedom to roam, the spatial distribution of nodes is not unif...
Accessing Web3D content over the Internet is relatively slow under limited bandwidth. Preprocessing of 3D models, such as 3D compression and progressive meshes (PM), can alleviate the problem. But none of these methods considers the similarity between components of a 3D model, which could be exploited to further improve the efficienc...
Segmentation of document images into text or drawings is an important process, which is often related to binarization of a document image to perform character recognition and document analysis. This process is easier to do using a document image with a uniform background and illuminated under well-conditioned lighting. However, when a document imag...
Although image contrast enhancement is a low-level image processing issue, it is very important for improving image quality. This paper proposes a new algorithm using an optimal linear transform to enhance color images while preserving hue attributes and scaling saturation for each color pixel. The algorithm has two features: 1) optimal piecewise affine...
Regression aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional density is multimodal, heteroskedastic, and asymmetric. In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challenging in high-dimensional space....
We propose a collaborative web interior design system that assists multiple users in real-time house editing and furniture arrangement. Our system incorporates the transparent adaptation layer, which is an important network protocol, to achieve real-time collaborative interoperability among online users. Instead of adapting the basic linear stack...
As a pre-processing step for document analysis, document image binarization is often performed. Binarization is regarded as a necessary step to separate text and background. When document images are acquired under poor lighting circumstances, the images are soiled with uneven shading. This paper presents an improved approach for binarizing docu...
Oriental ink painting, called Sumi-e, is one of the most appealing painting styles that have attracted artists around the world. Major challenges in computer-based Sumi-e simulation are to abstract complex scene information and draw smooth and natural brush strokes. To automatically find such strokes, we propose to model the brush as a reinforcement...
In this work, we propose a new non-photorealistic rendering approach to the creation of artistic paintings from a color image. Our algorithm consists chiefly of two steps: a trilateral filter is first applied to the original image to create drawings perpendicular to the edges, and then a DoG-like band-pass filter is adopted to generate streams al...
We propose a sketch-based system for rendering oriental brush strokes on complex shapes. While previous research has focused on methods for converting user-specified trajectories into oriental ink painting (Sumi-e) strokes, we propose an approach that takes as input the contours of complex shapes and automatically estimates the sizes of the brush foot...
We propose an interactive sketch-based system for rendering oriental brush strokes on complex shapes. We introduce a contour-driven approach: the user inputs contours to represent complex shapes, the system automatically estimates the optimal trajectory of the brush, and then renders them into oriental ink paintings. Unlike previous work where the...
In this paper, we propose an interactive sketch-based system for simulating oriental brush strokes on complex shapes. We introduce a contour-driven approach where the user inputs contours to represent complex shapes, the system automatically estimates the optimal trajectory of the brush, and then renders them into oriental ink painting. Unlike previ...