About
Publications: 168
Reads: 17,142
Citations: 1,290
Education
September 1998 - July 2001
September 1987 - July 1991
Publications (168)
Video location prediction for handwritten digits presents unique challenges in computer vision due to the complex spatiotemporal dependencies and the need to maintain digit legibility across predicted frames. While existing deep learning-based video prediction models have shown promise, they often struggle with preserving local details and typicall...
Short-term heavy rainfall prediction is a critical and practical research domain with direct implications for property and life safety. However, existing RNN-based models utilizing Weather Radar Echo Images for prediction tasks lack error calibration mechanisms at each time step, leading to a significant issue of cumulative error that results in a...
Neural radiance fields (NeRF) have achieved a revolutionary breakthrough in the novel view synthesis task for complex 3D scenes. However, this new paradigm struggles to meet the requirements for real-time rendering and high perceptual quality in virtual reality. In this paper, we propose VPRF, a novel visual perceptual based radiance fields representa...
Indoor positioning is the key enabling technology for many location-aware applications. As GPS does not work indoors, various solutions are proposed for navigating devices. Among these solutions, Bluetooth low energy (BLE) technology has gained significant attention due to its affordability, low power consumption, and rapid data transmission capabi...
Vehicle re-identification (vehicle ReID) is designed to recognize all instances of a specific vehicle across various camera viewpoints, facing significant challenges such as high similarity among different vehicles from the same viewpoint and substantial variance for the same vehicle across different viewpoints. In this paper, we introduce the RAND...
In the current industrial informatics society, the numerous cameras deployed in the modern city promote the development of various video services, such as security monitoring and object retrieval. However, traditional methods encounter data leakage risks. Some camera owners are reluctant to share their data since the video contains confidential inf...
Multi-label image classification datasets are often partially labeled where many labels are missing, posing a significant challenge to training accurate deep classifiers. However, the powerful Mixup sample-mixing data augmentation cannot be well utilized to address this challenge, as it cannot perform linear interpolation on the unknown labels to c...
Model-Driven Engineering (MDE) is a technique that aims to boost productivity in software development and ensure the safety of critical systems. Central to MDE is the refinement of high-level requirement models into executable code. Given that requirement models form the foundation of the entire development process, ensuring their correctness is cr...
In 5G Radio Access Networks (RAN), network slicing is a crucial technology for offering a variety of services. Inter-slice resource allocation is important for dynamic service requirements. In order to implement inter-slice bandwidth resource allocation at a large time scale, we used the Multi-Agent deep reinforcement learning (DRL) Asynchronous Ad...
Indoor wireless positioning has long been a dynamic field of research due to its broad application range. While many commercial products have been developed, they often are not open source or require substantial and costly infrastructure. Academically, research has extensively explored Bluetooth Low Energy (BLE) for positioning, yet there are a not...
In response to the challenge of handling large-scale 3D point cloud data, downsampling is a common approach, yet it often leads to the problem of feature loss. We present a dynamic downsampling algorithm for 3D point cloud maps based on an improved voxel filtering approach. The algorithm consists of two modules, namely, dynamic downsampling and poi...
Early diagnosis of abnormal electrocardiogram (ECG) signals can provide useful information for the prevention and detection of arrhythmia diseases. Due to the similarity between the Normal beat (N) and Supraventricular Premature Beat (S) categories and the imbalance of ECG categories, arrhythmia classification cannot achieve satisfactory classification result...
In nature, objects that use camouflage have features like colors and textures that closely resemble their background. This creates visual illusions that help them hide and protect themselves from predators. This similarity also makes the task of detecting camouflaged objects very challenging. Methods for camouflaged object detection (COD), which re...
As the most commonly used attack strategy by Botnets, the Domain Generation Algorithm (DGA) has strong invisibility and variability. Using deep learning models to detect different families of DGA domain names can improve the network defense ability against hackers. However, this task faces an extremely imbalanced sample size among different DGA cat...
Industrial process safety has always been a concern for engineers and researchers. Fault diagnosis frameworks based on data-driven methods are prevalent and play a vital role in guaranteeing industrial process safety. However, the data collected in actual industrial production regularly exhibits high-dimensional and complex timing characteristics....
Data lakes are typically large data repositories where enterprises store data in a variety of data formats. From the perspective of data storage, data can be categorized into structured, semi-structured, and unstructured data [1]. On the one hand, due to the complexity of data forms and transformation procedures, many enterprises simply pour valuab...
Depth estimation, which extracts a scene's structural information, is a key step in various light field (LF) applications. However, most existing depth estimation methods are based on the Lambertian assumption, which limits the application in non-Lambertian scenes. In this paper, we discover a unique transparent cheating problem for non-Lambertian scenes wh...
Light field (LF) imaging benefits a wide range of applications with the geometry information it captures. However, due to the restricted sensor resolution, LF cameras sacrifice spatial resolution for sufficient angular resolution. Hence LF spatial super-resolution (LFSSR), which relies heavily on inter-intra view correlation extraction, is widely studie...
The Macao Government provides web-based streaming videos for the public to monitor live traffic and road conditions across the city. This allows individuals to review the latest road traffic conditions online before planning their travels. To let road users make better and faster decisions, it is desirable to design an automated subsystem in an Int...
Real-time rendering offers instantaneous visual feedback, making it crucial for mixed-reality applications. The light field captures both light intensity and direction in a 3D environment, serving as a data-rich medium to enhance mixed-reality experiences. However, two major challenges remain: 1) current light field rendering techniques are unsuita...
In the contemporary industrial landscape, the widespread deployment of data collection units has become the standard, significantly enhancing the synchronization of data-driven control and monitoring systems. However, high noise levels and sensor failures frequently lead to non-uniform data loss, including random and block missing, which severely h...
To enhance users’ immersion in the mixed reality (MR) cross-scene environment, it is imperative to make geometric modifications to arbitrary multi-scale virtual scenes, including adjustments to layout and size, based on the appearance of diverse real-world spaces. Numerous studies have been conducted on the layout arrangement of pure virtual scenes...
Image multi-label classification datasets are often partially labeled (for each sample, only the labels on some categories are known). One popular solution for training convolutional neural networks is treating all unknown labels as negative labels, named Negative mode. But it produces wrong labels unevenly over categories, decreasing the binary cl...
Light field (LF) is an emerging technology, which can be used in many fields. Furthermore, LF cameras can capture spatial and angular information of 3D real-world scenes. This information is beneficial for image super-resolution (SR). However, most existing LF approaches have the limitation of utilizing the global-view information, which contains th...
The task of dense video captioning is to generate detailed natural-language descriptions for an original video, which requires deep analysis and mining of semantic captions to identify events in the video. Existing methods typically follow a localisation-then-captioning sequence within given frame sequences, resulting in caption generation that is...
Zebrafish behavioral patterns reveal valuable insights for biomedical research. To accurately identify these patterns, visual tracking systems need to reconstruct 3D trajectories from multi-view video sequences. However, 3D zebrafish tracking faces challenges such as the dynamics in movements, the similarity in appearances, and the distortion cause...
Dense video captioning is a task that aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames. However, most of the existing methods only use visual features in the video and ignore the audio features that are also essential for understanding the video. In this paper, we propose a fusion mod...
In person re-identification (re-id), the key to retrieving the correct person image is to extract discriminative features. The features at different levels are considered complementary. In this work, we design a person re-id learning network, called ASCLNet, that can extract mutually complementary multi-level features. ASCLNet contains three feature branches, and...
As a vital technology for ensuring the stable operation of industrial equipment, fault diagnosis has received considerable research attention in recent years. Most complex industrial processes are in normal working conditions during operation, so the amount of data collected under normal working conditions is much larger than that under fault working conditions....
In actual industrial processes, although a large number of original data are easy to obtain, only a few samples are effectively labelled, which is insufficient to construct a supervised fault diagnostic model. Facing the industrial demand of fault diagnosis, in this paper, a novel density ratio (DR)‐based batch active learning (BAL) fault diagnosis...
Light field (LF) images acquired by hand-held devices suffer from a trade-off between spatial and angular resolutions. To solve this problem, super-resolution (SR) in the spatial and angular domains is studied separately in previous works. However, spatial-angular correlation cannot be reconstructed effectively by the separate SR methods. In this...
Light field (LF) cameras record multiple perspectives through a sparse sampling of real scenes, and these perspectives provide each other with complementary information. This information is beneficial to LF super-resolution (LFSR). Compared with traditional single-image super-resolution (SISR), LF has the parallax structure and perspective cor...
Multiple Object Tracking (MOT) usually adopts the Tracking-by-Detection paradigm, which transforms the problem into data association. However, these methods are restricted by detector performance, especially in dense scenes. In this paper, we propose a novel group-guided data association, which improves the robustness of MOT to error detections and...
In the digital world, gesture recognition plays a crucial role in human-computer interaction (HCI). In this paper, we propose an innovative wearable air-writing system that allows users to write the English alphabet and Arabic numerals in free space without using any predefined gestures or rules. Based on an Inertial Measurement Unit (IMU), the pro...
Light field (LF) images taken by plenoptic cameras can record spatial and angular information from real-world scenes, and it is beneficial to fully integrate these two pieces of information to improve image super-resolution (SR). However, most of the existing approaches to LF image SR cannot fully fuse the information at the spatial and angular lev...
Light field (LF) depth estimation is a crucial basis for LF-related applications. Most existing methods are based on the Lambertian assumption and cannot deal with non-Lambertian surfaces represented by transparent objects and mirrors. In this paper, we propose a novel Adaptive-Cross-Operator-based (ACO) depth estimation algorithm for non-Lambertian...
Network slicing is a critical technology for fifth-generation (5G) networks, owing to its merits in meeting the diversified requirements of users. Effective resource allocation for network slicing in Radio Access Networks (RAN) is still challenging owing to dynamic service requirements. Therein, automatic resource allocation based on environmental...
This study explores the “Diz lá!” mobile application, an innovative tool released in 2018, enabling users, especially Chinese speakers, to learn Portuguese. This mobile application harnesses the principles of Mobile-Assisted Language Learning (MALL) and Self-Determination Theory (SDT), facilitating continued language education amid the COVID-19...
Vehicle Re-Identification (ReID) aims to find images of the same vehicle from different videos. It remains a challenging task in the video analysis field due to the huge appearance discrepancy of the same vehicle in cross-view matching and the subtle differences between similar vehicles in same-view matching. In this paper, we propose a Co-occu...
Vehicle re-identification (ReID) is a hot topic in intelligent city surveillance. With the development of smart cameras and vehicular edge computing (VEC), abundant media data have opened up new possibilities for enhancing the applications of vehicle ReID. However, traditional vehicle re-identification systems face the following challenges: 1) it is...
The problem of vehicle re-identification in surveillance scenarios has grown in popularity as a research topic. Deep learning has been successfully applied in re-identification tasks in the last few years due to its superior performance. However, deep learning approaches require a large volume of training data, and it is particularly crucial in veh...
The vehicle re-identification (V-ReID) task is critical in urban surveillance and can be used for a variety of purposes. We propose a novel augmentation method to improve the V-ReID performance. Our deep learning framework mainly consists of a local rotation transformation and a target selection module. In particular, we begin by using a random sel...
An image caption is a textual explanation automatically generated by a computer according to the content of an image. It involves both image and natural language processing, and has thus become an important research topic in pattern recognition. Deep learning has been successful in accomplishing this task, and the quality of captions generated by existing...