Ruize Han · Tianjin University (TJU) · College of Intelligence and Computing
Ruize Han
Doctor of Engineering
I received my Ph.D. degree from the College of Intelligence and Computing at Tianjin University (TJU).
About
Publications: 40
Reads: 14,424
Citations: 1,207
Introduction
Ruize Han received the B.S. degree in mathematics and applied mathematics from Hebei University of Technology, China, in 2016, and the M.E. degree in computer technology from Tianjin University, China, in 2019. He is currently a Ph.D. candidate with the College of Intelligence and Computing at Tianjin University, China. His major research interest is visual intelligence, specifically including multi-camera video collaborative analysis and visual object tracking.
Publications (40)
Open-vocabulary multi-object tracking (OVMOT) represents a critical new challenge involving the detection and tracking of diverse object categories in videos, encompassing both seen categories (base classes) and unseen categories (novel classes). This issue amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object trac...
Human interaction recognition is an essential task in video surveillance. Current work on human interaction recognition mainly focuses on scenarios that contain only close-contact interacting subjects, without other people. In this paper, we handle more practical but more challenging scenarios where interactive subjects are contactless and...
Multi-view multi-human association and tracking (MvMHAT) is an emerging yet important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from previous MOT and multi-camera MOT tasks only consi...
We study a novel yet practical problem of open-corpus multi-object tracking (OCMOT), which extends the MOT into localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, but without the category text list as prompt. To study this problem, the top priority is to build a benchmark. In this work,...
Vision-based sign language translation (SLT) aims to translate sign language videos into understandable natural language sentences. Current SLT methods ignore contextual information in specific dialogue scenarios, which may lead to incorrect translations that do not match the dialogue content. Accordingly, this work proposes a...
Using multiple moving cameras with different and time-varying views can significantly expand the capability of multiple human tracking in larger areas and with various perspectives. In particular, the use of moving cameras of complementary top and horizontal views can facilitate multi-human detection and tracking from both global and local perspect...
The potential of video surveillance can be further explored by using mobile cameras. Drone-mounted cameras at a high altitude can provide top views of a scene from a global perspective while cameras worn by people on the ground can provide first-person views of the same scene with more local details. To relate these two views for collaborative anal...
We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration. This is a very challenging problem since its only input is several RGB images from different first-person views (FPVs) for a multi-person scene, without the BEV image and the calibration of the FPVs, while the out...
Person re-identification (Re-ID) is a classical computer vision task and has achieved great progress so far. Recently, long-term Re-ID with clothes-changing has attracted increasing attention. However, existing methods mainly focus on the image-based setting, where richer temporal information is overlooked. In this paper, we focus on the relatively new...
Gait recognition is an important AI task, which has progressed rapidly with the development of deep learning. However, existing learning-based gait recognition methods mainly focus on a single domain, especially the constrained laboratory environment. In this paper, we study a new problem of unsupervised domain adaptive gait recognition (UDA...
Human group detection, which splits a crowd of people into groups, is an important step for video-based human social activity analysis. The core of human group detection is the human social relation representation and division. In this paper, we propose a new two-stage multi-head framework for human group detection. In the first stage, we propose a h...
To obtain a more comprehensive activity understanding for a crowded scene, in this paper, we propose a new problem of panoramic human activity recognition (PAR), which aims to simultaneously achieve the recognition of individual actions, social group activities, and global activities. This is a challenging yet practical problem in real-world applic...
Visual object tracking is an important task in computer vision, which has many real-world applications, e.g., video surveillance and visual navigation. Visual object tracking also has many challenges, e.g., object occlusion and deformation. To address these challenges and track the target accurately and efficiently, many tracking algorithms have emerged i...
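Visual object tracking is one of the important tasks in computer vision and has wide real-world applications, such as video surveillance and visual navigation. It also faces many challenges, such as target occlusion and target deformation. To address these challenges and achieve accurate and efficient tracking, a large number of tracking algorithms have emerged in recent years. This paper introduces the basic principles, improvement strategies, and representative works of the two mainstream algorithmic frameworks in visual object tracking over the past decade (trackers based on correlation filters and on Siamese networks). It then introduces other deep-learning-based tracking algorithms, grouped by network structure, presents typical solutions to the various challenges that tracking faces, and summarizes the historical development and future trends of visual object tracking. The paper also describes and compares in detail the datasets and challenges for the tracking task and, based on dataset statistics and algorithm evaluation results, summarizes the characteristics and advantages of the various kinds of visual object tracking algorithms. Regarding the future of object tracking...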
To obtain a more comprehensive activity understanding for a crowded scene, in this paper, we propose a new problem of panoramic human activity recognition (PAR), which aims to simultaneously achieve individual action, social group activity, and global activity recognition. This is a challenging yet practical problem in real-world applications. Fo...
Human group detection, which splits a crowd of people into groups, is an important step for video-based human social activity analysis. The core of human group detection is the human social relation representation and division. In this paper, we propose a new two-stage multi-head framework for human group detection. In the first stage, we propose a hu...
Identifying the same persons across different views plays an important role in many vision applications. In this paper, we study this important problem, denoted as Multi-view Multi-Human Association (MvMHA), on multi-view images that are taken by different cameras at the same time. Different from previous works on human association across two views...
Background
Retinal vessel segmentation benefits significantly from deep learning. Its performance relies on sufficient training images with accurate ground-truth segmentation, which are usually manually annotated in the form of binary pixel-wise label maps. Manually annotated ground-truth label maps, more or less, contain errors for part of the pix...
Multi-view Multi-human association and tracking (MvMHAT) aims to track a group of people over time in each view, as well as to identify the same person across different views at the same time. This is a relatively new problem but is very important for multi-person scene video surveillance. Different from previous multiple object tracking (MOT) and...
Crowded scene surveillance can significantly benefit from combining egocentric-view and its complementary top-view cameras. A typical setting is an egocentric-view camera, e.g., a wearable camera on the ground capturing rich local details, and a top-view camera, e.g., a drone-mounted one from high altitude providing a global picture of the scene. T...
Compared to a single fixed camera, multiple moving cameras, e.g., those worn by people, can better capture the human interactive and group activities in a scene, by providing multiple, flexible and possibly complementary views of the involved people. In this setting the actual promotion of activity detection is highly dependent on the effective cor...
Fast and accurate identification of co-interest persons, who draw the joint interest of the surrounding people, plays an important role in social scene understanding and surveillance. Previous studies mainly focus on detecting co-interest persons from a single-view video. In this paper, we study a much more realistic and challenging problem, namely...
With a good balance between accuracy and speed, correlation filter (CF) has become a popular and dominant visual object tracking scheme. It implicitly extends the training samples by circular shifts of a given target patch, which serve as negative samples for fast online learning of the filters. Since all these shifted patches are not real negative...
Sign Language Recognition (SLR) translates sign language video into natural language. In practice, sign language video contains a large number of redundant frames, so the essential frames need to be selected. However, unlike common video that describes actions, sign language video is characterized as a continuous and dense action sequence, which is diff...
The global trajectories of targets on the ground can be well captured from a top view at a high altitude, e.g., by a drone-mounted camera, while their local detailed appearances can be better recorded from horizontal views, e.g., by a helmet camera worn by a person. This paper studies a new problem of multiple human tracking from a pair of top-and hori...
With the recent development and application of human–computer interaction systems, facial expression recognition (FER) has become a popular research area. The recognition of facial expression is a difficult problem for existing machine learning and deep learning models because the images can vary in brightness, background, pose, etc. Deep lear...
Spatial regularization (SR) is known as an effective tool to alleviate the boundary effect of correlation filter (CF), a successful visual object tracking scheme from which a number of state-of-the-art visual object trackers stem. Nevertheless, SR greatly increases the optimization complexity of CF and its target-driven nature makes spati...
Video surveillance can be significantly enhanced by using both top-view data, e.g., those from drone-mounted cameras in the air, and horizontal-view data, e.g., those from wearable cameras on the ground. Collaborative analysis of different-view data can facilitate various kinds of applications, such as human tracking, person identification, and hum...
With a good balance between tracking accuracy and speed, correlation filter (CF) has become one of the best object tracking frameworks, based on which many successful trackers have been developed. Recently, spatially regularized CF tracking (SRDCF) has been developed to remedy the boundary effects of CF tracking, thus further boosting the tr...
Spatial regularization (SR), being an effective tool to alleviate the boundary effects, can significantly improve the accuracy and robustness of correlation filters (CF) based visual object tracking. The core of SR is a spatially variant weight map that is used to regularize the online learned correlation filters by selecting more meaningful sample...
In this paper, we propose an effective approach to estimating a near-surface lighting function from a limited number of images captured under different illuminations. Unlike classical methods relying on simplified parallel lighting model or near-point lighting model, our approach directly focuses on the much more realistic near-surface light source...