Lei Qin

  • Associate Professor
  • Institute of Computing Technology, Chinese Academy of Sciences

About

62
Publications
15,198
Reads
3,105
Citations

Publications (62)
Article
Convolutional neural networks (CNNs) have been applied to visual tracking with demonstrated success in recent years. However, the performance of CNN-based trackers can be further improved, because the predicted upright bounding box cannot tightly enclose the target due to factors such as deformations and rotations. Besides, many existing CNN-based...
Article
Convolutional Neural Networks (CNNs) have been applied to visual tracking with demonstrated success in recent years. Most CNN-based trackers utilize hierarchical features extracted from a certain layer to represent the target. However, features from a certain layer are not always effective for distinguishing the target object from the backgrounds e...
Article
Sparse coding has been applied to visual tracking and related vision problems with demonstrated success in recent years. Existing tracking methods based on local sparse coding sample patches from a target candidate and sparsely encode these using a dictionary consisting of patches sampled from target template images. The discriminative strength of...
Article
Full-text available
Inspired by the photometric invariance of color space, this paper proposes a simple yet powerful descriptor for object detection and recognition, called Rotative Maximal Pattern (RMP). The effectiveness of RMP comes from the two components: Rotatable Couple Templates (RCTs) with max pooling, and Normalized Histogram Intersection (NHI) with the theo...
Conference Paper
Full-text available
The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers being published at major computer vision conferences and journals in the recent years. The number of tested state-of-...
Conference Paper
Although saliency prediction in crowd has been recently recognized as an essential task for video analysis, it is not comprehensively explored yet. The challenges lie in that eye fixations in crowded scenes are inherently "distinct" and "multi-modal", which differs from those in regular scenes. To this end, the existing saliency prediction schemes...
Article
Crowd behavior analysis has recently attracted extensive attention in research. However, the existing research mainly focuses on investigating motion patterns in crowds, while the emotional aspects of crowd behaviors are left unexplored. Analyzing the emotion of crowd behaviors is indeed extremely important, as it uncovers the social moods that are...
Conference Paper
In this paper, we present a novel unsupervised method for abnormal behavior detection, which considers both local and global contextual information. For the local contextual representation, we first divide video frames into local regions, then extract low-level features such as the histogram of oriented optical flow (HOF) and sequential feature whic...
Conference Paper
Group detection becomes an important task in crowd behavior surveillance. However, most existing methods ignore the formation persistency characteristics, which predict unreliable interactions when the crowd is realistic and complex. To address this issue, we propose a novel graph-based method to declare that the formation period really matters for...
Article
In this study, the authors propose a collaborative composition model for automatically recommending suitable positions and poses in the scene of photography taken by amateurs. By analysing aesthetic-aware features, the authors' strategy jointly takes attention and geometry composition into account to learn the aesthetic manifestation knowledge of p...
Article
With the fast development of mobile devices as well as the broadband wireless network, mobile devices are playing a more and more important role in people's daily life. Nowadays, many landmark images are captured by mobile devices. However, these images are often captured under different lighting conditions with varied poses and camera orientatio...
Article
Full-text available
Person re-identification aims at matching individuals across multiple non-overlapping adjacent cameras. By condensing multiple gallery images of a person as a whole, we propose a novel method named Set-Label Model (SLM) to improve the performance of person re-identification under the multi-shot setting. Moreover, we utilize mutual-information to me...
Patent
Full-text available
A method and device for extracting color features, relating to the field of image processing includes converting an original image into sub-images corresponding to channels in a color space, dividing the sub-images into a plurality of cells with identical size, and calculating the color histograms of each of the plurality of cells. A cell and neigh...
Article
The dictionary in Local Coordinate Coding (LCC) is important for approximating a non-linear function with linear ones. Optimizing the dictionary from predefined coding schemes is a challenging task. This paper focuses on learning the dictionary from two Locality Coding Adaptors (LCAs), i.e., locality Gaussian Adaptor (GA) and locality Euclidean Adaptor (EA), for la...
Article
In real application scenarios, the visual observations of the same type of action vary significantly from one view to another. This paper addresses the action recognition problem under the view changes, especially when no labels are available in the target view. A novel feature, called Sequential Motion Accumulation (SMA), is proposed to characteri...
Article
With the widespread use of depth sensors, it is crucial to provide an effective and efficient solution for human action analysis applications upon the informative depth data. In this paper, we present a generic framework of modeling the human action by deep architecture enhanced local features with depth data. To introduce robust higher-level repre...
Article
Building on the recent advances in the Fisher kernel framework for image classification, this paper proposes a novel image representation for head yaw estimation. Specifically, for each pixel of the image, a concise 9-dimensional local descriptor is computed consisting of the pixel coordinates, intensity, the first and second order derivatives, as...
Article
Most of the previous works for multitarget tracking employ two strategies: global optimization and online state estimation. In general, global methods attempt to prevent local optimization and find the best results given global models. However, in time-critical applications, global optimization has long temporal latency. In contrast, most of the on...
Article
The aim of this paper was to address the problem of dense crowd event recognition in the surveillance video. Previous particle flow-based methods efficiently capture the convolutional motion in the crowded scene. However, the group-level description was rarely studied due to huge loss of group structure and intra-class variability. To address these...
Conference Paper
Previous works for multi-target tracking employ two strategies: global optimization and online state estimation. In time-critical applications, the former methods have long temporal latency, and the latter can’t recover from erroneous association or drifting. In this paper, we combine these two strategies, and propose a new low-latency online track...
Conference Paper
The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are presented. The number of tested trackers makes VOT 2014 the largest benchmark on short-term tracking to date. For each participating tracker, a short de...
Conference Paper
Full-text available
The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are presented. The number of tested trackers makes VOT 2014 the largest benchmark on short-term tracking to date. For each participating tracker, a short des...
Article
The bag of visual words model (BoW) and its variants have demonstrated their effectiveness for visual applications. The BoW model first extracts local features and generates the corresponding codebook where the elements of a codebook are viewed as visual words. However, the codebook is dataset dependent and has to be generated for each image datase...
Article
Human actions are important contents which are helpful for video analysis and interpretation. Recently, notable methods have been proposed to recognize individual actions and pair's interactions, whereas recognizing more complex actions involving multiple persons remains a challenge. In this paper, we focus on the actions performed by a small group...
Article
This paper proposes an approach to recognize human activity, which is based on tracking trajectories of local spatio-temporal feature points. To make up for the temporal information loss of local features, this paper uses the KLT feature tracker to track each spatial-temporal local feature and treats the tracked feature trajectory snippets as the b...
Article
In this letter, we propose an online discriminative learning method for feature combination during multi-target tracking. Previous works utilize offline learned weights for fusion of multiple features, which is not always effective for different tracking contexts. Our work aims to update the weights adaptively in online tracking. We formulate the f...
Article
Interactions among pedestrians usually play an important role in understanding crowd behavior. However, there are great challenges, such as occlusions, motion, and appearance variance, on accurate analysis of pedestrian interactions. In this paper, we introduce a novel social attribute-aware force model (SAFM) for detection of abnormal crowd events...
Article
Visual Human Action Recognition is a universal hot topic of image processing, computer vision, pattern recognition, machine learning and artificial intelligence with wide application in video surveillance, human-computer interaction, virtual reality, content based video retrieval, video coding, etc. In this paper, we analyze the state-of-the-arts a...
Article
Full-text available
Considerable progress has been made on hand-crafted features in object detection, while little effort has been devoted to make use of the color cues. In this paper, we study the role of color cues in detection via a representative object, i.e., pedestrian, as its variability of pose or appearance is very common for "general" objects. The efficiency...
Article
Moving object detection is an important task in real-time video surveillance. However, in real scenarios, moving cast shadows associated with moving objects may also be detected, making moving cast shadow detection a challenge for video surveillance. In this paper, we propose an adaptive shadow detection method based on the cast shadow model. The me...
Chapter
Although the spatial–temporal local features and the bag of visual words model (BoW) have achieved a great success and a wide adoption in action classification, there still remain some problems. First, the local features extracted are not stable enough, which may be aroused by the background action or camera shake. Second, using local features alon...
Conference Paper
Full-text available
Identifying individuals in a multi-view camera network, known as person re-identification, has become an emerging topic in video surveillance. In this paper, we address person re-identification as a set-based classification problem and introduce mutual information to fully utilize gallery information. First, we define a set-based structure that conta...
Conference Paper
Occlusion is one of the challenging problems in visual tracking. Most of the previous works alleviate this problem by randomly sampled weak features, or analyze it by methods closely related with specific trackers. In this paper, we propose an effective mechanism to detect the occlusion status by random forests, and embed this method into object tr...
Conference Paper
Boosting has been extensively used in image processing. Much work focuses on the design or the usage of boosting, but training boosting on large-scale datasets tends to be ignored. To handle the large-scale problem, we present stochastic boosting (StocBoost) that relies on stochastic gradient descent (SGD) which uses one sample at each iteration. T...
Conference Paper
In this paper, a novel crowd behavior representation, Bag of Trajectory Graphs (BoTG), is presented for dense crowd event recognition. To overcome huge loss of crowd structure and variability of motion in previous particle flow based methods, we design group-level representation beyond particle flow. From the observation that crowd particles are co...
Article
Most existing feature selection methods for object tracking assume that the samples in the previous frames are governed by the same distribution of the labeled samples obtained in the current frame and unlabeled samples collected in the next frame. However, according to our statistical analysis on very common videos, this assumption is not true in...
Conference Paper
Improving human action recognition in videos is restricted by the inherent limitations of the visual data. In this paper, we take the depth information into consideration and construct a novel dataset of human daily actions. The proposed ACT42 dataset provides synchronized data from 4 views and 2 sources, aiming to facilitate the research of action...
Article
Currently, Nearest-Neighbor approaches (NN) have been applied to large scale real world image data mining. However, the following three disadvantages prevent them from wider application compared to other machine learning methods: (i) the performance is inferior on small datasets; (ii) the performance will degrade for data with high dimensions; (iii...
Conference Paper
As an important aspect in video content analysis, event detection is still an open problem. In particular, the study on detecting interactive events in crowd scenes is still limited. In this paper, we investigate detecting interactive events between persons, e.g. PeopleMeet, PeopleSplitUp and Embrace in complex scenes using a sequence learning base...
Conference Paper
In this paper, we present an intelligent portrait photographing framework for automatically recommending the suitable positions and poses in the scene of photography taken by amateurs. By analyzing aesthetic characteristics features, we propose a solution by constructing aesthetic composition representation which covers the attention composition an...
Conference Paper
In this paper, a novel social attribute-aware force model is presented for abnormal crowd pattern detection in video sequences. We take social characteristics of crowd behaviors into account in order to improve the effectiveness of the simulation on the interaction behaviors of the crowd. A quick unsupervised method is proposed to estimate the scen...
Article
In this paper, we propose a new feature subset evaluation method for feature selection in object tracking. According to the fact that a feature which is useless by itself could become a good one when it is used together with some other features, we propose to evaluate feature subsets as a whole for object tracking instead of scoring each feature in...
Conference Paper
In this paper, we present a theoretical analysis on learning anchors for local coordinate coding (LCC), which is a method to model functions for data lying on non-linear manifolds. In our analysis several local coding schemes, i.e., orthogonal coordinate coding (OC-C), local Gaussian coding (LGC), local Student coding (LSC), are theoretically compa...
Conference Paper
Features play an important role in pedestrian detection, and considerable progress has been made on shape-based descriptors. However, color cues have barely been applied to detection tasks, seemingly due to the variable appearance of pedestrians. In this paper, Color Maximal-Dissimilarity Pattern (CMDP) is proposed to encode color cues by two core...
Conference Paper
Full-text available
Human activity analysis is an important and challenging task in video content analysis and understanding. In this paper, we focus on the activity of small human group, which involves countable persons and complex interactions. To cope with the variant number of participants and inherent interactions within the activity, we propose a hierarchical mo...
Conference Paper
Full-text available
Most feature selection methods for object tracking assume that the labeled samples obtained in the next frames follow a distribution similar to that of the samples in the previous frame. However, this assumption does not hold in some scenarios. As a result, the selected features are not suitable for tracking and the “drift” problem occurs. In this paper...
Conference Paper
Tracking non-rigid objects with significant shape variation in complex scenarios is a difficult problem. Human tracking is a special case of this problem since the human body has good local rigid properties. In this paper, we propose a novel human tracking method which explores the local rigid properties while preserving the global structure well. Thi...
Article
Full-text available
In object detection, disparities in distributions between the training samples and the test ones are often inevitable, resulting in degraded performance for application scenarios. In this paper, we focus on the disparities caused by viewpoint and scene changes and propose an efficient solution to these particular cases by adapting generic detectors...
Conference Paper
Full-text available
Nowadays, various efforts have sprung up aiming to automatically analyze home videos and provide users satisfactory experiences. In this paper, we present a novel user experience for home video called Memory Matrix, which could facilitate users to re-experience the joy of their memories, travelling along not only the time axis but also the space ax...
Conference Paper
Full-text available
The spatial-temporal local features and the bag of words representation have been widely used in the action recognition field. However, this framework usually neglects the internal spatial-temporal relations between video-words, resulting in ambiguity in action recognition task, especially for videos “in the wild”. In this paper, we solve this prob...
Conference Paper
Full-text available
Nowadays, the issue of objective video quality assessment has been extensively studied. However, the human visual system (HVS) is the ultimate receiver for videos thus leading to a gap between objective scores calculated by computers and subjective preferences given by observers. In this paper, we focus on bridging this gap by introducing a psychol...
Conference Paper
Human action recognition has been well studied recently, but recognizing the activities of more than three persons remains a challenging task. In this paper, we propose a motion trajectory based method to classify human group activities. Gaussian Processes are introduced to represent human motion trajectories from a probabilistic perspective to han...
Article
Full-text available
Previous web image re-ranking approaches usually construct similarity measure on image level. Considering the diversity of large scale web image database, these approaches ignore the difference of importance between target area and background area in images, thus are not robust to background clutter and may bring some false similarity contribution...
Conference Paper
In this paper, we present a novel approach to classify texture collections. This approach does not require experts to provide an annotated training set. Given the image collection, we extract a set of invariant descriptors from each image. The descriptors of all images are vector-quantized to form 'keypoints'. Then we represent the texture images by...
Conference Paper
Full-text available
Image matching is a fundamental task of many computer vision problems. In this paper we present a novel approach to match two images in the presence of significant geometric deformations and considerable photometric variations. The approach is based on local invariant features. First, local invariant regions are detected by a three-step process which de...
Conference Paper
Full-text available
Image matching is a fundamental task of many computer vision problems. In this paper we present a novel approach for matching two images in the presence of image rotation, scale, and illumination changes. The proposed approach is based on local invariant features. A two-step process detects local invariant regions. Characteristic circles associated...
Conference Paper
Full-text available
In this paper we present a novel approach to match two images in the presence of large scale and rotation changes. The proposed approach is based on scale invariant region description. The scale invariant region is detected by a two-step process and represented by a new descriptor. The descriptor is a two-dimensional gray-level histogram. Different de...
