Article

General Tensor Discriminant Analysis and Gabor Features for Gait Recognition


Abstract

The traditional image representations are not suited to conventional classification methods, such as linear discriminant analysis (LDA), because of the undersample problem (USP): the dimensionality of the feature space is much higher than the number of training samples. Motivated by the successes of two-dimensional LDA (2DLDA) for face recognition, we develop a general tensor discriminant analysis (GTDA) as a preprocessing step for LDA. The benefits of GTDA compared with existing preprocessing methods, e.g., principal component analysis (PCA) and 2DLDA, include: 1) the USP is reduced in subsequent classification by, for example, LDA; 2) the discriminative information in the training tensors is preserved; and 3) GTDA provides stable recognition rates because the alternating projection optimization algorithm used to obtain a solution of GTDA converges, while that of 2DLDA does not. We use human gait recognition to validate the proposed GTDA. The averaged gait images are utilized for gait representation. Given the popularity of Gabor-function-based image decompositions for image understanding and object recognition, we develop three different Gabor-function-based image representations: 1) the GaborD representation is the sum of Gabor filter responses over directions, 2) GaborS is the sum of Gabor filter responses over scales, and 3) GaborSD is the sum of Gabor filter responses over scales and directions. The GaborD, GaborS and GaborSD representations are applied to the problem of recognizing people from their averaged gait images. A large number of experiments were carried out to evaluate the effectiveness (recognition rate) of gait recognition based on first obtaining a Gabor, GaborD, GaborS or GaborSD image representation, then using GTDA to extract features and finally using LDA for classification. The proposed methods achieved good performance for gait recognition based on image sequences from the USF HumanID Database.
Experimental comparisons are made with nine state-of-the-art classification methods in gait recognition.
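The three summed representations described in the abstract can be sketched concretely. The following is a minimal numpy/scipy illustration, not the authors' implementation: the 15x15 kernel size, the sigma = 0.56*wavelength rule of thumb, and the scale/direction grids are assumptions chosen for the sketch.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates into the filter frame
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def gabor_summed_representations(img, wavelengths, thetas, size=15):
    """Return (GaborD, GaborS, GaborSD) built from summed |filter response| magnitudes."""
    resp = np.stack([
        np.stack([np.abs(convolve2d(img, gabor_kernel(size, wl, th, sigma=0.56 * wl),
                                    mode='same')) for th in thetas])
        for wl in wavelengths])                  # shape: (scales, directions, H, W)
    gabor_d = resp.sum(axis=1)                   # sum over directions -> one image per scale
    gabor_s = resp.sum(axis=0)                   # sum over scales     -> one image per direction
    gabor_sd = resp.sum(axis=(0, 1))             # sum over both       -> a single image
    return gabor_d, gabor_s, gabor_sd

rng = np.random.default_rng(0)
gei = rng.random((32, 22))                       # stand-in for an averaged gait image
wavelengths = [4, 8]                             # illustrative scales
thetas = [k * np.pi / 4 for k in range(4)]       # illustrative directions
gd, gs, gsd = gabor_summed_representations(gei, wavelengths, thetas)
print(gd.shape, gs.shape, gsd.shape)             # (2, 32, 22) (4, 32, 22) (32, 22)
```

Summing over a Gabor index collapses that mode of the response tensor, which is what makes the GaborD/GaborS/GaborSD images cheaper inputs for the subsequent GTDA and LDA stages.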


... The Gabor filter is a windowed Fourier transform that combines a sinusoidal signal with a Gaussian wave. It is widely used in image processing for extracting low-level texture features [30,35,29,1,40]. The motivation for using Gabor filters in fine-grained recognition is twofold. ...
... Gabor Filters. The Gabor filter is an effective tool for extracting texture features, which makes it widely used in computer vision applications such as face recognition [30,35,29], vehicle verification [1], object detection [42,22,39], and gait recognition [40]. Recently, some works [18,32,50,53] have introduced Gabor filters into deep neural networks. ...
... An effective Gabor filter requires the appropriate combination of parameters to match a given task. Previous works [40,30,42] have generally used a hand-crafted approach for parameter selection, relying on heuristic rules to inform manual parameter settings. However, this kind of handcrafted approach is highly dependent on the expertise of the individual and cannot guarantee effectiveness for the task at hand. ...
Preprint
Extracting and using class-discriminative features is critical for fine-grained recognition. Existing works have demonstrated the possibility of applying deep CNNs to exploit features that distinguish similar classes. However, CNNs suffer from problems including frequency bias and loss of detailed local information, which restricts the performance of recognizing fine-grained categories. To address the challenge, we propose a novel texture branch as complementary to the CNN branch for feature extraction. We innovatively utilize Gabor filters as a powerful extractor to exploit texture features, motivated by the capability of Gabor filters in effectively capturing multi-frequency features and detailed local information. We implement several designs to enhance the effectiveness of Gabor filters, including imposing constraints on parameter values and developing a learning method to determine the optimal parameters. Moreover, we introduce a statistical feature extractor to utilize informative statistical information from the signals captured by Gabor filters, and a gate selection mechanism to enable efficient computation by only considering qualified regions as input for texture extraction. Through the integration of features from the Gabor-filter-based texture branch and CNN-based semantic branch, we achieve comprehensive information extraction. We demonstrate the efficacy of our method on multiple datasets, including CUB-200-2011, NA-bird, Stanford Dogs, and GTOS-mobile. State-of-the-art performance is achieved using our approach.
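What "matching Gabor parameters to the task" means can be shown with a toy numpy/scipy sketch: candidate wavelengths are scored by the response energy they produce on a texture with a known period, and the matched wavelength wins. The kernel size, sigma ratio, and candidate grid are illustrative assumptions, not the selection procedure of any cited work.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, wavelength, theta, gamma=0.5, sigma_ratio=0.56):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    sigma = sigma_ratio * wavelength
    return (np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / wavelength))

# Synthetic texture whose stripes have a known period of 8 pixels.
x = np.arange(64)
texture = np.tile(np.cos(2 * np.pi * x / 8), (64, 1))

# Score each candidate wavelength by the filtered-response energy.
scores = {wl: (convolve2d(texture, gabor_kernel(31, wl, theta=0.0), mode='valid') ** 2).mean()
          for wl in (4, 8, 16)}
best = max(scores, key=scores.get)
print(best)  # 8 -- the wavelength matching the texture period
```

A learned parameter-selection method, as proposed above, automates exactly this kind of matching instead of relying on heuristic rules.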
... During the many stages of image handling, including acquisition, transmission, processing, storage, etc., various distortions in image quality may occur. Image quality assessment (IQA) has become an important need in a wide range of computer-vision-related applications, such as image retrieval [1,2] and visual recognition [3,4]. Humans have the ability to subjectively evaluate the quality of digital images. ...
... (2) We systematically reviewed representative feature extraction techniques and regression models for two-stage methods and typical architectures for end-to-end one-stage approaches, which could help researchers keep track of the recent progress of BIQA. (3) We analyzed the performance of these competing methods and proposed some future research directions based on the discussion of the performance results. ...
Article
Full-text available
As a fundamental research problem, blind image quality assessment (BIQA) has attracted increasing interest in recent years. Although great progress has been made, BIQA still remains a challenge. To better understand the research progress and challenges in this field, we review BIQA methods in this paper. First, we introduce the BIQA problem definition and related methods. Second, we provide a detailed review of the existing BIQA methods in terms of representative hand-crafted features, learning-based features and quality regressors for two-stage methods, as well as one-stage DNN models with various architectures. Moreover, we also present and analyze the performance of competing BIQA methods on six public IQA datasets. Finally, we conclude our paper with possible future research directions based on a performance analysis of the BIQA methods. This review will provide valuable references for researchers interested in the BIQA problem.
... Therefore, it is necessary to explore a feature construction method that can simultaneously characterize the multi-mode information of EEG for accurate mental workload evaluation. Tensors represent the high-dimensional structure of data and have been widely used in fields such as gait recognition [29] and image recognition [30]. The intrinsic attributes of EEG data can be effectively retained by a tensorized representation [25]-[27]. ...
... Although minimizing the distribution distance of tensor features can reduce the distribution diversity, it cannot guarantee that the learned features have enough ability to distinguish workload levels. Therefore, the General Tensor Discriminant Analysis (GTDA) [29], [31] methods were introduced to construct the objective function of the class-wise discrimination learning procedure after the distribution alignment (given as their Eq. (5)); unfolding Eq. (5) yields the class-wise discrimination learning cost function. ...
Article
Full-text available
The accurate evaluation of the mental workload of operators in human-machine systems is of great significance in ensuring the safety of operators and the correct execution of tasks. However, the effectiveness of EEG-based cross-task mental workload evaluation is still unsatisfactory because of the different EEG response patterns in different tasks, which severely hinders its generalization in real scenarios. To solve this problem, this paper proposes a feature construction method based on EEG tensor representation and transfer learning, which was verified under various task conditions. Specifically, four working memory load tasks with different types of information were designed first. The EEG signals of participants were collected synchronously during task execution. Then, the wavelet transform method was used to perform time-frequency analysis of multi-channel EEG signals, and three-way EEG tensor (time-frequency-channel) features were constructed. EEG tensor features from different tasks were transferred based on the criteria of feature distribution alignment and class-wise discrimination. Finally, a support vector machine was used to construct a 3-class mental workload recognition model. Results showed that, compared with classical feature extraction methods, the proposed method can achieve higher accuracy in both within-task and cross-task mental workload evaluation (91.1% for within-task and 81.3% for cross-task). These results demonstrate that the EEG tensor representation and transfer learning method is feasible and effective for cross-task mental workload evaluation, and can provide a theoretical basis and application reference for future research.
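The class-wise discrimination criterion that GTDA-style methods optimize can be illustrated for a single tensor mode. Below is a hedged numpy sketch of one mode's update in a GTDA-style alternating projection: between- and within-class scatters are built from mode unfoldings, and the leading eigenvectors of Sb - zeta*Sw become that mode's projection. The data, class structure, zeta, and the number of retained directions are all illustrative; a full GTDA alternates this step over every mode until convergence.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Multiply tensor T along `mode` by matrix M (shape: new_dim x old_dim)."""
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(1)
X = rng.random((10, 12, 8))           # 10 samples of 12x8 tensors, two classes of 5
labels = np.repeat([0, 1], 5)
mean_all = X.mean(axis=0)

# Between-class (Sb) and within-class (Sw) scatters for mode 0 of each sample.
Sb = np.zeros((12, 12))
Sw = np.zeros((12, 12))
for c in (0, 1):
    Xc = X[labels == c]
    mc = Xc.mean(axis=0)
    d = unfold(mc - mean_all, 0)
    Sb += len(Xc) * d @ d.T
    for t in Xc:
        e = unfold(t - mc, 0)
        Sw += e @ e.T

zeta = 1.0                            # tuning parameter of the criterion tr(Sb - zeta*Sw)
evals, evecs = np.linalg.eigh(Sb - zeta * Sw)
U1 = evecs[:, -4:].T                  # top-4 eigenvectors as this mode's projection (4 x 12)
Y = np.stack([mode_multiply(t, U1, 0) for t in X])
print(Y.shape)                        # (10, 4, 8): mode 0 reduced from 12 to 4
```

Each mode's projection shrinks the tensor before the final LDA step, which is how the undersample problem is eased.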
... For feature-based methods, this paper employs several image classification techniques that combine various handcrafted feature methods with the classification algorithm Support Vector Machine (SVM), namely SVM + Histogram, SVM + DWT [43], SVM + Gabor [44], SVM + SIFT [6], SVM + HOG [7], and SVM + uLBP [5]. These methods demonstrate relatively low computational complexity and good performance across multiple datasets. ...
Article
Full-text available
Accurately classifying degraded images is a challenging task that relies on domain expertise to devise effective image processing techniques for various levels of degradation. Genetic Programming (GP) has proven to be an excellent approach for solving image classification tasks. However, the program structures designed in current GP-based methods are not effective in classifying images with quality degradation. During the iterative process of GP algorithms, the high similarity between individuals often results in convergence to local optima, hindering the discovery of the best solutions. Moreover, the varied degrees of image quality degradation often lead to overfitting in the solutions derived by GP. Therefore, this research introduces an innovative program structure, distinct from the traditional one, which automates the creation of new features by transmitting information learned across multiple nodes, thus improving GP individuals' ability to construct discriminative features. An accompanying evolution strategy addresses the high similarity among GP individuals by retaining promising ones, thereby refining the algorithm's development of more effective GP solutions. To counter the potential overfitting of the best GP individual, a multi-generational individual ensemble strategy is proposed, focusing on constructing an ensemble GP individual with enhanced generalization capability. The new method is evaluated in original, blurry, low-contrast, noisy, and occlusion scenarios on six different types of datasets and compared with a multitude of effective methods. The results show that the new method achieves better classification performance on degraded images than the comparative methods.
... The information in the GEI is likely to drastically change the person's appearance under different covariate conditions [2]. In the past, GEI model-free gait templates have been adopted in a number of empirical studies of gait recognition algorithms [3][4][5]. In our recent work on an effective method to reduce covariate issues in gait-based human identification, an efficient gait representation is suggested [6]. ...
Chapter
Full-text available
Identification of a person based on their walking pattern is affected by factors such as camera viewing angle, the subject's apparel, carrying a bag, the walking surface, and complex situations. These are the covariate conditions in human gait analysis. The covariate conditions change an individual's gait pattern, making it difficult to implement gait recognition in a realistic environment. This work addresses the issue of covariate conditions in gait identification through a fusion of features. Using a pre-trained VGG16 model with four fully connected layers, dynamic features are extracted and merged with HoG (Histogram of Oriented Gradients) features extracted from the raw GEI (Gait Energy Image) gait templates. PCA (Principal Component Analysis) is then used to lower the dimension of the combined features in order to select the discriminant feature vectors. The CASIA-B dataset is used to examine the efficacy of the suggested technique, which employs an MLP (Multi-Layer Perceptron) classifier. The findings show that the proposed technique outperforms other existing approaches in terms of accuracy while walking normally, wearing a coat, and carrying a bag under identical viewing conditions.
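The fusion-then-PCA step described above can be sketched in a few lines of numpy. The CNN and HoG features are replaced with random stand-ins here (the 4096 and 3780 dimensions are plausible placeholders, not the chapter's actual settings), and PCA is computed directly from the SVD of the centered, concatenated features.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
cnn_feats = rng.random((n, 4096))     # stand-in for VGG16 fully connected features
hog_feats = rng.random((n, 3780))     # stand-in for HoG descriptors of GEI templates

# Fuse by concatenation, then reduce dimension with PCA via SVD.
fused = np.hstack([cnn_feats, hog_feats])
centered = fused - fused.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 30                                # number of discriminant components to keep
reduced = centered @ Vt[:k].T         # project onto the k leading principal axes
print(reduced.shape)                  # (50, 30)
```

The reduced vectors would then be fed to the MLP classifier in place of the raw concatenation.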
... Multi-view CCA (MCCA) [5] seeks the projection by maximizing the total pairwise correlation across views in a common domain. According to Luo et al. [31], pairwise correlation discards high-order statistics and takes into account only the statistical characteristics of each distinct pair of views; they instead directly optimize the aggregate high-order correlation of all views by examining the covariance tensor [32], [33]. ...
Preprint
Full-text available
Canonical Correlation Analysis (CCA), a popular multi-view dimension reduction technique, identifies the shared subspace between two views by maximizing the correlation between them. Although many real applications involve more than two views, CCA can only process data with two views. In prior studies, data with more than two views were handled using either linear correlation or higher-degree polynomial correlation. These two types of correlation, linear and high-order, establish the link between multi-view data seen from diverse perspectives, and each has a different effect on view consistency. In this paper, we propose a Multi-view Uncorrelated Neighborhood Preserving Embedding (MUNPE), which simultaneously considers both types of correlation to give flexible view consistency. While keeping the local structures of each view, MUNPE also takes into account the complementarity of multiple views. MUNPE makes the features gathered by the multiple projections for each view uncorrelated, in order to obtain many projections and reduce the redundancy of the low-dimensional data. Iterative methods are used to solve MUNPE, and the algorithm's convergence has been demonstrated. Experiments on the Multiple Features and other synthetic data sets were successful for MUNPE. Its performance is better than the MLPP [1], MSE [2], MLLE [3], GCCA [4], and MCCA [5] algorithms.
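The two-view baseline that MUNPE generalizes is easy to make concrete. Below is a minimal numpy CCA (whiten each view, then take the SVD of the whitened cross-covariance); the small ridge term and the toy data with one exactly shared latent dimension are illustrative assumptions, and this is plain linear CCA, not MUNPE.

```python
import numpy as np

def cca(X, Y, k, reg=1e-6):
    """Top-k canonical directions and correlations for two views (rows = samples)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # ridge keeps the covariances invertible
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):                                 # C^(-1/2) via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, S, Vt = np.linalg.svd(Wx @ Cxy @ Wy)          # SVD of the whitened cross-covariance
    A = Wx @ U[:, :k]                                # projection for view 1
    B = Wy @ Vt[:k].T                                # projection for view 2
    return A, B, S[:k]

rng = np.random.default_rng(2)
z = rng.random((200, 1))                             # shared latent signal
X = np.hstack([z, rng.random((200, 3))])             # view 1: latent + noise dims
Y = np.hstack([z, rng.random((200, 2))])             # view 2: latent + noise dims
A, B, corrs = cca(X, Y, k=1)
print(round(corrs[0], 2))
```

On this toy data the leading canonical correlation is close to 1, because the first coordinate is shared exactly by both views.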
... Various covariate conditions [6] and viewpoints [7] lead to large visual differences in the silhouettes of the same ID, i.e., high intra-class diversity. Second, since the silhouettes have no texture and color information, the silhouettes of different IDs could be similar, especially for those under the same covariate settings. ...
Preprint
Gait recognition aims at identifying pedestrians at a long distance by their biometric gait patterns. It is inherently challenging due to the various covariates and the properties of silhouettes (textureless and colorless), which result in two kinds of pair-wise hard samples: the same pedestrian could have distinct silhouettes (intra-class diversity) and different pedestrians could have similar silhouettes (inter-class similarity). In this work, we propose to solve the hard sample issue with a Memory-augmented Progressive Learning network (GaitMPL), including a Dynamic Reweighting Progressive Learning module (DRPL) and a Global Structure-Aligned Memory bank (GSAM). Specifically, DRPL reduces the learning difficulty of hard samples by easy-to-hard progressive learning. GSAM further augments DRPL with a structure-aligned memory mechanism, which maintains and models the feature distribution of each ID. Experiments on two commonly used datasets, CASIA-B and OU-MVLP, demonstrate the effectiveness of GaitMPL. On CASIA-B, we achieve state-of-the-art performance, i.e., 88.0% on the most challenging condition (Clothing) and 93.3% on the average condition, which outperforms the other methods by at least 3.8% and 1.4%, respectively.
... 2) Feature descriptors: In order to increase the generalization and robustness, local feature descriptors were applied in further studies to create vector representations of local neighborhoods and increase the ability to handle scale variations, rotations, and occlusions. Various handcrafted feature descriptors, such as Histogram of Oriented Gradients (HOG) [66], local binary patterns (LBP) [67], scale-invariant feature transform (SIFT) [68], speeded up robust features (SURF) [69], Gabor features [70], Haar-like wavelets [71], and BDF [72] have been proposed that extract valuable information from the images. Generally, in these groups of object detection methods, a fixed-size sliding window is slid over a constructed feature pyramid, and therefore, they are usually designed for detecting objects with a fixed aspect ratio. ...
... Most notably is the so-called higher-order discriminant analysis (HODA) [8]. In addition to HODA, discriminant analysis with tensor representation (DATER) [9] and general tensor discriminant analysis (GTDA) [10] provide iterative procedures to maximize the scatter ratio criterion. However, DATER does not converge over iterations. ...
... General Tensor Discriminant Analysis (GTDA) and Gabor features have been used in [102] to reduce computational complexity by using visual information. Mowbray & Nixon [75] achieved an accuracy of 85% by using Fourier Descriptors for feature reduction. ...
Article
Full-text available
A biometric system is a technology that utilizes an individual’s unique physiological and behavioral characteristics to identify and authenticate them. It falls under the category of pattern recognition. Gait recognition, specifically the identification of individuals based on their walking patterns, has garnered significant attention from researchers due to its potential to accurately identify individuals from a distance. Gait recognition systems involve a complex integration of technical, operational and definitional choices and have been applied in a variety of contexts such as security, medical examinations, identity management, and access control. The utilization of gait recognition methods and tools has led to the development of various useful and widely accepted applications. This article provides an overview of the various techniques and approaches employed in gait recognition, including the framework, history, and parameters utilized. The article also delves into the different classifiers, both traditional and deep learning-based, used in the field. Additionally, it examines the different types of datasets utilized in experimental research and the methodology for evaluating articles on gait recognition. With its potential in security applications, gait recognition is expected to have a wide range of future applications.
... As the size of the input tensor increases, these approaches become computationally expensive and produce redundant features. Since adjacent pixels in an image are highly correlated, we can reduce the dimension by minimizing redundancy in the input tensors, downsampling the tensor entries that result from the Gabor filter application [16,47]. The tensor entries are downsampled by a factor of eight, which approximates the size of the tensor with 1024×1024×5×8 ...
Article
Full-text available
Wilson’s Disease (WD) is a rare, autosomal recessive disorder caused by excessive accumulation of Copper (Cu) in various human organs such as the liver, brain, and eyes. Accurate WD diagnosis is challenging because of: (1) subtle intensity variations in infected tissues, and (2) biased training results in the case of a small and imbalanced dataset. This study provides a novel WD classification model for a small MRI dataset (3072 scans). The proposed study explores multi-dimensional Gabor kernels in five scales and eight orientations to produce pixel-specific features and process them in 4th-order tensor format. The Tucker decomposition technique is applied to obtain approximate factors from the set of Gabor tensors. Five-fold cross-validation results show that the proposed classification model achieves 99.91% classification accuracy, which is better than four well-known feature extraction techniques: (1) 2D Discrete Wavelet Transform, (2) intensity histograms, (3) Histogram of Oriented Gradients, and (4) grey-level co-occurrence matrix. Our method improves the classification accuracy by an average of 33% and the Area Under the Curve (AUC) by 25% over the above-mentioned feature extraction techniques. In the latter category, the performance of the proposed method is compared with three deep learning models: (1) a Customized Convolutional Neural Network (CCNN), (2) AlexNet, and (3) VGGNet. It enhances classification accuracy by 10%, 3.5%, and 3% compared to CCNN, AlexNet, and VGGNet, respectively. Our proposed approach is also computationally fast compared to the discussed feature extraction techniques.
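The downsample-then-decompose pipeline can be sketched with a truncated HOSVD, a one-pass Tucker approximation. The study above uses Tucker decomposition on 4th-order Gabor tensors; here the tensor sizes, ranks, and random data are illustrative, and HOSVD is a common Tucker initialization rather than the optimal iterative fit (HOOI).

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Multiply tensor T along `mode` by matrix M (shape: new_dim x old_dim)."""
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from mode-wise SVDs, then project the core."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for m, U in enumerate(factors):
        core = mode_multiply(core, U.T, m)   # project each mode onto its factor
    return core, factors

rng = np.random.default_rng(3)
gabor_tensor = rng.random((128, 128, 5, 8))  # stand-in for (H, W, scales, directions) responses
down = gabor_tensor[::8, ::8]                # downsample both spatial modes by a factor of 8
core, factors = hosvd(down, ranks=(8, 8, 3, 4))
print(down.shape, core.shape)                # (16, 16, 5, 8) (8, 8, 3, 4)
```

The small core tensor (and factor matrices) would then serve as the compact feature set for classification.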
... The projected data obtained by dimensionality reduction can be used in subsequent data mining tasks, including classification, computer visualization, etc. In supervised learning, linear discriminant analysis (LDA) [1,2] is one of the most useful and popular dimensionality reduction methods, and has been applied in many areas, including bioinformatics [3], geographical classification [4], gait recognition [5], and face recognition [6,7]. LDA aims to learn a set of optimal projections to extract useful discriminative information by maximizing the between-class distance and simultaneously minimizing the within-class distance in the projected space. ...
Article
Full-text available
Classical linear discriminant analysis (LDA) is based on the squared Frobenius norm and hence is sensitive to outliers and noise. To improve the robustness of LDA, this paper introduces a capped l2,1-norm of a matrix, which employs a non-squared l2-norm and a "capped" operation, and further proposes a novel capped l2,1-norm linear discriminant analysis, called CLDA. Due to the use of the capped l2,1-norm, CLDA can effectively remove extreme outliers and suppress the effect of noisy data. In fact, CLDA can also be viewed as a weighted LDA and is solved through a series of generalized eigenvalue problems. Experimental results on an artificial data set, some UCI data sets, and two image data sets demonstrate the effectiveness of CLDA.
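The capped l2,1-norm is simple to state: take the l2-norm of each row, cap it at a threshold, and sum over rows, so no single row (sample) can contribute more than the cap. A minimal numpy sketch, with an illustrative threshold eps:

```python
import numpy as np

def capped_l21(M, eps):
    """Capped l2,1-norm: sum over rows of min(||row||_2, eps)."""
    return np.minimum(np.linalg.norm(M, axis=1), eps).sum()

M = np.array([[3.0, 4.0],     # row norm 5
              [0.0, 1.0],     # row norm 1
              [60.0, 80.0]])  # row norm 100 -- an extreme outlier
print(capped_l21(M, eps=10.0))  # 5 + 1 + 10 = 16.0: the outlier's contribution is capped
```

Squaring row norms is what makes the classical Frobenius objective outlier-sensitive; capping removes a sample's leverage entirely beyond eps.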
... The dataset consists of 1,098,207 examples across 6 activity classes: walking, running, upstairs, downstairs, sitting, and standing. As far as the class distribution is concerned, walking has the highest number of samples [23] (424,400, almost 38.4% of the total) and standing the lowest (48,395, equal to 4.4% of the total). The raw data are imbalanced, so data balancing is performed in the data pre-processing section. ...
Article
Full-text available
Human Activity Recognition (HAR) has always been a difficult task to tackle. It is mainly used in security surveillance, human-computer interaction, and health care as an assistive or diagnostic technology, in combination with other technologies such as the Internet of Things (IoT). Human Activity Recognition data can be recorded with the help of sensors, images, or smartphones. Recognizing daily routine-based human activities such as walking, standing, and sitting is a difficult statistical classification task, and hence a 2-dimensional Convolutional Neural Network (2D CNN) model, a Long Short-Term Memory (LSTM) model, and a Bidirectional Long Short-Term Memory (Bi-LSTM) model are used for the classification. It has been demonstrated that recognizing daily routine-based human activities can be extremely accurate, with almost all activities recognized correctly over 90% of the time. Furthermore, because all the examples are generated from only 20 s of data, these actions can be recognized quickly. Apart from classification, the work was extended to verify and investigate the need for wearable sensing devices in individually walking patients with Cerebral Palsy (CP) for the evaluation of chosen spatio-temporal features based on 3D foot trajectory. Case-control research was conducted with 35 persons with CP ranging in weight from 25 to 65 kg. Optical Motion Capture (OMC) equipment was used as the reference method to assess the functionality and quality of the foot-worn device. The average accuracy ± precision for stride length, cadence, and step length was 3.5 ± 4.3, 4.1 ± 3.8, and 0.6 ± 2.7 cm, respectively. People with CP had considerably high inter-stride variability for cadence, stride length, swing, and step length. Foot-worn sensing devices made it easier to examine gait spatio-temporal data, even without a laboratory setup, with high accuracy and precision regarding gait abnormalities in people who have CP during linear walking.
Preprint
Full-text available
Traffic video analytics has become one of the core components in the evolution of transportation systems. Artificially intelligent traffic management systems apply computer vision techniques to alleviate the monotony of manually monitoring the video feed from surveillance cameras. Locating the objects of interest is the most crucial step in the pipeline of such video analytics systems. An abundance of research has been conducted to find the location of the targets in traffic scenes. This paper presents a comprehensive review of different algorithms used for object detection in traffic surveillance applications, in addition to the recent trends and future directions. Based on the approaches used in the related studies, we categorize the object detection methods into motion-based and appearance-based techniques. We further classify each group of techniques into a number of subcategories and analyze the advantages and disadvantages of each method. The major challenges, limitations, and potential solutions are also discussed, along with the future scope and directions.
Chapter
In this book chapter, we provide a selective review of recent advances in tensor analysis and tensor modeling in statistics and machine learning. We then provide examples in health data science applications.
Article
Effective feature dimension reduction (DR) from high-dimensional remote sensing images has been a significant challenge for remote sensing object recognition. Directly adopting vector-based DR method ignores remote sensing data’s inherent tensor structure information, leading to the undersample problem. Additionally, the existing tensor-based DR methods either require an exponential storage space increasing with the orders of the input tensor (i.e., Tucker-form methods) or are dependent on the permutation of tensor modes limiting the discriminant capability of the DR results (i.e., tensor train form methods). To conquer these problems, unlike the existing Tucker or tensor train form feature representation, the novel tensor ring (TR) subspace learning theory is proposed systematically and rigorously to extend the traditional vector and tensor subspace learning to the TR subspace. Then, by embedding Fisher criterion into TR subspace, the Tensor Ring Discriminant Analysis (TRDA) is proposed to achieve DR for remote sensing tensors with flexible tensor rank and lower storage cost. To train TRDA under different computing resources, non-recursive and exact TRDA training methods are presented to obtain the global suboptimal and local optimal solutions, respectively. Furthermore, to adapt to the case of multisource data and unlabeled data, the multiple TRDA (MTRDA) and semi-supervised TRDA (S-TRDA) are further proposed to refine multisource features in multiple TR subspaces and absorb useful information using adaptive scatter tensor, respectively. Using optical, hyperspectral, and SAR datasets, experimental results demonstrate that the proposed TRDA can obtain better recognition accuracy and smaller storage cost than the typical vector and tensor-based DR methods.
Article
Unlike traditional video cameras, event cameras capture asynchronous event streams in which each event encodes the pixel location, the trigger's timestamp, and the polarity of the brightness change. In this paper, we introduce a novel hypergraph-based framework for moving object classification. Specifically, we capture moving objects with an event camera, perceiving and collecting asynchronous event streams at high temporal resolution. Unlike stacked event frames, we encode the asynchronous event data into a hypergraph, fully mining the high-order correlations of the event data, and design a mixed convolutional hypergraph neural network to achieve more efficient and accurate moving-target recognition. The experimental results show that our method performs well in moving object classification (e.g., gait identification).
Article
Cataract surgery remains the only definitive treatment for visually significant cataracts, which are a major cause of preventable blindness worldwide. Successful performance of cataract surgery relies on stable dilation of the pupil. Automated pupil segmentation from surgical videos can assist surgeons in detecting risk factors for pupillary instability prior to the development of surgical complications. However, surgical illumination variations, surgical instrument obstruction, and lens material hydration during cataract surgery can limit pupil segmentation accuracy. To address these problems, we propose a novel method named adaptive wavelet tensor feature extraction (AWTFE). AWTFE is designed to enhance the accuracy of deep learning-powered pupil recognition systems. First, we represent the correlations among spatial information, color channels, and wavelet subbands by constructing a third-order tensor. We then utilize higher-order singular value decomposition to eliminate redundant information adaptively and estimate pupil feature information. We evaluated the proposed method by conducting experiments with state-of-the-art deep learning segmentation models on our BigCat dataset consisting of 5,700 annotated intraoperative images from 190 cataract surgeries and a public CaDIS dataset. The experimental results reveal that the AWTFE method effectively identifies features relevant to the pupil region and improved the overall performance of segmentation models by up to 2.26% (BigCat) and 3.31% (CaDIS). Incorporation of the AWTFE method led to statistically significant improvements in segmentation performance (P < 1.29 × 10⁻¹⁰ for each model) and yielded the highest-performing model overall (Dice coefficients of 94.74% and 96.71% for the BigCat and CaDIS datasets, respectively).
In performance comparisons, the AWTFE consistently outperformed other feature extraction methods in enhancing model performance. In addition, the proposed AWTFE method significantly improved pupil recognition performance by up to 2.87% in particularly challenging phases of cataract surgery.
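The HOSVD-based redundancy-elimination step that AWTFE builds on can be illustrated on a synthetic third-order tensor. The ranks, data, and helper names below are ours, not the paper's:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given axis to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd_truncate(T, ranks):
    """Truncated HOSVD: project each mode onto its top singular vectors,
    then reconstruct, discarding the weakest multilinear components."""
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
         for m, r in enumerate(ranks)]
    core = T
    for m, u in enumerate(U):
        core = mode_mult(core, u.T, m)   # core = T x_m U_m^T for each mode
    approx = core
    for m, u in enumerate(U):
        approx = mode_mult(approx, u, m)  # map the core back to full size
    return approx

# Synthetic 3rd-order tensor: exact multilinear rank (2, 2, 2) plus noise.
rng = np.random.default_rng(1)
A = np.einsum('ir,jr,kr->ijk',
              rng.random((6, 2)), rng.random((7, 2)), rng.random((8, 2)))
T = A + 0.01 * rng.random((6, 7, 8))
T_hat = hosvd_truncate(T, (2, 2, 2))
err = float(np.linalg.norm(T_hat - A) / np.linalg.norm(A))
```

Truncating each mode to rank 2 recovers the clean low-rank tensor to within the noise level, which is the sense in which HOSVD "eliminates redundant information".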
Article
Full-text available
Tensor completion is a vital task in multidimensional signal processing and machine learning. To recover the missing data in a tensor, various low-rank structures of a tensor can be assumed, and Tucker format is a popular choice. However, the promising capability of Tucker completion is realized only when we can determine a suitable multilinear rank, which controls the model complexity and thus is essential to avoid overfitting/underfitting. Rather than exhaustively searching the best multilinear rank, which is computationally inefficient, recent advances have proposed a Bayesian way to learn the multilinear rank from training data automatically. However, in prior arts, only a single parameter is dedicated to learn the variance of the core tensor elements. This rigid assumption restricts the modeling capabilities of existing methods in real-world data, where the core tensor elements may have a wide range of variances. To have a flexible core tensor while still retaining succinct Bayesian modeling, we first bridge the tensor Tucker decomposition to the canonical polyadic decomposition (CPD) with low-rank factor matrices, and then propose a novel Bayesian modeling based on the Gaussian-inverse Wishart prior. Inference algorithm is further derived under the variational inference framework. Extensive numerical studies on synthetic data and real-world datasets demonstrate the significantly improved performance of the proposed algorithm in terms of multilinear rank learning and missing data recovery.
Article
Objective: Since a single brain computer interface (BCI) is limited in performance, it is necessary to develop collaborative BCI (cBCI) systems which integrate multi-user electroencephalogram (EEG) information to improve system performance. However, there are still some challenges in cBCI systems, including effective discriminant feature extraction of multi-user EEG data, fusion algorithms, time reduction of system calibration, etc. Methods: This study proposed an event-related potential (ERP) feature extraction and classification algorithm of spatio-temporal weighting and correlation analysis (STC) to improve the performance of cBCI systems. The proposed STC algorithm consisted of three modules. First, source extraction and interval modeling were used to overcome the problem of inter-trial variability. Second, spatio-temporal weighting and temporal projection were utilized to extract effective discriminant features for multi-user information fusion and cross-session transfer. Third, correlation analysis was conducted to match target/non-target templates for classification of multi-user and cross-session datasets. Results: The collaborative cross-session datasets of rapid serial visual presentation (RSVP) from 14 subjects were used to evaluate the performance of the EEG classification algorithm. For single-user/collaborative EEG classification of within-session and cross-session datasets, STC had significantly higher performance than the existing state-of-the-art machine learning algorithms. Conclusion: It was demonstrated that STC was effective to improve the classification performance of multi-user collaboration and cross-session transfer for RSVP-based BCI systems, and was helpful to reduce the system calibration time.
Article
Full-text available
Gait analysis provides a convenient strategy for the diagnosis and rehabilitation assessment of diseases of the skeletal, muscular, and neurological systems. However, challenges remain in current gait recognition methods due to complex systems, high cost, interference with natural gait, and one-size-fits-all models. Here, a highly integrated gait recognition system composed of a self-powered multi-point body motion sensing network (SMN) based on a full textile structure is demonstrated. By combining the newly developed energy harvesting technology of the triboelectric nanogenerator (TENG) with traditional textile manufacturing processes, the SMN not only ensures a high pressure-response sensitivity of up to 1.5 V kPa⁻¹, but is also endowed with several good properties, such as full flexibility, excellent breathability (165 mm s⁻¹), and good moisture permeability (318 g m⁻² h⁻¹). By using machine learning to analyze periodic signals and the dynamic parameters of limb swing, the gait recognition system achieves a high accuracy of 96.7% across five pathological gaits. In addition, a customizable auxiliary rehabilitation exercise system that monitors the extent of a patient's rehabilitation exercise is developed to observe the patient's condition and guide timely recovery training. The machine learning-assisted SMN can provide a feasible solution for disease diagnosis and personalized rehabilitation of patients.
Chapter
This book chapter provides a brief introduction to tensors and a selective overview of tensor data analysis. Tensor data analysis has been an increasingly popular and also a challenging topic in multivariate statistics. In this book chapter, we aim to review the current literature on statistical models and methods for tensor data analysis. Keywords: discriminant analysis; dimension reduction; generalized linear model; multivariate linear regression; tensor data.
Article
Full-text available
Canonical correlation analysis (CCA) has attracted great interest in multi-view representation. However, most of the CCA methods heavily rely on the matrix structure, which may neglect the prior geometric information in high-order data. To deal with the above issue, we first propose a novel tensor CCA formulation with orthogonality, called TCCA-O, based on the Tucker decomposition to preserve the orthogonality. Then, we incorporate a structured sparse regularization term into the TCCA-O, called TCCA-OS, to improve feature representation. In addition, we develop an efficient alternating direction method of multipliers (ADMM)-based algorithm to solve TCCA-OS and conduct numerical comparisons on four public datasets. The results validate the advantages of the proposed methods in terms of classification accuracy, parameter sensitivity, noise robustness, and model stability. In particular, TCCA-O and TCCA-OS improve the classification accuracy by at least 10.03% and 10.36%, respectively, over the state-of-the-art CCA methods on the Caltech101-7 dataset.
Article
Behavioral biometrics has been widely investigated and deployed in real-world scenarios for human authentication. However, there have been almost no attempts to identify behavior patterns in a cross-scenario setting, which is common in practice and needs urgent attention. This paper defines and investigates cross-scenario behavioral biometric authentication using keystroke dynamics. A novel system called CrossBehaAuth is presented for extending keystroke dynamics-based behavioral authentication to new scenarios and broader problems. We design a deep neural network with a temporal-aware learning mechanism for cross-scenario keystroke dynamics authentication. This mechanism selectively learns and encodes temporal information for efficient behavioral pattern transfer in cross-scenario settings. A local Gaussian data augmentation approach is proposed to increase the diversity of behavioral data and thereby further improve performance. We evaluate the proposed approach on two publicly available datasets. The extensive experimental results confirm the efficacy of CrossBehaAuth for cross-scenario keystroke dynamics authentication. Our approach significantly improves authentication accuracy in cross-scenario settings and even achieves comparable performance on single-scenario authentication tasks. In addition, our approach shows its generalizability and advantages in both single- and cross-scenario keystroke dynamics authentication.
Article
Biometrics have attracted growing research interest as information security and safety gain increasing attention. As an important biomedical signal, the electroencephalogram (EEG) contains valuable information about identity, emotion, personality, etc. Thus, automatically distinguishing identities based on EEG is beneficial to the development of biometrics, forensics, and informatics. Although deep learning has attracted much research attention for EEG-based person identification, the performance of this methodology seems to have hit a bottleneck recently. Hence, by rethinking the problems haunting this issue, we reinvigorate the conventional method pipeline and put forward a novel and effective tensorial scheme apart from the deep learning mainstream. Specifically, the proposed tensorial scheme first extracts an effective tensorial representation from multi-channel EEG; then, the scheme performs the designed tensorial learning to improve the discriminability of the feature space; finally, the scheme carries out the devised tensorial measurement in the learned metric space for classification. Experimental results demonstrate the superiority of the proposed scheme over related advanced approaches on the challenging benchmark databases DEAP, SEED, and DREAMER.
Article
Research on biometric methods has gained momentum due to the increase in security concerns. Face and gait are two biometrics that are safe and non-invasive in nature and can be captured without the knowledge of the person, so they can easily be used in surveillance applications. This paper presents a multimodal biometric system that combines face and gait, based on principal component analysis (PCA) along with a simplified deep neural network (S-DNN). In the S-DNN analysis, cross entropy is used instead of Euclidean distance. In the proposed method, PCA is used for feature extraction, and for the reconstruction of faces an artificial neural network is used in place of inverse PCA to improve accuracy. The combination of PCA and ANN is applied separately to each biometric, and a matching score is obtained for each biometric identifier. Finally, the softmax function is used to fuse the matching scores into a final matching score. The main advantages of the proposed methodology are higher accuracy (99.51%) and very little processing time.
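Two generic building blocks of pipelines like the one above, PCA feature extraction and softmax fusion of per-modality matching scores, can be sketched as follows. The data and scores are made up for illustration; this is not the authors' S-DNN:

```python
import numpy as np

def pca_fit(X, k):
    """PCA via SVD of the centred data: returns the mean and the top-k
    principal directions (rows of Vt)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(8)
X = rng.normal(size=(40, 64))            # stand-in "face" vectors
mu, W = pca_fit(X, k=5)
codes = (X - mu) @ W.T                   # 5-dim PCA features per sample

# Score-level fusion: softmax turns per-modality matching scores into
# weights that sum to one (the two scores here are hypothetical).
face_score, gait_score = 2.0, 1.0
fused = softmax(np.array([face_score, gait_score]))
```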
Article
Considering the multiple observations from satellites with different types of sensors, the available object data are regarded as different combinations of multisource heterogeneous data. To achieve accurate object recognition, dimension reduction (DR) technology is important to capture a low-dimensional discriminative representation containing complementary information from multisource data, while off-the-shelf DR methods can only handle input represented as vectors or homogeneous tensors and fail to deal with multisource data represented as heterogeneous tensors. In addition, the existing DR methods are classifier-independent, making it difficult to ensure effective recognition under a specific classifier. To solve these problems, a DR method for heterogeneous tensors based on graph-based and classifier-oriented embedding (HTGCE) is proposed to learn the low-dimensional representation of multisource data using different combinations of samples. First, self-learning adjacency matrices are constructed to capture the local structure of different combinations of multisource data autonomously. Then, unlike the classifier-independent discriminant term used in existing DR methods, a classifier-oriented discriminant term is constructed to enhance the specific classifier-based recognition results. Furthermore, a reconstruction error minimization term is created to enable DR results to inherit the main information of the original data. Moreover, an adaptive weight factor is built to balance the importance of different sources for object recognition. Finally, an alternating optimization strategy is presented to solve the optimization problem of HTGCE. Using the multiresolution multiangle optical dataset and the paired optical and SAR dataset, the experimental results demonstrate that HTGCE outperforms typical vector- and tensor-based DR methods in terms of recognition accuracy.
Article
Digital meter recognition aims to identify numbers in complex environments. Existing methods for digital meter recognition depend on deep networks supported by high-quality, large-scale data and feature extraction, whereas recognition from low-quality, small-scale digital datasets is not effective. Moreover, occlusion of digital images obstructs the extraction of fixed features. Multi-classifier under Feature Engineering (MC-FE) is proposed to solve the above problems. Specifically, MC-FE builds a feature library containing a variety of mainstream features and directly selects the optimal combination of distinguishing features applicable to the current dataset, rather than using fixed features. In addition, 10 regression machines integrated from support vector machines are adopted; each regression machine determines the probability of every number from 0 to 9. The positioning of the meter is the premise of accurate identification, so multi-layer kernel regression positioning (ML-KRP) is designed to increase the accuracy of meter identification. The results of experiments on several digital recognition datasets reveal that MC-FE and ML-KRP outperform the state of the art in digital recognition under this small-scale setting.
Article
Gait recognition provides the opportunity to identify different walking styles of people without physical intervention. However, covariates such as changing clothes and carrying conditions may influence recognition accuracy. Our objective was to identify the walking patterns of people under different covariates by analyzing images from the publicly available CASIA-B gait dataset. On this dataset, the proposed method was evaluated using GEIs (gait energy images) as inputs for normal walking, changed clothing, and carrying conditions in a multi-view environment. A support vector machine (SVM) and a histogram of oriented gradients (HOG) were applied to classify images of the human gait. Observations show that, considering the mean of the individual accuracies, recognition accuracy follows the order: clothing > normal walk > carrying at a 90° angle. An accuracy of 87.9% was achieved for coat-wearing people, and an accuracy of 83.33% was achieved across all the mentioned covariates. The 87.9% accuracy for the clothing covariate is particularly useful in seasons, such as winter, when people wear heavy clothing.
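A toy version of the HOG-plus-classifier pipeline can be sketched with a hand-rolled descriptor (no block normalization, unlike real Dalal-Triggs HOG) and nearest-neighbour matching standing in for the SVM, on synthetic stripe images rather than real gait energy images:

```python
import numpy as np

def tiny_hog(img, cell=8, bins=9):
    """Minimal HOG-style descriptor: per-cell histograms of unsigned
    gradient orientation, weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            hist, _ = np.histogram(ang[y:y + cell, x:x + cell],
                                   bins=bins, range=(0.0, np.pi),
                                   weights=mag[y:y + cell, x:x + cell])
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-9)     # L2-normalised descriptor

# Two synthetic stand-ins for gait energy images: vertical vs diagonal stripes.
xx, yy = np.meshgrid(np.arange(32), np.arange(32))
gei_a = (xx % 8 < 4).astype(float)
gei_b = ((xx + yy) % 8 < 4).astype(float)
fa, fb = tiny_hog(gei_a), tiny_hog(gei_b)

# Nearest-neighbour matching stands in for the paper's SVM.
rng = np.random.default_rng(2)
query = tiny_hog(gei_a + 0.05 * rng.random((32, 32)))
pred = 'a' if float(query @ fa) > float(query @ fb) else 'b'
```

The two stripe patterns concentrate gradient energy in different orientation bins, so even this crude descriptor separates them cleanly.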
Conference Paper
Very recently, the ESEAP mutual authentication protocol was designed to avoid the drawbacks of Wang et al.'s protocol, and its authors claim, using informal analysis, that the protocol protects against all kinds of security threats. This work investigates the ESEAP protocol from a security point of view and finds that the scheme is not fully protected against stolen-verifier attacks and does not provide user anonymity. Furthermore, the protocol has user identity issues, i.e., the server cannot determine the user's identity during the authentication phase. We then discuss the inconsistencies in the security analysis of ESEAP presented by RESEAP.
Article
Full-text available
We develop a face recognition algorithm which is insensitive to large variation in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face, under varying illumination but fixed pose, lie in a 3D linear subspace of the high dimensional image space-if the face is a Lambertian surface without shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we linearly project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's linear discriminant and produces well separated classes in a low-dimensional subspace, even under severe variation in lighting and facial expressions. The eigenface technique, another method based on linearly projecting the image space to a low dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed “Fisherface” method has error rates that are lower than those of the eigenface technique for tests on the Harvard and Yale face databases
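Fisher's linear discriminant, the core of the Fisherface method described above, reduces in the two-class case to w ∝ Sw⁻¹(m1 − m2). A minimal sketch on synthetic 2-D data (not face images):

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Two-class Fisher discriminant: w proportional to Sw^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter matrix
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(3)
X1 = rng.normal([0.0, 0.0], 0.3, size=(50, 2))   # class 1
X2 = rng.normal([2.0, 1.0], 0.3, size=(50, 2))   # class 2
w = fisher_lda_direction(X1, X2)

# 1-D projections are well separated; a midpoint threshold classifies them.
p1, p2 = X1 @ w, X2 @ w
threshold = (p1.mean() + p2.mean()) / 2.0
acc = (np.sum(p1 > threshold) + np.sum(p2 < threshold)) / 100.0
```

In face recognition the raw within-class scatter is singular (far more pixels than images), which is why Fisherfaces first reduces dimension with PCA before applying this step.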
Chapter
Full-text available
In An Essay Concerning Human Understanding, John Locke (Locke, 1690/1959, 11:8:9) listed five primary qualities of objects in the world around us: solidity, extension, figure, number, and motion. These attributes have served us well in the study of perception. Solidity, for example, has been studied in terms of transparency and depth in vision, texture in touch, and stereophony in audition—stereophonic literally means ‘solid sound.’ Extension, or size, has been very important, particularly in vision, where psychologists have studied effects of relative and absolute size, and proximal and distal size. Figure, interpreted as shape or form, has undoubtedly been the most important quality of objects for all perception, providing impetus for Gestalt and other psychologists. Number, especially when logically extended to include composition and the relation of parts to wholes, has also played an important role. Motion has also played some part; however, unlike the others, motion has most often been excised from the aggregate by the perceptual psychologist. It appears that we have spent most of our efforts in the study of static figures of a certain size, composition, and apparent solidity. It is clear that motion is a quality unlike the others; it invokes time. Our general purpose is to help integrate motion and time back into the study of perception.
Chapter
Full-text available
We develop a face recognition algorithm which is insensitive to gross variation in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face under varying illumination direction lie in a 3-D linear subspace of the high dimensional feature space — if the face is a Lambertian surface without self-shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's Linear Discriminant and produces well separated classes in a low-dimensional subspace even under severe variation in lighting and facial expressions. The Eigenface technique, another method based on linearly projecting the image space to a low dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed Fisherface method has error rates that are significantly lower than those of the Eigenface technique when tested on the same database.
Conference Paper
Full-text available
We analyze walking people using a gait sequence representation that bypasses the need for frame-to-frame tracking of body parts. The gait representation maps a video sequence of silhouettes into a pair of two-dimensional spatio-temporal patterns that are near-periodic along the time axis. Mathematically, such patterns are called “frieze” patterns and associated symmetry groups “frieze groups”. With the help of a walking humanoid avatar, we explore variation in gait frieze patterns with respect to viewing angle, and find that the frieze groups of the gait patterns and their canonical tiles enable us to estimate viewing direction of human walking videos. In addition, analysis of periodic patterns allows us to determine the dynamic time warping and affine scaling that aligns two gait sequences from similar viewpoints. We also show how gait alignment can be used to perform human identification and model-based body part segmentation.
Article
Full-text available
A new LDA-based face recognition system is presented in this paper. Linear discriminant analysis (LDA) is one of the most popular linear projection techniques for feature extraction. The major drawback of applying LDA is that it may encounter the small sample size problem. In this paper, we propose a new LDA-based technique which can solve the small sample size problem. We also prove that the most expressive vectors derived in the null space of the within-class scatter matrix using principal component analysis (PCA) are equal to the optimal discriminant vectors derived in the original space using LDA. The experimental results show that the new LDA process improves the performance of a face recognition system significantly.
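The null-space idea above — restrict attention to directions where the within-class scatter vanishes (possible exactly when samples are fewer than dimensions) and maximize between-class scatter there — can be sketched on synthetic data; the thresholds and sizes are ours:

```python
import numpy as np

# Undersampled setting: 6 samples in 10 dimensions, two classes.
rng = np.random.default_rng(7)
d, per_class = 10, 3
X1 = rng.normal(0.0, 1.0, (per_class, d))
X2 = rng.normal(1.0, 1.0, (per_class, d))
m1, m2 = X1.mean(0), X2.mean(0)
m = np.vstack([X1, X2]).mean(0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # rank <= 4
Sb = per_class * (np.outer(m1 - m, m1 - m) + np.outer(m2 - m, m2 - m))

# Basis of the null space of Sw (eigenvalues numerically zero).
vals, vecs = np.linalg.eigh(Sw)
N = vecs[:, vals < 1e-8]

# Inside the null space the within-class scatter is zero, so we only
# need to maximize the projected between-class scatter.
_, bw = np.linalg.eigh(N.T @ Sb @ N)
w = N @ bw[:, -1]                      # most discriminative null-space direction
sep = abs(float((m1 - m2) @ w))        # class means stay separated
within = float(np.linalg.norm((X1 - m1) @ w) + np.linalg.norm((X2 - m2) @ w))
```

Along `w` the within-class variation collapses to (numerical) zero while the class means remain apart, which is the small-sample-size remedy the abstract describes.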
Article
Full-text available
We describe a video-rate surveillance algorithm for determining whether people are carrying objects or moving unencumbered from a stationary camera. The contribution of the paper is the shape analysis algorithm that both determines whether a person is carrying an object and segments the object from the person so that it can be tracked, e.g., during an exchange of objects between two people. As the object is segmented, an appearance model of the object is constructed. The method combines periodic motion estimation with static symmetry analysis of the silhouettes of a person in each frame of the sequence. Experimental results demonstrate robustness and real-time performance of the proposed algorithm.
Conference Paper
Full-text available
Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied to face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient.
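One alternating step of the matrix-based 2DLDA scheme, updating the left projection for a fixed right projection on synthetic matrix data, might look like this (the ridge regularizer, rank choices, and data are ours, not the paper's):

```python
import numpy as np

def scatter_pair(Xs, labels, R):
    """Between/within scatter of matrix samples after right-projection by R."""
    M = sum(Xs) / len(Xs)
    n = Xs[0].shape[0]
    Sb, Sw = np.zeros((n, n)), np.zeros((n, n))
    for c in sorted(set(labels)):
        Xc = [X for X, l in zip(Xs, labels) if l == c]
        Mc = sum(Xc) / len(Xc)
        D = (Mc - M) @ R
        Sb += len(Xc) * D @ D.T
        for X in Xc:
            E = (X - Mc) @ R
            Sw += E @ E.T
    return Sb, Sw

def two_dlda_step(Xs, labels, R, k):
    """One alternating 2DLDA step: update the left projection for fixed R.
    A small ridge keeps Sw invertible; eigenvectors of the non-symmetric
    product may pick up tiny imaginary parts, which we drop."""
    Sb, Sw = scatter_pair(Xs, labels, R)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(len(Sw)), Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:k]].real

rng = np.random.default_rng(4)
templates = [rng.random((10, 12)) for _ in range(2)]   # one template per class
Xs, labels = [], []
for c, B in enumerate(templates):
    for _ in range(5):
        Xs.append(B + 0.1 * rng.random((10, 12)))
        labels.append(c)
R = np.eye(12)[:, :4]            # initial right projection (first 4 columns)
Lproj = two_dlda_step(Xs, labels, R, k=3)
```

A full run would alternate the symmetric update for `R` (transpose the samples) until the projections stabilize; the scatter matrices here are only 10x10, never pixel-count-sized, which is how 2DLDA sidesteps the singularity problem.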
Article
Full-text available
On the basis of measured receptive field profiles and spatial frequency tuning characteristics of simple cortical cells, it can be concluded that the representation of an image in the visual cortex must involve both spatial and spatial frequency variables. In a scheme due to Gabor, an image is represented in terms of localized symmetrical and antisymmetrical elementary signals. Both measured receptive fields and measured spatial frequency tuning curves conform closely to the functional form of Gabor elementary signals. It is argued that the visual cortex representation corresponds closely to the Gabor scheme owing to its advantages in treating the subsequent problem of pattern recognition.
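A real, even-symmetric Gabor elementary signal is a sinusoidal carrier under a Gaussian envelope; a small filter bank over a few directions and scales, the kind of grid that GaborD/GaborS/GaborSD representations sum over, can be built as follows (all parameter choices are illustrative):

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Even-symmetric Gabor elementary signal: a cosine carrier at
    orientation theta under an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the carrier
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    g = envelope * carrier
    return g - g.mean()     # zero mean: flat image regions give no response

# A small bank: 2 scales (wavelengths) x 4 directions.
bank = [gabor_kernel(15, wl, th, sigma=wl / 2.0)
        for wl in (4.0, 8.0)
        for th in np.linspace(0.0, np.pi, 4, endpoint=False)]
```

Convolving an image with each kernel and summing the response magnitudes over directions, scales, or both gives GaborD-, GaborS-, and GaborSD-style representations respectively.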
Article
Full-text available
We propose a view-based approach to recognize humans from their gait. Two different image features have been considered: the width of the outer contour of the binarized silhouette of the walking person and the entire binary silhouette itself. To obtain the observation vector from the image features, we employ two different methods. In the first method, referred to as the indirect approach, the high-dimensional image feature is transformed to a lower dimensional space by generating what we call the frame to exemplar (FED) distance. The FED vector captures both structural and dynamic traits of each individual. For compact and effective gait representation and recognition, the gait information in the FED vector sequences is captured in a hidden Markov model (HMM). In the second method, referred to as the direct approach, we work with the feature vector directly (as opposed to computing the FED) and train an HMM. We estimate the HMM parameters (specifically the observation probability B) based on the distance between the exemplars and the image features. In this way, we avoid learning high-dimensional probability density functions. The statistical nature of the HMM lends overall robustness to representation and recognition. The performance of the methods is illustrated using several databases.
Article
Full-text available
Identification of people by analysis of gait patterns extracted from video has recently become a popular research problem. However, the conditions under which the problem is "solvable" are not understood or characterized. To provide a means for measuring progress and characterizing the properties of gait recognition, we introduce the HumanID Gait Challenge Problem. The challenge problem consists of a baseline algorithm, a set of 12 experiments, and a large data set. The baseline algorithm estimates silhouettes by background subtraction and performs recognition by temporal correlation of silhouettes. The 12 experiments are of increasing difficulty, as measured by the baseline algorithm, and examine the effects of five covariates on performance. The covariates are: change in viewing angle, change in shoe type, change in walking surface, carrying or not carrying a briefcase, and elapsed time between sequences being compared. Identification rates for the 12 experiments range from 78 percent on the easiest experiment to 3 percent on the hardest. All five covariates had statistically significant effects on performance, with walking surface and time difference having the greatest impact. The data set consists of 1,870 sequences from 122 subjects spanning five covariates (1.2 Gigabytes of data). The gait data, the source code of the baseline algorithm, and scripts to run, score, and analyze the challenge experiments are available at http://www.GaitChallenge.org. This infrastructure supports further development of gait recognition algorithms and additional experiments to understand the strengths and weaknesses of new algorithms. The more detailed the experimental results presented, the more detailed is the possible meta-analysis and greater is the understanding. It is this potential from the adoption of this challenge problem that represents a radical departure from traditional computer vision research methodology.
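The baseline's recognition step, temporal correlation of silhouette sequences, can be approximated by taking the best mean frame-wise overlap over circular alignments; the toy "silhouettes" below are synthetic stand-ins, not the HumanID data:

```python
import numpy as np

def silhouette_similarity(probe, gallery):
    """Best mean frame-wise Tanimoto overlap over circular alignments of
    the gallery: a rough stand-in for temporal correlation of silhouettes."""
    def overlap(a, b):
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 1.0
    best = 0.0
    for shift in range(len(gallery)):
        rolled = np.roll(gallery, shift, axis=0)[:len(probe)]
        best = max(best, float(np.mean([overlap(p, g)
                                        for p, g in zip(probe, rolled)])))
    return best

# Toy "silhouettes": a diagonal blob sweeping across an 8x8 frame.
gallery = np.array([np.roll(np.eye(8, dtype=bool), t, axis=1) for t in range(8)])
probe = np.roll(gallery, 3, axis=0)[:5]   # same walker, phase-shifted cycle
other = ~gallery[:5]                      # a very different "subject"
s_same = silhouette_similarity(probe, gallery)
s_diff = silhouette_similarity(other, gallery)
```

Scanning over alignments makes the score insensitive to where in the gait cycle the probe sequence starts, mirroring the temporal-correlation search in the baseline.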
Conference Paper
We present a robust representation for gait recognition that is compact, easy to construct, and affords efficient matching. Instead of a time-series representation comprising a sequence of raw silhouette frames or of features extracted therein, as has been the practice, we simply align and average the silhouettes over one gait cycle. We then base recognition on the Euclidean distance between these averaged silhouette representations. We show, using the recently formulated gait challenge problem (www.gaitchallenge.org), that execution time improves by a factor of 30 while recognition power remains comparable to the gait baseline algorithm, which is becoming the comparison standard in gait recognition. Experiments with portions of the average silhouette representation show that recognition power is not entirely derived from upper body shape; rather, the dynamics of the legs contribute equally to recognition. However, this study does raise intriguing doubts about the need for accurate shape and dynamics representations for gait recognition.
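The averaged-silhouette representation described above is simple enough to sketch directly; silhouette alignment is assumed to have been done upstream here:

```python
import numpy as np

def average_silhouette(frames):
    """Average pre-aligned binary silhouette frames over one gait cycle."""
    return np.stack([np.asarray(f, dtype=float) for f in frames]).mean(axis=0)

def gait_distance(avg_a, avg_b):
    """Euclidean distance between two averaged-silhouette templates."""
    return float(np.linalg.norm(avg_a - avg_b))

# Toy 4x4 "silhouettes"; recognition compares gallery/probe averages.
frames = [np.eye(4), np.ones((4, 4))]
avg = average_silhouette(frames)
```

Recognition is then nearest-neighbor over `gait_distance` between a probe's average and each gallery subject's average.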
Conference Paper
Linear discriminant analysis (LDA) is a popular feature extraction technique for face recognition. However, it often suffers from the small sample size problem when dealing with high-dimensional face data. Some approaches have been proposed to overcome this problem, but they are often unstable and have to discard some discriminative information. In this paper, a dual-space LDA approach for face recognition is proposed to take full advantage of the discriminative information in the face space. Based on a probabilistic visual model, the eigenvalue spectrum of the within-class scatter matrix is estimated, and discriminant analysis is simultaneously applied in the principal and complementary subspaces of the within-class scatter matrix. The two sets of discriminative features are then combined for recognition. The proposed approach outperforms existing LDA approaches.
Conference Paper
Researchers in the gait community propose various features, either appearance or model based, which they believe encode certain individual traits. One of the main assumptions made in many gait recognition techniques is constant walking speed. Even though gait patterns are repeatable, changes in walking speed can influence the gait patterns themselves. In this work we explore how changes in walking speed affect gait parameters in terms of recognition performance. A speed-varying walking database was collected to allow us to investigate and quantify the impact of speed on gait recognition methodically. We develop a normalization procedure that maps gait features across speeds, and demonstrate its utility in previously proposed appearance-based gait recognition methods.
Conference Paper
Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing ensembles of images resulting from the interaction of any number of underlying factors. We present a dimensionality reduction algorithm that enables subspace analysis within the multilinear framework. This N-mode orthogonal iteration algorithm is based on a tensor decomposition known as the N-mode SVD, the natural extension to tensors of the conventional matrix singular value decomposition (SVD). We demonstrate the power of multilinear subspace analysis in the context of facial image ensembles, where the relevant factors include different faces, expressions, viewpoints, and illuminations. In prior work we showed that our multilinear representation, called TensorFaces, yields superior facial recognition rates relative to standard, linear (PCA/eigenfaces) approaches. We demonstrate factor-specific dimensionality reduction of facial image ensembles. For example, we can suppress illumination effects (shadows, highlights) while preserving detailed facial features, yielding a low perceptual error.
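The N-mode SVD underlying this analysis amounts to one matrix SVD per mode-n unfolding of the data tensor. A minimal truncated-HOSVD sketch (the orthogonal-iteration refinement of the full algorithm is omitted):

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` first, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def n_mode_svd(tensor, ranks):
    """Truncated N-mode SVD (HOSVD): one matrix SVD per mode, then project
    the tensor onto each mode's leading left-singular subspace."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = tensor
    for mode, u in enumerate(factors):
        # Contract mode `mode` with u.T, then restore the axis order.
        core = np.moveaxis(
            np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

t = np.random.default_rng(0).normal(size=(5, 6, 7))
core, factors = n_mode_svd(t, (2, 3, 4))
```

Dropping trailing columns of a single mode's factor matrix is exactly the "factor-specific dimensionality reduction" the abstract describes.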
Conference Paper
We describe a method to detect instances of a walking person carrying an object, seen from a stationary camera. We take a correspondence-free motion-based recognition approach that exploits known shape and periodicity cues of the human silhouette shape. Specifically, we subdivide the binary silhouette into four horizontal segments, and analyze the temporal behavior of the bounding box width over each segment. We posit that the periodicity and amplitudes of these time series satisfy certain criteria for a natural walking person, and deviations therefrom are an indication that the person might be carrying an object. The method is tested on 41 360×240 color outdoor sequences of people walking and carrying objects at various poses and camera viewpoints. A correct detection rate of 85% and a false alarm rate of 12% are obtained.
Conference Paper
This paper demonstrates gait recognition using only the trajectories of lower body joint angles projected into the walking plane. For this work, we begin with the position of 3D markers as projected into the sagittal or walking plane. We show a simple method for estimating the planar offsets between the markers and the underlying skeleton and joints; given these offsets we compute the joint angle trajectories. To compensate for systematic temporal variations from one instance to the next (predominantly distance and speed of walk), we fix the number of footsteps and time-normalize the trajectories by a variance-compensated time warping. We perform recognition on two walking databases of 18 people (over 150 walk instances) using a simple nearest-neighbor algorithm with Euclidean distance as the measurement criterion. We also use the expected confusion metric as a means to estimate how well joint-angle signals will perform in a larger population.
Conference Paper
A gait-recognition technique that recovers static body and stride parameters of subjects as they walk is presented. This approach is an example of an activity-specific biometric: a method of extracting identifying properties of an individual or of an individual's behavior that is applicable only when a person is performing that specific action. To evaluate our parameters, we derive an expected confusion metric (related to mutual information), as opposed to reporting a percent correct with a limited database. This metric predicts how well a given feature vector will filter identity in a large population. We test the utility of a variety of body and stride parameters recovered in different viewing conditions on a database consisting of 15 to 20 subjects walking at both an angled and frontal-parallel view with respect to the camera, both indoors and out. We also analyze motion-capture data of the subjects to discover whether confusion in the parameters is inherently a physical or a visual measurement error property.
Conference Paper
We describe a robust technique for detecting nonstationary periodic motion from a moving and static camera. We also describe a robust technique for discriminating motion symmetries (periodic motion classification), which we apply to classifying running humans (bipeds) and canines (quadrupeds). The system has been implemented to run in real time (30 Hz) on standard PC workstations.
Conference Paper
A hierarchical representation consisting of two levels of linear combinations (LC) is proposed for face recognition. At the first level, a face image is represented as a linear combination of a set of basis vectors, i.e. eigenfaces. Thereby a face image corresponds to a feature vector (prototype) in the eigenface space. Normally several such prototypes are available for a face class, each representing the face under a particular condition, such as viewpoint or illumination. We propose to use a second-level LC, that of the prototypes belonging to the same face class, to treat the prototypes coherently. The purpose is to improve face recognition under a new condition not captured by the prototypes by using a linear combination of them. A new distance measure called nearest LC (NLC) is proposed as opposed to the nearest neighbor (NN). Experiments show that our method yields significantly better results than the one-level eigenface methods.
Conference Paper
A new view-based approach to the representation and recognition of action is presented. The basis of the representation is a temporal template: a static vector-image where the vector value at each point is a function of the motion properties at the corresponding spatial location in an image sequence. Using 18 aerobics exercises as a test domain, we explore the representational power of a simple, two-component version of the templates: the first value is a binary value indicating the presence of motion, and the second value is a function of the recency of motion in a sequence. We then develop a recognition method which matches these temporal templates against stored instances of views of known actions. The method automatically performs temporal segmentation, is invariant to linear changes in speed, and runs in real-time on a standard platform. We recently incorporated this technique into the KIDSROOM: an interactive, narrative play-space for children.
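The "recency of motion" component of the temporal template is commonly realized as a motion-history image. A minimal sketch, assuming a per-frame binary motion mask and a linear decay:

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One temporal-template update: pixels moving now are stamped with tau
    (the 'recency' value); all others decay by one per frame toward zero."""
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

mhi = np.zeros((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = True
mhi = update_mhi(mhi, mask, tau=10)                      # motion at (0, 0)
mhi = update_mhi(mhi, np.zeros((4, 4), dtype=bool), 10)  # no motion: decay
```

The binary "presence of motion" component is then simply `mhi > 0`.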
Article
Substantial evidence supports a relationship between gait perception and gait synthesis. Furthermore, passive mechanical systems demonstrate that the jointed leg systems of humans have innate oscillations that form a gait. These observations suggest that systems may perceive gaits by synchronizing an internal oscillating model to observed oscillations. We present such a system in this paper that uses phase-locked loops to synchronize an internal oscillator with oscillations from a video source. Arrays of phase-locked loops, called video phase-locked loops, synchronize a system with the oscillations in pixel intensities. We then test the perception of the resulting synchronized-oscillator model in various gait recognition tasks. Tools based on Procrustes analysis and directional statistics provide the computational mechanism to compare patterns of oscillations. We discuss the possibility of an alternative model for motion perception based on synchronization with the transient oscillations of temporal band-pass filters that is consistent with other proposed models for human perception. Synchronization of a kinematic model to oscillations also suggests a path to bridge the gap between the model-free and model-based domains.
Article
A comprehensive survey of computer vision-based human motion capture literature from the past two decades is presented. The focus is on a general overview based on a taxonomy of system functionalities, broken down into four processes: initialization, tracking, pose estimation, and recognition. Each process is discussed and divided into subprocesses and/or categories of methods to provide a reference to describe and compare the more than 130 publications covered by the survey. References are included throughout the paper to exemplify important issues and their relations to the various methods. A number of general assumptions used in this research field are identified and the character of these assumptions indicates that the research field is still in an early stage of development. To evaluate the state of the art, the major application areas are identified and performances are analyzed in light of the methods presented in the survey. Finally, suggestions for future research directions are offered.
Article
Using gait as a biometric is of emerging interest. A new model-based moving feature extraction analysis is presented that automatically extracts and describes human gait for recognition. The gait signature is extracted directly from the evidence gathering process. This is possible by using a Fourier series to describe the motion of the upper leg and applying temporal evidence gathering techniques to extract the moving model from a sequence of images. Simulation results highlight potential performance benefits in the presence of noise. Classification uses the k-nearest-neighbour rule applied to the Fourier components of the motion of the upper leg. Experimental analysis demonstrates that an improved classification rate is given by the phase-weighted Fourier magnitude information over the use of the magnitude information alone. The improved classification capability of the phase-weighted magnitude information is verified using statistical analysis of the separation of clusters in the feature space. Furthermore, the technique is shown to be able to handle high levels of occlusion, which is of especial importance in gait as the human body is self-occluding. As such, a new technique has been developed to automatically extract and describe a moving articulated shape, the human leg, and its potential in gait as a biometric has been shown.
Article
We propose a direct LDA algorithm for high-dimensional data classification, with application to face recognition in particular. Since the number of samples is typically smaller than the dimensionality of the samples, both S_b and S_w are singular. By modifying the simultaneous diagonalization procedure, we are able to discard the null space of S_b, which carries no discriminative information, and to keep the null space of S_w, which is very important for classification. In addition, computational techniques are introduced to handle large scatter matrices efficiently. The result is a unified LDA algorithm that gives an exact solution to Fisher's criterion whether or not S_w is singular.
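The diagonalization order described above (S_b first, then S_w) can be sketched as follows; the 1e-10 rank threshold and the exact scatter definitions are illustrative choices, not the paper's:

```python
import numpy as np

def direct_lda(X, y, n_components):
    """Sketch of direct LDA: diagonalize the between-class scatter Sb first,
    discard its null space, whiten, then keep the directions in which the
    projected within-class scatter Sw is smallest. Rows of X are samples."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    Sb = sum((y == c).sum() * np.outer(X[y == c].mean(axis=0) - mean,
                                       X[y == c].mean(axis=0) - mean)
             for c in classes)
    Sw = sum((y == c).sum() * np.cov(X[y == c].T, bias=True) for c in classes)
    eb, Vb = np.linalg.eigh(Sb)
    keep = eb > 1e-10                      # discard the null space of Sb
    Y = Vb[:, keep] / np.sqrt(eb[keep])    # whitens Sb: Y.T @ Sb @ Y = I
    ew, Vw = np.linalg.eigh(Y.T @ Sw @ Y)
    # Small-Sw directions carry the most discriminative power here.
    return Y @ Vw[:, np.argsort(ew)[:n_components]]

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4)) + np.repeat(np.arange(3), 4)[:, None]
y = np.repeat(np.arange(3), 4)
W = direct_lda(X, y, n_components=2)
```

Because S_b's null space is dropped before S_w is touched, the null space of S_w survives into the final projection, which is the point of the method.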
Article
In this article the author uses projective relations as the theoretical foundation of his investigations of visual space and motion. Several laboratory experiments involving perceptual vector analysis and its geometric basis are described. In most of the experiments the visual stimuli consisted of computer-controlled patterns displayed on a televisionlike screen and projected into the eyes of subjects by means of a collimating device that removed parallax as well as the possibility of seeing the screen. A common characteristics of the experiments was that the observer was evidently not free to choose between a Euclidean interpretation of the changing geometry of the figure in the display and a projective interpretation. For example, the observer could not persuade himself that what he was seeing was simply a square growing larger and smaller in the same visual plane; his visual system insisted on telling him that he was seeing a square of constant size approaching and receding. Hence he perceived rigid motion in depth, rotation in a specific slant, bending in depth and so on, paired with the highest possible degree of object constancy. Further experiments were conducted to determine if the principles of perceptual analysis hold true for the more complex paterns of motions encountered in everyday life. These experiments led to the conclusion that during locomotion the components of the human visual environment are interpreted as rigid structures in relative motion.
Article
Two-dimensional spatial linear filters are constrained by general uncertainty relations that limit their attainable information resolution for orientation, spatial frequency, and two-dimensional (2D) spatial position. The theoretical lower limit for the joint entropy, or uncertainty, of these variables is achieved by an optimal 2D filter family whose spatial weighting functions are generated by exponentiated bivariate second-order polynomials with complex coefficients, the elliptic generalization of the one-dimensional elementary functions proposed in Gabor's famous theory of communication [J. Inst. Electr. Eng. 93, 429 (1946)]. The set includes filters with various orientation bandwidths, spatial-frequency bandwidths, and spatial dimensions, favoring the extraction of various kinds of information from an image. Each such filter occupies an irreducible quantal volume (corresponding to an independent datum) in a four-dimensional information hyperspace whose axes are interpretable as 2D visual space, orientation, and spatial frequency, and thus such a filter set could subserve an optimally efficient sampling of these variables. Evidence is presented that the 2D receptive-field profiles of simple cells in mammalian visual cortex are well described by members of this optimal 2D filter family, and thus such visual neurons could be said to optimize the general uncertainty relations for joint 2D-spatial-2D-spectral information resolution. The variety of their receptive-field dimensions and orientation and spatial-frequency bandwidths, and the correlations among these, reveal several underlying constraints, particularly in width/length aspect ratio and principal axis organization, suggesting a polar division of labor in occupying the quantal volumes of information hyperspace.
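A member of this 2D filter family is a Gaussian envelope modulating a complex sinusoid. A minimal sketch with an isotropic envelope (all parameter choices are illustrative):

```python
import numpy as np

def gabor_kernel(size, sigma, theta, freq):
    """2D Gabor kernel: Gaussian envelope times a complex sinusoid.
    theta sets the orientation, freq the spatial frequency in cycles/pixel.
    An isotropic envelope is a simplification of the elliptic family."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(2j * np.pi * freq * xr)     # complex sinusoid along xr
    return envelope * carrier

k = gabor_kernel(size=15, sigma=3.0, theta=0.0, freq=0.25)
```

Summing filter-response magnitudes over orientations or scales yields GaborD/GaborS-style representations of the kind the main article builds on averaged gait images.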
Article
Most vision research embracing the spatial frequency paradigm has been conceptually and mathematically a one-dimensional analysis of two-dimensional mechanisms. Spatial vision models and the experiments sustaining them have generally treated spatial frequency as a one-dimensional variable, even though receptive fields and retinal images are two-dimensional and linear transform theory obliges any frequency analysis to preserve dimension. Four models of cortical receptive fields are introduced and studied here in 2D form, in order to illustrate the relationship between their excitatory/inhibitory spatial structure and their resulting 2D spectral properties. It emerges that only a very special analytic class of receptive fields possess independent tuning functions for spatial frequency and orientation; namely, those profiles whose two-dimensional Fourier Transforms are expressible as the separable product of a radial function and an angular function. Furthermore, only such receptive fields would have the same orientation tuning curve for single bars as for gratings. All classes lacking this property would describe cells responsive to different orientations for different spatial frequencies and vice versa; this is shown to be the case, for example, for the Hubel & Wiesel model of cortical orientation-tuned simple cells receiving inputs from an aligned row of center/surround LGN cells. When these results are considered in conjunction with psychophysical evidence for nonseparability of spatial frequency and orientation tuning properties within a “channel”, it becomes mandatory that future spatial vision research of the Fourier genre take on an explicitly two-dimensional character.
Article
This paper presents a novel Gabor-based kernel Principal Component Analysis (PCA) method by integrating the Gabor wavelet representation of face images and the kernel PCA method for face recognition. Gabor wavelets first derive desirable facial features characterized by spatial frequency, spatial locality, and orientation selectivity to cope with the variations due to illumination and facial expression changes. The kernel PCA method is then extended to include fractional power polynomial models for enhanced face recognition performance. A fractional power polynomial, however, does not necessarily define a kernel function, as it might not define a positive semi-definite Gram matrix. Note that the sigmoid kernels, one of the three classes of widely used kernel functions (polynomial kernels, Gaussian kernels, and sigmoid kernels), do not actually define a positive semi-definite Gram matrix, either. Nevertheless, the sigmoid kernels have been successfully used in practice, such as in building support vector machines. In order to derive real kernel PCA features, we apply only those kernel PCA eigenvectors that are associated with positive eigenvalues. The feasibility of the Gabor-based kernel PCA method with fractional power polynomial models has been successfully tested on both frontal and pose-angled face recognition, using two data sets from the FERET database and the CMU PIE database, respectively. The FERET data set contains 600 frontal face images of 200 subjects, while the PIE data set consists of 680 images across 5 poses (left and right profiles, left and right half profiles, and frontal view) with 2 different facial expressions (neutral and smiling) of 68 subjects.
Article
We present a new Euclidean distance for images, which we call IMage Euclidean Distance (IMED). Unlike the traditional Euclidean distance, IMED takes into account the spatial relationships of pixels. Therefore, it is robust to small perturbations of images. We argue that IMED is the only intuitively reasonable Euclidean distance for images. IMED is then applied to image recognition. The key advantage of this distance measure is that it can be embedded in most image classification techniques, such as SVM, LDA, and PCA. The embedding is rather efficient, involving a transformation referred to as the Standardizing Transform (ST). We show that ST is a transform-domain smoothing. Using the Face Recognition Technology (FERET) database and two state-of-the-art face identification algorithms, we demonstrate a consistent performance improvement of the algorithms embedded with the new metric over their original versions.
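The core of IMED is a pixel-pair weighting matrix G that couples spatially nearby pixels. A minimal sketch, assuming the commonly used Gaussian weighting (the normalization constant is an illustrative choice):

```python
import numpy as np

def imed_matrix(h, w, sigma=1.0):
    """Pixel-weighting matrix G for the IMage Euclidean Distance: entry
    (i, j) decays with the spatial distance between pixels i and j."""
    ys, xs = np.unravel_index(np.arange(h * w), (h, w))
    d2 = (ys[:, None] - ys[None, :]) ** 2 + (xs[:, None] - xs[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def imed(img_a, img_b, G):
    """Squared IMED: (x - y)^T G (x - y) on flattened images."""
    diff = (img_a - img_b).ravel()
    return float(diff @ G @ diff)

G = imed_matrix(3, 3)
a = np.zeros((3, 3))
b = np.zeros((3, 3)); b[1, 1] = 1.0
```

Because G factors as G = A^T A for a Standardizing Transform A, one can equivalently apply A to every image once and run any plain-Euclidean classifier afterward, which is how IMED embeds into SVM, LDA, and PCA.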
Conference Paper
A person’s gait changes when he or she is carrying an object such as a bag, suitcase or rucksack. As a result, human identification and tracking are made more difficult, because the averaged gait image is too simple to represent the carrying status. Therefore, in this paper we first introduce a set of Gabor-based human gait appearance models, because Gabor functions are similar to the receptive field profiles of simple cells in the mammalian cortex. The very high dimensionality of the feature space makes training difficult. To solve this problem we propose a general tensor discriminant analysis (GTDA), which seamlessly incorporates the structure of the object (the Gabor-based human gait appearance model) as a natural constraint. GTDA differs from previous tensor-based discriminant analysis methods in that its training converges; existing methods fail to converge in the training stage, which makes them unsuitable for practical tasks. Experiments are carried out on the USF baseline data set to recognize a person's identity from the gait silhouette. The proposed Gabor gait representation combined with GTDA is demonstrated to significantly outperform existing appearance-based methods.
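For second-order tensors (matrices), the alternating-projection idea behind GTDA reads roughly as below. This is an illustration only: solving each mode's eigenproblem on a between-minus-within scatter difference with a fixed zeta is a simplification of the published GTDA criterion, which tunes that weight.

```python
import numpy as np

def gtda_2d(tensors, labels, r1, r2, zeta=1.0, iters=5):
    """Sketch of alternating projection for matrix data: fix V, solve the
    row-mode eigenproblem for U on the scatter difference Sb - zeta*Sw,
    then fix U and solve for V, and repeat. zeta=1 is illustrative."""
    tensors = [np.asarray(t, dtype=float) for t in tensors]
    labels = np.asarray(labels)
    gmean = sum(tensors) / len(tensors)
    classes = np.unique(labels)
    cmeans = {c: sum(t for t, l in zip(tensors, labels) if l == c)
                 / (labels == c).sum() for c in classes}
    d1, d2 = tensors[0].shape
    U, V = np.eye(d1, r1), np.eye(d2, r2)
    for _ in range(iters):
        Sb = sum((labels == c).sum()
                 * (cmeans[c] - gmean) @ (V @ V.T) @ (cmeans[c] - gmean).T
                 for c in classes)
        Sw = sum((t - cmeans[l]) @ (V @ V.T) @ (t - cmeans[l]).T
                 for t, l in zip(tensors, labels))
        U = np.linalg.eigh(Sb - zeta * Sw)[1][:, -r1:]
        Sb2 = sum((labels == c).sum()
                  * (cmeans[c] - gmean).T @ (U @ U.T) @ (cmeans[c] - gmean)
                  for c in classes)
        Sw2 = sum((t - cmeans[l]).T @ (U @ U.T) @ (t - cmeans[l])
                  for t, l in zip(tensors, labels))
        V = np.linalg.eigh(Sb2 - zeta * Sw2)[1][:, -r2:]
    return U, V

rng = np.random.default_rng(0)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
tensors = [rng.normal(size=(5, 4)) + 2.0 * l for l in labels]
U, V = gtda_2d(tensors, labels, r1=2, r2=2)
```

Projected samples `U.T @ t @ V` are small enough to feed into ordinary LDA, which is exactly the preprocessing role GTDA plays in the main article.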
Conference Paper
http://vislab.ucr.edu/PUBLICATIONS/pubs/Journal%20and%20Conference%20Papers/after10-1-1997/Conference/2004/Statistical%20Feature%20Fusion%20for%20Gait04.pdf
This paper presents a novel approach for human recognition by combining statistical gait features from real and synthetic templates. Real templates are directly computed from training silhouette sequences, while synthetic templates are generated from training sequences by simulating silhouette distortion. A statistical feature extraction approach is used for learning effective features from real and synthetic templates. Features learned from real templates characterize human walking properties provided in training sequences, and features learned from synthetic templates predict gait properties under other conditions. A feature fusion strategy is therefore applied at the decision level to improve recognition performance. We apply the proposed approach to USF HumanID Database. Experimental results demonstrate that the proposed fusion approach not only achieves better performance than individual approaches, but also provides large improvement in performance with respect to the baseline algorithm.
Conference Paper
We present a model-based method for accurate extraction of pedestrian silhouettes from video sequences. Our approach is based on two assumptions, 1) there is a common appearance to all pedestrians, and 2) each individual looks like him/herself over a short amount of time. These assumptions allow us to learn pedestrian models that encompass both a pedestrian population appearance and the individual appearance variations. Using our models, we are able to produce pedestrian silhouettes that have fewer noise pixels and missing parts. We apply our silhouette extraction approach to the NIST gait data set and show that under the gait recognition task, our model-based silhouettes result in much higher recognition rates than silhouettes directly extracted from background subtraction, or any nonmodel-based smoothing schemes.
Conference Paper
Our goal is to establish a simple baseline method for human identification based on body shape and gait. This baseline recognition method provides a lower bound against which to evaluate more complicated procedures. We present a viewpoint-dependent technique based on template matching of body silhouettes. Cyclic gait analysis is performed to extract key frames from a test sequence. These frames are compared to training frames using normalized correlation, and subject classification is performed by nearest-neighbor matching among correlation scores. The approach implicitly captures biometric shape cues such as body height, width, and body-part proportions, as well as gait cues such as stride length and amount of arm swing. We evaluate the method on four databases with varying viewing angles, background conditions (indoors and outdoors), walking styles and pixels on target.
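The template-matching core of this baseline can be sketched in a few lines; cyclic gait analysis and key-frame extraction are omitted from this sketch:

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized correlation between two equal-size templates."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(test_template, gallery):
    """Nearest-neighbor label by correlation score over a {label: template}
    gallery, as in a baseline silhouette matcher."""
    return max(gallery, key=lambda k: normalized_correlation(test_template,
                                                             gallery[k]))

# Toy gallery of two 3x3 silhouette templates.
gallery = {"subject_a": np.eye(3), "subject_b": np.flipud(np.eye(3))}
```

Such a lower-bound method gives more elaborate models something concrete to beat.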
Conference Paper
We describe a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple features such as moments extracted from orthogonal-view video silhouettes of human walking motion. Despite its simplicity, the resulting feature vector contains enough information to perform well on human identification and gender classification tasks. We explore the recognition behaviors of two different methods to aggregate features over time under different recognition tasks. We demonstrate the accuracy of recognition using gait video sequences collected over different days and times and under varying lighting environments. In addition, we show results for gender classification based on our gait appearance features using a support-vector machine.
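Moment features of a binary silhouette, of the kind such a gait-appearance vector aggregates over time, can be computed directly (the exact feature set in the paper differs):

```python
import numpy as np

def silhouette_moments(mask):
    """Centroid and second-order central moments of a binary silhouette;
    a tiny instance of simple region features for gait appearance."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    mu20 = ((xs - cx) ** 2).mean()                # spread along x
    mu02 = ((ys - cy) ** 2).mean()                # spread along y
    mu11 = ((xs - cx) * (ys - cy)).mean()         # xy covariance
    return cx, cy, mu20, mu02, mu11

cx, cy, mu20, mu02, mu11 = silhouette_moments(np.ones((3, 3), dtype=bool))
```

Concatenating such per-frame moments (or their temporal means and variances) yields the kind of compact feature vector the abstract describes.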
Conference Paper
We describe a video-rate surveillance algorithm to detect and track people from a stationary camera, and to determine if they are carrying objects or moving unencumbered. The contribution of the paper is the shape analysis algorithm that both determines if a person is carrying an object and segments the object from the person so that it can be tracked, e.g., during an exchange of objects between two people. As the object is segmented, an appearance model of the object is constructed. The method combines periodic motion estimation with static symmetry analysis of the silhouettes of a person in each frame of the sequence. Experimental results demonstrate the robustness and real-time performance of the proposed algorithm.
Conference Paper
An algorithm for simultaneous detection, segmentation, and characterization of spatiotemporal periodicity is presented. The use of periodicity templates is proposed to localize and characterize temporal activities. The templates not only indicate the presence and location of a periodic event, but also give an accurate quantitative periodicity measure. Hence, they can be used as a new means of periodicity representation. The proposed algorithm can also be considered a “periodicity filter”, a low-level model of periodicity perception. The algorithm is computationally simple, and shown to be more robust than optical flow based techniques in the presence of noise. A variety of real-world examples are used to demonstrate the performance of the algorithm.
Conference Paper
We introduce two enhanced Fisher linear discriminant (FLD) models (EFM) in order to improve the generalization ability of the standard FLD based classifiers such as Fisherfaces. Similar to Fisherfaces, both EFM models first apply principal component analysis (PCA) for dimensionality reduction before proceeding with FLD type of analysis. EFM-1 implements the dimensionality reduction with the goal to balance between the need that the selected eigenvalues account for most of the spectral energy of the raw data and the requirement that the eigenvalues of the within-class scatter matrix in the reduced PCA subspace are not too small. EFM-2 implements the dimensionality reduction as Fisherfaces do. It proceeds with the whitening of the within-class scatter matrix in the reduced PCA subspace and then chooses a small set of features (corresponding to the eigenvectors of the within-class scatter matrix) so that the smaller trailing eigenvalues are not included in further computation of the between-class scatter matrix. Experimental data using a large set of faces (1,107 images drawn from 369 subjects, including duplicates acquired at a later time under different illumination) from the FERET database shows that the EFM models outperform the standard FLD based methods.
Article
This paper introduces a novel Gabor-Fisher classifier (GFC) for face recognition. The GFC method, which is robust to changes in illumination and facial expression, applies the enhanced Fisher linear discriminant model (EFM) to an augmented Gabor feature vector derived from the Gabor wavelet representation of face images. The novelty of this paper comes from (1) the derivation of an augmented Gabor feature vector, whose dimensionality is further reduced using the EFM by considering both data compression and recognition (generalization) performance; (2) the development of a Gabor-Fisher classifier for multi-class problems; and (3) extensive performance evaluation studies. In particular, we performed comparative studies of different similarity measures applied to various classifiers. We also performed comparative experimental studies of various face recognition schemes, including our novel GFC method, the Gabor wavelet method, the eigenfaces method, the Fisherfaces method, the EFM method, the combination of Gabor and the eigenfaces method, and the combination of Gabor and the Fisherfaces method. The feasibility of the new GFC method has been successfully tested on face recognition using 600 FERET frontal face images corresponding to 200 subjects, which were acquired under variable illumination and facial expressions. The novel GFC method achieves 100% accuracy on face recognition using only 62 features.