ArticlePublisher preview available

Tensor-driven low-rank discriminant analysis for image set classification

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract and Figures

Classification based on image sets has recently attracted great interest in computer vision community. In this paper, we proposed a transductive Tensor-driven Low-rank Discriminant Analysis (TLRDA) model for image set classification, in which the tensor-driven low-rank approximation and the discriminant graph embedding are integrated to improve the representativeness of image sets. In addition, we develop an iterative shrinkage thresholding algorithm to better optimize the objective function of the proposed TLRDA. Experiments on seven publicly available datasets demonstrate that our proposed method is guaranteed to converge within a small number of iterations during the training procedure and obtains promising results compared with state-of-the-art methods.
This content is subject to copyright. Terms and conditions apply.
DOI 10.1007/s11042-017-5173-0
Tensor-driven low-rank discriminant analysis for image
set classification
Jing Zhang1·Zhengnan Li1·Peiguang Jing1·
Ye Liu2·Yuting Su1
Received: 21 April 2017 / Revised: 25 August 2017 / Accepted: 29 August 2017
© Springer Science+Business Media, LLC 2017
Abstract Classification based on image sets has recently attracted great interest in com-
puter vision community. In this paper, we proposed a transductive Tensor-driven Low-rank
Discriminant Analysis (TLRDA) model for image set classification, in which the tensor-
driven low-rank approximation and the discriminant graph embedding are integrated to
improve the representativeness of image sets. In addition, we develop an iterative shrinkage
thresholding algorithm to better optimize the objective function of the proposed TLRDA.
Experiments on seven publicly available datasets demonstrate that our proposed method is
guaranteed to converge within a small number of iterations during the training procedure
and obtains promising results compared with state-of-the-art methods.
Keywords Image set classification ·Low-rank ·Tensor-driven ·Discriminant analysis ·
Grassmann manifold
Peiguang Jing
pgjing@tju.edu.cn
Jing Zhang
zhangjing@tju.edu.cn
Zhengnan Li
pgjing@tju.edu.cn
Ye Liu
liuye.cis@gmail.com
Yuting Su
ytsu@tju.edu.cn
1School of Electrical and Information Engineering, Tianjin University, Tianjin, China
2School of Computing, National University of Singapore, Singapore, Singapore
Published online: 8 September 2017
Multimed Tools Appl (2019) 78:4001–4020
/
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
... Yu et al. [25] proposed a robust sparse tensor subspace learning (RSTSL) for 3D human pose regression. Zhang et al. [30] presented a transductive Tensordriven Low-rank Discriminant Analysis (TLRDA) model for image set classification. While all the reported work has made significant advancement toward the classification of images or videos, they fundamentally share the common principle that learning is carried out inside the tensor space, and classifications are conducted by using vector-based classifiers. ...
Article
Full-text available
The logistic regression is a widely used method for multimedia classification. However, when it is applied to high-order data such as video sequences, traditional vector-based logistic regression often incurs loss of space-time structural information. The tensor extension method based on CP (CANDECOMP/PARAFAC) decomposition is powerful for capturing the multilinear latent information. The existing CP algorithms require the tensor rank to be manually specified, however, the determination of tensor rank remains a challenging problem especially for CP rank. To effectively exploit underlying space-time structural in video sequences, we propose a tensor-based logistic regression learning algorithm, in which the weight parameter are regarded to be a tensor, calculated after the CP tensor decomposition. We introduce a regularization term, L(2,1)-norm, into the logistic tensor regression, and automatically select the CP rank, making it adaptive to the input videos for improved weight tensor and thus classification performances. Extensive experimental results in comparison with five state-of-the-art regression methods support that our proposed algorithm achieves the best classification performances, providing a good potential for a range of applications towards computerized video classifications via tensor-based video descriptions.
Article
Full-text available
Spatiotemporal description is a research field with applications in various areas such as video indexing, surveillance, human-computer interfaces, among others. Big Data problems in large databases are now being treated with Deep Learning tools, however we still have room for improvement in spatiotemporal handcraft description. Moreover, we still have problems that involve small data in which data augmentation and other techniques are not valid, or even, it is not worth the use of an expensive method. The main contribution of this work is the development of a framework for spatiotemporal representation using orientation tensors enabling dimension reduction and invariance. This is a multipurpose framework called Features As Spatiotemporal Tensors (FASTensor). We evaluate this framework in two different applications: Video Pornography classification and Cancer Cell classification. The latter one is also a contribution of this work, since we introduce a new dataset called Melanoma Cancer Cell (MCC). It is a small dataset with inherent difficulties in the acquisition process and its particular motion nature. The results are competitive, while also being ease to compute. Finally, our results in the MCC dataset can be used in other cancer cell treatment analysis.
Article
Although image set classification has attracted great attention in computer vision and pattern recognition communities, however, learning a compact and discriminative representation is still a challenge. In this paper, we present a novel tensor discriminant representation learning method to better solve the image classification task. Specifically, we first exploit the advantages of Grassmann manifold and tensor to model image sets as a high-order tensor. We then propose a transductive low-rank regularized tensor discriminant representation algorithm referred as LRRTDR to learn more intrinsic representations for image sets. Our proposed LRRTDR mainly contains two components: low-rank tensor embedding and discriminant graph embedding. The low-rank tensor embedding is to learn the lowest-rank representation from a low-dimensional subspace, which is spanned by a set of latent basis matrices. The discriminant graph embedding is to further enhance the discriminant ability of the learned representations under the graph embedding framework. To solve the optimization problem of LRRTDR, we develop an alternating direction scheme based on Iterative Shrinkage Thresholding Algorithm (ISTA). Experimental results on five publicly available datasets demonstrate that our proposed algorithm not only converges with few iterations, but also achieves better accuracy compared with state-of-the-art methods.
Article
Multi-task learning has received great interest recently in the area of machine learning. It shows a considerable capacity to jointly learn multiple latent relationships hidden among tasks, and has been widely used in data mining and computer vision problems. In this paper, we propose a new multi-task based collaborative linear regression framework to address the image classification problem, which allows the class-specific and collaboratively shared latent structure components to be explored simultaneously. The proposed framework take multi-target regression of each class as a task to transfer shared structures among them. To be more efficient and adaptive, the class-wise nonlinear subspace is also learned in this framework to earn inter-class discrimination and model adaptability. The proposed framework provides a unified and flexible perceptiveness for jointly learning the nonlinear projected features and regression parameters. Furthermore, a numerical scheme via iterative alternating optimization is also developed to solve the novel objective function in the proposed framework and guarantee the convergence. Extensive experimental results tested on several datasets demonstrated that our proposed framework outperforms existing competitive methods and achieves consistently high performance. OAPA
Article
Full-text available
Political ideology represents an imperfect yet important indicator of a host of personality traits and cognitive preferences. These preferences, in turn, seemingly propel liberals and conservatives towards divergent life-course experiences. Criminal behavior represents one particular domain of conduct where differences rooted in political ideology may exist. Using a national dataset, we test whether and to what extent political ideology is predictive of self-reported criminal behavior. Our results show that self-identified political ideology is mono-tonically related to criminal conduct cross-sectionally and prospectively and that liberals self-report more criminal conduct than do conservatives. We discuss potential causal mechanisms relating political ideology to individual conduct.
Article
As compared to simple actions, activities are much more complex, but semantically consistent with a human's real life. Techniques for action recognition from sensor generated data are mature. However, there has been relatively little work on bridging the gap between actions and activities. To this end, this paper presents a novel approach for complex activity recognition comprising of two components. The first component is temporal pattern mining, which provides a mid-level feature representation for activities, encodes temporal relatedness among actions, and captures the intrinsic properties of activities. The second component is adaptive Multi-Task Learning, which captures relatedness among activities and selects discriminant features. Extensive experiments on a real-world dataset demonstrate the effectiveness of our work.
Article
Urban water quality is of great importance to our daily lives. Prediction of urban water quality help control water pollution and protect human health. However, predicting the urban water quality is a challenging task since the water quality varies in urban spaces non-linearly and depends on multiple factors, such as meteorology, water usage patterns, and land uses. In this work, we forecast the water quality of a station over the next few hours from a data-driven perspective, using the water quality data and water hydraulic data reported by existing monitor stations and a variety of data sources we observed in the city, such as meteorology, pipe networks, structure of road networks, and point of interests (POIs). First, we identify the influential factors that affect the urban water quality via extensive experiments. Second, we present a multi-task multi-view learning method to fuse those multiple datasets from different domains into an unified learning model. We evaluate our method with real-world datasets, and the extensive experiments verify the advantages of our method over other baselines and demonstrate the effectiveness of our approach.
Article
High-dimensional data in the real world often resides in low-dimensional subspaces. The state-of-the-art methods for subspace segmentation include Low Rank Representation (LRR) and Sparse Representation (SR). The former seeks the global lowest rank representation but restrictively assumes the independence among subspaces, whereas the latter seeks the clustering of disjoint or overlapped subspaces through locality measure, which may cause failure in the case of large noises. To this end, a Low Rank subspace Sparse Representation framework, hereafter referred to as LRSR, is proposed in this paper to recover and segment embedding subspaces simultaneously. Three major contributions can be claimed in this paper: First, a clean dictionary is constructed by optimizing its nuclear norm, low-rank-sparse coefficient matrix obtained using linearized alternating direction method (LADM). Second, both the convergence proof and the complexity analysis are given to prove the effectiveness and efficiency of our proposed LRSR algorithm. Third, the experiments on synthetic data and two benchmark datasets further verify that the LRSR enjoys the capability of clustering disjoint subspaces as well as the robustness against large noises, thanks to its considerations of both global and local subspace information. Therefore, it has been demonstrated in this research that our proposed LRSR algorithm outperforms the state-of-the-art subspace clustering methods, verified by both theoretical analysis and the empirical studies.
Article
The task of video sequence classification plays a critical role in the development of computer vision. Considering this fact, this letter proposes a novel tensor decomposition method called tensor-driven temporal correlation in which general tensors are used as input for video sequence classification. Because distortion and redundancy may exist in the tensor representations of video sequences, we project the original tensor into subspaces spanned by spatial basis matrices in the proposed formulation. Moreover, to better preserve the temporal smoothness between consecutive slices of the tensor, the basis matrices are jointly learned by introducing an autoregressive model. An experiment on the commonly used Cambridge hand-gesture database demonstrates that our proposed method reaches convergence within a small number of iterations during the training stage and achieves promising results compared with state-of-the-art methods.
Article
This paper focuses on the specific problem of multiview learning where samples have the same feature set but different probability distributions, e.g., different viewpoints or different modalities. Since samples lying in different distributions cannot be compared directly, this paper aims to learn a latent subspace shared by multiple views assuming that the input views are generated from this latent subspace. Previous approaches usually learn the common subspace by either maximizing the empirical likelihood, or preserving the geometric structure. However, considering the complementarity between the two objectives, this paper proposes a novel approach, named low-rank discriminant embedding (LRDE), for multiview learning by taking full advantage of both sides. By further considering the duality between data points and features of multiview scene, i.e., data points can be grouped based on their distribution on features, while features can be grouped based on their distribution on the data points, LRDE not only deploys low-rank constraints on both sample level and feature level to dig out the shared factors across different views, but also preserves geometric information in both the ambient sample space and the embedding feature space by designing a novel graph structure under the framework of graph embedding. Finally, LRDE jointly optimizes low-rank representation and graph embedding in a unified framework. Comprehensive experiments in both multiview manner and pairwise manner demonstrate that LRDE performs much better than previous approaches proposed in recent literatures.