Ran He

Chinese Academy of Sciences, Beijing, China

Publications (63) · 67.12 total impact

  • ABSTRACT: Cross-modal retrieval has recently drawn much attention due to the widespread existence of multimodal data. It takes one type of data as the query to retrieve relevant data objects of another type, and generally involves two basic problems: the measure of relevance and coupled feature selection. Most previous methods just focus on solving the first problem. In this paper, we aim to deal with both problems in a novel joint learning framework. To address the first problem, we learn projection matrices to map multimodal data into a common subspace, in which the similarity between different modalities of data can be measured. In the learning procedure, the ℓ21-norm penalties are imposed on the projection matrices separately to solve the second problem, which selects relevant and discriminative features from different feature spaces simultaneously. A multimodal graph regularization term is further imposed on the projected data, which preserves the inter-modality and intra-modality similarity relationships. An iterative algorithm is presented to solve the proposed joint learning problem, along with its convergence analysis. Experimental results on cross-modal retrieval tasks demonstrate that the proposed method outperforms the state-of-the-art subspace approaches.
    No preview · Article · Dec 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
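The ℓ21-norm penalty central to the paper above is simple to state: the sum of the ℓ2 norms of a matrix's rows, which drives entire rows (i.e., features) to zero. A minimal NumPy sketch, with an illustrative toy matrix that is not from the paper:

```python
import numpy as np

def l21_norm(W):
    """Sum of the l2 norms of the rows of W.

    Penalizing this value encourages whole rows of a projection
    matrix to shrink to zero, which acts as feature selection.
    """
    return np.sqrt((W ** 2).sum(axis=1)).sum()

W = np.array([[3.0, 4.0],   # row norm 5
              [0.0, 0.0],   # row norm 0 (feature dropped)
              [1.0, 0.0]])  # row norm 1
print(l21_norm(W))  # 6.0
```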
  • ABSTRACT: This paper addresses the problem of grouping the data points sampled from a union of multiple subspaces in the presence of outliers. Information theoretic objective functions are proposed to combine structured low-rank representations (LRRs) to capture the global structure of data and information theoretic measures to handle outliers. In the theoretical part, we point out that group sparsity-induced measures (ℓ2,1-norm, ℓα-norm, and correntropy) can be justified from the viewpoint of half-quadratic (HQ) optimization, which facilitates both convergence study and algorithmic development. In particular, a general formulation is accordingly proposed to unify HQ-based group sparsity methods into a common framework. In the algorithmic part, we develop information theoretic subspace clustering methods via correntropy. With the help of Parzen window estimation, correntropy is used to handle either outliers under any distributions or sample-specific errors in data. Pairwise link constraints are further treated as a prior structure of LRRs. Based on the HQ framework, iterative algorithms are developed to solve the nonconvex information theoretic loss functions. Experimental results on three benchmark databases show that our methods can further improve the robustness of LRR subspace clustering and outperform other state-of-the-art subspace clustering methods.
    No preview · Article · Dec 2015 · IEEE transactions on neural networks and learning systems
  • Xiang Wu · Ran He · Zhenan Sun
    ABSTRACT: Convolutional neural networks (CNNs) have significantly pushed forward the development of face recognition techniques. To achieve ultimate accuracy, CNN models tend to be deeper or to ensemble multiple local facial patches, which wastes time and space. To alleviate this issue, this paper studies a lightened CNN framework to learn a compact embedding for face representation. First, we introduce the concept of maxout from the fully connected layer to the convolution layer, which leads to a new activation function, named Max-Feature-Map (MFM). Compared with the widely used ReLU, MFM can simultaneously capture compact representation and competitive information. Then, one shallow CNN model is constructed with 4 convolution layers and contains about 4M parameters in total; the other is constructed by reducing the kernel size of the convolution layers and adding Network in Network (NIN) layers between convolution layers, based on the previous one. These models are trained on the CASIA-WebFace dataset and evaluated on the LFW and YTF datasets. Experimental results show that the proposed models achieve state-of-the-art results, while computational cost is reduced by over 9 times in comparison with the released VGG model.
    No preview · Article · Nov 2015
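The Max-Feature-Map (MFM) activation described above can be sketched in a few lines: split the channel dimension in half and take an element-wise maximum, so the layer halves its channel count while keeping, at each position, the more competitive of two responses. A NumPy illustration; the shapes and names are assumptions, not the authors' code:

```python
import numpy as np

def max_feature_map(x):
    """Max-Feature-Map activation.

    Splits the channel dimension of x (shape: channels x height x width)
    into two halves and returns their element-wise maximum, halving the
    number of channels.
    """
    c = x.shape[0]
    assert c % 2 == 0, "MFM needs an even number of channels"
    return np.maximum(x[: c // 2], x[c // 2:])

x = np.arange(16, dtype=float).reshape(4, 2, 2)  # 4 channels of 2x2
y = max_feature_map(x)
print(y.shape)  # (2, 2, 2)
```

Unlike ReLU, which zeroes negative responses, MFM keeps whichever of the two paired responses is larger, so it compresses the representation rather than sparsifying it.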
  • Shu Zhang · Jian Liang · Ran He · Zhenan Sun
    ABSTRACT: Learning-based hashing techniques have attracted broad research interest in the Big Media research area. They aim to learn compact binary codes which can preserve semantic similarity in the Hamming embedding. However, the discrete constraints imposed on binary codes typically make hashing optimizations very challenging. In this paper, we present a code consistent hashing (CCH) algorithm to learn discrete binary hash codes. To form a simple yet efficient hashing objective function, we introduce a new code consistency constraint to leverage discriminative information and propose to utilize the Hadamard code, which favors an information-theoretic criterion, as the class prototype. By keeping the discrete constraint and introducing an orthogonal constraint, our objective function can be minimized efficiently. Experimental results on three benchmark datasets demonstrate that the proposed CCH outperforms state-of-the-art hashing methods in both image retrieval and classification tasks, especially with short binary codes.
    No preview · Article · Sep 2015
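The Hadamard codes used as class prototypes above can be generated by the classic Sylvester construction; any two distinct rows of an n×n Hadamard matrix disagree in exactly n/2 positions, which is what makes them attractive, maximally separated prototypes. A small sketch, illustrative only:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])  # double the size each step
    return H

H = hadamard(8)
# Rows are mutually orthogonal with +/-1 entries, so any two distinct
# rows agree in n/2 positions and disagree in the other n/2.
d = (H[0] != H[1]).sum()
print(d)  # 4
```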
  • ABSTRACT: Subspace clustering has important and wide applications in computer vision and pattern recognition. It is a challenging task to learn low-dimensional subspace structures due to complex noise existing in high-dimensional data. Complex noise has a much more complicated statistical structure and is neither Gaussian nor Laplacian noise. Recent subspace clustering methods usually assume a sparse representation of the errors incurred by noise and correct these errors iteratively. However, large corruptions incurred by complex noise cannot be well addressed by these methods. A novel optimization model for robust subspace clustering is proposed in this paper. Its objective function mainly includes two parts. The first part aims to achieve a sparse representation of each high-dimensional data point with other data points. The second part aims to maximize the correntropy between a given data point and its low-dimensional representation with other points. Correntropy is a robust measure, so the influence of large corruptions on subspace clustering can be greatly suppressed. An extension of pairwise link constraints is also proposed as prior information to deal with complex noise. Half-quadratic minimization is provided as an efficient solution to the proposed robust subspace clustering formulations. Experimental results on three commonly used datasets show that our method outperforms state-of-the-art subspace clustering methods.
    No preview · Article · Jul 2015 · IEEE Transactions on Image Processing
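Correntropy, the robust measure used in the paper above, is in its sample form just the mean of a Gaussian kernel evaluated on the errors; gross outliers saturate the kernel and contribute almost nothing. A minimal sketch, with an illustrative bandwidth and toy data:

```python
import numpy as np

def correntropy(x, y, sigma=1.0):
    """Sample estimate of correntropy between vectors x and y,
    using a Gaussian kernel (Parzen window) of bandwidth sigma.

    Errors far beyond sigma contribute almost nothing to the mean,
    which is why the measure is robust to large corruptions.
    """
    e = x - y
    return np.mean(np.exp(-(e ** 2) / (2 * sigma ** 2)))

x = np.array([1.0, 2.0, 3.0, 4.0])
clean = x + 0.01              # small, well-behaved error
corrupt = x.copy()
corrupt[0] += 100.0           # one gross outlier
print(correntropy(x, clean) > correntropy(x, corrupt))  # True
```

Maximizing correntropy therefore plays the role that minimizing a squared error would, but without letting a single corrupted entry dominate the objective.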
  • Source
    ABSTRACT: This paper presents a structured ordinal measure method for video-based face recognition that simultaneously learns ordinal filters and structured ordinal features. The problem is posed as a non-convex integer programming problem that includes two parts. The first part learns stable ordinal filters to project video data into a large-margin ordinal space. The second part seeks self-correcting and discrete codes by balancing the projected data and a rank-one ordinal matrix in a structured low-rank way. Unsupervised and supervised structures are considered for the ordinal matrix. In addition, as a complement to hierarchical structures, deep feature representations are integrated into our method to enhance coding stability. An alternating minimization method is employed to handle the discrete and low-rank constraints, yielding high-quality codes that capture prior structures well. Experimental results on three commonly used face video databases show that our method with a simple voting classifier can achieve state-of-the-art recognition rates using fewer features and samples.
    Preview · Article · Jul 2015
  • Qiyue Yin · Shu Wu · Ran He · Liang Wang
    ABSTRACT: Multi-view clustering, which aims to cluster datasets with multiple sources of information, has a wide range of applications in the communities of data mining and pattern recognition. Generally, it makes use of the complementary information embedded in multiple views to improve clustering performance. Recent methods usually find a low-dimensional embedding of multi-view data, but often ignore some useful prior information that can be utilized to better discover the latent group structure of multi-view data. To alleviate this problem, a novel pairwise sparse subspace representation model for multi-view clustering is proposed in this paper. The objective function of our model mainly includes two parts. The first part aims to harness prior information to achieve a sparse representation of each high-dimensional data point with respect to other data points in the same view. The second part aims to maximize the correlation between the representations of different views. An alternating minimization method is provided as an efficient solution for the proposed multi-view clustering algorithm. A detailed theoretical analysis is also conducted to guarantee the convergence of the proposed method. Moreover, we show that the must-link and cannot-link constraints can be naturally integrated into the proposed model to obtain a link constrained multi-view clustering model. Extensive experiments on five real world datasets demonstrate that the proposed model performs better than several state-of-the-art multi-view clustering methods.
    No preview · Article · May 2015 · Neurocomputing
  • ABSTRACT: High dimensional dense features have been shown to be useful for face recognition, but result in high query time when searching a large-scale face database. Hence binary codes are often used to obtain fast query speeds as well as reduce storage requirements. However, binary codes for face features can become unstable and unpredictable due to face variations induced by pose, expression and illumination. This paper proposes a predictable hash code algorithm to map face samples in the original feature space to Hamming space. First, we discuss the ‘predictability’ of hash codes for face indexing. Second, we formulate the predictable hash coding problem as a non-convex combinatorial optimization problem, in which the distance between codes for samples from the same class is minimized while the distance between codes for samples from different classes is maximized. An Expectation Maximization method is introduced to iteratively find a sparse and predictable linear mapping. Lastly, a deep feature representation is learned to further enhance the predictability of binary codes. Experimental results on three commonly used face databases demonstrate the superiority of our predictable hash coding algorithm on large-scale problems.
    No preview · Article · Apr 2015 · Pattern Recognition
  • ABSTRACT: Subspace segmentation methods usually rely on the raw explicit feature vectors in an unsupervised manner. In many applications, it is cheap to obtain some pairwise link information that tells whether two data points are in the same subspace or not. Though partially available, such link information serves as a kind of high-level semantics, which can be further used as a constraint to improve the segmentation accuracy. By constructing a link matrix and using it as a regularizer, we propose a semi-supervised subspace segmentation model where the partially observed subspace membership prior can be encoded. Specifically, under the common linear representation assumption, we enforce the representational coefficients to be consistent with the link matrix. Thus the low-level and high-level information about the data can be integrated to produce more precise segmentation results. We then develop an effective algorithm to optimize our model in an alternating minimization way. Experimental results for both motion segmentation and face clustering validate that incorporating such link information is helpful in assisting and biasing the unsupervised subspace segmentation methods.
    No preview · Article · Jan 2015
  • Shu Zhang · Man Zhang · Ran He · Zhenan Sun
    ABSTRACT: Dictionary learning has important applications in face recognition. However, large transformation variations of face images pose a grand challenge to conventional dictionary learning methods. A large portion of misleading dictionary atoms are usually learned to represent transformation factors, which will cause ambiguity in face recognition. To address this problem, this paper proposes a general framework for transform-invariant basis matrix learning. Specifically, we present a transform-invariant dictionary learning method which explicitly incorporates an appearance consistency error term into the original objective function of dictionary learning. The unified objective function is effectively optimized in an alternating iterative way. An ensemble of aligned images and a discriminative transform-invariant dictionary for sparse coding can be obtained by solving the formulated objective function. Experimental results on two public face databases demonstrate our algorithm's superiority compared with two state-of-the-art dictionary learning methods and the recently proposed transform-invariant PCA method.
    No preview · Article · Jan 2015
  • Source
    Ran He · Man Zhang · Liang Wang · Ye Ji · Qiyue Yin
    ABSTRACT: In multimedia applications, the text and image components in a web document form a pairwise constraint that potentially indicates the same semantic concept. This paper studies cross-modal learning via the pairwise constraint, and aims to find the common structure hidden in different modalities. We first propose a compound regularization framework to deal with the pairwise constraint, which can be used as a general platform for developing cross-modal algorithms. For unsupervised learning, we propose a cross-modal subspace clustering method to learn a common structure for different modalities. For supervised learning, to reduce the semantic gap and the outliers in pairwise constraints, we propose a cross-modal matching method based on compound ℓ21 regularization along with an iteratively reweighted algorithm to find the global optimum. Extensive experiments demonstrate the benefits of joint text and image modeling with semantically induced pairwise constraints, and show that the proposed cross-modal methods can further reduce the semantic gap between different modalities and improve the clustering/retrieval accuracy.
    Preview · Article · Nov 2014 · IEEE Transactions on Image Processing
  • ABSTRACT: Several methods have been proposed to describe face images in order to recognize them automatically. Local methods based on spatial histograms of local patterns (or operators) are among the best-performing ones. In this paper, a new method that obtains more robust histograms of local patterns by using a more discriminative spatial division strategy is proposed. Spatial histograms are obtained from regions clustered according to the semantic pixel relations, making better use of the spatial information. Here, a simple rule is used, in which pixels in an image patch are clustered by sorting their intensity values. By exploring the information entropy on image patches, the number of sets on each of them is learned. In addition, Principal Component Analysis with a whitening process is applied for the final feature vector dimension reduction, making the representation more compact and discriminative. The proposed division strategy is invariant to monotonic grayscale changes, and proves to be particularly useful when there are large expression variations on the faces. The method is evaluated on three widely used face recognition databases: AR, FERET and LFW, with the very popular LBP operator and some of its extensions. Experimental results show that the proposal not only outperforms those methods that use the same local patterns with the traditional division, but also some of the best-performing state-of-the-art methods.
    No preview · Article · May 2014 · EURASIP Journal on Image and Video Processing
  • ABSTRACT: Robust sparse representation has shown significant potential in solving challenging problems in computer vision such as biometrics and visual surveillance. Although several robust sparse models have been proposed and promising results have been obtained, they are either for error correction or for error detection, and learning a general framework that systematically unifies these two aspects and explores their relation is still an open problem. In this paper, we develop a half-quadratic (HQ) framework to solve the robust sparse representation problem. By defining different kinds of half-quadratic functions, the proposed HQ framework is applicable to performing both error correction and error detection. More specifically, by using the additive form of HQ, we propose an ℓ1-regularized error correction method by iteratively recovering corrupted data from errors incurred by noises and outliers; by using the multiplicative form of HQ, we propose an ℓ1-regularized error detection method by learning from uncorrupted data iteratively. We also show that the ℓ1-regularization solved by the soft-thresholding function has a dual relationship to the Huber M-estimator, which theoretically guarantees the performance of robust sparse representation in terms of M-estimation. Experiments on robust face recognition under severe occlusion and corruption validate our framework and findings.
    No preview · Article · Feb 2014 · IEEE Transactions on Pattern Analysis and Machine Intelligence
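The soft-thresholding operator and its connection to the Huber M-estimator, which the paper above analyzes, can be sketched directly: soft-thresholding is the proximal operator of the ℓ1 penalty, and minimizing 0.5(e − x)² + λ|x| over x yields exactly the Huber loss in e. An illustrative NumPy version, not the authors' implementation:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: element-wise shrinkage.

    Entries smaller than lam in magnitude are zeroed; larger entries
    are shrunk toward zero by lam.
    """
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def huber(e, lam):
    """Huber M-estimator loss: quadratic near zero, linear in the tails."""
    a = np.abs(e)
    return np.where(a <= lam, 0.5 * a ** 2, lam * a - 0.5 * lam ** 2)

# Duality check: min_x 0.5*(e - x)^2 + lam*|x| is attained at
# x = soft_threshold(e, lam), and the minimum value equals huber(e, lam).
e = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
lam = 1.0
x = soft_threshold(e, lam)
moreau = 0.5 * (e - x) ** 2 + lam * np.abs(x)
print(np.allclose(moreau, huber(e, lam)))  # True
```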
  • ABSTRACT: Subspace clustering has important and wide applications in computer vision and pattern recognition. It is a challenging task to learn low-dimensional subspace structures due to the possible errors (e.g., noise and corruptions) existing in high-dimensional data. Recent subspace clustering methods usually assume a sparse representation of corrupted errors and correct the errors iteratively. However, large corruptions in real-world applications cannot be well addressed by these methods. A novel optimization model for robust subspace clustering is proposed in this paper. The objective function of our model mainly includes two parts. The first part aims to achieve a sparse representation of each high-dimensional data point with other data points. The second part aims to maximize the correntropy between a given data point and its low-dimensional representation with other points. Correntropy is a robust measure, so the influence of large corruptions on subspace clustering can be greatly suppressed. An extension of our method with explicit introduction of representation error terms into the model is also proposed. Half-quadratic minimization is provided as an efficient solution to the proposed robust subspace clustering formulations. Experimental results on the Hopkins 155 dataset and Extended Yale Database B demonstrate that our method outperforms state-of-the-art subspace clustering methods.
    No preview · Conference Paper · Dec 2013
  • ABSTRACT: Cross-modal matching has recently drawn much attention due to the widespread existence of multimodal data. It aims to match data from different modalities, and generally involves two basic problems: the measure of relevance and coupled feature selection. Most previous works mainly focus on solving the first problem. In this paper, we propose a novel coupled linear regression framework to deal with both problems. Our method learns two projection matrices to map multimodal data into a common feature space, in which cross-modal data matching can be performed. In the learning procedure, the ℓ21-norm penalties are imposed on the two projection matrices separately, which leads to selecting relevant and discriminative features from the coupled feature spaces simultaneously. A trace norm is further imposed on the projected data as a low-rank constraint, which enhances the relevance of different modal data with connections. We also present an iterative algorithm based on half-quadratic minimization to solve the proposed regularized linear regression problem. The experimental results on two challenging cross-modal datasets demonstrate that the proposed method outperforms the state-of-the-art approaches.
    No preview · Conference Paper · Dec 2013
  • Source
    ABSTRACT: Great progress has been achieved in face recognition in the last three decades. However, it is still challenging to characterize the identity related features in face images. This paper proposes a novel facial feature extraction method named Gabor Ordinal Measures (GOM), which integrates the distinctiveness of Gabor features and the robustness of ordinal measures as a promising solution to jointly handle inter-person similarity and intra-person variations in face images. In the proposal, different kinds of ordinal measures are derived from magnitude, phase, real and imaginary components of Gabor images, respectively, and then are jointly encoded as visual primitives in local regions. The statistical distributions of these visual primitives in face image blocks are concatenated into a feature vector and linear discriminant analysis is further used to obtain a compact and discriminative feature representation. Finally, a two-stage cascade learning method and a greedy block selection method are used to train a strong classifier for face recognition. Extensive experiments on publicly available face image databases such as FERET, AR and large scale FRGC v2.0 demonstrate state-of-the-art face recognition performance of GOM.
    Full-text · Article · Nov 2013 · IEEE Transactions on Information Forensics and Security
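An ordinal measure, the core primitive of GOM above, encodes only the qualitative ordering of filter responses, which is invariant to any monotonic change of the underlying intensities (e.g., illumination). A toy sketch comparing neighbouring responses; the pairing scheme here is illustrative, whereas the paper derives ordinal measures from the magnitude, phase, real and imaginary components of Gabor images:

```python
import numpy as np

def ordinal_code(responses):
    """Binary ordinal measure over neighbouring filter responses.

    Emits 1 where a response exceeds its right-hand neighbour and 0
    otherwise. Only the ordering matters, so any monotonic transform
    of the responses yields the same code.
    """
    return (responses[:-1] > responses[1:]).astype(np.uint8)

r = np.array([0.9, 0.2, 0.5, 0.1])
print(ordinal_code(r))           # [1 0 1]
print(ordinal_code(2 * r + 5))   # [1 0 1]  (monotonic transform, same code)
```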
  • ABSTRACT: This paper investigates the problem of cross-modal retrieval, where users can search results across various modalities by submitting any modality of query. Since the query and its retrieved results can be of different modalities, how to measure the content similarity between different modalities of data remains a challenge. To address this problem, we propose a joint graph regularized multi-modal subspace learning (JGRMSL) algorithm, which integrates inter-modality similarities and intra-modality similarities into a joint graph regularization to better explore the cross-modal correlation and the local manifold structure in each modality of data. To obtain good class separation, the idea of Linear Discriminant Analysis (LDA) is incorporated into the proposed method by maximizing the between-class covariance of all projected data and minimizing the within-class covariance of all projected data. Experimental results on two public cross-modal datasets demonstrate the effectiveness of our algorithm.
    No preview · Conference Paper · Nov 2013
  • ABSTRACT: Subspace clustering via Low-Rank Representation (LRR) has shown its effectiveness in clustering the data points sampled from a union of multiple subspaces. In original LRR, the noise in data is assumed to be Gaussian or sparse, which may be inappropriate in real-world scenarios, especially when the data is densely corrupted. In this paper, we aim to improve the robustness of LRR in the presence of large corruptions and outliers. First, we propose a robust LRR method by introducing the correntropy loss function. Second, a column-wise correntropy loss function is proposed to handle the sample-specific errors in data. Furthermore, an iterative algorithm based on half-quadratic optimization is developed to solve the proposed methods. Experimental results on the Hopkins 155 dataset and Extended Yale Database B show that our methods can further improve the robustness of LRR and outperform other subspace clustering methods.
    No preview · Conference Paper · Nov 2013
  • Qi Li · Zhenan Sun · Ran He · Tieniu Tan
    ABSTRACT: Both image alignment and image clustering have been widely researched in recent years, with numerous applications. These two problems are traditionally studied separately. However, in many real world applications, both alignment and clustering results are needed. Recent study has shown that alignment and clustering are two highly coupled problems, so we try to solve them in a unified framework. In this paper, we propose a novel joint alignment and clustering algorithm by integrating spatial transformation parameters and clustering parameters into a unified objective function. The proposed function seeks the lowest-rank representation among all the candidates that can represent misaligned images. It is indeed a transformed Low-Rank Representation. To the best of our knowledge, this is the first work to cluster misaligned images using the transformed Low-Rank Representation. We solve the proposed function by linearizing the objective function and then iteratively solving a sequence of linear problems via the Augmented Lagrange Multipliers method. Experimental results on various data sets validate the effectiveness of our method.
    No preview · Conference Paper · Nov 2013
  • Lihu Xiao · Zhenan Sun · Ran He · Tieniu Tan
    ABSTRACT: The wide deployments of iris recognition systems promote the emergence of different types of iris sensors. Large differences such as illumination wavelength and resolution result in cross-sensor variations of iris texture patterns. These variations decrease the accuracy of iris recognition. To address this issue, a feasible solution is to select an optimal effective feature set for all types of iris sensors. In this paper, we propose a margin based feature selection method for cross-sensor iris recognition. This method learns coupled feature weighting factors by minimizing a cost function, which aims at selecting the feature set to represent the intrinsic characteristics of iris images from different sensors. Then, the optimization problem can be formulated and solved using linear programming. Extensive experiments on the Notre Dame Cross Sensor Iris Database and CASIA cross sensor iris database show that the proposed method outperforms conventional feature selection methods in cross-sensor iris recognition.
    No preview · Conference Paper · Nov 2013