Fig 4 - uploaded by Chenping Hou


# Dimensionality reduction results of the S-curve with outliers. (a) S-curve with both small noise and outliers. (b) Embedding of RLLE. (c) Embedding of ‘Three-stage LLTE’.

Source publication

Dimensionality reduction is vital in many fields, and locally linear embedding (LLE) is one of the most important approaches. However, LLE unavoidably produces nonuniform warps and folds when the data have low sample density or are sampled unevenly. LLE also fails when the data are contaminated by even small noise. We have analyzed the p...

## Context in source publication

**Context 1**

... LLTE performs better than RLLE when the data contain small noise. ‘Three-stage LLTE’ can not only eliminate and map the outliers as practically as RLLE does, but also map the clean data more faithfully than RLLE. Therefore, the overall results of ‘Three-stage LLTE’ are more reasonable than those of RLLE. We have done an experiment, shown in Fig. 4, for illustration. Moreover, ‘Three-stage LLTE’ is not sensitive to the effectiveness of outlier detection, since LLTE is robust to small noise. The performance of RLLE, however, depends heavily on the effectiveness of the outlier detection: if points with small noise are not detected, RLLE fails. Intuitively, one can see the comparisons between RLLE and LLTE shown in the following ...
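The failure mode described above can be reproduced with off-the-shelf tools. Below is a minimal sketch using scikit-learn's standard LLE on a noisy S-curve, the setting of Fig. 4(a); it is not the paper's RLLE or LLTE code, and the neighborhood size and noise level are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding

# Sample an S-curve and add small Gaussian noise (illustrative scale).
X, _ = make_s_curve(n_samples=1000, random_state=0)
X += np.random.default_rng(0).normal(scale=0.05, size=X.shape)

# Standard LLE; on noisy data the embedding typically shows the warps
# and folds the paper discusses.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)
print(Y.shape)  # (1000, 2)
```

Plotting `Y` alongside the clean-data embedding makes the distortion visible.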

## Similar publications

Unsupervised metric learning consists in building data-specific similarity measures without information about class labels. Dimensionality reduction (DR) methods have been shown to be a powerful mathematical tool for uncovering the underlying geometric structure of data. Manifold learning algorithms are capable of finding a more compact representation...

Differential expression plays an important role in cancer diagnosis and classification. In recent years, many methods have been used to identify differentially expressed genes. However, the recognition rate and reliability of gene selection still need to be improved. In this paper, a novel constrained method named robust nonnegative matrix factoriz...

Scaling scRNA-seq to profile millions of cells is crucial for constructing high-resolution maps of transcriptional manifolds. Current analysis strategies, in particular dimensionality reduction and two-phase clustering, offer only limited scaling and sensitivity to define such manifolds. We introduce Metacell-2, a recursive divide-and-conquer algor...

Recently, the $L_{1}$-norm-based robust discriminant feature extraction technique has attracted much attention in dimensionality reduction and pattern recognition. However, it does not involve the scatter matrix, which well characterizes the geometric structure of data. In this paper, we propose a robust formulation of the graph embedding framewo...

In this paper, a human action recognition method based on the kernelized Grassmann manifold learning is introduced. The goal is to find a map which transfers the high-dimensional data to a discriminative low-dimensional space by considering the geometry of the manifold. To this end, a multi-graph embedding method using three graphs named as center-...

## Citations

... Directly training with such data consumes a lot of time, degrades learning performance, and tends to cause over-fitting (Liu and Motoda 2012; Scott and Thompson 1983). Hence, it is crucial to reduce the dimensionality of high-dimensional data (Hou et al. 2009). Since not all features are important, feature selection has become a dominant method for dimensionality reduction and has aroused increasing attention. ...

Graph-based semi-supervised feature selection has aroused continuous attention for processing high-dimensional data in which most samples are unlabeled and only a few are labeled. Many graph-based models operate on a pre-defined graph that is separated from the procedure of feature selection, making it hard for the model to select discriminative features. To address this issue, we exploit a self-adjusted graph for semi-supervised embedded feature selection (SAGFS), which learns an optimal sparse similarity graph to replace the pre-defined graph and thereby alleviate the effect of data noise. SAGFS allows the learned graph to adjust itself according to the local geometric structure of the data and to the feature selection procedure, so that the most representative features are selected. Besides that, we introduce the $l_{2,p}$-norm to constrain the projection matrix for efficient feature selection. An efficient alternating optimization algorithm is presented, together with analyses of its convergence. Systematic experiments on several public datasets are performed to analyze the proposed model from several aspects, and demonstrate that our approach outperforms the comparison methods.
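For reference, the row-wise $l_{2,p}$ (pseudo-)norm used as the sparsity-inducing constraint above is commonly defined as $\|W\|_{2,p} = \big(\sum_i \|w^i\|_2^p\big)^{1/p}$ over the rows $w^i$. A minimal numpy sketch, assuming this standard definition (the exact form in SAGFS may differ in details such as the outer power):

```python
import numpy as np

def l2p_norm(W, p):
    """Row-wise l_{2,p}: (sum_i ||w_i||_2 ** p) ** (1/p)."""
    row_norms = np.linalg.norm(W, axis=1)   # l2 norm of each row
    return np.sum(row_norms ** p) ** (1.0 / p)

W = np.array([[3.0, 4.0],    # row norm 5
              [0.0, 0.0],    # zero row contributes nothing
              [1.0, 0.0]])   # row norm 1
print(l2p_norm(W, 1.0))      # l_{2,1} norm: 5 + 0 + 1 = 6.0
```

With $p < 1$ the same expression penalizes small-but-nonzero rows more aggressively, which is why $l_{2,p}$ tends to give sparser rows than $l_{2,1}$.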

... In addition, the irrelevant and redundant features may also give rise to a series of problems including overfitting and poor prediction performance, resulting in the inefficiency of the learning tasks [1][2][3][4][5]. Therefore, dimensionality reduction has become an important stage of data preprocessing in such applications [6,7]. ...

Feature selection is a technique for improving the classification accuracy of classifiers and a convenient aid to data visualization. As an incremental, task-oriented, model-free learning algorithm, Q-learning is well suited to feature selection. This study proposes a dynamic feature selection algorithm that combines feature selection and Q-learning in one framework. First, Q-learning is used to construct the discriminant functions for each class of the data. Next, the feature ranking is derived from all the discriminant function vectors of the classes, and the ranking is updated during the process of updating the discriminant function vectors. Finally, experiments are designed to compare the proposed algorithm with four feature selection algorithms. The experimental results on the benchmark data sets verify its effectiveness: its classification performance is better than that of the other feature selection algorithms, it also performs well in removing redundant features, and the experiments on the effect of learning rates demonstrate that parameter selection in our algorithm is very simple.

... Some features of the high-dimensional data are related to the target task, while many features are redundant [23]. Therefore, dimension reduction has become an important stage of data preprocessing in such applications [12,13]. Feature selection and feature extraction are the two main dimension reduction methods [2,22]. ...

Feature selection is an important data dimension reduction method, and it has been widely used in applications involving high-dimensional data such as genetic data analysis and image processing. To achieve robust feature selection, recent works apply the l2,1- or l2,p-norm of a matrix to the loss function and regularization terms in regression, and have achieved encouraging results. However, these existing works rigidly set the matrix norms used in the loss function and the regularization terms to the same l2,1- or l2,p-norm, which limits their applicability. In addition, the algorithms they present for the solutions either have high computational complexity and are unsuitable for large data sets, or cannot provide satisfying performance due to approximate calculation. To address these problems, we present a generalized l2,p-norm regression based feature selection (l2,p-RFS) method based on a new optimization criterion. The criterion generalizes the l2,p-RFS optimization criterion to the case in which the loss function and the regularization terms in regression use different matrix norms. We cast the new optimization criterion in a regression framework without regularization. In this framework, the new criterion can be solved using an iterative re-weighted least squares (IRLS) procedure in which the least squares problem is solved efficiently by the least squares QR decomposition (LSQR) algorithm. We have conducted extensive experiments to evaluate the proposed algorithm on various well-known gene expression and image data sets, and compared it with other related feature selection methods.
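The IRLS idea mentioned above can be sketched compactly for the special case of an $l_{2,1}$ penalty, using a dense solver in place of LSQR. This illustrates only the re-weighting step under the common smoothed-subgradient formulation, not the paper's generalized $l_{2,p}$ criterion; the regularization weight and iteration count are arbitrary choices.

```python
import numpy as np

def irls_l21(X, Y, gamma=1.0, n_iter=50, eps=1e-8):
    """IRLS sketch for min_W ||X W - Y||_F^2 + gamma * ||W||_{2,1}."""
    d = X.shape[1]
    # Ridge solution as a nonzero starting point.
    W = np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ Y)
    for _ in range(n_iter):
        # Re-weighting: D_ii = 1 / (2 ||w_i||_2), guarded against zero rows.
        d_inv = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
        W = np.linalg.solve(X.T @ X + gamma * np.diag(d_inv), X.T @ Y)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
W_true = np.zeros((10, 2))
W_true[:3] = rng.normal(size=(3, 2))      # only the first 3 features matter
Y = X @ W_true
W = irls_l21(X, Y)
# Rows 3..9 of W are driven toward zero by the l_{2,1} penalty.
```

Replacing the dense `solve` with an LSQR call on the equivalent augmented least-squares system is what makes the approach scale, as the abstract notes.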

... Moreover, redundant features are not only useless, but can also severely degrade the performance of machine learning [1], [2], [3], [4], [5]. Therefore, dimensionality reduction is extremely important in the data preprocessing stage [6], [7]. Several methods exist to reduce dimensionality, such as the kernel method [8], [9], [10], subspace projection [8], and artificial neural networks [11]. ...

Feature selection and feature transformation are the two main approaches to reduce dimensionality, and they are often presented separately. In this study, a novel robust and efficient feature selection method, called FS-VLDA-L21 (feature selection based on variant of linear discriminant analysis and L2,1-norm), is proposed by combining a new variant of linear discriminant analysis and L2,1 sparsity regularization. Here, feature transformation and feature selection are integrated into a unified optimization objective. To obtain significant discriminative power between classes, all the data in the same class are expected to be regressed to a single vector, and the important task is to explore a transformation matrix such that the squared regression error is minimized. Therefore, we derive a new discriminant analysis from a novel view of least squares regression. In addition, we impose row sparsity on the transformation matrix through L2,1-norm regularized term to achieve feature selection. Consequently, the most discriminative features are selected, simultaneously eliminating the redundant ones. To address the L2,1-norm based optimization problem, we design a new efficient iterative re-weighted algorithm and prove its convergence. Extensive experimental results on four well-known datasets demonstrate the performance of our feature selection method.

... where C is usually a diagonal constant matrix and B is the constraint matrix defined to avoid a trivial solution of the objective function; L_A denotes the Laplacian matrix of A. IsoP, NPE, and LPP are examples of the linearization extension [21], in which the low-dimensional representations are obtained from a linear projection Y = W^T X and the aim is to learn the transformation matrix W. The objective function of the linearization extension can be expressed as: ...
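Under the usual formulation of this linearization extension, minimizing tr(W^T X L_A X^T W) subject to W^T X B X^T W = I reduces to a generalized eigenvalue problem. A minimal sketch with random stand-in data (not any specific method's graph construction; the small regularizer is an added assumption for numerical stability):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 40))                  # d features x n samples
A = rng.random((40, 40)); A = (A + A.T) / 2   # symmetric affinity graph
L = np.diag(A.sum(axis=1)) - A                # Laplacian L_A of A
B = np.eye(40)                                # constraint matrix (identity here)

M1 = X @ L @ X.T
M2 = X @ B @ X.T + 1e-6 * np.eye(5)           # regularized for stability
# Solve (X L X^T) w = lambda (X B X^T) w; smallest eigenvectors form W.
vals, vecs = eigh(M1, M2)
W = vecs[:, :2]                               # projection to 2 dimensions
Y = W.T @ X
print(Y.shape)  # (2, 40)
```

Choosing L and B as in LPP, NPE, or IsoP recovers those specific methods within the same template.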

Many feature extraction methods reduce the dimensionality of data based on an input graph matrix. The graph construction, which reflects the relationships among raw data points, is crucial to the quality of the resulting low-dimensional representations. To improve the quality of the graph and make it more suitable for feature extraction tasks, we incorporate a new graph learning mechanism into feature extraction and add an interaction between the learned graph and the low-dimensional representations. Based on this learning mechanism, we propose a novel framework, termed unsupervised single-view feature extraction with structured graph (FESG), which learns both a transformation matrix and an ideal structured graph containing the clustering information. Moreover, we propose a novel way to extend the FESG framework to multi-view learning tasks, named unsupervised multiple-view feature extraction with structured graph (MFESG), which learns an optimal weight for each view automatically without requiring an additional parameter. To show the effectiveness of the framework, we design two concrete formulations within FESG and MFESG, together with two solving algorithms. Promising experimental results on many real-world datasets have validated the effectiveness of our proposed algorithms.

... In pattern recognition and data mining tasks, we are often confronted with the curse of dimensionality, which may make it hard to train a stable classifier and may make training take a long time. Thus, dimensionality reduction is a hot and classical topic [1], which attempts to overcome the curse of dimensionality and to extract relevant features [2], [3]. For example, although the dimension of the original features of all images of the same subject is very high, their intrinsic dimensionality is usually very low [4]. ...

Dimensionality reduction is a crucial step in pattern recognition and data mining tasks to overcome the curse of dimensionality. Principal component analysis (PCA) is a traditional technique for unsupervised dimensionality reduction, often employed to seek a projection that best represents the data in a least-squares sense; but if the original data have a nonlinear structure, the performance of PCA drops quickly. A supervised dimensionality reduction algorithm, linear discriminant analysis (LDA), seeks an embedding transformation that works well for Gaussian-distributed or single-modal data, but for non-Gaussian or multimodal data it gives undesired results. Worse, the dimension of the LDA embedding cannot exceed the number of classes. To solve these issues, local shrunk discriminant analysis (LSDA) is proposed in this work to process non-Gaussian or multimodal data; it not only incorporates both the linear and nonlinear structures of the original data, but also learns a pattern shrinking that makes the data fit the manifold structure more flexibly. Further, LSDA has stronger generalization performance: its objective function reduces to local LDA and to traditional LDA when different extreme parameter values are used. Moreover, a new efficient optimization algorithm is introduced to solve the non-convex objective function with low computational cost. Compared with related approaches such as PCA, LDA, and local LDA, the proposed method derives a subspace that is more suitable for non-Gaussian and real data. Promising experimental results on different kinds of data sets demonstrate the effectiveness of the proposed approach.
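The LDA dimension limit mentioned above is easy to verify with scikit-learn's standard LDA, which caps the embedding at one fewer than the number of classes; this is a generic illustration, not the paper's LSDA code, and the data are random.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 10))
y = np.repeat([0, 1, 2], 30)            # 3 classes

# With 3 classes, at most 2 discriminant directions exist.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print(lda.transform(X).shape)  # (90, 2)
# Requesting n_components=3 here raises a ValueError in scikit-learn.
```

This hard cap is exactly the limitation that motivates alternatives such as LSDA.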

... Among them, principal component analysis (PCA) [1,2] and linear discriminative analysis (LDA) [3,4] are undoubtedly the most classical ones. Meanwhile, manifold learning has become a research hotspot in PR and machine learning (ML) with several representative methods being proposed, including isometric mapping (ISOMAP) [5], locally linear embedding (LLE) [6], Laplacian eigenmaps (LE) [7], local tangent space alignment (LTSA) [8] and locally linear transformation embedding (LLTE) [9]. ...

... In the experiment on each feature combination, 20 random samples per class are chosen for training, while the remaining 180 samples per class are used for testing. Therefore, the total number of training samples is 200 and the total number of testing samples is 1800. Table 7 shows the maximal average recognition accuracies (%) across ten runs of each method, together with their corresponding standard deviations and dimensions; bold values indicate the best recognition accuracy obtained in each specific experimental setting. ...

Due to noise disturbance and the limited number of training samples, the within-set and between-set sample covariance matrices in canonical correlation analysis (CCA) based methods usually deviate from the true ones. In this paper, we re-estimate the covariance matrices by embedding a fractional order and incorporating class label information. First, we illustrate the effectiveness of the fractional-order embedding model through theoretical analysis and experiments. Then, we introduce fractional-order within-set and between-set scatter matrices, which can significantly reduce the deviation of the sample covariance matrices. Finally, incorporating the supervised information, a novel generalized CCA and a discriminative CCA are presented for multi-view dimensionality reduction and recognition, called fractional-order embedding generalized canonical correlation analysis and fractional-order embedding discriminative canonical correlation analysis. Extensive experimental results on various handwritten numeral, face, and object recognition problems show that the proposed methods are very effective and clearly outperform existing methods in classification accuracy.

... Representative global algorithms contain isometric mapping [1], maximum variance unfolding [2], and local coordinates alignment with global preservation [3]. Local methods mainly include Laplacian eigenmaps (LEM) [4], locally linear embedding (LLE) [5], Hessian eigenmaps (HLLE) [6], local tangent space alignment (LTSA) [7], local linear transformation embedding [8], stable local approaches [9], and maximal linear embedding [10]. ...

Spectral analysis‐based dimensionality reduction algorithms, especially the local manifold learning methods, have become popular recently because their optimizations do not involve local minima and scale well to large, high‐dimensional data sets. Despite their attractive properties, these algorithms are developed based on different geometric intuitions, and only partial information from the true geometric structure of the underlying manifold is learned by each method. In order to discover the underlying manifold structure more faithfully, we introduce a novel method to fuse the geometric information learned from different local manifold learning algorithms in this chapter. First, we employ local tangent coordinates to compute the local objects from different local algorithms. Then, we utilize the truncation function from differential manifold to connect the local objects with a global functional and finally develop an alternating optimization‐based algorithm to discover the low‐dimensional embedding. Experiments on synthetic as well as real data sets demonstrate the effectiveness of our proposed method.

... LGCS has extensive applications. Algorithms that can be interpreted as linear extensions of graph embedding [19] can all find corresponding editions within the LGCS framework, e.g., LPP, NPE, and IsoP. In the next subsection, a special LPP within our framework will be presented to show the effectiveness of the proposed framework. ...

... The existence of these features may result in low efficiency, over-fitting and poor prediction performance in learning tasks [1]- [5]. Consequently, dimensionality reduction has become an important stage of data preprocessing in such applications [6], [7]. ...

Feature selection and feature transformation, the two main ways to reduce dimensionality, are often presented separately. In this paper, a feature selection method is proposed by combining the popular transformation-based dimensionality reduction method Linear Discriminant Analysis (LDA) and sparsity regularization. We impose row sparsity on the transformation matrix of LDA through ${\ell}_{2,1}$-norm regularization to achieve feature selection, and the resultant formulation optimizes for selecting the most discriminative features and removing the redundant ones simultaneously. The formulation is extended to the ${\ell}_{2,p}$-norm regularized case, which is more likely to offer better sparsity when $0<p<1$; thus the formulation is a better approximation to the feature selection problem. An efficient algorithm is developed to solve the ${\ell}_{2,p}$-norm based optimization problem, and it is proved that the algorithm converges when $0<p\le 2$. Systematic experiments are conducted to understand the behavior of the proposed method. Promising experimental results on various types of real-world data sets demonstrate the effectiveness of our algorithm.
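The selection step shared by such ${\ell}_{2,1}$/${\ell}_{2,p}$-regularized methods is typically: rank features by the ${\ell}_2$ norms of the rows of the learned transformation matrix and keep the top-$k$. A minimal sketch with a synthetic matrix standing in for the learned $W$ (the optimization that would produce $W$ is the paper's contribution and is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3))             # stand-in for a learned matrix
W[[1, 4, 7]] *= 0.01                     # rows shrunk by the regularizer

scores = np.linalg.norm(W, axis=1)       # per-feature importance score
top_k = np.argsort(scores)[::-1][:5]     # indices of the 5 selected features
# The shrunk rows (features 1, 4, 7) are never among the selected ones.
```

Because row sparsity zeroes out whole rows rather than individual entries, each discarded row corresponds to one original feature being dropped entirely.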