Generalized discriminant analysis: a matrix exponential approach.

Department of Computer Science, Chongqing University, Chongqing 400030, China.
IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society (Impact Factor: 3.01). 08/2009; 40(1):186-97. DOI: 10.1109/TSMCB.2009.2024759
Source: PubMed

ABSTRACT Linear discriminant analysis (LDA) is well known as a powerful tool for discriminant analysis. With a small training data set, however, it cannot be applied directly to high-dimensional data. This is the so-called small-sample-size or undersampled problem. In this paper, we propose an exponential discriminant analysis (EDA) technique to overcome the undersampled problem. EDA has two advantages: compared with principal component analysis (PCA) + LDA, it can extract the most discriminant information contained in the null space of the within-class scatter matrix; and compared with another LDA extension, null-space LDA (NLDA), it does not discard the discriminant information contained in the non-null space of the within-class scatter matrix. Furthermore, EDA is equivalent to transforming the original data into a new space by distance diffusion mapping and then applying LDA in that new space. As a result of the diffusion mapping, the margin between different classes is enlarged, which helps improve classification accuracy. Experimental comparisons on different data sets against existing LDA extensions, including PCA + LDA, LDA via generalized singular value decomposition, regularized LDA, NLDA, and LDA via QR decomposition, demonstrate the effectiveness of the proposed EDA method.
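
The abstract does not spell out the optimization problem, so the following is only a minimal sketch of one plausible reading of the "matrix exponential approach": replace the between-class and within-class scatter matrices S_b and S_w in the LDA criterion by their matrix exponentials and solve exp(S_b) v = λ exp(S_w) v. Because the exponential of a symmetric matrix is always positive definite, exp(S_w) is invertible even when S_w is singular, which is why the undersampled case poses no difficulty here. The function names (`sym_expm`, `scatter_matrices`, `eda_fit`) are hypothetical, and the rescaling of both scatter matrices by a common norm is an assumption added for numerical stability, not something stated in the abstract.

```python
import numpy as np


def sym_expm(S, scale=1.0):
    """Return exp(scale * S) for a symmetric matrix S via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(scale * w)) @ V.T


def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter of rows of X with labels y."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
        Sw += (Xc - mc).T @ (Xc - mc)
    return Sb, Sw


def eda_fit(X, y, n_components):
    """Sketch of exponential discriminant analysis; returns a (d, n_components) projection."""
    Sb, Sw = scatter_matrices(X, y)

    # Assumption: divide both matrices by a common spectral norm so that
    # np.exp does not overflow on large eigenvalues.
    norm = max(np.linalg.norm(Sb, 2), np.linalg.norm(Sw, 2), 1e-12)
    Sb, Sw = Sb / norm, Sw / norm

    exp_Sb = sym_expm(Sb)
    # exp(Sw)^(-1/2) = exp(-Sw / 2); it exists even if Sw itself is singular.
    inv_sqrt_exp_Sw = sym_expm(Sw, scale=-0.5)

    # Whitening turns exp(Sb) v = lambda * exp(Sw) v into a symmetric eigenproblem.
    M = inv_sqrt_exp_Sw @ exp_Sb @ inv_sqrt_exp_Sw
    M = (M + M.T) / 2  # remove round-off asymmetry
    evals, U = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:n_components]
    return inv_sqrt_exp_Sw @ U[:, order]


if __name__ == "__main__":
    # Undersampled toy problem: 6 samples in 20 dimensions, 2 classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 20))
    X[3:] += 3.0
    y = np.array([0, 0, 0, 1, 1, 1])
    W = eda_fit(X, y, n_components=1)
    print((X @ W).ravel())
```

In the toy example S_w has rank at most 4 in a 20-dimensional space, so classical LDA would need to invert a singular matrix, whereas the sketch runs without any dimensionality reduction or regularization step.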

  • ABSTRACT: The classification of patterns into naturally ordered labels is referred to as ordinal regression. This paper proposes an ensemble methodology specifically adapted to this type of problem, which is based on computing different classification tasks through the formulation of different order hypotheses. Each model is trained to distinguish one given class (k) from all the remaining ones, with the remaining classes grouped into those ranked lower than k and those ranked higher than k. It can therefore be considered a reformulation of the well-known one-versus-all scheme. The base algorithm for the ensemble could be any threshold (or even probabilistic) method, such as the ones selected in this paper: kernel discriminant analysis, support vector machines, and logistic regression (LR), all reformulated to deal with ordinal regression problems. The method is seen to be competitive when compared with other state-of-the-art methodologies (both ordinal and nominal), using six measures and a total of 15 ordinal datasets. Furthermore, an additional set of experiments is used to study the potential scalability and interpretability of the proposed method when using LR as the base methodology for the ensemble.
    IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics) 06/2013; · 3.24 Impact Factor
  • ABSTRACT: The traditional vectorized classifier is supposed to incorporate the class structural information but ignores the individual structure of each single pattern. In contrast, the matrixized classifier is supposed to consider both the class and the individual structures, and thus achieves superior performance to the vectorized classifier. In this paper, we explore a middle granularity between the class and the individual, named the cluster, and introduce the cluster structure, meaning the structure within each class, into the matrixized classifier design. Doing so simultaneously utilizes the class, cluster, and individual structures, moving from the global level down to the individual pattern. The proposed classifier design therefore possesses three-fold structural information and tends to improve classification performance. In practice, we adopt the Modification of Ho–Kashyap algorithm with Squared approximation of the misclassification errors (MHKS) as the learning paradigm and develop a Three-fold Structured MHKS named TSMHKS. The advantage of the three-fold structural learning framework is that it accounts for different degrees of closeness between samples so as to improve performance. The experimental results demonstrate the feasibility and effectiveness of TSMHKS. Furthermore, we discuss the theoretical and experimental generalization bound of the proposed algorithm.
    Pattern Recognition. 06/2013; 46(6):1532–1555.
  • ABSTRACT: Multi-view learning is intended to process data with multiple information sources. Our previous work extended multi-view learning and proposed an effective learning machine named MultiV-MHKS. MultiV-MHKS first changed a base classifier into M different sub-classifiers and then designed a joint learning process for the M generated sub-classifiers. Each sub-classifier was taken as one view of MultiV-MHKS. However, MultiV-MHKS assumed that each sub-classifier should play an equal role in the ensemble, so the weight values r_q, q = 1…M, of the sub-classifiers were all set to the same value. In practice, this hypothesis was neither flexible nor appropriate, since the r_q should reflect the different effects of their corresponding views. To make the r_q flexible and appropriate, in this paper we propose a regularized multi-view learning machine named RMultiV-MHKS with optimized r_q. We optimize the r_q using the Response Surface Technique (RST) on cross-validation data and thus obtain a regularized multi-view learning machine. Doing so can assign a view zero weight in the combination, which means that this view does not carry discriminative information for the problem and hence can be pruned. The experimental results validate the effectiveness of the proposed RMultiV-MHKS and explore the effect of some important parameters. The characteristics of RMultiV-MHKS are: (1) distributing more weight to the favorable views, which reflect the properties of the problem; (2) having a tighter generalization risk bound than its corresponding single-view learning machine in terms of the Rademacher complexity; (3) having statistically superior classification performance to the original MultiV-MHKS.
    Neurocomputing. 11/2012; 97:201–213.