Robust penalized logistic regression with truncated loss functions

Department of Health Studies, Chicago, IL 60615, USA.
Canadian Journal of Statistics (Impact Factor: 0.65). 06/2011; 39(2):300-323. DOI: 10.1002/cjs.10105
Source: PubMed


The penalized logistic regression (PLR) is a powerful statistical tool for classification and has been widely used in practice. Despite its success, the loss function of the PLR is unbounded, so the resulting classifiers can be sensitive to outliers. To build more robust classifiers, we propose the robust PLR (RPLR), which uses truncated logistic loss functions, and suggest three schemes for estimating conditional class probabilities. Connections of the RPLR with other existing work on robust logistic regression are discussed. Our theoretical results indicate that the RPLR is Fisher consistent and more robust to outliers. Moreover, we develop the estimated generalized approximate cross-validation (EGACV) criterion for tuning parameter selection. Through numerical examples, we demonstrate that truncating the loss function indeed yields better performance in terms of both classification accuracy and class probability estimation. The Canadian Journal of Statistics 39: 300–323; 2011 © 2011 Statistical Society of Canada
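The central idea above, replacing the unbounded logistic loss with a truncated (capped) version so that no single point can dominate the objective, can be sketched in a few lines. This is a minimal illustration; the truncation point `s = -1` is an arbitrary choice for this sketch, not the paper's exact formulation:

```python
import numpy as np

def logistic_loss(margin):
    # Standard (unbounded) logistic loss: log(1 + exp(-y * f(x)))
    return np.log1p(np.exp(-margin))

def truncated_logistic_loss(margin, s=-1.0):
    # Cap the loss at its value at margin s, so an outlier with a
    # large negative margin contributes at most logistic_loss(s).
    return np.minimum(logistic_loss(margin), logistic_loss(s))

margins = np.array([3.0, 0.0, -1.0, -10.0])
print(logistic_loss(margins))            # grows without bound as margin -> -inf
print(truncated_logistic_loss(margins))  # bounded above by logistic_loss(-1.0)
```

The cap is what yields the robustness discussed in the abstract: the gradient contribution of points beyond the truncation point vanishes, so far-away mislabelled points stop pulling the decision boundary.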

  • Source
    • "To remove non-relevant variables and/or limit the complexity of the solution, both methods allow the integration of a penalization term in their objective functions. The choice of the L1 penalization for logistic regression aims to reduce the risk of overfitting induced by potential co-linearity and by the combinatorial exploration of all possible two-way interactions (Hosmer Jr, 2013; Park & Liu, 2011). L1 logistic regression uses linear combinations of explanatory variables to learn a single decision boundary and build an easily understandable linear model. "
    ABSTRACT: Multivariate classification methods using explanatory and predictive models are necessary for characterizing subgroups of patients according to their risk profiles. Popular methods include logistic regression and classification trees, with performances that vary according to the nature and characteristics of the dataset. In the context of imported malaria, we aimed at classifying severity criteria based on a heterogeneous patient population. We investigated these approaches by implementing two different strategies: L1 logistic regression (L1LR), which models a single global solution, and classification trees, which model multiple local solutions corresponding to discriminant subregions of the feature space. For each strategy, we built a standard model and a sparser version of it. As an alternative to pruning, we explore a promising approach that first constrains the tree model with an L1LR-based feature selection, an approach we called L1LR-Tree. The objective is to decrease its vulnerability to small data variations by removing variables corresponding to unstable local phenomena. Our study is twofold: i) from a methodological perspective, comparing the performances and the stability of the three previous methods, i.e., L1LR, classification trees, and L1LR-Tree, for the classification of severe forms of imported malaria, and ii) from an applied perspective, improving the actual classification of severe forms of imported malaria by identifying more personalized profiles predictive of several clinical criteria based on variables dismissed from the clinical definition of the disease. The main methodological results show that the combined method L1LR-Tree builds sparse and stable models that significantly predict the different severity criteria and outperforms all the other methods in terms of accuracy.
    Preview · Article · Nov 2015
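    The L1LR-Tree strategy described above, L1-penalized logistic regression as a feature-selection step followed by a classification tree on the retained variables, can be sketched with scikit-learn. This is a hypothetical illustration on synthetic data; the dataset, penalty strength `C`, and tree depth are arbitrary choices for the sketch, not the paper's:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the malaria data: 20 features, 5 informative.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Step 1: L1-penalized logistic regression drives some coefficients
# exactly to zero; the surviving features are the selected ones.
l1lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1lr.fit(X, y)
selected = np.flatnonzero(l1lr.coef_.ravel() != 0.0)

# Step 2: grow a classification tree on the retained features only,
# which is the stabilizing constraint the L1LR-Tree idea describes.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X[:, selected], y)
print(f"{len(selected)} features retained out of {X.shape[1]}")
```

The point of the two-stage design is that local tree splits can only use variables that already carry a stable global signal, which is how the abstract argues the method reduces vulnerability to small data variations.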
  • ABSTRACT: Feature selection is an important preprocessing step in machine learning and pattern recognition. It is also a data mining task in some real-world applications. Feature quality evaluation is a key issue when designing an algorithm for feature selection. The classification margin has been used widely to evaluate feature quality in recent years. In this study, we introduce a robust loss function, called the Brownboost loss, which computes the feature quality and selects the optimal feature subsets to enhance robustness. We compute the classification loss in a feature space with the hypothesis margin and minimize the loss by optimizing the weights of features. An algorithm is developed based on gradient descent using L2-norm regularization techniques. The proposed algorithm is tested on UCI datasets and gene expression datasets. The experimental results show that the proposed algorithm is effective in improving classification robustness.
    No preview · Article · Dec 2013 · Knowledge-Based Systems
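    The margin-based feature weighting described above can be sketched as follows. This is a simplified stand-in, not the paper's method: a plain logistic loss replaces the Brownboost loss, and each sample's nearest hit and nearest miss are fixed in the unweighted metric so the hypothesis margin is linear in the feature weights; the L2 penalty and gradient descent follow the general recipe in the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

# Fix each sample's nearest hit (same class) and nearest miss (other
# class) once, in the unweighted L1 metric (a simplification).
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
np.fill_diagonal(D, np.inf)
same = y[:, None] == y[None, :]
hit = np.where(same, D, np.inf).argmin(axis=1)
miss = np.where(~same, D, np.inf).argmin(axis=1)
# Per-feature margin contributions: |x - near miss| - |x - near hit|,
# so the hypothesis margin of sample i is Z[i] @ w.
Z = np.abs(X - X[miss]) - np.abs(X - X[hit])

w = np.ones(X.shape[1])                    # one non-negative weight per feature
lam, lr = 0.01, 0.1
for _ in range(200):
    m = np.clip(Z @ w, -50, 50)            # hypothesis margins under weights w
    # gradient of mean log(1 + exp(-m)) plus the L2 penalty lam * ||w||^2
    grad = -(Z * (1.0 / (1.0 + np.exp(m)))[:, None]).mean(axis=0) + 2 * lam * w
    w = np.maximum(w - lr * grad, 0.0)     # project onto w >= 0
print(np.round(w, 2))  # informative features tend to receive larger weights
```

Feature selection then amounts to thresholding or ranking the learned weights; a bounded loss such as Brownboost would further limit how much a single mislabelled sample with a large negative margin can distort them.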
  • ABSTRACT: The relevance vector machine (RVM) is a widely employed statistical method for classification, which provides probability outputs and a sparse solution. However, the RVM can be very sensitive to outliers far from the decision boundary that discriminates between the two classes. In this paper, we propose a robust RVM based on a weighting scheme, which is insensitive to outliers and simultaneously maintains the advantages of the original RVM. Given a prior distribution of weights, weight values are determined in a probabilistic way and computed automatically during training. Our theoretical result indicates that the influence of outliers is bounded through the probabilistic weights. Also, a guideline for determining the hyperparameters governing the prior is discussed. The experimental results on synthetic and real data sets show that the proposed method performs consistently better than the RVM when the training data set is contaminated by outliers.
    No preview · Article · Aug 2015 · Annals of Operations Research
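    The weighting idea described above, shrinking a sample's influence as its margin becomes strongly negative, can be illustrated with a sketch. Plain weighted logistic regression stands in for the authors' RVM here, and the sigmoid weight function is a hypothetical choice, not theirs:

```python
import numpy as np

# Alternate between fitting under the current per-sample weights and
# recomputing the weights from the fitted margins, so points far on
# the wrong side of the boundary gradually lose influence.
def fit_weighted_logistic(X, y, n_outer=5, n_inner=200, lr=0.1):
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append intercept column
    beta = np.zeros(Xb.shape[1])
    w = np.ones(len(X))                          # per-sample weights
    for _ in range(n_outer):
        for _ in range(n_inner):                 # weighted gradient steps
            p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
            beta -= lr * (Xb.T @ (w * (p - y))) / len(X)
        margin = (2 * y - 1) * (Xb @ beta)       # signed functional margin
        w = 1.0 / (1.0 + np.exp(-(margin + 2)))  # shrink for negative margins
    return beta, w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)) - 2,     # class 0 cluster
               rng.normal(size=(50, 2)) + 2,     # class 1 cluster
               [[-6.0, -6.0]]])                  # last row: mislabelled outlier
y = np.array([0] * 50 + [1] * 50 + [1])
beta, w = fit_weighted_logistic(X, y)
print(round(w[-1], 3))    # the outlier's weight should be close to zero
```

This captures the bounded-influence property the abstract claims: because the outlier's weight multiplies its gradient contribution, a point with an arbitrarily negative margin has an arbitrarily small effect on the final fit.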