New Multicollinearity Indicators in Linear Regression Models

International Statistical Review (Impact Factor: 1.2). 04/2007; 75(1):114-121. DOI: 10.1111/j.1751-5823.2007.00007.x
Source: RePEc


Correlation is an important statistical issue for Ordinary Least Squares estimation and for data-reduction techniques such as factor analysis and principal components analysis. In this paper we propose new indicators for the multicollinearity problem in the multiple linear regression model.
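As context for the indicators proposed in the paper, a minimal sketch of the standard variance inflation factor (VIF), the baseline collinearity diagnostic these new indicators are compared against. The simulated data and variable names are illustrative only, not from the paper; the VIF for regressor j is 1 / (1 - R²_j), where R²_j comes from regressing column j on the remaining regressors.

```python
import numpy as np

def vif(X):
    """VIF for each column of X: 1 / (1 - R_j^2), where R_j^2 is the
    R-squared from regressing column j (with intercept) on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other columns
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1.0 - (resid @ resid) / tss
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Illustrative data: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X))  # first two VIFs large, third near 1
```

With an intercept in each auxiliary regression, R²_j lies in [0, 1], so every VIF is at least 1; values well above 10 are the conventional warning sign of harmful collinearity.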

    • "Other alternative collinearity measures have been proposed. These include the red indicator proposed by Kovacs et al. (2005), a synthetic, normalized collinearity indicator obtained as the quadratic mean of the elements outside the main diagonal of the correlation matrix of the independent variables; the direct effects factor (DEF) proposed by Curto and Pinto (2007), which compares the direct effects of the independent variables on the response variable with the indirect effects resulting from the intercorrelations among the independent variables; and the intercorrelation effect (ICE), a relative collinearity measure that can be used to test how the estimated coefficients are affected by the correlation among regressors (Curto and Pinto 2007). In spite of these alternatives, the VIF remains the most widespread collinearity measure in econometric software and among practitioners in general. "
    ABSTRACT: This paper discusses some limitations when applying the CVIF of Curto and Pinto in J Appl Stat 38(7):1499–1507 (2011) and proposes some modifications to overcome them. The concept of modified CVIF is also extended to be applied in ridge estimation.
    Full-text · Article · Nov 2015 · Statistical Papers
    • "Some recent works have analyzed the VIF as a measure of collinearity and proposed improvements and alternatives. For example, Dias and Castro [4] propose new indicators for this problem in the multiple linear regression model. Dias and Castro [5] show that the traditional VIF can overestimate the real impact on variance when the explanatory variables contain no redundant information about the dependent variable, and they propose a corrected version of this collinearity indicator. "
    ABSTRACT: Ridge regression has been widely applied to estimate under collinearity by defining a class of estimators that depend on the parameter k. The variance inflation factor (VIF) is applied to detect the presence of collinearity and also as an objective method to obtain the value of k in ridge regression. Contrary to the definition of the VIF, the expressions traditionally applied in ridge regression do not necessarily yield VIF values equal to or greater than 1. This work presents an alternative expression for calculating the VIF in ridge regression that satisfies this condition and also has other interesting properties.
    Full-text · Article · Mar 2015 · Journal of Applied Statistics
    • "They ascertain the degree of diversity within and between feature sets. In addition, the descriptors of each set were analyzed for multicollinearity using the Variance Inflation Factor (VIF) (equation 3) [34] [35]. "
    ABSTRACT: In modeling approaches, artificial neural networks (ANNs) have a special place in addressing nonlinear phenomena or curved manifolds. Often one feature selection approach or another is used before the ANN to choose the input variables for its models. The effect of 'selected' versus 'arbitrary' features on the outcome of ANN models is investigated with a variety of objectively selected and arbitrarily chosen variables from chemical databases, namely thiazolidinones, anilinoquinolines and piperazinoquinolines. For each database, its biological activity is treated as the dependent variable and the molecular descriptors from DRAGON software are used as explanatory variables. The selected sets are obtained from feature selection approaches, namely combinatorial protocol in multiple linear regression, stepwise regression and a genetic algorithm. Apart from these, a large number of arbitrary sets have been created by randomly picking descriptors from the corresponding databases. The feature sets exhibit a variety of inter- and intra-set diversity. A three-layer back-propagation ANN with the Levenberg-Marquardt optimization algorithm has been used for modeling. Regardless of the origin of the feature sets, the ANN models from a very large number of sets have explained the activity well and qualified as predictive models. Moreover, no specific pattern is apparent between the quality of an ANN model and the origin of its input feature set. Since these results are unusual, the study was extended to a few more databases. All the results emphasize the innate ability of ANNs to develop complex networks of relations among features to estimate the target variable. This has prompted us to suggest that prior feature selection is not essential for an ANN, although it remains a desirable option for meaningful outputs in terms of the rationale behind the inputs.
    Full-text · Article · Nov 2009 · QSAR & Combinatorial Science
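Among the alternative measures cited above, the red indicator has a simple closed form — the quadratic (root-mean-square) mean of the off-diagonal elements of the regressors' correlation matrix. The sketch below is our reading of that one-line description, not code from Kovacs et al. (2005); the simulated data are illustrative.

```python
import numpy as np

def red_indicator(X):
    """Red indicator (after Kovacs et al. 2005): quadratic mean of the
    off-diagonal elements of the correlation matrix of the regressors.
    Ranges from 0 (no collinearity) to 1 (maximal collinearity)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    p = R.shape[0]
    off = R[~np.eye(p, dtype=bool)]      # the p*(p-1) off-diagonal correlations
    return np.sqrt(np.mean(off ** 2))

# Illustrative data: one strongly correlated pair, one independent regressor.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + 0.05 * rng.normal(size=500)
x3 = rng.normal(size=500)
print(red_indicator(np.column_stack([x1, x2, x3])))  # pushed up by corr(x1, x2)
```

Because it averages over all pairwise correlations, the red indicator summarizes overall collinearity in one normalized number, whereas the VIF flags which individual regressor is affected.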