Comparison of ridge regression, partial least-squares, pairwise correlation, forward- and best subset selection methods for prediction of retention indices for aliphatic alcohols

Institute of Chemistry, Chemical Research Center, Hungarian Academy of Sciences, H-1525 Budapest, P.O. Box 17, Hungary.
Journal of Chemical Information and Modeling (Impact Factor: 4.07). 05/2005; 45(2):339-46. DOI: 10.1021/ci049827t
Source: PubMed

ABSTRACT A quantitative structure-retention relationship (QSRR) study based on multiple linear regression (MLR) was performed for the description and prediction of Kováts retention indices (RI) of aliphatic alcohols. The alcohols were saturated, linear or branched, and carried the hydroxyl group on a primary, secondary, or tertiary carbon atom. Constitutive and weighted holistic invariant molecular (WHIM) descriptors were used to represent the structure of the alcohols in the MLR models. Before model building, five variable selection methods were each applied to select the most relevant variables from a large set of descriptors. The selected molecular properties were included in the MLR models, and the efficiency of the variable selection methods was compared. The selection methods were ridge regression (RR), partial least-squares (PLS), the pair-correlation method (PCM), forward selection (FS), and best subset selection (BSS). The stability and validity of the MLR models were tested by leave-n-out cross-validation. Variables selected by RR or PLS could not describe the Kováts retention index properly, and PCM gave reliable results for description but not for prediction. Models with good predictive ability were built using FS and BSS as selection methods. The most relevant variables in the description and prediction of RIs were the mean electrotopological state index, the molecular mass, and WHIM indices characterizing size and shape.
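The workflow named in the abstract (forward selection feeding an MLR model that is checked by leave-n-out cross-validation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which criterion drove forward selection or how the left-out groups were formed, so the cross-validated PRESS criterion, the contiguous hold-out blocks, the function names, and the toy data are all hypothetical and not taken from the paper.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares MLR with intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def leave_n_out_press(X, y, cols, n_out):
    """Sum of squared prediction errors over held-out blocks of n_out samples.

    Assumption: blocks are contiguous; the paper may have grouped differently.
    """
    press = 0.0
    for start in range(0, len(y), n_out):
        held = range(start, min(start + n_out, len(y)))
        train = [i for i in range(len(y)) if i not in held]
        coef = fit_mlr([[X[i][c] for c in cols] for i in train],
                       [y[i] for i in train])
        for i in held:
            press += (y[i] - predict(coef, [X[i][c] for c in cols])) ** 2
    return press


def forward_selection(X, y, max_vars, n_out=2):
    """Greedily add the descriptor that most lowers the cross-validated PRESS."""
    selected, best_press = [], float("inf")
    while len(selected) < max_vars:
        candidates = [c for c in range(len(X[0])) if c not in selected]
        press, col = min((leave_n_out_press(X, y, selected + [c], n_out), c)
                         for c in candidates)
        if press >= best_press:  # stop when no candidate improves prediction
            break
        selected.append(col)
        best_press = press
    return selected, best_press


# Synthetic toy data (NOT from the paper): y = 3 + 2*x0 - x2, x1 is a distractor.
X_toy = [[1, 4, 2], [2, 6, 7], [3, 2, 1], [4, 7, 6],
         [5, 5, 3], [6, 3, 8], [7, 1, 4], [8, 8, 9]]
y_toy = [3 + 2 * r[0] - r[2] for r in X_toy]
selected_toy, press_toy = forward_selection(X_toy, y_toy, max_vars=2)
```

Scoring candidates by held-out error rather than by in-sample fit keeps the selection step tied to predictive ability, which is the property the abstract uses to separate the methods.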

  • ABSTRACT: Gill-net saturation was analyzed through a delta model (i.e., two-stage model) by examining the effects of soak time and fish accumulation (number of fish of all species enmeshed per square meter of a given gill net, including the species of interest) on catch per unit effort (CPUE) of walleyes Sander vitreus and yellow perch Perca flavescens in Lake Erie. The analysis was based on fishery-independent survey data for 1989–2003. In the delta model, the positive values of CPUE were estimated by a generalized additive model (GAM) assuming a log-gamma distribution, and the probability of obtaining nonzero values of CPUE was estimated by a GAM assuming a binomial distribution. Soak time and fish accumulation had significant effects on CPUE. The CPUE of walleyes decreased in gill nets soaked for 10 h and started to decline when fish accumulation was around 2 fish/m². We did not observe a substantial decline in the CPUE of yellow perch within the soak time interval we examined, but we did observe a decline when fish accumulation was 6–8 fish/m². The decline in CPUE with increasing soak time for walleyes and with increasing fish accumulation levels for both walleyes and yellow perch indicates that gill-net saturation did exist in Lake Erie gill-net surveys for these two species and that the gill nets were saturated faster by walleyes than by yellow perch. We suggest that gill-net saturation be considered when applying CPUE from gill-net surveys to stock assessment and that the generalized linear additive-based modeling approach be considered as an alternative in gill-net saturation analyses. Received March 11, 2010; accepted January 6, 2011.
    North American Journal of Fisheries Management 04/2011; 31(2):280-290. DOI:10.1080/02755947.2011.574931 · 1.11 Impact Factor
  • ABSTRACT: The semi-empirical electrotopological index (ISET), used in quantitative structure-retention relationship (QSRR) models and first developed for alkanes and alkenes, was later remodeled for other organic compounds such as ketones, aldehydes and esters. In this study, the ISET was developed and optimized to describe the chromatographic retention of aliphatic alcohols on six different stationary phases (SE-30, OV-3, OV-7, OV-11, OV-17 and OV-25). The presence of a hydroxyl group leads to a charge redistribution that increases the interactions between these molecules and the stationary phases relative to the interactions between hydrocarbons and the same stationary phases. These considerations were included in the calculation of ISET. The simple linear regressions (SLR) between retention indices and the ISET for each stationary phase showed good statistical quality, high internal stability and good predictive ability for external groups, especially for stationary phases with low polarity. A single combined model, in which the McReynolds polarity term was added as a descriptor, was developed for all the data points for the stationary phases, and the results were of satisfactory predictive quality. The efficiency and the applicability of the approach were demonstrated through the high-quality quantitative structure-property relationship (QSPR) model for the boiling point (BP). For this physical property 134 compounds were used; most of them were not included in the original data set employed to develop the ISET. Copyright © 2010 John Wiley & Sons, Ltd.
    Journal of Chemometrics 03/2010; 24(3-4):149-157. DOI:10.1002/cem.1303 · 1.80 Impact Factor
  • ABSTRACT: This review covers a novel approach to comparing methods, based on the sum of ranking differences (SRD). Many method-comparison studies suffer from ambiguity or from comparisons not being quite fair. This problem can be avoided if there are differences between ideal and actual rankings. The absolute values of differences for the ideal and actual ranking are summed up and the procedure is repeated for each (actual) method. The SRD values obtained in this way order the methods simply. If the ideal ranking is not known, it can be replaced by the average (or the maximum or minimum) of all methods, or by a known sequence. SRD corresponds to the principle of parsimony and provides an easy tool to evaluate the methods: the smaller the sum, the better the method. Models and other items can be similarly ranked. Validation can be carried out using simulated random numbers for comparison: an empirical histogram (bootstrap-like) shows whether the SRD values are far from random. Two case studies (clustering of HPLC columns and prediction of retention data) illustrate and validate the applicability of this novel approach to comparing methods. The technique is entirely general; it can be used in different fields (e.g., for stationary-phase (column) selection in chromatography, model and descriptor selection, comparing analytical and chemometric techniques, determination of panel consistency, etc.). The only prerequisite is that the data can be arranged in matrix form without empty cells.
    TrAC Trends in Analytical Chemistry 01/2010; 29(1):101-109. DOI:10.1016/j.trac.2009.09.009 · 6.61 Impact Factor
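
The sum of ranking differences described in the last abstract above can be illustrated with a short sketch. It assumes the data matrix has one row per object and one column per method, uses the row average as the reference when no ideal ranking is supplied, and averages ranks for ties; the function names and the tie handling are illustrative choices, not details confirmed by the review, and the random-number validation step is omitted.

```python
def ranks(values):
    """Rank values in ascending order (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def srd(matrix, reference=None):
    """Return one SRD value per method (column); smaller means closer to the reference.

    matrix[i][j] is the value of object i under method j, with no empty cells.
    If no reference column is given, the row average is used (an assumption
    matching the 'average of all methods' option in the abstract).
    """
    n_methods = len(matrix[0])
    if reference is None:
        reference = [sum(row) / n_methods for row in matrix]
    reference_ranks = ranks(reference)
    result = []
    for j in range(n_methods):
        method_ranks = ranks([row[j] for row in matrix])
        result.append(sum(abs(a - b) for a, b in zip(method_ranks, reference_ranks)))
    return result


# Hypothetical values for four objects and three methods.
demo = [
    [1.0, 1.2, 4.0],
    [2.0, 1.9, 3.0],
    [3.0, 3.3, 1.0],
    [4.0, 3.8, 2.0],
]
demo_srd = srd(demo)
```

In this toy matrix the first two methods order the objects exactly as the row average does, while the third nearly reverses the order and so receives the largest sum.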