Flexible Modeling of the Effects of Serum Cholesterol on Coronary Heart Disease Mortality

Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada.
American Journal of Epidemiology (Impact Factor: 5.23). 04/1997; 145(8):714-29. DOI: 10.1093/aje/145.8.714
Source: PubMed


Current understanding of the impact of lipids and other risk factors on coronary heart disease is largely based on the results of parametric multiple regression analyses of large prospective studies. To assess the potential impact of the a priori assumption of linearity of continuous risk factors on the results of parametric analyses, the authors completed a secondary analysis of the Lipid Research Clinics Prevalence and Follow-up Studies (1972-1987) data using an assumption-free nonparametric modeling approach. The effects of total serum cholesterol and the ratio of total serum cholesterol to high density lipoprotein cholesterol, adjusted for common risk factors, were estimated using a smoothing spline method available in the generalized additive model extension of the multiple logistic regression. The data set included 2,512 men in the random sample of the Lipid Research Clinics study who did not take lipid-lowering medications. During the median follow-up of 12.6 years, 94 coronary heart disease deaths occurred. The generalized additive model fits the effects of total serum cholesterol (p < 0.01) and the ratio of total serum cholesterol to high density lipoprotein cholesterol (p < 0.02) significantly better than the parametric logistic regression. Validation studies confirmed that, among new observations arising from the same population, generalized additive model estimates predicted outcomes better than the parametric estimates. Nonlinear effects of both lipid measures were robust and may be clinically important. The authors conclude that the linearity assumption inherent in parametric models may result in biased estimates of the effects of total serum cholesterol on coronary heart disease mortality and recommend that their findings be verified in a nonparametric analysis of data from another large prospective study.
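The contrast between the two models named in the abstract can be written out as a short sketch. The abstract gives no formulas, so the notation below is assumed for illustration: x is the lipid measure, z_j are the adjustment covariates, f is the smooth function, and lambda is the smoothing parameter. It follows the standard smoothing-spline formulation of a generalized additive model, not anything stated by the authors.

```latex
% Parametric multiple logistic regression: the lipid measure x enters linearly.
\operatorname{logit} \Pr(Y = 1 \mid x, z) = \beta_0 + \beta_1 x + \sum_{j} \gamma_j z_j

% Generalized additive model: the linear term \beta_1 x is replaced by a smooth function f.
\operatorname{logit} \Pr(Y = 1 \mid x, z) = \beta_0 + f(x) + \sum_{j} \gamma_j z_j

% Assumed estimation criterion: penalized log-likelihood, where \ell is the
% binomial log-likelihood and \lambda controls the roughness penalty on f.
(\hat{\beta}_0, \hat{f}, \hat{\gamma}) = \arg\max_{\beta_0, f, \gamma}
  \left\{ \ell(\beta_0, f, \gamma) - \frac{\lambda}{2} \int f''(t)^2 \, dt \right\}
```

When f(x) = \beta_1 x, the second model reduces to the first, so a clearly better fit of the additive model can be read as evidence against the linearity assumption.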

    • "Logistic regression is a member of a family of generalized linear models. Replacing the LR argument with various forms of smooth functions has provided benefits in the study of colon-cancer [21], heart-disease [22] and infant mortality [23]. Other researchers have incorporated univariate kernel density estimations for studying prostate-cancer [24], health disparities [25], and nutrient intake [26]. "
    ABSTRACT: Statistical learning (SL) techniques can address non-linear relationships and small datasets but do not provide an output that has an epidemiologic interpretation. A small set of clinical variables (CVs) for stage-1 non-small cell lung cancer patients was used to evaluate an approach for using SL methods as a preprocessing step for survival analysis. A stochastic method of training a probabilistic neural network (PNN) was used with differential evolution (DE) optimization. Survival scores were derived stochastically by combining CVs with the PNN. Patients (n = 151) were dichotomized into favorable (n = 92) and unfavorable (n = 59) survival outcome groups. These PNN derived scores were used with logistic regression (LR) modeling to predict favorable survival outcome and were integrated into the survival analysis (i.e. Kaplan-Meier analysis and Cox regression). The hybrid modeling was compared with the respective modeling using raw CVs. The area under the receiver operating characteristic curve (Az) was used to compare model predictive capability. Odds ratios (ORs) and hazard ratios (HRs) were used to compare disease associations with 95% confidence intervals (CIs). The LR model with the best predictive capability gave Az = 0.703. While controlling for gender and tumor grade, the OR = 0.63 (CI: 0.43, 0.91) per standard deviation (SD) increase in age indicates increasing age confers unfavorable outcome. The hybrid LR model gave Az = 0.778 by combining age and tumor grade with the PNN and controlling for gender. The PNN score and age translate inversely with respect to risk. The OR = 0.27 (CI: 0.14, 0.53) per SD increase in PNN score indicates those patients with decreased score confer unfavorable outcome. The tumor grade adjusted hazard for patients above the median age compared with those below the median was HR = 1.78 (CI: 1.06, 3.02), whereas the hazard for those patients below the median PNN score compared to those above the median was HR = 4.0 (CI: 2.13, 7.14). We have provided preliminary evidence showing that the SL preprocessing may provide benefits in comparison with accepted approaches. The work will require further evaluation with varying datasets to confirm these findings.
    BioMedical Engineering OnLine 11/2011; 10:97. DOI:10.1186/1475-925X-10-97 · 1.43 Impact Factor
    • "Other researchers modified the LR model to include non-parametric functions to study colon cancer [8]. Generalized models have also been used in various capacities to model lung function change [9], blood pressure [10], alcohol consumption [11], and heart disease [12]. "
    ABSTRACT: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR) modeling representing a parametric approach. The SL technique was comprised of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
    BMC Bioinformatics 01/2011; 12:37. DOI:10.1186/1471-2105-12-37 · 2.58 Impact Factor
    • "This would imply that, for example, the relative risk of mortality is the same when comparing (a) an 80-year old vs a 60-year-old subject, and (b) a 40-year old vs a 20-year old, because in both cases there is a 20-year age difference. Again, in the last two decades, several epidemiological and clinical studies have shown that the linearity assumption is seriously violated for many prognostic and risk factors, and its a priori acceptance may lead to important biases and misleading conclusions (Hastie and Tibshirani, 1990; Sleeper and Harrington, 1990; Gray, 1992; Royston and Altman, 1994; Abrahamowicz et al, 1997; Remontet et al, 2007; Royston and Sauerbrei, 2008). Thus, the methodological arguments and the empirical evidence indicate that both the PH and the linearity assumptions should be carefully verified in prognostic studies. "
    ABSTRACT: C-reactive protein (CRP) is gaining credibility as a prognostic factor in different cancers. Cox's proportional hazard (PH) model is usually used to assess prognostic factors. However, this model imposes a priori assumptions, which are rarely tested, that (1) the hazard ratio associated with each prognostic factor remains constant across the follow-up (PH assumption) and (2) the relationship between a continuous predictor and the logarithm of the mortality hazard is linear (linearity assumption). We tested these two assumptions of the Cox's PH model for CRP, using a flexible statistical model, while adjusting for other known prognostic factors, in a cohort of 269 patients newly diagnosed with non-small cell lung cancer (NSCLC). In the Cox's PH model, high CRP increased the risk of death (HR=1.11 per each doubling of CRP value, 95% CI: 1.03-1.20, P=0.008). However, both the PH assumption (P=0.033) and the linearity assumption (P=0.015) were rejected for CRP, measured at the initiation of chemotherapy, which kept its prognostic value for approximately 18 months. Our analysis shows that flexible modeling provides new insights regarding the value of CRP as a prognostic factor in NSCLC and that Cox's PH model underestimates early risks associated with high CRP.
    British Journal of Cancer 03/2010; 102(7):1113-22. DOI:10.1038/sj.bjc.6605603 · 4.84 Impact Factor