Development and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles

Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
Cancer informatics 05/2010; 9:105-14.
Source: PubMed

ABSTRACT Relatively few publications have used linear regression models to predict a continuous response from microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We evaluated three linear regression algorithms that can be used to predict a continuous response from high-dimensional gene expression data: least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods were tested with simulations based on a real gene expression dataset and with analyses of two real gene expression datasets, in each case using an unbiased complete cross-validation approach. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, and that both perform more efficiently than the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and LASSO algorithms with complete cross-validation.
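The idea of complete cross-validation with a LASSO predictor can be illustrated with a minimal sketch. It is not the BRB-ArrayTools plug-in and makes several assumptions: the data are synthetic, the penalty grid is arbitrary, all function names are hypothetical, and the LASSO is fitted by plain coordinate descent rather than by the LAR-based path algorithm. The point it shows is that the penalty is chosen inside each training fold, so the held-out cases never influence model selection.

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used in the LASSO coordinate update.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=50):
    # Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    # X and y are assumed to be centered already.
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return beta

def fit_predict(X_tr, y_tr, X_te, lam):
    # Center with training-set statistics only, then predict the test cases.
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    beta = lasso_cd(X_tr - mu_x, y_tr - mu_y, lam)
    return mu_y + (X_te - mu_x) @ beta

def kfold_indices(n, k, rng):
    # Random partition of 0..n-1 into k folds.
    return np.array_split(rng.permutation(n), k)

def complete_cv_mse(X, y, lambdas, k_outer=5, k_inner=3, seed=0):
    # Outer loop estimates prediction error; the inner loop, run on the
    # outer training set only, selects the penalty.
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(n)
    for test in kfold_indices(n, k_outer, rng):
        train = np.setdiff1d(np.arange(n), test)
        inner_err = np.zeros(len(lambdas))
        for val in kfold_indices(len(train), k_inner, rng):
            fit = np.setdiff1d(np.arange(len(train)), val)
            for i, lam in enumerate(lambdas):
                p = fit_predict(X[train[fit]], y[train[fit]], X[train[val]], lam)
                inner_err[i] += ((y[train[val]] - p) ** 2).sum()
        best = lambdas[int(np.argmin(inner_err))]
        preds[test] = fit_predict(X[train], y[train], X[test], best)
    return float(np.mean((y - preds) ** 2)), preds

# Synthetic stand-in for expression data: more "genes" than training cases.
rng = np.random.default_rng(1)
n_samples, n_genes = 40, 60
X = rng.normal(size=(n_samples, n_genes))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)
mse, preds = complete_cv_mse(X, y, lambdas=[0.05, 0.1, 0.3, 0.6])
print(f"complete-CV mean squared error: {mse:.3f}")
```

Selecting the penalty on the full dataset before cross-validating would let the test cases leak into model building and would tend to understate the prediction error, which is what the nested structure here is meant to avoid.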

Available from: Richard Simon, Sep 17, 2014
  • ABSTRACT: Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related deaths worldwide. In most cases, patients are first infected with hepatitis C virus (HCV), which then progresses to HCC. HCC is usually diagnosed in its advanced stages, when it is more difficult to treat. Early diagnosis increases the survival rate because treatment options are available for early stages. Therefore, accurate biomarkers for early HCC diagnosis are needed. DNA microarray technology has been widely used in cancer research. Scientists study DNA microarray gene expression data to identify cancer gene signatures, which help in early cancer diagnosis and prognosis. Most studies are done on single data sets, and the resulting biomarkers fit only those data sets. When tested on any other data set, classification is poor. In this paper, we combined four different data sets of liver tissue samples (101 HCV-cirrhotic tissues and 57 HCV-cirrhotic tissues from patients with HCC). Differentially expressed genes were studied by use of high-density oligonucleotide arrays. We extracted the most informative features using LASSO regression and Random Forest. We then applied different classifiers to distinguish HCV samples from HCV-HCC related samples using the selected genes.
    WSEAS Transactions on Information Science and Applications 01/2014; II:750.
  • ABSTRACT: The current 'isolate, inactivate, inject' vaccine development strategy has served the field of vaccinology well, and such empirical vaccine candidate development has even led to the eradication of smallpox. However, such an approach suffers from limitations, and as an empirical approach, does not fully utilize our knowledge of immunology and genetics. A more complete understanding of the biological processes culminating in disease resistance is needed. The advent of high-dimensional assay technology and 'systems biology' along with a vaccinomics approach [1,2•] is spawning a new era in the science of vaccine development. Here we review recent developments in systems biology and strategies for applying this approach and its resulting data to expand our knowledge base and drive directed development of new vaccines. We also provide applied examples and point out new directions for the field in order to illustrate the power of systems biology.
    Current opinion in immunology 06/2011; 23(3):436-43. DOI:10.1016/j.coi.2011.04.005 · 7.87 Impact Factor
  • ABSTRACT: Melanoma cell lines and normal human melanocytes (NHM) were assayed for p53-dependent G1 checkpoint response to ionizing radiation (IR)-induced DNA damage. Sixty-six percent of melanoma cell lines displayed a defective G1 checkpoint. Checkpoint function was correlated with sensitivity to IR with checkpoint-defective lines being radio-resistant. Microarray analysis identified 316 probes whose expression was correlated with G1 checkpoint function in melanoma lines (P≤0.007) including p53 transactivation targets CDKN1A, DDB2, and RRM2B. The 316 probe list predicted G1 checkpoint function of the melanoma lines with 86% accuracy using a binary analysis and 91% accuracy using a continuous analysis. When applied to microarray data from primary melanomas, the 316 probe list was prognostic of 4-yr distant metastasis-free survival. Thus, p53 function, radio-sensitivity, and metastatic spread may be estimated in melanomas from a signature of gene expression.
    Pigment Cell & Melanoma Research 04/2012; 25(4):514-26. DOI:10.1111/j.1755-148X.2012.01010.x · 5.64 Impact Factor