Development and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles

Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
Cancer informatics 05/2010; 9:105-14.
Source: PubMed


There have been relatively few publications using linear regression models to predict a continuous response based on microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We have evaluated three linear regression algorithms that can be used to predict a continuous response from high-dimensional gene expression data: least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods were tested with an unbiased complete cross-validation approach, using simulations based on a real gene expression dataset and analyses of two real gene expression datasets. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, but both of them perform more efficiently than the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and the LASSO algorithms with complete cross-validation.
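The idea of fitting a LASSO model when predictors outnumber cases, and of estimating its prediction error by complete cross-validation, can be sketched in a few lines. The code below is an illustrative sketch only: it is not the BRB-ArrayTools plug-in or the authors' code. The function names (`soft_threshold`, `fit_lasso`, `loocv_mse`), the coordinate-descent solver, the fixed penalty `lam`, and the simulated data are all assumptions made for this example. "Complete" is taken here to mean that every data-dependent step (centering and model fitting) is redone inside each training fold, so the left-out case never influences the model that predicts it.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1.

    Centering uses only the rows passed in, so calling this on a
    training fold keeps the left-out case out of every fitting step.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual for beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual
            # (current residual plus column j's own contribution).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def loocv_mse(X, y, lam):
    """Leave-one-out mean squared prediction error, refitting per fold."""
    n = len(X)
    sq_err = 0.0
    for i in range(n):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, beta = fit_lasso(X_train, y_train, lam)
        pred = b0 + sum(b * x for b, x in zip(beta, X[i]))
        sq_err += (y[i] - pred) ** 2
    return sq_err / n


# Simulated data (an assumption for this sketch): 20 cases, 40 "genes",
# only the first two of which affect the continuous response.
random.seed(0)
n_cases, n_genes = 20, 40
X = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cases)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0.0, 0.1) for row in X]

lasso_err = loocv_mse(X, y, lam=0.1)
# A very large penalty forces every coefficient to zero, so each
# prediction is just the training-fold mean: a no-signal baseline.
mean_err = loocv_mse(X, y, lam=1e9)
print("LASSO cross-validated MSE:", lasso_err)
print("Mean-only cross-validated MSE:", mean_err)
```

In practice the penalty would itself be chosen inside each training fold (for example by an inner cross-validation loop) rather than fixed in advance; it is fixed here only to keep the sketch short.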


Available from: Richard Simon, Sep 17, 2014
  • "Logistic regression models are commonly used when working with HCV and HCV-HCC classes but shouldn't be used when the number of predictor variables (p) exceeds the sample size (n) [17]. Three linear regression algorithms Least Angle Regression (LAR), Least Absolute Shrinkage Operator (LASSO) and Average Linear Regression (ALM) were evaluated in the prediction of classes on high dimensional gene expression data by Yingdong Zhao [18]. It was demonstrated that LAR and LASSO perform quite well and in a similar manner when used on data without noise and better than ALM but LASSO performed best on data with noise."
    ABSTRACT: Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related deaths worldwide. In most cases, the patients are first infected with hepatitis C virus (HCV), which then progresses to HCC. HCC is usually diagnosed in its advanced stages, when it is more difficult to treat. Early diagnosis increases the survival rate because treatment options are available for early stages. Therefore, accurate biomarkers for early HCC diagnosis are needed. DNA microarray technology has been widely used in cancer research. Scientists study DNA microarray gene expression data to identify cancer gene signatures that help in early cancer diagnosis and prognosis. Most studies are done on single data sets, and the resulting biomarkers fit only those data sets; when tested on other data sets, classification is poor. In this paper, we combined four different data sets of liver tissue samples (101 HCV-cirrhotic tissues and 57 HCV-cirrhotic tissues from patients with HCC). Differentially expressed genes were studied using high-density oligonucleotide arrays. We extracted the most informative features using LASSO regression and Random Forest, then applied different classifiers to distinguish HCV samples from HCV-HCC samples using the selected genes.
    WSEAS Transactions on Information Science and Applications 01/2014; II:750.
  • "To analyze the data of RNA-seq, the mapped genes were used to research the fold change by RPKM. Briefly, RPKM from PBMN and TIL were input into BRB ArrayTools ( [18]. We selected significance analysis of Microarray (SAM) with 1.2-fold change, false discovery rate 0.1, and permutation 100 to work on both RNA-seq profiles from PBMN and TIL."
    ABSTRACT: Single-cell sampling with RNA-seq analysis plays an important role in reference laboratory; cytogenomic diagnosis for specimens on glass-slides or rare cells in circulating blood for tumor and genetic diseases; measurement of sensitivity and specificity in tumor-tissue genomic analysis with mixed-cells; mechanism analysis of differentiation and proliferation of cancer stem cell for academic purpose. Our single-cell RNA-seq technique shows that fragments were 250-450 bp after fragmentation, amplification, and adapter addition. There were 11.6 million reads mapped in raw sequencing reads (19.6 million). The numbers of mapped genes, mapped transcripts, and mapped exons were 31,332, 41,210, and 85,786, respectively. All QC results demonstrated that RNA-seq techniques could be used for single-cell genomic performance. Analysis of the mapped genes showed that the number of genes mapped by RNA-seq (6767 genes) was much higher than that of differential display (288 libraries) among similar specimens which we have developed and published. The single-cell RNA-seq can detect gene splicing using different subtype TGF-beta analysis. The results from using Q-rtPCR tests demonstrated that sensitivity is 76% and specificity is 55% from single-cell RNA-seq technique with some gene expression missing (2/8 genes). However, it will be feasible to use RNA-seq techniques to contribute to genomic medicine at single-cell level.
    12/2013; 2013:724124. DOI:10.1155/2013/724124
  • ABSTRACT: The current 'isolate, inactivate, inject' vaccine development strategy has served the field of vaccinology well, and such empirical vaccine candidate development has even led to the eradication of smallpox. However, such an approach suffers from limitations, and as an empirical approach, does not fully utilize our knowledge of immunology and genetics. A more complete understanding of the biological processes culminating in disease resistance is needed. The advent of high-dimensional assay technology and 'systems biology' along with a vaccinomics approach [1,2•] is spawning a new era in the science of vaccine development. Here we review recent developments in systems biology and strategies for applying this approach and its resulting data to expand our knowledge base and drive directed development of new vaccines. We also provide applied examples and point out new directions for the field in order to illustrate the power of systems biology.
    Current opinion in immunology 06/2011; 23(3):436-43. DOI:10.1016/j.coi.2011.04.005 · 7.48 Impact Factor