Development and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles

Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
Cancer informatics 05/2010; 9:105-14.
Source: PubMed

ABSTRACT Relatively few publications have used linear regression models to predict a continuous response from microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We evaluated three linear regression algorithms that can be used to predict a continuous response from high-dimensional gene expression data: least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods were tested using simulations based on a real gene expression dataset and analyses of two real gene expression datasets, with prediction error estimated by an unbiased complete cross-validation approach. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, and that both outperform the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and LASSO algorithms with complete cross-validation.
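As an illustration of what complete cross-validation means for a LASSO predictor, the following is a minimal standard-library Python sketch, not the BRB-ArrayTools plug-in. It assumes a coordinate-descent solver (the abstract does not say which solver the authors used), and all function names, the penalty grid, and the synthetic data are hypothetical. The key point is that the penalty is tuned by an inner cross-validation on the training cases only, so the held-out cases never influence model building. LAR and ALM are not shown.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*RSS + lam*sum|beta_j|, with an intercept."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered columns
    resid = [v - y_mean for v in y]       # residuals while all betas are zero
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(model, row):
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, row))

def make_folds(n, k, rng):
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_mse(X, y, lam, folds):
    """Mean squared prediction error of the LASSO at a fixed penalty."""
    sq_err = 0.0
    for test in folds:
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        sq_err += sum((y[i] - predict(model, X[i])) ** 2 for i in test)
    return sq_err / len(y)

def complete_cv_mse(X, y, lambdas, k_outer=5, k_inner=3, seed=0):
    """Outer CV in which the penalty is re-tuned inside every training set."""
    rng = random.Random(seed)
    sq_err = 0.0
    for test in make_folds(len(y), k_outer, rng):
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        inner = make_folds(len(y_tr), k_inner, rng)   # training cases only
        best_lam = min(lambdas, key=lambda lam: cv_mse(X_tr, y_tr, lam, inner))
        model = fit_lasso(X_tr, y_tr, best_lam)
        sq_err += sum((y[i] - predict(model, X[i])) ** 2 for i in test)
    return sq_err / len(y)

# Synthetic stand-in for expression data (hypothetical): 30 cases, 30 "genes",
# of which only the first three influence the response.
rng = random.Random(1)
n, p = 30, 30
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2 * r[0] - 2 * r[1] + 2 * r[2] + rng.gauss(0, 0.5) for r in X]
mse = complete_cv_mse(X, y, lambdas=[0.05, 0.2, 0.5])
print("complete cross-validated MSE:", mse)
```

Tuning the penalty on all cases before cross-validating would let the held-out cases leak into model building and would tend to understate the prediction error, which is the bias that a complete cross-validation is designed to avoid.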


Available from: Richard Simon, Sep 17, 2014
  • Source
    ABSTRACT: Single-cell sampling with RNA-seq analysis plays an important role in reference laboratories; in cytogenomic diagnosis of tumors and genetic diseases from specimens on glass slides or rare cells in circulating blood; in measuring sensitivity and specificity in genomic analysis of tumor tissue with mixed cells; and in academic studies of the mechanisms of cancer stem cell differentiation and proliferation. Our single-cell RNA-seq technique shows that fragments were 250-450 bp after fragmentation, amplification, and adapter addition. Of 19.6 million raw sequencing reads, 11.6 million were mapped. The numbers of mapped genes, mapped transcripts, and mapped exons were 31,332, 41,210, and 85,786, respectively. All QC results demonstrated that RNA-seq techniques could be used for single-cell genomic performance. Analysis of the mapped genes showed that the number of genes mapped by RNA-seq (6767 genes) was much higher than that of differential display (288 libraries) among similar specimens, which we developed and published previously. Single-cell RNA-seq can detect gene splicing, as shown by analysis of different TGF-beta subtypes. Q-rtPCR tests showed a sensitivity of 76% and a specificity of 55% for the single-cell RNA-seq technique, with expression of some genes missing (2/8 genes). However, it will be feasible to use RNA-seq techniques to contribute to genomic medicine at the single-cell level.
    12/2013; 2013:724124. DOI:10.1155/2013/724124
  • Source
    ABSTRACT: The outcomes of clinical trials using bone marrow stromal cells (BMSCs) are variable; the degree of expansion of BMSCs during clinical manufacturing may contribute to this variability, since cell expansion is limited by senescence. Human BMSCs from aspirates of healthy subjects were subcultured serially until cell growth stopped. Phenotype and functional measurements of BMSCs from two subjects, including senescence-associated beta-galactosidase staining and colony formation efficiency, changed from an early to a senescent pattern at passage 6 or 7. Transcriptome analysis of 10 early and 15 late passage BMSC samples from 5 subjects revealed 2122 differentially expressed genes, which were associated with immune response, development, and cell proliferation pathways. Analysis of 57 serial BMSC samples from 7 donors revealed that the change from an early to a senescent profile was variable among subjects and occurred prior to changes in phenotypes. BMSC age expressed as a percentage of maximum population doublings (PDs) was a good indicator of an early or senescent transcription signature, but this measure of BMSC life span can only be calculated after expanding BMSCs to senescence. To find a more useful surrogate measure of BMSC age, we used a computational biology approach to identify a set of genes whose expression at each passage would predict the elapsed age of BMSCs. A total of 155 genes were highly correlated with BMSC age. A least angle regression algorithm identified a set of 24 BMSC age-predictive genes. In conclusion, the onset of senescence-associated molecular changes was variable and preceded changes in other indicators of BMSC quality and senescence. The 24 BMSC age-predictive genes will be useful in assessing the quality of clinical BMSC products.
    Stem Cell Research 07/2013; 11(3):1060-1073. DOI:10.1016/j.scr.2013.07.005 · 3.91 Impact Factor
  • Source
    ABSTRACT: Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related deaths worldwide. In most cases, patients are first infected with hepatitis C virus (HCV), and the infection then progresses to HCC. HCC is usually diagnosed in its advanced stages, when it is more difficult to treat. Early diagnosis increases the survival rate because treatment options are available for early stages. Therefore, accurate biomarkers for early HCC diagnosis are needed. DNA microarray technology has been widely used in cancer research. Scientists study DNA microarray gene expression data to identify cancer gene signatures, which help in early cancer diagnosis and prognosis. Most studies are done on single data sets, and the resulting biomarkers fit only those data sets; when tested on any other data set, classification is poor. In this paper, we combined four different data sets of liver tissue samples (101 HCV-cirrhotic tissues and 57 HCV-cirrhotic tissues from patients with HCC). Differentially expressed genes were studied using high-density oligonucleotide arrays. We extracted the most informative features using LASSO regression and Random Forest, then applied different classifiers to distinguish HCV samples from HCV-HCC related samples using the selected genes.
    WSEAS Transactions on Information Science and Applications 01/2014; II:750.