Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California 94305, USA
The Annals of Applied Statistics (Impact Factor: 2.24). 09/2008; 2(3):986-1012. DOI: 10.1214/08-AOAS182SUPP
Source: PubMed

ABSTRACT We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially expressed genes in a microarray experiment. We wish to identify genes that are associated with some type of outcome, such as survival time or cancer type. We propose a new procedure, called Lassoed Principal Components (LPC), that builds upon existing methods and can provide a sizable improvement. For instance, in the case of two-class data, a standard (albeit simple) approach might be to compute a two-sample t-statistic for each gene. The LPC method involves projecting these conventional gene scores onto the eigenvectors of the gene expression data covariance matrix and then applying an L1 penalty in order to de-noise the resulting projections. We present a theoretical framework under which LPC is the logical choice for identifying significant genes, and we show that LPC can provide a marked reduction in false discovery rates over the conventional methods on both real and simulated data. Moreover, this flexible procedure can be applied to a variety of types of data and can be used to improve many existing methods for the identification of significant features.
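
The two steps named in the abstract (project per-gene scores onto the eigenvectors of the gene covariance matrix, then de-noise with an L1 penalty) can be illustrated with a minimal sketch. It follows only the abstract's description; the paper's exact estimator, standardization, and tuning may differ. The function names, the Welch form of the t-statistic, and the penalty level `lam` are assumptions made for illustration. Because the eigenvectors are orthonormal, the L1-penalized regression of the scores on them reduces to soft-thresholding the projection coefficients.

```python
import numpy as np


def two_sample_t(X, y):
    """Welch two-sample t-statistic per gene (assumed score type).

    X is a samples-by-genes array, y holds 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se


def lpc_scores(X, t, lam):
    """Project gene scores t onto covariance eigenvectors, then soft-threshold.

    lam is a hypothetical tuning parameter; in practice it would have to be
    chosen from the data, for example by cross-validation.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are eigenvectors of the gene-by-gene sample covariance matrix.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vt = Vt[s > 1e-10 * s.max()]  # drop directions with (numerically) zero variance
    coef = Vt @ t                 # least-squares coefficients (orthonormal design)
    beta = np.sign(coef) * np.maximum(np.abs(coef) - lam, 0.0)  # L1 shrinkage
    return Vt.T @ beta            # de-noised score for each gene


# Simulated two-class data: 20 samples, 200 genes, and a shared latent factor
# that shifts the first 20 genes in class 1 (all values are made up).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 200))
X[:, :20] += np.outer(y + rng.normal(scale=0.5, size=20), np.ones(20))

t = two_sample_t(X, y)
lpc = lpc_scores(X, t, lam=2.0)
print(np.abs(t[:20]).mean(), np.abs(lpc[:20]).mean())
```

With `lam=0` the function returns the plain projection of the scores onto the span of the eigenvectors; larger values of `lam` zero out more of the projection coefficients.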

  • ABSTRACT: Despite its eradication over 30 years ago, smallpox (as well as other orthopox viruses) remains a pathogen of interest both in terms of biodefense and for its use as a vector for vaccines and immunotherapies. Here we describe the application of mRNA-Seq transcriptome profiling to understanding immune responses in smallpox vaccine recipients. Contrary to other studies examining gene expression in virally infected cell lines, we utilized a mixed population of peripheral blood mononuclear cells in order to capture the essential intercellular interactions that occur in vivo, and would otherwise be lost, using single cell lines or isolated primary cell subsets. In this mixed cell population we were able to detect expression of all annotated vaccinia genes. On the host side, a number of genes encoding cytokines, chemokines, complement factors and intracellular signaling molecules were downregulated upon viral infection, whereas genes encoding histone proteins and the interferon response were upregulated. We also identified a small number of genes that exhibited significantly different expression profiles in subjects with robust humoral immunity compared with those with weaker humoral responses. Our results provide evidence that differential gene regulation patterns may be at work in individuals with robust humoral immunity compared with those with weaker humoral immune responses.
    Genes and Immunity advance online publication, 18 April 2013; doi:10.1038/gene.2013.14.
    Genes and Immunity 04/2013 · 4.22 Impact Factor
  • ABSTRACT: DNA microarrays are a relatively new technology that can simultaneously measure the expression level of thousands of genes. They have become an important tool for a wide variety of biological experiments. One of the most common goals of DNA microarray experiments is to identify genes associated with biological processes of interest. Conventional statistical tests often produce poor results when applied to microarray data owing to small sample sizes, noisy data, and correlation among the expression levels of the genes. Thus, novel statistical methods are needed to identify significant genes in DNA microarray experiments. This article discusses the challenges inherent in DNA microarray analysis and describes a series of statistical techniques that can be used to overcome these challenges. The problem of multiple hypothesis testing and its relation to microarray studies are also considered, along with several possible solutions.
    Wiley interdisciplinary reviews. Computational statistics. 01/2013; 5(4).
  • ABSTRACT: In this article, we introduce a procedure for selecting variables in principal components analysis. The procedure was developed to identify a small subset of the original variables that "best explain" the principal components through nonparametric relationships. There are usually some "noisy" uninformative variables in a dataset, and some variables that are strongly related to each other because of their general interdependence. The procedure is designed to be used following the satisfactory initial use of a principal components analysis with all variables, and its aim is to help to interpret underlying structures, particularly in high dimensional data. We analyse the asymptotic behaviour of the method and provide an example by applying the procedure to some real data.

