Publications (37)

  • Article: Sparse Ising Models with Covariates
    ABSTRACT: There has been a lot of work fitting Ising models to multivariate binary data in order to understand the conditional dependency relationships between the variables. However, additional covariates are frequently recorded together with the binary data, and may influence the dependence relationships. Motivated by such a dataset on genomic instability collected from tumor samples of several types, we propose a sparse covariate dependent Ising model to study both the conditional dependency within the binary data and its relationship with the additional covariates. This results in subject-specific Ising models, where the subject's covariates influence the strength of association between the genes. As in all exploratory data analysis, interpretability of results is important, and we use L1 penalties to induce sparsity in the fitted graphs and in the number of selected covariates. Two algorithms to fit the model are proposed and compared on a set of simulated data, and asymptotic results are established. The results on the tumor dataset and their biological significance are discussed in detail.
    09/2012;
  • Article: Sequential multiplexed analyte quantification using peptide immunoaffinity enrichment coupled to mass spectrometry.
    ABSTRACT: Peptide immunoaffinity enrichment coupled to selected reaction monitoring (SRM) mass spectrometry (immuno-SRM) has emerged as a technology with great potential for quantitative proteomic assays. One advantage over traditional immunoassays is the tremendous potential for concurrent quantification of multiple analytes from a given sample (i.e. multiplex analysis). We sought to explore the capacity of the immuno-SRM technique for analyzing large numbers of analytes by evaluating the multiplex capabilities and demonstrating the sequential analysis of groups of peptides from a single sample. To evaluate multiplex analysis, immuno-SRM assays were arranged in groups of 10, 20, 30, 40, and 50 peptides using a common set of reagents. The multiplex immuno-SRM assays were used to measure synthetic peptides added to plasma covering several orders of magnitude in concentration. Measurements made in large multiplex groups were highly correlated (r² ≥ 0.98) and featured good agreement (bias ≤ 1%) compared with single-plex assays or a 10-plex configuration. The ability to sequentially enrich sets of analyte peptides was demonstrated by enriching groups of 10 peptides from a plasma sample in a sequential fashion. The data show good agreement (bias ≤ 1.5%) and similar reproducibility regardless of enrichment order. These significant advancements demonstrate the utility of immuno-SRM for analyzing large numbers of analytes, such as in large biomarker verification experiments or in pathway-based targeted analysis.
    Molecular & Cellular Proteomics 12/2011; 11(6):M111.015347. · 7.40 Impact Factor
  • Article: Bootstrap inference for network construction with an application to a breast cancer microarray study
    Shuang Li, Li Hsu, Jie Peng, Pei Wang
    ABSTRACT: Gaussian Graphical Models (GGMs) have been used to construct genetic regulatory networks where regularization techniques are widely used since the network inference usually falls into a high-dimension-low-sample-size scenario. Yet, finding the right amount of regularization can be challenging, especially in an unsupervised setting where traditional methods such as BIC or cross-validation often do not work well. In this paper, we propose a new method - Bootstrap Inference for Network COnstruction (BINCO) - to infer networks by directly controlling the false discovery rates (FDRs) of the selected edges. This method fits a mixture model for the distribution of edge selection frequencies to estimate the FDRs, where the selection frequencies are calculated via model aggregation. This method is applicable to a wide range of applications beyond network construction. When we applied our proposed method to building a gene regulatory network with microarray expression breast cancer data, we were able to identify high-confidence edges and well-connected hub genes that could potentially play important roles in understanding the underlying biological processes of breast cancer.
    11/2011;
  • Article: A targeted proteomics-based pipeline for verification of biomarkers in plasma.
    ABSTRACT: High-throughput technologies can now identify hundreds of candidate protein biomarkers for any disease with relative ease. However, because there are no assays for the majority of proteins and de novo immunoassay development is prohibitively expensive, few candidate biomarkers are tested in clinical studies. We tested whether the analytical performance of a biomarker identification pipeline based on targeted mass spectrometry would be sufficient for data-dependent prioritization of candidate biomarkers, de novo development of assays and multiplexed biomarker verification. We used a data-dependent triage process to prioritize a subset of putative plasma biomarkers from >1,000 candidates previously identified using a mouse model of breast cancer. Eighty-eight novel quantitative assays based on selected reaction monitoring mass spectrometry were developed, multiplexed and evaluated in 80 plasma samples. Thirty-six proteins were verified as being elevated in the plasma of tumor-bearing animals. The analytical performance of this pipeline suggests that it should support the use of an analogous approach with human samples.
    Nature Biotechnology 06/2011; 29(7):625-34. · 29.50 Impact Factor
  • Article: Learning oncogenic pathways from binary genomic instability data.
    Pei Wang, Dennis L Chao, Li Hsu
    ABSTRACT: Genomic instability, the propensity of aberrations in chromosomes, plays a critical role in the development of many diseases. High throughput genotyping experiments have been performed to study genomic instability in diseases. The output of such experiments can be summarized as high-dimensional binary vectors, where each binary variable records aberration status at one marker locus. It is of keen interest to understand how aberrations may interact with each other, as it provides insight into the process of the disease development. In this article, we propose a novel method, LogitNet, to infer such interactions among these aberration events. The method is based on penalized logistic regression with an extension to account for spatial correlation in the genomic instability data. We conduct extensive simulation studies and show that the proposed method performs well in the situations considered. Finally, we illustrate the method using genomic instability data from breast cancer samples.
    Biometrics 03/2011; 67(1):164-73. · 1.83 Impact Factor