Genetic association studies of cancer: Where do we go from here?

Fred Hutchinson Cancer Research Center, Seattle, Washington, United States
Cancer Epidemiology Biomarkers & Prevention (Impact Factor: 4.32). 06/2007; 16(5):864-5. DOI: 10.1158/1055-9965.EPI-07-0289
Source: PubMed
  • Source
    ABSTRACT: A large number of factors can affect the statistical power and bias of analyses of data from large cohort studies, including misclassification, correlated data, follow-up time, prevalence of the risk factor of interest, and prevalence of the outcome. This paper presents a method for simulating cohorts where individuals' risk is correlated within communities, recruitment is staggered over time, and outcomes are observed after different follow-up periods. Covariates and outcomes are misclassified, and Cox proportional hazards models are fit with a community-level frailty term. The effects on study power of varying effect sizes, prevalences, correlation, and misclassification are explored, as is the effect of varying the proportion of controls in nested case-control studies.
    Biometrical Journal 10/2010; 52(5):604-15. DOI:10.1002/bimj.200900277 · 1.15 Impact Factor
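    The simulation design described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `simulate_cohort`, all parameter names and default values, the gamma-distributed frailty, and the exponential event times are assumptions chosen for illustration, and the Cox model fitting step is omitted.

```python
import random


def simulate_cohort(n_communities=20, n_per_community=50, hazard_ratio=1.5,
                    baseline_hazard=0.02, frailty_variance=0.5,
                    exposure_prev=0.3, recruit_window=2.0, study_end=10.0,
                    sensitivity=0.9, specificity=0.95, seed=0):
    """Simulate one cohort; every name and default here is hypothetical."""
    rng = random.Random(seed)
    rows = []
    for community in range(n_communities):
        # Shared frailty (mean 1) makes risk correlated within a community.
        if frailty_variance > 0:
            frailty = rng.gammavariate(1.0 / frailty_variance, frailty_variance)
        else:
            frailty = 1.0
        for _ in range(n_per_community):
            exposed = rng.random() < exposure_prev
            # Staggered recruitment: later entry means shorter follow-up.
            entry = rng.uniform(0.0, recruit_window)
            max_follow_up = study_end - entry
            rate = baseline_hazard * frailty * (hazard_ratio if exposed else 1.0)
            event_time = rng.expovariate(rate)
            event = event_time <= max_follow_up
            # Misclassify the exposure using sensitivity and specificity.
            p_observed = sensitivity if exposed else 1.0 - specificity
            observed_exposed = rng.random() < p_observed
            rows.append({
                "community": community,
                "true_exposed": int(exposed),
                "observed_exposed": int(observed_exposed),
                "time": min(event_time, max_follow_up),
                "event": int(event),
            })
    return rows
```

    Under these assumptions, power for a given scenario could be estimated by simulating many cohorts with different seeds, fitting a frailty model to each, and counting how often the exposure effect is detected.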
  • Source
    ABSTRACT: Network propagation algorithms have proved useful for the analysis of high-dimensional genomic data. One limitation is that the current formulation only allows network propagation on positively weighted graphs. In this paper, we explore two signed network propagation algorithms and general optimization frameworks for detecting differential gene expressions and DNA copy number variations (CNV). The proposed algorithms consider both positive and negative relations in graphs to model gene up/down-regulation or amplification/deletion CNV events. The first algorithm (Signed-NP) integrates gene co-expressions and differential expressions for consistent and robust gene selection from microarray datasets by propagation on gene correlation graphs. The second algorithm (Signed-NPBi) identifies gene or CNV markers by propagation on sample-feature bipartite graphs to capture bi-clusters between samples and genomic features. Large scale experiments on several microarray gene expression datasets and CNV datasets validate that Signed-NP and Signed-NPBi perform better classification of gene expression and CNV data than standard network propagation. The experiments also demonstrate that Signed-NP is capable of selecting genes that are more biologically interpretable and consistent across multiple datasets, and Signed-NPBi can detect hidden CNV patterns in bi-clusters by smoothing on correlations between adjacent probes.
    Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine; 10/2012
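    The idea of propagating scores over a graph with both positive and negative edges can be sketched as follows. This is a generic signed label-propagation iteration, not necessarily the Signed-NP formulation from the paper: the function name `signed_propagation`, the normalization by absolute-weight degree, and the parameters `alpha`, `n_iter`, and `tol` are assumptions made for illustration.

```python
def signed_propagation(weights, initial, alpha=0.5, n_iter=200, tol=1e-9):
    """Iterate f <- alpha * S f + (1 - alpha) * y on a signed graph.

    weights: symmetric matrix (list of lists) with positive and negative entries.
    initial: starting scores y, for example differential expression values.
    """
    n = len(weights)
    # Degrees use absolute weights so negative edges do not cancel positive ones.
    degree = [sum(abs(w) for w in row) for row in weights]
    norm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if weights[i][j] != 0 and degree[i] > 0 and degree[j] > 0:
                norm[i][j] = weights[i][j] / (degree[i] * degree[j]) ** 0.5
    scores = list(initial)
    for _ in range(n_iter):
        updated = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * initial[i]
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return scores


# Toy graph: node 0 is positively linked to node 1 and negatively to node 2.
toy_weights = [[0, 1, -1],
               [1, 0, 0],
               [-1, 0, 0]]
toy_scores = signed_propagation(toy_weights, [1.0, 0.0, 0.0])
```

    In this toy graph the positively linked neighbor ends with a positive score and the negatively linked neighbor with a negative score of equal magnitude, which mirrors the up/down-regulation modeling described in the abstract.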
  • Source
    ABSTRACT: A central problem in biomarker discovery from large-scale gene expression or single nucleotide polymorphism (SNP) data is the computational challenge of taking into account the dependence among all the features. Methods that ignore the dependence usually identify non-reproducible biomarkers across independent datasets. We introduce a new graph-based semi-supervised feature classification algorithm to identify discriminative disease markers by learning on bipartite graphs. Our algorithm directly classifies the feature nodes in a bipartite graph as positive, negative or neutral with network propagation to capture the dependence among both samples and features (clinical and genetic variables) by exploring bi-cluster structures in a graph. Two features of our algorithm are: (1) it can find a globally optimal labeling to capture the dependence among all the features and thus generates highly reproducible results across independent microarray or other high-throughput datasets; (2) it is capable of handling hundreds of thousands of features and thus is particularly useful for biomarker identification from high-throughput gene expression and SNP data. In addition, although designed for classifying features, our algorithm can also simultaneously classify test samples for disease prognosis/diagnosis. We applied the network propagation algorithm to study three large-scale breast cancer datasets. Our algorithm achieved competitive classification performance compared with SVMs and other baseline methods, and identified several markers with clinical or biological relevance to the disease. More importantly, our algorithm also identified highly reproducible marker genes and enriched functions from the independent datasets. Supplementary results and source code are available at [...]. Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2008; 24(18):2023-9. DOI:10.1093/bioinformatics/btn383 · 4.62 Impact Factor