Genetic Association Studies of Cancer: Where Do We Go from Here?

Article in Cancer Epidemiology Biomarkers & Prevention 16(5):864-5 · June 2007
DOI: 10.1158/1055-9965.EPI-07-0289 · Source: PubMed
    • "More generally, comprehensive power calculations allowing for a wide variety of factors, including community- or family-level dependence, will be a necessary component of interpreting the analyses of genotype and disease incidence data from large cohort studies. There are studies which have convincingly identified or replicated genetic and environmental association for a range of chronic diseases, though the conclusions are often inconsistent (see Rebbeck et al., 2007). In addition to possible scientific and technical issues which can cause spurious correlations, statistical inference methods which ignore dependence will underestimate parameter uncertainty and be prone to signaling artificial effects."
    ABSTRACT: A large number of factors can affect the statistical power and bias of analyses of data from large cohort studies, including misclassification, correlated data, follow-up time, prevalence of the risk factor of interest, and prevalence of the outcome. This paper presents a method for simulating cohorts where individuals' risk is correlated within communities, recruitment is staggered over time, and outcomes are observed after different follow-up periods. Covariates and outcomes are misclassified, and Cox proportional hazards models are fit with a community-level frailty term. The effects on study power of varying effect sizes, prevalences, correlation, and misclassification are explored, as well as of varying the proportion of controls in nested case-control studies.
    Article · Oct 2010
    • "Environmental influences are theorized to operate in this interaction by moderating the phenotypic expression of genotypes. As a result, many investigators are calling for studies of gene-environment interaction for common diseases such as heart disease, cancer and even alcoholism [Schwartz et al., 1996; Rebbeck et al., 2007; Enoch 2006]. Landscape epidemiology encompasses most of the environmental variables involved in human ecology. "
    ABSTRACT: Complex diseases such as cancer and heart disease result from interactions between an individual's genetics and environment, i.e. their human ecology. Rates of complex diseases have consistently demonstrated geographic patterns of incidence, or spatial "clusters" of increased incidence relative to the general population. Likewise, genetic subpopulations and environmental influences are not evenly distributed across space. Merging appropriate methods from genetic epidemiology, ecology and geography will provide a more complete understanding of the spatial interactions between genetics and environment that result in spatial patterning of disease rates. Geographic information systems (GIS), which are tools designed specifically for dealing with geographic data and performing spatial analyses to determine their relationship, are key to this kind of data integration. Here the authors introduce a new interdisciplinary paradigm, ecogeographic genetic epidemiology, which uses GIS and spatial statistical analyses to layer genetic subpopulation and environmental data with disease rates and thereby discern the complex gene-environment interactions which result in spatial patterns of incidence.
    Article · May 2009
    • "Recent developments in high-throughput technology allow large-scale measurement of genomic variations such as gene expression and single nucleotide polymorphisms (SNPs) of a population. Associating these genomic and genetic variations with disease-related phenotypes provides good potential for elucidating etiology of diseases (Rebbeck et al., 2007). It has also been shown that the discovered biomarkers can possibly provide better prognosis and diagnosis than the currently available clinical measures for risk assessment of patients with various diseases (Gevaert et al., 2006; van't Veer et al., 2002). "
    ABSTRACT: A central problem in biomarker discovery from large-scale gene expression or single nucleotide polymorphism (SNP) data is the computational challenge of taking into account the dependence among all the features. Methods that ignore the dependence usually identify non-reproducible biomarkers across independent datasets. We introduce a new graph-based semi-supervised feature classification algorithm to identify discriminative disease markers by learning on bipartite graphs. Our algorithm directly classifies the feature nodes in a bipartite graph as positive, negative or neutral with network propagation to capture the dependence among both samples and features (clinical and genetic variables) by exploring bi-cluster structures in a graph. Two features of our algorithm are: (1) it can find a globally optimal labeling to capture the dependence among all the features and thus generates highly reproducible results across independent microarray or other high-throughput datasets; (2) it is capable of handling hundreds of thousands of features and thus is particularly useful for biomarker identification from high-throughput gene expression and SNP data. In addition, although designed for classifying features, our algorithm can also simultaneously classify test samples for disease prognosis/diagnosis. We applied the network propagation algorithm to study three large-scale breast cancer datasets. Our algorithm achieved competitive classification performance compared with SVMs and other baseline methods, and identified several markers with clinical or biological relevance to the disease. More importantly, our algorithm also identified highly reproducible marker genes and enriched functions from the independent datasets. Supplementary results and source code are available at Supplementary data are available at Bioinformatics online.
    Article · Sep 2008