On the analysis of copy-number variations in genome-wide association studies: a translation of the family-based association test. Genetic Epidemiology 32(3):273
ABSTRACT Though there is increasing support for an important contribution of copy number variation (CNV) to the genetic architecture of complex disease, few methods have been developed for the analysis of such variation in the context of genetic association studies. In this paper, we propose a generalization of family-based association tests (FBATs) to allow for the analysis of CNVs at a genome-wide level. We translate the popular FBAT approach so that, instead of genotypes, raw intensity values that reflect copy number are used directly in the test statistic, thereby bypassing the need for a CNV genotyping algorithm. Moreover, both inherited and de novo CNVs can be analyzed without any prior knowledge about the type of CNV, making the approach easily applicable to large-scale association studies. All robustness properties of the genotype FBAT approach are maintained and all previously developed FBAT extensions, including FBATs for time-to-onset, multivariate FBATs, and FBAT-testing strategies, can be directly transferred to the analysis of CNVs. Using simulation studies, we evaluate the power and the robustness of the new approach. Furthermore, for those CNVs that can be genotyped, we compare FBATs based on genotype calls with FBATs that are directly based on the intensity data. An application to one of the first CNV genome-wide association studies of asthma identifies a very plausible candidate gene. A software implementation of the approach is freely available at http://www.hsph.harvard.edu/research/iuliana-ionita/software. The approach has also been completely integrated in the PBAT software package.
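The central idea of the abstract, using raw intensity values in place of genotypes in the FBAT statistic, can be illustrated with a minimal sketch. It assumes complete parent-offspring trios, takes the expected offspring intensity under the null to be the mean of the two parental intensities, and normalizes with an empirical variance. The function name `cnv_fbat`, the `offset` parameter, and the input layout are hypothetical; none of them is taken from the authors' software or from PBAT.

```python
import math


def cnv_fbat(trios, offset=0.0):
    """Intensity-based FBAT-style statistic for a single probe (illustrative only).

    trios: iterable of (trait, child, father, mother), where `trait` is the
    offspring phenotype and the other three values are probe intensities.
    offset: value subtracted from the trait (an assumed trait-centering choice).
    Returns (u, z, p): the score, its standardized value, and a two-sided
    p-value from the normal approximation.
    """
    u = 0.0  # sum of trait-weighted deviations from the parental expectation
    v = 0.0  # empirical variance estimate: sum of squared contributions
    for trait, child, father, mother in trios:
        expected = 0.5 * (father + mother)  # assumed null expectation
        contribution = (trait - offset) * (child - expected)
        u += contribution
        v += contribution * contribution
    if v == 0.0:
        raise ValueError("no informative trios for this probe")
    z = u / math.sqrt(v)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return u, z, p


if __name__ == "__main__":
    # Four affected offspring whose intensity exceeds the parental mean by 0.5.
    example = [(1, 0.5, 0.0, 0.0)] * 4
    print(cnv_fbat(example))
```

Because the statistic conditions on parental intensities, each probe can be tested this way without first calling CNV genotypes; a genome-wide scan would repeat the computation per probe and correct for multiple testing.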
- "The second strategy is to directly test the CNV associations from the intensity data without making CNV calls (Barnes et al., 2008; Ionita-Laza et al., 2008; Eleftherohorinou et al., 2011; Shi and Li, 2013). The simplest method is to directly test the association for each probe using LRR as a surrogate (Ionita-Laza et al., 2008). This method does not use spatial information of CNVs or the distribution of the intensity data and thus is not expected to be efficient."
ABSTRACT: Copy number variations (CNVs) constitute a major source of genetic variations in human populations and have been reported to be associated with complex diseases. Methods have been developed for detecting CNVs and testing CNV associations in genome-wide association studies (GWAS) based on SNP arrays. Commonly used two-step testing procedures work well only for long CNVs while direct CNV association testing methods work only for recurrent CNVs. Assuming that short CNVs disrupting any part of a given genomic region increase disease risk, we developed a variable threshold exact test (VTET) for testing disease associations of CNVs randomly distributed in the genome using intensity data from SNP arrays. By extensive simulations, we found that VTET outperformed two-step testing procedures based on existing CNV calling algorithms for short CNVs and that the performance of VTET was robust to the length of the genomic region. In addition, VTET had a comparable performance with CNVtools for testing the association of recurrent CNVs. Thus, we expect VTET to be useful for testing disease associations of both recurrent and randomly distributed CNVs using existing GWAS data. We applied VTET to a lung cancer GWAS and identified a genome-wide significant region on chromosome 18q22.3 for lung squamous cell carcinoma.
Frontiers in Genetics 03/2014; 5:53. DOI:10.3389/fgene.2014.00053
- "We next assessed the impact of family-based adjustment on association testing. Using the genome-wide aCGH data in 385 parent-child trios, we applied the CNV-FBAT algorithm both before and after family adjustment. Given that the adjustment procedure used local family data which aims to reconcile differences between parental and offspring copy number abundance, and because the association test assesses for differences between the observed offspring copy number and that expected from parental data, there was concern about the method possibly introducing systematic null bias and reducing statistical power."
ABSTRACT: Background: In recent years there has been a growing interest in the role of copy number variations (CNV) in genetic diseases. Though there has been rapid development of technologies and statistical methods devoted to the detection of CNVs from array data, the data quality issues inherent to most hybridization techniques remain a challenging problem in CNV association studies. Results: To help address these data quality issues in the context of family-based association studies, we introduce a statistical framework for intensity-based array data that takes family information into account for copy-number assignment. The method is an adaptation of traditional methods for modeling SNP genotype data that assume a Gaussian mixture model, whereby CNV calling is performed for all family members simultaneously and within-family data are leveraged to reduce CNV calls that are incompatible with Mendelian inheritance while still allowing de novo CNVs. Applying this method to simulation studies and a genome-wide association study in asthma, we find that our approach significantly improves CNV call accuracy and reduces the Mendelian inconsistency rates and false positive genotype calls. The results were validated using qPCR experiments. Conclusions: We have demonstrated that the use of family information can improve the quality of CNV calling and may yield more powerful association tests of CNVs.
BMC Bioinformatics 05/2013; 14(1):157. DOI:10.1186/1471-2105-14-157 · 2.67 Impact Factor
- "Of particular interest is its reported role in the regulation of the T cell response through promotion of the clearance of the T-cell receptor from the cell surface. It has previously been reported that a region of chromosome 7 containing the T-cell receptor gamma (TCRγ) gene is associated with asthma [29,30]."
ABSTRACT: Background: Despite the success of genome-wide association studies for asthma, few, if any, definitively causal variants have been identified and there is still a substantial portion of the heritability of the disease yet to be discovered. Some of this “missing heritability” may be accounted for by family-specific coding variants found to be segregating with asthma. Methods: To identify family-specific variants segregating with asthma, we recruited one family from a previous study of asthma that reported multiple asthmatic and non-asthmatic children. We performed whole-exome sequencing on all four children and both parents and identified coding variants segregating with asthma that were not found in other variant databases. Results: Ten novel variants were identified that were found in the two affected offspring and affected mother, but absent in the unaffected father and two unaffected offspring. Of these ten, variants in three genes (PDE4DIP, CBLB, and KALRN) were deemed of particular interest based on their functional prediction scores and previously reported function or asthma association. We did not identify any common risk variants segregating with asthma; however, we did observe an increase in the number of novel, nonsynonymous variants in asthma candidate genes in the asthmatic children compared to the non-asthmatic children. Conclusions: This is the first report applying exome sequencing to identify asthma susceptibility variants. Despite having sequenced only one family segregating asthma, we have identified several potentially functional variants in interesting asthma candidate genes. This will provide the basis for future work in which more families will be sequenced to identify variants across families that cluster within genes.
BMC Medical Genetics 10/2012; 13(1):95. DOI:10.1186/1471-2350-13-95 · 2.45 Impact Factor