Rare Variants in Ischemic Stroke: An Exome Pilot Study

The University of Hong Kong, Hong Kong
PLoS ONE (Impact Factor: 3.23). 04/2012; 7(4):e35591. DOI: 10.1371/journal.pone.0035591
Source: PubMed


The genetic architecture of ischemic stroke is complex and is likely to include rare or low frequency variants with high penetrance and large effect sizes. Such variants are likely to provide important insights into disease pathogenesis compared to common variants with small effect sizes. Because a significant portion of human functional variation may derive from the protein-coding portion of genes, we undertook a pilot study to identify variation across the human exome (i.e., the coding exons across the entire human genome) in 10 ischemic stroke cases. Our efforts focused on evaluating the feasibility and identifying the difficulties of this type of research as it applies to ischemic stroke. The cases included 8 African-Americans and 2 Caucasians selected on the basis of similar stroke subtypes and by implementing a case selection algorithm that emphasized the genetic contribution to stroke risk. Following construction of paired-end sequencing libraries, all predicted human exons in each sample were captured and sequenced. Sequencing generated an average of 25.5 million read pairs (75 bp×2) and 3.8 Gbp per sample. After passing quality filters, screening the exomes against dbSNP demonstrated an average of 2839 novel SNPs among African-Americans and 1105 among Caucasians. In an aggregate analysis, 48 genes were identified as having at least one rare variant across all stroke cases. One gene, CSN3, identified by screening our prior GWAS results in conjunction with our exome results, was found to contain an interesting coding polymorphism as well as excess rare variation compared with the other genes evaluated. In conclusion, while rare coding variants may predispose to ischemic stroke, this has yet to be definitively proven. Our study demonstrates the complexities of such research and highlights that while exome data can be obtained, the optimal analytical methods have yet to be determined.
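The reported per-sample yield can be checked against the read counts with simple arithmetic. The sketch below is only an illustrative consistency check, not part of the study's methods; the function name `sequenced_gbp` is hypothetical, and it assumes every read in a pair has the full 75 bp length.

```python
def sequenced_gbp(read_pairs: float, read_length_bp: int, reads_per_pair: int = 2) -> float:
    """Return total sequenced bases in Gbp (1 Gbp = 1e9 bp).

    Assumes every read has the full nominal length (no trimming).
    """
    return read_pairs * reads_per_pair * read_length_bp / 1e9


# Values taken from the abstract: 25.5 million read pairs, 75 bp x 2.
yield_gbp = sequenced_gbp(25.5e6, 75)
print(f"{yield_gbp:.3f} Gbp per sample")
```

Under these assumptions the product is 3.825 Gbp, consistent with the 3.8 Gbp per sample stated in the abstract.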


Available from: Braxton D Mitchell,
    • "Our pipeline uses BWA [43], based on BWT, a fast and memory-efficient read aligner. BWA is the most common choice for WES alignment [44-46]. It allows gapped alignment, using very little memory. "
    ABSTRACT: Background: The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of eukaryotic genomes; exomes are ideal for helping us understand high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps which need to be suitably designed and arranged into an efficient pipeline. Managing an NGS analysis pipeline and the huge amount of data it produces requires nontrivial IT skills and computational power. Results: Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access through a web interface to intermediate and final results. The WEP pipeline is composed of several steps: 1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicates removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) result storage into custom databases to allow cross-linking and intersections, statistics and much more. In order to overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data with customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common literature published in recent years, are also provided when the analysis completes. Conclusions: Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. The interface provides easy and intuitive access for data submission and a user-friendly web interface for annotated variant visualization. Users without IT expertise can access, through WEP, the most up-to-date and tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives. The web tool is available at the following web address: http://www.caspur.it/wep.
    BMC Bioinformatics 04/2013; 14(7). DOI:10.1186/1471-2105-14-S7-S11 · 2.58 Impact Factor
    • "Stroke Cole et al. [2012] "
    ABSTRACT: Genetic risk factors that underlie many rare and common neurological diseases remain poorly understood because of the multi-factorial and heterogeneous nature of these disorders. Although genome-wide association studies (GWAS) have successfully uncovered numerous susceptibility genes for these diseases, odds ratios associated with risk alleles are generally low and account for only a small proportion of estimated heritability. These results imply that rare (present in <5% of the population) but not causative variants, which usually have large effect sizes and cannot be captured by GWAS, are involved in the pathogenesis of these diseases. With the decreasing cost of next-generation sequencing (NGS) technologies, whole-genome and whole-exome sequencing have enabled the rapid identification of rare variants with large effect sizes, leading to major progress in understanding the basis of many Mendelian neurological conditions as well as complex neurological diseases. In this article, we review recent NGS-based studies that aimed to investigate genetic causes of neurological diseases, including Alzheimer’s disease, Parkinson’s disease, epilepsy, multiple sclerosis, stroke, amyotrophic lateral sclerosis and spinocerebellar ataxias. We also discuss future directions for NGS applications.
    03/2013; 1(1). DOI:10.3978/j.issn.2305-5839.2014.11.11
    • "We selected SNPs with MAF higher than 10% in the Han Chinese population (CHB) using HapMap project data; this approach is not suited for situations where genetic architecture is such that multiple rare disease-causing variants contribute significantly to disease risk. Recent studies demonstrate that identification of rare variants may lead to critically important insights about disease etiology through implication of new genes and/or pathways [43], [44]. The rare variants in the ADIPOQ gene should be investigated to clarify their susceptibility to the development of COPD. "
    ABSTRACT: Adiponectin is reported to be related to the development of chronic obstructive pulmonary disease (COPD). Genetic variants in the gene encoding adiponectin (ADIPOQ) have been reported to be associated with adiponectin level in several genome-wide linkage and association studies. However, relatively little is known about the effects of ADIPOQ gene variants on COPD susceptibility. We determined the frequencies of single-nucleotide polymorphisms (SNPs) in ADIPOQ in a Chinese Han population and their possible association with COPD susceptibility. We conducted a case-control study of 279 COPD patients and 367 age- and gender-distribution-matched control subjects. Seven tagging SNPs in ADIPOQ, including rs710445, rs16861205, rs822396, rs7627128, rs1501299, rs3821799 and rs1063537, were genotyped by SNaPshot. Association analysis of genotypes/alleles and haplotypes constructed from these loci with COPD was conducted under different genetic models. The alleles and genotypes of rs1501299 were distributed significantly differently in COPD patients and controls (allele: P = 0.002, OR = 1.43 and 95%CI = 1.14-1.79; genotype: P = 0.008). The allele A at rs1501299 was potentially associated with an increased risk of COPD in dominant model analyses (P = 0.009; OR: 1.54; 95%CI: 1.11-2.13), recessive model analyses (P = 0.015; OR: 1.75; 95% CI: 1.11-2.75) and additive model analyses (P = 0.003; OR: 2.11; 95% CI: 1.29-3.47). In haplotype analysis, we observed that haplotypes AAAAACT and GGACCTC had protective effects, while haplotypes AGAACTC, AGGCCTC, GGAACTC, GGACACT and GGGCCTC were significantly associated with an increased risk of COPD. We conducted the first investigation of the association between the SNPs in ADIPOQ and COPD risk. Our current findings suggest that ADIPOQ may be a potential risk gene for COPD. Further studies in larger groups are warranted to confirm our results.
    PLoS ONE 11/2012; 7(11):e50848. DOI:10.1371/journal.pone.0050848 · 3.23 Impact Factor