ABSTRACT: The identification of the causative genetic variants in quantitative trait loci (QTL) influencing phenotypic traits is challenging, especially in crosses between outbred strains. We have previously identified several QTL influencing tameness and aggression in a cross between two lines of wild-derived, outbred rats (Rattus norvegicus) selected for their behavior towards humans. Here, we use targeted sequence capture and massively parallel sequencing of all genes in the strongest QTL in the founder animals of the cross. We identify many novel sequence variants, several of which are potentially functionally relevant. The QTL contains several regions where either the tame or the aggressive founders contain no sequence variation, and two regions where alternative haplotypes are fixed between the founders. A re-analysis of the QTL signal showed that the causative site is likely to be fixed among the tame founder animals, but that several causative alleles may segregate among the aggressive founder animals. Using a formal test for the detection of positive selection, we find 10 putative positively selected regions, some of which are close to genes known to influence behavior. Together, these results show that the QTL is probably not caused by a single selected site, but may instead represent the joint effects of several sites that were targets of polygenic selection.
ABSTRACT: X-linked mental retardation (XLMR) is a complex human disease that causes intellectual disability. Causal mutations have been found in approximately 90 X-linked genes; however, the molecular and biological functions of many of these genetically defined XLMR genes remain unknown. PHF8 (PHD (plant homeodomain) finger protein 8) is a JmjC domain-containing protein, and its mutations have been found in patients with XLMR and craniofacial deformities. Here we provide multiple lines of evidence establishing PHF8 as the first mono-methyl histone H4 lysine 20 (H4K20me1) demethylase, with additional activities towards histone H3K9me1 and me2. PHF8 localizes around the transcription start sites (TSS) of approximately 7,000 RefSeq genes and in gene bodies and intergenic regions (non-TSS). PHF8 depletion resulted in upregulation of H4K20me1 and H3K9me1 at the TSS and of H3K9me2 at the non-TSS sites, demonstrating differential substrate specificities at different target locations. PHF8 positively regulates gene expression, which is dependent on its H3K4me3-binding PHD and catalytic domains. Importantly, patient mutations significantly compromised PHF8 catalytic function. PHF8 regulates cell survival in the zebrafish brain and jaw development, thus providing a potentially relevant biological context for understanding the clinical symptoms of patients carrying PHF8 mutations. Lastly, genetic and molecular evidence supports a model whereby PHF8 regulates zebrafish neuronal cell survival and jaw development in part by directly regulating the expression of the homeodomain transcription factor MSX1/MSXB, which functions downstream of multiple signalling and developmental pathways. Our findings indicate that an imbalance of histone methylation dynamics has a critical role in XLMR.
ABSTRACT: While whole-genome resequencing remains expensive, genomic partitioning provides an affordable means of targeting sequencing efforts towards regions of high interest. There are several competing methods for targeted capture; these include molecular inversion probes, microdroplet-segregated multiplex PCR, and on-array or in-solution capture-by-hybridization. Enrichment of the human exome by array hybridization has been successfully applied to pinpoint the causative allele of Mendelian disorders. This protocol focuses on the application of Agilent 1M arrays for capture-by-hybridization and sequencing on the Illumina platform, although the library preparation method may be adaptable to other vendors' array platforms and sequencing technologies.
Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] 07/2010; Chapter 18:Unit 18.3.
ABSTRACT: It is now possible to perform whole-genome shotgun sequencing as well as capture of specific genomic regions for extinct organisms. However, targeted resequencing of large parts of nuclear genomes has yet to be demonstrated for ancient DNA. Here we show that hybridization capture on microarrays can successfully recover more than a megabase of target regions from Neandertal DNA even in the presence of approximately 99.8% microbial DNA. Using this approach, we have sequenced approximately 14,000 protein-coding positions inferred to have changed on the human lineage since the last common ancestor shared with chimpanzees. By generating the sequence of one Neandertal and 50 present-day humans at these positions, we have identified 88 amino acid substitutions that have become fixed in humans since our divergence from the Neandertals.
ABSTRACT: The classical candidate-gene approach has failed to identify novel breast cancer susceptibility genes. Massively parallel sequencing technology now allows studies that were unaffordable a few years ago. However, analysis protocols are not yet sufficiently developed to extract all the information from the huge amount of data obtained.
In this study, we performed high-throughput sequencing of two regions on chromosomes 3 and 6 that our group recently identified, through linkage studies, as candidate regions harbouring breast cancer susceptibility genes. To enrich for the coding regions of all described genes located in both candidate regions, we used a hybrid-selection method on tiling microarrays.
We developed an analysis pipeline based on the SOAP aligner that identifies candidate variants with a high true-positive confirmation rate (0.89), and with it we identified eight variants considered candidates for functional studies. The results suggest that the present strategy might be a valid second step for identifying high-penetrance genes.
PLoS ONE 01/2010; 5(4):e9976. · 3.53 Impact Factor
ABSTRACT: Genome-wide association studies suggest that common genetic variants explain only a modest fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability. Although DNA sequencing costs have fallen markedly, they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions ('exomes'), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of 12 humans. These include eight HapMap individuals representing three populations, and four unrelated individuals with a rare dominantly inherited disorder, Freeman-Sheldon syndrome (FSS). We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for Mendelian disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of non-synonymous variants by predicted functional impact.
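The gene-discovery idea in the FSS proof-of-concept can be sketched as a simple filter-and-intersect step: keep only protein-altering variants not already known as common, then ask which genes carry such a variant in every affected individual. This is a minimal sketch under stated assumptions, not the authors' pipeline: the tuple layout `(gene, variant_id, is_nonsynonymous)`, the `known_common` set (standing in for variants seen in unaffected exomes such as the HapMap samples), the function name `candidate_genes`, and the gene labels are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical variant record: (gene, variant_id, is_nonsynonymous)
Variant = Tuple[str, str, bool]


def candidate_genes(
    affected: List[Iterable[Variant]],
    known_common: Set[str],
    min_cases: Optional[int] = None,
) -> Set[str]:
    """Return genes carrying a novel non-synonymous variant in >= min_cases individuals."""
    if min_cases is None:
        min_cases = len(affected)  # default: gene must be hit in every affected individual
    counts: Dict[str, int] = {}
    for individual in affected:
        # Count each gene at most once per individual.
        genes_hit = {
            gene
            for gene, variant_id, nonsyn in individual
            if nonsyn and variant_id not in known_common
        }
        for gene in genes_hit:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= min_cases}


# Toy data: four unrelated affected individuals (all names hypothetical).
cases = [
    [("GENE_A", "v1", True), ("GENE_B", "rs1", True)],
    [("GENE_A", "v2", True), ("GENE_C", "v3", False)],
    [("GENE_A", "v4", True), ("GENE_B", "v5", True)],
    [("GENE_A", "v6", True)],
]
common = {"rs1"}  # stands in for variants already seen in unaffected exomes
print(candidate_genes(cases, common))  # {'GENE_A'}
```

Lowering `min_cases` (for example to 3 of 4) would tolerate one individual whose causal variant was missed, at the cost of a longer candidate list.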
ABSTRACT: DNA methylation stabilizes developmentally programmed gene expression states. Aberrant methylation is associated with disease progression and is a common feature of cancer genomes. Presently, few methods enable quantitative, large-scale, single-base resolution mapping of DNA methylation states in desired regions of a complex mammalian genome. Here, we present an approach that combines array-based hybrid selection and massively parallel bisulfite sequencing to profile DNA methylation in genomic regions spanning hundreds of thousands of bases. This single-molecule strategy enables methylation variable positions to be quantitatively examined with high sampling precision. Using bisulfite capture, we assessed methylation patterns across 324 randomly selected CpG islands (CGI) representing more than 25,000 CpG sites. A single lane of Illumina sequencing permitted methylation states to be definitively called for >90% of target sites. The accuracy of the hybrid-selection approach was verified using conventional bisulfite capillary sequencing of cloned PCR products amplified from a subset of the selected regions. This confirmed that even partially methylated states could be successfully called. A comparison of human primary and cancer cells revealed multiple differentially methylated regions. More than 25% of islands showed complex methylation patterns, either with partial methylation states defining the entire CGI or with contrasting methylation states appearing in specific regional blocks within the island. We observed that transitions in methylation state often correlate with genomic landmarks, including transcriptional start sites and intron-exon junctions. Methylation, along with specific histone marks, was enriched in exonic regions, suggesting that chromatin states can foreshadow the content of mature mRNAs.
Genome Research 09/2009; 19(9):1593-605. · 14.40 Impact Factor
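Bisulfite treatment converts unmethylated cytosines so that they are read as thymine, while methylated cytosines are still read as cytosine. The methylation level at a CpG can therefore be estimated as the fraction of C calls among C plus T calls at that position. The sketch below is illustrative only: it assumes forward-strand reads already aligned to the reference without gaps, given as hypothetical `(start, sequence)` pairs, and the function name `cpg_methylation` and the `min_depth` threshold are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cpg_methylation(
    ref: str, reads: List[Tuple[int, str]], min_depth: int = 1
) -> Dict[int, float]:
    """Estimate per-CpG methylation as C / (C + T) from forward-strand bisulfite reads."""
    # 0-based positions of the C in every CpG dinucleotide of the reference.
    cpgs = [i for i in range(len(ref) - 1) if ref[i : i + 2] == "CG"]
    cpg_set = set(cpgs)
    meth: Dict[int, int] = defaultdict(int)    # C observed: protected, i.e. methylated
    unmeth: Dict[int, int] = defaultdict(int)  # T observed: converted, i.e. unmethylated
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in cpg_set:
                if base == "C":
                    meth[pos] += 1
                elif base == "T":
                    unmeth[pos] += 1
                # Any other base (sequencing error, SNP) is ignored.
    result: Dict[int, float] = {}
    for pos in cpgs:
        depth = meth[pos] + unmeth[pos]
        if depth >= min_depth:  # only call sites with enough informative reads
            result[pos] = meth[pos] / depth
    return result


ref = "ACGTTCGA"  # CpGs start at positions 1 and 5
reads = [(0, "ACGTTCGA"), (0, "ATGTTTGA"), (0, "ACGTTTGA"), (4, "TCGA")]
print(cpg_methylation(ref, reads))  # position 1 -> 2/3, position 5 -> 0.5
```

Reporting a fraction rather than a binary call is what lets partially methylated sites, like those described above, be represented.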
ABSTRACT: Complementary techniques that deepen information content and minimize reagent costs are required to realize the full potential of massively parallel sequencing. Here, we describe a resequencing approach that directs focus to genomic regions of high interest by combining hybridization-based purification of multi-megabase regions with sequencing on the Illumina Genome Analyzer (GA). The capture matrix is created by a microarray on which probes can be programmed as desired to target any non-repeat portion of the genome, while the method requires only a basic familiarity with microarray hybridization. We present a detailed protocol suitable for 1-2 µg of input genomic DNA and highlight key design tips with which high specificity (>65% of reads stem from enriched exons) and high sensitivity (98% targeted base pair coverage) can be achieved. We have successfully applied this to the enrichment of coding regions, in both human and mouse, ranging from 0.5 to 4 Mb in length. From genomic DNA library production to base-called sequences, this procedure takes approximately 9-10 days, inclusive of array captures and one Illumina flow cell run.
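The two capture metrics quoted above can be computed from read and target coordinates: specificity as the fraction of reads that fall in targeted regions, and sensitivity as the fraction of targeted bases covered by at least one read. This is a minimal sketch under stated assumptions: a single chromosome, half-open `[start, end)` intervals, a read counted as on-target if it overlaps any target by one base or more, and a coverage threshold of one read. The function name `capture_metrics` is hypothetical, and the exact definitions used in the protocol may differ.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def capture_metrics(
    targets: List[Interval], reads: List[Interval], min_depth: int = 1
) -> Tuple[float, float]:
    """Return (specificity, sensitivity) for a hybrid-capture experiment."""
    # Specificity: share of reads overlapping any target interval.
    on_target = sum(1 for r in reads if any(overlaps(r, t) for t in targets))
    specificity = on_target / len(reads) if reads else 0.0
    # Per-base read depth (fine for a sketch; real tools use interval arithmetic).
    depth: Dict[int, int] = {}
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] = depth.get(pos, 0) + 1
    # A set so that overlapping targets are not double-counted.
    target_bases: Set[int] = set()
    for start, end in targets:
        target_bases.update(range(start, end))
    covered = sum(1 for p in target_bases if depth.get(p, 0) >= min_depth)
    sensitivity = covered / len(target_bases) if target_bases else 0.0
    return specificity, sensitivity


targets = [(100, 200), (300, 400)]
reads = [(90, 140), (150, 210), (310, 360), (500, 550)]
print(capture_metrics(targets, reads))  # (0.75, 0.7)
```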
ABSTRACT: The most widely used method for detecting genome-wide protein-DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Here, we conducted the first objective analysis of tiling array platforms, amplification procedures, and signal detection algorithms in a simulated ChIP-chip experiment. Mixtures of human genomic DNA and "spike-ins" composed of nearly 100 human sequences at various concentrations were hybridized to four tiling array platforms by eight independent groups. Blind to the number of spike-ins, their locations, and the range of concentrations, each group made predictions of the spike-in locations. We found that microarray platform choice is not the primary determinant of overall performance. In fact, variation in performance between labs, protocols, and algorithms within the same array platform was greater than the variation in performance between array platforms. However, each array platform had unique performance characteristics that varied with tiling resolution and the number of replicates, which have implications for cost versus detection power. Long oligonucleotide arrays were slightly more sensitive at detecting very low enrichment. On all platforms, simple sequence repeats and genome redundancy tended to result in false positives. LM-PCR and WGA, the most popular sample amplification techniques, reproduced relative enrichment levels with high fidelity. Performance among signal detection algorithms was heavily dependent on array platform. The spike-in DNA samples and the data presented here provide a stable benchmark against which future ChIP platforms, protocol improvements, and analysis methods can be evaluated.
Genome Research 04/2008; 18(3):393-403. · 14.40 Impact Factor