Article

Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes.

Clemson University Genomics Institute, Clemson University, 51 New Cherry Street, Clemson, SC 29634, USA.
BMC Genomics (Impact Factor: 4.04). 01/2011; 12:379. DOI: 10.1186/1471-2164-12-379
Source: PubMed

ABSTRACT: BAC-based physical maps provide a framework for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library.
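The clone-selection step described above can be sketched as follows. This is a minimal illustration that assumes each minimum tiling path (MTP) clone has start and end coordinates on the physical map. The abstract does not describe the software used to choose the twenty-seven BACs, so the helper names (`select_pool`, `tiling_gaps`), the clone names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacClone:
    """A BAC clone placed on the physical map (hypothetical coordinates, in bp)."""
    clone_id: str
    start: int
    end: int


def select_pool(mtp, region_start, region_end):
    """Return the MTP clones that overlap [region_start, region_end), sorted by start."""
    hits = [c for c in mtp if c.start < region_end and c.end > region_start]
    return sorted(hits, key=lambda c: c.start)


def tiling_gaps(pool, region_start, region_end):
    """Return (start, end) stretches of the region not covered by any pooled clone.

    Assumes `pool` is sorted by start coordinate, as returned by select_pool.
    """
    gaps = []
    covered_to = region_start
    for clone in pool:
        if clone.start > covered_to:
            gaps.append((covered_to, clone.start))
        covered_to = max(covered_to, clone.end)
    if covered_to < region_end:
        gaps.append((covered_to, region_end))
    return gaps


# Hypothetical minimum tiling path; clone names and coordinates are invented.
mtp = [
    BacClone("BAC_A", 0, 150_000),
    BacClone("BAC_B", 120_000, 260_000),
    BacClone("BAC_C", 300_000, 450_000),
    BacClone("BAC_D", 900_000, 1_000_000),
]
pool = select_pool(mtp, 100_000, 400_000)
print([c.clone_id for c in pool])           # ['BAC_A', 'BAC_B', 'BAC_C']
print(tiling_gaps(pool, 100_000, 400_000))  # [(260000, 300000)]
```

In this invented example, the uncovered stretch between `BAC_B` and `BAC_C` shows where the pooled clones would leave a hole in the target interval, one reason the tiling path is checked before the pool is built.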
This pooled BAC approach was used to sequence and assemble a QTL-rich region of ~3 Mbp, represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverage from paired-end and linear 454 libraries, we generated multiple assemblies of varying quality. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs that had been individually sequenced and assembled using Sanger reads. An optimal mixture of reads for assembly was identified. Furthermore, we found that an assembly of sufficient quality to serve as a reference genome template could be obtained even with reduced sequencing depth. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight.
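The abstract compares assemblies built from different mixtures of paired-end and linear 454 read coverage and reports that a reduced depth still sufficed. Depth is conventionally estimated as total sequenced bases divided by target size. The following is a minimal sketch under stated assumptions: the target is treated as a single ~3 Mbp sequence, and reads from BAC vector or E. coli host DNA are ignored. The read counts, read lengths, and function names are hypothetical and are not the study's values.

```python
import math

# Approximate size of the targeted LG5 region (~3 Mbp, from the abstract).
REGION_SIZE_BP = 3_000_000


def fold_coverage(read_count, mean_read_length_bp, target_size_bp=REGION_SIZE_BP):
    """Expected depth: total sequenced bases divided by the target size."""
    return read_count * mean_read_length_bp / target_size_bp


def reads_for_depth(target_depth, mean_read_length_bp, target_size_bp=REGION_SIZE_BP):
    """Smallest number of reads expected to reach `target_depth` over the target."""
    return math.ceil(target_depth * target_size_bp / mean_read_length_bp)


# Hypothetical library mixture: (read count, mean read length in bp).
libraries = {
    "linear": (60_000, 400),
    "paired_end": (100_000, 300),
}
per_library = {name: fold_coverage(n, length) for name, (n, length) in libraries.items()}
print(per_library)                # {'linear': 8.0, 'paired_end': 10.0}
print(sum(per_library.values()))  # 18.0
print(reads_for_depth(20, 400))   # 150000
```

Summing per-library depths provides a simple way to express a "mixture" as fractions of total coverage. The study's actual mixtures and depths are not given in the abstract.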
Our results, like those of other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method for targeting sub-genomic regions for sequencing. Although we focused on a single QTL region, other important QTL regions could be sequenced similarly, allowing biological discovery to proceed before a high-quality whole-genome assembly is completed.

  • ABSTRACT: BACKGROUND: Theobroma cacao L. cultivar Matina 1-6 belongs to the most widely cultivated cacao type. The availability of its genome sequence and of methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. RESULTS: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than that of a sequenced Criollo cultivar and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10 times the scaffold N50 and 4 times the contig N50 of the Criollo assembly, and includes 111 Mbp more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, compared with 10.9% for Criollo. Through a combination of haplotype, association mapping, and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. A SNP within TcMYB113, located in the target site of a trans-acting siRNA that is highly conserved in dicots, appears to affect transcript levels of this gene and therefore pod color variation. CONCLUSIONS: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
    Genome Biology 06/2013; 14(6):R53. · 10.47 Impact Factor
  • ABSTRACT: Theobroma cacao (cacao) is a tree cultivated throughout the tropics for its seeds, which are the source of both chocolate and cocoa butter. Genetic marker development for marker-assisted selection (MAS) is critical for the success of cacao breeding for disease resistance and yield. To develop conserved ortholog set II (COSII) single-nucleotide polymorphism (SNP) markers for MAS in cacao, we used three strategies and three types of cacao genetic and sequence data to identify and map 98 cacao COSII genes. The resources available when these studies were first undertaken dictated the strategy used. For the first strategy, SNPs were identified using cacao expressed sequence tags homologous to COSII sequences. Strategy II used a leaf transcriptome of cacao genotype “Matina 1–6”, and Strategy III used the genomic sequence of a 3-Mb region of “Matina 1–6” linkage group 5 associated with an important quantitative trait locus (QTL) for resistance to black pod. We identified SNP markers for 83 of the 98 mapped COSII genes, and 19 of these SNP markers co-locate with QTLs. These COSII SNP markers, the first identified for cacao, will be used for genotyping and off-typing in cacao breeding programs and for genetic mapping and syntenic studies that trace the co-location of genes regulating important traits in cacao and other species. Keywords: Chocolate; Genetic mapping; Molecular markers; Quantitative trait loci (QTL) co-localization
    Tree Genetics & Genomes 02/2011; 8(1):97-111. · 2.44 Impact Factor
  • ABSTRACT: For lignocellulosic bioenergy to become a viable alternative to traditional energy production methods, rapid increases in conversion efficiency and biomass yield must be achieved. Productivity in bioenergy production can be increased through concomitant gains in processing efficiency and genetic improvement of feedstocks that have the potential for bioenergy production at an industrial scale. The purpose of this review is to explore the genetic and genomic resource landscape for the improvement of a specific group of bioenergy feedstocks, the C4 bioenergy grasses. First, bioenergy grass feedstock traits relevant to biochemical conversion are examined. We then outline the genetic resources available in bioenergy grasses for mapping bioenergy traits to DNA markers and genes. This is followed by a discussion of genomic tools and how they can be applied to understanding the genetic mechanisms underlying bioenergy grass feedstock traits, leading to further opportunities for improvement.
    Biotechnology for Biofuels 11/2012; 5(1):80. · 6.22 Impact Factor
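The Genome Biology abstract listed above compares the Matina 1-6 and Criollo assemblies by contig N50, scaffold N50, and percentage of gap sequence. The sketch below shows how these statistics are conventionally computed. It assumes that gaps are encoded as runs of N in scaffold sequences and that contigs are obtained by splitting scaffolds at every N run. Real pipelines may impose a minimum gap length. The example sequences and lengths are hypothetical, not data from either assembly.

```python
import re


def n50(lengths):
    """Length at which the cumulative sum (longest first) first reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one sequence")


def contigs_from_scaffold(scaffold_seq):
    """Split a scaffold into contigs at every run of N/n gap characters."""
    return [piece for piece in re.split(r"[Nn]+", scaffold_seq) if piece]


def gap_percent(scaffolds):
    """Percentage of all scaffold bases that are N/n gap placeholders."""
    total = sum(len(s) for s in scaffolds)
    if total == 0:
        raise ValueError("gap_percent() needs non-empty scaffolds")
    gaps = sum(s.upper().count("N") for s in scaffolds)
    return 100.0 * gaps / total


# Hypothetical inputs for illustration only.
print(n50([80, 70, 50, 40, 30, 20]))                # 70
scaffolds = ["ACGTNNNNAC", "ACGTACGTAC"]
print(gap_percent(scaffolds))                       # 20.0
contigs = [c for s in scaffolds for c in contigs_from_scaffold(s)]
print([len(c) for c in contigs])                    # [4, 2, 10]
print(n50([len(s) for s in scaffolds]), n50([len(c) for c in contigs]))  # 10 10
```

Computing contig N50 from the same scaffolds after splitting at gaps is what makes the two figures comparable. A more fragmented assembly shows a larger drop from scaffold N50 to contig N50.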

Available from May 30, 2014