Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

Clemson University Genomics Institute, Clemson University, 51 New Cherry Street, Clemson, SC 29634, USA.
BMC Genomics (Impact Factor: 4.04). 07/2011; 12:379. DOI: 10.1186/1471-2164-12-379
Source: PubMed

ABSTRACT BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library.
This pooled BAC approach was taken to sequence and assemble a QTL-rich region of ~3 Mbp, represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight.
Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced, allowing biological discovery to take place before a high-quality whole-genome assembly is completed.
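
The clone-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each BAC is reduced to a name plus start and end coordinates on the physical map (a simplification; real maps are built from fingerprint contigs), and it uses a generic greedy interval cover, which is not necessarily the procedure the authors used. The function name `minimum_tiling_path` and the coordinates are hypothetical.

```python
from typing import List, Tuple

# (clone name, start, end) in hypothetical physical-map coordinates
Clone = Tuple[str, int, int]


def minimum_tiling_path(clones: List[Clone],
                        region_start: int,
                        region_end: int) -> List[Clone]:
    """Greedy interval cover of [region_start, region_end].

    At each step, among the clones that start at or before the position
    covered so far, pick the one that extends furthest to the right.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Scan every not-yet-seen clone that overlaps the covered prefix.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Illustrative clones only; not data from the paper.
example_clones = [
    ("A", 0, 150), ("B", 50, 120), ("C", 100, 300),
    ("D", 140, 260), ("E", 280, 400),
]
selected = minimum_tiling_path(example_clones, 0, 400)
print([name for name, _, _ in selected])
```

The clones returned by such a selection would then be pooled for library preparation; a gap in coverage raises an error rather than silently returning a broken path.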

Available from: Niina Haiminen, Jun 20, 2015
  • Source
    ABSTRACT: The history of cocoa and chocolate, including the birth and the expansion of the chocolate industry, was described. Recent developments in the industry and cocoa economy were briefly depicted. An overview of the classification of cacao as well as studies on phenotypic and genetic diversity was presented. Cocoa agronomic practices including traditional and modern propagation techniques were reviewed. Nutrition-related health benefits derived from cocoa consumption were listed and widely reviewed. The specific action of cocoa antioxidants was compared to those of teas and wines. Effects of adding milk to chocolate and chocolate drinks versus bioavailability of cocoa polyphenols were discussed. Finally, flavour, sensory, microbiological and toxicological aspects of cocoa consumption were presented.
    Critical Reviews in Food Science and Nutrition 04/2015; DOI:10.1080/10408398.2012.669428 · 5.55 Impact Factor
  • Source
    ABSTRACT: BACKGROUND: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. RESULTS: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. CONCLUSIONS: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
    Genome Biology 06/2013; 14(6):R53. DOI:10.1186/gb-2013-14-6-r53 · 10.47 Impact Factor
  • Source
    ABSTRACT: For lignocellulosic bioenergy to become a viable alternative to traditional energy production methods, rapid increases in conversion efficiency and biomass yield must be achieved. Increased productivity in bioenergy production can be achieved through concomitant gains in processing efficiency as well as genetic improvement of feedstocks that have the potential for bioenergy production at an industrial scale. The purpose of this review is to explore the genetic and genomic resource landscape for the improvement of a specific bioenergy feedstock group, the C4 bioenergy grasses. First, bioenergy grass feedstock traits relevant to biochemical conversion are examined. Then we outline genetic resources available in bioenergy grasses for mapping bioenergy traits to DNA markers and genes. This is followed by a discussion of genomic tools and how they can be applied to understanding bioenergy grass feedstock trait genetic mechanisms leading to further improvement opportunities.
    Biotechnology for Biofuels 11/2012; 5(1):80. DOI:10.1186/1754-6834-5-80 · 6.22 Impact Factor
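
The contig and scaffold N50 values quoted in the Matina 1-6 genome abstract above are summary statistics of assembly contiguity. As a minimal sketch using the common definition (the length of the shortest sequence in the smallest set of longest sequences that together hold at least half of the assembled bases), N50 can be computed as follows. The function name `n50` and the example lengths are illustrative assumptions, not values from any of the listed papers.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from longest to shortest until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Illustrative lengths only (arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```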