The genome of Theobroma cacao

Nature Genetics (Impact Factor: 29.35). 01/2011; 34(2):101-109. DOI: 10.1038/npre.2010.4908.1
Source: OAI


We sequenced and assembled the genome of Theobroma cacao, an economically important tropical fruit tree crop that is the source of chocolate. The assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of them anchored on the 10 T. cacao chromosomes. Analysis of this sequence information highlighted specific expansion of some gene families during evolution, for example flavonoid-related genes. It also provides a major source of candidate genes for T. cacao disease resistance and quality improvement. Based on the inferred paleohistory of the T. cacao genome, we propose an evolutionary scenario whereby the ten T. cacao chromosomes were shaped from an ancestor through eleven chromosome fusions. The T. cacao genome can be considered as a simple living relic of higher plant evolution.


Available from: Spencer Craig Brown, Dec 13, 2013
  • "ransposons (i.e. non-autonomous, possibly nested, defective or inactive), though similar estimates have been reported for other plants. The genome of cocoa tree, Theobroma cacao, was estimated to contain approximately 6.75 × 10^4 copies of transposable elements, although this estimate was regarded as a rough underestimation (Argout et al., 2011). The genome of Populus trichocarpa has over 5000 copies of retrotransposons with a prevalence of Gypsy-like elements (Tuskan et al., 2006); recently, 1479 full-length LTR retrotransposons were identified in poplar (Cossu et al., 2012). In Eucalyptus, 24–226 copies of transcriptionally active Copia-like LTR retr"
    ABSTRACT: The development of modern approaches to the genetic improvement of the tree crops Ilex paraguariensis (‘yerba mate’) and Ilex dumosa (‘yerba señorita’) is halted by the scarcity of basic genetic information. In this study, we characterized the implementation of low-cost methodologies such as representational difference analysis (RDA), single-strand conformation polymorphisms (SSCP), and reverse and direct dot-blot filter hybridization assays, coupled with thorough bioinformatic characterization of sequence data for both species. We also estimated the genome size of each species using flow cytometry. This study contributes to a better understanding of the genetic differences between the two cultivated species by generating new quantitative and qualitative genome-level data. Using the RDA technique, we isolated a group of non-coding repetitive sequences, tentatively considered Ilex-specific, which were 1.21- to 39.62-fold more abundant in the genome of I. paraguariensis. Another group of repetitive DNA sequences involved retrotransposons, which were 1.41- to 35.77-fold more abundant in the genome of I. dumosa. The genomic DNA of each species performed differently in filter hybridizations: while I. paraguariensis showed a high intraspecific affinity, I. dumosa exhibited a higher affinity for the genome of the former species (i.e. interspecific). These differences could be attributed to the occurrence of homologous but slightly divergent repetitive DNA sequences, highly amplified in the genome of I. paraguariensis but not in the genome of I. dumosa. Additionally, our hybridization outcomes suggest that the genomes of both species have less than 80% similarity. Moreover, we report here for the first time genome size estimates of 1670 Mbp for I. paraguariensis and 1848 Mbp for I. dumosa.
    Plant Genetic Resources 05/2014; 13(02):1. DOI:10.1017/S1479262114000756 · 0.58 Impact Factor
  • ABSTRACT: Cacao swollen shoot virus (CSSV) is a member of the family Caulimoviridae, genus Badnavirus, naturally transmitted to Theobroma cacao by several mealybug species. Typical symptoms of the disease on cocoa trees are red vein banding of young leaves, mosaic on older leaves and swelling of the orthotropic shoots. The virus is restricted to West Africa, whereas the cacao tree originates from the Western Hemisphere, so the virus most probably has an indigenous origin on the West African subcontinent. The disease has caused enormous economic damage in Ghana since the 1930s but was restricted to small areas in Togo and Côte d’Ivoire until recently. Now, renewed outbreaks in the main producing areas in Côte d’Ivoire, Ghana and Togo cause serious problems. Knowledge of the viral biodiversity in the different outbreaks will in turn help to provide a better understanding of the development of the epidemics and of the evolution of viral populations, and may make it possible to retrace the emergence and dispersal of CSSV. CSSV diversity is genetically structured in at least eight different species according to ICTV recommendations. Only group B was detected in the three countries and in most of the outbreaks, whereas the other groups have a more restricted geographic distribution. To understand such an extent of CSSV variability compared to its very short evolutionary history on cocoa trees, we used BEAST software. The results suggest the existence of many emergences from native hosts to cacao trees in the various countries of West Africa.
    International Plant and Animal Genome Conference XXII 2014;
  • ABSTRACT: The scope and breadth of genome-scale metabolic reconstructions have continued to expand over the last decade. Herein, we introduce a genome-scale model for a plant with direct applications to food and bioenergy production (i.e., maize). Maize annotation is still underway, which introduces significant challenges in the association of metabolic functions to genes. The developed model is designed to meet rigorous standards on gene-protein-reaction (GPR) associations, elementally and charge balanced reactions and a biomass reaction abstracting the relative contribution of all biomass constituents. The metabolic network contains 1,563 genes and 1,825 metabolites involved in 1,985 reactions from primary and secondary maize metabolism. For approximately 42% of the reactions, direct literature evidence for the participation of the reaction in maize was found. As many as 445 reactions and 369 metabolites are unique to the maize model compared to the AraGEM model for A. thaliana. 674 metabolites and 893 reactions are present in Zea mays iRS1563 that are not accounted for in maize C4GEM. All reactions are elementally and charge balanced and localized into six different compartments (i.e., cytoplasm, mitochondrion, plastid, peroxisome, vacuole and extracellular). GPR associations are also established based on the functional annotation information and homology prediction accounting for monofunctional, multifunctional and multimeric proteins, isozymes and protein complexes. We describe results from performing flux balance analysis under different physiological conditions (i.e., photosynthesis, photorespiration and respiration) of a C4 plant and also explore model predictions against experimental observations for two naturally occurring mutants (i.e., bm1 and bm3). The developed model corresponds to the largest and most complete effort to date at cataloguing metabolism for a plant species.
    PLoS ONE 07/2011; 6(7):e21784. DOI:10.1371/journal.pone.0021784 · 3.23 Impact Factor