The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 37:D233-D238

Architecture et Fonction des Macromolécules Biologiques, UMR6098, CNRS, Universités Aix-Marseille I & II, 163 Avenue de Luminy, 13288 Marseille, France.
Nucleic Acids Research (Impact Factor: 9.11). 11/2008; 37(Database issue):D233-8. DOI: 10.1093/nar/gkn663
Source: PubMed


The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down
complex carbohydrates and glycoconjugates. As of September 2008, the database describes the present knowledge on 113 glycoside
hydrolase, 91 glycosyltransferase, 19 polysaccharide lyase, 15 carbohydrate esterase and 52 carbohydrate-binding module families.
These families are created based on experimentally characterized proteins and are populated by sequences from public databases
with significant similarity. Protein biochemical information is continuously curated based on the available literature and
structural information. Over 6400 proteins have assigned EC numbers and 700 proteins have a PDB structure. The classification
(i) reflects the structural features of these enzymes better than their sole substrate specificity, (ii) helps to reveal the
evolutionary relationships between these enzymes and (iii) provides a convenient framework to understand mechanistic properties.
This resource has been available for over 10 years to the scientific community, contributing to information dissemination
and providing a transversal nomenclature to glycobiologists. More recently, this resource has been used to improve the quality
of functional predictions of a number of genome projects by providing expert annotation. The CAZy resource resides at URL:



Available from: Brandi L Cantarel,
  • Source
    • "These capabilities have been obtained from many diverse sources through evolution. Aside from mutation-based evolution, such as the massive expansion and diversification of carbohydrate-active enzyme families (Cantarel et al., 2009), Euglena appears to have made extensive use of gene fusions to produce novel domain arrangements, a few examples of which are discussed below. "
    ABSTRACT: Euglena gracilis is a eukaryotic microalga that has been the subject of scientific study for hundreds of years. It has a complex evolutionary history, with traces of at least four endosymbiotic genomes and extensive horizontal gene transfer. Given the importance of Euglena in terms of evolutionary cell biology and its unique taxonomic position, we initiated a de novo transcriptome sequencing project in order to understand this intriguing organism. By analysing the proteins encoded in this transcriptome, we can identify an extremely complex metabolic capacity, rivalling that of multicellular organisms. Many genes have been acquired from what are now very distantly related species. Herein we consider the biology of Euglena in different time frames, from evolution through control of cell biology to metabolic processes associated with carbohydrate and natural products biochemistry.
    10/2015; DOI:10.1016/j.pisc.2015.07.002
  • Source
    • "The detection, module composition and family assignment of all CAZymes were performed as described previously (Cantarel et al. 2009; Levasseur et al. 2013; Lombard et al. 2014). Briefly, the method combines BLAST and HMMER searches conducted against sequence libraries and HMM profiles made of the individual functional modules featured in the CAZy database ( "

  • Source
    • "Different enzymes within a family share the same conserved active site residues in their GH modules, but contain specific NCR sequences. Several sequence characteristics and functions have been described in NCRs, such as carbohydrate binding modules (CBMs), thrombospondin type 3 repeats (TSP3), and bacterial immunoglobulin-like domains of group 2 (Big2 domains) (Bourne and Henrissat, 2001; Cantarel et al., 2009; Michel et al., 2009). The CBMs in "
    ABSTRACT: A glycoside hydrolase (GH) typically contains one catalytic module and varied non-catalytic regions (NCRs). However, the effects of NCRs on the catalytic modules remain mostly unclear, except for those of the carbohydrate-binding modules (CBMs). AgaG4 is a GH16 endo-β-agarase of the agarolytic marine bacterium Flammeovirga sp. MY04. The enzyme contains an extra sugar-binding peptide within the catalytic module and has no predictable CBMs but sequences of unknown function in the NCR, which is a new characteristic of agarase sequences. In this study, we deleted the NCR sequence, a 140-amino acid peptide at the C-terminus, and expressed the truncated gene, agaG4-T140, in Escherichia coli. After purification and refolding, the truncated agarase rAgaG4-T140 retained the same catalytic temperature and pH value as rAgaG4. Using combined fluorescent labeling, HPLC and MS/MS techniques, we identified the end-products of agarose degradation by rAgaG4-T140 as neoagarotetraose and neoagarohexaose, with a final molar ratio of 1.53:1 and a conversion ratio of approximately 70%, which were similar to those of rAgaG4. However, the truncated agarase rAgaG4-T140 showed a 15-fold decrease in protein solubility and a 35-fold increase in enzymatic activity. The oligosaccharide production of rAgaG4-T140 was approximately 25 times the weight of that produced by equimolar rAgaG4. This study provides some insights into the influence of the NCR on the biochemical characteristics of agarase AgaG4 and suggests some new strategies to improve the properties of a GH enzyme.
    Journal of Ocean University of China 10/2015; 14(5). DOI:10.1007/s11802-015-2800-0 · 0.56 Impact Factor