CAZy is developed by the Glycogenomics group at AFMB in Marseille, France.

Carbohydrate-active enzymes (CAZymes) are responsible for the synthesis and breakdown of glycoconjugates, oligo- and polysaccharides. They typically account for 1-5% of the genes of a living organism. Glycoconjugates, oligo- and polysaccharides play essential roles in life, not only as structural and energy-reserve components but also in many intra- and intercellular recognition events. CAZymes are often involved in immune and host-pathogen interactions, and are implicated in human diseases and in diseases of agricultural importance. They also play a central role in the biosynthesis and degradation of the plant cell wall, which represents the most abundant source of photosynthetically fixed carbon on Earth; these enzymes are therefore widely regarded as one of the keys to the production of biofuels.

The functional annotation of CAZymes in genomes is challenging for non-specialists because these enzymes vary in their modularity and because enzymes with different substrate specificities are grouped in the same sequence-based families. Automated functional annotation pipelines therefore introduce many errors, which then accumulate and propagate in public databases. Over the last 15 years we have developed the Carbohydrate-active enzymes database (CAZy; http://www.cazy.org), a dedicated family classification system that correlates with the structure and molecular mechanism of CAZymes. Close to 300 families of catalytic and ancillary modules are presented online, corresponding to over 100,000 non-redundant entries. The CAZy classification is widely used by the scientific community. We develop tools for unambiguous high-throughput modular and functional annotation of CAZymes in sequences from genomic and metagenomic projects. Our objectives are to improve the coverage and quality of functional annotations of CAZymes from publicly released and ongoing genome sequencing efforts, and to use these improved and uniformly annotated data sets for comparative genomics.


