Genomes

Introduction

Protein sequences from complete genomes that can be assigned to CAZy families are listed in the links below. The only genomes consistently surveyed in the CAZy database are those released by the NCBI as regular entries in the daily releases of GenBank. In a very limited number of cases, we have included data from RefSeq genomes.

The collection of carbohydrate-active enzymes encoded by the genome of an organism (its "CAZome") provides insight into the nature and extent of the species' metabolism of complex carbohydrates. The CAZomes of free-living organisms typically correspond to 1-5% of the predicted coding sequences. Extremely reduced CAZomes are characteristic of species with a strictly intracellular parasitic lifestyle. Because carbohydrates are chemically, structurally and functionally highly diverse, CAZome analyses and comparisons reveal significant differences between species.

Although often useful, the simple assignment of a protein sequence to a CAZy family does not constitute a refined functional prediction for genome annotation. For the latter task, we are developing a CAZy-based annotation methodology that takes into account protein modularity, family and subfamily assignment, relatedness to experimentally characterized enzymes, and expert knowledge of the varying substrate specificities of carbohydrate-active enzymes. This methodology, which yields coherent, expert and comparable sets of annotations, is applied to novel genomes and metagenomes on a collaborative basis.

NEW: Since 2021, we no longer systematically analyze genomes of organisms for which at least 60 strains have already been analyzed in CAZy (blacklisted organisms).
Similarly, genomes released as complete but derived from metagenomic data assembly are not systematically analyzed, nor are those from the myriad clinical isolates of strains already present in CAZy. This allows us to focus our curation efforts on newly explored space.

Tables for Direct Genome Access by Kingdom

Bacteria (31679)
Eukaryota (1665)
Archaea (541)
Viruses (490)

Last update: 2024-09-19 © Copyright 1998-2024
AFMB - CNRS - Université d'Aix-Marseille