CAZy - Help

CAZy Families

Enzyme and Associated Module family pages have:
a header describing the common known aspects of each family
a listing of proteins

Protein

Each protein is identified by a combination of trivial names, gene names, and locus_tags. In some cases, a standard name following the suggested nomenclature [1] is shown in bold. The CAZy team may edit protein names to harmonize them. Examples are:

Trivial name	α-amylase
Gene Name	AmyA
Locus_Tag name	TM1840
Standard name	Amy13A

[1] Henrissat B, Teeri TT, Warren RA (1998) A scheme for designating enzymes that hydrolyse the polysaccharides in the cell walls of plants. FEBS Lett 425:352-354 [doi:10.1016/S0014-5793(98)00265-8].
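
As an illustration, a standard name such as Amy13A can be split into its parts. The sketch below assumes the pattern seen in that example (a three-letter abbreviation, a family number, and a single capital letter); the function name `split_standard_name` and the regular expression are hypothetical and are not part of CAZy.

```python
import re
from typing import Optional, Tuple

# Assumed pattern, inferred from the example "Amy13A":
# three-letter abbreviation + family number + one capital letter.
STANDARD_NAME = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z])$")


def split_standard_name(name: str) -> Optional[Tuple[str, int, str]]:
    """Return (abbreviation, family number, letter), or None if no match."""
    match = STANDARD_NAME.match(name.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


print(split_standard_name("Amy13A"))  # standard name from the table above
print(split_standard_name("AmyA"))    # gene name, does not match the pattern
```

Names that do not fit the assumed pattern, such as the gene name AmyA or the locus_tag TM1840, yield `None` rather than a partial result.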

EC numbers

Only experimentally characterized proteins are assigned an EC number, according to the IUBMB rules. As a consequence, we systematically remove EC numbers that were assigned on the basis of sequence similarity alone (a frequent event in genomic efforts). Sometimes no EC number exists to describe a particular CAZyme activity. In this case, we display a partial EC number instead of a full EC number.

Full EC	3.2.1.1 (as in "regular" α-amylase)
Partial EC	3.2.1.- (as in sucrose hydrolase)
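
The difference between a full and a partial EC number can be checked programmatically. The following is a minimal sketch, assuming an EC number is written as four dot-separated fields in which missing trailing fields are replaced by "-"; the function name `classify_ec` is hypothetical and is not part of CAZy.

```python
import re

# Assumed format: four dot-separated fields; the first is a number,
# the others are either a number or "-" (unknown).
EC_PATTERN = re.compile(r"^(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def classify_ec(ec: str) -> str:
    """Return 'full', 'partial', or 'invalid' for an EC number string."""
    match = EC_PATTERN.match(ec.strip())
    if match is None:
        return "invalid"
    seen_dash = False
    for field in match.groups():
        if field == "-":
            seen_dash = True
        elif seen_dash:
            # A number after a dash (e.g. "3.-.1.1") is not meaningful.
            return "invalid"
    return "partial" if seen_dash else "full"


print(classify_ec("3.2.1.1"))  # full EC from the table above
print(classify_ec("3.2.1.-"))  # partial EC from the table above
```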

Organism

The systematic names of organisms are assigned according to the NCBI Taxonomy, often complemented by extra information (strain, serovar, variant, etc.). Sequences derived from a fully sequenced genome are shown in bold. If the same protein is found in different organisms, it appears as separate entries.

Subf

The number designating a subfamily is indicated when defined. Following phylogenetic analysis, subgroups can be defined using different criteria. Subfamilies defined in CAZy are selected so that their members are unambiguously distinct from the rest of the family. The inclusion of a protein in a subfamily may support function prediction, a particularly useful feature for families that contain several different activities. This level of information is currently provided only for family GH13.

Families in CAZy: update and creation of families

Sequence Updates	Protein sequences are downloaded from the daily releases of the NCBI and compared to our internal BLAST and HMM libraries of modules. Positive hits are added to the database automatically or manually; in both cases the addition includes a validation step by a curator.
Biochemical Updates	Information on biochemistry and structure is extracted from the literature. Experimentally demonstrated enzyme activities are assigned to individual enzyme modules whenever possible.
New Families	New CAZy families are created whenever we find published biochemical evidence for activity (GH, PL, GT, CE or CBM) associated with a protein sequence not yet classified in one of our families. Please contact us if we have missed a family; we will be happy to add it.

GenBank

Links to GenBank accessions are provided for each protein. Because a protein may correspond to several accessions (different versions or multiple sequencings) that can vary in length, quality and in the sequence itself, the "best" model is identified in bold. Sequences derived from patents, pseudogenes, and other external sources have no such GenBank link.

UniProt

Since 2021, cross-references to UniProt accessions (Swiss-Prot and TrEMBL) are systematically searched for biochemically characterized proteins only, and are updated on a regular basis.

PDB / 3D

PDB accession codes are provided for proteins with an experimentally determined three-dimensional structure. This information is derived from weekly analyses of new PDB releases and from surveys of the recent literature. The three-dimensional structures of the constitutive modules of a CAZyme are sometimes determined individually after production of truncated proteins. Whenever possible, we show the PDB code corresponding to the family displayed (conversely, if a PDB code is available for the same protein but for a module different from the family displayed, it will not be shown). Finally, "cryst" indicates that we have seen a public protein crystallization report.

PDB[Chain1,Chain2,...]	Accession of a released PDB entry followed by all chains sharing the same underlying sequence.
PDB	Accession of an unreleased PDB entry corresponding to the protein in question. These codes are obtained from the analysis of the set of unreleased PDB structures or from recent literature.
cryst	Protein with a crystallization note reported in the literature or at public meetings.
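
The three notations above can be told apart mechanically. The sketch below is only an illustration: it assumes classic four-character PDB codes (a digit followed by three alphanumeric characters) and chain identifiers separated by commas inside square brackets. The names `StructureRef` and `parse_structure_field`, and the example code 1ABC, are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class StructureRef(NamedTuple):
    kind: str                 # "released", "unreleased", or "cryst"
    accession: str            # PDB code; empty for "cryst"
    chains: Tuple[str, ...]   # chain identifiers; empty unless released


# Assumed format: classic 4-character PDB code, optionally followed by [chains].
PDB_WITH_CHAINS = re.compile(r"^([0-9][A-Za-z0-9]{3})\[([^\[\]]*)\]$")
PDB_ONLY = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def parse_structure_field(text: str) -> StructureRef:
    """Classify one entry of the PDB / 3D column."""
    text = text.strip()
    if text == "cryst":
        return StructureRef("cryst", "", ())
    match = PDB_WITH_CHAINS.match(text)
    if match:
        chains = tuple(c.strip() for c in match.group(2).split(",") if c.strip())
        return StructureRef("released", match.group(1).upper(), chains)
    if PDB_ONLY.match(text):
        # A bare code without chains denotes an unreleased entry.
        return StructureRef("unreleased", text.upper(), ())
    raise ValueError(f"unrecognized PDB / 3D entry: {text!r}")


print(parse_structure_field("1ABC[A,B]"))  # hypothetical released entry
print(parse_structure_field("1ABC"))       # hypothetical unreleased entry
print(parse_structure_field("cryst"))
```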