Help

CAZy Families

Enzyme and Associated Module family pages have:
- a header describing the common known aspects of each family
- a listing of proteins

Protein

Each protein is identified by a combination of trivial names, gene identifications, and locus_tags. In some cases a standard name following suggested nomenclature [1] may be found in bold. The CAZy team may edit protein names for harmonization purposes. Examples are:

Trivial name: α-amylase
Gene name: AmyA
Locus_tag name: TM1840
Standard name: Amy13A

[1] Henrissat B, Teeri TT, Warren RA (1998) A scheme for designating enzymes that hydrolyse the polysaccharides in the cell walls of plants. FEBS Lett 425:352-354 [doi:10.1016/S0014-5793(98)00265-8].
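As an illustration of how a standard name such as Amy13A can be told apart from the other kinds of name, the following minimal sketch splits it into a three-letter prefix, a family number and a trailing capital letter. The pattern is an assumption read off the single example above, and the function name `split_standard_name` is hypothetical; names in the database may not all follow this exact shape.

```python
import re
from typing import Optional, Tuple

# Assumed shape, inferred from the example "Amy13A": one capital letter and
# two lowercase letters, then the family number, then one capital letter.
_STANDARD_NAME = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z])$")


def split_standard_name(name: str) -> Optional[Tuple[str, int, str]]:
    """Return (prefix, family_number, letter), or None if the name does not fit."""
    match = _STANDARD_NAME.match(name.strip())
    if match is None:
        return None
    prefix, family, letter = match.groups()
    return prefix, int(family), letter


if __name__ == "__main__":
    for candidate in ("Amy13A", "AmyA", "TM1840"):
        print(candidate, "->", split_standard_name(candidate))
```

With the example names above, only Amy13A is split into parts; the gene name AmyA and the locus_tag TM1840 are rejected.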

EC numbers

Only experimentally characterized proteins are assigned an EC number, in accordance with the IUBMB rules. As a consequence, we systematically remove EC numbers that were assigned on the basis of sequence similarity alone (a frequent occurrence in genome annotation efforts). Sometimes no EC number exists to describe a particular CAZyme activity. In this case we display a partial EC number instead of a full EC number.

Full EC: 3.2.1.1 (as in "regular" α-amylase)
Partial EC: 3.2.1.- (as in sucrose hydrolase)
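The distinction between full and partial EC numbers can be checked mechanically. The sketch below is only an illustration, not code used by CAZy: it assumes an EC number is written as four dot-separated fields, each either a number or a dash, and that once a dash appears all following fields are dashes too. The function name `classify_ec` is hypothetical.

```python
import re

# Assumption: four dot-separated fields, each a number or "-".
_EC_FIELDS = re.compile(r"^(\d+|-)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def classify_ec(ec: str) -> str:
    """Return 'full', 'partial' or 'invalid' for an EC number string."""
    match = _EC_FIELDS.match(ec.strip())
    if match is None:
        return "invalid"
    fields = match.groups()
    if fields[0] == "-":
        # The enzyme class itself must be known.
        return "invalid"
    if "-" not in fields:
        return "full"
    first_dash = fields.index("-")
    # After the first dash, every remaining field must also be a dash.
    if any(field != "-" for field in fields[first_dash:]):
        return "invalid"
    return "partial"


if __name__ == "__main__":
    for value in ("3.2.1.1", "3.2.1.-", "3.-.1.1"):
        print(value, "->", classify_ec(value))
```

Applied to the two examples above, 3.2.1.1 is reported as full and 3.2.1.- as partial.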

Organism

The systematic names of organisms are assigned according to the NCBI Taxonomy, often complemented by extra information (strain, serovar, variant, etc.). Sequences derived from a fully sequenced genome are shown in bold. If the same protein is found in different organisms, it appears as separate entries.

Subf

The number designating a subfamily is indicated when defined. Following phylogenetic analysis, subgroups can be defined using different criteria. Subfamilies defined in CAZy are selected so that their members are unambiguously distinct from the rest of the family. The inclusion of a protein in a subfamily may support function prediction, a particularly useful feature for families that contain several different activities. At present, subfamily information is provided only for family GH13.

Families in CAZy: update and creation of families

Sequence Updates: Protein sequences are downloaded from the daily NCBI releases and compared against our internal BLAST and HMM libraries of modules. Positive hits are added to the database automatically or manually, and in every case the addition is validated by a curator.
Biochemical Updates: Information on biochemistry and structure is extracted from the literature. Experimentally demonstrated enzyme activities are assigned to individual enzyme modules whenever possible.
New Families: New CAZy families are created whenever we find published biochemical evidence for an activity (GH, PL, GT, CE or CBM) associated with a protein sequence not yet classified in one of our families. Please contact us if we have missed a family; we will be happy to add it.

GenBank

GenBank codes (GenBank being analogous to EMBL and DDBJ, all nucleotide databases) are accessible indirectly through links to Protein Identification (PID) numbers. Because individual protein sequences may vary in length, quality and content, the "best" model is shown in bold. Sequences issued from patents and other external information may have no corresponding GenBank (or nucleotide) code. Sequences deposited with no identified coding sequence, or issued from pseudogenes, may have no corresponding PID.

UniProt

Accessions from the UniProt database (covering all SwissProt entries, but only very partially those from TrEMBL or PIR) are provided for convenience. The SwissProt entries present in CAZy are, in the vast majority, the result of unsupervised automated annotation. We collaborate with the UniProt Consortium to provide regular updates of our reciprocal links.

PDB / 3D

PDB accession codes are provided for proteins with an experimentally determined three-dimensional structure. This information is derived from weekly analyses of new PDB releases and from surveys of the recent literature. The three-dimensional structures of the constituent modules of a CAZyme are sometimes determined individually after production of truncated proteins. Whenever possible we show the PDB code corresponding to the family displayed (conversely, if a PDB code is available for the same protein but for a module different from the family displayed, it will not be shown). Finally, 'cryst' indicates that we have seen a public protein crystallization report.

PDB[Chain1,Chain2,...]: Accession corresponding to a fully released PDB entry, with all chain(s) corresponding to the protein in question described.
PDB: Accession of an unreleased PDB entry corresponding to the protein in question. These codes are obtained from the analysis of the PDB unreleased structures set or from recent literature.
cryst: Protein with a crystallization note released in the literature or in public meetings.
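The three notations above can be distinguished programmatically. The following is a minimal sketch under stated assumptions: a PDB accession is taken to be the classic four-character code (a digit followed by three letters or digits), and the names `StructureNote` and `parse_structure_note` are hypothetical. The accession 1ABC used in the demonstration is a made-up placeholder, not a reference to an actual CAZy entry.

```python
import re
from typing import NamedTuple, Tuple


class StructureNote(NamedTuple):
    kind: str                 # "released", "unreleased" or "cryst"
    accession: str            # empty for "cryst"
    chains: Tuple[str, ...]   # empty unless chains are listed


# Assumed accession format: classic 4-character PDB code, optionally
# followed by a bracketed, comma-separated chain list.
_ENTRY = re.compile(r"^([0-9][A-Za-z0-9]{3})(?:\[([^\]]*)\])?$")


def parse_structure_note(text: str) -> StructureNote:
    """Interpret one entry of the PDB / 3D column."""
    token = text.strip()
    if token.lower() == "cryst":
        return StructureNote("cryst", "", ())
    match = _ENTRY.match(token)
    if match is None:
        raise ValueError(f"unrecognised PDB / 3D entry: {text!r}")
    accession = match.group(1).upper()
    chain_list = match.group(2)
    if chain_list is None:
        # A bare accession denotes an entry that is not yet released.
        return StructureNote("unreleased", accession, ())
    chains = tuple(c.strip() for c in chain_list.split(",") if c.strip())
    return StructureNote("released", accession, chains)


if __name__ == "__main__":
    for entry in ("1ABC[A,B]", "1ABC", "cryst"):
        print(entry, "->", parse_structure_note(entry))
```

Raising an error on unrecognised input, rather than guessing, keeps malformed entries visible when a listing is processed in bulk.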
Last update: 2017-03-07 © Copyright 1998-2017
AFMB - CNRS - Université d'Aix-Marseille