Classifications in CAZy : short description
The CAZy database presently describes two types of common components of Carbohydrate-active enzymes :
Catalytic Modules (Enzymes) | Subdivided in various enzyme classes and families thereof that catalyze the breakdown, biosynthesis and/or modification of glycoconjugates, oligo- and polysaccharides |
Associated Modules | Families of modules found attached to the catalytic modules |
Family, Subfamily and Clan Acronyms in CAZy : common designations
Families are typically designated using a simple form that combines the class or category of the module with a number reflecting the order of family creation within the same group. The common designations used on our site are :
Family | Extended Designation |
GH# | Glycoside Hydrolase Family # (ex : GH7) |
GT# | Glycosyltransferase Family # (ex : GT7) |
PL# | Polysaccharide Lyase Family # (ex : PL5) |
CE# | Carbohydrate Esterase Family # (ex : CE2) |
CBM# | Carbohydrate-Binding Module Family # (ex : CBM35) |
Subfamilies are subgroups within a family that share a more recent ancestor and are usually more uniform in molecular function. Subfamilies were originally described for family GH13 and, at present, also cover PL families from PL1 to PL22. Subfamily assignments are gradually being extended to other CAZy families (GH5, GH16, GH30, GH43). Each subfamily is designated by a numeral suffix as follows :
Subfamily | Extended Designation |
GH#_[1,2,3,...,n] | Glycoside Hydrolase Family #, Subfamily [1,2,3,...,n] (ex : GH13_1) |
Clans are groups of families sharing a fold and catalytic machinery. They represent a higher level in the hierarchy of CAZy classification and are only defined for Glycoside Hydrolases at present. The common designation is GH-Letter (ex : GH-A) where the letter simply reflects the order of definition of the clans. Attribution of GH families to clans usually follows structural determination.
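The designation patterns above (family, subfamily and clan) are regular enough to be recognised mechanically. The following is a minimal sketch, assuming only the class prefixes listed in the table and the `GH#_n` and `GH-Letter` forms described in the text; the names `Designation` and `parse_designation` are hypothetical and not part of any CAZy tool.

```python
import re
from typing import NamedTuple, Optional

# Class prefixes taken from the designation table above.
CLASS_NAMES = {
    "GH": "Glycoside Hydrolase",
    "GT": "Glycosyltransferase",
    "PL": "Polysaccharide Lyase",
    "CE": "Carbohydrate Esterase",
    "CBM": "Carbohydrate-Binding Module",
}


class Designation(NamedTuple):
    kind: str                 # "family", "subfamily" or "clan"
    cls: str                  # class prefix, e.g. "GH"
    family: Optional[int]     # family number; None for clans
    subfamily: Optional[int]  # subfamily number; None unless kind == "subfamily"
    clan: Optional[str]       # clan letter; None unless kind == "clan"


# e.g. GH7, CBM35, GH13_1
_FAMILY_RE = re.compile(r"(GH|GT|PL|CE|CBM)(\d+)(?:_(\d+))?")
# e.g. GH-A (clans are only defined for Glycoside Hydrolases)
_CLAN_RE = re.compile(r"GH-([A-Z])")


def parse_designation(text: str) -> Designation:
    """Split a designation string into class, family, subfamily or clan."""
    clan_match = _CLAN_RE.fullmatch(text)
    if clan_match:
        return Designation("clan", "GH", None, None, clan_match.group(1))
    family_match = _FAMILY_RE.fullmatch(text)
    if family_match is None:
        raise ValueError(f"not a recognised designation: {text!r}")
    cls, family, subfamily = family_match.groups()
    if subfamily is None:
        return Designation("family", cls, int(family), None, None)
    return Designation("subfamily", cls, int(family), int(subfamily), None)


def describe(text: str) -> str:
    """Return the extended designation in the wording used by the tables."""
    d = parse_designation(text)
    if d.kind == "clan":
        return f"Glycoside Hydrolase Clan {d.clan}"
    label = f"{CLASS_NAMES[d.cls]} Family {d.family}"
    if d.kind == "subfamily":
        label += f", Subfamily {d.subfamily}"
    return label
```

For example, `describe("GH13_1")` builds the string "Glycoside Hydrolase Family 13, Subfamily 1", matching the extended designation shown in the subfamily table.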
CAZy Families
Enzyme and Associated Module family pages have :
– a header describing the common known aspects of each family
– a listing of proteins
Each protein is identified by a combination of trivial names, gene identifications, and locus_tags. In some cases a standard name following suggested nomenclature [1] may be found in bold. The CAZy team may edit protein names for harmonization purposes. Examples are :
Trivial name | α-amylase |
Gene Name | AmyA |
Locus_Tag name | TM1840 |
Standard name | Amy13A |
[1] Henrissat B, Teeri TT, Warren RA (1998) A scheme for designating enzymes that hydrolyse the polysaccharides in the cell walls of plants. FEBS Lett 425:352-354 [doi:10.1016/S0014-5793(98)00265-8].
EC numbers
Only experimentally characterized proteins are assigned an EC number, following the IUBMB rules. As a consequence, we systematically remove EC numbers that were assigned on the basis of sequence similarity alone (a frequent occurrence in genome annotation efforts). Sometimes no EC number exists to describe a particular CAZyme activity; in this case we display a partial EC number instead of a full one.
Full EC | 3.2.1.1 (as in "regular" α-amylase) |
Partial EC | 3.2.1.- (as in sucrose hydrolase) |
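The distinction between full and partial EC numbers can be checked mechanically. The sketch below assumes the usual four-field EC format in which unassigned trailing fields are written as a dash, as in 3.2.1.- above; the function name `ec_kind` is hypothetical.

```python
import re

# Four dot-separated fields; the first is always numeric, later ones may be "-".
_EC_RE = re.compile(r"(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)")


def ec_kind(ec: str) -> str:
    """Return "full" or "partial" for an EC number string.

    Raises ValueError for malformed input, including a numeric field that
    follows a dash (e.g. "3.-.1.1"), which is not a meaningful EC number.
    """
    match = _EC_RE.fullmatch(ec.strip())
    if match is None:
        raise ValueError(f"not an EC number: {ec!r}")
    seen_dash = False
    for field in match.groups():
        if field == "-":
            seen_dash = True
        elif seen_dash:
            raise ValueError(f"numeric field after a dash in: {ec!r}")
    return "partial" if seen_dash else "full"
```

With this sketch, `ec_kind("3.2.1.-")` reports a partial EC number, while a fully specified number such as 3.2.1.1 (the EC number of α-amylase) is reported as full.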
The systematic names of organisms are assigned according to the NCBI Taxonomy, often complemented by extra information (strain, serovar, variant, etc). Sequences derived from a fully sequenced genome are shown in bold. If the same protein is found in different organisms, it appears as separate entries.
The subfamily designation is indicated where defined. Following phylogenetic or sequence similarity network analyses, subgroups can be defined using different criteria. The inclusion of a protein in a subfamily may support function prediction, a particularly useful feature for families bearing different activities. This information level is currently provided for families GH5, GH13, GH16, GH19, GH30, GH31, GH43, GH45, GH51, GH55, GH68, GH123, GH130.
CAZac description
Sequence families in CAZy can group different activities, such as hydrolases and transglycosylases. In the CAZac search form, selecting an ‘enzyme class’ restricts the search to a particular sequence class (GH, GT, PL or AA). Enzyme activity IDs in the CAZac system consist of a letter followed by numbers. The letter designates the overall enzyme activity, while the numbers are unique. The ‘enzymatic activity’ field in the CAZac search menu restricts the search to specific types of reaction, such as hydrolase (H), transferase (T), lyase (L), etc., within the various classes [2]. It is thus possible to combine ‘enzyme class’ and ‘enzymatic activity’, for instance to list enzymes in GH families that have transferase (transglycosylation) activity.
Class | EC | Activity Id | Reactant |
GH | |||
Hydrolases | 3.2.1.x | H | H2O |
Phosphorylases | 2.4.1.x , 2.4.99.x | T | a/ePi |
Transglycosylases | 2.4.1.x , 2.4.2.x | T | a/e-leaving group |
Lyases | 4.2.2.x | L | none |
Mutases | 5.4.99.x | M | a/e-leaving group |
PL | |||
Lyases | 4.2.2.x | L | none |
AA - LPMO | |||
Oxidoreductases | 1.14.99.x | O | H2O2 |
GT | |||
Transferases | 2.4.1.x , 2.4.2.x , 2.4.99.x | T | a/e-leaving group |
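The combination of ‘enzyme class’ and activity letter described above can be sketched as a simple filter. This is an illustration only, not the CAZac implementation: the record layout (a dict with `family` and `activity_id` keys), the function names and the sample IDs are all hypothetical, and only the activity letters H, T, L, M and O come from the table.

```python
import re

# Activity letters as listed in the table above.
ACTIVITY_LETTERS = {
    "H": "hydrolase",
    "T": "transferase",
    "L": "lyase",
    "M": "mutase",
    "O": "oxidoreductase",
}

_ID_RE = re.compile(r"([A-Z])(\d+)")      # a letter followed by numbers
_CLASS_RE = re.compile(r"[A-Z]+")         # leading letters of a family name


def parse_activity_id(activity_id: str):
    """Split an activity ID into (activity name, number)."""
    match = _ID_RE.fullmatch(activity_id)
    if match is None or match.group(1) not in ACTIVITY_LETTERS:
        raise ValueError(f"not a recognised activity ID: {activity_id!r}")
    return ACTIVITY_LETTERS[match.group(1)], int(match.group(2))


def select(records, enzyme_class=None, activity_letter=None):
    """Keep records matching the given class prefix and/or activity letter."""
    selected = []
    for record in records:
        class_match = _CLASS_RE.match(record["family"])
        record_class = class_match.group(0) if class_match else ""
        if enzyme_class is not None and record_class != enzyme_class:
            continue
        if activity_letter is not None and not record["activity_id"].startswith(activity_letter):
            continue
        selected.append(record)
    return selected


# Placeholder records with made-up activity IDs, for illustration only.
SAMPLE = [
    {"family": "GH13", "activity_id": "H1"},
    {"family": "GH13", "activity_id": "T2"},
    {"family": "GT2", "activity_id": "T3"},
    {"family": "PL1", "activity_id": "L4"},
]

# GH-family entries with transferase (transglycosylation) activity.
gh_transferases = select(SAMPLE, enzyme_class="GH", activity_letter="T")
```

Here `gh_transferases` contains only the second sample record, which is the GH entry whose activity ID starts with T.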
Residue undergoing catalysis
In CAZac, a standardized nomenclature inspired by the IUPAC convention was devised to describe monosaccharides.
A ‘strict’ option can be added to limit the search to one specific monosaccharide; for example, a ‘strict’ search for bDGalp does not return activities on bDGalpNAc.
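The effect of the ‘strict’ option can be illustrated with a small sketch. It rests on an assumption that is not stated in the text: that a non-strict search matches any residue whose name begins with the query (so that bDGalp also finds bDGalpNAc), while a strict search requires an exact match. The function name `residue_matches` is hypothetical.

```python
def residue_matches(query: str, residue: str, strict: bool = False) -> bool:
    """Decide whether a residue name satisfies a monosaccharide query.

    Assumption: non-strict matching is a prefix match, so derivatives
    such as bDGalpNAc are returned for the query bDGalp; strict matching
    requires the two names to be identical.
    """
    if strict:
        return residue == query
    return residue.startswith(query)


residues = ["bDGalp", "bDGalpNAc"]
loose_hits = [r for r in residues if residue_matches("bDGalp", r)]
strict_hits = [r for r in residues if residue_matches("bDGalp", r, strict=True)]
```

Under this assumption `loose_hits` keeps both residue names, whereas `strict_hits` keeps only bDGalp.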
[2] Lombard V, Henrissat B, Garron ML (2024) CAZac: an activity descriptor for carbohydrate-active enzymes. Nucleic Acids Res [doi:10.1093/nar/gkae1045].
Families in CAZy : update and creation of families
Sequence Updates | Protein sequences are downloaded from the daily releases of the NCBI and compared to our internal BLAST and HMM libraries of modules. Positive hits are added to the database automatically or manually, in all cases with a validation step by a curator. |
Biochemical Updates | Information on biochemistry and structure is extracted from the literature. Experimentally demonstrated enzyme activities are assigned to individual enzyme modules whenever possible. |
New Families | New CAZy families are created whenever we find published biochemical evidence for activity (either GH, PL, GT, CE or CBM) associated with a protein sequence not yet classified in one of our families. Please contact us if we have missed a family; we will be happy to add it. |
Links to GenBank accessions are provided for each protein. As a protein may correspond to several accessions (different versions or multiple sequencings) that can vary in length, quality and the sequence itself, the "best" model is identified in bold. Sequences derived from patents, pseudogenes and other external sources therefore have no such GenBank link.
Since 2021, cross-references to UniProt accessions (Swiss-Prot and TrEMBL) have been systematically searched for biochemically characterized proteins only, and are updated on a regular basis.
PDB / 3D
PDB accession codes are provided for proteins with an experimentally determined three-dimensional structure. This information is derived from weekly analyses of new PDB releases and from surveys of the recent literature. The three-dimensional structures of the constitutive modules of a CAZyme are sometimes determined individually after production of truncated proteins. Whenever possible we show the PDB code corresponding to the family displayed (conversely, if a PDB code is available for the same protein but for a module different from the family displayed, it is not shown). Finally, ‘cryst’ indicates that we have seen a public protein crystallization report.
PDB[Chain1,Chain2,...] | Accession of a released PDB entry followed by all chains sharing the same underlying sequence. |
PDB | Accession of an unreleased PDB entry corresponding to the protein in question. These codes are obtained from the analysis of the PDB unreleased structures set or from recent literature. |
cryst | Protein with a crystallization note released in the literature or in public meetings. |
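The three notations above can be told apart with a short parser. This is a minimal sketch, assuming four-character PDB codes that start with a digit and chain names separated by commas inside square brackets; the function name, the returned dictionary layout and the example code 1ABC are hypothetical.

```python
import re

# A four-character PDB code, optionally followed by [chain,chain,...].
_PDB_RE = re.compile(r"([0-9][A-Za-z0-9]{3})(?:\[([^\]]*)\])?")


def parse_structure_field(field: str) -> dict:
    """Classify a structure entry as released, unreleased or cryst."""
    field = field.strip()
    if field == "cryst":
        # Crystallization report only; no PDB code available.
        return {"kind": "cryst", "pdb": None, "chains": []}
    match = _PDB_RE.fullmatch(field)
    if match is None:
        raise ValueError(f"not a recognised structure entry: {field!r}")
    code, chains = match.groups()
    if chains is None:
        # A bare code denotes an unreleased PDB entry.
        return {"kind": "unreleased", "pdb": code.upper(), "chains": []}
    # A code followed by brackets denotes a released entry with its chains.
    chain_list = [c.strip() for c in chains.split(",") if c.strip()]
    return {"kind": "released", "pdb": code.upper(), "chains": chain_list}
```

For instance, the made-up entry `1ABC[A,B]` is classified as released with chains A and B, a bare `1ABC` as unreleased, and `cryst` as a crystallization note without a code.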