MEROPS: the peptidase database

Article in Nucleic Acids Research 36(Database issue):D320-5 · February 2008
DOI: 10.1093/nar/gkm954 · Source: PubMed
Peptidases (proteolytic enzymes or proteases), their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfil the need for an integrated source of information about these. The organizational principle of the database is a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families and in turn grouped into clans. Important additions to the database include newly written, concise text annotations for peptidase clans and the small molecule inhibitors that are outside the scope of the standard classification; displays to show peptidase specificity compiled from our collection of known substrate cleavages; tables of peptidase–inhibitor interactions; and dynamically generated alignments of representatives of each protein species at the family level. New ways to compare peptidase and inhibitor complements between any two organisms whose genomes have been completely sequenced, or between different strains or subspecies of the same organism, have been devised.


D320–D325 Nucleic Acids Research, 2008, Vol. 36, Database issue Published online 8 November 2007
Neil D. Rawlings*, Fraser R. Morton, Chai Yin Kok, Jun Kong and Alan J. Barrett
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
Received September 17, 2007; Revised and Accepted October 15, 2007
The MEROPS database is a manually curated information resource for peptidases (also known as proteases, proteinases or proteolytic enzymes), their inhibitors and substrates. The database has been in existence since 1996 and is available online. Releases are made quarterly.
Peptidases and protein inhibitors are arranged in the database according to a hierarchical classification. The classification is based on sequence comparisons of the domains known to be important for activity (known as the peptidase or inhibitor unit). A protein that has been sequenced and characterized biochemically is chosen as a representative (‘holotype’). All sequences that represent species variants of the holotype are grouped into a ‘protein species’. The sequences of protein species that are related with statistical significance are grouped into a ‘family’. Families that are believed to have had a common ancestor, either because the tertiary structures of the proteins are similar or (in the case of peptidases) because the active site residues are in the same order in the sequence, are grouped into a ‘clan’ (1,2).
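The four levels described above (sequence, protein species, family, clan) can be sketched as a small data model. This is only an illustration in Python and not the MEROPS schema: the class names, field names, the identifier "S01.example" and the accessions "ACC0001" and "ACC0002" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProteinSpecies:
    identifier: str   # hypothetical identifier for the protein species
    holotype: str     # accession of the characterized representative
    sequences: List[str] = field(default_factory=list)  # species variants


@dataclass
class Family:
    name: str
    species: Dict[str, ProteinSpecies] = field(default_factory=dict)


@dataclass
class Clan:
    name: str
    families: Dict[str, Family] = field(default_factory=dict)


def count_levels(clans):
    """Return (clans, families, protein species, sequences) totals."""
    families = [f for c in clans for f in c.families.values()]
    species = [s for f in families for s in f.species.values()]
    n_sequences = sum(len(s.sequences) for s in species)
    return len(clans), len(families), len(species), n_sequences


# Toy example with made-up accessions (not MEROPS data).
example = ProteinSpecies("S01.example", "ACC0001", ["ACC0001", "ACC0002"])
clan = Clan("PA", {"S1": Family("S1", {example.identifier: example})})
totals = count_levels([clan])
print(totals)
```

Nesting dictionaries keyed by name keeps lookups simple; a relational layout would serve the same purpose.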
Statistics from release 7.8 (April 2007) of MEROPS are shown in Table 1 and compared with release 7.1 from July 2005. The number of peptidase sequences has more than doubled, whereas the numbers of protein families and clans have increased only marginally. This reflects the considerable effort being put into completing genome sequences. There has also been a significant increase (17%) in the number of peptidase species.
We have expanded the text summaries to include clans of peptidases. A peptidase clan summary is structured under the headings: description, history (when and where the clan identifier was first published), contents of clan (a description of the types of peptidases contained within the clan), evidence (an explanation as to why the families are included in the same clan), catalytic mechanism, peptidase activity, protein fold (descriptions of the known tertiary structures for members of the clan and to which families they belong), homologous non-peptidase families (families of proteins other than peptidases that share a similar tertiary structure), evolution (pointing out possible relationships that may exist with peptidases in other clans or significant absences amongst organism kingdoms), activation mechanism and other databases (links to clans in the Pfam database (3) and superfamilies in the SCOP database (4)).
Specificity logos
One of the most important characteristics of a peptidase that distinguishes it from related peptidases is its action on substrates. For some peptidases, such as trypsin, the specificity is easily described because catalytic activity is restricted to cleavage of lysyl or arginyl bonds. However, for many peptidases specificity is much more complex and difficult to define.
*To whom correspondence should be addressed. Tel: +44 1223 494983; Fax: +44 1223 494919
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Schechter and Berger (5) introduced a naming convention that helps with the description of peptidase specificity. The peptide chain around the cleavage site of the substrate is assumed to thread through the active site of the peptidase so that each binding pocket of the peptidase is occupied by one amino acid, and many crystal structures of peptidase–inhibitor complexes tend to confirm this model. The residues of the substrate C-terminal to the site of cleavage (the ‘scissile bond’) are numbered P1′, P2′, P3′, etc. (the ‘prime side’), whereas those N-terminal to the scissile bond are numbered P1, P2, P3, etc. (the ‘non-prime side’). Each binding pocket of the peptidase (which may be lined with several amino acid side chains) on the prime side is numbered S1′, S2′, S3′, etc. and those on the non-prime side are numbered S1, S2, S3, etc. The number of important binding pockets (and therefore substrate residues) differs between peptidases. In trypsin, specificity is determined only by residues in P1 and P1′ (only arginine or lysine are acceptable in P1 and proline is not acceptable in P1′). Peptidases that have a specificity requirement beyond P1 or P1′ are said to have an ‘extended binding site’. Mitochondrial intermediate peptidase, which removes an N-terminal octapeptide targeting signal from proteins destined for the mitochondrial lumen, may have the longest extended binding site (6), but for most peptidases binding rarely exceeds P4.
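The Schechter and Berger numbering can be illustrated with a short helper that labels the residues on either side of a scissile bond. This is a minimal sketch: the function name, its parameters and the example peptide are assumptions made for illustration, and an ASCII apostrophe stands in for the prime symbol.

```python
def schechter_berger_labels(sequence, cleavage_index, window=4):
    """Label residues around a scissile bond.

    cleavage_index is the number of residues N-terminal to the scissile
    bond, so the bond lies between sequence[cleavage_index - 1] (P1)
    and sequence[cleavage_index] (P1').
    """
    if not 0 < cleavage_index < len(sequence):
        raise ValueError("scissile bond must lie inside the sequence")
    labels = {}
    for i in range(1, window + 1):
        n_pos = cleavage_index - i       # non-prime side: P1, P2, ...
        c_pos = cleavage_index + i - 1   # prime side: P1', P2', ...
        if n_pos >= 0:
            labels[f"P{i}"] = sequence[n_pos]
        if c_pos < len(sequence):
            labels[f"P{i}'"] = sequence[c_pos]
    return labels


# Made-up peptide cleaved after the fifth residue.
labels = schechter_berger_labels("GDEVDGSA", 5)
print(labels)
```

Positions that would fall outside the peptide are simply omitted, so a short substrate yields fewer labels than the window allows.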
The MEROPS collection of cleavages in natural and synthetic substrates now exceeds 7000. For each cleavage we store up to four residues on either side of the scissile bond (residues P4–P4′). For any peptidase with more than ten known cleavages we now present a display that gives an indication of the amino acids preferred at its substrate-binding sites. This display uses the WebLogo software (7). For the purposes of this display, the eight residues P4–P4′ for all substrates of a peptidase are considered to be an alignment. The observed frequency of amino acids at each position is calculated as a bit score, with the maximum possible score being 4.32 bits. At each position, the single-letter code of an amino acid is shown if the bit score exceeds 0.1, and the height of the letter is proportional to the bit score. Acidic residues are shown in red, basic in blue, hydrophobic residues in black and others in green. Figure 1 shows the cleavage site sequence logo for caspase-3. The logo shows an absolute requirement for aspartate in P1 (position 4) and a preference for aspartate in P4, and this specificity has been confirmed by experimentation (8).
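The bit scores can be sketched with the usual sequence-logo calculation, in which the information content of a column is log2(20) minus its Shannon entropy and each letter's score is its frequency multiplied by that information content. The code below is an illustration under stated assumptions, not the WebLogo implementation: the function name and the example octapeptides are made up, and no small-sample correction is applied.

```python
import math
from collections import Counter

MAX_BITS = math.log2(20)  # about 4.32 bits for 20 amino acids


def letter_bit_scores(sites):
    """Per-position, per-residue bit scores for equal-length cleavage sites.

    Each site is a string covering P4..P4'. The information content of a
    column is log2(20) minus its Shannon entropy, and each residue's score
    is its frequency times that information content.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    scores = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        total = sum(counts.values())
        freqs = {aa: n / total for aa, n in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = MAX_BITS - entropy
        scores.append({aa: f * info for aa, f in freqs.items()})
    return scores


# Made-up octapeptides for illustration only (not MEROPS data).
sites = ["DEVDGSAK", "DQTDSLLE", "DEVDAPKG", "AEVDGTRQ"]
scores = letter_bit_scores(sites)
print(round(scores[3]["D"], 2))
```

A position occupied by a single residue in every site reaches the maximum score, which is how an absolute requirement such as the P1 aspartate of caspase-3 would appear.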
In addition to the logo, a text string describing the
specificity is also shown (Figure 1). Where only one amino
acid predominates at a position, it is shown in uppercase if
the bit score exceeds 0.4. Where more than one amino acid
exceeds 0.1 bits for a single position, a letter is shown in
uppercase if the bit score exceeds 0.7.
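The thresholds above can be turned into a specificity string as follows. This is a hedged sketch: the function name and parameters are hypothetical, "only one amino acid predominates" is interpreted as exactly one residue exceeding 0.1 bits, and the use of lowercase for residues below the uppercase threshold, "-" for positions with no letter shown and "+" for the scissile bond are assumptions of the example.

```python
def specificity_string(scores, bond_after=4, show=0.1,
                       single_upper=0.4, multi_upper=0.7):
    """Build a slash-separated specificity string from per-letter bit scores.

    scores is a list of dicts (one per position) mapping residue -> bits.
    """
    fields = []
    for column in scores:
        # Residues above the display threshold, highest score first.
        shown = sorted((aa for aa, bits in column.items() if bits > show),
                       key=lambda aa: -column[aa])
        threshold = single_upper if len(shown) == 1 else multi_upper
        text = "".join(aa.upper() if column[aa] > threshold else aa.lower()
                       for aa in shown)
        fields.append(text or "-")
    return "/".join(fields[:bond_after]) + "+" + "/".join(fields[bond_after:])


# Made-up bit scores for positions P4..P4' (not MEROPS data).
example_scores = [
    {"D": 2.6, "A": 0.05},
    {"E": 0.5, "Q": 0.3},
    {"V": 0.8, "T": 0.2},
    {"D": 4.32},
    {"G": 0.3},
    {},
    {},
    {},
]
result = specificity_string(example_scores)
print(result)
```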
The MEROPS database has included protein inhibitors of peptidases since 2004. However, there are many other inhibitors that are not proteins, including peptides and synthetic inhibitors, which we term small molecule inhibitors (SMIs). These include many that are laboratory reagents used in the characterization of peptidases, and others that are drugs, such as the inhibitors of the retropepsin of HIV. Information about SMIs has now been collated and is presented within MEROPS. There is no satisfactory, single method to classify SMIs, so their names and alternative names are simply listed alphabetically. For many SMIs summaries have been written. Each summary contains a recommended name, other names including the chemical name, history, details of peptidases inhibited, a description of the mechanism of inhibition, an image of the chemical structure, a cross reference to the PubChem database (9), comments and recommended reviews. An example summary page for pepstatin is shown in Figure 2. In addition, a ‘Relevant Inhibitors’ field has been added to the peptidase summaries, which lists SMIs that are known to inhibit the peptidase or that do not inhibit even if expected to. Each item in the list has a link to the relevant SMI summary.
Figure 1. Cleavage site sequence logo showing specificity for caspase-3. Amino acids preferred in positions P4–P4′ are shown in single-letter code. The specificity is shown as a string where each position is separated by a forward slash character and multiple letters in a position indicate a wide specificity for these amino acids. The scissile bond is shown by a red cross symbol. In the diagram, the height of the letter is proportional to the number of cleavage sites in which it is present. Positions P4–P4′ are numbered one to eight, with the scissile bond between residues four and five. Caspase-3, like most caspases, has a preference for Asp in P1 and a majority of substrates also have Asp in P4.
Table 1. Counts of protein species, families and clans for peptidase and protein inhibitor homologues in the MEROPS database
                   Release 7.8               Release 7.1
                   Peptidases  Inhibitors    Peptidases  Inhibitors
Sequences          66 524      4 912         30 090      3 690
Protein species    2 403       571           2 053       532
Families           185         53            180         53
Clans              51          33            39          32
We have collected over 900 known peptidase–protein inhibitor interactions from the literature. At least one interaction with a peptidase is known for 375 inhibitor species. We now present a table of interactions for each inhibitor, which includes a link to the relevant peptidase summary, a reference and some details such as a published dissociation constant and conditions or comments about the interaction. An example is shown in Figure 3. A similar table is also presented for each peptidase, listing the inhibitors with which it interacts.
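Both tables can be derived from a single list of interaction records by indexing it in two directions. The sketch below is illustrative only: the record fields, function name and placeholder references are hypothetical, and no dissociation constants are given because none are quoted in the text.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    inhibitor: str
    peptidase: str
    reference: str                                # placeholder citation key
    dissociation_constant: Optional[str] = None   # as published, free text
    comment: str = ""


def index_interactions(records):
    """Return (by_inhibitor, by_peptidase) lookup tables."""
    by_inhibitor = defaultdict(list)
    by_peptidase = defaultdict(list)
    for rec in records:
        by_inhibitor[rec.inhibitor].append(rec)
        by_peptidase[rec.peptidase].append(rec)
    return dict(by_inhibitor), dict(by_peptidase)


# Toy records with placeholder references.
records = [
    Interaction("aprotinin", "trypsin", "ref-A"),
    Interaction("aprotinin", "plasmin", "ref-B"),
]
by_inhibitor, by_peptidase = index_interactions(records)
print([rec.peptidase for rec in by_inhibitor["aprotinin"]])
```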
Comparison of peptidase/inhibitor complement between strains of an organism
It is commonplace for several strains of a bacterium to be
subjected to genome sequencing. In MEROPS, we have
always maintained a non-redundant sequence collection,
and only one variant of a protein sequence is retained,
unless there is evidence that the proteins are products of
different genes. This has meant that for each bacterial
protein we display only one sequence, even though
variations may be known from several different strains.
All proteins derived from the same species have been
considered part of the same genomic complement,
regardless of strain. This approach may hide unique
expression of proteins that may have medical significance.
For example, not all strains of Escherichia coli are
pathogenic, and peptidases or inhibitors restricted to
pathogenic strains may be of medical importance.
Similarly, some proteins important for pathogenicity
may be inactivated by residue replacements in non-
pathogenic strains. To address this problem, whilst still
maintaining our non-redundant sequence collection, we
have made use of our nucleotide database sequence
accession collection, which is now annotated below the
species level (subspecies, strain, pathovar, etc.).
We now present two displays showing comparison of peptidases or inhibitors between strains of a prokaryote organism. The user is invited to select an organism from a list of prokaryotes with completed genomes. The first display shows clans and families and the number of sequences in each family for each strain. This enables the user to spot absences or additional members for the strain of interest. The second display is at the protein species level. Not only can absences or additional proteins be observed, but also we provide a dynamically generated alignment so that conservation of residues for the same protein from different strains can be assessed. The sequences are taken directly from the UniProt protein sequence database (10) and the alignment is generated by the MUSCLE software package (11).
Figure 2. Example SMI page. The summary page for the inhibitor pepstatin is shown.
To complement these changes, the genomes pages now
show counts of peptidases and protein inhibitors at the
strain level.
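The family-level strain display amounts to counting sequences per family for each strain and looking for rows in which the counts differ. The following is a minimal sketch under assumed inputs: the function names, strain names and family assignments are made up and do not come from MEROPS.

```python
from collections import Counter


def family_counts_by_strain(records):
    """records: iterable of (strain, family) pairs, one per sequence."""
    counts = {}
    for strain, family in records:
        counts.setdefault(strain, Counter())[family] += 1
    return counts


def strain_table(counts):
    """Rows of (family, {strain: count}), with zeros for absences."""
    strains = sorted(counts)
    families = sorted({fam for c in counts.values() for fam in c})
    return [(fam, {s: counts[s][fam] for s in strains}) for fam in families]


def differing_families(table):
    """Families whose sequence count is not the same in every strain."""
    return [fam for fam, row in table if len(set(row.values())) > 1]


# Toy data for two hypothetical strains.
records = [
    ("strain_A", "S1"), ("strain_A", "S1"), ("strain_A", "M1"),
    ("strain_B", "S1"), ("strain_B", "M1"), ("strain_B", "S8"),
]
table = strain_table(family_counts_by_strain(records))
print(differing_families(table))
```

A `Counter` returns zero for a missing key, so a family that is absent from a strain appears in the table as an explicit zero.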
Comparison of peptidase/inhibitor complements between organisms
We have also developed a display so that comparisons can be made between any two organisms with completely sequenced genomes, not just prokaryotes. The comparison can be done at the family or protein species levels, and counts are shown of peptidase homologues for each species. Significant differences between the organisms are highlighted. At the family level, separate counts are shown for homologues presumed to be peptidases (i.e. possessing all active site residues) and those that are predicted to be non-peptidase homologues (with an unacceptable replacement of at least one active site residue). Figure 4 shows part of the comparison between human and the pathogenic fungus Candida albicans (the causative agent of thrush) at the family level. The only peptidase family present in C. albicans but absent in humans is S64, which includes the Ssy5 peptidase, a component of the pathways for the uptake of external amino acids; the peptidase activates the Stp1 transcription factor, leading to production of an amino acid permease (12). This identifies the Ssy5 peptidase as a potential drug target, because no homologue exists in humans and the equivalent gene has been shown to be essential in Saccharomyces cerevisiae by the Saccharomyces Genome Deletion Project (13).
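The family-level comparison between two organisms can be sketched as two counts per family (predicted peptidases and predicted non-peptidase homologues) followed by a search for families found in only one organism. The function names and the toy counts below are assumptions for illustration; only the presence of S64 in the fungus and its absence in human reflects the text.

```python
from collections import Counter


def complement(homologues):
    """homologues: iterable of (family, is_active) pairs for one organism.

    Returns {family: (peptidase_count, non_peptidase_count)}.
    """
    active, inactive = Counter(), Counter()
    for family, is_active in homologues:
        (active if is_active else inactive)[family] += 1
    return {fam: (active[fam], inactive[fam])
            for fam in set(active) | set(inactive)}


def unique_families(first, second):
    """Families with a predicted peptidase in first but none in second."""
    return sorted(fam for fam, (n_active, _) in first.items()
                  if n_active > 0 and second.get(fam, (0, 0))[0] == 0)


# Toy complements; the counts are made up.
fungus = complement([("S64", True), ("A1", True), ("A1", False)])
human = complement([("A1", True), ("S1", True), ("S1", False)])
print(unique_families(fungus, human))
```

Keeping the two counts separate means that a family represented only by non-peptidase homologues is not mistaken for one with a functional peptidase.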
Dynamic alignments for holotypes
With the completion of so many prokaryote and
eukaryote genomes, the number of sequences within
some families now exceeds a thousand. The alignments
we generate can therefore be very large. However, within
any family the number of proteins characterized well
enough to be considered holotypes is still small. The
family with the most holotypes is S1, for which there are
over 3000 sequences and 400 holotypes. So that the
variability within a family can be more easily understood,
we now generate an alignment of peptidase or inhibitor
units just for the holotypes in each family. The MUSCLE
software package (11) is used to generate each alignment.
Active site residues, disulphide bridges, carbohydrate-attachment sites and transmembrane regions are highlighted.
New label keys for family alignments and trees
The label keys for the family alignments and trees are now
generated dynamically. The content of each line has been
altered and now shows the MEROPS identifier (linked to
the relevant summary page), the organism name (linked to
the relevant organism card), the recommended name
(and any subset of that name), the MEROPS accession
number for the sequence (linked to the sequence page),
and the extent of the peptidase or inhibitor unit.
The focus of the MEROPS database for the last 2 years has been towards the addition of further annotation at the protein species level. One peptidase species can be distinguished from another by several criteria, including its interactions with substrates and inhibitors, and we now include displays for both of these aspects. New tools enable the user to compare peptidase and protein inhibitor species between strains of the same organism or between different organisms.
Figure 3. Peptidase–inhibitor interactions for aprotinin.
The database is freely available online. Our sequence databases in FastA format and versions of the database as either flat files or a compressed file of SQL statements for import into MySQL can be downloaded from our FTP site.
We wish to thank the Wellcome Trust and the Medical
Research Council for financial support. Funding to pay
the Open Access publication charges for this article was
provided by the Wellcome Trust.
Conflict of interest statement. None declared.
1. Rawlings,N.D. and Barrett,A.J. (1993) Evolutionary families of
peptidases. Biochem. J., 290, 205–218.
2. Rawlings,N.D., Tolle,D.P. and Barrett,A.J. (2004) Evolutionary
families of peptidase inhibitors. Biochem. J., 378, 705–716.
3. Finn,R.D., Mistry,J., Schuster-Bockler,B., Griffiths-Jones,S.,
Hollich,V., Lassmann,T., Moxon,S., Marshall,M. et al. (2006)
Pfam: clans, web tools and services. Nucleic Acids Res.,
34(Database issue), D247–D251.
4. Andreeva,A., Howorth,D., Brenner,S.E., Hubbard,T.J., Chothia,C.
and Murzin,A.G. (2004) SCOP database in 2004: refinements
integrate structure and sequence family data. Nucleic Acids Res.,
32(Database issue), D226–D229.
5. Schechter,I. and Berger,A. (1968) On the active site of proteases. 3.
Mapping the active site of papain; specific peptide inhibitors of
papain. Biochem. Biophys. Res. Commun., 32, 898–902.
6. Branda,S.S. and Isaya,G. (1995) Prediction and identification of
new natural substrates of the yeast mitochondrial intermediate
peptidase. J. Biol. Chem., 270, 27366–27373.
7. Crooks,G.E., Hon,G., Chandonia,J.-M. and Brenner,S.E. (2004)
WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
8. Thornberry,N.A., Rano,T.A., Peterson,E.P., Rasper,D.M.,
Timkey,T., Garcia-Calvo,M., Houtzager,V.M., Nordstrom,P.A.
et al. (1997) A combinatorial approach defines specificities of
members of the caspase family and granzyme B. Functional
relationships established for key mediators of apoptosis. J. Biol.
Chem., 272, 17907–17911.
Figure 4. Comparison between the peptidase complements of the human and the C. albicans genomes. Only the top and bottom portions of the table are shown.
9. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K.,
Chetvernin,V., Church,D.M., DiCuccio,M. et al. (2007) Database
resources of the National Center for Biotechnology Information.
Nucleic Acids Res., 35(Database issue), D5–D12.
10. Wu,C.H., Apweiler,R., Bairoch,A., Natale,D.A., Barker,W.C.,
Boeckmann,B., Ferro,S., Gasteiger,E. et al. (2006) The Universal
Protein Resource (UniProt): an expanding universe of protein
information. Nucleic Acids Res., 34(Database issue), D187–D191.
11. Edgar,R.C. (2004) MUSCLE: a multiple sequence alignment
method with reduced time and space complexity.
BMC Bioinformatics, 5, 113.
12. Abdel-Sater,F., El Bakkoury,M., Urrestarazu,A., Vissers,S. and
Andre,B. (2004) Amino acid signaling in yeast: casein
kinase I and the Ssy5 endoprotease are key
determinants of endoproteolytic activation of the
membrane-bound Stp1 transcription factor. Mol. Cell. Biol., 24,
13. Wilson,W.A. and Roach,P.J. (2003) Saccharomyces gene deletion
project: applications and use in the study of protein kinases and
phosphatases. Meth. Enzymol., 366, 403–418.