Update of KEYnet: a gene and protein names database for biosequences functional organisation.
ABSTRACT KEYnet is a database where gene and protein names are hierarchically structured. Particular care has been devoted to the search and organisation of synonyms. The structuring is based on biological criteria in order to assist the user in data search and to minimise the risk of information loss. Links to the EMBL data library by the entry name and the accession number are implemented. KEYnet is available through the WWW at the following site: http://www.ba.cnr.it/keynet.html
ABSTRACT: EMBL and GenBank keyword indexes have no hierarchical structure. In this paper we present a method for merging and reorganizing them in a tree structure whose primary roots are the keywords 'protein', 'DNA', 'RNA', and 'unclassified'. Synonymous keywords have been grouped together and erroneous keywords have been corrected. This taxonomic organization of keywords results in a more extensive and efficient retrieval which is further aided by "synonyms declaration". The tree has been produced using the computer programs GENPOINT and CREANET. Protein Sequences & Data Analysis 10/1990; 3(4):327-34.
- Methods in Enzymology 02/1996; 266:114-28.
ABSTRACT: KEYnet is a database where gene and protein names are hierarchically structured. Particular care has been devoted to the search and organisation of synonyms. The structuring is based on biological criteria in order to assist the user in the data search and to minimise the risk of loss of information. Links to the EMBL data library by the entry name and the accession number have been implemented. KEYnet is available through the World Wide Web at the following site: http://www.ba.cnr.it/keynet.html. Recently KEYnet has incorporated specific gene name classifications, which can be browsed starting from the above-mentioned KEYnet home page: the Mitochondrial Gene Names classification and the Rat Gene Names classification. The KEYnet database has also been structured in a flatfile format and can be queried through SRS (http://bio-www.ba.cnr.it:8000/srs). Nucleic Acids Research 01/1999; 27:365-367.
Nucleic Acids Research, 2000, Vol. 28, No. 1 © 2000 Oxford University Press
Update of KEYnet: a gene and protein names database
for biosequences functional organisation
D. Catalano, F. Licciulli, D. D’Elia and M. Attimonelli¹,*
Area di Ricerca, CNR, 70126 Bari, Italy and ¹Department of Biochemistry and Molecular Biology, Faculty of Sciences,
University of Bari, 70126 Bari, Italy
Received October 4, 1999; Revised and Accepted October 13, 1999
KEYnet is a database where gene and protein names
are hierarchically structured. Particular care has
been devoted to the search and organisation of synonyms.
The structuring is based on biological criteria
in order to assist the user in data search and to minimise
the risk of information loss. Links to the EMBL data
library by the entry name and the accession number
are implemented. KEYnet is available through the WWW
at the following site: http://www.ba.cnr.it/keynet.html
The most common interrogation criteria for bio-databases are
gene and protein names but, so far, the majority of them have
been incorrectly annotated in the nucleic acid sequence databases,
which causes inconsistencies in data retrieval. In order to
properly target retrieval using such criteria, gene and protein
names need to be correctly coded. Here we present the database
KEYnet (1,2) where gene and protein names are organised in a
hierarchical structure according to the biological function of
the associated sequence. Links among lexical or biological
synonyms are implemented.
Each entry in the KEYnet database is related to a gene or
protein name. The whole database is hierarchically structured
according to the scheme previously reported (1,2) and visible
at http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/network.html .
In particular, the KEYnet structure is made up of a set of elements
(nodes) linked by father–son relationships. At the highest
level there is the root which links all the branches in the tree.
The most important branches are the nodes Protein, DNA and
RNA. Each leaf in the tree is composed of several elements
linked by synonymy. Two side branches are implemented:
the Rat Gene Names Tree and the Mitochondrial Genome
Tree [the Mitochondrial Gene Names classification has been
structured as a contribution to the MitBASE project (3)]. Gene
and protein names are extracted from the EMBL data library (4).
Biological information about the associated sequences is
extracted from the same primary databases [EMBL data library
(4) and GenBank (5)] and from specialised databases such as
SWISS-PROT (6), ENZYME (7) or any other suitable database.
MEDLINE is also consulted whenever the above mentioned
databases do not contain the necessary information for the
gene and protein name classification. The KEYnet database is
updated at each EMBL data library release, at which time the
link between KEYnet and the EMBL data library is established.
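The father–son organisation described above, with synonyms grouped inside each node, can be sketched as a small tree. This is a minimal illustration under stated assumptions, not the KEYnet implementation: the class `Node`, the function `find` and the intermediate level `enzyme` are hypothetical names introduced here; only the root, the Protein, DNA and RNA branches and the grouping of synonyms come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One element of the hierarchy; `names` holds the synonyms grouped in it."""
    names: List[str]
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, *names: str) -> "Node":
        """Attach a son node carrying one preferred name plus its synonyms."""
        child = Node(list(names), parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Preferred names from the root down to this node."""
        out = []
        node: Optional[Node] = self
        while node is not None:
            out.append(node.names[0])
            node = node.parent
        return out[::-1]


def find(node: Node, name: str) -> Optional[Node]:
    """Depth-first search for a name or any of its synonyms (case-insensitive)."""
    if name.lower() in (n.lower() for n in node.names):
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


# Illustrative fragment only; the level "enzyme" is an assumption.
root = Node(["root"])
protein = root.add_child("Protein")
root.add_child("DNA")
root.add_child("RNA")
enzyme = protein.add_child("enzyme")
gs = enzyme.add_child("glutamine synthetase", "glutamate-ammonia ligase")

hit = find(root, "Glutamate-ammonia ligase")
print(" > ".join(hit.path()))
```

Looking up a synonym returns the same node as looking up the preferred name, so a search by either name reaches the same position in the hierarchy.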
One of the major problems encountered during data
classification is the gene names branch. Gene naming is
recognised worldwide as a difficult problem, due to the
freedom with which users assign a name to a gene whenever it
is discovered. Several attempts to address this problem are in
progress (8,9; see http://www.ebi.ac.uk:7081/docs/nomenclature
and http://www.gene.ucl.ac.uk/nomenclature ).
We have organised gene names by establishing a starting set
of main ancestor keywords relevant to their primary biological
functions. At present KEYnet contains 66 219 gene and protein
names, as reported in detail in the table at http://bio-www.
KEYnet QUERY SYSTEMS
The KEYnet database can be queried through the RETKEY
program, written in FORTRAN and C, available on the server of the
CNR Research Area of Bari. A slightly different version is
KEYnetWWW (http://www.ba.cnr.it/keynet.html ), which is
more powerful because it can be accessed worldwide and the
retrievable information is more complete.
The usage of KEYnetWWW is described in the following
examples. Searching for glutamine synthetase nucleotide
sequences in the KEYnet database
(http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/example1), we obtain 257 entries
from release 58 of the EMBL data library. Searching for the
same protein starting from the ENZYME database through the
SRS (10) retrieval system
(http://bio-www.area.ba.cnr.it:8000/Tutorials/KEYnet/example2) gives 148 entries from the same
revised and the numbers actually refer to entries related to
nucleotide sequences coding for glutamine synthetase.
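One plausible reason a name search through a hierarchy with declared synonyms returns more entries than a flat lookup is that the query can be expanded before the linked entries are collected. The sketch below illustrates that idea only. It assumes, without confirmation from the source, that retrieval resolves a synonym to its node and then gathers entries linked to that node and its descendants. The tables `children`, `synonyms` and `embl_links`, the function `retrieve`, the level `enzyme` and all accession strings are invented placeholders, not KEYnet or EMBL data.

```python
from typing import Dict, List

# Hypothetical fragment of the hierarchy: node -> its son nodes.
children: Dict[str, List[str]] = {
    "Protein": ["enzyme"],
    "enzyme": ["glutamine synthetase"],
    "glutamine synthetase": [],
}

# Synonym -> preferred name of the node it belongs to.
synonyms: Dict[str, str] = {
    "glutamate-ammonia ligase": "glutamine synthetase",
}

# Node -> linked entries. Placeholder strings, NOT real EMBL accession numbers.
embl_links: Dict[str, List[str]] = {
    "glutamine synthetase": ["ACC0001", "ACC0002"],
    "enzyme": ["ACC0003"],
}


def retrieve(name: str) -> List[str]:
    """Return entries linked to the named node and to all of its descendants."""
    start = synonyms.get(name, name)  # resolve a synonym to its node
    if start not in children:
        return []
    stack, seen, hits = [start], set(), []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        hits.extend(embl_links.get(node, []))
        stack.extend(children.get(node, []))
    return sorted(set(hits))


print(retrieve("glutamate-ammonia ligase"))
print(retrieve("Protein"))
```

Querying a higher node such as `Protein` gathers the entries of every node beneath it, which is the sense in which a hierarchy can reduce the risk of missing entries filed under a more specific or differently spelled name.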
Users of KEYnet are kindly invited to cite the present article.
This work has been partially supported by the EU-Biotechnology
Programme (Contracts n. BIO4-CT95-0037 and BIO4-CT97-0),
by ‘Programma Biotecnologie legge 95/95 (MURST 5%)’, by
MPI (Italy) and by CNR Research Area of Bari (IT).
*To whom correspondence should be addressed. Tel: +39 080 548 2130; Fax: +39 080 548 4467; Email: email@example.com
1. Tullo,A., Liuni,S. and Attimonelli,M. (1990) Protein Seq. Data Anal., 3,
2. Licciulli,F., Catalano,D., D’Elia,D., Lorusso,V. and Attimonelli,M. (1999)
Nucleic Acids Res., 27, 365–367.
3. Attimonelli,M., Altamura,N., Benne,R., Boyen,C., Brennicke,A.,
Carone,A., Cooper,J.M., D’Elia,D., de Montalvo,A., de Pinto,B.,
De Robertis,M., Golik,P., Grienenberger,J.M., Knoop,V., Lanave,C.,
Lazowska,J., Lemagnen,A., Malladi,B.S., Memeo,F., Monnerot,M.,
Pilbout,S., Schapira,A.H.V., Sloof,P., Slonimski,P., Stevens,K. and
Saccone,C. (1999) Nucleic Acids Res., 27, 128–133. Updated article in
this issue: Nucleic Acids Res. (2000), 28, 148–152.
4. Stoesser,G., Tuli,M.A., Lopez,R. and Sterk,P. (1999) Nucleic Acids Res.,
27, 18–24. Updated article in this issue: Nucleic Acids Res. (2000), 28,
5. Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J.,
Ouellette,B.F.F., Rapp,B.A. and Wheeler,D.L. (1999) Nucleic Acids Res.,
27, 12–17. Updated article in this issue: Nucleic Acids Res. (2000), 28,
6. Bairoch,A. and Apweiler,R. (1999) Nucleic Acids Res., 27, 49–54.
Updated article in this issue: Nucleic Acids Res. (2000), 28, 45–48.
7. Bairoch,A. (1999) Nucleic Acids Res., 27, 310–311.
8. Lonsdale,D.M. and Leaver,C.J. (1988) Plant Mol. Biol., 6, 14–21.
9. Hallick,R.B. (1989) Plant Mol. Biol., 7, 266–275.
10. Etzold,T., Ulyanov,A. and Argos,P. (1996) Methods Enzymol., 266, 114–128.