BioMed Central
BMC Bioinformatics
Open Access
Database
BNDB – The Biochemical Network Database
Jan Küntzer*1, Christina Backes1, Torsten Blum2, Andreas Gerasch2, Michael Kaufmann2, Oliver Kohlbacher2 and Hans-Peter Lenhof1
Address: 1 Center for Bioinformatics, Saarland University, 66041 Saarbrücken, Germany and 2 Center for Bioinformatics/Wilhelm Schickard Institute for Computer Science, Eberhard Karls University Tübingen, 72076 Tübingen, Germany
Email: Jan Küntzer* - kuentzer@bioinf.uni-sb.de; Christina Backes - cbackes@bioinf.uni-sb.de; Torsten Blum - blum@informatik.uni-tuebingen.de; Andreas Gerasch - gerasch@informatik.uni-tuebingen.de; Michael Kaufmann - mk@informatik.uni-tuebingen.de; Oliver Kohlbacher - oliver.kohlbacher@uni-tuebingen.de; Hans-Peter Lenhof - len@bioinf.uni-sb.de
* Corresponding author
Abstract
Background: Technological advances in high-throughput techniques and efficient data acquisition
methods have resulted in a massive amount of life science data. The data is stored in numerous
databases that have been established over the last decades and are essential resources for scientists
nowadays. However, the diversity of the databases and the underlying data models make it difficult
to combine this information for solving complex problems in systems biology. Currently,
researchers typically have to browse several, often highly focused, databases to obtain the required
information. Hence, there is a pressing need for more efficient systems for integrating, analyzing,
and interpreting these data. The standardization and virtual consolidation of the databases is a
major challenge; meeting it would provide unified access to a variety of data sources.
Description: We present the Biochemical Network Database (BNDB), a powerful relational
database platform, allowing a complete semantic integration of an extensive collection of external
databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which
is powerful enough to model most known biochemical processes and at the same time easily
extensible to be adapted to new biological concepts. Besides a web interface for the search and
curation of the data, a Java-based viewer (BiNA) provides a powerful platform-independent
visualization and navigation of the data. BiNA uses sophisticated graph layout algorithms for an
interactive visualization and navigation of BNDB.
Conclusion: BNDB allows a simple, unified access to a variety of external data sources. Its tight
integration with the biochemical network library BN++ offers the possibility for import,
integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.
Published: 2 October 2007
BMC Bioinformatics 2007, 8:367 doi:10.1186/1471-2105-8-367
Received: 2 July 2007
Accepted: 2 October 2007
This article is available from: http://www.biomedcentral.com/1471-2105/8/367
© 2007 Küntzer et al.; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background
The development of high-throughput technologies has generated an extensive quantity of -omics data over the last decades. Despite the technological progress, improvements in the application area, e.g. in drug discovery, have failed to keep pace with increased research and development spending, as demonstrated by Nightingale et al. [1]. One of the main reasons for this discrepancy is the increasing number of highly focused databases differing in both the data models and the interfaces [2]. The databases are often independently developed, have a substantial overlap and are not well standardized. The lack of standardization limits the usability of these databases and leads to a demand for unified access to the data [3].
Hence, a large number of systems addressing this problem
with different approaches have been developed. These
approaches can be classified by their architecture into
three main categories [4]: navigators, mediators, and ware-
houses. The first category, navigators, is based on the idea
of a navigational or link-based integration of several data
sources. Such a portal normally does not integrate the
data itself, but provides the user with pages navigating to
external data sources. Well-established examples of portal
systems are SRS [5], BioNavigator [6], and Entrez [7]. A
mediator gives access to distributed data by reformulating
the queries of the user at runtime into queries on external
data sources. However, availability and efficiency are
major drawbacks of such solutions. Examples of this category
are DiscoveryLink [8], TAMBIS [9], and BioMediator [10].
Systems of the third category, warehouses,
require a complete semantic integration of the data from
various external data sources into a single local database
via an integrative data model. Such approaches allow for
an efficient execution of queries since they avoid typical
problems of the other methods such as network bottle-
necks, short-time unavailability of the external data
sources, and changes in the external data sources. How-
ever, data warehouses usually require complex data mod-
els and regular updates of the integrated data sources, in
order to avoid the possibility of returning outdated query
results. BNDB is a representative of this category, as are
other systems like GUS [11], ONDEX [12], cPath [13], and
Biozon [14].
Construction and content
Based on an object-oriented data model, called BioCore,
we developed and implemented BNDB, an SQL data
warehouse system that integrates data sets from external
and internal data sources via importers. The BioCore
model allows not only for modelling of nearly all cur-
rently known biochemical processes, but also for includ-
ing new biological concepts with little effort [15,16]. The
architecture of the system is presented in Fig. 1.
The BNDB is implemented as a relational database using
MySQL [17]. We decided to choose a relational database
management system over an object-oriented system, since
relational DBMS are well established and the current de facto
standard. This guarantees a high portability of the
biochemical network database allowing a user to create a
local version of the BNDB on a wide range of platforms.
Therefore, we created an object-relational mapping of the
BioCore model onto a relational database management
system, using only SQL2 [18] compatible statements. This
restriction allows the usage of any relational or object-
relational database management system like DB2, Oracle,
or PostgreSQL. The database consists of more than 240
tables representing all BioCore classes [see Additional file
1]. Additionally, BNDB includes tables for the user and
rights management, as well as for the reconstruction of the
object-oriented structure of the database. The schema for
BNDB (Fig. 2) is available on the website.
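As a rough illustration of such an object-relational mapping, the following Python sketch creates the four tables named in the simplified schema of Fig. 2 (Thing, Event, Participant, Role) in an in-memory SQLite database. The column names follow the figure. The foreign-key relations, the use of SQLite in place of MySQL, and the sample rows are assumptions made for this example and are not taken from the BNDB sources.

```python
import sqlite3

# Column names follow Fig. 2; the REFERENCES clauses are assumptions.
SCHEMA = """
CREATE TABLE Thing (
    id           INTEGER PRIMARY KEY,
    classname_id INTEGER,
    accession_nr INTEGER,
    timestamp    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE Event (
    id      INTEGER PRIMARY KEY REFERENCES Thing(id),
    reverse TINYINT DEFAULT NULL
);
CREATE TABLE Participant (
    id INTEGER PRIMARY KEY REFERENCES Thing(id)
);
CREATE TABLE Role (
    id          INTEGER PRIMARY KEY,
    event       INTEGER REFERENCES Event(id),
    participant INTEGER NOT NULL REFERENCES Participant(id)
);
CREATE INDEX event_index ON Role(event);
CREATE INDEX participant_index ON Role(participant);
"""


def build_demo_db():
    """Create the tables and store one event with two participants."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Each object gets a row in Thing plus a row in its class table.
    for thing_id in (1, 2, 3):
        con.execute("INSERT INTO Thing (id) VALUES (?)", (thing_id,))
    con.execute("INSERT INTO Event (id) VALUES (1)")
    con.executemany("INSERT INTO Participant (id) VALUES (?)", [(2,), (3,)])
    con.executemany(
        "INSERT INTO Role (id, event, participant) VALUES (?, ?, ?)",
        [(1, 1, 2), (2, 1, 3)],
    )
    return con


def participants_of(con, event_id):
    """Return the participant ids attached to an event through Role."""
    rows = con.execute(
        "SELECT participant FROM Role WHERE event = ? ORDER BY participant",
        (event_id,),
    )
    return [row[0] for row in rows]
```

Because the statements stay within plain table, key, and index definitions, the same layout should carry over to other relational systems with little change, which is the portability argument made above.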
In the current state, BNDB represents a comprehensive
collection of biological data integrated from the following
data sources:
• Sequence databases: SwissProt [19], RefSeq [20]
• Pathway databases: KEGG [21], BioCyc [22], TransPath
[23]
• Protein interaction databases: DIP [24], MINT [25],
IntAct [26], HPRD [27]
• Transcription factor databases: TransFac [28]
For the horizontal data integration [29,30] of these data
we implemented comprehensive merging heuristics. The
key concept behind these methods is the integration of
complementary data sources and the elimination of
redundancy in the data. We use two fundamental
approaches for the merging of the data:
(1) object matching based on unambiguous external iden-
tifiers and (2) structural matching based on identical
object relations.
The first approach relies on the existence and correctness
of selected standardized IDs in the imported databases
(see Fig. 3). Each object in the database is linked with a
variety of different external data source identifiers, like
RefSeq, GeneId, SwissProt, Unigene, InterPro, etc. We
only use those identifiers that unambiguously identify
the corresponding biochemical objects. For the merging
we collect all unambiguous database identifiers in BNDB.
For each of these IDs we check if they are connected to
more than one object instance of the same type. If this is
the case, we merge these instances into one single
instance. All attributes of these instances are merged and
multiple occurrences of these attributes are removed.
External database IDs not describing unique objects, but
rather clusters of objects (e.g. Unigene, InterPro, etc.) are
not considered in the merging process. For objects with-
out external identifiers, like biochemical events (e.g. met-
abolic reactions), we use the second approach based on
structural matching of object relations. We define two
events to be equal if they are of the same event type and
contain the same participants occurring in the same role,
whereas events, participants, and role are the major build-
ing blocks of the BioCore schema [15].
The merging process itself consists of several steps: In an
initial step, we merge most of the database objects by their
identifiers and remove redundancy in their attributes
through the first approach. Then, in the second step we
collect and merge all equivalent events in BNDB through
the second structural approach.
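The structural matching of events can be sketched in Python as follows. The sketch builds a canonical key from the event type and the sorted (participant, role) pairs and groups events with identical keys. The record layout and the source identifiers such as "kegg:R1" are hypothetical, and the sketch assumes that participants have already been unified by the identifier-based first step, so that equal participants carry equal names.

```python
from collections import defaultdict


def event_key(event):
    """Canonical key: event type plus the sorted (participant, role) pairs."""
    return (event["type"], tuple(sorted(event["roles"])))


def merge_equivalent_events(events):
    """Group events whose type and participant/role pairs are identical.

    Returns one merged record per group, keeping every source id.
    """
    groups = defaultdict(list)
    for event in events:
        groups[event_key(event)].append(event["id"])
    return [
        {"type": key[0], "roles": list(key[1]), "sources": sorted(ids)}
        for key, ids in groups.items()
    ]


# Hypothetical records: the same reaction imported from two sources with
# the participants listed in a different order, plus the reverse reaction.
events = [
    {"id": "kegg:R1", "type": "reaction",
     "roles": [("glucose", "substrate"), ("glucose-6-phosphate", "product")]},
    {"id": "biocyc:R7", "type": "reaction",
     "roles": [("glucose-6-phosphate", "product"), ("glucose", "substrate")]},
    {"id": "kegg:R2", "type": "reaction",
     "roles": [("glucose-6-phosphate", "substrate"), ("glucose", "product")]},
]
merged = merge_equivalent_events(events)
```

The first two records collapse into one event because their keys are equal, while the third stays separate: it has the same participants, but they occur in different roles.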
A simplified example of merging genes using the first
approach is presented in Fig. 4. Four instances of the
human BAD gene with different external database identi-
fiers and names are merged by unambiguous identifiers.
Two instances are connected with the same NCBI GI-ID
and therefore identified to be equal. These instances are
merged into one single instance connected with the
merged attributes. The remaining instances are all associ-
ated with the same NCBI-GeneID. Thus, our algorithm
merges these three instances into one single gene instance,
which is linked with the merged information of all four
former instances. All merging heuristics were imple-
mented using the Biochemical Network Library BN++
[16] and the source code is available on our website.
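The identifier-based merge of the BAD example can be reproduced with a small union-find sketch in Python. This is an illustration only and not the BN++ implementation, which is written in C++. The instance contents are given as far as they can be read from Fig. 4, and the choice of GI and GeneID as the unambiguous identifier types follows the figure caption.

```python
from collections import defaultdict

# Only identifier types treated as unambiguous take part in the merge;
# cluster-level identifiers such as Unigene are carried along but ignored.
UNAMBIGUOUS = {"GI", "GeneID"}


def merge_by_identifiers(instances):
    """Merge instances that share at least one unambiguous identifier."""
    parent = list(range(len(instances)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for idx, inst in enumerate(instances):
        for id_type, value in inst["ids"].items():
            if id_type not in UNAMBIGUOUS:
                continue
            owner = first_seen.setdefault((id_type, value), idx)
            parent[find(idx)] = find(owner)

    merged = defaultdict(lambda: {"names": set(), "ids": set()})
    for idx, inst in enumerate(instances):
        target = merged[find(idx)]
        target["names"].update(inst["names"])
        target["ids"].update(inst["ids"].items())  # duplicates collapse
    return list(merged.values())


# The four BAD gene instances as read from Fig. 4.
instances = [
    {"names": {"BAD"},
     "ids": {"GeneID": "572", "Unigene": "Hs.370254", "GI": "10835069"}},
    {"names": {"BAD"}, "ids": {"KEGG": "hsa:572", "GeneID": "572"}},
    {"names": {"BCL2-antagonist of cell death"},
     "ids": {"OMIM": "603167", "GeneID": "572"}},
    {"names": set(), "ids": {"RefSeqID": "NM_004322.2", "GI": "10835069"}},
]
result = merge_by_identifiers(instances)
```

All four instances end up in one gene that carries both names and the six distinct identifiers. The shared Unigene entry alone would not have triggered a merge.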
Utility and discussion
BNDB can be accessed in three ways: through a web interface,
a network visualizer, and a programming interface.
Web interface
An intuitive web client browser enables querying and
browsing BNDB. The user can search by name, descrip-
tion, or publication for participants, events and pathways.
The user query is converted internally into an SQL query.
For the standard search the user does not need to know
any information about the internal structure of BNDB or
its underlying data model BioCore. In addition, for more
advanced users the web interface gives the possibility to
perform direct SQL queries. The retrieved results are presented
as text with links. Hyperlinks
to external data sources are provided for additional
information whenever external database identifiers are
connected with the object (for an example see Fig. 5).
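How a free-text search might be turned into an SQL query can be sketched in Python with SQLite. The table name entity and its columns are hypothetical stand-ins, since the actual query generation of the web interface is not described here. The point of the sketch is only that the search term is passed as a bound parameter and never spliced into the SQL string.

```python
import sqlite3


def build_name_query(term):
    """Translate a free-text name search into a parameterized SQL query.

    The table and column names are hypothetical stand-ins.
    """
    sql = "SELECT id, name FROM entity WHERE name LIKE ? ORDER BY name"
    return sql, (f"%{term}%",)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany(
    "INSERT INTO entity (name) VALUES (?)",
    [("Glycolysis",), ("Gluconeogenesis",), ("Citrate cycle",)],
)
sql, params = build_name_query("glyco")
hits = [name for _id, name in con.execute(sql, params)]
```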
Depending on the rights of the user, the system allows for
a curation of the database by editing the displayed results.
Furthermore, we included a functionality for adding new
information in a convenient way, such that the user does
not need to know the internal structure of the database.
Figure 1
Architecture. Architecture of the BNDB data warehouse.
[Diagram not reproduced. Component labels: BioCore, C++ library, Java library, BGL, yFiles, BN++ Framework, BN++ DB (MySQL), BiNA, SQL, Plugin. Relation labels: implements, uses, contains. Imported data sources: MINT, TransPath, TransFac, BioCyc, RefSeq, KEGG, IntAct, HPRD, DIP.]
The BNDB interface guides the user through the adding
process and warns if necessary information is missing.
Network visualization
We provide a stand-alone Java application called BiNA for
querying and analyzing the data contained in BNDB and
for visualizing biological networks. The tool allows for
visualizing metabolic and regulatory networks with
sophisticated graph layout algorithms. Besides the direct
visualization, BiNA also provides a mapping engine to
analyze arbitrary data sets in the context of networks. This
makes it possible to map numerical biological data, e.g. mRNA
expression data, onto graph attributes like node/edge
color or size. The visualization of two data sets at the same
time makes it easy to compare different data sets and iden-
tify correlations. The color scheme and the edge thickness
used for the drawing can be freely defined by the user and
are shown as a legend in the visualization view. Additionally,
the mapped data values can be changed easily to
interactively explore time-series expression data. In the
metabolic view, the edges labeled with the catalyzing
enzymes can be colored by the expression values of the
enzyme-coding genes. In the regulatory view, the map-
ping plugin allows for coloring the nodes representing
proteins, genes or protein families, whereas the protein
families are colored by the values of all contained mem-
bers. Fig. 6 gives an example for a metabolic view in BiNA
with mapped expression data.
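The mapping of numerical values onto a graph attribute can be illustrated with a short Python sketch that converts an expression value into an RGB color. The blue-white-red scale, the value range, and the expression values are assumptions chosen for the example. In BiNA the color scheme is defined by the user, and BiNA itself is written in Java.

```python
def expression_to_rgb(value, vmin, vmax):
    """Map a value linearly onto a blue-white-red scale; returns (r, g, b)."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp the value, then rescale it to the interval [0, 1].
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    if t < 0.5:
        # Lower half: blue (0, 0, 255) fades to white.
        level = round(255 * 2 * t)
        return (level, level, 255)
    # Upper half: white fades to red (255, 0, 0).
    level = round(255 * 2 * (1 - t))
    return (255, level, level)


# Hypothetical log-ratio expression values for three enzyme-coding genes.
expression = {"HK1": -2.0, "PFKM": 0.0, "PKM": 2.0}
edge_colors = {
    gene: expression_to_rgb(value, -2.0, 2.0)
    for gene, value in expression.items()
}
```

Switching to another time point of a time-series experiment then only requires recomputing edge_colors from a different set of values, while the network layout stays unchanged.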
The graph and visualization capabilities of our application
are comparable to those of visualization systems such
as Cytoscape [31], PathSys [32], VisANT [33], or commer-
cial tools such as MetaDrug [34] or PathwayStudio [35].
Additionally, BiNA offers a multifunctional workbench,
which is easily extensible. The viewer itself can be
regarded as a collection of modules that depend on each
other. The hierarchical plugin system automatically
resolves dependencies between plugins through a well-
defined and very powerful interface. The plugin structure
of BiNA allows for an easy integration of user-written analysis
routines. Currently, several plugins exist, e.g. for mapping
gene expression data onto the network, pathway
search algorithms, or exporting pathways into SBML and
BioPAX.
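Resolving dependencies between plugins amounts to a topological ordering of the dependency graph. The following Python sketch shows the idea with the standard library module graphlib (Python 3.9 or later). The plugin names are hypothetical, and the sketch does not reproduce the Java plugin interface of BiNA.

```python
from graphlib import CycleError, TopologicalSorter


def load_order(dependencies):
    """Return plugin names ordered so that dependencies come first."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic plugin dependency: {err.args[1]}") from err


# Hypothetical plugins, each mapped to the plugins it depends on.
plugins = {
    "core": set(),
    "graph-view": {"core"},
    "expression-mapping": {"graph-view"},
    "sbml-export": {"core"},
}
order = load_order(plugins)
```

A plugin is loaded only after everything it depends on, and a circular dependency is reported as an error so that loading does not stall.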
Programming interface
BNDB is fully integrated with the Biochemical Network Library BN++ [15,16] providing a sophisticated programming interface. Hence, arbitrary data like a complete pathway can be serialized and deserialized from C++ by a single line of code. This speeds up the development process of analysis routines, since a programmer can concentrate on the implementation of the algorithm. In addition, the BN++ software framework offers a comprehensive collection of implemented analysis routines.
Figure 2
Simplified DDL Diagram of BNDB. The simplified structure of the database schema.
[Diagram not reproduced. Tables and columns shown:]
• Thing: id int (PK), classname_id int, accession_nr int, timestamp timestamp; index id_index(id)
• Event: id int (PK), reverse tinyint = NULL; index id_index(id)
• Participant: id int (PK); index id_index(id)
• Role: id int (PK), event int, participant int (not null); indexes id_index(id), event_index(event), participant_index(participant)
The C++ programming interface provides a convenient,
but very flexible way to merge the data. With a few lines of
code it is possible to construct a customized local meta-database
containing only the data the user requires.
Conclusion
With BNDB we present a data warehouse system integrat-
ing a large number of different biological databases.
Access to these data is provided through a generic web
interface allowing for adding, editing, and searching the
data in BNDB. In addition, we have developed BiNA, a
powerful and extensible tool for visualizing biochemical
networks directly from BNDB. Through the BN++ soft-
ware framework BNDB is easily accessible for software
developers and can be integrated into tailor-made appli-
cations and customized to user needs. All tools and meth-
ods described herein, BNDB, BiNA, the source code, the
web interface to BNDB, and the underlying data model
are freely available from our website.
Figure 3
Database Universe. The nodes represent external databases labeled by their name. An edge is drawn from A to B if database A knows the ids of database B. In addition, the databases are grouped by the contained data: the protein interaction dbs are yellow, enzyme dbs are green, the protein and sequence dbs are blue, pathway dbs are olive, and the orange nodes are domain dbs.
[Diagram not reproduced. Databases shown: MINT, BioGrid, IntAct, DIP, KEGG, Pfam, ProDom, Prosite, Transpath, HPRD, Ensembl, Transfac, BRENDA, InterPro, ENZYME, MetaCyc, GenBank, PubMed, PDB, HUGO, OMIM, PIR, UniGene, UniProt.]
A major advantage of BNDB is its underlying data model
BioCore. This comprehensive and extensible object model
can represent most currently known biochemical entities
and processes. Therefore, BNDB is able to store a huge
variety of different biochemical data. Researchers can eas-
ily adapt it to their own needs and build customized data-
bases. Another benefit is the full integration of BNDB into
the visualizer BiNA. Other systems often present only a
database with an analysis tool (e.g. Biozon), or a database
with a web interface (e.g. Entrez). For the graphical representation
of the networks, many of these systems use
a standard visualizer (e.g. Cytoscape). However, we think
that the full integration of a dedicated visualization tool
facilitates the visualization and presentation of the stored
data.
We have developed several applications based on BNDB
that show the usefulness of the approach, e.g. an efficient
gene set analysis tool, GeneTrail [36], which enables the
user to identify enriched functional categories in protein
or gene sets. GeneTrail has been successfully applied to
detect a molecular target of the antimicrobial metabolite
kendomycin [37].
In summary, BNDB is a comprehensive database system,
which makes it not only possible to retrieve the combined
information of integrated data sources in an easy way, but
can also be customized and extended to meet the needs of
different users.
Availability and requirements
Project name: BNDB;
Project home page: http://www.bndb.org;
Operating system(s): Platform independent;
Programming language: Java;
Other requirements: Java 1.6.0 or higher;
Figure 4
Object matching based merging. Simplified example for merging genes using the object matching based approach. In this case we have four instances of the human BAD gene, which we merge using the GI identifier and the GeneID. The resulting gene contains all merged names and identifiers.
[Diagram not reproduced. Instances as far as they can be read from the figure:]
Input instances:
• Gene (BAD): GeneID: 572; Unigene: Hs.370254; GI: 10835069
• Gene (BAD): KEGG: hsa:572; GeneID: 572
• Gene (BCL2-antagonist of cell death): OMIM: 603167; GeneID: 572
• Gene: RefSeqID: NM_004322.2; GI: 10835069
Intermediate instance after the first merge:
• Gene (BAD): GeneID: 572; Unigene: Hs.370254; GI: 10835069; RefSeqID: NM_004322.2
Resulting instance after the second merge:
• Gene (BAD) (BCL2-antagonist of cell death): OMIM: 603167; GeneID: 572; Unigene: Hs.370254; GI: 10835069; RefSeqID: NM_004322.2; KEGG: hsa:572
Figure 5
Searching using the web interface. Search for glycolysis in the web interface.
Figure 6
Visualization using BiNA. Visualization of the glycolysis using the metabolic graph layout. The blue boxes represent metabolic compounds. If there is an enzymatic reaction occurring between compounds, a directed edge labeled with the enzyme class catalyzing the reaction is drawn. The edge labels are colored by the expression value of the enzyme-coding genes. In this example we use expression values for the normal control of the GDS820 data set from the GEO database.
Licence: GNU GPL;
BNDB is freely accessible at http://www.bndb.org. The current versions of BN++ and BiNA are distributed under the GNU GPL license and available from the website http://www.bnplusplus.org/downloads.
Abbreviations
BN++ Biochemical Network Library
BiNA Biological Network Analysis
DBMS Database Management System
NCBI National Center for Biotechnology Information
BGL Boost Graph Library
SQL Structured Query Language
SBML Systems Biology Markup Language
BioPAX Biological Pathways Exchange
GEO Gene Expression Omnibus database
Authors' contributions
AG programmed the network visualization tool. MK pro-
vided specialist knowledge on network visualization. JK,
CB, and TB were involved in implementing one or more
importers. JK, OK, and HPL contributed to the system
design of BN++ and to the design of its data model. MK,
OK and HPL supervised the project. All authors read and
approved the final manuscript.
Additional material
Additional file 1
DDL Diagram of BNDB. The general structure of the database schema.
[http://www.biomedcentral.com/content/supplementary/1471-2105-8-367-S1.jpeg]
Acknowledgements
The project was funded by the Deutsche Forschungsgemeinschaft (BIZ4:1-4) and the Klaus Tschira Foundation.
References
1. Nightingale P, Martin P: The myth of the biotech revolution. Trends Biotechnol 2004, 22(11):564-569.
2. Galperin MY: The Molecular Biology Database Collection: 2006 update. Nucl Acids Res 2006, 34:D3-D5.
3. Cassman M, Arkin A, Doyle F, Katagiri F, Lauffenburger DA, Stokes C: Assessment of International Research and Development in Systems Biology. In Tech rep World Technology Evaluation Center (WTEC); 2005.
4. Hernandez T, Kambhampati S: Integration of Biological Sources: Current Systems and Challenges Ahead. SIGMOD Rec 2004, 33(3):51-60.
5. Etzold T, Argos P: SRS – an indexing and retrieval tool for flat file data libraries. Comput Appl Biosci 1993, 9:49-57.
6. BioNavigator – BioNode & BioNodeSA: Overview [http://www.antigen.com/library]
7. Entrez – Search and Retrieval System [http://www.ncbi.nlm.nih.gov/sites/gquery]
8. Haas LM, Schwarz PM, Kodali P, Kotlar E, Rice JE, Swope WC: DiscoveryLink: A system for integrated access to life sciences data sources. IBM Systems J 2001, 40(2):489-511.
9. Stevens R, Baker P, Bechhofer S, Ng G, Jacoby A, Paton NW, Goble CA, Brass A: TAMBIS: transparent access to multiple bioinformatics information sources. Bioinformatics 2000, 16(2):184-185.
10. Donelson L, Tarczy-Hornoch P, Mork P, Dolan C, Mitchell JA, Barrier M, Mei H: The BioMediator system as a data integration tool to answer diverse biologic queries. Medinfo 2004, 11(2):768-772.
11. Davidson SB, Crabtree J, Brunk BP, Schug J, Tannen V, Overton GC, Stoeckert CJ: K2/Kleisli and GUS: Experiments in integrated access to genomic data sources. IBM Systems J 2001, 40(2):512-530.
12. Koehler J, Baumbach J, Taubert J, Specht M, Skusa A, Ruegg A, Rawlings C, Verrier P, Philippi S: Graph-based analysis and visualization of experimental results with ONDEX. Bioinformatics 2006, 22(11):1383-1390.
13. Cerami EG, Bader GD, Gross BE, Sander C: cPath: open source software for collecting, storing, and querying biological pathways. BMC Bioinformatics 2006, 7:497.
14. Birkland A, Yona G: BIOZON: a system for unification, management and analysis of heterogeneous biological data. BMC Bioinformatics 2006, 7:70.
15. Sirava M, Schäfer T, Eigelsperger M, Kohlbacher O, Bornberg-Bauer E, Lenhof HP: BioMiner – modeling, analyzing, and visualizing biochemical pathways and networks. Bioinformatics 2002, 18(2):219-230 [http://www.zbi.uni-saarland.de/chair/projects/BioMiner].
16. Küntzer J, Blum T, Gerasch A, Backes C, Hildebrandt A, Kaufmann M, Kohlbacher O, Lenhof HP: BN++ – A Biological Information System. J Integr Bioinformatics 2006, 3(2):34.
17. The MySQL Database System [http://www.mysql.com]
18. JTC1/SC21 I: Information Technology – Database Languages – SQL2. In Tech rep ANSI; 1992.
19. Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucl Acids Res 2006, 34:D187-D191.
20. Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucl Acids Res 2005, 33:D501-D504.
21. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucl Acids Res 2006, 34:D354-D357.
22. Krieger CJ, Zhang P, Mueller LA, Wang A, Paley S, Arnaud M, Pick J, Rhee SY, Karp PD: MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucl Acids Res 2004, 32:D438-D442.
23. Krull M, Pistor S, Voss N, Kel A, Reuter I, Kronenberg D, Michael H, Schwarzer K, Potapov A, Choi C, Kel-Margoulis O, Wingender E: TRANSPATH: An Information Resource for Storing and Visualizing Signaling Pathways and their Pathological Aberrations. Nucl Acids Res 2006, 34:D546-D551.
24. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucl Acids Res 2004, 32:D449-D451.
25. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, GC: MINT: a Molecular INTeraction database. FEBS Lett 2002, 513:135-140.
26. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R: IntAct – an open source molecular interaction database. Nucl Acids Res 2004, 32:D452-D455.
27. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JG, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A, Pandey A: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003, 13:2363-2371.
28. Matys V, Kel-Margoulis O, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel A, Wingender E: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucl Acids Res 2006, 34:D108-D110.
29. Davidson S, Overton GC, Buneman P: Challenges in Integrating Biological Data Sources. J Comput Biol 1995, 2:557-572.
30. Spaccapietra S, Parent C, Dupont Y: Model Independent Assertions for Integration of Heterogeneous Schemas. VLDB Journal 1992, 1:81-126.
31. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 2003, 13(11):2498-2504.
32. Baitaluk M, Qian X, Godbole S, Raval A, Ray A, Gupta A: PathSys: integrating molecular interaction graphs for systems biology. BMC Bioinformatics 2006, 7:55.
33. Hu Z, Mellor J, Wu J, DeLisi C: VisANT: an online visualization and analysis tool for biological interaction data. BMC Bioinformatics 2004, 5:17.
34. GeneGo – System Biology for Drug Discovery [http://www.genego.com]
35. Nikitin A, Egorov S, Daraselia N, Mazo I: Pathway studio – the analysis and navigation of molecular networks. Bioinformatics 2003, 19(16):2155-2157.
36. Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, Elnakady YA, Muller R, Meese E, Lenhof HP: GeneTrail – advanced gene set enrichment analysis. Nucl Acids Res 2007, 35:W186-W192.
37. Elnakady YA, Rohde M, Sasse F, Backes C, Keller A, Lenhof HP, Weissman KJ, Müller R: Evidence for the mode of action of the highly cytotoxic streptomyces polyketide kendomycin. Chembiochem 2007, 8(11):1261-1272.

Signatures of deregulated subnetworks may help to predict the sensitivity of tumor subtypes to therapeutic agents and, hence, may be used in the future to guide the selection of optimal agents. Furthermore, the identified putative key players may represent oncogenes, tumor suppressor genes, or other genes that contribute to crucial changes of regulatory and signaling processes in cancer cells and may serve as potential targets for an individualized tumor therapy. With these applications, we demonstrate the usefulness of our GeneTrail package and hope that our work will contribute to a better understanding of cancer.
... We integrated our approach into the visualization tool BiNA [4,5], which supports sophisticated inspection, navigation, and mapping of large-scale data. To get access to additional information about the biochemistry beyond, we use the complete metabolic network of KEGG provided by the BN++ data warehouse [5]. ...
... We integrated our approach into the visualization tool BiNA [4,5], which supports sophisticated inspection, navigation, and mapping of large-scale data. To get access to additional information about the biochemistry beyond, we use the complete metabolic network of KEGG provided by the BN++ data warehouse [5]. BN++ is an integration system for biological networks and supports importing the KEGG PATHWAY and COMPOUND databases via the KEGG FTP servers. ...
Conference Paper
Full-text available
Static drawings of biological pathways are still an important research tool for biologists. Gerhard Michal created his seminal drawings of metabolic networks in the 1960s and thus defined canonical representations of some key pathways. The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides the most popular static drawings of biological networks of different types, used in a huge number of publications. These drawings are so widely known that they are immediately recognizable to most biologists. This enables collaborative work and simplifies the communication of analysis results. Automatic layout of these pathway maps is complicated by the fact that the information available from KEGG does not contain the entire layout information of the reference maps. Here we present a fully automated algorithm for interactive KEGG layout construction. The algorithm conserves the original KEGG layout to the extent possible while improving readability by removing unnecessary elements (in organism-specific maps). Multiple pathway maps can be laid out simultaneously to facilitate the navigation of larger networks. The algorithm supports the hierarchical layout of sub networks and thus supports interactive exploration of large datasets.
... Existing data integration approaches for metabolic models have focused on developing data warehouses, which collect metabolic data from different databases under integrated nomenclature, such as SABIO-RK [36], BNDB [37], Reactome [38], and model repositories, such as JWS Online [22] and BioModels Database [23], which store user-developed models but not necessarily using a unified nomenclature. In contrast, ReMatch was developed to act as a model repository that uses a unified nomenclature, improving reusability of stored metabolic models. ...
Article
Full-text available
Summar y ReMatch is a web-based, user-friendly tool that constructs stoichiometric network models for metabolic flux analysis, integrating user-developed models into a database collected from several comprehensive metabolic data resources, including KEGG, MetaCyc and CheBI. Particularly, ReMatch augments the metabolic reactions of the model with carbon mappings to facilitate The construction of a network model consisting of biochemical reactions is the first step in most metabolic modelling tasks. This model construction can be a tedious task as the required information is usually scattered to many separate databases whose interoperability is suboptimal, due to the heterogeneous naming conventions of metabolites in different databases. Another, particularly severe data integration problem is faced in ReMatch has been developed to solve the above data integration problems. First, ReMatch matches the imported user-developed model against the internal ReMatch database while considering a comprehensive metabolite name thesaurus. This, together with wild card support, allows the user to specify the model quickly without having to look the names up manually. Second, ReMatch is able to augment reactions of the model with carbon mappings, obtained either from the internal database or given by the user with an easy-touse tool. The constructed models can be exported into 13C-FLUX and SBML file formats. Further, a stoichiometric matrix and visualizations of the network model can be generated. The constructed models of metabolic networks can be optionally made available to the other users of ReMatch. Thus, ReMatch provides a common repository for metabolic network models with carbon mappings for the needs of metabolic flux analysis community. ReMatch is freely available for academic use at http://www.cs.helsinki.fi/group/sysfys/software/rematch/.
... This network contains the regulatory relationships selected from all KEGG pathways and can be download from the website http://genetrail.bioinf.uni-sb.de/ilp/Home.html. Backes et al. access the data via the Biochemical Network Database (BNDB) [43] for a consistent interface. It contains 2010 genes connected by 9900 regulatory relationships, among which 1579 genes, annotated by GO terms, with 7630 regulatory relationships are selected to form our human regulatory network. ...
Article
Full-text available
Background: The complexity of biological systems motivates us to use the underlying networks to provide deep understanding of disease etiology and the human diseases are viewed as perturbations of dynamic properties of networks. Control theory that deals with dynamic systems has been successfully used to capture systems-level knowledge in large amount of quantitative biological interactions. But from the perspective of system control, the ways by which multiple genetic factors jointly perturb a disease phenotype still remain. Results: In this work, we combine tools from control theory and network science to address the diversified control paths in complex networks. Then the ways by which the disease genes perturb biological systems are identified and quantified by the control paths in a human regulatory network. Furthermore, as an application, prioritization of candidate genes is presented by use of control path analysis and gene ontology annotation for definition of similarities. We use leave-one-out cross-validation to evaluate the ability of finding the gene-disease relationship. Results have shown compatible performance with previous sophisticated works, especially in directed systems. Conclusions: Our results inspire a deeper understanding of molecular mechanisms that drive pathological processes. Diversified control paths offer a basis for integrated intervention techniques which will ultimately lead to the development of novel therapeutic strategies.
... • BiNa[86] (http://bit.ly/y6ix9i) • BioUML [82] (http://bit.ly/yIETIt) ...
Chapter
Approaches to investigate biological processes have been of strong interest in the past few years and are the focus of several research areas like systems biology. Biological networks as representations of such processes are crucial for an extensive understanding of living beings. Due to their size and complexity, their growth and continuous change, as well as their compilation from databases on demand, researchers very often request novel network visualization, interaction, and exploration techniques. In this chapter, we first provide background information that is needed for the interactive visual analysis of various biological networks. Fields such as (information) visualization, visual analytics, and automatic layout of networks are highlighted and illustrated by a number of examples. Then, the state of the art in network visualization for the life sciences is presented together with a discussion of standards for the graphical representation of cellular networks and biological processes.
... Some programs can be extended via plug-ins, e.g., the Biological Network Analyzer BiNA (Gerasch et al., 2014), CellDesigner (Funahashi et al., 2008), or Cytoscape (Shannon et al., 2003). The flexible stand-alone application BiNA (Gerasch et al., 2014) is based on a hierarchical graph concept and provides highly configurable styles for the visualization of regulatory and metabolic network data as well as access to the BN++ pathway data warehouse (Küntzer et al., 2007). The web-modeling tool BioGrapher (Krause et al., 2013) is implemented with HTML5, CSS, and JavaScript and can be used to create SBGN maps. ...
Article
Full-text available
Collaborative genome-scale reconstruction endeavors of metabolic networks would not be possible without a common, standardized formal representation of these systems. The ability to precisely define biological building blocks together with their dynamic behavior has even been considered a prerequisite for upcoming synthetic biology approaches. Driven by the requirements of such ambitious research goals, standardization itself has become an active field of research on nearly all levels of granularity in biology. In addition to the originally envisaged exchange of computational models and tool interoperability, new standards have been suggested for an unambiguous graphical display of biological phenomena, to annotate, archive, as well as to rank models, and to describe execution and the outcomes of simulation experiments. The spectrum now even covers the interaction of entire neurons in the brain, three-dimensional motions, and the description of pharmacometric studies. Thereby, the mathematical description of systems and approaches for their (repeated) simulation are clearly separated from each other and also from their graphical representation. Minimum information definitions constitute guidelines and common operation protocols in order to ensure reproducibility of findings and a unified knowledge representation. Central database infrastructures have been established that provide the scientific community with persistent links from model annotations to online resources. A rich variety of open-source software tools thrives for all data formats, often supporting a multitude of programing languages. Regular meetings and workshops of developers and users lead to continuous improvement and ongoing development of these standardization efforts. This article gives a brief overview about the current state of the growing number of operation protocols, mark-up languages, graphical descriptions, and fundamental software support with relevance to systems biology.
Chapter
Visualisations of metabolites and metabolic pathways have been used since the early years of research in biology, and pathway maps have become very popular in biochemistry textbooks, on posters, as well as in electronic resources and web pages about metabolism. Visualisations help to present knowledge and support browsing through chemical structures, enzymes, reactions and pathways. In addition, visual and immersive analytics of metabolism connects network analysis algorithms and interactive visualisation methods to investigate structures in the network such as centralities, motifs and paths, or to compare pathways for finding differences between species or conditions. The graphical depiction of networks supports the mapping and investigation of additional data such as metabolomics, enzyme activity, flux and transcriptomics data, and the exploration of the data in the network context. It builds a foundation for investigating the dynamics of metabolic processes obtained either experimentally or via modelling and simulation. Here we discuss past, present and future of the visualisation of metabolic networks and pathways and provide links to several resources.
Chapter
The epithelial cells of the gastrointestinal (GI) tract communicate with each other and with cells of other organs via a complex network of highly regulated movement of ions and biomolecules. The molecules ensure regulated activity of cells, tissues, and organs of the GI system and the body as whole. The regulated movement and subsequent activities of the biomolecules released from one cell to the target are made possible by receptive substances (receptors) localized on the membrane of the target cells or intracellular organelles, or in the cytosol. This process, which is referred to as cell-to-cell communication or cellular signaling, ensures the regulated functioning of the cells and tissues of the GI system and the whole organism. This chapter is dedicated to the mechanism of cell-to-cell communication and signaling in normal and relates it to how disease develops. Basic mechanisms of GI epithelial cell signaling and gut nutrient receptor sensing (GI chemosensation) are discussed.
Article
Full-text available
This paper reports on a Web 2.0 tool that aims to facilitate and augment collaboration and decision making in data-intensive and cognitively-complex biomedical settings. The proposed tool exploits prominent high-performance computing paradigms and large data processing technologies to meaningfully search, analyze and aggregate data existing in diverse, extremely large and rapidly evolving sources. It can be viewed as an innovative workbench incorporating and orchestrating a set of interoperable services that reduce the data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities. Through a particular collaboration scenario, we explore various possibilities and challenges of managing biomedical collaboration with the use of the proposed tool. Much attention is given at the increase of volume, rate of production and complexity of the associated data types.
Article
Full-text available
Recent years have seen an explosive growth in the amount of biochemical data available. Numerous databases have been established and are being used as an essential resource by biologists around the world. The sheer amount and heterogeneity of these data poses a major challenge: data integration and, based thereupon, the integrative analysis of these data. We present BN++, the biochemical network library, a powerful software package for integrating, analyzing, and visualizing biochemical data in the context of networks. BN++ is based on a comprehensive and extensible object model (BioCore), which has been implemented as a C++ framework, a Java class library, and a relational database. The C++ framework is used to efficiently import, integrate, and analyze the data, which is stored in a data warehouse. The Java-based viewer (BiNA) provides a powerful platform-independent visualization of the data using sophisticated graph layout algorithms. Currently, the data warehouse imports and integrates data from about a dozen important databases including, among others, sequence data, metabolic and regulatory networks, and protein interaction data. We illustrate the usefulness of BN++ with a few select example applications. Availability: BN++ is open source software available from our website at www.bnplusplus.org.
Article
Full-text available
Human Protein Reference Database (HPRD) is an object database that integrates a wealth of information relevant to the function of human proteins in health and disease. Data pertaining to thousands of protein-protein interactions, posttranslational modifications, enzyme/substrate relationships, disease associations, tissue expression, and subcellular localization were extracted from the literature for a nonredundant set of 2750 human proteins. Almost all the information was obtained manually by biologists who read and interpreted >300,000 published articles during the annotation process. This database, which has an intuitive query interface allowing easy access to all the features of proteins, was built by using open source technologies and will be freely available at http://www.hprd.org to the academic community. This unified bioinformatics platform will be useful in cataloging and mining the large number of proteomic interactions and alterations that will be discovered in the postgenomic era.
Book
Full-text available
The current textbook image of biological processes is that of a static model of loosely linked, highly detailed, molecular devices. However, every biologist knows that dynamic processes drive biology. Systems biology is defined for the purpose of this study as the understanding of biological network behaviors, and in particular their dynamic aspects, which requires the utilization of mathematical modeling tightly linked to experiment. This involves a variety of approaches, such as the identification and validation of networks, the creation of appropriate datasets, the development of tools for data acquisition and software development, and the use of modeling and simulation software in close linkage with experiment. All of these are discussed in this report. Of course, the definition becomes ambiguous at the margins. But at the core is the focus on networks, which makes it clear that the goal is to understand the operation of the systems, rather than the component parts. The panel concluded that the U.S. is currently ahead of the rest of the world in systems biology, largely because of earlier investment over the past five to seven years by funding organizations and research institutions. This is reflected in a large number of active research groups, and educational programs, and a diverse and growing funding base. However, there is evidence of rapid development outside the U.S., much of it begun in the last two to three years. It must be stressed that the attempt to incorporate the details of molecular events obtained over the past half century into a dynamic picture of network behavior in biological systems is only just beginning, in the U.S. and elsewhere. In particular, progress in the core activity of systems biology modeling tied to experiment is still limited. Progress would be facilitated by strong international collaborations in training, research, and infrastructure. Overall, the picture is of an active field in early stages of explosive growth.
Article
Full-text available
Vast amounts of life sciences data reside today in specialized data sources, with specialized query processing capabilities. Data from one source often must be combined with data from other sources to give users the information they desire. There are database middleware systems that extract data from multiple sources in response to a single query. IBM's DiscoveryLink is one such system, targeted to applications from the life sciences industry. DiscoveryLink provides users with a virtual database to which they can pose arbitrarily complex queries, even though the actual data needed to answer the query may originate from several different sources, and none of those sources, by itself, is capable of answering the query. We describe the DiscoveryLink offering, focusing on two key elements, the wrapper architecture and the query optimizer, and illustrate how it can be used to integrate the access to life sciences data from heterogeneous data sources.
Article
Full-text available
The integrated access to heterogeneous data sources is a major challenge for the biomedical community. Several solution strategies have been explored: link-driven federation of databases, view integration, and warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: K2, a view integration implementation, and GUS, a data warehouse. Although the view integration and the warehouse approaches each have advantages, there is no clear “winner.” Therefore, in selecting the best strategy for a particular application, users must consider the data characteristics, the performance guarantees required, and the programming resources available. Our experiences also point to some practical tips on how database updates should be published, and how XML can be used to facilitate the processing of updates in a warehousing environment.
Article
Full-text available
Recent years have seen an explosive growth in the amount of biochemical data available. Numerous databases have been established and are being used as an essential resource by biologists around the world. The sheer amount and heterogeneity of these data poses a major challenge: data integration and, based thereupon, the integrative analysis of these data. We present BN++, the biochemical network library, a powerful software package for integrating, analyzing, and visualizing biochemical data in the context of networks. BN++ is based on a comprehensive and extensible object model (BioCore), which has been implemented as a C++ framework, a Java class library, and a relational database. The C++ framework is used to efficiently import, integrate, and analyze the data, which is stored in a data warehouse. The Java-based viewer (BiNA) provides a powerful platform-independent visualization of the data using sophisticated graph layout algorithms. Currently, the data warehouse imports and integrates data from about a dozen important databases including, among others, sequence data, metabolic and regulatory networks, and protein interaction data. We illustrate the usefulness of BN++ with a few select example applications. BN++ is open source software available from our website at http://www.bnplusplus.org.
Article
This paper surveys the area of biological and genomic sources integration, which has recently become a major focus of the data integration research field. The challenges that an integration system for biological sources must face are due to several factors such as the variety and amount of data available, the representational heterogeneity of the data in the different sources, and the autonomy and differing capabilities of the sources. This survey describes the main integration approaches that have been adopted. They include warehouse integration, mediator-based integration, and navigational integration. Then we look at the four major existing integration systems that have been developed for the biological domain: SRS, BioKleisli, TAMBIS, and DiscoveryLink. After analyzing these systems and mentioning a few others, we identify the pros and cons of the current approaches and systems and discuss what an integration system for biologists ought to be.
Article
Motivation: Assembling the relevant information needed to interpret the output from high-throughput, genome scale, experiments such as gene expression microarrays is challenging. Analysis reveals genes that show statistically significant changes in expression levels, but more information is needed to determine their biological relevance. The challenge is to bring these genes together with biological information distributed across hundreds of databases or buried in the scientific literature (millions of articles). Software tools are needed to automate this task which at present is labor-intensive and requires considerable informatics and biological expertise. Results: This article describes ONDEX and how it can be applied to the task of interpreting gene expression results. ONDEX is a database system that combines the features of semantic database integration and text mining with methods for graph-based analysis. An overview of the ONDEX system is presented, concentrating on recently developed features for graph-based analysis and visualization. A case study is used to show how ONDEX can help to identify causal relationships between stress response genes and metabolic pathways from gene expression data. ONDEX also discovered functional annotations for most of the genes that emerged as significant in the microarray experiment, but were previously of unknown function.
Article
SRS (Sequence Retrieval System) is an information indexing and retrieval system designed for libraries with a flat file format such as the EMBL nucleotide sequence databank, the SwissProt protein sequence databank or the Prosite library of protein subsequence consensus patterns. SRS supports the data structure of these libraries by providing special indices for inzplemenzing lists of subenfities (e.g. feature tables) or hierarchically structured data–fields (e.g. taxonomic classification). A language (ODD) has been designed for the convenient specification of library format and organization, representation of individual data–fields within the system (design of indices) and structuring other data needed during retrieval. This ensures flexibility required for coping with different library formats, which are subject to continuous change. Queries and inspection of retrieved entries can be performed from a user interface with pull–down menus and windows. SRS supports rious input and output formats but is particularly well adapted to the GCG programs.