The Comparative Toxicogenomics Database (CTD)

Article in Environmental Health Perspectives 111(6):793–795 · June 2003
DOI: 10.1289/txg.6028 · Source: PubMed
Abstract
The Mount Desert Island Biological Laboratory in Salsbury Cove, Maine, USA, is developing the Comparative Toxicogenomics Database (CTD), a community-supported genomic resource devoted to genes and proteins of human toxicologic significance. CTD will be the first publicly available database to a) provide annotated associations among genes, proteins, references, and toxic agents, with a focus on annotating data from aquatic and mammalian organisms; b) include nucleotide and protein sequences from diverse species; c) offer a range of analysis tools for customized comparative studies; and d) provide information to investigators on available molecular reagents. This combination of features will facilitate cross-species comparisons of toxicologically significant genes and proteins. These comparisons will promote understanding of molecular evolution, the significance of conserved sequences, the genetic basis of variable sensitivity to environmental agents, and the complex interactions between the environment and human health. CTD is currently under development, and the planned scope and functions of the database are described herein. The intent of this report is to invite community participation in the development of CTD to ensure that it will be a valuable resource for environmental health, molecular biology, and toxicology research.


Key words: aquatic, comparative, database, environmental health, fishes, genomic, health, toxicogenomics, toxicology. Environ Health Perspect 111:793–795 (2003). doi:10.1289/txg.6028 available via http://dx.doi.org/ [Online 13 February 2003]

Environmental Health Perspectives · VOLUME 111 | NUMBER 6 | May 2003
Toxicogenomics · Commentary

Carolyn J. Mattingly (1,2,3), Glenn T. Colby (1,2,3), John N. Forrest (2,3,4), and James L. Boyer (2,3,4)
(1) Department of Bioinformatics, (2) Center for Membrane Toxicity Studies, and (3) Center for Marine Functional Genomic Studies, Mount Desert Island Biological Laboratory, Salsbury Cove, Maine, USA; (4) Department of Medicine, Yale University School of Medicine, New Haven, Connecticut, USA

Address correspondence to C.J. Mattingly, Dept. of Bioinformatics, Mount Desert Island Biological Laboratory, Salsbury Cove, ME 04672 USA. Telephone: (207) 288-3605. Fax: (207) 288-2130. E-mail: cmattin@mdibl.org
We thank N. Ballatori, J. Blake, J. Eppig, B. Forbush, and D. Towle for insightful feedback and support. This project is funded by National Institute of Environmental Health Sciences grants ES11267-02 and ES03828-17.
Inquiries about CTD may be sent to ctd@mdibl.org
Received 30 September 2002; accepted 12 February 2003.

Approximately 75,000 chemicals are currently listed in the U.S. Environmental Protection Agency (U.S. EPA) Toxic Substances Control Act Chemical Substances Inventory (U.S. EPA 2003); however, the toxic potential of many of these chemicals, and the molecular mechanisms underlying their action, are not well understood. Scientists have long exploited diverse experimental models to understand the complexity of gene–environment interactions. With the rising number of publicly available sequences and completely sequenced genomes, comparative studies are proving essential for elucidating biological systems (Koonin et al. 2000) and for annotating the accumulating genomic and proteomic data (Whelan et al. 2001). Comparisons of more distantly related vertebrate and invertebrate species may be of particular value for identifying conserved genetic and molecular mechanisms (Wittbrodt et al. 2002). It is on this premise that the Comparative Toxicogenomics Database (CTD) is being developed.

CTD will facilitate comparisons of the sequences and functions of toxicologically significant genes and proteins from diverse organisms, with an emphasis on aquatic and mammalian species. The goal is to provide unique insights into the significance of conserved sequences and polymorphisms, the genetic basis of variable sensitivity, molecular evolution, and adaptation. The potential value of such comparisons is demonstrated by studies of the aryl hydrocarbon receptor (AhR) (Hahn 2002; Thomas et al. 2002), which modulates the toxic action of the environmental contaminant 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) (Poland and Knutson 1982; Schmidt and Bradfield 1996). Mammals, fishes, and aquatic invertebrates exhibit different toxicity profiles in response to TCDD (Hahn 2002). Studies of AhR in these organisms identified duplication events in fishes and differences in sequence identity, TCDD-binding capacity, and activation of downstream targets (Hahn 2002; Thomas et al. 2002). Although the physiologic roles of AhR are still not well understood, correlations between AhR sequences and functions in distantly related organisms may provide valuable information about the evolution of this gene, insights into the genetic basis of toxicity, and directions for future research.

There is a strong precedent for comparative studies with aquatic organisms. The recent sequencing of the pufferfish (Fugu rubripes) genome has resulted in the discovery of nearly 1,000 human genes not previously described in the public domain (Aparicio et al. 2002). The anticipated genome sequences of zebrafish (Danio rerio) and spotted green pufferfish (Tetraodon nigroviridis) will likely make additional contributions to the annotation of the human genome. Evolutionarily diverse aquatic organisms have also become important models for studying human disease. For example, membrane transporters that are the sites of action of diuretic drugs, including the bumetanide-sensitive Na-K-Cl cotransporter and the thiazide-sensitive NaCl cotransporter, were first cloned from specialized organs in marine species (Gamba et al. 1993; Xu et al. 1994). Mutagenesis studies in teleosts have generated a spectrum of biologically relevant and nonoverlapping phenotypes (Wittbrodt et al. 2002). Large-scale genetic screens have produced more than 500 zebrafish mutants, many with phenotypes similar to human disorders (Dooley and Zon 2000). Medaka (Oryzias latipes) are routinely used for studies in carcinogenesis and environmental health (Wittbrodt et al. 2002). The more distantly related elasmobranchs have provided unique insight into conserved functional domains of genes associated with human liver function (Ballatori and Villalobos 2002; Cai et al. 2001, 2002) and cystic fibrosis (Aller et al. 1999).

The growing body of genomic information available to the scientific community has led to an increase in the number and scope of biological databases. A recent review (Baxevanis 2002) estimated that 335 databases existed in 2002, up from 281 in 2001. These databases address a range of complex challenges for biologists, such as managing comprehensive repositories of genomic and proteomic data (Benson et al. 2002; O’Donovan et al. 2002), annotating species-specific genomes (Blake et al. 2002; Sprague et al. 2001), and identifying protein families and conserved domains (Baxevanis 2002). Existing toxicology databases have cataloged chemical and physical properties of toxic agents, mutagenicity data, environmental health and regulatory information, ecologic data, and scientific references (Russom 2002; Wexler 2001; Young 2002). However, no existing publicly available resource provides toxicologic annotation of genomic and proteomic data from diverse species. In addition to CTD, another public toxicogenomic database is being developed by the National Center for Toxicogenomics at the National Institute of Environmental Health Sciences (NIEHS). The Chemical Effects in Biological Systems (CEBS) Knowledge Base will capture and integrate global molecular expression data with pathway and regulatory network information related to toxicology and human disease (Waters et al. 2003). It is the goal of both development groups that CTD and CEBS be complementary in focus and functionally compatible.

Scope

Biologic features and strategic plan. CTD is being developed at the Mount Desert Island Biological Laboratory (MDIBL) in Salsbury Cove, Maine, USA, in collaboration with investigators at NIEHS Marine and Freshwater Biomedical Sciences (MFBS) centers and other scientists with expertise in molecular biology, toxicology, and bioinformatics. CTD will include curated information about nucleotide and protein sequences, associated references, toxic agents, reagents, and taxonomy. Tools for data analysis, manipulation, and visualization for comparative studies will also be provided. This breadth of features dictates a phased implementation approach that will combine automated and manual curation strategies. The first year (September 2002–August 2003) will include three implementation phases.

Phase I will focus on the acquisition and integration of sequences, references to the scientific literature, and toxic agents. Although annotation will focus on genes and proteins with associated toxicologic data, an inclusive set of sequence data will be stored locally in CTD to a) maximize the value of comparative sequence analyses performed with the integrated computational tools, b) avoid excluding sequences of potential toxicological significance, c) allow querying of annotated features, and d) provide integration with data from other sources. Subsets of nucleotide sequences will be acquired from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov). CTD will store all nucleotide reference sequences (RefSeq) for human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), and fruit fly (Drosophila melanogaster), thereby providing a nonredundant set of sequences for these species (Pruitt and Maglott 2001). All nucleotide sequences for other vertebrates and invertebrates will be loaded from GenBank (http://www.ncbi.nlm.nih.gov/Sitemap/index.html#GenBank; Benson et al. 2002). Protein sequences for the corresponding organisms will be acquired from SWISS-PROT (http://ca.expasy.org/sprot/), which provides a comprehensive, annotated, and nonredundant protein sequence data set (O’Donovan et al. 2002). To avoid duplicating information loaded from GenBank and SWISS-PROT, CTD will not accept direct submissions of sequence data. Data from these sources will be updated frequently so that CTD remains current and comprehensive.

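The Phase I source-selection rules amount to a small lookup. The Python sketch below is illustrative only: the function names, the decision to encode the policy as code, and the example species are assumptions rather than part of the CTD design described here.

```python
# Hypothetical sketch of the Phase I sequence-source policy; names are illustrative.
from typing import Dict, Iterable, Tuple

# Species for which all NCBI nucleotide reference sequences (RefSeq) are stored.
REFSEQ_SPECIES = frozenset({
    "Homo sapiens",
    "Mus musculus",
    "Rattus norvegicus",
    "Drosophila melanogaster",
})


def nucleotide_source(species: str) -> str:
    """Upstream database for a species' nucleotide records."""
    # RefSeq supplies a nonredundant set for the four species above;
    # nucleotide sequences for every other organism come from GenBank.
    return "RefSeq" if species in REFSEQ_SPECIES else "GenBank"


def load_plan(species_list: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each species to its (nucleotide source, protein source) pair.

    Protein sequences for all organisms are taken from SWISS-PROT.
    """
    return {sp: (nucleotide_source(sp), "SWISS-PROT") for sp in species_list}


if __name__ == "__main__":
    for species, (nuc, prot) in load_plan(["Mus musculus", "Danio rerio"]).items():
        print(f"{species}: nucleotides from {nuc}, proteins from {prot}")
```

Keeping the rule in one place makes it easy to extend if further species gain curated reference sequence sets.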
During phase I, references associated with genes and proteins will be identified from GenBank and SWISS-PROT sequence records and from the NCBI literature database PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed). Candidate associations between genes, proteins, and toxic agents will be identified by querying the titles, abstracts, and Medical Subject Headings (MeSH; Lipscomb 2000; Young 2002) of references included in CTD. For queries of genes and proteins, nomenclature inconsistencies will initially be handled by including synonyms identified in public biologic databases that also address this issue, such as LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/) and the Mouse Genome Informatics databases (http://www.informatics.jax.org/). Queries for toxic agents will be constructed using a hierarchical vocabulary that enhances MeSH’s Chemicals Index and Chemicals and Drugs category by supplementing it with chemical information from the U.S. EPA, the U.S. Fish and Wildlife Service, and the National Toxicology Program. Criteria for queries will be established in collaboration with investigators from other NIEHS MFBS centers and from the wider scientific community with expertise in molecular biology and toxicology. All associations between data sets in CTD will be labeled “not reviewed” until a curator has confirmed their accuracy.

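The association-finding step can be sketched as co-occurrence matching over a reference's title, abstract, and MeSH terms. In the minimal Python sketch below, the synonym table, the small chemical hierarchy, the reference records, and the whole-word, case-insensitive matching rule are all illustrative assumptions; the actual query criteria were still to be defined with MFBS investigators. The sketch only shows how synonyms and a vocabulary hierarchy widen a query and how every automatic hit starts out as “not reviewed”.

```python
# Hypothetical sketch: synonym- and hierarchy-aware candidate association mining.
# The synonym table, chemical hierarchy, and records below are illustrative only.
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Gene synonyms of the kind available from LocusLink or MGI (toy subset).
GENE_SYNONYMS = {
    "AHR": ["AhR", "aryl hydrocarbon receptor", "Ah receptor"],
}

# Hierarchical chemical vocabulary: a term also covers all of its descendants.
CHEMICAL_TREE = {
    "halogenated aromatic hydrocarbons": ["dioxins"],
    "dioxins": ["TCDD", "2,3,7,8-tetrachlorodibenzo-p-dioxin"],
}


@dataclass
class Reference:
    pmid: str
    title: str
    abstract: str
    mesh: List[str] = field(default_factory=list)


def mentions(text: str, terms: List[str]) -> bool:
    """True if any term appears as a whole word or phrase, ignoring case."""
    return any(
        re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", text, re.IGNORECASE)
        for term in terms
    )


def expand_chemical(term: str) -> List[str]:
    """Return the term followed by all of its descendants (depth-first)."""
    expanded = [term]
    for child in CHEMICAL_TREE.get(term, []):
        expanded.extend(expand_chemical(child))
    return expanded


def candidate_associations(
    refs: List[Reference], gene: str, chemical: str
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (gene, chemical, pmid, status) for references mentioning both."""
    gene_terms = [gene] + GENE_SYNONYMS.get(gene, [])
    chem_terms = expand_chemical(chemical)
    for ref in refs:
        text = " ".join([ref.title, ref.abstract, *ref.mesh])
        if mentions(text, gene_terms) and mentions(text, chem_terms):
            # Automatic links stay unreviewed until a curator confirms them.
            yield (gene, chemical, ref.pmid, "not reviewed")


if __name__ == "__main__":
    refs = [
        Reference("demo-1", "TCDD binding by the Ah receptor in fishes", ""),
        Reference("demo-2", "Bile salt export in elasmobranchs", "", ["Liver"]),
    ]
    for hit in candidate_associations(refs, "AHR", "dioxins"):
        print(hit)
```

Here a query for “dioxins” also finds a reference that mentions only “TCDD”, and the gene query succeeds through the synonym “Ah receptor”; plain co-occurrence will also produce false positives, which is why curator review matters.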
During phase II, we will evaluate and integrate analysis tools for sequence similarity searches (e.g., WU-BLAST; Altschul et al. 1990), multiple alignments (e.g., ClustalW; Thompson et al. 1994), and phylogenetic analysis (e.g., PHYLIP; Felsenstein 1993). Many web sites currently offer BLAST searches against statically defined data sets that include sequences from specific organisms, groups of organisms, or databases. These data sets are often either too inclusive, yielding an overabundance of “hits,” or too exclusive, omitting organisms of interest. Because CTD will store sequences and related data locally in a relational database, users will be able to define customized data sets. This capability will permit highly focused sequence analysis, such as restricting BLAST searches to a specific combination of taxa. Large-scale automated sequence analysis will also be possible.

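A locally stored relational copy of the sequences makes it possible to derive a BLAST target set for any combination of taxa. The sketch below uses Python’s built-in sqlite3 module as a stand-in for the Oracle database; the two-table layout, the accessions, and the sequences are invented for illustration. The resulting FASTA text could then be formatted as a search database with the database-formatting utility of whichever BLAST package is integrated.

```python
# Hypothetical sketch: a taxon-restricted BLAST target set from a relational store.
# sqlite3 stands in for Oracle; tables, accessions, and sequences are made up.
import sqlite3
from typing import Sequence


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database of taxa and their sequences."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE seq (
            accession TEXT PRIMARY KEY,
            taxon_id  INTEGER NOT NULL REFERENCES taxon (taxon_id),
            residues  TEXT NOT NULL
        );
        """
    )
    con.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [(1, "Danio rerio"), (2, "Squalus acanthias"), (3, "Homo sapiens")],
    )
    con.executemany(
        "INSERT INTO seq VALUES (?, ?, ?)",
        [("DEMO1", 1, "ATGGCT"), ("DEMO2", 2, "ATGGCC"), ("DEMO3", 3, "ATGGCA")],
    )
    return con


def custom_dataset_fasta(con: sqlite3.Connection, taxa: Sequence[str]) -> str:
    """Return FASTA text containing only sequences from the chosen taxa."""
    if not taxa:
        return ""
    placeholders = ", ".join("?" for _ in taxa)  # one bound parameter per taxon
    rows = con.execute(
        f"""
        SELECT s.accession, t.name, s.residues
        FROM seq AS s JOIN taxon AS t ON t.taxon_id = s.taxon_id
        WHERE t.name IN ({placeholders})
        ORDER BY s.accession
        """,
        list(taxa),
    ).fetchall()
    return "".join(f">{acc} [{name}]\n{res}\n" for acc, name, res in rows)


if __name__ == "__main__":
    db = build_demo_db()
    print(custom_dataset_fasta(db, ["Danio rerio", "Squalus acanthias"]), end="")
```

Binding taxon names as parameters, rather than splicing them into the SQL text, keeps user-defined selections safe to pass straight from a query form.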
During phase III, we will develop a World Wide Web (WWW) interface for CTD that will include user registration and comment forms; basic and advanced query options to access data for sequences, references, and toxic agents; and a platform for analyzing sequences. At the completion of phase III, CTD will be made accessible to collaborators and participating members of the community so that they can evaluate its functionality and test the system. On the basis of feedback from the scientific community, we will then work with MFBS center investigators in subsequent years to continue the data curation process and to prioritize the inclusion of additional data sets, such as expressed sequence tags, single nucleotide polymorphisms, and data from microarray experiments.

[Figure 1: flowchart; recoverable stage labels: data model, design database, prototype, logical/physical database design, design system, beta version, public release; inputs: CTD biologists, hardware and software capabilities, MFBS Center biologists, performance requirements, beta testers.]
Figure 1. Software development life cycle. The CTD system will be implemented in stages. A data model was designed prior to developing functional specifications and a prototype system. Biologists will evaluate content and functionality throughout the development life cycle.

Implementation. CTD is being designed using a data-driven approach in which the data model is developed before system functions are specified (Figure 1). This approach will a) promote reusability of data, b) establish a consistent set of names and definitions for data, c) determine which functions the system will support, and d) provide a concise overview of the system’s scope (Simsion 1994). CTD will be implemented in an Oracle relational database. The current data model includes 40 entities with well-documented definitions, including text descriptions of all entities and attributes, data types, constraint definitions, and representative values. CTD will include a curation tool and a WWW user interface. Oracle Forms Developer will be used to develop the first generation of the curation tool, which will be used to annotate and modify data. This tool is tightly integrated with the Oracle database and provides client-side validation, reusable components, and rapid prototyping capability. The WWW interface will be developed using the Python programming language.

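Entity definitions documented in this way (descriptions, data types, constraints, representative values) lend themselves to a machine-readable data dictionary that a curation form could consult before records reach the database. The Python sketch below is hypothetical: the Sequence entity, its three attributes, and the messages are illustrative and are not drawn from the 40-entity CTD model, and Oracle Forms would implement such checks in its own way.

```python
# Hypothetical sketch of a data-dictionary entry plus record validation.
# Entity and attribute names are illustrative, not CTD's actual schema.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Attribute:
    name: str
    dtype: type                          # expected Python type of the value
    description: str                     # text definition, as in the data model
    constraint: Callable[[Any], bool]    # constraint definition
    example: Any                         # representative value


@dataclass
class Entity:
    name: str
    description: str
    attributes: List[Attribute]

    def validate(self, record: Dict[str, Any]) -> List[str]:
        """Return a list of problems; an empty list means the record passes."""
        problems = []
        for attr in self.attributes:
            if attr.name not in record:
                problems.append(f"{attr.name}: missing")
                continue
            value = record[attr.name]
            if not isinstance(value, attr.dtype):
                problems.append(f"{attr.name}: expected {attr.dtype.__name__}")
            elif not attr.constraint(value):
                problems.append(f"{attr.name}: violates constraint (e.g. {attr.example!r})")
        return problems


SEQUENCE = Entity(
    name="Sequence",
    description="A nucleotide or protein sequence loaded from an upstream database.",
    attributes=[
        Attribute("accession", str, "Upstream accession identifier",
                  lambda v: v.strip() != "", "DEMO0001"),
        Attribute("source", str, "Upstream database",
                  lambda v: v in {"RefSeq", "GenBank", "SWISS-PROT"}, "GenBank"),
        Attribute("length", int, "Number of residues", lambda v: v > 0, 1500),
    ],
)

if __name__ == "__main__":
    print(SEQUENCE.validate({"accession": "", "source": "EMBL", "length": 10}))
```

Because the same dictionary carries the human-readable definitions and the executable constraints, documentation and validation cannot drift apart.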
World Wide Web interface. The CTD WWW interface will combine the familiar paradigms of the NCBI and Mouse Genome Informatics databases. Simple and advanced query forms will be available to retrieve information about genes, including nucleotide and protein sequences, as well as references, toxic agents, reagents, and taxonomy. Each of these major categories will have a resource page describing the associated data and linking to resources with supplemental information. Data will be highly integrated within CTD and with external databases.

Community involvement. MDIBL is committed to involving the scientific community in the development of CTD. To this end, we are formally collaborating with investigators at each of the NIEHS MFBS centers; hosting conferences to evaluate the progress and strategic plan of CTD; attending national meetings to promote awareness of and participation in CTD development; and planning online mechanisms for feedback and data submissions. From its inception, CTD has benefited from significant community support. In April 2000, 45 biologists and bioinformatics experts attended a conference at MDIBL (MDIBL 2000) to address the application of bioinformatics in toxicology research. Discussions at this meeting formulated the initial plan for a toxicogenomics database and were the foundation for the NIEHS phased innovation grant application that now funds CTD. In May 2002, MDIBL hosted a workshop (MDIBL 2002) to promote dialog about genomic databases in the scientific community and to seek feedback about the progress of CTD. Because of the success and utility of these meetings, another conference is planned for 2004.

Community Invitation

To ensure that CTD is a valuable resource for the scientific community, we invite participation in its development. Specific challenges for which we encourage feedback include addressing nomenclature inconsistencies, clustering sequence data from diverse species, and determining the role of microarray data in CTD. Defining strategies to meet these challenges will have broad implications for molecular biologists and toxicologists.

References

Aller SG, Lombardo ID, Bhanot S, Forrest JN Jr. 1999. Cloning, characterization, and functional expression of a CNP receptor regulating CFTR in the shark rectal gland. Am J Physiol 276:C442–C449.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410.
Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, et al. 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310.
Ballatori N, Villalobos AR. 2002. Defining the molecular and cellular basis of toxicity using comparative models. Toxicol Appl Pharmacol 183:207–220.
Baxevanis AD. 2002. The Molecular Biology Database Collection: 2002 update. Nucleic Acids Res 30:1–12.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL. 2002. GenBank. Nucleic Acids Res 30:17–20.
Blake JA, Richardson JE, Bult CJ, Kadin JA, Eppig JT. 2002. The Mouse Genome Database (MGD): the model organism database for the laboratory mouse. Nucleic Acids Res 30:113–115.
Cai SY, Wang W, Ballatori N, Boyer JL. 2001. Bile salt export pump is highly conserved during vertebrate evolution and its expression is inhibited by PFIC type II mutations. Am J Physiol Gastrointest Liver Physiol 281:G316–G322.
Cai SY, Wang W, Soroka CJ, Ballatori N, Boyer JL. 2002. An evolutionarily ancient Oatp: insights into conserved functional domains of these proteins. Am J Physiol Gastrointest Liver Physiol 282:G702–G710.
Dooley K, Zon LI. 2000. Zebrafish: a model system for the study of human disease. Curr Opin Genet Dev 10:252–256.
Felsenstein J. 1993. PHYLIP Phylogeny Inference Package 3.5. Seattle, WA:The University of Washington.
Gamba G, Saltzberg SN, Lombardi M, Miyanoshita A, Lytton J, Hediger MA, et al. 1993. Primary structure and functional expression of a cDNA encoding the thiazide-sensitive, electroneutral sodium-chloride cotransporter. Proc Natl Acad Sci USA 90:2749–2753.
Hahn M. 2002. Aryl hydrocarbon receptors: diversity and evolution. Chem Biol Interact 141:131–160.
Koonin EV, Aravind L, Kondrashov AS. 2000. The impact of comparative genomics on our understanding of evolution. Cell 101:573–576.
Lipscomb CE. 2000. Medical Subject Headings (MeSH). Bull Med Libr Assoc 88:265–266.
MDIBL (Mount Desert Island Biological Laboratory). 2000. Conference on Bioinformatics of Genes and ESTs Relevant to Membrane Cellular Toxicology, 28–29 April 2000, Salsbury Cove, ME.
MDIBL (Mount Desert Island Biological Laboratory). 2002. Conference on Community Participation in Genomic Databases, 3–5 May 2002, Salsbury Cove, ME.
O’Donovan C, Martin MJ, Gattiker A, Gasteiger E, Bairoch A, Apweiler R. 2002. High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Brief Bioinform 3:275–284.
Poland A, Knutson JC. 1982. 2,3,7,8-Tetrachlorodibenzo-p-dioxin and related halogenated aromatic hydrocarbons: examination of the mechanism of toxicity. Annu Rev Pharmacol Toxicol 22:517–554.
Pruitt KD, Maglott DR. 2001. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 29:137–140.
Russom CL. 2002. Mining environmental toxicology information: web resources. Toxicology 173:75–88.
Schmidt JV, Bradfield CA. 1996. Ah receptor signaling pathways. Annu Rev Cell Dev Biol 12:55–89.
Simsion G. 1994. Data Modeling Essentials: Analysis, Design, and Innovation. London:International Thomson Computer Press.
Sprague J, Doerry E, Douglas S, Westerfield M. 2001. The Zebrafish Information Network (ZFIN): a resource for genetic, genomic and developmental research. Nucleic Acids Res 29:87–90.
Thomas RS, Penn SG, Holden K, Bradfield CA, Rank DR. 2002. Sequence variation and phylogenetic history of the mouse Ahr gene. Pharmacogenetics 12:151–163.
Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680.
U.S. EPA. 2003. New Chemicals Program. Washington, DC:U.S. Environmental Protection Agency. Available: http://www.epa.gov/opptintr/newchems/invntory.htm [accessed 27 January 2003].
Waters M, Boorman G, Bushel P, Cunningham M, Irwin R, Merrick A, et al. 2003. Systems Toxicology and the Chemical Effects in Biological Systems (CEBS) Knowledge Base. Environ Health Perspect 111:811–824.
Wexler P. 2001. TOXNET: an evolving web resource for toxicology and environmental health information. Toxicology 157:3–10.
Whelan S, Lio P, Goldman N. 2001. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet 17:262–272.
Wittbrodt J, Shima A, Schartl M. 2002. Medaka—a model organism from the Far East. Nat Rev Genet 3:53–64.
Xu JC, Lytle C, Zhu TT, Payne JA, Benz E, Forbush B. 1994. Molecular cloning and functional expression of the bumetanide-sensitive Na-K-Cl cotransporter. Proc Natl Acad Sci USA 91:2201–2205.
Young RR. 2002. Genetic toxicology: web resources. Toxicology 173:103–121.