GDPC: connecting researchers with multiple integrated data sources.
ABSTRACT The goal of this project is to simplify access to genomic diversity and phenotype data, thereby encouraging reuse of this data. The Genomic Diversity and Phenotype Connection (GDPC) accomplishes this by retrieving data from one or more data sources and by allowing researchers to analyze integrated data in a standard format. GDPC is written in JAVA and provides (1) data sources available as web services that transfer XML formatted data via the SOAP protocol; (2) a JAVA API for programmatic access to data sources; and (3) a front-end application that allows users to manage data sources, retrieve data based on filters, sort/group data based on property values and save/open the data as XML files. AVAILABILITY: The source code, compiled code, documentation and GDPC Browser are freely available at: www.maizegenetics.net/gdpc/index.html. The current release of GDPC is version 1.0, with updated releases planned for the future. Comments are welcome.
BIOINFORMATICS APPLICATIONS NOTE
GDPC: connecting researchers with multiple
integrated data sources
Terry M. Casstevens1,∗ and Edward S. Buckler1,2
1Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853-2703, USA,
2USDA-ARS, Ithaca, NY 14853-2703, USA
Received on October 27, 2003; revised and accepted on April 8, 2004
Advance Access publication April 24, 2004
Summary: The goal of this project is to simplify access to
genomic diversity and phenotype data, thereby encouraging
reuse of this data. The Genomic Diversity and Phenotype
Connection (GDPC) accomplishes this by retrieving data from
one or more data sources and by allowing researchers to analyze
integrated data in a standard format. GDPC is written in
JAVA and provides (1) data sources available as web services
that transfer XML formatted data via the SOAP protocol; (2)
a JAVA API for programmatic access to data sources; and
(3) a front-end application that allows users to manage data
sources, retrieve data based on filters, sort/group data based
on property values and save/open the data as XML files.
Availability: The source code, compiled code, documentation
and GDPC Browser are freely available at:
www.maizegenetics.net/gdpc/index.html. The current release of
GDPC is version 1.0, with updated releases planned for the
future. Comments are welcome.
Numerous research projects on genomic diversity and phenotypes
have generated valuable data collections, including
thousands of QTL mapping studies. These datasets, however,
tend to be abandoned after results are published and thus are
not easily accessible by subsequent projects or the general
public. Ideally, this data would be made publicly available
at the conclusion of each project by migrating the collected
data to larger, public databases. The Genomic Diversity
and Phenotype Connection (GDPC) accelerates the availability
of data by providing the infrastructure to create and
use connections to multiple data sources. It is possible to
use multiple data sources because each connection masks
the specifics of its data source. GDPC already has a connection
to the maize diversity database, Panzea (Du et al.
2003, http://www.panzea.org; Doebley et al. 2003), and there
is work in progress to create a connection to the comparative
cereal database, Gramene (Ware et al. 2002; Stein et al. 2003,
http://www.gramene.org; Fig. 1). Connections to additional
data sources are also planned, and other organizations are
connections that integrate their data with GDPC. Once these
sources are ‘GDPC enabled’, data from these sources can
be analyzed simultaneously with front-end software applications.
These applications use the GDPC JAVA API that
standardizes access to any ‘GDPC enabled’ data source. The
GDPC Browser is an application that currently uses this API.

∗To whom correspondence should be addressed.
‘GDPC enabled’ data sources
Data sources are at the core of GDPC. A data source is considered
‘GDPC enabled’ when it has been programmed to
access methods to the data source. A ‘GDPC enabled’ data
source is typically designed to be a remote web service that
transfers XML formatted data via the SOAP protocol. Data
sources can also be accessed using the JAVA JDBC API, and
many other ways of accessing data sources could also be
designed. The data are always returned in a common format
regardless of the access method, which makes it possible to
integrate, analyze and view data from multiple sources. This
is a significant advantage over other tools that give access to
only one data source. The programming that makes a source
‘GDPC enabled’ is referred to as a ‘GDPC connection’. This
connection knows the specifics of its data source and masks
them from the rest of the system. Researchers can also create
connections to their own data. This allows them to integrate
and analyze their data with publicly available data. As new
GDPC connections to data sources are created, any ‘GDPC
aware’ software application will automatically have access
to that data source. Future plans include developing classes
that will allow local files to serve as data sources. Also,
plans include registering GDPC connections with MOBY
to allow users to look up available data sources via that
Bioinformatics vol. 20 issue 16 © Oxford University Press 2004; all rights reserved.
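The masking role of a connection described in this section can be illustrated with a minimal Java sketch. Nothing here is taken from the GDPC code base: the `Connection` interface, the `DataElement` class, the two example sources and their sample values are hypothetical, chosen only to show how differently organized sources might return taxa in one common format that a caller can merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConnectionSketch {

    /** Hypothetical common format: a type name plus a map of properties. */
    static final class DataElement {
        final String type;
        final Map<String, String> properties;

        DataElement(String type, Map<String, String> properties) {
            this.type = type;
            this.properties = properties;
        }
    }

    /** Hypothetical stand-in for a connection; not the real GDPC interface. */
    interface Connection {
        String name();
        List<DataElement> getTaxa();
    }

    /** Builds a taxon element in the common format. */
    static DataElement taxon(String name, String species, String source) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("name", name);
        p.put("species", species);
        p.put("source", source);
        return new DataElement("taxon", p);
    }

    /** A source that keeps taxa as rows of {name, species}. */
    static final class TableConnection implements Connection {
        private final String[][] rows;

        TableConnection(String[][] rows) { this.rows = rows; }

        public String name() { return "table-source"; }

        public List<DataElement> getTaxa() {
            List<DataElement> out = new ArrayList<>();
            for (String[] row : rows) {
                out.add(taxon(row[0], row[1], name()));
            }
            return out;
        }
    }

    /** A source that keeps taxa as "name|species" strings. */
    static final class DelimitedConnection implements Connection {
        private final List<String> records;

        DelimitedConnection(List<String> records) { this.records = records; }

        public String name() { return "delimited-source"; }

        public List<DataElement> getTaxa() {
            List<DataElement> out = new ArrayList<>();
            for (String record : records) {
                String[] parts = record.split("\\|");
                out.add(taxon(parts[0], parts[1], name()));
            }
            return out;
        }
    }

    /** The caller sees only the common format, whatever the source looks like. */
    static List<DataElement> retrieveTaxa(List<Connection> connections) {
        List<DataElement> merged = new ArrayList<>();
        for (Connection c : connections) {
            merged.addAll(c.getTaxa());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative sample values only.
        List<Connection> connections = Arrays.asList(
                new TableConnection(new String[][] {{"B73", "Zea mays"}}),
                new DelimitedConnection(Arrays.asList("Nipponbare|Oryza sativa")));
        for (DataElement e : retrieveTaxa(connections)) {
            System.out.println(e.type + " " + e.properties);
        }
    }
}
```

In this sketch, adding a third source means writing one more `Connection` implementation; `retrieveTaxa` and any code built on it stay unchanged.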
Fig. 1. As shown here, genomic diversity and phenotype data can be retrieved, integrated and analyzed from multiple data sources using
GDPC JAVA API
programmatic ways to retrieve the data do not generally exist.
In contrast, GDPC provides a JAVA API that standardizes
access to data regardless of the underlying format. Applications
that use this API are described as being ‘GDPC aware’.
Programmers can use this API to develop front-end applica-
tions that perform algorithms relevant to their project. Since
GDPC masks the specifics of the different data sources, pro-
The GDPC Browser and TASSEL (Buckler, 2003, http://
examples of applications currently using this API. Future
plans include development of a Mesquite module (Maddison
and Maddison, 2003, http://mesquiteproject.org/mesquite/
mesquite.html) that will make any ‘GDPC enabled’ data
source accessible from Mesquite.
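To make the idea of a ‘GDPC aware’ application concrete, the following Java sketch writes one small analysis (the mean phenotype value per taxon) against a single interface and runs it over several sources at once. The `PhenotypeSource` interface, the `Phenotype` class, the `inMemory` helper and all sample values are assumptions made for illustration; they are not the actual GDPC API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AwareAppSketch {

    /** Hypothetical phenotype record in a source-independent format. */
    static final class Phenotype {
        final String taxon;
        final String trait;
        final double value;

        Phenotype(String taxon, String trait, double value) {
            this.taxon = taxon;
            this.trait = trait;
            this.value = value;
        }
    }

    /** Hypothetical standardized access point; not the real GDPC API. */
    interface PhenotypeSource {
        List<Phenotype> getPhenotypes(String trait);
    }

    /** A toy source backed by an in-memory list, filtered by trait name. */
    static PhenotypeSource inMemory(Phenotype... data) {
        List<Phenotype> all = Arrays.asList(data);
        return trait -> all.stream()
                .filter(p -> p.trait.equals(trait))
                .collect(Collectors.toList());
    }

    /** The analysis is written once and never inspects where the data came from. */
    static Map<String, Double> meanByTaxon(List<PhenotypeSource> sources, String trait) {
        return sources.stream()
                .flatMap(s -> s.getPhenotypes(trait).stream())
                .collect(Collectors.groupingBy(
                        p -> p.taxon,
                        TreeMap::new,
                        Collectors.averagingDouble(p -> p.value)));
    }

    public static void main(String[] args) {
        // Illustrative sample values only.
        PhenotypeSource first = inMemory(
                new Phenotype("taxonA", "height", 10.0),
                new Phenotype("taxonB", "height", 20.0));
        PhenotypeSource second = inMemory(
                new Phenotype("taxonA", "height", 14.0));
        System.out.println(meanByTaxon(Arrays.asList(first, second), "height"));
    }
}
```

The design point is that `meanByTaxon` depends only on the interface, so any source that implements it becomes usable by the analysis without further changes.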
to retrieve, view and group genomic diversity and phenotypic
data based on property values. With this application,
users can manage one or more data sources and retrieve
data from these sources based on user-defined filters. Once
retrieved, the data elements’ properties can be viewed by
selecting the elements. The different types of data elements
are taxa, loci, environment experiments, genotype experiments,
localities, genotypes and phenotypes. Working lists
of these elements can be created and sorted based on the
needs of the researcher. These working lists can then be
saved as XML files, and later opened to restore the data elements.
Data can also be exported to other formats chosen by
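The working-list workflow described above (filter, sort, save as XML, reopen) can be sketched in Java with the standard JAXP classes. This is an illustration under stated assumptions, not the GDPC Browser's implementation: the `Taxon` class, the `workingList` and `taxon` element names, the attribute layout and the sample values are all hypothetical, and the XML is kept in a string rather than a file so the example stays self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WorkingListSketch {

    /** Hypothetical data element with two properties. */
    static final class Taxon {
        final String name;
        final String species;

        Taxon(String name, String species) {
            this.name = name;
            this.species = species;
        }
    }

    /** Keeps taxa whose species matches the filter, sorted by name. */
    static List<Taxon> filterAndSort(List<Taxon> all, String species) {
        return all.stream()
                .filter(t -> t.species.equals(species))
                .sorted(Comparator.comparing((Taxon t) -> t.name))
                .collect(Collectors.toList());
    }

    /** Serializes a working list to XML (hypothetical layout). */
    static String toXml(List<Taxon> list) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("workingList");
            doc.appendChild(root);
            for (Taxon t : list) {
                Element e = doc.createElement("taxon");
                e.setAttribute("name", t.name);
                e.setAttribute("species", t.species);
                root.appendChild(e);
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("could not write working list", ex);
        }
    }

    /** Restores the data elements from XML produced by toXml. */
    static List<Taxon> fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("taxon");
            List<Taxon> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                out.add(new Taxon(e.getAttribute("name"), e.getAttribute("species")));
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalStateException("could not read working list", ex);
        }
    }

    public static void main(String[] args) {
        // Illustrative sample values only.
        List<Taxon> all = Arrays.asList(
                new Taxon("Mo17", "Zea mays"),
                new Taxon("Nipponbare", "Oryza sativa"),
                new Taxon("B73", "Zea mays"));
        String xml = toXml(filterAndSort(all, "Zea mays"));
        System.out.println(xml);
        System.out.println(fromXml(xml).size() + " elements restored");
    }
}
```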
Each year much effort and great expense goes into collecting
in remote databases, the data would prove far more valuable
if it were maintained in a way that allowed others to continually
improve and reuse it. In time, these publicly available
data sources would improve in quality and the datasets would
grow larger. GDPC provides access to such data collections
by retrieving data from multiple data sources and by allowing
researchers to analyze integrated data in a standard format.
This project is supported by a cooperative agreement with the
Buckler,E.S. (2003) TASSEL: Trait Analysis by aSSociation, Evolution,
and Linkage.
Muse,S. and Weir,B. (2003) Panzea: maize diversity.
Du,C., Buckler,E., and Muse,S. (2003) Development of a maize
molecular evolutionary genomic database. Comput. Funct.
Genom., 4, 246–249.
Maddison,W.P. and Maddison,D.R. (2003) Mesquite: a modular
system for evolutionary analysis.
Stein,L., McCouch,S.R. and Cartinhour,S. (2003) Gramene: a
resource for comparative grass genomics.
Teytelman,L., Schmidt,S., Zhao,W., Cartinhour,S., McCouch,S.
and Stein,L. (2002) Gramene: a resource for comparative grass
genomics. Nucleic Acids Res., 30, 103–105.
Wilkinson,M.D. and Links,M. (2002) BioMoby: an open source
biological web services.