ABSTRACT: Charting the interactions among genes and among their protein products is essential for understanding biological systems. A flood of interaction data is emerging from high throughput technologies, computational approaches, and literature mining methods. Quick and efficient access to this data has become a critical issue for biologists. Several excellent multi-organism databases for gene and protein interactions are available, yet most of these have understandable difficulty maintaining comprehensive information for any one organism. No single database, for example, includes all available interactions, integrated gene expression data, and comprehensive and searchable gene information for the important model organism, Drosophila melanogaster.
DroID, the Drosophila Interactions Database, is a comprehensive interactions database designed specifically for Drosophila. DroID houses published physical protein interactions, genetic interactions, and computationally predicted interactions, including interologs based on data for other model organisms and humans. All interactions are annotated with original experimental data and source information. DroID can be searched and filtered based on interaction information or a comprehensive set of gene attributes from FlyBase. DroID also contains gene expression and expression correlation data that can be searched and used to filter datasets, for example, to focus a study on sub-networks of co-expressed genes. To address the inherent noise in interaction data, DroID employs an updatable confidence scoring system that assigns a score to each physical interaction based on the likelihood that it represents a biologically significant link.
DroID is the most comprehensive interactions database available for Drosophila. To facilitate downstream analyses, interactions are annotated with original experimental information, gene expression data, and confidence scores. All data in DroID are freely available and can be searched, explored, and downloaded through three different interfaces: a text-based web site, a Java applet with dynamic graphing capabilities (IM Browser), and a Cytoscape plug-in. DroID is available at http://www.droidb.org.
ABSTRACT: The human genome evolution project seeks to reveal the genetic underpinnings of key phenotypic features that are distinctive of humans, such as a greatly enlarged cerebral cortex, slow development, and long life spans. This project has focused predominantly on genotypic changes during the 6-million-year descent from the last common ancestor (LCA) of humans and chimpanzees. Here, we argue that adaptive genotypic changes during earlier periods of evolutionary history also helped shape the distinctive human phenotype. Using comparative genome sequence data from 10 vertebrate species, we find a signature of human ancestry-specific adaptive evolution in 1,240 genes during their descent from the LCA with rodents. We also find that the signature of adaptive evolution is significantly different for highly expressed genes in human fetal and adult-stage tissues. Functional annotation clustering shows that on the ape stem lineage, an especially evident adaptively evolved biological pathway contains genes that function in mitochondria, are crucially involved in aerobic energy production, and are highly expressed in two energy-demanding tissues, heart and brain. Also, on this ape stem lineage, there was adaptive evolution among genes associated with human autoimmune and aging-related diseases. During more recent human descent, the adaptively evolving, highly expressed genes in fetal brain are involved in mediating neuronal connectivity. Comparing adaptively evolving genes from pre- and postnatal-stage tissues suggests that different selective pressures act on the development vs. the maintenance of the human phenotype.
Proceedings of the National Academy of Sciences 04/2008; 105(9):3215-20. · 9.81 Impact Factor
ABSTRACT: Previous molecular analyses of mammalian evolutionary relationships involving a wide range of placental mammalian taxa have been restricted in size from one to two dozen gene loci and have not decisively resolved the basal branching order within Placentalia. Here, on extracting from thousands of gene loci both their coding nucleotide sequences and translated amino acid sequences, we attempt to resolve key uncertainties about the ancient branching pattern of crown placental mammals. Focusing on approximately 1,700 conserved gene loci, those that have the more slowly evolving coding sequences, and using maximum-likelihood, Bayesian inference, maximum parsimony, and neighbor-joining (NJ) phylogenetic tree reconstruction methods, we find from almost all results that a clade (the southern Atlantogenata) composed of Afrotheria and Xenarthra is the sister group of all other (the northern Boreoeutheria) crown placental mammals, among boreoeutherians Rodentia groups with Lagomorpha, and the resultant Glires is close to Primates. Only the NJ tree for nucleotide sequences separates Rodentia (murids) first and then Lagomorpha (rabbit) from the other placental mammals. However, this nucleotide NJ tree still depicts Atlantogenata and Boreoeutheria but minus Rodentia and Lagomorpha. Moreover, the NJ tree for amino acid sequences does depict the basal separation to be between Atlantogenata and a Boreoeutheria that includes Rodentia and Lagomorpha. Crown placental mammalian diversification appears to be largely the result of ancient plate tectonic events that allowed time for convergent phenotypes to evolve in the descendant clades.
Proceedings of the National Academy of Sciences 10/2007; 104(36):14395-400. · 9.81 Impact Factor
ABSTRACT: Data from large-scale protein interaction screens for humans and model eukaryotes have been invaluable for developing systems-level models of biological processes. Despite this value, only a limited amount of interaction data is available for prokaryotes. Here we report the systematic identification of protein interactions for the bacterium Campylobacter jejuni, a food-borne pathogen and a major cause of gastroenteritis worldwide.
Using high-throughput yeast two-hybrid screens we detected and reproduced 11,687 interactions. The resulting interaction map includes 80% of the predicted C. jejuni NCTC11168 proteins and places a large number of poorly characterized proteins into networks that provide initial clues about their functions. We used the map to identify a number of conserved subnetworks by comparison to protein networks from Escherichia coli and Saccharomyces cerevisiae. We also demonstrate the value of the interactome data for mapping biological pathways by identifying the C. jejuni chemotaxis pathway. Finally, the interaction map also includes a large subnetwork of putative essential genes that may be used to identify potential new antimicrobial drug targets for C. jejuni and related organisms.
The C. jejuni protein interaction map is one of the most comprehensive yet determined for a free-living organism and nearly doubles the binary interactions available for the prokaryotic kingdom. This high level of coverage facilitates pathway mapping and function prediction for a large number of C. jejuni proteins as well as orthologous proteins from other organisms. The broad coverage also facilitates cross-species comparisons for the identification of evolutionarily conserved subnetworks of protein interactions.
ABSTRACT: Background: Rapidly accumulating genome sequence data from multiple species offer powerful opportunities for the detection of DNA sequence evolution. Phylogenetic tree construction and codon-based tests for natural selection are the prevailing tools used to detect functionally important evolutionary change in protein coding sequences. These analyses often require multiple DNA sequence alignments that maintain the correct reading frame for each collection of putative orthologous sequences. Since this feature is not available in most alignment tools, codon reading frames often must be checked manually before evolutionary analyses
Results: Here we report an online codon-preserved alignment tool (OCPAT) that generates multiple sequence alignments automatically from the coding sequences of any list of human gene IDs and their putative orthologs from genomes of other vertebrate tetrapods. OCPAT is programmed to extract putative orthologous genes from genomes and to align the orthologs with the reading frame maintained in all species. OCPAT also optimizes the alignment by trimming the most variable alignment regions at the 5' and 3' ends of each gene. The resulting output of alignments is returned in several formats, which facilitates further molecular evolutionary analyses by appropriate available software. Alignments are generally robust and reliable, retaining the correct reading frame. The tool can serve as the first step for comparative genomic analyses of protein-coding gene sequences including phylogenetic tree reconstruction and detection of natural selection. We aligned 20,658 human RefSeq mRNAs using OCPAT. Most alignments are missing sequence(s) from at least one species; however, functional annotation clustering of the ~1700 transcripts that were alignable to all species shows that genes involved in multi-subunit protein complexes are highly conserved.
Conclusion: The OCPAT program facilitates large-scale evolutionary and phylogenetic analyses of entire biological processes, pathways,
Source Code for Biology and Medicine 01/2007; 2(1):1-6.
ABSTRACT: Biological processes are mediated by networks of interacting genes and proteins. Efforts to map and understand these networks are resulting in the proliferation of interaction data derived from both experimental and computational techniques for a number of organisms. The volume of this data combined with the variety of specific forms it can take has created a need for comprehensive databases that include all of the available data sets, and for exploration tools to facilitate data integration and analysis. One powerful paradigm for the navigation and analysis of interaction data is an interaction graph or map that represents proteins or genes as nodes linked by interactions. Several programs have been developed for graphical representation and analysis of interaction data, yet there remains a need for alternative programs that can provide casual users with rapid easy access to many existing and emerging data sets.
Here we describe a comprehensive database of Drosophila gene and protein interactions collected from a variety of sources, including low and high throughput screens, genetic interactions, and computational predictions. We also present a program for exploring multiple interaction data sets and for combining data from different sources. The program, referred to as the Interaction Map (IM) Browser, is a web-based application for searching and visualizing interaction data stored in a relational database system. Use of the application requires no downloads and minimal user configuration or training, thereby enabling rapid initial access to interaction data. IM Browser was designed to readily accommodate and integrate new types of interaction data as it becomes available. Moreover, all information associated with interaction measurements or predictions and the genes or proteins involved are accessible to the user. This allows combined searches and analyses based on either common or technique-specific attributes. The data can be visualized as an editable graph and all or part of the data can be downloaded for further analysis with other tools for specific applications. The database is available at http://proteome.wayne.edu/PIMdb.html
The Drosophila Interactions Database described here places a variety of disparate data into one easily accessible location. The database has a simple structure that maintains all relevant information about how each interaction was determined. The IM Browser provides easy, complete access to this database and could readily be used to publish other sets of interaction data. By providing access to all of the available information from a variety of data types, the program will also facilitate advanced computational analyses.
ABSTRACT: Gene expression profiles from the anterior cingulate cortex (ACC) of human, chimpanzee, gorilla, and macaque samples provide clues about genetic regulatory changes in human and other catarrhine primate brains. The ACC, a cerebral neocortical region, has human-specific histological features. Physiologically, an individual's ACC displays increased activity during that individual's performance of cognitive tasks. Of approximately 45,000 probe sets on microarray chips representing transcripts of all or most human genes, approximately 16,000 were commonly detected in human ACC samples and comparable numbers, 14,000-15,000, in gorilla and chimpanzee ACC samples. Phylogenetic results obtained from gene expression profiles contradict the traditional expectation that the non-human African apes (i.e., chimpanzee and gorilla) should be more like each other than either should be like humans. Instead, the chimpanzee ACC profiles are more like the human than like the gorilla; these profiles demonstrate that chimpanzees are the sister group of humans. Moreover, for those unambiguous expression changes mapping to important biological processes and molecular functions that statistically are significantly represented in the data, the chimpanzee clade shows at least as much apparent regulatory evolution as does the human clade. Among important changes in the ancestry of both humans and chimpanzees, but to a greater extent in humans, are the up-regulated expression profiles of aerobic energy metabolism genes and neuronal function-related genes, suggesting that increased neuronal activity required increased supplies of energy.
Proceedings of the National Academy of Sciences 04/2004; 101(9):2957-62. · 9.81 Impact Factor
ABSTRACT: Maps depicting binary interactions between proteins can be powerful starting points for understanding biological systems. A proven technology for generating such maps is high-throughput yeast two-hybrid screening. In the most extensive screen to date, a Gal4-based two-hybrid system was used recently to detect over 20,000 interactions among Drosophila proteins. Although these data are a valuable resource for insights into protein networks, they cover only a fraction of the expected number of interactions.
To complement the Gal4-based interaction data, we used the same set of Drosophila open reading frames to construct arrays for a LexA-based two-hybrid system. We screened the arrays using a novel pooled mating approach, initially focusing on proteins related to cell-cycle regulators. We detected 1,814 reproducible interactions among 488 proteins. The map includes a large number of novel interactions with potential biological significance. Informative regions of the map could be highlighted by searching for paralogous interactions and by clustering proteins on the basis of their interaction profiles. Surprisingly, only 28 interactions were found in common between the LexA- and Gal4-based screens, even though they had similar rates of true positives.
The substantial number of new interactions discovered here supports the conclusion that previous interaction mapping studies were far from complete and that many more interactions remain to be found. Our results indicate that different two-hybrid systems and screening approaches applied to the same proteome can generate more comprehensive datasets with more cross-validated interactions. The cell-cycle map provides a guide for further defining important regulatory networks in Drosophila and other organisms.
ABSTRACT: What do functionally important DNA sites, those scrutinized and shaped by natural selection, tell us about the place of humans in evolution? Here we compare approximately 90 kb of coding DNA nucleotide sequence from 97 human genes to their sequenced chimpanzee counterparts and to available sequenced gorilla, orangutan, and Old World monkey counterparts, and, on a more limited basis, to mouse. The nonsynonymous changes (functionally important), like synonymous changes (functionally much less important), show chimpanzees and humans to be most closely related, sharing 99.4% identity at nonsynonymous sites and 98.4% at synonymous sites. On a time scale, the coding DNA divergencies separate the human-chimpanzee clade from the gorilla clade at between 6 and 7 million years ago and place the most recent common ancestor of humans and chimpanzees at between 5 and 6 million years ago. The evolutionary rate of coding DNA in the catarrhine clade (Old World monkey and ape, including human) is much slower than in the lineage to mouse. Among the genes examined, 30 show evidence of positive selection during descent of catarrhines. Nonsynonymous substitutions by themselves, in this subset of positively selected genes, group humans and chimpanzees closest to each other and have chimpanzees diverge about as much from the common human-chimpanzee ancestor as humans do. This functional DNA evidence supports two previously offered taxonomic proposals: family Hominidae should include all extant apes; and genus Homo should include three extant species and two subgenera, Homo (Homo) sapiens (humankind), Homo (Pan) troglodytes (common chimpanzee), and Homo (Pan) paniscus (bonobo chimpanzee).
Proceedings of the National Academy of Sciences 07/2003; 100(12):7181-8. · 9.81 Impact Factor
ABSTRACT: A rate-limiting and costly step in many proteomics analyses is the cloning of all of the ORFs for an organism into technique-specific vectors. Here, we describe the generation of a Campylobacter jejuni expression clone set using a high-throughput cloning approach based on recombination in E. coli. The approach uses native E. coli recombination functions and requires no in vitro enzymatic steps or special strains. Our results indicate that this approach is an efficient and economical alternative for high-throughput cloning.
Journal of Proteome Research 3(3):582-6. · 5.00 Impact Factor