ABSTRACT: The relentless advance of biochemistry has enabled us to take apart biological systems with ever more fine-grained and precise instruments. The fruits of this dissection are millions of measurements of base pairs and biochemical concentrations. Yet to make sense of these numbers, we need to reverse our dissection by putting the system back together on the computer. The first step in this process is reconstructing molecular anatomy through static modeling, the determination of which pieces (DNA, RNA, protein, and metabolite) are present, and how they are related (e.g., regulator, target, inhibitor, cofactor). Given this broad outline of component connectivity, we may then attempt to reconstruct molecular physiology via dynamic modeling, computer simulations that model when cellular events occur (ODE), where they occur (PDE), and how frequently they recur (SDE). In this review we discuss techniques for both of these modeling paradigms, illustrating each by reference to important
Keywords: Biological networks; Computer simulation; Dynamic modeling; Static modeling
ABSTRACT: We developed Graemlin 2.0, a new multiple network aligner with (1) a new multi-stage approach to local network alignment; (2) a novel scoring function that can use arbitrary features of a multiple network alignment, such as protein deletions, protein duplications, protein mutations, and interaction losses; (3) a parameter learning algorithm that uses a training set of known network alignments to learn parameters for our scoring function and thereby adapt it to any set of networks; and (4) an algorithm that uses our scoring function to find approximate multiple network alignments in linear time. We tested Graemlin 2.0's accuracy on protein interaction networks from IntAct, DIP, and the Stanford Network Database. We show that, on each of these datasets, Graemlin 2.0 has higher sensitivity and specificity than existing network aligners. Graemlin 2.0 is available under the GNU public license at http://graemlin.stanford.edu .
Journal of computational biology: a journal of computational molecular cell biology 09/2009; 16(8):1001-22. DOI:10.1089/cmb.2009.0099 · 1.74 Impact Factor
ABSTRACT: We have experimentally and computationally defined a set of genes that form a conserved metabolic module in the alpha-proteobacterium Caulobacter crescentus and used this module to illustrate a schema for the propagation of pathway-level annotation across bacterial genera. Applying comprehensive forward and reverse genetic methods and genome-wide transcriptional analysis, we (1) confirmed the presence of genes involved in catabolism of the abundant environmental sugar myo-inositol, (2) defined an operon encoding an ABC-family myo-inositol transmembrane transporter, and (3) identified a novel myo-inositol regulator protein and cis-acting regulatory motif that control expression of genes in this metabolic module. Despite being encoded from non-contiguous loci on the C. crescentus chromosome, these myo-inositol catabolic enzymes and transporter proteins form a tightly linked functional group in a computationally inferred network of protein associations. Primary sequence comparison was not sufficient to confidently extend annotation of all components of this novel metabolic module to related bacterial genera. Consequently, we implemented the Graemlin multiple-network alignment algorithm to generate cross-species predictions of genes involved in myo-inositol transport and catabolism in other alpha-proteobacteria. Although the chromosomal organization of genes in this functional module varied between species, the upstream regions of genes in this aligned network were enriched for the same palindromic cis-regulatory motif identified experimentally in C. crescentus. Transposon disruption of the operon encoding the computationally predicted ABC myo-inositol transporter of Sinorhizobium meliloti abolished growth on myo-inositol as the sole carbon source, confirming our cross-genera functional prediction.
Thus, we have defined regulatory, transport, and catabolic genes and a cis-acting regulatory sequence that form a conserved module required for myo-inositol metabolism in select alpha-proteobacteria. Moreover, this study describes a forward validation of gene-network alignment, and illustrates a strategy for reliably transferring pathway-level annotation across bacterial species.
ABSTRACT: The collection of multiple genome-scale datasets is now routine, and the frontier of research in systems biology has shifted accordingly. Rather than clustering a single dataset to produce a static map of functional modules, the focus today is on data integration, network alignment, interactive visualization and ontological markup. Because of the intrinsic noisiness of high-throughput measurements, statistical methods have been central to this effort. In this review, we briefly survey available datasets in functional genomics, review methods for data integration and network alignment, and describe recent work on using network models to guide experimental validation. We explain how the integration and validation steps spring from a Bayesian description of network uncertainty, and conclude by describing an important near-term milestone for systems biology: the construction of a set of rich reference networks for key model organisms.
Briefings in Bioinformatics 10/2007; 8(5):318-32. DOI:10.1093/bib/bbm038 · 9.62 Impact Factor
ABSTRACT: The recent proliferation of protein interaction networks has motivated research into network alignment: the cross-species comparison of conserved functional modules. Previous studies have laid the foundations for such comparisons and demonstrated their power on a select set of sparse interaction networks. Recently, however, new computational techniques have produced hundreds of predicted interaction networks with interconnection densities that push existing alignment algorithms to their limits. To find conserved functional modules in these new networks, we have developed Graemlin, the first algorithm capable of scalable multiple network alignment. Graemlin's explicit model of functional evolution allows both the generalization of existing alignment scoring schemes and the location of conserved network topologies other than protein complexes and metabolic pathways. To assess Graemlin's performance, we have developed the first quantitative benchmarks for network alignment, which allow comparisons of algorithms in terms of their ability to recapitulate the KEGG database of conserved functional modules. We find that Graemlin achieves substantial scalability gains over previous methods while improving sensitivity.
Genome Research 10/2006; 16(9):1169-81. DOI:10.1101/gr.5235706 · 14.63 Impact Factor
ABSTRACT: Current systems that publish relational data as nested (XML) views are passive in the sense that they can only respond to user-initiated queries over the nested views. In this article, we propose an active system whereby users can place triggers on (unmaterialized) nested views of relational data. In this architecture, we present scalable and efficient techniques for processing triggers over nested views by leveraging existing support for SQL triggers over flat relations in commercial relational databases. We have implemented our proposed techniques in the context of the Quark XML middleware system. Our performance results indicate that our proposed techniques are a feasible approach to supporting triggers over nested views of relational data.
ACM Transactions on Database Systems 09/2006; 31(3):921-967. DOI:10.1145/1166074.1166080 · 0.68 Impact Factor
ABSTRACT: We have combined four different types of functional genomic data to create high coverage protein interaction networks for 11 microbes. Our integration algorithm naturally handles statistically dependent predictors and automatically corrects for differing noise levels and data corruption in different evidence sources. We find that many of the predictions in each integrated network hinge on moderate but consistent evidence from multiple sources rather than strong evidence from a single source, yielding novel biology which would be missed if a single data source such as coexpression or coinheritance was used in isolation. In addition to statistical analysis, we demonstrate via case study that these subtle interactions can discover new aspects of even well studied functional modules. Our work represents the largest collection of probabilistic protein interaction networks compiled to date, and our methods can be applied to any sequenced organism and any kind of experimental or computational technique which produces pairwise measures of protein interaction.
Research in Computational Molecular Biology, 10th Annual International Conference, RECOMB 2006, Venice, Italy, April 2-5, 2006, Proceedings; 04/2006
ABSTRACT: XML has emerged as a dominant standard for information exchange on the Internet. However, a large fraction of data continues to be stored in relational databases. At a high level, there are two approaches to supporting triggers over XML views. The first is to materialize the entire view and store it in an XML database with support for XML triggers. However, this approach suffers from the overhead of replicating and incrementally maintaining the materialized XML on every relational update affecting the view, even though users may only be interested in relatively rare events. In this paper, we propose the alternative approach of translating XML triggers into SQL triggers. There are some challenges involved in this approach, however, because triggers can be specified over complex XML views with nested predicates, while SQL triggers can only be specified over flat tables. Consequently, even identifying the parts of an XML view that could have changed due to a (possibly deeply nested) SQL update is a non-trivial task, as is the problem of computing the old and new values of an updated fragment of the view. We address the above challenges and propose a system architecture and an algorithm for supporting triggers over XML views of relational data. We implement and evaluate our system; the performance results indicate our techniques are a feasible approach to supporting triggers over XML views of relational data.
Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, 5-8 April 2005, Tokyo, Japan; 01/2005
ABSTRACT: Usage data at a high-traffic web site can expose information about external events and surges in popularity that may not be accessible solely from analyses of content and link structure. We consider sites that are organized around a set of items available for purchase or download (for example, an e-commerce site or a collection of online research papers), and we study a simple indicator of collective user interest in an item, the batting average, defined as the fraction of visits to an item's description that result in an acquisition of that item. We develop a stochastic model for identifying points in time at which an item's batting average experiences significant change. In experiments with usage data from the Internet Archive, we find that such changes often occur in an abrupt, discrete fashion, and that these changes can be closely aligned with events such as the highlighting of an item on the site or the appearance of a link from an active external referrer. In this way, analyzing the dynamics of item popularity at an active web site can help characterize the impact of a range of events taking place both on and off the site.
Proceedings of the National Academy of Sciences 05/2004; 101 Suppl 1:5254-60. DOI:10.1073/pnas.0307539100 · 9.67 Impact Factor