ABSTRACT: Soybean (Glycine max) seeds are an important source of seed storage compounds, including protein, oil, and sugar used for food, feed, chemical, and biofuel production. We assessed detailed temporal transcriptional and metabolic changes in developing soybean embryos to gain a systems biology view of developmental and metabolic changes and to identify potential targets for metabolic engineering. Two major developmental and metabolic transitions were captured, enabling identification of potential metabolic engineering targets specific to seed filling and to desiccation. The first transition involved a switch between different types of metabolism in dividing and elongating cells. The second transition involved the onset of maturation and desiccation tolerance during seed filling and a switch from photoheterotrophic to heterotrophic metabolism. Clustering analyses of metabolite and transcript data revealed clusters of functionally related metabolites and transcripts active in these different developmental and metabolic programs. The gene clusters provide a resource to generate predictions about the associations and interactions of unknown regulators with their targets based on "guilt-by-association" relationships. The inferred regulators also represent potential targets for future metabolic engineering of relevant pathways and steps in central carbon and nitrogen metabolism in soybean embryos and drought and desiccation tolerance in plants.
ABSTRACT: BACKGROUND: Cold acclimation in woody perennials is a metabolically intensive process, but coincides with environmental conditions that are not conducive to the generation of energy through photosynthesis. While the negative effects of low temperatures on the photosynthetic apparatus during winter have been well studied, less is known about how this is reflected at the level of gene and metabolite expression, or about how the plant generates primary metabolites needed for adaptive processes during autumn. RESULTS: The MapMan tool revealed enrichment of the expression of genes related to mitochondrial function, antioxidant and associated regulatory activity, while changes in metabolite levels over the time course were consistent with the gene expression patterns observed. Genes related to thylakoid function were down-regulated as expected, with the exception of plastid targeted specific antioxidant gene products such as thylakoid-bound ascorbate peroxidase, components of the reactive oxygen species scavenging cycle, and the plastid terminal oxidase. In contrast, the conventional and alternative mitochondrial electron transport chains, the tricarboxylic acid cycle, and redox-associated proteins that scavenge the reactive oxygen species generated by electron transport chains functioning at low temperatures were all active. CONCLUSIONS: A regulatory mechanism linking thylakoid-bound ascorbate peroxidase action with "chloroplast dormancy" is proposed. Most importantly, the energy and substrates required for the substantial metabolic remodeling that is a hallmark of freezing acclimation could be provided by heterotrophic metabolism.
ABSTRACT: Developing soybean seeds accumulate oils, proteins, and carbohydrates that are used as oxidizable substrates providing metabolic precursors and energy during seed germination. The accumulation of these storage compounds in developing seeds is highly regulated at multiple levels, including the transcriptional and post-transcriptional levels. RNA sequencing was used to provide comprehensive information about transcriptional and post-transcriptional events that take place in developing soybean embryos. Bioinformatics analyses led to the identification of different classes of alternatively spliced isoforms and corresponding changes in their levels on a global scale during soybean embryo development. Alternative splicing was associated with transcripts involved in various metabolic and developmental processes, including central carbon and nitrogen metabolism, induction of maturation and dormancy, and splicing itself. Detailed examination of selected RNA isoforms revealed alterations in individual domains that could result in changes in subcellular localization of the resulting proteins, protein-protein and enzyme-substrate interactions, and regulation of protein activities. Different isoforms may play an important role in regulating developmental and metabolic processes occurring at different stages in developing oilseed embryos.
ABSTRACT: Sorting permutations by operations such as reversals and block-moves has received much interest because of its applications in the study of genome rearrangements and in the design of interconnection networks. A short block-move is an operation on a permutation that moves an element at most two positions away from its original position. This paper investigates the problem of finding a minimum-length sorting sequence of short block-moves for a given permutation. A 4/3-approximation algorithm for this problem is presented. Woven double-strip permutations are defined and a polynomial-time algorithm for this class of permutations is devised that employs graph matching techniques. A linear-time maximum matching algorithm for a special class of grid graphs improves the time complexity of the algorithm for woven double-strip permutations.
Key words. Computational biology, Genome rearrangement, Approximation algorithms, Maximum matching, Permutations.
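To make the definition of a short block-move concrete, the following is a minimal Python sketch. It is not the 4/3-approximation algorithm from the paper: it only applies a single move and, for very small permutations, finds the minimum number of moves by exhaustive breadth-first search. The function names `short_block_move` and `min_short_block_moves` are hypothetical, chosen for illustration.

```python
from collections import deque


def short_block_move(perm, i, j):
    """Return a copy of perm with the element at index i moved to index j.

    A short block-move shifts one element by one or two positions,
    so the move is rejected unless 1 <= |i - j| <= 2.
    """
    n = len(perm)
    if not (0 <= i < n and 0 <= j < n):
        raise IndexError("index out of range")
    if i == j or abs(i - j) > 2:
        raise ValueError("a short block-move shifts an element by 1 or 2 positions")
    result = list(perm)
    element = result.pop(i)
    result.insert(j, element)
    return result


def min_short_block_moves(perm):
    """Brute-force BFS over permutations; exponential, for tiny inputs only."""
    start = tuple(perm)
    target = tuple(sorted(perm))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        n = len(current)
        for i in range(n):
            # Only destinations within two positions of i are allowed.
            for j in range(max(0, i - 2), min(n, i + 3)):
                if i == j:
                    continue
                nxt = tuple(short_block_move(current, i, j))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # not reached: adjacent swaps alone can sort any permutation


if __name__ == "__main__":
    print(short_block_move([3, 1, 2], 0, 2))
    print(min_short_block_moves([3, 2, 1]))
```

The exhaustive search is only a reference for checking small cases; its state space grows factorially with the permutation length.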
ABSTRACT: Massive amounts of transcriptomic data documenting plant responses to changes in environment continue to accumulate in online databases. Unfortunately, many of these data sets have not been analyzed in full detail, especially those that involve time course experiments. To gain more knowledge of the successive gene expression events that occur when stress is initiated in one organ and then relayed to another, we have chosen stress response data for Arabidopsis shoots and roots from the detailed time course study of Killian et al. as a promising source to mine. Using refined statistical analysis, modified vector analysis, and a GO enrichment algorithm, more information was revealed concerning the effects of salt and UVB on gene expression events in shoots and roots over a 24-h time period. GeneMania, with in-house modifications, was used to further analyze abscisic acid (ABA) and jasmonic acid-related (JA) gene expression events in salt-stressed roots and shoots. JA effects appeared to be quite distinct in roots when compared to shoots, especially with respect to the expression of members of the negative regulatory JAZ gene family. In contrast, ABA-related gene expression events were more similar in the two organs. Instances of crosstalk between hormones were observed, as were early responses of regulatory genes involved in both auxin and cytokinin signaling. In the case of each hormone class examined, hormone biosynthesis genes were coexpressed with the genes encoding negative regulators of the corresponding signaling pathway. Hypotheses to explain this finding and future experiments to further explore these nonlinear phenomena are proposed.
Omics: a journal of integrative biology 03/2012; 16(4):208-28. · 2.29 Impact Factor
ABSTRACT: Microarray gene expression profiling is a powerful technique to understand complex developmental processes, but making biologically meaningful inferences from such studies has always been challenging. We previously reported a microarray study of the freezing acclimation period in Sitka spruce (Picea sitchensis) in which a large number of candidate genes for climatic adaptation were identified. In the current paper, we apply additional systems biology tools to these data to further probe changes in the levels of genes and metabolites and activities of associated pathways that regulate this complex developmental transition. One aspect of this adaptive process that is not well understood is the role of the cell wall. Our data suggest coordinated metabolic and signaling responses leading to cell wall remodeling. Co-expression of genes encoding proteins associated with biosynthesis of structural and non-structural cell wall carbohydrates was observed, which may be regulated by ethylene signaling components. At the same time, numerous genes, whose products are putatively localized to the endomembrane system and involved in both the synthesis and trafficking of cell wall carbohydrates, were up-regulated. Taken together, these results suggest a link between ethylene signaling and biosynthesis, and targeting of cell wall related gene products during the period of winter hardening. Automated Layout Pipeline for Inferred NEtworks (ALPINE), an in-house plugin for the Cytoscape visualization environment that utilizes the existing GeneMANIA and Mosaic plugins, together with the use of visualization tools, provided images of proposed signaling processes that became active over the time course of winter hardening, particularly at later time points in the process. The resulting visualizations have the potential to reveal novel, hypothesis-generating, gene association patterns in the context of targeted subcellular location.
Frontiers in Plant Science 01/2012; 3:241. · 3.64 Impact Factor
ABSTRACT: Ancestral genome reconstruction can be understood as a phylogenetic study with more details than a traditional phylogenetic tree reconstruction. We present a new computational system called REGEN for ancestral bacterial genome reconstruction at both the gene and replicon levels. REGEN reconstructs gene content, contiguous gene runs, and replicon structure for each ancestral genome. Along each branch of the phylogenetic tree, REGEN infers evolutionary events, including gene creation and deletion and replicon fission and fusion. The reconstruction can be performed by either a maximum parsimony or a maximum likelihood method. Gene content reconstruction is based on the concept of neighboring gene pairs. REGEN was designed to be used with any set of genomes that are sufficiently related, which will usually be the case for bacteria within the same taxonomic order. We evaluated REGEN using simulated genomes and genomes in the Rhizobiales order.
ABSTRACT: Inversions are one of the most frequent large-scale rearrangements observed in actual genomes. While a large body of literature exists on mathematical problems related to the computation of the inversion distance between abstract genomes, these works generally do not take into account that most inversions in bacterial chromosomes are symmetric or roughly symmetric with respect to the origin of replication. We define a new problem: how to sort genomes (or permutations) using almost-symmetric inversions. We show an algorithm that can sort any permutation using only almost-symmetric inversions. Two variants of this algorithm are presented that have better performance in practice. We explore the question of determining the minimum number of almost-symmetric inversions needed to sort a genome by presenting lower and upper bounds and results for special permutation families. The results obtained are the first steps in exploring this interesting new problem.
ABSTRACT: ClaMS - "Classifier for Metagenomic Sequences" - is a Java application for binning assembled contigs in metagenomes using user-specified training sets and initial parameters. Since ClaMS trains on sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; ClaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. ClaMS is meant to be a desktop application for biologists and can be run on any machine under any Operating System on which the Java Runtime Environment can be installed.
Standards in Genomic Sciences 11/2011; 5(2):248-53. · 3.17 Impact Factor
ABSTRACT: Most real-world networks exhibit a high clustering coefficient, that is, a high probability that two neighbors of a node are also neighbors of each other. We propose two algorithms, Conf and Throw, that take triangle and single edge degree sequences as input and generate a random graph with a target clustering coefficient. We analyze them theoretically for the case of a regular graph. Conf generates a random graph with the input degree sequence and the clustering coefficient anticipated from the input. Experimental results match quite well with the anticipated clustering coefficient except for highly dense graphs, in which case the experimental clustering coefficient is higher than the anticipated value. For Throw, the degree sequence and the clustering coefficient of the generated graph vary from the input. However, it maintains the expected degree distribution, and the clustering coefficient of the generated graph can also be predicted using analytical results. Experiments show that, for Throw, the results match quite well with the analytical results. Typically, only information about degree distribution is available. We also propose an algorithm Deg that takes degree sequence and clustering coefficient as input and generates a graph with the same properties. Experiments show results for Deg that are quite similar to those for Conf.
Physica A: Statistical Mechanics and its Applications 11/2011; 390(23):4577-4587. · 1.72 Impact Factor
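As an illustration of the quantity these generators target, the sketch below computes a clustering coefficient from an adjacency mapping. It uses the common average-of-local-coefficients definition, which is an assumption: the abstract does not say whether the paper uses this form or the global (transitivity) form. It does not implement Conf, Throw, or Deg, and the function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves adjacent.

    adj maps each node to the set of its neighbors (undirected, no self-loops).
    Nodes with fewer than two neighbors get a coefficient of 0.0 by convention.
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked_pairs = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked_pairs / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


if __name__ == "__main__":
    # A triangle (0, 1, 2) with one pendant node (3) attached to node 0.
    graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(local_clustering(graph, 0))
    print(average_clustering(graph))
```

In the example graph, node 0 has three neighbors of which only one pair is connected, so its local coefficient is 1/3, while nodes 1 and 2 each sit in a closed triangle.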
ABSTRACT: We present a graph-based model for representing two aligned genomic sequences. An alignment graph is a mixed graph consisting of two sets of vertices, each representing one of the input sequences, and three sets of edges. These edges allow the model to represent a number of evolutionary events. This model is used to perform sequence alignment at the level of nucleotides. We define a scoring function for alignment graphs. We show that minimizing the score is NP-complete. However, we present a dynamic programming algorithm that solves the minimization problem optimally for a certain class of alignments, called breakable arrangements. Algorithms for analyzing breakable arrangements are presented. We also present a greedy algorithm that is capable of representing reversals. We present a dynamic programming algorithm that optimally aligns two genomic sequences, when one of the input sequences is a breakable arrangement of the other. Comparing what we define as breakable arrangements to alignments generated by other algorithms, it is seen that many already aligned genomes fall into the category of being breakable. Moreover, the greedy algorithm is shown to represent reversals, besides rearrangements, mutations, and other evolutionary events.
Journal of computational biology: a journal of computational molecular cell biology 05/2011; 18(5):705-28. · 1.69 Impact Factor
ABSTRACT: The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. However, how the HMMs are related to one another has not been examined before.
In this work, taking into account the processes used to build the HMMs, we propose a working hypothesis to examine the relationships between HMMs and the families and superfamilies that they represent. Specifically, we perform an all-against-all HMM comparison using the HHsearch program (similar to BLAST) and construct a network where the nodes are HMMs and the edges connect similar HMMs. We hypothesize that the HMMs in a connected component belong to the same family or superfamily more often than expected under a random network connection model. Results show a pattern consistent with this working hypothesis. Moreover, the HMM network possesses features distinctly different from the previously documented biological networks, exemplified by the exceptionally high clustering coefficient and the large number of connected components.
The current finding may provide guidance in devising computational methods to reduce the degree of overlaps between the HMMs representing the same superfamilies, which may in turn enable more efficient large-scale sequence searches against the database of HMMs.
ABSTRACT: Heat shock proteins (HSPs) are induced not only under heat stress conditions but also under other environmental stresses such as water stress. In plants, HSP families are larger than those of other eukaryotes. In order to elucidate a possible connection between HSP expression and photosynthetic acclimation or conditioning, we conducted a water stress experiment in loblolly pine (Pinus taeda L.) seedlings involving progressive treatment consisting of one cycle of mild stress (-1 MPa) followed by two cycles of severe stress (-1.7 MPa). Net photosynthesis was measured at each stress level. Photosynthetic acclimation occurred in the progressive treatment after the first cycle, but not in the severe treatment, suggesting that a cycle of mild stress conditioned the trees to adapt to a more severe stress. Real-time results indicated specific patterns in needles in the expression of HSP70, HSP90 and sHSP genes for each treatment, both at maximum stress and at recovery. We identified a pine homolog to GRP94 (ER resident HSP90) that was induced after rehydration coincident with acclimation. Further analysis of the promoter region of the pine GRP94 showed putative cis-elements associated with water stress and rehydration, corresponding to the expression pattern observed in our experiment.
Plant Physiology and Biochemistry 02/2010; 48(4):256-64. · 2.35 Impact Factor
ABSTRACT: Systems biology has made massive strides in recent years, with capabilities to model complex systems including cell division, stress response, energy metabolism, and signaling pathways. Concomitant with their improved modeling capabilities, however, such biochemical network models have also become notoriously complex for humans to comprehend. We propose network comprehension as a key problem for the KDD community, where the goal is to create explainable representations of complex biological networks. We formulate this problem as one of extracting temporal signatures from multi-variate time series data, where the signatures are composed of ordinal comparisons between time series components. We show how such signatures can be inferred by formulating the data mining problem as one of feature selection in rank-order space. We propose five new feature selection strategies for rank-order space and assess their selective superiorities. Experimental results on budding yeast cell cycle models demonstrate compelling results comparable to human interpretations of the cell cycle.
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010; 01/2010
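The idea of a signature built from ordinal comparisons between time series components can be sketched as follows. This is only an illustration of encoding samples in rank-order space. It does not reproduce any of the paper's five feature selection strategies, and the function names and the component names `x`, `y`, `z` are hypothetical.

```python
from itertools import combinations


def ordinal_features(sample, names):
    """Encode one time point as pairwise comparisons between components.

    sample maps component name -> value; the result maps "a<b" -> bool
    for every unordered pair of names, in the order given by `names`.
    """
    return {f"{a}<{b}": sample[a] < sample[b] for a, b in combinations(names, 2)}


def stable_comparisons(series, names):
    """Return the comparisons whose truth value is identical at every time point.

    series is a non-empty list of samples (dicts) covering one time window.
    The surviving comparisons form a simple rank-order signature of the window.
    """
    encoded = [ordinal_features(sample, names) for sample in series]
    first = encoded[0]
    return {
        key: value
        for key, value in first.items()
        if all(features[key] == value for features in encoded)
    }


if __name__ == "__main__":
    names = ["x", "y", "z"]
    window = [
        {"x": 1.0, "y": 2.0, "z": 3.0},
        {"x": 2.0, "y": 5.0, "z": 4.0},
        {"x": 0.0, "y": 9.0, "z": 1.0},
    ]
    print(stable_comparisons(window, names))
```

Because only the ordering of values matters, a signature of this kind is unchanged by any monotone rescaling applied jointly to the components.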
ABSTRACT: With the advent of the thousand dollar genome, one can anticipate the need to store, communicate, and manipulate many human genomes. Data compression methods have been developed to store and communicate genomes efficiently. Unfortunately, these methods do not support efficient manipulation (e.g., subsequence retrieval) of the compressed genome. We develop a data compression scheme that achieves both efficient storage and efficient sequence manipulation. We demonstrate the practicality of the method on two databases of genomes, one for the human mitochondrion and one for the H3N2 virus. In both cases, we achieve high compression ratios and O(log n) subsequence retrieval times.
ABSTRACT: A multimodal network (MMN) is a novel graph-theoretic formalism designed to capture the structure of biological networks and to represent relationships derived from multiple biological databases. MMNs generalize the standard notions of graphs and hypergraphs, which are the bases of current diagrammatic representations of biological phenomena, and incorporate the concept of mode. Each vertex of an MMN is a biological entity, a biot, while each modal hyperedge is a typed relationship, where the type is given by the mode of the hyperedge. The current paper defines MMNs and concentrates on the structural aspects of MMNs. A companion paper develops MMNs as a representation of the semantics of biological networks and discusses applications of the MMNs in managing complex biological data. The MMN model has been implemented in a database system containing multiple kinds of biological networks.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 07/2009; 6(2):321-32. · 2.25 Impact Factor
ABSTRACT: A multimodal network (MMN) is a novel graph-theoretic formalism designed to capture the structure of biological networks and to represent relationships derived from multiple biological databases. MMNs generalize the standard notions of graphs and hypergraphs, which are the bases of current diagrammatic representations of biological phenomena, and incorporate the concept of mode. Each vertex of an MMN is a biological entity, a biot, while each modal hyperedge is a typed relationship, where the type is given by the mode of the hyperedge. The semantics of each modal hyperedge e is given through denotational semantics, where a valuation function fe defines the relationship among the values of the vertices incident on e. The meaning of an MMN is denoted in terms of the semantics of a hyperedge sequence. A companion paper defines MMNs and concentrates on the structural aspects of MMNs. This paper develops MMN denotational semantics when used as a representation of the semantics of biological networks and discusses applications of MMNs in managing complex biological data.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 04/2009; 6(2):271-80. · 2.25 Impact Factor
ABSTRACT: We have explored correlations between the measured efficiency of the RNAi process and several computed signatures that characterize equilibrium secondary structure of the participating mRNA, siRNA, and their complexes. A previously published data set of 609 experimental points (with efficiency represented as percentage of remaining mRNA) was used for the analysis. While virtually no correlation with the computed structural signatures is observed for individual data points, several clear trends emerge when the "noise" is reduced by averaging over 10 bins of N ~ 60 data points per bin. The strongest of the trends is a positive linear (r² = 0.87) correlation between ln(remaining mRNA) and ΔGms, the combined free energy cost of unraveling the siRNA and creating the break in the mRNA secondary structure at the complementary target strand region. At the same time, the free energy change ΔGtotal of the entire process mRNA + siRNA → (mRNA-siRNA) complex is not correlated with RNAi efficiency, even after the averaging. These general findings appear to be robust to details of the computational protocols, suggesting that, while straightforward analysis based on equilibrium secondary structure thermodynamics may not be directly applicable to the entire RNAi process, it is applicable to at least one of its key stages. The correlation between computed ΔGms and experimentally observed RNAi efficiency can be used to enhance the ability of a machine learning algorithm based on a support vector machine (SVM) to predict effective siRNA sequences for a given target mRNA. Specifically, we observe modest, 3 to 7%, but consistent improvement in the positive predictive value (PPV) when the SVM training set is pre- or post-filtered to half the original size according to a ΔGms threshold.
ABSTRACT: There has been much research on the combinatorial problem of generating the linear extensions of a given poset. This paper focuses on the reverse of that problem, where the input is a set of linear orders, and the goal is to construct a poset or set of posets that generates the input. Such a problem finds applications in computational neuroscience, systems biology, paleontology, and physical plant engineering. In this paper, several algorithms are presented for efficiently finding a single poset that generates the input set of linear orders. The variation of the problem in which a minimum set of posets that cover the input is sought is also explored. It is found that the problem is polynomially solvable for one class of simple posets (kite(2) posets) but NP-complete for a related class (hammock(2,2,2) posets).
Discrete Mathematics Algorithms and Applications 01/2009; 05(04).
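The relationship between a poset and the linear orders it generates can be illustrated with a brute-force Python sketch. It is not one of the efficient algorithms from the paper. It relies on the standard fact that a partial order equals the intersection of its linear extensions, so the only candidate for a single generating poset is the set of relations shared by every input order. All function names are hypothetical, and the enumeration is factorial in the number of elements.

```python
from itertools import permutations


def linear_extensions(elements, relations):
    """All total orders of `elements` consistent with `relations`.

    relations is a set of pairs (a, b) meaning a must come before b.
    """
    extensions = set()
    for order in permutations(elements):
        position = {x: i for i, x in enumerate(order)}
        if all(position[a] < position[b] for a, b in relations):
            extensions.add(order)
    return extensions


def common_relations(linear_orders):
    """Pairs (a, b) such that a precedes b in every input linear order."""
    orders = [tuple(order) for order in linear_orders]
    elements = orders[0]
    return {
        (a, b)
        for a, b in permutations(elements, 2)
        if all(order.index(a) < order.index(b) for order in orders)
    }


def single_generating_poset(linear_orders):
    """Return the relations of a poset generating exactly the input, or None."""
    orders = {tuple(order) for order in linear_orders}
    elements = next(iter(orders))
    relations = common_relations(orders)
    if linear_extensions(elements, relations) == orders:
        return relations
    return None


if __name__ == "__main__":
    print(single_generating_poset([("a", "b", "c"), ("a", "c", "b")]))
    print(single_generating_poset([("a", "b", "c"), ("c", "b", "a")]))
```

In the second call no single poset exists: the two orders share no relation, so the candidate poset is an antichain whose extensions include all six orderings rather than just the two given.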