ABSTRACT: BACKGROUND: Cold acclimation in woody perennials is a metabolically intensive process, but coincides with environmental conditions that are not conducive to the generation of energy through photosynthesis. While the negative effects of low temperatures on the photosynthetic apparatus during winter have been well studied, less is known about how this is reflected at the level of gene and metabolite expression, or how the plant generates primary metabolites needed for adaptive processes during autumn. RESULTS: The MapMan tool revealed enrichment of the expression of genes related to mitochondrial function, antioxidant and associated regulatory activity, while changes in metabolite levels over the time course were consistent with the gene expression patterns observed. Genes related to thylakoid function were down-regulated as expected, with the exception of specific plastid-targeted antioxidant gene products such as thylakoid-bound ascorbate peroxidase, components of the reactive oxygen species scavenging cycle, and the plastid terminal oxidase. In contrast, the conventional and alternative mitochondrial electron transport chains, the tricarboxylic acid cycle, and redox-associated proteins that scavenge reactive oxygen species generated by electron transport chains functioning at low temperatures were all active. CONCLUSIONS: A regulatory mechanism linking thylakoid-bound ascorbate peroxidase action with "chloroplast dormancy" is proposed. Most importantly, the energy and substrates required for the substantial metabolic remodeling that is a hallmark of freezing acclimation could be provided by heterotrophic metabolism.
ABSTRACT: Sorting permutations by operations such as reversals and block-moves has received much interest because of its applications in the study of genome rearrangements and in the design of interconnection networks. A short block-move is an operation on a permutation that moves an element at most two positions away from its original position. This paper investigates the problem of finding a minimum-length sorting sequence of short block-moves for a given permutation. A 4/3-approximation algorithm for this problem is presented. Woven double-strip permutations are defined and a polynomial-time algorithm for this class of permutations is devised that employs graph matching techniques. A linear-time maximum matching algorithm for a special class of grid graphs improves the time complexity of the algorithm for woven double-strip permutations.
Key words. Computational biology, Genome rearrangement, Approximation algorithms, Maximum matching, Permutations.
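To make the definition of a short block-move concrete, here is a minimal Python sketch. It is not the 4/3-approximation algorithm from the paper; it is an exhaustive breadth-first search, usable only on very small permutations, that follows the abstract's definition (one element moves at most two positions). The function names are hypothetical.

```python
from collections import deque


def short_block_moves(perm):
    """Yield each permutation obtained by moving one element 1 or 2 positions."""
    n = len(perm)
    seen = set()
    for i in range(n):
        for d in (-2, -1, 1, 2):
            j = i + d
            if 0 <= j < n:
                p = list(perm)
                p.insert(j, p.pop(i))  # element at index i ends up at index j
                t = tuple(p)
                if t not in seen:
                    seen.add(t)
                    yield t


def min_short_block_moves(perm):
    """Brute-force BFS for the minimum number of short block-moves to sort perm.

    Exponential in len(perm); intended only to illustrate the problem statement.
    """
    start = tuple(perm)
    target = tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for nxt in short_block_moves(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None
```

For example, `min_short_block_moves([3, 1, 2])` needs a single move (3 hops two positions to the right), while `[3, 2, 1]` needs two, because a short block-move removes at most two inversions.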
ABSTRACT: Massive amounts of transcriptomic data documenting plant responses to changes in environment continue to accumulate in online databases. Unfortunately, many of these data sets have not been analyzed in full detail, especially those that involve time course experiments. To gain more knowledge of the successive gene expression events that occur when stress is initiated in one organ and then relayed to another, we have chosen stress response data for Arabidopsis shoots and roots from the detailed time course study of Killian et al. as a promising source to mine. Using refined statistical analysis, modified vector analysis, and a GO enrichment algorithm, more information was revealed concerning the effects of salt and UVB on gene expression events in shoots and roots over a 24-h time period. GeneMania, with in-house modifications, was used to further analyze abscisic acid (ABA) and jasmonic acid-related (JA) gene expression events in salt-stressed roots and shoots. JA effects appeared to be quite distinct in roots when compared to shoots, especially with respect to the expression of members of the negative regulatory JAZ gene family. In contrast, ABA-related gene expression events were more similar in the two organs. Instances of crosstalk between hormones were observed, as were early responses of regulatory genes involved in both auxin and cytokinin signaling. In the case of each hormone class examined, hormone biosynthesis genes were coexpressed with the genes encoding negative regulators of the corresponding signaling pathway. Hypotheses to explain this finding and future experiments to further explore these nonlinear phenomena are proposed.
Omics: a journal of integrative biology 03/2012; 16(4):208-28. · 2.29 Impact Factor
ABSTRACT: Microarray gene expression profiling is a powerful technique to understand complex developmental processes, but making biologically meaningful inferences from such studies has always been challenging. We previously reported a microarray study of the freezing acclimation period in Sitka spruce (Picea sitchensis) in which a large number of candidate genes for climatic adaptation were identified. In the current paper, we apply additional systems biology tools to these data to further probe changes in the levels of genes and metabolites and activities of associated pathways that regulate this complex developmental transition. One aspect of this adaptive process that is not well understood is the role of the cell wall. Our data suggest coordinated metabolic and signaling responses leading to cell wall remodeling. Co-expression of genes encoding proteins associated with biosynthesis of structural and non-structural cell wall carbohydrates was observed, which may be regulated by ethylene signaling components. At the same time, numerous genes, whose products are putatively localized to the endomembrane system and involved in both the synthesis and trafficking of cell wall carbohydrates, were up-regulated. Taken together, these results suggest a link between ethylene signaling and biosynthesis, and targeting of cell wall related gene products during the period of winter hardening. Automated Layout Pipeline for Inferred NEtworks (ALPINE), an in-house plugin for the Cytoscape visualization environment that utilizes the existing GeneMANIA and Mosaic plugins, together with the use of visualization tools, provided images of proposed signaling processes that became active over the time course of winter hardening, particularly at later time points in the process. The resulting visualizations have the potential to reveal novel, hypothesis-generating, gene association patterns in the context of targeted subcellular location.
ABSTRACT: ClaMS - "Classifier for Metagenomic Sequences" - is a Java application for binning assembled contigs in metagenomes using user-specified training sets and initial parameters. Since ClaMS trains on sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; ClaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. ClaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment can be installed.
Standards in Genomic Sciences 11/2011; 5(2):248-53. · 2.01 Impact Factor
ABSTRACT: The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. However, how the HMMs are related to one another has not been examined before.
In this work, taking into account the processes used to build the HMMs, we propose a working hypothesis to examine the relationships between HMMs and the families and superfamilies that they represent. Specifically, we perform an all-against-all HMM comparison using the HHsearch program (similar to BLAST) and construct a network where the nodes are HMMs and the edges connect similar HMMs. We hypothesize that the HMMs in a connected component belong to the same family or superfamily more often than expected under a random network connection model. Results show a pattern consistent with this working hypothesis. Moreover, the HMM network possesses features distinctly different from the previously documented biological networks, exemplified by the exceptionally high clustering coefficient and the large number of connected components.
The current finding may provide guidance in devising computational methods to reduce the degree of overlaps between the HMMs representing the same superfamilies, which may in turn enable more efficient large-scale sequence searches against the database of HMMs.
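The network construction described above (nodes are HMMs, edges connect similar HMMs, connected components are then inspected) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge list is taken as given, for instance from thresholded all-against-all comparison scores, and the HMM identifiers and the function name are hypothetical. No comparison program is invoked here.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


# Hypothetical HMM identifiers and similarity edges, for illustration only.
hmms = ["hmm1", "hmm2", "hmm3", "hmm4"]
similar = [("hmm1", "hmm2"), ("hmm2", "hmm3")]
components = connected_components(hmms, similar)
```

Each resulting component could then be checked against the family or superfamily labels of its member HMMs, which is the comparison the working hypothesis calls for.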
ABSTRACT: Heat shock proteins (HSPs) are induced not only under heat stress conditions but also under other environmental stresses such as water stress. In plants, HSP families are larger than those of other eukaryotes. In order to elucidate a possible connection between HSP expression and photosynthetic acclimation or conditioning, we conducted a water stress experiment in loblolly pine (Pinus taeda L.) seedlings involving progressive treatment consisting of one cycle of mild stress (-1 MPa) followed by two cycles of severe stress (-1.7 MPa). Net photosynthesis was measured at each stress level. Photosynthetic acclimation occurred in the progressive treatment after the first cycle, but not in the severe treatment, suggesting that a cycle of mild stress conditioned the trees to adapt to a more severe stress. Real time results indicated specific patterns in needles in the expression of HSP70, HSP90 and sHSP genes for each treatment, both at maximum stress and at recovery. We identified a pine homolog to GRP94 (ER resident HSP90) that was induced after rehydration coincident with acclimation. Further analysis of the promoter region of the pine GRP94 showed putative cis-elements associated with water stress and rehydration, corresponding to the expression pattern observed in our experiment.
Plant Physiology and Biochemistry 02/2010; 48(4):256-64. · 2.78 Impact Factor
ABSTRACT: Systems biology has made massive strides in recent years, with capabilities to model complex systems including cell division, stress response, energy metabolism, and signaling pathways. Concomitant with their improved modeling capabilities, however, such biochemical network models have also become notoriously complex for humans to comprehend. We propose network comprehension as a key problem for the KDD community, where the goal is to create explainable representations of complex biological networks. We formulate this problem as one of extracting temporal signatures from multi-variate time series data, where the signatures are composed of ordinal comparisons between time series components. We show how such signatures can be inferred by formulating the data mining problem as one of feature selection in rank-order space. We propose five new feature selection strategies for rank-order space and assess their selective superiorities. Experimental results on budding yeast cell cycle models demonstrate compelling results comparable to human interpretations of the cell cycle.
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010; 01/2010
ABSTRACT: With the advent of the thousand dollar genome, one can anticipate the need to store, communicate, and manipulate many human genomes. Data compression methods have been developed to store and communicate genomes efficiently. Unfortunately, these methods do not support efficient manipulation (e.g., subsequence retrieval) of the compressed genome. We develop a data compression scheme that achieves both efficient storage and efficient sequence manipulation. We demonstrate the practicality of the method on two databases of genomes, one for the human mitochondrion and one for the H3N2 virus. In both cases, we achieve high compression ratios and O(log n) subsequence retrieval times.
ABSTRACT: A multimodal network (MMN) is a novel graph-theoretic formalism designed to capture the structure of biological networks and to represent relationships derived from multiple biological databases. MMNs generalize the standard notions of graphs and hypergraphs, which are the bases of current diagrammatic representations of biological phenomena, and incorporate the concept of mode. Each vertex of an MMN is a biological entity, a biot, while each modal hyperedge is a typed relationship, where the type is given by the mode of the hyperedge. The current paper defines MMNs and concentrates on the structural aspects of MMNs. A companion paper develops MMNs as a representation of the semantics of biological networks and discusses applications of the MMNs in managing complex biological data. The MMN model has been implemented in a database system containing multiple kinds of biological networks.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 07/2009; 6(2):321-32. · 2.25 Impact Factor
ABSTRACT: A multimodal network (MMN) is a novel graph-theoretic formalism designed to capture the structure of biological networks and to represent relationships derived from multiple biological databases. MMNs generalize the standard notions of graphs and hypergraphs, which are the bases of current diagrammatic representations of biological phenomena, and incorporate the concept of mode. Each vertex of an MMN is a biological entity, a biot, while each modal hyperedge is a typed relationship, where the type is given by the mode of the hyperedge. The semantics of each modal hyperedge e is given through denotational semantics, where a valuation function fe defines the relationship among the values of the vertices incident on e. The meaning of an MMN is denoted in terms of the semantics of a hyperedge sequence. A companion paper defines MMNs and concentrates on the structural aspects of MMNs. This paper develops MMN denotational semantics when used as a representation of the semantics of biological networks and discusses applications of MMNs in managing complex biological data.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 01/2009; 6(2):271-80. · 2.25 Impact Factor
ABSTRACT: There has been much research on the combinatorial problem of generating the linear extensions of a given poset. This paper focuses on the reverse of that problem, where the input is a set of linear orders, and the goal is to construct a poset or set of posets that generates the input. Such a problem finds applications in computational neuroscience, systems biology, paleontology, and physical plant engineering. In this paper, several algorithms are presented for efficiently finding a single poset that generates the input set of linear orders. The variation of the problem in which a minimum set of posets that cover the input is sought is also explored. It is found that the problem is polynomially solvable for one class of simple posets (kite(2) posets) but NP-complete for a related class (hammock(2,2,2) posets).
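The single-poset version of the reverse problem can be illustrated with a brute-force Python sketch. It is not one of the efficient algorithms the abstract refers to. It relies on the standard fact that a poset is the intersection of its linear extensions, so the only candidate is the set of pairs ordered the same way in every input order; the sketch then enumerates that candidate's linear extensions and compares them with the input. Function names are hypothetical, and the enumeration is factorial in the number of elements.

```python
from itertools import permutations


def common_relations(orders):
    """Pairs (a, b) such that a precedes b in every input linear order."""
    orders = [tuple(o) for o in orders]
    positions = [{x: i for i, x in enumerate(o)} for o in orders]
    elems = orders[0]
    return {(a, b) for a in elems for b in elems
            if a != b and all(p[a] < p[b] for p in positions)}


def linear_extensions(elems, relations):
    """All orderings of elems that respect every (a, b) pair in relations."""
    result = set()
    for perm in permutations(elems):
        idx = {x: i for i, x in enumerate(perm)}
        if all(idx[a] < idx[b] for a, b in relations):
            result.add(perm)
    return result


def single_poset_generates(orders):
    """Return (True, relations) if one poset generates exactly the input set."""
    order_set = {tuple(o) for o in orders}
    relations = common_relations(list(order_set))
    elems = list(next(iter(order_set)))
    return linear_extensions(elems, relations) == order_set, relations
```

For instance, the orders `abc` and `acb` are generated by the poset in which `a` precedes both `b` and `c`, whereas `abc` and `cba` share no relation, so the only candidate poset is the antichain, which has six linear extensions rather than two.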
ABSTRACT: We have explored correlations between the measured efficiency of the RNAi process and several computed signatures that characterize equilibrium secondary structure of the participating mRNA, siRNA, and their complexes. A previously published data set of 609 experimental points (with efficiency represented as percentage of remaining mRNA) was used for the analysis. While virtually no correlation with the computed structural signatures is observed for individual data points, several clear trends emerge when the "noise" is reduced by averaging over 10 bins of N ~ 60 data points per bin. The strongest of the trends is a positive linear (r² = 0.87) correlation between ln(remaining mRNA) and ΔG_ms, the combined free energy cost of unraveling the siRNA and creating the break in the mRNA secondary structure at the complementary target strand region. At the same time, the free energy change ΔG_total of the entire process mRNA + siRNA → (mRNA-siRNA) complex is not correlated with RNAi efficiency, even after the averaging. These general findings appear to be robust to details of the computational protocols, suggesting that, while straightforward analysis based on equilibrium secondary structure thermodynamics may not be directly applicable to the entire RNAi process, it is applicable to at least one of its key stages. The correlation between computed ΔG_ms and experimentally observed RNAi efficiency can be used to enhance the ability of a machine learning algorithm based on a support vector machine (SVM) to predict effective siRNA sequences for a given target mRNA. Specifically, we observe modest, 3 to 7%, but consistent improvement in the positive predictive value (PPV) when the SVM training set is pre- or post-filtered to half the original size according to a ΔG_ms threshold.
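The bin-averaging step described above can be sketched as follows. This is a generic illustration, not the authors' protocol: it assumes points are sorted by the computed signature and split into nearly equal-sized bins, and the function names are hypothetical. The small data set in the usage note is synthetic and only demonstrates that averaging within bins can raise r² when point-level scatter is large.

```python
def bin_means(xs, ys, n_bins):
    """Sort points by x, split into n_bins nearly equal chunks, average each."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    mean_x, mean_y = [], []
    for k in range(n_bins):
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        if chunk:  # skip empty chunks when n_bins exceeds the number of points
            mean_x.append(sum(x for x, _ in chunk) / len(chunk))
            mean_y.append(sum(y for _, y in chunk) / len(chunk))
    return mean_x, mean_y


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Synthetic toy data, for illustration only.
x = [0, 1, 2, 3, 4, 5]
y = [0, 2, 1, 3, 5, 4]
bx, by = bin_means(x, y, 3)
raw_r2 = r_squared(x, y)
binned_r2 = r_squared(bx, by)
```

On this toy data the binned r² is higher than the raw r², which mirrors the qualitative effect the abstract reports, although the real analysis used 609 points and 10 bins.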
ABSTRACT: Gene conversion, a non-reciprocal transfer of genetic information from one sequence to another, is a biological process whose importance in affecting both short-term and long-term evolution cannot be overemphasized. Knowing where gene conversion has occurred gives us important insights into gene duplication and evolution in general. In this paper we present an ensemble-based learning method for predicting gene conversions using two different models of reticulate evolution. Since detecting gene conversion is a rare-class problem, we implement cost-sensitive learning in the form of a generated cost matrix that is used to modify various underlying classifiers. Results show that our method combines the predictive power of different models and is able to predict gene conversion more accurately than either of the two studied models. Our work provides a useful framework for future improvement of gene conversion predictions through multiple models of gene conversion.
ABSTRACT: A drought screen identified accessions of Solanum tuberosum ssp. andigena that showed varying degrees of physiological acclimation or adaptation to repeated drought stress. The accessions also showed variable tuber phenotypes from small tubers that failed to develop in an accession that showed photosynthetic adaptation to normal tubers in an accession with a phenotype showing some degree of photosynthetic adaptation and acclimation. Using microarray data, we correlated the expression of genes associated with carbon metabolism with the tuber development phenotypes under drought. Genes associated with sucrose and starch metabolism showed responses consistent with starch deficiency in the adapted accession and normal starch deposition in the intermediate accession. Starch phosphorylase and glycogen bound starch synthase were induced in the adapted accession, which had abnormal tuber development. Genes associated with trehalose were induced in the intermediate accession with normal tuber development. Genes associated with respiration were also induced in the intermediate accession, and a pattern compatible with the existence of a 3PGA recovery pathway was revealed. Expression of thioredoxin genes also correlated with tuber development phenotypes under drought stress. The data suggest differential regulation of starch deposition in accessions of Andigena with different abilities to respond to drought stress.
Plant Physiology and Biochemistry 02/2008; 46(1):34-45. · 2.78 Impact Factor
ABSTRACT: With the advent of high-throughput gene perturbation screens (e.g., RNAi assays, genome-wide deletion mutants), modeling the complex relationship between genes and phenotypes has become a paramount problem. One broad class of methods uses 'guilt by association' to impute phenotypes to genes based on the interactions between the given gene and other genes with known phenotypes. But these methods are inadequate for genes that have no cataloged interactions but which nevertheless are known to result in important phenotypes. In this paper, we present an approach to first model relationships between phenotypes using the notion of 'relative importance' and subsequently use these derived relationships to make phenotype predictions. Besides improved accuracy on S. cerevisiae deletion mutants and C. elegans knock-down datasets, we show how our approach offers insight into relations between phenotypes.
Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference 02/2008; 7:225-35.
ABSTRACT: CMGSDB (Database for Computational Modeling of Gene Silencing) is an integration of heterogeneous data sources about Caenorhabditis elegans with capabilities for compositional data mining (CDM) across diverse domains. Besides gene, protein and functional annotations, CMGSDB currently unifies information about 531 RNAi phenotypes obtained from heterogeneous databases using a hierarchical scheme. A phenotype browser at the CMGSDB website serves this hierarchy and relates phenotypes to other biological entities. The application of CDM to CMGSDB produces 'chains' of relationships in the data by finding two-way connections between sets of biological entities. Chains can, for example, relate the knock down of a set of genes during an RNAi experiment to the disruption of a pathway or specific gene expression through another set of genes not directly related to the former set. The web interface for CMGSDB is available at https://bioinformatics.cs.vt.edu/cmgs/CMGSDB/, and serves individual biological entity information as well as details of all chains computed by CDM.
Nucleic Acids Research 02/2008; 36(Database issue):D69-76. · 8.28 Impact Factor
ABSTRACT: Responses to prolonged drought and recovery from drought of two South American potato (Solanum tuberosum L. ssp. andigena (Juz & Buk) Hawkes) landraces, Sullu and Ccompis, were compared under field conditions. Physiological and biomass measurements, yield analysis, the results of hybridisation to a potato microarray platform (44 000 probes) and metabolite profiling were used to characterise responses to water deficit. Drought affected shoot and root biomass negatively in Ccompis but not in Sullu, whereas both genotypes maintained tuber yield under water stress. Ccompis showed stronger reduction in maximum quantum yield under stress than Sullu, and less decrease in stomatal resistance. Genes associated with PSII functions were activated during recovery in Sullu only. Evidence for sucrose accumulation in Sullu only during maximum stress and recovery was observed, in addition to increases in cell wall biosynthesis. A depression in the abundance of plastid superoxide dismutase transcripts was observed under maximum stress in Ccompis. Both sucrose and the regulatory molecule trehalose accumulated in the leaves of Sullu only. In contrast, in Ccompis, the raffinose oligosaccharide family pathway was activated, whereas low levels of sucrose and minor stress-mediated changes in trehalose were observed. Proline, and expression of the associated genes, rose in both genotypes under drought, with a 3-fold higher increase in Sullu than in Ccompis. The results demonstrate the presence of distinct molecular and biochemical drought responses in the two potato landraces leading to yield maintenance but differential biomass accumulation in vegetative tissues.
ABSTRACT: Genomes have both deterministic and random aspects, with the underlying DNA sequences exhibiting features at numerous scales, from codons and cis-elements through genes and on to regions of conserved or divergent gene order. The DNA Words program aims to identify mathematical structures that characterize genomes at multiple scales. The focus of this work is the fine structure of genomic sequences, the manner in which short nucleotide sequences fit together to comprise the genome as an abstract sequence, within a graph-theoretic setting. A DNA word graph is a generalization of a de Bruijn graph that records the occurrence counts of nodes and edges in a genomic sequence. A DNA word graph can be derived from a genomic sequence generated by a finite Markov chain or a subsequence of a sequenced genome. Both theoretically and empirically, DNA word graphs give rise to genomic signatures. Several genomic signatures are derived from the structure of a DNA word graph, including an information-rich and visually appealing genomic bar code. Application of genomic signatures to several genomes demonstrates their practical value in identifying and distinguishing