Lenwood S. Heath

Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States

Publications (124) · 132.2 Total impact

  • Source
    ABSTRACT: Background. Developing a universal standardized microbial typing and nomenclature system that provides phylogenetic and epidemiological information in real time has never been as urgent in public health as today. We previously proposed to use genome similarity as the basis for immediate and precise typing and naming of individual organisms or viruses. Here, we tested the validity of the proposed system applying it to the epidemiology of infectious diseases using Ebola virus disease (EVD) outbreaks as the example. Methods. One hundred twenty-eight publicly available ebolavirus genomes were compared with each other and average nucleotide identity (ANI) was calculated. ANI was then used to assign unique codes, from here on called Life Identification Numbers (LINs), to every viral isolate, whereby each LIN consists of a series of positions reflecting increasing genome similarity. Congruence of LINs with phylogenetic and epidemiological relationships was then determined. Results. Assigned LINs correlate with phylogeny at the species and infraspecies level and can even identify some individual transmission chains during the 2014-2015 EVD epidemic in West Africa. Conclusions. LINs could provide a fast, automated, standardized, and scalable approach to precisely identify and name viral isolates upon genome sequence submission, facilitating unambiguous communication during disease epidemics among clinicians, epidemiologists, and governments.
    03/2015; 2(2). DOI:10.1093/ofid/ofv024
  • Eman Badr, Lenwood S Heath
    ABSTRACT: Splicing regulatory elements (SREs) are short, degenerate sequences on pre-mRNA molecules that enhance or inhibit the splicing process via the binding of splicing factors, proteins that regulate the functioning of the spliceosome. Existing methods for identifying SREs in a genome are either experimental or computational. Here, we propose a formalism based on de Bruijn graphs that combines genomic structure, word count enrichment analysis, and experimental evidence to identify SREs found in exons. In our approach, SREs are not restricted to a fixed length (i.e., k-mers, for a fixed k). As a result, we identify 2001 putative exonic enhancers and 3080 putative exonic silencers for human genes, with lengths varying from 6 to 15 nucleotides. Many of the predicted SREs overlap with experimentally verified binding sites. Our model provides a novel method to predict variable length putative regulatory elements computationally for further experimental investigation.
    Journal of computational biology: a journal of computational molecular cell biology 11/2014; DOI:10.1089/cmb.2014.0183 · 1.67 Impact Factor
  • Source
    ABSTRACT: A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today's speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research.
    PLoS ONE 02/2014; 9(2):e89142. DOI:10.1371/journal.pone.0089142 · 3.53 Impact Factor
  • Source
    ABSTRACT: Developing soybean seeds accumulate oils, proteins, and carbohydrates that are used as oxidizable substrates providing metabolic precursors and energy during seed germination. The accumulation of these storage compounds in developing seeds is highly regulated at multiple levels, including at transcriptional and post-transcriptional regulation. RNA sequencing was used to provide comprehensive information about transcriptional and post-transcriptional events that take place in developing soybean embryos. Bioinformatics analyses lead to the identification of different classes of alternatively spliced isoforms and corresponding changes in their levels on a global scale during soybean embryo development. Alternative splicing was associated with transcripts involved in various metabolic and developmental processes, including central carbon and nitrogen metabolism, induction of maturation and dormancy, and splicing itself. Detailed examination of selected RNA isoforms revealed alterations in individual domains that could result in changes in subcellular localization of the resulting proteins, protein-protein and enzyme-substrate interactions, and regulation of protein activities. Different isoforms may play an important role in regulating developmental and metabolic processes occurring at different stages in developing oilseed embryos.
    Biology 12/2013; 2(4):1311-37. DOI:10.3390/biology2041311
  • Source
    ABSTRACT: Soybean (Glycine max) seeds are an important source of seed storage compounds, including protein, oil, and sugar used for food, feed, chemical, and biofuel production. We assessed detailed temporal transcriptional and metabolic changes in developing soybean embryos to gain a systems biology view of developmental and metabolic changes and to identify potential targets for metabolic engineering. Two major developmental and metabolic transitions were captured enabling identification of potential metabolic engineering targets specific to seed filling and to desiccation. The first transition involved a switch between different types of metabolism in dividing and elongating cells. The second transition involved the onset of maturation and desiccation tolerance during seed filling and a switch from photoheterotrophic to heterotrophic metabolism. Clustering analyses of metabolite and transcript data revealed clusters of functionally related metabolites and transcripts active in these different developmental and metabolic programs. The gene clusters provide a resource to generate predictions about the associations and interactions of unknown regulators with their targets based on "guilt-by-association" relationships. The inferred regulators also represent potential targets for future metabolic engineering of relevant pathways and steps in central carbon and nitrogen metabolism in soybean embryos and drought and desiccation tolerance in plants.
    06/2013; 3(2):347-72. DOI:10.3390/metabo3020347
  • Source
    ABSTRACT: Background. Cold acclimation in woody perennials is a metabolically intensive process, but coincides with environmental conditions that are not conducive to the generation of energy through photosynthesis. While the negative effects of low temperatures on the photosynthetic apparatus during winter have been well studied, less is known about how this is reflected at the level of gene and metabolite expression, nor how the plant generates primary metabolites needed for adaptive processes during autumn. Results. The MapMan tool revealed enrichment of the expression of genes related to mitochondrial function, antioxidant and associated regulatory activity, while changes in metabolite levels over the time course were consistent with the gene expression patterns observed. Genes related to thylakoid function were down-regulated as expected, with the exception of plastid targeted specific antioxidant gene products such as thylakoid-bound ascorbate peroxidase, components of the reactive oxygen species scavenging cycle, and the plastid terminal oxidase. In contrast, the conventional and alternative mitochondrial electron transport chains, the tricarboxylic acid cycle, and redox-associated proteins providing reactive oxygen species scavenging generated by electron transport chains functioning at low temperatures were all active. Conclusions. A regulatory mechanism linking thylakoid-bound ascorbate peroxidase action with “chloroplast dormancy” is proposed. Most importantly, the energy and substrates required for the substantial metabolic remodeling that is a hallmark of freezing acclimation could be provided by heterotrophic metabolism.
    BMC Plant Biology 04/2013; 13(1):72. DOI:10.1186/1471-2229-13-72 · 3.94 Impact Factor
  • Source
    Kuan Yang, Lenwood S Heath, João C Setubal
    ABSTRACT: Ancestral genome reconstruction can be understood as a phylogenetic study with more details than a traditional phylogenetic tree reconstruction. We present a new computational system called REGEN for ancestral bacterial genome reconstruction at both the gene and replicon levels. REGEN reconstructs gene content, contiguous gene runs, and replicon structure for each ancestral genome. Along each branch of the phylogenetic tree, REGEN infers evolutionary events, including gene creation and deletion and replicon fission and fusion. The reconstruction can be performed by either a maximum parsimony or a maximum likelihood method. Gene content reconstruction is based on the concept of neighboring gene pairs. REGEN was designed to be used with any set of genomes that are sufficiently related, which will usually be the case for bacteria within the same taxonomic order. We evaluated REGEN using simulated genomes and genomes in the Rhizobiales order.
    12/2012; 3(3):423-43. DOI:10.3390/genes3030423
  • Source
    ABSTRACT: Microarray gene expression profiling is a powerful technique to understand complex developmental processes, but making biologically meaningful inferences from such studies has always been challenging. We previously reported a microarray study of the freezing acclimation period in Sitka spruce (Picea sitchensis) in which a large number of candidate genes for climatic adaptation were identified. In the current paper, we apply additional systems biology tools to these data to further probe changes in the levels of genes and metabolites and activities of associated pathways that regulate this complex developmental transition. One aspect of this adaptive process that is not well understood is the role of the cell wall. Our data suggest coordinated metabolic and signaling responses leading to cell wall remodeling. Co-expression of genes encoding proteins associated with biosynthesis of structural and non-structural cell wall carbohydrates was observed, which may be regulated by ethylene signaling components. At the same time, numerous genes, whose products are putatively localized to the endomembrane system and involved in both the synthesis and trafficking of cell wall carbohydrates, were up-regulated. Taken together, these results suggest a link between ethylene signaling and biosynthesis, and targeting of cell wall related gene products during the period of winter hardening. Automated Layout Pipeline for Inferred NEtworks (ALPINE), an in-house plugin for the Cytoscape visualization environment that utilizes the existing GeneMANIA and Mosaic plugins, together with the use of visualization tools, provided images of proposed signaling processes that became active over the time course of winter hardening, particularly at later time points in the process. The resulting visualizations have the potential to reveal novel, hypothesis-generating, gene association patterns in the context of targeted subcellular location.
    Frontiers in Plant Science 10/2012; 3:241. DOI:10.3389/fpls.2012.00241 · 3.64 Impact Factor
  • L. S. Heath, J. P. C. Vergara
    ABSTRACT: Sorting permutations by operations such as reversals and block-moves has received much interest because of its applications in the study of genome rearrangements and in the design of interconnection networks. A short block-move is an operation on a permutation that moves an element at most two positions away from its original position. This paper investigates the problem of finding a minimum-length sorting sequence of short block-moves for a given permutation. A 4/3-approximation algorithm for this problem is presented. Woven double-strip permutations are defined and a polynomial-time algorithm for this class of permutations is devised that employs graph matching techniques. A linear-time maximum matching algorithm for a special class of grid graphs improves the time complexity of the algorithm for woven double-strip permutations. Key words. Computational biology, Genome rearrangement, Approximation algorithms, Maximum matching, Permutations.
    Algorithmica 04/2012; 28(3):323-352. DOI:10.1007/s004530010041 · 0.57 Impact Factor
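The short block-move defined in the abstract above can be illustrated with a small sketch. This is not the paper's 4/3-approximation algorithm; it is an exhaustive breadth-first search that is only practical for very short permutations, and the function names `short_block_moves` and `min_short_block_moves` are hypothetical, chosen here for illustration.

```python
from collections import deque


def short_block_moves(perm):
    """Yield every permutation reachable by moving one element
    one or two positions left or right (a short block-move)."""
    n = len(perm)
    for i in range(n):
        for j in (i - 2, i - 1, i + 1, i + 2):
            if 0 <= j < n:
                rest = perm[:i] + perm[i + 1:]           # remove the element at i
                yield rest[:j] + (perm[i],) + rest[j:]   # reinsert it at index j


def min_short_block_moves(perm):
    """Length of a shortest sorting sequence of short block-moves,
    found by brute-force BFS (exponential; an assumption-free baseline,
    not the approximation algorithm from the paper)."""
    start = tuple(perm)
    goal = tuple(sorted(start))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for nxt in short_block_moves(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)


if __name__ == "__main__":
    print(min_short_block_moves([2, 3, 1]))     # one move: shift 1 two places left
    print(min_short_block_moves([4, 3, 2, 1]))
```

Because each short block-move removes at most two inversions, the number of inversions divided by two gives a quick lower bound to compare against the search result.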
  •
    ABSTRACT: Massive amounts of transcriptomic data documenting plant responses to changes in environment continue to accumulate in online databases. Unfortunately, many of these data sets have not been analyzed in full detail, especially those that involve time course experiments. To gain more knowledge of the successive gene expression events that occur when stress is initiated in one organ and then relayed to another, we have chosen stress response data for Arabidopsis shoots and roots from the detailed time course study of Killian et al. as a promising source to mine. Using refined statistical analysis, modified vector analysis, and a GO enrichment algorithm, more information was revealed concerning the effects of salt and UVB on gene expression events in shoots and roots over a 24-h time period. GeneMania, with in-house modifications, was used to further analyze abscisic acid (ABA) and jasmonic acid-related (JA) gene expression events in salt-stressed roots and shoots. JA effects appeared to be quite distinct in roots when compared to shoots, especially with respect to the expression of members of the negative regulatory JAZ gene family. In contrast, ABA-related gene expression events were more similar in the two organs. Instances of crosstalk between hormones were observed, as were early responses of regulatory genes involved in both auxin and cytokinin signaling. In the case of each hormone class examined, hormone biosynthesis genes were coexpressed with the genes encoding negative regulators of the corresponding signaling pathway. Hypotheses to explain this finding and future experiments to further explore these nonlinear phenomena are proposed.
    Omics: a journal of integrative biology 03/2012; 16(4):208-28. DOI:10.1089/omi.2011.0111 · 2.73 Impact Factor
  •
    ABSTRACT: Inversions are one of the most frequent large-scale rearrangements observed in actual genomes. While a large body of literature exists on mathematical problems related to the computation of the inversion distance between abstract genomes, these works generally do not take into account that most inversions in bacterial chromosomes are symmetric or roughly symmetric with respect to the origin of replication. We define a new problem: how to sort genomes (or permutations) using almost-symmetric inversions. We show an algorithm that can sort any permutation using only almost-symmetric inversions. Two variants of this algorithm are presented that have better performance in practice. We explore the question of determining the minimum number of almost-symmetric inversions needed to sort a genome by presenting lower and upper bounds and results for special permutation families. The results obtained are the first steps in exploring this interesting new problem.
  • Source
    ABSTRACT: ClaMS - "Classifier for Metagenomic Sequences" - is a Java application for binning assembled contigs in metagenomes using user-specified training sets and initial parameters. Since ClaMS trains on sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; ClaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. ClaMS is meant to be a desktop application for biologists and can be run on any machine under any Operating System on which the Java Runtime Environment can be installed.
    Standards in Genomic Sciences 11/2011; 5(2):248-53. DOI:10.4056/sigs.2075298 · 3.17 Impact Factor
  • Source
    Lenwood S. Heath, Nidhi Parikh
    ABSTRACT: Most real-world networks exhibit a high clustering coefficient—the probability that two neighbors of a node are also neighbors of each other. We propose two algorithms, Conf and Throw, that take triangle and single edge degree sequences as input and generate a random graph with a target clustering coefficient. We analyze them theoretically for the case of a regular graph. Conf generates a random graph with the input degree sequence and the clustering coefficient anticipated from the input. Experimental results match quite well with the anticipated clustering coefficient except for highly dense graphs, in which case the experimental clustering coefficient is higher than the anticipated value. For Throw, the degree sequence and the clustering coefficient of the generated graph varies from the input. However, it maintains the expected degree distribution, and the clustering coefficient of the generated graph can also be predicted using analytical results. Experiments show that, for Throw, the results match quite well with the analytical results. Typically, only information about degree distribution is available. We also propose an algorithm Deg that takes degree sequence and clustering coefficient as input and generates a graph with the same properties. Experiments show results for Deg that are quite similar to those for Conf.
    Physica A: Statistical Mechanics and its Applications 11/2011; 390(23):4577-4587. DOI:10.1016/j.physa.2011.06.052 · 1.72 Impact Factor
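The clustering coefficient named in the abstract above (the probability that two neighbors of a node are also neighbors of each other) can be computed directly from an adjacency structure. The sketch below is a minimal illustration of that definition only; it does not implement the Conf, Throw, or Deg generators, and the function names and the example graph are assumptions made here.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of neighbors of v that are themselves adjacent.
    `adj` maps each node to the set of its neighbors (undirected graph)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: no neighbor pairs means coefficient 0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(local_clustering(graph, 2))
    print(average_clustering(graph))
```

Averaging local coefficients is one common convention; a transitivity-style global ratio of triangles to connected triples is another, and the two can differ on the same graph.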
  • Source
    Liqing Zhang, Layne T Watson, Lenwood S Heath
    ABSTRACT: The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. However, how the HMMs are related to one another has not been examined before. In this work, taking into account the processes used to build the HMMs, we propose a working hypothesis to examine the relationships between HMMs and the families and superfamilies that they represent. Specifically, we perform an all-against-all HMM comparison using the HHsearch program (similar to BLAST) and construct a network where the nodes are HMMs and the edges connect similar HMMs. We hypothesize that the HMMs in a connected component belong to the same family or superfamily more often than expected under a random network connection model. Results show a pattern consistent with this working hypothesis. Moreover, the HMM network possesses features distinctly different from the previously documented biological networks, exemplified by the exceptionally high clustering coefficient and the large number of connected components. The current finding may provide guidance in devising computational methods to reduce the degree of overlaps between the HMMs representing the same superfamilies, which may in turn enable more efficient large-scale sequence searches against the database of HMMs.
    BMC Bioinformatics 05/2011; 12:191. DOI:10.1186/1471-2105-12-191 · 2.67 Impact Factor
  • Nahla A Belal, Lenwood S Heath
    ABSTRACT: We present a graph-based model for representing two aligned genomic sequences. An alignment graph is a mixed graph consisting of two sets of vertices, each representing one of the input sequences, and three sets of edges. These edges allow the model to represent a number of evolutionary events. This model is used to perform sequence alignment at the level of nucleotides. We define a scoring function for alignment graphs. We show that minimizing the score is NP-complete. However, we present a dynamic programming algorithm that solves the minimization problem optimally for a certain class of alignments, called breakable arrangements. Algorithms for analyzing breakable arrangements are presented. We also present a greedy algorithm that is capable of representing reversals. We present a dynamic programming algorithm that optimally aligns two genomic sequences, when one of the input sequences is a breakable arrangement of the other. Comparing what we define as breakable arrangements to alignments generated by other algorithms, it is seen that many already aligned genomes fall into the category of being breakable. Moreover, the greedy algorithm is shown to represent reversals, besides rearrangements, mutations, and other evolutionary events.
    Journal of computational biology: a journal of computational molecular cell biology 05/2011; 18(5):705-28. DOI:10.1089/cmb.2010.0101 · 1.67 Impact Factor
  • N.A. Belal, L.S. Heath
    ABSTRACT: We present a method for detecting horizontal gene transfer (HGT) using partial orders (posets). The method requires a poset for each species/gene pair, where we have a set of species S, and a set of genes G. Given the posets, the method constructs a phylogenetic tree that is compatible with the set of posets; this is done for each gene. Also, the set of posets can be derived from the tree. The trees constructed for each gene are then compared and tested for contradicting information, where a contradiction suggests HGT.
    Computer Technology and Development (ICCTD), 2010 2nd International Conference on; 12/2010
  • Source
    ABSTRACT: Heat shock proteins (HSPs) are induced not only under heat stress conditions but also under other environmental stresses such as water stress. In plants, HSPs families are larger than those of other eukaryotes. In order to elucidate a possible connection between HSP expression and photosynthetic acclimation or conditioning, we conducted a water stress experiment in loblolly pine (Pinus taeda L.) seedlings involving progressive treatment consisting of one cycle of mild stress (-1 MPa) followed by two cycles of severe stress (-1.7 MPa). Net photosynthesis was measured at each stress level. Photosynthetic acclimation occurred in the progressive treatment after the first cycle, but not in the severe treatment, suggesting that a cycle of mild stress conditioned the trees to adapt to a more severe stress. Real time results indicated specific patterns in needles in the expression of HSP70, HSP90 and sHSP genes for each treatment, both at maximum stress and at recovery. We identified a pine homolog to GRP94 (ER resident HSP90) that was induced after rehydration coincident with acclimation. Further analysis of the promoter region of the pine GRP94 showed putative cis-elements associated with water stress and rehydration, corresponding to the expression pattern observed in our experiment.
    Plant Physiology and Biochemistry 02/2010; 48(4):256-64. DOI:10.1016/j.plaphy.2009.12.005 · 2.35 Impact Factor
  • Source
    ABSTRACT: Systems biology has made massive strides in recent years, with capabilities to model complex systems including cell division, stress response, energy metabolism, and signaling pathways. Concomitant with their improved modeling capabilities, however, such biochemical network models have also become notoriously complex for humans to comprehend. We propose network comprehension as a key problem for the KDD community, where the goal is to create explainable representations of complex biological networks. We formulate this problem as one of extracting temporal signatures from multi-variate time series data, where the signatures are composed of ordinal comparisons between time series components. We show how such signatures can be inferred by formulating the data mining problem as one of feature selection in rank-order space. We propose five new feature selection strategies for rank-order space and assess their selective superiorities. Experimental results on budding yeast cell cycle models demonstrate compelling results comparable to human interpretations of the cell cycle.
    Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010; 01/2010
  • Lenwood S. Heath, Naren Ramakrishnan
  • Source
    Lenwood S Heath, Ao-Ping Hou, Huadong Xia, Liqing Zhang
    ABSTRACT: With the advent of the thousand dollar genome, one can anticipate the need to store, communicate, and manipulate many human genomes. Data compression methods have been developed to store and communicate genomes efficiently. Unfortunately, these methods do not support efficient manipulation (e.g., subsequence retrieval) of the compressed genome. We develop a data compression scheme that achieves both efficient storage and efficient sequence manipulation. We demonstrate the practicality of the method on two databases of genomes, one for the human mitochondrion and one for the H3N2 virus. In both cases, we achieve high compression ratios and O(log n) subsequence retrieval times.

Publication Stats

3k Citations
132.20 Total Impact Points


  • 1988–2013
    • Virginia Polytechnic Institute and State University
      • Department of Plant Pathology, Physiology, and Weed Science
      • Department of Computer Science
      Blacksburg, Virginia, United States
  • 2011
    • Arab Academy for Science, Technology & Maritime Transport
      Al Iskandarīyah, Alexandria, Egypt
  • 2004
    • North Carolina State University
      Raleigh, North Carolina, United States
  • 1992
    • University of North Carolina at Chapel Hill
      North Carolina, United States
    • University of Massachusetts Amherst
      • School of Computer Science
      Amherst Center, Massachusetts, United States
  • 1987
    • Massachusetts Institute of Technology
      • Department of Mathematics
      Cambridge, MA, United States
  • 1984
    • University of North Carolina at Charlotte
      Charlotte, North Carolina, United States