Publications (12) · 81.57 Total impact
-
Article: Genomic organization of eukaryotic tRNAs
ABSTRACT: Background: Surprisingly little is known about the organization and distribution of tRNA genes and tRNA-related sequences on a genome-wide scale. While tRNA gene complements are usually reported in passing as part of genome annotation efforts, and peculiar features such as the tandem arrangements of tRNA genes in Entamoeba histolytica have been described in some detail, systematic comparative studies are rare and mostly restricted to bacteria. We therefore set out to survey the genomic arrangement of tRNA genes and pseudogenes in a wide range of eukaryotes to identify common patterns and taxon-specific peculiarities. Results: In line with previous reports, we find that tRNA complements evolve rapidly and tRNA gene and pseudogene locations are subject to rapid turnover. At phylum level, the distributions of the numbers of tRNA genes and pseudogenes are very broad, with standard deviations on the order of the mean. Even among closely related species we observe dramatic changes in local organization. For instance, 65% and 87% of the tRNA genes and pseudogenes are located in genomic clusters in zebrafish and stickleback, respectively, while such arrangements are relatively rare in the other three sequenced teleost fish genomes. Among basal metazoa, Trichoplax adhaerens has hardly any duplicated tRNA genes, while the sea anemone Nematostella vectensis boasts more than 17000 tRNA genes and pseudogenes. Dramatic variations are observed even within the eutherian mammals. Higher primates, for instance, have 616 ± 120 tRNA genes and pseudogenes of which 17% to 36% are arranged in clusters, while the genome of the bushbaby Otolemur garnetti has 45225 tRNA genes and pseudogenes of which only 5.6% appear in clusters. In contrast, the distribution is surprisingly uniform across plant genomes.
Consistent with this variability, syntenic conservation of tRNA genes and pseudogenes is also poor in general, with turnover rates comparable to those of unconstrained sequence elements. Despite this large variation in abundance in Eukarya we observe a significant correlation between the number of tRNA genes, tRNA pseudogenes, and genome size. Conclusions: The genomic organization of tRNA genes and pseudogenes shows complex lineage-specific patterns characterized by an extensive variability that is in striking contrast to the extreme levels of sequence-conservation of the tRNAs themselves. The comprehensive analysis of the genomic organization of tRNA genes and pseudogenes in Eukarya provides a basis for further studies into the interplay of tRNA gene arrangements and genome organization in general. BMC Genomics. 01/2010; -
Article: Defining genes: a computational framework.
ABSTRACT: The precise elucidation of the gene concept has become the subject of intense discussion in light of results from several large high-throughput surveys of transcriptomes and proteomes. In previous work, we proposed an approach for constructing gene concepts that combines genomic heritability with elements of function. Here, we introduce a definition of the gene within a computational framework of cellular interactions. The definition seeks to satisfy the practical requirements imposed by annotation, capture logical aspects of regulation, and encompass the evolutionary property of homology. Theory in Biosciences 07/2009; 128(3):165-70. · 0.98 Impact Factor -
Article: SynBlast: assisting the analysis of conserved synteny information.
ABSTRACT: MOTIVATION: In recent years more than 20 vertebrate genomes have been sequenced, and the rate at which genomic DNA information becomes available is rapidly accelerating. Gene duplication and gene loss events inherently limit the accuracy of orthology detection based on sequence similarity alone. Fully automated methods for orthology annotation do exist but often fail to identify individual members in cases of large gene families, or to distinguish missing data from traceable gene losses. This situation can be improved in many cases by including conserved synteny information. RESULTS: Here we present the SynBlast pipeline that is designed to construct and evaluate local synteny information. SynBlast uses the genomic region around a focal reference gene to retrieve candidates for homologous regions from a collection of target genomes and ranks them in accord with the available evidence for homology. The pipeline is intended as a tool to aid high-quality manual annotation, in particular in those cases where automatic procedures fail. We demonstrate how SynBlast is applied to retrieving orthologous and paralogous clusters using the vertebrate Hox and ParaHox clusters as examples. SOFTWARE: The SynBlast package written in Perl is available under the GNU General Public License at http://www.bioinf.uni-leipzig.de/Software/SynBlast/. BMC Bioinformatics 09/2008; 9:351. · 2.75 Impact Factor -
Article: Transcriptional regulation of the human CD97 promoter by Sp1/Sp3 in smooth muscle cells.
ABSTRACT: The EGF-TM7 receptor CD97 shows different features of expression and function in muscle cells compared to hematopoietic and tumor cells. Since the molecular function and regulation of CD97 are poorly understood, this study aimed at defining its basal transcriptional regulation in smooth muscle cells (SMCs). The computational analysis of the CD97 5'-flanking region revealed that the TATA box-lacking promoter possesses several GC-rich regions as putative Sp1/Sp3 binding sites. Transfection studies with serially deleted promoter constructs demonstrated that the minimal promoter fragment resided in the -218/+45 region containing one out of five identified GC-boxes in the leiomyosarcoma cell line SK-LMS-1 and human bronchial smooth muscle cells (HbSMCs). Mutation of the most proximal GC-site in CD97 reporter gene constructs caused a significant decrease in promoter activity. Gel shift assays and chromatin immunoprecipitation revealed that Sp1 and Sp3 bound specifically to the most proximal GC-site. Furthermore, we showed that Sp1 and Sp3 over-expression activates CD97 promoter activity in HEK293 cells. Our data characterize for the first time the activity of the human CD97 promoter which is controlled by Sp1/Sp3 transcription factors in SMCs. Gene 05/2008; 413(1-2):67-75. · 2.34 Impact Factor -
Article: "Genes".
ABSTRACT: In order to describe a cell at the molecular level, a notion of a "gene" is neither necessary nor helpful. It is sufficient to consider the molecules (i.e., chromosomes, transcripts, proteins) and their interactions to describe cellular processes. The downside of the resulting high resolution is that it becomes very tedious to address features on the organismal and phenotypic levels with a language based on molecular terms. Looking for the missing link between biological disciplines dealing with different levels of biological organization, we suggest returning to the original intent behind the term "gene". To this end, we propose to investigate whether a useful notion of "gene" can be constructed based on an underlying notion of function, and whether this can serve as the necessary link and embed the various distinct gene concepts of biological (sub)disciplines in a coherent theoretical framework. In reply to the Genon Theory recently put forward by Klaus Scherrer and Jürgen Jost in this journal, we shall discuss a general approach to assess a gene definition that should then be tested for its expressiveness and potential cross-disciplinary relevance. Theory in Biosciences 04/2008; 127(3):215-21. · 0.98 Impact Factor -
Article: Noisy: Identification of problematic columns in multiple sequence alignments
ABSTRACT: Motivation: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate, resulting in sometimes misleading signals in phylogenetic reconstruction. Results: We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. Software: The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction capability with quite a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set – at least, say, 12 to 15 taxa. The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/. Algorithms for Molecular Biology. 01/2008; -
Article: Evolution of genes and genomes on the Drosophila phylogeny.
ABSTRACT: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species. Nature 12/2007; 450(7167):203-18. · 36.28 Impact Factor -
Article: Computational RNomics of Drosophilids
ABSTRACT: Background: Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. Results: We obtain 16 000 high-quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79–89 (2007)], which are plausible candidates of Pol III transcripts, have been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. Conclusion: The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383–1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions.
The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs compared to mammals. BMC Genomics. 01/2007; -
Article: Multiple sequence alignment with user-defined anchor points
ABSTRACT: Background: Automated software tools for multiple alignment often fail to produce biologically meaningful results. In such situations, expert knowledge can help to improve the quality of alignments. Results: Herein, we describe a semi-automatic version of the alignment program DIALIGN that can take pre-defined constraints into account. It is possible for the user to specify parts of the sequences that are assumed to be homologous and should therefore be aligned to each other. Our software program can use these sites as anchor points by creating a multiple alignment respecting these constraints. This way, our alignment method can produce alignments that are biologically more meaningful than alignments produced by fully automated procedures. As a demonstration of how our method works, we apply our approach to genomic sequences around the Hox gene cluster and to a set of DNA-binding proteins. As a by-product, we obtain insights about the performance of the greedy algorithm that our program uses for multiple alignment and about the underlying objective function. This information will be useful for the further development of DIALIGN. The described alignment approach has been integrated into the TRACKER software system. Algorithms for Molecular Biology. 01/2006; -
Article: Evolutionary patterns of non-coding RNAs.
ABSTRACT: A plethora of new functions of non-coding RNAs (ncRNAs) have been discovered in the past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and RNA modification to translational regulation. Nevertheless, very little is known about the evolution of this "Modern RNA World" and its components. In this contribution, we attempt to provide at least a cursory overview of the diversity of ncRNAs and functional RNA motifs in non-translated regions of regular messenger RNAs (mRNAs) with an emphasis on evolutionary questions. This survey is complemented by an in-depth analysis of examples from different classes of RNAs focusing mostly on their evolution in the vertebrate lineage. We present a survey of Y RNA genes in vertebrates and study the molecular evolution of the U7 snRNA, the snoRNAs E1/U17, E2, and E3, the Y RNA family, the let-7 microRNA (miRNA) family, and the mRNA-like evf-1 gene. We furthermore discuss the statistical distribution of miRNAs in metazoans, which suggests an explosive increase in the miRNA repertoire in vertebrates. The analysis of the transcription of ncRNAs suggests that small RNAs in general are genetically mobile in the sense that their association with a host gene (e.g. when transcribed from introns of an mRNA) can change on evolutionary time scales. The let-7 family demonstrates that even the mode of transcription (as intron or as exon) can change among paralogous ncRNAs. Theory in Biosciences 05/2005; 123(4):301-69. · 0.98 Impact Factor -
Article: The duplication of the Hox gene clusters in teleost fishes.
ABSTRACT: Higher teleost fishes, including zebrafish and fugu, have duplicated their Hox genes relative to the gene inventory of other gnathostome lineages. The most widely accepted theory contends that the duplicate Hox clusters originated synchronously during a single genome duplication event in the early history of ray-finned fishes. In this contribution we collect and re-evaluate all publicly available sequence information. In particular, we show that the short Hox gene fragments from published PCR surveys of the killifish Fundulus heteroclitus, the medaka Oryzias latipes and the goldfish Carassius auratus can be used to determine with little ambiguity not only their paralog group but also their membership in a particular cluster. Together with a survey of the genomic sequence data from the pufferfish Tetraodon nigroviridis we show that at least percomorpha, and possibly all euteleosts, share a system of 7 or 8 orthologous Hox gene clusters. There is little doubt about the orthology of the two teleost duplicates of the HoxA and HoxB clusters. A careful analysis of both the coding sequence of Hox genes and of conserved non-coding sequences provides additional support for the "duplication early" hypothesis that the Hox clusters in teleosts are derived from eight ancestral clusters by means of subsequent gene loss; the data remain ambiguous, however, in particular for the HoxC clusters. Assuming the "duplication early" hypothesis, we use the new evidence on the Hox gene complements to determine the phylogenetic positions of gene-loss events in the wake of the cluster duplication. Surprisingly, we find that the resolution of redundancy seems to be a slow process that is still ongoing. A few suggestions on which additional sequence data would be most informative for resolving the history of the teleostean Hox genes are discussed. Theory in Biosciences 07/2004; 123(1):89-110. · 0.98 Impact Factor
Top Journals
- Theory in Biosciences (2)
- Nature (1)
- Gene (1)
- BMC Bioinformatics (1)
Institutions
-
2008
-
Santa Fe Institute
Santa Fe, NM, USA
-
-
2004
-
University of Leipzig
- Institut für Informatik
Leipzig, Saxony, Germany
-