Carsten Kemena's research while affiliated with University of Münster and other places

Publications (28)

Article
Full-text available
Background: Modularity is important for evolutionary innovation. The recombination of existing units to form larger complexes with new functionalities spares the need to create novel elements from scratch. In proteins, this principle can be observed at the level of protein domains, functional subunits which are regularly rearranged to acquire new...
Article
Full-text available
Even in the era of next generation sequencing, in which bioinformatics tools abound, annotating transcriptomes and proteomes remains a challenge. This can have major implications for the reliability of studies based on these datasets. Therefore, quality assessment represents a crucial step prior to downstream analyses on novel transcriptomes and pr...
Chapter
Protein domains are reusable segments of proteins and play an important role in protein evolution. By combining the elements from a relatively small set of domains into unique arrangements, a large number of distinct proteins can be generated. Since domains often have specific functions, changes in their arrangement usually affect the overall prote...
Article
The evolution of division of labor between sterile and fertile individuals represents one of the major transitions in biological complexity. A fascinating gradient in eusociality evolved among the ancient hemimetabolous insects, ranging from noneusocial cockroaches through the primitively social lower termites—where workers retain the ability to re...
Article
Full-text available
Around 150 million years ago, eusocial termites evolved from within the cockroaches, 50 million years before eusocial Hymenoptera, such as bees and ants, appeared. Here, we report the 2-Gb genome of the German cockroach, Blattella germanica, and the 1.3-Gb genome of the drywood termite Cryptotermes secundus. We show evolutionary signatures of termi...
Preprint
Full-text available
Around 150 million years ago, eusocial termites evolved from within the cockroaches, 50 million years before eusocial Hymenoptera, such as bees and ants, appeared. Here, we report the first, 2GB genome of a cockroach, Blattella germanica, and the 1.3GB genome of the drywood termite, Cryptotermes secundus. We show evolutionary signatures of termit...
Article
Full-text available
Motivation: Genome studies have become cheaper and easier than ever before, due to the decreased costs of high-throughput sequencing and the free availability of analysis software. However, the quality of genome or transcriptome assemblies can vary a lot. Therefore, quality assessment of assemblies and annotations are crucial aspects of genome anal...
Article
Full-text available
This review provides an overview on the development of Multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The three first sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the rela...
Article
Full-text available
A central goal of biology is to uncover the genetic basis for the origin of new phenotypes. A particularly effective approach is to examine the genomic architecture of species that have secondarily lost a phenotype with respect to their close relatives. In the eusocial Hymenoptera, queens and workers have divergent phenotypes that may be produced v...
Article
Full-text available
Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when tak...
Article
Full-text available
Background: Proteins are composed of domains, protein segments that fold independently from the rest of the protein and have a specific function. During evolution the arrangement of domains can change: domains are gained, lost or their order is rearranged. To facilitate the analysis of these changes we propose the use of multiple domain alignments. R...
Article
Full-text available
Adaptation requires genetic variation, but founder populations are generally genetically depleted. Here we sequence two populations of an inbred ant that diverge in phenotype to determine how variability is generated. Cardiocondyla obscurior has the smallest of the sequenced ant genomes and its structure suggests a fundamental role of transposable...
Article
Full-text available
Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole genome alignment (WGA). Using the same...
Article
Full-text available
This article introduces the SARA-Coffee web server; a service allowing the online computation of 3D structure based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences SARA-Coffee outputs a multiple sequence alignment along with a reliability index for ev...
Preprint
Full-text available
Background: Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole genome alignment (WGA). Res...
Article
Full-text available
replying to D. M. McCandlish, E. Rajon, P. Shah, Y. Ding & J. B. Plotkin 497, http://dx.doi.org/10.1038/nature12219 (2013) Understanding fitness landscapes, a conceptual depiction of the genotype-to-phenotype relationship, is crucial to many areas of biology. Two aspects of fitness landscapes are the focus of contemporary studies of molecular evolut...
Article
Full-text available
Motivation: Aligning RNAs is useful to search for homologous genes, study evolutionary relationships, detect conserved regions and identify any patterns that may be of biological relevance. Poor levels of conservation among homologs, however, make it difficult to compare RNA sequences, even when considering closely evolutionary related sequences....
Article
Full-text available
The main forces directing long-term molecular evolution remain obscure. A sizable fraction of amino-acid substitutions seem to be fixed by positive selection, but it is unclear to what degree long-term protein evolution is constrained by epistasis, that is, instances when substitutions that are accepted in one genotype are deleterious in another. H...
Article
Full-text available
Multiple Sequence Alignment (MSA) is an extremely powerful tool for important biological applications, such as phylogenetic analysis, identification of conserved motifs and domains and structure prediction. In this paper we propose a new approach to reduce the computational requirements of TCoffee, a memory demanding MSA tool that uses a consistenc...
Article
Full-text available
Evaluating alternative multiple protein sequence alignments is an important unsolved problem in Biology. The most accurate way of doing this is to use structural information. Unfortunately, most methods require at least two structures to be embedded in the alignment, a condition rarely met when dealing with standard datasets. We developed STRIKE, a...
Article
Full-text available
T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homology) information when building MSAs. The ser...
Article
Full-text available
This review focuses on recent trends in multiple sequence alignment tools. It describes the latest algorithmic improvements including the extension of consistency-based methods to the problem of template-based multiple sequence alignments. Some results are presented suggesting that template-based methods are significantly more accurate than simpler...
Article
Full-text available
Motivation: Evaluating alternative multiple protein sequence alignments is an important unsolved problem in Biology. The most accurate way of doing this is to use structural information. Unfortunately, most methods require at least two structures to be embedded in the alignment, a condition rarely met when dealing with standard datasets. Result: We d...

Citations

... Published genome assemblies for this group include the large, repeat-heavy genome of the locust (Orthoptera, [7]). Investigations on eusociality within the Dictyoptera (independent from a later origin of eusociality within the holometabolous Hymenoptera) have compared the genomes of termites relative to the cockroach [8][9][10]. In this fast-moving field, there is even a recent first look at genomes of species among the 'old wing' orders Odonata and Ephemeroptera [11], outgroups to the Neoptera. ...
... This process relies on insights from the bioinformaticians who will lead the assembly and analysis of the sequence data (17,18). A critical first step in genome assembly is to determine what sequence data will be most useful to maximize the potential for de ... endeavored to maximize consensus experience in the Alignathon competition discussed elsewhere in this article (51). Achievement of best practices and transfer of these alignment methods to the next generation of genome scientists are goals that the G10KCOS embraces. ...
... DomArchov provides an important contribution for studies that use the DA abstraction as the primary data structure. The DA abstraction is widely used to probe questions of protein evolution, including the co-occurrence and variation in domain repertoire across taxonomic lineages (Cromar et al., 2016; Dohmen et al., 2020; Karev et al., 2004; Tordai et al., 2005; Ye and Godzik, 2004), plasticity in domain order (Bashton and Chothia, 2002; Kummerfeld and Teichmann, 2009; Weiner III et al., 2006), domain occurrence graphs (Cromar et al., 2014; Karev et al., 2002; Przytycka et al., 2006; Vogel et al., 2005), and domain promiscuity, i.e. the propensity of a domain to co-occur with many other domains (Basu et al., 2008, 2009; Cohen-Gihon et al., 2011; Cromar et al., 2014; Marcotte et al., 1999). DomArchov provides, for the first time, a simulation engine to complement such studies. ...
... One summarizes contiguity using metrics like N50 length, where half the assembly comprises sequences of length N50 or longer, or L50 count, the smallest number of sequences whose lengths sum to 50% of the assembly. Complementary approaches estimate completeness by examining gene or protein content, e.g., the DOmain-based General Measure for transcriptome and proteome quality Assessment (DOGMA) [6,7] or BUSCO [8,9]. BUSCO has emerged as a standard and is used by UniProt [10] and the US NCBI [11], as well as by genomics data quality assessment pipelines like MultiQC [12] and BlobToolKit [13]. ...
... Gathering threshold) 58. Proteins having domains structural or sequence similarity were assigned to the same PFAM clan, for which an e-value threshold was defined 59. To characterize proteins in respect of the biological process they are involved in, Gene Ontology (GO) terms were assigned to domains using Blast2GO PRO (https://www.blast2go.com/) ...
... The transition to eusociality in Hymenoptera has been associated with gene family expansion, including the expansion of odorant receptor (30)(31)(32)(33)(34), insulin signaling (35), and vitellogenin genes (36,37). In two termites with available genome information, Z. nevadensis and Cryptotermes secundus, copy numbers were found to be expanded for genes such as ionotropic receptor (20,21), vitellogenin (20,38), insulin receptor (39), and juvenile hormone biosynthesis genes (40). Despite the accumulating examples of duplications of some categories of genes in social insects, their genome-wide impact and the evolutionary significance remain unclear. ...
... quantification values from Qubit, were converted from nanogram per microlitre to copies per microlitre. Calculations were based on a C. brevis genome size of 1.3 × 10^9 base pairs as per C. secundus 34. ...
... The genome annotation was assessed with BUSCO v5.1.2 executed in protein mode (-m protein option) and DOGMA v3.4 (Dohmen et al., 2016), evaluating conserved Pfam domains. The genome annotation assessed with DOGMA used 948 single-domain conserved domain arrangements (CDAs) and 491 multiple-domain CDAs across eukaryotes. ...
... The retrieval module also appears in bio-related applications such as protein sequence and structure predictions. For example, MSA (multiple sequence alignment) for proteins [71] can be seen as a way to search and retrieve relevant protein sequences given a query sequence. Using MSA as a retrieval strategy to retrieve additional protein sequence has been an essential building block in recent protein models including, notably, the MSA transformer [72] for protein sequence modeling and AlphaFold for protein structure prediction [73]. ...
... For example, the hymenopteran species exhibit a range of eusociality levels, from solitary to advanced eusocial lifestyles, and are used to investigate topics such as evolution of eusociality, molecular regulation of division of labor and epigenetics of behavior (2)(3)(4)(5)(6)(7)(8). Hymenopteran genome sequencing projects are also used to develop models for evolution and adaptation to fungal and plant symbioses (9)(10)(11)(12)(13), evolution of social parasitism (14), parasitoid biology (15)(16)(17), impact of endosymbionts (13,18,19), adaptation of invasive species (20), ecological speciation (21), transitions to asexual reproduction (21), phenotypic plasticity (8,14,22), selfish B chromosome drive (23) and the evolution of miniaturization (16). In addition to developing biological models, genome sequencing is used to address topics related to agriculture, such as response to pesticides (24) and roles as biological control agents (15,16). ...