Mario Stanke
University of Greifswald · Department of Mathematics and Computer Science

Prof. Dr. rer. nat.

About

211 Publications
47,393 Reads
24,645 Citations
Citations since 2017: 116 research items, 14,876 citations
[Chart: citations per year, 2017 to 2023]
Introduction
Methods:
  • probabilistic graphical models (HMMs, CRFs)
  • machine learning (custom models for bioinformatics, generative models, CNNs, discriminative learning, TensorFlow)
  • algorithms, data structures and software development (C++, Python)
Additional affiliations
July 2010 - present
University of Greifswald
Position
  • Professor (Associate)

Publications (211)
Article
Full-text available
Background: The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. Results: Various gene annotation tools have been developed but each has its limitations. Here, we in...
Article
Full-text available
Sex differences in the size of specific brain structures have been extensively studied, but careful and reproducible statistical hypothesis testing to identify them produced overall small effect sizes and differences in brains of males and females. On the other hand, multivariate statistical or machine learning methods that analyze MR images of the...
Preprint
Full-text available
Gene prediction remains an active area of bioinformatics research. Challenges are presented by large eukaryotic genomes and heterogeneous data situations. To meet the challenges, several streams of evidence must be integrated, from protein homology and transcriptome data, as well as information derived from the genome itself. The amount and signifi...
Preprint
Full-text available
The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a ful...
Presentation
Full-text available
The increasing availability of databases that provide large amounts of extrinsic evidence in the form of protein sequences and RNA-Seq libraries provides powerful sources of information to improve methods for gene structure prediction of protein-coding genes. The BRAKER pipeline fully automates the annotation of novel eukaryotic genomes by utilizin...
Poster
Full-text available
The increasing availability of databases that provide large amounts of extrinsic evidence in the form of protein sequences and RNA-Seq libraries provides powerful sources of information to improve methods for gene structure prediction of protein-coding genes. The BRAKER pipeline fully automates the annotation of novel eukaryotic genomes by utilizin...
Article
Full-text available
Background: The alignment of large numbers of protein sequences is a challenging task and its importance grows rapidly along with the size of biological datasets. State-of-the-art algorithms have a tendency to produce less accurate alignments with an increasing number of sequences. This is a fundamental problem since many downstream tasks rely on ac...
Article
Full-text available
Background: An important initial phase of arguably most homology search and alignment methods, such as those required for genome alignments, is seed finding. The seed finding step is crucial to curb the runtime as potential alignments are restricted to and anchored at the sequence position pairs that constitute the seed. To identify seeds, it is good practi...
Preprint
Full-text available
Sex differences in the size of specific brain structures have been extensively studied, but careful and reproducible statistical hypothesis testing to identify them produced overall small effect sizes and differences in brains of males and females. On the other hand, multivariate statistical or machine learning methods that analyse MR images of the who...
Article
Motivation: The comparison of genomes using models of molecular evolution is a powerful approach for finding, or towards understanding, functional elements. In particular, comparative genomics is a fundamental building brick in annotating ever larger sets of alignable genomes completely, accurately and consistently. Results: We here present our new...
Chapter
We use an evolution strategy to evolve game strategies for resistance fighters as well as spies for the popular card game “The Resistance”. In our experiment, players only communicate via observable actions. Players are judged by how they behave and not by what they say. Resistance fighters observe the behavior of all game players and try to deduce...
Preprint
The northern white rhinoceros (NWR; Ceratotherium simum cottoni) is functionally extinct, with only two females remaining alive. Efforts to rescue the NWR have inspired the exploration of unconventional conservation methods, including the generation of artificial gametes from induced pluripotent stem cells and somatic cell nuclear transfer. To ena...
Article
Full-text available
Background: BRAKER is a suite of automatic pipelines, BRAKER1 and BRAKER2, for the accurate annotation of protein-coding genes in eukaryotic genomes. Each pipeline trains statistical models of protein-coding genes based on provided evidence and then predicts protein-coding genes in genomic sequences using both the extrinsic evidence and statistical...
Article
Full-text available
Animal activity is an indicator of welfare, and manual observation is time- and cost-intensive. To this end, automatic detection and monitoring of live captive animals is of major importance for assessing animal activity, and, thereby, allowing for early recognition of changes indicative of diseases and animal welfare issues. We demonstrate tha...
Preprint
Full-text available
Background: BRAKER is a suite of automatic pipelines, BRAKER1 and BRAKER2, for the accurate annotation of protein-coding genes in eukaryotic genomes. Each pipeline trains statistical models of protein-coding genes based on provided evidence and then predicts protein-coding genes in genomic sequences using both the extrinsic evidence and statistica...
Article
Full-text available
Amphidiploid fungal Verticillium longisporum strains Vl43 and Vl32 colonize the plant host Brassica napus but differ in their ability to cause disease symptoms. These strains represent two V. longisporum lineages derived from different hybridization events of haploid parental Verticillium strains. Vl32 and Vl43 carry same‐sex mating‐type genes deri...
Article
Full-text available
In contrast to the western honey bee, Apis mellifera, other honey bee species have been largely neglected despite their importance and diversity. The genetic basis of the evolutionary diversification of honey bees remains largely unknown. Here, we provide a genome-wide comparison of three honey bee species each representing one of the three subgene...
Article
Full-text available
Phytopathogenic Verticillia cause Verticillium wilt on numerous economically important crops. Plant infection begins at the roots, where the fungus is confronted with rhizosphere inhabiting bacteria. The effects of different fluorescent pseudomonads, including some known biocontrol agents of other plant pathogens, on fungal growth of the haploid Ve...
Preprint
Full-text available
The comparison of genomes using models of molecular evolution is a powerful approach for finding or towards understanding functional elements. In particular, comparative genomics is a fundamental building brick in building high-quality, complete and consistent annotations of ever larger sets of alignable genomes. We here present our new program Cla...
Article
Full-text available
The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipelin...
Preprint
Full-text available
Full automation of gene prediction has become an important bioinformatics task since the advent of next-generation sequencing. The eukaryotic genome annotation pipeline BRAKER1 had combined self-training GeneMark-ET with AUGUSTUS to generate gene coordinates with support of transcriptomic data. Here, we introduce BRAKER2, a pipeline with GeneMark...
Preprint
Full-text available
An important initial phase of arguably most homology search and alignment methods, such as those required for genome alignments, is seed finding. The seed finding step is crucial to curb the runtime as potential alignments are restricted to and anchored at the sequence position pairs that constitute the seed. To identify seeds, it is good practice to use...
Article
Full-text available
Background: The red flour beetle Tribolium castaneum has emerged as an important model organism for the study of gene function in development and physiology, for ecological and evolutionary genomics, for pest control and a plethora of other topics. RNA interference (RNAi), transgenesis and genome editing are well established and the resources for...
Poster
Full-text available
While the number of sequenced genomes is ever growing, a vast majority of already available eukaryotic genomes may not be utilized to their full potential since they lack a high-quality annotation of protein-coding genes. Automation of the process of eukaryotic genome annotation is a challenging task due to the diversity of input data situations. BRA...
Presentation
Full-text available
Slides from presentation at Plant and Animal Genomes XXVIII
Preprint
Full-text available
Background: The red flour beetle Tribolium castaneum has emerged as an important model organism for the study of gene function in development and physiology, for ecological and evolutionary genomics, for pest control and a plethora of other topics. RNA interference (RNAi), transgenesis and genome editing are well established and the resources for g...
Article
Full-text available
Background: Phenotypic plasticity is a pervasive property of all organisms and considered to be of key importance for dealing with environmental variation. Plastic responses to temperature, which is one of the most important ecological factors, have received much attention over recent decades. A recurrent pattern of temperature-induced adaptive plas...
Article
Full-text available
Background: Vast amounts of next-generation sequencing RNA data have been deposited in archives, accompanying very diverse original studies. The data is also readily available for other purposes such as genome annotation or transcriptome assembly. However, selecting a subset of available experiments, sequencing runs and reads for this purpose is a...
Preprint
Full-text available
Background: The red flour beetle Tribolium castaneum has emerged as an important model organism for the study of gene function in development and physiology, for ecological and evolutionary genomics, for pest control and a plethora of other topics. RNA interference (RNAi), transgenesis and genome editing are well established and the resources for g...
Chapter
Comparing multiple related genomes can help to improve their structural annotation. The accuracy and consistency of the predicted exon–intron structures of the protein-coding genes can be higher when considering all genomes at once rather than annotating one genome at a time. The comparative gene prediction algorithm of AUGUSTUS performs such a mul...
Chapter
BRAKER is a pipeline for highly accurate and fully automated gene prediction in novel eukaryotic genomes. It combines two major tools: GeneMark-ES/ET and AUGUSTUS. GeneMark-ES/ET learns its parameters from a novel genomic sequence in a fully automated fashion; if available, it uses extrinsic evidence for model refinement. From the protein-coding ge...
Preprint
Full-text available
Vast amounts of next-generation sequencing RNA data have been deposited in archives, accompanying very diverse original studies. The data is also readily available for other purposes such as genome annotation or transcriptome assembly. However, selecting a subset of available experiments, sequencing runs and reads for this purpose is a nontrivial ta...
Poster
Full-text available
The rapidly growing number of sequenced eukaryotic genomes requires fully automated methods for accurate gene structure annotation. With this goal in mind, we had developed BRAKER1 [1], a combination of self-training GeneMark-ET [2] and AUGUSTUS [3], that uses genomic and RNA-seq data to automatically generate full gene structure annotations in nov...
Article
Full-text available
We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immun...
Article
Full-text available
AUGUSTUS is a tool for finding protein‐coding genes and their exon‐intron structure in genomic sequences. It does not necessarily require additional experimental input, as it can be applied in so‐called ab initio mode. However, extrinsic evidence from various sources such as transcriptome sequencing or the annotations of closely related genomes can...
Article
Ionizing radiation can induce genomic lesions such as DNA double-strand breaks whose incomplete or faulty repair can result in mutations, which in turn can influence cellular functions and alter the fate of affected cells and organ systems. Ionizing-radiation-induced sequence alterations/mutations occur in a stochastic manner, which contributes to...
Article
Full-text available
The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust meth...
Preprint
Full-text available
The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultra-contiguous genome assemblies. To compare these genomes we need robust meth...
Article
Full-text available
Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norveg...
Preprint
Full-text available
The most commonly employed mammalian model organism is the laboratory mouse. A wide variety of genetically diverse inbred mouse strains, representing distinct physiological states, disease susceptibilities, and biological mechanisms have been developed over the last century. We report full length draft de novo genome assemblies for 16 of the most w...
Chapter
Full-text available
Newly sequenced genomes are being added to the tree of life at an unprecedentedly fast pace. Increasingly, such new genomes are phylogenetically close to previously sequenced and annotated genomes. In other cases, whole clades of closely related species or strains ought to be annotated simultaneously. Often, in subsequent studies differences between...
Article
Full-text available
Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus...
Article
Full-text available
Background: The duplication of genes can occur through various mechanisms and is thought to make a major contribution to the evolutionary diversification of organisms. There is increasing evidence for a large-scale duplication of genes in some chelicerate lineages including two rounds of whole genome duplication (WGD) in horseshoe crabs. To investi...