
Jens Allmer
- Professor (Full) at Ruhr West University of Applied Sciences
Visit my blog, where I discuss various topics, from vitamins to LLMs.
About
206
Publications
64,215
Reads
5,135
Citations
Introduction
My interest in bioinformatics started before 2000. I then began developing algorithms for high-throughput data from mass spectrometry-based proteomics. Since then, I have extended this work to genomic and transcriptomic data and algorithms to analyze them. Our ongoing proteogenomics studies unify these efforts.
To give back to the general public, I recently started a blog where I discuss various topics in health as well as machine learning: https://www.allmer.de/blog
Current institution
Additional affiliations
Education
January 2006 - July 2006
January 2004 - December 2005
September 2002 - December 2003
Publications (206)
MicroRNAs are crucial for post-transcriptional gene regulation, and their dysregulation has been associated with diseases like cancer and, therefore, their analysis has become popular. The experimental discovery of miRNAs is cumbersome and, thus, many computational tools have been proposed. Here we assess 13 ab initio pre-miRNA detection approaches...
Background:
MicroRNAs (miRNAs) are short RNA sequences that guide post-transcriptional regulation of gene expression via complementarity to their target mRNAs. Discovered only recently, miRNAs have drawn a lot of attention. Multiple protein complexes interact to first cleave a hairpin from nascent RNA, export it into the cytosol, trim its loop, an...
Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not ofte...
Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in...
MicroRNAs (miRNAs) are involved in post-transcriptional modulation of gene expression and thereby have a large influence on the resulting phenotype. We have previously shown that miRNAs may be involved in the communication between Toxoplasma gondii and its hosts and further confirmed a number of proposed specific miRNAs. Yet, little is known about...
This data was used to evaluate the hypothesis that the Zika virus contains pre-miRNAs that are expressed.
Changes in protein expression are hallmarks of development and disease. Protein expression can be established qualitatively and quantitatively using mass spectrometry (MS). Samples are prepared, proteins extracted, and then analyzed using MS and MS/MS. The resulting spectra need to be processed computationally to assign a peptide spectrum match. D...
Proteogenomics enables the confirmation and refinement of gene models, the detection of new ones, and the proposition of alternative transcripts using support at the protein level. Such evidence is usually generated using mass spectrometry and subsequent result mapping to various sequence databases. This workflow entails several problems: (1) To sp...
MicroRNAs (miRNAs) play a pivotal role in posttranscriptional gene regulation, making computational methods crucial for studying miRNAs. However, with the exponential growth of available data, accessing and effectively utilizing the vast amount of information has become a significant challenge for precise miRNA study. In this chapter, we address th...
MicroRNAs (miRNAs), a class of small, non-coding RNAs, play a pivotal role in regulating gene expression at the post-transcriptional level. These regulatory molecules are integral to many biological processes and have been implicated in the pathogenesis of various diseases, including Human Immunodeficiency Virus (HIV) infection. This review aims to...
The treatment of human diseases is a major research question in many fields related to medicine. It has become clear that patient stratification is of utmost importance so that patients receive the best possible treatment. Bio/disease markers are critical to achieve stratification. Markers can come from many different sources such as genomics, tran...
Alternative polyadenylation (APA) increases transcript diversity through the generation of isoforms with varying 3′ untranslated region (3′ UTR) lengths. As the 3′ UTR harbors regulatory element target sites, such as miRNAs or RNA-binding proteins, changes in this region can impact post-transcriptional regulation and translation. Moreover, the APA...
Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and...
Changes in protein expression are hallmarks of development and disease. Protein expression can be established qualitatively and quantitatively using mass spectrometry (MS). Samples are prepared, proteins extracted, and then analyzed using MS and MS/MS. The resulting spectra need to be processed computationally to assign a peptide spectrum match. Da...
Science has become a highly competitive undertaking concerning, for example, resources, positions, students, and publications. At the same time, the number of journals presenting scientific findings skyrockets while the knowledge increase per manuscript seems to be diminishing. Science has also become ever more dependent on computational analyses....
Background
Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto encyclopedia of genes and genomes (KEGG...
Diseases such as cancer are often defined by dysregulation of gene expression. Noncoding RNAs (ncRNA) such as microRNAs are involved in gene expression and cell-cell communication. Many other ncRNAs exist, such as circular RNAs and small nucleolar RNAs. A wealth of knowledge is available for many ncRNAs, but the information is federated in many dat...
Data analytics, machine learning, and artificial intelligence have found widespread application in science. They are usually employed as part of more extensive data analysis pipelines starting with raw data processing. Unfortunately, many tools that are not tested yet are used to support critical decision-making. The intended internet of science pl...
The KI-Campus anthology offers insights and ideas on how open, digital learning resources on artificial intelligence (AI) can be integrated into university teaching. In eleven contributions, teaching fellows from various disciplines share their experiences and findings regarding online courses, videos, and podcasts of the KI-Campus in different...
Background: Cell homeostasis relies on the concerted actions of several genes; and dysregulated genes lead to disease manifestations. In living organisms, genes or their products do not act alone, but instead act within a large network. Subsets of these networks can be viewed as modules which provide certain functionality in an organism. Kyoto Ency...
Mature microRNAs (miRNAs) are short RNA sequences about 18–24 nucleotide long, which provide the recognition key within RISC for the posttranscriptional regulation of target RNAs. Considering the canonical pathway, mature miRNAs are produced via a multistep process. Their transcription (pri-miRNAs) and first processing step via the microprocessor c...
microRNAs (miRNAs or miRs) are short non-coding RNA molecules which have been shown to be dysregulated and released into the extracellular milieu as a result of many drug and non-drug-induced pathologies in different organ systems. Consequently, circulating miRs have been proposed as useful biomarkers of many disease states, including drug-induced...
Background
Subacute sclerosing panencephalitis (SSPE) is a chronic, progressive disease caused by a persistent infection of the measles virus. Despite extensive efforts, the exact neurodegeneration mechanism in SSPE remains unknown. MicroRNAs (miRNAs) have emerged as an essential part of cellular antiviral defense mechanisms and can be modulated by...
Gene regulation is of utmost importance to cell homeostasis; thus, any dysregulation in it often leads to disease. MicroRNAs (miRNAs) are involved in posttranscriptional gene regulation and consequently, their dysregulation has been associated with many diseases. MiRBase version 21 contains microRNAs from about 200 species organized into about 70 c...
MicroRNAs (miRNAs) are short RNA sequences that are actively involved in gene regulation. These regulators on the post-transcriptional level have been discovered in virtually all eukaryotic organisms. Additionally, miRNAs seem to exist in viruses and might also be produced in microbial pathogens. Initially, transcribed RNA is cleaved by Drosha, pro...
SARS-CoV-2 has spread worldwide and caused social, economic, and health turmoil. The first genome assembly of SARS-CoV-2 was produced in Wuhan, and it is widely used as a reference. Subsequently, more than a hundred additional SARS-CoV-2 genomes have been sequenced. While the genomes appear to be mostly identical, there are variations. Therefore, a...
Unfortunately, some available tools for peptide mapping in proteomics and proteogenomics have technical shortcomings. Thus, we developed a set of test cases to allow tool developers to test their implementations comprehensively.
For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence for which two strategies exist. Either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdent...
The Table contains predicted miRNAs for 538 ZIKV genomes available in NCBI (see http://dx.doi.org/10.13140/RG.2.2.36262.34884). Predictions were filtered and further annotated with human homologous miRNAs and predicted targets in humans (also filtered). Overall about 1 million rows of data are in the compiled table.
The data was collected from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/) and combined into one file for further data analysis and to share the collected data with others.
Many diseases are driven by dysregulated gene expression. MicroRNAs are key players for post-transcriptional gene regulation. miRBase contains microRNAs (miRNAs) from about 200 species organized into about 70 clades. It has been shown that not all miRNAs collected in the database are likely to be real and, therefore, novel routes to delineate betwe...
MicroRNAs (miRNAs) are small non-coding RNA sequences that have been implicated in many physiological processes and diseases. The experimental discovery of miRNAs is complicated because both miRNAs and their targets need to be expressed for the confirmation of functional interactions, but expression is under spatiotemporal control. This has motivat...
Motivation: Proteogenomics involves the supporting of gene models with experimental proteomics data. Mass spectrometry allows the measurement of peptides and the mass spectra can be assigned a peptide sequence using various algorithms. These tools were not designed for proteogenomics and peptide mapping to reference databases needs to be performed...
This paper extends the idea of an Internet of Science with the option of workflow creation and sharing.
MicroRNAs (miRNAs) are short RNA sequences actively involved in post-transcriptional gene regulation. Such miRNAs have been discovered in most eukaryotic organisms. They also seem to exist in viruses and perhaps in microbial pathogens to target the host. Drosha is the enzyme which first cleaves the pre-miRNA from the nascent pri-miRNA. Previously,...
Big data and complex analysis workflows (pipelines) are common issues in data driven science such as bioinformatics. Large amounts of computational tools are available for data analysis. Additionally, many workflow management systems to piece together such tools into data analysis pipelines have been developed. For example, more than 50 computation...
Motivation:
Disease is often manifested via changes in transcript and protein abundance. MicroRNAs (miRNAs) are instrumental in regulating protein abundance and may measurably influence transcript levels. MicroRNAs often target more than one mRNA (for humans, the average is three), and mRNAs are often targeted by more than one miRNA (for the genes...
Proteins have a strong influence on the phenotype and their aberrant expression leads to diseases. MicroRNAs (miRNAs) are short RNA sequences which posttranscriptionally regulate protein expression. This regulation is driven by miRNAs acting as recognition sequences for their target mRNAs within a larger regulatory machinery. A miRNA can have many...
European hazelnut (Corylus avellana) is a diploid tree species and is widely used in confections. Hazelnuts are, to a large part, produced in Turkey with the cultivar “Tombul” widely grown in the Black Sea region. In this work, the “Tombul” genome was partially sequenced by next-generation sequencing technology yielding 29.2% (111.85 Mb) of the ~ 3...
List of differentially expressed genes. All of the comparisons are given in different excel sheets and the significant differential expressions are gathered in extra sheets for each comparison.
Distribution of normalized genes. The values shown on the y-axis are the numbers resulting from the formula presented in the normalization method. Normalization was done for each gene in each sample, and the distributions of mapped nucleotides were found to have closer median values than the raw counts.
Statistics of reads. All of the mapped, unmapped, and deleted reads can be seen in their respective columns.
List of all mature miRNAs. Mature miRNAs that passed the filtering after the machine learning process are given with their sequences.
List of differentially expressed interactions. All of the comparisons are given in different excel sheets and the significant differential expressions are gathered in extra sheets for each comparison.
List of pre-miRNAs. From 0.99 confidence to all filtering steps, lists of pre-miRNAs are provided in their respective sheets.
List of mature sequences and their target candidates. For the formed network, all of the targeted genes for each mature miRNA (that survived the filtering) were presented.
Raw and normalized gene and miRNA counts. All of the count values are presented in their respective samples in columns, in 4 different sheets for raw and normalized values.
List of differentially expressed miRNAs. All of the comparisons are given in different excel sheets and the significant differential expressions are gathered in extra sheets for each comparison.
Score distributions of 1,000 machine learned models established using 1,000-fold Monte Carlo cross validation.
Distribution of normalized miRNA expressions. The normalization method employed for genes was also applied to miRNA expressions. Median values varied between samples but were closer among samples with similar mean read lengths.
Proteins define phenotypes and their dysregulation leads to diseases. Post-translational regulation of protein abundance can be achieved by microRNAs (miRNAs). Therefore studying this method of gene regulation is of high importance. MicroRNAs interact with their target messenger RNA via hybridization within a specialized molecular framework. Many m...
MicroRNAs (miRNAs), approximately 22 nucleotides long, post-transcriptionally active gene expression regulators, play active roles in modulating cellular processes. Gene regulation and miRNA regulation are intertwined and the main aim of this study is to facilitate the analysis of miRNAs within gene regulatory pathways. VANESA enables the reconstru...
Proteins define phenotypes and their dysregulation leads to diseases. Post-translational regulation of protein abundance can be achieved by microRNAs (miRNAs). Therefore studying this method of gene regulation is of high importance. MicroRNAs interact with their target messenger RNA via hybridization within a specialized molecular framework. Many...
Background
Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is diffic...
Spinach is a popular leafy green vegetable due to its nutritional composition. It contains high concentrations of vitamins A, E, C, and K, and folic acid. Development of genetic markers for spinach is important for diversity and breeding studies. In this work, Next Generation Sequencing (NGS) technology was used to develop genomic simple sequence r...
MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to a great interest in establishing the miRNAs of...
MicroRNAs (miRNAs) are small RNA molecules which are known to take part in post-transcriptional regulation of gene expression. Here, VANESA, an existing platform for reconstructing, visualizing, and analysis of large biological networks, has been further expanded to include all experimentally validated human miRNAs available within miRBase, TarBase...
Identification of microRNA (miRNA) precursors has seen increased efforts in recent years. The difficulty in experimental detection of pre-miRNAs increased the usage of computational approaches. Most of these approaches rely on machine learning especially classification. In order to achieve successful classification, many parameters need to be consi...
Improvements in genome sequencing technology increased the availability of full genomes and transcriptomes of many organisms. However, the major benefit of massive parallel sequencing is to better understand the organization and function of genes which then lead to understanding of phenotypes. In order to interpret genomic data with automated gene...
Editorial
The term MicroRNA or its contraction miRNA currently appears in 21,215 titles of abstracts, published between 1997 and now, available on Pubmed (2016-21-22:12:59 EET). 4,108 of these were published in 2016 alone which signifies the importance of miRNA-related research. MicroRNAs can be detected experimentally using various techniques like...
Gene regulation modulates RNA expression via transcription factors. Post-transcriptional gene regulation in turn influences the amount of protein product through, for example, microRNAs (miRNAs). Experimental establishment of miRNAs and their effects is complicated and even futile when aiming to establish the entirety of miRNA target interactions....
Negative Datasets
Spreadsheet containing all synthetic negative datasets produced in this study.
Figures S1–S8 and Table S1
This file contains a collection of supplementary figures (S1–S8) and the Table S1 which lists the selected features.
All established model performances
Collection of all established model performances for 100 fold MCCV (43200).
All established model performances
The recorded model performances for 10-fold cross validation (720).
Background
Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational...
A disease phenotype is often due to dysregulation of gene expression. Post-translational regulation of protein abundance by microRNAs (miRNAs) is, therefore, of high importance in, for example, cancer studies. MicroRNAs provide a complementary sequence to their target messenger RNA (mRNA) as part of a complex molecular machinery. Known miRNAs and t...
Boron is an essential plant micronutrient but is toxic at high concentrations. Boron toxicity can severely affect crop productivity in arid and semi-arid environments. Puccinellia distans (Jacq.) Parl., common alkali grass, is found throughout the world and can survive under boron concentrations that are lethal for other plant species. In additio...
Faba bean (Vicia faba L.) is an important food legume crop with a huge genome. Development of genetic markers for faba bean is important to study diversity and for molecular breeding. In this study, we used Next Generation Sequencing (NGS) technology for the development of genomic simple sequence repeat (SSR) markers. A total of 14,027,500 sequence...
Motivation:
Protein synthesis is not a straight forward process and one gene locus can produce many isoforms, for example, by starting mRNA translation from alternative start sites. altORF evaluator (altORFev) predicts alternative open reading frames within eukaryotic mRNA translated by a linear scanning mechanism and its modifications (leaky scan...
Experimental detection and validation of miRNAs is a tedious, time-consuming, and expensive process. Computational methods for miRNA gene detection are being developed so that the number of candidates that need experimental validation can be reduced to a manageable amount. Computational methods involve homology-based and ab initio algorithms. Both a...
Cancer is one of the leading causes of death, accounting for more than 8.2 million fatalities per year according to 2012 statistics of the World Health Organization. Cancer itself is, however, not a new problem, and Odel and colleagues uncovered that cancer existed as long as 1.7 million years ago. That study indicates that the cause of cance...
In parallel with the development of nucleotide sequencing an equally important interest in further describing the sequence in terms of function arose and the latter represents the current bottleneck in the overall research question. Sequencing the transcriptome allows determination of expressed nucleotide sequences and using mass spectrometry allow...
MicroRNAs are short RNA sequences involved in post-transcriptional gene regulation. MicroRNAs are known for a wide variety of species ranging from bacteria to plants. It has become clear that some cross-kingdom regulation is possible especially between viruses and their hosts. We hypothesized that intracellular parasites, like Toxoplasma gondii, si...
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. It is involved to determine miRNAs experimentally and, t...
Talk at the EUPA 2016. Abstract: The phenotype of an organism or a disease is largely dependent on protein expression. Proteomics investigates proteins, their modifications, their spatial and temporal expression patterns, and further parameters. The current workhorse in proteomics is mass spectrometry (MS) which enables identification and quantitatio...
Supplementary Table 2
All feature selection methods except for the last two and the selected features as well as information on how to calculate them.
Supplementary Table 1
The results for the remaining six feature selection methods.
Apoptosis, which plays a vital role in the homeostasis of a number of important cellular processes, is tightly regulated by protein-coding genes and small RNAs. Post-genomic annotation has revealed the importance of long non-coding RNAs (lncRNAs) in gene regulation. However, currently, there isn’t any report documenting a systematic screening of th...
Apoptosis is a type of Programmed Cell Death (PCD) which is essential for cellular homeostasis and proper development. Diseases like autoimmune diseases and cancer are associated with aberrant apoptosis. Despite the well-known role of certain proteins and microRNAs in apoptosis, the potential regulatory role of long non-coding RNAs (lncRNAs) is sti...
Questions (12)
I was thinking that the shortest feedback loop should be a gene that produces a transcription factor (TF), which in turn downregulates the expression of that same gene.
This may seem a bit pointless, but perhaps the same TF, together with some other co-factor(s), also regulates the expression of other genes and, therefore, its own expression needs to be tightly controlled.
Could anyone point me to such examples?
Thank you.
Following the successful first volume of miRNomics, we are currently inviting chapter contributions for the second volume of the book covering all areas of miRNomics. The book will again be published in the Methods in Molecular Biology Series by Springer, which is widely indexed. If you find that one of the chapters suits your expertise and you are willing to write it by August 2018, please contact us. If you feel a topic is missing, please do the same.
Please refer to the tentative list of chapters in the RG project: https://www.researchgate.net/project/miRNomics-II/.
Dear Colleagues,
embryogenesis (and development in general) implies multiple steps of cell duplication. From this perspective, I would like to know which two human tissues can be considered most distant from each other when counting the number of replication steps along both branches from a common ancestor (stem cell), measured at the end of embryogenesis.
If this is not known, could you provide an educated guess?
Thank you
We are currently inviting chapter contributions for a book covering all areas of proteogenomics. The book will be published in the Methods in Molecular Biology Series by Humana Press which is widely indexed.
If you find that one of the chapters suits your expertise and you are willing to write it by December 2017, please contact us. If you feel a topic is missing, please do the same.
Chapters
1. Public data sources (MS, NGS, Sequence, Annotation DBs; can be multiple chapters)
2. Database search strategies in proteogenomics
3. Sequence database preprocessing for proteogenomics
4. Peptide mapping
5. Protein inference
6. Visualization
7. Gene and protein annotation in proteogenomics
8. Genome annotation
9. Integrated proteogenomics pipelines (can be multiple chapters)
10. Prokaryote proteogenomics
11. Eukaryote proteogenomics
12. Proteogenomics for non-model organisms
13. Clinical proteogenomics
14. Oncoproteogenomics
15. Biomarker discovery
16. Future perspectives in proteogenomics
XX. Please Propose
Aim:
Are the features that we are interested in more conserved in a set of virus strains than expected?
Input:
1) 100 virus genomes (strains of same virus) as pairwise aligned sequences (no MSA).
2) Features we are interested in as locations in one of the genomes (e.g.: GFF).
3) As 2) but randomly selected regions of same size as 2).
Intuitively, I would just calculate the differences within these features (2) across all genomes and compare the percentage of nucleotide changes to that of the randomly selected regions (3).
How should this actually be done according to the current state of the art, and are there any tools to do this calculation?
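The intuitive comparison described above could be sketched roughly as follows. This is only a minimal illustration of that idea, not a state-of-the-art method or an existing tool. It assumes that every strain is given as a gapped pairwise alignment against the same reference genome (two equal-length strings with "-" for gaps) and that features are 0-based, half-open (start, end) intervals on the reference. All function names are hypothetical.

```python
import random


def ref_position_mismatches(ref_aln, strain_aln):
    """Per reference position: True where the strain differs (substitution or deletion)."""
    flags = []
    for r, s in zip(ref_aln, strain_aln):
        if r == "-":
            # Insertion in the strain has no reference coordinate; skipped in this sketch.
            continue
        flags.append(r != s)
    return flags


def change_rate(per_strain_flags, regions):
    """Fraction of changed positions inside the regions, pooled over all strains."""
    changed = total = 0
    for flags in per_strain_flags:
        for start, end in regions:
            window = flags[start:end]
            changed += sum(window)
            total += len(window)
    return changed / total if total else 0.0


def random_regions_like(regions, genome_length, rng):
    """Draw one random region per feature, each with the same length as that feature."""
    sampled = []
    for start, end in regions:
        size = end - start
        new_start = rng.randrange(0, genome_length - size + 1)
        sampled.append((new_start, new_start + size))
    return sampled


def conservation_p_value(pairwise_alignments, features, n_random=1000, seed=0):
    """Empirical one-sided p-value that the features change as little as random regions."""
    rng = random.Random(seed)
    flags = [ref_position_mismatches(r, s) for r, s in pairwise_alignments]
    genome_length = len(flags[0])
    observed = change_rate(flags, features)
    null = [
        change_rate(flags, random_regions_like(features, genome_length, rng))
        for _ in range(n_random)
    ]
    as_low = sum(1 for rate in null if rate <= observed)
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (as_low + 1) / (n_random + 1)
```

Sampling many random region sets, rather than a single set as in input 3), gives an empirical null distribution for the change rate. The sketch ignores insertions, overlaps between random regions and features, and the phylogenetic relatedness of the strains, all of which a more rigorous analysis would need to address.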