Jens Allmer

Verified
Jens verified their affiliation via an institutional email.
  • Professor (Full) at Ruhr West University of Applied Sciences

Visit my blog, where I discuss various topics, from vitamins to LLMs.

About

206
Publications
64,215
Reads
5,135
Citations
Introduction
My interest in bioinformatics started before 2000, when I began developing algorithms for high-throughput data from mass spectrometry-based proteomics. Since then, I have extended this work to genomic and transcriptomic data and algorithms for their analysis. These efforts are unified in our ongoing proteogenomics studies. To give back to the general public, I recently started a blog where I discuss various topics in health as well as machine learning: https://www.allmer.de/blog
Current institution
Ruhr West University of Applied Sciences
Current position
  • Professor (Full)
Additional affiliations
September 2017 - September 2017
Wageningen University & Research
Position
  • Bioinformatician
June 2016 - present
Bielefeld University
Position
  • Professor
Description
  • Overseeing a joint project between Germany and Turkey about the integration of gene and microRNA regulatory networks.
January 2011 - present
Izmir Institute of Technology
Position
  • Professor (Associate)
Education
January 2006 - July 2006
University of Münster
Field of study
  • Biology
January 2004 - December 2005
University of Pennsylvania
Field of study
  • Biology
September 2002 - December 2003

Publications (206)
Article
Full-text available
MicroRNAs are crucial for post-transcriptional gene regulation, and their dysregulation has been associated with diseases like cancer and, therefore, their analysis has become popular. The experimental discovery of miRNAs is cumbersome and, thus, many computational tools have been proposed. Here we assess 13 ab initio pre-miRNA detection approaches...
Article
Full-text available
Background: MicroRNAs (miRNAs) are short RNA sequences that guide post-transcriptional regulation of gene expression via complementarity to their target mRNAs. Discovered only recently, miRNAs have drawn a lot of attention. Multiple protein complexes interact to first cleave a hairpin from nascent RNA, export it into the cytosol, trim its loop, an...
Article
Full-text available
Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not ofte...
Article
Full-text available
Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in...
Article
Full-text available
MicroRNAs (miRNAs) are involved in post-transcriptional modulation of gene expression and thereby have a large influence on the resulting phenotype. We have previously shown that miRNAs may be involved in the communication between Toxoplasma gondii and its hosts and further confirmed a number of proposed specific miRNAs. Yet, little is known about...
Data
This data was used to evaluate the hypothesis that the Zika virus contains pre-miRNAs that are expressed.
Chapter
The changes in protein expression are hallmarks of development and disease. Protein expression can be established qualitatively and quantitatively using mass spectrometry (MS). Samples are prepared, proteins extracted and then analyzed using MS and MS/MS. The resulting spectra need to be processed computationally to assign peptide spectrum match. D...
Chapter
Proteogenomics enables the confirmation and refinement of gene models, the detection of new ones, and the proposition of alternative transcripts using support at the protein level. Such evidence is usually generated using mass spectrometry and subsequent result mapping to various sequence databases. This workflow entails several problems: (1) To sp...
Chapter
MicroRNAs (miRNAs) play a pivotal role in posttranscriptional gene regulation, making computational methods crucial for studying miRNAs. However, with the exponential growth of available data, accessing and effectively utilizing the vast amount of information has become a significant challenge for precise miRNA study. In this chapter, we address th...
Article
Full-text available
MicroRNAs (miRNAs), a class of small, non-coding RNAs, play a pivotal role in regulating gene expression at the post-transcriptional level. These regulatory molecules are integral to many biological processes and have been implicated in the pathogenesis of various diseases, including Human Immunodeficiency Virus (HIV) infection. This review aims to...
Preprint
Full-text available
The treatment of human diseases is a major research question in many fields related to medicine. It has become clear that patient stratification is of utmost importance so that patients receive the best possible treatment. Bio/disease markers are critical to achieve stratification. Markers can come from many different sources such as genomics, tran...
Article
Full-text available
Alternative polyadenylation (APA) increases transcript diversity through the generation of isoforms with varying 3′ untranslated region (3′ UTR) lengths. As the 3′ UTR harbors regulatory element target sites, such as miRNAs or RNA-binding proteins, changes in this region can impact post-transcriptional regulation and translation. Moreover, the APA...
Article
Full-text available
Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and...
Preprint
Full-text available
Changes in protein expression are hallmarks of development and disease. Protein expression can be established qualitatively and quantitatively using mass spectrometry (MS). Samples are prepared, proteins extracted, and then analyzed using MS and MS/MS. The resulting spectra need to be processed computationally to assign a peptide spectrum match. Da...
Article
Full-text available
Science has become a highly competitive undertaking concerning, for example, resources, positions, students, and publications. At the same time, the number of journals presenting scientific findings skyrockets while the knowledge increase per manuscript seems to be diminishing. Science has also become ever more dependent on computational analyses....
Article
Full-text available
Background Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto encyclopedia of genes and genomes (KEGG...
Article
Diseases such as cancer are often defined by dysregulation of gene expression. Noncoding RNAs (ncRNA) such as microRNAs are involved in gene expression and cell-cell communication. Many other ncRNAs exist, such as circular RNAs and small nucleolar RNAs. A wealth of knowledge is available for many ncRNAs, but the information is federated in many dat...
Chapter
Data analytics, machine learning, and artificial intelligence have found widespread application in science. They are usually employed as part of more extensive data analysis pipelines starting with raw data processing. Unfortunately, many tools that are not tested yet are used to support critical decision-making. The intended internet of science pl...
Book
Full-text available
Der Sammelband des KI-Campus gibt Einblicke und Impulse, wie offene, digitale Lernangebote zum Thema Künstliche Intelligenz (KI) in die Hochschullehre integriert werden können. In elf Beiträgen teilen Lehr-Fellows verschiedener Fachbereiche ihre Erfahrungen und Erkenntnisse, wie sie Online-Kurse, Videos und Podcasts des KI-Campus in unterschiedlich...
Preprint
Full-text available
Background: Cell homeostasis relies on the concerted actions of several genes; and dysregulated genes lead to disease manifestations. In living organisms, genes or their products do not act alone, but instead act within a large network. Subsets of these networks can be viewed as modules which provide certain functionality in an organism. Kyoto Ency...
Chapter
Mature microRNAs (miRNAs) are short RNA sequences about 18–24 nucleotide long, which provide the recognition key within RISC for the posttranscriptional regulation of target RNAs. Considering the canonical pathway, mature miRNAs are produced via a multistep process. Their transcription (pri-miRNAs) and first processing step via the microprocessor c...
Article
Full-text available
microRNAs (miRNAs or miRs) are short non-coding RNA molecules which have been shown to be dysregulated and released into the extracellular milieu as a result of many drug and non-drug-induced pathologies in different organ systems. Consequently, circulating miRs have been proposed as useful biomarkers of many disease states, including drug-induced...
Article
Full-text available
Background Subacute sclerosing panencephalitis (SSPE) is a chronic, progressive disease caused by a persistent infection of the measles virus. Despite extensive efforts, the exact neurodegeneration mechanism in SSPE remains unknown. MicroRNAs (miRNAs) have emerged as an essential part of cellular antiviral defense mechanisms and can be modulated by...
Chapter
Gene regulation is of utmost importance to cell homeostasis; thus, any dysregulation in it often leads to disease. MicroRNAs (miRNAs) are involved in posttranscriptional gene regulation and consequently, their dysregulation has been associated with many diseases. MiRBase version 21 contains microRNAs from about 200 species organized into about 70 c...
Article
Full-text available
MicroRNAs (miRNAs) are short RNA sequences that are actively involved in gene regulation. These regulators on the post-transcriptional level have been discovered in virtually all eukaryotic organisms. Additionally, miRNAs seem to exist in viruses and might also be produced in microbial pathogens. Initially, transcribed RNA is cleaved by Drosha, pro...
Article
Full-text available
SARS-CoV-2 has spread worldwide and caused social, economic, and health turmoil. The first genome assembly of SARS-CoV-2 was produced in Wuhan, and it is widely used as a reference. Subsequently, more than a hundred additional SARS-CoV-2 genomes have been sequenced. While the genomes appear to be mostly identical, there are variations. Therefore, a...
Data
Unfortunately, some available tools for peptide mapping in proteomics and proteogenomics have technical shortcomings. Thus, we developed a set of test cases to allow tool developers to test their implementations comprehensively.
Article
Full-text available
For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence for which two strategies exist. Either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdent...
Data
The Table contains predicted miRNAs for 538 ZIKV genomes available in NCBI (see http://dx.doi.org/10.13140/RG.2.2.36262.34884). Predictions were filtered and further annotated with human homologous miRNAs and predicted targets in humans (also filtered). Overall about 1 million rows of data are in the compiled table.
Data
The data was collected from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/) and combined into one file for further data analysis and to share the collected data with others.
Preprint
Full-text available
Many diseases are driven by dysregulated gene expression. MicroRNAs are key players for post-transcriptional gene regulation. miRBase contains microRNAs (miRNAs) from about 200 species organized into about 70 clades. It has been shown that not all miRNAs collected in the database are likely to be real and, therefore, novel routes to delineate betwe...
Preprint
Full-text available
MicroRNAs (miRNAs) are small non-coding RNA sequences that have been implicated in many physiological processes and diseases. The experimental discovery of miRNAs is complicated because both miRNAs and their targets need to be expressed for the confirmation of functional interactions, but expression is under spatiotemporal control. This has motivat...
Preprint
Full-text available
Motivation: Proteogenomics involves the supporting of gene models with experimental proteomics data. Mass spectrometry allows the measurement of peptides and the mass spectra can be assigned a peptide sequence using various algorithms. These tools were not designed for proteogenomics and peptide mapping to reference databases needs to be performed...
Preprint
Full-text available
This paper extends the idea of an Internet of Science with the option of workflow creation and sharing.
Chapter
MicroRNAs (miRNAs) are short RNA sequences actively involved in post-transcriptional gene regulation. Such miRNAs have been discovered in most eukaryotic organisms. They also seem to exist in viruses and perhaps in microbial pathogens to target the host. Drosha is the enzyme which first cleaves the pre-miRNA from the nascent pri-miRNA. Previously,...
Article
Full-text available
Big data and complex analysis workflows (pipelines) are common issues in data-driven science such as bioinformatics. A large number of computational tools are available for data analysis. Additionally, many workflow management systems to piece together such tools into data analysis pipelines have been developed. For example, more than 50 computation...
Article
Motivation: Disease is often manifested via changes in transcript and protein abundance. MicroRNAs (miRNAs) are instrumental in regulating protein abundance and may measurably influence transcript levels. MicroRNAs often target more than one mRNA (for humans, the average is three), and mRNAs are often targeted by more than one miRNA (for the genes...
Chapter
Full-text available
Proteins have a strong influence on the phenotype and their aberrant expression leads to diseases. MicroRNAs (miRNAs) are short RNA sequences which posttranscriptionally regulate protein expression. This regulation is driven by miRNAs acting as recognition sequences for their target mRNAs within a larger regulatory machinery. A miRNA can have many...
Article
Full-text available
European hazelnut (Corylus avellana) is a diploid tree species and is widely used in confections. Hazelnuts are, to a large part, produced in Turkey with the cultivar “Tombul” widely grown in the Black Sea region. In this work, the “Tombul” genome was partially sequenced by next-generation sequencing technology yielding 29.2% (111.85 Mb) of the ~ 3...
Data
List of differentially expressed genes. All of the comparisons are given in different excel sheets and the significant differential expressions are gathered in extra sheets for each comparison.
Data
Distribution of normalized genes. The values shown on the y-axis are the numbers resulting from the formula presented in the normalization method. Normalization was done for each gene in each sample, and the distribution of mapped nucleotides was found to have closer median values than raw counts.
Data
Statistics of reads. All of the mapped, unmapped, and deleted reads can be seen in their respective columns.
Data
List of all mature miRNAs. Mature miRNAs that passed the filtering after machine learning process are given with their sequences.
Data
List of differentially expressed interactions. All of the comparisons are given in different excel sheets and the significant differential expressions are gathered in extra sheets for each comparison.
Data
List of pre-miRNAs. From 0.99 confidence to all filtering steps, lists of pre-miRNAs are provided in their respective sheets.
Data
List of mature sequences and their target candidates. For the formed network, all of the targeted genes for each mature miRNA (that survived the filtering) were presented.
Data
Raw and normalized gene and miRNA counts. All of the count values are presented in their respective samples in columns, in 4 different sheets for raw and normalized values.
Data
List of differentially expressed miRNAs. All of the comparisons are given in different excel sheets and the significant differential expressions are gathered in extra sheets for each comparison.
Data
Score distributions of 1,000 machine learned models established using 1,000-fold Monte Carlo cross validation.
Data
Distribution of normalized miRNA expressions. The normalization method employed for genes was also applied to miRNA expressions. Median values varied between samples but were closer among samples with similar mean read lengths.
Conference Paper
Full-text available
Proteins define phenotypes and their dysregulation leads to diseases. Post-translational regulation of protein abundance can be achieved by microRNAs (miRNAs). Therefore studying this method of gene regulation is of high importance. MicroRNAs interact with their target messenger RNA via hybridization within a specialized molecular framework. Many m...
Article
Full-text available
MicroRNAs (miRNAs), approximately 22 nucleotides long, post-transcriptionally active gene expression regulators, play active roles in modulating cellular processes. Gene regulation and miRNA regulation are intertwined and the main aim of this study is to facilitate the analysis of miRNAs within gene regulatory pathways. VANESA enables the reconstru...
Article
Proteins define phenotypes and their dysregulation leads to diseases. Post-translational regulation of protein abundance can be achieved by microRNAs (miRNAs). Therefore studying this method of gene regulation is of high importance. MicroRNAs interact with their target messenger RNA via hybridization within a specialized molecular framework. Many...
Article
Full-text available
Background Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is diffic...
Article
Full-text available
Spinach is a popular leafy green vegetable due to its nutritional composition. It contains high concentrations of vitamins A, E, C, and K, and folic acid. Development of genetic markers for spinach is important for diversity and breeding studies. In this work, Next Generation Sequencing (NGS) technology was used to develop genomic simple sequence r...
Article
Full-text available
MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to a great interest in establishing the miRNAs of...
Article
Full-text available
MicroRNAs (miRNAs) are small RNA molecules which are known to take part in post-transcriptional regulation of gene expression. Here, VANESA, an existing platform for the reconstruction, visualization, and analysis of large biological networks, has been further expanded to include all experimentally validated human miRNAs available within miRBase, TarBase...
Article
Full-text available
Identification of microRNA (miRNA) precursors has seen increased efforts in recent years. The difficulty in experimental detection of pre-miRNAs increased the usage of computational approaches. Most of these approaches rely on machine learning especially classification. In order to achieve successful classification, many parameters need to be consi...
Article
Full-text available
Improvements in genome sequencing technology increased the availability of full genomes and transcriptomes of many organisms. However, the major benefit of massive parallel sequencing is to better understand the organization and function of genes which then lead to understanding of phenotypes. In order to interpret genomic data with automated gene...
Article
Full-text available
Editorial The term MicroRNA or its contraction miRNA currently appears in 21,215 titles of abstracts, published between 1997 and now, available on Pubmed (2016-21-22:12:59 EET). 4,108 of these were published in 2016 alone which signifies the importance of miRNA-related research. MicroRNAs can be detected experimentally using various techniques like...
Article
Full-text available
Gene regulation modulates RNA expression via transcription factors. Post-transcriptional gene regulation in turn influences the amount of protein product through, for example, microRNAs (miRNAs). Experimental establishment of miRNAs and their effects is complicated and even futile when aiming to establish the entirety of miRNA target interactions....
Data
Negative Datasets Spreadsheet containing all synthetic negative datasets produced in this study.
Data
Figures S1–S8 and Table S1 This file contains a collection of supplementary figures (S1–S8) and the Table S1 which lists the selected features.
Data
All established model performances Collection of all established model performances for 100 fold MCCV (43200).
Data
All established model performances The recorded model performances for 10-fold cross validation (720).
Article
Full-text available
Background Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational...
Conference Paper
Full-text available
A disease phenotype is often due to dysregulation of gene expression. Post-translational regulation of protein abundance by microRNAs (miRNAs) is, therefore, of high importance in, for example, cancer studies. MicroRNAs provide a complementary sequence to their target messenger RNA (mRNA) as part of a complex molecular machinery. Known miRNAs and t...
Preprint
Full-text available
Boron is an essential plant micronutrient but is toxic at high concentrations. Boron toxicity can severely affect crop productivity in arid and semi-arid environments. Puccinellia distans (Jacq.) Parl., common alkali grass, is found throughout the world and can survive under boron concentrations that are lethal for other plant species. In additio...
Article
Full-text available
Faba bean (Vicia faba L.) is an important food legume crop with a huge genome. Development of genetic markers for faba bean is important to study diversity and for molecular breeding. In this study, we used Next Generation Sequencing (NGS) technology for the development of genomic simple sequence repeat (SSR) markers. A total of 14,027,500 sequence...
Article
Full-text available
Motivation: Protein synthesis is not a straightforward process and one gene locus can produce many isoforms, for example, by starting mRNA translation from alternative start sites. altORF evaluator (altORFev) predicts alternative open reading frames within eukaryotic mRNA translated by a linear scanning mechanism and its modifications (leaky scan...
Article
Full-text available
Identification of microRNA (miRNA) precursors has seen increased efforts in recent years. The difficulty in experimental detection of pre-miRNAs increased the usage of computational approaches. Most of these approaches rely on machine learning especially classification. In order to achieve successful classification, many parameters need to be consi...
Article
Full-text available
Improvements in genome sequencing technology increased the availability of full genomes and transcriptomes of many organisms. However, the major benefit of massive parallel sequencing is to better understand the organization and function of genes which then lead to understanding of phenotypes. In order to interpret genomic data with automated gene...
Article
Full-text available
Experimental detection and validation of miRNAs is a tedious, time-consuming, and expensive process. Computational methods for miRNA gene detection are being developed so that the number of candidates that need experimental validation can be reduced to a manageable amount. Computational methods involve homology-based and ab initio algorithms. Both a...
Poster
Cancer is one of the leading causes of death, accounting for more than 8.2 million fatalities per year according to 2012 statistics from the World Health Organization. Cancer itself is, however, not a new problem, and Odel and colleagues uncovered that cancer existed as long as 1.7 million years ago. That study indicates that the cause of cance...
Article
In parallel with the development of nucleotide sequencing an equally important interest in further describing the sequence in terms of function arose and the latter represents the current bottleneck in the overall research question. Sequencing the transcriptome allows determination of expressed nucleotide sequences and using mass spectrometry allow...
Chapter
MicroRNAs are short RNA sequences involved in post-transcriptional gene regulation. MicroRNAs are known for a wide variety of species ranging from bacteria to plants. It has become clear that some cross-kingdom regulation is possible especially between viruses and their hosts. We hypothesized that intracellular parasites, like Toxoplasma gondii, si...
Article
Full-text available
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. Determining miRNAs experimentally is involved and, t...
Presentation
Full-text available
Talk at the EUPA 2016 Abstract The phenotype of an organism or a disease is largely dependent on protein expression. Proteomics investigates proteins, their modifications, their spatial and temporal expression patterns, and further parameters. The current workhorse in proteomics is mass spectrometry (MS) which enables identification and quantitatio...
Data
Work Feature Selection Workflows
Data
Supplementary Table 2 All feature selection methods except for the last two and the selected features as well as information on how to calculate them.
Data
Supplementary Table 1 The results for the remaining six feature selection methods.
Poster
Full-text available
Apoptosis, which plays a vital role in the homeostasis of a number of important cellular processes, is tightly regulated by protein-coding genes and small RNAs. Post-genomic annotation has revealed the importance of long non-coding RNAs (lncRNAs) in gene regulation. However, currently, there isn’t any report documenting a systematic screening of th...
Presentation
Full-text available
Apoptosis is a type of Programmed Cell Death (PCD) which is essential for cellular homeostasis and proper development. Diseases like autoimmune diseases and cancer are associated with aberrant apoptosis. Despite the well-known role of certain proteins and microRNAs in apoptosis, the potential regulatory role of long non-coding RNAs (lncRNAs) is sti...

Questions (12)
Question
I was thinking that the shortest feedback loop should be a gene which produces a transcription factor (TF) which in turn downregulates the expression of that gene.
This may seem a bit pointless, but perhaps the same TF, together with some other co-factor(s), also regulates the expression of other genes and, therefore, its own expression needs to be tightly controlled.
Could anyone point me to such examples?
Thank you.
Question
Following the successful first volume of miRNomics, we are currently inviting chapter contributions for the second volume of the book covering all areas of miRNomics. The book will again be published in the Methods in Molecular Biology Series by Springer, which is widely indexed. If you find that one of the chapters suits your expertise and you are willing to write it by August 2018, please contact us. If you feel a topic is missing, please do the same.
Please refer to the tentative list of chapters in the RG project: https://www.researchgate.net/project/miRNomics-II/.
Question
Dear Colleagues,
embryogenesis (in general development) implies multiple steps of cell duplication. From this perspective, I would like to know which two human tissues can be considered very distant to each other counting the number of replication steps for both branches from a common ancestor (stem cell) from a time point at the end of embryogenesis. 
If this is not known, could you provide an educated guess?
Thank you 
Question
We are currently inviting chapter contributions for a book covering all areas of proteogenomics. The book will be published in the Methods in Molecular Biology Series by Humana Press which is widely indexed.
If you find that one of the chapters suits your expertise and you are willing to write it by December 2017, please contact us. If you feel a topic is missing, please do the same.
Chapters
1. Public data sources (MS, NGS, Sequence, Annotation DBs; can be multiple chapters)
2. Database search strategies in proteogenomics
3. Sequence database preprocessing for proteogenomics
4. Peptide mapping
5. Protein inference
6. Visualization
7. Gene and protein annotation in proteogenomics
8. Genome annotation
9. Integrated proteogenomics pipelines (can be multiple chapters)
10. Prokaryote proteogenomics
11. Eukaryote proteogenomics
12. Proteogenomics for non-model organisms
13. Clinical proteogenomics
14. Oncoproteogenomics
15. Biomarker discovery
16. Future perspectives in proteogenomics
XX. Please Propose
Question
Aim:
Are the features that we are interested in more conserved in a set of virus strains than expected?
Input:
1) 100 virus genomes (strains of same virus) as pairwise aligned sequences (no MSA).
2) Features we are interested in as locations in one of the genomes (e.g.: GFF).
3) As 2) but randomly selected regions of same size as 2).
Intuitively, I would just calculate the differences for these features (2) over all genomes and compare the percentage of nucleotide changes to that of the randomly selected regions (3).
How should this actually be done according to the current state of the art and are there any tools to do this calculation?
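The intuitive comparison described above could be sketched roughly as follows. This is only a minimal illustration, not a statement of the state of the art. It assumes that every pairwise alignment is given as two equal-length gapped strings (reference, strain) against the same reference genome, that feature coordinates are 0-based, half-open positions on that reference (a GFF interval `start..end` would become `(start - 1, end)`), and that insertions relative to the reference are simply skipped. All function names are hypothetical, and the random regions are allowed to overlap the features for simplicity.

```python
import random

def mismatch_profile(ref_aln, strain_aln):
    """Return one 0/1 flag per reference position: 1 if the strain differs there."""
    flags = []
    for r, s in zip(ref_aln, strain_aln):
        if r == "-":  # insertion relative to the reference: skipped (assumption)
            continue
        flags.append(0 if r == s else 1)  # a gap in the strain counts as a change
    return flags

def change_rate(alignments):
    """Average the mismatch flags over all pairwise alignments, per reference position."""
    profiles = [mismatch_profile(r, s) for r, s in alignments]
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def region_score(rate, regions):
    """Mean change rate over all positions covered by the (start, end) regions."""
    values = [rate[i] for start, end in regions for i in range(start, end)]
    return sum(values) / len(values)

def permutation_p_value(rate, features, n_perm=1000, seed=0):
    """Empirical p-value that random regions change as little as the features do."""
    rng = random.Random(seed)
    observed = region_score(rate, features)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in features:
            length = end - start
            new_start = rng.randrange(0, len(rate) - length + 1)
            shuffled.append((new_start, new_start + length))
        if region_score(rate, shuffled) <= observed:
            hits += 1
    # add-one correction so the empirical p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value here would only suggest that the features change less often than same-sized random regions under these simplifying assumptions; it does not account for phylogenetic relatedness among the strains or for local variation in mutation rate.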
