Robert Finn
EMBL-EBI | EBI · Finn Group
Microbiology BSc, Biochemistry PhD
Publications: 270
Reads: 87,760
Citations: 84,974
Additional affiliations
July 2010 - December 2013
Publications (270)
The use of culture-independent molecular methods, often referred to as metagenomics, has revolutionized the ability to explore and characterize microbial communities from diverse environmental sources. Most metagenomic workflows have been developed for identification of prokaryotic and eukaryotic community constituents, but tools for identificatio...
In lichen research, metagenomes are increasingly being used for evaluating symbiont composition and metabolic potential, but the overall content and limitations of these metagenomes have not been assessed. We reassembled over 400 publicly available metagenomes, generated metagenome-assembled genomes (MAGs), constructed phylogenomic trees, and mappe...
Motivation: Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. This presents a particular challenge for the identification of true gene frequencies within a microbial population, as core genes missing in only a few assemblies will be misc...
Cyanobacteria are globally occurring photosynthetic bacteria notable for their contribution to primary production and production of toxins which have detrimental ecosystem impacts. Furthermore, cyanobacteria can form mutualistic symbiotic relationships with a diverse set of eukaryotes, including land plants, aquatic plankton and fungi. Nevertheless...
Despite the surge in data acquisition, there is a limited availability of tools capable of effectively analyzing microbiome data that identify correlations between taxonomic compositions and continuous environmental factors. Furthermore, existing tools also do not predict the environmental factors in new samples, underscoring the pressing need for...
Domestication represents one of the largest biological shifts of life on Earth, and for many animal species, behavioral selection is thought to facilitate early stages of the process. The gut microbiome of animals can respond to environmental changes and has diverse and powerful effects on host behavior. As such, we hypothesize that selection for...
Metagenome Assembled Genomes (MAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. Incomplete MAGs present a particular challenge for identification of shared genes within a microbial population, known as core genes, as a core gene missing in only a few assemblies will result in it being mischaracterized at...
Microbiome research has grown substantially over the past decade in terms of the range of biomes sampled, identified taxa, and the volume of data derived from the samples. In particular, experimental approaches such as metagenomics, metabarcoding, metatranscriptomics and metaproteomics have provided profound insights into the vast, hitherto unknown...
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-qualit...
Meta’omic data on microbial diversity and function accrue exponentially in public repositories, but derived information is often siloed according to data type, study or sampled microbial environment. Here we present SPIRE, a Searchable Planetary-scale mIcrobiome REsource that integrates various consistently processed metagenome-derived microbial da...
Background: Genomic Observatories (GOs) are sites of long-term scientific study that undertake regular assessments of the genomic biodiversity. The European Marine Omics Biodiversity Observation Network (EMO BON) is a network of GOs that conduct regular biological community samplings to generate environmental and metagenomic data of microbial commun...
Microbiome data, metadata and analytical workflows have become 'big' in terms of volume and complexity. Although the infrastructure and technologies to share data have been established, the interdisciplinary and multi-omic nature of the field can make resources difficult to identify and use. Following best practices for data deposition requires sub...
The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new...
Natural products (bio)synthesised by microbes are an important component of the pharmacopeia with a vast array of biomedical applications, in addition to their key role in many ecological interactions. One approach for the discovery of these metabolites is the identification of biosynthetic gene clusters (BGCs), genomic units which encode the molec...
Understanding the development of functional attributes of host-associated microbial communities is essential for developing novel microbe-based solutions for sustainable animal production. We applied multi-omics to 388 broiler chicken caecal samples to characterise and model the functional dynamics of 822 bacterial strains. Although microbial commu...
Atopic dermatitis (AD) is a multifactorial, chronic relapsing disease associated with genetic and environmental factors. Among skin microbes, Staphylococcus aureus and Staphylococcus epidermidis are associated with AD, but how genetic variability and staphylococcal strains shape the disease remains unclear. We investigated the skin microbiome of an...
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such...
Lichens are the archetypal symbiosis and the one for which the term was coined. Although application of shotgun sequencing techniques has shown that many lichen symbioses can harbour more symbionts than the canonically recognized fungus and photobiont, no global census of lichen organismal composition has been undertaken. Here, we analyze the genom...
An increasingly common output arising from the analysis of shotgun metagenomic datasets is the generation of metagenome-assembled genomes (MAGs), with tens of thousands of MAGs now described in the literature. However, the discovery and comparison of these MAG collections is hampered by the lack of uniformity in their generation, annotation and sto...
Most metagenomic workflows have been developed for identification of prokaryotic and eukaryotic community constituents, but tools for identification of plastid genomes are lacking. plastiC is a workflow that allows users to identify plastid genomes in metagenome assemblies, assess completeness, and predict taxonomic association from diverse environ...
The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are de...
Understanding the myriad pathways by which antimicrobial-resistance genes (ARGs) spread across biomes is necessary to counteract the global menace of antimicrobial resistance. We screened 17939 assembled metagenomic samples covering 21 biomes, differing in sequencing quality and depth, unevenly across 46 countries, 6 continents, and 14 years (2005...
Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform...
Fast optimisation of farming practices is essential to meet environmental sustainability challenges. Hologenomics, the joint study of the genomic features of animals and the microbial communities associated with them, opens new avenues to obtain in-depth knowledge on how host-microbiota interactions affect animal performance and welfare, and in doi...
The study of viral communities has revealed the enormous diversity and impact these biological entities have on a range of different ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterization of viral communities based on sequencing data. Here we introduce V...
Metagenomics is a culture-independent method for studying the microbes inhabiting a particular environment. Comparing the composition of samples (functionally/taxonomically), either from a longitudinal study or cross-sectional studies, can provide clues into how the microbiota has adapted to the environment. However, a recurring challenge, especial...
Microbial communities have essential roles in ocean ecology and planetary health. Microbes participate in nutrient cycles, remove huge quantities of carbon dioxide from the air and support ocean food webs. The taxonomic and functional diversity of the global ocean microbiome has been revealed by technological advances in sampling, DNA sequencing an...
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40%-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation int...
Shotgun metagenomics provides access to genetic information of microbes in a culture-independent manner. The recovery of metagenome assembled genomes (MAGs) of these organisms enables in-depth analysis of the functional potential of these, often elusive, organisms. While workflows for the recovery of prokaryotic MAGs are established, MAGs of eukary...
Background: The human intestinal microbiome is a complex community that contributes to host health and disease. In addition to normal microbiota, pathogens like carbapenem-resistant Enterobacteriaceae may be asymptomatically present. When these bacteria are present at very low levels, they are often undetectable in hospital surveillance cultures, kn...
Metagenomics is a culture-independent method for studying the microbes inhabiting a particular environment. Comparing the composition of samples (functionally/taxonomically), either from a longitudinal study or cross-sectional studies can provide clues into how the microbiota has adapted to the environment. However, a recurring challenge, especiall...
Human skin functions as a physical barrier to foreign pathogen invasion and houses numerous commensals. Shifts in the human skin microbiome have been associated with conditions ranging from acne to atopic dermatitis. Previous metagenomic investigations into the role of the skin microbiome in health or disease have found that much of the sequenced d...
The human gut microbiome plays an important role in health, but its archaeal diversity remains largely unexplored. In the present study, we report the analysis of 1,167 nonredundant archaeal genomes (608 high-quality genomes) recovered from human gastrointestinal tract, sampled across 24 countries and rural and urban populations. We identified prev...
Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, a...
Genomic knowledge of the tree of life is biased to specific groups of organisms. For example, only six full genomes are currently available in the Rhizaria clade. Here, we have applied metagenomic techniques enabling the assembly of the genome of Polymyxa betae (Rhizaria, Plasmodiophorida) RES F41 isolate from unpurified zoospore holobiont and comp...
The particularly interdisciplinary nature of human microbiome research makes the organization and reporting of results spanning epidemiology, biology, bioinformatics, translational medicine and statistics a challenge. Commonly used reporting guidelines for observational or genetic epidemiology studies lack key features specific to microbiome studie...
The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease....
Non-coding RNAs (ncRNA) are essential for all life, and their functions often depend on their secondary (2D) and tertiary structure. Despite the abundance of software for the visualisation of ncRNAs, few automatically generate consistent and recognisable 2D layouts, which makes it challenging for users to construct, compare and analyse structures....
Despite the advent of whole genome metagenomics, targeted approaches (such as 16S rRNA gene amplicon sequencing) continue to be valuable for determining the microbial composition of samples. Amplicon microbiome sequencing can be performed on clinical samples from a normally sterile site to determine the aetiology of an infection (usually single pat...
Recovering genomes from shotgun metagenomic sequence data allows detailed taxonomic and functional characterization of individual species or strains in a microbial community. Retrieving these metagenome-assembled genomes (MAGs) involves seven stages. First, low-quality bases, along with adapter and host sequences, are removed. Second, overlapping s...
Basidiomycete yeasts have recently been reported as stably associated secondary fungal symbionts (SFSs) of many lichens, but their role in the symbiosis remains unknown. Attempts to sequence their genomes have been hampered both by the inability to culture them and their low abundance in the lichen thallus alongside two dominant eukaryotes (an asco...
Bacteriophages drive evolutionary change in bacterial communities by creating gene flow networks that fuel ecological adaptions. However, the extent of viral diversity and its prevalence in the human gut remains largely unknown. Here, we introduce the Gut Phage Database, a collection of ∼142,000 non-redundant viral genomes (>10 kb) obtained by mini...
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computationa...
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we col...
There are fundamental differences between the current levels of genomic and proteomic knowledge for bacteria and fungi. With multiple growth forms and over 100,000 known species, the fungi probably present a more complex situation, but genomic studies are hindered by the lack of reliable reference data for many species. As activities such as enviro...
The Ensembl COVID-19 browser (covid-19.ensembl.org) was launched in May 2020 in response to the ongoing pandemic. It is Ensembl’s contribution to the global efforts to develop treatments, diagnostics and vaccines for COVID-19, and it supports research into the genomic epidemiology and evolution of the SARS-CoV-2 virus. This freely available resourc...
Efficient response to the pandemic through the mobilization of the larger scientific community is challenged by the limited reusability of the available primary genomic data. Here, the Genomic Standards Consortium board highlights the essential need for contextual genomic data FAIRness, for empowering key data-driven biological questions.
The human gut microbiome plays an important role in health and disease, but the archaeal diversity therein remains largely unexplored. Here we report the pioneering analysis of 1,167 non-redundant archaeal genomes recovered from human gastrointestinal tract microbiomes across countries and populations. We identified three novel genera and 15 novel...
Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microR...
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are pred...
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need...
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the S...
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world’s largest RNA 2D str...
Non-coding RNAs (ncRNA) are essential for all life, and the functions of many ncRNAs depend on their secondary (2D) and tertiary (3D) structure. Despite proliferation of 2D visualisation software, there is a lack of methods for automatically generating 2D representations in consistent, reproducible, and recognisable layouts, making them difficult t...
Microbial eukaryotes constitute a significant fraction of biodiversity and have recently gained more attention, but the recovery of high-quality metagenomic assembled eukaryotic genomes is limited by the current availability of tools. To help address this, we have developed EukCC, a tool for estimating the quality of eukaryotic genomes based on the...
Bacteriophages drive evolutionary change in bacterial communities by creating gene flow networks that fuel ecological adaptions. However, the extent of viral diversity and prevalence in the human gut remains largely unknown. Here, we introduce the Gut Phage Database (GPD), a collection of ∼142,000 non-redundant viral genomes (>10 kb) obtained by mi...
Non‐coding RNAs are essential for all life and carry out a wide range of functions. Information about these molecules is distributed across dozens of specialized resources. RNAcentral is a database of non‐coding RNA sequences that provides a unified access point to non‐coding RNA annotations from >40 member databases and helps provide insight into...