
Geoffrey J Barton
University of Dundee
Ph.D.
254
Publications
50,354
Reads
33,462
Citations
Publications (254)
Fragment screening data from 37 experiments, and 1,309 protein structures binding to 1,601 ligands were analysed. A new method to group ligands by binding sites was developed and sites clustered according to profiles of relative solvent accessibility. This identified 293 unique ligand binding sites, which are grouped into four clusters (C1-4). C1 p...
The molecular evolution of a protein is constrained by its structure and function. This is how patterns of evolutionary conservation in a multiple sequence alignment can be exploited by algorithms like AlphaFold to predict structure and other features. Human population sequencing offers the potential to show similar trends within a single species....
Eukaryotic genes are interrupted by introns that are removed from transcribed RNAs by splicing. The extent of alternative splicing is the best genomic predictor of developmental complexity, yet it is unclear what mediates change in patterns of splicing complexity between species. Here we show that variation in 5′ splice site sequence preferences co...
Protein kinases are major regulators of cellular processes, but the roles of most kinases remain unresolved. Dictyostelid social amoebas have been useful in identifying functions for 30% of their kinases in cell migration, cytokinesis, vesicle trafficking, gene regulation and other processes, but their upstream regulators and downstream effectors are...
Alternative splicing of messenger RNAs is associated with the evolution of developmentally complex eukaryotes. Splicing is mediated by the spliceosome, and docking of the pre-mRNA 5' splice site into the spliceosome active site depends upon pairing with the conserved ACAGA sequence of U6 snRNA. In some species, including humans, the central adenosi...
The nutrient-rich tubers of the greater yam, Dioscorea alata L., provide food and income security for millions of people around the world. Despite its global importance, however, greater yam remains an orphan crop. Here, we address this resource gap by presenting a highly contiguous chromosome-scale genome assembly of D. alata combined with a dense...
SARS-CoV-2 Spike (Spike) binds to human angiotensin-converting enzyme 2 (ACE2) and the strength of this interaction could influence parameters relating to virulence. To explore whether population variants in ACE2 influence Spike binding and hence infection, we selected 10 ACE2 variants based on affinity predictions and prevalence in gnomAD and meas...
Ankyrin protein repeats bind to a wide range of substrates and are one of the most common protein motifs in nature. Here, we collate a high-quality alignment of 7,407 ankyrin repeats and examine, for the first time, the distribution of human population variants from large-scale sequencing of healthy individuals across this family. Population variant...
Yanocomp is a tool for predicting the positions and stoichiometries of RNA modifications in Nanopore direct RNA sequencing data. It uses general mixture models to identify differentially modified sites between two conditions, with good support for replicates. Yanocomp models across adjacent kmers and uses a uniform component to account for outliers...
SARS-CoV-2 infection begins with the interaction of the SARS-CoV-2 Spike (Spike) and human angiotensin-converting enzyme 2 (ACE2). To explore whether population variants in ACE2 might influence Spike binding and hence infection, we selected 10 ACE2 variants based on affinity predictions and prevalence in gnomAD and measured their affinities for Spi...
The nutrient-rich tubers of the greater yam, Dioscorea alata L., provide food and income security for millions of people around the world. Despite its global importance, however, greater yam remains an "orphan crop." Here we address this resource gap by presenting a highly contiguous chromosome-scale genome assembly of greater yam combined with a den...
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-le...
In this chapter, we introduce core functionality of the Jalview interactive platform for the creation, analysis, and publication of multiple sequence alignments. A workflow is described based on Jalview’s core functions: from data import to figure generation, including import of alignment reliability scores from T-Coffee and use of Jalview from the...
Genes involved in disease resistance are some of the fastest evolving and most diverse components of genomes. Large numbers of nucleotide-binding, leucine-rich repeat receptor (NLR) genes are found in plant genomes and are required for disease resistance. However, NLRs can trigger autoimmunity, disrupt beneficial microbiota or reduce fitness. It is...
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing; however, the relatively high error rates of long-read technologies can reduce the accuracy of intron identification. Here we present a two-pass approach, combining alignment m...
SARS-CoV-2 invades host cells via an endocytic pathway that begins with the interaction of the SARS-CoV-2 Spike glycoprotein (S-protein) and human Angiotensin-converting enzyme 2 (ACE2). Genetic variability in ACE2 may be one factor that mediates the broad-spectrum severity of SARS-CoV-2 infection and COVID-19 outcomes. We investigated the capacity...
Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length m...
The Dundee Resource for Sequence Analysis and Structure Prediction (DRSASP; http://www.compbio.dundee.ac.uk/drsasp.html) is a collection of web services provided by the Barton Group at the University of Dundee. DRSASP's flagship services are the JPred4 webserver for secondary structure and solvent accessibility prediction and the JABAWS 2.2 webserv...
Motivation:
RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex...
Antisense transcription is known to have a range of impacts on sense gene expression, including (but not limited to) impeding transcription initiation, disrupting post-transcriptional processes, and enhancing, slowing, or even preventing transcription of the sense gene. Strand-specific RNA-Seq protocols preserve the strand information of the origin...
Motivation: Antisense transcription is known to have a range of impacts on sense gene expression, including (but not limited to) impeding transcription initiation, disrupting post-transcriptional processes, and enhancing, slowing, or even preventing transcription of the sense gene. Strand-specific RNA-Seq protocols preserve the strand information o...
JABAWS 2.2 is a computational framework that simplifies the deployment of web services for Bioinformatics. In addition to the five multiple sequence alignment (MSA) algorithms in JABAWS 1.0, JABAWS 2.2 includes three additional MSA programs (Clustal Omega, MSAprobs, GLprobs), four protein disorder prediction methods (DisEMBL, IUPred, Ronn, GlobPlot...
The enzyme name “O-linked β-N-acetylglucosamine” is incorrect in the article title, as well as in the second sentence of the Introduction section. In both of these cases, “O-linked 6-N-acetylglucosamine” should be “O-linked β-N-acetylglucosamine”. The correct title is: A study of the structural properties of sites modified by the O-linked β-N-acet...
The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important pro...
Protein O-GlcNAcylation (O-GlcNAc) is an essential post-translational modification (PTM) in higher eukaryotes. The O-linked β-N-acetylglucosamine transferase (OGT) targets specific serines and threonines (S/T) in intracellular proteins. However, unlike phosphorylation, fewer than 25% of known O-GlcNAc sites match a clear sequence pattern. Accordin...
Properties of sites in the SS132 dataset.
List of all entries in the SS132 dataset. PDB, PDB accession number; Chain, chain in the PDB file; Position, residue position within the chain; Cluster, cluster id. RSA, relative solvent accessibility; SS, secondary structure.
(CSV)
Motivation
The biological importance of changes in gene and transcript expression is well recognised and is reflected by the wide variety of tools available to characterise these changes. Regulation via Differential Transcript Usage (DTU) is emerging as an important phenomenon. Several tools exist for the detection of DTU from read alignment or ass...
Human genome sequencing has generated population variant datasets containing millions of variants from hundreds of thousands of individuals¹⁻³. The datasets show the genomic distribution of genetic variation to be influenced on genic and sub-genic scales by gene essentiality,¹,⁴,⁵ protein domain architecture⁶ and the presence of genomic feature...
RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, Differential Gene Expression (DGE) tools typically assume the form of the underlying distribution of gene expression. A recent highly replicated study revealed that RNA-seq gene expression measurements in yeast are best represented a...
Background
Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies ge...
Significance
Organizers are small groups of cells in developing embryos that secrete signals to control behaviors such as cell differentiation or cell movement of larger groups. In Dictyostelia, the apical tip is the site where differentiation of the fruiting body stalk initiates. The cause of tip-specific stalk formation has been unclear, but we s...
Human genome sequencing has generated population variant datasets containing millions of variants from hundreds of thousands of individuals. The datasets show the genomic distribution of genetic variation to be influenced on genic and sub-genic scales by gene essentiality, protein domain architecture and the presence of genomic features such as spl...
Motivation
The current generation of DNA sequencing technologies produces a large amount of data quickly. All of these data need to pass some form of quality control (QC) processing and checking before they can be used for any analysis. The large number of samples that are run through Illumina sequencing machines makes the process of QC an onerous a...
Motivation
The current generation of DNA sequencing technologies produces a large amount of data quickly. All of these data need to pass some form of quality control processing and checking before they can be used for any analysis. The large number of samples that are run through Illumina sequencing machines makes the process of quality control an o...
De novo Transcriptomics Assembly
1) The process of reconstructing a transcriptome by assembling the fragment reads together without the use of a reference genome.
2) Like doing a jigsaw puzzle:
- Broken pieces (sequencing errors)
- Duplicate pieces (repeats)
- Random pieces of another puzzle (contamination)
- Finds overlapping reads
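The "finds overlapping reads" step in the jigsaw analogy can be illustrated with a minimal sketch. This is not the method of any particular assembler: the function names `longest_overlap` and `merge_reads` and the `min_len` threshold are hypothetical, chosen only for illustration, and exact string matching ignores the sequencing errors, repeats and contamination listed above.

```python
def longest_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `left` that exactly
    matches a prefix of `right`, or 0 if none is at least `min_len` long.
    `min_len` (assumed >= 1) guards against trivial chance overlaps."""
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first so the first hit is the best.
    for size in range(max_len, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left: str, right: str, min_len: int = 3):
    """Join two reads at their overlap, or return None if they do not overlap."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[size:]


if __name__ == "__main__":
    # Toy reads, made up for illustration only.
    print(merge_reads("ACGTTGCA", "TGCAGGT"))
```

Real assemblers must tolerate mismatches and resolve repeats, so exact suffix-prefix matching like this is only the starting intuition.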
RNA-seq is now the technology of choice for genome-wide differential gene expression experiments, but it is not clear how many biological replicates are needed to ensure valid biological interpretation of the results or which statistical tools are best for analyzing the data. An RNA-seq experiment with 48 biological replicates in each of two condit...
BACKGROUND The Dictyostelia are popular model organisms to study the molecular basis of cell and developmental biology. Here, we report the de novo transcriptome assembly of Dictyostelium discoideum and Polysphondylium pallidum from Illumina RNA sequencing data. The D. discoideum genome was published in 2005 with the gene models being manually...
The small ubiquitin-like modifier 2 (SUMO-2) is required for survival when cells are exposed to treatments that induce proteotoxic stress by causing the accumulation of misfolded proteins. Exposure of cells to heat shock or other forms of proteotoxic stress induces the conjugation of SUMO-2 to proteins in the nucleus. We investigated the chromatin...
An RNA-seq experiment with 48 biological replicates in each of 2 conditions was performed to determine the number of biological replicates (n_r) required, and to identify the most effective statistical analysis tools for identifying differential gene expression (DGE). When n_r=3, seven of the nine tools evaluated give true positive rates (TPR) of o...
High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR...
JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibil...
The 14-3-3 family of phosphoprotein-binding proteins regulate many cellular processes by docking onto pairs of phosphorylated Ser and Thr residues in a constellation of intracellular targets. Therefore, there is a pressing need to develop new prediction methods that use an updated set of 14-3-3-binding motifs for the identification of new 14-3-3 ta...
The differentiation of mouse embryonic stem (ES) cells is controlled by the interaction of multiple signaling pathways, typically mediated by post-translational protein modifications. The addition of O-linked N-acetylglucosamine (O-GlcNAc) to serine and threonine residues of nuclear and cytoplasmic proteins is one such modification (O-GlcNAcylation...
Supporting Information Table 2
Supporting Information Table 4
Supporting Information Table 7
Supporting Information Figures
Supporting Information Table 5
Supporting Information Table 6
Supporting Information Table 8
Supporting Information Video 1
Supporting Information Table 3
Supporting Information Table 1