
Ross King, Chalmers University of Technology
197 Publications
21,786 Reads
5,994 Citations
Publications (197)
The enzyme deoxyhypusine synthase (DHS) catalyzes the first step in the post-translational modification of the eukaryotic translation factor 5A (eIF5A). This is the only protein known to contain the amino acid hypusine, which results from this modification. Both eIF5A and DHS are essential for cell viability in eukaryotes, and inhibiting DHS is a p...
The concept of personalised medicine in cancer therapy is becoming increasingly important. There already exist drugs administered specifically for patients with tumours presenting well-defined mutations. However, the field is still in its infancy, and personalised treatments are far from being standard of care. Personalised medicine is often associ...
The cutting edge of applying AI to science is the closed-loop automation of scientific research: robot scientists. We have previously developed two robot scientists: 'Adam' (for yeast functional biology) and 'Eve' (for early-stage drug design). We are now developing a next-generation robot scientist, Genesis. With Genesis we aim to demonstrate tha...
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus continues to cause severe disease and deaths in many parts of the world, despite massive vaccination efforts. Antiviral drugs to curb an ongoing infection remain a priority. The virus-encoded 3C-like main protease (MPro; nsp5) is seen as a promising target. Here, with a positive...
We propose a new machine learning formulation designed specifically for extrapolation. The textbook way to apply machine learning to drug design is to learn a univariate function that, given a drug (structure) as input, outputs a real number (the activity): f(drug) → activity...
The process of developing theories and models and testing them with experiments is fundamental to the scientific method. Automating the entire scientific method then requires not only automation of the induction of theories from data, but also experimentation from design to implementation. This is the idea behind a robot scientist -- a coupled syst...
Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks that require human intelligence. In science, perhaps the most interesting application of LLMs is for hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a v...
Motivation
Proteomic profiles reflect the functional readout of the physiological state of an organism. An increased understanding of what controls and defines protein abundances is of high scientific interest. Saccharomyces cerevisiae is a well-studied model organism, and there is a large amount of structured knowledge on yeast systems biology in...
The use of computational models is growing throughout most scientific domains. The increased complexity of such models and the increased automation of scientific research imply that model revisions need to be systematically recorded. We present RIMBO (Revisions for Improvements of Models in Biology Ontology), which describes the changes ma...
Scientific discovery in biology is difficult due to the complexity of the systems involved and the expense of obtaining high quality experimental data. Automated techniques are a promising way to make scientific discoveries at the scale and pace required to model large biological systems. A key problem for 21st century biology is to build a computa...
We propose a new machine learning formulation designed specifically for extrapolation. The textbook way to apply machine learning to drug design is to learn a univariate function that, given a drug (structure) as input, outputs a real number (the activity): F(drug) → activity. The PubMed server lists around twenty thousand papers doing t...
This study explores the application and performance of Transformational Machine Learning (TML) in drug discovery. TML, a meta learning algorithm, excels in exploiting common attributes across various domains, thus developing composite models that outperform conventional models. The drug discovery process, which is complex and time-consuming, can be...
Artificial intelligence (AI)-driven laboratory automation—combining robotic labware and autonomous software agents—is a powerful trend in modern biology. We developed Genesis-DB, a database system designed to support AI-driven autonomous laboratories by providing software agents access to large quantities of structured domain information. In additi...
The goal of the Protein Structure Prediction (PSP) problem is to predict a protein's 3D structure (conformation) from its amino acid sequence. The problem has been a 'holy grail' of science since the Nobel Prize-winning work of Anfinsen demonstrated that protein conformation was determined by sequence. A recent and important step towards this goal was...
The early stages of the drug design process involve identifying compounds with suitable bioactivities via noisy assays. As databases of possible drugs are often very large, assays can only be performed on a subset of the candidates. Selecting which assays to perform is best done within an active learning process, such as batched Bayesian optimizati...
We present an extension to the federated ensemble regression using classification algorithm, an ensemble learning algorithm for regression problems which leverages the distribution of the samples in a learning set to achieve improved performance. We evaluated the extension using four classifiers and four regressors, two discretizers, and 119 respon...
The representation of the protein-ligand complexes used in building machine learning models plays an important role in the accuracy of binding affinity prediction. The Extended Connectivity Interaction Features (ECIF) is one such representation. We report that (i) including the discretized distances between protein-ligand atom pairs in the ECIF sche...
Scientific results should not just be ‘repeatable’ (replicable in the same laboratory under identical conditions), but also ‘reproducible’ (replicable in other laboratories under similar conditions). Results should also, if possible, be ‘robust’ (replicable under a wide range of conditions). The reproducibility and robustness of only a small fracti...
Significance
Machine learning (ML) is the branch of artificial intelligence (AI) that develops computational systems that learn from experience. In supervised ML, the ML system generalizes from labelled examples to learn a model that can predict the labels of unseen examples. Examples are generally represented using features that directly describe...
The features in some machine learning datasets can naturally be divided into groups. This is the case with genomic data, where features can be grouped by chromosome. In many applications it is common for these groupings to be ignored, as interactions may exist between features belonging to different groups. However, including a group that does not...
Ensemble learning has been shown to significantly improve predictive accuracy in a variety of machine learning problems. For a given predictive task, the goal of ensemble learning is to improve predictive accuracy by combining the predictive power of multiple models. In this paper, we present an ensemble learning algorithm for regression problems w...
The key to success in machine learning is the use of effective data representations. The success of deep neural networks (DNNs) is based on their ability to utilize multiple neural network layers, and big data, to learn how to convert simple input representations into richer internal representations that are effective for learning. However, these i...
In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are central to medicine, crop breeding, etc. We investigated three phenotype prediction problems:...
The goal of quantitative structure activity relationship (QSAR) learning is to learn a function that, given the structure of a small molecule (a potential drug), outputs the predicted activity of the compound. We employed multi-task learning (MTL) to exploit commonalities in drug targets and assays. We used datasets containing curated reco...
To secure the world's food supply it is essential that we improve our knowledge of the genetic underpinnings of complex agronomic traits. In this paper, we report our findings from performing trait prediction and association mapping using marker stability in diverse rice landraces. We used the least absolute shrinkage and selection operator as our...
In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies are of the highest societal importance, as they are central to medicine, crop breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other tw...
Significance
Systems biology involves the development of large computational models of biological systems. The radical improvement of systems biology models will necessarily involve the automation of model improvement cycles. We present here a general approach to automating systems biology model improvement. Humans are eukaryotic organisms, and the...
The key to success in machine learning (ML) is the use of effective data representations. Traditionally, data representations were hand-crafted. Recently it has been demonstrated that, given sufficient data, deep neural networks can learn effective implicit representations from simple input representations. However, for most scientific problems, th...
Clark Glymour argued in 2004 that "despite a lack of public fanfare, there is mounting evidence that we are in the midst of ... a revolution - premised on the automation of scientific discovery" [1]. This paper highlights some of the philosophical and sociological dimensions that have been found empirically in work conducted with robot scientists -...
Malaria, caused by parasites of the genus Plasmodium, leads to over half a million deaths per year, 90% of which are caused by Plasmodium falciparum. P. vivax usually causes milder forms of malaria; however, P. vivax can remain dormant in the livers of infected patients for weeks or years before re-emerging in a new bout of the disease. The only dr...
Malaria, caused by parasites of the genus Plasmodium, leads to over half a million deaths per year, 90% of which are caused by Plasmodium falciparum. P. vivax usually causes milder forms of malaria; however, P. vivax can remain dormant in the livers of infected patients for weeks or years before re-emerging in a new bout of the disease. The only drugs a...
We investigate the learning of quantitative structure activity relationships (QSARs) as a case-study of meta-learning. This application area is of the highest societal importance, as it is a key step in the development of new medicines. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small...
The theory of computer science is based around Universal Turing Machines (UTMs): abstract machines able to execute all possible algorithms. Modern digital computers are physical embodiments of UTMs. The nondeterministic polynomial (NP) time complexity class of problems is the most significant in computer science, and an efficient (i.e. polynomial P...
The paper presents an ontology for the description of Drug Discovery Investigation (DDI). This has been developed through the use of a Robot Scientist “Eve”, and in consultation with industry. DDI aims to define the principal entities and the relations in the research and development phase of the drug discovery pipeline. DDI is highly transferable...
Perennial ryegrass (Lolium perenne L.) is one of the most widely grown forage grasses in temperate agriculture. In order to maintain and increase its usage as forage in livestock agriculture, there is a continued need for improvement in biomass yield, quality, disease resistance, and seed yield. Genetic gain for traits such as biomass yield has bee...
Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untarget...
PCR1 Agarose Gel.
This gel demonstrates PCR1 products for the initial 11 ORFs in lanes 1–11 when a 10:1 ratio of primers (forward:reverse) was employed. Other lanes on this gel show PCR2 results, and are discussed later. Although amplification in PCR1 was good when equimolar amounts of the forward and reverse primers were used, the amplification pr...
YGR125W and YGL202W.
Replacement PCR2 primers were designed for YGR125W and YGL202W using the new improved method, and they gave the expected PCR2 product without apparent dimer production.
(PNG)
DNA Heteroduplex.
In addition to products of the expected size (approx 1.25 kb), PCR2 gave an additional product approximately double the expected size (approx 2.5 kb). This additional product was seen on several gels, including the two given here.
(PDF)
YHR163W, YPR073C, YGR125W, YJL134W.
Confirmation gel for strains YHR163W, YPR073C, YGR125W, YJL134W after selection on FOA, with expected product sizes shown at bottom of lanes.
(PNG)
YGR125W and YNL169C.
Confirmation gel for strains YGR125W and YNL169C after selection on FOA, with expected product sizes shown at top of lanes.
(PNG)
YMR272C Testing.
Tests on PCR2 for YMR272C that varied the primers included (forward primer only (F), reverse primer only (R), both primers (F + R), and no primers (NP)) showed that both primers were needed to produce any product.
(PNG)
PCR2—96 Test ORFs.
Gel images showing PCR2 products for all except 8 of the 96 ORFs. The remaining 8 are shown in the gel in S12 Fig.
(PDF)
PCR1—96 Test ORFs.
Gel images showing PCR1 products for the 96 ORFs.
(PDF)
Equimolar PCR1 Primers.
This gel shows the contrasting situation where equimolar amounts of primers are used for PCR1. The expected product was still obtained in PCR3 but sometimes with further amplification of the PCR1 and PCR2 products. Occasionally, even when PCR1 was successful at a 10:1 primer ratio, the PCR3 amplification produced further amp...
PCR2 Non-Specific Amplification.
When testing the primers selected on the basis of these initial criteria, PCR2 failed for the ORF YGR125W and electrophoretic analysis indicated primer dimer formation that had not been predicted. This gel shows the problem for YGR125W and demonstrates that the reverse primer is at fault. The reverse primer is used...
YMR272C.
Tests on PCR2 for YMR272C with varied extension times showed that the larger product was still evident despite reduced extension time.
(PNG)
PCR2 and PCR3—96 Test ORFs.
The remaining eight of the PCR2 results are shown in the first 8 non-marker lanes of this gel, which then also contains the first 24 results for PCR3.
(PNG)
PCR3—96 Test ORFs.
Gel images showing the remaining PCR3 products for the 96 ORFs.
(PDF)
Gradient PCR1.
Temperature gradients to investigate PCR1 failures for YLR091W, YML097C, YAL058W and YCL050C.
(PNG)
Gradient PCR1.
Temperature gradients to investigate PCR1 failures for YDR071C and YHR207C.
(PNG)
Gradient PCR1.
Temperature gradients to investigate PCR1 failures for YJL128C, YML088W, YMR018W and YMR278W.
(PNG)
Gradient PCR1.
Temperature gradients to investigate PCR1 failures for YPL033C.
(PNG)
11 ORFs Products.
Table of expected sizes of the products produced for the 11 ORFs.
(CSV)
Gradient PCR1.
Temperature gradients to investigate PCR1 failures for YBR255W, YAR002W, YIL148W and YJR073C.
(PNG)
YGL202W and YGR125W PCR2.
Further forward primers were then chosen for PCR2 for YGL202W and YGR125W.
(CSV)
Confirmation Primer Products.
Expected product sizes using the confirmation primers. The four resultant strains were verified using confirmation primers from the EuroFAN project (http://www-sequence.stanford.edu/group/yeast_deletion_project/Deletion_primers_PCR_sizes.txt).
(CSV)
Primer Table for 11 ORFs.
Table of primer sequences for the 11 ORFs.
(CSV)
Summary of the 96 Test ORFs.
Table of properties of the PCR results for the 96 ORFs.
(CSV)
Primer GC Content.
Results of investigation into overall GC content. (a) Where there is no comment, the expected amplicon was obtained using the standard annealing temperature of 58°C (see Methods). (b) Only 6 of the 21 bases at the 3’ end of the forward primer are G or C. In general, reactions involving primers with an overall G + C content below 30%...
Primer GC Content.
Results of investigation into G + C content of the 8 bases at the 3’ end. (a) Where there is no comment, the expected amplicon was obtained using the standard annealing temperature of 58°C (see Methods). (b) This reaction gave a product of unexpected size. See also S6 Table.
(CSV)
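The two G + C measures named in the Primer GC Content captions (overall content, and content of the 8 bases at the 3’ end) can be illustrated with a minimal Python sketch. The function names and the example primer below are hypothetical and are not taken from the paper; the sketch assumes primers are written 5’ to 3’, so the 3’ end is the last bases of the string.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_fraction_3prime(seq: str, n: int = 8) -> float:
    """Return the G + C fraction of the last n bases (the 3' end).

    Assumes seq is written 5' -> 3'. If seq is shorter than n,
    the whole sequence is used.
    """
    return gc_fraction(seq[-n:])


if __name__ == "__main__":
    # Hypothetical primer, for illustration only (not from the paper's tables).
    primer = "ATTATAATTAGCGCGGCC"
    print(f"overall G+C:  {gc_fraction(primer):.2f}")
    print(f"3' end G+C:   {gc_fraction_3prime(primer):.2f}")
```

Both values are fractions between 0 and 1; multiply by 100 to compare against percentage figures such as those in the captions.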
96 Test Primers.
Table of primer sequences for the 96 ORFs.
(CSV)
Sequence Data.
Sequence results for the four strains with confirmed deletions.
(ZIP)
Alignment Files.
Alignments of sequencing results with expected sequence after the deletion.
(ZIP)
96 ORFs Selection Criteria.
Summary of reasons/methods for choosing the 96 ORFs.
(PDF)
Primer Sequences.
Compressed CSV file of all primers and products for all ORFs in S. cerevisiae, S. pombe and L. lactis.
(BZ2)
Subject Areas: biotechnology, computational biology, synthetic biology
There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robo...