© 2010 Sparkes et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
Review
Towards Robot Scientists for autonomous scientific
discovery
Andrew Sparkes*1, Wayne Aubrey1, Emma Byrne3, Amanda Clare1, Muhammed N Khan1, Maria Liakata1, Magdalena Markham2, Jem Rowland1, Larisa N Soldatova1, Kenneth E Whelan1, Michael Young2 and Ross D King1
* Correspondence: nds@aber.ac.uk
Abstract
We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot
Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific
discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these
hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and
repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the
potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic
pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We
argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this.
This will make scientific information more reproducible and reusable, and promote the integration of computers in
scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific
investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of
experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes
tedious and repetitive tasks from the human scientist.
Towards the full automation of scientific discovery
A Robot Scientist combines several technologies:
computer-controlled scientific instruments,
integrated robotic automation to link the instruments
together, a computational model of the object of study, arti-
ficial intelligence and machine learning to iteratively create
hypotheses about a problem and later interpret experimental
results (closed-loop learning), and the formalisation of the
scientific discovery process. We show how these elements
come together to create an automated closed-loop learning
system: a Robot Scientist.
Automation in all its forms has played an integral role in
the development of human society since the 19th century.
The advent of computers and computer science in the mid-
20th century made practical the idea of automating aspects
of scientific discovery, and now computing is playing an
increasingly prominent role in the scientific discovery pro-
cess [1,2]. Experimental scientists use computers for instru-
ment control, data acquisition and data analysis, and the
functionality available on scientific instrumentation con-
trolled by computers is improving rapidly. In addition, an
increasing number of scientists no longer conduct physical
experiments, instead using simulation or data-mining to
discover new knowledge from existing data [3]. Artificial
intelligence (AI) has been used in an attempt to automate
some of the intelligent aspects of the scientific discovery
process still predominantly carried out by human scientists.
Some examples of systems using AI components follow:
DENDRAL was an AI program developed in the 1960s
that used background knowledge of chemistry to analyse
experimental mass spectrometry data. It used heuristic search to
determine solutions for the chemical structures responsible
for the spectra, and was the first application of AI to a prob-
lem of scientific reasoning. This version became known as
Heuristic-DENDRAL. A variant called Meta-DENDRAL
followed, and was the first expert system for scientific
hypothesis formation. It took a set of possible chemical
structures and corresponding mass spectra as input, and
inferred a set of hypotheses to explain the correlation between
some of the proposed structures and the mass spectrum.
This information was then used to describe the knowledge
that Heuristic-DENDRAL could utilise in its search for
suitable structures [4].
AM, the 'Automated Mathematician', was a heuristic artifi-
cial intelligence program that modelled mathematical dis-
covery in the mid-1970s [5]. It was said to have discovered
numbers, prime numbers and several interesting mathemat-
ical conjectures. This system later evolved into EURISKO,
developed in the late 1970s, which was more flexible in that
it could be applied to other task domains. EURISKO was
used successfully, for example, in optimising the design of
integrated circuits for microchips [5].
KEKADA was another heuristic-based system that
could develop hypotheses and plan experiments, searching
for surprising phenomena [6]. Kulkarni and Simon used
this system to model the discovery of the urea synthesis
pathway by Krebs. However, KEKADA had limited back-
ground knowledge when compared to human scientists, and
like AM and EURISKO, needed more heuristics in order to
continue its discoveries.
BACON [7], ABACUS [8], Fahrenheit [9] and IDS [10]
were automated data-driven discovery systems that could
discover scientific laws as algebraic equations. They relied
on data being entered by the experimenter, or on simulation
of experiments. More recently, a data-driven system used
iterative cycles of symbolic regression to extract natural
laws, such as geometric and momentum conservation laws,
from data captured in motion-tracking experiments [11].
Most intelligent scientific discovery programs still do not
'close the loop' (feeding their results back into their experi-
mental models) and do not design or execute their own
experiments. With recent advances in hardware and
technology, however, this limitation is diminishing. For
example, microfluidics ('lab-on-
a-chip') based approaches to experimentation may soon
allow small controllable biological experimental systems to
be linked to automated scientific discovery [12,13]. We
look forward to these advances in technology promoting
future developments in automated scientific discovery. Inte-
grated robotic systems are now capable of carrying out
highly complex scientific discovery processes [14-18]. The
majority of such automation is designed to perform experi-
ments, previously conducted manually, in a more efficient,
reliable and accurate manner than a human could ever
achieve. Automation also enables previously impractical
experiments to be carried out (such experiments may
involve dangerous chemicals, pathogenic organisms,
numerous assays or process steps, or need frequent mea-
surements over long periods). With this type of system a
human scientist typically sets up the automated system to
perform a sequence of processes, the system then executes
the various steps automatically, and finally the human anal-
yses the results. Any computation involved is usually con-
nected with running the system, or with data management,
visualisation and analysis. Complex databases are required
to manage scientific data and knowledge and, as a result,
science increasingly depends on efficient information man-
agement and manipulation [19].
Computational models or simulations have been used to
give insight into how complex systems work; they represent
the current state of understanding, provide a basis for pre-
dictions, and also have the benefit of being relatively cheap
to execute. Models can be perturbed by making computa-
tional modifications to external conditions or to the archi-
tecture of the models themselves, then tested against
acquired data. Such models can be constructed manually
using human knowledge (e.g. [20]) or be automatically
derived from experimental data (e.g. [11]).
The formal recording of scientific experimental data and
meta-data can help towards creating better models, as well
as facilitating the easier reuse of that data. Recording scien-
tific data and meta-data in formal languages can provide
complete, accurate, and detailed descriptions of why exper-
iments were performed, how they were carried out, what
the observed and derived results were, and how these
results were interpreted. A well-implemented formalisation
provides the transparency required for science, allowing
others to understand exactly why, as well as how an experi-
ment was done, and provides all the essential information
required to repeat that experiment.
To fully automate the scientific discovery process, com-
puters also need to be able to create the initial hypotheses
that define the reasons for carrying out the experiments, and
then to be capable of learning from the results. Deduction,
induction and abduction are types of logical reasoning used
in scientific discovery [21]. Deduction enables the infer-
ence of valid facts from existing known facts and rules,
induction enables the inference of hypothesised rules from
known facts, and abduction enables the inference of
hypothesised facts from known facts.
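To make these three forms of inference concrete, the following is a minimal, self-contained Python sketch using yeast-style facts of the kind that appear later in this article. The reaction names and the ORF YGL202W are invented for illustration, and the predicate spellings are ours; systems such as the Robot Scientist described below implement this kind of reasoning over logical models (e.g. Prolog programs), not Python code like this.

```python
# Toy facts in the style of a metabolic model. The reaction names and
# YGL202W are invented; YBR166C/1.1.1.25 and YER152C/2.6.1.39 are
# pairings mentioned elsewhere in this article.
facts = {
    ("encodes", "YBR166C", "1.1.1.25"),        # ORF encodes enzyme class
    ("catalyses", "1.1.1.25", "reaction_R1"),  # enzyme catalyses reaction
    ("catalyses", "2.6.1.39", "reaction_R2"),  # orphan: no 'encodes' fact yet
}

def deduce(facts):
    """Deduction: valid facts from known facts plus the rule 'if an ORF
    encodes enzyme E and E catalyses reaction R, the ORF is required for R'."""
    return {("required_for", orf, rxn)
            for (p, orf, ec) in facts if p == "encodes"
            for (q, ec2, rxn) in facts if q == "catalyses" and ec2 == ec}

def abduce(facts, orphan_reaction, candidate_orfs):
    """Abduction: hypothesised facts (candidate 'encodes' relations) that,
    if true, would explain who catalyses the orphan reaction."""
    ecs = {ec for (p, ec, rxn) in facts
           if p == "catalyses" and rxn == orphan_reaction}
    return {("encodes", orf, ec) for orf in candidate_orfs for ec in ecs}

def induce(examples):
    """Induction: a hypothesised rule generalised from ground examples -
    here, 'this ORF is essential' if every deletant replicate failed to grow."""
    return {orf for (orf, _) in examples
            if all(not grew for (o, grew) in examples if o == orf)}

print(deduce(facts))                                         # deduced fact
print(abduce(facts, "reaction_R2", ["YER152C", "YGL202W"]))  # hypothesised facts
print(induce([("YER152C", False), ("YER152C", False), ("YGL202W", True)]))
```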
The full automation of science requires 'closed-loop
learning', where the computer not only analyses the results,
but learns from them and feeds the resulting knowledge
back into the next cycle of the process [22]. Computational
closed-loop learning systems have certain advantages over
human scientists: their biases are explicit, they can produce
full records of their reasoning processes, they can incorpo-
rate large volumes of explicit background knowledge, they
can incorporate explicit complex models, they can analyse
data much faster, and they do not need to rest.
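As a schematic illustration, the toy loop below repeatedly chooses a hypothesis, 'runs' a simulated experiment, and feeds the outcome back into its working model until one hypothesis survives. Everything here is invented for illustration (the hidden ground truth, the candidate names, the naive choice of experiment); real systems choose experiments far more intelligently and execute them physically.

```python
import random

TRUE_ORF = "YER152C"   # hidden ground truth the loop must discover (invented setup)

def run_experiment(orf):
    """Stand-in for a physical growth experiment on the deletant strain."""
    return orf == TRUE_ORF

def closed_loop(candidates):
    model = set(candidates)                         # current set of live hypotheses
    cycle = 0
    while len(model) > 1:
        cycle += 1
        hypothesis = random.choice(sorted(model))   # design: pick what to test
        if run_experiment(hypothesis):              # execute and interpret
            model = {hypothesis}                    # confirmed
        else:
            model.discard(hypothesis)               # refuted: update the model
        print(f"cycle {cycle}: tested {hypothesis}, remaining {sorted(model)}")
    return model.pop()

print("identified:", closed_loop(["YBR166C", "YER152C", "YGL202W"]))
```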
The Robot Scientist concept
The combination of computational methods, automated
instruments, closed-loop learning, advanced laboratory
robotic systems and formal logical expression of informa-
tion leads to the concept of a 'Robot Scientist' [23]. A Robot
Scientist uses techniques from the field of artificial intelli-
gence to carry out cycles of experimentation on a laboratory
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 3 of 11
robotic system. It automatically generates hypotheses from
the available background knowledge and model(s), designs
physical experiments to test these hypotheses, carries out
the experiments on a laboratory robotic system, and then
analyses and interprets the results (see Figure 1). Because
computers are involved throughout, it is possible to explic-
itly capture every detail of the scientific discovery process:
goals, hypotheses, results, conclusions, etc. Moreover, in
addition to all the direct experimental data there is also a
wealth of useful meta-data that can be captured, such as
environmental conditions, detailed experiment content lay-
out information, and instrument settings, protocols and run-
time logs. These meta-data can be especially important
when studying complex biological systems where the spe-
cifics of the environment can have such a large effect on
results.
Robot Scientist prototypes
Here we describe our two prototype Robot Scientists,
'Adam' and 'Eve'. Adam has already proven itself by dis-
covering new knowledge [24], whilst Eve is still under
development. Both robots are designed to carry out bio-
medical scientific research.
A Robot Scientist to study yeast metabolism - 'Adam'
Our first prototype Robot Scientist, Adam, was physically
commissioned at the end of 2005 (see Figure 2). It was
designed to carry out microbial growth experiments to
study functional genomics in the yeast Saccharomyces cer-
evisiae, specifically to identify the genes encoding 'locally
orphan enzymes'. A locally orphan enzyme is an enzyme
that is known to exist in an organism, but where the corre-
sponding gene is as yet unidentified (definition agreed with
Yannick Pouliot and Peter D. Karp, who defined the term
'orphan enzyme' which has a slightly different meaning, see
[25]). Adam uses a comprehensive logical model of yeast
metabolism (based on the Förster iFF708 model), coupled
with a bioinformatic database (Kyoto Encyclopaedia of
Genes and Genomes - KEGG) and standard bioinformatics
homology search techniques (PSI-BLAST and FASTA) to
hypothesise likely candidate genes that may encode the
locally orphan enzymes. This hypothesis generation pro-
cess is abductive. Two types of hypothesis are generated.
The first level links an orphan enzyme, represented
by its enzyme class (E.C.) number, to a gene (ORF) that
potentially encodes it. This relation is expressed as a two
place predicate where the first argument is the ORF and the
second the E.C. number. An example hypothesis at this
level is:
encodesORFtoEC('YBR166C', '1.1.1.25')
The second level of hypothesis involves the association
between a deletant strain, referenced via the name of its
missing ORF, and a chemical compound which should
affect the growth of the strain, if added as a nutrient to its
environment. This level of hypothesis is derived from the
first by logical inference using our model of yeast metabo-
lism. An example of such a hypothesis is:
affects_growth('C00108', 'YBR166C')
where the first
argument is the compound (names according to KEGG) and
the second argument is the deletant strain. More examples
of Adam's hypotheses can be found on our website (see
appendix note 1). Adam then designs the experimental
assays required to test these hypotheses for execution on the
laboratory robotic system. These experiments are based on
a two-factor design that compares multiple replicates of
deletant strains, with and without metabolites,
Figure 1 Hypothesis-driven closed-loop learning. Diagram showing how iterative cycles of hypothesis-driven experimentation allow for the autonomous generation of new scientific knowledge.
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 4 of 11
against wild type strain controls with and without metabo-
lites. Full details of the experiment design process (such as
how suitable metabolites were chosen) can be found on the
Robot Scientist website (see appendix note 1).
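To illustrate the logic of this design, the sketch below computes a strain x metabolite interaction effect from replicate growth rates: a metabolite that rescues the deletant much more than it changes the wild type supports the hypothesised gene-enzyme link. The numbers are invented for illustration; Adam's real analysis uses the spline-extracted parameters and machine learning described later.

```python
import numpy as np

# Replicate growth rates for the four conditions of the two-factor
# design (invented illustrative numbers, not Adam's data).
growth = {
    ("deletant", "minus_metabolite"): np.array([0.10, 0.12, 0.11, 0.09]),
    ("deletant", "plus_metabolite"):  np.array([0.30, 0.28, 0.31, 0.29]),
    ("wildtype", "minus_metabolite"): np.array([0.33, 0.35, 0.34, 0.32]),
    ("wildtype", "plus_metabolite"):  np.array([0.34, 0.36, 0.33, 0.35]),
}

m = {k: v.mean() for k, v in growth.items()}

# Interaction contrast: how much more the metabolite helps the deletant
# than it changes the wild type. A large value supports the hypothesis.
interaction = ((m[("deletant", "plus_metabolite")] - m[("deletant", "minus_metabolite")])
               - (m[("wildtype", "plus_metabolite")] - m[("wildtype", "minus_metabolite")]))
print(f"strain x metabolite interaction: {interaction:.3f}")
```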
Adam's robotic system comprises various automated lab-
oratory instruments served by three robotic arms (see Fig-
ure 3). Experiments are created by combining the planned
yeast strains, metabolites and defined growth medium solu-
tions in SBS (Society for Biomolecular Screening) format
microtitre plates, at medium to high throughput using a
number of conventional liquid handlers (one of which is
capable of aspirating or dispensing different volumes in 96
wells simultaneously - see appendix note 2). Adam is capa-
ble of creating up to 1000 individual experiments in a day,
with a typical experiment running for 4 days.
The system's observations are optical density (OD) mea-
surements taken at 595 nm, recorded by the two microtitre
plate readers (see appendix note 3); plotted over time, these
form growth curves that act as a proxy for cellular growth
and indicate phenotype. The growth curves are
smoothed, after which biologically significant parameters
are extracted and statistically analysed to determine
whether the original hypotheses have been confirmed or
refuted. Scientific knowledge gained from this process is
used to update the model of yeast metabolism. Full details
about all these processes can be found on the Robot Scien-
tist website (see appendix note 1).
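The following Python sketch illustrates this kind of growth-curve processing on a synthetic curve: a cubic smoothing spline is fitted to noisy OD595 readings, then two biologically significant parameters, maximum growth rate and lag time, are extracted. The synthetic data, the spline settings and the particular lag-time definition are illustrative assumptions, not Adam's actual routines.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic noisy growth curve (logistic shape) standing in for OD595
# readings; invented data, not Adam's.
t = np.linspace(0, 96, 97)                                   # hours
od = 1.0 / (1.0 + np.exp(-(t - 40.0) / 6.0))
od += np.random.default_rng(0).normal(0.0, 0.02, t.size)     # measurement noise

spline = UnivariateSpline(t, od, k=3, s=0.05)                # cubic smoothing spline
rate = spline.derivative()(t)                                # d(OD)/dt

max_rate = float(rate.max())                                 # max growth rate
t_max = float(t[rate.argmax()])

# One common lag-time definition: where the tangent at the point of
# maximum growth rate crosses the initial OD level.
od0 = float(spline(t[0]))
lag = t_max - (float(spline(t_max)) - od0) / max_rate

print(f"max growth rate {max_rate:.4f} OD/h, lag time {lag:.1f} h")
```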
The yeast S. cerevisiae is extensively studied as a model
for eukaryotic cells. Yeast has a small physical size and fast
generation time, which makes it suitable for high-through-
put experimentation, and its growth curves are rela-
tively easy to observe and highly sensitive to changes in
genotype and environment. There already exists a vast
amount of information about yeast, including a detailed
(but still incomplete) logical model of its metabolic path-
ways [26]. In Adam this model was used both as the basis
for forming hypotheses, and also when designing the exper-
iments required to test these hypotheses [20].
Adam's software
Adam is intended to be fully automated,
with human intervention required only to supply library
strain stocks and consumables. As such, Adam includes a
collection of software components that together allow the
system to perform cycles of experimentation.
In Adam's early work, the Inductive Logic Programming
program C-Progol 5 [27,28] was used to automatically infer
hypotheses for the investigation of aromatic amino acid
metabolism. The inference was a restricted form of Abduc-
tive Logic Programming [29], where an incomplete back-
ground theory and experimental observations were used to
infer facts concerning gene function. A logic program cor-
responding to the Aromatic Amino Acid Biosynthesis path-
way of S. cerevisiae was used as the background theory and
the inferred facts matched an ORF from yeast to a
catalysing enzyme. The hypotheses completed the back-
ground theory by rediscovering the missing ORF/enzyme
relations.
For Adam's most recent hypothesis generation work we
have a detailed logical computer model of the metabolic
reaction pathways in yeast (written in Prolog), from which
locally orphan enzymes are identified. The bioinformatics
method of hypothesis generation attempts to use
sequence similarity techniques to identify likely candidates
for the ORFs that catalyse these reactions, thereby allowing
the Robot Scientist to discover novel biology. The method
is described by the following steps:
Figure 2 Adam's laboratory robotic system. (a) An external view of Adam's laboratory robotic system, also showing Eve's on the far right, and (b) a view looking down through the middle of Adam's robotic system, again with Eve's beyond.
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 5 of 11
1. Identify Enzyme Commission (E.C.) numbers corre-
sponding to enzymes which participate in yeast metabo-
lism but have no known ORF assigned to them.
2. For each E.C. number find the ORFs in other organ-
isms that code for that enzyme. Use all organisms from
the KEGG genome database for this search. Collect all
amino acid sequences for these ORFs. These are known
as the 'query sequences'.
3. For each query sequence use sequence similarity
search (PSI-BLAST or FASTA) to identify the most
similar sequences/ORFs in S. cerevisiae.
4. A single hypothesis is the mapping of one S. cerevi-
siae ORF to one E.C. class, e.g. YER152C → 2.6.1.39.
There are typically many hypotheses for each enzyme
class (the code sketch below illustrates this pipeline).
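A compact Python rendering of these four steps follows. The KEGG and similarity-search results are hard-coded stand-ins here, and the gene names other than YER152C are invented; in the real system they come from database queries and PSI-BLAST/FASTA runs.

```python
# Hard-coded stand-ins for steps 2 and 3; in the real system these come
# from KEGG queries and PSI-BLAST/FASTA searches. Gene names other than
# YER152C (mentioned in step 4 above) are invented.
ORFS_CODING_FOR_EC = {"2.6.1.39": ["other_org_gene_A", "other_org_gene_B"]}
SIMILAR_YEAST_ORFS = {"other_org_gene_A": ["YER152C"],
                      "other_org_gene_B": ["YER152C", "YJL060W"]}

def generate_hypotheses(orphan_ecs):
    """Steps 1-4: each (ORF, E.C.) pair is one candidate hypothesis."""
    hypotheses = set()
    for ec in orphan_ecs:                                  # step 1: orphan enzymes
        for query in ORFS_CODING_FOR_EC.get(ec, []):       # step 2: query sequences
            for orf in SIMILAR_YEAST_ORFS.get(query, []):  # step 3: similarity hits
                hypotheses.add((orf, ec))                  # step 4: one hypothesis
    return hypotheses

print(generate_hypotheses(["2.6.1.39"]))
# e.g. {('YER152C', '2.6.1.39'), ('YJL060W', '2.6.1.39')}
```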
Experiment design code then uses the system model to
generate biological experiment plans involving deletant
strains and metabolite solutions to test the hypotheses, cre-
ating microplate layouts using Latin-square design to
improve the detection of quantitatively small differences
above the background noise. The microplate layouts and
related liquid handler volume files are passed to the robotic
system control software, which executes the experiment
plans. The resulting growth curve data is processed using
algorithms based on cubic splines to fit, smooth and de-
noise the curves, and then to extract biologically significant
parameters such as growth rate and lag time. The parame-
ters from multiple experiment replicates are analysed using
machine learning (random forests [30]) to obtain statisti-
cally significant results that can be used to either confirm or
refute hypotheses, potentially resulting in new scientific
knowledge that can be used to update the system model.
The cycle can then repeat with further hypothesis genera-
tion.
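As a flavour of that final analysis step, the sketch below trains a random forest [30] on synthetic replicate parameters (growth rate and lag time) and uses its out-of-bag accuracy as evidence that two conditions are separable. This is a generic illustration with invented data, not Adam's actual statistical pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic replicate parameters [max growth rate, lag time] for two
# conditions; invented numbers, not Adam's data.
rng = np.random.default_rng(1)
controls = rng.normal([0.30, 12.0], [0.03, 1.0], size=(20, 2))
treated = rng.normal([0.36, 9.0], [0.03, 1.0], size=(20, 2))

X = np.vstack([controls, treated])
y = np.array([0] * 20 + [1] * 20)   # 0 = control, 1 = deletant + metabolite

forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# A high out-of-bag accuracy means the two conditions are reliably
# separable on these parameters - evidence bearing on the hypothesis.
print(f"out-of-bag accuracy: {forest.oob_score_:.2f}")
```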
Finally, there is a comprehensive custom-made relational
database (MySQL) that stores all the data and meta-data
generated throughout the various stages.
Further details describing all the software and informatics
can be found online at our website (see appendix note 1).
Adam's results
Adam conceived 20 hypotheses concern-
ing the identity of genes encoding 13 locally orphan
enzymes in S. cerevisiae. Adam tested all these hypotheses
on its robotic system and was able to confirm by experi-
mentation, with a high degree of confidence, the correct-
ness of 12 of them [24]. Conventional manual biological
experiments were performed to verify 3 of these conclu-
Figure 3 Plan diagram of Adam's laboratory robotic system. Layout diagram of Adam's laboratory robotic system, comprising: [1] Liconic STR602 freezer, [2] Caliper Presto liquid handler, [3] Thermo 384 multidrop, [4] two Caliper Twister II robot arms, [5] Caliper Sciclone i1000 liquid handler, [6] Bio-Tek ELx405 plate washer, [7] Agilent (Velocity 11) VSpin plate centrifuge, [8] three Liconic STX40 incubators, [9] two Molecular Devices Spectramax 190 plate readers, [10] Variomag plate shaker, [11] IAI Corporation SCARA robot arm, [12] two pneumatically actuated plate slides, [13] two high-efficiency particulate air (HEPA) filters, and [14] aluminium and rigid transparent plastic enclosure. There are also four computers controlling the robotics, plus a networked computer server which runs all the other code vital to Adam's function: the metabolism model, bioinformatics, hypothesis generation, experiment planning, results relational database, data analysis etc. (not shown).
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 6 of 11
sions, and additional detailed literature searches revealed
evidence supporting a further 6. Subsequent compara-
tive genomics also indicated a number of possible reasons
why the identities of some of the genes encoding these
locally orphan enzymes had remained unknown for so long:
there appear to have been gene duplication events with
retained overlapping functions, a number of the enzymes
appear to catalyse more than one associated reaction, and
some of the functional annotations in the existing literature
are incorrect. Adam's combined use of bioinformatic and
quantitative phenotypic analyses was needed to deconvolve
this functional complexity. Interestingly, Adam also came
to an incorrect conclusion regarding one of its original 20
hypotheses, which highlights a weakness in its system
model. This was because the hypothesised gene candidate
YIL033C was predicted to be a glutaminase enzyme, and
Adam confirmed this activity experimentally by performing
assays involving 11 metabolites predicted to have a differen-
tial effect on a glutaminase deletant. However, it transpires
that YIL033C encodes a cAMP-dependent protein kinase sub-
unit that is involved in regulating metabolism, and this
could also explain the observed phenotype. Adam's current
metabolism system model does not represent kinase control
mechanisms, and so Adam did not take this into account.
There is also the possibility that YIL033C is both a kinase
and a glutaminase, and some evidence exists to support this
theory (see [31]).
See the Robot Scientist website for more details on
Adam's results, its hardware and its software (see appendix
note 4).
Formalisation
We formalised the information related to
Adam's investigations. This was based on the generic ontol-
ogy of scientific experiments: EXPO [32,33]. We devel-
oped a custom version of EXPO called LABORS (see
appendix note 5) which was tailored to formalise Adam's
experiments. We also developed an ontology to describe
experimental actions (both by humans and machines) called
EXACT [34].
LABORS was developed when no generic formalism for
the logical description of experiments was available. The
OBI project aims to provide such a formalism and the first
release is due in the near future [35]. The Robot Scientist
project joined the OBI consortium in October 2008, and the
LABORS representations which are common to other bio-
logical domains have been aligned with the OBI representa-
tions. However, other terms specific to automated
investigations still remain within LABORS. Use of
LABORS resulted in the generation of a nested tree struc-
ture 10 levels deep containing over 10,000 research ele-
ments, connecting all the experimental information to the
observations. These data are expressed in the logic pro-
gramming language Datalog [36], and have been made pub-
licly available (see the 'Results' section on the Robot
Scientist website - see appendix note 4). This logical
description makes Adam's investigations more transparent,
reproducible, and reusable.
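To give a flavour of what such logical records enable, the sketch below stores a few Datalog-style facts as Python tuples and traces a conclusion back to its supporting observations. The predicate names and identifiers are invented for illustration; the real LABORS vocabulary and the published Datalog files differ.

```python
# Datalog-style facts as (predicate, subject, object) triples. The
# predicate names and identifiers are invented; LABORS uses its own
# vocabulary across a nested structure some 10 levels deep.
facts = [
    ("has_hypothesis", "trial_41", "encodesORFtoEC(YBR166C, 1.1.1.25)"),
    ("has_experiment", "trial_41", "exp_1207"),
    ("has_observation", "exp_1207", "od595_curve_88321"),
    ("has_conclusion", "trial_41", "hypothesis_confirmed"),
]

def query(predicate, subject=None):
    """Objects of all matching facts - a one-step Datalog-style query."""
    return [obj for (p, s, obj) in facts
            if p == predicate and (subject is None or s == subject)]

# Trace a trial's conclusion back to the observations supporting it.
print("conclusion:", query("has_conclusion", "trial_41"))
for exp in query("has_experiment", "trial_41"):
    print(exp, "observed", query("has_observation", exp))
```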
A Robot Scientist to study chemical genetics and drug design -
'Eve'
Our second Robot Scientist, Eve, was physically commis-
sioned in the early part of 2009 (see Figure 4). Both the
software and the biological assays are still under develop-
ment. Eve is a prototype system to demonstrate the automa-
tion of closed-loop learning in drug-screening and design.
Eve's robotic system is capable of moderately high-
throughput compound screening (greater than 10,000 com-
pounds per day) and is designed to be flexible enough that
it can be rapidly re-configured to carry out a number of
different biological assays.
The main goal with Eve is to integrate machine learning
and automated quantitative structure-activity relationship
(QSAR [37]) into the drug-screening process, to improve
both the efficiency and quality, as well as reduce the cost, of
a primary drug screen. Eve will begin by performing a stan-
dard mass screen against the target assay, monitoring the
results in real time, and when sufficient hits are found it will
stop the mass screen. After verifying the hits Eve will then
switch to a more targeted approach using machine learning
and QSARs to look at the chemical structures of the hit
compounds, and generate hypotheses about what it consid-
ers would be useful compounds to test next. It then plans
the screening experiments to test these hypotheses, runs
these experiments on the robotic system, uses machine
learning to analyse these results, and then iteratively cycles
around testing other compounds until it can identify the best
set of lead compounds for the target. Eve will first test those
compounds which are available from its own compound
library, then suggest other compounds that are commer-
cially available that should be tested. Potentially Eve could
even suggest new compounds that should be synthesised for
testing.
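The sketch below illustrates the intended screen-learn-screen loop on synthetic data: a random-forest QSAR model is trained on the compounds assayed so far, and each cycle screens the untested compounds it predicts to be most active. The fingerprints, activities and batch size are all invented; Eve's actual learner and compound representations may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Invented stand-ins: random binary 'fingerprints' for a 14,400-compound
# library and a hidden activity function for the simulated assay.
rng = np.random.default_rng(2)
fingerprints = rng.integers(0, 2, size=(14400, 128))
true_activity = fingerprints[:, :5].sum(axis=1) + rng.normal(0, 0.3, 14400)

tested = list(rng.choice(14400, size=200, replace=False))   # initial mass screen
for cycle in range(5):
    qsar = RandomForestRegressor(n_estimators=100, random_state=0)
    qsar.fit(fingerprints[tested], true_activity[tested])   # learn from assays so far

    untested = np.setdiff1d(np.arange(14400), tested)
    predicted = qsar.predict(fingerprints[untested])
    batch = untested[np.argsort(predicted)[-96:]]           # one plate of best picks
    tested.extend(batch)                                    # 'run' the next assays

best = max(tested, key=lambda i: true_activity[i])
print(f"best compound found after targeted cycles: #{best}")
```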
Eve will initially use an automation-accessible compound
library of 14,400 chemical compounds: the Maybridge 'Hit-
finder' library (see appendix note 6). This compound library
is cluster-based and was developed specifically to contain a
diverse range of compounds. It was selected as a subset of
the full Maybridge compound library by a two-stage filter-
ing process based first on 'Lipinski's rule of five' [38] to
reduce the set to 200,000 compounds, then secondly by
using a Pharmacophore Fingerprinting process [39] and
cluster analysis to further reduce the set to 14,400 com-
pounds. We realise that this is not a large compound library
by industrial standards; a pharmaceutical company may
have many hundreds of thousands or even millions of com-
pounds in its primary screening library. Our aim is to dem-
onstrate the proof-of-principle that incorporating machine
learning and QSARs into the screening process can
improve on the current mass screening approach.
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 7 of 11
Eve's laboratory robotic system (see Figure 5) contains a
carefully selected set of instruments designed to give the
system the flexibility to prepare and execute a broad variety
of biological assays, including: cellular growth assays, cell
based chemical compound screening assays, and cellular
morphology assays. There are three types of liquid handling
instruments included in the system, one of which uses
advanced non-contact acoustic transference (see appendix
note 7), as used by many large pharmaceutical companies.
For observation of assays, the system contains two multi-
functional microplate readers (see appendix note 8) capable
(with the appropriate filters) of recording measurements
across a broad range of both excitation and emission wave-
lengths. There is also an automated cellular imager (see
appendix note 9) capable of taking images of the well con-
tents of microplates using both bright-field and a broad
range of other wavelengths. This automated high-through-
put microscope could be used to collect cell morphological
information, for example to see how cells change size and
shape over time after the addition of specific compounds.
Also, the primary biological assays intended to be used on
Eve will create one or more fluorescent protein markers that
can be detected on the readers and imager, such that Eve
can not only quantify the amount of marker produced using
the readers, but also potentially localise it to specific cellu-
lar regions or organelles using the imager. Eve also utilises
control software for the robotic system that is flexible
enough to allow us to reconfigure the experimental process
(see appendix note 10). In all, we believe this system is
equivalent to the best systems available in the pharmaceu-
tical industry.
In addition, Eve is physically connected to Adam via
a linear track slide which allows the transfer of microtitre
plates in either direction. This will allow development of
assays using equipment from both systems and further
increases the flexibility available in experimental design.
Eve's software
As with our other Robot Scientist, Adam,
Eve is also intended to be fully automated. Eve will include
software components that will allow the system to perform
cycles of targeted drug screening. There are three stages to
this approach: mass screening, hit verification, and hypoth-
esis-driven targeted screening. For each of these stages
there will be experiment design code that generates the bio-
logical experiment plans, which combine chimeric yeast
target strains and chemical compounds.
First, the mass screening experiment plans will be gener-
ated and passed to the robotic system control software (see
appendix note 10) for execution. Monitoring software will
automatically analyse the mass screening data in real time
to identify and quantify chemical 'hits', and when there are
sufficient hits, the system will be switched into the verifica-
tion mode.
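A minimal sketch of this kind of real-time hit detection follows: compound wells are normalised against a plate's positive and negative control wells, and wells beyond a percent-inhibition threshold are flagged. All numbers, including the 30% threshold, are illustrative assumptions rather than Eve's actual criteria.

```python
import numpy as np

# One simulated 384-well plate: 16 negative controls (no inhibition),
# 16 positive controls (full inhibition), 352 compound wells with two
# planted actives. All numbers are invented.
rng = np.random.default_rng(3)
neg = rng.normal(1.00, 0.05, 16)          # uninhibited signal
pos = rng.normal(0.10, 0.02, 16)          # fully inhibited signal
wells = rng.normal(1.00, 0.08, 352)
wells[[10, 200]] = [0.35, 0.20]           # planted hits

span = neg.mean() - pos.mean()
inhibition = (neg.mean() - wells) / span * 100.0   # percent inhibition per well

hits = np.flatnonzero(inhibition > 30.0)           # illustrative 30% threshold
print("hit wells:", hits, "inhibition %:", inhibition[hits].round(1))
```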
Experiment generation code will then create the verifica-
tion microplate layouts to re-screen each hit against the tar-
get at multiple concentrations and with multiple replicates.
Once these have been executed on the robotic system, the
resulting dose-response curves will be analysed by curve
smoothing algorithms and statistical tests to create a refined
list of verified hits that includes quantitative information
about how each chemical affected the target.
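The sketch below shows one standard way to quantify a verified hit: fitting a four-parameter logistic (Hill) curve to dose-response data with scipy, yielding an IC50 and slope. The data points are invented, and Eve's actual curve-smoothing algorithms and statistical tests may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(conc, bottom, top, ic50, slope):
    """Four-parameter logistic (Hill) dose-response curve."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)

# Invented dose-response points for one verified hit (signal falls as
# concentration rises).
conc = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])        # concentration
signal = np.array([0.98, 0.95, 0.85, 0.60, 0.30, 0.12, 0.05])  # normalised response

params, _ = curve_fit(hill, conc, signal, p0=[0.0, 1.0, 0.5, 1.0],
                      bounds=([-0.5, 0.5, 1e-3, 0.1], [0.5, 1.5, 100.0, 5.0]))
bottom, top, ic50, slope = params
print(f"estimated IC50: {ic50:.2f} concentration units, Hill slope: {slope:.2f}")
```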
This list will then be passed to machine learning code that
uses the quantitative information and QSARs to inspect the
chemical structures of the hits and create hypotheses about
other possibly active compounds. Cycles of targeted screen-
ing then commence to test these hypotheses, with the exper-
iment planning code generating microplate layouts and
executing them on the robotic system. The aim is for the
machine learning code to analyse each successive cycle of
Figure 4 Eve's laboratory robotic system. (a) An external view of Eve's laboratory robotic system, also showing Adam's at the extreme left, and (b) a view looking down on some of the instruments within Eve's robotic system.
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 8 of 11
this targeted screening and progressively refine an optimal
list of lead compounds.
Eve will use the same relational database that was devel-
oped for Adam to store all its data and meta-data, with
some modifications specific to the drug discovery research
and storing additional instrument meta-data.
Formalisation of Eve
As with Adam, the intention is to
formalise all data and meta-data relating to Eve, again by
creating a derivation of the EXPO ontology. There are
many similarities between Adam and Eve's data and meta-
data, as well as in the types of instruments used on the sys-
tems, so we plan to identify the common elements in both
the LABORS and new Eve ontologies, and update EXPO to
include this common understanding.
The Future
The immediate future for our Robot Scientist work involves
the continued development of the new Eve system and the
improvement and continued use of the Adam system. Addi-
tionally, the two systems will be capable of working
together to address scientific questions. By combining the
functionality of the instrumentation on both systems we
will be able to discover more about yeast, bacteria and C.
elegans by preparing experiments that measure a variety of
phenotypes including growth, fluorescent protein expres-
sion, cellular morphology, chemical susceptibility, growth
competition, multiple gene deletions, and other visible phe-
notypic assays. For example, Adam's robotic system could
prepare plates containing multiple cellular strains that could
then be passed to Eve's robotic system where multiple com-
pounds could be added before visualisation of the morpho-
logical effects using the automated cellular imager. The
longer-term aim is for Robot Scientists to become more
commonplace in the laboratory: automating reasoning,
decision making and information management, as well as
automating the execution of experimental procedures. We
believe that such automation is necessary to meet the chal-
lenges of 21st century biological research.
Discussion
Any automated scientific discovery system or Robot Scien-
tist clearly has both advantages and disadvantages. Some of
the main points are discussed here.
It has been suggested that systems such as ours would be
better described as 'Laboratory Assistants' rather than the
implicitly more independent term 'Scientists'. In some
ways the term 'Laboratory Assistant' has merit, as these
systems are not yet independent workers. In other, more impor-
tant ways, the term is inappropriate as Laboratory Assis-
Figure 5 Plan diagram of Eve's laboratory robotic system. Layout diagram of Eve's laboratory robotic system, comprising: [1] Labcyte Echo 550 acoustic liquid handler, [2] BMG Pherastar reader, [3] MDS ImageXpress Micro cellular imager, [4] BMG Polarstar reader, [5] Cytomat 2C435 incubator, [6] Cytomat 6003 dry store, [7] FluidX DC-96pro capper/recapper, [8] two Variomag teleshake plate shakers and two Metrologic Orbit 1D barcode readers, [9] Cytomat linear actuator track, [10] robot plinth holding two Mitsubishi robot arms, models RV-3SJB and RV-3SJ, [11] FluidX Xtr-96 tube rack 2D barcode scanner, [12] Agilent (Velocity 11) Bravo liquid handler, [13] Thermo Combi-nL multidrop, [14] two Thermo Combi multidrops, and [15] consumables stacks for microplates, tube racks and tips. There are also two computers controlling the robotics, plus a networked computer server which runs all the other code vital to Eve's function: the chemistry knowledge base, QSARs and hypothesis generation, experiment planning, results relational database, data analysis etc. (not shown).
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 9 of 11
tants do not generally form hypotheses, decide on the
experiments to test them, automatically analyse and inter-
pret the results etc. It should also be remembered that these
systems are still just prototypes, and it is probable that
future developments in hardware and software will increase
the independent nature of such systems. So, on balance, we
prefer the more evocative term 'Robot Scientist', and argue
that the Adam system has discovered new knowledge about
gene function in S. cerevisiae that has been independently
verified [24].
Another common argument against Robot Scientists is
that they remove the chance for serendipitous discovery,
and that they are incapable of innovation. We would argue
that more often than not a serendipitous discovery is simply
the result of an experiment that has been designed without
prior analysis of all the potential outcomes. Louis Pasteur
phrased this sentiment as 'In the fields of observation
chance favours only the prepared mind'. While it is true that
the underlying artificial intelligence components fail to
meet human expectations for innovative thought, we
believe that developing a richer background model and
incorporating more sophisticated reasoning mechanisms
will bring us closer to that goal.
It is also true that a Robot Scientist generates its hypothe-
ses based on information obtained from publicly available
databases, and as such is susceptible to any errors contained
therein. However, this is no different to the situation in
which human scientists find themselves, as they also have
to rely on published information. Most such databases are
curated by humans, and provide a service that biologists
routinely use in their work. Both could choose (or in the
case of the Robot Scientist be programmed) to assign
weightings to their confidence in various pieces of evidence
based on how they were labelled in the databases (e.g. indi-
rect vs direct experimental evidence). When errors are pres-
ent the robot is most likely to propose an incorrect
hypothesis which the experimental data will then refute.
For Adam we avoided problems by primarily using only
one public database (KEGG) and manually updating our
system model where conflicts were noticed, before allow-
ing automated generation of hypotheses. Where a Robot
Scientist may have more trouble is where it lacks the
broader background knowledge base which may be avail-
able to a human scientist (e.g. the problem with Adam's sys-
tem model not representing kinases mentioned earlier). A
better system model and a broader knowledge base can be
developed for Robot Scientists that in time would negate
this difference.
Similarly, it has been pointed out that the data analysis
algorithms of a Robot Scientist might be less able to deal
with flaws in experimental measurements than a human,
and may come to false conclusions as a consequence. We
believe this to be mostly a matter of refinement of program-
ming; for example, Adam's growth curve smoothing and
de-noising routines use machine learning and statistical
data analysis of multiple replicates to routinely deal with
the effects of significant noise, contamination, and even
gaps in the measured readings. Further refinement could be
done to identify abnormal or other unexpected results, for
example the shape of a bacterial contamination growth
curve, or the long lag time associated with yeast cross-con-
tamination, and then discount them automatically. The
advantage of a Robot Scientist here is that it would always
be consistent in its handling of the data. More physical
issues with an experiment, for example flaws in plastic-
ware, faults in instruments, incorrect placement of plates
in instruments etc. are currently easier for a human to notice
and correct. Whilst there are some measures we can put in
place to automatically deal with this type of issue (e.g. cali-
bration and fault detection system checks prior to experi-
ment runs), we believe that future refinements of the
hardware and plasticware used in laboratory automation
will reduce the effects of this type of problem.
Finally, there has also been discussion about costs, com-
paring the cost of using a Robot Scientist against using
humans to perform the same tasks. There is a substantial
cost for these systems, not only in initial capital outlay and
user training, but also in ongoing servicing and mainte-
nance costs, and we would not currently consider them to
be 'cost-effective' in comparison to human scientists. How-
ever, these systems are early prototypes, and we would
expect such costs to reduce significantly as laboratory auto-
mation becomes more widespread, more reliable, and the
software more user-friendly. The cost of hiring human sci-
entists and technicians and buying the instruments and
equipment they need to perform such high-throughput and
complex experiments should not be underestimated either,
and the robots have the advantages of efficiency, consistent
quality, and the ability to run outside normal working hours.
Conclusions
Robot Scientists are the next logical step in laboratory auto-
mation. They can automate all aspects of the scientific dis-
covery process: they generate hypotheses from a computer
model of the domain, design experiments to test these
hypotheses, run the physical experiments using robotic sys-
tems, and then analyse and interpret the results. They also
have the potential to record every detail of what they have
done and why, enabling scientific investigations to be more
reproducible and reusable. We look forward to a time
when Robot Scientists will commonly work alongside human
scientists to advance the progress of science.
Appendix
1. Robot Scientist informatics, http://www.aber.ac.uk/
compsci/Research/bio/robotsci/data/informatics/
2. Sciclone i1000, Caliper Life Sciences, Hopkinton,
MA, USA
Sparkes et al, Automated Experimentation 2010, 2:1
http://www.aejournal.net/content/2/1/1
Page 10 of 11
3. Spectramax 190 readers, MDS Analytical Technolo-
gies, Concord, Ontario, Canada
4. Robot Scientist website, at http://www.aber.ac.uk/en/
cs/research/cb/projects/robotscientist/
5. LABORS ontology: http://www.aber.ac.uk/compsci/
Research/bio/robotsci/data/data/LABORS.owl
6. Maybridge Hitfinder library, Maybridge, Cornwall,
UK
7. Echo 550 acoustic liquid handler, Labcyte, Sunny-
vale, CA, USA
8. Pherastar and Polarstar readers, BMG Labtech GmbH,
Offenburg, Germany
9. ImageXpress Micro, MDS Analytical Technologies,
Concord, Ontario, Canada
10. Overlord 2 control software, PAA Ltd., Farnbor-
ough, UK
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
Areas of work and authors listed alphabetically:
AI software (EB, KEW); automation hardware and control software (AC, AS, JR);
automation requirements specification (AC, AS, KEW, JR, LNS, MY, WA); biology
and manual experiments (MM, MY, WA); chemical compound definition (MNK);
growth curve analysis (JR, ML, MY); hypothesis generation and bioinformatics
(KEW, ML); logical formalisation of data (AC, LNS, ML); ontologies (LNS); pro-
curement (JR); relational database (AC, AS, MNK); statistics and data analysis
(AC, AS, ML); writing this manuscript (AC, AS, JR, ML); yeast metabolism model
(KEW).
RDK conceived the idea, led the project and was involved in all aspects of it.
All authors read and approved the final manuscript.
Acknowledgements
Thanks to the reviewers for their helpful suggestions and comments.
Thanks also to our laboratory technicians for all their hard work maintaining
and supplying the robotic systems: Magdalena Markham, Katherine Martin
and Ronald Pateman.
This work was funded by the U.K. Biotechnology and Biological Sciences
Research Council grants, by SRIF 2 and SRIF 3 awards from the Higher Educa-
tion Funding Council for Wales, and by fellowships from the Royal Academy of
Engineering/Engineering and Physical Sciences Research Council, and the
Royal Commission for the Great Exhibition of 1851 for AC, and from Research
Councils UK for LS.
Author Details
1Computational Biology Group, Department of Computer Science, Penglais
Campus, Aberystwyth University, Aberystwyth, SY23 3DB, UK,
2Institute of Biological, Environmental and Rural Sciences, Aberystwyth
University, SY23 3DD, UK and
3School of Engineering and Information, Middlesex University, NW4 4BT, UK
References
1. Langley P: The computer-aided discovery of scientific knowledge.
Lecture Notes in Computer Science Proceedings of the First International
Conference on Discovery Science 1998, 1532:25-39.
2. Džeroski S, Todorovski L, (Eds): Computational Discovery of Scientific
Knowledge: Introduction, Techniques, and Applications in Environmental and
Life Sciences Berlin, Heidelberg: Springer-Verlag; 2007.
3. Szalay A, Gray J: 2020 Computing: Science in an exponential world.
Nature 2006, 440(7083):413-414.
4. Lindsay R, Buchanan B, Feigenbaum E, Lederberg J: DENDRAL: A Case
Study of the First Expert System for Scientific Hypothesis Formation.
Artificial Intelligence 1993, 61(2):209-261.
5. Lenat DB, Brown JS: Why AM and EURISKO appear to work. Artificial
Intelligence 1984, 23(3):269-294.
6. Kulkarni D, Simon H: The processes of scientific discovery: The strategy
of experimentation. Cognitive Science: A Multidisciplinary Journal 1988,
12(2):139-175.
7. Langley P, Simon H: Scientific discovery: Computational explorations of the
creative processes. The MIT Press; 1987.
8. Falkenhainer B, Michalski R: Integrating quantitative and qualitative
discovery: the ABACUS system. Machine Learning 1986, 1(4):367-401.
9. Zytkow J: Automated discovery of empirical laws. Fundamenta
Informaticae 1996, 27(2-3):299-318.
10. Nordhausen B, Langley P: A robust approach to numeric discovery. In
Proceedings of the seventh international conference (1990) on Machine
learning Morgan Kaufmann Publishers Inc. San Francisco, CA, USA;
1990:411-418.
11. Schmidt M, Lipson H: Distilling Free-Form Natural Laws from
Experimental Data. Science 2009, 324(5923):81-85.
12. Muggleton S: 2020 computing: exceeding human limits. Nature 2006,
440(7083):409-410.
13. Dittrich P, Manz A: Lab-on-a-chip: microfluidics in drug discovery.
Nature Reviews Drug Discovery 2006, 5(3):210-218.
14. Yamashita T, Nishimura I, Nakamura T, Fukami T: A System for LogD
Screening of New Drug Candidates Using a Water-Plug Injection
Method and Automated Liquid Handler. Journal of the Association for
Laboratory Automation 2009, 14(2):76-81.
15. Hogan C, Simons S, Zhang H, Burdick D: Living with Irresolute Cell Lines
in an Automated World. Journal of the Association for Laboratory
Automation 2008, 13(3):159-167.
16. Saitoh S, Yoshimori T: Fully Automated Laboratory Robotic System for
Automating Sample Preparation and Analysis to Reduce Cost and Time
in Drug Development Process. Journal of the Association for Laboratory
Automation 2008.
17. Manley J, Smith T, Holden J, Edwards R, Liptrot G: Modular approaches to
automation system design using industrial robots. Journal of the
Association for Laboratory Automation 2008, 13:13-23.
18. Maccio M, Bell D, Davolos D: Modular Automation Platforms: A Case
Study of a Flexible NMR Sample Preparation Robot. Journal of the
Association for Laboratory Automation 2006, 11(6):387-398.
19. Foster I: 2020 Computing: A two-way street to science's future. Nature
2006, 440(7083):419.
20. Whelan K, King R: Using a logical model to predict the growth of yeast.
BMC Bioinformatics 2008, 9:97.
21. Flach P, Kakas A, Ray O: Abduction, Induction, and the Logic of Scientific
Knowledge Development. In Workshop on Abduction and Induction in AI
and Scientific Modelling Citeseer; 2006.
22. Michalski R, Watanabe L: Constructive Closed-Loop Learning:
Introductory Ideas and Examples. Tech. Rep. No. MLI-Report
88-1, Fairfax, VA: George Mason University, Artificial Intelligence Center;
1988.
23. King R, Whelan K, Jones F, Reiser P, Bryant C, Muggleton S, Kell D, Oliver S:
Functional genomic hypothesis generation and experimentation by a
robot scientist. Nature 2004, 427:247-252.
24. King R, Rowland J, Oliver S, Young M, Aubrey W, Byrne E, Liakata M,
Markham M, Pir P, Soldatova L, Sparkes A, Whelan K, Clare A: The
Automation of Science. Science 2009, 324(5923):85-89.
25. Pouliot Y, Karp P: A survey of orphan enzyme activities. BMC
Bioinformatics 2007, 8:244.
26. Reiser P, King R, Kell D, Muggleton S, Bryant C, Oliver S: Developing a
logical model of yeast metabolism. Electronic Transactions in Artificial
Intelligence 2001, 5:223-244.
27. Muggleton S, De Raedt L: Inductive logic programming: Theory and
methods. Journal of Logic Programming 1994, 19/20:629-679.
28. Muggleton S: Inverse entailment and Progol. New Generation Computing
1995, 13(3):245-286.
29. Kakas A, Kowalski R, Toni F: Abductive logic programming. Journal of
Logic and Computation 1992, 2(6):719.
30. Breiman L: Random forests. Machine Learning 2001, 45:5-32.
31. Cannon J, Gitan R, Tatchell K: Yeast cAMP-dependent protein kinase
regulatory subunit mutations display a variety of phenotypes. Journal
of Biological Chemistry 1990, 265(20):11897-11904.
32. Soldatova L, Clare A, Sparkes A, King R: An ontology for a Robot
Scientist. Bioinformatics 2006, 22(14).
33. Soldatova L, King R: An ontology of scientific experiments. Journal of the
Royal Society Interface 2006, 3(11):795-803.
34. Soldatova L, Aubrey W, King R, Clare A: The EXACT description of
biomedical protocols. Bioinformatics 2008, 24(13):i295.
35. The OBI Consortium: Modeling biomedical experimental processes
with OBI. In Bio-Ontologies SIG, Proceedings ISMB/ECCB 2009. Oxford
University Press; 2009.
36. Ullman J: Principles of Database and Knowledge-Base Systems, Volume I:
Classical Database Systems. Computer Science Press, Inc., New York, NY,
USA; 1988.
37. Dudek A, Arodz T, Gálvez J: Computational methods in developing
quantitative structure-activity relationships (QSAR): a review.
Combinatorial Chemistry and High Throughput Screening 2006, 9(3):213.
38. Lipinski C, Lombardo F, Dominy B, Feeney P: Experimental and
computational approaches to estimate solubility and permeability in
drug discovery and development settings. Advanced Drug Delivery
Reviews 1997, 23(1-3):3-25.
39. Butina D: Unsupervised data base clustering based on Daylight's
fingerprint and Tanimoto similarity: A fast and automated way to
cluster small and large data sets. Journal of Chemical Information and
Computer Sciences 1999, 39(4):747-750.
doi: 10.1186/1759-4499-2-1
... We define science automation broadly as the use of automation to perform tasks that would otherwise require human researchers to perform. Science automation ranges from automating mundane research tasks such as pipetting to autonomous 'robot scientists' that automate the process of generating hypotheses, designing and performing experiments, collecting and analysing data, and updating its model of the phenomena being studied based on this analysis (Sparkes et al. 2010). ...
... Autonomous experimentation platforms are intended to perform autonomous scientific discovery and to serve as 'robot scientists', where automation technology can perform every step of the research process (Sparkes et al. 2010). Autonomous scientific discovery systems would be able to create hypotheses that explain observations, test these hypotheses by designing and experiments, interpreting the results of these experiments, and repeating this process in light of the new data it has acquired (King et al. 2018). ...
Article
Full-text available
Science is being transformed by the increasing capabilities of automation technologies and artificial intelligence (AI). Integrating AI and machine learning (ML) into scientific practice requires changing established research methods while maintaining a scientific understanding of research findings. Researchers are at the forefront of this change, but there is currently little understanding of how they are experiencing these upheavals in scientific practice. In this paper, we examine how researchers working in several research fields (automation engineering, computational design, conservation decision-making, materials science, and synthetic biology) perceive AI/ML technologies used in their work, such as laboratory automation, automated design of experiments, computational design, and computer experiments. We find that researchers emphasised the need for AI/ML technologies to have practical benefits (such as efficiency and improved safety) to justify their use. Researchers were also hesitant to automate data analysis, and the importance of explainability differed between researchers working with laboratory automation and those using AI/ML directly in their research. This difference is due to the different role AI/ML plays in different research fields: laboratory automation performs processes already defined by the researcher and the actions are visible or recorded, while in AI/ML applications the decisions that produced the result may be obscure to the researcher. Understanding the role AI/ML plays in scientific practice is important for ensuring that scientific knowledge continues to grow.
... AI can also play a significant role in the experimental design phase. By generating comparative models and simulations, AI can assist researchers in designing more targeted experiments, thus effectively validating hypotheses or exploring new technological path-ways [40]. This data-driven approach to experimental design can significantly improve the success rate of experiments and reduce unnecessary resource waste. ...
... Science is an activity to extend the wisdom of humankind, and the automation of science is the ultimate way to accelerate this activity [1][2][3] . In the fields of life sciences and chemistry, significant advancements have been made in automating experimental operations, exemplified by the widespread use of automated pipetting machines and laboratory robots [4][5][6][7][8][9][10][11][12][13] . ...
Preprint
The automation of experiments in life sciences and chemistry has significantly advanced with the development of various instruments and AI technologies. However, achieving full laboratory automation, where experiments conceived by scientists are seamlessly executed in automated laboratories, remains a challenge. We identify the lack of automation in planning and operational tasks--critical human-managed processes collectively termed "care"--as a major barrier. Automating care is the key enabler for full laboratory automation. To address this, we propose the concept of self-maintainability (SeM): the ability of a laboratory system to autonomously adapt to internal and external disturbances, maintaining operational readiness akin to living cells. A SeM-enabled laboratory features autonomous recognition of its state, dynamic resource and information management, and adaptive responses to unexpected conditions. This shifts the planning and execution of experimental workflows, including scheduling and reagent allocation, from humans to the system. We present a conceptual framework for implementing SeM-enabled laboratories, comprising three modules--Requirement manager, Labware manager, and Device manager--and a Central manager. SeM not only enables scientists to execute envisioned experiments seamlessly but also provides developers with a design concept that drives the technological innovations needed for full automation.
... The prospect of LPMs functioning as 'artificial scientists' who might possess 'artificial understanding' raises the question of whether such understanding should conform to extant criteria for human understanding or requires a novel conception of understanding [104]. As LPMs evolve from tools to autonomous agents, in line with notions such as Zytkow's 'robot discoverer' [121], the 'Robot Scientist' [122], and the recent 'Intelligent Agent system' [23], their growing role in theory development and conceptual innovation is an area that needs to be investigated. ...
Preprint
Full-text available
This paper explores ideas and provides a potential roadmap for the development and evaluation of physics-specific large-scale AI models, which we call Large Physics Models (LPMs). These models, based on foundation models such as Large Language Models (LLMs) - trained on broad data - are tailored to address the demands of physics research. LPMs can function independently or as part of an integrated framework. This framework can incorporate specialized tools, including symbolic reasoning modules for mathematical manipulations, frameworks to analyse specific experimental and simulated data, and mechanisms for synthesizing theories and scientific literature. We begin by examining whether the physics community should actively develop and refine dedicated models, rather than relying solely on commercial LLMs. We then outline how LPMs can be realized through interdisciplinary collaboration among experts in physics, computer science, and philosophy of science. To integrate these models effectively, we identify three key pillars: Development, Evaluation, and Philosophical Reflection. Development focuses on constructing models capable of processing physics texts, mathematical formulations, and diverse physical data. Evaluation assesses accuracy and reliability by testing and benchmarking. Finally, Philosophical Reflection encompasses the analysis of broader implications of LLMs in physics, including their potential to generate new scientific understanding and what novel collaboration dynamics might arise in research. Inspired by the organizational structure of experimental collaborations in particle physics, we propose a similarly interdisciplinary and collaborative approach to building and refining Large Physics Models. This roadmap provides specific objectives, defines pathways to achieve them, and identifies challenges that must be addressed to realise physics-specific large-scale AI models.
... A self-exploring automated experiment runs autonomously in a closed loop and consecutively selects which input parameter combinations to test, runs the experiments, evaluates the measured data, and iterates the process until a specific target has been reached [15,43,44]. The most common high-level targets for fluid experiments are: (a) mapping how output values of interest respond to variation of the input parameters; (b) exploring the parameter space to identify different flow regimes; (c) optimizing output values of interest; or (d) a combination of the above. ...
Article
Full-text available
Typical unsteady vortex-dominated flows like those involved in bio-inspired propulsion, airfoil separation, bluff body wakes, and vortex-induced vibrations can be prohibitively expensive to simulate and impossible to measure comprehensively. These examples are governed by nonlinear interactions, and often involve moving boundaries, high-dimensional parameter spaces, and multiscale flow structures. The classical way to get around these challenges has been to reduce the experimental complexity by using canonical motions or simplified unsteady inflow conditions. A paradigm shift is emerging in the form of self-exploring automated experiments that combine the automation of the experimental pipeline with data-science tools to increase experimental throughput and expedite scientific discovery. Such automated experiments can explore and exploit higher-dimensional parameter spaces and cover more realistic and technically relevant unsteady conditions compared to what is traditionally feasible with supervised canonical experiments. This alternative approach can yield robust and generalizable models and control solutions, as well as the discovery of rare and extreme events. Here, we provide a perspective on the transformative potential of self-exploring automated experiments for the discovery, optimization, and control of unsteady vortex-dominated flow phenomena.
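The closed loop described here (select parameters, run, evaluate, iterate until a target is met) can be written down in a few lines. A minimal sketch of our own, with a toy analytic stand-in for the physical experiment and a random-search proposal rule; a real self-exploring experiment would replace both, for instance with Bayesian optimisation driving an actual rig:

```python
import random


def run_experiment(params: dict[str, float]) -> float:
    # Toy stand-in for the physical rig: peak response at x=0.3, y=0.7.
    return -((params["x"] - 0.3) ** 2 + (params["y"] - 0.7) ** 2)


def propose(bounds: dict[str, tuple[float, float]]) -> dict[str, float]:
    # Random search; a real system would exploit the measurement
    # history to pick informative next points.
    return {k: random.uniform(lo, hi) for k, (lo, hi) in bounds.items()}


def autonomous_loop(bounds, budget=100, target=None):
    history = []
    for _ in range(budget):
        params = propose(bounds)
        value = run_experiment(params)    # run and measure
        history.append((params, value))   # evaluate and record
        if target is not None and value >= target:
            break                         # stop once the target is reached
    return max(history, key=lambda h: h[1])


best_params, best_value = autonomous_loop(
    {"x": (0.0, 1.0), "y": (0.0, 1.0)}, budget=200, target=-1e-3
)
print(best_params, best_value)
```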
... Stage III: towards complete manufacturing autonomy. Finally, a stage is reached where robot scientists can autonomously 'experiment and analyze' the AI-generated hypothesis to push the scientific boundaries, a typical example being the robot scientists Adam and Eve [92]. We argue that the manufacturing domain has good potential to be one of the earliest adopters of this level of autonomy due to the already existing deep ... [figure caption: the different domains of knowledge related to manufacturing processes and the potential role of AI in expanding these domains are illustrated.] ...
Article
Full-text available
The paper shares the author's perspectives on the role of explainable AI in the evolving landscape of AI-driven smart manufacturing decisions. First, critical perspectives on the reasons for the slow adoption of explainable AI in manufacturing are shared, leading to a discussion of its role and relevance in inspiring scientific understanding and discoveries towards achieving complete autonomy. Finally, to standardize explainability quantification, a new Transparency–Cohesion–Comprehensibility (TCC) evaluation framework is proposed and demonstrated.
Article
There has been much talk of, and substantial progress in, automated and flexible smart lab concepts in biopharma R&D. This is acknowledged to be important in enabling the acceleration of innovation and the digitization of R&D operations. However, many proposals stop short of full end-to-end automation (limiting out-of-hours operation, which is particularly important in tasks such as cell culture) or are locked to a particular vendor's offering in a dedicated system (which can limit the flexibility and shared-use access so important in R&D). In this contribution we describe a proof-of-concept of a fully integrated automated adherent cell culture system based on a modular architecture that allows integration of the most recent developments on the market (cell imaging, collaborative cloud robotics, mobile robots) as well as reuse of existing legacy devices (incubators and refrigerators). This creates a "cell culture autopilot" for small-scale cell culture, with repetitive media exchange, confluency checking, and splitting steps, which are typically labor-intensive and must take place at times outside the working day. The system is built around the open lab communication standard SiLA2 and various other open-source resources, demonstrating three ways in which the SiLA2 standard can be leveraged. This choice of connectivity options provides freedom to integrate the most appropriate device while minimizing undesired vendor lock-in. This paper provides sufficient details for the reader to access the resources to build on such a system for cell culture and other applications. We believe this to be the first report of a true vendor-agnostic system operating in a commercial environment. This paper corresponds to the special issue on Robotics in Laboratory Automation as it describes robotics for labware transportation within a shared environment, and an automation framework supporting physical and logical interoperability.
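The autopilot's repetitive decisions (feed, split, or leave in the incubator) amount to a small scheduling rule per flask. A minimal sketch of that decision step only, with hypothetical thresholds and names; the real system dispatches the chosen task to SiLA2-connected devices, which is not modelled here:

```python
from dataclasses import dataclass

CONFLUENCY_SPLIT = 0.80   # hypothetical threshold: passage at 80% confluency
MEDIA_INTERVAL_H = 48     # hypothetical: exchange media every 48 hours


@dataclass
class Flask:
    flask_id: str
    confluency: float        # latest cell-imaging result, 0..1
    hours_since_feed: float


def next_action(f: Flask) -> str:
    """Decide the next task for one flask."""
    if f.confluency >= CONFLUENCY_SPLIT:
        return "split"              # passage before overgrowth
    if f.hours_since_feed >= MEDIA_INTERVAL_H:
        return "exchange_media"     # routine feed
    return "incubate"               # nothing due; return to incubator


for flask in [Flask("A1", 0.85, 20), Flask("B2", 0.40, 50), Flask("C3", 0.30, 12)]:
    print(flask.flask_id, "->", next_action(flask))
```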
Chapter
A deeper understanding of the emergence of creative products in science is essential, as scientific products profoundly impact society by driving advancements in medicine, technology, and industry. In this chapter, we examine creative products in science through the 5A framework, emphasizing the interconnectedness of the actor (the creative scientist), the action (the scientific process), the audience (assessment), and affordances (material and socio-cultural resources) in the creation of scientific products (artefacts). Creative products, such as peer-reviewed publications and grant applications, are developed in an iterative cycle of divergent and convergent thinking, shaped by interactions with peers and access to resources. The audience plays an active role in assessing the value of these creative products, with peer reviewers and the broader scientific community acting as gatekeepers. Material and socio-cultural affordances, including funding, infrastructure, and intellectual freedom, further shape the development of scientific products.
Article
Full-text available
In this paper we outline some recent developments in the study of abduction and induction and their role in scientific modelling and knowledge refinement. We also describe a central challenge that appears to be emerging from this study: namely, the problem of developing practical approaches for exploiting abduction and induction, of formally characterising the limitations of such approaches, and of identifying the classes of real-world problems to which they can be usefully applied. Modelling a scientific domain is a continuous process of observing and understanding phenomena according to some currently available model, and using this understanding to improve the original domain model. In this process one starts with a relatively simple model which gets further improved and expanded as the process is iterated. At any given stage of its development, the current model is very likely to be incomplete. The task then is to use the information given to us by experimental observations to improve and possibly complete this description. The development of our theories is driven by the observations and the need for these theories to conform to the observations. This point of view forms the basis of many formal theories of scientific discovery (22, 7, 15) in the sense that the development of a scientific theory is considered to be an incremental process of refinement strongly guided by the empirical observations. Considering a logical approach to this problem of incremental development of a scientific model, philosophers of science have recognized the need to introduce new synthetic forms of reasoning, alongside the analytical reasoning form of deduction. Drawing on Aristotle's syllogistic logic, Charles Sanders Peirce (6, 21) distinguished between abduction and induction, and studied their respective roles in the development of scientific theories. More recently, several authors have studied abduction and induction from the perspective of Artificial Intelligence and Cognitive Science (8, 11, 17, 4). In particular, one recent volume (4) is devoted to the problem of comparing these two forms of reasoning and investigating their possible unification or integration for the purposes of Artificial Intelligence. Given a theory T describing our current (incomplete) model of the scientific domain under investigation, and a set of observations described by the sentences O, abduction and induction are employed ...
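The core abductive task described above (given a theory T and observations O, find hypotheses that, together with T, entail O) is easy to make concrete. Below is a toy propositional illustration of our own devising; the rule, fact, and proposition names are invented for the example and do not come from the paper:

```python
from itertools import combinations

# Toy theory as one propositional Horn rule: head <- body.
theory = {
    "growth": {"gene_active", "nutrient_present"},
}
abducibles = {"gene_active", "nutrient_present", "inhibitor_present"}
observations = {"growth"}
known_facts = {"nutrient_present"}


def entails(facts: set[str]) -> set[str]:
    """Forward-chain the Horn rules to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in theory.items():
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived


def abduce() -> list[set[str]]:
    """Find minimal sets of abducibles that, with the known facts,
    entail every observation."""
    solutions = []
    for r in range(len(abducibles) + 1):
        for hypo in map(set, combinations(sorted(abducibles), r)):
            if observations <= entails(known_facts | hypo):
                if not any(s <= hypo for s in solutions):  # keep minimal sets
                    solutions.append(hypo)
    return solutions


print(abduce())  # -> [{'gene_active'}]
```

Induction would instead generalise new rules for the theory from many such observations; the incremental refinement the abstract describes corresponds to adding abduced facts or induced rules back into the model and repeating the cycle.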
Article
A fully automated robotic system was developed and deployed in-house in a modular way to meet the needs of a high throughput chemistry laboratory. The main system components consist of a Stäubli TX60 industrial robot and a Vapourtec V-10 evaporator, with control software by Aitken Scientific. A custom server application was written by Stäubli Robotics to interface the robot and control software. The design was done using SolidWorks computer-aided design software to speed up development, with outsourced software development and hardware procurement or fabrication. Both hardware and software were modularized such that components could be reused in the future. An industrial robot and original equipment manufacturer (OEM) components were used to improve reliability and minimize support. A custom gripper was designed using a Schunk MPG50 pneumatic two-finger parallel actuator with stainless steel fingers. An injector station was designed to simplify and automate large volume evaporations, with built-in self-cleaning. Custom fabrication of racks, grippers, etc. was done using local precision engineering firms. Providing full documentation and training allows support to be done by third-party service engineers. Initial data show that the system is both intuitive and reliable in use.
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
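Breiman's two randomisation ingredients summarised above (a bootstrap sample per tree, and a random subset of features at each split) map directly onto modern library implementations, and the internal out-of-bag estimates the abstract mentions are also exposed. A minimal sketch, assuming scikit-learn is available and using synthetic data in place of a real task:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real classification problem.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample (bootstrap=True) and each
# split considers a random subset of features (max_features="sqrt"):
# exactly the two sources of randomness described in the abstract.
forest = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", bootstrap=True,
    oob_score=True, random_state=0,
)
forest.fit(X_train, y_train)

print("OOB estimate of generalization error:", 1 - forest.oob_score_)
print("Test accuracy:", forest.score(X_test, y_test))
print("First five variable importances:", forest.feature_importances_.round(3)[:5])
```

Setting max_features to the full feature count recovers plain bagging; shrinking it decorrelates the trees, which is the mechanism behind the noise robustness claimed in the abstract.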
Article
The AM program was constructed by Lenat in 1975 as an early experiment in getting machines to learn by discovery. In the preceding article in this issue of the AI Journal, Ritchie and Hanna focus on that work as they raise several fundamental questions about the methodology of artificial intelligence research. Part of this paper is a response to the specific points they make. It is seen that the difficulties they cite fall into four categories, the most serious of which are omitted heuristics, and the most common of which are miscommunications. Their considerations, and our post-AM work on machines that learn, have clarified why AM succeeded in the first place, and why it was so difficult to use the same paradigm to discover new heuristics. Those recent insights spawn questions about "where the meaning really resides" in the concepts discovered by AM. This in turn leads to an appreciation of the crucial and unique role of representation in theory formation, specifically the benefits of having syntax mirror semantics. Some criticism of the paradigm of this work arises due to the ad hoc nature of many pieces of the work; at the end of this article we examine how this very adhocracy may be a potential source of power in itself.
Article
We have developed a fully automated laboratory robotic system (FA-LAS) capable of conducting most routine experiments. The main objectives in developing this novel robotic system were to streamline analytical tasks by automating sample preparations and analytical procedures, and to ensure the safety of analysts by assigning risk-involving procedures, such as the handling of highly active substances, to the system. FA-LAS integrates a number of unique functional devices and is highly competent for analyses of various research and development products. Comparative evaluations of FA-LAS against conventional manual preparation, for real quantitative high performance liquid chromatography (HPLC) tests of drug substances and pharmaceutical products, fulfilled this study's targeted criteria: 100 ± 5% for accuracy and coefficient of variation (%CV) ≤ 3% for repeatability, for HPLC analyses of both drug substances and pharmaceutical products. In addition, several advantages over conventional manual methods were confirmed, such as reducing working hours for analysts, eliminating working-hour restrictions, and drastically decreasing the lag time between each prepared sample and its HPLC analysis. FA-LAS's unique features will act as a bridge between the sample management system and the automatic documentation system, and it will drastically reduce the human resources, budgets, and time frame needed for the development of pharmaceutical products.
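The acceptance criteria quoted above (accuracy within 100 ± 5% and %CV ≤ 3%) reduce to two one-line statistics. A minimal sketch with made-up replicate recovery values, purely to show the arithmetic:

```python
import statistics

# Hypothetical replicate recoveries (%) from repeated automated preparations.
recoveries = [99.1, 100.4, 98.7, 101.2, 99.8, 100.1]

mean = statistics.mean(recoveries)
cv = 100 * statistics.stdev(recoveries) / mean  # coefficient of variation, %

print(f"mean recovery = {mean:.1f}%  (criterion: 95-105%)")
print(f"%CV = {cv:.2f}%  (criterion: <= 3%)")
assert 95 <= mean <= 105 and cv <= 3
```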
Article
An automated cell-culture platform becomes the nucleus of an organization performing cell-based research. However, every cell-based project placed on the system brings unique challenges. With each cell line comes millions of years of evolutionary encumbrance and a genetic inclination driving unique phenotypic peculiarities. In vivo, diverse eukaryotic cells rely on their "mammalian host" for survival. An automated system must perform in vitro the myriad actions needed to sustain multiple cell lines as well, hence becoming an "automated host." Cells, invariably, will endeavor to do as they please. Molding these cells into the operational bounds of a man-made system requires insight into the relationship between cell and machine. Citing our own experiences, we describe herein the use of the SelecT automated cell-culture platform (The Automation Partnership, Hertfordshire, England) in our discovery and preclinical profiling programs at Novartis. Achieving the balance between cells and the automated environment, and accommodating variable cell dynamics, are discussed.
Article
A variety of features used by our Department in the design and integration of Automation Platforms are presented here. A challenging project for any automation group is the automation of NMR sample preparation. The dispensing of highly volatile or viscous solutions into the typical 5 mm ID NMR glass tube, and the subsequent capping of the tube, presents unique problems. An angled incremental single-channel dispensing technique prevents bubble formation when a 10-mM protein-based solute is used. A novel gripper finger design, used in conjunction with in-house fabricated Teflon caps, allows reliable capping of NMR tubes. In situ vortexing minimizes vial handling with increased throughput. Magnetic mounting of robot tools (hands) provides precise snap-in positioning with collision-safe breakaway. This simplifies crash recovery during development testing and production use. A wraparound Safety Enclosure with modular safety circuit fulfills ANSI/RIA R15.06-1999 Safety Requirements. Flexible control software permits run interruption for loading and preparation of additional NMR tubes. Prepared samples may be removed during run interruption. A "Fly-By" barcode scanning tool enables positive compound sample ID with improved throughput. Preexisting instrument control software is conveniently interfaced to a Scheduler application through an open-architecture instrument integration framework. This framework allows the development of automation platform-independent middleware for schedule and assay portability. A new generation of low-power, lightweight, portable and expandable platforms is also presented, where a building block tandem approach is used in conjunction with the Rent-a-robot concept for robot recycling.