
CRYSTAL: A multi-agent AI system for automated mapping of materials' crystal structures

Abstract

We introduce CRYSTAL, a multi-agent AI system for crystal-structure phase mapping. CRYSTAL is the first system that can automatically generate a portfolio of physically meaningful phase diagrams for expert-user exploration and selection. CRYSTAL outperforms previous methods to solve the example Pd-Rh-Ta phase diagram, enabling the discovery of a mixed-intermetallic methanol oxidation electrocatalyst. The integration of multiple data-knowledge sources and learning and reasoning algorithms, combined with the exploitation of problem decompositions, relaxations, and parallelism, empowers AI to supersede human scientific data interpretation capabilities and enable otherwise inaccessible scientific discovery in materials science and beyond.
Artificial Intelligence Research Letter
Carla P. Gomes, Junwen Bai, Yexiang Xue a), Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, and Shufeng Kong, Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
Santosh K. Suram b), Joint Center for Artificial Photosynthesis, California Institute of Technology, Pasadena, CA 91125, USA
R. Bruce van Dover, Department of Materials Science and Engineering, Cornell University, Ithaca, NY, USA
John M. Gregoire, Joint Center for Artificial Photosynthesis, California Institute of Technology, Pasadena, CA 91125, USA
Address all correspondence to Carla P. Gomes at gomes@cs.cornell.edu and John M. Gregoire at gregoire@caltech.edu
(Received 18 January 2019; accepted 8 April 2019)
Introduction
Artificial Intelligence (AI) excels at a range of cognitive tasks, from speech and image recognition to game playing,[1,2] and holds great promise for automating scientific discovery.[3–8] The interpretation of scientific data remains a challenge for AI due to both the need for intricate scientific background knowledge and reasoning and the lack of large annotated training datasets. AI-based reasoning and learning methods are particularly critical for the field of high-throughput materials science where automated experiments are dramatically accelerating the pace of materials discovery for a variety of critical technologies.[4–6,9,10] Foundational techniques in high-throughput materials discovery include simultaneous synthesis of hundreds to thousands of materials using co-sputtering followed by rapid structural characterization via synchrotron XRD [Fig. 1(a)]. For complex materials containing three or more elements, the most common rate-limiting step in the discovery process is the construction of a crystal phase diagram from the composition and structural characterization data. We refer to this task as the phase mapping problem, which requires the identification of basis patterns (or factors) corresponding to pure crystal phases, some of which may not be sampled separately, such that all the XRD measurements can be explained as a mixture of the basis patterns [Figs. 1(c)–1(f)]. The XRD measurements are typically noisy, which contributes to the challenge of separating the basis pattern "sources" from the collection of patterns. Additionally, materials thermodynamics places a set of intricate physical constraints on the solution, and while synthesis of materials may not reach thermodynamic equilibrium, the non-equilibrium behavior is most commonly exhibited as the presence of non-equilibrium phases as opposed to deviations from, e.g., the Gibbs phase rule.
Phase mapping has traditionally been a bottleneck of the high-throughput materials discovery cycle as the synthesis and characterization experiments [Figs. 1(a) and 1(b)] can be performed on several libraries of materials per day while the manual effort required to solve a given phase mapping problem limits the throughput to only several phase diagrams per year. Previous reports have detailed the shortcomings of existing de-mixing algorithms,[12] most notably in the presence of noise and substantial alloying, an important phenomenon in which a range of elemental compositions crystallize into the same phase, causing its basis pattern to shift systematically with composition.[13] Non-negative matrix factorization (NMF) techniques[14] have shown promise in the efficient extraction of representative diffraction patterns from large datasets,[15,16] but their limited ability to encode physical constraints and prior knowledge results in routine production of non-physical solutions. From a computational perspective, phase mapping is an example of a challenging NP-hard problem[17] whose sheer number of possible combinations of basis patterns and activations grows exponentially with data size, rendering monolithic solvers and traditional search methods computationally infeasible, which motivates the exploration of innovative AI approaches.
a) Current address: Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA.
b) Current address: Toyota Research Institute, Los Altos, CA 94022, USA.
MRS Communications (2019), 9, 600–608
© Materials Research Society, 2019
doi:10.1557/mrc.2019.50
600 MRS COMMUNICATIONS VOLUME 9 ISSUE 2 www.mrs.org/mrc
https://doi.org/10.1557/mrc.2019.50
Downloaded from https://www.cambridge.org/core. Cornell University Library, on 02 Sep 2019 at 14:54:20, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms.
CRYSTAL: a multi-agent AI system for phase mapping
CRYSTAL is a multi-agent AI system that can run in unsupervised or semi-supervised mode and that decomposes phase mapping into smaller, more tractable sub-problems that are tackled by nimble algorithmic bots with unique background knowledge and reasoning capabilities (Fig. 2). Interleaved Agile Factor Decomposition (IAFD) is CRYSTAL's core phase-mapping engine, which interleaves factor decomposition (AgileFD bot) with constraint enforcement (Gibbs, Gibbs Alloy, and Phase Connectivity bots), whose collective reasoning produces physically meaningful phase maps. At a high level, IAFD relaxes and postpones the combinatorial physical constraints and iteratively repairs and enforces them when violations are detected.
The graph reasoning algorithms of the Gibbs, Gibbs Alloy, and Phase Connectivity bots are applied at a local scale to enable parallel computation and ensure scalability for large real-world problems. The key insight is that global maintenance of the combinatorial physical constraints is computationally prohibitive, yet appropriate data exploitation with local constraint enforcement provides global constraint satisfaction at a relatively small computational expense. While generating a phase diagram is a confounding and time-consuming task even for experienced materials scientists, IAFD generates a solution typically within 2 min for the dataset reported in this paper, a groundbreaking advance in phase mapping since no other algorithm imposes the physical constraints to reliably yield physically meaningful solutions (see Table S1).
The capability to rapidly generate physically meaningful solutions enables CRYSTAL's large-scale computations to assess solution stability, uncovering a critical and previously overlooked aspect of phase mapping. Even with the imposed physical constraints, different phase diagram solutions are often "inadequately differentiated" by the source XRD patterns. Inadequate differentiation in phase mapping arises in part from the fundamental non-invertibility of an XRD pattern to obtain its crystalline phase source(s) and is compounded by both noise in the source data and presence of different phases with similar basis patterns. The standard practice in phase mapping, and more generally in data modeling, is to extract knowledge from a single solution that sufficiently reconstructs the source data. In contrast, CRYSTAL explores the search space by deploying bots in parallel to produce a large number of candidate solutions. Using additional bots for solution analysis and aggregation (Figs. 2–4), CRYSTAL runs unsupervised and autonomously to generate a parsimonious portfolio of phase diagrams that represent different interpretations of the source data.
Figure 1. Materials discovery cycle. (a) Synthesis of materials using sputter co-deposition from palladium (Pd), rhodium (Rh), and tantalum (Ta) sources to form a "materials library" thin film with continuous composition variation. Collection of both elastically (XRD) and inelastically (XRF) scattered x-rays, using a synchrotron x-ray beam, to characterize the materials' crystal structure and composition, respectively, the latter enabling a ternary composition map of the Pd-Rh-Ta library. (b) Each library is screened for catalytic activity using an electrochemical imaging strategy in which the best catalysts are identified using a fluorescent marker.[11] Materials which appear active in the absence of methanol are denoted as "unstable." (c) The triangle-composition plot contains the 197 distilled XRD/XRF measurements that comprise the input for phase mapping. The 12 XRD patterns along the Pd-Rh composition line illustrate composition-dependent peak shifting due to the two elements alloying in a single-crystal structure. (d) CRYSTAL's phase map solution identifies five phases (purple, yellow, orange, blue, and red) and six multi-phase fields; each sample's XRD pattern is explained by either a single phase or a mixture of phases. (e) The XRD pattern for a "phase 3" sample is shown with red sticks denoting the known peak pattern of the face centered cubic (fcc) crystal structure, indicating that the broad range of compositions in the fcc phase field crystallize into this same structure. (f) The average atomic radius varies systematically with the alloy composition, which CRYSTAL captures by mapping the composition-dependent fcc lattice constant.
CRYSTAL's algorithms
As mentioned above, CRYSTAL is a collection of nimble algorithmic bots, with different knowledge and reasoning capabilities performing a variety of tasks outlined in Fig. 2 and described in more detail below. The IAFD bots collectively solve the phase mapping problem, using an unsupervised generative approach, to produce a phase map that satisfies the physical constraints. The CRYSTAL planner launches parallel runs of IAFD with different random initializations and parameters, in particular the number of target phases, and each IAFD run follows the algorithm outlined below and illustrated in Fig. 3:
Step 0: Initialization: Initialize to 0 the inner-loop (AgileFD-Gibbs) counter CNT1, bounded above by p, together with CNT2, which counts outer loops and is bounded above by q. As discussed below, p and q are typically set to 3 and 2, respectively.
Step 1: AgileFD bot: Apply AgileFD on N/p randomly selected "untouched" samples.
Step 2: Gibbs bot: Enforce the Gibbs phase constraint on these N/p samples.
Step 3: If some samples still have not been processed, namely CNT1 is still smaller than p, increase CNT1 by 1 and go back to Step 1; otherwise, move on to Step 4.
Step 4: Gibbs-Alloy bot: Enforce the Gibbs-Alloy constraint on all N samples.
Step 5: Phase Connectivity bot: Enforce the Connectivity constraint on all N samples.
Step 6: If the Gibbs-Alloy constraint is violated, go back to Step 4; otherwise, go to Step 7.
Step 7: If CNT2 hasn't reached the upper bound q, refined solutions from Step 6 are fed into Step 1 as initialization and the whole algorithm starts over again. Otherwise, the IAFD output is taken as the phase map finalized in Step 6. Further documentation and source code for IAFD can be found at http://www.udiscover.it/resources/software/.
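The control flow of Steps 0–7 can be sketched in Python as follows. This is only an illustration of the loop structure, not the released IAFD code: the entries of `bots` (`agile_fd`, `gibbs`, `gibbs_alloy`, `connectivity`, `alloy_violated`) are hypothetical callables standing in for the real bots, the `solution` object is opaque, and the `max_repairs` safety cap is an assumption added so the sketch always terminates.

```python
import random

def iafd(samples, bots, p=3, q=2, max_repairs=50, seed=0):
    """Sketch of the IAFD control loop (Steps 0-7).

    `bots` maps names to hypothetical callables; none of these names come
    from the published code.
    """
    rng = random.Random(seed)
    solution = None                              # no warm start on the first pass
    for _outer in range(q):                      # CNT2: outer refinement loops
        untouched = list(samples)
        rng.shuffle(untouched)                   # random selection of samples
        batch = -(-len(untouched) // p)          # ceil(N / p) samples per round
        for inner in range(p):                   # CNT1: AgileFD-Gibbs rounds
            chunk = untouched[inner * batch:(inner + 1) * batch]
            if not chunk:
                break
            solution = bots["agile_fd"](solution, chunk)   # Step 1
            solution = bots["gibbs"](solution, chunk)      # Step 2
        for _ in range(max_repairs):             # Steps 4-6 (capped: assumption)
            solution = bots["gibbs_alloy"](solution, samples)    # Step 4
            solution = bots["connectivity"](solution, samples)   # Step 5
            if not bots["alloy_violated"](solution):             # Step 6
                break
    return solution                              # Step 7: final phase map
```

With p = 3 and q = 2, each run makes six AgileFD-Gibbs rounds and at least two Gibbs-Alloy/Connectivity repair passes; the refined solution of the first outer loop serves as the initialization of the second.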
AgileFD bot
As illustrated in Fig. 3, we formulate phase mapping as a constrained matrix factorization problem. Experimental XRD measurements are represented by a matrix A of size L × N. Each column of A is a vector representing the XRD pattern sampled at L diffraction angles, obtained at one out of N sample locations. The phase mapping problem entails decomposing A in terms of factors W and H, satisfying physics constraints. W encodes the characteristic patterns or structure associated with each pure crystalline phase and H represents the mixing parameters, such as the phases present, their proportions, and any alloying present.
Figure 2. Outline of the CRYSTAL system. CRYSTAL incorporates a diverse collection of fast and specialized algorithms with different types of knowledge and computational capabilities. IAFD integrates the AgileFD, Gibbs, Gibbs Alloy, and Phase Connectivity bots, constituting CRYSTAL's core phase-mapping engine: AgileFD performs agile factor decomposition to learn the factors or basis patterns, corresponding to pure crystal structures, and its three partner bots enforce physical constraints. The Phase Matching bot matches the basis patterns discovered by IAFD to known crystal structure patterns from databases. The Phase Dimension Analysis bot analyzes and validates the generated phase maps and infers the system's maximum number of pure phases, which dictates how many system configurations CRYSTAL explores, using parallelism and randomization, to produce a large number of candidate phase maps. The hierarchical Clustering bot uses automated thresholding to identify a small set of representative candidate solutions, which are provided to either the CRYSTAL Planner for solution refinement or to the Analysis & Reporting bot to generate phase diagrams and other visualizations for human-expert inspection. The Visualizer & Interface bot enables users to interact with CRYSTAL for solution selection and fine tuning.
Under the assumed "isotropic" alloying model, an XRD pattern measured as a function of scattering vector magnitude will shift multiplicatively, with, for example, a 1% lattice contraction causing peaks to shift by a factor of 1.01. We convert the multiplicative shift to an additive constant by performing the factorization of logarithmically transformed patterns, making A a convolutive mixture of the bases in W. W and H are naturally non-negative, leading to a convolutive non-negative matrix factorization (CNMF) problem,[18] which AgileFD performs using lightweight multiplicative coordinate gradient-descent updating rules applied to the logarithmically transformed XRD data.[19,20]
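The effect of the logarithmic transformation can be illustrated with a short sketch. Because log(s·q) = log(s) + log(q), a multiplicative peak shift by a factor s becomes a constant offset on a grid that is uniform in log(q), so a shifted basis pattern is simply the same pattern moved by a fixed number of bins. The grid limits and bin width below are arbitrary choices for illustration, and the function names are hypothetical rather than taken from AgileFD.

```python
import math

def log_grid(q_min, q_max, n):
    """Return n scattering-vector magnitudes spaced uniformly in log(q),
    together with the bin width in log units."""
    step = (math.log(q_max) - math.log(q_min)) / (n - 1)
    return [q_min * math.exp(i * step) for i in range(n)], step

def shift_in_bins(scale, step):
    """Number of log-grid bins a pattern moves when every peak position
    is multiplied by `scale`."""
    return math.log(scale) / step

# Illustrative grid: each bin corresponds to a 1% increase in q.
grid, step = log_grid(1.0, 1.01 ** 200, 201)

peak_bin = 50
shifted_q = grid[peak_bin] * 1.01         # peak after a 1% multiplicative shift
bins_moved = shift_in_bins(1.01, step)    # the same shift expressed in bins
```

On this grid the 1% shift moves the peak by exactly one bin regardless of where the peak sits, which is what allows the shifted copies of a basis pattern to be handled as a convolution.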
Gibbs and Gibbs-Alloy bots
For a physical system where l elements are deposited, Gibbs' phase rule implies that at most l phases are present at each sample location. Mathematically, this is equivalent to constraining the number of non-zero elements in the vector $\sum_m H^m_{\cdot,n}$ for any sample location n to be no more than l.
The Gibbs bot uses a Mixed Integer Programming (MIP) approach to find the best activation matrix H that activates no more than l phases per sample point and minimizes the reconstruction loss, while holding the phases in W fixed.[21] Notice that when W is fixed, the columns of matrix H are independent, which leads to a decomposition into a smaller MIP program per violated sample point, which finds the best l phases that minimize the reconstruction loss of sample location n, as described further in the Supplementary Information.
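The per-sample sub-problem can be sketched as follows. Note the substitution: instead of a MIP solver, this sketch enumerates every subset of at most l basis patterns and fits non-negative weights with simple multiplicative updates, which is only practical for a small number of phases. Pattern shifting is ignored, and the function names and the iteration count are assumptions made for illustration.

```python
from itertools import combinations

def nonneg_fit(a, cols, iters=200):
    """Fit a ~ sum_k h[k] * cols[k] with h >= 0 using multiplicative updates."""
    h = [1.0] * len(cols)
    for _ in range(iters):
        recon = [sum(hk * c[i] for hk, c in zip(h, cols)) for i in range(len(a))]
        new_h = []
        for hk, c in zip(h, cols):
            num = sum(ci * ai for ci, ai in zip(c, a))       # (W^T a)_k
            den = sum(ci * ri for ci, ri in zip(c, recon))   # (W^T W h)_k
            new_h.append(hk * num / den if den > 0 else 0.0)
        h = new_h
    recon = [sum(hk * c[i] for hk, c in zip(h, cols)) for i in range(len(a))]
    loss = sum((ai - ri) ** 2 for ai, ri in zip(a, recon))
    return h, loss

def gibbs_repair(a, basis, max_phases):
    """Return the activation column with at most `max_phases` non-zero
    entries that minimizes the squared reconstruction loss of pattern `a`.
    Exhaustive search stands in for the MIP used by the Gibbs bot."""
    k = len(basis)
    best = None
    for subset in combinations(range(k), min(max_phases, k)):
        h, loss = nonneg_fit(a, [basis[j] for j in subset])
        if best is None or loss < best[0]:
            best = (loss, subset, h)
    loss, subset, h = best
    column = [0.0] * k
    for j, hj in zip(subset, h):
        column[j] = hj
    return column, loss
```

Because each sample is treated independently once W is fixed, a call such as `gibbs_repair(pattern, basis, 3)` can be issued in parallel for every sample that violates the constraint.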
The Gibbs Alloy bot extends the Gibbs constraint by reducing the number of coexisting phases allowed by one when alloying is detected, i.e., for those sample locations with varying mean shift parameters compared with nearby locations. This is motivated by the thermodynamic degree of freedom associated with alloying, although instead of identifying details of the alloying behavior, the bot identifies where alloying is taking place and lowers the number of allowable phases by 1 at those composition points.
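A minimal sketch of the Gibbs-Alloy rule is given below. The detection criterion used here, a fixed tolerance `tol` on the difference between a sample's mean shift parameter and those of its neighbors, is an assumption for illustration; the text specifies only that alloying is flagged where the mean shift varies relative to nearby locations.

```python
def allowed_phases(n_elements, shift, neighbor_shifts, tol=1e-3):
    """Maximum number of coexisting phases at one sample location.

    `shift` is the sample's mean shift parameter and `neighbor_shifts`
    those of nearby samples; `tol` is a hypothetical detection threshold.
    """
    alloying = any(abs(shift - s) > tol for s in neighbor_shifts)
    # Alloying consumes one thermodynamic degree of freedom.
    return n_elements - 1 if alloying else n_elements
```

For a ternary system, a sample whose shift matches its neighbors may contain up to three phases, whereas a sample in an alloying region is limited to two.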
Phase and Phase Field Connectivity bot
Figure 3. Solving phase mapping using an unsupervised generative approach. The IAFD bot network solves phase mapping as a constrained matrix factorization problem in which the input XRD pattern matrix (A) is decomposed into factors W and H such that W × H approximates A while satisfying physical constraints. W encodes the characteristic patterns of pure crystal phases (including shifted versions) and H their activations, which dictate both the amount and the pattern shifting extent of each pure phase in each XRD measurement. IAFD starts with p (typically three) rounds of interactions between the AgileFD and Gibbs bots followed by rounds of iterations between the Gibbs Alloy and Phase Connectivity bots, until all the constraints are satisfied. AgileFD performs matrix factorization using light-weight multiplicative updating rules, without enforcing the combinatorial physical constraints. The AgileFD solution's violations of the connectivity constraint and the constraints based on Gibbs' phase rule are repaired by the corresponding bots in an interleaved manner using efficient algorithms (red circles highlight repaired activations of H). The entire procedure is repeated for solution refinement (typically q = 2), and the resulting generated basis patterns are passed to the Phase Matching bot to identify the crystal structures by comparison with ICDD and/or determine if the solution potentially contains a new phase. The figure illustrates a representative XRD pattern of the Pd-Rh-Ta system (#69) that is decomposed into shifted versions of two different basis patterns (0.16 and 0.84 of each, respectively).
The Connectivity constraint requires that both (i) the sample points where a specific phase is present and (ii) the sample points where each unique set of phases is present form a connected component in composition space, which is determined using the activations of each phase for each composition sample. Specifically, we define a graph G in which sample points are nodes and two nearby sample points in composition space are connected with an edge based on the Delaunay Triangulation. The Connectivity constraint states that, for each phase k, the sample locations n for which $\sum_m H^m_{k,n}$ is larger than zero must form a connected component in graph G, and that phase fields consisting of a unique combination of phases must similarly form a connected component. The Phase and Phase Field Connectivity bot rectifies the connectivity constraints in a lazy and iterative manner. See also Supplementary Information.
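Checking the Connectivity constraint for one phase (or one phase field) amounts to a connected-component test on the sub-graph of G induced by the samples where that phase is active. The sketch below assumes the edges of G have already been computed, for instance from a Delaunay triangulation of the composition points; the function name and the edge-list representation are illustrative choices, not the bot's actual interface.

```python
from collections import deque

def is_connected(active, edges):
    """Return True if the `active` sample indices form a single connected
    component of the graph given by `edges` (pairs of sample indices)."""
    active = set(active)
    if not active:
        return True
    # Keep only edges whose endpoints are both active.
    adj = {n: set() for n in active}
    for u, v in edges:
        if u in active and v in active:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary active sample.
    start = next(iter(active))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == active
```

Applied per phase, `active` would be the set of samples with non-zero total activation; applied per phase field, it would be the set of samples sharing the same combination of phases.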
Phase Matching and Phase Dimensionality Analysis bots
The Phase Matching bot matches the basis patterns produced by the IAFD module, as part of each solution, to known crystal structure patterns from the ICDD. Fitting an ICDD-derived pattern to each basis pattern (see Supplementary Information for details) provides the additional opportunity to threshold the loss such that if no ICDD-derived pattern sufficiently matches the basis pattern, then the basis pattern may be describing a new phase. The matching of a single ICDD entry to multiple basis patterns in a single solution is an indication that the K (number of phases) of the solution is too large and thus the solution is invalid. Based on this concept, the CRYSTAL planner monitors Phase Dimensionality Analysis results to determine the maximum number of phases to consider for the given system. The CRYSTAL planner runs IAFD configurations with an increasing number of phases until the resulting phase diagrams have an ICDD entry assigned to more than one basis pattern or the valid solution rate becomes vanishingly small, providing automatic determination of the upper bound on the number of phases (basis patterns).
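The validity test and the resulting upper bound on K can be sketched as follows. Each solution is represented here simply as a list of matched ICDD entry identifiers, one per basis pattern, with `None` for an unmatched (possibly new) phase. The threshold `min_valid_rate` standing in for "vanishingly small" is an assumed value, and the function names are hypothetical.

```python
from collections import Counter

def has_duplicate_match(matches):
    """True if one ICDD entry is assigned to more than one basis pattern."""
    counts = Counter(m for m in matches if m is not None)
    return any(c > 1 for c in counts.values())

def max_phases(solutions_by_k, min_valid_rate=0.01):
    """Largest K whose fraction of valid solutions is at least the threshold.

    `solutions_by_k` maps K to a list of solutions, each a list of matched
    ICDD entry ids. K is scanned in increasing order and the scan stops at
    the first K whose valid-solution rate falls below `min_valid_rate`.
    """
    best = None
    for k in sorted(solutions_by_k):
        sols = solutions_by_k[k]
        valid = sum(not has_duplicate_match(m) for m in sols)
        if not sols or valid / len(sols) < min_valid_rate:
            break
        best = k
    return best
```

Solutions flagged by `has_duplicate_match` would be discarded before clustering, and the returned value bounds the number of phases explored by the planner.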
Phase Diagram Clustering and Analysis & Reporting bots
Figure 4. CRYSTAL's solution to the Pd-Rh-Ta catalyst system. (a) CRYSTAL automatically generates 2500 phase diagrams in parallel from which the Phase Dimension Analysis bot identifies 1639 valid solutions and the Clustering bot identifies 100 representative solutions for additional refinement. (b) From the 100 refined phase diagrams, CRYSTAL automatically identifies the span of solutions with different physical meaning, which is 20 phase diagrams in this case. (c) The selected 20 phase diagrams that represent their respective clusters. (d) The final solution resulting from expert consideration of CRYSTAL's report. The expert user also provided minor manual refinement of the phase diagram, in particular small phase field boundary adjustments in composition regions with sparse measurement data. (e) Color scheme for the phase fields where the single-phase fields are labeled and phase combinations are denoted by linkages. The 11 phase fields marked with a black circle appear in the final solution. (f) The basis patterns for the final solution with stick patterns from the International Center for Diffraction Data shown in red. Composition maps of the relative lattice constant for each phase reveal alloying-based shifts due to the different atomic radii of the elements (Ta > Pd > Rh). The dot size denotes the phase concentration. (g) The methanol oxidation onset potential for the ternary and binary composition spaces where Rh-Ta is the only binary to exhibit catalytic activity. The overlay of the final solution's phase field boundaries reveals that the best activity (lowest onset potential) is observed in the mixed orth-Rh$_2$Ta + hex-Pd$_3$Ta phase field.
The IAFD bot produces physically meaningful phase diagrams, whose phases are labeled using the ICDD, for the known phases. CRYSTAL runs the IAFD module in parallel (in this paper we report 500 runs for a given number of phases but in general 100 or fewer runs is sufficient) to produce candidate phase diagrams that require automated phase diagram analysis and consolidation. The Clustering bot takes as input the set of solutions produced by parallel runs of the IAFD module and outputs a small set of representative candidate solutions. This bot uses hierarchical agglomerative clustering based on pair-wise phase diagram dissimilarity with automated thresholding. A distance metric between phase diagrams is defined as the across-sample average of the dissimilarity of the pair of phase fields in which the sample resides in the pair of phase diagrams. The phase field for a given sample is defined as the set of basis patterns activated for that sample, which is labeled according to the ICDD phase matching results, and the comparison of the resulting pair of ICDD phase sets is performed under consideration that multiple ICDD patterns may match a given basis pattern within a predefined tolerance. Hierarchical clustering of phase diagram solutions provides (i) 100 representative solutions from the initial runs of IAFD and (ii) a portfolio of unique phase diagrams from the 100 refined solutions where the number of clusters is determined through automated thresholding based on unique sets of ICDD patterns. The set of representative candidate solutions identified by the clustering bot are provided to either the CRYSTAL Planner for solution refinement by the IAFD module, as initial solutions, or to the Analysis & Reporting bot to generate phase diagrams and other visualizations for human-expert inspection.
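The phase diagram distance described above, an across-sample average of phase field dissimilarities, can be sketched as follows. The per-sample dissimilarity is not specified in the text, so the Jaccard dissimilarity between the two sets of ICDD labels is used here as an assumption, and the handling of basis patterns that match several ICDD entries is omitted.

```python
def field_dissimilarity(field_a, field_b):
    """Jaccard dissimilarity between two phase fields (sets of phase labels).
    The choice of Jaccard is an assumption made for this sketch."""
    field_a, field_b = set(field_a), set(field_b)
    union = field_a | field_b
    if not union:
        return 0.0
    return 1.0 - len(field_a & field_b) / len(union)

def diagram_distance(diagram_a, diagram_b):
    """Across-sample average dissimilarity of two phase diagrams, each given
    as a list holding one phase field per sample (same sample order)."""
    if len(diagram_a) != len(diagram_b):
        raise ValueError("phase diagrams must cover the same samples")
    total = sum(field_dissimilarity(a, b) for a, b in zip(diagram_a, diagram_b))
    return total / len(diagram_a)
```

The resulting pairwise distance matrix is the kind of input an agglomerative clustering routine expects, with the cut-off threshold chosen automatically as described in the text.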
Pd-Rh-Ta experiments
The Pd-Rh-Ta system was chosen for investigation based on the use of Pd in catalysts for alcohol oxidation in alkaline electrolytes[22] and recent success improving the methanol oxidation reactivity of Pt by combining it with Ta, where surface sub-oxides of Ta appeared to lower the adsorption of CO and thus mitigate catalyst poisoning.[23] The 197 XRD and XRF measurements were acquired on four co-sputtered thin film composition libraries: the Pd-Rh-Ta ternary library and one for each binary system (the edges of the composition triangle). The XRF measurements provide the mapping from the physical location on the substrate to the Pd-Rh-Ta ternary composition space, with measurement details and data processing as described in Ref. 24.
The catalytic activity of each of the four composition spread thin films was mapped using the high-throughput fluorescence-based screening, and the results for the Pd-Rh-Ta film are shown in Fig. 1(b) with false-color images of fluorescence intensity showing background-subtracted images from a charge-coupled device camera, as previously described.[23,25] The N$_2$-sparged aqueous electrolyte contained 3 mM quinine (fluorescent indicator) and 0.1 M potassium triflate (supporting electrolyte). An initial voltage sweep in this electrolyte without methanol was used to identify any regions exhibiting fluorescence, which could be due to film oxidation or oxidative corrosion, prompting our labeling of these composition regions as "unstable." A solution with the addition of 5 M methanol was then used for screening catalysis of the methanol oxidation reaction. As standard practice, the experiment was repeated three times with fresh electrolyte and similar results were obtained, with Figs. 1(b) and 4(g) showing the results from the first of these voltage sweeps. In all of these experiments, the voltage sweep was performed from −0.4 to +0.5 V versus Ag/AgCl at a scan rate of 0.05 V/s. By setting a fluorescence intensity threshold, the onset potential associated with each pixel in the library image was determined and the XRF-based composition map was used to map the onset potential data to composition space, as shown in Fig. 4(g).
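The thresholding step that assigns an onset potential to a pixel can be sketched as below. The function assumes that the fluorescence intensity of one pixel has been recorded at each potential of the sweep, in sweep order; the function name, the data layout, and the use of a "greater than or equal" comparison are illustrative assumptions.

```python
def onset_potential(potentials, intensities, threshold):
    """Return the first potential in the sweep at which the pixel's
    fluorescence intensity reaches `threshold`, or None if it never does."""
    for potential, intensity in zip(potentials, intensities):
        if intensity >= threshold:
            return potential
    return None
```

Pixels for which the function returns `None` show no detectable activity within the sweep window; the remaining values can be mapped to composition using the XRF-based composition map.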
Phase mapping for catalyst discovery in the Pd-Rh-Ta system
The Pd-Rh-Ta system poses substantial phase mapping challenges due to strongly overlapped features in its phases' XRD patterns as well as substantial alloying-based peak shifting, which are compounded by experimental noise in the thin-film XRD measurements. CRYSTAL generated a total of 2500 phase maps (500 phase maps per configuration with a number of phases K = 3, 4, 5, 6, and 7). As shown in Table I, 100% of the composition points, phases, and phase fields meet the physical constraints imposed by the algorithmic bots. For comparison, an analogous 2500 runs were performed using AgileFD and NMF, with Table I revealing some constraint satisfaction, but not sufficient to produce any physically meaningful solutions. Comparison was also made with NMF$_K$, a recently reported algorithm that involves clustering of NMF components to identify basis patterns,[26] which produced a single K = 5 solution for which the constraint satisfaction rates exceed those of NMF and AgileFD but still do not provide a physically meaningful phase diagram where all constraints are satisfied. Comparison of this NMF$_K$ solution with that of Fig. 4 is shown in Fig. S1, revealing a substantially different interpretation of the data.
CRYSTAL continued processing of its initial 2500 phase diagrams via the Phase Dimension Analysis bot, which determined that the system contains no more than K = 6 phases and passed 1639 valid solutions to the Clustering bot, which identified 100 representative solutions using hierarchical agglomerative clustering based on pairwise phase diagram dissimilarity. After further refining the 100 solutions using the IAFD bots, the hierarchical Clustering bot, using automated thresholding, identified 20 representative phase diagrams that represent the span of different data interpretations. These phase diagrams were passed to the Analysis & Reporting bot to produce phase diagram visualizations and composition maps of the phase fields and lattice constant shifts, which are readily interpretable by an expert [Figs. 1(d)–1(f) and 4]. The expert user analyzed and compared the 20 candidate phase diagrams, eliminating 15 candidate phase diagrams since they do not meet subtle criteria based on prior knowledge specific to the Pd-Rh-Ta system. In this case, this prior knowledge was from a previous analysis of the Pd-Rh binary line where a single face centered cubic (fcc) seven-phase diagram was analyzed. While this prior knowledge was used to screen candidate solutions in the present work, this type of knowledge can be used to initialize and/or constrain IAFD, as described previously for AgileFD.[19,20] Other types of prior knowledge may also be incorporated, which may require the development of new algorithmic bots, another motivation for building CRYSTAL as a network of bots that can be adapted for specific research tasks and expanded to incorporate new modes of reasoning. After applying this filter, the expert user selected the final phase diagram from the remaining five phase diagrams, which we note could have also been automatically selected via a voting method since the selected phase diagram represents the hierarchical cluster containing 17 refined solutions, more than any other remaining cluster [Fig. 4(c)].
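The voting method mentioned above can be sketched as a majority count over cluster memberships. In this illustration each refined solution carries the label of the cluster it belongs to, and `remaining` holds the clusters that survive the expert's prior-knowledge filter; the function name and data layout are hypothetical.

```python
from collections import Counter

def vote(cluster_labels, remaining):
    """Return the remaining cluster containing the most refined solutions,
    or None if no solution belongs to a remaining cluster."""
    counts = Counter(label for label in cluster_labels if label in remaining)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The representative phase diagram of the winning cluster would then be reported as the automatically selected solution.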
In addition to identifying complete solubility of Pd and Rh into the fcc structure, this phase diagram also indicates substantial alloying in the intermetallic phases [Figs. 4(d)–4(f)]. For each phase, the lattice constant variation with composition matches expectations based on the metallic radii of the elements. For the three non-cubic phases, the observed <1% lattice expansions are well modeled by the isometric peak shifting model. The hex-Pd$_3$Ta phase exhibits the largest alloying extent of these phases, with up to 30 at.% of the smaller Rh and up to 35 at.% excess of the larger Ta leading to lattice expansions <0.7%.
CRYSTAL's phase diagram enabled insightful interpretation of the catalytic activity in the Pd-Rh-Ta system. Figure 4(g) shows the result of a high-throughput screening of the Pd-Rh-Ta libraries for methanol oxidation, a critical reaction for direct methanol fuel cells traditionally addressed with Pt-based catalysts.[27] While many of the compositions exhibit inactivity or instability under the reaction conditions, selected Pd-Rh-Ta compositions exhibit an activity that is on par with the best Pt-based catalysts evaluated by this high-throughput method. On the Rh-Ta binary line, the orth-Rh$_2$Ta phase provides the highest catalytic activity. The methanol oxidation onset potential is further lowered by 0.2 V to approximately 0.5 V versus RHE via mixing orth-Rh$_2$Ta with an alloy of hex-Pd$_3$Ta at a composition Pd$_{0.17}$Rh$_{0.33}$Ta$_{0.5}$. Such combinations of catalyst materials have recently been proposed for overcoming historical barriers that limit catalyst performance for multi-step reactions,[28,29] and indeed the activity of this multi-intermetallic catalyst is quite remarkable. The best thin film catalysts for methanol oxidation that have been identified by this technique include the Pt-Ta intermetallics with an onset potential of 120 mV versus Ag/AgCl,[23] and a family of Pt-based fcc alloys including binary alloys with Ru, In, Sn, and Zn, which have onset potentials between 0 and 40 mV versus Ag/
Table I. Comparison of constraint satisfaction for solutions generated by different algorithms.

                                        K (# basis patterns)
Algorithm  Constraint                   3        4        5        6        7
CRYSTAL    Gibbs                        100%     100%     100%     100%     100%
           Gibbs-Alloy                  100%     100%     100%     100%     100%
           Pure phase connectivity      100%     100%     100%     100%     100%
           Phase field connectivity     100%     100%     100%     100%     100%
AgileFD    Gibbs                        100%     74.93%   60.20%   47.46%   37.46%
           Gibbs-Alloy                  52.66%   32.36%   19.05%   11.71%   7.88%
           Pure phase connectivity      63.67%   49.95%   16.68%   7.03%    2.31%
           Phase field connectivity     33.99%   32.29%   28.58%   34.55%   49.38%
NMF        Gibbs                        100%     79.01%   64.35%   55.01%   47.24%
           Gibbs-Alloy                  100%     79.01%   64.35%   55.01%   47.24%
           Pure phase connectivity      51.60%   29.70%   23.68%   11.17%   1.94%
           Phase field connectivity     28.73%   26.37%   24.73%   39.22%   56.42%
NMFk       Gibbs                        NA       NA       87%      NA       NA
           Gibbs-Alloy                  NA       NA       77%      NA       NA
           Pure phase connectivity      NA       NA       40%      NA       NA
           Phase field connectivity     NA       NA       50%      NA       NA

For each number of basis patterns (K), 500 random initializations were used to generate a set of solutions, which were then evaluated for compliance with four
physical constraints. The percentage of samples (Gibbs and Gibbs-Alloy), percentage of phases (Pure phase connectivity), and percentage of phase fields (Phase
field connectivity) are shown assuming a threshold value of 10^-6 on phase activations (values of H). While AgileFD and NMF solutions satisfy constraints for
some of the composition points despite the lack of enforcement, none of the resulting phase diagrams meet all requirements. NMFk produced a single K = 5
solution, which satisfies constraints better than NMF and AgileFD but similarly does not provide physically meaningful phase diagrams where all constraints are
satisfied. Due to its constraint enforcement, CRYSTAL produces solutions which meet all of the physical requirements.
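As an illustration of how a per-sample percentage such as the Gibbs row could be computed, the sketch below thresholds the phase activations H and counts the samples whose number of active phases stays within a limit. It is not the authors' evaluation code. The function name, the layout of H (one row per basis pattern, one column per sample), the limit of three coexisting phases for a ternary system at fixed temperature and pressure, the threshold of 1e-6, and the numbers in H are all assumptions made for this example.

```python
def gibbs_compliance(H, threshold=1e-6, max_phases=3):
    """Percentage of samples with at most `max_phases` active phases.

    H is a list of rows, one row per basis pattern (phase) and one
    column per sample. A phase counts as active in a sample when its
    activation exceeds `threshold`.
    """
    n_samples = len(H[0])
    compliant = 0
    for j in range(n_samples):
        n_active = sum(1 for row in H if row[j] > threshold)
        if n_active <= max_phases:
            compliant += 1
    return 100.0 * compliant / n_samples

# Hypothetical activations for K = 4 basis patterns and 5 samples.
H = [
    [0.5, 0.2, 0.0, 0.1, 0.3],
    [0.5, 0.3, 0.4, 0.2, 0.0],
    [0.0, 0.5, 0.6, 0.3, 0.2],
    [0.0, 0.0, 0.0, 0.4, 0.5],
]
print(gibbs_compliance(H))  # the fourth sample has 4 active phases, so 4 of 5 comply
```

The connectivity rows would need the composition graph of the library in addition to H, so they are not covered by this sketch.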
AgCl.[30] The Pd0.17Rh0.33Ta0.5 catalyst is the only known
Pt-free catalyst with an onset potential in this range. The onset
potential of 0.5 V versus RHE is also within the range of
onset overpotentials observed with Pd-based catalysts in alkaline
electrolytes,[22] and given the inactivity of Pd and Rh in the
weak acidic electrolyte used in our experiments, the Ta-based
intermetallics appear to enable the activity at lower pH, opening
a new direction for catalyst development and a pathway for the
further development of Pt-free catalysts.
The discovery of this multiphase catalyst highlights both the
power of high-throughput materials science and the
effectiveness of AI techniques for integrating multiple knowledge
sources to provide meaningful solutions. By teaching
CRYSTAL to reason about phase diagrams, we have for the
first time automated the generation and exploration of
alternative data models, demonstrating the ability of AI systems to
accelerate phase mapping and providing a novel
data-interpretation approach for materials sciences and beyond.
Supplementary material
The supplementary material for this article can be found at
https://doi.org/10.1557/mrc.2019.50
Acknowledgments
This work was supported by NSF awards CCF-1522054 and
CNS-0832782 (Expeditions), CNS-1059284 (Infrastructure),
and IIS-1344201 (INSPIRE); ARO awards W911NF-14-1-0498
and W911NF-17-1-0187; AFOSR Multidisciplinary University
Research Initiatives (MURI) Program FA9550-18-1-0136,
Toyota Research Institute award; and US DOE Award No.
DE-SC0004993. Use of SSRL is supported by DOE Contract
No. DE-AC02-76SF00515. Use of CHESS is supported by the
NSF award DMR-1332208. The authors thank A. Mehta,
D. G. Van Campen, M. Tague, and D. Dale for assistance with
data collection.
Author contributions
The authors' contributions are as follows: C.P.G., R.B.vD., and
J.M.G. conceived and managed the project. C.P.G. conceived
CRYSTAL's multiple knowledge source approach. J.Ba.,
J.Bj., C.P.G., and Y.X. designed the bots' algorithms. J.Ba.,
C.P.G., J.M.G., B.H.R., and Y.X. designed the Diagram
Rendering bot. J.Ba. implemented the IAFD bots, phase
matching bot, and phase analysis bot. B.H.R. implemented the
diagram rendering algorithm, and Analysis & Reporting and
Visualizer & Interface bots. S.K. performed the comparison
with NMFk. R.A.B. assisted with programming in several
components of CRYSTAL. R.B.vD. and J.M.G. acquired Pd-Rh-Ta
data, and S.K.S. and J.M.G. acquired Nb-Cu-V data with
assistance as noted in the Acknowledgments. S.K.S. and J.M.G.
served as human experts for both systems. C.P.G. and J.M.G.
were the primary authors of the manuscript. S.A., J.Ba., J.Bj.,
C.P.G., J.M.G., B.H.R., and Y.X. were the primary authors
of the Methods and Supplementary Information.
Data availability
The raw data for the Pd-Rh-Ta system along with CRYSTAL's results and
reports will be available at http://www.udiscover.it/resources/data/.
Further documentation and source code for IAFD can be found
at http://www.udiscover.it/resources/software/.
Author information
The authors declare no competing financial interests.
Correspondence should be addressed to gomes@cs.cornell.edu
and gregoire@caltech.edu.
References
1. Artificial intelligence. Science 349, 248 (2015).
2. D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis: Mastering the game of Go without human knowledge. Nature 550, 354 (2017).
3. D.P. Tabor, L.M. Roch, S.K. Saikin, C. Kreisbeck, D. Sheberla, J.H. Montoya, S. Dwaraknath, M. Aykol, C. Ortiz, H. Tribukait, C. Amador-Bedolla, C.J. Brabec, B. Maruyama, K.A. Persson, and A. Aspuru-Guzik: Accelerating the discovery of materials for clean energy in the era of smart automation. Nat. Rev. Mater. 3, 5 (2018).
4. P. De Luna, J. Wei, Y. Bengio, A. Aspuru-Guzik, and E. Sargent: Use machine learning to find energy materials. Nature 552, 23 (2017).
5. R. Ramprasad, R. Batra, G. Pilania, A. Mannodi-Kanakkithodi, and C. Kim: Machine learning in materials informatics: recent applications and prospects. Nat. Comput. Mater. 3, 54 (2017).
6. P. Nikolaev, D. Hooper, F. Webber, R. Rao, K. Decker, M. Krein, J. Poleski, R. Barto, and B. Maruyama: Autonomy in materials research: a case study in carbon nanotube growth. Nat. Comput. Mater. 2, 16031 (2016).
7. E. Smalley: AI-powered drug discovery captures pharma interest. Nat. Biotechnol. 35, 604 (2017).
8. R.D. King, K.E. Whelan, F.M. Jones, P.G.K. Reiser, C.H. Bryant, S.H. Muggleton, D.B. Kell, and S.G. Oliver: Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427, 247 (2004).
9. M.L. Green, C.L. Choi, J.R. Hattrick-Simpers, A.M. Joshi, I. Takeuchi, S.C. Barron, E. Campo, T. Chiang, S. Empedocles, J.M. Gregoire, A.G. Kusne, J. Martin, A. Mehta, K. Persson, Z. Trautt, J.V. Duren, and A. Zakutayev: Fulfilling the promise of the materials genome initiative with high-throughput experimental methodologies. Appl. Phys. Rev. 4, 011105 (2017).
10. A.G. Kusne, T. Gao, A. Mehta, L. Ke, M.C. Nguyen, K.-M. Ho, V. Antropov, C.-Z. Wang, M.J. Kramer, C. Long, and I. Takeuchi: On-the-fly machine-learning for high-throughput experiments: search for rare-earth-free permanent magnets. Sci. Rep. 4, 6367 (2014).
11. E. Reddington, A. Sapienza, B. Gurau, R. Viswanathan, S. Sarangapani, E.S. Smotkin, and T.E. Mallouk: Combinatorial electrochemistry: a highly parallel, optical screening method for discovery of better electrocatalysts. Science 280, 1735 (1998).
12. J.R. Hattrick-Simpers, J.M. Gregoire, and A.G. Kusne: Perspective: composition–structure–property mapping in high-throughput experiments: turning data into knowledge. APL Mater. 4, 053211 (2016).
13. L.A. Baumes, M. Moliner, N. Nicoloyannis, and A. Corma: A reliable methodology for high throughput identification of a mixture of crystallographic phases from powder x-ray diffraction data. Cryst. Eng. Comm. 10, 1321 (2008).
14. D.D. Lee and H.S. Seung: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788 (1999).
15. C.J. Long, D. Bunker, X. Li, V.L. Karen, and I. Takeuchi: Rapid identification of structural phases in combinatorial thin-film libraries using x-ray diffraction and non-negative matrix factorization. Rev. Sci. Instrum. 80, 103902 (2009).
16. A.G. Kusne, D. Keller, A. Anderson, A. Zaban, and I. Takeuchi: High-throughput determination of structural phase diagram and constituent phases using GRENDEL. Nanotechnology 26, 444002 (2015).
17. R. LeBras, T. Damoulas, J.M. Gregoire, A. Sabharwal, C.P. Gomes, and R.B. van Dover: Constraint Reasoning and Kernel Clustering for Pattern Decomposition with Scaling, in Principles and Practice of Constraint Programming – CP 2011: 17th International Conference, CP 2011, Perugia, Italy, September 12–16, 2011. Proceedings, edited by J. Lee (Springer Berlin Heidelberg, Berlin, Heidelberg, 2011), p. 508.
18. A. Cichocki, R. Zdunek, A.H. Phan, and S. Amari: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation (John Wiley & Sons, Chichester, West Sussex, UK, 2009).
19. P. Smaragdis: Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs, in Independent Component Analysis and Blind Signal Separation: Fifth International Conference, ICA 2004, Granada, Spain, September 22–24, 2004. Proceedings, edited by C. G. Puntonet and A. Prieto (Springer Berlin Heidelberg, Berlin, Heidelberg, 2004), p. 494.
20. S.K. Suram, Y. Xue, J. Bai, R. Le Bras, B. Rappazzo, R. Bernstein, J. Bjorck, L. Zhou, R.B. van Dover, C.P. Gomes, and J.M. Gregoire: Automated phase mapping with AgileFD and its application to light absorber discovery in the V–Mn–Nb oxide system. ACS Comb. Sci. 19, 37 (2017).
21. J. Bai, J. Bjorck, Y. Xue, S.K. Suram, J. Gregoire, and C. Gomes: Relaxation methods for constrained matrix factorization problems: solving the phase mapping problem in materials discovery, in International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (Springer 2017), p. 104.
22. C. Bianchini and P.K. Shen: Palladium-based electrocatalysts for alcohol oxidation in half cells and in direct alcohol fuel cells. Chem. Rev. 109, 4183 (2009).
23. J.M. Gregoire, M.E. Tague, S. Cahen, S. Khan, H.C.D. Abruña, F.J. DiSalvo, and R.B. van Dover: Improved fuel cell oxidation catalysis in Pt1−xTax. Chem. Mater. 22, 1080 (2009).
24. J.M. Gregoire, D. Dale, A. Kazimirov, F.J. DiSalvo, and R.B. van Dover: High energy x-ray diffraction/x-ray fluorescence spectroscopy for high-throughput analysis of composition spread thin films. Rev. Sci. Instrum. 80, 123905 (2009).
25. J. Jin, M. Prochaska, D. Rochefort, D. Kim, L. Zhuang, F. Disalvo, R. Vandover, and H. Abruna: A high-throughput search for direct methanol fuel cell anode electrocatalysts of type PtxBiyPbz. Appl. Surf. Sci. 254, 653 (2007).
26. V. Stanev, V.V. Vesselinov, A.G. Kusne, G. Antoszewski, I. Takeuchi, and B.S. Alexandrov: Unsupervised phase mapping of x-ray diffraction data by nonnegative matrix factorization integrated with custom clustering. npj Comput. Mater. 4, 43 (2018).
27. H. Liu, C. Song, L. Zhang, J. Zhang, H. Wang, and D.P. Wilkinson: A review of anode catalysis in the direct methanol fuel cell. J. Power Sources 155, 95 (2006).
28. M. Andersen, A.J. Medford, J.K. Nørskov, and K. Reuter: Scaling-relation-based analysis of bifunctional catalysis: the case for homogeneous bimetallic alloys. ACS Catal. 7, 3960 (2017).
29. E. Casado-Rivera, Z. Gál, A.C.D. Angelo, C. Lind, F.J. DiSalvo, and H.D. Abruña: Electrocatalytic oxidation of formic acid at an ordered intermetallic PtBi surface. ChemPhysChem 4, 193 (2003).
30. M.E. Tague, J.M. Gregoire, A. Legard, E. Smith, D. Dale, R. Hennig, F.J. DiSalvo, R.B. van Dover, and H.D. Abruña: High throughput thin film Pt-M alloys for fuel electrooxidation: low concentrations of M (M = Sn, Ta, W, Mo, Ru, Fe, In, Pd, Hf, Zn, Zr, Nb, Sc, Ni, Ti, V, Cr, Rh). J. Electrochem. Soc. 159, F880 (2012).
... Understanding material trade-offs, especially using better-choice materials-i.e., functional materials with reduced environmental impact by enhanced durability and lifetime [121]-is paramount for reuse and repurposing. To assist such an understanding, machine learning and AI methods have been proposed, e.g., to accelerate material development by physics-constrained AI [53] and design space exploration by active transfer learning and data augmentation [73]. Such advanced mechanisms improve value retention in circular systems engineering, and therefore, their development offers research avenues with elevated utility. ...
Preprint
The perception of the value and propriety of modern engineered systems is changing. In addition to their functional and extra-functional properties, nowadays' systems are also evaluated by their sustainability properties. The next generation of systems will be characterized by an overall elevated sustainability -- including their post-life, driven by efficient value retention mechanisms. Current systems engineering practices fall short of supporting these ambitions and need to be revised appropriately. In this paper, we introduce the concept of circular systems engineering, a novel paradigm for systems sustainability, and define two principles to successfully implement it: end-to-end sustainability and bipartite sustainability. We outline typical organizational evolution patterns that lead to the implementation and adoption of circularity principles, and outline key challenges and research opportunities.
... Understanding material trade-offs, especially using better-choice materials-i.e., functional materials with reduced environmental impact by enhanced durability and lifetime [120]-is paramount for reuse and repurposing. To assist such an understanding, machine learning and AI methods have been proposed, e.g., to accelerate material development by physics-constrained AI [52] and design space exploration by active transfer learning and data augmentation [72]. Such advanced mechanisms improve value retention in circular systems engineering, and therefore, their development offers research avenues with elevated utility. ...
Article
Full-text available
The perception of the value and propriety of modern engineered systems is changing. In addition to their functional and extra-functional properties, nowadays’ systems are also evaluated by their sustainability properties. The next generation of systems will be characterized by an overall elevated sustainability—including their post-life, driven by efficient value retention mechanisms. Current systems engineering practices fall short of supporting these ambitions and need to be revised appropriately. In this paper, we introduce the concept of circular systems engineering, a novel paradigm for systems sustainability, and define two principles to successfully implement it: end-to-end sustainability and bipartite sustainability. We outline typical organizational evolution patterns that lead to the implementation and adoption of circularity principles, and outline key challenges and research opportunities.
... • Classification of crystal structure using a convolutional neural network, 2017, Park et al. 64 • Symmetry prediction and knowledge discovery from X-ray diffraction patterns using an interpretable machine learning approach, 2020, Suzuki et al. 65 • Emergence and distinction of classes in XRD data via machine learning, 2019, Royse et al. 66 • Automating crystal-structure phase mapping by combining deep learning with constraint reasoning, Gomes et al. 67,68 Transmission electron microscopy (TEM) • Extraction of physical parameters from X-ray spectromicroscopy data using machine learning, 2018, Suzuki et al. 74 • Machine-Learning X-Ray absorption spectra to quantitative accuracy, 2020, Carbone et al. 75 • "Inverting" X-ray absorption spectra of catalysts by machine learning in search for activity descriptors, 2019, 77 • Machine learning enhanced spectroscopic analysis: towards autonomous chemical mixture characterization for rapid process optimization, 2021, Angulo et al. 78 Brunauer-Emmett-Teller (BET) ...
Preprint
Full-text available
Materials acceleration platforms (MAPs) combine automation and artificial intelligence to accelerate the discovery of molecules and materials. They have potential to play a role in addressing complex societal problems such as climate change. Solar chemicals and fuels generation via heterogeneous CO2 photo(thermal)catalysis is a relatively unexplored process that holds potential for contributing towards an environmentally and economically sustainable future, and therefore a very promising application for MAP science and engineering. Here, we present a brief overview of how design and innovation in heterogeneous CO2 photo(thermal)catalysis, from materials discovery to engineering and scale-up, could benefit from MAPs. We discuss relevant design and performance descriptors and the level of automation of state-of-the-art experimental techniques, and we review examples of artificial intelligence in data analysis. Based on these precedents, we finally propose a MAP outline for autonomous and accelerated discoveries in the emerging field of solar chemicals and fuels sourced from CO2 photo(thermal)catalysis.
... Understanding material trade-offs, especially using better-choice materials-i.e., functional materials with reduced environmental impact by enhanced durability and lifetime [121]-is paramount for reuse and repurposing. To assist such an understanding, machine learning and AI methods have been proposed, e.g., to accelerate material development by physics-constrained AI [53] and design space exploration by active transfer learning and data augmentation [73]. Such advanced mechanisms improve value retention in circular systems engineering, and therefore, their development offers research avenues with elevated utility. ...
Preprint
Full-text available
The perception of the value and propriety of modern engineered systems is changing. In addition to their functional and extra-functional properties, nowadays' systems are also evaluated by their sustainability properties. The next generation of systems will be characterized by an overall elevated sustainability—including their post-life, driven by efficient value retention mechanisms. Current systems engineering practices fall short of supporting these ambitions and need to be revised appropriately. In this paper, we introduce the concept of circular systems engineering, a novel paradigm for systems sustainability, and define two principles to successfully implement it: end-to-end sustainability and bipartite sustainability. We outline typical organizational evolution patterns that lead to the implementation and adoption of circularity principles, and outline key challenges and research opportunities.
... Future inclusion of GxE interactions to the model can account for better tailoring genotypes and management practices to expected climates [33]. New artificial intelligence models and correlation structures may even unriddle yet undiscovered aspects on the GxExM interaction [74]. Last, future frameworks should also consider the complexity of multi-objective optimization approaches in a Bayesian stochastic environment, considering the objectives of farmers, market and outcomes uncertainties (e.g. ...
Article
Full-text available
Crop yield results from the complex interaction between genotype, management, and environment. While farmers have control over what genotype to plant, and how to manage it, their decisions are often sub-optimal due to climate variability. Sub-seasonal climate predictions embrace the great potential to improve risk analysis and decision-making. However, adequate frameworks integrating future weather uncertainty to predict crop outcomes are lacking. Maize (Zea mays L.) yields are highly sensitive to weather anomalies, and very responsive to plant density (plants m-2), thus this variable could be optimized conditional to the seasonal prospects. The aims of this study were to (i) design a model that describes the yield-to-plant density (herein termed as yield-density) relationship as a function of weather variables, (ii) evaluate the predictive performance and analyze the sources of uncertainty, and (iii) provide probabilistic forecasts for predicting the economic optimum plant density (EOPD). We present a novel approach to enable decision-making in agriculture using sub-seasonal climate predictions and Bayesian modeling. This model provides crop management recommendations by accounting for various sources of uncertainty. A Bayesian hierarchical shrinkage model was fitted to the response of maize yield-density trials performed during the 2010-2019 period across 7 states in the United States, identifying the relative importance of key weather, crop, and soil variables. Tercile forecasts of precipitation and temperature from the International Research Institute (IRI) were used to forecast EOPD before the start of the season. The variables with the greatest influence on the yield-density relationship were weather anomalies, especially months with above-normal temperatures. Improvements on climate forecasting may also improve precision, as the coefficient of determination (R2) increased from 0.26 to 0.32 when weather forecasts were correct. 
This study may contribute to the development of decision-support tools that can trigger discussions between farmers and consultants about management strategies and their associated risks.
Article
Full-text available
Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis–process–structure–property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis–process–structure–property relAtionship coreGionalized lEarner (SAGE) algorithm. A fully Bayesian algorithm that uses multimodal coregionalization and probability to merge knowledge across data sources into a unified model of synthesis–process–structure–property relationships. SAGE outputs a probabilistic posterior including the most likely relationship given the data along with proper uncertainty quantification. Beyond autonomous systems, SAGE will allow materials researchers to unify knowledge across their lab toward making better experiment design decisions.
Article
Full-text available
The rapid growth of automated and autonomous instrumentation brings forth opportunities for the co-orchestration of multimodal tools that are equipped with multiple sequential detection methods or several characterization techniques to explore identical samples. This is exemplified by combinatorial libraries that can be explored in multiple locations via multiple tools simultaneously or downstream characterization in automated synthesis systems. In co-orchestration approaches, information gained in one modality should accelerate the discovery of other modalities. Correspondingly, an orchestrating agent should select the measurement modality based on the anticipated knowledge gain and measurement cost. Herein, we propose and implement a co-orchestration approach for conducting measurements with complex observables, such as spectra or images. The method relies on combining dimensionality reduction by variational autoencoders with representation learning for control over the latent space structure and integration into an iterative workflow via multi-task Gaussian Processes (GPs). This approach further allows for the native incorporation of the system's physics via a probabilistic model as a mean function of the GPs. We illustrate this method for different modes of piezoresponse force microscopy and micro-Raman spectroscopy on a combinatorial Sm-BiFeO3 library. However, the proposed framework is general and can be extended to multiple measurement modalities and arbitrary dimensionality of the measured signals.
Article
Full-text available
Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when incorporating out-of-domain knowledge into the design process or comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures and obtaining new first-principles data – natural vibrational frequencies – via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on one hand, and their capacity in autonomous collaboration through the dynamic LLM-based multi-agent environment on the other hand, unleashes great potentials of LLMs in addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.
Article
Full-text available
While the vision of accelerating materials discovery using data driven methods is well-founded, practical realization has been throttled due to challenges in data generation, ingestion, and materials state-aware machine learning. High-throughput experiments and automated computational workflows are addressing the challenge of data generation, and capitalizing on these emerging data resources requires ingestion of data into an architecture that captures the complex provenance of experiments and simulations. In this manuscript, we describe an event-sourced architecture for materials provenance (ESAMP) that encodes the sequence and interrelationships among events occurring in a simulation or experiment. We use this architecture to ingest a large and varied dataset (MEAD) that contains raw data and metadata from millions of materials synthesis and characterization experiments performed using various modalities such as serial, parallel, multi-modal experimentation. Our data architecture tracks the evolution of a material's state, enabling a demonstration of how state-equivalency rules can be used to generate datasets that significantly enhance data-driven materials discovery. Specifically, using state-equivalency rules and parameters associated with state-changing processes in addition to the typically used composition data, we demonstrated marked reduction of uncertainty in prediction of overpotential for oxygen evolution reaction (OER) catalysts. Finally, we discuss the importance of ESAMP architecture in enabling several aspects of accelerated materials discovery such as dynamic workflow design, generation of knowledge graphs, and efficient integration of simulation and experiment.
Article
Full-text available
Analyzing large X-ray diffraction (XRD) datasets is a key step in high-throughput mapping of the compositional phase diagrams of combinatorial materials libraries. Optimizing and automating this task can help accelerate the process of discovery of materials with novel and desirable properties. Here, we report a new method for pattern analysis and phase extraction of XRD datasets. The method expands the Nonnegative Matrix Factorization method, which has been used previously to analyze such datasets, by combining it with custom clustering and cross-correlation algorithms. This new method is capable of robust determination of the number of basis patterns present in the data which, in turn, enables straightforward identification of any possible peak-shifted patterns. Peak-shifting arises due to continuous change in the lattice constants as a function of composition, and is ubiquitous in XRD datasets from composition spread libraries. Successful identification of the peak-shifted patterns allows proper quantification and classification of the basis XRD patterns, which is necessary in order to decipher the contribution of each unique single-phase structure to the multi-phase regions. The process can be utilized to determine accurately the compositional phase diagram of a system under study. The presented method is applied to one synthetic and one experimental dataset, and demonstrates robust accuracy and identification abilities.
Article
Full-text available
Artificial intelligence can speed up research into new photovoltaic, battery and carbon-capture materials, argue Edward Sargent, Alán Aspuru-Guzikand colleagues.
Article
Full-text available
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo. © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
Conference Paper
Matrix factorization is a robust and widely adopted technique in data science, in which a given matrix is decomposed as the product of low rank matrices. We study a challenging constrained matrix factorization problem in materials discovery, the so-called phase mapping problem. We introduce a novel “lazy” Iterative Agile Factor Decomposition (IAFD) approach that relaxes and postpones non-convex constraint sets (the lazy constraints), iteratively enforcing them when violations are detected. IAFD interleaves multiplicative gradient-based updates with efficient modular algorithms that detect and repair constraint violations, while still ensuring fast run times. Experimental results show that IAFD is several orders of magnitude faster and its solutions are also in general considerably better than previous approaches. IAFD solves a key problem in materials discovery while also paving the way towards tackling constrained matrix factorization problems in general, with broader implications for data science.
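The "lazy constraint" idea in this abstract (run cheap multiplicative updates, and only periodically detect and repair violations) can be sketched as follows. This is an illustrative toy under stated assumptions, not the IAFD algorithm itself: the function name `lazy_nmf`, the choice of constraint (at most `max_active` nonzero activations per sample, loosely inspired by a phase-count limit), and the repair rule (zero out the smallest activations) are all hypothetical.

```python
import numpy as np

def lazy_nmf(X, rank, max_active, iters=300, check_every=25, seed=0):
    """Toy NMF (X ~ W @ H) with a lazily enforced sparsity constraint on H.

    Assumed constraint for illustration: each column of H (one sample) may
    have at most `max_active` nonzero entries.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + 0.1
    H = rng.random((rank, n_samples)) + 0.1
    eps = 1e-12  # guards against division by zero

    for it in range(1, iters + 1):
        # Standard multiplicative updates for the Frobenius-norm objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)

        # Lazy step: only every `check_every` iterations, detect violations...
        if it % check_every == 0:
            active = np.count_nonzero(H, axis=0)
            for j in np.flatnonzero(active > max_active):
                # ...and repair them by zeroing all but the largest activations.
                drop = np.argsort(H[:, j])[:-max_active]
                H[drop, j] = 0.0
    return W, H

# Synthetic data: 3 basis patterns, each sample mixing at most 2 of them.
rng = np.random.default_rng(1)
W_true = rng.random((40, 3))
H_true = rng.random((3, 30))
H_true[rng.integers(0, 3, size=30), np.arange(30)] = 0.0
X = W_true @ H_true

W, H = lazy_nmf(X, rank=3, max_active=2)
print(np.count_nonzero(H, axis=0).max())
```

Because multiplicative updates keep an entry at zero once it has been zeroed, a repaired column stays feasible in later iterations. That property is what makes postponing the constraint check cheap in this sketch.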
Article
The Materials Genome Initiative, a national effort to introduce new materials into the market faster and at lower cost, has made significant progress in computational simulation and modeling of materials. To build on this progress, a large amount of experimental data for validating these models, and informing more sophisticated ones, will be required. High-throughput experimentation generates large volumes of experimental data using combinatorial materials synthesis and rapid measurement techniques, making it an ideal experimental complement to bring the Materials Genome Initiative vision to fruition. This paper reviews the state-of-the-art results, opportunities, and challenges in high-throughput experimentation for materials design. A major conclusion is that an effort to deploy a federated network of high-throughput experimental (synthesis and characterization) tools, which are integrated with a modern materials data infrastructure, is needed.
Article
Advances in materials are an important contributor to our technological progress, and yet the process of materials discovery and development itself is slow. Our current research process is human-centred, where human researchers design, conduct, analyse and interpret experiments, and then decide what to do next. We have built an Autonomous Research System (ARES)—an autonomous research robot capable of first-of-its-kind closed-loop iterative materials experimentation. ARES exploits advances in autonomous robotics, artificial intelligence, data sciences, and high-throughput and in situ techniques, and is able to design, execute and analyse its own experiments orders of magnitude faster than current research methods. We applied ARES to study the synthesis of single-walled carbon nanotubes, and show that it successfully learned to grow them at targeted growth rates. ARES has broad implications for the future roles of humans and autonomous research robots, and for human-machine partnering. We believe autonomous research robots like ARES constitute a disruptive advance in our ability to understand and develop complex materials at an unprecedented rate.
Article
Rapid construction of phase diagrams is a central tenet of combinatorial materials science, yet accelerated materials discovery efforts are often hampered by challenges in interpreting combinatorial x-ray diffraction datasets. We address this by developing AgileFD, an artificial intelligence algorithm that enables rapid phase mapping from a combinatorial library of x-ray diffraction patterns. AgileFD models alloying-based peak shifting through a novel expansion of convolutional nonnegative matrix factorization, which not only improves the identification of constituent phases but also maps their concentration and lattice parameter as a function of composition. By incorporating the Gibbs phase rule into the algorithm, physically meaningful phase maps are obtained with unsupervised operation, and more refined solutions are attained by injecting expert knowledge of the system. The algorithm is demonstrated through investigation of the V-Mn-Nb oxide system, where decomposition of eight oxide phases, including two with substantial alloying, provides the first phase map for this pseudo-ternary system. This phase map enables interpretation of high-throughput band gap data, leading to the discovery of new solar light absorbers and the alloying-based tuning of the direct-allowed band-gap energy of MnV2O6. The open-source family of AgileFD algorithms can be implemented into a broad range of high-throughput workflows to accelerate materials discovery.
Article
The discovery and development of novel materials in the field of energy are essential to accelerate the transition to a low-carbon economy. Bringing recent technological innovations in automation, robotics and computer science together with current approaches in chemistry, materials synthesis and characterization will act as a catalyst for revolutionizing traditional research and development in both industry and academia. This Perspective provides a vision for an integrated artificial intelligence approach towards autonomous materials discovery, which, in our opinion, will emerge within the next 5 to 10 years. The approach we discuss requires the integration of the following tools, which have already seen substantial development to date: high-throughput virtual screening, automated synthesis planning, automated laboratories and machine learning algorithms. In addition to reducing the time to deployment of new materials by an order of magnitude, this integrated approach is expected to lower the cost associated with the initial discovery. Thus, the price of the final products (for example, solar panels, batteries and electric vehicles) will also decrease. This in turn will enable industries and governments to meet more ambitious targets in terms of reducing greenhouse gas emissions at a faster pace.
Article
We present a generic analysis of the implications of energetic scaling relations on the possibilities for bifunctional gains at homogeneous bimetallic alloy catalysts. Such catalysts exhibit a large number of interface sites, where second-order reaction steps can involve intermediates adsorbed at different active sites. Using different types of model reactions, we show that such site-coupling reaction steps can provide bifunctional gains that allow for a bimetallic catalyst composed of two individually poor catalyst materials to approach the activity of the optimal mono-material catalyst. However, bifunctional gains cannot result in activities higher than the activity peak of the mono-material volcano curve as long as both sites obey similar scaling relations, as is generally the case for bimetallic catalysts. These scaling-relation-imposed limitations could be overcome by combining different classes of materials such as metals and oxides.