Combining Machine Learning and
High-Throughput Science
Logan Ward
Postdoctoral Scholar
Dept. of Computer Science, University of Chicago
Data Science and Learning Division, Argonne National Laboratory
9 February 2019
Machine Learning is a Modeling Tool
FEA Image: http://www.icams.de/content/research/index.html
Goal: Accelerate design of materials
Method: Replace experiments with computers
Many Theory-Based Tools:
•Density Functional Theory
•Phase Field
•Finite Element Analysis
•Computational Thermodynamics
Emerging Field: Materials Informatics
Machine Learning
Materials
Data
Predictive
Model
…but some things set it apart
Advantages over [Purely] Physics-Based Models:
✓Fast: 10⁴ to 10⁷ evaluations/CPU/sec
✓Adaptable: limited need to know the underlying physics
✓Self-correcting: improves with more data
✓Unbiased: can lead to unexpected predictions
Balachandran et al. Sci. Rep. (2016), 19660.
My Talk Today:
-Show the relationship between ML and Automated Science
-Discuss how to make such studies more common
To realize these benefits, we must perform many experiments
Acknowledgements:
NU (EECS): R. Liu, A. Krishna, A. Agrawal
NU (MSE): K. Kim, J. He, V. Hegde, P. Voorhees, C. Wolverton
ML and HT Computation
High-Throughput DFT: A Common Tool
Chen et al. Chem. Mater. (2012)
Kirklin et al. PCCP (2014)
Nyshadham et al. Acta Mat (2017)
Many Computations Spent on Less Promising Areas:
Can we do better with ML?
Quaternary Heuslers
Goal: Find more Quaternary Heuslers (QHs)
Why quaternary Heuslers?
Ternary Heuslers have great properties,
…so a 4th degree of freedom could be better?
Problem: ~3M possible combinations
Solution: Guide search with ML
DFT
He et al. PRL. (2016), 046602; Jung et al. MMTA. (2003), 1221
Ref: Kim et al., PRM (2018)
Materials Informatics Workflow
Collect -> Process -> Represent -> Learn
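The four workflow stages can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the author's code: the records, the composition-fraction features, and the 1-nearest-neighbor "model" are hypothetical stand-ins chosen to keep the example dependency-free.

```python
# Hypothetical raw records: (composition as element -> atom count, formation energy in eV/atom).
RAW = [
    ({"Fe": 2, "Al": 1}, -0.35),
    ({"Ni": 1, "Al": 1}, -0.60),
    ({"Ni": 1, "Al": 1}, -0.60),  # exact duplicate, dropped in the "process" step
    ({"Co": 2, "Ti": 1}, -0.25),
]
ELEMENTS = ["Al", "Co", "Fe", "Ni", "Ti"]


def collect():
    """Collect: gather raw entries (here, a hard-coded list)."""
    return list(RAW)


def process(records):
    """Process: drop exact duplicates while preserving order."""
    seen, cleaned = set(), []
    for comp, energy in records:
        key = (tuple(sorted(comp.items())), energy)
        if key not in seen:
            seen.add(key)
            cleaned.append((comp, energy))
    return cleaned


def represent(comp):
    """Represent: map a composition to a fixed-length vector of atomic fractions."""
    total = sum(comp.values())
    return [comp.get(el, 0) / total for el in ELEMENTS]


def learn(records):
    """Learn: a 1-nearest-neighbor regressor over the feature vectors."""
    data = [(represent(c), e) for c, e in records]

    def predict(comp):
        x = represent(comp)

        def sq_dist(v):
            return sum((a - b) ** 2 for a, b in zip(x, v))

        return min(data, key=lambda pair: sq_dist(pair[0]))[1]

    return predict


model = learn(process(collect()))
print(model({"Fe": 3, "Al": 1}))  # -> -0.35 (nearest training entry is Fe2Al)
```

Real studies swap each stub for something heavier (a database query, cleaning rules, a crystal-structure representation, a random forest), but the hand-off between stages stays the same.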
Step 1: Gather Data
Ref: Kim et al., in preparation
Data Source: Open Quantum Materials Database (OQMD)
Why OQMD?
✔Large
✔Contains unstable materials
✔>200k ternary Heuslers
✔90k quaternary Heuslers
What do we want in an ML model?
¹Faber et al. Int J Quantum Chemistry. 2015
Stability = f(“Crystal Structure”)
Key Concern: How to represent a crystal structure
Must be invariant to…
1. Translation
2. Rotation
3. Unit cell choice
Representation should be…
1. Compact¹
2. Complete¹
3. Unique¹
4. Descriptive¹
5. Deformation stable
6. Relaxation insensitive
[Figure: energy (E) vs. configuration, comparing the truth, observations, and an ideal model]
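The invariance requirements can be checked numerically. The sketch below uses the sorted list of interatomic distances as a toy representation (an illustrative choice, not the Voronoi-based method that follows); it is unchanged by translation, rotation, and atom reordering. It ignores periodic images, so it says nothing about invariance to unit-cell choice.

```python
import math
from itertools import combinations


def sorted_distances(positions):
    """Toy representation: all pairwise distances, sorted (no periodic images)."""
    return sorted(math.dist(a, b) for a, b in combinations(positions, 2))


def translate(positions, shift):
    """Shift every atom by the same vector."""
    return [tuple(p + s for p, s in zip(pos, shift)) for pos in positions]


def rotate_z(positions, angle):
    """Rotate every atom about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]


def same(a, b, tol=1e-9):
    """Element-wise comparison with a floating-point tolerance."""
    return len(a) == len(b) and all(abs(x - y) < tol for x, y in zip(a, b))


atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.5, 0.3), (0.7, 0.7, 1.1)]
ref = sorted_distances(atoms)

print(same(ref, sorted_distances(translate(atoms, (3.0, -1.0, 2.0)))))  # True
print(same(ref, sorted_distances(rotate_z(atoms, 0.8))))                # True
print(same(ref, sorted_distances(atoms[::-1])))                         # True
```

Such a representation is compact and invariant, but it is not unique (different structures can share a distance list), which is one reason the list above asks for more than invariance.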
Our Method: Voronoi Tessellations
Ref: Ward et al. PRB, (2017) 024104
Atomic Characteristics:
1. Element identity
2. Coordination number
3. Bond length
4. Cell size
5. Chemical ordering
6. Packing efficiencies
7. Neighbor identities
8. Cell shape
Atomic Characteristics + Statistics over Unit Cell = 275 Attributes
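The "statistics over unit cell" step turns per-atom characteristics into a fixed-length vector, no matter how many atoms the cell contains. A minimal sketch follows; the particular statistics (mean, mean absolute deviation, minimum, maximum) and the example per-atom values are assumptions for illustration, not the exact recipe behind the 275 attributes.

```python
from statistics import mean


def cell_statistics(values):
    """Summarize one per-atom characteristic over all atoms in the unit cell."""
    mu = mean(values)
    return {
        "mean": mu,
        "mean_abs_dev": mean(abs(v - mu) for v in values),
        "min": min(values),
        "max": max(values),
    }


def featurize(per_atom):
    """Flatten the statistics of every characteristic into one feature dict."""
    features = {}
    for name, values in per_atom.items():
        for stat, value in cell_statistics(values).items():
            features[f"{name}_{stat}"] = value
    return features


# Hypothetical per-atom values for a 4-atom cell.
cell = {
    "coordination_number": [12, 12, 14, 14],
    "mean_bond_length": [2.50, 2.50, 2.60, 2.60],
}
features = featurize(cell)
print(len(features), features["coordination_number_mean"])
```

Because every statistic collapses a variable-length list to one number, a 4-atom cell and a 40-atom cell produce vectors of the same length, which is what a standard ML model needs.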
Other Advantageous Features
Volume Invariant
✓Resistant to relaxations
✓Reduced need to know lattice parameters
Quick Training (Random Forest)
✓Scales with
✓Very parallelizable
One Tessellation Per Prototype
✓Fast for prototype search
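One way to see how a representation can be "volume invariant": if bond lengths are divided by their mean (a normalization assumed here for illustration), an isotropic rescaling of the cell, a simplified stand-in for the volume change during a relaxation, leaves the features unchanged.

```python
import math
from itertools import combinations


def normalized_bond_lengths(positions):
    """Pairwise distances divided by their mean: a scale-free toy feature."""
    d = sorted(math.dist(a, b) for a, b in combinations(positions, 2))
    avg = sum(d) / len(d)
    return [x / avg for x in d]


def scale(positions, factor):
    """Isotropically rescale all coordinates (e.g., a lattice-parameter change)."""
    return [tuple(factor * c for c in p) for p in positions]


atoms = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (0.0, 1.8, 0.0), (0.9, 0.9, 1.3)]
before = normalized_bond_lengths(atoms)
after = normalized_bond_lengths(scale(atoms, 1.07))  # 7% larger lattice parameter
print(max(abs(a - b) for a, b in zip(before, after)) < 1e-12)  # True
```

Real relaxations also change cell shape and internal coordinates, so this shows only the isotropic part of the argument.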
Learning Rate Comparison (2016)
Dataset: 32k DFT calculations from the OQMD
Test: Remove 1000, train on N remaining
CM: Faber et al. Int J Quantum Chemistry. 2015; PRDF: Schütt et al. PRB. (2014)
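The test protocol (remove 1000 entries, train on N of the rest) can be sketched as a split generator. The fixed random seed and the particular N values are assumptions; model training and scoring are left out.

```python
import random


def learning_curve_splits(n_total, n_test=1000, train_sizes=(1000, 4000, 16000), seed=0):
    """Yield (train_indices, test_indices) pairs for a learning-curve study.

    One fixed test set of `n_test` entries is removed first; each training set
    is a random subset of size N drawn from the remaining entries.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    test, pool = indices[:n_test], indices[n_test:]
    for n in train_sizes:
        if n > len(pool):
            raise ValueError(f"N={n} exceeds the {len(pool)} entries left for training")
        yield rng.sample(pool, n), test


for train, test in learning_curve_splits(32_000):
    print(len(train), len(test))
```

Keeping the same held-out 1000 entries for every N means differences along the curve reflect the training-set size rather than a changing test set.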
Learning Rate Comparison (2018)
Dataset: 32k DFT calculations from the OQMD
Test: Remove 1000, train on N remaining
CM: Faber et al. Int J Quantum Chemistry. 2015; PRDF: Schütt et al. PRB. (2014)
Reasonable Accuracies of Formation Enthalpy
Is our model useful?
[Figure panels: Interpolate to different chemistries; Identify stable polymorph]
Test: 10-fold CV, grouped by composition
MAE: 53 meV/atom (<30% of the mean absolute deviation of the data)
RMSE: 74 meV/atom (<30% of the standard deviation of the data)
R²: 0.91
Random ordering:
Model is…
•accurate on unobserved compositions
•able to identify most stable polymorph
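"Grouped by composition" means every entry that shares a composition (e.g., all polymorphs of one formula) lands in the same fold, so the model is always scored on compositions it never saw during training. A minimal sketch of such a split and of the reported metrics follows; the round-robin assignment of groups to folds and the example numbers are illustrative assumptions, and MAD here is the mean absolute deviation of the reference values from their mean.

```python
import math
from collections import defaultdict


def grouped_folds(groups, k=10):
    """Assign each distinct group to one of k folds; return a list of index lists."""
    folds = [[] for _ in range(k)]
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    for j, g in enumerate(sorted(members)):
        folds[j % k].extend(members[g])
    return folds


def metrics(y_true, y_pred):
    """MAE, RMSE, R^2, and the MAD of the reference values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mad = sum(abs(t - mean_y) for t in y_true) / n
    return {"MAE": mae, "RMSE": math.sqrt(ss_res / n), "R2": 1 - ss_res / ss_tot, "MAD": mad}


# Both NiAl polymorphs (and both Co2TiAl entries) always share a fold.
comps = ["NiAl", "NiAl", "Fe3Al", "Co2TiAl", "Co2TiAl"]
folds = grouped_folds(comps, k=2)
print(folds)

m = metrics([-0.5, -0.2, 0.1, -0.3], [-0.45, -0.25, 0.05, -0.3])
print(m)  # MAE / MAD is the "<30% MAD" style ratio quoted above
```

An ungrouped split would let one polymorph sit in training while its sibling sits in the test set, which inflates apparent accuracy on "unobserved" compositions.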
Finding New Heuslers
ML success rate:
55 / 909 (6%)
Original search:
353 / 96189 (0.37%)
ML search >10x faster than conventional search
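The ">10x" claim follows directly from the two success rates; the arithmetic is:

```python
ml_hits, ml_tested = 55, 909
orig_hits, orig_tested = 353, 96189

ml_rate = ml_hits / ml_tested        # fraction of ML-suggested candidates that were hits
orig_rate = orig_hits / orig_tested  # fraction of the original exhaustive search that were hits
print(f"{ml_rate:.1%} vs {orig_rate:.2%}: {ml_rate / orig_rate:.1f}x")
# prints "6.1% vs 0.37%: 16.5x"
```

A roughly 16x higher hit rate means about 16x fewer DFT calculations per new Heusler found, comfortably above the 10x stated on the slide.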
Acknowledgments:
Northwestern: Chris Wolverton
NIST: J. Hattrick-Simpers
SLAC: F. Ren, A. Mehta
USC: T. Williams
UNSW: K. Laws
ML and High Throughput Experiment
Sputtered Metallic Glasses
Goal: Find new sputterable glasses
Applications: Wear coatings
Problem: Sputtered MGs are not well explored
Number of alloys: 393
Number of ternaries: 31
Data Source: Landolt-Börnstein
Possible Space: 24 elements ~= 2.4M ternary alloys
Even at 1000 alloys/day -> ~6 years of searching
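The slide gives only the totals. One way to reproduce the ~2.4M estimate is to assume (this is not stated in the source) a 2 at.% composition grid that excludes pure elements and binary edges:

```python
from math import comb

n_elements = 24
step = 2                          # at.% grid spacing (assumed)
n_systems = comb(n_elements, 3)   # distinct ternary systems

# Compositions (a, b, c) in multiples of 2 at.%, summing to 100, all strictly > 0:
# the number of ways to split n = 50 units into 3 positive parts is C(n - 1, 2).
n = 100 // step
per_system = comb(n - 1, 2)

total = n_systems * per_system
years = total / 1000 / 365        # at 1000 alloys/day
print(n_systems, per_system, total, round(years, 1))
# prints "2024 1176 2380224 6.5"
```

Under this assumption the search space is about 2.38M alloys, and even a 1000-alloy/day throughput needs about 6.5 years, consistent with the figures above.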
Need: Prioritize promising experiments
Approach: Machine Learning
Ding, et al. Nat. Mat. (2014)
Ref: Ren et al. Sci Adv. (2018), eaaq1566
Grosse et al. PRB. (2006)
Why Machine Learning?
[Figure panels: Mixing thermodynamics; Size mismatch]
Yang and Zhang. Mat. Chem. Phys. (2012)
Laws et al. Nat. Comm. (2015); Perim et al. Nat. Comm. (2015)
Theories Still Limited, or Computationally Infeasible
Starting Point: Model from 2016
Ref: Ward et al. npj Comp. Mat., (2016) 28
Dataset: Landolt-Börnstein
6836 experimental measurements
295 ternary systems
All measurements taken using melt-spinning
Binary property: [Can Form Glass] | [Cannot]
Model: Random Forest
90% accurate in 10-fold cross-validation
Can interpolate between ternaries
[Figure panels: Experiment; ML prediction]
Step 1: Account for Processing Method
Ref: Ren et al. Sci Adv. (2018), eaaq1566
Stacking: Hutchinson et al. NIPS (2017)
Ding, et al. Nat. Mat. (2014)
Wikipedia
[Figure: Melt Spinning Data -> ML; Sputtering Data -> ML; the two linked by “Stacking”]
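A minimal sketch of the stacking idea: the prediction of a model trained on the plentiful melt-spinning data becomes an extra input feature for a model trained on the scarcer sputtering data. Everything here is a hypothetical stand-in (the base-model function, the one-number composition feature, the nearest-neighbor second model, and the data); it illustrates the data flow, not the published models.

```python
import math


def melt_spinning_model(x):
    """Hypothetical base model: glass-forming likelihood for a composition feature x in [0, 1]."""
    return math.exp(-((x - 0.4) ** 2) / 0.02)


def stack_features(x):
    """Augment the raw feature with the base model's prediction."""
    return (x, melt_spinning_model(x))


def train_sputtering_model(data):
    """Second-level model: 1-nearest-neighbor classifier on the stacked features."""
    stacked = [(stack_features(x), label) for x, label in data]

    def predict(x):
        f = stack_features(x)
        return min(stacked, key=lambda item: math.dist(f, item[0]))[1]

    return predict


# Hypothetical sputtering observations: (composition feature, formed glass?)
sputter_data = [(0.10, False), (0.35, True), (0.45, True), (0.60, True), (0.90, False)]
sputter_model = train_sputtering_model(sputter_data)
print(sputter_model(0.42), sputter_model(0.95))  # True False
```

The second model never has to relearn what the melt-spinning data already encodes; it only learns how sputtering shifts that prior, which is why stacking helps when the sputtering dataset is small.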
Step 1: Account for Processing Method
Ref: Ren et al. Sci Adv. (2018), eaaq1566
Goal: Find new glasses via sputtering
Step 1: Include processing information
[Figure: Melt Spinning -> Sputtering; “Tune for Sputtering”]
Step 1: Account for Processing Method
Test: Leave-Ternary-Out CV
Test Set: 393 alloys, 31 ternary diagrams
Melt-spinning model: AUC 0.76
Sputtering model: AUC 0.78
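AUC can be read as the probability that a randomly chosen glass-forming alloy gets a higher score than a randomly chosen non-glass: 0.5 is random guessing, 1.0 is a perfect ranking. A minimal, dependency-free computation (ties count one half), with made-up scores:

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out ternary: 1 = formed glass, 0 = crystalline
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

The pairwise form is O(n²) and only meant for small held-out sets; library implementations sort the scores instead. AUC depends only on ranking, which suits a screening task where the goal is to order candidates for experiments.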
HiTp Materials Discovery
Co-Deposition -> Automated XRD -> FWHM Analysis -> New Data
Ding, et al. Nat. Mat. (2014)
More data in one experiment than in the past 30 years of sputtering experiments
Search Space: All sputterable ternary alloys (2.4M Alloys)
CPU Time: ~4 CPU hours
Selected Experiment: Co-V-Zr
Evaluation Technique: Combinatorial Co-Deposition
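The FWHM step classifies each deposited composition by the width of its main XRD peak: broad, diffuse peaks point to an amorphous (glassy) phase, sharp peaks to a crystalline one. A minimal sketch of measuring FWHM from a sampled pattern by linear interpolation at half maximum follows; the synthetic Gaussian peak and the 1° threshold are hypothetical, not the values used in the study.

```python
import math


def fwhm(x, y):
    """Full width at half maximum of the highest peak in sampled data (x ascending)."""
    i_max = max(range(len(y)), key=y.__getitem__)
    half = y[i_max] / 2.0

    # Walk left until the signal falls below half maximum, then interpolate.
    i = i_max
    while i > 0 and y[i] >= half:
        i -= 1
    left = x[i] + (half - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])

    # Walk right likewise.
    j = i_max
    while j < len(y) - 1 and y[j] >= half:
        j += 1
    right = x[j - 1] + (half - y[j - 1]) * (x[j] - x[j - 1]) / (y[j] - y[j - 1])
    return right - left


def looks_amorphous(x, y, threshold_deg=1.0):
    """Hypothetical rule: call the sample glassy if its main peak is broad."""
    return fwhm(x, y) > threshold_deg


# Synthetic Gaussian peak at 2-theta = 38 deg; exact FWHM = 2*sqrt(2 ln 2)*sigma.
sigma = 1.5
two_theta = [30 + 0.01 * k for k in range(1601)]
intensity = [math.exp(-((t - 38) ** 2) / (2 * sigma ** 2)) for t in two_theta]
print(round(fwhm(two_theta, intensity), 3), looks_amorphous(two_theta, intensity))
```

Because the rule reduces each diffraction pattern to one number, it can run automatically on every point of a combinatorial library, which is what turns the XRD data into new training labels.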
Comparison to Experiment
Decent agreement. ML model “Zr-lean”, but close enough for success
J. Hattrick-Simpers (NIST), A. Mehta (SLAC), F. Ren (SLAC), T. Williams (USC)
Repeat, with Improved Model
[Figure panels: Before Co-V-Zr; After Co-V-Zr]
Initial Model -> Add Co-V-Zr -> Improved Accuracy
Original Model + HiTp Data -> Updated Model
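The "repeat with improved model" cycle is an active-learning loop: score the candidates, run the experiment the model ranks highest, add the result, retrain. The sketch below is a toy version; the 1-D candidate grid, the hidden `run_experiment` oracle, the nearest-neighbor model, and the selection rule (highest score, ties broken by distance from already-tested points) are all hypothetical.

```python
def run_experiment(x):
    """Hypothetical oracle standing in for a co-deposition + XRD measurement."""
    return 3 <= x <= 5  # True means "glass formed"


def train(data):
    """Toy model: predict the label of the nearest already-tested candidate."""
    def score(x):
        _, label = min(data, key=lambda item: abs(item[0] - x))
        return 1.0 if label else 0.0
    return score


def select(score, candidates, data):
    """Pick the highest-scoring untested candidate; break ties by exploring far from tested points."""
    tested = {x for x, _ in data}
    untested = [x for x in candidates if x not in tested]
    return max(untested, key=lambda x: (score(x), min(abs(x - t) for t in tested)))


candidates = list(range(11))     # hypothetical 1-D composition grid
data = [(0, False), (4, True)]   # initial training data
for round_ in range(4):
    model = train(data)                    # 1. (re)train on everything measured so far
    x = select(model, candidates, data)    # 2. choose the next experiment
    data.append((x, run_experiment(x)))    # 3. run it and add the result
    print(f"round {round_ + 1}: tested {x}, glass={data[-1][1]}")
```

Negative results (no glass) are kept too: they sharpen the model's boundary just as positives do, which matches the observation that adding HiTp data improved accuracy.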
Improvements for Many Systems
Melt-spinning model: AUC 0.61
Landolt-Börnstein + all HiTp data: AUC 0.77
Sputtering model: AUC 0.64
Test: Leave-Ternary-Out CV
Test Set: 673 alloys, 35 ternary diagrams
HiTp Data Improves ML Accuracy for all Systems
ML and High-Throughput Science
Ref: Ren et al. Sci Adv. (2018), eaaq1566
Why isn’t ML more common?
Where are the roadblocks?
Desired Outcome: Adaptive design with automated experiments
Figure: Balachandran et al. Sci. Rep. (2016), 19660. doi: 10.1038/srep19660
Problems with Data
Problems with Learning
Many Facets to Data Problems
See: Ward et al. MRS Bulletin (2018)
MDF (Materials Data Facility): Simplify Access to Data
•Query
•Browse
•Aggregate
•Mint DOIs
•Associate metadata
•Persist datasets
Sources: Databases, Datasets, APIs, LIMS, etc. (distributed data storage)
Data Publication | Data Discovery
Ian Foster (PI), Ben Blaiszik, Jonathon Gaff
Publish and find materials data, regardless of size, location, type
Where are the learning problems?
Collect -> Process -> Represent -> Learn
There are many materials-specific software challenges, usually dealing with field-unique data
[Figure annotations: “Assume this is solved”; “This part is solved”]
Need #1: Accessible Methods
Pictures from GitHub, Wikipedia
The state of the community looks excellent
[Figure panels: Molecules; Microstructures (PyMKS); Electronic Structures; Crystal Structures; Micrographs]
Many new codes released in the past year!
Main Take-Away Points
Fast ML Accelerates Design
Fast Experiment Improves ML
Data Infrastructure Required to Make Informatics Widespread
Ref: Ren et al. Sci Adv (2018)
Ref: Kim et al. PRM (2018) 123801
Ref: Ward et al. MRS Bulletin (2018)
Thanks to our sponsors!
U.S. Department of Energy