Data Science for Materials Science & Engineering: Materials Descriptors

Data Science for Materials Science & Engineering
Materials Descriptors for Data Science
In this module:
• Enhance data using descriptors (this lecture)
• Analyze descriptors, calculate and visualize correlations (this lecture)
• Hands-on tutorial using nanoHUB: modeling melting temperatures
• Homework assignment
Juan C. Verduzco, Zachary D. McClure, and Alejandro Strachan
jverduzc@purdue.edu || zmcclure@purdue.edu || strachan@purdue.edu
School of Materials Engineering & Network for Computational Nanotechnology
Purdue University, West Lafayette, Indiana, USA
Materials descriptors for data science - Lecture 1

Learning objectives and prerequisites
After completing this lecture you will be able to:
• Enhance materials models using descriptors
  – Periodic table data
  – Surrogate properties and physics-based models
• Analyze descriptors and calculate correlations to rank descriptors
Prerequisites:
• Basic programming skills

Data Science & Machine Learning in Science & Engineering
• Acquiring and handling data
  – Cyber-infrastructure
  – Design of experiments
• Learning from data
  – Predictive models (supervised learning)
  – Finding patterns (unsupervised learning)

Data-driven models
Machine learning: learning from data, by example
Raw data → Neural network → Output
• Neural networks are universal approximators
• With lots of training data and large neural networks we can develop any model

Deep learning
Deep learning: lots of raw data, deep neural networks
Raw data (composition, processing, heat treatment) → Neural network → Output (strength)
• What does the NN need to learn?

Deep learning
Deep learning: lots of raw data, deep neural networks
[Diagram: composition, processing, heat treatment, cold work → phase diagram & kinetics, phases present, recrystallization temperature, grain size, hardening law, dislocation density, interfacial strengthening, solid solution strengthening → strength]
• What does the NN need to learn?

What if data is limited?
This is often the case in materials science and related fields
[Diagram: composition, processing, heat treatment, cold work → phase diagram & kinetics, phases present, recrystallization temperature, grain size, hardening law, dislocation density, interfacial strengthening, solid solution strengthening → strength; also shown: microstructure (phases & grain size), XRD]
• Descriptors enable us to infuse physics into machine learning models
• Descriptors improve accuracy when data is scarce

Use of descriptors vs. deep learning
[Diagram comparing two routes from raw data to output: descriptors with machine learning vs. deep learning]
Adapted from: Jha et al. Scientific Reports. 2018 Dec 4; 8(1): 1-3.

Example: ElemNet
• Random Forest (only composition)
• Random Forest with physics descriptors
• Deep neural network (only composition)
Jha D, Ward L, Paul A, Liao WK, Choudhary A, Wolverton C, Agrawal A. ElemNet: Deep learning the chemistry of materials from only elemental composition. Scientific Reports. 2018 Dec 4; 8(1): 1-3.

Simplest, periodic table-based descriptors
• Easy to calculate from composition
• Materials Agnostic Platform for Informatics and Exploration (Magpie)
  – Stoichiometric attributes, which depend on the fractions of the elements present
  – Elemental property statistics, related to the properties of the constituent elements
  – Electronic structure attributes
• Python libraries are also available for querying these attributes
Ward, L., Agrawal, A., Choudhary, A., & Wolverton, C. (2016). A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2(1), 1-7.
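
To make the idea of composition-based descriptors concrete, here is a minimal sketch in plain Python. It is not the Magpie implementation: the tiny `ELEMENT_DATA` table, the function names, and the choice of descriptors (number of elements, largest atomic fraction, fraction-weighted mean and range of each elemental property) are assumptions made for illustration. A real workflow would query one of the Python libraries mentioned on the slide instead of a hand-written table.

```python
import re

# Hypothetical mini lookup table for illustration only.
# Z: atomic number, chi: Pauling electronegativity, mass: atomic mass (g/mol).
ELEMENT_DATA = {
    "O":  {"Z": 8,  "chi": 3.44, "mass": 15.999},
    "Mg": {"Z": 12, "chi": 1.31, "mass": 24.305},
    "Al": {"Z": 13, "chi": 1.61, "mass": 26.982},
    "Si": {"Z": 14, "chi": 1.90, "mass": 28.085},
    "Ti": {"Z": 22, "chi": 1.54, "mass": 47.867},
}


def parse_formula(formula):
    """Parse a simple formula such as 'Al2O3' into {'Al': 2.0, 'O': 3.0}.

    Only element symbols followed by optional integer counts are handled
    (no parentheses, no fractional subscripts).
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def composition_descriptors(formula):
    """Compute a few stoichiometric and elemental-property descriptors."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}

    # Stoichiometric attributes: depend only on the atomic fractions.
    desc = {
        "n_elements": len(fractions),
        "max_fraction": max(fractions.values()),
    }

    # Elemental property statistics: fraction-weighted mean and range.
    for prop in ("Z", "chi", "mass"):
        values = {el: ELEMENT_DATA[el][prop] for el in fractions}
        desc[f"mean_{prop}"] = sum(fractions[el] * values[el] for el in fractions)
        desc[f"range_{prop}"] = max(values.values()) - min(values.values())
    return desc


alumina = composition_descriptors("Al2O3")
```

Each formula becomes a fixed-length row of numbers, which is the form a model such as a random forest needs as input.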

Example: Modeling melting temperature
Develop models to predict the melting temperature of oxides
• Elements and formula
• Experimental melting temperature: this is what we want to predict
https://nanohub.org/tools/featureselect/

Random forests and decision trees
[Figure: decision tree splitting on ionic packing fraction (IPF) at IPF < 0.5, IPF < 0.57, and IPF < 0.62, with leaf predictions Tm = 500 K, 1000 K, 1400 K, and 2000 K; accompanying plot of melting temperature vs. ionic packing fraction]
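
A decision tree of this kind is a nested set of threshold tests that ends in a constant prediction. The sketch below hard-codes the thresholds and leaf values shown on the slide. The tree shape is an assumption: the transcript does not preserve the figure, so the root split at IPF < 0.57 and the ordering of the leaves by increasing IPF are guesses, and `predict_tm` is a hypothetical name.

```python
def predict_tm(ipf):
    """Piecewise-constant melting temperature (K) from ionic packing fraction.

    Assumed tree layout: the root splits at IPF < 0.57, the left branch
    splits again at 0.5 and the right branch at 0.62.
    """
    if ipf < 0.57:
        return 500.0 if ipf < 0.5 else 1000.0
    return 1400.0 if ipf < 0.62 else 2000.0


example_predictions = [predict_tm(x) for x in (0.45, 0.55, 0.60, 0.70)]
```

A trained tree learns the thresholds and leaf values from data rather than having them written by hand, but the prediction step has this same form.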

Random forests
Random forests are ensembles of decision trees
• Choose training data for each tree randomly
• Choose features for each tree randomly
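
The two sources of randomness can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in and not a production random forest: each "tree" here is a depth-one stump trained on a bootstrap sample and a single randomly chosen feature, whereas library implementations grow deeper trees and typically resample features at every split. All function names and the toy data are hypothetical.

```python
import random


def fit_stump(X, y, feature):
    """Find the single threshold on one feature that minimizes squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = 0.5 * (lo + hi)
        left = [t for row, t in zip(X, y) if row[feature] < thr]
        right = [t for row, t in zip(X, y) if row[feature] >= thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:
        # The feature is constant in this sample: always predict the mean.
        m = sum(y) / len(y)
        return (feature, float("inf"), m, m)
    return (feature, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # random training data
        feature = rng.randrange(n_features)          # random feature
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        forest.append(fit_stump(Xb, yb, feature))
    return forest


def predict(forest, row):
    """Average the predictions of all stumps in the ensemble."""
    preds = [ml if row[f] < thr else mr for f, thr, ml, mr in forest]
    return sum(preds) / len(preds)


# Toy data: the target depends only on the first feature.
X_demo = [[float(i), float((i * 7) % 20)] for i in range(20)]
y_demo = [500.0 if row[0] < 10 else 1500.0 for row in X_demo]
forest_demo = fit_forest(X_demo, y_demo, n_trees=50, seed=1)
```

Averaging many trees that each see a different slice of the data and features is what reduces the variance of a single decision tree.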

Train a model with modest descriptors
• Create arrays of input and output data
• Normalize data
• Input array is 157 x 4 (number of data points x number of descriptors)
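
The normalization step can be sketched as follows. The slide does not say which normalization the tutorial uses, so this example assumes column-wise standardization (zero mean, unit standard deviation), and the 157 x 4 input array is filled with random placeholder numbers rather than the real descriptors. The function name `standardize_columns` is hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize_columns(X):
    """Return a copy of X with each column scaled to zero mean and unit std."""
    columns = list(zip(*X))
    mus = [mean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in X]


# Placeholder input array: 157 data points x 4 descriptors.
rng = random.Random(0)
X_raw = [[rng.uniform(0.0, 100.0) for _ in range(4)] for _ in range(157)]
X_norm = standardize_columns(X_raw)
```

Tree-based models are largely insensitive to feature scaling, but putting descriptors with different units on a common scale keeps the same arrays usable with other model types.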

Train a model with modest descriptors
• Mean absolute error (MAE) over 10 random forests is 500+ K
• Can we do better with better descriptors?
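
For reference, the error metric quoted here is the average of |prediction - truth| over the test points, averaged again over independently trained models. The sketch below shows only that arithmetic. The temperatures are made-up placeholder values, not results from the lecture, and both function names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predictions and true values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_mae(y_true, predictions_per_model):
    """Average the MAE over several models (e.g. 10 random forests)."""
    maes = [mean_absolute_error(y_true, y_pred) for y_pred in predictions_per_model]
    return sum(maes) / len(maes)


# Placeholder melting temperatures in K and predictions from two models.
y_true = [1000.0, 2000.0]
predictions = [[1100.0, 1700.0], [1000.0, 2000.0]]
single_mae = mean_absolute_error(y_true, predictions[0])
overall_mae = average_mae(y_true, predictions)
```

Averaging over several forests smooths out the run-to-run variation caused by the random choice of training data and features.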

Adding descriptors based on periodic table data
[Figure: periodic table properties and descriptor combinations calculated from composition]
http://wolverton.northwestern.edu/research/machine-learning

Train a model with periodic table featurizers
• Mean absolute error (MAE) over 10 random forests is ~360 K, much better!
• Can we do even better with physics?

Add stiffness and Lindemann melting law
Lindemann: physics-based expression for melting temperatures
MAE: 426 K
• Mean absolute error (MAE) over 10 random forests is ~290 K
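
The transcript does not preserve the exact expression used in the lecture, so the sketch below uses one common textbook form of the Lindemann criterion as an assumption. In the high-temperature Debye model the mean-square atomic displacement is 9ħ²T / (M k_B θ_D²), and melting is taken to occur when its square root reaches a fraction c_L of the interatomic spacing a. This gives T_m = c_L² a² M k_B θ_D² / (9ħ²). Stiffness enters through the Debye temperature, which increases with the elastic moduli. The function name, the default c_L = 0.1, and the copper inputs are illustrative choices.

```python
HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg


def lindemann_tm(mass_amu, debye_temp_k, spacing_m, c_l=0.1):
    """Estimate the melting temperature (K) from the Lindemann criterion.

    Assumed form: T_m = c_L^2 a^2 M k_B theta_D^2 / (9 hbar^2), where c_L is
    the Lindemann fraction (often quoted as roughly 0.1) and a is the
    nearest-neighbor spacing in meters.
    """
    mass = mass_amu * AMU
    return (c_l ** 2) * spacing_m ** 2 * mass * K_B * debye_temp_k ** 2 / (9.0 * HBAR ** 2)


# Illustrative inputs for copper: M = 63.546 u, theta_D = 343 K, a = 2.556 Angstrom.
tm_cu = lindemann_tm(63.546, 343.0, 2.556e-10)
tm_cu_stiffer = lindemann_tm(63.546, 2 * 343.0, 2.556e-10)
```

An estimate like this is useful as a descriptor even when it is inaccurate on its own, because the model only needs it to correlate with the true melting temperature.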

Good vs. bad descriptors: correlations
Good descriptors:
• Correlate strongly with the output
• Do not correlate strongly with each other
Bad descriptors:
• Increase dimensionality of the problem without adding value
Pearson correlation (linear): $\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$, where $E$ is the expectation value, $\mu_X$ and $\mu_Y$ are the mean values, and $\sigma_X$ and $\sigma_Y$ are the standard deviations
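
The Pearson coefficient is the covariance of two variables divided by the product of their standard deviations, so it ranges from -1 (perfect negative linear relation) to +1 (perfect positive linear relation). A minimal plain-Python sketch of that definition follows. The function name and the small test vectors are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear, increasing
r_neg = pearson([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly linear, decreasing
r_none = pearson([1, 2, 3], [1, 0, 1])        # symmetric, no linear trend
```

Because the coefficient captures only linear relations, a descriptor with a low value can still be informative to a nonlinear model such as a random forest.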

Pearson correlations for melting temperatures
Descriptor       Correlation with Tmelt
IPF              0.34
Density          0.39
Space group      0.43
Molar vol        -0.14
Shear modulus    0.74
Bulk modulus     0.72
Lindemann        0.8
Melt Temp        1
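
Ranking descriptors by the strength of their correlation with the melting temperature takes one sort on the absolute value. The sketch below uses the correlation values listed on this slide. The variable names are illustrative.

```python
# Pearson correlations with melting temperature, as listed on the slide.
correlations = {
    "IPF": 0.34,
    "Density": 0.39,
    "Space group": 0.43,
    "Molar vol": -0.14,
    "Shear modulus": 0.74,
    "Bulk modulus": 0.72,
    "Lindemann": 0.8,
}

# Sort by absolute value so strong negative correlations also rank highly.
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
ranked_names = [name for name, _ in ranked]
```

The physics-based descriptors (the Lindemann estimate and the two elastic moduli) come out on top, consistent with the drop in MAE reported when they were added.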

Summary
• Data science and machine learning tools are becoming indispensable in materials science and engineering
• A good set of descriptors can significantly improve the performance of a machine learning model
  – Especially important when data is scarce
• Several attributes that can be derived from a material's composition can help describe it better to the model
• Using domain knowledge can help in designing descriptors
  – Periodic table data
  – Physics-based simulation results and models
  – Experiments that are easier to perform than the quantity of interest (QoI)
• Multiple libraries to query these attributes are available and ready to use in Python environments