From high-performance computing to big-data analytics of materials science and back
Claudia Draxl
Scope HPC Data analytics Data creation Algorithms ? Data quality Data collection
Materials data & their structure
Level I (input: definition of the material, "gene"): atomic positions and nuclear charges, properties of free atoms, symmetry, temperature, pressure. Size: 10 kB – 1 MB.
Level II: total energy, electron density, potential, wavefunctions, atomic forces, optimized geometry, elastic constants, etc. Methods: density-functional theory (DFT) and ab initio molecular dynamics (MD). Size: 10 MB – 10 TB.
Level III: excitation energies, dielectric screening, matrix elements of the Coulomb interaction, etc.; optical spectra, electrical conductivity, phonon spectra, thermal conductivity, etc. Methods: many-body perturbation theory (MBPT), DF perturbation theory, ab initio MD. Size: 1 GB – 1 TB.
Level IV ("phenotype"): efficiency of a solar cell, thermoelectric figure of merit, turn-over frequency of a catalyst, etc., as a function of temperature and pressure. Methods: modeling, output derived from levels I–III. Size: 10 kB – 1 MB.
The amount of materials data produced on workstations, compute clusters, and supercomputers is growing exponentially. Most of it is thrown away …
Methodology
Density-functional theory: Kohn-Sham equation → ground state
Bottleneck: matrix diagonalization (generalized eigenvalue problem)
Scaling: N³, with N the number of atoms in the unit cell (N from 1 to >1000) and >100 basis functions per atom
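Expanded in a non-orthogonal basis, the Kohn-Sham equation becomes the generalized eigenvalue problem H c = ε S c, with Hamiltonian matrix H and overlap matrix S. The minimal sketch below uses random Hermitian stand-ins for H and S (not real Kohn-Sham matrices; the helper names are hypothetical) and solves for the lowest states with SciPy's dense `scipy.linalg.eigh(a, b)`, whose cost grows with the cube of the matrix dimension.

```python
import numpy as np
from scipy.linalg import eigh


def random_ks_like_problem(dim, seed=0):
    """Random Hermitian H and Hermitian positive-definite S (stand-ins, not real KS matrices)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    h = (a + a.conj().T) / 2                      # Hermitian "Hamiltonian"
    b = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    s = b @ b.conj().T / dim + np.eye(dim)        # Hermitian positive-definite "overlap"
    return h, s


def solve_generalized(h, s, n_states):
    """Lowest n_states eigenpairs of H c = eps S c with a dense O(dim^3) solver."""
    return eigh(h, s, subset_by_index=[0, n_states - 1])


h, s = random_ks_like_problem(200)
eps, c = solve_generalized(h, s, n_states=10)
# Residual of the generalized problem; column j of c belongs to eps[j].
residual = np.linalg.norm(h @ c - s @ c * eps)
```

Since the matrix dimension is roughly (number of atoms) × (basis functions per atom), doubling the number of atoms makes this dense solve about eight times more expensive.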
Methodology
Many-body perturbation theory: G0W0 approximation → band structure
Bottleneck: non-local operators; scaling ~N⁴
Density-functional theory: Kohn-Sham equation → ground state
Levels: spectra, band structure, ground state
Electron-hole pair
Spectra: Bethe-Salpeter equation (many-body perturbation theory)
Band structure: G0W0 approximation (many-body perturbation theory)
Ground state: Kohn-Sham equation (density-functional theory)
Spectroscopy: Bethe-Salpeter equation (many-body perturbation theory) → spectra
Bottleneck: matrix size (10 000 – 100 000 for small systems) and non-local operators
Levels: spectra, band structure, ground state
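The matrix size follows from the electron-hole (transition) basis of the Bethe-Salpeter Hamiltonian: its resonant (Tamm-Dancoff) block has dimension N_v × N_c × N_k (valence bands × conduction bands × k-points). The back-of-the-envelope sketch below assumes a dense complex double-precision matrix (16 bytes per element); the band and k-point counts are illustrative and not taken from the slides.

```python
def bse_dimension(n_valence, n_conduction, n_kpoints):
    """Dimension of the resonant BSE block in the transition (v, c, k) basis."""
    return n_valence * n_conduction * n_kpoints


def dense_memory_gb(dim, bytes_per_element=16):
    """Memory of a dense complex128 matrix of shape (dim, dim), in GB (1e9 bytes)."""
    return dim * dim * bytes_per_element / 1e9


# Illustrative counts: 4 valence bands, 4 conduction bands, a 16x16x16 k-point grid.
dim = bse_dimension(n_valence=4, n_conduction=4, n_kpoints=16**3)
mem = dense_memory_gb(dim)
```

Here `dim` is 65 536 and the dense matrix needs about 69 GB. At the upper end quoted above, a 100 000 × 100 000 matrix needs 160 GB, before any work arrays.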
Thermoelectrics
Direct conversion between a temperature difference and an electric voltage
Enormous potential for waste-heat recovery
Figure: 100 MW; thermoelectric generator
Waste-heat recovery
What makes a good thermoelectric?
Figure of merit ZT = S²σT/κ, with S the Seebeck coefficient, σ the electronic conductivity, κ the thermal conductivity, and T the temperature
Current values: ZT = 0.6 – 1.5
Profitable applications: ZT > 2
Problem: high electrical conductivity σ and low thermal conductivity κ are required at the same time
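A minimal sketch of the figure of merit in SI units. To make the "problem" concrete it also splits κ into a lattice part and an electronic part estimated with the Wiedemann-Franz law, κ_el = L σ T (a textbook relation, not from the slides). The material values (S = 200 µV/K, σ = 10⁵ S/m, κ = 1.5 W/(m K), T = 300 K) are illustrative.

```python
# Sommerfeld value of the Lorenz number, W Ohm K^-2 (textbook constant).
LORENZ_NUMBER = 2.44e-8


def figure_of_merit(seebeck, sigma, kappa, temperature):
    """ZT = S^2 sigma T / kappa; units V/K, S/m, W/(m K), K."""
    return seebeck**2 * sigma * temperature / kappa


def zt_with_wiedemann_franz(seebeck, sigma, kappa_lattice, temperature):
    """ZT with kappa = kappa_lattice + L sigma T (Wiedemann-Franz estimate of kappa_el)."""
    kappa_el = LORENZ_NUMBER * sigma * temperature
    return figure_of_merit(seebeck, sigma, kappa_lattice + kappa_el, temperature)


zt = figure_of_merit(200e-6, 1e5, 1.5, 300.0)   # about 0.8
```

With the Wiedemann-Franz split, raising σ also raises κ, so ZT stays below S²/L however large σ becomes. Improving ZT therefore needs a larger Seebeck coefficient or a smaller lattice contribution, not just a better conductor.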
Scope HPC Data creation Algorithms ? Data collection
Materials data & their structure Level Properties Methods Size I Atomic positions and nuclear charges, properties of free atoms, symmetry, temperature, pressure Input: definition of material gene 10 k. B 1 MB II Total energy, electron density, potential, wavefunctions, atomic forces, optimized geometry, elastic constants, etc. Density-functional theory (DFT) and ab initio molecular dynamics (MD) 10 MB 10 TB III Excitation energies, dielectric screening, matrix elements of Coulomb interaction, etc. optical spectra, electrical conductivity, phonon spectra, thermal conductivity, etc. Many-body perturbation theory (MBPT), DF perturbation theory, ab initio MD IV Efficiency of solar cell, thermoelectric figure of Modeling, output derived merit, turn-over frequency of catalyst, etc. from levels I-III as a function of temperature and pressure phenotype 1 GB 1 TB 10 k. B 1 MB
Novel Materials Discovery
http://nomad-repository.eu
3 314 840
The NOMAD (Novel Materials Discovery) Repository was established to host, organize, and share materials data.
Kristian Thygesen (DTU Lyngby), Ciaran Clissman (Pintail, Dublin), Arndt Bode (LRZ Munich), Jose Maria Cela (BSC Barcelona), Alessandro De Vita (King's College London), Claudia Draxl (HU Berlin), Matthias Scheffler (FHI Berlin), Angel Rubio (MPSD Hamburg), Risto Nieminen (Aalto Univ. Helsinki), Kimmo Koski (CSC Helsinki), Daan Frenkel (Univ. Cambridge), Francesc Illas (Univ. Barcelona), Stefan Heinzel (MPCDF Garching)
Data flow: user → raw data → NOMAD (master) → raw data → NOMAD Lab
NOMAD Laboratory
Give access to the vast amount of materials data computed worldwide.
Starting point: existing resources with code-dependent data
Components: data conversion, NOMAD database, materials encyclopedia, big-data analytics, visualization, HPC expertise & hardware
Data conversion
How to make data comparable?
NOMAD supports ~40 different computer codes.
Common representation for various quantities, e.g., pseudopotential vs. all-electron methods
Evaluate error bars: different functionals, force fields, …
Metadata: generic and code-specific (https://nomad-coe.eu/index.php?page=nomad-meta-info)
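A minimal sketch of what a "common representation" can mean in practice: energies reported in different units are converted to one unit, and each entry is split into generic metadata and code-specific extras. The record layout, field names, and code names are hypothetical and are not the NOMAD Meta Info schema.

```python
from dataclasses import dataclass, field

HARTREE_TO_EV = 27.211386245988       # CODATA 2018
RYDBERG_TO_EV = HARTREE_TO_EV / 2     # 1 Ry = 0.5 Ha

# Hypothetical table: unit reported by a parser -> conversion factor to eV.
ENERGY_UNIT_TO_EV = {"Ha": HARTREE_TO_EV, "Ry": RYDBERG_TO_EV, "eV": 1.0}


@dataclass
class NormalizedEntry:
    """Hypothetical common record: generic fields plus a dict of code-specific extras."""
    code: str
    xc_functional: str
    total_energy_ev: float
    code_specific: dict = field(default_factory=dict)


def normalize(code, xc_functional, total_energy, unit, **code_specific):
    """Convert a parsed total energy to eV and keep code-specific keys separate."""
    return NormalizedEntry(
        code=code,
        xc_functional=xc_functional,
        total_energy_ev=total_energy * ENERGY_UNIT_TO_EV[unit],
        code_specific=dict(code_specific),
    )


entry = normalize("code_a", "PBE", -1.0, "Ha", basis_set="LAPW+lo")
```

Unit conversion alone does not make pseudopotential and all-electron total energies comparable. Only energy differences (or error bars from cross-code comparisons) can be compared meaningfully.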
HPC Data creation Algorithms ? Data quality Data collection
Delta factors
Delta factors
Compute E(V) using PBE
Fit to the Birch-Murnaghan equation of state
Compare with other codes / methods
Quality factor Δ
K. Lejaeghere et al., Science 351, aad3000 (2016).
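A minimal sketch of the recipe: fit E(V) to the third-order Birch-Murnaghan equation of state, then compare two fitted curves with Δ = sqrt(∫ [E_a(V) − E_b(V)]² dV / ΔV). Two details are assumptions about the exact definition: integration over ±6% around the mean equilibrium volume, and each curve referenced to its own minimum E0. The reference script on the Delta-codes website is authoritative. Parameters are illustrative (eV/atom, Å³/atom, eV/Å³).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import curve_fit


def birch_murnaghan(v, e0, v0, b0, b1):
    """Third-order Birch-Murnaghan E(V); b0 in energy/volume units, b1 = dB/dP."""
    eta = (v0 / v) ** (2.0 / 3.0)
    x = eta - 1.0
    return e0 + 9.0 * v0 * b0 / 16.0 * (x**3 * b1 + x**2 * (6.0 - 4.0 * eta))


def fit_eos(volumes, energies):
    """Fit E(V) data, starting from the minimum of a quadratic fit."""
    a, b, _ = np.polyfit(volumes, energies, 2)
    v_guess = -b / (2.0 * a)
    p0 = [energies.min(), v_guess, 2.0 * a * v_guess, 4.0]   # B0 ~ V d2E/dV2
    params, _ = curve_fit(birch_murnaghan, volumes, energies, p0=p0)
    return params


def delta(params_a, params_b):
    """Delta in meV/atom over +-6% around the mean V0 (each curve shifted to E0 = 0)."""
    v_mean = 0.5 * (params_a[1] + params_b[1])
    lo, hi = 0.94 * v_mean, 1.06 * v_mean

    def sq_diff(v):
        ea = birch_murnaghan(v, 0.0, *params_a[1:])
        eb = birch_murnaghan(v, 0.0, *params_b[1:])
        return (ea - eb) ** 2

    integral, _ = quad(sq_diff, lo, hi)
    return 1000.0 * np.sqrt(integral / (hi - lo))


ref = (-3.0, 20.0, 0.6, 4.5)      # illustrative "code A" parameters
other = (-2.5, 20.2, 0.58, 4.6)   # illustrative "code B" parameters
vols = np.linspace(0.94 * 20.0, 1.06 * 20.0, 7)
fit_ref = fit_eos(vols, birch_murnaghan(vols, *ref))
```

Because only energy differences within each curve enter Δ, the large offset between the two E0 values (for example, pseudopotential vs. all-electron) drops out of the comparison.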
Delta factors
https://molmod.ugent.be/deltacodesdft
Delta factors O 2
Delta factors (slide from S. Cottenier)
K. Lejaeghere et al., Science 351, aad3000 (2016).
This is all great … but just the beginning.
What about other systems? Surfaces, defects, molecules, …
What about other quantities? Band gaps, barriers, spectra, …
Can we reach ultimate precision?
Total energies of atoms compared with MADNESS, which uses multiresolution analysis (Andris Gulans); same for molecules (figure, energies in Ha).
Yes, we can!
http://exciting-code.org
A. Gulans, S. Kontur, C. Meisenbichler, D. Nabok, P. Pavone, S. Rigamonti, S. Sagmeister, U. Werner, and C. Draxl, exciting: a full-potential all-electron package implementing density-functional theory and many-body perturbation theory, J. Phys.: Condens. Matter 26, 363202 (2014).
exciting is a full-potential all-electron density-functional-theory package implementing the families of linearized augmented plane-wave methods. It can be applied to all kinds of materials, irrespective of the atomic species involved, and also allows for exploring the physics of core electrons. A particular focus is on excited states within many-body perturbation theory.
Dual basis for wavefunctions (WF), density, potential, …
Atomic spheres: atomic-like basis functions
Interstitial: plane-wave basis
All-electron method: can handle strong variations; can explore the core region
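The dual basis rests on a partition of the unit cell into non-overlapping atomic (muffin-tin) spheres and the interstitial region. The minimal sketch below decides which region a point belongs to. It assumes Cartesian coordinates, lattice vectors as the rows of `lattice`, and sphere radii small enough that checking the neighboring periodic images (shifts −1, 0, 1) suffices. The function name `region_of` is hypothetical.

```python
import itertools

import numpy as np


def region_of(r, lattice, atoms, radii):
    """Index of the atomic sphere containing point r, or None if r is interstitial.

    r, atoms: Cartesian positions; lattice: 3x3 array with lattice vectors as rows;
    radii: sphere radius per atom. Only images shifted by -1, 0, +1 cells are checked.
    """
    r = np.asarray(r, dtype=float)
    for shift in itertools.product((-1, 0, 1), repeat=3):
        translation = np.array(shift) @ lattice
        for i, (pos, r_sphere) in enumerate(zip(atoms, radii)):
            if np.linalg.norm(r - (np.asarray(pos, dtype=float) + translation)) < r_sphere:
                return i          # inside sphere i: atomic-like basis functions
    return None                   # interstitial: plane-wave basis


# Simple cubic cell (a = 5) with one atom at the origin and sphere radius 2.
lattice = 5.0 * np.eye(3)
atoms = [np.zeros(3)]
radii = [2.0]
```

A point at (4.8, 0, 0) lies in the sphere of the periodic image at (5, 0, 0), which is why the images have to be checked as well as the atoms in the home cell.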
HPC Data creation Algorithms ? Data quality Data collection
Algorithms
Matrix size: tested up to 8 million; needed: up to 1 million after implementing a Coulomb cutoff
Block-Davidson: special implementation to be used in connection with the LAPW method
Scaling: O(V log V), with V the volume of the unit cell
Will domain scientists meet exascale challenges?
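Block-Davidson iteratively refines a few of the lowest eigenpairs without diagonalizing the full matrix. The sketch below is a generic textbook version for a dense real symmetric matrix with a diagonal preconditioner and restarts. It is not the special LAPW implementation mentioned above, which must handle the generalized problem H c = ε S c. The function name and default parameters are illustrative.

```python
import numpy as np


def block_davidson(a, k, tol=1e-8, max_iter=500, max_subspace=60):
    """Lowest k eigenpairs of a real symmetric matrix `a` (dense here for clarity)."""
    n = a.shape[0]
    diag = np.diag(a)
    # Initial block: unit vectors at the k smallest diagonal entries.
    v = np.zeros((n, k))
    v[np.argsort(diag)[:k], np.arange(k)] = 1.0
    for _ in range(max_iter):
        v, _ = np.linalg.qr(v)                    # orthonormal basis of the subspace
        theta, s = np.linalg.eigh(v.T @ a @ v)    # Rayleigh-Ritz projection
        theta, s = theta[:k], s[:, :k]
        x = v @ s                                 # Ritz vectors
        r = a @ x - x * theta                     # residual block
        if np.linalg.norm(r, axis=0).max() < tol:
            return theta, x
        # Davidson correction: diagonal approximation of (A - theta)^-1 applied to r.
        denom = diag[:, None] - theta[None, :]
        denom[np.abs(denom) < 1e-12] = 1e-12
        t = r / denom
        # Restart from the current Ritz vectors when the subspace grows too large.
        base = x if v.shape[1] + k > max_subspace else v
        v = np.hstack([base, t])
    raise RuntimeError("block Davidson did not converge")


# Diagonally dominant test matrix, the regime where the diagonal preconditioner works well.
rng = np.random.default_rng(1)
n = 200
b = rng.standard_normal((n, n))
a = np.diag(np.arange(1.0, n + 1.0)) + 0.01 * (b + b.T)
theta, x = block_davidson(a, 4)
```

Each iteration costs a few matrix-block products plus a small dense eigenproblem of subspace size. This is why such solvers pay off when only the lowest states of a very large matrix are needed.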
HPC Data analytics Data creation Algorithms ? Data quality Data collection
Big-data Analytics Identifying correlations and structure in big data of materials will enable scientists and engineers to decide which materials are useful for specific applications or which new materials should be the focus of future studies.
Big-Data Analytics One example …
Classification of materials
Can we predict the crystal structure from the nuclear charges ZA and ZB?
Classical example: the Phillips–Van Vechten problem
J. A. Van Vechten, Phys. Rev. 182, 891 (1969).
J. C. Phillips, Rev. Mod. Phys. 42, 317 (1970).
A. Zunger, PRB 22, 5839 (1980).
D. G. Pettifor, Solid State Commun. 51, 31 (1984).
Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, PRB 85, 104104 (2012).
L. Ghiringhelli, J. Vybiral, S. Levchenko, M. Scheffler
Rocksalt vs zincblende
Figure: ΔE = E(RS) − E(ZB) [eV] over ZA (horizontal axis) and ZB (vertical axis); color bins ΔE > 0.2, ∈ [0.1, 0.2], ∈ [0.05, 0.1], ∈ [−0.05, 0.05], ∈ [−0.1, −0.05], ∈ [−0.2, −0.1], ≤ −0.2
Rocksalt vs zincblende: a classification problem
J. A. Van Vechten, Phys. Rev. 182, 891 (1969). J. C. Phillips, RMP 42, 317 (1970).
Figure: materials plotted over ZA and ZB, and in descriptor space (d1, d2)
How to proceed?
Training set: calculate the property P for many materials (DFT)
Descriptors: build the feature space d
Learning: for the training data, find the function P(d), e.g., by LASSO
Cross validation
Predictions: calculate the property for a test set / new materials
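The learning and cross-validation steps can be sketched as follows. Synthetic numbers stand in for DFT-computed properties, and plain least squares for P(d) = c · d stands in for the learning step; a sparse method such as LASSO would replace `fit_linear`. All names are hypothetical.

```python
import numpy as np


def fit_linear(d, p):
    """Least-squares coefficients c for P(d) = d @ c (add a column of ones for an offset)."""
    c, *_ = np.linalg.lstsq(d, p, rcond=None)
    return c


def kfold_rmse(d, p, n_folds=5, seed=0):
    """Cross-validated RMSE: fit on the training folds, predict each held-out fold."""
    idx = np.random.default_rng(seed).permutation(len(p))
    errors = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        c = fit_linear(d[train], p[train])
        errors.append(d[fold] @ c - p[fold])
    err = np.concatenate(errors)
    return float(np.sqrt(np.mean(err**2)))


# Synthetic stand-in data: 80 "materials", two descriptors plus a constant column.
rng = np.random.default_rng(0)
d = np.column_stack([rng.standard_normal((80, 2)), np.ones(80)])
p_exact = d @ np.array([0.3, -0.1, 0.05])
p_noisy = p_exact + 0.1 * rng.standard_normal(80)
```

The cross-validated error measures how well P(d) predicts materials it was not fitted to. This is the quantity that matters before trusting predictions for new materials.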
Building descriptors
Primary features
Free atoms:
IP(A), IP(B): ionization potential
EA(A), EA(B): electron affinity
H(A), H(B): highest occupied Kohn-Sham level
L(A), L(B): lowest unoccupied Kohn-Sham level
rs(A), rs(B): radius at the maximum of the s-like wavefunction
rp(A), rp(B): radius at the maximum of the p-like wavefunction
rd(A), rd(B): radius at the maximum of the d-like wavefunction
Dimers:
HL(AA), HL(BB), HL(AB): HOMO-LUMO Kohn-Sham gap
Eb(AA), Eb(BB), Eb(AB): binding energy
d(AA), d(BB), d(AB): equilibrium distance
L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, PRL 114, 105503 (2015).
Building descriptors
Full feature space: 10 000 nonlinear combinations of primary features, built with +, −, ×, /, (·)², (·)³, √, exp, …
Linear relationship: P(d) = c · d
Let the machine choose the most relevant descriptors.
L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, PRL 114, 105503 (2015).
L. M. Ghiringhelli et al., New J. Phys., in print.
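A minimal sketch of how a feature space grows from a few primary features. Unary operators are applied to each primary feature, and binary operators (+, −, ×, /) to all pairs of the results. The operator set, the naming scheme, and the use of √|x| are illustrative assumptions. The sketch does not reproduce the exact construction rules used in the cited work.

```python
import itertools

import numpy as np

# Hypothetical unary operators applied to every primary feature.
UNARY = {
    "x^2": np.square,
    "x^3": lambda x: x**3,
    "sqrt|x|": lambda x: np.sqrt(np.abs(x)),
    "exp": np.exp,
}


def build_feature_space(primary):
    """primary: dict name -> 1D array (one value per material); returns candidate descriptors."""
    features = dict(primary)
    for name, x in primary.items():
        for op, f in UNARY.items():
            features[f"{op}({name})"] = f(x)
    names = list(features)                 # snapshot before adding pair combinations
    for a, b in itertools.combinations(names, 2):
        xa, xb = features[a], features[b]
        features[f"({a}+{b})"] = xa + xb
        features[f"({a}-{b})"] = xa - xb
        features[f"({a}*{b})"] = xa * xb
        if np.all(xb != 0):                # skip divisions by zero
            features[f"({a}/{b})"] = xa / xb
    return features


# Two primary features for two hypothetical materials.
fs = build_feature_space({"a": np.array([1.0, 2.0]), "b": np.array([3.0, 4.0])})
```

Two primary features already give 190 candidates (10 after the unary step, then 4 operations on each of the 45 pairs). The fourteen primary features listed above grow quickly toward the ~10 000 quoted, which is why automatic selection is needed.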
Descriptors selected by LASSO (Least Absolute Shrinkage and Selection Operator)
Figure: 2D representation of the materials in the plane of d1 [eV/Å²] and d2 [Å]
Most relevant descriptors: d1, d2, d3
L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, PRL 114, 105503 (2015).
L. M. Ghiringhelli et al., New J. Phys., in print.
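LASSO adds an ℓ1 penalty λ‖c‖₁ to the least-squares fit of P(d) = c · d. The penalty drives most coefficients exactly to zero, so only a few descriptors survive. The minimal coordinate-descent sketch below assumes standardized, non-constant descriptor columns. It is a generic LASSO solver, not the exact procedure of the cited papers, and the function names, λ, and data are illustrative.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(d, p, lam, n_iter=1000, tol=1e-10):
    """Minimize (1/2n)||p - z c||^2 + lam*||c||_1 by cyclic coordinate descent.

    z are the standardized columns of d, so lam penalizes all descriptors equally;
    the returned coefficients refer to the standardized descriptors.
    """
    n, m = d.shape
    z = (d - d.mean(axis=0)) / d.std(axis=0)
    y = p - p.mean()
    c = np.zeros(m)
    col_sq = (z**2).sum(axis=0) / n          # equals 1 after standardization
    for _ in range(n_iter):
        c_old = c.copy()
        for j in range(m):
            r_j = y - z @ c + z[:, j] * c[j]  # residual without feature j
            c[j] = soft_threshold(z[:, j] @ r_j / n, lam) / col_sq[j]
        if np.max(np.abs(c - c_old)) < tol:
            break
    return c


# Synthetic data: 20 candidate descriptors, only columns 3 and 7 are relevant.
rng = np.random.default_rng(0)
d = rng.standard_normal((100, 20))
p = 2.0 * d[:, 3] - 1.5 * d[:, 7] + 0.05 * rng.standard_normal(100)
c = lasso_cd(d, p, lam=0.1)
```

Larger λ keeps fewer descriptors. Scanning λ until only two or three coefficients remain is one way to obtain a low-dimensional representation like the 2D map above.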
HPC Data analytics Data creation Algorithms ? Data quality Data collection
Novel Materials Discovery
http://nomad-repository.eu
3 314 840
Currently 57 million files; the amount is rapidly increasing.
Several replicas (BSC, China 2, Korea, …)
Working on the entire database in the search for unusual phenomena
Additional dedicated high-throughput calculations needed; HPC resources required
Summary Data analytics Data quality HPC Data creation Algorithms Data collection
This project has received funding from the European Union’s Horizon 2020 research and innovation programme, grant agreement No 676580. Thank you!