PARALLEL COMPUTING IN COMPUTATIONAL CHEMISTRY
Why? What happens at the molecular level?
Computational chemistry is a branch of chemistry that uses computer simulation to assist in solving chemical problems. It uses methods of theoretical chemistry, incorporated into efficient computer programs, to calculate the structures and properties of molecules.
B. J. Alder and T. E. Wainwright, "Phase Transition for a Hard Sphere System," J. Chem. Phys. 27, 1208 (1957); doi: 10.1063/1.1743957
PNO: 96; PBC (periodic boundary conditions); MD simulation on an IBM-704
IBM-704
The first mass-produced computer with floating-point arithmetic hardware, introduced by IBM in 1954.
36-bit words; executes up to 12,000 calculations per second (on the order of kFLOPS).
Now: petaFLOPS, 10^15 (a quadrillion, i.e. a thousand trillion, calculations per second).
Future: exaFLOPS, 10^18 (a quintillion, i.e. a billion billion, calculations per second)!
1960: Vineyard group; simulated radiation damage of a Cu crystal with MD
1964: Rahman; MD simulation of liquid Ar
1969: Barker and Watts; Monte Carlo simulation of water
1971: Rahman and Stillinger; MD simulation of water
[Images: Cray-1 (1976), Cray T3E (1995)]
Ref: www.maximumpc.com
Year    Speed
1985    33 MHz
1989    100 MHz
1993    233 MHz
1996    385 MHz
1997    450 MHz
1999    570 MHz
1999    1.4 GHz
2000    2 GHz
2001    2.25 GHz
2004    2.3 GHz
2004    3.2 GHz
2006    3.2 GHz
Parallel computing is a type of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved at the same time.
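The definition above can be sketched with a small Python example. It is an illustration only: the function names and the sum-of-squares problem are assumptions made for this sketch, and threads merely stand in for processors (in CPython they will not actually speed up pure-Python arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum_of_squares(bounds):
    """Solve one small sub-problem: the sum of k*k for k in [start, stop)."""
    start, stop = bounds
    return sum(k * k for k in range(start, stop))

def parallel_sum_of_squares(n, workers=4):
    """Divide [0, n) into chunks, solve them concurrently, combine the results."""
    step = max(1, -(-n // workers))  # ceiling division; at least 1
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(chunk_sum_of_squares, chunks))
    return sum(partial_sums)
```

The large problem (one long sum) is divided into smaller ones (one sum per chunk), and only the final combination step needs all the partial results.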
Instructions:
1 - clean the windows
2 - clean the door
3 - clean the roof
4 - clean the table
[Figure: a problem is broken into instructions, which are distributed among processors]
Having suitable hardware
The problem can be parallelized
Having a suitable algorithm
HARDWARE: Parallel hardware architectures
[Figure: a CPU (control unit, arithmetic logic unit) with memory, input, and output. Shared memory: several CPUs attached to one memory. Distributed memory: CPUs, each with its own memory, connected by a network.]
CPU: Central Processing Unit (basic arithmetic, logical, control, input/output)
Single-core CPU, dual-core, quad-core
HARDWARE: Computational Units (GPU)
GPU: Graphics Processing Unit
[Figure: CPU compared with GPU]
Molecular dynamics simulation of protein insertion process: NCSA Lincoln Cluster performance (8 Intel cores and 2 NVIDIA Tesla GPUs per node, 1 million atoms)
Ref: www.ks.uiuc.edu/Research/namd
GPUs have a fundamentally different architecture.
GPU constraints:
An application has to be programmed specifically for a GPU, using different techniques.
It needs new programming languages.
It needs a new programming paradigm.
Programs with GPU support:
NAMD (www.ks.uiuc.edu/Research/namd)
LAMMPS (lammps.sandia.gov)
Gromacs (www.gromacs.org)
DL_POLY 4 (www.stfc.ac.uk//research/app/ccg/software/DL_POLY/44516.aspx)
GAMESS 2012: closed-shell MP2 and closed-shell CCSD(T) energy (www.msg.ameslab.gov/gamess)
The problem can be parallelized
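      x(1) = 100.
      DO 10 i = 2, 1000
         x(i) = sin(x(i-1))
   10 CONTINUE

i=2: x(2) = sin(x(1))
i=3: x(3) = sin(x(2))
i=4: x(4) = sin(x(3))
Each iteration needs the result of the previous one, so this loop cannot be split across processors.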
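As a rough Python illustration of why this loop resists parallelization, the sketch below contrasts the recurrence with a loop whose iterations are independent. The second function is a hypothetical counter-example added for comparison and is not taken from the slides; list index 0 here corresponds to x(1) in the Fortran fragment.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def recurrence(n, x1=100.0):
    """x[i] = sin(x[i-1]): iteration i cannot start before i-1 has finished."""
    x = [x1]
    for _ in range(1, n):
        x.append(math.sin(x[-1]))
    return x

def independent(n, workers=4):
    """y[i] = sin(i): no iteration reads another's result, so they can run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(math.sin, range(n)))
```

In `recurrence` the data dependency forces a strict order; in `independent` the work can be handed to any worker in any order and the results are the same.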
A =
2 1 5 3
0 7 1 6
9 2 4 4
3 6 7 2

B =
6 1 2 3
4 5 6 5
1 9 8 -8
4 0 -8 5

for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        C[i][j] = 0;
        for (k = 0; k < n; k++)
            C[i][j] += A[i][k] * B[k][j];
    }
}
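In contrast to a recurrence, every element of a matrix product can be computed without waiting for any other element. A minimal Python sketch, assuming the two 4x4 matrices on the slide are read row by row and distributing one row of C per task (the row-wise split is one possible choice, not necessarily the one the slides illustrate):

```python
from concurrent.futures import ThreadPoolExecutor

# Matrix values taken from the slide, assuming row-major order.
A = [[2, 1, 5, 3], [0, 7, 1, 6], [9, 2, 4, 4], [3, 6, 7, 2]]
B = [[6, 1, 2, 3], [4, 5, 6, 5], [1, 9, 8, -8], [4, 0, -8, 5]]

def row_of_product(i):
    """Compute row i of C = A * B; it reads only row i of A and all of B."""
    n = len(B)
    return [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]

def parallel_matmul(workers=4):
    """Give each worker whole rows of C; no row depends on another row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_of_product, range(len(A))))
```

Because the rows are independent, the workers never need to exchange intermediate results; they only need read access to A and B.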
Having a suitable algorithm
Obtain initial guess for density matrix
Iterate: two-electron integrals, Fock matrix formation, diagonalize, form new density matrix
[Chart labels: Fock matrix formation, Density formation, Annihilation, Others, Integral evaluation]
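The Fock matrix formation step lends itself to the same kind of splitting, since each matrix element can be accumulated independently. The Python sketch below is a toy under stated assumptions: `toy_eri` is a made-up stand-in for the two-electron integrals (it has no chemical meaning), the closed-shell expression F_ij = h_ij + sum_kl D_kl [(ij|kl) - 1/2 (ik|jl)] is assumed, and the row-wise distribution is just one possible scheme, not the one used by any particular program.

```python
from concurrent.futures import ThreadPoolExecutor

def toy_eri(i, j, k, l):
    """Hypothetical stand-in for a two-electron integral (ij|kl); not real chemistry."""
    return 1.0 / (1.0 + abs(i - j) + abs(k - l) + abs(i - k))

def fock_row(args):
    """Row i of F, with F_ij = h_ij + sum_kl D_kl * [(ij|kl) - 0.5 * (ik|jl)]."""
    i, h, D = args
    n = len(D)
    row = []
    for j in range(n):
        g = sum(D[k][l] * (toy_eri(i, j, k, l) - 0.5 * toy_eri(i, k, j, l))
                for k in range(n) for l in range(n))
        row.append(h[i][j] + g)
    return row

def build_fock(h, D, workers=4):
    """Distribute the rows of the Fock matrix over the workers."""
    tasks = [(i, h, D) for i in range(len(D))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fock_row, tasks))
```

Every worker needs the whole density matrix D, so in a distributed-memory setting D would have to be sent to all processors at each iteration.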
Ref: DOI: 10.1039/c002859b
Molecular dynamics (MD) is a computer simulation technique where the time evolution of a set of interacting atoms is followed by integrating their equations of motion.
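To make "integrating the equations of motion" concrete, here is a minimal Python sketch for a single particle in one dimension. The slides do not say which integrator is used; velocity Verlet is assumed here because it is a common choice, and the harmonic force is a hypothetical test case.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate m * d2x/dt2 = force(x) for one particle in one dimension."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

def harmonic_force(x, k=1.0):
    """Hypothetical test force: a spring with force constant k."""
    return -k * x
```

For example, `velocity_verlet(1.0, 0.0, harmonic_force, 1.0, 0.01, 1000)` follows the oscillator for 1000 time steps. In a real MD code the force evaluation runs over all interacting atoms, which is why it is the natural target for parallelization.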
MD program steps: Initialize; Force calculation; Motion; Analysis; Summarize
[Chart labels: forces, Others]
Ref: Rom. J. Biochem., 46, 2, 129-148 (2009)
Replicated data (RD)
Advantages: simplicity (this is a relatively easy parallel strategy to implement, requiring only minor changes to scalar code).
Disadvantages: memory usage is high (due to duplication of data); communication costs are quite high.
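The trade-off listed above can be seen in a small Python sketch of a replicated-data force calculation. It is a toy under stated assumptions: the 1D spring-like `pair_force` is hypothetical, threads stand in for processors, and the final list-wise sum plays the role of the global sum that a real code would perform over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def pair_force(xi, xj):
    """Hypothetical 1D pair force on atom i due to atom j (a simple spring)."""
    return xj - xi

def forces_on_block(args):
    """Each worker sees ALL positions but computes forces only for its own atoms."""
    positions, my_atoms = args
    partial = [0.0] * len(positions)
    for i in my_atoms:
        partial[i] = sum(pair_force(positions[i], xj)
                         for j, xj in enumerate(positions) if j != i)
    return partial

def replicated_data_forces(positions, workers=2):
    n = len(positions)
    blocks = [range(w, n, workers) for w in range(workers)]
    tasks = [(list(positions), block) for block in blocks]  # data is duplicated
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(forces_on_block, tasks))
    # Global sum over all workers: the communication step of the strategy.
    return [sum(p[i] for p in partials) for i in range(n)]
```

The serial force loop is almost unchanged, which is the simplicity advantage; the full copy of `positions` per worker and the full-length force arrays in the global sum are the memory and communication costs.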
Properties:
Communication operations scale as [...] rather than N.
Memory costs for position and force vectors are reduced by the factor [...].
Retains the simplicity of the RD technique.
Ref: DOI: 10.1007/1-4020-2670-5_15
[Figure label: rcut (cutoff radius)]
Properties:
The communication costs can be minimized.
Needs more sophisticated programming.
Ref: DOI: 10.1002/(SICI)1096-987X(199703)18:4<478::AID-JCC3>3.0.CO;2-Q
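The slide does not name this strategy; the rcut label suggests a spatial (domain) decomposition, which is what the Python sketch below assumes. It splits a 1D box into equal domains without periodic boundaries (a simplification), and lists for each domain the atoms it owns and the "halo" atoms it must receive from other domains. All names are hypothetical.

```python
def decompose(positions, box, ndomains, rcut):
    """Assign atoms to 1D spatial domains; list the halo atoms each domain needs.

    Assumes 0 <= x < box for every position and no periodic boundaries.
    """
    width = box / ndomains
    owned = [[] for _ in range(ndomains)]
    halo = [[] for _ in range(ndomains)]
    for idx, x in enumerate(positions):
        d = min(int(x / width), ndomains - 1)
        owned[d].append(idx)
    for d in range(ndomains):
        lo, hi = d * width, (d + 1) * width
        for idx, x in enumerate(positions):
            if idx in owned[d]:
                continue
            # Only atoms within rcut of this domain's edges must be received.
            if lo - rcut <= x < lo or hi <= x < hi + rcut:
                halo[d].append(idx)
    return owned, halo
```

Only atoms within rcut of a domain boundary are exchanged, which is how the communication cost is kept small; the bookkeeping of owners and halos is the extra programming effort the slide mentions.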
Many independent programs run serially, each on its own CPU (embarrassingly parallel).
A problem is divided into parts, and each part runs on a separate CPU.
Load balancing
Communication cost
Computation cost
The number of CPUs
Amount of memory
The chosen algorithm
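Load balancing, the first factor in the list above, can be illustrated with a short Python sketch. The greedy "largest task first, least-loaded CPU next" rule used here is an assumption chosen for its simplicity; the slides do not prescribe a balancing scheme, and the task costs are hypothetical estimates.

```python
import heapq

def balance(task_costs, ncpu):
    """Greedy static load balancing: give each task to the least-loaded CPU."""
    heap = [(0.0, cpu) for cpu in range(ncpu)]   # (current load, cpu id)
    assignment = [[] for _ in range(ncpu)]
    # Placing the most expensive tasks first tends to even out the loads.
    for task, cost in sorted(enumerate(task_costs), key=lambda t: -t[1]):
        load, cpu = heapq.heappop(heap)
        assignment[cpu].append(task)
        heapq.heappush(heap, (load + cost, cpu))
    loads = [sum(task_costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads
```

The run time of the whole job is set by the most heavily loaded CPU, so the aim is to keep `max(loads)` as close as possible to the average load.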
Hexanitroethane, C2N6O12, B3LYP/6-31G(df,pd), single point
THANKS!