Università dell'Insubria, Como, Italy
Introduction to Quantum Monte Carlo
Dario Bressanini
http://scienze-como.uninsubria.it/bressanini
UNAM, Mexico City, 2007
Why do simulations?
• Simulations are a general method for "solving" many-body problems. Other methods usually involve approximations.
• Experiment is limited and expensive; simulations can complement it.
• Simulations are feasible even for complex systems.
• They scale with available computer power.
Buffon's needle experiment, AD 1777
(figure: a needle of length L dropped at random onto a floor ruled with parallel lines a distance d apart)
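As an aside (not part of the original slides), Buffon's experiment is easy to simulate: a needle of length L ≤ d crosses a line with probability 2L/(πd), so counting crossings gives an estimate of π. A minimal sketch, with parameter names of my own choosing:

```python
import math
import random

def buffon_pi(n_throws, needle_len=1.0, line_gap=2.0, seed=1):
    """Estimate pi by Buffon's needle: a needle of length L <= d
    crosses one of the parallel lines (spacing d) with probability
    2L / (pi * d)."""
    random.seed(seed)
    hits = 0
    for _ in range(n_throws):
        # distance of the needle centre from the nearest line, and needle angle
        x = random.uniform(0.0, line_gap / 2.0)
        theta = random.uniform(0.0, math.pi / 2.0)
        if x <= (needle_len / 2.0) * math.sin(theta):
            hits += 1
    # invert P(cross) = 2 L / (pi d)  ->  pi ~ 2 L N / (d * hits)
    return 2.0 * needle_len * n_throws / (line_gap * hits)
```

With a few hundred thousand throws the estimate is typically within a few hundredths of π, illustrating the slow N^(−1/2) convergence discussed below.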
Simulations
• "The general theory of quantum mechanics is now almost complete. The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble." (Dirac, 1929)

General strategy
• How do we solve a deterministic problem using a Monte Carlo method?
• Rephrase the problem in terms of a probability distribution
• "Measure" A by sampling that probability distribution

Monte Carlo Methods
• The points Ri are generated using random numbers: this is why these methods are called Monte Carlo methods
• Metropolis, Ulam, Fermi, von Neumann (~1945)
• We introduce noise into the problem! Our results have error bars...
• ...nevertheless, it might be a good way to proceed

Stanislaw Ulam (1909-1984)
S. Ulam is credited as the inventor of the Monte Carlo method in the 1940s, which solves mathematical problems using statistical sampling.
Why Monte Carlo?
• We can approximate the numerical value of a definite integral by its definition: ∫ f(x) dx ≈ [(b − a)/L] Σ_i f(x_i)
• where we use L uniformly spaced points x_i.

Error in Quadrature
• Consider an integral in D dimensions: N = L^D uniformly spaced points, so the CPU time is proportional to N
• The error with N sampling points scales as N^(−1/D)

Monte Carlo Estimates of Integrals
• If we sample the points not on a regular grid but randomly (uniformly distributed), then ∫ f(R) dR ≈ (V/N) Σ_i f(R_i)
• where we assume the integration domain is a regular box of volume V = L^D.

Monte Carlo Error
• Probability theory shows that the Monte Carlo error decreases with sample size N as N^(−1/2)
• Independent of dimension D (good).
• To get another decimal place takes 100 times longer! (bad)

MC is advantageous for large dimensions
• Error of simple quadrature: N^(−1/D)
• Error of smarter quadrature: N^(−A/D)
• Error of Monte Carlo: always N^(−1/2)
• Monte Carlo is more efficient for large D (usually D > 4-6)

Monte Carlo Estimates of π
We can estimate π by sampling uniform random points in the unit square and counting the fraction that fall inside the quarter circle of radius 1.
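The quarter-circle estimate on the slide can be sketched in a few lines (this code is mine, not from the slides): the fraction of uniform points in the unit square with x² + y² ≤ 1 converges to π/4.

```python
import random

def mc_pi(n_points, seed=0):
    """Estimate pi: the fraction of uniform points in the unit square
    that fall inside the quarter circle x^2 + y^2 <= 1 is pi / 4."""
    random.seed(seed)
    inside = sum(1 for _ in range(n_points)
                 if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4.0 * inside / n_points
```

Doubling the number of decimal places requires 100× more points, exactly the N^(−1/2) behaviour of the previous slide.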
Monte Carlo Integration
• Note that:
  – we can automatically estimate the error by computing the standard deviation of the sampled function values
  – all points generated are independent
  – all points generated are used

Inefficient?
• If the function is strongly peaked, the process is inefficient
• We should generate more points where the function is large
• Use a non-uniform distribution!
General Monte Carlo
• If the samples are drawn not uniformly but with some probability distribution p(R), we can compute by Monte Carlo: ∫ f(R) dR = ∫ [f(R)/p(R)] p(R) dR ≈ (1/N) Σ_i f(R_i)/p(R_i)
• where p(R) is normalized: ∫ p(R) dR = 1

Monte Carlo
• Convergence is guaranteed by the Central Limit Theorem
• The statistical error → 0 if p(R) ∝ f(R); the closer p is to f, the faster the convergence
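A small worked example of the above (mine, not from the slides): estimate I = ∫₀¹ e^(−10x) dx ≈ 0.09999546 by sampling from p(x) ∝ e^(−8x), which is close to the integrand, and averaging f/p. The exponent 8 is an arbitrary illustrative choice.

```python
import math
import random

def importance_estimate(n_samples, seed=0):
    """Estimate I = integral_0^1 exp(-10 x) dx by importance sampling:
    draw x from p(x) = exp(-a x) / norm with a = 8 (close to the
    integrand) and average f(x) / p(x)."""
    random.seed(seed)
    a = 8.0
    norm = (1.0 - math.exp(-a)) / a          # integral of exp(-a x) on [0, 1]
    total = 0.0
    for _ in range(n_samples):
        # inverse-transform sampling of p(x) on [0, 1]
        u = random.random()
        x = -math.log(1.0 - u * (1.0 - math.exp(-a))) / a
        total += math.exp(-10.0 * x) / (math.exp(-a * x) / norm)
    return total / n_samples
```

Because f/p is nearly constant, the variance is far smaller than with uniform sampling of the same peaked integrand.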
Warning!
• Beware of Monte Carlo integration routines in libraries: they usually cannot assume anything about your functions, since they must be general.
• They can be quite inefficient.
• Also beware of standard compiler-supplied random number generators (some are known to be bad!)
Equation of state of a fluid
The problem: compute the equation of state (p as a function of particle density N/V) of a fluid in a box, given some interaction potential between the particles.
Assume that for every position of the particles we can compute the potential energy V(R).

The Statistical Mechanics Problem
• For equilibrium properties we can just compute the Boltzmann multi-dimensional integrals
• where the energy is usually a sum of pair interactions

An inefficient recipe
• For 100 particles (not really the thermodynamic limit), the integrals are in 300 dimensions.
• The naive MC procedure would be to distribute the particles uniformly in the box, throwing them in at random.
• If the density is high, throwing particles at random will put some of them too close to each other: almost all such generated points will give a negligible contribution, due to the Boltzmann factor.

An inefficient recipe
• E(R) becomes very large and positive
• We should try to generate more points where E(R) is close to its minima
The Metropolis Algorithm
• How do we do it?
• Use the Metropolis algorithm (M(RT)², 1953)... and a powerful computer
• The algorithm is a random walk (Markov chain) in configuration space. Points are not independent.
"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." (John von Neumann)
Importance Sampling
• The idea is to use importance sampling, that is, sampling more where the function is large
• "…, instead of choosing configurations randomly, …, we choose configurations with a probability exp(−E/kBT) and weight them evenly." (from the M(RT)² paper)

The key Ideas
• Points are no longer independent!
• We consider a point (a walker) that moves in configuration space according to some rules

A Markov Chain
• A Markov chain is a random walk through configuration space: R1 → R2 → R3 → R4 → …
• Given Rn there is a transition probability to go to the next point Rn+1: p(Rn → Rn+1), the stochastic matrix
• In a Markov chain, the distribution of Rn+1 depends only on Rn. There is no memory.
• We must use an ergodic Markov chain

The key Ideas
• Choose an appropriate p(Rn → Rn+1) so that at equilibrium we sample a distribution π(R) (for this problem π = exp(−E/kBT))
• A sufficient condition is detailed balance.
• Consider an infinite number of walkers, and two positions R and R'
• At equilibrium, the number of walkers that go from R → R' equals the number that go from R' → R, even though p(R → R') ≠ p(R' → R)

The Detailed Balance
• π(R) is the distribution we want to sample: π(R) p(R → R') = π(R') p(R' → R)
• We have the freedom to choose p(R → R')

Rejecting points
• The third key idea is to use rejection to enforce detailed balance
• p(R → R') is split into a transition step and an acceptance/rejection step
• T(R → R') generates the next "candidate" point
• A(R → R') decides whether to accept or reject this point

The Acceptance probability
• Given some T, there is a standard choice for A
• It simplifies when T is symmetric
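The acceptance formulas on this slide did not survive the conversion to text; the standard Metropolis-Hastings choice, which is what the M(RT)² construction uses, reads:

```latex
A(R \to R') = \min\left(1,\ \frac{\pi(R')\, T(R' \to R)}{\pi(R)\, T(R \to R')}\right),
\qquad
A(R \to R') = \min\left(1,\ \frac{\pi(R')}{\pi(R)}\right) \ \text{for symmetric } T
```

One can check directly that either choice satisfies detailed balance, π(R) p(R → R') = π(R') p(R' → R).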
What it does
• Suppose π(R') ≥ π(R): the move is always accepted
• Suppose π(R') < π(R): the move is accepted with probability π(R')/π(R) (flip a coin)
• The algorithm samples regions of large π(R)
• Convergence is guaranteed, but its rate is not!!

IMPORTANT!
• Accepted and rejected states count the same!
• When a point is rejected, you add the previous one to the averages
• Measure the acceptance ratio; set it to roughly 1/2 by varying the "step size"
• Exact: no time-step error and, in principle, no ergodicity problems (but no dynamics).
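A minimal 1D sketch of the above (my own toy example, not from the slides): a Metropolis walk sampling π(x) ∝ exp(−E(x)/kBT) for the harmonic energy E(x) = x²/2 with kBT = 1, i.e. a standard normal density. Note how a rejected step repeats the previous point in the averages.

```python
import math
import random

def metropolis_sample(n_steps, step_size=1.0, seed=0):
    """Metropolis random walk sampling pi(x) ~ exp(-x^2 / 2).
    Returns the samples and the acceptance ratio."""
    random.seed(seed)
    x = 0.0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        x_try = x + random.uniform(-step_size, step_size)
        # the oracle: p = pi(x_try) / pi(x)
        p = math.exp(-(x_try ** 2 - x ** 2) / 2.0)
        if p > random.random():
            x = x_try
            accepted += 1
        samples.append(x)  # rejected steps count the old point again
    return samples, accepted / n_steps
```

Varying `step_size` moves the acceptance ratio; very small steps accept almost always but decorrelate slowly, very large steps are almost always rejected.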
Quantum Mechanics
• We wish to solve HΨ = EΨ to high accuracy
• The solution usually involves computing integrals in high dimensions: 3 to 30000
• The "classic" approach (from 1929):
  – find an approximate Ψ (... but a good one ...)
  – ... whose integrals are analytically computable (Gaussians)
  – compute the approximate energy
• chemical accuracy ~ 0.001 hartree ≈ 0.027 eV

VMC: Variational Monte Carlo
• Start from the Variational Principle
• Translate it into Monte Carlo language

VMC: Variational Monte Carlo
• E is a statistical average of the local energy E_L(R) = HΨ(R)/Ψ(R) over P(R) ∝ Ψ²(R)
• Recipe:
  – take an appropriate trial wave function
  – distribute N points according to P(R)
  – compute the average of the local energy
Error bars estimation
• In Monte Carlo it is easy to estimate the statistical error from the variance of the sampled local energies
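The formulas on this slide were lost in extraction; assuming N (nearly) independent samples, the standard expressions are:

```latex
E \approx \bar{E}_L = \frac{1}{N} \sum_{i=1}^{N} E_L(R_i),
\qquad
\sigma_E \approx \sqrt{\frac{\langle E_L^2 \rangle - \langle E_L \rangle^2}{N}}
```

In practice Metropolis samples are correlated, so blocking (reblocking) analysis is used to obtain honest error bars.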
The Metropolis Algorithm
(flow diagram: move Ri → Rtry, call the oracle; on accept set Ri+1 = Rtry, on reject set Ri+1 = Ri; in both cases accumulate the averages)

The Metropolis Algorithm: the Oracle
if p >= 1        /* always accept */
    accept move
if 0 <= p < 1    /* accept with probability p */
    if p > rnd() accept move
    else reject move
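Putting the oracle together with the VMC recipe gives a complete toy calculation. The sketch below (mine, with hypothetical parameter names) treats the hydrogen atom in atomic units with the trial function Ψ = e^(−αr), whose local energy is E_L = −α²/2 + (α − 1)/r and whose exact variational energy is α²/2 − α (so −0.5 hartree at the optimal α = 1, where E_L is constant and the variance vanishes).

```python
import math
import random

def vmc_hydrogen(alpha, n_steps=200000, step=0.5, seed=0):
    """VMC sketch for the hydrogen atom (atomic units) with the trial
    function psi = exp(-alpha r).  Samples |psi|^2 with a Metropolis
    walk and averages the local energy
    E_L = -alpha^2 / 2 + (alpha - 1) / r."""
    random.seed(seed)
    R = [0.5, 0.5, 0.5]
    r = math.sqrt(sum(c * c for c in R))
    e_sum = 0.0
    for _ in range(n_steps):
        R_try = [c + random.uniform(-step, step) for c in R]
        r_try = math.sqrt(sum(c * c for c in R_try))
        # the oracle: p = |psi(R_try)|^2 / |psi(R)|^2
        if math.exp(-2.0 * alpha * (r_try - r)) > random.random():
            R, r = R_try, r_try
        e_sum += -0.5 * alpha ** 2 + (alpha - 1.0) / r  # rejected: reuse old point
    return e_sum / n_steps
```

At α = 1 every sample gives exactly −0.5: a zero-variance result, illustrating the "E_L constant for an exact eigenstate" property discussed below.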
VMC: Variational Monte Carlo
• No need to analytically compute integrals: complete freedom in the choice of the trial wave function.
• Can use explicitly correlated wave functions
• Can satisfy the cusp conditions
(figure: He atom wave function plotted against r12 and r2)

VMC advantages
• Can compute lower bounds
• Can go beyond the Born-Oppenheimer approximation, with any potential, in any number of dimensions.
(examples: the Ps2 molecule (e+ e+ e- e-) in 2D and 3D; M+ m+ M- m- as a function of M/m)

Properties of the Local energy
• For an exact eigenstate, E_L is a constant
• At particle coalescences, the divergence of V must be cancelled by the divergence of the kinetic term
• For an approximate trial function, E_L is not constant

Reducing Errors
• For a trial function, if E_L can diverge, the statistical error will be large
• To eliminate the divergence we impose Kato's cusp conditions
Kato's cusp conditions on Ψ
• We can build the correct analytical structure into Ψ at the electron-electron and electron-nucleus coalescence points
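The cusp formulas on this slide were lost; in atomic units the standard (spherically averaged) Kato conditions read:

```latex
\text{electron--nucleus:}\quad
\left.\frac{1}{\Psi}\frac{\partial \Psi}{\partial r_{iA}}\right|_{r_{iA}=0} = -Z_A,
\qquad
\text{electron--electron (unlike spins):}\quad
\left.\frac{1}{\Psi}\frac{\partial \Psi}{\partial r_{ij}}\right|_{r_{ij}=0} = \frac{1}{2}
```

For parallel-spin electrons the electron-electron coefficient is 1/4 instead of 1/2.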
Optimization of Ψ
• Suppose the trial wave function contains variational parameters that we want to optimize
• The straightforward minimization of E is numerically unstable, because E_L can diverge
• For finite N the estimate can be unbounded
• Also, our energies have error bars, which can make comparisons difficult

Optimization of Ψ
• It is better to optimize the variance σ² of the local energy
• It is a measure of the quality of the trial function
• Even for finite N it is numerically stable.
• The trial function of lowest σ will not have the lowest E, but it is usually close

Optimization of Ψ
• Meaning of the optimization of σ: trying to reduce the distance between the upper and lower bounds
• For which potential V' is Ψ_T an eigenfunction?
• We want V' to be "close" to the real V

VMC drawbacks
• The error bar goes down as N^(−1/2)
• It is computationally demanding
• The optimization of Ψ becomes difficult as the number of nonlinear parameters increases
• It depends critically on our skill to invent a good Ψ
• There exist exact, automatic ways to get better wave functions. Let the computer do the work...
To be continued...
In the last episode: VMC. Today: DMC

First Major VMC Calculation
• W. McMillan, thesis, 1964
• VMC calculation of the ground state of liquid helium-4.
• Applied MC techniques from classical liquid theory.

VMC advantages and drawbacks
• Simple, easy to implement
• Intrinsic error bars
• Usually obtains 60-90% of the correlation energy
• The error bar goes down as N^(−1/2)
• It is computationally demanding
• The optimization of Ψ becomes difficult as the number of nonlinear parameters increases
• It depends critically on our skill to invent a good Ψ
Diffusion Monte Carlo
• VMC is a "classical" simulation method
"Nature is not classical, dammit, and if you want to make a simulation of nature, you'd better make it quantum mechanical, and by golly it's a wonderful problem, because it doesn't look so easy." (Richard P. Feynman)
• Suggested by Fermi in 1945, but implemented only in the 1970s

Diffusion equation analogy
• The time-dependent Schrödinger equation is similar to a diffusion equation
• The diffusion equation can be "solved" by directly simulating the system: time evolution = diffusion + branching
• Can we simulate the Schrödinger equation?

Imaginary Time Schrödinger Equation
• The analogy is only formal: Ψ is a complex quantity, while a concentration C is real and positive
• If we let the time t be imaginary, then Ψ can be real: the imaginary-time Schrödinger equation

Ψ as a concentration
• Ψ is interpreted as a concentration of fictitious particles, called walkers
• The Schrödinger equation is simulated by a process of diffusion, growth, and disappearance of walkers, which projects out the ground state

Diffusion Monte Carlo
SIMULATION: discretize time
• Diffusion process (kinetic term)
• Branching process (potential term)
First QMC calculation in chemistry
77 lines of Fortran code!

Formal development
• Formally, in imaginary time, the solution is propagated by e^(−tH), which projects onto the ground state
• In the coordinate representation this becomes an integral equation

Schrödinger Equation in integral form
• Monte Carlo is good at integrals...
• We interpret G as the probability to move from R to R' in a time step τ, and we iterate this equation
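The integral equation on this slide was lost; in the usual notation it reads:

```latex
\Psi(R', t + \tau) = \int G(R' \leftarrow R, \tau)\, \Psi(R, t)\, dR,
\qquad
G(R' \leftarrow R, \tau) = \langle R' |\, e^{-\tau (\hat{H} - E_{\mathrm{ref}})} \,| R \rangle
```

Repeated application of G filters out all excited-state components, leaving the ground state.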
Iteration of Schrödinger Equation
• We can iterate this equation

Zassenhaus formula
• In general we do not have the exact G
• We must use a small time step τ, but at the same time we must let t → ∞

Trotter theorem
• A and B do not commute: use the Trotter theorem
• Figure out what each operator does independently, and then alternate their effects. This is rigorous in the limit n → ∞
• In DMC, A is a diffusion operator and B is a branching operator
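The Trotter formulas on this slide were lost; in standard form:

```latex
e^{-\tau(\hat{A} + \hat{B})} = \lim_{n \to \infty} \left( e^{-\tau \hat{A}/n}\, e^{-\tau \hat{B}/n} \right)^n,
\qquad
e^{-\tau(\hat{T} + \hat{V})} = e^{-\tau \hat{T}}\, e^{-\tau \hat{V}} + O(\tau^2)
```

The second relation is the short-time approximation used in practice: one small time step of free diffusion (from e^(−τT)) followed by branching (from e^(−τV)).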
Short Time approximation
• Diffusion + branching
• At equilibrium the algorithm samples Φ0
• The energy can be estimated from the simulation (e.g. from the growth of the walker population)

The DMC algorithm
(flow diagram)

A picture for H2+
(figure)

Short Time approximation
(figure)
Importance sampling
• V can diverge, so branching can be inefficient
• We can transform the Schrödinger equation by multiplying by Ψ_T

Importance sampling
• The result is similar to a Fokker-Planck equation
• It is simulated by diffusion + drift + branching
• To the pure diffusion algorithm we added a drift step that pushes the random walk in directions of increasing trial function

Importance sampling
• The branching term is now controlled by the local energy E_L instead of the bare potential
• Fluctuations are controlled
• At equilibrium it samples f = Ψ_T Φ0
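The equations on these slides were lost; in the usual notation (atomic units), the density f(R, t) = Ψ_T(R) Φ(R, t) obeys:

```latex
-\frac{\partial f}{\partial t} = -\tfrac{1}{2} \nabla^2 f
+ \nabla \cdot \left( f\, \nabla \ln |\Psi_T| \right)
+ \left( E_L(R) - E_{\mathrm{ref}} \right) f,
\qquad
E_L(R) = \frac{\hat{H} \Psi_T(R)}{\Psi_T(R)}
```

The three terms correspond to diffusion, drift along ∇ln|Ψ_T|, and branching with weight exp(−τ(E_L − E_ref)); since E_L is far smoother than V for a good Ψ_T, the branching fluctuations are tamed.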
DMC Algorithm
• Initialize a population of walkers {Ri}
• For each walker, perform a diffusion step and a drift step: R → R'

DMC Algorithm
• Compute the branching weight w
• Duplicate R' to M copies: M = int(ξ + w)
• Compute statistics
• Adjust Eref to keep the average population constant.
• Iterate...
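The loop above can be sketched for a toy problem (my own example, without importance sampling): the 1D harmonic oscillator V(x) = x²/2, whose exact ground-state energy is 0.5 in oscillator units. For this particular potential the population average of V over the sampled Φ0 also equals E0, so it serves as the energy estimate.

```python
import math
import random

def dmc_harmonic(n_target=500, n_steps=2000, tau=0.02, seed=0):
    """Minimal DMC sketch for the 1D harmonic oscillator V(x) = x^2 / 2.
    Each walker diffuses by a Gaussian of variance tau, then branches
    into M = int(w + xi) copies with weight w = exp(-tau (V - E_ref));
    E_ref is steered to keep the population near n_target."""
    random.seed(seed)
    walkers = [random.uniform(-1.0, 1.0) for _ in range(n_target)]
    e_ref = sum(0.5 * x * x for x in walkers) / len(walkers)
    energies = []
    for step in range(n_steps):
        new = []
        for x in walkers:
            x += random.gauss(0.0, math.sqrt(tau))      # diffusion step
            w = math.exp(-tau * (0.5 * x * x - e_ref))  # branching weight
            for _ in range(int(w + random.random())):   # M = int(xi + w)
                new.append(x)
        if not new:
            raise RuntimeError("walker population died out")
        walkers = new
        v_avg = sum(0.5 * x * x for x in walkers) / len(walkers)
        # population control: push E_ref toward keeping ~n_target walkers
        e_ref = v_avg + 0.1 * math.log(n_target / len(walkers))
        if step >= n_steps // 2:                        # discard equilibration
            energies.append(v_avg)
    return sum(energies) / len(energies)
```

The time step τ introduces an O(τ) bias here (the short-time approximation), so in production one extrapolates τ → 0.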
Good for Helium studies
• Thousands of theoretical and experimental papers have been published on helium, in its various forms: atoms, small clusters, droplets, bulk

Stability Chart
(chart: which mixed 3He_m 4He_n clusters, with m and n up to about 11, are bound, unbound, or unknown; selected bound states are labeled with their L and S quantum numbers, and the large-m, large-n region is marked terra incognita)

Good for vibrational problems

For electronic structure?
The Fermion Problem
• Wave functions for fermions have nodes.
• The diffusion-equation analogy is lost: we would need to introduce positive and negative walkers. This is the (in)famous sign problem.
• If we knew the exact nodes of Ψ, we could exactly simulate the system by QMC methods, restricting the random walk to a positive region bounded by the nodes.
• Unfortunately, the exact nodes are unknown. Use approximate nodes from a trial Ψ, and kill the walkers if they cross a node.

Common misconceptions on nodes
• Nodes are not fixed by antisymmetry alone: antisymmetry fixes only a (3N−3)-dimensional subset of the node

Common misconceptions on nodes
• Nodes have (almost) nothing to do with orbital nodes.
• It is (sometimes) possible to use nodeless orbitals

Common misconceptions on nodes
• A common misconception is that on a node two like-spin electrons are always close. This is not true.

Common misconceptions on nodes
• The nodal theorem is NOT valid in N dimensions
• A higher-energy state does not necessarily have more nodes (Courant and Hilbert)
• The node count is only an upper bound

Common misconceptions on nodes
• Not even for states of the same symmetry species (Courant's counterexample)

Tiling Theorem (Ceperley)
• All the nodal domains of the ground state must have the same shape; a differently shaped extra domain is impossible for the ground state
• The Tiling Theorem does not say how many nodal domains we should expect!

Nodes are relevant
• Levinson theorem: the number of nodes of the zero-energy scattering wave function gives the number of bound states
• Fractional quantum Hall effect
• Quantum chaos (billiards): integrable vs. chaotic systems
The Fixed Node approximation
• Since in general we do not know the exact nodes, we resort to approximate nodes
• We use the nodes of some trial function
• The energy is an upper bound to E0
• The energy depends only on the nodes; the rest of Ψ_T affects only the statistical error
• Usually very good results! Even a poor Ψ_T usually has good nodes

Trial Wave functions
• For small systems (N < 7): specialized forms (linear expansions, Hylleraas, ...)
• For larger systems (up to ~200): the Slater-Jastrow form
  – a sum of Slater determinants
  – a Jastrow factor: a polynomial parametrized in the interparticle distances
A little intermezzo: Be atom nodal structure

Be Nodal Structure
(figure: a cut of the node plotted in the coordinates r1 + r2, r1 − r2, r3 − r4)

Be nodal structure
• There are only two nodal domains
• It can be proved that the exact Be wave function has exactly two nodal regions
• See Bressanini, Ceperley and Reynolds, http://scienze-como.uninsubria.it/bressanini/

Be nodal structure
• A physicist's proof... (David Ceperley)
• 4 electrons: 1 and 2 spin up, 3 and 4 spin down
• The Tiling Theorem applies: there are at most 4 nodal domains

Be nodal structure
• We need to find a point R and a path R(t) that connects R to P12 P34 R such that Ψ(R(t)) ≠ 0
• Consider the point R = (r1, −r1, r3, −r3)
• Ψ is invariant w.r.t. rotations
• Path: rotating by π about the axis r1 × r3, Ψ is constant
• But Ψ(R) ≠ 0: Ψexact = ΨHF + higher terms; ΨHF(R) = 0, but the higher terms ≠ 0
An example
• High-precision total-energy calculations of molecules
• An example: what is the most stable C24 fullerene?
• QMC could make consistent predictions of the lowest structure
• Other methods are not capable of making consistent predictions about the stability of fullerenes

DMC advantages and drawbacks
• Correlation between particles is automatically taken into account.
• Exact for bosonic systems
• Fixed-node DMC for electrons obtains 85-95% of the correlation energy; very good results in many different fields
• Works at T = 0. For T > 0 one must use Path Integral MC
• Not a "black box"
• It is computationally demanding for large systems
• Derivatives of Ψ are very hard, and not yet good enough

Current research
• Current research focuses on:
  – applications: nanoscience, solid state, condensed matter, nuclear physics, molecular geometries, ...
  – estimating derivatives of the wave function
  – solving the sign problem (very hard!!)
  – making it an O(N) method (currently O(N³)) to treat bigger systems (currently about 200 particles)
  – better wave functions
  – better optimization methods

A reflection...
A new method for calculating properties in nuclei, atoms, molecules, or solids automatically provokes three sorts of negative reactions:
• a new method is initially not as well formulated or understood as existing methods
• it can seldom offer results of a quality comparable to existing methods before a considerable amount of development has taken place
• only rarely do new methods differ in major ways from previous approaches
Nonetheless, new methods need to be developed to handle problems that are vexing to, or beyond the scope of, the current approaches.
(Slightly modified from Steven R. White, John W. Wilkins and Kenneth G. Wilson)
THE END