
SIR method (sample-importance resampling)

Problem with grid method
• We don’t know how fine to make the grid steps
• We actually need steps to be continuous
• Instead of systematic sampling from a grid, the SIR method randomly samples (r, N_1973) pairs from the grid region
• Good guesses (draws) with high likelihood×prior are kept and bad draws are discarded
• When enough draws have been saved so that the posterior is smooth (1000–5000), then stop
21 Antarctic blue SIR.xlsx, sheet “Normal prior”

SIR: sample-importance resampling (simplest and least efficient version)
• Find maximum(likelihood×prior), Y
• Randomly sample pairs of r and N_1973
• For each pair, calculate X = likelihood×prior
• Accept pair with probability X/Y, otherwise reject
  – Note that X/Y = exp(NLL(Y) - NLL(X)), which is often easier to work with
• Accepted pairs are the posterior
• Repeat until you have sufficient accepted pairs
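The accept/reject steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `neg_log_post` is a hypothetical stand-in for the NLL of likelihood×prior (the real whale model lives in the spreadsheet and is not reproduced here), and the sampling ranges and seed are made up.

```python
import math
import random

def neg_log_post(r, n1973):
    """HYPOTHETICAL NLL of likelihood x prior; a toy surface, not the whale model."""
    z_r = (r - 0.07) / 0.02
    z_n = (math.log(n1973) - math.log(320.0)) / 0.4
    return 0.5 * z_r ** 2 + 0.5 * z_n ** 2

def sir_rejection(n_draws, r_range, n_range, nll_min, seed=1):
    """Sample (r, N_1973) uniformly from a box and accept with probability X/Y."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        r = rng.uniform(*r_range)
        n1973 = rng.uniform(*n_range)
        # X/Y = exp(NLL(Y) - NLL(X)); at most 1 when nll_min is the true minimum
        ratio = math.exp(nll_min - neg_log_post(r, n1973))
        if rng.random() < ratio:
            accepted.append((r, n1973))
    return accepted

# NLL at the maximum of likelihood x prior (known exactly for this toy surface)
nll_min = neg_log_post(0.07, 320.0)
draws = sir_rejection(20000, (-0.1, 0.2), (50.0, 1500.0), nll_min)
print(len(draws), "accepted of 20000")
```

Working with the difference of NLL values avoids ever forming the (often tiny) likelihood×prior values directly, which is the practical reason the exp(NLL(Y) - NLL(X)) form is easier to handle.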

[Figure: SIR draws, accepted and rejected; axes: value of r and value of N_1973]

Advantage of discrete samples
• Each draw that is saved is a sample from the posterior distribution
• We can take these pairs of (r, N_1973) and project the model into the future for each pair
• This gives us future predictions based on the joint values of the parameters, which also take into account correlations between parameter values


20,000 samples, 296 accepted
• r = 0.072, 95% interval = 0.027–0.112
  – Grid method 0.072, 0.029–0.115
• N_1973 = 320, 95% interval = 145–689
• 98.5% of all function calls were rejected (wasteful)
• Tricks to increase acceptance rates
  – Accept with probability X/Z where Z is smaller than the MPD (Y): more draws are accepted, but some draws must then be duplicated in the posterior
  – Sample parameter values from an importance function, compare likelihood ratios, and also account for the importance function (not covered here)
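The slides do not say how draws are duplicated when X/Z exceeds 1. One common scheme, shown here purely as an assumption, keeps floor(X/Z) copies of the draw plus one extra copy with probability equal to the fractional part, so that the expected number of copies equals X/Z. The function name `copies_to_keep` is hypothetical.

```python
import math
import random

def copies_to_keep(ratio, rng):
    """Number of copies of a draw to store when ratio = X/Z may exceed 1.

    ASSUMED scheme (not from the slides): floor(ratio) copies, plus one
    more with probability equal to the fractional part of ratio.
    """
    whole = math.floor(ratio)
    extra = 1 if rng.random() < ratio - whole else 0
    return whole + extra

rng = random.Random(42)
# A draw with X/Z = 2.5 is stored either 2 or 3 times, 2.5 times on average
counts = [copies_to_keep(2.5, rng) for _ in range(10000)]
mean_copies = sum(counts) / len(counts)
print(sorted(set(counts)), mean_copies)
```

For ratios below 1 the same function reduces to ordinary accept/reject, since `whole` is 0 and the draw is kept once with probability X/Z.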

MCMC method (Markov chain Monte Carlo)

Markov chain Monte Carlo (MCMC) (general idea)
• Idea: why not meander around the accepted values from the SIR algorithm?
• Method
  – Start somewhere
  – Randomly jump somewhere else
  – If you found a better place, go there
  – If you found a worse place, go there with some probability
• There are formal proofs that this “Metropolis-Hastings” algorithm works

Number of citations to Metropolis et al. (1953)
It took 19 years to first exceed 20 citations in one year.
Side note on the authors: the first author, Nicholas Metropolis, was the head of the lab but did not come up with any of the technical details of the algorithm. The other four authors were two married couples; the second author, Arianna Rosenbluth, implemented the entire algorithm on one of the earliest computers (the “MANIAC I”).
Metropolis N et al. (1953) Equation of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087–1092

MCMC algorithm
• Start anywhere with values for r_1 and N_1973,1, and calculate X_1 = likelihood×prior
• Jump function: add random numbers to r_1 and N_1973,1 to get a candidate draw r*, N_1973*, and X* = likelihood×prior
• Calculate X*/X_1, which equals exp(NLL(X_1) - NLL(X*))
• If random number U[0,1] < X*/X_1 then r_2 = r*, N_1973,2 = N_1973*, and X_2 = X* [accept draw, move to new values]
• If random number U[0,1] ≥ X*/X_1 then r_2 = r_1, N_1973,2 = N_1973,1, and X_2 = X_1 [reject draw, stay at current values]
21 Antarctic blue MCMC.xlsx
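The jump-and-compare loop above might look like this in Python. It is a minimal sketch, not the spreadsheet's implementation: `neg_log_post` is a hypothetical toy NLL of likelihood×prior, and the starting point, normal jump function and jump sizes are assumptions chosen only so the example runs.

```python
import math
import random

def neg_log_post(r, n1973):
    """HYPOTHETICAL NLL of likelihood x prior; a toy surface, not the whale model."""
    if n1973 <= 0:
        return math.inf  # impossible population size: zero posterior
    z_r = (r - 0.07) / 0.02
    z_n = (math.log(n1973) - math.log(320.0)) / 0.4
    return 0.5 * z_r ** 2 + 0.5 * z_n ** 2

def metropolis(n_steps, start, jump_sd, seed=1):
    """Random-walk chain; every step is stored, whether the jump was accepted or not."""
    rng = random.Random(seed)
    r, n = start
    nll = neg_log_post(r, n)
    chain, n_accepted = [], 0
    for _ in range(n_steps):
        # Jump function (assumed normal): add random numbers to the current values
        r_star = r + rng.gauss(0.0, jump_sd[0])
        n_star = n + rng.gauss(0.0, jump_sd[1])
        nll_star = neg_log_post(r_star, n_star)
        # X*/X_1 = exp(NLL(X_1) - NLL(X*)); a better place is always accepted
        if nll_star <= nll or rng.random() < math.exp(nll - nll_star):
            r, n, nll = r_star, n_star, nll_star
            n_accepted += 1
        chain.append((r, n))
    return chain, n_accepted

chain, n_accepted = metropolis(10000, start=(0.0, 800.0), jump_sd=(0.01, 60.0))
print(n_accepted, "jumps accepted of", len(chain))
```

Checking `nll_star <= nll` before calling `math.exp` is a small design choice that avoids overflow when the candidate is much better than the current point.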

MCMC algorithm
• Successive points wander around the posterior
• If you start far away, it will take some time to get near the highest likelihood
• Therefore, discard the first 20% of accepted draws (burn-in period)
• Thin the chain by retaining only one in every n accepted draws, to keep retained draws to a manageable number
• Convergence is when there is no autocorrelation in the thinned chain (there are other tests for convergence)
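Burn-in, thinning and a simple autocorrelation check can be sketched as follows. The chain here is an artificial autocorrelated series standing in for a real trace of r, and the helper names and the thinning interval of 20 are assumptions for illustration only.

```python
import random

def burn_and_thin(chain, burn_frac=0.2, keep_every=10):
    """Drop the first burn_frac of the chain, then keep one in every keep_every draws."""
    start = int(len(chain) * burn_frac)
    return chain[start::keep_every]

def lag1_autocorr(values):
    """Lag-1 autocorrelation: near 0 suggests successive retained draws are independent."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var

# STAND-IN chain: a strongly autocorrelated series, not output from the whale model
rng = random.Random(3)
chain = [0.0]
for _ in range(9999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

kept = burn_and_thin(chain, burn_frac=0.2, keep_every=20)
print(len(kept), lag1_autocorr(chain), lag1_autocorr(kept))
```

If the thinned chain still shows clear autocorrelation, a longer chain or a larger thinning interval is the usual next step.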

21 Antarctic blue MCMC.xlsx, sheet “Normal prior”

[Figure: MCMC output. Panels: trace for r and trace for N_1973 over the first 500 draws and over draws 2,000–10,000; r vs. N_1973 for draws 1–500 and for draws 2,000–10,000]

10,000 samples, 2669 accepted
• r = 0.074, 95% interval = 0.032–0.118
  – Grid method 0.072, 0.029–0.115
  – SIR method 0.072, 0.027–0.112
• N_1973 = 302, 95% interval = 130–673
• To improve convergence: increase length of chain, change jump size, change thinning rate, make jumps multivariate, etc.

[Figure: accepted and rejected draws, value of r and value of N_1973. Panels: SIR (20,000 samples) and MCMC (10,000 samples)]
• MCMC does not explore space with low likelihood
• Therefore many more draws are accepted
21 Accepted rejected comparison.xlsx

What to do with accepted draws
• Histogram of r = marginal posterior for r
• Histogram of N_1973 = marginal posterior for N_1973
• Proportion of r values < 0 is the probability that the population is declining (2 out of 8000 => P = 0.00025)
• Can run the population model for each accepted draw of r and N_1973 and calculate 95% credibility intervals for any past or future year
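As a rough illustration of these uses, the sketch below computes the probability of decline and a 95% interval for a projected year. Two things are assumed and not taken from the slides: the population model is simple exponential growth, N_t = N_1973 × exp(r × (t - 1973)), and the draws are simulated stand-ins (independent, so they ignore the correlation that real accepted draws carry).

```python
import math
import random

def project(r, n1973, year):
    """ASSUMED population model: exponential growth from 1973."""
    return n1973 * math.exp(r * (year - 1973))

def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted list, with p between 0 and 1."""
    idx = min(len(sorted_vals) - 1, max(0, round(p * (len(sorted_vals) - 1))))
    return sorted_vals[idx]

# STAND-IN posterior draws of (r, N_1973); real ones come from SIR or MCMC
rng = random.Random(7)
draws = [(rng.gauss(0.07, 0.02), rng.lognormvariate(math.log(320.0), 0.4))
         for _ in range(8000)]

# Probability the population is declining = proportion of draws with r < 0
p_decline = sum(1 for r, _ in draws if r < 0) / len(draws)

# Run the model once per draw, then read the interval off the sorted projections
proj = sorted(project(r, n, 2000) for r, n in draws)
lo, mid, hi = percentile(proj, 0.025), percentile(proj, 0.5), percentile(proj, 0.975)
print(p_decline, lo, mid, hi)
```

Because each projection uses one (r, N_1973) pair as a unit, any correlation between the parameters in the real draws carries through to the interval automatically.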

Bayesian methods summary
• Many different algorithms: grid method, SIR method, MCMC method (Gibbs samplers, Hamiltonian MC, No-U-Turn Sampler), etc.
• All involve priors, likelihoods, and posteriors
• Natural interpretation of probability
• Allow use of other information
• Posterior draws can be used for prediction

Picture by Beth Fossen of baby Ruth