
BIOE 293 Quantitative ecology seminar
Marm Kilpatrick, Steve Munch
Spring Quarter 2015

Seminar Goal
For 5-10 quantitative methods:
• Understand how the approach works by reading a “methods” paper or book chapter on it. This includes the assumptions, strengths, weaknesses, and limitations.
• Read papers more critically, and assess whether an approach is potentially useful for your own work.
• Try analyzing data using the approach and discuss challenges.
• Sadly, you likely won’t be an expert in any of the topics at the end, but you’ll have a start at becoming one.

Potential Topics & Voting scheme!
• 0. Generic intro stuff
  - Statistical approaches
  - Maximum Entropy
• I. Linear models and apps
  - Generalized linear models
  - Path analysis/SEM
  - Correlated data
  - Phylogenetic methods
• II. Linear, multivariate
  - PCA, MCA, CCA
• III. Multi-layer models
  - Hierarchical models
  - Multi-state mark recapture
• IV. Nonlinear from linear
  - GAMs
  - Wavelets
• V. Potpourri
  - Meta-analyses
  - Isotope mixing models
  - Ecological Niche models
  - Kriging
  - Machine learning
  - Occupancy modeling

Statistical approaches
• Frequentist, AIC, Bayesian – what questions are they answering, and what advantages/disadvantages does each of them have?
  - Frequentist: P-values
  - AIC: Best fitting model(s)
  - Bayesian: Descriptions of posterior distributions
• Maximum Entropy (a different way of thinking about the same stuff, with different pros/cons)
  - p(x) is probability density; m(x) is “background probability distribution”
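The symbols p(x) and m(x) match the standard relative-entropy form of the maximum entropy principle. Assuming that is the intended quantity, it can be written as:

```latex
H(p) = -\int p(x)\,\log\frac{p(x)}{m(x)}\,dx
```

Maximizing H(p) subject to constraints (for example, known means) gives the distribution p(x) that departs least from the background m(x) while still satisfying those constraints.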

Most statistical methods start with a model for the probability of data (x) given parameters (q): P(x|q), a.k.a. the ‘Likelihood’.
It’s what happens next that gets people so worked up:

Data: (x), parameters: (q)

Frequentist
- Think of q as fixed, but unknown.
- Find parameters that maximize P(x|q).
- Derive bounds on these estimates that should provide good coverage in repeated sampling.
- Hypothesis tests compare fit against some null distribution.
- Model selection based on goodness of fit.
- Frequently only asymptotically correct.

Bayesian
- Bayes’ rule: P(q|x) = P(x|q)P(q)/P(x)
- Treat all unknowns as ‘random’.
- Use Bayes’ rule to find P(q|x).
- Intervals based directly on P(q|x).
- Model selection and hypothesis tests usually based on P(model|data). PPL also used.
- Need to specify P(q) and P(model).

Information theoretic
- Choose amongst a set of candidate models based on some ‘information criterion’.
- All are various attempts to choose the model that comes closest to ‘truth’.

Maximum Entropy
- Derive the probability model that contains the smallest amount of ‘extra’ information.
- Introduced by E. T. Jaynes as a way to specify minimally informative priors, later expanded into its own inferential tool.
- Current applications in ecology range from purely statistical (e.g. MaxEnt for SDM) to purely theoretical (Harte’s applications to size, area, density distributions).
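To make the contrast concrete, here is a minimal sketch that starts from one likelihood P(x|q) and then does "what happens next" three ways. The data (7 successes in 20 trials), the binomial model, the flat prior, and the grid search are all assumptions chosen for illustration, not part of the seminar material.

```python
import math

def log_lik(q, k, n):
    """Binomial log-likelihood log P(x|q) for k successes in n trials."""
    return (math.log(math.comb(n, k))
            + k * math.log(q) + (n - k) * math.log(1.0 - q))

k, n = 7, 20  # hypothetical data
grid = [i / 1000 for i in range(1, 1000)]  # candidate values of q in (0, 1)

# Frequentist: treat q as fixed but unknown and maximize P(x|q).
q_mle = max(grid, key=lambda q: log_lik(q, k, n))

# Bayesian: flat prior P(q) on (0, 1), so P(q|x) is proportional to P(x|q).
weights = [math.exp(log_lik(q, k, n)) for q in grid]
total = sum(weights)
posterior = [w / total for w in weights]
post_mean = sum(q * p for q, p in zip(grid, posterior))

# Information theoretic: AIC = 2 * (number of parameters) - 2 * max log-likelihood.
aic_free = 2 * 1 - 2 * log_lik(q_mle, k, n)  # q estimated from the data
aic_null = 2 * 0 - 2 * log_lik(0.5, k, n)    # q fixed at 0.5, nothing estimated

print(q_mle, post_mean, aic_free, aic_null)
```

The same likelihood function feeds all three approaches; only the summary changes (a point estimate, a posterior distribution, or a ranking of candidate models).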

Linear Models and applications
• Generalized linear models and data transformations: distributions, links, leverage and more
• Correlated data – GLS for time series, spatial data
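As a rough illustration of what "distributions" and "links" mean in a GLM, the sketch below fits a Poisson regression with a log link by iteratively reweighted least squares. The count data, the single predictor, and the function name fit_poisson_glm are hypothetical; in practice one would use an established statistics package.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(E[y]) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi          # linear predictor
            mu = math.exp(eta)          # inverse of the log link
            z = eta + (yi - mu) / mu    # working response
            w = mu                      # working weight for Poisson with log link
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations for (b0, b1).
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1

xs = [0, 1, 2, 3, 4, 5]     # hypothetical predictor
ys = [1, 2, 2, 5, 7, 12]    # hypothetical counts
b0, b1 = fit_poisson_glm(xs, ys)
print(b0, b1)
```

Each iteration is an ordinary weighted linear regression, which is why GLMs sit under the "linear models" heading even though the fitted mean is an exponential curve.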

Phylogenetic methods (for analyses where species are data points) Felsenstein 1985 Am Nat

Linear Models and applications
• Path analysis/Structural equation modeling
[Figure panels: Hypotheses; The data. Wootton 1994 Ecology]

Multivariate correlational approaches
• Principal components analysis (PCA), MCA (PCA for categorical data), CCA (for exploring correlations between 2 sets of variables (matrices))
• What people often do after they’ve collected lots of data but don’t know what to do with it
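For a feel of what PCA computes, the sketch below finds the first principal component of a small data set by power iteration on the covariance matrix. The two-variable data and the function name are hypothetical, and power iteration is just one simple way to get the leading eigenvector; standard software typically uses an SVD.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (direction, share of variance explained) for the first PC."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the centred variables.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # arbitrary starting direction
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    # Rayleigh quotient gives the variance along the direction v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_var = sum(cov[i][i] for i in range(p))
    return v, eigval / total_var

data = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8)]  # hypothetical measurements
direction, explained = first_principal_component(data)
print(direction, explained)
```

Here the two variables are nearly proportional, so almost all the variance falls along a single axis; that is the kind of redundancy PCA is meant to summarize.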

III. Multi-layer models (usually linear, but not necessarily)
• Hierarchical models (mixed effects models, nested models, random effects models)
  - For analyzing data that are influenced by variables that differ at more than one “level”
• Multi-state mark recapture models
  - Survival analyses
  - Allow for temporary emigration (temporary movement to unvisited locations)
  - Allow for variable states/traits of individuals to influence survival

Hierarchical models
P(x|q) Likelihood
P(q|r) Prior
P(r) Hyperprior
• Finite mixture models
  - Introduce a ‘hidden’ or ‘latent’ variable to account for heterogeneity among individuals
  - Capture nonstandard distributional shapes
• Mixed effects models
  - Treat some estimated effects (i.e. parameters) as ‘random’ (i.e. variable)
• Hidden Markov models, state-space models
  - Separate observation and process models
  - Allow for imperfect observations of dynamical systems
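One consequence of stacking a prior P(q|r) on top of the likelihood P(x|q) is "partial pooling": group-level estimates get pulled toward the overall mean, more strongly when a group has little data. The sketch below assumes a normal likelihood and a normal prior with known variances and a known overall mean, which is a simplification for illustration; the site data and the function name partial_pool are hypothetical.

```python
def partial_pool(group_means, group_sizes, sigma2, tau2, mu):
    """Posterior mean of each group effect in a normal-normal hierarchy.

    Assumed model: x ~ Normal(q_group, sigma2) and q_group ~ Normal(mu, tau2),
    with sigma2, tau2 and mu treated as known.
    """
    pooled = []
    for ybar, n in zip(group_means, group_sizes):
        # Weight on the overall mean: large when the group mean is noisy.
        shrink = (sigma2 / n) / (sigma2 / n + tau2)
        pooled.append(shrink * mu + (1.0 - shrink) * ybar)
    return pooled

site_means = [2.0, 5.0, 9.0]   # hypothetical observed means per site
site_sizes = [2, 10, 50]       # hypothetical sample sizes per site
estimates = partial_pool(site_means, site_sizes, sigma2=4.0, tau2=1.0, mu=5.0)
print(estimates)
```

The site with only 2 observations is pulled most of the way toward the overall mean, while the site with 50 observations barely moves.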

Ecological Niche Models (Because everyone loves maps)
Occurrence data + Environmental variables → Probability of occurrence

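IV. Nonlinear models (out of linear ones)
• Generalized additive models (GAMs)
  - The response is modeled as a sum of smooth functions f of the predictors.
  - Each f is represented as a ‘basis expansion’, $f(x) = \sum_j a_j h_j(x)$, where the $h_j(x)$ are fixed ‘basis functions’ and the $a_j$ are coefficients to be estimated.
  - Has the same structure as a linear model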
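The point that a basis expansion "has the same structure as a linear model" can be seen in a small sketch: once the basis functions are fixed, the coefficients come from ordinary least squares. The hinge basis, the knot at x = 1, and the noise-free data below are assumptions for illustration; real GAM software generally uses penalized spline bases.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

# Fixed basis functions h_j(x): constant, linear, and a hinge at x = 1.
basis = [lambda x: 1.0, lambda x: x, lambda x: max(0.0, x - 1.0)]

def fit_basis_expansion(xs, ys):
    """Least-squares estimate of the coefficients a_j in f(x) = sum_j a_j h_j(x)."""
    design = [[h(x) for h in basis] for x in xs]
    k = len(basis)
    # Normal equations: (H^T H) a = H^T y, exactly as in linear regression.
    hth = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    hty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(hth, hty)

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1.0 + 0.5 * x + 2.0 * max(0.0, x - 1.0) for x in xs]  # hypothetical noise-free curve
coefs = fit_basis_expansion(xs, ys)
print(coefs)
```

The fitted curve bends at the knot, yet the model is linear in the coefficients, so all the usual linear-model machinery still applies.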

Wavelets

Potpourri (Other topics)

Meta-analyses
• A method for combining results from multiple studies
• Assessing bias, modeling heterogeneity
Salkeld et al 2013 Ecol Lett
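A minimal sketch of "combining results" and "modeling heterogeneity": an inverse-variance weighted (fixed-effect) mean, plus a random-effects mean using the DerSimonian-Laird estimate of between-study variance. The effect sizes and variances are hypothetical, and this particular estimator is an assumption; the seminar material does not name one.

```python
def meta_analysis(effects, variances):
    """Return (fixed-effect mean, between-study variance tau2, random-effects mean)."""
    k = len(effects)
    w = [1.0 / v for v in variances]               # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures spread of study effects around the pooled mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)             # DerSimonian-Laird estimate
    w_re = [1.0 / (v + tau2) for v in variances]   # weights including heterogeneity
    random_mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return fixed, tau2, random_mean

effects = [0.2, 0.5, 0.8]        # hypothetical study effect sizes
variances = [0.04, 0.09, 0.01]   # hypothetical sampling variances
fixed, tau2, random_mean = meta_analysis(effects, variances)
print(fixed, tau2, random_mean)
```

When tau2 is zero the two pooled means coincide; when studies disagree more than their sampling error allows, the random-effects weights become more even across studies.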

Isotope mixing models
• Estimate the proportions of different food items in a consumer’s diet
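The simplest version of this idea is a two-source, one-isotope linear mixing model, sketched below. The isotope values are hypothetical, and the sketch ignores trophic discrimination and uncertainty in the source values, both of which real mixing models have to handle.

```python
def two_source_mixing(delta_mix, delta_a, delta_b):
    """Proportion of source A in a mixture, from a single isotope ratio.

    Solves delta_mix = p * delta_a + (1 - p) * delta_b for p.
    """
    if delta_a == delta_b:
        raise ValueError("sources must differ in isotope value")
    return (delta_mix - delta_b) / (delta_a - delta_b)

# Hypothetical d13C values: consumer tissue and two candidate food sources.
p_a = two_source_mixing(delta_mix=-20.0, delta_a=-12.0, delta_b=-28.0)
print(p_a, 1.0 - p_a)
```

With one isotope only two sources can be separated; more sources need more isotopes or a statistical model that reports a range of feasible proportions.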

Kriging

Machine learning approaches – regression trees, random forests
• Regression tree: split data into successive groups
• Random forests: lots of regression trees, combined to reduce overfitting
De’ath & Fabricius 2000 Ecology
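The "split data into successive groups" step can be sketched as a search for the single threshold that most reduces the within-group sum of squares; a full regression tree repeats this within each resulting group. The data and function names are hypothetical, and this is only the first split, not a complete tree or forest.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Threshold on x that minimizes the total within-group sum of squares of y."""
    pairs = sorted(zip(x, y))
    best_sse, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        sse = sum_sq(left) + sum_sq(right)
        if sse < best_sse:
            best_sse = sse
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold

x = [1, 2, 3, 10, 11, 12]                # hypothetical predictor
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]       # hypothetical response
threshold = best_split(x, y)
print(threshold)
```

A random forest would grow many such trees on resampled data and average their predictions.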

Occupancy modeling
• Measuring the occupancy and distribution of an organism while accounting for imperfect detection
• With additional assumptions, can be used to estimate abundance
• Uses repeated visits to locations and presence/absence of the species of interest
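A minimal sketch of a single-season occupancy likelihood shows how repeated visits separate occupancy from detection. It assumes a constant occupancy probability psi and a constant per-visit detection probability p; the detection counts, the symbols psi and p, and the grid search are assumptions for illustration.

```python
import math

def occupancy_log_lik(psi, p, detections, n_visits):
    """Log-likelihood for per-site detection counts over n_visits visits."""
    total = 0.0
    for d in detections:
        # Site occupied, and detected on d of the visits.
        lik = psi * math.comb(n_visits, d) * p ** d * (1.0 - p) ** (n_visits - d)
        if d == 0:
            lik += 1.0 - psi  # never detected because the site is unoccupied
        total += math.log(lik)
    return total

detections = [0, 0, 0, 0, 1, 2, 1, 3, 0, 2]  # hypothetical counts at 10 sites
n_visits = 3

grid = [i / 100 for i in range(1, 100)]
psi_hat, p_hat = max(
    ((psi, p) for psi in grid for p in grid),
    key=lambda pair: occupancy_log_lik(pair[0], pair[1], detections, n_visits),
)
naive = sum(1 for d in detections if d > 0) / len(detections)
print(naive, psi_hat, p_hat)
```

The naive estimate counts only sites with at least one detection; the model-based estimate is higher because some sites with no detections are likely occupied but missed.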