The case of the missing model: The modernisation of multivariate analysis in ecology
David Warton, School of Mathematics and Statistics

The case of the missing model: The modernisation of multivariate analysis in ecology
• Introduction
• The Bray-Curtis distance and other 1980’s memorabilia
• Building a model for multivariate abundances in ecology
• A 2020 vision for multivariate analysis in ecology
David Warton – Modernising multivariate analysis in ecology

An exciting time to do statistics!
• Speed: 1 MHz, RAM: 64 kB (image: http://www.allaboutapple.com)
• Speed: 3.8 GHz, RAM: 16 GB (image: Matthew Riegler, CC-BY, Wikimedia Commons)
Computing revolution → Data analysis revolution

New technology → New data
• Rashid et al. (2012) Evolutionary Bioinformatics
• Warton & Shepherd (2010) Annals of Applied Statistics
• Yamaguchi et al. (2011) Cancer Informatics

New technology → New analysis methods
• Francis Hui (2014)
• Dunstan et al. (2013) Journal of Agricultural, Biological and Environmental Statistics

Example: multivariate abundances in ecology
Widely used to study communities.
How do grassland invertebrates respond to climate? Alex Brown (2010)
[Data matrix: abundance counts, 36 sites × 64 Hemiptera species; most entries are zero]

Multivariate abundances: two key properties
• Abundance: different species, different variability (mean-variance relationship)
• Multivariate: lots of species (high dimensional). Big problem!

The case of the missing model: The modernisation of multivariate analysis in ecology
• Introduction
• The Bray-Curtis distance and other 1980’s memorabilia
• Building a model for multivariate abundances in ecology
• A 2020 vision for multivariate analysis in ecology

1980’s solution…
[Figure: the abundance matrix is converted to a matrix of pairwise distances, with entries such as 0.82, 0.69, 1.00, 0.68, 0.92, 0.60 and 0.90]
Widely used, often taught in second year ecology!
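The distance-based approach can be sketched in a few lines. The example below assumes the Bray-Curtis dissimilarity named in the outline, computed between pairs of sites; the function names and the three example rows are illustrative and are not taken from the talk's analysis.

```python
def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity between two vectors of non-negative counts."""
    numerator = sum(abs(a - b) for a, b in zip(site_a, site_b))
    denominator = sum(a + b for a, b in zip(site_a, site_b))
    if denominator == 0:
        # Assumed convention: two empty sites are treated as identical.
        return 0.0
    return numerator / denominator


def distance_matrix(abundances):
    """All pairwise dissimilarities between the rows (sites) of a count matrix."""
    return [[bray_curtis(row_i, row_j) for row_j in abundances]
            for row_i in abundances]


# Illustrative counts only: three sites, seven species.
example = [
    [0, 0, 0, 2, 0, 0, 3],
    [4, 5, 1, 2, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
distances = distance_matrix(example)
```

Any analysis built on `distances` no longer sees the counts themselves, so properties of the data such as the mean-variance relationship are not available to it.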

Some other 1980’s trends…
[Figure: abundance matrix and distance matrix, as in the 1980’s solution]

1980’s solution… no good in the 2010’s
Poor power properties, because the mean-variance relationship is ignored (transformation doesn’t help rarer spp)
• Simulation: a large (10×) effect in one species only.
• Results: effect only detected in high-variance species.
Warton et al. (2012) Methods in Ecology & Evolution

Transformation never works for rare counts
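One way to see the problem is to compute the variance of a transformed count exactly. The sketch below assumes Poisson counts and a log(y + 1) transformation; both are choices made for illustration, not details recovered from the slides, and the function names are hypothetical.

```python
import math


def poisson_pmf_table(mean, upper):
    """Poisson probabilities P(Y = 0), ..., P(Y = upper), built recursively."""
    probs = [math.exp(-mean)]
    for k in range(1, upper + 1):
        probs.append(probs[-1] * mean / k)
    return probs


def variance_of_log1p_count(mean):
    """Variance of log(Y + 1) for Y ~ Poisson(mean), by direct summation."""
    # The probability mass beyond this cut-off is negligible for these means.
    upper = int(mean + 12 * math.sqrt(mean) + 30)
    probs = poisson_pmf_table(mean, upper)
    first = sum(p * math.log(k + 1) for k, p in enumerate(probs))
    second = sum(p * math.log(k + 1) ** 2 for k, p in enumerate(probs))
    return second - first ** 2


# Assumed means, ranging from a rare species to an abundant one.
variances = {mean: variance_of_log1p_count(mean)
             for mean in (0.1, 1.0, 10.0, 100.0)}
```

Under these assumptions the variance of the transformed count still changes with the mean, so the transformation does not remove the mean-variance relationship at low abundance.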

1980’s solution… no good in the 2010’s
Can’t detect special mean structures (because no mean model)
e.g. evidence of a change in composition?
• Simulation: 2-fold change in total abundance, no change in composition
Warton et al. (in prep)

1980’s solution… no good in the 2010’s
Same dataset, same question, different answers? (lacks diagnostic tools)
e.g. coral abundance in the Thousand Islands, Indonesia (Warwick et al. 1990)

What’s missing from this picture?
[Figure: the abundance matrix is converted directly to a distance matrix]
This prevents specification of a model for the data-generating mechanism

The case of the missing model: The modernisation of multivariate analysis in ecology
• Introduction
• The Bray-Curtis distance and other 1980’s memorabilia
• Building a model for multivariate abundances in ecology
• A 2020 vision for multivariate analysis in ecology

Some things that happened since the 1980’s…
• Bayesian hierarchical models (MCMC) (Gelfand & Smith 1990)
• Monte Carlo EM and modern numerical integration (Wei & Tanner 1990)
• PIT residuals as diagnostics for GLMs (Dunn & Smyth 1996)

Residuals are hard to define for count data

PIT residuals ~ U(0, 1) when model is correct

We can use PIT residuals to check model fit
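A minimal sketch of the idea, assuming Poisson counts and the randomised construction in which a residual is drawn uniformly between F(y − 1) and F(y). The simulated data, the fitted means and all function names are hypothetical.

```python
import math
import random


def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Poisson(mean); zero when k is negative."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(total, 1.0)


def pit_residual(count, fitted_mean, rng):
    """Draw uniformly between F(count - 1) and F(count) under the fitted model."""
    lower = poisson_cdf(count - 1, fitted_mean)
    upper = poisson_cdf(count, fitted_mean)
    return min(lower + rng.random() * (upper - lower), upper)


def simulate_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(1)
counts = [simulate_poisson(2.0, rng) for _ in range(5000)]

# Residuals under the correct mean (2.0) and under a wrong mean (6.0).
residuals_correct = [pit_residual(y, 2.0, rng) for y in counts]
residuals_wrong = [pit_residual(y, 6.0, rng) for y in counts]
mean_correct = sum(residuals_correct) / len(residuals_correct)
mean_wrong = sum(residuals_wrong) / len(residuals_wrong)
```

With the correct mean the residuals should look uniform on (0, 1); with the wrong mean they pile up near zero, which is the kind of departure a diagnostic plot would show.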

Some things that happened since the 1980’s…
• Bayesian hierarchical models (MCMC) (Gelfand & Smith 1990)
• Modern numerical integration, Monte Carlo EM (Wei & Tanner 1990)
• PIT residuals to check model fit for counts (Dunn & Smyth 1996)
• Penalised likelihood (e.g. LASSO) (Tibshirani 1996)
• Generalised linear latent variable models (Skrondal & Rabe-Hesketh 2004)
• Graphical modelling (Friedman et al. 2008)
• Variational approximation (Ormerod & Wand 2010)

Things to consider when building a model
• What is the research Question?
  – descriptive, prediction, testing a priori hypothesis, …
• What Properties do my data have?
  – abundance (mean-variance)
  – multivariate (high-dimensional)
Warton et al. (2015) Plant Ecology

Data properties?
• Abundance (mean-variance)
• Multivariate (high-dimensional)
The question?
• Test (env impact)
• Composition
• Explain spp variation
• Predict (new sites or spp)
• Classify
• Ordination
• Study correlations

Data properties? Abundance (mean-variance) → Generalised linear model (GLM)
for abundance yij of species j at site i as a function of site predictors xi
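The model formula on this slide did not survive extraction. A generic form consistent with the description might read as follows; the link function $g$, mean $\mu_{ij}$, species intercept $\beta_{0j}$, slopes $\beta_j$, dispersion $\phi_j$ and variance function $V$ are all assumed notation.

```latex
y_{ij} \sim F(\mu_{ij}, \phi_j), \qquad
g(\mu_{ij}) = \beta_{0j} + x_i^\top \beta_j, \qquad
\operatorname{var}(y_{ij}) = V(\mu_{ij}, \phi_j)
```

For counts, one common choice is the negative binomial, with $V(\mu, \phi) = \mu + \phi\mu^2$, so the mean-variance relationship is part of the model rather than something to transform away.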

Data properties? Multivariate (high-dimensional)
Multivariate frameworks:
• Generalised estimating equations (GEE) (Warton 2011 Biometrics)
• Hierarchical models (Warton et al. 2015 Trends Ecol Evol)
• Copulas (Popovic & Warton in review)
But high-dimensionality is the real issue (64 species → 2016 correlations!)
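The count of 2016 is the number of distinct species pairs. With $p$ species:

```latex
\binom{p}{2} = \frac{p(p-1)}{2}, \qquad
\binom{64}{2} = \frac{64 \times 63}{2} = 2016
```

With only 36 sites, that is far more correlation parameters than observations per species.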

Data properties? Multivariate (high-dimensional)
Options for high dimensionality:
• Ignore it (e.g. for prediction): Brown et al. (2014) Methods Ecol Evol
• Resample rows for valid inference: Wang et al. (2012) Methods Ecol Evol
• Shrink covariance (e.g. toward identity): Warton (2008) JASA
• Covariance modelling: Warton et al. (2015) Trends Ecol Evol, e.g.
  – Factor analysis (latent variables)
  – Graphical modelling

The question? Test (env impact)
Do grassland invertebrate communities change when host plants are transplanted?
[Data matrix: 36 sites × 64 Hemiptera species]

The question? Test (env impact)
Data properties? Abundance (mean-variance) → Generalised linear model (GLM) for abundance yij of species j at site i as a function of site predictors xi

The question? Test (env impact)
Data properties? Multivariate (high-dimensional) → Resample rows for testing
How to resample GLMs? “PIT-trap” for residual resampling (Warton & Wang in review)

PIT-trap: [Figure not recovered] Warton & Wang (in review)
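The details of the PIT-trap did not survive extraction. The sketch below is one reading of "residual resampling" with PIT residuals, assuming Poisson margins with known fitted means. It computes PIT residuals, resamples whole rows (sites) of residuals so that dependence between species is kept, and maps them back to counts through the fitted model. It is an assumption about the procedure, not the method as published, and all names are hypothetical.

```python
import math
import random


def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Poisson(mean); zero when k is negative."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(total, 1.0)


def poisson_quantile(u, mean, max_count=10_000):
    """Smallest count k with P(Y <= k) >= u."""
    k = 0
    term = math.exp(-mean)
    total = term
    while total < u and k < max_count:
        k += 1
        term *= mean / k
        total += term
    return k


def pit_residual(count, fitted_mean, rng):
    """Draw uniformly between F(count - 1) and F(count)."""
    lower = poisson_cdf(count - 1, fitted_mean)
    upper = poisson_cdf(count, fitted_mean)
    return min(lower + rng.random() * (upper - lower), upper)


def pit_trap_resample(counts, fitted_means, rng):
    """Resample rows of PIT residuals, then invert at each site's fitted means."""
    n_sites = len(counts)
    n_species = len(counts[0])
    pit = [[pit_residual(counts[i][j], fitted_means[i][j], rng)
            for j in range(n_species)] for i in range(n_sites)]
    # One draw per site: the whole row of residuals moves together.
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return [[poisson_quantile(pit[picks[i]][j], fitted_means[i][j])
             for j in range(n_species)] for i in range(n_sites)]


# Illustrative data: 4 sites, 3 species, intercept-only fitted means.
counts = [[0, 2, 5], [1, 0, 3], [0, 4, 7], [3, 2, 1]]
column_means = [sum(col) / len(col) for col in zip(*counts)]
fitted_means = [column_means[:] for _ in counts]
resampled = pit_trap_resample(counts, fitted_means, random.Random(7))
```

Because the fitted means in this toy example are the same at every site, each resampled row reproduces one of the observed rows; with site-specific means the same residuals would map to different counts.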

The question?
• Test (env impact): row-resample GLMs

The question? Ordination
Generalised linear model (GLM) with latent variables zi
Warton et al. (Trends Ecol Evol); Ovaskainen et al. (in press) Methods Ecol Evol
The zi:
(1) are ordination axes
(2) model correlation
Computation a bit tricky:
• MCMC (Hui, “boral”; Ovaskainen & Blanchet, “HMSC”)
• variational approximation (Hui et al. in review)
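The slide's formula is not recoverable from the extracted text. A common way to write a GLM with latent variables, offered here as a sketch in assumed notation (loadings $\lambda_j$, standard normal $z_i$), is:

```latex
g(\mu_{ij}) = \beta_{0j} + x_i^\top \beta_j + z_i^\top \lambda_j, \qquad
z_i \sim \mathcal{N}(0, I)
```

In this form the estimated $z_i$ can be plotted as site scores (ordination axes), and the shared $z_i$ induces correlation between species through $\lambda_j^\top \lambda_k$, using far fewer parameters than an unstructured correlation matrix.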

The question? Ordination
[Figure not recovered] Hui et al. (2015) Methods Ecol Evol

The question?
• Test (env impact): row-resample GLMs
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)

The question? Composition
Is there a change in community composition? (relative abundance)
[Data matrix: 36 sites × 64 Hemiptera species]

The question? Composition
Is there a change in community composition? (relative abundance)
→ “row effect”, ai
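A sketch of how a row effect separates total abundance from composition, in assumed notation (the slide names only $a_i$):

```latex
g(\mu_{ij}) = a_i + \beta_{0j} + x_i^\top \beta_j
```

With a log link, $a_i$ multiplies every species' mean at site $i$ by $e^{a_i}$, so a change in total abundance is absorbed by the row effect and the $\beta_j$ describe changes in relative abundance.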

The question?
• Test (env impact): row-resample GLMs
• Composition: Warton et al. (in prep)
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)

The question? Composition
Coral data: effect on total abundance or composition?
[Figures not recovered]

The question? Explain spp variation
Why do different species respond differently?
[Data matrix: 36 sites × 64 Hemiptera species]

The question? Explain spp variation
Why do different species respond differently? → species traits (zj) as predictors
climate × trait interactions (bxz) explain differences in species response, “the fourth corner”
Brown et al. (2014), Warton et al. (2015) Methods in Ecology and Evolution
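One way to write such a model, as a sketch in assumed notation (the slide names only the traits $z_j$ and the interaction coefficients $b_{xz}$):

```latex
g(\mu_{ij}) = \beta_{0j} + x_i^\top \beta_x + (x_i \otimes z_j)^\top b_{xz}
```

Here the response of species $j$ to the environment is $\beta_x$ plus a trait-dependent adjustment, so non-zero entries of $b_{xz}$ indicate which traits are associated with different environmental responses.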

The question?
• Test (env impact), Composition: row-resample GLMs
• Explain spp variation: traits (zj) as predictors, interaction with xi
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)

The question?
• Test (env impact), Composition: row-resample GLMs
• Explain spp variation: traits (zj) as predictors, interaction with xi
• Predict (new sites or spp): LASSO, random effects, neural nets, … (Harris 2015, Methods in Ecology and Evolution)
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)

The question?
• Test (env impact), Composition: row-resample GLMs
• Explain spp variation: traits (zj) as predictors, interaction with xi
• Predict (new sites or spp): LASSO, random effects, neural nets, … (Harris 2015, Methods in Ecology and Evolution)
• Classify (species by response, or sites by composition)
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)

The question?
• Test (env impact), Composition: row-resample GLMs
• Explain spp variation: traits (zj) as predictors, interaction with xi
• Predict (new sites or spp): LASSO, random effects, neural nets, … (Harris 2015, Methods in Ecology and Evolution)
• Classify: finite mixture models (Pledger & Arnold 2014, JCGS; Dunstan et al. 2011, Ecol Model…)
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)

The question? Study correlations
Can use a latent variable model (“biplot”): [equation not recovered]
Warton et al. (2015, Trends Ecol Evol)

The question? Study correlations
Data properties? Abundance (mean-variance): GLM + …; Multivariate (high-dimensional): copula
Or graphical modelling (using a Gaussian copula):
Σ⁻¹ assumed to be sparse, estimate to maximise [equation not recovered], which can be simplified to an iteratively reweighted GLASSO on the Zij
Popovic & Warton (in review)
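The objective function is not recoverable, so the sketch below covers only the Gaussian copula step, under the assumption that the Zij are normal scores obtained from PIT values uij as Zij = Φ⁻¹(uij). The PIT values are made up, the function names are hypothetical, and the sparse estimation (GLASSO) step is not shown.

```python
import math
from statistics import NormalDist


def normal_scores(pit_matrix):
    """Map PIT values in (0, 1) to standard normal scores, entry by entry."""
    standard_normal = NormalDist()
    return [[standard_normal.inv_cdf(u) for u in row] for row in pit_matrix]


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def score_correlations(scores):
    """Correlation matrix between columns (species) of the score matrix."""
    columns = list(zip(*scores))
    return [[correlation(a, b) for b in columns] for a in columns]


# Made-up PIT values: 4 sites (rows) by 2 species (columns).
pit_values = [[0.1, 0.2], [0.4, 0.5], [0.6, 0.7], [0.9, 0.8]]
scores = normal_scores(pit_values)
correlations = score_correlations(scores)
```

A graphical model would then look for a sparse inverse of a matrix like `correlations`, so that most pairs of species are conditionally independent given the rest.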

Data properties?
• Abundance (mean-variance): GLM + …
• Multivariate (high-dimensional): copula
The question?
• Test (env impact), Composition: row-resample GLMs
• Explain spp variation: traits (zj) as predictors, interaction with xi
• Predict (new sites or spp): LASSO, random effects, neural nets, … (Harris 2015, Methods in Ecology and Evolution)
• Classify: finite mixture models
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)
• Study correlations: graphical models (and LVMs)

Data properties?
• Abundance (mean-variance): GLM + …
• Multivariate (high-dimensional): GEE, hierarchical, copula
The question?
• Test (env impact), Composition: row-resample GLMs
• Explain spp variation: traits (zj) as predictors, interaction with xi
• Predict (new sites or spp): LASSO, random effects, neural nets, … (Harris 2015, Methods in Ecology and Evolution)
• Classify: finite mixture models
• Ordination: latent variable models (also see Pledger & Arnold 2014, J Comp Graph Stats)
• Study correlations: graphical models (and LVMs)

Software for multivariate models
• Test (env impact), Composition: resample GLMs (anova.manyglm on mvabund) (manyany on mvabund)
• Explain spp variation: traits (zj) as predictors, interaction with xi (traitglm on mvabund)
• Predict (new sites or spp): LASSO (glm1path on mvabund, or glmnet), random effects (lme4 etc), neural nets (mistnet)
• Classify: finite mixture models (speciesmix, RCPmod)
• Ordination: latent variable models (boral, HMSC)
• Study correlations: graphical models (and LVMs)

The case of the missing model: The modernisation of multivariate analysis in ecology
• Introduction
• The Bray-Curtis distance and other 1980’s memorabilia
• Building a model for multivariate abundances in ecology
• A 2020 vision for multivariate analysis in ecology

A 2020 vision for multivariate analysis
Less of this: [image not recovered]
More of this: [image not recovered]

To-do list for 2020:
• Model-based inference with many parameters
• Classification + high-dimensional response
• Hierarchical classification using models
• Copula inference for high-dimensional counts
• Structured correlation across sites (spatial etc)
• Spread the word to ecologists…

Spreading the word…
• Free well-documented software (mvabund etc)
• YouTube video
• Blog posts
• Textbook (to write in 2016)
• Workshops
• Eco-Stats conferences

Acknowledgements
• UNSW Eco-Stats Group and Collaborators
• Australian Research Council
  – Future Fellowship scheme (FT120100501)
  – Discovery scheme (DP0985886, DP120100882, DP150100823)
• UNSW
  – School of Mathematics and Statistics
  – Evolution and Ecology Research Centre
• Ecologists who shared their data!