Maxent interface

Maximum Entropy (Maxent)
• Deterministic
• Precise mathematical definition
• Continuous and categorical environmental data
• Continuous output

Maxent can be downloaded at: http://www.cs.princeton.edu/~schapire/maxent/
Note: when downloading Maxent, make sure that maxent.jar is saved as is, and not as a .zip file.

Input data:
1. Samples: a .csv file with 3 fields (species label, longitude, latitude) and a header as the first line. A single file can contain multiple species.
2. Environmental layers*: ASCII files (ESRI or DIVA-GIS formats) grouped in a folder. No mask file is needed.
* It is also possible to use SWD format (sample-with-data): a .csv file containing the environmental variables' values for each occurrence point.
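As a minimal sketch of the samples file layout described above, the following Python writes and re-reads a three-field .csv with a header line. The header names (species, longitude, latitude), the species labels, and the coordinates are illustrative assumptions, not values from the slides.

```python
import csv
import io

# Hypothetical occurrence records: (species label, longitude, latitude).
records = [
    ("species_a", -65.40, -10.38),
    ("species_a", -63.67, -17.45),
    ("species_b", -74.30, 4.60),
]


def write_samples(rows, handle):
    """Write occurrence rows, with a header as the first line."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"])  # assumed header names
    writer.writerows(rows)


def read_samples(handle):
    """Read the file back, checking for 3 fields and numeric coordinates."""
    reader = csv.reader(handle)
    header = next(reader)
    if len(header) != 3:
        raise ValueError("expected 3 header fields")
    samples = []
    for row in reader:
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}")
        species, lon, lat = row
        samples.append((species, float(lon), float(lat)))
    return samples


# Round trip through an in-memory file; several species share one file.
buffer = io.StringIO()
write_samples(records, buffer)
buffer.seek(0)
samples = read_samples(buffer)
```

Note that longitude comes before latitude, matching the field order given on the slide.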

Classes of features:
1. Linear* → variable itself
2. Quadratic → square of variable
3. Product → product of two variables
4. Threshold → binary transformation (0, 1) of a continuous variable using a threshold
5. Hinge → like a linear feature, but constant below a threshold
* Categorical data: Binary feature → variable itself
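The feature classes listed above can be sketched as plain functions of a variable's value. This is an illustration of the definitions on the slide, not Maxent's own code: the function names are hypothetical, whether a value exactly at the threshold maps to 1 is an assumption, and the hinge is left unscaled.

```python
def linear(x):
    """Linear feature: the variable itself."""
    return x


def quadratic(x):
    """Quadratic feature: the square of the variable."""
    return x * x


def product(x, y):
    """Product feature: the product of two variables."""
    return x * y


def threshold(x, t):
    """Threshold feature: 0 below t, 1 at or above t (boundary handling assumed)."""
    return 1.0 if x >= t else 0.0


def hinge(x, t):
    """Hinge feature: constant (0) below t, then linear in x (unscaled sketch)."""
    return max(0.0, x - t)


# Hypothetical value of one environmental variable at a single grid cell.
temperature = 7.0
features = {
    "linear": linear(temperature),
    "quadratic": quadratic(temperature),
    "threshold_at_5": threshold(temperature, 5.0),
    "hinge_at_5": hinge(temperature, 5.0),
}
```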

What are the features used for?
• To constrain the maximum-entropy (most spread out) probability distribution, which gives the species' predicted distribution (the output prediction)
Constraints:
• Linear* → mean
• Quadratic → variance
• Product → covariance
• Threshold → fit an arbitrary response
• Hinge → like linear (but constant below a threshold)
* Categorical data: Binary feature → proportion
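To see why linear, quadratic, and product features correspond to mean, variance, and covariance, the sketch below averages each feature over a handful of presence points. The variable names and values are hypothetical, and the sketch ignores regularization, which lets the fitted distribution match these averages only approximately.

```python
from statistics import fmean

# Hypothetical environmental values at four presence points.
temperature = [12.0, 15.0, 11.0, 14.0]
rainfall = [800.0, 950.0, 700.0, 900.0]

# Average of each feature over the presence points.
mean_t = fmean(temperature)                                     # linear feature
mean_r = fmean(rainfall)                                        # linear feature
mean_t2 = fmean([t * t for t in temperature])                   # quadratic feature
mean_tr = fmean([t * r for t, r in zip(temperature, rainfall)])  # product feature

# Fixing the linear and quadratic averages together fixes the variance;
# fixing the product average as well fixes the covariance.
variance_t = mean_t2 - mean_t ** 2
covariance_tr = mean_tr - mean_t * mean_r
```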

The auto features setting chooses which feature classes to use based on the number of presence records for the species:
• Linear features if <10 presence points are available
• Linear + quadratic if 10-14 presence points are available
• Linear + quadratic + hinge if 15-79 presence points are available
• All features if 80 or more presence points are available
To override this default, use the command line flags described in the help menu. The beta regularization value then has to be adjusted as well.
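The selection rule above can be written as a small lookup. This is a sketch of the rule as stated on the slide, not Maxent's internal code; the function name is hypothetical, and treating exactly 80 records as belonging to the top tier is an assumption.

```python
def auto_feature_classes(n_presence):
    """Return the feature classes used for a given number of presence records."""
    if n_presence < 0:
        raise ValueError("number of presence records cannot be negative")
    classes = ["linear"]                      # always used
    if n_presence >= 10:
        classes.append("quadratic")           # 10 or more records
    if n_presence >= 15:
        classes.append("hinge")               # 15 or more records
    if n_presence >= 80:
        classes += ["product", "threshold"]   # all features (80 assumed inclusive)
    return classes


# Feature classes at each tier boundary.
tiers = {n: auto_feature_classes(n) for n in (9, 10, 14, 15, 79, 80)}
```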

Outputs:
• Species.Name.html contains response curves, pictures of the predictions and, if selected, a jackknife test of variable importance
• Predictions can be saved as cumulative, logistic, or raw
• Output file types available: ASCII grid and DIVA-GIS grid (.mxe is not a grid output)
• The model can be projected onto different climatic datasets (a different geographic region or a different period of time)
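An ASCII grid output can be inspected outside a GIS. The sketch below reads a grid in the ESRI ASCII layout, assuming the common six-line header (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value), and keeps cell values as floats. The grid contents are made up for illustration.

```python
import io

# Hypothetical 2 x 3 prediction grid in ESRI ASCII layout.
example = """ncols 3
nrows 2
xllcorner -70.0
yllcorner -20.0
cellsize 0.5
NODATA_value -9999
0.0012 0.0375 -9999
0.2500 0.9100 0.4800
"""


def read_ascii_grid(handle):
    """Parse a six-line header, then rows of floating point cell values."""
    header = {}
    for _ in range(6):
        key, value = handle.readline().split()
        header[key.lower()] = float(value)
    nodata = header["nodata_value"]
    rows = []
    for line in handle:
        if line.strip():
            # Keep values as floats; map the no-data marker to None.
            rows.append([None if float(v) == nodata else float(v)
                         for v in line.split()])
    return header, rows


header, grid = read_ascii_grid(io.StringIO(example))
```

Reading the cells as integers instead would collapse most of these values to 0, which is why the grids need to be handled as floating point.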

Examples of response curves (how each environmental variable affects the Maxent model)
Picture of Maxent prediction

• Cumulative output: used to be the default type. Each value is the sum of the probabilities of all cells with a value no higher than that of the grid cell, times 100.
• Logistic output: new default type. Non-linear scale-up of the raw values.
• Raw output: raw values (very small). The sum over all cells used for training is 1.
General notes:
• Thresholding (binning) can change the look of the map significantly
• Take care in interpreting the thresholds (e.g. a cumulative value of 80% doesn't mean that the probability of a species' occurrence is 80%)
• Grids hold floating point values, so they should be imported as floating point grids in GIS software in order to preserve the fine details in classifying cells as suitable
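The relationship between raw and cumulative values can be sketched directly from the description above. The raw values are hypothetical, the function name is illustrative, and counting the cell itself in its own sum (so that the best cell scores 100) is an assumption.

```python
def raw_to_cumulative(raw):
    """Convert raw values to cumulative values on a 0-100 scale."""
    total = sum(raw)
    probs = [value / total for value in raw]   # raw output sums to 1
    # For each cell, add up the probabilities of every cell whose value is
    # no higher than its own, then multiply by 100.
    return [100.0 * sum(p for p in probs if p <= q) for q in probs]


# Hypothetical raw values for a four-cell study area.
raw = [0.1, 0.2, 0.3, 0.4]
cumulative = raw_to_cumulative(raw)   # approximately [10, 30, 60, 100]
```

The last cell has a cumulative value of about 100 although its raw probability is only 0.4, which illustrates why a cumulative value should not be read as a probability of occurrence.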

Lastly... (more)
Settings button: opens a new window with more settings
• Random test/train partition of occurrence data for each run; same for background data
• % of occurrence data randomly set aside as test points (default is 0)
• Modifies the regularization value (a higher value gives a more spread-out distribution); works only if the auto features option is off
• Occurrence data from a file (rather than a random sample of the training data) is used to test AUC, omission, etc.
• Sampling is assumed to be biased according to the sampling distribution
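The random test percentage setting can be pictured as a seeded shuffle followed by a split. This is a sketch of the idea only, not how Maxent implements it; the function name, the seed handling, and the coordinates are hypothetical.

```python
import random


def split_occurrences(points, test_percentage=0, seed=None):
    """Randomly set aside a percentage of occurrence points as test points."""
    if not 0 <= test_percentage <= 100:
        raise ValueError("test_percentage must be between 0 and 100")
    rng = random.Random(seed)     # a fixed seed gives a repeatable partition
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percentage / 100)
    return shuffled[n_test:], shuffled[:n_test]   # (training, test)


# Hypothetical occurrence points as (longitude, latitude).
points = [(-65.4, -10.4), (-63.7, -17.5), (-64.1, -12.0), (-66.2, -11.3),
          (-62.9, -15.8), (-65.0, -13.1), (-63.3, -16.6), (-64.8, -14.2)]

train, test = split_occurrences(points, test_percentage=25, seed=1)
```

With the default of 0, every point is used for training and the test set is empty.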

To summarize in a few words...
To run Maxent:
• Occurrence data in a .csv file
• Training environmental dataset (no mask needed) containing ASCII grids
• Optional: environmental dataset for projecting models
Maxent outputs (predictions):
• ASCII grids, floating point (not integers)
• Can be raw, logistic, or cumulative predictions
• Additional files, including an .html summary file