GSMNP:Notebook/Maxent/Creating a Range Map: Difference between revisions

From OpenWetWare
No edit summary
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
==Motivation and Background==
A desirable model output not included in the standard HTML output is a binary range map for the species. This map has only two categories, habitat and non-habitat. The user must choose the threshold that delineates habitat based on the binomial test results in the Analysis of omission and commission section (5.1). As an example, we chose the "Balanced training omission, predicted area and threshold value" threshold for the Hooded Warbler model, which gave a logistic threshold of 0.139. Our task is to create a binary map from the original logistic projection map where values greater than 0.139 are true (1) and all other values are false (0). The procedure follows:
===Motivation===
*Import the output logistic ASCII grid into ArcMap.
*Modeling species distributions in parks is valuable to the scientific community, and it is critical to the NPS for stewardship of park resources.
ArcToolbox<math>\Rightarrow</math>Conversion Tools<math>\Rightarrow</math>To Raster<math>\Rightarrow</math>ASCII to Raster
**Only having species occurrences as points is of limited usefulness to park managers, since they cannot infer what is between the points.
**Input ASCII raster file: path to output folder/species name_ASCII.asc – Output raster: path to output folder/pred_species abbrev
*Knowing with some probability where species are in large natural areas is essential to taking actions to protect them, including monitoring, stewardship of rare species, reacting to a species that is suddenly found to be at-risk, and modeling future scenarios that place species in jeopardy.
**Output data type (optional): FLOAT
*Currently there are many threats to natural systems and native species at Great Smoky Mountains National Park.
**Set Spatial Analyst Workspace to output folder Spatial Analyst<math>\Rightarrow</math>Options...<math>\Rightarrow</math>General
**The biological complexity, interacting stressors and limited agency resources at the Smokies make it imperative to know where to take the most effective actions.
**Working Directory: path to output folder
===Background===
*Compute the logical comparisons
*Maxent is a method for generating predictive distributions given a set of occurrence data and known environmental variables at those locations.  
*Spatial Analyst<math>\Rightarrow</math>Raster Calculator
**This predicted distribution is constrained such that it is close to the empirical average of environmental variables at the occurrence locations.  
**hab_species abbrev = pred_species abbrev > threshold
**Among all possible models that fulfill these constraints, Maxent selects the one with maximum entropy, that is, the one that assumes nothing beyond the constraints
**OK
**(i.e. it avoids over-fitting by choosing the most unconstrained model possible given the constraints set by the environmental variables at presence locations).  
After following the above recipe, the resulting grid will be in Arc binary grid format and have a value of 1 for habitat pixels, 0 for non-habitat pixels, and NODATA for pixels not inside the analysis mask. [[Image:Gsm-maxent-fig3.jpg|thumb|alt=Binary Range Map|Figure 3, Binary Range Map]]
*Maxent has been used extensively in physics and economics applications.
**It is just one among many options for generating predicted species distributions from environmental variables at species presence sites ([http://www.nhm.ku.edu/desktopgarp/ GARP], [http://data.princeton.edu/R/glms.html GLM], [http://cran.r-project.org/web/packages/gam/index.html GAM]), but it has several advantages. Taken from [http://www.cs.princeton.edu/~schapire/papers/ecolmod.pdf Phillips et al. (2006)], maxent:
#requires only presence data, not presence/absence data
#can use both continuous and categorical variables
#the optimization is efficient,
#has a concise probabilistic definition,
#it avoids over-fitting through regularization
#can address sampling bias formally,
#output is continuous (not just yes/no), and
#is generative rather than discriminative which makes it better for small sample sizes.
===Strengths & Weaknesses===
*There is some criticism against using Maxent for species distribution modelling. Specifically, Maxent considers only presence data instead of both presence and absence data. As a result, capture probabilities are not explicitly included in the model. This is nearly anathema in the field of Wildlife Biology where predictions based on mark-recapture studies have been the norm for years.
*There are at least 3 practical answers to this criticism:
#The first is to be explicit about the prediction probabilities that maxent produces.
##Rather than modelling the probability of an occurrence, maxent models the probability that an occurrence at a given location is different from a randomly selected location.
##The difference from true occurrence prediction is subtle, and in many cases probably does not matter.
#Second, outside of animal studies, presence data, not presence/absence data or multiple observer data, is the norm.
##We know of no published data on plants where multiple observers were used to assess the observation probability of a species. Longitudinal studies are common, but they are not used in the same way that mark-recapture studies are used with animals.
#Finally, because of the advantages outlined above, maxent is the easiest model to implement for the large number of species that must be modeled in the GRSM.
##Developing an in-house model with all the advantages of maxent that includes both presence/absence data would be extremely costly.  
##It is likely that support for presence/absence data will be included in future versions of maxent, at which point the prediction surfaces can easily be recalculated without the cost of developing an in-house solution.

Latest revision as of 20:41, 6 August 2014

A desirable model output not included in the standard HTML output is a binary range map for the species. This map has only two categories, habitat and non-habitat. The user must choose the threshold that delineates habitat based on the binomial test results in the Analysis of omission and commission section (5.1). As an example, we chose the "Balanced training omission, predicted area and threshold value" threshold for the Hooded Warbler model, which gave a logistic threshold of 0.139. Our task is to create a binary map from the original logistic projection map where values greater than 0.139 are true (1) and all other values are false (0). The procedure follows:

  • Import the output logistic ASCII grid into ArcMap.

ArcToolbox ⇒ Conversion Tools ⇒ To Raster ⇒ ASCII to Raster

    • Input ASCII raster file: path to output folder/species name_ASCII.asc
    • Output raster: path to output folder/pred_species abbrev
    • Output data type (optional): FLOAT
    • Set Spatial Analyst Workspace to output folder: Spatial Analyst ⇒ Options... ⇒ General
    • Working Directory: path to output folder
  • Compute the logical comparison
  • Spatial Analyst ⇒ Raster Calculator
    • hab_species abbrev = pred_species abbrev > threshold
    • OK

After following the above recipe, the resulting grid will be in Arc binary grid format and have a value of 1 for habitat pixels, 0 for non-habitat pixels, and NODATA for pixels not inside the analysis mask.

Figure 3: Binary Range Map
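The same thresholding can also be sketched outside ArcMap. The following is a minimal illustration in plain Python, not part of the original ArcMap recipe: it assumes the logistic output is a standard ESRI ASCII grid with a NODATA_value header line, and the function name threshold_ascii_grid and the small example grid are hypothetical. It applies the same comparison as the Raster Calculator step (value > threshold) and leaves NODATA cells unchanged.

```python
def threshold_ascii_grid(text, threshold):
    """Return a binary ESRI ASCII grid as text.

    Cells with value > threshold become 1, all other data cells become 0,
    and cells equal to the header's NODATA_value are left unchanged.
    """
    header_keys = {"ncols", "nrows", "xllcorner", "yllcorner",
                   "xllcenter", "yllcenter", "cellsize", "nodata_value"}
    header_lines = []
    data_rows = []
    nodata = None  # stays None if the header has no NODATA_value line

    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0].lower() in header_keys:
            header_lines.append(line.strip())
            if parts[0].lower() == "nodata_value":
                nodata = float(parts[1])
        else:
            data_rows.append(parts)

    out_rows = []
    for parts in data_rows:
        row = []
        for token in parts:
            value = float(token)
            if nodata is not None and value == nodata:
                row.append(token)  # keep NODATA cells as they are
            else:
                row.append("1" if value > threshold else "0")
        out_rows.append(" ".join(row))

    return "\n".join(header_lines + out_rows) + "\n"


# Hypothetical 3 x 2 grid, thresholded at the Hooded Warbler value of 0.139.
example = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 30
NODATA_value -9999
0.05 0.139 0.62
-9999 0.14 0.01
"""
print(threshold_ascii_grid(example, 0.139))
```

Note that a cell exactly equal to the threshold is classed as non-habitat, matching the strict "greater than" comparison used in the Raster Calculator expression.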