GSMNP:Notebook/MaxEnt

From OpenWetWare

Revision as of 22:51, 7 May 2014

Project Description

  • A project notebook for the use of maximum entropy species distribution modelling within Great Smoky Mountains National Park. It is based in part on a document by R. Todd Jobe [1] and Benjamin Zank, "Modelling species distributions for the Great Smoky Mountains National Park using Maxent" (file: Jobe 2008 MaxEnt.pdf).

Introduction

The goal of this document is to help managers and researchers at Great Smoky Mountains National Park (GRSM) model species distributions using maximum entropy (maxent) methods. It provides a reference for the maxent software (Phillips and Dudik 2008), a widely used standard tool for modelling species distributions. Below is a brief background on maxent and the motivation for its use in GRSM. The following sections cover getting the software, preparing data for the model, running the model, and creating a binary range map from the model output.

This document is designed to supplement, not replace, the help files contained in the maxent software. Users are strongly encouraged to also read Phillips et al. (2006), Phillips and Dudik (2008), and tutorial.doc, which is packaged with the maxent software. This document is written for users working on a Windows system with an installation of ArcGIS (ESRI 2006), although maxent runs equally well on other operating systems (2) and with other GIS software.

Motivation and Background

Modeling species distributions in parks is valuable to the scientific community, and for the NPS's stewardship of park resources it is critical. Species occurrences recorded only as points are of limited use to park managers, because they cannot infer what lies between the points.

Knowing, with some probability, where species occur in large natural areas is essential to taking actions to protect them, including monitoring, stewardship of rare species, responding to a species that is suddenly found to be at risk, and modeling future scenarios that place species in jeopardy.

Great Smoky Mountains National Park currently faces many threats to its natural systems and native species. The biological complexity, interacting stressors, and limited agency resources at the Smokies make it imperative to know where actions will be most effective.

Maxent is a method for generating predicted distributions from a set of occurrence data and the known environmental variables at those locations. The predicted distribution is constrained so that the expected value of each environmental variable under it is close to that variable's empirical average at the occurrence locations. Among all distributions that satisfy these constraints, maxent selects the one with maximum entropy, that is, the most spread-out (closest to uniform) distribution, which imposes nothing beyond the constraints themselves. This avoids over-fitting by choosing the least constrained model consistent with the environmental conditions at presence locations.
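The constraint-and-entropy idea above can be sketched in a few lines. This is a minimal, unregularized illustration, not the maxent software's implementation; it assumes each background grid cell is described by environmental variables already scaled to [0, 1], and `fit_maxent` and `gibbs` are hypothetical names. Maximizing entropy subject to matching feature averages yields a Gibbs distribution p(x) ∝ exp(λ·f(x)), and fitting λ by gradient ascent on the mean log-likelihood of the presence points drives the model's expected feature values toward their empirical averages at the presence locations.

```python
import math


def gibbs(background, lam):
    """Probability of each background cell under p(x) ∝ exp(lam · f(x))."""
    scores = [sum(l * x for l, x in zip(lam, row)) for row in background]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def fit_maxent(background, presence, n_iter=5000, lr=1.0):
    """Fit weights so model feature averages match presence averages.

    background: one list of feature values per grid cell (hypothetical layout)
    presence:   feature values at the cells where the species was recorded
    """
    n_feat = len(background[0])
    # Empirical average of each environmental variable at presence locations.
    target = [sum(row[j] for row in presence) / len(presence) for j in range(n_feat)]
    lam = [0.0] * n_feat
    for _ in range(n_iter):
        p = gibbs(background, lam)
        # Expected value of each variable under the current model.
        expected = [sum(pi * row[j] for pi, row in zip(p, background))
                    for j in range(n_feat)]
        # Gradient of the mean log-likelihood of the presences: target - expected.
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam, gibbs(background, lam)
```

For example, with five cells whose scaled elevation is 0, 0.25, 0.5, 0.75, and 1 and presences at the two highest cells, the fitted distribution puts increasing probability on higher cells and its mean elevation converges to the presence average of 0.875. The maxent software itself differs in important ways: it applies L1 regularization (so model averages need only be close to, not equal to, the empirical averages), derives additional feature types such as quadratic, product, threshold, and hinge features from each variable, and uses a sequential-update optimization algorithm rather than plain gradient ascent.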

Maxent has been used extensively in physics and economics applications. It is one of many methods for predicting species distributions from the environmental variables at species presence sites (others include GARP, GLMs, and GAMs), but it has several advantages. According to Phillips et al. (2006), maxent:

  1. requires only presence data, not presence/absence data,
  2. can use both continuous and categorical variables,
  3. has an efficient optimization,
  4. has a concise probabilistic definition,
  5. avoids over-fitting through regularization,
  6. can address sampling bias formally,
  7. produces continuous output (not just yes/no), and
  8. is generative rather than discriminative, which makes it better for small sample sizes.

There is some criticism of using Maxent for species distribution modelling. Specifically, Maxent considers only presence data rather than both presence and absence data. As a result, capture probabilities are not explicitly included in the model. This is nearly anathema in wildlife biology, where predictions based on mark-recapture studies have been the norm for years.

There are at least three practical answers to this criticism:

  1. The first is to be explicit about the prediction probabilities that maxent produces. Rather than modelling the probability of an occurrence, maxent models the probability that an occurrence at a given location is different from a randomly selected location. The difference from true occurrence prediction is subtle, and in many cases probably does not matter.
  2. Second, outside of animal studies, presence data, not presence/absence data or multiple observer data, is the norm. We know of no published data on plants where multiple observers were used to assess the observation probability of a species. Longitudinal studies are common, but they are not used in the same way that mark-recapture studies are used with animals.
  3. Finally, because of the advantages outlined above, maxent is the easiest model to implement for the large number of species that must be modeled in GRSM. Developing an in-house model that has all the advantages of maxent and also uses presence/absence data would be extremely costly. It is likely that support for presence/absence data will be included in future versions of maxent, at which point the prediction surfaces can easily be recalculated without the cost of developing an in-house solution.

Getting the Software

Many software packages can fit models using maximum entropy methods. This document, however, focuses on the package most commonly used by biologists, Maxent. The software is available for download at http://www.cs.princeton.edu/~schapire/maxent/. The program is written in Java, which makes it cross-platform: the code runs equally well on Unix, Macintosh, and Windows operating systems. Most computers come with the Java Runtime Environment pre-installed, or users will have installed it at some point while using the Internet.

To see if Java is installed (Figure: Fig1.tiff):

  1. Open a terminal:
    • Windows: Start --> Run --> cmd
    • Mac OS: Go --> Applications --> Utilities --> Terminal
  2. Type: java -version

If the above command returns an error, then Java is not properly installed. It can be downloaded from http://java.com.
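When several machines need checking, the same test can be scripted. The sketch below is a minimal Python illustration, assuming Python 3.7 or later; `java_version_text` and `major_version` are hypothetical helper names, not part of maxent. It relies on the fact that `java -version` writes its report to standard error rather than standard output.

```python
import re
import shutil
import subprocess


def java_version_text():
    """Return the output of `java -version`, or None if Java is not found."""
    if shutil.which("java") is None:
        return None  # no `java` executable on the system path
    # `java -version` prints its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stderr


def major_version(text):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', text)
    if match is None:
        return None
    nums = re.findall(r"\d+", match.group(1))
    if not nums:
        return None
    # Java 8 and earlier report "1.x"; Java 9 and later put the major number first.
    if nums[0] == "1" and len(nums) > 1:
        return int(nums[1])
    return int(nums[0])


if __name__ == "__main__":
    text = java_version_text()
    if text is None:
        print("Java not found; download it from http://java.com")
    else:
        print("Java major version:", major_version(text))
```

Searching the whole output, rather than only its first line, keeps the parser working when the JVM prints an extra notice (for example about picked-up environment options) before the version line.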

The main files to consider once maxent has been downloaded from the website are maxent.jar and maxent.bat. maxent.jar is the Java executable; it can be called from the command line with the java command: java -jar maxent.jar (4).
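The same command can be launched from a script, which is convenient when maxent is started repeatedly for many species. This is a minimal sketch with hypothetical helper names (`maxent_command`, `launch_maxent`); `-Xmx` is a standard JVM option that caps the memory Java may use, but the 512m value is our assumption rather than a documented maxent requirement.

```python
import subprocess
from pathlib import Path


def maxent_command(jar="maxent.jar", max_heap="512m"):
    """Build the argument list equivalent to `java -jar maxent.jar`."""
    # -Xmx sets the maximum JVM heap; the 512m default is an assumption,
    # so raise it if maxent runs out of memory on large environmental layers.
    return ["java", f"-Xmx{max_heap}", "-jar", str(jar)]


def launch_maxent(workspace):
    """Start the maxent interface with the workspace as working directory."""
    workspace = Path(workspace)
    if not (workspace / "maxent.jar").exists():
        raise FileNotFoundError(f"maxent.jar not found in {workspace}")
    # The jar is referenced by name because the process runs inside the workspace.
    return subprocess.run(maxent_command(), cwd=workspace)
```

For example, `launch_maxent("C:/grsm/analysis1")` (a hypothetical path) would open the maxent interface from that workspace once maxent.jar has been copied there.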

Windows: maxent.bat is a Windows batch file that can be double-clicked from the Windows interface to start the maxent.jar executable. Both files are small, so when performing an analysis it makes sense to copy them into the workspace created to hold the data and outputs (3.2).
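The copy step can also be scripted so that each analysis gets its own workspace. This sketch assumes the downloaded files sit together in one folder; the folder names and the `prepare_workspace` helper are hypothetical.

```python
import shutil
from pathlib import Path

# The two launcher files named in the text; both are small enough to copy.
MAXENT_FILES = ("maxent.jar", "maxent.bat")


def prepare_workspace(download_dir, workspace):
    """Copy the maxent launcher files into a per-analysis workspace folder."""
    src = Path(download_dir)
    dst = Path(workspace)
    dst.mkdir(parents=True, exist_ok=True)  # create the workspace if needed
    copied = []
    for name in MAXENT_FILES:
        target = dst / name
        shutil.copy2(src / name, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

Keeping a private copy of the launcher in each workspace means every analysis folder is self-contained: the data, outputs, and the program that produced them stay together.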

The maxent software contains a considerable amount of help documentation available from the user interface. There is also an excellent tutorial provided at the website where maxent is downloaded. It is strongly recommended that users go through the tutorial prior to using maxent on real data.

Unix (Figure: Fig2.tiff)

Preparing the Data

Preliminary Decisions

Choose Species

Choose Environmental Variables

Choose a Projection

Prepare a Workspace

Prepare the Environmental Layers

Prepare the Species Occurrence Data

Last Steps

Running the Model

Pertinent Links

  • Information concerning maximum entropy species distribution modeling in Great Smoky Mountains National Park.
  • Collection of works concerning Maximum Entropy models:
  • Phillips, S.J., Anderson, R.P., Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.
  • Phillips, S.J., Dudík, M., Schapire, R.E. (2004) A maximum entropy approach to species distribution modeling. Proceedings of the Twenty-First International Conference on Machine Learning, 655-662.
  • Phillips, S.J., Dudík, M. (2008) Modelling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31, 161-175.
  • Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S. (2009) Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications, 19, 181-197.
  • http://www.nics.tennessee.edu/faq

Notes

  • This project is currently under development as part of Tanner Jessel's Spring 2014 practicum work.
  • There is some related information at http://mountainsol.wordpress.com