- A project notebook for use of Maximum Entropy Species Distribution Modelling within Great Smoky Mountains National Park. Based in part on a document produced by R. Todd Jobe and Benjamin Zank, "Modelling species distributions for the Great Smoky Mountains National Park using Maxent" (file: Jobe 2008 MaxEnt.pdf).
The goal of this document is to provide help for managers and researchers at Great Smoky Mountains National Park (GRSM) in modeling species distributions using maximum entropy (maxent) methods. It provides a reference for the maxent software (Phillips and Dudik 2008): the standard for modeling species distributions.
This document is designed to supplement, not replace, the help files contained in the maxent software. It is strongly recommended that users also read Phillips et al. (2006), Phillips and Dudik (2008), and tutorial.doc, which is packaged with the maxent software. This document is structured for users working on a Windows system that has an installation of ArcGIS (ESRI 2006), though maxent can be run equally well on other operating systems (2) and with other GIS software.
A brief Motivation and Background section (GSMNP:Notebook/Maxent/Motivation_and_Background) discusses the rationale for using maxent in GRSM.
Getting the Software
There are many different software packages that can optimize data using maximum entropy methods. In this document, however, we focus on the most common software package for biologists (Maxent).
The software is available for download at http://www.cs.princeton.edu/~schapire/maxent/.
- The program is written in Java. This makes it cross-platform, which means that the code runs equally well on Unix, Macintosh and Windows operating systems.
- Many computer systems come with the Java run-time environment pre-installed, or it is downloaded during the course of Internet use.
- Java is not always pre-installed on a personal computer, however, so it is worth checking.
To see if Java is installed:
- Open a command line interface. This varies depending on your operating system.
- Mac OS
- Go --> Applications --> Utilities --> Terminal
- type: java -version
- If the above command returns an error, then Java is not properly installed. It can be downloaded from http://java.com.
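The same check can be scripted. The following is a minimal Python sketch, not part of the maxent distribution; the function name java_version is hypothetical. It looks for a java executable on the PATH and runs java -version, treating a missing executable or a non-zero exit code as "not installed".

```python
import shutil
import subprocess


def java_version():
    """Return the text printed by `java -version`, or None if Java is not usable."""
    java = shutil.which("java")  # search the PATH for a java executable
    if java is None:
        return None
    try:
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # Java runtimes commonly print the version banner to stderr rather than stdout.
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    version = java_version()
    print(version if version else "Java not found; it can be downloaded from http://java.com")
```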
Installing / Running Maxent Java Application (Graphical User Interface)
The main files to consider once the maxent files are downloaded from the website are maxent.jar and maxent.bat.
- maxent.jar is the Java executable. It can be called from the command line using the java command: java -jar maxent.jar, but in Windows, it is simpler to launch the .jar file by clicking the .bat file (discussed immediately below).
- The maxent.bat file is a Windows batch file.
- Double-clicking it from the Windows interface starts the maxent.jar executable.
- Both the .jar and .bat files are small.
- When performing an analysis, it makes sense to just copy these two files into a single workspace (typically a newly created folder/directory for the given project) to hold both the input data and outputs (3.2).
- The maxent software contains a considerable amount of help documentation available from the user interface.
- There is also an excellent tutorial provided at the website where maxent is downloaded.
- It is strongly recommended that users go through the tutorial prior to using maxent on real data.
Preparing the Data
Maxent requires precise formatting of the species occurrence data and the environmental data. Further, the spatial attributes of all data must be identical. This section is meant to guide users through the preliminary decisions about species and environments that must be made, and then help users convert their data into formats appropriate for analysis in maxent.
Some decisions made up front will alter how every other part of the analysis proceeds. Species and environmental layers must be selected that conform to certain geographic requirements, and the spatial attributes of all these layers must be defined.
Maxent can build models for multiple species at one time. The species to be modelled must have geolocated occurrences. It is advantageous if the precision of these geolocations is also known, because environmental maps can be adjusted to match the precision of the geolocations. If any temporally sensitive environmental data are included (e.g. temperature for a particular year, or fire history), then the species observation dates must coincide with dates for which the environmental data are valid.
Choose Environmental Variables
The predictions of any model will be improved if the selected environmental layers reflect the ecology of the organism. These associations may not be known beforehand for many species, however. Including every remotely sensed variable available is another option, and maxent provides estimates of the importance of each environmental variable included in the model (5.2). Maxent also provides a tuning parameter that adjusts the degree of over-fitting (4). So, the kitchen-sink approach to variable inclusion works better in maxent than in other approaches. At a bare minimum, species respond broadly to gradients of temperature and moisture. Three variables that approximate these gradients in GRSM are elevation, topographic convergence index, and hillshade (Jobe 2006).
Choose a Projection
You must choose a projection that matches precisely among all data types. This includes having the same datum among all data types. Data layers for GRSM are typically projected as Universal Transverse Mercator (UTM) zone 17, and have either the NAD27 or WGS84 datum. WGS84 is preferred, but the choice of datum and projection does not matter as long as the occurrence data and all the environmental layers are exactly the same.
Projecting digital elevation models (DEMs) is not recommended if any other environmental layer is derived from them (e.g. slope, hillshade, hydrological models). The resampling required for projection introduces striations in the derived layers. It is best practice to project all other layers to match the projection of the DEM. Alternatively, derive layers from the DEM in the original projection, then reproject all the grids.
In ArcGIS you can use ArcToolbox to project both rasters and features. To project all layers to a common projection use the batch project option:
- Start ArcGIS
- Load all unprojected grids into the document.
- ArcToolbox --> Data Management Tools --> Projections and Transformations --> (Right-click) Project Raster --> Batch...
- Highlight each raster in the workspace and drag them to the field Input Raster.
- For the first raster (double-click) Output coordinate system
- Select a coordinate system from the box using an imported grid or browsing for a projection.
- Copy and paste the resulting value into each row of Output coordinate system.
- Repeat for Geographic Transformation if necessary.
At the end you should have a new set of environmental layers, all sharing the same projection.
Prepare a Workspace
It is simpler to create one folder for a given analysis. Here, we term this the workspace. The files maxent.bat and maxent.jar should be copied into this workspace. Also, two sub-folders should be created in the workspace: grid, which will hold the prepared ArcGrid binary environmental layers, and ascii, which will hold the prepared ESRI ASCII environmental layers.
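The workspace layout described above can also be set up with a short script. This is a sketch only: the function name prepare_workspace and its arguments are hypothetical, and it additionally creates the output folder that maxent needs for its results. The maxent files are copied only if they are found in the folder you point to.

```python
import shutil
from pathlib import Path


def prepare_workspace(workspace, maxent_dir=None):
    """Create the grid, ascii and output sub-folders and copy the maxent files in."""
    workspace = Path(workspace)
    for name in ("grid", "ascii", "output"):
        (workspace / name).mkdir(parents=True, exist_ok=True)
    if maxent_dir is not None:
        for filename in ("maxent.jar", "maxent.bat"):
            source = Path(maxent_dir) / filename
            if source.is_file():  # skip silently if the download is incomplete
                shutil.copy2(source, workspace / filename)
    return workspace
```

For example, prepare_workspace("C:/analysis/salamanders", "C:/downloads/maxent") would build the folder tree for a hypothetical salamander analysis; both paths are placeholders.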
Prepare the Environmental Layers
The environmental layers set the geographic extent of the analysis window in the maxent software. It is therefore best to prepare these layers before the species occurrence data, because some of the occurrences may lie outside this window and will have to be pared accordingly (3.4).
Maxent expects environmental data to be in ESRI ASCII grid format (AAIGrid). These grids can contain either continuous, or categorical data. If the grid is categorical, each category must be coded as an integer value. Environmental layers must share the same extent, the same grain, and the same mask (i.e. NODATA cells). In short, each layer must be identical except for the values contained in the data cells.
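A quick way to confirm that the ASCII grids share the same extent and grain is to compare their header lines. The sketch below assumes the usual ESRI ASCII header keywords (ncols, nrows, xllcorner, yllcorner, cellsize); the function names are hypothetical. It does not compare the NODATA mask cell by cell, so it is a partial check only.

```python
from pathlib import Path

# Header entries that define the extent and grain of a grid.
GEOMETRY_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")


def read_ascii_header(path):
    """Read the keyword/value header lines of an ESRI ASCII grid into a dict."""
    header = {}
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != 2 or not parts[0][0].isalpha():
                break  # first row of cell values reached
            header[parts[0].lower()] = float(parts[1])
    return header


def mismatched_layers(ascii_dir):
    """Return the names of .asc files whose geometry differs from the first file."""
    paths = sorted(Path(ascii_dir).glob("*.asc"))
    if not paths:
        return []

    def geometry(path):
        header = read_ascii_header(path)
        return tuple(header.get(key) for key in GEOMETRY_KEYS)

    reference = geometry(paths[0])
    return [p.name for p in paths[1:] if geometry(p) != reference]
```

An empty list from mismatched_layers means every header agrees with the first layer in the folder.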
The name of each environmental layer should be less than 13 characters. Optionally, categorical layers should begin with a prefix (e.g. c_). If maxent is ever run from the command line, these layers can be switched from continuous (the default) to categorical based on their prefix using the command option togglelayertype.
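The naming rules are easy to check before conversion. The following sketch uses a hypothetical helper, check_layer_names, with the 12-character limit and the c_ prefix taken from the text above.

```python
def check_layer_names(names, max_length=12, categorical_prefix="c_"):
    """Return (names that are too long, names flagged as categorical by prefix)."""
    too_long = [name for name in names if len(name) > max_length]
    categorical = [name for name in names if name.startswith(categorical_prefix)]
    return too_long, categorical


if __name__ == "__main__":
    # Hypothetical layer names, for illustration only.
    too_long, categorical = check_layer_names(["elev", "tci", "c_vegetation_type", "c_geol"])
    print("Rename these layers:", too_long)
    print("Categorical layers:", categorical)
```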
There are many ways to ensure that the environmental layers have matching spatial attributes, but here I present a method that uses the Spatial Analyst toolbar in ArcMap. I assume that the environmental layers are already in the standard ArcInfo binary grid format and that they have the same projection (3.1.3). If some environmental layers are stored as polygon shapefiles, then they must be converted to ArcInfo binary grids from: Spatial Analyst --> Convert --> Features to Raster... (details for starting Spatial Analyst are given below). The cell size for the output grid may be determined beforehand, or should be taken to be the largest cell size of the environmental layers already stored as grids.
- Begin by loading all the environmental grids as layers in ArcMap.
- Make sure the Spatial Analyst tool bar is available. If not:
- Tools --> Extensions --> Spatial Analyst (check to activate)
- View --> Toolbars --> Spatial Analyst
- Set the analysis environment of Spatial Analyst
- Spatial Analyst --> Options...
- Working Directory: Path to Analysis Workspace\grid
- Mask: <None>
- Analysis Extent: Intersection of Inputs
- Cell Size
- Analysis Cell Size: Maximum of Inputs, or a predefined cell size that is greater than or equal to the largest cell size in your grid.
- Create an analysis mask using the current environment in Spatial Analyst. This mask will align the NODATA cells for each of the output environmental layers.
- Duplicate your environmental layers into grids that have the correct spatial attributes.
- Spatial Analyst --> Raster Calculator...
- Create new grids with the same name as the original grids in the working directory:
grid1 = [grid1]
grid2 = [grid2]
Raster Calculator does not actually replace the original grids. Instead, grids of the same name, with the appropriate spatial attributes, are created in the working directory.
- Remove the old grids from the ArcMap data frame.
- Convert the newly created grids into ASCII grids
- Activate the ArcToolbox Raster to ASCII tool in Batch mode: ArcToolbox --> Conversion Tools --> From Raster --> (Right-click) Raster to ASCII --> Batch
- The Raster to ASCII batch window has two fields: Input raster and Output ASCII raster file.
- Drag the grid layers from ArcMap to Input raster.
- Rename the default values of Output ASCII raster file to grids of the same name, but in the ascii folder:
    Input raster    Output ASCII raster file
1   grid1           Path to workspace\ascii\grid1
2   grid2           Path to workspace\ascii\grid2
...
After following these steps, the ascii folder in the analysis workspace will have all of the grids necessary for analysis in maxent. ArcMap should not be closed at this point, however, because the binary grids will still need to be used.
Prepare the Species Occurrence Data
Here, I assume that all species occurrence data have been projected to match the environmental layers (3.1.3), that the data exist as a point shapefile, and that one field of the shapefile contains the species name.
- Clip occurrences to the maximum extent. Occurrences cannot have geolocations outside of the environmental layers. To guarantee this, the occurrence data must be clipped to the environmental layer.
- Convert the mask grid to a polygon:
- Spatial Analyst --> Convert --> Raster to Features
- Input raster: Path to Workspace\grid\mask
- Field: VALUE
- Output geometry type: Polygon
- Generalize lines: unchecked
- Output features: Path to Workspace\plyMask
- Clip the occurrence data using the new mask
- ArcToolbox --> Analysis Tools --> Extract --> Clip
- Input Features: Path to Occurrence Shapefile
- Clip Features: Path to Workspace\plyMask.shp
- Output Feature Class: Path to Workspace\pntOccurrences.shp
- Add XY coordinate fields to the attribute table of the occurrence data, if they do not already exist.
- ArcToolbox --> Data Management Tools --> Add XY
- Input Features: Path to Workspace\pntOccurrences.shp
- Export the species occurrences attributes table as a .dbf file.
- Add pntOccurrences.shp to ArcMap as a layer.
- Right-click pntOccurrences --> Open Attribute Table
- Export: All Records
- Output table: Path to Workspace\tblOccurrences.dbf
- Convert the .dbf file to a .csv.
- Open tblOccurrences.dbf in Microsoft Excel.
- Delete all fields except for the Species Name, X, and Y.
- Ensure the fields are ordered: Species Name,X,Y.
- Delete the header row.
- Save the file as a .csv: pntOccurrences.csv
The end result of creating the species occurrence data should be a comma-separated values (csv) file, pntOccurrences.csv, with three fields (no header row): species, x, & y. This is the file that will be input to maxent.
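For users who prefer a script to Excel, the last two steps can be sketched in Python. This is an illustration under several assumptions: the attribute table has already been exported to rows with named fields, the field names SPECIES, POINT_X and POINT_Y are placeholders for whatever your table uses, and the extent test is a simple bounding box rather than the mask polygon used by the Clip tool. The function name write_occurrences is hypothetical.

```python
import csv
import io


def write_occurrences(rows, out_handle, extent,
                      species_field="SPECIES", x_field="POINT_X", y_field="POINT_Y"):
    """Write species,x,y lines (no header row) for points inside the extent.

    rows   -- iterable of dicts, e.g. from csv.DictReader
    extent -- (xmin, ymin, xmax, ymax) of the environmental layers
    Returns the number of records written.
    """
    xmin, ymin, xmax, ymax = extent
    writer = csv.writer(out_handle, lineterminator="\n")
    kept = 0
    for row in rows:
        x, y = float(row[x_field]), float(row[y_field])
        if xmin <= x <= xmax and ymin <= y <= ymax:
            # Write the coordinate text unchanged so no precision is lost.
            writer.writerow([row[species_field], row[x_field], row[y_field]])
            kept += 1
    return kept


if __name__ == "__main__":
    # Made-up records, for illustration only.
    table = io.StringIO("SPECIES,POINT_X,POINT_Y,NOTES\nsp_a,5,5,ok\nsp_b,50,5,outside\n")
    out = io.StringIO()
    print(write_occurrences(csv.DictReader(table), out, (0, 0, 10, 10)), "record(s) kept")
```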
An output folder must be created in the workspace to hold the results from the Maxent model (an easy folder name is output).
Optionally, you may also generate a samples-with-data (SWD) file for the species and the environment. Details of this format are given in the maxent tutorial, but basically it saves model run time if the environmental data at the sample points are added to the species occurrence file. Maxent optimizes the relationship between occurrences and environment using a sample of 10,000 random points. You can skip this step in the Maxent model run by doing it yourself in ArcGIS. The procedure for generating SWD files for the observations and the environmental data is this:
- Download and install Hawth’s tools (http://www.spatialecology.com/htools/tooldesc.php).
- Run the tool Intersect Point Data
- Point file to intersect: Your species vector layer
- Raster: Select all environmental layers
- Export the species vector layer as a *.dbf and then as a *.csv file as described in 3.4.
- Generate a point shapefile containing 10,000 random points within the mask layer from ArcToolbox: Data Management Tools --> Feature Class --> Create Random Points
- Output Location : path to workspace
- Output Point Feature Class : environ
- Constraining Feature Class : mask
- Number of Points : Long, 10000
- Add XY coordinates as in 3.4
- Extract the environmental data to the environ layer using the Intersect Point Data tool as above.
- Export the environ shapefile as a *.dbf and then as a *.csv file as described in 3.4
The end result of these steps will be two files, species.csv and environ.csv. These can be loaded as the species and environmental files, respectively, in the Maxent GUI or specified at the command line (4). Maxent will still need to use the contents of the ASCII folder for generating prediction layers if that option is selected.
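The random-point step can be illustrated without ArcGIS. The sketch below is an approximation, not the Create Random Points tool: it picks random cells (without replacement) from a grid already read into memory and returns the cell-centre coordinates and values, whereas the ArcGIS tool places points at arbitrary locations inside the mask. The function name and arguments are hypothetical.

```python
import random


def random_background_points(grid_rows, xll, yll, cellsize, nodata, n_points, seed=None):
    """Sample up to n_points cells that are not NODATA.

    grid_rows -- list of rows of cell values, first row is the northern edge
    xll, yll  -- coordinates of the lower-left corner of the grid
    Returns a list of (x, y, value) tuples at cell centres.
    """
    nrows = len(grid_rows)
    valid = [(r, c)
             for r, row in enumerate(grid_rows)
             for c, value in enumerate(row)
             if value != nodata]
    rng = random.Random(seed)
    chosen = rng.sample(valid, min(n_points, len(valid)))
    points = []
    for r, c in chosen:
        x = xll + (c + 0.5) * cellsize
        y = yll + (nrows - r - 0.5) * cellsize  # rows are stored north to south
        points.append((x, y, grid_rows[r][c]))
    return points
```

Calling the function once per environmental layer with the same seed returns the same cells each time, so the values can be joined into one row per point.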
Running the Model
Maxent may be run either from a graphical user interface (GUI) or from the command line. It is suggested that preliminary analyses be done in the GUI, while larger analyses be done on the command line. Given that the species and environmental data have been generated following the instructions in 3, setting up a model run in the GUI is relatively straightforward.
The simplest configuration of Maxent requires only setting the path to the species *.csv file, the environmental layers folder (ASCII), and an output folder (which must already exist). All configurations of maxent are discussed below.
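For command-line runs, the same three settings can be assembled into a java call. The sketch below builds the argument list in Python; the flag names (samplesfile, environmentallayers, outputdirectory, togglelayertype, autorun) are those I understand the maxent tutorial to use, so check them against the help files for your version. The function name maxent_command and the example paths are hypothetical.

```python
import subprocess


def maxent_command(samples, layers, output, categorical_prefix=None, jar="maxent.jar"):
    """Build the argument list for a non-interactive maxent run."""
    command = [
        "java", "-jar", jar,
        f"samplesfile={samples}",
        f"environmentallayers={layers}",
        f"outputdirectory={output}",  # this folder must already exist
    ]
    if categorical_prefix:
        # Layers whose names start with this prefix are treated as categorical.
        command.append(f"togglelayertype={categorical_prefix}")
    command.append("autorun")  # start the run without waiting for the GUI
    return command


if __name__ == "__main__":
    cmd = maxent_command("pntOccurrences.csv", "ascii", "output", categorical_prefix="c_")
    print(" ".join(cmd))
    # To actually launch the run from the workspace folder, uncomment:
    # subprocess.run(cmd, check=True)
```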
Samples
This field should contain the path to the species occurrence csv file prepared according to 3.4 (Fig. 1). This file can contain multiple species and can have the environmental data included in SWD format (3.5). When a file is selected, the contents are read and the species appear in the box. A subset of species can be selected using the checkboxes. Note that when multiple species are included in one sample file, the output is split by species. The outputs are not combined into a single file, save for maxentResults.csv (5).
Environmental layers
This field contains the path to the folder containing the environmental layers prepared in 3.3. Alternatively, it may contain the path to an environmental SWD file (3.5). As with the samples file, the environmental layer names are read into the window below the directory field. You should change the setting from continuous to categorical for any environmental variable that is categorical. Maxent fits categorical variables using a different function than continuous variables.
On the left of the window are checkboxes for the types of functions that may be fit to each environmental variable. By default, auto is selected. This option should be left as is, unless there is a specific reason to change it. See the tutorial or help files for a more detailed explanation of the possible fitted curves.
The other settings available from the main screen of maxent control the type of output that is generated by the model. They are located on the lower right-hand side of the main window (Fig. 1).
Create response curves
If checked, then response curves of the species for each environmental variable in the model are added to the output.
Make pictures of predictions
If checked, then the output html file will contain an image of the prediction surface.
Do jackknife to measure variable importance
If checked, tests the relative importance of each variable in the model and outputs the results to the html file.
Output format
The type of prediction output by the model. One of Logistic (where values are the probability of observing a species given the suitability of that environment), Cumulative (% of the maxent distribution at or below the current prediction) or Raw (the probability of observing a species in that particular pixel). Logistic output is the default and is the easiest to interpret as a measure of habitat suitability.
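To make the relationship between Raw and Cumulative output concrete, the sketch below computes cumulative values from a list of raw values using the definition given above: each pixel receives the percentage of the total raw probability held by pixels whose raw value is at or below its own. The function name is hypothetical and the numbers are made up; maxent performs this calculation internally.

```python
def cumulative_from_raw(raw):
    """Convert raw maxent values to cumulative percentages (0 to 100)."""
    total = sum(raw)
    return [100.0 * sum(v for v in raw if v <= value) / total for value in raw]


if __name__ == "__main__":
    # Four pixels with made-up raw values that sum to 1; the results are
    # roughly 10, 30, 60 and 100.
    print(cumulative_from_raw([0.1, 0.2, 0.3, 0.4]))
```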
Output file type
The type of prediction grid to be created. You may choose from ESRI ASCII Grid (.asc), a slightly smaller maxent format (.mxe), a grid file for use by image processing software (.grd), or a band interleaved by line file (.bil). .asc is the default format, and it is best to leave it as such unless there is a specific reason for using the other formats.
Output directory
The folder to which output will be directed. This folder must be created ahead of time (3.5). Once the model is run, the most important files in this folder will be the ones with an html extension.
Projection layers directory/file
If you specify environmental data as sample points in SWD format (3.5), you can have the output model projected onto larger grids located in the directory specified in this field. This allows model optimization to proceed rapidly, yet still provide predictions for a large extent. If environmental data are not in SWD format, leave this field blank.
- Information concerning maximum entropy species distribution modeling in Great Smoky Mountains National Park.
- Collection of works concerning Maximum Entropy models:
- Phillips, S.J., Anderson, R.P., Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.
- Phillips, S.J., Dudik, M., Schapire, R.E. (2004) A maximum entropy approach to species distribution modeling. Proceedings of the Twenty-First International Conference on Machine Learning, 655-662.
- Phillips, S.J., Dudik, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31, 161-175.
- Phillips, S.J., Dudik, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S. (2009) Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications, 19, 181-197.
- This project is currently under development as part of a Spring 2014 Practicum work for Tanner Jessel.
- There is some related information at http://mountainsol.wordpress.com