# User:R. Eric Collins/MBL/PAUP

### From OpenWetWare

David Swofford is the author of PAUP* and goes uncredited for many of the methods implemented in it.

## Model Selection

- Parsimony
- susceptible to long-branch attraction: branch lengths are not taken into account, so parsimony cannot tell whether similar bases result from equilibration (chance convergence on long branches) or from conservation

- Models are _always_ wrong
- don't/can't expect them to match reality

- What is a _good_ model?
- as simple as necessary but no simpler
- a balance between under- and over-fitting

- heterotachy: differential rates of evolution at different sites on different branches
- can mislead maximum likelihood into choosing the tree that groups the long branches together
- new mixture models are being written to address this issue

- Model Selection Criteria
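- Likelihood ratio tests: δ = -2(ln L0 - ln L1), where L0 is the likelihood of the simpler model and L1 that of the more complex model
- for nested models, δ is compared to a chi-squared distribution with degrees of freedom equal to the number of extra free parameters
- chi-squared (frequentist) based, so there is always a possibility of Type I error, set by the error tolerance (α)
- i.e. rejecting the simple model in favor of the more complex model even if the simple model is true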
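
The likelihood ratio test above can be sketched in a few lines of Python. This is only an illustration: the two ln L values are made up, the models are assumed to be nested, and `chi2_sf` and `likelihood_ratio_test` are hypothetical helper names, not part of PAUP or ModelTest.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite closed-form sum.
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 case (erfc) and add one term per extra 2 df.
    tail = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return tail

def likelihood_ratio_test(ln_l0, ln_l1, extra_params, alpha=0.05):
    """Return (delta, p-value, reject simpler model?) for two nested models."""
    delta = -2.0 * (ln_l0 - ln_l1)
    p_value = chi2_sf(delta, extra_params)
    return delta, p_value, p_value < alpha

# Hypothetical log-likelihoods: simpler model (L0) vs. model with one extra parameter (L1).
delta, p_value, reject_simple = likelihood_ratio_test(-1000.0, -995.0, extra_params=1)
print(f"delta = {delta:.3f}, p = {p_value:.4g}, reject simpler model: {reject_simple}")
```

Lowering `alpha` makes a Type I error less likely but also makes it harder to justify the more complex model.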

- Akaike information criterion (AIC)
- AIC_i = -2 ln L_i + 2K, where K is the number of free parameters in model i
- tends to be liberal, i.e. to favor models with too many parameters

- Bayesian information criterion (BIC)
- BIC_i = -2 ln L_i + K ln n, where n is the sample size (typically the number of sites)
- converges on the correct answer as more data are added
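
As a rough sketch of how the two criteria behave, the Python below scores three made-up candidate models. The model names, ln L values, parameter counts K, and the number of sites are all hypothetical and chosen only to show that the two criteria can disagree.

```python
import math

def aic(ln_l, k):
    """AIC = -2 ln L + 2K (lower is better)."""
    return -2.0 * ln_l + 2.0 * k

def bic(ln_l, k, n):
    """BIC = -2 ln L + K ln n (lower is better)."""
    return -2.0 * ln_l + k * math.log(n)

# Hypothetical candidates: (name, ln L, number of free parameters K).
candidates = [
    ("simple", -5230.0, 1),
    ("medium", -5105.0, 5),
    ("complex", -5100.0, 9),
]
n_sites = 1000  # assumed sample size (number of sites)

scores = {name: (aic(ln_l, k), bic(ln_l, k, n_sites)) for name, ln_l, k in candidates}
best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])

for name, (a, b) in scores.items():
    print(f"{name:8s} AIC = {a:10.2f}  BIC = {b:10.2f}")
print("best by AIC:", best_aic)
print("best by BIC:", best_bic)
```

With these numbers the AIC prefers the most parameter-rich model while the BIC, whose penalty grows with ln n, prefers the intermediate one.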


- PAUP
- tips
- restrict dataset after loading datafile instead of making multiple copies of data subsets
- PAUP reports -ln L, so every score in PAUP is minimized (lower is better)

- ModelTest
- any reasonable tree can be used; the actual tree topology has little effect on model selection
- you shouldn't have to run ModelTest; it is better to understand model selection well enough to winnow down the models manually

- tips

- to specify Tamura-Nei (all transversions share one rate, each of the two transitions has its own rate): lscores/nst=6 rclass=(abaaca)
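
To see what an rclass string such as abaaca encodes, the Python sketch below groups the six substitution types by their rate-class letter. It assumes the letters are read in the order AC, AG, AT, CG, CT, GT; `describe_rclass` is a hypothetical helper for illustration, not a PAUP command.

```python
# Assumed order of the six substitution types in an rclass string.
SUBSTITUTIONS = ["AC", "AG", "AT", "CG", "CT", "GT"]
TRANSITIONS = {"AG", "CT"}  # purine<->purine and pyrimidine<->pyrimidine

def describe_rclass(rclass):
    """Group substitution types by rate-class letter, e.g. 'abaaca'."""
    if len(rclass) != len(SUBSTITUTIONS):
        raise ValueError("rclass must have exactly six letters")
    classes = {}
    for pair, label in zip(SUBSTITUTIONS, rclass):
        classes.setdefault(label, []).append(pair)
    return classes

tamura_nei = describe_rclass("abaaca")
for label, pairs in tamura_nei.items():
    kinds = {"transition" if p in TRANSITIONS else "transversion" for p in pairs}
    print(f"rate class {label}: {', '.join(pairs)} ({', '.join(sorted(kinds))})")
```

Under that ordering, class a holds the four transversions, while classes b and c hold one transition each.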

- to only operate on a subset of data
- first load all the data
- taxset: define a named set of taxa (a macro for a list of taxon names)
- delete: delete a certain subset of taxa
- exclude: exclude subset of characters