Wilke:Using HyPhy: Difference between revisions
Revision as of 15:50, 9 January 2012
Notice: The Wilke Lab page has moved to http://wilkelab.org.
The page you are looking at is kept for archival purposes and will not be further updated.
THE WILKE LAB
The Basics
Each HyPhy analysis must include several essential components:
- Data Set
- This is a multiple sequence alignment file, which may be in one of several formats, including FASTA, PHYLIP, or NEXUS.
- Data Filter
- This selects which parts of the data set should be used in the analysis. In the simplest case, the entire set will be processed as a single unit. In a more complex scenario, however, you may have a data set that includes both introns and exons, which you would want to analyze under different evolutionary models. This may be specified using data filters, which "partition" your data set.
- Evolutionary Model
- You will need to provide HyPhy with a rate matrix describing your substitution model of choice in order to process the data.
- Phylogeny
- This should be in Newick format.
- Likelihood Function
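To make the "rate matrix" component more concrete, here is a minimal sketch in Python (not HyPhy batch language) of an F81-style matrix built from observed nucleotide frequencies. The function names, the toy alignment, and the fixed value of mu are assumptions made for illustration; in a real analysis HyPhy estimates the parameters itself. The sketch relies on the standard F81 definition, in which the rate of change to nucleotide j is proportional to the frequency of j.

```python
# Illustration only: Python, not HyPhy batch language.
NUCLEOTIDES = "ACGT"


def observed_frequencies(sequences):
    """Count A, C, G, T over all sequences and normalise to frequencies."""
    counts = {n: 0 for n in NUCLEOTIDES}
    for seq in sequences:
        for ch in seq.upper():
            if ch in counts:  # gaps and ambiguity codes are skipped
                counts[ch] += 1
    total = sum(counts.values())
    return [counts[n] / total for n in NUCLEOTIDES]


def f81_rate_matrix(freqs, mu=1.0):
    """Off-diagonal entry (i, j) is mu * freqs[j]; each row sums to zero."""
    size = len(freqs)
    q = [[mu * freqs[j] if i != j else 0.0 for j in range(size)]
         for i in range(size)]
    for i in range(size):
        q[i][i] = -sum(q[i])  # diagonal balances the rest of the row
    return q


# Hypothetical four-sequence alignment, used only for this example.
toy_alignment = ["ACGT", "ACGA", "ACTT", "AGGT"]
freqs = observed_frequencies(toy_alignment)
Q = f81_rate_matrix(freqs)

for row in Q:
    print(["%.4f" % value for value in row])
```

Every row of the resulting matrix sums to zero, which is the property the * placeholder on the diagonal of a HyPhy rate matrix stands for.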
HyPhy Batch File
Here is a basic HyPhy script. Below, the script is discussed in depth.
    DataSet myData = ReadDataFile ("aln.fasta");
    DataSetFilter myFilter = CreateFilter (myData,1,"", "", "" );
    F81RateMatrix =
        {{* ,mu,mu,mu}
         {mu,* ,mu,mu}
         {mu,mu,* ,mu}
         {mu,mu,mu,* }};
    HarvestFrequencies (obsFreqs, myFilter, 1, 1, 1);
    /* Define the model before the tree, so that the model is attached to the tree's branches. */
    Model F81 = (F81RateMatrix, obsFreqs);
    Tree myTree = ((a,b),c,d);
    LikelihoodFunction theLikFun = (myFilter, myTree);
    Optimize (MLEs, theLikFun);
    fprintf (stdout, theLikFun);
Now, let's go line by line through the script above.
DataSet myData = ReadDataFile ("aln.fasta");
- Stores your multiple alignment file in the variable myData. Note that the path to the file must be specified if it is not found in the working directory.
- Optionally, a phylogeny may be appended at the bottom of the data file so that it can be used later in the analysis. More on this later...
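As a rough picture of what a loaded data set contains, the following Python sketch (not HyPhy code) parses FASTA text into named sequences and checks that they all have the same length, as a multiple alignment requires. The function names and the toy records are hypothetical, and HyPhy's own reader is more general than this.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()  # the whole header line is the name
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_length(records):
    """Return the common sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length: not an alignment")
    return lengths.pop()


# Hypothetical alignment text; a real run would open a file such as aln.fasta.
toy_fasta = io.StringIO(
    ">a\nACGT\nAC\n"
    ">b\nACGAAC\n"
    ">c\nACTTAC\n"
    ">d\nAGGTAC\n"
)
alignment = read_fasta(toy_fasta)
n_sites = alignment_length(alignment)
print(sorted(alignment), n_sites)
```

The sequence names a, b, c and d match the leaf names used in the tree of the script, which is what ties the tree to the alignment.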
DataSetFilter myFilter = CreateFilter (myData,1,"", "", "" );
- Stores a data filter in the variable myFilter. The function CreateFilter takes five arguments, the last three of which are optional: CreateFilter (DataSetId, Unit, Vertical Partition, Horizontal Partition, Exclusions);
- DataSetId is the variable name for the previously imported data set, in this case called myData.
- Unit defines how many characters should be treated as a single object. For codon data, this value would be 3 since every three characters are analyzed together. For nucleotide data, this value is 1.
- Vertical Partition specifies which sites should be analyzed. In this case, the entire data set is analyzed together, so no partition is specified.
- Horizontal Partition ......
- Alphabet Exclusions is a comma-separated list of characters to be ignored during analysis. An example may be stop codons, which would be written “TAA, TGA, TAG”.
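The Unit and Exclusions arguments described above can be pictured with a small Python sketch (not HyPhy code). It groups each sequence into units of a given size and reports which sites carry an excluded state in any sequence. All function names and the toy sequences are hypothetical, and the sketch only illustrates the idea; it does not describe how HyPhy handles exclusions internally.

```python
def split_into_units(sequence, unit):
    """Group a sequence into consecutive chunks of `unit` characters."""
    if len(sequence) % unit != 0:
        raise ValueError("sequence length is not a multiple of the unit size")
    return [sequence[i:i + unit] for i in range(0, len(sequence), unit)]


def parse_exclusions(text):
    """Turn a comma-separated list such as 'TAA, TGA, TAG' into a set."""
    return {item.strip().upper() for item in text.split(",") if item.strip()}


def sites_with_excluded_states(sequences, unit, exclusions):
    """Return indices of unit-sized sites where any sequence is excluded."""
    columns = zip(*(split_into_units(seq.upper(), unit) for seq in sequences))
    return [index for index, column in enumerate(columns)
            if any(state in exclusions for state in column)]


# Hypothetical codon alignment: unit of 3, stop codons excluded.
toy_codons = ["ATGAAATAA", "ATGAAGTGA", "ATGAAACCC"]
stops = parse_exclusions("TAA, TGA, TAG")
flagged = sites_with_excluded_states(toy_codons, 3, stops)
print(split_into_units(toy_codons[0], 3), flagged)
```

With a unit of 1 and an empty exclusion list, every column is a site and nothing is flagged, which corresponds to the nucleotide filter used in the script.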