Liston:Computer Scripts

From OpenWetWare

Revision as of 15:58, 9 January 2010

This page contains the source code for some of the bioinformatics scripts used by the Liston Lab. Most of the scripts are written in Python 2.6.4 and are designed for Unix systems. A few are written as lists of Unix commands intended to be run as executables.


Python Script Conventions

The scripts are run with the Python interpreter using the following format:

   python theScript.py [modifiers] <arguments>

For example, to run the script sumqual.py, one could enter the following into a Unix shell:

   python sumqual.py -c -v ../myQualFile.qual ../myMumFile

This would run the script sumqual.py with the modifiers -c and -v, using myQualFile.qual and myMumFile as arguments. All of the scripts save their output in a file in the current working directory, with a name usually composed of some combination of the arguments and the name of the script. However, one can save the output anywhere, under any name, by redirecting it with the shell's > operator:

   python sumqual.py -c -v ../myQualFile.qual ../myMumFile > ../myOutput.ext

The order in which the modifiers are given is not important, but the order of the required arguments is. For example, the above modifiers could be entered in the opposite order (-v -c), but the two file paths must appear in a predetermined order. Some scripts have modifiers that require arguments of their own. These modifier arguments should be written directly after their respective modifier. For example, if the above modifier -c took an argument, one would type:

   python sumqual.py -c theArgument -v ../myQualFile.qual ../myMumFile
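
The conventions above (modifiers in any order, a modifier argument written directly after its modifier, and positional arguments in a fixed order) can be sketched with Python's standard getopt module. This is only an illustration in current Python: the function name parse_command_line and the choice of getopt are assumptions, and the actual scripts on this page may parse their command lines differently.

```python
import getopt


def parse_command_line(argv):
    """Split argv into modifiers and two ordered positional arguments.

    Hypothetical sketch: -c is assumed to take its own argument,
    -v is assumed to be a plain on/off modifier.
    """
    # "c:" means -c is followed by an argument; "v" takes none.
    opts, positional = getopt.getopt(argv, "c:v")
    modifiers = dict(opts)
    if len(positional) != 2:
        raise SystemExit("expected: [modifiers] <qual file> <mum file>")
    # The order of the positional arguments matters.
    qual_path, mum_path = positional
    return modifiers, qual_path, mum_path


first = parse_command_line(["-c", "theArgument", "-v", "a.qual", "a.mum"])
second = parse_command_line(["-v", "-c", "theArgument", "a.qual", "a.mum"])
print(first == second)  # modifier order does not change the result
```

Swapping the two file paths, by contrast, would change which file is treated as the qual file, which is why their order is fixed.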

Every script has a description of what it does and how and when to use it in its source code. A list of all the modifiers that the script supports, and what they do, is also included. A similar help menu can be viewed by calling the script with no arguments. For example, typing the following

   python sumqual.py

would cause a help menu to be printed to the screen.
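
A minimal sketch of this behaviour, assuming the help text is kept in a string inside the script. The names HELP_TEXT, run, and exampleScript.py, as well as the -v modifier shown in the text, are hypothetical and are not taken from any script on this page.

```python
# Hypothetical help text; a real script would describe its own modifiers.
HELP_TEXT = """exampleScript.py (hypothetical script)
Usage: python exampleScript.py [modifiers] <arguments>
  -v    hypothetical modifier: print extra detail"""


def run(argv):
    """Return the help text when no arguments are given, else a status line."""
    if not argv:
        return HELP_TEXT
    return "processing: " + " ".join(argv)


# Equivalent to calling the script with no arguments.
print(run([]))
```

In a real script, run would be called with sys.argv[1:], so that typing only the script name prints the help menu.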

Python Scripts

Script Name | Description | Input File Format | Output File Format
baseanno.py
  Description: Converts a file containing a list of annotations, each with its start and stop indices, into a file containing a list of base indices, each followed by any annotations that apply at that base. Each line of the input file is expected to be whitespace-delimited; however, if your annotations contain spaces, the script can be made to enforce tab-delimitation. The output file is always tab-delimited.
  Input file format: [Annotation text] [Start Index] [End Index]
  Output file format: [Base Index] [Annotation1] [Annotation2] ... [AnnotationX]
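
The conversion performed by baseanno.py can be illustrated with a short sketch. It is not the script's source: the function name annotate_bases and the tab_delimited parameter are hypothetical, and the end index is assumed to be inclusive.

```python
from collections import defaultdict


def annotate_bases(lines, tab_delimited=False):
    """Map each base index to the annotations that cover it."""
    by_base = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Tab-delimitation allows spaces inside the annotation text.
        if tab_delimited:
            text, start, end = line.split("\t")
        else:
            text, start, end = line.split()
        # Assumption: the end index is part of the annotated range.
        for index in range(int(start), int(end) + 1):
            by_base[index].append(text)
    # One tab-delimited row per annotated base, in index order.
    return ["\t".join([str(i)] + by_base[i]) for i in sorted(by_base)]


rows = annotate_bases(["geneA 1 3", "exon1 2 4"])
print("\n".join(rows))
```

With whitespace splitting, an annotation such as "my gene" would be read as two fields, which is the case the tab-delimited mode is meant to handle.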
basediff.py
  Description: Finds base differences between multiple aligned sequences in a single FASTA file and outputs a tab-delimited .txt file containing the base values of all the sequences at each index where a difference occurred. The script has many modifiers that change what is considered a difference.
  Input file format: FASTA file containing two or more aligned sequences
  Output file format: .txt file in the following format: [Base index] [Seq 1 value] [Seq 2 value] ... [Seq N value]
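
The core idea of basediff.py can be sketched as follows. This is an illustration only: read_fasta and base_differences are hypothetical names, base indices are assumed to start at 1, and every mismatch (including a gap character) is treated as a difference, whereas the real script has modifiers that change this.

```python
def read_fasta(lines):
    """Return the sequences of a FASTA file, in file order."""
    sequences = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            sequences.append("")  # a header line starts a new sequence
        elif line and sequences:
            sequences[-1] += line
    return sequences


def base_differences(sequences):
    """Return tab-delimited rows for columns whose bases are not all equal."""
    rows = []
    # Assumption: base indices are 1-based.
    for index, column in enumerate(zip(*sequences), start=1):
        if len(set(column)) > 1:
            rows.append("\t".join([str(index)] + list(column)))
    return rows


fasta = [">seq1", "ACGT", ">seq2", "ACCT", ">seq3", "AC-T"]
print("\n".join(base_differences(read_fasta(fasta))))
```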
BPstats.py
  Description: Calculates and outputs various statistics for each contig in a GSS basepile output. It is assumed that the information for each contig is 1000 bases long. The script outputs the following statistics for each contig as a tab-delimited list:
    • Reference Match Length: the index of the last known base (i.e. not 'N')
    • Target Match Sum: the number of known bases (i.e. not 'N')
    • Coverage Proportion: the ratio of Target Match Sum to Reference Match Length
    • Average Density: the average density value for all bases within Reference Match Length
    • Median Density: the median of the entire range of density values
  Input file format: GSS basepile output, 1000 bases per contig
  Output file format: tab-delimited list of the statistics for each contig
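
The five statistics can be sketched for a single contig as shown here. The GSS basepile file format is not described on this page, so the sketch assumes the bases and density values have already been read into a string and a list; the function name contig_stats is hypothetical, the index of the last known base is assumed to be 1-based, and the example uses 6 bases instead of 1000 for brevity.

```python
from statistics import mean, median


def contig_stats(bases, densities):
    """Compute the five statistics for one contig.

    bases: string of base calls, with 'N' marking an unknown base.
    densities: one density value per base.
    """
    # 1-based positions of all known bases (assumption: indices start at 1).
    known = [i for i, base in enumerate(bases, start=1) if base != "N"]
    if not known:
        # No known bases: only the median density is meaningful.
        return (0, 0, 0.0, 0.0, median(densities))
    ref_match_length = known[-1]
    target_match_sum = len(known)
    coverage = target_match_sum / ref_match_length
    # Average only over the bases within the reference match length.
    average_density = mean(densities[:ref_match_length])
    # Median over the entire range of density values.
    return (ref_match_length, target_match_sum, coverage,
            average_density, median(densities))


stats = contig_stats("ACNGNN", [2, 4, 0, 6, 0, 0])
print("\t".join(str(value) for value in stats))
```

In this example the last known base is at position 4 and three bases are known, so the coverage proportion is 3/4.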


gapstrip.py

qualtofa.py