Short read toolbox: Difference between revisions
Revision as of 14:43, 14 October 2009
Short Read Toolbox
This page has been created to help facilitate the processing of Illumina short read data. Scripts and link information are provided.
Illumina data currently comes in two flavors: single-end and paired-end reads. Single-end reads are delivered as a single file and are fairly straightforward. Paired-end reads are sequenced from both ends of DNA fragments of approximately known size (size-selected by gel electrophoresis). Different software packages expect this data in different formats. The Illumina pipeline currently produces paired files, one file for each end of the fragments. Within each pair of files the records are listed in the same order, so that read 1a is the mate of read 1b, read 2a is the mate of read 2b, and so on. To simplify data format management, I've tried to adhere to this format and handle translations to other formats downstream. This also means that I've separated my scripts into those that address single-end reads and those that address paired-end reads.
- Single-end scripts work on a single file.
- Paired-end scripts coordinate two paired files.
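The pairing convention described above (record n in one file is the mate of record n in the other) can be checked by walking both files in lockstep. This is a minimal sketch, not one of the toolbox scripts: it assumes four-line FASTQ records and read names ending in /1 and /2, as is common in Illumina output, and the names check_pairs.pl, read_fastq_record, mate_id and for_each_pair are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one four-line FASTQ record from a filehandle.
# Returns a hash ref, or undef at end of file.
sub read_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    chomp $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;    # '+' separator line, not used here
    my $qual = <$fh>;
    die "Truncated FASTQ record after '$header'\n" unless defined $qual;
    die "Expected '\@' header, got '$header'\n" unless $header =~ /^\@/;
    chomp($seq, $qual);
    return { id => substr($header, 1), seq => $seq, qual => $qual };
}

# Assumed naming convention: strip any comment and a trailing /1 or /2
# so that both mates share one ID.
sub mate_id {
    my ($id) = @_;
    my ($base) = split /\s+/, $id;
    $base =~ s{/[12]$}{};
    return $base;
}

# Walk two paired files in lockstep, calling $callback on each mate pair.
# Dies if the files differ in record count or the mate IDs disagree.
sub for_each_pair {
    my ($fh1, $fh2, $callback) = @_;
    my $n = 0;
    while (1) {
        my $r1 = read_fastq_record($fh1);
        my $r2 = read_fastq_record($fh2);
        last if !defined $r1 && !defined $r2;
        die "Files differ in length at record " . ($n + 1) . "\n"
            unless defined $r1 && defined $r2;
        $n++;
        die "Mate IDs disagree at record $n: $r1->{id} vs $r2->{id}\n"
            unless mate_id($r1->{id}) eq mate_id($r2->{id});
        $callback->($r1, $r2);
    }
    return $n;
}

# Usage: perl check_pairs.pl reads_1.fq reads_2.fq
if (@ARGV == 2) {
    open(my $fh1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    open(my $fh2, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
    my $n = for_each_pair($fh1, $fh2, sub { });    # validate only
    print "$n pairs OK\n";
}
```

Reading both files in the same loop keeps memory use constant regardless of file size, and stopping at the first mismatch catches pairs of files that were filtered or truncated independently.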
Single-End Scripts
To be done.
Code:
#!/usr/bin/perl
# A test
# EOF.
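As an illustration of the kind of script this section is meant to hold, here is a minimal single-end sketch that summarizes one FASTQ file: read count, total bases and mean read length. It is hypothetical (the name fastq_stats.pl and its output layout are illustrative only) and assumes strict four-line FASTQ records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Summarize one FASTQ stream: number of reads, total bases, mean length.
# Assumes strict four-line records (header, sequence, '+', quality).
sub fastq_stats {
    my ($fh) = @_;
    my ($reads, $bases) = (0, 0);
    while (my $header = <$fh>) {
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        die "Truncated record at read " . ($reads + 1) . "\n" unless defined $qual;
        die "Bad header: $header" unless $header =~ /^\@/;
        chomp $seq;
        $reads++;
        $bases += length $seq;
    }
    my $mean = $reads ? $bases / $reads : 0;
    return ($reads, $bases, $mean);
}

# Usage: perl fastq_stats.pl reads.fq
# Each file named on the command line is summarized on its own.
for my $path (@ARGV) {
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my ($reads, $bases, $mean) = fastq_stats($fh);
    printf "%s\t%d reads\t%d bases\tmean length %.1f\n", $path, $reads, $bases, $mean;
    close $fh;
}
```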
Paired-End Scripts
To be done.
#!/usr/bin/perl
# A test
# EOF.
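One example of a downstream translation from the paired-file format is interleaving: writing mate 1 and mate 2 of each pair one after the other into a single file, which some downstream tools expect. The sketch below is hypothetical (interleave_fastq.pl, next_record and interleave_fastq are illustrative names, not existing toolbox scripts) and assumes four-line FASTQ records in both files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one four-line FASTQ record and return it as a single string
# (all four lines, newlines included), or undef at end of file.
sub next_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        $line .= "\n" unless $line =~ /\n\z/;    # final line may lack a newline
        push @lines, $line;
    }
    return unless @lines;                        # clean end of file
    die "Truncated FASTQ record\n" unless @lines == 4;
    die "Expected '\@' header, got: $lines[0]" unless $lines[0] =~ /^\@/;
    return join '', @lines;
}

# Copy records from two paired files into $out, alternating mate 1 and
# mate 2. Dies if one file runs out before the other.
sub interleave_fastq {
    my ($in1, $in2, $out) = @_;
    my $pairs = 0;
    while (1) {
        my $r1 = next_record($in1);
        my $r2 = next_record($in2);
        last if !defined $r1 && !defined $r2;
        die "Paired files have different numbers of records\n"
            unless defined $r1 && defined $r2;
        print {$out} $r1, $r2;
        $pairs++;
    }
    return $pairs;
}

# Usage: perl interleave_fastq.pl reads_1.fq reads_2.fq > interleaved.fq
if (@ARGV == 2) {
    open(my $in1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    open(my $in2, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
    my $n = interleave_fastq($in1, $in2, \*STDOUT);
    print STDERR "Wrote $n pairs\n";
}
```

Keeping the two-file layout as the master copy and generating interleaved files only when a tool needs them follows the approach above of handling translations downstream.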
GenBank Submission Files
Preparing sequences for submission to GenBank can be a somewhat arduous task. Here's a script I've made to try to help automate the process.
C:\> tab2gbtbl_v2.pl -a feature_table.txt
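For readers unfamiliar with the target format: NCBI submission tools such as tbl2asn read annotations from a five-column feature table, in which a `>Feature` line names the sequence, each feature line gives start, stop and feature key separated by tabs, and each qualifier line starts with three tabs followed by the qualifier name and value. The options and input layout of tab2gbtbl_v2.pl are not documented on this page, so the sketch below is not that script. It is a hypothetical converter that assumes a tab-delimited input with one row per qualifier (seq_id, start, stop, feature key, qualifier, value), with rows for the same feature kept together.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one tab-delimited row per qualifier, columns
#   seq_id  start  stop  feature_key  qualifier  value
# Consecutive rows with the same seq_id / start / stop / key share one
# feature line. Returns five-column feature table text.
sub rows_to_tbl {
    my (@rows) = @_;
    my $out = '';
    my ($cur_seq, $cur_feat);
    for my $row (@rows) {
        my ($seq, $start, $stop, $key, $qual, $value) = @$row;
        if (!defined $cur_seq || $seq ne $cur_seq) {
            $out .= ">Feature $seq\n";
            ($cur_seq, $cur_feat) = ($seq, undef);
        }
        my $feat = join "\t", $start, $stop, $key;
        if (!defined $cur_feat || $feat ne $cur_feat) {
            $out .= "$feat\n";
            $cur_feat = $feat;
        }
        $out .= "\t\t\t$qual\t$value\n";    # qualifier lines: three leading tabs
    }
    return $out;
}

# Usage: perl rows_to_tbl.pl features.tsv > features.tbl
if (@ARGV) {
    my @rows;
    while (my $line = <>) {
        $line =~ s/\r?\n\z//;                 # tolerate Windows line endings
        next if $line =~ /^\s*(?:#|\z)/;      # skip comments and blank lines
        my @f = split /\t/, $line, -1;
        die "Expected 6 columns at line $.: $line\n" unless @f == 6;
        push @rows, \@f;
    }
    print rows_to_tbl(@rows);
}
```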