Short read toolbox: Difference between revisions

From OpenWetWare

Revision as of 14:43, 14 October 2009

Short Read Toolbox

This page has been created to help facilitate the processing of Illumina short read data. Scripts and link information are provided.

Illumina data currently comes in two flavors: single-end and paired-end reads. Single-end reads come as single files and are pretty straightforward. Paired-end reads are taken from both ends of DNA fragments of approximately known size (size-selected by gel electrophoresis). Different software packages handle this data in different formats. The Illumina pipeline currently produces paired files, one file for each read of a pair. Within each paired set of files the records are listed in the same order, so that read 1a pairs with read 1b, read 2a pairs with read 2b, and so on. To simplify data format management I've tried to adhere to this format and to handle translation to other formats downstream. This also means that I've tried to separate my scripts into those that address single-end reads and those that address paired-end reads.

  • Single-end scripts work on a single file.
  • Paired-end scripts coordinate two paired files.
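
The lockstep layout described above (the Nth record of one file pairs with the Nth record of the other) can be sketched as follows. This is only a minimal illustration in Python, not one of the scripts from this page. It assumes the paired files are four-line-per-record FASTQ, which the page does not state, and the names read_fastq and paired_records are hypothetical.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) from a stream assumed to be 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            # End of file (a blank header line is also treated as the end).
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def paired_records(handle_a, handle_b):
    """Walk two paired files in lockstep; the Nth record of each file forms a pair."""
    for rec_a, rec_b in zip_longest(read_fastq(handle_a), read_fastq(handle_b)):
        if rec_a is None or rec_b is None:
            raise ValueError("paired files contain different numbers of records")
        yield rec_a, rec_b


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the two paired files (made-up reads).
    file_a = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n")
    file_b = io.StringIO("@read1/2\nTGCA\n+\nIIII\n@read2/2\nCCAT\n+\nIIII\n")
    for (name_a, seq_a, _), (name_b, seq_b, _) in paired_records(file_a, file_b):
        print(name_a, seq_a, name_b, seq_b)
```

Raising an error when one file runs out before the other is a design choice: if the two files fall out of step, every later pairing would be wrong, so it seems safer to stop than to continue silently.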

Single-End Scripts

To be done.

Code:

#!/usr/bin/perl
# A test
# EOF.

Paired-End Scripts

To be done.

#!/usr/bin/perl
# A test
# EOF.

GenBank Submission Files

Preparing sequences for submission to GenBank can be a somewhat arduous task. Here's a script I've made to try to help automate the process.

C:\> tab2gbtbl_v2.pl -a feature_table.txt
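
To give a rough idea of what this kind of conversion involves, here is a minimal Python sketch that turns tab-delimited rows into NCBI's tab-delimited feature table layout (a ">Feature" line per sequence, then "start, stop, feature key" lines, each followed by a qualifier line indented with three tabs). It is not the logic of tab2gbtbl_v2.pl, whose input format and options are not described here. The six-column input layout (sequence ID, start, stop, feature key, qualifier, value) and the function name rows_to_feature_table are assumptions made purely for illustration.

```python
import csv
import io


def rows_to_feature_table(rows):
    """Render (seq_id, start, stop, feature_key, qualifier, value) rows as a feature table.

    The six-column input layout is hypothetical; only the output layout
    follows NCBI's tab-delimited feature table format.
    """
    by_seq = {}  # seq_id -> list of features, in first-seen order
    for seq_id, start, stop, key, qualifier, value in rows:
        by_seq.setdefault(seq_id, []).append((start, stop, key, qualifier, value))

    lines = []
    for seq_id, features in by_seq.items():
        lines.append(f">Feature {seq_id}")
        for start, stop, key, qualifier, value in features:
            lines.append(f"{start}\t{stop}\t{key}")       # feature line
            lines.append(f"\t\t\t{qualifier}\t{value}")   # qualifier line
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example rows, not real submission data.
    tab_text = (
        "seq1\t1\t300\tgene\tgene\trbcL\n"
        "seq1\t1\t300\tCDS\tproduct\thypothetical protein\n"
    )
    rows = csv.reader(io.StringIO(tab_text), delimiter="\t")
    print(rows_to_feature_table(rows), end="")
```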

Useful links