Revision as of 16:24, 2 November 2009
Short Read Toolbox
This page was created to help with processing Illumina short-read data. It provides scripts and links to related software.
Illumina data currently comes in two flavors: single-end and paired-end reads. Single-end reads come as a single file and are straightforward to handle. Paired-end reads are sequenced from both ends of DNA fragments of approximately known size (size-selected by gel electrophoresis). Different software packages expect this data in different formats. The Illumina pipeline currently produces a pair of files for each paired-end data set, one for each end of the fragments. Within each pair of files the records are listed in the same order, so read 1a is the mate of read 1b, read 2a is the mate of read 2b, and so on. To simplify data format management I've tried to adhere to this format and to handle translations to other formats downstream of it. This also means that I've separated my scripts into those that address single-end reads and those that address paired-end reads.
- Single-end scripts work on a single file.
- Paired-end scripts coordinate two paired files.
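Because the records in the two paired files are listed in the same order, a paired-end script can read both files in lockstep and treat the n-th record of each as mates. The following is a minimal sketch, not one of the scripts on this page. It assumes plain 4-line FASTQ records and Illumina-style read names ending in /1 and /2; both are assumptions, and other naming schemes would need a different check in the hypothetical helper `base_id`.

```perl
#!/usr/bin/perl
# Sketch: walk two paired FASTQ files in lockstep and confirm that record n
# in file 1 is the mate of record n in file 2.
# Assumptions: 4-line FASTQ records; read names end in /1 and /2.
use strict;
use warnings;

# Read one 4-line FASTQ record; returns undef at end of file.
sub next_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return undef unless defined $header;
    my @rest = map { scalar <$fh> } 1 .. 3;
    die "Truncated FASTQ record after: $header" if grep { !defined } @rest;
    chomp($header, @rest);
    return { id => $header, seq => $rest[0], qual => $rest[2] };
}

# Strip a trailing /1 or /2 so the two mates' names can be compared.
sub base_id {
    my ($id) = @_;
    $id =~ s{/[12]$}{};
    return $id;
}

# Count read pairs, dying if the two files fall out of step.
sub count_pairs {
    my ($fh1, $fh2) = @_;
    my $pairs = 0;
    while (1) {
        my $r1 = next_record($fh1);
        my $r2 = next_record($fh2);
        last if !defined $r1 && !defined $r2;
        die "Files have different numbers of records\n"
            if !defined $r1 || !defined $r2;
        die "Mate mismatch: $r1->{id} vs $r2->{id}\n"
            if base_id($r1->{id}) ne base_id($r2->{id});
        $pairs++;
    }
    return $pairs;
}

if (@ARGV == 2) {
    open my $fh1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $fh2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    print count_pairs($fh1, $fh2), " read pairs\n";
}
```

Usage would be something like `perl check_pairs.pl reads_1.fq reads_2.fq` (file and script names hypothetical). Dying on the first mismatch is deliberate: once two paired files drift out of step, every later pair is suspect.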
Single-End Scripts
To be done.
Code:
#!/usr/bin/perl
# A test
# EOF.
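Until the single-end scripts are written, this is a minimal sketch of the kind of script that works on a single file. It is not the author's script. It assumes plain 4-line FASTQ records with no line wrapping and reports the read count and mean read length; the subroutine name `read_stats` is hypothetical.

```perl
#!/usr/bin/perl
# Sketch of a single-end script: report read count and mean read length
# for one FASTQ file. Assumes 4-line records (no wrapped sequence lines).
use strict;
use warnings;

sub read_stats {
    my ($fh) = @_;
    my ($reads, $bases) = (0, 0);
    while (my $header = <$fh>) {
        my $seq  = <$fh>;
        my $plus = <$fh>;    # the '+' separator line, not used here
        my $qual = <$fh>;
        die "Truncated record at: $header" unless defined $qual;
        chomp $seq;
        $reads++;
        $bases += length $seq;
    }
    my $mean = $reads ? $bases / $reads : 0;
    return ($reads, $mean);
}

if (@ARGV == 1) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ($reads, $mean) = read_stats($fh);
    printf "%d reads, mean length %.1f\n", $reads, $mean;
}
```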
Paired-End Scripts
To be done.
#!/usr/bin/perl
# A test
# EOF.
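One of the downstream translations mentioned above is turning the two paired files into a single interleaved file (1a, 1b, 2a, 2b, ...), which is how some tools take paired reads; Velvet, for example, has accepted paired reads as one interleaved file. The following is a minimal sketch, not one of the author's scripts, assuming 4-line FASTQ records; the subroutine names are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: interleave two paired FASTQ files into one (1a, 1b, 2a, 2b, ...).
# Assumes 4-line records listed in the same order in both files.
use strict;
use warnings;

# Return the next four lines as one string; undef at a clean end of file.
sub next_block {
    my ($fh) = @_;
    my $block = '';
    for (1 .. 4) {
        my $line = <$fh>;
        return undef if !defined $line && $block eq '';
        die "Truncated FASTQ record\n" unless defined $line;
        $line .= "\n" unless $line =~ /\n\z/;    # guard a missing final newline
        $block .= $line;
    }
    return $block;
}

# Write mate 1 then mate 2 for every pair; returns the number of pairs.
sub interleave {
    my ($in1, $in2, $out) = @_;
    my $pairs = 0;
    while (defined(my $r1 = next_block($in1))) {
        my $r2 = next_block($in2);
        die "Second file ran out of records\n" unless defined $r2;
        print {$out} $r1, $r2;
        $pairs++;
    }
    die "Second file has extra records\n" if defined next_block($in2);
    return $pairs;
}

if (@ARGV == 3) {
    open my $in1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $in2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    open my $out, '>', $ARGV[2] or die "Cannot write $ARGV[2]: $!";
    my $n = interleave($in1, $in2, $out);
    close $out or die "Cannot close $ARGV[2]: $!";
    print "Wrote $n interleaved pairs\n";
}
```

Keeping the paired-file layout as the master copy and generating interleaved files only when a tool needs them matches the approach described at the top of the page.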
List of Micro-Read Quality Control Assessors
- TileQC - Requires R, RMySQL and MySQL.
List of Open Source de Novo Assemblers
- Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
- Edena - Runs on 32- and 64-bit Linux.
- QSRA - Uses quality scores.
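For readers unfamiliar with the de Bruijn graph approach used by Velvet, here is a toy sketch of the core idea: every k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and overlapping reads share edges. This is only an illustration under simplifying assumptions (no reverse complements, error correction, or graph simplification) and is not Velvet's implementation; `build_graph` is a hypothetical name.

```perl
#!/usr/bin/perl
# Toy de Bruijn graph: each k-mer adds an edge prefix(k-1) -> suffix(k-1).
# Ignores reverse complements and sequencing errors, unlike real assemblers.
use strict;
use warnings;

sub build_graph {
    my ($k, @reads) = @_;
    die "k must be at least 2\n" if $k < 2;
    my %edges;    # (k-1)-mer prefix => { (k-1)-mer suffix => count }
    for my $read (@reads) {
        for my $i (0 .. length($read) - $k) {
            my $kmer = substr($read, $i, $k);
            $edges{ substr($kmer, 0, $k - 1) }{ substr($kmer, 1) }++;
        }
    }
    return \%edges;
}

# Usage: perl debruijn.pl K READ [READ ...]
if (@ARGV >= 2) {
    my ($k, @reads) = @ARGV;
    my $g = build_graph($k, @reads);
    for my $from (sort keys %$g) {
        for my $to (sort keys %{ $g->{$from} }) {
            print "$from -> $to ($g->{$from}{$to})\n";
        }
    }
}
```

With k = 3, the reads ACGT and CGTA share the 3-mer CGT, so the edge CG -> GT is seen twice; walking the graph from AC through CG and GT to TA spells out ACGTA.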
List of Open Source Reference Guided Assemblers
- RGA - Perl script which calls blat to assemble short reads.
- MAQ - Mapping and Assembly with Qualities.
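Tools such as MAQ and QSRA make use of the per-base quality string in each FASTQ record. Each character encodes a Phred score Q = ord(char) - offset, which corresponds to an error probability of 10^(-Q/10). The offset must be checked for your data: Sanger-style FASTQ uses 33, while output from Illumina pipeline versions 1.3 and later uses 64. The following is a minimal sketch of that decoding; the subroutine names are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: decode a FASTQ quality string into Phred scores and error
# probabilities. The ASCII offset (33 or 64) is an input you must verify.
use strict;
use warnings;

# Phred score for each character: Q = ord(char) - offset.
sub phred_scores {
    my ($qual, $offset) = @_;
    return map { ord($_) - $offset } split //, $qual;
}

# Probability that the base call is wrong: P = 10^(-Q/10).
sub error_prob {
    my ($q) = @_;
    return 10 ** (-$q / 10);
}

# Usage: perl phred.pl QUALITY_STRING OFFSET
if (@ARGV == 2) {
    my ($qual, $offset) = @ARGV;
    for my $q (phred_scores($qual, $offset)) {
        printf "Q%-3d  P(error)=%.4g\n", $q, error_prob($q);
    }
}
```

For example, the character `h` decodes to Q40 with offset 64, while `I` decodes to Q40 with offset 33; decoding with the wrong offset shifts every score by 31.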
List of Alignment Programs
- MAFFT
- MUMmer
- Blat
Online Short-Read Resources
GenBank Submission Files
Preparing sequences for submission to GenBank can be a somewhat arduous task. Here's a script I've written to help automate the process.
C:\> tab2gbtbl_v2.pl -a feature_table.txt
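The input layout and options of tab2gbtbl_v2.pl (including `-a`) are not described here, so the following is not that script. It is a minimal sketch of the same kind of conversion, assuming a hypothetical tab-delimited input with the columns `seq_id, start, stop, feature_key, qualifier_key, qualifier_value`, and writing GenBank's five-column feature table layout: a `>Feature seq_id` line, then `start<TAB>stop<TAB>key` for each feature, then qualifier lines preceded by three tabs. Consecutive rows for the same feature add further qualifiers.

```perl
#!/usr/bin/perl
# Sketch (hypothetical input layout, not tab2gbtbl_v2.pl):
#   seq_id  start  stop  feature_key  qualifier_key  qualifier_value
# converted to the five-column feature table (.tbl) layout.
use strict;
use warnings;

sub to_tbl {
    my ($in, $out) = @_;
    my ($cur_seq, $cur_feat) = ('', '');
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*$/ || $line =~ /^#/;    # skip blanks and comments
        my ($seq, $start, $stop, $key, $qkey, $qval) = split /\t/, $line;
        die "Too few columns: $line\n" unless defined $key;
        if ($seq ne $cur_seq) {                        # new sequence block
            print {$out} ">Feature $seq\n";
            ($cur_seq, $cur_feat) = ($seq, '');
        }
        my $feat = join "\t", $start, $stop, $key;
        if ($feat ne $cur_feat) {                      # new feature line
            print {$out} "$feat\n";
            $cur_feat = $feat;
        }
        # Qualifier lines start with three empty columns.
        print {$out} "\t\t\t$qkey\t$qval\n" if defined $qkey && defined $qval;
    }
}

if (@ARGV == 1) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    to_tbl($in, \*STDOUT);
}
```

Grouping consecutive rows by (start, stop, key) keeps the input one qualifier per row, which is easy to maintain in a spreadsheet, while producing one feature line with several qualifiers in the table.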