Short read toolbox: Difference between revisions

From OpenWetWare
Line 46:
=Online Short-Read Resources=
*[http://seqanswers.com/ SEQanswers]
*[http://pathogenomics.bham.ac.uk/blog/2009/09/tips-for-de-novo-bacterial-genome-assembly/ De novo tips] - Blog on de novo assembly

=GenBank Submission Files=

Revision as of 17:22, 2 November 2009

Short Read Toolbox

This page was created to facilitate the processing of Illumina short-read data. Scripts and links are provided.

Illumina data currently comes in two flavors: single-end and paired-end reads. Single-end reads come as single files and are pretty straightforward. Paired-end reads are reads from both ends of DNA fragments of approximately known size (size-selected by gel electrophoresis). Different software packages expect this data in different formats. The Illumina pipeline currently produces a pair of files for each paired-end data set. Within each pair of files, the records are listed in the same order, so that read 1a is the mate of 1b, read 2a is the mate of 2b, and so on. To simplify data format management, I've tried to adhere to this format and to handle translation to other formats downstream. This also means that I've tried to separate my scripts into those that address single-end reads and those that address paired-end reads.

  • Single-end scripts work on a single file.
  • Paired-end scripts coordinate two paired files.
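
The record ordering described above (read 1a in one file matching read 1b in the other) can be sketched as a lockstep walk over both files. This is only a minimal illustration, not one of the scripts this page promises: it assumes the reads are stored as 4-line FASTQ records, and the names read_fastq_record and interleave_pairs, as well as the read names in the demo data, are hypothetical. It writes the pairs out interleaved (1a, 1b, 2a, 2b, ...), which is one example of a downstream format translation.

```perl
#!/usr/bin/perl
# Sketch only: walk two paired FASTQ files in lockstep and interleave them.
# Assumes 4-line FASTQ records and identical record order in both files.
use strict;
use warnings;

# Read one 4-line FASTQ record; return an array ref of lines, or nothing at EOF.
sub read_fastq_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    return if @lines == 0;
    die "Truncated FASTQ record\n" if @lines < 4;
    return \@lines;
}

# Write records alternately (1a, 1b, 2a, 2b, ...); return the number of pairs.
sub interleave_pairs {
    my ($fh_a, $fh_b, $out) = @_;
    my $pairs = 0;
    while (1) {
        my $rec_a = read_fastq_record($fh_a);
        my $rec_b = read_fastq_record($fh_b);
        last if !defined $rec_a && !defined $rec_b;
        die "Paired files differ in record count\n"
            if !defined $rec_a || !defined $rec_b;
        print {$out} join("\n", @$rec_a, @$rec_b), "\n";
        $pairs++;
    }
    return $pairs;
}

# Demo on made-up in-memory data, so the sketch runs without any input files.
my $reads_a = "\@r1/1\nACGT\n+\nIIII\n\@r2/1\nTTGA\n+\nIIII\n";
my $reads_b = "\@r1/2\nGGCA\n+\nIIII\n\@r2/2\nCCAT\n+\nIIII\n";
my $interleaved = '';
open(my $fh_a, '<', \$reads_a)     or die "Cannot open in-memory file: $!";
open(my $fh_b, '<', \$reads_b)     or die "Cannot open in-memory file: $!";
open(my $out,  '>', \$interleaved) or die "Cannot open in-memory file: $!";
my $pair_count = interleave_pairs($fh_a, $fh_b, $out);
close $out;
print "Interleaved $pair_count pairs\n";
```

To use it on real data, the in-memory handles would be replaced by handles opened on the two paired files. The script stops with an error if one file runs out of records before the other, since that means the pairing has been lost.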

Single-End Scripts

To be done.

Code:

#!/usr/bin/perl
# A test
# EOF.

Paired-End Scripts

To be done.

Code:

#!/usr/bin/perl
# A test
# EOF.

List of Micro-Read Quality Control Assessors

  • TileQC - Requires R, RMySQL and MySQL.

List of Open Source de Novo Assemblers

  • Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
  • Edena - 32- and 64-bit Linux.
  • QSRA - Utilizes quality scores.

List of Open Source Reference Guided Assemblers

  • RGA - Perl script which calls blat to assemble short reads.
  • MAQ - Mapping and Assembly with Qualities.

List of Alignment Programs

Online Short-Read Resources

  • SEQanswers
  • De novo tips - Blog on de novo assembly

GenBank Submission Files

Preparing sequences for submission to GenBank can be a somewhat arduous task. Here's a script I've made to try to help automate the process.

C:\> tab2gbtbl_v2.pl -a feature_table.txt
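
For readers who have not seen the target format, here is a rough sketch of turning tab-delimited annotation rows into GenBank's 5-column feature table. It is not the tab2gbtbl_v2.pl script and says nothing about what that script's -a option does. The input column layout (sequence ID, start, stop, feature key, then qualifier/value pairs), the sub name rows_to_feature_table, and the demo annotations are all assumptions made for illustration.

```perl
#!/usr/bin/perl
# Sketch only: build a GenBank 5-column feature table from tab-delimited rows.
# The input column layout below is hypothetical, not that of tab2gbtbl_v2.pl.
use strict;
use warnings;

# Each input row: seq_id, start, stop, feature key, then qualifier/value pairs.
sub rows_to_feature_table {
    my (@rows) = @_;
    my $table   = '';
    my $current = '';
    for my $row (@rows) {
        chomp(my $line = $row);
        my ($seq_id, $start, $stop, $key, @quals) = split /\t/, $line;
        # Start a new table whenever the sequence identifier changes.
        if ($seq_id ne $current) {
            $table .= ">Feature $seq_id\n";
            $current = $seq_id;
        }
        # Feature line: start, stop, and feature key in columns 1-3.
        $table .= join("\t", $start, $stop, $key) . "\n";
        # Qualifier lines: columns 1-3 empty, key and value in columns 4-5.
        while (@quals >= 2) {
            my ($q_key, $q_value) = splice(@quals, 0, 2);
            $table .= "\t\t\t$q_key\t$q_value\n";
        }
    }
    return $table;
}

# Made-up demo rows; a start greater than the stop marks the reverse strand.
my @demo_rows = (
    join("\t", 'contig1', 1,   300, 'gene', 'gene',    'abcA'),
    join("\t", 'contig1', 1,   300, 'CDS',  'product', 'hypothetical protein'),
    join("\t", 'contig1', 900, 400, 'gene', 'gene',    'abcB'),
);
my $feature_table = rows_to_feature_table(@demo_rows);
print $feature_table;
```

The output starts with a ">Feature contig1" line, followed by one location line per feature and one indented line per qualifier.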

Useful links