Blast link

From OpenWetWare

Revision as of 06:30, 20 March 2011 by Ben Woodcroft (Talk | contribs)

Blast_link is an add-on for wwwblast that allows arbitrary links to be added to each hit on the results page. The common (default) use case is a link that retrieves the entire sequence of a hit, which the default install of wwwblast does not permit. This use case is referred to below as the link-to-sequences mode.

Pre-requisites

Blast_link requires Perl. The default link-to-sequences mode of blast_link also requires the BLAST+ executables makeblastdb and blastdbcmd.

Installation

Pre-requisites

Blast_link is an add-on for wwwblast, and requires that to be installed first. Note that the preparation of the binary BLAST databases is slightly more complex when working with blast_link, so follow the instructions on this wiki page and not the wwwblast wiki page for that part.

Download

The newest version of blast_link can be obtained from GitHub (less direct link). Blast_link is an entirely open source program, and modifications and improvements are appreciated. It is not affiliated with NCBI.

Put the extracted files in the base blast directory (the directory that blast.html is in).

Once the extracted files are in place, you should be able to navigate to the new blast query page, blast_link.cgi. For example, the URL might be http://localhost/blast/blast_link.cgi. The old blast.html page will still work, but searches submitted from it bypass the blast_link code.

Preparation of binary BLAST+ databases

For the link-to-sequences mode of blast_link, creation of the binary databases is slightly more onerous than usual, because the sequence identifiers need to be indexed by BLAST+. In order to be indexed, two conditions must be met:

  1. Each sequence identifier in the fasta file must conform to the NCBI naming standards.
  2. When issuing the makeblastdb command using the fasta file, the -parse_seqids flag must be used.

The simplest way to make the fasta file conform to the standard is to prepend 'gnl|blast|' to each sequence identifier. You might do this in a text editor with a find and replace of '>' with '>gnl|blast|', but only if no '>' characters occur anywhere other than at the start of an identifier line. The same task can also be done with sed, a command line tool that comes by default with OSX and Linux:

$ sed 's/^>/>gnl|blast|/' mysequences.fasta >mysequences.ncbi_standard_ids.fasta
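Whether a fasta file contains stray '>' characters, and whether the prefix was applied to every identifier line, can be checked from the command line. The following is a minimal sketch rather than part of blast_link itself; the file name example.fasta and its contents are made up for illustration, and it assumes only standard grep and sed.

```shell
# Create a tiny example fasta file (hypothetical contents, for illustration only).
printf '>Contig1 sample > with stray character\nATGC\n>Contig2\nGGCC\n' > example.fasta

# Count lines that contain a '>' anywhere other than the first column.
# A non-zero count means a naive editor find-and-replace would alter those lines too.
# '|| true' keeps the script going when grep finds nothing (exit status 1).
stray=$(grep -c '.>' example.fasta || true)
echo "lines with a stray '>': $stray"

# The anchored sed expression only touches a '>' at the start of a line.
sed 's/^>/>gnl|blast|/' example.fasta > example.ncbi_standard_ids.fasta

# Every identifier line should now start with the prefix; count any that do not.
unprefixed=$(grep '^>' example.ncbi_standard_ids.fasta | grep -vc '^>gnl|blast|' || true)
echo "identifier lines missing the prefix: $unprefixed"
```

In this example the stray '>' on the first identifier line is reported by the first check and left untouched by sed, because the expression is anchored with '^'.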

Then all that remains is to create the binary BLAST database using the -parse_seqids flag. For instance, if mysequences.ncbi_standard_ids.fasta is a fasta file of nucleotide sequences,

$ makeblastdb -in mysequences.ncbi_standard_ids.fasta -dbtype nucl -parse_seqids
Building a new DB, current time: 09/23/2010 14:12:18
New DB name:   mysequences.ncbi_standard_ids.fasta
New DB title:  mysequences.ncbi_standard_ids.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 1620 sequences in 0.207906 seconds.

Then the name of the database to specify in blast.rc and blast.html will be 'mysequences.ncbi_standard_ids.fasta'.
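As an optional sanity check, the number of identifier lines in the fasta file can be compared by eye with the "added N sequences" figure that makeblastdb reports. The sketch below also looks for repeated identifiers, on the assumption (not stated in the instructions above) that indexed identifiers need to be unique. The example file and its contents are hypothetical.

```shell
# Create a small example file with prefixed identifiers (hypothetical contents).
printf '>gnl|blast|Contig1\nATGC\n>gnl|blast|Contig2\nGGCC\n>gnl|blast|Contig3\nTTAA\n' \
  > example.ncbi_standard_ids.fasta

# Number of sequences in the fasta file, to compare with makeblastdb's report.
n_headers=$(grep -c '^>' example.ncbi_standard_ids.fasta || true)
echo "fasta records: $n_headers"

# Number of identifiers (first word of each identifier line) that occur more than once.
dups=$(grep '^>' example.ncbi_standard_ids.fasta | awk '{print $1}' | sort | uniq -d | wc -l | tr -d ' ')
echo "repeated identifiers: $dups"
```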

To test if the parsing worked, use the blastdbcmd tool, like so:

$ blastdbcmd -entry 'gnl|blast|Contig2' -db mysequences.ncbi_standard_ids.fasta
>Contig2
ATGCAAAACCCCCCCC

Using blast_link

Once you have set up blast.rc and blast.html as for wwwblast, blast_link is ready to go. From the entry form, e.g. http://localhost/blast/blast_link.cgi, you should be able to click on hits to get their full sequences.

Acknowledgements

Blast_link was created by Ben J. Woodcroft working in Bernie Degnan's group at the University of Queensland and Stuart Ralph's group at the University of Melbourne. Creation of this wiki page was funded by Dan Jackson's group at the University of Göttingen.
