User:Lindenb/Notebook/UMR915/20110222: Difference between revisions


Revision as of 06:07, 22 February 2011



Belgium

Re-aligning on hg19. See the Makefile in

 /GENOTYPAGE/data/users/lindenb/20101108_belgium/20110222
/usr/local/package/mosaik-aligner/bin/MosaikBuild -fr /GENOTYPAGE/data/pubdb/ucsc/hg19/chromosomes/hg19.fa -oa _ignore.backup/hg19/reference.dat
------------------------------------------------------------------------------
MosaikBuild 1.1.0018                                                2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- converting /GENOTYPAGE/data/pubdb/ucsc/hg19/chromosomes/hg19.fa to a reference sequence archive.

- parsing reference sequences:
ref seqs: 25 (0.2884 ref seqs/s)

- writing reference sequences:
100%[===========================================================================================================]    0.8177 ref seqs/s        in 30 s  

- calculating MD5 checksums:
100%[===========================================================================================================]      1.43 ref seqs/s        in 17 s  

- writing reference sequence index:
100%[===========================================================================================================]      25.0 ref seqs/s        in  1 s  

- creating concatenated reference sequence:
100%[===========================================================================================================]      7.13 ref seqs/s        in  3 s  

- writing concatenated reference sequence...        finished.
- creating concatenated 2-bit reference sequence... finished.
- writing concatenated 2-bit reference sequence...  finished.
- writing masking vector...                         finished.

MosaikBuild CPU time: 187.420 s, wall time: 200.587 s
/usr/local/package/mosaik-aligner/bin/MosaikJump -ia _ignore.backup/hg19/reference.dat -hs 15 -out _ignore.backup/hg19/reference_hs15
------------------------------------------------------------------------------
MosaikJump 1.1.0018                                                 2010-10-29
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------


- retrieving reference sequence... finished.

- hashing reference sequence:
100%[=============================================================================================================] 2,301,657 hashes/s       in 22:24  

- serializing final sorting vector... finished.

- writing jump positions database:
100%[=====================================================================================================] 1,423,918 hash positions/s       in 33:29  

- serializing jump keys database (17 blocks):
blocks: 17 (0.9169 blocks/s)

MosaikJump CPU time: 3327.950 s, wall time: 3405.920 s

The mean read length for sample 1 is 385.282, computed with:

gunzip -c 1.TCA.454Reads.fna.gz | grep '^>' | tr ' ' '\n' | grep '^length=' | cut -d '=' -f 2 | awk '{ n++; total += $1 } END { print total/n }'
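To make the steps of that pipeline easier to follow, here is a minimal sketch that wraps them in a function and runs them on two made-up records. It assumes, as the command above does, that each FASTA header carries a space-separated `length=N` token. The function name `mean_read_length`, the read names and the `rank=` token are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical helper: mean of the length=N values found in FASTA headers on stdin.
mean_read_length() {
    grep '^>' |              # keep header lines only
        tr ' ' '\n' |        # one header token per line
        grep '^length=' |    # keep the length=N tokens
        cut -d '=' -f 2 |    # keep N
        awk '{ n++; total += $1 } END { if (n) print total / n }'
}

# Two made-up records (names, tokens and sequences are illustrative only).
printf '%s\n' \
    '>read1 rank=1 length=4' 'ACGT' \
    '>read2 rank=2 length=8' 'ACGTACGT' | mean_read_length
# prints 6
```

On the real data the function would be fed with `gunzip -c 1.TCA.454Reads.fna.gz | mean_read_length`.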

Using

 -mmp 0.05 -act 26 -bw 51 -mhp 100

as the MosaikAligner parameters (see http://code.google.com/p/mosaik-aligner/wiki/ParameterSettings).
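As a rough worked example of what `-mmp 0.05` implies for these reads: assuming the threshold is applied as a fraction of the full read length (an assumption; the aligner log only reports "a maximum mismatch percent threshold of 0.05"), a read of the mean length 385.282 could carry about 19 mismatches. The helper name `max_mismatches` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: upper bound on mismatches for a read of a given length,
# ASSUMING the -mmp fraction is applied to the full read length.
max_mismatches() {
    # $1 = read length, $2 = -mmp value
    awk -v len="$1" -v mmp="$2" 'BEGIN { printf "%d\n", int(len * mmp) }'
}

max_mismatches 385.282 0.05
# prints 19
```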

Align

/usr/local/package/mosaik-aligner/bin/MosaikAligner -mmp 0.05 -act 26 -bw 51 -mhp 100  -p 7 -in ../reads1.mkb -ia _ignore.backup/hg19/reference.dat -j _ignore.backup/hg19/reference_hs15 -out align1.hg19.mka
------------------------------------------------------------------------------
MosaikAligner 1.1.0018                                              2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions
- Using the following alignment mode: aligning reads to all possible locations
- Using a maximum mismatch percent threshold of 0.05
- Using 7 processors
- Using a Smith-Waterman bandwidth of 51
- Using an alignment candidate threshold of 26bp.
- Setting hash position threshold to 100
- Using a jump database for hashing. Storing keys & positions in memory.
- Using a homo-polymer gap open penalty of 4
- loading reference sequence... finished.
- loading jump key database into memory... finished.
- loading jump positions database into memory... finished.

Aligning read library (317992):
100%[============================================================================================================================]      83.7 reads/s   in   1:03:18  

Alignment statistics (mates):
===================================
# filtered out:     19943 (  6.3 %)
# unique:          288449 ( 90.7 %)
# non-unique:        9600 (  3.0 %)
-----------------------------------
total:             317992
total aligned:     298049 ( 93.7 %)

MosaikAligner CPU time: 26744.380 s, wall time: 4015.994 s
/usr/local/package/mosaik-aligner/bin/MosaikAligner -mmp 0.05 -act 26 -bw 51 -mhp 100  -p 7 -in ../reads2.mkb -ia _ignore.backup/hg19/reference.dat -j _ignore.backup/hg19/reference_hs15 -out align2.hg19.mka
------------------------------------------------------------------------------
MosaikAligner 1.1.0018                                              2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions
- Using the following alignment mode: aligning reads to all possible locations
- Using a maximum mismatch percent threshold of 0.05
- Using 7 processors
- Using a Smith-Waterman bandwidth of 51
- Using an alignment candidate threshold of 26bp.
- Setting hash position threshold to 100
- Using a jump database for hashing. Storing keys & positions in memory.
- Using a homo-polymer gap open penalty of 4
- loading reference sequence... finished.
- loading jump key database into memory... finished.
- loading jump positions database into memory... finished.

Aligning read library (563204):
100%[============================================================================================================================]     131.1 reads/s   in   1:11:35  

Alignment statistics (mates):
===================================
# filtered out:     28749 (  5.1 %)
# unique:          519486 ( 92.2 %)
# non-unique:       14969 (  2.7 %)
-----------------------------------
total:             563204
total aligned:     534455 ( 94.9 %)

MosaikAligner CPU time: 30219.450 s, wall time: 4500.977 s
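The statistics in the two logs above can be cross-checked with a small script. This is a sketch under stated assumptions, not part of the original Makefile: the function name `check_mosaik_log` is hypothetical, and it relies only on the line layout visible in the logs above. For each run it verifies that unique + non-unique equals "total aligned", recomputes the aligned percentage, and reports the CPU time to wall time ratio (with `-p 7`, a value close to 7 suggests the processors were kept busy).

```shell
#!/bin/sh
# Hypothetical helper: summarise each MosaikAligner run found in a log on stdin.
check_mosaik_log() {
    awk '
        /^# unique:/      { unique = $3 }
        /^# non-unique:/  { nonunique = $3 }
        /^total:/         { total = $2 }
        /^total aligned:/ { aligned = $3 }
        /CPU time:/ {
            # This line closes a run: $4 is the CPU time, $8 the wall time.
            status = (unique + nonunique == aligned) ? "ok" : "MISMATCH"
            printf "%s aligned=%d/%d (%.1f%%) cpu/wall=%.2f\n",
                   status, aligned, total, 100 * aligned / total, $4 / $8
        }
    '
}

# Statistics copied from the first run above.
check_mosaik_log <<'EOF'
# filtered out:     19943 (  6.3 %)
# unique:          288449 ( 90.7 %)
# non-unique:        9600 (  3.0 %)
total:             317992
total aligned:     298049 ( 93.7 %)
MosaikAligner CPU time: 26744.380 s, wall time: 4015.994 s
EOF
# prints: ok aligned=298049/317992 (93.7%) cpu/wall=6.66
```

Fed with the second run's figures, the same check gives 94.9 % aligned and a CPU/wall ratio of about 6.71.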