User:Timothee Flutre/Notebook/Postdoc/2013/12/01: Difference between revisions
From OpenWetWare
| colspan="2"|
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
==One-liners for high-throughput sequencing data==
* '''software''': see BWA, Bowtie, MOSAIK, etc.; these aligners take FASTQ files as input and output alignments in SAM or BAM format. The commands below use the following file names:
 IN1="reads_R1.fastq.gz"
 IN2="reads_R2.fastq.gz"
 OUT="alignments.bam"
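For illustration, a paired-end alignment with BWA can be piped straight into samtools so that a sorted BAM is written without an intermediate SAM file. This is a minimal sketch, assuming BWA and samtools 1.x are installed and that the reference FASTA (here a hypothetical ref.fa) was indexed beforehand with bwa index; the helper name align_pe is also hypothetical, and other aligners use different command lines.

```shell
# align_pe: hypothetical helper (not part of BWA or samtools).
# Usage: align_pe REF.fa READS_R1.fastq.gz READS_R2.fastq.gz OUT.bam
align_pe() {
  ref=$1; r1=$2; r2=$3; out=$4
  # Stop early if the reference FASTA is missing; it must also have been indexed with "bwa index".
  if [ ! -f "$ref" ]; then
    echo "align_pe: reference '$ref' not found" >&2
    return 1
  fi
  # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-")
  # and writes a coordinate-sorted BAM to $out.
  bwa mem "$ref" "$r1" "$r2" | samtools sort -o "$out" -
}

# Example, with the variables defined above and an indexed ref.fa:
# align_pe ref.fa "$IN1" "$IN2" "$OUT"
```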
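* '''total number of reads in the fastq file''': each FASTQ record spans 4 lines, so the line count is divided by 4 (for paired-end data, the R1 count is also the number of read pairs):
 nbLines=$(zcat "$IN1" | wc -l); echo $(( nbLines / 4 ))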
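The division by 4 assumes every record spans exactly four lines (no wrapped sequences). As a sketch, awk can do the count in one step and warn when the line count is not a multiple of 4, which usually points to a truncated file. The helper name count_fastq_reads is hypothetical, and gzip -dc is used instead of zcat because zcat behaves differently on some systems (e.g. older macOS).

```shell
# count_fastq_reads: hypothetical helper; prints the number of records in a gzipped FASTQ file.
count_fastq_reads() {
  # Decompress from stdin so the file name's suffix does not matter.
  gzip -dc < "$1" | awk '
    END {
      # NR still holds the total number of lines in the END block.
      if (NR % 4 != 0) print "warning: line count is not a multiple of 4" > "/dev/stderr"
      print int(NR / 4)
    }'
}

# Usage: count_fastq_reads "$IN1"
```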
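* '''total number of reads in the bam file''': counts the distinct read names (QNAME, column 1); both mates of a pair and any secondary alignments share the same name, so this is the number of read pairs and should equal the number of reads in one FASTQ file if no filtering was done
 samtools view "$OUT" | cut -f1 | sort -u | wc -l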
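Sorting all read names can be slow and uses temporary disk space on large BAM files. As an alternative sketch (a different technique from sort -u, trading disk for memory), awk can count distinct names with a hash table in a single pass; the helper name count_distinct_qnames is hypothetical.

```shell
# count_distinct_qnames: hypothetical helper; reads SAM records on stdin
# and prints how many distinct QNAMEs (column 1) they contain.
count_distinct_qnames() {
  awk -F '\t' '
    /^@/ { next }            # skip header lines (a QNAME cannot start with "@")
    !seen[$1]++ { n++ }      # count a name only the first time it appears
    END { print n + 0 }'
}

# Usage: samtools view "$OUT" | count_distinct_qnames
```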
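* '''flag statistics in SAM/BAM''':
 samtools flagstat "$OUT"
which returns something like the following, where each count is split into QC-passed + QC-failed reads and the second percentage is -nan% because no read failed QC: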
 4635834 + 0 in total (QC-passed reads + QC-failed reads)
 20290 + 0 secondary
 0 + 0 supplimentary
 0 + 0 duplicates
 4443270 + 0 mapped (95.85%:-nan%)
 4615544 + 0 paired in sequencing
 2307772 + 0 read1
 2307772 + 0 read2
 4299122 + 0 properly paired (93.14%:-nan%)
 4412810 + 0 with itself and mate mapped
 10170 + 0 singletons (0.22%:-nan%)
 57898 + 0 with mate mapped to a different chr
 44330 + 0 with mate mapped to a different chr (mapQ>=5)
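In this output, the mapped percentage is the "mapped" count divided by the "in total" count, secondary alignments included (4443270 / 4635834 ≈ 95.85%). A minimal sketch that recomputes it from the flagstat text, assuming the layout shown above (QC-passed count, "+", QC-failed count, label) and using only the QC-passed counts; the helper name mapped_pct is hypothetical.

```shell
# mapped_pct: hypothetical helper; reads `samtools flagstat` output on stdin
# and prints 100 * mapped / total with two decimals (QC-passed counts only).
mapped_pct() {
  awk '
    $4 == "in" && $5 == "total" { total = $1 }   # "N + 0 in total (...)"
    $4 == "mapped"              { mapped = $1 }  # "N + 0 mapped (...)", not "with ... mapped"
    END { if (total > 0) printf "%.2f\n", 100 * mapped / total }'
}

# Usage: samtools flagstat "$OUT" | mapped_pct
```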
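* '''total number of entries in the bam file''': same as the first line ("in total") of the flagstat output, since secondary and supplementary alignments are counted as separate entries
 samtools view "$OUT" | wc -l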
* '''list of the distinct flags in the bam file''', along with their number of occurrences:
 samtools view "$OUT" | cut -f2 | sort | uniq -c
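The numbers in the second column are bit fields defined by the SAM specification (0x1 paired, 0x2 proper pair, 0x4 unmapped, 0x8 mate unmapped, 0x10 reverse strand, 0x20 mate reverse strand, 0x40 first in pair, 0x80 second in pair, 0x100 secondary, 0x200 QC fail, 0x400 duplicate, 0x800 supplementary). Recent samtools versions provide a flags subcommand for decoding them; as a self-contained alternative, here is a sketch of a decoder in plain POSIX shell. The helper name decode_flag is hypothetical and its labels are modelled on those printed by samtools flags.

```shell
# decode_flag: hypothetical helper; prints the comma-separated names of the bits set in a SAM FLAG.
decode_flag() {
  flag=$1
  bit=1
  out=""
  # Names are listed from bit 0x1 up to bit 0x800.
  for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
    if [ $(( flag & bit )) -ne 0 ]; then
      out="${out:+$out,}$name"   # add a comma only if something is already there
    fi
    bit=$(( bit * 2 ))
  done
  printf '%s\n' "$out"
}

# Usage: samtools view "$OUT" | cut -f2 | sort -n | uniq -c |
#          while read -r n f; do echo "$n $f $(decode_flag "$f")"; done
```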
* '''total number of mapped entries''': same as the "mapped" line (5th line) of the flagstat output; -F 4 excludes entries with the 0x4 (unmapped) flag
 samtools view -F 4 "$OUT" | wc -l
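* '''total number of unmapped entries''': the "in total" count minus the "mapped" count of the flagstat output (4635834 - 4443270 = 192564 in the example above); -f 4 keeps only entries with the 0x4 flag
 samtools view -f 4 "$OUT" | wc -l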
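Both counts can also be obtained in a single pass over the file. A sketch with awk that tests the 0x4 (unmapped) bit arithmetically, so that it also runs on awk implementations without bitwise functions; the helper name count_mapping is hypothetical.

```shell
# count_mapping: hypothetical helper; reads SAM records on stdin and
# prints the number of mapped and unmapped entries.
count_mapping() {
  awk -F '\t' '
    /^@/ { next }                                # skip header lines if present
    # int(FLAG / 4) % 2 isolates bit 0x4 (segment unmapped).
    { if (int($2 / 4) % 2 == 1) u++; else m++ }
    END { printf "mapped=%d unmapped=%d\n", m, u }'
}

# Usage: samtools view "$OUT" | count_mapping
```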
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->
Revision as of 06:48, 1 July 2015