User talk:Darek Kedra/sandbox 28: Difference between revisions
From OpenWetWare
Revision as of 16:34, 6 November 2013
Winterschool program
Software list
Basics
- Linux: Ubuntu 12.04.3 vs Debian 7.1 (consider 32-bit vs 64-bit versions)
- java http://www.java.com/en/download/linux_manual.jsp?locale=en
Specific tools 1
- TagDust: http://genome.gsc.riken.jp/osc/english/software/src/tagdust.tgz
- fastareformat from exonerate-2.2.0 [1]
- fixing fasta headers (gff fields) with python? small script
- GEM [2]
- CAVEAT: (problem with cores on different laptops...)
http://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%203/
- BWA http://sourceforge.net/projects/bio-bwa/files/
- Stampy http://www.well.ox.ac.uk/~gerton/software/Stampy/stampy-1.0.22r1848.tgz
- last http://last.cbrc.jp/ (version 362 has split and splice-mapping options)
- bowtie http://bowtie-bio.sourceforge.net/bowtie2/index.shtml (bowtie2)
- samtools http://sourceforge.net/projects/samtools/files/
- picard http://sourceforge.net/projects/picard/files/
- IGV/ IGVtools http://www.broadinstitute.org/software/igv/download
- bamtools https://github.com/pezmaster31/bamtools
- requires cmake: http://www.cmake.org/files/v2.8/cmake-2.8.12.tar.gz (or install it with apt-get)
- bedtools http://code.google.com/p/bedtools/downloads/list
- GATK http://www.broadinstitute.org/gatk/auth?package=GATK (download yourself: license!)
- vcftools http://sourceforge.net/projects/vcftools/files/
Specific tools 2/RNA-Seq
- tophat http://tophat.cbcb.umd.edu/
- cufflinks http://cufflinks.cbcb.umd.edu/ (may require Boost libs!)
- GEMtools https://github.com/gemtools/gemtools
Vagrant fixes
For X11 forwarding the Vagrantfile has to contain
config.ssh.forward_x11 = true
Introduction to Linux and the command line
- why Linux?
- logging in, connecting to other servers with ssh / sftp
- copy, rename/move files, create directories, symbolic links
- view files (more/less, head, tail), count (wc)
- search for strings / replace strings (grep & sed)
- compressing / uncompressing files (gzip, bzip2, tar)
- pipelines and redirection
- awk in 5 minutes
- where to go from there (clusters, python)
FASTQ
- Illumina file formats (quality encodings)
- paired / unpaired reads
- quality checking (fastqc)
- trimming & filtering (TagDust)
- sources of published FASTQ data: Sequence Read Archive (SRA, formerly Short Read Archive) vs ENA
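The quality-encoding topic above can be illustrated with a small Python sketch. It guesses whether quality strings use ASCII offset 33 or 64 and decodes them to integer scores. The function names are hypothetical, and the thresholds assume Illumina-style score ranges (offset-33 scores no higher than 'J'), so treat the result as a hint rather than a definitive answer.

```python
def guess_quality_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Heuristic only, assuming Illumina-style score ranges.
    Returns 33, 64, or None when the characters fit both encodings.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if min(codes) < 59:   # characters below ';' do not occur with offset 64
        return 33
    if max(codes) > 74:   # characters above 'J' are assumed not to occur with offset 33
        return 64
    return None           # ambiguous: look at more reads


def decode_qualities(quality_string, offset=33):
    """Convert one quality string into a list of integer scores."""
    return [ord(ch) - offset for ch in quality_string]
```

In practice the guess should be made over many reads, since a handful of uniformly high-quality reads is compatible with both encodings.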
Genomic fasta and gtf/gff gene annotation
- resources at ENSEMBL
- basic checks and reformatting
- grepping fasta headers
- fastareformat from exonerate?
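As one possible version of the "small script" for fixing FASTA headers, the sketch below cuts each header down to its first whitespace-delimited word, so that sequence names can match the first column of a GFF/GTF file. The function names are hypothetical, and it assumes the sequence ID is the first word of the header, which should be checked for the files actually used.

```python
def simplify_fasta_headers(lines):
    """Yield FASTA lines with each header cut down to its first word."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            # keep only the sequence ID; a bare ">" is passed through unchanged
            yield ">" + fields[0] if fields else line
        else:
            yield line


def fasta_ids(lines):
    """Return the sequence IDs, similar to grepping for '>' lines."""
    return [l[1:] for l in simplify_fasta_headers(lines) if l.startswith(">")]
```

The ID list can then be compared against the sequence names used in the annotation file before any mapping is started.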
Mapping genomic reads
- overview of mappers
- GEM
- bwa +/- stampy
- last / bowtie
- mapping steps (for each mapper)
- genome indexing
- mapping
- +/- postprocessing
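The indexing and mapping steps can be sketched for one mapper by driving BWA from Python. This assumes a BWA release that provides the `mem` subcommand (0.7 or newer) and that `bwa` is on the PATH. The function names and file names are placeholders, and postprocessing is left out because it depends on the samtools version installed.

```python
import subprocess


def bwa_commands(reference, reads1, reads2=None):
    """Build the index and mapping command lines for BWA."""
    index_cmd = ["bwa", "index", reference]       # genome indexing
    map_cmd = ["bwa", "mem", reference, reads1]   # mapping
    if reads2 is not None:                        # second file for paired reads
        map_cmd.append(reads2)
    return index_cmd, map_cmd


def run_bwa(reference, reads1, sam_out, reads2=None):
    """Run both steps; bwa mem writes SAM to standard output."""
    index_cmd, map_cmd = bwa_commands(reference, reads1, reads2)
    subprocess.check_call(index_cmd)
    with open(sam_out, "w") as out:
        subprocess.check_call(map_cmd, stdout=out)
```

Keeping command construction separate from execution makes it easy to print the commands first and to reuse the same pattern for the other mappers.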
SAM and BAM file formats
- Analyzing BAM files
- sorting / indexing
- viewing the mappings in IGV
Tools for processing BAM files
- samtools
- picard
- bamtools
Getting mapping stats
- extracting reads mapping to regions
- getting coverage info for selected regions
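To show what extracting reads for a region involves, here is a minimal pure-Python sketch that works on SAM text lines (for example the output of `samtools view -h`). It is an illustration of the logic, not a replacement for samtools or bamtools. The function names are hypothetical, and coordinates are 1-based and inclusive as in SAM.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) covered on the reference."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + max(length, 1) - 1  # a missing CIGAR ('*') counts as one base


def reads_in_region(sam_lines, chrom, start, end):
    """Return names of mapped reads overlapping chrom:start-end."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom:  # 0x4 marks an unmapped read
            continue
        r_start, r_end = reference_span(pos, cigar)
        if r_start <= end and r_end >= start:
            hits.append(fields[0])
    return hits
```

The number of names returned for a region gives a rough read count, which can serve as a first approximation of coverage there.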
Detecting SNPs
- general procedure
- GATK pipeline
- other SNP calling programs [tba]
Working with VCF files
- VCF file format
- viewing VCFs in IGV
- filtering SNPs by quality
- set operations on VCF files (common SNPs, unique SNPs)
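Filtering by quality and the set operations can be sketched in plain Python by reducing each VCF record to a (chromosome, position, REF, ALT) key. The function name and the choice to skip records without a QUAL value are assumptions made for illustration. Dedicated tools such as vcftools handle many cases this sketch ignores, for example multi-allelic sites.

```python
def read_variants(vcf_lines, min_qual=0.0):
    """Return a set of (chrom, pos, ref, alt) for records with QUAL >= min_qual."""
    variants = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():  # header or blank line
            continue
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual == "." or float(qual) < min_qual:     # assumption: missing QUAL is skipped
            continue
        variants.add((chrom, int(pos), ref, alt))
    return variants
```

With two such sets `a` and `b`, common variants are `a & b` and variants unique to the first file are `a - b`.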
RNA-Seq
- caveats (ribosomal RNA contamination)
- mapping RNA-Seq
- tophat
- GRAPE
- creating gene models from RNA-Seq (cufflinks)