User talk:Darek Kedra/sandbox 28: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
Line 31: Line 31:
# cufflinks http://cufflinks.cbcb.umd.edu/ (may require Boost libs!)
# cufflinks http://cufflinks.cbcb.umd.edu/ (may require Boost libs!)
# GEMtools https://github.com/gemtools/gemtools
# GEMtools https://github.com/gemtools/gemtools
==Vagrant fixes ==
For X11 forwarding the Vagrantfile has to contain
config.ssh.forward_x11 = true


==Introduction to Linux and the command line==
==Introduction to Linux and the command line==
Line 41: Line 48:
#pipelines and redirection
#pipelines and redirection
#awk in 5 minutes
#awk in 5 minutes
#where to go from there (clusters, python)  
#where to go from there (clusters, python)
 
==FASTQ==  
==FASTQ==  
#Illumina file formats (quality encodings)
#Illumina file formats (quality encodings)

Revision as of 16:34, 6 November 2013

Winterschool program

Software list

Basics

  1. linux Ubuntu 12.04.3 vs Debian 7.1 (think about 32 vs 64 bit versions)
  2. java http://www.java.com/en/download/linux_manual.jsp?locale=en

Specific tools 1

  1. FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  1. TagDust: http://genome.gsc.riken.jp/osc/english/software/src/tagdust.tgz
  2. fastareformat from fastareformat exonerate-2.2.0 [1]
  3. fixing fasta headers (gff fields) with python? small script
  4. GEM [2]
    1. CAVEAT: (problem with cores on different laptops...)

http://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%203/

  1. BWA http://sourceforge.net/projects/bio-bwa/files/
  2. Stampy http://www.well.ox.ac.uk/~gerton/software/Stampy/stampy-1.0.22r1848.tgz
  3. last http://last.cbrc.jp/ (the 362 versiona has split and splice-mapping options)
  4. bowtie http://bowtie-bio.sourceforge.net/bowtie2/index.shtml (bowtie2)
  5. samtools http://sourceforge.net/projects/samtools/files/
  6. picard http://sourceforge.net/projects/picard/files/
  7. IGV/ IGVtools http://www.broadinstitute.org/software/igv/download
  1. bamtools https://github.com/pezmaster31/bamtools
    1. requires cmake: http://www.cmake.org/files/v2.8/cmake-2.8.12.tar.gz (or apt get)
  2. bedtools http://code.google.com/p/bedtools/downloads/list
  3. GATK http://www.broadinstitute.org/gatk/auth?package=GATK (download yourself: license!)
  4. vcftools http://sourceforge.net/projects/vcftools/files/

Specific tools 2/RNA-Seq

  1. tophat http://tophat.cbcb.umd.edu/
  2. cufflinks http://cufflinks.cbcb.umd.edu/ (may require Boost libs!)
  3. GEMtools https://github.com/gemtools/gemtools

Vagrant fixes

For X11 forwarding the Vagrantfile has to contain

config.ssh.forward_x11 = true


Introduction to Linux and the command line

  1. why Linux?
  2. logging in, connecting to other servers with ssh / sftp
  3. copy, rename/move files, create directories, symbolic links
  4. view files (more/less, head, tail), count (wc)
  5. search for strings / replace strings (grep & sed)
  6. compressing / uncompressing files (gzip, bzip2, tar)
  7. pipelines and redirection
  8. awk in 5 minutes
  9. where to go from there (clusters, python)

FASTQ

  1. Illumina file formats (quality encodings)
  2. paired / unpaired reads
  3. quality checking (fastqc)
  4. trimming & filtering (TagDust)
  5. source of published FASTQ data: Short Read Archive vs ENA

Genomic fasta and gtf/gff gene annotation

  1. resources at ENSEMBL
  2. basic checks and reformatting
  • grepping fasta headers
  • fasta reformat from exonerate??

Mapping genomic reads

  1. overview of mappers
    1. GEM
    2. bwa +/- stampy
    3. last / bowtie
  2. mapping steps (for each mapper)
  3. genome indexing
  4. mapping
  5. +/- postprocessing

SAM and BAM file formats

  1. Analyzing BAM files
  2. sorting / indexing
  3. viewing the mappings in IGV

tools for processing BAM files

  1. samtools
  2. picard
  3. bamtools

getting mapping stats

  1. extracting reads mapping to regions
  2. getting coverage info for selected regions

Detecting SNPs

  1. general procedure
  2. GATK pipeline
  3. other SNP calling programs [tba]

Working with VCF files

  1. VCF file format
  2. viewing VCFs in IGV
  3. filtering SNPs by quality
  4. set operations on VCF files (common SNPs, unique SNPs)

RNASeq

  1. caveats (ribosomal RNA contamination)
  2. mapping RNASeq
  3. tophat
  4. GRAPE
  5. creating gene models from RNASeq (cufflinks)