Wayne:High Throughput Sequencing Resources
From OpenWetWare
Revision as of 08:44, 20 February 2013
Basic unix and usage of Sirius (our lab server)
Sirius is our analytical powerhouse (64 cores, amazing for parallel computing; 512 GB of memory; 64-bit, x86_64 configuration), and we have specific locations on the server for specific jobs. It is stored in a lovely server closet, so the way to access it is through a secure shell (ssh). Your username and password are obtained through our IT staff. Once you have logged on, there is a set of commands and "server etiquette" you will need to follow. The rules are also available as a PDF: http://openwetware.org/images/5/5e/Sirius_rules.pdf
Login
- ssh user@sirius.eeb.ucla.edu
- slogin user@sirius.eeb.ucla.edu
- To learn about the server:
- uname -a
- To change the default password you are given, use:
- passwd
- To log out of the server:
- logout (or ctrl+D)
Structure and organization
- Your home (user) directory holds <5 GB of data (be aware!)
- /home/user
- For genomes and databases
- /databases
- Location of installed programs
- /usr/local/bin
- /opt/
- The location to store your data
- /data/
- /data/user
- You can create your own personal directory if you'd like (see below for commands)
- The location to place scripts and data ONLY while you are working with them
- /work/user
Rules
- Developing a pipeline:
- copy a small but representative part of your data to sirius
- run all the programs you need on them
- debug and save final version of pipeline e.g. in a text file
- copy all your data
- run your pipeline on all data
- debug and update pipeline
- mv results wherever you want
- erase data
- Never start more jobs than the number of available cores (e.g. if 50 jobs are already running, do NOT submit more than 14, which would bring the total to 64)!!
- Look at the memory and CPU usage before you start to load sirius with commands (cmd)
- htop --- use to view real-time CPU usage
- top --- shows processor activity in real time. It lists the most CPU-intensive tasks on the system, provides an interactive interface for manipulating processes, and can sort tasks by CPU usage, memory usage, and runtime.
- If you don't know something, use the manual
- man ls --- look up what the ls tool does; you can also use Google, or ask the admins (Jonathan or Ron) or people in the lab (Rena or Pedro)
- mpstat --- reports processor-related statistics
- mpstat -P ALL --- displays activity for each available processor individually (processor 0 being the first one), along with the global average across all processors
- sar --- displays the contents of selected cumulative activity counters in the operating system
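The core-count rule above is simple arithmetic, and it can be scripted so nobody has to do it in their head. This is a minimal sketch, assuming a hypothetical helper called free_slots and that you read the number of running jobs yourself (for example from htop); it is not part of the Sirius rules.

```shell
# free_slots TOTAL_CORES RUNNING_JOBS
# Hypothetical helper: prints how many more jobs fit without exceeding the cores.
free_slots() {
    total=$1
    running=$2
    free=$((total - running))
    # Never report a negative number of free slots.
    if [ "$free" -lt 0 ]; then
        free=0
    fi
    echo "$free"
}

# Example from the rule above: 64 cores, 50 jobs already running.
free_slots 64 50
```

If the helper prints 0, wait for other jobs to finish before submitting anything.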
Installing programs yourself
- Check if it's already installed
- mkdir ~/bin --- to create a bin directory in your home folder
- cat ~/.bash_profile --- check whether ~/bin is already in your path; if it is not, add the next two lines to that file
- PATH=$PATH:$HOME/bin
- export PATH
- compile the program with an install prefix that places its executables in ~/bin
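The steps above can be sketched as a short snippet. It assumes a POSIX-style shell and, for the commented-out build lines, a program that ships a standard configure script; with such packages, a prefix of $HOME usually places executables in $HOME/bin. To make the PATH change permanent, the PATH lines would go in ~/.bash_profile.

```shell
# Create a personal bin directory (no error if it already exists).
mkdir -p "$HOME/bin"

# Append $HOME/bin to PATH only if it is not already listed.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                 # already present, nothing to do
    *) PATH="$PATH:$HOME/bin" ;;
esac
export PATH

# Typical source build that installs under your home directory
# (assumption: the program uses a configure script):
#   ./configure --prefix="$HOME"
#   make
#   make install
```

The case test keeps PATH from growing a duplicate entry every time the profile is read.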
Data transfer (network)
- scp options user@host_source:path/to/file1 user@host_dest:/dest/path/to/file2 --- Command Line Interface (CLI) for copying files between hosts
- scp -r user@host_source:path/to/dir user@host_dest:/dest/path --- Command Line Interface (CLI) for copying directories between hosts
- FileZilla, Cyberduck, Fugu, etc. --- Graphical User Interface (GUI)
- df -h --- check used and free space on each file system
- du -hs /path --- check total disk space used by a directory
- du -h --max-depth=1 /path --- check disk space used by each item one level below a directory
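Before copying data onto the server, it helps to know how much room a directory takes and how much space is left. The sketch below builds a throwaway directory so that the disk-usage commands above have something to measure; it assumes GNU du (for --max-depth), and the directory and file names are made up.

```shell
# Build a small throwaway directory tree so the commands have something to measure.
demo=$(mktemp -d)
mkdir -p "$demo/reads" "$demo/results"
head -c 20000 /dev/zero > "$demo/reads/sample.dat"

# Total size of the whole tree, human-readable.
du -hs "$demo"

# Size of each item one level down (GNU du option), in kilobytes, largest first.
usage=$(du -k --max-depth=1 "$demo" | sort -rn)
printf '%s\n' "$usage"

# Used and free space on the file system that holds the directory.
df -h "$demo"

# Clean up the throwaway tree.
rm -r "$demo"
```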
Data editing
- vim filename --- to edit the file
History
- ctrl+r --- search your command history
- history --- display your command history
- !cmd_num --- rerun the command with that number in the history list
- The up arrow is a shortcut to scroll through recently used commands
Files
- ls --- lists your files
- ls -l --- lists your files in long format
- ls -a --- shows hidden files
- ls -t --- lists your files sorted by time modified instead of name
- more filename --- shows first part of a file; hit space bar to see more
- head filename --- prints the first 10 lines of the specified file to the screen (use -n to change the number)
- tail filename --- prints the last 10 lines of the specified file to the screen (use -n to change the number)
- emacs filename --- opens the file in the emacs editor
- cp filename1 filename2 --- copies a file within your current location
- cp path/to/filename1 path/to/filename2 --- copies a file to another location by giving paths
- rm filename --- permanently remove a file (Caution! This cannot be undone!)
- diff filename1 filename2 --- compares files and shows where they differ
- wc filename --- tells you how many lines (newline delimited), words (whitespace delimited), and bytes are in a file
- wc -l filename --- tells you how many lines are in a file
- wc -w filename --- tells you how many words are in a file
- wc -c filename --- tells you how many bytes are in a file (use wc -m to count characters)
- chmod options filename --- change the read, write, and execute permissions for a file (Google this!)
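A short practice session ties several of the file commands above together. This is only a sketch that works in a throwaway directory; the file names example.txt and copy.txt are made up.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Create a 12-line example file: line1 ... line12.
i=1
while [ "$i" -le 12 ]; do
    echo "line$i"
    i=$((i + 1))
done > example.txt

head -n 3 example.txt                # first 3 lines
tail -n 2 example.txt                # last 2 lines
wc -l < example.txt                  # number of lines
cp example.txt copy.txt              # copy within the current directory
echo "extra" >> copy.txt             # make the copy differ from the original
diff example.txt copy.txt || true    # diff exits non-zero when the files differ
chmod u+x example.txt                # give the owner execute permission
ls -l                                # long listing, including permissions
```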
File compression
- gzip filename --- compresses a file, replacing it with a file that has a .gz extension
- gzip -c filename >filename.gz --- compresses a file but keeps the original; -c writes to standard output and the ">" redirects it to the outfile filename.gz
- gunzip filename.gz --- uncompresses a gzip file
- tar -xzf filename.tar.gz --- decompresses and unpacks a tar.gz archive
- zcat filename.gz (gzcat on some systems) --- lets you look at a gzipped file without having to gunzip it
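The difference between a plain .gz file and a .tar.gz archive is easiest to see by making one of each. The sketch below uses made-up file names in a throwaway directory; gzip -dc is used for viewing because it behaves the same way regardless of whether the system calls the viewer zcat or gzcat.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'ACGT\nTTGA\n' > reads.txt

# Compress to a new file and keep the original (-c writes to standard output).
gzip -c reads.txt > reads.txt.gz

# Look inside the compressed file without unpacking it on disk.
gzip -dc reads.txt.gz

# Bundle a directory into a compressed tar archive, then unpack it elsewhere.
mkdir project
cp reads.txt project/
tar -czf project.tar.gz project
mkdir unpacked
tar -xzf project.tar.gz -C unpacked
```

A .gz file holds one compressed file, while a .tar.gz holds a whole directory tree, which is why it needs tar to unpack.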
Directories
- pwd --- prints working directory (your current location)
- cd /path/to/desired/location --- change directories by providing path
- cd ../ --- go up one directory
- mkdir directoryName --- make a new directory
- rmdir directoryName --- remove directory (must be empty)...Remember that you cannot undo this move!
- rm -r directoryName --- recursively remove a directory and the files it contains...Remember that you cannot undo this move!
- rm filename --- remove specified file...Remember that you cannot undo this move!
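The following sketch walks through making, entering, and removing directories in a throwaway location. The directory names (analysis, run1) are made up. Note that rmdir only works on empty directories, so a directory that still holds files is removed here with rm -r.

```shell
start=$(mktemp -d)
cd "$start" || exit 1

mkdir analysis              # make a new directory
cd analysis || exit 1       # move into it
pwd                         # print the current location
cd ..                       # go back up one level

rmdir analysis              # works because the directory is empty

mkdir -p run1/logs          # -p also creates missing parent directories
touch run1/logs/out.txt
rm -r run1                  # removes run1 and everything inside it; cannot be undone
```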
Finding things
- whereis command --- lists the locations of the binary, source, and manual page files for a command
- ~/path --- the tilde is a shortcut for the path to your home directory
- nohup commands & --- starts a no-hangup background job that keeps running after you log out (output goes to nohup.out unless redirected)
- screen --- starts a new screen session in which to run a long job (ctrl+a then d to detach; screen -ls to list running screens; screen -r pid to reattach)
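A background job started with nohup can be sketched as follows. The job itself (sleep followed by echo) and the log name job.log are placeholders for a real pipeline command and its log file; the wait line is there only so the demo can show the log.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Start a job that keeps running after you log out; send its output to a log
# file of your choosing instead of the default nohup.out.
nohup sh -c 'sleep 1; echo finished' > job.log 2>&1 &
job_pid=$!                  # process ID of the background job

echo "started job $job_pid"
wait "$job_pid"             # only for this demo: block until the job ends
cat job.log
```

For a real job you would skip the wait, log out, and check the log file later.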
High throughput (HT) platform and read types
- ABI-SOLiD
- Illumina single-end vs. paired-end
- Ion Torrent
- MiSeq
- Roche-454
- Solexa
CBI Collaboratory
UCLA's Computational Biosciences Institute Collaboratory hosts a variety of 3-day workshops, ranging from a general introduction to genomics and bioinformatics to more advanced, focused workshops (e.g. ChIP-Seq, BS-Seq, exome sequencing). The CBI Collaboratory focuses on publicly available resources: the web-based bioinformatics tool Galaxy/UCLA (a resource for HT workflows and a central location for a variety of HT tools covering multiple platforms and data types), as well as tools such as R and Matlab. The introductory workshops do not require any programming experience, and the Collaboratory Fellows also serve as a counseling resource for data analysis.
File formats and conversions
- bcl
- qseq
- fastq
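As a small illustration of the fastq format, the number of reads in a file can be estimated from its line count. This is a sketch that assumes the common layout of exactly four lines per record (header, sequence, "+", qualities) with no wrapped sequence lines; count_reads is a hypothetical helper and the two reads are made up.

```shell
# count_reads FILE: prints the number of records in an uncompressed FASTQ file,
# assuming the common four-lines-per-record layout (no wrapped sequence lines).
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Tiny made-up example with two reads.
demo=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$demo/example.fastq"
count_reads "$demo/example.fastq"
```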
Deplexing using barcoded sequence tags
- Edit (or Hamming) distance
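The Hamming distance between two barcodes of equal length is the number of positions at which they differ; it counts substitutions only, whereas edit distance also allows insertions and deletions. Below is a minimal sketch using awk inside a hypothetical shell helper named hamming; the barcodes are made up.

```shell
# hamming A B: prints the number of positions at which two equal-length
# barcodes differ. Hypothetical helper for illustration.
hamming() {
    if [ "${#1}" -ne "${#2}" ]; then
        echo "hamming: barcodes must have the same length" >&2
        return 1
    fi
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        print d
    }'
}

hamming ACGTAC ACCTAA    # the barcodes differ at positions 3 and 6
```

When assigning reads to samples, a read's tag can be compared against each known barcode this way, and the read is kept only if one barcode is clearly the closest.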
Quality control
- Fastx tools
- Using mapping as the quality control for reads
Trimming and clipping
- Trim based on low quality scores per nucleotide position within a read
- Clip sequence artefacts (e.g. adapters, primers)
FASTQC and FASTX tools
BED and SAM tools
GATK variant calling
R basics
Python basics
HT sequence analysis using R (and Bioconductor)
DNA sequence analysis
RNA-seq analysis
Common objectives of transcriptome analysis:
- Quantifying and annotating aligned reads
- Normalizing RNA-Seq read count data and identifying differentially expressed genes (DEG) (R packages):
- easyRNASeq (simplifies read counting per genome feature)
- DEXSeq (Inference of differential exon usage)
- baySeq (also see: segmentSeq)
- Genominator (Bullard et al. 2010)
- Detection of alternative splice junctions
SOLiD software tools