Harvard:Biophysics 101/2009/Infrastructure
Revision as of 22:33, 14 November 2009
Infrastructure Background
This page lists background information for those interested in Trait-O-Matic.
Tasks
- Text Classification
--I already have MeSH parsed into a MySQL database. I am currently working on ICD-10 as a supplemental hierarchy of diseases [[1]]. As soon as this is ready, the data will be applied to the classification.
- Reference Extraction
http://incubator.apache.org/pdfbox/ is a Java-based PDF text extractor that we can use to extract paragraphs from references, so that we can display them together with traits. I will show a demonstration of text extraction in class today!
Overview
/home/trait/core
- /home/trait/core/affx_500k_to_gff.py
- outputs GFF records for each SNP in an Affymetrix 500k Genechip file
- /home/trait/core/cgi_to_gff.py
- outputs a GFF record for each entry in the Complete Genomics CSV file
- /home/trait/core/codon.py
- codon_123(input) -- returns the three-letter amino acid abbreviation for a single-letter code input
- codon_321(input) -- returns the single-letter code for a three-letter amino acid abbreviation input
- /home/trait/core/config.py
- contains the configuration for Trait-O-Matic, such as database passwords
- /home/trait/core/fastq_to_fasta.py
- strips quality data from a FASTQ (http://en.wikipedia.org/wiki/FASTQ_format) format file and returns just the FASTA (http://en.wikipedia.org/wiki/FASTA_format) format
- /home/trait/core/gff_concordancy.py
- /home/trait/core/gff_dbsnp_query.py
- /home/trait/core/gff_hgmd_map.py
- /home/trait/core/gff_intersect.py
- /home/trait/core/gff_morbid_map.py
- /home/trait/core/gff_nonsynonymous_filter.py
- /home/trait/core/gff_omim_map.py
- /home/trait/core/gff_pharmgkb_map.py
- /home/trait/core/gff_snpedia_map.py
- /home/trait/core/gff_sort.pl
- /home/trait/core/gff_subtract.py
- /home/trait/core/gff_twobit_map.py
- /home/trait/core/gff_twobit_query.py
- /home/trait/core/hapmap_load_database.py
- /home/trait/core/json_allele_frequency_query.py
- /home/trait/core/json_to_job_database.py
- /home/trait/core/maq_snp_to_gff.py
- /home/trait/core/omim_print_variants.py
- /home/trait/core/setup.py
- /home/trait/core/snpedia.py
- Outputs tab-separated variant information (into data/snpedia.txt) for each entry in SNPedia
- /home/trait/core/snpedia_print_genotypes.py
- Goes through snpedia.txt and prints out the associated genotypes (found in the snp19 database)
- /home/trait/core/snpinduse_to_gff.py
- /home/trait/core/trait-o-matic-server.py
- /home/trait/core/venter_gff_snp_to_gff.py
- /home/trait/core/warehouse.py
- /home/trait/core/watson_gff_to_gff.py
- /home/trait/core/yh_gff_snp_to_gff.py
- /home/trait/data
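Two of the documented helpers in the listing above, the codon converters and the FASTQ-to-FASTA conversion, can be sketched as follows. This is an illustration, not the code in /home/trait/core: the function names codon_123 and codon_321 come from the listing, but the lookup table, the case handling, the fastq_to_fasta function name, and the assumption that every FASTQ record occupies exactly four lines (no wrapped sequences) are ours.

```python
# Illustrative sketch only; the real codon.py and fastq_to_fasta.py may differ.

# Standard one-letter to three-letter codes for the 20 amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse table, keyed by the upper-cased three-letter abbreviation.
THREE_TO_ONE = {three.upper(): one for one, three in ONE_TO_THREE.items()}


def codon_123(code):
    """Return the three-letter abbreviation for a single-letter code."""
    return ONE_TO_THREE[code.upper()]


def codon_321(abbreviation):
    """Return the single-letter code for a three-letter abbreviation."""
    return THREE_TO_ONE[abbreviation.upper()]


def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumes four-line records: '@' header, sequence, '+' separator, quality.
    """
    lines = iter(fastq_lines)
    for header in lines:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: %r" % header)
        sequence = next(lines, None)
        separator = next(lines, None)  # '+' line, ignored
        quality = next(lines, None)    # quality string, discarded
        if quality is None:
            raise ValueError("truncated FASTQ record: %r" % header)
        yield ">" + header[1:]
        yield sequence.rstrip("\n")


if __name__ == "__main__":
    demo = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    print("\n".join(fastq_to_fasta(demo)))
    print(codon_123("W"), codon_321("Trp"))
```

Writing fastq_to_fasta as a generator keeps memory use flat on large read files, since only one record is held at a time.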
CodeIgniter Frontend
Database Contents
Revision Control
Joining
- Acquire a Harvard Ethics Training in Human Research (HETHR) certificate by completing the training at [4]. This should take around 2 hours: you have to read 6 of the required sections and 4 of the "electives". Email your certificate to Sasha (awaitz@post.harvard)
- Sign up for access to the control panel at [5]
- Create an RSA ssh key pair. The public key will allow the server to authenticate you. Generate the pair with the command
ssh-keygen -t rsa
Ensure that your private and public keys are in your .ssh or .openssh directories. Otherwise, ssh will not know where to look for them.
- Upload your public key to the control panel
- Follow the directions on the front page of the control panel. They will tell you to edit your .ssh/config file by adding:
Host *.oxf
  ProxyCommand ssh -p2222 turnout@switchyard.oxf.freelogy.org -x -a -o "Compression no" $SSH_PROXY_FLAGS %h
  User <YOUR_USERNAME>
- You can now ssh to your node by following the directions on the front page of the control panel
- To set up trait-o-matic, follow the directions at [6]
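As a sanity check for the key and config steps above, the following is a minimal sketch that renders the .ssh/config entry for a given username and reports whether the default RSA key files are present. The helper names render_config and missing_key_files are hypothetical. The file names id_rsa and id_rsa.pub are the defaults that ssh-keygen -t rsa writes, and the check only applies if you accepted those defaults. The entry itself is copied from the directions above; the control panel's front page remains the authoritative source.

```python
# Illustrative sketch only; helper names are hypothetical.
from pathlib import Path

# ProxyCommand copied from the directions above.
PROXY_COMMAND = (
    "ssh -p2222 turnout@switchyard.oxf.freelogy.org "
    '-x -a -o "Compression no" $SSH_PROXY_FLAGS %h'
)


def render_config(username):
    """Return the .ssh/config entry for the given username as text."""
    return "\n".join([
        "Host *.oxf",
        "    ProxyCommand " + PROXY_COMMAND,
        "    User " + username,
    ]) + "\n"


def missing_key_files(ssh_dir):
    """List the default RSA key files that are absent from ssh_dir."""
    ssh_dir = Path(ssh_dir)
    return [name for name in ("id_rsa", "id_rsa.pub")
            if not (ssh_dir / name).is_file()]


if __name__ == "__main__":
    # Replace the placeholder with your own username before using the output.
    print(render_config("YOUR_USERNAME"), end="")
    print("missing key files:", missing_key_files(Path.home() / ".ssh"))
```

The script only prints the entry and never writes to .ssh/config, so it is safe to run before editing the file by hand.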