Harvard:Biophysics 101/2009/Infrastructure: Difference between revisions

Revision as of 22:34, 14 November 2009

Infrastructure Background

This is a listing of background information for those interested in Trait-O-Matic.

Tasks

  • Text Classification

--I already have MeSH parsed into a MySQL database, and I am currently working on ICD-10 as a supplemental hierarchy of diseases [1]. As soon as this is ready, the data will be applied to the classification.

  • Reference Extraction

http://incubator.apache.org/pdfbox/ is a Java-based PDF text extractor that we can use to pull paragraphs out of references so that we can display them alongside traits. I will show a demonstration of text extraction in class today!

Overview

/home/trait/core

  • /home/trait/core/affx_500k_to_gff.py

- outputs GFF records for each SNP in an Affymetrix 500k Genechip file

  • /home/trait/core/cgi_to_gff.py

- outputs GFF record of each entry in the Complete Genomics csv file

  • /home/trait/core/codon.py

- codon_123(input) -- returns the three-letter amino acid abbreviation for a single-letter code input

- codon_321(input) -- returns the single-letter code for a three-letter amino acid abbreviation

  • /home/trait/core/config.py

- contains the configuration for t-o-m, such as passwords for databases and the like

  • /home/trait/core/fastq_to_fasta.py

- strips quality data from a FASTQ format file and returns just the FASTA format

  • /home/trait/core/gff_concordancy.py
  • /home/trait/core/gff_dbsnp_query.py
  • /home/trait/core/gff_hgmd_map.py
  • /home/trait/core/gff_intersect.py
  • /home/trait/core/gff_morbid_map.py
  • /home/trait/core/gff_nonsynonymous_filter.py
  • /home/trait/core/gff_omim_map.py
  • /home/trait/core/gff_pharmgkb_map.py
  • /home/trait/core/gff_snpedia_map.py
  • /home/trait/core/gff_sort.pl
  • /home/trait/core/gff_subtract.py
  • /home/trait/core/gff_twobit_map.py
  • /home/trait/core/gff_twobit_query.py
  • /home/trait/core/hapmap_load_database.py
  • /home/trait/core/json_allele_frequency_query.py
  • /home/trait/core/json_to_job_database.py
  • /home/trait/core/maq_snp_to_gff.py
  • /home/trait/core/omim_print_variants.py
  • /home/trait/core/setup.py
  • /home/trait/core/snpedia.py

- Outputs tab-separated variant information (into data/snpedia.txt) for each entry in SNPedia

  • /home/trait/core/snpedia_print_genotypes.py

- Goes through snpedia.txt and prints out the associated genotypes (found in the snp19 database)

  • /home/trait/core/snpinduse_to_gff.py
  • /home/trait/core/trait-o-matic-server.py
  • /home/trait/core/venter_gff_snp_to_gff.py
  • /home/trait/core/warehouse.py
  • /home/trait/core/watson_gff_to_gff.py
  • /home/trait/core/yh_gff_snp_to_gff.py
  • /home/trait/data
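To make two of the descriptions above concrete, here is a minimal Python sketch of what codon_123, codon_321, and a FASTQ-to-FASTA conversion might look like. It is not the actual Trait-O-Matic source: the dictionary names, the function signature of fastq_to_fasta, and the assumption of simple four-line FASTQ records are all illustrative choices.

```python
# Illustrative sketch only; not the contents of codon.py or fastq_to_fasta.py.

# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse lookup, built from the table above.
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def codon_123(code):
    """Return the three-letter abbreviation for a one-letter code."""
    return ONE_TO_THREE[code.upper()]


def codon_321(abbreviation):
    """Return the one-letter code for a three-letter abbreviation."""
    return THREE_TO_ONE[abbreviation.capitalize()]


def fastq_to_fasta(lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumes four-line records: @header, sequence, +, qualities.
    Multi-line sequences are not handled in this sketch.
    """
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence = record[0], record[1]
            if not header.startswith("@"):
                raise ValueError("expected '@' header, got: " + header)
            yield ">" + header[1:]  # FASTA headers start with '>'
            yield sequence          # quality lines are dropped
            record = []
```

For example, passing an open FASTQ file object to fastq_to_fasta and writing each yielded line followed by a newline would produce the FASTA output; an unknown amino acid code raises KeyError in this sketch.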

CodeIgniter Frontend

Database Contents

Revision Control

Joining

  • Acquire a Harvard Ethics Training in Human Research (HETHR) certificate by completing the training at [2]. This should take around 2 hours; you have to read 6 of the required sections and 4 of the "electives". Email your certificate to Sasha (awaitz@post.harvard)
  • Sign up for access to the control panel at [3]
  • Create an RSA ssh key pair. The public key will allow the server to authenticate you. To generate the pair, use the command:

ssh-keygen -t rsa

Ensure that your private and public keys are in your .ssh or .openssh directories. Otherwise, ssh will not know where to look for them.

  • Upload your public key to the control panel
  • Follow the directions on the front page of the control panel. They will tell you to edit your .ssh/config file by adding:

Host *.oxf
  ProxyCommand ssh -p2222 turnout@switchyard.oxf.freelogy.org -x -a -o "Compression no" $SSH_PROXY_FLAGS %h
  User <YOUR_USERNAME>

  • You can now ssh to your node by following the directions on the front page of the control panel
  • To set up trait-o-matic, follow the directions at [4]
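As a quick sanity check for the key-generation step above, the following Python sketch reports which key files are missing from a directory. It assumes the default file names that ssh-keygen -t rsa produces (id_rsa and id_rsa.pub) and the default ~/.ssh location; the function name missing_ssh_keys is hypothetical, and you would adjust the arguments if you saved your keys elsewhere.

```python
import os


def missing_ssh_keys(ssh_dir="~/.ssh", key_name="id_rsa"):
    """Return the expected key files that are absent from ssh_dir.

    Assumes the private key is named key_name and the public key
    is key_name + ".pub", as ssh-keygen names them by default.
    """
    directory = os.path.expanduser(ssh_dir)
    expected = [key_name, key_name + ".pub"]
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]
```

Calling missing_ssh_keys() with no arguments returns an empty list when both files are in place; any names it returns are the files ssh would fail to find.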