Harvard:Biophysics 101/2009/Infrastructure: Difference between revisions

From OpenWetWare

Revision as of 15:02, 15 November 2009

Infrastructure Background

This is a listing of background information for those interested in Trait-O-Matic.

Tasks

  • Text Classification

--I already have MeSH parsed into a MySQL database, and I am currently working on ICD-10 as a supplemental hierarchy of diseases [[1]]. As soon as this is ready, the data will be applied to the classification.

  • Reference Extraction

PDFBox (http://incubator.apache.org/pdfbox/) is a Java-based PDF text extractor that we can use to extract paragraphs from references and display them together with traits. I will demonstrate text extraction in class today!

Overview

/home/trait/core

  • /home/trait/core/affx_500k_to_gff.py

- outputs a GFF record for each SNP in an Affymetrix 500k GeneChip file

  • /home/trait/core/cgi_to_gff.py

- outputs a GFF record for each entry in a Complete Genomics CSV file

  • /home/trait/core/codon.py

- codon_123(input) -- returns the three-letter amino acid abbreviation for a single-letter code

- codon_321(input) -- returns the single-letter code for a three-letter amino acid abbreviation

  • /home/trait/core/config.py

- contains the configuration for Trait-O-Matic, such as database passwords

  • /home/trait/core/fastq_to_fasta.py

- strips quality data from a FASTQ file and outputs the sequences in FASTA format

  • /home/trait/core/gff_concordancy.py

- takes two lists of GFF-containing files and outputs the concordance between them as a tabular file

  • /home/trait/core/gff_dbsnp_query.py

- appends dbSNP information to the db_xref (or, in GFF3, Dbxref) attribute

  • /home/trait/core/gff_hgmd_map.py
  • /home/trait/core/gff_intersect.py

- outputs the intersection of two GFF files, with attributes taken from the first

  • /home/trait/core/gff_morbid_map.py
  • /home/trait/core/gff_nonsynonymous_filter.py
  • /home/trait/core/gff_omim_map.py
  • /home/trait/core/gff_pharmgkb_map.py
  • /home/trait/core/gff_snpedia_map.py
  • /home/trait/core/gff_sort.pl

- sorts a GFF file (by feature length, minScore, maxScore, or custom expression)

  • /home/trait/core/gff_subtract.py
  • /home/trait/core/gff_twobit_map.py
  • /home/trait/core/gff_twobit_query.py
  • /home/trait/core/hapmap_load_database.py
  • /home/trait/core/json_allele_frequency_query.py
  • /home/trait/core/json_to_job_database.py
  • /home/trait/core/maq_snp_to_gff.py

- outputs a GFF record for each entry in a Maq SNP file

  • /home/trait/core/omim_print_variants.py
  • /home/trait/core/setup.py
  • /home/trait/core/snpedia.py

- outputs tab-separated variant information (into data/snpedia.txt) for each entry in SNPedia

  • /home/trait/core/snpedia_print_genotypes.py

- reads snpedia.txt and prints the associated genotypes (found in the snp19 database)

  • /home/trait/core/snpinduse_to_gff.py
  • /home/trait/core/trait-o-matic-server.py
  • /home/trait/core/venter_gff_snp_to_gff.py
  • /home/trait/core/warehouse.py
  • /home/trait/core/watson_gff_to_gff.py
  • /home/trait/core/yh_gff_snp_to_gff.py
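
As a rough illustration of what three of the simpler scripts above do, here is a minimal Python sketch. Only the names codon_123 and codon_321 come from the listing. The function names fastq_to_fasta and gff_intersect, all signatures, the four-line FASTQ layout, and the choice to match GFF records on exact (seqid, start, end) are assumptions made for illustration; the real scripts in /home/trait/core may behave differently.

```python
# Illustrative sketches only; not the actual Trait-O-Matic code.

# Standard IUPAC one-letter to three-letter amino acid codes.
_ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
_THREE_TO_ONE = {three.upper(): one for one, three in _ONE_TO_THREE.items()}


def codon_123(code):
    """Return the three-letter abbreviation for a one-letter code."""
    return _ONE_TO_THREE[code.upper()]


def codon_321(abbreviation):
    """Return the one-letter code for a three-letter abbreviation."""
    return _THREE_TO_ONE[abbreviation.upper()]


def fastq_to_fasta(lines):
    """Yield FASTA lines from FASTQ lines, dropping the quality data.

    Assumes the plain four-line layout: @header, sequence, +, quality.
    """
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not record and not line.strip():
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence = record[0], record[1]
            if not header.startswith("@"):
                raise ValueError("expected '@' header, got: " + header)
            yield ">" + header[1:]
            yield sequence
            record = []


def _gff_records(lines):
    """Return the non-blank, non-comment lines of a GFF file."""
    return [l.rstrip("\n") for l in lines if l.strip() and not l.startswith("#")]


def _gff_key(line):
    """Identify a record by seqid, start, and end (columns 1, 4, 5)."""
    fields = line.split("\t")
    return fields[0], int(fields[3]), int(fields[4])


def gff_intersect(first, second):
    """Yield records of `first` that also occur in `second`.

    The whole line, attributes included, is taken from `first`.
    Assumption: two records match when seqid, start, and end are equal.
    """
    wanted = {_gff_key(line) for line in _gff_records(second)}
    for line in _gff_records(first):
        if _gff_key(line) in wanted:
            yield line
```

Each function accepts any iterable of lines, so an open file object can be passed directly, for example list(fastq_to_fasta(open(path))) for a hypothetical path.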

/home/trait/data

This folder stores all of the raw data used by the application.


/home/trait/www/system/application/controllers/

  • results.php -- this controls the display of results!

/home/trait/www/system/application/models/

/home/trait/www/system/application/views/

/home/trait/www/scripts

  • Contains the JavaScript scripts that control the behavior of the webpage (such as dropdowns, etc)
  • Sortable.js -- an implementation of the table sorting script described at http://tetlaw.id.au/view/blog/table-sorting-with-prototype/

Database Contents

Ariel

  • jobs contains each processed genome's information
  • files contains the locations of each processed genome's temporary files (in /tmp/...)

- columns: id / path / kind (genotype, phenotype, omim, hgmd, morbid, snpedia, pharmgkb) / job

  • users contains the usernames, password hashes, and emails of users who have submitted jobs
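
To make the shape of the files table concrete, here is a hedged sketch that models it in Python. The table and column names (files, id, path, kind, job) and the list of kinds come from the description above. The column types, the constraints, the single-column jobs table, and the helper names create_files_schema and files_for_job are all assumptions, and an in-memory SQLite database stands in for the real database server.

```python
import sqlite3

# The kinds of file listed for the "files" table.
FILE_KINDS = ("genotype", "phenotype", "omim", "hgmd",
              "morbid", "snpedia", "pharmgkb")


def create_files_schema(conn):
    """Create stand-in jobs and files tables (types are assumptions)."""
    kinds = ", ".join("'{}'".format(kind) for kind in FILE_KINDS)
    conn.executescript(
        """
        CREATE TABLE jobs (id INTEGER PRIMARY KEY);
        CREATE TABLE files (
            id   INTEGER PRIMARY KEY,
            path TEXT NOT NULL,
            kind TEXT NOT NULL CHECK (kind IN ({kinds})),
            job  INTEGER NOT NULL REFERENCES jobs (id)
        );
        """.format(kinds=kinds)
    )


def files_for_job(conn, job_id):
    """Return (kind, path) pairs for one job, in insertion order."""
    cursor = conn.execute(
        "SELECT kind, path FROM files WHERE job = ? ORDER BY id", (job_id,)
    )
    return cursor.fetchall()
```

With this layout, every temporary file produced for a genome can be found from its job id alone, which is presumably what a results page would need to do.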

Caliban

Revision Control

Joining

  • Acquire a Harvard Ethics Training in Human Research (HETHR) certificate by completing the training at [2]. This should take around 2 hours: you have to read 6 of the required sections and 4 of the "electives". Email your certificate to Sasha (awaitz@post.harvard).
  • Sign up for access to the control panel at [3]
  • Create an RSA SSH key pair so that the server can authenticate you. Generate it with the command:

ssh-keygen -t rsa

Ensure that your private and public keys are in your .ssh or .openssh directories. Otherwise, ssh will not know where to look for them.

  • Upload your public key to the control panel
  • Follow the directions on the front page of the control panel. They will tell you to edit your .ssh/config file by adding:

Host *.oxf
    ProxyCommand ssh -p2222 turnout@switchyard.oxf.freelogy.org -x -a -o "Compression no" $SSH_PROXY_FLAGS %h
    User <YOUR_USERNAME>

  • You can now ssh to your node by following the directions on the front page of the control panel
  • To set up trait-o-matic, follow the directions at [4]