Harvard:Biophysics 101/2009/Infrastructure
Revision as of 14:24, 15 November 2009
Infrastructure Background
This is a listing of background information for those interested in Trait-O-Matic.
Tasks
- Text Classification
--I already have MeSH parsed into a MySQL database, and I am currently working on ICD-10 as a supplemental hierarchy of diseases [[1]]. As soon as this is ready, the data will be applied to the classification.
- Reference Extraction
http://incubator.apache.org/pdfbox/ is a Java-based PDF text extractor that we can use to extract paragraphs from references so that we can display them together with traits. I will show a demonstration of text extraction in class today!
Overview
/home/trait/core
- /home/trait/core/affx_500k_to_gff.py
- outputs GFF records for each SNP in an Affymetrix 500k Genechip file
- /home/trait/core/cgi_to_gff.py
- outputs a GFF record for each entry in the Complete Genomics CSV file
- /home/trait/core/codon.py
- codon_123(input) -- returns the three-letter amino acid abbreviation for a single-letter code input
- codon_321(input) -- returns the single-letter code for a three-letter amino acid abbreviation
- /home/trait/core/config.py
- contains the configuration for Trait-O-Matic, such as database passwords
- /home/trait/core/fastq_to_fasta.py
- strips quality data from a FASTQ file and outputs the sequences in FASTA format
- /home/trait/core/gff_concordancy.py
- inputs two lists of GFF-containing files and outputs the concordance between the two in a tabular file
- /home/trait/core/gff_dbsnp_query.py
- appends dbSNP information to the db_xref attribute (or, for GFF3, the Dbxref attribute)
- /home/trait/core/gff_hgmd_map.py
- /home/trait/core/gff_intersect.py
- outputs the intersection of two GFF files, with attributes taken from the first
- /home/trait/core/gff_morbid_map.py
- /home/trait/core/gff_nonsynonymous_filter.py
- /home/trait/core/gff_omim_map.py
- /home/trait/core/gff_pharmgkb_map.py
- /home/trait/core/gff_snpedia_map.py
- /home/trait/core/gff_sort.pl
- sorts a GFF file (by feature length, minScore, maxScore, or custom expression)
- /home/trait/core/gff_subtract.py
- /home/trait/core/gff_twobit_map.py
- /home/trait/core/gff_twobit_query.py
- /home/trait/core/hapmap_load_database.py
- /home/trait/core/json_allele_frequency_query.py
- /home/trait/core/json_to_job_database.py
- /home/trait/core/maq_snp_to_gff.py
- outputs a GFF record for each entry in a Maq SNP file
- /home/trait/core/omim_print_variants.py
- /home/trait/core/setup.py
- /home/trait/core/snpedia.py
- outputs tab-separated variant information (into data/snpedia.txt) for each entry in SNPedia
- /home/trait/core/snpedia_print_genotypes.py
- reads snpedia.txt and prints the associated genotypes (found in the snp19 database)
- /home/trait/core/snpinduse_to_gff.py
- /home/trait/core/trait-o-matic-server.py
- /home/trait/core/venter_gff_snp_to_gff.py
- /home/trait/core/warehouse.py
- /home/trait/core/watson_gff_to_gff.py
- /home/trait/core/yh_gff_snp_to_gff.py
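As an illustration of the two helpers described for codon.py above, here is a minimal sketch of how such a conversion could look. Only the function names codon_123 and codon_321 come from the listing; the lookup tables, the case-insensitive matching, and the KeyError on unknown input are assumptions made for this example, and the real file may handle stop codons or ambiguous codes differently.

```python
# Hypothetical sketch of the conversions described for codon.py.
# The tables and error behaviour are assumptions, not the actual source.

# Standard 20 amino acids: single-letter code -> three-letter abbreviation.
_ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Inverse table: three-letter abbreviation -> single-letter code.
_THREE_TO_ONE = {three: one for one, three in _ONE_TO_THREE.items()}


def codon_123(code):
    """Return the three-letter abbreviation for a single-letter code.

    Raises KeyError for codes outside the standard 20 (an assumption).
    """
    return _ONE_TO_THREE[code.upper()]


def codon_321(abbreviation):
    """Return the single-letter code for a three-letter abbreviation.

    Input is normalised to 'Xxx' capitalisation before lookup.
    """
    return _THREE_TO_ONE[abbreviation.capitalize()]
```

For example, codon_123("W") gives "Trp" and codon_321("trp") gives "W" under these assumptions.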
/home/trait/data
This folder stores all of the raw data used by the application
/home/trait/www/system/application/controllers/
- results.php -- this controls the display of results!
/home/trait/www/system/application/models/
/home/trait/www/system/application/views/
Database Contents
Ariel
- jobs contains each processed genome's information
- files contains the locations of each processed genome's temporary files (in /tmp/...)
- id / path / kind (genotype, phenotype, omim, hgmd, morbid, snpedia, pharmgkb) / job
- users contains the usernames, password hashes, and emails of users who have submitted jobs
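The column layout listed for files above (id / path / kind / job) can be pictured as a small record type. This is only a sketch: the Python types, the FileRecord name, and the validation of kind are assumptions for illustration, and the actual MySQL schema is not shown on this page.

```python
# Hypothetical model of one row in the "files" table described above.
# Column names come from the listing; types and validation are assumed.
from dataclasses import dataclass

# The kinds named in the listing.
FILE_KINDS = (
    "genotype", "phenotype", "omim", "hgmd",
    "morbid", "snpedia", "pharmgkb",
)


@dataclass
class FileRecord:
    id: int    # row identifier (assumed integer)
    path: str  # location of the temporary file
    kind: str  # one of FILE_KINDS
    job: int   # the job this file belongs to (assumed integer key)

    def __post_init__(self):
        # Reject kinds that are not in the documented list.
        if self.kind not in FILE_KINDS:
            raise ValueError("unknown file kind: %r" % self.kind)


# Example row; the path is a made-up placeholder, not a real file.
example = FileRecord(id=1, path="/tmp/example-omim-output", kind="omim", job=42)
```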
Caliban
Revision Control
Joining
- Acquire a Harvard Ethics Training in Human Research (HETHR) certificate by completing the training at [2]. This should take around 2 hours; you have to read 6 of the required sections and 4 of the "electives". Email your certificate to Sasha (awaitz@post.harvard).
- Sign up for access to the control panel at [3]
- Create an RSA SSH key pair, which will allow the server to authenticate you, by running the command:
ssh-keygen -t rsa
Ensure that your private and public keys are in your .ssh or .openssh directory; otherwise, ssh will not know where to look for them.
- Upload your public key to the control panel
- Follow the directions on the front page of the control panel. They will tell you to edit your .ssh/config file by adding:
Host *.oxf
ProxyCommand ssh -p2222 turnout@switchyard.oxf.freelogy.org -x -a -o "Compression no" $SSH_PROXY_FLAGS %h
User <YOUR_USERNAME>
- You can now ssh to your node by following the directions on the front page of the control panel
- To set up trait-o-matic, follow the directions at [4]