Harvard:Biophysics 101/2009/Infrastructure: Difference between revisions

Revision as of 14:24, 15 November 2009

Infrastructure Background

This is a listing of background information for those interested in Trait-O-Matic.

Tasks

Text Classification

--I already have MeSH parsed into a MySQL database, and I am currently working on ICD-10 as a supplemental hierarchy of diseases [1]. As soon as this is ready, the data will be applied to the classification.

Reference Extraction

http://incubator.apache.org/pdfbox/ is a Java-based PDF text extractor that we can use to extract paragraphs from references, so that we can display them together with traits. I will show a demonstration of text extraction in class today!

Overview

/home/trait/core

/home/trait/core/affx_500k_to_gff.py

- outputs a GFF record for each SNP in an Affymetrix 500k GeneChip file

/home/trait/core/cgi_to_gff.py

- outputs a GFF record for each entry in a Complete Genomics CSV file

/home/trait/core/codon.py

- codon_123(input) -- returns the three-letter amino acid abbreviation for a single-letter code

- codon_321(input) -- returns the single-letter code for a three-letter amino acid abbreviation
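
As an illustration of the two conversions described above, here is a minimal sketch that assumes a plain dictionary lookup over the 20 standard amino acids. The table, the case handling, and the error behavior (a KeyError on unknown input) are assumptions made for this example; the real codon.py may work differently.

```python
# Hypothetical sketch of one-letter <-> three-letter amino acid conversion.
# The lookup table and case-insensitive matching are assumptions,
# not the actual contents of /home/trait/core/codon.py.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse table, keyed by the upper-cased three-letter abbreviation.
THREE_TO_ONE = {three.upper(): one for one, three in ONE_TO_THREE.items()}


def codon_123(code):
    """Return the three-letter abbreviation for a single-letter code."""
    return ONE_TO_THREE[code.upper()]


def codon_321(abbreviation):
    """Return the single-letter code for a three-letter abbreviation."""
    return THREE_TO_ONE[abbreviation.upper()]
```

With this sketch, codon_123("W") gives "Trp" and codon_321("Trp") gives "W".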

/home/trait/core/config.py

- contains the configuration for Trait-O-Matic, such as database passwords

/home/trait/core/fastq_to_fasta.py

- strips quality data from a FASTQ file and returns the sequences in FASTA format
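
The conversion described above can be sketched as follows. This is only an illustration: it assumes the simple four-lines-per-record FASTQ layout (header, sequence, separator, qualities) with no wrapped sequence lines, and the function name is chosen for this example rather than taken from fastq_to_fasta.py.

```python
# Hypothetical sketch: drop quality data from FASTQ records.
# Assumes exactly four lines per record and no line wrapping.
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines (">header", sequence) from an iterable of FASTQ lines."""
    # Strip line endings and ignore blank lines between records.
    lines = [line.rstrip("\r\n") for line in fastq_lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, sequence, separator, _quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record starting at line %d" % (i + 1))
        yield ">" + header[1:]  # FASTA headers start with '>' instead of '@'
        yield sequence          # the quality line is discarded
```

The sketch reads all lines into memory for simplicity; a version meant for whole-genome read files would want to stream records instead.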

/home/trait/core/gff_concordancy.py

- takes two lists of files containing GFF records and outputs the concordance between the two as a tabular file

/home/trait/core/gff_dbsnp_query.py

- appends dbSNP information to the db_xref attribute (or, in GFF3, the Dbxref attribute)
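
To make the attribute update concrete, here is a rough sketch for the GFF3 case only. It assumes GFF3-style attributes ("key=value" pairs separated by semicolons, with comma-separated Dbxref values); the function name and the "dbSNP:" prefix are assumptions for this example, and the lookup of the rs ID itself, which the real script does against dbSNP, is left out.

```python
# Hypothetical sketch: add a dbSNP cross-reference to a GFF3 attribute string.
# How gff_dbsnp_query.py actually finds and formats the rs ID is not shown here.
def append_dbxref(attributes, rsid):
    """Return the attribute string with a dbSNP entry added to Dbxref."""
    xref = "dbSNP:" + rsid
    parts = [part for part in attributes.split(";") if part]
    for i, part in enumerate(parts):
        if part.startswith("Dbxref="):
            values = part[len("Dbxref="):].split(",")
            if xref not in values:  # avoid duplicate cross-references
                values.append(xref)
            parts[i] = "Dbxref=" + ",".join(values)
            break
    else:
        # No Dbxref attribute yet, so create one.
        parts.append("Dbxref=" + xref)
    return ";".join(parts)
```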

/home/trait/core/gff_hgmd_map.py
/home/trait/core/gff_intersect.py

- outputs the intersection of two GFF files, with attributes taken from the first
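
One way to read "intersection, with attributes taken from the first" is sketched below: keep every record of the first file that overlaps some record of the second file on the same sequence, and emit it unchanged. Treating intersection as coordinate overlap is an assumption of this example, as are the function names; gff_intersect.py may use a stricter criterion.

```python
# Hypothetical sketch of a GFF intersection based on coordinate overlap.
def _parse(line):
    """Return (seqid, start, end) from one tab-separated GFF line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[3]), int(fields[4])


def _is_record(line):
    """True for data lines; comments and blank lines are skipped."""
    return bool(line.strip()) and not line.startswith("#")


def gff_intersect(first_lines, second_lines):
    """Yield lines of the first file that overlap any record in the second."""
    intervals = {}  # seqid -> list of (start, end) taken from the second file
    for line in filter(_is_record, second_lines):
        seqid, start, end = _parse(line)
        intervals.setdefault(seqid, []).append((start, end))
    for line in filter(_is_record, first_lines):
        seqid, start, end = _parse(line)
        # GFF coordinates are 1-based and inclusive on both ends.
        if any(start <= e and s <= end for s, e in intervals.get(seqid, [])):
            yield line.rstrip("\n")  # attributes come from the first file
```

The linear scan per record keeps the sketch short; for genome-scale inputs, sorted intervals or an interval tree would be the more practical choice.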

/home/trait/core/gff_morbid_map.py
/home/trait/core/gff_nonsynonymous_filter.py
/home/trait/core/gff_omim_map.py
/home/trait/core/gff_pharmgkb_map.py
/home/trait/core/gff_snpedia_map.py
/home/trait/core/gff_sort.pl

- sorts a GFF file (by feature length, minScore, maxScore, or custom expression)

/home/trait/core/gff_subtract.py
/home/trait/core/gff_twobit_map.py
/home/trait/core/gff_twobit_query.py
/home/trait/core/hapmap_load_database.py
/home/trait/core/json_allele_frequency_query.py
/home/trait/core/json_to_job_database.py
/home/trait/core/maq_snp_to_gff.py

- outputs a GFF record for each entry in a Maq SNP file
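
For a sense of what such a converter involves, here is a small sketch of turning one SNP line into a nine-column GFF record. It assumes the first four tab-separated columns of the input are chromosome, position, reference base, and consensus base; the source name, feature type, and attribute keys written to the GFF line are made up for this example and may not match what maq_snp_to_gff.py emits.

```python
# Hypothetical sketch: convert one tab-separated SNP line to a GFF record.
# Assumed input columns: chromosome, position, reference base, consensus base.
def maq_snp_line_to_gff(line, source="maq"):
    """Return a tab-separated, nine-column GFF line for a single SNP."""
    fields = line.rstrip("\n").split("\t")
    chrom, position, ref, consensus = fields[0], int(fields[1]), fields[2], fields[3]
    # Attribute keys below are illustrative, not the script's real output.
    attributes = "ref_allele %s;consensus %s" % (ref, consensus)
    return "\t".join([
        chrom, source, "SNP",
        str(position), str(position),  # a SNP spans a single base
        ".", "+", ".",                 # score, strand, phase
        attributes,
    ])
```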

/home/trait/core/omim_print_variants.py
/home/trait/core/setup.py
/home/trait/core/snpedia.py

- outputs tab-separated variant information (into data/snpedia.txt) for each entry in SNPedia

/home/trait/core/snpedia_print_genotypes.py

- goes through snpedia.txt and prints out the associated genotypes (found in the snp19 database)

/home/trait/core/snpinduse_to_gff.py
/home/trait/core/trait-o-matic-server.py
/home/trait/core/venter_gff_snp_to_gff.py
/home/trait/core/warehouse.py
/home/trait/core/watson_gff_to_gff.py
/home/trait/core/yh_gff_snp_to_gff.py

/home/trait/data

This folder stores all of the raw data used by the application.

/home/trait/www/system/application/controllers/

results.php -- this controls the display of results!

/home/trait/www/system/application/models/

/home/trait/www/system/application/views/

Database Contents

Ariel

jobs contains each processed genome's information

files contains the locations of each processed genome's temporary files (in /tmp/...)

- id / path / kind (genotype, phenotype, omim, hgmd, morbid, snpedia, pharmgkb) / job

users contains the usernames, password hashes, and emails of users who have submitted jobs
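
As a rough illustration of how the files columns listed above might be used, the sketch below builds a stand-in table in an in-memory SQLite database and looks up one job's file paths by kind. The column types, the idea that job refers to a row in jobs, the example paths, and the use of SQLite rather than the production database are all assumptions made for this example.

```python
# Hypothetical sketch of the files table (id / path / kind / job).
# Uses in-memory SQLite purely for illustration; types and paths are made up.
import sqlite3


def paths_for_job(conn, job_id):
    """Return a {kind: path} mapping for one job's rows in the files table."""
    rows = conn.execute("SELECT kind, path FROM files WHERE job = ?", (job_id,))
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, kind TEXT, job INTEGER)"
)
conn.executemany(
    "INSERT INTO files (path, kind, job) VALUES (?, ?, ?)",
    [
        ("/tmp/example1/genotype.gff", "genotype", 1),
        ("/tmp/example1/omim.json", "omim", 1),
        ("/tmp/example2/genotype.gff", "genotype", 2),
    ],
)
```

Calling paths_for_job(conn, 1) on this stand-in data returns the genotype and omim paths for job 1.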

Caliban

Revision Control

Joining

Acquire a Harvard Ethics Training in Human Research (HETHR) certificate by completing the training at [2]. This should take around 2 hours: you have to read 6 of the required sections and 4 of the "electives". Email your certificate to Sasha (awaitz@post.harvard).

Sign up for access to the control panel at [3]

Create an RSA SSH key pair. The public key will allow the server to authenticate you. Generate the pair with the command

ssh-keygen -t rsa

Ensure that your private and public keys are in your .ssh or .openssh directories. Otherwise, ssh will not know where to look for them.

Upload your public key to the control panel

Follow the directions on the front page of the control panel. They will tell you to edit your .ssh/config file by adding:

Host *.oxf
    ProxyCommand ssh -p2222 turnout@switchyard.oxf.freelogy.org -x -a -o "Compression no" $SSH_PROXY_FLAGS %h
    User <YOUR_USERNAME>

You can now ssh to your node by following the directions on the front page of the control panel.

To set up trait-o-matic, follow the directions at [4]

