User:Lindenb/Notebook/UMR915/20100716

From OpenWetWare



  • showed the interface to RR
  • started creating a linkage with BDB
  • normalized the database with UMR915DBNormalisation
  • downloaded the data for the new project (see mail Jul 16 2010 09H52)
  • added the critical distance between variations.


loading GATK variations

Loading the GATK variations called with UnifiedGenotyper. Something like:

 ~/bin/insertvariants.sh -C -s X1 -d "bwa/recal GATK UnifiedGenotyper X1 20100709" -t vcf -cfg  ~/.umr915.properties gatk_call_X1.vcf.gz

generating new input for sift

 split -C 900k  jeter.sift.input.txt  sift_
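A minimal sketch of what `split -C` does, assuming GNU coreutils: each output file holds as many complete lines as fit under the given size, so no line is cut in two. The sample file and the `sift_demo_` prefix are hypothetical stand-ins for jeter.sift.input.txt and `sift_`, and 4k replaces 900k only so that the small demo file produces several chunks.

```shell
#!/bin/sh
# Hypothetical demo: a small stand-in for jeter.sift.input.txt.
workdir=$(mktemp -d)
input="$workdir/demo.input.txt"
# 1000 made-up lines of 20 bytes each.
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf("line_%04d_XXXXXXXXX\n", i) }' > "$input"

# -C SIZE (GNU split): at most SIZE bytes of complete lines per output file.
split -C 4k "$input" "$workdir/sift_demo_"

# Size of the largest chunk, ignoring the "total" line printed by wc.
max_bytes=$(wc -c "$workdir"/sift_demo_* | awk '$2 != "total" && $1 > m { m = $1 } END { print m }')
# Concatenating the chunks in name order should give back the input.
cat "$workdir"/sift_demo_* | cmp -s - "$input" && roundtrip=ok || roundtrip=failed
echo "largest chunk: $max_bytes bytes, round trip: $roundtrip"
```

Each `sift_*` file can then be handled as an independent input.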

and for polyphen

removing duplicates in the database

after normalisation, remove duplicates in sift and polyphen:

  # each query returns a single id per duplicated group, so one pass deletes one
  # row per group; groups with more than two copies need another pass
  mysql -N -u anonymous -e 'select id from sift group by variation_id,alt having count(*)!=1' umr915 |\
    awk '{printf("delete from sift where id=%s;\n",$1);}'
  mysql -N -u anonymous -e 'select id from polyphen group by variation_id,alt,library having count(*)!=1' umr915 |\
    awk '{printf("delete from polyphen where id=%s;\n",$1);}' > jeter.sql
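The GROUP BY queries return one id per duplicated group, so a group stored three times still has two copies after a single pass. A hedged alternative sketch: dump id, variation_id and alt ordered by id, and let awk print a DELETE for every row after the first one seen in each group. The function name `dedup_deletes` and the sample rows are hypothetical; the real input would presumably come from something like `mysql -N -e 'select id,variation_id,alt from sift order by id'`.

```shell
#!/bin/sh
# dedup_deletes TABLE: read rows "id<TAB>variation_id<TAB>alt" on stdin and
# print a DELETE for every row except the first of each (variation_id, alt)
# group. Hypothetical helper, not part of the original scripts.
dedup_deletes() {
  awk -F '\t' -v table="$1" '
    {
      key = $2 SUBSEP $3
      if (key in seen) printf("delete from %s where id=%s;\n", table, $1)
      else seen[key] = 1
    }'
}

# Made-up sample: variation 10 / alt A stored three times, 11 / alt C once.
printf '1\t10\tA\n2\t10\tA\n3\t11\tC\n4\t10\tA\n' | dedup_deletes sift
```

For polyphen the key would also need the library column, since its duplicates are defined on variation_id, alt and library.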

compare SIFT/polyphen

 select P.prediction as "polyphen", S.prediction as "sift", count(*)
 from polyphen as P, sift as S
 where P.variation_id=S.variation_id and P.alt=S.alt and P.library="HumVar"
 group by 1,2
 order by 1,2

Polyphen HumVar

polyphen           sift          count(*)
PROBABLY_DAMAGING  NULL                 5
PROBABLY_DAMAGING  UNSCORED           437
PROBABLY_DAMAGING  TOLERATED         3156
PROBABLY_DAMAGING  DAMAGING_LOW      3616
PROBABLY_DAMAGING  DAMAGING          5773
POSSIBLY_DAMAGING  NULL                23
POSSIBLY_DAMAGING  UNSCORED           576
POSSIBLY_DAMAGING  TOLERATED         6934
POSSIBLY_DAMAGING  DAMAGING_LOW      4549
POSSIBLY_DAMAGING  DAMAGING          3053
BENIGN             NULL                40
BENIGN             UNSCORED          1009
BENIGN             TOLERATED        20032
BENIGN             DAMAGING_LOW      5135
BENIGN             DAMAGING          1692
UNKNOWN            NULL                 6
UNKNOWN            UNSCORED           636
UNKNOWN            TOLERATED         3037
UNKNOWN            DAMAGING_LOW      2838
UNKNOWN            DAMAGING           255

Polyphen HumDiv

polyphen           sift          count(*)
PROBABLY_DAMAGING  NULL                12
PROBABLY_DAMAGING  UNSCORED           681
PROBABLY_DAMAGING  TOLERATED         5545
PROBABLY_DAMAGING  DAMAGING_LOW      5597
PROBABLY_DAMAGING  DAMAGING          6951
POSSIBLY_DAMAGING  NULL                15
POSSIBLY_DAMAGING  UNSCORED           407
POSSIBLY_DAMAGING  TOLERATED         5413
POSSIBLY_DAMAGING  DAMAGING_LOW      3283
POSSIBLY_DAMAGING  DAMAGING          1854
BENIGN             NULL                41
BENIGN             UNSCORED           934
BENIGN             TOLERATED        19164
BENIGN             DAMAGING_LOW      4420
BENIGN             DAMAGING          1713
UNKNOWN            NULL                 6
UNKNOWN            UNSCORED           636
UNKNOWN            TOLERATED         3037
UNKNOWN            DAMAGING_LOW      2838
UNKNOWN            DAMAGING           255
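
To read these counts as proportions, a small awk sketch can turn the three-column query output (polyphen, sift, count) into the share of each SIFT prediction within its PolyPhen class. The function name `sift_share` is hypothetical, and the sketch assumes whitespace-separated rows, one per line, as `mysql -N` would print them.

```shell
#!/bin/sh
# sift_share: read "polyphen sift count" rows on stdin and print each row
# followed by the percentage its count represents within its polyphen class.
sift_share() {
  awk '
    { poly[NR] = $1; sift[NR] = $2; n[NR] = $3; total[$1] += $3 }
    END {
      for (i = 1; i <= NR; i++)
        printf("%s\t%s\t%d\t%.1f%%\n", poly[i], sift[i], n[i], 100 * n[i] / total[poly[i]])
    }'
}

# The PROBABLY_DAMAGING rows of the HumVar table above.
sift_share <<'EOF'
PROBABLY_DAMAGING NULL 5
PROBABLY_DAMAGING UNSCORED 437
PROBABLY_DAMAGING TOLERATED 3156
PROBABLY_DAMAGING DAMAGING_LOW 3616
PROBABLY_DAMAGING DAMAGING 5773
EOF
```

Rows are printed in input order, so the ordering of the original query is preserved.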