Harvard:Biophysics 101/2007/Project: Difference between revisions

From OpenWetWare
Revision as of 00:35, 28 February 2007

Biophysics 101: Genomics, Computing, and Economics


Project Ideas

Project ideas that came up in class on February 22 are posted [here]

Application1 - ApoE

Alzheimer's disease

Input Data

ApoE sequences

Data Characterization and analysis

Identify variation in the sequences, then search OMIM for similar variants and their relationship to disease
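The variation-identification step could start from something as simple as the sketch below. It assumes the sample and reference sequences are already aligned to the same length, and it only reports substitutions. The function name `find_substitutions` and the toy sequences are made up for illustration; they are not real ApoE data, and the OMIM lookup is not attempted here.

```python
def find_substitutions(reference, sample):
    """Return (position, ref_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# Toy sequences for illustration only; not real ApoE data.
reference = "ATGAAGGTTCTG"
sample = "ATGAAGGTTCCG"
variants = find_substitutions(reference, sample)
print(variants)  # [(11, 'T', 'C')]
```

Each reported position would then be the key for a manual or scripted search for known variants at that site.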

Action

Suggest clinical testing actions and lifestyle changes

Identifying Common Genetic Motifs in Disease

We can write a script to cross-reference all input genotypes with disease phenotypes. (Note: we don't have to look specifically for motifs common to disease, though that seems the most practical choice to me. Any phenotype will do.)

Input Data

Since this script would theoretically cross-reference genotype and phenotype, we would need:

  • Genotypic Inputs (presumably in the form of personalized genome sequences)
  • Phenotypic Inputs (presumably this would take the form of a medical history for the corresponding genome sequence)

Data Characterization and analysis

I think we could design an algorithm to go through and scan for varying numbers of motifs of varying lengths found in specific population subsets, but absent in others. Are there any significant patterns found in a diseased group of people? Significant motifs present in sick populations? Significant motifs absent?
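One very rough way to phrase this scan is as a set operation on k-mers (substrings of length k). The sketch below keeps only motifs found in every sequence of the affected group and in none of the unaffected group. That all-or-none rule is a simplifying assumption; a real analysis would need frequencies and a significance test. The function names and toy sequences are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def group_specific_motifs(cases, controls, k):
    """Motifs present in every case sequence and in no control sequence."""
    if not cases:
        return set()
    shared = set.intersection(*(kmers(s, k) for s in cases))
    seen_in_controls = set().union(*(kmers(s, k) for s in controls))
    return shared - seen_in_controls


# Toy sequences for illustration only.
cases = ["ACGTTGCA", "TTACGTTG"]
controls = ["ACGATGCA", "TTGCAACG"]
motifs = group_specific_motifs(cases, controls, k=4)
print(sorted(motifs))  # ['ACGT', 'CGTT', 'GTTG']
```

Running the same function over a range of k values would cover the "varying lengths" part of the idea, and swapping the arguments finds motifs that are absent from the affected group instead.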

We will certainly have to perform quality control, and perhaps we can use well-characterized conditions (cystic fibrosis, color blindness, sickle cell disease, etc.) as test cases to optimize our detection methods.

Action

How can we use these data to help people? Any identified motifs could certainly direct our research efforts, implicating new sites and players in the molecular mechanisms of disease. I'm a little confused by the recommendation on the project page of 'medical/dietary action'. Certainly we could use our data to inform someone of their risk for disease (note that this information could also be abused). Perhaps that would better inform their lifestyle choices? Prevention is an ideal solution to disease, but for the inevitable genetic ones, we would direct research towards therapy and subversion of the identified molecular mechanisms.

BioWeather(ish): Influenza

Wouldn't it be cool if we could track mutations in influenza viruses and determine when and where they occurred, what virulence changes resulted, and what mutations (and what spread properties of the mutated viruses) are likely to occur in the future, as our algorithm is updated with real-time epidemiological information?

Input Data

Sequences from Influenza A pages, and possibly real-time WHO data on (roughly characterized, if not genomically so) strains and spread rates.

Data Characterization and analysis

  • These are just some qualities we could analyze, but...
    • What are the differences in sequence, as tracked over:
      • Time
      • Location
    • What are the physical meanings of those differences (i.e., protein changes)?
    • What changes, if any, are predictable in certain regions?
    • (Note: I think this paper is a significant contribution to the influenza field and could inform this project -- CSN)
      • I'm currently unsure, but I've heard that extra-virulent strains may occur when there's a mix of:
        • Different human strains,
        • Human strains and animal (bird, pig, etc.?) strains, or
        • Different animal strains
      • So perhaps we may spot where spatial distances between strains are getting smaller and make hypotheses about new hybrid strain creation (and virulence) based on that
    • Real-time inputs will allow prediction of spread characteristics as well as, possibly, predictions of virulence
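As a starting point for the sequence comparisons listed above, the sketch below counts differences between every pair of strains and records how far apart they are in time and whether they share a location. It assumes the sequences are already aligned to equal length. The record fields (`name`, `year`, `location`, `seq`) and the toy strains are hypothetical, not a real database format.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_differences(records):
    """Compare every pair of strain records by time, place, and sequence."""
    rows = []
    for r1, r2 in combinations(records, 2):
        rows.append({
            "pair": (r1["name"], r2["name"]),
            "years_apart": abs(r1["year"] - r2["year"]),
            "same_location": r1["location"] == r2["location"],
            "differences": hamming(r1["seq"], r2["seq"]),
        })
    return rows


# Toy records for illustration only; not real influenza data.
records = [
    {"name": "strainA", "year": 2004, "location": "X", "seq": "ATGGCA"},
    {"name": "strainB", "year": 2005, "location": "X", "seq": "ATGGCT"},
    {"name": "strainC", "year": 2006, "location": "Y", "seq": "ATGACT"},
]
rows = pairwise_differences(records)
for row in rows:
    print(row)
```

Plotting `differences` against `years_apart`, split by `same_location`, would give a first look at how sequence change tracks time and place.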

Action

If we can predict where some particularly virulent strain will hit, and what its genomic characteristics will be, perhaps we can avert it with vaccines or quarantine measures.

Graphics-based visualization of polymorphism data

Inspired by the brilliant work at gapminder.org, which is looking at data of a different sort.

Input Data

HapMap data, genomic data, etc.

Data Characterization and analysis

Visualize the data as points in two- or three-dimensional space, then process it with a combination of graphics and genomics algorithms to find points of interest. For example, haplotypes could be plotted with loci along one axis, individuals along another, and some other factor along a third. Recombination frequency data could be gathered for various SNPs, and any that stand out against a theoretical model would be points of interest. Individual genomes could be binned by some sort of graphics algorithm that orders people along the axis in such a way as to minimize chaos.
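One possible reading of "orders people along the axis in such a way as to minimize chaos" is to place similar haplotypes next to each other. The sketch below does this greedily: start from the first individual and repeatedly append the not-yet-placed individual whose haplotype differs least from the last one placed. This is a heuristic chosen for illustration, not an optimal ordering, and the 0/1 allele strings and individual names are invented.

```python
def hamming(a, b):
    """Number of loci at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_order(haplotypes):
    """Order individuals so that neighbours have similar haplotypes.

    haplotypes maps an individual's name to a string of alleles,
    one character per locus, with the same loci for everyone.
    """
    remaining = dict(haplotypes)
    if not remaining:
        return []
    current = next(iter(remaining))  # arbitrary start: first individual
    order = [current]
    current_hap = remaining.pop(current)
    while remaining:
        # Pick the unplaced individual closest to the one just placed.
        nearest = min(remaining, key=lambda n: hamming(current_hap, remaining[n]))
        order.append(nearest)
        current_hap = remaining.pop(nearest)
    return order


# Toy haplotypes for illustration only; not HapMap data.
haplotypes = {
    "ind1": "00110",
    "ind2": "11001",
    "ind3": "00111",
    "ind4": "11000",
}
order = greedy_order(haplotypes)
print(order)  # ['ind1', 'ind3', 'ind2', 'ind4']
```

The resulting order could be used as the individual axis of the plot, so that blocks of shared alleles show up as contiguous regions.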

Action

Through this analysis, we should be able to gain an understanding of which alleles 'work together'. Not only would this help elucidate certain protein-protein interactions, it would also let us locate potentially hazardous combinations of alleles in each personal genome and suggest therapeutic methods to address the resulting phenotypes.

Application4

Input Data

Data Characterization and analysis

Action