Moore Notes 8 14 13

From OpenWetWare
  • Background on the SFams project
  • Problematic observations
    • Identical sequences are ending up in different families.
      • It looks like an HMM from a prior round doesn't always have the power to detect a sequence used to build the model.
      • So, when a new genome with identical (or similar) sequences is analyzed, the new sequence isn't incorporated into the "correct" family
      • Might be able to fix by sifting only with those HMMs that have a high recall metric
        • This would require killing families with low recall
      • Alternatively, do a two-stage sifting process: use sequence similarity first, then HMMs to detect more divergent homologs
    • Simplest band-aid may be a fast-blast approach to identify strong sequence identity, then HMM classify
    • Could try linking families together in some intelligent fashion
  • Need to talk to PIs about the value of this project and whether it's worth investing time in fixing this
    • We think this has inherent research value beyond being a community resource
    • May take a fair amount of time to fix some of the issues associated with iterative clustering
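
The recall filter and the two-stage sifting idea from the notes above could be sketched roughly as follows. This is a minimal illustration, not the SFams pipeline: every function name is hypothetical, an exact-match dictionary lookup stands in for the fast BLAST identity step, and `hmm_classify` is a placeholder callable for whatever HMM search is actually used.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


def family_recall(member_ids: Iterable[str], detected_ids: Set[str]) -> float:
    """Fraction of a family's own members that its HMM re-detects."""
    members = list(member_ids)
    if not members:
        return 0.0
    hits = sum(1 for m in members if m in detected_ids)
    return hits / len(members)


def build_identity_index(families: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each known sequence string to the first family that contains it."""
    index: Dict[str, str] = {}
    for family_id, sequences in families.items():
        for seq in sequences:
            # setdefault keeps the first assignment if a sequence repeats
            index.setdefault(seq, family_id)
    return index


def classify(
    seq: str,
    identity_index: Dict[str, str],
    hmm_classify: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Stage 1: exact identity lookup. Stage 2: fall back to the HMM step."""
    family_id = identity_index.get(seq)
    if family_id is not None:
        return family_id
    # hmm_classify is a hypothetical stand-in for the HMM-based search
    return hmm_classify(seq)
```

With this ordering, a sequence identical to an existing family member is always assigned to that member's family, even if the family HMM would have missed it. A real implementation would need a similarity threshold rather than exact matching, and the recall cutoff for retiring families would have to be chosen empirically.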