Moore Notes 8 14 13
From OpenWetWare
- Background on the SFams project
- Problematic observations
- Identical sequences are ending up in different families.
- It looks like an HMM from a prior round doesn't always have the power to detect a sequence used to build the model.
- So, when a new genome with identical (or similar) sequences is analyzed, the new sequence isn't incorporated into the "correct" family
- Might be able to fix by sifting only with those HMMs that have a high recall metric
- This would require killing families with low recall
- Alternatively, do a two-stage sifting process, where you use seq similarity first, then HMM for more diverged homology detection
- Simplest band-aid may be a fast-blast approach to identify strong sequence identity, then HMM classify
- Could try linking families together in some intelligent fashion
- Need to talk to PIs about the value of this project and whether it's worth investing time in fixing this
- We think this has inherent research value beyond being a community resource
- May take a fair amount of time to fix some of the issues associated with iterative clustering
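
A rough sketch of the two fixes floated above (the recall filter and the two-stage sift), to make the discussion concrete. Everything here is an assumption for illustration: the function names, the 0.9 recall cutoff, and the data layout are hypothetical, not the SFams pipeline. Stage 1 uses an exact-match dictionary as a stand-in for the fast-blast step, and `hmm_search` is a placeholder callable for whatever HMM classifier is actually used.

```python
from typing import Callable, Dict, Optional, Set


def hmm_recall(hits: Set[str], members: Set[str]) -> float:
    """Fraction of a family's own member sequences that its HMM re-detects."""
    if not members:
        return 0.0
    return len(hits & members) / len(members)


def high_recall_families(
    members_by_family: Dict[str, Set[str]],
    hits_by_family: Dict[str, Set[str]],
    min_recall: float = 0.9,  # hypothetical cutoff, not from the notes
) -> Set[str]:
    """Keep only families whose HMM recovers enough of its own members."""
    return {
        fam
        for fam, members in members_by_family.items()
        if hmm_recall(hits_by_family.get(fam, set()), members) >= min_recall
    }


def classify(
    seq: str,
    exact_index: Dict[str, str],
    hmm_search: Callable[[str], Optional[str]],
    allowed_families: Set[str],
) -> Optional[str]:
    """Two-stage sift: sequence identity first, then HMM for diverged homologs."""
    # Stage 1: identical sequences always land in the family they already
    # belong to (exact lookup standing in for a fast similarity search).
    family = exact_index.get(seq)
    if family is not None:
        return family
    # Stage 2: fall back to the HMM, but only trust high-recall families.
    family = hmm_search(seq)
    return family if family in allowed_families else None


# Toy data: family -> member sequences, and family -> members its HMM re-detects.
members = {"F1": {"AAA", "AAC"}, "F2": {"GGG", "GGT", "GGA", "GGC"}}
hits = {"F1": {"AAA", "AAC"}, "F2": {"GGG"}}

allowed = high_recall_families(members, hits)
exact_index = {s: fam for fam, seqs in members.items() for s in seqs}


def toy_hmm(seq: str) -> Optional[str]:
    """Placeholder classifier; a real one would score seq against each HMM."""
    return "F1" if seq.startswith("AA") else "F2"


print(classify("GGG", exact_index, toy_hmm, allowed))
print(classify("AAT", exact_index, toy_hmm, allowed))
```

Note that in this sketch a low-recall family is not killed outright: it simply stops receiving new members through the HMM stage, while identical sequences still reach it through stage 1. Whether that is preferable to dropping the family is an open design question.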