How I Score Frag Data
Step 1 - Set up GeneMarker
- Refer to the GeneMarker protocol (particularly the Panel Editor section).
Step 2 - Set up your score sheet in Excel
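- The best way to organize your score sheet is with samples in rows and markers in columns. Each marker gets two columns, because we’re working with nuclear markers in diploid organisms, so each sample has two alleles per marker. The header row looks something like this: Sample | Marker 1 | Marker 1 | Marker 2 | Marker 2 | ...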
- This layout is pretty standard and allows for easy import into the analysis software you’ll be using downstream.
- I keep one score sheet per plate (rather than one sheet for all several hundred of my samples). That way each score sheet corresponds to one master mix, one cycler run, and one frag ID, which makes troubleshooting easier if I need to do it.
- My preference is to print out the blank score sheet and manually fill in the allele calls. I update the electronic score sheet when I’m all done, then archive the hard copy.
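If you’d rather generate the blank sheet than build it by hand, here is a minimal Python sketch. It assumes a plain CSV file, which Excel can open. The function name `blank_score_sheet` and the `_a1`/`_a2` column suffixes are hypothetical, so name the columns however your downstream software expects.

```python
import csv
import io

def blank_score_sheet(sample_ids, markers):
    """Return a blank score sheet as CSV text.

    One row per sample, two allele columns per marker (diploid nuclear markers).
    The "_a1"/"_a2" suffixes are a hypothetical naming choice.
    """
    header = ["Sample"]
    for marker in markers:
        # Two columns per marker, one for each allele
        header += [f"{marker}_a1", f"{marker}_a2"]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    for sample in sample_ids:
        # Empty cells, to be filled in from the hard copy later
        writer.writerow([sample] + [""] * (2 * len(markers)))
    return buf.getvalue()
```

For example, `blank_score_sheet(["S01", "S02"], ["Mk1", "Mk2"])` gives a header row of `Sample,Mk1_a1,Mk1_a2,Mk2_a1,Mk2_a2` followed by one empty row per sample. Write the text out to one file per plate to match the one-sheet-per-plate setup.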
Step 3 - Score!
- When I score my data, I actually look at it twice in different contexts. This allows me to identify any anomalies or problems with the data as well as double-check for any typos.
Context #1 = Marker-focused:
- The first time I look at the data, I zoom in on one marker and score each sample (Marker 1 for all samples, then Marker 2 for all samples, etc.), entering the alleles into my score sheet as I go. Here, I’m focusing on accurate allele calls. Many times, the peaks will be low-intensity; when you zoom in on just the marker, it’s easier to ‘call’ the allele. Additionally, some of your markers may have unique peak shapes – so long as you’re calling them consistently, you’re golden. By focusing on one marker at a time, you can better remember the unique shape you have to deal with.
Context #2 = Sample-focused:
- For the second time around, I make sure I can see all the markers at once for each sample. Here, I’m focusing on sample quality. If the sample is contaminated, I should see more than two peaks at multiple markers; I’ll know to re-extract, not re-PCR. If the sample is degraded, I should see strong peaks at the markers with shorter amplicons, and weaker peaks as the amplicon size increases; I’ll know to re-extract, not re-PCR. If the sample crapped out during extraction (e.g. the DNA got dumped during the alcohol washes, so you’re left with zero DNA), then you’ll see no amplification for any marker; I’d probably re-PCR first to make sure it wasn’t a PCR error, like not actually plating any DNA, or the sample evaporating during cycling, etc. These are all possibilities that you’ll be able to identify only if you take the whole sample into focus.
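The sample-focused reasoning above can be written down as a small decision function. This is only a sketch. `MarkerResult`, `triage_sample`, and both thresholds (`min_contaminated_markers`, `drop_ratio`) are hypothetical. It also assumes you have already set aside stutter and -a peaks, so that `peak_heights` holds only allele peaks.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MarkerResult:
    marker: str
    amplicon_size: int                                       # expected amplicon length (bp)
    peak_heights: List[float] = field(default_factory=list)  # allele peak heights (RFU)

def triage_sample(results: List[MarkerResult],
                  min_contaminated_markers: int = 2,
                  drop_ratio: float = 0.5) -> str:
    """Suggest what to redo for one sample, looking at all markers at once.

    The thresholds are hypothetical placeholders, not validated values.
    """
    # No amplification at any marker: re-PCR first to rule out a PCR error
    if all(not r.peak_heights for r in results):
        return "re-PCR"
    # More than two allele peaks at several markers: likely contamination
    multi = sum(1 for r in results if len(r.peak_heights) > 2)
    if multi >= min_contaminated_markers:
        return "re-extract (possible contamination)"
    # Degradation: the tallest peak per marker falls as amplicon size increases
    by_size = sorted(results, key=lambda r: r.amplicon_size)
    tops = [max(r.peak_heights, default=0.0) for r in by_size]
    falling = all(a >= b for a, b in zip(tops, tops[1:]))
    if len(tops) >= 3 and falling and tops[-1] < drop_ratio * tops[0]:
        return "re-extract (possible degradation)"
    return "ok"
```

The order of the checks mirrors the text: no amplification anywhere points to re-PCR first, while contamination and degradation both point to re-extraction.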
Step 4 - Quality Control
- As I’m writing down the alleles, I’m also assessing the quality of the peaks – do they pass or fail?
High-Quality Peaks = PASS
- I’m comfortable that the alleles I’m writing down are a true representation of what’s actually in that animal. Some characteristics of high-quality peaks:
- - Peak height is distinct from background readings
- - Peak has the marker’s characteristic leading peak (this may not occur for every marker, but if it does, then it should be consistent for all samples within that marker)
- - Peak has the marker’s characteristic -a peak (again, this may not occur for every marker, but if it does, then it should be consistent for all samples within that marker)
- - If the sample is heterozygous for a marker, the peaks follow the slope rule
Low-Quality Peaks = FAIL
- Something’s fishy about the peaks. Some characteristics of low-quality peaks:
- - Peak height is too low with respect to background readings
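- - Peak has an unusual shape (unlike the shape you see for that marker in other samples)
- - Peak falls outside the size range where most samples’ alleles fall for that marker
- - In heterozygotes, the slope rule is violated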
- On my score sheet, low-quality peaks get an allele call (because usually, I have an idea of what it probably is, I’m just not ready to publish that result), and a highlighted cell. Highlight means “I don’t trust this allele call, but here’s what I think it might be for when I re-run this sample.”
- Peak height should be distinct from the background noise.
[Figure: example peaks. High and clean: PASS. Low, but above background: PASS. Low, not distinct from background: FAIL.]
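One way to make the height check repeatable is a signal-to-noise ratio: the peak height divided by the tallest background reading near the marker. This is a hypothetical rule of thumb, not a GeneMarker setting. The name `height_check` and the default `min_ratio=3.0` are placeholders to tune against peaks you have already judged by eye.

```python
def height_check(peak_height: float, noise_heights: list, min_ratio: float = 3.0) -> str:
    """Return "PASS" if the peak stands clearly above the local background.

    noise_heights: heights (RFU) of background readings near the peak.
    min_ratio: hypothetical signal-to-noise cutoff.
    """
    background = max(noise_heights, default=0.0)
    if background <= 0:
        # No measurable background: any real peak counts as distinct
        return "PASS" if peak_height > 0 else "FAIL"
    return "PASS" if peak_height / background >= min_ratio else "FAIL"
```

With these placeholder settings, `height_check(2000, [50, 80, 60])` and `height_check(300, [50, 80])` both pass (the second is low, but above background). `height_check(120, [50, 80, 90])` fails.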
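- Since PCR is a competitive process (the different fragments of template DNA are competing for Taq to come build their complementary strand), Taq will finish amplifying shorter amplicons more often than longer amplicons. That gives us the slope rule: in a heterozygote, the peak for the shorter allele should be taller than (or about as tall as) the peak for the longer allele. In other words, the slope from left to right should go down, not up.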
[Figure: heterozygote examples. Slope is good: PASS. Slope goes up instead of down: FAIL.]
- If you find that the slope rule is violated, you may not have enough DNA template; try increasing how much you plate before PCRing.
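The slope rule is easy to check numerically once you have the two allele peaks of a heterozygote. Here is a minimal sketch, assuming peaks are `(size_bp, height)` pairs. The function name and the 10% `tolerance` (which allows for a little noise between two nearly equal peaks) are hypothetical choices, not part of the protocol.

```python
def slope_rule_ok(peak_a: tuple, peak_b: tuple, tolerance: float = 0.10) -> bool:
    """Check the slope rule for a heterozygote.

    peak_a, peak_b: (size_bp, height) for the two allele peaks, in any order.
    Passes if the longer allele's peak is no taller than the shorter allele's
    peak, allowing a hypothetical tolerance for near-equal heights.
    """
    # Sorting the tuples orders the peaks by size (shorter allele first)
    (short_size, short_h), (long_size, long_h) = sorted([peak_a, peak_b])
    if short_size == long_size:
        raise ValueError("same size for both peaks: homozygote, slope rule does not apply")
    return long_h <= short_h * (1 + tolerance)
```

For example, `slope_rule_ok((150, 2000), (170, 1600))` passes, while `slope_rule_ok((150, 1200), (170, 2000))` fails because the slope goes up.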
- The mutational mechanism that produces variation in microsatellite length within populations is DNA polymerase ‘slipping’ during replication. The same thing happens in PCRs: Taq slips, and instead of making, say, 16 copies of the TAGA repeat motif, it accidentally makes 15. In the frag data, this means you’ll have a nice tall peak that represents the true 16 copies, as well as a smaller peak 4 base pairs shorter that represents the accidental 15 copies.
- Taq has a particularly hard time keeping to the true number of copies in dinucleotide repeats. Because it slips more often there, you end up seeing a sort of “ramping up” effect: a series of progressively taller peaks leading up to the true peak. Leading peaks (also called ‘stutter’ by the microsatellite community) are fine, so long as you can identify them as being leading peaks.
[Figure: leading (stutter) peaks for a tetranucleotide repeat and for a dinucleotide repeat.]
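If you want a second, mechanical pass at spotting leading peaks, a simple heuristic works: a peak that sits a whole number of repeat units to the left of a much taller peak is probably stutter. The sketch below assumes peaks are `(size_bp, height)` pairs. The name `label_stutter` and all of its thresholds are hypothetical, and a dinucleotide marker with strong stutter may need a higher `max_ratio`.

```python
def label_stutter(peaks, repeat_len, max_ratio=0.5, max_units=3, size_tol=0.5):
    """Label each peak as "allele" or "stutter".

    peaks: list of (size_bp, height).
    A peak counts as stutter if it sits 1..max_units whole repeat units to the
    left of another peak and is at most max_ratio times that peak's height.
    All thresholds are hypothetical; tune them per marker.
    """
    labeled = []
    for size, height in peaks:
        label = "allele"
        for other_size, other_height in peaks:
            offset = other_size - size              # > 0 if the other peak is longer
            units = round(offset / repeat_len)
            whole_units = abs(offset - units * repeat_len) <= size_tol
            if 1 <= units <= max_units and whole_units and height <= max_ratio * other_height:
                label = "stutter"
                break
        labeled.append((size, height, label))
    return labeled
```

For a tetranucleotide marker, `label_stutter([(160, 300), (164, 2500)], 4)` marks the 160 bp peak as stutter. Two heterozygote alleles five repeats apart are both kept as alleles because of the `max_units` limit.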
- Taq likes to add an extra adenine on the end of a PCR product, but this tendency is not consistent – sometimes it does, sometimes it doesn’t, sometimes it mostly does, sometimes it mostly doesn’t.
- We use pigtails (the extra GTTT on the 5’ end of the reverse primer) to encourage Taq to adenylate its PCR products (it turns out it’s much easier to encourage Taq to adenylate than to discourage it from adenylating).
- The true, unadenylated peak (“-a”), one base pair shorter than the adenylated peak (“+a”), is usually still visible. This is fine, so long as you can consistently call the peaks.
- If you find that the real peak (“-a”) and the adenylated peak (“+a”) are about the same height, you may have too much DNA template; try decreasing how much you plate before PCRing.
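To check the -a/+a balance consistently across a plate, you could compare the two heights directly. This is a minimal sketch, assuming peaks are `(size_bp, height)` pairs and that the -a peak sits 1 bp shorter than the +a peak. The name `adenylation_flag` and the `similar=0.8` cutoff for “about the same height” are hypothetical.

```python
def adenylation_flag(peaks, plus_a_size, similar=0.8, tol=0.5):
    """Return True when the -a peak is about as tall as the +a peak.

    peaks: list of (size_bp, height); plus_a_size: size of the called +a peak.
    similar: hypothetical ratio at or above which the heights count as similar.
    """
    def height_near(target):
        # Tallest peak within tol bp of the target size (0.0 if none)
        return max((h for s, h in peaks if abs(s - target) <= tol), default=0.0)

    plus_h = height_near(plus_a_size)
    if plus_h == 0:
        raise ValueError("no peak found at the +a size")
    minus_h = height_near(plus_a_size - 1)  # the true (-a) peak sits 1 bp shorter
    return minus_h / plus_h >= similar
```

A `True` result is the cue to try plating less template on the re-run.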
Step 5 - Backlogs
- For all those samples that didn’t pass for all markers, you’ll be re-doing them (re-extracting, re-PCRing, whatever). Here’s how I separate my passes from my fails.
Updating the electronic score sheet
- I transcribe my hard copy edits to the electronic copy, including highlights. For samples that passed on all markers, I color them green (green means go) and copy that sample to another spreadsheet, the FINAL scores. Those that need re-dos get copied to the BACKLOG scores spreadsheet, retaining the highlighting. As I re-do those samples, I refer back to this score sheet to compare (and combine) results.
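The pass/backlog split can also be sketched in code, assuming each call is stored per marker as `(allele1, allele2, highlighted)`. The data layout and the function names `split_final_backlog` and `combine_rerun` are hypothetical. The combining step shows one possible policy: a clean re-run call replaces a highlighted original call.

```python
def split_final_backlog(scores):
    """Split a plate's calls into FINAL and BACKLOG.

    scores: {sample: {marker: (allele1, allele2, highlighted)}}.
    Samples with no highlighted call go to FINAL; the rest go to BACKLOG
    with their highlights kept.
    """
    final, backlog = {}, {}
    for sample, calls in scores.items():
        if any(highlighted for _, _, highlighted in calls.values()):
            backlog[sample] = calls
        else:
            final[sample] = calls
    return final, backlog

def combine_rerun(original, rerun):
    """Replace highlighted (or missing) calls with un-highlighted re-run calls."""
    merged = dict(original)
    for marker, call in rerun.items():
        old = original.get(marker)
        if (old is None or old[2]) and not call[2]:
            merged[marker] = call
    return merged
```

Trusted original calls are never overwritten here. If a re-run disagrees with one of them, that is worth a manual look rather than an automatic merge.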
Create backlog plates
- For all those backlogged samples, I group them by whether they need to be re-extracted, re-PCRed in multiplex, re-PCRed in uniplex, etc. This is mostly a judgment call on my part: wherever I think something went wrong is the step I back up to for that sample.
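Grouping the backlog into plates is mostly bookkeeping once you’ve made the judgment call for each sample. Here is a sketch, assuming you record one redo step per sample. The name `backlog_plates` and the default of 96 samples per plate are hypothetical; in practice you’d leave wells free for controls.

```python
from collections import defaultdict

def backlog_plates(redo_steps, plate_size=96):
    """Group backlogged samples by redo step, then split each group into plates.

    redo_steps: {sample: step}, e.g. "re-extract", "re-PCR multiplex", "re-PCR uniplex".
    plate_size: hypothetical number of sample wells per plate.
    Returns {step: [[samples on plate 1], [samples on plate 2], ...]}.
    """
    groups = defaultdict(list)
    for sample, step in sorted(redo_steps.items()):
        groups[step].append(sample)
    return {
        step: [samples[i:i + plate_size] for i in range(0, len(samples), plate_size)]
        for step, samples in groups.items()
    }
```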