Meeting with Andy Bohonak
Politics of Academia
- clarify - authorship (including if I don't publish by a certain time), data ownership, sample ownership, lab notebooks
- make paper trail ... happy email
- SDSU form gets filled out with authorship, etc ... might be a good excuse to bring it up
verify Mendelian inheritance - high priority
- pregnant females, have babies (watch Rosemary's baby)
- maybe keep offspring alive
- screen mom and babies
- find out likelihood that a clutch all has same father
- will let me see whether a marker has null alleles or other weird inheritance
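
A rough sketch of the mom-vs-babies screen, assuming genotypes are stored as allele-size pairs per marker. The marker names, allele sizes, the `-9` missing-data code, and the function name are all made up for illustration. The idea: every baby should share at least one allele with its mother at each marker; an apparent homozygous mom with an apparent homozygous baby for a different allele is the usual hint of a null allele.

```python
MISSING = -9  # assumed missing-data code, not taken from real data


def mendelian_mismatches(mother, offspring):
    """Return the markers where the offspring shares no allele with the mother."""
    mismatches = []
    for marker, mom_alleles in mother.items():
        kid_alleles = offspring.get(marker)
        if kid_alleles is None:
            continue  # marker not scored in this offspring
        if MISSING in mom_alleles or MISSING in kid_alleles:
            continue  # cannot judge a marker with missing calls
        if not set(mom_alleles) & set(kid_alleles):
            mismatches.append(marker)
    return mismatches


# Hypothetical genotypes: marker -> (allele 1, allele 2)
mother = {"locA": (152, 156), "locB": (201, 201)}
babies = {
    "baby1": {"locA": (152, 160), "locB": (201, 205)},
    "baby2": {"locA": (156, 156), "locB": (205, 205)},
}

for name, genotype in babies.items():
    print(name, mendelian_mismatches(mother, genotype))
```

Here `baby2` is flagged at `locB` (mom looks 201/201, baby looks 205/205), which is the pattern worth re-running before blaming the marker.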
Duplicate data
- To find error rate of data (in extraction, PCR, analysis, etc)
- In text, able to say "total sample size was 300 individuals per species, of which 30 were repeated 3 times, 50 repeated twice; 2 errors found, traceable to such-and-such a process, giving error rate of 1.5%"
- for this project, not necessary to repeat all the way back to extraction
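
A minimal sketch of how the error rate could be tallied from repeated samples. It assumes each repeat is compared against the first run of the same sample, marker by marker, and that the rate is mismatches divided by comparisons; the notes above don't fix the denominator, so that choice is an assumption, as are the sample names and genotypes.

```python
def genotype_error_rate(replicates):
    """Compare every repeat of a sample to its first run.

    replicates: sample name -> list of runs, each run a dict of
    marker -> (allele 1, allele 2).
    Returns (mismatches, comparisons, rate).
    """
    mismatches = 0
    comparisons = 0
    for runs in replicates.values():
        reference = runs[0]
        for repeat in runs[1:]:
            for marker, ref_call in reference.items():
                if marker not in repeat:
                    continue  # marker not scored in this repeat
                comparisons += 1
                # sort so that (152, 156) and (156, 152) count as the same call
                if sorted(ref_call) != sorted(repeat[marker]):
                    mismatches += 1
    rate = mismatches / comparisons if comparisons else 0.0
    return mismatches, comparisons, rate


# Hypothetical data: S01 run three times, S02 run twice
replicates = {
    "S01": [{"locA": (152, 156)}, {"locA": (156, 152)}, {"locA": (152, 156)}],
    "S02": [{"locA": (160, 160)}, {"locA": (160, 164)}],
}

errors, total, rate = genotype_error_rate(replicates)
print(f"{errors} errors in {total} comparisons, error rate {rate:.1%}")
```

Whatever denominator ends up in the paper (per marker, per genotype, per individual), it should be stated alongside the rate.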
Spatial Scale
- gotta get:
- more consistent distances across roads
- more points across same road
Literature search
- Lots of pop gen papers at spatial scale
Software
- STRUCTURE - easy and quick, do it with prelim data!
- CREATE - formats microsat data for STRUCTURE, etc.
Playing with Structure software
- Structure 2.3.3 for MacOS
- spent 4 or 5 hours playing around with Structure, just trying to get it to work
- Here's what I learned today, so I don't forget!
Formatting Data
MS Excel
- first column is sample names (structure term = "Label")
- second column is population designation (structure term = "PopData", in integer form)
- 1 = green cluster
- 2 = blue cluster
- 3 = red cluster
- third column is location designation (structure term = "LocData", in integer form)
- 1 = Camp Pendleton
- 2 = Rancho Jamul / Hollenbeck
- 3 = Point Loma / Cabrillo National Monument
- 4 = Santa Ysabel Open Space Preserve
- 5 = Torrey Pines State Natural Reserve
- first row is marker names (structure term = "Marker Name")
- since diploid, leave an extra column for the second allele between markers
MS Word
- copy data from Excel, Paste Special into Word
- Save As ...
- File Conversion: Latin-US (DOS), CR/LF
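
As a possible shortcut around the Excel-to-Word step, here is a sketch that writes the same layout straight to a tab-delimited text file with CR/LF line endings. It only mirrors the column layout described above (Label, PopData, LocData, then two allele columns per marker, marker names in the first row); the function name, sample data, marker names, and `-9` missing-data code are assumptions, and I haven't checked the output against Structure itself.

```python
import csv
import os
import tempfile

# Location codes from the notes above
LOCATIONS = {
    1: "Camp Pendleton",
    2: "Rancho Jamul / Hollenbeck",
    3: "Point Loma / Cabrillo National Monument",
    4: "Santa Ysabel Open Space Preserve",
    5: "Torrey Pines State Natural Reserve",
}


def write_structure_table(path, markers, samples):
    """Write marker names, then one row per sample (diploid: two columns per marker)."""
    with open(path, "w", newline="", encoding="ascii") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\r\n")
        header = ["", "", ""]  # blank cells above Label, PopData, LocData
        for name in markers:
            header += [name, ""]  # extra blank column for the second allele
        writer.writerow(header)
        for label, pop, loc, genotypes in samples:
            row = [label, pop, loc]
            for allele_1, allele_2 in genotypes:
                row += [allele_1, allele_2]
            writer.writerow(row)


# Hypothetical samples: (Label, PopData, LocData, [(allele 1, allele 2), ...])
markers = ["locA", "locB"]
samples = [
    ("S01", 1, 1, [(152, 156), (201, 205)]),
    ("S02", 2, 3, [(156, 156), (-9, -9)]),  # -9 = assumed missing-data code
]

out_path = os.path.join(tempfile.mkdtemp(), "structure_input.txt")
write_structure_table(out_path, markers, samples)

with open(out_path, "rb") as handle:
    raw = handle.read()
print(raw.decode("ascii"))
```

Opening the file with `newline=""` and setting `lineterminator="\r\n"` is what keeps the line endings as CR/LF regardless of the operating system.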
Structure
New Project
- File: New Project
- Step 1
- Name the project: whatever you want
- Select directory: make a folder for all Structure projects to be saved in
- Choose data file: browse to the .txt file you just made (from Excel and Word)
- Step 2, 3, 4
- -- --
Run Project
- Run Length: 10,000 burn-in and 10,000 MCMC reps after burn-in is a good place to start
- Ancestry Model: Admixture is appropriate for the ConGen project
- Allele Frequency Model: run both Correlated and Independent
- correlated: frequencies in the different populations are likely to be similar, due to migration or shared ancestry; improves clustering for closely related populations (subtle structure), but may increase the risk of overestimating K
- independent: we expect allele frequencies in different populations to be reasonably different from each other; works well for many data sets (strong structure)
- Advanced: uncheck "Compute probability of data (for estimating K)" to make the program run faster
- do several iterations where K=?
- that pretty graphic that's in all the papers:
- in the left-hand pane, select the run you want (K=?)
- in the right-hand pane, click Bar plot: Show, then group by PopID