DataONE:meeting notes:16 June 2010 chat
Chat with hpiwowar@gmail.com, Nicholas, valerie.janae.enriquez@gmail.com
Sarah Walker Judson <walker.sarah.3@gmail.com> Wed, Jun 16, 2010 at 11:16 AM
To: walker.sarah.3@gmail.com
Cc: hpiwowar@gmail.com, valerie.janae.enriquez@gmail.com, nicholas.m.weber@gmail.com
In the chat room: Heather Piwowar (hpiwowar@gmail.com), Valerie Enriquez (valerie.janae.enriquez@gmail.com), Nicholas Weber (nicholas.m.weber@gmail.com)
10:00 AM Heather: You've been invited to this chat room!
Hi guys! you all here?
10:01 AM Nicholas: Hello all
Sarah: Good morning
10:02 AM Valerie: hello
Nicholas: I apologize in advance for having to leave early again today... it's the only day the rest of the summer I won't be available (promise) Heather: that's ok. we understand. read the transcript and find me later if anything needs to be cleared up?
10:03 AM I posted an agenda...
Nicholas: definitely Heather: http://www.openwetware.org/wiki/DataONE:Notebook/Summer_2010/2010/06/16 sorry I didn't get it up earlier.
10:04 AM Anything else that anyone knows they want to cover?
Sarah: i have a few questions of things I realized i'm not resolved on, but hopefully those will be answered as we go Heather: ok. if not, bring them up as it goes
10:05 AM Since Nic needs to leave early, maybe it makes sense to start quick thoughts about data collection
I'm guessing that mostly it will be Valerie and Sarah syncing up on that today, right? Anything jump out that you'd want Nic's opinion on?
10:06 AM Nicholas: Sara, have you seen my ss? and does the metadata I'm collecting seem to be what you expected?
Sarah, sorry Valerie: if he's found anything regarding recommended citation formats for the repositories that I haven't found (I couldn't find anything on TreeBASE except the example) Sarah: your metadata is different, but we can talk about that later
10:07 AM i have a few questions for nic (but answer valerie's first):
Heather: yes, that is true. Valerie, you and Nic have some overlapping ss on repository metadata, don't you? You could definitely merge those, I'm guessing, and one of you could take the lead and the other could just use the data :) Nicholas: I haven't found anything about citing data within TreeBASE other than the identifiers Valerie: ok, that was pretty much what I found
10:08 AM Heather: (or you could both contribute to a shared ss)
Nicholas: Pangaea defers to the WDC and OECD...but those are just recommendations, and they don't spell out citing datasets at all
10:09 AM Valerie: ok, I remember seeing the pangaea wiki with the recommendation and it was pretty vague
http://wiki.pangaea.de/wiki/Citation
10:10 AM Nicholas: excellent, thank you
Valerie: well, not vague so much as just recommended guidelines Nicholas: my focus has been on journals so far so this is great
10:11 AM Valerie: but yeah, that was my only question for you Nic. 10:12 AM Sarah: so who is collecting depository metadata? i thought nic was, but that conversation made me unsure
Nicholas: I am, but I think Valerie probably dove into this while trying to construct search terms Heather: both :) so no doubt some overlap that could be simplified.....
10:13 AM yup, part of Nic's project, but helped Valerie get going (especially when it wasn't as sure that that was part of Nic's scope....)
Sarah: okay, and i know that you (nic) are covering a lot more journals than i am. do you have any suggestions on ones that i should cover for disciplinary coverage or ones that have good data sharing policies?
10:14 AM Nicholas: so far, not a lot do
In fact, most published by big names like elsevier and springer have really general vague guidelines for authors
10:15 AM and no mention of an external place to submit data (other than genbank in some cases)
Sarah: ok. well, if you run into any, let me know since I'm still scoping good journals to use Valerie: admittedly, most of the stuff I've found is in BioMed Central and Syst. Biol., but I think you've already looked there Heather: oh! This reminds me. Nic and Valerie, in case it is useful.... Valerie: (and PLoSONE) Sarah: yeah, i'm using sysbio as a focus journal Heather: Nature has a GREAT list of repositories
10:16 AM Valerie: oh, neat
I keep meaning to do more work at the simmons library (since if I'm logged in at one of their computers, I have fulltext) Sarah: ok. i've thought about nature, but the journal itself isn't "discipline" specific, but the articles themselves would be Heather: Right, so Sarah I don't think it would make sense as one of your "5" Valerie: Nature has come up quite a few times in my searches
10:17 AM Heather: http://www.nature.com/authors/editorial_policies/availability.html
Valerie: sweet, thanks Nicholas: Valerie, I am checking about remote access - I might be able to get a pass for you to use on UIUC's resource collection Heather: one of the best spelled out policies of any of the (biomed) journals that I've looked at And while I'm at it, Sarah here is a link to the model repository I was thinking of
10:18 AM Valerie: thanks Nic, I remember UIUC had a ton of access because of the Engineering school
Heather: http://www.ebi.ac.uk/biomodels-main/ Sarah: thanks Heather: and I don't have access to this article but you might? http://portal.acm.org/citation.cfm?id=508791.508876
10:19 AM Sarah: yeah. I've got awesome remote access to my institution's library.
Heather: So Nic you might want to review that list of Nature's repositories for consideration in your list Nicholas: I will Valerie: yeah, they cut a lot of funding for online journals at simmons :( Heather: ok. Nicholas: I'll put that pdf into the OWW too (I had access) Heather: other questions for Nic?
10:20 AM Sarah: i have a general group question about collaboration
how do we want to combine our google spreadsheets? i don't like working directly in google docs, but am willing to post at the end of every day Nicholas: I agree Sarah: but i'm not sure if a shared sheet is the answer quite yet but we do need unique identifier fields so we can hook our data up later
10:21 AM Heather: what's your hesitation on shared sheet at this point, Sarah?
Nicholas: trying to work in google docs was slowing me down... What I found though is that if I can work in excel and then use the (re-upload) button at the end of the day Sarah: just that it gets big Nicholas: it's pretty effective Sarah: we're all recording relevant full text sections right? my spreadsheet is getting memory intensive in a hurry i'm about ready to migrate to access b/c it's loading slow and i can only imagine worse once i add functions
10:22 AM which i just noticed google docs has added! so, my main concern is storage size not openness or whatever Heather: ok, so for now let's say upload to google spreadsheets at the end of every day? Valerie: ok
10:23 AM Sarah: ok. but should we keep them separate or combine?
if we're reuploading, probably separate right? Heather: right Sarah: otherwise we'll erase each others data Nicholas: we could create a folder Heather: I think maybe in the next few weeks a more streamlined solution may become clear Nicholas: and share the folder for easier access Heather: but for now that would work Sarah: oh. good idea about the folders Nicholas: I just realized you could share an entire folder
10:24 AM Sarah: nic, do you know if the reupload takes your formulas up to the cloud?
Heather: I think though that people extracting the same things from different sources (Sarah/Valerie and Valerie/Nic?) Sarah: i've had problems with this in the past Nicholas: I am not sure about that Heather Sarah: the function language is different in excel and google doc Heather: should make sure that their "headings" are the same..... ideally. Sarah: yeah...valerie and i don't have that much overlap at this point, my extraction is more intensive in the article
10:25 AM Heather: which part aren't you sure of, Nic?
Nicholas: sorry, I thought you had asked about the formulas I meant Sarah Sarah: oh...i'll look into that when i upload, could be a problem for me migrating to docs Heather: to help me understand, what sort of formulas do you have at this point?
10:26 AM Sarah: just if
Heather: is it something that is key to your spreadsheets right now? Sarah: but i've used docs in the past and it translates them as text rather than active formulas yep, it's how i'm setting up my coding but, i can do that on the ground and have docs hold just the text just can't edit in docs then
10:27 AM no worries
back to what we were talking about... Heather: ok Sarah: but, before we get into exact fields can i ask quickly about unique identifiers that is a pain to collect if you don't do it upfront Heather: yup, but can we pause even before that Sarah: sure Heather: because I think Nic needs to run in a few minutes Nic, are you available tomorrow? and others? Nicholas: I am
10:28 AM Valerie: I am
Heather: one idea is to hold the stats etc part of the conversation until tomorrow, if you are all available? Sarah: me too Heather: 9am pacific time? different preference? Sarah: good Nicholas: sounds great Valerie: sure
10:29 AM Heather: cool.
ok, then I'll hold off on all the other bits till then, except data collection conversation. I'll draft a blog post this afternoon for tomorrow kudos, you guys, on the blog posts. Great so far. Nice tone, nice content.
10:30 AM Valerie: neat, I was worried I was too conversational in mine
Nicholas: we've had a lot of retweets too... 60 page views today Heather: I figured I'd make my post an integration one, looking at how the projects overlap. Do I have permissions to post it directly, or shall I send it to someone? awesome! Nicholas: actually 55... 51 yesterday Sarah: yeah. i noticed that. i tried to put in a plug for people to comment so we know who is looking at it
10:31 AM Nicholas: Heather, do you have a wordpress account?
Heather: do you have google analytics set up? that can be fun (and a huge time waster) Nicholas: otherwise I'll send you an invite this afternoon when my class finishes Heather: yup I do. hpiwowar or researchremix, I forget which. researchremix I think ok.
10:32 AM one last kudos then, from Todd in a chat I had with him yesterday. He thinks the projects make a really nice collage of research interests
Nicholas: ok, I just added you Heather: and also thinks that you've really stepped up and are doing a great job, all of you.
10:33 AM so nice work :)
Hopefully our next big conference call will be a bit more technologically friendly and we can get feedback and direction from the other mentors too.
10:34 AM ok, anything else before Nic heads off?
Sarah: yeah, can i cover unique identifiers just to make sure it's getting done? Heather: yup Nicholas: I have to run (apologies) ... I'll check the transcript this evening and talk more tomorrow? Sarah: these are the ones i think we need and just want to verify they are extracted
- journal = isbn
- issue = vol+issue
- article = doi
- depository = ?? our own numbering?
Heather: journal, do you mean issn? Sarah: that way all the data can be linked in a later database or whatever for analysis
10:35 AM oh yep, sorry
Heather: I think there are also two issns, one for print and one for electronic Sarah: is everyone collecting that? Valerie: ok, that's a good idea, I wasn't sure how much data to capture about the journals Sarah: i couldn't find issn esp i mean in other people's ss Valerie: I had just been doing DOIs and links to article abstracts Nicholas has left
10:36 AM Heather: which issn do you want as the primary key?
some journals don't have print issns these days Sarah: probably electronic since that's what our citation programs will probably extract anyways Heather: (I don't know if all have electronic issns?)
10:37 AM Sarah: we can just record it on a journal list and all use that one
just like a relational table Heather: also year? Sarah: yeah. good. i think our ss should mimic relational tables of a db to make that easier in a later stage Valerie: ok Sarah: rather than retroactively collecting tedious data that is easy enough to retrieve upfront
10:38 AM maybe we could have a group ss about that baseline metadata
especially for the depositories and journals Heather: good idea Sarah: articles, we should just all collect doi
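The idea of spreadsheets that mimic relational tables can be sketched as follows. This is only an illustration, assuming a shared journal list keyed by electronic ISSN and an article list keyed by DOI; the table names, column names, and placeholder values are hypothetical and not taken from anyone's actual spreadsheet.

```python
import sqlite3

# Hypothetical schema: names and values are placeholders for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE journal (
        issn  TEXT PRIMARY KEY,  -- electronic ISSN from the shared journal list
        title TEXT NOT NULL
    );
    CREATE TABLE article (
        doi   TEXT PRIMARY KEY,  -- unique identifier for each article
        issn  TEXT NOT NULL REFERENCES journal(issn),
        year  INTEGER
    );
    """
)
conn.execute("INSERT INTO journal VALUES (?, ?)", ("0000-0000", "Example Journal"))
conn.execute(
    "INSERT INTO article VALUES (?, ?, ?)",
    ("10.9999/example.1", "0000-0000", 2009),
)

# Journal-level metadata is recorded once and linked to articles by ISSN.
rows = conn.execute(
    """
    SELECT a.doi, j.title, a.year
    FROM article AS a
    JOIN journal AS j ON a.issn = j.issn
    """
).fetchall()
```

Keeping the ISSN on every article row is what lets journal metadata live in one group sheet instead of being re-collected in each person's spreadsheet.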
10:39 AM we don't have to worry about standardizing doi too much b/c it's not used for linking, just record tracking
by that i mean, just get a doi but don't worry about how many slashes it has or whatever Heather: why won't it be used for linking? Sarah: b/c articles are linked to journals, but articles aren't linked to anything smaller right?
10:40 AM Heather: I think they might be.
for example, some article metadata can easily be extracted from ISI Web of Science to collect number of authors, etc. Sarah: we discussed maybe multiple datasets yesterday, but then determined that should be coded not nested Heather: not sure I understand "coded not nested"
10:41 AM Sarah: yeah...but keep your doi consistent with your extraction source, but not necessarily among extraction sources
Heather: sure, yes. and I think doi might be used for linking. Sarah: nested...meaning related coded...meaning as article metadata directly stored with the article info not in a related table in terms of a database Heather: I think we don't know what might be nested with this data in the future
10:42 AM people might want to supplement it with various things we can't foresee
Sarah: yeah, but doi is still a unique identifier Heather: and I'm guessing that the doi might be a great unique identifier for that. right. Sarah: valerie could record a 10.99/999/99 one and mine might be //10.99.9993 and they would still identify our article maybe i'm not making sense
10:43 AM i've just seen doi recorded slightly differently in some places
Heather: agreed... I just wanted to make sure there wasn't misunderstanding around the fact that dois might be used for linking at some point I think doi cleanup later won't be hard Sarah: no, doi is the unique identifier but doesn't need a standardized format at this point that's all Heather: agreed! Sarah: sorry to make it more complicated sounding than it really is Valerie: ok
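The later DOI cleanup mentioned here could look something like the sketch below. It assumes the differences between sources are limited to resolver or "doi:" prefixes, stray leading slashes, surrounding whitespace, and letter case (DOIs are case-insensitive); the function name is hypothetical, and real exports may need more rules than this.

```python
import re

# Matches a leading resolver URL or "doi:" label, in any letter case.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)\s*", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Reduce a DOI as recorded by one source to a bare, lowercase form."""
    s = raw.strip()        # drop surrounding whitespace
    s = _PREFIX.sub("", s)  # drop "doi:" or a resolver URL prefix
    s = s.lstrip("/")      # drop stray leading slashes
    return s.lower()       # DOIs compare case-insensitively
```

For example, `normalize_doi("doi:10.1234/ABC.5")` and `normalize_doi("//10.1234/abc.5")` both give `10.1234/abc.5`, so records from different extraction sources can be matched after the fact.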
10:44 AM Sarah: okay, sorry for that diversion, should we talk about data extraction and overlap between valerie and myself?
let me pull up both the spreadsheets real quick Heather: yup
10:45 AM Sarah: so, in general, to me valerie is recording more of the success of a search than reading through the articles in depth
Heather: post the urls so we make sure we are looking at the same ones? Sarah: https://spreadsheets.google.com/ccc?key=0Am4hbt8Ef8WXdENmZU83dTRUbW5fNFg3RjFFa1Z0LUE&hl=en https://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE1LYlYtWHRXblNXa3ladXNNY3BDbEE&hl=en Valerie: https://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE1LYlYtWHRXblNXa3ladXNNY3BDbEE&hl=en#gid=0
10:46 AM Heather: that's true right now as she figures out which searches are effective. But soon she'll standardize on a few searches and then broaden the base of things she is looking at and do more extraction
Sarah: ok. b/c i've been focusing on more in depth extraction we only have a few similar fields at this point
10:47 AM Heather: Valerie... I'm guessing we need a reconnect soon just to make sure we are on the same page on this... in the mean time speak up if I'm saying something you don't agree with
Sarah: also, valerie's is more focused on treebase in particular, while i'm considering all possible data reusing and sharing Valerie: ok, I just think I need more information on what depth extraction is
10:48 AM Sarah: both the known depositories and produced data that may just be stored with the journal
mostly, when i get in an article i read the materials and methods in detail i record info about if the dataset was cited, how it was cited, what type of dataset it was, etc. i do this all for data reuse, data sharing, and data produced
10:49 AM not just "yes/no"
Valerie: oh, ok. yeah, I think I've just been skimming to see if it's mentioned at all Sarah: or if it cited something Valerie: yeah, 1 or 0 Sarah: yeah Heather: I think the main difference is that Valerie is just doing it for data reuse, not data sharing and data produced Sarah: that's what i noticed but isn't the "out of treebase" field data sharing?
10:50 AM sorry i'm stinking at typing fast today
Heather: Because the data shared and data produced info doesn't factor into future analyses, given her method Valerie, does your ss still have some data sharing lines at the top? Sarah: sorry the "data into treebase" isn't that data sharing? i.e. if the author deposited their data Valerie: yes
10:51 AM Heather: From when you were collecting data sharing initially, before Everything Was Clear? ;)
Valerie: I mainly started doing that to differentiate whether or not an article has deposited data or if they have taken data out yeah I might delete those rows once I'm done Heather: maybe you could delete those lines, to prevent misunderstanding, would that make sense? Sarah: yeah, i see that as i progress down the records Valerie: yeah
10:52 AM Sarah: or move to a separate sheet to not lose it entirely
Valerie: I made a point to mark out phase II search, which has been more successful ah Sarah: like "failed attempts" or "data into treebase" Valerie: ok Sarah: ok, so assuming the searches get solidified, what data will you extract? still just yay or nay about treebase citation?
10:53 AM or should we adopt some of my fields for more detailed extraction?
Valerie: well, yay or nay about the ways it's cited (either author only or doi or something else) Heather: Sarah, see her column M, "text of sentence(s) in article making citation reference" Sarah: yeah. i have a similar field
10:54 AM Heather: (also worth noting that she's starting with TreeBASE now but will move to a few additional repositories)
Sarah: i just expected that we would have more overlap since that seemed like a concern at the beginning but i'm more clear on our different approaches now
10:55 AM valerie's seeing if the depositories are actually being cited
i'm seeing if individual articles are citing data right? Heather: right. Sarah: in very nutshell terms Valerie: you're going into much further depth than I am right Sarah: ok. we have a few overlap fields that we should standardize, but mine won't look like yours until i do my full coding also, we can use doi to communicate
10:56 AM Heather: another way of looking at it: Sarah is looking at data reuse in a few journals across time, Valerie is looking at data reuse of a few repositories across journals
And while Sarah is extracting her data Sarah: my data might have more indepth info about an article you happen across Heather: she is also extracting other data that will be useful in other, related, analyses since she's in the articles anyway, and her "search" strategy makes data extraction from those articles useful.
10:57 AM right?
Valerie: yes Sarah: yep
10:58 AM did you get my comment about doi?
Heather: cool. Valerie: for example, today, I found that an article cited by treebase with one author has a different primary author listed in other citations sharing dois? Sarah: mostly that if we record doi, our data could "communicate" and I might have more indepth info about an article you happen across which would be cool Heather: agreed! good point.
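How recording DOIs lets the two spreadsheets "communicate" can be sketched as a simple left join. The rows and field names below are invented for illustration; only the idea of matching on DOI comes from the discussion.

```python
def merge_on_doi(left, right):
    """Left join: keep every row of `left`, adding fields from `right` when DOIs match."""
    by_doi = {row["doi"]: row for row in right}
    merged = []
    for row in left:
        match = by_doi.get(row["doi"], {})
        merged.append({**row, **match})
    return merged


# Hypothetical rows standing in for the two spreadsheets.
valerie_rows = [
    {"doi": "10.9999/a", "treebase_mentioned": 1},
    {"doi": "10.9999/b", "treebase_mentioned": 0},
]
sarah_rows = [
    {"doi": "10.9999/a", "way_dataset_cited": "accession number"},
]

merged = merge_on_doi(valerie_rows, sarah_rows)
```

An article that appears in both sheets picks up the more in-depth fields, while an article found by only one person is kept unchanged.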
10:59 AM Valerie: oh yeah
that would be awesome Heather: Valerie, one vision for your research I don't think I made very clear.... ... is this idea that ultimately, once your searches are finalized,
11:00 AM it will be useful to try to apply them across many journals (50? all of them ala Google Scholar or Scirus? something like that).
to really get a pan-journal view to complement the narrow-journal focus that Sarah will be getting.
11:01 AM Valerie: I have been finding a wide range of journals
but a lot of the more "successful" searches can be in the same journal (which I should keep an eye on) Heather: (Developing searches may be easiest within a journal, your call, I just wanted to make sure I hadn't confused you with developing vs unleashing finalized searches)
11:02 AM Valerie: I've had the most luck using ISI Web of Science
Heather: that's fine. Just wanted to make sure we were clear that there is no need (and in fact an anti-need) to on purpose limit your search space Valerie: since it's sort of hard to search by citation in Scirus the only limit I put is 2008-2010, really Heather: yes, definitely. by citation probably easiest in ISI or Scopus.
11:03 AM cool!
ok, sorry for the diversion, just wanted to make sure we were on the same page Valerie: it's cool, I'm sure I'll have more questions as I go along Sarah: while we're diverted...quick comment what kind of data is stored on treebase?
11:04 AM Valerie: genetics, I think.
Sarah: i'm working on the molecular ecology journal and they post on genbank almost as if it's required but not their phylogenies so therefore not on treebase Valerie: hm... Sarah: i thought treebase was data matrices (pairwise) for phylogenetics like a bunch of genes aligned so genetics, yes, but aligned sequences
11:05 AM for constructing phylogenies
i need to look at the raw data b/c i find it odd that molecular ecology has people depositing on genbank but not treebase Valerie: some sentences I've copy/pasted mention genbank and treebase practically in the same breath
11:06 AM Sarah: yeah...i'm seeing lots of genbank but minimal to no treebase where i think it should be
yeah...looking at it now, it's phylogenetic trees weird i think genbank is more well known
11:07 AM and almost given that you post there or reuse data from there
we should ask nic what he thinks is the differences in regard to depository metadata Valerie: yes
11:08 AM Sarah: did we lose heather?
Heather: I'm here... not in discussion because I am currently ignorant about Treebase, unfort. Hope to fix it soon.
11:09 AM Valerie: Treebase is finicky
Heather: Was taking a minute to look at ss columns Valerie: either that or it just doesn't like google chrome or the version of mozilla firefox I'm using Heather: so will be ready to pick up a conversation there when conversation works its way back Sarah: yeah. i'm done with that sorry
11:10 AM Heather: no, no problem, it is good stuff
ok, so I think that ideally, Sarah, all of your "Data REUSE" columns could/should overlap with Valerie's, yes? so: Mention of DataSet WhereDataSetCitedInPaper Where_Coded Type of cited Dataset Way DataSet cited Acquirement of dataset Depository of cited dataset Details_CitedData RelevantCitations Sarah: yep
11:11 AM but i need to code them to be "treebase y/n"
etc so they match better Valerie: mine aren't quite like that, but I can switch it around to match Heather: could/should overlap with these from Valerie's sheets: Citation of TreeBASE through mention of TreeBASE Full citation as per TreeBASE recommendations Citation of TreeBASE through DOI or URI, or Study/Matrix Accession no. Citation of TreeBASE through mention of data author Number of articles citing this article text of sentence(s) in article making citation reference Data INTO TreeBASE Data OUT OF TreeBASE additional notes Sarah: right, now I'm covering multiple datasets per paper and still figuring out how to do that yeah, her's are more binary (y/n)
11:12 AM but, i'm trying to figure out how to do that to keep information about the same dataset together
for instance, i have a genbank citation that is cited with an accession # but another gene not deposited in genbank cited by mentioning a paper etc in the same paper
11:13 AM so i want to keep that data associated with each other
rather than as general metadata Heather: yup. I think worth figuring out a consensus that meets both needs....
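One way to keep information about each dataset together while still producing y/n columns is to record one row per dataset cited in an article and derive the binary fields afterwards. The sketch below assumes that layout; the class, field names, and example values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    article_doi: str  # links the row back to the article
    dataset_no: int   # 1, 2, ... within the same article
    depository: str   # e.g. "GenBank", or "none" if not deposited
    way_cited: str    # e.g. "accession number", "mention of a paper"


# Two datasets cited in the same (hypothetical) paper.
rows = [
    DatasetCitation("10.9999/a", 1, "GenBank", "accession number"),
    DatasetCitation("10.9999/a", 2, "none", "mention of a paper"),
]

by_article = defaultdict(list)
for r in rows:
    by_article[r.article_doi].append(r)


def uses_depository(doi: str, depository: str) -> int:
    """Collapse the per-dataset rows into a 1/0 field for one depository."""
    return int(any(c.depository == depository for c in by_article.get(doi, [])))
```

The detailed rows stay intact for in-depth analysis, and `uses_depository` gives a binary column comparable to a "TreeBASE y/n" style field.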
11:14 AM Sarah: i'll look at it again today after i look at some articles from ecology to see how that might affect my fields
Heather: So that the data could be aggregated directly with minimal reformatting Valerie: ok. all else fails, I could add some extra columns to match Sarah: then i'll propose how to organize that b/c i think it's more making my data look like valerie's than vice versa b/c mine is messy right now not fully coded
11:15 AM Valerie: should I add a year heading to mine for the article?
Sarah: but yeah, we'll both probably need additional fields of some sort yes! Valerie: (article date and ISSN for journal title) Sarah: i'm going to post a group sheet about minimal needed metadata Heather: good plan, sarah. Valerie: thanks Sarah: i.e for tracking and linking purposes
11:16 AM Heather: valerie, agreed, you will likely have to add extra fields that accommodate tracking of data from multiple repositories at once
cool. I'll leave you guys to figure out a mutually-workable set of columns for that then Valerie: ok, because I was putting each database in its own page on the workbook Heather has left Heather has joined
11:18 AM Sarah: yep. will try to figure out my coding today so it can integrate with valeries
Valerie: ok, and I'll add extra columns as needed
11:18 AM Heather: You've been invited to this chat room!
Valerie has joined Heather: still there sarah? sorry, don't know what happened
11:19 AM Sarah: yep can you not get on our previous group chat?
Valerie: ok, and I'll add extra columns as needed Heather: yup, Valerie guessing your structure of different pages for different repositories will probably have to morph into something that works well for Sarah's situation too... I kept getting "group did not receive your chat" warning messages and I didn't get any of your messages. ? Sarah: i'm in two windows Heather: ah technology. Sarah: which is confusing Valerie: ah
11:20 AM Heather: I think you can close the other one....
Valerie: the last message I got was from Sarah about editing her headings Sarah: yep that's the last i said
11:21 AM Heather: here's the last bits that I saw, for completeness
Sarah: valerie, agreed, you will likely have to add extra fields that accommodate tracking of data from multiple repositories at once
cool. I'll leave you guys to figure out a mutually-workable set of columns for that then
valerie.janae.enriquez: ok, because I was putting each database in its own page on the workbook
Sarah: so it will no doubt be an ongoing conversation 11:17 AM and they will no doubt change over time as you run into different situations
Sarah: yep if we're done with that, then i have another question Heather: that works for you guys? you'll hammer out some combo solution, and keep it updated as it no doubt changes. yup, I'm done with that.
11:22 AM Sarah: valerie, what criteria do you use for the "Full citation as per Pangaea recommendations" field?
i've been keeping notes in my spreadsheet about how i classify fields, which makes the spreadsheet messy but then i can at least retrace my criteria
11:23 AM i've been treating accession numbers as the "best" citation
Valerie: um, I'm starting to realize that the "recommendations" field is problematic since not all of the databases have recommendations Sarah: does yours include a citation in the biblio? Valerie: usually Heather: thought, sarah, would this be more easily kept as OWW notes? maybe not just wanted to put it out there as idea... "i've been keeping notes in my spreadsheet about how i classify fields, which makes the spreadsheet messy but then i can at least retrace my criteria"
11:24 AM Sarah: so a full citation = dataset cited in the biblio
yeah, i'll move it there once i solidify my fields Valerie: yes the only thing is that the recommendations vary Sarah: yeah, i've set more general criteria that isn't dependent on the depository
11:25 AM Valerie: (TreeBASE articles usually cite by Author and Pangaea has a doi)
Sarah: or the journal Valerie: ok Sarah: i don't have many of those type of citations in the biblio, can you send me an example? i just haven't seen them much preferably an example from sysbio or comparable do they cite the accession/handling number in the biblio in that scenario
11:26 AM Valerie: http://rspb.royalsocietypublishing.org/content/277/1684/1065
Sarah: i'm seeing a lot of weird in text citations that don't carry over to the biblio Valerie: well, this one doesn't use the accession number this only cites the author but the authors don't exactly match despite the data/original article being the same Sarah: i see a lot of article citations, but not accession #
11:27 AM is that what you're seeing?
Valerie: yes this example doesn't cite the accession # most of them don't at least the ones I've found Heather: I'm guessing it will depend on the repository Sarah: ok, so we should code that in place of the "full citation per recommendations" field Heather: I think lots of people reusing things out of Genbank will cite the accession number, for example
11:28 AM Sarah: then, later we can have a calculation or whatever that says if that criteria matches the recommendation of the depository
Valerie: yes, I see a lot more Genbank numbers than TreeBASE Sarah: we could have nic collect specific metadata on that and code it to match ours i.e. citation required y/n, accession required y/n which would match our criteria of cited y/n, accession # y/n
11:29 AM does that make any sense?
Valerie: yes Heather: good plan Sarah: i'll consider that today as i standardize fields and think about the baseline metadata we need from each source hopefully it won't require too much backtracking at this point! better now than when we're chomping at the bit to get analysis done!
11:30 AM Valerie has left
Heather: whoops
11:31 AM Sarah: hmm?
Heather: Valerie, are you there? hmmm. dropped somehow.
11:32 AM well, whatcha think Sarah, other things to cover or enough group chatting for today?
Valerie has joined Sarah: no, i'm good Heather: Hi Valerie, are you back? Valerie: yes I got kicked offline Heather: no prob whatcha think, enough group chat for today, or other things to cover?
11:33 AM Valerie: not really. so pretty much just edit my headings to match the standards Sarah's coming up with?
(add date, ISSN, remove "recommendation" column) Heather: yup... well, and think about whatever Sarah proposes and make sure it fits with your data collection needs :)
11:34 AM Sarah: yeah, i'll whip something up in the next hour or so then send it your way and tell me what you think
Heather: and work it out together Sarah: no point in us both doing it Valerie: ok, it might be useful for me to have the recommendation column, but it might not be useful for Sarah Sarah: i mean, until we need to discuss it yeah, like i said, the recommendation field could be a calculation of the other criteria fields
11:35 AM then nic's depository metadata can inform that field about whether it is yes or no based on the standardized criteria
i'm talking at the database or analysis level probably could be done in excel to put your mind at ease Valerie: ok Sarah: with a simple formula for each depository field sorry Heather: if it is useful for you, Valerie, during data collection then keep it... but what I hear Sarah saying is that it can be collected/calculated systematically later with yours and hers at the same time, potentially
11:36 AM Sarah: yeah, maybe make a qualitative rating of it now when you're familiar with the article and then it can be verified with an automation calculation later
which might be more objective anyways Heather: so no need to extract that right now (and in terms of clean/consistent data, there is some reason not to extract it manually right now) Sarah: i.e. it could rate how many of the criteria were met, not just yes or no Valerie: ok
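The calculated "recommendation" field could be sketched roughly as below: compare the y/n criteria coded for an article against the y/n requirements recorded for its depository, and report how many were met. The criterion names and the example policy are assumptions for illustration, not actual depository requirements.

```python
def criteria_met(observed: dict, required: dict) -> tuple:
    """Return (number of required criteria met, number of required criteria)."""
    needed = [name for name, is_required in required.items() if is_required]
    met = [name for name in needed if observed.get(name, False)]
    return len(met), len(needed)


# Hypothetical depository metadata: which citation elements it asks for.
depository_policy = {"citation_in_biblio": True, "accession_number": True}
# Hypothetical coding of one article against the same criteria.
article_coding = {"citation_in_biblio": True, "accession_number": False}

met, needed = criteria_met(article_coding, depository_policy)
# Strict yes/no version of the field: every required criterion was met.
follows_recommendation = needed > 0 and met == needed
```

Reporting `met` out of `needed` gives the graded rating, and the strict flag can still be derived from it when a plain yes or no is wanted.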
11:37 AM Heather: that said, I could imagine that keeping a running visual of that fact might help you in your search refinement? if so, do what helps you.
Valerie: ok, neat Heather: and the other, automated columns could always be added later, regardless
11:38 AM Valerie: ok
Sarah: i've been classifying all my fields as "extract, code, or calc"
11:39 AM that way i know which i need to get out of the article itself, which i can infer from the extracted data, and which will be calculated down the road
and the fun thing is, more calculation/automation ideas always come when you code the data well anyways, we just have to be careful at this initial step to leave ourselves options in the analysis
11:40 AM Heather: yup
Valerie: ok Heather: great, well then I'm going to head off. Sarah, you'll post the chats? find me online if I can help? Sarah: yep and we'll talk about stats tomorrow, right?
11:41 AM and followup i guess of the ss i post today
Heather: otherwise I look forward to seeing your consensus columns and daily-google-spreadsheet uploads/updates yes, tomorrow let's talk knoxville and stats and anything else that comes up.
11:42 AM Sarah: sounds good
Valerie: excellent Heather: ok! bye. Valerie: later! Heather has left Valerie has left