DataONE:meeting notes:16 June 2010 chat

Chat with hpiwowar@gmail.com, Nicholas, valerie.janae.enriquez@gmail.com
From: Sarah Walker Judson <walker.sarah.3@gmail.com>
Date: Wed, Jun 16, 2010 at 11:16 AM
To: walker.sarah.3@gmail.com
Cc: hpiwowar@gmail.com, valerie.janae.enriquez@gmail.com, nicholas.m.weber@gmail.com

In the chat room: Heather Piwowar (hpiwowar@gmail.com), Valerie Enriquez (valerie.janae.enriquez@gmail.com), Nicholas Weber (nicholas.m.weber@gmail.com)

10:00 AM Heather: You've been invited to this chat room!

 Hi guys! you all here?

10:01 AM Nicholas: Hello all

Sarah: Good morning

10:02 AM Valerie: hello

Nicholas: I apologize in advance for having to leave early again today... it's the only day the rest of the summer I won't be available (promise)
Heather: that's ok. we understand.
 read the transcript and find me later if anything needs to be cleared up?

10:03 AM I posted an agenda...

Nicholas: definitely
Heather: http://www.openwetware.org/wiki/DataONE:Notebook/Summer_2010/2010/06/16
 sorry I didn't get it up earlier.

10:04 AM Anything else that anyone knows they want to cover?

Sarah: i have a few questions of things I realized i'm not resolved on, but hopefully those will be answered as we go
Heather: ok. if not, bring them up as it goes

10:05 AM Since Nic needs to leave early, maybe it makes sense to start quick thoughts about data collection

 I'm guessing that mostly it will be Valerie and Sarah syncing up on that today, right?
 Anything jump out that you'd want Nic's opinion on?

10:06 AM Nicholas: Sara, have you seen my ss? and does the metadata I'm collecting seem to be what you expected?

 Sarah, sorry
Valerie: if he's found anything regarding recommended citation formats for the repositories that I haven't found (I couldn't find anything on TreeBASE except the example)
Sarah: your metadata is different, but we can talk about that later

10:07 AM i have a few questions for nic (but answer valerie's first):

Heather: yes, that is true. Valerie, you and Nic have some overlapping ss on repository metadata, don't you
 you could definitely merge those, I'm guessing, and one of you could take the lead
 and the other could just use the data :)
Nicholas: I haven't found anything about citing data within TreeBase other than the identifiers
Valerie: ok, that was pretty much what I found

10:08 AM Heather: (or you could both contribute to a shared ss)

Nicholas: Pangaea defers to the WDC and OECD...but those are just recommendations, and they don't spell out citing datasets at all

10:09 AM Valerie: ok, I remember seeing the pangaea wiki with the recommendation and it was pretty vague

 http://wiki.pangaea.de/wiki/Citation

10:10 AM Nicholas: excellent, thank you

Valerie: well, not vague so much as just recommended guidelines
Nicholas: my focus has been on journals so far so this is great

10:11 AM Valerie: but yeah, that was my only question for you Nic.

10:12 AM Sarah: so who is collecting depository metadata? i thought nic was, but that conversation made me unsure

Nicholas: I am, but I think Valerie probably dove into this while trying to construct search terms
Heather: both :) so no doubt some overlap that could be simplified.....

10:13 AM yup, part of Nic's project, but helped Valerie get going (especially when it wasn't as sure that that was part of Nic's scope....)

Sarah: okay, and i know that you (nic) are covering a lot more journals than i am. do you have any suggestions on ones that i should cover for disciplinary coverage or ones that have good data sharing policies?

10:14 AM Nicholas: so far, not a lot do

 In fact, most published by big names like elsevier and springer have really general vague guidelines for authors

10:15 AM and no mention of an external place to submit data (other than genbank in some cases)

Sarah: ok. well, if you run into any, let me know since I'm still scoping good journals to use
Valerie: admittedly, most of the stuff I've found is in BioMed Central and Syst. Biol., but I think you've already looked there
Heather: oh! This reminds me. Nic and Valerie, in case it is useful....
Valerie: (and PLoSONE)
Sarah: yeah, i'm using sysbio as a focus journal
Heather: Nature has a GREAT list of repositories

10:16 AM Valerie: oh, neat

 I keep meaning to do more work at the simmons library (since if I'm logged in at one of their computers, I have fulltext)
Sarah: ok. i've thought about nature, but the journal itself isn't "discipline" specific, but the articles themselves would be
Heather: Right, so Sarah I don't think it would make sense as one of your "5"
Valerie: Nature has come up quite a few times in my searches

10:17 AM Heather: http://www.nature.com/authors/editorial_policies/availability.html

Valerie: sweet, thanks
Nicholas: Valerie, I am checking about remote access - I might be able to get a pass for you to use on UIUC's resource collection
Heather: one of the best spelled out policies of any of the (biomed) journals that I've looked at
 And while I'm at it, Sarah, here is a link to the model repository I was thinking of

10:18 AM Valerie: thanks Nic, I remember UIUC had a ton of access because of the Engineering school

Heather: http://www.ebi.ac.uk/biomodels-main/
Sarah: thanks
Heather: and I don't have access to this article but you might? http://portal.acm.org/citation.cfm?id=508791.508876

10:19 AM Sarah: yeah. I've got awesome remote access to my institution's library.

Heather: So Nic you might want to review that list of Nature's repositories for consideration in your list
Nicholas: I will
Valerie: yeah, they cut a lot of funding for online journals at simmons :(
Heather: ok.
Nicholas: I'll put that pdf into the OWW too (I had access)
Heather: other questions for Nic?

10:20 AM Sarah: i have a general group question about collaboration

 how do we want to combine our google spreadsheets?
 i don't like working directly in google docs, but am willing to post at the end of every day
Nicholas: I agree
Sarah: but i'm not sure if a shared sheet is the answer quite yet
 but we do need unique identifier fields so we can hook our data up late
 later

10:21 AM Heather: what's your hesitation on shared sheet at this point, Sarah?

Nicholas: trying to work in google docs was slowing me down... What I found though is that if I can work in excel and then use the (re-upload) button at the end of the day
Sarah: just that it gets big
Nicholas: its pretty effective
Sarah: we're all recording relevant full text sections right?
 my spreadsheet is getting memory intensive in a hurry
 i'm about ready to migrate to access b/c it's loading slow and i can only imagine worse once i add funciont

10:22 AM *functions

 which i just noticed google docs has added!
 so, my main concern is storage size
 not openness or whatever
Heather: ok, so for now let's say upload to google spreadsheets at the end of every day?
Valerie: ok

10:23 AM Sarah: ok. but should we keep them separate or combine?

 if we're reuploading, probably separate
 right?
Heather: right
Sarah: otherwise we'll erase each others data
Nicholas: we could create a folder
Heather: I think maybe in the next few weeks a more streamlined solution may become clear
Nicholas: and share the folder for easier access
Heather: but for now that would work
Sarah: oh. good idea
 about the folders
Nicholas: I just realized you could share an entire folder

10:24 AM Sarah: nic, do you know if the reupload takes your formulas up to the cloud?

Heather: I think though that people extracting the same things from different sources (Sarah/Valerie and Valerie/Nic?)
Sarah: i've had problems with this in the past
Nicholas: I am not sure about that Heather
Sarah: the function language is different in excel and google doc
Heather: should make sure that their "headings" are the same..... ideally.
Sarah: yeah...valerie and i don't have that much overlap at this point, my extraction is more intensive in the article

10:25 AM Heather: which part aren't you sure of, Nic?

Nicholas: sorry, I thought you had asked about the formulas
 I meant Sarah
Sarah: oh...i'll look into that when i upload, could be a problem for me migrating to docs
Heather: to help me understand, what sort of formulas do you have at this point?

10:26 AM Sarah: just if

Heather: is it something that is key to your spreadsheets right now?
Sarah: but i've used docs in the past and it translates them as text rather than active formulas
 yep, it's how i'm setting up my coding
 but, i can do that on the ground and have docs hold just the text
 just can't edit in docs then

10:27 AM no worries

 back to what we were talking about...
Heather: ok
Sarah: but, before we get into exact fields
 can i ask quickly about unique identifiers
 that is a pain to collect if you don't do it upfront
Heather: yup, but can we pause even before that
Sarah: sure
Heather: because I think Nic needs to run in a few minutes
 Nic, are you available tomorrow? and others?
Nicholas: I am

10:28 AM Valerie: I am

Heather: one idea is to hold the stats etc part of the conversation until tomorrow, if you are all available?
Sarah: me too
Heather: 9am pacific time? different preference?
Sarah: good
Nicholas: sounds great
Valerie: sure

10:29 AM Heather: cool.

 ok, then I'll hold off on all the other bits till then, except data collection conversation.
 I'll draft a blog post this afternoon for tomorrow
 kudos, you guys, on the blog posts. Great so far.
 Nice tone, nice content.

10:30 AM Valerie: neat, I was worried I was too conversational in mine

Nicholas: we've had a lot of retweets too... 60 page views today
Heather: I figured I'd make my post an integration one, looking at how the projects overlap. Do I have permissions to post it directly, or shall I send it to someone?
 awesome!
Nicholas: actually 55... 51 yesterday
Sarah: yeah. i noticed that. i tried to put in a plug for people to comment so we know who is looking at it

10:31 AM Nicholas: Heather, do you have a wordpress account?

Heather: do you have google analytics set up? that can be fun (and a huge time waster)
Nicholas: otherwise I'll send you an invite this afternoon when my class finishes
Heather: yup I do.
 hpiwowar or researchremix, I forget which. researchremix I think
 ok.

10:32 AM one last kudos then, from Todd in a chat I had with him yesterday. He thinks the projects make a really nice collage of research interests

Nicholas: ok, I just added you
Heather: and also thinks that you've really stepped up and are doing a great job, all of you.

10:33 AM so nice work :)

 Hopefully our next big conference call will be a bit more technologically friendly and we can get feedback and direction from the other mentors too.

10:34 AM ok, anything else before Nic heads off?

Sarah: yeah, can i cover unique identifiers just to make sure it's getting done?
Heather: yup
Nicholas: I have to run (apologies) ... I'll check the transcript this evening and talk more tomorrow?
Sarah: these are the ones i think we need and just want to verify they are extracted
 - journal = isbn
 - issue = vol+issue
 - article = doi
 - depository = ?? our own numbering?

Heather: journal, do you mean issn?
Sarah: that way all the data can be linked in a later database or whatever for analysis

10:35 AM oh yep, sorry

Heather: I think there are also two issns, one for print and one for electronic
Sarah: is everyone collecting that?
Valerie: ok, that's a good idea, I wasn't sure how much data to capture about the journals
Sarah: i couldn't find issn esp
 i mean in other
 people's ss
Valerie: I had just been doing DOIs and links to article abstracts
Nicholas has left

10:36 AM Heather: which issn do you want as the primary key?

 some journals don't have print issns these days
Sarah: probably electronic since that's what our citation programs will probably extract anyways
Heather: (I don't know if all have electronic issns?)

10:37 AM Sarah: we can just record it on a journal list and all use that one

 just like a relational table
Heather: also year?
Sarah: yeah. good. i think our ss should mimic relational tables of a db to make that easier in a later stage
Valerie: ok
Sarah: rather than retroactively collecting tedious data that is easy enough to retrieve upfront

10:38 AM maybe we could have a group ss about that baseline metadata

 especially for the depositories and journals
Heather: good idea
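
The "spreadsheets that mimic relational tables" idea above can be sketched roughly as follows. This is only an illustration, assuming the electronic ISSN is the journal key and the DOI is the article key; the table layout, the field names, the `join_articles_to_journals` and `issue_key` helpers, and every ISSN and DOI value are hypothetical placeholders, not data from anyone's spreadsheet.

```python
# Hypothetical shared baseline tables; every value is a made-up placeholder.
journals = {
    # e-ISSN (primary key) -> journal-level metadata
    "0000-0001": {"title": "Example Journal A", "print_issn": "0000-0002"},
    "0000-0003": {"title": "Example Journal B", "print_issn": None},  # electronic only
}

articles = [
    # One row per article, keyed by DOI, pointing at its journal by e-ISSN.
    {"doi": "10.9999/example.1", "eissn": "0000-0001", "year": 2008, "volume": 57, "issue": 2},
    {"doi": "10.9999/example.2", "eissn": "0000-0003", "year": 2010, "volume": 3, "issue": 1},
]


def join_articles_to_journals(article_rows, journal_table):
    """Attach the journal title to each article row via the shared e-ISSN key."""
    joined_rows = []
    for row in article_rows:
        journal = journal_table.get(row["eissn"], {})
        joined_rows.append({**row, "journal_title": journal.get("title")})
    return joined_rows


def issue_key(row):
    """Build a 'vol+issue' identifier, scoped to the journal's e-ISSN."""
    return f'{row["eissn"]}:{row["volume"]}({row["issue"]})'


joined = join_articles_to_journals(articles, journals)
```

Keeping the journal details in one shared table means each person's article sheet only has to carry the e-ISSN, and the rest can be looked up later.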
Sarah: articles, we should just all collect doi

10:39 AM we don't have to worry about standardizing doi too much b/c it's not used for linking, just record tracking

 but that i mean, just get a doi but don't worry about how many slashes it has or whatever
Heather: why won't it be used for linking?
Sarah: b/c articles are linked to journals, but articles aren't linked to anything smaller right?

10:40 AM Heather: I think they might be.

 for example, some article metadata can easily be extracted from ISI Web of Science
 to collect number of authors, etc.
Sarah: we discussed maybe multiple datasets yesterday, but then determined that should be coded not nested
Heather: not sure I understand "coded not nested"

10:41 AM Sarah: yeah...but keep your doi consistent with your extraction source, but not necessarily among extraction sources

Heather: sure, yes.
 and I think doi might be used for linking.
Sarah: nested...meaning related
 coded...meaning as article metadata
 directly stored with the article info
 not in a related table
 in terms of a database
Heather: I think we don't know what might be nested with this data in the future

10:42 AM people might want to supplement it with various things we can't foresee

Sarah: yeah, but doi is still a unique identifier
Heather: and I'm guessing that the doi might be a great unique identifier for that.
 right.
Sarah: valerie could record a 10.99/999/99 one
 and mine might be //10.99.9993
 and they would still identify our article
 maybe i'm not making sense

10:43 AM i've just seen doi recorded slightly differently in some places

Heather: agreed... I just wanted to make sure there wasn't misunderstanding around the fact that dois might be used for linking at some point
 I think doi cleanup later won't be hard
Sarah: no, doi is the unique identifier but doesn't need a standardized format at this point
 that's all
Heather: agreed!
Sarah: sorry to make it more complicated sounding than it really is
Valerie: ok
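
The "doi cleanup later" mentioned above might look something like this minimal sketch. It assumes the only variations are resolver URL prefixes, a `doi:` label, stray leading slashes, surrounding whitespace, and letter case; the `normalize_doi` helper and the sample DOIs are hypothetical, and real spreadsheets may contain other variants.

```python
import re


def normalize_doi(raw):
    """Reduce common DOI spellings to a bare, lowercase '10.xxxx/suffix' form."""
    doi = raw.strip()
    # Drop a resolver URL or a 'doi:' label at the start, if present.
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi, flags=re.IGNORECASE)
    # Drop any stray leading slashes left over from copy/paste.
    doi = doi.lstrip("/")
    # DOI names are case-insensitive, so lowercase them for comparison.
    return doi.lower()


variants = [
    "doi:10.9999/ABC.1",
    "http://dx.doi.org/10.9999/abc.1",
    "//10.9999/abc.1",
    " 10.9999/abc.1 ",
]
normalized = {normalize_doi(v) for v in variants}
```

If all the variants collapse to a single value, the differently recorded DOIs can still be matched across spreadsheets.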

10:44 AM Sarah: okay, sorry for that diversion, should we talk about data extraction and overlap between valerie and myself?

 let me pull up both the spreadsheets real quick
Heather: yup

10:45 AM Sarah: so, in general, to me valerie is recording more of the success of a search than reading through the articles in depth

Heather: post the urls so we make sure we are looking at the same ones?
Sarah: https://spreadsheets.google.com/ccc?key=0Am4hbt8Ef8WXdENmZU83dTRUbW5fNFg3RjFFa1Z0LUE&hl=en
 https://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE1LYlYtWHRXblNXa3ladXNNY3BDbEE&hl=en
Valerie: https://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE1LYlYtWHRXblNXa3ladXNNY3BDbEE&hl=en#gid=0

10:46 AM Heather: that's true right now as she figures out which searches are effective. But soon she'll standardize on a few searches and then broaden the base of things she is looking at and do more extraction

Sarah: ok. b/c i've been focusing on more in depth extraction
 we only have a few similar fields at this point

10:47 AM Heather: Valerie... I'm guessing we need a reconnect soon just to make sure we are on the same page on this... in the meantime speak up if I'm saying something you don't agree with

Sarah: also, valerie's is more focused on treebase in particular, while i'm considering all possible data reusing and sharing
Valerie: ok, I just think I need more information on what depth extraction is

10:48 AM Sarah: both the known depositories and produced data that may just be stored with the journal

 mostly, when i get in an article i read the materials and methods in detail
 i record info about if the dataset was cited, how it was cited, what type of dataset it was, etc
 i do this all for data reuse, data sharing, and data produced

10:49 AM not just "yes/no"

Valerie: oh, ok. yeah, I think I've just been skimming to see if it's mentioned at all
Sarah: or if it cited something
Valerie: yeah, 1 or 0
Sarah: yeah
Heather: I think the main difference is that Valerie is just doing it for data reuse, not data sharing and data produced
Sarah: that's what i noticed
 but isn't the "out of treebase" field data sharing?

10:50 AM sorry i'm stinking at typing fast today

Heather: Because the data shared and data produced info doesn't factor into future analyses, given her method
 Valerie, does your ss still have some data sharing lines at the top?
Sarah: sorry the "data into treebase"
 isn't that data sharing?
 i.e. if the author deposited their dat
Valerie: yes
Sarah: data

10:51 AM Heather: From when you were collecting data sharing initially, before Everything Was Clear? ;)

Valerie: I mainly started doing that to differentiate whether or not an article has deposited data or if they have taken data out
 yeah
 I might delete those rows once I'm done
Heather: maybe you could delete those lines, to prevent misunderstanding, would that make sense?
Sarah: yeah, i see that as i progress down the records
Valerie: yeah

10:52 AM Sarah: or move to a separate sheet to not lose it entirely

Valerie: I made a point to mark out phase II search, which has been more successful
 ah
Sarah: like "failed attempts"
 or "data into treebase"
Valerie: ok
Sarah: ok, so assuming the searches get solidified, what data will you extract?
 still just yay or nay about treebase citation?

10:53 AM or should we adopt some of my fields for more detailed extraction

 ?
Valerie: well, yay or nay about the ways it's cited
 (either author only or doi or something else)
Heather: Sarah, see her column M, "text of sentence(s) in article making citation reference"
Sarah: yeah. i have a similar field

10:54 AM Heather: (also worth noting that she's starting with TreeBASE now but will move to a few additional repositories)

Sarah: i just expected that we would have more overlap since that seemed like a concern at the beginning
 but i'm more clear on our different approaches now

10:55 AM valerie's seeing if the depositories are actually being cited

 i'm seeing if individual articles are citing data
 right?
Heather: right.
Sarah: in very nutshell terms
Valerie: you're going into much further depth than I am
 right
Sarah: ok. we have a few overlap fields that we should standardize, but mine won't look like yours until i do my full coding
 also, we can use doi to communicate

10:56 AM Heather: another way of looking at it: Sarah is looking at data reuse in a few journals across time, Valerie is looking at data reuse of a few repositories across journals

 And while Sarah is extracting her data
Sarah: my data might have more indepth info about an article you happen across
Heather: she is also extracting other data that will be useful
 in other, related, analyses
 since she's in the articles anyway, and her "search" strategy makes data extraction from those articles useful.

10:57 AM right?

Valerie: yes
Sarah: yep

10:58 AM did you get my comment about doi?

Heather: cool.
Valerie: for example, today, I found that an article cited by treebase with one author has a different primary author listed in other citations
 sharing dois?
Sarah: mostly that if we record doi, our data could "communicate" and I might have more indepth info about an article you happen across
 which would be cool
Heather: agreed! good point.

10:59 AM Valerie: oh yeah

 that would be awesome
Heather: Valerie, one vision for your research I don't think I made very clear....
 ... is this idea that ultimately, once your searches are finalized,

11:00 AM it will be useful to try to apply them across many journals (50? all of them ala Google Scholar or Scirus? something like that).

 to really get a pan-journal view
 to complement the narrow-journal focus that Sarah will be getting.

11:01 AM Valerie: I have been finding a wide range of journals

 but a lot of the more "successful" searches can be in the same journal
 (which I should keep an eye on)
Heather: (Developing searches may be easiest within a journal, your call, I just wanted to make sure I hadn't confused you with developing vs unleashing finalized searches)

11:02 AM Valerie: I've had the most luck using ISI Web of Science

Heather: that's fine. Just wanted to make sure we were clear that there is no need (and in fact an anti-need) to on purpose limit your search space
Valerie: since it's sort of hard to search by citation in Scirus
 the only limit I put is 2008-2010, really
Heather: yes, definitely. by citation probably easiest in ISI or Scopus.

11:03 AM cool!

 ok, sorry for the diversion, just wanted to make sure we were on the same page
Valerie: it's cool, I'm sure I'll have more questions as I go along
Sarah: while we're diverted...quick comment
 what kind of data is stored on treebase?

11:04 AM Valerie: genetics, I think.

Sarah: i'm working on the molecular ecology journal and they post on genbank almost as if it's required
 but not their phylogenies
 so therefore not on treebase
Valerie: hm...
Sarah: i thought treebase was data matrices (pairwise) for phylogenetics
 like a bunch of genes aligned
 so genetics, yes, but aligned sequences

11:05 AM for constructing phylogenies

 i need to look at the raw data b/c i find it odd that molecular ecology has people depositing on genbank but not treebase
Valerie: some sentences I've copy/pasted mention genbank and treebase practically in the same breath

11:06 AM Sarah: yeah...i'm seeing lots of genbank but minimal to no treebase where i think it should be

 yeah...looking at it now, its phylogenetic trees
 weird
 i think genbank is more well known

11:07 AM and it's almost a given that you post there or reuse data from there

 we should ask nic what he thinks is the differences in regard to depository metadata
Valerie: yes

11:08 AM Sarah: did we lose heather?

Heather: I'm here...
 not in discussion because I am currently ignorant about Treebase, unfort.
 Hope to fix it soon.

11:09 AM Valerie: Treebase is finicky

Heather: Was taking a minute to look at ss columns
Valerie: either that or it just doesn't like google chrome or the version of mozilla firefox I'm using
Heather: so will be ready to pick up a conversation there when conversation works its way back
Sarah: yeah. i'm done with that
 sorry

11:10 AM Heather: no, no problem, it is good stuff

 ok, so I think that ideally, Sarah, all of your "Data REUSE" columns could/should overlap with Valerie's, yes?
 so:
 Mention of DataSet; WhereDataSetCitedInPaper; Where_Coded; Type of cited Dataset; Way DataSet cited; Acquirement of dataset; Depository of cited dataset; Details_CitedData; RelevantCitations
Sarah: yep

11:11 AM but i need to code them to be "treebase y/n"

 etc
 so they match better
Valerie: mine aren't quite like that, but I can switch it around to match
Heather: could/should overlap with these from Valerie's sheets:
 Citation of TreeBASE through mention of TreeBASE; Full citation as per TreeBASE recommendations; Citation of TreeBASE through DOI or URI, or Study/Matrix Accession no.; Citation of TreeBASE through mention of data author; Number of articles citing this article; text of sentence(s) in article making citation reference; Data INTO TreeBASE; Data OUT OF TreeBASE; additional notes
Sarah: right, now I'm covering multiple datasets per paper and still figuring out how to do that
 yeah, hers are more binary (y/n)

11:12 AM but, i'm trying to figure out how to do that to keep information about the same dataset together

 for instance,
 i have a genbank citation that is cited with an accession #
 but another gene not deposited in genbank cited by mentioning a paper
 etc
 in the same paper

11:13 AM so i want to keep that data associated with each other

 rather than as general metadata
Heather: yup. I think worth figuring out a consensus that meets both needs....
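
One possible way to keep information about each dataset together while still producing binary y/n fields is one row per (article, dataset) pair. This is a rough sketch under that assumption; the field names, the `group_by_article` and `cites_depository` helpers, and the sample rows are all hypothetical and are not taken from either spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: one per (article, dataset) pair, keyed by the article DOI.
dataset_rows = [
    {"article_doi": "10.9999/example.1", "depository": "GenBank",
     "way_cited": "accession number"},
    {"article_doi": "10.9999/example.1", "depository": None,
     "way_cited": "citation of a paper"},
    {"article_doi": "10.9999/example.2", "depository": "TreeBASE",
     "way_cited": "author only"},
]


def group_by_article(rows):
    """Collect all dataset rows that belong to the same article DOI."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["article_doi"]].append(row)
    return dict(grouped)


def cites_depository(rows, article_doi, depository):
    """Collapse the per-dataset rows into a binary y/n answer for one depository."""
    return any(
        row["article_doi"] == article_doi and row["depository"] == depository
        for row in rows
    )


by_article = group_by_article(dataset_rows)
```

The detailed rows stay intact for in-depth analysis, and a "treebase y/n" style column can be derived from them whenever the two sheets need to line up.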

11:14 AM Sarah: i'll look at it again today after i look at some articles from ecology to see how that might affect my fields

Heather: So that the data could be aggregated directly with minimal reformatting
Valerie: ok. all else fails, I could add some extra columns to match
Sarah: then i'll propose how to organize that b/c i think it's more making my data look like valerie's than vice versa
 b/c mine is messy right now
 not fully coded

11:15 AM Valerie: should I add a year heading to mine for the article?

Sarah: but yeah, we'll both probably need additional fields of some sort
 yes!
Valerie: (article date and ISSN for journal title)
Sarah: i'm going to post a group sheet about minimal needed metadata
Heather: good plan, sarah.
Valerie: thanks
Sarah: i.e for tracking and linking purposes

11:16 AM Heather: valerie, agreed, you will likely have to add extra fields that accommodate tracking of data from multiple repositories at once

 cool. I'll leave you guys to figure out a mutually-workable set of columns for that then
Valerie: ok, because I was putting each database in its own page on the workbook
Heather has left
 Heather has joined

11:18 AM Sarah: yep. will try to figure out my coding today so it can integrate with valeries

Valerie: ok, and I'll add extra columns as needed

11:18 AM Heather: You've been invited to this chat room!

Valerie has joined
Heather: still there sarah?
 sorry, don't know what happened

11:19 AM Sarah: yep can you not get on our previous group chat?

Valerie: ok, and I'll add extra columns as needed
Heather: yup, Valerie guessing your structure of different pages for different repositories will probably have to morph into something that works well for Sarah's situation too...
 I kept getting "group did not receive your chat" warning messages
 and I didn't get any of your messages.
 ?
Sarah: i'm in two windows
Heather: ah technology.
Sarah: which is confusing
Valerie: ah

11:20 AM Heather: I think you can close the other one....

Valerie: the last message I got was from Sarah about editing her headings
Sarah: yep that's the last i said

11:21 AM Heather: here's the last bits that I saw, for completeness

 Sarah: valerie, agreed, you will likely have to add extra fields that accommodate tracking of data from multiple repositories at once

cool. I'll leave you guys to figure out a mutually-workable set of columns for that then

valerie.janae.enriquez: ok, because I was putting each database in its own page on the workbook

Sarah: so it will no doubt be an ongoing conversation

11:17 AM and they will no doubt change over time as you run into different situations

Sarah: yep
 if we're done with that, then i have another question
Heather: that works for you guys? you'll hammer out some combo solution, and keep it updated as it no doubt changes.
 yup, I'm done with that.

11:22 AM Sarah: valerie, what criteria do you use for the "Full citation as per Pangaea recommendations" field?

 i've been keeping notes in my spreadsheet about how i classify fields, which makes the spreadsheet messy but then i can at least retrace my criteria

11:23 AM i've been treating accession numbers as the "best" citation

Valerie: um, I'm starting to realize that the "recommendations" field is problematic since not all of the databases have recommendations
Sarah: does yours include a citation in the biblio?
Valerie: usually
Heather: thought, sarah, would this be more easily kept as OWW notes? maybe not just wanted to put it out there as idea... "i've been keeping notes in my spreadsheet about how i classify fields, which makes the spreadsheet messy but then i can at least retrace my criteria"

11:24 AM Sarah: so a full citation = dataset cited in the biblio

 yeah, i'll move it there once i solidify my fields
Valerie: yes
 the only thing is that the recommendations vary
Sarah: yeah, i've set more general criteria that isn't dependent on the depository

11:25 AM Valerie: (TreeBASE articles usually cite by Author and Pangaea has a doi)

Sarah: or the journal
Valerie: ok
Sarah: i don't have many of those type of citations in the biblio, can you send me an example?
 i just haven't seen them much
 preferably an example from sysbio or comparable
 do they cite the accession/handling number in the biblio in that scenario

11:26 AM Valerie: http://rspb.royalsocietypublishing.org/content/277/1684/1065

Sarah: i'm seeing a lot of weird in text citations that don't carry over to the biblio
Valerie: well, this one doesn't use the accession number
 this only cites the author
 but the authors don't exactly match despite the data/original article being the same
Sarah: i see a lot of article citations, but not accession #

11:27 AM is that what you're seeing?

Valerie: yes
 this example doesn't cite the accession #
 most of them don't
 at least the ones I've found
Heather: I'm guessing it will depend on the repository
Sarah: ok, so we should code that in place of the "full citation per recommendations" field
Heather: I think lots of people reusing things out of Genbank will cite the accession number, for example

11:28 AM Sarah: then, later we can have a calculation or whatever that says if that criteria matches the recommendation of the depository

Valerie: yes, I see a lot more Genbank numbers than TreeBASE
Sarah: we could have nic collect specific metadata on that and code it to match ours
 i.e. citation required y/n, accession required y/n
 which would match our criteria of cited y/n, accession # y/n

11:29 AM does that make any sense?

Valerie: yes
Heather: good plan
Sarah: i'll consider that today as i standardize fields and think about the baseline metadata we need from each source
 hopefully it won't require too much backtracking at this point!
 better now than when we're chomping at the bit to get analysis done!

11:30 AM Valerie has left

Heather: whoops

11:31 AM Sarah: hmm?

Heather: Valerie, are you there?
 hmmm. dropped somehow.

11:32 AM well, whatcha think Sarah, other things to cover or enough group chatting for today?

Valerie has joined
Sarah: no, i'm good
Heather: Hi Valerie, are you back?
Valerie: yes
 I got kicked offline
Heather: no prob
 whatcha think, enough group chat for today, or other things to cover?

11:33 AM Valerie: not really. so pretty much just edit my headings to match the standards Sarah's coming up with?

 (add date, ISSN, remove "recommendation" column)
Heather: yup... well, and think about whatever Sarah proposes and make sure it fits with your data collection needs :)

11:34 AM Sarah: yeah, i'll whip something up in the next hour or so then send it your way and tell me what you think

Heather: and work it out together
Sarah: no point in us both doing it
Valerie: ok, it might be useful for me to have the recommendation column, but it might not be useful for Sarah
Sarah: i mean, until we need to discuss it
 yeah, like i said, the recommendation field could be a calculation of the other criteria fields

11:35 AM then nic's depository metadata can inform that field about whether it is yes or no based on the standardized criteria

 i'm talking at the database or analysis level
 probably could be done in excel to put your mind at ease
Valerie: ok
Sarah: with a simple formula for each depository
 field sorry
Heather: if it is useful for you, Valerie, during data collection then keep it... but what I hear Sarah saying is that it can be collected/calculated systematically later
 with yours and hers at the same time, potentially

11:36 AM Sarah: yeah, maybe make a qualitative rating of it now when you're familiar with the article and then it can be verified with an automation calculation later

 which might be more objective anyways
Heather: so no need to extract that right now (and in terms of clean/consistent data, there is some reason not to extract it manually right now)
Sarah: i.e. it could rate how many of the criteria were met, not just yes or no
Valerie: ok
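
The idea of calculating the "recommendation" field from the other criteria fields could be sketched as below. This is only an illustration: the depository names and their requirements are invented placeholders (the real values would have to come from the depository metadata), and the field names and the `criteria_met` and `follows_recommendation` helpers are hypothetical.

```python
# Hypothetical depository metadata: which citation elements each depository asks for.
# These are placeholders, not the actual policies of any repository.
DEPOSITORY_REQUIREMENTS = {
    "ExampleRepoA": {"cited_in_biblio": True, "accession_number": True},
    "ExampleRepoB": {"cited_in_biblio": True, "accession_number": False},
}


def criteria_met(observed, depository):
    """Return (number of required criteria met, number of required criteria)."""
    requirements = DEPOSITORY_REQUIREMENTS[depository]
    required = [name for name, needed in requirements.items() if needed]
    met = sum(1 for name in required if observed.get(name, False))
    return met, len(required)


def follows_recommendation(observed, depository):
    """Yes/no version: True only if every required criterion was observed."""
    met, total = criteria_met(observed, depository)
    return met == total


# Coded y/n fields for one article, as they might be extracted by hand.
observed_fields = {"cited_in_biblio": True, "accession_number": False}
score_a = criteria_met(observed_fields, "ExampleRepoA")
```

The count gives a graded rating of how many criteria were met, while the yes/no version reproduces a manually entered "recommendation" column so the two can be compared.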

11:37 AM Heather: that said, I could imagine that keeping a running visual of that fact might help you in your search refinement? if so, do what helps you.

Valerie: ok, neat
Heather: and the other, automated columns could always be added later, regardless

11:38 AM Valerie: ok

Sarah: i've been classifying all my fields as "extract, code, or calc"

11:39 AM that way i know which i need to get out of the article itself, which i can infer from the extracted data, and which will be calculated down the raod

 *road
 and the fun thing is, more calculation/automation ideas always come when you code the data well
 anyways, we just have to be careful at this initial step to leave ourselves options in the analysis

11:40 AM Heather: yup

Valerie: ok
Heather: great, well then I'm going to head off. Sarah, you'll post the chats?
 find me online if I can help?
Sarah: yep
 and we'll talk about stats tomorrow, right?

11:41 AM and followup i guess of the ss i post today

Heather: otherwise I look forward to seeing your consensus columns and daily-google-spreadsheet uploads/updates
 yes, tomorrow let's talk knoxville and stats and anything else that comes up.

11:42 AM Sarah: sounds good

Valerie: excellent
Heather: ok! bye.
Valerie: later!
Heather has left
 Valerie has left
