DataONE:Notebook/Summer 2010/2010/07/07/2010/07/13: Difference between revisions

==Group Meeting Transcript July 13, 2010==
 
See transcript here: [[DataONE:Notebook/Summer_2010/2010/07/13]]

12:59 PM
Heather: You've been invited to this chat room!
1:00 PM
Sarah has joined
Heather: Hi all!
me: hello
Sarah: good morning!
Heather: I just posted a last-minute agenda here:
 
http://www.openwetware.org/wiki/DataONE:Notebook/Summer_2010/2010/07/13
nicholas.m.weber: Hi
Heather: Additions?
1:01 PM
nicholas.m.weber: looks good
me: not really, those points look like they address my questions
1:02 PM
Heather: ok... let's start.
 
Knoxville.
 
Realized that although some people posted notes about the meeting, we don't have a face-to-face page on OWW
 
would probably be good, post agenda, link to discussions, slides, etc.
 
volunteer to make one?
1:03 PM
me: I have some notes and could make a page about the meeting on OWW that others could add to
Heather: thanks valerie, perfect.
 
maybe link from the notebook date, too?
 
have you guys put your presentations up on plone yet?
me: you mean make a notebook with date entries with each of the notes?
 
no, I don't believe so
1:04 PM
(the main DataONE site?)
Heather: I haven't, I need to do that. Can each of you do that too if you haven't?
 
yup, the main DataONE site... we are to put all of our "final" products there, and I think our presentations count as interim final products :)
nicholas.m.weber: ok
me: ok, cool. I can do that
1:05 PM
Sarah: i have my notes on my oww calendar...you mean just link it to the main agenda?
Heather: Valerie, sorry, I meant go to http://www.openwetware.org/wiki/DataONE:Notebook/Summer_2010/2010 and click on the first date of the face-to-face meeting and add a pointer from there
Sarah: i'm putting my ppt up now
me: oh, ok
Heather: Sarah, yup. And/or link from the f2f page that Valerie will make to your notes pages
1:06 PM
My goal: A "summer 2010" page that summarizes, briefly, our face 2 face meeting
 
so links from there to sarah's notes, the agenda, our slides, etc.
 
does that make sense?
 
kind of like our "correspondance" chat transcripts, but fleshed out with a few more links.....
1:07 PM
still confusing?
 
not confusing?
 
bueller?
 
:)
1:08 PM
Sarah: i'm clear
nicholas.m.weber: I think it's clear
1:09 PM
Heather: Valerie, you good? if not, ping me offline....
me: I'm good
Heather: any other questions about knoxville wrap-up? reimbursement or anything?
 
(Not that I know much about that, but you can ask....)
nicholas.m.weber: not here
1:10 PM
Heather: ok.
 
you probably saw the attempts to provide more context to our OWW site
 
disclaimers, clarifications, etc.
me: yes
Heather: any suggestions?
1:11 PM
if so, now or later, let us know.
 
or, for that matter, just make the changes yourself!
 
it is a wiki after all :)
nicholas.m.weber: I thought it was comprehensive... with respect to the "readme" disclaimer... should we be doing that with each ss?
me: should we link to both our project pages and the main Summer 2010 page?
Heather: Valerie, from your SS you mean?
1:12 PM
Nic, I think yes.
nicholas.m.weber: ok
Heather: Valerie, from your spreadsheet README blurbs you can just link to one OWW page, I think, whichever one you would want to get to first if you were the person following the link....
me: yes
1:13 PM
ok
Sarah: i just added them on the first sheet of each ss
Heather: great, thanks Sarah.
 
ok, any other thoughts on that stuff?
 
if so, chime in.
1:14 PM
if not, just want to check in on the ASIS&T deadline and any other deadlines coming up....
 
Nic, you thinking a poster? Anyone else thinking a poster? or not?
nicholas.m.weber: http://www.asis.org/asist2010/cfp-postersdemosvideoss.html
1:15 PM
Sarah: i don't think i would be able to make it out to the meetings, and i don't know if it is the best place for my research anyways
Heather: sarah, that works, agreed.
me: I don't think I can make the meetings either, and I'm not sure if my data is best presented as a poster/video/demonstration.
1:16 PM
Heather: ok, good valerie.
nicholas.m.weber: there are six tracks for this year's conf http://www.asis.org/asist2010/schedule-track.html
Heather: Nic, what do you think? were you planning to go?
nicholas.m.weber: I think mine fits well into the
Track 6 – Information in Context: Economic, Social, and Policy Perspectives
 
and I'm already planning on going
Heather: yup... sounds like your stuff, eh?
1:17 PM
nicholas.m.weber: so I'm hoping to get a draft done late tonight
Heather: ok, great. Circulate in email tomorrow so that mentors have a chance to weigh in?
nicholas.m.weber: sure
Heather: and make sure the funding sentences are good and whatever else we need to make sure we get right for a "formal" DataONE release.
 
I mean... the mentors can then make sure....
1:18 PM
nicholas.m.weber: I was going to search for a presentation to see what others had done and then try to model that
Heather: great. yeah, don't stress over it.
nicholas.m.weber: there might be something on the plone site?
 
ok
Heather: just wanted to reiterate that having mentors see it before submission is a good idea, so you need to give them two or three days ideally
1:19 PM
not sure, probably
 
feel free to bounce it off of me whenever, if you want faster feedback. otherwise, I look forward to seeing it when you send it out.
1:20 PM
nicholas.m.weber: great
Heather: want to give us a quick summary of what you think of Google Fusion?
 
do you recommend it?
 
if so, for what?
nicholas.m.weber: sure. it's really nice for making annotations
Heather: if not, ???
nicholas.m.weber: but it's very hard to make edits to individual cells
1:21 PM
I think it would be nice to use in a situation where a group is trying to hash out the fields they need to gather
Sarah: I like it better than regular docs, but am having some bugs
me: ah, I noticed it only liked uploading one sheet at a time (as opposed to whole workbooks)
nicholas.m.weber: it's nice that way
me: (unless I'm doing it wrong)
Heather: I noticed that too Valerie
Sarah: agreed, i like the commenting feature but it might not be as useful to us at this point
Heather: hrm... what does that mean about our README idea, btw?
nicholas.m.weber: maybe I'm missing something, but I couldn't figure out a way to merge my discussions with new sheets
Sarah: yes, but you can put the readme with the description
1:22 PM
Heather: true
nicholas.m.weber: good idea sarah
Sarah: also, can you save figures (visualizations) that you like for others to see?
Heather: sarah, what kind of bugs?
Sarah: oh, just controlling the data
Heather: nic, in what ways is it difficult to make edits?
nicholas.m.weber: I like that you can set up alerts for comments
1:23 PM
well when I was trying to change cells in a column it didn't allow me to use keyboard nav
Heather: I don't know, Sarah, about saving vizes
 
arg that is a pain
1:24 PM
nicholas.m.weber: so I was clicking between each cell and then moving the cursor with the mouse... small complaint, but if you're editing a big set it can get time consuming
Heather: yeah for sure
 
so what do you guys think?
 
are you going to keep experimenting?
1:25 PM
or decide to skip it at this point if it doesn't solve any major problems you were having?
 
or ?
 
(btw feel free to type while I'm typing and interrupt me, I don't mind a bit.....)
1:26 PM
nicholas.m.weber: I think I'll use it to share my sheets but not create them (meaning once I get them edited I'll begin uploading there instead of googledocs)
 
it would be a lot easier for a mentor to give feedback that way
Heather: yup. and the reason you want to share them that way is because then other people can easily comment by using the comments?
 
gotcha
1:27 PM
so here is a different idea....
 
I think Google Spreadsheets supports RSS feeds.
1:28 PM
You could have a "comments" column or two where you explicitly ask people to give comments, then monitor via RSS
 
Downside: the comments aren't cell-specific
nicholas.m.weber: that could work
Heather: Upside: you could set it up such that people could actually edit the cells, which I think is a better approach for gathering input from the community
Sarah: does fusion have rss
 
?
Heather: less approval based
1:29 PM
hrm, probably?
nicholas.m.weber: i know in the discussion you can check an "alert me"
 
not rss for the entire sheet though I don't think
1:30 PM
Heather: I guess I'm thinking that if it is a pain to enter data there, that is a pretty big knock against it unless there are strong advantages to using it.
 
(even just entering data as a modification activity, after doing most of the work creating the spreadsheet)
1:31 PM
but I don't have strong opinions, just wanted to suggest alternatives
 
the ability to merge tables via Fusion does look pretty cool.... shrug, I dunno.
nicholas.m.weber: If I can't get comfortable with it by the end of the day I'll probably take the "comments" approach back to google spreadsheets
1:32 PM
Heather: ok. and Sarah or Valerie if you keep using it that is fine too... I think collecting info about what works and what doesn't for what usecases is valuable
1:33 PM
me: ok, sure
Heather: ok. Nic, how are your tables going?
 
You are commenting now?
1:34 PM
Let us know when you are at the point where you want people to dive in and help curate the ambiguous data?
nicholas.m.weber: good, yesterday and today I spent time defining what columns I had tried to collect and then figuring out what I was missing
Heather: ok
nicholas.m.weber: one sec I'm trying to get the links for my tables
Heather: I think that in parallel with this it will help to be thinking about stats
1:35 PM
the reason I say that is that thinking about stats is often a fast and real way to figure out what data you really need, in what format, etc
 
so I'd say don't try to get your spreadsheets perfect and then think about stats
nicholas.m.weber: ok
Heather: because it never works that way :)
1:36 PM
nicholas.m.weber: so in thinking about stats... I started to play with the R commands that you gave me... but I think I need to spend more time validating so I know what is valuable to look for
1:37 PM
Heather: ok, so you've got R installed and running and you can see plots etc?
nicholas.m.weber: i could perform most of the commands
1:38 PM
Heather: great. ok, then I suggest that maybe we have a dedicated chat to work through the next phase of R things
nicholas.m.weber: I'm not real familiar with it but I'm anxious to play around
Heather: it might be lengthy, so don't necessarily want to do it here now
nicholas.m.weber: ok
Heather: valerie and sarah you are welcome to participate, but not sure how valuable? your call
me: I haven't really poked around much with R yet.
1:39 PM
(sorry)
Heather: That's ok
 
Valerie, I think maybe hold off on R for now because you have lots of cool article prep to do
 
To the extent we do R things, we can do them customized to your data later
me: ok, cool
Heather: Sarah, I don't think we are going to do anything you don't know
1:40 PM
Nic, when do you want to have an R talk?
Sarah: yeah, i'm good in terms of r
nicholas.m.weber: Maybe tomorrow ?
Heather: ok.
 
maybe 10am Pacific?
nicholas.m.weber: sure
1:41 PM
Heather: great
 
see you on google chat then.
nicholas.m.weber: ok
Heather: anything else you want us to go over here now?
nicholas.m.weber: I don't think so
Heather: ok.
1:42 PM
Maybe we just skip to Sarah quickly.
 
Sarah, how's data collection going? Stats? Anything you want to cover here now? (I want to make sure we spend lots of time on Valerie's spreadsheets ;) )
1:43 PM
Sarah: sorry, i was on another window trying to get my ppt on the plone
 
(which isn't working, i get an error...has anyone else tried?)
 
anyways,
1:44 PM
i'm finishing up data collection and anticipate being done by tomorrow
Heather: haven't tried. Maybe as a PDF?
 
great!
Sarah: i'm shooting for at least 15 articles per journal per year....is that adequate for stats?
 
that's for the 2000/2010 comparison
 
and not all of them have a reuse
1:45 PM
Heather: Hrm, it is low, 25 is probably better, but who knows. stats is a bit of an art when you don't know the magnitude of the effect you are expecting.
Sarah: my problem right now is that many of the journals barely have 50 articles per year, so i would be sampling a greater proportion from those journals
1:46 PM
Heather: I don't think that is a big problem actually
 
It means that those journal-years would be weighted more heavily, indirectly
 
but in multivariate analysis that would mostly be taken care of
 
better any remaining bias from that, I think, than not enough samples
1:47 PM
Sarah: ok, well then, push my data collection projection back a bit
 
also, the 2000 snapshot doesn't seem to be that informative
 
very few have any reuse and sharing instances
1:48 PM
so that cuts the sample size from 15 to 1 or 2
Heather: not surprising, but informative nonetheless
Sarah: so, should i proceed in 2000?
Heather: yeah. that's ok though.
 
hrm, I think it depends on your time constraints. I don't have a good enough sense of how long it takes you per article
 
so what the cost is of you doing the extra 10 per year
1:49 PM
me: Is that something that can be noted in the discussion or methods, an explanation of the sample size discrepancies?
Sarah: um ... 2hours for 10 articles
Heather: It makes it an easier story to tell if you have 25 for all years
Sarah: that's conservative, but ends up being realistic when dealing with difficult articles
Heather: Valerie, yes for sure. Methods if it is a reason for not doing something (ok, or discussion, fields vary)
 
and discussion for how the sample size may limit the generalizability of results
1:50 PM
yup, I believe it.
 
That is still pretty fast, given all you are extracting.
 
So if you were to go for 25 across all journal-years, when would you push your data collection done date back to?
Sarah: friday probably
 
considering download times and such
Heather: Yeah, I
1:51 PM
I'd say go for it and get a 25 across the board picture
1:52 PM
That's my opinion. You are closer to it, so push back when/if your gut disagrees.
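A rough sketch of the cost side of this decision, using only the figures mentioned above (15 versus 25 articles per journal-year, and Sarah's estimate of about 2 hours per 10 articles). The function name and the `journal_years` count are assumptions for illustration; the transcript does not say how many journal-years are in the sample.

```python
def extra_collection_hours(journal_years, current_per_cell=15,
                           target_per_cell=25, hours_per_10_articles=2.0):
    """Estimate the extra hours needed to raise every journal-year
    from the current sample size to the target sample size."""
    # Extra articles needed in each journal-year (never negative).
    extra_per_cell = max(target_per_cell - current_per_cell, 0)
    extra_articles = extra_per_cell * journal_years
    # Convert articles to hours using the "2 hours for 10 articles" estimate.
    return extra_articles * hours_per_10_articles / 10


if __name__ == "__main__":
    # journal_years=12 is a hypothetical count, not a figure from the meeting.
    print(extra_collection_hours(journal_years=12))
```

Under these assumptions each journal-year costs about 2 extra hours, so the total scales directly with however many journal-years are actually in the sample.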
Sarah: so....25 in 2010, 25 in 2000, and then 25 per year in my two time series
 
correct?
Heather: definitely 25 in 2010, 25 in 2000
Sarah: like i told you before, I think the time series, even though not relevant for trends, is the most statistically usable dataset
1:53 PM
Heather: so yes, I'd say 25 in the time series too, ideally
 
that said, since that dataset has more... um... what is the word that I'm looking for
Sarah: robust
 
?
 
better sampling of actual reuses?
1:54 PM
Heather: the years are more similar to each other, so more overlap. yeah, robust... or hrm,,,,
Sarah: i.e. though my sample size is 25, not all of those can be assessed for reuse/sharing practices
Heather: not duplication, but similar datapoints
 
anyway
Sarah: yeah, i get it even though we're lacking the word
Heather: that dataset has more similar datapoints, so having 15 per year (or whatever) but lots of years
1:55 PM
wouldn't be as bad as 15 per year with 10 years between them
 
know what I mean?
Sarah: yeah
Heather: so if you have to cut back on collecting extra
 
you could probably manage without beefing up the time series
1:56 PM
I hear you that not all can be assessed for sharing/reuse
 
Ideally, sure, we'd have 25 sharing and 25 reuse or something (or a sample size large enough to achieve that)
 
but that is clearly outside the possibility of this summer project
 
so we'll make do with what we have
 
and add a wish list in the discussion section
1:57 PM
Sarah: ok, and just use the word "preliminary" a lot in the writeup
Heather: yeah exactly
Sarah: got it.
 
i'm good for now then if you want to cover valerie's stuff
Heather: great.
 
Valerie.
 
where do you want to start?
1:58 PM
nicholas.m.weber: (maybe she's editing the OWW
 
?)
me: probably the search spreadsheet
 
(sorry)
nicholas.m.weber: ah
me: yeah, I was posting some links there
1:59 PM
Heather: great. what do you think? want to start by giving us a tour of your spreadsheets? or talking about DAAC data, or ?
me: I think the overall search might be the data I work with the most.
 
ok, tour of the spreadsheets sounds like a good starter
2:00 PM
I can re-link if needed
Heather: want to post links, or point us to the page that summarizes links, or?
me: ok yes
Heather: ohhhh I just thought of it, sarah. The word I was looking for was redundant.
2:01 PM
anyway. valerie, carry on.
me: My first raw data spreadsheet https://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE1LYlYtWHRXblNXa3ladXNNY3BDbEE&hl=en
Heather: I love the red warning block.
me: My search comparison spreadsheet: http://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE9yX2J2NGwwcWhtSWg0NUZvRWlXdmc&hl=en
 
ha
 
I figured it was eye-grabbing (if not eye-gouging)
2:02 PM
Heather: It definitely makes me want to think twice before basing my next research grant on your current results :)
me: ha
 
The spreadsheet I made based on Sarah's Shared Fields template http://dl.dropbox.com/u/2281212/SharedFields_Valerie.xls
2:03 PM
I should probably start chronologically
Heather: great. so those three are your main ones?
me: yes
Heather: (sorry, btw, that I didn't keep better tabs on this. Portland conference and all that, but no excuse.)
 
great.
me: it's ok, a lot of the information overlaps/is still pretty raw
2:04 PM
I more or less wanted to go through to see if there was anything else I needed to capture
Heather: ok, so where would you start explaining these to someone?
me: well, the first link, the data_citations spreadsheet was when I was running random searches for the databases
2:05 PM
Heather: yup
me: I captured information only for articles that had cited data, although a lot of times, I found it was cases of deposit
 
so I created the "phase II" or "edit" sections
Heather: so I'm a bit confused... ok, each row is a "hit" is that right?
me: I went from a generalized search to a specific search
 
yes
2:06 PM
actually no
 
in phase 1, some of them were misses after I copied the sentences
 
and they were revealed to be data deposit and not data reuse
Heather: where is this phase II/edit section?
 
ah hah!
me: on the TreeBASE and Pangaea pages, they're down
Heather: I was on the DAAC tab and didn't see it
me: (I mean, if you scroll down)
2:07 PM
yeah, I hadn't done that with the DAAC tab because those were from the spreadsheet Bob sent
Heather: gotcha!
2:08 PM
ok, so if you found the same reference via multiple searches, it shows up on multiple rows?
me: I think there was one case where that happened, and I made a note of it in the same row
 
I probably should have made separate rows to avoid confusion
 
but I'm pretty sure that only happened once
2:09 PM
Heather: only one case? I'm surprised
me: unless I have duplicates
 
which may be possible
Heather: I would have thought that doing similar searches in ISI or google scholar or whatever would have produced overlapping results
me: ah, this is something I should be noticing actively and taking note of then?
Heather: no, not necessarily
2:10 PM
me: after awhile, the results sort of blur together
Heather: I'm just trying to make sure I understand 100%
me: so it is likely that there are overlaps
Heather: yeah, I hear you
me: I'm sorry I got confused
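One way to check for overlaps without relying on memory is to group hits by a normalized reference string and list any reference returned by more than one search. This is only a sketch: the `(search_label, reference)` layout and the sample rows are hypothetical, not Valerie's actual spreadsheet columns.

```python
from collections import defaultdict


def find_overlaps(hits):
    """Return references that were found by more than one search.

    hits: iterable of (search_label, reference) pairs.
    """
    searches_by_ref = defaultdict(set)
    for search_label, reference in hits:
        # Normalize case and whitespace so trivially different
        # copies of the same reference are grouped together.
        key = " ".join(reference.lower().split())
        searches_by_ref[key].add(search_label)
    return {ref: sorted(labels)
            for ref, labels in searches_by_ref.items()
            if len(labels) > 1}


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sample_hits = [
        ("search 1", "Author A. Example title. 2007"),
        ("search 2", "author a.  example title. 2007"),
        ("search 2", "Author B. Another title. 2009"),
    ]
    print(find_overlaps(sample_hits))
```

Exact string matching after normalization will miss references that are formatted differently across databases, so a match key such as a DOI would be more reliable where one exists.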
Heather: also, I think at this point you were doing very exploratory searches, right?
 
are the results from all of your "formal" 27 searches in here?
2:11 PM
me: yes
Heather: ok
me: and the summary of those searches is the search_comparisons spreadsheet
 
there's actually 38 there
 
but I used multiple examples
Heather: gotcha
me: (for particular author names/datasets/etc.)
Heather: I'm not at all suggesting you go back and get this now....
2:12 PM
but if/when doing things like this again, another piece of information that would be helpful is the sentence that makes the citation
 
so not the citation itself (though I'm glad you have that too!)
 
but the sentence that makes the citation
 
to understand the context and words it uses to talk about its reuse
me: ok. I thought I had put that in column P
2:13 PM
Heather: maybe you did, hold on
 
hmmm, so in row 53 of the treebase tab,
 
column P looks like this, right?
 
HIGDON JW Phylogeny and divergence of the pinnipeds (Carnivora : Mammalia) assessed using a multigene dataset BMC EVOLUTIONARY BIOLOGY 7 : ARTN 216 2007
me: oh
Heather: was that in the article bibliography?
2:14 PM
me: yeah, I might have had that as a placeholder
Heather: ok, I can see that in some other rows above you do have the sentence
me: (Since I didn't have fulltext on some of them)
Heather: gotcha
2:15 PM
hmmm, in that case it might just make sense to replace the placeholder with "unknown, no access to full text" or something like that?
me: ok, that makes more sense
Heather: I do see all the other sentences though in other rows. that's great
 
I'll turn it around now then and ask for the other thing too....
2:16 PM
me: oh wait, are you looking at the RAW or EDIT page for TreeBASE?
Heather: so in addition to the sentence, which it looks like you do mostly have....
me: I may have found the fulltext for all/most of the articles
Heather: it would be useful to also have the reference, to see if it did say anything at all about the data within the bibliometric citation
 
I was looking at TreeBASE RAW. Should I have been looking at EDIT?
2:17 PM
me: oh yeah, sorry
Heather: no problem
me: I think I only went through and found the fulltext for the edited sheet
Heather: ah yes, column P is much more complete! awesome
me: I was worried for a second because I'm pretty sure I was able to get most of these through UNM
2:18 PM
Heather: ok nice
me: all right, but yeah, it was a lot more useful to see the citation in context
 
like one example where it just mentions treebase in row 19
 
"The TreeBASE interface http://www.treebase.org supports six query types: author, citation, study accession number, matrix accession number, taxon and structure. "
 
it's more of a mention than a citation
2:19 PM
Heather: yeah. and not really data in or out, eh?
 
so I'm looking at the DAAC tab in that spreadsheet and I'm a bit confused.
me: yeah, I wasn't sure what to put for that
Heather: what is in it?
2:20 PM
me: I put in 1s where it should probably be 0
 
since it didn't actually use data
Heather: right, yup, that might be best
 
so on the DAAC tab, it has 17 data rows.
2:21 PM
you didn't pull all of the data from Bob's spreadsheet for this sheet, but you did pull some?
me: I think I pulled some
 
to construct the searches for this spreadsheet
Heather: ok, gotcha
me: like the cited author search
 
or the doi search
2:22 PM
Heather: ok. I think I've got a handle on that spreadsheet now.
 
on to your searching ss?
me: should I have gone through all of the ones on Bob's spreadsheet?
 
ok, yes
 
as I mentioned, even though I came up with 27 types of searches, this sheet has 38 rows because I tried to use multiple examples of author name/doi
2:23 PM
Heather: ok, makes sense
 
now your dropbox sheet?
me: ok
Heather: (Nic and Sarah feel free to jump in whenever if you have comments!)
2:24 PM
me: I had used Sarah's formulas on the ISIraw_PasteFullRecordHere page
Heather: I'm a fan of your DAAC tab, I know that so far
me: and copied and pasted the ISI downloads into it
 
thanks
 
the one thing I ran into was for my non-ISI searches, I ended up just entering the DOI on the Reuse pages as opposed to trying to fill out the ISI full record page
2:25 PM
Heather: ok, so is the "Article" tab part of what makes the formulas work? so it contains transient info?
me: I think so. I started filling it out, but as I got into articles not from ISI, I did most of my filling out of information on the Reuse pages
2:26 PM
Heather: ok
 
so the DAAC tab contains one row for every row in Bob's spreadsheets, right?
me: I tried to copy/paste them all onto each page and place 0s where they didn't apply
 
yes
 
plus the other ORNL DAAC articles I had found through other searches
Heather: what about the other reuse sheets?
2:27 PM
they contain the reuse articles you found for that repository type?
me: now that I think about it, I should have made a way to integrate the search spreadsheet with this spreadsheet so I would know which articles came from what searches
 
yes
Heather: do they also contain reuse articles that sarah found (I'm guessing not, just want to make sure)
me: the 0s are placeholders because I tried to list all of the articles on each page
2:28 PM
I think I cleared Sarah's data before putting mine in just in case of overlap
Heather: ok
me: although she had sent me the template early in her data collection
Heather: so where there is orange zero blocks, that is because that article reused data from a repository other than the one the tab is named for, is that right?
me: yes
 
either the orange blocks or a 0
Heather: ok
2:29 PM
me: although when I look through each page, I don't think that the rows all have the same numbers, so I may have accidentally pasted over some things
Heather: ok. and the share tabs don't really have any useful content at this point, right?
2:30 PM
me: I hadn't looked into the sharing, since I was mostly looking for reuse only.
Heather: right.
 
just making sure I understand
 
and that there isn't secret text hiding behind the orange background or something
 
:)
 
nice Tennessee orange btw
me: there might be. I'm not quite familiar with all of the formulas
 
ha, I didn't realize that
Heather: nah, I was joking.
Sarah: it's just conditional formatting
2:31 PM
Heather: gotcha
me: ah
Sarah: i did so i could see empty cells that needed my attention
Heather: good idea
Sarah: so in theory, if the record is complete, nothing should be orange
 
though in mine, i haven't taken the time to enter all the "no" data yet
2:32 PM
Heather: ok, valerie is there anything else you want to point to in this data deluge?
me: well, I was just wondering if I should try to combine either this sheet or my raw data sheets with the search comparison sheet
 
would it be clearer to explain?
2:33 PM
Heather: hmmm, I'd wait
 
I'd figure out what your top goal is and be driven by that at this point
me: ok
Heather: You've done lots of bottom-up, which is super. Now time to flip around I think.......
2:34 PM
me: is this where I go through the spreadsheets to do reverse searches?
Heather: Well, I'm not sure.
 
So I'm starting to get an idea in my mind about what the backbone of your article could be based on
 
Do you have ideas about that?
2:35 PM
Thinking if we start there, it will inform what other searches to do, data to gather, spreadsheets to consolidate, etc
me: well, I'm figuring out that each search function is built to accommodate different methods
Heather: So maybe let's talk ideas
me: good plan. I wasn't sure if I was coming to the right conclusions
2:36 PM
I had been making observations in OWW but nothing particularly in depth
Heather: Yup.
 
Want me to relay my DAAC idea, and you can see if it rings true for you, or inspires other things, or ?
me: sure
Heather: (or maybe you read it already and just want to cut straight to the commenting?)
2:37 PM
here, I'll try saying it one more time
nicholas.m.weber@gmail.com has left
me: the searching by DOI idea?
Heather: because sometimes hearing things multiple ways/times can help make them clear.
 
Yeah. or I guess more generally, using the DAAC experience
 
as the backbone of the article
2:38 PM
You paint a picture of a repository that wants to know how its data is reused
2:39 PM
so that they can learn, give feedback to funders, provide links to data creators, etc
 
They ask a librarian to look for these reuses once a year
 
(hrm in case it isn't clear, you paint a picture of the DAAC repository itself, not a hypothetical one)
me: ok
2:40 PM
Heather: Ask the DAAC librarian who does it how long it takes her
 
how she does it, etc
 
Report that in the article
 
I'm guessing it takes a while and is frustrating
me: and compare that with my own experience?
Heather: if her experience is anything like yours!
me: she's probably a million times better at it
Heather: I wasn't thinking compare, as much as complement.
me: ah, ok
2:41 PM
Heather: Focus on the DAAC experience as a case study for the first half of the article, maybe
me: and then mention the others?
Heather: and towards the end of the DAAC part, you could say "and furthermore they offer DOIs!"
 
ok without the !
me: ah
Heather: and talk about how DOIs are supposed to make tracking article reuse easier
 
but darn it, no one is using them
2:42 PM
me: ooh, that's good
Heather: and you can quantify the "no one"
 
by finishing up the great analysis you were doing on the DAAC tab of your dropbox spreadsheet
 
that will allow you to say something like "only 14% of all the reuses found by the DAAC librarians actually used DOIs"
 
or something like that
2:43 PM
me: I wasn't really doing much analysis, I was just plugging away based on the awesome model Sarah set up.
Heather: right, but I think you were capturing the references, is that right?
 
or if you weren't, you could?
me: yeah
Heather: to see the patterns of citation?
me: the sentence and location
2:44 PM
and the fact that almost none of them have it in the references section
Heather: the benefit of the DAAC dataset is that it is a librarian-derived set of repository based reuses
 
and so it provides a great baseline... something your study has been missing until now
2:45 PM
you could say, because of this diversity in data citations, my initial attempts to find DAAC reuses met with little success.
 
only 23% mention the DAAC url, etc
2:46 PM
then follow on to the DAAC first half
me: ah, ok
 
others just mention the data authors, etc.
Heather: by saying "most repositories don't have DOIs and so finding their reuses is even harder"
 
yeah
 
so the points of the article would be something like
2:47 PM
a) finding instances of data reuse is hard (estimate of difficulty: estimate from DAAC librarian plus a bit of anecdotal colour from you)
2:48 PM
b) there are plans to make it easier, but so far the uptake has been low (estimate from # DOIs in references within DAAC set)
 
c) hrm, I'm sure there is a third point in here somewhere :)
Sarah has left
2:49 PM
Heather: And this is done using the motivation/dataset/discussion of the DAAC case to start with, and fleshed out towards the end with your experiences qualitatively and with different repositories
 
Thinking you have a conversation with Bob Cook and the DAAC librarian and maybe others to collect their thoughts and experiences.
 
Whatcha think?
me: this sounds good
 
definitely a way to open up a dialogue
2:50 PM
Heather: Doesn't have to be like that exactly of course, make it your own
me: it's good to have a framework to work with
Heather: But I think the DAAC dataset and reuse-hunting-experiences provide a useful framework
 
yeah exactly
me: I've written arguments before, but not in a scientific/bibliometric capacity.
2:51 PM
ok, neat
Heather: so not sure where you start
me: should I email the DAAC librarian?
Heather: yeah. maybe Bob Cook first?
me: ok
2:52 PM
Heather: maybe before you do that, you might want to have a look at the DAAC data you have so far extracted
 
on the DAAC tab
me: come up with a list of interview questions?
 
oh ok
Heather: so that you could come into the conversation with a bit of knowledge
2:53 PM
along the lines of "I looked at 54 of the 124 reuses in that spreadsheet
 
and it appears that only about 12 of them have DOIs"
 
"this was less than I would have guessed"
 
is that in line with your experience?
 
etc
 
or whatever the real numbers are
2:54 PM
me: yeah, that makes sense
 
I could quantify those
Heather: tally up a few other things too, maybe, so that when the librarian tells you how she looks for the reuses her answers make sense to you based on the data you have
 
like are there some where when you look at the full text you have no idea how the librarian would have found them?
2:55 PM
if so, you can ask her explicitly about those.
 
you know what I mean.
 
Spend a bit of time with the DAAC citations you've extracted so you are familiar and can ask good questions and understand the answers :)
2:56 PM
me: ok
Heather: one thought is before you do this, sleep on it
me: that was why I was wondering about combining the search comparisons with the other sheet
Heather: just to make sure you don't have some direction that you'd rather take it
2:57 PM
another idea is to send an email to data_citations email list describing this direction for the article, seeing if they have any suggestions, etc before you email Bob and the librarian
 
anyway, follow your gut on that. just wanted to share a backbone idea so that you could start to frame your article and drill to what you needed
me: excellent idea
2:58 PM
I'll write out a rudimentary outline

and some initial figures/questions
 
and then email the datacitations list
Heather: yeah, ok, if you think combining the search spreadsheet with the DAAC one would help, I could definitely see that
me: ok
2:59 PM
I think it could help
 
at least for keeping me from getting dizzy juggling all the sheets :D
Heather: I'm guessing you might also want to merge together your DAAC dropbox sheet
 
with some of the columns of Bob's original sheet
3:00 PM
like the column that said what dataset number they were reusing
me: yeah
Heather: what DAAC project, etc
 
(basically all the columns, why not)
3:01 PM
it would be interesting to know what % of the DAAC reuses included the Data_Set_ID number in their citations/papers, etc.
 
whatcha think?
me: as opposed to the article doi
 
good plan
Heather: right. or in addition, or who knows?
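The merge and the percentage discussed above could be sketched roughly as follows in Python. This is only an illustration: it assumes both sheets are exported as CSV and share an article identifier, and the column names (`article_id`, `citation_text`, `project`) and all sample rows are made up rather than taken from Bob's sheet or the dropbox sheet. Only `Data_Set_ID` is a name mentioned in the chat.

```python
import csv
import io

# Hypothetical CSV exports; real sheets would be read from files instead.
BOBS_SHEET = """article_id,Data_Set_ID,project
a1,100,ProjectA
a2,200,ProjectB
a3,300,ProjectA
"""

DROPBOX_SHEET = """article_id,citation_text
a1,Data set 100 obtained from the ORNL DAAC
a2,Data provided by the original authors
a3,See data set 300 at the repository
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_sheets(bob_rows, coded_rows, key="article_id"):
    """Attach every column of Bob's sheet to the matching coded row."""
    bob_by_key = {row[key]: row for row in bob_rows}
    merged = []
    for row in coded_rows:
        if row[key] in bob_by_key:  # keep only papers from Bob's sheet
            merged.append({**bob_by_key[row[key]], **row})
    return merged


def percent_citing_dataset_id(rows):
    """Percent of rows whose citation text contains the Data_Set_ID.

    Plain substring matching is an assumption; short IDs could match
    unrelated numbers, so real data may need a stricter check.
    """
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if r["Data_Set_ID"] in r["citation_text"])
    return 100.0 * hits / len(rows)


merged = merge_sheets(read_rows(BOBS_SHEET), read_rows(DROPBOX_SHEET))
print(f"{percent_citing_dataset_id(merged):.0f}% mention the Data_Set_ID")
```

Keeping only rows that appear in Bob's sheet also gives the "exactly those papers, no more, no less" subset for stats based on what the DAAC librarian found.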
3:02 PM
shall we go through the DAAC dropbox sheet in detail, briefly?
me: sure
Heather: I know they are Sarah's columns, but want to make sure they are capturing everything you need to capture for your argument and story
3:03 PM
Now in this case the orange 0s are because you didn't have full text, or you didn't get to those, or ?
me: the orange 0s were the ones carried over from the other reuse pages
 
except for one
 
where I didn't have full text
 
I meant to keep using my own color coding scheme that I started on the ISI raw data page
3:04 PM
Heather: the DAAC spreadsheet from Bob had 116 rows of reuse articles I think, right?
 
I think you want a sheet that only has those 116 reuse articles on it
me: oh
 
hm. I wonder why I have another ORNL DAAC spreadsheet with even less than that
3:05 PM
I think I mixed up my spreadsheets
Heather: so maybe make a copy of this sheet and cut out everything that comes from your searches instead
me: It looks like I didn't copy/paste all of bob's spreadsheet
 
ok
Heather: just for the sake of the "first half" of the paper as I'm envisioning it....
3:06 PM
yeah. make sure you have exactly those papers, no more, no less, then you can calculate stats based on "what the DAAC librarian found"
me: ok
Heather: ok, column D in the dropbox sheet
 
is Y when they mention the DAAC somewhere in their reuse paper or citations?
3:07 PM
me: yes
Heather: great
 
type of dataset
me: There's a key I made up that either abbreviated the name of the project or did something else
3:08 PM
Heather: ok
me: like RP: River Productivity Data
Heather: so maybe make a standalone spreadsheet of this stuff
 
and in the README you could put those codes?
 
or some other way you think it would be easy to communicate
me: ok
 
ok
3:09 PM
Heather: location of intext citation?
 
intro methods abstract etc, right
me: yes
Heather: now if it is in multiple places, how did you decide what sentence to cut and paste into column I?
3:10 PM
me: I tried to get all of them
 
adding [...] between each sentence
Heather: ok, just concatenated together. gotcha.
me: it got lengthy
Heather: is the "R" in location for references?
me: yes
3:11 PM
Heather: ok.
me: although in some cases I might have mistaken D for R
 
by saying Repository instead of Depository
Heather: it might be useful to make a second column beside I to hold the references cut and pastes
 
yeah, I hear you.
 
The R/not R distinction is important, since it determines whether the info can be looked up through full text or ISI
me: oh you mean like a 1 or 0 for if there is a relevant excerpt followed by the excerpt?
3:12 PM
Heather: hmmm, not sure what you mean.
me: oh
Heather: I mean have two columns like your "I" column now
me: there is a "relevant bibliographic citation" in the next rows
Heather: where one of them has text that is in the body of the article
me: in K, I believe
Heather: ok, gotcha, so I'm jumping ahead of myself, eh?
3:13 PM
me: Sarah was very thorough in her headers
Heather: yeah, ok. hrmmm.
 
ok, let me come back to this thought in a minute
me: ok
Heather: in the meantime, in column G, what is R?
3:14 PM
references again?
 
??
me: I think that might have been R for repository
 
I meant to put D for Depository
Heather: ok
me: since those cite ORNL or Oak Ridge, etc
3:15 PM
Heather: what is column H?
me: it seemed redundant
 
I took it as where did it come from
Heather: what does "NI" stand for, do you know?
me: either from the author or the repository
 
or Not Indicated
3:16 PM
Heather: ah hah.
 
ok.
 
is having a Y in column J
 
the same thing as having an "R" in column F?
me: not necessarily
3:17 PM
Heather: so then what does it mean to have an R in column F?
me: oh, I think I put it in R for references if that was the only mention
3:18 PM
Heather: or there is an R in column F if the references themselves mention data
me: yes
Heather: as opposed to the references being the original data-collection paper?
me: or the repository name
 
yes
 
as is the case for most citations by author name
Heather: there is an R in column F if the references themselves mention data or the repository name
 
is that right?
me: yes
3:19 PM
Heather: gotcha
 
then the citation itself is in column K
me: yes
 
where either the repository name, author name, doi, etc. are mentioned

in the reference page
Heather: and what were the criteria you used for determining column L?
me: er reference section
Heather: yup, I hear you. that makes sense.
3:20 PM
me: L was more or less if it was according to ORNL's data citation policy
 
where it includes the DOI
 
so I put N for most of them
Heather: ok.
me: I have a link to ORNL's data citation policy I can reference in the paper
Heather: you might want to go recode that a bit
 
one thing that would be useful to pull out explicitly, into its own column, is whether there is a DOI
3:21 PM
it looks like, for example, rows 81 and 82 have a Y in that column but no DOI
3:22 PM
me: oh, I think I counted it if the URL was included
Heather: ok.
me: like http://www.daac.ornl.gov/MODIS/modis.html
Heather: yup, I'd break that into two columns
 
for your own information
me: so y/n DOI column
Heather: yup
me: ok cool
Heather: plus a y/n url column
3:23 PM
plus anything else that you think would be helpful to break out
 
maybe a y/n "it mentions the name DAAC in the bibliographic reference" for example
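A rough Python sketch of that recoding, splitting the single column into separate y/n flags. The regular expressions are loose approximations, and the flag names and example reference strings are invented for illustration; only the MODIS URL comes from the chat.

```python
import re

# Loose patterns for illustration; real citations may need tuning.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def code_citation(citation):
    """Return separate y/n flags for one bibliographic reference string."""
    return {
        "has_doi": "Y" if DOI_PATTERN.search(citation) else "N",
        "has_url": "Y" if URL_PATTERN.search(citation) else "N",
        # Note: this also fires when "daac" only appears inside a URL.
        "mentions_daac": "Y" if "daac" in citation.lower() else "N",
    }


# Made-up reference strings, not rows from the actual spreadsheet.
examples = [
    "Author, A. 2001. Some data set. ORNL DAAC, Oak Ridge. doi:10.1234/EXAMPLE/1",
    "Author, B. 2003. Data available at http://www.daac.ornl.gov/MODIS/modis.html",
    "Author, C. 1999. Original field study. Journal of Examples 1: 1-10.",
]
for ref in examples:
    print(code_citation(ref))
```

With the flags in separate columns, a row like 81 or 82 (URL but no DOI) would show up as has_url = Y and has_doi = N instead of a single ambiguous Y.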
 
part of the point of data citations is that it would be WAY easier to track them if we could use bibliometric resources
me: ok, so you mean like the first spreadsheet I made?
Heather: like ISI or Scopus
 
like we do with articles
 
but we can't do that unless
3:24 PM
a) people use bibliometric citations rather than just in-text mentions
 
and b) we know what to look for (and what field to look for it in) within the bibliometric citation
 
standard with articles
 
but as you found with ISI it frankly isn't clear what to look up where to find doi and citations in bibliographies!
3:25 PM
me: yeah
Heather: whoops: but as you found with ISI it frankly isn't clear what to look up where to find doi and data citations in bibliographies!
me: there's not a field for it
Heather: right.
 
that is a very useful point to make in your article
 
I'm surprised ISI doesn't have an [all] search ability, to search in all facets of the citation
 
it is a pain not to have it!
me: yeah
 
no fulltext
3:26 PM
every other search has fulltext
Heather: which reminds me... do you have scopus access?
me: I'm not sure
 
I haven't used it
Heather: yeah. I wouldn't put it at the top of your list, but I think it might be useful to use
me: ok
Heather: if you are going to redo any of your searches
3:27 PM
it does have an [all] so you can search in all aspects of the citation
me: good to know
Heather: yeah... for what it is worth, I'd be careful using the word "full-text" with regard to citations
 
it is easily confusing, for me
 
because you mean the full string of the citation, right? but most people think of the full text of the article, the intro, results, etc.
3:28 PM
me: oh
 
that was what I meant
 
full-text article searching
Heather: hmmm.
me: because I know Google and Scirus do that
 
but not ISI
Heather: yeah, so agreed, full-text article searching would be helpful and would solve the problem
me: although it does have its limits
3:29 PM
Heather: right
 
like they don't have the coverage that ISI has
 
because ISI only has metadata and references, right?
me: yes
Heather: because that is what publishers are willing to share with them.
 
so while it would be nice if ISI had full-text searching, that isn't likely to happen any time soon
3:30 PM
me: at the very least, a doi search
Heather: what I wish that ISI had, in a practical, I-don't-see-why-they-can't sense
 
is the ability to search for a word in any part of the citation
 
as opposed to just in the authors field or just the journals field or just the first-page field
 
admittedly there aren't many people who want to find all citations to papers by Dr Apple and published in the journal called Apple
3:31 PM
me: yes
Heather: but I think it would be useful in our case because we frankly don't know where the doi is going to show up, or where they are going to slot the "Data from Oak Ridge" phrase
 
It doesn't look like ISI supports that to me, does it to you?
 
you can OR all the parts together, but that is cumbersome and still maybe lossy
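To make the "OR all the parts together" workaround concrete, here is a small Python sketch that builds one OR-ed query string per search term. The field tags and the `FIELD=("term")` syntax are hypothetical placeholders, not actual ISI search syntax, and the example DOI is invented.

```python
# Field tags are hypothetical placeholders, not real ISI search syntax.
CITATION_FIELDS = ["AUTHOR", "WORK", "YEAR", "PAGE"]


def or_across_fields(term, fields=CITATION_FIELDS):
    """Build one query that looks for `term` in each listed field."""
    clauses = [f'{field}=("{term}")' for field in fields]
    return " OR ".join(clauses)


def queries_for(terms, fields=CITATION_FIELDS):
    """One OR-ed query per search term (a DOI, a repository phrase, ...)."""
    return {term: or_across_fields(term, fields) for term in terms}


for term, query in queries_for(["10.1234/EXAMPLE/1", "Oak Ridge"]).items():
    print(term, "->", query)
```

The query grows with every field and still only covers the fields that were listed, which is where the cumbersome and possibly lossy part comes from.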
3:32 PM
me: ok
 
I was just thinking from the ANDS angle
 
where they were working with Thomson Reuters and Elsevier to improve search functions for data
Heather: yeah. agreed.
3:33 PM
might be worth touching base with them when you have some of your article fleshed out, in case they want to add or give context to something
me: ok, neat
Heather: ok, so how are you feeling?
me: more up to speed
Heather: do you have stuff to go on?
3:34 PM
me: I'll process these notes today and sleep on it
Heather: does it feel like you have a clear path? one that makes sense to you? one that you believe in?
 
that's the goal anyway
 
:)
me: yes
Heather: ok, sounds good.
me: I'll go post this conversation.
3:35 PM
Heather: ok, great.
 
let me know if you need a sounding board if some of it isn't making sense or doesn't sit right
me: sure
 
thanks a ton
Heather: you'll aim to have an email out to datacitations in the next day or two?
me: yes
 
once I get a solid outline of what I want to cover
3:36 PM
and after I look through the 100+ articles
Heather: cool. maybe we have another chat towards the end of the week? we'll play it by ear.
me: definitely
Heather: if you have trouble getting full text, remember that Bob offered to post them or something.
me: ok
Heather: guessing that will take a little time, so if you expect to have problems,
 
probably best to ask for that earlier than later
3:37 PM
I'm kicking myself that we didn't meet with Oak Ridge Bob when we were there.
me: I guess we just didn't have time.
Heather: Was he out of town? I don't know. Anyway, he has been very supportive
 
via email so we'll just soldier on remotely
me: yes, the spreadsheet helped guide my searches
3:38 PM
Heather: Have a good rest of the day and talk soon!
me: you too!
 
later





Revision as of 06:15, 14 July 2010


See transcript here: DataONE:Notebook/Summer_2010/2010/07/13