DataONE:Notebook/Summer 2010/2010/07/27 chat
Page created by Sarah Judson
Latest revision as of 13:10, 27 July 2010
In the chat room: Heather Piwowar (hpiwowar@gmail.com), Nicholas Weber (nicholas.m.weber@gmail.com)
11:55 AM Heather: You've been invited to this chat room!
todd.vision@gmail.com has joined
11:56 AM todd.vision: hi guys
Heather: Hi guys. Trying to get Maribeth and Bruce
Nicholas: Hi
Sarah: Hello
11:57 AM Heather: Unfort they are just showing up as "invited" not "online." Feels like there is some google chat magic I'm missing for connecting to people the first time. Any suggestions?
(btw Valerie had another commitment, but will try to join us later)
11:59 AM todd.vision: just popped off an email to them. it's not quite 3pm here yet, so...
12:00 PM while we are waiting, let me update you on the survey.
12:01 PM Apparently Rebecca Koskela was in a review panel last week, so never got the chance to set up the survey monkey form
I would expect to see it before the end of the week, although she hasn't given us a precise date
12:02 PM refbruce@gmail.com has joined
todd.vision: folks there?
Heather: Yay, Bruce is that you?
refbruce: here now
todd.vision: welcome!
12:03 PM Heather: Great, then I think we can get started. I'll keep an eye out for Maribeth and Valerie.
refbruce: Haven't used this before. It looks like there's voice and video?
Heather: Yes, though they don't facilitate the lovely text capture we all have come to love
todd.vision: type fast bruce
Heather: but useful sometimes
12:04 PM refbruce: 'k. Tx
Heather: proposed agenda: lol
a) OWW lessons learned doc
b) intern updates on what they've been doing, what they have planned next
12:05 PM (bruce, jump in whenever, interrupting is no problem)
todd.vision: warning - will leave at 4
Heather: c) ideally some pointers on end-of-internship stuff
ok other topics?
todd.vision: that's a good start
Heather: we're thinking we'll keep the intern summaries brief this time
ok..... a) Suzie sent me email asking about the OWW lessons learned doc
12:06 PM and when we would have it avail
because she'd like to use OWW for a project for her new phds :)
so mostly this is just a heads up that I'm going to make a page and ask for your help fleshing it out
Sarah: sure
Nicholas: ok
todd.vision: from the perspective of students, or mentors, or both?
12:07 PM Heather: yeah, good question
Sarah: do we want individual perspectives (1st person) or topic by topic how to
Heather: I think there is a market for two different types of docs
right
todd.vision: i like the topic by topic idea
Heather: so I think one doc that would be valuable to Suzie, and I'm sure others, is "one way to get started"
12:08 PM and we could give them pointers to what we've learned about how to make a "lab", RSS feeds, talk pages, etc. Mostly pointing to the docs that exist.
then I think attached or separately could be random "lessons learned" and tips. thoughts?
12:09 PM todd.vision: start a google doc?
Nicholas: I think it would be good to share our "formatting" and lessons learned, but part of that value is learning and then posting for one another...
Heather: (needless to say I think we also highlight this doc/these docs on our blogs and welcome feedback)
todd.vision: or do it in OWW?
Heather: I think in OWW
12:10 PM because valuable for it to live in this community so that others here can find it
todd.vision: gd pt
Heather: agreed Nic, nothing beats learning it yourself. But hopefully we can help someone avoid the struggles with "do I make a notebook under my name or my lab's name?"
ok, probably enough time on that, just wanted to give it visibility
12:11 PM I'll make a page and send around a link
Nicholas: you're right Heather, I guess I was just thinking out loud
refbruce: And what are the lessons learned that are relevant to other platforms along the lines of OWW, not just this specific toolset.
Heather: good point Bruce
definitely some things like value of a text chat log
refbruce: The advantage of being old. I've seen lots of fads and evolution :-).
12:12 PM Heather: (value of posting the text chats regularly, with little lag, <<whoops, recently>>)
ok! good. Will try to emphasize that.
oh, there is Valerie, hold on
12:13 PM Valerie has joined
Heather: Hi Valerie
Just in time for status reports
Nic, do you want to go first? quick summary of where you've been and where you are going?
Nicholas: sure
So this week I’ve spent most of my time getting phenomenal R and stat lessons from heather
12:14 PM and then using what she has taught me to discover what in my data needs to be cleaned up
and trying to understand which variables are valuable
todd.vision: did you figure out a way to get comments from others - I remember Heather saying there was a bug with the way it was set up initially
12:15 PM Nicholas: I've kept a pretty close record on my oww pages
Heather: I think Todd means your spreadsheets
Nicholas: well, I think fusion tables was really hard to edit in
Valerie has left
Nicholas: so I moved "feedback spreadsheets" to google docs and posted a note to the group
(also I'm focusing almost entirely on Journal data this week)
12:16 PM todd.vision: get feedback yet? (I still need to give you mine)
Heather: (I still need to give him mine too)
Nicholas: not much... Sarah gave me good feedback last night so I could dive in and sort out my Subscription model column
todd.vision: ok
Nicholas: where I plan to go:
12:17 PM is understanding my stats better so that I can write them into a paper
I've started Abstract and Introduction sections for the Data Science Journal
todd.vision: any take home messages yet?
Nicholas: and I'll post them tomorrow publicly
take home messages?
12:18 PM todd.vision: headline results
Valerie has joined
12:19 PM Heather: I've mostly been trying to convince him to not look at the results very much yet :)
Nicholas: hmmm, not that far honestly -- it seemed at first there was some evidence of a relationship between Impact Factor and journal policy.... but that's ebbing and flowing as I clean and change data
todd.vision: sorry...
Heather: since we are still data cleaning, etc
nah, that's ok!
12:23 PM You've been invited to this chat room!
Nicholas: separate
12:24 PM I broke out the original request / require column (what I had in Knoxville) into separate columns
12:25 PM todd.vision: last question: what are the publisher categories?
Nicholas: and kept archiving directions and citation directions separate
todd.vision: sounds good
Nicholas: Wiley, Elsevier, Springer and Other
todd.vision: ok
12:26 PM Heather: Any other questions for Nic?
Or Nic, questions for others?
Nicholas: it breaks down to Other - apx 125 -- The 3 Major Pubs -- 185
12:27 PM I don't think so, if you have input for my stats, I've been putting most things on my OWW calendar pages
todd.vision: great thanks!
Heather: Yeah, Nic has not only been picking up the stats really fast,
Nicholas: I have a really really good instructor
Heather: he's also been blazing ground about how to show R code and results on OWW pages. Learning lots through his experiences that we'll all be able to use soon/later. Good stuff,
12:28 PM ok. Valerie, do you want to go next?
Quick summary of what you've been doing, where you are going next?
Valerie: sure
12:29 PM Now that I have James's answers, I'll be able to finish this round of drafting on my perspectives piece
Heather: YAY! James's answers!
Valerie: Todd sent me guidelines to Learned Publishing, which is probably a good route to go since this is mostly aimed for publishers
refbruce: sorry that took so long on this end.
Valerie: oh no that's ok
12:30 PM I know you all are really busy there
also, I've been adding links/files to Mendeley in collections based on articles I've found citing TreeBASE, Pangaea and ORNL DAAC data (and a separate folder for the ORNL DAAC articles found by James, etc.)
12:31 PM I also added the .pdfs for the web resources page
(that was originally on the DataONE and my OWW notebook)
the .pdfs that are up are only the web resources and the files Ranjeet sent me
however, I am limited to how many people I can share all .pdfs with
12:32 PM (10 users per collection)
Heather: yes
in general, how has the mendeley experience been?
Valerie: it's been great
I've been able to add so much very quickly.
Heather: pros/cons as a way to keep up this biblio as a living and useful contribution?
Valerie: a lot of things will autopopulate if you just have a DOI
todd.vision: I have to say I'm really impressed with it
Heather: yeah me too
Valerie: the only con I can think of is whether or not someone other than me will be able to maintain the collection
12:33 PM I haven't quite looked into that
refbruce: Been doing some looking at Mendeley. Interesting. But I don't see an export capability.
Heather: I think you can change the owner, but I don't know if there can be more than one owner. worth looking into.
12:34 PM Valerie: the desktop program is very slick too
12:35 PM todd.vision: Bruce - the desktop app exports in bibtex, ris or endnote xml
refbruce: Tx. Good enough. I don't like lock-in. Those are good formats.
Nicholas: in one of the research centers here, they have a subscription that allows us to share with unlimited users
Valerie: oooh
Heather: people there like it, Nic?
Nicholas: it's a really great way to get a grasp on what people are doing research on
they just started
12:36 PM like two weeks ago (just found out myself)
todd.vision: not sure we need to share the PDFs as much as the citations
Valerie: ok
yeah, there's a limit to how much file space you have for sharing
Heather: yeah, though sharing PDFs amongst ourselves can be useful
Valerie: (500mb, I think
Heather: solves the "who has this 1988 paper" problem
Valerie: (for the free basic account at least)
ah
12:37 PM I think I was able to find that one and upload it
the one on JSTOR?
Heather: yup! great.
ok, so thumbs up on mendeley so far. Valerie, send around a pointer to the public and shared collections and we can keep exploring/tagging/seeing what we think
back to your research paper for a sec?
Valerie: a link to the datacites group?
12:38 PM (and/or on the OWW page?)
Heather: both
Valerie: ok
Heather: also maybe a blog post with a link to the public collection?
todd.vision: that would be great
Valerie: sure
Heather: ok. as you looked over james's response, did it seem to answer all of your questions?
12:39 PM or enough of them? or ?
Valerie: I haven't looked at it yet (just got home), but the answers look very detailed
Heather: basically: do you have what you need?
ok, great
Valerie: and all questions I listed look answered
Heather: let us know soon if you think you need more of something :)
Valerie: ok, if I have any followup, I'll be sure to send that around
12:40 PM Heather: great. and asking them if you can post their responses on our public OWW would be great.
Valerie: it shouldn't take me long to work these answers into the article text
Heather: ok, quickly moving to Sarah.
Valerie: oh yeah, I'll send that in my thank you email
Heather: so we can be done by the top of the hour
great! Sarah, quick summary?
Sarah: ok.... i'm in a similar state as nic
12:41 PM mid analysis and cleaning data
i hashed out the rest of my factor classifications this morning with heather and am currently working on running analysis for reused and shared datasets in terms of what we're calling "resolvability" and "attribution"
12:42 PM and a combined score of "ideal" data citation
*but it doesn't necessarily need to be called ideal
just resolvable + attributed = a "good" data citation
in that it allows you to find the dataset from the information in the paper and gives the original data author attribution for the dataset in the paper
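The "resolvable + attributed" definition above can be sketched in a few lines of Python. This is only an illustration: the group's analysis was done in R and is not shown in this log, the class and field names are hypothetical, and treating the combined score as a 0 to 2 sum is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    # Hypothetical fields, named after the two properties described in the chat.
    resolvable: bool  # can the dataset be found from the information in the paper?
    attributed: bool  # does the paper credit the original data author?


def citation_score(citation: DataCitation) -> int:
    """Combined score: 0, 1 or 2 (assumed to be a simple sum of the two properties)."""
    return int(citation.resolvable) + int(citation.attributed)


def is_good_citation(citation: DataCitation) -> bool:
    """A 'good' data citation is both resolvable and attributed."""
    return citation.resolvable and citation.attributed
```

For example, a paper that names the repository and accession number but never credits the data author would score 1 and would not count as a "good" data citation.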
12:43 PM so, that's where i'm at
currently, year and depository are coming out as significant factors with genbank reuses driving the trend
i haven't yet run the statistics on sharing
12:44 PM but am expecting similar things
any questions on that?
or, heather, things that I'm missing that the others should know ?
Heather: nope, I think that is good
12:45 PM one attribute of score we aren't pursuing
due to sparsity of data, is what we might call "discoverability"
todd.vision: did you agree to keep the two different samples separate for analysis?
Sarah: no
we've combined them b/c the stats came out similar separate and combined
Heather: so how findable is the data reuse citation by someone doing what Valerie was trying to do. Would include where in the paper the reuse attribution was made. But since so few were in the biblio, for example, mostly not relevant for stats.
12:46 PM Sarah: i thought you said not to combine them, heather felt that you did
so, that's still up for debate and could use further clarification, perhaps in response to the email string we've been passing back and forth or we can hash it out here
12:47 PM todd.vision: if the results are the same either way, that is justification for combining them - but make sure to note that the trend (as opposed to just the p value) is not different for the separate analysis
if that's actually the case, of course
Sarah: ok, i'll check for that as well
i also have a factor for "sampling method" to make sure that isn't significant
12:48 PM but I haven't run that yet
12:49 PM todd.vision: not sure about the wisdom of including that factor. It's not that you necessarily expect the means to differ, but perhaps the shape of the distribution
Sarah: it wouldn't be included in the final analysis
just as a test of sampling artifact
Heather: good point, Todd
Sarah: so, i'm planning to run the complete analysis first, then run it with the sampling artifact
Heather: Sarah's going to post her code and results soon, so we can have a deep look
12:50 PM Sarah: is that not the way i should do it?
todd.vision: fine as an exploratory thing
just tough to interpret a negative result
Sarah: yeah, i'm hoping to put my cleaned code up today or tomorrow...watch the RSS
Heather: Todd, you are suggesting doing a stratified analysis to make sure the trends go the same way, right?
12:51 PM todd.vision: right
Heather: ok.
todd.vision: but not formally stratified... just sanity checking the results from the separate analyses against the combined one
Heather: in general could do with third opinions on number of variables.
yes, gotcha. that makes sense.
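The sanity check Todd describes (compare the direction of the trend, not just the p value, in each sample and in the combined data) could look roughly like the Python sketch below. The function names and the idea of using a least-squares slope as "the trend" are assumptions for illustration; the real analysis was run in R and the numbers in the usage note are made up.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trends_agree(sample_a, sample_b):
    """Compare trend direction in two samples and in their union.

    Each sample is a list of (x, y) pairs. Returns (agree, slopes), where
    slopes holds the slope for sample_a, sample_b and the combined data.
    """
    combined = sample_a + sample_b
    slopes = [slope(*zip(*s)) for s in (sample_a, sample_b, combined)]
    agree = all(s > 0 for s in slopes) or all(s < 0 for s in slopes)
    return agree, slopes
```

With made-up values such as `[(2000, 0.1), (2005, 0.3), (2010, 0.5)]` for one sample and `[(2000, 0.2), (2005, 0.3), (2010, 0.6)]` for the other, all three slopes are positive, so the check passes. If one sample trended downward instead, `agree` would be false and combining the samples would need a closer look.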
12:52 PM so I've been running with the "rule of thumb" of about 30 datapoints per degree of freedom in a regression
of course such things depend on size of effect etc
but do you have any rules of thumb that are different, that you prefer?
todd.vision: my thumb has the same rules
Heather: ok :)
Sarah: ok...i'll proceed with that then
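As a rough worked example of the "about 30 datapoints per degree of freedom" rule of thumb, the Python sketch below turns a sample size into a degrees-of-freedom budget. The journal counts (about 125 "Other" plus 185 from the three major publishers) and the four publisher categories come from Nic's update earlier in this chat; whether all of those journals enter the regression, and the three-level "subscription_model" factor, are assumptions made only for illustration.

```python
def max_degrees_of_freedom(n_observations, points_per_df=30):
    """Degrees-of-freedom budget under the ~30 datapoints per df rule of thumb."""
    return n_observations // points_per_df


def degrees_of_freedom_used(factor_levels):
    """Total df for categorical factors: a factor with k levels uses k - 1.

    factor_levels maps a factor name to its number of levels.
    """
    return sum(levels - 1 for levels in factor_levels.values())


n_journals = 125 + 185  # approximate counts quoted in the chat
budget = max_degrees_of_freedom(n_journals)  # 310 // 30 == 10
used = degrees_of_freedom_used({
    "publisher": 4,           # Wiley, Elsevier, Springer, Other
    "subscription_model": 3,  # hypothetical number of levels
})
print(used <= budget)
```

As Heather notes, the right ratio depends on effect size, so the budget is a guide for trimming factors and levels rather than a hard limit.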
12:53 PM todd.vision: are you too heavy with variables and factors?
Sarah: if I'm not careful
Heather: nic too
Sarah: depending on factors we want to consider and factor/character states of each
so i'm whittling down on both ends
12:54 PM todd.vision: you can do multiple uni/bivariate analyses and only do a combined analysis when there's some interest in the interaction per se
like from a prior hypothesis
12:55 PM did that make any sense?
Sarah: yep
that's what i'm moving towards
todd.vision: ie test the response variable against individual factors before getting complicated
ok
Heather: so do univariate analyses of all the things we are interested in, and only put those that remain interesting into the multivariate analysis
Sarah: more general on the multivariate end and then univariate where it's interesting
Heather: yup
12:56 PM Nic, bookmark this part of the conversation for later reference :)
todd.vision: sarah - is that reversed?
Sarah: hmm?
12:57 PM todd.vision: univariate on everything, multivariate where you have reason to wonder if there is something going on
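A minimal Python sketch of the "univariate first" screening idea follows: test the response against each factor on its own and keep only the factors that look interesting for a later multivariate model. Everything here is an assumption for illustration. The chat does not say which test the group used in R; this sketch swaps in a Pearson chi-square test on 2x2 tables, works only for yes/no factors and a yes/no response, and uses hypothetical record keys.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column carries no evidence of association
    return n * (a * d - b * c) ** 2 / denominator


def screen_factors(records, response, factors):
    """Return the factors whose univariate association with response is significant.

    records is a list of dicts with boolean values for response and each factor.
    """
    kept = []
    for factor in factors:
        a = sum(1 for r in records if r[factor] and r[response])
        b = sum(1 for r in records if r[factor] and not r[response])
        c = sum(1 for r in records if not r[factor] and r[response])
        d = sum(1 for r in records if not r[factor] and not r[response])
        if chi_square_2x2(a, b, c, d) > CRITICAL_VALUE_DF1:
            kept.append(factor)
    return kept
```

Only the factors returned by `screen_factors` would then go into a combined model, unless a prior hypothesis about an interaction justifies including others.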
Sarah: i mean, i have broader character states for multivariate and then more specific ones for univariate
Heather: yeah, so Sarah actually had a different approach than that
so Sarah maybe we talk this through and see which approach makes more sense.....
todd.vision: oh i see
changing the number of factors. hmmm... i'd need to know more to respond
Heather: the "univariate first" approach is probably more standard
Sarah: most of it is in the emails we've been cc ing to you
12:58 PM i did univariate first
when i was having multivar problems
todd.vision: sorry i am not keeping up with my email deluge very well these days...
Sarah: and at that point we decided the multivar was more interesting
as i remember at least
Heather: yup. I think maybe going through R results will help.
todd.vision: i'll look back at the correspondence
12:59 PM ok - sounds good. I should take off now. any parting issues?
Heather: none that can't be done in email?
todd.vision: ok - bye all!
1:00 PM Sarah: so, heather, i think you got cut off on "discoverability"
Heather: Bruce, any comments?
todd.vision@gmail.com has left
Heather: yeah, that's ok. Sarah, you and I offline will make sure we are on the same page about what Todd was suggesting, and what to do about it.
1:01 PM that works?
Nicholas: Sarah, if you could post some of the code for univariate stuff that you do that would also be really helpful for me to see, even if it's something simple that you just throw me
Heather: Yup. Nic, your summary() plot was "univariate" stuff too, just in case that wasn't clear
Nicholas: ok
Sarah: yeah, sorry i've been slow on code
1:02 PM refbruce: Nope, no immediate comments. I had a student who'd scheduled to come to my office at 4:00, and I've taken care of his needs :-).
1:03 PM Heather: ok!
ok, shall we all sign off for now then?
refbruce: Will re-read the transcript and see if there are any thoughts that come up.
Nicholas: ok.. I guess not ... Bye
Heather: Maribeth said sorry she wasn't here... got caught in a meeting
1:04 PM Valerie: ok
Sarah: i'm good...heather, do you want to talk now or later?
Heather: she'll read the transcript and please email her if there is anything she can help with
Sarah, now works for me
Valerie: is anyone posting the transcript?
Sarah: i will
Valerie: ok, thanks
Heather: Thanks. In general guys if you can help by posting any/all transcripts that aren't up yet
I'll try to do them too, but I'm behind
1:05 PM (I know lots are up, but my conversations with Nic are slacking at least, whoops)
ok! Bye all. Sarah, let's start a new chat window?
Nicholas: Ok, I'll get those up this afternoon
Valerie: ok, we'll talk soon
1:06 PM Nicholas: bye, thanks
Valerie: later and thanks again
Valerie has left
refbruce@gmail.com has left
Sarah: yeah, new chat window is good
Heather: bye!
Heather has left