Biogang:Discussion/Project Ideas: Difference between revisions

From OpenWetWare
No edit summary
m (cleaning)

Revision as of 04:56, 20 June 2008


At this point this is another placeholder, but I thought we could dump ideas and thoughts about possible projects here.

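See the summary of the FriendFeed discussion (http://friendfeed.com/e/2a039db0-30b6-11dd-bbc5-003048343a40) on the Elsevier Grand Challenge (http://www.elseviergrandchallenge.com/description.html).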

We could categorize ideas into the following buckets:

Tools (standalone and web-based pieces of useful code)

  • would it be useful to provide links to existing code written by our community?
  • are community members interested in forming "sub-groups" based on programming experience and/or common interests?

Matt and I (Deepak) were talking about listing existing code and experience somewhere as a starting point. These points above fit that perfectly.

  • Another "tool" idea is to revive a project of mine (Paulo) that never took flight: InFasta (http://infasta.genedrift.org). I can try to add some more ideas to the blog and we can even migrate it to Google AppEngine. The initial code was developed in C++ and the intention is to convert everything to Python.
    • The mention of GAppEngine rang a bell in my head - would it be worth creating something resembling the MPI bioinformatics toolkit (http://toolkit.tuebingen.mpg.de), starting from the original InFasta project? Basic functionality (sequence manipulation) is there and many tools could be run via API - the most time-consuming part would be writing a script to forward outputs to another tool (a pretty unique feature among such websites). The toolkit is not available for download (and it's in Ruby anyway) - I think it's worth replicating. -> I think this is a good idea, starting with some file manipulation, then moving to tool integration with other systems.
    • I like the idea as well Deepak Singh 01:53, 20 June 2008 (UTC)
    • Second this. I'm a big fan of the MPI toolkit; an open source version would be great. Andrew Perry
  • I work on the bug tracking repository (with some help from the people behind Lighthouseapp) - does it fit into this bucket as a community effort (given it works)? Pawel Szczesny
    • I'd say bug tracking goes under "Tools" - Neil
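
As a rough illustration of the "forward outputs to another tool" idea discussed above, here is a minimal Python sketch. It is not InFasta's or the MPI toolkit's actual code: the function names (parse_fasta, min_length, reverse_complement, run_pipeline, to_fasta) are hypothetical, and the only assumption is that every tool takes a list of (header, sequence) records and returns a new list, so the output of one tool can be fed straight into the next.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]                    # (header, sequence)
Tool = Callable[[List[Record]], List[Record]]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(records: List[Record]) -> List[Record]:
    """Hypothetical tool: reverse-complement every DNA sequence."""
    return [(h, s.translate(_COMPLEMENT)[::-1]) for h, s in records]


def min_length(n: int) -> Tool:
    """Hypothetical tool factory: keep sequences with at least n residues."""
    def tool(records: List[Record]) -> List[Record]:
        return [(h, s) for h, s in records if len(s) >= n]
    return tool


def run_pipeline(records: List[Record], tools: List[Tool]) -> List[Record]:
    """Forward the output of each tool to the next one in the list."""
    for tool in tools:
        records = tool(records)
    return list(records)


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialise records back to FASTA text, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = ">a\nACGT\nAC\n>b\nGG\n"
    records = list(parse_fasta(text.splitlines()))
    print(to_fasta(run_pipeline(records, [min_length(3), reverse_complement])))
```

Because every tool shares one record format, wrapping a remote tool behind the same signature would probably be the main work when moving from file manipulation to integration with other systems.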


Analyses (collaborative blog posts or similar, our own Journal of Biogang Research?)

  • could this be the new form of Bio::Blogs?
    • That would be great. It's an idea that's come up in different discussions in the past. Something to replace Bio::Blogs as a collection of some of the more popular topics of interest.
  • We could share ideas for blog posts and, if there are enough people interested, such an (obviously longer) piece of work could be submitted to NatPrecedings. Together it would be much easier to get into NP than writing everything by oneself. How does the impact of NP articles compare to that of blog posts?
  • I thought about this idea of using Bio::Blogs instead for collaborative longer posts (reviews for example) that could eventually be published in journals as well, giving a stronger reward to the authors. Eventually the "bold" goal could be to write a collective book by collecting the different reviews together. This could be given away for free online and offered print-on-demand for a small fee.
    • In addition to the inFasta idea, I like this one a lot -- Deepak
  • Curation of topics similar to Bio::Blogs seems excellent and could easily lead to a 2-3 page NP paper. It would be better though for everyone to choose a topic of interest and collect / review information from blog posts. Small groups of people can work on each paper based on the topic that interests them... I started a page with the title "Web 2.0 and online project communities in bioscience" (feel free to suggest a new title). We can start by submitting links from blog posts relevant to the topic to that page, and as people have time they can follow the links and write up, or edit other people's writing.

Pure science (projects possibly ending with publication)

  • this would be the best demonstration to (academic) sceptics that the process can work
  • can we devise a distributed data project in which a large task (e.g. a genomic analysis) is broken up and sent out to community members?
    • all we need is a proof of concept, one that essentially proves that this is possible (and that it can scale)
    • as far as I know, a few sequencing consortia (for example tomato) do so-called community annotation - the annotation process is split between a few groups that use their own tools (domain annotation is done by one group, gene prediction by another)
  • Building a mechanistic error model for the Solexa Sequencer
    • Basically the idea is to take the intensity data for well resolved spots on each cycle of a sequencing run
    • Using a known sequence it is possible to tell what the 'true read' should be
    • Test the experimental intensities for each base against mechanistic models of failure and use the fits to optimise model parameters
    • Possible models are:
      • All base insertions fail at same rate, no sequence context effects, no effect on next cycle (except obviously the wrong base goes in)
      • All base insertions fail at same rate, no sequence context effects, does affect the insertion rate at the next cycle
      • Bases fail at different rates, no sequence context effects, no effect on next cycle
      • Bases fail at different rates, failure depends on previous base identity, no effect on next cycle
      • etc etc
    • The models are relatively simple and data are available for testing.
    • Small, focussed project which is publishable, has support from the Sanger Centre, and is useful for them
    • Central issues could also be extended to the ABI systems, Pacific Biosciences and most other sequencing-by-synthesis approaches
    • Blog post with some comments: http://blog.openwetware.org/scienceintheopen/2008/05/28/defining-error-rates-in-the-illumina-sequence-a-useful-and-feasible-open-project/
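
To make the model comparison in the Solexa proposal above a little more concrete, here is a minimal Python sketch of how two of the listed models might be fitted and compared. It is a deliberate simplification, not the proposed method: it works on called bases rather than raw intensity data, treats every cycle as an independent Bernoulli trial, and compares only "all bases fail at the same rate" against "bases fail at different rates". The function names and the use of AIC as the comparison criterion are assumptions made for illustration.

```python
import math
from collections import Counter


def count_errors(true_read, called_read):
    """Count trials and failures per true base (one trial per cycle)."""
    trials, failures = Counter(), Counter()
    for true_base, called_base in zip(true_read, called_read):
        trials[true_base] += 1
        if true_base != called_base:
            failures[true_base] += 1
    return trials, failures


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_loglik(k, n, p):
    """Log-likelihood of k failures in n independent trials at rate p."""
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def fit_uniform(trials, failures):
    """Model: all base insertions fail at the same rate (one parameter)."""
    n, k = sum(trials.values()), sum(failures.values())
    p = k / n
    return p, bernoulli_loglik(k, n, p)


def fit_per_base(trials, failures):
    """Model: each base fails at its own rate (one parameter per base)."""
    rates, loglik = {}, 0.0
    for base, n in trials.items():
        k = failures[base]          # Counter returns 0 for unseen bases
        rates[base] = k / n
        loglik += bernoulli_loglik(k, n, rates[base])
    return rates, loglik


def aic(loglik, n_params):
    """Akaike information criterion; lower suggests a better trade-off."""
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    true_read = "A" * 10 + "C" * 10
    called_read = "A" * 10 + "C" * 5 + "T" * 5
    trials, failures = count_errors(true_read, called_read)
    p, ll_uniform = fit_uniform(trials, failures)
    rates, ll_per_base = fit_per_base(trials, failures)
    print("uniform rate:", p, "AIC:", aic(ll_uniform, 1))
    print("per-base rates:", rates, "AIC:", aic(ll_per_base, len(rates)))
```

The context-dependent and next-cycle models from the list could presumably be added the same way, by keying the counts on (previous base, current base) or on whether the previous cycle failed, and a real analysis would fit against the intensity data rather than the called bases.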