Talk:Open writing projects/Python all a scientist needs: Difference between revisions

From OpenWetWare
(outline)
 
(removed outline - now leaving comments)
Line 1: Line 1:
== Please Leave A Comment ==
Type <nowiki>~~~~: Your Comment</nowiki> to leave a comment. Thanks!

== Outline ==
=== The Scientist's Dilemma ===
* A typical research project requires a variety of computational tasks
** Data generation
** Data analysis
** Data visualization
* The most common solution is to use separate tools for each task
** Data generation in C
** Data analysis in proprietary software
** Data visualization in a separate graphing package
* This is an inadequate solution
** These tools can't be pipelined easily
*** Many manual steps have to be repeated if something changes
** Data provenance is poor at best
*** Hard to tell whether an error comes from a program bug or a human mistake
*** Can only repeat analysis by following written steps in a lab notebook
*** Steps are easily forgotten and hard to pass on
* Python overcomes these weaknesses
 
=== Comparative Genomics Case Study ===
* Brief Project Description - Compare DNA sequences of viruses
** Download and parse the genome files of many viruses
** Store the genome in a project-specific genome class
** Draw random genomes to compare to the 'real' genome
** Visualize the genomic data in a 'genome landscape' plot
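The genome class and the random-genome step above could be sketched as follows. This is a minimal illustration, not the project's actual code: the class name <code>Genome</code>, its fields, and the choice of shuffling as the way to draw a random genome are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    """Hypothetical project-specific container for one viral genome."""
    name: str
    sequence: str  # nucleotide string, e.g. "ATGC..."

    def gc_content(self) -> float:
        """Fraction of bases that are G or C (0.0 for an empty sequence)."""
        if not self.sequence:
            return 0.0
        gc = sum(1 for base in self.sequence.upper() if base in "GC")
        return gc / len(self.sequence)

    def draw_random(self, rng: random.Random) -> "Genome":
        """Return a shuffled copy: same length and base counts, random order.

        Assumption: shuffling is only one possible null model; the project
        may have drawn its random genomes differently.
        """
        bases = list(self.sequence)
        rng.shuffle(bases)
        return Genome(name=f"{self.name} (shuffled)", sequence="".join(bases))


real = Genome("toy virus", "ATGCGCGATATTTAGC")
rng = random.Random(0)  # fixed seed so the run is repeatable
randoms = [real.draw_random(rng) for _ in range(3)]
```

Because shuffling preserves base composition, any statistic that depends only on base counts (such as GC content) is identical for the real and random genomes; only order-dependent features differ.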
 
=== BioPython ===
* Overview of BioPython
** A suite of bioinformatics tools for tasks such as parsing bio-database files, computing alignments between biological sequences, and interacting with bio web-services
* Use of BioPython in this project
** Parsing GenBank files from the National Center for Biotechnology Information
** example code
* Benefits of using Biopython
** parsing code can be wrapped in custom classes that make sense for the particular project
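A possible shape for the "example code" item, wrapping Biopython's GenBank parser in a project class. The class <code>VirusGenome</code> and its field names are hypothetical; <code>SeqIO.read</code> with the <code>"genbank"</code> format is the Biopython call assumed here, and it expects a file holding exactly one record.

```python
from dataclasses import dataclass


@dataclass
class VirusGenome:
    """Hypothetical project class; field names are illustrative."""
    accession: str
    description: str
    sequence: str

    @classmethod
    def from_genbank(cls, path: str) -> "VirusGenome":
        """Build a VirusGenome from a single-record GenBank file."""
        # Imported here so the class itself is usable without Biopython installed.
        from Bio import SeqIO

        record = SeqIO.read(path, "genbank")  # expects exactly one record
        return cls(
            accession=record.id,
            description=record.description,
            sequence=str(record.seq),
        )

    def __len__(self) -> int:
        return len(self.sequence)


# Direct construction works without any file, which is handy for tests.
toy = VirusGenome("XX000000", "toy record", "ATGC")
```

With a GenBank file downloaded from NCBI, usage would look like <code>genome = VirusGenome.from_genbank("some_virus.gb")</code>, where the file name is a placeholder.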
 
=== MatPlotLib ===
* Overview of MatPlotLib
** Matlab-like graphical environment
* Use of MatPlotLib in this project
** generating genome landscapes
** example code
* Benefits of using MatPlotLib
** graphics code resides alongside the data generation code
** quick troubleshooting
** can easily regenerate complicated plots by tweaking the code
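One way the plotting step might look. The outline does not define a "genome landscape", so this sketch assumes a simple cumulative walk (+1 for G or C, -1 for A or T); the function names and axis labels are likewise illustrative. The matplotlib calls are confined to one function so the data computation can be checked on its own.

```python
from itertools import accumulate


def landscape(sequence: str) -> list[int]:
    """Cumulative walk: +1 for G or C, -1 for A or T, 0 for anything else."""
    steps = {"G": 1, "C": 1, "A": -1, "T": -1}
    return list(accumulate(steps.get(base, 0) for base in sequence.upper()))


def plot_landscape(sequence: str, out_path: str) -> None:
    """Save the landscape of `sequence` as an image file."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend; no display needed
    import matplotlib.pyplot as plt

    heights = landscape(sequence)
    fig, ax = plt.subplots()
    ax.plot(range(1, len(heights) + 1), heights)
    ax.set_xlabel("Position (bp)")
    ax.set_ylabel("Cumulative GC minus AT")
    ax.set_title("Genome landscape (toy example)")
    fig.savefig(out_path)
    plt.close(fig)


walk = landscape("GGCATTA")
```

Since the figure is produced by code, changing the walk definition or the styling and rerunning <code>plot_landscape</code> regenerates the plot without any manual steps.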
 
=== SWIG ===
* Overview of SWIG
** allows you to speed up selected parts of an application by writing them in another language (C, C++)
* Use of SWIG in this project
** speed-up of the random genome drawing routine
** example code
* Benefits of using SWIG
** get all the benefits of Python, with compiled-code speed for the critical parts
** sped-up parts are used in exactly the same context - no need for glue code
** can leverage experience in other languages that scientists typically have, within Python
 
=== Conclusions ===
* Practical Conclusions
** community modules are useful for a variety of scientific tasks
** Python can easily be used by more scientists
* Bigger picture conclusions for good scientific practice
** code readability and package structure promote good scientific practice
** Python and its modules provide a consistent framework to promote data provenance
** can plug into other community tools and practices to help science - e.g. unit testing
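As a small illustration of the unit-testing point, the standard-library <code>unittest</code> module can check an analysis helper automatically. The function <code>gc_content</code> and the test names are made up for this sketch.

```python
import unittest


def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence.upper() if base in "GC") / len(sequence)


class GCContentTest(unittest.TestCase):
    def test_half_gc(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_empty_sequence(self):
        self.assertEqual(gc_content(""), 0.0)

    def test_case_insensitive(self):
        self.assertAlmostEqual(gc_content("atgc"), gc_content("ATGC"))


# Run the tests programmatically and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GCContentTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Rerunning such tests after every change gives a quick check that a program error has not crept into the analysis.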

Revision as of 19:15, 20 February 2008

Please Leave A Comment

Type ~~~~: Your Comment to leave a comment. Thanks!