== Outline ==

=== The Scientist's Dilemma ===
* A typical research project requires a variety of computational tasks
** Data generation
** Data analysis
** Data visualization
* The most common solution is to use a separate tool for each task
** Data generation in C
** Data analysis in proprietary software
** Data visualization in a separate graphing package
* This is an inadequate solution
** These tools can't be pipelined easily
*** Many manual steps have to be repeated if something changes
** Data provenance is poor at best
*** It is hard to tell whether an error comes from a program or from human error
*** An analysis can only be repeated by following written steps in a lab notebook
*** Steps are easily forgotten and hard to pass on
* Python overcomes these weaknesses

=== Comparative Genomics Case Study ===
* Brief project description: compare the DNA sequences of viruses
** Download and parse the genome files of many viruses
** Store each genome in a project-specific genome class
** Draw random genomes to compare to the 'real' genome
** Visualize the genomic data in a 'genome landscape' plot

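The genome class and the random draw could be sketched roughly as follows. The outline does not say how the random genomes are drawn, so this assumes a simple null model that shuffles the bases of the real genome, keeping length and base counts fixed. The names <code>Genome</code> and <code>draw_random_genome</code> and the example sequence are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    """Hypothetical project-specific container for one viral genome."""
    name: str
    sequence: str

    def gc_content(self) -> float:
        """Fraction of bases that are G or C (0.0 for an empty sequence)."""
        if not self.sequence:
            return 0.0
        gc = sum(base in "GC" for base in self.sequence.upper())
        return gc / len(self.sequence)


def draw_random_genome(genome: Genome, rng: random.Random) -> Genome:
    """Assumed null model: same bases as the real genome, in random order."""
    bases = list(genome.sequence)
    rng.shuffle(bases)
    return Genome(name=f"{genome.name}-random", sequence="".join(bases))


# Made-up sequence, for illustration only.
real = Genome("example-virus", "ATGCGCGATATTAGC")
null = draw_random_genome(real, random.Random(42))
```

Passing an explicit <code>random.Random</code> instance keeps each draw reproducible from its seed, which helps with the provenance concerns raised earlier.
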
=== Biopython ===
* Overview of Biopython
** A suite of bioinformatics tools for tasks such as parsing biological database files, computing alignments between biological sequences, and interacting with biological web services
* Use of Biopython in this project
** Parsing GenBank files from the National Center for Biotechnology Information (NCBI)
** Example code
* Benefits of using Biopython
** Parsing code can be wrapped in custom classes that make sense for the particular project

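As an illustration of wrapping the parser in a project class, here is a minimal sketch that uses Biopython's <code>SeqIO.read</code> with the <code>"genbank"</code> format. The class name <code>VirusGenome</code>, its fields, and the file name in the usage comment are assumptions for this example, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class VirusGenome:
    """Hypothetical project class holding only what the analysis needs."""
    accession: str
    description: str
    sequence: str

    def __len__(self) -> int:
        """Genome length in bases."""
        return len(self.sequence)

    @classmethod
    def from_genbank(cls, path):
        """Build a VirusGenome from a GenBank file with exactly one record."""
        # Imported here so the class itself is usable without Biopython.
        from Bio import SeqIO

        record = SeqIO.read(path, "genbank")
        return cls(record.id, record.description, str(record.seq))


# Usage, assuming a local GenBank file (the file name is hypothetical):
# genome = VirusGenome.from_genbank("virus.gb")
```

The rest of the project then works with <code>VirusGenome</code> objects and never touches the file format directly.
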
=== Matplotlib ===
* Overview of Matplotlib
** A MATLAB-like plotting environment
* Use of Matplotlib in this project
** Generating genome landscapes
** Example code
* Benefits of using Matplotlib
** Graphics code resides alongside the data generation code
** Quick troubleshooting
** Complicated plots can easily be regenerated by tweaking the code

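The outline does not define the 'genome landscape', so the sketch below assumes one plausible form: a cumulative walk along the sequence that steps up for G or C and down for any other base, with the real genome drawn over the random ones. The function names and the scoring rule are assumptions made for illustration.

```python
from itertools import accumulate


def landscape(sequence):
    """Cumulative walk: +1 for G or C, -1 for any other base (assumed rule)."""
    steps = (1 if base in "GC" else -1 for base in sequence.upper())
    return list(accumulate(steps))


def plot_landscape(real_walk, random_walks, path):
    """Save the real landscape (black) over the random ones (gray) to path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for walk in random_walks:
        ax.plot(walk, color="gray", alpha=0.3)
    ax.plot(real_walk, color="black", label="real genome")
    ax.set_xlabel("position (bases)")
    ax.set_ylabel("cumulative score")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Usage (the output file name is hypothetical):
# plot_landscape(landscape("ATGCGC"), [landscape("GCATCG")], "landscape.png")
```

Because the plot is produced by a function rather than by hand, changing a color or a label and regenerating every figure is a matter of rerunning the script.
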
=== SWIG ===
* Overview of SWIG
** Generates the wrapper code needed to call C or C++ from Python, so selected parts of an application can be sped up by writing them in those languages
* Use of SWIG in this project
** Speeding up the random genome drawing routine
** Example code
* Benefits of using SWIG
** Get all the benefits of Python, with the speed of C for the critical parts
** Sped-up parts are used in exactly the same context, with no need for hand-written glue code
** Scientists can leverage the experience they typically have in other languages from within Python

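The sketch below shows only the Python side of such a setup: calling code imports a compiled routine if it is available and otherwise falls back to a pure-Python version with the same signature. The module name <code>_fastgenome_swig</code> and the function <code>shuffle_sequence</code> are hypothetical. The C source and the SWIG interface file are not shown, and the two backends are not assumed to give identical draws for the same seed.

```python
import random

try:
    # Hypothetical SWIG-generated extension module; the name is an assumption.
    from _fastgenome_swig import shuffle_sequence
except ImportError:
    def shuffle_sequence(sequence, seed):
        """Pure-Python fallback with the same signature as the C routine."""
        bases = list(sequence)
        random.Random(seed).shuffle(bases)
        return "".join(bases)


def draw_random_genomes(sequence, count, seed=0):
    """Draw `count` shuffled genomes; callers never see which backend ran."""
    return [shuffle_sequence(sequence, seed + i) for i in range(count)]


draws = draw_random_genomes("ATGCATGC", 3)
```

Since <code>draw_random_genomes</code> looks the same whichever backend is in use, the analysis code does not change when the fast version is swapped in.
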
=== Conclusions ===
* Practical conclusions
** Community modules are useful for a variety of scientific tasks
** Python can easily be used by more scientists
* Bigger-picture conclusions for good scientific practice
** Code readability and package structure promote good scientific practice
** Python and its modules provide a consistent framework that promotes data provenance
** Python can plug into other community tools and practices that help science, e.g. unit testing

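To make the unit-testing point concrete, here is a minimal sketch using the standard <code>unittest</code> module. It checks two properties of an assumed shuffle-based random draw: base counts are preserved, and the same seed gives the same genome. The function under test is a hypothetical stand-in for the project's own routine.

```python
import random
import unittest
from collections import Counter


def draw_random_genome(sequence, rng):
    """Hypothetical routine under test: shuffle the bases of a sequence."""
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


class DrawRandomGenomeTest(unittest.TestCase):
    def test_preserves_base_counts(self):
        sequence = "ATGCGCGATATTAGC"
        drawn = draw_random_genome(sequence, random.Random(1))
        self.assertEqual(Counter(drawn), Counter(sequence))

    def test_same_seed_gives_same_draw(self):
        sequence = "ATGCGCGATATTAGC"
        first = draw_random_genome(sequence, random.Random(7))
        second = draw_random_genome(sequence, random.Random(7))
        self.assertEqual(first, second)


# Run the tests explicitly so the snippet also works inside a larger script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DrawRandomGenomeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test like the second one also documents that a result can be regenerated from its seed.
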