Talk:Harvard:Biophysics 101/2007/02/13: Difference between revisions

Revision as of 20:34, 19 February 2007

Sequence-level differences to consider

Point mutations
- silent/synonymous
- missense
- nonsense
- translational stability (re: codon bias)?
Deletions (and insertions)
- frame shift
- downstream effects?
Add more here...
Coding vs. noncoding sequence
- We should be able to identify start (AUG) and stop codons.
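The point-mutation and indel cases listed above can be made concrete with a small classifier. This is a minimal Python sketch. It assumes the reference is an uppercase DNA coding sequence that begins at its first codon and uses the standard genetic code. `classify_substitution` and `classify_indel` are hypothetical helper names, not Biopython functions. It also labels stop-to-sense changes as `stop-loss`, a case the list does not name.

```python
from itertools import product

# Standard genetic code in DNA letters: amino acids listed in TCAG codon
# order (last base varies fastest), with '*' marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    ("".join(codon), aa) for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
)


def classify_substitution(ref_cds, position, new_base):
    """Classify replacing ref_cds[position] (0-based) with new_base.

    Hypothetical helper: assumes ref_cds is in frame from its first base.
    """
    codon_start = position - position % 3  # first base of the affected codon
    ref_codon = ref_cds[codon_start:codon_start + 3]
    offset = position % 3
    mut_codon = ref_codon[:offset] + new_base + ref_codon[offset + 1:]
    ref_aa, mut_aa = CODON_TABLE[ref_codon], CODON_TABLE[mut_codon]
    if ref_aa == mut_aa:
        return "silent"  # synonymous: the same amino acid (or stop) is encoded
    if mut_aa == "*":
        return "nonsense"  # a sense codon became a stop codon
    if ref_aa == "*":
        return "stop-loss"  # a stop codon became a sense codon
    return "missense"  # a different amino acid is encoded


def classify_indel(length):
    """An indel inside a coding sequence shifts the reading frame unless the
    number of inserted or deleted bases is a multiple of 3."""
    return "in-frame" if length % 3 == 0 else "frameshift"


if __name__ == "__main__":
    ref = "ATGGCTTGGTAA"  # toy CDS: ATG GCT TGG TAA -> M A W stop
    print(classify_substitution(ref, 5, "C"))  # GCT -> GCC: silent
    print(classify_substitution(ref, 8, "A"))  # TGG -> TGA: nonsense
    print(classify_indel(1))  # frameshift
```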
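For the coding vs. noncoding item, one simple way to find start (AUG) and stop codons is to scan the three forward reading frames codon by codon. This sketch assumes DNA or RNA letters and ignores the reverse strand. It reports each stretch from the first ATG to the next in-frame stop. `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) pairs, 0-based with end exclusive, for
    stretches running from an ATG to the next in-frame stop codon on the
    forward strand. min_codons counts the start and stop codons too."""
    seq = seq.upper().replace("U", "T")  # accept RNA (AUG) as well as DNA
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this stretch
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGGCTTGGTAACC"))  # [(2, 14)]
```

Keeping only the first ATG before each stop is a simplification; nested start codons inside an open reading frame are not reported separately.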

General coding questions / ideas

How should we handle file input?
How should we format the output?
How should we generate a set of test inputs?
How can each class of difference be identified by a set of implementable conditions (i.e., how do you distinguish a duplication or an insertion from a rearrangement)?
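One possible answer to the file-input question is to read FASTA, the format apoe.fasta already uses. The stdlib-only sketch below assumes plain FASTA with `>` header lines. `read_fasta` is a hypothetical helper. Current Biopython's `Bio.SeqIO.parse(handle, "fasta")` covers the same job with more format checking.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines,
    such as an open file. Sequence lines are joined and uppercased."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)  # last record in the input


if __name__ == "__main__":
    example = [">seq1 reference", "ATGGCT", "tggtaa", "", ">seq2", "ATGCCT"]
    for header, seq in read_fasta(example):
        print(header + ": " + seq)
```

Because it accepts any iterable of lines, it can be called on an open file handle for apoe.fasta or tested directly on a list of strings.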

Using clustalw for sequence alignment

clustalw can be used to perform multiple sequence alignments.
BioPython provides a wrapper for clustalw through the Bio.Clustalw package. However, clustalw itself must be downloaded and installed before your BioPython scripts can use it.
- clustalw installation instructions
As Zsun noticed, the BioPython cookbook has some outdated example code for accessing clustalw. An online Python Course in Bioinformatics by Katja Schuerer and Catherine Letondal has several better examples with correct code.
Once you have clustalw installed and accessible from Python, download the example file apoe.fasta and try running the following code.

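#!/usr/bin/env python

import os
from Bio import Clustalw  # BioPython's clustalw wrapper (needs the clustalw program installed)

# Build a clustalw command line for apoe.fasta in the current directory
cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, 'apoe.fasta'))
# Write the alignment to test.aln
cline.set_output('test.aln')
# Run clustalw and parse the resulting alignment into a Python object
alignment = Clustalw.do_alignment(cline)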

This should produce an output file, test.aln, which you should open and inspect.
Try modifying apoe.fasta, and upload new versions as you extend it to cover all of the test cases enumerated above.
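One way to build the test cases is to derive variants from a reference sequence and write them out as FASTA records. In this sketch the position, the inserted base, and the record labels are arbitrary illustrative choices (hypothetical, not taken from apoe.fasta). The reference is assumed to be uppercase ACGT and at least `pos + 6` bases long. The duplication is a tandem copy of six bases, which in an alignment looks like an insertion; that ambiguity is the duplication-vs-insertion question raised above.

```python
# Transition partner of each base, used for the substitution test case.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def make_variants(ref, pos=None):
    """Return (label, sequence) pairs: the reference plus one hypothetical
    example of each difference class, all placed at position pos."""
    if pos is None:
        pos = len(ref) // 2  # arbitrary illustrative position
    return [
        ("reference", ref),
        ("substitution_%d" % pos, ref[:pos] + TRANSITION[ref[pos]] + ref[pos + 1:]),
        ("insertion_1bp_%d" % pos, ref[:pos] + "A" + ref[pos:]),
        ("deletion_3bp_%d" % pos, ref[:pos] + ref[pos + 3:]),
        # tandem duplication: ref[pos:pos + 6] repeated right after itself
        ("duplication_6bp_%d" % pos, ref[:pos + 6] + ref[pos:pos + 6] + ref[pos + 6:]),
    ]


def to_fasta(records, width=60):
    """Format (label, sequence) pairs as FASTA text, wrapping at width."""
    lines = []
    for label, seq in records:
        lines.append(">" + label)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta(make_variants("ATGGCTTGGTAA")))
```

Saving the printed text as a FASTA file gives clustalw one record per test case to align against the reference.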

Extracting information from an alignment

Look at Bio/Align/Generic.py for ideas about what you can do with an alignment.
Once you have obtained an alignment, one way to parse it is to examine each column and check for differences.
Here is some initial code to give you some ideas.

#!/usr/bin/env python

import os
from Bio import Clustalw
from Bio.Align import AlignInfo  # not needed below; offers alignment summaries such as consensus sequences

cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, 'apoe.fasta'))
cline.set_output('test.aln')
alignment = Clustalw.do_alignment(cline)

# Check each alignment column for differences between the sequences
for i in range(alignment.get_alignment_length()):
    col = alignment.get_column(i)  # one character (residue or '-' gap) per sequence
    s = set(col)  # the distinct characters in this column
    if len(s) > 1:  # more than one distinct character indicates a mismatch
        print("%d %s" % (i, col))

You may elaborate on this code to start handling specific cases of sequence mismatches.
Try to think about how differences in the alignment correspond to the cases enumerated above.
One thing I (katie) noticed is that you can generate a consensus sequence pretty easily. See http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial003.html#toc16 and scroll down to find out how.
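The column-scanning code above can be extended toward specific cases by comparing each aligned sequence with a reference row and grouping consecutive differing columns into runs. This sketch assumes the rows are already available as equal-length Python strings with `-` for gaps, as in clustalw output. How to pull them out of the alignment object depends on the BioPython version. `column_kind`, `difference_runs`, and `report` are hypothetical names, and the frame check only matters if the run falls in a coding region.

```python
def column_kind(ref_char, other_char):
    """Label one alignment column relative to the reference row ('-' = gap)."""
    if ref_char == other_char:
        return None  # match (or a gap in both rows)
    if ref_char == "-":
        return "insertion"  # bases present only in the other sequence
    if other_char == "-":
        return "deletion"  # bases missing from the other sequence
    return "substitution"


def difference_runs(ref, other):
    """Group consecutive columns of the same kind into (start, end, kind)
    runs, using 0-based alignment columns with end exclusive."""
    runs = []
    i = 0
    while i < len(ref):
        kind = column_kind(ref[i], other[i])
        if kind is None:
            i += 1
            continue
        start = i
        while i < len(ref) and column_kind(ref[i], other[i]) == kind:
            i += 1
        runs.append((start, i, kind))
    return runs


def report(ref, other):
    """One human-readable line per run; gap runs get a reading-frame check."""
    lines = []
    for start, end, kind in difference_runs(ref, other):
        length = end - start
        note = ""
        if kind != "substitution":
            note = " (in-frame)" if length % 3 == 0 else " (frameshift if coding)"
        lines.append("%s of %d base(s) at columns %d-%d%s"
                     % (kind, length, start, end - 1, note))
    return lines


if __name__ == "__main__":
    for line in report("ATGGCT--TGGTAA", "ATGCCTAATGG---"):
        print(line)
```

Column numbers here are alignment coordinates, not positions in the original sequences; mapping them back requires counting the non-gap characters in each row.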
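BioPython's AlignInfo module (imported in the column-scanning code) provides a consensus method, `SummaryInfo.dumb_consensus`. To show the idea without running clustalw, here is a stdlib-only majority-rule sketch. Its threshold rule, the `N` ambiguity character, and dropping gap-majority columns are assumptions of this sketch, not a description of BioPython's exact behavior.

```python
from collections import Counter


def majority_consensus(aligned, threshold=0.5, ambiguous="N"):
    """Build a consensus from equal-length aligned strings ('-' = gap).

    For each column, keep the most common character if it occurs in more
    than `threshold` of the rows; otherwise write `ambiguous`. Columns where
    a gap wins are left out of the consensus.
    """
    rows = len(aligned)
    consensus = []
    for column in zip(*aligned):
        char, count = Counter(column).most_common(1)[0]
        if float(count) / rows > threshold:
            if char != "-":
                consensus.append(char)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


if __name__ == "__main__":
    print(majority_consensus(["ATGGCT", "ATGCCT", "ATGGC-"]))  # ATGGCT
```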

