Harvard:Biophysics 101/2007/Notebook:Michael Wang/2007-2-13

Current revision (23:21, 19 February 2007) (view source)
(Removing all content from page)
 
-
For anyone still trying to get clustalw working on a PC after reading the instructions [http://openwetware.org/wiki/Talk:Harvard:Biophysics_101/2007/02/13/install-clustalw#Windows here], the key seems to be making sure that clustalw works from the command line first.  Even if it is installed properly, any problem in the actual call gives the same error as a broken installation, because the only thing Python checks is whether or not the output file was created.
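
To tell these failure cases apart, something like the following could be run before calling the alignment from Biopython. This is only a minimal sketch, assuming a modern Python 3 (subprocess.run and shutil.which are not available in the Python 2 used for the script on this page). The function name check_alignment_call is hypothetical, and the clustalw arguments in the usage comment are an assumption that should be replaced by whatever call works when typed at the command prompt.

```python
import os
import shutil
import subprocess


def check_alignment_call(command, output_path):
    """Run `command` (a list of arguments) and return a short status string.

    A missing executable, a failing call, and a missing output file are
    reported separately instead of all looking like the same error.
    """
    executable = command[0]
    # shutil.which returns None when the program cannot be found on PATH
    # (a full path to the executable is also accepted).
    if shutil.which(executable) is None:
        return "not found: %s" % executable

    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        return "call failed with exit code %d" % result.returncode

    if not os.path.exists(output_path):
        return "call succeeded but %s was not created" % output_path
    return "ok"


# Hypothetical usage (executable name and options are assumptions):
# check_alignment_call(["clustalw", "-infile=all.fasta", "-outfile=test.aln"], "test.aln")
```

If the status is anything other than "ok", the problem is in the installation or the call itself rather than in the Python script.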
 
-
The current version of my code is not very intelligent on the analysis side.  It currently reads all the fasta files in the ./input folder of the current directory and concatenates them into a single file, all.fasta.  This file is then passed to clustalw for alignment. 
 
-
 
-
<pre>
 
-
#!/usr/bin/env python
 
-
 
-
import os
 
-
from Bio import Clustalw
 
-
 
-
#This first section of code merges all fasta files located in the input folder of curdir
 
-
#into a single file called all.fasta
 
-
input_list = list(os.listdir(os.path.join(os.curdir,'input')))
 
-
print input_list
 
-
merged_file = open(os.path.join(os.curdir, 'all.fasta'),"w")
 
-
print os.path.join(os.curdir, 'all.fasta')
 
-
for i in input_list:
 
-
        print "loading ", os.path.join(os.curdir,'input',i)
 
-
        current_file = open(os.path.join(os.curdir,'input',i),"r")
 
-
        all_lines = current_file.readlines()
 
-
        merged_file.writelines(all_lines)
 
-
        current_file.close()
 
-
        merged_file.write("\n\n")
 
-
print "done making file"
 
-
merged_file.close()
 
-
 
-
#Once the merged file has been created, it is passed into the alignment program
 
-
cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, 'all.fasta'))
 
-
cline.set_output('test.aln')
 
-
alignment = Clustalw.do_alignment(cline)
 
-
all_records = alignment.get_all_seqs()
 
-
 
-
print alignment
 
-
</pre>
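
The merge step above picks up every file in the input folder, whatever its extension. A minimal Python 3 sketch of the same step, restricted to files matching a pattern, might look like this. The function name merge_fasta and the default pattern "*.fasta" are assumptions for illustration, not part of the script above.

```python
import glob
import os


def merge_fasta(input_dir, merged_path, pattern="*.fasta"):
    """Concatenate every file in input_dir matching pattern into merged_path.

    Returns the list of merged files in sorted order.
    """
    paths = sorted(glob.glob(os.path.join(input_dir, pattern)))
    with open(merged_path, "w") as merged:
        for path in paths:
            with open(path) as handle:
                text = handle.read()
            merged.write(text)
            # Make sure the next record's '>' header starts on its own line.
            if not text.endswith("\n"):
                merged.write("\n")
    return paths


# Hypothetical usage, mirroring the folder layout used on this page:
# merge_fasta(os.path.join(os.curdir, "input"), os.path.join(os.curdir, "all.fasta"))
```

Using os.path.join for every path component keeps the same code working on Windows and on other systems.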
 
-
 
-
I have yet to write code that counts, say, how many frameshift mutations there are.  For now it just prints the raw alignment.
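
As a possible starting point for such counts, here is a minimal Python 3 sketch that works on aligned rows given as plain strings, without Biopython. The function names gap_runs and count_frameshift_candidates are hypothetical. Treating every gap run whose length is not a multiple of three as a frameshift candidate is a simplifying assumption: it ignores where the coding region starts and ends, and it cannot tell a deletion in one sequence from an insertion in the others.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' in an aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def count_frameshift_candidates(aligned_seqs):
    """Count, per sequence, the gap runs whose length is not a multiple of 3."""
    counts = {}
    for name, seq in aligned_seqs.items():
        counts[name] = sum(1 for _, length in gap_runs(seq) if length % 3 != 0)
    return counts


# Row fragments in the style of a clustalw alignment; the last entry is a
# made-up row with an in-frame (length 3) gap, included for contrast.
rows = {
    "two_base_gap": "CAGGATGCCAGGCCAAGGTGGAG--GGCGGTGGAG",
    "seven_base_gap": "ACCAT-------CCCGCGAGGGCGCC",
    "in_frame_gap": "ACG---TTT",
}
print(count_frameshift_candidates(rows))
```

A fuller version would first parse the .aln file into complete rows per sequence, so that gaps spanning a block boundary are counted once.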
 
-
 
-
Using the uploaded test files [[Media:apoemod.fasta]] and [[Media:Copy of apoe.fasta]], the following output is generated.
 
-
 
-
<pre>
 
-
loading  .\input\apoe.fasta
 
-
loading  .\input\Copy of apoe.fasta
 
-
done making file
 
-
CLUSTAL X (1.81) multiple sequence alignment
 
-
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAG--GGCGGTGGAGACAGAGCCGGAGCCC
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
 
-
                                    ***********************  ************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTAATCCTGCGCTGGGTGCAGACACTGTCTGA
 
-
gi|189350|gb|K10296.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTAATCCTGCGCTGGGTGCAGACACTGTCTGA
 
-
gi|178850|gb|K00396.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTA--CCTGCGCTGGGTGCAGACACTGTCTGA
 
-
gi|178843|gb|K06396.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTA--CCTGCGCTGGGTGCAGACACTGTCTGA
 
-
                                    *********************  ***************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
-
gi|189350|gb|K10296.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
-
gi|178850|gb|K00396.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
-
gi|178843|gb|K06396.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
-
gi|189350|gb|K10296.1|HUMAPOE3      ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
-
gi|178850|gb|K00396.1|HUMAPOE3      ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
-
gi|178843|gb|K06396.1|HUMAPOE3      ACCAT-------CCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
-
                                    ****        **************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
-
gi|189350|gb|K10296.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
-
gi|178850|gb|K00396.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
-
gi|178843|gb|K06396.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
-
gi|189350|gb|K10296.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
-
gi|178850|gb|K00396.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
-
gi|178843|gb|K06396.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
-
gi|189350|gb|K10296.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
-
gi|178850|gb|K00396.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
-
gi|178843|gb|K06396.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
-
                                    **************************************************
 
-
 
-
gi|178350|gb|K00296.1|HUMAPOE3      TTTCACGT
 
-
gi|189350|gb|K10296.1|HUMAPOE3      TTTCACGT
 
-
gi|178850|gb|K00396.1|HUMAPOE3      TTTCACGC
 
-
gi|178843|gb|K06396.1|HUMAPOE3      TTTCACGC
 
-
                                    *******
 
-
</pre>
 
-
Each of the two files contains two sequences (I made fake changes to each).
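
In the output above, the line of asterisks under each block marks the columns where all four sequences agree. A minimal Python 3 sketch of how that line can be recomputed from the rows of one block is given below. The function name conservation_line is hypothetical, and the sketch only handles the '*' mark used for nucleotide alignments.

```python
def conservation_line(rows):
    """Return '*' for columns where every row has the same non-gap character,
    and a space for every other column."""
    marks = []
    for column in zip(*rows):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# The last block of the alignment: the final column differs (T vs C).
last_block = ["TTTCACGT", "TTTCACGT", "TTTCACGC", "TTTCACGC"]
print(repr(conservation_line(last_block)))
```

Counting the spaces in the returned string gives the number of columns that differ within a block.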
 
