For anyone still trying to get clustalw working on a PC after reading the instructions [http://openwetware.org/wiki/Talk:Harvard:Biophysics_101/2007/02/13/install-clustalw#Windows here], the key seems to be making sure that clustalw works from the command line first. Even if the installation is set up properly, any problem in the actual call produces the same error as a broken installation, because the only thing Python checks is whether the output file was created.

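As a rough way to tell the two failure modes apart, the sketch below first looks for the executable on the PATH, then runs a command and checks that the expected output file exists. It is written for Python 3. The function names `find_executable` and `run_and_check`, and the candidate executable names `clustalw` and `clustalw2`, are assumptions made for illustration; none of this is part of Biopython or of the script on this page.

```python
import os
import shutil
import subprocess


def find_executable(candidates=("clustalw", "clustalw2")):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def run_and_check(command, expected_output):
    """Run command (a list of arguments) and report what went wrong, if anything.

    Returns (ok, message). The three failure cases are reported separately:
    the program could not be started, it exited with an error, or it ran
    but did not create the expected output file.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError as exc:
        return False, "could not start {}: {}".format(command[0], exc)
    if result.returncode != 0:
        return False, "exit code {}: {}".format(
            result.returncode, result.stderr.strip())
    if not os.path.exists(expected_output):
        return False, "command ran but {} was not created".format(expected_output)
    return True, "ok"
```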
The current version of my code is not very intelligent on the analysis side. It reads every file in the ./input folder of the current directory (these are expected to be fasta files) and concatenates them into a single file, all.fasta. That file is then passed to clustalw for alignment.

<pre>
#!/usr/bin/env python

import os
from Bio import Clustalw

# This first section of code merges all fasta files located in the input folder of curdir
# into a single file called all.fasta
input_dir = os.path.join(os.curdir, 'input')
merged_path = os.path.join(os.curdir, 'all.fasta')

input_list = os.listdir(input_dir)
print input_list
merged_file = open(merged_path, "w")
print merged_path
for i in input_list:
    # os.path.join supplies the path separator, so no hard-coded backslash is needed
    print "loading ", os.path.join(input_dir, i)
    current_file = open(os.path.join(input_dir, i), "r")
    all_lines = current_file.readlines()
    merged_file.writelines(all_lines)
    current_file.close()
    # blank lines keep the next file's header from running into this file's last line
    merged_file.write("\n\n")
print "done making file"
merged_file.close()

# Once the merged file has been created, it is passed into the alignment program
cline = Clustalw.MultipleAlignCL(merged_path)
cline.set_output('test.aln')
alignment = Clustalw.do_alignment(cline)
all_records = alignment.get_all_seqs()

print alignment
</pre>
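The merge step could also be pulled out into a function. The following is only a sketch in Python 3, not the code used to produce the output on this page; the name `merge_fasta` and the extension filter (`.fasta`, `.fa`) are assumptions added so that stray non-fasta files in the folder are skipped.

```python
import os


def merge_fasta(input_dir, merged_path, extensions=(".fasta", ".fa")):
    """Concatenate the fasta files in input_dir into merged_path.

    Returns the list of file names that were merged, in sorted order.
    """
    merged_abs = os.path.abspath(merged_path)
    merged = []
    with open(merged_path, "w") as out:
        for name in sorted(os.listdir(input_dir)):
            path = os.path.join(input_dir, name)
            # Skip files that do not look like fasta files.
            if not name.lower().endswith(extensions):
                continue
            # Skip the output file itself if it lives in the input folder.
            if os.path.abspath(path) == merged_abs:
                continue
            with open(path) as handle:
                text = handle.read()
            out.write(text)
            # Make sure the next record starts on its own line.
            if not text.endswith("\n"):
                out.write("\n")
            merged.append(name)
    return merged
```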

I have yet to write code that counts, say, how many frameshift mutations there are. For now the script just prints the raw alignment.
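One possible starting point for such a count, offered as a sketch rather than a finished analysis: find each run of gap characters in an aligned sequence and flag runs whose length is not a multiple of three. The function names `gap_runs` and `frameshift_candidates` are made up for this example, and the approach assumes the aligned region is coding sequence in a known reading frame. A gap in one sequence may equally reflect an insertion in the others, and gaps at the very start or end of a sequence may just mean it is shorter.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, length) for every run of '-' in an aligned sequence."""
    return [(m.start(), len(m.group()))
            for m in re.finditer(r"-+", aligned_seq)]


def frameshift_candidates(aligned_seq):
    """Return the gap runs whose length is not a multiple of three."""
    return [(start, length)
            for start, length in gap_runs(aligned_seq)
            if length % 3 != 0]


# A two-base gap shifts the reading frame; a three-base gap does not.
print(frameshift_candidates("GGAG--GGCGG"))
print(frameshift_candidates("AAA---TTT"))
```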

Using the uploaded test files [[Media:apoemod.fasta]] and [[Media:Copy of apoe.fasta]], the script generates the following output.

<pre>
loading .\input\apoe.fasta
loading .\input\Copy of apoe.fasta
done making file
CLUSTAL X (1.81) multiple sequence alignment


gi|178350|gb|K00296.1|HUMAPOE3 CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
gi|189350|gb|K10296.1|HUMAPOE3 CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
gi|178850|gb|K00396.1|HUMAPOE3 CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
gi|178843|gb|K06396.1|HUMAPOE3 CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
gi|189350|gb|K10296.1|HUMAPOE3 AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
gi|178850|gb|K00396.1|HUMAPOE3 AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
gi|178843|gb|K06396.1|HUMAPOE3 AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
gi|189350|gb|K10296.1|HUMAPOE3 CAGGATGCCAGGCCAAGGTGGAG--GGCGGTGGAGACAGAGCCGGAGCCC
gi|178850|gb|K00396.1|HUMAPOE3 CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
gi|178843|gb|K06396.1|HUMAPOE3 CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
                               ***********************   ************************

gi|178350|gb|K00296.1|HUMAPOE3 GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
gi|189350|gb|K10296.1|HUMAPOE3 GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
gi|178850|gb|K00396.1|HUMAPOE3 GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
gi|178843|gb|K06396.1|HUMAPOE3 GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 ACTGGGTCGCTTTTGGGATTAATCCTGCGCTGGGTGCAGACACTGTCTGA
gi|189350|gb|K10296.1|HUMAPOE3 ACTGGGTCGCTTTTGGGATTAATCCTGCGCTGGGTGCAGACACTGTCTGA
gi|178850|gb|K00396.1|HUMAPOE3 ACTGGGTCGCTTTTGGGATTA--CCTGCGCTGGGTGCAGACACTGTCTGA
gi|178843|gb|K06396.1|HUMAPOE3 ACTGGGTCGCTTTTGGGATTA--CCTGCGCTGGGTGCAGACACTGTCTGA
                               *********************  ***************************

gi|178350|gb|K00296.1|HUMAPOE3 GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
gi|189350|gb|K10296.1|HUMAPOE3 GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
gi|178850|gb|K00396.1|HUMAPOE3 GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
gi|178843|gb|K06396.1|HUMAPOE3 GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
gi|189350|gb|K10296.1|HUMAPOE3 CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
gi|178850|gb|K00396.1|HUMAPOE3 CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
gi|178843|gb|K06396.1|HUMAPOE3 CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
gi|189350|gb|K10296.1|HUMAPOE3 GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
gi|178850|gb|K00396.1|HUMAPOE3 GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
gi|178843|gb|K06396.1|HUMAPOE3 GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
gi|189350|gb|K10296.1|HUMAPOE3 GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
gi|178850|gb|K00396.1|HUMAPOE3 GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
gi|178843|gb|K06396.1|HUMAPOE3 GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
gi|189350|gb|K10296.1|HUMAPOE3 GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
gi|178850|gb|K00396.1|HUMAPOE3 GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
gi|178843|gb|K06396.1|HUMAPOE3 GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
gi|189350|gb|K10296.1|HUMAPOE3 AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
gi|178850|gb|K00396.1|HUMAPOE3 AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
gi|178843|gb|K06396.1|HUMAPOE3 AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
gi|189350|gb|K10296.1|HUMAPOE3 TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
gi|178850|gb|K00396.1|HUMAPOE3 TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
gi|178843|gb|K06396.1|HUMAPOE3 TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
gi|189350|gb|K10296.1|HUMAPOE3 ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
gi|178850|gb|K00396.1|HUMAPOE3 ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
gi|178843|gb|K06396.1|HUMAPOE3 ACCAT-------CCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
                               ****        **************************************

gi|178350|gb|K00296.1|HUMAPOE3 GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
gi|189350|gb|K10296.1|HUMAPOE3 GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
gi|178850|gb|K00396.1|HUMAPOE3 GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
gi|178843|gb|K06396.1|HUMAPOE3 GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
gi|189350|gb|K10296.1|HUMAPOE3 GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
gi|178850|gb|K00396.1|HUMAPOE3 GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
gi|178843|gb|K06396.1|HUMAPOE3 GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
gi|189350|gb|K10296.1|HUMAPOE3 AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
gi|178850|gb|K00396.1|HUMAPOE3 AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
gi|178843|gb|K06396.1|HUMAPOE3 AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
gi|189350|gb|K10296.1|HUMAPOE3 CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
gi|178850|gb|K00396.1|HUMAPOE3 CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
gi|178843|gb|K06396.1|HUMAPOE3 CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
gi|189350|gb|K10296.1|HUMAPOE3 GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
gi|178850|gb|K00396.1|HUMAPOE3 GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
gi|178843|gb|K06396.1|HUMAPOE3 GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
gi|189350|gb|K10296.1|HUMAPOE3 AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
gi|178850|gb|K00396.1|HUMAPOE3 AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
gi|178843|gb|K06396.1|HUMAPOE3 AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
gi|189350|gb|K10296.1|HUMAPOE3 CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
gi|178850|gb|K00396.1|HUMAPOE3 CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
gi|178843|gb|K06396.1|HUMAPOE3 CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
gi|189350|gb|K10296.1|HUMAPOE3 CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
gi|178850|gb|K00396.1|HUMAPOE3 CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
gi|178843|gb|K06396.1|HUMAPOE3 CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
gi|189350|gb|K10296.1|HUMAPOE3 CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
gi|178850|gb|K00396.1|HUMAPOE3 CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
gi|178843|gb|K06396.1|HUMAPOE3 CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
gi|189350|gb|K10296.1|HUMAPOE3 CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
gi|178850|gb|K00396.1|HUMAPOE3 CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
gi|178843|gb|K06396.1|HUMAPOE3 CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
                               **************************************************

gi|178350|gb|K00296.1|HUMAPOE3 TTTCACGT
gi|189350|gb|K10296.1|HUMAPOE3 TTTCACGT
gi|178850|gb|K00396.1|HUMAPOE3 TTTCACGC
gi|178843|gb|K06396.1|HUMAPOE3 TTTCACGC
                               *******
</pre>

Each of the two files contains two sequences. I introduced made-up changes into each one so that the alignment would have differences to show.

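In the output, an asterisk under a column marks a position where all sequences agree, so the unmarked columns are where the made-up changes show up. A minimal sketch for listing those columns directly from the aligned strings follows; the function name `variable_columns` is hypothetical, and the input is assumed to be a list of equal-length aligned sequences (for example, one block of the alignment).

```python
def variable_columns(aligned_seqs):
    """Return the 0-based indices of columns where the sequences differ."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    return [index
            for index, column in enumerate(zip(*aligned_seqs))
            if len(set(column)) > 1]


# The last block of the alignment: only the final base differs.
last_block = ["TTTCACGT", "TTTCACGT", "TTTCACGC", "TTTCACGC"]
print(variable_columns(last_block))
```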