User:R. Eric Collins/Goldschmidt

From OpenWetWare

Genomics Demonstration

Goal: Identify the Mystery Sequence and determine which metabolic pathway it belongs to

  • Go to the JGI website. The Joint Genome Institute (JGI) sequences many environmental microbes, especially those relevant to energy, climate, and geobiology.
  • Click on Find Genes --> BLAST
  • Enter the mystery sequence into the text box
  • Change the Program to 'blastx'. This translates the DNA sequence in all six reading frames (three on each strand) and searches the resulting amino acid sequences for matches against a database of protein sequences from all available complete microbial genomes
  • Click 'Run Blast'
  • When the program has finished running, scroll down to the alignments showing the best matches.
  • How good is the hit? The Expect value (E-value) is the number of hits with an equal or better score that you would expect to find by chance in a database of this size, so the smaller the E-value, the less likely it is that the hit is due to chance. For amino acid sequences, anything smaller than 1e-5 is considered a pretty good match. For nucleotide sequences, a more conservative 1e-10 may be used. Identifying homologs (genes that share a common ancestor) or, more specifically, orthologs (homologs that diverged through speciation) may require more sophisticated methods of calling hits: paralogs (homologs that arose by duplication within a genome) and convergent evolution can complicate matters, and a match to only a shared 'domain' (a 'self-contained' building block out of which proteins are formed) can produce a spurious hit.
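
To make the 'blastx' step and the E-value rule of thumb more concrete, here is a minimal Python sketch. It is not how BLAST itself is implemented; it only illustrates what "translating in all six reading frames" means and how the cutoffs mentioned above might be applied. The function names, the toy sequence, and the use of the standard genetic code are assumptions made for illustration, and the real Mystery Sequence is not reproduced here.

```python
# Illustrative sketch only: six-frame translation as done conceptually by blastx,
# plus the rule-of-thumb E-value cutoffs from the text. All names are hypothetical.

BASES = "TCAG"
# Standard genetic code (assumed), amino acids listed in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Translate three frames on the forward strand and three on the reverse."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


# Cutoffs quoted in the text above; they are rules of thumb, not hard limits.
EVALUE_CUTOFFS = {"protein": 1e-5, "nucleotide": 1e-10}


def is_good_hit(evalue: float, query_type: str = "protein") -> bool:
    """True if the E-value is smaller than the rule-of-thumb cutoff."""
    return evalue < EVALUE_CUTOFFS[query_type]


if __name__ == "__main__":
    toy_sequence = "ATGGCCTAA"  # made-up example, not the Mystery Sequence
    for frame, protein in six_frame_translations(toy_sequence).items():
        print(frame, protein)
    print(is_good_hit(1e-20, "protein"))
```

Each of the six translated strings is what 'blastx' compares against the protein database; '*' marks a stop codon. A hit that passes the cutoff is still only a candidate, for the reasons about paralogs and shared domains given above.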