User:R. Eric Collins/GenomicsTutorial/Genomics/Selection

The purpose of this exercise is to become familiar with the following:

Genetic similarity/Evolutionary relatedness
- Homology = shared evolutionary ancestry (usually inferred from sequence similarity)
- Orthology = homologs separated by a speciation event
- Paralogy = homologs separated by a gene duplication event
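These definitions can be made operational on a gene tree whose internal nodes are labeled as speciation or duplication events: two genes are orthologs if their most recent common ancestor is a speciation node, and paralogs if it is a duplication node. The sketch below assumes such a labeled tree is already available (in practice the labels come from reconciling a gene tree with a species tree); the `Node` class, the helper functions and the gene names are hypothetical and only illustrate the definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity comparison/hashing and avoids recursing through parent links
class Node:
    """A gene-tree node; internal nodes record the event that split their lineages."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes
    children: list[Node] = field(default_factory=list, repr=False)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Optional[Node]) -> list[Node]:
    """Nodes from `node` up to the root, starting with `node` itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a: Node, b: Node) -> Node:
    """Most recent node shared by the root paths of `a` and `b`."""
    seen = set(path_to_root(a))
    for node in path_to_root(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def relationship(a: Node, b: Node) -> str:
    """Orthologs split at a speciation node; paralogs split at a duplication node."""
    event = last_common_ancestor(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError(f"ancestor event is not labeled: {event!r}")


# Hypothetical tree: one gene duplicates (copies A and B), then each copy is split by speciation into X and Y.
root = Node("duplication_1", event="duplication")
copy_a = root.add(Node("speciation_A", event="speciation"))
copy_b = root.add(Node("speciation_B", event="speciation"))
a_x = copy_a.add(Node("geneA_speciesX"))
a_y = copy_a.add(Node("geneA_speciesY"))
b_x = copy_b.add(Node("geneB_speciesX"))
b_y = copy_b.add(Node("geneB_speciesY"))
```

Here `relationship(a_x, a_y)` gives "orthologs" (their last common ancestor is a speciation node), while `relationship(a_x, b_x)` gives "paralogs" even though both genes sit in the same species.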

Selection (at gene level)
- Neutral = changes have no effect on fitness
- Negative (purifying) = selection against changes that lower fitness
- Positive (Darwinian) = selection for changes that raise fitness
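At the gene or codon level these regimes are commonly summarized by ω = dN/dS, the rate of nonsynonymous substitutions per nonsynonymous site divided by the rate of synonymous substitutions per synonymous site: ω ≈ 1 suggests neutral evolution, ω < 1 negative selection and ω > 1 positive selection. The sketch below only maps a ratio onto those labels; the `tol` band is a hypothetical illustration parameter, since methods such as SLAC, FEL and REL decide using statistical tests rather than a fixed cutoff.

```python
def omega(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(dn: float, ds: float, tol: float = 0.0) -> str:
    """Label a dN/dS ratio as positive, negative or neutral.

    `tol` is a hypothetical tolerance band around 1, used here for illustration only.
    """
    w = omega(dn, ds)
    if w > 1 + tol:
        return "positive"
    if w < 1 - tol:
        return "negative"
    return "neutral"
```

For example, `selection_regime(0.1, 0.5)` returns "negative" because ω = 0.2, typical of a conserved enzyme whose amino-acid changes are mostly removed.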

Go to KEGG Pathways
Enter your favorite biomolecule (e.g. sulfate)
Find a pathway of interest involving your biomolecule (e.g. Sulfur Metabolism) and click the map image
This shows a map of metabolic reactions and the Enzyme Commission (EC) numbers of the enzymes that mediate them. Click "Reference Pathway", change it to "Reference Pathway (KO)" and click "Go". This shows which parts of the pathway are represented by genes from existing complete genome sequences.
Click "Pathway Entry" at the top. This goes to a page with summary information about the pathway. From here you can obtain a list of all the organisms that have entries in the pathway ("All Organisms" button) and a list of which genes are in which organism ("Ortholog Table" button).
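The pathway-to-KO lookup in the KEGG steps above can also be scripted. This is a minimal sketch assuming KEGG's public REST interface at `https://rest.kegg.jp`, whose `link` operation returns one tab-separated pair per line (a pathway ID and a `ko:` identifier); the function names are hypothetical, and `map00920` (the reference Sulfur metabolism map) is used only as an example ID.

```python
import urllib.request

KEGG_REST = "https://rest.kegg.jp"  # assumed base URL of KEGG's REST interface


def parse_link_output(text: str) -> list[tuple[str, str]]:
    """Parse tab-separated KEGG `link` output into (source, target) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, target = line.split("\t")[:2]
        pairs.append((source, target))
    return pairs


def pathway_orthologs(pathway_id: str) -> list[str]:
    """Return KO numbers (e.g. "K00394") linked to a reference pathway map (needs network access)."""
    url = f"{KEGG_REST}/link/ko/{pathway_id}"
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    # Strip the "ko:" database prefix so the IDs match what IMG's KO search expects.
    return [target.split(":", 1)[-1] for _, target in parse_link_output(text)]
```

Calling `pathway_orthologs("map00920")` should list the KO numbers in the pathway, from which one can be picked for the IMG search below.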

Select a gene of interest (under "Orthology") by clicking its KEGG Orthology (KO) link (e.g. K00394), or go back to the map to select an enzyme mediating a reaction of interest
Note the KEGG Orthology number of your gene of interest
Go to IMG
Click "Find Functions", enter your KO number and select "KEGG Orthology ID" from the dropdown list
If you want to limit your search to certain taxonomic groups (e.g. Deltaproteobacteria), select them below (use Tree view to select large groups). Otherwise all Complete and Draft genomes from all three domains will be searched
Click the number under "Gene Count" if you get results
The gene list can be further filtered, e.g. by searching for a specific genus (e.g. Desulfovibrio)
Click the Select box next to the genes you want to keep and click "Add Selected Genes to Cart"
From the Gene Cart a number of analyses can be performed, including genomic neighborhood alignments, export of protein or nucleic acid sequences, and sequence alignments.

Select "DNA" under "Sequence Alignment" and click "Do Alignment"
Click "Launch Jalview"
Remove any misaligned sequences and correct any obvious mistakes in the alignment, e.g. gaps whose lengths are not multiples of 3 nucleotides (1 codon), since these shift the reading frame
Select "File --> Output to textbox --> FASTA". Copy and paste into a text editor (e.g. Notepad, TextWrangler) and save to disk under a sensible name
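The frame check from the Jalview step can also be run on the exported FASTA alignment. This sketch assumes "-" is the gap character and that the alignment has already been read into a dict mapping sequence name to aligned sequence; `frameshift_gaps` is a hypothetical helper. It only tests gap length, not whether a gap starts in the middle of a codon, which would be a separate, stricter check.

```python
import re


def frameshift_gaps(alignment: dict[str, str]) -> list[tuple[str, int, int]]:
    """Report gap runs whose length is not a multiple of 3.

    Returns (sequence name, 0-based start column, gap length) for each run
    of '-' that would shift the reading frame of a codon alignment.
    """
    problems = []
    for name, seq in alignment.items():
        for match in re.finditer(r"-+", seq):
            length = match.end() - match.start()
            if length % 3 != 0:
                problems.append((name, match.start(), length))
    return problems
```

An empty result means every gap in the alignment spans whole codons.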

Open the file in the text editor. Remove the stop codon at the end of each sequence if present (TAG, TAA, or TGA)
Rename sequences to something sensible, save, close file
Go to http://www.datamonkey.org/dataupload.php
Choose your file, select "Codon" data, "Universal" code and click "Upload"
If the upload summary looks correct, click "Proceed to Analysis Menu"
Precomputed results for all analyses
"Execute" the automatic model selection tool
Run the SLAC, FEL, and REL methods using the selected model
Explore the results using Integrative Selection Analysis
If a 3D structure is known, sites under selection can be mapped onto it, e.g. the crystal structure of adenylylsulfate reductase from Desulfovibrio gigas from NCBI Structure or the Protein Data Bank
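The text-editor cleanup before the Datamonkey upload (removing terminal stop codons and renaming sequences) can also be scripted. A minimal sketch, assuming plain FASTA with unique headers: because the sequences are aligned, it drops the final codon column only when every sequence ends in a stop codon (or a gap-only codon), so all sequences keep the same length. `read_fasta`, `drop_terminal_stop`, `rename` and `write_fasta` are hypothetical helpers, not part of IMG, Jalview or Datamonkey.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def drop_terminal_stop(records: dict[str, str]) -> dict[str, str]:
    """Remove the last codon column if every sequence ends in a stop codon or '---'."""
    def ends_in_stop_or_gap(seq: str) -> bool:
        last = seq[-3:].upper()
        return last in STOP_CODONS or last == "---"

    if records and all(len(s) >= 3 and ends_in_stop_or_gap(s) for s in records.values()):
        return {name: seq[:-3] for name, seq in records.items()}
    return dict(records)  # leave the alignment untouched otherwise


def rename(records: dict[str, str], new_names: dict[str, str]) -> dict[str, str]:
    """Replace headers using a mapping; headers not in the mapping are kept."""
    return {new_names.get(name, name): seq for name, seq in records.items()}


def write_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

A typical call chain is `write_fasta(rename(drop_terminal_stop(read_fasta(text)), {"old header": "new_name"}))`, with the result saved to disk before uploading.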
