KP Ramirez Week 8: Difference between revisions

 
(41 intermediate revisions by the same user not shown)
== Week 8 ==


* Journal Club presentations in class.
==Group==
[[User:Kevin A Paiz-Ramirez|KP Ramirez]] & [[User: Janelle N. Ruiz|Janelle Ruiz]]
 
'''Former Question''': Are there differences in HIV-1 diversity or divergence between participants with high CD4 T cell variability within the study (between visits) and participants with linear ‘progression’ (defined as CD4 T cell counts that fall rapidly, or linearly, over time; slope ~ -1)?
 
'''Former Prediction''': We predict that participants with high variability in T cell count between visits will show a lower HIV-1 diversity and divergence than participants with linear progression.
This is predicted under two assumptions:
*(1) High diversity and divergence of HIV-1 variants indicate a more rapidly progressing virus (and thus a steadily falling CD4 T cell count).
*(2) High variability in T cell count indicates that a participant’s immune system was able to manage the virus better than that of a participant with a steadily falling CD4 count. If we do see high diversity in participants with high variability in T cell count between visits, we predict that the mutations will be predominantly synonymous rather than non-synonymous (which we would expect to see in linear progressors).
 
'''Subjects Chosen''':
*Linear Progressors: (slope: -1) Subject: 4, 10
*High Variability between visits: Subject 12, 8
*Low Variability between visits: Subject 5
 


=== Working with Protein Sequences In-class Activity ===


* This week we will begin to learn how to analyze protein structures.  For today, we will be using the ''Bioinformatics for Dummies'' book extensively, so be sure to bring it to class. We will be using some bioinformatics tools to analyze the structure of the gp120 envelope protein.
* Chapter 2: Retrieving Protein Sequences/Retrieving a list of Related protein sequences (pp. 42-51 in second edition).  The example worked through in the book uses the sequence of an enzyme called dUTPase.  Follow the book example yourself and then work through the example again, this time using the HIV gp120 envelope protein instead.
*#Completed
* Chapter 4: Reading a SWISS-PROT entry (pp. 110-123 in the second edition).  The example worked through in the book is the epidermal growth factor receptor.  Work through this example and then do it again with the HIV gp120 envelope protein instead.
===Chapter 2===
* Chapter 5: ORFing your DNA sequence (pp. 146-147 in second edition).  In the previous section of the course, we were working with DNA sequences from the HIV gp120 envelope protein.  Take one of your DNA sequences and follow the instructions to find the open reading frames in the sequence.  Since you were working with just a portion of the entire envelope protein, you may get some strange results.  Compare your results with the SWISS-PROT entry you found for the protein above to decipher what the output means.  Besides the NCBI Open Reading Frame Finder described in the book, ExPASy also has a translation tool you can use, found [http://www.expasy.org/tools/dna.html here].
Retrieving Protein Sequences/Retrieving a list of Related protein sequences (pp. 42-51 in second edition).  The example worked through in the book uses the sequence of an enzyme called dUTPase.  Follow the book example yourself and then work through the example again, this time using the HIV gp120 envelope protein instead.
* Chapter 6: Working with a single protein sequence (pp. 159-195 in second edition).  Work through the following examples in this chapter using the entire HIV gp120 envelope protein sequence that you obtained from SWISS-PROT.  We will then compare the results of these analyses with the actual structure of the gp120 protein obtained by X-ray crystallography.
*#Human immunodeficiency virus type 1 (isolate HXB2 group M subtype B) (HIV-1)
*#P04578 Accession
*#Presented 4 pages of results
*#'''>sp|P04578|ENV_HV1H2 Envelope glycoprotein gp160 OS=Human immunodeficiency virus type 1 (isolate HXB2 group M subtype B) GN=env PE=1 SV=2
MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPVWKEATTTLFCASDAKA
YDTEVHNVWATHACVPTDPNPQEVVLVNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCV
KLTPLCVSLKCTDLKNDTNTNSSSGRMIMEKGEIKNCSFNISTSIRGKVQKEYAFFYKLD
IIPIDNDTTSYKLTSCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGTGPCT
NVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIRSVNFTDNAKTIIVQLNTSVEINCTRPN
NNTRKRIRIQRGPGRAFVTIGKIGNMRQAHCNISRAKWNNTLKQIASKLREQFGNNKTII
FKQSSGGDPEIVTHSFNCGGEFFYCNSTQLFNSTWFNSTWSTEGSNNTEGSDTITLPCRI
KQIINMWQKVGKAMYAPPISGQIRCSSNITGLLLTRDGGNSNNESEIFRPGGGDMRDNWR
SELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGIGALFLGFLGAAGSTMGAASMTLTVQ
ARQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVERYLKDQQLLGIWGCSG
KLICTTAVPWNASWSNKSLEQIWNHTTWMEWDREINNYTSLIHSLIEESQNQQEKNEQEL
LELDKWASLWNWFNITNWLWYIKLFIMIVGGLVGLRIVFAVLSIVNRVRQGYSPLSFQTH
LPTPRGPDRPEGIEEEGGERDRDRSIRLVNGSLALIWDDLRSLCLFSYHRLRDLLLIVTR
IVELLGRRGWEALKYWWNLLQYWSQELKNSAVSLLNATAIAVAEGTDRVIEVVQGACRAI
RHIPRRIRQGLERILL'''
 
===Chapter 4===
Reading a SWISS-PROT entry (pp. 110-123 in the second edition).  The example worked through in the book is the epidermal growth factor receptor.  Work through this example and then do it again with the HIV gp120 envelope protein instead.
*#This differed considerably from the ''Bioinformatics for Dummies'' book. The entry-name format has been retained, but the entry now spells out the organism, as in Homo sapiens (Human), rather than relying only on a code such as EGFR_HUMAN as in the book.
*#Primary (citable) accession number: P04578
*#Secondary accession number(s): O09779
*#These are now located at the bottom of the page
*#Integrated into UniProtKB/Swiss-Prot:August 13, 1987
*#Last sequence update:July 15, 1999
*#Last modified:March 2, 2010
*#Protein name: Recommended name:Envelope glycoprotein gp160 Alternative name(s):Env polyprotein
*#Gene names-Name:env
*#Virus host: Homo sapiens (Human); taxonomic identifier: 11706 [NCBI]
*#Taxonomic lineage-Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group
'''Comments'''
*#The comments section has been completely reworked compared with the Dummies book. The book presented a simple table-like format; the newer version uses a paragraph format and now has only Function, Subunit structure, Subcellular location, Domain, Post-translational modifications, and Miscellaneous.
'''Cross References'''
*#The cross-references sections are similar to those in the Dummies book; however, they have been further separated into:
*#*Sequence databases
*#*3D structure databases
*#*Genome annotation databases
*#*Enzyme and pathway databases
*#*Family and domain databases
'''The Features'''
*#The features section has been changed to sequence annotation:
*#*Molecule processing
*#*Regions
*#*Sites
*#*Amino acid modifications
*#*Experimental info
*#*Secondary structure
 
[[Image:Sequece itself.jpg]]
 
===Chapter 5===
ORFing your DNA sequence (pp. 146-147 in second edition).  In the previous section of the course, we were working with DNA sequences from the HIV gp120 envelope protein.  Take one of your DNA sequences and follow the instructions to find the open reading frames in the sequence.  Since you were working with just a portion of the entire envelope protein, you may get some strange results.  Compare your results with the SWISS-PROT entry you found for the protein above to decipher what the output means.  Besides the NCBI Open Reading Frame Finder described in the book, ExPASy also has a translation tool you can use, found [http://www.expasy.org/tools/dna.html here].
 
*Chose Subject 4, a Linear progressor that we examined during our former project.
 
[[Image:Subject10.jpg]]
 
*This was then compared against the SWISS-PROT entry. The ORF sequence at first appeared to be identical; however, there were a couple of differences between the two sequences.
 
===Chapter 6===
Working with a single protein sequence (pp. 159-195 in second edition).  Work through the following examples in this chapter using the entire HIV gp120 envelope protein sequence that you obtained from SWISS-PROT.  We will then compare the results of these analyses with the actual structure of the gp120 protein obtained by X-ray crystallography.
** ProtParam
** Looking for transmembrane segments
*Began by using the ProtParam tool for P04578 (ENV_HV1H2).
*The sequence ENV_HV1H2 consists of 856 amino acids.
 
[[Image:Dpic.jpg]]
[[Image:DPic2.jpg]]
[[Image:DPic4.jpg]]
[[Image:Dpic5.jpg]]
 
*Used the ExPASy tools page to carry out a primary structure analysis.
*Entered the accession number P04578 from Swiss-Prot into ProtParam.
*ProtParam generated the parameters for the entire gp120 sequence that we selected.
 
*Then pasted the gp120 sequence into the ExPASy website again in order to cut the protein.
'''Looking for transmembrane segments'''

We entered the accession number P04578 again on the ExPASy ProtScale site and conducted a full-range analysis.
The image was retrieved in GIF format.
 
[[Image:Dpicpic.jpg]]
 
 
'''Interpreting ProtScale results'''
 
 
A piece of paper was used to help us locate the strongest peaks on the graph.
We determined that there were four important transmembrane regions.
 
==TMHMM==
 
[[Image:THMM.jpg]]
I generated the TMHMM results by using a FASTA sequence of the gp120 protein.
[[Image:Coils.JPG]]
 
 


=== HIV Structure Research Project ===
===Looking for PROSITE patterns===
*Used the accession number P04578 to specify the protein to be scanned and then started the scan.
[[Image:Chapter6PT1.JPG]]


Today you will begin your HIV gp120 Structure Research Project.
==InterProScan==
*For this section I used the FASTA sequence outlined in the Dummies book.
[[Image:Picturepic.jpg]]


* For this project, you can choose to work with the same sequences you used for the [[BIOL398-01/S10:HIV Evolution | HIV Evolution Project]], or you may choose different sequences.  You will reframe your question from the HIV Evolution Project to make it a structure→function question.  Instead of looking at how the evolution of variation of the viral DNA sequence affects the different patient groups, you will look at how variations in the viral sequence affect the structure and, therefore, function of the virus.  To answer your question, you will need to do the following:
*This involved pasting the gp120 sequence given in the book and deselecting some of the larger databases to assist the search.
# Convert your DNA sequences into protein sequences. 
#* How will you do this? 
#* How will you know that it was done correctly?
# Perform a multiple sequence alignment on the protein sequences. 
#* Are there more or fewer differences between the sequences when you look at the DNA sequences versus the protein sequences? 
#* How do you account for this?
# Which of the procedures from Chapter 6 that you ran on the entire gp120 sequence are applicable to the V3 fragment you are working with now? 
#* How are they applicable?
# Chapter 11 contains procedures to use for working with protein 3D structures.  Find the section on "Predicting the Secondary Structure of a Protein Sequence" and perform this on both the entire gp120 sequence and on the V3 fragment that we are now working with.  You will compare the predictions with the actual structures.
# Download the structure files for the papers we read in journal club from the [http://www.ncbi.nlm.nih.gov/sites/entrez?db=Structure&itool=toolbar NCBI Structure Database].  These files can be opened with the [http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml Cn3D software site] that is installed on the computers in the lab (this software is free, so you can download it and use it at home, too.)  Familiarize yourself with the software features (rendering and coloring) with both the gp120 peptide and ternary complex structures.  (The ''Dummies'' book has some information on this program as well).  Answer the following:
#* Find the N-terminus and C-terminus of each (poly)peptide structure.
#* Locate all the secondary structure elements.  Do these match the predictions you made above?
#* Locate the V3 region and figure out which sequences from your alignment are present in the structures and which sequences are absent.
# Once you have oriented yourself, analyze whether the amino acid changes that you see in the multiple sequence alignment would affect the 3D structure and explain why you think this.
# The journal club papers we read are quite old already for a fast-moving field.  Using the [http://0-apps.isiknowledge.com.linus.lmu.edu/ Web of Science] (or [http://www.ncbi.nlm.nih.gov/pubmed/ PubMed] or [http://www.ncbi.nlm.nih.gov/sites/entrez?db=Structure&itool=toolbar Structure]) databases, find at least one more recent publication that has a structure of gp120 (V3) in it and download the structure file to view.  What additional information has been learned from this new paper?
# Your presentation will be formatted similarly to the previous [[BIOL398-01/S10:HIV Evolution | HIV Evolution Project]].  In this case, you will want to work on creating structure figures that illustrate what result you are trying to show.


'''Finding Domains With the CD Server'''
*For the CD (conserved domain) search, I used the NCBI Conserved Domains search site and pasted in the FASTA sequence that I had downloaded. The results were as follows.
[[Image:Donedone.jpg]]


*This was interpreted using the ''Bioinformatics for Dummies'' book.
{{Kevin A Paiz-Ramirez}}

Latest revision as of 22:39, 14 March 2010

Week 8

Group

KP Ramirez & Janelle Ruiz

Former Question: Are there differences in HIV-1 diversity or divergence between participants with high CD4 T cell variability within the study (between visits) and participants with linear ‘progression’ (defined as CD4 T cell counts that fall rapidly, or linearly, over time; slope ~ -1)?

Former Prediction: We predict that participants with high variability in T cell count between visits will show a lower HIV-1 diversity and divergence than participants with linear progression. This is predicted under two assumptions:

  • (1) High diversity and divergence of HIV-1 variants indicate a more rapidly progressing virus (and thus a steadily falling CD4 T cell count).
  • (2) High variability in T cell count indicates that a participant’s immune system was able to manage the virus better than that of a participant with a steadily falling CD4 count. If we do see high diversity in participants with high variability in T cell count between visits, we predict that the mutations will be predominantly synonymous rather than non-synonymous (which we would expect to see in linear progressors).

Subjects Chosen:

  • Linear Progressors: (slope: -1) Subject: 4, 10
  • High Variability between visits: Subject 12, 8
  • Low Variability between visits: Subject 5


Working with Protein Sequences In-class Activity

  • This week we will begin to learn how to analyze protein structures. For today, we will be using the Bioinformatics for Dummies book extensively, so be sure to bring it to class. We will be using some bioinformatics tools to analyze the structure of the gp120 envelope protein.
    1. Completed

Chapter 2

Retrieving Protein Sequences/Retrieving a list of Related protein sequences (pp. 42-51 in second edition). The example worked through in the book uses the sequence of an enzyme called dUTPase. Follow the book example yourself and then work through the example again, this time using the HIV gp120 envelope protein instead.

    1. Human immunodeficiency virus type 1 (isolate HXB2 group M subtype B) (HIV-1)
    2. P04578 Accession
    3. Presented 4 pages of results
    4. >sp|P04578|ENV_HV1H2 Envelope glycoprotein gp160 OS=Human immunodeficiency virus type 1 (isolate HXB2 group M subtype B) GN=env PE=1 SV=2

MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPVWKEATTTLFCASDAKA YDTEVHNVWATHACVPTDPNPQEVVLVNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCV KLTPLCVSLKCTDLKNDTNTNSSSGRMIMEKGEIKNCSFNISTSIRGKVQKEYAFFYKLD IIPIDNDTTSYKLTSCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGTGPCT NVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIRSVNFTDNAKTIIVQLNTSVEINCTRPN NNTRKRIRIQRGPGRAFVTIGKIGNMRQAHCNISRAKWNNTLKQIASKLREQFGNNKTII FKQSSGGDPEIVTHSFNCGGEFFYCNSTQLFNSTWFNSTWSTEGSNNTEGSDTITLPCRI KQIINMWQKVGKAMYAPPISGQIRCSSNITGLLLTRDGGNSNNESEIFRPGGGDMRDNWR SELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGIGALFLGFLGAAGSTMGAASMTLTVQ ARQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVERYLKDQQLLGIWGCSG KLICTTAVPWNASWSNKSLEQIWNHTTWMEWDREINNYTSLIHSLIEESQNQQEKNEQEL LELDKWASLWNWFNITNWLWYIKLFIMIVGGLVGLRIVFAVLSIVNRVRQGYSPLSFQTH LPTPRGPDRPEGIEEEGGERDRDRSIRLVNGSLALIWDDLRSLCLFSYHRLRDLLLIVTR IVELLGRRGWEALKYWWNLLQYWSQELKNSAVSLLNATAIAVAEGTDRVIEVVQGACRAI RHIPRRIRQGLERILL
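
As a quick check on the retrieved record, the FASTA text can also be read programmatically. The sketch below is only an illustration: the function names are hypothetical, and the embedded record is deliberately truncated to the first two sequence lines of the entry above.

```python
def parse_uniprot_header(header: str) -> dict:
    """Split a UniProt-style FASTA header (>db|accession|entry_name description)."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header line starts with '>'")
    identifier, _, description = header[1:].partition(" ")
    database, accession, entry_name = identifier.split("|")
    return {
        "database": database,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


def read_single_fasta(text: str) -> tuple:
    """Return (header, sequence) for a FASTA string holding one record."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = lines[0]
    # Join the sequence lines and drop any spaces left over from copy/paste.
    sequence = "".join(lines[1:]).replace(" ", "")
    return header, sequence


# Truncated example record: only the first 120 residues of P04578 are included.
RECORD = """>sp|P04578|ENV_HV1H2 Envelope glycoprotein gp160 OS=Human immunodeficiency virus type 1 (isolate HXB2 group M subtype B) GN=env PE=1 SV=2
MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPVWKEATTTLFCASDAKA
YDTEVHNVWATHACVPTDPNPQEVVLVNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCV
"""

header, sequence = read_single_fasta(RECORD)
fields = parse_uniprot_header(header)
print(fields["accession"], fields["entry_name"], len(sequence))
```

Running the same two functions on the full record should report the complete length of the entry rather than the 120 residues used here.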

Chapter 4

Reading a SWISS-PROT entry (pp. 110-123 in the second edition). The example worked through in the book is the epidermal growth factor receptor. Work through this example and then do it again with the HIV gp120 envelope protein instead.

    1. This differed considerably from the Bioinformatics for Dummies book. The entry-name format has been retained, but the entry now spells out the organism, as in Homo sapiens (Human), rather than relying only on a code such as EGFR_HUMAN as in the book.
    2. Primary (citable) accession number: P04578
    3. Secondary accession number(s): O09779
    4. These are now located at the bottom of the page
    5. Integrated into UniProtKB/Swiss-Prot:August 13, 1987
    6. Last sequence update:July 15, 1999
    7. Last modified:March 2, 2010
    8. Protein name: Recommended name:Envelope glycoprotein gp160 Alternative name(s):Env polyprotein
    9. Gene names-Name:env
    10. Virus host: Homo sapiens (Human); taxonomic identifier: 11706 [NCBI]
    11. Taxonomic lineage-Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group

Comments

    1. The comments section has been completely reworked compared with the Dummies book. The book presented a simple table-like format; the newer version uses a paragraph format and now has only Function, Subunit structure, Subcellular location, Domain, Post-translational modifications, and Miscellaneous.

Cross References

    1. The cross-references sections are similar to those in the Dummies book; however, they have been further separated into:
      • Sequence databases
      • 3D structure databases
      • Genome annotation databases
      • Enzyme and pathway databases
      • Family and domain databases

The Features

    1. The features section has been changed to sequence annotation:
      • Molecule processing
      • Regions
      • Sites
      • Amino acid modifications
      • Experimental info
      • Secondary structure

Chapter 5

ORFing your DNA sequence (pp. 146-147 in second edition). In the previous section of the course, we were working with DNA sequences from the HIV gp120 envelope protein. Take one of your DNA sequences and follow the instructions to find the open reading frames in the sequence. Since you were working with just a portion of the entire envelope protein, you may get some strange results. Compare your results with the SWISS-PROT entry you found for the protein above to decipher what the output means. Besides the NCBI Open Reading Frame Finder described in the book, ExPASy also has a translation tool you can use, found here.

  • Chose Subject 4, a Linear progressor that we examined during our former project.

  • This was then compared against the SWISS-PROT entry. The ORF sequence at first appeared to be identical; however, there were a couple of differences between the two sequences.
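
The translate-and-compare step can be sketched in a few lines of Python. This is not the NCBI ORF Finder or the ExPASy tool, only a minimal illustration using the standard genetic code. The DNA fragments below are hypothetical examples chosen to encode the first 11 residues of the entry (one of them with a single substitution); they are not the Subject 4 sequence.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_ungapped_match(fragment: str, reference: str) -> tuple:
    """Slide the fragment along the reference without gaps.

    Returns (offset, differences), where each difference is
    (1-based reference position, reference residue, fragment residue).
    """
    best_offset, best_score = 0, -1
    for offset in range(len(reference) - len(fragment) + 1):
        window = reference[offset:offset + len(fragment)]
        score = sum(a == b for a, b in zip(fragment, window))
        if score > best_score:
            best_offset, best_score = offset, score
    window = reference[best_offset:best_offset + len(fragment)]
    differences = [
        (best_offset + i + 1, ref, frag)
        for i, (frag, ref) in enumerate(zip(fragment, window))
        if frag != ref
    ]
    return best_offset, differences


# Hypothetical fragments (assumptions for illustration only).
example_dna = "ATGAGAGTGAAGGAGAAATATCAGCACTTGTGG"
variant_dna = "ATGAGAGTGAGGGAGAAATATCAGCACTTGTGG"
reference_start = "MRVKEKYQHLWRWGWRWGTM"  # first 20 residues of P04578

for frame in range(3):
    print(frame, translate(example_dna, frame))
print(best_ungapped_match(translate(variant_dna), reference_start))
```

An ungapped comparison is enough for a short fragment with a few substitutions; sequences with insertions or deletions would need a real alignment tool.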

Chapter 6

Working with a single protein sequence (pp. 159-195 in second edition). Work through the following examples in this chapter using the entire HIV gp120 envelope protein sequence that you obtained from SWISS-PROT. We will then compare the results of these analyses with the actual structure of the gp120 protein obtained by X-ray crystallography.

    • ProtParam
  • Began by using the ProtParam tool for P04578 (ENV_HV1H2).
  • The sequence ENV_HV1H2 consists of 856 amino acids.

  • Used the ExPASy tools page to carry out a primary structure analysis.
  • Entered the accession number P04578 from Swiss-Prot into ProtParam.
  • ProtParam generated the parameters for the entire gp120 sequence that we selected.
  • Then pasted the gp120 sequence into the ExPASy website again in order to cut the protein.
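
Two of the simpler figures that ProtParam reports, the amino acid composition and the counts of charged residues, can be reproduced with a short script. This is only a sketch with hypothetical function names; it does not attempt ProtParam's other parameters (molecular weight, pI, and so on), and the example input is just the first 60 residues of the entry.

```python
from collections import Counter


def composition(sequence: str) -> dict:
    """Map each residue to (count, percentage of the sequence)."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


def charged_residue_counts(sequence: str) -> tuple:
    """Return (negatively charged Asp+Glu, positively charged Arg+Lys)."""
    counts = Counter(sequence)
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


# First 60 residues of P04578, used here only as example input.
first_60 = "MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPVWKEATTTLFCASDAKA"

for residue, (count, percent) in composition(first_60).items():
    print(f"{residue}\t{count}\t{percent:.1f}%")
print("Asp+Glu, Arg+Lys:", charged_residue_counts(first_60))
```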

Looking for transmembrane segments

We entered the accession number P04578 again on the ExPASy ProtScale site and conducted a full-range analysis. The image was retrieved in GIF format.


Interpreting ProtScale results


A piece of paper was used to help us locate the strongest peaks on the graph. We determined that there were four important transmembrane regions.
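
The peak-finding done by eye above can be approximated with a sliding-window hydropathy average. The notes do not record which scale or window size was used, so the sketch below assumes the Kyte-Doolittle scale with a 19-residue window and the commonly quoted 1.6 cutoff for candidate membrane-spanning segments; all three are assumptions, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Return (1-based centre position, mean hydropathy) for each window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


def peaks_above(profile: list, threshold: float = 1.6) -> list:
    """Keep the window centres whose mean score reaches the threshold."""
    return [(pos, round(score, 2)) for pos, score in profile if score >= threshold]


# A hydrophobic stretch copied from the entry above, used as example input.
fragment = "WYIKLFIMIVGGLVGLRIVFAVLSIVNRVRQGYS"
print(peaks_above(hydropathy_profile(fragment)))
```

Positions are relative to the fragment passed in; running the full sequence gives positions that can be compared directly with the ProtScale plot.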

TMHMM

I generated the TMHMM results by using a FASTA sequence of the gp120 protein.


Looking for PROSITE patterns

  • Used the accession number P04578 to specify the protein to be scanned and then started the scan.
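
To show what a PROSITE pattern scan does, here is a minimal sketch that converts a simple pattern into a regular expression and lists the matches. The N-glycosylation pattern N-{P}-[ST]-{P} is used purely as an example; the notes do not say which patterns the scan reported. The converter is a hypothetical helper that handles only basic pattern elements, not the '<' and '>' anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern such as N-{P}-[ST]-{P} to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        repeat = ""
        if "(" in element:
            # x(2) or x(2,4) style repetition counts.
            element, _, count = element.partition("(")
            repeat = "{" + count.rstrip(")") + "}"
        if element == "x":
            core = "."  # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except those listed
        else:
            core = element  # a single residue or a [ABC] class
        parts.append(core + repeat)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str) -> list:
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# A short stretch copied from the entry above, used as example input.
fragment = "KGEIKNCSFNISTSIRGKVQKEYAFFYKLD"
print(find_motifs(fragment, "N-{P}-[ST]-{P}"))
```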

InterProScan

  • For this section I used the FASTA sequence outlined in the Dummies book.

  • This involved pasting the gp120 sequence given in the book and deselecting some of the larger databases to assist the search.

Finding Domains With the CD Server

  • For the CD (conserved domain) search, I used the NCBI Conserved Domains search site and pasted in the FASTA sequence that I had downloaded. The results were as follows.

  • This was interpreted using the Bioinformatics for Dummies book.
