Haynes Lab:Notebook/Synthetic Biology and Bioinformatics for Predictable Control of Therapeutic Genes/2012/02/07: Difference between revisions
Caroline Hom (talk | contribs) (Autocreate 2012/02/07 Entry for Haynes_Lab:Notebook/Synthetic_Biology_and_Bioinformatics_for_Predictable_Control_of_Therapeutic_Genes) |
Caroline Hom (talk | contribs) |
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
==Entry title==
* Received a response back from ENCODE. Hopefully this procedure will create the best possible filter to find genes with significant histone methylation peaks at the promoter.
See answers interspersed below:
I am aware that I can check the boxes chrom, chromStart, and
chromEnd and then copy and paste that into the Genome Browser. Is it
possible for the tables to provide the location of the promoter (+/-
500 bp to the left and right of that region) instead of being
thrown onto a random area of the gene?
The Table Browser makes it fairly easy to get regions that are some number of bases upstream of a gene; it is slightly more work to get a region that is both upstream and downstream from the transcription start site. Depending on how you do it, you may wind up with gene names included or not included in your output.
There are some different options for getting a custom track that has your regions of interest with gene names. One way would be to start by getting the BED file as suggested before (be sure to include "name" in the output), and then use either your own tools (such as Excel) to add or subtract 500 bases from the appropriate lines, or use Galaxy: http://main.g2.bx.psu.edu/. Galaxy works in conjunction with the Table Browser, and it has a lot more data and text manipulation tools.
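The "add or subtract 500 bases" step can also be scripted instead of done in Excel. Below is a minimal Python sketch, assuming a six-column, tab-separated BED line (chrom, start, end, name, score, strand) whose start and end span the whole transcript. The function name `promoter_window` and the `flank` parameter are hypothetical, chosen only for illustration. It treats `start` as the transcription start site on the + strand and `end` as the transcription start site on the - strand.

```python
def promoter_window(bed_line, flank=500):
    """Return a BED line covering +/- flank bases around the TSS.

    Assumes six tab-separated BED fields:
    chrom, start, end, name, score, strand.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end, name, score, strand = fields[:6]
    # The TSS is the transcript start on '+' and the transcript end on '-'.
    tss = int(start) if strand == "+" else int(end)
    # Clamp at 0 so windows near the start of a chromosome stay valid.
    new_start = max(0, tss - flank)
    new_end = tss + flank
    return "\t".join([chrom, str(new_start), str(new_end), name, score, strand])
```

For example, a + strand row spanning 1000 to 5000 becomes the window 500 to 1500, while a - strand row with the same span becomes 4500 to 5500.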
A perhaps easier way is to generate a BED file using MySQL to query the tables directly. These two queries will generate BED files from the knownCanonical and knownGene tables (knownCanonical contains one representative transcript for each cluster of transcripts in UCSC Genes -- see more on the description page: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=knownGene):
mysql> -- plus strand: the window is centered on chromStart
mysql> select knownCanonical.chrom, chromStart-500 as start, chromStart+500 as end, name, 0 as score, strand from knownGene, knownCanonical where knownGene.name=knownCanonical.transcript and strand='+';
mysql> -- minus strand: the window is centered on chromEnd
mysql> select knownCanonical.chrom, chromEnd-500 as start, chromEnd+500 as end, name, 0 as score, strand from knownGene, knownCanonical where knownGene.name=knownCanonical.transcript and strand='-';
The results from these two queries can be concatenated into one file and uploaded as a custom track. Here is a session that contains exactly that:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Rhead&hgS_otherUserSessionName=CarlyHomPromoters
Feel free to use the Table Browser to download the contents of the custom track; use either "all fields from selected table" or BED as the output format.
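Concatenating the two query results into one uploadable file could be sketched as follows. This is a minimal Python example, assuming each result has been saved as tab-separated BED rows without a header line. The function name `build_custom_track` and the default track name and description are placeholders, not values from the ENCODE reply. The leading `track` line is the usual way to label a custom track.

```python
def build_custom_track(plus_rows, minus_rows, name="promoters",
                       description="TSS +/- 500 bp"):
    """Combine two lists of BED lines into custom-track text.

    plus_rows and minus_rows are lists of tab-separated BED lines
    (hypothetical saved output of the two strand-specific queries).
    """
    header = 'track name="{}" description="{}"'.format(name, description)
    # Drop blank lines and trailing newlines before merging.
    rows = [r.rstrip("\n") for r in plus_rows + minus_rows if r.strip()]
    # Sort by chromosome name, then by numeric start, for a tidy file.
    rows.sort(key=lambda r: (r.split("\t")[0], int(r.split("\t")[1])))
    return "\n".join([header] + rows) + "\n"
```

The returned text can be written to a file and uploaded as a custom track.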
Also, how exactly is the histone methylation score measured? I want
to be able to select a specific score range (e.g., >= 500), but I am
unsure of what to qualify as a significant enough signal since I
do not know what the score is being measured relative to and what the
top score possible is.
When you have the table selected in the Table Browser, hit the "describe table schema" button. You should see a description of the score and, usually, a link to the range of scores that occur in the table.
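Once the score definition is known, it can help to look at the observed score range in an exported table before choosing a cutoff. Here is a minimal Python sketch, assuming peak rows in tab-separated BED format with the score in the fifth column. The helper names `score_range` and `filter_by_score` are hypothetical, and the cutoff of 500 mentioned in the question is only an example, not a recommended significance threshold.

```python
def score_range(bed_lines):
    """Return (min, max) of the score column (fifth BED field)."""
    scores = [float(line.split("\t")[4]) for line in bed_lines if line.strip()]
    return min(scores), max(scores)


def filter_by_score(bed_lines, min_score):
    """Keep only rows whose score is at least min_score."""
    return [line for line in bed_lines
            if line.strip() and float(line.split("\t")[4]) >= min_score]
```

Calling `score_range` first shows where a candidate cutoff sits relative to the scores actually present, and `filter_by_score` then keeps only the rows at or above it.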
Lastly, is there a way to add the gene name to the table output? It
makes it a lot easier than having to determine the gene names when
some genes are very close to each other and have an area of overlap.
I know that the gene name check box is available in the group
knownCanonical, but I can't seem to find it when I am in the
Regulation group with the 'selected fields from primary and related
tables' output format. If you could get back to me on these items
that would be great. Thank you!
Generally, when you do an intersection in the Table Browser, the fields of the table that are selected first are the fields that are retained in the output. (We don't have a way to get fields from both sets of tables, but Galaxy does.) Because you are creating a filter for the second table, you will need to first create a custom track of the regions in your second table that pass your filter. Then select your promoter custom track and intersect it with your second custom track.
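The two-step procedure described above (filter the second table, then intersect the promoter track with what remains) can also be sketched outside the browser. This is a minimal Python example under assumed inputs: promoters as (chrom, start, end, name) tuples and peaks as (chrom, start, end, score) tuples, both with half-open BED coordinates. The function names are hypothetical, and the simple nested loop is written for clarity rather than speed.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def promoters_with_peaks(promoters, peaks, min_score):
    """Return names of promoters overlapped by a peak scoring >= min_score."""
    # Step 1: keep only the peaks that pass the score filter.
    passing = [p for p in peaks if p[3] >= min_score]
    # Step 2: intersect promoters with the filtered peaks, keeping gene names.
    hits = []
    for chrom, start, end, name in promoters:
        if any(pc == chrom and overlaps(start, end, ps, pe)
               for pc, ps, pe, _ in passing):
            hits.append(name)
    return hits
```

Because the promoter list is iterated first, its gene names are what end up in the output, mirroring the rule that the first-selected table's fields are retained.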
* Also found a 2004 paper by Sabo et al. on the Hotspot scoring algorithm:
[http://openwetware.org/images/6/63/PNAS-2004-Sabo-4537-42.pdf Genome-wide identification of DNaseI hypersensitive sites using active chromatin sequence libraries]
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->
Revision as of 12:43, 8 February 2012