Haynes:ChIPDataMining1

From OpenWetWare

Revision as of 12:09, 26 March 2014


Preliminary Analysis (UCSC Browser and ENCODE)

Part 1: Use the Genome Browser to visualize chromatin mapping data within the human genome sequence.

  1. Go to the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgGateway.
  2. Enter a gene name or genomic region in the search term box. For instance, try “chr21:34398216-34401503” (the coordinates of the RefSeq annotation of the Polycomb-silenced OLIG2 gene). Click “submit.”
  3. A Genome Browser window showing the selected region should appear. Click the “hide all” button to hide all tracks.
  4. Under Genes and Gene Prediction Tracks, set RefSeq Genes to “full” and click the “refresh” button. The genes annotated in the selected region should appear in the browser window. Repeat this step for other tracks, if desired.
  5. To find data from a publicly shared ChIP-seq experiment:
    1. Click the “track search” button. You will navigate away from the browser, but your coordinates will be retained. In the next window, click the Advanced search tab. In the first “and” row, set the menu to “Antibody or target protein,” and under “is among,” select the chromatin mark of interest. For instance, try H3K27me3 (07-449).
    2. In the second “and” row, set the menu to “Cell, tissue, or DNA sample,” and under “is among,” select the cell type of interest. For instance, try H1-hESC.
    3. Click the “search” button. In the list of results, select one or more tracks and set each to “full.” Tracks designated as “Peaks” or “Hotspots” show the mapping data as low-resolution horizontal bars; tracks designated as “Signal” show signal intensities as high-resolution vertical bars.
    4. Click the “View in Browser” button.
  6. If you are viewing Signal tracks and high values are cut off (marked by a red bar), right-click the track in the browser and select “Configure.” From the “Data view scaling” drop-down list, select “auto-scale to data view.”
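Steps 1 and 2 can also be reached with a direct link. The sketch below assumes the browser's `hgTracks` page accepts the `db` (assembly) and `position` query parameters, as commonly used in shared Genome Browser links; it defaults to hg19 to match the assembly chosen in Part 2. The function name `browser_link` is hypothetical, and the track settings from steps 3 to 6 are not encoded in the link.

```python
from urllib.parse import urlencode

# Main display page of the UCSC Genome Browser.
HGTRACKS_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_link(chrom: str, start: int, end: int, db: str = "hg19") -> str:
    """Return a Genome Browser URL that opens assembly `db` at chrom:start-end.

    `start` and `end` are written exactly as they would be typed into the
    search box (1-based, inclusive), e.g. chr21:34398216-34401503.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {chrom}:{start}-{end}")
    position = f"{chrom}:{start}-{end}"
    # urlencode percent-escapes the ':' in the position string.
    return f"{HGTRACKS_URL}?{urlencode({'db': db, 'position': position})}"


if __name__ == "__main__":
    # OLIG2 coordinates from step 2
    print(browser_link("chr21", 34398216, 34401503))
```

The link only sets the assembly and the viewed region; tracks still need to be configured as described above.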

Part 2: Use the Table Browser to retrieve chromatin mapping data values for a preliminary subset of genomic regions (500 bp surrounding transcription start sites of twenty control genes).
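Each region of interest here is the TSS ± 250 bp. A minimal sketch of producing the tab-delimited text for the “define regions” box, assuming you already have one TSS coordinate per gene; the function names are hypothetical, and the example TSS values are simply the midpoints of the ACTB and OLIG2 rows in the table below.

```python
# Total width of each region of interest, in bp, centered on the TSS.
REGION_WIDTH = 500


def region_around(chrom: str, tss: int, name: str, width: int = REGION_WIDTH):
    """Return (chrom, start, end, name) for a `width`-bp window centered on `tss`."""
    half = width // 2
    start = max(0, tss - half)  # never run past the start of the chromosome
    return chrom, start, tss + half, name


def define_regions_text(sites, width: int = REGION_WIDTH) -> str:
    """Format (chrom, tss, name) tuples as tab-delimited lines for "define regions"."""
    rows = (region_around(chrom, tss, name, width) for chrom, tss, name in sites)
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


if __name__ == "__main__":
    # TSS positions inferred as the midpoints of two rows in step 4
    sites = [("chr7", 5570232, "ACTB"), ("chr21", 34398216, "OLIG2")]
    print(define_regions_text(sites))
```

Because the window is symmetric around the TSS, strand does not matter once the TSS coordinate itself is known.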

  1. Go to the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=356574357.
  2. Set clade to mammal, genome to human, and assembly to Feb. 2009 (GRCh37/hg19).
  3. To add a track that contains the chromatin mapping data of interest, set group to Regulation, and track to a type of interest. For instance, try Broad Histone or UW Histone to analyze a histone mark.
  4. On the region line, click the “define regions” button. In the text window, paste in the text below (tab-delimited), then click the “submit” button.
chr7	5569982	5570482	ACTB
chr3	52231849	52232349	ALAS1
chr15	45003435	45003935	B2M
chrX	153774983	153775483	G6PD
chr12	6643335	6643835	GAPDH
chr7	65447051	65447551	GUSB
chrX	133593925	133594425	HPRT1
chrX	77359416	77359916	PGK1
chr7	44835991	44836491	PPIA
chr6	170863171	170863671	TBP
chr1	212738426	212738926	ATF3
chr12	4382652	4383152	CCND2
chr9	21974882	21975382	CDKN2A
chr2	38303073	38303573	CYP1B1
chr4	107957203	107957703	DKK2
chr7	27224585	27225085	HOXA11
chr16	56701727	56702227	MT1G
chr20	62795577	62796077	MYT1
chr21	34397966	34398466	OLIG2
chr3	25469584	25470084	RARB

  5. Set the output format to “selected fields from primary and related tables” and click the “get output” button.
  6. On the following “Select Fields” page, select chrom, chromStart, chromEnd, name, and signalValue, then click the “get output” button. The output lists the coordinates and signalValues of the track records (chosen in step 3) that fall within the regions of interest (defined in step 4). Note: coordinates for which the chromatin-mapping signalValue is zero will be absent from the output list.
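Because zero-signal coordinates are missing from the output, regions with no records have to be filled in explicitly before genes can be compared. The sketch below assumes the output was saved as plain text, tab-separated, with a header line starting with “#”. It reports the highest signalValue overlapping each region; summarizing by maximum is an illustrative choice, not part of the protocol, and the function names and sample records are hypothetical.

```python
def parse_regions(text: str):
    """Parse tab-delimited 'chrom start end name' lines (the regions from step 4)."""
    regions = []
    for line in text.splitlines():
        if line.strip():
            chrom, start, end, name = line.split("\t")[:4]
            regions.append((chrom, int(start), int(end), name))
    return regions


def max_signal_per_region(regions, output_text: str):
    """Return {region name: highest signalValue of any record overlapping it}.

    `output_text` is the Table Browser output with columns
    chrom, chromStart, chromEnd, name, signalValue.
    Regions without records get 0.0, because zero-signal records are omitted.
    """
    best = {name: 0.0 for _, _, _, name in regions}
    for line in output_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        chrom, start, end, _record_name, signal = line.split("\t")[:5]
        start, end, value = int(start), int(end), float(signal)
        for r_chrom, r_start, r_end, r_name in regions:
            # Half-open intervals overlap when each starts before the other ends.
            if chrom == r_chrom and start < r_end and r_start < end:
                best[r_name] = max(best[r_name], value)
    return best


if __name__ == "__main__":
    regions = parse_regions("chr7\t5569982\t5570482\tACTB\nchr21\t34397966\t34398466\tOLIG2")
    # Made-up records, for illustration only
    output = ("#chrom\tchromStart\tchromEnd\tname\tsignalValue\n"
              "chr7\t5570000\t5570150\tpeak1\t12.5\n"
              "chr7\t5570300\t5570700\tpeak2\t3.0")
    print(max_signal_per_region(regions, output))
```

Here OLIG2 receives 0.0 because no record overlaps it, which matches how the Table Browser omits zero-signal coordinates.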

GALAXY: Intersecting Chromatin Mark Coordinates with Promoter Regions

Part 1: Load chromatin mark coordinates into GALAXY

  1. In GALAXY (after creating an account), select “Create New” from the gear icon’s drop-down list (in the top right corner) to start a new history.
  2. In “Get Data,” select “Upload File” or “UCSC Main.”
  3. If you selected “Upload File,” make sure the file format is set to “interval” and that the genome build matches the assembly used for the ENCODE data (GRCh37/hg19). If you selected “UCSC Main,” follow steps 1, 2, 4, and 5 from “Filtering,” but do not include signalValue this time. Click “done with selections,” then click “send query to Galaxy.”
  4. After the data appears under “History,” edit its attributes by clicking the pencil icon. Rename the dataset as needed and make sure the Chrom, Start, End, Name/Identifier, and Strand columns are set to 1, 2, 3, 4, and 5, respectively. Select the “Datatype” tab, make sure it is set to interval, and click “Save.”
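Before assigning columns in step 4, it can help to confirm that each row really has chrom, start, end, name, and strand in columns 1 to 5. A minimal, hypothetical pre-upload check, assuming a tab-separated file in which lines starting with “#” are headers; Galaxy performs its own datatype detection, and this sketch does not replicate it.

```python
VALID_STRANDS = {"+", "-", "."}


def check_interval_line(line: str):
    """Return a list of problems in one row expected to be chrom, start, end, name, strand."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 5:
        return [f"expected at least 5 columns, found {len(cols)}"]
    chrom, start, end, _name, strand = cols[:5]
    problems = []
    if not chrom:
        problems.append("empty chromosome name")
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end are not non-negative integers")
    elif int(start) >= int(end):
        problems.append("start is not smaller than end")
    if strand not in VALID_STRANDS:
        problems.append(f"unexpected strand {strand!r}")
    return problems


def check_interval_text(text: str):
    """Map 1-based line numbers to their problems, skipping blank and '#' lines."""
    report = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        problems = check_interval_line(line)
        if problems:
            report[number] = problems
    return report
```

An empty report means every data row has the five expected columns in a usable form.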

Part 2: Load promoter regions into GALAXY

Here, a “promoter region” is defined as the 500 bp interval centered at each annotated transcription start site in the RefSeq dataset.
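The promoter definition above can be written out directly, which gives a reference to compare against the Get Flanks output while that step is being debugged. A minimal sketch, assuming RefSeq records with chrom, txStart, txEnd, name, and strand columns (the actual layout of RefSeq_genes.txt may differ) and taking the TSS as txStart on the + strand and txEnd on the - strand; all function names are hypothetical.

```python
PROMOTER_WIDTH = 500  # total bp, centered on the TSS


def transcription_start(start: int, end: int, strand: str) -> int:
    """TSS coordinate: the 5' end of the transcript, which depends on strand."""
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def promoter(chrom, start, end, name, strand, width=PROMOTER_WIDTH):
    """Return (chrom, pStart, pEnd, name, strand) for a window centered on the TSS."""
    tss = transcription_start(start, end, strand)
    half = width // 2
    return chrom, max(0, tss - half), tss + half, name, strand


def promoters_from_text(text: str, width=PROMOTER_WIDTH):
    """Build promoters from tab-separated chrom, start, end, name, strand lines."""
    result = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, name, strand = line.split("\t")[:5]
        result.append(promoter(chrom, int(start), int(end), name, strand, width))
    return result
```

If Get Flanks with Location set to “both” returns a separate 500 bp region on each side of the start, the combined span is 1000 bp rather than the 500 bp centered interval defined here; comparing the two outputs may help pinpoint the discrepancy.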

  1. In “Get Data” select “Upload File” and upload the RefSeq_genes.txt file (included in Supplemental Material).
  2. Repeat step 4 of Part 1 for the RefSeq data, renaming the dataset “RefSeq genes.”
  3. In “Operate on Genomic Intervals” select “Get Flanks”.
  4. Set “Select data” to RefSeq genes, “Region” to around start, “Location of the flanking region” to both, and “Offset” to 0, with the length of the flanking region set to 500. (Note: this step still needs debugging.)

  5. Repeat step 4 of Part 1 for the new dataset and rename it “Promoters.”
  6. In “Operate on Genomic Intervals,” select “Join” and join “Promoters” with your first uploaded dataset.
  7. In “Text Manipulation,” select “Cut.”
  8. Under “Cut columns,” make sure you are keeping the Chrom, Start, End, Name, and Strand columns (columns are specified as c1, c2, and so on).
  9. Repeat step 4 of Part 1 for the new dataset and rename it “Clean Promoters.”
  10. To visualize your data, select “Build custom track” in “Graph/Display Data” and add your first uploaded dataset, RefSeq genes, and Clean Promoters.
  11. Under your built custom track, select “display at UCSC main” to visualize your data.
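The Join and Cut steps above keep the promoters that overlap at least one chromatin-mark interval. A naive sketch of that logic, assuming both datasets are lists of (chrom, start, end, name, strand) tuples in 0-based, half-open coordinates; it illustrates what a join followed by a cut produces and is not Galaxy's implementation, whose overlap rules and output columns may differ. The function names and mark intervals are hypothetical.

```python
def overlaps(a, b) -> bool:
    """True if two (chrom, start, end, ...) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def join_overlapping(promoters, marks):
    """Pair each promoter with every mark interval it overlaps (naive O(n*m) scan)."""
    return [tuple(p) + tuple(m) for p in promoters for m in marks if overlaps(p, m)]


def cut(rows, columns):
    """Keep the listed 1-based columns, mirroring the c1, c2, ... notation."""
    return [tuple(row[c - 1] for c in columns) for row in rows]


def unique_rows(rows):
    """Drop repeated rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))


if __name__ == "__main__":
    promoters = [("chr7", 5569982, 5570482, "ACTB", "-"),
                 ("chr21", 34397966, 34398466, "OLIG2", "+")]
    # Hypothetical chromatin-mark intervals, for illustration only
    marks = [("chr21", 34398000, 34399000, "mark1", "."),
             ("chr21", 34398400, 34398600, "mark2", ".")]
    joined = join_overlapping(promoters, marks)
    clean = unique_rows(cut(joined, [1, 2, 3, 4, 5]))
    print(clean)
```

A pairwise join emits one row per overlapping promoter/mark pair, so a promoter overlapping several marks appears several times after the cut; `unique_rows` collapses those repeats.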