Haynes:GalaxyChiP: Difference between revisions

From OpenWetWare
Revision as of 16:51, 28 March 2014


Getting Started: Create a Galaxy Account

Intro: Databases usually do a very good job of collecting published data, sharing it, and making it searchable and usable by scientists outside the publishing author's lab. However, it is almost impossible for databases to keep up with all of the data that different labs are constantly generating. Therefore, you still need some DIY skills to make use of the most recent data. Not every scientist can become a software developer and build custom tools on the fly, and no single software package can do everything you want. One solution is the Galaxy platform, a suite of tools available online for free. This platform balances ease of use with customizability.


  1. Go to http://usegalaxy.org.
  2. Under User in the top menu, select Register to create a new account. It's free.



View Human Genes as a Map (Track)

When an organism's entire genetic content has been "read" by biochemical sequencing techniques, every single A, T, C, and G is recorded in a database, in the order that each letter (nucleotide) appears in the organism's genome. The collective data is referred to as a sequenced genome. Because each nucleotide has a position, each one gets a number, or coordinate. Since genes are made up of nucleotides, each gene is assigned a range of coordinates that marks its start and end. All of this data is stored and shared with scientists so that we can use the same information to make discoveries that link back to a universal set of coordinates in whatever genome we are investigating. The data are usually stored as tables with thousands of rows of numbers. Galaxy can be used to turn the numerical data into easy-to-view illustrated genomic maps.

Part 1: Transfer gene mapping data to Galaxy

  1. Log in to Galaxy.
  2. Click Get Data > UCSC Main Table Browser in the left menu.
  3. In the window titled Table Browser, set the parameters to the following values
    1. clade: mammal, genome: human, assembly: Mar. 2006 (NCBI36/hg18)*
    2. group: Genes and Gene predictions, track: RefSeq Genes
    3. table: refGene
    4. region: genome
    5. output format: BED - Browser Extensible Data, Send output to: Galaxy
    6. output file: (leave blank)
    7. file type returned: plain text
  4. Click "Get Output."
  5. In the window titled Output refGene as BED, set the parameters to the following values
    1. name = tb_refGene, description = table browser query on refGene, visibility = full, url = (leave blank)
    2. Create one BED record per: Whole Gene
  6. Click "Send query to Galaxy"
  7. A new file will appear under History (right side menu). Wait until the job has completed.
  8. Under History, click the name of the file to open up more options.
  9. Click the eye (View Data) icon. You should see a huge table with many rows that look something like this:
    chr1 67051159 67163158 NM_024763 0 - 67052400 67163102 0 17
  • Note - 3/28/14 - Newer releases of the human genome (hg19, hg38) have been made available. Much of the genomic data that you will be using is most likely based on the hg18 map. Please use hg19 or any other release as appropriate for your data.
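
To make the table in step 9 easier to read, here is a minimal Python sketch that splits the example row into named fields. It is not part of the Galaxy workflow. The field names follow the standard BED column order, the BedRecord class and parse_bed_line function are hypothetical helpers, and the sketch assumes BED's usual convention that the start coordinate is 0-based and the end coordinate is exclusive.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    """First eight BED columns (hypothetical helper, not a Galaxy type)."""
    chrom: str        # chromosome name, e.g. chr1
    start: int        # 0-based start coordinate of the feature
    end: int          # end coordinate (exclusive)
    name: str         # identifier, here a RefSeq transcript (NM_ number)
    score: int        # column 5, the "score" of the row
    strand: str       # "+" or "-"
    thick_start: int  # where the thick (coding) part of the bar begins
    thick_end: int    # where the thick (coding) part of the bar ends


def parse_bed_line(line: str) -> BedRecord:
    """Split one whitespace- or tab-separated BED row into named fields."""
    f = line.split()
    return BedRecord(f[0], int(f[1]), int(f[2]), f[3],
                     int(f[4]), f[5], int(f[6]), int(f[7]))


# The example row shown in step 9 of the protocol.
row = "chr1 67051159 67163158 NM_024763 0 - 67052400 67163102 0 17"
rec = parse_bed_line(row)

# Under the 0-based, end-exclusive assumption, length is simply end - start.
gene_length = rec.end - rec.start
print(rec.name, rec.chrom, rec.strand, gene_length)
```

Columns 2 and 3 are the coordinates that the track browser uses to place each gene on the map, and columns 7 and 8 mark the portion drawn as a thick bar.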


Part 2: View the data as a map

  1. Click Visualization in the top menu. In the pop-up menu, select New Track Browser.
  2. Set the Browser name to Human Genome.
  3. Set the Reference genome build to Human Mar. 2006 (NCBI36/hg18), or whichever genome is consistent with the data you imported in the previous step.
  4. Click Create. This sets up an empty framework for displaying data that maps to the coordinates of the reference genome you selected. Now you must populate the map with data.
  5. Click Add Datasets to Visualization.
  6. Select the refGene data and click Add. Wait for the data to load.
  7. After the data has loaded, you should see some colors within the track. Zoom in until you see thin lines and thick bars that look like gene annotations.
  8. Move your cursor over the track label on the left to open display options. Click the downward arrow (Set display options) icon. Set the display to Pack. You should now see transcript labels (e.g., NM_12345678).
  9. Use the following controls to explore the map:
    1. Use the slider at the bottom of your browser window to scroll across the map.
    2. Use the drop-down menu at the top to switch between chromosomes. For example, select chr16.
    3. Set a specific range to view by editing the range of coordinates. For example, change the range to chr16:46690057-46894855 to view the region where the human ear wax type gene is located (ABCC11, NM_033151 and other transcripts).
  10. Click the save icon in the upper right.
  • Note: Unfortunately, standard human gene symbols are not displayed here. You can identify a gene from this track by going to NCBI and searching for the NM number with Gene selected in the drop-down menu.
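
The range string used in step 9 (chromosome, colon, start, dash, end) can also be handled outside the browser. The following Python sketch parses such a string and tests whether a feature falls inside the viewed window. The parse_region and overlaps functions are hypothetical helpers, the test interval is made up for illustration, and the comparison treats both intervals as simple closed ranges, so results at the exact boundaries may be off by one base depending on the coordinate convention of your data.

```python
import re
from typing import Tuple

Region = Tuple[str, int, int]


def parse_region(region: str) -> Region:
    """Parse a string such as 'chr16:46690057-46894855' into its parts."""
    m = re.fullmatch(r"(chr\w+):([\d,]+)-([\d,]+)", region.strip())
    if m is None:
        raise ValueError(f"not a valid range: {region!r}")
    chrom = m.group(1)
    # Allow thousands separators, e.g. chr16:46,690,057-46,894,855.
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not be larger than end")
    return chrom, start, end


def overlaps(view: Region, chrom: str, start: int, end: int) -> bool:
    """True if the feature shares at least one position with the view."""
    v_chrom, v_start, v_end = view
    return chrom == v_chrom and start <= v_end and end >= v_start


# The range from the protocol (region containing ABCC11).
view = parse_region("chr16:46690057-46894855")
print(view)

# A made-up interval, used only to exercise the overlap test.
print(overlaps(view, "chr16", 46700000, 46710000))
print(overlaps(view, "chr1", 46700000, 46710000))
```

The same check is a quick way to confirm that a row in a BED file should be visible before you zoom to a region.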




Add ChIP Data to a Genome Map (Track)

These steps walk you through uploading BED data from a ChIP experiment and viewing them alongside a gene map.

Part 1: Transfer ChIP data to Galaxy

  1. If you have not already done so, store one or more BED files on your local hard drive. If you are interested in using data from another lab, make sure that the lab provides the data to you in BED format.
  2. Log in to Galaxy.
  3. Click Get Data > Upload File in the left menu.
  4. Set the parameters to the following values:
    1. File: browse for the BED file on your hard drive
    2. Genome: Select a genome that corresponds to your BED data. If you are using the human genome, be sure to select the hg number (e.g., hg18) that corresponds to your BED data.
    3. Convert spaces to tabs: yes
  5. Click the [Execute] button.
  6. Under History on the right side of the page you should see some indication of the data being uploaded to Galaxy. BED files are typically very large. Please be patient while the file uploads.
  7. When the upload is finished, this item will be labeled with a #1 and the file name under History.
  8. Click the eye icon to preview the data. It should look similar to the example below:
  9. Click the pencil icon to edit the file. You can rename the data file here. Importantly, make sure the Chrom column, Start column, End column, Name/Identifier column, and Strand column are set to 1, 2, 3, 4, and 6, respectively. Column 5 is usually the "score" of the row in a BED file. Select the "Datatype" tab and make sure it is set to interval. Click "Save."
  10. Your data is now ready to be analyzed.
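
If an upload fails or the column settings in step 9 do not match your file, it can help to check the file locally first. The Python sketch below imitates the effect of the "Convert spaces to tabs" option and checks the column layout described above (columns 1, 2, 3, 4, and 6). It is an approximation for illustration, not Galaxy's own implementation. The function names and the example row are hypothetical, and the check assumes a BED file with at least six columns.

```python
from typing import List


def normalize_bed_line(line: str) -> str:
    """Replace runs of spaces or tabs with single tabs."""
    return "\t".join(line.split())


def check_bed_line(line: str) -> List[str]:
    """Return a list of problems found in one tab-separated BED row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        return [f"expected at least 6 columns, found {len(fields)}"]

    chrom, start, end, name, score, strand = fields[:6]
    problems = []
    if not (start.isdigit() and end.isdigit()):
        problems.append("columns 2 and 3 must be non-negative integers")
    elif int(start) > int(end):
        problems.append("start (column 2) must not exceed end (column 3)")
    if strand not in ("+", "-", "."):
        problems.append("strand (column 6) must be '+', '-' or '.'")
    return problems


# A made-up, space-separated row used only for illustration.
raw = "chr16  46700000 46700500 peak_1 850 +"
clean = normalize_bed_line(raw)
print(clean.split("\t"))
print(check_bed_line(clean))

# A deliberately broken row: start after end, and an invalid strand.
print(check_bed_line("chr16\t10\t5\tpeak_2\t0\t?"))
```

Running check_bed_line over every row of a file before uploading gives an early warning about rows that the track browser may not be able to place on the map.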

Part 2: View the data as a map

  1. Select Visualization in the top menu. In the pop-up menu, select Saved Visualizations.
    Note: If you have not created a Browser yet, follow the steps from View Human Genes as a Map (Track): Part 2.
  2. Open the Browser you saved (Human Genome).
  3. Find and click the plus sign (Add tracks) icon at the top right.
  4. Select the BED data you uploaded. Click the Add button.
  5. The data will appear as a new row, lined up in register with the gene mapping data.
  6. Click the save icon in the upper right.


Note: Because you are a registered user, Galaxy will save this data under History until you delete it, even after you log off and log back in. You have a space limit, so you should delete any files under History that are incorrect or that you no longer need.