Haynes:GalaxyChiP

From OpenWetWare
Revision as of 15:54, 28 March 2014


Getting Started: Create a Galaxy Account

Intro: Databases usually do a very good job of collecting published data, sharing it, and making it searchable and usable by scientists outside the publishing author's lab. However, it is almost impossible for databases to keep up with all of the data constantly being generated by different labs, so you still need some do-it-yourself skills to make use of the most recent data. Not every scientist can become a software developer and build new tools on the fly, and no single software package can do everything you want. One solution is the Galaxy platform, a suite of tools available online for free that balances ease of use with customizability.


  1. Go to http://usegalaxy.org.
  2. Under User in the top menu, select Register to create a new account. It's free.



View Human Genes as Map (Track)

When an organism's entire genetic content has been "read" by biochemical sequencing techniques, every A, T, C, and G is recorded in a database in the order in which each letter (nucleotide) appears in the organism. The collective data are referred to as a sequenced genome. Because each nucleotide has a position, each one gets a number, or coordinate. Since genes are made up of nucleotides, each gene is assigned a range of coordinates running from its start to its end. These data are stored and shared so that scientists can use the same information to make discoveries that link back to a universal set of coordinates in whatever genome we are investigating. The data are usually stored as tables with thousands of rows of numbers. Galaxy can be used to turn the numerical data into easy-to-view illustrated genomic maps.
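To make the idea of gene coordinates concrete, here is a minimal Python sketch of a gene table stored as rows of (chromosome, start, end, name), plus a lookup that reports which genes span a given position. The gene names and coordinates are made-up placeholders, not real hg18 values, and the sketch assumes the half-open convention used by BED files (start counted from 0, end excluded).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    chrom: str   # chromosome name, e.g. "chr1"
    start: int   # first base of the gene (0-based, as in BED)
    end: int     # one past the last base (half-open interval)
    name: str


# Hypothetical gene table; the coordinates are placeholders, not real genome data.
GENES = [
    Gene("chr1", 1000, 5000, "GENE_A"),
    Gene("chr1", 4000, 9000, "GENE_B"),
    Gene("chr2", 200, 800, "GENE_C"),
]


def genes_at(chrom: str, position: int, genes=GENES) -> list[str]:
    """Return the names of genes whose coordinate range contains the position."""
    return [g.name for g in genes if g.chrom == chrom and g.start <= position < g.end]


print(genes_at("chr1", 4500))  # ['GENE_A', 'GENE_B']
print(genes_at("chr2", 900))   # []
```

Overlapping genes (GENE_A and GENE_B here) are why a single position can map to more than one gene, which is also what you see as stacked features on a genome map.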

Part 1: Transfer gene mapping data to Galaxy

  1. Log in to Galaxy.
  2. Click Get Data > UCSC Main Table Browser in the left menu.
  3. In the window titled Table Browser, set the parameters to the following values:
    1. clade: mammal, genome: human, assembly: Mar. 2006 (NCBI36/hg18)*
    2. group: Genes and Gene predictions, track: RefSeq Genes
    3. table: refGene
    4. region: genome
    5. output format: BED - Browser Extensible Data, Send output to: Galaxy
    6. output file: (leave blank)
    7. file type returned: plain text
  4. Click "Get Output."
  5. In the window titled Output refGene as BED, set the parameters to the following values:
    1. name = tb_refGene, description = table browser query on refGene, visibility = full, url = (leave blank)
    2. Create one BED record per: Whole Gene
  6. Click "Send query to Galaxy"
  7. A new file will appear under History (right side menu). Wait until the job has completed.
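Each line of the resulting tb_refGene file is one BED record per whole gene. The Python sketch below shows one way to read such lines, assuming the standard BED column order (chrom, chromStart, chromEnd, name, score, strand) with tab-separated fields; it keeps only the first six columns. The sample line and its accession are invented for illustration, not actual Table Browser output.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int    # chromStart, 0-based
    end: int      # chromEnd, exclusive
    name: str
    score: str    # kept as text; some files use "." or non-integer scores
    strand: str   # "+", "-", or "."


def parse_bed_line(line: str) -> BedRecord:
    """Parse the first six tab-separated BED columns of one record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields[:6]
    return BedRecord(chrom, int(start), int(end), name, score, strand)


def read_bed(lines):
    """Yield records, skipping blank, comment, and track/browser header lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        yield parse_bed_line(line)


# Invented example line in the shape of a whole-gene BED record.
sample = "chr1\t1000\t5000\tNM_000000\t0\t+\n"
rec = parse_bed_line(sample)
print(rec.chrom, rec.end - rec.start, rec.strand)  # chr1 4000 +
```

Because BED ends are exclusive, `end - start` gives the gene length directly, with no off-by-one correction.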


  • Note - 3/28/14 - Newer releases of the human genome (hg19, hg38) have been made available. Much of the genomic data that you will be using is most likely based on the hg18 map. Please use hg19 or another release as appropriate, choosing the assembly that matches your data.



Add ChIP Data to a Genome Map (Track)

These steps walk you through uploading BED data from a ChIP experiment and viewing them alongside a gene map.

  1. If you have not already done so, store one or more BED files on your local hard drive. If you are interested in using data from another lab, make sure the lab provides the data in BED format.
  2. Log in to Galaxy.
  3. Click Get Data > Upload File in the left menu.
  4. Set the parameters to the following values:
    1. File: browse for the BED file on your hard drive
    2. Genome: Select a genome that corresponds to your BED data. If you are using the human genome, be sure to select the hg number (e.g., hg18) that corresponds to your BED data.
    3. Convert spaces to tabs: yes
  5. Click the [Execute] button.
  6. Under History on the right side of the page, you should see an indication that the data are being uploaded to Galaxy. Large BED files can take a while to upload, so please be patient.
  7. When the upload is finished, the item appears under History labeled with a number (e.g., #1) and the file name.
  8. Click the eye icon to preview the data. Each row should show tab-separated columns, beginning with the chromosome, start, and end coordinates.
  9. Click the pencil icon to edit the file. You can rename the data file here. Importantly, make sure the Chrom, Start, End, Name/Identifier, and Strand columns are set to 1, 2, 3, 4, and 6, respectively. Column 5 of a BED file usually holds the "score" of the row. Select the “Datatype” tab and make sure it is set to interval. Click “Save.”
  10. Your data is now ready to be analyzed.
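Galaxy applies the "Convert spaces to tabs" option and the column assignments from step 9 for you. As a rough illustration of what those settings mean (not Galaxy's own code), the Python sketch below collapses runs of spaces into tabs and then picks out the chromosome, start, end, name, and strand using the same 1-based column numbers (1, 2, 3, 4, 6). The `COLUMNS` mapping and the sample line are assumptions made for this example.

```python
import re

# 1-based column numbers, matching the settings chosen in the Galaxy editor.
COLUMNS = {"chrom": 1, "start": 2, "end": 3, "name": 4, "strand": 6}


def spaces_to_tabs(line: str) -> str:
    """Replace each run of spaces with a single tab (a rough stand-in for the upload option)."""
    return re.sub(r" +", "\t", line.strip())


def extract_interval(line: str, columns=COLUMNS) -> dict:
    """Pick out interval fields using 1-based column numbers."""
    fields = spaces_to_tabs(line).split("\t")
    needed = max(columns.values())
    if len(fields) < needed:
        raise ValueError(f"line has {len(fields)} columns, need at least {needed}")
    row = {key: fields[idx - 1] for key, idx in columns.items()}
    row["start"], row["end"] = int(row["start"]), int(row["end"])
    return row


# A space-separated line, as it might appear in a file before conversion.
print(extract_interval("chr2  200 800 peak_1 37 -"))
# {'chrom': 'chr2', 'start': 200, 'end': 800, 'name': 'peak_1', 'strand': '-'}
```

Column 5 (the score, 37 here) is skipped, mirroring the step 9 settings; if the column numbers are set wrong, the start and end fields would be read from the wrong place, which is why that step matters.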

Note: Because you are a registered user, Galaxy saves these data under History until you delete them, even after you log out and log back in. Your account has a storage limit, so delete any History items that are incorrect or that you no longer need.

Visualize the data as a track

  1. You must start with uploaded data. See "Add ChIP Data to a Genome Map (Track)" above.
  2. Select Visualization in the top menu. In the pop-up menu, select New Track Browser.
  3. Click on the name of the data file under History to open up the options.
  4. Click the histogram (bar graph) icon. In the pop-up menu, select Trackster.
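Viewing the ChIP track next to the RefSeq gene track is essentially a visual check of which peaks fall on which genes. If you later want the same answer as a list, a minimal Python sketch of the overlap test follows. It assumes both inputs use BED-style half-open coordinates on the same assembly, and the peak and gene values are invented. The pairwise comparison is fine for small files but slow for genome-wide data, where a sorted sweep or an interval tree would scale better.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap when each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def peaks_on_genes(peaks, genes):
    """Map each peak name to the names of genes it overlaps (pairwise comparison)."""
    return {p.name: [g.name for g in genes if overlaps(p, g)] for p in peaks}


# Invented example data; not from any real ChIP experiment.
genes = [Interval("chr1", 1000, 5000, "GENE_A"), Interval("chr1", 8000, 9000, "GENE_B")]
peaks = [
    Interval("chr1", 4900, 5100, "peak_1"),  # straddles the end of GENE_A
    Interval("chr1", 5000, 5200, "peak_2"),  # starts exactly where GENE_A ends
    Interval("chr2", 100, 300, "peak_3"),    # different chromosome
]
print(peaks_on_genes(peaks, genes))
# {'peak_1': ['GENE_A'], 'peak_2': [], 'peak_3': []}
```

peak_2 shows why the coordinate convention matters: with exclusive ends, a peak starting at base 5000 does not touch a gene ending at 5000, whereas mixing in 1-based coordinates from another source would shift this result.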