Etchevers:Notebook/STRA6 in eye development/2009/07/15

From OpenWetWare
Genetics of human eye development Main project page

Running FindPeaks finally

First, we see that all five sequencing results are extremely similar when the .bed files are visualized on the UCSC Genome Browser. So much so that we wondered whether any of these unintended possibilities had happened:

  • Chromatin conformation in this particular preparation made some places more likely to non-specifically IP than others. In that case we are REALLY sorry not to have performed an IgG IP and sequenced that as well, though we could do pair-wise comparisons between banks.
  • All four transcription factors recognize exactly the same thing. Highly unlikely, especially for non-specific background, or in combination with the above.
  • Something else was sequenced: the libraries whose clones we inspected and approved were not the same as what was sequenced. Highly unlikely if not impossible.
  • A single IP was separately bar-coded five times and sequenced, then separated with bioinformatics. Highly unlikely.
  • All five samples were mixed before separate labeling with the bar codes. Also highly unlikely.
  • Each sample had all five bar codes. Highly unlikely if not impossible.

Given that Anthony pushes for an IgG control and discourages using the simulated controls, we conclude that the first option is the most likely. ChIP-Seq analyses are still new enough, though, that designing the experiment this carefully from the get-go is not yet standard practice.

So, I will try to perform the following pair-wise comparisons. The first is to compare OTX2-1 and OTX2-2, to see whether these two are more similar to each other than to any of the other TF ChIPs. I REALLY hope so.

OTX2-1 vs:

  • OTX2-2
  • RAX
  • SOX2
  • PAX6

RAX vs:

  • OTX2-1
  • OTX2-2
  • SOX2
  • PAX6

SOX2 vs:

  • OTX2-1
  • OTX2-2
  • RAX
  • PAX6

Not all combinations will be performed, because we think that most of these TFs will co-IP with PAX6 and possibly SOX2. When I say "pair" I mean using the top bank as the "control" and each bank listed under it as the sample.
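To keep track of the twelve runs listed above, here is a minimal sketch that just prints them. The function name `list_pairs` is my own invention; the run name follows the `RAX-PAX6-FP` pattern used for the first attempt below (control first, sample second). It does not launch FindPeaks, since the paths of the other banks are not written down here.

```shell
#!/bin/sh
# Print one line per planned comparison: control, sample, run name (tab-separated).
# Controls are OTX2-1, RAX and SOX2; every other bank serves as a sample.
list_pairs() {
    for control in OTX2-1 RAX SOX2; do
        for sample in OTX2-1 OTX2-2 RAX SOX2 PAX6; do
            # A bank is never compared with itself.
            if [ "$control" = "$sample" ]; then
                continue
            fi
            printf '%s\t%s\t%s-%s-FP\n' "$control" "$sample" "$control" "$sample"
        done
    done
}
```

Calling `list_pairs` after loading the function gives a checklist that can be ticked off as each comparison finishes.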

Questions to ask Fasteris: What is their empirical experience with comparing multiple IPs from a given chromatin preparation? Do they see the same sort of near-identical distributions of reads on the human genome? Is the range of peak sizes comparable? Is there a way to filter out certain parts of the genome, not just repeated sequences but perhaps even exons, to avoid regions that are particularly "sticky" and precipitable from one experiment to another, across human chromatin generally?

Are all the coverage-gap-filtered peaks in all five banks actually the same or not?
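A crude first look at this question could be to count how many intervals have exactly the same coordinates in two files. The sketch below assumes BED-style input (chromosome, start and end in the first three columns) and uses hypothetical helper names, `bed_intervals` and `count_shared`. It only counts identical intervals, so peaks that overlap but differ by even one base are not counted as shared.

```shell
#!/bin/sh
# Print chrom, start, end for every data line, sorted and de-duplicated.
# Lines starting with "browser" or "track", or with fewer than 3 fields, are skipped.
bed_intervals() {
    awk '$1 != "browser" && $1 != "track" && NF >= 3 { print $1 "\t" $2 "\t" $3 }' "$1" |
        LC_ALL=C sort -u
}

# Print the number of intervals that occur with identical coordinates in both files.
count_shared() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    bed_intervals "$1" > "$tmp_a"
    bed_intervals "$2" > "$tmp_b"
    # comm -12 keeps only the lines common to both sorted inputs.
    LC_ALL=C comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' '
    rm -f "$tmp_a" "$tmp_b"
}
```

Comparing the shared count with the size of each file (`bed_intervals file.bed | wc -l`) would show whether two banks are literally the same or only look alike at browser resolution.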

oeil@cornee:~/trunk/jars/fp4$ time java -Xmx2G -jar FindPeaks.jar -name RAX-PAX6-FP -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/2009-07-01_GDZ-5_export_Chr1.bed -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/ -aligner bed -one_per -dist_type 1 200 -minimum 2 -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/2009-07-01_GDZ-4_export_Chr1.bed -alpha

   Version: Initializing class Log_Buffer                        $Revision: 1145 $
   Version: Initializing class FindPeaks                         $Revision: 1335 $
   Info:    Note: all output now goes to log file.
   Info:    Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/RAX-PAX6-FP.log
   Version: Initializing class Parameters                        $Revision: 1298 $
   Info:     * MC simulation        : Off
   Info:     * Chr name prepend     : none
   Info:     * Min. reported pk ht  : 2
   Info:     * Minimum ht to process: Off
   Info:     * Lander-Waterman FDR  : Off
   Info:     * Output Sequence      : Off
   Info:     * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/
   Info:     * Control files in use : Off
   Info:     * Compare files in use : On
   Info:     * Peak ht transform    : false
   Info:     * Compare window size  : 100
   Info:     * Auto-threshold       : Off
   Info:     * Filter on PET flags  : Off
   Info:     * Maximum PET frag size: Off
   Info:     * Aligner              : bed
   Info:     * Triangle dist.       : 100 low
   Info:     * Triangle dist.       : 200 median
   Info:     * Triangle dist.       : 300 high
   Info:     * One file per chr.    : On
   Info:     * Naming files as      : RAX-PAX6-FP
   Info:     * Sub-peaks            : Off
   Info:     * Trim                 : Off
   Info:     * Saturation Analysis  : Off
   Error:     Unexpected number of parameters for -alpha: 
   
   real	0m1.077s
   user	0m0.096s
   sys	0m0.036s

First, try again with a value given for -alpha: 0.05. This went a little further, but:

   Version: Initializing class Log_Buffer                        $Revision: 1145 $
   Version: Initializing class FindPeaks                         $Revision: 1335 $
   Info:    Note: all output now goes to log file.
   Info:    Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/RAX-PAX6-FP.log
   Version: Initializing class Parameters                        $Revision: 1298 $
   Info:     * MC simulation        : Off
   Info:     * Chr name prepend     : none
   Info:     * Min. reported pk ht  : 2
   Info:     * Minimum ht to process: Off
   Info:     * Lander-Waterman FDR  : Off
   Info:     * Output Sequence      : Off
   Info:     * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/
   Info:     * Control files in use : Off
   Info:     * Compare files in use : On
   Info:     * Peak ht transform    : false
   Info:     * Compare window size  : 100
   Info:     * Auto-threshold       : Off
   Info:     * Filter on PET flags  : Off
   Info:     * Maximum PET frag size: Off
   Info:     * Aligner              : bed
   Info:     * Triangle dist.       : 100 low
   Info:     * Triangle dist.       : 200 median
   Info:     * Triangle dist.       : 300 high
   Info:     * One file per chr.    : On
   Info:     * Naming files as      : RAX-PAX6-FP
   Info:     * Sub-peaks            : Off
   Info:     * Trim                 : Off
   Info:     * Saturation Analysis  : Off
   Info:     * Compare alpha value  : 0.05	(Confidence Interval: 95.0)
   Info:     * Histogram length     : 30
   Info:     * Histogram precision  : 1
   Info:     * Peaks File Header    : On
   Info:     * Bedgraph/Wigfile     : wig file
   Info:     * R mode               : Off
   Info:     * Filter Duplicates    : Off
   Info:     * Filter quality       : Off
   Version: Initializing class PeakWriter                        $Revision: 1299 $
   Version: Initializing class Generic_AlignRead_Iterator        $Revision: 1318 $
   Version: Initializing class BedIterator                       $Revision: 1317 $
   Warning:  Not enough fields: browser position chr1:1-1000000
   Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
   at src.lib.ioInterfaces.BedIterator.next(BedIterator.java:110)
   at src.lib.ioInterfaces.BedIterator.next(BedIterator.java:23)
   at src.lib.ioInterfaces.Generic_AlignRead_Iterator.hasNext(Generic_AlignRead_Iterator.java:103)
   at src.projects.findPeaks.FindPeaks.core_routine_one_file(FindPeaks.java:458)
   at src.projects.findPeaks.FindPeaks.main(FindPeaks.java:843)

And nothing else happens. This created a "RAX-PAX6-FP.log" and an empty "RAX-PAX6-FP_triangle_standard.peaks". The first file contains the output I just pasted above.

It looked like the format of the .bed files was the problem: they contain optional header lines (such as the "browser position" line in the warning above) that make them look nicer in the browser. I removed these and renamed the files in gedit (the text editor of Ubuntu) to PAX6_Chr21 and RAX_Chr21 as a test.
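The same clean-up can probably be done without opening each file in an editor. A minimal sketch, assuming the only offending lines are the UCSC "browser" and "track" header lines (I only saw the "browser position" one in the warning) plus anything with fewer than three fields; `strip_bed_headers` is a made-up name:

```shell
#!/bin/sh
# Copy a .bed file, dropping UCSC "browser"/"track" header lines and any
# line with fewer than the three fields a BED record needs.
# Usage: strip_bed_headers input.bed output.bed
strip_bed_headers() {
    awk '$1 != "browser" && $1 != "track" && NF >= 3' "$1" > "$2"
}
```

The input file is left untouched, so the original export from Fasteris can still be loaded into the Genome Browser with its display settings.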

oeil@cornee:~$ time java -Xmx2G -jar ~/trunk/jars/fp4/FindPeaks.jar -name RAX-PAX6-FP -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/PAX6_Chr21.bed -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/ -aligner bed -dist_type 1 200 -minimum 2 -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/RAX_Chr21.bed -alpha 0.05

   Version: Initializing class Log_Buffer                        $Revision: 1145 $
   Version: Initializing class FindPeaks                         $Revision: 1335 $
   Info:    Note: all output now goes to log file.
   Info:    Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/RAX-PAX6-FP.log
   Version: Initializing class Parameters                        $Revision: 1298 $
   Info:     * MC simulation        : Off
   Info:     * Chr name prepend     : none
   Info:     * Min. reported pk ht  : 2
   Info:     * Minimum ht to process: Off
   Info:     * Lander-Waterman FDR  : Off
   Info:     * Output Sequence      : Off
   Info:     * Output directory     : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/
   Info:     * Control files in use : Off
   Info:     * Compare files in use : On
   Info:     * Peak ht transform    : false
   Info:     * Compare window size  : 100
   Info:     * Auto-threshold       : Off
   Info:     * Filter on PET flags  : Off
   Info:     * Maximum PET frag size: Off
   Info:     * Aligner              : bed
   Info:     * Triangle dist.       : 100 low
   Info:     * Triangle dist.       : 200 median
   Info:     * Triangle dist.       : 300 high
   Info:     * One file per chr.    : Off
   Info:     * Naming files as      : RAX-PAX6-FP
   Info:     * Sub-peaks            : Off
   Info:     * Trim                 : Off
   Info:     * Saturation Analysis  : Off
   Info:     * Compare alpha value  : 0.05	(Confidence Interval: 95.0)
   Info:     * Histogram length     : 30
   Info:     * Histogram precision  : 1
   Info:     * Peaks File Header    : On
   Info:     * Bedgraph/Wigfile     : wig file
   Info:     * R mode               : Off
   Info:     * Filter Duplicates    : Off
   Info:     * Filter quality       : Off
   Version: Initializing class PeakWriter                        $Revision: 1299 $
   Version: Initializing class Generic_AlignRead_Iterator        $Revision: 1318 $
   Version: Initializing class BedIterator                       $Revision: 1317 $
   Info:    Running Peak Processor
   Version: Initializing class PeakDataSet Peak Locator          $Revision: 1335 $
   Version: Initializing class PeakStore                         $Revision: 1335 $
   Version: Initializing class MapStore                          $Revision: 1335 $
   Info:    Current chromosome : chr21
   Info:    Reads used: 39684
   Version: Initializing class PeakStats                         $Revision: 1335 $
   Version: Initializing class Histogram                         $Revision: 1197 $
   Info:    Current chromosome : chr21
   Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
   at src.projects.findPeaks.objects.Map_f.<init>(Map_f.java:9)
   at src.projects.findPeaks.objects.MapStore.put(MapStore.java:42)
   at src.projects.findPeaks.PeakDataSetParent.process_float_based(PeakDataSetParent.java:383)
   at src.projects.findPeaks.PeakDataSetParent.process_peaks_from_iterator2(PeakDataSetParent.java:505)
   at src.projects.findPeaks.PeakDataSetParent.<init>(PeakDataSetParent.java:156)
   at src.projects.findPeaks.FindPeaks.core_routine_one_file(FindPeaks.java:619)
   at src.projects.findPeaks.FindPeaks.main(FindPeaks.java:843)
   ^C
   real	1m55.855s
   user	0m4.124s
   sys	0m1.492s
   oeil@cornee:~$ 

Well, I don't really know what that's about. Check out this solution tomorrow.

Tried with -Xmx3G (seems like a lot!!) and got exactly the same error, with a little more time.

   real	0m21.437s
   user	0m5.164s
   sys	0m2.260s
   oeil@cornee:~$ 