Etchevers:Notebook/STRA6 in eye development/2009/07/17

From OpenWetWare
Genetics of human eye development Main project page

Findpeaks memory problems (partly) resolved

Ran a number of trials and saw that, of the 4G + 8G of virtual memory, if we tell Java it can use up to 11G (-Xmx11G), this allows analysis of the Chr15 files, whereas before, both Chr15 and Chr1 needed more heap than Java was allowed.

By keeping System Monitor open, we can see why: the Chr15 wig generation takes up nearly 10 gigabytes all by itself, which makes us think we did not get enough physical memory. Trying to find out how to purchase more, but the Optiplex GX960 is too new and I'm not sure it is the same machine as the Optiplex 960.

oeil@cornee:~$ time java -Xms2G -Xmx11G -jar ~/trunk/jars/fp4/FindPeaks.jar -name PAX6-RAX-FP -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/PAX6_Chr15.bed -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_PAX6-RAX_chr15 -aligner bed -dist_type 1 200 -minimum 2 -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/RAX_Chr15.bed -alpha 0.05

If this ever finishes (it's taking quite some time), it is worth seeing what happens when I change the parameters listed below. We did try swapping "sample" and "control" (for chr21 and chrY) and, as one would hope, you get the same peaks and wig files either way around, just with sample and control interchanged. The comparison itself is a one-way street, though: this configuration should give us all peaks in PAX6 that do not correspond to "background" as specified by RAX.

We think the following parameters could be changed:

 -dist_type 1 200 >> -dist_type 3
 -minimum 2 >> -minimum 15 (for example; Adeline tried with a -minimum 20 alone and it seemed to generate the same sized files to start)
 -window_size 50 (for example; default is 100)
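
To keep track of which combinations have been tried, the candidate values above can be enumerated mechanically. This is only a sketch: the flags and paths are the ones from the command line above, but the run-naming scheme (the `tag`) and the per-run output folders are hypothetical, and it only prints the commands rather than launching them.

```python
import itertools
import os
import shlex

BASE = "/home/oeil/Documents/Fasteris_7_2009"
JAR = os.path.expanduser("~/trunk/jars/fp4/FindPeaks.jar")

# Candidate values taken from the list above (current value first).
DIST_TYPES = [("1", "200"), ("3",)]
MINIMUMS = [2, 15]
WINDOW_SIZES = [100, 50]


def build_command(dist_type, minimum, window_size):
    """Return the FindPeaks command line as a list of arguments."""
    # Hypothetical naming scheme, only used to tell the runs apart.
    tag = f"dist{'-'.join(dist_type)}_min{minimum}_win{window_size}"
    return [
        "java", "-Xms2G", "-Xmx11G", "-jar", JAR,
        "-name", f"PAX6-RAX-FP_{tag}",
        "-input", f"{BASE}/GDZ5_PAX6_bed/PAX6_Chr15.bed",
        "-output", f"{BASE}/FindPeaks4_Results/{tag}",
        "-aligner", "bed",
        "-dist_type", *dist_type,
        "-minimum", str(minimum),
        "-window_size", str(window_size),
        "-compare", f"{BASE}/GDZ4_RAX_bed/RAX_Chr15.bed",
        "-alpha", "0.05",
    ]


def all_commands():
    """Build one command per combination of the candidate values."""
    combos = itertools.product(DIST_TYPES, MINIMUMS, WINDOW_SIZES)
    return [build_command(d, m, w) for d, m, w in combos]


if __name__ == "__main__":
    for command in all_commands():
        print(shlex.join(command))
```

With two values per parameter this gives eight runs; at well over an hour per run on Chr15 it is probably wiser to change one parameter at a time.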

"-compare <String> [<String> <String>...]

This mode performs a built in compare between two samples, using a symmetrical peak pairing method, as well as a normalization based upon the perpendiculars of best fit slope (linear regression-like)*. It will find the same peak pairs, normalize identically and provide the same list of peaks that pass filtering regardless of which sample is provided as input and which is provided as the compare.

The -compare flag must be followed by a list of files, which must be the same number and order as those provided to the -input flag, with which they will be compared.

This function operates by identifying all of the peaks in both of the samples, and then pairing each peak to the highest point in the opposite sample. The boundaries of the peak or the window_size are used to identify peaks that may be matched. All output from this method is placed in the *.regions files. (See RegionsFiles for format description.)

Two parameters are available for this method: -window_size and -alpha. -window_size sets the largest window in which one peak may be matched to a peak in the opposite sample, and the window in which to search for the highest point around the peak max of a peak without a designated pair. -alpha sets the confidence interval for peak pair filtering."
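
To make the pairing idea concrete for ourselves, here is a toy sketch of window-based peak pairing. It is not the FindPeaks implementation: peaks are reduced to hypothetical (position, height) tuples, peak boundaries are ignored, and it does not try to reproduce the symmetry guarantee described in the quote. The -1 placeholder for a missing partner follows what we see in our own .regions output.

```python
def pair_peaks(sample, control, window_size=100):
    """Pair each sample peak with the tallest control peak within window_size.

    Peaks are (position, height) tuples. Returns (sample_pos, control_pos)
    rows, with -1 standing in for a peak that has no partner.
    """
    rows = []
    matched = set()
    for pos, _height in sample:
        # Control peaks close enough to be considered, as (height, position).
        candidates = [(h, p) for p, h in control if abs(p - pos) <= window_size]
        if candidates:
            _best_height, best_pos = max(candidates)
            rows.append((pos, best_pos))
            matched.add(best_pos)
        else:
            rows.append((pos, -1))
    # Control peaks that no sample peak claimed.
    for p, _h in control:
        if p not in matched:
            rows.append((-1, p))
    return rows


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    sample = [(1000, 12), (5000, 8)]
    control = [(1040, 9), (9000, 7)]
    for row in pair_peaks(sample, control):
        print(row)
```

If no row ends up with both positions filled in, there is nothing to fit a regression through, which may be what is happening to us.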

* This is not working for us (see yesterday's entry): same inability to perform the plotting. Perhaps no peaks are coincident? Seems odd.
  • oeil@cornee:~$ time java -Xms2G -Xmx11G -jar ~/trunk/jars/fp4/FindPeaks.jar -name RAX-PAX6-FP-Chr15 -input /home/oeil/Documents/Fasteris_7_2009/GDZ5_PAX6_bed/PAX6_Chr15.bed -output /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15 -aligner bed -dist_type 1 200 -minimum 2 -compare /home/oeil/Documents/Fasteris_7_2009/GDZ4_RAX_bed/RAX_Chr15.bed -alpha 0.05
  • Version: Initializing class Log_Buffer $Revision: 1145 $
  • Version: Initializing class FindPeaks $Revision: 1335 $
  • Info: Note: all output now goes to log file.
  • Info: Log file: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15/RAX-PAX6-FP-Chr15.log
  • Version: Initializing class Parameters $Revision: 1298 $
  • Info: * MC simulation  : Off
  • Info: * Chr name prepend  : none
  • Info: * Min. reported pk ht  : 2
  • Info: * Minimum ht to process: Off
  • Info: * Lander-Waterman FDR  : Off
  • Info: * Output Sequence  : Off
  • Info: * Output directory  : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15/
  • Info: * Control files in use : Off
  • Info: * Compare files in use : On
  • Info: * Peak ht transform  : false
  • Info: * Compare window size  : 100
  • Info: * Auto-threshold  : Off
  • Info: * Filter on PET flags  : Off
  • Info: * Maximum PET frag size: Off
  • Info: * Aligner  : bed
  • Info: * Triangle dist.  : 100 low
  • Info: * Triangle dist.  : 200 median
  • Info: * Triangle dist.  : 300 high
  • Info: * One file per chr.  : Off
  • Info: * Naming files as  : RAX-PAX6-FP-Chr15
  • Info: * Sub-peaks  : Off
  • Info: * Trim  : Off
  • Info: * Saturation Analysis  : Off
  • Info: * Compare alpha value  : 0.05 (Confidence Interval: 95.0)
  • Info: * Histogram length  : 30
  • Info: * Histogram precision  : 1
  • Info: * Peaks File Header  : On
  • Info: * Bedgraph/Wigfile  : wig file
  • Info: * R mode  : Off
  • Info: * Filter Duplicates  : Off
  • Info: * Filter quality  : Off
  • Version: Initializing class PeakWriter $Revision: 1299 $
  • Version: Initializing class Generic_AlignRead_Iterator $Revision: 1318 $
  • Version: Initializing class BedIterator $Revision: 1317 $
  • Info: Running Peak Processor
  • Version: Initializing class PeakDataSet Peak Locator $Revision: 1335 $
  • Version: Initializing class PeakStore $Revision: 1335 $
  • Version: Initializing class MapStore $Revision: 1335 $
  • Info: Current chromosome : chr15
  • Info: Reads used: 89208
  • Version: Initializing class PeakStats $Revision: 1335 $
  • Version: Initializing class Histogram $Revision: 1197 $
  • Info: Current chromosome : chr15
  • Info: Reads used: 128739
  • Version: Initializing class ApplyCompare $Revision: 1332 $
  • Info: Linear Regresion: Total: 14
  • Info: Linear Regresion: Used: 0
  • Version: Initializing class LinearRegressionPerpendicular $Revision: 1285 $
  • Warning: Can't run a Linear Regression calculation with zero points.
  • Warning: Can not apply filter. A valid slope was not obtained from the analysis.
  • Info: Linear Regresion: Remaining: 14
  • Version: Initializing class RegionWriter $Revision: 1229 $
  • Info: writing to : /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15/RAX-PAX6-FP-Chr15_triangle_standard_chr15.regions
  • Version: Initializing class Wigwriter $Revision: 1329 $
  • Info: writing sample to: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15/RAX-PAX6-FP-Chr15_triangle_standard_chr15_filtered_sample.wig.gz
  • Info: writing comtrol to: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15/RAX-PAX6-FP-Chr15_triangle_standard_chr15_filtered_control.wig.gz
  • Info: writing to: /home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15/RAX-PAX6-FP-Chr15_triangle_standard_chr15.wig.gz
  • Info: Wrote to:/home/oeil/Documents/Fasteris_7_2009/FindPeaks4_Results/17-7_RAX-PAX6_chr15/RAX-PAX6-FP-Chr15_triangle_standard.wig.gz
  • real 88m22.544s
  • user 4m17.664s
  • sys 3m26.069s
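
A quick look at the three `time` figures above: the small helper below (my own, not part of FindPeaks) converts them to seconds and compares CPU time with wall-clock time.

```python
def to_seconds(stamp):
    """Convert a `time` stamp such as '88m22.544s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


real = to_seconds("88m22.544s")
cpu = to_seconds("4m17.664s") + to_seconds("3m26.069s")  # user + sys

print(f"wall clock: {real:.0f} s, CPU (user + sys): {cpu:.0f} s")
print(f"CPU share of wall-clock time: {cpu / real:.1%}")
```

The CPU share comes out at under 10% of the elapsed time, so the process spent most of the 88 minutes waiting rather than computing. That would be consistent with heavy swapping, given what System Monitor showed, but the timing alone does not prove it.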

Conclusions:

Chromosome 15 had only one "hit" identified by Fasteris anyway, and it was in the centromere. To be retried with Chr17, which has many more and is still of a manageable size.

Files such as "RAX-PAX6-FP-Chr15_triangle_standard_chr15_filtered_sample.peaks" or "*control.peaks" contain a series of peaks specific to each sample, similar to the Fasteris peaks files but with more lines, because we kept everything down to a "minimum" (coverage?) of 2 rather than 10, 20, 50...

The "regions" files combine the two files, with -1 in place of peaks that do not exist in one sample or the other.

The ".wig" files are monsters of unmanageable size: they will not open as .txt files, but do take up a good 6-7 GB of RAM before crashing.
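
One way to peek inside a .wig.gz without loading it all into memory is to stream it line by line. A minimal sketch using only the Python standard library; the function name and the `preview` parameter are my own.

```python
import gzip
import sys


def summarize_wig(path, preview=5):
    """Stream a gzipped wig file; return (line count, first few lines).

    Only one line is held in memory at a time, so file size does not matter.
    """
    head = []
    count = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            count += 1
            if len(head) < preview:
                head.append(line.rstrip("\n"))
    return count, head


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_wig.py some_file.wig.gz
    total, first_lines = summarize_wig(sys.argv[1])
    print(f"{total} lines")
    for text in first_lines:
        print(text)
```

This only tells us how long the file is and what its header looks like; viewing the track itself would still need a genome browser or a tool built for large files.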

Chromosome 17 seems to have more hits, so try with -minimum 20 to see whether FindPeaks finds the same peaks as the Fasteris peaks file Cov20_Gap20, and fewer than the *10 file.
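
For the FindPeaks versus Fasteris comparison, once both peak lists have been reduced to (start, end) intervals on the same chromosome, the overlap check itself is simple. A sketch only: the reduction to (start, end) tuples is assumed (I have not written a parser for either file format) and the coordinates in the example are invented.

```python
def overlaps(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_peak_sets(findpeaks, fasteris):
    """Split two lists of (start, end) peaks into shared and set-specific."""
    shared = [p for p in findpeaks if any(overlaps(p, q) for q in fasteris)]
    only_findpeaks = [p for p in findpeaks if p not in shared]
    only_fasteris = [
        q for q in fasteris if not any(overlaps(q, p) for p in findpeaks)
    ]
    return shared, only_findpeaks, only_fasteris


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    fp = [(100, 250), (900, 1100), (5000, 5200)]
    fa = [(200, 300), (7000, 7100)]
    shared, only_fp, only_fa = compare_peak_sets(fp, fa)
    print("shared:", shared)
    print("FindPeaks only:", only_fp)
    print("Fasteris only:", only_fa)
```

This compares every peak with every other peak, which is fine for one chromosome at -minimum 20 but would need sorting first for much longer lists.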

On arriving Monday, rerun with "-dist_type 3" to see whether it takes less time.

Here we go!