User talk:Darek Kedra/sandbox 11
- AYB is a base caller for the Illumina GA II platform (initial release: 21 May 2009). No publication yet (as of Dec 2009).
- language: R with C helper functions
- Installation OK, but requires editing config files (location of /usr/lib64/R and location of the final install directory)
- to run:
run_ayb.sh [-nc=45] -prefix2=output_dir -prefix3=intensity_dir -tile=tile_prefix [-compression=gzip] [-matrix=/path/to/matrix.txt] [-mpi=5] [-I] [-niter=5] [-paired] [-saveR] [-tol=1e-5]
where:
  - compression: gzip, bzip2 or none. With "gzip" the intensity files are *_int.txt.gz, with "bzip2" they are *_int.txt.bz2, with "none" they are *_int.txt (default "none").
  - I: Read intensities in IPAR format (number of cycles must be given).
  - matrix: Use a predetermined phasing matrix (e.g. the one estimated by the Illumina pipeline). Switches off cross-talk estimation; faster, but may give worse results (optional).
  - mpi: Use MPI to run on multiple processors. The value is the number of processors; if omitted, the number available on the computer is used (optional).
  - nc: Number of cycles to analyse; should be less than or equal to the number of cycles in the intensity file (no default, required for IPAR).
  - niter: Number of full tolerance iterations to do (default 5).
  - paired: Treat the read as paired-end, split into two reads of length nc/2 (optional).
  - prefix2: Path to the directory in which output files are created (default "").
  - prefix3: Path to the directory from which the intensity files are read (default "").
  - saveR: Save final R data structures to tile.RData (optional).
  - tile: Prefix of the tile, e.g. s_1_0015. Filenames are completed automatically, so -tile=s_1 does all of lane 1 and -tile=s_1_00 does the first 99 tiles (no default, required).
  - tol: Tolerance for iterations (default 1e-5).
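The tile prefix completion and the paired option can be illustrated with a small Python sketch. This is not AYB code: the helper names are hypothetical, and it assumes that completion is a plain prefix match on the intensity file names and that an odd nc is rejected (the notes above do not say how AYB handles either case).

```python
# Illustrative sketch only: these helpers are hypothetical and not part of AYB.

# File name endings per -compression value, as listed in the option notes.
SUFFIXES = {"none": "_int.txt", "gzip": "_int.txt.gz", "bzip2": "_int.txt.bz2"}


def match_tiles(tile_prefix, filenames, compression="none"):
    """Return the intensity files a -tile prefix would select.

    Assumption: completion is a simple "starts with" match on the file name.
    """
    suffix = SUFFIXES[compression]
    return sorted(
        name for name in filenames
        if name.startswith(tile_prefix) and name.endswith(suffix)
    )


def split_paired(nc):
    """With -paired, nc cycles are treated as two reads of nc/2 cycles.

    Returns two (start, end) cycle ranges, end exclusive.
    Assumption: an odd nc is rejected here.
    """
    if nc % 2:
        raise ValueError("nc must be even to split into two reads of nc/2")
    half = nc // 2
    return (0, half), (half, nc)


if __name__ == "__main__":
    # 120 made-up gzip-compressed tiles of lane 1.
    files = ["s_1_%04d_int.txt.gz" % i for i in range(1, 121)]
    print(len(match_tiles("s_1_00", files, "gzip")))   # tiles 0001..0099
    print(match_tiles("s_1_0015", files, "gzip"))      # a single tile
    print(split_paired(90))
```

With four-digit tile numbers, the prefix s_1_00 can only match tiles 0001 to 0099, which is why that prefix covers the first 99 tiles.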
- current version: 0.3 (naive) bayesCall (speedup)
- W. C. Kao, K. Stevens, and Y. S. Song, “BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing,” Genome Research 19, no. 10 (2009): 1884.
- GNU Scientific Library (GSL) (version >= 1.12). Note that a CBLAS library might be required by GSL.
- Python (version >= 2.5)
- SciPy (version >= 0.7)
- NumPy (version >= 1.3)
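The Python-side minimum versions listed above could be checked before installing with a sketch along these lines. The function names are hypothetical and this is not part of BayesCall; GSL is not checked here. It targets a current Python interpreter and only uses `sys.version_info`, `importlib.import_module` and each module's `__version__` attribute.

```python
# Illustrative sketch only: hypothetical helper, not shipped with BayesCall.
import importlib
import re
import sys

# Minimum versions taken from the requirement list above.
MINIMUM = {"numpy": (1, 3), "scipy": (0, 7)}


def parse_version(text):
    """Turn '1.12.3rc1' into (1, 12, 3) using the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at a component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """True if the dotted version string is at least the minimum tuple."""
    return parse_version(version_text) >= tuple(minimum)


def check_python_requirements():
    """Report which Python-side requirements are satisfied on this machine."""
    report = {"python": sys.version_info[:2] >= (2, 5)}
    for name, minimum in MINIMUM.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = False  # not installed
            continue
        report[name] = meets_minimum(module.__version__, minimum)
    return report


if __name__ == "__main__":
    for name, ok in sorted(check_python_requirements().items()):
        print(name, "OK" if ok else "missing or too old")
```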
- SWIFT from Sanger
- requirements (packages):
- gsl, gsl-dev, libtiff, libtiff-dev, fftw3, fftw3-dev
- input: image data
- output: base calls (format?)
- Rolexa R package
R
source("http://bioconductor.org/biocLite.R")
biocLite("Rolexa")
R CMD INSTALL mclust_3.3.2.tar.gz
R CMD INSTALL fork_1.2.2.tar.gz
R CMD INSTALL Rolexa_1.2.0.tar.gz
- running instructions (a bit complicated): http://www.bioconductor.org/packages/2.5/bioc/vignettes/Rolexa/inst/doc/Rolexa-vignette.pdf
- output: base calls from an alternative, probabilistic base-calling method based on the fluorescence intensity quantifications; ambiguous bases are coded with the extended IUPAC alphabet
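For reference, the extended IUPAC alphabet assigns one letter to every non-empty set of the four bases. A minimal Python sketch of that encoding follows; the `iupac_code` helper is hypothetical and is not Rolexa's API, and how Rolexa decides which bases are plausible at a position is not covered here.

```python
# Illustrative sketch only: not Rolexa code. The table is the standard IUPAC
# nucleotide ambiguity code; iupac_code is a hypothetical helper name.
IUPAC = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("AG"): "R",    # purine
    frozenset("CT"): "Y",    # pyrimidine
    frozenset("CG"): "S",    # strong
    frozenset("AT"): "W",    # weak
    frozenset("GT"): "K",    # keto
    frozenset("AC"): "M",    # amino
    frozenset("CGT"): "B",   # not A
    frozenset("AGT"): "D",   # not C
    frozenset("ACT"): "H",   # not G
    frozenset("ACG"): "V",   # not T
    frozenset("ACGT"): "N",  # any base
}


def iupac_code(bases):
    """Return the IUPAC letter for a non-empty collection of candidate bases."""
    return IUPAC[frozenset(base.upper() for base in bases)]


if __name__ == "__main__":
    print(iupac_code("A"))          # unambiguous call
    print(iupac_code(["a", "g"]))   # A or G
    print(iupac_code("ACGT"))       # fully ambiguous
```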