User:Timothee Flutre/Notebook/Postdoc/2011/11/07

From OpenWetWare

(Difference between revisions)
(Entry title: test KMLOCAL)
(Entry title: transform page into "About R")
Line 6: Line 6:
| colspan="2"|
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
- ==Entry title==
+ ==About R==
+
+ * '''Motivation''': when analyzing data for any research project, it's essential to be able to quickly clean the raw data, transform them, plot intermediary results, calculate summary statistics, try various more-or-less sophisticated models, etc. This must be easily doable with small as well as large data sets, interactively or not. Several tools exist to fill exactly this need, and [http://en.wikipedia.org/wiki/R_%28programming_language%29 R] is only one of them, but I especially recommend it because it is built by statisticians (this means that the implemented models are numerous and state-of-the-art). Moreover, it's [http://cran.r-project.org/sources.html open-source], platform-independent, full of [http://cran.r-project.org/web/packages/available_packages_by_name.html packages], with [http://cran.r-project.org/web/views/ well-documented] resources, etc., so give it a try!
+
+ * '''Documentation''':
+ ** try it [https://www.codeschool.com/courses/try-r online]
+ ** official introductory [http://cran.r-project.org/doc/manuals/R-intro.html manual]
+ ** well-organized [http://www.statmethods.net/ how-to]
+ ** freely-available [http://adv-r.had.co.nz/ book] for advanced usage
+ ** [http://www.r-bloggers.com/ aggregator] of R blogs
+ ** compatible with [http://ess.r-project.org/ ESS] (emacs mode), besides other IDEs such as [http://www.rstudio.com/ RStudio]
+
+
+
* customize the built-in heatmap in R (inspired from [http://stackoverflow.com/questions/5687891/r-how-do-i-display-clustered-matrix-heatmap-similar-color-patterns-are-grouped/5694349 this]):
Line 35: Line 48:
   
   
  myheatmap(mydata.sort)
-
-
- * try [http://www.cs.umd.edu/~mount/Projects/KMeans/ KMLOCAL], yet another kmeans clustering program:
-
- wget http://www.cs.umd.edu/~mount/Projects/KMeans/kmlocal-1.7.2.tar.gz
-
- cat test_kmlocal.config
- show_assignments yes      # show final cluster assignments
- validate yes              # validate assignments
- dim 3                     # dimension
- data_size 1000            # number of data points
- seed 1859                 # random number seed
- read_data_pts matrix.txt  # read data points
- kcenters 4                # number of centers
- max_tot_stage 20 0 0 0    # number of stages
- seed 4                    # use different seed
- run_kmeans swap           # run with this algorithm
-
- kmltest -i test_kmlocal.config -o test_kmlocal.out
-
- But it doesn't work on a big dataset (bad_alloc).
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->

Revision as of 23:14, 7 October 2013


About R

  • Motivation: when analyzing data for any research project, it's essential to be able to quickly clean the raw data, transform them, plot intermediary results, calculate summary statistics, try various more-or-less sophisticated models, etc. This must be easily doable with small as well as large data sets, interactively or not. Several tools exist to fill exactly this need, and R is only one of them, but I especially recommend it because it is built by statisticians (this means that the implemented models are numerous and state-of-the-art). Moreover, it's open-source, platform-independent, full of packages, with well-documented resources, etc., so give it a try!
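  • Documentation:
      ◦ try it online: https://www.codeschool.com/courses/try-r
      ◦ official introductory manual: http://cran.r-project.org/doc/manuals/R-intro.html
      ◦ well-organized how-to: http://www.statmethods.net/
      ◦ freely-available book for advanced usage: http://adv-r.had.co.nz/
      ◦ aggregator of R blogs: http://www.r-bloggers.com/
      ◦ compatible with ESS (emacs mode, http://ess.r-project.org/), besides other IDEs such as RStudio (http://www.rstudio.com/)
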
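  • customize the built-in heatmap in R (inspired by this Stack Overflow answer: http://stackoverflow.com/questions/5687891/r-how-do-i-display-clustered-matrix-heatmap-similar-color-patterns-are-grouped/5694349):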
S <- 3  # nb of subgroups
V <- 7  # nb of observations
z <- matrix(c(0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,1,1,1,0,0), nrow=V, ncol=S, byrow=TRUE)

myheatmap <- function(z, out.file="") {
  # open the PDF device first, so that par() and layout() apply to it
  if (out.file != "")
    pdf(out.file)
  def.par <- par(no.readonly=TRUE)  # save the graphical parameters
  par(mar=c(4,5,3,2), font=2, font.axis=2, font.lab=2, cex=1.5, lwd=2)
  layout(mat=cbind(1, 2), widths=c(7,1))  # plot + legend
  mycol <- rev(heat.colors(4))  # low values in pale yellow, high values in red
  image(x=1:NCOL(z), y=1:NROW(z), z=t(z),
        xlim=0.5+c(0,NCOL(z)), ylim=0.5+c(0,NROW(z)),
        xlab="", ylab="Observations sorted by cluster", main="Custom heatmap",
        axes=FALSE, col=mycol)
  axis(1, 1:NCOL(z), labels=paste("subgroup", 1:NCOL(z)), tick=FALSE)
  par(mar=c(0,0,0,0))
  plot.new()
  # one legend entry per color, labelled with the upper bound of its bin
  legend("center", legend=sprintf("%.2f", seq(from=min(z), to=max(z), length.out=5)[-1]),
         fill=mycol, border=mycol, bty="n")
  par(def.par)  # restore the graphical parameters (this also resets the layout)
  if (out.file != "")
    dev.off()
}

myheatmap(z)
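
The y-axis label of the heatmap says that the observations are sorted by cluster, but the sorting step itself is not shown. Here is a minimal sketch of how such a sorted matrix could be obtained. It assumes that the clusters come from hierarchical clustering (hclust and cutree from base R); the names obs, k, cl, idx and obs.sort are hypothetical and do not appear in the original entry, and any other clustering method returning one label per row would work the same way.

```r
# Hypothetical input: 7 observations (rows) x 3 subgroups (columns),
# with rows in arbitrary order
obs <- matrix(c(1,1,1, 0,0,0, 0,1,1, 0,0,0, 1,0,0, 1,1,1, 0,0,0),
              nrow=7, ncol=3, byrow=TRUE)

# Assumption: cluster labels come from hierarchical clustering cut into k groups
k <- 2
hc <- hclust(dist(obs))
cl <- cutree(hc, k=k)  # one integer label per row

# Reorder the rows so that observations of the same cluster are contiguous
idx <- order(cl)
obs.sort <- obs[idx, , drop=FALSE]
```

The reordered matrix obs.sort can then be passed to the plotting function in place of the raw matrix, so that rows with similar patterns end up next to each other.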

