Wilke:Creating figures

From OpenWetWare
Notice: The Wilke Lab page has moved to http://wilkelab.org.
The page you are looking at is kept for archival purposes and will not be further updated.
THE WILKE LAB


Creating Publication-Quality Figures

Many different programs can be used to generate figures. Unfortunately, almost all graphing programs have poor default settings. Therefore, if you create a figure without changing the defaults, you can be almost certain that your figure is not ready to be published.

Here are a few general guidelines:

  • Add labels to all axes.
  • Make axis labels and any other labels in the figure sufficiently large. By default, most graphing programs use labels that are far too small. Labeling that looks good on the computer screen is often too small for print, because we tend to zoom figures to a large size while preparing them. To test whether labels are large enough, zoom out until the figure spans only 3 to 4 inches in width. If you can still comfortably read the labels, they are of a good size.
  • Minimize visual clutter by maximizing the amount of ink used to convey data relative to the total amount of ink. Therefore, remove any distracting background (such as a grid in the figure). Use half-open figures where there are axes at the bottom and the left, but not at the right and the top. Remove any lines that don't convey any information. Make sure that the lines that represent data are thicker than the axis lines.
  • Don't put a title on top of the figure. The title belongs in the figure caption.
  • Be mindful of color usage. Many people are color blind and may not be able to distinguish some of the different colors you are using. In general, if at all possible, a figure should still convey all its information when printed black-and-white.
  • If possible, avoid overly busy line styles, such as dotted or dashed lines, in particular many different types of dotted or dashed lines. Always avoid patterned fill styles in bar graphs.
  • Avoid pie charts, and in particular 3D pie charts. These types of graphs do not accurately convey quantitative information.
  • In general, MS Excel cannot produce acceptable figures and should be avoided. MS Excel also makes it difficult to export figures into commonly used formats such as EPS, PDF, or SVG. Many people achieve excellent results with R, gnuplot, Grace, or Matlab.
Example of a poorly designed figure.
An improved version of the same figure.
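As a rough illustration of several of these guidelines in base R (our own sketch, not the script behind the two figures above; the data and axis labels are invented placeholders), the code below draws labelled axes, a half-open L-shaped box via bty="l", data lines thicker than the axis lines, no title, and two greys that stay distinguishable in black and white. The device is opened at 3.5 inches wide so that label sizes can be judged at roughly print size.

```r
# Sketch applying the guidelines with base R graphics.
# The data and axis labels are made-up placeholders.
x <- 1:10
y1 <- sqrt(x)
y2 <- log(x + 1)

out.file <- file.path(tempdir(), "guidelines_demo.pdf")
# Open the device at roughly the final printed width (3.5 in), so label
# sizes are judged at print size rather than on a zoomed-in screen.
pdf(out.file, width = 3.5, height = 3, useDingbats = FALSE)
par(mai = c(0.6, 0.6, 0.1, 0.1), mgp = c(2, 0.5, 0), tck = -0.03,
    cex.lab = 1.1, cex.axis = 0.9)
# Data lines (lwd = 2.5) are thicker than the axis lines (default lwd = 1);
# bty = "l" draws only the left and bottom sides of the box; no 'main' title.
plot(x, y1, type = "l", lwd = 2.5, col = "black",
     ylim = range(y1, y2), bty = "l",
     xlab = "Time (h)", ylab = "Response (a.u.)")
# Second series in grey with point symbols, so it survives greyscale printing.
lines(x, y2, lwd = 2.5, col = "grey55")
points(x, y2, pch = 16, col = "grey55")
legend("bottomright", c("Series 1", "Series 2"), col = c("black", "grey55"),
       lwd = 2.5, pch = c(NA, 16), bty = "n")
invisible(dev.off())
```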

Creating figures with plain R

We produce most of our figures in the lab with R. The advantage of making figures with R is that creation of the figure is tightly integrated with the data analysis process, and that we can script and automate figure creation. The latter point is particularly important for reproducibility; a data file plus associated R script is all that is needed to regenerate the exact published figure.

The example R script below generates a typical figure. You can use this script as a template to generate similar figures. If you need to place two or more figures next to each other, the simplest way to achieve that with the template script is to use the split.screen() function.

library(Hmisc) # provides errbar()

# The data to plot.
mean.zdg <- c( 0.35208113, 0.07153585, -0.04377547, -0.12779811,
    -0.25646981, -0.18000377, -0.17827170, 0.03797358, -0.10975094,
     0.04821887, 0.03103208, 0.12747170, 0.07016604 ) # means
se.zdg <- c( 0.1192534, 0.1421630, 0.1408142, 0.1497453, 0.1508856,
     0.1492282, 0.1563277, 0.1174525, 0.1337940, 0.1261310,
     0.1473556, 0.1263988, 0.1258108 ) # standard errors 
window.start <- c( 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 111,
     121 ) # start position of the analysis window (in nucleotides)

# The code to generate the figure. Output file will be "T7_zdg.pdf".
# The option useDingbats=FALSE stops R from drawing small plotting symbols
# with the Dingbats font, which works around display problems in some
# open-source PDF readers.
pdf( "T7_zdg.pdf", width=4.5, height=4, useDingbats=FALSE )
par( mai=c(0.65, 0.65, 0.1, 0.05), mgp=c(2, 0.5, 0), tck=-0.03 )
plot( window.start, mean.zdg, type= 'l', col='black',
      ylim=c(-0.45, 0.45), axes=FALSE,
      xlab='Window start position (nt)',
      ylab=expression(bar(Z)[Delta][G]))
errbar( window.start, mean.zdg, mean.zdg+se.zdg, mean.zdg-se.zdg,
        add=TRUE, bg='grey60', pch=18, cex=1, xlab='', ylab='' )
abline( h=0, col='grey60', lty=2 )
# Tick marks at every window start and every 0.1 on the y axis; NA entries
# in 'labels' leave the corresponding tick unlabelled.
axis( 1, at=window.start,
      labels=c(1, NA, 21, NA, 41, NA, 61, NA, 81, NA, 101, NA, 121) )
axis( 2, at=c(-.4, -.3, -.2, -.1, 0, .1, .2, .3, .4),
      labels=c(-0.4, NA, -0.2, NA, 0, NA, 0.2, NA, 0.4) )
legend( "topright", "Phage T7", pch=c(18), col=c('black'), bty='n' )
dev.off()
Figure created by the example R script.
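A minimal sketch of the split.screen() approach mentioned above, using placeholder data rather than the T7 data: the device is divided into one row of two screens, and each screen is selected with screen() before plotting into it. The margin settings are borrowed from the template script; the device width and height are our own choice and should be adjusted so each panel keeps a sensible size.

```r
# Sketch: two panels side by side with split.screen().
# The plotted data are placeholders; put your own plotting code in each screen.
out.file <- file.path(tempdir(), "two_panels.pdf")
pdf(out.file, width = 7, height = 3.5, useDingbats = FALSE)
invisible(split.screen(c(1, 2)))   # one row, two columns -> screens 1 and 2

screen(1)                          # select the left panel
par(mai = c(0.65, 0.65, 0.1, 0.05), mgp = c(2, 0.5, 0), tck = -0.03)
plot(1:10, (1:10)^2, type = "l", bty = "l", xlab = "x", ylab = "x squared")

screen(2)                          # select the right panel
par(mai = c(0.65, 0.65, 0.1, 0.05), mgp = c(2, 0.5, 0), tck = -0.03)
plot(1:10, sqrt(1:10), type = "l", bty = "l", xlab = "x", ylab = "square root of x")

close.screen(all.screens = TRUE)   # leave split-screen mode
invisible(dev.off())
```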

Notes on using R

  • There are two well-known packages that automate much of the work of preparing multivariate graphics: lattice and ggplot2.
  • The ColorBrewer website can be helpful in selecting colors that reinforce the story your data tells. The website also helps you select color schemes that print in black and white. The RColorBrewer package then provides palettes to use these color schemes in your R plots.
  • Consider using the graphics devices in the Cairo package (e.g., CairoSVG(), CairoPDF()) to produce your graphics. They produce consistent graphics across platforms and output formats.
  • If you are using ggplot2, you can increase the share of ink in the graph that conveys data by setting custom theme options. The following options remove the background grid and the boxes around panel labels (current ggplot2 syntax; older versions used opts(), theme_blank(), and theme_text() instead of theme(), element_blank(), and element_text()):
g <- g + theme_bw()
g <- g + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank(), panel.background=element_blank())
g <- g + theme(strip.background=element_blank(), strip.text.y=element_text(angle=0))
  • If you are using ggplot2, this page shows, among other useful tips, a way to produce half-open plot borders. Some alternatives are described in the ggplot2 example below.
  • Exporting the graphic to SVG also allows you to use a graphical editor such as Inkscape to make a few final touches on the graph.
  • If there is something you want to do, search the web. Often a tutorial or mailing-list exchange will quickly come up and provide you with useful information. You can search the R-help mailing list and R documentation from right within R using RSiteSearch().
  • Here are a few websites you might start your search at:
    • R Seek: This Google-based search engine focuses on websites with R-related content.
    • R Graphics Gallery: This website allows you to browse through graphics produced by many R users.
    • Learning R: This blog has many tips and examples of ggplot2 usage. It also has a series, summarized in this post, with many examples of how to make multivariate figures with both lattice and ggplot2.
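To follow up on the ColorBrewer and black-and-white points above, the sketch below lists the palettes that RColorBrewer flags as color-blind friendly (via the colorblind column of brewer.pal.info) and estimates how light each color of the 4-class "Paired" palette would print in greyscale. The luminance weights (ITU-R BT.601) are our own choice for a quick check and are not part of RColorBrewer; looking at an actual greyscale printout remains the real test.

```r
# Sketch: pick a color-blind-friendly ColorBrewer palette and check how it
# might look in greyscale. Requires the RColorBrewer package.
library(RColorBrewer)

# brewer.pal.info describes every palette; its 'colorblind' column flags
# palettes considered safe for color-blind readers.
safe <- rownames(brewer.pal.info)[brewer.pal.info$colorblind]
print(safe)

pal <- brewer.pal(4, "Paired")

# Approximate greyscale lightness (0 = black, 1 = white) of each color using
# BT.601 luma weights on the R, G, B channels. Well-separated values suggest
# the colors will remain distinguishable in a black-and-white printout.
rgb.vals <- col2rgb(pal) / 255                  # 3 x 4 matrix, one column per color
grey.level <- colSums(rgb.vals * c(0.299, 0.587, 0.114))
print(round(grey.level, 2))
```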

Creating figures with ggplot2

The R script below generates a multivariate figure with ggplot2. It demonstrates several of the points from the previous section.


#Uncomment to install these packages if they are not already installed.
#install.packages("ggplot2")
#install.packages("RColorBrewer")
#install.packages("Cairo")

# Load ggplot2 
library(ggplot2)

# Load Cairo graphics devices
library(Cairo)

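# This code generates data with 2 independent variables 'a' and 'b' with 2 levels each,
# one dependent variable 'value', and 20 replicates per condition.
set.seed(1)                          # make the simulated data reproducible
a <- c(20, 37)
b <- c(0.50, 0.80)
num.reps <- 20
value <- outer(a, b, "*")            # mean of 'value' for each combination of a and b
conditions <- expand.grid(a, b)      # the 4 combinations, in the same order as as.vector(value)
num.trials <- length(a) * length(b) * num.reps
# Repeat the 4 condition means num.reps times; 'a' and 'b' are recycled to match.
A <- data.frame(value=rnorm(num.trials, mean=rep(as.vector(value), times=num.reps), sd=2),
                a=conditions[,1], b=conditions[,2])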

#Add data to the graph object, the data object should be of class 'data.frame'
g <- ggplot(data=A)

# Add a layer to the ggplot object. geom_density() will produce a kernel density plot 
# of a single variable on the x-axis. aes() determines the mapping of data to the 
# density plot. In this case we map the 'value' variable in the data to the x-axis of 
# the density plot, split the 'value' variable into groups by the interaction of the
# variables 'a' and 'b', and make each group a different color.
g <- g + geom_density(aes(x=value, group=interaction(a,b), 
                      colour=interaction(a, b, sep=" and ")))

# Here we change the color scale to one that will still be visible when printed in 
# black and white and emphasizes the paired nature of the groupings. 
g <- g + scale_colour_manual(name="conditions", 
                             values=RColorBrewer::brewer.pal(4, "Paired"))

# This line changes various aspects of the plot's appearance to be suited for black 
# and white print.
g <- g + theme_bw()

# Now we remove the background grid.
g <- g + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank(), 
               panel.background=element_blank())

# Now we produce an SVG file.
CairoSVG("overlay.svg", width=4.5, height=4)
print(g)
dev.off()
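If you prefer not to open and close a graphics device yourself, ggplot2's ggsave() can write a plot object directly; the file type is inferred from the extension and width and height are in inches by default. A minimal sketch, assuming g is the object built above (a stand-in plot is created otherwise). We write PDF here because recent ggplot2 versions may need the separate svglite package to save SVG.

```r
# Sketch: save the plot with ggsave() instead of an explicit device.
library(ggplot2)
# Assumption: 'g' is the ggplot object from the script above; create a
# stand-in if it does not exist, so this snippet runs on its own.
if (!exists("g")) {
  g <- ggplot(data.frame(value = rnorm(100)), aes(x = value)) + geom_density()
}
# Width and height are in inches (the default unit); PDF is chosen from the extension.
ggsave("overlay.pdf", plot = g, width = 4.5, height = 4)
```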

There are a number of ways to make plots with half-open borders when using ggplot2. You can use a custom theme as this page demonstrates. In some cases, the following standard theme options work as well:

g <- g + theme(panel.border=element_blank(), axis.line=element_line(colour="black"))

This solution can fail when producing graphics with many panels: sometimes these options leave some panels with no borders at all. When all else fails, you can export the graphic as an SVG file and delete or add borders as needed. One hack that a user has found useful for quickly deleting many borders is the following. Place an awk script named 'open-panels.awk' in your working directory, containing lines similar to the following:

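# For every SVG line stroked in grey50 (rgb 49.803922%), rewrite and blank
# some of the path fields so the border is no longer drawn as a closed box.
# The field offsets (counted back from the end of the line) assume the
# layout of the Cairo SVG output this was written for; check them against
# your own file before use.
/49\.803922%/ {
  i = NF - 15
  $(i)   = $(i+9)
  $(i+3) = $(i+9)
  $(i+4) = $(i+10)
  $(i+5) = $(i+6)
  $(i+6) = $(i+7)
  for (j = i + 7; j < i + 15; j++)
    $(j) = ""
  print
  next
}
# All other lines are copied unchanged.
{ print }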

With the awk script in place, you can then use a system call in R to produce an SVG with half-open borders.

system("awk -f open-panels.awk overlay.svg > overlay-open.svg")
Figure created by the example ggplot2 R script and post-processing.
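In more recent ggplot2 versions, theme_classic() gives half-open axes (bottom and left axis lines, no grid) without any SVG post-processing. A single-panel sketch with made-up data follows; in faceted plots, check the output, since whether every panel gets its own axis lines depends on the faceting options.

```r
# Sketch: half-open axes in recent ggplot2 via theme_classic(), which draws
# only the bottom and left axis lines and no background grid.
library(ggplot2)
set.seed(1)
# Made-up data: two groups with different means.
df <- data.frame(value = c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 18, sd = 2)),
                 group = rep(c("first", "second"), each = 50))
g2 <- ggplot(df, aes(x = value, colour = group)) +
  geom_density() +
  theme_classic()
pdf("classic.pdf", width = 4.5, height = 4, useDingbats = FALSE)
print(g2)
invisible(dev.off())
```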

A figure similar to the previous one could have been made with base graphics in plain R. A package like ggplot2, however, makes it much easier to produce a graphic with multiple panels, as in the following example.

g <- ggplot(data=A) + geom_density(aes(x=value))

# We make a paneled plot with 'a' varying across the rows and 'b' varying across the 
# columns. The 'label_both' value for the labeller option puts both the variable name
# and the variable value into the panel labels. The default is to just put the value 
# into the label. For including expressions in the labels, use 'labeller=label_parsed'. 
g <- g + facet_grid(a~b, labeller=label_both)

g <- g + theme_bw() + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank(),
                            panel.background=element_blank())

# This line removes the background boxes from the panel labels and makes the row panel
# labels horizontally oriented.
g <- g + theme(strip.background=element_blank(), strip.text.y=element_text(angle=0))


CairoSVG( "facets.svg", width=6, height=6)
print(g)
dev.off()
Panelled figure created by the example ggplot2 R script and post-processing.
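To illustrate the earlier remark that the overlay figure could also be made with base graphics, here is a sketch that re-simulates data shaped like A (same parameters; this is our assumption, not the original data) and draws one density() curve per condition. The four hex colors are the 4-class ColorBrewer "Paired" palette written out so the example needs no extra packages. Note the manual bookkeeping (axis ranges, colors, legend) that ggplot2 handled automatically in the overlay script.

```r
# Sketch: the overlay figure with base graphics only.
set.seed(42)
a <- c(20, 37)
b <- c(0.50, 0.80)
num.reps <- 20
conditions <- expand.grid(a = a, b = b)
# Assumption: same design as the ggplot2 script (4 conditions x 20 replicates).
A <- data.frame(a = rep(conditions$a, times = num.reps),
                b = rep(conditions$b, times = num.reps))
A$value <- rnorm(nrow(A), mean = A$a * A$b, sd = 2)

# One group per combination of a and b, labelled e.g. "20 and 0.5".
groups <- split(A$value, interaction(A$a, A$b, sep = " and "))
dens <- lapply(groups, density)                        # kernel density estimate per group
cols <- c("#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C")  # ColorBrewer "Paired", 4 classes

pdf("overlay-base.pdf", width = 4.5, height = 4, useDingbats = FALSE)
par(mai = c(0.65, 0.65, 0.1, 0.05), mgp = c(2, 0.5, 0), tck = -0.03)
# Empty half-open plot sized to hold all four curves.
plot(NULL, xlim = range(sapply(dens, function(d) range(d$x))),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     bty = "l", xlab = "value", ylab = "density")
for (k in seq_along(dens)) lines(dens[[k]], col = cols[k], lwd = 2)
legend("topright", names(dens), col = cols, lwd = 2, bty = "n", title = "conditions")
invisible(dev.off())
```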