McClean: Plotting Stacked Histograms: Difference between revisions

From OpenWetWare
Revision as of 13:28, 18 July 2013

Summary

This explains the basics of plotting histograms stacked vertically. Stacking makes it easy to see how a distribution shifts, for instance the fluorescence of a population of cells analyzed by flow cytometry. Example code and data are listed under "Files" at the end of the page.

Example

Your data could be anything. In this example, the variable "Data" (stored in MakeStackHistData.mat) contains five (5) rows, each of which contains 9000 fluorescence readings from a flow cytometry experiment. Each row represents a time point, and GFP induction increases with time.


%% Preliminaries:

close all; clear all;
load('MakeStackHistData.mat')
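If you want to try the steps below without MakeStackHistData.mat, you can generate a stand-in Data matrix of the same shape (5 rows × 9000 readings). This is a minimal sketch. The values are made-up log-normal numbers whose median rises from row to row to mimic GFP induction. They are not the experimental data, and the medians are arbitrary assumptions.

```matlab
% Synthetic stand-in for the variable Data in MakeStackHistData.mat.
% All values are made up: log-normal "fluorescence" readings whose median
% rises from row to row, loosely mimicking GFP induction over five time points.
rng(0);                                     % reproducible random numbers
nTimepoints = 5;                            % rows of Data
nCells = 9000;                              % readings per row
medians = logspace(1.5, 3, nTimepoints);    % hypothetical medians, about 32 to 1000 a.u.
Data = zeros(nTimepoints, nCells);
for i = 1:nTimepoints
    % exp(0.5*randn) is log-normal with median 1, so row i has median medians(i)
    Data(i,:) = medians(i) * exp(0.5 * randn(1, nCells));
end
disp(size(Data))   % 5 rows by 9000 readings
```

Run it in place of the load(...) line. The rest of the example only relies on Data having one row per time point (five rows).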

Choose bins (you probably want to use the same bins for every histogram, since they will be stacked along one y-axis) and then bin your data using the Matlab "hist" command. We also record the mean of each distribution, since it is used to color the histograms later.

%Set up bins (we are making histograms of flow cytometry data so we chose logarithmically spaced bins):
bins=logspace(0,4,60);
x=bins;

%Bin the data using "hist" and keep track of the number of elements "n" in each bin "x" for each row in "Data".  Also keep track of the mean of each row of "Data":

HistData=[];  
Means=[];

for i=1:5
    [n,x]=hist(Data(i,:),x);
    HistData=[HistData; n./sum(n)];
    Means=[Means mean(Data(i,:))];
end
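The loop above grows HistData and Means on every pass. That is harmless for five rows, but for many more rows a preallocated variant avoids repeated copying. Below is a minimal sketch of that variant, run on a small synthetic matrix (hypothetical log-normal values, not the flow cytometry data). When hist receives a vector as its second argument, it treats the entries as bin centers. Readings outside the range of the bins are counted in the first or last bin.

```matlab
% Preallocated version of the binning loop.  Data is a small synthetic
% stand-in (hypothetical log-normal values), not the real readings.
rng(1);
nRows = 5;
Data = exp(3 + randn(nRows, 1000) + repmat(0.7 * (0:nRows-1)', 1, 1000));
x = logspace(0, 4, 60);                 % same log-spaced bin centers as above

HistData = zeros(nRows, numel(x));      % one normalized histogram per row
Means = zeros(1, nRows);                % mean of each row, used later for coloring

for i = 1:size(Data, 1)
    n = hist(Data(i,:), x);             % counts per bin; x holds the bin centers
    HistData(i,:) = n ./ sum(n);        % fraction of readings in each bin
    Means(i) = mean(Data(i,:));
end
```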

We set up a colormap so that our histograms change in color as the mean of their distribution increases:


%% Define a colormap for the histograms that will make the histograms brighter as the mean of the distribution increases

% In this case we chose to make the histograms brighter green at higher
% mean values since the flow cytometry data is of GFP.
 
%Define a color map
MMColorMap=zeros(5,3);

%Define colors so that they scale with the difference between the mean
%fluorescence at a given timepoint and the mean at time 0


MMdiff=Means-Means(1);
MMdiff=MMdiff./(max(MMdiff));


MMColorMap(1:end,2)=MMdiff;  %The colormap is RGB, so changing the second column changes the green values.

%Set up the figure and axis properties:
h=figure; hold on;  % h is the figure handle used later by saveas

set(gca,'XScale','log')
set(gca,'XLim',[10,2000])
set(gca,'PlotBoxAspectRatioMode','manual')
set(gca,'PlotBoxAspectRatio',[1 3 1])
set(gca,'FontSize',12)
set(gca,'XTick',[100 1000 10000 100000])
set(gca,'YTick',[0 1])
ylabel('Fraction of Cell Population','FontSize',14)
xlabel('Fluorescence [a.u.]','FontSize',14)
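The green scaling above assumes the first time point has the lowest mean, which holds for this induction time course. MMdiff is divided by its own maximum, so a time point whose mean fell below Means(1) would get a negative green value, which is not a valid RGB color. If a later mean could drop below the first one, min-max scaling always lands in [0, 1]. This is a swapped-in alternative, not the method used above. A minimal sketch with hypothetical means:

```matlab
% Min-max scaling of distribution means to green intensities in [0, 1].
% Means holds hypothetical values; the second one is below the first.
Means = [120 95 310 640 900];

spread = max(Means) - min(Means);
if spread > 0
    greenLevel = (Means - min(Means)) ./ spread;   % lowest mean -> 0, highest -> 1
else
    greenLevel = ones(size(Means));                % all means equal: use full green
end

MMColorMap = zeros(numel(Means), 3);   % one RGB row per histogram
MMColorMap(:, 2) = greenLevel(:);      % column 2 is the green channel
```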


Plot the histograms stacked along the y-axis, shifting histogram i up by i*spacing. We chose the value of spacing empirically so that the plot "looks good":


spacing=.15;  %Spacing along the y-axis chosen empirically 

for i=1:5
    % Closed outline: top of histogram i left to right, then its baseline right to left
    fill([x, fliplr(x)],[HistData(i,:)+i*spacing, i*spacing*ones(1,length(x))],MMColorMap(i,:),'LineStyle','none')
    semilogx(x,HistData(i,:)+i*spacing,'LineWidth',3,'Color','k');
end
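The value spacing=.15 was picked by eye, and the Notes below mention automatic selection of the spacing as a possible improvement. One hypothetical heuristic, not part of the original protocol, ties the spacing to the tallest bin in HistData. With this rule, only a chosen fraction of the tallest peak rises above the baseline of the next histogram. A minimal sketch with a made-up three-row HistData:

```matlab
% Hypothetical heuristic for the vertical spacing of stacked histograms.
% HistData is a made-up 3-row example; each row is a normalized histogram.
HistData = [0.10 0.40 0.30 0.20;
            0.05 0.25 0.45 0.25;
            0.02 0.08 0.50 0.40];

overlap = 0.3;   % assumed: fraction of the tallest peak allowed above the next baseline
spacing = (1 - overlap) * max(HistData(:));   % 0.7 * 0.5 = 0.35

% Baselines follow the same i*spacing rule as the plotting loop above.
baselines = (1:size(HistData, 1)) * spacing;
```

Larger values of overlap pack the histograms more tightly. An overlap of 0 guarantees that no histogram crosses the baseline of the next one.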

Save your figure in a variety of formats for later use (recall that we made h our figure handle):

saveas(h,'ExampleStackedHistograms','fig')
saveas(h,'ExampleStackedHistograms','png')
saveas(h,'ExampleStackedHistograms','ai')  % Adobe Illustrator format; may not be available in newer Matlab releases
saveas(h,'ExampleStackedHistograms','pdf')
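Because the same handle h is saved several times, the saveas calls can also be written as a loop over format names. This is a minimal sketch using a placeholder figure and a placeholder base file name. It includes 'svg' as an example of another vector format that saveas accepts.

```matlab
% Save one figure in several formats with a loop.
% The plot is a placeholder standing in for the stacked-histogram figure h.
h = figure('Visible', 'off');
plot(1:10, (1:10).^2);

baseName = 'ExampleFigure';        % placeholder file name
formats = {'png', 'pdf', 'svg'};   % each name doubles as the file extension
for k = 1:numel(formats)
    saveas(h, [baseName '.' formats{k}], formats{k});
end
close(h);
```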
 

Code

You can copy and paste the code below into a Matlab m-file to run the example shown above. You will also need the example data file MakeStackHistData.mat (see "Files" below):

%% Preliminaries:

close all; clear all;
load('MakeStackHistData.mat')

%% Define the bins to use for our data (you will need to adjust this depending on your data):

%In this case we are using the same bins for each data set.  You probably
%want to do this when you are plotting stacked histograms.

bins=logspace(0,4,60);
x=bins;

%% Bin your data using Matlab's "hist" function.

%The variable "n" will be the number in each bin described by the variable
%"x".  HistData will become a matrix of the normalized bins (normalized to
%the total number of elements).  Means will become a vector of the mean
%value for each distribution, which we will use when coloring our
%histograms (so that colors roughly correspond to the mean of the
%distribution).

HistData=[];  
Means=[];

for i=1:5
    [n,x]=hist(Data(i,:),x);
    HistData=[HistData; n./sum(n)];
    Means=[Means mean(Data(i,:))];
end



%% Define a colormap for the histograms that will make the histograms brighter as the mean of the distribution increases

% In this case we chose to make the histograms brighter green at higher
% mean values since the flow cytometry data is of GFP.
 
%Define a color map
MMColorMap=zeros(5,3);

%Define colors so that they scale with the difference between the mean
%fluorescence at a given timepoint and the mean at time 0


MMdiff=Means-Means(1);
MMdiff=MMdiff./(max(MMdiff));


MMColorMap(1:end,2)=MMdiff;

%Set up the figure and axis properties:
h=figure; hold on;  % h is the figure handle used later by saveas

set(gca,'XScale','log')
set(gca,'XLim',[10,2000])
set(gca,'PlotBoxAspectRatioMode','manual')
set(gca,'PlotBoxAspectRatio',[1 3 1])
set(gca,'FontSize',12)
set(gca,'XTick',[100 1000 10000 100000])
set(gca,'YTick',[0 1])
ylabel('Fraction of Cell Population','FontSize',14)
xlabel('Fluorescence [a.u.]','FontSize',14)

%% Plot the histograms along the y-axis

spacing=.15;  %Spacing along the y-axis chosen empirically 

for i=1:5
    % Closed outline: top of histogram i left to right, then its baseline right to left
    fill([x, fliplr(x)],[HistData(i,:)+i*spacing, i*spacing*ones(1,length(x))],MMColorMap(i,:),'LineStyle','none')
    semilogx(x,HistData(i,:)+i*spacing,'LineWidth',3,'Color','k');
end


%% Save the histogram figure
saveas(h,'ExampleStackedHistograms','fig')
saveas(h,'ExampleStackedHistograms','png')
saveas(h,'ExampleStackedHistograms','ai')  % Adobe Illustrator format; may not be available in newer Matlab releases
saveas(h,'ExampleStackedHistograms','pdf')

Notes

Please feel free to post comments, questions, or improvements to this protocol. Happy to have your input!

  • Megan N McClean 17:27, 17 July 2013 (EDT): This ought to get you started. There are many improvements that could be made, for instance a more sophisticated/attractive color scheme or automatic selection of the spacing along the y-axis. Knock yourselves out!

Files

Script for stacked histogram example (MakeStackedHistograms.m)

Data for stacked histogram example (MakeStackHistData.mat)

Contact
