MATLAB Tip of the Day: Better Histograms
As I've mentioned before, I'm mentoring an undergrad and teaching him to use MATLAB. He's finishing his Honors Thesis this week, and so we've been cleaning up his figures to make comparisons easier for his readers.
The Problem
My undergrad has a lot of histograms to compare, and the MATLAB defaults can get in the way. You can set the number of bins, but hist spaces those bins between the minimum and maximum of whatever data you give it. If your data sets span slightly different ranges, the bins end up with different widths and different lower and upper limits. The defaults are fine for a single histogram, but if you want to compare multiple figures, as we do, it gets frustrating fast.[1]
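To make the problem concrete, here's a minimal sketch with two small made-up data sets (the numbers are invented for illustration, not taken from the thesis data). Both calls ask for the same number of bins, but the bin centers and widths that come back don't match.

```matlab
% Two made-up data sets that span slightly different ranges
yA = [2 5 9 14 20 26 33 41];    % range 2 to 41
yB = [6 8 15 21 27 35 44 48];   % range 6 to 48
nBins = 4;

% hist spreads nBins equal-width bins between min(y) and max(y)
[~, centersA] = hist(yA, nBins);
[~, centersB] = hist(yB, nBins);

% Bin width is the spacing between neighboring bin centers
widthA = centersA(2) - centersA(1);   % (41 - 2) / 4 = 9.75
widthB = centersB(2) - centersB(1);   % (48 - 6) / 4 = 10.5

disp(centersA)
disp(centersB)
```

Because the centers differ, bar k of one histogram doesn't cover the same y range as bar k of the other, which is what makes side-by-side comparison awkward.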
Our Solution
Here's what we did to get the result we wanted:
- Add a common minimum point and a common maximum point to each data set.
- Create the histogram table.
- Subtract one count from the first and last bins (these hold the minimum and maximum points we added).
- Plot the histogram.
Code
Take data sets, add max and min points
% x = x values for yA and yB (not needed for histogram)
% yA = data set A
% yB = data set B
% nBins = number of bins to use in histograms
% Set extrema
minY = 0; %there are no values lower than this
maxY = 50; %there are no values higher than this
% Add in extrema placeholders to adjust bins to a common scale
% (the (:).' forces a row vector, so this works for row or column data)
yA2 = [yA(:).' minY maxY];
yB2 = [yB(:).' minY maxY];
Create the table of histogram data
% Bin data
[countsA2, binsA2] = hist(yA2, nBins);
[countsB2, binsB2] = hist(yB2, nBins);
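Why this works: once every data set contains minY and maxY, hist always spreads its nBins bins across the same range, so the bin width is (maxY - minY) / nBins and the centers sit half a width in from each end. The sketch below checks that arithmetic on a made-up data set; the values in yA and the names binWidth, expectedCenters, and maxDiff are just for illustration.

```matlab
% Same extrema as above; 10 bins chosen for easy arithmetic
minY = 0;
maxY = 50;
nBins = 10;

% Made-up data, all strictly inside [minY, maxY]
yA = [3 7 12 18 22 22.5 31 44];
yA2 = [yA minY maxY];

[~, binsA2] = hist(yA2, nBins);

% Expected layout: width 5, centers at 2.5, 7.5, ..., 47.5
binWidth = (maxY - minY) / nBins;
expectedCenters = minY + binWidth * ((1:nBins) - 0.5);

% Largest disagreement between hist's centers and the formula
maxDiff = max(abs(binsA2 - expectedCenters));
disp(maxDiff)
```

Any other data set padded with the same minY and maxY gets exactly these centers, which is the whole point of the trick.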
Remove max and min placeholders
% Remove extrema placeholders from counts
histEnds = zeros(size(binsA2));
%data sets A and B have the same nBins, so binsA2 and binsB2 are same length
histEnds(1) = 1; %removes minimum placeholder
histEnds(end) = 1; %removes maximum placeholder
countsA3 = countsA2 - histEnds;
countsB3 = countsB2 - histEnds;
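A quick sanity check is worth doing at this point. The sketch below, again on made-up data, confirms two things: the corrected counts add up to the number of real data points, and they match what hist returns when you hand it the bin centers explicitly. Passing a vector of centers to hist is a different technique from the placeholder trick above; I'm using it here only as a cross-check. One caveat with that form: hist puts any values outside the range of the centers into the end bins, so it only agrees with the trick when the data really does stay between minY and maxY.

```matlab
minY = 0;
maxY = 50;
nBins = 10;

% Made-up data, chosen so that no value sits exactly on a bin edge
yA = [3 7 12 18 22 22.5 31 44];

% Placeholder trick, as in the steps above
[countsA2, binsA2] = hist([yA minY maxY], nBins);
histEnds = zeros(size(binsA2));
histEnds(1) = 1;     % minimum placeholder
histEnds(end) = 1;   % maximum placeholder
countsA3 = countsA2 - histEnds;

% Check 1: only the two placeholders were removed
totalMatches = (sum(countsA3) == numel(yA));

% Check 2: same counts as hist with explicit bin centers
centers = minY + (maxY - minY) / nBins * ((1:nBins) - 0.5);
countsDirect = hist(yA, centers);
countsMatch = isequal(countsA3, countsDirect);

disp(countsA3)   % 1 1 1 1 2 0 1 0 1 0
```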
Create plot
% Plot histograms
hold on
bar(binsA2, countsA3, 'b')
bar(binsB2, countsB3, 'r')
hold off
% Labels
set(gca, 'XLim', [minY maxY])
xlabel('y value')
ylabel('counts')
legend({'A', 'B'})
Instead of the histogram on the left (also shown above), you'll get the one on the right.
The full tutorial script is here. If you run the script, you'll get a figure showing the data points used[2] and two histograms: one using the defaults, and one using identical bins.
If you've got a better way to do this, I'm happy to hear it. I love learning new tricks.
1: I realize that you don't need this trick for putting two histograms on the same set of axes. In our actual use case, the histograms could be made days apart and plotted on separate axes in separate figures. They're plotted together here so that you can see the clear difference between the trick's output and the defaults in a single figure.
2: The points in the tutorial script are generated by pulling from a Gaussian distribution so I didn't have to come up with actual data for the example.