giovedì 26 aprile 2007

How to Superimpose Histograms

Function inspired by the code of Martin Maechler found on the R-List at http://tolstoy.newcastle.edu.au/R/help/06/06/30059.html
superhist2pdf <- function(x, filename = "super_histograms.pdf",
dev = "pdf", title = "Superimposed Histograms", nbreaks ="Sturges") {
junk = NULL
grouping = NULL
for(i in 1:length(x)) {
junk = c(junk,x[[i]])
grouping <- c(grouping, rep(i,length(x[[i]]))) }
grouping <- factor(grouping)
n.gr <- length(table(grouping))
xr <- range(junk)
histL <- tapply(junk, grouping, hist, breaks=nbreaks, plot = FALSE)
maxC <- max(sapply(lapply(histL, "[[", "counts"), max))
if(dev == "pdf") { pdf(filename, version = "1.4") } else{}
if((TC <- transparent.cols <- .Device %in% c("pdf", "png"))) {
cols <- hcl(h = seq(30, by=360 / n.gr, length = n.gr), l = 65, alpha = 0.5) }
else {
h.den <- c(10, 15, 20)
h.ang <- c(45, 15, -30) }
if(TC) {
plot(histL[[1]], xlim = xr, ylim= c(0, maxC), col = cols[1], xlab = "x", main = title) }
else { plot(histL[[1]], xlim = xr, ylim= c(0, maxC), density = h.den[1], angle = h.ang[1], xlab = "x") }
if(!transparent.cols) {
for(j in 2:n.gr) plot(histL[[j]], add = TRUE, density = h.den[j], angle = h.ang[j]) } else {
for(j in 2:n.gr) plot(histL[[j]], add = TRUE, col = cols[j]) }
invisible()
if( dev == "pdf") {
dev.off() }
}
# How to use the function:
d1 = rnorm(1:100)
d2 = rnorm(1:100) + 4
# the input object MUST be a list!
l1 = list(d1,d2)
superhist2pdf(l1, nbreaks="Sturges")

15 commenti:

  1. Nice, thanks :)
    Just change the function name to superhist when calling it, as you created the function with that name. Or declare the function itself with the name superhist2pdf.

    RispondiElimina
  2. How can I change the number of breaks in the histogram?

    RispondiElimina
  3. Jake, thanks for the point. I modified the function so now it has a nbreaks argument;
    see ?hist to check how to use that argument.
    By the way, you can add how many arguments are available for the hist funcion, appending the desired arguments after hist in the below piece of code:
    histL <- tapply(junk, grouping, hist, breaks=nbreaks, plot = FALSE)

    RispondiElimina
  4. Great code, thanks!

    BTW, I cannot switch to the probability densities mode. If I add the argument freq=FALSE after breaks=nbreaks, I get the warning "In hist.default... argument 'freq' is not made use of" when executing the function superhist2pdf.

    Any ideas?

    Thanks again.

    RispondiElimina
  5. Dear Chiara,
    SFTD, the problem seems to be explained in the following paragraph of the man page for hist (?hist):

    freq: logical; if 'TRUE', the histogram graphic is a representation of frequencies, the 'counts' component of the result; if 'FALSE', probability densities, component 'density', are plotted (so that the histogram has a total area of one). Defaults to 'TRUE' _if and only if_ 'breaks' are equidistant
    (and 'probability' is not specified).

    The specific warning you pointed out seems to be raised if you try to use the plot=F and the freq argument (either freq=T or freq=F) together.
    If the 'breaks' you specified are not equidistant (nbreaks argument) the histogram will plot probability densities.

    RispondiElimina
  6. Hi,
    very nice code. Still, I am a little confused. I am not an R expert. I did not really understand what changes should be done on the code to get the probability densities.
    Thank you for your help,
    Amir

    RispondiElimina
  7. alternative:

    hist(x)
    hist(y, add=TRUE)

    RispondiElimina
  8. The problem with the default add argument is that it doesn't take account of the range of the data/distributions.

    RispondiElimina
  9. Hi,
    This looks like great stuff but I am having problems. I keep playing around with it but I get two different errors depending on what I do. The errors are "null device (line break) 1" and "windows (line break) 2". Any ideas? I am on a windows machine if that helps.
    Thanks in advance!
    -Sean

    RispondiElimina
  10. I checked the code on a Windows machine and it seems to work! null device (line break) 1 is not an error, it basically says that the image in pdf is ready and the devide pdf, used to render the image was closed. Check your working directory (getwd() and dir()) you should find a pdf named "super_histograms.pdf" with the produced image!

    RispondiElimina
  11. Lovely function! Just what I was looking for!
    One little thing: If you assign range(junk, na.rm = TRUE) to xr then it can deal with missing values.
    Cheers,

    David

    RispondiElimina
  12. Thanks David! Very useful addition to this thread!

    RispondiElimina
  13. Thanks so much for this code!

    I am being a bit dim, and can't work out how to convert it to greayscale (for a paper) - any thoughts?

    Many thanks again!

    Lauren

    RispondiElimina
  14. The no brain answer would be to use some graphics editor to convert from RGB to Gray scale.
    In Mac OS X you can use the preview application: save as > Quartz Filter > Gray Tone.
    In R you can use the EBImage Bioconductor package to convert the color mode from Color to Grayscale in this way:
    img= readImage("super_histograms.pdf")
    colorMode(img) <- Grayscale
    display(img)

    You can find useful hints about creating overlapping histograms at the following links: http://stackoverflow.com/questions/3541713/how-to-plot-two-histograms-together-in-r http://stackoverflow.com/questions/5328452/overlaid-histograms-in-r-ggplot2-preferred

    HIH

    RispondiElimina