Beeswarm chart for categorical data

Biomedical journal articles are full of categorical data, typically comparing control and case groups with barplots and error whiskers. Although popular, this type of chart can hide patterns in the underlying data and is discouraged by statisticians. Alternatives include boxplots, violin plots and strip charts, but lately I've become a big fan of beeswarm charts. Like violin plots, beeswarms show the distribution, but they also show every individual data point, which is helpful when sample size varies between categories.

The way I like to use beeswarm charts is to first draw a boxplot and then overlay the beeswarm. With that approach, the median and interquartile range are shown along with the actual data points. Here's the result.
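To check which numbers the boxplot layer actually draws, here is a small sketch using base R only (the variable names are my own). It asks boxplot() for its statistics without plotting anything. Note that the box marks the median and Tukey's hinges, which are close to, but not always identical to, the quartiles returned by quantile(). The mean is not drawn at all.

```r
# One vector of petal lengths per species
l <- split(iris$Petal.Length, iris$Species)

# plot = FALSE returns the statistics instead of drawing the chart;
# $stats is a 5 x 3 matrix with one column per species
stats <- boxplot(l, plot = FALSE)$stats
dimnames(stats) <- list(c("lower whisker", "lower hinge", "median",
                          "upper hinge", "upper whisker"), names(l))
print(stats)

# The group means, which the boxplot does not show, for comparison
print(sapply(l, mean))
```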

Some notes on how to make this chart. The first step is to collect the data into a list of vectors, one per category; the example code below does this for the iris dataset. Then draw the boxplot, and finally the beeswarm with add=TRUE so the points are overlaid on the boxplot. Notice that the beeswarm call uses pch=19 to draw filled circles.

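library("beeswarm") # use install.packages("beeswarm") if the package is not yet installed

# one subset of the iris data per species
set <- subset(iris, Species == "setosa")
ver <- subset(iris, Species == "versicolor")
vir <- subset(iris, Species == "virginica")

# named list of vectors: one element per category, the names become the axis labels
l <- list("setosa" = set$Petal.Length,
  "versicolor" = ver$Petal.Length,
  "virginica" = vir$Petal.Length)

# draw the boxplot first, then overlay the individual points as filled circles (pch = 19)
boxplot(l, col = "white", main = "petal length", ylab = "cm")
beeswarm(l, add = TRUE, pch = 19)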
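If the data already sit in a data frame, a shorter variant is possible. This is a sketch, assuming a recent version of the beeswarm package: both boxplot() and beeswarm() accept a formula, so the list does not have to be built by hand. Storing the return value is my own addition; beeswarm() invisibly returns a data frame with the plotted point positions.

```r
library("beeswarm") # install.packages("beeswarm") if the package is not yet installed

# Petal.Length ~ Species splits the measurements by the levels of Species
boxplot(Petal.Length ~ Species, data = iris, col = "white",
        main = "petal length", ylab = "cm")

# add = TRUE overlays the points on the existing boxplot;
# pts holds one row per plotted point (returned invisibly)
pts <- beeswarm(Petal.Length ~ Species, data = iris, add = TRUE, pch = 19)
```

The formula version orders the groups by the factor levels of Species, whereas the list version lets you choose the names and order explicitly.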


See also: Beeswarm charts in ggplot2
