Showing posts from 2022

Beeswarm chart for categorical data

Biomedical journal articles are full of categorical data, typically comparing control and case groups with barplots and whiskers. While popular, this type of chart can hide the underlying patterns in the data and is discouraged by statisticians. There are alternatives such as boxplots, violin plots and strip charts, but I've become a big fan of beeswarm charts lately. The reason is that beeswarms show the distribution just like violin plots, but they also show the individual points, which is helpful when sample size varies between categories.

The way I like to employ beeswarm charts is to first create a boxplot and then overlay the beeswarm. With that approach, the median and interquartile range are shown, along with the actual data points. Here's the result.

Some notes on how to make this chart: the first step is to collect the data into a list of vectors (see the example code below for the iris dataset). Then make the boxplot. And finally, the beeswarm plot with a…
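A minimal sketch of the boxplot-plus-beeswarm steps described above, assuming the CRAN package `beeswarm` is installed. This is not the original post's code: plotting `Sepal.Length` by `Species`, the variable name `sepal_length`, and the colours and point symbols are illustrative choices.

```r
# Assumes the CRAN package 'beeswarm' is installed: install.packages("beeswarm")
library(beeswarm)

# Step 1: collect the data into a list of vectors, one vector per category
sepal_length <- split(iris$Sepal.Length, iris$Species)

# Step 2: draw the boxplot; outline = FALSE suppresses the boxplot's own
# outlier points so they are not drawn twice once the beeswarm is overlaid
boxplot(sepal_length,
        outline = FALSE,
        col = "white",
        ylab = "Sepal length (cm)")

# Step 3: overlay the individual observations on the same axes (add = TRUE)
beeswarm(sepal_length,
         add = TRUE,
         pch = 16,
         col = "steelblue")
```

Setting `outline = FALSE` is a design choice worth keeping in this layered approach: every point, including any outliers, is already drawn by the beeswarm layer.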

Mass download from Google Drive using R

Google Drive is great for sharing documents and other small files, but it's definitely not suited to moving many large files around. For example, I just received 170 FASTQ files that are each about 200 MB (roughly 34 GB in total). If you use the browser to download the whole folder, the web app will zip the contents for you, which takes a LOOONG time. Alternatively, you can download the files one by one, which is annoying and prone to human error. You can insist that your collaborators transfer the data another way, but there are not many user-friendly and economical approaches. Your biologist collaborators probably won't be able to use rsync to get the data to you safely, and fast, convenient tools for moving large files around, like Hightail, cost a lot of money.

A good solution to this problem is the R package googledrive, which enables command-line automation of tasks that would take a long time to do manually. The package vignette has a good overview of the main commands…
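A minimal sketch of how a whole shared folder might be downloaded with googledrive, assuming the package is installed and you can authenticate with an account that has access to the folder. `download_drive_folder()` and its arguments are hypothetical names introduced here, not part of googledrive; the helper only wraps the package's `drive_ls()`, `as_id()` and `drive_download()`.

```r
# Sketch only: assumes the CRAN package 'googledrive' is installed and authorised.
# download_drive_folder() is a hypothetical helper, not a googledrive function.
download_drive_folder <- function(folder_id, dest_dir = ".", pattern = NULL) {
  # Make sure the local destination directory exists
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)

  # List the files in the shared folder (a "dribble", one row per file);
  # 'pattern' optionally filters by file name, e.g. "\\.fastq"
  files <- googledrive::drive_ls(googledrive::as_id(folder_id), pattern = pattern)

  # Download each file in turn, keeping its original name
  for (i in seq_len(nrow(files))) {
    googledrive::drive_download(
      googledrive::as_id(files$id[i]),
      path = file.path(dest_dir, files$name[i]),
      overwrite = TRUE
    )
  }

  # Return the listing invisibly so the caller can check what was fetched
  invisible(files)
}
```

Usage might look like `googledrive::drive_auth()` followed by `download_drive_folder("FOLDER_ID", dest_dir = "fastq", pattern = "\\.fastq")`, where `FOLDER_ID` is a placeholder for the ID at the end of the shared folder's URL. The helper does not descend into subfolders, and it downloads sequentially, trading speed for simplicity.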