
Showing posts with the label Bisulfite sequencing

Diagnosing PCR duplicates from cluster duplicates

NovaSeq, HiSeqX and HiSeq4000 Illumina sequencers have patterned flowcells, which use a different clustering chemistry from randomly clustered flowcell systems (HiSeq2500 and MiSeq). This chemistry is known to cause duplicates during the clustering process. For some background on the issue, see these previous blog posts:

QC Fail blog by Steve Wingett
Enseqlopedia blog by James Hadfield

In my recent whole genome bisulfite sequencing experiment using TruSeq methylation library prep kits and NovaSeq, I noticed a high proportion of duplicate reads and wanted to investigate whether these were "cluster" duplicates, i.e. generated during the clustering process due to ExAmp chemistry, or duplicates generated during the PCR step. Generally, cluster duplicates occur in close proximity to each other on the flowcell surface, whereas PCR duplicates are expected to be spread uniformly across the flowcell surface.
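The proximity idea can be illustrated with a minimal Python sketch. It is not how any particular tool is implemented. It assumes read names follow the common Illumina layout instrument:run:flowcell:lane:tile:x:y, and the names parse_read_name, duplicate_distance, classify_pair and the MAX_CLUSTER_DISTANCE threshold are all hypothetical, chosen only for illustration.

```python
import math
from typing import NamedTuple, Optional


class FlowcellPos(NamedTuple):
    lane: int
    tile: int
    x: int
    y: int


def parse_read_name(name: str) -> FlowcellPos:
    """Extract lane, tile, x and y from an Illumina-style read name.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y; a leading
    '@' and anything after the first space are ignored.
    """
    fields = name.lstrip("@").split()[0].split(":")
    lane, tile, x, y = (int(v) for v in fields[3:7])
    return FlowcellPos(lane, tile, x, y)


def duplicate_distance(a: str, b: str) -> Optional[float]:
    """Euclidean distance between two reads on the same lane and tile.

    Returns None when the reads sit on different tiles or lanes, because
    their x/y coordinates are then not directly comparable.
    """
    pa, pb = parse_read_name(a), parse_read_name(b)
    if (pa.lane, pa.tile) != (pb.lane, pb.tile):
        return None
    return math.hypot(pa.x - pb.x, pa.y - pb.y)


# Hypothetical cutoff in flowcell coordinate units; tune it on real data.
MAX_CLUSTER_DISTANCE = 2500.0


def classify_pair(a: str, b: str,
                  max_distance: float = MAX_CLUSTER_DISTANCE) -> str:
    """Label a duplicate pair 'nearby' (same tile, within the cutoff)
    or 'distant' (different tile, or further apart than the cutoff)."""
    d = duplicate_distance(a, b)
    if d is not None and d <= max_distance:
        return "nearby"
    return "distant"
```

A large excess of "nearby" pairs among duplicates would be consistent with cluster duplicates, while mostly "distant" pairs would point towards PCR duplicates. A few PCR duplicates will land close together by chance, so the labels describe tendencies across many pairs rather than a verdict on any single pair.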
To diagnose this, I used the diagnose-dups tool by Dave Larson, which can be found on GitHub here. I wr…

Considerations in performing whole human genome bisulfite sequencing on the Illumina NovaSeq system

Today at the NGS workshop at WEHI, Melbourne, I presented some findings from a pilot study of 12 methylomes profiled with whole genome bisulfite sequencing. Two of those libraries were also sequenced on the HiSeq4000 platform to similar depth, and there were some subtle but interesting differences between the systems. We found that the actual sequence coverage obtained was substantially less than projected, due to two problems. Firstly, the insert size was too small, which looks like it could be due to the inner workings of the Illumina TruSeq methylation kit. Secondly, there was a high proportion of duplicate reads, that is, reads with the same strand and coordinates, which are likely not independent observations. I will need to look in further detail at whether these are PCR duplicates or "cluster" duplicates. Perhaps the library prep or clustering protocols need some tweaking for bisulfite sequencing.
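As a rough illustration of the "same strand and coordinates" definition, here is a minimal Python sketch that estimates the duplicate fraction from a list of alignments. The function name duplicate_fraction and the (chromosome, start, strand) tuple layout are assumptions made for this example; for paired-end data one would normally key on both ends of the fragment.

```python
from collections import Counter
from typing import Iterable, Tuple

Alignment = Tuple[str, int, str]  # (chromosome, start, strand)


def duplicate_fraction(alignments: Iterable[Alignment]) -> float:
    """Fraction of reads that repeat an already-seen strand and position.

    The first read at each (chromosome, start, strand) key counts as
    unique; every further read with the same key counts as a duplicate.
    """
    counts = Counter(alignments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    duplicates = total - len(counts)
    return duplicates / total


reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # same strand and coordinates: a duplicate
    ("chr1", 100, "-"),  # same coordinates, opposite strand: unique
    ("chr2", 5, "+"),
]
print(duplicate_fraction(reads))  # 0.25
```

To a first approximation, the usable depth is the projected depth multiplied by one minus this fraction, which is one reason observed coverage can fall well short of the projection.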

So as promised, here is the link to the slides.

Genome methylation analysis with Bismark

Bismark is currently the de facto standard for primary analysis of high-throughput bisulfite sequencing data. Bismark can align the reads to the genome and perform methylation calling. In this post, I'll go through Illumina whole genome bisulfite sequence (WGBS) alignment and methylation calling using Bismark. First, I want to mention that this post is just a summary and is not meant to be a user manual or a thorough troubleshooting guide. Fortunately, Bismark has some of the best documentation of any bioinformatics suite, and it is mandatory reading. The Bismark crew are also very proactive in responding to user queries on various forums.

The first step in getting Bismark to work is to index the genome, in this case with Bowtie2:

bismark_genome_preparation --bowtie2 /pathto/refgenome/

Conventionally, multiplexed libraries will be sequenced over a number of lanes. Resist concatenating or merging the smaller fastq files for each patient/sample until after the alignment, as the concatenated fil…