Posts

Showing posts with the label patterned flowcell

Diagnosing PCR duplicates from cluster duplicates

Image
NovaSeq, HiSeqX and HiSeq4000 Illumina sequencers have patterned flowcells which have a different chemistry as compared to random clustered flowcell systems (Hiseq2500 & MiSeq) which is known to cause duplicates during the clustering process. For some background on the issue, see these previous blog posts:

QC Fail blog Steve WingettEnseqlopedia blog by James Hadfield In my recent whole genome bisulfite sequencing experiment using TruSeq methylation library prep kits and NovaSeq, I noticed a high proportion of duplicate reads and wanted to investigate whether these were "cluster" duplicates, ie generated during the clustering process due to ExAmp chemistry or were duplicates generated during the PCR step. Generally cluster duplicates occur in the immediate proximity on the flowcell surface and PCR duplicates are expected to occur uniformly throughout the flowcell surface.
To diagnose this, I used the diagnose-dups tool by Dave Larson which can be found on Github here. I wr…