DNA library preparation for next-generation sequencing

Library preparation is the process of modifying DNA into a form that is compatible with high-throughput sequencing, and it is becoming a key molecular biology technique. While there is an amazing variety of library preparation methods available, I thought I'd start the blog with a description of the classic method:
- Shearing/fragmentation
- End Repair
- DNA clean-up
- A-tailing
- Adapter ligation
- Size selection
- Amplification
- Quality control

Shearing/Fragmentation
The DNA needs to be in a size range that is compatible with the sequencing platform. The most commonly used sequencing platforms require DNA constructs in the range of 300-500 bp, although this depends on the specific platform and the application. Fragmentation can be done by mechanical disruption through sonication, as we do in our lab, but it can also be done with a nebuliser or by enzymatic fragmentation (fragmentase). In our view, sonication and nebulisation introduce less sequence-specific bias than the fragmentase approach, giving a more even coverage distribution across the genome.

Nebulisers are simple and quick to use, but limit you to just one sample at a time. Sonicators come in a range of configurations, from those using standard 1.5 mL tubes to ones which can handle 96-well plates. Fragmentase may be a suitable option if you're working with a small genome or you don't have access to a sonicator.

All of these methods require quite a bit of time to optimise. For sonication, the power, time, presence of salts and liquid volume can all affect the size range of the DNA fragments. For fragmentase, the concentrations of both DNA and enzyme play a part, as do the length and temperature of the incubation. After fragmentase treatment, you will need to clean up the sample, which is normally done with a spin column. This normally isn't required after sonication or nebulisation unless you want to concentrate the sample into a smaller volume.

After fragmentation, you will want to check whether it was successful. You can either run a microlitre on a microchip electrophoresis system or run an agarose gel. Microchip systems use much less material and have better size resolution, so they are much preferred. At this stage, you will also need to check that you have enough DNA for your library preparation. Most genomic DNA preparation kits suggest that you use 1 microgram of fragmented DNA, but in our experience, you can use much less than this (around 10 ng) with only a slight reduction in sequence coverage and diversity.

End Repair
During fragmentation, the DNA is broken, leaving a mixture of blunt ends, 3' overhangs and 5' overhangs. The end repair process uses T4 DNA polymerase and Klenow fragment to remove the 3' overhangs (through their 3' to 5' exonuclease activity) and to fill in the 5' overhangs (through their polymerase activity). The end repair cocktail also contains T4 polynucleotide kinase (PNK), which phosphorylates the 5' ends and ensures the 3' ends carry a hydroxyl group. Ideally, choose a library prep kit which has all three enzymes pre-mixed into a cocktail to save time. We incubate the tubes on a thermo-mixer block set at 20 °C to reduce any effects of a fluctuating lab temperature.

Clean-up
After the reaction, you'll need to isolate the DNA and remove the enzymes and buffers. The standard way to do this has been with a spin column such as the Qiagen QIAquick. Newer protocols, like the one recommended by Illumina, instead use magnetic AMPure or SPRI beads to isolate the DNA. Our lab has stuck to the spin column method because it is relatively quick (~10 minutes) compared to about 45 minutes for the bead procedure. Spin columns are more expensive in consumables, at about $5 per prep compared to about $1 per bead prep. Once you factor in the time involved, though, the column prep works out cheaper at low sample numbers performed manually, whereas on automated workstations dealing with up to 96 samples simultaneously, the bead prep is more economical.
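To make the cost argument a bit more concrete, here is a rough sketch of the arithmetic in Python. The consumable costs and times are the approximate figures quoted above; the labour rate of $40 per hour and the idea that the quoted time covers a whole batch regardless of its size are purely my assumptions for illustration, and the function name is hypothetical.

```python
def cost_per_sample(consumable_cost, hands_on_minutes,
                    labour_rate_per_hour, samples_per_batch=1):
    """Estimate the cost of one clean-up: consumables plus a share of labour.

    Simplifying assumption: the hands-on time is for the whole batch,
    so the labour cost is split evenly across the samples in it.
    """
    if samples_per_batch < 1:
        raise ValueError("samples_per_batch must be at least 1")
    labour = labour_rate_per_hour * hands_on_minutes / 60.0 / samples_per_batch
    return consumable_cost + labour


LABOUR_RATE = 40.0  # dollars per hour; an assumed figure, not from our lab

for batch in (1, 96):
    column = cost_per_sample(5.0, 10, LABOUR_RATE, batch)  # ~$5, ~10 min
    beads = cost_per_sample(1.0, 45, LABOUR_RATE, batch)   # ~$1, ~45 min
    print(f"batch of {batch}: column ${column:.2f}, beads ${beads:.2f}")
```

Under these assumptions the column wins for a single manual prep because labour dominates, while for a 96-sample batch the labour share shrinks and the cheaper bead consumables win. Plug in your own labour rate and timings before drawing conclusions.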

A-tailing
This procedure adds a single adenosine residue to the 3' ends of the blunt-ended DNA. This helps to reduce the chance of these fragments ligating to each other and increases the rate of adapter ligation, as the adapters contain a single overhanging "T" base. This enzymatic step is performed by Klenow exo- (the Klenow fragment lacking 3' to 5' exonuclease activity) and requires dATP. We incubate this in a PCR machine at 37 °C for 30 minutes. After A-tailing, you'll need to do another DNA clean-up.

Adapter ligation
This is the stage where the DNA fragments are ligated to the sequencing adapters with T4 DNA ligase. It is important to use the correct amount of adapter for the amount of DNA present, as excess self-ligated adapter can cause headaches if it is carried through to later stages of the prep. Follow the recommendations of the kit manufacturer, and if you are working with smaller DNA amounts, use a smaller amount of adapter (you may need a titration experiment to optimise this). We have tried a 1/10 dilution of the adapter with success on 10 ng DNA inputs. The ligation reaction is incubated on a thermo-mixer at 20 °C for 15 minutes, and the DNA is cleaned up again.
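If you want to think about adapter amounts in molar terms when planning a titration, the sketch below converts a mass of fragmented DNA into picomoles of fragments using the common approximation of 660 g/mol per base pair. The 300 bp mean fragment length and the 10:1 adapter-to-insert ratio in the example are assumptions chosen for illustration, not recommendations from any kit, and the function names are hypothetical.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol).
BP_MASS_G_PER_MOL = 660.0


def dsdna_pmol(mass_ng, mean_length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of fragments."""
    if mass_ng < 0 or mean_length_bp <= 0:
        raise ValueError("mass must be >= 0 and length must be > 0")
    # ng divided by g/mol gives nmol; multiply by 1000 to get pmol.
    return mass_ng / (BP_MASS_G_PER_MOL * mean_length_bp) * 1000.0


def adapter_pmol_needed(mass_ng, mean_length_bp, molar_ratio):
    """Picomoles of adapter for a chosen adapter:insert molar ratio."""
    return dsdna_pmol(mass_ng, mean_length_bp) * molar_ratio


# Example with assumed values: 300 bp fragments, 10:1 adapter:insert.
for input_ng in (1000, 10):
    inserts = dsdna_pmol(input_ng, 300)
    adapters = adapter_pmol_needed(input_ng, 300, molar_ratio=10)
    print(f"{input_ng} ng: {inserts:.3f} pmol insert, {adapters:.3f} pmol adapter")
```

The point of working in moles is simply that a 100-fold drop in input mass is a 100-fold drop in the number of fragment ends available to ligate, which is why the adapter amount needs to come down with it.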

It is essential that the DNA sample doesn't contain traces of residual ethanol as that can spell trouble for the following step. We let our columns dry for at least 10 minutes at room temperature to achieve this.

Size selection
This step is required to further eliminate self-ligated product and to get a final library fragment size range which is compatible with the sequencing instrument. It is done in a range of ways in different labs depending on their throughput. In our lab, we stick to using 2% agarose gels to get the desired size range (200-300 bp) and eliminate self-ligated product. Any residual ethanol is highly problematic here, as you can literally watch your sample jump out of the well, ruining the prep. This method is highly labour intensive, and as such allows a technician to process only about 16 libraries per day. E-Gels are another option; these are pre-cast mini gels which are said to be quicker to prepare and run. If you are using E-Gels in your lab, I would love to hear some feedback on whether they are helpful or not. The DNA is then excised from the gel or E-Gel and purified using a column clean-up.

For larger labs, there are gel-free size selection methods suited to automated library preparation. These take advantage of the size specificity of SPRI beads in certain concentrations of PEG 8000 and NaCl. By altering these concentrations, you can fine-tune the size selection for your application. The trade-off is that the size range might not be as tight and accurate as with agarose gel excision.

Amplification
Following size selection, it is common practice to use PCR to increase the overall amount of library and to incorporate the sequencing primer annealing site (and barcode if required). There are amplification-free methods available (perhaps the subject of another post), but these are still uncommon. The number of cycles required depends on the amount of starting material. When beginning with microgram amounts, you may only need 4-8 cycles, but for nanogram amounts, you may need up to 12 cycles. Phusion polymerase is the most commonly used PCR enzyme, but there are others out there with apparently better coverage for GC/AT-biased genomes (like KAPA). After PCR, you'll need to perform yet another DNA clean-up, this time in a dedicated "post-PCR" area.
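As a rough guide to why the cycle number scales with input, the sketch below uses the textbook model in which each cycle multiplies the template by (1 + efficiency), so perfect doubling corresponds to an efficiency of 1.0. Real reactions fall short of this and eventually plateau, so treat the numbers as upper bounds. The function names and the example input and target amounts are hypothetical.

```python
import math


def fold_amplification(cycles, efficiency=1.0):
    """Theoretical fold increase after a number of PCR cycles."""
    if cycles < 0 or not 0 < efficiency <= 1:
        raise ValueError("cycles must be >= 0 and 0 < efficiency <= 1")
    return (1.0 + efficiency) ** cycles


def cycles_needed(input_ng, target_ng, efficiency=1.0):
    """Smallest whole number of cycles that reaches the target amount."""
    if input_ng <= 0 or target_ng <= 0 or not 0 < efficiency <= 1:
        raise ValueError("amounts must be > 0 and 0 < efficiency <= 1")
    if target_ng <= input_ng:
        return 0
    return math.ceil(math.log(target_ng / input_ng) / math.log(1.0 + efficiency))


# Theoretical maximum fold increase for the cycle numbers mentioned above.
for n in (4, 8, 12):
    print(f"{n} cycles: up to {fold_amplification(n):.0f}-fold")

# Hypothetical example: 10 ng of ligated library, aiming for 1000 ng.
print(cycles_needed(10, 1000), "cycles at perfect efficiency")
print(cycles_needed(10, 1000, efficiency=0.8), "cycles at 80% efficiency")
```

In practice the amount of adapter-ligated template going into the PCR is hard to know after size selection, so this is a way to reason about cycle numbers rather than a substitute for testing them.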

Quality control
To determine whether your libraries have actually worked, you'll need to run some QC checks, and this can be done in a variety of ways. The simplest way is to use a NanoDrop UV spectrophotometer or a Qubit fluorometer to quantify the concentration of the sample. While relatively easy, that won't tell you whether you have significant adapter-only product present in the library. To find that out, you'll need to run the sample on an agarose gel or microchip electrophoresis system. Again, the benefits of the microchip method are its sensitivity and size resolution. We use the Shimadzu MultiNA, and there are others like the Agilent Bioanalyzer, which is used by other labs including those at the Broad Institute. These microchip systems also come in handy for RNA and epigenetics analysis, and so have become invaluable in our lab. Illumina recommends running samples on the Bioanalyzer to verify the absence of self-ligated adapter product, as well as running a qPCR to accurately determine the concentration of the library.

In our lab, we have found that the best way to get even cluster densities on the flowcell is to use the MultiNA to work out the volume of diluent needed to bring each library to 10 nM, and then, on the day of sequencing, to re-run the diluted samples to quantify them again and make small adjustments to the volume of DNA added to the sequencing reaction. The extra few minutes spent re-running the samples are well worth it for the more consistent data volumes.
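For anyone doing the 10 nM dilution by hand, the arithmetic can be sketched as below. It converts a concentration in ng/µL and a mean library fragment length (including adapters) into nanomolar using the usual 660 g/mol per base pair approximation, then works out how much diluent to add. This is only an illustration of the calculation, not a description of what the instrument software does; the function names and the example values (5 ng/µL, 350 bp, 10 µL) are made up.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol).
BP_MASS_G_PER_MOL = 660.0


def library_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA concentration in ng/uL to nanomolar."""
    if conc_ng_per_ul < 0 or mean_length_bp <= 0:
        raise ValueError("concentration must be >= 0 and length must be > 0")
    # 1 ng/uL is 1e-3 g/L; dividing by g/mol gives mol/L; 1e9 converts to nM.
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_length_bp)


def diluent_volume_ul(stock_nM, stock_volume_ul, target_nM=10.0):
    """Volume of diluent to add so the stock ends up at target_nM."""
    if target_nM <= 0:
        raise ValueError("target concentration must be > 0")
    if stock_nM < target_nM:
        raise ValueError("stock is already more dilute than the target")
    # C1 * V1 = C2 * (V1 + diluent)  =>  diluent = V1 * (C1 / C2 - 1)
    return stock_volume_ul * (stock_nM / target_nM - 1.0)


# Hypothetical library: 5 ng/uL, mean fragment length 350 bp, 10 uL on hand.
stock = library_nM(5.0, 350)
print(f"stock is {stock:.1f} nM")
print(f"add {diluent_volume_ul(stock, 10.0):.1f} uL diluent for 10 nM")
```

The mean fragment length matters as much as the mass concentration here, which is one reason a size-resolving measurement is more useful than a spectrophotometer reading alone.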

Things I haven't covered
Barcoding, robotics, microchip prep methods, amplification-free methods and applications other than genomic DNA will all be covered in future posts.

Selecting a method for your lab
Selecting a library prep method can be hard, especially given all the different variations available. The most important consideration is the throughput you are expecting, followed by the budget. If you can't afford the capital outlay for a sonicator and a microchip electrophoresis system, your results could suffer as a consequence, so these are really quite important. When choosing a sample prep kit, I'd recommend trying the one provided by the sequencing instrument manufacturer (Illumina/Roche/Life Tech), but also consider NEB, which we have found to be just as effective at a much better price. I will be reviewing new kits as they come onto the market in Australia.

If you have any feedback or thoughts on library prep kits and methods, I'd love to hear from you.

Further reading
Broad Institute Sample prep
NEB-Next Library Prep
Illumina genomic DNA sample preps
E-Gels from Life Tech
USC Epigenome Centre Library Prep
