Showing posts with the label Epigenomics

Extract TSS regions from a GTF file with GTFtools

Since stumbling upon GTFtools recently, I have found that it has another interesting use: generating coordinate sets around transcriptional start sites (TSSs). This is really important for ChIP-seq analysis when we want to compare, for example, the strength of enrichment of histone modifications at TSSs with RNA expression. Using GTFtools, it is a one-line command to extract these positions:

gtftools.py -t Homo_sapiens.GRCh38.94.gtf.tss.bed -w 1000 Homo_sapiens.GRCh38.94.gtf

Here "-t" is the output file flag, "-w" is the desired distance to cover around each TSS (in this case +/- 1000 bp), and the last argument is the input GTF file, which needs to come from Ensembl or GENCODE (other sources don't work due to differences in formatting).
If I had to do this without GTFtools, it would end up being more complicated, as TSS positions (exon 1 starts) would need to be extracted from the GTF file separately for the top and bottom strands and then merged. On the bottom strand, the TSS is the feature's end coordinate rather than its start, which is why the two strands need separate handling.
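To give a feel for what the manual route involves, here is a minimal Python sketch of the strand-aware logic. It is not how GTFtools works internally, and it makes some assumptions of my own: it reads "transcript" features rather than hunting for exon 1, it writes BED6 with 0-based half-open coordinates, and the function names `tss_windows` and `write_bed` are made up for illustration. The output columns and window boundaries may well differ from what GTFtools produces.

```python
import re

# Pulls the transcript_id value out of the GTF attributes column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def tss_windows(gtf_lines, flank=1000):
    """Yield (chrom, start, end, transcript_id, strand) windows around each TSS.

    Input coordinates are GTF style (1-based, inclusive); output coordinates
    are BED style (0-based, half-open). This is an illustrative sketch only.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # assumption: one "transcript" feature per transcript
        chrom, strand, attributes = fields[0], fields[6], fields[8]
        if strand not in ("+", "-"):
            continue
        start, end = int(fields[3]), int(fields[4])
        # Top strand: the TSS is the start. Bottom strand: the TSS is the end.
        tss = start if strand == "+" else end
        match = TRANSCRIPT_ID.search(attributes)
        name = match.group(1) if match else "."
        # Convert the 1-based TSS to a 0-based window, clamped at 0.
        yield (chrom, max(0, tss - 1 - flank), tss + flank, name, strand)


def write_bed(gtf_path, bed_path, flank=1000):
    """Write TSS windows from a GTF file to a BED6 file."""
    with open(gtf_path) as gtf, open(bed_path, "w") as bed:
        for chrom, start, end, name, strand in tss_windows(gtf, flank):
            bed.write(f"{chrom}\t{start}\t{end}\t{name}\t.\t{strand}\n")
```

Calling `write_bed("Homo_sapiens.GRCh38.94.gtf", "tss.bed", flank=1000)` would write one window per transcript. Merging overlapping windows or collapsing them to one per gene is left out, which is part of why the one-liner is so much more convenient.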

Genome methylation analysis with Bismark

Bismark is currently the de facto standard for primary analysis of high-throughput bisulfite sequencing data. Bismark can align the reads to the genome and perform methylation calling. In this post, I'll go through Illumina whole genome bisulfite sequencing (WGBS) alignment and methylation calling using Bismark. First I want to mention that this post is just a summary, not meant to be a user manual or thorough troubleshooting guide. Fortunately, Bismark has some of the best documentation of any bioinformatics suite, and it is mandatory reading. The Bismark crew are also very proactive in responding to user queries on various forums.

The first step in getting Bismark to work is to index the genome, in this case with Bowtie2:

bismark_genome_preparation --bowtie2 /pathto/refgenome/
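
If you drive this step from a script, it can help to check that the reference folder actually contains FASTA files before starting the indexing. The sketch below is one way to do that in Python. The helper names are hypothetical, and the list of accepted file extensions is my assumption based on my reading of the Bismark documentation, so check it against the version you have installed.

```python
import subprocess
from pathlib import Path

# Assumption: extensions Bismark looks for in the genome folder.
FASTA_SUFFIXES = (".fa", ".fasta", ".fa.gz", ".fasta.gz")


def find_fasta_files(genome_dir):
    """Return the FASTA files found directly inside genome_dir, sorted by name."""
    return sorted(
        p for p in Path(genome_dir).iterdir()
        if p.is_file() and p.name.endswith(FASTA_SUFFIXES)
    )


def genome_prep_command(genome_dir):
    """Build the same command as the one-liner above, as an argument list."""
    return ["bismark_genome_preparation", "--bowtie2", str(genome_dir)]


def prepare_genome(genome_dir):
    """Run the indexing step, failing early if no FASTA files are present."""
    if not find_fasta_files(genome_dir):
        raise FileNotFoundError(f"No FASTA files found in {genome_dir}")
    # check=True raises CalledProcessError if the indexing exits non-zero.
    subprocess.run(genome_prep_command(genome_dir), check=True)
```

Running `prepare_genome("/pathto/refgenome/")` then does the same job as the shell command, with an earlier and clearer failure if the folder is empty or misnamed.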

Conventionally, multiplexed libraries will be sequenced over a number of lanes. Resist concatenating or merging the smaller fastq files for each patient/sample until after the alignment, as the concatenated fil…