Showing posts with the label RNA

Screen for rRNA contamination in RNA-seq data

Ribosomal RNA (rRNA) is very abundant in cells (~80% of total RNA), so it is useful to deplete rRNA before genome-wide assays so that other genes, both protein-coding and non-protein-coding, receive sufficient coverage.

There are two major strategies for rRNA removal: (1) poly(A) enrichment and (2) rRNA depletion. Poly(A) enrichment uses oligo(dT)-coupled magnetic beads to pull out RNA molecules with a poly(A) tail, a common feature of protein-coding transcripts. rRNA depletion can be achieved using kits such as Ribo-Zero (Illumina), RiboMinus (LifeTech) and the NEBNext® rRNA Depletion Kit (NEB). These kits contain oligonucleotide probes that hybridize to the unwanted rRNA so that it can be either immobilised and removed or degraded.

Once you have some sequence data, you will need to check whether the rRNA depletion has worked. This is somewhat different to a genome-wide analysis I've mentioned in earlier posts because rRNA genes exist in multiple copies and r…

Generate an RNA-seq count matrix with featureCounts

featureCounts is one of the fastest read summarization tools currently available and has some great features that give it an edge over HTSeq or Bedtools multicov.

featureCounts takes GTF files as annotation. These can be downloaded from the Ensembl FTP site. Make sure that the GTF matches the genome build that you aligned to. featureCounts is also smart enough to recognise and correctly process both SAM and BAM alignment files.
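One quick way to catch a GTF/genome mismatch is to compare the sequence names in the GTF with those in the alignment header. The following is only a sketch: the function names are made up for illustration, and it assumes the header has already been saved as text (for example with `samtools view -H sample.bam > header.sam`). The mock files stand in for real data.

```shell
#!/bin/bash
# Sketch: list sequence names used in a GTF that are absent from a SAM header.
# Function names and mock file contents are hypothetical.

# Print the unique sequence names in column 1 of a GTF, skipping '#' comment lines.
gtf_contigs() {
  grep -v '^#' "$1" | cut -f1 | sort -u
}

# Print the unique sequence names (SN: tags) from the @SQ lines of a SAM header.
header_contigs() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }' "$1" | sort -u
}

# Print GTF sequence names missing from the header (no output = all present).
# $1 = GTF file, $2 = SAM header saved as text
missing_contigs() {
  header_contigs "$2" > "$2.contigs.tmp"
  gtf_contigs "$1" | comm -23 - "$2.contigs.tmp"
  rm -f "$2.contigs.tmp"
}

# Demonstration with mock files: Ensembl-style names in the GTF,
# "chr"-prefixed names in the header.
TMP=$(mktemp -d)
printf '#!genome-build GRCm38\n1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1";\nMT\tsrc\tgene\t1\t50\t.\t+\t.\tgene_id "G2";\n' > "$TMP/mock.gtf"
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrM\tLN:500\n' > "$TMP/mock_header.sam"
missing_contigs "$TMP/mock.gtf" "$TMP/mock_header.sam"
```

With the mock files above, both `1` and `MT` are reported as missing because the header uses `chr1` and `chrM`. Matching names do not prove that the builds are identical, but any output here suggests the annotation and alignments do not belong together.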

Here is a script to generate a gene-wise matrix from all BAM files in a directory.
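#!/bin/bash
# Generate an RNA-seq count matrix from all BAM files in the current directory

# Set parameters
GTF=/path/to/Mus_musculus.GRCm38.78.gtf
EXPTNAME=mouse_rna
CPUS=8
MAPQ=10
GENEMX=${EXPTNAME}_genes.mx

# Make the gene-wise matrix: keep the gene ID (column 1) and the per-sample
# counts (column 7 onwards), then drop the leading comment line
featureCounts -Q "$MAPQ" -T "$CPUS" -a "$GTF" -o /dev/stdout *.bam \
| cut -f1,7- | sed 1d > "$GENEMX"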
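
To see what the `cut` and `sed` steps do without running featureCounts, here is a small sketch on a mock table laid out like featureCounts output (a `#` comment line, a header row, then six annotation columns before the counts). The gene IDs, counts and the `trim_counts` helper are invented for illustration.

```shell
#!/bin/bash
# Sketch: apply the same column trimming to a mock featureCounts-style table.
TMP=$(mktemp -d)
{
  printf '# Program:featureCounts; Command:mock\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\ta.bam\tb.bam\n'
  printf 'geneA\t3\t100\t900\t-\t801\t12\t7\n'
  printf 'geneB\tX\t50\t450\t+\t401\t0\t3\n'
} > "$TMP/mock_counts.txt"

# Keep column 1 (gene ID) and columns 7+ (one count column per BAM file).
# The comment line has no tabs, so cut passes it through unchanged and
# sed 1d then removes it, leaving the header row and the counts.
trim_counts() {
  cut -f1,7- "$1" | sed 1d
}

trim_counts "$TMP/mock_counts.txt"
```

The result is a header row (`Geneid`, `a.bam`, `b.bam`) followed by one row per gene, which is the shape most count-based packages expect once it is read in as a table.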

The data are now ready to analyse with your favourite statistical package (DESeq, edgeR, limma/voom, etc.).

Consider attaching the gene name to give the data more relevance. To do that, first make a table of Ensembl IDs and ge…
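
As a rough sketch of one way to do this (not necessarily the approach used in the full post), the gene ID and gene name can be pulled from the `gene` lines of an Ensembl-style GTF and then joined onto the first column of the matrix. The function names, the `ID_name` label format and the mock data are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: build an Ensembl ID -> gene name table from a GTF and attach it
# to a count matrix. Helper names and the "ID_name" format are hypothetical.

# Print "gene_id<TAB>gene_name" for every 'gene' feature in the GTF.
# Falls back to the ID when no gene_name attribute is present.
gtf_gene_names() {
  awk -F'\t' '$3 == "gene" {
    id = ""; name = ""
    n = split($9, attrs, ";")
    for (i = 1; i <= n; i++) {
      a = attrs[i]
      sub(/^ +/, "", a)
      if (a ~ /^gene_id /)   { id = a;   sub(/^gene_id /, "", id);     gsub(/"/, "", id) }
      if (a ~ /^gene_name /) { name = a; sub(/^gene_name /, "", name); gsub(/"/, "", name) }
    }
    if (id != "") print id "\t" (name != "" ? name : id)
  }' "$1"
}

# Relabel column 1 of the matrix as ID_name where the ID is in the table.
# $1 = ID/name table, $2 = count matrix
add_gene_names() {
  awk -F'\t' -v OFS='\t' 'NR == FNR { name[$1] = $2; next }
    { if ($1 in name) $1 = $1 "_" name[$1]; print }' "$1" "$2"
}

# Demonstration with mock data.
TMP=$(mktemp -d)
printf '1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "geneA"; gene_name "Abc1";\n1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "geneA"; transcript_id "txA";\nX\tsrc\tgene\t5\t90\t.\t-\t.\tgene_id "geneB"; gene_name "Xyz2";\n' > "$TMP/mock.gtf"
printf 'Geneid\ta.bam\ngeneA\t12\ngeneB\t0\n' > "$TMP/mock.mx"
gtf_gene_names "$TMP/mock.gtf" > "$TMP/names.tsv"
add_gene_names "$TMP/names.tsv" "$TMP/mock.mx"
```

Keeping the Ensembl ID in the label means rows stay unique even when two IDs share a gene name.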

Regulation of gene expression by long non-coding RNAs

Gene regulation is a really complicated thing. We have covalent marks on DNA, histones and transcription factors, chromatin remodeling and long-range enhancer interactions, including enhancer elements located in introns of genes hundreds of kilobases away from the gene they're controlling. There is post-transcriptional control from microRNA networks, and now there is an emerging model for the function of some of the thousands of long non-coding RNAs which are only now being uncovered with high-resolution (directional) transcriptome analysis.

Many of you who studied molecular biology at Uni would (should) remember the model for how X chromosome inactivation is achieved. The mechanism centers on XIST, one of the first non-coding RNA genes identified. Expression of XIST from the inactive X chromosome essentially wraps it up at the same time that repressive epigenetic marks are established through its interaction with the Polycomb Repressive Complex 2 (PRC2). Sounds simple enough, but the model also inv…