Example BLAST workflow (nucleotide)
BLAST is a stalwart of bioinformatics. I've used it in multiple contexts, and it is a good way to introduce students to bioinformatics principles and to the process of pipeline development. Although there is a web interface, running it locally is still worthwhile if you need to run a large number of queries. As my group needed to use BLAST again this week, I thought I'd share a small example script that I gave to my Master's research students who are just getting into bioinformatics.
The script (shown below) downloads the E. coli gene coding sequences and then extracts a few individual sequences at random. These undergo random mutagenesis, and the mutated sequences are then used as queries to find the original genes with BLAST. The output format is tabular, which suits downstream large-scale data analysis. The script also includes the steps for generating the BLAST index. The script is uploaded as a gist here. It requires the following prerequisites:
sudo apt install ncbi-blast+ emboss
unwrap_fasta.pl is available here.
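For a feel of the BLAST-specific steps before opening the gist, here is a minimal sketch of indexing, searching and reading the tabular output. It is not the gist itself: the function names and the file names in the usage note are placeholders I made up for illustration, and it assumes `makeblastdb` and `blastn` from ncbi-blast+ are on your PATH and that GNU `sort` is available. The download, random extraction and mutagenesis steps are left out.

```shell
# Sketch only: function names are hypothetical, not taken from the gist.

# Build a nucleotide BLAST database (the "index") from a FASTA file.
# $1 = input FASTA, $2 = database name prefix
build_index() {
    makeblastdb -in "$1" -dbtype nucl -out "$2"
}

# Search query sequences against the database.
# -outfmt 6 writes tab-separated columns without a header line.
# $1 = query FASTA, $2 = database name prefix, $3 = output file
run_blast() {
    blastn -query "$1" -db "$2" -outfmt 6 -out "$3"
}

# Keep the highest-scoring hit per query from an -outfmt 6 file.
# In the default column layout, column 1 is the query ID and
# column 12 is the bit score. Sorting uses GNU sort's -g so that
# scores written in scientific notation are ordered correctly.
# $1 = tabular hits file
best_hits() {
    tab=$(printf '\t')
    sort -t "$tab" -k1,1 -k12,12gr "$1" | awk -F'\t' '!seen[$1]++'
}
```

With placeholder file names, usage would look something like `build_index ecoli_cds.fasta ecoli_cds`, then `run_blast mutated.fasta ecoli_cds hits.tsv`, then `best_hits hits.tsv`. If the mutagenesis was mild, the subject ID in column 2 of each best hit should be the gene the query was derived from.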
---