Tabular sequence to fasta format

Here is a one-line command to turn a list of sequences, one per line, into a FASTA file.

$ cat seq.txt 
CAACACCAGTCGATGGGCTGT
CAACACCAGTCGATGGGCTGTC
CAACACCAGTCGATGGGCCGT
TAGCTTATCAGACTGATGTTGA
TAGCTTATCAGACTGATGTTGAC

We use nl to number the lines, then sed to replace the leading whitespace with the ">" that starts each FASTA header, then tr to turn the tab between the number and the sequence into a line break.
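
To see what each stage contributes, the pipeline can be run one step at a time on the first two sequences. This is a sketch: the sample is fed in with printf rather than read from seq.txt so it runs on its own, and "sed -n l" (which prints a tab as \t and marks each line end with $) is only there to make the hidden tab visible.

```shell
# First two sequences from seq.txt, held in a variable so the example
# does not depend on the file existing.
sample='CAACACCAGTCGATGGGCTGT
CAACACCAGTCGATGGGCTGTC'

# Stage 1: nl right-aligns the line number (width 6 by default) and
# separates it from the sequence with a tab.
printf '%s\n' "$sample" | nl | sed -n l
#      1\tCAACACCAGTCGATGGGCTGT$
#      2\tCAACACCAGTCGATGGGCTGTC$

# Stage 2: sed replaces the leading spaces with ">".
printf '%s\n' "$sample" | nl | sed 's/^[ \t]*/>/' | sed -n l
# >1\tCAACACCAGTCGATGGGCTGT$
# >2\tCAACACCAGTCGATGGGCTGTC$

# Stage 3: tr turns the remaining tab into a newline, so each record
# becomes a header line followed by a sequence line.
printf '%s\n' "$sample" | nl | sed 's/^[ \t]*/>/' | tr '\t' '\n'
```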

$ nl seq.txt | sed 's/^[ \t]*/>/' | tr '\t' '\n'
>1
CAACACCAGTCGATGGGCTGT
>2
CAACACCAGTCGATGGGCTGTC
>3
CAACACCAGTCGATGGGCCGT
>4
TAGCTTATCAGACTGATGTTGA
>5
TAGCTTATCAGACTGATGTTGAC
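
The same conversion can also be done in a single awk step instead of the nl | sed | tr chain. A minimal sketch, assuming a POSIX awk; the function name seq2fasta is a hypothetical helper, not an existing tool. It also skips blank lines instead of turning them into records.

```shell
# seq2fasta: hypothetical helper name, not a standard command.
# Reads one sequence per line from the given files (or stdin if none)
# and writes FASTA records numbered 1, 2, 3, ... to stdout.
seq2fasta() {
  # NF is 0 on blank lines, so those lines are skipped and not counted.
  # $1 is the first whitespace-separated field, which also drops any
  # spaces or tabs around the sequence.
  awk 'NF { n++; print ">" n; print $1 }' "$@"
}
```

Running seq2fasta seq.txt on the file above should print the same five records as the nl pipeline.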
