
Tabular sequence to fasta format

Here is a one-line command to turn a list of sequences, one per line, into a FASTA file.

$ cat seq.txt 
CAACACCAGTCGATGGGCTGT
CAACACCAGTCGATGGGCTGTC
CAACACCAGTCGATGGGCCGT
TAGCTTATCAGACTGATGTTGA
TAGCTTATCAGACTGATGTTGAC

We use nl to number the lines, then sed to replace the leading whitespace that nl pads the numbers with by the ">" character, then tr to turn the tab between each number and its sequence into a line break.

$ nl seq.txt | sed 's/^[ \t]*/>/' | tr '\t' '\n'
>1
CAACACCAGTCGATGGGCTGT
>2
CAACACCAGTCGATGGGCTGTC
>3
CAACACCAGTCGATGGGCCGT
>4
TAGCTTATCAGACTGATGTTGA
>5
TAGCTTATCAGACTGATGTTGAC
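If you'd rather skip the whitespace clean-up, awk can keep the record counter itself. This is a minimal sketch, not the method above: the function name tab2fasta is hypothetical, and it assumes one sequence per line (only the first field of each line is kept, and blank lines are skipped rather than numbered).

```shell
#!/bin/sh
# tab2fasta (hypothetical name): emit each non-empty input line as a FASTA
# record. awk keeps its own counter (n), so there is no padding to strip
# and no tab to translate afterwards.
# NF is 0 on blank lines, so those lines are skipped entirely.
# $1 keeps only the first whitespace-separated field of each line.
tab2fasta() {
    awk 'NF { n++; print ">" n; print $1 }' "$@"
}

# Demo on inline data; with no file argument, awk reads stdin.
printf 'CAACACCAGTCGATGGGCTGT\n\nTAGCTTATCAGACTGATGTTGA\n' | tab2fasta
```

On a real file, call it as `tab2fasta seq.txt > seq.fa`.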


A selection of useful bash one-liners

Here's a selection of one-liners that you might find useful in data analysis. I'll update this post regularly, and I hope you can share some of your own gems.

Compress an archive:
tar cf backup.tar.bz2 --use-compress-program=pbzip2 directory
Sync a folder of data to a backup (the trailing slash on the source copies the folder's contents rather than creating /backup/project1/project1):
rsync -azhv /scratch/project1/ /backup/project1/
Extract a tar archive:
tar xvf backup.tar
Head a compressed file:
bzcat file.bz2 | head
pbzip2 -dc file.bz2 | head
zcat file.gz | head
Uniq on one column (field 2), keeping the first line seen for each value:
awk '!arr[$2]++' file.txt
Explode a file on the first field, writing each line to a file named after its field 1:
awk '{print > $1}' file1.txt
Sum a column (field 1):
awk '{ sum+=$1} END {print sum}' file.txt
Put a space after every three characters:
sed 's/\(.\{3\}\)/\1 /g' file.txt
Join 2 files on field 1. Both files need to be properly sorted first (use sort -k 1b,1):
join -1 1 -2 1 file1.txt file2.txt
Join a bunch of files by field 1. Individual files don't need to be sorted but the final output might need to be sor…
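To try the text-processing one-liners without touching real data, here is a minimal sandbox sketch. It assumes a POSIX shell with awk, sed, sort and join on the PATH; the file names and contents (counts.txt, annot.txt, the gene IDs) are made up for illustration, and the uniq and sum lines use fields 1 and 2 respectively to suit that data.

```shell
#!/bin/sh
# Sandbox for the text-processing one-liners, on made-up data.
# All file names and contents here are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# counts.txt: field 1 = gene ID, field 2 = read count (unsorted, g2 repeated).
printf 'g2 5\ng1 10\ng2 7\ng3 2\n' > counts.txt
# annot.txt: field 1 = gene ID, field 2 = chromosome.
printf 'g3 chr3\ng1 chr1\ng2 chr2\n' > annot.txt

echo "== first line per gene (uniq on field 1) =="
awk '!arr[$1]++' counts.txt                       # g2 5, g1 10, g3 2

echo "== sum of the counts (field 2) =="
awk '{ sum += $2 } END { print sum }' counts.txt  # 24

echo "== explode on field 1: one file per gene =="
awk '{ print > $1 }' counts.txt                   # creates files g1, g2, g3
wc -l g1 g2 g3

echo "== a space after every three characters =="
echo ACGTACGT | sed 's/\(.\{3\}\)/\1 /g'          # ACG TAC GT

echo "== join on field 1, after sorting both inputs =="
sort -k 1b,1 counts.txt > counts.sorted
sort -k 1b,1 annot.txt > annot.sorted
join -1 1 -2 1 counts.sorted annot.sorted
```

join compares keys using the current locale's collation, so run sort and join under the same locale (for example, prefix both with LC_ALL=C) or join may complain that its input is not sorted.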