A selection of useful bash one-liners
Here's a selection of one-liners that you might find useful in data analysis. I'll update this post regularly and I hope you'll share some of your own gems.
Compress an archive:
tar cf backup.tar.bz2 --use-compress-prog=pbzip2 directory
Sync a folder of data to a backup:
rsync -azhv /scratch/project1 /backup/project1/
Extract a tar archive:
tar xvf backup.tar
Head a compressed file:
bzcat file.bz2 | head
pbzip2 -dc file.bz2 | head
zcat file.gz | head
Uniq on one column (field 2):
awk '!arr[$2]++' file.txt
Explode a file on the first field:
awk '{print > $1}' file1.txt
Sum a column:
awk '{ sum+=$1} END {print sum}' file.txt
Put spaces between every three characters:
sed 's/\(.\{3\}\)/\1 /g' file.txt
Join 2 files based on field 1. Both files need to be properly sorted (use sort -k 1b,1):
join -1 1 -2 1 file1.txt file2.txt
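For example, a quick sketch that sorts both files first and then joins them (the .sorted.txt names are just placeholders):
sort -k 1b,1 file1.txt > file1.sorted.txt
sort -k 1b,1 file2.txt > file2.sorted.txt
join -1 1 -2 1 file1.sorted.txt file2.sorted.txt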
Join a bunch of files by field 1. Individual files don't need to be sorted, but the final output might need to be sorted:
awk '$0 !~ /#/{arr[$1]=arr[$1] " " $2}END{for(i in arr)print i,arr[i]}' file1.txt file2.txt ... fileN.txt
Find the number of lines shared by 2 files:
sort file1 file2 | uniq -d
Alternative method to find the common lines (files need to be pre-sorted):
comm -12 file1 file2
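If the inputs aren't already sorted, one option (a sketch using bash process substitution) is to sort them on the fly:
comm -12 <(sort file1) <(sort file2)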
Add a header to a file:
sed -e '1i\HeaderGoesHere' originalFile
Extract every 4th line starting at the second line (extract the sequence from fastq):
sed -n '2~4p' file.txt
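Since fastq files are often gzipped, this combines nicely with zcat; a sketch assuming a file called file.fastq.gz:
zcat file.fastq.gz | sed -n '2~4p' > sequences.txt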
Find the most common strings in column 2:
cut -f2 file.txt | sort | uniq -c | sort -k1nr | head
Randomise lines in a file:
shuf file.txt
Generate a list of random numbers (integers):
for i in {1..50} ; do echo $RANDOM ; done
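Alternatively, shuf can draw random integers from a range directly; a sketch drawing 50 distinct integers between 1 and 1000 (the range is arbitrary):
shuf -i 1-1000 -n 50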
Find a bunch of strings (listed in file1) in file2:
grep -Ff file1 file2
Print lines which contain string1 or string2:
egrep '(string1|string2|stringN)' file.txt
Count the number of "X" characters per line:
n=0; while read line; do n=$((n+1)); echo -n "$n "; echo "$line" | tr -cd "X" | wc -c; done < file.txt
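A shorter alternative sketch with awk, where gsub returns the number of matches it made (printing the line number followed by the count of X characters):
awk '{print NR, gsub(/X/, "X")}' file.txt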
Count the length of strings in field 2:
awk '{print length($2)}' file1
Print all possible 3mer DNA sequence combinations:
echo {A,C,T,G}{A,C,T,G}{A,C,T,G}
Filter reads with SamTools:
samtools view -f 4 file.bam > unmapped.sam
samtools view -F 4 file.bam > mapped.sam
samtools view -f 2 file.bam > mappedPairs.sam
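The same filters can write compressed BAM rather than SAM by adding the -b flag; a sketch for the unmapped reads:
samtools view -b -f 4 file.bam > unmapped.bam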
Compress a bunch of folders full of data:
for DIR in `ls -d */ | sed 's#/##' ` ; do ZIP=$DIR.zip ; zip -r $ZIP $DIR/ ; done
Get year-month-day hour:minute:second from Unix "date":
DATE=`date +%Y-%m-%d:%H:%M:%S`
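That's handy for timestamping output files; a sketch reusing the $DATE variable above (results.txt is just an example name):
cp results.txt results_$DATE.txt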
GNU Parallel has many interesting uses.
Confirm md5sum of fastq files:
ls *fastq.gz | parallel md5sum {} > checksums.txt
Index bam files:
parallel samtools index {} ::: *bam
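By default parallel starts one job per CPU core; the number of simultaneous jobs can be capped with -j, for example limiting the indexing above to 4 jobs (an arbitrary choice):
parallel -j 4 samtools index {} ::: *bam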
parallel "wc -l {}" ::: *bedFix Bedtools "Error: malformed BED entry at line 1. Start was greater than end. Exiting." by reorganising bed coordinates
awk '{OFS="\t"} {if ($3<$2) print $1,$3,$2 ; else print $0}' file.bed > file_fixed.bedFix a MACS peak BED file that contains negative coordinates
awk '{if ($2<1) print $1,1,$3 ; else print $0 }' macs_peaks.bed > macs_peaks_fixed.bedYou want to recursively remove spaces from filenames in many sub directories:
find -name "* *" -type d | rename 's/ /_/g'Generate md5 checksums for directory of files in parallel
parallel md5sum ::: * > checksums.md5
Aggregate: sum column 2 values based on the column 1 string:
awk '{array[$1]+=$2} END { for (i in array) {print i, array[i]}}' file.tsv
Validate files by comparing checksums in parallel:
cat checksums.md5 | parallel --pipe -N1 md5sum -c
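Putting the last two together, a sketch that generates checksums next to the original data and then verifies a copy of it (the comments and layout are illustrative):
# in the original directory
parallel md5sum ::: * > checksums.md5
# in the copy, with checksums.md5 copied across
cat checksums.md5 | parallel --pipe -N1 md5sum -c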
Find your own public IP address:
dig +short myip.opendns.com @resolver1.opendns.com
Find the IP address of a webpage:
dig +short www.example.com
Find other computers on the local network connected by ethernet:
arp-scan --interface=eth0 --localnet
Find your own device's MAC address:
ifconfig -a | grep -Po 'HWaddr \K.*$'
Further reading:
http://www.pement.org/awk/awk1line.txt
http://www.catonmat.net/blog/wp-content/uploads/2008/09/sed1line.txt
http://www.catonmat.net/blog/perl-one-liners-explained-part-one/
http://www.unixguide.net/unix/perl_oneliners.shtml
http://www.commandlinefu.com/
http://genomics-array.blogspot.com.au/2010/11/some-unixperl-oneliners-for.html
https://wikis.utexas.edu/display/bioiteam/Scott's+list+of+linux+one-liners
https://www.biostars.org/p/56246/
https://www.gnu.org/software/parallel/parallel_tutorial.html