Evaluate length of sequence strings
If you are working with sequence data often, you will sometimes need to look at the distribution of read lengths in a data set after quality and adapter trimming. From a bam file this can be done starting with samtools view, then cutting the sequence string out and then using either perl or awk to count the length of the sequence. The list of integers can then be piped to numaverage, a numutils program to evaluate the average, median and mode of a list of numbers. To get the average length samtools view sequence.bam | cut -f10 | awk '{ print length}' | numaverage To get the median length samtools view sequence.bam | cut -f10 | awk '{ print length}' | numaverage -M To get the mode length samtools view sequence.bam | cut -f10 | awk '{ print length}' | numaverage- m To get the shortest length samtools view FC095_1.bam | cut -f10 | awk '{ print length}' | sort | head -1 To get the longest samtools view FC095_1.bam | cut -f10 | awk '{ print