Picard tools is a great set of utilities from the Broad Institute for sequence analysis. However, some of the utilities run on the slow side.

To speed things up, I created a new command: insert-size. insert-size is approximately

insert-size does not work in exactly the same way as Picard's CollectInsertSizeMetrics, but the results are stunningly close.
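For context, the insert size of a read pair is usually taken from the template length (TLEN) field of the alignment. The Python sketch below shows the kind of summary such a tool reports, assuming we already have `(flag, tlen)` tuples for each read. The function names and the filtering rules (keep mapped, primary, non-duplicate reads from proper pairs, and count only the record with positive TLEN so each pair contributes once) are illustrative assumptions. This is not the code behind insert-size, and it does not reproduce Picard's own filtering or trimming.

```python
import statistics

# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Records carrying any of these bits are skipped (hypothetical filter choice).
SKIP = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE | SUPPLEMENTARY


def insert_sizes(records):
    """Yield one insert size per read pair from (flag, tlen) tuples.

    Only the record with positive TLEN is counted, so each proper pair
    contributes exactly once.
    """
    for flag, tlen in records:
        if flag & PAIRED and flag & PROPER_PAIR and not flag & SKIP and tlen > 0:
            yield tlen


def insert_size_stats(records):
    """Summarize insert sizes: pair count, median, MAD, mean, standard deviation."""
    sizes = list(insert_sizes(records))
    if not sizes:
        raise ValueError("no usable read pairs")
    median = statistics.median(sizes)
    # Median absolute deviation around the median.
    mad = statistics.median(abs(s - median) for s in sizes)
    return {
        "n_pairs": len(sizes),
        "median": median,
        "mad": mad,
        "mean": statistics.mean(sizes),
        "stdev": statistics.stdev(sizes) if len(sizes) > 1 else 0.0,
    }
```

In a real tool the `(flag, tlen)` tuples would come from iterating over a BAM file with an htslib binding. Keeping the statistics separate from file reading makes them easy to check against known values.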

insert-size has some nice advantages over Picard as well: its output is much easier to read and parse than standard Picard output.

(Show example of diff)

Additionally, you can run insert-size on a list of BAM files and collate the results into a single file for comparison. The output includes the sample name recorded in each BAM as well as its filename.
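Collating per-file results comes down to writing one tab-separated row per BAM with a fixed header. The sketch below shows that idea using Python's `csv` module. The column names in `COLUMNS` and the function name `format_summary` are hypothetical, and the actual insert-size output layout may differ.

```python
import csv
import io

# Hypothetical column layout: identifiers first, then the summary statistics.
COLUMNS = ["sample", "filename", "n_pairs", "median", "mad", "mean", "stdev"]


def format_summary(results):
    """Return a TSV string with a header and one row per BAM.

    `results` is a list of dicts keyed by COLUMNS, for example one dict
    per input file produced by a per-BAM summary function.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()
```

A fixed header with one row per file is what makes the combined output easy to load into a spreadsheet or a dataframe for side-by-side comparison.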

You can also output the distribution of insert sizes (the count at each insert size) with the --dist flag.
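A count-per-size distribution is simply a histogram of the insert sizes. Here is a minimal Python sketch, assuming the sizes have already been extracted. The two-column `insert_size`/`count` layout is a hypothetical format, not necessarily what --dist prints.

```python
from collections import Counter


def insert_size_distribution(sizes):
    """Return (insert_size, count) pairs sorted by insert size."""
    return sorted(Counter(sizes).items())


def format_distribution(sizes):
    """Render the distribution as tab-separated text with a header line."""
    lines = ["insert_size\tcount"]
    lines += [f"{size}\t{count}" for size, count in insert_size_distribution(sizes)]
    return "\n".join(lines)
```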

There is still room for improvement. The command doesn't break results down by lane or sequencing library, and it could be made even faster by parallelizing across chromosomes.
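One way to parallelize across chromosomes is to build a separate insert-size histogram for each chromosome and merge them at the end. Medians cannot be averaged across chromosomes, but histogram counts can simply be added, and the median can then be read off the merged histogram. The sketch below assumes the records are already grouped by chromosome as `(flag, tlen)` tuples. All names are hypothetical. `ThreadPoolExecutor` keeps the example self-contained; for CPU-bound pure-Python work, a `ProcessPoolExecutor` would be the usual choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# SAM FLAG bits: proper pair, plus bits that exclude a record
# (unmapped, secondary, QC fail, duplicate, supplementary).
PROPER_PAIR = 0x2
SKIP = 0x4 | 0x100 | 0x200 | 0x400 | 0x800


def count_chromosome(records):
    """Histogram of insert sizes for one chromosome's (flag, tlen) records."""
    hist = Counter()
    for flag, tlen in records:
        if flag & PROPER_PAIR and not flag & SKIP and tlen > 0:
            hist[tlen] += 1
    return hist


def parallel_histogram(records_by_chrom, workers=4):
    """Count each chromosome in parallel, then add the histograms together."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hist in pool.map(count_chromosome, records_by_chrom.values()):
            total.update(hist)  # Counter.update adds counts
    return total


def median_from_histogram(hist):
    """Median of the values described by a {value: count} histogram."""
    n = sum(hist.values())
    if n == 0:
        raise ValueError("empty histogram")
    lo_rank, hi_rank = (n - 1) // 2, n // 2  # 0-based ranks of the middle value(s)
    seen = 0
    lo = hi = None
    for size in sorted(hist):
        seen += hist[size]
        if lo is None and seen > lo_rank:
            lo = size
        if seen > hi_rank:
            hi = size
            break
    return (lo + hi) / 2
```

Each counted record (the mate with positive TLEN) sits on exactly one chromosome, and aligners typically set TLEN to 0 when mates map to different chromosomes. Splitting the work this way should therefore not double-count pairs.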

For example, a relatively small BAM file with ~2M reads takes an average of XX seconds to

Recently, I started developing a set of utilities called seq-collection (sc), written in Nim using the fantastic hts-nim package.