Picard Tools is a great set of utilities from the Broad Institute for sequence analysis. However, some of its utilities are on the slow side.

To speed things up, I created a new command, insert-size, as part of seq-collection. It runs much faster, owing in part to parallelization of the insert-size calculations.

insert-size does not work in exactly the same way as Picard's CollectInsertSizeMetrics, but the results are very close.

insert-size has some nice advantages over Picard. In particular, its output is much easier to interpret and parse than standard Picard output.

For example, if you run:

sc insert-size --basename --header tests/data/test.bam

The output table will be:

median mean std_dev min percentile_99.5 max_all n_reads n_accept n_use sample basename
179 176.5 63.954 38 358 359 237 101 100 AB1 test.bam
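Because each result is a single row under a header line, the table is easy to load into other tools. Here is a minimal Python sketch, assuming the columns are separated by whitespace (tabs or spaces) and that the first line is the header produced by --header. The function name parse_insert_size_table is hypothetical and not part of seq-collection.

```python
# Sketch: parse `sc insert-size --header` output into one dict per row.
# Assumption: whitespace-delimited columns, first line is the header.

def _convert(value):
    """Turn numeric fields into int/float; leave text (sample, basename) as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_insert_size_table(text):
    """Return a list of dicts keyed by the header column names."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        values = [_convert(v) for v in line.split()]
        rows.append(dict(zip(header, values)))
    return rows


example = """\
median mean std_dev min percentile_99.5 max_all n_reads n_accept n_use sample basename
179 176.5 63.954 38 358 359 237 101 100 AB1 test.bam
"""

rows = parse_insert_size_table(example)
print(rows[0]["median"], rows[0]["sample"])  # 179 AB1
```

Splitting on arbitrary whitespace keeps the sketch simple; if the real output is tab-delimited and sample names may contain spaces, split on "\t" instead.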

You can also write the distribution of insert sizes (the count at each insert size) to a file by specifying the --dist=<filename> argument.
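Summary statistics like those in the table can be recomputed from a count distribution. The following is a minimal Python sketch, assuming the distribution has already been loaded into a dict mapping insert size to count (the --dist file format is not described here, so reading it is left out). The helper summarize is hypothetical, and its choices (population standard deviation, nearest-rank percentiles, lower median for an even number of reads) may differ from how insert-size computes its columns.

```python
import math


def summarize(counts, pct=99.5):
    """Summary statistics for a hypothetical {insert_size: count} distribution."""
    sizes = sorted(counts)
    n = sum(counts[s] for s in sizes)
    if n == 0:
        raise ValueError("empty distribution")

    # Count-weighted mean and population variance.
    mean = sum(s * counts[s] for s in sizes) / n
    var = sum(counts[s] * (s - mean) ** 2 for s in sizes) / n

    def nearest_rank(p):
        # Smallest insert size whose cumulative count reaches p% of all reads.
        target = max(1, math.ceil(p / 100 * n))
        running = 0
        for s in sizes:
            running += counts[s]
            if running >= target:
                return s

    return {
        "median": nearest_rank(50),
        "mean": mean,
        "std_dev": math.sqrt(var),
        "min": sizes[0],
        f"percentile_{pct}": nearest_rank(pct),
        "n": n,
    }
```

For example, summarize({100: 1, 200: 2, 300: 1}) gives a median of 200, a mean of 200.0, and a percentile_99.5 of 300.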

seq-collection (sc) is a set of tools written in Nim using the fantastic hts-nim package.