Picard tools is a great set of utilities from the Broad Institute for performing sequence analysis. However, some of the utilities run on the slower side.
To speed things up, I created a new command:
insert-size is approximately
insert-size does not operate in exactly the same way as Picard
CollectInsertSizeMetrics, but the results are stunningly close.
insert-size has some nice advantages over Picard as well. The output is easier to read and to parse than standard Picard output.
(Show example of diff)
Additionally, you can run
insert-size on a list of BAM files and collate the results in a single file for comparison. The output contains the sample name recorded in each BAM file as well as the filename.
You can also output the distribution of insert-sizes by count by using the
There is room for improvement with this command. It doesn’t examine results by lane or sequencing library. Additionally, it could be made even faster by parallelizing across chromosomes.
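To make the parallelization idea concrete, here is a minimal Python sketch of how per-chromosome work could be combined. It is not how insert-size is implemented. It assumes the template lengths (the TLEN field of each read) have already been pulled out of the BAM and grouped by chromosome, and the function names and toy data are hypothetical. Each chromosome gets its own histogram, the histograms are summed, and the median is read off the merged counts. Only positive TLEN values are kept so that each read pair is counted once.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def histogram_for_chromosome(tlens):
    """Count insert sizes for one chromosome.

    Keeps only positive TLEN values: in the SAM format the leftmost
    mate carries the positive value, so each pair is counted once.
    """
    return Counter(t for t in tlens if t > 0)


def merged_histogram(tlens_by_chrom, max_workers=4):
    """Build one histogram per chromosome concurrently, then sum them."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for hist in pool.map(histogram_for_chromosome, tlens_by_chrom.values()):
            total.update(hist)  # Counter.update adds counts together
    return total


def median_from_histogram(hist):
    """Return the (lower) median insert size from a size -> count mapping."""
    n = sum(hist.values())
    if n == 0:
        raise ValueError("histogram is empty")
    midpoint = (n + 1) // 2
    running = 0
    for size in sorted(hist):
        running += hist[size]
        if running >= midpoint:
            return size


# Hypothetical toy input: TLEN values grouped by chromosome.
toy = {
    "chr1": [300, -300, 310, 0],
    "chr2": [290, 305, -305],
}
hist = merged_histogram(toy)
print(sorted(hist.items()))
print(median_from_histogram(hist))
```

Threads are used here only to keep the sketch short. In Python, pure-Python counting will not get faster with threads, so a real implementation would more likely use separate processes or a compiled language. The point is that the per-chromosome histograms are independent and cheap to merge.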
For example, a relatively small BAM file with ~2M reads takes an average of XX seconds to