I have added a new utility to seq-collection called iter, which generates chromosomal ranges. These lists of genomic ranges can be piped into utilities such as xargs or GNU parallel to run commands in parallel.

For example:

sc iter test.bam 1,000,000 # Iterate over bins of 1 million base pairs

# Outputs
> I:0-999999
> I:1000000-1999999
> I:2000000-2999999
> I:3000000-3999999
> I:4000000-4999999
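To make the binning concrete, here is a minimal shell sketch that produces ranges in the same shape as the output above for a single chromosome. It is not how sc iter is implemented; make_ranges is a hypothetical helper, and clamping the last bin to the final 0-based position is an assumption.

```shell
# make_ranges CHROM LENGTH WIDTH
# Hypothetical helper: print 0-based, inclusive bins of WIDTH base pairs.
make_ranges() {
  local chrom=$1 length=$2 width=$3
  local start=0 end
  while [ "$start" -lt "$length" ]; do
    end=$((start + width - 1))
    # Assumption: the final bin is clamped to the last 0-based position.
    if [ "$end" -ge "$length" ]; then
      end=$((length - 1))
    fi
    echo "${chrom}:${start}-${end}"
    start=$((start + width))
  done
}

# A 2.5 Mb chromosome split into 1 Mb bins yields three ranges.
make_ranges I 2500000 1000000
```

The last range printed here is I:2000000-2499999, shorter than the others because the chromosome length is not a multiple of the width.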

Note: BAMs use a 0-based coordinate system, whereas VCFs are 1-based.
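If a downstream tool expects 1-based coordinates, the ranges can be shifted before use. The sketch below assumes the ranges are 0-based and inclusive on both ends, which is inferred from the output above rather than stated by sc; to_one_based is a hypothetical helper name.

```shell
# to_one_based REGION
# Hypothetical helper: convert a 0-based inclusive range (chrom:start-end)
# to a 1-based inclusive range by adding 1 to both ends.
to_one_based() {
  local region=$1
  local chrom=${region%:*}   # everything before the last colon
  local span=${region##*:}   # everything after the last colon
  local start=${span%-*}
  local end=${span#*-}
  echo "${chrom}:$((start + 1))-$((end + 1))"
}

to_one_based I:0-999999   # prints I:1-1000000
```

Splitting on the last colon keeps contig names that themselves contain a colon intact.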

This list of genomic ranges can be used to process a BAM or VCF in parallel:


function process_chunk {
  # Process a single region of the input file
  vcf=$1
  region=$2
  # Dry run: print the command instead of executing it,
  # e.g. bcftools call -m --regions "$region" "$vcf"
  echo bcftools call --regions "$region" "$vcf" # ...
}

# Export the function to make it available to GNU parallel
export -f process_chunk

parallel --verbose process_chunk ::: test.bam ::: $(sc iter test.bam)
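The same pattern works with xargs when GNU parallel is not available. This is a rough sketch: run_chunks is a hypothetical helper, the samtools command is only a stand-in that is printed rather than executed, and the job count of 4 is arbitrary.

```shell
# run_chunks BAM
# Hypothetical helper: read one region per line from stdin and handle
# up to 4 regions at a time. The command is echoed (dry run), not executed.
run_chunks() {
  local bam=$1
  xargs -P 4 -I {} echo samtools view -c "$bam" {}
}

# Usage, assuming sc is on PATH:
#   sc iter test.bam | run_chunks test.bam
```

Because jobs run concurrently, the output lines may not appear in the same order as the input ranges.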

You can also set the [width] option to 0 to generate a list of chromosomes.

See "Using GNU-Parallel for Bioinformatics" for a comprehensive guide to using GNU parallel in bioinformatics workflows.

seq-collection (sc) is a set of tools written in Nim using the fantastic hts-nim package.