bcftools is a great tool for working with variant call files, and in general it is very fast. However, I have found that merging VCF files (using bcftools merge) and concordance checking (using bcftools gtcheck) can be slow. That is why I wrote two functions that take advantage of GNU Parallel to parallelize them.
The function vcf_chromosomes extracts chromosome names from a VCF file using bcftools. Parallelization occurs across chromosomes.
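The original function is not reproduced here, but a minimal sketch of the idea might look like the following. It assumes the VCF header declares its contigs in `##contig=<ID=...>` lines. The helper name `contig_names_from_header` is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: read a VCF header on stdin and print one contig ID per line.
# Assumes contigs are declared as ##contig=<ID=name,...> or ##contig=<ID=name>.
contig_names_from_header() {
  sed -n 's/^##contig=<ID=\([^,>]*\).*/\1/p'
}

# Sketch of vcf_chromosomes: print the chromosome names declared in a VCF/BCF file.
# "bcftools view -h" prints only the header of the file.
vcf_chromosomes() {
  bcftools view -h "$1" | contig_names_from_header
}
```

Calling `vcf_chromosomes sample.vcf.gz` (a placeholder file name) would then print one chromosome name per line, which is a convenient input format for GNU Parallel.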
parallel_bcftools_merge is run much like bcftools merge. The only difference is that you have to pipe its output into bcftools view to convert it to the output format you want. For example:
parallel_bcftools_merge -m all `ls *list_of_bcffiles` | bcftools view -O z > merged_vcf.vcf.gz
The parallel_bcftools_merge function generates a temporary VCF file for each chromosome. You can use all bcftools merge flags except -O with this function.
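To make the per-chromosome approach concrete, here is a rough sketch of how such a merge could be structured. It is not the actual parallel_bcftools_merge. The names `merge_commands` and `parallel_merge_sketch` are hypothetical, and the sketch assumes that the input files are indexed, that file names contain no whitespace or shell metacharacters, and that the header lists contigs in `##contig=<ID=...>` lines.

```shell
# Hypothetical helper: read chromosome names on stdin and print one
# "bcftools merge" command per chromosome.
# Usage: merge_commands TMPDIR [merge flags and input files...]
merge_commands() {
  tmpdir=$1
  shift
  while IFS= read -r chrom; do
    printf 'bcftools merge -r %s -O b -o %s/%s.bcf %s\n' \
      "$chrom" "$tmpdir" "$chrom" "$*"
  done
}

# Hypothetical driver: the first argument is the file used to list chromosomes,
# the remaining arguments are passed on to bcftools merge.
parallel_merge_sketch() {
  chrom_source=$1
  shift
  tmpdir=$(mktemp -d) || return 1
  # List chromosome names from the header of the first file.
  bcftools view -h "$chrom_source" |
    sed -n 's/^##contig=<ID=\([^,>]*\).*/\1/p' > "$tmpdir/chroms.txt"
  # With no command given, GNU Parallel runs each input line as a command.
  merge_commands "$tmpdir" "$@" < "$tmpdir/chroms.txt" | parallel
  # Concatenate the per-chromosome files in header order and write VCF to stdout.
  sed "s|.*|$tmpdir/&.bcf|" "$tmpdir/chroms.txt" | xargs bcftools concat -O v
  rm -rf "$tmpdir"
}
```

Because the sketch sets -O itself for the temporary files and for the final stream, passing -O through would conflict, which is consistent with the restriction described above.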
parallel_bcftools_gtcheck should not be used with --all-sites or --plot. I recommend using this function with -H and -G 1 to calculate the absolute number of differences in homozygous calls between samples. This function also requires datamash (on OS X, install it with brew install datamash).
The output file is slightly different from what bcftools normally outputs. In general, I use this function specifically to calculate concordance between individual fastq runs, like this:
If you need to fetch PubMed citations in bulk, it can be convenient to do so using PubMed identifiers. I’ve created a pubmed() function that can be added to a Google Sheet and used to fetch formatted HTML citations from PubMed. For example, entering the following into a cell:
Will return an HTML-formatted citation:
<p><strong>The heritability of metabolic profiles in newborn twins.</strong><br />Alul FY, Cook DE, Shchelochkov OA, Fleener LG, Berberich SL, Murray JC, Ryckman KK, <br />(2013 Mar) <em>Heredity</em> 110 (3) 253-8</p>
This citation renders nicely as:
The heritability of metabolic profiles in newborn twins. Alul FY, Cook DE, Shchelochkov OA, Fleener LG, Berberich SL, Murray JC, Ryckman KK,
(2013 Mar) Heredity 110 (3) 253-8
To implement the function, you’ll need to copy and paste the function below into the script editor and save it as a new project. It will then become available within your Google Sheet. The script editor is available through the Tools > Script Editor menu.