Split a GFF File into Individual Features

January 25, 2015   

The General Feature Format is a widely used format for annotating genome sequences. If indexed with tabix, gff files can be viewed in IGV or elsewhere. While features are organized in a nested manner (e.g. genes > exons > variant), you can pull out the individual types and index them, or combine only a few for viewing in your genome browser.

I was working with wormbase annotation files, which combine all the different types of features together (genes, ncRNA, mRNA, binding site, operon, G Quartets, piRNAs, etc). This results in a very dense track in IGV which makes it difficult to disentangle what role individual features (or features of interest) might have.

As a result, I wrote this very short script for splitting the individual feature types apart, sorting them, and indexing them with tabix. This way they can be selectively viewed in IGV or elsewhere.

Bash  Bioinformatics  Programming  gff gist