Export excel worksheets as individual CSVs

November 9, 2013   

If you need to work with data spread across a bunch of worksheets within an excel workbook, but you don’t want to do so in Microsoft Excel – here is a python script for extracting each individual workbook as a csv and exporting them all to a folder.

Programming  Tips  .csv .xls .xlsx excel gist python


Fetch Data from UCSC Genome Browser

November 3, 2013   

Previously, I’ve shown that you can use a mysql database browser (e.g. Sequel Pro) to access and browse the UCSC Genome Browser MySQL database.

If you have a small dataset that you would like to annotate, you can write SQL statements to fetch data. Below I show how you can use python to fetch genome coordinates by specifying gene and genome build.


Note: The UCSC browser mysql resource will throttle you if you make too many queries. If you need to annotate large datasets, all of the data is freely available for download here.

Bioinformatics  Genetics  gist python


Accessing the UCSC Genome Browser MySQL Database

November 2, 2013   

The UCSC Genome browser has a publicly available MySQL database. There are a lot of different ways you can use it, including:

  • Annotating a small dataset
  • Understanding how data is formatted, and can be used.
  • Browsing Data

If you are on a mac – Sequel Pro is a fantastic tool for browsing.

Connecting

To browse the UCSC genome browser database, download Sequal Pro and enter in the following connection information into the login screen:

UCSC MySQL

You should be presented with a screen that looks like this:

Browsing UCSC Genome MySQL database

Databases are represented as Connection/Resource > Database > Tables > Rows. Sequal pro lets you connect to the UCSC Browser MySQL server. Each database represents a different genome. Remember that genomes change and improve over time – so there are multiple builds of each genome (e.g. hg18 and hg19) represented by separate databases.

Within each database are tables. These are the same tables referred to on the UCSC website – and you can find out more about what the data represents on the browser website by clicking ‘describe table schema’ for the table of interest as shown below.

Bioinformatics  Genetics 


Get rid of the Microsoft User Data folder

August 29, 2013   

Who thought it would be a good idea to put this crap in the documents folder? Here is a quick one liner solution – open a terminal window and paste this in:

mv /Users/$USER/Documents/Microsoft\ User\ Data/ /Users/$USER/Library/Preferences/

Thanks to Danny Chang for the tip.

Tips 


A function for retrieving SNP data from Entrez using BioPython

June 6, 2013   

Biopython is a great tool for interacting with biological databases. I use it to retrieve records from NCBI’s Entrez databases including Pubmed.

Unfortunately – one notable database biopython has trouble working with is the SNP database. This is due to the Bio.Entrez parser being unable to handle the XML returned from this database. One solution is to use a built in Python XML parser, but I thought I’d try to come up with an easier solution.

To solve this problem – I wrote a function for retrieving SNP data, and parsing it into an array. Feel free to build on this, and use it as you wish. Suggestions welcome!

Bioinformatics  Genetics  Programming  Entrez gist NCBI