<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>
    Daniel E. Cook
    </title>
    <link>https://www.danielecook.com/</link>
    <description>Recent content on Daniel E. Cook</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    
    
    <copyright>This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.</copyright>
    <lastBuildDate>Wed, 22 Apr 2020 01:15:53 +0000</lastBuildDate>
    
    
        <atom:link href="https://www.danielecook.com/feed/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>A tool for writing TILs (Today I Learned)</title>
      <link>https://www.danielecook.com/a-tool-for-writing-tils-today-i-learned/</link>
      <pubDate>Wed, 22 Apr 2020 01:15:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/a-tool-for-writing-tils-today-i-learned/</guid>
      <description>&lt;p&gt;There is a great repo on GitHub of &lt;a href=&#34;https://github.com/jbranchaud/til/blob/master/README.md&#34;&gt;TILs&lt;/a&gt;. The author (jbranchaud) states:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A collection of concise write-ups on small things I learn day to day across a variety of languages and technologies. These are things that don&amp;rsquo;t really warrant a full blog post.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think this is a pretty cool idea, but putting together the repo with the index can take time and interrupt your workflow. I wanted to make it quick and easy to add TILs, so I wrote &lt;a href=&#34;https://www.github.com/danielecook/til-tool&#34;&gt;TIL-Tool&lt;/a&gt;. &lt;strong&gt;TIL-Tool&lt;/strong&gt; is a command-line application invoked using &lt;code&gt;til&lt;/code&gt; that makes it easy to write TILs and generate an index.&lt;/p&gt;
&lt;p&gt;To create a new TIL, run &lt;code&gt;til open topic/title&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;til open Python/list_comprehensions
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This opens a new text document which you can edit. Once you are done, you can save the document and &lt;code&gt;til&lt;/code&gt;. If you have configured a git remote (e.g. on GitHub), you can then run &lt;code&gt;til push&lt;/code&gt;, which builds an index and pushes your changes.&lt;/p&gt;
&lt;p&gt;All TILs are stored in &lt;code&gt;~/.til&lt;/code&gt;.&lt;/p&gt;
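&lt;p&gt;To give a rough idea of what building an index involves, here is a minimal Python sketch. It is not TIL-Tool&amp;rsquo;s actual implementation: the &lt;code&gt;topic/title.md&lt;/code&gt; layout, the &lt;code&gt;.md&lt;/code&gt; extension, and the &lt;code&gt;build_index&lt;/code&gt; function are all assumptions made for illustration.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;from pathlib import Path


def build_index(root):
    """Return Markdown index lines for notes stored as root/topic/title.md.

    The directory layout and the .md extension are assumptions for this sketch.
    """
    lines = ["# TIL", ""]
    # Treat every non-hidden directory as a topic (this skips .git)
    topics = sorted(p for p in Path(root).iterdir()
                    if p.is_dir() and not p.name.startswith("."))
    for topic in topics:
        notes = sorted(topic.glob("*.md"))
        if not notes:
            continue  # skip empty topic directories
        lines.append("## " + topic.name)
        for note in notes:
            # Link text is the file name without its extension
            lines.append("- [{}]({}/{})".format(note.stem, topic.name, note.name))
        lines.append("")
    return lines
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Calling &lt;code&gt;build_index(Path.home() / &#34;.til&#34;)&lt;/code&gt; and joining the result with newlines would give the text of a README-style index, with one heading per topic.&lt;/p&gt;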
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/danielecook/til-tool&#34;&gt;TIL-Tool on GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.github.com/danielecook/til&#34;&gt;TIL Example Repo&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>You can walk to In-N-Out from LAX</title>
      <link>https://www.danielecook.com/you-can-walk-to-in-n-out-from-lax/</link>
      <pubDate>Thu, 30 Jan 2020 00:26:29 +0100</pubDate>
      
      <guid>https://www.danielecook.com/you-can-walk-to-in-n-out-from-lax/</guid>
      <description>&lt;p&gt;Don&amp;rsquo;t believe everything you read on the internet. There is an In-N-Out just outside of LAX on Sepulveda Blvd which I had thought was walkable. I was discouraged after reading &lt;a href=&#34;https://www.yelp.com/topic/los-angeles-in-n-out-next-to-lax-walkable&#34;&gt;this thread&lt;/a&gt; which suggests the walk to In-N-Out from LAX is not possible during a layover. Turns out, on a decent layover (~2.5 hours), it is totally possible!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/lax_walk.png&#34; alt=&#34;lax-direction&#34;&gt;&lt;/p&gt;
&lt;p&gt;We gave it a go because I am a big fan of In-N-Out. Go to the lower level (arrivals) and head east. Turn left on Sky Way and follow it until you hit the stairs down to S. Sepulveda Blvd. Then head north and cross once you see In-N-Out on the left. It is not the most glamorous of walks, but it is a great way to get out of the airport. It is about a mile each way (~22 min).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/lax.jpeg&#34; alt=&#34;walk-to-lax&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Parallelize by iterating over chromosomal ranges</title>
      <link>https://www.danielecook.com/parallelize-by-iterating-over-chromosomal-ranges/</link>
      <pubDate>Wed, 29 Jan 2020 01:15:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/parallelize-by-iterating-over-chromosomal-ranges/</guid>
      <description>&lt;p&gt;I have added a new utility to &lt;code&gt;seq-collection&lt;/code&gt; called &lt;code&gt;iter&lt;/code&gt; which generates chromosomal ranges. Lists of genomic ranges can be easily plugged into utilities such as &lt;code&gt;xargs&lt;/code&gt; or &lt;a href=&#34;https://www.gnu.org/software/parallel/&#34;&gt;gnu-parallel&lt;/a&gt; to parallelize commands.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sc iter test.bam 100,000 &lt;span style=&#34;color:#75715e&#34;&gt;# Iterate on bins of 100k base pairs&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Outputs&lt;/span&gt;
&amp;gt; I:0-999999
&amp;gt; I:1000000-1999999
&amp;gt; I:2000000-2999999
&amp;gt; I:3000000-3999999
&amp;gt; I:4000000-4999999
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note: BAMs use a 0-based coordinate system; VCFs are 1-based.&lt;/p&gt;
&lt;p&gt;This list of genomic ranges can be used to process a BAM or VCF in parallel:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; process_chunk &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
  &lt;span style=&#34;color:#75715e&#34;&gt;# Code to process chunk&lt;/span&gt;
  vcf&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;$1
  region&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;$2
  &lt;span style=&#34;color:#75715e&#34;&gt;# e.g. bcftools call -m --regions&lt;/span&gt; 
  echo bcftools call --regions &amp;#34;$region&amp;#34; &amp;#34;$vcf&amp;#34; &lt;span style=&#34;color:#75715e&#34;&gt;# ...&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Export the function to make it available to GNU parallel&lt;/span&gt;
export -f process_chunk

parallel --verbose process_chunk ::: test.bam ::: &lt;span style=&#34;color:#66d9ef&#34;&gt;$(&lt;/span&gt;sc iter test.bam&lt;span style=&#34;color:#66d9ef&#34;&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can also set the &lt;code&gt;[width]&lt;/code&gt; option to 0 to generate a list of chromosomes.&lt;/p&gt;
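&lt;p&gt;The binning itself is simple to picture. The Python sketch below shows one way to produce such ranges; it is an illustration rather than how &lt;code&gt;sc iter&lt;/code&gt; is implemented, and the &lt;code&gt;chrom_ranges&lt;/code&gt; function and the chromosome lengths are made up for the example.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;def chrom_ranges(chrom_lengths, width):
    """Yield region strings covering each chromosome in bins of `width` bases.

    chrom_lengths maps chromosome name to length (hypothetical input; a real
    tool would read these from the BAM or VCF header). Coordinates are 0-based
    and inclusive, mirroring the example output above. A width of 0 yields
    whole-chromosome names.
    """
    for chrom, length in chrom_lengths.items():
        if width == 0:
            yield chrom
            continue
        for start in range(0, length, width):
            end = min(start + width, length) - 1  # clip the last bin
            yield "{}:{}-{}".format(chrom, start, end)


# Made-up chromosome lengths, for illustration only
for region in chrom_ranges({"I": 2500000, "II": 1200000}, 1000000):
    print(region)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;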
&lt;p&gt;See &lt;a href=&#34;https://www.danielecook.com/using-gnu-parallel-for-bioinformatics/&#34;&gt;Using GNU-Parallel for Bioinformatics&lt;/a&gt; for a comprehensive guide on using Parallel for bioinformatics.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/danielecook/seq-collection&#34;&gt;seq-collection&lt;/a&gt; (&lt;strong&gt;sc&lt;/strong&gt;) is a set of tools written in &lt;a href=&#34;https://nim-lang.org/&#34;&gt;nim&lt;/a&gt; and using the fantastic &lt;a href=&#34;https://github.com/brentp/hts-nim&#34;&gt;hts-nim&lt;/a&gt; package.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Calculate Insert Size Metrics Faster</title>
      <link>https://www.danielecook.com/calculate-insert-size-metrics-faster/</link>
      <pubDate>Wed, 29 Jan 2020 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/calculate-insert-size-metrics-faster/</guid>
      <description>&lt;p&gt;Picard tools is a great set of utilities by the Broad Institute for performing sequence analysis. However, some of the utilities run on the slower side.&lt;/p&gt;
&lt;p&gt;To speed things up, I created a new command: &lt;code&gt;insert-size&lt;/code&gt; as part of &lt;a href=&#34;https://www.github.com/danielecook/seq-collection&#34;&gt;seq-collection&lt;/a&gt;. The command runs much faster, owing in part to parallelization of insert-size calculations.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/insert-size-benchmark.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;insert-size&lt;/code&gt; does not operate in exactly the same way as Picard &lt;code&gt;CollectInsertSizeMetrics&lt;/code&gt;, but the results are very close.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/insert_size_compare.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;insert-size&lt;/code&gt; has some nice advantages over Picard. The output is a lot more interpretable and parsable than standard Picard output.&lt;/p&gt;
&lt;p&gt;For example, if you run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sc insert-size --basename --header tests/data/test.bam
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output table is:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;right&#34;&gt;median&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;mean&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std_dev&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;min&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;percentile_99.5&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;max_all&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;n_reads&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;n_accept&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;n_use&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;sample&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;basename&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;179&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;176.5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;63.954&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;38&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;358&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;359&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;237&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;101&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;100&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;AB1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;test.bam&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You can also output the distribution of insert-sizes by count by specifying the &lt;code&gt;--dist=&amp;lt;filename&amp;gt;&lt;/code&gt; argument.&lt;/p&gt;
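&lt;p&gt;A count distribution like this is enough to recompute simple summary statistics. Here is a small Python sketch, assuming the distribution has already been loaded into a dictionary mapping insert size to read count; the &lt;code&gt;summarize_distribution&lt;/code&gt; function and the example counts are hypothetical, and nothing is assumed about the layout of the &lt;code&gt;--dist&lt;/code&gt; file.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import statistics


def summarize_distribution(counts):
    """Return (median, mean) for a dict mapping insert size to read count."""
    # Expand the counts back into individual observations; fine for a sketch,
    # but memory-hungry for very deep data.
    sizes = [size for size in sorted(counts) for _ in range(counts[size])]
    return statistics.median(sizes), statistics.mean(sizes)


# Made-up counts, for illustration only
median, mean = summarize_distribution({100: 2, 200: 1, 300: 1})
print(median, mean)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;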
&lt;p&gt;&lt;a href=&#34;https://github.com/danielecook/seq-collection&#34;&gt;seq-collection&lt;/a&gt; (&lt;strong&gt;sc&lt;/strong&gt;) is a set of tools written in &lt;a href=&#34;https://nim-lang.org/&#34;&gt;nim&lt;/a&gt; and using the fantastic &lt;a href=&#34;https://github.com/brentp/hts-nim&#34;&gt;hts-nim&lt;/a&gt; package.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>From Pandas to Google Sheets</title>
      <link>https://www.danielecook.com/from-pandas-to-google-sheets/</link>
      <pubDate>Fri, 25 Oct 2019 01:15:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/from-pandas-to-google-sheets/</guid>
      <description>&lt;p&gt;I wrote the following snippet to post datasets (e.g. TSVs or CSVs) to Google Sheets. To get this to work, you will need to &lt;a href=&#34;https://gspread.readthedocs.io/en/latest/oauth2.html&#34;&gt;authorize Google Sheets access&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then you can set the content of any Google Sheets worksheet to the data from a pandas dataframe by using the &lt;code&gt;pandas_to_sheets&lt;/code&gt; function.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/env python&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; gspread
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; pandas &lt;span style=&#34;color:#f92672&#34;&gt;as&lt;/span&gt; pd
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; oauth2client.service_account &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; ServiceAccountCredentials

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;iter_pd&lt;/span&gt;(df):
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; val &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; df&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;columns:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;yield&lt;/span&gt; val
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; row &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; df&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;to_numpy():
        &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; val &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; row:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; pd&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;isna(val):
                &lt;span style=&#34;color:#66d9ef&#34;&gt;yield&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
            &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
                &lt;span style=&#34;color:#66d9ef&#34;&gt;yield&lt;/span&gt; val

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pandas_to_sheets&lt;/span&gt;(pandas_df, sheet, clear &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; True):
    &lt;span style=&#34;color:#75715e&#34;&gt;# Updates all values in a worksheet to match a pandas dataframe&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; clear:
        sheet&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;clear()
    (row, col) &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pandas_df&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;shape
    cells &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sheet&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;range(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A1:{}&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;format(gspread&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;utils&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;rowcol_to_a1(row &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, col)))
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; cell, val &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; zip(cells, iter_pd(pandas_df)):
        cell&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;value &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; val
    sheet&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;update_cells(cells)

scope &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; [&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;https://spreadsheets.google.com/feeds&amp;#39;&lt;/span&gt;,
         &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;https://www.googleapis.com/auth/drive&amp;#39;&lt;/span&gt;]

credentials &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ServiceAccountCredentials&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;from_json_keyfile_name(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;service.json&amp;#39;&lt;/span&gt;, scope)

gc &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; gspread&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;authorize(credentials)

workbook &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; gc&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;open_by_key(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;workbook id&amp;gt;&amp;#34;&lt;/span&gt;)
sheet &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; workbook&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;worksheet(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;worksheet_name&amp;#34;&lt;/span&gt;)

df &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pd&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;read_csv(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;input_data.tsv&amp;#34;&lt;/span&gt;, sep&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\t&amp;#34;&lt;/span&gt;)  &lt;span style=&#34;color:#75715e&#34;&gt;# TSV input, so split on tabs&lt;/span&gt;
pandas_to_sheets(df, sheet)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Speeding up Reading and Writing in R</title>
      <link>https://www.danielecook.com/speeding-up-reading-and-writing-in-r/</link>
      <pubDate>Sun, 20 Oct 2019 01:30:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/speeding-up-reading-and-writing-in-r/</guid>
      <description>&lt;p&gt;If you are relying on built-in functions to read and write large datasets, you are losing out on efficiency and speed gains available through external packages in R. Below, I benchmark some of the options for reading and writing files.&lt;/p&gt;
&lt;h1 id=&#34;sample-data&#34;&gt;Sample Data&lt;/h1&gt;
&lt;p&gt;First, I&amp;rsquo;ll generate a sample dataset with one million rows that we can use for testing.&lt;/p&gt;
&lt;h2 id=&#34;generating-a-test-dataset&#34;&gt;Generating a test dataset&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(tidyverse)
&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(microbenchmark)
n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1e6&lt;/span&gt;
times &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;# Number of times to run each benchmark&lt;/span&gt;
data &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;data.frame&lt;/span&gt;(a &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;runif&lt;/span&gt;(n),
                   b &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sample&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1000&lt;/span&gt;, n, T),
                   c &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sample&lt;/span&gt;(month.name, n, T),
                   d &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sample&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;LETTERS&lt;/span&gt;, n, T))

&lt;span style=&#34;color:#a6e22e&#34;&gt;write.table&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data.tsv&amp;#34;&lt;/span&gt;, sep &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\t&amp;#34;&lt;/span&gt;, quote &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; F, row.names &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; F)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here are the first few rows of that dataset:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;a&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;b&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;c&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;d&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1926477&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;789&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;August&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;R&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.8303095&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;156&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;March&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;D&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1144189&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;742&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;July&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;P&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2828960&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;337&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;April&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;S&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2861664&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;43&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;November&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;W&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1 id=&#34;reading-tsvs&#34;&gt;Reading TSVs&lt;/h1&gt;
&lt;p&gt;Base R has some pretty slow functions for reading files that are also poorly designed (row numbers and quotes by default, issues reading column names with special characters, etc.). Let&amp;rsquo;s see how they compare with more up-to-date packages.&lt;/p&gt;
&lt;h2 id=&#34;vroom-vs-readr-vs-base-r-vs-datatable&#34;&gt;vroom vs readr vs base R vs data.table&lt;/h2&gt;
&lt;p&gt;Below I use microbenchmark to compare the following methods for reading this 1M row dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;base::read.table&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;base::read.delim&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;readr::read_tsv&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vroom::vroom&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;data.table::fread&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tbl_df(data.table::fread)&lt;/code&gt; - This converts the data.table to a &lt;code&gt;tibble::tbl_df&lt;/code&gt; object which is the type of data structure &lt;code&gt;readr&lt;/code&gt; and &lt;code&gt;vroom&lt;/code&gt; return and is what is used in the &lt;a href=&#34;https://www.tidyverse.org/&#34;&gt;tidyverse&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These functions will output data in either a data.frame, tibble, or data.table.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;bm &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;microbenchmark&lt;/span&gt;(
  `base::read.table` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;read.table&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data.tsv&amp;#34;&lt;/span&gt;),
  `base::read.delim` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;read.delim&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data.tsv&amp;#34;&lt;/span&gt;),
  `readr::read_tsv` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; readr&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;read_tsv&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data.tsv&amp;#34;&lt;/span&gt;),
  `vroom ~ 1 thread` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; vroom&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;vroom&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data.tsv&amp;#34;&lt;/span&gt;, num_threads &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;),
  `vroom ~ 8 threads` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; vroom&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;vroom&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data.tsv&amp;#34;&lt;/span&gt;),
  &lt;span style=&#34;color:#a6e22e&#34;&gt;`tbl_df&lt;/span&gt;(data.table&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;fread) &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; thread` = tbl_df(data.table::fread(&amp;#34;data.tsv&amp;#34;, nThread = 1)),
  `&lt;span style=&#34;color:#a6e22e&#34;&gt;tbl_df&lt;/span&gt;(data.table&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;fread) &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt; threads` = tbl_df(data.table::fread(&amp;#34;data.tsv&amp;#34;)),
  `data.table&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;fread &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; thread` = data.table::fread(&amp;#34;data.tsv&amp;#34;, nThread = 1),
  `data.table&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;fread &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt; threads` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; data.table&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;fread&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data.tsv&amp;#34;&lt;/span&gt;),
  times &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; times
)
&lt;span style=&#34;color:#a6e22e&#34;&gt;autoplot&lt;/span&gt;(bm) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; 
  &lt;span style=&#34;color:#a6e22e&#34;&gt;labs&lt;/span&gt;(caption &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; glue&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;glue&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;{scales::comma(n)} rows; {times} times&amp;#34;&lt;/span&gt;))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/benchmarks-1.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;Looks like the base R functions lose, and by a lot. &lt;code&gt;data.table::fread&lt;/code&gt; and &lt;code&gt;vroom::vroom&lt;/code&gt; come out on top at ~100 milliseconds, whereas the base functions take ~10 seconds, or 100x longer!&lt;/p&gt;
&lt;p&gt;Stop wasting your time with &lt;code&gt;read.table&lt;/code&gt;, &lt;code&gt;read.csv&lt;/code&gt;, and &lt;code&gt;read.delim&lt;/code&gt; and move to something quicker like &lt;code&gt;data.table::fread&lt;/code&gt; or &lt;code&gt;vroom::vroom&lt;/code&gt;, both of which perform much faster. Both can also take advantage of multiple cores, but they outperform base R even when using only a single thread!&lt;/p&gt;
&lt;h1 id=&#34;writing-tsvs&#34;&gt;Writing TSVs&lt;/h1&gt;
&lt;h2 id=&#34;vroom-vs-readr-vs-datatable-vs-base-r&#34;&gt;vroom vs readr vs data.table vs base R&lt;/h2&gt;
&lt;p&gt;Next I compared methods for writing TSV files. Base R has the functions &lt;code&gt;write.csv&lt;/code&gt; and &lt;code&gt;write.table&lt;/code&gt; for writing delimited text files. Unfortunately, these too have poor defaults (quoting strings, adding row names). I have turned these off for the comparison.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;bm &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;microbenchmark&lt;/span&gt;(
    `base::write.table` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;write.table&lt;/span&gt;(data, file &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv&amp;#34;&lt;/span&gt;, quote&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;F, row.names &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; F, sep &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\t&amp;#34;&lt;/span&gt;),
    `readr::write_tsv` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; readr&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;write_tsv&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv&amp;#34;&lt;/span&gt;),
    `readr::write_tsv + gz` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; readr&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;write_tsv&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv.gz&amp;#34;&lt;/span&gt;),
    `data.table::fwrite ~ 1 thread` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; data.table&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;fwrite&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv&amp;#34;&lt;/span&gt;, nThread &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;),
    `data.table::fwrite ~ 8 threads` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; data.table&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;fwrite&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv&amp;#34;&lt;/span&gt;, nThread &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt;),
    `vroom::vroom_write ~ 1 thread` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; vroom&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;vroom_write&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv&amp;#34;&lt;/span&gt;, num_threads &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;),
    `vroom::vroom_write ~ 8 threads` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; vroom&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;vroom_write&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv&amp;#34;&lt;/span&gt;),
    `vroom::vroom_write ~ 1 thread + gz` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; vroom&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;vroom_write&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv.gz&amp;#34;&lt;/span&gt;, num_threads &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;),
    `vroom::vroom_write ~ 8 threads + gz` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; vroom&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;vroom_write&lt;/span&gt;(data, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.tsv.gz&amp;#34;&lt;/span&gt;),
    times &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; times
)
&lt;span style=&#34;color:#a6e22e&#34;&gt;autoplot&lt;/span&gt;(bm) 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/writing_tsv-1.png&#34; alt=&#34;&#34;&gt;&lt;!-- raw HTML omitted --&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;data.table::fwrite&lt;/code&gt; performs the fastest in multi-threaded mode with &lt;code&gt;vroom::vroom_write&lt;/code&gt; not far behind. These are ~100x faster than base R. Applying gzip compression slows things down considerably but can save a lot of space.&lt;/p&gt;
&lt;h3 id=&#34;serializing-data&#34;&gt;Serializing Data&lt;/h3&gt;
&lt;p&gt;Serialized data formats retain column types and avoid data loss that may occur when writing and reading TSVs. Here I compare:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;feather::write_feather&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fst::write_fst&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;base::save&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;base::saveRDS&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that these serialization formats each provide other benefits that should be considered. For example, feather files are a good interchange format between R and Python via the pandas library.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;bm &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;microbenchmark&lt;/span&gt;(
   `base::save`&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;save&lt;/span&gt;(data, file &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.Rda&amp;#34;&lt;/span&gt;),
   `saveRDS` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;saveRDS&lt;/span&gt;(data, file &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.rds&amp;#34;&lt;/span&gt;),
   `fst::write_fst` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fst&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;write_fst&lt;/span&gt;(data, path &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.fst&amp;#34;&lt;/span&gt;),
   `feather::write_feather` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; feather&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;write_feather&lt;/span&gt;(data, path &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.feather&amp;#34;&lt;/span&gt;),
   times &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; times
)
&lt;span style=&#34;color:#a6e22e&#34;&gt;autoplot&lt;/span&gt;(bm) 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/serialize-1.png&#34; alt=&#34;&#34;&gt;&lt;!-- raw HTML omitted --&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;fst&lt;/code&gt; and &lt;code&gt;feather&lt;/code&gt; perform about the same and, again, roughly 100x better than base R.&lt;/p&gt;
&lt;h3 id=&#34;reading-serialized-data&#34;&gt;Reading Serialized Data&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;bm &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;microbenchmark&lt;/span&gt;(
   `base::load`&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;load&lt;/span&gt;(file &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.Rda&amp;#34;&lt;/span&gt;),
   `readRDS` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;readRDS&lt;/span&gt;(file &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.rds&amp;#34;&lt;/span&gt;),
   `fst::read_fst` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fst&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;read_fst&lt;/span&gt;(path &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.fst&amp;#34;&lt;/span&gt;),
   `feather::read_feather` &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; feather&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;read_feather&lt;/span&gt;(path &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;out.feather&amp;#34;&lt;/span&gt;),
   times &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; times
)
&lt;span style=&#34;color:#a6e22e&#34;&gt;autoplot&lt;/span&gt;(bm) 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/serialize_bm-1.png&#34; alt=&#34;&#34;&gt;&lt;!-- raw HTML omitted --&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;fst&lt;/code&gt; reads the quickest but &lt;code&gt;feather&lt;/code&gt; is not too far behind. These functions are roughly 10x faster than base R in this comparison.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Using GNU-Parallel for bioinformatics</title>
      <link>https://www.danielecook.com/using-gnu-parallel-for-bioinformatics/</link>
      <pubDate>Fri, 27 Sep 2019 01:30:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/using-gnu-parallel-for-bioinformatics/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://www.gnu.org/software/parallel/&#34;&gt;GNU Parallel&lt;/a&gt; is an indispensable tool for speeding up bioinformatics. It allows you to easily parallelize commands. Below, I detail some of the basics regarding how it is used and how it can be applied to bioinformatics.&lt;/p&gt;
&lt;p&gt;Many HPC clusters will have GNU Parallel pre-installed or available as a module. You can also install it using &lt;a href=&#34;https://brew.sh&#34;&gt;Homebrew&lt;/a&gt; or other package managers.&lt;/p&gt;
&lt;h1 id=&#34;basic-usage&#34;&gt;Basic Usage&lt;/h1&gt;
&lt;p&gt;Let&amp;rsquo;s start with a basic example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; | parallel -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; echo
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here we are (1) printing a sequence of numbers from 1 to 5, and (2) piping this data into &lt;code&gt;parallel&lt;/code&gt;. We have provided the command &lt;code&gt;echo&lt;/code&gt;, which will be parallelized across 4 jobs (&lt;code&gt;-j 4&lt;/code&gt;). We can see what this looks like by using the &lt;code&gt;--dry-run&lt;/code&gt; flag, which prints the commands to be run.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; | parallel --dry-run -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; echo
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;echo 3
echo 4
echo 5
echo 2
echo 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The results are out of order! This is due to the nature of parallelization. Not all &amp;ldquo;jobs&amp;rdquo; start at the same moment or take the same amount of time, so it is common to observe outputs in a different order. We can enforce a &amp;ldquo;first in first out&amp;rdquo; result set by using the &lt;code&gt;-k&lt;/code&gt; flag. Let&amp;rsquo;s see what the &lt;strong&gt;output&lt;/strong&gt; looks like by adding &lt;code&gt;-k&lt;/code&gt; and removing the &lt;code&gt;--dry-run&lt;/code&gt; flag:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; | parallel -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; -k echo
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;1
2
3
4
5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Like any other command, you can send this output to a file:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; | parallel -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; -k echo &amp;gt; out.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;-j&#34;&gt;&lt;code&gt;-j&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;To benefit from GNU Parallel, you need a multi-core CPU. Parallelizing across more cores than you have available can actually make performance &lt;strong&gt;worse&lt;/strong&gt;, so it is important to tune the &lt;code&gt;-j&lt;/code&gt; parameter to the number of cores available.&lt;/p&gt;
&lt;p&gt;Luckily, parallel allows you to specify &lt;code&gt;-j&lt;/code&gt; using a percentage of cores or as a number relative to the total number of cores. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;parallel -j 100% &lt;span style=&#34;color:#75715e&#34;&gt;# Uses 100% of cores.&lt;/span&gt;
parallel -j -1 &lt;span style=&#34;color:#75715e&#34;&gt;# Uses 1 less than the total number of cores.&lt;/span&gt;
parallel -j +1 &lt;span style=&#34;color:#75715e&#34;&gt;# Parallelize across the number of cores + 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;use--for-args&#34;&gt;Use &lt;code&gt;:::&lt;/code&gt; for args&lt;/h2&gt;
&lt;p&gt;Use &lt;code&gt;:::&lt;/code&gt; to specify arguments derived from commands or lists.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;parallel -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; -k echo ::: &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; 5&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; - There are limits to the number of arguments you can provide with a command substitution as shown above. In these instances, it may be better to pipe arguments or use a file (below) rather than supply them with a command substitution:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; | parallel -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; -k echo
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;use--for-args-within-files&#34;&gt;Use &lt;code&gt;::::&lt;/code&gt; for args within files&lt;/h2&gt;
&lt;p&gt;For large argument lists, put the arguments in a file (one per line) and pass that file to &lt;code&gt;parallel&lt;/code&gt; using &lt;code&gt;::::&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;parallel -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; -k echo :::: my_args.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;use-&#34;&gt;Use &lt;code&gt;{}&lt;/code&gt; to place args&lt;/h2&gt;
&lt;p&gt;By default, &lt;code&gt;parallel&lt;/code&gt; assumes the arguments are placed at the end of the input command, but you can explicitly define where arguments are substituted using &lt;code&gt;{}&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;parallel --dry-run -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; -k echo &lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;{}&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;-- a number&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt; ::: &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; 5&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;echo &amp;#34;1 &amp;lt;-- a number&amp;#34;
echo &amp;#34;2 &amp;lt;-- a number&amp;#34;
echo &amp;#34;3 &amp;lt;-- a number&amp;#34;
echo &amp;#34;4 &amp;lt;-- a number&amp;#34;
echo &amp;#34;5 &amp;lt;-- a number&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice that we have to escape the quotes; there are ways around this.&lt;/p&gt;
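&lt;p&gt;One common way around the escaping is to wrap the whole command template in single quotes so the shell passes it along as a single word. The sketch below only shows what the shell hands over in each case; the &lt;code&gt;show_words&lt;/code&gt; helper is hypothetical and the message text is shortened for illustration.&lt;/p&gt;

```bash
#!/bin/bash
# Hypothetical helper: print each word the shell passes on, one per line
show_words() {
    printf '[%s]\n' "$@"
}

# Escaped quotes: the template arrives as three separate words
show_words echo \"{} "is a number"\"

# Single-quoted template: one word, and the inner quotes need no escaping
show_words 'echo "{} is a number"'

# With GNU Parallel installed, the single-quoted form would be used as:
#   parallel --dry-run -j 4 -k 'echo "{} is a number"' ::: 1 2 3
```

&lt;p&gt;The single-quoted form is usually easier to read once the command contains its own quotes, pipes, or redirections.&lt;/p&gt;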
&lt;h2 id=&#34;combinatorials&#34;&gt;Combinatorials&lt;/h2&gt;
&lt;p&gt;You can keep adding &lt;code&gt;:::&lt;/code&gt; and &lt;code&gt;::::&lt;/code&gt; to add additional arguments, and these will be combined to generate all possible combinations. This is extremely useful for testing commands with different combinations of input parameters.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;parallel --dry-run -k -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; Rscript run_analysis.R &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;1&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;2&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt; ::: &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;seq &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; 2&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt; ::: A B C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;Rscript run_analysis.R 1 A
Rscript run_analysis.R 1 B
Rscript run_analysis.R 1 C
Rscript run_analysis.R 2 A
Rscript run_analysis.R 2 B
Rscript run_analysis.R 2 C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;parallelize-functions&#34;&gt;Parallelize Functions&lt;/h2&gt;
&lt;p&gt;In some cases, you want to perform a series of commands. For example, the code below computes the complement of a DNA sequence and counts each of its nucleotides.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ATTA&amp;#34;&lt;/span&gt; |  tr ATCG TAGC | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    python -c &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;import sys; o=sys.stdin.read().strip(); print(o, o.count(&amp;#39;T&amp;#39;), o.count(&amp;#39;G&amp;#39;), o.count(&amp;#39;C&amp;#39;), o.count(&amp;#39;A&amp;#39;))&amp;#34;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This command has two operations. While it is possible to incorporate this into a &amp;lsquo;one-liner&amp;rsquo;, it is far easier to create a bash function, export it, and use that as input.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; count_nts &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    &lt;span style=&#34;color:#75715e&#34;&gt;# $1 is the first argument passed to the function&lt;/span&gt;
    echo $1 | tr ATCG TAGC | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    python -c &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;import sys; o=sys.stdin.read().strip(); print(o, o.count(&amp;#39;T&amp;#39;), o.count(&amp;#39;G&amp;#39;), o.count(&amp;#39;C&amp;#39;), o.count(&amp;#39;A&amp;#39;))&amp;#34;&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Use the `-f` flag to export functions&lt;/span&gt;
export -f count_nts

parallel -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; count_nts ::: TAAT TTT AAAAT GCGCAT | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\t&amp;#39;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the basics down, let&amp;rsquo;s see how we can use parallel to speed up bioinformatics.&lt;/p&gt;
&lt;h1 id=&#34;gnu-parallel-for-variant-calling&#34;&gt;GNU Parallel for Variant Calling&lt;/h1&gt;
&lt;p&gt;When working with BAMs or VCFs you can parallelize across chromosomes. Most variant callers or annotation tools allow you to operate on a single chromosome at a time by specifying a region. This allows us to apply a &lt;code&gt;split-apply-combine&lt;/code&gt; strategy by splitting by chromosome, operating on each chromosome, and combining the results at the end.&lt;/p&gt;
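&lt;p&gt;The split-apply-combine idea can be sketched without any genomics tools. The example below is a toy illustration and not part of the workflow that follows: the two-column table and file names are made up, and &lt;code&gt;xargs -P&lt;/code&gt; stands in for &lt;code&gt;parallel&lt;/code&gt; so the sketch runs where GNU Parallel is not installed.&lt;/p&gt;

```bash
#!/bin/bash
# Toy data and file names below are made up for illustration
work_dir=$(mktemp -d)
cd "${work_dir}" || exit 1
printf 'I\t100\nII\t200\nI\t300\nX\t400\nII\t500\n' > sites.tsv

# Split: write one file per chromosome (first column)
awk -F'\t' '{ print > ($1 ".part.tsv") }' sites.tsv
cut -f 1 sites.tsv | sort -u > chroms.txt

# Apply: count the rows of each piece, running up to 4 jobs at once
cat chroms.txt | xargs -P 4 -I {} sh -c 'grep -c "" "$1.part.tsv" > "$1.count"' _ {}

# Combine: gather the per-chromosome results in a fixed order
for chrom in $(cat chroms.txt); do
    printf '%s\t%s\n' "${chrom}" "$(cat "${chrom}.count")"
done > counts.tsv

cat counts.tsv
```

&lt;p&gt;Each job writes to its own output file, so the jobs can finish in any order and the combine step still controls the final ordering.&lt;/p&gt;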
&lt;h2 id=&#34;__split__-chromosomes-from-a-bam&#34;&gt;&lt;strong&gt;Split&lt;/strong&gt; chromosomes from a BAM&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;
chrom_list&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;samtools idxstats in.bam | cut -f &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; | grep -v &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;*&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# For C. elegans you would see the following 7 chromosomes:&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# I&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# II&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# III&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# IV&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# V&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# X&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# MtDNA&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can create a function so this operation is easier going forward:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; bam_chromosomes &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    samtools idxstats $1 | cut -f &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; | grep -v &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;*&amp;#39;&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;__apply__-an-operation-to-each-chromosome&#34;&gt;&lt;strong&gt;Apply&lt;/strong&gt; an operation to each chromosome&lt;/h2&gt;
&lt;p&gt;Here is where GNU Parallel comes into play, parallelizing variant calling by chromosome:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/bin/bash
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
genome&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;path/to/genome.fa
export genome &lt;span style=&#34;color:#75715e&#34;&gt;# This is critical!&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; parallel_call &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    bcftools mpileup &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;        --fasta-ref &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;genome&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;        --regions $2 &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;        --output-type u &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;        $1 | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    bcftools call --multiallelic-caller &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;                  --variants-only &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;                  --output-type u - &amp;gt; &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;1/.bam/&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;.$2.bcf
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

export -f parallel_call

chrom_set&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;bam_chromosomes sample_A.bam&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
parallel --verbose -j &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; parallel_call sample_A.bam ::: &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;chrom_set&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A few important notes regarding this step:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You must export any variables you use within a parallelized function. That is what I am doing here with the reference &lt;code&gt;genome&lt;/code&gt; variable.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bcftools mpileup&lt;/code&gt; outputs an uncompressed pileup (&lt;code&gt;--output-type u&lt;/code&gt;). This is done for efficiency&amp;rsquo;s sake: there is no reason to compress data in a pipe only for the next tool to decompress it.&lt;/li&gt;
&lt;li&gt;Similarly, I also output an uncompressed set of variant calls (&lt;code&gt;${1/.bam/}.$2.bcf&lt;/code&gt;) because these are temporary files that we will remove later.&lt;/li&gt;
&lt;li&gt;Finally, I use a variable substitution to remove the extension from the BAM and to generate a &lt;code&gt;&amp;lt;sample&amp;gt;.&amp;lt;chromosome&amp;gt;.bcf&lt;/code&gt; filename: &lt;code&gt;${1/.bam/}.$2.bcf&lt;/code&gt; → &lt;code&gt;sample_A.I.bcf&lt;/code&gt;, &lt;code&gt;sample_A.II.bcf&lt;/code&gt;, etc. This prevents filename collisions if we are calling many samples simultaneously.&lt;/li&gt;
&lt;/ul&gt;
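&lt;p&gt;The export rule and the filename substitution can both be checked without running any variant calling. This is a minimal sketch: &lt;code&gt;describe_job&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;parallel_call&lt;/code&gt;, and a child &lt;code&gt;bash -c&lt;/code&gt; process plays the role of a job started by &lt;code&gt;parallel&lt;/code&gt;.&lt;/p&gt;

```bash
#!/bin/bash
genome=path/to/genome.fa

# Hypothetical stand-in for parallel_call: report what a job would see
function describe_job {
    # ${1/.bam/} removes the first ".bam" from the BAM filename
    echo "ref=${genome:-unset} out=${1/.bam/}.$2.bcf"
}
export -f describe_job

# A child bash process sees the exported function but not the plain variable
bash -c 'describe_job sample_A.bam I'

# After exporting the variable, the child process sees it too
export genome
bash -c 'describe_job sample_A.bam I'
```

&lt;p&gt;The first call reports the reference as unset, which is the situation a job would be in if &lt;code&gt;export genome&lt;/code&gt; were left out.&lt;/p&gt;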
&lt;h2 id=&#34;__combine__-the-variant-calls&#34;&gt;&lt;strong&gt;Combine&lt;/strong&gt; the variant calls.&lt;/h2&gt;
&lt;p&gt;Once we have completed variant calling we need to combine everything back in the right order. We can load the list of chromosomes into the shell&amp;rsquo;s positional parameters (which behave like an array), add a prefix and suffix to each one to reconstruct the output filenames, and concatenate them into a single file.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Generate an array of the resulting files&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# to be concatenated.&lt;/span&gt;
sample_name&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sample_A&amp;#34;&lt;/span&gt;
set -- &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo $chrom_set | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\n&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; &amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
set -- &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;@/#/&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;sample_name&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;.&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; set -- &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;@/%/.bcf&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# This will generate a list of the output files:&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# sample_A.I.bcf sample_A.II.bcf etc.&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Output compressed result&lt;/span&gt;
bcftools concat $@ --output-type b &amp;gt; $sample_name.bcf

&lt;span style=&#34;color:#75715e&#34;&gt;# Remove intermediate files&lt;/span&gt;
rm $@
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To ensure the intermediate files are removed even when errors occur you should use a &lt;a href=&#34;http://redsymbol.net/articles/bash-exit-traps/&#34;&gt;bash trap&lt;/a&gt;.&lt;/p&gt;
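&lt;p&gt;A minimal sketch of such a trap is shown below. It assumes the intermediate files live in a scratch directory; &lt;code&gt;combine_step&lt;/code&gt; and the file names are hypothetical, and the failure is simulated with &lt;code&gt;exit 1&lt;/code&gt; where a real script would run &lt;code&gt;bcftools concat&lt;/code&gt;.&lt;/p&gt;

```bash
#!/bin/bash
# Hypothetical stand-in for the combine step; file names are illustrative
combine_step() (
    # The body runs in a subshell, so the EXIT trap fires when it finishes
    tmp_dir=$(mktemp -d)
    trap 'rm -rf "${tmp_dir}"' EXIT

    touch "${tmp_dir}/sample_A.I.bcf" "${tmp_dir}/sample_A.II.bcf"
    echo "${tmp_dir}"

    # Simulate a failure part-way through the step
    exit 1
)

scratch=$(combine_step) || true
if [ -d "${scratch}" ]; then
    echo "intermediate files left behind"
else
    echo "intermediate files removed"
fi
```

&lt;p&gt;Because the trap is registered before any files are created, the cleanup runs whether the step succeeds or fails.&lt;/p&gt;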
&lt;h1 id=&#34;summary&#34;&gt;Summary&lt;/h1&gt;
&lt;p&gt;GNU Parallel can greatly speed up simple parallelization scenarios. Additional code is often required to handle the &amp;ldquo;splitting&amp;rdquo; and &amp;ldquo;combining&amp;rdquo; steps, but this can allow for tremendous efficiency gains.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Converting VCF To JSON</title>
      <link>https://www.danielecook.com/converting-vcf-to-json/</link>
      <pubDate>Sun, 22 Sep 2019 01:15:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/converting-vcf-to-json/</guid>
      <description>&lt;p&gt;Recently I started developing a set of utilities called &lt;a href=&#34;https://github.com/danielecook/seq-collection&#34;&gt;seq-collection&lt;/a&gt; (&lt;strong&gt;sc&lt;/strong&gt;) written in &lt;a href=&#34;https://nim-lang.org/&#34;&gt;nim&lt;/a&gt; and using the fantastic &lt;a href=&#34;https://github.com/brentp/hts-nim&#34;&gt;hts-nim&lt;/a&gt; package.&lt;/p&gt;
&lt;p&gt;The first utility I added was a tool to convert a VCF to JSON. This tool is useful for building out an API that reads genotype data directly from the VCF format. It is possible to read specific variants or intervals of VCF files when they are indexed, allowing for fast and efficient querying of genetic data without the need for a database. Furthermore, these queries can be made over http connections making it possible to use a VCF file as a database.&lt;/p&gt;
&lt;p&gt;For example, as a graduate student I developed a &lt;a href=&#34;https://elegansvariation.org/data/browser/&#34;&gt;genome browser&lt;/a&gt; for &lt;em&gt;C. elegans&lt;/em&gt; wild isolates. Queries are made directly on specific genomic intervals of VCF files using &lt;code&gt;bcftools&lt;/code&gt;. However, a large amount of Python code is used to convert the VCF output to JSON format. The &lt;code&gt;sc&lt;/code&gt; utility could replace this underlying Python code on the server with a simple binary and a few command line arguments to accomplish the same task. Below I have a few examples illustrating how to use &lt;code&gt;sc json&lt;/code&gt;.&lt;/p&gt;
&lt;h1 id=&#34;installation&#34;&gt;Installation&lt;/h1&gt;
&lt;p&gt;You can download the macOS binary &lt;a href=&#34;https://github.com/danielecook/seq-collection/releases/download/0.0.1/sc_macosx&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Or build from source at &lt;a href=&#34;http://www.github.com/danielecook/seq-collection&#34;&gt;danielecook/seq-collection&lt;/a&gt;&lt;/p&gt;
&lt;h1 id=&#34;usage&#34;&gt;Usage&lt;/h1&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;json

Convert a VCF to JSON

Usage:
  json &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;options&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; vcf &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;region ...&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;

Arguments:
  vcf              VCF to convert to JSON
  &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;region ...&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;     List of regions

Options:
  -i, --info&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;INFO            comma-delimited INFO fields; Use &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;ALL&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; everything
  -f, --format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;FORMAT        comma-delimited FORMAT fields; Use &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;ALL&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; everything
  -s, --samples&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;SAMPLES      Set Samples &lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;default: ALL&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt;
  -p, --pretty               Prettify result
  -a, --array                Output as a JSON array instead of individual JSON lines
  -z, --zip                  Zip sample names with FORMAT fields &lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;e.g. &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;sample1&amp;#39;&lt;/span&gt;: 25, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;sample2&amp;#39;&lt;/span&gt;: 34&lt;span style=&#34;color:#f92672&#34;&gt;})&lt;/span&gt;
  -n, --annotation           Parse ANN Fields
  --pass                     Only output variants where FILTER&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;PASS
  --debug                    Debug
  -h, --help                 Show this help
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h1 id=&#34;examples&#34;&gt;Examples&lt;/h1&gt;
&lt;h2 id=&#34;list-all-sites&#34;&gt;List all sites&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sc json tests/data/test.vcf.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;41947&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;T&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;]}
{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;105133&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;]}
{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;176422&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;]}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;pretty-output&#34;&gt;Pretty output&lt;/h2&gt;
&lt;p&gt;We can &amp;ldquo;prettify&amp;rdquo; this output using the &lt;code&gt;--pretty&lt;/code&gt; flag:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sc json --pretty tests/data/test.vcf.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;41947&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;: [
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;T&amp;#34;&lt;/span&gt;
  ],
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;: [
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;
  ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;fetch-genotypes&#34;&gt;Fetch Genotypes&lt;/h2&gt;
&lt;p&gt;Next we can output genotype calls by passing &lt;code&gt;GT&lt;/code&gt; to the &lt;code&gt;--format&lt;/code&gt; option:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sc json --format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;GT tests/data/test.vcf.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;41947&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;T&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FORMAT&amp;#34;&lt;/span&gt;:{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;GT&amp;#34;&lt;/span&gt;:[[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span 
style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]]}}
{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;105133&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FORMAT&amp;#34;&lt;/span&gt;:{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;GT&amp;#34;&lt;/span&gt;:[[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span 
style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]]}}
{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;176422&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FORMAT&amp;#34;&lt;/span&gt;:{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;GT&amp;#34;&lt;/span&gt;:[[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,&lt;span 
style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;],[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;],[&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;]]}}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The genotypes are ordered by sample, and the numbers correspond to alleles as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-1&lt;/code&gt; → Missing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0&lt;/code&gt; → Reference&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1&lt;/code&gt; → First ALT allele&lt;/li&gt;
&lt;li&gt;&lt;code&gt;2&lt;/code&gt; → Second ALT allele&lt;/li&gt;
&lt;li&gt;&lt;code&gt;3&lt;/code&gt; → etc.&lt;/li&gt;
&lt;/ul&gt;
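&lt;p&gt;As a rough illustration of this mapping, the Python sketch below translates one sample&amp;rsquo;s allele indices into bases using the record&amp;rsquo;s &lt;code&gt;REF&lt;/code&gt; and &lt;code&gt;ALT&lt;/code&gt; values. The helper name &lt;code&gt;gt_to_bases&lt;/code&gt; is hypothetical and not part of &lt;code&gt;sc&lt;/code&gt;. It assumes unphased calls joined with &lt;code&gt;/&lt;/code&gt;, and it treats both &lt;code&gt;null&lt;/code&gt; (as in the JSON output above) and &lt;code&gt;-1&lt;/code&gt; as missing.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;def gt_to_bases(gt, ref, alt):
    """Translate one sample's allele indices (e.g. [0, 1]) into bases (e.g. "A/T")."""
    # Hypothetical helper for illustration; it is not part of sc.
    alleles = [ref] + list(alt)  # index 0 is REF, 1 is the first ALT, and so on
    bases = []
    for index in gt:
        # Missing calls appear as null (None) in the JSON above; -1 is handled the same way
        if index is None or index == -1:
            bases.append(".")
        else:
            bases.append(alleles[index])
    # Assumption: unphased genotypes, so alleles are joined with "/"
    return "/".join(bases)


# Values taken from the records shown above
print(gt_to_bases([0, 0], "G", ["A"]))        # G/G
print(gt_to_bases([1, 1], "G", ["A"]))        # A/A
print(gt_to_bases([None, None], "A", ["G"]))  # ./.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;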
&lt;p&gt;You can also use &lt;code&gt;SGT&lt;/code&gt; to output a string representation of genotypes (e.g. &amp;ldquo;0/1&amp;rdquo;). It is also possible to use &lt;code&gt;TGT&lt;/code&gt; to output the actual bases:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sc json --format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;TGT tests/data/test.vcf.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;41947&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;T&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FORMAT&amp;#34;&lt;/span&gt;:{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;TGT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;]}}
{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;105133&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FORMAT&amp;#34;&lt;/span&gt;:{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;TGT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;]}}
{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;CHROM&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;POS&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;176422&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ID&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;REF&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ALT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;QUAL&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;999.0&lt;/span&gt;,&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FILTER&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;PASS&amp;#34;&lt;/span&gt;],&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;FORMAT&amp;#34;&lt;/span&gt;:{&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;TGT&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#34;A/A&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;G/G&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;./.&amp;#34;&lt;/span&gt;]}}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Setting the working directory in R</title>
      <link>https://www.danielecook.com/setting-the-working-directory-in-r/</link>
      <pubDate>Tue, 09 Jul 2019 00:26:29 +0100</pubDate>
      
      <guid>https://www.danielecook.com/setting-the-working-directory-in-r/</guid>
      <description>&lt;p&gt;It is convenient to be able to set the working directory of a script to its parent directory. This allows you to point to the relative path of files associated with it. For example, if your working directory is set to the location of &lt;code&gt;init.sh&lt;/code&gt;, then you will be able to read in &lt;code&gt;data/file.dat&lt;/code&gt; without specifying its full path. If these files are in a git repo - you can also be assured they will travel together.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;├── init.sh
└── data
    └── file.dat
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In &lt;strong&gt;bash&lt;/strong&gt; you can set the directory to the location of the script being executed using:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;cd &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;0%/*&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Or more obviously&lt;/span&gt;
cd &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;$(&lt;/span&gt;dirname &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;BASH_SOURCE[0]&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;)&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In &lt;strong&gt;python&lt;/strong&gt; you can set the directory to the location of the script being executed using:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; os
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; os.path &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; dirname, abspath

os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;chdir(dirname(abspath(__file__)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In &lt;strong&gt;R&lt;/strong&gt;, &lt;del&gt;unfortunately, no straightforward method exists&lt;/del&gt; (update 2020-06-29: there is a way to do this). The update below demonstrates how to get the directory a script is located in, followed by additional ways of setting the working directory based on the git repo or with RStudio.&lt;/p&gt;
&lt;h3 id=&#34;update-2020-06-29---a-way-to-get-the-script-directory-in-r&#34;&gt;Update: 2020-06-29 - A way to get the script directory in R&lt;/h3&gt;
&lt;p&gt;There is a way to set the working directory to a script&amp;rsquo;s location. &lt;a href=&#34;https://github.com/molgenis/molgenis-pipelines/wiki/How-to-source-another_file.R-from-within-your-R-script&#34;&gt;As shown here&lt;/a&gt;, you can get the location of a script using the function below.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;LocationOfThisScript &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;() &lt;span style=&#34;color:#75715e&#34;&gt;# Function LocationOfThisScript returns the location of this .R script (may be needed to source other files in same dir)&lt;/span&gt;
{
	this.file &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;NULL&lt;/span&gt;
	&lt;span style=&#34;color:#75715e&#34;&gt;# This file may be &amp;#39;sourced&amp;#39;&lt;/span&gt;
	&lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(i in &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;sys.nframe&lt;/span&gt;())) {
		&lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;identical&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;sys.function&lt;/span&gt;(i), base&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;source)) this.file &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; (&lt;span style=&#34;color:#a6e22e&#34;&gt;normalizePath&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;sys.frame&lt;/span&gt;(i)&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;ofile))
	}

	&lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;is.null&lt;/span&gt;(this.file)) &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;dirname&lt;/span&gt;(this.file))

	&lt;span style=&#34;color:#75715e&#34;&gt;# But it may also be called from the command line&lt;/span&gt;
	cmd.args &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;commandArgs&lt;/span&gt;(trailingOnly &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FALSE&lt;/span&gt;)
	cmd.args.trailing &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;commandArgs&lt;/span&gt;(trailingOnly &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TRUE&lt;/span&gt;)
	cmd.args &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; cmd.args&lt;span style=&#34;color:#a6e22e&#34;&gt;[seq.int&lt;/span&gt;(from&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, length.out&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(cmd.args) &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(cmd.args.trailing))]
	res &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;gsub&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;^(?:--file=(.*)|.*)$&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\\1&amp;#34;&lt;/span&gt;, cmd.args)

	&lt;span style=&#34;color:#75715e&#34;&gt;# If multiple --file arguments are given, R uses the last one&lt;/span&gt;
	res &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;tail&lt;/span&gt;(res[res &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;], &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
	&lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(res)) &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;dirname&lt;/span&gt;(res))

	&lt;span style=&#34;color:#75715e&#34;&gt;# Both are not the case. Maybe we are in an R GUI?&lt;/span&gt;
	&lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;NULL&lt;/span&gt;)
}
current.dir &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;LocationOfThisScript&lt;/span&gt;()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;setwd-to-script-location-when-calling-rscript&#34;&gt;setwd to script location when calling &lt;code&gt;Rscript&lt;/code&gt;&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;getScriptPath &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(){
    cmd.args &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;commandArgs&lt;/span&gt;()
    m &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;regexpr&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;(?&amp;lt;=^--file=).+&amp;#34;&lt;/span&gt;, cmd.args, perl&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;TRUE&lt;/span&gt;)
    script.dir &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;dirname&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;regmatches&lt;/span&gt;(cmd.args, m))
    &lt;span style=&#34;color:#a6e22e&#34;&gt;if&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(script.dir) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;) &lt;span style=&#34;color:#a6e22e&#34;&gt;stop&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;can&amp;#39;t determine script dir: please call the script with Rscript&amp;#34;&lt;/span&gt;)
    &lt;span style=&#34;color:#a6e22e&#34;&gt;if&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(script.dir) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#a6e22e&#34;&gt;stop&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;can&amp;#39;t determine script dir: more than one &amp;#39;--file&amp;#39; argument detected&amp;#34;&lt;/span&gt;)
    &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(script.dir)
}

&lt;span style=&#34;color:#75715e&#34;&gt;# Setting the script path would then be:&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;setwd&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;getScriptPath&lt;/span&gt;())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;setwd-to-script-location-in-rstudio&#34;&gt;setwd to script location in RStudio&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# try() catches the error if you are not using RStudio, so the script keeps running.&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;try&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;setwd&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;dirname&lt;/span&gt;(rstudioapi&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;getActiveDocumentContext&lt;/span&gt;()&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;path)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;setting-the-script-path-relative-to-the-git-repo&#34;&gt;Setting the script path relative to the git repo&lt;/h3&gt;
&lt;p&gt;Most of my code resides in git repositories, so an alternative to setting the working directory to the location of a script is to set it to the top level of the git repository that contains it. Here is how I do that in R:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# First set the working directory to the location of the script (useful for working in RStudio)&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;try&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;setwd&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;dirname&lt;/span&gt;(rstudioapi&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;getActiveDocumentContext&lt;/span&gt;()&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;path)))
&lt;span style=&#34;color:#75715e&#34;&gt;# Next, move to the top level of the git repo&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;setwd&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;system&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;git rev-parse --show-toplevel&amp;#34;&lt;/span&gt;, intern&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;TRUE&lt;/span&gt;))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The big advantage of this approach is that you can call the script from anywhere while you are located in its git repo, and it will always execute from the base or top-level of that repo.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>A bash alias for Microsoft Excel (Mac only)</title>
      <link>https://www.danielecook.com/a-bash-alias-for-microsoft-excel-mac-only/</link>
      <pubDate>Fri, 28 Jun 2019 00:26:29 +0100</pubDate>
      
      <guid>https://www.danielecook.com/a-bash-alias-for-microsoft-excel-mac-only/</guid>
      <description>&lt;p&gt;Years ago I wrote a function for &lt;a href=&#34;https://www.danielecook.com/an-r-function-for-opening-a-dataframe-in-excel-mac-only/&#34;&gt;opening Excel from R&lt;/a&gt;. While I would never use Excel for data analysis, it turns out it&amp;rsquo;s pretty good for sorting and browsing data. That&amp;rsquo;s why I wrote a simple bash function, used like an alias, for opening text files in Excel from the terminal.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; excel&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    tmp&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;mktemp&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    out&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;1&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
    cat &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;out&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &amp;gt; $tmp
    open -a &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Microsoft Excel&amp;#34;&lt;/span&gt; $tmp
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Usage:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;cat spreadsheet.tsv | excel
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Useful Nextflow bash functions for SLURM</title>
      <link>https://www.danielecook.com/useful-nextflow-bash-functions-for-slurm/</link>
      <pubDate>Fri, 21 Jun 2019 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/useful-nextflow-bash-functions-for-slurm/</guid>
      <description>&lt;p&gt;If you use &lt;a href=&#34;http://www.nextflow.io&#34;&gt;Nextflow&lt;/a&gt; on a cluster with the SLURM scheduler, then these bash functions may be useful to you and worth sticking in your &lt;code&gt;.bashrc&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Shortcut for going to work directories&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Usage: gw &amp;lt;workdir pattern&amp;gt;&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Replace the work directory below as needed&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Where workdir pattern is something like &amp;#34;ab/afedeu&amp;#34;&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; gw &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
        path&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;ls --color&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;none -d /path/to/work/directory/$1*&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
        cd $path
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

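&lt;span style=&#34;color:#75715e&#34;&gt;# sq: an squeue alternative&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Outputs more complete information about jobs, including the work directory&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Note: the three-argument match() used below is a GNU awk (gawk) extension&lt;/span&gt;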
&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; sq&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    squeue --user &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;whoami&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt; --format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;%.18i %50j %10u %.10C %m %20J %M %.2t %n %R %Z&amp;#39;&lt;/span&gt; | awk -v OFS&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\t&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;{ match($10, /([a-f0-9]{2}\/[a-f0-9]{6})/, arr); print $1, $2, $3, $4, $5, $6, $7, $8, $9, arr[1] }&amp;#39;&lt;/span&gt; 
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, typing &lt;code&gt;sq&lt;/code&gt; will give:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;right&#34;&gt;JOBID&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;NAME&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;USER&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;CPUS&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;MIN_MEMORY&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;THREADS_PER_CORE&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;TIME&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;ST&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;REQ_NODES&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;17475076&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;nf-fastq&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;cookd&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;8&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;12G&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;*&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0:00&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;PD&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;(Priority)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;04/fbbe3e&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;17475077&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;nf-fastq&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;cookd&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;8&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;12G&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;*&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0:00&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;PD&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;(Priority)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;c9/9176eb&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;17475078&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;nf-fastq&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;cookd&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;8&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;12G&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;*&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0:00&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;PD&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;(Priority)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;80/6a233a&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;And you can type &lt;code&gt;gw c9/9176eb&lt;/code&gt; which will take you to the work directory for that job.&lt;/p&gt;
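&lt;p&gt;The work-directory pattern that &lt;code&gt;sq&lt;/code&gt; pulls out of the last column can also be tried in isolation. The following is a minimal sketch, not part of the original functions: the helper name &lt;code&gt;extract_workdir&lt;/code&gt; and the example path are hypothetical, and it uses &lt;code&gt;grep -oE&lt;/code&gt; instead of gawk so that it does not depend on the three-argument &lt;code&gt;match()&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hypothetical helper: print the first "xx/xxxxxx" hash prefix found in a path.
# Uses the same pattern as the awk call in sq.
extract_workdir() {
    printf '%s\n' "$1" | grep -oE '[a-f0-9]{2}/[a-f0-9]{6}' | head -n 1
}

# Example path is made up for illustration.
extract_workdir "/path/to/work/directory/c9/9176eb0a1b2c3d4e5f"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The printed value (here &lt;code&gt;c9/9176eb&lt;/code&gt;) is the same short form that &lt;code&gt;gw&lt;/code&gt; accepts.&lt;/p&gt;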
&lt;p&gt;&lt;a href=&#34;https://gist.github.com/danielecook/bae19b7b9191b76fb6972bd7ef16718d&#34;&gt;gist&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Log commands to Google Cloud Stackdriver Logs</title>
      <link>https://www.danielecook.com/log-commands-to-google-cloud-stackdriver-logs/</link>
      <pubDate>Fri, 15 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/log-commands-to-google-cloud-stackdriver-logs/</guid>
      <description>&lt;p&gt;Google Cloud Platform (GCP) has a service called Stackdriver logging which provides a nice interface for accessing logs.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/gcp-logs.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;Stackdriver logging is integrated with all GCP services but it can also be extended. Users can create custom logs and access them centrally using the web-based interface or the &lt;a href=&#34;https://cloud.google.com/sdk/&#34;&gt;Google Cloud SDK&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This got me wondering whether there was a way to log terminal commands run locally or on a server. It is possible by setting the &lt;code&gt;PROMPT_COMMAND&lt;/code&gt; variable in Bash. After a command finishes, the value of &lt;code&gt;PROMPT_COMMAND&lt;/code&gt; is executed as a command (technically, it is executed just before the next prompt is printed to the screen).&lt;/p&gt;
&lt;p&gt;I wrote a quick function that checks whether the last command exited successfully (0) or resulted in an error (&amp;gt;0), and logs the command with severity INFO or ERROR, respectively. Then I assigned the function to the &lt;code&gt;PROMPT_COMMAND&lt;/code&gt; variable. Note that you may need to enable the &lt;code&gt;gcloud beta&lt;/code&gt; logging commands for this to work.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; prompt &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[[&lt;/span&gt; $? -eq &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;]]&lt;/span&gt;;&lt;span style=&#34;color:#66d9ef&#34;&gt;then&lt;/span&gt;
    &lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;gcloud beta logging write bash_log &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;`fc -nl -1`&amp;#34;&lt;/span&gt; --severity&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;INFO &amp;gt; /dev/null 2&amp;gt;&amp;amp;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &amp;amp;&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;
    &lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;gcloud beta logging write bash_log &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;`fc -nl -1`&amp;#34;&lt;/span&gt; --severity&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;ERROR &amp;gt; /dev/null 2&amp;gt;&amp;amp;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &amp;amp;&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;fi&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

PROMPT_COMMAND&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;prompt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now check the logging interface and you will see your commands are logged!&lt;/p&gt;
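&lt;p&gt;To try the success/error branching without sending anything to GCP, the status check can be pulled into a small helper. This is a sketch only: &lt;code&gt;severity_for_status&lt;/code&gt; is a hypothetical name, not part of the function above, and it prints the severity instead of calling &lt;code&gt;gcloud&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hypothetical helper: map an exit status to a log severity.
severity_for_status() {
    if [ "$1" -eq 0 ]; then
        echo INFO
    else
        echo ERROR
    fi
}

# Dry run with explicit exit codes.
severity_for_status 0     # prints INFO
severity_for_status 127   # prints ERROR
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Inside a &lt;code&gt;PROMPT_COMMAND&lt;/code&gt; function, &lt;code&gt;$?&lt;/code&gt; has to be read before any other command runs, because every command overwrites it.&lt;/p&gt;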
</description>
    </item>
    
    <item>
      <title>Python Command-line skeleton</title>
      <link>https://www.danielecook.com/python-command-line-skeleton/</link>
      <pubDate>Thu, 02 Feb 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/python-command-line-skeleton/</guid>
      <description>&lt;p&gt;Writing a command-line interface (CLI) is an easy way to extend the functionality and ease of use of any code you write.&lt;/p&gt;
&lt;p&gt;Python comes with a built-in module, &lt;a href=&#34;https://docs.python.org/3/library/argparse.html&#34;&gt;argparse&lt;/a&gt;, that makes it easy to develop command-line interfaces. To speed up the process, I have developed a &amp;lsquo;skeleton&amp;rsquo; application that can be forked on GitHub and used to quickly develop CLI programs in Python.&lt;/p&gt;
&lt;p&gt;The repo has the following features added:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Testing with travis-ci and py.test&lt;/li&gt;
&lt;li&gt;Coverage analysis using coveralls&lt;/li&gt;
&lt;li&gt;A setup file that will install the command&lt;/li&gt;
&lt;li&gt;A simple argparse interface&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To get started, sign up for an account on &lt;a href=&#34;http://www.travis-ci.org&#34;&gt;travis-ci&lt;/a&gt; and &lt;a href=&#34;http://www.coveralls.io&#34;&gt;coveralls&lt;/a&gt;, and fork the repo!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/danielecook/python-cli-skeleton&#34;&gt;python-cli-skeleton on GitHub&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Introducing a Chicago Bioinformatics Slack Channel</title>
      <link>https://www.danielecook.com/introducing-a-chicago-bioinformatics-slack-channel/</link>
      <pubDate>Tue, 31 Jan 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/introducing-a-chicago-bioinformatics-slack-channel/</guid>
      <description>&lt;p&gt;Today I am introducing a new Slack team for bioinformaticians in Chicago.&lt;/p&gt;
&lt;p&gt;Sign up for the Chicago Bioinformatics Slack Channel!&lt;/p&gt;
&lt;p&gt;Currently, anyone with an email address at one of the following domains can sign up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;@northwestern.edu&lt;/li&gt;
&lt;li&gt;@uchicago.edu&lt;/li&gt;
&lt;li&gt;@uic.edu&lt;/li&gt;
&lt;li&gt;@depaul.edu&lt;/li&gt;
&lt;li&gt;@luc.edu&lt;/li&gt;
&lt;li&gt;@iit.edu&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Members can invite anyone. I am happy to add any Chicago-area domains. Please let me know which ones I am missing!&lt;/p&gt;
&lt;p&gt;The Slack team currently has channels for bioinformatics-help, general, introductions, meetups, and random. We can add more channels!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Alfred Image Utilities</title>
      <link>https://www.danielecook.com/alfred-image-utilities/</link>
      <pubDate>Sun, 15 Jan 2017 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/alfred-image-utilities/</guid>
      <description>&lt;p&gt;A workflow for making quick changes to image files. Alfred-image-utilities grabs any selected images in the frontmost Finder window and can apply changes to them. Most of the time a copy of the image is made and saved as &lt;code&gt;&amp;lt;filename&amp;gt;.orig.&amp;lt;ext&amp;gt;&lt;/code&gt;. You can replace the original file by holding command when executing most commands.&lt;/p&gt;
&lt;h1 id=&#34;downloadhttpsgithubcomdanielecookalfred-image-utilitiesreleaseslatest&#34;&gt;&lt;a href=&#34;https://github.com/danielecook/alfred-image-utilities/releases/latest&#34;&gt;Download&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Main Menu&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/alfred-image-utilities/blob/master/screenshots/home.png?raw=true&#34; alt=&#34;home&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Convert to png or jpg&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can convert from a large number of formats to jpg or png. The original file is retained unless you hold command.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/alfred-image-utilities/blob/master/screenshots/convert.png?raw=true&#34; alt=&#34;convert&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scale images by a maximum width/height, by percent, or generate thumbnails.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Hold command to replace the original. This option is not available when generating thumbnails. Generating thumbnails will add &lt;code&gt;.thumb&lt;/code&gt; to the filename (&lt;code&gt;&amp;lt;filename&amp;gt;.thumb.&amp;lt;ext&amp;gt;&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/alfred-image-utilities/blob/master/screenshots/scale.png?raw=true&#34; alt=&#34;scale&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rotate images (clockwise)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Hold command to replace the original.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/alfred-image-utilities/blob/master/screenshots/rotate.png?raw=true&#34; alt=&#34;rotate&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Convert images to black and white.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Hold command to replace the original.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/alfred-image-utilities/blob/master/screenshots/color.png?raw=true&#34; alt=&#34;color&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>rdatastore</title>
      <link>https://www.danielecook.com/rdatastore/</link>
      <pubDate>Thu, 15 Dec 2016 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/rdatastore/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve developed a new R package, &lt;code&gt;rdatastore&lt;/code&gt;, that is available at &lt;a href=&#34;https://github.com/cloudyr/rdatastore&#34;&gt;cloudyr/rdatastore&lt;/a&gt;. &lt;code&gt;rdatastore&lt;/code&gt; provides an interface to Google Cloud&amp;rsquo;s &lt;a href=&#34;https://cloud.google.com/datastore/&#34;&gt;datastore service&lt;/a&gt;. Google Cloud Datastore is a NoSQL database, which provides a mechanism for storing and retrieving heterogeneous data. Although Google Datastore is not useful for storing large datasets, it has a number of useful applications within R. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Saving and loading credentials for use with other services.&lt;/li&gt;
&lt;li&gt;Caching data. This is implemented using datastore in my version of the &lt;a href=&#34;https://www.danielecook.com/memoise&#34;&gt;memoise&lt;/a&gt; package.&lt;/li&gt;
&lt;li&gt;Saving/loading universally used pieces of data (&lt;em&gt;e.g.&lt;/em&gt; parameters, options, settings) across systems or between work/home.&lt;/li&gt;
&lt;li&gt;Storage and retrieval of small (&amp;lt;10,000 row) datasets. Useful for integration of summary datasets.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The last two reasons are the primary motivation for developing &lt;code&gt;rdatastore&lt;/code&gt;. Parallelized pipelines can simultaneously submit results to datastore (across many nodes or machines), and the results are obtainable for analysis within R. Settings can be updated on one machine and retrieved on others as well, obviating the need to modify virtual machines or scripts in many cases.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/datastore.png&#34; alt=&#34;datastore&#34;&gt;
&lt;strong&gt;The datastore interface can be used to view and edit data.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id=&#34;setup&#34;&gt;Setup&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Set up a &lt;a href=&#34;https://cloud.google.com/&#34;&gt;Google Cloud Platform&lt;/a&gt; account and create a new project.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/sdk/&#34;&gt;Download&lt;/a&gt; the Google Cloud SDK. This provides a command line based &lt;code&gt;gcloud&lt;/code&gt; command.&lt;/li&gt;
&lt;li&gt;Install rdatastore&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;devtools&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;install_github&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;cloudyr/rdatastore&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;usage&#34;&gt;Usage&lt;/h3&gt;
&lt;h4 id=&#34;authentication&#34;&gt;Authentication&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(rdatastore)
&lt;span style=&#34;color:#a6e22e&#34;&gt;authenticate_datastore&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;andersen-lab&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;# Enter your project ID here. rdatastore will authenticate using OAuth.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id=&#34;storing-data&#34;&gt;Storing Data&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;commit()&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Individual entities can be stored using &lt;code&gt;commit()&lt;/code&gt;. You have to supply a kind (which is analogous to a table in relational database systems). You may optionally supply a name. Any additional arguments supplied are added as properties. Datatypes are inferred from R datatypes. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;commit&lt;/span&gt;(kind &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Car&amp;#34;&lt;/span&gt;, name &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Tesla&amp;#34;&lt;/span&gt;, wheels &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;# Stores a new entity named &amp;#39;Tesla&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;kind&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;name&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;wheels&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Tesla&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Important! Stick with basic datatypes like character vectors, integers, doubles, binary, and datetime objects. Not all datatypes are supported.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I designed &lt;code&gt;rdatastore&lt;/code&gt; to make it easier to append data rather than overwrite it. This is a bit against the grain as far as other datastore libraries go. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;commit&lt;/span&gt;(kind &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Car&amp;#34;&lt;/span&gt;, name &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Tesla&amp;#34;&lt;/span&gt;, electric &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TRUE&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;# Adds the &amp;#39;electric&amp;#39; property to the existing &amp;#39;Tesla&amp;#39; entity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The entity will now be:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;kind&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;name&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;wheels&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;electric&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Tesla&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you want to overwrite the entity, you can use &lt;code&gt;keep_existing = FALSE&lt;/code&gt;, and the original data will be wiped and replaced.&lt;/p&gt;
&lt;p&gt;When using &lt;code&gt;commit()&lt;/code&gt; you can omit the &lt;code&gt;name&lt;/code&gt; parameter, in which case Google datastore will autogenerate an ID for the entity. I&amp;rsquo;m not sure where this is useful. You won&amp;rsquo;t be able to look the item up without knowing its ID or performing a query on the entity&amp;rsquo;s data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;lookup()&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Retrieve data by specifying its &lt;code&gt;kind&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;lookup&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Car&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Tesla&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;kind&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;name&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;wheels&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;electric&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Tesla&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;gql()&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can query items using the &lt;a href=&#34;https://cloud.google.com/datastore/docs/reference/gql_reference&#34;&gt;Google Query Language&lt;/a&gt; (GQL). GQL is a lot like SQL.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Let&amp;#39;s commit a few more items&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;commit&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Car&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;VW&amp;#34;&lt;/span&gt;, electric &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FALSE&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;commit&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Car&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Honda&amp;#34;&lt;/span&gt;, make &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Odyssey&amp;#34;&lt;/span&gt;, wheels &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;commit&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Car&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Reliant&amp;#34;&lt;/span&gt;, make &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Robin&amp;#34;&lt;/span&gt;, wheels &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;)

&lt;span style=&#34;color:#a6e22e&#34;&gt;gql&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;SELECT * FROM Car&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;kind&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;name&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;make&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;wheels&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;electric&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Honda&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Odyssey&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Reliant&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Robin&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Tesla&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;TRUE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;VW&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;NA&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;FALSE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Notice that some properties are &lt;code&gt;NA&lt;/code&gt; because they were never specified.&lt;/p&gt;
&lt;p&gt;We can also query specific properties, but this will only return entities that have those properties defined.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;gql&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;SELECT make FROM Car&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;kind&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;name&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;make&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Honda&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Odyssey&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Reliant&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Robin&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You can also filter on properties with GQL:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;gql&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;SELECT * FROM Car WHERE wheels = 3&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;kind&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;name&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;make&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;wheels&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;Car&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Reliant&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Robin&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
</description>
    </item>
    
    <item>
      <title>A big list of favorites</title>
      <link>https://www.danielecook.com/a-big-list-of-favorites/</link>
      <pubDate>Tue, 29 Nov 2016 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/a-big-list-of-favorites/</guid>
      <description>&lt;p&gt;Here it is! My favorite things in life across all domains. This is a work in progress, but hopefully you&amp;rsquo;ll find one (or a few) things you like and add it to your life. It&amp;rsquo;s a bit sparse currently, but it will fill in over time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; There are no referral links here and I am not being paid to advertise anything here. Any companies/products listed have earned it.&lt;/p&gt;
&lt;h2 id=&#34;programmingsoftware&#34;&gt;Programming/Software&lt;/h2&gt;
&lt;hr&gt;
&lt;h3 id=&#34;terminal&#34;&gt;Terminal&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://brew.sh/&#34;&gt;Homebrew&lt;/a&gt;&lt;/strong&gt; - A phenomenal package manager. Use with &lt;a href=&#34;https://github.com/Homebrew/homebrew-science&#34;&gt;homebrew/science&lt;/a&gt;!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/wting/autojump&#34;&gt;Autojump&lt;/a&gt;&lt;/strong&gt; - Jump between directories by typing &lt;code&gt;j&lt;/code&gt; followed by part of the directory name. Even works if you type it incorrectly! Install with &lt;code&gt;brew install autojump&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/yyuu/pyenv&#34;&gt;pyenv&lt;/a&gt;&lt;/strong&gt; - An easy way to manage multiple installations of Python, and set which versions to open globally, locally, or by directory. Install with &lt;code&gt;brew install pyenv&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://direnv.net/&#34;&gt;direnv&lt;/a&gt;&lt;/strong&gt; - Set the terminal environment in a &lt;code&gt;.envrc&lt;/code&gt; file based on the current directory.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;software-osx&#34;&gt;Software (OSX)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.dropbox.com&#34;&gt;Dropbox&lt;/a&gt;&lt;/strong&gt; - The best backup/syncing solution. I pay for a subscription. I&amp;rsquo;ve used Box and Google Drive as well. Both are inferior.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://panic.com/transmit/&#34;&gt;Transmit&lt;/a&gt;&lt;/strong&gt; - FTP Client.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://www.sublimetext.com/&#34;&gt;Sublime Text&lt;/a&gt;&lt;/strong&gt; - The best text editor. Extend its functionality with &lt;a href=&#34;https://packagecontrol.io/&#34;&gt;package control&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://www.alfredapp.com/&#34;&gt;Alfred&lt;/a&gt;&lt;/strong&gt; - Like Spotlight but with a lot more functionality. Workflows extend its functionality considerably. See the ones I&amp;rsquo;ve written!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://www.sequelpro.com/&#34;&gt;Sequel Pro&lt;/a&gt;&lt;/strong&gt; - A MySQL GUI. The best GUI for a database I&amp;rsquo;ve seen.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;programming&#34;&gt;Programming&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.github.com&#34;&gt;GitHub&lt;/a&gt;&lt;/strong&gt; - A great place to work on projects using git.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python&lt;/strong&gt; - My favorite all-purpose programming language.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;r&#34;&gt;R&lt;/h4&gt;
&lt;p&gt;Below I list some of my favorite R packages.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/wilkelab/cowplot&#34;&gt;Cowplot&lt;/a&gt;&lt;/strong&gt; - Arrange plots in a grid (&lt;a href=&#34;https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html&#34;&gt;vignette&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;python&#34;&gt;Python&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Jekyll&lt;/strong&gt; - Static sites&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flask&lt;/strong&gt; - Python framework&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://docs.peewee-orm.com/&#34;&gt;peewee&lt;/a&gt;&lt;/strong&gt; - A very easy to use ORM for simple projects.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;future&#34;&gt;Future&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=rAbhypxs1qQ&#34;&gt;Text-based descriptions improve image synthesis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=e-WB4lfg30M&#34;&gt;Recurrent Neural Network Writes Sentences About Images&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;personal&#34;&gt;Personal&lt;/h2&gt;
&lt;hr&gt;
&lt;h3 id=&#34;ios-apps&#34;&gt;iOS Apps&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://reederapp.com/&#34;&gt;Reeder&lt;/a&gt;&lt;/strong&gt; - A great RSS reader.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.strava.com&#34;&gt;Strava&lt;/a&gt;&lt;/strong&gt; - Fitness tracker. I&amp;rsquo;ve used Runkeeper in the past. It&amp;rsquo;s worth switching. If you want to switch fitness apps and not lose everything, check out &lt;a href=&#34;https://tapiriik.com/&#34;&gt;tapiriik&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;websites&#34;&gt;Websites&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;News&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.nyt.com&#34;&gt;New York Times&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.washingtonpost.com&#34;&gt;Washington Post&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tech News&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://news.ycombinator.com&#34;&gt;news.ycombinator.com&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Financial&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.vanguard.com&#34;&gt;Vanguard&lt;/a&gt;&lt;/strong&gt; - Retirement accounts and investing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Blogs&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://www.chicagoarchitecture.org/&#34;&gt;Chicago Architecture Blog&lt;/a&gt;&lt;/strong&gt; - Follow Chicago Architecture.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Services&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;http://pinboard.in/&#34;&gt;Pinboard&lt;/a&gt;&lt;/strong&gt; - Simple bookmarks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://tapiriik.com/&#34;&gt;tapiriik&lt;/a&gt;&lt;/strong&gt; - Sync fitness data across fitness tracking services.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;biking&#34;&gt;Biking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://www.quadlockcase.com/&#34;&gt;Quad Lock&lt;/a&gt;&lt;/strong&gt; - The best phone mount.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;kitchen&#34;&gt;Kitchen&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Coffee&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://www.amazon.com/gp/product/B0093EPC3O/&#34;&gt;Bodum French Press&lt;/a&gt;&lt;/strong&gt; - I use this every single day.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://www.amazon.com/gp/product/B003NG922U&#34;&gt;Coldbrew press&lt;/a&gt;&lt;/strong&gt; - For iced coffee during the summer.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;wikipedia-pages&#34;&gt;Wikipedia Pages&lt;/h3&gt;
&lt;p&gt;These are mostly a roundup of interesting ideas/facts/concepts I have come across.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Benfords_law&#34;&gt;Benford&amp;rsquo;s Law&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Uniparental_disomy&#34;&gt;Uniparental Disomy&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Dunning-Kruger_effect&#34;&gt;Dunning-Kruger effect&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Guitar Printouts</title>
      <link>https://www.danielecook.com/guitar-printouts/</link>
      <pubDate>Wed, 17 Aug 2016 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/guitar-printouts/</guid>
      <description>&lt;p&gt;I put these guitar-related printouts (a chord diagram sheet and a fretboard diagram sheet) together years ago:&lt;/p&gt;
&lt;!-- raw HTML omitted --&gt;
</description>
    </item>
    
    <item>
      <title>Quiver-alfred</title>
      <link>https://www.danielecook.com/quiver-alfred/</link>
      <pubDate>Thu, 04 Aug 2016 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/quiver-alfred/</guid>
      <description>&lt;p&gt;Search Quiver from Alfred! &lt;code&gt;Quiver-alfred&lt;/code&gt; quickly constructs a database of your notes for fast and easy querying.&lt;/p&gt;
&lt;h3 id=&#34;downloadhttpsgithubcomdanielecookquiver-alfredreleasesdownload02quiversearchalfredworkflow&#34;&gt;&lt;a href=&#34;https://github.com/danielecook/Quiver-alfred/releases/download/0.2/Quiver.Search.alfredworkflow&#34;&gt;Download&lt;/a&gt;&lt;/h3&gt;
&lt;h2 id=&#34;usage&#34;&gt;Usage&lt;/h2&gt;
&lt;p&gt;Type &lt;code&gt;qset&lt;/code&gt; to set your Quiver library location. &lt;code&gt;Quiver-alfred&lt;/code&gt; constructs a database of your notes to make querying as fast as possible. The database should refresh once every hour and should only take a few seconds to create.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/Quiver-alfred/blob/images/images/qset.png?raw=true&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type &lt;code&gt;q&lt;/code&gt; to use!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/Quiver-alfred/blob/images/images/initial.png?raw=true&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You can search tags by typing &lt;code&gt;q #&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/Quiver-alfred/blob/images/images/tags.png?raw=true&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Browse Notes within notebook:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/Quiver-alfred/blob/images/images/notebook.png?raw=true&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Full-text search using SQLite:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/Quiver-alfred/blob/images/images/search.png?raw=true&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>memoise: Caching in the cloud</title>
      <link>https://www.danielecook.com/memoise-caching-in-the-cloud/</link>
      <pubDate>Wed, 27 Jul 2016 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/memoise-caching-in-the-cloud/</guid>
      <description>&lt;h3 id=&#34;update-2019-06-22&#34;&gt;Update: 2019-06-22&lt;/h3&gt;
&lt;p&gt;Based on my suggestions, out-of-memory caching was implemented in the &amp;ldquo;official&amp;rdquo; memoise package &lt;a href=&#34;https://github.com/r-lib/memoise/pull/25&#34;&gt;here&lt;/a&gt;. The memoise package now caches based on files and AWS.&lt;/p&gt;
&lt;h3 id=&#34;original-post&#34;&gt;Original Post&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Memoisation&lt;/strong&gt; is a technique for caching the results of a function based on its inputs. For example, the following R function calculates the nth element of the &lt;a href=&#34;https://en.wikipedia.org/wiki/Fibonnacci_sequence&#34;&gt;Fibonacci sequence&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;fib &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(n) {
  &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is an inefficient way of calculating values of the Fibonacci sequence, because each call recomputes the same smaller values many times. However, it is a useful example for understanding memoisation. The following code uses Hadley Wickham&amp;rsquo;s &lt;a href=&#34;https://github.com/hadley/memoise&#34;&gt;memoise&lt;/a&gt; package.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(memoise)

&lt;span style=&#34;color:#75715e&#34;&gt;# fib() generates the nth element of the Fibonacci sequence&lt;/span&gt;
fib &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(n) {
  &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
}

&lt;span style=&#34;color:#75715e&#34;&gt;# Memoize fib&lt;/span&gt;
mem_fib &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;memoise&lt;/span&gt;(fib)

&lt;span style=&#34;color:#a6e22e&#34;&gt;mem_fib&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;30&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;# Initial run caches the value&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the above example, the &lt;code&gt;memoise()&lt;/code&gt; function generates a memoised function, which will automatically cache results. If the function is run again with the same parameters, it will return the cached result rather than recompute the result. Implementing memoisation can significantly speed up analysis when functions that take time to run are repeatedly called.&lt;/p&gt;
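&lt;p&gt;The caching behaviour described above can be sketched in a few lines. What follows is a minimal Python analogue written purely for illustration; it is not the implementation used by the &lt;strong&gt;memoise&lt;/strong&gt; R package. The names &lt;code&gt;memoise&lt;/code&gt;, &lt;code&gt;slow_square&lt;/code&gt;, and &lt;code&gt;calls&lt;/code&gt; are hypothetical, and the sketch assumes all arguments are hashable.&lt;/p&gt;

```python
import functools


def memoise(fn):
    """Return a wrapper around fn that caches results keyed by its positional arguments."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            # First call with these arguments: compute and store the result.
            cache[args] = fn(*args)
        # Repeat calls fall through to here and reuse the stored result.
        return cache[args]

    wrapper.cache = cache  # exposed only so the cache can be inspected
    return wrapper


calls = []  # records every time the underlying function really runs


def slow_square(n):
    calls.append(n)
    return n * n


mem_square = memoise(slow_square)
first = mem_square(12)   # computed by slow_square
second = mem_square(12)  # served from the cache
```

&lt;p&gt;After both calls, &lt;code&gt;calls&lt;/code&gt; holds a single entry, because the second call never reaches &lt;code&gt;slow_square&lt;/code&gt;.&lt;/p&gt;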
&lt;p&gt;What if you are running similar analyses within a cluster environment? The ability to cache results in a centralized datastore could increase the speed of analysis across all machines. Alternatively, perhaps you work on different computers at work and at home. Forgetting to save/load intermediate files may require long-running functions to be run again. Further, managing and retaining intermediate files can be cumbersome and annoying. Again, caching the results of the memoised function in a central location (e.g., cloud-based storage) can speed up analytical pipelines across machines.&lt;/p&gt;
&lt;p&gt;Recently I&amp;rsquo;ve put some work into developing new types of out-of-memory caches for the &lt;strong&gt;memoise&lt;/strong&gt; package, &lt;a href=&#34;https://github.com/danielecook/memoise&#34;&gt;available here&lt;/a&gt;. This forked version can be used to cache items locally or remotely in a variety of environments. Supported environments include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;R environment (cache_local)&lt;/li&gt;
&lt;li&gt;Google Datastore (cache_datastore)&lt;/li&gt;
&lt;li&gt;Amazon S3 (cache_aws_s3)&lt;/li&gt;
&lt;li&gt;File system (cache_filesystem; allows Dropbox or Google Drive to be used for caching)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are a few caveats to consider when using this version of &lt;strong&gt;memoise&lt;/strong&gt;. If you use the external cache options, it will take additional time to retrieve cached items. That cost is usually worth paying in cluster environments, where syncing files across instances/nodes can be difficult. However, when working across home and work computers, caching to locally synced files is preferred.&lt;/p&gt;
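&lt;p&gt;To make the file-system option more concrete, here is a rough Python sketch of a cache that stores each result as a file in a shared folder. It is an illustration under stated assumptions, not the code behind &lt;code&gt;cache_filesystem&lt;/code&gt;: the helper &lt;code&gt;memoise_to_dir&lt;/code&gt; is hypothetical, results are assumed to be picklable, and a temporary directory stands in for a synced folder.&lt;/p&gt;

```python
import hashlib
import os
import pickle
import tempfile


def memoise_to_dir(fn, cache_dir):
    """Cache results of fn as pickle files inside cache_dir (hypothetical helper)."""
    os.makedirs(cache_dir, exist_ok=True)

    def wrapper(*args):
        # Hash the function name and arguments to get a stable file name.
        key = hashlib.sha256(repr((fn.__name__, args)).encode("utf-8")).hexdigest()
        path = os.path.join(cache_dir, key + ".pkl")
        if os.path.exists(path):
            # Cache hit: read the stored result from disk.
            with open(path, "rb") as handle:
                return pickle.load(handle)
        # Cache miss: compute the result, then persist it for later sessions.
        result = fn(*args)
        with open(path, "wb") as handle:
            pickle.dump(result, handle)
        return result

    return wrapper


runs = []  # records every time the underlying function really runs


def cube(n):
    runs.append(n)
    return n ** 3


cache_dir = tempfile.mkdtemp()            # stand-in for a synced folder
cube_a = memoise_to_dir(cube, cache_dir)  # first "session"
cube_b = memoise_to_dir(cube, cache_dir)  # second "session" sharing the folder
value_a = cube_a(7)
value_b = cube_b(7)
```

&lt;p&gt;Because both wrappers point at the same folder, the second one finds the file written by the first and &lt;code&gt;cube&lt;/code&gt; runs only once. The same idea applies when the folder is synced between machines.&lt;/p&gt;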
&lt;h3 id=&#34;installation&#34;&gt;Installation&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;devtools&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;install_github&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;danielecook/memoise&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;usage&#34;&gt;Usage&lt;/h3&gt;
&lt;h4 id=&#34;google-datastore&#34;&gt;Google Datastore&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(memoise)

&lt;span style=&#34;color:#75715e&#34;&gt;# fib() generates the nth element of the Fibonacci sequence&lt;/span&gt;
fib &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(n) {
  &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
}

&lt;span style=&#34;color:#75715e&#34;&gt;# Define a cache&lt;/span&gt;
ds &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;cache_datastore&lt;/span&gt;(project &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;your-project-name&amp;#34;&lt;/span&gt;, cache_name &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;rcache2&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#75715e&#34;&gt;# Memoize fib&lt;/span&gt;
mem_fib &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;memoise&lt;/span&gt;(fib, cache &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ds)

&lt;span style=&#34;color:#a6e22e&#34;&gt;mem_fib&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;30&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;# Initial run caches the value&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id=&#34;amazon-s3&#34;&gt;Amazon S3&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(memoise)

&lt;span style=&#34;color:#75715e&#34;&gt;# fib() generates the nth element of the Fibonacci sequence&lt;/span&gt;
fib &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(n) {
  &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib&lt;/span&gt;(n &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
}

&lt;span style=&#34;color:#75715e&#34;&gt;# Set up credentials&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;Sys.setenv&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;AWS_ACCESS_KEY_ID&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;access key&amp;gt;&amp;#34;&lt;/span&gt;,
           &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;AWS_SECRET_ACCESS_KEY&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;access secret&amp;gt;&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#75715e&#34;&gt;# Define a cache&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Your bucket name must be unique among all s3 users - so use something like &amp;#39;rcache-&amp;lt;initials&amp;gt;&amp;#39;&lt;/span&gt;
aws_s3 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;cache_s3&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;unique bucket name&amp;gt;&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#75715e&#34;&gt;# Memoize fib&lt;/span&gt;
mem_fib &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;memoise&lt;/span&gt;(fib, cache &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; aws_s3)

&lt;span style=&#34;color:#a6e22e&#34;&gt;mem_fib&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;30&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;# Initial run caches the value&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Automatically construct / infer / sense bigquery schema</title>
      <link>https://www.danielecook.com/automatically-construct-/-infer-/-sense-bigquery-schema/</link>
      <pubDate>Wed, 30 Dec 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/automatically-construct-/-infer-/-sense-bigquery-schema/</guid>
      <description>&lt;h2 id=&#34;update-bigquery-adds-schema-auto-detection-2019-06-22&#34;&gt;Update: BigQuery adds schema auto-detection (2019-06-22)&lt;/h2&gt;
&lt;p&gt;BigQuery now offers a &lt;a href=&#34;https://cloud.google.com/bigquery/docs/schema-detect&#34;&gt;schema auto-detection feature&lt;/a&gt;, making the work I had done below no longer necessary.&lt;/p&gt;
&lt;h2 id=&#34;original-post&#34;&gt;Original Post&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://cloud.google.com/bigquery/&#34;&gt;BigQuery&lt;/a&gt; is a phenomenal tool for analyzing large datasets. It enables you to upload large datasets and perform sophisticated SQL queries on millions of rows in seconds. Moreover, it can be integrated with R using &lt;a href=&#34;https://github.com/r-dbi/bigrquery&#34;&gt;bigrquery&lt;/a&gt;, which lets you interact with BigQuery using some of the functions in dplyr.&lt;/p&gt;
&lt;p&gt;It is easy to upload datasets to BigQuery, although it requires you to specify a schema. If a dataset has a lot of columns, this can be a pain to do manually, so I wrote a script to automate the process. The script automatically determines the variable types within the first 500 rows of a tab-delimited dataset. To get started, download the Python script below and save it as &lt;code&gt;schema.py&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; mimetypes
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sys
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; collections &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; OrderedDict

filename &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;file_type&lt;/span&gt;(filename):
    type &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; mimetypes&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;guess_type(filename)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; type

filetype &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; file_type(filename)[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]
&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; filetype &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;gzip&amp;#34;&lt;/span&gt;:
    &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; gzip
    readfile &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; gzip&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;GzipFile(filename, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;r&amp;#39;&lt;/span&gt;)
&lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
    readfile &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(filename,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;r&amp;#39;&lt;/span&gt;)

&lt;span style=&#34;color:#66d9ef&#34;&gt;with&lt;/span&gt; readfile &lt;span style=&#34;color:#66d9ef&#34;&gt;as&lt;/span&gt; f:
    header &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; next(f)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strip()&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;)
    lines &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; [dict(zip(header,next(f)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strip()&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;))) &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; xrange(&lt;span style=&#34;color:#ae81ff&#34;&gt;50000&lt;/span&gt;)]

schema &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; OrderedDict(zip(header, [bool]&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;len(header)))
&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;boolify&lt;/span&gt;(s):
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; s &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;True&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;or&lt;/span&gt; s &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;TRUE&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;or&lt;/span&gt; s &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;T&amp;#34;&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; True
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; s &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;False&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;or&lt;/span&gt; s &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;FALSE&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;or&lt;/span&gt; s &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;F&amp;#34;&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; False
    &lt;span style=&#34;color:#66d9ef&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;ValueError&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;huh?&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;autoconvert&lt;/span&gt;(s):
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; fn &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; (boolify, int, float):
        &lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; fn(s)
        &lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;ValueError&lt;/span&gt;:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; s

type_precedence &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {str:&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, float:&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, int:&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;,bool:&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;}
type_map &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {str:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;STRING&amp;#34;&lt;/span&gt;, float:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;FLOAT&amp;#34;&lt;/span&gt;, int:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;INTEGER&amp;#34;&lt;/span&gt;, bool:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;BOOLEAN&amp;#34;&lt;/span&gt;}

&lt;span style=&#34;color:#75715e&#34;&gt;# Sense header&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; lines:
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k,v &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; line&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;items():
        &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; v &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;or&lt;/span&gt; v &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.&amp;#34;&lt;/span&gt;:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;
        &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
            sense_type &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; type(autoconvert(v))
            &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; schema[k] &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; sense_type &lt;span style=&#34;color:#f92672&#34;&gt;or&lt;/span&gt; schema[k] &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; str:
                &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;
            &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; type_precedence[schema[k]] &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; type_precedence[sense_type]:
                schema[k] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sense_type

&lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;,&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join([ k&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;/&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;_&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;:&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; type_map[v] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k,v &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; schema&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;items()])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;usage&#34;&gt;Usage&lt;/h3&gt;
&lt;p&gt;Save the gist as a script and run it as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;python schema.py &amp;lt;file&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The script supports both plain-text and gzipped files (which BigQuery can also load).&lt;/p&gt;
&lt;h3 id=&#34;output-example&#34;&gt;Output Example&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;CHROM:STRING,POS:INTEGER,REF_Original:STRING,ALT_Change:STRING,avg_cover:FLOAT,spikein_snvfrac:FLOAT,maxfrac:FLOAT,in_spikein:BOOLEAN,in_varset:BOOLEAN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that the &lt;a href=&#34;https://cloud.google.com/bigquery/preparing-data-for-bigquery&#34;&gt;RECORD and TIMESTAMP&lt;/a&gt; field types are not supported.&lt;/p&gt;
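&lt;p&gt;To make the type resolution concrete, here is a minimal Python 3 sketch of the same idea applied to a single column. It is not the script itself: the helper name &lt;code&gt;infer_column_type&lt;/code&gt; is hypothetical, and the sketch assumes that each column starts as the most specific type (BOOLEAN) and is only ever demoted toward STRING.&lt;/p&gt;

```python
# Python 3 sketch of the column-type resolution idea; names are illustrative.

def boolify(s):
    """Convert the accepted boolean spellings, otherwise raise ValueError."""
    if s in ("True", "TRUE", "T"):
        return True
    if s in ("False", "FALSE", "F"):
        return False
    raise ValueError("not a boolean: " + repr(s))


def autoconvert(s):
    """Try the most specific conversions first; fall back to the raw string."""
    for fn in (boolify, int, float):
        try:
            return fn(s)
        except ValueError:
            pass
    return s


# A lower number means a more general type.
TYPE_PRECEDENCE = {str: 0, float: 1, int: 2, bool: 3}
TYPE_MAP = {str: "STRING", float: "FLOAT", int: "INTEGER", bool: "BOOLEAN"}


def infer_column_type(values):
    """Return the BigQuery type name for one column of string values."""
    current = bool  # start from the most specific type
    for v in values:
        if v in ("", "."):  # missing values carry no type information
            continue
        sensed = type(autoconvert(v))
        # Keep whichever of the two types is more general.
        current = min(current, sensed, key=TYPE_PRECEDENCE.get)
    return TYPE_MAP[current]
```

&lt;p&gt;Under these assumptions, a column holding &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt; and &lt;code&gt;3.5&lt;/code&gt; resolves to FLOAT, and a single non-numeric value demotes the whole column to STRING.&lt;/p&gt;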
</description>
    </item>
    
    <item>
      <title>Parallelize bcftools functions</title>
      <link>https://www.danielecook.com/parallelize-bcftools-functions/</link>
      <pubDate>Sat, 21 Nov 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/parallelize-bcftools-functions/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;http://samtools.github.io/bcftools/&#34;&gt;bcftools&lt;/a&gt; is a great tool for working with &lt;a href=&#34;http://www.1000genomes.org/wiki/analysis/variant%20call%20format/vcf-variant-call-format-version-41&#34;&gt;variant call format (VCF) files&lt;/a&gt;. In general, it is fast. However, I have found that merging VCF files (using &lt;code&gt;bcftools merge&lt;/code&gt;) and checking concordance (using &lt;code&gt;bcftools gtcheck&lt;/code&gt;) can be slow. That is why I wrote two functions that use &lt;a href=&#34;http://www.gnu.org/software/parallel/&#34;&gt;GNU Parallel&lt;/a&gt; to parallelize them.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# ~/.bashrc: executed by bash(1) for non-login shells.&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# for examples&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; bam_chromosomes&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    &lt;span style=&#34;color:#75715e&#34;&gt;# Fetch chromosomes from a bam file&lt;/span&gt;
    samtools view -H $1 | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    grep -Po &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;SN:(.*)\t&amp;#39;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    cut -c 4-1000
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; vcf_chromosomes&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    &lt;span style=&#34;color:#75715e&#34;&gt;# Fetch contigs from a vcf file.&lt;/span&gt;
    bcftools view -h $1 | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    grep &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;contig&amp;#39;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    egrep -o &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ID=([^,]+)&amp;#34;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    sed &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;s/ID=//g&amp;#39;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\n&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;


PARALLEL_CORES&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; parallel_bcftools_merge&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    file_set&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo $@ | egrep -o &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;(\-l|\-\-file-list)(=|[ ]+)[^ ]+&amp;#39;&lt;/span&gt; | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;=&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt; | cut -f &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt; -d &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt; -n &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;file_set&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
        &lt;span style=&#34;color:#66d9ef&#34;&gt;then&lt;/span&gt;
            find_vcf&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;cat &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;file_set&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; | head -n 1&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
        &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;
            find_vcf&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo $@ | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\t&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\n&amp;#39;&lt;/span&gt; | egrep -o &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;[^ ]+.vcf.gz&amp;#39;&lt;/span&gt; | awk &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;NR == 1 { print }&amp;#39;&lt;/span&gt; - &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;fi&lt;/span&gt;
    contigs&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;vcf_chromosomes &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;find_vcf&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    current_dir&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;$(&lt;/span&gt;dirname &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;find_vcf&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;)&lt;/span&gt;
    hash_merge&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$@&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; | md5sum | cut -c 1-5&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    output_prefix&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;current_dir&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;/parallel_merge.&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;hash_merge&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;
    
    parallel --gnu --workdir &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;current_dir&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    --env args -j &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;PARALLEL_CORES&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bcftools merge -r {1} -O u &amp;#39;&lt;/span&gt; $@ &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;gt; &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;output_prefix&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;.{1}.bcf&amp;#39;&lt;/span&gt; ::: &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;contigs&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; 
    
    order&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo $contigs | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\n&amp;#39;&lt;/span&gt; | awk -v &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;prefix=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;output_prefix&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;{ print prefix &amp;#34;.&amp;#34; $0 &amp;#34;.bcf&amp;#34; }&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    bcftools concat -O v &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;order&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; | grep -v &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;parallel_merge&amp;#39;&lt;/span&gt; | sed &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;s/##bcftools_mergeCommand=merge -r I -O u /##bcftools_mergeCommand=merge /g&amp;#39;&lt;/span&gt; | bcftools view -O u
    rm &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;order&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;


PARALLEL_CORES&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; parallel_bcftools_gtcheck&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    find_vcf&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo $@ | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\t&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\n&amp;#39;&lt;/span&gt; | egrep -o &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;[^ ]+.vcf.gz&amp;#39;&lt;/span&gt; | awk &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;NR == 1 { print }&amp;#39;&lt;/span&gt; - &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    contigs&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;vcf_chromosomes &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;find_vcf&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    current_dir&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;$(&lt;/span&gt;dirname &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;find_vcf&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;)&lt;/span&gt;
    hash_merge&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$@&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; | md5sum | cut -c 1-5&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    output_prefix&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;current_dir&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;/parallel_concordance.&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;hash_merge&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;
    gtcheck_options&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo $@ | awk -v vcf&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;find_vcf&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;{ gsub(vcf,&amp;#34;&amp;#34;,$0); print $0; }&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    parallel --gnu  -j &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;PARALLEL_CORES&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; --workdir &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;current_dir&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bcftools view &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;find_vcf&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; {} | \
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    bcftools gtcheck &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;gtcheck_options&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; - | \
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    awk -v chrom={} &amp;#34;/^CN/ { print \$0 \&amp;#34;\t\&amp;#34; chrom } \$0 !~ /.*CN.*/ { print } \$0 ~ /^# \[1\]CN/ { print \$0 \&amp;#34;\tchrom\&amp;#34;}&amp;#34; - &amp;gt; &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;output_prefix&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;.{}.tsv&amp;#39;&lt;/span&gt; ::: &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;contigs&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;

    order&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;echo $contigs | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\n&amp;#39;&lt;/span&gt; | awk -v &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;prefix=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;output_prefix&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;{ print prefix &amp;#34;.&amp;#34; $0 &amp;#34;.tsv&amp;#34; }&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    cat &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;order&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    grep &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;CN&amp;#39;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    awk &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;NR == 1 &amp;amp;&amp;amp; /Discordance/ { print } NR &amp;gt; 1 &amp;amp;&amp;amp; $0 !~ /Discordance/ { print }&amp;#39;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    awk &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;{ gsub(&amp;#34;(# |\\[[0-9]+\\])&amp;#34;,&amp;#34;&amp;#34;, $0); gsub(&amp;#34; &amp;#34;,&amp;#34;_&amp;#34;, $0); print }&amp;#39;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    cut -f 2-7 | datamash --header-in --sort --group&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;Sample_i,Sample_j sum Discordance  sum Number_of_sites | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    cat &amp;lt;&lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;echo -e &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sample_i\tsample_j\tdiscordance\tnumber_of_sites&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; - | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    awk &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;NR == 1 { print $0 &amp;#34;\tconcordance&amp;#34; } NR &amp;gt; 1 &amp;amp;&amp;amp; $4 == 0 { print $0 &amp;#34;\t&amp;#34; } NR &amp;gt; 1 &amp;amp;&amp;amp; $4 &amp;gt; 0 { print $0 &amp;#34;\t&amp;#34; ($4-$3)/$4 }&amp;#39;&lt;/span&gt;
    rm &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;order&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;usage&#34;&gt;Usage&lt;/h3&gt;
&lt;p&gt;The function &lt;code&gt;vcf_chromosomes&lt;/code&gt; extracts chromosome names from a VCF file using bcftools. Work is then parallelized across chromosomes.&lt;/p&gt;
&lt;h3 id=&#34;parallel_bcftools_merge&#34;&gt;&lt;code&gt;parallel_bcftools_merge&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;parallel_bcftools_merge&lt;/code&gt; is run much like &lt;code&gt;bcftools merge&lt;/code&gt;. The only difference is that it writes uncompressed BCF, so you have to pipe it into &lt;code&gt;bcftools view&lt;/code&gt; to convert it to the output format you want. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;parallel_bcftools_merge -m all &lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;ls *list_of_bcffiles&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt; | bcftools view -O z &amp;gt; merged_vcf.vcf.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;parallel_bcftools_merge&lt;/code&gt; function generates a temporary BCF file for every chromosome and removes these files once they have been concatenated. You can pass any &lt;code&gt;bcftools merge&lt;/code&gt; flag except &lt;code&gt;-O&lt;/code&gt; to this function.&lt;/p&gt;
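&lt;p&gt;The split-and-concatenate strategy can be sketched in Python 3 as a function that only builds the commands. The name &lt;code&gt;plan_parallel_merge&lt;/code&gt; and its arguments are hypothetical, and the sketch uses the &lt;code&gt;-o&lt;/code&gt; output flag in place of the shell redirect used by the bash function. It does not run bcftools.&lt;/p&gt;

```python
# Illustrative only: builds the command lists, does not execute anything.

def plan_parallel_merge(contigs, merge_args, output_prefix):
    """Return (per-contig merge commands, ordered temp files, concat command)."""
    merge_cmds = []
    temp_files = []
    for contig in contigs:
        out = "{}.{}.bcf".format(output_prefix, contig)
        # -r restricts the merge to one contig; -O u writes uncompressed BCF.
        cmd = ["bcftools", "merge", "-r", contig, "-O", "u"]
        cmd += list(merge_args) + ["-o", out]
        merge_cmds.append(cmd)
        temp_files.append(out)
    # Concatenate in contig order so the merged file stays sorted.
    concat_cmd = ["bcftools", "concat", "-O", "v"] + temp_files
    return merge_cmds, temp_files, concat_cmd
```

&lt;p&gt;Each merge command is independent of the others, which is what lets GNU Parallel run them at the same time. Only the final concatenation has to wait for all of them.&lt;/p&gt;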
&lt;h3 id=&#34;parallel_bcftools_gtcheck&#34;&gt;&lt;code&gt;parallel_bcftools_gtcheck&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;parallel_bcftools_gtcheck&lt;/code&gt; should not be used with &lt;code&gt;--all-sites&lt;/code&gt; or &lt;code&gt;--plot&lt;/code&gt;. I recommend using this function with &lt;code&gt;-H&lt;/code&gt; and &lt;code&gt;-G 1&lt;/code&gt; to calculate the absolute number of differences in homozygous calls between samples. This function also requires datamash (on OS X, install it with &lt;code&gt;brew install datamash&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The output differs slightly from what bcftools normally produces. I use this function mainly to calculate concordance between individual FASTQ runs, like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;parallel_bcftools_gtcheck -H -G &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; union_samples.vcf.gz &amp;gt; concordance.tsv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This parallelized version generates concordances for each chromosome and then merges the results together using datamash. Output looks like this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;left&#34;&gt;sample_i&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;sample_j&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;discordance&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;number_of_sites&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;concordance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;BGI2-RET1-ED3049&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-ED3049&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;927&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2344043&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.999605&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CB4856&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CB4852&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;144484&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2171694&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.933469&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CX11315&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CB4852&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;106964&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2721950&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.960703&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CX11315&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CB4856&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;137200&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2059983&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.933398&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-DL238&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CB4852&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;148217&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2097343&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.929331&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-DL238&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CB4856&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;124132&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1803664&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.931178&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-DL238&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BGI1-RET1-CX11315&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;146580&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1996802&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.926593&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
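&lt;p&gt;The merge step amounts to summing discordance and site counts per sample pair across chromosomes and then computing concordance as (sites - discordance) / sites. A minimal Python 3 sketch of that arithmetic follows. The function name &lt;code&gt;merge_concordance&lt;/code&gt; and the tuple layout are assumptions made for illustration, not part of the bash function.&lt;/p&gt;

```python
from collections import defaultdict


def merge_concordance(rows):
    """Sum per-chromosome rows of (sample_i, sample_j, discordance, sites)."""
    totals = defaultdict(lambda: [0, 0])
    for sample_i, sample_j, discordance, sites in rows:
        totals[(sample_i, sample_j)][0] += discordance
        totals[(sample_i, sample_j)][1] += sites
    merged = []
    for (sample_i, sample_j), (disc, sites) in sorted(totals.items()):
        # Leave concordance undefined when no sites were compared.
        concordance = (sites - disc) / sites if sites else None
        merged.append((sample_i, sample_j, disc, sites, concordance))
    return merged
```

&lt;p&gt;Summing the raw counts before dividing, rather than averaging per-chromosome ratios, keeps chromosomes with more sites from being under-weighted.&lt;/p&gt;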
</description>
    </item>
    
    <item>
      <title>An Alfred workflow for generating markdown tables from your clipboard</title>
      <link>https://www.danielecook.com/an-alfred-workflow-for-generating-markdown-tables-from-your-clipboard/</link>
      <pubDate>Fri, 30 Oct 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/an-alfred-workflow-for-generating-markdown-tables-from-your-clipboard/</guid>
      <description>&lt;p&gt;Generate markdown tables from clipboard content.&lt;/p&gt;
&lt;h4 id=&#34;downloadhttpsgithubcomdanielecookmarkdown-table-alfredrawworkflowmarkdown-tablesalfredworkflow&#34;&gt;&lt;a href=&#34;https://github.com/danielecook/markdown-table-alfred/raw/workflow/markdown-tables.alfredworkflow&#34;&gt;Download&lt;/a&gt;&lt;/h4&gt;
&lt;h2 id=&#34;usage&#34;&gt;Usage&lt;/h2&gt;
&lt;p&gt;Copy CSV or TSV data to your clipboard; the script will try to guess the format. For example, if you copy the table below:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;carat   cut color   clarity depth   table   price   x   y   z
0.23    Ideal   E   SI2 61.5    &lt;span style=&#34;color:#ae81ff&#34;&gt;55&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;326&lt;/span&gt; 3.95    3.98    2.43
0.21    Premium E   SI1 59.8    &lt;span style=&#34;color:#ae81ff&#34;&gt;61&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;326&lt;/span&gt; 3.89    3.84    2.31
0.23    Good    E   VS1 56.9    &lt;span style=&#34;color:#ae81ff&#34;&gt;65&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;327&lt;/span&gt; 4.05    4.07    2.31
0.29    Premium I   VS2 62.4    &lt;span style=&#34;color:#ae81ff&#34;&gt;58&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;334&lt;/span&gt; 4.2 4.23    2.63
0.31    Good    J   SI2 63.3    &lt;span style=&#34;color:#ae81ff&#34;&gt;58&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;335&lt;/span&gt; 4.34    4.35    2.75
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then type &lt;code&gt;tbl&lt;/code&gt; in Alfred and you will see the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/danielecook/markdown-table-alfred/raw/master/tbl.png&#34; alt=&#34;tbl screen&#34;&gt;&lt;/p&gt;
&lt;p&gt;You can create a table with or without a header. It is copied to your clipboard like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;|   carat | cut     | color   | clarity   | depth   |   table |   price |      x |    y |    z |
|--------:|:--------|:--------|:----------|:--------|--------:|--------:|-------:|-----:|-----:|
|    0.23 | Ideal   | E       | SI2       | 61.5    |    &lt;span style=&#34;color:#ae81ff&#34;&gt;55&lt;/span&gt;   |     &lt;span style=&#34;color:#ae81ff&#34;&gt;326&lt;/span&gt; |   3.95 | 3.98 | 2.43 |
|    0.21 | Premium | E       | SI1       | 59.8    |    &lt;span style=&#34;color:#ae81ff&#34;&gt;61&lt;/span&gt;   |     &lt;span style=&#34;color:#ae81ff&#34;&gt;326&lt;/span&gt; |   3.89 | 3.84 | 2.31 |
|    0.23 | Good    | E       | VS1       | 56.9    |    &lt;span style=&#34;color:#ae81ff&#34;&gt;65&lt;/span&gt;   |     &lt;span style=&#34;color:#ae81ff&#34;&gt;327&lt;/span&gt; |   4.05 | 4.07 | 2.31 |
|    0.29 | Premium | I       | VS2       | 62.4    |    &lt;span style=&#34;color:#ae81ff&#34;&gt;58&lt;/span&gt;   |     &lt;span style=&#34;color:#ae81ff&#34;&gt;334&lt;/span&gt; |   4.2  | 4.23 | 2.63 |
|    0.31 | Good    | J       | SI2       | 63.3    |    &lt;span style=&#34;color:#ae81ff&#34;&gt;58&lt;/span&gt;   |     &lt;span style=&#34;color:#ae81ff&#34;&gt;335&lt;/span&gt; |   4.34 | 4.35 | 2.75 |
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And it will render like this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&#34;right&#34;&gt;carat&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;cut&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;color&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;clarity&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;depth&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;table&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;price&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;x&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;y&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;z&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;0.23&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Ideal&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;E&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;SI2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;61.5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;55&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;326&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3.95&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3.98&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;0.21&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Premium&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;E&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;SI1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;59.8&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;61&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;326&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3.89&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3.84&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;0.23&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Good&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;E&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;VS1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;56.9&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;65&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;327&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.07&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;0.29&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Premium&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;I&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;VS2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;62.4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;58&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;334&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.23&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&#34;right&#34;&gt;0.31&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Good&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;J&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;SI2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;63.3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;58&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;335&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.34&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.35&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.75&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
</description>
    </item>
    
    <item>
      <title>Fetch Citations in Google Sheets using pubmed() function</title>
      <link>https://www.danielecook.com/fetch-citations-in-google-sheets-using-pubmed-function/</link>
      <pubDate>Thu, 29 Oct 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/fetch-citations-in-google-sheets-using-pubmed-function/</guid>
      <description>&lt;p&gt;If you need to fetch PubMed citations in bulk, it can be convenient to do so using PubMed identifiers. I&amp;rsquo;ve created a &lt;code&gt;pubmed()&lt;/code&gt; function that can be added to a Google Sheet and used to fetch HTML-formatted citations from PubMed. For example, entering the following into a cell:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;pubmed&lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;23149456&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This returns an HTML-formatted citation:&lt;/p&gt;
&lt;!-- raw HTML omitted --&gt;
&lt;h3 id=&#34;setup&#34;&gt;Setup&lt;/h3&gt;
&lt;p&gt;To implement the function, copy and paste the code below into the script editor and save it as a new project. It will then become available within your Google Sheet. The script editor is available through the &lt;code&gt;Tools &amp;gt; Script Editor&lt;/code&gt; menu.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;/**
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt; * Returns formatted pubmed citation.
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt; *
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt; * @param {number} id PubMed identifier.
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt; * @return Formatted pubmed citation.
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt; * @customfunction
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt; */&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pubmed&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;) {
  &lt;span style=&#34;color:#75715e&#34;&gt;// Special thanks to http://www.alexhadik.com/blog/2014/6/12/create-pubmed-citations-automatically-using-pubmed-api
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;content&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;  &lt;span style=&#34;color:#a6e22e&#34;&gt;UrlFetchApp&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;fetch&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&amp;amp;id=&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;amp;retmode=json&amp;#34;&lt;/span&gt;)
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;JSON&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;parse&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;content&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;getContentText&lt;/span&gt;())
  
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;title&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;title&lt;/span&gt;;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;journal&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;fulljournalname&lt;/span&gt;;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;volume&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;volume&lt;/span&gt;;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;issue&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;issue&lt;/span&gt;;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;citation&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pub_date&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;pubdate&lt;/span&gt;;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pages&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;pages&lt;/span&gt;;
 
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;authors&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;author&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;authors&lt;/span&gt;){
    &lt;span style=&#34;color:#a6e22e&#34;&gt;authors&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;id&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;authors&lt;/span&gt;[&lt;span style=&#34;color:#a6e22e&#34;&gt;author&lt;/span&gt;].&lt;span style=&#34;color:#a6e22e&#34;&gt;name&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;, &amp;#39;&lt;/span&gt;;
  }
  
  &lt;span style=&#34;color:#66d9ef&#34;&gt;var&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;citation&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;title&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;/strong&amp;gt;&amp;lt;br /&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;
                 &lt;span style=&#34;color:#a6e22e&#34;&gt;authors&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;br /&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;
                 &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;(&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pub_date&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;) &amp;lt;em&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;journal&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;/em&amp;gt; &amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; 
                 &lt;span style=&#34;color:#a6e22e&#34;&gt;volume&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; (&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;issue&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;) &amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pages&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;/p&amp;gt;&amp;#34;&lt;/span&gt;;
                 
  
  &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;citation&lt;/span&gt;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>An Alfred Workflow for Wormbase</title>
      <link>https://www.danielecook.com/an-alfred-workflow-for-wormbase/</link>
      <pubDate>Thu, 09 Jul 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/an-alfred-workflow-for-wormbase/</guid>
      <description>&lt;p&gt;I have created an Alfred workflow for looking up gene information in WormBase. You can use it to search for genes, including by WormBase ID (e.g., WBGene00006759). Returned results will include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gene identifiers&lt;/li&gt;
&lt;li&gt;Location&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Caenorhabditis&lt;/em&gt; orthologs&lt;/li&gt;
&lt;li&gt;Publications&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;download-the-latest-version1&#34;&gt;&lt;a href=&#34;https://github.com/danielecook/wormbase-alfred/releases/latest&#34;&gt;Download the latest version&lt;/a&gt;&lt;/h3&gt;
&lt;h2 id=&#34;usage&#34;&gt;Usage&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Search for Genes&lt;/strong&gt;&lt;br&gt;
&lt;img src=&#34;http://github.com/danielecook/wormbase-alfred/raw/master/img/search_genes.png&#34; alt=&#34;search&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Get Gene Information&lt;/strong&gt;&lt;br&gt;
&lt;img src=&#34;http://github.com/danielecook/wormbase-alfred/raw/master/img/get_gene_info.png&#34; alt=&#34;Get Gene Info&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>An Alfred Workflow for Codebox</title>
      <link>https://www.danielecook.com/an-alfred-workflow-for-codebox/</link>
      <pubDate>Thu, 25 Jun 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/an-alfred-workflow-for-codebox/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;&#34;&gt;Codebox&lt;/a&gt; is a great program for storing and accessing snippets. It offers a quickbar menu item, but I thought Alfred might offer more functionality. So I wrote a workflow for it.&lt;/p&gt;
&lt;h3 id=&#34;img-srchttpsrawgithubusercontentcomdanielecookcodebox-alfredmaster5ea0cb7e-736d-475c-974f-a761791c582apng-styleheight27pxmargin-right10px-download-codebox-alfred-workflow1&#34;&gt;&lt;a href=&#34;https://github.com/danielecook/codebox-alfred/releases/latest&#34;&gt;&lt;!-- raw HTML omitted --&gt;Download Codebox-Alfred workflow&lt;/a&gt;&lt;/h3&gt;
&lt;h2 id=&#34;important&#34;&gt;Important!&lt;/h2&gt;
&lt;p&gt;The workflow works fairly well, but there are a few caveats. You should not do the following with your codebox libraries:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don’t put spaces into tag, list, or folder names. Use an underscore instead.&lt;/li&gt;
&lt;li&gt;Don’t nest folders/lists with the same name.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;usage&#34;&gt;Usage&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Set the codebox source using &lt;code&gt;cb_src&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://github.com/danielecook/codebox-alfred/raw/master/img/set_src.png&#34; alt=&#34;set source&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Invoke the workflow by typing &lt;code&gt;ff&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://github.com/danielecook/codebox-alfred/raw/master/img/browse_directory.png&#34; alt=&#34;search directory&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Browse tags with &lt;code&gt;ff #&amp;lt;search&amp;gt;&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;img src=&#34;http://github.com/danielecook/codebox-alfred/raw/master/img/search_tags.png&#34; alt=&#34;search tags&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Search all Snippets: &lt;code&gt;ff &amp;lt;search&amp;gt;&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://github.com/danielecook/codebox-alfred/raw/master/img/search_snippets.png&#34; alt=&#34;search all&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>HGNC Search: Instant search of human genes with Alfred</title>
      <link>https://www.danielecook.com/hgnc-search-instant-search-of-human-genes-with-alfred/</link>
      <pubDate>Fri, 12 Jun 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/hgnc-search-instant-search-of-human-genes-with-alfred/</guid>
      <description>&lt;p&gt;I have put together an Alfred workflow – this one searches the HGNC database for genes! I have converted the text database from the &lt;a href=&#34;http://www.genenames.org/&#34;&gt;HGNC website&lt;/a&gt; and configured it for full-text search using SQLite. This allows you to look up genes by their UCSC, Entrez, Vega, Ensembl, and many other identifiers very quickly.&lt;/p&gt;
&lt;h3 id=&#34;img-srcgene-150x150png-width25px--download-the-latest-release2&#34;&gt;&lt;a href=&#34;https://github.com/danielecook/HGNC-Search/releases/latest&#34;&gt;&lt;!-- raw HTML omitted --&gt; Download the latest release&lt;/a&gt;&lt;/h3&gt;
&lt;h2 id=&#34;usage&#34;&gt;Usage&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Full text search of the HGNC database&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/d1-1024x759.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Information and links are provided for individual genes&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/d2.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;feedback&#34;&gt;Feedback&lt;/h2&gt;
&lt;p&gt;Please provide feedback. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What other gene IDs should be displayed by default? (You can currently search for any)&lt;/li&gt;
&lt;li&gt;What other sites would you like to be able to navigate to?&lt;/li&gt;
&lt;li&gt;Is there additional information that should be folded in that would be useful?&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>An alfred workflow for working with sequence data</title>
      <link>https://www.danielecook.com/an-alfred-workflow-for-working-with-sequence-data/</link>
      <pubDate>Thu, 11 Jun 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/an-alfred-workflow-for-working-with-sequence-data/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve put together a simple Alfred workflow with a few utilities for working with sequence data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/danielecook/seq-utilities/releases/latest&#34;&gt;Download the latest release of Seq-Utilities&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id=&#34;usage&#34;&gt;Usage&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Generate a random DNA sequence 200 base pairs long.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/dna1.png&#34; alt=&#34;dna1&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generate the complement, reverse complement, RNA, and protein of a DNA sequence&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/dna2.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Open BLAST and pre-populate the search field&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;blast ATGTCCTCGTTCGACCGTCGTATTGAAGCTGCATGTAAA
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Split a GFF File into Individual Features</title>
      <link>https://www.danielecook.com/split-a-gff-file-into-individual-features/</link>
      <pubDate>Sun, 25 Jan 2015 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/split-a-gff-file-into-individual-features/</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;http://www.ensembl.org/info/website/upload/gff.html&#34;&gt;General Feature Format&lt;/a&gt; is a widely used format for annotating genome sequences. If indexed with tabix, gff files can be viewed in IGV or elsewhere. While features are organized in a nested manner (e.g. genes &amp;gt; exons &amp;gt; variant), you can pull out the individual types and index them, or combine only a few for viewing in your genome browser.&lt;/p&gt;
&lt;p&gt;I was working with &lt;a href=&#34;ftp://ftp.wormbase.org/pub/wormbase/releases/WS245/species/c_elegans/PRJNA13758/&#34;&gt;wormbase&lt;/a&gt; annotation files, which combine all the different types of features together (genes, ncRNA, mRNA, binding site, operon, G Quartets, piRNAs, etc). This results in a very dense track in IGV which makes it difficult to disentangle what role individual features (or features of interest) might have.&lt;/p&gt;
&lt;p&gt;As a result, I wrote this very short script for splitting the individual feature types apart, sorting them, and indexing them with tabix. This way they can be selectively viewed in IGV or elsewhere.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sys

current_feature &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stdin:
    feature &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; line&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;)[&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;]
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; feature &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; current_feature:
        &lt;span style=&#34;color:#75715e&#34;&gt;# Input is sorted by feature type, so switch output files when the type changes&lt;/span&gt;
        current_feature &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; feature
        f &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(feature &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.gff&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;a+&amp;#34;&lt;/span&gt;)
    f&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;write(line)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;gunzip -kfc &amp;lt;GFF&amp;gt; | grep -v ^&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;#&amp;#34;&lt;/span&gt; | sort -k3,3 | python process_gff.py

&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; f in *.gff; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
    i&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;${f%.gff}&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;  &lt;span style=&#34;color:#75715e&#34;&gt;# strip the extension so the .gff suffix is not doubled below&lt;/span&gt;
    &lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;grep ^&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;#&amp;#34;&lt;/span&gt; $i.gff; grep -v ^&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;#&amp;#34;&lt;/span&gt; $i.gff | sort -k1,1 -k4,4n&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; | bgzip &amp;gt; $i.sorted.gff.gz;
    tabix $i.sorted.gff.gz
    rm $i.gff
&lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Aggregate FastQC Reports</title>
      <link>https://www.danielecook.com/aggregate-fastqc-reports/</link>
      <pubDate>Sun, 28 Dec 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/aggregate-fastqc-reports/</guid>
      <description>&lt;h2 id=&#34;update-multiqc-2019-06-21&#34;&gt;Update: MultiQC (2019-06-21)&lt;/h2&gt;
&lt;p&gt;After I originally published this script for aggregating FASTQC reports, &lt;a href=&#34;https://multiqc.info&#34;&gt;MultiQC&lt;/a&gt; was published by &lt;a href=&#34;https://github.com/ewels&#34;&gt;Phil Ewels&lt;/a&gt;. MultiQC aggregates quality-control and other associated data from sequencing tools into an interactive report. Instead of the script below, you can simply run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Run this command where your *_fastqc.zip files are&lt;/span&gt;
multiqc .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will output a report that looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/multiqc.png&#34; alt=&#34;multiqc screenshot&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Publication&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;MultiQC: Summarize analysis results for multiple tools and samples in a single report&lt;/strong&gt; &lt;!-- raw HTML omitted --&gt;
&lt;em&gt;Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller&lt;/em&gt; &lt;!-- raw HTML omitted --&gt;
Bioinformatics (2016) &lt;!-- raw HTML omitted --&gt;
doi: &lt;a href=&#34;http://dx.doi.org/10.1093/bioinformatics/btw354&#34;&gt;10.1093/bioinformatics/btw354&lt;/a&gt; &lt;!-- raw HTML omitted --&gt;
PMID: &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/pubmed/27312411&#34;&gt;27312411&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;original-post-2014-12-28&#34;&gt;Original Post (2014-12-28)&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;http://www.bioinformatics.babraham.ac.uk/projects/fastqc/&#34;&gt;FastQC&lt;/a&gt; is a phenomenal sequence quality assessment tool for evaluating both fastq and bam files. If you are working with a large number of sequence files (fastq), you may wish to compare results across all of them by comparing the plots that fastqc produces. I’m talking about the set of plots that look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/Uchicago-L001-CB4857_CGC-4642f-1.png&#34; alt=&#34;fastqc&#34;&gt;&lt;/p&gt;
&lt;p&gt;FastQC can be invoked from the command line by typing &lt;code&gt;fastqc &amp;lt;fastq/bam&amp;gt;&lt;/code&gt;, and it will produce an html report and associated zip file containing data, plots, and some ancillary files. The zip file contains an &lt;strong&gt;Images&lt;/strong&gt; folder where the plots that become incorporated into the html report are stored. They are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Adapter Content&lt;/li&gt;
&lt;li&gt;Duplication Levels&lt;/li&gt;
&lt;li&gt;Kmer Profiles&lt;/li&gt;
&lt;li&gt;Per base N Content&lt;/li&gt;
&lt;li&gt;Per Base Quality&lt;/li&gt;
&lt;li&gt;Per Base Sequence Content&lt;/li&gt;
&lt;li&gt;Per Sequence GC Content&lt;/li&gt;
&lt;li&gt;Per Sequence Quality&lt;/li&gt;
&lt;li&gt;Per Tile Quality&lt;/li&gt;
&lt;li&gt;Sequence Length Distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The zipped folder also contains two files, &lt;strong&gt;fastqc_data.txt&lt;/strong&gt; and &lt;strong&gt;summary.txt&lt;/strong&gt;. &lt;strong&gt;fastqc_data.txt&lt;/strong&gt; contains the raw data and statistics, while &lt;strong&gt;summary.txt&lt;/strong&gt; summarizes which tests have been passed.&lt;/p&gt;
&lt;p&gt;To easily compare data across reports I wrote this short shell script (below) which will ‘aggregate’ images, statistics, and summaries by:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Unzipping all the available fastqc zip files.&lt;/li&gt;
&lt;li&gt;Creating a &lt;strong&gt;fq_aggregated&lt;/strong&gt; folder, and individual folders within for each plot type.&lt;/li&gt;
&lt;li&gt;Moving images from each unzipped fastqc report into the folder to which they belong, and renaming each with the filename of the report (e.g. sample name).&lt;/li&gt;
&lt;li&gt;Concatenating &lt;strong&gt;summary.txt&lt;/strong&gt; files as &lt;strong&gt;fq_aggregated&lt;/strong&gt;/&lt;strong&gt;summary.txt&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Concatenating the basic statistics from each report into &lt;strong&gt;fq_aggregated&lt;/strong&gt;/&lt;strong&gt;statistics.txt&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Images will be reorganized as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/aggregate_fastqc.png&#34; alt=&#34;aggregate fastqc&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;summarytxt&#34;&gt;&lt;code&gt;summary.txt&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;fq_aggregated&lt;/strong&gt;/&lt;strong&gt;summary.txt&lt;/strong&gt; is a tab-delimited file that looks like this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Basic Statistics&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per tile  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per sequence  quality scores&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;Per base  sequence  content&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per sequence  GC  content&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  N content&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Basic Statistics&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per tile  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per sequence  quality scores&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  sequence  content&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;Per sequence  GC  content&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;Per base  N content&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
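&lt;p&gt;Because every row carries the status, the module, and the file name, the aggregated file is easy to tally. The following is a minimal Python sketch, assuming the three tab-delimited columns shown above; the function name &lt;code&gt;count_failures&lt;/code&gt; and the example rows are hypothetical and are not part of the shell script:&lt;/p&gt;

```python
import csv
import io
from collections import Counter


def count_failures(handle):
    """Count FAIL rows per file in a tab-delimited summary (status, module, file)."""
    failures = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        status, module, sample = row
        if status == "FAIL":
            failures[sample] += 1
    return failures


# Hypothetical rows in the same layout as the table above
example = io.StringIO(
    "PASS\tBasic Statistics\tSeqA.fq\n"
    "FAIL\tPer base sequence content\tSeqA.fq\n"
    "FAIL\tPer sequence GC content\tSeqB.fq\n"
    "FAIL\tPer base N content\tSeqB.fq\n"
)
print(dict(count_failures(example)))
```

&lt;p&gt;To use it on real output, pass an open handle to &lt;strong&gt;fq_aggregated&lt;/strong&gt;/&lt;strong&gt;summary.txt&lt;/strong&gt; instead of the in-memory example.&lt;/p&gt;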
&lt;h2 id=&#34;statisticstxt&#34;&gt;&lt;code&gt;statistics.txt&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;fq_aggregated&lt;/strong&gt;/&lt;strong&gt;statistics.txt&lt;/strong&gt; will look like this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Basic Statistics&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per tile  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per sequence  quality scores&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;Per base  sequence  content&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per sequence  GC  content&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  N content&lt;/td&gt;
&lt;td&gt;SeqA.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Basic Statistics&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per tile  sequence  quality&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per sequence  quality scores&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;Per base  sequence  content&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;Per sequence  GC  content&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;Per base  N content&lt;/td&gt;
&lt;td&gt;SeqB.fq&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;the-code&#34;&gt;The Code&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Run this script in a directory containing zip files from fastqc. It aggregates images of each type in individual folders&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# So looking across data is quick.&lt;/span&gt;

zips&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;ls *.zip&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; i in $zips; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
    unzip -o $i &amp;amp;&amp;gt;/dev/null;
&lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;

fastq_folders&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;$zips &lt;span style=&#34;color:#75715e&#34;&gt;# The .zip extension is stripped from each name inside the loops below&lt;/span&gt;

rm -rf fq_aggregated &lt;span style=&#34;color:#75715e&#34;&gt;# Remove aggregate folder if present&lt;/span&gt;
mkdir fq_aggregated

&lt;span style=&#34;color:#75715e&#34;&gt;# Rename Files within each using folder name.&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; folder in $fastq_folders; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
    folder&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder%.*&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
    img_files&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;ls &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;/Images/*png&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; img in $img_files; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
        img_name&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;$(&lt;/span&gt;basename &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$img&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;)&lt;/span&gt;;
        img_name&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;img_name%.*&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
        new_name&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;;
        mkdir -p fq_aggregated/&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;img_name&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;;
        mv $img fq_aggregated/&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;img_name&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;/&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder/_fastqc/&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;.png;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;;
&lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;;


&lt;span style=&#34;color:#75715e&#34;&gt;# Concatenate Summaries&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; folder in $fastq_folders; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
    folder&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder%.*&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
    cat &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;/summary.txt &amp;gt;&amp;gt; fq_aggregated/summary.txt
&lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;;

&lt;span style=&#34;color:#75715e&#34;&gt;# Concatenate Statistics&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; folder in $fastq_folders; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
    folder&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder%.*&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
    head -n &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;/fastqc_data.txt | tail -n &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; | awk -v f&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder/_fastqc/&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;{ print $0 &amp;#34;\t&amp;#34; f }&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; fq_aggregated/statistics.txt
    rm -rf &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;folder&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Downgrade a VCF for viewing in IGV (4.2 &gt; 4.1)</title>
      <link>https://www.danielecook.com/downgrade-a-vcf-for-viewing-in-igv-4.2-4.1/</link>
      <pubDate>Mon, 15 Dec 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/downgrade-a-vcf-for-viewing-in-igv-4.2-4.1/</guid>
      <description>&lt;h2 id=&#34;update-you-probably-no-longer-need-this-2019-06-24&#34;&gt;Update: You probably no longer need this (2019-06-24)&lt;/h2&gt;
&lt;p&gt;If you are using up to date software then you probably do not need to worry about downgrading a VCF file.&lt;/p&gt;
&lt;h2 id=&#34;original-post-2014-12-15&#34;&gt;Original Post (2014-12-15)&lt;/h2&gt;
&lt;p&gt;If you use a recent version of bcftools and frequently view variants in IGV, you may have run into problems loading your VCF files. IGV currently does not support VCF version 4.2. However, I’ve been able to tweak the headers of newer VCF files so that their variants are viewable in IGV again.&lt;/p&gt;
&lt;p&gt;All you have to do is revert the version number in the first line and remove a few characters IGV does not like. Below is a bash function that does this, saving the input VCF as &lt;code&gt;{vcf_filename}.dg.vcf.gz&lt;/code&gt;. The function also indexes the output with tabix, making it ready for use. Note that it passes &lt;code&gt;--max-alleles 2&lt;/code&gt; to &lt;code&gt;bcftools view&lt;/code&gt;, so sites with more than two alleles are dropped.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# If you are trying to view VCF 4.2 files in IGV - you may run into issues. This function might help you.&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# This script will:&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# 1. Relabel the file format in the header as version 4.1&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# 2. Replace parentheses in the INFO lines (IGV doesn&amp;#39;t like these!)&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; vcf_downgrade&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
  outfile&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;1/.bcf/&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
  outfile&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;outfile/.gz/&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
  outfile&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;outfile/.vcf/&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
  bcftools view --max-alleles &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt; -O v $1 | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  sed &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;s/##fileformat=VCFv4.2/##fileformat=VCFv4.1/&amp;#34;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  sed &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;s/(//&amp;#34;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  sed &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;s/)//&amp;#34;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  sed &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;s/,Version=\&amp;#34;3\&amp;#34;&amp;gt;/&amp;gt;/&amp;#34;&lt;/span&gt; | &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  bcftools view -O z &amp;gt; &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;outfile&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;.dg.vcf.gz
  tabix &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;outfile&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;.dg.vcf.gz
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Rename Samples within a VCF/BCF</title>
      <link>https://www.danielecook.com/rename-samples-within-a-vcf/bcf/</link>
      <pubDate>Fri, 05 Dec 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/rename-samples-within-a-vcf/bcf/</guid>
      <description>&lt;h1 id=&#34;update-use-bcftools-2019-06-21&#34;&gt;Update: Use bcftools (2019-06-21)&lt;/h1&gt;
&lt;p&gt;Since this post was originally written, &lt;a href=&#34;https://github.com/samtools/bcftools&#34;&gt;bcftools&lt;/a&gt; has added a command for renaming samples called &lt;code&gt;reheader&lt;/code&gt; which allows sample names to be easily modified.&lt;/p&gt;
&lt;h1 id=&#34;original-post-2014-12-05&#34;&gt;Original Post (2014-12-05)&lt;/h1&gt;
&lt;p&gt;These two simple bash functions make it easy to rename samples within a bcf file by using the filename given (if it is a single sample file) or adding a prefix to all samples. This is useful if you want to merge bcf files where the sample names are identical in both (for comparison purposes).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; rename_to_filename &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    &lt;span style=&#34;color:#75715e&#34;&gt;# Renames samples with the filename.&lt;/span&gt;
    tmp&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;mktemp -t temp&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    echo &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;1/.[vb]cf/&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt; &amp;gt; $tmp
    bcftools reheader -s $tmp $1 &amp;gt; m.$1
    mv m.$1 $1
    bcftools index $1
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt; add_sample_prefix &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    &lt;span style=&#34;color:#75715e&#34;&gt;# Adds a prefix to the samples within a bcf file.&lt;/span&gt;
    tmp&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;mktemp -t temp&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    bcftools query -l $1 | awk -v g&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;$2 &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;{ print g $0 }&amp;#39;&lt;/span&gt;  &amp;gt; $tmp
    bcftools reheader -s $tmp $1 &amp;gt; m.$1
    mv m.$1 $1
    bcftools index $1
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>From SRA Project to FASTQ</title>
      <link>https://www.danielecook.com/from-sra-project-to-fastq/</link>
      <pubDate>Sat, 25 Oct 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/from-sra-project-to-fastq/</guid>
      <description>&lt;h2 id=&#34;original-post-2014-10-25&#34;&gt;Original Post (2014-10-25)&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/sra&#34;&gt;Sequence Read Archive (SRA)&lt;/a&gt; contains sequence data from scientific studies stored in a special ‘sra’ format. Data is stored in a &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/Traces/sra/?cmd=show&amp;amp;f=sra_sub_expl&amp;amp;view=get_started&#34;&gt;hierarchical format&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;Project ▸ Study ▸ Sample ▸ Experiment ▸ Run&lt;/p&gt;
&lt;p&gt;Recently, I had to use the SRA to download all of the sequence data for a given project. This required querying the SRA database for all the runs in a sequencing project and converting them to FASTQs. Here’s how I did it:&lt;/p&gt;
&lt;p&gt;First, you’ll need &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/books/NBK179288/&#34;&gt;Entrez Direct&lt;/a&gt; and the SRA Toolkit. If you are on a Mac, you can install both using &lt;a href=&#34;https://brew.sh&#34;&gt;Homebrew&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;brew install edirect &lt;span style=&#34;color:#75715e&#34;&gt;# Entrez Direct&lt;/span&gt;
brew install sratoolkit
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once installed, the script below can be used to download all the sequence data associated with a given project. The script queries the project for all of its associated runs, downloads them, and converts them to gzipped FASTQs. Note that it also uses GNU parallel (to speed things up) and fastqc for quality control. These can be installed on a Mac using:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;brew install parallel
brew install fastqc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;Download_SRP_Runs&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;
    SRP_IDs&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;esearch -db sra -query $1 | efetch -format docsum | xtract -pattern DocumentSummary -element Run@acc | tr &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\t&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\n&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;`&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; r in &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;SRP_IDs&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
        url&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;r:0:6&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;/&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;r&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;/&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;r&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;.sra&amp;#34;&lt;/span&gt;
        wget $url
    &lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;;
&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;

Download_SRP_Runs &amp;lt;SRP ID GOES HERE&amp;gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Convert to fastq&lt;/span&gt;
parallel fastq-dump --split-files --gzip &lt;span style=&#34;color:#f92672&#34;&gt;{}&lt;/span&gt; ::: *.sra

&lt;span style=&#34;color:#75715e&#34;&gt;# Perform quality control&lt;/span&gt;
parallel fastqc &lt;span style=&#34;color:#f92672&#34;&gt;{}&lt;/span&gt; ::: *.fastq.gz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;update-sra-explorer-2019-06-20&#34;&gt;Update: SRA Explorer (2019-06-20)&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://ewels.github.io/sra-explorer/#&#34;&gt;SRA Explorer&lt;/a&gt; by &lt;a href=&#34;http://phil.ewels.co.uk/&#34;&gt;Phil Ewels&lt;/a&gt; can be used to generate a collection of SRA datasets and direct download links for their FASTQs.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Calculate Depth and Breadth of Coverage From a bam File</title>
      <link>https://www.danielecook.com/calculate-depth-and-breadth-of-coverage-from-a-bam-file/</link>
      <pubDate>Sat, 20 Sep 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/calculate-depth-and-breadth-of-coverage-from-a-bam-file/</guid>
      <description>&lt;h3 id=&#34;original-post&#34;&gt;Original Post&lt;/h3&gt;
&lt;p&gt;What is the difference between depth and coverage in sequencing experiments? Actually, &lt;a href=&#34;https://www.biostars.org/p/6571/#6574&#34;&gt;they refer to the same thing&lt;/a&gt;: the average number of reads aligned to an individual base. I had previously thought coverage referred to the percentage of the genome with reads aligned to it; however, the more appropriate term for that is &lt;strong&gt;breadth of coverage&lt;/strong&gt;. &lt;a href=&#34;http://doi.org/10.1093/bib/bbu029&#34;&gt;This paper&lt;/a&gt; more precisely defines what &lt;strong&gt;breadth of coverage&lt;/strong&gt; and &lt;strong&gt;depth of coverage&lt;/strong&gt; mean.&lt;/p&gt;
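&lt;p&gt;A toy example may make the distinction concrete. The sketch below is illustrative only: the function name &lt;code&gt;depth_and_breadth&lt;/code&gt; and the 10 bp contig are hypothetical. It takes depth of coverage as the sum of per-base depths divided by the contig length, and breadth of coverage as the fraction of positions with at least one aligned read:&lt;/p&gt;

```python
def depth_and_breadth(per_base_depths):
    """Return (depth of coverage, breadth of coverage) for a list of per-base read depths."""
    length = len(per_base_depths)
    # Positions with at least one aligned read
    covered = sum(1 for d in per_base_depths if d != 0)
    depth = sum(per_base_depths) / float(length)
    breadth = covered / float(length)
    return depth, breadth


# Hypothetical 10 bp contig: six covered positions, four with no reads
toy_contig = [3, 3, 4, 0, 0, 5, 5, 0, 0, 2]
depth, breadth = depth_and_breadth(toy_contig)
print(depth, breadth)
```

&lt;p&gt;Here the six covered positions give a breadth of 0.6, while the 22 aligned bases spread over 10 positions give a depth of 2.2.&lt;/p&gt;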
&lt;p&gt;&lt;!-- raw HTML omitted --&gt;&lt;/p&gt;
&lt;p&gt;If you need to calculate &lt;em&gt;depth of coverage&lt;/em&gt; and &lt;em&gt;breadth of coverage&lt;/em&gt;, you can do so using the Python script below. To use the script, pass a bam file to the function &lt;code&gt;coverage&lt;/code&gt;. The function returns a dictionary of the depth of coverage, breadth of coverage, sum of depths (at every position), and number of bases mapped, for every contig/chromosome individually and for the entire genome as a whole.&lt;/p&gt;
&lt;p&gt;Additionally, if you pass the name of the mitochondrial chromosome as the optional second parameter, the script will calculate the values listed above for the nuclear genome and calculate the ratio of mitochondrial depth of coverage to nuclear depth of coverage. This can act as a proxy for mitochondrial count/content within a cell.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# This script calculates the depth of coverage and breadth of coverage for a given bam. &lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Outputs a dictionary containing the contig/chromosome names and the depth and breadth of coverage for each&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# and for the entire genome.&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;#&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# If you optionally specify the name of the mitochondrial chromosome (e.g. mtDNA, chrM, chrMT)&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# The script will also generate breadth and depth of coverage for the nuclear genome AND the ratio&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# of mtDNA:nuclearDNA; which can act as a proxy in some cases for mitochondrial count within an individual.&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# &lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Author: Daniel E. Cook&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# Website: Danielecook.com&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;#&lt;/span&gt;


&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; os
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; re
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; subprocess &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; Popen, PIPE

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;get_contigs&lt;/span&gt;(bam):
    header, err &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; Popen([&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;samtools&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;view&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;-H&amp;#34;&lt;/span&gt;,bam], stdout&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;PIPE, stderr&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;PIPE)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;communicate()
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; err &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;Exception&lt;/span&gt;(err)
    &lt;span style=&#34;color:#75715e&#34;&gt;# Extract contig names from the header and convert their lengths to integers&lt;/span&gt;
    contigs &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {}
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;findall(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@SQ\WSN:(?P&amp;lt;chrom&amp;gt;[A-Za-z0-9_]*)\WLN:(?P&amp;lt;length&amp;gt;[0-9]+)&amp;#34;&lt;/span&gt;, header):
        contigs[x[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; int(x[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;])
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; contigs

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;coverage&lt;/span&gt;(bam, mtchr &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; None):
    &lt;span style=&#34;color:#75715e&#34;&gt;# Check to see if file exists&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;path&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;isfile(bam) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; False:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;Exception&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bam file does not exist&amp;#34;&lt;/span&gt;)
    contigs &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; get_contigs(bam)

    &lt;span style=&#34;color:#75715e&#34;&gt;# Guess the mitochondrial chromosome only if one was not specified&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; mtchr &lt;span style=&#34;color:#f92672&#34;&gt;is&lt;/span&gt; None:
        guesses &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; [x &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; contigs &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; x&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;lower()&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;find(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;m&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
        &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; len(guesses) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:
            mtchr &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; guesses[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]

    coverage_dict &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {}
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; c &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; contigs&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;keys():
        command &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;samtools depth -r &lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; | awk &amp;#39;{sum+=$3;cnt++}END{print cnt &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;\t\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; sum}&amp;#39;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;%&lt;/span&gt; (c, bam)
        coverage_dict[c] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {}
        coverage_dict[c][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;], coverage_dict[c][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; map(int,Popen(command, stdout&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;PIPE, shell &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; True)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;communicate()[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strip()&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;))
        coverage_dict[c][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Breadth of Coverage&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; coverage_dict[c][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; float(contigs[c])
        coverage_dict[c][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Depth of Coverage&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; coverage_dict[c][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; float(contigs[c])
        coverage_dict[c][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Length&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; int(contigs[c])

    &lt;span style=&#34;color:#75715e&#34;&gt;# Calculate Genome Wide Breadth of Coverage and Depth of Coverage&lt;/span&gt;
    genome_length &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; float(sum(contigs&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;values()))
    coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {}
    coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Length&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; int(genome_length)
    coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;])
    coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;])
    coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Breadth of Coverage&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;]) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; float(genome_length)
    coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Depth of Coverage&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;]) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; float(genome_length)

    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; mtchr &lt;span style=&#34;color:#f92672&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; None:
        &lt;span style=&#34;color:#75715e&#34;&gt;# Calculate nuclear breadth of coverage and depth of coverage&lt;/span&gt;
        ignore_contigs &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; [mtchr, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;]
        coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {}
        coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Length&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Length&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k,x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; ignore_contigs ])
        coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; ignore_contigs])
        coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; ignore_contigs])
        coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Breadth of Coverage&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Bases Mapped&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; ignore_contigs]) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; float(coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Length&amp;#34;&lt;/span&gt;])
        coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Depth of Coverage&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sum([x[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Sum of Depths&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k, x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;iteritems() &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; k &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; ignore_contigs]) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; float(coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Length&amp;#34;&lt;/span&gt;])

        &lt;span style=&#34;color:#75715e&#34;&gt;# Calculate the ratio of mtDNA depth to nuclear depth&lt;/span&gt;
        coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genome&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mt_ratio&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; coverage_dict[mtchr][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Depth of Coverage&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; float(coverage_dict[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nuclear&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Depth of Coverage&amp;#34;&lt;/span&gt;])

    &lt;span style=&#34;color:#75715e&#34;&gt;# Flatten Dictionary &lt;/span&gt;
    coverage &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; []
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k,v &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; coverage_dict&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;items():
        &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; v&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;items():
            coverage &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; [(k,x[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;], x[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;])]
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; coverage
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;a href=&#34;http://samtools.github.io/bcftools/&#34;&gt;Requires BCFTools&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;update-mosdepth-2019-06-21&#34;&gt;Update: mosdepth (2019-06-21)&lt;/h3&gt;
&lt;p&gt;If you are looking to calculate coverage, I highly recommend &lt;a href=&#34;http://www.github.com/brentp/mosdepth&#34;&gt;mosdepth&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Generate a bedfile of masked ranges in a fasta file</title>
      <link>https://www.danielecook.com/generate-a-bedfile-of-masked-ranges-a-fasta-file/</link>
      <pubDate>Mon, 15 Sep 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/generate-a-bedfile-of-masked-ranges-a-fasta-file/</guid>
      <description>&lt;p&gt;If you are calling variants as part of an NGS experiment, you are likely considering filters such as depth and quality, as well as removing low complexity regions from the variant dataset. Programs such as &lt;a href=&#34;http://www.repeatmasker.org/&#34;&gt;repeatmasker&lt;/a&gt; are used to identify low complexity regions, replacing repetitive sequences with &lt;strong&gt;N&lt;/strong&gt;&amp;rsquo;s. Reads are often misaligned to repetitive regions, which results in false positive variant calls.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve been provided with or have generated a masked fasta file for a given genome, you can use the following script to convert a masked fasta (first block below) into a bed file (second block) of the masked ranges.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&amp;gt;CHROMOSOME_I 1 15072423
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNGTTTGTTNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
TATTAAAAACTGTTCNNNNNNNNNNNNNNNNNNNN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;chrI    0   4831
chrI    4869    5146
chrI    5181    5305
chrI    5340    5677
chrI    5706    7409
chrI    7431    9549
chrI    9593    9651
chrI    9683    9979
chrI    10014   18897
chrI    18941   19432
chrI    19468   19747
chrI    19782   19877
chrI    19898   21314
chrI    21357   24849
chrI    24903   27411
chrI    27456   27535
chrI    27561   28015
chrI    28054   28505
chrI    28527   28918
chrI    28961   30659
chrI    30682   39364
chrI    39419   42234
chrI    42307   56428
chrI    56455   57860
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;em&gt;Each range corresponds to a masked, low complexity region within the fasta file.&lt;/em&gt; The resulting bed file can be used to filter variants out of a VCF file using a tool such as bcftools.&lt;/p&gt;
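&lt;p&gt;If you would rather do the filtering lookup in Python than with bcftools, it can be sketched roughly as follows. This is a minimal sketch and not part of the script below: the helper names &lt;code&gt;load_bed&lt;/code&gt; and &lt;code&gt;is_masked&lt;/code&gt; are made up for illustration, and it assumes that the ranges do not overlap, that bed coordinates are 0-based and half-open, and that VCF positions are 1-based.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import bisect
from collections import defaultdict


def load_bed(lines):
    """Map each chromosome to a sorted list of (start, end) tuples."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return {chrom: sorted(ranges) for chrom, ranges in by_chrom.items()}


def is_masked(ranges, chrom, pos):
    """True if the 1-based position `pos` falls inside a masked range."""
    intervals = ranges.get(chrom, [])
    # Index of the last interval that starts at or before the 0-based position.
    i = bisect.bisect_right(intervals, (pos - 1, float("inf"))) - 1
    if i == -1:
        return False
    start, end = intervals[i]
    return pos - 1 in range(start, end)


# Hypothetical input: the first two ranges from the example output above.
bed = ["chrI\t0\t4831", "chrI\t4869\t5146"]
ranges = load_bed(bed)
print(is_masked(ranges, "chrI", 4831))  # True: last masked base of the first range
print(is_masked(ranges, "chrI", 4832))  # False: first unmasked base after it
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Sorting the ranges once and using &lt;code&gt;bisect&lt;/code&gt; keeps each lookup fast even when a chromosome has many masked ranges.&lt;/p&gt;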
&lt;h2 id=&#34;usage&#34;&gt;Usage&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;python generate_masked_ranges.py &amp;lt;fasta_file&amp;gt; &amp;gt; output_ranges.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;script&#34;&gt;Script&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/env python&lt;/span&gt;

&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; gzip
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; io
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sys
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; os

&lt;span style=&#34;color:#75715e&#34;&gt;# This script generates a bedfile of the masked regions in a fasta file.&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# STDIN or arguments&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; len(sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:

    &lt;span style=&#34;color:#75715e&#34;&gt;# Check file type&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;endswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.fa.gz&amp;#34;&lt;/span&gt;):
        input_fasta &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; io&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;TextIOWrapper(io&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;BufferedReader(gzip&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;open(sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;])))
    &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;endswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.fa&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;or&lt;/span&gt; sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;endswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.txt&amp;#34;&lt;/span&gt;):
        input_fasta &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;], &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;r&amp;#39;&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;Exception&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Unsupported File Type&amp;#34;&lt;/span&gt;)
&lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;Usage:&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n\t\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;generate_masked_ranges.py &amp;lt;fasta file | .fa or .fa.gz&amp;gt; &amp;lt;chrome find&amp;gt; &amp;lt;chrome replace&amp;gt;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Chrome find&amp;#39; and &amp;#39;chrome replace&amp;#39; are used to find and replace the name of a chromosome. For example,
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;replacing CHROMOSOME_I with chrI can be accomplished by using the command as follows:
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t\t\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;python generate_masked_ranges.py my_fasta.fa CHROMOSOME_ chr
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;Output is to stdout
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;SystemExit&lt;/span&gt;


n, state &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;# n = 0-based character position; state (0=Out of gap; 1=In Gap)&lt;/span&gt;
chrom, start, end &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; None, None, None

&lt;span style=&#34;color:#66d9ef&#34;&gt;with&lt;/span&gt; input_fasta &lt;span style=&#34;color:#66d9ef&#34;&gt;as&lt;/span&gt; f:
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; f:
        line &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; line&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
        &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; line&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;startswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;gt;&amp;#34;&lt;/span&gt;):
            &lt;span style=&#34;color:#75715e&#34;&gt;# Print end range&lt;/span&gt;
            &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; state &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:
                print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join([chrom, str(start), str(n)]))
                start, end, state  &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
            n &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;# Reset character&lt;/span&gt;
            chrom &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; line&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; &amp;#34;&lt;/span&gt;)[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;gt;&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
            &lt;span style=&#34;color:#75715e&#34;&gt;# If user specifies, replace chromosome as well&lt;/span&gt;
            &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; len(sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;:
                chrom &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; chrom&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;],sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;])
        &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; char &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; line:
                &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; state &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;and&lt;/span&gt; char &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;N&amp;#34;&lt;/span&gt;:
                    state &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
                    start &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; n
                &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; state &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;and&lt;/span&gt; char &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;N&amp;#34;&lt;/span&gt;:
                    state &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
                    end &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; n
                    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join([chrom, str(start), str(end)]))
                &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
                    &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;

                n &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;# First base is 0 in bed format.&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Print mask close if on the last chromosome.&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; state &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:
    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join([chrom, str(start), str(n)]))
    start, end, state &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Generate fasta sequence lengths</title>
      <link>https://www.danielecook.com/generate-fasta-sequence-lengths/</link>
      <pubDate>Wed, 13 Aug 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/generate-fasta-sequence-lengths/</guid>
      <description>&lt;p&gt;&lt;strong&gt;This one-liner:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;cat file.fa | awk &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;$0 ~ &amp;#34;&amp;gt;&amp;#34; {if (NR &amp;gt; 1) {print c;} c=0; printf &amp;#34;%s\t&amp;#34;, substr($0,2,100); } $0 !~ &amp;#34;&amp;gt;&amp;#34; {c+=length($0);} END { print c; }&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Takes a fasta file as input:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&amp;gt;EF491733
tcagattcaaacaccgacgacgatgacgtggcaaagtctcgacgtgtgcg
caattcgtgtatgtgtccagcaggacctcccggagaacgcggaccagtag
gaccaccaggtctacggggatcgccaggatggcct
&amp;gt;EF491734
tcacagggaatgaaggcactgttcgacttgatcgctttgagaccaagacc
cgtggcaattctcggagggcaatgcactgaagtgaacgagccaatagcga
tggcgctcaagtattggcaaatcgtgcaattatcctatgcggagacacat
gccaa
&amp;gt;EF491735
gtcttgcatgacccaaaaggctcctgctcttctgtttcttcttccaatac
atccttctaaccagttggaagggttgacgtatcaagacttcctgcatcaa
aacttcttgaatttgccttcatttgtcgcaattgtgcagc
&amp;gt;EF491736
taaatggaaggaatcacttggcgctgaagaatttgctctccgcacagctt
aatcagactggaactccaatggttaatccaatgatggctttacaacaaca
agcggccgcagtaaacctgattcccaacacaccaatttacccaccc
&amp;gt;EF491737
actctcgcaatcgtctctccccaaatgatgttaacatcactagaaatgac
aaccgaacatatagcccagtcactcctcgtatcacaacaagtgagcggac
agtaacaccggaacagcggtcgccgggtcgaaaagcgttcgaaaccattc
&amp;gt;EF491738
tccctcgttcattcacaacaaaggaaaagcaaactatgggccattcattg
ttgaaattatgaactatcatcagtattctgcaatgacaagtcatatggtc
aaagtaatgaaacggccccaccaggttccgccaatgaaggtcgaccctga
gg
&amp;gt;EF491739
tccttccaactgttgccaactttccaactacaagacacactgaaccagaa
actacgcggagacctctgtcgccttcaaaaatgacaccttctcttccttc
tcctaccaccaccactttgcctgttttctttttgtcacaaatcactgacg
gcgatgaatcagaagatgaa
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Outputs sequence name and length:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;EF491733    135
EF491734    155
EF491735    140
EF491736    146
EF491737    150
EF491738    152
EF491739    170
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I made this today when I needed a way to generate sequence lengths required for some ChIP-Seq analysis.&lt;/p&gt;
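&lt;p&gt;If the awk is hard to read, the same bookkeeping looks roughly like this in Python. This is only a sketch of the logic, with a made-up function name (&lt;code&gt;fasta_lengths&lt;/code&gt;) and a tiny made-up input. Unlike the awk version, it keeps the whole header line rather than at most 100 characters of it.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of fasta lines."""
    name, length = None, 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        elif name is not None:
            length += len(line)
    # Emit the final record once the input is exhausted.
    if name is not None:
        yield name, length


# Hypothetical two-record input; pass an open file handle for real data.
example = [">seq1", "acgt", "ac", ">seq2", "ggg"]
for name, length in fasta_lengths(example):
    print(name, length, sep="\t")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;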
</description>
    </item>
    
    <item>
      <title>Visualizing Pairwise Queries in R</title>
      <link>https://www.danielecook.com/visualizing-pairwise-queries-in-r/</link>
      <pubDate>Sat, 02 Aug 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/visualizing-pairwise-queries-in-r/</guid>
      <description>
&lt;p&gt;You can look for interesting associations between sets of search terms on PubMed by comparing how often two terms co-occur. The code below returns the number of publications where both terms are mentioned, acting as a rough estimate for how associated they are (at least, in the scholarly world).&lt;/p&gt;
&lt;p&gt;In the example below, I show the results from organisms x diseases/disease-associated terms, which gives an imperfect estimate of how much each disease is studied in a given organism. Of course, this should all be taken with a (big) grain of salt because these organisms and diseases have many synonyms or related terms (e.g. &lt;em&gt;M. Musculus&lt;/em&gt; is often referred to as Mouse in the literature). Additionally, the result count is based on whether or not the terms were found together within the title and abstract of the literature only, and in many cases not the body of the text.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# install.packages(c(&amp;#34;RISmed&amp;#34;, &amp;#34;ggplot2&amp;#34;))&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(RISmed)
&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(ggplot2)

&lt;span style=&#34;color:#75715e&#34;&gt;# Given two lists of terms, let&amp;#39;s see how &amp;#39;hot&amp;#39; they are together&lt;/span&gt;
set1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ebola&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;autoimmune&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Diabetes&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Glioblastoma&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Asthma&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Schizophrenia&amp;#34;&lt;/span&gt;)
set2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;C. elegans&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;D. Melanogaster&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;M. Musculus&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;S. Cerevisiae&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#75715e&#34;&gt;# Generate all possible pairs&lt;/span&gt;
pairs &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;expand.grid&lt;/span&gt;(set1, set2, stringsAsFactors&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;F)

&lt;span style=&#34;color:#75715e&#34;&gt;# Search pubmed for each pair, and return the number of search results.&lt;/span&gt;
results &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;lapply&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;nrow&lt;/span&gt;(pairs),  &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(x) {
  query &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sprintf&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;%s %s&amp;#34;&lt;/span&gt;, pairs[x,]&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;Var1, pairs[x,]&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;Var2)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;print&lt;/span&gt;(query)
  result &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;EUtilsSummary&lt;/span&gt;(query, type&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;esearch&amp;#39;&lt;/span&gt;, db&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;pubmed&amp;#39;&lt;/span&gt;)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(q1&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;pairs[x,]&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;Var1, q2&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;pairs[x,]&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;Var2, count&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;QueryCount&lt;/span&gt;(result))
})

&lt;span style=&#34;color:#75715e&#34;&gt;# Do some data formatting on the results.&lt;/span&gt;
results &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;as.data.frame&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;do.call&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;rbind&amp;#34;&lt;/span&gt;, results), stringsAsFactors&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;F)
&lt;span style=&#34;color:#75715e&#34;&gt;# Turn the number of search results into numeric form.&lt;/span&gt;
results&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;count &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;as.numeric&lt;/span&gt;(results&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;count)

&lt;span style=&#34;color:#75715e&#34;&gt;# Plot the results using geom_tile&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;ggplot&lt;/span&gt;(results) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;
  &lt;span style=&#34;color:#a6e22e&#34;&gt;geom_tile&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;aes&lt;/span&gt;(x&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;q1, y&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;q2, fill&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;count)) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;
  &lt;span style=&#34;color:#a6e22e&#34;&gt;geom_text&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;aes&lt;/span&gt;(x&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;q1, y&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;q2, label&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;count), color &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;white&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; 
  &lt;span style=&#34;color:#a6e22e&#34;&gt;labs&lt;/span&gt;(title&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Disease Publications by Organism&amp;#34;&lt;/span&gt;, x&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;x&amp;#34;&lt;/span&gt;, y&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;y&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#a6e22e&#34;&gt;ggsave&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;~/Desktop/pairwise_search.png&amp;#34;&lt;/span&gt;, width &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt;, height &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>A Short tour around Lake Michigan</title>
      <link>https://www.danielecook.com/a-short-tour-around-lake-michigan/</link>
      <pubDate>Sat, 12 Jul 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/a-short-tour-around-lake-michigan/</guid>
      <description>&lt;p&gt;I’ve given bicycle touring a try. Originally I wanted to bike around Lake Michigan, but it turns out to be over &lt;a href=&#34;http://en.wikipedia.org/wiki/Lake_Michigan&#34;&gt;1,400 miles&lt;/a&gt;. So I compromised on a three-day trip around a good chunk of it, making use of the ferry from Muskegon, MI to Milwaukee, WI. This was my first time – so I also decided to stay in hotels. Next time I intend to camp. I learned a few valuable lessons along the way!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pack less stuff!&lt;/strong&gt; – I had way too much. In fact, I wound up breaking two spokes on the second day.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shorten the days&lt;/strong&gt; – Having never gone more than 40 miles in a single day, I decided to go 108 on the first day. Yeah. I probably should have gone more like 60-70 each day. By the time I got to my destination each day I was too tired to do anything. Part of the experience is seeing new places.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Get a proper touring bike&lt;/strong&gt; – I didn’t use a touring bike because I don’t have one (yet). I used a &lt;a href=&#34;http://www.trekbikes.com/us/en/bikes/town/fitness/fx/7_2_fx_wsd_2014/#&#34;&gt;Trek 7.2&lt;/a&gt;. My wrists hurt a lot for parts of the trip. Next time I’ll get a proper touring bike with the appropriate handlebars.&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>How to plot all of your Runkeeper Data</title>
      <link>https://www.danielecook.com/how-to-plot-all-of-your-runkeeper-data/</link>
      <pubDate>Fri, 30 May 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/how-to-plot-all-of-your-runkeeper-data/</guid>
      <description>
&lt;p&gt;If you use &lt;a href=&#34;http://www.runkeeper.com&#34;&gt;runkeeper&lt;/a&gt; and pay for a yearly subscription (runkeeper elite), you can export your data and plot all of your activities simultaneously using &lt;a href=&#34;http://www.r-project.org/&#34;&gt;&lt;strong&gt;R&lt;/strong&gt;&lt;/a&gt;. I’ve written a script for doing so (special thanks to &lt;a href=&#34;http://www.flowingdata.com&#34;&gt;flowing data&lt;/a&gt;, which has a tutorial that helped with a few key parts of this).&lt;/p&gt;
&lt;p&gt;The script does a few unique things.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Runkeeper exports data in &lt;a href=&#34;http://www.topografix.com/gpx.asp&#34;&gt;gpx format&lt;/a&gt;. If you ever pause an activity within runkeeper or you lose GPS reception briefly, the GPS path will get split into multiple paths within the same file. The script will retain all paths and plot them separately.&lt;/li&gt;
&lt;li&gt;This script will merge in the type of activities so you can plot different types of activities by color.&lt;/li&gt;
&lt;li&gt;Finally, cluster analysis is used to segregate different locations when plotting. If you are like me and have moved around a bit – this is necessary as plotting distant locations on the same map (e.g. Chicago and Boston) is not feasible and results in distant locations being plotted as single points.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;directions&#34;&gt;Directions&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;Export your runkeeper data. The option is available for subscribers only under the settings menu.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href=&#34;https://www.danielecook.com/Screen-Shot-2014-05-27-at-8.09.00-PM.png&#34;&gt;&lt;!-- raw HTML omitted --&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Place the script below within a folder containing your runkeeper data. Set the &lt;code&gt;num_locations&lt;/code&gt; variable to the number of places you have lived/run. This is used as the minimum number of distinct running locations when clustering.&lt;/li&gt;
&lt;li&gt;Install the necessary R packages. You can run the following code within R to do so.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;install.packages&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;plotKML&amp;#34;&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;install.packages&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fpc&amp;#34;&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;install.packages&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;plyr&amp;#34;&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;install.packages&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;dplyr&amp;#34;&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;install.packages&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mapproj&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;Run the script below from within RStudio, or on Unix-based machines using &lt;code&gt;Rscript plot_runkeeper.R&lt;/code&gt;. If you are using RStudio, be sure to set the working directory using &lt;code&gt;setwd()&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Special thanks for insights from flowingdata.com regarding this.&lt;/span&gt;

&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(plotKML)
&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(plyr)
&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(dplyr)
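&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(fpc)
&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(mapproj) &lt;span style=&#34;color:#75715e&#34;&gt;# Provides mapproject(), used when plotting&lt;/span&gt;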

num_locations &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Usage: Place this script in the directory containing your runkeeper data. You can run from terminal using &amp;#39;Rscript plot_runkeeper.R&amp;#39;, or&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# set your working directory to the location and run within RStudio (use setwd(&amp;#34;~/location/of/runkeeper/data&amp;#34;)).&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# See below on how to set the number of clusters.&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# GPX files downloaded from Runkeeper&lt;/span&gt;
files &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;dir&lt;/span&gt;(pattern &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\\.gpx&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#75715e&#34;&gt;# Generate vectors for data frame&lt;/span&gt;
index &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;()
latitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;()
longitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;()
file &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;()

counter &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;# Route counter (named so it does not shadow c())&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# Read each GPX file and collect its track points&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(f in &lt;span style=&#34;color:#a6e22e&#34;&gt;seq_along&lt;/span&gt;(files)) {
  curr_route &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;readGPX&lt;/span&gt;(files[f])
  &lt;span style=&#34;color:#75715e&#34;&gt;# Treat interrupted GPS paths as separate routes (useful if you occasionally stop running, walk for a bit, and start again like I do).&lt;/span&gt;
  &lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(i in curr_route&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;tracks[[1]]) {
    counter &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; counter &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
    location &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; i
    file &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(file, &lt;span style=&#34;color:#a6e22e&#34;&gt;rep&lt;/span&gt;(files[f], &lt;span style=&#34;color:#a6e22e&#34;&gt;dim&lt;/span&gt;(location)[1]))
    index &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(index, &lt;span style=&#34;color:#a6e22e&#34;&gt;rep&lt;/span&gt;(counter, &lt;span style=&#34;color:#a6e22e&#34;&gt;dim&lt;/span&gt;(location)[1]))
    latitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(latitude, location&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;lat)
    longitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(longitude, location&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;lon)
  }
}
routes &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;data.frame&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;cbind&lt;/span&gt;(index, latitude, longitude,file))

&lt;span style=&#34;color:#75715e&#34;&gt;# Because the routes dataframe takes a while to generate for some folks - save it!&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;save&lt;/span&gt;(routes, file&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;routes.Rdata&amp;#34;&lt;/span&gt;)
&lt;span style=&#34;color:#75715e&#34;&gt;# Use to load as needed.&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;load&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;routes.Rdata&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#75715e&#34;&gt;# Fix data types&lt;/span&gt;
routes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;file &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;as.character&lt;/span&gt;(routes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;file)
routes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;latitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;as.numeric&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;as.character&lt;/span&gt;(routes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;latitude))
routes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;longitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;as.numeric&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;as.character&lt;/span&gt;(routes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;longitude))
routes &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;transform&lt;/span&gt;(routes, index &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;as.numeric&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;as.character&lt;/span&gt;(index)))

&lt;span style=&#34;color:#75715e&#34;&gt;# Load Meta Data&lt;/span&gt;
meta_data &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;read.csv&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;cardioActivities.csv&amp;#34;&lt;/span&gt;, stringsAsFactors&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FALSE&lt;/span&gt;)
meta_data &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; plyr&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;rename&lt;/span&gt;(meta_data, &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;GPX.File&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;file&amp;#34;&lt;/span&gt;))

&lt;span style=&#34;color:#75715e&#34;&gt;# Bind routes&lt;/span&gt;
routes &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;left_join&lt;/span&gt;(routes, meta_data, by&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;file&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;%&amp;gt;%&lt;/span&gt;
  &lt;span style=&#34;color:#a6e22e&#34;&gt;arrange&lt;/span&gt;(index)


&lt;span style=&#34;color:#75715e&#34;&gt;# Use this function to specify the activity color if you have multiple activities.&lt;/span&gt;
activity_color &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(activity) {
  &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(activity&lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Cycling&amp;#34;&lt;/span&gt;) {
    color &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;#00000060&amp;#34;&lt;/span&gt;
  } else &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(activity&lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Hiking&amp;#34;&lt;/span&gt;) {
    color &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;#00000060&amp;#34;&lt;/span&gt;
  } else {
    color &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;#0080ff60&amp;#34;&lt;/span&gt;
  }
  color
}

&lt;span style=&#34;color:#75715e&#34;&gt;# Identify clusters of points, which will correspond to locations you have run. For example,&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# I have run in Boston, Iowa City, Chicago, and a few other cities. You will want to set the minimum krange&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# to the number of cities you have run in (5 in my case).&lt;/span&gt;
clusters &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pamk&lt;/span&gt;(routes[,&lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;latitude&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;longitude&amp;#34;&lt;/span&gt;)], krange&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;num_locations&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;20&lt;/span&gt;, diss&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;T, usepam&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;F)&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;pamobject&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;medoids

&lt;span style=&#34;color:#75715e&#34;&gt;# Plot Everything&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(r in &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;max&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;row&lt;/span&gt;(clusters))) {
  &lt;span style=&#34;color:#a6e22e&#34;&gt;print&lt;/span&gt;(r)
  lat_range &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; clusters[r,][1] &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rnorm&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;20&lt;/span&gt;, sd&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0.1&lt;/span&gt;)
  lon_range &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;clusters[r,][2] &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rnorm&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;20&lt;/span&gt;, sd&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0.1&lt;/span&gt;)
  setroutes &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;filter&lt;/span&gt;(routes, (latitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;min&lt;/span&gt;(lat_range) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&lt;/span&gt; latitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;max&lt;/span&gt;(lat_range)),
                      longitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;min&lt;/span&gt;(lon_range) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&lt;/span&gt;  longitude &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;max&lt;/span&gt;(lon_range))
  
  routeIds &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;unique&lt;/span&gt;(setroutes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;index)
  
  &lt;span style=&#34;color:#75715e&#34;&gt;# Rectangular (equirectangular) projection, true scale at latitude 38&lt;/span&gt;
  locProj &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mapproject&lt;/span&gt;(setroutes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;longitude, setroutes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;latitude, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;rectangular&amp;#34;&lt;/span&gt;, par&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;38&lt;/span&gt;)
  setroutes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;latproj &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; locProj&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;x
  setroutes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;lonproj &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; locProj&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;y
  
  
  &lt;span style=&#34;color:#75715e&#34;&gt;# Map the projected points&lt;/span&gt;
  &lt;span style=&#34;color:#a6e22e&#34;&gt;pdf&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;sprintf&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;%s-all.pdf&amp;#34;&lt;/span&gt;, r))
  
  &lt;span style=&#34;color:#a6e22e&#34;&gt;plot&lt;/span&gt;(setroutes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;latproj, setroutes&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;lonproj, type&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;n&amp;#34;&lt;/span&gt;, asp&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, axes&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FALSE&lt;/span&gt;, xlab&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;, ylab&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(i in routeIds) {
    currRoute &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;subset&lt;/span&gt;(setroutes, index&lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt;i)
    &lt;span style=&#34;color:#75715e&#34;&gt;# All points on a route share one activity type, so pass a single value.&lt;/span&gt;
    &lt;span style=&#34;color:#a6e22e&#34;&gt;lines&lt;/span&gt;(currRoute&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;latproj, currRoute&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;lonproj, col&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;activity_color&lt;/span&gt;(currRoute&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;Type[1]), lwd&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0.4&lt;/span&gt;)
  }
  &lt;span style=&#34;color:#a6e22e&#34;&gt;dev.off&lt;/span&gt;()
}

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Where I Run and Bike in Chicago</title>
      <link>https://www.danielecook.com/where-i-run-and-bike-in-chicago/</link>
      <pubDate>Sun, 25 May 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/where-i-run-and-bike-in-chicago/</guid>
      <description>&lt;h3 id=&#34;original-post-2014-05-25&#34;&gt;Original Post (2014-05-25)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/2-all.png&#34; alt=&#34;Where I run and bike in Chicago&#34;&gt;&lt;/p&gt;
&lt;p&gt;Using &lt;a href=&#34;http://www.runkeeper.com&#34;&gt;runkeeper&lt;/a&gt; and with the help of a tutorial at &lt;a href=&#34;http://www.flowingdata.com&#34;&gt;flowing data&lt;/a&gt;, I was able to plot all of the running and biking I’ve been doing in Chicago since moving here two years ago. The &lt;strong&gt;blue&lt;/strong&gt; is running and the &lt;strong&gt;black&lt;/strong&gt; is biking.&lt;/p&gt;
&lt;h3 id=&#34;update-strava-2019-06-21&#34;&gt;Update: Strava (2019-06-21)&lt;/h3&gt;
&lt;p&gt;I have since migrated my data to Strava - which I like a lot better than runkeeper. You can migrate your data using the &lt;a href=&#34;https://tapiriik.com/&#34;&gt;tapiriik&lt;/a&gt; service.&lt;/p&gt;
&lt;p&gt;Strava has a really cool global heatmap - or you can visualize your own with their pro subscription.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/ragbrai.png&#34; alt=&#34;Ragbrai as seen by strava&#34;&gt;&lt;/p&gt;
&lt;p&gt;The global heatmap reveals the locations of &lt;a href=&#34;https://ragbrai.com/&#34;&gt;Ragbrai&lt;/a&gt; across the state of Iowa for the past couple of years.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Double Checking FASTQs</title>
      <link>https://www.danielecook.com/double-checking-fastqs/</link>
      <pubDate>Sat, 24 May 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/double-checking-fastqs/</guid>
      <description>&lt;p&gt;When you have performed a sequencing project, quality control is one of the first things you will need to do. Unfortunately, &lt;a href=&#34;http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0041815&#34;&gt;sample mix-ups&lt;/a&gt; and other issues can and do happen. Systematic biases can also occur by machine and lane.&lt;/p&gt;
&lt;p&gt;This script extracts basic information from a set of &lt;a href=&#34;http://en.wikipedia.org/wiki/FASTQ_format&#34;&gt;FASTQs&lt;/a&gt; and writes it to a summary file (&lt;code&gt;fastq_summary.txt&lt;/code&gt;). It works with demultiplexed FASTQs generated by Illumina machines whose headers appear in the following format:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1&lt;/code&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;@HWI-EAS209_0006_FC706VJ&lt;/strong&gt; – Machine name&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5&lt;/strong&gt; – lane&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;58&lt;/strong&gt; – tile within flowcell lane&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5894&lt;/strong&gt; – x coordinate of cluster within tile&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;21141&lt;/strong&gt; – y coordinate of cluster within tile&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;#ATCACG&lt;/strong&gt; – index&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;/1&lt;/strong&gt; – member of pair (/1 or /2)&lt;/li&gt;
&lt;/ul&gt;
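&lt;p&gt;As a rough illustration of how the fields listed above line up with the header, here is a minimal sketch that splits a single header line with a regular expression. The &lt;code&gt;parse_header&lt;/code&gt; helper and its field names are hypothetical and are not part of the script in this post; the sketch also assumes the older Illumina header layout shown above.&lt;/p&gt;

```python
import re

# Hypothetical helper (not part of the original script): split an
# Illumina-style header into the seven fields described in the list above.
# Assumed layout: @machine:lane:tile:x:y#index/pair
HEADER_RE = re.compile(r"^@([^:]+):(\d+):(\d+):(\d+):(\d+)#([^/]+)/([12])$")

# Illustrative field names, in the same order as the regex groups.
FIELDS = ("machine", "lane", "tile", "x_coord", "y_coord", "index", "pair")


def parse_header(line):
    """Return a dict of header fields, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


example = "@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1"
parsed = parse_header(example)
print(parsed["machine"], parsed["lane"], parsed["index"], parsed["pair"])
```

&lt;p&gt;Lines that do not match the pattern (for example, quality lines that happen to start with &lt;code&gt;@&lt;/code&gt;) come back as &lt;code&gt;None&lt;/code&gt;, so they can simply be skipped.&lt;/p&gt;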
&lt;p&gt;The script below will extract the machine name, lane, index, and pair.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/python&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; re
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; itertools &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; groupby &lt;span style=&#34;color:#66d9ef&#34;&gt;as&lt;/span&gt; g
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; subprocess
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sys
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; collections &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; OrderedDict

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;most_common&lt;/span&gt;(L):
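  &lt;span style=&#34;color:#75715e&#34;&gt;# Indexing the (key, group) pair works on both Python 2 and Python 3.&lt;/span&gt;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; max(g(sorted(L)), key&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;lambda&lt;/span&gt; xv: (len(list(xv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;])), &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt;L&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;index(xv[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;])))[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]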

&lt;span style=&#34;color:#75715e&#34;&gt;# Set this variable to ensure no quality score lines get examined.&lt;/span&gt;
fq_at_start &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;HWI&amp;#34;&lt;/span&gt;

r &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; subprocess&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;check_output(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;for r in `ls *fastq.gz`; 
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;do
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;echo &amp;#34;$r&amp;#34; 
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;gunzip -cq $r | head -n 1000 | grep &amp;#39;^@&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; - | grep -v &amp;#39;^@@&amp;#39; |  egrep &amp;#39;(:.+){4}&amp;#39; -
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;echo &amp;#34;|&amp;#34; 
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;done;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;%&lt;/span&gt; fq_at_start, shell&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;True)


f &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fastq_summary.txt&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;w&amp;#34;&lt;/span&gt;)
 

orig_line &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;  OrderedDict([(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;file&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;instrument&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_id&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_lane&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;x_coord&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;y_coord&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;index&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;pair&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;run_id&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;filtered&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
      (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;control_bits&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)])
l_keys &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; orig_line&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;keys()

f&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;write(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join(l_keys) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;)
 
&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; fq_group &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; [filter(len,x&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;)) &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; r&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;|&amp;#34;&lt;/span&gt;)][:&lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]:
  index_set &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; []
  &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; heading &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; fq_group[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:]:
    l &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;(\:|#| )&amp;#39;&lt;/span&gt;,heading)
    line &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; orig_line
    line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;file&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fq_group[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;startswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@SRR&amp;#34;&lt;/span&gt;):
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;run_id&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;instrument&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_lane&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;]
      index_set&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; len(l) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;11&lt;/span&gt;:
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;instrument&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_lane&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_tile&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;x_coord&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;y_coord&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt;]
      &lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
        line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;pair&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;/&amp;#34;&lt;/span&gt;)[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]
        index_set&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append(l[&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;/&amp;#34;&lt;/span&gt;)[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;])
      &lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;break&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; len(l) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;21&lt;/span&gt;:
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;instrument&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;run_id&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_id&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_lane&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;flowcell_tile&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;x_coord&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;y_coord&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;12&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;pair&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;14&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;filtered&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;16&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;control_bits&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;18&lt;/span&gt;]
      line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;index&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; l[&lt;span style=&#34;color:#ae81ff&#34;&gt;20&lt;/span&gt;]
      index_set&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append(l[&lt;span style=&#34;color:#ae81ff&#34;&gt;20&lt;/span&gt;])
    &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
      &lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;error&amp;#34;&lt;/span&gt;, l
  line[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;index&amp;#34;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; most_common(index_set)
  f&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;write(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\t&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join([line[x] &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; l_keys]&lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; [&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;]))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Use Google Sheets to identify gene-disease associations in Pubmed</title>
      <link>https://www.danielecook.com/use-google-sheets-to-identify-gene-disease-associations-in-pubmed/</link>
      <pubDate>Mon, 03 Mar 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/use-google-sheets-to-identify-gene-disease-associations-in-pubmed/</guid>
      <description>&lt;p&gt;Google Sheets can import XML with the &lt;code&gt;importXML&lt;/code&gt; function. Using NCBI&amp;rsquo;s &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/books/NBK25499/&#34;&gt;esearch service&lt;/a&gt;, you can query PubMed and get the number of matching articles for each gene in a list. Put the following formula in &lt;strong&gt;A2&lt;/strong&gt; and a keyword in &lt;strong&gt;B2&lt;/strong&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;=importXML(&amp;#34;http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&amp;amp;term=&amp;#34; &amp;amp; B2 ,&amp;#34;(//Count)[1]&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This becomes more valuable when you have a gene list: you can query PubMed for each gene combined with a second keyword, such as a disease.&lt;/p&gt;
&lt;p&gt;For example, suppose you are studying cleft lip and palate and are left with a set of genes identified from a gene expression analysis. Now you want to see whether any of those genes have published findings related to cleft lip and palate.&lt;/p&gt;
&lt;p&gt;You can use the &lt;code&gt;&amp;amp;&lt;/code&gt; operator to concatenate two keywords (&lt;code&gt;gene &amp;amp; &amp;quot; &amp;quot; &amp;amp; disease&lt;/code&gt;). In &lt;strong&gt;B2&lt;/strong&gt; below you would put the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;= C2 &amp;amp; &amp;#34; &amp;#34; &amp;amp; D2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The result would look something like this:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.danielecook.com/Screen-Shot-2014-03-03-at-8.38.03-AM.png&#34;&gt;&lt;!-- raw HTML omitted --&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PubMed result counts are in Column A&lt;/p&gt;
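&lt;p&gt;The same lookup can be sketched outside a spreadsheet. The Python below is a minimal sketch, not part of the original workflow: the function names are hypothetical, it assumes the HTTPS form of the esearch endpoint, and it reads the first top-level &lt;code&gt;Count&lt;/code&gt; element, which is what the &lt;code&gt;(//Count)[1]&lt;/code&gt; XPath above selects.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the HTTPS form of the esearch endpoint used in the formula above.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query_url(gene, disease):
    """Return an esearch URL for a gene combined with a disease keyword."""
    params = {"db": "pubmed", "term": gene + " " + disease}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_count(xml_text):
    """Extract the top-level Count element from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def pubmed_count(gene, disease):
    """Fetch the number of PubMed hits for one gene and disease pair."""
    with urlopen(build_query_url(gene, disease)) as response:
        return parse_count(response.read())
```

&lt;p&gt;Calling &lt;code&gt;pubmed_count&lt;/code&gt; with a gene and a disease would mirror one row of the sheet. It needs network access, and NCBI rate-limits requests, so pause between calls when working through a long gene list.&lt;/p&gt;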
</description>
    </item>
    
    <item>
      <title>An R Function for Opening a dataframe in Excel (Mac Only)</title>
      <link>https://www.danielecook.com/an-r-function-for-opening-a-dataframe-in-excel-mac-only/</link>
      <pubDate>Tue, 18 Feb 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/an-r-function-for-opening-a-dataframe-in-excel-mac-only/</guid>
      <description>&lt;p&gt;The dataframe viewer in &lt;a href=&#34;http://www.rstudio.com/&#34;&gt;RStudio&lt;/a&gt; can be slow or unresponsive, and at times it truncates cell contents or the number of columns shown for large datasets. I want to be able to see the full columns and to arrange and filter simultaneously. Although you can do this programmatically in R, it is sometimes easier and quicker to use Excel. The function below can be used to open a dataframe in Microsoft Excel.&lt;/p&gt;
&lt;p&gt;This may be worth sticking in your &lt;code&gt;.Rprofile&lt;/code&gt; so it is always available.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;excel &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(df) {
  f &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;paste0&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;tempdir&lt;/span&gt;(),
              &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;/&amp;#39;&lt;/span&gt;,
              &lt;span style=&#34;color:#a6e22e&#34;&gt;make.names&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;deparse&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;substitute&lt;/span&gt;(df))),
              &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;.&amp;#39;&lt;/span&gt;,
              &lt;span style=&#34;color:#a6e22e&#34;&gt;paste0&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;sample&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;letters&lt;/span&gt;)[1&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;],collapse&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
              &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;.csv&amp;#39;&lt;/span&gt;)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;write.csv&lt;/span&gt;(df,f)
  &lt;span style=&#34;color:#a6e22e&#34;&gt;system&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;sprintf&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;open -a &amp;#39;Microsoft Excel&amp;#39; %s&amp;#34;&lt;/span&gt;,f))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To use, just type:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;excel&lt;/span&gt;(dataframe)

&lt;span style=&#34;color:#75715e&#34;&gt;# Or pipe in using dplyr&lt;/span&gt;

df &lt;span style=&#34;color:#f92672&#34;&gt;%&amp;gt;%&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;excel&lt;/span&gt;()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Microsoft Excel will open with the dataframe that has been passed.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Alfred Workflow for Creating a Data Analysis Project</title>
      <link>https://www.danielecook.com/alfred-workflow-for-creating-a-data-analysis-project/</link>
      <pubDate>Sat, 25 Jan 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/alfred-workflow-for-creating-a-data-analysis-project/</guid>
      <description>&lt;p&gt;I got this idea from my brother: keep any data analysis or bioinformatics projects I work on organized by sticking to a standard template. I wrote an Alfred Workflow for generating the template. It has a few key features:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/Screen-Shot-2014-01-20-at-12.33.18-AM.png&#34; alt=&#34;Directory Structure&#34;&gt;&lt;/p&gt;
&lt;p&gt;Directory Structure&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Markdown (md) extension&lt;/strong&gt; – Used for the readme because it&amp;rsquo;s simple and so that the directory is ready for GitHub if desired.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Folder&lt;/strong&gt; – This directory is used for storing raw data and scripts that are used to clean and prepare data for analysis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;analysis&lt;/strong&gt; – This directory contains the scripts for producing statistics and visualizing data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;report&lt;/strong&gt; – Any publications or presentations that come out of the project can be stored in the report folder.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;run.sh&lt;/strong&gt; – A two-line script that runs prepare_data.sh and analysis.sh. This allows you to &lt;a href=&#34;http://phys.org/news/2013-09-science-crisis.html&#34;&gt;reproduce&lt;/a&gt; the entirety of your work all at once and verify your results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What are your thoughts? How could this be improved?&lt;/p&gt;
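&lt;p&gt;For readers without Alfred, the template can be approximated in a few lines of Python. This is a sketch under assumptions, not the workflow itself: the function name is hypothetical, and placing prepare_data.sh under &lt;code&gt;data&lt;/code&gt; and analysis.sh under &lt;code&gt;analysis&lt;/code&gt; is a guess based on the descriptions above.&lt;/p&gt;

```python
from pathlib import Path

# Assumed layout: the post names these folders, but their exact
# arrangement inside the project is a guess.
DIRECTORIES = ["data", "analysis", "report"]


def create_project(parent, name):
    """Create the project skeleton under parent/name and return its path."""
    root = Path(parent) / name
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    # Markdown readme so the directory is ready for GitHub.
    (root / "README.md").write_text("# " + name + "\n")
    # Placeholder scripts; their locations are an assumption.
    (root / "data" / "prepare_data.sh").write_text("#!/bin/bash\n")
    (root / "analysis" / "analysis.sh").write_text("#!/bin/bash\n")
    # run.sh is the two-line script that reruns the whole project.
    (root / "run.sh").write_text(
        "bash data/prepare_data.sh\nbash analysis/analysis.sh\n"
    )
    return root
```

&lt;p&gt;Keeping &lt;code&gt;run.sh&lt;/code&gt; at the project root means one command reruns both the data preparation and the analysis.&lt;/p&gt;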
&lt;h2 id=&#34;usage&#34;&gt;Usage&lt;/h2&gt;
&lt;p&gt;Navigate to the directory where you would like to create the project template, then open Alfred and type:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;project &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;a name &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; your project&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id=&#34;downloadcreateprojectalfredworkflow&#34;&gt;&lt;a href=&#34;https://www.danielecook.com/createproject.alfredworkflow&#34;&gt;Download&lt;/a&gt;&lt;/h4&gt;
</description>
    </item>
    
    <item>
      <title>Downloading and storing bioinformatic databases locally</title>
      <link>https://www.danielecook.com/downloading-and-storing-bioinformatic-databases-locally/</link>
      <pubDate>Mon, 20 Jan 2014 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/downloading-and-storing-bioinformatic-databases-locally/</guid>
      <description>&lt;p&gt;If you need to annotate biological data, there are plenty of resources online (UCSC Genome Browser, BioMart) and plenty of programmatic tools for interacting with these databases. But if you are going to annotate a large dataset (like ChIP-Seq or RNA-Seq data), you will probably not want to rely on web-based services because (a) it is inefficient and (b) you may get throttled or banned.&lt;/p&gt;
&lt;p&gt;If you use &lt;strong&gt;python&lt;/strong&gt;, it is easy to download and store data in an SQLite database. This allows you to query the database using SQL and annotate large datasets quickly and efficiently.&lt;/p&gt;
&lt;p&gt;That is what I have done here for HapMap allele frequency data (&lt;a href=&#34;http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/&#34;&gt;2010-08_phaseII+III&lt;/a&gt;). It allows me to retrieve allele frequency data from 26,278,275 rows across 11 populations instantly. The database itself is 3.22 GB. A zipped version (~1 GB) is available &lt;a href=&#34;https://drive.google.com/file/d/0B_6qjHtu65BDdmFBeXdGeEc2STQ/edit?usp=sharing&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
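&lt;p&gt;Once &lt;code&gt;hapmap.db&lt;/code&gt; exists, a lookup is a single SQL query. The following is a minimal sketch using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module. It assumes the &lt;code&gt;freq&lt;/code&gt; table and column names created by the script in this post; the function name is hypothetical, and the form of the value stored in &lt;code&gt;rs&lt;/code&gt; (a bare number or an rs-prefixed ID) depends on how the source files were loaded.&lt;/p&gt;

```python
import sqlite3


def allele_frequencies(db_path, rs_value):
    """Return (population, refallele, refallele_freq) rows for one SNP."""
    # Assumption: table and column names match the freq table in this post.
    query = (
        "SELECT population, refallele, refallele_freq "
        "FROM freq WHERE rs = ? ORDER BY population"
    )
    conn = sqlite3.connect(db_path)
    try:
        # A parameterized query keeps the lookup value out of the SQL text.
        return conn.execute(query, (rs_value,)).fetchall()
    finally:
        conn.close()
```

&lt;p&gt;With an index on &lt;code&gt;rs&lt;/code&gt;, a lookup like this should not need to scan the full table.&lt;/p&gt;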
&lt;p&gt;&lt;a href=&#34;https://www.danielecook.com/Screen-Shot-2014-01-20-at-12.07.25-AM.png&#34;&gt;&lt;!-- raw HTML omitted --&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You will need sqlalchemy for this script to work. Install using &lt;code&gt;pip install sqlalchemy&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#! /usr/local/bin/Python&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sqlite3
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; os
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; glob
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; time
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sqlalchemy
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; sqlalchemy &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; Table, Column, Index, Integer, String, Float, MetaData, ForeignKey
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; sqlalchemy &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; create_engine
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; datetime

os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;chdir(os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;path&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;dirname(__file__))


os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;system(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;wget -nd -r  -A &amp;#34;allele*.gz&amp;#34; -e robots=off &amp;#34;http://hapmap.ncbi.nlm.nih.gov/downloads/frequencies/2010-08_phaseII+III/&amp;#34;&amp;#39;&lt;/span&gt;)
os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;system(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;gunzip *.gz # Unzip all the files&amp;#39;&lt;/span&gt;)

&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;path&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;isfile(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;hapmap.db&amp;#39;&lt;/span&gt;):
    os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;remove(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;hapmap.db&amp;#39;&lt;/span&gt;)

engine &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; create_engine(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;sqlite:///hapmap.db&amp;#39;&lt;/span&gt;)
conn &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; engine&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;connect()

metadata &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; MetaData()

freq &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; Table(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;freq&amp;#39;&lt;/span&gt;, metadata,
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;, Integer, primary_key&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;True),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;population&amp;#39;&lt;/span&gt;, String(&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;)),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;rs&amp;#39;&lt;/span&gt;, Integer),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;chrom&amp;#39;&lt;/span&gt;, String(&lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;)),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;pos&amp;#39;&lt;/span&gt;, Integer),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;refallele&amp;#39;&lt;/span&gt;,String(&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;)),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;refallele_freq&amp;#39;&lt;/span&gt;,Float),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;refallele_count&amp;#39;&lt;/span&gt;,Integer),
    &lt;span style=&#34;color:#75715e&#34;&gt;#&lt;/span&gt;
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;otherallele&amp;#39;&lt;/span&gt;,String(&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;)),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;otherallele_freq&amp;#39;&lt;/span&gt;,Float),
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;otherallele_count&amp;#39;&lt;/span&gt;,Integer),
    &lt;span style=&#34;color:#75715e&#34;&gt;#&lt;/span&gt;
    Column(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;totalcount&amp;#39;&lt;/span&gt;,Integer),
    sqlite_autoincrement&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;True,
)



metadata&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;create_all(engine)

&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; allele_file &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; glob&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;glob(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;allele*&amp;#34;&lt;/span&gt;):
    f &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(allele_file,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;r&amp;#39;&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt;(f)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt;(datetime&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;datetime&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;now())
    pop &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; allele_file[allele_file&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;find(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;_&amp;#39;&lt;/span&gt;,allele_file&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;find(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;chr&amp;#39;&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:allele_file&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;find(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;_&amp;#39;&lt;/span&gt;,allele_file&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;find(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;chr&amp;#39;&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;]
    h &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; f&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;readline()&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;#&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;)
    inserts &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; []
    c &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; f&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;readlines():
        k &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; dict(zip(h&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt;), line&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt;)))
        k[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;population&amp;#39;&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pop
        c &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
        inserts&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append(k)
        &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; c &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1000&lt;/span&gt;:
            conn&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;execute(freq&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;insert(),inserts)
            inserts &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; []
            c &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
    conn&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;execute(freq&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;insert(),inserts)

&lt;span style=&#34;color:#75715e&#34;&gt;# Add indices&lt;/span&gt;
Index(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;population&amp;#39;&lt;/span&gt;, freq&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;c&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;population)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;create(engine)
Index(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;rs&amp;#39;&lt;/span&gt;, freq&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;c&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;rs)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;create(engine)
Index(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;chrom&amp;#39;&lt;/span&gt;, freq&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;c&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;chrom)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;create(engine)
Index(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;pos&amp;#39;&lt;/span&gt;, freq&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;c&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;pos)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;create(engine)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Use Google to Find Lecture Notes</title>
      <link>https://www.danielecook.com/use-google-to-find-lecture-notes/</link>
      <pubDate>Sun, 10 Nov 2013 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/use-google-to-find-lecture-notes/</guid>
      <description>&lt;p&gt;This may seem obvious, but I&amp;rsquo;ve discovered a wonderful trick: if you ever need to review a science topic quickly or are trying to learn something new, try searching Google like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;topic&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; + Lecture filetype:pdf
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You’ll find that tons of professors post their lecture notes online. Also try using &lt;strong&gt;filetype:ppt&lt;/strong&gt; or leave filetype off (as some professors host websites with lecture notes).&lt;/p&gt;
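&lt;p&gt;If you run this kind of search often, the query can be scripted. The snippet below is only a minimal sketch: &lt;code&gt;lecture_search_url&lt;/code&gt; is a hypothetical helper name, and it assumes the standard &lt;code&gt;https://www.google.com/search?q=&lt;/code&gt; URL form.&lt;/p&gt;

```python
from urllib.parse import urlencode

def lecture_search_url(topic, filetype="pdf"):
    """Build a Google search URL for lecture notes on a topic (hypothetical helper)."""
    query = "{} Lecture".format(topic)
    if filetype:  # pass filetype=None to leave the filter off
        query += " filetype:{}".format(filetype)
    # urlencode escapes spaces and the colon so the query is URL-safe
    return "https://www.google.com/search?" + urlencode({"q": query})

url = lecture_search_url("enzyme kinetics")
print(url)
```

&lt;p&gt;Passing &lt;code&gt;filetype=&#34;ppt&#34;&lt;/code&gt; or &lt;code&gt;filetype=None&lt;/code&gt; covers the two variations mentioned above.&lt;/p&gt;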
</description>
    </item>
    
    <item>
      <title>Export excel worksheets as individual CSVs</title>
      <link>https://www.danielecook.com/export-excel-worksheets-as-individual-csvs/</link>
      <pubDate>Sat, 09 Nov 2013 22:45:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/export-excel-worksheets-as-individual-csvs/</guid>
      <description>&lt;h2 id=&#34;original-post-2013-11-09&#34;&gt;Original Post (2013-11-09)&lt;/h2&gt;
&lt;p&gt;If you need to work with data spread across a bunch of worksheets within an Excel workbook, but you don’t want to do so in Microsoft Excel, here is a Python script that extracts each individual worksheet as a CSV and exports them all to a folder.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; xlrd &lt;span style=&#34;color:#75715e&#34;&gt;# pip install xlrd (note: xlrd 2.0+ reads only .xls files, not .xlsx)&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; csv
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; os

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;export_workbook&lt;/span&gt;(filename):
  &lt;span style=&#34;color:#75715e&#34;&gt;# Open workbook for initial extraction&lt;/span&gt;
  workbook &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; xlrd&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;open_workbook(filename)
  filename &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;path&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;splitext(filename)[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;] &lt;span style=&#34;color:#75715e&#34;&gt;# Remove extension&lt;/span&gt;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;path&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;exists(filename):
      os&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;makedirs(filename)
  &lt;span style=&#34;color:#75715e&#34;&gt;# Iterate through each worksheet.&lt;/span&gt;
  &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; sheet &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; workbook&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sheet_names():
    worksheet &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; workbook&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sheet_by_name(sheet)
    &lt;span style=&#34;color:#75715e&#34;&gt;# Create a file for each sheet&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;with&lt;/span&gt; open(filename &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;/&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; str(sheet) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;.csv&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;w&amp;#39;&lt;/span&gt;, newline&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;as&lt;/span&gt; f:
      c &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; csv&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;writer(f)
      &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; r &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; range(worksheet&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;nrows):
        c&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;writerow(worksheet&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;row_values(r))
      print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Exported worksheet &amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; &lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;%12.2d&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; row&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;%&lt;/span&gt; (sheet, worksheet&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;nrows, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;s&amp;#34;&lt;/span&gt;[worksheet&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;nrows &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:]))

export_workbook(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;test.xlsx&amp;#39;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;update-xlsx2csv-2019-06-18&#34;&gt;Update: xlsx2csv (2019-06-18)&lt;/h2&gt;
&lt;p&gt;The original post here detailed a Python script for extracting worksheets from Excel files as plain text files. However, I later stumbled upon an easy-to-use command-line option called &lt;a href=&#34;https://github.com/dilshod/xlsx2csv&#34;&gt;xlsx2csv&lt;/a&gt;. &lt;code&gt;xlsx2csv&lt;/code&gt; is a Python module with a command line interface that can export worksheets in an Excel file as plain text CSV or TSV files.&lt;/p&gt;
&lt;h3 id=&#34;install-xlsx2csv&#34;&gt;Install xlsx2csv&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;pip install xlsx2csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;example-usage&#34;&gt;Example usage:&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;xlsx2csv -n &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sheet_name&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;         -d &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\t&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;         --sci-float file.xlsx &amp;gt; out.tsv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;options&#34;&gt;Options&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;usage: xlsx2csv &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-h&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-v&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-a&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-c OUTPUTENCODING&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-d DELIMITER&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;--hyperlinks&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-e&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-E EXCLUDE_SHEET_PATTERN &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;EXCLUDE_SHEET_PATTERN ...&lt;span style=&#34;color:#f92672&#34;&gt;]]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-f DATEFORMAT&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-t TIMEFORMAT&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;--floatformat FLOATFORMAT&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;--sci-float&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-I INCLUDE_SHEET_PATTERN &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;INCLUDE_SHEET_PATTERN ...&lt;span style=&#34;color:#f92672&#34;&gt;]]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;--ignore-formats IGNORE_FORMATS &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;IGNORE_FORMATS ...&lt;span style=&#34;color:#f92672&#34;&gt;]]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-l LINETERMINATOR&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-m&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-n SHEETNAME&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-i&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;--skipemptycolumns&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-p SHEETDELIMITER&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-q QUOTING&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
                &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;-s SHEETID&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
                xlsxfile &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;outfile&lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;
xlsx2csv: error: the following arguments are required: xlsxfile
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Fetch Data from UCSC Genome Browser</title>
      <link>https://www.danielecook.com/fetch-data-from-ucsc-genome-browser/</link>
      <pubDate>Sun, 03 Nov 2013 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/fetch-data-from-ucsc-genome-browser/</guid>
      <description>&lt;h2 id=&#34;original-post-2013-11-03&#34;&gt;Original Post (2013-11-03)&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://www.danielecook.com/accessing-the-ucsc-genome-browser-mysql-database/&#34;&gt;Previously&lt;/a&gt;, I’ve shown that you can use a MySQL database browser (e.g. &lt;a href=&#34;http://www.sequelpro.com/&#34;&gt;Sequel Pro&lt;/a&gt;) to access and browse the UCSC Genome Browser MySQL database.&lt;/p&gt;
&lt;p&gt;If you have a small dataset that you would like to annotate, you can write &lt;a href=&#34;http://www8.silversand.net/techdoc/teachsql/ch01.htm&#34;&gt;SQL&lt;/a&gt; statements to fetch data. Below I show how you can use Python to fetch genome coordinates by specifying a gene and a genome build.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Note: Requires mysqldb; install using:&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# pip install MySQL-python&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; MySQLdb.constants &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; FIELD_TYPE
&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; _mysql

db &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; None

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fetch_gene_coordinates&lt;/span&gt;(gene_name, build):
    &lt;span style=&#34;color:#66d9ef&#34;&gt;global&lt;/span&gt; db &lt;span style=&#34;color:#75715e&#34;&gt;# db is global to prevent reconnecting.&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; db &lt;span style=&#34;color:#f92672&#34;&gt;is&lt;/span&gt; None:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;connect&amp;#39;&lt;/span&gt;
        conv&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; { FIELD_TYPE&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;LONG: int }
        db &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; _mysql&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;connect(host&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;genome-mysql.cse.ucsc.edu&amp;#39;&lt;/span&gt;, user&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;genome&amp;#39;&lt;/span&gt;, passwd&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;, db&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;build,conv&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;conv)
    db&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;query(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;SELECT * FROM kgXref INNER JOIN knownGene ON kgXref.kgID=knownGene.name WHERE kgXref.geneSymbol = &amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;%&lt;/span&gt; gene_name)

    r &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; db&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;use_result()&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;fetch_row(how&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, maxrows&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt; r
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; len(r) &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; None &lt;span style=&#34;color:#75715e&#34;&gt;# No match, or more than one matching row&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; r[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;txStart&amp;#39;&lt;/span&gt;], r[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;txEnd&amp;#39;&lt;/span&gt;], r[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;chrom&amp;#39;&lt;/span&gt;],r[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;strand&amp;#39;&lt;/span&gt;]


&lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt; fetch_gene_coordinates(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;klf1&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;mm9&amp;#39;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The UCSC browser MySQL server will throttle you if you make too many queries. If you need to annotate large datasets, all of the data is freely available for download &lt;a href=&#34;http://hgdownload-test.cse.ucsc.edu/goldenPath/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
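&lt;p&gt;The script above builds its SQL with string formatting. As a rough sketch of the same lookup with a bound query parameter instead, the snippet below uses Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module as an in-memory stand-in for the UCSC server. The table and column names follow the query above, but the rows (&lt;code&gt;DemoGene&lt;/code&gt;, &lt;code&gt;chrDemo&lt;/code&gt;) are made-up placeholders, not real annotations.&lt;/p&gt;

```python
import sqlite3

# In-memory stand-in for a UCSC build database.
# The rows below are placeholders for illustration, not real UCSC data.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
conn.executescript("""
    CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT);
    CREATE TABLE knownGene (name TEXT, chrom TEXT, strand TEXT,
                            txStart INTEGER, txEnd INTEGER);
    INSERT INTO kgXref VALUES ('uc_demo.1', 'DemoGene');
    INSERT INTO knownGene VALUES ('uc_demo.1', 'chrDemo', '+', 100, 250);
""")

def fetch_gene_coordinates(conn, gene_name):
    """Return (txStart, txEnd, chrom, strand), or None unless exactly one row matches."""
    rows = conn.execute(
        "SELECT * FROM kgXref INNER JOIN knownGene "
        "ON kgXref.kgID = knownGene.name "
        "WHERE kgXref.geneSymbol = ?",
        (gene_name,),  # bound parameter instead of % string formatting
    ).fetchall()
    if len(rows) != 1:
        return None
    row = rows[0]
    return row["txStart"], row["txEnd"], row["chrom"], row["strand"]

coords = fetch_gene_coordinates(conn, "DemoGene")
missing = fetch_gene_coordinates(conn, "NoSuchGene")
print(coords, missing)
```

&lt;p&gt;With MySQLdb, the equivalent is a cursor’s &lt;code&gt;execute(query, params)&lt;/code&gt; call, which uses &lt;code&gt;%s&lt;/code&gt; as the placeholder rather than &lt;code&gt;?&lt;/code&gt;.&lt;/p&gt;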
&lt;h2 id=&#34;update-cruzdb-2019-06-18&#34;&gt;Update: cruzdb (2019-06-18)&lt;/h2&gt;
&lt;p&gt;Since writing this post, &lt;a href=&#34;https://github.com/brentp/cruzdb/&#34;&gt;cruzdb&lt;/a&gt; has been published. &lt;code&gt;cruzdb&lt;/code&gt; is a Python module by &lt;a href=&#34;https://github.com/brentp&#34;&gt;brentp&lt;/a&gt; that greatly simplifies queries of the UCSC Genome Browser.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.1093/bioinformatics/btt534&#34;&gt;Bioinformatics publication&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;installation&#34;&gt;Installation&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;pip install cruzdb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; cruzdb &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; Genome

&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; g &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; Genome(db&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;hg18&amp;#34;&lt;/span&gt;)

&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; muc5b &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; g&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;refGene&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;filter_by(name2&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;MUC5B&amp;#34;&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;first()
&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; muc5b
refGene(chr11:MUC5B:&lt;span style=&#34;color:#ae81ff&#34;&gt;1200870&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1239982&lt;/span&gt;)

&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; muc5b&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strand
&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;+&amp;#39;&lt;/span&gt;

&lt;span style=&#34;color:#75715e&#34;&gt;# the first 4 introns&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; muc5b&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;introns[:&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;]
[(&lt;span style=&#34;color:#ae81ff&#34;&gt;1200999L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1203486L&lt;/span&gt;), (&lt;span style=&#34;color:#ae81ff&#34;&gt;1203543L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1204010L&lt;/span&gt;), (&lt;span style=&#34;color:#ae81ff&#34;&gt;1204082L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1204420L&lt;/span&gt;), (&lt;span style=&#34;color:#ae81ff&#34;&gt;1204682L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1204836L&lt;/span&gt;)]

&lt;span style=&#34;color:#75715e&#34;&gt;# the first 4 exons.&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; muc5b&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;exons[:&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;]
[(&lt;span style=&#34;color:#ae81ff&#34;&gt;1200870L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1200999L&lt;/span&gt;), (&lt;span style=&#34;color:#ae81ff&#34;&gt;1203486L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1203543L&lt;/span&gt;), (&lt;span style=&#34;color:#ae81ff&#34;&gt;1204010L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1204082L&lt;/span&gt;), (&lt;span style=&#34;color:#ae81ff&#34;&gt;1204420L&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1204682L&lt;/span&gt;)]

&lt;span style=&#34;color:#75715e&#34;&gt;# note that some of these are not coding because they are &amp;lt; cdsStart&lt;/span&gt;
&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; muc5b&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;cdsStart
&lt;span style=&#34;color:#ae81ff&#34;&gt;1200929L&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Accessing the UCSC Genome Browser MySQL Database</title>
      <link>https://www.danielecook.com/accessing-the-ucsc-genome-browser-mysql-database/</link>
      <pubDate>Sat, 02 Nov 2013 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/accessing-the-ucsc-genome-browser-mysql-database/</guid>
      <description>&lt;p&gt;The UCSC Genome Browser has a &lt;a href=&#34;http://genome.ucsc.edu/goldenPath/help/mysql.html&#34;&gt;publicly available MySQL database&lt;/a&gt;. There are a lot of different ways you can use it, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Annotating a small dataset&lt;/li&gt;
&lt;li&gt;Understanding how data is formatted and how it can be used&lt;/li&gt;
&lt;li&gt;Browsing data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are on a Mac, &lt;a href=&#34;http://www.sequelpro.com/&#34;&gt;Sequel Pro&lt;/a&gt; is a fantastic tool for browsing.&lt;/p&gt;
&lt;h1 id=&#34;connecting&#34;&gt;Connecting&lt;/h1&gt;
&lt;p&gt;To browse the UCSC Genome Browser database, download &lt;a href=&#34;http://www.sequelpro.com/&#34;&gt;Sequel Pro&lt;/a&gt; and enter the following connection information on the login screen:&lt;/p&gt;
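&lt;ul&gt;
&lt;li&gt;Host: &lt;code&gt;genome-mysql.cse.ucsc.edu&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Username: &lt;code&gt;genome&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Password: (leave blank)&lt;/li&gt;
&lt;/ul&gt;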
&lt;p&gt;You should be presented with a screen that looks like this:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.danielecook.com/Screen-Shot-2013-10-25-at-4.05.16-PM1.png&#34;&gt;Screenshot&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Databases are represented as &lt;strong&gt;Connection/Resource &amp;gt; Database &amp;gt; Tables &amp;gt; Rows&lt;/strong&gt;. Sequel Pro lets you connect to the UCSC Browser MySQL server. Each &lt;strong&gt;database&lt;/strong&gt; represents a different genome. Remember that genomes change and improve over time, so there are multiple &lt;strong&gt;builds&lt;/strong&gt; of each genome (e.g. hg18 and hg19) represented by separate databases.&lt;/p&gt;
&lt;p&gt;Within each database are tables. These are the same tables referred to on the UCSC website – and you can find out more about what the data represents on the browser website by clicking ‘describe table schema’ for the table of interest as shown below.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.danielecook.com/Screen-Shot-2013-10-25-at-3.51.29-PM.png&#34;&gt;Screenshot&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>A function for retrieving SNP data from Entrez using BioPython</title>
      <link>https://www.danielecook.com/a-function-for-retrieving-snp-data-from-entrez-using-biopython/</link>
      <pubDate>Thu, 06 Jun 2013 00:00:00 +0000</pubDate>
      
      <guid>https://www.danielecook.com/a-function-for-retrieving-snp-data-from-entrez-using-biopython/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;http://biopython.org/&#34;&gt;Biopython&lt;/a&gt; is a great tool for interacting with biological databases. I use it to retrieve records from &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/About/tools/restable_mol.html&#34;&gt;NCBI’s Entrez databases&lt;/a&gt;, including PubMed.&lt;/p&gt;
&lt;p&gt;Unfortunately, one notable database Biopython has trouble working with is the &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/snp&#34;&gt;SNP&lt;/a&gt; database. This is because the &lt;code&gt;Bio.Entrez&lt;/code&gt; parser is unable to handle the XML returned from this database. One solution is to use a built-in Python XML parser, but I thought I’d try to come up with an easier solution.&lt;/p&gt;
&lt;p&gt;To solve this problem, I wrote a function that retrieves SNP data and parses it into an array.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; pprint &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; pprint &lt;span style=&#34;color:#66d9ef&#34;&gt;as&lt;/span&gt; pp
&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; Bio &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; Entrez

Entrez&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;email &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;YOUR@EMAIL.HERE&amp;#34;&lt;/span&gt;

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pull_line&lt;/span&gt;(var_set, line):
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    This function parses data from lines in one of three ways:
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    1.) Pulls variables out of a particular line when defined as &amp;#34;variablename=[value]&amp;#34; - uses a string to find the variable.
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    2.) Pulls variables based on a set position within a line [splits the line by &amp;#39;|&amp;#39;]
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    3.) Defines variables that can be identified based on a limited possible set of values - [categorical variable, specified using an array]
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
    line_set &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {}
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; k,v &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; var_set&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;items():
        &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; type(v) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; str:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
                line_set[k] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; [x &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; line &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; x&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;startswith(v)][&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(v,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;)
            &lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt;:
                &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;
        &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; type(v) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; int:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
                line_set[k] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; line[v]
            &lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt;:
                &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;
        &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
            &lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
                line_set[k] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; [x &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; line &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; v][&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
            &lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt;:
                &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; line_set

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;pull_vars&lt;/span&gt;(var_set,line_start,line,multi&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;False):
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    Delegates and compiles data from entrez flat files dependent on whether
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    the type of data trying to be pulled is contained in unique vs. non-unique lines.
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    For example - the first line of the flat file is always something like this:
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    rs12009 | Homo Sapiens | 9606 | etc.
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    This line is unique (refers to RefSnp identifier)- and only occurs once in each flat file. On the other hand, lines
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    beginning with &amp;#34;ss[number]&amp;#34; refer to &amp;#39;submitted snp&amp;#39; numbers and can appear multiple times.
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
    lineset &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; [x&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; | &amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; x &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; line &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; x&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;startswith(line_start)]
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; len(lineset) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; 
    &lt;span style=&#34;color:#75715e&#34;&gt;# If the same line exists multiple times - place results into an array&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; multi &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; True:
        pulled_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; []
        &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; lineset:
            &lt;span style=&#34;color:#75715e&#34;&gt;# Pull data from the line and append&lt;/span&gt;
            pulled_vars&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append(pull_line(var_set,line))
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; pulled_vars  
    &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
        &lt;span style=&#34;color:#75715e&#34;&gt;# Otherwise the line is unique, so return a single dictionary&lt;/span&gt;
        line &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; lineset[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; pull_line(var_set,line)

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;get_snp&lt;/span&gt;(q):
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34; 
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    Takes as input an array of snp identifiers and returns 
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    a parsed dictionary of their data from Entrez.
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
    response &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; Entrez&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;efetch(db&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;SNP&amp;#39;&lt;/span&gt;, id&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;,&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join(q), rettype&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;flt&amp;#39;&lt;/span&gt;, retmode&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;flt&amp;#39;&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;read()
    r &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {} &lt;span style=&#34;color:#75715e&#34;&gt;# Return dictionary variable&lt;/span&gt;
    &lt;span style=&#34;color:#75715e&#34;&gt;# Parse flat file response&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; snp_info &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; filter(None,response&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;)):
        &lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt; snp_info
        &lt;span style=&#34;color:#75715e&#34;&gt;# Parse the First Line. Details of rs flat files available here:&lt;/span&gt;
        &lt;span style=&#34;color:#75715e&#34;&gt;# ftp://ftp.ncbi.nlm.nih.gov/snp/specs/00readme.txt&lt;/span&gt;
        snp &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; snp_info&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;)
        &lt;span style=&#34;color:#75715e&#34;&gt;# Parse the &amp;#39;rs&amp;#39; line:&lt;/span&gt;
        rsId &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; snp[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; | &amp;#34;&lt;/span&gt;)[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]
        r[rsId] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {}

        &lt;span style=&#34;color:#75715e&#34;&gt;# rs vars&lt;/span&gt;
        rs_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;organism&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;taxId&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;snpClass&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genotype&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;genotype=&amp;#34;&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;rsLinkout&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;submitterlink=&amp;#34;&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;date&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;updated &amp;#34;&lt;/span&gt;}

        &lt;span style=&#34;color:#75715e&#34;&gt;# ss vars&lt;/span&gt;
        ss_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ssId&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;handle&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;locSnpId&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;orient&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;orient=&amp;#34;&lt;/span&gt;,
                   &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;exemplar&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ss_pick=&amp;#34;&lt;/span&gt;,
                   }

        &lt;span style=&#34;color:#75715e&#34;&gt;# SNP line variables:&lt;/span&gt;
        SNP_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;observed&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;alleles=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;value&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;het=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;stdError&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;se(het)=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;validated&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;validated=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;validProbMin&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;min_prob=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;validProbMax&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;max_prob=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;validation&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;suspect=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;AlleleOrigin&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;unknown&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;germline&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;somatic&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;inherited&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;paternal&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;maternal&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;de-novo&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bipaternal&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;unipaternal&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;not-tested&amp;#39;&lt;/span&gt;,
                                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;tested-inconclusive&amp;#39;&lt;/span&gt;],
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;snpType&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;notwithdrawn&amp;#39;&lt;/span&gt;,
                               &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;artifact&amp;#39;&lt;/span&gt;,
                               &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;gene-duplication&amp;#39;&lt;/span&gt;,
                               &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;duplicate-submission&amp;#39;&lt;/span&gt;,
                               &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;notspecified&amp;#39;&lt;/span&gt;,
                               &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;ambiguous-location;&amp;#39;&lt;/span&gt;,
                               &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;low-map-quality&amp;#39;&lt;/span&gt;]}
        
        &lt;span style=&#34;color:#75715e&#34;&gt;# CLINSIG line variables:&lt;/span&gt;
        CLINSIG_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ClinicalSignificance&amp;#34;&lt;/span&gt;:[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;probable-pathogenic&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;pathogenic&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;other&amp;#39;&lt;/span&gt;]}

        &lt;span style=&#34;color:#75715e&#34;&gt;# GMAF line variables&lt;/span&gt;
        GMAF_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;allele&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;allele=&amp;#34;&lt;/span&gt;,
                     &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sampleSize&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;count=&amp;#34;&lt;/span&gt;,
                     &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;freq&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;MAF=&amp;#34;&lt;/span&gt;}

        &lt;span style=&#34;color:#75715e&#34;&gt;# CTG line variables&lt;/span&gt;
        CTG_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;groupLabel&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;assembly=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;chromosome&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;chr=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;physmapInt&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;chr-pos=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;asnFrom&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ctg-start=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;asnTo&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ctg-end=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;loctype&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;loctype=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;orient&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;orient=&amp;#34;&lt;/span&gt;}

        &lt;span style=&#34;color:#75715e&#34;&gt;# LOC line variables&lt;/span&gt;
        LOC_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;symbol&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;geneId&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;locus_id=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fxnClass&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fxn-class=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;allele&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;allele=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;readingFrame&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;frame=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;residue&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;residue=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;aaPosition&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;aa_position=&amp;#34;&lt;/span&gt;}

        &lt;span style=&#34;color:#75715e&#34;&gt;# SEQ line variables&lt;/span&gt;
        SEQ_vars &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;gi&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;source&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;source-db=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;asnFrom&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;seq-pos=&amp;#34;&lt;/span&gt;,
                    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;orient&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;orient=&amp;#34;&lt;/span&gt;}

        &lt;span style=&#34;color:#75715e&#34;&gt;# Pull out variable information:&lt;/span&gt;
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;rs&amp;#39;&lt;/span&gt;]       &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(rs_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;rs&amp;#34;&lt;/span&gt;,snp)
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;ss&amp;#39;&lt;/span&gt;]       &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(ss_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ss&amp;#34;&lt;/span&gt;,snp,True)
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;SNP&amp;#39;&lt;/span&gt;]      &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(SNP_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;SNP&amp;#34;&lt;/span&gt;,snp)
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;CLINSIG&amp;#39;&lt;/span&gt;]  &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(CLINSIG_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;CLINSIG&amp;#34;&lt;/span&gt;,snp)
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;GMAF&amp;#39;&lt;/span&gt;]     &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(GMAF_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;GMAF&amp;#34;&lt;/span&gt;,snp)
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;CTG&amp;#39;&lt;/span&gt;]      &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(CTG_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;CTG&amp;#34;&lt;/span&gt;,snp,True)
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;LOC&amp;#39;&lt;/span&gt;]      &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(LOC_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;LOC&amp;#34;&lt;/span&gt;,snp,True)
        r[rsId][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;SEQ&amp;#39;&lt;/span&gt;]      &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pull_vars(SEQ_vars,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;SEQ&amp;#34;&lt;/span&gt;,snp,True)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; r
        

snp &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; get_snp([&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;12009&amp;#34;&lt;/span&gt;])
&lt;span style=&#34;color:#66d9ef&#34;&gt;print&lt;/span&gt; pp(snp)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Create New File in Finder with Alfred 2</title>
      <link>https://www.danielecook.com/create-new-file-in-finder-with-alfred-2/</link>
      <pubDate>Thu, 04 Apr 2013 22:45:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/create-new-file-in-finder-with-alfred-2/</guid>
      <description>&lt;p&gt;I have created a simple workflow for Alfred 2 that makes it easy to create a new text file in the frontmost Finder window. &lt;strong&gt;Update&lt;/strong&gt;: at the suggestion of a visitor, &lt;a href=&#34;http://www.jameskachan.com/&#34;&gt;James Kachan&lt;/a&gt;, I have updated the workflow to automatically open the new file in a text editor. Ian Isted has also created &lt;a href=&#34;http://ianisted.co.uk/new-finder-file-alfred-2&#34;&gt;an alternative&lt;/a&gt;, more advanced workflow for Alfred 2.&lt;/p&gt;
&lt;h3 id=&#34;usage&#34;&gt;Usage&lt;/h3&gt;
&lt;p&gt;Open Alfred 2 and type &lt;strong&gt;new&lt;/strong&gt; followed by the name of the file. If you type only &lt;strong&gt;new&lt;/strong&gt;, a file called ‘untitled.txt’ is created.&lt;/p&gt;
&lt;h3 id=&#34;download&#34;&gt;Download&lt;/h3&gt;
&lt;!-- raw HTML omitted --&gt;
</description>
    </item>
    
    <item>
      <title>Excel Template for Mapping Four 96-Well Plates to One 384-Well Plate</title>
      <link>https://www.danielecook.com/excel-template-for-mapping-four-96-well-plates-to-one-384-well-plate/</link>
      <pubDate>Wed, 06 Mar 2013 22:45:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/excel-template-for-mapping-four-96-well-plates-to-one-384-well-plate/</guid>
      <description>&lt;h4 id=&#34;download&#34;&gt;Download&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://www.danielecook.com/96_to_384_platemapper.xlsx&#34;&gt;96_to_384_platemapper.xlsx&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;h5 id=&#34;copy-in-identifiers&#34;&gt;Copy in Identifiers&lt;/h5&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/Screen-Shot-2013-03-06-at-10.55.08-AM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h5 id=&#34;a-384-well-template-is-produced&#34;&gt;A 384 well template is produced&lt;/h5&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/Screen-Shot-2013-03-06-at-10.55.30-AM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h5 id=&#34;and-a-summary-table-too&#34;&gt;And a summary table too.&lt;/h5&gt;
&lt;p&gt;&lt;img src=&#34;https://www.danielecook.com/Screen-Shot-2013-03-06-at-10.55.41-AM.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Django models for Chado</title>
      <link>https://www.danielecook.com/django-models-for-chado/</link>
      <pubDate>Wed, 09 Jan 2013 22:45:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/django-models-for-chado/</guid>
      <description>&lt;h2 id=&#34;original-post-2013-01-09&#34;&gt;Original Post (2013-01-09)&lt;/h2&gt;
&lt;p&gt;Here is my first stab at Django models for the &lt;a href=&#34;http://gmod.org/wiki/Chado_-_Getting_Started&#34;&gt;Chado&lt;/a&gt; database schema.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://gist.github.com/danielecook/4494488&#34;&gt;Gist → Chado models&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;update-machado-2019-06-20&#34;&gt;Update: machado (2019-06-20)&lt;/h2&gt;
&lt;p&gt;I originally put together a set of models for Chado in January of 2013. I moved on from this work when I started graduate school, but a group later incorporated a little of it into &lt;a href=&#34;https://github.com/lmb-embrapa/machado&#34;&gt;machado&lt;/a&gt;, their framework for storing, searching, and visualizing biological data.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>ccmatch: A stata program for matching cases and controls</title>
      <link>https://www.danielecook.com/ccmatch-a-stata-program-for-matching-cases-and-controls/</link>
      <pubDate>Wed, 19 Dec 2012 22:45:53 +0000</pubDate>
      
      <guid>https://www.danielecook.com/ccmatch-a-stata-program-for-matching-cases-and-controls/</guid>
      <description>&lt;p&gt;&lt;strong&gt;ccmatch&lt;/strong&gt; is used to randomly match cases and controls based on specified criteria. For instance, if you wanted to randomly match cases and controls based on age, you can use ccmatch to pair up people with the same age. You can use multiple variables to match cases and controls.&lt;/p&gt;
&lt;h3 id=&#34;installation&#34;&gt;Installation&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ssc install ccmatch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;syntax&#34;&gt;Syntax&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ccmatch variable_list, cc&lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt;id&lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Specifying an &lt;code&gt;id&lt;/code&gt; is optional.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;variable_list&lt;/code&gt; The categorical or discrete variables you want to match on (for example, age, sex, or weight class).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cc( )&lt;/code&gt; Specify your case control variable here. 0=control; 1=case.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;id( )&lt;/code&gt; &lt;em&gt;(optional)&lt;/em&gt; Specify the variable you use as an ID; a &lt;code&gt;match_id&lt;/code&gt; variable is then created that lists the case/control partner of each individual.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ccmatch creates one or two variables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;match&lt;/code&gt; an integer shared by a case and control.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;match_id&lt;/code&gt; &lt;em&gt;(optional)&lt;/em&gt; the ID of the matched partner in the case/control pair, taken from the variable given in &lt;code&gt;id( )&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;example&#34;&gt;Example&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;match_id&lt;/th&gt;
&lt;th&gt;match&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;name&lt;/th&gt;
&lt;th&gt;case_control&lt;/th&gt;
&lt;th&gt;age&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;a6&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;a2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a6&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;a7&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;a4&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a7&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;a8&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;a5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a8&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;a10&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;a1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a10&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;.&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;.&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;a9&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The table above shows example ccmatch output. The &lt;code&gt;match_id&lt;/code&gt; and &lt;code&gt;match&lt;/code&gt; variables were created by &lt;code&gt;ccmatch&lt;/code&gt;. The original data (name, case_control, age) is unchanged, except that it has been reordered. Individuals without a partner (a3 and a9) are left unmatched. The command used was:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ccmatch age, id&lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;name&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; cc&lt;span style=&#34;color:#f92672&#34;&gt;(&lt;/span&gt;case_control&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;age&lt;/code&gt; follows &lt;code&gt;ccmatch&lt;/code&gt; as the variable list, indicating that cases should be matched to controls of the same age.&lt;/p&gt;
&lt;p&gt;The case/control variable is specified as an option using &lt;code&gt;cc( )&lt;/code&gt;, and the id of each individual is specified using &lt;code&gt;id( )&lt;/code&gt;.&lt;/p&gt;
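&lt;p&gt;The matching described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the Stata implementation of ccmatch: the function name &lt;code&gt;match_cases_controls&lt;/code&gt;, its parameters, and the choice to shuffle within each group before pairing are all hypothetical and made up for this example.&lt;/p&gt;

```python
import random
from collections import defaultdict


def match_cases_controls(rows, keys, cc="case_control", id_field="name", seed=0):
    """Randomly pair each case with a control that has the same values for `keys`.

    Hypothetical helper for illustration; not part of ccmatch itself.
    """
    rng = random.Random(seed)
    # Group row indices by matching key, kept separately for controls (0) and cases (1).
    groups = defaultdict(lambda: {0: [], 1: []})
    for i, row in enumerate(rows):
        groups[tuple(row[k] for k in keys)][row[cc]].append(i)

    # Copy every row and start with missing match information.
    out = [dict(row, match=None, match_id=None) for row in rows]
    pair_number = 0
    for key in sorted(groups):
        controls, cases = groups[key][0], groups[key][1]
        rng.shuffle(controls)
        rng.shuffle(cases)
        # zip stops at the shorter list, so surplus rows stay unmatched.
        for ctrl, case in zip(controls, cases):
            pair_number += 1
            out[ctrl].update(match=pair_number, match_id=rows[case][id_field])
            out[case].update(match=pair_number, match_id=rows[ctrl][id_field])
    return out


# The same individuals as in the example table above.
people = [
    {"name": "a1", "case_control": 0, "age": 19},
    {"name": "a2", "case_control": 0, "age": 15},
    {"name": "a3", "case_control": 0, "age": 15},
    {"name": "a4", "case_control": 0, "age": 16},
    {"name": "a5", "case_control": 0, "age": 17},
    {"name": "a6", "case_control": 1, "age": 15},
    {"name": "a7", "case_control": 1, "age": 16},
    {"name": "a8", "case_control": 1, "age": 17},
    {"name": "a9", "case_control": 1, "age": 18},
    {"name": "a10", "case_control": 1, "age": 19},
]

matched = match_cases_controls(people, ["age"])
for row in matched:
    print(row)
```

&lt;p&gt;In this sketch, a9 has no control of the same age and stays unmatched, and only one of the two 15-year-old controls (a2 or a3) is paired with a6, depending on the shuffle.&lt;/p&gt;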
</description>
    </item>
    
    
  </channel>
</rss>