Hong Kong/China - open sourcing genomes / crowdsourcing killer outbreaks

Written by

S.C. Edmunds

The genome of the deadly 2011 E. coli outbreak strain in Germany was the first dataset we at GigaScience released with a DOI and a CC0 waiver. Given the unusual severity of the outbreak (thousands severely ill and over 50 deaths), it was clear that the usual scientific procedure of producing data, analyzing it slowly, and releasing it to the public only after a potentially long peer-review process would not have been helpful. The first genomic data were released via Twitter before they had even finished uploading to the usual scientific repositories (NCBI), their use was promoted, and subsequently improved assemblies were released in the same way. As a result, a huge community of microbial genomicists from around the world took up the challenge of studying the organism collaboratively (a process dubbed by some the first “Tweenome”). A GitHub repository was created (thanks to the efforts of the Era7 team in Spain) to provide a home for the analyses and data, and within 24 hours groups from around the world started producing their own annotations and assemblies. Within a few days a potential ancestral strain was identified by another blogger, helping to clear Spanish farmers of the blame and end the massive boycotting of their agricultural products.

The many eyes on this data allowed the antibiotic resistance genes and pathogenic features to be much more clearly understood. The main aim of doing science in this accelerated way was to speed up diagnosis and treatment during a serious health crisis, and the E. coli data enabled the rapid development of diagnostic tests and anti-microbial agents. Less than a week after the first data came out, a free diagnostic protocol and free primers were distributed by the BGI to immediately help track the source of the outbreak. This rapid and open response seems an obvious thing to do in a health crisis such as this, but it is shockingly still not the norm, as the lack of data sharing during the Ebola crisis shows.

Releasing the data under a CC0 waiver allowed truly open-source analysis. The UK HPA and other researchers followed suit by releasing their work in this way. A team at Pacific Biosciences also released their related data in a similar manner, following the example of their fellow E. coli data producers, which allowed them to do so without wasting time on legal wrangling. The effort has subsequently been used as an example for future UK and EU science policy, with the Royal Society citing the E. coli crowdsourcing as an example of “the power of intelligently open data” and highlighting it on the cover of their influential “Science as an Open Enterprise” report.

For more information about this open science crowdsourcing effort, you can read the following blog posts: