Tag Archives: Bermuda Principles

Linking government and academic open data

I’m doing some reading on the open data movement for a new project that we will announce in a few weeks and came across an interesting history of the human genome project. I’m looking at links between the open data movement, which is mostly concerned with public, government information and its release in free, usable, digital formats, and open data in academia — university research data — which is often treated as private or somehow protected information.

Human Genome Project logo

In 2011, Jorge Contreras wrote about the Bermuda Principles in the Minnesota Journal of Law, Science & Technology (SSRN link). The Bermuda Principles, agreed to in 1996… in Bermuda, stated that human genome research data should be released to the public within 24 hours of being collected.

According to Contreras, key researchers and genetic policy thinkers agreed to Bermuda for three reasons:

  1. to aid project coordination,
  2. to advance science, and
  3. to minimize encumbrances to research that patents on the human genome would cause.
Argonne’s Midwest Center for Structural Genomics deposits 1,000th protein structure / Matt Howard (licensed under CC BY-SA 2.0 via Commons)

Reason number three is very interesting and certainly has application in other areas of the sciences. But the bigger concept that Contreras analyzes — the idea of openness and data sharing in scientific research — also applies in many other areas.

As discussed above, the more quickly scientific data is disseminated, the more quickly science will progress. Conversely, when the release of data is delayed due to the length of the publication cycle and patenting concerns, it can be argued that the progress of scientific advancement is retarded, or at least that it may not achieve its greatest potential. If data were not withheld until a researcher’s conclusions were published, but released prior to publication, the months-long delays associated with the publishing process could be avoided. Following this line of argument, in an ideal world, maximum scientific efficiency could be achieved by reducing the delay between data generation and data release to zero. That is, the most rapid pace of innovation, discovery of new therapies, development of new technologies, and understanding of natural phenomena could be achieved by releasing scientific data to the community the moment it is generated. — Jorge L. Contreras in Minnesota Journal of Law, Science & Technology 2011;12(1):61-125. (Emphasis added.)
More on this later, but it dawns on me that GitHub is basically designed to reduce the delay between data generation and data release to almost zero: update, add, commit, push — and the public has a current and historical view of your data manipulations.
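That update-add-commit-push cycle can be sketched in a few commands. This is a minimal, hypothetical example (the repository, file name, and commit identity are invented for illustration), showing how each new piece of data can be published along with its full revision history:

```shell
# Create a local repository for a hypothetical dataset.
git init -q demo-data
cd demo-data
# Identity is required for commits; these values are placeholders.
git config user.email "data@example.org"
git config user.name "Data Bot"

# "Update": generate or revise the data file.
echo "sample,reading" > results.csv
echo "A1,0.73" >> results.csv

# "Add" and "commit": record the change with a timestamped history entry.
git add results.csv
git commit -q -m "Release data the moment it is generated"

# "Push" would then publish the current state plus every prior revision,
# e.g.: git push origin main   (requires a configured remote, omitted here)
git log --oneline
```

Every commit becomes part of a public, inspectable record, so readers see not just the latest data but how it changed over time.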
Find more data journalism and open data sources on one of my tumblrs: http://journalismprooftexts.tumblr.com/