Tag Archives: Open Data

Rock Cairn, Mt. St. Helens

Elevator pitch for Knight #newschallenge on data and communities

I’ve been working on an idea for the Knight Foundation News Challenge on data and communities and wanted to share the evolution of our one-line elevator pitch for the project.

The proposal is to take the idea of a “data repository” that offers bulk downloads of civic info (like data.gov, or the City of Boise’s growing data portal hosted on ESRI’s Open Data platform, which I’m also working on) and add two more types of data to the catalog: research that actually uses the data and media reporting on the data.

I call this “data in its context,” or “the work done on the data,” and I think it will be convenient to have it all in one place. I also think average citizens will be able to make better use of the data, interpret the numbers more accurately, and contribute back to the research and reporting with their own local insights.

My first stab at explaining this was pretty high level and I still like it:

Draft 1

We’re organizing the web in Boise around community data, locally relevant research, government reports and the news in a structured way that scales to local internet spaces around the world.

But it did not speak to the power of communities harnessing their own data, which is the point.

Draft 2

We’re organizing the web in Boise, Idaho, around community data, relevant research, government reports, local journalism and public ideas in a structured way that scales to communities around the world.

Someone pointed out to me that it’s not the web that needs organizing, it’s the locally relevant data, thus:

Draft 3

We’re organizing community data alongside relevant research, government reports, local journalism and public ideas in Boise, Idaho with a web app that will scale to benefit communities around the world.

Then, how will it benefit these communities?

Draft 4

We’re marshalling community data alongside relevant university research, government reports, local journalism and public ideas in Boise, Idaho with an open source web app that communities across the world can use to tell their own data stories.

Finally, after much work and input from many folks, including an intense “Red Team” (pdf) session at Boise State, I arrived at this draft:

Draft 5

Data Cairn is a platform for data storytelling, starting in Southwest Idaho, that allows communities to harness their data along with the work being done on it: relevant university research, government reports, local journalism, visualizations, public ideas and more, in order to discover and demand better solutions.

The feedback phase for the Knight News Challenge is open for one more day, so feel free to leave more feedback and applause, if warranted, on our proposal. There are tons of other cool projects on there as well. Well, 1,028 other cool projects …

Here are a few I like:

Linking government and academic open data

I’m doing some reading on the open data movement for a new project that we will announce in a few weeks and came across an interesting history of the human genome project. I’m looking at links between the open data movement, which is mostly concerned with public, government information and its release in free, usable, digital formats, and open data in academia — university research data — which is often treated as private or somehow protected information.

Human Genome Project logo

In 2011, Jorge Contreras wrote about the Bermuda Principles in the Minnesota Journal of Law, Science & Technology (SSRN link). The Bermuda Principles, agreed to in 1996… in Bermuda, stated that human genome research data should be released to the public within 24 hours of being collected.

According to Contreras, key researchers and genetic policy thinkers agreed to Bermuda for three reasons:

  1. To aid project coordination,
  2. to advance science, and
  3. to minimize encumbrances to research that patents on the human genome would cause.
Argonne’s Midwest Center for Structural Genomics deposits 1,000th protein structure / Matt Howard (Licensed under CC BY-SA 2.0 via Commons)

Number three is very interesting and certainly has application in other areas of the sciences. But the bigger concept that Contreras analyzes, the idea of openness and data sharing in scientific research, applies in many other areas as well.

As discussed above, the more quickly scientific data is disseminated, the more quickly science will progress. Conversely, when the release of data is delayed due to the length of the publication cycle and patenting concerns, it can be argued that the progress of scientific advancement is retarded, or at least that it may not achieve its greatest potential. If data were not withheld until a researcher’s conclusions were published, but released prior to publication, the months-long delays associated with the publishing process could be avoided. Following this line of argument, in an ideal world, maximum scientific efficiency could be achieved by reducing the delay between data generation and data release to zero. That is, the most rapid pace of innovation, discovery of new therapies, development of new technologies, and understanding of natural phenomena could be achieved by releasing scientific data to the community the moment it is generated. — Jorge L. Contreras, Minnesota Journal of Law, Science & Technology, 2011;12(1):61-125. (Emphasis added.)

More on this later, but it dawns on me that GitHub is basically designed to reduce the delay between data generation and data release to almost zero: update, add, commit, push, and the public has a current and historical view of your data manipulations.
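That update-add-commit-push cycle can be sketched in a few lines of shell. This is a minimal, hypothetical example (the repo directory, file name, and data values are invented for illustration); a real setup would also `git push` to a public host like GitHub, which is the step that actually publishes the data and its history.

```shell
# Create a throwaway local repo standing in for a public data repository.
mkdir -p /tmp/civic-data && cd /tmp/civic-data
git init -q .
git config user.email "demo@example.com"   # identity needed for commits
git config user.name "Demo"

# "Data generation": write a new data point the moment it exists.
echo "date,permits" > permits.csv
echo "2015-10-01,42" >> permits.csv

# "Data release": record it with full history, seconds after generation.
git add permits.csv
git commit -q -m "Release October permit data"
git log --oneline                          # current and historical view
```

With a remote configured, a final `git push` would make both the latest file and every prior version publicly browsable, which is the near-zero delay Contreras describes.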

Find more data journalism and open data sources on one of my tumblrs: http://journalismprooftexts.tumblr.com/