In the mix…Facebook “breach” of public data, data-mining for everyone, thinking through the Panton Principles, and BEST PRACTICES Act in Congress

Friday, July 30th, 2010

1) Facebook’s in privacy trouble again. Ron Bowes created a downloadable file containing information on 100 million searchable Facebook profiles, including the URL, name, and unique ID.  What’s interesting is that it’s not exactly a breach.  As Facebook pointed out, the information was already public.  What Facebook will likely never admit, though, is that there is a qualitative difference between information that is publicly available, and information that is organized into an easily searchable database.  This is what we as a society are struggling to define — if “public” means more public than ever before, how do we balance our societal interests in both privacy and disclosure?

2) Can data mining go mainstream? The article doesn’t actually say much, but it does at least raise an important question.  The value of data and data-mining is immense, as corporations and large government agencies know well.  Will those tools ever be available to individuals?  Smaller businesses and organizations?  And what would that mean for them?  It’s a big motivator for us at the Common Data Project: if data doesn’t belong to anyone, and it’s been collected from us, shouldn’t we all be benefiting from data?

3) In the same vein is a new blog by Peter Murray-Rust discussing open knowledge/open data issues, focusing on the Panton Principles for open science data.

4) A new data privacy bill has been introduced in Congress called “Building Effective Strategies to Promote Responsibility Accountability Choice Transparency Innovation Consumer Expectations and Safeguards” Act, aka “BEST PRACTICES Act.”  The Information Law Group has posted Part One of FAQs on this proposed bill.

Although the bill is still being debated and rewritten, some of its provisions indicate that the author of the bill knows a bit more about data and privacy issues than many other Congressional representatives.

  • The information regulated by the Act goes beyond the traditional, American definition of personally identifiable information.  “The definition of “covered information” in the Act does not require such a combination – each data element stands on its own and may not need to be tied to or identify a specific person. If I, as an individual, had an email address that was, that would appear to satisfy the definition of covered information even if my name was not associated with it.”
  • Notice is required when information will be merged or combined with other data.
  • There’s some limited push to making more information accessible to users: “covered entities, upon request, must provide individuals with access to their personal files.” However, they only have to if “the entity stores such file in a manner that makes it accessible in the normal course of business,” which I’m guessing would apply to much of the data collected by internet companies.

In the mix…democratizing access to data, data literacy, and predictable responses to proposed privacy bill

Friday, June 18th, 2010

1) Infochimps launched their API. People often ask, are you guys doing something similar?  Yes, in that we are also interested in democratizing access to data, but we’re focusing on a narrower area — information that’s too sensitive and too personal to release in the usual channels. In any case, we’re excited to see more movement in this direction.

2) Wikipedia began a trial of a new tool called “Pending Changes.” To deal with glaring inaccuracies and vandalism, Wikipedia made certain entries off-limits for off-the-cuff editing.  The trade-off, however, was that first-time editors to these articles couldn’t get that immediate thrill of seeing their edits.  Wikipedia’s trying out a compromise, a tab in which these edits are visible as “pending changes.”  It’s always fascinating to see all the different spaces in which people in a community can interact online — this is a new one.

3) The Info Law Group posted various groups’ reactions to the privacy bill proposed by Representative Rick Boucher. Here’s Part I, here’s Part II. Fairly predictable, but it still never ceases to amuse me how far apart industry groups are from consumer advocates.

4) Great discussion continues on the concept of “data literacy.” I love this guest post from David Eaves on the Open Knowledge Foundation blog, with the awesome line:

It is worth remembering: We didn’t build libraries for an already literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have a data or public policy literate citizenry, we build them so that citizens may become literate in data, visualization, coding and public policy.

The Common Data Project at the Open Knowledge Conference

Monday, April 19th, 2010

We’ll be at the Open Knowledge Conference in London on April 24th!  Alex Selkirk will be giving a lightning talk, “Can We Have Our Cake and Eat It Too?: The Potential of a “Datatrust” to Open Personal Data While Protecting Privacy.”  He’ll walk through an updated version of our datatrust demo that shows how differential privacy, in the form of PINQ, could be used to allow open-ended queries without revealing the presence of any one individual.  (The updated version isn’t quite complete, but for a description of how the old one worked, check out Tony Gibbon’s blog post here.)
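
For readers who haven’t seen differential privacy before, here is a rough sketch of the underlying idea: answer a counting query with a little random noise added, so the result looks about the same whether or not any one person is in the data.  This is not PINQ’s API (PINQ is a C#/LINQ library) and it is not the code behind our demo.  The function names, the sample records, and the epsilon value are all made up for illustration, and the noise comes from the standard Laplace mechanism for a count.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Count the records matching `predicate`, plus Laplace noise.

    Adding or removing one person changes a count by at most 1, so noise
    with scale 1/epsilon is the usual choice for a counting query.
    Hypothetical helper for illustration only, not part of PINQ.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up sample data: each dict stands in for one person's record.
people = [
    {"age": 34, "zip": "10001"},
    {"age": 51, "zip": "10001"},
    {"age": 29, "zip": "11201"},
    {"age": 62, "zip": "10001"},
]

answer = noisy_count(people, lambda p: p["zip"] == "10001", epsilon=0.5)
print(f"Noisy count for zip 10001: {answer:.2f}")
```

A smaller epsilon means more noise and stronger protection; a larger one means more accurate answers.  A real system would also have to keep track of how much total epsilon has been spent across queries, which this sketch leaves out.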

All of us here have been wrestling with the demo and how it could be used in real-world scenarios.  We’ve

  • Described the basic principles of differential privacy behind PINQ;
  • Illustrated a demo of PINQ;
  • Outlined what would go into a datatrust prototype; and
  • Imagined how PINQ would enable Census data to be open in new ways.

One of the biggest challenges is defining PINQ’s privacy guarantee for real-world use.  We’ve addressed that in these posts:

And there’s still more to come on that…

We’re really excited to be able to share what we’ve been wrestling with for the last couple of months with the people at the Open Knowledge Conference, who are all invested in open knowledge, “any content, information or data that people are free to use, re-use and redistribute — without any legal, technological or social restriction.”

We also look forward to hearing what others are doing to make information more publicly available.  We’re particularly interested in the panel on community driven research, as well as the multi-national panel on opening up government data.  It’s a great opportunity to hear from experts working on open government issues from a European perspective.  In all the talk of open government and transparency, we don’t hear much about how governments are going to deal with privacy issues, despite the fact that much of what governments collect is very personal.  We hope to hear about how these experts are dealing with these issues, especially given that the European understanding of privacy seems to be very different from the American one, as evidenced by the Italian Google case.

