Posts Tagged ‘Licenses’

Creative Commons-style licenses for personal information, Part III: What are the challenges?

Monday, November 30th, 2009

In the first two posts, I described how personal information licenses might work and why they might be useful in shifting the debate around how personal information is collected and used. Sharing information could be cool!  People could exercise choices!  Companies could be pressured to offer similar choices!

Unfortunately, it wouldn’t be that easy.  There are certainly obstacles and challenges to creating a system of personal information licenses for common use, which I describe below.

1) Personal information isn’t property—why do you want to propertize it?

The short answer is, we don’t. We’re well aware that there is a history of academic debate, pro and con, over whether making personal information personal property would make it easier to protect individual privacy.  Although the issues are certainly interesting, we don’t want to step into that debate and we don’t think we have to for the licenses we’re imagining.

First, let’s examine how personal information is viewed today.  I can’t own a fact.  I can’t own the fact that I’m 32, but I can have copyright in an essay in which I state I am 32 and I can have copyright in a database that includes the fact I am 32 if I’m creative in building the structure of that database (in the U.S.).  We can understand the reasoning behind this.  We want to live in a world where facts are “free” to be used and reused without any need to pay a licensing fee.

But the simple declaration, “You can’t own a fact” doesn’t begin to describe the many ways in which people are collecting data, selling it, renting it, and otherwise making money off of it.  When a company sells a mailing list, it may not “own” the fact that I live at XYZ Avenue in Brooklyn, but it certainly is using it to its advantage.  Why, then, should the fact that I can’t own the fact of where I live keep me from sharing that data as I like and trying to control it in new ways?

The digital revolution is forcing us to think beyond property/not property.  Facts have become valuable even when they’re not technically “owned” by anyone.  I haven’t come up with some snappy new terms to use, but the issue should no longer be defined solely around “property/not property.”


Some new businesses seem to be working off this model. BlueKai and KindClicks, while collecting personal information for market research, provide individuals with a way of stating their preferences and monetizing their data.  KindClicks, for example, allows individuals who contribute data to then donate the money they make off their data to the charity of their choice.  BlueKai collects data through cookies, but provides a link on its site by which users can see what information has been gathered about them. Those who want to opt out can.  Those who choose to participate in BlueKai’s registry can then choose to donate a portion of their “earnings” to charity.

We don’t actually want to model these companies’ ways of valuing data.  CDP’s mission isn’t to make sure that everyone gets a dollar here or a dollar there every time their data is accessed.  To us, the value of such information is immense to the public and yet not easily measurable in dollars.  But we do want to explore the idea that we could just take control of our data and obtain value from it, even if it’s the non-monetary, social value of providing something useful to the public.

2) How would these licenses be enforceable? What about existing terms of use on online forums, social networks?

This is a big question.  I’m not sure what kind of dataset could be licensable and the extent to which a license could cover facts within that dataset.  Could the license really encourage new forms of sharing if there was no way to prevent people from using individual facts within that dataset outside of the license terms?  How useful would such a license be?

Arguably, Creative Commons licenses are not easily enforced.  They certainly have an easier case for arguing that they are enforceable; there has been one case I know of where a court upheld the terms of the license.  But the vast majority of people using photographs, art, and other work outside the terms of the license do it with impunity.  Most CC license holders never find out their Flickr photo was used outside the license terms, and most wouldn’t have the resources to do anything about it even if they did find out.  Yet CC licenses have still managed to impact societal norms on intellectual property.

Personal information licenses may still have an effect, then, on societal norms about how information is collected and shared regardless of how much the licenses are litigated.  Even the process of litigation may help us as a society have a smarter conversation about current practices.

As to the objection that the licenses wouldn’t work in the face of existing terms of use for social networks and other sites: the fact that I might not be able to “license” information that I myself put on Facebook just underscores why creative, proactive, even aggressive strategies might be necessary.

3) Why would you encourage people to put their personal information out in public?  Isn’t it irresponsible to encourage people to provide information that could increase risk of identity theft and fraud?

I don’t want to dismiss this concern out of hand.  But as my father likes to say, everything in life has good and bad.  There are many things we do that are risky, and we try our best to minimize those risks, both as a society when we pass laws and as individuals when we take more particular, personal measures.  Driving is a very dangerous activity.  It is also a very valuable one.  Many governments have decided to legislatively require the wearing of seatbelts.  Many of us personally make the decision to practice other safe driving techniques that aren’t legally required.

We think it’s imperative that we, as a society, think hard and carefully about how to minimize the risks of personal information being used, collected, and exchanged.  Creative Commons-style licenses for personal information sharing may or may not be the best way to address today’s privacy problems.  I’m curious to hear if you think the risks outweigh the benefits and why.  But to shut down the idea solely because the risk exists — that is not going to help push the conversation forward.


Licenses for making personal information more widely available for research and public use—would they work?  Maybe, maybe not.  Worth exploring?  Most definitely.

We’d love to hear your questions and comments.

Remixing Creative Commons licenses for personal information, Part II — What good would that do?

Wednesday, November 25th, 2009

The scenarios of data sharing I outlined in my first blog post may not sound too exciting to you.  So what if one person uploads a dataset on her blog, making it public, and then says it’s available for reuse?  How does that make the world a better place?

It’s possible that although personal information licenses, a la Creative Commons, wouldn’t solve all data-collection problems today, they could shape and shift the debate in several important ways:

1) Create a proactive way for people to take control of their information.

Right now, we as users generally are told, “Take it or leave it.”  We can agree with the terms of use that govern the use of our personal information, or not. A few companies are trying to offer more choices—Firefox has a “Private Browsing” option, Google offers some choices in what interests are tracked.  But a user almost never gets a choice in how his or her information is used once it’s collected.  A set of licenses could be a way to assert control instead of waiting for the choices to be offered.  As many privacy advocates have noted, it’s problematic that most privacy choices are offered as an opt-out rather than an opt-in.  A set of licenses would create a way to “opt-in” before being asked.  Even if the licenses turned out to be difficult to enforce, if the licenses became popular and widespread, it would be harder to ignore that people do have preferences that are not being considered or honored.

2) Create a grassroots way for people to actively share their information for causes they explicitly support.

[Screenshot: Obama's Healthcare Stories for America]

We’ve all seen campaigns that are organized around human-interest stories, true stories about real people that are meant to humanize a campaign and give it urgency.  The current healthcare debate, for example, inspired a host of organizations to ask people to “share their stories,” the Obama administration’s site being one of the best-organized ones.

It had the following “Submission Terms”:


By submitting your story, you agree that the story, along with any pictures or video you submit along with the story (the “Submission”), is non-confidential and may be freely used and disclosed, in whole or in part and in any manner or media, by or on behalf of Democratic National Committee (“DNC”) in support of health care reform.

You acknowledge that such use will be without acknowledgment or compensation to you.

You grant DNC a perpetual, irrevocable, sublicensable, royalty-free license to publish, reproduce, distribute, display, perform, adapt, create derivative works of and otherwise use the Submission.

Despite the all-or-nothing language, the Obama site was still able to solicit a great number of stories.  But the terms underscore a perennial problem for lesser-known organizations.  How do people trust an organization with their stories?

A more decentralized set of licenses could allow people to essentially tag their information across the internet and flag that it’s been provided in support of a specific cause, without giving their stories explicitly to another organization.  Individuals could also choose to tag their information in support of specific research projects.

The licenses could be an organizing tool, a way for organizations or people without established reputations to gather useful information without asking people to sign away the rights to their stories.  Or the licenses could be a research tool, enabling new forms of data collection.  Already, sociologists are exploring the possibilities of broadening research beyond the couple hundred subjects that can be managed through more traditional methods.  At Harvard, a graduate student in psychology created an iPhone application that allows research subjects in a study on happiness to rate their happiness in real time, rather than through recollection with an interviewer later.

Would the existence of standard licenses for sharing personal information make organizing around real stories easier?  Could it make personal information-based research easier?  Could it make people who support such causes or research but are uncertain about existing privacy guarantees more willing to try?  We think it’s certainly worth exploring.

3) Make sharing cool (and good).


Creative Commons is not without controversy, but almost everyone would agree that what the organization did manage to do was make sharing work cool.  The licenses created an easy way for people who shared the same view of intellectual property to band together and display their commitment.  They also made it easier to advertise and sell this ethos of IP to others.

We wonder if a set of licenses for sharing personal information might not be able to do the same.  We want to promote sharing information as a virtue, a civic act of generosity, and a way to enable all of us to have more information for decisions.  We want donating information to feel like donating blood.

4) Raise the bar on use of personal information in research, marketing, and other contexts.

It may seem like we’re encouraging less use and reuse of information by imagining a system where people put licenses on information they already make public (see the screenshots from the first post).  But what the licenses would make clear, which is not clear now, is that there is a difference between something being put out for the public, for general use and enjoyment, and something being put out for someone else’s reuse, gain, and potential profit.  Those who use the license would be signaling clearly their willingness to make their information available for research and other public uses.

About a year ago, researchers at the Berkman Center for Internet & Society at Harvard released a dataset of Facebook profile information for an entire class of college students at “an anonymous, northeastern American university.”  As Michael Zimmer pointed out, however, the dataset was hardly “anonymous.”  He was quickly able to deduce that the university in question was Harvard.  Although some have argued that some of these profiles were already “public,” Zimmer argues (and we agree) that having a public profile does not equal consent to being a research subject:

This leads to the second point: just because users post information on Facebook doesn’t mean they intend for it to be scraped, aggregated, coded, dissected, and distributed. Creating a Facebook account and posting information on the social networking site is a decision made with the intent to engage in a social community, to connect with people, share ideas and thoughts, communicate, be human. Just because some of the profile information is publicly available (either consciously by the user, or due to a failure to adjust the default privacy settings), doesn’t mean there are no expectations of privacy with the data. This is contextual integrity 101.

By creating a license that allows people to clearly signal when they do consent to being “scraped, aggregated, coded, dissected, and distributed,” we would also make clearer that when people don’t clearly signal their consent, that consent cannot be assumed.

5) Ultimately create new scenarios in which licenses can be used.

So far, the scenarios I’ve outlined in which a license could be applied are where information is being displayed openly, as on a website.  But the licenses could eventually apply to more closed systems, where the individual’s decision to share data is not itself public.

CDP is working on building a datatrust, a new kind of institution and trusted entity to store sensitive, personal information and make it publicly accessible for research.  Individuals and institutions could choose to donate data to the datatrust, knowing that they are contributing to public knowledge on a range of issues.  CDP will likely use a system of licenses that allow each data donor to pre-determine his or her preferences on how their data is accessed rather than a single “terms of use” that applies to everyone, take it or leave it.

Similarly, if the licenses were to become popular, other organizations and companies that collect information from their members or account holders would be under pressure to offer these set choices or licenses when people sign up for accounts that require them to provide personal information.

Taxonomy of data

Thursday, November 19th, 2009

I haven’t yet posted Parts II and III of our series on the idea of creating Creative Commons-type sharing licenses for personal information, but Bruce Schneier posted today on a proposed taxonomy of data, and I thought it was worth sharing now.  Although the taxonomy he’s discussing is limited to social networking data, it’s a helpful way to understand why it’s so hard to come up with rules around personal information in general.

Here is his taxonomy on social networking data:

  1. Service data. Service data is the data you need to give to a social networking site in order to use it. It might include your legal name, your age, and your credit card number.
  2. Disclosed data. This is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
  3. Entrusted data. This is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data — someone else does.
  4. Incidental data. Incidental data is data that other people post about you. Again, it’s basically the same stuff as disclosed data, but the difference is that 1) you don’t have control over it, and 2) you didn’t create it in the first place.
  5. Behavioral data. This is data that the site collects about your habits by recording what you do and who you do it with.

As I noted in my first license blog post, our idea is focusing strictly on “disclosed data,” data an individual actively chooses to release.  It doesn’t address the messiness around how the other types of data are being used and reused, except in that we hope explicitly talking about individual preferences around “disclosed data” can help all of us understand what really matters to people (and what doesn’t) when they talk about the need for privacy around other forms of data.
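To make the scope concrete, here is a minimal sketch of the taxonomy as a data type, with a check for which category the license idea would cover. The names `DataCategory` and `in_license_scope` are made up for illustration; the only rule encoded is the one stated above, that the licenses focus strictly on disclosed data.

```python
from enum import Enum


class DataCategory(Enum):
    """The five categories from the taxonomy, with short paraphrased descriptions."""
    SERVICE = "data you give a site in order to use it"
    DISCLOSED = "what you post on your own pages"
    ENTRUSTED = "what you post on other people's pages"
    INCIDENTAL = "what other people post about you"
    BEHAVIORAL = "what the site records about your habits"


def in_license_scope(category: DataCategory) -> bool:
    """Hypothetical rule: only data a person actively chose to release is licensable."""
    return category is DataCategory.DISCLOSED


for category in DataCategory:
    print(f"{category.name:<10} in scope: {in_license_scope(category)}")
```

Only `DISCLOSED` passes the check; the other four categories are exactly the ones where someone other than the individual controls or creates the data.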

Remixing Creative Commons licenses for personal information, Part I

Wednesday, November 18th, 2009

Creative Commons, in creating its licenses, did a very sexy thing.  It didn’t repeal the Sonny Bono Copyright Term Extension Act, it didn’t change technology.  Yet it managed to shift the social norm around intellectual property.  It’s now cool to share.  And they did this, not by forcing people to give up their rights, but by offering a set of choices by which those rights can be exercised in a way that encourages collaboration and ultimately benefits the public.

Imitation being the sincerest form of flattery, we at CDP have been playing around with the idea of creating personal information licenses, a la Creative Commons. Right now, we live in a paradoxical world where 1) people have little control over how their information is used and reused, and 2) lots of valuable, fascinating raw data is locked up because of the danger of violating privacy.  Big corporations get a lot of value out of their data-mining; researchers and regular individuals, not so much.  Modern privacy problems aren’t exactly analogous to modern intellectual property problems, but we think Creative Commons-type licenses could have a lot to offer in addressing these two issues.  We’re certainly not the first to think along these lines, but we want to add our voice to the ongoing discussion.

Over the next couple of posts, I’m going to lay out how such licenses might work, the scenarios in which people might choose to license their personal information, what such licenses could accomplish, and the challenges and obstacles such licenses would face.

What choices would the licenses offer?

Imagine a set of licenses with a specific, pre-determined set of choices.  Anyone who wants to signal their willingness to make their personal information available to the public could choose among these licenses and display their choice prominently, wherever their information is provided, whether it’s an online forum, a social network, or even a personal website or blog.

The choices could include the following:


  1. First ask my permission before using the information.
  2. Tell me that you are going to use my information.
  3. I don’t care.


  1. I’m okay with non-commercial academic use for research and/or publication.
  2. I’m okay with non-commercial governmental use.
  3. I’m okay with all uses.


  1. If I’ve provided any classic identifiers (as enumerated; most likely name, email address, etc.), strip them from my information, though with no guarantee that this equals “anonymous.”
  2. If I have not provided any identifiers, do not try to re-identify me.
  3. [intermediary option of better anonymization, should the technology develop]
  4. I don’t care.
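
As a rough sketch, a license of this kind could be represented as one choice from each of the three lists above. Everything named here is an assumption made for illustration: the group labels (`Notice`, `Use`, `Identity`), the `PersonalInfoLicense` class, and the `permits` rule, which treats the options in the second list as alternatives rather than cumulative grants. The placeholder option for better anonymization is left out because it is not yet defined.

```python
from dataclasses import dataclass
from enum import Enum


class Notice(Enum):
    ASK_FIRST = "First ask my permission before using the information."
    TELL_ME = "Tell me that you are going to use my information."
    NO_PREFERENCE = "I don't care."


class Use(Enum):
    ACADEMIC = "Non-commercial academic use for research and/or publication."
    GOVERNMENT = "Non-commercial governmental use."
    ALL = "All uses."


class Identity(Enum):
    STRIP_IDENTIFIERS = "Strip classic identifiers I have provided."
    NO_REIDENTIFICATION = "Do not try to re-identify me."
    NO_PREFERENCE = "I don't care."


@dataclass(frozen=True)
class PersonalInfoLicense:
    """One choice from each list, combined into a single license."""
    notice: Notice
    use: Use
    identity: Identity

    def badge(self) -> str:
        """Short label a person could display next to their information."""
        return f"{self.notice.name}/{self.use.name}/{self.identity.name}"

    def permits(self, requested: Use) -> bool:
        """True if the requested use is the one allowed, or if all uses are allowed."""
        return self.use is Use.ALL or self.use is requested


license_ = PersonalInfoLicense(Notice.TELL_ME, Use.ACADEMIC, Identity.STRIP_IDENTIFIERS)
print(license_.badge())                 # TELL_ME/ACADEMIC/STRIP_IDENTIFIERS
print(license_.permits(Use.GOVERNMENT))  # False
```

Keeping the three choices independent mirrors how Creative Commons combines a small number of fixed options into a recognizable label, rather than letting each person write free-form terms.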

What kind of “personal information” could be licensed?

The license could be attached to any personal information the individual has gathered and displayed.  It could apply to:

[Screenshot: Fertility Forum]

Specifics of a medical condition, as shared on an online forum.


An individual’s profile information on Facebook, MySpace, or other social networking site.

An individual’s personal website and/or blog.

As these examples make clear, we’re not talking about slapping a license on “all personal information” about a person in the abstract universe, but about placing a license on specific bits of data collected and displayed by an individual online.  A set of information, a dataset, even arguably a database.  It’s an open question, what might be “licensable,” what might even be worth licensing.

Which brings us to the question, is it worth licensing information that’s already out there, in public view?  Would a license end up restricting rather than enabling more information sharing?  Why would it be useful to license information in the above examples?

All good questions that I’m going to try to address in Posts II and III…

What have we been doing?

Monday, October 19th, 2009

We’ve been silent for a while on the blog, but that’s because we’ve been distracted by actual work building out the datatrust (both the technology and the organization).

Here’s a brief rundown of what we’re doing.

Grace is multi-tasking on 3 papers.

Personal Data License: We’re conducting a thought experiment to think through what the world might look like if there were an easy way for individuals to release personal information on their own terms.

Organizational Structures: We’ve conducted a brief survey of a few organizational structures we think are interesting models for the datatrust: “trusted” entities, from banks to public libraries, and “member-based” organizations, from credit unions to Wikipedia. We tried to answer the question: What institutional structures can be practical defenses against abuses of power as the datatrust becomes a significant repository of highly sensitive personal information?

Snapshot of Publicly Available Data Sources: A cursory overview of some of the more interesting data sets that are available to the public from government agencies, to answer the question: How is the datatrust going to be different / better than the myriad data sources we already have access to today?

We also now have 2 new contributors to CDP: Tony Gibbon and Grant Baillie.

A couple of months ago, Alex wrote about a new anonymization technology coming out of Microsoft Research: PINQ. It’s an elegant, simple solution, but perhaps not the most intuitive way for most people to think about guaranteeing privacy.

Tony is working on a demonstration of PINQ in action so that you and I can see how our privacy is protected and therefore believe *that* it works. Along the way, we’re figuring out what makes intuitive sense about the way PINQ works and what doesn’t and what we’ll need to extend so that researchers using the datatrust will be able to do their work in a way that makes sense.

Grant is working on a prototype of the datatrust itself which involves working out such issues as:

  • What data schemas will we support? We think this one to begin with: Star Schema.
  • How broadly do we support query structures?
  • Managing anonymizing noise levels.

To help us answer some of these questions, we’ve gathered a list of data sources we think we’d like to support in this first iteration. (e.g. IRS tax data, Census data) (More to come on that.)
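
For readers wondering what “managing anonymizing noise levels” could mean in practice, here is a toy sketch. It assumes the differential-privacy approach that PINQ is built around, where a counting query returns the true count plus random Laplace noise and each query spends part of a fixed privacy budget. The `NoisyCounter` class and its method names are hypothetical and are not PINQ’s actual API or the datatrust’s design.

```python
import random


class NoisyCounter:
    """Toy privacy-budget tracker for counting queries (illustrative only)."""

    def __init__(self, records, budget, seed=None):
        self._records = list(records)
        self.remaining = budget          # total epsilon still available
        self._rng = random.Random(seed)

    def noisy_count(self, predicate, epsilon):
        """Return the count of matching records plus Laplace noise of scale 1/epsilon."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for record in self._records if predicate(record))
        # The difference of two exponential draws with rate epsilon
        # is a Laplace draw with scale 1/epsilon.
        noise = self._rng.expovariate(epsilon) - self._rng.expovariate(epsilon)
        return true_count + noise


ages = [23, 31, 32, 45, 52, 67]
counter = NoisyCounter(ages, budget=1.0, seed=0)
print(counter.noisy_count(lambda age: age >= 40, epsilon=0.5))
print(counter.remaining)
```

A smaller `epsilon` means more noise and stronger privacy per query, while the budget caps how many questions can be asked in total. That trade-off is the sort of thing a noise-management policy would have to settle.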

We will be blogging about all of these projects in the coming week, so stay tuned!
