Posts Tagged ‘Creative Commons’

Creative Commons-style licenses for personal information, Part III: What are the challenges?

Monday, November 30th, 2009

In the first two posts, I described how personal information licenses might work and why they might be useful in shifting the debate around how personal information is collected and used. Sharing information could be cool!  People could exercise choices!  Companies could be pressured to offer similar choices!

Unfortunately, it wouldn’t be that easy.  There are certainly obstacles and challenges to creating a system of personal information licenses for common use, which I describe below.

1) Personal information isn’t property—why do you want to propertize it?

The short answer is, we don’t. We’re well aware that there is a history of academic debate on this issue, pro and con, over whether making personal information personal property would make it easier to protect individual privacy.  Although the issues are certainly interesting, we don’t want to step into that debate and we don’t think we have to for the licenses we’re imagining.

First, let’s examine how personal information is viewed today.  I can’t own a fact.  I can’t own the fact that I’m 32, but I can have copyright in an essay in which I state I am 32 and I can have copyright in a database that includes the fact I am 32 if I’m creative in building the structure of that database (in the U.S.).  We can understand the reasoning behind this.  We want to live in a world where facts are “free” to be used and reused without any need to pay a licensing fee.

But the simple declaration, “You can’t own a fact” doesn’t begin to describe the many ways in which people are collecting data, selling it, renting it, and otherwise making money off of it.  When a company sells a mailing list, it may not “own” the fact that I live at XYZ Avenue in Brooklyn, but it certainly is using it to its advantage.  Why, then, should the fact that I can’t own the fact of where I live keep me from sharing that data as I like and trying to control it in new ways?

The digital revolution is forcing us to think beyond property/not property.  Facts have become valuable even when they’re not technically “owned” by anyone.  I haven’t come up with some snappy new terms to use, but the issue should no longer be defined solely around “property/not property.”

Some new businesses seem to be working off this model. BlueKai and KindClicks, while collecting personal information for market research, provide individuals with a way of stating their preferences and monetizing their data.  KindClicks, for example, allows individuals who contribute data to then donate the money they make off their data to the charity of their choice.  BlueKai collects data through cookies, but provides a link on its site by which users can see what information has been gathered about them. Those who want to opt out can.  Those who choose to participate in BlueKai’s registry can then choose to donate a portion of their “earnings” to charity.

We don’t actually want to model these companies’ ways of valuing data.  CDP’s mission isn’t to make sure that everyone gets a dollar here or a dollar there every time their data is accessed.  To us, the value of such information is immense to the public and yet not easily measurable in dollars.  But we do want to explore the idea that we could just take control of our data and obtain value from it, even if it’s the non-monetary, social value of providing something useful to the public.

2) How would these licenses be enforceable? What about existing terms of use on online forums, social networks?

This is a big question.  I’m not sure what kind of dataset could be licensable and the extent to which a license could cover facts within that dataset.  Could the license really encourage new forms of sharing if there was no way to prevent people from using individual facts within that dataset outside of the license terms?  How useful would such a license be?

Arguably, Creative Commons licenses are not easily enforced.  They certainly have an easier case for arguing that they are enforceable; there has been one case I know of where a court upheld the terms of the license.  But the vast majority of people using photographs, art, and other work outside the terms of the license do it with impunity.  Most CC license holders never find out their Flickr photo was used outside the license terms, and most wouldn’t have the resources to do anything about it even if they did find out.  Yet CC licenses have still managed to impact societal norms on intellectual property.

Personal information licenses may still have an effect, then, on societal norms about how information is collected and shared regardless of how much the licenses are litigated.  Even the process of litigation may help us as a society have a smarter conversation about current practices.

As to the objection that the licenses wouldn’t work in the face of existing terms of use for social networks and other sites: the fact that I might not be able to “license” information that I myself put on Facebook just underscores why creative, proactive, even aggressive strategies might be necessary.

3) Why would you encourage people to put their personal information out in public?  Isn’t it irresponsible to encourage people to provide information that could increase risk of identity theft and fraud?

I don’t want to dismiss this concern out of hand.  But as my father likes to say, everything in life has good and bad.  There are many things we do that are risky, and we try our best to minimize those risks, both as a society when we pass laws and as individuals when we take more particular, personal measures.  Driving is a very dangerous activity.  It is also a very valuable one.  Many governments have decided to legislatively require the wearing of seatbelts.  Many of us personally make the decision to practice other safe driving techniques that aren’t legally required.

We think it’s imperative that we, as a society, think hard and carefully about how to minimize the risks of personal information being used, collected, and exchanged.  Creative Commons-style licenses for personal information sharing may or may not be the best way to address today’s privacy problems.  I’m curious to hear if you think the risks outweigh the benefits and why.  But to shut down the idea solely because the risk exists — that is not going to help push the conversation forward.

CONCLUSION

Licenses for making personal information more widely available for research and public use—would they work?  Maybe, maybe not.  Worth exploring?  Most definitely.

We’d love to hear your questions and comments.

Remixing Creative Commons licenses for personal information, Part II — What good would that do?

Wednesday, November 25th, 2009

The scenarios of data sharing I outlined in my first blog post may not sound too exciting to you.  So what if one person uploads a dataset on her blog, making it public, and then says it’s available for reuse?  How does that make the world a better place?

It’s possible that although personal information licenses, a la Creative Commons, wouldn’t solve all data-collection problems today, they could shape and shift the debate in several important ways:

1) Create a proactive way for people to take control of their information.

Right now, we as users generally are told, “Take it or leave it.”  We can agree with the terms of use that govern the use of our personal information, or not. A few companies are trying to offer more choices—Firefox has a “Private Browsing” option, Google offers some choices in what interests are tracked.  But a user almost never gets a choice in how his or her information is used once it’s collected.  A set of licenses could be a way to assert control instead of waiting for the choices to be offered.  As many privacy advocates have noted, it’s problematic that most privacy choices are offered as an opt-out rather than an opt-in.  A set of licenses would create a way to “opt-in” before being asked.  Even if the licenses turned out to be difficult to enforce, if the licenses became popular and widespread, it would be harder to ignore that people do have preferences that are not being considered or honored.

2) Create a grassroots way for people to actively share their information for causes they explicitly support.

We’ve all seen campaigns that are organized around human-interest stories, true stories about real people that are meant to humanize a campaign and give it urgency.  The current healthcare debate, for example, inspired a host of organizations to ask people to “share their stories,” the Obama administration’s site being one of the best-organized ones.

It had the following “Submission Terms“:

By submitting your story, you agree that the story, along with any pictures or video you submit along with the story (the “Submission”), is non-confidential and may be freely used and disclosed, in whole or in part and in any manner or media, by or on behalf of Democratic National Committee (“DNC”) in support of health care reform.

You acknowledge that such use will be without acknowledgment or compensation to you.

You grant DNC a perpetual, irrevocable, sublicensable, royalty-free license to publish, reproduce, distribute, display, perform, adapt, create derivative works of and otherwise use the Submission.

Despite the all-or-nothing language, the Obama site was still able to solicit a great number of stories.  But the terms underscore a perennial problem for lesser-known organizations.  How do people trust an organization with their stories?

A more decentralized set of licenses could allow people to essentially tag their information across the internet and flag that it’s been provided in support of a specific cause, without giving their stories explicitly to another organization.  Individuals could also choose to tag their information in support of specific research projects.

The licenses could be an organizing tool, a way for organizations or people without established reputations to gather useful information without asking people to sign away the rights to their stories.  Or the licenses could be a research tool, enabling new forms of data collection.  Already, sociologists are exploring the possibilities of broadening research beyond the couple hundred subjects that can be managed through more traditional methods.  At Harvard, a graduate student in psychology created an iPhone application that allows research subjects in a study on happiness to rate their happiness in real time, rather than through recollection with an interviewer later.

Would the existence of standard licenses for sharing personal information make organizing around real stories easier?  Could it make personal information-based research easier?  Could it make people who support such causes or research, but are uncertain about existing privacy guarantees, more willing to try?  We think it’s certainly worth exploring.

3) Make sharing cool (and good).

Creative Commons is not without controversy, but almost everyone would agree that what the organization did manage to do was make sharing work cool.  The licenses created an easy way for people who shared the same view of intellectual property to band together and display their commitment.  They also made it easier to advertise and sell this ethos of IP to others.

We wonder if a set of licenses for sharing personal information might not be able to do the same.  We want to promote sharing information as a virtue, a civic act of generosity, and a way to enable all of us to have more information for decisions.  We want donating information to feel like donating blood.

4) Raise the bar on use of personal information in research, marketing, and other contexts.

It may seem like we’re encouraging less use and reuse of information by imagining a system where people put licenses on information they already make public (see screenshots from the first post.)  But what the licenses would make clear, which is not clear now, is that there is a difference between something being put out for the public, for general use and enjoyment, and something being put out for someone else’s reuse, gain, and potential profit.  Those who use the license would be signaling clearly their willingness to make their information available for research and other public uses.

About a year ago, researchers at the Berkman Center for Internet & Society at Harvard released a dataset of Facebook profile information for an entire class of college students at “an anonymous, northeastern American university.”  As Michael Zimmer pointed out, however, the dataset was hardly “anonymous.”  He was quickly able to deduce that the university in question was Harvard.  Although some have argued that some of these profiles were already “public,” Zimmer argues (and we agree) that having a public profile does not equal consent to being a research subject:

This leads to the second point: just because users post information on Facebook doesn’t mean they intend for it to be scraped, aggregated, coded, disected, and distributed. Creating a Facebook account and posting information on the social networking site is a decision made with the intent to engage in a social community, to connect with people, share ideas and thoughts, communicate, be human. Just because some of the profile information is publicly avaiable (either consciously by the user, or due to a failure to adjust the default privacy settings), doesn’t mean there are no expectations of privacy with the data. This is contextual integrity 101.

By creating a license that allows people to clearly signal when they do consent to being “scraped, aggregated, coded, dissected, and distributed,” we would also make clearer that when people don’t clearly signal their consent, that consent cannot be assumed.

5) Ultimately create new scenarios in which licenses can be used.

So far, the scenarios I’ve outlined in which a license could be applied are where information is being displayed openly, as on a website.  But the licenses could eventually apply to more closed systems, where the individual’s decision to share data is not itself public.

CDP is working on building a datatrust, a new kind of institution and trusted entity to store sensitive, personal information and make it publicly accessible for research.  Individuals and institutions could choose to donate data to the datatrust, knowing that they are contributing to public knowledge on a range of issues.  CDP will likely use a system of licenses that allow each data donor to pre-determine his or her preferences on how their data is accessed rather than a single “terms of use” that applies to everyone, take it or leave it.

Similarly, if the licenses were to become popular, other organizations and companies that collect information from their members or account holders would be under pressure to offer these set choices or licenses when people sign up for accounts that require them to provide personal information.

Taxonomy of data

Thursday, November 19th, 2009

I haven’t yet posted Parts II and III of our series on the idea of creating Creative Commons-type sharing licenses for personal information, but Bruce Schneier posted today on a proposed taxonomy of data, and I thought it was worth sharing now.  Although the taxonomy he’s discussing is limited to social networking data, it’s a helpful way to understand why it’s so hard to come up with rules around personal information in general.

Here is his taxonomy on social networking data:

  1. Service data. Service data is the data you need to give to a social networking site in order to use it. It might include your legal name, your age, and your credit card number.
  2. Disclosed data. This is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
  3. Entrusted data. This is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data — someone else does.
  4. Incidental data. Incidental data is data the other people post about you. Again, it’s basically the same stuff as disclosed data, but the difference is that 1) you don’t have control over it, and 2) you didn’t create it in the first place.
  5. Behavioral data. This is data that the site collects about your habits by recording what you do and who you do it with.

As I noted in my first license blog post, our idea is focusing strictly on “disclosed data,” data an individual actively chooses to release.  It doesn’t address the messiness around how the other types of data are being used and reused, except in that we hope explicitly talking about individual preferences around “disclosed data” can help all of us understand what really matters to people (and what doesn’t) when they talk about the need for privacy around other forms of data.
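For readers who think in code, here is a rough sketch of that scoping. It simply encodes the five categories from the taxonomy above and marks the one our license idea is meant to cover. The names `DataCategory` and `in_license_scope` are made up for illustration; they are not part of Schneier’s post or of any CDP system.

```python
from enum import Enum


class DataCategory(Enum):
    """The five categories from the taxonomy quoted above."""
    SERVICE = "data you give a site in order to use it"
    DISCLOSED = "what you post on your own pages"
    ENTRUSTED = "what you post on other people's pages"
    INCIDENTAL = "what other people post about you"
    BEHAVIORAL = "what the site records about your habits"


def in_license_scope(category: DataCategory) -> bool:
    # Assumption for illustration: only data an individual actively
    # chooses to release falls under the licenses discussed here.
    return category is DataCategory.DISCLOSED


# List which categories the (hypothetical) licenses would cover.
covered = [c.name for c in DataCategory if in_license_scope(c)]
print(covered)
```

The other four categories are left out on purpose, not because they don’t matter, but because the individual isn’t the one deciding to release them.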

Remixing Creative Commons licenses for personal information, Part I

Wednesday, November 18th, 2009

Creative Commons, in creating its licenses, did a very sexy thing.  It didn’t repeal the Sonny Bono Copyright Term Extension Act, it didn’t change technology.  Yet it managed to shift the social norm around intellectual property.  It’s now cool to share.  And they did this, not by forcing people to give up their rights, but by offering a set of choices by which those rights can be exercised in a way that encourages collaboration and ultimately benefits the public.

Imitation being the sincerest form of flattery, we at CDP have been playing around with the idea of creating personal information licenses, a la Creative Commons. Right now, we live in a paradoxical world where 1) people have little control over how their information is used and reused, and 2) lots of valuable, fascinating raw data is locked up because of the danger of violating privacy.  Big corporations get a lot of value out of their data-mining; researchers and regular individuals, not so much.  Modern privacy problems aren’t exactly analogous to modern intellectual property problems, but we think Creative Commons-type licenses could have a lot to offer in addressing these two issues.  We’re certainly not the first to think along these lines, but we want to add our voice to the ongoing discussion.

Over the next couple of posts, I’m going to lay out how such licenses might work, the scenarios in which people might choose to license their personal information, what such licenses could accomplish, and the challenges and obstacles such licenses would face.

What choices would the licenses offer?

Imagine a set of licenses with a specific, pre-determined set of choices.  Anyone who wants to signal their willingness to make their personal information available to the public could choose among these licenses and display their choice prominently, wherever their information is provided, whether it’s an online forum, a social network, or even a personal website or blog.

The choices could include the following:

A)   NOTIFICATION:

  1. First ask my permission before using the information.
  2. Tell me that you are going to use my information.
  3. I don’t care.

B)   COMMERCIAL/NON-COMMERCIAL USE:

  1. I’m okay with non-commercial academic use for research and/or publication.
  2. I’m okay with non-commercial governmental use.
  3. I’m okay with all uses.

C)   LEVEL OF PRIVACY:

  1. If I’ve provided any of this information, strip my information of classic identifiers (as enumerated, most likely, name, email address, etc.), though with no guarantee that this equals “anonymous.”
  2. If I have not provided any identifiers, do not try to re-identify me.
  3. [intermediary option of better anonymization, should the technology develop]
  4. I don’t care.
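To make the idea of a “pre-determined set of choices” a little more concrete, here is a minimal sketch of how one such license could be represented as data. Everything in it is an assumption for illustration: the class and member names are invented, the placeholder option C3 is left out, and I’ve read the use options narrowly, so that picking “academic” permits academic use only. It is not a specification of any real license.

```python
from dataclasses import dataclass
from enum import Enum


class Notification(Enum):
    ASK_FIRST = "ask my permission before using the information"
    TELL_ME = "tell me that you are going to use my information"
    DONT_CARE = "no notification needed"


class Purpose(Enum):
    ACADEMIC = "non-commercial academic use"
    GOVERNMENT = "non-commercial governmental use"
    COMMERCIAL = "commercial use"


class AllowedUse(Enum):
    ACADEMIC_ONLY = "academic"
    GOVERNMENT_ONLY = "government"
    ALL_USES = "all"


class Privacy(Enum):
    STRIP_IDENTIFIERS = "strip classic identifiers before use"
    NO_REIDENTIFICATION = "do not try to re-identify me"
    DONT_CARE = "no privacy condition"


@dataclass(frozen=True)
class PersonalInfoLicense:
    """One pick from each of the three groups A, B, and C (hypothetical)."""
    notification: Notification
    allowed_use: AllowedUse
    privacy: Privacy

    def permits(self, purpose: Purpose) -> bool:
        # Narrow reading (an assumption): each option permits only the
        # purpose it names, except ALL_USES, which permits everything.
        if self.allowed_use is AllowedUse.ALL_USES:
            return True
        if self.allowed_use is AllowedUse.ACADEMIC_ONLY:
            return purpose is Purpose.ACADEMIC
        return purpose is Purpose.GOVERNMENT

    def obligations(self) -> list:
        # Conditions a user of the data would have to honor.
        duties = []
        if self.notification is not Notification.DONT_CARE:
            duties.append(self.notification.value)
        if self.privacy is not Privacy.DONT_CARE:
            duties.append(self.privacy.value)
        return duties


example = PersonalInfoLicense(
    Notification.TELL_ME, AllowedUse.ACADEMIC_ONLY, Privacy.STRIP_IDENTIFIERS
)
print(example.permits(Purpose.ACADEMIC), example.permits(Purpose.COMMERCIAL))
print(example.obligations())
```

The point of the sketch is only that a small, fixed menu like this is easy for both people and software to read, which is a large part of what made Creative Commons licenses spread.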

What kind of “personal information” could be licensed?

The license could be attached to any personal information the individual has gathered and displayed.  It could apply to:

Specifics of a medical condition, as shared on an online forum.

An individual’s profile information on Facebook, MySpace, or another social networking site.

An individual’s personal website and/or blog.

As these examples make clear, we’re not talking about slapping a license on “all personal information” about a person in the abstract universe, but about placing a license on specific bits of data collected and displayed by an individual online.  A set of information, a dataset, even arguably a database.  It’s an open question, what might be “licensable,” what might even be worth licensing.

Which brings us to the question, is it worth licensing information that’s already out there, in public view?  Would a license end up restricting rather than enabling more information sharing?  Why would it be useful to license information in the above examples?

All good questions that I’m going to try to address in Posts II and III…

Intellectual property and privacy: where they intersect

Wednesday, May 27th, 2009

Creative Commons recently launched a Creative Commons license application for Facebook.  You can now download a Creative Commons license and declare your willingness to share your photos, videos, and even status updates.

For those who aren’t intellectual property geeks, Creative Commons is a nonprofit organization that’s worked hard to change the norm around intellectual property law.  In a world where music companies crack down on kids singing copyrighted songs in YouTube videos, Creative Commons has made sharing of intellectual property something the cool kids want to do.  It’s created different licenses by which you can indicate how you’re willing for your work to be used, as long as it’s attributed to you.  Do a search for “Creative Commons” on Flickr, and you’ll see there are lots of people who want to increase what’s in the public domain.

This news is fascinating to me for a couple of reasons.  First, as ReadWriteWeb points out, how will the Creative Commons licenses interact with Facebook’s terms of use, by which Facebook has “a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any IP content that you post on or in connection with Facebook”?  Right now, you can control who you share with within Facebook.  Facebook, however, controls how your information is shared outside of Facebook, whether with marketers and advertisers or with researchers.  Putting a Creative Commons license on your profile is kind of an awesome way to reassert control, even if it’s not yet clear what that will mean.

Second, it’s a good reminder that even if Facebook is all about sharing, there’s a difference between posting something on Facebook and putting something into the public domain. It might seem counterintuitive that anyone would put a “sharing” license on what they post on Facebook.  But even if you have your profile public for all the world to see, you haven’t actually given permission for anything on your profile to be used anywhere else.

Which leads me to my last point: this new license for Facebook could have some fascinating implications for our debates about online privacy.

We at the Common Data Project are also interested in how a license could help signal sharing preferences, but for personal information rather than intellectual property.   In the spirit of Creative Commons, we want to create ways for people to clearly indicate their willingness for certain personal data to be put into a “common” pool, where it is anonymized and aggregated to individual specifications before being made available to the public for research, analysis, and other public uses.   People generally don’t “own” their personal information the way they own their car or the photos they take, but we want to both 1) make sharing information for public good a cool thing, and 2) create a new way to signal privacy preferences.

And as this new Creative Commons license for Facebook makes clear, the line between intellectual property and personal information is becoming increasingly blurred.  The kind of licensing system we’re imagining will have different specifications than the Creative Commons licenses, but it’s great to see movement in similar directions and I’m curious to see how our work will continue to intersect.

