The scenarios of data sharing I outlined in my first blog post may not sound too exciting to you. So what if one person uploads a dataset on her blog, making it public, and then says it’s available for reuse? How does that make the world a better place?
Although personal information licenses, à la Creative Commons, wouldn’t solve all of today’s data-collection problems, they could shape and shift the debate in several important ways:
1) Create a proactive way for people to take control of their information.
2) Create a grassroots way for people to actively share their information for causes they explicitly support.
We’ve all seen campaigns that are organized around human-interest stories, true stories about real people that are meant to humanize a campaign and give it urgency. The current healthcare debate, for example, inspired a host of organizations to ask people to “share their stories,” the Obama administration’s site being one of the best-organized ones.
It had the following “Submission Terms”:
By submitting your story, you agree that the story, along with any pictures or video you submit along with the story (the “Submission”), is non-confidential and may be freely used and disclosed, in whole or in part and in any manner or media, by or on behalf of Democratic National Committee (“DNC”) in support of health care reform.
You acknowledge that such use will be without acknowledgment or compensation to you.
You grant DNC a perpetual, irrevocable, sublicensable, royalty-free license to publish, reproduce, distribute, display, perform, adapt, create derivative works of and otherwise use the Submission.
Despite the all-or-nothing language, the Obama site was still able to solicit a great number of stories. But the terms underscore a perennial problem for lesser-known organizations. How do people trust an organization with their stories?
A more decentralized set of licenses could allow people to essentially tag their information across the internet and flag that it’s been provided in support of a specific cause, without giving their stories explicitly to another organization. Individuals could also choose to tag their information in support of specific research projects.
The licenses could be an organizing tool, a way for organizations or people without established reputations to gather useful information without asking people to sign away the rights to their stories. Or the licenses could be a research tool, enabling new forms of data collection. Already, sociologists are exploring the possibilities of broadening research beyond the couple hundred subjects that can be managed through more traditional methods. At Harvard, a graduate student in psychology created an iPhone application that allows research subjects in a study on happiness to rate their happiness in real time, rather than through recollection with an interviewer later.
Would the existence of standard licenses for sharing personal information make organizing around real stories easier? Could it make personal information-based research easier? Could it make people who support such causes or research, but are uncertain about existing privacy guarantees, more willing to try? We think it’s certainly worth exploring.
3) Make sharing cool (and good).
Creative Commons is not without controversy, but almost everyone would agree that the organization did manage to make sharing work cool. The licenses created an easy way for people who shared the same view of intellectual property to band together and display their commitment. They also made it easier to advertise and sell this ethos of IP to others.
We wonder if a set of licenses for sharing personal information might not be able to do the same. We want to promote sharing information as a virtue, a civic act of generosity, and a way to enable all of us to have more information for decisions. We want donating information to feel like donating blood.
4) Raise the bar on use of personal information in research, marketing, and other contexts.
It may seem like we’re encouraging less use and reuse of information by imagining a system where people put licenses on information they already make public (see the screenshots from the first post). But what the licenses would make clear, which is not clear now, is that there is a difference between something being put out for the public, for general use and enjoyment, and something being put out for someone else’s reuse, gain, and potential profit. Those who use the license would be signaling clearly their willingness to make their information available for research and other public uses.
About a year ago, researchers at the Berkman Center for Internet & Society at Harvard released a dataset of Facebook profile information for an entire class of college students at “an anonymous, northeastern American university.” As Michael Zimmer pointed out, however, the dataset was hardly “anonymous.” He was quickly able to deduce that the university in question was Harvard. Although some have argued that some of these profiles were already “public,” Zimmer argues (and we agree) that having a public profile does not equal consent to being a research subject:
This leads to the second point: just because users post information on Facebook doesn’t mean they intend for it to be scraped, aggregated, coded, dissected, and distributed. Creating a Facebook account and posting information on the social networking site is a decision made with the intent to engage in a social community, to connect with people, share ideas and thoughts, communicate, be human. Just because some of the profile information is publicly available (either consciously by the user, or due to a failure to adjust the default privacy settings), doesn’t mean there are no expectations of privacy with the data. This is contextual integrity 101.
By creating a license that allows people to clearly signal when they do consent to being “scraped, aggregated, coded, dissected, and distributed,” we would also make clearer that when people don’t clearly signal their consent, that consent cannot be assumed.
5) Ultimately create new scenarios in which licenses can be used.
So far, the scenarios I’ve outlined in which a license could be applied are ones where information is displayed openly, as on a website. But the licenses could eventually apply to more closed systems, where the individual’s decision to share data is not itself public.
Similarly, if the licenses were to become popular, other organizations and companies that collect information from their members or account holders would be under pressure to offer the same standard choices or licenses when people sign up for accounts that require them to provide personal information.