Posts Tagged ‘Privacy’

Taxonomy of data

Thursday, November 19th, 2009

I haven’t yet posted Parts II and III of our series on the idea of creating Creative Commons-type sharing licenses for personal information, but Bruce Schneier posted today on a proposed taxonomy of data, and I thought it was worth sharing now.  Although the taxonomy he’s discussing is limited to social networking data, it’s a helpful way to understand why it’s so hard to come up with rules around personal information in general.

Here is his taxonomy on social networking data:

  1. Service data. Service data is the data you need to give to a social networking site in order to use it. It might include your legal name, your age, and your credit card number.
  2. Disclosed data. This is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
  3. Entrusted data. This is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data — someone else does.
  4. Incidental data. Incidental data is data that other people post about you. Again, it’s basically the same stuff as disclosed data, but the difference is that 1) you don’t have control over it, and 2) you didn’t create it in the first place.
  5. Behavioral data. This is data that the site collects about your habits by recording what you do and who you do it with.

As I noted in my first license blog post, our idea is focusing strictly on “disclosed data,” data an individual actively chooses to release.  It doesn’t address the messiness around how the other types of data are being used and reused, except in that we hope explicitly talking about individual preferences around “disclosed data” can help all of us understand what really matters to people (and what doesn’t) when they talk about the need for privacy around other forms of data.

Remixing Creative Commons licenses for personal information, Part I

Wednesday, November 18th, 2009

Creative Commons, in creating its licenses, did a very sexy thing.  It didn’t repeal the Sonny Bono Copyright Term Extension Act, it didn’t change technology.  Yet it managed to shift the social norm around intellectual property.  It’s now cool to share.  And they did this, not by forcing people to give up their rights, but by offering a set of choices by which those rights can be exercised in a way that encourages collaboration and ultimately benefits the public.

Imitation being the sincerest form of flattery, we at CDP have been playing around with the idea of creating personal information licenses, a la Creative Commons. Right now, we live in a paradoxical world where 1) people have little control over how their information is used and reused, and 2) lots of valuable, fascinating raw data is locked up because of the danger of violating privacy.  Big corporations get a lot of value out of their data-mining; researchers and regular individuals, not so much.  Modern privacy problems aren’t exactly analogous to modern intellectual property problems, but we think Creative Commons-type licenses could have a lot to offer in addressing these two issues.  We’re certainly not the first to think along these lines, but we want to add our voice to the ongoing discussion.

Over the next couple of posts, I’m going to lay out how such licenses might work, the scenarios in which people might choose to license their personal information, what such licenses could accomplish, and the challenges and obstacles such licenses would face.

What choices would the licenses offer?

Imagine a set of licenses with a specific, pre-determined set of choices.  Anyone who wants to signal their willingness to make their personal information available to the public could choose one of these licenses and display it prominently wherever their information is provided, whether it’s an online forum, a social network, or even a personal website or blog.

The choices could include the following:

A)   NOTIFICATION:

  1. First ask my permission before using the information.
  2. Tell me that you are going to use my information.
  3. I don’t care.

B)   COMMERCIAL/NON-COMMERCIAL USE:

  1. I’m okay with non-commercial academic use for research and/or publication.
  2. I’m okay with non-commercial governmental use.
  3. I’m okay with all uses.

C)   LEVEL OF PRIVACY

  1. If I’ve provided any classic identifiers (to be enumerated; most likely name, email address, etc.), strip them from my information, though with no guarantee that this equals “anonymous.”
  2. If I have not provided any identifiers, do not try to re-identify me.
  3. [intermediary option of better anonymization, should the technology develop]
  4. I don’t care.
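To make the idea of a "pre-determined set of choices" concrete, here is a minimal sketch of how one such license could be represented in machine-readable form. Everything in it is hypothetical: the class names, the short codes like "A2", and the `permits` rule are our own illustration, not a spec. It also assumes a person picks exactly one option from each of A, B, and C.

```python
from dataclasses import dataclass
from enum import Enum


class Notification(Enum):
    ASK_PERMISSION = "A1"  # first ask my permission
    TELL_ME = "A2"         # tell me you are going to use my information
    DONT_CARE = "A3"


class Use(Enum):
    ACADEMIC = "B1"        # non-commercial academic use
    GOVERNMENT = "B2"      # non-commercial governmental use
    ALL = "B3"             # all uses


class Privacy(Enum):
    STRIP_IDENTIFIERS = "C1"
    NO_REIDENTIFICATION = "C2"
    DONT_CARE = "C4"       # C3 is left open in the list above, so it is omitted here


@dataclass(frozen=True)
class PersonalDataLicense:
    """One choice from each category (an assumption of this sketch)."""
    notification: Notification
    use: Use
    privacy: Privacy

    def code(self) -> str:
        # Short label a site could display next to the licensed information.
        return "-".join([self.notification.value, self.use.value, self.privacy.value])

    def permits(self, requested: Use) -> bool:
        # Hypothetical rule: "all uses" covers everything, otherwise exact match.
        return self.use is Use.ALL or self.use is requested


license_ = PersonalDataLicense(Notification.TELL_ME, Use.ACADEMIC, Privacy.STRIP_IDENTIFIERS)
print(license_.code())  # prints "A2-B1-C1"
```

A short code like this is only a label; what each choice obliges a data user to do would still have to be spelled out in the license text itself.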

What kind of “personal information” could be licensed?

The license could be attached to any personal information the individual has gathered and displayed.  It could apply to:

  • Specifics of a medical condition, as shared on an online forum.
  • An individual’s profile information on Facebook, MySpace, or other social networking site.
  • An individual’s personal website and/or blog.

As these examples make clear, we’re not talking about slapping a license on “all personal information” about a person in the abstract universe, but about placing a license on specific bits of data collected and displayed by an individual online.  A set of information, a dataset, even arguably a database.  It’s an open question, what might be “licensable,” what might even be worth licensing.

Which brings us to the question, is it worth licensing information that’s already out there, in public view?  Would a license end up restricting rather than enabling more information sharing?  Why would it be useful to license information in the above examples?

All good questions that I’m going to try to address in Posts II and III…

Geeks Go Shopping

Tuesday, November 17th, 2009

Web curator Jason Kottke shares the items his readers bought after clicking on Amazon links he posted.

Weird that Amazon makes this information available. But good, I suppose, that it’s anonymous.

Double weird: people are still buying VHS. And Amazon is still selling it.

From Star Wars to Jedi – Making of a Saga [VHS]

In the mix

Friday, November 6th, 2009

Cuil’s Famous Privacy Policy No Longer Protects Privacy (michaelzimmer.org)

Google’s Privacy Dashboard Doesn’t Tell Us Anything We Didn’t Know Before (ReadWriteWeb)

What have we been doing?

Monday, October 19th, 2009

We’ve been silent for a while on the blog, but that’s because we’ve been distracted by actual work building out the datatrust (both the technology and the organization).

Here’s a brief rundown of what we’re doing.

Grace is multi-tasking on 3 papers.

Personal Data License: We’re conducting a thought experiment to think through what the world might look like if there were an easy way for individuals to release personal information on their own terms.

Organizational Structures: We’ve conducted a brief survey of a few organizational structures we think are interesting models for the datatrust: “trusted” entities from banks to public libraries, and “member-based” organizations from credit unions to Wikipedia. We tried to answer the question: What institutional structures can be practical defenses against abuses of power as the datatrust becomes a significant repository of highly sensitive personal information?

Snapshot of Publicly Available Data Sources: A cursory overview of some of the more interesting data sets that are available to the public from government agencies, to answer the question: How is the datatrust going to be different from / better than the myriad data sources we already have access to today?

We also now have 2 new contributors to CDP: Tony Gibbon and Grant Baillie.

A couple of months ago, Alex wrote about a new anonymization technology coming out of Microsoft Research: PINQ. It’s an elegant, simple solution, but perhaps not the most intuitive way for most people to think about guaranteeing privacy.

Tony is working on a demonstration of PINQ in action so that you and I can see how our privacy is protected and therefore believe *that* it works. Along the way, we’re figuring out what makes intuitive sense about the way PINQ works and what doesn’t and what we’ll need to extend so that researchers using the datatrust will be able to do their work in a way that makes sense.

Grant is working on a prototype of the datatrust itself which involves working out such issues as:

  • What data schemas will we support? We think this one to begin with: Star Schema.
  • How broadly do we support query structures?
  • Managing anonymizing noise levels.
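For readers wondering what "anonymizing noise" means in practice, here is a minimal sketch of the general idea behind PINQ, which is built on differential privacy: answer a counting query with the true count plus random Laplace noise. This is not PINQ's API (PINQ is a C#/LINQ library) and not our prototype; the function name `noisy_count` and its parameters are made up for illustration.

```python
import random


def noisy_count(records, predicate, epsilon, rng=random):
    """Count matching records, then add Laplace noise of scale 1/epsilon.

    Smaller epsilon means more noise (more privacy, less accuracy).
    """
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale for a count is 1 / epsilon.
    scale = 1.0 / epsilon
    # The difference of two exponential draws is a Laplace draw centered at 0.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical data: ages of 100 people; ask how many are under 40.
ages = list(range(100))
print(noisy_count(ages, lambda age: age < 40, epsilon=0.5))
```

Each run returns a slightly different answer near 40. Managing noise levels is then largely a matter of choosing epsilon and keeping track of how much of it each researcher's queries have used up.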

To help us answer some of these questions, we’ve gathered a list of data sources we think we’d like to support in this first iteration (e.g. IRS tax data, Census data). More to come on that.

We will be blogging about all of these projects in the coming week, so stay tuned!

What does it take to be an IAPP-certified privacy professional? What should it take?

Wednesday, September 9th, 2009

UPDATE: I recently was referred to this thoughtful blog post on a similar topic, “Nurturing an Accountable Privacy Profession.” Well worth a read.

A few weeks ago, I was very relieved to find out I had passed the IAPP exam to be a “Certified Information Privacy Professional” or CIPP.  I got this certificate and even a pin, which is more than I ever got for passing the bar exams of New York and California.

So what exactly did I need to know to become a CIPP?

To be certified in corporate privacy law, you’re expected to know what’s covered in the CIPP Body of Knowledge, primarily major U.S. privacy laws and regulations and “the legal requirements for the responsible transfer of sensitive personal data to/from the United States, the European Union and other jurisdictions.”

You’re also expected to pass the Certification Foundation, required for all three certifications offered by IAPP.  That covers basic privacy law, both in the U.S. and abroad, information security principles and practices, and “online privacy,” which includes an overview of the technologies used by online companies to collect information and the particular issues to be considered in this context.

So what do you think?  Should you be able to pass an all-objective, 180-question, three-hour exam (counting the CIPP and Certification Foundation exams together) on the above topics and be able to call yourself a “privacy professional”?

There are no sample questions available online, and I was too cheap to take a prep course, but if I remember correctly, a typical question on the exam went something like this:

The Gramm-Leach-Bliley Act authorizes financial institutions to share consumer information with third parties if:

a. The information is not personally identifiable.

b. The consumer is informed and given the opportunity to opt-out.

c.  Any information without notice if it is shared with affiliated companies.

d.  All of the above.

The answer would be “C,” since the consumer is only required to be given notice if the third party is “non-affiliated.”  My sample is poorly constructed, and there are also questions that require you to analyze a fact pattern, but essentially, the exam covers existing laws, practices, and technologies.

It doesn’t ever ask you, “What would you do if you were advising RealAge and they told you they wanted to sell answers from a health questionnaire to pharmaceutical companies?”  Or, “Is Facebook doing enough to prevent third parties from misusing images of Facebook members in their ads?”

IAPP presumably doesn’t ask you these questions because there’s no “objectively” right answer.  There may, one day, be an objectively legal answer, depending on if and when legislation gets passed.  Still, it’s obvious that in the field of privacy, the most interesting aspects are not what laws do exist, but what laws should exist, what practices should be used, what innovations, both technological and social, should be promoted to protect privacy in meaningful ways.  But the exam only covers what is, not what could be or what should be.

Privacy may be an ancient concept, but it’s a very modern, very new, very undefined profession, which perhaps is even more reason for the IAPP to exist.  We as a society, particularly in the U.S., are struggling to figure out what privacy means and what we need to do to protect it.  While the medical profession has the Hippocratic Oath dating back to the 4th century B.C., and the legal profession’s adherence to the concept of attorney-client privilege goes back at least as far as the 16th century, the privacy profession has no clear guiding principle.  We don’t know yet what it should be.

I’m not really criticizing the IAPP for having a test that doesn’t quite encompass the dynamic, constantly changing field of privacy.  It’s not like other professions do better.  The bar exam certainly doesn’t screen out incompetent, unethical people from practicing law, even if you are actually required to pass an ethics exam.  And the IAPP does provide resources to its members for tracking changes in privacy law and policy.  But I’m curious to see where the IAPP goes as it tries to “professionalize” the profession, whether the certification exam will change and what expectations will be set for IAPP-certified privacy professionals.  Perhaps in another 100 years, or hopefully sooner, we’ll have a code of conduct for privacy professionals.

In the mix

Wednesday, September 9th, 2009

OpenID Pilot Program to be Announced by U.S. Government (ReadWriteWeb)

Stimulus Funding Map is “Slick as Hell” (FlowingData)

Why Anonymized Data Isn’t (Slashdot)

In the mix

Thursday, August 27th, 2009

What Facebook Quizzes Know About You (ReadWriteWeb)

Facebook Ratchets Up Privacy Controls (Again)

Ole Miss to Tweet Its Watts (CNET News)

In the mix

Friday, August 14th, 2009

Google Opt Out Feature Allows Users Protect Privacy by Moving to Remote Village (The Onion)

Privacy Plan for Federal Websites Gets Mixed Reviews (NY Times)

Welcome to our guided tour of online privacy policies!

Tuesday, July 21st, 2009

We’ve just published our report, “How to Read a Privacy Policy” on our website.  You may have seen some of the blog posts we wrote summarizing each section, but you can now find all the sections together here.

There are other privacy policy analyses out there: Privacy International’s 2007 report, which describes the privacy practices of major companies in general, and Know Privacy, a research project created by students from the UC Berkeley School of Information that compares user expectations with today’s data collection methods for policymakers and website operators.

But we thought there was room for one more, one that takes the web user on a guided tour of that inscrutable document, the online privacy policy, and explains what issues she should keep in mind.  We walk through the privacy policies of companies like Microsoft, Google, Yahoo, and Amazon, as of June 2009, and explain what they’re promising and what they’re not.

A quick visual of how privacy policies stack up next to each other, literally.

Questions or comments?  Please let us know!
