Archive for the ‘Best Practices’ Category

Whitepaper 2.0: A moral and practical argument for public access to private data.

Monday, April 4th, 2011

It’s here! The Common Data Project’s White Paper version 2.0.

This is our most comprehensive moral and practical argument to date for the creation of a public datatrust that provides public access to today’s growing store of sensitive personal information.

At this point, there can be no doubt that sensitive personal data, in aggregate, is and will continue to be an invaluable resource for commerce and society. Today, however, the private sector holds a near monopoly on such data. We believe it is time that We, The People gain access to our own data: access that will enable researchers, policymakers and NGOs acting in the public interest to make decisions in the same data-informed ways businesses have for decades.

Access to sensitive personal information will be the next “Digital Divide” and our work is perhaps best described as an effort to bridge that gap.

Still, we recognize that there are many hurdles to overcome. Currently, highly valuable data, from online behavioral data to personal financial and medical records, is siloed and, in the name of privacy, inaccessible. Valuable data is kept out of the reach of the public and is in many cases unavailable even to the businesses, organizations and government agencies that collected it in the first place. Many of these data holders have business reasons or public mandates to share the data they hold, but can’t, or can only do so in a severely limited manner and through a time-consuming process.

We believe there are technological and policy solutions that can remedy this situation and our white paper attempts to sketch out these solutions in the form of a “datatrust.”

We set out to answer the major questions and open issues that challenge the viability of the datatrust idea.

  1. Is public access to sensitive personal information really necessary?
  2. If it is, why isn’t this already a solved problem?
  3. How can you open up sensitive data to the public without harming the individuals represented in that data?
  4. How can any organization be trusted to hold such sensitive data?
  5. Assuming this is possible and there is public will to pull it off, will such data be useful?
  6. All existing anonymization methodologies degrade the utility of data; how will the datatrust strike a balance between utility and privacy?
  7. How will the data be collated, managed and curated into a usable form?
  8. How will the quality of the data be evaluated and maintained?
  9. Who has a stake in the datatrust?
  10. The datatrust’s purported mission is to serve the interests of society; will you and I, as members of society, have a say in how the datatrust is run?

You can read the full paper here.

Comments, reactions and feedback are all welcome. You can post your thoughts here or write us directly at info at commondataproject dot org.

@IAPP Privacy Foo Camp 2010: What Is Anonymous Enough?

Tuesday, October 26th, 2010

Editor’s Note: Becky Pezely is an independent contractor for Shan Gao Ma, a consulting company started by Alex Selkirk, President of the Board of the Common Data Project.  Becky’s work, like Tony’s, touches on many of the privacy challenges that CDP hopes to address with the datatrust.  We’re happy to have her guest blogging about IAPP Academy 2010 here.

Several weeks ago we attended the 2010 Global Privacy Summit (IAPP 2010) in Baltimore, Maryland.   

In addition to some engaging high-profile keynotes – including FTC Bureau of Consumer Protection Director David Vladeck – we got to participate in the first ever IAPP Foo Camp.

The Foo Camp consisted of four discussion topics aimed at covering the top technology concerns facing a wide range of privacy professionals.

The session we ran was titled “Low Impact Data Mining”.  The intention was to discuss, and better understand, the current challenges in managing data within an organization, with a lens on managing data in a way that is “low impact” on resources while returning “high (positive) impact” for the business.

The individuals in our group represented a vast array of industries including financial services, insurance, pharmaceuticals, law enforcement, online marketing, health care, retail and telecommunications.  It was fascinating that, even across such a wide range of industries, there could be such a pervasive set of common privacy challenges.

Starting with:

What is “anonymous enough”?

If all it takes is gender, zip code and birthdate to re-identify someone, then what data, when released, is truly “anonymous enough”?  Can a baseline be defined, and enforced, within an organization that ensures customer protection?
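
The mechanics behind that question are easy to see in code.  Here is a minimal, purely hypothetical sketch (invented records, plain Python, not anything shared at the session) of how a “sanitized” release can be re-identified by joining it against a public dataset on exactly those three fields, the combination Latanya Sweeney famously showed is unique for a large majority of the U.S. population:

    # Hypothetical illustration of re-identification via quasi-identifiers.
    # Both datasets below are invented for this example.

    # A "sanitized" release: names stripped, quasi-identifiers left in.
    released_records = [
        {"zip": "21201", "birthdate": "1975-03-14", "gender": "F", "diagnosis": "asthma"},
        {"zip": "21230", "birthdate": "1982-11-02", "gender": "M", "diagnosis": "diabetes"},
    ]

    # A public dataset (say, a voter roll) with the same fields plus names.
    public_roll = [
        {"name": "Jane Doe", "zip": "21201", "birthdate": "1975-03-14", "gender": "F"},
        {"name": "John Roe", "zip": "21230", "birthdate": "1982-11-02", "gender": "M"},
    ]

    def quasi_key(record):
        # Zip code + birthdate + gender: often enough to single a person out.
        return (record["zip"], record["birthdate"], record["gender"])

    identities = {quasi_key(p): p["name"] for p in public_roll}

    for r in released_records:
        name = identities.get(quasi_key(r))
        if name:
            print(name, "->", r["diagnosis"])  # the "anonymous" record is re-identified

No noise, no cleverness, just a dictionary lookup, which is why “we removed the names” is not, by itself, a privacy guarantee.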

It feels safe to say that this was the root challenge from which all the others stemmed.  Today the release of data is mostly controlled, and subsequently managed, by one or more trusted people.  These individuals are responsible for “sanitizing” the data that gets released inside or outside the organization, and they manage those releases for everything from understanding business performance to fulfilling business obligations with partners.  Their primary concern is knowing how well they are protecting their customers’ information, not only from the perspective of company policy, but also from the perspective of personal morals.  They are the gatekeepers who assess the level of protection provided based on which data they release to whom, and they want some guarantee that what they release is “anonymous enough” to achieve the level of protection they intend.  In short, these gatekeepers want to know when the data they release is “anonymous enough,” and how they can employ a definition, or measurement, that guarantees the right level of anonymity for their customers.

This challenge compounds for these individuals, and their organizations, when various other realities of how data behaves today are added in:

The silos are getting joined.

The old convention was that data within an organization sat in a silo – all on its own and protected – such that anyone looking at the data would only see that one data set.  Now, it’s becoming the reality that these data sets are getting joined, and it’s not always known where, when, how, or with whom the join originated.  Nor is it known where the joined data set is currently stored, since it has been modified from its original silo.  Soon that joined data set takes on a life of its own and makes its way around the institution.  Given the likelihood of this occurring, how can the people responsible for being the gatekeepers of the data, and for assessing the level of protection provided to customers, do so with any kind of reliable measurement that guarantees the right level of anonymity?

And now there’s data in the public market.

Not only is data joined with data from other silos within the organization, but also with data from outside the organization, sold on the public market.  This has increased organizations’ ability to produce data that is “high impact” for the business – because they now know WAY MORE about their customers.  But does the benefit outweigh the liability?  As the ability to know more about individual customers increases, so does the level of sensitivity and the concern for privacy.  How do organizations successfully navigate mounting privacy concerns as they move from data in silos, to joined silos, to joined silos combined with public data?

The line between “data analytics” and looking at “raw data” is blurring.

Because the data is richer and more plentiful, the act of data analysis isn’t as benign as it might once have been.  The definition of “data analytics” has evolved from something high-level (knowing, for example, how many new customers used the service this quarter) to something that looks a lot more like working with raw data to target specific parts of the business to specific customers (for example, selling <these products> to customers who make <this much money>, are women ages 30–35, live in <this neighborhood> and typically spend <this much> on <these types of products>, etc.).

And the data has different ways of exiting the system.

The truth is, as scary as this data can be, everyone wants to get their hands on it, because the data leads to awareness that is meaningful and valuable for the business.  Thus, the data is shared everywhere – inside and outside the organization.  With that fact comes a whole set of challenges around all the ways data might be exiting any given “silo”: Where is all the data going?  How is it getting modified (joined, sanitized, rejoined), and at which point is it no longer data that needs to be protected by the organization?  How much data needs to be released externally to fulfill partner and customer business obligations?  Once the data has exited, can the organization’s privacy practices still be enforced?

Brand affects privacy policy.  Privacy policy affects brand.

Privacy is a concern of the whole business, not just the people who manage the data, nor solely the people who manage liability.  In the event of a “big oopsie” where there is a data/privacy breach, it is the communication with customers before, during and after the incident that determines the internal and external impact on the brand and the perception of the organization.  And that communication is dictated both by what the privacy policy enforces and by what the brand “allows.”  In today’s age of data, how can an organization have an open dialog with customers about their data if the brand does not support having that kind of conversation?  No surprise that Facebook is the exemplary case: Facebook continues to pave a new path, and draw customers, to share and disclose more information about themselves.  As a result, it has experienced backlash from customers when it takes things too far.  The line of communication is very open – customers have a clear way to lash back when Facebook has gone too far, and Facebook has a way of visibly standing behind its decision or admitting its mistake.  Either way, it is now commonplace for Facebook’s customers to expect that there will be more incidents like this and that Facebook has a way (apparently suitable enough to keep most customers) of dealing with them.  Its “policy” allowed it to respond this way, and that response has now become a part of who Facebook is.  And the policy, in turn, evolves to support this behavior going forward.

In the discussion of data and privacy, it seems inherently obvious that the mountain of challenges we face is large, complicated and impacts the core of all our businesses.  Nonetheless, it was still fascinating to witness first-hand – and to now be able to specifically articulate – how similar the challenges are across a diverse group of businesses and how similar the concerns are across job functions.

We want to thank again everyone from IAPP who joined in on the discussions we had at Foo Camp and throughout the conference.  We look forward to an opportunity to dive deeper into these types of problems.

Post Script: Meanwhile, the challenges, and related questions, around the anonymization of data with some kind of measurable privacy guarantee that came up at Foo Camp are ones we have been discussing on our blog for quite some time.  These are precisely the sorts of challenges that have motivated us to create a datatrust.  While we typically envision the datatrust being used in scenarios where there isn’t direct access to data, we walked away from our discussions at IAPP Foo Camp with specific examples where direct access to the data is required – particularly to fulfill business obligations – as a type of collateral (or currency).

The concept of data as the new currency of today’s economy has emerged.  Not only did it come up at the IAPP Foo Camp, it also came up back in August when we heard Marc Davis talk about it at IPP 2010.  With all of this in mind, it is interesting to evaluate the possibility of the datatrust acting as a special type of data broker in these exchanges.  The idea is that the datatrust would be a sanctioned data broker (sanctioned by the industry, or possibly by the government) that inherently meets federal, local and municipal regulations and protects the consumers of business partners who want to exchange data as “currency,” while relieving businesses and their partners of the headaches of managing data use and reuse.  The “tax” on using the service is that these aggregates are stored and made available to the public to query in the way we imagine (no direct access to the data) for policy-making and research.  This is something that feels compelling to us and will influence our thinking as we continue to move forward with our work.

Common Data Project at IAPP Privacy Academy 2010

Monday, September 13th, 2010

We will be giving a Lightning Talk on “Low-Impact Data-Mining” and running two breakout sessions at the IT Privacy Foo Camp – Preconference Session, Wednesday Sept 29.

Below is a preview of our slides and handout for the conference. Unlike our previous presentations, we won’t be talking about CDP and the Datatrust at all. Instead, we’ll focus on presenting how SGM helps companies minimize the privacy impact of their data-mining.

More specifically, we’ll be stepping through the symbiotic documentation system we’ve created between the product development/data science folks collecting and making use of the data and the privacy/legal folks trying to regulate and monitor compliance with privacy policies. We will be using the SGM Data Dictionary as a case study in the breakout sessions.
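
We won’t reproduce the Data Dictionary here, but to give a flavor of the kind of shared record such a system keeps, here is a purely hypothetical sketch of a single entry.  The field names and values below are invented for illustration and are not SGM’s actual format:

    # Hypothetical data-dictionary entry: one record per collected field,
    # readable by both the data science side and the privacy/legal side.
    data_dictionary_entry = {
        "field": "search_query",
        "collected_by": "web frontend logging",
        "business_purpose": "improve result ranking",
        "contains_pii": True,                 # flags the field for privacy review
        "sensitivity": "high",
        "retention_days": 180,
        "approved_uses": ["aggregate reporting", "ranking experiments"],
        "prohibited_uses": ["sale to third parties", "per-user profiling"],
        "last_reviewed": "2010-09-01",
    }

The idea of keeping both audiences in one document is that the engineers answer “what do we collect and why,” the privacy and legal folks answer “what are we allowed to do with it,” and neither answer can drift without the other noticing.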

Still, we expect that many of the issues we’ve been grappling with from the datatrust perspective (e.g. public perception, trust, ownership of data, meaningful privacy guarantees) will come up, as they are universal issues that are central to any meaningful discussion about privacy today.


Handout

What is data science?

An introduction to data-mining from O’Reilly Radar that provides a good explanation of how data-mining is distinct from previous uses of data and provides plenty of examples of how data-mining is changing products and services today.

The “Anonymous” Promise and De-identification

  1. How you can be re-identified: Zip code + Birth date + Gender = Identity
  2. Promising new technologies for anonymization: Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization by Paul Ohm.

Differential Privacy: A Programmatic Way to Enforce Your Privacy Guarantee?

  1. A Microsoft Research Implementation: PINQ
  2. CDP’s write-up about PINQ.
  3. A deeper look at how differential privacy’s mathematical guarantee might translate into layman’s terms (a rough sketch of the idea in code follows below).
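
To make that last point concrete, the sketch below shows the standard Laplace mechanism that the differential privacy literature is built around.  It is a toy illustration, not PINQ’s actual API; the query, the count and the epsilon value are all invented.

    import random

    def noisy_count(true_count, epsilon):
        # Return a differentially private count by adding Laplace noise.
        # epsilon is the privacy "cost" of the query: smaller epsilon means
        # more noise and a stronger guarantee that any one person's presence
        # or absence barely changes the answer.
        scale = 1.0 / epsilon  # a counting query changes by at most 1 per person
        # A Laplace(0, scale) draw is the difference of two exponential draws.
        noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
        return true_count + noise

    # e.g. "how many records in the dataset match this condition?"
    print(noisy_count(12345, epsilon=0.1))   # prints something near 12345

The answer stays useful in aggregate, yet the mathematics bounds how much any single individual’s record can move it, which is the sense in which the guarantee is measurable rather than a promise.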

Paradigms of Data Ownership: Individuals vs Companies

  1. Markets and Privacy by Kenneth C. Laudon
  2. Privacy as Property by Lawrence Lessig
  3. CDP explores the advantages and challenges of a “Creative Commons-style” model for licensing personal information.
  4. CDP’s Guide to How to Read a Privacy Policy

Google buys Metaweb: Can corporations acquire the halo effect of the underdog?

Monday, July 26th, 2010

Google recently bought Metaweb, a major semantic web company.  The value of Metaweb to Google is obvious — as ReadWriteWeb notes, “For the most part,…Google merely serves up links to Web pages; knowing more about what is behind those links could allow the search giant to provide better, more contextual results.” But what does the purchase mean for Metaweb?

Big companies buy small companies all the time.  Some entrepreneurs create their start-ups with that goal in mind — get something going and then make a killing when Google buys it.  But what do you think of a company when it seems to be doing something different and then is bought by Google?

Metaweb was never a nonprofit, but like Wikipedia, it has had a similar, community-driven vibe.  Freebase, its database of entities, is crowd-sourced, open, and free.  Google promises that Freebase will remain free, but will the community of people who contribute to Freebase feel the same about contributing free labor to a mega-corporation?  Is there anything keeping Google from changing its mind in the future about keeping Freebase free?  How will the culture of Metaweb change as its technologies evolve within Google?

This isn’t to say that Metaweb’s goals have necessarily been compromised by its purchase by Google.   Many people feel like this is the best thing that could have happened to the semantic web.

(Though a few feel, “They didn’t make it big. In fact, this means they failed at their mission of better organizing the world’s information so that rich apps could be built around it. They never got to the APPS part. FAIL!”, and at least one person is concerned Google bought Freebase to kill it.)

But what did you think when Google bought the nonprofit Gapminder, known for Hans Rosling’s famous TED talk?

Or when eBay bought a 25% stake in Craigslist?

Or outside the tech world, when Unilever bought Ben & Jerry’s?

Can a company or organization maintain any high-minded mission to be driven by principles other than profit when they’re bought by a major publicly held corporation?

This isn’t just an abstract question for us.  One of the biggest reasons why we chose to be a 501(c)(3) nonprofit organization is that we wanted to make sure no one running the Common Data Project would be tempted to sell its greatest asset, the data it hopes to bring together, for short-term profit.  As a nonprofit, CDP is not barred from making profits, but no profits can inure to the benefit of any private individual or shareholder.  Also as a nonprofit, should CDP dissolve, it cannot merely sell its assets to the highest bidder but must transfer them to another nonprofit organization with a similar mission.

We’re still working on understanding the legal distinctions between IRS-recognized tax-exempt organizations and for-profit businesses.  We were surprised when we first found out that Gapminder, a Swedish nonprofit, had been bought by Google.  Swedish nonprofit law may differ from U.S. nonprofit law.  But it appears Hans Rosling did not stand to make a dime.  Google only bought the software and the website, and whatever that was worth went to the organization itself.  So in a way, the experience of Gapminder supports the idea that being a nonprofit does make a difference in restricting the profit motives of individuals.  Alex Selkirk, as the founder and President of CDP, will never make a windfall through CDP.

The fact that CDP is not profit-driven, and will never be profit-driven, makes a difference to us.  Does it make a difference to you?

In the mix…new organizational structures, giant list of data brokers, governments sharing citizens’ financial data, and what IT security has to do with Lady Gaga

Friday, July 9th, 2010

1) More on new kinds of organizational structures for entities that want to form for philanthropic purposes but don’t fit into the IRS definition of a nonprofit.

2) CDT shone a spotlight on Spokeo, a data broker, last week.  Who are the other data brokers? Don’t be shocked: there are A LOT of them.  What they do, they mainly do outside the spotlight shone on companies like Facebook, but with very real effects.  In 2005, ChoicePoint sold data to identity thieves posing as a legitimate business.

3) The U.S. has come to an agreement with Europe on sharing finance data, which the U.S. argues is an essential tool of counterterrorism.  The article doesn’t say exactly how these investigations work, whether specific suspects are targeted or whether large amounts of financial data are combed for suspicious activity.  It does make me wonder, given how data crosses borders more easily than any other resource, how will Fourth Amendment protections in the U.S. (and similar protections in other countries) apply to these international data exchanges?  There is also this pithy quote:

Giving passengers a way to challenge the sharing of their personal data in United States courts is a key demand of privacy advocates in Europe — though it is not clear under what circumstances passengers would learn that their records were being misused or were inaccurate.

4) Don’t mean to focus so much on scary data stuff, but 41% of IT professionals admit to abusing privileges.  In a related vein, it turns out a disgruntled soldier accused of illegally downloading classified data managed to do it by disguising his CDs as Lady Gaga CDs.  Even better,

He was able to avoid detection not because he kept a poker face, they said, but apparently because he hummed and lip-synched to Lady Gaga songs to make it appear that he was using the classified computer’s CD player to listen to music.

The New York Times is definitely getting cheekier.

Who has your data and how can the government get it?

Monday, June 28th, 2010

Who has your data? And how can the government get it?

The questions are more complicated than they might seem.

In the last month, we’ve seen Facebook criticized and scrutinized at every turn for the way they collect and share their users’ data.  Much of that criticism was deserved, but what was missing in that discussion were the companies that have your data without even your knowledge, let alone your consent.

The relationship between a user and Facebook is at least relatively straightforward.  The user knows his or her data has been placed in Facebook, and legislation could be updated relatively easily to protect his or her expectation of privacy in that data.

But what about the data that consumer service companies share with third parties?

Pharmacies sell prescription data that includes you; cellphone-related businesses sell data that includes you.

So much of the data economy involves companies and businesses that don’t necessarily have you as a customer, and thus have even less incentive to protect your interests.

What about data that’s supposedly de-identified or anonymized?  We know that such data can be combined with another dataset to re-identify people.  Could the government seek that kind of data and avoid getting even a subpoena?  Increasingly, the companies that have data about you aren’t even the companies you initially transacted with.  How will existing privacy laws, even proposed reforms by the Digital Due Process coalition, deal with this reality?

These are all questions that consume us at the Common Data Project for good reason.  As an organization dedicated to enabling the safe disclosure of personal information, we are committed to talking about privacy and anonymity in measurable ways, rather than with vague promises.

If you read a typical privacy policy, you’ll see language that goes something like this:

Google only shares personal information with other companies or individuals outside of Google in the following limited circumstances:…

We have a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request

We think the datatrust needs to do better than that. We want to know exactly what “enforceable governmental request” means.  We want to think creatively about what individual privacy rights mean when organizations are sharing information with each other. We’ve written up the aspects that seem most directly relevant to our project here, including 1) a quick overview of federal privacy law; 2) implications for data collectors today; and 3) implications for the datatrust.

We ultimately have more questions than answers.  But we definitely can’t assume we know everything there is to know.  Even at the Supreme Court, where the Justices seem to have some trouble understanding how pagers and text messages work, they understand that the world is changing quickly.  (See City of Ontario v. Quon.)  We all need to be asking questions together.

So take a look.  Let us know if there are issues we’re missing. What are some other questions we should be asking?

In the mix…data for coupons, information literacy, most-visited sites

Friday, June 4th, 2010

1) There’s obviously an increasing move to a model of data collection in which the company says, “give us your data and get something in return,” a quid pro quo.  But as Marc Rotenberg at EPIC points out,

The big problem is that these business models are not very stable. Companies set out privacy policies, consumers disclose data, and then the action begins…The business model changes. The companies simply want the data, and the consumer benefit disappears.

It’s not enough to start with compensating consumers for their data.  The persistent, shareable nature of data makes it very different from a transaction involving money, where someone can buy, walk away, and never interact with the company again.  These data-centered companies are creating a network of users whose data are continually used in the business.  Maybe it’s time for a new model of business, where governance plans incorporate ways for users to be involved in decisions about their data.

2) In a related vein, danah boyd argues that transparency should not be an end in itself, and that information literacy needs to be developed in conjunction with information access.  A similar argument can be made about the concept of privacy.  In “real life” (i.e., offline life), no one aims for total privacy.  Every day, we make decisions about what we want to share with whom.  Online, total privacy and “anonymization” are also impossible, no matter what a company promises in its privacy policy.  For our datatrust, we’re going to use PINQ, a technology based on differential privacy, which acknowledges that privacy is not binary, but something one spends.  So perhaps we’ll need to work on privacy and data literacy as well?

3) Google recently released a list of the most visited sites on the Internet. Two questions jump out: a) Where is Google on this list? and b) Could the list be a proxy for the biggest data collectors online?

Mark Zuckerberg: It takes a village to build trust.

Friday, June 4th, 2010

This whole brouhaha over Facebook privacy appears to be stuck revolving around Mark Zuckerberg.

We seem to be stuck in a personal tug-of-war with the CEO of Facebook, frustrated that a 26-year-old personally has so much power over so many.

Meanwhile, Mark Z. is personally reassuring us that we can trust Facebook, which on some level implies we must trust him.

But should any single individual really be entrusted with so much? Especially “a 26 year-old nervous, sweaty guy who dodges the questions.” Harsh, but not a completely invalid point.

As users of Facebook, we all know that it is the content of all our lives and our relationships to each other that make Facebook special. As a result, we feel a sense of entitlement about Facebook policy-making that we don’t feel about services that are in many ways way more intrusive and/or less disciplined about protecting privacy (e.g. ISPs, cellphone providers, search).

Another way of putting it is: Facebook is not Apple! And as a result, it needs a CEO who is a community leader, not a dictator of cool.

So we start asking questions like, why should Facebook make the big bucks at the expense of my privacy? Shouldn’t I get a piece of that?

(Google’s been doing this for over a decade now, but the privacy exposure over at Google is invisible to the end-user.)

At some point, will we decide we would rather pay for a service than feel like we’re being manipulated by companies who know more about us than we do and can decide whether to use that information to help us or hurt us depending on the profit margin?  Here’s another example.

Or are there other ways to counterbalance the corporate monopoly on personal information? We think so.

In order for us to trust Facebook, Facebook needs to stop feeling like a benevolent dictatorship, albeit one open to feedback, but also one with a dictator who looks like he’s in need of a regent.

Instead Facebook the company should consider adopting some significant community-driven governance reforms that will at least give it the patina of a democracy.


(Even if, at the end of the day, it is beholden to its owners and investors.)

For some context, this was the sum total of what Mark Z. had to say about how important decisions are made at Facebook:

We’re a company where there’s a lot of open dialogue. We have crazy dialogue and arguments. Every Friday, I have an open Q&A where people can come and ask me whatever questions they want. We try to do what we think is right, but we also listen to feedback and use it to improve. And we look at data about how people are using the site. In response to the most recent changes we made, we innovated, we did what we thought was right about the defaults, and then we listened to the feedback and then we holed up for two weeks to crank out a new privacy system.

Nothing outrageous. About par for your average web service. (But then again, Facebook isn’t your average web service.)

However, this is what should have been the meat of the discussion about how Facebook is going to address privacy concerns: community agency and decision-making, not Mark Z.’s personal vision of an interwebs brimming with serendipitous happenings.

Facebook the organization needs to be trusted. So it might be best if Mark Z. backed out of the limelight and stopped being the lone face of Facebook.

How might that D8 interview have turned out if he had come on stage with a small group of Facebook users?

What governance changes would make you feel more empowered as a Facebook user?

Governing the Datatrust: Answering the question, “Why should I trust you with my data?”

Thursday, June 3rd, 2010

Progress on defining the datatrust is accelerating – we can almost smell it!

For a refresher, the datatrust is an online service that will allow organizations to open sensitive data to the public and provide researchers, policymakers and application developers with a way to directly query the data, all without compromising individual privacy. Read more.

For the past two years, we’ve been working on figuring out exactly what the datatrust will be, not just in technical terms, but also in policy terms.

We’ve been thinking through what promises the datatrust will make, how those promises will be enforced, and how best we can build a datatrust that is governed, not by the whim of a dictator, but by a healthy synergy between the user community, the staff, and the board.

The policies we’re writing and the infrastructure we’re building are still a work in progress.  But for an overview of the decisions we’ve made and outstanding issues, take a look at “Datatrust Governance and Policies: Questions, Concerns, and Bright Ideas”.

Here’s a short summary of our overall strategy.

  1. Make a clear and enforceable promise around privacy.
  2. Keep the datatrust simple. We will never be all things to all people. The functions it does have will be few and simple enough to be managed and monitored easily by a small staff, the user community, and the board.
  3. Have many decision-makers. It’s more important that we do the right things than that we do them quickly. We will create a system of checks and balances, in which authority to maintain and monitor the datatrust will be entrusted to several separate parties, including the staff, the user community, and the board.
  4. Monitor, report and review, regularly. We will regularly review what we’re monitoring and how we’re doing it, and release the results to the public.
  5. Provide an escape valve. Develop explicit, enforceable policies on what the datatrust can and can’t do with the data. Prepare a “living will” to safely dispose of the data if the organization can no longer meet its obligations to its user community and the general public.

We definitely have a lot of work to do, but it’s exciting to be narrowing down the issues.  We’d love to hear what you think!

P.S. You can read more about the technical progress we’re making on the datatrust by visiting our Projects page.

Measuring the privacy cost of “free” services.

Wednesday, June 2nd, 2010

There was an interesting pair of pieces on this Sunday’s “On The Media.”

The first was “The Cost of Privacy,” a discussion of Facebook’s new privacy settings, which presumably make it easier for users to clamp down on what’s shared.

A few points that resonated with us:

  1. Privacy is a commodity we all trade for things we want (e.g. celebrity, discounts, free online services).
  2. Going down the path of having us all set privacy controls everywhere we go on the internet is impractical and unsustainable.
  3. If no one is willing to share their data, most of the services we love to get for free would disappear. (Randall Rothenberg)
  4. The services collecting and using data don’t really care about you the individual; they only care about trends and aggregates. (Dr. Paul H. Rubin)

We wish one of the interviewees had gone even further to make the point that, since we all make decisions every day to trade a little bit of privacy in exchange for services, privacy policies really need to be built around notions of buying and paying, where what you “buy” is a service and what you pay with is “units” of privacy risk (as in, risk of exposure):

  1. Here’s what you get in exchange for letting us collect data about you.
  2. Here’s the privacy cost of what you’re getting (in meaningful and quantifiable terms).

(And no, we don’t believe that deleting data after 6 months and/or listing out all the ways your data will be used is an acceptable proxy for calculating “privacy cost.” Besides, such policies inevitably severely limit the utility of data and stifle innovation to boot.)

Gaining clarity around privacy cost is exactly where we’re headed with the datatrust. What’s going to make our privacy policy stand out is not that our privacy “guarantee” will be 100% ironclad.

We can’t guarantee total anonymity. No one can. Instead, what we’re offering is an actual way to “quantify” privacy risk so that we can track and measure the cost of each use of your data and we can “guarantee” that we will never use more than the amount you agreed to.

This in turn is what will allow us to make some measurable guarantees around the “maximum amount of privacy risk” you will be exposed to by having your data in the datatrust.
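
To make that less abstract, here is a minimal sketch of the bookkeeping such a guarantee implies, assuming a differential-privacy-style budget measured in a unit like epsilon.  The class, the numbers and the names are illustrative only, not the datatrust’s actual design:

    class PrivacyBudget:
        # Tracks cumulative privacy "spend" against an agreed maximum.
        # Illustrative only: real accounting would live inside the datatrust,
        # but the idea is the same: every query has a measurable cost, and
        # queries stop being answered once the agreed budget is used up.

        def __init__(self, max_epsilon):
            self.max_epsilon = max_epsilon   # the most risk you agreed to
            self.spent = 0.0

        def charge(self, query_epsilon):
            if self.spent + query_epsilon > self.max_epsilon:
                raise RuntimeError("privacy budget exhausted: query refused")
            self.spent += query_epsilon
            return self.max_epsilon - self.spent   # budget remaining

    budget = PrivacyBudget(max_epsilon=1.0)
    budget.charge(0.1)    # a cheap, coarse aggregate
    budget.charge(0.5)    # a more detailed query costs more
    print(budget.spent)   # 0.6 of the 1.0 agreed to; further queries draw it down

Once the budget is spent, the only honest options are to stop answering queries or to go back and ask for more, which is exactly the kind of explicit, measurable transaction the vague language of today’s privacy policies avoids.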


The second segment was on privacy rights and issues of due process vis-a-vis the government and data-mining.

Kevin Bankston from EFF gave a good run-down of how ECPA is laughably ill-equipped to protect individuals using modern-day online services from unprincipled government intrusions.

One point that wasn’t made is that, compared with search and seizure of physical property, the privacy impact of data-mining is easily several orders of magnitude greater. Like most things in the digital realm, it’s incredibly easy to sift through hundreds of thousands of user accounts, whereas it would be impossibly onerous to search 100,000 homes or read 100,000 paper files.

This is why we disagree with the idea that we should apply old standards created for a physical world to the new realities of the digital one.

Instead, we need to look at actual harm and define new standards around limiting the privacy impact of investigative data-mining.

Again, this would require a quantitative approach to measuring privacy risk.

(Just to be clear, I’m not suggesting that we limit the size of the datasets being mined; that would defeat the purpose of data-mining. Rather, I’m talking about process guidelines for how to go about doing low-(privacy)-impact data-mining. More to come on this topic.)

