Editor’s Note: Becky Pezely is an independent contractor for Shan Gao Ma, a consulting company started by Alex Selkirk, President of the Board of the Common Data Project. Becky’s work, like Tony’s, touches on many of the privacy challenges that CDP hopes to address with the datatrust. We’re happy to have her guest blogging about IAPP Academy 2010 here.
Several weeks ago we attended the 2010 Global Privacy Summit (IAPP 2010) in Baltimore, Maryland.
The Foo Camp consisted of four discussion topics aimed at covering the top technology concerns facing a wide range of privacy professionals.
The session we ran was titled “Low Impact Data Mining”. The intention was to discuss, and better understand, the current challenges in managing data within an organization, all with a lens on managing data in a way that is “low impact” on resources while returning “high (positive) impact” on the business.
The individuals in our group represented a vast array of industries, including financial services, insurance, pharmaceuticals, law enforcement, online marketing, health care, retail and telecommunications. It was fascinating that, even across such a wide range of industries, there could be such a pervasive set of common privacy challenges.
What is “anonymous enough”?
If all you need is gender, zip code and birthdate to re-identify someone then what data, when released, is truly “anonymous enough”? Can a baseline be defined, and enforced, within our organization that ensures customer protection?
It feels safe to say that this was the root challenge from which all the others stemmed. Today the release of data is mostly controlled, and subsequently managed, by one or more trusted people. These individuals are responsible for “sanitizing” the data that gets released, whether inside or outside the organization. They are charged with managing the release of data for everything from understanding business performance to fulfilling business obligations with partners. Their primary concern is knowing how well they are protecting their customers’ information, not only from the perspective of company policy, but also from the perspective of personal morals. They are the gatekeepers: they assess the level of protection provided based on which data they release to whom. What they want is a definition, or a measurement, that tells them when the data they release is “anonymous enough” and that guarantees the right level of anonymity for their customers.
This challenge is compounded for these individuals, and their organizations, by several other realities of data today:
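To make the idea of a measurement concrete, here is a minimal sketch of one commonly cited yardstick, k-anonymity: the size of the smallest group of records that share the same gender, zip code and birthdate. This is not a measure the Foo Camp group settled on, and a high k is not by itself a privacy guarantee. The function names, field names and sample records below are all hypothetical, made up purely for illustration.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for every quasi-identifier. k == 1 means at least
    one record is unique on those fields."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


def generalize(record):
    """Coarsen a record (hypothetical rule): keep only the first three
    digits of the zip code and the year of the birthdate."""
    coarse = dict(record)
    coarse["zip"] = record["zip"][:3]
    coarse["birthdate"] = record["birthdate"][:4]
    return coarse


# Made-up sample data; no names attached, yet two rows are still unique.
records = [
    {"gender": "F", "zip": "21201", "birthdate": "1978-03-14"},
    {"gender": "F", "zip": "21201", "birthdate": "1978-03-14"},
    {"gender": "M", "zip": "21202", "birthdate": "1980-07-02"},
    {"gender": "M", "zip": "21205", "birthdate": "1980-11-23"},
]
fields = ["gender", "zip", "birthdate"]

k_raw = smallest_group_size(records, fields)
k_generalized = smallest_group_size([generalize(r) for r in records], fields)
print(k_raw, k_generalized)
```

In this toy data set the raw records include individuals who are unique on the three fields, while the coarsened records do not. A gatekeeper could, in principle, set a minimum k as a release baseline, with the caveat that the number says nothing about what other data the release might later be joined with.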
The silos are getting joined.
The conventional assumption used to be that data within an organization lived in a silo, on its own and protected, such that anyone looking at the data would see only that set of data. Now these data sets are increasingly getting joined, and it’s not always known where, when, how, or with whom the join originated. Nor is it known where the joined data set is currently stored, since it was modified from its original silo. Soon that joined data set takes on a life of its own and makes its way around the institution. Given the likelihood of this occurring, how can the gatekeepers of the data, who are responsible for assessing the level of protection provided to customers, do so with any kind of reliable measurement that guarantees the right level of anonymity?
And now there’s data in the public market.
Not only is the data joined with data from other silos within the organization, but also with data from outside the organization that is sold in the public market. This prospect has increased the ability of organizations to produce data that is “high impact” for the business, because they now know WAY MORE about their customers. But does the benefit outweigh the liability? As the ability to know more about individual customers increases, so does the level of sensitivity and the concern for privacy. How do organizations successfully navigate mounting privacy concerns as they move from silos, to joined silos, to joined silos combined with public data?
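A small sketch shows why joining with outside data raises the stakes. It assumes a “sanitized” internal extract with names removed and a purchased public list that still carries names; the function, field names and records are hypothetical and invented for illustration only.

```python
def link_records(internal, public, keys):
    """Attach a name to each internal record whose key fields match
    exactly one record in the public list. Ambiguous matches (more than
    one candidate) and missing matches are left unlinked."""
    index = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    linked = []
    for row in internal:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:
            linked.append({**row, "name": matches[0]["name"]})
    return linked


# Made-up internal extract: names stripped, other fields kept.
internal = [
    {"gender": "F", "zip": "21201", "birthdate": "1978-03-14", "segment": "high spender"},
    {"gender": "M", "zip": "21202", "birthdate": "1980-07-02", "segment": "new customer"},
]

# Made-up public list that includes names.
public = [
    {"name": "Jane Example", "gender": "F", "zip": "21201", "birthdate": "1978-03-14"},
    {"name": "Person A", "gender": "M", "zip": "21202", "birthdate": "1980-07-02"},
    {"name": "Person B", "gender": "M", "zip": "21202", "birthdate": "1980-07-02"},
]

linked = link_records(internal, public, ["gender", "zip", "birthdate"])
print(linked)
```

Here the first internal record matches exactly one person in the public list and is re-identified, while the second stays ambiguous because two public records share its values. The same join that makes the data “high impact” is the one that undoes the sanitizing.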
The line between “data analytics” and looking at “raw data” is blurring.
Because the data is richer, and more plentiful, the act of data analysis isn’t as benign as it might once have been. The definition of “data analytics” has evolved from something high-level (to know, for example, how many new customers are using the service this quarter) to something that looks a lot more like looking at raw data to target specific parts of their business to specific customers (to, for example, sell <these products> to customers that make <this much money>, are females ages 30 – 35 and live in <this neighborhood> and typically spend <this much> on <these types of products>, etc…).
And the data has different ways of exiting the system.
The truth is, as scary as this data can be, everyone wants to get their hands on it, because the data leads to awareness that is meaningful and valuable for the business. Thus, the data is shared everywhere – inside and outside the organization. With that fact, a whole set of challenges emerges when considering all the ways data might be exiting any given “silo”, such as: Where is all the data going? How is it getting modified (joined, sanitized, rejoined), and at which point is it no longer the data that needs to be protected by the organization? How much data needs to be released externally to fulfill partner/customer business obligations? Once the data has exited, can the organization’s privacy practices still be enforced?
In the discussion of data and privacy, it seems obvious that the mountain of challenges we face is large and complicated, and that it affects the core of all our businesses. Nonetheless, it is still fascinating to have witnessed first-hand – and to now be able to articulate specifically – how similar the challenges are across a diverse group of businesses and how similar the concerns are across job functions.
We want to thank again everyone from IAPP who joined in the discussions that we had at Foo Camp and throughout the conference. We look forward to an opportunity to dive deeper into these types of problems.
Post Script: Meanwhile, the challenges, and related questions, around anonymizing data with some kind of measurable privacy guarantee that came up at Foo Camp are ones that we have been discussing on our blog for quite some time. These are precisely the sorts of challenges that have motivated us to create a datatrust. While we typically envision the datatrust being used in scenarios where there isn’t direct access to data, we walked away from our discussions at IAPP Foo Camp with specific examples where direct access to the data is required – particularly to fulfill business obligations – as a type of collateral (or currency).
The concept of data as the new currency of today’s economy has emerged. Not only did it come up at the IAPP Foo Camp, it also came up back in August, when we heard Marc Davis talk about this at IPP 2010. With all of this in mind, it is interesting to evaluate the possibility of the datatrust acting as a special type of data broker in these exchanges. The idea is that the datatrust would be a data broker sanctioned by the industry, or possibly by the government, that inherently meets federal, local and municipal regulations and protects the consumers of business partners who want to exchange data as “currency,” while relieving businesses and their partners of the headaches of managing data use and reuse. The “tax” on using the service is that these aggregates are stored and made available to the public to query in the way we imagine (no direct access to the data) for policy-making and research. This is something that feels compelling to us and will influence our thinking as we continue to move forward with our work.