The Datatrust Product: What it is. What it’s not.

Monday, June 21st, 2010

The datatrust has always been a big-tent project, but over the last few months, we’ve done a lot of paring down.  We’re getting closer to something that feels like a product and less like a vague hope for a better future!

The following is an attempt to describe the datatrust “technology product” by way of comparison with existing websites and services. The “Governance and Policies” aspect of the datatrust was covered in a separate post.

Sensitive information about us. They have it, we don’t.

Today, most of the sensitive data about us (e.g. medical records, personal finance data, online search history) is inaccessible to us and to those who represent the public: elected officials, government agencies, advocacy groups, researchers.

Our Mission: Democratizing Access to Sensitive Data

While a significant movement has grown up around opening up government data, there are few efforts to gain public access to sensitive “personal information” data, most of it held in the private sector.

Our goal for the datatrust is to create an open marketplace for information to democratize access to some of the most sensitive and valuable data there is, to help us answer difficult policy and societal questions.

Which brings us to the question: What is a datatrust?

A datatrust will be an online service that allows organizations to make sensitive data available to the public and provides researchers, policymakers and application developers with a way to directly query that data.

We believe the datatrust is only possible with 1) technical innovations that will allow us to provide a quantifiable and enforceable privacy guarantee; and 2) governance and policy innovations that will inspire public confidence.

The datatrust will include a data catalog, a registry of queries and their privacy risks, and a collaboration network for both data donors and data users.

We realize that as a new breed of service, the datatrust is difficult to conceptualize. So, we thought it might be helpful to compare it to some existing websites and services.

A Data Catalog

Like, the datatrust will provide ways to browse and search a “catalog” of available data.

A Query-able Database of “Raw Data”

Unlike, datatrust data will be released in “raw” form, not in pre-digested aggregate reports.

Unlike, datatrust data will not be viewable or downloadable.

Instead, the datatrust will provide a way to directly query raw data.

An “Automated” Privacy Filter

Unlike most open government data releases, the datatrust will not rely on labor-intensive and subjective anonymization methods. Existing methods like scrubbing, swapping or synthesizing data limit the accuracy and usefulness of the data.

By contrast, the datatrust will make use of new privacy technologies to provide a measurable and enforceable privacy guarantee that treats individual privacy as a valuable asset with a quantifiable limit on re-use.

As a result, the datatrust will keep track of the amount of privacy risk incurred by each query.

Privacy protection will happen on-the-fly, thereby automating the “anonymization” aspect of releasing data.
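
To make the idea of a per-query privacy cost more concrete, here is a minimal sketch in Python. It is an illustration only, not the datatrust design: it assumes a differential-privacy-style approach in which each query spends part of a fixed budget (called epsilon here) and count results get Laplace noise added on-the-fly. The names PrivacyLedger, noisy_count and laplace_noise, and the budget values, are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PrivacyLedger:
    """Hypothetical ledger: tracks privacy cost spent against one data set."""
    budget: float                      # assumed total allowance for the data set
    spent: float = 0.0                 # cumulative cost of all queries so far
    log: list = field(default_factory=list)  # history of (user, query, cost)

    def charge(self, user: str, description: str, epsilon: float) -> None:
        """Record a query's cost, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.budget:
            raise PermissionError("privacy budget exhausted for this data set")
        self.spent += epsilon
        self.log.append((user, description, epsilon))


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two exponential draws with mean `scale`
    # is a Laplace draw centered on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(records, predicate, epsilon, ledger, user, rng=None):
    """Answer a count query with noise, charging its cost to the ledger first."""
    rng = rng or random.Random()
    ledger.charge(user, f"count({predicate.__name__})", epsilon)
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

In this sketch, a smaller epsilon means a cheaper but noisier answer, and once the ledger's budget is used up, further queries against that data set are refused. The log doubles as a simple history of who asked what and at what privacy cost.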

An Open Collaboration Network

Because the datatrust will maintain an open history of all queries and data users, it will also become an important open registry of how data is being used and analyzed. This in turn can become the foundation for a community of data donors and data users, who will collaborate on collecting and analyzing data for research and data-driven software applications.

Like Amazon, the datatrust will do a better job of describing and browsing data sets as well as eliciting user feedback and data-mining actual usage (as opposed to self-reported usage) to help users find relevant data sets.

Like Wikipedia, the datatrust will depend on an invested and active community to curate and manage the data.

Unlike Wikipedia and Yelp (but like Facebook and LinkedIn), the datatrust will require its users to maintain real and active identities in order to build a quality rating system for evaluating data and data use, based on actual usage and individual reputations (as opposed to explicit user ratings).

Not A Generic Set of Tools for Working With Data

Unlike Swivel, the datatrust is not a generic tool set for working with and visualizing data.

Unlike Ning (a consumer platform for creating your own social network), the datatrust is not a consumer platform for creating your own data-sharing networks. It is also not a developer toolkit for building data-driven services.

Unlike Freebase, the datatrust is not Wikipedia for structured data.

Not A Data-Driven Service for Consumers

You should not expect to come to the datatrust to find out if people like you are also experiencing worse than average allergies this year.

Unlike Mint or Patients Like Me, the datatrust is not a personal data-sharing service focused on offering a consumer service (personal finance manager in the case of Mint) or sharing a specific kind of data (tracking chronic diseases in the case of Patients Like Me).

But application builders like as well as researchers may find the datatrust useful in allowing them to provide services and collect data in new ways from larger groups of people, due to the measurable privacy guarantee provided by the datatrust.

The datatrust is just about data.

The datatrust is a sensitive data release engine, and we will build tools insofar as they help our Data Donors get more data to Data Users. However, it stops short of directly serving consumers. We think that is better left to those with a passion for a specific cause and the domain expertise to serve their constituents well.

