Posts Tagged ‘Access to Information’

Yahoo or Google as a Datatrust? But will Facebook play?

Monday, May 4th, 2009

Time will tell, but it appears that Yahoo! has made it *really* easy (for application developers) to extract publicly available data from all over the interwebs and query it through Yahoo!’s servers.

YQL Execute allows you to build tables of data from other sources online, using JavaScript as the programming language, and run them on Yahoo’s servers, so the infrastructure needs are very small.

Similarly, Google “just launched a new search feature that makes it easy (for you and me) to find and compare public data.”

Graph from Google Public Data

Image taken from the Google Blog.

This is pretty exciting, as both are huge leaps towards what we’ve envisioned as a “datatrust” in various blog posts and our white paper. Well, except for maybe the “trust” part. (Especially given our experiences with Yahoo here and here.)

A few more points to contemplate:

  1. Now that the Promised Land of collating all the world’s data approaches on the horizon, will that change people’s willingness to make data publicly accessible? What I share on my personal website might not be okay with me when it rears its head in new contexts I never intended. As we’ve said elsewhere, when talking about privacy, context is everything.
  2. What about ownership? Both Yahoo! and Google may only temporarily cache the data insofar as is needed to serve it up. But, in effect, they will become the gatekeepers to all of our public data, data you and I contribute to. So the question remains, What about ownership?
  3. There’s still a lot of data that’s *not* publicly accessible. Possibly some of the most interesting and accurate data out there. How will we get at that? Case in point: Facebook just shut down a new app that allowed you to extract your personal “Facebook Newsfeed” and make it public via an RSS feed, citing (what else?) privacy concerns. (Not to mention the fact that access to Facebook data is generally hamstrung by privacy.)

Data’s endless possibilities

Friday, January 9th, 2009

The New York Times recently published a succinct but meaty article on New York City’s new electronic health record system.  Planned and promoted by the Bloomberg administration, the system includes about 1000 primary care physicians, focused primarily on three of the poorest neighborhoods, and the data they generate about their patients.  As I read it, I found myself counting all the different functions of the system.  I found at least ten:

•    Clean up outdated filing systems;
•    Enable a doctor to see how one patient is doing compared with his or her other patients;
•    Enable a doctor to see how one patient is doing compared with patients all over the city;
•    Enable the city’s public health department to monitor disease frequency and outbreaks, like the flu;
•    Enable the city to promote preventative measures, like cancer screening, in new ways;
•    Create new financial incentives for doctors to improve their patients’ health, on measures like controlling blood pressure or cholesterol;
•    Provide report cards to doctors comparing their results with other doctors’;
•    Improve care by less-experienced doctors with advice and information based on a patient’s age, sex, ethnic background, and medical history, including prompts to provide routine tests and vaccinations and warnings on how drugs can potentially interact;
•    Allow doctors to follow up more closely with patients, like reminding them of appointments through new calling and text-messaging systems and being notified if their patients do not fill prescriptions; and
•    Allow patients to access their own records, make appointments electronically, and monitor their own progress on health targets (should the doctor decide to allow it).

Pretty amazing, isn’t it?

Data is like that.  Once you collect it, the possibilities are endless.  Reading about this one system for health records made me realize why it’s so hard for me to describe CDP’s goals in one sentence.  We’re not trying to do something singular, like “enable a doctor to compare patients’ data.”  We’re trying to create a place where this function and innumerable other possibilities can exist, while also being mindful that “endless possibilities” include some scary ones that we need to guard against.

Making personal data more personal

Monday, December 29th, 2008

nys-health-site.jpg

The New York State Department of Health recently launched a new online tool for researching the prevalence of certain medical conditions by zip code.  It has a terribly boring name—Prevention Quality Indicators in New York State—but what they’re providing is very exciting.

Prevention Quality Indicators or PQIs are a set of measures developed by a federal health agency.  They count the number of people admitted to hospitals for a specific list of twelve conditions, some of which include various complications from diabetes, hypertension, asthma, and urinary tract infections.  All of these are conditions in which good preventative care can help avoid hospitalization or the development of more severe conditions.  As the Department explains, “The PQIs can be used as a starting point for evaluating the overall quality of primary and preventive care in an area. They are sometimes characterized as ‘avoidable hospitalizations,’ but this does not mean that the hospitalizations were unnecessary or inappropriate at the time they occurred.”

It’s not the kind of data that would normally get your average New York resident excited.  Even though it’s personal information—it doesn’t get more personal than health—it’s unlikely to feel very personal to anyone.

That’s what makes numbers and data off-putting for so many people.  Even when the numbers include people like us, we don’t see ourselves in them, so it’s hard to feel like those numbers have anything to say to us personally.  At the same time, so many decisions are being made based on data, huge decisions that affect all of us.  It’s important for democracy that ordinary citizens have a stake in the data, that they not only have access to the data but that they also have an interest in reviewing the data themselves.

What’s interesting to me about this website, then, is its potential for making this obscure piece of government health data much more immediate and personal for ordinary citizens, and not just public health data geeks.  As soon as I heard about this website, the first thing I did was look up my zip code, “11205,” in the county of Kings (Brooklyn).  I could then see racial disparities in the admission rate for these conditions in my neighborhood, and even see data on specific hospitals in my area.  Whenever data is organized and accessible in a way that is personal to the user, it’s immediately more compelling.

There’s no particular reason for me to wonder what asthma admission rates were in my zip code in 2006.  But I can imagine a mother of a child with asthma coming upon this site, wondering what asthma rates are in her zip code and the ones around it, and maybe seeing patterns that lead her to talk to other parents and elected officials.  And I can imagine other data sets of personal information being made truly relevant and personal in similar ways.

Woo-hoo, more data…from Amazon?

Tuesday, December 9th, 2008

Amazon announced recently that they would begin hosting huge databases of public information on their servers and charging users only for the cost of computing and storage for their own applications.  Although this information is already publicly available, Amazon’s service in hosting the data means scientists, other researchers, and businesses no longer have to create their own infrastructure to store and analyze this data.  It’s the data equivalent of a library—where people can do research without having to house and maintain their own collections.

This is an incredible service Amazon is providing, but it did make me wonder, do we need an Andrew Carnegie of public databases for our time?  Carnegie, of course, was not a saint, and he imposed terms on the towns that applied for his money, but ultimately, he created the public institution of the public library.  Although we now take the idea of a public library for granted, to the point that we’ve let many of them wither away without funding, we’ve come to believe wholeheartedly that public access to information is essential and right.  Even the great collections of private universities support this principle; as nonprofit institutions given tax-exempt status, they are governed by their missions to add knowledge to the world and have simple procedures to grant access to people who are not affiliated with the university.

Here, Amazon is providing public access, but as a private company rather than a public institution or nonprofit organization.  I’m not saying that nonprofits and government entities are morally superior to private companies, or that private companies are incapable of providing a public service.  I actually think that a mix of private and public, for-profit and nonprofit approaches to different issues is crucial for creating a truly vibrant marketplace of ideas.  But given the central and increasingly commanding role of data in our lives, it’s essential that we at least ask ourselves the question, “Are there functions that nonprofits and public institutions could fill better than private companies with regard to public access to data?”

We at the Common Data Project obviously believe there are good reasons to found a nonprofit organization to make data more public and accessible.  The number one reason, for me, is that the goal of public access to information may not always jibe neatly with the simpler and more straightforward goal of profit for a private company.

But what do you think?

The great story of good data

Wednesday, October 22nd, 2008

I love stories.

You might think, then, that I wouldn’t love data.  Stories and data are often seen as two very different ways of presenting information.  Data is considered cold, impersonal, incomplete.

But much of data’s bad reputation comes from limited data, not data in and of itself.  As Hans Rosling, a Swedish professor, demonstrates in this video, data can tell amazing stories.

It’s long but well worth watching in its entirety.  It’s a few years old, from the 2006 TED conference, but I would bet it’s almost as riveting on YouTube as it was live at the conference.  In it, Rosling uses animated graphs of UN statistics from 1962 to 2003 to tell stories about our world and how it’s changed in ways that defy easy generalizations.

For example, his Swedish medical students studying global health assumed that there were two kinds of countries in the world—Western countries where family sizes are small and people live longer, and Third World countries where family sizes are large and people die young.  But as his animated graphs show, many countries that are still poor and developing have moved by 2003 into the upper left-hand quadrant, of countries with smaller families and longer life expectancies.  By 2003, Vietnam is in the same place the United States was in 1974.  As he declares, “If we don’t look at the data, we underestimate the tremendous change in Asia.”

In my favorite segment, like a great novelist building a complex character, Rosling breaks down one set of data over and over, showing the much more interesting and complex story behind average income and child survival.

easy-gdp.png

The first graph, comparing GDP per capita among countries in the OECD, East Asia, South Asia, Africa, and Latin America, tells the story we all expect.  The blue dot in the upper right-hand quadrant is the OECD countries; the small red dot in the bottom left-hand quadrant is Africa.

within-africa.png

But then he shows how different countries within Africa have tremendous variations in GDP per capita, as well as child mortality, despite Western conceptions of a monolithic “Africa and its problems.”

within-countries.png

And just when you’re patting yourself on the back for understanding that Africa includes a very diverse range of countries, he shows that even within the countries, the distribution of income is very broad.  The highest income quintile in South Africa is quite high, approaching the average per capita GDP in the United States.

As Rosling says, “Improvement of the world must be highly contextualized!”  And the data is what will allow us to do it.  His demonstration itself shows how data can be limiting, how it can be used to “prove” that all of Africa is poor and sick.  But the solution clearly isn’t to ignore the data but to look at more data.  Ultimately, broad, detailed, longitudinal data push us to think harder, rather than rest on our assumptions.  Stories still need to be told: how did Mauritius get wealthy and healthy?  Why didn’t Ghana?  But without the data, we wouldn’t even know those stories were there.
