Posts Tagged ‘statistics’

In the mix — open data issues, bad econ stats, Facebook gaydar, and fraud detection in data

Friday, April 30th, 2010

1) It’s definitely become trendy for cities to open up their data, and I appreciated this article about Vancouver for its substantive points:

  • It’s important that data not only be open but be available in real time.  In all my conversations with people who work with data, though, whenever you have sensitive data, there’s going to be a significant time lag between when the data is collected and when it is “cleaned up” and made presentable for the public so as to avoid inadvertent disclosure.  This is why we think something like PINQ, a filter using differential privacy, could be revolutionary in making data available more quickly — it won’t need to be scrubbed for privacy reasons.
  • Licensing is an issue — although the city claims the data is public domain, there are terms of use that restrict use of the data by things like OpenStreetMaps.  It discusses the possibility of using the Public Domain Dedication and License, which is a project of Open Data Commons.  Alex heard some interesting discussion on this issue from Jordan Hatcher at the OkCon this past weekend.  This is a really fascinating issue, and I’m curious to see where else this gets picked up.

2) Existing economic statistics are riddled with problems.  I can’t say this enough — if existing ways of collecting and analyzing data are not quite good enough, we need to be open to new ones.

3) This is an old article, but highlights an issue Mimi and I have been thinking a lot about recently: How can data, even when shared according to your precise directions, reveal more than you intended? In this case, researchers found you could more or less determine the sexual orientation of people on Facebook based on their friends, even if they hadn’t indicated it themselves.  Privacy is definitely about control, yet how do you control something you don’t even know you’re revealing?

4) This past week, the Supreme Court heard a case involving the right to privacy of those who sign petitions to put initiatives on the ballot.  There is a lot of stuff going on in this case, gay rights, the experience of those in California who were targeted for supporting Prop 8, the difference between voting and legislating, etc., but overall, it’s a perfect illustration of how complicated our understanding of public and private has gotten.  We leave those lists open to scrutiny so we can prevent fraud — people signing “Mickey Mouse” — but public when you can go look at the list at the clerks’ office and public when you can post information online for millions to see are two different things.  There may be reasons we want to make these names public other than to prevent fraud (Justice Scalia thinks so), but are there other ways fraud could be detected among signatories that would not require an open examination of all petition signers’ names?  Could modern technology help us detect odd patterns, fake names and more without revealing individual identities?

In the mix

Wednesday, June 3rd, 2009

Google is Top Tracker of Surfers in Study. (NY Times Bits Blog)

The Obama Administration’s Silence on Privacy. (NY Times Bits Blog)

This UK Sheriff Cites Officials for Serious Statistical Violations.  (WSJ The Numbers Guy)


Get Adobe Flash player