Archive for the ‘Best Practices’ Category

Building a community — will the real Mrs. Del Toro please stand up?

Tuesday, April 20th, 2010

As I discussed in my last post, having real profiles can really change the dynamic in an online community.  Yelp, which more or less encourages Yelpers to use real identities, has created a community where people really care what other Yelpers think of them.  In contrast, Wikipedians care about the work others are doing, but they’re not so invested in impressing other Wikipedians with their taste in music or food.  Yelp and Wikipedia have some similar incentives for people to create good stuff, like increased status and privileges, but Yelp feels like a social network while Wikipedia does not.

So how does the power of real profiles play out within Facebook, which is a social network and nothing else?  How do people’s concerns about their reputations play out when there are no reviews to write or encyclopedia entries to edit?  And how does Facebook in this context encourage people to create content?

(MySpace is a social network as well, but it’s so different from Facebook that I’m going to address it in a separate post.)

Facebook cares even more than Yelp about having real people.

Poor Peowtie, her Facebook account's been disabled.

While Yelp encourages people to use real first names and last initials and a real photo, Facebook requires it.  The Statement of Rights and Responsibilities states, along with other rules:

  1. You will not provide any false personal information on Facebook, or create an account for anyone other than yourself without permission.
  2. You will not use your personal profile for your own commercial gain (such as selling your status update to an advertiser).
  3. You will not use Facebook if you are under 13.
  4. You will not use Facebook if you are a convicted sex offender.
  5. You will keep your contact information accurate and up-to-date.
  6. You will not share your password, let anyone else access your account, or do anything else that might jeopardize the security of your account.
  7. You will not transfer your account to anyone without first getting our written permission.
  8. If you select a username for your account we reserve the right to remove or reclaim it if we believe appropriate (such as when a trademark owner complains about a username that does not closely relate to a user’s actual name).

Although we all know people who’ve sneaked through with a profile based on the name of a pet or a nickname, Facebook is diligent enough that it was very difficult for Caitlin Batman, Tim Six, Becky Super, and others with unusual names to sign up for an account.  My friend’s Peowtie Del Toro account, which she opened in October 2007, was disabled just this year.  (The above is a mock-up as it is completely inaccessible to her now.)

Facebook doesn’t create a product, like business reviews or an encyclopedia, separate from its social network.  Its product is essentially human relationships, to the extent that they can be captured through status updates, photos uploaded, articles linked, and virtual gifts/pokes sent.  For all the personal detail Yelpers put into their reviews, it doesn’t compare to how personal content is on Facebook.

As real people, Facebook members are hyper-aware of their reputations.

The fact that they are creating content that is solely about their lives means that Facebook users care even more about the way that content affects their reputations.  It’s not just about whether they write witty Yelp reviews or edit correctly on Wikipedia — it’s about who they are. As much as they horrify middle-aged adults, the kids who post drunken photos of themselves on Facebook do care about their reputations.  It’s just that at that moment in their lives, it’s more important that they project that image than a more staid, responsible one. Some people want to be the kind of people who have 900 Facebook friends; other people want equally strongly to be the kind of people who have 50.  Some people want their friends to know they made pickles with seasonal ramps that weekend; other people want their friends to know they were watching football.

And despite the fact that Facebook has become a symbol of our over-sharing culture, Facebook wants us to share even more.  The more we share, the more it can make in advertising.  The more we share, the more valuable its data becomes.

But Facebook can’t give you a gold star for being a cool person…

Because people on Facebook are creating content about themselves, Facebook can’t use the same incentive systems used by Yelp or Wikipedia.  Yelp can promote good reviewers to the Yelp Elite Squad, Wikipedia can give privileges to reliable editors, but it would be laughable for Facebook to create a Facebook Elite Squad.  Imagine if Facebook deemed some users’ vacation photos better than others, or gave karma points like Slashdot to those whose status updates were wittiest.

Facebook, however, can use people’s concerns about their reputations to motivate and promote activity.  There are ways for users to give each other the Facebook equivalent of badges, stars, and compliments: virtual gifts and “pokes.”

Facebook’s ways of motivating activity are generally more subtle.  Facebook doesn’t just ask people to share — it asks people to respond.  I once had a friend ask me why I never commented on her status updates.  She clearly cares whether people respond to what she says.  It’s part of why she uses Facebook.  If Facebook didn’t allow people to comment on each other’s status updates and posts, I imagine the level of activity would rapidly decrease.

Facebook’s “like” button serves an interesting purpose in this context.  Like Yelp’s “useful, funny or cool,” it lets people respond to their friends without having to write out an actual sentence.  It’s equivalent to a nod or sympathetic “uh-huh” offline — it’s a way to show you’re paying attention.

But of course, Facebook isn’t really satisfied with the level of activity currently happening.  Every day, I’m given suggestions, not only for new friends but also for ways in which I could interact with existing ones.

I’m curious how many people actually see this and then go out and write on the wall of that elementary school friend who they haven’t communicated with since they accepted the friend request.  (In my case, I feel like it’s always telling me to reach out to Alex Selkirk, who I see almost every day.)

It’ll be interesting to see what else Facebook tries as it works to monetize itself.  I don’t see how it can ever give out gold stars or badges or create elite classes within Facebook.  Not only would it be weird to rate a person for being a person, it would be difficult to come up with an incentive structure that appeals to its 400 million registered users.  Being a member of some Elite Squad, having karma points, or being the Mayor of a local business, as on Foursquare, might be appealing to some people.  It definitely won’t be appealing to all of them.

Completely not there versus almost not there.

Wednesday, April 14th, 2010


Picture taken by Stephan Delange

In my last post, where I tried to quantify the concept of “discernibility,” I left off at the point where I said I was going to try out my “50/50” definition on the PINQ implementation of differential privacy.

It turned out to be a rather painful process. Both because I can be rather literal-minded in an unhelpful way at times and because it is plain hard to figure this stuff out.

To backtrack a bit, let’s first make some rather obvious statements to get a running start in preparation for wading through some truly non-obvious ones.

Crossing the discernibility line.

In the extreme case, we know that if there was no privacy protection whatsoever and the datatrust just gave out straight answers, then we would definitely cross the “discernibility line” and violate our privacy guarantee. So let’s go back to my pirate friend again and ask, “How many people with skeletons in their closet wear an eye-patch and live in my building?” If you (my rather distinctive eye-patch wearing neighbor) exist in the data set, the answer will be 1. If you are not in the data set, the answer will be 0.

With no privacy protection, the presence or absence of your record in the data set makes a huge difference to the answers I get and is therefore extremely discernible.

Thankfully, PINQ doesn’t give straight answers. It adds “noise” to answers to obfuscate them.

Now when I ask, “How many people in this data set of people with skeletons in their closet wear an eye-patch and live in my building?” PINQ counts the number of people who meet these criteria and then decides to either “remove” some of those people or “add” some “fake” people to give me a “noisy” answer to my question.

How it chooses to do so is governed by a distribution curve named for the French marquis Pierre-Simon Laplace. (I don’t know why it has to be this particular curve, but I am curious to learn why.)

You can see the curve illustrated below in two distinct postures that illustrate very little privacy protection and quite a lot of privacy protection, respectively.

  • The point of the curve is centered on the “real answer.”
  • The width of the curve shows the range of possible “noisy answers” PINQ will choose from.
  • The height of the curve shows the relative probability of one noisy answer being chosen over another noisy answer.
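The noisy-answer step described above can be sketched in a few lines of Python. This is only an illustration of the general Laplace approach, not PINQ's actual code (PINQ is a .NET library); the function name `noisy_count` and the parameter `epsilon` (smaller means a wider, noisier curve) are assumptions made for this sketch.

```python
import random

def noisy_count(true_count, epsilon, rng=random):
    """Return true_count plus Laplace noise centered on 0.

    Assumes a counting query, where adding or removing one person
    changes the real answer by at most 1. The curve's width (scale)
    is 1 / epsilon, so a smaller epsilon means more noise.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with the
    # same mean follows a Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

# A "quiet" curve and a "noisy" curve around the same real answer of 1.
print(noisy_count(1, epsilon=2.0))
print(noisy_count(1, epsilon=0.1))
```

Each call gives a different answer; over many calls the answers cluster around the real answer, tightly for a large `epsilon` and loosely for a small one.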

A quiet curve with few “fake” answers for PINQ to choose from:

A noisy curve with many “fake” answers for PINQ to choose from:

More noise equals less discernibility.

It’s easy to wave your hands around and see in your mind’s eye how if you randomly add and remove people from “real answers” to questions, as you turn up the amount of noise you’re adding, the presence or absence of a particular record becomes increasingly irrelevant and therefore increasingly indiscernible. This in turn means that it will also be increasingly difficult to confidently isolate and identify a particular individual in the data set precisely because you can’t really ever get a “straight” answer out of PINQ that is accurate down to the individual.

With differential privacy, I can’t ever know that my eye-patch wearing neighbor has a skeleton in his closet. I can only conclude that he might or might not be in the dataset to varying degrees of certainty depending on how much noise is applied to the “real answer.”

Below, you can see how if you get a noisy answer of 2, it is about 7x more likely that the “real answer” is 1, than that the “real answer” is 0. A flatter, more noisy curve would yield a substantially smaller margin.
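One way to check a figure like “7x” is to compare the height of the Laplace curve centered on 1 with the height of the curve centered on 0, both measured at the noisy answer 2. The sketch below assumes a counting query and a hypothetical noise setting `epsilon`; the value 1.95 is used only because it produces a ratio near 7, not because the original figure states it.

```python
import math

def laplace_density(x, center, scale):
    """Height of a Laplace curve centered on `center` at the point x."""
    return math.exp(-abs(x - center) / scale) / (2.0 * scale)

def likelihood_ratio(noisy_answer, answer_a, answer_b, epsilon):
    """How many times more likely real answer A is than real answer B."""
    scale = 1.0 / epsilon  # assumed: one person changes the count by 1
    return (laplace_density(noisy_answer, answer_a, scale)
            / laplace_density(noisy_answer, answer_b, scale))

# Sharper curve: a noisy answer of 2 favors "real answer is 1" by about 7x.
print(likelihood_ratio(2, 1, 0, epsilon=1.95))
# Flatter curve: the margin shrinks to roughly 1.1x.
print(likelihood_ratio(2, 1, 0, epsilon=0.1))
```

A flatter curve (smaller `epsilon`) pulls the ratio toward 1 but, in this sketch, never all the way to it.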

But wait a minute, we started out saying that our privacy guarantee guarantees that individuals will be completely non-discernible. Is non-discernible the same thing as hardly discernible?

Clearly not.

Is complete indiscernibility even possible with differential privacy?

Apparently not…

On the question of “Discernibility”

Tuesday, April 13th, 2010

Where’s Waldo?

In my last post about PINQ and meaningful privacy guarantees, we defined “privacy guarantee” as a guarantee that the presence or absence of a single record will not be discernible.

Sounds reasonable enough, until you ask yourself, what exactly do we mean by “discernible”? And by “exactly”, I mean, “quantitatively” what do we mean by “discernible”? After all, differential privacy’s central value proposition is that it’s going to bring quantifiable, accountable math to bear on privacy, an area of policy that heretofore has been largely preoccupied with placing limitations on collecting and storing data or fine-print legalese and bald-faced marketing.

However, PINQ (a Microsoft Research implementation of differential privacy we’ve been working with) doesn’t have a built-in mathematical definition of “discernible” either. A human being (aka one of us) has to do that.

A human endeavors to come up with a machine definition of discernibility.

At our symposium last Fall, we talked about using a legal-ish framework for addressing this very issue of discernibility: Reasonable Suspicion, Probable Cause, Preponderance of Evidence, Clear and Convincing Evidence, Beyond a Reasonable Doubt.

Even if we decided to use such a framework, we would still need to figure out how these legal concepts translate into something quantifiable that PINQ can work with.

“Not Discernible” means seeing 50/50.

My initial reaction when I first started thinking about this problem was that clearly, discernibility or lack thereof needed to revolve around some concept of 50/50, as in “odds of,” “chances are.”

Whatever answer you got out of PINQ, you should never get even a hint of an idea that any one number was more likely to be the real answer than the numbers to either side of that number. (In other words, x and x ± 1 should be equally likely candidates for “real answerhood.”)

Testing discernibility with a “Worst-Case Scenario”

I ask a rather “pointed” question about my neighbor, one that essentially amounts to “Is so-and-so in this data set? Yes or no?” without actually naming names (or social security numbers, email addresses, cell phone numbers or any other unique identifiers). For example: “How many people in this data set of ‘people with skeletons in their closet’ wear an eye-patch and live in my building?” Ideally, I should walk away with an answer that says,

“You know what, your guess is as good as mine, it is just as likely that the answer is 0, as it is that the answer is 1.”

In such a situation, I would be comfortable saying that I have received ZERO ADDITIONAL INFORMATION on the question of a certain eye-patched individual in my building and whether or not he has skeletons in his closet. I may as well have tossed a coin. My pirate neighbor is truly invisible in the dataset, if indeed he’s in there at all.
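The coin-toss idea can be written down as a small piece of arithmetic. The sketch below is an illustration only: `posterior_present` is a hypothetical helper, and it assumes I start out with no opinion either way (a 50/50 prior) and can say how likely the answer I received would be if my neighbor were in the data set versus if he were not.

```python
def posterior_present(likelihood_if_present, likelihood_if_absent,
                      prior_present=0.5):
    """Chance my neighbor is in the data set after seeing an answer.

    The two likelihoods are relative weights: how likely the answer
    I received is if he is present versus if he is absent.
    """
    numerator = likelihood_if_present * prior_present
    denominator = numerator + likelihood_if_absent * (1.0 - prior_present)
    return numerator / denominator

# Both cases explain the answer equally well: still a coin toss.
print(posterior_present(1.0, 1.0))  # 0.5
# The answer is 3x more likely if he is present: no longer a coin toss.
print(posterior_present(3.0, 1.0))  # 0.75
```

Under the “50/50” definition, only the first case counts as not discernible: the answer leaves me exactly where I started.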

Armed with this idea, I set out to understand how this might be implemented with differential privacy...

Building a community — populated by real people or anonymous cowards

Friday, April 9th, 2010

Mimi’s comment on my last blog post about building communities made an important point – although both Yelp and Wikipedia reward their users for their activities with increased status within their communities, they do so in very different ways with very different results for their content.

There are many, many differences between Yelp and Wikipedia.  (I’m curious how many people are registered users on both sites.)

But one really obvious one is that Yelp has created an active community of reviewers who largely use real photos and real names (or at least real first name and last initial), like peter d., a member of the Yelp Elite Squad.

Wikipedia, in contrast, is a free for all.  Many people who write or edit are anonymous.  They may register with pseudonyms, or they may not register at all, so that their edits are only associated with an IP address.  There are occasionally Wikipedians who reveal their real names and provide a lot of biographical information on their profile pages, like Ragesoss, who even provides a photo.

But for every user like him, there are many more like Jayjg and Neutrality, who seems to identify with Thomas Jefferson, as well as users who have been banned.

Obviously, there are other communities that encourage the use of real identities — Facebook, MySpace, social networks in general.  And there are communities where being pseudonymous or anonymous is perfectly fine, even encouraged — Slashdot, Flickr, and many more.

So how does the use of real identities affect the community?  How does it affect the incentives to participate?  The content that’s created?

Yelp’s reward system, as described in my earlier post, is very focused on the individual.  The compliments, the Elite Squad badge, the number of “useful, funny or cool” reviews written are all clearly attached to a specific person, such as peter d. above.  Although people are complimenting peter d. for the content he’s generated for Yelp, they’re also complimenting him as a person, as he’s told that he’s funny, he’s a good writer, and so forth.

Yelpers are encouraged to develop personas that are separate from the reviews they write.  The profiles have set questions, like “Last Great Book I Read” and “My First Concert.”  They know that it’s not just about one review they’ve written, but where they’ve eaten, where they’ve gone, what they’ve done, that shows something in a generation that recognizes tribes based on what people consume.  There is a suggestion that Yelpers might interact outside of Yelp, and in the case of the Yelp Elite Squad, an assumption that they will, as one of the major privileges is that members get invited to local events.  The reputation you seek to develop on Yelp is not necessarily so different from the reputation you seek to develop in real life.

Yelp isn’t just a review site.  It’s a social network that feels almost like an online dating site — you can see how easily compliments could be used to flirt.

Wikipedia’s reward system, based on the open source software model, is more low-key.  Wikipedia does rate articles as “good articles,” and notes which articles have priority within certain classes of subjects.  If you write a lot of “good articles,” or otherwise contribute substantively, you can get various gold stars and badges as well, like the Platinum Editor Star Jayjg has on his profile page.

But the compliments are less about the Wikipedia user, even when stars are given, and more about the Wikipedia-related contribution he or she has made.  Some Wikipedians may be flirting with each other, but it seems really unlikely, at least not within any Wikipedia-built mechanism.  Jayjg clearly feels no need to tell us where he’s from and what his first concert was — it’s not relevant here.  The Wikipedians who do share more personal information aren’t required or even encouraged to do so by the Wikipedia system.

It doesn’t matter who Jayjg is.  It only matters what he does for Wikipedia.

So although both sites use rewards and feedback loops to encourage participation, they’re creating fundamentally different content with fundamentally different communities.

Yelp’s entry for the Brooklyn Botanic Garden has 126 reviews, each wholly written by a single user with that user’s photo, name, number of friends and number of reviews immediately visible.  The reviews are clearly personal and subjective, as made obvious by references to what that person specifically experienced.  In peter d.’s case, his review notes how his brother once pushed him into the Japanese Pond.

When you look at Wikipedia’s entry on the Brooklyn Botanic Garden, you see a seamless, unitary document.  Unless you click on the tabs that cover history or discussion, you won’t even see who worked on the article.  There is no personal perspective, there is no author listed with stars next to his name, there are no buttons asking you to rate that author’s contribution as “useful, funny or cool.”

This makes sense, given their respective missions.  Yelp’s goal is to generate as many reviews as possible about local businesses, recognizing that taste is really subjective.  Wikipedia’s goal is to produce a free encyclopedia with unbiased, objective content.  Yelp doesn’t want you to write one review and go away.  If you do, your review may not even show up, as the spam filter may decide you’re not trustworthy.  But of the thousands and thousands of people who’ve edited Wikipedia, the vast majority have done a few, maybe even just one edit, and never come back.  Wikipedia is a collective work; Yelp is a collection of individual works.

I don’t have an opinion on whether a community of real profiles is better or worse than a community of anonymous and pseudonymous contributors.

The different structures seem to shape the content of Yelp versus Wikipedia in appropriate ways.  What’s less clear to me is how this difference affects the makeup of their communities.  Wikipedia has recently been in the news as it examines the demographics of its users, who are “more than 80 percent male, more than 65 percent single, more than 85 percent without children, around 70 percent under the age of 30.” Its rewards system and its open source model clearly attracted the right kind of enthusiastic people who were willing to write encyclopedia entries without personal recognition and glory.  Wikipedia wants more and different kinds of people to be writing entries.  Would a system like Yelp’s that encourages a more explicit sense of community and social networking change who is attracted to Wikipedia? Or would it attract precisely the wrong kind of people, the ones who couldn’t work collaboratively without explicit credit and acknowledgment?

Yelp isn’t a model of community building either, of course.  Its users are more diverse than Wikipedia’s in that its breakdown by gender is 54-46, male-female, but it’s also a very young community.  It’s less international than Wikipedia, partly because it grows city by city, but its American youth-oriented culture may not translate well either.  It’s facing its own credibility problem as business owners accuse Yelp of extortion.  It’s not surprising, as it’s fueled by people who are addicted to writing reviews and complimenting each other, but it’s paid for by advertisers who don’t participate in that same incentive structure.

Both Yelp and Wikipedia have managed to attract active, enthusiastic contributors willing to do a lot for no pay (or mostly no pay in the case of Yelp, which has admitted to paying some reviewers).  But moving forward, which model of participation and rewards will be more attractive to more people for the right reasons?

For more on this issue, see today’s New York Times article on how news sites are considering getting rid of the anonymous option for commenters.  Or they could do what Slashdot does, which is call anyone who chooses to post anonymously an “Anonymous Coward.”

Building a community — with karma and elite squads

Thursday, April 8th, 2010

In high school psychology, I learned that rats that are rewarded for good behavior, i.e., given positive reinforcement, will repeat the good behavior.  Humans aren’t really that different.

Several of the online communities I looked at need their members to do stuff to make their communities work.  Some of them have decided to explicitly reward their members for good contributions.  For example, Yelp is a site with reviews of local businesses.  It knows that ratings alone aren’t very useful, as people have different standards, and it also knows one-line reviews claiming a restaurant is “great” or “terrible” aren’t very informative either.

Yelp encourages detailed, specific reviews in several ways.  Yelp invites members to rate each other’s reviews as “useful, funny or cool.”  Members can send each other compliments, little encouraging notes about what good writers they are or how cool they are.  Yelp removes reviews it deems to be rants or shills.  (This has led to some controversy as business owners have claimed Yelp removes reviews to extort business owners into taking out ads, to which Yelp has responded with some changes.)  The biggest gold star, literally a “badge” that gets attached to the user’s profile, is reserved for Yelpers in the Yelp Elite Squad.  To be eligible to become a member of the Elite Squad, a reviewer must post a real photo, use a real first name and last initial, and “be active Yelp evangelists and role models, on and off the site.”  Members of the Elite Squad are invited to local events, and they become another community unto themselves.

As a result, reviews on Yelp are considerably more detailed than reviews on comparable sites, and there are more of them.  For Abraco, a coffee shop in the East Village, Yelp lists 241 reviews.  Menupages, which doesn’t do any of the things Yelp does, has 7, and they tend to be a bit more prosaic:

Of course, everything has a downside.  Yelpers have a tendency to be self-indulgent in the way they write, with details about their personal lives and more that aren’t always relevant to the business at hand.  But the details aren’t totally worthless.  I appreciate the way Yelp encourages detailed reviews because the details often help me determine whether the reviewer is someone whose taste is similar to mine.  When someone tells me that he doesn’t like Chinese food and thought the restaurant should be serving white chicken meat, I know instantly that he does not have the same taste as me, and I will not rely on his review.  Whereas if that same person had only written, “Terrible food!”, I wouldn’t know enough to judge.

If I really want to know more about the reviewer’s tastes and preferences, I can even click on the reviewer’s name and see what else he or she has reviewed.  I can get a much better sense of who Mark L. is than of TheJuicyShow.

Similarly, Slashdot uses “karma” to encourage smart comments.  As a news aggregator for self-described nerds, Slashdot is as much a place to comment on stories as to read them.  Anyone who has read open comments on popular blogs knows that they are often full of inflammatory rants where people spout rather than read/listen to what others are saying.  Slashdot tries to deal with this by rating Slashdot users on their comments.  The better your comments, the more “karma” you get, in the form of assessments that your comment is “insightful,” “interesting,” etc.  Karma gives you the power to moderate others’ comments, though you have to spend the points within 3 days.  Good comments are considered an “achievement” that gets included on the profile of each user, which means, like Yelp, Slashdot users have personas that can be viewed by clicking on their profiles.

Wikipedia rewards activities in a slightly different way. Although Wikipedians also get rewarded with higher status, it’s not in as prominent a way as it is for Yelp or Slashdot users.  There are no badges or notes like “Insightful.”  Rather, as registered users contribute, they gain a reputation in that community. Those who meet the threshold for number of edits can vote in Wikimedia board elections, as well as stand as candidates for the board.  Other privileges, like administrator privileges, are granted to those who request them after a lengthy review of their contributions.  Wikipedia is following the model of open source software projects where people are granted more responsibility, like commit privileges, as they demonstrate that they do good work.  They’re rewarded with status, but not in as prominent a way as the badge Yelp Elite Squad members get.

Offline organizations also reward good participation, with awards that recognize exceptional volunteers and positions of leadership.  Habitat for Humanity affiliate chapters are often run by volunteers who have taken on responsibility after demonstrating their commitment.  But because activities online are transparent to the whole community, the rewards given for those activities are transparent as well.  It’s easier to reward online activities in small as well as large ways.  It’s also easier to keep track of large groups of people online.  Thus, the reward system for these online organizations is more visible and more apparent than for offline organizations.

And because the rewards systems are visible and apparent, they really affect the culture of the community.  There are people who claim to be addicted to Yelp; there are also people who really don’t care about being made a member of an elite squad.  Yelp’s reward system probably repels as many people as it attracts, and it’s important for anyone building a community to think about who they want to attract and how.

Yea or Nay: NYPD Skywatch crime surveillance…coming to a corner near you.

Friday, March 19th, 2010

One of these just showed up nearby. Here’s more info on what these things are.

Not the most subtle device in the world. But maybe that’s just the point?

Mobile crime surveillance units?


Yea or Nay: Sympathetic Advertising

Wednesday, March 17th, 2010

Using facial recognition technology, an internal computer determines your gender and your age. The billboard then pulls up an ad based on your demographic, targeting your best possible interest. The billboard I tried out saw that I was indeed a woman in her thirties and… lo and behold, pulled up a very appealing lunch advertisement.

The author of this article compares this new technology to retina scanning technology in the movie “Minority Report” that allowed “billboards” to play ads that are tailored to YOU, personally, not you, as a member of a demographic group. Is that a fair comparison?

After all, the data behind the Japanese advertising technology probably looks more like this Wikipedia page on Japanese demographics than this IMDB page on Tom Cruise.

Still, it’s very easy to see the slippery slope between these two scenarios, in particular because they are collecting the faces they’re reading.

So the question remains, where’s the bright line between tracking people to gain a “general understanding” of what’s going on and tracking individuals so they can’t get away with anything? Has this face-reading advertising technology already crossed that line?

What do you think?

Read faces to play demographically targeted ads?


Yea or Nay: Track Taxis with GPS?

Wednesday, March 17th, 2010

We talk a lot on this blog about how tracking personal activities and collecting data can be extremely useful. We also talk about the need for better laws, regulations and shared social understanding of how such data should be collected, shared and used.

As part of our ongoing work to make sense of such a complicated and confusing set of issues, we’ll be collecting interesting “moral dilemmas” related to the issue of tracking human behaviors and posting them as a series of online polls. It’s an attempt to take a more “empirical,” case-by-case approach in an effort to keep high-level policy thinking rooted in reality.

If you come across an interesting moral dilemma, please send it our way.

Without further ado, here’s the first poll:

Using G.P.S. technology installed in cabs, the (Taxi and Limousine) commission discovered more than 1.8 million trips where passengers were charged the higher rate.

Should we track taxis with GPS devices?


Smart Grid Data: Unexpected and Amazing Reuses?

Tuesday, March 16th, 2010

As noted in “In the Mix,” the Center for Democracy and Technology and the Electronic Frontier Foundation recently issued joint comments to the California Public Utilities Commission regarding proposed policies around the use of smart grids and smart meters.

(via Flowing Data.)

And then a few days later, I saw this: EPCOR, a Canadian water utility company, issued a graph plotting water usage during the Olympic men’s hockey final.  Notice the spikes in water consumption (and toilet flushing) immediately after the first period, second period, third period, and finally when Canada wins the gold medal.

Is this our worst nightmare?  That someone will find out when we’re peeing?

That’s a bad joke. Plotting a large area’s water consumption in aggregate is not the same as what some of these smart meters are able to measure in terms of energy consumption.

But I do have a more serious point to make.  One of the points CDT and EFF make repeatedly in their comments is that we should avoid “unnecessary” data collection and destroy any “unnecessary” data.

What exactly does “unnecessary” mean?

Does it mean any purpose that is not related to the work of a utility company?  Who decides what’s unnecessary and should they decide what’s unnecessary and necessary now?

The beauty of data is that its potential value is unknown.  A single dataset, collected for one purpose, can be used for other purposes that are socially beneficial but rather unexpected.  For example, Google Trends was created for advertisers so that they can track what search terms are popular.  The CDC, however, has been using Google Trends to track flu outbreaks, by watching where people are Googling flu symptoms, data which is more quickly collected than reports from doctors.  The reason governments all over the world are pushing for open data is that we don’t yet know all that can be done.  By giving access to everyone, we expect interesting, useful, imaginative things that we might never have imagined to come out of the data.

Data from the smart grids, in particular, will also require smart visualizations that are easy for individual consumers to understand and access.  Data alone isn’t going to change behavior.  You can imagine open data inviting developers to create easy-to-use apps that allow consumers to easily and painlessly identify ways to reduce energy consumption.  Some may even choose to share that information and compete with others, the way several universities have set up competitions between dorms.  As much as Al Gore was embarrassed by news revealing how much energy his mansion used, others may be eager to brag about how little energy they use.

Can we protect privacy while also creating room for imaginative and innovative reuse of data?

There are definitely privacy issues we have to consider.  I agree with a lot of the points made in CDT and EFF’s comments.  That “customer information” shouldn’t be limited to “personally identifying information.”  The misuse and misapplication of phrases like “personal information” is something we’ve been harping on for a while.  That customers should have access to the data collected from them and the power to correct mistakes.  That law enforcement shouldn’t be allowed to troll this information without a warrant, that civil litigants shouldn’t be allowed to access this information without a court order based on a showing of compelling interest and after notifying the customer to provide her with a chance to object.

But rather than talking about barring “unnecessary” data collection and data use, we should be thinking of ways to make the data safely available, regardless of whether someone has decided it’s necessary or not.  The data from smart grids is going to be both dangerous and valuable because it is so fine-grained; we clearly can’t just plop it online.  Anonymizing data is really hard.  So at CDP, we’re working hard at thinking about ways to come up with measurable privacy guarantees and testing technologies like PINQ that promise to provide access to raw data without indicating the existence of any particular individual in a dataset.  Other organizations may have different ideas.  I’m grateful for the existence of organizations that imagine the worst-case scenarios around data collection to protect our civil rights.  I also hope to see the growth of more organizations that try to imagine the best-case scenarios.
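
To make the idea of a measurable privacy guarantee a little more concrete, here is a minimal sketch of the kind of noisy counting query that tools like PINQ are built around (PINQ implements differential privacy).  This is an illustration, not PINQ’s actual API: the function name `noisy_count`, the sample readings, and the choice of `epsilon` are all hypothetical, and it assumes each household contributes exactly one record, so adding or removing a household changes the true count by at most 1.

```python
import random


def noisy_count(readings, predicate, epsilon, rng=random):
    """Return the number of readings matching `predicate`, plus Laplace noise.

    Hypothetical helper for illustration only. Assumes one record per
    household, so a counting query has sensitivity 1 and the noise scale
    is 1 / epsilon. Smaller epsilon means more noise and more privacy.
    """
    true_count = sum(1 for r in readings if predicate(r))
    # The difference of two independent exponential draws with rate
    # `epsilon` is a Laplace draw centered at 0 with scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Made-up daily usage figures (kWh), one per household.
    daily_kwh = [5, 12, 30, 45, 8, 27]
    answer = noisy_count(daily_kwh, lambda kwh: kwh > 20, epsilon=0.5)
    print(f"Roughly {answer:.1f} households used more than 20 kWh")
```

Any single answer is deliberately fuzzy, so it says little about whether a particular household is in the data, while the average over many households or many queries stays close to the truth.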

Prostate Cancer and the Inexorable Pull To Act On Unlikely Events

Wednesday, March 10th, 2010

Here’s another example of how we seize on numbers we can see, no matter how uncertain and meaningless they might be, because there’s not yet a viable alternative source of information.

As a society, we will probably opt for prostate testing no matter how flawed it is until there’s a better, more accurate alternative. In other words, bad, misleading information is better than no information, especially in a culture that prizes initiative and can-do-ness over a more fatalistic view of life: Yes We Can!

This is a design challenge for anybody trying to help people make sense of data. It is also especially important for us right now as we try to figure out a meaningful privacy guarantee for the datatrust. It’s easy for us to guarantee that you’ll never know with 100% certainty the answer to any question. But in many situations, people won’t need anything close to 100% certainty to feel compelled to act.

Certainly in the case of screening for diseases, it’s incredibly hard to do nothing if there is even a hint of a chance that we might be fatally ill.

What are other examples of numbers we make too much of and can’t get enough of?

  • Poll numbers
  • Housing data
  • Almost any study that comes out about health and nutrition