Posts Tagged ‘Privacy’

Building a community: the implications of Facebook’s new features for privacy and community

Thursday, May 6th, 2010

As I described in my last post, the differences between MySpace and Facebook are so stark, they don’t feel like natural competitors to me.  One isn’t necessarily better than the other.  Rather, one is catering to people who are looking for more of a public, party atmosphere, and the other is catering to people who want to feel like they can go to parties that are more exclusive and/or more intimate, even when they have 1000 friends.

But this difference doesn’t mean that one’s personal information on Facebook is necessarily more “private” than on MySpace.  MySpace can feel more public.  There is no visible wall between the site and the rest of the Internet-browsing community.  But Facebook’s desire to make more of its users’ information public is no secret.  For Facebook to maintain its brand, though, it can’t just make all information public by default.  This is a company that grew by promising Harvard students a network just for them, then Ivy League students a network just for them, and even now, it promises a network just for you and the people you want to connect with.

Facebook needs to remain a space where people feel like they can define their connections, rather than be open to anyone and everyone, even as more information is being shared.

And just in time for this post, Facebook rolled out new features that demonstrate how it is trying to do just that.

Facebook’s new system of Connections, for example, links information from people’s personal profiles to community pages, so that everyone who went to Yale Law School, for example, can link to that page. Although you could see other “Fans” of the school on the school’s own page before, the Community page puts every status update that mentions the school in one place, so that you’re encouraged to interact with others who mention the school.  The Community Pages make your presence on Facebook visible in new ways, but primarily to people who went to the same school as you, who grew up in the same town, who have the same interests.

Thus, even as information is shared beyond current friends, Facebook is trying to reassure you that mini-communities still exist.  You are not being thrown into the open.

Social plug-ins similarly “personalize” a Facebook user’s experience by accessing the user’s friends.  If you go to a participating news site, you’ll see which stories your friends have recommended.  If you “Like” a story on that site, it will appear as an item in your Facebook newsfeed.  The information that is being shared thus maps onto your existing connections.

The “Personalization” feature is a little different in that it’s not so much about your interactions with other Facebook users, but about your interaction with other websites.  Facebook shares the public information on your profile with certain partners.  For example, if you are logged into Facebook and you go to the music site Pandora, Pandora will access public information on your profile and play music based on your “Likes.”

This experience is significantly different from the way people explore music on MySpace.  MySpace has taken off as a place for bands to promote themselves because people’s musical preferences are public.  MySpace users actively request to be added to their favorite bands’ pages, they click on music their friends like, and thus browse through new music.  All of these actions are overt.

Pandora, on the other hand, recommends new music to you based on music you’ve already indicated you “Like” on your profile.   But it’s not through any obvious activity on your part.  You may have noted publicly that you “Like” Alicia Keys on your Facebook profile page, but you didn’t decide to actively plug that information into Pandora.  Facebook has done it for you.

Depending on how you feel about Facebook, you may think that’s wonderfully convenient or frighteningly intrusive.

And this is ultimately why Facebook’s changes feel so troubling for many people.

Facebook isn’t ripping down the walls of its convention center and declaring an open party.  As Farhad Manjoo at Slate says, Facebook is not tearing down its walls but “expanding them.”

Facebook is making peepholes in certain walls, or letting some people (though not everyone) into the parties users thought were private.

This reinforces the feeling that mini-communities continue to exist within Facebook, something the company should try to preserve, as it’s a major draw for many of its users.

Yet the multiplication of controls on Facebook for adjusting your privacy settings makes clear how difficult it is to share information and maintain this sense of mini-communities.  There are some who suspect Facebook is purposefully making it difficult to opt out.  But even if we give Facebook the benefit of the doubt, it’s undeniable that the controls as they were, plus the controls that now exist for all the new features, are bewildering.  Just because users have choices doesn’t mean they feel confident about exercising them.

On MySpace, the prevailing ethos of being more public has its own pitfalls.  A teenager posting suggestive photos of herself may not fully appreciate what she’s doing.  At the least, though, she knows her profile is public to the world.

On Facebook, users are increasingly unsure of what information is public and to whom.  That arguably is more unsettling than total disclosure.

Can we reconcile the goals of increased government transparency and more individual privacy?

Tuesday, April 13th, 2010

I really appreciate the Sunlight Foundation’s continuing series on new data sets being made public by the federal government as part of the Open Government Directive.  Yesterday, I found out the Centers for Medicare and Medicaid Services will be releasing all kinds of new goodies.  As the Sunlight Foundation points out, the data so far lacks granularity — comparisons of Medicare spending by state, rather than county.  But still, all very exciting.

Yet not a single mention of privacy.  Even though, according to the blogger, the new claims database will include data for 5% of Medicare recipients.  After “strip[ping] all personal identification data out,” the database will “present it by service type (inpatient, outpatient, home health, prescription drug, etc.)” As privacy advocates have noted, that’s probably not going to do enough to anonymize it.

I don’t really mind not hearing about privacy every time someone talks about a database.  But it’s sort of funny.  Every day, I read a bunch of blogs on open data and government transparency, as well as a bunch of blogs on privacy issues.  But I rarely read about both issues in the same place.  Shouldn’t we all be talking to each other more?

In the mix

Monday, April 5th, 2010

1) Slate had an interesting take on the bullying story in Massachusetts and the prosecutor’s anger at Facebook for not providing information, i.e., evidence of the bullying.  Apparently, Facebook provided basic subscriber information, but resisted providing more without a search warrant.  Emily Bazelon points out how this area of law is murky, and references the coalition forming around reforming the Electronic Communications Privacy Act, but her larger point is an extra-legal one.  The evidence of bullying the DA was looking for was at one point public, even if eventually deleted.  Kids or parents could take screenshots and preserve that evidence themselves, she points out, though people who are upset may not have the presence of mind to do so.

The case raises a lot of interesting questions about anonymity, privacy, and the values we have online.  Anonymity on the Internet has been a rallying cry for so many people, but I wonder, if something is illegal in the offline world, should it suddenly be legal online because you can be anonymous and avoid prosecution?  (Sexual harassment is a crime in the subway, too!)  We now live in a world where many of us occupy space both online and offline.  We used to think of them as completely separate spaces, and it’s true that the Internet gives us opportunities to do things, both good and bad, that we wouldn’t have offline.  But it’s increasingly obvious that we need to transfer some of the rules we have about the offline world into the online one.  For disability rights advocates, that includes pushing the definition of “public accommodation” to include online stores like Target, and suing them if their sites are not accessible to the blind using screen readers.  For privacy advocates, that includes acknowledging that people have an expectation of privacy in their emails as well as their snail mail.  Free speech in the offline world doesn’t mean you can say anything you want anywhere you want.  Maybe it’s time to be more nuanced about how we protect free speech online as well.

2) It turns out Twitter is pretty good at predicting box office returns — what else might it predict?

3) Cases like this amaze me, because the parties are litigating a question that seems like a no-brainer.  A New Jersey court recently held that an employee had an expectation of privacy in her personal Yahoo account, even if she accessed it on a company computer.  Would we ever litigate whether an employee had an expectation of privacy in a piece of personal mail she brought to the office and decided to read at her desk?

4) The New York Times is acknowledging their readers’ online comments in separate articles, namely, this one describing readers’ reactions to federal mortgage aid.  It’s a smart way to give online readers a sense that their comments are being read.  I wonder if this is where the “Letters to the Editor” page is going.  I’ve been wondering, who are these readers who are so happy to be the 136th comment on an article?  But the people who write letters to the editor have always been people who have extra time and energy.  In a way, online comments expand the world of people who are willing to write a letter to the editor.

5) Would we feel differently about government data mining if the government were better at it? Mimi and I went to a talk at the NYU Colloquium on Information Technology and Society where Joel Reidenberg, a law professor at Fordham, talked about how transparency of personal information online is eroding the rule of law.  One of the arguments he made against government data mining was that it doesn’t work, with the example of airport security, its inability to stop the underwear bomber, and its terribly inaccurate no-fly lists.  Well, the Obama administration just announced a new system of airport security checks that uses intelligence-based data mining that is meant to be more targeted.  It’s hard to know now whether the new system will be better and smarter, but it raises a point those opposed to data mining don’t seem to consider — what if the government were better at it?  Could data mining be so precise that it avoids racial profiling?  Are there other dangers to consider, and can they be warded off without shutting down data mining altogether?

Would PINQ solve the problems with the Census data?

Friday, February 5th, 2010

Frank McSherry, the researcher behind PINQ, has responded to our earlier blog post about the problems found in certain Census datasets and how PINQ might deal with those problems.

Would PINQ solve the problems with the Census data?

No.  But it might help in the future.

The immediate problem facing the Census Bureau is that they want to release a small sample of raw data, a Public Use Microdata Sample or PUMS, about 1/20 of the larger dataset they use for their own aggregates, that is supposed to be a statistical sample of the general population.  To release that data, the Bureau has to protect the confidentiality of people in the PUMS, and they do so, in part, by manipulating the data.  Some of their efforts, though, seem to have altered the data so seriously that it no longer accurately reflects the general population.

PINQ would not solve the immediate problem of allowing the Census Bureau to release a 1/20 sample of their data.  PINQ only allows researchers to query for aggregates.

However, if Census data were released behind PINQ, the Bureau would not have to swap or synthesize data to protect privacy; PINQ would do that.  Presumably, if the danger of violating confidentiality were removed, the Census could release more than a 1/20 sample of the data.  Furthermore, unlike the Bureau’s disclosure avoidance procedures, PINQ is transparent in describing the range of noise that is being added.  Currently, the Bureau can’t even tell you what it did to protect privacy without potentially violating it.

The mechanism for accessing data through PINQ, of course, would be very different than what researchers are used to today.  Now, with raw data, researchers like to “look at the data” and “fit a line to the data.”  A lot of these things can be approximated with PINQ, but most researchers reflexively pull back when asked to rethink how they approach data.  There are almost certainly research objectives that cannot be met with PINQ alone.  But the objectives that can be met should not be held back by the unavailability of high quality statistical information. Researchers able to express how and why their analyses respect privacy should be rewarded with good data, incentivizing creative rethinking of research processes.
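To make the “range of noise” concrete: PINQ-style systems rest on differential privacy, in which an aggregate query’s true answer is perturbed with Laplace noise scaled to the query’s sensitivity.  This is a minimal sketch of the idea in Python, not PINQ’s actual API (PINQ itself is a C# layer over LINQ); the function name and parameters are invented for illustration:

```python
import math
import random

def noisy_count(records, predicate, epsilon=0.1, rng=random):
    """Differentially private count: the true count plus Laplace(1/epsilon) noise.

    Adding or removing any single record changes the true count by at most 1
    (sensitivity 1), so noise with scale 1/epsilon hides whether any one
    individual is in the data.  Smaller epsilon = more noise = more privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace noise by inverse-transform sampling (stdlib only).
    u = rng.random() - 0.5
    scale = 1.0 / epsilon
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Averaged over many queries the answers are centered on the truth, but no single answer reveals whether a particular person’s record is present — which is why a researcher could be given useful aggregates without being handed the raw sample.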

With this research published, it may be easier to argue that the choice between PUMS (and other microdata) and PINQ is not a choice between raw data and noisy aggregates, but between bad data and noisy aggregates.  If and when it comes down to that choice, any serious scientist would reject bad data and accept noisy aggregates.

Can we trust Census data?

Wednesday, February 3rd, 2010

Yesterday, the Freakonomics blog at the New York Times reported that a group of researchers had discovered serious errors in PUMS (public-use microdata samples) files released by the U.S. Census Bureau.  When compared to aggregate data released by the Census, the PUMS files revealed discrepancies of up to 15% for the 65-and-older population.  As Justin Wolfers explains, PUMS files are small samples of the much larger, confidential data used by the Census for the general statistics it releases.  These samples are crucial to researchers and policymakers looking to measure trends that the Census itself has not calculated.

When I read this, the first thought I had was, “Hallelujah!”  Not because I felt gleeful about the Census Bureau’s mistakes, but because this little post in the New York Times articulated something we’ve been trying to communicate for a while: current methods of data collection (and especially data release) are not perfect.

People love throwing around statistics, and increasingly people love debunking statistics, but that kind of scrutiny is normally directed at surveys conducted by people who are not statisticians.  Most people hear words like “statistical sampling” and “disclosure avoidance procedure” and assume that the experts using them surely know what they’re doing.

But you don’t have to have training in statistics to read this paper and understand what happened. The Census Bureau, unlike many organizations and businesses that claim to “anonymize” datasets, knows that individual identities cannot be kept confidential simply by removing “identifiers” like name and address, which is why they use techniques like “data swapping” and “synthetic data.” It doesn’t take a mathematician to understand that when you’re making up data, you might have trouble maintaining the accuracy of the overall microdata sample.
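As a hypothetical illustration of why swapping costs accuracy (the Bureau’s actual procedures are confidential and far more sophisticated — the function and parameter names below are invented): exchanging an attribute between random pairs of records preserves the overall distribution of that attribute, but distorts any statistic that crosses it with another column.

```python
import random

def swap_attribute(records, key, frac=0.2, rng=None):
    """Toy 'data swapping': exchange one attribute between random pairs of records.

    The overall distribution of `key` is unchanged -- the same values are just
    reshuffled, which is what frustrates re-identification -- but any
    cross-tabulation of `key` with another column (say, age by county)
    drifts away from the truth.
    """
    rng = rng or random.Random()
    swapped = [dict(r) for r in records]  # leave the input intact
    n_pairs = max(1, int(len(swapped) * frac / 2))
    idx = rng.sample(range(len(swapped)), n_pairs * 2)
    for a, b in zip(idx[::2], idx[1::2]):
        swapped[a][key], swapped[b][key] = swapped[b][key], swapped[a][key]
    return swapped
```

The nationwide age distribution survives the shuffle, but the 65-and-older count in any one area can move — exactly the kind of discrepancy the researchers found in the PUMS files.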

To the Bureau’s credit, it does acknowledge where inaccuracies exist.  But as the researchers found, the Bureau is unwilling to correct its mistakes because doing so could reveal how they altered the data in the first place and thus compromise someone’s identity.  Which gets to the heart of the problem:

Newer techniques, such as swapping or blanking, retain detail and provide better protection of respondents’ confidentiality. However, the effects of the new techniques are less transparent to data users and mistakes can easily be overlooked.

The problems with current methods of data collection aren’t limited to the Census PUMS files either.  The weaknesses outlined by this former employee could apply to so many organizations.

This is why we have to work on new ways to collect, analyze, and release sensitive data.

In the mix: Your unique(ish) browser fingerprint…and…No $$ for privacy.

Friday, January 29th, 2010

1) EFF’s Panopticlick project lets you see how much your browser reveals and whether that might potentially “identify” you, based on their calculation of how identifiable a set of bits might be.

Can someone with a better grasp of math than I have explain to me how their information theory works?  Right now, they have, let’s say, 10,000 people who’ve contributed their browser info.  Bruce Schneier found out he was unique among 120,000.  But if millions of people tested their browsers, would his configuration really be that unique?  (Lots of skepticism in the comments to Schneier’s post, too.)
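My rough, hedged understanding of this kind of estimate: each observable feature carries a “surprisal” of log2(1/p) bits, where p is the fraction of browsers sharing that feature’s value, and roughly independent features add.  The feature names and frequencies below are made up for illustration, not EFF’s measurements:

```python
import math

def surprisal_bits(fraction):
    """Bits of identifying information in a feature shared by `fraction` of browsers."""
    return math.log2(1.0 / fraction)

# Hypothetical feature frequencies -- illustrative only.
features = {
    "user_agent": 1 / 1500,   # 1 in 1,500 browsers report this exact UA string
    "timezone": 1 / 4,
    "screen_size": 1 / 50,
    "fonts": 1 / 2000,
}

total_bits = sum(surprisal_bits(p) for p in features.values())
# Around 33 bits would suffice to single out one browser among 2**33
# (roughly 8.6 billion) -- more than the number of browsers in existence.
```

This also suggests an answer to the sample-size question: “unique among 120,000” only establishes that the fingerprint’s frequency looks smaller than about 1 in 120,000 in that sample (roughly 17 bits), so the same configuration could well recur once millions of browsers are measured.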

2) New initiative by advertising groups to reveal that they are tracking information — a small “i” icon:

What a quote: “‘This is not the full solution, but this moves the ball forward,’ he said.”

Well, that’s the understatement of the century.  Full solution to what?  The advertising industry keeping regulators off their backs?  Helping users understand how targeted advertising finds them?  Really, neither is the real problem.  Regulators should be focusing on establishing industry guidelines for how service providers and 3rd-party advertising partners store and share data.

3) Should government data be in more user-friendly formats than XML?

Or should we leave usability to disinterested 3rd parties?  If the government starts releasing user-friendly data, will that simply open the door for agencies to “spin” their data to make themselves look good?  Actually, right now, how do we really know the data that’s being released hasn’t been “edited” in some way?  Who’s vetting these releases and what’s the process?

4) Ten years and no one is really making any money off of “privacy”?

Perhaps no one has successfully “sold” privacy (as its own thing) because we haven’t yet agreed on what a “privacy product” would look like.  As Mimi says, “If someone was selling something that would guarantee that I would never get any SPAM (mail or email) for the rest of my life, I would totally sign up for that.”  But that might not equal “privacy” for someone else.

Yay, it’s Data Privacy Day!

Thursday, January 28th, 2010

As sponsored by, among others, Google, Microsoft, Lexis-Nexis, and AT&T.

Lexis-Nexis, for those of you who are not lawyers or journalists, is an amazing tool for doing research on court decisions, regulations, statutes, and other legal matters.  It is also a great way to investigate people, comb through property records, and more!  In a way, though, the information it stores is pretty private, at least to the extent that it’s so expensive to access, it’s not available to the vast majority of people.  Which makes me wonder, how much is Lexis-Nexis worried that its product is becoming less valuable because more and more of their information is available elsewhere for free?

Which leads me to the crux of the problem.  Privacy, a word for which very few people can agree on a definition, is nevertheless a real issue these days.  But the reason it’s become such a pressing concern isn’t only because surveillance technology has gotten better or more pervasive.  It’s also because more information is available everywhere.  Re-identification from supposedly anonymized databases wouldn’t be so easy if other data sources, like DMV records, weren’t so readily available.  In addition, the Internet is teeming with information we want to provide ourselves, through Facebook, PatientsLikeMe, and similar sites, which we do not just because we’re exhibitionists, but because we get value from sharing that information and seeing what others have shared as well.

We want privacy.  We want information.  How are we going to reconcile these two very legitimate desires?  Will there be trade-offs?  Can we really have it all?

We’re definitely not in the camp of “We’ll never have privacy, let’s throw out the data!”, nor the camp of “Privacy’s gone anyway.”  So yes, we do think we can have a lot, if not “all.”  And to do that, we need to move beyond talking about privacy and information in the abstract.  We need to look at specific areas — like electronic health records, campaign finance, government transparency — and be concrete about what we lose and what we gain with every decision we make.

Data Privacy Day may be “an international celebration of the dignity of the individual expressed through personal information,” but let’s be honest.  Dealing with these questions will be interesting, but it isn’t going to be a party.

Did the NYTimes Netflix Data Graphic Reveal the Netflix Preferences of Individual Users?

Tuesday, January 12th, 2010

Slate has an interesting slant on the New York Times graphic everyone’s been raving about — the most popular Netflix movies by zip code all over the country.  It really is great and fun to play with, but as Slate points out, some of the zip codes with rather anomalous lists may be pointing to individual users.  For example, 11317 has this top-ten list:

  1. Wall-E
  2. Indiana Jones and the Temple of Doom
  3. Oz: Season 3: Disc 1
  4. Watchmen
  5. The Midnight Meat Train
  6. Man, Woman, and the Wall
  7. Traffic
  8. Romancing the Stone
  9. Crocodile Dundee 2
  10. Godzilla’s Revenge

11317 is the zip code for LaGuardia Airport, which doesn’t have any residents.  That means this list may very well represent the Netflix renting habits of a small group or even a single subscriber who has his or her DVDs mailed there.

Slate finds some other zip codes that may represent a single subscriber, but doesn’t point out the privacy problem here, despite the fact that Netflix is already in hot water about its data releases.

We’ve said a lot about what “anonymization” means and what a privacy guarantee should include, so I won’t say more here.  Instead, I just want to point out that the Slate article helps illustrate the problem PINQ is trying to avoid.  As Tony points out in his post, PINQ won’t give you answers that would be changed by the presence of a single record.  Of course, because PINQ gives aggregate answers, you wouldn’t be asking questions phrased exactly as, “What are the top ten most popular Netflix movies for 11317?”  But if you tried to ask, “How many people in 11317 have viewed ‘The Midnight Meat Train’?”, it would add sufficient noise that you would never know that the single person using LaGuardia Airport as an address had viewed it.

Privacy Problems as Governance Problems at Facebook

Monday, January 4th, 2010

You know that feeling when you’ve been pondering something for a while and then you read something that articulates what you’ve been thinking about perfectly?  It’s a feeling between relief and joy, and it’s what I felt reading Ed Felten’s critique of Facebook’s new privacy problems:

What Facebook has, in other words, is a governance problem. Users see Facebook as a community in which they are members. Though Facebook (presumably) has no legal obligation to get users’ permission before instituting changes, it makes business sense to consult the user community before making significant changes in the privacy model. Announcing a new initiative, only to backpedal in the face of user outrage, can’t be the best way to maximize long-term profits.

The challenge is finding a structure that allows the company to explore new business opportunities, while at the same time securing truly informed consent from the user community. Some kind of customer advisory board seems like an obvious approach. But how would the members be chosen? And how much information and power would they get? This isn’t easy to do. But the current approach isn’t working either. If your business is based on user buy-in to an online community, then you have to give that community some kind of voice — you have to make it a community that users want to inhabit.

This is a question we at CDP have been asking ourselves recently — how do you create a community that users want to inhabit?  We agree with Ed Felten that privacy in Facebook, as in most online activities, “means not the prevention of all information flow, but control over the content of their story and who gets to read it.”  Our idea of a datatrust is premised on precisely this principle, that people can and should share information in a way that benefits all of society without being asked to relinquish control over their data. Which is why we’re in the process of researching a wide range of online and offline communities, so that when we launch our datatrust, it will be built around a community of users who feel a sense of investment and commitment to our shared mission of making more sensitive data available for public decision-making.

We’d love to know, what communities are you happy to inhabit?  And what makes them worth inhabiting?  What do they do that’s different from Facebook or any other organization?

Wow, new privacy features!

Friday, December 11th, 2009

Wow, so many companies rolling out new privacy features lately!

Facebook rolled out its new “simplified” privacy settings.  Google introduced Google Dashboard, a central location from which to manage your profile data, which supplements Google Ads Preferences.  And Yahoo released a beta version of the Ad Interest Manager.

Many, many people have reviewed Facebook’s changes and pointed out some “bait-and-switch” moves Facebook has made alongside some new, and I think better, controls.  I don’t have much more to say about that.

But it’s interesting to me that Google and Yahoo have chosen similar strategies around privacy issues, though with some differences in execution.  Neither company has actually changed its data collection practices, and cynics have argued that they’re both just trying to stave off government regulation.  Still, I think it makes a difference when companies actually make clear and visible what they are doing with user data.

“Is this everything?”

Both Google and Yahoo indicate in different ways that the user who is looking at Dashboard or Ad Interest Manager is not getting the full data story.

Google’s Dashboard is supposed to be a central place where a user can manage his or her own data.  In and of itself, it’s not that exciting.  As ReadWriteWeb put it, it doesn’t tell you anything you didn’t know before.  It provides links in one place to the privacy settings for various applications, but it focuses on profile information the user provides, which represents only a tiny bit of the personal information Google is tracking.

Google does, however, provide a link to the question, “Is this everything?” that describes some of their browser-based data collection and a link to the Ads Preferences Manager page.  To me, it feels a little shifty that the Dashboard promises to be a place for you to control “data that is personally associated with you,” but doesn’t reveal until you scroll to the bottom that this might not be everything.  Others may feel differently, but to me this goes right to the heart of how “personal information” is defined.  When I go to the Ads Preferences Manager, I see clearly that Google has associated all kinds of interests with me--how is this not “personally associated” with me?  Google states that it’s not linking this data to my personal account data, which is why they haven’t put it all in one place.  That separation is good, but it seems too convenient a reason to silo that information off.

Yahoo’s strategy is a little different.  It may not be fair to compare Yahoo’s Ad Interest Manager to Google’s Dashboard at this point, given that it’s in such a rudimentary phase.  It’s in beta and doesn’t work yet with all browsers.  (As David Courtney points out in PCWorld, being in beta is a pretty sorry excuse for the fact that it doesn’t work with IE8 and Firefox.)  Depending on how much you use Yahoo, you may not see anything about yourself.

Still, I thought it was interesting that Yahoo highlighted some of the hairy parts of its privacy policy in separate boxes high up on the page.  Starting from the top, Yahoo states clearly in separate boxes with bold headings that there are ways in which your data is collected and analyzed that are not addressed in this Ad Interest Manager.  The box for the Network Advertising Initiative is a little weak; it doesn’t really explain what it means that Yahoo is connected to the NAI.  But the box on “other inputs” shows prominently that even as you manage your settings on this page, there may be other sources of data Yahoo is using to find out more about you.


Yahoo also reveals that the information they’re tracking from you is collected from a wide range of sources, including both Yahoo account information like Mail and non-account websites like its Front Page.  Unlike Google, Yahoo doesn’t ask you to click around to find out that some of “everything” is elsewhere.


Turning “interests” on and off

Google and Yahoo are very similar here.  Google’s Ad Preferences Manager indicates which interests have been associated with you with a clear link to how they can be removed, with a button for opting out from tracking altogether.


Yahoo’s Ad Interest Manager has a different design, but the button for opting out altogether is similarly visible.


We’re using cookies!

Compared to the other issues, this is the most obvious difference between Google and Yahoo.

Google has this on its Ads Preferences Manager:


So you can see that some string of numbers and letters has somehow been attached to your computer, but you’re not told what this means in terms of what Google knows about you.

In contrast, Yahoo shows this at the bottom of the Ad Interest Manager:


Yahoo knows I’m a woman!  Between 26 and 35!  The location is actually wrong, as I am in Brooklyn, NY, but I did live in San Francisco 5 years ago when I first signed up for a Yahoo account.  Still, Yahoo is very explicitly showing, and not just telling, that it knows geographical information, age, gender, and the make and operating system of your computer.  I’m impressed—they must know this is going to scare some people.

Does any of this even matter?

I prefer the Yahoo design in many ways — the boxes and verticality of the manager to me are easier to read and understand than the horizontal spareness of the Google design.  But in the end, the design differences between Google and Yahoo’s new privacy tools may not even matter.  I don’t know how many people will actually see either Manager.  You still have to be curious enough about privacy to click on “Privacy Policy,” which takes you to Yahoo! Privacy, at which point, in the top right-hand corner, you see a link to “Opt-out” of interest-based advertising.  The same is true with Google. And neither company has actually changed much about their data collection practices.  They’re just being more open about them.

But I am impressed and heartened that both companies have started to reveal more about what they’re tracking and in ways that are more visually understandable than a long, boring, legalistic privacy policy.  I hope Yahoo is feeling competitive with Google on privacy issues and vice-versa.  I’d love to see a race to the top.
