Mark Zuckerberg: It takes a village to build trust.

June 4th, 2010 by Mimi Yin

This whole brouhaha over Facebook privacy seems stuck revolving around Mark Zuckerberg.

We seem to be locked in a personal tug-of-war with the CEO of Facebook, frustrated that a 26-year-old personally has so much power over so many.

Meanwhile, Mark Z. is personally reassuring us that we can trust Facebook, which on some level implies we must trust him.

But should any single individual really be entrusted with so much? Especially “a 26 year-old nervous, sweaty guy who dodges the questions.” Harsh, but not a completely invalid point.

As users of Facebook, we all know that it is the content of all our lives and our relationships to each other that make Facebook special. As a result, we feel a sense of entitlement about Facebook policy-making that we don’t feel about services that are in many ways more intrusive and/or less disciplined about protecting privacy (e.g. ISPs, cellphone providers, search).

Another way of putting it: Facebook is not Apple! As a result, it needs a CEO who is a community leader, not a dictator of cool.

So we start asking questions like, why should Facebook make the big bucks at the expense of my privacy? Shouldn’t I get a piece of that?

(Google’s been doing this for over a decade now, but the privacy exposure over at Google is invisible to the end-user.)

At some point, will we decide we would rather pay for a service than feel like we’re being manipulated by companies that know more about us than we do and can decide whether to use that information to help us or hurt us depending on profit margin? Here’s another example.

Or are there other ways to counterbalance the corporate monopoly on personal information? We think so.

In order for us to trust Facebook, Facebook needs to stop feeling like a benevolent dictatorship: one that is open to feedback, yes, but also one whose dictator looks like he’s in need of a regent.

Instead Facebook the company should consider adopting some significant community-driven governance reforms that will at least give it the patina of a democracy.


(Even if, at the end of the day, it is beholden to its owners and investors.)

For some context, this was the sum total of what Mark Z. had to say about how important decisions are made at Facebook:

We’re a company where there’s a lot of open dialogue. We have crazy dialogue and arguments. Every Friday, I have an open Q&A where people can come and ask me whatever questions they want. We try to do what we think is right, but we also listen to feedback and use it to improve. And we look at data about how people are using the site. In response to the most recent changes we made, we innovated, we did what we thought was right about the defaults, and then we listened to the feedback and then we holed up for two weeks to crank out a new privacy system.

Nothing outrageous. About par for your average web service. (But then again, Facebook isn’t your average web service.)

However, this is what should have been the meat of the discussion about how Facebook is going to address privacy concerns: community agency and decision-making, not Mark Z.’s personal vision of an interwebs brimming with serendipitous happenings.

Facebook the organization needs to be trusted. So it might be best if Mark Z. backed out of the limelight and stopped being the lone face of Facebook.

How might that D8 interview have turned out if he had come on stage with a small group of Facebook users?

What governance changes would make you feel more empowered as a Facebook user?

Governing the Datatrust: Answering the question, “Why should I trust you with my data?”

June 3rd, 2010 by The Common Data Project

Progress on defining the datatrust is accelerating–we can almost smell it!

For a refresher, the datatrust is an online service that will allow organizations to open sensitive data to the public and provide researchers, policymakers and application developers with a way to directly query the data, all without compromising individual privacy. Read more.

For the past two years, we’ve been working on figuring out exactly what the datatrust will be, not just in technical terms, but also in policy terms.

We’ve been thinking through what promises the datatrust will make, how those promises will be enforced, and how best we can build a datatrust that is governed, not by the whim of a dictator, but by a healthy synergy between the user community, the staff, and the board.

The policies we’re writing and the infrastructure we’re building are still a work in progress.  But for an overview of the decisions we’ve made and outstanding issues, take a look at “Datatrust Governance and Policies: Questions, Concerns, and Bright Ideas”.

Here’s a short summary of our overall strategy.

  1. Make a clear and enforceable promise around privacy.
  2. Keep the datatrust simple. We will never be all things to all people. The functions it does have will be few enough to be managed and monitored easily by a small staff, the user community, and the board.
  3. Have many decision-makers. It’s more important that we do the right things than that we do them quickly. We will create a system of checks and balances, in which authority to maintain and monitor the datatrust will be entrusted to several, separate parties, including the staff, the user community, and the board.
  4. Monitor, report, and review regularly. We will regularly review what we’re monitoring and how we’re doing it, and release the results to the public.
  5. Provide an escape valve. Develop explicit, enforceable policies on what the datatrust can and can’t do with the data. Prepare a “living will” to safely dispose of the data if the organization can no longer meet its obligations to its user community and the general public.

We definitely have a lot of work to do, but it’s exciting to be narrowing down the issues.  We’d love to hear what you think!

P.S. You can read more about the technical progress we’re making on the datatrust by visiting our Projects page.

Measuring the privacy cost of “free” services.

June 2nd, 2010 by Mimi Yin

There was an interesting pair of pieces on this Sunday’s “On The Media.”

The first was “The Cost of Privacy,” a discussion of Facebook’s new privacy settings, which presumably make it easier for users to clamp down on what’s shared.

A few points that resonated with us:

  1. Privacy is a commodity we all trade for things we want (e.g. celebrity, discounts, free online services).
  2. Going down the path of having us all set privacy controls everywhere we go on the internet is impractical and unsustainable.
  3. If no one were willing to share their data, most of the services we love to get for free would disappear. (Randall Rothenberg)
  4. The services collecting and using data don’t really care about you the individual; they only care about trends and aggregates. (Dr. Paul H. Rubin)

We wish one of the interviewees had gone even farther to make the point that since we all make decisions every day to trade a little bit of privacy in exchange for services, privacy policies really need to be built around notions of buying and paying, where what you “buy” are services and what you pay with are “units” of privacy risk (as in risk of exposure).

  1. Here’s what you get in exchange for letting us collect data about you.
  2. Here’s the privacy cost of what you’re getting (in meaningful and quantifiable terms).

(And no, we don’t believe that deleting data after 6 months and/or listing out all the ways your data will be used is an acceptable proxy for calculating “privacy cost.” Besides, such policies inevitably severely limit the utility of data and stifle innovation to boot.)

Gaining clarity around privacy cost is exactly where we’re headed with the datatrust. What’s going to make our privacy policy stand out is not that our privacy “guarantee” will be 100% ironclad.

We can’t guarantee total anonymity. No one can. Instead, what we’re offering is an actual way to “quantify” privacy risk so that we can track and measure the cost of each use of your data, and we can “guarantee” that we will never use more than the amount you agreed to.

This in turn is what will allow us to make some measurable guarantees around the “maximum amount of privacy risk” you will be exposed to by having your data in the datatrust.
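To make “never use more than the amount you agreed to” concrete, here is a minimal sketch of what we have in mind: a per-person privacy-risk ledger that charges each use of the data against an agreed cap. The class name, the numbers, and the choice of unit are purely illustrative, not a spec for the datatrust.

```python
# Hypothetical sketch of a privacy-risk ledger: each use of the data "spends"
# some measurable amount of risk, and a use is refused once it would exceed
# the amount the data donor agreed to. The unit could be the epsilon of
# differential privacy (see the posts below) or any other quantifiable
# measure of exposure.

class PrivacyBudget:
    def __init__(self, agreed_cap: float):
        self.agreed_cap = agreed_cap   # total risk the data donor signed up for
        self.spent = 0.0               # risk consumed by uses of the data so far

    def charge(self, cost: float) -> bool:
        """Record one use's privacy cost; refuse it if the cap would be exceeded."""
        if self.spent + cost > self.agreed_cap:
            return False               # use denied: the guarantee holds
        self.spent += cost
        return True

budget = PrivacyBudget(agreed_cap=1.0)   # assumed cap, for illustration only
print(budget.charge(0.3))   # True  -- service rendered, 0.3 units of risk spent
print(budget.charge(0.8))   # False -- this use would exceed the agreed amount
```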


The second segment was on privacy rights and issues of due process vis-a-vis the government and data-mining.

Kevin Bankston from EFF gave a good run-down of how ECPA is laughably ill-equipped to protect individuals using modern-day online services from unprincipled government intrusions.

One point that wasn’t made is that compared to search and seizure of physical property, the privacy impact of data-mining is easily several orders of magnitude greater. Like most things in the digital realm, it’s incredibly easy to sift through hundreds of thousands of user accounts, whereas it would be impossibly onerous to search 100,000 homes or read 100,000 paper files.

This is why we disagree with the idea that we should apply old standards created for a physical world to the new realities of the digital one.

Instead, we need to look at actual harm and define new standards around limiting the privacy impact of investigative data-mining.

Again, this would require a quantitative approach to measuring privacy risk.

(Just to be clear, I’m not suggesting that we limit the size of the datasets being mined; that would defeat the purpose of data-mining. Rather, I’m talking about process guidelines for how to go about doing low-(privacy) impact data-mining. More to come on this topic.)

Ten Things We Learned About Communities

June 1st, 2010 by Grace Meng

After 8 posts and several thousand words on how communities encourage participation, define membership, sustain networks, and govern themselves, what have we learned?

[Photo by Dimitri Damasceno, Creative Commons Attribution-ShareAlike 2.0 Generic]

We started this study because the datatrust we are working to build will depend on an invested and active community.  We want data donors, data borrowers, and data curators to interact as members of a community who are empowered to manage data, monitor the community, and hold the datatrust accountable to its mission.

So here are the findings we think are most relevant to the datatrust:

What motivates high-quality participation?

1. People are motivated to participate by rewards, but also by a desire to enhance their reputations.

Do communities need to have a mission?

2.  A shared ethos, culture, or mission is important if you want members of the community to be invested in the community and its survival as an institution.

3.  A shared ethos, culture, or mission also makes it harder to have a very large and diverse community of people with different tastes and goals.

Should we require real identities?

4. People care more about their reputations when their real identities are on the line.

Can a community get too big?

5. If a large social network is to maintain a sense of small-scale community, it needs to reinforce a feeling of smaller communities within the social network.

Does diversity matter, in what way and why?

6. Diversity isn’t necessary for a successful community, but it’s important if the community’s goals require participation from a broad and diverse range of people.

Should you have to “pay to play”?

7.  We have always anticipated instituting a clear quid pro quo in the datatrust community – if you donate data, you get access to data.  Although we value the clarity of that exchange, will it limit our ability to grow?

Do more privacy controls = more control over privacy?

8.  People need to understand intuitively where information is going and to whom for privacy controls to be meaningful.

Is self-governance worth it?

9.  Decentralization of power and transparency can go a long way in helping an organization build trust.

10.  But you will have to put up with people who argue about what color to paint the bike shed.

Building a community: who’s in charge?

May 28th, 2010 by Grace Meng

[Comic from http://xkcd.com/]

We’ve seen so far that for a community to be vibrant and healthy, people have to care about the community and the roles they play in it.  A community doesn’t have to be a simple democracy, one member/one vote on all decisions, but members have to feel some sense of agency and power over what happens in the community.

Of course, agency can mean a lot of things.

On one end of the spectrum are membership-based cooperatives, like credit unions and the Park Slope Food Coop, where members, whether or not they exercise it, have decision-making power built into the infrastructure of the organization.

On the other end are most online communities, like Yelp, Facebook, and MySpace.  Because the communities are all about user-generated content, users clearly have a lot of say in how the community develops.

But generally speaking, users of for-profit online services, even ones that revolve around user-generated content, don’t have the power to actually govern the community or shape policies.

Yelp, for example, allows more or less anyone to write a review.  But the power to monitor and remove reviews for being shills, rants, or other violations of its terms of use is centralized in Yelp’s management and staff.  The editing is done behind closed doors, rather than out in the open with community input.  Given its profit model, it’s not surprising that Yelp has been accused repeatedly of using its editing power as a form of extortion when it tries to sell ads to business owners.

Even if Yelp is innocent, it doesn’t help that the process is not transparent, which is why Yelp has responded by at least revealing which reviews have been removed.

(As for Facebook, the hostility between the company and at least some of its users is obvious.  No need to go there again.)

And then there are communities that are somewhere in between, like Wikipedia.  Wikipedia isn’t a member-based organization in a traditional sense.  Community members elect three, not all, of the board members of Wikimedia.  Each community member does not have the same amount of power as another community member – people who gain greater responsibilities and greater status also have more power.  But many who are actively involved in how Wikipedia is run are volunteers, rather than paid staff, who initially got involved the same way everybody does, as writers and editors of entries.

There are some obvious benefits to a community that largely governs itself.

It’s another way for the community to feel that it belongs to its members, not some outside management structure.  The staff that runs Wikipedia can remain relatively small, because many volunteers are out there reading, editing, and monitoring the site.

Perhaps most importantly, power is decentralized and decisions are by necessity transparent.  Although not all Wikipedia users have access to all pages, there’s an ethos of openness and collaboration.

For example, a controversy recently erupted at Wikipedia.  Wikimedia Commons was accused of holding child pornography.  Jimmy Wales, the founder of Wikipedia, then started deleting images.  A debate ensued within the Wikipedia community about whether this was appropriate, a debate any of us can read.  Ultimately, it was decided that he would no longer have “founder” editing privileges, which had allowed him to delete content without the consent of other editors.  Wikimedia also claims that he never had final editorial control to begin with.  Whether or not Wikimedia is successful, it wants and needs to project a culture of collaboration, rather than personality-driven dictatorship.

It’s hard to imagine Mark Zuckerberg giving up comparable privileges to resolve the current privacy brouhaha at Facebook.

But it’s not all puppies and roses, as anyone who’s actually been a part of such a community knows.

It’s harder to control problems, which is why a blatantly inaccurate entry on Wikipedia once sat around for 123 days.  Some community members tend to get a little too excited telling other members they’re wrong, which can be a problem in any organization, but is multiplied when everyone has the right to monitor.

Some are great at pointing out problems but not so good at taking responsibility for fixing them.

And groups of people together can rathole on insignificant issues (especially on mailing lists), stalling progress because they can’t bring themselves to resolve “What color should we paint the bikeshed?” issues.

Wikipedia has struggled with these challenges over the past ten years.  It now limits access to certain entries in order to control accuracy, but arguably at some cost to the vibrancy of the community.  It is also trying to open up in new directions, with a redesign it hopes will encourage more diverse groups to write and edit entries (though personally, I think the new design looks a lot like the old one).

Ultimately, someone still has to be in charge.  And when you value democracy over dictatorship, it’s harder but arguably more interesting, to figure out what that looks like.

Recap and Proposal: 95/5, The Statistically Insignificant Privacy Guarantee

May 26th, 2010 by Mimi Yin


[Image from xkcd]

In our search for a privacy guarantee that is both measurable and meaningful to the general public, we’ve traveled a long way in and out of the nuances of PINQ and differential privacy: a relatively new, quantitative approach to protecting privacy. Here’s a short summary of where we’ve been, followed by a proposal, built around the notion of statistical significance, for where we might want to go.

The “Differential Privacy” Privacy Guarantee

Differential privacy guarantees that no matter what questions are asked and how answers to those questions are crossed with outside data, your individual record will remain “almost indiscernible” in a data set protected by differential privacy. (The corollary is that the impact of your individual record on the answers given out by differential privacy will be “negligible.”)

For a “quantitative” approach to protecting privacy, the differential privacy guarantee is remarkably NOT quantitative.
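To be fair, there is one number differential privacy does pin down: the parameter epsilon, which caps how much more likely any given noisy answer can be under one data set than under a neighboring data set that differs by a single record. Here is a minimal sketch (ours, not PINQ) of a Laplace-noised count with an assumed epsilon; the trouble, as the rest of this post argues, is translating that epsilon into a promise the general public can evaluate.

```python
# A minimal sketch (not PINQ) of an epsilon-differentially-private count.
# The guarantee: for any output x and any two data sets D, D' differing in
# one record, Pr[answer = x | D] <= exp(epsilon) * Pr[answer = x | D'].
# "Almost indiscernible" means exp(epsilon) is close to 1.
import random

def noisy_count(true_count, epsilon):
    """Release a count plus Laplace noise of scale 1/epsilon (sensitivity 1)."""
    # The difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

print(noisy_count(42, epsilon=0.5))  # e.g. 43.8 -- one possible "noisy answer"
```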

So I began by proposing the idea that the probability of a single record being present in a data set should equal the probability of that single record not being present in that data set (50/50).

I introduced the idea of worst-case scenario where a nosy neighbor asks a pointed question that essentially reduces to a “Yes or no? Is my neighbor in this data set?” sort of question and I proposed that the nosy neighbor should get an equivocal (50/50) answer: “Maybe yes, but then again, (equally) maybe no.”

(In other words, “almost indiscernible” is hard to quantify. But completely indiscernible is easy to quantify.)

We took this 50/50 definition and tried to bring it to bear on the reality of how differential privacy applies noise to “real answers” to produce identity-obfuscating “noisy answers.”

I quickly discovered that no matter what, differential privacy’s noisy answers always imply that one answer is more likely than another.
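Here’s a back-of-the-envelope illustration of why (our own numbers, with an assumed epsilon of 0.5, not anything prescribed by PINQ): in the nosy-neighbor scenario, any noisy answer tilts the odds toward one of the two possible “real answers” by a factor of up to e^epsilon.

```python
# Why a Laplace-noised answer always implies one "real answer" is more likely.
# Assumes a counting query with sensitivity 1 and a hypothetical epsilon of 0.5,
# so the noise scale is 1/epsilon = 2.
import math

def laplace_density(x, center, scale):
    """Density of the Laplace distribution centered at `center`."""
    return math.exp(-abs(x - center) / scale) / (2 * scale)

epsilon = 0.5
scale = 1.0 / epsilon
noisy_answer = 3.7   # the answer the nosy neighbor happens to get

# How likely is 3.7 if the real count is 1 (neighbor present) vs 0 (absent)?
ratio = laplace_density(noisy_answer, 1, scale) / laplace_density(noisy_answer, 0, scale)
print(ratio)   # exp(epsilon) ~= 1.65: "present" is ~1.65x more likely -- never 50/50
```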

My latest post was a last gasp explaining why there really is no way to deliver on the completely invisible, completely non-discernible 50/50 privacy guarantee (even if we abandoned Laplace).

(But I haven’t given up on quantifying the privacy guarantee.)

Now we’re looking at statistical significance as a way to draw a quantitative boundary around a differential privacy guarantee.

Below is a proposal that we’re looking for feedback on. We’re also curious to know whether anyone else has tried to come up with a way to quantify the differential privacy guarantee.

What is Statistical Significance? Is it appropriate for our privacy guarantee?

In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. Applied to our privacy guarantee, you might ask the question this way: When you get an answer about a protected data set, are the implications of that “differentially private” answer (as in implications about what the “real answer” might be) significant or are they simply the product of chance?

Is this an appropriate way to define a quantifiable privacy guarantee? We’re not sure.

Thought Experiment: Tossing a Weighted Coin

You have a coin. You know that one side is heavier than the other side. You have only 1 chance to spin the coin and draw a conclusion about which side is heavier.

At what weight distribution split does the result of that 1 coin spin start to be statistically significant?

Well, if you take the “conventional” definition of statistical significance where results start to be statistically significant when you have less than a 5% chance of being wrong, the boundary in our weighted coin example would be 95/5 where 95% of the weight is on one side of the coin and 5% is on the other.

What does this have to do with differential privacy?

Mapped onto differential privacy, the weight distribution split is the moral equivalent of the probability split between two possible “real answers.”

The 1 coin spin is the moral equivalent of being able to ask 1 question of the data set.

With a sample size of 1 question, the probability split between two possible, adjacent “real answers” would need to be at least 95/5 before the result of that 1 question was statistically significant.

That in turn means that at 95/5, the presence or absence of a single individual’s record in a data set won’t have a statistically significant impact on the noisy answer given out through differential privacy.
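For what it’s worth, here is the back-of-the-envelope arithmetic (ours, not part of PINQ) that connects the 95/5 boundary to the epsilon parameter: a 95/5 split between two adjacent “real answers” is a likelihood ratio of 19, and under the Laplace mechanism that ratio is capped at e^epsilon, so the epsilon implied by 95/5 for a single question is ln(19), roughly 2.94.

```python
# The 95/5 boundary translated into the epsilon of differential privacy
# (illustrative arithmetic only).
import math

split = 0.95 / 0.05          # likelihood ratio between two adjacent real answers: 19
epsilon = math.log(split)    # the epsilon that allows exactly that ratio
print(epsilon)               # ~2.944
```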

(Still, 95% certainty doesn’t sound very good.)

Postscript: Obviously, we don’t want to be in a situation where asking just 1 question of a data set brings it to the brink of violating the privacy guarantee. However, thinking in terms of 1 question is a helpful way to figure out the “total” amount of privacy risk the system can tolerate. And since the whole point of differential privacy is that it offers a quantitative way to track privacy risk, we can take that “total” amount and divide it by the number of questions we want to be able to dole out per data set and arrive at a per-question risk threshold.
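As a sketch of that last step (the numbers are placeholders, not a policy):

```python
# Turning a total risk tolerance into a per-question threshold, as described
# in the postscript. Both numbers below are illustrative assumptions.
total_risk = 2.944          # e.g. the 95/5 boundary computed above
questions_allowed = 100     # how many questions the datatrust plans to dole out
per_question_risk = total_risk / questions_allowed
print(per_question_risk)    # ~0.029 units of risk charged per question
```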

Really? 50/50 privacy guarantee is truly impossible?

May 24th, 2010 by Mimi Yin

At the end of my last post, we came to the rather sad conclusion that as far as differential privacy is concerned, it is not possible to offer a 50/50, “you might as well not be in the data set” privacy guarantee because, well, the Laplace distribution curves used to apply identity-obfuscating noise in differential privacy are too…curvy.

No matter how much noise you add, answers you get out of differential privacy will always imply that one number is more likely to be the “real answer” than another. (Which as we know from our “nosy-neighbor-worst-case-scenario,” can translate into revealing the presence of an individual in a data set: The very thing differential privacy is supposed to protect against.)

Still, “50/50 is impossible” is predicated on the nature of the Laplace curves. What would happen if we got rid of them? Are there any viable alternatives?

Apparently, no. 50/50 truly is impossible.

There are a few ways to understand why and how.

The first is a mental sleight of hand. A 50/50 guarantee is impossible because that would mean that the presence of an individual’s data literally has ZERO impact on the answers given out by PINQ, which would effectively cancel out differential privacy’s ability to provide more or less accurate answers.

Back to our worst-case scenario: in a 50/50 world, a PINQ answer of 3.7 would not only imply that the real answer was just as likely to be 0 as 1, it would also imply that the real answer was just as likely to be 8 as 18K or 18MM. Differential privacy answers would effectively be completely meaningless.

Graphically speaking, to get 50/50, the currently pointy noise distribution curves would have to be perfectly horizontal, stretching out to infinity in both directions on the number line.

What about a bounded flat curve?

(If pressed, this is probably the way most people would understand what is meant when someone says an answer has a noise level or margin of error of +/-50.)

Well, if you were to apply noise with a rectangular curve, in our worst-case scenario, with +/-50 noise, there would be a 1 in 100 chance that you get an answer that definitively tells you the real answer.

If the real answer is 0, a rectangular noise level of +/-50 would yield answers from -50 to +50.

If the real answer is 1, a rectangular noise level of +/-50 would yield answers from -49 to +51.

If you get a PINQ answer of 37, you’re set. It’s equally likely that the answer is 0 as that the answer is 1. 50/50 achieved.

If you get a PINQ answer of 51, well, you’ll know for sure that the real answer is 1, not 0. And there’s a 1 in 100 chance that you’ll get an answer above 50.

Meaning there’s a 1% chance that in the worst-case scenario you’ll get 100% “smoking gun” confirmation that someone is definitely present in a data set.

As it turns out, rectangular curves are a lot dumber than those pointy Laplace things because they don’t have asymptotes to plant a nagging seed of doubt. In PINQ, all noise distribution curves have an asymptote of zero (as in zero likelihood of being chosen as a noisy answer).

In plain English, that means that every number on the real number line has a chance (no matter how tiny) of being chosen as a noisy answer, no matter what the “real answer” is. In other words, there are no “smoking guns.”
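A quick simulation makes the contrast concrete (our own illustration, assuming the +/-50 rectangle above and a Laplace scale of 2): flat, bounded noise hands over a smoking gun about 1% of the time, while Laplace noise never does more than tilt the odds.

```python
# Worst-case nosy-neighbor question: the real answer is either 0 (absent)
# or 1 (present).
import math
import random

def uniform_noisy(real_answer):
    """Bounded, flat ("rectangular") noise of +/-50."""
    return real_answer + random.uniform(-50, 50)

# With rectangular noise, any answer above 50 is only reachable if the real
# answer is 1 -- a smoking gun.
trials = 100_000
smoking_guns = sum(1 for _ in range(trials) if uniform_noisy(1) > 50)
print(smoking_guns / trials)   # ~0.01, i.e. about a 1% chance of certainty

def laplace_density(x, center, scale):
    return math.exp(-abs(x - center) / scale) / (2 * scale)

# With Laplace noise (scale 2 assumed), the same answer of 51 is possible
# under either real answer; it only makes "present" ~1.65x more likely.
print(laplace_density(51, 1, 2) / laplace_density(51, 0, 2))
```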

So now we’re back to where we left off in our last post, trying to pick an arbitrary probability split for our privacy guarantee.

Or maybe not. Could statistical significance come and save the day?

Could we quantify our privacy guarantee by saying that the presence or absence of a single record will not affect the answers we give out to a statistically significant degree?

In the mix…DNA testing for college kids, Germany trying to get illegally gathered Google data, and the EFF’s privacy bill of rights for social networks

May 21st, 2010 by Grace Meng

1) UC Berkeley’s incoming class will all get DNA tests to identify genes that show how well you metabolize alcohol, lactose, and folates. “After the genetic testing, the university will offer a campuswide lecture by Mr. Rine about the three genetic markers, along with other lectures and panels with philosophers, ethicists, biologists and statisticians exploring the benefits and risks of personal genomics.”

Obviously, genetic testing is not something to take lightly, but the objections quoted sounded a little paternalistic. For example, “They may think these are noncontroversial genes, but there’s nothing noncontroversial about alcohol on campus,” said George Annas, a bioethicist at the Boston University School of Public Health. “What if someone tests negative, and they don’t have the marker, so they think that means they can drink more? Like all genetic information, it’s potentially harmful.”

Isn’t this the reasoning of people who preach abstinence-only sex education?

2) Google recently admitted they were collecting wifi information during their Streetview runs.  Germany’s reaction? To ask for the data so they can see if there’s reason to charge Google criminally.  I don’t understand this.  Private information is collected illegally so it should just be handed over to the government?  Are there useful ways to review this data and identify potential illegalities without handing the raw data over to the government?  Another example of why we can’t rest on our laurels — we need to find new ways to look at private data.

3) EFF issued a privacy bill of rights for social network users.  Short and simple.  It’s gotten me thinking, though, about what it means that we’re demanding rights from a private company. Not to get all Rand Paul on people (I really believe in the Civil Rights Act, all of it), but users’ frustrations with Facebook and their unwillingness to actually leave make clear that what Facebook is offering is not just a service provided to a customer.  danah boyd has a suggestion — let’s think of Facebook as a utility and regulate it the way we regulate electric, water, and other similar utilities.

Building a community: the costs and benefits of a community built on a quid pro quo

May 18th, 2010 by Grace Meng

A couple of posts ago, I wrote about how Yelp, Slashdot and Wikipedia reward their members for contributing good content with stars, karma points, and increased status, all benefits reserved just for their registered members.  All three communities, however, share the benefits of what they do with the general public.  You don’t have to contribute a single edit to a Wikipedia entry to read all the entries you want.  You don’t have to register to read Yelp reviews, nor to read Slashdot news.  For Wikipedia and Slashdot, you don’t even have to register to edit/make a comment.  You can do it anonymously.

In other communities, however, those who want to benefit from the community must also give back to the community.

Credit unions, for example, have benefits for their members and their members only.  Credit unions and banks offer a lot of the same services – accounts, mortgages, and other loans – but they often do so on better terms than banks do.  However, while a bank will offer a mortgage to a person who does not have an account at that bank, a credit union will provide services only to credit union members.

It is a quid pro quo deal – the credit union member opens an account and the credit union provides services in return.

A more particular example is the Park Slope Food Coop, a cooperative grocery store to which I belong.  Many food coops operate on multiple levels of access and benefits.  Non-members can shop, but may not get as big a discount as members.   Those who want to be members can choose to pay a fee or to volunteer their time.  The Park Slope Food Coop eliminates all those choices – you have to be a member to shop, and you have to work to be a member.  Every member of the Coop is required to work 2 hours and 45 minutes every 4 weeks.  The exact requirement can vary depending on the type of work you sign up for, and the kind of work schedule you have, but that work requirement exists for every single adult member of the Coop.  In return, you get access to the Coop’s very fresh and varied produce and goods, often of higher quality and at lower prices than other local stores.

Again, it’s quid pro quo: members work and they get access to food in return.

This is not to suggest that members of credit unions and the Coop are acting in a mercenary way.  Quid pro quo doesn’t just mean “you scratch my back, I scratch yours.”  It means you do something and get something of equal value in return.

There are some real advantages to limiting benefits for community members and community members only.

The incentive to join is clear.  The community is often more tight-knit.  Most of all, there is no conflict of interest between what’s good for the community and what’s good for the members.  A bank serves its customers but it has an incentive to make money that goes beyond protecting its customers. Credit unions were not untouched by the financial crisis, but they were certainly not as entangled as commercial banks and are considered good places still to get loans if you have good credit.

There are also real disadvantages.

As both examples make clear, such communities tend to be small and local.  The Coop has more than 12,000 members, a lot for a physically small space, but nowhere close to the numbers that visit large supermarkets.  Credit unions boast that they serve 186 million people worldwide, but any particular credit union is much smaller.  Even the credit union associated with an employer as large as Microsoft is nowhere near as large as a national bank.  It’s difficult to scale the benefits of a credit union up.

Even if the group is kept small, the costs of monitoring this kind of community are obviously high.  In an organization like the Coop, someone needs to make sure everyone is doing their fair share of the work.  Stories about being suspended, applying for “amnesty,” and trying to hide spouses and partners abound.  The Coop is the grocery store non-members love to hate and a favorite subject in local media, with stories popping up every couple of years with headlines like, “Won’t Work for Food: Horror Stories of the World’s Largest Member-Owned Cooperative Grocery Store” and “Flunking Out at the Coop.”

Personally, I think the Coop functions surprisingly well, proven by its relative longevity among cooperative endeavors, but it’s certainly not a utopian grocery store where people hold hands and sing “Kumbaya” over artichokes.

Notably, both examples are also communities that mainly operate offline.  The Internet with its ethos of openness generally doesn’t favor sites that limit access only to members.  Registered users may need to log on to view their personal accounts, but few sites really limit the benefits of the site to members alone.

So is there any online community that limits the benefits of the community as strictly to members as my two offline examples?

The first example I could come up with was Facebook, and it’s actually a terrible one.  Facebook’s been all over the news for the changes that make its users’ information more publicly available, and new sites like Openbook are making obvious how public that information is.  At the same time, though, that public-ness is still not that obvious to the average Facebook user.  Information is primarily being accessed by third party partners (like Pandora), other sites using Facebook’s Open Graph, and other Facebook users (Community Pages, Like buttons across the Internet).  Facebook profiles can show up in public search results, but when you go to facebook.com, the first thing you see is a wall.  If you register, you can use Facebook.  If not, you can’t.

Facebook is perhaps most accurately an example of a community that looks closed but isn’t.  As danah boyd points out,

If Facebook wanted radical transparency, they could communicate to users every single person and entity who can see their content…When people think “friends-of-friends” they don’t think about all of the types of people that their friends might link to; they think of the people that their friends would bring to a dinner party if they were to host it. When they think of everyone, they think of individual people who might have an interest in them, not 3rd party services who want to monetize or redistribute their data. Users have no sense of how their data is being used and Facebook is not radically transparent about what that data is used for. Quite the opposite. Convolution works. It keeps the press out.

In a way, it shouldn’t surprise us that Facebook is pushing information public.  Its whole economic model is based on information, not on providing a service to its users.

Which leads me to the one good example of an online community where you really have to join to benefit — online dating sites.  Match.com, eHarmony, OkCupid — none of them let you look at other members’ profiles before you join.  OkCupid is free, but the others rely on an economic model of subscriptions, not advertising.

It seems dating is in that narrow realm of things people are willing to pay for on the Internet.

So I’m left wondering, is it possible to set up a free, large-scale, online community where benefits are limited to its members?  What are the other costs and benefits of a community where you have to give to get?  Closed versus open?  And do the benefits outweigh the costs?

In the mix…Linkedin v. Facebook, online identities, and diversity in online communities

May 14th, 2010 by Grace Meng

1) Is Linkedin better than Facebook with privacy? I’m not sure this is the right question to ask. I’m also not sure the measures Cline uses to evaluate “better privacy” get to the heart of the problem.  The existence of a privacy seal of approval, the level of detail in the privacy policy, the employment of certified privacy professionals … none of these factors address what users are struggling to understand, that is, what’s happening to their information.  73% of adult Facebook users think they only share content with friends, but only 42% have customized their privacy settings.

Ultimately, Linkedin and Facebook are apples to oranges.  As Cline points out himself, people on Linkedin are in a purely professional setting.  People who share information on Linkedin do so for a specific, limited purpose — to promote themselves professionally.  In contrast, people on Facebook have to navigate being friends with parents, kids, co-workers, college buddies, and acquaintances.  Every decision to share information is much more complicated — who will see it, what will they think, how will it reflect on the user?  Facebook’s constant changes to how user information is shared make these decisions even more complicated — who can keep track?

In this sense, Linkedin is definitely easier to use.  If privacy is about control, then Linkedin is definitely easier to control.  But does this mean something like Facebook, where people share in a more generally social context, will always be impossible to navigate?

2) Mark Zuckerberg thinks everyone should have a single identity (via Michael Zimmer).  Well, that would certainly be one way to deal with it.

3) But most people, even the “tell-all” generation, don’t really want to go there.

4) In a not unrelated vein, Sunlight Labs has a new app that allows you to link data on campaign donations to people who email you through Gmail.  At least with regards to government transparency, Sunlight Labs seems to agree with Mark Zuckerberg.  I think information about who I’ve donated money to should be public (go ahead, look me up), but it does unnerve me a little to think that I could email someone on Craigslist about renting an apartment and have this information just pop up.  I don’t know, does the fact that it unnerves me mean that it’s wrong?  Maybe not.

5) Finally, a last bit on the diversity of online communities: it may be more necessary than I claimed, though with a slightly different slant on diversity.  A new study found that the healthiest communities are “diverse” in that new members are constantly being added.  Although they were looking at chat rooms, which to me seem like the loosest form of community, the finding makes a lot of sense to me.  A breast cancer survivors’ forum may not care whether they have a lot of men, but they do need to attract new participants to stay vibrant.

