Archive for the ‘Best Practices’ Category

Ten Things We Learned About Communities

Tuesday, June 1st, 2010

After 8 posts and several thousand words on how communities encourage participation, define membership, sustain networks, and govern themselves, what have we learned?

Dimitri Damasceno Creative Commons Attribution ShareAlike 2.0 (Generic)

We started this study because the datatrust we are working to build will depend on an invested and active community.  We want data donors, data borrowers, and data curators to interact as members of a community who are empowered to manage data, monitor the community, and hold the datatrust accountable to its mission.

So here are the findings we think are most relevant to the datatrust:

What motivates high-quality participation?

1. People are motivated to participate by rewards, but also by a desire to enhance their reputations.

Do communities need to have a mission?

2.  A shared ethos, culture, or mission is important if you want members of the community to be invested in the community and its survival as an institution.

3.  A shared ethos, culture, or mission also makes it harder to have a very large and diverse community of people with different tastes and goals.

Should we require real identities?

4. People care more about their reputations when their real identities are on the line.

Can a community get too big?

5. If a large social network is to maintain a sense of small-scale community, it needs to reinforce a feeling of smaller communities within the social network.

Does diversity matter, in what way and why?

6. Diversity isn’t necessary for a successful community, but it’s important if the community’s goals require participation from a broad and diverse range of people.

Should you have to “pay to play”?

7.  We have always anticipated instituting a clear quid pro quo in the datatrust community – if you donate data, you get access to data.  Although we value the clarity of that exchange, will it limit our ability to grow?

Do more privacy controls=more control over privacy?

8.  People need to understand intuitively where information is going and to whom for privacy controls to be meaningful.

Is self-governance worth it?

9.  Decentralization of power and transparency can go a long way in helping an organization build trust.

10.  But you will have to put up with people who argue about what color to paint the bike shed.

Building a community: who’s in charge?

Friday, May 28th, 2010

From http://xkcd.com/

We’ve seen so far that for a community to be vibrant and healthy, people have to care about the community and the roles they play in it.  A community doesn’t have to be a simple democracy, one member/one vote on all decisions, but members have to feel some sense of agency and power over what happens in the community.

Of course, agency can mean a lot of things.

On one end of the spectrum are membership-based cooperatives, like credit unions and the Park Slope Food Coop, where members, whether or not they exercise it, have decision-making power built into the infrastructure of the organization.

On the other end are most online communities, like Yelp, Facebook, and MySpace.  Because the communities are all about user-generated content, users clearly have a lot of say in how the community develops.

But generally speaking, users of for-profit online services, even ones that revolve around user-generated content, don’t have the power to actually govern the community or shape policies.

Yelp, for example, allows more or less anyone to write a review.  But the power to monitor and remove reviews for being shills, rants, or other violations of its terms of use is centralized in Yelp’s management and staff.  The editing is done behind closed doors, rather than out in the open with community input.  Given its profit model, it’s not surprising that Yelp has been accused repeatedly of using its editing power as a form of extortion when it tries to sell ads to business owners.

Even if Yelp is innocent, it doesn’t help that the process is not transparent, which is why Yelp has responded by at least revealing which reviews have been removed.

(As for Facebook, the hostility between the company and at least some of its users is obvious.  No need to go there again.)

And then there are communities that are somewhere in between, like Wikipedia.  Wikipedia isn’t a member-based organization in a traditional sense.  Community members elect three, not all, of the board members of Wikimedia.  Each community member does not have the same amount of power as another community member – people who gain greater responsibilities and greater status also have more power.  But many who are actively involved in how Wikipedia is run are volunteers, rather than paid staff, who initially got involved the same way everybody does, as writers and editors of entries.

There are some obvious benefits to a community that largely governs itself.

It’s another way for the community to feel that it belongs to its members, not some outside management structure.  The staff that runs Wikipedia can remain relatively small, because many volunteers are out there reading, editing, and monitoring the site.

Perhaps most importantly, power is decentralized and decisions are by necessity transparent.  Although not all Wikipedia users have access to all pages, there’s an ethos of openness and collaboration.

For example, a controversy recently erupted at Wikipedia.  Wikimedia Commons was accused of holding child pornography.  Jimmy Wales, the founder of Wikipedia, then started deleting images.  A debate ensued within the Wikipedia community about whether this was appropriate, a debate any of us can read.  Ultimately, it was decided that he would no longer have “founder” editing privileges, which had allowed him to delete content without the consent of other editors.  Wikimedia also claims that he never had final editorial control to begin with.  Whether or not Wikimedia is successful, it wants and needs to project a culture of collaboration, rather than personality-driven dictatorship.

It’s hard to imagine Mark Zuckerberg giving up comparable privileges to resolve the current privacy brouhaha at Facebook.

But it’s not all puppies and roses, as anyone who’s actually been a part of such a community knows.

It’s harder to control problems, which is why a blatantly inaccurate entry on Wikipedia once sat around for 123 days.  Some community members tend to get a little too excited telling other members they’re wrong, which can be a problem in any organization, but is multiplied when everyone has the right to monitor.

Some are great at pointing out problems but not so good at taking responsibility for fixing them.

And groups of people together can rathole on insignificant issues (especially on mailing lists), stalling progress because they can’t bring themselves to resolve “What color should we paint the bikeshed?” issues.

Wikipedia has struggled with these challenges over the past ten years.  It now limits access to certain entries in order to control accuracy, but arguably at some cost to the vibrancy of the community.  Wikipedia is also trying to open itself up in new directions with a redesign that it hopes will encourage more diverse groups to write and edit entries (though personally, I think it looks a lot like the old one).

Ultimately, someone still has to be in charge.  And when you value democracy over dictatorship, it’s harder, but arguably more interesting, to figure out what that looks like.

Recap and Proposal: 95/5, The Statistically Insignificant Privacy Guarantee

Wednesday, May 26th, 2010


Image from: xkcd.

In our search for a privacy guarantee that is both measurable and meaningful to the general public, we’ve traveled a long way in and out of the nuances of PINQ and differential privacy, a relatively new, quantitative approach to protecting privacy. Here’s a short summary of where we’ve been, followed by a proposal, built around the notion of statistical significance, for where we might want to go.

The “Differential Privacy” Privacy Guarantee

Differential privacy guarantees that no matter what questions are asked and how answers to those questions are crossed with outside data, your individual record will remain “almost indiscernible” in a data set protected by differential privacy. (The corollary to that is that the impact of your individual record on the answers given out by differential privacy will be “negligible.”)

For a “quantitative” approach to protecting privacy, the differential privacy guarantee is remarkably NOT quantitative.

So I began by proposing the idea that the probability of a single record being present in a data set should equal the probability of that single record not being present in that data set (50/50).

I introduced the idea of worst-case scenario where a nosy neighbor asks a pointed question that essentially reduces to a “Yes or no? Is my neighbor in this data set?” sort of question and I proposed that the nosy neighbor should get an equivocal (50/50) answer: “Maybe yes, but then again, (equally) maybe no.”

(In other words, “almost indiscernible” is hard to quantify. But completely indiscernible is easy to quantify.)

We took this 50/50 definition and tried to bring it to bear on the reality of how differential privacy applies noise to “real answers” to produce identity-obfuscating “noisy answers.”

I quickly discovered that no matter what, differential privacy’s noisy answers always imply that one answer is more likely than another.

My latest post was a last gasp explaining why there really is no way to deliver on the completely invisible, completely non-discernible 50/50 privacy guarantee (even if we abandoned Laplace).

(But I haven’t given up on quantifying the privacy guarantee.)

Now we’re looking at statistical significance as a way to draw a quantitative boundary around a differential privacy guarantee.

Below is a proposal that we’re looking for feedback on. We’re also curious to know if anyone else has tried to come up with a way to quantify the differential privacy guarantee.

What is Statistical Significance? Is it appropriate for our privacy guarantee?

In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. Applied to our privacy guarantee, you might ask the question this way: When you get an answer about a protected data set, are the implications of that “differentially private” answer (as in implications about what the “real answer” might be) significant or are they simply the product of chance?

Is this an appropriate way to define a quantifiable privacy guarantee? We’re not sure.

Thought Experiment: Tossing a Weighted Coin

You have a coin. You know that one side is heavier than the other side. You have only 1 chance to spin the coin and draw a conclusion about which side is heavier.

At what weight distribution split does the result of that 1 coin spin start to be statistically significant?

Well, if you take the “conventional” definition of statistical significance where results start to be statistically significant when you have less than a 5% chance of being wrong, the boundary in our weighted coin example would be 95/5 where 95% of the weight is on one side of the coin and 5% is on the other.

What does this have to do with differential privacy?

Mapped onto differential privacy, the weight distribution split is the moral equivalent of the probability split between two possible “real answers.”

The 1 coin toss is the moral equivalent of being able to ask 1 question of the data set.

With a sample size of 1 question, the probability split between two possible, adjacent “real answers” would need to be at least 95/5 before the result of that 1 question was statistically significant.

That in turn means that at 95/5, the presence or absence of a single individual’s record in a data set won’t have a statistically significant impact on the noisy answer given out through differential privacy.

(Still 95% certainty doesn’t sound very good.)

Postscript: Obviously, we don’t want to be in a situation where asking just 1 question of a data set brings it to the brink of violating the privacy guarantee. However, thinking in terms of 1 question is a helpful way to figure out the “total” amount of privacy risk the system can tolerate. And since the whole point of differential privacy is that it offers a quantitative way to track privacy risk, we can take that “total” amount and divide it by the number of questions we want to be able to dole out per data set and arrive at a per-question risk threshold.
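To make the "divide the total by the number of questions" idea concrete, here is a rough sketch in Python. It rests on assumptions that are not spelled out in the post: that the nosy neighbor starts out at even odds, so that a 95/5 split corresponds to a likelihood ratio of 19 and a total privacy parameter (epsilon) of ln(19); that risk simply adds up across questions; and that each question is a count that one record can change by at most 1. The function names are made up for illustration and are not part of PINQ.

```python
import math

def total_epsilon(split=0.95):
    """Epsilon at which worst-case odds, starting from even odds, reach `split`."""
    return math.log(split / (1.0 - split))

def per_question_epsilon(num_questions, split=0.95):
    """Even share of the total budget, assuming risk adds up across questions."""
    return total_epsilon(split) / num_questions

def laplace_scale(epsilon, sensitivity=1.0):
    """Laplace noise scale for a query that one record can change by `sensitivity`."""
    return sensitivity / epsilon

budget = total_epsilon()             # ln(19), roughly 2.94
share = per_question_epsilon(100)    # budget spread over 100 hypothetical questions
print(budget, share, laplace_scale(share))
```

The more questions we want to allow, the smaller each share becomes and the wider the noise on every individual answer has to be.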

Really? 50/50 privacy guarantee is truly impossible?

Monday, May 24th, 2010

At the end of my last post, we came to the rather sad conclusion that as far as differential privacy is concerned, it is not possible to offer a 50/50, “you might as well not be in the data set” privacy guarantee because, well, the Laplace distribution curves used to apply identity-obfuscating noise in differential privacy are too…curvy.

No matter how much noise you add, answers you get out of differential privacy will always imply that one number is more likely to be the “real answer” than another. (Which as we know from our “nosy-neighbor-worst-case-scenario,” can translate into revealing the presence of an individual in a data set: The very thing differential privacy is supposed to protect against.)
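One way to see this is to compare how likely a given noisy answer is under two adjacent "real answers." The sketch below is only an illustration, not PINQ's actual code: the function names and the noise scales are assumptions, and it uses the textbook Laplace density.

```python
import math

def laplace_pdf(x, center, scale):
    """Density of a Laplace distribution centered on `center` with scale `scale`."""
    return math.exp(-abs(x - center) / scale) / (2.0 * scale)

def likelihood_ratio(noisy_answer, real_a, real_b, scale):
    """How much more likely `noisy_answer` is under real_a than under real_b."""
    return (laplace_pdf(noisy_answer, real_a, scale)
            / laplace_pdf(noisy_answer, real_b, scale))

# Hypothetical worst case: the real answer is either 0 or 1, and we observe 3.7.
for scale in (1.0, 10.0, 100.0):
    ratio = likelihood_ratio(3.7, 1, 0, scale)
    # For answers that differ by 1, the ratio can never exceed exp(1 / scale).
    print(f"scale={scale}: ratio={ratio:.4f}, bound={math.exp(1.0 / scale):.4f}")
```

Adding more noise (a larger scale) pushes the ratio toward 1, but it never actually gets there, so the noisy answer always leans toward one real answer over the other.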

Still, “50/50 is impossible” is predicated on the nature of the Laplace curves. What would happen if we got rid of them? Are there any viable alternatives?

Apparently, no. 50/50 truly is impossible.

There are a few ways to understand why and how.

The first is a mental sleight of hand. A 50/50 guarantee is impossible because that would mean that the presence of an individual’s data literally has ZERO impact on the answers given out by PINQ, which would effectively cancel out differential privacy’s ability to provide more or less accurate answers.

Back to our worst-case scenario, in a 50/50 world, a PINQ answer of 3.7 would not only equally imply that the real answer was 0 as that it was 1, it would also equally imply that the real answer was 8, as that it was 18K or 18MM. Differential privacy answers would effectively be completely meaningless.

Graphically speaking, to get 50/50, the currently pointy noise distribution curves would have to be perfectly horizontal, stretching out to infinity in both directions on the number line.

What about a bounded flat curve?

(If pressed, this is probably the way most people would understand what is meant when someone says an answer has a noise level or margin of error of +/-50.)

Well, if you were to apply noise with a rectangular curve, in our worst-case scenario, with +/-50 noise, there would be a 1 in 100 chance that you get an answer that definitively tells you the real answer.

If the real answer is 0, a rectangular noise level of +/-50 would yield answers from -50 to +50.

If the real answer is 1, a rectangular noise level +/-50 would yield answers from -49 to +51.

If you get a PINQ answer of 37, you’re set. It’s equally likely that the answer is 0 as that the answer is 1. 50/50 achieved.

If you get a PINQ answer of 51, well, you’ll know for sure that the real answer is 1, not 0. And there’s a 1 in 100 chance that you’ll get an answer of 51.

Meaning there’s a 1% chance that in the worst-case scenario you’ll get 100% “smoking gun” confirmation that someone is definitely present in a data set.
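The 1 in 100 figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes the noise is spread evenly over a continuous range (not just whole numbers); the function name is made up for illustration.

```python
def smoking_gun_probability(half_width, real_answer=1, other_answer=0):
    """Chance that flat noise in [-half_width, +half_width] around `real_answer`
    lands on a value that `other_answer` could never have produced."""
    low, high = real_answer - half_width, real_answer + half_width
    other_low, other_high = other_answer - half_width, other_answer + half_width
    width = high - low
    # Stretch of the number line that both real answers could have produced.
    overlap = max(0.0, min(high, other_high) - max(low, other_low))
    return (width - overlap) / width

print(smoking_gun_probability(50))   # 0.01, the "1 in 100" chance
```

Widening the rectangle shrinks that probability, but for any bounded range it stays above zero.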

As it turns out, rectangular curves are a lot dumber than those pointy Laplace things because they don’t have asymptotes to plant a nagging seed of doubt. In PINQ, all noise distribution curves have an asymptote of zero (as in zero likelihood of being chosen as a noisy answer).

In plain English, that means that every number on the real number line has a chance (no matter how tiny) of being chosen as a noisy answer, no matter what the “real answer” is. In other words, there are no “smoking guns.”

So now we’re back to where we left off in our last post, trying to pick an arbitrary probability split for our privacy guarantee.

Or maybe not. Could statistical significance come and save the day?

Could we quantify our privacy guarantee by saying that the presence or absence of a single record will not affect the answers we give out to a statistically significant degree?

Building a community: the costs and benefits of a community built on a quid pro quo

Tuesday, May 18th, 2010

A couple of posts ago, I wrote about how Yelp, Slashdot and Wikipedia reward their members for contributing good content with stars, karma points, and increased status, all benefits reserved just for their registered members.  All three communities, however, share the benefits of what they do with the general public.  You don’t have to contribute a single edit to a Wikipedia entry to read all the entries you want.  You don’t have to register to read Yelp reviews, nor to read Slashdot news.  For Wikipedia and Slashdot, you don’t even have to register to edit/make a comment.  You can do it anonymously.

In other communities, however, those who want to benefit from the community must also give back to the community.

Credit unions, for example, have benefits for their members and their members only.  Credit unions and banks offer a lot of the same services – accounts, mortgages, and other loans – but credit unions often do so on better terms than banks do.  However, while a bank will offer a mortgage to a person who does not have an account at that bank, a credit union will provide services only to credit union members.

It is a quid pro quo deal – the credit union member opens an account and the credit union provides services in return.

A more particular example is the Park Slope Food Coop, a cooperative grocery store to which I belong.  Many food coops operate on multiple levels of access and benefits.  Non-members can shop, but may not get as big a discount as members.   Those who want to be members can choose to pay a fee or to volunteer their time.  The Park Slope Food Coop eliminates all those choices – you have to be a member to shop, and you have to work to be a member.  Every member of the Coop is required to work 2 hours and 45 minutes every 4 weeks.  The exact requirement can vary depending on the type of work you sign up for, and the kind of work schedule you have, but that work requirement exists for every single adult member of the Coop.  In return, you get access to the Coop’s very fresh and varied produce and goods, often of higher quality and at lower prices than other local stores.

Again, it’s quid pro quo, members work and they get access to food in return.

This is not to suggest that members of credit unions and the Coop are acting in a mercenary way.  Quid pro quo doesn’t just mean “you scratch my back, I scratch yours.”  It means you do something and get something of equal value in return.

There are some real advantages to limiting benefits for community members and community members only.

The incentive to join is clear.  The community is often more tight-knit.  Most of all, there is no conflict of interest between what’s good for the community and what’s good for the members.  A bank serves its customers but it has an incentive to make money that goes beyond protecting its customers. Credit unions were not untouched by the financial crisis, but they were certainly not as entangled as commercial banks and are still considered good places to get loans if you have good credit.

There are also real disadvantages.

As both examples make clear, such communities tend to be small and local.  The Coop has more than 12,000 members, a lot for a physically small space, but nowhere close to the numbers that visit large supermarkets.  Credit unions boast that they serve 186 million people worldwide, but any particular credit union is much smaller.  Even the credit union associated with an employer as large as Microsoft is nowhere near as large as a national bank.  It’s difficult to scale the benefits of a credit union up.

Even if the group is kept small, the costs of monitoring this kind of community are obviously high.  In an organization like the Coop, someone needs to make sure everyone is doing their fair share of the work.  Stories about being suspended, applying for “amnesty,” and trying to hide spouses and partners abound.  The Coop is the grocery store non-members love to hate and a favorite subject in local media, with stories popping up every couple of years with headlines like, “Won’t Work for Food: Horror Stories of the World’s Largest Member-Owned Cooperative Grocery Store” and “Flunking Out at the Coop.”

Personally, I think the Coop functions surprisingly well, proven by its relative longevity among cooperative endeavors, but it’s certainly not a utopian grocery store where people hold hands and sing “Kumbaya” over artichokes.

Notably, both examples are also communities that mainly operate offline.  The Internet with its ethos of openness generally doesn’t favor sites that limit access only to members.  Registered users may need to log on to view their personal accounts, but few sites really limit the benefits of the site to members alone.

So is there any online community that limits the benefits of the community as strictly to members as my two offline examples?

The first example I could come up with was Facebook, and it’s actually a terrible one.  Facebook’s been all over the news for the changes that make its users’ information more publicly available, and new sites like Openbook are making obvious how public that information is.  At the same time, though, that public-ness is still not that obvious to the average Facebook user.  Information is primarily being accessed by third party partners (like Pandora), other sites using Facebook’s Open Graph, and other Facebook users (Community Pages, Like buttons across the Internet).  Facebook profiles can show up in public search results, but when you go to facebook.com, the first thing you see is a wall.  If you register, you can use Facebook.  If not, you can’t.

Facebook is perhaps most accurately an example of a community that looks closed but isn’t.  As danah boyd points out,

If Facebook wanted radical transparency, they could communicate to users every single person and entity who can see their content…When people think “friends-of-friends” they don’t think about all of the types of people that their friends might link to; they think of the people that their friends would bring to a dinner party if they were to host it. When they think of everyone, they think of individual people who might have an interest in them, not 3rd party services who want to monetize or redistribute their data. Users have no sense of how their data is being used and Facebook is not radically transparent about what that data is used for. Quite the opposite. Convolution works. It keeps the press out.

In a way, it shouldn’t surprise us that Facebook is pushing information public.  Its whole economic model is based on information, not on providing a service to its users.

Which leads me to the one good example of an online community where you really have to join to benefit — online dating sites.  Match.com, eHarmony, OKCupid — none of them let you look at other members’ profiles before you join.  OkCupid is free, but the others rely on an economic model of subscriptions, not advertising.

It seems dating is in that narrow realm of things people are willing to pay for on the Internet.

So I’m left wondering, is it possible to set up a free, large-scale, online community where benefits are limited to its members?  What are the other costs and benefits of a community where you have to give to get?  Closed versus open?  And do the benefits outweigh the costs?

Building a community: Does a community have to be diverse to be successful?

Tuesday, May 11th, 2010

Last year, Wikipedia made headlines when a survey commissioned by the Wikimedia Foundation discovered only 13% of Wikipedia’s writers and editors are women.  Among people who read but don’t write or edit for Wikipedia, 69% are men and 31% are women.  The same survey found that Wikipedians were much more highly educated than the rest of the population, with 19% saying they have a Master’s degree and 4.4% saying they have a Ph.D.

Facebook and MySpace have similarly gotten press for news that the demographics of the social networks’ members vary across race, class, and education.

It shouldn’t surprise us that these sites, or any other sites, would be more popular among certain demographic groups.

All communities, online or off, tend to reflect their founders and the worlds they come from.

Mark Zuckerberg, Facebook

Facebook was founded by Mark Zuckerberg while he was at Harvard.  Facebook ended up more popular with Ivy League students.  Wikipedia was founded using wiki technology and principles from the open source software movement.  Wikipedians, not surprisingly, are “mostly male computer geeks,” as described by founder Jimmy Wales.  Yelp started in San Francisco, and the irreverent, young tone echoes the tone of many Silicon Valley start-ups, attracting irreverent, young people.  It’s not just that the sites’ founders attract people who are like them.  They set the tone, based on values they hold, that tend to be shared by people similar to them.

Jimmy Wales, Wikipedia

Even as sites grow and expand beyond the first adopters, communities can develop cultures that are more attractive to certain groups than others.

Women are allegedly more active than men on Facebook, whereas the opposite is true on Twitter. 

Although both sites involve sharing information, the mechanisms are quite different.

I’m not going to hazard a guess as to why women are more drawn to Facebook or why men are more drawn to Twitter.  I do think it’s funny when writers forget their personal preferences might not be universal.  This writer, this writer, and this writer, who agree Twitter is much better than Facebook — all men.

Martha Stewart

Here’s one prominent exception, Martha Stewart, who says,

First of all, you don’t have to spend any time on it, and, second of all, you reach a lot more people. And I don’t have to ‘befriend’ and do all that other dippy stuff that they do on Facebook.

Which sounds like a stereotypically male sort of thing to say.

But it is worth noting that certain ways of interacting are more appealing to some groups than others, even when sites are not being marketed specifically to one group or another.

Why does it matter?  A successful community is not necessarily a diverse one.

A forum for breast cancer patients won’t measure the health of its community by the number of men on it.

One of the most attractive things about the Internet is its ability to concentrate people with esoteric interests.

However, for communities with more universal goals, diversity is an important issue.

It makes sense that Wikipedia has publicly been working on making its community of writers and editors more diverse.  If Wikipedia’s goal is to create “a world in which every single human being can freely share in the sum of all knowledge,” it has to include the knowledge and perspective of people other than male computer geeks.  (I can’t say for sure, but I would bet there were a couple of male computer geeks involved in the writing of this rather literal exposition of the “sanitary napkin.”)

As part of that plan, Wikipedia is rolling out a redesign, which they hope will encourage more people to contribute their knowledge.

Whether or not the redesign drastically affects Wikipedia’s numbers, the plan will likely involve a delicate balancing act.

Wikipedia needs to attract new members without alienating the original members of its community.

If the old interface was intimidating to some people, it was probably equally attractive to others.  Those who didn’t find it intimidating could identify as part of a hard-core, committed group, an identity that can be crucial for energizing early members of a community.

It’s the problem of any community that wants to grow – how do you grow without destroying the sense of community that helped it start in the first place?

Large organizations have traditionally tried to maintain a sense of community with local chapters.

The Sierra Club and Habitat for Humanity International are both built on a network of local affiliates that have a certain amount of autonomy.  The Catholic Church and other religious organizations operate using a similar organizational structure, though with varying degrees of centralized control.

Online, the examples are fewer.

In fact, the only example of a community in our study that’s grown obviously beyond the boundaries of the original group is Facebook, and as I’ve discussed earlier, it’s an outlier.

It contains communities but is not actually a community in and of itself.  Despite the demographic differences between Facebook and MySpace, Facebook has arguably grown so big, those differences have become negligible.  It almost doesn’t matter if Facebook is somewhat more popular among certain groups when it has 400 million active users.  At the same time, though, each Facebook user’s experience of Facebook is filtered through his or her friends.  Even though the dilemma of whether to accept a friend request from a parent has become a common joke, most people on Facebook haven’t directly experienced how quickly Facebook has expanded.

This may be why Facebook has managed to transcend so quickly its origins as an online social network for Harvard students.  The feeling of intimacy and connection hasn’t changed for the average user.  It’s questionable whether Facebook can maintain that sense of localized community with the various changes it’s made to how user information is shared, but Facebook is gambling that it can.

Building a community: the implications of Facebook’s new features for privacy and community

Thursday, May 6th, 2010

As I described in my last post, the differences between MySpace and Facebook are so stark, they don’t feel like natural competitors to me.  One isn’t necessarily better than the other.  Rather, one is catering to people who are looking for more of a public, party atmosphere, and the other is catering to people who want to feel like they can go to parties that are more exclusive and/or more intimate, even when they have 1000 friends.

But this difference doesn’t mean that one’s personal information on Facebook is necessarily more “private” than on MySpace.  MySpace can feel more public.  There is no visible wall between the site and the rest of the Internet-browsing community.  But Facebook’s desire to make more of its users’ information public is no secret.  For Facebook to maintain its brand, though, it can’t just make all information public by default.  This is a company that grew by promising Harvard students a network just for them, then Ivy League students a network just for them, and even now, it promises a network just for you and the people you want to connect with.

Facebook needs to remain a space where people feel like they can define their connections, rather than be open to anyone and everyone, even as more information is being shared.

And just in time for this post, Facebook rolled out new features that demonstrate how it is trying to do just that.

Facebook’s new system of Connections, for example, links information from people’s personal profiles to community pages, so that everyone who went to Yale Law School can link to that page. Although you could see other “Fans” of the school on the school’s own page before, the Community page puts every status update that mentions the school in one place, so that you’re encouraged to interact with others who mention the school.  The Community Pages make your presence on Facebook visible in new ways, but primarily to people who went to the same school as you, who grew up in the same town, who have the same interests.

Thus, even as information is shared beyond current friends, Facebook is trying to reassure you that mini-communities still exist.  You are not being thrown into the open.

Social plug-ins similarly “personalize” a Facebook user’s experience by accessing the user’s friends.  If you go to CNN.com, you’ll see which stories your friends have recommended.  If you “Like” a story on that site, it will appear as an item in your Facebook newsfeed.  The information that is being shared thus maps onto your existing connections.

The “Personalization” feature is a little different in that it’s not so much about your interactions with other Facebook users, but about your interaction with other websites.  Facebook shares the public information on your profile with certain partners.  For example, if you are logged into Facebook and you go to the music site Pandora, Pandora will access public information on your profile and play music based on your “Likes.”

This experience is significantly different from the way people explore music on MySpace.  MySpace has taken off as a place for bands to promote themselves because people’s musical preferences are public.  MySpace users actively request to be added to their favorite bands’ pages, they click on music their friends like, and thus browse through new music.  All of these actions are overt.

Pandora, on the other hand, recommends new music to you based on music you’ve already indicated you “Like” on your profile.   But it’s not through any obvious activity on your part.  You may have noted publicly that you “Like” Alicia Keys on your Facebook profile page, but you didn’t decide to actively plug that information into Pandora.  Facebook has done it for you.

Depending on how you feel about Facebook, you may think that’s wonderfully convenient or frighteningly intrusive.

And this is ultimately why Facebook’s changes feel so troubling for many people.

Facebook isn’t ripping down the walls of its convention center and declaring an open party, though. As Farhad Manjoo at Slate says, Facebook is not tearing down its walls but “expanding them.”

Facebook is making peepholes in certain walls, or letting some people (though not everyone) into the parties users thought were private.

This reinforces the feeling that mini-communities continue to exist within Facebook, something the company should try to do as it’s a major draw for many of its users.

Yet the multiplication of controls on Facebook for adjusting your privacy settings makes clear how difficult it is to share information and maintain this sense of mini-communities.  There are some who suspect Facebook is purposefully making it difficult to opt out.  But even if we give Facebook the benefit of the doubt, it’s undeniable that the controls as they were, plus the controls that now exist for all the new features, are bewildering.  Just because users have choices doesn’t mean they feel confident about exercising them.

On MySpace, the prevailing ethos of being more public has its own pitfalls.  A teenager posting suggestive photos of herself may not fully appreciate what she’s doing.  At the least, though, she knows her profile is public to the world.

On Facebook, users are increasingly unsure of what information is public and to whom.  That arguably is more unsettling than total disclosure.

Building a community: Are we all at the same party?

Monday, May 3rd, 2010

In my last post, I described the ways in which Yelp and Facebook are different animals, despite Yelp’s social network-like qualities. Yelp feels like a community in which its members share the goal of writing good reviews for Yelp; Facebook contains communities, none of which feels any particular affinity for doing anything for Facebook.

You would think, then, that Facebook isn’t a community simply because its members aren’t invested in a shared mission. But when you look at MySpace, a site that is also a social network and nothing else, the question of what makes a community a community becomes more complicated.

MySpace, first off, is not exactly a community either. Its members aren’t invested in MySpace any more than Facebook members are invested in Facebook. But it does feel very different from Facebook in a couple of obvious ways.

MySpace feels crazier, looser, and less professional, and thus also more personal and individualistic.

One of the most obvious and immediate visual differences between MySpace and Facebook is the way users design their profile pages. MySpace allows its users to customize their pages, which means MySpace is a riot of colors, animated gifs, and backgrounds. There’s a basic template with neat boxes, similar to what Ashton Kutcher has here:

But there are many more users who have so much animation and graphics, sometimes even their names are obscured.

The aesthetic reminds me a bit of my teenage bedroom, how much I was interested in making sure that the posters I hung, art and music and what have you, expressed exactly who I was. A lot has already been said about the racial and socioeconomic differences between MySpace and Facebook, so I won’t go into them here, but it’s worth noting that this flexible aesthetic, as danah boyd points out, doesn’t only attract kids who are poor or don’t plan to go to college, but “the kids who are socially ostracized at school because they are geeks, freaks, or queers.”

Facebook, on the other hand, has uniform blue and white boxes. You might choose to upload a quirky or weird profile photo, or even make up a profile like Peowtie del Toro, but your aesthetic choices are severely limited.

Facebook provides a more corporate, professional framework on which people are neatly displayed, like a telephone book or directory (aka, a college facebook).

MySpace’s design gets made fun of all the time, at least by the kind of people who tend to write for tech blogs, but it’s a draw for people who want to be able to individualize their profiles. Facebook’s design, in contrast, doesn’t promote a particular aesthetic. There are people who are drawn to Facebook’s clean professional look and repelled by MySpace’s free-for-all.

But the reason people praise Facebook’s design is that they value the way it’s a clean slate, bland and able to absorb almost anything and anyone.

Facebook is growing faster than MySpace, and although that may be due in part to its design, it’s less because Facebook’s design is so compelling and more because it’s inoffensive.

MySpace feels like one big party.

The free-for-all of MySpace, compared to the clean, blank slate of Facebook, results in very different atmospheres on the two sites. MySpace feels like one big party. There are definitely subgroups within MySpace, but there is an openness to the site that is completely missing from Facebook.

From the moment you go to MySpace, you see content that’s available to you. With a few clicks, you can find videos, band pages with music, and individual profiles that have been made publicly accessible to anyone, even those who are not registered members of MySpace. Although there are MySpace users who don’t make their profiles public, you can browse the profiles of those who are public, and there’s a sense that any of these strangers might connect with each other. (It helps that so many of the photos are aggressively flirty.) Everyone’s at the same party.

Facebook certainly isn’t private, and as its many new developments indicate, the company is aggressively trying to make more of its users’ information public. (More on that to come.)

But Facebook isn’t one big open party. It’s a convention hall where you’re supposed to find your group and join whichever cocktail party, networking event, or shindig is being hosted by your group.

Your first view of Facebook is a virtual wall. The first page consists mainly of a blue and white graphic with abstract images of people connected all over the world. The login for members is the only visibly interactive part of the page, other than the sign up for new members. There isn’t even a search box for existing members. The impression is that until you log in or sign up, you don’t really have access to the site.

There’s definitely no “Browse” function. Even after you log in, you can only search for specific people. At best, you can browse your friends’ friends, but even that is based on the connections you already have. Although people increasingly have Facebook friends they don’t actually know, the connections most people have to each other aren’t based on the fact that they’re both on Facebook. Rather, people friend each other because they went to the same school, work at the same place, or have friends in common. To some extent, I really don’t know the full range of people who are on Facebook because I can only see the people I’m friends with.

It’s not surprising MySpace is the place to hear new music.

Despite Facebook’s rapid growth, MySpace is still the most popular site for bands. Part of that has to do with the ease with which tracks can be uploaded, but it also has to do with the one-big-party atmosphere of MySpace. You’ve come to have a good time, you’re open to hearing new music, you might just end up talking to the drummer after a set.

For example, when you look at the MySpace page for The National, the band’s 63,798 friends write messages that are directed to the band like,

THANX SO MUCH FOR THE ADD!! LUV UR MUZIK!! DIGGIN UR ENTIRE PLAYLIST!!
MUCH RESPECT!! MAD LUV!!
Angela Marie 😉
HAVE A BEAUTIFUL DAY AND ALWAYZ REMEMBER TO ROCK IT WITH A SMILE LIKE ME, UR KICK ASS FRIEND MISS A TO THE G!! 😉

Whereas on Facebook, the people who “like” The National don’t seem to necessarily have a sense of personal connection to the band. Some of them write direct appeals, like please come play in my town, but there are just as many comments that are directed to other fans as to the band itself, like,

just purchased tickets for the seattle show in sept!! can not wait to see them live..what an amazing follow-up to Boxer.

Although The National is a relatively famous band, anyone can upload music on MySpace and hope to make it big, the way Lily Allen did. Many of the comments on The National page are from people in their own bands asking them to check out their music. They don’t have to already know each other to comment or become friends, whereas the social expectations are very different on Facebook.

Even as Facebook tries to make more of its users’ information public, it will never feel like MySpace.

Recently, Facebook rolled out several changes that either encourage or push its users to make more of their information public, depending on how you feel about Facebook.

These new developments (Personalization, Community Pages, and Like buttons across the Internet) are changing the way Facebook users’ information is available. Yet these changes are still in line with the “many communities” model, rather than the one-big-tent feel of MySpace, with some interesting consequences for individual privacy.

More to come in a follow-up post…

Building a community: Just because it’s a social network doesn’t mean it’s a community.

Thursday, April 22nd, 2010

Yelp via Flickr Creative Commons Attribution-Noncommercial-No Derivative Works

Yelp and Facebook have a lot in common.  As I wrote in my last two posts, they both emphasize or require the use of real profiles and they use people’s concerns about their reputations to motivate activity and interaction on the site.

But Yelp and Facebook are fundamentally different.  In short, Yelp is a community and Facebook is not.

Although Facebook is a social network, it is not a community.  It began as a social network for Harvard students, basing itself on the existing connections within that community.  When it grew, it grew from community to community, from Ivy League universities to all colleges to high schools and then certain corporations, before becoming open to anyone with an email address.  When people interacted with other people on Facebook, it didn’t feel as funny or sleazy or strange as interacting with a stranger in a chatroom.  You might not have personally known a new friend, but he likely knew someone you knew.  Facebook emphasized real people and real connections.

So Facebook certainly contains communities.  It contains people who know each other from college, elementary school, an office, or even a party.  But it is not in and of itself a community. 

There is no ethos or set of values that all Facebook users share together.

Facebook users may be active on the site, but they don’t write status updates, upload photos, and play Farmville for Facebook.  They do it for themselves and for the people they want to interact with.  If another social network came along that was better, their friends were there, and it was easy to transfer their profiles, people would do it without a single pang of disloyalty.  It’s why Facebook has resisted calls for portability of profile data.  As addicted as people claim to be, no one calls himself a Facebooker.

In contrast, many people consider themselves Yelpers and Wikipedians. Yelpers have inside jokes and a self-conscious recognition that Yelpers are a tribe.  The Yelp Elite Squad gets together at events, while Wikipedians gather at Wikimania.  Although Facebook may have more users interacting in the offline world than any other site, it’s never an activity organized by or devoted to Facebook.

To me, the biggest reason for this difference is that Yelp and Wikipedia have a mission and Facebook does not.

The Wikimedia Foundation obviously has a mission; it’s a nonprofit organization with altruistic goals.  In a recent survey of Wikipedians, when asked why they contributed, 73% indicated, “I like the idea of sharing knowledge and want to contribute to it,” while 69% said “I saw an error I wanted to fix.”  They’re motivated in part by their belief in Wikipedia’s mission, to provide knowledge for the world. Yelp may not have a mission in a traditional sense, but its goal to provide informative reviews of local businesses is one that’s shared enthusiastically by many of its reviewers.  As a result, the users on Yelp are helping to create Yelp’s product, reviews, while the users on Wikipedia are helping to create Wikipedia’s product, the encyclopedia.

Facebook, in contrast, has a stated mission but it means nothing to its users.  No one joins Facebook because he believes in Facebook’s mission.  He joins because that’s where his friends are.  He is not interested in helping Facebook create a product.  In fact, as Bruce Schneier put it,

“Alice is not Facebook’s customer.  Alice is Facebook’s product.”

Facebook itself admits more or less that it has no interest in building a community.  Rather, it’s building “info aggregation with a great photos app.”  It’s why it’s trying its hardest to become Twitter, and why it keeps trying to think of new ways to make more of its members’ personal information public.

Can differential privacy be as good as tossing a coin?

Tuesday, April 20th, 2010

At the end of my last post, I had reasoned my way to understanding how differential privacy is capable of doing a really good job of erasing almost all traces of an individual in a dataset, no matter how much “external information” you are armed with and no matter how pointed your questions are.

Now, I’m going to attempt to explain why we can’t quite clear the final hurdle to truly and completely eradicate an individual’s presence from a dataset.

  • If coins are actually weighted such that one side is just ever-so-slightly heavier than the other side.
  • And such a coin is spun by a platonically balanced machine.
  • And the coin falls with the head’s side facing up.
  • And I only get one “spin” to decide which side is heavier.
  • Probabilistically, (by an extremely slim margin) I’m better off claiming that the tail’s side is heavier.

Translate this slightly weighted coin toss example into the world of differential privacy and PINQ and we have an explanation for why complete non-discernibility is also non-possible.

I have a question. I know ahead of time that the only two valid answers are 0 and 1. PINQ gives me 1.7.

Probabilistically, I’m better off betting that 1 is the real answer.

In fact, PINQ doesn’t even have to give me an answer so close to the real answer. Even if I were to ask my question with a lot of noise, if PINQ says -10,000,000,374, then probabilistically, I’m still better off claiming that 0 is the real answer. (I’d be a gigantic fool for thinking I’ve actually gotten any real information out of PINQ to help me make my bet. But lacking any other additional information, I’d be an even gigantic-er fool to bet in the other direction, even if only by a virtually non-existent slim margin.)

The only answer that would give me absolutely zero “new information” about the “real answer” is 0.5 (where the two distribution curves for 0 and 1 intersect). An answer of 0.5 makes no implications about whether 0 or 1 is the “real answer.” Both are equally likely. 50/50 odds.
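To make the intersection point concrete, here is a minimal sketch in Python. It assumes the noise follows a Laplace distribution centered on the real answer, a common choice for differentially private counts. The function names and the noise scales are invented for illustration and are not PINQ's API.

```python
import math

def log_likelihood_ratio(observed, scale):
    """log[ P(observed | real answer is 1) / P(observed | real answer is 0) ]
    under Laplace noise with the given scale (an assumption, not PINQ's API)."""
    # The Laplace density is exp(-|x - mu| / scale) / (2 * scale);
    # the normalizing constant cancels in the ratio.
    return (abs(observed - 0) - abs(observed - 1)) / scale

def probability_real_answer_is_one(observed, scale):
    """Posterior probability that the real answer is 1, from a 50/50 prior."""
    return 1.0 / (1.0 + math.exp(-log_likelihood_ratio(observed, scale)))

# Hypothetical (answer, noise scale) pairs echoing the examples in the text.
examples = [
    (0.5, 1.0),
    (1.7, 1.0),
    (1.7, 1000.0),
    (-10_000_000_374, 1_000_000_000.0),
]
for observed, scale in examples:
    p = probability_real_answer_is_one(observed, scale)
    print(f"answer={observed}, scale={scale}: P(real answer is 1) = {p:.10f}")
```

Under these assumptions, an answer of 0.5 leaves the odds at exactly 50/50, any answer above 0.5 tilts toward 1, and any answer below 0.5 tilts toward 0. The size of the tilt is capped by the noise scale: an answer far below 0 is no more revealing than an answer of exactly 0, and more noise shrinks the tilt without ever removing it.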

But most of the time…and I really mean most of the time, PINQ is going to give me an answer that implies either 0 or 1, no matter how much noise I add.

Does this matter? you ask.

It’s easy to argue that if PINQ gives out answers that imply the “real answer” over “the only other possible answer” by a margin of, say, 0.000001%, who could possibly accuse us of false advertising if we claimed to guarantee total non-discernibility of individual records?

(As it turns out, coin tosses aren’t really a 50/50 proposition. They’re actually more of a 51/49 proposition. So perhaps the way you would answer the “Does it matter?” question depends on whether you’d be the kind of person to take “The Strategy of Coin Flipping” seriously.)

Nevertheless, a real problem arises when you try to actually draw a definitive line in the sand about when it’s no longer okay for us to claim total non-discernibility in our privacy guarantee.

If 50/50 odds are the ideal when it comes to true and complete non-discernibility, then is 49/51 still okay? 45/55? What about 33/66? That seems like too much. 33/66 means that if the only two possible answers are 0 and 1, PINQ is going to be twice as likely to give me an answer that implies 1 as it is to give me an answer that implies 0.
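One way to compare these cut-offs is to turn each pair of odds into a ratio and take its natural logarithm, which is the quantity usually called epsilon in differential privacy. The sketch below only does that arithmetic. The function name is invented for illustration, and it assumes each pair is read as the chance of being pointed at the wrong answer versus the right one.

```python
import math

def epsilon_for_odds(less_likely, more_likely):
    """Natural log of the likelihood ratio implied by a pair of odds."""
    return math.log(more_likely / less_likely)

# The odds pairs mentioned in the text above.
for less, more in [(50, 50), (49, 51), (45, 55), (33, 66)]:
    ratio = more / less
    eps = epsilon_for_odds(less, more)
    print(f"{less}/{more}: ratio {ratio:.3f}, epsilon {eps:.3f}")
```

Read this way, 50/50 corresponds to an epsilon of 0, 49/51 to about 0.04, 45/55 to about 0.2, and 33/66 to about 0.69 (the natural log of 2). Drawing the line in the sand amounts to picking a number on that scale.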

Yet still I wonder, does this really count as discernment?

Technically speaking, sure.

But what if discernment in the real world can really only happen over time with multiple tries?

Say I ask a question and I get 4 as an answer. Rationally, I can know that a “real answer” of 1 is twice as likely to yield a PINQ answer of 4 as a “real answer” of 0. But I’m not sure if viewed through the lens of human psychology, that makes a whole lot of sense.

After all, there are those psychology studies that show that people need to see 3 options before they feel comfortable making a decision. Maybe it takes “best out of 3” for people to ever feel like they can “discern” any kind of pattern. (I know I’ve read this in multiple places, but Google is failing me right now.)
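To put rough numbers on “best out of 3,” here is a small Python sketch of how the odds would build up over repeated tries. It assumes each answer is noised independently, that I start from a 50/50 guess, and that every answer happens to favor 1 by the same 2-to-1 margin. All of these are simplifications for illustration, and the function name is made up.

```python
def posterior_after_answers(likelihood_ratios, prior=0.5):
    """Probability that the real answer is 1 after several independently
    noised answers, each summarized by its likelihood ratio in favor of 1."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # independent pieces of evidence multiply the odds
    return odds / (1.0 + odds)

for tries in range(1, 4):
    p = posterior_after_answers([2.0] * tries)
    print(f"{tries} answer(s) at 2-to-1 each: P(real answer is 1) = {p:.3f}")
```

Under those assumptions, one answer moves a 50/50 guess to about 67%, two answers to 80%, and three to about 89%. A single answer is a nudge; it is the repetition that starts to feel like discernment.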

Here’s psychologist Dan Gilbert on how we evaluate numbers (including odds and value) based on context and repeated past experience.

These two threads on the difference between the probability of a coin landing heads n-times versus the probability of the next coin landing heads after it has already landed n-times further illustrate how context and experience cloud our judgement around probabilities.

If my instincts are correct, what does all this mean for our poor, beleaguered privacy guarantee?

