Posts Tagged ‘Organizational Integrity’

Building a community — populated by real people or anonymous cowards

Friday, April 9th, 2010

Mimi’s comment on my last blog post about building communities made an important point – although both Yelp and Wikipedia reward their users for their activities with increased status within their communities, they do so in very different ways with very different results for their content.

There are many, many differences between Yelp and Wikipedia.  (I’m curious how many people are registered users on both sites.)

But one really obvious one is that Yelp has created an active community of reviewers who largely use real photos and real names (or at least real first name and last initial), like peter d., a member of the Yelp Elite Squad.

Wikipedia, in contrast, is a free for all.  Many people who write or edit are anonymous.  They may register with pseudonyms, or they may not register at all, so that their edits are only associated with an IP address.  There are occasionally Wikipedians who reveal their real names and provide a lot of biographical information on their profile pages, like Ragesoss, who even provides a photo.

But for every user like him, there are many more like Jayjg and Neutrality, who seems to identify with Thomas Jefferson, as well as users who have been banned.

Obviously, there are other communities that encourage the use of real identities — Facebook, MySpace, social networks in general.  And there are communities where being pseudonymous or anonymous is perfectly fine, even encouraged — Slashdot, Flickr, and many more.

So how does the use of real identities affect the community?  How does it affect the incentives to participate?  The content that’s created?

Yelp’s reward system, as described in my earlier post, is very focused on the individual.  The compliments, the Elite Squad badge, the number of “useful, funny or cool” reviews written are all clearly attached to a specific person, such as peter d. above.  Although people are complimenting peter d. for the content he’s generated for Yelp, they’re also complimenting him as a person, as he’s told that he’s funny, he’s a good writer, and so forth.

Yelpers are encouraged to develop personas that are separate from the reviews they write.  The profiles have set questions, like “Last Great Book I Read” and “My First Concert.”  They know that it’s not just about one review they’ve written, but where they’ve eaten, where they’ve gone, what they’ve done, that shows something in a generation that recognizes tribes based on what people consume.  There is a suggestion that Yelpers might interact outside of Yelp, and in the case of the Yelp Elite Squad, an assumption that they will, as one of the major privileges is that members get invited to local events.  The reputation you seek to develop on Yelp is not necessarily so different from the reputation you seek to develop in real life.

Yelp isn’t just a review site.  It’s a social network that feels almost like an online dating site — you can see how easily compliments could be used to flirt.

Wikipedia’s reward system, based on the open source software model, is more low-key.  Wikipedia does rate articles as “good articles,” and notes which articles have priority within certain classes of subjects.  If you write a lot of “good articles,” or otherwise contribute substantively, you can get various gold stars and badges as well, like the Platinum Editor Star Jayjg has on his profile page.

But the compliments are less about the Wikipedia user, even when stars are given, and more about the Wikipedia-related contribution he or she has made.  Some Wikipedians may be flirting with each other, but it seems really unlikely, at least not within any Wikipedia-built mechanism.  Jayjg clearly feels no need to tell us where he’s from and what his first concert was — it’s not relevant here.  The Wikipedians who do share more personal information aren’t required or even encouraged to do so by the Wikipedia system.

It doesn’t matter who Jayjg is.  It only matters what he does for Wikipedia.

So although both sites use rewards and feedback loops to encourage participation, they’re creating fundamentally different content with fundamentally different communities.

Yelp’s entry for the Brooklyn Botanic Garden has 126 reviews, each wholly written by a single user with that user’s photo, name, number of friends and number of reviews immediately visible.  The reviews are clearly personal and subjective, as made obvious by references to what that person specifically experienced.  In peter d.’s case, his review notes how his brother once pushed him into the Japanese Pond.

When you look at Wikipedia’s entry on the Brooklyn Botanic Garden, you see a seamless, unitary document.  Unless you click on the tabs that cover history or discussion, you won’t even see who worked on the article.  There is no personal perspective, there is no author listed with stars next to his name, there are no buttons asking you to rate that author’s contribution as “useful, funny or cool.”

This makes sense, given their respective missions.  Yelp’s goal is to generate as many reviews as possible about local businesses, recognizing that taste is really subjective.  Wikipedia’s goal is to produce a free encyclopedia with unbiased, objective content.  Yelp doesn’t want you to write one review and go away.  If you do, your review may not even show up, as the spam filter may decide you’re not trustworthy.  But of the thousands and thousands of people who’ve edited Wikipedia, the vast majority have done a few, maybe even just one edit, and never come back.  Wikipedia is a collective work; Yelp is a collection of individual works.

I don’t have an opinion on whether a community of real profiles is better or worse than a community of anonymous and pseudonymous contributors.

The different structures seem to shape the content of Yelp versus Wikipedia in appropriate ways.  What’s less clear to me is how this difference affects the makeup of their communities.  Wikipedia has recently been in the news as it examines the demographics of its users, who are “more than 80 percent male, more than 65 percent single, more than 85 percent without children, around 70 percent under the age of 30.” Its rewards system and its open source model clearly attracted the right kind of enthusiastic people who were willing to write encyclopedia entries without personal recognition and glory.  Wikipedia wants more and different kinds of people to be writing entries.  Would a system like Yelp’s that encourages a more explicit sense of community and social networking change who is attracted to Wikipedia? Or would it attract precisely the wrong kind of people, the ones who couldn’t work collaboratively without explicit credit and acknowledgment?

Yelp isn’t a model of community building either, of course.  Its users are more diverse than Wikipedia’s in that its breakdown by gender is 54-46, male-female, but it’s also a very young community.  It’s less international than Wikipedia, partly because it grows city by city, but its American youth-oriented culture may not translate well either.  It’s facing its own credibility problem as business owners accuse Yelp of extortion.  It’s not surprising, as it’s fueled by people who are addicted to writing reviews and complimenting each other, but it’s paid for by advertisers who don’t participate in that same incentive structure.

Both Yelp and Wikipedia have managed to attract active, enthusiastic contributors willing to do a lot for no pay (or mostly no pay in the case of Yelp, which has admitted to paying some reviewers).  But moving forward, which model of participation and rewards will be more attractive to more people for the right reasons?

For more on this issue, see today’s New York Times article on how news sites are considering getting rid of the anonymous option for commenters.  Or they could do what Slashdot does, which is call anyone who chooses to post anonymously an “Anonymous Coward.”

Privacy Problems as Governance Problems at Facebook

Monday, January 4th, 2010

You know that feeling when you’ve been pondering something for a while and then you read something that articulates what you’ve been thinking about perfectly?  It’s a feeling between relief and joy, and it’s what I felt reading Ed Felten’s critique of Facebook’s new privacy problems:

What Facebook has, in other words, is a governance problem. Users see Facebook as a community in which they are members. Though Facebook (presumably) has no legal obligation to get users’ permission before instituting changes, it makes business sense to consult the user community before making significant changes in the privacy model. Announcing a new initiative, only to backpedal in the face of user outrage, can’t be the best way to maximize long-term profits.

The challenge is finding a structure that allows the company to explore new business opportunities, while at the same time securing truly informed consent from the user community. Some kind of customer advisory board seems like an obvious approach. But how would the members be chosen? And how much information and power would they get? This isn’t easy to do. But the current approach isn’t working either. If your business is based on user buy-in to an online community, then you have to give that community some kind of voice — you have to make it a community that users want to inhabit.

This is a question we at CDP have been asking ourselves recently — how do you create a community that users want to inhabit?  We agree with Ed Felten that privacy in Facebook, as in most online activities, “means not the prevention of all information flow, but control over the content of their story and who gets to read it.”  Our idea of a datatrust is premised on precisely this principle, that people can and should share information in a way that benefits all of society without being asked to relinquish control over their data. Which is why we’re in the process of researching a wide range of online and offline communities, so that when we launch our datatrust, it will be built around a community of users who feel a sense of investment and commitment to our shared mission of making more sensitive data available for public decision-making.

We’d love to know, what communities are you happy to inhabit?  And what makes them worth inhabiting?  What do they do that’s different from Facebook or any other organization?

What kind of institution do we want to be? Part II

Tuesday, December 15th, 2009

As described in the first post, banks and credit unions could be useful models for the datatrust because of their function of holding valuable assets for account holders.  Public libraries and museums are very different, but their function, of providing the public access to valuable social assets, is also relevant to the datatrust.

A. We want to be an online public library of useful, personal data, because no democracy can function properly without broad access to information.

Image by FreeFoto, available under a Creative Commons Non-Commercial No-Derivative Works License.

Although public libraries now have the fuzzy-warm feeling status right up there with puppies and babies, the public library system was not established in the U.S. without controversy.  The only people who owned books were the rich, and many argued that the poor would not know how to take care of the books they borrowed.  The system was largely established through the efforts of Andrew Carnegie and others who believed in both public libraries and public schools, and that democracy could not function without public access to information.

Librarians are now champions for intellectual freedom.  As a profession, librarians have developed strong principles around the confidentiality of library users, and they were on the front lines in resisting the USA PATRIOT Act’s provisions around FBI access to library records. Although libraries are often underfunded and can seem out of date, the current recession has made obvious what has been going on for a while, that people really do use the library. And when they do, they don’t abuse the privilege.  Many communities feel invested in their local branches, and the respect people have for libraries translates into a respect for their holdings.

We hope the establishment of our datatrust can follow a similar path.  Everyone may not agree now that this kind of access to information is necessary.  But we strongly believe that the status quo, where large corporations and government agencies have access but the public does not, stifles the free flow of information that really is crucial for a functioning democracy.  We hope that the datatrust can grow to engender the same kind of respect and to be a valuable member of many communities.

Of course, the information in books is qualitatively different from personal data about an individual.  If a book gets lost, it’s not as great a loss as if personal data gets misused.  Which leads us to the next point.

B. We want to make data available to the public because it is too valuable to be kept in a locked safe, the way museums make great art available.


Museums are interesting institutions to us because they showcase extremely valuable pieces that would be safest from damage and theft if kept locked up in a vault, yet are put on public display because the value afforded to the public outweighs the risk of damage and theft.  Although they have a greater reputation for elitism than public libraries, museums also operate on the belief that certain assets, like great art or historical artifacts, should belong to society at large rather than to a private collector.  Thus, when a private collector does donate his or her collection to a museum, he or she gains the reputational benefit of having done something altruistic.  At the same time, access to the public comes with protective measures for security—guards, technology, velvet ropes, and more.

Personal data, to us at CDP, is also too valuable to keep locked up.  Arguably, personal data is currently kept by many private collectors or corporations.  They gain value from that data, but that value is not shared with the public.  Unlike art, which is usually made by an individual, personal data is collected from large swaths of the general population, and yet we don’t have access to that data.  Like museums, we will want to think of security measures to minimize any risk, but we do acknowledge that there will be some risk, known and unknown, in our project.  But that risk is so much outweighed by the potential benefits to society, we think it’s a worthwhile experiment.

Museums also add value to their holdings by curating them.  That’s an important challenge for us, as information is only valuable when it’s organized.

What kind of institution do we want to be?

Tuesday, December 8th, 2009

Apparently, everyone wants to be a nonprofit these days.

As this New York Times article describes, the broad and elastic definition of a 501(c)(3) tax-exempt organization has allowed the Internal Revenue Service to approve 99% of applications last year.  (And I had been so proud of our tax-exempt status!)  The new nonprofits include the Woohoo Sistahs, a social club that raises money for cancer research and other causes, and the Red Nose Institute, an organization that sends red clown noses to American troops serving abroad.  The proliferation of nonprofit organizations has a real impact on the country — each donation to a tax-exempt organization represents a loss in tax revenue to a government struggling to pay for two wars, healthcare reform, and economic stimulus.

As we’ve noted before, it’s not enough to offer services for free or to have no revenues.  Most people think “nonprofit” must “do good,” but that’s not a very useful definition.  There are lots of for-profit businesses that “do good,” as well as plenty of inefficient nonprofits that “do good” very badly (and I’m not even counting the fraudulent nonprofits that “do bad”).  “Nonprofit” can’t be defined by “no profits” either.  Many businesses aren’t profitable for years, and as this article by the same writer suggests, how is a poor, revenue-less alternative energy organization that hopes to eventually sell the technologies it develops different from Apple or Microsoft in their start-up stages?

So we know that in building our reputation as a trustworthy organization, it’s not enough just to declare, “Trust us, we’re a nonprofit!”  We really really care about what kind of organization, what kind of institution we’re going to be.  We’ve talked previously about how we decided to incorporate as a nonprofit, but now, it’s time to talk about other organizational models and ideas we want to learn from, whether or not they’re nonprofits.

As we study these institutions and how they work, we hope both to be able to better explain to others what we want to be (“We’re like a ____ for data!”) and to better understand ourselves what we need to do to get there.  Over the next couple of posts, we’re going to share some of our ideas, and we hope you’ll test us vigorously.


A. We want to be a bank for personal information, so you don’t have to put your personal information under your mattress.

We put our money in the bank for a number of reasons.  It’s safer than putting it under our mattresses, and at least up to FDIC limits, we know we won’t lose it.  It’s more convenient than carrying around wads of cash—remember what it was like to travel abroad before you could use foreign ATMs?  There are services banks provide, like interest, checkbooks, debit cards, and more.  And whether or not it’s a motivating force for individual account holders, banks make the economy run because funds are more easily moved around.

We want putting data into a datatrust to feel as easy, secure, and normal as putting money in the bank.  Although personal information is not like money, we think that the datatrust could do a lot of what a bank does for its account holders.  Account holders would have the ability to control how their information is shared.  They would know what is in the datatrust, as easily as we can check now online how much is in our bank account.  Although they may not get interest in a monetary sense, they would gain access to the information of others in the datatrust, once properly aggregated and secured.  Information would flow more easily when it becomes something that can be controlled.

One thing we want to avoid though is the conflict of interest that can arise when banks make decisions to enrich their shareholders rather than to benefit their account holders.  Which brings us to the next section.

B. We want to be a credit union for personal information, where the goal is to serve members, not a third party that pays the bills.

Credit unions are not as widespread as banks, and so are not as convenient in that your workplace credit union will not have ATMs across the country.  However, credit unions serve many of the same functions as banks but through an inherently different structure.

Credit unions are member-owned and member-governed organizations.  The credit union’s board consists of volunteers elected by the credit union’s members.  Credit unions generally offer better rates and lower fees than conventional banks.  Because credit unions are meant to serve their members, people tend to trust credit unions more than banks.

Although the Common Data Project is not strictly a member-based 501(c)(3), we’re interested in the member-based structure of credit unions.  We certainly want those who donate data to the datatrust to feel like the datatrust exists in their interest and that they are part of a community, even if that community is not limited to a preexisting defined group the way credit union membership is limited.  We’re exploring ways for individuals donating data to the datatrust to be involved in governing the datatrust on multiple levels.

What have we been doing?

Monday, October 19th, 2009

We’ve been silent for a while on the blog, but that’s because we’ve been distracted by actual work building out the datatrust (both the technology and the organization).

Here’s a brief rundown of what we’re doing.

Grace is multi-tasking on 3 papers.

Personal Data License: We’re conducting a thought experiment to think through what the world might look like if there were an easy way for individuals to release personal information on their own terms.

Organizational Structures: We’ve conducted a brief survey of a few organizational structures we think are interesting models for the datatrust: “trusted” entities from Banks to Public Libraries, and “member-based” organizations from Credit Unions to Wikipedia. We tried to answer the question: What institutional structures can be practical defenses against abuses of power as the datatrust becomes a significant repository of highly sensitive personal information?

Snapshot of Publicly Available Data Sources: A cursory overview of some of the more interesting data sets that are available to the public from government agencies, to answer the question: How is the datatrust going to be different / better than the myriad data sources we already have access to today?

We also now have 2 new contributors to CDP: Tony Gibbon and Grant Baillie.

A couple of months ago, Alex wrote about a new anonymization technology coming out of Microsoft Research: PINQ. It’s an elegant, simple solution, but perhaps not the most intuitive way for most people to think about guaranteeing privacy.

Tony is working on a demonstration of PINQ in action so that you and I can see how our privacy is protected and therefore believe *that* it works. Along the way, we’re figuring out what makes intuitive sense about the way PINQ works and what doesn’t and what we’ll need to extend so that researchers using the datatrust will be able to do their work in a way that makes sense.

Grant is working on a prototype of the datatrust itself which involves working out such issues as:

  • What data schemas will we support? We think this one to begin with: Star Schema.
  • How broadly do we support query structures?
  • Managing anonymizing noise levels.
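For readers wondering what "anonymizing noise" means in practice, here is a rough sketch in Python of the general idea behind differentially private counting, the approach PINQ is built on. This is not PINQ's API (PINQ is a C# library from Microsoft Research) and not our prototype code. The function name `noisy_count`, the `epsilon` parameter name, and the toy records are all made up for illustration. The assumption is that a count query changes by at most 1 when one person's record is added or removed, so Laplace noise with scale 1/epsilon is added to the true count.

```python
import random


def noisy_count(records, predicate, epsilon, rng=None):
    """Return a count of matching records with Laplace noise added.

    Hypothetical helper for illustration only; not part of PINQ.
    A smaller epsilon means more noise (more privacy, less accuracy).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    true_count = sum(1 for record in records if predicate(record))

    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon

    # The difference of two independent exponential samples with
    # mean `scale` follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    # Toy data: made-up records, not real tax or census data.
    people = [{"age": age} for age in (23, 31, 45, 52, 29, 67, 38)]
    answer = noisy_count(people, lambda p: p["age"] < 40, epsilon=0.5)
    print(f"Noisy count of people under 40: {answer:.2f}")
```

Each call returns a slightly different answer, which is the point: no single answer reveals whether any one person is in the data. "Managing noise levels" then comes down to choosing epsilon and keeping track of how much of it each researcher's queries have used up.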

To help us answer some of these questions, we’ve gathered a list of data sources we think we’d like to support in this first iteration. (e.g. IRS tax data, Census data) (More to come on that.)

We will be blogging about all of these projects in the coming week, so stay tuned!
