Posts Tagged ‘Data Collection’

Cuil: Is zero data collection the answer?

Monday, August 11th, 2008

Cuil, the new search engine, launched with much fanfare this past week. It’s been blogged about all over the place already, so I’m not going to analyze how its results compare to Google’s. I’m more curious about its privacy policy, which trumpets that it collects NOTHING, nada, zip, zilch.

I found it sort of funny that the other big news in search engines recently was Google’s announcement that it was launching an updated version of Google Trends called Google Insights for Search. While one search engine bragged about its lack of data collection, the other was showing it off.

The two news items together highlight the problem at the heart of our ongoing search for more privacy online. Despite all the handwringing over online data collection, especially by big search engines, people love seeing the data that gets collected, even when they’re not advertisers. We want to see how often we’re mentioned in Twitter, or what parts of the world are searching for topics we blog about. It’s not hard to imagine more serious research and analysis being applied to this data and real social good coming out of it.

I’ve never found very compelling the National Rifle Association’s argument, “Guns don’t kill people; people kill people.” But I find myself wanting to say something similar about data collection: “Data collection doesn’t violate privacy; irresponsible people and laws violate privacy.” Shutting down data collection altogether can’t be the answer.

Raising privacy expectations by raising privacy standards

Tuesday, August 5th, 2008

It’s great that Google is becoming more transparent about how they use your data to tailor your search results. It’s the kind of thing we’d like to see more of. However, is it enough to merely state the status quo? Or should we really be demanding not only transparency but control and ownership as well? Saul Hansell has it right: the data Google collects from you is *yours*, not theirs. So not only should we all get a better look at what Google is doing with “our” information, we need to be able to set some ground rules about what is used and how it’s used. And by “setting ground rules,” I don’t mean choosing between opt-in and opt-out radio buttons.


Today, privacy policies are meant more to protect companies from liability than to protect individuals’ privacy rights. Even though a lot of people don’t fully understand how their information is being collected, we all know that it’s a one-way street. While businesses buy, sell, and share sensitive, personal information, we can’t even access our own information. More and more people are becoming wary of data collection in general, and as a result, the debate between privacy advocates and businesses has become framed as a conflict: privacy versus information. However, we as a society should seek solutions that promote both privacy and information.

We at CDTF want to change the culture of data collection from one where businesses and other data collectors have all the control to one where individual users are secure enough in their privacy to become active participants and consumers of data. We’ll need new technologies, policies, and possibly legislation, but perhaps most crucial is our need to come to some consensus about how to balance individual privacy rights with our societal interest in information-sharing.

We think an important first step is to develop new industry standards that describe what should be happening, not just what is happening in data collection. If more information-sharing is to happen, individual users have to become more confident about their privacy. So privacy standards have to be raised, not maintained.

For example, the first step in certification by privacy companies today is determining whether a company has a privacy policy. Although a company should certainly have a written policy, providing credit for merely having a written policy doesn’t raise the ante in any way.

Currently, many companies’ privacy policies don’t even cover all traffic to a website, as they disclaim responsibility for the practices of their partners and/or third-party advertisers. A standard that declares as a “best practice” the use of an all-inclusive privacy policy that covers all traffic to a site would certainly raise the bar.

Although few companies now would meet this standard, by declaring it to be a possibility, users would become better aware how most privacy policies are not all-inclusive, while companies willing to meet the standard would be able to signal more clearly how they are different from their competitors.

We think the bar should be raised on the following issues as well:

1. How much notice is required when the terms of a privacy policy change;
2. How changes in privacy policy apply to data collected under the previous policy;
3. How long data is stored;
4. How explicitly companies describe how data is used;
5. How data is secured and anonymized before it is shared with 3rd parties in order to provide an “appropriate” level of protection.

At the same time, user awareness of the potential benefits of multi-directional information-sharing, to both individuals and society as a whole, has to increase. We think new standards for user participation in the management of their data should be created around these issues:

1. User access to collected data;
2. User control over whether data is shared and for what purpose;
3. User control over the “level of anonymization” applied to data before it is shared;
4. Availability of data for public secondary use.

We’re not here to say, “Ta-da! Here are the perfect standards for reconciling the goals of privacy and information-sharing.” Instead, we want to start a conversation on how such standards could be useful, how they could be developed, and how they could be promoted.

Ballmer sees a customer demand for privacy

Monday, July 28th, 2008

I have meant to comment on this Ballmer clip from the Washington Post (embedded below) for a few weeks, so this is hardly news, but I can’t pass it up.

There is some undeniably good news for those concerned about privacy. After years of lying low under the glare of Google’s “Don’t be Evil” mantra, Steve Ballmer has said “I actually think we [Microsoft] are going to have to compete on Privacy Policy.”

Watch the clip:

Ballmer is suggesting that the market might make “privacy” a competitive differentiator between services. For example, as a user you might compare two personal financial services with similar functionality, and choose the one that offers “better” privacy guarantees.

To date I think it is fair to say that privacy has not been a mainstream concern for people choosing what software or service to use.

How Microsoft chooses to execute on this lofty goal remains to be seen. At CDTF, our hope is that Microsoft won’t frame privacy in the usual fear-mongering terms of protecting individuals from being “spied on” or “exposed” but will instead address privacy head-on by finally providing individuals with the control and access they need to become active partners in safeguarding their own privacy (…without losing the opportunity to take advantage of the data available to us all: as individuals, businesses, society and yes, even marketers).

[Full disclosure: Microsoft is currently a client of my consulting business. I also worked at Microsoft from 2003-2006.]

Let’s ask the government to give us information!

Monday, July 7th, 2008

My contracts professor from law school, Ian Ayres, suggests in his book Super Crunchers that the IRS become a source for useful information for ordinary people. The agency could tell taxpayers how much others in their income bracket, on average, are donating to charity or contributing to their IRAs, or tell small businesses whether they might be spending too much money on advertising.

The idea isn’t so far-fetched. About two months ago, the Italian government caused an uproar when it published online the tax details of every single Italian taxpayer. Allegedly meant to fight tax evasion, the move by the outgoing government sounded more like it was motivated by political spite. The most fascinating thing for me, though, was reading various comments in the blogosphere and finding out Norway, Sweden, and Finland do this every year! Apparently, the tax documents are considered official and therefore public records. According to the Swedish government, it’s in keeping with a general principle of government transparency: “To encourage the free exchange of opinion and availability of comprehensive information, every Swedish citizen shall be entitled to have free access to official documents.” And no one really minds.

Of course, this would be inconceivable in the U.S.—there’s a law against it. But as Ian Ayres suggests, the idea that the government should be giving information back to us, instead of just collecting it from us, isn’t totally crazy and Scandinavian. It could be released in anonymized aggregates or in other ways that wouldn’t reveal how much our neighbor makes. The information could be genuinely useful, not just titillating.

There could even be implications for public policy. So much of government policy is expressed in the Internal Revenue Code (such as favoring homeownership over renting), but our debates about tax cuts, mortgage deductions, and credits are based on fairly imprecise numbers. Even as we argue about what a tax cut will do to the “middle class,” we don’t even know what the “middle class” is. Where should government transparency start, if not at the point of revenue collection?

Scary pizza

Tuesday, June 17th, 2008

My friend sent this to me recently. Created by the ACLU for its campaign against the National ID program, it’s a mash-up of all our worst surveillance fears. It starts with a guy calling his local pizzeria for a couple of double meat pizzas, while you see the computer screen the girl at the pizza place is looking at as she rings up his order. She surprises him first by knowing his name, his home address, and his place of work from the moment his call comes in, but it gets rapidly worse, from a $20 health surcharge for meat pizza because of his high cholesterol and blood pressure to her snide comments about his waist size and his ability to pay for the pizzas, based on what she knows of his purchase history, including airplane tickets to Hawaii.

It’s entertaining, but also frustrating for a couple of reasons. First, there are very good reasons for me to be concerned about private companies’ data collection and their potential for collusion in U.S. government surveillance, but this video doesn’t explain how the National ID program would lead to the pizzeria having my health records. By focusing only on the sensational horror of the pizza girl knowing the customer bought a bunch of condoms, it forgets to tell us the pizzeria might literally be giving their customers’ names, phone numbers, and addresses to government officials. (The ACLU does have this report providing a more detailed argument about the dangers of private-public surveillance, but there was no direct link to it from the pizza video.)

Second, in terms of data collection and its dangers in general, the video ends up feeling sort of hysterical. It obscures, rather than clarifies, what’s really at stake.

We do live in a world where data collection is happening on an unprecedented level. But for me, what’s scary is not the mere possibility that all this data could get linked together. It’s about control. Do I get to decide who has my information? Do I get to control how it’s disseminated and analyzed?

Right now, we definitely don’t and that’s a problem. But the solution may not be to stop data collection altogether and segregate all the information out there so no linkage can happen ever.

I might not want the pizza girl at my local pizzeria to know about my health problems, but I might not mind if, as I ordered food online, the program allowed me to review my choices and build a more nutritious meal specific to my needs, without disclosing my specific preferences to each restaurant. I might not want the government to be able to access my purchase history, but I might want to be able to securely track and access my purchases and my financial accounts at the same time so I can better determine how well I’m meeting my budget. I might even want to share certain information, securely and anonymously, if I thought it would lead to beneficial research by scientists, economists, and policymakers.

Of course, I wouldn’t sign up for anything if I thought my personal information could get leaked to the government or anyone else without my consent. It would make for a somewhat less dramatic video, but this is what the Common Datatrust Foundation is interested in addressing—how can we turn our capacity for data collection and sharing into something that is a public good, rather than a scary fear?

Frequently Asked Question #1: Why is Google offering Google Health?

Wednesday, May 21st, 2008

Everyone must be wondering the same thing I am, as the number one question in the FAQs about Google Health is: “Why is Google offering this product?” Related, of course, is Question #6: “If it’s free, how does Google make money off Google Health?”

Unfortunately, the answers aren’t very satisfying.

“It’s what we do. Our corporate mission is to organize the world’s information and make it universally accessible and useful. Health information is very fragmented today, and we think we can help. Google believes the Internet can help users get access to their health information and help people make more empowered and informed health decisions. People already come to Google to search for health information, so we are a natural starting point. In addition, we have a lot of experience storing and managing large amounts of data and developing consumer products that offer a positive and simple user experience.”

I thought their mission, as a corporation, was to maximize profits for their shareholders.

The answer to Question #6 is even worse:

“Much like other Google products we offer, Google Health is free to anyone who uses it. There are no ads in Google Health. Our primary focus is providing a good user experience and meeting our users’ needs.”

But we all know that “other Google products” that are free make money through advertising. And there are “no ads in Google Health”?

In launching Google Health, Google has clearly acknowledged that health information is even more sensitive than the personal information the company has been assiduously collecting up to this point. Although it glosses over the differences between its other applications and Google Health, promising to “conduct our health service with the same privacy, security, and integrity users have come to expect in all our services,” the mere fact that it doesn’t have advertising trumpets that Google is trying to differentiate Google Health from something like Gmail.

But the harder Google tries to assure me that there is no advertising and that the service is free, the harder it is for me to believe there are truly no costs to me. Clearly, there is a real value to providing secure online access to personal health records. Medical records, for the appropriate people, should be accessible, transferable, and plain legible, as anyone who has tried to read a doctor’s handwriting can attest. So why would someone give me something for nothing?

According to the Wall Street Journal, Google is not ruling out advertising in the future, and in the meantime, it hopes Google Health will simply drive more users to Google in general. Perhaps Google itself doesn’t quite know where Google Health will go. But given how easy it is to imagine nightmare scenarios of what can happen with this kind of information, I want the company who’s collecting it and storing it to have a better story about why it’s doing this.

Proposed legislation that gives people access to their own data

Tuesday, March 25th, 2008

I totally missed this.

Even though I thought this New York Times article on data collection wasn’t very informative, a New York legislator was sufficiently moved to propose this legislation.

Michael Zimmer has some interesting comments about the proposed bill and some of its weaknesses, as well as a strength:

“The bill is strongest, however, in relation to a demand I have long made on Web search providers: let me see the data you have collected about my actions. The bill states:

17. Business entities shall provide consumers with reasonable access to personally identifiable information and other information that is associated with personally identifiable information retained by the third party entity for online preference marketing uses.

The press seems to have missed the importance of this section. If passed, the law would require Google, Facebook, DoubleClick, etc to provide me access to the personally identifiable information ‘and other information that is associated’ with my user account stored in their databases.

This is a vital right for consumers to be able to protect their data privacy: having access to view your data is the first step towards regaining some control over the collection of the data in the first place.”

The challenge–and it’s a worthy one–is how could this information be provided to us in a way that makes it useful and relevant? I’d like to see a law providing access to my own data that is more meaningful to me than HIPAA feels when I’m faced with a sheaf of waivers to sign at my doctor’s office.

A nonprofit wants to share its mailing list with some economists–would that bother you?

Thursday, March 13th, 2008

There’s a fascinating article in the New York Times Sunday Magazine on a study of what makes people donate, conducted by an interesting liberal-conservative pair of economists, Dean Karlan and John List. They wanted to do an empirical study of fundraising strategies, to find out what kind of solicitations are the most successful. As the article points out, lab experiments of economic choices aren’t particularly realistic: “If you put a college sophomore in a room, gave her $20 to spend and presented her with a series of pitches from hypothetical charities, she might behave very differently than when sitting on her sofa sorting through letters from actual organizations.”

So Karlan and List found an opportunity for a field experiment, a partnership with an actual, unnamed nonprofit that allowed them to try different solicitation strategies and map the outcomes. They wrote solicitation letters that were similar, except some didn’t mention a matching gift, some mentioned a 1-to-1 match, some a 2-to-1, and some a 3-to-1. In the end, mentioning a matching gift increased the likelihood of a donation, but the size of the match made no difference. As the author, David Leonhardt, notes, their findings and the findings of other economists in this area are significant to many people, from the nonprofits trying to be better fundraisers to economists studying human behavior, even to those who want to make tax policy more effective and efficient.

The article, however, didn’t mention whether the donors to the nonprofit had consented to their responses being shared with anyone other than the nonprofit. I’m not that concerned about whether donors’ privacy may have been egregiously violated. (I’m also not sure what’s required of nonprofits in this area.) I’m just curious to know, if they had been given the choice, would they have agreed to their information being shared with the economists? Obviously, the study wouldn’t have worked if potential donors had been told they would be sent different solicitation letters to measure their responses, but I think if most people on a nonprofit’s mailing list were asked if they would explicitly allow their information to be used in academic studies, they would consent. They might want assurances that their individual identities would be protected—that no one would know Mr. So-And-So had given zero dollars to a cause he publicly champions. But they might very well be willing to help the nonprofit figure out how to be more effective and be a part of an academic study that could shape public policy. They might even be curious to know how their giving compares to that of other donors in their income brackets or geographic areas.

Most people, myself included, have a knee-jerk antipathy to having their personal information shared with anybody other than the organization or company they give it to. But maybe we would feel differently if we were actually given some choices, if our personal identities could be protected, if sharing information could lead to more than just targeted advertising or more junk mail.

Ohmigod, companies are tracking what we look at online!

Monday, March 10th, 2008

Breaking news from the New York Times.

What’s truly interesting about this article, though, isn’t that the New York Times is announcing as “news” something that’s been going on for a very long time. Rather, the New York Times, even while devoting space on its front page, doesn’t really seem to have a point.

The article tries to distinguish itself from vague alarms raised by privacy advocates with data, the results of a study done with comScore measuring “data collection events,” each time “consumer data was zapped back to the Web companies’ servers.” (Even though the New York Times has produced some of the prettiest data graphics in recent memory, this one looks like something created in Excel and conveys little more than a flurry of numbers.) But the overwhelming impression left by the article is that companies are trying to target advertising, and some might do it better than others, rather than that extensive personal information is being collected. So then it isn’t surprising that several of the comments in response to the Bits blog post are from readers saying they never click on ads, or that these companies are stupid to send them ads for things they’re not interested in, or that they’ve blocked pop-up ads on their browser.

After all, the article mentions only briefly what kind of information is being collected: “the person’s zip code, a search for anything from vacation information, or a purchase of prescription drugs or other intimate items.” The article cites Jules Polonetsky, chief privacy officer for AOL, “[who] cautions that not all the data at every company is used together. Much of it is stored separate,” yet the author doesn’t explain the significance of that statement. The article doesn’t mention that even if consumer data is stripped of “identifiers” like a user name, individual identification could happen easily through the combination of datasets.

I would love to see an article by a mainstream publication that addresses this issue in a truly comprehensive and thoughtful way. What’s missing in the conversation started by this article is not only a fuller analysis of how personal information is being collected and what dangers there are for individual privacy, but also a nuanced discussion of that information’s value and what it means for “a handful of big players” to hold most of it. The article ends citing a study of California adults, 85% of whom thought sites should not be allowed to track their behavior around the Web to show them ads. But does that statistic really capture what’s at stake?

P.S. Is AOL’s innocent penguin happy or merely surprised that anchovy ads are being sent to him?

Facebook: The Only Hotel California?

Thursday, February 14th, 2008

As the subject of recent splashy news on privacy and personal data collection, Facebook is starting to seem a little scary. In the words of one former user, Nipon Das, “It’s like the Hotel California. You can check out anytime you like, but you can never leave.” We’ve heard how difficult it is to remove yourself from Facebook.

We’ve seen how Facebook initially chose to launch Beacon, an advertising tool that told your friends about your activities on other websites, such as a purchase on eBay, without an easy opt-out mechanism, until outrage and a petition organized by MoveOn.org forced Facebook to change its policy.

Facebook employees are even poking around private user profiles for personal entertainment.

But although Facebook is at the forefront of a new kind of marketing, it’s not the only company with discomforting privacy policies and terms of use. Facebook’s statement that its terms are subject to change at any time is standard boilerplate. Its disclosure that it may share your information with third parties to provide you service is also pretty standard. After all, it’s certified by TRUSTe, the leading privacy certifier for online businesses. In fact, Facebook is arguably more explicit than most companies about what it’s doing because by its very nature, it’s more obvious that users’ personal information is being collected.

You could argue that the users do have a choice. They could choose not to use Facebook. But how did it turn out that in the big world of the internet, we have only two choices: 1) provide your personal information on the company’s terms; or 2) don’t use the service?

So far, it’s not clear that the controversy around Facebook has led to increased public concern about other companies and their personal data collection. It doesn’t even seem to have spilled over to all the programs that run on Facebook’s platform. No one seems perturbed that the creator of some random new application for feeding virtual fish now has access to his or her profile.

But there clearly is growing public unease, an increasing sense that our Google searches or our online purchases may be available to people we don’t know and can’t trust. Perhaps Facebook will end up providing an invaluable public service, albeit inadvertently, in making more people wonder, “What exactly did I agree to?”

