Posts Tagged ‘Privacy Policies’

Promises, promises: what information is being shared with third parties?

Friday, May 8th, 2009

If you read a bunch of privacy policies in a row, they all start to sound the same.  They all seem to collect a whole lot of information from you, whether or not they call it “personal,” and they all seem to have similar reasons for doing so.  The most common are:

  • To provide services, including customer service
  • To operate the site/ensure technical functioning of the site
  • To customize content and advertising
  • To conduct research to improve services and develop new services.

They also list the circumstances in which data is shared with third parties, the most common being:

  • To provide information to subsidiaries or partners that perform services for the company
  • To respond to subpoenas, court orders, or legal process, or otherwise comply with law
  • To enforce terms of service
  • To detect or prevent fraud
  • To protect the rights, property, or safety of the company, its users, or the public
  • Upon merger or acquisition.

After a while, you can almost get lulled into believing these are all just very standard, normal uses of your information.

The policies generally use language that makes it all seem very reasonable.  “Customize” advertising sounds a lot better than “targeted” advertising.  Who wants to be a “target”?  New York Times Digital even assures its readers that print subscribers’ information will be sold to “reputable companies” that offer marketing info or products through direct mail, which sounds wonderfully quaint.

But what I find most interesting is the way many companies admit that they do share information with third parties.

It’s probably a surprise to many Americans, as a recent survey found that a majority of Californians think that when a company merely has a privacy policy, that means the company doesn’t share its users’ information with third parties.  Clearly, most of these people have never actually read a privacy policy, but even if they had, they wouldn’t necessarily be enlightened about what kind of information is being shared.

Most policies begin their discussion of information-sharing with a declaration that they don’t share information with third parties, with certain exceptions.  Yahoo states, “Yahoo! does not rent, sell, or share personal information about you with other people or non-affiliated companies except to provide products or services you’ve requested, when we have your permission, or under the following circumstances.”  Microsoft: “Except as described in this statement, we will not disclose your personal information outside of Microsoft and its controlled subsidiaries and affiliates without your consent.”  Google’s construction is slightly different, but when it states the circumstances in which it shares information, the first circumstance is, “We have your consent. We require opt-in consent for the sharing of any sensitive personal information.”

The crucial issue, then, is how “personal information” is defined.  And as I described in my last blog post, the definition of “personal information” varies widely from company to company.  When the definition can vary so much, the promise not to share “personal information” isn’t an easy one to understand.

Take, for example, Google’s promise not to share “sensitive personal information,” which it defines as “information we know to be related to confidential medical information, racial or ethnic origins, political or religious beliefs or sexuality and tied to personal information.”  Does that mean that my search queries for B-list celebrities are fair game?

Given the varying definitions of “personal” that are used, the strong declaration that my “personal information” will generally not be shared is not, ultimately, a very comforting one.  At the same time, many of these companies admit that they will share “aggregate” or “anonymous” information collected from you.  But they don’t explain what they’ve done to make that information “anonymous.”  As we know from AOL’s debacle, a company’s promise that information has been made anonymous is no guarantee that it’ll stay anonymous.

In this context, it’s interesting that Ask Network explicitly lists what it is sharing with third parties, so you don’t have to figure out what they consider personal and not personal:

(a) your Internet Protocol (IP) address; (b) the address of the last URL you visited prior to clicking through to the Site; (c) your browser and platform type (e.g., a Netscape browser on a Macintosh platform); (d) your browser language; (e) the data in any undeleted cookies that your browser previously accepted from us; and (f) the search queries you submit. For example, when you submit a query, we transmit it (and some of the related information described above) to our paid listing providers in order to obtain relevant advertising to display in response to your query. We may merge information about you into group data, which may then be shared on an aggregated basis with our advertisers.

Ask Network also goes on to promise that third parties will not be allowed to “make” the information personal, explicitly acknowledging that the difference between personal and not-personal is not a hard, bright line.

We at CDP don’t really care whether IP addresses are included in the “personal information” category or not.  What we really want to see are honest, meaningful promises about user privacy. We would like to see organizations offer choices to users about how specific pieces of data about them are stored and shared, rather than simply make broad promises about “personal information,” as defined by that company.  It may turn out that “personal” and “anonymous” are categories that are so difficult to define, we’ll have to come up with new terminology that is more descriptive and informative.

Or companies will end up having to do what Wikipedia does: honestly state that it “cannot guarantee that user information will remain private.”

Ack! Congress writing privacy policies?

Thursday, May 7th, 2009

It remains to be seen what actually gets proposed, but at first blush, it doesn’t feel right for Congress to be writing privacy policies for all the interwebs. Yet that appears to be what Rick Boucher, the Democratic congressman from Virginia, is trying to do:
‘If the site used its customer data for first-party purposes (i.e., the site itself advertising to its own customers), it would have to offer consumers an opt-out option. “The default position would be that the first-party marketing transaction could occur,” Boucher elaborated. “It would only be prevented if the affirmative step was taken to say, ‘no, you can’t do that.’”

‘But if the customer information is going to be sent to “some completely unrelated party,” Boucher added, “not associated with the first-party transaction, that would fall under opt-in, and that information could then be shared with the other party only if the customer affirmatively took the step of saying ‘yes you can share it.’”’
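To make the two defaults concrete, here is a minimal sketch of the rule as Boucher describes it in the quote above. It is only an illustration: the `ConsentChoices` class, the `may_use` function, and the purpose names are hypothetical and do not come from any bill text.

```python
from dataclasses import dataclass


@dataclass
class ConsentChoices:
    """What a customer has affirmatively said (hypothetical model)."""
    opted_out_first_party: bool = False  # silence means first-party use is allowed
    opted_in_third_party: bool = False   # silence means third-party sharing is not


def may_use(purpose: str, choices: ConsentChoices) -> bool:
    """Return True if the data may be used for the given purpose."""
    if purpose == "first_party":
        # Opt-out: allowed unless the customer took the step of saying no.
        return not choices.opted_out_first_party
    if purpose == "third_party":
        # Opt-in: blocked unless the customer took the step of saying yes.
        return choices.opted_in_third_party
    raise ValueError(f"unknown purpose: {purpose}")


silent_customer = ConsentChoices()
print(may_use("first_party", silent_customer))  # True
print(may_use("third_party", silent_customer))  # False
```

A customer who never touches a setting ends up with first-party marketing switched on and third-party sharing switched off, and the sketch has no answer for a case that fits neither purpose cleanly.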

What would be the fallout of such legislation for you and me?

Every time I use Google without logging in (which is almost always), do I need to give permission for Google to collect data from me so they know what ads to serve up? What if I use the Google search bar in my browser? How would that work?

Since advertising is “core” to Google’s business, maybe collecting search query data would fall under “first-party purposes”, even though that data is shared with “third-party advertisers”.

It’s a sign of the times that even Congress is starting to worry about the fine print in privacy policies, and we certainly laud attempts to cut through the obfuscation of privacy legalese.

Still, this binary opt-in/opt-out approach feels like a hatchet job where a scalpel is needed.

Or better yet, Congress should first focus on legislation that will create standards around currently wishy-washy concepts of “anonymization” and “personal information” that allow companies to violate the spirit of their own policies, if not the letter.

Don’t take it personally: how “personal” information is defined in privacy policies

Tuesday, April 28th, 2009

Most privacy certification programs, like TRUSTe, require that the privacy policy identify what kinds of personally identifiable information (PII) are being collected.  It’s a requirement that’s meant to promote transparency: the user must be informed!

As a result, nearly every privacy policy we looked at included a long list of the types of information being collected.  But who can process a long catalog of items?  What popped out at me, after reading policy after policy, was the way so many of the companies we surveyed categorize the information they collect into 1) “personal information” that you provide, such as name and email address, often when you sign up for an account; and 2) cookie and log data, including IP address, browser type, browser language, web request, and page views.

When the first category is called “personal” information, the second category implicitly becomes “not-personal” information.  But the queries we put into search engines—what could be more personal?  How much could you learn about me, just looking at the history of things I’ve bought on Amazon, let alone the things I’ve Googled?  What is an IP address if not a marker linking my computer to the actions I (and others) take on that computer?

Yahoo and Amazon go the extra step of labeling cookie and log data “automatic information,” giving it a ring of inevitability.  Ask Network calls this information “limited information that your browser makes available whenever you visit any website.”  Wikipedia similarly states, “When a visitor requests or reads a page, or sends email to a Wikimedia server, no more information is collected than is typically collected by web sites.”

There are companies that do define “personal information” much more broadly.  EBay’s definition includes “computer and connection information, statistics on page views, traffic to and from the sites, ad data, IP address and standard web log information” and “information from other companies, such as demographic and navigation information.”  AOL states that its AOL Network Information may include “personally identifiable information” that includes “information about your visits to AOL Network Web sites and pages, and your responses to the offerings and advertisements presented on these Web sites and pages” and “information about the searches you perform through the AOL Network and how you use the results of those searches.”

And there are websites that try to collect as little information as possible: Ixquick and Cuil, the search engines that have been trying to build a brand around privacy.  These companies decided that privacy required that they not record IP addresses, and Ixquick deletes log data after 48 hours.

Personally, I don’t think the solution is in deleting IP addresses and log data willy-nilly.  But we as a society can’t have a thoughtful discussion on what it takes to balance privacy rights against the value of data if companies aren’t honest about how “personal” cookie and log data can be.

Some companies do acknowledge that information that they don’t consider “personal” could become personally identifying if it were combined with other data.  Microsoft therefore promises to “store page views, clicks and search terms…separately from your contact information or other data that directly identifies you (such as your name, email address, etc.).  Further we have built in technological and process safeguards to prevent the unauthorized correlation of this data.”  Similarly, WebMD makes this promise: “we do not link non-personal information from Cookies to personally identifiable information without your permission and do not use Cookies to collect or store Personal Health Information about you.”  WebMD further states that data warehouses it contracts with are required to agree that they “not attempt to make this information personally identifiable, such as by combining it with other databases.”

Otherwise, there’s very little discussion of what combination of data means.  When data is combined, many data sets that initially appear to be anonymous or “non-personally identifiable” can become de-anonymized.  In recent years, researchers at the University of Texas have demonstrated that it is possible to de-anonymize data through combination, as when Netflix data is combined with IMDb ratings, or when Twitter is combined with Flickr.  So when companies offhandedly note that they are combining information they collect from different sources, they are learning a great deal more about individual people than the average user would imagine.  And as you might imagine, large companies like Microsoft, Google, and Yahoo have a wealth of databases at their disposal.
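For readers who want to see how combination can undo “anonymity,” here is a toy sketch of the general idea. It is not the researchers’ actual method, which handles far messier data; every record, name, and function here (`link_records` included) is invented for illustration. The sketch simply assumes that an “anonymous” ID and a public name belong to the same person when they share enough identical (movie, date) pairs.

```python
from collections import defaultdict

# "Anonymized" ratings: names were replaced by opaque IDs (invented data).
anonymized_ratings = [
    {"user_id": "u1", "movie": "Movie A", "date": "2009-01-03"},
    {"user_id": "u1", "movie": "Movie B", "date": "2009-01-10"},
    {"user_id": "u1", "movie": "Movie C", "date": "2009-02-01"},
    {"user_id": "u2", "movie": "Movie A", "date": "2009-01-05"},
    {"user_id": "u2", "movie": "Movie D", "date": "2009-01-20"},
]

# Reviews posted publicly under recognizable names (invented data).
public_reviews = [
    {"name": "Alice Example", "movie": "Movie A", "date": "2009-01-03"},
    {"name": "Alice Example", "movie": "Movie B", "date": "2009-01-10"},
    {"name": "Bob Example", "movie": "Movie D", "date": "2009-01-20"},
]


def link_records(anonymized, public, min_matches=2):
    """Map anonymous IDs to public names that share enough (movie, date) pairs."""
    # Index the public data set by (movie, date).
    public_index = defaultdict(set)
    for review in public:
        public_index[(review["movie"], review["date"])].add(review["name"])

    # Count how many pairs each anonymous ID shares with each public name.
    overlap = defaultdict(lambda: defaultdict(int))
    for rating in anonymized:
        key = (rating["movie"], rating["date"])
        for name in public_index.get(key, ()):
            overlap[rating["user_id"]][name] += 1

    # Keep a link only when exactly one name clears the threshold.
    links = {}
    for user_id, counts in overlap.items():
        candidates = [name for name, n in counts.items() if n >= min_matches]
        if len(candidates) == 1:
            links[user_id] = candidates[0]
    return links


print(link_records(anonymized_ratings, public_reviews))  # {'u1': 'Alice Example'}
```

Neither data set contains a name next to an opaque ID, yet two overlapping entries are enough to tie “u1” to a named person, and with that link comes everything else “u1” rated.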

So that’s “transparency,” a long list of types of information collected and artful categorization.  It’s amazing that some privacy policies can use so many words and yet say so little.

What do privacy policies actually say?

Friday, April 24th, 2009

Last year, the Common Data Project started a project to survey and analyze the privacy policies of some of the largest, most visited Internet companies. Reading the policies was truly as painful as expected, horrifically boring and difficult to decipher. We found that many companies are as vague and wordy as they can be, which is surely no surprise to anyone interested in online privacy. So why did we do it?

CDP is committed to understanding and articulating a set of “best practices” for data collection and privacy protection. We don’t simply want to criticize companies for their obfuscation. We want to set forth standards that declare it is both possible and desirable to make privacy an integral part of data collection, and not just an afterthought.

But what’s the status quo? What are major companies promising now? What language are they using, and what implications are there for the kind of privacy concerns people actually have?

The first question we asked: What data collection is happening on the site that is not covered by the privacy policy?

It might seem like an odd question. But the fact that there is data collection going on that’s not covered captures so much of what is confusing for people who are used to the bricks-and-mortar world. When you walk into your neighborhood grocery store, you might not be surprised that the owner is keeping track of what is popular, what is not, and what items people in the neighborhood seem to want. You would be surprised, though, if you found out that some of the people in the store who were asking questions of the customers didn’t work for the grocery store. You would be especially surprised if you asked the grocery store owner about it, and he said, “Oh those people? I take no responsibility for what they do.” (Even Walmart, master of business data, probably doesn’t let third parties into its stores to do customer surveys that aren’t on Walmart’s behalf.)

But in the online world, that happens all the time. I’m not talking about the fact that when you click on a link and leave a site, you will end up subject to new rules. I’m talking about data collection by third party advertisers that’s happening while you sit there, looking at that site. Companies rarely vouch for what these third party advertisers are doing. Some companies, such as AOL, Microsoft, Yahoo, Facebook, Amazon, and the New York Times Digital, will at least explicitly acknowledge there are third parties that use cookies on their sites with their own policies around data collection. The user is then directed to these third parties’ privacy policies, as New York Times Digital does here. (Note that some of these links are outdated, at least at the time of this post.)

Google, in contrast, doesn’t mention third party advertisers directly in its privacy policy.  It only alludes to the controls for opting out of their tracking, which are found on a separate page discussing advertising and privacy.

Companies that don’t allow third party advertisers, like Craigslist, of course have no reason to declare this is happening.

We live in a pretty topsy-turvy world. Let’s say you’re an ordinary user with some vague concerns about privacy. You’ve never read a privacy policy in your life (the way I never had until I started working with CDP), and you decide, oh, I’m going to read Yahoo’s privacy policy. And then you find out that you have to read several more policies if you really want to know who is collecting data from you, how, and for what. Can you imagine if the grocery store owner told you you had to go talk to six different people to understand what was being tracked in that store?

We’ll eventually publish a report summarizing our findings on our website, but we’re going to keep rolling out these posts analyzing different aspects of online privacy policies. We’d love to hear what you think about our analysis, whether you agree or vehemently disagree. Tune in for more.

How should we define “personal information”?

Thursday, September 4th, 2008

We at CDP recently decided that in keeping with our work on developing new standards for online data collection, we should also create a survey of the privacy policies of the biggest online companies. We want to help users not only understand privacy policies more quickly and easily, but also compare the practices of different companies.

As a result, I’ve been spending a lot of time reading privacy policies.  I knew it wouldn’t be a fun activity, but it’s also been challenging in ways I didn’t quite anticipate.  As I started to sit down and actually compare policies across a set of specific issues, it quickly became obvious that although they use many of the same words (private, personal, anonymous), they aren’t all using the same definitions.

For example, Yahoo defines “personal information” as “information about you that is personally identifiable like your name, address, email address, or phone number, and that is not otherwise publicly available.”  Although it discusses the collection of other information, like log data and IP addresses, it never calls this information “personal.”  Ask.com takes a similar tack, disclosing that it does collect such information, but calling it “anonymous information.”

AOL, in contrast, defines “AOL Network Information” as “personally identifiable information” that includes data like IP addresses, sites visited, and search history.  Of course, AOL can’t pretend that such data is actually “anonymous.”  After all, its proud release of “scrubbed” search query data two years ago was quickly shown to reveal the individual identities of thousands of users.

So what do you think?  When a privacy policy makes promises about your “personal information,” should that include your search query history, your IP address, and your log data?  If not, does that mean these companies are free to do what they will with this data?  Leave it unsecured? Hand it over to marketers, government, anyone?

And what does it mean to us, as a society, that companies are defining these words on their terms?

