Posts Tagged ‘Internet companies’

It’s our data. When do we get to use it, too?

Thursday, June 4th, 2009

When we started this survey of privacy policies, our goal was simple: find out what these policies actually say.  But our larger goal was to place the promises companies made about users’ privacy in a larger context—how do these companies view data?  Do they see it as something that wholly belongs to them?  Because ultimately, their attitude towards this data very much shapes their attitude towards user privacy.

In the last couple of years, we’ve seen an unprecedented amount of online data collection that’s happened largely surreptitiously.  We can’t say that we, as users, haven’t gotten something in return.  The “free” services on the internet have been paid for with our personal information.  But the way the information has been collected has prevented us from negotiating with the benefit of full information.  In other words, we haven’t gotten a good deal.  The data we’ve provided is so valuable, we should have struck a harder bargain.

And I think more and more people are starting to feel that way.  Even though most only feel a vague discomfort at this point, it’s unlikely that companies like RealAge will be able to continue what they’ve been doing.

For us at CDP, the fear is that we’ll throw the baby out with the bathwater.  We don’t want to shut down data collection altogether—we just want companies to stop thinking of our data as their data and their data alone.  We want to be able to share in the incredible value that this data has, so that we as a society can all benefit from the data collection and analysis capabilities we’ve developed.  Of course, that’s only possible with stronger privacy protections than are available now, which is why privacy is such an important issue for us to understand.

So what would it look like for us to “share” in the value of data?  It might sound crazy that companies collecting all this data would ever share data with their users, but it’s already happening.

Google, as a company that believes it’s in the business of information rather than advertising, does make some sincere efforts to provide data to the public.  Google Trends may be intended for advertisers, but it also provides the whole world with information on what people are searching for.  Google Flu Trends is a natural outgrowth of that, and some researchers believe this data can be helpful in determining where flu outbreaks are going to occur faster than reporting by clinics.

Some companies, like eBay and Amazon, have built their data collection into the service they provide to their customers.  Some of the information they collect on transactions and ratings can be viewed by all users.  Anyone looking to bid on an item on eBay can see how other buyers have rated that seller.  A user of Amazon looking to buy a new digital camera can view what other buyers considered.

Although Wikipedia is a bit different as a nonprofit, the service it provides also actively incorporates public disclosure of the data collected.  The contributions of any one editor can be seen in aggregate and aggregate stats on website activity are also available to the general public.  This information is important in the self-policing that is essential for Wikipedia to maintain any credibility.

Although the amount of data these companies are sharing with their users and the public is minuscule compared to the amount of data they’ve actually collected from us, it raises the possibility that data collection could happen in a completely different way than it does now.  Companies could make it more obvious that data collection is happening, and instead of scaring users away, give users some reason to participate in the collection of data.  The whole process could be one in which users are openly engaged, rather than one in which users feel hoodwinked.

So this is our goal at CDP: what do we need to do in terms of privacy protection, both in terms of technologies and social norms, to make this model of data collection possible?

And if the terms of the policy change?

Tuesday, June 2nd, 2009

It’s bad enough that most of the “choices” we have in privacy today are either, “Accept our terms or don’t use the service.”  But then the terms can change at any time?

Nearly every privacy policy I looked at had some variation on these words: “Please note that this Privacy Policy may change from time to time.”  If the changes are “material,” which is a legal phrase meaning “actually affects your rights,” then all data that’s collected under the prior terms will remain subject to those terms.  Data that’s collected after the change, though, will be subject to the new terms, and the onus is put on the user to check back and see if the terms have changed.

Most companies, like Google, Yahoo, and Microsoft, promise to make an effort to let you know that material changes have been made, by contacting you or posting the changes prominently.  Some, like New York Times Digital and Facebook, promise that material changes won’t go into effect for six months, giving their users some time to find out.

Recently, Facebook decided to test out the right it had reserved to change the terms of use.  Facebook wanted to amend the terms of its license to the content provided by Facebook members.  Although it wasn’t actually a term in the privacy policy, it implicated users’ privacy rights as it involved personal content they had uploaded to Facebook.  Facebook claimed that its new terms of use didn’t materially change users’ rights but merely clarified what was already happening with data.  For example, if user A decides to send a message to user B, and then A deletes her account, the message A sent to B will not be deleted from B’s account.  The information no longer belongs only to user A.

However, Facebook’s unilateral attempt to change the terms of use provoked such uproar that the changes were withdrawn.  Instead, two new documents were created, Facebook Principles and Statement of Rights and Responsibilities, and users were given the option to discuss and vote on these documents before they went into effect.  Ultimately, the new versions were approved by vote of Facebook members.

Facebook is certainly not a model of privacy protection, but this incident is illuminating.  Legally, Facebook could change its terms without its members’ approval.  But practically, it couldn’t.  There’s been some debate over whether angry users understood the changes and what they meant, but that’s almost irrelevant.  Facebook couldn’t simply dictate the terms of its relationship with its users any more, given that its greatest asset is the content created by its users.

It may seem counterintuitive, but it’s not surprising that some of the most visible and effective consumer efforts to change how a company uses personal information have stemmed from an online service based on voluntary sharing. The more people are given opportunities to participate in how information is shared, the better people can understand what it means for a company to share their information and the more likely they are to feel empowered to shape what happens to their information.  Facebook can’t offer the service that it does without the content generated by its users.  But as it’s begun to realize, its users then have to be a part of decisions about the way that content is used.

We all know privacy policies are frustrating, inadequate, and difficult to understand.  So it’s good to remember that not all our privacy battles have to be fought on their terms.

Multiple-choice privacy

Thursday, May 21st, 2009

Everyone agrees that “choice” is crucial for protecting privacy. But what should the choices be?

a) Do not call me, email me, or contact me in any way.
b) Do not let any of your partners/affiliates/anyone else call me, email me, or contact me in any way.
c) Let me access, edit, and delete my account information.
d) Let me access, edit, and delete all information you’ve collected from me, including log data.
e) Track me not.
f) All of the above.
g) None of the above.

Until recently, most tools offered by internet companies over user information have focused on helping people avoid being contacted, i.e., “marketing preferences.” That’s presumably what we cared about when privacy was all about the telemarketer not calling you at home. Companies have also given users access to their account information, which is in the companies’ own interest, since they would prefer to have updated information on you as well.

But few companies acknowledge that other kinds of information they’ve collected from you, like log data, search history, and what you’ve clicked on, might affect your sense of privacy as well. Since they conveniently choose not to call this kind of information “personal,” they have no privacy-based obligation to give you access to this information or allow you to opt out of it.

Still, in the last year or two, there have been some interesting changes in the way some companies view privacy choices. They’re starting to understand that people not only care about whether the telemarketer calls them during dinner, but also whether that telemarketer already knows what they’re eating for dinner.

Most privacy policies will at least state that the user can choose to turn off cookies, though with the caveat that the action might affect the functionality of the site. AskNetwork developed AskEraser to be a more visible way for users to use its service without being tracked, but as privacy advocates noted, AskEraser requires that a cookie be downloaded, when many people who care about privacy periodically clear their cookies. AskEraser also doesn’t affect data collection by third parties.

More interestingly, Google recently announced some new tools for their targeted advertising program for people concerned about being tracked. These tools include a plug-in for people who don’t want to be tracked that will persist even when cookies are cleared and a way for users to know what interests have been associated with them. Google’s new Ad Preferences page also allows people to control what interests are associated with them and not just turn off tracking altogether.

Neither tool is perfect, but they’re still fascinating. The more users are able to see what companies know about them, the better they can understand what kind of information is being collected as they use the internet. And Google seems to recognize that people’s concerns about privacy can’t be assuaged just through an on-off switch, that we want more fine-tuned controls instead.

The big concern for me, though, is whether Google or any other company that wants to be innovative about privacy is actually interested in fundamentally changing the way data is collected. Google’s targeted advertising program can afford to lose the data they would have tracked from privacy geeks, and still rely on getting as much information as possible from others, most of whom have no idea what is happening.

Data retention: are we missing the point?

Tuesday, May 12th, 2009

Data retention has been a controversial issue for many years, with American companies not measuring up to the European Union’s more stringent requirements.  But for us at CDP, it obscures what’s really at stake and often confuses consumers.

For many privacy advocates, limiting the amount of time data is stored reduces the risk of it being exposed.  The theory, presumably, is that sensitive data is like toxic waste, and the less we have of it lying around, the better off we are.  But that theory, as appealing as it is, doesn’t address the fact that our new abilities to collect and store data are incredibly valuable, not just to major corporations, but to policymakers, researchers, and even the average citizen.  It doesn’t seem like focusing on this issue of data retention has necessarily led to better privacy protections.  In fact, it may be distracting us from developing better solutions.

For example, Google and Yahoo in the past year announced major changes to their policies about data retention, promising to retain data for 9 months and 6 months, respectively.  These promises, however, were not promises to delete data, but to “anonymize” it.  As discussed previously, neither company defines precisely what that verb means.  According to the Electronic Frontier Foundation, Yahoo is still retaining 24 of the 32 bits of users’ IP addresses.  As the Executive Director of the Electronic Privacy Information Center (EPIC) stated, “That is not provably anonymous.”  Yet most mainstream media headlines focused only on Yahoo’s claim of shorter data retention.  The article in which the above quote appeared sported the headline: “Yahoo Limits Retention of Personal Data.”
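To see why keeping 24 of the 32 bits of an IP address falls short of anonymity, here is a minimal sketch in Python.  It assumes the retained bits are the leading 24 (the usual way to truncate an IPv4 address), which is our assumption and not something either company has documented.  The function names and the sample address, taken from a range reserved for documentation, are hypothetical.

```python
import ipaddress


def truncate_ipv4(address: str, keep_bits: int = 24) -> str:
    """Zero out every bit after the first `keep_bits` bits of an IPv4 address.

    Assumption: "anonymizing" means keeping the leading bits and dropping
    the rest. No company's actual procedure is being reproduced here.
    """
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


def candidates_remaining(keep_bits: int = 24) -> int:
    """Number of distinct IPv4 addresses that share the retained bits."""
    return 2 ** (32 - keep_bits)


if __name__ == "__main__":
    # 203.0.113.0/24 is a range reserved for documentation examples.
    print(truncate_ipv4("203.0.113.77"))
    print(candidates_remaining(24))
```

Under that assumption, a truncated address still narrows a user down to one of 256 possible addresses, often a single neighborhood or office network, before any search queries or cookies are taken into account.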

At the same time, despite all the controversy around data retention, this issue isn’t even addressed in the privacy policies of Google, Yahoo, or Microsoft.  Google addressed this issue in a separate FAQ section, while Yahoo addressed it in a press release and its blog.  Microsoft in December 2008 said that they would cut their data retention time from 18 months to six if their major competitors did the same.  But this information was not in the privacy policy itself.  Among the other companies I looked at, Wikipedia, Ask Network, Craigslist, and WebMD did at least provide some information in their policies, if not comprehensive answers, on how long certain types of data are retained.  No information could be found readily on the sites of eBay, AOL, Amazon, New York Times Digital, Facebook, and Apple.

This might be due to the fact that data retention remains a somewhat obscure issue to most internet users.  But it’s also true that for many of these sites, much of the data that’s collected is part of the service.  As an eBay buyer or seller, it’s useful to see how others have been rated.  On Amazon, it’s helpful to know what others have considered as they shop for a particular product.  At the same time, my buying and viewing history on Amazon could easily reveal as much that I want to keep private as my search history on Google, if not more.  So why does most of the focus on data retention seem to be on ISPs and search engines?

When I look at a search engine like Ixquick, which is trying to build a reputation for privacy by not storing any information, I’m even less convinced that deleting all the data is a sustainable solution.  Ixquick is a metasearch engine, meaning that it’s pulling results from other search engines.  It’s not a solution to replace Google or Yahoo for everyone.  It feels more like a handy tool for someone who is concerned about his or her privacy, than a model that other search engines could end up following.  If data deletion by all search engines is the goal, the example to hold up can’t be a search engine that relies on other non-deleting search engines.

What exactly do we want to keep private?  At the same time, what information do we want to have?  What is the best way to balance these interests?  These are the questions we should be asking, not “How long is Yahoo going to keep my data?”

Promises, promises: what information is being shared with third parties?

Friday, May 8th, 2009

If you read a bunch of privacy policies in a row, they all start to sound the same.  They all seem to collect a whole lot of information from you, whether or not they call it “personal,” and they all seem to have similar reasons for doing so.  The most common are:

  • To provide services, including customer service
  • To operate the site/ensure technical functioning of the site
  • To customize content and advertising
  • To conduct research to improve services and develop new services.

They also list the circumstances in which data is shared with third parties, the most common being:

  • To provide information to subsidiaries or partners that perform services for the company
  • To respond to subpoenas, court orders, or legal process, or otherwise comply with law
  • To enforce terms of service
  • To detect or prevent fraud
  • To protect the rights, property, or safety of the company, its users, or the public
  • Upon merger or acquisition.

After a while, you can almost get lulled into believing these are all just very standard, normal uses of your information.

The policies generally use language that makes it all seem very reasonable.  “Customize” advertising sounds a lot better than “targeted” advertising.  Who wants to be a “target”?  New York Times Digital even assures its readers that print subscribers’ information will be sold to “reputable companies” that offer marketing info or products through direct mail, which sounds wonderfully quaint.

But what I find most interesting is the way many companies admit that they do share information with third parties.

It’s probably a surprise to many Americans, as a recent survey found that a majority of Californians think that when a company merely has a privacy policy, that means the company doesn’t share its users’ information with third parties.  Clearly, most of these people have never actually read a privacy policy, but even if they had, they wouldn’t necessarily be enlightened about what kind of information is being shared.

Most policies begin their discussion of information-sharing with a declaration that they don’t share information with third parties, with certain exceptions.  Yahoo states, “Yahoo! does not rent, sell, or share personal information about you with other people or non-affiliated companies except to provide products or services you’ve requested, when we have your permission, or under the following circumstances.”  Microsoft: “Except as described in this statement, we will not disclose your personal information outside of Microsoft and its controlled subsidiaries and affiliates without your consent.”  Google’s construction is slightly different, but when it states the circumstances in which it shares information, the first circumstance is, “We have your consent. We require opt-in consent for the sharing of any sensitive personal information.”

The crucial issue, then, is how “personal information” is defined.  And as I described in my last blog post, the definition of “personal information” varies widely from company to company.  When the definition can vary so much, the promise not to share “personal information” isn’t an easy one to understand.

Take, for example, Google’s promise not to share “sensitive personal information,” which it defines as “information we know to be related to confidential medical information, racial or ethnic origins, political or religious beliefs or sexuality and tied to personal information.”  Does that mean that my search queries for B-list celebrities are fair game?

Given the varying definitions of “personal” that are used, the strong declaration that my “personal information” will generally not be shared is not, ultimately, a very comforting one.  At the same time, many of these companies admit that they will share “aggregate” or “anonymous” information collected from you.  But they don’t explain what they’ve done to make that information “anonymous.”  As we know from AOL’s debacle, a company’s promise that information has been made anonymous is no guarantee that it’ll stay anonymous.

In this context, it’s interesting that Ask Network explicitly lists what it is sharing with third parties, so you don’t have to figure out what they consider personal and not personal:

(a) your Internet Protocol (IP) address; (b) the address of the last URL you visited prior to clicking through to the Site; (c) your browser and platform type (e.g., a Netscape browser on a Macintosh platform); (d) your browser language; (e) the data in any undeleted cookies that your browser previously accepted from us; and (f) the search queries you submit. For example, when you submit a query, we transmit it (and some of the related information described above) to our paid listing providers in order to obtain relevant advertising to display in response to your query. We may merge information about you into group data, which may then be shared on an aggregated basis with our advertisers.

Ask Network also goes on to promise that third parties will not be allowed to “make” the information personal, explicitly acknowledging that the difference between personal and not-personal is not a hard, bright line.

We at CDP don’t really care whether IP addresses are included in the “personal information” category or not.  What we really want to see are honest, meaningful promises about user privacy. We would like to see organizations offer choices to users about how specific pieces of data about them are stored and shared, rather than simply make broad promises about “personal information,” as defined by that company.  It may turn out that “personal” and “anonymous” are categories that are so difficult to define, we’ll have to come up with new terminology that is more descriptive and informative.

Or companies will end up having to do what Wikipedia does: honestly state that it “cannot guarantee that user information will remain private.”

Don’t take it personally: how “personal” information is defined in privacy policies

Tuesday, April 28th, 2009

Most privacy certification programs, like TRUSTe, require that the privacy policy identify what kinds of personally identifiable information (PII) are being collected.  It’s a requirement that’s meant to promote transparency: the user must be informed!

As a result, nearly every privacy policy we looked at included a long list of the types of information being collected.  But who can process a long catalog of items?  What popped out at me, after reading policy after policy, was the way so many of the companies we surveyed categorize the information they collect into 1) “personal information” that you provide, such as name and email address, often when you sign up for an account; and 2) cookie and log data, including IP address, browser type, browser language, web request, and page views.

When the first category is called “personal” information, the second category implicitly becomes “not-personal” information.  But the queries we put into search engines—what could be more personal?  How much could you learn about me, just looking at the history of things I’ve bought on Amazon, let alone the things I’ve Googled?  What is an IP address if not a marker linking my computer to the actions I (and others) take on that computer?

Yahoo and Amazon go the extra step of labeling cookie and log data, “automatic information,” giving it a ring of inevitability.  Ask Network calls this information “limited information that your browser makes available whenever you visit any website.”  Wikipedia similarly states, “When a visitor requests or reads a page, or sends email to a Wikimedia server, no more information is collected than is typically collected by web sites.”

There are companies that do define “personal information” much more broadly.  EBay’s definition includes “computer and connection information, statistics on page views, traffic to and from the sites, ad data, IP address and standard web log information” and “information from other companies, such as demographic and navigation information.”  AOL states that its AOL Network Information may include “personally identifiable information” that includes “information about your visits to AOL Network Web sites and pages, and your responses to the offerings and advertisements presented on these Web sites and pages” and “information about the searches you perform through the AOL Network and how you use the results of those searches.”

And there are websites that don’t collect information at all: Ixquick and Cuil, the search engines that have been trying to build a brand around privacy.  These companies decided that privacy required that they not record IP addresses, and Ixquick deletes log data after 48 hours.

Personally, I don’t think the solution is in deleting IP addresses and log data willy-nilly.  But we as a society can’t have a thoughtful discussion on what it takes to balance privacy rights against the value of data if companies aren’t honest about how “personal” cookie and log data can be.

Some companies do acknowledge that information that they don’t consider “personal” could become personally identifying if it were combined with other data.  Microsoft therefore promises to “store page views, clicks and search terms…separately from your contact information or other data that directly identifies you (such as your name, email address, etc.).  Further we have built in technological and process safeguards to prevent the unauthorized correlation of this data.”  Similarly, WebMD makes this promise: “we do not link non-personal information from Cookies to personally identifiable information without your permission and do not use Cookies to collect or store Personal Health Information about you.”  WebMD further states that data warehouses it contracts with are required to agree that they “not attempt to make this information personally identifiable, such as by combining it with other databases.”

Otherwise, there’s very little discussion of what combination of data means.  When data is combined, many data sets that initially appear to be anonymous or “non-personally identifiable” can become de-anonymized.  Researchers at the University of Texas in recent years have demonstrated that it is possible to de-anonymize through combination, as when Netflix data is combined with IMDb ratings, or when Twitter is combined with Flickr.  So when companies offhandedly note that they are combining information they collect from different sources, they are learning a great deal more about individual people than the average user would imagine.  And as you might imagine, large companies like Microsoft, Google, and Yahoo have a wealth of databases at their disposal.
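For readers who want to see how combination can undo “anonymity,” here is a toy sketch in Python.  It is not the researchers’ method, just a simple overlap count between two made-up data sets: one released under pseudonyms, one posted publicly under real names.  Every name, title, date, and function in it is hypothetical.

```python
# Hypothetical "anonymized" release: pseudonym -> set of (title, date) ratings.
anonymized = {
    "user_0417": {("Film A", "2009-03-01"), ("Film B", "2009-03-04"), ("Film C", "2009-03-09")},
    "user_0962": {("Film A", "2009-02-11"), ("Film D", "2009-03-02")},
}

# Hypothetical public reviews posted under a real name.
public = {
    "Pat Example": {("Film A", "2009-03-01"), ("Film C", "2009-03-09")},
}


def link(anonymized, public, min_overlap=2):
    """Match each public name to the pseudonym sharing the most records.

    A match is reported only when the best overlap reaches `min_overlap`
    and no other pseudonym ties with it.
    """
    matches = {}
    for name, public_items in public.items():
        scores = {
            pseudonym: len(items & public_items)
            for pseudonym, items in anonymized.items()
        }
        best = max(scores, key=scores.get)
        top = scores[best]
        if top >= min_overlap and list(scores.values()).count(top) == 1:
            matches[name] = best
    return matches


if __name__ == "__main__":
    print(link(anonymized, public))
```

In this made-up example, two public reviews are enough to tie a real name to a pseudonym, and with it to every other record filed under that pseudonym.  Neither data set contains anything a privacy policy would call “personal” on its own.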

So that’s “transparency,” a long list of types of information collected and artful categorization.  It’s amazing that some privacy policies can use so many words and yet say so little.

What do privacy policies actually say?

Friday, April 24th, 2009

Last year, the Common Data Project started a project to survey and analyze the privacy policies of some of the largest, most visited Internet companies. Reading the policies was truly as painful as expected, horrifically boring and difficult to decipher. We found that many companies are as vague and wordy as they can be, which is surely no surprise to anyone interested in online privacy. So why did we do it?

CDP is committed to understanding and articulating a set of “best practices” for data collection and privacy protection. We don’t simply want to criticize companies for their obfuscation. We want to set forth standards that declare it is both possible and desirable to make privacy an integral part of data collection, and not just an afterthought.

But what’s the status quo? What are major companies promising now? What language are they using, and what implications are there for the kind of privacy concerns people actually have?

The first question we asked: What data collection is happening on the site that is not covered by the privacy policy?

It might seem like an odd question. But the fact that there is data collection going on that’s not covered captures so much of what is confusing for people who are used to the bricks-and-mortar world. When you walk into your neighborhood grocery store, you might not be surprised that the owner is keeping track of what is popular, what is not, and what items people in the neighborhood seem to want. You would be surprised, though, if you found out that some of the people in the store who were asking questions of the customers didn’t work for the grocery store. You would be especially surprised if you asked the grocery store owner about it, and he said, “Oh those people? I take no responsibility for what they do.” (Even Walmart, master of business data, probably doesn’t let third parties into its stores to do customer surveys that aren’t on Walmart’s behalf.)

But in the online world, that happens all the time. I’m not talking about the fact that when you click on a link and leave a site, you will end up subject to new rules. I’m talking about data collection by third party advertisers that’s happening while you sit there, looking at that site. Companies rarely vouch for what these third party advertisers are doing. Some companies, such as AOL, Microsoft, Yahoo, Facebook, Amazon, and the New York Times Digital, will at least explicitly acknowledge there are third parties that use cookies on their sites with their own policies around data collection. The user is then directed to these third parties’ privacy policies, as New York Times Digital does here. (Note that some of these links are outdated, at least at the time of this post.)

Google, in contrast, doesn’t mention third party advertisers directly in its privacy policy, alluding instead to separate controls for opting out of their tracking on a separate page discussing advertising and privacy.

Companies that don’t allow third party advertisers, like Craigslist, of course have no reason to declare this is happening.

We live in a pretty topsy-turvy world. Let’s say you’re an ordinary user with some vague concerns about privacy. You’ve never read a privacy policy in your life (the way I never had until I started working with CDP), and you decide, oh, I’m going to read Yahoo’s privacy policy. And then you find out that you have to read several more policies if you really want to know who is collecting data from you, how, and for what. Can you imagine if the grocery store owner told you you had to go talk to six different people to understand what was being tracked in that store?

We’ll eventually publish a report summarizing our findings on our website, but we’re going to keep rolling out these posts analyzing different aspects of online privacy policies. We’d love to hear what you think about our analysis, whether you agree or vehemently disagree. Tune in for more.
