Data retention has been a controversial issue for many years, with American companies not measuring up to the European Union’s more stringent requirements. But for us at CDP, the debate over retention periods obscures what’s really at stake and often confuses consumers.
For many privacy advocates, limiting the amount of time data is stored reduces the risk of it being exposed. The theory, presumably, is that sensitive data is like toxic waste, and the less we have of it lying around, the better off we are. But that theory, as appealing as it is, doesn’t address the fact that our new abilities to collect and store data are incredibly valuable, not just to major corporations, but to policymakers, researchers, and even the average citizen. Nor does the focus on data retention seem to have led to better privacy protections. In fact, it may be distracting us from developing better solutions.
For example, Google and Yahoo in the past year announced major changes to their policies about data retention, promising to retain data for 9 months and 6 months, respectively. These promises, however, were not promises to delete data, but to “anonymize” it. As discussed previously, neither company defines precisely what that verb means. According to the Electronic Frontier Foundation, Yahoo is still retaining 24 of the 32 bits of users’ IP addresses. As the Executive Director of the Electronic Privacy Information Center (EPIC) stated, “That is not provably anonymous.” Yet most mainstream media headlines focused only on Yahoo’s claim of shorter data retention. The article in which the above quote appeared sported the headline: “Yahoo Limits Retention of Personal Data.”
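To make the arithmetic behind that objection concrete, here is a minimal sketch of what keeping 24 of the 32 bits of an IPv4 address could look like. It is an illustration only, not Yahoo's actual procedure: the function names are hypothetical, the sample address comes from a range reserved for documentation, and the sketch assumes the retained bits are the leading 24.

```python
import ipaddress


def truncate_ipv4(address: str, kept_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address, keeping the first kept_bits.

    Hypothetical helper for illustration; assumes the leading bits are the ones kept.
    """
    network = ipaddress.ip_network(f"{address}/{kept_bits}", strict=False)
    return str(network.network_address)


def candidates_remaining(kept_bits: int = 24) -> int:
    """Count how many distinct addresses share the same truncated value."""
    return 2 ** (32 - kept_bits)


# 203.0.113.77 is a documentation-only address, not a real user.
print(truncate_ipv4("203.0.113.77"))  # 203.0.113.0
print(candidates_remaining())         # 256
```

Under these assumptions, a truncated record still narrows a user down to one of at most 256 addresses, which is the sense in which such data is hard to call provably anonymous.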
The headlines might reflect the fact that data retention remains a somewhat obscure issue to most internet users. But it’s also true that for many of these sites, much of the data that’s collected is part of the service. As an eBay buyer or seller, it’s useful to see how others have been rated. On Amazon, it’s helpful to know what others have considered as they shop for a particular product. At the same time, my buying and viewing history on Amazon could easily reveal as much that I want to keep private as my surfing history on Google, if not more. So why does most of the focus on data retention seem to be on ISPs and search engines?
When I look at a search engine like Ixquick, which is trying to build a reputation for privacy by not storing any information, I’m even less convinced that deleting all the data is a sustainable solution. Ixquick is a metasearch engine, meaning that it pulls its results from other search engines. It’s not a solution that could replace Google or Yahoo for everyone. It feels more like a handy tool for someone who is concerned about his or her privacy than a model that other search engines could end up following. If data deletion by all search engines is the goal, the example to hold up can’t be a search engine that relies on other, non-deleting search engines.
What exactly do we want to keep private? At the same time, what information do we want to have? What is the best way to balance these interests? These are the questions we should be asking, not “How long is Yahoo going to keep my data?”