Everyone is in a tizzy with the news that Google is cutting its data-retention period from 18 months to nine. To be more specific, Google will “anonymize IP addresses on our server logs after 9 months.” The announcement, though, only highlights for me how unclear the word “anonymize” is, and how little information we have about what these data-retention policies actually do for users’ privacy.
Data retention is a big issue for some privacy advocates, on the theory that something like the AOL privacy scandal wouldn’t have happened if AOL hadn’t been storing the search queries to begin with. But as we’ve stated before, we at CDP don’t think data deletion is the answer. In fact, we’re concerned that announcements like today’s from Google can further confuse consumers about what’s at stake.
To begin with, Google isn’t promising to delete its data after nine months, just to “anonymize” it. The company knows that the word “anonymize” can mean quite a lot of things, and even says so: “We haven’t sorted out all of the implementation details, and we may not be able to use precisely the same methods for anonymizing as we do after 18 months…”
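To make the ambiguity concrete, here is a minimal Python sketch of two things “anonymizing an IP address” could plausibly mean. It is not a description of Google’s method, which the announcement does not specify; the function names `truncate_ip` and `hash_ip`, the /24 cutoff, and the salt are all assumptions chosen for illustration.

```python
import hashlib
import ipaddress


def truncate_ip(ip: str) -> str:
    """Hypothetical option 1: zero the last octet of an IPv4 address."""
    # strict=False lets us pass a host address and get its enclosing /24 network.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def hash_ip(ip: str, salt: bytes) -> str:
    """Hypothetical option 2: replace the address with a salted hash."""
    digest = hashlib.sha256(salt + ip.encode("ascii")).hexdigest()
    return digest[:16]  # shortened only to keep log lines readable


if __name__ == "__main__":
    example_ip = "203.0.113.57"  # address from a range reserved for documentation
    print(truncate_ip(example_ip))
    print(hash_ip(example_ip, salt=b"illustrative-salt"))
```

The two options protect very different things. Truncation still narrows a log entry down to one of 256 addresses, and hashing with a fixed salt gives every address a stable pseudonym, so all the entries from one address remain linked to each other. Both could honestly be called “anonymizing.”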
Google is being prodded by the European Union’s stricter regulations around privacy, but even the EU directive on data retention only states, “Such data must be erased or made anonymous when no longer needed for the purpose of the transmission of a communication, except for the data necessary for billing or interconnection payments.” It gives no clear definition of what “made anonymous” means.
When AOL made its search query data public, the company thought it had “anonymized” it. Netflix thought the same when it released its data. Neither belief stopped others from identifying individual people in the “anonymized” data sets. I trust that Google’s engineers are not using AOL’s and Netflix’s “anonymization” techniques, but it’s clear that focusing so much on the length of time data is retained draws attention away from what happens after the nine months are up.
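A rough sketch of why this kind of “anonymization” can fail, assuming a release in which each user’s identity is swapped for a stable pseudonym while the records themselves are left intact. The log entries, the pseudonyms, and the helper `queries_by_pseudonym` below are invented for illustration and are not drawn from any real data set.

```python
from collections import defaultdict

# Invented toy log: (pseudonym, search query). No real data.
toy_log = [
    ("user_17", "plumbers in smalltown"),
    ("user_17", "jane q. example class reunion"),
    ("user_42", "weather tomorrow"),
    ("user_17", "houses for sale elm street smalltown"),
]


def queries_by_pseudonym(log):
    """Group every query under the pseudonym that issued it."""
    grouped = defaultdict(list)
    for pseudonym, query in log:
        grouped[pseudonym].append(query)
    return dict(grouped)


if __name__ == "__main__":
    for pseudonym, queries in queries_by_pseudonym(toy_log).items():
        print(pseudonym, queries)
```

No name appears next to `user_17`, yet the three queries together point to a town, a street, and a person. The identifying information lives in the records themselves, which is why removing or masking one field is not the same as making the data anonymous.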