When the first category is called “personal” information, the second category implicitly becomes “not-personal” information. But the queries we put into search engines—what could be more personal? How much could you learn about me, just looking at the history of things I’ve bought on Amazon, let alone the things I’ve Googled? What is an IP address if not a marker linking my computer to the actions I (and others) take on that computer?
Yahoo and Amazon go the extra step of labeling cookie and log data “automatic information,” giving it a ring of inevitability. Ask Network calls this information “limited information that your browser makes available whenever you visit any website.” Wikipedia similarly states, “When a visitor requests or reads a page, or sends email to a Wikimedia server, no more information is collected than is typically collected by web sites.”
There are companies that do define “personal information” much more broadly. eBay’s definition includes “computer and connection information, statistics on page views, traffic to and from the sites, ad data, IP address and standard web log information” and “information from other companies, such as demographic and navigation information.” AOL states that its AOL Network Information may include “personally identifiable information,” which covers “information about your visits to AOL Network Web sites and pages, and your responses to the offerings and advertisements presented on these Web sites and pages” and “information about the searches you perform through the AOL Network and how you use the results of those searches.”
And there are websites that collect almost no information at all: Ixquick and Cuil, search engines that have been trying to build a brand around privacy. These companies decided that privacy required that they not record IP addresses, and Ixquick deletes its log data after 48 hours.
Personally, I don’t think the solution is in deleting IP addresses and log data willy-nilly. But we as a society can’t have a thoughtful discussion on what it takes to balance privacy rights against the value of data if companies aren’t honest about how “personal” cookie and log data can be.
Otherwise, there’s very little discussion of what combining data means. When data sets are combined, many that initially appear to be anonymous or “non-personally identifiable” can be de-anonymized. Researchers at the University of Texas have demonstrated in recent years that combination alone can de-anonymize data, as when Netflix data is combined with IMDb ratings, or when Twitter data is combined with Flickr data. So when companies offhandedly note that they are combining information they collect from different sources, they are learning a great deal more about individual people than the average user would imagine. And large companies like Microsoft, Google, and Yahoo have a wealth of databases at their disposal.
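To make “de-anonymize through combination” concrete, here is a toy sketch in Python. It is not the method the University of Texas researchers used; it is a deliberately simple illustration that assumes exact matches on title and rating. Every name, token, and record in it is made up, and `link_profiles` and `min_overlap` are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical "anonymized" data set: names replaced with opaque tokens.
anonymized_ratings = [
    ("user_a91", "Film A", 5),
    ("user_a91", "Film B", 2),
    ("user_a91", "Film C", 4),
    ("user_a91", "Film D", 1),
    ("user_k07", "Film A", 5),
    ("user_k07", "Film E", 3),
    ("user_k07", "Film F", 4),
    ("user_z33", "Film G", 2),
]

# Hypothetical public reviews posted under real names on another site.
public_reviews = [
    ("Alice Example", "Film A", 5),
    ("Alice Example", "Film B", 2),
    ("Alice Example", "Film C", 4),
    ("Bob Example", "Film E", 3),
    ("Bob Example", "Film F", 4),
]


def build_profiles(rows):
    """Group (who, title, rating) rows into a set of (title, rating) per person."""
    profiles = defaultdict(set)
    for who, title, rating in rows:
        profiles[who].add((title, rating))
    return profiles


def link_profiles(anon_rows, public_rows, min_overlap=2):
    """Link each anonymous token to the public name sharing the most ratings.

    A link is only made when the best candidate shares at least
    `min_overlap` ratings and strictly beats the runner-up.
    """
    anon = build_profiles(anon_rows)
    public = build_profiles(public_rows)
    matches = {}
    for token, anon_items in anon.items():
        scored = sorted(
            ((len(anon_items & items), name) for name, items in public.items()),
            reverse=True,
        )
        if not scored:
            continue
        best_score, best_name = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0
        if best_score >= min_overlap and best_score > runner_up:
            matches[token] = best_name
    return matches


print(link_profiles(anonymized_ratings, public_reviews))
```

Neither data set contains a name next to a token, yet two of the three tokens end up linked to a person, and the "private" rating of Film D comes along with the link. The third token stays anonymous only because nothing it rated shows up in the public data.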
So that’s “transparency,” a long list of types of information collected and artful categorization. It’s amazing that some privacy policies can use so many words and yet say so little.