Posts Tagged ‘Google’

In the mix

Tuesday, June 16th, 2009

EFF Launches TOSBack–A “Terms of Service” Tracker for Facebook, Google, eBay, and More.  (EFF)

The “Hidden Cost” of Privacy.  (Schneier on Security)

Google Fusion Tables.  (Official Google Research Blog)

In the mix

Wednesday, June 3rd, 2009

Google is Top Tracker of Surfers in Study. (NY Times Bits Blog)

The Obama Administration’s Silence on Privacy. (NY Times Bits Blog)

This UK Sheriff Cites Officials for Serious Statistical Violations.  (WSJ The Numbers Guy)

Tuesday in the Mix

Tuesday, May 12th, 2009

Just Landed: Processing, Twitter, MetaCarta & Hidden Data (blprnt)

Greece Puts Brakes on Street View (BBC)

Developer of AdBlock Plus Proposes a Fairer Approach to Ad Blocking (ReadWriteWeb)

What Does Access to Real World Data Online Make Possible? (ReadWriteWeb)

Monday in the Mix

Monday, May 11th, 2009

Signs Your Wireless Carrier Loves You (NYT)

Calendar as filter (Dilbert.com)

New Search Service Aims at Answering Tough Queries, but Not Taking on Google (NYT)

Yahoo or Google as a Datatrust? But will Facebook play?

Monday, May 4th, 2009

Time will tell, but it appears that Yahoo! has made it *really* easy (for application developers) to extract publicly available data from all over the interwebs and query it through Yahoo!’s servers.

YQL Execute lets you build tables of data from other sources online, using JavaScript as the programming language, and runs your code on Yahoo’s servers, so your own infrastructure needs are very small.

Similarly, Google “just launched a new search feature that makes it easy (for you and me) to find and compare public data.”

Graph from Google Public Data

Image taken from the Google Blog.

This is pretty exciting, as both are huge leaps towards what we’ve envisioned as a “datatrust” in various blog posts and our white paper. Well, except for maybe the “trust” part. (Especially given our experiences with Yahoo here and here.)

A few more points to contemplate:

  1. Now that the Promised Land of collating all the world’s data is appearing on the horizon, will that change people’s willingness to make data publicly accessible? What I share on my personal website I might not be okay with seeing rear its head in new contexts I never intended. As we’ve said elsewhere, when talking about privacy, context is everything.
  2. What about ownership? Both Yahoo! and Google may only temporarily cache the data insofar as is needed to serve it up. But, in effect, they will become the gatekeepers to all of our public data, data you and I contribute to. So the question remains: what about ownership?
  3. There’s still a lot of data that’s *not* publicly accessible. Possibly some of the most interesting and accurate data out there. How will we get at that? Case in point, Facebook just shut down a new app that allowed you to extract your personal “Facebook Newsfeed” and make it public via an RSS feed, citing (what else?) privacy concerns. (Not to mention the fact that access to Facebook data is generally hamstrung by privacy.)

Google Maps: good or evil?

Tuesday, April 7th, 2009

I love that these two news items were posted on the same day last week:

The Natural Resources Defense Council and the National Audubon Society launch a new tool for environmentalists and green energy developers based on Google Earth, marking clearly which lands are available for solar and wind farm development.

Angry mob in England attacks Google Street View car.

Transparent Google?

Friday, March 27th, 2009

There’s some fascinating new stuff going on in the world of online tracking and targeted advertising.  First, Google rolled out its new behavioral targeting ad program with some features that long-time privacy advocates, like the Electronic Frontier Foundation and Michael Zimmer, found worthy of praise.

For people who choose not to be tracked, Google developed a plug-in that persists even after cookies are cleared.  Most other systems for opt-out rely on cookies.  Given that most people who are concerned about their privacy clear their cookies periodically, it was important to EFF that Google’s opt-out mechanism would remain even if all cookies were cleared.

Even more interesting was Google’s decision to link a page to the caption “Ads by Google” that explains the behavioral targeting technique with a list of interest categories that have been assigned to you.  In other words, Google is making more transparent what they know, or think they know, about you.  You can then choose to remove some of those interest categories or to opt out of tracking altogether.

As Zimmer points out, Google could show more fine-grained detail regarding what they know about you.  But it’s still a fascinating step for a major corporation to take.  Even better, Google isn’t the only one creating pages that show users how they’re being viewed for marketing purposes.

BlueKai and eXelate Media run “behavioral exchanges,” selling information to companies about website visitors.  Like Google, they both provide pages, here and here, where people can choose to opt out of tracking altogether.  Otherwise, they can monitor and edit what interests are associated with them.

It’s hard to know how “transparent” all this really is to people who are not tech and privacy geeks.  Ultimately, companies need to improve data collection practices for everyone, not just people who care enough to find out.  And I would argue that it can’t be a model where a select few can just opt out and protect themselves, and the companies can continue to do anything they want to do with everyone else’s data.  But it’s still a new way of managing your life online that doesn’t require as much investment in self-education and time as many of the other methods described by EFF in its Surveillance Self-Defense Site.

Will this model become the dominant one in online tracking?  Compare the transparency of these companies with RealAge, an online quiz that’s just been outed as selling information to pharmaceutical companies who want to market directly to quiz takers.  What most consumers find instinctively distasteful is a feeling of being fooled.  RealAge claimed that it protected privacy by not giving personally identifiable information to the companies and that it is “providing value in return for the information” with ads that might interest the quiz takers, but it’s not the kind of value RealAge users consciously “paid” for.  What BlueKai, eXelate Media, and Google have shown is an understanding that for many people, their privacy is violated not just when a company knows such-and-such information is associated with Mr. Tom Smith, but when any of that information is being collected and shared without the full knowledge and consent of Tom Smith.

It’s obvious why RealAge chose to be vague about where their profits came from–would 27 million people have taken the test if the website had declared prominently that the information would be sold to pharmaceutical companies?  But it’s hard to see how sustainable that business model is.  Presumably, BlueKai and eXelate Media, as well as Google, will also get somewhat less data with their more transparent strategy.  But what model of business will still be around ten, twenty, fifty years in the future?

What Would Diderot Do?

Monday, March 9th, 2009

Book historian Robert Darnton has read and summarized the lengthy draft settlement between Google and book publishers for the New York Review of Books.

Please allow me to now further simplify by summarizing Darnton’s analysis:

> The Enlightenment represented the dawn of a new age of learning, built on the free-ish exchange of ideas in letters and books.

> The enlightened founders of the United States limited copyright to 28 years, recognizing the necessity of both protecting authors’ rights and advancing public knowledge. Life expectancy was much shorter then, but a young author could have a reasonable expectation of his or her book losing copyright within their lifetime.

> The 1998 Sonny Bono Copyright Term Extension Act extends copyright to the life of the author + seventy years. That means the books now entering the public domain date roughly to the 1920s, and all the authors are dead.

> Google has been digitizing millions of books. Some of them are in the public domain, some are still copyrighted, and the largest portion are copyrighted but out of print, and therefore largely out of reach.

> A draft settlement between Google and publishers promises to bring the texts of these books to the people, at low cost (at your home computer) or no cost (at public and university libraries which purchase a license). This archive could quickly become the world’s largest library, bar none.

> This exciting archive could represent a Digital Republic of Learning that would have made Diderot (the author of the first encyclopedia) salivate.

> While there have been some similar efforts by not-for-profit groups like the Open Content Alliance, Google Books will eat their lunch.

> The draft agreement between Google and publishers has problems: libraries would be limited to a single computer terminal with access to the archive, and users would have to pay to print copyrighted material.

> The biggest problem, however, is this:

“What will happen if Google favors profitability over access? Nothing, if I read the terms of the settlement correctly. Only the registry, acting for the copyright holders, has the power to force a change in the subscription prices charged by Google, and there is no reason to expect the registry to object if the prices are too high.”

It’s interesting to consider this scenario. In the short life of Google, most criticism has come from a smallish cadre of geeks. Under different management, could the company ever do anything to make your mom mad?

Google announces data will be “anonymized” after nine months–but then what?

Tuesday, September 9th, 2008

Everyone is in a tizzy with the news that Google is slashing its data-retention policy from 18 months to nine.  To be more specific, Google will “anonymize IP addresses on our server logs after 9 months.”  The announcement, though, only highlights for me the lack of clarity around the word “anonymize” and the general lack of information around what these data retention policies are actually doing for users’ privacy.

Data retention is a big issue for some privacy advocates, on the theory that something like the AOL privacy scandal wouldn’t have happened if AOL hadn’t been storing the search queries to begin with.  But as we’ve stated before, we at CDP don’t think data deletion is the answer.  In fact, we’re concerned that announcements like the one today from Google can actually further confuse consumers about what’s at stake.

To begin with, Google isn’t promising to delete its data after nine months, just to “anonymize” it.  The company knows that the word “anonymize” can mean quite a lot of things, and even says so: “We haven’t sorted out all of the implementation details, and we may not be able to use precisely the same methods for anonymizing as we do after 18 months…”
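To make that ambiguity concrete, here is a minimal Python sketch of two things “anonymizing an IP address” could plausibly mean.  This is an illustration only: the announcement doesn’t describe Google’s actual method, and the function names, the 24-bit cutoff, and the salt are all assumptions made up for this example.

```python
import hashlib
import ipaddress


def truncate_ip(ip: str, keep_bits: int = 24) -> str:
    """Hypothetical method 1: zero out the low-order bits of an IPv4 address.

    With keep_bits=24 the last octet is dropped, so the result could
    belong to any of 256 addresses on the same network.
    """
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)


def hash_ip(ip: str, salt: bytes) -> str:
    """Hypothetical method 2: replace the address with a salted SHA-256 digest.

    The address itself is no longer readable, but the same address always
    maps to the same digest, so every log entry from it stays linked.
    """
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # 203.0.113.0/24 is a range reserved for documentation examples.
    print(truncate_ip("203.0.113.77"))
    print(hash_ip("203.0.113.77", salt=b"example-salt"))
```

Both could fairly be described as “anonymizing,” yet they protect very different things.  Truncation blurs one address into a small crowd; hashing hides the address but keeps a consistent pseudonym that still ties all of one user’s records together.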

Google is being prodded by the European Union’s stricter regulations around privacy, but even the EU directive on data retention only states, “Such data must be erased or made anonymous when no longer needed for the purpose of the transmission of a communication, except for the data necessary for billing or interconnection payments.”  No clear directive on what “made anonymous” means.

When AOL made its search query data public, the company thought it had “anonymized” it.  Same when Netflix released its data.  That didn’t stop outsiders from identifying individuals in the “anonymized” data sets.  I trust that Google’s engineers are not using AOL’s and Netflix’s “anonymization” techniques, but it’s clear that focusing so much on the length of time data is retained draws attention away from what happens after the nine months are up.

Cuil: Is zero data collection the answer?

Monday, August 11th, 2008

Cuil, the new search engine, launched with much fanfare this past week. It’s been blogged about all over the place already, so I’m not going to analyze how its results compare to Google’s. I’m more curious about its privacy policy, which trumpets that it collects NOTHING, nada, zip, zilch.

I found it sort of funny that the other big news in search engines recently was Google’s announcement that it was launching an updated version of Google Trends called Google Insights for Search. While one search engine bragged about its lack of data collection, the other was showing it off.

The two news items together highlight the problem at the heart of our ongoing search for more privacy online. Despite all the handwringing over online data collection, especially by big search engines, people love seeing the data that gets collected, even when they’re not advertisers. We want to see how often we’re mentioned on Twitter, or what parts of the world are searching for topics we blog about. It’s not hard to imagine more serious research and analysis being applied to this data and real social good coming out of it.

I’ve never found the National Rifle Association’s argument, “Guns don’t kill people; people kill people,” very compelling. But I find myself wanting to say something similar about data collection: “Data collection doesn’t violate privacy; irresponsible people and laws violate privacy.” Shutting down data collection altogether can’t be the answer.