Yea or Nay: Sympathetic Advertising

March 17th, 2010 by Mimi Yin

Using facial recognition technology, an internal computer determines your gender and your age. The billboard then pulls up an ad based on your demographic, targeting your best possible interest. The billboard I tried out saw that I was indeed a woman in her thirties and… lo and behold, pulled up a very appealing lunch advertisement.

The author of this article compares this new technology to retina scanning technology in the movie “Minority Report” that allowed “billboards” to play ads that are tailored to YOU, personally, not you, as a member of a demographic group. Is that a fair comparison?

After all, the data behind the Japanese advertising technology probably looks more like this Wikipedia page on Japanese demographics than this IMDB page on Tom Cruise.

Still, it’s very easy to see the slippery slope between these two scenarios, in particular because they are collecting the faces they’re reading.

So the question remains, where’s the bright line between tracking people to gain a “general understanding” of what’s going on and tracking individuals so they can’t get away with anything? Has this face-reading advertising technology already crossed that line?

What do you think?

Read faces to play demographically targeted ads?

Yea or Nay: Track Taxis with GPS?

March 17th, 2010 by Mimi Yin

We talk a lot on this blog about how tracking personal activities and collecting data can be extremely useful. We also talk about the need for better laws, regulations and shared social understanding of how such data should be collected, shared and used.

As part of our ongoing work to make sense of such a complicated and confusing set of issues, we’ll be collecting interesting “moral dilemmas” related to the issue of tracking human behaviors and posting them as a series of online polls. It’s an attempt to take a more “empirical,” case-by-case approach in an effort to keep high-level policy thinking rooted in reality.

If you come across an interesting moral dilemma, please send it our way.

Without further ado, here’s the first poll:

Using G.P.S. technology installed in cabs, the (Taxi and Limousine) commission discovered more than 1.8 million trips where passengers were charged the higher rate.

Should we track taxis with GPS devices?

Leaving bacterial “fingerprints” on digital devices.

March 16th, 2010 by Mimi Yin


These knitted bacteria also happen to look like fingers.

We’re usually concerned with issues around leaving “digital” fingerprints (e.g. browsing behavior via cookies). But I couldn’t resist posting about new developments in using genetically specific bacterial traces to track your usage of digital devices (well, really anything that retains bacteria). Hmm, does this work on stainless steel?

Smart Grid Data: Unexpected and Amazing Reuses?

March 16th, 2010 by Grace Meng

As noted in “In the Mix,” the Center for Democracy and Technology and the Electronic Frontier Foundation recently issued joint comments to the California Public Utilities Commission regarding proposed policies around the use of smart grids and smart meters.

(via Flowing Data.)

And then a few days later, I saw this: EPCOR, a Canadian water utility company, issued a graph plotting water usage during the Olympic men’s hockey final.  Notice the spikes in water consumption (and toilet flushing) immediately after the first period, second period, third period, and finally when Canada wins the gold medal.

Is this our worst nightmare?  That someone will find out when we’re peeing?

That’s a bad joke. Plotting a large area’s water consumption in aggregate is not the same as what some of these smart meters are able to measure in terms of energy consumption.

But I do have a more serious point to make.  One of the points CDT and EFF make repeatedly in their comments is that we should avoid “unnecessary” data collection and destroy any “unnecessary” data.

What exactly does “unnecessary” mean?

Does it mean any purpose that is not related to the work of a utility company?  Who decides what’s unnecessary and should they decide what’s unnecessary and necessary now?

The beauty of data is that its potential value is unknown.  A single dataset, collected for one purpose, can be used for other purposes that are socially beneficial but rather unexpected.  For example, Google Trends was created for advertisers so that they can track what search terms are popular.  The CDC, however, has been using Google Trends to track flu outbreaks by watching where people are Googling flu symptoms, data that is collected more quickly than reports from doctors.  The reason governments all over the world are pushing for open data is that we don’t yet know all that can be done.  By giving access to everyone, we expect interesting, useful, imaginative things to come out of the data that we might never have imagined.

Data from the smart grids, in particular, will also require smart visualizations that are easy for individual consumers to understand and access.  Data alone isn’t going to change behavior.  You can imagine open data inviting developers to create easy-to-use apps that let consumers easily and painlessly identify ways to reduce energy consumption.  Some may even choose to share that information and compete with others, the way several universities have set up competitions between dorms.  As much as Al Gore was embarrassed by news revealing how much energy his mansion used, others may be eager to brag about how little energy they use.

Can we protect privacy while also creating room for imaginative and innovative reuse of data?

There are definitely privacy issues we have to consider.  I agree with a lot of the points made in CDT and EFF’s comments.  That “customer information” shouldn’t be limited to “personally identifying information.”  The misuse and misapplication of phrases like “personal information” is something we’ve been harping on for a while.  That customers should have access to the data collected from them and the power to correct mistakes.  That law enforcement shouldn’t be allowed to troll this information without a warrant, that civil litigants shouldn’t be allowed to access this information without a court order based on a showing of compelling interest and after notifying the customer to provide her with a chance to object.

But rather than talking about barring “unnecessary” data collection and data use, we should be thinking of ways to make the data safely available, regardless of whether someone has decided it’s necessary or not.  The data from smart grids is going to be both dangerous and valuable because it is so fine-grained; we clearly can’t just plop it online.  Anonymizing data is really hard.  So at CDP, we’re working hard at thinking about ways to come up with measurable privacy guarantees and testing technologies like PINQ that promise to provide access to raw data without indicating the existence of any particular individual in a dataset.  Other organizations may have different ideas.  I’m grateful for the existence of organizations that imagine the worst-case scenarios around data collection to protect our civil rights.  I also hope to see the growth of more organizations that try to imagine the best-case scenarios.
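To make the idea of a “measurable privacy guarantee” a little more concrete, here is a minimal sketch of the kind of noisy counting query that differential privacy (the idea PINQ is built on) relies on.  This is not PINQ’s actual API; the function names, the epsilon value and the toy smart-meter readings below are all assumptions made up for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng=None):
    # Count matching records, then blur the answer.
    # Adding or removing one household changes a count by at most 1,
    # so noise with scale 1/epsilon hides any single household.
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical readings: (household id, kWh used in one evening).
readings = [("h1", 12.5), ("h2", 3.1), ("h3", 20.4), ("h4", 7.8), ("h5", 15.0)]

# "How many households used more than 10 kWh?" answered with noise.
answer = noisy_count(readings, lambda r: r[1] > 10.0, epsilon=0.5)
print(round(answer, 2))
```

A smaller epsilon means more noise and a stronger guarantee; a larger one means more accurate answers and a weaker guarantee.  Picking that number is exactly the kind of trade-off we are trying to reason about.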

In the mix

March 12th, 2010 by Grace Meng

1) The CDC recently used shopper-card data to track a salmonella outbreak that sickened 245 in 44 states.  It turned out the pepper in salami made in Rhode Island was the culprit.  Although the CDC began to suspect through interviews and questionnaires that some sort of Italian meat product was the problem, the people they talked to couldn’t remember precisely what they had bought and the shopper-card records helped them identify the actual product.

Great story, right?  Unless you’re the director of Consumers Against Supermarket Privacy Invasion and Numbering, in which case, the story smacks of privacy invasion by the government.  The CDC got the records with the permission of the account holders, but to Katherine Albrecht and several of the commenters to the Yahoo News Story, that didn’t assuage their fears.

Here’s a choice quote: “I’d rather have a few die from poisoning and then they fix the problem then have the entire country enslaved, thank you very much.”

There was at least one person who pointed out that commenting on a Yahoo news story wasn’t going to do much to preserve their privacy either.

2) MySpace is selling bulk user data! I’m with ReadWriteWeb:

I think the world is an awfully unfair mess and I’m hoping that data analysis will help illuminate some of the hows and the whys. Like the way that real-estate redlining was exposed back in the day by cross referencing census data around racial demographics and housing loan data. That illuminated systematic discrimination against black families in applying for home loans in certain parts of town. So too I think we’ll find a lot of undeniable proof of injustices and clues for how we might deal with them in big data today.

We don’t want another AOL debacle on our hands, but we also don’t want to give up on the possibilities of “big data” because we prematurely assume better privacy-creating techniques and standards aren’t available.

3) My, it’s a privacy-obsessed week!  Here’s one person’s argument “why no one cares about privacy.” It’s a good round-up of pithy quotes from people like Judge Posner, new “talk about me” sites like Blippy.com, and surveys demonstrating the change in the public’s attitude over time.  Wow, in 1998, 80% of people in a Harris poll said they were hesitant to shop online because of privacy worries.

Still, articles like this and the comments to the Yahoo CDC-shopper data article show how much our discussion of privacy involves people yelling at each other across a very big divide.  Is the choice really a binary one?  Privacy + a few deaths versus Big Brother + public health data?  I don’t care if the CDC has access to my grocery records; at the same time, I don’t plan to sign up for Blippy.com and broadcast my purchase of kale and four kinds of cheese this morning.  (Oops, I just did.)  Maybe we should stop talking about “privacy” and start talking about specific situations.

Prostate Cancer and the Inexorable Pull To Act On Unlikely Events

March 10th, 2010 by Mimi Yin

Here’s another example of how we seize on numbers we can see, no matter how uncertain and meaningless they might be, because there’s not yet a viable alternative source of information.

As a society, we will probably opt for prostate testing no matter how flawed it is until there’s a better, more accurate alternative. In other words, bad, misleading information is better than no information, especially in a culture that prizes initiative and can-do-ness over a more fatalistic view of life: Yes We Can!

This is a design challenge for anybody trying to help people make sense of data. It is also especially important for us right now as we try to figure out a meaningful privacy guarantee for the datatrust. It’s easy for us to guarantee that you’ll never know with 100% certainty the answer to any question. But in many situations, people won’t need anything close to 100% certainty to feel compelled to act.

Certainly in the case of screening for diseases, it’s incredibly hard to do nothing if there is even a hint of a chance that we might be fatally ill.
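To see how a flawed test can still produce a number that feels impossible to ignore, here is a rough sketch of the arithmetic behind a positive screening result.  The prevalence, sensitivity and specificity values are invented for illustration and are not real figures for prostate testing or any other test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    # Share of positive results that are true positives (Bayes' rule).
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 3% of those screened have the disease, the test
# catches 80% of real cases and wrongly flags 10% of healthy people.
ppv = positive_predictive_value(prevalence=0.03, sensitivity=0.80, specificity=0.90)
print(f"{ppv:.1%}")
```

With these made-up inputs, only about one in five positive results points to an actual case, and yet few of us would shrug off a one-in-five chance.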

What are other examples of numbers we make too much of and can’t get enough of?

  • Poll numbers
  • Housing data
  • Almost any study that comes out about health and nutrition

In the mix

March 10th, 2010 by Grace Meng

1) We’ve wondered in the past, why don’t targeted advertising companies just ask you to opt-in to be tracked?  When I first heard about it, I thought this newish website, Blippy.com, described on NPR, was doing something like that.  You actively register a credit card with the site and it shares ALL your transactions with your friends.  Except NPR reports the company was rather vague about how the information gets to marketing companies.  And what exactly are they offering anyway, other than the opportunity to broadcast, “I am what I buy”?  The only news being broadcast seems to be about people’s Netflix and iTunes buying tendencies.  Services like Mint.com and Patients Like Me are also using customers’ data to make money, but they’re offering a real, identifiable service in return.

2) Google explains why it needs your data to provide a better service.

Search data is mined to “learn from the good guys,” in Google’s parlance, by watching how users correct their own spelling mistakes, how they write in their native language, and what sites they visit after searches. That information has been crucial to Google’s famously algorithm-driven approach to problems like spell check, machine language translation, and improving its main search engine. Without the algorithms, Google Translate wouldn’t be able to support less-used languages like Catalan and Welsh.

Data is also mined to watch how the “bad guys” run link farms and other Web irritants so that Google can take countermeasures.

This is an argument I’m really glad to hear.  It doesn’t make the issue of privacy go away, but I’d love to see privacy advocates and Google talk honestly and thoughtfully about what Google does with the data, how important that is to making Google’s services useful, and what trade-offs people are willing to make when they ask Google to destroy the data.

3) Nat Torkington describes how open source principles could be applied to open data. We heartily agree that these principles could be useful for making data public and useful, though Mimi, who’s worked on open source projects, points out that open source production, with its standard processes, is something that’s been worked out over decades.  Data management is still in its relative infancy, so open-sourcing data management will definitely take some work.  Onward ho!

4) The Center for Democracy and Technology and EFF are thinking about privacy and Smart Grids, which monitor energy consumption so that consumers can better control their energy use.  I’m more enthusiastic than EFF about the “potentially beneficial” aspects of smart meters, but in any case, it’s interesting to see these two blog posts within two days of each other.  Energy consumption data, as well as health data, are going to be two huge areas of debate, because the benefits of large-scale data collection and analysis are obvious, even though detailed personal information is involved.

5) The Onion reports Google is apologizing for its privacy problems, directed to very specific people. Ha ha.

“Americans have every right to be angry at us,” Google spokesperson Janet Kemper told reporters. “Though perhaps Dale Gilbert should just take a few deep breaths and go sit in his car and relax, like they tell him to do at the anger management classes he attends over at St. Francis Church every Tuesday night.”

In the mix

March 2nd, 2010 by Grace Meng

1) I’m looking forward to reading this series of blog posts from the Freedom to Tinker blog at Princeton’s Center for Information Technology Policy on what government datasets should look like to facilitate innovation, as the first one is incredibly clear and smart.

2) The NYTimes Bits blog recently interviewed Esther Dyson, “Health Tech Investor and Space Tourist” as the Times calls her.  In the interview, she shares her thoughts on why ordinary people might want to track their own data and why we shouldn’t worry so much about privacy.

3) A commenter on the Bits interview with Esther Dyson referenced this new 501(c)(6) nonprofit, CLOUD: Consortium for Local Ownership and Use of Data.  Their site says, “CLOUD has been formed to create standards to give people property rights in their personal information on the Web and in the cloud, including the right to decide how and when others might use personal information and whether others might be allowed to connect personal information with identifying information.”

We’ve been thinking about whether personal information could or should be viewed as personal property, as understood by the American legal system, for a while now.  I’m not quite sure it’s the best or most practical solution, but I’m curious to see where CLOUD goes.

4) The German Federal Constitutional Court has ruled that the law requiring data retention for 6 months is unconstitutional.  Previously, all phone and email records had to be kept for 6 months for law enforcement purposes.  The court criticized the lack of data security and the insufficient restrictions on access to the data.

Although Europe has more comprehensive and arguably “stricter” privacy laws, many countries also require data retention for law enforcement purposes.  We in the U.S. might think the Fourth Amendment is going to protect our phone and email records from being poked into unnecessarily by law enforcement, but existing law is even less clear than in Europe.  So much privacy law around telephone and email records is built around antiquated ideas of our “expectations,” with analogies to what’s “inside the envelope” and what’s “outside the envelope,” as if all our communications can be easily analogized to snail mail.  All these issues are clearly simmering to a boil.

5) Google’s introduced a new version of Chrome with more privacy controls that allow you to determine how browser cookies, plug-ins, pop-ups and more are handled on a site-by-site basis.  Of course, those controls won’t necessarily stop a publisher from selling your IP address to a third-party behavioral targeting company!

IP addresses + zip codes = ?

March 1st, 2010 by Grace Meng

ClearSight Interactive, a new behavioral targeting company, has spent the past 18 months collecting more than 100 million IP addresses.  CEO Tom Alison says, in a comment to the article, “Our goal is to become the bridge between online and offline data.”

Whoa, baby.

Alison claims in his comment that Wendy Davis, the writer of the article, didn’t accurately describe what ClearSight Interactive is doing.  So let’s look at the claims he puts out in his comment.

We have a file of IP addresses with 9-digit zip code appended. Our data providers supply the zip code linked to IP without any personally identifiable information. We are able to predict a more likely neighborhood or work location than the zip code or longitude and latitude of the ISPs server readily available from many software or online providers…

In other words, they know where you live. Their press release says more: “ClearSight Interactive bridges IP addresses to verified postal addresses and email addresses.”

Alison claims they do not collect data on online behavior:

We offer geo-demographic marketplace data, not behavioral data. We collect no online behavior. Unlike those companies and websites that utilize individual household data and set cookies, we append census and de-identified marketing data at the neighborhood level.  We all know that people in the same household or neighborhood are not the same. But for many useful marketing attributes, bird of a feather do flock or even live together.

I guess that’s supposed to make me feel better, that the company knows where I live but it only guesses what I might be looking for in a car.  Actually, the company isn’t guessing.  It promises in its press release, “After a consumer views or clicks an ad, the company can then monitor the users future behavior using contact information databases to determine if they later made a purchase – e.g. did someone who viewed a car ad actually visit the dealership and purchase a vehicle?”

Almost more shocking is Alison’s attitude about the privacy implications.  He repeats over and over that they do not have “PII” or “personally identifying information.”  If nothing else, we’ve learned from the AOL debacle and numerous other supposedly anonymized databases that PII like name and address is not necessary to successfully reidentify large numbers of people in a dataset.
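Here is a rough sketch of why the absence of “PII” offers so little comfort: it counts how many records in a small table are one of a kind once a few innocuous-looking attributes are combined.  The records and field names (zip9, birth_year, gender) are invented for illustration and are not ClearSight’s actual data.

```python
from collections import Counter

def unique_fraction(records, keys):
    # Fraction of records whose combination of `keys` values
    # appears exactly once in the dataset.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical "de-identified" records: no names, no street addresses.
records = [
    {"zip9": "10001-1234", "birth_year": 1975, "gender": "F"},
    {"zip9": "10001-1234", "birth_year": 1975, "gender": "M"},
    {"zip9": "10001-1234", "birth_year": 1982, "gender": "F"},
    {"zip9": "10001-5678", "birth_year": 1982, "gender": "F"},
    {"zip9": "10001-5678", "birth_year": 1982, "gender": "F"},
    {"zip9": "10001-5678", "birth_year": 1990, "gender": "M"},
]

print(unique_fraction(records, ["zip9"]))
print(unique_fraction(records, ["zip9", "birth_year", "gender"]))
```

In this toy table nobody is unique on the 9-digit zip code alone, but four of the six records become one of a kind once birth year and gender are added.  Anyone holding a second database with those same fields and a name attached could link them up.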

So how did ClearSight Interactive even get this information?  It bought it from publishers, who normally ask their customers if they are okay with their information being shared with third-party marketers.  As the article points out, most people who click “yes” assume that means they’ll get emails from third-party marketers.  They don’t assume that the publishers will sell IP logs to a third-party targeting company.  ClearSight Interactive promises that if you choose to opt-out later, the company will update its records and remove you from its databases.  To which, all I can say is, if you’re so sure that people have actively chosen to allow you to have this information, why not build your business around asking them to opt-in?

On some level, Alison is clearly aware privacy could impact his company.  He writes, “At ClearSight we take privacy matters very seriously,” and the article quotes him as saying they are waiting to see if Congress passes privacy legislation.  But if it’s true that “[a]ll our IP and zip data fall within the appropriate privacy provisions of our partners” and everything they’ve done is legal, well, that’s some of the strongest evidence I’ve heard in support of better privacy legislation.

In the mix

February 25th, 2010 by Grace Meng

1) Interesting story on NPR last week about a new study using cellphone data to track people’s movements.  It turns out they were able to predict the nearest cellphone tower 93% of the time and their actual locations 80% of the time.  The potential value to public policy is significant.  It could affect how we put money into public transportation, for example.

Interestingly, though, no one mentioned any concerns about privacy, just a short statement that researchers don’t have names or numbers.  Seems like a perfect, obvious example of how that’s not sufficiently deidentifying, especially as the conclusion is that you can predict where people are.  Another researcher claims that he has data for half a million people and that “major carriers around the world are now starting to share data with scientists.”  What if we end up with another AOL scandal on our hands, and worse, the scandal keeps this kind of research from continuing?

2) The Open Knowledge Foundation has launched a set of principles for open data in science, in support of the idea that scientific data should be “freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.”

We certainly support more data being openly and freely available, but we’re curious.  How will we deal with the rights of people who are in scientific studies?  I’m not a scientist — do most agreements to participate in studies anticipate this level of public availability?  And how can we standardize data to be more easily comparable?

3) It’s not enough to have data. We also need tools to visualize, analyze, and understand data, and more and more tools are available for just that purpose.  Here’s a long list of mapping tools from the Sunlight Foundation, ClearMaps from Sunlight Labs, and Pivot, a new way to combine large groups of similar items on the internet, from Microsoft Live Labs.
