Posts Tagged ‘Privacy’

A nonprofit wants to share its mailing list with some economists–would that bother you?

Thursday, March 13th, 2008

There’s a fascinating article in the New York Times Sunday Magazine about a study of what makes people donate, conducted by an interesting liberal-conservative pair of economists, Dean Karlan and John List. They wanted to study fundraising strategies empirically, to find out which kinds of solicitations are most successful. As the article points out, lab experiments on economic choices aren’t particularly realistic: “If you put a college sophomore in a room, gave her $20 to spend and presented her with a series of pitches from hypothetical charities, she might behave very differently than when sitting on her sofa sorting through letters from actual organizations.”

So Karlan and List found an opportunity for a field experiment: a partnership with an actual, unnamed nonprofit that allowed them to try different solicitation strategies and map the outcomes. They wrote solicitation letters that were similar except for the matching gift: some didn’t mention one, some mentioned a 1-to-1 match, some a 2-to-1, and some a 3-to-1. In the end, mentioning a matching gift increased the likelihood of a donation, but the size of the match did not. As the author, David Leonhardt, notes, their findings and those of other economists in this area matter to many people, from nonprofits trying to be better fundraisers to economists studying human behavior, and even to those who want to make tax policy more effective and efficient.

The article, however, didn’t mention whether the nonprofit’s donors had consented to their responses being shared with anyone other than the nonprofit. I’m not that concerned about whether donors’ privacy may have been egregiously violated. (I’m also not sure what’s required of nonprofits in this area.) I’m just curious: if they had been given the choice, would they have agreed to their information being shared with the economists? Obviously, the study wouldn’t have worked if potential donors had been told they would be sent different solicitation letters to measure their responses, but I think that if most people on a nonprofit’s mailing list were asked whether they would explicitly allow their information to be used in academic studies, they would consent. They might want assurances that their individual identities would be protected: that no one would know Mr. So-And-So had given zero dollars to a cause he publicly champions. But they might very well be willing to help the nonprofit figure out how to be more effective and to be part of an academic study that could shape public policy. They might even be curious to know how their giving compares to that of other donors in their income brackets or geographic areas.

Most people, myself included, have a knee-jerk antipathy to having their personal information shared with anybody other than the organization or company they give it to. But maybe we would feel differently if we were actually given some choices, if our personal identities could be protected, if sharing information could lead to more than just targeted advertising or more junk mail.

Ohmigod, companies are tracking what we look at online!

Monday, March 10th, 2008

Breaking news from the New York Times.

What’s truly interesting about this article, though, isn’t that the New York Times is announcing as “news” something that’s been going on for a very long time. Rather, it’s that the Times, even while devoting front-page space to the story, doesn’t really seem to have a point.

The article tries to distinguish itself from the vague alarms raised by privacy advocates by using data: the results of a study done with comScore measuring “data collection events,” each time “consumer data was zapped back to the Web companies’ servers.” (Even though the New York Times has produced some of the prettiest data graphics in recent memory, this one looks like something created in Excel and conveys little more than a flurry of numbers.) But the overwhelming impression left by the article is that companies are trying to target advertising, some more successfully than others, rather than that extensive personal information is being collected. So it isn’t surprising that several of the comments responding to the Bits blog post come from readers saying they never click on ads, or that these companies are stupid for sending them ads for things they’re not interested in, or that they’ve blocked pop-up ads in their browsers.

After all, the article mentions only briefly what kind of information is being collected: “the person’s zip code, a search for anything from vacation information, or a purchase of prescription drugs or other intimate items.” It quotes Jules Polonetsky, chief privacy officer for AOL, “[who] cautions that not all the data at every company is used together. Much of it is stored separate,” yet the author doesn’t explain the significance of that statement. Nor does the article mention that even if consumer data is stripped of “identifiers” like a user name, individuals can often be identified easily by combining datasets.
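To make the point about combining datasets concrete, here is a minimal sketch of a so-called linkage attack in Python. Everything in it is hypothetical: the records, the names, and the assumption that the “de-identified” data keeps birth date and sex alongside the zip code (the article mentions only zip codes). Joining on those quasi-identifiers against any list that does carry names re-attaches a name to every record that matches exactly one person.

```python
# Hypothetical "de-identified" dataset: user names removed, but quasi-identifiers
# (zip code, plus an ASSUMED birth date and sex) are kept alongside each search.
browsing_records = [
    {"zip": "94110", "birth_date": "1970-04-02", "sex": "F", "search": "prescription drugs"},
    {"zip": "10027", "birth_date": "1985-11-19", "sex": "M", "search": "vacation rentals"},
    {"zip": "94110", "birth_date": "1962-08-30", "sex": "M", "search": "anchovy recipes"},
]

# Hypothetical identified dataset, e.g. some public roster that lists names
# together with the same quasi-identifiers.
public_roster = [
    {"name": "Alice Example", "zip": "94110", "birth_date": "1970-04-02", "sex": "F"},
    {"name": "Bob Example", "zip": "10027", "birth_date": "1985-11-19", "sex": "M"},
    {"name": "Carol Example", "zip": "60614", "birth_date": "1990-01-01", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous, identified):
    """Link anonymous records to names when the quasi-identifiers match exactly one person."""
    index = {}
    for person in identified:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in anonymous:
        names = index.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["search"]))
    return matches


for name, search in reidentify(browsing_records, public_roster):
    print(f"{name} searched for {search!r}")
```

Two of the three “anonymous” searches end up attached to names, even though neither dataset alone links a name to a search.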

I would love to see an article by a mainstream publication that addresses this issue in a truly comprehensive and thoughtful way. What’s missing in the conversation started by this article is not only a fuller analysis of how personal information is being collected and what dangers there are for individual privacy, but also a nuanced discussion of that information’s value and what it means for “a handful of big players” to hold most of it. The article ends by citing a study of California adults, 85% of whom thought sites should not be allowed to track their behavior around the Web to show them ads. But does that statistic really capture what’s at stake?

P.S. Is AOL’s innocent penguin happy or merely surprised that anchovy ads are being sent to him?

Property Shark and “Contextual Integrity”: Where real estate obsession and privacy academia intersect

Tuesday, March 4th, 2008

Recently, I was having dinner with some friends when the topic of Property Shark came up. My friends, being homeowners, were disturbed that someone could simply go online, type in their address, and find out who the owners were and precisely what they had paid for their home. One friend exclaimed, “I don’t want people to know how much money I have!” When I pointed out that the information was public record, and that before Property Shark, anyone could have gone down to City Hall and found the same information, he didn’t care. It still bothered him.

For all our talk of “privacy,” of how it’s being violated all over the place, of how it’s already lost, it’s not even clear what we mean when we say “privacy.” We, as a society, might have agreed that it is good public policy for real estate records to be public so that potential buyers can make sure sellers actually own the property they’re selling. Capitalism can’t thrive if you can’t be sure you own what you own. But when we theoretically made this agreement, we certainly didn’t imagine a world where “public” means available to anyone, anywhere, at any time. Professor Helen Nissenbaum, who recently presented at the DIMACS Data Privacy Workshop, has proposed that we think about “contextual integrity” rather than “privacy.” She argues that it’s more useful to consider what’s appropriate in each context rather than assuming there is a blanket “privacy” standard applicable to all situations.

That makes sense to me. My friend wasn’t arguing that the information shouldn’t be public record. Rather, he wasn’t comfortable with that information being accessed so easily online.

Personally, in the universe of privacy breaches, Property Shark doesn’t seem so problematic, but as the Common Datatrust Foundation works on privacy problems, it’s certainly helpful to remember that “privacy” doesn’t have a single meaning. One of CDTF’s goals for this year is to create privacy standards for companies and other data collectors that acknowledge that information flow can’t be governed by a simple on/off, public/private spigot. It’s obvious that our world and our needs are more complex than that. After all, sometimes it’s hard to know even what we want when we clamor for more privacy. Even my friend, when pressed, admitted that the next time he was looking to buy a house, the first thing he would do is go to Property Shark.

Yahoo! Private Domain Debacle Part II: Can’t Keep a Secret

Tuesday, February 26th, 2008

Many months ago I wrote a long rant about my experience trying to transfer a domain that I had registered with Yahoo!’s Private Domain Registration service to another registrar. The short story is that I was unable to transfer the domain without making my WHOIS contact details public. The long story is long.

There’s another “feature” of Yahoo!’s Private Domain Registration service though that I just learned about: it doesn’t clean up after itself.

So I created a site with the Yahoo! Small Business hosting service, which includes a somewhat opaque domain registration service at no separate charge. (This is actually quite a good user experience; I imagine most users don’t want to have to understand registrars, they just want to pay for a working web site.) I did check the box to use the private domain registration feature to keep my contact information private.

The web site was for an event, and when the event was over, I no longer had any use for the site or the domain, so I logged back in to Yahoo! Small Business and canceled the service. This was a relatively simple process that took me through a number of the-sky-is-falling, bold-red-letter steps, warning me repeatedly that my site would be deleted: was I really sure I wanted to do this? Yes, I was sure. Canceled, done, gone. It seemed gone, anyway: the site was unavailable.

About a year later, I got an email from the friendly registrar that Yahoo! uses, Melbourne IT, a cast member in my last rant.

From: whoisreminders@whoisupdate.com
To: <xxxxxxx@xxxxx.com>
Sent: Tuesday, December 11, 2007 3:13 PM
Subject: WHOIS Data Reminder

Dear Valued Customer,

In accordance with ICANN (Internet Corporation of Assigned Names and
Numbers) Whois Data Reminder Policy (WDRP) resolution 03.41, this message is a reminder to help you keep the public WHOIS contact data associated with your domain name registration up-to-date. Our records include the following information as of 14-Nov-07:

Domain Name: My domain name
Registration Date: 6-Jun-06
Expiration Date: 6-Jun-08

Registrant Contact Details
Name: My name here
Email: My email address here
Address: My address here
Address: (null)
City: My city here
State/Province: My state here
Post code: My zip
Country: My country

Administration Contact Details
Name: My name here
Email: My email address here
Address: My address here
Address: (null)
City: My city here
State/Province: My state here
Post code: My zip
Country: My country
Phone: and finally My phone number
Fax: (null)

Technical Contact Details
Name: YahooDomains TechContact
Email: domain.tech@YAHOO-INC.COM
Address: 701 First Ave.
Address: (null)
City: Sunnyvale
State/Province: CA
Post code: 94089
Country: UNITED STATES
Phone: 1.61988131
Fax: (null)

Registrar Name: Melbourne IT
Name server Details
yns1.yahoo.com
yns2.yahoo.com

If any of the information above is inaccurate, you must correct it by contacting your domain name supplier, hosting company or web services provider by either calling them or visiting their web site. If your review indicates that all of the information above is accurate, you do not need to take any action. Please remember that under the terms of your registration agreement, the provision of false WHOIS information can be grounds for cancellation of your domain name registration.

***************************************************************
**** Please do not reply directly to this WHOIS reminder email as your
****
**** request will not be attended to.
****
***************************************************************

Thank you for your attention.
Best regards,
Your hosting services provider
—————————————————————————
This email was sent by your current Registrar, at request by ICANN to
xxxxxx@xxxxxx.com

—————————————————————————

Yes, believe it or not, all of the billing information I had given Yahoo! for my web site had been dumped into the public WHOIS database.

I don’t have the energy to follow up with Yahoo! customer service given my past experience, and it’s possible that the experience has improved in the last year, but based on my anecdotal evidence, here’s what I think happened to me:

  1. User pays to host a website with a “private” domain name with Yahoo!
  2. Yahoo! registers the domain with MelbourneIT for two years using the anonymized contact information (For those who didn’t read the first rant, contact@myprivateregistration.com, Emeryville P.O. Box and 510-595-2002)
  3. User cancels hosting service and website with Yahoo!
  4. Yahoo! updates domain registration contact information with the billing information provided to them for the hosting service, exposing it to the world.

This is broken. Given that the entire registration process was hidden from my user experience, I see two more appropriate alternatives:

  1. Ideally, Yahoo! should cancel the domain registration with MelbourneIT, never exposing the contact information. The site deletion process included so many warnings that, as the end user, I had no expectation that any part of the web site would remain.
  2. If there’s some sort of legal catch-22 that prevents true demolition of the domain, those users who paid extra to have the “private” domain registration service should be provided the option to update registration contact information to details of their own choosing.

Judging by the amount of traffic my last post on this issue got, lots of people are running into this mess. I keep expecting someone from Yahoo! Domain Registration to find these blog posts and respond, but so far, nary an email or a comment. YahooDomains? Anyone listening out there?

Facebook: The Only Hotel California?

Thursday, February 14th, 2008

As the subject of recent splashy news on privacy and personal data collection, Facebook is starting to seem a little scary. In the words of one former user, Nipon Das, “It’s like the Hotel California. You can check out anytime you like, but you can never leave.” We’ve heard how difficult it is to remove yourself from Facebook.

We’ve seen how Facebook initially chose to launch Beacon, an advertising tool that told your friends about your activities on other websites, such as a purchase on eBay, without an easy opt-out mechanism, until outrage and a petition organized by MoveOn.org forced Facebook to change its policy.

Facebook employees are even poking around private user profiles for personal entertainment.

But although Facebook is at the forefront of a new kind of marketing, it’s not the only company with discomforting privacy policies and terms of use. Facebook’s statement that its terms are subject to change at any time is standard boilerplate. Its disclosure that it may share your information with third parties to provide you service is also pretty standard. After all, it’s certified by TRUSTe, the leading privacy certifier for online businesses. In fact, Facebook is arguably more explicit than most companies about what it’s doing because by its very nature, it’s more obvious that users’ personal information is being collected.

You could argue that the users do have a choice. They could choose not to use Facebook. But how did it turn out that in the big world of the internet, we have only two choices: 1) provide your personal information on the company’s terms; or 2) don’t use the service?

So far, it’s not clear that the controversy around Facebook has led to increased public concern about other companies and their personal data collection. It doesn’t even seem to have spilled over to all the programs that run on Facebook’s platform. No one seems perturbed that the creator of some random new application for feeding virtual fish now has access to the profile of every user who installs it.

But there clearly is growing public unease, an increasing sense that our Google searches or our online purchases may be available to people we don’t know and can’t trust. Perhaps Facebook will end up providing an invaluable public service, albeit inadvertently, in making more people wonder, “What exactly did I agree to?”

Where that “study” you quoted came from: Remember that call you got during dinner?

Tuesday, May 29th, 2007

Over the last few months I’ve been to a number of interesting talks at the Stanford Methods of Analysis Program in the Social Sciences (MAPSS) colloquium. Two types of speakers have caught my attention: those who work closely with the logistics and mechanics of data collection, and those who try to use survey data to test their hypotheses.

Most recently I got to hear Linda Piekarski of Survey Sampling International on SSI’s efforts to address changes in the telephone system, as well as their recent forays into internet surveys. (I didn’t realize how perfect the original design for the U.S. phone system was for tele-survey companies.)

Also memorable was Yale Professor Don Green’s talk about measuring the effectiveness of political campaign advertising. One of my favorite lines (though I’m paraphrasing) was: “Any time you see a clean, clear graph of data, there’s something wrong. Data ‘noise’ is what reality looks like.”

What follows is a summary of the challenges facing the collection of data about individuals, derived in part from these talks.

Today, there are three main ways of collecting data from individuals, each of which has flaws that seriously undermine the quality of the data collected.

  1. Pay them a tiny reward, lure them with a sweepstakes or nag them at dinner with a phone call from a stranger. For example, online stores may offer a coupon or rebate for your feedback on your buying experience.
  2. Make it easy for individuals to inadvertently or unthinkingly consent to having data collected about them, and/or subsequently change the substance of what is collected or the uses for that data. One prominent example is Amazon.com’s site registration process, which makes no attempt to highlight its third-party data-sharing practices.
  3. Leverage data collected for some other purpose, so-called “Secondary Use.” For example, addresses collected for fulfillment (shipping) may be used for geographically targeted marketing messages.

These mechanisms have a set of critical flaws:

  1. Tiny rewards and nagging phone calls are an insufficient value proposition for many individuals, so the pool of participants is unlikely to be representative of the target population. Instead, it will favor those individuals for whom the reward remains attractive, however small, or those for whom the cost of participation (time) is small enough to make the reward adequate. (Mechanism 1)
  2. Rewards or compensation distributed without regard to accuracy provide no incentive for careful or genuinely accurate self-reporting. (Mechanism 1)
  3. These practices cultivate a public perception of a mesh of “big brother” networks collecting an ever-expanding set of data, beyond the control of any one individual. Privacy outrage still surfaces in mainstream media occasionally, but the general public is increasingly numb to incremental discoveries of the erosion of personal privacy. While anesthesia may appear temporarily attractive to data collectors, it also disengages individuals from the data collection goals, which decreases participation and discourages accurate self-reporting. For example, when you are pressured to answer a survey at a department store or after check-out at a web retailer, do you react with an earnest attempt to supply them with the information they need? (All mechanisms)
  4. In response to ever more invasive data collection, privacy legislation and legal liability have forced data to be “silo-ed” and “anonymized” as much as possible. That means that unless you are part of a larger survey panel, each subsequent survey you complete, and any data you consent to have collected, will be stored separately from your other data. This prevents individuals from maintaining the accuracy of their own data and makes longitudinal analysis increasingly difficult. (All mechanisms)
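The self-selection problem in the first flaw can be illustrated with a toy model. In this minimal Python sketch, the participation rule (a person responds only if the reward covers the dollar value of their time) and the population, including the assumed link between a person’s time cost and the answer they would give, are hypothetical and chosen for illustration; they are not data from the talks.

```python
from statistics import mean

# Hypothetical population: each person's time cost for taking the survey (in dollars)
# and the answer they would report. ASSUMPTION: people with cheap time spend more
# hours online, so the answer is correlated with willingness to respond.
population = [
    {"time_cost": 2.0, "weekly_online_hours": 30},
    {"time_cost": 3.0, "weekly_online_hours": 25},
    {"time_cost": 15.0, "weekly_online_hours": 8},
    {"time_cost": 25.0, "weekly_online_hours": 5},
    {"time_cost": 40.0, "weekly_online_hours": 4},
]


def respondents(people, reward):
    """Assumed participation rule: respond only if the reward covers the time cost."""
    return [p for p in people if reward >= p["time_cost"]]


def average_answer(people):
    """Mean of the reported answer over a group of people."""
    return mean(p["weekly_online_hours"] for p in people)


REWARD = 5.0  # a "tiny reward", e.g. a small coupon
sample = respondents(population, REWARD)
print(f"true mean: {average_answer(population):.1f}")
print(f"survey mean: {average_answer(sample):.1f} from {len(sample)} of {len(population)} people")
```

Only the two people whose time is cheap enough respond, and their average (27.5 hours) is nearly double the population’s (14.4 hours). Raising the reward widens the pool, but any fixed reward still screens respondents by how much their time is worth.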

Yahoo! Private Domain Registration: If it’s broken, don’t fix it?

Thursday, January 25th, 2007

Recently I set up a temporary personal web site that I was concerned might see a traffic spike, and rather than going through my usual registrar and web host, I tried a cheap off-the-shelf package from Yahoo! instead.

Yahoo! offers an add-on service called “Private Domain Registration” that hides your contact information from the WHOIS database for an additional $0.75/month. Since I was familiar with WHOIS spam, the service sounded great to me, and at that price it was practically a free lunch.

Everything worked smoothly (6 Months, 0 Spam) until I shut the site down and decided to transfer the domain from Yahoo! Domains to my normal registrar. The following is a true story about how I learned that Yahoo! Private Domain Registration is broken and is effectively holding my contact info for ransom.

******

The process of transferring a domain between registrars is designed to prevent fraudulent transfers by people trying to steal domains. There are several authorization steps; in one, the new registrar (Tucows) sends an email to the current owner of the domain registration (me). When I reached that step in the process, my new registrar informed me that its repeated attempts to do so had failed.

Sigh.

So, I contact Yahoo! Domains support over email, assuming that they are having a problem with their mail servers, or that the authorization emails are being blocked by a spam filter. Instead, Yahoo! informs me that the service is working exactly as expected:

I understand that you want to transfer the domain registration to “Tucows”, but you are unable to receive the mail sent by them to your Admin email address “contact@myprivateregistration.com”.

Regarding your issue, I have checked the record and found that you have activate the Private domain registration on your domain “[domain removed]”, in order to conceal your personal information from unwanted solicitors by listing contact information for Yahoo!’s domain name registration partner, Melbourne IT, in place of your own registrant, administrative, technical, and billing contact information in the public WHOIS database. [sic] Your own contact information will remain associated with your domain in Yahoo!/MelbourneIT’s database but will not be made available in the public WHOIS.

So, in order to show your actual information in the public WHOIS record, you have to disable the private domain registration.

Yahoo! Domains Support doesn’t expect you to receive any email when you have “Private Domain Registration” turned on. To complete the transfer, I need to turn off the privacy feature and expose my real email address.

Quality.

My initial reaction is: I have misunderstood the feature. I read up on the service offering, as well as the slightly more detailed help content, and it turns out that I’m right; something is wrong with the service.

From the Yahoo! Private Domain Registration marketing page (my bold):

How Does Private Domain Registration Work?

  • When you sign up, our partner Melbourne IT updates your registration listing with generic contact information that points to MelbourneIT’s offices.
  • Whenever someone looks up your domain and tries to contact you, Melbourne IT receives the call, email, or letter and screens the information on your behalf.
  • Melbourne IT forwards prescreened communications to you, so you can reply as you see fit.

What does this mean? In practice Yahoo!, with the help of MelbourneIT, replaces your contact email address with contact@myprivateregistration.com, your address with a PO Box in Emeryville, CA, and your phone number with their phone number, all for $0.75/month. How could they possibly afford to do that?

I reply to the support mail explaining the discrepancy between the feature list and the service I have been experiencing, and ask for a refund for the last 6 months of service.
Later that week…

1. Yahoo! still has not responded to my email. Several more attempts have been made by my registrar to contact me through the pre-screening service.

2. I decide to call the Yahoo! support phone number. To my surprise, someone promptly answers the phone, and within 10 minutes I have my answer: the mails are getting blocked by spam filters, but Yahoo! has no control over its own spam filters, so nothing can be done about my problem. I am surprised that this is an acceptable answer, but I let it go and allow myself to be transferred to billing to request a refund.

3. Billing listens to my complaint, and then spends several minutes trying to transfer me back to tech support to help resolve my issue. I re-explain that tech support has already given up on resolving it. There is some confusion on the line.

I am disconnected, apparently unintentionally.

4. I call back, and this time ask for billing support immediately. I am transferred to Yahoo! Personals support, where the operator informs me that I have called the wrong number, and gives me a new number to call.
5. Finally, I get another billing support agent on the phone, and this time make it clear up front that I want a refund for the service. The agent I speak with informs me that when I cancel the service, I will be refunded a pro-rated amount for the remainder of the month. As for the past six months of service they have already provided, no refund would be supplied, as the service has already been rendered.

As far as I am concerned, this is not acceptable. The way I see it, the 6 months of privacy “protection” they provided are about to be voided because their service doesn’t work the way it’s supposed to, which in turn makes it impossible for me to transfer my domain registration away from Yahoo! without exposing my personal contact info.

I point out to them that this amounts to blackmail: my privacy is being held hostage to keep me a Yahoo! customer. There is a pause on the other end of the line when I mention that I will be writing this up as a blog entry. Finally she says, “The bottom line is, I can’t refund you for more than the current month.”

I ask her to escalate my complaint, and she puts me on hold for a few minutes. When she returns, she informs me that I will receive an email with a “decision.”

I sit grumbling, hammering out this blog post as the best way to escalate the issue, when I think of another approach. I send a mail quickly to contact@myprivateregistration.com. It bounces back immediately. (Try it yourself.) This wasn’t about registrar mails getting bounced, nor did it seem to be about spam filters; I am quite certain now that all mails get bounced, regardless of content.

What’s more, in writing the test email, I realize something else that should have been obvious to me before: Everyone with the Private Domain Registration service gets the same generic contact@myprivate…email address. Ditto for the PO Box and the phone number. Meaning, in order for the pre-screening service to work, some system or person would have to scan each individual communication in order to decide which ones were directed at which domain owners.

How could that possibly work for $0.75/month? Hmmm…the free lunch is sounding less and less like lunch.

Anyway, the all-mails-bounce problem seems like a more concrete issue for the tech support folks to chew on, so I call back.

6. This time, I get a helpful support agent on the line, repeat my story, and even get him to send a mail to see it bounce with his own eyes. His initial response is also that the service is working as expected, and I direct him to the URL that describes the service so that he can understand my problem. After much ado, he decides that the problem is with their partner MelbourneIT, (a diagnosis I agree with) and that therefore I should contact them to resolve the issue.

HEADS UP, BIG BUCK PASSIN’ THROUGH!

Then he gives me a long distance phone number to Australia that he suggests I call. I laugh. He also thinks this is silly, and hopes, for my sake, that they speak English over there.

I try another tack: I explain to him that from Yahoo!’s perspective, this isn’t about my individual complaint; everyone who is paying for this service is being affected. I recommend that he escalate this to his manager, and he seems to understand what I am saying, but he is also reaching the end of his patience. I can tell that whoever he’s working with on his side is not as sympathetic. He puts me on hold again, and I go to the MelbourneIT website to check out their online support.

As it turns out, MelbourneIT has a nifty support tool that allows me to identify my problem and domain. I write a quick note and submit the request.

Minutes later, while still on hold with Yahoo!, I get an automated reply to my complaint (my bold):

THIS IS A SYSTEM GENERATED MESSAGE

A Melbourne IT Reseller manages the domains specified in your message.

Please contact this reseller using the details below for any assistance you require. If the person you contact refers you back to us, ask them if they would please contact us on your behalf.

Reseller details:

Yahoo Inc.
Web address: domains.yahoo.com
Email address: domains-support@cc.yahoo-inc.com

Genius! An automatic buck passer. Lucky for me, I’m still on the phone with Yahoo!

When my Yahoo! support agent comes back to the phone, he says that a “special note” has been added to my case to indicate that this issue may affect other Yahoo! customers, and re-recommends that I contact MelbourneIT.

He is quite disappointed when I read him the automated reply from MelbourneIT.

I try explaining to him why I think MelbourneIT is right – after all, Yahoo! contracts MelbourneIT to provide the service – MelbourneIT doesn’t know who I am as an individual. I pay Yahoo!, Yahoo! pays MelbourneIT – if I have a problem, I ask Yahoo! to fix it. If Yahoo! has a problem with MelbourneIT, they ask MelbourneIT to fix it. Who do I want a refund from? Yahoo! Who’s holding my privacy hostage? Yahoo!

At this point, I decide that a blog post is a more effective use of my time and energy, but I let the support agent put me on hold one last time to get a final response from his management.

After several minutes he comes back with, no surprise, a restatement that the problem is on MelbourneIT’s side. But to sweeten the deal he throws in a final gem. He gives me the phone number-equivalent of contact@myprivate…, the phone number that is listed for every Yahoo! Private Domain and suggests I give that number a call, since it is a US phone number. In a manner of speaking, he suggests I try giving myself a call.
“Yeah, right,” I think. I thank him and hang up.

Just for kicks, I dial the number:

Sorry, the mailbox is full and there is not enough space to leave a message. To leave a message for another subscriber, enter the area code or phone number for that subscriber.

LOL! Don’t believe me? Try it yourself. (510-595-2002)

So, in closing: If you sign up for Yahoo! Private Domain Registration, it works great – you won’t get any emails, or phone calls…and though I haven’t tested it, I wouldn’t expect too much mail to make it through that PO Box in Emeryville either.
So, am I missing something? Or is this service a farce at best? Is it anything more than an attempt by Yahoo! to appear to care about user privacy?

No? Well, it would just be a good joke if this broken service didn’t also block Yahoo! customers from switching from the Yahoo! Domains service to a competitor’s. Isn’t that a form of extortion?

Update February 26, 2008

I recently discovered that the above story does actually get worse: Yahoo! Private Domain Debacle Part II: Can’t Keep a Secret.

Who cares about Privacy: Why search queries in America trump sexual history in Africa.

Wednesday, December 6th, 2006

A couple of weeks ago I reported on Sam Clark’s presentation about interesting social science data collection efforts, in particular, research being done in the area of AIDS/HIV in Sub-Saharan Africa…a data collection project that was startling both for how intimate the survey questions were and for the cursory attention paid to privacy matters.

A week later, I attempted to give various rationales for why privacy did not figure prominently in Clark’s presentation. The sheer urgency of the AIDS epidemic, the relative powerlessness of the survey subjects and the relative irrelevance of databases, the internet and modern digital life as we know it to much of Africa seemed to me to be the three most powerful reasons.

So now I’d like to contrast that with the media frenzy of late in the First World over AOL’s unfortunate bungling that led to an unqualified release of user query data to the public.

So, in the context of this DSS work in Africa, do the AOL users have a right to be outraged? If there were a leak of INDEPTH user data, would the U.S. media be condemning INDEPTH? Or would they not care because the average African’s privacy is too far removed from our reality? Or maybe INDEPTH survey respondents are disenfranchised at this point?

What if there were an unfortunate bungling of personal information at INDEPTH? Who cares if we know AOL user 34653 is looking for a good cross-dressing cruise for couples and idolizes Cher, Castro and Trent Lott in a single breath. It’s trivial in comparison to name, HIV status, number and type of sex partners in the last 6 months, and number of times you’ve had unprotected sex in the last 6 months and with whom.

What if an embattled, desperate government with a touch of psychosis decided that this data was handy for carrying out a genocidal “solution” to the AIDS epidemic?

I can’t help feeling that the fact that “the [AOL] data was leaked” is beside the point.

Yes, it was careless, wrong, and inconsiderate. But in the end, is that really why people are so unhappy?

I think people are unhappy about the AOL data release because it was a surprise. People simply didn’t realize how much of their life was being captured, recorded and analyzed by search engines. Even with our modern-day sophistication, we are just as naive about the digital fingerprints we leave everywhere as the respondents are about the surveys they answer. In some ways you could say the respondents in INDEPTH’s DSS were more aware. They were painfully aware that their lives were being examined, and not only that, they knew and understood the goals of the organization that was collecting that information.

For AOL users, it was only after the data release that they started to realize that, as individuals, they were laying bare their psyches: contemplations of suicide, murder, sexual hang-ups, personal insecurities, etc., all so that the folks at AOL (and other search engines) could sell them better-targeted advertising and make more money. Contrast that with what social scientists are trying to accomplish in sub-Saharan Africa and you start to feel like a cheap date.

What’s unfortunate is that the AOL data was compromised because AOL was following others in the industry trying to “do good” by making their data available to academic researchers who might try to do something more with the data than figure out advertising schemes.

The takeaway here is that there is no straightforward, one-size-fits-most policy when it comes to privacy. It’s not about how much privacy is enough privacy. It’s not about whether people should share data or not share data. It’s clear that there are myriad circumstances that call for different levels of care on the part of the people collecting data and provoke different responses on the part of the people sharing information. Like most things having to do with human beings and society, privacy is context-sensitive, and grand sweeping EULAs and privacy policies are insufficient, if not downright ridiculous, for capturing how we should approach the issue, as an industry and as a society.

Now that AOL knows beyond a shadow of a doubt that its users can’t extrapolate the natural consequences of using the AOL search service from its generic, vague privacy policy, it needs to find a way to make the search experience itself clearly communicate to the user the data collection that is happening behind the scenes. The INDEPTH survey respondents wouldn’t be surprised to see themselves in a report on the sexual history of people who are HIV+ or living with AIDS in Sub-Saharan Africa. They’re answering a survey; what else would they expect?

Similarly, AOL users shouldn’t be surprised that someone is keeping track of what they search for, what sites they visit, for how long and how often.* Attaining this kind of mutual understanding with your users is much trickier and has yet to be done successfully. After all, AOL’s users don’t think of themselves as answering a survey when they conduct a search or visit a website. But as far as the researchers at AOL are concerned, that’s exactly what they’re doing.

*That being said, everyone should be surprised and outraged if any of this data is released without being properly anonymized. Whether or not everyone has the wherewithal and press connections to express their indignation and anger is another issue for another blog entry.

How to evaluate a privacy statement when you’re dying of AIDS

Sunday, November 12th, 2006

Last week, I reported on Professor Sam Clark’s recent talk: “Relational Databases in the Social and Health Sciences: The View from Demography.” Clark covered a wide array of topics from the challenges of working with heterogeneous sets of social science field research to data-driven outcome-modeling that is used to drive policy decisions in the arena of AIDS/HIV prevention and treatment in Sub-Saharan Africa.

As I mentioned last week, surprisingly, privacy did not come up during Professor Clark’s talk…except in a brief aside, where Clark acknowledged that study subjects are at times uncomfortable disclosing extra-marital relationships. On the whole, privacy did not appear to be taking up too many cycles at either INDEPTH, a network of ‘Demographic Surveillance Systems’ (DSS is social science-speak for data collection sites) that is working to standardize field research, or SPEHR, Clark’s personal effort to design a standard database schema for social science research. At the risk of being presumptuous, the name ‘Demographic Surveillance System’ itself speaks volumes about how social science regards the issue of privacy.

At the same time, the frequent media alerts about privacy and data leaks (HP, AOL, Veterans) got me wondering: How would this data be handled in a US-based study? How readily would you respond to an online survey asking you how many times you’ve had unprotected sex?

Not very readily would be my guess. Forget allowing someone to compile a detailed log of your day-to-day sexual activity. People would never even get past the first 2 questions: Are you HIV positive? Are you living with AIDS? The ramifications of leaking such information are all too well-known in modern society.

Just to make sure that I hadn’t misread the lack of emphasis, I rooted around the INDEPTH website to see if I could find a meatier discussion about privacy.

I found a reference to “A Data Model for Demographic Surveillance Systems”, a 21-page paper that makes its first and last mention of privacy on p. 18, in its ‘Conclusions and Future Work’ section:

“More work is needed for sites that require better data privacy than simply restricting access to the data set. Certainly, separating the name from the ID field is the first step in providing better data privacy.”

I also found “Data access, security and confidentiality”, a 174-word document in the INDEPTH DSS Resource Toolkit that recommends 3 things to researchers designing data collection systems:

1. Be clear about who has access to the data, what data they have access to, and what level of access they should have.
2. Back up the data. A RAID server is ideal.
3. Separate survey respondent ID numbers from their names.
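To make recommendation 3 concrete, here is a minimal Python sketch, assuming survey records arrive as dictionaries with a `name` field. The function name `split_identities` and all field names are hypothetical, not taken from the INDEPTH toolkit. It splits each record into a name lookup table (to be kept under tighter access) and a survey table, linked only by a random respondent ID.

```python
import secrets


def split_identities(records):
    """Split raw survey records into two tables linked only by a random ID.

    `records` is a list of dicts with a (hypothetical) "name" field plus
    survey answers. Returns (identity_table, survey_table). The input
    records are not modified.
    """
    identity_table = {}  # respondent_id -> name; store separately, restrict access
    survey_table = []    # answers plus respondent_id, with no names
    for record in records:
        # Random, non-sequential ID so it reveals nothing about enrolment order.
        respondent_id = secrets.token_hex(8)
        while respondent_id in identity_table:  # guard against an (unlikely) collision
            respondent_id = secrets.token_hex(8)
        identity_table[respondent_id] = record["name"]
        # Copy every field except the name into the de-named row.
        answers = {k: v for k, v in record.items() if k != "name"}
        answers["respondent_id"] = respondent_id
        survey_table.append(answers)
    return identity_table, survey_table


if __name__ == "__main__":
    # Hypothetical example records.
    raw = [
        {"name": "Respondent A", "village": "Village 1", "age": 34},
        {"name": "Respondent B", "village": "Village 2", "age": 27},
    ]
    ids, rows = split_identities(raw)
    for row in rows:
        print(row)
```

Note that this removes only the direct identifier: the remaining answers (village, age and so on) can still single a person out, which is exactly the gap the paper’s own “More work is needed” caveat points to.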

These are all good recommendations that demonstrate a willingness to address the issue. But isn’t this oversimplification at best and gross negligence at worst? Granted, I may be unfair in singling out INDEPTH to play the role of spokesperson for the entire social science community on the topic of privacy. So maybe all I really can say is that, at the very least, the folks at INDEPTH are seriously underestimating the challenges of taking on guardianship of sensitive personal data. As with the researchers at AOL, we can only wait for the consequences of their mis-estimation to play out.

So again, the sense I get is that privacy isn’t a major issue. Why’s that?

INDEPTH’s users have more important things to worry about. They’re not scanning people’s email to sell mattress companies more targeted advertising. They’re trying to do things like save a continent from implosion.

According to UNAIDS, in 2005 alone an estimated 3.2 million people in Sub-Saharan Africa became newly infected, while 2.4 million adults and children died of AIDS. In the U.S., which has less than 40% of Sub-Saharan Africa’s population, if 1 million Americans were dying of AIDS every year, we wouldn’t be talking about privacy either.
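The 1 million figure comes roughly from scaling the UNAIDS death toll by that population ratio, under the simplifying assumption that deaths scale in proportion to population:

$$
2.4\ \text{million deaths} \times 0.40 \approx 0.96\ \text{million} \approx 1\ \text{million deaths per year}
$$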

A second, more insidious reason is that this flavor of information privacy is largely an information-age phenomenon, one that requires the individual to understand the implications and weigh the risks of disclosure.

Our ‘modern-day’ awareness, or wariness, of disclosure did not come for free. Even with all of the media frenzy, people regularly compromise their personal information in myriad ways every day, even trading passwords for chocolate bars.

Nevertheless, no matter how tenuous a grasp the public has on data and databases, the level of sophistication mainstream America has achieved in the realm of ‘things digital’ is not to be taken for granted.

It’s not a matter of intelligence or common sense. I’m guessing that the people who willingly participate in DSS such as INDEPTH don’t have a gut-level appreciation of what it means to be ‘in the system’ for the simple reason that they live in pre-digital or barely digital societies and aren’t kept track of in their daily existence the way we are.

They don’t log in, they don’t enter passwords, PIN numbers or secret codes. They don’t answer self-selected security questions, swipe key fobs, scan ID cards, metro cards, and medical insurance cards. They don’t accept certificates, add people to whitelists, report spam. They don’t make spreadsheets, tag pictures, maintain ‘address books’, query their email or, for that matter, query the web. They don’t inspect the history in their web browser to delete all the URLs that might not be so great for other people to inadvertently stumble across. They’ve never had an application rejected because of ‘low’ test scores and ‘bad’ grades. They’ve never been denied insurance for having ‘above average’ blood pressure. They’ve never been denied a mortgage for having ‘below average’ credit. They’ve never been audited by the IRS or logged into Amazon to be confronted with “Here’s a recommendation just for you: Getting pregnant after Menopause!”.

In other words, the subjects in this study don’t necessarily have a clear conception of this thing called a database that is going to consume their personal life history, chop it up into discrete cells and array it in rows and columns, making it all the more digestible for aggregating, analyzing and comparing, and all the more accessible to on-demand recall. The question is, when a respondent ‘consents’ to ‘participate in a survey’, do they understand what they’re consenting to? Do the field researchers themselves understand what respondents are consenting to?

Even if respondents did fully understand what ‘consent’ really meant (which is highly doubtful given that most First World internet users don’t fully digest what it means to ‘Accept’ a EULA), there still remains the unresolved issue of whether dire circumstances (e.g. lots of people dying with no end in sight) warrant slackened attention to privacy.

Up Next: Who cares about Privacy: Why search queries in America trump sexual history in Africa.

When Privacy Doesn’t Matter

Friday, November 3rd, 2006

Last Thursday MSR hosted Professor Sam Clark from the University of Washington for a talk entitled “Relational Databases in the Social and Health Sciences: The View from Demography.” For someone interested in using data to drive decision-making, it was interesting to hear about someone using empirical data to model the impact of different policies on a societal problem.

My main take-aways from the talk were as follows:

  • Social scientists today rarely use relational database (RDBMS) technology, or when they do, they use antique software. Apparently much analysis is done in statistical packages (I’m guessing SAS and the like), which lack much of the data management technology that is indispensable when working with larger datasets. For social scientists in general, the potential of current database technologies is only just becoming apparent.
    • As I am not familiar with many of the alternatives, had I been physically at the talk on campus rather than watching on-line, I would have liked to ask what has changed to make RDBMS more attractive than it was before. I can only surmise from the talk that large data sets have only recently become available to social scientists, and that previous data sets were too small to warrant an RDBMS.
    • Even now, Clark said, his colleagues would consider a dataset of around 500 megabytes “large”.
  • Many early attempts at moving demographic data to relational data structures failed because those developing such systems underestimated the impact of schema design on how the demographic data would be used.
  • Breadth of data has significant value to the longitudinal studies social scientists are conducting. Yet, lack of agreement on how to collect and store data is hampering their ability to interrelate data sets. Therefore, developing and agreeing on a standard is very desirable. (Incidentally, this problem is not exclusive to demographic datasets.)
  • Clark has done several iterations on a standard schema, particularly for capturing “Event-Influence-State” type datasets, commonly used in demography.
    • The Structured Population Event History Register (SPEHR).
    • One example he shared with us assessed “the impact of male circumcision as an HIV prevention strategy”. By using a longitudinal study (2 years, 3,000 people) to feed his simulation, he was able to demonstrate likely outcomes of the policy intervention in different phases of the epidemic; data to feed a real-world policy decision. :)

He also shared lots of anecdotal statistics about the AIDS epidemic in Africa, massive infection rates and death rates, which I continue to find mind-boggling: What would day-to-day living look like in the U.S. if 20% of Americans were infected with HIV? Or if we suddenly had millions of “dual-orphans” (normally a rare phenomenon) to raise? How will Africa recover?

All in all a very interesting talk with food for thought on many fronts, but one issue was conspicuously missing: Privacy.

Up Next: How to evaluate a privacy statement when you’re dying of AIDS.