Posts Tagged ‘Privacy’

Facebook: The Only Hotel California?

Thursday, February 14th, 2008

As the subject of recent splashy news on privacy and personal data collection, Facebook is starting to seem a little scary. In the words of one former user, Nipon Das, “It’s like the Hotel California. You can check out anytime you like, but you can never leave.” We’ve heard how difficult it is to remove yourself from Facebook.

We’ve seen how Facebook initially chose to launch Beacon, an advertising tool that told your friends about your activities on other websites, such as a purchase on eBay, without an easy opt-out mechanism, until outrage and a petition organized by MoveOn.org forced Facebook to change its policy.

Facebook employees are even poking around private user profiles for personal entertainment.

But although Facebook is at the forefront of a new kind of marketing, it’s not the only company with discomforting privacy policies and terms of use. Facebook’s statement that its terms are subject to change at any time is standard boilerplate. Its disclosure that it may share your information with third parties to provide you service is also pretty standard. After all, it’s certified by TRUSTe, the leading privacy certifier for online businesses. In fact, Facebook is arguably more explicit than most companies about what it’s doing because by its very nature, it’s more obvious that users’ personal information is being collected.

You could argue that the users do have a choice. They could choose not to use Facebook. But how did it turn out that in the big world of the internet, we have only two choices: 1) provide your personal information on the company’s terms; or 2) don’t use the service?

So far, it’s not clear that the controversy around Facebook has led to increased public concern about other companies and their personal data collection. It doesn’t even seem to have spilled over to all the programs that run on Facebook’s platform. No user seems perturbed that the creator of some random new application for feeding virtual fish now has access to his or her profile.

But there clearly is growing public unease, an increasing sense that our Google searches or our online purchases may be available to people we don’t know and can’t trust. Perhaps Facebook will end up providing an invaluable public service, albeit inadvertently, in making more people wonder, “What exactly did I agree to?”

Where that “study” you quoted came from: Remember that call you got during dinner?

Tuesday, May 29th, 2007

Over the last few months I’ve been to a number of interesting talks at the Stanford Methods of Analysis Program in the Social Sciences (MAPSS) colloquium. Two types of speakers have caught my attention: those who work closely with the logistics and mechanics of data collection, and those who try to use survey data to test their hypotheses.

Most recently I got to hear Linda Piekarski of Survey Sampling International on SSI’s efforts to address changes in the telephone system, as well as their recent forays into internet surveys. (I didn’t realize how perfect the original design for the U.S. phone system was for tele-survey companies.)

Also memorable was Yale Professor Don Green’s talk about measuring the effectiveness of political campaign advertising. One of my favorite lines (though I’m paraphrasing) was: “Any time you see a clean, clear graph of data, there’s something wrong. Data ‘noise’ is what reality looks like.”

What follows is a summary, derived in part from these talks, of the challenges facing the collection of data about individuals.

Today, there are three main ways of collecting data from individuals, each of which has flaws that seriously undermine the quality of the data collected.

  1. Pay them a tiny reward, lure them with a sweepstakes or nag them at dinner with a phone call from a stranger. For example, online stores may offer a coupon or rebate for your feedback on your buying experience.
  2. Make it easy for individuals to inadvertently or unthinkingly consent to data being collected about them, and/or subsequently change the substance of what is collected or the uses for that data. One prominent example is Amazon.com’s site registration process, which makes no attempt to highlight its third-party data-sharing practices.
  3. Leverage data collected for some other purpose – so-called “Secondary Use”. For example, addresses collected for fulfillment (shipping) may be used for geographically targeted marketing messages.

These mechanisms have a set of critical flaws:

  1. Tiny rewards and nagging phone calls are an insufficient value proposition for many individuals, so the pool of participants is unlikely to be representative of the target population. Instead it will favor those individuals for whom the reward remains attractive, however small, or those for whom the cost of participation (time) is small enough to make the reward adequate. (Mechanism 1)
  2. Rewards or compensation distributed without regard to accuracy provide no incentive for careful or genuinely accurate self-reporting. (Mechanism 1)
  3. These practices cultivate a public perception of a mesh of “big brother” networks collecting an ever-expanding set of data, beyond the control of any one individual. Privacy outrage still surfaces in mainstream media occasionally, but the general public is increasingly numb to incremental discoveries of the erosion of personal privacy. While anesthesia may appear temporarily attractive to data collectors, it also disengages individuals from the data collection goals, which decreases participation and discourages accurate self-reporting. For example, when you are pressured to answer a survey at a department store or after check-out at a web retailer, do you react with an earnest attempt to supply them with the information they need? (All mechanisms)
  4. In an effort to push back against ever-increasing invasive data collection, privacy legislation and legal liability have forced data to be “silo-ed” and “anonymized” as much as possible. That means that unless you are part of a larger survey panel, each subsequent survey you complete, or data you consent to have collected, will be stored separately from your other data. This prevents individuals from maintaining the accuracy of their own data, and makes longitudinal analysis increasingly difficult. (All mechanisms)

Yahoo! Private Domain Registration: If it’s broken, don’t fix it?

Thursday, January 25th, 2007

Recently I set up a temporary personal web site that I was concerned might see a traffic spike, and rather than going through my usual registrar and web host, I tried a cheap off-the-shelf package from Yahoo! instead.

Yahoo! offers an add-on service called “Private Domain Registration” where they hide your contact information from the WHOIS database for an additional $0.75/month. Having dealt with WHOIS spam before, I thought the service sounded great, and at that price it was practically a free lunch.

Everything worked smoothly (6 Months, 0 Spam) until I shut the site down and decided to transfer the domain from Yahoo! Domains to my normal registrar. The following is a true story about how I learned that Yahoo! Private Domain Registration is broken and is effectively holding my contact info for ransom.

******

The process of transferring a domain between registrars is designed to prevent fraudulent transfers by people trying to steal domains. There are several authorization steps; in one of them, the new registrar (Tucows) sends an email to the current owner of the domain registration (me). When I got to that step in the process, my new registrar informed me that their repeated attempts to email me had failed.

Sigh.

So, I contact Yahoo! Domains support over email, assuming that they are having a problem with their mail servers, or that the authorization emails are being blocked by a spam filter. Instead, Yahoo! informs me that the service is working exactly as expected:

I understand that you want to transfer the domain registration to “Tucows”, but you are unable to receive the mail sent by them to your Admin email address “contact@myprivateregistration.com”.

Regarding your issue, I have checked the record and found that you have activate the Private domain registration on your domain “[domain removed]”, in order to conceal your personal information from unwanted solicitors by listing contact information for Yahoo!’s domain name registration partner, Melbourne IT, in place of your own registrant, administrative, technical, and billing contact information in the public WHOIS database. [sic] Your own contact information will remain associated with your domain in Yahoo!/MelbourneIT’s database but will not be made available in the public WHOIS.

So, in order to show your actual information in the public WHOIS record, you have to disable the private domain registration.

Yahoo! Domains Support doesn’t expect you to receive any email when you have “Private Domain Registration” turned on. In order to complete the transfer, I need to turn off the privacy feature and expose my real email address.

Quality.

My initial reaction is that I must have misunderstood the feature. I read up on the service offering, as well as the slightly more detailed help content, and it turns out that I understood it correctly: something is wrong with the service.

From the Yahoo! Private Domain Registration marketing page (my bold):

How Does Private Domain Registration Work?

  • When you sign up, our partner Melbourne IT updates your registration listing with generic contact information that points to MelbourneIT’s offices.
  • Whenever someone looks up your domain and tries to contact you, Melbourne IT receives the call, email, or letter and screens the information on your behalf.
  • Melbourne IT forwards prescreened communications to you, so you can reply as you see fit.

What does this mean? In practice Yahoo!, with the help of MelbourneIT, replaces your contact email address with contact@myprivateregistration.com, your address with a PO Box in Emeryville, CA, and your phone number with their phone number, all for $0.75/month. How could they possibly afford to do that?

I reply to the support mail explaining the discrepancy between the feature list and the service I have been experiencing, and ask for a refund for the last 6 months of service.
Later that week…

1. Yahoo! still has not responded to my email. Several more attempts have been made by my registrar to contact me through the pre-screening service.

2. I decide to call the Yahoo! support phone number. To my surprise, someone promptly answers the phone, and within 10 minutes I have my answer: The mails are getting blocked by spam filters, but Yahoo has no control over their own spam filters, so nothing can be done about my problem. I am surprised that this is an acceptable answer, but I let it go and allow myself to be forwarded to billing to request a refund.

3. Billing listens to my complaint, and then spends several minutes trying to transfer me back to tech support to help resolve my issue. I re-explain that tech support has already given up on resolving it. There is some confusion on the line.

I am disconnected, apparently unintentionally.

4. I call back, and this time ask for billing support immediately. I am transferred to Yahoo! Personals support, where the operator informs me that I have called the wrong number, and gives me a new number to call.
5. Finally, I get another billing support agent on the phone, and this time make it clear up front that I want a refund for the service. The agent I speak with informs me that when I cancel the service, I will be refunded a pro-rated amount for the remainder of the month. As for the past six months of service they have already provided, no refund would be supplied, as the service has already been rendered.

As far as I am concerned, this is not acceptable. The way I see it, the 6 months of privacy “protection” they provided are about to be voided because their service doesn’t work the way it’s supposed to, which in turn makes it impossible for me to transfer my domain registration away from Yahoo! without exposing my personal contact info.

I point out to them that this amounts to blackmail – my privacy is being held hostage to keep me a Yahoo! customer. There is a pause on the other end of the line when I mention to her that I will be writing this up as a blog entry. Finally she says, “The bottom line is, I can’t refund you for more than the current month.”

I ask her to escalate my complaint, and she puts me on hold for a few minutes. When she returns, she informs me that I will receive an email with a “decision”.

I sit grumbling, hammering out this blog post as the best way to escalate the issue, when I think of another approach. I send a mail quickly to contact@myprivateregistration.com. It bounces back immediately. (Try it yourself.) This wasn’t about registrar mails getting bounced, nor did it seem to be about spam filters; I am quite certain now that all mails get bounced, regardless of content.

What’s more, in writing the test email, I realize something else that should have been obvious to me before: Everyone with the Private Domain Registration service gets the same generic contact@myprivate…email address. Ditto for the PO Box and the phone number. Meaning, in order for the pre-screening service to work, some system or person would have to scan each individual communication in order to decide which ones were directed at which domain owners.

How could that possibly work for $0.75/month? Hmmm…the free lunch is sounding less and less like lunch.

Anyway, the all-mails-bounce problem seems like a more concrete issue for the tech support folks to chew on, so I call back.

6. This time, I get a helpful support agent on the line, repeat my story, and even get him to send a mail to see it bounce with his own eyes. His initial response is also that the service is working as expected, and I direct him to the URL that describes the service so that he can understand my problem. After much ado, he decides that the problem is with their partner MelbourneIT (a diagnosis I agree with), and that therefore I should contact them to resolve the issue.

HEADS UP, BIG BUCK PASSIN’ THROUGH!

Then he gives me a long distance phone number to Australia that he suggests I call. I laugh. He also thinks this is silly, and hopes, for my sake, that they speak English over there.

I try another tack: I explain to him that from Yahoo!’s perspective, this isn’t about my individual complaint, but that everyone who is paying for this service is being affected. I recommend that he escalate this to his manager, and he seems to understand what I am saying, but is also reaching the end of his patience. I can tell that whoever he’s working with on his side is not as sympathetic. He puts me on hold again, and I go to the MelbourneIT website to check out their online support.

As it turns out, MelbourneIT has a nifty support tool that allows me to identify my problem and domain. I write a quick note and submit the request.

Minutes later, while still on hold with Yahoo!, I get an automated reply to my complaint (my bold):

THIS IS A SYSTEM GENERATED MESSAGE

A Melbourne IT Reseller manages the domains specified in your message.

Please contact this reseller using the details below for any assistance you require. If the person you contact refers you back to us, ask them if they would please contact us on your behalf.

Reseller details:

Yahoo Inc.
Web address: domains.yahoo.com
Email address: domains-support@cc.yahoo-inc.com

Genius! An automatic buck passer. Lucky for me, I’m still on the phone with Yahoo!

When my Yahoo! support agent comes back to the phone, he says that a “special note” has been added to my case to indicate that this issue may affect other Yahoo! customers, and again recommends that I contact MelbourneIT.

He is quite disappointed when I read him the automated reply from MelbourneIT.

I try explaining to him why I think MelbourneIT is right – after all, Yahoo! contracts MelbourneIT to provide the service – MelbourneIT doesn’t know who I am as an individual. I pay Yahoo!, Yahoo! pays MelbourneIT – if I have a problem, I ask Yahoo! to fix it. If Yahoo! has a problem with MelbourneIT, they ask MelbourneIT to fix it. Who do I want a refund from? Yahoo! Who’s holding my privacy hostage? Yahoo!

At this point, I decide that a blog post is a more effective use of my time and energy, but I let the support agent put me on hold one last time to get a final response from his management.

After several minutes he comes back with, no surprise, a restatement that the problem is on MelbourneIT’s side. But to sweeten the deal he throws in a final gem. He gives me the phone number-equivalent of contact@myprivate…, the phone number that is listed for every Yahoo! Private Domain and suggests I give that number a call, since it is a US phone number. In a manner of speaking, he suggests I try giving myself a call.
“Yeah, right,” I think. I thank him and hang up.

Just for kicks, I dial the number:

Sorry, the mailbox is full and there is not enough space to leave a message. To leave a message for another subscriber, enter the area code or phone number for that subscriber.

LOL! Don’t believe me? Try it yourself. (510-595-2002)

So, in closing: If you sign up for Yahoo! Private Domain Registration, it works great – you won’t get any emails, or phone calls…and though I haven’t tested it, I wouldn’t expect too much mail to make it through that PO Box in Emeryville either.
So, am I missing something? Or is this service a farce at best? Is it anything more than an attempt by Yahoo! to appear to care about user privacy?

No? Well it would just be a good joke if this broken service didn’t also block Yahoo! customers from switching off of the Yahoo! Domains service and on to a competitor’s. Isn’t that a form of extortion?

Update February 26, 2008

I recently discovered that the above story does actually get worse: Yahoo! Private Domain Debacle Part II: Can’t Keep a Secret.

Who cares about Privacy: Why search queries in America trump sexual history in Africa.

Wednesday, December 6th, 2006

A couple of weeks ago I reported on Sam Clark’s presentation about interesting social science data collection efforts, in particular, research being done in the area of AIDS/HIV in Sub-Saharan Africa…a data collection project that was startling both for how intimate the survey questions were and for the cursory attention paid to privacy matters.

A week later, I attempted to give various rationales for why privacy did not figure prominently in Clark’s presentation. The sheer urgency of the AIDS epidemic, the relative powerlessness of the survey subjects and the relative irrelevance of databases, the internet and modern digital life as we know it to much of Africa seemed to me to be the three most powerful reasons.

So now I’d like to contrast that with the recent media frenzy in the First World over AOL’s unfortunate bungling, which led to the public release of users’ search query data.

So, in the context of this DSS work in Africa, do the AOL users have a right to be outraged? If there were a leak of INDEPTH user data, would the U.S. media be condemning INDEPTH? Or would they not care because the average African’s privacy is too far removed from our reality? Or maybe INDEPTH survey respondents are disenfranchised at this point?

What if there were an unfortunate bungling of personal information at INDEPTH? Who cares if we know AOL user 34653 is looking for a good cross-dressing cruise for couples and idolizes Cher, Castro and Trent Lott in a single breath? That’s trivial in comparison to name, HIV status, number and type of sex partners in the last 6 months, and number of times you’ve had unprotected sex in the last 6 months and with whom.

What if an embattled, desperate government with a touch of psychosis decided that this data was handy for carrying out a genocidal “solution” to the AIDS epidemic?

I can’t help feeling that the fact that “the [AOL] data was leaked” is beside the point.

Yes it was careless, wrong, and inconsiderate. But in the end, is that really why people are so unhappy?

I think people are unhappy about the AOL data release because it was a surprise. People simply didn’t realize how much of their life was being captured, recorded and analyzed by search engines. Even with our modern-day sophistication, we are just as naive about the digital fingerprints we leave everywhere as the respondents are about the surveys they answer. In some ways you could say the respondents in INDEPTH’s DSS were more aware. They were painfully aware that their lives were being examined, and not only that, they knew and understood the goals of the organization that was collecting that information.

For AOL users, it was only after the data release that people started to realize that as an individual, you are laying bare your psyche: contemplations of suicide, murder, sexual hang-ups, personal insecurities, etc…so that the folks at AOL (and other search engines) can sell you better targeted advertising and make more money. Contrast that with what social scientists are trying to accomplish in sub-Saharan Africa and you start to feel like a cheap date.

What’s unfortunate is that the AOL data was compromised because AOL was following others in the industry in trying to “do good” by making their data available to academic researchers who might do something more with it than figure out advertising schemes.

The takeaway here is that there is no straightforward, one-size-fits-most policy when it comes to privacy. It’s not about how much privacy is enough privacy. It’s not about whether people should share data or not share data. It’s clear that there are myriad circumstances that call for different levels of care on the part of the people collecting data and provoke different responses on the part of the people sharing information. Like most things having to do with human beings and society, privacy is context-sensitive, and grand sweeping EULAs and privacy policies are insufficient, if not downright ridiculous, for capturing how we should approach the issue, as an industry and as a society.

Now that AOL knows beyond a shadow of a doubt that its users can’t extrapolate the natural consequences of using the AOL search service from its generic, vague privacy policy, it needs to find a way to make the search experience itself clearly communicate to the user the data collection that is happening behind the scenes. The INDEPTH survey respondents wouldn’t be surprised to see themselves in a report on the sexual history of people who are HIV+ or living with AIDS in Sub-Saharan Africa. They’re answering a survey; what else would they expect?

Similarly, AOL users shouldn’t be surprised that someone is keeping track of what they search for, what sites they visit, for how long and how often.* Attaining this kind of mutual understanding with your users is much trickier and has yet to be done successfully. After all, AOL’s users don’t think of themselves as answering a survey when they conduct a search or visit a website. But as far as the researchers at AOL are concerned, that’s exactly what they’re doing.

*That being said, everyone should be surprised and outraged if any of this data is released without being properly anonymized. Whether or not everyone has the wherewithal and press connections to express their indignation and anger is another issue for another blog entry.

How to evaluate a privacy statement when you’re dying of AIDS

Sunday, November 12th, 2006

Last week, I reported on Professor Sam Clark’s recent talk: “Relational Databases in the Social and Health Sciences: The View from Demography.” Clark covered a wide array of topics from the challenges of working with heterogeneous sets of social science field research to data-driven outcome-modeling that is used to drive policy decisions in the arena of AIDS/HIV prevention and treatment in Sub-Saharan Africa.

As I mentioned last week, surprisingly, privacy did not come up during Professor Clark’s talk…except in a brief aside, where Clark acknowledged that study subjects are at times uncomfortable disclosing extra-marital relationships. On the whole, privacy did not appear to be taking up many cycles at either INDEPTH, a network of ‘Demographic Surveillance Systems’ (DSS is social science-speak for data collection sites) that is working to standardize field research, or SPEHR, Clark’s personal effort to design a standard database schema for social science research. At the risk of being presumptuous, the term ‘Demographic Surveillance System’ itself speaks volumes about how social science regards the issue of privacy.

At the same time, the frequent media alerts about privacy and data leaks (HP, AOL, Veterans) got me wondering: How would this data be handled in a US-based study? How readily would you respond to an online survey asking you how many times you’ve had unprotected sex?

Not very readily, I’d guess. Forget allowing someone to compile a detailed log of your day-to-day sexual activity. People would never even get past the first 2 questions: Are you HIV positive? Are you living with AIDS? The ramifications of leaking such information are all too well-known in modern society.

Just to make sure that I hadn’t misread the lack of emphasis, I rooted around the INDEPTH website to see if I could find a meatier discussion about privacy.

I found a reference to “A Data Model for Demographic Surveillance Systems”, a 21-page paper which makes its first and last mention of privacy on p.18 in its ‘Conclusions and Future Work’ section:

“More work is needed for sites that require better data privacy than simply restricting access to the data set. Certainly, separating the name from the ID field is the first step in providing better data privacy.”
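The paper’s point that separating the name from the ID field is only “the first step” can be made concrete. What follows is a hypothetical sketch in Python, not INDEPTH’s or SPEHR’s actual design: the field names, the sample rows and the two helpers (`pseudonymize`, `unique_quasi_identifiers`) are assumptions invented for illustration. It moves names into a separate table keyed by a random ID, then counts the rows whose remaining fields (village, age, sex) are still unique enough to single a respondent out.

```python
import secrets
from collections import Counter

# Hypothetical survey rows; field names and values are illustrative only.
raw_records = [
    {"name": "A. Respondent", "village": "Site 1", "age": 34, "sex": "F", "hiv_status": "positive"},
    {"name": "B. Respondent", "village": "Site 1", "age": 34, "sex": "F", "hiv_status": "negative"},
    {"name": "C. Respondent", "village": "Site 2", "age": 71, "sex": "M", "hiv_status": "negative"},
]


def pseudonymize(records):
    """Split each record into an identity row and an analysis row joined by a random ID."""
    identity_table = {}  # respondent_id -> name; would be stored and access-controlled separately
    analysis_table = []  # every field except the name
    for record in records:
        respondent_id = secrets.token_hex(8)  # random, so it cannot be derived from the name
        identity_table[respondent_id] = record["name"]
        row = {k: v for k, v in record.items() if k != "name"}
        row["respondent_id"] = respondent_id
        analysis_table.append(row)
    return identity_table, analysis_table


def unique_quasi_identifiers(rows, fields):
    """Return rows whose combination of `fields` occurs only once, i.e. rows that can still be singled out."""
    def key(row):
        return tuple(row[f] for f in fields)

    counts = Counter(key(r) for r in rows)
    return [r for r in rows if counts[key(r)] == 1]


identities, analysis = pseudonymize(raw_records)
exposed = unique_quasi_identifiers(analysis, ["village", "age", "sex"])
print(len(exposed))  # prints 1
```

Even with names removed, the one respondent whose combination of village, age and sex is unique could be re-identified by anyone who already knows those three facts about them, which is why name separation on its own offers only limited protection.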

I also found “Data access, security and confidentiality”, a 174-word document in the INDEPTH DSS Resource Toolkit that recommends 3 things to researchers designing data collection systems:

1. Be clear about who has access to the data, what data they have access to, and what level of access they should have.
2. Back up the data. A RAID server is ideal.
3. Separate survey respondent ID numbers from their names.

These are all good recommendations that demonstrate a willingness to address the issue. But isn’t this an oversimplification at best and gross negligence at worst? Granted, I may be unfair in singling out INDEPTH to play the role of spokesperson for the entire social science community on the topic of privacy. So maybe all I really can say is that, at the very least, the folks at INDEPTH are seriously underestimating the challenges of taking on guardianship of sensitive personal data. As with the researchers at AOL, we can only wait for the consequences of that misjudgment to play out.

So again, the sense I get is that privacy isn’t a major issue. Why’s that?

INDEPTH’s users have more important things to worry about. They’re not scanning people’s email to sell mattress companies more targeted advertising. They’re trying to do things like save a continent from implosion.

According to UNAIDS, in 2005 alone an estimated 3.2 million people in Sub-Saharan Africa became newly infected, while 2.4 million adults and children died of AIDS. The U.S. has less than 40% of the population of Sub-Saharan Africa; if 1 million Americans were dying of AIDS every year, we wouldn’t be talking about privacy either.

A second, more insidious reason is that this flavor of information privacy is largely an information-age phenomenon, one that requires the individual to understand the implications and weigh the risks of disclosure.

Our ‘modern-day’ awareness, or wariness, of disclosure did not come for free. Even with all of the media frenzy, people regularly compromise their personal information in myriad ways every day: Chocolate bars for passwords.

Nevertheless, no matter how tenuous a grasp the public has on data and databases, the level of sophistication mainstream America has achieved in the realm of ‘things digital’ is not to be taken for granted.

It’s not a matter of intelligence or common sense. I’m guessing that the people who willingly participate in DSS such as INDEPTH don’t have a gut-level appreciation of what it means to be ‘in the system’ for the simple reason that they live in pre-digital or barely digital societies and aren’t kept track of in their daily existence the way we are.

They don’t log in, they don’t enter passwords, PIN numbers or secret codes. They don’t answer self-selected security questions, swipe key fobs, scan ID cards, metro cards, and medical insurance cards. They don’t accept certificates, add people to whitelists, report spam. They don’t make spreadsheets, tag pictures, maintain ‘address books’, query their email or for that matter, query the web. They don’t inspect the history in their web browser to delete all the URLs that might not be so great for other people to inadvertently stumble across. They’ve never had an application rejected because of ‘low’ test scores and ‘bad’ grades. They’ve never been denied insurance for having ‘above average’ blood pressure. They’ve never been denied a mortgage for having ‘below average’ credit. They’ve never been audited by the IRS or logged into Amazon to be confronted with “Here’s a recommendation just for you: Getting pregnant after Menopause!”.

In other words, the subjects in this study don’t necessarily have a clear conception of this thing called a database that is going to consume their personal life history, chop it up into discrete cells and array it in rows and columns, making it all the more digestible for aggregation, analysis and comparison, and accessible for on-demand recall. The question is, when a respondent ‘consents’ to ‘participate in a survey’, do they understand what they’re consenting to? Do the field researchers themselves understand what respondents are consenting to?

Even if respondents did fully understand what ‘consent’ really meant (which is highly doubtful given that most First World internet users don’t fully digest what it means to ‘Accept’ a EULA), there still remains the unresolved issue of whether dire circumstances (e.g. lots of people dying with no end in sight) warrant slackened attention to privacy.

Up Next: Who cares about Privacy: Why search queries in America trump sexual history in Africa.

When Privacy Doesn’t Matter

Friday, November 3rd, 2006

Last Thursday MSR hosted Professor Sam Clark from the University of Washington for a talk entitled “Relational Databases in the Social and Health Sciences: The View from Demography.” As someone interested in using data to drive decision-making, I found it interesting to hear about empirical data being used to model the impact of different policies on a societal problem.

My main take-aways from the talk were as follows:

  • Social scientists today rarely use relational database (RDBMS) technology, or when they do, they use antique software. Much analysis is apparently done in statistical packages (I’m guessing SAS and the like), which lack much of the data management technology that is indispensable when working with larger datasets. For social scientists in general, the potential of current database technologies is only just becoming apparent.
    • As I am not familiar with many of the alternatives, had I been physically at the talk on campus rather than watching online, I would have liked to ask what has changed to make an RDBMS more attractive than it was before. I can only surmise from the talk that large data sets have only recently become available to social scientists, and that previous data sets were too small to warrant an RDBMS.
    • Even now, Clark said, his colleagues would consider a dataset of around 500 megabytes “large”.
  • Many early attempts at moving demographic data to relational data structures failed because those developing such systems underestimated how strongly the schema design would affect the ways the demographic data could be used.
  • Breadth of data has significant value to the longitudinal studies social scientists are conducting. Yet, lack of agreement on how to collect and store data is hampering their ability to interrelate data sets. Therefore, developing and agreeing on a standard is very desirable. (Incidentally, this problem is not exclusive to demographic datasets.)
  • Clark has done several iterations on a standard schema, particularly for capturing “Event-Influence-State” type datasets, commonly used in demography.
    • The Structured Population Event History Register (SPEHR).
    • One example he shared with us assessed “the impact of male circumcision as an HIV prevention strategy”. By using a longitudinal study (2 years, 3,000 people) to feed his simulation, he was able to demonstrate likely outcomes of the policy intervention in different phases of the epidemic; data to feed a real-world policy decision. :)

He also shared lots of anecdotal statistics about the AIDS epidemic in Africa, massive infection rates and death rates, which I continue to find mind-boggling: What would day-to-day living look like in the U.S. if 20% of Americans were infected with HIV? Or if we suddenly had millions of “dual-orphans” (normally a rare phenomenon) to raise? How will Africa recover?

All in all a very interesting talk with food for thought on many fronts, but one issue was conspicuously missing: Privacy.

Up Next: How to evaluate a privacy statement when you’re dying of AIDS.

Privacy Paranoia Part II: What are they afraid of?

Tuesday, October 24th, 2006

In Privacy Paranoia Part I, I questioned the assumption that people are intrinsically suspicious of data collection efforts and generally unwilling to volunteer personal information, by walking through a few everyday examples of information sharing.

However, while there is an abundance of scenarios and circumstances under which you and I are happy to reveal personal data, that does not change the stubborn fact that users are generally suspicious of data collection efforts and in many cases would choose NOT to share personal information. (When they do share it unintentionally, it is usually because they lack the patience to read the fine print or to pay attention to the default settings of the software they install.)

Privacy Paranoia Part II addresses this apparent inconsistency, which clears the path for Part III, where I will propose concrete ways to change user attitudes toward data collection.

The general public’s seemingly contradictory relationship with information-sharing can be explained away once we, as web service providers, accept responsibility for the reaction we provoke in our users.

In the real world, information-sharing works as a quid pro quo where both sides agree to terms they can live with and exchange information accordingly.

In the world of online services, we as service providers are attempting to engage our users in this exchange, but we present it as a one-sided deal. You give, we take. The terminology we use as an industry belies our inward focus. We don’t engage in information-sharing with our users. We collect data. We mine data. We warehouse data.

So, the million-dollar question is: What do we need to provide our users in order to engage them in an information-exchange with us?

1. Transparency of intent. As the user, if I know why you need the information you are requesting, I am more likely to give it to you, even if there are opportunities for you to re-purpose my information in ways I don’t intend.

2. Personal benefit (If I need to tell you.)

  • I give complete strangers on eBay my home address, in exchange for having my purchase arrive on my doorstep.
  • I tell my credit card company what I purchased, where I purchased and when I purchased it, in exchange for being free from the constraints of managing cash.

1 and 2 are as far as most people go. And many people have pretty low standards for 2.

3. Reputation. What is the reputation of the person/entity that is requesting this information? Are they going to maliciously misuse my information? Are they going to take care with my information? Are they even capable of understanding what “taking care with my information” means? (As in, are they clueless enough to transmit my credit card number in plain text?)

4. What else could the requester do with this information? How valuable, how sensitive is the information I’m giving out?

Today, few people weigh these factors systematically, not because they don’t want to, but because they can’t. The services, organizations and businesses asking for phone numbers, addresses, gender, income, credit card numbers and social security numbers aren’t holding up their end of the quid pro quo.

1. Transparency into the Hows, Whys, Whens and What-fors
2. Exchanging data rather than Collecting data

As a result, in place of rational evaluation, habit and confusing design rule. Some people run their own email servers and devise dozens of aliases to throw ‘Big Brother’ off the trail. Others happily hand over their data in exchange for the famous free bar of chocolate in the subway.

This makes it very hard to predict how the general public will deal with information-sharing services. The reaction could run the gamut from paranoid revulsion to earnest enthusiasm to blasé indifference. This in turn makes the data we hope to collect, and build a service around, unreliable and uneven in quality. We want everyone to be represented in the data pool, paranoiacs included.

Therefore, if we want to neutralize the randomizing influence of personality, we must find a way to walk people through evaluating questions 1-4 in a rational and considered way; and hopefully the answers they come up with convince them that participating in the information-sharing community is in their best interest.

How do we do that?

Privacy Paranoia Part I: What are we afraid of?

Wednesday, October 18th, 2006

If a stranger asked you on the street “What is your street address?” you would probably be pretty startled at his presumption and walk away. What part of town you’re from is friendly chit-chat, but street address is a tad too specific for comfort. After all, what business could he possibly have with your address? However, if that same stranger is standing behind a counter at a store, wearing a uniform and asking the same question, you still might not give him your address, but you’d have a better sense of why he was asking, what he’s likely to do with the information and how it will affect your life (more snail-mail SPAM).

You may also wonder if the stranger will abuse his access privileges and re-purpose your personal information for his own interests, possibly at your expense (e.g. identity theft). How likely is this? That depends on a whole host of factors from the brand and reputation of the store, your past experiences with the store, the dress and mannerisms of the stranger, personal biases, etc.

When a security gate asks you to identify yourself with your swipe card, you volunteer personal information (who you are, where you are and when you were there) without even thinking about it. The social contract is clear: If I tell you who I am, you (the disembodied security system instituted by the disembodied corporation I work for) will let me in so I can go to work, make money and support myself and my expensive spending habits. Besides, who cares if everyone in the world knows that I was at work at 9:14 AM? How could that information possibly harm me in the future?

Finally, when your doctor wants to know if you’re sexually active or abusing drugs, depending on how ill you feel, how desperate you are to feel better and the political leanings of the hospital, you’ll spill your guts, because that’s what you’re supposed to do with doctors.

Once you get past these questions of Who, Why, For What and How, you might ask yourself if the person, business or organization who is asking for your information is even capable of taking responsibility for it.

Clearly, we wear our personal information on our sleeves in a variety of ways in a broad range of situations every day, multiple times a day. Yet, as an industry, we’ve pretty much given up on the idea that users will volunteer personal information to a web service. Instead, we resort to not-so-subtle tricks that we hope our users won’t notice. Clever default settings and EULAs we know our users don’t read. However, this is neither the right way to go about building a user base, nor is it sustainable. It is also, by no means, the only way.

Privacy Paranoia Part II: What are they afraid of?

FreshBooks Aligns Data Collection with its Customers’ Interests

Wednesday, October 11th, 2006

I think FreshBooks is attempting something very interesting.

[FreshBooks is geared toward small businesses and/or independent contractors. From their Manifesto: “Our mission is to deliver fast and simple invoicing and time tracking services that help you manage your business.”]

They are asking their users to optionally classify their profession/industry. In return, participants gain access to business metrics for their industry, based on aggregations of data collected from the FreshBooks user population.

The examples they give are:

  • “What is the average invoice size for [your profession]?”
  • “How long does the average [your profession] take to get paid?”
  • “What is the average monthly revenue of other [your profession]?”
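FreshBooks hasn’t said how it computes these figures, so the following is only a minimal sketch of what a per-profession aggregation could look like. The `Invoice` record, its fields and the `industry_metrics` function are hypothetical, invented here for illustration, and monthly revenue is left out because it would also need invoice dates.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Invoice:
    # Hypothetical record; FreshBooks' real data model is not public.
    user_id: str
    profession: str        # the user's opt-in, self-reported classification
    amount: float          # invoice total
    days_to_payment: int   # days between issuing the invoice and getting paid


def industry_metrics(invoices):
    """Group invoices by profession and average two of the example metrics."""
    by_profession = defaultdict(list)
    for inv in invoices:
        by_profession[inv.profession].append(inv)

    metrics = {}
    for profession, rows in by_profession.items():
        metrics[profession] = {
            "avg_invoice": mean(r.amount for r in rows),
            "avg_days_to_payment": mean(r.days_to_payment for r in rows),
        }
    return metrics


if __name__ == "__main__":
    sample = [
        Invoice("u1", "designer", 100.0, 30),
        Invoice("u2", "designer", 300.0, 10),
        Invoice("u3", "writer", 50.0, 20),
    ]
    print(industry_metrics(sample))
```

Because a user only sees the numbers for the profession they declare, they have an incentive to classify themselves accurately.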

I would imagine this will raise many a small-business eyebrow. However, these examples still feel thin and generic to me. I want to know:

  • “How many years of experience do other professionals in my industry have?”
  • “What are their industry credentials? Education? Training? Skill set? Work experience?”
  • “What is the quality of their clientèle?”
  • “Where is their operation based?”
  • “What kind of capital investments have they made?”

Collecting data from users is not new. Collecting data from users to provide a service is not new (if you consider targeted advertising a user service). However, there is something unique about what FreshBooks is doing that differentiates it from the various other data collection efforts on the internet. They have figured out a way to give their customers data that has tangible, monetary value; value that those customers would probably be willing to pay for, and value that is difficult (expensive!) if not impossible for them to get anywhere else.

Furthermore, FreshBooks’ model turns the tables on data collection and privacy. In place of a parasitic relationship where Internet Company as Big Brother spies on users in order to make big bucks selling Targeted Advertising, a symbiotic exchange is established where users happily provide personal data in exchange for a tangible good. Sounds too good to be true? In the immediate future, it probably is.

It’s worth noting that

  1. FreshBooks is collecting data from a real service they provide (as opposed to polls and surveys). This minimizes the risk of collecting bogus data.
  2. Because FreshBooks implies they will only tell you about the industry you indicate (thereby encouraging you to provide an accurate categorization or be given useless data), inaccuracies due to users distorting their information should be minimal.
  3. FreshBooks is being at least semi-transparent about what they’re doing with the data they collect. As a result, FreshBooks is establishing a trust relationship with their users, which turns the data they collect into a renewable resource, as opposed to one (advertising) that runs dry as soon as users find out they’re being spied on. I say semi-transparent because:
    3a. FreshBooks is not being completely forthright about who else they may or may not be selling this data to.
    3b. Implicit is the fact that FreshBooks can also use this data to optimize their own business and pricing strategies.
  4. Although they are not charging for this data yet, the information would probably be worth at least several hundred dollars a year to any given customer. (How FreshBooks might choose to monetize that value is a different story.) By contrast, the revenue FreshBooks might have earned from selling targeted advertising against that customer’s eyeballs is unlikely to approach $100/year.
  5. FreshBooks reassures its users that their data is only used in its “anonymous aggregate form”. However, the term ‘data aggregates’ is so vague as to be largely useless. FreshBooks still doesn’t have a complete story about how they will protect the individual identities of their users.
  6. I’m not clear on how this new program jibes with the FreshBooks privacy statement, which under the heading “Ownership of Data Submitted to Active FreshBooks Subscriptions” suggests that user data is owned by the user, not by FreshBooks. How then does FreshBooks have the right to aggregate and share your data with other users? Does FreshBooks only collect data from users who opt in to share/view data? If so, that severely limits their data pool. I wonder how many of their 90,000+ users are considered active and will opt in…?
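FreshBooks hasn’t published how it keeps its aggregates “anonymous”. One common safeguard, offered here only as an illustration and not as FreshBooks’ method, is to suppress any figure computed from too few distinct contributors. In the minimal sketch below, the threshold value `MIN_CONTRIBUTORS = 10`, the row format and both function names are hypothetical assumptions.

```python
from statistics import mean

# Hypothetical threshold: a group's figure is only published if at least
# this many distinct users contributed to it.
MIN_CONTRIBUTORS = 10


def safe_average(rows, value_key, min_contributors=MIN_CONTRIBUTORS):
    """Average value_key over rows, or return None if too few users contributed.

    Each row is a dict with a "user_id" key plus the value being averaged.
    """
    contributors = {r["user_id"] for r in rows}
    if len(contributors) < min_contributors:
        return None  # suppressed: the group is too small to hide individuals
    return mean(r[value_key] for r in rows)


def publish(rows_by_profession, value_key, min_contributors=MIN_CONTRIBUTORS):
    """Build the per-profession figures a user would see, with small groups suppressed."""
    return {
        profession: safe_average(rows, value_key, min_contributors)
        for profession, rows in rows_by_profession.items()
    }
```

The sketch counts distinct users rather than invoices so that a single prolific user cannot make a small group look large. A size threshold on its own is not a complete privacy guarantee, though. Someone who compares overlapping aggregates over time may still be able to infer an individual’s figures.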

I’m very interested to hear if this sticks, and if their users are able to jump over the hurdle of giving up a little bit of privacy for a little bit of information. The relevancy of the data will presumably be a factor in continued participation.

What they should be doing:

  • Providing context about what’s missing: It is as important to understand who isn’t participating in providing data as it is to know who is.
  • Providing context about their users: It is as important to understand the demographics, circumstances and nature of the other participants as it is to know what the raw accounting numbers are. After all, do I, as a small-town consultant, really care what the big boys are charging on Madison Avenue?
  • Taking a lot of care with the aggregates so that a data-release scandal doesn’t come back to bite them.
  • Refraining from using their data for parasitic purposes that undermine the trust relationship they’re building with their users.
  • Providing a way for users to cleanly and completely end their participation in the data collection program.

While time will tell what happens with the execution of this effort, I am excited by the attempt: a business that collects data from its users and returns business intelligence to them, rather than handing over the customer relationships it built to the highest pay-per-click bidder.