Last week, I reported on Professor Sam Clark’s recent talk, “Relational Databases in the Social and Health Sciences: The View from Demography.” Clark covered a wide array of topics, from the challenges of working with heterogeneous social science field data to the data-driven outcome modeling used to inform policy decisions on HIV/AIDS prevention and treatment in Sub-Saharan Africa.
As I mentioned last week, privacy surprisingly did not come up during Professor Clark’s talk…except in a brief aside, where Clark acknowledged that study subjects are sometimes uncomfortable disclosing extramarital relationships. On the whole, privacy did not appear to be taking up too many cycles at either INDEPTH, a network of ‘Demographic Surveillance Systems’ (DSS is social-science-speak for data collection sites) that is working to standardize field research, or SPEHR, Clark’s personal effort to design a standard database schema for social science research. At the risk of being presumptuous, the name ‘Demographic Surveillance System’ itself speaks volumes about how social science regards the issue of privacy.
At the same time, the steady stream of media reports about privacy breaches and data leaks (HP, AOL, Veterans Affairs) got me wondering: How would this data be handled in a US-based study? How readily would you respond to an online survey asking how many times you’ve had unprotected sex?
Not very readily, I’d guess. Forget allowing someone to compile a detailed log of your day-to-day sexual activity; most people would never even get past the first two questions: Are you HIV-positive? Are you living with AIDS? The ramifications of leaking such information are all too well known in modern society.
Just to make sure that I hadn’t misread the lack of emphasis, I rooted around the INDEPTH website to see if I could find a meatier discussion about privacy.
I found a reference to “A Data Model for Demographic Surveillance Systems”, a 21-page paper that makes its first and last mention of privacy on p. 18, in its ‘Conclusions and Future Work’ section:
“More work is needed for sites that require better data privacy than simply restricting access to the data set. Certainly, separating the name from the ID field is the first step in providing better data privacy.”
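To make “separating the name from the ID field” concrete, here is a minimal Python sketch of what that first step amounts to. It is not taken from the INDEPTH paper or toolkit: the record fields and the `SurveyResponse` and `pseudonymize` names are hypothetical, and it assumes, purely for illustration, that a name uniquely identifies a respondent.

```python
import secrets
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    # Hypothetical raw record as captured in the field, name included.
    name: str
    village: str
    hiv_positive: bool


def pseudonymize(responses):
    """Split raw records into a name-to-ID linkage table and a name-free research table.

    Assumes, for illustration only, that a name uniquely identifies a respondent.
    """
    linkage = {}   # name -> respondent ID; meant to be stored apart, under tighter access control
    research = []  # rows analysts work with: respondent ID plus answers, no name
    for r in responses:
        rid = linkage.get(r.name)
        if rid is None:
            # A random ID, unlike a plain hash of the name, cannot be recomputed from the name.
            rid = secrets.token_hex(8)
            linkage[r.name] = rid
        research.append({
            "respondent_id": rid,
            "village": r.village,
            "hiv_positive": r.hiv_positive,
        })
    return linkage, research


if __name__ == "__main__":
    raw = [
        SurveyResponse("Amina", "Village A", True),
        SurveyResponse("Joseph", "Village A", False),
        SurveyResponse("Amina", "Village A", True),  # repeat visit, same person
    ]
    linkage, research = pseudonymize(raw)
    for row in research:
        print(row)
```

Even in this toy version, the research table still carries the village and the answers themselves, and in a small community those alone may be enough to single someone out. That is one reason the paper can only call name/ID separation a first step.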
I also found “Data access, security and confidentiality”, a 174-word document in the INDEPTH DSS Resource Toolkit that makes three recommendations to researchers designing data collection systems:
1. Be clear about who has access to the data, what data they have access to, and what level of access they should have.
2. Back up the data. A RAID server is ideal.
3. Separate survey respondent ID numbers from their names.
These are all good recommendations, and they show a willingness to address the issue. But isn’t this an oversimplification at best and gross negligence at worst? Granted, I may be unfair in casting INDEPTH as spokesperson for the entire social science community on the topic of privacy. So perhaps the most I can say is that the folks at INDEPTH are seriously underestimating the challenges of taking on guardianship of sensitive personal data. As with the researchers at AOL, we can only wait for the consequences of that misjudgment to play out.
So again, the sense I get is that privacy isn’t a major issue. Why’s that?
INDEPTH’s users have more important things to worry about. They’re not scanning people’s email to sell mattress companies more targeted advertising. They’re trying to do things like save a continent from implosion.
According to UNAIDS, in 2005 alone an estimated 3.2 million people in Sub-Saharan Africa became newly infected with HIV, while 2.4 million adults and children died of AIDS. The U.S. has less than 40% of the population of Sub-Saharan Africa, so a proportional toll here would be roughly 1 million Americans dying of AIDS every year. If that were happening, we wouldn’t be talking about privacy either.
A second, more insidious reason is that this flavor of information privacy is largely an information-age phenomenon, one that requires the individual to understand the implications and weigh the risks of disclosure.
Our ‘modern-day’ awareness (or wariness) of disclosure did not come for free. And even with all of the media frenzy, people regularly compromise their personal information in myriad ways every day, down to trading their passwords for chocolate bars.
Nevertheless, no matter how tenuous a grasp the public has on data and databases, the level of sophistication mainstream America has achieved in the realm of ‘things digital’ is not to be taken for granted.
It’s not a matter of intelligence or common sense. My guess is that the people who willingly participate in DSS studies such as those in the INDEPTH network don’t have a gut-level appreciation of what it means to be ‘in the system’, for the simple reason that they live in pre-digital or barely digital societies and aren’t tracked in their daily lives the way we are.
They don’t log in or enter passwords, PINs or secret codes. They don’t answer self-selected security questions, swipe key fobs, or scan ID cards, metro cards and medical insurance cards. They don’t accept certificates, add people to whitelists or report spam. They don’t make spreadsheets, tag pictures, maintain ‘address books’, or search their email or, for that matter, the web. They don’t inspect their browser history to delete all the URLs that might not be so great for other people to stumble across. They’ve never had an application rejected because of ‘low’ test scores and ‘bad’ grades. They’ve never been denied insurance for having ‘above average’ blood pressure. They’ve never been denied a mortgage for having ‘below average’ credit. They’ve never been audited by the IRS, or logged into Amazon only to be confronted with “Here’s a recommendation just for you: Getting pregnant after Menopause!”.
In other words, the subjects of these studies don’t necessarily have a clear conception of this thing called a database, which is going to consume their personal life history, chop it up into discrete cells and array it in rows and columns, making it all the more digestible for aggregation, analysis and comparison, and available for recall on demand. The question is: when a respondent ‘consents’ to ‘participate in a survey’, do they understand what they’re consenting to? Do the field researchers themselves understand what respondents are consenting to?
Even if respondents did fully understand what ‘consent’ really meant (which is highly doubtful, given that most First World internet users don’t fully digest what it means to click ‘Accept’ on a EULA), there still remains the unresolved question of whether dire circumstances (e.g., lots of people dying with no end in sight) warrant relaxed attention to privacy.
Up next: Who cares about privacy? Why search queries in America trump sexual history in Africa.