Posts Tagged ‘Public Health’

Data’s endless possibilities

Friday, January 9th, 2009

The New York Times recently published a succinct but meaty article on New York City’s new electronic health record system.  Planned and promoted by the Bloomberg administration, the system includes about 1,000 primary care physicians, concentrated in three of the city’s poorest neighborhoods, along with the data they generate about their patients.  As I read it, I found myself counting all the different functions of the system.  I found at least ten:

•    Clean up outdated filing systems;
•    Enable a doctor to compare how one patient is doing with how his or her other patients are doing;
•    Enable a doctor to compare how one patient is doing with how patients all over the city are doing;
•    Enable the city’s public health department to monitor disease frequency and outbreaks, like the flu;
•    Enable the city to promote preventative measures, like cancer screening, in new ways;
•    Create new financial incentives for doctors to improve their patients’ health, on measures like controlling blood pressure or cholesterol;
•    Provide report cards to doctors comparing their results with other doctors’;
•    Improve care by less-experienced doctors with advice and information based on a patient’s age, sex, ethnic background, and medical history, including prompts to provide routine tests and vaccinations and warnings on how drugs can potentially interact;
•    Allow doctors to follow up more closely with patients, like reminding them of appointments through new calling and text-messaging systems and being notified if their patients do not fill prescriptions; and
•    Allow patients to access their own records, make appointments electronically, and monitor their own progress on health targets (should the doctor decide to do so).

Pretty amazing, isn’t it?

Data is like that.  Once you collect it, the possibilities are endless.  Reading about this one system for health records made me realize why it’s so hard for me to describe CDP’s goals in one sentence.  We’re not trying to do something singular, like “enable a doctor to compare patients’ data.”  We’re trying to create a place where this function and innumerable other possibilities can exist, while also being mindful that “endless possibilities” include some scary ones that we need to guard against.

Making personal data more personal

Monday, December 29th, 2008


The New York State Department of Health recently launched a new online tool for researching the prevalence of certain medical conditions by zip code.  It has a terribly boring name—Prevention Quality Indicators in New York State—but what they’re providing is very exciting.

Prevention Quality Indicators, or PQIs, are a set of measures developed by a federal health agency, the Agency for Healthcare Research and Quality.  They count the number of people admitted to hospitals for a specific list of twelve conditions, including several complications of diabetes, as well as hypertension, asthma, and urinary tract infections.  All of these are conditions in which good preventative care can help avoid hospitalization or the development of more severe conditions.  As the Department explains, “The PQIs can be used as a starting point for evaluating the overall quality of primary and preventive care in an area. They are sometimes characterized as ‘avoidable hospitalizations,’ but this does not mean that the hospitalizations were unnecessary or inappropriate at the time they occurred.”
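For readers who want to see what sits behind a number like this, here is a minimal Python sketch of an area-level admission rate: count the qualifying admissions in each zip code and divide by that zip code’s population. Everything in it is an assumption for illustration. The records, the population figures, and the function name `pqi_rate` are hypothetical, and expressing the result per 100,000 residents is a common convention for such rates rather than a confirmed detail of the New York State tool, whose actual methodology (including any age or sex adjustment) is not reproduced here.

```python
from collections import Counter

# Hypothetical admission records as (zip_code, condition) pairs.
# These are invented for illustration, not real New York data.
admissions = [
    ("11205", "asthma"),
    ("11205", "asthma"),
    ("11205", "hypertension"),
    ("11206", "asthma"),
]

# Hypothetical resident counts per zip code (not census figures).
population = {"11205": 40_000, "11206": 50_000}


def pqi_rate(admissions, population, condition, per=100_000):
    """Admissions for one condition per `per` residents, keyed by zip code.

    Hypothetical helper: a plain crude rate with no risk adjustment.
    """
    counts = Counter(z for z, cond in admissions if cond == condition)
    # Multiply before dividing; zip codes with no admissions get a rate of 0.
    return {z: counts[z] * per / pop for z, pop in population.items()}


print(pqi_rate(admissions, population, "asthma"))
```

With these made-up numbers, zip code 11205 shows 5.0 asthma admissions per 100,000 residents and 11206 shows 2.0. Breaking the same count down by race or by hospital is a matter of adding another key to the grouping.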

It’s not the kind of data that would normally get your average New York resident excited.  Even though it’s personal information—it doesn’t get more personal than health—it’s unlikely to feel very personal to anyone.

That’s what makes numbers and data off-putting for so many people.  Even when the numbers include people like us, we don’t see ourselves in them, so it’s hard to feel like those numbers have anything to say to us personally.  At the same time, so many decisions are being made based on data, huge decisions that affect all of us.  It’s important for democracy that ordinary citizens have a stake in the data: that they not only have access to it but also have an interest in reviewing it themselves.

What’s interesting to me about this website, then, is its potential for making this obscure piece of government health data much more immediate and personal for ordinary citizens, and not just public health data geeks.  As soon as I heard about this website, the first thing I did was look up my zip code, “11205” in the county of Kings (Brooklyn).  I could then see racial disparities in the admission rate for these conditions in my neighborhood, and even see data on specific hospitals in my area.  Whenever data can be organized and accessed in a way that is personal to the user, it’s immediately more compelling.

There’s no particular reason for me to wonder what asthma admission rates were in my zip code in 2006.  But I can imagine a mother of a child with asthma coming upon this site, wondering what asthma rates are in her zip code and the ones around it, and maybe seeing patterns that lead her to talk to other parents and elected officials.  And I can imagine other data sets of personal information being made truly relevant and personal in similar ways.

Who cares about Privacy: Why search queries in America trump sexual history in Africa.

Wednesday, December 6th, 2006

A couple of weeks ago I reported on Sam Clark’s presentation about interesting social science data collection efforts, in particular research on AIDS/HIV in Sub-Saharan Africa: a data collection project that was startling both for how intimate the survey questions were and for the cursory attention paid to privacy matters.

A week later, I tried to offer some rationales for why privacy did not figure prominently in Clark’s presentation. The sheer urgency of the AIDS epidemic, the relative powerlessness of the survey subjects, and the relative irrelevance of databases, the internet, and modern digital life as we know it to much of Africa seemed to me to be the three most powerful reasons.

So now I’d like to contrast that with the recent media frenzy in the First World over AOL’s bungled release of user search query data to the public.

So in the context of this DSS work in Africa, do AOL users have a right to be outraged? If there were a leak of INDEPTH respondent data, would the U.S. media condemn INDEPTH? Or would they not care, because the privacy of ordinary Africans is too far removed from our reality? Or are INDEPTH survey respondents simply disenfranchised at this point?

What if there were an unfortunate bungling of personal information at INDEPTH? Who cares if we know AOL user 34653 is looking for a good cross-dressing cruise for couples and idolizes Cher, Castro and Trent Lott in a single breath? It’s trivial compared with name, HIV status, number and type of sex partners in the last 6 months, and number of times you’ve had unprotected sex in the last 6 months and with whom.

What if an embattled, desperate government with a touch of psychosis decided that this data was handy for carrying out a genocidal “solution” to the AIDS epidemic?

I can’t help feeling that the fact that “the [AOL] data was leaked” is beside the point.

Yes, it was careless, wrong, and inconsiderate. But in the end, is that really why people are so unhappy?

I think people are unhappy about the AOL data release because it was a surprise. People simply didn’t realize how much of their life was being captured, recorded and analyzed by search engines. Even with our modern-day sophistication, we are just as naive about the digital fingerprints we leave everywhere as the respondents are about the surveys they answer. In some ways you could say the respondents in INDEPTH’s DSS were more aware. They were painfully aware that their lives were being examined, and not only that, they knew and understood the goals of the organization that was collecting that information.

For AOL users, it was only after the data release that people started to realize that, as individuals, they were laying bare their psyches (contemplations of suicide, murder, sexual hang-ups, personal insecurities, and so on) so that the folks at AOL (and other search engines) could sell them better targeted advertising and make more money. Contrast that with what social scientists are trying to accomplish in sub-Saharan Africa and you start to feel like a cheap date.

What’s unfortunate is that the AOL data was compromised because AOL was following others in the industry in trying to “do good” by making their data available to academic researchers who might try to do something more with the data than figure out advertising schemes.

The takeaway here is that there is no straightforward, one-size-fits-most policy when it comes to privacy. It’s not about how much privacy is enough privacy. It’s not about whether people should share data or not share data. It’s clear that there are myriad circumstances that call for different levels of care on the part of the people collecting data and provoke different responses on the part of the people sharing information. Like most things having to do with human beings and society, privacy is context-sensitive, and grand, sweeping EULAs and privacy policies are insufficient, if not downright ridiculous, for capturing how we should approach the issue, as an industry and as a society.

Now that AOL knows beyond a shadow of a doubt that its users can’t extrapolate the natural consequences of using the AOL search service from its generic, vague privacy policy, it needs to find a way to make the search experience itself clearly communicate to the user the data collection that is happening behind the scenes. The INDEPTH survey respondents wouldn’t be surprised to see themselves in a report on the sexual history of people who are HIV+ or living with AIDS in Sub-Saharan Africa. They’re answering a survey; what else would they expect?

Similarly, AOL users shouldn’t be surprised that someone is keeping track of what they search for, what sites they visit, for how long and how often.* Attaining this kind of mutual understanding with users is much trickier, and no one has yet done it successfully. After all, AOL’s users don’t think of themselves as answering a survey when they conduct a search or visit a website. But as far as the researchers at AOL are concerned, that’s exactly what they’re doing.

*That being said, everyone should be surprised and outraged if any of this data is released without being properly anonymized. Whether or not everyone has the wherewithal and press connections to express their indignation and anger is another issue for another blog entry.

How to evaluate a privacy statement when you’re dying of AIDS

Sunday, November 12th, 2006

Last week, I reported on Professor Sam Clark’s recent talk: “Relational Databases in the Social and Health Sciences: The View from Demography.” Clark covered a wide array of topics, from the challenges of working with heterogeneous social science field data to the data-driven outcome modeling used to inform policy decisions on AIDS/HIV prevention and treatment in Sub-Saharan Africa.

As I mentioned last week, surprisingly, privacy did not come up during Professor Clark’s talk, except in a brief aside where Clark acknowledged that study subjects are at times uncomfortable disclosing extra-marital relationships. On the whole, privacy did not appear to be taking up many cycles at either INDEPTH, a network of ‘Demographic Surveillance Systems’ (DSS is social science-speak for data collection sites) that is working to standardize field research, or SPEHR, Clark’s personal effort to design a standard database schema for social science research. At the risk of being presumptuous, the name ‘Demographic Surveillance System’ itself speaks volumes about how social science regards the issue of privacy.

At the same time, the frequent media alerts about privacy and data leaks (HP, AOL, the Department of Veterans Affairs) got me wondering: How would this data be handled in a US-based study? How readily would you respond to an online survey asking you how many times you’ve had unprotected sex?

Not very readily, I’d guess. Forget allowing someone to compile a detailed log of your day-to-day sexual activity. People would never even get past the first 2 questions: Are you HIV positive? Are you living with AIDS? The ramifications of leaking such information are all too well known in modern society.

Just to make sure that I hadn’t misread the lack of emphasis, I rooted around the INDEPTH website to see if I could find a meatier discussion about privacy.

I found a reference to “A Data Model for Demographic Surveillance Systems”, a 21-page paper whose first and only mention of privacy comes on p. 18, in its ‘Conclusions and Future Work’ section:

“More work is needed for sites that require better data privacy than simply restricting access to the data set. Certainly, separating the name from the ID field is the first step in providing better data privacy.”

I also found “Data access, security and confidentiality”, a 174-word document in the INDEPTH DSS Resource Toolkit that recommends 3 things to researchers designing data collection systems:

1. Be clear about who has access to the data, what data they have access to, and what level of access they should have.
2. Back up the data. A RAID server is ideal.
3. Separate survey respondent ID numbers from their names.
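The third recommendation is the most concrete, and it is easy to sketch. The Python below is a minimal illustration, not INDEPTH’s procedure: the field names, the sample rows, and the helper `split_identifiers` are all hypothetical. It moves each respondent’s name into a separate key table indexed by a random ID, so the table analysts work with carries no names. As the paper quoted above says, this is only a first step: the remaining fields (for example village and age) can still be combined to single someone out.

```python
import secrets

# Hypothetical survey rows; field names are illustrative, not INDEPTH's schema.
raw_responses = [
    {"name": "Respondent A", "village": "V1", "age": 34, "hiv_status": "negative"},
    {"name": "Respondent B", "village": "V1", "age": 27, "hiv_status": "positive"},
    {"name": "Respondent C", "village": "V2", "age": 41, "hiv_status": "negative"},
]


def split_identifiers(rows):
    """Split rows into (key_table, research_table).

    key_table maps a random respondent ID to a name and should be stored
    separately, with far tighter access than the research table.
    research_table holds every other field plus the ID, but no names.
    """
    key_table = {}
    research_table = []
    for row in rows:
        # Random ID, not derived from the name, so it cannot be recomputed from it.
        rid = secrets.token_hex(8)
        while rid in key_table:  # guard against the (very unlikely) collision
            rid = secrets.token_hex(8)
        key_table[rid] = row["name"]
        # Copy every field except the name; the input rows are left untouched.
        research_row = {k: v for k, v in row.items() if k != "name"}
        research_row["respondent_id"] = rid
        research_table.append(research_row)
    return key_table, research_table


keys, research = split_identifiers(raw_responses)
for r in research:
    print(r)
```

A random ID is used rather than, say, a hash of the name, because a hash of a known name can be recomputed by anyone who has the research table and a list of likely names.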

These are all good recommendations that demonstrate a willingness to address the issue. But isn’t this an oversimplification at best and gross negligence at worst? Granted, I may be unfair in singling out INDEPTH to play the role of spokesperson for the entire social science community on the topic of privacy. So maybe all I can really say is that, at the very least, the folks at INDEPTH are seriously underestimating the challenges of taking on guardianship of sensitive personal data. As with the researchers at AOL, we can only wait for the consequences of their misjudgment to play out.

So again, the sense I get is that privacy isn’t a major issue. Why’s that?

INDEPTH’s users have more important things to worry about. They’re not scanning people’s email to sell mattress companies more targeted advertising. They’re trying to do things like save a continent from implosion.

According to UNAIDS, in 2005 alone an estimated 3.2 million people in Sub-Saharan Africa became newly infected, while 2.4 million adults and children died of AIDS. The U.S. has less than 40% of the population of Sub-Saharan Africa; scaled to our population, that death toll would mean roughly 1 million Americans dying of AIDS every year. If that were happening here, we wouldn’t be talking about privacy either.

A second, more insidious reason is that this flavor of information privacy is largely an information-age phenomenon, one that requires the individual to understand the implications and weigh the risks of disclosure.

Our ‘modern-day’ awareness, or wariness, of disclosure did not come for free. Even with all of the media frenzy, people regularly compromise their personal information in myriad ways every day; some will even trade their passwords for chocolate bars.

Nevertheless, no matter how tenuous a grasp the public has on data and databases, the level of sophistication mainstream America has achieved in the realm of ‘things digital’ is not to be taken for granted.

It’s not a matter of intelligence or common sense. I’m guessing that the people who willingly participate in DSS such as INDEPTH don’t have a gut-level appreciation of what it means to be ‘in the system’ for the simple reason that they live in pre-digital or barely digital societies and aren’t kept track of in their daily existence the way we are.

They don’t log in, they don’t enter passwords, PIN numbers or secret codes. They don’t answer self-selected security questions, swipe key fobs, scan ID cards, metro cards, and medical insurance cards. They don’t accept certificates, add people to whitelists, report spam. They don’t make spreadsheets, tag pictures, maintain ‘address books’, query their email or for that matter, query the web. They don’t inspect the history in their web browser to delete all the URLs that might not be so great for other people to inadvertently stumble across. They’ve never had an application rejected because of ‘low’ test scores and ‘bad’ grades. They’ve never been denied insurance for having ‘above average’ blood pressure. They’ve never been denied a mortgage for having ‘below average’ credit. They’ve never been audited by the IRS or logged into Amazon to be confronted with “Here’s a recommendation just for you: Getting pregnant after Menopause!”.

In other words, the subjects in this study don’t necessarily have a clear conception of this thing called a database that is going to consume their personal life history, chop it up into discrete cells, and array it in rows and columns, making it all the easier to aggregate, analyze, compare, and recall on demand. The question is, when a respondent ‘consents’ to ‘participate in a survey’, do they understand what they’re consenting to? Do the field researchers themselves understand what respondents are consenting to?

Even if respondents did fully understand what ‘consent’ really meant (which is highly doubtful given that most First World internet users don’t fully digest what it means to ‘Accept’ a EULA), there still remains the unresolved issue of whether dire circumstances (e.g. lots of people dying with no end in sight) warrant slackened attention to privacy.

Up Next: Who cares about Privacy: Why search queries in America trump sexual history in Africa.

When Privacy Doesn’t Matter

Friday, November 3rd, 2006

Last Thursday MSR hosted Professor Sam Clark from the University of Washington for a talk entitled “Relational Databases in the Social and Health Sciences: The View from Demography.” For someone interested in using data for driving decision-making, it was interesting to hear about someone using empirical data to model the impact of different policies on a societal problem.

My main take-aways from the talk were as follows:

  • Social scientists today rarely use relational database (RDBMS) technology, or when they do, they use antique software. Much analysis is apparently done in statistical packages (I’m guessing SAS and the like), which lack much of the data management technology that is indispensable when working with larger datasets. For social scientists in general, the potential of current database technologies is only just becoming apparent.
    • As I am not familiar with many of the alternatives, had I been physically at the talk on campus rather than watching online, I would have liked to ask what has changed to make an RDBMS more attractive than it was before. I can only surmise from the talk that large data sets have only recently become available to social scientists, and that earlier data sets were too small to warrant an RDBMS.
    • Even now, Clark said, his colleagues would consider a “large” dataset to be around 500 megabytes.
  • Many early attempts at moving demographic data to relational data structures failed because those developing such systems underestimated how strongly the schema design would affect the ways demographers use the data.
  • Breadth of data has significant value to the longitudinal studies social scientists are conducting. Yet, lack of agreement on how to collect and store data is hampering their ability to interrelate data sets. Therefore, developing and agreeing on a standard is very desirable. (Incidentally, this problem is not exclusive to demographic datasets.)
  • Clark has done several iterations on a standard schema, particularly for capturing “Event-Influence-State” type datasets, commonly used in demography.
    • The Structured Population Event History Register (SPEHR).
    • One example he shared with us assessed “the impact of male circumcision as an HIV prevention strategy”. By using a longitudinal study (2 years, 3,000 people) to feed his simulation, he was able to demonstrate likely outcomes of the policy intervention in different phases of the epidemic; data to feed a real-world policy decision. 🙂
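The talk summary does not spell out SPEHR’s actual tables, so the following is only a hypothetical, much-simplified sketch of the general event-history idea, using Python’s built-in `sqlite3` module: instead of storing each person’s current status, you store dated events and derive the status on any date with a query. The table and column names, the event types, the sample rows, and the helper `residents_on` are invented for illustration; how SPEHR represents influences and states is not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE individual (
    individual_id INTEGER PRIMARY KEY,
    sex           TEXT,
    birth_date    TEXT              -- ISO 8601 date
);
CREATE TABLE event (
    event_id      INTEGER PRIMARY KEY,
    individual_id INTEGER NOT NULL REFERENCES individual(individual_id),
    event_type    TEXT NOT NULL,    -- e.g. 'in-migration', 'out-migration', 'death'
    event_date    TEXT NOT NULL     -- ISO 8601, so string order matches date order
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO individual VALUES (?, ?, ?)",
                 [(1, "F", "1970-03-01"), (2, "M", "1965-07-15")])
conn.executemany(
    "INSERT INTO event (individual_id, event_type, event_date) VALUES (?, ?, ?)",
    [(1, "in-migration", "2001-01-10"),
     (2, "in-migration", "2001-02-20"),
     (2, "death", "2003-05-04")])


def residents_on(conn, date):
    """IDs of people whose most recent event on or before `date` is an in-migration.

    Simplification: assumes at most one event per person per day.
    """
    sql = """
        SELECT e.individual_id
        FROM event AS e
        WHERE e.event_type = 'in-migration'
          AND e.event_date = (SELECT MAX(e2.event_date)
                              FROM event AS e2
                              WHERE e2.individual_id = e.individual_id
                                AND e2.event_date <= ?)
        ORDER BY e.individual_id
    """
    return [row[0] for row in conn.execute(sql, (date,))]


print(residents_on(conn, "2002-01-01"))  # [1, 2]
print(residents_on(conn, "2004-01-01"))  # [1]
```

Storing events rather than overwriting a status column keeps the full history, which is what longitudinal analysis needs.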

He also shared lots of anecdotal statistics about the AIDS epidemic in Africa, massive infection rates and death rates, which I continue to find mind-boggling: What would day-to-day living look like in the U.S. if 20% of Americans were infected with HIV? Or if we suddenly had millions of “dual-orphans” (normally a rare phenomenon) to raise? How will Africa recover?

All in all a very interesting talk with food for thought on many fronts, but one issue was conspicuously missing: Privacy.

Up Next: How to evaluate a privacy statement when you’re dying of AIDS.
