Over the last few months I’ve been to a number of interesting talks at the Stanford Methods of Analysis Program in the Social Sciences (MAPSS) colloquium. Two types of speakers have caught my attention: those who work closely with the logistics and mechanics of data collection, and those who try to use survey data to test their hypotheses.
Most recently I got to hear Linda Piekarski of Survey Sampling International speak about SSI’s efforts to address changes in the telephone system, as well as their recent forays into internet surveys. (I didn’t realize how perfect the original design for the U.S. phone system was for tele-survey companies.)
Also memorable was Yale Professor Don Green’s talk about measuring the effectiveness of political campaign advertising. One of my favorite lines (though I’m paraphrasing) was: “Any time you see a clean, clear graph of data, there’s something wrong. Data ‘noise’ is what reality looks like.”
What follows is a summary, derived in part from these talks, of the challenges facing the collection of data about individuals.
Today, there are three main ways of collecting data from individuals, each of which contains flaws that seriously undermine the quality of the data collected.
- Pay them a tiny reward, lure them with a sweepstakes or nag them at dinner with a phone call from a stranger. For example, online stores may offer a coupon or rebate for your feedback on your buying experience.
- Make it easy for individuals to inadvertently or unthinkingly consent to data being collected about them, and/or subsequently change the substance of what is collected or the uses for that data. One prominent example is Amazon.com’s site registration process, which makes no attempt to highlight their third-party data-sharing practices.
- Leverage data collected for some other purpose – so-called “Secondary Use”. For example, addresses collected for fulfillment (shipping) may be used for geographically targeted marketing messages.
These mechanisms have a set of critical flaws:
- Tiny rewards and nagging phone calls are an insufficient value proposition for many individuals, so the pool of participants is unlikely to be representative of the target population. Instead it will favor those individuals for whom the reward remains attractive, however small, or those for whom the cost of participation (time) is small enough to make the reward adequate. (Mechanism 1)
- Rewards or compensation distributed without regard to accuracy provide no incentive for careful or genuinely accurate self-reporting. (Mechanism 1)
- These practices cultivate a public perception of a mesh of “big brother” networks collecting an ever-expanding set of data, beyond the control of any one individual. Privacy outrage still surfaces in mainstream media occasionally, but the general public is increasingly numb to incremental discoveries of the erosion of personal privacy. While anesthesia may appear temporarily attractive to data collectors, it also disengages individuals from the data collection goals, which decreases participation and discourages accurate self-reporting. For example, when you are pressured to answer a survey at a department store or after check-out at a web retailer, do you react with an earnest attempt to supply them with the information they need? (All mechanisms)
- In an effort to fight back against increasingly invasive data collection, privacy legislation and legal liability have forced data to be “silo-ed” and “anonymized” as much as possible. That means that unless you are part of a larger survey panel, each subsequent survey you complete, or data set you consent to have collected, will be stored separately from your other data. This eliminates the possibility of data-accuracy maintenance by individuals, and makes longitudinal analysis increasingly difficult. (All mechanisms)
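To make the first flaw in the list above (self-selection under tiny rewards) a little more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the population values, the $2 coupon, the 10-minute survey, and the simplifying rule that a person responds only when the reward covers what that time is worth to them. It is not a model of any real survey.

```python
from statistics import mean


def respondents(population, reward, minutes):
    """Return the people for whom the reward covers the cost of their time.

    Each entry in `population` is a hypothetical dollar value that one
    person places on an hour of their own time.
    """
    return [p for p in population if reward >= p * minutes / 60]


# Made-up population: hourly time values of $5, $10, ..., $100.
population = list(range(5, 105, 5))

# Made-up survey: a $2 coupon in exchange for 10 minutes.
sample = respondents(population, reward=2.0, minutes=10)

print(mean(population))  # 52.5
print(mean(sample))      # 7.5
```

Under these assumptions only the two people who value their time least take part, so anything correlated with that value (income, age, employment) would be badly skewed in the resulting data, no matter how many such people respond.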