Archive for the ‘Public Policy’ Category

In the mix…Facebook “breach” of public data, data-mining for everyone, thinking through the Panton Principles, and BEST PRACTICES Act in Congress

Friday, July 30th, 2010

1) Facebook’s in privacy trouble again. Ron Bowes created a downloadable file containing information on 100 million searchable Facebook profiles, including the URL, name, and unique ID.  What’s interesting is that it’s not exactly a breach.  As Facebook pointed out, the information was already public.  What Facebook will likely never admit, though, is that there is a qualitative difference between information that is publicly available, and information that is organized into an easily searchable database.  This is what we as a society are struggling to define — if “public” means more public than ever before, how do we balance our societal interests in both privacy and disclosure?

2) Can data mining go mainstream? The article doesn’t actually say much, but it does at least raise an important question.  The value of data and data-mining is immense, as corporations and large government agencies know well.  Will those tools ever be available to individuals?  Smaller businesses and organizations?  And what would that mean for them?  It’s a big motivator for us at the Common Data Project: if data doesn’t belong to anyone, and it’s been collected from us, shouldn’t we all be benefiting from data?

3) In the same vein is a new blog by Peter Murray-Rust discussing open knowledge/open data issues, focusing on the Panton Principles for open science data.

4) A new data privacy bill has been introduced in Congress called “Building Effective Strategies to Promote Responsibility Accountability Choice Transparency Innovation Consumer Expectations and Safeguards” Act, aka “BEST PRACTICES Act.”  The Information Law Group has posted Part One of FAQs on this proposed bill.

Although the bill is still being debated and rewritten, some of its provisions indicate that the author of the bill knows a bit more about data and privacy issues than many other Congressional representatives.

  • The information regulated by the Act goes beyond the traditional, American definition of personally identifiable information.  “The definition of “covered information” in the Act does not require such a combination – each data element stands on its own and may not need to be tied to or identify a specific person. If I, as an individual, had an email address that was wildwolf432@hotmail.com, that would appear to satisfy the definition of covered information even if my name was not associated with it.”
  • Notice is required when information will be merged or combined with other data.
  • There’s some limited push to making more information accessible to users: “covered entities, upon request, must provide individuals with access to their personal files.” However, they only have to if “the entity stores such file in a manner that makes it accessible in the normal course of business,” which I’m guessing would apply to much of the data collected by internet companies.

In the mix…philanthropic entities, who’s online doing what, data brokers, and data portability

Monday, July 5th, 2010

1) Mimi and I are constantly discussing what it means to be a nonprofit organization, whether it’s a legal definition or a philosophical one.  We both agree, though, that our current system is pretty narrow, which is why it’s interesting to see states considering new kinds of entities, like the low-profit LLC.

2) This graphic of who’s online and what they’re doing isn’t going to tell you anything you don’t already know, but I like the way it breaks down the different ways to be online.  (via FlowingData) At CDP, as we work on creating a community for the datatrust, we want to create avenues for different levels of participation.  I’d be curious to see this updated for 2010, and to see if and how people transition from being passive users to more active users of the internet.

3) CDT has filed a complaint against Spokeo, a data broker, alleging, “Consumers have no access to the data underlying Spokeo’s conclusions, are not informed of adverse determinations based on that data, and have no opportunity to learn who has accessed their profiles.” We’ve been wondering when people would start to look at data businesses, which have even less reason to care about individuals’ privacy than businesses with customers like Google and Facebook.  We’re interested to see what happens.

4) The Data Portability Project is advocating for every site to have a Portability Policy that states clearly what data visitors can take in and take out. The organization believes “a lot more economic value could be created if sites realized the opportunity of an Internet whose sites do not put borders around people’s data.” (via Techcrunch)  It definitely makes sense to create standards, though I do wonder how standards and icons like the ones they propose would be useful to the average internet user.

In the mix…democratizing access to data, data literacy, and predictable responses to proposed privacy bill

Friday, June 18th, 2010

1) Infochimps launched their API. People often ask, are you guys doing something similar?  Yes, in that we are also interested in democratizing access to data, but we’re focusing on a narrower area — information that’s too sensitive and too personal to release in the usual channels. In any case, we’re excited to see more movement in this direction.

2) Wikipedia began a trial of a new tool called “Pending Changes.” To deal with glaring inaccuracies and vandalism, Wikipedia made certain entries off-limits for off-the-cuff editing.  The trade-off, however, was that first-time editors to these articles couldn’t get that immediate thrill of seeing their edits.  Wikipedia’s trying out a compromise, a tab in which these edits are visible as “pending changes.”  It’s always fascinating to see all the different spaces in which people in a community can interact online — this is a new one.

3) The Info Law Group posted various groups’ reactions to the privacy bill proposed by Representative Rick Boucher. Here’s Part I, here’s Part II. Fairly predictable, but it still never ceases to amuse me how far apart industry groups are from consumer advocates.

4) Great discussion continues on the concept of “data literacy.” I love this guest post from David Eaves on the Open Knowledge Foundation blog, with the awesome line:

It is worth remembering: We didn’t build libraries for an already literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have a data or public policy literate citizenry, we build them so that citizens may become literate in data, visualization, coding and public policy.

Yea or Nay: Data-Mine Bank Records to Find Tax Evaders

Tuesday, June 8th, 2010

The lower house of the Swiss Parliament voted Tuesday to reject a deal with the United States to transfer bank data from 4,450 American clients of UBS suspected of tax evasion…

What kind of data-mining is the IRS doing within the U.S.? The Right to Financial Privacy Act protects our personal banking data from government searches.

However, should the government be asking for aggregate data from banks about customer account activity that could help them identify suspicious behavior?

Measuring the privacy cost of “free” services.

Wednesday, June 2nd, 2010

There was an interesting pair of pieces on this Sunday’s “On The Media.”

The first was “The Cost of Privacy,” a discussion of Facebook’s new privacy settings, which presumably make it easier for users to clamp down on what’s shared.

A few points that resonated with us:

  1. Privacy is a commodity we all trade for things we want (e.g. celebrity, discounts, free online services).
  2. Going down the path of having us all set privacy controls everywhere we go on the internet is impractical and unsustainable.
  3. If no one is willing to share their data, most of the services we love to get for free would disappear. (Randall Rothenberg)
  4. The services collecting and using data don’t really care about you the individual, they only care about trends and aggregates. (Dr. Paul H. Rubin)

We wish one of the interviewees had gone even further and made the point that, since we all make decisions every day to trade a little bit of privacy in exchange for services, privacy policies really need to be built around notions of buying and paying: what you “buy” are services, and how you pay for them is with “units” of privacy risk (as in risk of exposure).

  1. Here’s what you get in exchange for letting us collect data about you.
  2. Here’s the privacy cost of what you’re getting (in meaningful and quantifiable terms).

(And no, we don’t believe that deleting data after 6 months and/or listing out all the ways your data will be used is an acceptable proxy for calculating “privacy cost.” Besides, such policies inevitably severely limit the utility of data and stifle innovation to boot.)

Gaining clarity around privacy cost is exactly where we’re headed with the datatrust. What’s going to make our privacy policy stand out is not that our privacy “guarantee” will be 100% ironclad.

We can’t guarantee total anonymity. No one can. Instead, what we’re offering is an actual way to “quantify” privacy risk so that we can track and measure the cost of each use of your data and we can “guarantee” that we will never use more than the amount you agreed to.

This in turn is what will allow us to make some measurable guarantees around the “maximum amount of privacy risk” you will be exposed to by having your data in the datatrust.
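To make the idea of “units” of privacy risk a little more concrete, here is a minimal sketch of the bookkeeping it implies. This is not the datatrust’s actual design: the class name `PrivacyBudget`, its methods, and the numeric costs are all hypothetical, and how a real cost per use would be calculated is left open. The sketch only shows the accounting: each use of the data is charged against an agreed maximum, and a use that would exceed that maximum is refused.

```python
class PrivacyBudget:
    """Hypothetical ledger that charges each data use against an agreed cap."""

    def __init__(self, max_risk):
        if max_risk < 0:
            raise ValueError("max_risk must be non-negative")
        self.max_risk = max_risk  # the amount of risk the person agreed to
        self.spent = 0.0          # cumulative cost of all uses so far
        self.log = []             # (purpose, cost) for every approved use

    def remaining(self):
        """Risk still available before the agreed cap is reached."""
        return self.max_risk - self.spent

    def charge(self, purpose, cost):
        """Record a use if it fits in the budget; return False if refused."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if self.spent + cost > self.max_risk:
            return False  # this use would exceed what was agreed to
        self.spent += cost
        self.log.append((purpose, cost))
        return True
```

With a cap of 1.0, for example, uses costing 0.5 and 0.25 would be approved and logged, while a further use costing 0.5 would be refused, leaving 0.25 unspent.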


The second segment was on privacy rights and issues of due process vis-a-vis the government and data-mining.

Kevin Bankston from EFF gave a good run-down of how ECPA is laughably ill-equipped to protect individuals using modern-day online services from unprincipled government intrusions.

One point that wasn’t made was that unlike search and seizure of physical property, the privacy impact of data-mining is easily several orders of magnitude greater. Like most things in the digital realm, it’s incredibly easy to sift through hundreds of thousands of user accounts whereas it would be impossibly onerous to search 100,000 homes or read 100,000 paper files.

This is why we disagree with the idea that we should apply old standards created for a physical world to the new realities of the digital one.

Instead, we need to look at actual harm and define new standards around limiting the privacy impact of investigative data-mining.

Again, this would require a quantitative approach to measuring privacy risk.

(Just to be clear, I’m not suggesting that we limit the size of the datasets being mined; that would defeat the purpose of data-mining. Rather, I’m talking about process guidelines for how to go about doing low-(privacy) impact data-mining. More to come on this topic.)

In the mix…DNA testing for college kids, Germany trying to get illegally gathered Google data, and the EFF’s privacy bill of rights for social networks

Friday, May 21st, 2010

1) UC Berkeley’s incoming class will all get DNA tests to identify genes that show how well you metabolize alcohol, lactose, and folates. “After the genetic testing, the university will offer a campuswide lecture by Mr. Rine about the three genetic markers, along with other lectures and panels with philosophers, ethicists, biologists and statisticians exploring the benefits and risks of personal genomics.”

Obviously, genetic testing is not something to take lightly, but the objections quoted sounded a little paternalistic. For example, “They may think these are noncontroversial genes, but there’s nothing noncontroversial about alcohol on campus,” said George Annas, a bioethicist at the Boston University School of Public Health. “What if someone tests negative, and they don’t have the marker, so they think that means they can drink more? Like all genetic information, it’s potentially harmful.”

Isn’t this the reasoning of people who preach abstinence-only sex education?

2) Google recently admitted they were collecting wifi information during their Streetview runs.  Germany’s reaction? To ask for the data so they can see if there’s reason to charge Google criminally.  I don’t understand this.  Private information is collected illegally so it should just be handed over to the government?  Are there useful ways to review this data and identify potential illegalities without handing the raw data over to the government?  Another example of why we can’t rest on our laurels — we need to find new ways to look at private data.

3) EFF issued a privacy bill of rights for social network users.  Short and simple.  It’s gotten me thinking, though, about what it means that we’re demanding rights from a private company. Not to get all Rand Paul on people (I really believe in the Civil Rights Act, all of it), but users’ frustrations with Facebook and their unwillingness to actually leave make clear that what Facebook offers is more than an ordinary service provided to an ordinary customer.  danah boyd has a suggestion: let’s think of Facebook as a utility and regulate it the way we regulate electric, water, and other similar utilities.

In the mix…Everyone’s obsessed with Facebook

Friday, May 7th, 2010

UPDATE: One more Facebook-related bit, a great graphic illustrating how Facebook’s default sharing settings have changed over the past five years by Matt McKeon. Highly recommend that you click through and watch how the wheel changes.

1) I love when other people agree with me, especially on subjects like Facebook’s continuing clashes with privacy advocates. Says danah boyd,

Facebook started out with a strong promise of privacy…You had to be at a university or some network to sign up. That’s part of how it competed with other social networks, by being the anti-MySpace.

2) EFF has a striking post on the changes made to Facebook’s privacy policy over the last five years.

3) There’s a new app for people who are worried about Facebook having their data, but it means you have to hand it over to this company which also states, it “may use your info to serve up ads that target your interests.” Hmm.

4) Consumer Reports is worried that we’re oversharing, but if we followed all its tips on how to be safe, what would be the point of being on a social network? On its list of things we shouldn’t do:

  • Posting a child’s name in a caption
  • Mentioning being away from home
  • Letting yourself be found by a search engine

What’s the fun of Facebook if you can’t brag about the pina colada you’re drinking on the beach right at that moment? I’m joking, but this list just underscores that we can’t expect to control safety issues solely through consumer choices. Another thing we shouldn’t do is put our full birthdate on display, though given how many people put details about their education, it wouldn’t necessarily be hard to guess which year someone was born. Consumer Reports is clearly focusing on its job, warning consumers, but it’s increasingly obvious privacy is not just a matter of personal responsibility.

5) In a related vein, there’s an interesting Wall St. Journal article on whether the Internet is increasing public humiliation. One WSJ reader, Paul Cooper, had this to say:

The simple rule here is that one should always assume that everything one does will someday be made public. Behave accordingly. Don’t do or say things you don’t want reported or repeated. At least not where anyone can see or hear you doing it. Ask yourself whether you trust the person who wants to take nude pictures of you before you let them take the pictures. It is not society’s job to protect your reputation; it’s your job. If you choose to act like a buffoon, chances are someone is going to notice.

Like I said above, in a world where the word “public” means really really public forever and ever, and “private” means whatever you manage to keep hidden from everyone you know, protecting “privacy” isn’t only a matter of personal responsibility. The Internet easily takes actions that are appropriate in certain contexts and republishes them in other contexts. People change, which is part of the fun of being human. Even if you’re not ashamed of your past, you may not want it following you around in persistent web form.

Perhaps on the bright side, we’ll get to a point where we can all agree everyone has done things that are embarrassing at some point and no one can walk around in self-righteous indignation. We’ve seen norms change elsewhere. When Bill Clinton was running for president, he felt compelled to say that he had smoked marijuana but had never inhaled. When Barack Obama ran for president 16 years later, he could say, “I inhaled–that was the point,” and no one blinked.

6) The draft of a federal online privacy bill has been released. In its comments, Truste notes, “The current draft language positions the traditional privacy policy as the go to standard for ‘notice’ — this is both a good and bad thing.” If nothing else, the “How to Read a Privacy Policy” report we published last year had a similar conclusion, that privacy policies are not going to save us.

In the mix — open data issues, bad econ stats, Facebook gaydar, and fraud detection in data

Friday, April 30th, 2010

1) It’s definitely become trendy for cities to open up their data, and I appreciated this article about Vancouver for its substantive points:

  • It’s important that data not only be open but be available in real time.  In all my conversations with people who work with data, though, whenever you have sensitive data, there’s going to be a significant time lag between when the data is collected and when it is “cleaned up” and made presentable for the public so as to avoid inadvertent disclosure.  This is why we think something like PINQ, a filter using differential privacy, could be revolutionary in making data available more quickly — it won’t need to be scrubbed for privacy reasons.
  • Licensing is an issue: although the city claims the data is public domain, there are terms of use that restrict use of the data by things like OpenStreetMap.  The article discusses the possibility of using the Public Domain Dedication and License, which is a project of Open Data Commons.  Alex heard some interesting discussion on this issue from Jordan Hatcher at the OkCon this past weekend.  This is a really fascinating issue, and I’m curious to see where else this gets picked up.
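On the first point above, about a differential privacy filter removing the need to scrub data before release, a small sketch may help show what such a filter does to a query result. This is not PINQ’s actual API. It is a plain-Python illustration of the Laplace mechanism commonly used for differentially private counts; the function names `laplace_noise` and `noisy_count` and the parameter `epsilon` are assumptions made for the example. The analyst never sees the records, only a count with random noise added, and a smaller `epsilon` means more noise and stronger privacy.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Return the number of matching records plus Laplace noise.

    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Calling `noisy_count(ages, lambda a: a >= 65, epsilon=0.5)` on a hypothetical list `ages` would return a slightly different answer each time, close to the true count but never exactly revealing it.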

2) Existing economic statistics are riddled with problems.  I can’t say this enough — if existing ways of collecting and analyzing data are not quite good enough, we need to be open to new ones.

3) This is an old article, but highlights an issue Mimi and I have been thinking a lot about recently: How can data, even when shared according to your precise directions, reveal more than you intended? In this case, researchers found you could more or less determine the sexual orientation of people on Facebook based on their friends, even if they hadn’t indicated it themselves.  Privacy is definitely about control, yet how do you control something you don’t even know you’re revealing?

4) This past week, the Supreme Court heard a case involving the right to privacy of those who sign petitions to put initiatives on the ballot.  There is a lot going on in this case (gay rights, the experience of those in California who were targeted for supporting Prop 8, the difference between voting and legislating, etc.), but overall, it’s a perfect illustration of how complicated our understanding of public and private has gotten.  We leave those lists open to scrutiny so we can prevent fraud (people signing “Mickey Mouse”), but public when you can go look at the list at the clerk’s office and public when you can post information online for millions to see are two different things.  There may be reasons we want to make these names public other than to prevent fraud (Justice Scalia thinks so), but are there other ways fraud could be detected among signatories that would not require an open examination of all petition signers’ names?  Could modern technology help us detect odd patterns, fake names and more without revealing individual identities?

Yea or Nay: Credit Checks on Job Applicants

Monday, April 12th, 2010

Should employers continue to be allowed to check your credit history as a part of the job application process?

The biggest argument against this appears to be the lack of evidence showing a connection between credit history and job performance.

Sort of interesting to think about this in the context of other things employers ask about that may or may not have anything to do with job performance.

  1. Have you ever set a world record in anything?
  2. Do you play World of Warcraft?
  3. You have one fox and two chickens…

Number of subway passengers from Powell Station = retail revenues?

Thursday, April 8th, 2010

Spinn via Flickr/Creative Commons License Attribution

The Wall Street Journal reports that economists are looking to “oddball data” to see trends before official numbers are released.

We’re obviously a little obsessed with data reuse, and the more imaginative, the better. There’s Ted Egan, the chief economist in the San Francisco Controller’s office, who looks at weekend passenger tallies for the Union Square shopping district rather than wait six months for the state’s official retail revenue numbers.  Then there’s Edward Leamer, the economist who discovered that diesel fuel sales on Interstate Highway 5 are a leading indicator of construction employment in California, while diesel sales on Interstate Highway 80 are an indicator of manufacturing employment.

The people who collected this data surely didn’t imagine it being used this way, which is why we should be really careful about closing off data reuse before we even know what the potential reuses are.  And, as these economists have found, these indicators are often faster and arguably, more accurate.

(P.S.  I used to live in San Francisco.  I know the Powell St. trolley is not the same as the Powell BART station.  Sorry.)

