
Why Anonymized Data Isn't

kdawson posted about 5 years ago | from the can't-keep-good-PII-down dept.

Data Storage 280

Ars has a review of recent research, and a summary of the history, in the field of reidentification — identifying people from anonymized data. Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both." "...in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. ... For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. ... Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization."
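To make the Sweeney figure concrete, here is a minimal sketch of how one might measure the share of records that are unique on a set of "quasi-identifiers" (ZIP code, birthdate, sex). The function name `unique_fraction` and the toy records are made up for illustration; this is not Sweeney's method or data, just the basic counting idea.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records whose quasi-identifier combination is unique."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records) if records else 0.0

# Toy, made-up records with names already stripped ("anonymized").
records = [
    {"zip": "02138", "birthdate": "1950-05-05", "sex": "M", "diagnosis": "A"},
    {"zip": "02138", "birthdate": "1962-03-14", "sex": "F", "diagnosis": "B"},
    {"zip": "02139", "birthdate": "1962-03-14", "sex": "F", "diagnosis": "C"},
    {"zip": "02139", "birthdate": "1980-01-01", "sex": "M", "diagnosis": "D"},
    {"zip": "02139", "birthdate": "1980-01-01", "sex": "M", "diagnosis": "E"},
]

# Share of records that the three fields alone pin down to a single row.
fraction = unique_fraction(records, ["zip", "birthdate", "sex"])
```

In this toy set, three of the five rows are unique on the three fields, while no row is unique on sex alone. Anyone holding a second list that maps the same three fields to names could attach a name to each unique row.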


280 comments


Damn voyeurism is all it is (5, Insightful)

Ethanol-fueled (1125189) | about 5 years ago | (#29355107)

For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm.

...And this is the first thing that the author(s) thought of regarding data-mining? Okay, but how would this happen? Why go through all the trouble to gather all that data when you could just hire a P.I. or know (or bribe) a law-enforcement official or an ISP employee? It reminds me of a conversation I had with a guy who bragged that he could get anybody's info because a very good friend of his worked at the DMV. There were a couple of semi-high-profile firings at the State Department because some employees snooped through celebrities' records for no reason other than voyeurism... er... curiosity.

Those types, the ones with the direct access to the info, are the weakest link. They're only human. "Hey, Bob, there's this guy I really hate. Look up his IP logs and tell me what you see!"

It all boils down to voyeurism. People would rather bring others down than bring their own lives up. It's the nature of the beast! Pathetic.

Re:Damn voyeurism is all it is (0, Flamebait)

winkydink (650484) | about 5 years ago | (#29355233)

But the voyeurism slant isn't newsworthy. Oh wait. Neither is this.

Re:Damn voyeurism is all it is (5, Insightful)

causality (777677) | about 5 years ago | (#29355329)

But the voyeurism slant isn't newsworthy.

Then how do you explain shows like Entertainment Tonight and all of these magazines and Web sites devoted entirely to completely useless celebrity trivia? Y'know, the ability to obsess over the personal life of someone you have never met and will never personally know, merely because they can sing or act, should be recognized as a pathology. Voyeurism only seems to partly explain it; much of it seems to come from an empty and unsatisfying life that leads to an attempt to live vicariously through some sort of idol which is perceived to be successful, in the sense that "most men lead lives of quiet desperation". However stupid and useless it may be, I can't deny that many do consider it newsworthy and much of "the news" includes such elements.

Re:Damn voyeurism is all it is (0)

Anonymous Coward | about 5 years ago | (#29355677)

Except that most people who watch Entertainment Tonight and such aren't "obsessed" with celebrity trivia. Interest =/= obsession.

Re:Damn voyeurism is all it is (2, Insightful)

causality (777677) | about 5 years ago | (#29355981)

Except that most people who watch Entertainment Tonight and such aren't "obsessed" with celebrity trivia. Interest =/= obsession.

Dear AC, perhaps we are using different definitions of "obsession." Here's mine: when something cannot possibly benefit your life in any measurable way whatsoever, and you devote energy to pursuing it anyway, this is something of an obsession. To me, an interest is something different. The RIAA has an interest in strong copyright laws. Why? Because the RIAA is benefitted by strong copyright laws. Therefore, it's not a surprise that the RIAA tries to bring them about. However, it doesn't do a damned thing for me to know that $ACTRESS is thinking of divorcing her husband. I don't benefit from knowing this, therefore I can accurately say that it is not in my interests. Her family and personal friends might have an interest in this, and with good reason, but then they wouldn't need to find out second-hand from a TV show either.

Think about it this way. If we treated all industries equally, in the sense that all industries were treated just like the entertainment industry, then anytime you bought a car or a computer it would come with a big long list containing the names of all the members of management, designers, and factory workers who produced it as well as the truck drivers who shipped it and the advertisers who marketed it. We would then have TV shows and magazines talking about the personal private lives of those people who produced your cars and computers, whom they marry, how many times they divorce and why, what goes on behind closed doors in their homes, and paparazzi would follow them around and try to get "exclusive" or embarrassing photos of them. Additionally, average people who never met any of them would talk about them fondly as though they personally knew them.

Now if this happened for the automobile or computer industries, and I said it was obsessive behavior, on what grounds would you dispute that? Real question. I'd like to know.

Re:Damn voyeurism is all it is (1)

R2.0 (532027) | about 5 years ago | (#29355831)

"Then how do you explain shows like Entertainment Tonight and all of these magazines and Web sites devoted entirely to completely useless celebrity trivia? Y'know, the ability to obsess over the personal life of someone you have never met and will never personally know, merely because they can sing or act, should be recognized as a pathology. Voyeurism only seems to partly explain it"

But is it really voyeurism if no sex is involved? I mean, it's one thing to know that Lindsay Lohan is a slut, but big deal.

Show me primetime video of her being bent over a car hood by her boi/gurl/whatever friend, THAT's voyeurism. (And quality TV, to boot.)

Some perspective please. (2, Insightful)

mosb1000 (710161) | about 5 years ago | (#29356035)

How is this any different from articles about rockets and space travel (after all, most of us will never travel into space, or work for NASA)? Or any of a myriad of technical subjects that most of us are not, and will not be, directly involved in or using?

People are curious. They are curious about everything. It's an exercise in futility to pick and choose useful information over non-useful information since none of us knows what tomorrow holds. If someone wants to read celebrity gossip, more power to them. In truth, the gossip is more likely to be both true and useful than news about a new process that may produce titanium at half the cost or an article about NASA's next big toy. We on Slashdot find the technical news more interesting; normal people who are interested in interpersonal relationships find the gossip more interesting. It's two sides of the same coin.

Re:Damn voyeurism is all it is (2, Insightful)

mea37 (1201159) | about 5 years ago | (#29355491)

Do you mean, you think you could've gotten an individual's medical records in MA for less than $20? Or maybe you can't see why someone would dig up an individual's medical records? (I can think of many... but then my employer was extorted by someone who'd stolen a bunch of medical-related data from them not that long ago.)

I think I hear a bit of "nobody would go to all that trouble" in your message. If in the early days of WiFi networks I described to you in tedious yet vague terms how to compromise WEP encryption, you probably would've thought the same thing. Today anyone who cares to can break WEP using readily available tools - it's really no bother at all if you're even slightly inclined to do it.

I've seen companies with contractual and regulatory obligations to protect data privacy make half-gestures to make it look like they're honoring privacy while still engaging in whatever easy-money scheme or shortcut they want. Shedding light on why those half-gestures don't work is a big deal.

Re:Damn voyeurism is all it is (4, Insightful)

causality (777677) | about 5 years ago | (#29356179)

Do you mean, you think you could've gotten an individual's medical records in MA for less than $20? Or maybe you can't see why someone would dig up an individual's medical records? (I can think of many... but then my employer was extorted by someone who'd stolen a bunch of medical-related data from them not that long ago.)

I think I hear a bit of "nobody would go to all that trouble" in your message. If in the early days of WiFi networks I described to you in tedious yet vague terms how to compromise WEP encryption, you probably would've thought the same thing. Today anyone who cares to can break WEP using readily available tools - it's really no bother at all if you're even slightly inclined to do it.

I've seen companies with contractual and regulatory obligations to protect data privacy make half-gestures to make it look like they're honoring privacy while still engaging in whatever easy-money scheme or shortcut they want. Shedding light on why those half-gestures don't work is a big deal.

That's the thing that I also think people don't understand. With good reason, I am not satisfied merely that someone probably wouldn't want to abuse my information. I am satisfied only when I know that they cannot do so.

I think the solution is to have the concept of "intellectual property" work both ways. Obviously your private information has value, otherwise advertisers and other companies wouldn't go to such great lengths to obtain and use it. The problem is that they obtain it without your consent and without directly compensating you. For example, if I don't actively block web bugs, cookies, HTTP "ping", analytics tools, and other similar attempts, then that data will be gathered whether or not I like it.

The reason why I actively go out of my way to prevent companies from gathering data on me is simple. No one asked me if I wanted to be data-mined. I refuse to honor agreements in which I did not participate. Why anyone else would do so is a mystery to me.

So make each individual's private data their personal property. They can set whatever value they like, and if that value is more than a company thinks it is worth, the company is free to decline the sale. Most importantly, any attempt to just take that data will be theft, and anyone who does this can be prosecuted in a criminal court. I mean, think about it: why is it "marketing" when a company helps itself to my information against my will and "piracy" or "industrial espionage" if I helped myself to THEIR zeroes and ones against their will?

The Only Truly Anonymous Data (1, Informative)

Anonymous Coward | about 5 years ago | (#29355135)

The only way to make sure that data remains truly anonymous is for it to start out as anonymous data. "Scrubbed" data will always be traceable and often will have the source data, non-scrubbed, leak into the wild.

All hail the glorious Hypno-Google.

Re:The Only Truly Anonymous Data (0, Troll)

oodaloop (1229816) | about 5 years ago | (#29355173)

That's pretty cowardly.

Re:The Only Truly Anonymous Data (1)

shentino (1139071) | about 5 years ago | (#29355649)

Especially if you're outed by a friend who posts RL info without your permission that you can't retract.

Had this happen to me once.

Re:The Only Truly Anonymous Data (0)

Anonymous Coward | about 5 years ago | (#29355859)

It's 2009. If you are still in the closet you NEED to be outed by a friend. He did you a favor, closet-boy.

I know you feel righteous about this cause, but... (0)

Anonymous Coward | about 5 years ago | (#29356039)

what if your family members' lives become endangered if you "come out"? true fucking story, Jack, of a friend of mine that was a lesbian. Someone threw her out of a second story window for even hinting at coming out. Parents were very important government officials of a not so understanding government. Details left out so they can all live.

Re:The Only Truly Anonymous Data (0)

Anonymous Coward | about 5 years ago | (#29355945)

Obviously the appropriate response would have been to post your "friend's" phone number on 4chan.

Bonus points if you can come up with a story that actually entices Anon to call en masse, but some are likely to do it anyway just for the lulz.

Paul Ohm? (4, Funny)

Yvan256 (722131) | about 5 years ago | (#29355161)

Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both."

Great, another Ohm's law [wikipedia.org] to learn.

Re:Paul Ohm? (4, Informative)

natehoy (1608657) | about 5 years ago | (#29355231)

Nonsense, it could be an extension of the current Law:

"In electrical circuits, Ohm's law states that the current through a conductor between two points is directly proportional to the potential difference or voltage across the two points, and inversely proportional to the resistance between them. In data anonymity, the law states that the general usefulness of any set of data that originally contained personally-identifiable information is inversely proportional to the degree of anonymity applied to said data."

See, one simple law to memorize, and now data analysts learn just a teensy bit about electricity and EEs learn just a teensy bit about data anonymization.

Re:Paul Ohm? (2, Funny)

2names (531755) | about 5 years ago | (#29355529)

Could you put that in the form of a car analogy so us laymen can understand it please? :)

Re:Paul Ohm? (4, Informative)

Beardo the Bearded (321478) | about 5 years ago | (#29355819)

Okay, let's take a road. The speed at which traffic can travel depends on the quality of the surface, gradient, camber, zoning, etc. Let's call this the "road conditions", with a lower number being better roads.

The number of cars that want to get through that road is a primary unit, which we can refer to as the "volume of traffic".

The third major criterion is the speed at which the traffic actually flows. This is the "actual flow" of traffic -- in other words, the "influence of other cars" on the traffic congestion.

In other words:
volume = influence of traffic * road conditions

or:
V = IR

Oh dammit. (0, Redundant)

Anonymous Coward | about 5 years ago | (#29355189)

They're on to me.

Re:Oh dammit. (0)

Anonymous Coward | about 5 years ago | (#29355497)

Are they onto me, too?

Duh. (3, Informative)

SatanicPuppy (611928) | about 5 years ago | (#29355193)

Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

I mean, seriously. They don't need to know. Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.

Re:Duh. (0)

Anonymous Coward | about 5 years ago | (#29355235)

Yes you are. I'm watching you from across the street now.

Re:Duh. (4, Funny)

ColdWetDog (752185) | about 5 years ago | (#29355289)

I just put "No" under sex. I like to tell the truth. Not sure how it helps on the ID end though.

Re:Duh. (4, Funny)

Anonymous Coward | about 5 years ago | (#29355433)

I put "please!" and it doesn't seem to help either.

Re:Duh. (2, Funny)

syrinx (106469) | about 5 years ago | (#29355493)

It identifies you as a Slashdotter...?

20500 (1)

jDeepbeep (913892) | about 5 years ago | (#29355371)

Any particular reason you chose District of Columbia?

Re:20500 (5, Insightful)

natehoy (1608657) | about 5 years ago | (#29355385)

Because everyone knows that EVERYONE in DC lies.

Re:20500 (1)

BassMan449 (1356143) | about 5 years ago | (#29355657)

Good thing I don't have mod points. I would have a very hard time deciding if that should be modded Funny or Insightful

Re:Duh. (3, Insightful)

garcia (6573) | about 5 years ago | (#29355383)

Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

I use 1/1/1979 (it's closer to my real age) and 90210 instead. I get a lot of crosseyed looks and many times the cashier (or whatever human I'm dealing with) will end up entering in a local zip code instead but people are no longer arguing w/me about what I choose to provide them when pressured for information (I always politely reply, "no thanks," when asked for that type of information but will give them false shit when they ask again and whine that they'll be fired).

Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.

Because the majority of people have absolutely no problems handing over any and all information they're prompted for up to and including their e-mail address, phone number or even SSN! Because most people don't even blink, those of us that don't feel like it should be anyone's business (like the scanning of IDs at liquor stores or bars to check age--there is a birthdate listed on IDs for a fucking reason people--not that they can scan my rare earth magnet swiped ID anyway) are looked at like assholes when we refuse to provide information that no one really needs anyway.

Re:Duh. (2, Informative)

mmkkbb (816035) | about 5 years ago | (#29355643)

(like the scanning of IDs at liquor stores or bars to check age--there is a birthdate listed on IDs for a fucking reason people--not that they can scan my rare earth magnet swiped ID anyway)

That's not to check age; that's to check for counterfeits with mismatched mag data, or mismatched 2-D barcode data, or missing UV ink prints, or missing holograms, etc. etc.

Re:Duh. (2, Funny)

ACMENEWSLLC (940904) | about 5 years ago | (#29355669)

This makes me think of a probably not unique idea. Most places that ask for my phone number are the same places asking over and over again. Radio Shack, Toys-R-Us, and Sears, for example. What would be great is to memorize one of their phone numbers from the phone book and always give them that. Perhaps a number from a different store. Let their telemarketers waste time calling their own stores.

Re:Duh. (1)

foniksonik (573572) | about 5 years ago | (#29356063)

Your "account" is indexed under your phone number - they are looking it up to know what offers they should let you in on, check to see if you have a store credit card or should have one and of course to build their profile on you.

They don't care about your phone number other than that it is a unique identifier.

Re:Duh. (1)

ModifiedDog (514241) | about 5 years ago | (#29355725)

My phone number is always 555-1212.

Re:Duh. (4, Funny)

plague3106 (71849) | about 5 years ago | (#29355771)

I once gave a GameStop employee my zip as 12345. He said, "It's ok if you don't want to give it." My reply was that no, I am from Schenectady, NY.

Re:Duh. (1)

antifoidulus (807088) | about 5 years ago | (#29355445)

You forgot phone number "867-5309"

Re:Duh. (1, Informative)

Anonymous Coward | about 5 years ago | (#29355461)

Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

But be careful. Using the same fake data consistently still allows someone to correlate across different records. For instance the aggregate data from various websites where you've filled-in data would identify you (with reasonably high probability) as being a single person. Then all it takes is one database that has enough info to link back to your real identity for your anonymity to be gone again.

I'm not saying that the average company would go to that much effort. I'm just saying that if you're going to be paranoid about anonymity, you should vary the data you provide somewhat randomly.
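A minimal sketch of the correlation risk described above, assuming two hypothetical sign-up databases that both store the same self-reported fields. The helper `link_by_fields`, the field names, and all records are invented for illustration.

```python
def link_by_fields(dataset_a, dataset_b, fields):
    """Pair up records from two datasets that agree on every field in `fields`."""
    index = {}
    for rec in dataset_b:
        index.setdefault(tuple(rec[f] for f in fields), []).append(rec)
    links = []
    for rec in dataset_a:
        for match in index.get(tuple(rec[f] for f in fields), []):
            links.append((rec, match))
    return links

# Hypothetical sign-up records from two unrelated sites.
site_a = [
    {"user": "nightowl", "birthday": "1970-01-01", "zip": "20500", "phone": "867-5309"},
    {"user": "someone_else", "birthday": "1984-06-02", "zip": "60614", "phone": "555-0100"},
]
site_b = [
    {"email": "real.name@example.com", "birthday": "1970-01-01", "zip": "20500", "phone": "867-5309"},
    {"email": "other@example.com", "birthday": "1991-11-23", "zip": "98101", "phone": "555-0199"},
]

# The same fake values reused on both sites tie the pseudonym to the email address.
links = link_by_fields(site_a, site_b, ["birthday", "zip", "phone"])
```

The link is only as strong as the combination is distinctive. If thousands of people give 1/1/1970 and 90210, a match on those values says little, which is one argument for either very common fake values or values that vary from site to site.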

Re:Duh. (5, Funny)

interkin3tic (1469267) | about 5 years ago | (#29355473)

Yes you are. I always put 90210. Phone number 867-5309. If anyone tries to find me, they're at least going to have that song stuck in their head and recall with disgust the shows they watched in the early 90's. Hopefully that will demoralize them enough to give up.

Re:Duh. (3, Funny)

compro01 (777531) | about 5 years ago | (#29355487)

I would think 90210 is a more common choice for zip code. It's probably the most densely populated area on the planet according to dataminers.

Re:Duh. (1)

davester666 (731373) | about 5 years ago | (#29355543)

I use 1/1/00, and a zip code of 12345. So, either a fairly young child or a rather old person has quite the interest in porn...

Re:Duh. (0)

Anonymous Coward | about 5 years ago | (#29355661)

I usually go by 01/01/1980 and 90210.

I guess I'm about 10 years younger than you. I don't live in the United States.

Re:Duh. (1)

R2.0 (532027) | about 5 years ago | (#29355743)

I always use 20001. Pretty sure it covers the White House and Congress.

Re:Duh. (0, Troll)

ShieldW0lf (601553) | about 5 years ago | (#29355803)

There are 3 different scenarios.

One is where the facts are embarrassing because of hypocrisy. That scenario is alleviated by loss of anonymity and increased transparency.

Another is where the person is engaged in anti-social behavior because they are in a reactionary state and need assistance because they are going off the rails in isolation. That scenario is also alleviated by the loss of anonymity. People will get the help they need to be happy and self-reliant if society at large is aware that they need it.

The third scenario is when people are engaged in premeditated anti-social behavior because they are amoral and vicious or involved in a conspiracy against the best interests of their peers for ideological reasons. Those people belong in the ground, and we should not be protecting their obscurity.

There is no justification for anonymity, nor for secrecy. Anonymity and secrecy preserve and reinforce the hazards that they purport to protect people from. They need to be abolished in a systematic fashion that doesn't expose early adopters to the dangers of hypocrisy, but tears the veils away for everyone all at once.

Re:Duh. (1)

Mycroft_VIII (572950) | about 5 years ago | (#29356145)

Yes because no group or subset of society has ever been wrongly subjected to bias and everyone loves targeted advertising in their mailbox (physical or email).

I know, I know, don't feed the trolls......

Re:Duh. (1)

Beardo the Bearded (321478) | about 5 years ago | (#29355913)

No, I use 90210 because I know that's a valid code.

I've given out random birthdays so many times that I have to check my DL before I order a cake.

Re:Duh. (1)

TREE (9562) | about 5 years ago | (#29356017)

I use 1/1/1970 often, because it's ZERO in Unix time (UTC).
Shocked no one has "gotten that" yet, even here.
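For anyone who has not "gotten that": Unix timestamps count seconds from midnight UTC on 1 January 1970, so timestamp 0 is exactly that date. A quick check with Python's standard `datetime` module:

```python
from datetime import datetime, timezone

# Timestamp 0 interpreted in UTC is the Unix epoch.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# Going the other way: midnight UTC on 1970-01-01 is zero seconds since the epoch.
midnight_1970 = datetime(1970, 1, 1, tzinfo=timezone.utc)
seconds = midnight_1970.timestamp()
```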

I'm perfectly anonymous! (2, Funny)

A beautiful mind (821714) | about 5 years ago | (#29355199)

See!

-- Anonymous Coward

Only three bits? (4, Funny)

Yvan256 (722131) | about 5 years ago | (#29355211)

[researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex.

Holy hell forget about that anonymized data crap, I want to learn how she can compress that much data into three bits!
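Joke aside, a rough back-of-the-envelope estimate shows why three fields can be enough. The figures below are assumptions chosen for illustration (approximate ZIP code count, about 80 birth years, rough US population) and are not taken from Sweeney's paper. The estimate also treats every combination as equally likely, which real populations are not, so it is an upper bound on the information the fields carry.

```python
import math

# Rough, assumed figures for illustration only (not taken from Sweeney's paper).
zip_codes = 42_000          # approximate number of US ZIP codes
birthdates = 365 * 80       # one day of the year across roughly 80 birth years
sexes = 2
population = 300_000_000    # rough US population

# Number of distinct (ZIP, birthdate, sex) combinations under these assumptions.
combinations = zip_codes * birthdates * sexes

# Bits of information the three fields could carry vs. bits needed to single out one person.
bits_available = math.log2(combinations)
bits_needed = math.log2(population)
```

Under these assumptions the three fields offer a little over 31 bits, while picking one person out of the population needs a little over 28. There are more possible combinations than people, so most combinations that occur at all are held by only one person.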

Re:Only three bits? (1)

clone53421 (1310749) | about 5 years ago | (#29355253)

For values of "all" equal to or less than 8, to index a table...

Re:Only three bits? (1)

Yvan256 (722131) | about 5 years ago | (#29355301)

I think you mean equal to or less than 7. You forgot to count zero.

Re:Only three bits? (1)

clone53421 (1310749) | about 5 years ago | (#29355327)

No, I mean equal to or less than 8. When counting Americans, zero isn't a useful number, because if we used zero Americans for our research then we wouldn't have anything to publish.

Re:Only three bits? (1)

clone53421 (1310749) | about 5 years ago | (#29355337)

...by which I mean, an empty table is != a table with one record, index # 000.

Re:Only three bits? (1)

An ominous Cow art (320322) | about 5 years ago | (#29355853)

It's not like he's some kind of neuromancer.

Re:Only three bits? (0)

Anonymous Coward | about 5 years ago | (#29355317)

Well if sex = yes then it excludes most of Slashdot posters.

Mission Impossible (5, Insightful)

im_thatoneguy (819432) | about 5 years ago | (#29355219)

I've pretty much given up any hope of being anonymous. It's just going to get exponentially more difficult as time goes on.

I had my credit card stolen once. It was stolen from the CC company. How is a business supposed to entrust me with thousands of dollars in credit if they don't know who I am? How is a credit card company supposed to function without a worldwide network which authorizes transactions?

If someone wants to find me they'll find me.

If someone wants to use my identity to frame me for a crime then they're just going to encounter a mountain of evidence from numerous sources which contradict their fabrication.

"My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.

Re:Mission Impossible (1, Interesting)

Anonymous Coward | about 5 years ago | (#29355537)

Mission Impossible

You're thinking of the wrong Cruise flick.
Take all that tracking information then add a few heuristics and you've got Minority Report.

Re:Mission Impossible (1, Interesting)

Anonymous Coward | about 5 years ago | (#29355545)

How is a business supposed to entrust me with thousands of dollars in credit if they don't know who I am?

They entrust the credit card company with credit. The credit card company entrusts you with credit, or in the case of prepaid credit cards, the credit card company takes your money and you trust them to relay it to the companies that you do businesses with. Only the credit card company needs to know your name and only if you don't give them the money up front.

"My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.

That is all volatile information in computer databases, which are unlikely to be cryptographically secured or in any other way tamper-proof. The same databases which put you where you really were can put you near a crime scene and all the traces you mentioned would not be any more trustworthy than the false data.
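For contrast, here is a minimal sketch of what a tamper-evident log could look like, using a SHA-256 hash chain from Python's standard `hashlib`. This is an illustration of the general idea only, not a description of how any real toll, card, or camera system stores its records; the function names and events are made up.

```python
import hashlib
import json

def append_entry(log, event):
    """Append an event whose hash covers both the event and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": digest})

def verify(log):
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"time": "12:01", "what": "toll card read"})
append_entry(log, {"time": "12:09", "what": "card purchase"})
intact = verify(log)            # chain checks out before any edits

# Quietly rewriting an old record is detected on the next verification.
log[0]["event"]["what"] = "toll card read, 50 miles away"
tampered_ok = verify(log)
```

Even this only detects edits. Someone able to rewrite the whole log can recompute every hash unless the latest hash is also kept somewhere they cannot reach, which supports the point that ordinary database records prove little on their own.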

Re:Mission Impossible (2, Insightful)

riqtare (264681) | about 5 years ago | (#29355547)

If access to the evidence you just stated was available to the framer it makes it very easy to find a likely fall guy according to their habits. Makes the alibi of overwhelming evidence evaporate into prime suspicion.
The best lies are those that are mostly truth.

Re:Mission Impossible (1)

ArsonSmith (13997) | about 5 years ago | (#29355597)

So you're saying you robbed the coffee shop?

Re:Mission Impossible (1, Insightful)

Anonymous Coward | about 5 years ago | (#29355871)

"My G1 was on a Starbucks Wifi at the time of the crime. I used my CC to purchase the drink. I received a text from a nearby tower. I posted a comment on breaking news story that is written in my style of writing. I was seen on 8 security cameras walking to the starbucks from my car. I used an automatic toll card 5 miles away from the coffee shop...." Good luck coming up with a large mountain of evidence to put me somewhere else.

They won't need luck. If they're trying to frame you (and especially if you're helping them by being so cavalier about your privacy), then they'll know all of the above, and thus how to avoid conflicts between their evidence and yours. You'll simply have committed the crime at a time and place for which you have no alibi, or in a way that makes the time and location irrelevant.

At least one fact about them could be used (1)

RiotingPacifist (1228016) | about 5 years ago | (#29355245)

[citation needed]
I can't think of anything I've done online (even my shemale midget fetish on youpron) that could be used to blackmail me. Now, I get that others are more ashamed about what they do online, but "almost everybody"?

Re:At least one fact about them could be used (2, Insightful)

interkin3tic (1469267) | about 5 years ago | (#29355667)

I did think that was an overstatement that undermined the main point. None of my prescriptions would be embarrassing to anyone but a holistic medicine believer, and I've told some tasteless jokes online. If someone were to send that information to my family along with what porn I looked at, that would be awkward at most. And that's assuming it's credible, which it wouldn't be.

How exactly would this blackmail work? Bob, the evil co-worker threatens to tell your wife and boss you have had a sex change, a running prescription to anti-psychotic medication, were arrested for something that they don't know about and you weren't legally obligated to inform them of, and look at gay porn approximately 30% of your waking hours. For this hypothetical situation, assume that information is true. Do you do what he wants? If you don't and he does tell your wife and boss, do they actually believe him?

I think privacy is good for privacy's sake, overstatements such as this undermine the point.

Re:At least one fact about them could be used (-1, Offtopic)

Anonymous Coward | about 5 years ago | (#29355763)

sorry to ask but wtf is a shemale anyway? is it a person who is a woman in every way, breasts and curves and everything, but somehow has a penis? how does that happen, some kind of genetic defect? or are they hermaphrodites with both types of genitals yet look perfectly like women if you saw them clothed? or are they men who have undergone a sex change but have retained their penis? i hope this isn't a dumb question that everybody already knows the answer to. i just really dont understand some things.

Re:At least one fact about them could be used (1)

5KVGhost (208137) | about 5 years ago | (#29355779)

I'm skeptical about that claim, too, but I think the author also intended it to include real-world activities. For example, you've called in sick to work, but records of your activity suggest that you were actually at a job interview / romantic liaison / midget convention over on the other side of town.

Re:At least one fact about them could be used (1)

mdf356 (774923) | about 5 years ago | (#29355791)

I can't think of anything I've done online (even my shemale midget fetish on youpron) that could be used to blackmail me

Same here. However, the next bit of text is more relevant:

discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm.

There's almost certainly something that can be used to discriminate against you, harass you, or steal your identity, causing legally cognizable harm. Blackmail is just for the people ashamed of what they do; the rest affects everyone.

Bah, humbug. (2, Funny)

jdgeorge (18767) | about 5 years ago | (#29355273)

Forget anonymity. I'm better off living in a glass house, so it's easier for me to know when I need to yell "Get off my lawn!"

Let's see (0)

ceoyoyo (59147) | about 5 years ago | (#29355275)

"Ohm terms a central reality of data collection: 'Data can either be useful or perfectly anonymous but never both.'"

Okay, I just got finished anonymizing some data. What's going out is ID (incremented, starting at 1), total voxels, voxels increasing and voxels decreasing. The people who are getting the data think it is highly useful. According to Ohm's "law" that means it is not anonymous.

Unless someone (including Ohm) pipes up with a plausible means for identifying the original subjects, I call BS on Ohm's "law."

anonymization is bullshit (2, Insightful)

Anonymous Coward | about 5 years ago | (#29355277)

Even if the data is completely and irreversibly anonymized, it is still invasive. Look at the story yesterday about the marketers data-mining kids' online private conversations for consumer gadget preferences. Even if there's no way from that data to infer the preferences of any particular kid, they should still be able to talk to each other without having their conversation be part of a marketing survey.

Think also of a cafe that sells two kinds of food: apple pie (eaten by freedom-loving patriots), and felafel (eaten by terrorists and their supporters and sympathizers). Of course it would be invasive for the cafe to disclose which of its customers ordered which kind of food. But even releasing aggregate statistics is bad. An increase in felafel sales can lead to a bullshit fbi investigation [wired.com] even if individual customers aren't identified.

People sitting on private data constantly search for self-serving justifications to disclose as much as they can without getting clobbered by the sources of the data. It is bullshit. Private should mean no disclosure, not anonymized disclosure, not aggregate disclosure, just plain no disclosure period.

Re:anonymization is bullshit (2, Interesting)

blahplusplus (757119) | about 5 years ago | (#29355647)

"Private should mean no disclosure, not anonymized disclosure, not aggregate disclosure, just plain no disclosure period."

The profit motive and privacy are at odds: trying to make the most money and sell the most stuff means you want to know everything about everyone so that you can one-up your competitors. It's a race to the bottom. Ideals in the real world always submit to the pragmatic concerns of making money in a capitalist society.

Re:anonymization is bullshit (0)

Anonymous Coward | about 5 years ago | (#29355807)

It is a GOOD thing if people have a better understanding of people as a whole. And that is all aggregate stats give you (assuming they're done right of course).

Your real complaint is with FBI profiling large groups of people based on aggregate data. They're going to do that anyway, they're going to use their own data for it not the stuff you're complaining about, and even if you could force them to have worse data, it helps nobody. That would just turn their mostly worthless crackdowns into totally worthless crackdowns without addressing any of the underlying problems.

PS: Won't somebody think of the kids? I'm shocked, shocked to find out retail companies are trying to understand their young customers better. Eeeeeeevil.

Remember "Mother Earth" and the Espionage Act (2, Informative)

RevWaldo (1186281) | about 5 years ago | (#29355355)

If you ever wonder why people view the privacy of your records in the hands of third parties as important, and don't just hop on the "privacy is dead" bandwagon, this is the sort of scenario they have in mind.

http://en.wikipedia.org/wiki/Mother_Earth_(magazine) [wikipedia.org]

Mother Earth was an anarchist journal that described itself as "A Monthly Magazine Devoted to Social Science and Literature," edited by Emma Goldman. Alexander Berkman, another well-known anarchist, was the magazine's editor from 1907 to 1915. It published longer articles on a variety of anarchist topics including the labor movement, education, literature and the arts, state and government control, and women's emancipation, sexual freedom, and was an early supporter of birth control. Its subscribers and supporters formed a virtual "who's who" of the radical left in America in the years prior to 1920.

In 1917, Mother Earth began to openly call for opposition to American entry into World War I and specifically to disobey government laws on conscription and registration for the military draft. On June 15, 1917, Congress passed the Espionage Act. The law set punishments for acts of interference in foreign policy and espionage. The Act authorized stiff fines and prison terms of up to 20 years for anyone who obstructed the military draft or encouraged "disloyalty" against the U.S. government. After Emma Goldman and Alexander Berkman continued to advocate against conscription, Goldman's offices at Mother Earth were thoroughly searched, and volumes of files and detailed subscription lists from Mother Earth, along with Berkman's journal The Blast, were seized. As a Justice Department news release reported:

"A wagon load of anarchist records and propaganda material was seized, and included in the lot is what is believed to be a complete registry of anarchy's friends in the United States. A splendidly kept card index was found, which the Federal agents believe will greatly simplify their task of identifying persons mentioned in the various record books and papers. The subscription lists of Mother Earth and The Blast, which contain 10,000 names, were also seized."

Mother Earth remained in monthly circulation until August 1917.[1] Berkman and Goldman were found guilty of violating the Espionage Act, (imprisoned for two years) and were later deported.

Just three bits? (0)

Anonymous Coward | about 5 years ago | (#29355377)

I understand that one bit alone will do to specify the sex, but how does one specify ZIP code with just one bit? One bit will tell you whether or not somebody has a ZIP code, but in order to specify a ZIP code completely we need - what, 16 bits?

Re:Just three bits? (0)

Anonymous Coward | about 5 years ago | (#29355557)

You are being too literal.

When someone says: "Let me give you a bit of advice." they don't continue by saying: "One" or "Zero", they give you advice.

In this case "bit" is being used to describe an individual datum: {Zipcode, sex, birthdate}

Damnit, I fed the troll...sigh.

This parallels encryption (0, Flamebait)

DontLickJesus (1141027) | about 5 years ago | (#29355405)

All persons who understand encryption also understand that there is no such thing as perfect encryption. Anonymizing(sp?) data works using roughly the same methods as encryption, and there is no such thing as an unbreakable encryption. We can only hope for "acceptable". I'd assume the most acceptable means of anonymizing data would be to allow the user to first choose what gets scrubbed out, followed by a sort of data "blacklist" compiled by experts. The real problem here is that companies selling this data have a vested interest in never getting it quite right.

Three things? Really? (2, Insightful)

Applekid (993327) | about 5 years ago | (#29355435)

So, despite the Birthday Paradox [wikipedia.org] , they can still identify 87% of Americans? For some reason I'm under the impression that there are a lot more zip codes with more than 366 people (heck, even 1000 to call upon 3 or 4 duplicates that should cover gender differences) than there are zip codes under that amount.

Re:Three things? Really? (5, Informative)

Daniel_Staal (609844) | about 5 years ago | (#29355571)

That Paradox ignores the year. Add that in and it starts to become harder.

Re:Three things? Really? (2)

clone53421 (1310749) | about 5 years ago | (#29355637)

Your birthdate includes the year. Your birthday does not (at least for this discussion).

The party trick of finding two people with the same birthday (a good probability in any group of 30 people or more) doesn't require them to have the same year of birth (although in most gatherings there's a good chance of this as well since often it's already somewhat segregated by age).
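
As a rough check on the party-trick claim, the chance of at least one shared birthday can be computed directly. This is only a sketch: it assumes 365 equally likely birthdays and ignores leap days, and the function name is made up for illustration.

```python
def shared_birthday_probability(group_size: int, days: int = 365) -> float:
    """Probability that at least two of group_size people share a birthday.

    Assumes every day of the year is equally likely (an idealization).
    """
    if group_size > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_distinct = 1.0
    for i in range(group_size):
        # The i-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (23, 30):
        print(n, round(shared_birthday_probability(n), 3))
```

Under those assumptions the probability passes one half at 23 people and is roughly 0.7 at 30. It covers month and day only, so it says nothing about full birthdates.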

Re:Three things? Really? (1)

bdleonard (931507) | about 5 years ago | (#29355641)

A birthdate and a birthday are not the same thing...

Re:Three things? Really? (2, Informative)

OrigamiMarie (1501451) | about 5 years ago | (#29355753)

Perhaps they meant zip + 4. Which gets you down to very few households, but most people can't rattle off their zip + 4, so this information wouldn't actually apply to the questions posed by cashiers. On the other hand, I have heard that data mining on web-surfing habits can usually pick up your zip + 4, so yeah, it would be pretty trivial to put that together with birth date (which is asked for at various places to determine that you're of-age -- though of course you can lie) and sex, which can probably be guessed at even if you don't click one of the radio buttons.

Re:Three things? Really? (1)

ArsonSmith (13997) | about 5 years ago | (#29355785)

Date of Birth != Annual Birthday

one being month/day/year the other being just month/day.

Couple of things.. (5, Insightful)

hansraj (458504) | about 5 years ago | (#29355469)

Potential nitpick, but here goes.

The summary (not surprisingly for a /. summary) omits a couple of details that give the reader a rather partial picture.

For one, Paul Ohm is an Assistant Professor of law, and although the summary makes it sound like the linked article would be from a technical perspective, (mostly) it is not.

A quote like:

"Data can either be useful or perfectly anonymous but never both."

needs a bit of background about the qualification of the person making that claim. Why? Simply because it sounds like a rather technical remark. If some computer science researcher made this claim, I would tend to take it more at face value; otherwise I would take it with a grain of salt.

Now obviously this statement was not meant to be taken quite literally because the notion of "useful" is not precise. I can get reasonably useful information like "most of the people in my country like to buy branded stuff" or "most people who rent videos of actor X regularly, also rent the videos of actor Y regularly" without needing the underlying data to contain *any* personally identifiable information. The fact that extra data is stored is a different thing.

I personally believe that instead of claiming that some researcher has argued X, it can be more informative to actually say what kind of researcher it is who made a claim. Not because only researchers in a certain area can be trusted, but because a little bit of background puts the claims in the right perspective.

Err.. (1)

Kokuyo (549451) | about 5 years ago | (#29355485)

English is not my first language, so I probably didn't catch the whole meaning, but...

The idea was that everyone can be identified with only the birth date, gender and ZIP code? So... err... There is, in fact, not even one ZIP code that has two people living there of the same gender that happen to share a birthday? Sure, to have the year coincide would take a bit more than just the date itself but it's hard for me to imagine that this could be true.

So... what did I miss?

Re:Err.. (0)

Anonymous Coward | about 5 years ago | (#29355651)

When I lived in Atlanta, there was a person who shared my gender, birthday, first and last name, and our middle initial was the same. Our zip code was different, but I could have moved into his!

Re:Err.. (0)

Anonymous Coward | about 5 years ago | (#29355723)

That's a good point. Doing rough calculations, 2 (genders) * 120 (Maximum lifespan - I said it was rough) * 366 (maximum days in a year) = 87840 That means, thanks to our good old friend the Pigeonhole Principle [wikipedia.org] , any zip code containing 87841 people is guaranteed a collision. And, generally speaking, the collision rate is going to be much higher (there aren't all that many 120 year olds walking around). I don't see how this 87% figure was arrived at, but I'm fairly certain it was fabricated.
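
The pigeonhole arithmetic in this comment can be reproduced in a few lines. This is a sketch using the commenter's own rough numbers (2 genders, a 120-year maximum lifespan, 366 days); the names are illustrative. Note what the bound does and does not say: it guarantees a collision only in a ZIP code with more residents than there are combinations, and says nothing either way about smaller ZIP codes.

```python
GENDERS = 2
MAX_LIFESPAN_YEARS = 120  # the commenter's deliberately rough upper bound
DAYS_PER_YEAR = 366       # counting February 29


def distinct_combinations(genders: int = GENDERS,
                          years: int = MAX_LIFESPAN_YEARS,
                          days: int = DAYS_PER_YEAR) -> int:
    """Number of possible (gender, birthdate) pairs within one ZIP code."""
    return genders * years * days


def collision_guaranteed(population: int, combinations: int) -> bool:
    """Pigeonhole principle: more people than slots forces a shared slot."""
    return population > combinations


if __name__ == "__main__":
    slots = distinct_combinations()
    print(slots, collision_guaranteed(slots + 1, slots))
```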

Re:Err.. (1)

clone53421 (1310749) | about 5 years ago | (#29355755)

You missed the 87% figure. For 13%, this data is insufficient. (6.5% will share their birthdate, zip, and gender with another 6.5%)

365*2 = 730 children per day per zip code could be uniquely identified using this information. If I understand this correctly, it implies that 730 / 93.5% = about 781 babies born per day per zip code (on a national average).

Re:Err.. (1)

snspdaarf (1314399) | about 5 years ago | (#29355827)

English is not my first language, so I probably didn't catch the whole meaning, but...

The idea was that everyone can be identified with only the birth date, gender and ZIP code? So... err... There is, in fact, not even one ZIP code that has two people living there of the same gender that happen to share a birthday? Sure, to have the year coincide would take a bit more than just the date itself but it's hard for me to imagine that this could be true.

So... what did I miss?

It takes more than just these three items. What was meant was that if you take these three items, and run them against a database of known items, you end up knowing more from the combination than from the two separately. In this case, if you have a database with redacted information, and a second, non related, database that happens to have the redacted elements from the first, by selecting a good set of common keys to run a union of the two, you can "un-redact" the missing information. Nothing new here. The point is that confidential information is that way for a reason, and should not be released at any level of sanitization.
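
The matching described here (closer to a join on shared keys than a union) can be sketched as follows. Everything in the example is hypothetical: the field names, the two tiny datasets, and the rule of accepting only unique matches are assumptions made for illustration.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")


def link_records(redacted_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Attach names from public_rows to redacted_rows when the key match is unique."""
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row)

    linked = []
    for row in redacted_rows:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # ambiguous or missing matches are skipped
            linked.append({**row, "name": matches[0]["name"]})
    return linked


# Entirely made-up sample data: a "redacted" table and a public roll.
REDACTED = [
    {"zip": "00001", "birthdate": "1970-01-01", "sex": "F", "diagnosis": "A"},
    {"zip": "00001", "birthdate": "1980-02-02", "sex": "M", "diagnosis": "B"},
]
PUBLIC = [
    {"name": "Alice Example", "zip": "00001", "birthdate": "1970-01-01", "sex": "F"},
    {"name": "Bob Example", "zip": "00001", "birthdate": "1980-02-02", "sex": "M"},
    {"name": "Carl Example", "zip": "00001", "birthdate": "1980-02-02", "sex": "M"},
]

if __name__ == "__main__":
    for row in link_records(REDACTED, PUBLIC):
        print(row["name"], row["diagnosis"])
```

In this toy data only the first redacted row is re-identified, because two people in the public roll share the second row's key.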

Re:Err.. (1)

ArsonSmith (13997) | about 5 years ago | (#29355837)

perhaps the 87% part?

Anonymous can be useful.. (4, Insightful)

EasyTarget (43516) | about 5 years ago | (#29355539)

Data can either be useful or perfectly anonymous but never both

What a load of bolaks....

Supposing you have a list of -just- birth dates for every citizen at the census. You have -only- been given one piece of data per person, the date, nothing more. Just a huge list of dates, sorted chronologically.
1) The data has been totally anonymised.
2) You can do all kinds of meaningful analysis on the age demographics of the population. And make policy decisions based on that.

Fully anonymous data producing useful results.

Re:Anonymous can be useful.. (1)

Abcd1234 (188840) | about 5 years ago | (#29355903)

Well, I think what your example demonstrates is that *application-specific* anonymization and, in your case, aggregation, can produce data that's both useful and actually anonymous. But I happen to agree with the article that, in the *general* case, it's impossible to take data and anonymize it in a way that retains its usefulness across a large domain of potential applications while simultaneously protecting the anonymity of those in the database.

'course, when you think about it, that's common sense: To anonymize data effectively, as this article points out, you can't just throw in new identifiers, as enough detail correlated together is enough to identify individuals. Instead, if you really want to anonymize data, you have to throw out some of those details, so that the correlations are no longer possible. But if you do that, you lose information, which limits usefulness.

So, what you really need is the ability for an outside individual to submit a request to a database to obtain some cross-section of aggregated/anonymized data that's useful to them specifically, but isn't sufficiently detailed to allow individual identification. 'course, how you determine a given query is "too detailed", I don't know...

Re:Anonymous can be useful.. (0)

Anonymous Coward | about 5 years ago | (#29356079)

In the example you gave the data is not *perfectly* anonymous. For example, suppose that I knew the birthday of everybody in the census except for person i. Given the list of dates, I *could* recover your DOB so the list may not be *perfectly* anonymous.

However, I still agree with you that you can do lots of useful analysis on anonymous data. You just have to define your notion of anonymity. The notion of k-anonymity, proposed by Sweeney, allows you to blend in with "k-1" other people.

My favorite definition of anonymity is a concept called Differential Privacy. Basically, the idea is that anything I could learn about you from the data set D, I could learn even if your records are not contained in D. For example, the mean age of the world's population is close to being Differentially Private because the answer doesn't really change if I leave your age out of the average.
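
The mean-age example can be made concrete. The sketch below first measures how much the mean moves when any single record is dropped, then releases a noisy mean using the Laplace mechanism. The Laplace mechanism is a standard technique that the comment does not itself mention; the age bounds of 0 to 120, the epsilon value, and the assumption that the record count is public are all choices made here for illustration.

```python
import random


def max_change_from_one_record(ages):
    """Largest shift in the mean caused by removing any single record."""
    total, n = sum(ages), len(ages)
    full_mean = total / n
    return max(abs(full_mean - (total - a) / (n - 1)) for a in ages)


def noisy_mean(ages, epsilon, lower=0.0, upper=120.0, rng=random):
    """Mean age plus Laplace noise scaled to one record's maximum influence.

    Assumes the number of records is public and ages are clipped to
    [lower, upper], so replacing one record moves the mean by at most
    (upper - lower) / n.
    """
    clipped = [min(max(a, lower), upper) for a in ages]
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) / len(clipped) + noise


if __name__ == "__main__":
    sample = [float(a) for a in range(20, 80)] * 100  # made-up ages
    print(max_change_from_one_record(sample))
    print(noisy_mean(sample, epsilon=1.0, rng=random.Random(0)))
```

With thousands of records both the influence of one person and the added noise are small, which is the intuition the comment describes.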

Perfectly? (1)

peterwayner (266189) | about 5 years ago | (#29355595)

This is much too extreme. There are many good examples of useful data that is for almost all intents and purposes anonymous. Consider the example of anonymous lending libraries [wayner.org] from my book, Translucent Databases.

The simplest version just pushes the book title through a one-way function. The more complex version also hides the name in a similar way.

Can the anonymity be stripped away? There are coincidences and connections as Sweeney's examples and the Netflix examples show, but they can be fought by adding some salt/nonce to the one-way function. We can also add passwords.

There are so many different ways to add bits of complexity to the results that there are many tradeoffs we can make between effective privacy and the complexity of using the systems. I think it's good to keep the weaknesses in mind, but I think it's more of a feasible engineering problem than something that should be dismissed out of hand. (The law review piece is also worth reading in its entirety because it's more concerned with the legal issues created by the existence of privacy-enhanced databases. It would be simpler for some issues if they didn't exist and so it helps to argue seriously.)
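
The salted one-way function mentioned in this comment could look something like the following. This is a sketch, not the scheme from the book: HMAC-SHA256 is a choice made here, and the key and titles are placeholders.

```python
import hashlib
import hmac


def keyed_token(secret_key: bytes, value: str) -> str:
    """Map a value (e.g. a book title) to an opaque token using a secret key.

    Without the key, an outsider cannot simply hash candidate titles and
    compare, which is the weakness of an unsalted one-way function.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"placeholder-secret"  # hypothetical; a real key would be random
    print(keyed_token(key, "Moby-Dick"))
```

The same key always yields the same token, so the library can still match records, while a different key or no key yields unrelated output.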

Levels of anonymity? (1)

Burning1 (204959) | about 5 years ago | (#29355615)

Data can either be useful or perfectly anonymous but never both

I'm not sure I entirely agree with this statement. While it's technically correct, I believe it's misleading...

It's perfectly possible to hash personally identifiable information into an MD5 sum, to ensure that your records are unique, and then to generate useful statistics based on the resulting aggregate data without releasing significant personal information.

For instance:

Key = Hash(Your name + Your Zip + Your Birthday)
Zipcode
Birth Decade
Hobbies
Household income (Averaged to the nearest $20K increment.)

This information is significantly anonymous, and still highly valuable market research. If you happen to submit your information twice, it will be caught by the unique hash.

Of course, the author describes 'perfect' anonymity. It's technically possible that you're the only person in your zip who is between the ages of 21 - 30, enjoys playing video games, and makes $60K-$80K a year... However, it's generalized enough to provide a great deal of plausible deniability.

The same basic statement about anonymity could be made for a person standing in a crowd: given enough detail, you could identify that person by their appearance, without knowing any unique identifying information about them.

What should you get from this? You aren't as anonymous online as you might expect. Who's really surprised?

Re:Levels of anonymity? (1)

Mprx (82435) | about 5 years ago | (#29355811)

That hash is too easily reversible. Brute force search in order of name popularity.
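
A minimal sketch of that brute-force search, assuming the key is an unsalted MD5 of name, ZIP, and birthday. The separator, the date format, and the sample names are assumptions; the parent post does not specify them. Because the record already publishes the ZIP code and birth decade, the attacker only has to try candidate names against roughly 3,650 dates.

```python
import hashlib
from datetime import date, timedelta


def record_key(name: str, zip_code: str, birthday: str) -> str:
    """Unsalted MD5 over the identifying fields (hypothetical layout)."""
    return hashlib.md5(f"{name}|{zip_code}|{birthday}".encode("utf-8")).hexdigest()


def brute_force(target_key, names_by_popularity, zip_code, start, end):
    """Try each name, most popular first, against every date in the decade."""
    for name in names_by_popularity:
        day = start
        while day <= end:
            if record_key(name, zip_code, day.isoformat()) == target_key:
                return name, day.isoformat()
            day += timedelta(days=1)
    return None


if __name__ == "__main__":
    target = record_key("Pat Example", "00000", "1975-06-15")
    print(brute_force(target, ["Sam Example", "Pat Example"], "00000",
                      date(1970, 1, 1), date(1979, 12, 31)))
```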

Re:Levels of anonymity? (1)

snspdaarf (1314399) | about 5 years ago | (#29356109)

Right, but not on target. The issue is that if databases include enough data, and seemingly trivial data at that, by selecting good common elements and good databases to generate union result sets, I can show that it really was you at the Game Store that bought Bitchslap III, Nun Terror At The Vatican for PS3 at 7:30 PM last Tuesday, and you were not at the bar watching football like you claim.

Re:Levels of anonymity? (1)

Abcd1234 (188840) | about 5 years ago | (#29356133)

Yeah, but the whole point is, given a zip code, birth date, household income, and hobbies, I can probably figure out who you are.

Fundamentally, the issue is very simple: Given some sort of identifier, and a series of properties about that identifier, if you have enough dimensions of detail, you end up narrowing down your sample so much that you end up with a population of one, that being the person the identifier "hides". It's just that simple.

The only way to prevent this is to generate crosscuts of data where you eliminate those correlations. ie, given an identifier, you only provide a limited set of dimensions which, by themselves, aren't enough to get down to a single person. But, of course, that process destroys information.
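
One way to see the narrowing is to count the smallest group of records that share the same values as more dimensions are released; that count is the k in the k-anonymity idea mentioned elsewhere in the thread. The records and column names below are made up for illustration.

```python
from collections import Counter


def smallest_group(records, columns):
    """Size of the smallest set of records sharing the same values in columns."""
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    return min(counts.values())


# Hypothetical records; every additional column can only split groups further.
RECORDS = [
    {"zip": "00001", "decade": "1970s", "hobby": "chess", "income": "60-80K"},
    {"zip": "00001", "decade": "1970s", "hobby": "chess", "income": "40-60K"},
    {"zip": "00001", "decade": "1980s", "hobby": "games", "income": "60-80K"},
    {"zip": "00001", "decade": "1980s", "hobby": "games", "income": "60-80K"},
]

if __name__ == "__main__":
    for cols in (("zip",), ("zip", "decade"),
                 ("zip", "decade", "hobby", "income")):
        print(cols, smallest_group(RECORDS, cols))
```

In this toy data the smallest group shrinks from 4 to 2 to 1 as columns are added, at which point one record stands alone.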

Anonymous != De-identified (0)

Anonymous Coward | about 5 years ago | (#29355865)

I think the article should differentiate between an 'anonymous' dataset and one that is 'de-identified.' Anonymous as defined in the article is not the same thing as de-identified as defined by HIPAA. If a dataset is de-identified, you cannot include date of birth (except year), date of visit/service (except year), or anything more than the first three digits of the zip code (there's 15 other identifiers that aren't allowed, either).

The attack described in the article wouldn't work if the dataset were de-identified, or at the very least, would have been a lot more difficult.
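
The two generalizations named in this comment (birth year only, first three ZIP digits) can be sketched as below. This covers only those two fields and is not a complete implementation of the HIPAA rule; the field names and the ISO date format are assumptions.

```python
def generalize(record: dict) -> dict:
    """Coarsen birthdate to year and ZIP code to its first three digits.

    Assumes 'birthdate' is an ISO string (YYYY-MM-DD) and 'zip' is a
    five-digit string; other fields pass through unchanged.
    """
    out = dict(record)  # leave the caller's record untouched
    out["birth_year"] = out.pop("birthdate")[:4]
    out["zip3"] = out.pop("zip")[:3]
    return out


if __name__ == "__main__":
    print(generalize({"zip": "12345", "birthdate": "1975-06-15", "sex": "F"}))
```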

nonsense (1)

junglebeast (1497399) | about 5 years ago | (#29355919)

" 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex." I'll be generous and overlook the gross misuse of the term "bits" in this context and pretend that the author wrote "tidbits" instead. That said, I do not for a second believe that 87 percent of Americans were the only person of the same gender born in their particular zip code on the same day. That's just ludicrous. Now if "birthday" is actually referring to a more specific point in time, such as the exact second of birth, then sure...but that's not really common knowledge.

Uniquely ID 87% of 300 million Americans? (1)

SPickett (911670) | about 5 years ago | (#29355995)

"in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex"

That doesn't seem right. IIRC, there are somewhere around 60,000 zipcodes. (Obviously there are under 100000.) If the population is 300 million, that's an average of about 5000 people per zipcode. Male/female splits it in half, so you have 2500 birthdates to distribute uniquely over 365 days.

Looked at another way, 365 days *times* 2 sexes *times* 60000 zipcodes totals less than 44 million. How do you uniquely ID 300 million people?

Add the problem that many people could have given you either their work or home zipcode. How does she do that?

Re:Uniquely ID 87% of 300 million Americans? (1)

SPickett (911670) | about 5 years ago | (#29356047)

Whoops. Birthdate (not birthday). Never mind.
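
The correction matters numerically. Using the parent post's figures (about 60,000 ZIP codes, 2 sexes, 365 days) and an assumed span of 100 birth years, the number of possible combinations can be compared for birthday versus birthdate. The 100-year span is an assumption made here, and the result says nothing about how the 87 percent figure was actually measured.

```python
def cell_count(zip_codes: int, sexes: int, days_per_year: int,
               birth_years: int = 1) -> int:
    """Number of distinct (ZIP, sex, day, year) combinations."""
    return zip_codes * sexes * days_per_year * birth_years


if __name__ == "__main__":
    population = 300_000_000
    by_birthday = cell_count(60_000, 2, 365)        # month and day only
    by_birthdate = cell_count(60_000, 2, 365, 100)  # month, day, and year
    print(by_birthday, by_birthdate, population / by_birthdate)
```

With the year included there are far more combinations than people, so widespread uniqueness is at least arithmetically possible.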

Ohm is overwrought (2, Informative)

feenberg (201582) | about 5 years ago | (#29356189)

I have worked with anonymized government data extensively, and birthdate and zipcode are always considered personally identifiable information. Sometimes birth year is available, and sometimes state or (rarely) county is available, but I have never even heard of a dataset with both. Datasets with month and day of birth are never considered to be anonymized, and are not released. The author of the paper is much overwrought.
