Improperly Anonymized Logs Reveal Details of NYC Cab Trips

Unknown Lamer posted about 2 months ago | from the check-your-proof dept.

Math 192

mpicpp (3454017) writes with news that a dump of fare logs from NYC cabs resulted in trip details being leaked thanks to using an MD5 hash on input data with a very small key space and regular format. From the article: City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers. ... Presumably, officials used the hashes to preserve the privacy of individual drivers since the records provide a detailed view of their locations and work performance over an extended period of time.

It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
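
The attack described in the summary can be sketched in a few lines. This is a minimal illustration, not Pandurangan's actual code: the ID format used here (one digit, one letter, two digits) is an assumption chosen for the example, and `build_lookup` and `deanonymize` are hypothetical names.

```python
import hashlib
import itertools
import string


def build_lookup():
    """Map MD5 hex digest -> plaintext for every ID of the assumed form
    digit, letter, digit, digit (for example "5X55")."""
    table = {}
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        plain = f"{d1}{letter}{d2}{d3}"
        digest = hashlib.md5(plain.encode("ascii")).hexdigest()
        table[digest] = plain
    return table


def deanonymize(hashed_id, table):
    """Return the plaintext ID for a hash, or None if it is not in the table."""
    return table.get(hashed_id)
```

Under this assumed format there are only 26,000 candidates, so the whole table is built almost instantly on ordinary hardware. That is the core of the problem: an unkeyed hash does nothing to hide a small, regular keyspace.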

192 comments

Oops. (1)

mythosaz (572040) | about 2 months ago | (#47301921)

"Oops"

-New York

Cue the DMCA. (1)

MickLinux (579158) | about 2 months ago | (#47302207)

Oops.

Cue the DMCA. (2, Insightful)

Anonymous Coward | about 2 months ago | (#47302949)

In other news, the credentials for their plug-n-play coffee machine are 'admin' 'admin', and their gym locker combo is 1234. Someone made a half-assed attempt to obfuscate some data that nobody cares about (unless your husband's a cheating cabbie, I guess) and someone cracked it. News?

Re:Oops. (-1)

Anonymous Coward | about 2 months ago | (#47302255)

"Oops"

-New York

"This crime ridden area is full of niggers. What a shocking surprise!"

- Anybody that visits Harlem

Re:Oops. (1)

Anonymous Coward | about 2 months ago | (#47302791)

You mean the hacks got hacked?

This may sound like a funny incident, but it does point to the vulnerability I've always pointed out about Bitcoin: the block chain tells you who got what. Sure, the identities are hashed, but aggregate those hashes and compare them to other kinds of records and you can start drawing all kinds of interesting inferences.

Re:Oops. (0)

Anonymous Coward | about 2 months ago | (#47303345)

but it does point to the vulnerability I've always pointed out about Bitcoin

This is like pointing out the vulnerability in a screen whereby it lets air through.

Bitcoin. Is. Not. Anonymous. Currency.

Bitcoin. Is. Not. Meant. To. Be. Anonymous. Currency.

Re:Oops. (0)

Anonymous Coward | about 2 months ago | (#47303367)

That's not a vulnerability. You're perpetuating the widespread misconception that anonymity was ever a Bitcoin design goal. It wasn't and it isn't. You've apparently spent a bunch of time stating a falsehood.

(philip.paradis posting as AC because I don't log in on this machine)

Re:Oops. (-1, Troll)

imthesponge (621107) | about 2 months ago | (#47304233)

No anonymity in a currency for drug dealers?

Re:Oops. (4, Insightful)

philip.paradis (2580427) | about 2 months ago | (#47304281)

The United States dollar [wikipedia.org] is the currency preferred by drug dealers, whose trade is in fact made more profitable by the failed "War on Drugs" [wikipedia.org].

That's nothing (1)

Anonymous Coward | about 2 months ago | (#47301943)

I know someone who keeps logs of all phone calls, all e-mails, all movement of everybody.

Re:That's nothing (1)

msauve (701917) | about 2 months ago | (#47302537)

Are his initials "NSA?"

Re:That's nothing (2)

viperidaenz (2515578) | about 2 months ago | (#47303493)

After he discombobulated Agent Smith from the inside, Neo changed his name to incorporate all 3 identities.

Neo Smith Anderson.

Data Security Officer (4, Insightful)

FlyHelicopters (1540845) | about 2 months ago | (#47301949)

Too many governments and corporations continue to fail to understand that it requires having experts who actually know what they are doing be in charge of data security.

This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.

It means you hire knowledge and experience, you hire expert skills, and those cost money.

Re:Data Security Officer (2, Insightful)

fuzzyfuzzyfungus (1223518) | about 2 months ago | (#47302037)

In this case, it sounds like whoever got handed the job just couldn't, didn't care to, or was overruled about, thinking like an attacker.

There are probably subtler methods of de-anonymizing the data that would require nontrivial skill to think of and counter; but it's a bit surprising to see somebody who knows enough about manipulating data to pull 20GB of records and hash a single field in each one without hurting himself or munging the result; but doesn't think "Medallion numbers are written on cabs. Somebody could grab dozens of them while waiting by the curb at the airport and just MD5 them in milliseconds", much less "Medallion numbers are quite short, someone could traverse the whole damn keyspace in a few days at most".

Either their person thinks that MD5 is magic, or his thought process marched in a nice straight line from request to solution, without ever thinking about attack: "We need all medallion numbers replaced with internally consistent but unrelated UIDs." "Umm, OK. Hey, a hash function is deterministic and non-reversible, it's perfect!"

Re: Data Security Officer (1)

MalleusEBHC (597600) | about 2 months ago | (#47302259)

Adding a salt is a trivial way of fixing this.

Re: Data Security Officer (2)

WaffleMonster (969671) | about 2 months ago | (#47302337)

Adding a salt is a trivial way of fixing this.

No it aint.

Re: Data Security Officer (0)

Anonymous Coward | about 2 months ago | (#47304119)

Adding a salt is a trivial way of fixing this.

No it aint.

Care to explain why? If each hash has a different individual salt (the only correct way to use salt) and you don't include the salt in the public data (and not use MD5) then there will be no shared pattern amongst all the hashes which makes them far more difficult to reverse when you have to brute the salt as well.

Re: Data Security Officer (4, Informative)

Anonymous Coward | about 2 months ago | (#47304255)

A naive use of salt would mean that you might as well omit the data. The aim of including the values in hashed form is to be able to say: This is the same driver as this. So same numbers have to hash to same numbers, which means you can't hash individual lines with different salts or you lose that information. In order to keep that information, you have to hash same numbers with the same salt each time. That basically gives you a random number with which to replace each number. So that works, but it removes the reason for using a hash, which is to have a local operation which creates a global irreversible one-to-one mapping. If you have to create one salt per unique number, you might as well use the salt as irreversible identifier.

Re: Data Security Officer (2)

m.dillon (147925) | about 2 months ago | (#47302339)

Except you can decode the salt trivially if you took a cab ride that happens to be in the data set and you recorded the license and medallion number. At which point the salt is useless.

-Matt

Re: Data Security Officer (1)

fuzzyfuzzyfungus (1223518) | about 2 months ago | (#47302557)

It does make your table 'o handy precomputed hashes unhelpful; but on such a computationally trivial keyspace that barely matters.

I wonder if the choice of hashing, rather than substituting a UUID, was based on not thinking through the weakness of a hash under the circumstances, or based on the extra difficulty of making sure that the same UUID is substituted for the same hack and medallion number in all instances? It's not a whole lot of additional difficulty; but the tipping point has to live somewhere...

Re: Data Security Officer (1)

cheater512 (783349) | about 2 months ago | (#47302691)

What part of the story used ANY precomputed rainbow tables? None.

salt + "1234": if you know the "1234", then it's a tiny brute force to get the salt.
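
A rough sketch of that brute force, assuming a short salt of three lowercase letters prepended to the ID before MD5. The salt length, alphabet, and the name `recover_salt` are all assumptions made for the example; nothing in the story says a salt was used at all.

```python
import hashlib
import itertools
import string


def recover_salt(target_hex, known_plain, salt_len=3,
                 alphabet=string.ascii_lowercase):
    """Try every salt of salt_len characters and return the one for which
    md5(salt + known_plain) equals target_hex, or None if none matches."""
    for combo in itertools.product(alphabet, repeat=salt_len):
        salt = "".join(combo)
        candidate = hashlib.md5((salt + known_plain).encode("ascii")).hexdigest()
        if candidate == target_hex:
            return salt
    return None
```

With three lowercase letters this is at most 17,576 hashes. The loop stops being practical only when the salt is long and random, which is the case the other replies in this thread are arguing about.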

Re: Data Security Officer (0)

Anonymous Coward | about 2 months ago | (#47303405)

Especially true if the salt is static or easily predictable.

Re: Data Security Officer (0)

Anonymous Coward | about 2 months ago | (#47302733)

I think he meant using a secret and sufficiently long salt. At this point, this pretty much becomes a HMAC. But anyway, just encrypting the license might be easier and more straightforward.
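
For illustration, a keyed hash along those lines might look like the sketch below. It uses Python's standard `hmac` module with SHA-256; the function names and the 32-byte key length are choices made for this example, not anything the city is known to have done.

```python
import hashlib
import hmac
import secrets


def make_key():
    """Generate a random 32-byte secret; it must never be released with the data."""
    return secrets.token_bytes(32)


def pseudonymize(identifier, key):
    """Keyed hash: the same identifier and key always give the same token,
    but the token cannot be recomputed without the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

Every record for the same medallion still gets the same token, so the data stays linkable, but enumerating the keyspace no longer works without the key. This addresses only the brute-force problem; trip patterns in the rest of each record can still identify a cab.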

Re: Data Security Officer (1)

Buzer (809214) | about 2 months ago | (#47303877)

What? The only thing you would learn is that one license & medallion number (as in, you know which hash means that combination). You wouldn't know the actual salt (unless the hash algorithm was complete shit/your salt is too short for bruteforce).

Re: Data Security Officer (1)

msauve (701917) | about 2 months ago | (#47302553)

Using a one time pad is even easier.

Re: Data Security Officer (1)

ColdWetDog (752185) | about 2 months ago | (#47302661)

For taxi cabs?

Re: Data Security Officer (3, Informative)

msauve (701917) | about 2 months ago | (#47302889)

Sure. I'm assuming there's a requirement to have a unique transformation of medallion numbers (otherwise, you wouldn't have to include even a hashed version)...

Instead of applying some hash to the medallion number, just do something like:
Change all appearances of the first number in the list to "1". Change all appearances of the next unique medallion number in the list to "2." Etc.

The result is in essence an OTP. Unless records of the process are kept, it's irreversible (lacking external info, such as knowing that medallion number x picked up a fare at location y at time z, with matching details in the released data).
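
A minimal sketch of the renumbering described above, with `renumber` as a hypothetical name and made-up IDs in the usage line:

```python
def renumber(ids):
    """Replace each distinct ID with 1, 2, 3, ... in order of first appearance.
    The mapping lives only inside this call and is discarded afterwards."""
    mapping = {}
    out = []
    for value in ids:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
        out.append(mapping[value])
    return out


# Example with invented IDs: repeated values get the same replacement.
print(renumber(["7B21", "3A10", "7B21", "9C05", "3A10"]))  # [1, 2, 1, 3, 2]
```

One caveat: if the input happens to be sorted by the original ID, the new numbers preserve that order, so shuffling the records first may be wise.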

Re: Data Security Officer (0, Insightful)

Anonymous Coward | about 2 months ago | (#47303217)

Change all appearances of the first number in the list to "1".

You have described something most definitely NOT a one time pad. In an OTP scheme, every *instance* of any particular value maps with equal probability to every potential output value. What you described is a basic substitution cipher--trivial to crack by frequency analysis. Every input value has a definite output value to which it maps with 100% probability. Once you find the first correlation between input/output, you can replace all the others. Not so for an OTP. Frequency analysis won't do squat if your OTP was generated in a truly random fashion and applied correctly.

And this, folks, is why you shouldn't trust advice from strangers about crypto or homebrew crypto schemes. Play with them, learn about the principles, but please, for the love of FSM, do not trust them.
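
The frequency-analysis point can be shown with a toy example. The IDs and the substitution table here are invented for illustration: a fixed substitution changes the labels but leaves the occurrence counts untouched.

```python
from collections import Counter


def frequency_profile(values):
    """Sorted list of occurrence counts, ignoring which label has which count."""
    return sorted(Counter(values).values(), reverse=True)


original = ["7B21", "7B21", "7B21", "3A10", "3A10", "9C05"]
substitution = {"7B21": 1, "3A10": 2, "9C05": 3}  # fixed one-to-one mapping
pseudonymized = [substitution[v] for v in original]

# Both profiles are identical, so the busiest cab is still the busiest token.
same_profile = frequency_profile(original) == frequency_profile(pseudonymized)
```

Any scheme that gives the same driver the same token in every record will share this property, since that consistency is the stated purpose of keeping the field at all.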

Re: Data Security Officer (1, Insightful)

philip.paradis (2580427) | about 2 months ago | (#47304319)

I'm appalled that your post has been modded "informative." Please do us all a favor and abstain from any future posts on cryptography. Instead, I recommend you spend your time with resources like Applied Cryptography [schneier.com] . Seriously, please put down the shovel, and if you're doing anything involving crypto for a living, please do the world a favor and resign today.

Re:Data Security Officer (0)

Anonymous Coward | about 2 months ago | (#47302389)

Maybe we would rather have the data than have the government clam up and release nothing (just in case), and change the laws because of the cost burden.

Re:Data Security Officer (4, Interesting)

Opportunist (166417) | about 2 months ago | (#47302541)

You can contract it out to the lowest bidder without a problem. There only have to be 2 clauses in the contract:

1) You have a GOOD ITSEC company audit the shit out of it before it goes live.
2) If the audit reveals that the company taking the contract doesn't know jack about security, THEY will pay for the audit and THEY will improve the software until they think it's finally good enough.

1 and 2 are repeated until 1 turns out good.

I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

Re:Data Security Officer (3, Insightful)

chriscappuccio (80696) | about 2 months ago | (#47303249)

Sorry but unless you define "GOOD ITSEC company audit the shit out of it" in tangible terms that can actually hold someone liable for failure in a real way, this is just baloney. And if you define it with teeth, the price will increase. Basically, to define it properly, you'd be able to do it yourself. Oops.

Re:Data Security Officer (0)

Anonymous Coward | about 2 months ago | (#47303417)

I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

Bull spittle. How it's currently being done is not how it should be done, nor how it has been done in the past. How it's currently being done is broken and flawed.

Re:Data Security Officer (1)

SeaFox (739806) | about 2 months ago | (#47303619)

I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

Isn't that how the entire job market works? That's why we have the education loan bubble we have -- employers don't believe you know anything without a piece of paper showing you spent thousands of dollars to learn it.

Re:Data Security Officer (1)

gweihir (88907) | about 2 months ago | (#47302543)

It is not a surprise when you consider where else they mess up spectacularly. It is like there is no active intelligence to be had in these organizations.

Re:Data Security Officer (0)

Anonymous Coward | about 2 months ago | (#47302769)

Too many governments and corporations continue to fail to understand that it requires having experts who actually know what they are doing be in charge of data security.

This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.

It means you hire knowledge and experience, you hire expert skills, and those cost money.

And always consult the Slashdot crowd first . . .

Re:Data Security Officer (4, Interesting)

penix1 (722987) | about 2 months ago | (#47302855)

From TFS...

City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers...

How many of you here have had to deal with a Freedom Of Information Act (FOIA) request which is what a "public records request" is? I have had the pleasure over a dozen times. You have 10 days to respond to that request in my state. Some states it is even less. Failure to do so can result in stiff penalties. 10 days is hardly enough time to contract out to someone and have the job "done right".

It means you hire knowledge and experience, you hire expert skills, and those cost money.

And you are happy to have your taxes raised to pay those fees? Riiiight!

Re:Data Security Officer (1)

chromaexcursion (2047080) | about 2 months ago | (#47303017)

Small problem.
Taxi hack numbers are available in a publicly accessible database.
A determined individual probably could find license numbers, they may be publicly accessible.
Failure to understand the vulnerability is the design failure.
A simple solution would have been to order the hashes numerically and re-number them cardinally, i.e. 1, 2, 3 ...
Would take less than a minute, for someone who knew how.
Perhaps a few hours if the right person had to be tracked down.
Never release source data.

Re:Data Security Officer (5, Informative)

sexybomber (740588) | about 2 months ago | (#47303197)

Your State may be different, but New York's Freedom of Information Law (or FOIL, we like to be different) works like this:

The agency has to respond within five business days, but that response can read something like:

Dear Sexybomber:

We have received your request for public records pursuant to FOIL. Due to the complexity of the records you have requested, it may not be possible to produce them within the standard 20-day statutory period. We anticipate that we will be able to produce the records you have requested within 40 days. If you have questions or concerns, please direct them in writing to the address above.

If they run into a snag, they have to inform you of this and produce the records within a "reasonable period".

So it's not like NYC was under a five-day time crunch here. They could easily have responded and said it would take 40 or 60 days, being as there were several million records requested. That's definitely long enough to bring in a consultant (or even one of the more technically-literate staff members) to properly secure the data.

Re: Data Security Officer (0)

Anonymous Coward | about 2 months ago | (#47303467)

A response doesn't require you to provide all the data within 10 days though - as long as you *respond* to the requester, you can still tell them it will take x days to gather and process the info - as long as the communicated time line isn't unreasonable, it's still OK.

This (0, Insightful)

Anonymous Coward | about 2 months ago | (#47301975)

This is why we can't have nice things.....

That was a dumb thing to do. (1)

K. S. Kyosuke (729550) | about 2 months ago | (#47301981)

Cue a CFAA trial and a long stay in a cozy federal PMITA penitentiary.

Re:That was a dumb thing to do. (1)

Opportunist (166417) | about 2 months ago | (#47302561)

And the crime would be? Exposing government stupidity?

Prediction: de-anonymization considered "hacking" (5, Insightful)

rsborg (111459) | about 2 months ago | (#47301983)

Large organizations will consistently fail to hire/staff competent people for data security related issues, and will push back on fines or punitive findings by criminalizing publicizing their incompetence.

Thus sending all such talent straight to criminals who'll be happy to reward them with hard cash.

It's like these guys _want_ a dystopian future.

Re:Prediction: de-anonymization considered "hackin (0)

Anonymous Coward | about 2 months ago | (#47302263)

Target's breach cost them 50% of their revenue for a year.

That nearly put them out of business.

The meeting between them and the card carriers went something like
AMEX_Discover_MasterCard_VISA: You are paying to replace cards, paying for the fraud from the compromise (10-15 billion/year), and you are paying enhanced fees for several years to us until you prove you are again trustworthy of the normal rates we give your competitors.
Target: And if we refuse?
AMEX_Discover_MasterCard_VISA: We will choose not to do business with you. Your customers will have to buy in cash.
Target: Oh...well then. Where do I sign?

As systems become more integrated, Data Security is going to become less about keeping egg off of your face and more about corporate and personal survival. Those old movies from the 80's with the hacker causing the elevator to drop 100 stories, or the cellphone battery to explode, or the factory to go out of business for months on end, and so on?

The industry is already moving towards true security.

When data starts being used against people personally, they will begin asking questions, and then it will become very important.

Until then, if you're a hacker and know your shit, enjoy being God.

Re:Prediction: de-anonymization considered "hackin (5, Interesting)

Opportunist (166417) | about 2 months ago | (#47302639)

True that.

I am in the fortunate situation of having near unlimited funds. I was joking that I need a rubber stamp labeled "for security reasons", because whenever I want something, these three magic words will brush aside nearly all objections (ok, within reason, but anything 5 digits or less is nearly certainly mine if I "rubber stamp" it that way).

The most recent draft of the security procedures I did I peppered liberally with "insanity" as I call it. It's a political thing. You demand stuff that you don't really want but is so terribly obstructive to everyone else that they'll agree with what you actually want just to get the insane levels of "security" (read: obstruction and red tape) out of the way. To my unending horror (and slight amusement) they signed it off without changing a comma. Now find out how to argue why you want your own requirements out of the crap...

The reason isn't that our board suddenly found out how much they love security or how important the confidentiality of the (considerably sensitive, I should add) private data we hold here is. What changed is simply that our government upped the fines and punishment for data breeches considerably, up to and including jail time for board members if negligence can somehow be tacked to them. In a nutshell, unless you can show that you tried to stay on top of security when holding highly sensitive data, you should prepare to take a longer vacation, all expenses paid, in a holiday resort of your government's choice.

I guess when your ass is on the line, you get very willing to spend money.

Re:Prediction: de-anonymization considered "hackin (1)

chromaexcursion (2047080) | about 2 months ago | (#47303081)

You've elegantly described why stiff federal penalties are needed.

Interesting that when a direct line to someone's pocketbook is defined everyone gets on board, but when it's just a chance someone's drinking water would be tainted with cancer causing chemicals most can't find the connection.
Corporate malfeasance comes in all forms.

Re:Prediction: de-anonymization considered "hackin (5, Interesting)

Opportunist (166417) | about 2 months ago | (#47303327)

Fines in a corporate world are a matter of risk management: How likely is it that it happens, what's the fine if it happens and how much do we save by not giving a damn? If this unholy trinity comes up with the "don't give a damn" on top, you don't give a damn and the fine becomes part of the operation cost. The more I get to play with C-Levels, the more I get the nagging feeling that I'm the only one weighed down by a conscience.

Actually, I think it's more insidious. It's a blame shifting game where everyone can claim he's doing it for the "greater good", because "being bad" is actually "being good". Take the scenario where some people have to be laid off. The floor manager knows them personally. He knows every single one of them, he knows their personal life, their family situation and it really breaks his heart to let one of them go, but he knows he has to. Either he fires one of them or he might have to fire them all because they won't be profitable anymore with the new requirements, and that could lead to the shutdown of the entire branch. His superior may not know the people anymore, but he has to do it because he himself doesn't make that decision, that's been decided further up. He can't simply ignore an order from C-Level. The C's don't need to be psychopaths (though it sure helps, it seems...), they can even be compassionate, but they know that the investors will only keep their money in the company if they perform well and if the cash flow is to their liking. He can easily brush any troubles with his conscience aside when he fires a few people now, since if he didn't their quarter figures wouldn't look nice, stock would plummet and investors would jump ship, and then he'd have to lay off even more people. But you can't even blame the investment bankers. Because they have to pick the best performing stocks, it's not their money, it's money from investors, money they put aside for their retirement, the investors have a responsibility towards the people that entrust them with their money (ok, recent history shows that most don't give a shit, but let's assume we find an investment banker with a conscience... it's just a thought experiment, remember). The people investing money don't even know WHAT they invest in, they just toss money onto their investor with the order to "make more of it". And they're not "evil" either, they just want to prepare for their retirement. Those people could well be the same ones who get fired now for the sake of more profit. Essentially, they're firing themselves without knowing it.

But I ramble.

What this is supposed to show is that in the corporate world it's easy to play the blame shifting game and use the "but I have to!" excuse. It's sad but it seems the only escape from that game is to actually grab them at the nuts and tell them that they won't be shifting the blame anywhere. And behold, it works.

Of course that also means that I have to watch my back or it's going to be my ass that's going to jail. But fortunately all I have to do is heed the laws. And that's easy enough, surprisingly.

Re:Prediction: de-anonymization considered "hackin (0)

Anonymous Coward | about 2 months ago | (#47303587)

You are in rare form. Glad to be here for it.

Re:Prediction: de-anonymization considered "hackin (0)

Anonymous Coward | about 2 months ago | (#47303661)

Indeed--I suspect he's had a bit much to drink. But it's really quite fascinating...

Re:Prediction: de-anonymization considered "hackin (2)

skovnymfe (1671822) | about 2 months ago | (#47303927)

A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.

Re:Prediction: de-anonymization considered "hackin (1)

wonkey_monkey (2592601) | about 2 months ago | (#47304147)

A new car built by my company [...] car crashes and burns with everyone trapped inside. Now, should we initiate a recall?

No, you just need to stop making such shitty cars.

Re:Prediction: de-anonymization considered "hackin (1)

superdana (1211758) | about 2 months ago | (#47303525)

data breeches

bring me my computing pants!

Re:Prediction: de-anonymization considered "hackin (5, Informative)

Anonymous Coward | about 2 months ago | (#47303621)

> Target's breach cost them 50% of their revenue for a year.

No it did not. Not even close. [cbsnews.com] At worst their profits for the subsequent quarter were down 50% or in terms of revenue, that's less than a 6% drop compared to a year ago.

Re:Prediction: de-anonymization considered "hackin (0)

Anonymous Coward | about 2 months ago | (#47303739)

It was a pleasant fantasy and you had to go and spoil it all.

What's the issue here? (0)

Anonymous Coward | about 2 months ago | (#47301985)

People will know driver XYZ drove from 122 Main St to 123 Second St?
It's not like they have the info on where the person was actually going when they got out of the cab.
This isn't even an issue. *yawn*

Re:What's the issue here? (4, Insightful)

gweihir (88907) | about 2 months ago | (#47302529)

You are naive. The problem starts to crop up when you start correlating things. Then you can find all sorts of things, like patterns of visiting a mistress, people meeting in secret (which is perfectly legal, but the government fears it), etc.

Re:What's the issue here? (4, Insightful)

chriscappuccio (80696) | about 2 months ago | (#47303477)

The government has the info already, they handed it out!

Re:What's the issue here? (1)

gweihir (88907) | about 2 months ago | (#47304029)

And the Government is the only party that does data-correlation?

Re:What's the issue here? (5, Insightful)

Opportunist (166417) | about 2 months ago | (#47302747)

Actually the movement of a cab is a wealth of information. Not by itself, but it's very good at connecting dots. If you want to follow someone around, these things tend to be invaluable. You can, essentially, follow someone around without following them around, even retroactively. People rarely go from place to place randomly. They have destinations. If someone takes a cab from the airport and doesn't live in the area where he landed, it is likely that his destination is the place that he will stay in. After a flight, especially a long one, people want to get rid of their heavy baggage, take a shower, put on new clothing. So you can easily find out where someone stayed. Which becomes twice as interesting if the destination is not a hotel, because now you got another person to screen.

This information by itself is not much. But as part of a bigger network it is something we'd have killed for back when I was still doing profiling.

Re:What's the issue here? (0)

Anonymous Coward | about 2 months ago | (#47302953)

Yeah I could see this, if this information included the name of the passenger. It's only the cab information, origin and destination. Unless I'm reading this wrong it's not like it says Joe Sixpack got into cab #123 at such and such street and got off at this other street at this time. I don't see how you'd get any useful information out of this. And if you could, who would care? It's not like it has the passenger's payment information like CC info or driver's license number or whatever... doesn't seem to be an issue. Why even go through the trouble of trying to follow someone with this info when if you really wanted to it'd be much easier other ways I'm sure. Still seems like a non issue to me.

Re:What's the issue here? (2)

AHuxley (892839) | about 2 months ago | (#47303165)

Has Joe Sixpack been seen near any anti war protests? Written to the press at a city, state or federal level? Given charitable contributions to a faith based group now under investigation? Have a security clearance? Have a family member with a new or old security clearance? Does Joe Sixpack travel outside the USA a lot?
It's not just about being "much easier"; it's about getting it all, having domestic staff feel OK about storing and sorting domestic details per person, and being able to legally collect more domestically without needing per-person court work.

Re:What's the issue here? (2)

Opportunist (166417) | about 2 months ago | (#47303275)

The point is that you can't follow every Joe Random around all the time. But occasionally some Joe Random becomes a Joe Someone and you just wish you had the information that you could have if you just followed him.

Scenario.

You find out that there is someone you deem a nuisance to the powers that be. You finally caught him. But he doesn't talk. Imagine you're an entity that has access to a lot of information, either directly (because you have it) or indirectly (because you can request it). Using the CC information of your subject you find out that he recently spent time in another city (because you get the flight information). Since there is no other reason (like, say, business reasons), and since his travel visa says "vacation", you deem it likely that he met a contact or even an accomplice. You have no hotel bills on CC, so either he paid in cash or, and this is what you hope for, he stayed with his contact.

You know when his plane landed and you can even determine to some degree of certainty when he left the airport (you may even have access to the CCTV to pinpoint the moment). Of course more than one taxi leaves around that time, but most of them go to hotels (that you can then check out for reservations by the name of the person you're looking for). What you're really hoping for is a private address. And unless your subject was very careful, he might even have given the cab driver the real address, which now offers you another address and another contact to use.

Next thing you want to do is find out all cab movements to and from this address. It may be some kind of "hub" for people of that particular kind of nuisance, you may actually find some kind of structure. You can at least find out whether your subject also took cabs to other destinations and when, how often and where he went.

Or how about a more general approach? You could use the information to find out whether some private address gets visited by people from outside of town suspiciously often. What do they do there? Why do they go there? Do they stay there? If not, what could they be doing there?

Cabs offer a wealth of information. Again, by itself that information is fairly useless, but it is great for "connecting dots", because that's what cabs do: They move from point A to point B with their passenger.

Re:What's the issue here? (2)

AHuxley (892839) | about 2 months ago | (#47303093)

Very insightful, Opportunist.
With more nations trying to count passports in and out, a wealth of information about each person entering some countries is now being stored.
From face recognition, gait analysis, 'free' wifi, a new/old phone being set up for cheaper local use, to the random risk of a laptop being examined and cloned on entry and exit.
If you want to rent a car you face a complex 'chat down' by the friendly on-site rental staff.
So you take the next random taxi.
In the past, along a long airport road, the interaction of a few tailing vehicles might be detected given the number of turns into a city.
Now destinations can be looked at over time, in near real time and as a history.
That first trip can open up a world of new digital 'hops': old friends, college buddy, lover, extended family, a until-now-unknown associate, all having their lives examined too.
If you go to a hotel you face another 'chat down' attempt by the friendly staff over a long, complex CC or cash transaction.
No follow-car pool or beacons needed anymore; just go big, locally and federally, with "collect-it-all" :)

Re:What's the issue here? (1)

dcw3 (649211) | about 2 months ago | (#47303691)

I would love to have a glimpse at this. I bet we'd be able to find some hacks who frequently take extended routes to bump up their fares.

I'm new here! (-1)

Anonymous Coward | about 2 months ago | (#47301987)

I just noticed there's a version of this site called "classic", it looks really good - easier to read through than the current version. Are there plans to migrate to the "classic" version?
Sorry for the stupid question.

Re:I'm new here! (0)

Anonymous Coward | about 2 months ago | (#47303031)

The 'classic', or alpha version is, as the name implies, reserved for the alpha [urbandictionary.com] users. All the rest will have to stick with the beta [urbandictionary.com] version.

Go directly to jail (0)

AndyKron (937105) | about 2 months ago | (#47302033)

Now you must go to jail Sorry :-(

Oops, indeed (4, Funny)

Krishnoid (984597) | about 2 months ago | (#47302041)

Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.

Having thereby run afoul of the circumvention of copyright protection mechanisms clause of the Digital Millennium Copyright Act, he was then subjected to the NYPD's controversial new program [theonion.com], and subsequently incarcerated.

Re:Oops, indeed (0)

Anonymous Coward | about 2 months ago | (#47302423)

For a moment there, I thought you said incinerated.

Re:Oops, indeed (0)

Anonymous Coward | about 2 months ago | (#47302767)

You can't copyright lists of facts, therefore DMCA doesn't apply.
I'm sure they'll find something else specious though.

Give that man a Big Gulp! (0)

Anonymous Coward | about 2 months ago | (#47302119)

Wait. It's NY city. We can't do that.

Slashdot ACs: You're On Our Watch List muahHahahha (-1)

Anonymous Coward | about 2 months ago | (#47302129)

An AC mentioned this in a post a couple of weeks ago on Slashdot about Slashdot analytics to cross reference so-called anonymity. An md5 reply immediately followed. Just wondering; just sayin'.

Error so popular it was enshrined in PCI DSS (5, Insightful)

WaffleMonster (969671) | about 2 months ago | (#47302281)

Always assumed anywhere term "anonymized data" is used it is more likely than not to be companies and governments paying lip service to its customers... where data could easily be reversed into an identifiable way by either taking advantage of insufficient entropy or cross referencing datasets.

There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.

One of my favorite examples of the dangers of insufficient entropy stems from a PCI DSS requirement written by "experts" who should know better.

3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches:

One-way hashes based on strong cryptography, (hash must be of the entire PAN) ...

Search space of typical 16-digit card numbers is no match for a modern CPU once you have taken check digit, card type, issuer and issuer specific numbering into account... "strong cryptography" can't fix stupid.
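
To put a rough number on that: a minimal sketch, assuming the standard Luhn check digit used on payment cards. The function names and the idea of a fixed 6-digit issuer prefix are illustrative assumptions, not anything taken from the PCI DSS text.

```python
def luhn_check_digit(partial: str) -> int:
    """Return the Luhn check digit for a number that is missing its last digit."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def candidate_count(known_prefix_len: int, total_len: int = 16) -> int:
    """Count the card numbers an attacker has to try for one known prefix.

    The check digit is fully determined by the other digits, so it
    contributes no entropy and is excluded from the free digits.
    """
    free_digits = total_len - known_prefix_len - 1
    return 10 ** free_digits
```

Under those assumptions a single 6-digit prefix leaves 10**9 candidates, which is a small job for one machine no matter which hash function was used.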

Re:Error so popular it was enshrined in PCI DSS (0)

Anonymous Coward | about 2 months ago | (#47302483)

>There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.

Except for healthcare.

Re:Error so popular it was enshrined in PCI DSS (1)

gweihir (88907) | about 2 months ago | (#47302519)

Indeed. Any reversible transformation for a small-entropy source set is insecure. Anybody that actually understands crypto knows that. Seems this mess is just one more indicator that some people hire far too cheap when it gets to IT.

Re:Error so popular it was enshrined in PCI DSS (1)

swillden (191260) | about 2 months ago | (#47303117)

Always assumed anywhere term "anonymized data" is used it is more likely than not to be companies and governments paying lip service to its customers... where data could easily be reversed into an identifiable way by either taking advantage of insufficient entropy or cross referencing datasets.

It's worth mentioning that one possible solution in this sort of situation is to use a keyed hash. Assuming a good base hash (which MD5 really isn't, any more, but HMAC MD5 would likely have been fine) and a well-secured key with sufficient entropy, it is infeasible to reverse the hash. Cross-referencing may still be an issue, though straight brute force reversing of the hashing isn't. To eliminate the possibility of cross-referencing it's necessary to use a different hash key for each database.
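
A minimal sketch of the keyed-hash idea, using Python's standard `hmac` module. It assumes HMAC-SHA256 rather than HMAC-MD5, and the `pseudonymize` name and the sample identifier are made up for illustration.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Map an identifier to a keyed pseudonym with HMAC-SHA256.

    Without the key, enumerating the small identifier space does not help,
    because the attacker cannot compute candidate pseudonyms.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def new_dataset_key() -> bytes:
    """Generate a fresh 256-bit key; use a different one per published dataset."""
    return secrets.token_bytes(32)
```

Using a separate key per dataset means the same identifier gets unrelated pseudonyms in each release, which is what blocks cross-referencing between them.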

Of course, like all cryptographic "solutions", this merely replaces a large secret (the contents of the database(s)) with a small secret (the key or keys). Still, it's typically easier to secure a key than a database. "Easier" doesn't mean "easy". Depending on the application, though it's often the case that if all you need is unique IDs for delivery to a third party, you can just generate a random key, use it to hash all of the to-be-secured IDs then discard the key.

Oh, and the real "solution", of course, is to hire someone who knows what they're doing and give them the time and resources to fully and accurately understand the security problem they're trying to solve. They'll either do the job or tell you it can't be done (or do the job and screw it up in a subtle and non-obvious way rather than a stupid and obvious one... but hey, at least if it's broken it'll be a subtle and non-obvious break).

Re:Error so popular it was enshrined in PCI DSS (0)

Anonymous Coward | about 2 months ago | (#47303645)

if all you need is unique IDs for delivery to a third party, you can just generate a random key, use it to hash all of the to-be-secured IDs then discard the key.

Thereby introducing a known plaintext into a cryptographic construct--something not to be taken lightly. In that instance, actually worse than useless. If all you need is unique IDs, a random sort and cardinal numbering is better (no known plaintext introduced to later convey extra information like license plate # in the event of a break). Random IDs would work just as well.

There's an incredible amount of bad homebrew crypto suggestions on this article. Play with it, learn the principles, but please, for the love of FSM, don't trust it.

Re:Error so popular it was enshrined in PCI DSS (2)

Wrath0fb0b (302444) | about 2 months ago | (#47303271)

Um, the standard is fine. The phrase "One-way hashes based on strong cryptography" means (to any professional in the business) that one must salt [wikipedia.org] the hash with sufficient entropy to make brute-forcing the input space impossible. So a 16-digit CC has little entropy, but add a 16-byte salt and you've gotten somewhere.

So yeah, "strong cryptography" can't fix stupid, but those that know how to use it are plenty fine.

Re:Error so popular it was enshrined in PCI DSS (1)

WaffleMonster (969671) | about 2 months ago | (#47303831)

Um, the standard is fine. The phrase "One-way hashes based on strong cryptography" means (to any professional in the business) that one must salt the hash with sufficient entropy to make brute-forcing the input space impossible. So a 16-digit CC has little entropy, but add a 16-byte salt and you've gotten somewhere.

This is the second time 'use salts' has been mentioned. Salts are not secret keys and only provide protection against creation of lookup tables to accelerate brute force of multiple items... they in no way address the underlying problem of insufficient entropy.
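
A minimal sketch of why a salt that ships with the data does not help here. The 4-digit input space, the salt value and the function names are all hypothetical; the point is only that a known salt adds one string concatenation per guess.

```python
import hashlib
from itertools import product


def salted_md5(salt: str, value: str) -> str:
    """Hash salt+value the way a naive 'salted hash' scheme would."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest()


def brute_force(target_hex: str, salt: str, digits: str = "0123456789", length: int = 4):
    """Try every candidate in a small key space against a known salt.

    Returns the matching plaintext, or None if nothing in the space matches.
    """
    for combo in product(digits, repeat=length):
        candidate = "".join(combo)
        if salted_md5(salt, candidate) == target_hex:
            return candidate
    return None
```

With the salt known, the loop above is bounded by the size of the input space alone, so the salt's length is irrelevant to the cost of the attack.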

I don't know the exact figure, but last I looked into this, the space of every possible credit card that can be issued across all currently known issuers is well under a trillion, most likely in the tens to hundreds of billions range... practically free by today's hardware standards.

Re:Error so popular it was enshrined in PCI DSS (3, Interesting)

Buzer (809214) | about 2 months ago | (#47304195)

Salts do provide protection against that. Salts are secret if you want them to be (you can protect the plaintext salt the same way you protect your plaintext encryption keys); you only need to share them when the other party has to be able to hash their original data.

Here are some sha1 hashes:

  • 4c2199828f355281e0f6eccb76d9df609f99ed0e salt+"123"
  • 458183225b77f6baff7c4c439b0ed3a5e7278e8a salt+"456"
  • ed974fc96c530639cccc9b18315396789d93a697 salt+"789"
  • f87a2fa039a20d01032f19b5852868343f3d06b9 salt+"???"

So, how about you tell me what that last number combination is? I can give you a hint that it matches regex /^[1-9]{3}$/ (so there are only 729 possibilities). The salt is a 60-character string. If you cannot do it, then OP's post was correct.

PCI DSS is to protect bank not customer (0)

Anonymous Coward | about 2 months ago | (#47303633)

It has never been about protecting you, the customer with the CC, but about giving the bank and the firm protection against a lawsuit or class action in case of a massive breach. Now they can simply say "hey, we were respecting the PCI DSS standard" and be out of the heat. That's why there is no real security, or any requirement for something stronger like a salted hash.

Where is the harm (0)

Anonymous Coward | about 2 months ago | (#47302379)

why did NYC attempt to hide the data in the first place?

Re:Where is the harm (0)

PPH (736903) | about 2 months ago | (#47302945)

Probably some union rule prohibiting the compilation and/or publication of driver's performance records. It's all seniority.

Vijay Pandurangan arrested (0)

Anonymous Coward | about 2 months ago | (#47302473)

Surely Vijay Pandurangan will not be arrested for hacking?

Re:Vijay Pandurangan arrested (1)

wiredlogic (135348) | about 2 months ago | (#47303245)

Hacking? This man is obviously a terrist fer'ner. Get him to Gitmo in a rendition wagon ASAP.

MD5 is not the problem (1)

gweihir (88907) | about 2 months ago | (#47302493)

For this application, MD5 did not make a difference. SHA512 would have been just as insecure. For some applications, MD5 is perfectly secure if used competently. This example is one, and the original story does not claim any culpability on the part of MD5. As always, there is no substitute for knowing what you are doing.
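
A minimal sketch of that point: the attack is a lookup table over the input space, and the hash function is just a parameter. The 3-digit key space is a toy stand-in, not the real medallion or hack license format.

```python
import hashlib
from itertools import product


def build_lookup(hash_name: str, alphabet: str, length: int) -> dict:
    """Precompute hash -> plaintext for every string in a small key space.

    `hash_name` is any algorithm accepted by hashlib.new, e.g. "md5" or "sha512".
    """
    table = {}
    for combo in product(alphabet, repeat=length):
        plain = "".join(combo)
        digest = hashlib.new(hash_name, plain.encode("utf-8")).hexdigest()
        table[digest] = plain
    return table
```

Swapping "md5" for "sha512" changes the digest length and the time per guess, but the table still has exactly one entry per possible input.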

congrats, whats the prize? (-1)

Anonymous Coward | about 2 months ago | (#47302589)

For proving you have programming skills to complete "trivial" hacks, and the moral compass of a hubble gyro?
Do you only steal from houses with "cheap locks"?
I guess exposing private work flows that criminals smarter than you will use to murder and rob cabbies never occurred to you. But what the heck, your so l33t. Maybe when you get really good at being substandard, you can get Snowdens job. He was another asshole that had no understanding even of the relative drivle that slid through his cube. Are you some kind of H1b credit card stealling call center fuckhead ?

I de-anonymized this comment (1)

ewg (158266) | about 2 months ago | (#47302727)

I de-anonymized this comment by signing in.

Re:I de-anonymized this comment (-1)

Anonymous Coward | about 2 months ago | (#47302795)

I de-anonymized this comment by signing in.

You have become part of the problem.

Using a published hash - FAIL (1)

chromaexcursion (2047080) | about 2 months ago | (#47302803)

Using any public hash exposes you to dictionary attacks. Especially when you publish which one you've used.
The quality of the encryption is irrelevant.
Security through obscurity, using a custom algorithm, is the only way.
Taking MD5, it's published, and tweaking a few points (though whoever did this needs to be very competent) would have been sufficient.

Some manager probably said any work for additional security wasn't worth the cost. Ooops!

Re:Using a published hash - FAIL (2)

PPH (736903) | about 2 months ago | (#47302987)

Security through obscurity, using a custom algorithm, is the only way.

Not necessarily. I imagine the reason the hashed field was included in the published logs was to provide a key to group results by driver. Even if that driver was to remain anonymous. So all the city would have had to do is issue a system generated UID for each medallion/license number combination and populate the published data with that.

Nobody knows who driver 1, 2, 3, .., 736903, ... etc. are. But one can still analyze per-driver data.
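
A minimal sketch of that approach, assuming the real identifiers fit in memory. The `assign_ids` name and the sample values are made up; the shuffle is there so the assigned number does not leak the sort order of the real identifiers.

```python
import random


def assign_ids(real_ids, rng=None):
    """Assign each distinct real identifier an arbitrary integer starting at 1.

    The mapping carries no information about the original value, and because
    it is a plain numbering there can be no collisions between drivers.
    """
    rng = rng or random.SystemRandom()
    unique = sorted(set(real_ids))
    rng.shuffle(unique)
    return {real: n for n, real in enumerate(unique, start=1)}
```

The published data would then contain only the integers, and the mapping itself is either kept private or discarded.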

Re:Using a published hash - FAIL (1)

chromaexcursion (2047080) | about 2 months ago | (#47303173)

nope, it has to do with the key. given a tag # and license # you can dictionary attack the hash. especially since the source data is known, easy to break.

they didn't pre-anonymize the keys

Re:Using a published hash - FAIL (3, Interesting)

Vellmont (569020) | about 2 months ago | (#47303101)

Taking MD5, it's published, and tweaking a few points (though whoever did this needs to be very competent) would have been sufficient.

No, that would have been stupid. It's unlikely someone would have reverse engineered your hacked md5 algorithm, but it's also possible you could screw it up.

The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.
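
A minimal sketch of those three steps, assuming SHA-256 as the hash and a whole column processed in one pass. The `anonymize_column` name is made up for illustration.

```python
import hashlib
import secrets


def anonymize_column(values):
    """Hash every value with one throwaway 256-bit secret.

    Equal inputs still map to equal outputs within this call, so per-driver
    grouping survives, but the secret never leaves this function.
    """
    secret = secrets.token_bytes(32)  # 256 random bits
    hashed = [hashlib.sha256(secret + v.encode("utf-8")).hexdigest() for v in values]
    del secret  # drop our reference; nothing is written to disk
    return hashed
```

Because a new secret is drawn on every call, two separately anonymized releases cannot be joined on the hashed column.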


Some manager probably said any work for additional security wasn't worth the cost. Ooops!

No, some developer didn't know what the hell they were doing. You'd be surprised (but shouldn't be) how little most developers know about security, especially encryption.

Re:Using a published hash - FAIL (1)

chromaexcursion (2047080) | about 2 months ago | (#47303279)

well, you just described a way to tweak an algorithm.
wouldn't even have to go to a 256 bit key. Doing that into MD5 would probably foil anything less than a concerted financial attack.
No media outlet could afford the computing power to attack that.
I used the same approach, with some further tweaks to secure financial communications a decade ago.

Lack of understanding security doesn't surprise me. I'm an engineer who does. I designed and wrote a suite that passed a 3rd party, hostile, security audit.

Re:Using a published hash - FAIL (1)

Vellmont (569020) | about 2 months ago | (#47303393)

No, that's not a tweak to an algorithm, it's a random input to an algorithm. The algorithm is the same, the input is different.

Re:Using a published hash - FAIL (0)

Anonymous Coward | about 2 months ago | (#47303767)

Lack of understanding security doesn't surprise me. I'm an engineer who does. I designed and wrote a suite that passed a 3rd party, hostile, security audit.

That could mean something, or it could have been useless if the 3rd party was incompetent, or was lazy in their audit process. You can't stake your claim to be an engineer who understands security on that w/o a bit more evidence...but, maybe you have and just didn't share that.

Re:Using a published hash - FAIL (0)

Anonymous Coward | about 2 months ago | (#47303547)

You do realize that hash functions are often used in PRNGs? And you realize that if you can generate 256 bits of random garbage, there's no extra need to hash the original data AND the garbage. You can simply use the 256 bits of garbage if all you need is an identifier. You're talking about adding in known plaintext where none is actually needed. You might hash the 256 bits of garbage to produce better pseudorandom garbage, but that's a secondary consideration. Heck, if we're going to assign identifiers to cabs in the first place, just number them 0-255. It's about as useful... The identifiers are public in the first place. The space of identifiers is public. If the same cab always shows up as 42 in the logs, even though its real number is 7, you're not obscuring a damn thing or even slowing anyone down (more than trivially).

I'm really at a loss as to what problem your solution was looking for.

Re: throwing away the salt (1)

Anonymous Coward | about 2 months ago | (#47304271)

The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.

If you're going to throw away the salt, why not just assign a unique, shuffled identifier for each data string?

A hash collision could make it look like a single taxi driving in opposite directions simultaneously, or it could cause a pair of day-night shift taxis to appear to be a single taxi that's used 24/7. So if you want to avoid hash collisions, you at least have to verify that none of the values hashed to the same value, and the cost of doing that is roughly the same as the extra overhead of generating a shuffled identifier.

A little salt might have helped. (1)

brian81 (1370977) | about 2 months ago | (#47304115)

If they had salted the hash, they might have gotten away with it.