NSF Funds Data Anonymization Project

Soulskill posted more than 3 years ago | from the maybe-facebook-will-donate dept.

Education 36

Trailrunner7 writes "A group of researchers from Purdue University has been awarded $1.5 million from the National Science Foundation to help fund an ongoing project that's investigating how well current techniques for anonymizing data are working and whether there's a need for better methods. The grant will help to further research from computer scientists and linguists, who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed. The Purdue anonymization project has been ongoing for some time, and also includes researchers from a number of other institutions, including Indiana University and the Kinsey Institute."

36 comments

Testing Slashdots Methods for Anonymization (4, Funny)

Anonymous Coward | more than 3 years ago | (#34103516)

It works!

Can I pick up my grant check now?

Re:Testing Slashdots Methods for Anonymization (4, Funny)

drunkennewfiemidget (712572) | more than 3 years ago | (#34103648)

We wish you could, but we don't know who to write the cheque out to.

Re:Testing Slashdots Methods for Anonymization (2, Funny)

teachknowlegy (1003477) | more than 3 years ago | (#34103760)

Write the check out to Anonymous Coward, duh! When someone produces the ID of the same name, they can have it. Name changes are cheap, aren't they?

Re:Testing Slashdots Methods for Anonymization (1)

stephanruby (542433) | more than 3 years ago | (#34103768)

Sorry Rob,

With that number next to your "anonymous" name, #34103516, you might as well just have given us your full social security number.

Re:Testing Slashdots Methods for Anonymization (3, Funny)

zkp (1634437) | more than 3 years ago | (#34104674)

There are many bits of information we can glean!
  1. Your "anonymous" name, #34103516
  2. Date and Time: (Tuesday, November 02 @ 6:04PM)
  3. You were one of the first posts so you probably read Slashdot often. Also, you probably visit Slashdot regularly around 6:00 PM.
  4. Writing Style: Short messages, funny

    So I could search for regular Slashdot users who tend to be active around 6:00 PM, post brief messages, and are often one of the first to comment. Narrow down that list to users who actually did log in on 11/02/2010. Since we know that you did read this article, there is also a decent chance that you commented on this article with your actual user name.

    We will find you!
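The habit-matching part of that narrowing-down could be automated roughly like this. It is a minimal sketch only: the Post record, its fields, and the thresholds are hypothetical, and the login-date filter is left out because it would need server-side data that outsiders don't have.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Post:
    user: str    # hypothetical: account name of a logged-in poster
    hour: int    # local hour of day the post was made (0-23)
    length: int  # characters in the post body
    rank: int    # position in the thread (1 = first comment)


def hour_distance(a, b):
    # Circular distance on a 24-hour clock, so 23:00 and 00:00 are 1 hour apart.
    d = abs(a - b) % 24
    return min(d, 24 - d)


def candidate_users(posts, target_hour, hour_window=1, max_avg_len=80, max_avg_rank=5):
    """Keep users whose habits match the anonymous post: time of day, brevity, early posting."""
    by_user = {}
    for p in posts:
        by_user.setdefault(p.user, []).append(p)
    matches = []
    for user, ps in by_user.items():
        near = sum(1 for p in ps if hour_distance(p.hour, target_hour) <= hour_window)
        if near * 2 < len(ps):  # fewer than half of their posts are near the target hour
            continue
        if mean(p.length for p in ps) > max_avg_len:  # not a "short messages" poster
            continue
        if mean(p.rank for p in ps) > max_avg_rank:  # rarely among the first comments
            continue
        matches.append(user)
    return sorted(matches)


posts = [Post("alice", 18, 40, 2), Post("alice", 17, 60, 3),
         Post("bob", 9, 500, 40), Post("carol", 18, 900, 1)]
print(candidate_users(posts, target_hour=18))  # ['alice']
```

Each filter only shrinks the candidate list; none of them identifies anyone on its own, which is why the comment stacks several.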

Re:Testing Slashdots Methods for Anonymization (0)

Anonymous Coward | more than 3 years ago | (#34107952)

We got your time, You done left evidence and all, You are so dumb, for real you are really dumb

You don’t have to come and confess, We’re lookin for you, We gon find you!

Re:Testing Slashdots Methods for Anonymization (0)

Anonymous Coward | more than 3 years ago | (#34106308)

Not anon at all. Your UID is 666. [slashdot.org]
And you stole my account you asshole!

Hmmm (2, Insightful)

WrongSizeGlass (838941) | more than 3 years ago | (#34103532)

I wonder if they could get a larger grant from Google or Facebook or the NSA or [insert large organization name here] to get a guaranteed result of "things are just fine, nothing to see here"?

Re:Hmmm (1)

natehoy (1608657) | more than 3 years ago | (#34103610)

No, that would be wrong, of course. They'd never be able to accept a grant. It could never happen. Ever.

But only because, technically, it's called a bribe, not a grant. If you want to call it a grant, you have to put it in quotes, as in: "I wonder if they could get a larger "grant" from..."

Re:Hmmm (1)

elrous0 (869638) | more than 3 years ago | (#34110874)

Now why would the NSA be interested in technology that could identify anonymous posters using "textual clues even after explicitly identifiable data has been removed"? That's just silly talk.

WTF (0)

Anonymous Coward | more than 3 years ago | (#34103602)

This is a sheer waste of money.

How can you anonymize without removing meaning (0)

Anonymous Coward | more than 3 years ago | (#34103736)

You can remove the nominative information, but the data is still not anonymous, because you can use advanced techniques like behaviour analysis to segment your samples back to individuals and then correlate them with known data about each individual to identify them.

If you remove enough discriminative information (information that enables you to separate your sample into groups), your data starts losing meaning fast. And I don't see how can you have meaningful data if you removed all the information that would enable you to recreate the individual sample.
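This trade-off is what k-anonymity formalizes: coarsen the quasi-identifiers (fields like age and ZIP code that are not names but still single people out) until every record is indistinguishable from at least k-1 others. A minimal sketch with made-up records; the bucket sizes are arbitrary, and generalization is only one of several anonymization techniques.

```python
from collections import Counter

# Made-up records: (age, ZIP code, diagnosis). Age and ZIP are the quasi-identifiers.
records = [
    (34, "47906", "flu"), (36, "47907", "asthma"), (35, "47901", "flu"),
    (52, "47401", "diabetes"), (58, "47405", "flu"), (55, "47408", "asthma"),
]


def generalize(record, age_bucket, zip_digits):
    """Coarsen age to a range and keep only the first zip_digits of the ZIP code."""
    age, zipcode, diagnosis = record
    lo = (age // age_bucket) * age_bucket
    masked_zip = zipcode[:zip_digits] + "*" * (len(zipcode) - zip_digits)
    return (f"{lo}-{lo + age_bucket - 1}", masked_zip, diagnosis)


def k_of(rows):
    """k = size of the smallest group of rows sharing the same (age, ZIP) values."""
    groups = Counter((age, zipcode) for age, zipcode, _ in rows)
    return min(groups.values())


coarse = [generalize(r, age_bucket=10, zip_digits=3) for r in records]
print(k_of(records))  # 1: every raw record is unique on (age, ZIP)
print(k_of(coarse))   # 3: each (age range, ZIP prefix) pair now covers 3 people
print(coarse[0])      # ('30-39', '479**', 'flu')
```

The price is exactly the one described above: an analysis can no longer distinguish a 34-year-old from a 36-year-old, and pushing k higher means coarsening further.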

Re:How can you anonymize without removing meaning (1)

Monkeedude1212 (1560403) | more than 3 years ago | (#34104672)

And I don't see how can you have meaningful data if you removed all the information that would enable you to recreate the individual sample.

You can still segregate them into groups even if you can't identify the individual sample - which is essentially what happens already. Data miners go and determine "People who like Penny Arcade also like video games - so let's put an ad for Fable 3 up on the main page". Whether that is Penny Arcade's decision to get more click revenue or whether they just let an ad server handle that obvious piece of info is irrelevant; you are still using relevant data with meaning to market to a large group of people instead of an individual.

Now - this article brings up the idea of whether I can retain my anonymity online. If it were up to me to run this experiment, I would do exactly as you said and use some advanced behaviour analysis technique. First we'll start off here: I'm on Slashdot. You have an alias, and you have a few of my posts. You can tell that they tend to get a little long winded sometimes, easily getting to 3 or more paragraphs if there isn't an immediate punchline in sight, or when responding to a question. You can also see which stories I usually respond to - I often don't have much to say about Linux releases, but I am often avid in the gaming area.

So you go to a lot of the other sites that you can infer slashdotters might frequent: all the tech news sites, and then those that match my posting habits, a lot of gaming sites, yadda yadda yadda. The first thing you look for is similar aliases, then you cross-reference the posts on different sites to see the similarities. How many Monkeedudes are there on the Gamespy forums? Do any of them make really long posts? He's mentioned on Slashdot that he is Canadian - do any of the other sites have public profile info that says he's Canadian?

And so on and so forth. This is all automated - so it's much quicker than a person trying to build this file. After it's all built, a human can quickly skim the data and knock off any outliers that might have seemed similar to the computer.

Now - have I ever mentioned my name anywhere in all the data collected? My age? My city? Can you infer my age given the relative maturity of my posts, my registration dates, and my other posts online? Can you infer my city based on my jokes about the weather around here? How hard would it be to nail me to a Facebook page with various likes and dislikes, if that information were available to you (either publicly or for sale)?

It's a scary world we live in. I don't know if any such systems exist, but I see it as definitely technically feasible. It also seems like a great product I could market and make lots of money off of, but I definitely don't believe in progressing that side of the internet.
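The cross-referencing step described above is essentially stylometry. One simple baseline compares character n-gram frequency profiles with cosine similarity. The sketch below assumes that approach (real authorship-attribution systems use richer features and far more text), and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Frequency profile of overlapping character n-grams (case and whitespace normalized)."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(known_text, candidates, n=3):
    """Sort candidate accounts by stylistic similarity to a known writing sample."""
    ref = char_ngrams(known_text, n)
    scores = {name: cosine(ref, char_ngrams(text, n)) for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


known = "You can tell that they tend to get a little long winded sometimes"
candidates = {
    "forum_user_a": "I tend to get a little long winded when it comes to gaming",
    "forum_user_b": "Kernel 2.6.36 released with new drivers",
}
print(rank_candidates(known, candidates)[0][0])  # forum_user_a
```

A score like this only produces a ranked shortlist; as the comment says, a human (or further evidence such as profile details) still has to weed out false matches.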

Did anybody else read.... (1)

tacarat (696339) | more than 3 years ago | (#34103764)

NSFW?

Re:Did anybody else read.... (2, Funny)

Anonymous Coward | more than 3 years ago | (#34104170)

No, but I read NSF Funds, and thought, why is slashdot doing a story on my wife?

Re:Did anybody else read.... (2, Insightful)

kmoser (1469707) | more than 3 years ago | (#34108502)

I read it as "NSFW" and thought the same thing: why is Slashdot doing a story on your wife?

How Benevolent Of The N.S.F. ( +3, Intrusive ) (5, Interesting)

Anonymous Coward | more than 3 years ago | (#34103772)

"The grant will help to further research from computer scientists and linguists, who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed." SHOULD READ

"The grant will help to further research from computer scientists, linguists, AND the N.S.A. who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed."

Yours In Krasnoyarsk,
Kilgore T.

Identification After Anonymization (0)

Anonymous Coward | more than 3 years ago | (#34104990)

can be achieved with conceptual clustering with galois lattices [google.com].

You can now send me a cashier's check for the sum of Euro 100,000,000,000.

Thanks in advance.

Yours In Akademgorodok,
Kilgore T.
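For the curious, "conceptual clustering with Galois lattices" refers to formal concept analysis: a formal concept is a maximal group of objects (here, posters) paired with exactly the set of attributes they all share, and the concepts form a lattice. A brute-force sketch, exponential in the number of objects and so only suitable for toy inputs; the posters and attributes are made up, and nothing here shows how well the technique re-identifies anyone.

```python
from itertools import combinations


def concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small binary context by brute force.

    context maps each object to a frozenset of its attributes.
    """
    objects = list(context)
    all_attrs = frozenset().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        return frozenset.intersection(*(context[o] for o in objs)) if objs else all_attrs

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    found = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            b = intent(objs)
            found.add((extent(b), b))  # (extent(intent(X)), intent(X)) is always a concept
    return found


context = {
    "u1": frozenset({"evening", "gaming"}),
    "u2": frozenset({"evening", "linux"}),
    "u3": frozenset({"evening", "gaming"}),
}
for ext, inte in sorted(concepts(context), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(inte))
```

Every concept arises from some subset of objects, so trying all subsets finds them all; practical tools use incremental algorithms instead.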

truly anonymous data is often useless (-1, Offtopic)

Anonymous Coward | more than 3 years ago | (#34103778)

Let's say you are in a medical study and there is some measurement like blood chemistry, spinal tap, or imaging (MRI, CT etc.) which is to be tracked over time to figure out if a drug works or not. If the data is really anonymous, then you can't link each patient's measurements over time, and you can't do the study. Moral of the story: strict anonymity sounds good, but be careful what you wish for.
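The usual middle ground for a study like this is pseudonymization rather than true anonymization: each patient ID is replaced by a stable code, so one person's measurements still line up over time. A minimal sketch, assuming a keyed hash (HMAC-SHA256) as the coding scheme and made-up visit records. This is not anonymous in the strict sense: whoever holds the key can re-link records, and a distinctive trajectory may still identify someone.

```python
import hashlib
import hmac


def pseudonym(patient_id: str, key: bytes) -> str:
    # Keyed hash: the same patient always gets the same code; without the key
    # the code cannot be recomputed from a known patient ID.
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(visits, key):
    """Replace patient IDs with pseudonyms, keeping the date and measurement."""
    return [(pseudonym(pid, key), date, value) for pid, date, value in visits]


def trajectories(rows):
    """Group measurements by (pseudonymous) patient, sorted by date."""
    by_person = {}
    for pid, date, value in rows:
        by_person.setdefault(pid, []).append((date, value))
    return {pid: sorted(points) for pid, points in by_person.items()}


key = b"hypothetical-study-key"  # hypothetical; held by the data custodian, not the analysts
visits = [("MRN-001", "2010-01-05", 7.1), ("MRN-002", "2010-01-06", 5.0),
          ("MRN-001", "2010-04-02", 6.4)]
for pid, points in trajectories(pseudonymize(visits, key)).items():
    print(pid[:12], points)
```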

Re:truly anonymous data is often useless (0)

Anonymous Coward | more than 3 years ago | (#34104876)

truly anonymous data is often useless

And that differs from fully-attributed data HOW?

NSF (2, Insightful)

Combatso (1793216) | more than 3 years ago | (#34103812)

Headline had me thinking the science grants were returned Non Sufficient Funds... that's a sign of a really bad economy.

Re:NSF (1)

bhcompy (1877290) | more than 3 years ago | (#34103832)

Yea, seriously. It's the Non/In Sufficient Funds funds

Re:NSF (1)

Combatso (1793216) | more than 3 years ago | (#34103912)

You mean every time I drop an NSF and my bank charges me 20 dollars, that 20 dollars goes to fund Data Anonymization... but there is no way to know for sure, because they didn't catch the name of the guy they gave my money to.

Re:NSF (0)

Anonymous Coward | more than 3 years ago | (#34104386)

What bank charges only 20 dollars for NSF? All the banks I know of charge at least twice that.

Re:NSF (1)

Philomage (1851668) | more than 3 years ago | (#34105090)

Even after getting that it was about the National Science Foundation providing funding for a research grant, I was still reading (for a while) to see what it had to do with kiting cheques. :-/ "You can take the nerd out of the trailer park..."

Interesting Spin (3, Informative)

Anonymous Coward | more than 3 years ago | (#34103964)

The research is actually into data mining, not some new forms of encryption/anonymization.

I'm sure the results will provide insight that may lead to better anonymization, but I bet framing the whole thing around the more popular side of that spectrum makes it sell better.

Your tax dollars going to waste! (0)

Anonymous Coward | more than 3 years ago | (#34104240)

If private enterprise won't fund it, it isn't worth doing. Kill the NSF! Kill the DOE! Privatize NOAA & NIST!

Sincerely,

Citizen Tea

Good or bad? (1)

kwbauer (1677400) | more than 3 years ago | (#34104290)

So, is this a good development or a bad development? If finding better ways to identify people leads to better ways to remove that information, then it is better?

Or is it better because it will help us not remain anonymous when we donate to our favorite cause and that organization is in some way involved in US politics?

The EFF already did this... (0)

Anonymous Coward | more than 3 years ago | (#34104484)

Panopticlick [eff.org] already showed that it was child's play to track somebody, even with cookies disabled. Unless the way websites/browsers work is fundamentally changed, this will continue to be the case.
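Panopticlick expressed how identifying a browser is in bits of self-information (surprisal): a configuration shared by a fraction f of browsers carries -log2(f) bits, and about 2^bits browsers share it. A sketch of that calculation over a made-up sample; the attribute tuples are hypothetical, and the real study combined many more attributes over a far larger sample.

```python
import math
from collections import Counter


def surprisal_bits(fingerprint, population):
    """Bits of identifying information in one fingerprint, relative to a sample of fingerprints."""
    count = Counter(population)[fingerprint]
    if count == 0:
        raise ValueError("fingerprint not present in the sample")
    return -math.log2(count / len(population))


# Hypothetical fingerprints: (browser, screen size, time zone).
common = ("Firefox 3.6", "1280x800", "UTC-5")
rare = ("Chrome 7", "1920x1080", "UTC+1")
other = ("Firefox 3.6", "1280x800", "UTC+1")
sample = [common] * 6 + [rare] + [other]

print(surprisal_bits(rare, sample))    # 3.0 bits: 1 browser in 8
print(surprisal_bits(common, sample))  # about 0.415 bits: 6 browsers in 8
```

Bits from independent attributes roughly add up, which is why a handful of innocuous-looking settings can make a browser unique without any cookie.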

Simple equation (1)

shoehornjob (1632387) | more than 3 years ago | (#34104696)

government entity (CIA+NSA) * national security + keylogger or trojan = we ownz all your base (where base = data). Anonymization? HAH.

Google translate? (1)

mveloso (325617) | more than 3 years ago | (#34106926)

For better anonymization, you could run the data through Google Translate a few times. That'll guarantee that it's anonymized.

1.5 million (0)

Anonymous Coward | more than 3 years ago | (#34179844)

1.5 million won't do that much at all! They are going to need a lot more than that.
